5,553 bytes added ,  08:23, 10 June 2022
start page creation, still in draft stage
{{Project Infobox|Self researchers=User:Gedankenstuecke|Related tools=Oura Ring, Fitbit, Apple Watch, RescueTime|Related topics=Sleep, Activity, Weight tracking, Productivity tracking|Related projects=100 Days of Summer}}

'''"What does clustering tell us"''' is an ongoing personal science project that grew out of [[Personal Science Community Meet Ups|discussions during the weekly self-research chats]]. The goal is to understand whether unsupervised clustering of a large number of metrics can improve the understanding of how those metrics relate to each other, and whether it reveals interesting clusters of days that are similar to each other.

== Background ==
The idea for this project came up during the spring of 2022, while discussing potential projects for the annual [[Keating Memorial]]. Some participants in the self-research chats brainstormed whether and how one could learn more from a large set of data spanning a number of topics and metrics.

One suggestion during this discussion was to see whether unsupervised clustering could help uncover which variables correlate with each other, while also highlighting whether there are different types of days. A search in the Show & Tell archives showed that a similar approach had already been tried in the [[100 Days of Summer]] project.

== Reducing dimensions with a Principal Component Analysis ==
As a first try, I used some of my own data to see whether such a clustering could work. To limit the scope, I picked data from a variety of sources (see the table below). To simplify the approach, I used the '''''day''''' as the unit of observation: measurements taken throughout a day were either summed or averaged, depending on the metric.
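
The per-day aggregation described above could look roughly like the following sketch. It assumes the raw measurements sit in a pandas data frame with a timestamp index. The column names, the values and the choice of which metric is summed or averaged are made up for illustration and are not taken from the project's notebook.

```python
import pandas as pd

# Made-up raw measurements with timestamps (illustrative names and values only).
raw = pd.DataFrame(
    {
        "steps": [1200, 800, 3000, 500],
        "heart_rate": [60.0, 70.0, 55.0, 65.0],
    },
    index=pd.to_datetime(
        [
            "2021-09-01 08:00",
            "2021-09-01 18:00",
            "2021-09-02 09:00",
            "2021-09-02 21:00",
        ]
    ),
)

# Assumption: count-like metrics are summed per day, rate-like metrics are averaged.
AGGREGATION = {"steps": "sum", "heart_rate": "mean"}

# One row per calendar day, one column per metric.
daily = raw.resample("D").agg(AGGREGATION)
print(daily)
```

Keeping the metric-to-aggregation mapping in a single dictionary makes it easy to see and change how each metric is collapsed into its daily value.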

=== Metrics used ===
In total, 37 different metrics are to be used for this work.
{| class="wikitable"
|+All variables (so far) used for this project
!Data source
!Metric type
!Variable
!Details
|-
| rowspan="25" |[[Oura Ring]]
| rowspan="13" |Activity
|Daily movement
|activity measured in "walking distance equivalent", single value per day
|-
|Steps
|Number of steps taken, single value per day
|-
|Total calories
|single value per day
|-
|Active calories
|single value per day
|-
|Average MET
|Average ''[[VO2Max|metabolic equivalent of task]]'', single value per day
|-
|Inactive MET minutes
|single value per day
|-
|Low activity MET minutes
|single value per day
|-
|Medium activity MET minutes
|single value per day
|-
|High activity MET minutes
|single value per day
|-
|Inactive time
|minutes spent not being active, single value per day
|-
|Low activity time
|minutes spent with low activity (low MET level)
|-
|Medium activity time
|minutes spent with medium-intensity activity
|-
|High activity time
|minutes spent with high-intensity activity
|-
| rowspan="8" |Sleep
|Total time in bed
|Includes time being awake, single value per day given in seconds
|-
|Total sleep time
|Total time asleep (excludes delay to falling asleep and awake time during night)
|-
|Time awake
|Time awake during the night
|-
|REM sleep
|seconds spent in REM sleep
|-
|Deep sleep
|seconds spent in deep sleep
|-
|Light sleep
|seconds spent in light sleep
|-
|Time restless
|Time spent moving during sleep
|-
|Sleep latency
|Time difference between going to bed & falling asleep
|-
| rowspan="4" |"Recovery" / Rest
|Resting heart rate
|lowest nightly heart rate
|-
|Average sleep heart rate
|average heart rate measured during sleep
|-
|[[HRV (Heart Rate Variability)|Heart Rate Variability]]
|highest HRV measured during sleep
|-
|Body temperature delta
|Nightly body temperature compared to long-term baseline
|-
| rowspan="2" |[[Apple Watch]]
| rowspan="2" |Activity
|Cycling distance
|Total distance cycled during a day (given in km)
|-
|Walking + Running distance
|Total distance walked / run during a day (given in km)
|-
|[[Fitbit]]
|Body
|Weight
|Daily weight in kilograms (averaged if more than one measurement per day)
|-
| rowspan="5" |[[RescueTime]]
| rowspan="5" |Productivity
|Very distracting time
|Total amount of time spent using apps classified as "very distracting"
|-
|Distracting time
|Total amount of time spent using apps classified as "distracting"
|-
|Neutral time
|Total amount of time spent using apps classified as "neutral"
|-
|Productive time
|Total amount of time spent using apps classified as "productive"
|-
|Very productive time
|Total amount of time spent using apps classified as "very productive"
|}
The data was exported from the respective sources through the [[Open Humans]] integrations. A Jupyter notebook that exports all this data into a unified spreadsheet will be made available soon.
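
Until that notebook is available, the following is only a rough sketch of how per-source daily tables might be combined into one spreadsheet-like table. It assumes each source has already been reduced to a pandas data frame indexed by date. The frame names, column names and values are hypothetical and do not reflect the actual Open Humans export format.

```python
import pandas as pd

# Hypothetical per-source daily tables (made-up values, illustrative names).
oura = pd.DataFrame(
    {"steps": [8000, 9500, 4000]},
    index=pd.to_datetime(["2021-09-01", "2021-09-02", "2021-09-03"]),
)
fitbit = pd.DataFrame(
    {"weight_kg": [70.0, 70.4]},
    index=pd.to_datetime(["2021-09-01", "2021-09-03"]),
)
apple_watch = pd.DataFrame(
    {"cycling_km": [12.5]},
    index=pd.to_datetime(["2021-09-02"]),
)

# Outer join on the date index: every day seen in any source gets a row,
# and days without a value for a metric show up as NaN.
unified = pd.concat([oura, fitbit, apple_watch], axis=1, join="outer").sort_index()
unified.index.name = "date"
print(unified)
```

An outer join keeps days on which only some sources reported data, so missing measurements stay visible as empty cells instead of silently removing the day.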

=== Processing the data ===
I exported data for all these variables for the period between September 1, 2021 and June 8, 2022, as I expected the data to be most complete for this period. After exporting the data as one large spreadsheet, some more processing was needed.

A [[principal component analysis]] ideally requires a "complete" data set without any missing values. Depending on the metric, the spreadsheet generated above still had gaps in it. Some gaps were due to a lack of measurements (e.g. a gap in the weight record means that I did not weigh myself), while in other cases a gap means that the value should be zero (e.g. if I did not cycle at all, Apple Health reports a data gap, but it actually represents zero kilometers cycled).
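
A minimal sketch of how the two kinds of gaps could be treated before a principal component analysis is given below. It is not the processing actually used for this project: the table is made up, the column names are hypothetical, and dropping days with a genuinely missing measurement is only one possible choice (interpolation would be another). The PCA is done with a plain singular value decomposition in NumPy on z-scored columns.

```python
import numpy as np
import pandas as pd

# Made-up daily table (illustrative names and values only).
days = pd.date_range("2021-09-01", periods=6, freq="D")
table = pd.DataFrame(
    {
        "steps": [8000, 9500, 4000, 12000, 7000, 10500],
        "cycling_km": [np.nan, 12.5, np.nan, 20.0, np.nan, 8.0],
        "weight_kg": [70.0, np.nan, 70.4, 70.1, 69.8, 70.2],
    },
    index=days,
)

# Assumption: in these columns a gap means "nothing happened", not "not measured".
ZERO_WHEN_MISSING = ["cycling_km"]


def make_complete(df, zero_columns):
    """Fill true-zero gaps with 0, then drop days that still lack a measurement."""
    filled = df.copy()
    filled[zero_columns] = filled[zero_columns].fillna(0.0)
    return filled.dropna()


def pca(df, n_components=2):
    """PCA via SVD on z-scored columns.

    Returns the per-day scores and the explained variance ratio
    of the first n_components components.
    """
    z = (df - df.mean()) / df.std(ddof=0)
    u, s, _ = np.linalg.svd(z.to_numpy(), full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


complete = make_complete(table, ZERO_WHEN_MISSING)
scores, explained = pca(complete)
print(complete)
print(explained)
```

Z-scoring the columns first keeps metrics with large numeric ranges, such as steps, from dominating the components simply because of their units.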
{{Project Queries}}
[[Category:Projects]]