What does clustering tell us

From Personal Science Wiki
Revision as of 08:23, 10 June 2022 by Gedankenstuecke (talk | contribs) (start page creation, still in draft stage)
Project Infobox
Self researcher(s): User:Gedankenstuecke
Related tools: Oura Ring, Fitbit, Apple Watch, RescueTime
Related topics: Sleep, Activity, Weight tracking, Productivity tracking
Builds on project(s): 100 Days of Summer
Has inspired projects (0)

"What does clustering tell us" is an ongoing personal science project that grew out of discussions during the weekly self-research chats. The goal is to understand whether unsupervised clustering of a large number of metrics can improve our understanding of how those metrics relate to each other, and whether it reveals interesting clusters of days that resemble each other.

Background

The idea for this project came up in the spring of 2022 while discussing potential projects for the annual Keating Memorial, when some participants in the self-research chats brainstormed whether and how one could learn more from a large set of data spanning a number of topics and metrics.

One suggestion from this discussion was to see whether unsupervised clustering could help uncover which variables correlate with each other, while also highlighting whether there are different types of days. A search in the Show & Tell archives showed that a similar approach had already been tried in the 100 Days of Summer project.

Reducing dimensions with a Principal Component Analysis

As a first try, I used some of my own data to see whether such a clustering could work. To limit the scope, I decided to use data from a variety of sources. To simplify the approach, I used the day as the unit of observation: for each metric, I either summed or averaged the measurements over the day, depending on the metric (see the table below).
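
The day-level aggregation described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code used in the project: the `readings` sample, the metric names `steps` and `weight_kg`, the `AGGREGATORS` mapping and the `aggregate_by_day` helper are all hypothetical, and which metric is summed versus averaged is an assumption made for the example.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical raw readings: (ISO timestamp, metric name, value).
readings = [
    ("2022-06-01T08:00:00", "steps", 1200),
    ("2022-06-01T18:30:00", "steps", 3400),
    ("2022-06-01T07:00:00", "weight_kg", 80.0),
    ("2022-06-01T21:00:00", "weight_kg", 81.0),
    ("2022-06-02T09:15:00", "steps", 5000),
]

# Assumed rule per metric: counts are summed, point measurements are averaged.
AGGREGATORS = {"steps": sum, "weight_kg": mean}


def aggregate_by_day(readings, aggregators):
    """Collapse raw readings into one value per (day, metric)."""
    grouped = defaultdict(list)
    for timestamp, metric, value in readings:
        day = datetime.fromisoformat(timestamp).date().isoformat()
        grouped[(day, metric)].append(value)

    daily = defaultdict(dict)
    for (day, metric), values in grouped.items():
        daily[day][metric] = aggregators[metric](values)
    return dict(daily)


daily = aggregate_by_day(readings, AGGREGATORS)
```

With the sample data, `daily["2022-06-01"]` holds a summed step count and an averaged weight, while `daily["2022-06-02"]` has no weight entry at all, which is the kind of gap that the processing step has to deal with.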

Metrics used

In total, 37 different metrics are to be used for this work.

All variables (so far) used for this project
| Data source | Metric type | Variable | Details |
|---|---|---|---|
| Oura Ring | Activity | Daily movement | activity measured in "walking distance equivalent", single value per day |
| Oura Ring | Activity | Steps | Number of steps taken, single value per day |
| Oura Ring | Activity | Total calories | single value per day |
| Oura Ring | Activity | Active calories | single value per day |
| Oura Ring | Activity | Average MET | Average metabolic equivalent of task, single value per day |
| Oura Ring | Activity | inactive MET minutes | single value per day |
| Oura Ring | Activity | low activity MET minutes | single value per day |
| Oura Ring | Activity | medium activity MET minutes | single value per day |
| Oura Ring | Activity | high activity MET minutes | single value per day |
| Oura Ring | Activity | inactive time | minutes spent not being active, single value per day |
| Oura Ring | Activity | low activity time | minutes spent with low activity (low MET level) |
| Oura Ring | Activity | medium activity time | minutes spent with medium intensity activity |
| Oura Ring | Activity | high activity time | minutes spent with high intensity activity |
| Oura Ring | Sleep | Total time in bed | Includes time being awake, single value per day given in seconds |
| Oura Ring | Sleep | Total sleep time | Total time asleep (excludes delay to falling asleep and awake time during night) |
| Oura Ring | Sleep | Time awake | Time awake during the night |
| Oura Ring | Sleep | REM sleep | seconds spent in REM sleep |
| Oura Ring | Sleep | Deep sleep | seconds spent in deep sleep |
| Oura Ring | Sleep | light sleep | seconds spent in light sleep |
| Oura Ring | Sleep | Time restless | Time spent moving during sleep |
| Oura Ring | Sleep | Sleep latency | Time difference between going to bed & falling asleep |
| Oura Ring | "Recovery" / Rest | Resting heart rate | lowest nightly heart rate |
| Oura Ring | "Recovery" / Rest | Average sleep heart rate | average heart rate measured during sleep |
| Oura Ring | "Recovery" / Rest | Heart Rate Variability | highest HRV measured during sleep |
| Oura Ring | "Recovery" / Rest | Body temperature delta | Nightly body temperature compared to long-term baseline |
| Apple Watch | Activity | Cycling distance | Total distance cycled during a day (given in km) |
| Apple Watch | Activity | Walking + Running distance | Total distance walked / run during a day (given in km) |
| Fitbit | Body | Weight | Daily weight in kilograms (averaged if more than one measurement per day) |
| RescueTime | Productivity | Very distracting time | Total amount of time spent using apps classified as "very distracting" |
| RescueTime | Productivity | Distracting time | Total amount of time spent using apps classified as "distracting" |
| RescueTime | Productivity | Neutral time | Total amount of time spent using apps classified as "neutral" |
| RescueTime | Productivity | Productive time | Total amount of time spent using apps classified as "productive" |
| RescueTime | Productivity | Very productive time | Total amount of time spent using apps classified as "very productive" |

The data was exported from the respective sources through the Open Humans integrations. A Jupyter notebook that exports all this data into a unified spreadsheet will be made available soon.

Processing the data

I exported data for all these variables for the period between September 1, 2021 and June 8, 2022, as this was the period for which I expected most of the data to be complete. After exporting the data as one large spreadsheet, some further processing was needed.

Doing a principal component analysis ideally requires a "complete" data set without any missing values. Depending on the metric, the spreadsheet generated above still had gaps in it. Some gaps were due to lack of measurements (e.g. a gap in the weight record represents me not weighing myself), while in other cases a gap means that the value should be zero (e.g. if I did not cycle at all, then Apple Health would report a data gap, but it actually represents zero kilometers cycled).
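
The distinction between the two kinds of gaps can be sketched as below. This is a minimal illustration under stated assumptions, not the project's actual processing code: the `days` sample, the column names `cycling_km` and `weight_kg`, the `ZERO_WHEN_MISSING` set and both helper functions are hypothetical. Dropping days that still have gaps is only one simple way to arrive at a "complete" data set; the page does not state how the remaining gaps were handled.

```python
# Hypothetical daily table: one dict per day, None marks a gap in the export.
days = [
    {"date": "2022-06-01", "cycling_km": 12.5, "weight_kg": 80.5},
    {"date": "2022-06-02", "cycling_km": None, "weight_kg": 80.2},
    {"date": "2022-06-03", "cycling_km": None, "weight_kg": None},
]

# Assumption: in these columns a gap means "nothing happened", i.e. zero.
ZERO_WHEN_MISSING = {"cycling_km"}


def fill_true_zeros(rows, zero_columns):
    """Replace gaps with 0.0 only in columns where a gap means zero."""
    filled = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows stay untouched
        for column in zero_columns:
            if new_row.get(column) is None:
                new_row[column] = 0.0
        filled.append(new_row)
    return filled


def complete_rows(rows):
    """Keep only the days that have a value in every column."""
    return [row for row in rows if all(v is not None for v in row.values())]


filled = fill_true_zeros(days, ZERO_WHEN_MISSING)
complete = complete_rows(filled)
```

In this sample, the missing cycling distances become 0.0, while the missing weight on 2022-06-03 stays a genuine gap, so that day is excluded from `complete`.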

Linked content on this wiki


Projects that build on this project
We talked about this project in the following meetings  
2022-06-09 Self-Research Chat