Editing What does clustering tell us
Latest revision | Your text | ||
Line 129: | Line 129: | ||
|Total amount of time spent using apps classified as "very productive" | |Total amount of time spent using apps classified as "very productive" | ||
|} | |} | ||
− | The data was exported from the respective sources through the [[Open Humans]] integrations. A Jupyter notebook to export all this data in a unified spreadsheet | + | The data was exported from the respective sources through the [[Open Humans]] integrations. A Jupyter notebook to export all this data in a unified spreadsheet will be made available soon. |
=== Processing the data === | === Processing the data === | ||
Line 137: | Line 137: | ||
== Running a PCA == | == Running a PCA == | ||
− | With the full data table prepared for this time period, I ended up with 280 observations (aka days) that had full data for these 37 variables that I could use to run the PCA. For this I used the [[R]] package <code>FactoMineR</code> as it not only provides the basic functions for running the analysis, but also a wide set of visualization options | + | [[File:PCA test variable alignment.png|thumb|The variable distribution after the PCA, answering how the 37 different variables correlate with each other. Arrows pointing in the same direction positively correlate with each other. Arrows pointing in opposite directions are negatively correlated. Length of the arrows is a metric for how 'well' the variable is represented in the PCA. Colors are the result of kmeans clustering of variables.]] |
+ | With the full data table prepared for this time period, I ended up with 280 observations (i.e. days) that had full data for these 37 variables that I could use to run the PCA. For this I used the [[R]] package <code>FactoMineR</code> as it not only provides the basic functions for running the analysis, but also a wide set of visualization options. Roughly speaking, a PCA is a way to reduce the dimensionality of data by 'rotating' the data so that it can be represented in fewer dimensions, ideally no more than 2-3 as this would allow visualizing it in a human-readable space. In this case, we have 37 different dimensions as given by the 37 variables and would like to boil it down to fewer dimensions while losing as little information as possible. | ||
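To make the 'rotation' idea concrete, here is a minimal sketch in plain Python rather than the R/<code>FactoMineR</code> workflow used for the actual analysis. It assumes a made-up toy table of 6 days and 3 variables (the numbers are hypothetical, not the project's data), re-scales each variable, and finds the principal dimensions of the correlation matrix by power iteration with deflation. This is only one simple way to obtain them and not necessarily what <code>FactoMineR</code> does internally.

```python
import math

# Hypothetical toy data: 6 days x 3 variables
# (steps, active calories, hours of sleep). Not the project's real data.
days = [
    (4000, 200, 8.0),
    (8000, 420, 7.0),
    (12000, 650, 6.5),
    (6000, 300, 7.5),
    (10000, 500, 6.0),
    (3000, 180, 8.5),
]


def standardize(rows):
    """Center every column on 0 and scale it to unit variance."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    sds = [
        math.sqrt(sum((x - m) ** 2 for x in col) / (n - 1))
        for col, m in zip(columns, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def correlation_matrix(z):
    """Correlation matrix of already standardized rows."""
    n, p = len(z), len(z[0])
    return [
        [sum(row[i] * row[j] for row in z) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def leading_eigenpair(matrix, iterations=1000):
    """Approximate the largest eigenvalue and its vector by power iteration."""
    p = len(matrix)
    start = [float(i + 1) for i in range(p)]
    length = math.sqrt(sum(x * x for x in start))
    v = [x / length for x in start]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    value = sum(v[i] * matrix[i][j] * v[j] for i in range(p) for j in range(p))
    return value, v


def explained_variance_shares(rows):
    """Share of total variance carried by each principal dimension."""
    matrix = correlation_matrix(standardize(rows))
    p = len(matrix)
    eigenvalues = []
    for _ in range(p):
        value, vector = leading_eigenpair(matrix)
        eigenvalues.append(value)
        # Deflation: remove the dimension just found before searching again.
        matrix = [
            [matrix[i][j] - value * vector[i] * vector[j] for j in range(p)]
            for i in range(p)
        ]
    # After re-scaling, every variable contributes a variance of 1,
    # so the total variance equals the number of variables p.
    return [value / p for value in eigenvalues]


shares = explained_variance_shares(days)
for k, share in enumerate(shares, start=1):
    print(f"dimension {k}: {share:.1%} of variance")
```

Because the toy variables are strongly correlated, most of the variance ends up on the first dimension. With 37 loosely related variables, the first two dimensions will typically carry a much smaller share.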
− | ===How do the different metrics correlate?=== | + | === How do the different metrics correlate? === |
− | + | Running the PCA – including a normalization/re-scaling of the variables – results in the graph on the right. Doing an additional clustering by kmeans shows that there are three main groups in which the variables can be clustered: | |
− | #The '''top left quadrant''' mainly includes all metrics associated to productivity as measured by RescueTime (regardless of productivity/unproductivity category), as well as different metrics from Oura that relate to inactivity but also my cycling distance. | + | # The '''top left quadrant''' mainly includes all metrics associated with productivity as measured by RescueTime (regardless of productivity/unproductivity category), as well as different metrics from Oura that relate to inactivity, but also my cycling distance. | ||
− | #The '''bottom right quadrant''' includes mainly different sleep metrics from Oura but also associated metrics such as resting heart rate and average sleeping heart rate and furthermore my weight. | + | # The '''bottom right quadrant''' mainly includes different sleep metrics from Oura, along with related metrics such as resting heart rate and average sleeping heart rate, as well as my weight. | ||
− | #The '''top right quadrant''' includes metrics that have a 90º vector to both other clusters and mainly includes different metrics to medium & higher intensity activity. These include my overall step count as well as active calorie burn. | + | # The '''top right quadrant''' includes metrics whose arrows point at roughly 90º to both other clusters, mainly metrics related to medium & higher intensity activity. These include my overall step count as well as active calorie burn. | ||
− | The axis-labels also show how much of the overall variance in my data can be explained among these two dimensions that are being plotted, which comes down to 18.5% of variance on the X-axis (dimension 1) and 13.8% on the Y-axis (dimension 2) | + | The axis labels also show how much of the overall variance in my data is explained by the two dimensions being plotted: 18.5% of variance on the X-axis (dimension 1) and 13.8% on the Y-axis (dimension 2), or 32.3% together. | ||
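The grouping of variables into quadrants can be sketched as a k-means clustering over each variable's coordinates on the first two dimensions. The example below is plain Python with hypothetical variable names and coordinates (not the values from the actual plot), and it assumes that the clustering is done on those two coordinates, which the text above does not spell out. The farthest-point initialization is a choice made here to keep the toy example deterministic.

```python
import math

# Hypothetical (dimension 1, dimension 2) coordinates for a few variables.
# Names and values are illustrative only.
variable_coords = {
    "rescuetime_productive": (-0.6, 0.5),
    "rescuetime_distracting": (-0.5, 0.6),
    "cycling_distance": (-0.4, 0.4),
    "sleep_total": (0.6, -0.5),
    "resting_heart_rate": (0.5, -0.4),
    "weight": (0.4, -0.6),
    "steps": (0.5, 0.6),
    "active_calories": (0.6, 0.5),
}


def kmeans(points, k, iterations=100):
    """Plain k-means (Lloyd's algorithm) with farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest away from all centers chosen so far.
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    labels = None
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


names = list(variable_coords)
labels = kmeans([variable_coords[name] for name in names], k=3)
clusters = dict(zip(names, labels))
for name, label in clusters.items():
    print(f"cluster {label}: {name}")
```

The cluster numbers themselves are arbitrary; what matters is which variables end up sharing a label.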
− | + | {{Project Queries}} | |
− | + | [[Category:Projects]] | |
− | [[Category:Projects |