Editing What does clustering tell us
Latest revision | Your text | ||
Line 8: | Line 8: | ||
During this discussion, one suggestion was to see whether unsupervised clustering could help uncover which variables correlate with each other while also highlighting whether there are different types of days. A search of the Show & Tell archives showed that a similar approach had already been tried in the [[100 Days of Summer]] project. | During this discussion, one suggestion was to see whether unsupervised clustering could help uncover which variables correlate with each other while also highlighting whether there are different types of days. A search of the Show & Tell archives showed that a similar approach had already been tried in the [[100 Days of Summer]] project. | ||
− | == | + | == Reducing dimensions with a Principal Component Analysis == |
− | To give it a first try, I decided to go ahead and use some of my data to see if such a clustering could work | + | To give it a first try, I decided to go ahead and use some of my data to see if such a clustering could work. In order to limit the scope I decided to use data from a variety of sources. To simplify the approach, I decided to use the '''''day''''' as the unit of observation. For this, I either summed up or averaged measurements throughout the day, depending on the metric (see Table below). |
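The day-level aggregation described above can be sketched roughly as follows. This is only a minimal illustration using pandas: the column names (<code>steps</code>, <code>heart_rate</code>), the sample values, and the sum/mean assignment are made up for the example and are not taken from the actual export.

```python
import pandas as pd

# Hypothetical raw measurements: several readings per day for two made-up metrics.
raw = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            [
                "2021-09-01 08:00",
                "2021-09-01 18:30",
                "2021-09-02 09:15",
                "2021-09-02 21:00",
            ]
        ),
        "steps": [3000, 4500, 2000, 1000],
        "heart_rate": [60.0, 70.0, 64.0, 66.0],
    }
)

# Assumed per-metric rule: count-like metrics are summed, rate-like metrics are averaged.
aggregation = {"steps": "sum", "heart_rate": "mean"}

# One row per calendar day, which becomes the unit of observation.
daily = raw.set_index("timestamp").resample("D").agg(aggregation)
print(daily)
```

Keeping the rule in a single dictionary makes it easy to see, per metric, whether a daily total or a daily average is being used.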
=== Metrics used === | === Metrics used === | ||
Line 129: | Line 129: | ||
|Total amount of time spent using apps classified as "very productive" | |Total amount of time spent using apps classified as "very productive" | ||
|} | |} | ||
− | The data was exported from the respective sources through the [[Open Humans]] integrations. A Jupyter notebook to export all this data in a unified spreadsheet | + | The data was exported from the respective sources through the [[Open Humans]] integrations. A Jupyter notebook to export all this data in a unified spreadsheet will be made available soon. |
=== Processing the data === | === Processing the data === | ||
I exported data for all these variables for the period between September 1, 2021 and June 8, 2022, as this was the period for which I expected most of the data to be complete. Following the export of the data as one large spreadsheet, some more processing was needed. | I exported data for all these variables for the period between September 1, 2021 and June 8, 2022, as this was the period for which I expected most of the data to be complete. Following the export of the data as one large spreadsheet, some more processing was needed. | ||
− | Doing a | + | Doing a [[principal component analysis]] ideally requires a "complete" data set without any missing values. Depending on the metric, the spreadsheet generated above still had gaps in it. Some gaps were due to lack of measurements (e.g. a gap in the weight record represents me not weighing myself), while in other cases a gap means that the value should be zero (e.g. if I did not cycle at all, then Apple Health would report a data gap, but it actually represents zero kilometers cycled). |
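The two kinds of gaps can be treated differently before running the analysis. The sketch below is one possible way to do this, not necessarily the processing used for this project: the column names and values are hypothetical, dropping the remaining incomplete days is an assumption (imputation would be another option), and the principal components are computed with a plain NumPy SVD for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical daily table; column names and values are illustrative only.
daily = pd.DataFrame(
    {
        "cycling_km": [12.0, np.nan, 8.0, np.nan, 15.0],
        "weight_kg": [70.2, 70.0, np.nan, 69.8, 69.9],
        "steps": [8000, 6500, 9000, 7000, 10000],
    },
    index=pd.date_range("2021-09-01", periods=5, freq="D"),
)

# Gaps that really mean "nothing happened" (no cycling that day) become zeros.
zero_when_missing = ["cycling_km"]
daily[zero_when_missing] = daily[zero_when_missing].fillna(0.0)

# Assumption: days that still lack a true measurement (e.g. no weigh-in) are dropped.
complete = daily.dropna()

# Standardize each metric so that differing units do not dominate the components.
z = (complete - complete.mean()) / complete.std(ddof=0)

# PCA via singular value decomposition of the standardized, complete matrix.
_, singular_values, components = np.linalg.svd(z.to_numpy(), full_matrices=False)
explained_variance_ratio = singular_values**2 / np.sum(singular_values**2)
print(explained_variance_ratio)
```

Listing the "zero when missing" metrics explicitly keeps the distinction between "not measured" and "did not happen" visible in the processing step.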
− | + | {{Project Queries}} | |
− | + | [[Category:Projects]] | |
− | |||
− | [[Category:Projects |