Changes

am I done?
Line 1:
−
This is for DIYers with at least some knowledge of [[Excel]] or other spreadsheet software.
+
This page is mainly for DIYers with at least some knowledge of [[Excel]] or other spreadsheet software.
    
==== File Format ====
 
Comma Separated Values (CSV) is the easiest format for beginners to work with. Most analysis tools can read JSON and SQLite without problems. Avoid proprietary formats, which some software may be unable to open.
+
Comma Separated Values (CSV) is the easiest format for beginners to work with. Most analysis tools can read JSON and SQLite without problems. Avoid proprietary formats, which some software may be unable to open. If you are using [[R]], the readr package will automatically point out errors in a CSV file.
If you are using R, the readr package will automatically point out errors in a CSV file.
      
==== Structure ====
 
[[Time tracking]] and events. For example, consider tracking muscle spasms of the left arm.
+
Statistical analysis of self-tracking data is usually done on tabular data, such as spreadsheets, with rows representing individual observations.<ref>https://en.wikipedia.org/wiki/Relational_database</ref> In all but a few cases this structure is sufficient.
   −
* When. The time of the event is always included.
+
 
 +
[[Time tracking]] and events. For example, consider tracking muscle spasms of the left arm.
 +
 
 +
* What. Necessary if the variable description does not imply a single value.
 +
* When. The time of the event is always included.
 
* Duration. How long the event lasted.
 
* Strength. How strong the event was. See [[Self assessment]].
* Notes. A written description of anything unusual about the event, like [[Tools for journaling, thoughts and note taking|journaling and noting]] but with a specific connection to this event that is easier for an analysis tool to find.
+
* Notes. A written description of anything unusual about the event, like [[Tools for journaling, thoughts and note taking|journaling and noting]] but with a specific connection to this event that is easier for an analysis tool to use.
* Additional / advanced. For example, a true or false value for whether the spasms hurt. Some of these can be tests; others can be really complicated, as with [[Diet tracking tools|Diet tracking apps]].
+
* Additional / advanced. For example, a true or false value for whether the spasms hurt. Some of these can be tests; others can be really complicated, as with [[Diet tracking tools|Diet tracking apps]]. Sometimes this type of data cannot easily be represented in tabular form.
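
The event fields listed above can be sketched as one CSV row per event. This is only a minimal illustration in Python using the standard library; the column names (`when`, `what`, `duration_s`, `strength`, `notes`, `hurt`) and the sample values are assumptions made up for this example, not a required schema.

```python
import csv
import io
from datetime import datetime

# Hypothetical event log: one row per muscle spasm (made-up values).
EVENTS_CSV = """when,what,duration_s,strength,notes,hurt
2023-05-01T08:15:00,left arm spasm,12,3,,false
2023-05-01T21:40:00,left arm spasm,45,7,after long typing session,true
"""

def load_events(text):
    """Parse the CSV text into a list of dicts with typed fields."""
    events = []
    for row in csv.DictReader(io.StringIO(text)):
        events.append({
            "when": datetime.fromisoformat(row["when"]),
            "what": row["what"],
            "duration_s": int(row["duration_s"]),
            "strength": int(row["strength"]),
            "notes": row["notes"],          # empty string when nothing unusual
            "hurt": row["hurt"] == "true",  # the "additional" true/false field
        })
    return events

events = load_events(EVENTS_CSV)
```

Keeping every event on its own row with the same columns is what lets spreadsheet software and analysis tools read the file without extra work.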
      −
States that are written once and apply until changed to something else. For example, place of residence or whether a brace is being worn. Technically this is almost the same as the 'event' structure above but 'state' describes something very different.
+
States that are written once and apply until changed to something else. For example, place of residence or whether a brace is being worn continuously. This structure is similar to a simple "event" with just a when-time and a what, though the duration is calculated from the time the state is replaced by the next one.
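
Calculating a state's duration from its replacement could look roughly like the following Python sketch. The sample rows, the function name `state_durations`, and the choice to close the last state at an explicit `end` time are all assumptions for illustration.

```python
from datetime import datetime

# Hypothetical state log: (when, what) rows, sorted by time (made-up values).
states = [
    (datetime(2023, 1, 1), "brace off"),
    (datetime(2023, 1, 10), "brace on"),
    (datetime(2023, 1, 24), "brace off"),
]

def state_durations(rows, end):
    """Return (what, start, duration) for each state.

    Each state lasts until the next row replaces it; the last state
    is assumed to last until `end` (for example, the time of analysis).
    """
    result = []
    for i, (start, what) in enumerate(rows):
        stop = rows[i + 1][0] if i + 1 < len(rows) else end
        result.append((what, start, stop - start))
    return result

durations = state_durations(states, end=datetime(2023, 2, 1))
```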
      −
Continuous sensor sampling will produce a time series with regular intervals. These are made automatically by physical wearable sensors. Something similar is produced by [[Tools to survey symptoms and states|Tools to survey symptoms]] though the frequency is irregular and far greater.  
+
Continuous sensors, usually wearables, produce a time series with regular intervals. Each sample consists of a when and a strength. Something similar is produced by [[Tools to survey symptoms and states|Tools to survey symptoms]], though the frequency is irregular and far lower.
      −
[[Tools for journaling, thoughts and note taking|Journal]] entries. Just written texts describing the day.  
+
[[Tools for journaling, thoughts and note taking|Journal]] entries and notes. Often journal entries are written texts describing the day.  
    
==== Variables ====
 
Some common characteristics of sources of data you should write before analyzing.
+
Some common characteristics of data sources that you should record before analyzing.
   −
* Independence. Is the variable affected by other variables you are measuring, or is it completely dependent on outside factors like weather?
+
* Independence. Is the variable affected by other variables you are measuring, or is it almost completely dependent on outside factors like the weather?
* A variable that depends on previous values of this same variable is not independent; it is called autocorrelated and may be non-stationary. For example, skill at playing the guitar.
+
* A variable that depends on previous values of this same variable is not independent; it is called autocorrelated and may be non-stationary. For example, skill at playing the guitar.
 
* Randomness of Missingness. Similar to independence, but it concerns not the value of the variable but whether other measured variables could correlate with a higher incidence of missing values. For example, forgetting to charge the smart band because of tiredness and spending a night without it on.
 
* Target. Level. Is this variable something you want to improve, a variable likely to affect those, or just an intermediary background variable measured because it was easy and provided context?
 
* Similarity. Proxy. Is this variable measuring something very similar to what another variable is measuring? The most common example is [[Tools for heart rate or pulse|heart rate]], as many wearables measure it and the avid self-tracker always has a few.
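
One rough way to check whether a variable depends on its own previous values is the lag-1 autocorrelation. The Python sketch below is an illustration only: the function name and both sample series are made up, and it assumes the values are evenly spaced in time and not all identical.

```python
def lag1_autocorrelation(values):
    """Correlation between each value and the one just before it.

    Values near +1 or -1 suggest strong dependence on the previous value;
    values near 0 suggest little dependence. Assumes non-constant input.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values)
    cov = sum((values[i] - mean) * (values[i - 1] - mean) for i in range(1, n))
    return cov / var

# Made-up series: a steadily improving skill score and an alternating signal.
skill_scores = [1, 2, 3, 4, 5, 6, 7, 8]
alternating = [1, -1, 1, -1, 1, -1, 1, -1]

r_skill = lag1_autocorrelation(skill_scores)
r_alternating = lag1_autocorrelation(alternating)
```

A steadily trending series such as the skill scores gives a clearly positive value, which is a hint that ordinary statistics assuming independent observations may mislead.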
 +
 +
==== Data Cleaning ====
 +
Soon after first use, check that the device or app produces correct data. Correct, remove or impute outliers (very extreme values) produced by errors rather than real events. In the rare case that the data is raw sensor output, like [[Accelerometry]], aggregate it into something more manageable. Consumer wearables produce aggregates such as "steps per 10 minutes", for which an open source resource is likely available. Finally, compare against other data to remove errors like exercise recorded in the middle of sleep.
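
Outlier removal and aggregation into 10-minute bins can be sketched as below. This is a simplified Python illustration, not the method of any particular device: the sample data, the function name and the limit of 300 steps per minute are assumptions chosen for the example.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical per-minute step counts (timestamp, steps); made-up values.
samples = [
    (datetime(2023, 5, 1, 9, 1), 80),
    (datetime(2023, 5, 1, 9, 4), 95),
    (datetime(2023, 5, 1, 9, 12), 60),
    (datetime(2023, 5, 1, 9, 13), 5000),  # implausible: likely a sensor error
    (datetime(2023, 5, 1, 9, 27), 40),
]

MAX_STEPS_PER_MINUTE = 300  # assumed plausibility limit, adjust as needed

def steps_per_10_minutes(rows, limit=MAX_STEPS_PER_MINUTE):
    """Drop samples above `limit`, then sum steps into 10-minute bins."""
    bins = defaultdict(int)
    for when, steps in rows:
        if steps > limit:
            continue  # remove the outlier rather than impute it
        bin_start = when.replace(minute=when.minute - when.minute % 10,
                                 second=0, microsecond=0)
        bins[bin_start] += steps
    return dict(sorted(bins.items()))

binned = steps_per_10_minutes(samples)
```

Here the outlier is simply dropped; correcting or imputing it instead would need a rule suited to the specific sensor.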
    
== References ==
 