Data format structure variables

{{Topic Infobox}}
This page is mainly for DIYers with at least some knowledge of [[Excel]] or other spreadsheet or statistical analysis software. The previous steps would be choosing a [[:Category:Tools|Tool]] or an [[Aggregators|Aggregator]]; some tools give data straight to the user. This page covers the steps after the data has been received.
 
  
 
==== File Format ====
 
  
 
==== Structure ====
 
Statistical analysis of self-tracking data is usually done on tabular data, such as spreadsheets, with each row representing an individual observation.<ref>https://r4ds.had.co.nz/tidy-data.html</ref><ref>https://en.wikipedia.org/wiki/Relational_database</ref> In all but a few cases this structure is sufficient. See also [[Dates and Times]].
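
A minimal sketch in Python (standard library only) of what "one observation per row" means in practice. It assumes a hypothetical wide export with one row per day and one column per measurement, and turns it into long rows of (date, variable, value). The function name `wide_to_long` and the column names `date`, `steps` and `sleep_hours` are made up for illustration.

```python
import csv
import io


def wide_to_long(csv_text, id_column="date"):
    """Turn a wide table (one column per variable) into long rows.

    Each output row is one observation: (date, variable, value).
    Empty cells are skipped instead of being recorded as observations.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for record in reader:
        for column, value in record.items():
            if column == id_column or value == "":
                continue
            rows.append((record[id_column], column, float(value)))
    return rows


# Hypothetical wide export: one row per day, one column per measurement.
wide = """date,steps,sleep_hours
2022-05-01,8123,7.5
2022-05-02,,6.0
"""

for row in wide_to_long(wide):
    print(row)
```

Skipping empty cells keeps a missing measurement from turning into a fake zero; whether missing values should instead be kept as explicit rows depends on the later analysis.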
  
  
States that are written once and apply until changed to something else, for example place of residence or whether a brace is being worn continuously. This structure is similar to a simple "event" with just a 'when-time' and a 'what', except that the 'duration' is not recorded but calculated from when the state changes to something new.
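
As a sketch of how duration is derived for states, the hypothetical function `state_durations` below takes state changes recorded as (timestamp, state) pairs. It assumes each state lasts until the next change and that the last state runs until a chosen end time, such as the moment of analysis.

```python
from datetime import datetime


def state_durations(changes, end):
    """Compute how long each recorded state lasted.

    `changes` holds (timestamp, state) pairs written only when the state
    changes. Each state lasts until the next change; the last one lasts
    until `end`.
    """
    changes = sorted(changes)
    result = []
    for i, (start, state) in enumerate(changes):
        stop = changes[i + 1][0] if i + 1 < len(changes) else end
        result.append((state, start, stop - start))
    return result


# Hypothetical brace log: only the moments of change are recorded.
changes = [
    (datetime(2022, 1, 1), "brace on"),
    (datetime(2022, 1, 15), "brace off"),
]

for state, start, duration in state_durations(changes, end=datetime(2022, 2, 1)):
    print(state, start.date(), duration.days, "days")
```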
  
  
[[Tools for journaling, thoughts and note taking|Journal]] entries and notes.  Often journal entries are written texts describing the day.
  
 
==== Variables ====
 
 
* A variable that depends on previous values of the same variable is not independent; it is called autocorrelated and, if its level drifts over time, non-stationary. An example is skill at playing the guitar.
 
 
* Randomness of missingness. Similar to independence, but about whether values are missing rather than about the values themselves: could other measured variables correlate with a higher incidence of missing values? For example, forgetting to charge the smart band because of tiredness and then going a night without wearing it.
 
* Target. Level. Is this variable something you want to improve, a variable likely to affect one, or just an intermediary background variable measured because it was easy and provided context? If it is a target variable, note the purpose of tracking it, such as [[Life extension]], a recommendation from your doctor based on [[Lab tests]], or improving performance in [[Sports]].
 
* Similarity. Proxy. Does this variable measure something very similar to what another variable measures? The most common example is [[Tools for heart rate or pulse|heart rate]], since many wearables measure it and the avid self-tracker usually owns several.
 
* Sign. Positivity. If the variable is a target, are higher values better, or lower ones? Sometimes a middle value is best, as with BMI.
 
* Scale and fact of [[Self assessment]]. Is the variable anchored to an objective standard, subjective, or even relative to the previous measurement? Also note whether it is a self-assessment.
 
* Is this target variable a measure of a problem (like pain), of an accomplishment (like playing guitar better), or of both (like a scale of cleverness in conversation)?
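
One way to check the independence point in the list above is the lag-1 autocorrelation: how strongly each value correlates with the one before it. This is a minimal sketch of the usual sample estimator, assuming an evenly spaced series (for example one value per day or week, without gaps); the function name and the guitar ratings are hypothetical.

```python
from statistics import mean


def lag1_autocorrelation(values):
    """Estimate the lag-1 autocorrelation of an evenly spaced series.

    Near 0: each value says little about the next one.
    Near 1: each value is close to the previous one, as with a slow trend.
    """
    m = mean(values)
    dev = [v - m for v in values]
    den = sum(d * d for d in dev)
    if den == 0:
        raise ValueError("constant series: autocorrelation is undefined")
    num = sum(a * b for a, b in zip(dev[:-1], dev[1:]))
    return num / den


# Hypothetical weekly guitar-skill ratings that improve steadily.
skill = [1, 2, 3, 4, 5, 6, 7, 8]
print(round(lag1_autocorrelation(skill), 3))  # 0.625
```

A steady trend like this one already gives a clearly positive value, which is why a variable such as guitar skill should not be treated as a set of independent daily samples.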
 
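For the randomness of missingness, a simple first look is to compare how often a variable is missing across levels of another variable. The sketch below assumes records stored as dictionaries with `None` for a missing value; the function name, the field names `tired` and `sleep_hours`, and the six nights are hypothetical.

```python
def missing_rate_by_group(records, target, group_by):
    """Share of records where `target` is missing, split by another variable.

    A large difference between groups hints that missingness is not random.
    """
    counts = {}
    for r in records:
        key = r[group_by]
        missing, total = counts.get(key, (0, 0))
        counts[key] = (missing + (r[target] is None), total + 1)
    return {k: m / t for k, (m, t) in counts.items()}


# Hypothetical nights: was I tired, and did the smart band record sleep?
nights = [
    {"tired": True, "sleep_hours": None},
    {"tired": True, "sleep_hours": None},
    {"tired": True, "sleep_hours": 6.5},
    {"tired": False, "sleep_hours": 7.0},
    {"tired": False, "sleep_hours": 7.5},
    {"tired": False, "sleep_hours": None},
]

rates = missing_rate_by_group(nights, "sleep_hours", "tired")
print({k: round(v, 2) for k, v in rates.items()})
```

With real data, a handful of nights is too few to conclude anything; the point is only to show which comparison to make.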
  
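The properties in the list above can be written down per variable as a small data dictionary. The sketch below is one hypothetical way to do it: the `VariableInfo` class and its fields are assumptions, not an existing format. Its `goodness` method turns the sign into something usable by rescaling values so that larger always means better, including the case where a middle value is best.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VariableInfo:
    """Hypothetical data-dictionary entry describing one tracked variable."""
    name: str
    is_target: bool
    higher_is_better: Optional[bool] = None  # None when a middle value is best
    best_value: Optional[float] = None       # used only when a middle value is best
    self_assessed: bool = False
    proxy_for: Optional[str] = None          # name of a similar variable, if any

    def goodness(self, value: float) -> float:
        """Rescale a value so that larger always means better."""
        if self.higher_is_better is True:
            return value
        if self.higher_is_better is False:
            return -value
        if self.best_value is None:
            raise ValueError(f"{self.name}: no direction or best value given")
        # Middle value is best: distance from the optimum, negated.
        return -abs(value - self.best_value)


pain = VariableInfo("pain", is_target=True, higher_is_better=False, self_assessed=True)
bmi = VariableInfo("bmi", is_target=True, best_value=22.0)  # illustrative optimum only

print(pain.goodness(3), bmi.goodness(20), bmi.goodness(25))
```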
 
==== Data Cleaning ====
 
Check soon after first use whether the device or app produces correct data. Correct, remove or impute outliers (very extreme values) that are produced by errors rather than by real events. In the rare case that the data is raw sensor output, as with [[Accelerometry]], aggregate it into something more manageable; consumer wearables produce aggregates such as "steps per 10 minutes", and an open-source script for this is likely available. Finally, compare against other data to remove errors such as exercise recorded in the middle of sleep. I have not seen a script for this yet. [[User:DG|DG]] ([[User talk:DG|talk]]) 02:34, 29 May 2022 (UTC)
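
A minimal sketch for finding outliers, assuming a plain list of numbers. It swaps in one common robust rule, the modified z-score based on the median absolute deviation (MAD) with the conventional cut-off of 3.5; other rules are equally valid. It only flags values; whether to correct, remove or impute them is still a judgment call. The function name and the heart-rate values are hypothetical.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds `threshold`.

    Uses the median absolute deviation (MAD), which, unlike the standard
    deviation, is not itself pulled around by the outliers being searched for.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Most values are identical: this rule cannot decide, so flag nothing.
        return []
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical resting heart rates with one sensor glitch.
resting_hr = [58, 61, 59, 60, 62, 57, 240, 60]
print(flag_outliers(resting_hr))  # [6]
```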
  
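Aggregating into fixed time bins can be sketched as follows. This assumes the hard part (detecting individual steps in raw accelerometer signals) has already been done and that the input is a list of event timestamps; the function name `count_per_bin` is hypothetical. Bins are aligned to midnight, which works cleanly for widths that divide a day evenly, such as 10 minutes.

```python
from collections import Counter
from datetime import datetime, timedelta


def count_per_bin(timestamps, minutes=10):
    """Count events (e.g. detected steps) in fixed-width time bins.

    Each timestamp is floored to the start of its bin, counted from
    midnight; bins with no events are absent from the result.
    """
    width = timedelta(minutes=minutes)
    counts = Counter()
    for t in timestamps:
        day_start = t.replace(hour=0, minute=0, second=0, microsecond=0)
        bin_start = day_start + ((t - day_start) // width) * width
        counts[bin_start] += 1
    return dict(sorted(counts.items()))


# Hypothetical step events.
steps = [
    datetime(2022, 5, 1, 8, 1),
    datetime(2022, 5, 1, 8, 5, 30),
    datetime(2022, 5, 1, 8, 12),
]

for start, n in count_per_bin(steps).items():
    print(start.strftime("%H:%M"), n)
```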
 
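Comparing against other data, such as exercise recorded during sleep, comes down to finding overlapping time intervals. This is only a minimal sketch, not an existing script: it assumes both sources are already lists of (start, end) datetime pairs, and the function name `overlapping` and the example intervals are hypothetical. It compares every pair, which is fine for personal-scale data.

```python
from datetime import datetime


def overlapping(intervals_a, intervals_b):
    """Return pairs of (start, end) intervals from a and b that overlap in time."""
    return [(a, b) for a in intervals_a for b in intervals_b
            if a[0] < b[1] and b[0] < a[1]]


# Hypothetical sleep and exercise logs from two different apps.
sleep = [(datetime(2022, 5, 1, 23, 0), datetime(2022, 5, 2, 7, 0))]
exercise = [
    (datetime(2022, 5, 2, 3, 10), datetime(2022, 5, 2, 3, 40)),  # suspicious
    (datetime(2022, 5, 2, 18, 0), datetime(2022, 5, 2, 19, 0)),
]

for s, e in overlapping(sleep, exercise):
    print("exercise", e[0], "-", e[1], "overlaps sleep", s[0], "-", s[1])
```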
== References ==
 
 
<references />
 
 
{{Topic Queries}}
[[Category:Data analysis]]
[[Category:Topics]]
