Data format structure variables

From Personal Science Wiki

This is for DIYers with at least some knowledge of Excel or other spreadsheet software.

File Format

Comma-separated values (CSV) is the easiest format for beginners to work with. Most analysis tools can also read JSON and SQLite without problems. Avoid proprietary formats, which some software may be unable to open. If you are using R, the readr package will automatically point out parsing errors in a CSV file.
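For readers who do not use R, a rough equivalent of that kind of check can be sketched in Python with the standard csv module. This is a minimal sketch, not what readr does internally: the function name find_ragged_rows, the column names, and the sample data are all made up for illustration, and it only catches rows whose number of fields differs from the header.

```python
import csv
import io

def find_ragged_rows(csv_text):
    """Return (line_number, field_count) for rows whose width differs from the header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    problems = []
    for row in reader:
        if len(row) != len(header):
            # reader.line_num is the number of source lines read so far
            problems.append((reader.line_num, len(row)))
    return problems

# Hypothetical sample: the last row is missing its "strength" field.
sample = (
    "when,duration_s,strength\n"
    "2024-03-01T08:15,40,3\n"
    "2024-03-01T13:02,25\n"
)

print(find_ragged_rows(sample))  # prints [(3, 2)]
```

A check like this is worth running before any analysis, because a single short row can silently shift values into the wrong column.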

Structure

Time tracking and events. For example, consider tracking muscle spasms in the left arm.

  • When. The time of the event is always included.
  • Duration. How long the event lasted.
  • Strength. How strong the event was. See Self assessment.
  • Notes. A written description of anything unusual about the event. This is like journaling and noting, but tied to a specific event so that it is easier for an analysis tool to find.
  • Additional / advanced. For example, a true or false value for whether the spasms hurt. Some of these can be tests; others can be really complicated, as with Diet tracking apps.
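The event fields listed above map naturally onto one CSV row per event. The sketch below is one possible layout, assuming hypothetical column names (when, duration_s, strength, notes, hurt), an ISO 8601 timestamp, and a numeric strength rating; the two sample events are invented for illustration.

```python
import csv
import io
from datetime import datetime

# Hypothetical column names, one per field of the event structure.
FIELDS = ["when", "duration_s", "strength", "notes", "hurt"]

# Invented example events for left-arm muscle spasms.
events = [
    {"when": "2024-03-01T08:15:00", "duration_s": 40, "strength": 3,
     "notes": "after carrying groceries", "hurt": True},
    {"when": "2024-03-01T13:02:00", "duration_s": 25, "strength": 2,
     "notes": "", "hurt": False},
]

def events_to_csv(rows):
    """Serialize event dictionaries to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

def events_from_csv(text):
    """Parse CSV text back into typed event dictionaries."""
    parsed = []
    for row in csv.DictReader(io.StringIO(text)):
        parsed.append({
            "when": datetime.fromisoformat(row["when"]),
            "duration_s": int(row["duration_s"]),
            "strength": int(row["strength"]),
            "notes": row["notes"],
            "hurt": row["hurt"] == "True",  # booleans are stored as text
        })
    return parsed

csv_text = events_to_csv(events)
loaded = events_from_csv(csv_text)
print(csv_text)
print(loaded[0]["when"], loaded[0]["hurt"])
```

Keeping the notes in their own column, next to the timestamp, is what makes them easy for an analysis tool to find later.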


States are written once and apply until changed to something else. For example, place of residence or whether a brace is being worn. Technically this is almost the same as the 'event' structure above, but a 'state' describes something very different: a condition that holds over a span of time rather than a single occurrence.
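Because a state is only recorded when it changes, finding the state that applied at a given moment means looking up the most recent change at or before that moment. A minimal sketch, assuming the changes are stored as time-sorted (timestamp, value) pairs; the helper name state_at and the brace log are hypothetical:

```python
from bisect import bisect_right
from datetime import datetime

# Invented state log: each entry applies until the next one replaces it.
state_changes = [
    (datetime(2024, 3, 1, 7, 0), "brace on"),
    (datetime(2024, 3, 1, 12, 30), "brace off"),
    (datetime(2024, 3, 2, 7, 15), "brace on"),
]

def state_at(changes, moment):
    """Return the state in effect at `moment`, or None if nothing was recorded yet.

    Assumes `changes` is sorted by timestamp.
    """
    times = [changed_at for changed_at, _ in changes]
    index = bisect_right(times, moment) - 1  # last change at or before `moment`
    if index < 0:
        return None
    return changes[index][1]

print(state_at(state_changes, datetime(2024, 3, 1, 8, 15)))   # brace on
print(state_at(state_changes, datetime(2024, 3, 1, 13, 2)))   # brace off
print(state_at(state_changes, datetime(2024, 2, 28, 9, 0)))   # None
```

This kind of lookup is how a state log can be joined onto an event log, for example to label each spasm with whether the brace was on at the time.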


Continuous sensor sampling produces a time series with regular intervals. These are made automatically by physical wearable sensors. Something similar is produced by Tools to survey symptoms, though their samples arrive at irregular and far longer intervals.
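One way to tell the two kinds of series apart is to look at the gaps between consecutive timestamps. The sketch below assumes a one-second tolerance for calling a series regular, which is an arbitrary choice; the function names and both sample series are hypothetical.

```python
from datetime import datetime, timedelta

def sampling_gaps(timestamps):
    """Return the time differences between consecutive samples, in time order."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]

def is_regular(timestamps, tolerance=timedelta(seconds=1)):
    """True if the longest and shortest gaps differ by no more than `tolerance`."""
    gaps = sampling_gaps(timestamps)
    if not gaps:
        return True
    return max(gaps) - min(gaps) <= tolerance

start = datetime(2024, 3, 1, 8, 0)

# Invented wearable data: one sample per minute.
sensor_times = [start + timedelta(minutes=i) for i in range(5)]

# Invented survey answers: a few per day at uneven times.
survey_times = [
    start,
    start + timedelta(hours=5, minutes=10),
    start + timedelta(hours=13),
    start + timedelta(days=1, hours=2),
]

print(is_regular(sensor_times))   # True
print(is_regular(survey_times))   # False
```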


Journal entries. Just written texts describing the day.

Variables

Some common characteristics of your data sources that you should write down before analyzing.

  • Independence. Is the variable affected by other variables you are measuring, or is it completely dependent on outside factors like the weather?
  • A variable that depends on previous values of the same variable is not independent. It is called autocorrelated, and it may also be non-stationary. For example, skill at playing the guitar.
  • Randomness of Missingness. Similar to independence, but it is not about the value of the variable. It is about whether other measured variables could correlate with a higher incidence of missing values. For example, forgetting to charge the smart band because of tiredness and spending a night without it on.
  • Target / Level. Is this variable something you want to improve, a variable likely to affect the ones you want to improve, or just an intermediary background variable measured because it was easy and provided context?
  • Similarity / Proxy. Is this variable measuring something very similar to what another variable is measuring? The most common example is heart rate, as many wearables measure it and the avid self tracker always has a few.
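Two of the characteristics above can be checked roughly with a few lines of code. The sketch below computes a lag-1 autocorrelation (how strongly each value follows the previous one) and the share of missing values split by another variable. It uses the common sample autocorrelation estimate and is only a first look, not a formal test; the function names and all data are invented for illustration.

```python
from statistics import mean

def lag1_autocorrelation(values):
    """Sample autocorrelation at lag 1; values near 0 suggest little carry-over."""
    centre = mean(values)
    denominator = sum((v - centre) ** 2 for v in values)
    if denominator == 0:
        return 0.0  # a constant series carries no information here
    numerator = sum((a - centre) * (b - centre)
                    for a, b in zip(values, values[1:]))
    return numerator / denominator

def missing_rate_by_group(records, value_key, group_key):
    """Fraction of records with a missing (None) value, per value of `group_key`."""
    totals, missing = {}, {}
    for record in records:
        group = record[group_key]
        totals[group] = totals.get(group, 0) + 1
        if record[value_key] is None:
            missing[group] = missing.get(group, 0) + 1
    return {group: missing.get(group, 0) / totals[group] for group in totals}

# Invented skill scores that build on earlier practice.
guitar_skill = [1, 2, 3, 4, 5, 6, 7, 8]
print(lag1_autocorrelation(guitar_skill))  # 0.625

# Invented nights: the smart band was not worn on one tired night.
nights = [
    {"tired": True, "resting_hr": None},
    {"tired": True, "resting_hr": 52},
    {"tired": False, "resting_hr": 55},
    {"tired": False, "resting_hr": 54},
]
print(missing_rate_by_group(nights, "resting_hr", "tired"))
```

If the missing rate differs clearly between groups, the gaps are probably not random, and dropping the incomplete rows could bias the analysis.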
