Finding relations between variables in time series

 
Most personal science projects require finding relationships between variables that are time series<ref>Core-Guide_Longitudinal-Data-Analysis_10-05-17.pdf (duke.edu)</ref>. An example is the question "Does my daily chocolate consumption correlate with my daily focus score?"
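The simplest version of this question can be checked with a Pearson correlation between two daily series aligned by date. The sketch below assumes the data are already one value per day for the same days; the numbers and the `pearson` helper are hypothetical, made up for illustration. A plain correlation like this ignores the time-series issues discussed on this page, so treat its result as a first look only.

```python
# Illustrative sketch: the data and the helper name are hypothetical.
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

# One value per day; position i in both lists is the same calendar day.
chocolate_g = [0, 20, 50, 0, 10, 40, 0]  # grams of chocolate eaten
focus_score = [6, 7, 8, 5, 6, 8, 5]      # self-rated focus, 1-10

r = pearson(chocolate_g, focus_score)
print(f"r = {r:.2f}")
```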
 
  
You could run experiments if you control everything rigidly, or if the effects are strong and quick (shorter than about a week). Old data may be usable as a baseline, and a baseline may rule out some issues. If both a block design (e.g. two-week blocks) and a daily mixed design (a random intervention each day) produce the same results, then time-series issues are probably not affecting your experiment.
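A minimal sketch of the two designs, assuming a simple on/off intervention (1 = intervention, 0 = control). The function names, the choice to start with a control block, and the balanced shuffling in the daily design are illustrative assumptions, not a prescribed protocol.

```python
# Illustrative sketch; helper names and the balanced shuffle are assumptions.
import random

def block_schedule(n_days, block_len):
    """Alternate blocks of control (0) and intervention (1) days,
    starting with a control block."""
    return [(day // block_len) % 2 for day in range(n_days)]

def daily_random_schedule(n_days, seed=None):
    """Randomly assign each day to intervention (1) or control (0),
    with equal numbers of each (one extra control day if n_days is odd)."""
    schedule = [1] * (n_days // 2) + [0] * (n_days - n_days // 2)
    random.Random(seed).shuffle(schedule)
    return schedule

def estimated_effect(schedule, outcome):
    """Mean outcome on intervention days minus mean outcome on control days."""
    on = [y for s, y in zip(schedule, outcome) if s == 1]
    off = [y for s, y in zip(schedule, outcome) if s == 0]
    return sum(on) / len(on) - sum(off) / len(off)

# Example: a 4-week plan under each design.
blocks = block_schedule(28, block_len=14)
daily = daily_random_schedule(28, seed=42)
print(blocks)
print(daily)
```

Running the same intervention under both schedules and comparing the two `estimated_effect` values is one way to apply the check above. If a positive effect lingers into the following days, it also raises the control days that follow intervention days, so the daily design will tend to show a smaller difference than the block design.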
  
 
Finding more complicated relationships requires better statistical tests and algorithms, as well as data science skills. Apps that do this automatically, or at least easily, are not yet available; see below. Most internet resources treat time series as regularly cyclical series, which is not useful here, because most tracked variables have irregular patterns and do not even have a regular cyclical component.
 
 
==== young.ai and [http://www.aging.ai/ aging.ai] ====
 
 
Deep-learning predictor of age based on human blood tests; young.ai also makes recommendations.
 
 
==== Sonar [https://www.sonarhealth.co sonarhealth.co] ====
 
Customizable aggregation and syncing, e.g. weighting Fitbit data twice as much as Apple Watch data, or averaging steps instead of summing them.
 
 
==== tunum.health ====
 
Pearson correlation, trend analysis and manual dichotomization.
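Manual dichotomization, in its basic form, means splitting days into two groups by a threshold on one variable and comparing another variable between the groups. A minimal sketch with made-up sleep and mood data; it illustrates the technique only and is not how tunum.health implements it. The helper name and the threshold are arbitrary assumptions.

```python
# Illustrative sketch; the helper name, data and threshold are hypothetical.
def dichotomize(predictor, target, threshold):
    """Split days into 'high' (predictor >= threshold) and 'low' groups and
    return the mean target value in each group as (high_mean, low_mean)."""
    high = [t for p, t in zip(predictor, target) if p >= threshold]
    low = [t for p, t in zip(predictor, target) if p < threshold]
    if not high or not low:
        raise ValueError("threshold leaves one of the groups empty")
    return sum(high) / len(high), sum(low) / len(low)

sleep_h = [5.5, 7.0, 8.0, 6.0, 7.5, 4.5]  # hours slept
mood = [4, 6, 7, 5, 7, 3]                 # mood rating the same day, 1-10
high_mean, low_mean = dichotomize(sleep_h, mood, threshold=7.0)
print(f"7+ hours: {high_mean:.2f}, under 7 hours: {low_mean:.2f}")
```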
 
  
 
[[Gyroscope]]
 
  
 
Export from Apple Health<ref>github.com/Lybron/health-auto-export</ref> (no analysis)
 
 
ConnectorDB: DIY, open source, no analysis.
 
 
Heedy: DIY, open source, no analysis.
 
 
Zapier, Integromat, IFTTT: DIY, no analysis.
 
  
 
== List of very technical tools ==
 
 
Really strong relationships will be detected despite most of these problems.
 
  
[http://www.tylervigen.com/spurious-correlations Spurious Correlations] mostly shows that if two things are both trending in one direction and are checked for correlation, they will show a highly significant correlation. The practice effect is one case of this. Another is when one instance of an event type A increases the chance of another event of type A happening soon after. Economists suggest checking for a unit root.
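A small demonstration of the trend problem, using made-up data: two series that both rise over time but whose day-to-day movements are unrelated. Their levels correlate almost perfectly, while their first differences (day-to-day changes) do not correlate at all. Differencing is one common remedy for a shared trend; it is shown here as an illustration, not as a complete fix or a unit-root test. The helper names are hypothetical.

```python
# Illustrative sketch with made-up data; helper names are hypothetical.
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def diff(series):
    """First differences: the change from each value to the next one."""
    return [b - a for a, b in zip(series, series[1:])]

days = range(41)
a = [t + t % 2 for t in days]         # upward trend + wiggle 0, 1, 0, 1, ...
b = [t + (t // 2) % 2 for t in days]  # upward trend + wiggle 0, 0, 1, 1, ...

print(round(pearson(a, b), 3))              # levels: ~0.998
print(round(pearson(diff(a), diff(b)), 3))  # day-to-day changes: 0.0
```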
  
 
Effects on the target variable from outside variables. In non-time-series data this is compensated for with an RCT, but in time series such an effect may last a while and coincide with an intervention, producing very false results. This problem makes gathering baseline data both more difficult and more necessary. Sometimes a baseline will show that this issue does not occur for a particular target variable. Alternatively, the experimenter could compensate by strictly controlling all possible sources of variance.
 
 
Build-up. What if it takes two days of eating pizza to cause heartburn?
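One way to capture build-up is to use a rolling sum over the previous days as the predictor instead of the same-day value. The sketch below uses made-up data in which heartburn happens only on the second consecutive pizza day; `rolling_sum` and the two-day window are illustrative assumptions.

```python
# Illustrative sketch with made-up data; helper names are hypothetical.
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def rolling_sum(values, window):
    """Sum of each value and the (window - 1) values before it.
    The first days, which lack a full window, use what is available."""
    return [sum(values[max(0, i - window + 1):i + 1]) for i in range(len(values))]

pizza = [1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0]      # 1 = ate pizza that day
heartburn = [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]  # only after 2 pizza days in a row

two_day = rolling_sum(pizza, window=2)
built_up = [1 if s >= 2 else 0 for s in two_day]  # pizza today and yesterday

print(round(pearson(pizza, heartburn), 2))     # same-day pizza: 0.45
print(round(pearson(built_up, heartburn), 2))  # two-day build-up: 1.0
```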
 
  
Rate of change (trend). The opposite of build-up: a derivative instead of an integral. For example, stopping or starting an all-pizza diet causes heartburn.
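The rate-of-change version replaces the rolling sum with a difference between consecutive days. A minimal sketch, assuming a daily 0/1 record of being on an all-pizza diet; `changes` is a hypothetical helper. Taking the absolute value treats starting and stopping alike; keep the sign if they might have different effects.

```python
# Illustrative sketch; the helper name and data are hypothetical.
def changes(values):
    """Absolute day-to-day change. The first day has no previous day, so 0."""
    return [0] + [abs(b - a) for a, b in zip(values, values[1:])]

# 1 = on an all-pizza diet that day, 0 = not.
all_pizza = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
switched = changes(all_pizza)
print(switched)  # [0, 0, 0, 1, 0, 0, 0, 1, 0, 0]: diet started day 3, stopped day 7
```

The same helper works for continuous variables, e.g. daily body weight.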
  
Binning, windowing, smoothing. Some variables only make domain sense as an aggregate over some period of time, and some have a very high sampling rate.
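A minimal sketch of binning: high-rate samples are grouped by calendar day and reduced with an aggregator chosen for the domain (a mean for heart rate, a sum for steps). It assumes naive local timestamps and midnight-to-midnight days; for something like sleep a different day boundary may make more sense. The function name and the data are hypothetical.

```python
# Illustrative sketch; the function name and the samples are hypothetical.
from collections import defaultdict
from datetime import datetime
from statistics import fmean

def bin_by_day(samples, agg=fmean):
    """Group (timestamp, value) samples by calendar date and reduce each
    day's values with `agg`, e.g. fmean for heart rate or sum for steps.
    Assumes naive local timestamps and midnight-to-midnight days."""
    by_day = defaultdict(list)
    for ts, value in samples:
        by_day[ts.date()].append(value)
    return {day: agg(vals) for day, vals in sorted(by_day.items())}

heart_rate = [
    (datetime(2024, 3, 1, 8, 0), 62),
    (datetime(2024, 3, 1, 8, 1), 64),
    (datetime(2024, 3, 1, 20, 30), 70),
    (datetime(2024, 3, 2, 9, 15), 58),
    (datetime(2024, 3, 2, 9, 16), 60),
]
for day, mean_hr in bin_by_day(heart_rate).items():
    print(day, round(mean_hr, 1))
```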
  
Interpolation. Variables have different sampling rates, so they need to be interpolated to a common time grid before they can be compared.
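A minimal sketch of linear interpolation onto a common daily grid, using only the standard library. It assumes sorted numeric time stamps (here, day numbers) and holds the first and last values constant outside the measured range instead of extrapolating; the helper name and the weight values are hypothetical. Linear interpolation suits slowly changing variables such as body weight; for events it can invent values that never happened.

```python
# Illustrative sketch; the helper name and the weights are hypothetical.
from bisect import bisect_left

def interpolate(times, values, query_times):
    """Linearly interpolate (times, values) at each query time.
    `times` must be sorted ascending. Queries outside the measured range
    get the first or last value instead of being extrapolated."""
    out = []
    for q in query_times:
        i = bisect_left(times, q)
        if i == 0:
            out.append(values[0])
        elif i == len(times):
            out.append(values[-1])
        elif times[i] == q:
            out.append(values[i])
        else:
            t0, t1 = times[i - 1], times[i]
            v0, v1 = values[i - 1], values[i]
            out.append(v0 + (v1 - v0) * (q - t0) / (t1 - t0))
    return out

# Weight measured only on days 0, 4 and 8; put it on the same daily grid
# as a variable that is recorded every day.
weigh_days = [0, 4, 8]
weight_kg = [80.0, 79.0, 78.0]
daily_weight = interpolate(weigh_days, weight_kg, range(9))
print(daily_weight)  # [80.0, 79.75, 79.5, 79.25, 79.0, 78.75, 78.5, 78.25, 78.0]
```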
  
Types of data. [Exercised] is an event with a specific start time and duration, while [tired] is a vaguer value the user could use to describe how they felt over the past 4 hours.
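One way to make an event and a rating comparable is to express the event as minutes of overlap with the period the rating describes. A minimal sketch, assuming each event is a start time plus a duration and each tiredness rating describes the 4 hours before it; the names and data are hypothetical.

```python
# Illustrative sketch; names and data are hypothetical.
from datetime import datetime, timedelta

def overlap_minutes(start, end, win_start, win_end):
    """Minutes during which [start, end) and [win_start, win_end) overlap."""
    latest_start = max(start, win_start)
    earliest_end = min(end, win_end)
    return max(0.0, (earliest_end - latest_start).total_seconds() / 60)

def minutes_before(events, rating_time, lookback=timedelta(hours=4)):
    """Total minutes of (start, duration) events inside the lookback
    window that ends at rating_time."""
    win_start = rating_time - lookback
    return sum(overlap_minutes(s, s + d, win_start, rating_time) for s, d in events)

exercise = [  # each event: start time and duration
    (datetime(2024, 3, 1, 7, 0), timedelta(minutes=45)),
    (datetime(2024, 3, 1, 17, 30), timedelta(minutes=60)),
]
tired_ratings_at = [datetime(2024, 3, 1, 10, 0),
                    datetime(2024, 3, 1, 18, 0),
                    datetime(2024, 3, 1, 23, 0)]
minutes = [minutes_before(exercise, t) for t in tired_ratings_at]
print(minutes)  # [45.0, 30.0, 0.0]
```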
  
 
All the [[Issues with Self Report]].
 
 
Few positive instances, but they are important. For example, you went to a specific restaurant twice and got sick soon after both times, and you have only ever gotten sick with similar symptoms five times. Or: two large, rare humps happen almost one after the other. This is similar to the previous example if the humps are treated as events, with the added evidence that their many samples show a similar shape.
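A rough back-of-the-envelope check for the restaurant example: if sickness were unrelated to the restaurant, how likely is it that both visits would be followed by sickness by chance? The sketch assumes 5 episodes in two years (730 days), "soon after" meaning within 2 days, non-overlapping windows and independent visits; all of these numbers are assumptions for illustration, and the helper name is hypothetical.

```python
# Back-of-the-envelope sketch; all inputs are assumptions for illustration.
def chance_all_followed(sick_episodes, total_days, window_days, visits):
    """Probability that every visit is followed by sickness within
    `window_days` purely by chance, assuming sickness episodes are spread
    out so their windows don't overlap and visits are independent."""
    p_one_visit = min(1.0, sick_episodes * window_days / total_days)
    return p_one_visit ** visits

p = chance_all_followed(sick_episodes=5, total_days=730, window_days=2, visits=2)
print(f"{p:.5f}")  # 0.00019, roughly 1 in 5,300
```

Under these assumptions a coincidence is unlikely, which is why a few positive instances can still be informative. If many restaurants or foods are checked at once, though, some will show a pattern like this by chance alone.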
 
 
Since removing the real effects of other variables on the target variable makes the effect of the variable of interest stand out, machine learning needs to be used. A basic approach would be to bin the predictor variables in multiple ways: by time offset from the effect being checked, by aggregation method (mean or another aggregator), and by the width of the aggregation window.
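A minimal sketch of building such binned predictor features for one target day, one feature per combination of lag, window and aggregator. The names (`lagged_window_features`, the `lagN_winM_agg` keys), the caffeine data and the chosen lags and windows are hypothetical. A model would then be trained on these features collected over many days; that step is not shown.

```python
# Illustrative sketch; the function name, feature keys and data are hypothetical.
from itertools import product
from statistics import fmean

AGGREGATORS = {"mean": fmean, "sum": sum, "max": max}

def lagged_window_features(series, day, lags, windows, aggs=AGGREGATORS):
    """One feature per (lag, window, aggregator) combination for target `day`.

    A feature with lag L and window W aggregates the W daily values that end
    L days before `day` (lag 0 includes `day` itself). Combinations that would
    reach back before the start of the series are skipped."""
    if not 0 <= day < len(series):
        raise IndexError("day is outside the series")
    features = {}
    for lag, window, (name, agg) in product(lags, windows, aggs.items()):
        end = day - lag           # last day included
        start = end - window + 1  # first day included
        if start < 0:
            continue
        features[f"lag{lag}_win{window}_{name}"] = agg(series[start:end + 1])
    return features

caffeine_mg = [0, 100, 200, 0, 50, 150, 100]  # one value per day
features = lagged_window_features(caffeine_mg, day=6, lags=[1, 2], windows=[1, 3])
for key, value in features.items():
    print(key, value)
```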
 
 
Machine learning also has limits on the kind of patterns it can detect.
 
  
 
=== What to expect from the complete analysis tool ===
 
 
Cycle decomposition using a model like ARIMA, e.g. kayak season is in the summer, or lunch is at exactly 1pm.
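ARIMA itself needs a statistics library (for example, the ARIMA models in statsmodels). As a much simpler stand-in, and explicitly not ARIMA, the sketch below removes a fixed-period cycle by subtracting the average deviation for each phase of the cycle. It only fits strictly regular cycles like the lunch-at-1pm example; the helper names and data are hypothetical.

```python
# Simplified stand-in for illustration (NOT ARIMA); names and data are made up.
from statistics import fmean

def seasonal_component(values, period):
    """Average deviation from the overall mean at each phase of a fixed
    cycle, e.g. each day of the week when values are daily and period=7."""
    if len(values) < period:
        raise ValueError("need at least one full cycle of data")
    overall = fmean(values)
    return [fmean(values[phase::period]) - overall for phase in range(period)]

def deseasonalize(values, period):
    """Subtract the fixed cycle, leaving the trend and irregular parts."""
    seasonal = seasonal_component(values, period)
    return [v - seasonal[i % period] for i, v in enumerate(values)]

# Two weeks of hours spent outdoors, starting on a Monday:
# 2 h on weekdays, 5 h on weekends.
outdoors_h = [2, 2, 2, 2, 2, 5, 5] * 2
print(seasonal_component(outdoors_h, 7))  # weekend phases are positive
print(deseasonalize(outdoors_h, 7))       # about 2.86 every day: cycle removed
```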
 
  
Detection of repeated shapes implying similar events that are not cyclical, e.g. dinner happens anywhere between 4pm and 10pm and causes a characteristic 2-hour spike in glucose.
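A minimal sketch of shape detection by template matching: slide a window along the series and flag places where the window's Pearson correlation with a template shape is high, which ignores the baseline level and the spike's height. It assumes one glucose sample every 30 minutes (so a 5-sample template spans 2 hours), a hand-picked template and threshold, and made-up data; the helper names are hypothetical and real detectors are usually more elaborate.

```python
# Illustrative sketch; template, threshold, names and data are hypothetical.
from math import sqrt

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)

def find_shape(series, template, threshold=0.9):
    """Start indices of windows whose correlation with `template` is at
    least `threshold`. After a match, the search jumps past it so one
    event is reported once."""
    n = len(template)
    hits, i = [], 0
    while i + n <= len(series):
        if pearson(series[i:i + n], template) >= threshold:
            hits.append(i)
            i += n
        else:
            i += 1
    return hits

# Made-up glucose (mg/dL), one sample every 30 minutes, with two dinners at
# irregular times; the second spike is smaller but has the same shape.
glucose = [90] * 24
for start, bump in {4: [0, 30, 50, 30, 10], 15: [0, 18, 30, 18, 6]}.items():
    for k, extra in enumerate(bump):
        glucose[start + k] += extra

dinner_shape = [0, 30, 50, 30, 10]  # 5 samples = 2 hours
print(find_shape(glucose, dinner_shape))  # [4, 15]
```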
  
 
== References ==
 
 
[[Category:Data analysis]]
 
