Finding relations between variables in time series

From Personal Science Wiki

A frequent need in personal science is finding relationships between different variables across a time series. An example is the question "does eating chocolate improve focus?".

How does one find relations between variables

To do this, your data first needs to be parsed, cleaned, gathered in one place, and ideally visualized. The next step is to find relations between variables. Some people do this themselves, using programming languages such as R and Python in notebooks or apps. Coding platforms such as the notebooks on Open Humans, Kaggle or GitHub can help, but using them frequently requires technical skills.

There are also a number of tools and apps that can semi-automatically compute these correlations and help with the analysis.

List of less technical tools

Open Humans and their Personal Analysis notebooks

Open Humans provides a library of notebooks that can be used to visualize data across data sources and find relations between different variables. It also supports the upload of generic data files through the File Uploader.

Zenobase

Zenobase can test correlations based on user-specified questions. The user must configure the lag, regression method and aggregation method through a UI. It also has powerful filtering tools.

Data Flexor

DataFlexor offers lots of pretty pictures, but no very advanced statistics yet.

"Bayesian nodes, 'do' semantics, AI and experts"[1] Right now only for sports teams.[2]

From the Exist main site: "Which habits go together? Correlations are the most powerful part of Exist. By combining your data, we can answer questions like: “What makes me happiest?”, “What can I do to be more active?”, “When am I most productive?”"

Habitdash

Habitdash's automatic data analysis searches for hidden patterns to find relationships between activity, sleep, weight and other habits.

Optimized app

Optimized claims to do "automatic correlation mining".

Lytiko

Lytiko promises correlations, connections, deep insights and visualizations.

Vital

Vital is an API for health and fitness data. It is free to use for collecting wearables and health data and standardising them into one API. You can also use Vital's API for delivering at-home test kits.

A deep learning predictor of age based on human blood tests, which also makes recommendations.



Bearable App

Realize Me

Heads Up

Inside Tracker

Wellness FX

List of very technical tools

Programming languages for statistics: Matlab, R, Python, Julia.

Try Python GUI time series analysis.

DIY Individuals

Some individuals share scripts that analyze lots of data at once, but using them does require some programming skill.

Reasons time series analysis especially as applied to QS is hard

Wavelet coherence is one potential solution.

Spurious Correlations mostly shows that if two things are both trending in one direction and are checked for correlation, they will show a very significant correlation, even when they are unrelated. The practice effect is one case of this. Another is when one instance of an event increases the chances of the same event happening soon after. Economists suggest testing for a unit root.

Lag. What if eating pizza on one day causes heartburn the next?

Few positive instances, but they are important. For example: went to a specific restaurant twice and got sick soon after both times, having only ever been sick with similar symptoms five times. Or: two large, rare humps happen almost one after the other. This is similar to the previous example if the humps are treated as events, with the added fact that many samples show their similarity in shape too.
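For rare events like the restaurant example, a correlation coefficient says little, but a simple chance calculation can help. The sketch below asks how likely it is that both visits would land on sick days purely by coincidence. The tracking period of 365 days is an assumption (the example does not state one), "soon after" is simplified to "the same day counts as sick", and `prob_at_least_k_hits` is a hypothetical helper name.

```python
from math import comb

def prob_at_least_k_hits(n_days, n_sick, n_visits, k):
    """Hypergeometric tail probability: the chance that at least k of
    n_visits randomly placed visits fall on one of the n_sick days
    within n_days, if visits and sickness are unrelated."""
    total = comb(n_days, n_visits)
    favourable = sum(
        comb(n_sick, i) * comb(n_days - n_sick, n_visits - i)
        for i in range(k, min(n_sick, n_visits) + 1)
    )
    return favourable / total

# 5 sick days and 2 restaurant visits come from the example;
# 365 tracked days is an assumed value.
p = prob_at_least_k_hits(n_days=365, n_sick=5, n_visits=2, k=2)
print(f"chance of both visits hitting sick days by coincidence: {p:.5f}")
```

Under these assumptions the coincidence probability is roughly 0.00015, which is why two hits out of two visits can be meaningful even though the sample is tiny.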

Different sampling rates: series need to be interpolated before they can be compared. Window: since removing the effects of other variables makes the variable of interest's effect stand out, machine learning is typically needed. A common approach is to bin predictor variables in multiple ways, varying the time from the effect being checked, the aggregation method (mean or other) and the window of the aggregator.
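The binning approach can be sketched as follows. This is a minimal illustration, not any specific tool's method. It aggregates a frequently sampled predictor down to daily means (a simple alternative to interpolation when the outcome is logged once per day), then computes the predictor's mean for several combinations of lag and window. The heart rate data, the function names, and the chosen lags and windows are all hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta
from statistics import mean

def daily_mean(samples):
    """Aggregate (timestamp, value) samples to one mean value per date."""
    by_day = defaultdict(list)
    for ts, value in samples:
        by_day[ts.date()].append(value)
    return {day: mean(vals) for day, vals in sorted(by_day.items())}

def windowed_feature(daily, day, lag, window):
    """Mean of the predictor over `window` days, ending `lag` days
    before `day`. Returns None if any day in the window is missing."""
    days = [day - timedelta(days=lag + i) for i in range(window)]
    if any(d not in daily for d in days):
        return None
    return mean(daily[d] for d in days)

# Hypothetical heart rate readings, one every 6 hours for 5 days.
start = datetime(2024, 1, 1)
heart_rate = [
    (start + timedelta(hours=6 * i), 60 + 2 * (i // 4) + (i % 4))
    for i in range(20)
]
daily_hr = daily_mean(heart_rate)

# Bin the predictor several ways relative to the day being explained.
target_day = date(2024, 1, 5)
features = {
    (lag, window): windowed_feature(daily_hr, target_day, lag, window)
    for lag in (1, 2)
    for window in (1, 3)
}
for (lag, window), value in features.items():
    print(f"lag={lag} window={window}: {value}")
```

Each `(lag, window)` pair becomes one candidate predictor column for a later model. Swapping `mean` for another aggregator such as `max` or `sum` gives further variants.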

Machine learning also has limits on the kinds of patterns it can detect.

Types of data. [Exercised] is an event with a specific moment of occurrence and length, while [tired] is a vaguer value that a user could use to try to describe their feelings over the past 4 hours.
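The difference between these two kinds of data can be made explicit in code. The sketch below is one possible representation, not an established schema: the class names `Event` and `Rating`, their fields, and the `overlaps` helper are assumptions. The 4-hour default comes from the [tired] example above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Event:
    """Something with a definite start and length, e.g. [Exercised]."""
    name: str
    start: datetime
    duration: timedelta

    @property
    def end(self):
        return self.start + self.duration

@dataclass
class Rating:
    """A subjective value describing a period before it was logged,
    e.g. [tired] covering the past 4 hours."""
    name: str
    logged_at: datetime
    value: int
    covers: timedelta = timedelta(hours=4)

    @property
    def window_start(self):
        return self.logged_at - self.covers

def overlaps(event, rating):
    """True if the event happened at least partly inside the period
    that the rating describes."""
    return event.start < rating.logged_at and event.end > rating.window_start

run = Event("exercised", datetime(2024, 1, 1, 9, 0), timedelta(minutes=45))
noon = Rating("tired", datetime(2024, 1, 1, 12, 0), value=7)
evening = Rating("tired", datetime(2024, 1, 1, 18, 0), value=3)

print(overlaps(run, noon))     # the run falls inside the 08:00-12:00 window
print(overlaps(run, evening))  # the run ended before the 14:00-18:00 window
```

Keeping the covered period on the rating itself makes it possible to match events to ratings by overlap instead of by exact timestamp.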