Talk:Finding relations between variables in time series


Categorization

Currently this page is categorized as a tool, though it's rather a meta-article. I wonder whether it would make more sense to file it under Topic instead, but I would love to hear second opinions on this! - Gedankenstuecke (talk) 12:53, 30 November 2021 (UTC)

If you think that's better, then do it. DG (talk)
I've moved it to a topic page and also restructured/renamed the page slightly to fit into the topic dimension. - Gedankenstuecke (talk) 08:56, 2 December 2021 (UTC)

todo suggestion

mp April 7th at 1:51 PM @ oh, an additional thought, post chat — I think a “fast up / slow down” pattern would be reflected in a Markov probability distribution where any given number has a low probability of later numbers being higher than it, e.g. if the prior value was 7, then 7-or-lower are likely (gradual decline), but 8+ very unlikely. I was struggling to remember the language here, but I think a “first order” Markov model is one where the probability distribution at any given step is based only on the previous step (and no further “memory” in the system). A “second order” model is influenced by the two previous steps (a bit more memory).
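
A minimal sketch of that idea, assuming the variable has been discretized to integer scores (the 1–10 range and the toy data are made up): estimate the first-order transition matrix from consecutive observations and check whether jumps above the previous value are rare.

<syntaxhighlight lang="python">
import numpy as np

# Minimal sketch: estimate a first-order Markov transition matrix from a
# discretized daily score (values assumed to be integers 1..10).
def transition_matrix(series, n_states=10):
    counts = np.zeros((n_states, n_states))
    for prev, curr in zip(series[:-1], series[1:]):
        counts[prev - 1, curr - 1] += 1
    # Normalize each row to get P(next state | previous state).
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Toy series that jumps up fast and declines slowly.
scores = [3, 8, 7, 7, 6, 5, 9, 8, 8, 7, 6, 4, 3, 9, 8, 7]
P = transition_matrix(scores)
# For a "fast up / slow down" pattern, P[i, j] should be small for j > i
# (upward moves are rare) and larger for j <= i (gradual decline is common).
prev = 7
print("P(next > 7 | prev = 7)  =", P[prev - 1, prev:].sum())
print("P(next <= 7 | prev = 7) =", P[prev - 1, :prev].sum())
</syntaxhighlight>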

First: carefully compare within-day variance to between-day variance. If the within-day variance is too high, the variable has too poor "distinctness". Can also look for "stability" in derivatives and between any time window measured, such as 12 hours or a month... or maybe a strongly unlikely Markov model? Also, maybe first check multimodality, outliers and anomalies.
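
A minimal sketch of the within-day vs. between-day variance comparison, assuming timestamped values in a pandas DataFrame (the column names and toy data are made up):

<syntaxhighlight lang="python">
import pandas as pd

# Compare within-day variance to the variance of the daily means: if the
# within-day variance dominates, the variable has poor "distinctness" from
# day to day.
df = pd.DataFrame({
    "timestamp": pd.date_range("2023-01-01", periods=96, freq="6h"),
    "value": pd.Series(range(96)).mod(4) * 2 + pd.Series(range(96)) // 16,
})
daily = df.groupby(df["timestamp"].dt.date)["value"]
within_day_var = daily.var().mean()    # average variance inside a day
between_day_var = daily.mean().var()   # variance of the daily means
print("within-day variance: ", within_day_var)
print("between-day variance:", between_day_var)
print("ratio (lower is more 'distinct'):", within_day_var / between_day_var)
</syntaxhighlight>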

potential sources of solutions

Convert a time series into a bunch of splines? Especially MARS! https://www.google.com/search?client=firefox-b-1-d&q=splines+time+series
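
A minimal sketch of the spline idea, using an ordinary smoothing spline from SciPy rather than MARS (an actual MARS fit would need a dedicated package such as py-earth; the data and smoothing parameter below are made up):

<syntaxhighlight lang="python">
import numpy as np
from scipy.interpolate import UnivariateSpline

# Fit a smoothing spline to a noisy daily series, so the smooth curve (and
# its derivative) can be compared across variables or time windows.
rng = np.random.default_rng(0)
days = np.arange(60)
values = np.sin(days / 10) + rng.normal(scale=0.3, size=days.size)  # toy data

spline = UnivariateSpline(days, values, s=len(days) * 0.3)  # s controls smoothness
smooth = spline(days)
slope = spline.derivative()(days)   # rough trend/derivative at each day
print("smoothed value on day 30:", smooth[30])
print("slope on day 30:         ", slope[30])
</syntaxhighlight>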

https://andrewncarr.gumroad.com/l/everydaydata

https://stats.stackexchange.com/questions/tagged/time-series wow, really so much I forgot! Anomalies and events and periodicity! Mentions QS: https://stats.stackexchange.com/questions/17623/how-to-detect-a-significant-change-in-time-series-data-due-to-a-policy-change/17661

https://hermandevries.nl/2020/09/23/relationships-between-hrv-sleep-and-physical-activity-in-personal-data/ suggested by Gedankenstuecke

http://beautifuldata.net/2015/01/how-to-analyze-smartphone-sensor-data-with-r-and-the-breakoutdetection-package/

http://beautifuldata.net/2015/01/how-to-analyze-smartphone-sensor-data-with-r-and-the-breakoutdetection-package/#comment-37605 for raw sensor data.


https://www.nature.com/articles/s41398-021-01445-0 Personalized time series machine learning it is. Fairly commonly recommended procedures for data scientists. I suspect faults from not taking into account issues specific to time series; no mention of unit roots, for example. "Analytics code is available upon request from the corresponding author."

https://forum.quantifiedself.com/t/interventions-to-improve-sleep/9599/15 Just linear lasso, but lag and other issues are discussed.

https://github.com/fasiha/ebisu#the-math intense math for flashcard prediction and timing adjustment!

https://old.reddit.com/r/CausalInference/ https://www.reddit.com/r/CausalInference/comments/ti18wz/personalized_nof1_or_singlecasesubject_causal/

https://www.physiq.com/ "physIQ is the only company that uses FDA-cleared, AI-based analytics to “learn” and detect even the most subtle changes in an individual’s own unique physiology 24/7."

https://play.google.com/store/apps/details?id=edu.brown.selfe&hl=en_CA&referrer=utm_source%3Dgoogle%26utm_medium%3Dorganic%26utm_term%3D%22self-e%22+app Formal self-experiment support app from Brown. Definitely not advanced analytics, but still.


Do not forget that https://en.wikipedia.org/wiki/Multimodal_distribution is also a source.

https://correlaid.org/en/ where to headhunt data scientists

other potential sources

https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-s897-machine-learning-for-healthcare-spring-2019/lecture-notes/MIT6_S897S19_lec14.pdf

https://www.microsoft.com/en-us/research/group/alice/ Just let Microsoft do it. https://econml.azurewebsites.net/spec/motivation.html

https://github.com/cuge1995/awesome-time-series

https://github.com/youngdou/awesome-time-series-analysis

https://github.com/ejain/n-of-1-ml

https://github.com/yzhao062/anomaly-detection-resources#32-time-series-outlier-detection

https://github.com/gianlucatruda/quantified-sleep analysis before designing an intervention

https://www.gwern.net/Causality

https://papers.nips.cc/paper/2019/file/42a6845a557bef704ad8ac9cb4461d43-Paper.pdf

https://ml4qs.org/ Hoogendoorn and Funk

To test an algorithm, generate data: https://old.reddit.com/r/rstats/comments/nhenrm/recommend_r_packages_to_generate_data/ but is it time series? https://www.cs.cmu.edu/afs/cs/project/jair/pub/volume13/cheng00a-html/node15.html
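
A minimal sketch of generating test data that actually is a time series: an AR(1) process (so observations are autocorrelated) with a known intervention effect injected, so whatever analysis method is being tested can be checked against a known answer. All numbers below are arbitrary.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(42)
n_days = 200
phi = 0.7        # autocorrelation: today depends on yesterday
effect = 1.5     # true effect of the intervention, starting on day 100

# AR(1) process plus a step change at the intervention.
y = np.zeros(n_days)
for t in range(1, n_days):
    y[t] = phi * y[t - 1] + rng.normal(scale=1.0)
intervention = (np.arange(n_days) >= 100).astype(float)
y += effect * intervention

# Naive check: does the post-intervention mean shift by roughly `effect`?
print("pre mean: ", y[:100].mean())
print("post mean:", y[100:].mean())
</syntaxhighlight>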

https://academic.oup.com/jamia/article/24/1/198/2631444

https://physionet.org/about/tutorial/#exploreanalyse

extra... https://forum.quantifiedself.com/search?q=matlab https://www.google.com/search?q=non+independence+of+observations https://www.google.com/search?q=time+series+distributions https://www.google.com/search?q=time+series+kernel+binning

maybe ask here again https://old.reddit.com/r/AskStatistics/ maybe datasets https://old.reddit.com/r/datasets/search?q=time+health+subreddit%3Adatasets&include_over_18=on&sort=relevance&t=all

https://cran.r-project.org/web/views/TimeSeries.html

https://www.nature.com/articles/s41598-017-05778-z

https://r-graph-gallery.com/266-ggplot2-boxplot-with-variable-width.html

https://www.jmir.org/2022/1/e28953

https://preprints.jmir.org/preprint/40238

https://www.jmir.org/2022/1/e30720

http://www.biostathandbook.com/independence.html

correlation analysis https://www.frontiersin.org/articles/10.3389/fdgth.2020.00003/full

https://www.sciencedaily.com/releases/2021/11/211124154126.htm what was their meta-analysis?? It might be relevant for individual analysis not just meta.

even more sources

From, and more at: https://arxiv-sanity-lite.com/?q=health+time&rank=search&tags=&pid=&time_filter=&svm_c=0.01&skip_have=no&page_number=3

https://arxiv.org/abs/2206.08178 just survival analysis https://arxiv.org/abs/2205.11680 also EHR

https://arxiv.org/abs/2206.09107 time series EHR rare binary features

https://arxiv.org/abs/2206.11505 possibly automatic generation

https://arxiv.org/abs/2207.06414 ER time series "robustness of this approach" deep learning interpretable, DEEP, irregular time intervals, EHR, Long-term Dependencies and Short-term Correlations

https://arxiv.org/abs/2206.12414 !!!? marked temporal point processes DEEP missing events

https://arxiv.org/abs/2207.04305 https://arxiv.org/abs/2207.04308 DEEP

https://arxiv.org/abs/2207.08159 !!!!! Gaussian mixture model, autoencoders, similarities among different time series, distance metric

https://arxiv.org/abs/2004.02319 ! anomaly detection

https://arxiv.org/abs/2107.03502 Time Series Imputation autoregressive models score-based diffusion models

https://arxiv.org/abs/2110.05357 !! irregular sampling, graph neural network, dynamics of sensors purely from observational data, classify time series, healthcare

https://arxiv.org/abs/2204.00961 ... ? LSTM DEEP REINFORCEMENT, recommends exercise routines for user-specific needs

https://arxiv.org/abs/2106.03211 extreme events, RNN, S&P 500 stocks

https://arxiv.org/abs/2107.05489 !!! ML for time series, LSTM, ... walk-forward algorithm that also calculates point-wise confidence intervals for the predictions

https://arxiv.org/abs/2108.13461 !!!!!!! healthcare predictive analytics, DEEP ?feature selection is not an issue? " feature engineering to capture the sequential nature of patient data, which may not adequately leverage the temporal patterns" " representations of key factors (e.g., medical concepts or patients) and their interactions from high-dimensional raw" summarises key research streams

https://arxiv.org/abs/2204.13451 EHR predicting "The common time-series representation is indirect in extracting such information from EHR because it focuses on detailed dependencies between values in successive observations, not cumulative information. "

https://arxiv.org/abs/2205.15598 !!! Disease prediction with ML. heterogeneity complex factors at the individual level. phase diagram

https://en.wikipedia.org/wiki/Sequential_pattern_mining

https://old.reddit.com/r/QuantifiedSelf/comments/wfuy03/personalized_digital_health_and_medicine_at_jsm/ "the g-formula (i.e., standardization, back-door adjustment) under serial interference. It estimates stable recurring effects, as is done in n-of-1 trials and single case experimental designs. We compare our approach to standard methods (with possible confounding) to show how to use causal inference to make better personalized recommendations for health behavior change, "

https://www.gwern.net/Replication Replication crisis; an easy, nice read. Since we will eventually be data dredging and running stats tests, this is useful.

https://old.reddit.com/r/wearables/comments/xmn06r/using_wearables_and_apps_to_characterize_your_own/ "Well, the experimental design of n-of-1 trials and SCEDs actually checks for causation, not just correlation. This is why randomized controlled trials (RCTs) in clinical research are a gold-standard technique for figuring out if a new intervention or treatment actually works. “Flipping the coin” in a way balances everything else that might confuse or “confound” the way the treatment might impact the health-related outcome."

https://www.lesswrong.com/posts/9kNxhKWvixtKW5anS/you-are-not-measuring-what-you-think-you-are-measuring Two rules/takeaways: you are not measuring what you think you are measuring, but with enough data sources and types of measurements you may find out what that is.

https://gitlab.unige.ch/qol IMPORTANT

Also Eric J. Daza's papers.


funny on this stat analysis

https://xkcd.com/2560/

Quick write I made here for later.

Collection is really just a matter of finding the right devices and taking the time to use them. Analysis beyond an immediately obvious effect can become difficult: if the effect is subtle and drowned out by other effects, or is hard to measure; if the intervention is not something the user can easily reproduce, or wants to; if the effect takes a long time to build up, or is shifted in time from the intervention; if the successful effect only happens under several conditions or several interventions together; if the spray-and-pray approach is dangerous, or only hits gold once in a while; if there is a multiple comparison problem (see Wikipedia); or if the user is bad at keeping records. There are probably more. There are many, many apps that just do correlation and none that do anything more. Here is a list of both problems and apps.
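
A minimal sketch of two of those problems together, the time-shifted effect and the multiple comparison problem: scan several lags between an exposure and an outcome and correct the resulting p-values. The variables and the 3-day lag are made up; pearsonr and multipletests are standard SciPy/statsmodels calls.

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import pearsonr
from statsmodels.stats.multitest import multipletests

# Toy data: the outcome y responds to the exposure x three days later.
rng = np.random.default_rng(1)
n = 120
x = rng.normal(size=n)
y = np.roll(x, 3) * 0.6 + rng.normal(size=n)

lags = range(0, 8)
pvals = []
for lag in lags:
    r, p = pearsonr(x[: n - lag], y[lag:])   # correlate x with y shifted by `lag`
    pvals.append(p)

# Correct for testing many lags at once (Benjamini-Hochberg FDR).
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
for lag, p, ok in zip(lags, p_adj, reject):
    print(f"lag {lag}: adjusted p = {p:.3f} {'*' if ok else ''}")
</syntaxhighlight>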

new section: single variable validity

How to prove that what you are measuring really is what you are trying to measure, a.k.a. construct validation.

Quick way: compare to a scientifically validated standard.

Also consider en.wikipedia.org/wiki/Convergent_validity: many tests all agree more or less; and "divergent validity": they do not correlate with things that they should not correlate with.

en.wikipedia.org/wiki/Nomological_network: several constructs and their relationships to each other, such as "ageing causes memory loss".
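
A minimal sketch of convergent/divergent validity as a correlation matrix, with made-up measure names and toy numbers: measures of the same construct should correlate with each other, and not with an unrelated measure.

<syntaxhighlight lang="python">
import pandas as pd

# Two sleep trackers and a sleep diary should agree (convergent validity);
# none of them should correlate with an unrelated measure (divergent validity).
df = pd.DataFrame({
    "sleep_tracker_a": [6.5, 7.2, 5.9, 8.1, 6.8, 7.5],
    "sleep_tracker_b": [6.3, 7.4, 6.1, 7.9, 6.6, 7.7],
    "sleep_diary":     [6.0, 7.0, 6.0, 8.0, 7.0, 7.5],
    "shoe_size":       [42,  41,  43,  42,  41,  43],   # unrelated measure
})
corr = df.corr()
print(corr.round(2))
# Convergent: the three sleep measures correlate highly with each other.
# Divergent: none of them correlate much with shoe_size.
# Against a gold standard (e.g. polysomnography), also check the mean
# difference (bias), not just the correlation.
</syntaxhighlight>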

some more thoughts

stats.stackexchange.com/questions/264225/finding-brief-repeated-patterns-in-a-time-series The question is not answered, but similar questions in the sidebar are well answered. Copulas may be important? The difference between time series and non-time-series seems to be that with time series the patterns are cyclical, not a specific type of pattern/shape that happens every so often. Health tracking data seems to need both: I imagine blood sugar spikes after meals, but meals are not eaten at a constant time. Long-term trend and smoothing are covered in some ARIMA-like models. Also changepoints, which are like trends but over shorter times. And outliers? That is like a changepoint. This results in a pretty visualization/illustration of a single time series; if any of the apps were serious, this would appear there. Maybe multiple moving averages, www.investopedia.com/terms/g/guppy-multiple-moving-average.asp, as in which kernel width fits best. Maybe ECG decomposition with DWT and ICA. This is all called: single-subject longitudinal analysis. How about "temporal dynamics"?
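
A minimal sketch of the multiple-moving-averages idea: smooth the same series with several window widths and see at which scale the structure sits. The windows and the toy weekly pattern are arbitrary choices for illustration.

<syntaxhighlight lang="python">
import pandas as pd

# Toy daily series: a weekly sawtooth plus a slow trend.
s = pd.Series(range(90), index=pd.date_range("2023-01-01", periods=90, freq="D")).astype(float)
s = s.mod(7) + s.div(30)

smoothed = pd.DataFrame({
    f"ma_{w}d": s.rolling(window=w, center=True).mean()
    for w in (3, 7, 14, 30)
})
# A window near the true cycle length (7 days here) flattens the weekly
# pattern and leaves the trend; shorter windows keep the cycle, longer
# windows keep only the trend.
print(smoothed.dropna().head())
</syntaxhighlight>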

research chat suggests

Look at the extremes of the predictor variables; e.g., compare cognitive ability after a three-day stretch of terrible sleep versus a three-day stretch of good sleep.
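
A minimal sketch of that extreme-group comparison, with made-up column names and toy data: average sleep over 3-day windows, pick the best and worst stretches, and compare the outcome.

<syntaxhighlight lang="python">
import numpy as np
import pandas as pd

# Toy daily data where sleep helps cognition; column names are illustrative.
rng = np.random.default_rng(7)
n = 120
sleep = rng.normal(7, 1.5, size=n).clip(3, 10)
cognition = 50 + 3 * sleep + rng.normal(0, 5, size=n)
df = pd.DataFrame({"sleep_hours": sleep, "cognition_score": cognition})

# 3-day rolling average of sleep, then compare cognition on the days that
# end the worst vs. the best sleep stretches.
df["sleep_3d"] = df["sleep_hours"].rolling(3).mean()
valid = df.dropna(subset=["sleep_3d"])
worst = valid.nsmallest(10, "sleep_3d")   # ten worst 3-day sleep stretches
best = valid.nlargest(10, "sleep_3d")     # ten best 3-day sleep stretches
print("cognition after worst sleep:", round(worst["cognition_score"].mean(), 1))
print("cognition after best sleep: ", round(best["cognition_score"].mean(), 1))
</syntaxhighlight>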

another aggregator

www.opencures.org

apps.apple.com/us/app/this-that/id1660363624

Sequential analysis is not it

An old concept of repeatedly running old-school statistical tests on accumulating data to know when there is enough to stop something like a clinical trial.