== Conclusions ==

Overall, this "scoring" approach seems to work in principle, even if I might have to tweak the cutoff/boundary values a bit. But what good is that score? To actually make use of it, I went ahead and updated the little script that populates my website footer to display the overall values for the current day<ref>https://tzovar.as/</ref>. This means I can now sync my Oura ring in the morning and then check my website to see what the predicted day looks like: Will I feel fine (score <3), ''maybe'' be a bit under the weather (score between 3–4), or ''likely'' be a bit under the weather (score >4)? Time will tell if this is more useful than my own mental heuristics.
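To make the footer logic concrete, here is a minimal sketch of the kind of mapping such a script could perform. The function name and the exact label strings are my own illustration; only the cutoffs (<3, 3–4, >4) come from the text above.

```python
# Hypothetical sketch: map a day's physiological deviation score to a
# rough prediction label, using the cutoffs described above.
def day_label(score: float) -> str:
    """Translate a physiological deviation score into a footer label."""
    if score < 3:
        return "feeling fine"
    elif score <= 4:
        return "maybe a bit under the weather"
    else:
        return "likely a bit under the weather"

print(day_label(2))    # prints "feeling fine"
print(day_label(3.5))  # prints "maybe a bit under the weather"
print(day_label(5))    # prints "likely a bit under the weather"
```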
== EDIT 2023-06-08: Using historical data to "validate" approach ==
[[File:QF symptom distribution.png|thumb|200x200px|Total distribution of my observed daily symptom scores. For most of the ~1100 days I have not reported any symptoms (score of zero).]]
During one of the [[Personal Science Community|recent self-research chats]] I was inspired to use some of my existing data to see whether it could validate my heuristic to some degree. Because in addition to all of my wearable data, I happen to have around 3 years (or ~1100 days) worth of self-reported symptom data from being a daily user of the [[Quantified Flu]] (QF) project.
=== Looking at my daily symptom reports ===
Each day at around 6pm local time I report whether I experience any symptoms of infection (e.g. cough, headache, fever, nausea, diarrhea, …). Each of these symptoms is scored as a Likert item between 0–4. As this validation doesn't care much about the different kinds of symptoms but rather takes a more abstract, high-level view, we can just sum up all the symptom scores into one total score. As there are 12 scored symptoms at the moment, this gives us a daily symptom score between 0 and 48.
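In pandas this summing step is a one-liner over the symptom columns. The column names below are assumptions for illustration (QF's actual export naming may differ); the 12 items rated 0–4 and the resulting 0–48 range come from the text above.

```python
import pandas as pd

# Hypothetical column names standing in for the 12 QF symptoms,
# each rated 0-4 per day.
symptoms = ["cough", "headache", "fever", "nausea", "diarrhea", "fatigue",
            "muscle_pain", "sore_throat", "runny_nose", "sneezing",
            "chills", "loss_of_smell"]

# Toy data: one symptom-free day, one day with every item rated 1.
qf = pd.DataFrame({
    "date": pd.to_datetime(["2023-06-01", "2023-06-02"]),
    **{s: [0, 1] for s in symptoms},
})

# Sum the 12 Likert items into one daily total (possible range: 0-48).
qf["symptom_score"] = qf[symptoms].sum(axis=1)
print(qf[["date", "symptom_score"]])  # day 1: 0, day 2: 12
```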
As seen in the score distribution on the right, those combined scores are either zero or hover close to zero for the most part. And even at the extremes my values during this period never got larger than a total symptom score of 13. Overall, this distribution looks quite similar to the scores I created for my heuristic (see above).
=== Combining the symptom scores from QF with my heuristic ===
[[File:Boxplot prediction syptoms.png|thumb|A boxplot comparing the physiological deviation scores used for the heuristic to my symptom scores as recorded in Quantified Flu. ]]
With those daily symptom scores in hand, I could now merge the data with the "feeling fine" scores from my heuristic by just matching them on the date. This results in ~1100 combined data points for which I have both the daily symptom score and the heuristic prediction.
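The date-matching itself can be done with a simple inner join. The frame and column names here are assumptions; an inner join keeps only the days present in both data sets, which is what produces the ~1100 combined points.

```python
import pandas as pd

# Toy frames standing in for the real exports (column names are assumptions).
heuristic = pd.DataFrame({
    "date": pd.to_datetime(["2023-06-01", "2023-06-02", "2023-06-03"]),
    "deviation_score": [0, 4, 6],
})
qf = pd.DataFrame({
    "date": pd.to_datetime(["2023-06-01", "2023-06-02"]),
    "symptom_score": [0, 5],
})

# Inner join on the date keeps only days that appear in both data sets.
combined = heuristic.merge(qf, on="date", how="inner")
print(len(combined))  # prints 2
```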
==== Visual validation ====
This allows me to make a simple boxplot for some visual validation of the approach (see figure on the right). On the X-axis is the "physiological deviation" score calculated according to the approach outlined above. To recap: higher scores indicate a larger deviation of my physiological parameters from my personal "norm", which I assume indicates somehow not feeling 100% okay. On the Y-axis is the "symptom score" – the sum of my reported symptom intensities.
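A plot like this can be built by grouping the symptom scores by deviation score and drawing one box per group; a sketch with made-up toy values (the real analysis uses the ~1100 merged days, and the column names are assumptions):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; write to file instead of a window
import matplotlib.pyplot as plt
import pandas as pd

# Toy combined data standing in for the merged ~1100 days.
combined = pd.DataFrame({
    "deviation_score": [0, 0, 1, 1, 2, 3, 4, 4, 5, 6],
    "symptom_score":   [0, 0, 0, 2, 0, 1, 0, 3, 6, 9],
})

# One box of symptom scores per physiological deviation score.
groups = [g["symptom_score"].values
          for _, g in combined.groupby("deviation_score")]
labels = sorted(combined["deviation_score"].unique())

plt.boxplot(groups)
plt.xticks(range(1, len(labels) + 1), labels)
plt.xlabel("physiological deviation score")
plt.ylabel("daily symptom score")
plt.savefig("boxplot.png")
```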
If the heuristic works, then the average symptom scores should be higher on days where the physiological deviation score is higher. And broadly speaking, that's what the boxplot shows. The median symptom ratings (the black horizontal bars in the boxes) are zero for days with a physiological score ≤4 and then start going up. And while the median symptom score still remains zero at a physiological score of 4, we can already see the distribution shifting, as the top of the box (the third quartile) moves up from zero.
But we also see plenty of symptom score outliers (the points) for physiological scores between 0 – 3. These are cases where the heuristic would predict that I'm doing fine, but where I still have some symptoms. I think this can partially be explained by the fact that not all QF symptoms are limited to infections/inflammations. For example, fatigue, headaches and muscle pains can come from a variety of sources that aren't infections and thus might show up less in metrics such as HRV or body temperature.
Additionally, I experienced what's known as a ''smoker's flu'' or ''quitter's flu''<ref>https://www.verywellmind.com/quitters-flu-2824817</ref> when I stopped smoking in early 2023. During it the body can exhibit a lot of symptoms that look like a cold or flu but are just part of the body's recovery. Curiously, most of my physiological data during that time looked quite unaffected.
==== Statistical validation: Permutation test ====
[[File:Heuristic-validation-permutation-test.png|thumb|Results of the permutation test. Randomly generated differences in blue, pink vertical bar gives observed difference.]]
Based on the visual examination of the boxplot, this heuristic does look quite promising, with larger deviations having on average higher symptom counts/intensities than less-deviating days. And given that both the physiological deviation scores (by definition) and the symptom scores skew towards lower numbers, it seems unlikely that this would be purely by chance. But just to feel a bit more confident in these results, I decided to perform a permutation test to explore how extreme the actually observed difference between those values is<ref>https://mac-theobio.github.io/QMEE/lectures/permutation_examples.notes.html</ref>.
To do this I applied the heuristic cut-off as outlined above, labeling physiological deviations of ≥4 as "probably somehow being sick" and values of <4 as "not sick". This splits the data set into two groups and allows us to calculate the average difference in symptom scores between the two groups. In a permutation test one then randomly shuffles the labels without replacement, meaning the same numbers of sick/not-sick labels are now randomly attached to the observed symptom scores. From this one can calculate how big the symptom score difference is in this random shuffle.
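The labeling, the observed group difference, and a single shuffle can be sketched like this with toy numbers (the array names and values are my own illustration; only the ≥4 cut-off comes from the text):

```python
import numpy as np

# Toy arrays standing in for the ~1100 paired observations.
deviation = np.array([0, 1, 2, 3, 4, 5, 6, 0, 4, 5])
symptoms = np.array([0, 0, 1, 0, 2, 4, 6, 0, 3, 5])

# Label days with deviation >= 4 as "probably sick", the rest as "not sick".
sick = deviation >= 4
observed_diff = symptoms[sick].mean() - symptoms[~sick].mean()
print(observed_diff)  # prints 3.8 for this toy data

# One permutation: shuffle the labels (sampling without replacement) and
# recompute the group difference under the null of no association.
rng = np.random.default_rng(42)
shuffled = rng.permutation(sick)
random_diff = symptoms[shuffled].mean() - symptoms[~shuffled].mean()
```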
If we now repeat the random shuffle some hundreds or thousands of times, we can build an expectation of how often one would randomly find a difference between the two groups at least as extreme as the actually observed one. The figure on the right shows the results: the real average difference in symptom score between the sick and not-sick groups is above 1.5 (pink vertical line), while across the 10,000 random shuffles the majority of score differences center around zero (blue bars). The corresponding [[Is it chance? Use a T-Test to identify how likely an intervention worked|p-value]] comes out tiny, with ''p < 2.2e-16''.
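The full loop looks roughly like this. This is a sketch with made-up toy numbers, not my actual analysis code; the real test runs over the ~1100 merged days, and the reported p-value above comes from that data, not from this example.

```python
import numpy as np

# Toy data standing in for the ~1100 paired observations.
deviation = np.array([0, 1, 2, 3, 4, 5, 6, 0, 4, 5, 1, 2, 0, 3, 6, 5])
symptoms = np.array([0, 0, 1, 0, 2, 4, 6, 0, 3, 5, 0, 1, 0, 0, 7, 4])

# Observed difference between the "sick" (deviation >= 4) and "not sick" groups.
sick = deviation >= 4
observed = symptoms[sick].mean() - symptoms[~sick].mean()

# Re-shuffle the labels 10,000 times and record the difference each time.
rng = np.random.default_rng(0)
n_perm = 10_000
diffs = np.empty(n_perm)
for i in range(n_perm):
    shuffled = rng.permutation(sick)  # shuffle labels, not values
    diffs[i] = symptoms[shuffled].mean() - symptoms[~shuffled].mean()

# Empirical one-sided p-value: share of shuffles at least as extreme
# as the observed difference (+1 correction avoids a p-value of zero).
p = (np.sum(diffs >= observed) + 1) / (n_perm + 1)
print(p)
```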
=== A validated heuristic? ===
Overall, I feel like this makes me somewhat more confident in the predictions given by the heuristic, in particular for the more extreme deviations (e.g. only very rarely are there cases where the score is ≥5 and I don't report any symptoms – which might also just have been me not recording properly). It's also interesting how my previously chosen cutoff of values of 4 or larger seems to be right on the borderline, with more than half of the days where a 4 is reported not having any symptoms associated. It could be interesting to explore whether that assessment changes on the following day, i.e. whether the symptom report just lags by a day!
Equally interesting to me are the outliers: days where the physiological data is perfectly normal, but I still experience a lot of symptoms. It might be worth looking deeper into those days in particular!
    
== References ==
<references />