Flash Cards as Cognitive Test

From Personal Science Wiki

The main place for news about this project is a thread on the Anki forum, because most of the audience is there.

Project Infobox
Self researcher(s) User:DG
Related tools Anki, Spaced Repetition
Related topics Tools for Cognitive Testing

Builds on project(s)
Of Trivial Value? Lessons From Using SuperMemo, Spaced Repetition: A Cognitive QS Method for Knowledge Acquisition, Knowledge Tracking, Memory and Learning
Has inspired Projects (0)


Flash cards are cards with a question on one side and the answer on the other. They are used for memorization[1], building explicit[2] (requiring effort to remember) declarative[3] semantic[4] memory, though the goal of language learning is to make each memory automatic and therefore implicit,[5] system one.[6] They do this through the principle of spaced repetition.[7] Several computer apps have automated the process and recorded a lot of data. Inspired by Wozniak's and Gwern's[8] experiments, I started this project to analyze the data that Anki records, because I expected electronic flashcard data to be useful as a continuous cognitive test. If this works, the resulting information would be useful for Citizen Science and Health Tracking. Academic papers may be successfully replicated here, even though neither complicated multi-step analysis (though common in some places[9][10]) nor single-subject longitudinal observational studies like this project are common on Google Scholar.

It turns out that the project will also teach users about learning and allow them to experiment with and optimize their own learning process. All flashcard apps already optimize their students' learning but do not open the process to the user, except SuperMemo 18 (this project already has different features). The resulting visualizations encourage studying by illustrating success in a different way than existing apps, similar to gamification.

The project is a work in progress and much of the intended functionality is not yet working. I cannot guarantee that each of the goals will be all that useful to the end user. The project will take about ten thousand lines of code to complete, so I expect to burn out a few times before it is done. No AI or LLM was used in the making of this project.

Other Goals

Simply adding colorful plots that Anki does not already have will encourage many people to study more. Several variables may correlate with general mood and feelings toward Anki.

The goal of teaching is almost as easy to guarantee because analyzing the data requires delving into the learning process. This goal may actually be the most impactful.

The project will advise the user on how to optimize their studying by comparing what actually happened with what would have happened if they had done something different,[11] according to a machine learning model. Effects will be illustrated using Partial Dependence Plots and Individual Conditional Expectation (ICE) plots[12].
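As a rough illustration of how partial dependence and ICE curves are computed, the sketch below uses a toy stand-in for the machine learning model. The function `predict_recall`, its coefficients, and the feature names `interval_days` and `hour_of_day` are hypothetical and are not taken from the project.

```python
from statistics import mean

def predict_recall(interval_days, hour_of_day):
    """Toy stand-in for a trained model: predicted probability of recall.

    Hypothetical: a real analysis would call a fitted ML model here.
    """
    base = 0.95 - 0.01 * interval_days          # longer gaps lower recall
    penalty = 0.02 * max(0, hour_of_day - 20)   # late-night reviews hurt a bit
    return max(0.0, min(1.0, base - penalty))

def ice_curves(rows, grid):
    """One curve per review: vary hour_of_day over the grid, keep the rest fixed."""
    return [[predict_recall(r["interval_days"], h) for h in grid] for r in rows]

def partial_dependence(rows, grid):
    """Average the ICE curves pointwise to get the partial dependence curve."""
    curves = ice_curves(rows, grid)
    return [mean(curve[i] for curve in curves) for i in range(len(grid))]

# Hypothetical reviews; only interval_days is held fixed per row.
reviews = [
    {"interval_days": 1, "hour_of_day": 9},
    {"interval_days": 10, "hour_of_day": 22},
    {"interval_days": 30, "hour_of_day": 14},
]
hours = [8, 12, 16, 20, 23]
pd_curve = partial_dependence(reviews, hours)
for h, value in zip(hours, pd_curve):
    print(f"hour={h:2d}  mean predicted recall={value:.3f}")
```

Each ICE curve answers "what would the model have predicted for this review at a different hour", and the partial dependence curve is their average. Plotting both shows whether the average hides reviews that react differently.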

To help the user conduct experiments, the project will compare one part of a time series with another, or progress on one set of cards with another. The user may have to use another tool like Open Humans.
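One simple way to compare two parts of a time series is a permutation test on the difference in means. This is a minimal sketch under the assumption that days are exchangeable, which daily study data may violate because neighbouring days tend to be correlated. The data and the function name `permutation_test` are made up for illustration.

```python
import random
from statistics import mean

def permutation_test(before, after, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns (observed difference, approximate p-value).
    """
    rng = random.Random(seed)
    observed = mean(after) - mean(before)
    pooled = list(before) + list(after)
    n_before = len(before)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[n_before:]) - mean(pooled[:n_before])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return observed, (extreme + 1) / (n_permutations + 1)

# Hypothetical daily success rates before and during an experiment.
baseline = [0.82, 0.85, 0.80, 0.84, 0.83, 0.81, 0.86]
experiment = [0.88, 0.90, 0.87, 0.91, 0.89, 0.86, 0.92]
diff, p_value = permutation_test(baseline, experiment)
print(f"difference in means: {diff:.3f}, p = {p_value:.4f}")
```

The same function could compare two sets of cards instead of two periods, by passing per-card success rates for each set.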

Flash cards may train memorization or at least metacognitive[13] skills.[14][15]

Potential Tests of Emotion

These tests should mostly measure the user's opinion of Anki itself and possibly general mood. They are possibly strongly influenced by outside factors. They are useful for optimizing Anki performance but not for cognitive testing. For example, the expectation of a reward (candy) right after a session could improve performance by improving mood or opinion of Anki.

  • Starting early in the day
  • Not skipping days
  • Not getting distracted
  • Reviewing cards fast
  • Reviewing many cards
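Several of the habits listed above can be read off a review log. The sketch below assumes each review is available as a pair of epoch-millisecond timestamp and answer time in milliseconds, which is how Anki's `revlog` table stores its `id` and `time` columns. It uses UTC calendar days for simplicity, whereas a real analysis would use local time and Anki's day rollover. The function names are hypothetical, and distraction is not measured here.

```python
from collections import defaultdict
from datetime import datetime, timezone

def daily_habit_metrics(reviews):
    """Summarize study habits per calendar day (UTC for simplicity).

    reviews: iterable of (timestamp_ms, answer_time_ms) pairs.
    """
    by_day = defaultdict(list)
    for timestamp_ms, answer_ms in reviews:
        moment = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
        by_day[moment.date()].append((moment, answer_ms))

    metrics = {}
    for day, items in sorted(by_day.items()):
        first = min(moment for moment, _ in items)
        metrics[day] = {
            "start_hour": first.hour + first.minute / 60,
            "cards_reviewed": len(items),
            "seconds_per_card": sum(ms for _, ms in items) / len(items) / 1000,
        }
    return metrics

def skipped_days(metrics):
    """Days between the first and last study day with no reviews at all."""
    days = sorted(metrics)
    total_span = (days[-1] - days[0]).days + 1
    return total_span - len(days)

def to_ms(year, month, day, hour, minute):
    """Helper for building sample timestamps in epoch milliseconds."""
    moment = datetime(year, month, day, hour, minute, tzinfo=timezone.utc)
    return int(moment.timestamp() * 1000)

# Hypothetical sample: two reviews on 1 January, one on 3 January.
sample = [
    (to_ms(2024, 1, 1, 8, 30), 4000),
    (to_ms(2024, 1, 1, 8, 31), 6000),
    (to_ms(2024, 1, 3, 21, 0), 9000),
]
habit = daily_habit_metrics(sample)
for day, values in habit.items():
    print(day, values)
print("skipped days:", skipped_days(habit))
```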

Confounders and Artifacts of Procedure

Any decent skill test will detect when the subject is severely sick. For a test to be useful for optimization and experimentation, it must detect more subtle patterns. Statistical tests detect plenty of patterns in my data that are both subtle enough and clearly not generated by the same process as the rest of the data.[16] Unfortunately, those patterns could easily be artifacts of the analysis or of the test-taking process. Including all available variables that should NOT correlate with the target in a machine learning model, and then using the resulting residual errors as the final test results, should help, but not all variables that cause artifacts are available. Some independent variables will not be usable by the ML model, in which case I will plot them against the test series with changepoints.
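The residual idea can be shown with a deliberately simplified stand-in: one nuisance variable and an ordinary least squares line instead of a full machine learning model. The variable names and numbers below are hypothetical.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def residuals(x, y):
    """Part of y left over after removing the linear effect of x."""
    intercept, slope = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Hypothetical data: daily success rate and a nuisance variable
# (mean card interval in days) that should not reflect the user's state.
mean_interval = [5, 8, 12, 15, 20, 25, 30]
success_rate = [0.92, 0.90, 0.88, 0.80, 0.84, 0.82, 0.79]
adjusted = residuals(mean_interval, success_rate)
for day, value in enumerate(adjusted, start=1):
    print(f"day {day}: residual {value:+.3f}")
```

In this toy data the raw success rate falls steadily as intervals grow, but after adjustment only day 4 stands out, which is the kind of pattern the residual series is meant to expose.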

Some of the artifacts will differ between tests, so comparing results from multiple tests may yield a single decent time series that depends strongly on skill. If these solutions are not enough, one of the tests may correlate with and transfer to an established, validated cognitive test. That would still allow experimentation on and optimization of a skill, but through a cognitive ability.
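One possible way to combine several tests into a single series is to standardize each test and average them day by day, so that artifacts specific to one test are diluted while a shared signal remains. This is only a sketch of that assumption, not the project's method. The three example series are invented and are all oriented so that higher is better.

```python
from statistics import mean, pstdev

def z_scores(series):
    """Standardize a series to mean 0 and standard deviation 1."""
    mu, sigma = mean(series), pstdev(series)
    return [(value - mu) / sigma for value in series]

def composite(*test_series):
    """Average the standardized tests day by day."""
    standardized = [z_scores(series) for series in test_series]
    return [mean(day) for day in zip(*standardized)]

# Hypothetical daily results from three tests over five days.
recall_rate = [0.85, 0.86, 0.70, 0.84, 0.87]
new_card_success = [0.60, 0.62, 0.45, 0.61, 0.63]
cards_per_minute = [6.0, 6.5, 4.0, 6.2, 6.4]
combined = composite(recall_rate, new_card_success, cards_per_minute)
for day, value in enumerate(combined, start=1):
    print(f"day {day}: composite {value:+.2f}")
```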

Cognitive Health

Everyone should track their cognitive ability as much as health-conscious people track their heart rate and exercise. A faster decline in learning performance is a warning sign in the elderly.[17][18] IQ tests are supposed to have high 'reliability' and not change much between days for any individual. Formal cognitive testing takes too much time, and the effect on daily life of the specific ability a cognitive test measures is often questioned. Skill trainers and testers like typing tutors have none of these problems. However, even if a skill test is useful for optimizing the skill, factors such as dependence on psychological state may keep it from being a good cognitive test. To become validated[19] as a cognitive or health test, the skill test will have to correlate with validated cognitive tests or with obviously important health outcomes, or at least transfer to other tests. The tests are unlikely to be pure in the sense of Quantified Mind's science page. This makes them better checks for general ill health but makes it harder to diagnose a specific problem with them. The skill of memorizing from flashcards may be trainable. It may even transfer to memory in general.

Many elements of important memory theory should be reproducible in this project even if the specific tests do not match exactly.[20] Most of the tests in the cited paper tested whether an item had been seen before (recognition), not whether a response was remembered with the help of a cue (cued recall).[21] In scientific papers the recall delay is usually small relative to spaced retrieval, and recall is almost never tested continuously or longitudinally. Hopefully, the tests will transfer to the cognitive concept of memory and prove important in everyday functioning.

Most aspects of 1-day delayed free recall become impaired, by differing amounts, as age increases.[22]

Specific cognitive health tests

The goal of the Anki skill is to minimize the user time required to remember the information on a card until a target future date. Decomposing the general skill[23][24] by looking for sources of time loss yields learning speed, relearning speed, forgetting, memory consolidation, ability to remember effectively today, slowness to answer, and distraction.[25] Each corresponds to a potential test and its validations. Consider "test-retest" reliability[26] and applicability to longitudinal studies. The New Theory of Disuse separates memory into retrieval strength and storage strength, or middle- and long-term memory.[27][28]

Most tests are intended to diagnose clinical (sick) persons, not to analyze cognition in healthy people.[29]

Consolidation of Memory[30][31]

Whether the cards studied on a given day were memorized better or worse, as will be seen on their next review. This is like the following two potential identical tests, but more generally about change and not about speed?
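A minimal sketch of this attribution: each review's outcome is credited to the day the same card was previously studied. It assumes a simplified log of (card, day number, passed) tuples with at most one review per card per day. The function name and the data are hypothetical.

```python
from collections import defaultdict

def consolidation_by_day(reviews):
    """Credit each review's outcome to the day of the card's previous review.

    reviews: list of (card_id, day_number, passed) tuples.
    Returns {day_number: fraction of that day's cards that passed next time}.
    """
    by_card = defaultdict(list)
    for card_id, day, passed in reviews:
        by_card[card_id].append((day, passed))

    outcomes = defaultdict(list)
    for history in by_card.values():
        history.sort()
        # Pair every review with the next review of the same card.
        for (study_day, _), (_, next_passed) in zip(history, history[1:]):
            outcomes[study_day].append(next_passed)

    return {day: sum(r) / len(r) for day, r in sorted(outcomes.items())}

# Hypothetical log: cards studied on day 3 tend to fail at their next review.
log = [
    ("a", 1, True), ("a", 3, True), ("a", 8, True),
    ("b", 1, True), ("b", 4, True),
    ("c", 3, True), ("c", 6, False),
    ("d", 3, True), ("d", 7, False),
]
consolidation = consolidation_by_day(log)
print(consolidation)
```

Note that every review on day 3 was answered correctly at the time, so the poor result only shows up when outcomes are shifted back to the study day.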

Gwern found that treadmill use negatively affected future card success but not current recall.[32]

Recall[33]

Remembering on a given day. If long-term, then "retention". Wozniak explains it better, but I would say that this should be separated into real forgetting, which changes state for several reviews, and a merely momentary lapse. Both are measured separately, making two tests. There may also be milder changes in a card's consolidation state that are not just forgetting. This may be a third test.

Wozniak of SuperMemo has found a correlation between sleep (recency, debt) and flash card skill.[34][35][36] Here alertness corresponds to recall.

Long-delay cued recall (or CVLT?) declines with age but improves with testing (a testing effect on the specific memory or on the general skill?).[37]

Time taken to learn a new word

Short-term memory, from seconds to hours.[38] If only the first session is considered, learning new words in Anki resembles cued verbal learning as in the CVLT, a cognitive test, though in the CVLT[39] the cue is much weaker. Verbal learning also transfers to other abilities: "There is considerable evidence that verbal learning correlates reasonably strongly with performance in a number of important practical tasks. For instance, verbal learning tests have been demonstrated to be highly correlated with prospective remembering in real life, which means remembering to perform a planned action at the appropriate time [29]."[40] Severe sleep deprivation harms verbal learning.[41] Most of the time spent on this task may go to short-term memory (10 min) rather than mid- to long-term memory (1 day), so the real test could be the percentage of new words learned successfully rather than speed. Age and memory complaints correlate with poorer cued recall at a 55-minute delay, and the test may identify subtle impairments.[42]

Time taken to relearn a forgotten word

Like learning a new word, but with a stronger cue or with part of the memory still in mind.

Forgetting between sessions

Forgetting as a consequence of something bad happening between test days, like a concussion. This is similar to multiple sequential days being bad for longer-term cards, but over a much longer period of test days. So let's say a concussion makes all cards last reviewed a week to a month earlier more likely to be forgotten, but only those in the mid-term state. When those cards are reviewed, they would affect the score, lowering overall memory for some time period (before or after the concussion, depending). So this is not a test but rather a pattern in other tests. In each test and each session, should results also be subdivided by the card's original state, in case that reveals an informative pattern?

Speed answering cards

Fluency is a goal of language learning[43] so less time per card is good.

Versus restraint: taking enough time before answering a prompt rather than speeding through it.

The time taken to answer a question will be compared against the ML model's recommended amount. Both are likely influenced more by emotion than by mental ability.

Predicting optimal interval

Choosing the optimal interval (day, month) for the next review after seeing the answer. This will also be compared against the ML model's recommended amount.

A judgement of learning (JOL) could be made after the cue (question) or after the answer. Delayed, cue-only JOLs produced the highest correlation with actual recall. If the answer is added, though, the effect goes way down. Worse still, study decisions are no better than only rereading. This seems to be because users do not realize that they learned from the last test.[44] This is metacognitive. Making JOLs may also increase memory of the items.

Cognitive endurance

How long can the user continue to recall and learn without accruing fatigue? This may be a matter of psychology. I suspect there will be a limit to how long each user should study without taking a break.
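One way to look for such a limit is to split a session into consecutive blocks of reviews and watch for the block where accuracy drops. The sketch below is illustrative only: the block size of 10, the drop threshold of 0.15, the function names, and the data are all assumptions.

```python
def accuracy_by_block(session, block_size=10):
    """Success rate for each consecutive block of reviews in one session.

    session: list of booleans in the order the cards were answered.
    """
    blocks = [session[i:i + block_size] for i in range(0, len(session), block_size)]
    return [sum(block) / len(block) for block in blocks]

def first_fatigued_block(rates, drop=0.15):
    """Index of the first block at least `drop` below the opening block, or None."""
    for index, rate in enumerate(rates):
        if rates[0] - rate >= drop:
            return index
    return None

# Hypothetical 40-card session: accurate early, more misses toward the end.
answers = (
    [True] * 9 + [False]
    + [True] * 9 + [False]
    + [True] * 7 + [False] * 3
    + [True] * 6 + [False] * 4
)
rates = accuracy_by_block(answers)
print("accuracy per block:", rates)
print("first fatigued block:", first_fatigued_block(rates))
```

A real analysis would also have to separate fatigue from card difficulty, since harder cards may not be spread evenly through a session.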

References

  1. en.wikipedia.org/wiki/Testing_effect
  2. en.wikipedia.org/wiki/Explicit_memory
  3. en.wikipedia.org/wiki/Declarative_learning
  4. en.wikipedia.org/wiki/Semantic_memory
  5. en.wikipedia.org/wiki/Implicit_memory
  6. bokcenter.harvard.edu/how-memory-works
  7. https://www.gwern.net/Spaced-repetition
  8. gwern.net/treadmill#treadmill-effect-on-spaced-repetition-performance-randomized-experiment
  9. www.kaggle.com/competitions?hostSegmentIdFilter=2&searchQuery=university
  10. paperswithcode.com/dataset/shhs
  11. scite.ai/reports/very-long-term-memory-for-knowledge-nLe1R6
  12. Visualizing ML Models with LIME · UC Business Analytics R Programming Guide (uc-r.github.io)
  13. www.ncbi.nlm.nih.gov/mesh/2009719
  14. academic.oup.com/psychsocgerontology/article/68/2/153/570871
  15. www.tandfonline.com/doi/abs/10.1080/13803390590935462
  16. en.wikipedia.org/wiki/Internal_consistency
  17. n.neurology.org/content/42/2/396.short
  18. www.tandfonline.com/doi/abs/10.1080/13825580600954256
  19. en.wikipedia.org/wiki/Validity_(statistics)
  20. Bjork, R. A., & Bjork, E. L. (1992). A new theory of disuse and an old theory of stimulus fluctuation. In A. F. Healy, S. M. Kosslyn, & R. M. Shiffrin (Eds.), Essays in honor of William K. Estes, Vol. 1. From learning theory to connectionist theory; Vol. 2. From learning processes to cognitive processes (pp. 35–67). Lawrence Erlbaum Associates, Inc.
  21. techpsych.wordpress.ncsu.edu/2020/03/18/hello-world/
  22. Acquisition, Recall, and Forgetting of Verbal Information in Long-Term Memory by Young, Middle-Aged, and Elderly Individuals, Cortex, Volume 39, Issues 4–5, 2003, academiccommons.columbia.edu/doi/10.7916/9p9w-tc61/download
  23. en.wikipedia.org/wiki/Job_analysis
  24. Foundations of Fluency: An Exploration: Reading Psychology: Vol 26, No 2 (tandfonline.com) www.tandfonline.com/doi/abs/10.1080/02702710590930519
  25. compmem.princeton.edu/wp/wp-content/uploads/2020/04/replay-based-consolidation-governs-enduring-memory-storage.pdf
  26. www.tandfonline.com/doi/abs/10.1080/13854046.2017.1310300
  27. www.learningscientists.org/blog/2016/5/10-1
  28. www.researchgate.net/profile/Robert-Bjork-2/publication/281322665_A_new_theory_of_disuse_and_an_old_theory_of_stimulus_fluctuation/links/58b6f20945851591c5d55e96/A-new-theory-of-disuse-and-an-old-theory-of-stimulus-fluctuation.pdf
  29. www.ncbi.nlm.nih.gov/pmc/articles/PMC5829025/
  30. supermemo.guru/wiki/Memory_consolidation
  31. www.ncbi.nlm.nih.gov/mesh/2009921
  32. gwern.net/treadmill#treadmill-effect-on-spaced-repetition-performance-randomized-experiment
  33. supermemo.guru/wiki/Recall
  34. supermemo.guru/wiki/Sleep_and_learning#Studying_sleep_and_learning_with_SuperMemo
  35. supermemo.guru/wiki/Biphasic_life#Biphasic_learning
  36. supermemopedia.com/wiki/SleepChart
  37. Longitudinal changes in verbal memory in older adults Distinguishing the effects of age from repeat testing Melissa Lamar, Susan M. Resnick, Alan B. Zonderman Neurology Jan 2003, 60 (1) 82-86; DOI: 10.1212/WNL.60.1.82
  38. Memory, Short-Term - MeSH - NCBI (nih.gov) www.ncbi.nlm.nih.gov/mesh/68008570
  39. en.wikipedia.org/wiki/California_Verbal_Learning_Test
  40. www.quantified-mind.com/science
  41. pubmed.ncbi.nlm.nih.gov/10688201/
  42. pubmed.ncbi.nlm.nih.gov/35786222/
  43. What's the story? The tale of reading fluency told at speed - Benjamin - 2012 - Human Brain Mapping - Wiley Online Library onlinelibrary.wiley.com/doi/abs/10.1002/hbm.21384
  44. Kornell, Nate, and Matthew G. Rhodes. "Feedback Reduces the Metacognitive Benefit of Tests." Journal of Experimental Psychology: Applied, vol. 19, no. 1, Mar. 2013, p. 1.