{{Topic Infobox}}
: ''This article is about when and how to do statistical testing with t-tests. For a broader article on statistics, see [[Statistical testing]].''
A frequent problem in personal science is that you have tried an intervention and want to see whether it worked, but you are unsure whether any differences you observed might just be due to chance. So how can you tell how likely it is that your results are due to chance alone? One of the simplest tests for this is the '''t-test''', sometimes called a “Student's t-test”<ref>https://en.wikipedia.org/wiki/Student%27s_t-test</ref>, which summarizes the answer in a single ''p-value''.
==Background==
 
Generally, statisticians use the concept of ''p-value''s to describe how often you would expect to observe the effect you did observe just by chance. As a simplified example, a p-value of 0.05 means that you would observe this effect in 5 out of 100 repetitions of your intervention by pure chance alone. While this crude measure doesn’t describe all the ways something might happen due to chance, generally the lower the p-value, the better.

Professional scientists, especially those who understand statistics, are skeptical of claiming a result based purely on p-values, but for Personal Science purposes it’s a good start. There is no ''“correct”'' p-value cutoff that determines whether an effect is not due to chance alone, but traditionally any p-value smaller than 0.05 is taken to deserve a closer look and to indicate that the effect is probably ''real''.
==What you need==  
 
To be able to perform a t-test you need your numerical data to be split into exactly two conditions. Depending on the kind of intervention you did, there are two different ways to split your data (see the short sketch below):

#Grouped (also called 'paired') data, e.g. measurements that are taken before and after the intervention, always creating a pair of data points. An example of this would be if you regularly measure your [[blood pressure]] immediately before drinking a coffee and then take a second measurement after the consumption. This way, each time you do the intervention you generate one pair of data points.
#Independently sampled data, e.g. when there is no clear before/after. As an example, suppose you would like to know if taking a [[Melatonin]] supplement will help you sleep longer. You measure your daily sleep, taking the supplement on some days (the “intervention”) and not on others (“control”). Each night of sleep data now goes into either the ''intervention'' or the ''control'' group.

In order to perform a grouped/paired t-test you need exactly the same number of observations in both groups to keep the pairing intact. For independently sampled data this is not required, i.e. you can still do an independent t-test if you have more observations for interventions than for controls (or vice versa).
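As a rough illustration, here is a minimal [[R]] sketch of the paired case (R is covered in more detail in the practical examples below; the blood pressure numbers are made up and only stand in for your own measurements):
<syntaxhighlight lang="r">
# Hypothetical paired measurements: one "before coffee" and one "after coffee"
# systolic blood pressure reading (mmHg) per day, in the same order.
before_coffee <- c(118, 122, 115, 121, 119, 117)
after_coffee  <- c(121, 125, 118, 122, 124, 119)

# Paired t-test: both vectors must contain exactly the same number of values,
# because the i-th entries belong together as one pair.
t.test(after_coffee, before_coffee, paired = TRUE)
</syntaxhighlight>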
===''Tails'' in a t-test===
 
To perform a t-test one also needs to decide whether to perform a ''one-tailed'' or a ''two-tailed'' test. A one-tailed test only looks for a difference in one pre-specified direction (e.g. that the intervention ''increases'' the measured value), while a two-tailed test looks for a difference in either direction.

As an example, a large p-value from a one-tailed test of whether coffee ''lowers'' a person's blood pressure does not rule out that the blood pressure after drinking coffee has actually increased. It just means that it probably has not decreased. In contrast, if the same data had been analysed with a two-tailed test and had resulted in a large p-value, we would assume that the blood pressure has neither increased nor decreased.
====One- or two-tailed test? Benefits & drawbacks====
 
One- and two-tailed tests have their own benefits and drawbacks: a one-tailed test is more sensitive – i.e. the p-values become low more quickly if there is a real effect – than a two-tailed test. This increase in sensitivity comes at the cost of having to specify a clear direction in advance and not being able to detect differences that go against the chosen direction. Depending on your question, this might or might not be acceptable.

For example, you might only worry about whether coffee increases your blood pressure and not be interested in whether it might actually decrease it. In other cases you might want to know whether an intervention has ''any'' effect, be it positive or negative. In these cases, and in cases where you do not have a clear prior expectation, using a two-tailed test might be worth it even at the cost of some sensitivity.
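In [[R]], for instance, the direction is chosen via the <code>alternative</code> argument of <code>t.test()</code>; here is a minimal sketch using the same made-up blood pressure values as above:
<syntaxhighlight lang="r">
# Made-up paired blood pressure readings (placeholders for your own data).
before_coffee <- c(118, 122, 115, 121, 119, 117)
after_coffee  <- c(121, 125, 118, 122, 124, 119)

# One-tailed: only tests whether blood pressure is *lower* after coffee.
t.test(after_coffee, before_coffee, paired = TRUE, alternative = "less")

# Two-tailed: tests for a change in either direction (the default).
t.test(after_coffee, before_coffee, paired = TRUE, alternative = "two.sided")
</syntaxhighlight>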
==Practical examples==
===t-tests in Excel===
 
Here’s an example of how to do a t-test in [[Excel]]. It uses the example of evaluating the impact of taking melatonin on sleep duration.
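One common way to do this, assuming the sleep durations of your control nights are in cells <code>A2:A11</code> and those of your melatonin nights in <code>B2:B11</code> (hypothetical ranges), is Excel’s built-in <code>T.TEST</code> worksheet function: <code>=T.TEST(A2:A11, B2:B11, 2, 3)</code> performs a two-tailed, independent (unequal-variance) t-test and returns the p-value directly.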
The p-value in this example, <code>0.24</code>, is above <code>0.05</code> and therefore we will assume that any difference in sleep between the nights is due to pure chance.
 
===t-tests in R===
 
You can do the same t-test in the statistical programming language [[R]], using the same example:  
 
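The snippet below is a minimal sketch rather than the exact original spreadsheet data: the sleep values are made up, so the resulting p-value will differ from the <code>0.24</code> above. <code>t.test()</code> is part of base R and by default performs an independent, two-tailed (Welch) test.
<syntaxhighlight lang="r">
# Hypothetical sleep durations in hours; replace these with your own data.
control_sleep   <- c(6.9, 7.2, 6.5, 7.0, 6.8, 7.1, 6.7)
melatonin_sleep <- c(7.4, 6.9, 7.6, 7.2, 7.8, 7.0)

# Independent two-tailed t-test; the two groups may have different sizes.
result <- t.test(melatonin_sleep, control_sleep,
                 alternative = "two.sided", paired = FALSE)

result           # prints the full test summary, including the p-value
result$p.value   # just the p-value, to compare against your cutoff (e.g. 0.05)
</syntaxhighlight>
For a paired design you would instead pass <code>paired = TRUE</code>, as shown in the earlier sketch.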
==Limitations==
 
T-tests remain an easy and widely used statistical tool to investigate whether observed differences between conditions are likely to be due to random chance or not. One of their main limitations is that they only work for comparisons between exactly two groups. If you run more complex interventions or experiments, the t-test might not be the right tool for you.
===Technical, statistical aspects===
The standard t-test also expects that your measurements (or pairs of measurements in the case of paired tests) are independent of each other. Depending on what you are measuring this might not be the case, in particular for time-series data where you take very frequent samples (e.g. two heart rate measurements taken a minute apart will be highly correlated with each other).
==References==
 
<references />
 
{{Topic Queries}}
 
[[Category:Topics]]
[[Category:Data analysis]]
