# Replace "path/to/your/directory" with the full path to your working folder
setwd("path/to/your/directory")

Lab 3: T-Tests
An NCRM Case Study in Pedagogy
Introduction
Hypothesis Testing
Moving on From Lab 2
Last week we explored the basis of statistical inference. We introduced important ideas around what a sample is, what a sampling distribution is and what the central limit theorem allows us to do in terms of building inferential statements and forming the basis of hypothesis testing.
Last week was, by nature, a highly conceptual exploration of the foundations of statistics. As we build on that knowledge this week, it’s important to remember that while random sampling is the best approach to avoiding systematic bias in data collection, it still involves uncertainty. However, because this uncertainty is random, it is not a cause for concern. We are just as likely to oversample as to undersample, and if the sample size is large enough, most samples we collect will be very close to the population mean. This results in the familiar normal distribution curve, which allows us to begin making inferential claims.
One important distinction in our discussion today versus last week concerns the nature of repeated sampling. Is this something we’re ever likely to do? Probably not. Collecting data can be prohibitively expensive, so having one very well executed data collection exercise is what we will aim for. Repeated sampling is unnecessary for most applications so long as we have collected our data using a random sampling approach. Remember, the central limit theorem is key here and we assume that this holds.
This week we move on to look at hypothesis testing. We will spend the next three weeks looking at hypothesis testing through a variety of statistical tests:
- T-tests (this week)
- Chi-squared tests (next week)
- Correlation tests (following week)
Hypothesis testing is a fundamental concept in statistics used to make inferences about a population based on a sample of data. It provides a structured framework for evaluating claims and determining whether there is enough evidence to support a particular hypothesis. This process is widely applied in scientific research, business analytics, medicine, and various other fields to make data-driven decisions.
The Components of Hypothesis Testing
Hypothesis testing begins with two competing hypotheses:
Null Hypothesis (H₀): This is the default assumption that there is no effect or no significant difference. It represents a statement of no change or status quo.
Alternative Hypothesis (H₁ or Ha): This is the statement that contradicts the null hypothesis, suggesting that there is a significant effect or difference.
For example, if a pharmaceutical company wants to test whether a new drug is more effective than an existing one, the null hypothesis might state that there is no difference in effectiveness between the two drugs, while the alternative hypothesis would suggest that the new drug is more effective.
Steps in Hypothesis Testing
Hypothesis testing follows a common approach that is outlined below. Some of these terms will seem a bit technical, but we will explore these as we go along. For now, let’s just remember that these are the core steps involved in hypothesis testing regardless of whether we are looking at a t-test, a chi-squared test or a correlation test (and the many others you’ll learn after this course!)
- Formulating the Hypotheses: Clearly define the null and alternative hypotheses based on the research question.
What are null and alternative hypotheses?
Null Hypothesis (H0): This is the assumption that there is no effect or no difference between the groups being studied. For example, if we are comparing the test scores of two classes taught using different teaching methods, the null hypothesis would state that there is no difference in their average scores.
Alternative Hypothesis (H1 or Ha): This is the opposite of the null hypothesis and suggests that there is a difference or an effect. In our example, the alternative hypothesis would state that one teaching method leads to higher scores than the other.
Choosing a Significance Level (α): This is the threshold at which we decide whether to reject the null hypothesis. The significance level, commonly set at 0.05 (5%), determines the probability of rejecting the null hypothesis when it is actually true (Type I error).
Selecting a Test Statistic: Depending on the nature of the data and research question, different statistical tests may be used, such as the t-test, chi-squared test, or correlation.
Test Statistic: This is a value calculated from the data, which we compare against a critical value to determine whether the observed results are statistically significant. In this course we’ll examine t-statistics, chi-squared statistics, and Pearson’s and Spearman’s correlation coefficients.
Calculating the p-value: The p-value measures the probability of obtaining results as extreme as the observed ones, assuming the null hypothesis is true. A small p-value (typically less than α) suggests strong evidence against the null hypothesis. Hypothesis testing can be done using either the test statistic itself or the p-value. The p-value can be calculated from each of the test statistics and is consistent in interpretation regardless of which test is used. This commonly makes it the easiest way to determine the outcome of a hypothesis test, because it is standardised.
Making a Decision: If the p-value is less than the significance level, the null hypothesis is rejected in favour of the alternative hypothesis. Otherwise, we fail to reject the null hypothesis.
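The steps above can be sketched in a few lines of R. The data below is simulated purely for illustration (the group names and values are assumptions, not from the lab):

```r
# Hypothetical example: compare simulated test scores for two groups
set.seed(123)                          # make the simulation reproducible
group_a <- rnorm(30, mean = 78, sd = 10)
group_b <- rnorm(30, mean = 85, sd = 10)

alpha  <- 0.05                         # significance level chosen in advance
result <- t.test(group_a, group_b)     # Welch two-sample t-test (R's default)

# Decision rule: reject H0 if the p-value falls below alpha
result$p.value < alpha
```

The final line returns TRUE when we reject the null hypothesis and FALSE when we fail to reject it, mirroring the decision step described above.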
The T Test
Brewing Ideas: The foundations of the T-Test
The t-test was developed by William Sealy Gosset in the early 20th century while working as a statistician for the Guinness Brewery in Dublin. Guinness was interested in improving the quality of its beer by using statistical methods to analyze small samples of barley yields and brewing processes. However, at the time, most statistical techniques were designed for large samples, which were impractical for quality control in brewing.
The key problem Gosset faced was that existing statistical methods assumed that sample sizes were large enough for the central limit theorem to ensure a normal distribution of sample means. But when working with small samples, the variability was greater and less predictable, making it difficult to determine if differences in brewing conditions were statistically significant.
To address this, Gosset developed the Student’s t-distribution, which adjusted for the increased uncertainty in small samples. This led to the creation of the t-test, a method for comparing sample means when sample sizes are small, while accounting for variability. He published his findings under the pseudonym “Student” in 1908 because Guinness prohibited employees from publishing research that might reveal trade secrets.
The t-test has since become a fundamental tool in inferential statistics, widely used in fields beyond brewing, including medicine, psychology, and social sciences, to test hypotheses about population means.
T-tests: variants and purposes
A t-test is used when we want to compare the means of two groups and determine whether any observed differences are statistically significant. The test assumes that the data follows a normal distribution and is particularly useful when sample sizes are small.
There are three main types of t-tests:
One-Sample t-Test: Used when comparing the mean of a single sample to a known population mean. For example, if we want to test whether the average height of a group of students differs from the national average height, we would use a one-sample t-test.
Independent (or Two-Sample) t-Test: Used when comparing the means of two separate groups. For example, if we want to determine whether there is a difference in exam scores between students who attended revision sessions and those who did not, an independent t-test would be appropriate. This is what we will focus on in today’s lab.
Paired t-Test: Used when comparing two related samples, such as measurements taken from the same individuals before and after an intervention. For instance, if we measure students’ test scores before and after a study skills workshop, we would use a paired t-test.
Understanding the Critical Value and t-Statistic
The t-test works by comparing an observed test statistic (the t-value) against a critical value from the t-distribution. The test statistic is calculated using the following formula for an independent t-test:
t = (mean1 - mean2) / (standard error of the difference)
Where:
mean1 and mean2 are the averages of the two groups being compared.
Standard error of the difference accounts for variability in the data and the sample sizes.
The critical value is determined based on the chosen significance level (e.g., 0.05) and the degrees of freedom, which depend on the sample size. If the absolute value of the test statistic is greater than the critical value, we reject the null hypothesis and conclude that there is a significant difference between the groups.
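The "standard error of the difference" in the denominator can be computed from each group's variance and sample size. A common formulation (the one used by Welch's t-test) is SE = sqrt(s1²/n1 + s2²/n2). The scores below are made up purely for illustration:

```r
# Standard error of the difference between two group means
# (Welch formulation: SE = sqrt(s1^2/n1 + s2^2/n2))
se_diff <- function(x, y) {
  sqrt(var(x) / length(x) + var(y) / length(y))
}

# Made-up scores for two small groups (illustration only)
x <- c(70, 75, 80, 85, 90)
y <- c(60, 68, 72, 79, 81)

# The t-statistic is the difference in means divided by that standard error
t_stat <- (mean(x) - mean(y)) / se_diff(x, y)
```

This is only a sketch of where the denominator comes from; in practice the software computes it for you.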
For example, suppose we conduct a t-test comparing the test scores of two student groups.
Significant result:
Group 1 mean = 78, Group 2 mean = 85
Standard error of the difference = 2
t = (78 - 85) / 2 = -3.5
Degrees of freedom = 28
Critical value at α = 0.05 (two-tailed) ≈ ±2.05
Since |t| = 3.5 > 2.05, we reject the null hypothesis and conclude a significant difference between the groups.
Non-significant result:
Group 1 mean = 78, Group 2 mean = 80
Standard error of the difference = 3
t = (78 - 80) / 3 = -0.67
Degrees of freedom = 28
Critical value at α = 0.05 (two-tailed) ≈ ±2.05
Since |t| = 0.67 < 2.05, we fail to reject the null hypothesis, meaning there is no significant difference.
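The two worked examples above can be checked in R: qt() returns the critical value for a chosen significance level and degrees of freedom.

```r
df   <- 28
crit <- qt(0.975, df)   # two-tailed critical value at alpha = 0.05 (about 2.05)

# Significant example: t = (78 - 85) / 2
t1 <- (78 - 85) / 2     # -3.5
abs(t1) > crit          # TRUE -> reject the null hypothesis

# Non-significant example: t = (78 - 80) / 3
t2 <- (78 - 80) / 3     # about -0.67
abs(t2) > crit          # FALSE -> fail to reject the null hypothesis
```

Note that 0.975 (not 0.95) is used because the 5% rejection region is split between the two tails of the distribution.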
If you wanted to hypothesis test based upon the t-statistic, you would have to go away and look up the critical value in a t-table. This, however, is complicated and time-consuming, so it is easier to use the p-value, which is calculated by the software on the basis of the t-statistic result (if you want to see a t-table, this clickable link takes you to one).
The Role of the P-Value
The p-value helps us assess the strength of evidence against the null hypothesis. A low p-value (typically less than 0.05) indicates that the observed data is unlikely to occur under the assumption that the null hypothesis is true, leading us to reject the null hypothesis.
For example, suppose we conduct an independent t-test comparing test scores of two groups.
Significant result:
Group 1 mean = 78, Group 2 mean = 85
Standard error of the difference = 2
t = (78 - 85) / 2 = -3.5
Degrees of freedom = 28
p-value = 0.002
Since the p-value (0.002) is well below the 0.05 threshold, we reject the null hypothesis and conclude that there is a statistically significant difference between the groups.
Non-significant result:
Group 1 mean = 78, Group 2 mean = 80
Standard error of the difference = 3
t = (78 - 80) / 3 = -0.67
Degrees of freedom = 28
p-value = 0.51
Here, the p-value (0.51) is much greater than 0.05, meaning the observed difference is likely due to random variation. We fail to reject the null hypothesis and conclude that there is no significant difference between the groups.
This illustrates why the p-value is useful: it eliminates the need to manually look up the critical value in a t-table. Instead of comparing the test statistic to a threshold, we can simply interpret the p-value directly. If it is below 0.05, we reject the null hypothesis; if it is above 0.05, we fail to reject it.
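In R, the p-value for a t-statistic comes from the t-distribution function pt(). The sketch below reproduces the two examples above; the results line up with the quoted p-values to rounding.

```r
# Two-tailed p-value from a t-statistic and its degrees of freedom
p_from_t <- function(t_stat, df) {
  2 * pt(-abs(t_stat), df)   # probability in both tails beyond |t|
}

p_from_t(-3.5, 28)    # roughly 0.002 -> reject H0
p_from_t(-0.67, 28)   # roughly 0.51  -> fail to reject H0
```

Doubling the one-tail probability reflects the two-tailed alternative hypothesis ("a difference in either direction").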
Note: Understanding the maths is not a requirement to pass the course; however, it can be helpful in aiding understanding. For the code and full explanations of the calculations of t-tests and p-values, please see the appendix.
Type I and Type II Errors
When conducting a t-test, there are two types of errors we need to be aware of:
Type I Error: This occurs when we incorrectly reject the null hypothesis when it is actually true. For example, if we conclude that a new teaching method improves student performance when it actually does not, we have made a Type I error. The probability of making this error is equal to the significance level (α), often set at 0.05.
Type II Error: This occurs when we fail to reject the null hypothesis when there is actually a real effect. For example, if a new teaching method does improve performance but we do not detect this improvement due to small sample size or high variability, we have made a Type II error. The probability of making this error is denoted as β, and 1-β represents the statistical power of the test.
Interpreting t-Test Results in Practice
To interpret t-test results, follow these steps:
Check the t-Statistic and Critical Value: If the test statistic is larger than the critical value, reject the null hypothesis.
Examine the p-Value: If the p-value is less than 0.05, reject the null hypothesis and conclude that there is a statistically significant difference.
Consider Effect Size: Even if a result is statistically significant, it is important to assess whether the difference is meaningful in practice.
Independent Samples T-Test in R
This week we are going to learn and practice conducting t-tests, using the Varieties of Democracy (V-Dem) data set. This data set contains expert-coded assessments of the levels of democracy in over 200 states around the world. In particular, the data includes five major measures of democracy:
- v2x_polyarchy – electoral democracy
- v2x_libdem – liberal democracy
- v2x_partipdem – participatory democracy
- v2x_delibdem – deliberative democracy
- v2x_egaldem – egalitarian democracy
Each variable is measured on a scale from 0 to 1. For more information on these variables, and how the scores are generated, you should consult the code-book available here.
The unit of analysis for this data is a “country-year” i.e. one observation per country per year (beginning from the year the state was created). In total, for the 202 states within the data, that equates to 27013 rows of data.
The full data set also contains thousands of variables, and therefore we have provided you with a much shortened version of the data containing, alongside the five democracy measures, six other variables of interest.
- year – Observation year
- country_name – Name of the country
- e_civil_war – Was there a civil war? Yes (1), No (0)
- e_miinteco – Did the country participate in an international armed conflict? Yes (1), No (0)
- e_miinterc – Did the country experience an internal armed conflict? Yes (1), No (0)
- e_area – Land area of the country in square kilometres
New commands this week
- ttestIS(..., students = FALSE, welchs = TRUE) – runs a t-test using jamovi
- t.test(...) – runs a t-test using subsetting
- mutate(...) – create or modify a variable within a data.frame
Reminder of key accessible commands
Opening a code chunk:
- Keyboard shortcut: Ctrl + Alt + I (Windows/Linux) or Cmd + Option + I (Mac)

Running a code chunk:
- Ctrl + Shift + Enter (Windows/Linux) or Cmd + Shift + Enter (Mac) while inside the chunk
- Click the green play button at the top right of the chunk

Executing a single line of code:
- Ctrl + Enter (Windows/Linux) or Cmd + Enter (Mac) with the cursor on the line

Rendering a Quarto document:
- Ctrl + Shift + K (Windows/Linux) or Cmd + Shift + K (Mac)
- Click the “Render” button in the toolbar
Setting up R for Analysis
Before we do anything, we need to open a Quarto script and open a code chunk to set our working directory. Remember, we need to run this before we proceed with the lab.
Next, I would recommend setting up an output file for Lab 3 to sink the results so that we can check the analysis and any error messages as we go.
Remember, you need to include in the brackets at the beginning of the chunk:
r setup, include=TRUE, eval=FALSE
This code will make sure that your output file updates as you run chunks but that you can also render the document as a whole.
# Open a file connection
output_file <- file("output.txt", open = "wt")
# Redirect both standard output and error messages to the same file
sink(output_file, split = TRUE) # Capture normal output
sink(output_file, type = "message") # Capture errors and warnings

In the first half of the session we will go through the t-test practically and look at how to use the code. The section called Exercises gives you the opportunity to ‘apply’ these skills to some questions.
We are going to start off by asking a very important question in comparative politics: Are democratic states more peaceful?
To answer this we are going to use data from the V-Dem (Varieties of democracy) dataset and use t-tests to explore differences between democratic and authoritarian states.
The V-Dem dataset is created from codings by five experts per country, who outline their views on the democratic nature of each state. It has the following structure and content:
Full data has 4108 variables, 202 countries, between 1789 and 2019!
Our subset includes all years, but a smaller set of variables (see the lab worksheet)
5 key measures of democracy
Each ranges from 0 (low) to 1 (high)
We need to load the relevant packages and read in our data. Like last week, we are loading a csv file so we will use the command read_csv.
Once you have run the packages and loaded the data remember to run the chunk. You can then also check your output text file to check they have loaded correctly.
The code should look like this:
library(tidyverse)

Warning: package 'ggplot2' was built under R version 4.2.3
Warning: package 'tidyr' was built under R version 4.2.3
Warning: package 'readr' was built under R version 4.2.3
Warning: package 'dplyr' was built under R version 4.2.3
Warning: package 'stringr' was built under R version 4.2.3
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(jmv)
library(BrailleR)

The BrailleR.View option is set to FALSE.
Attaching package: 'BrailleR'
The following objects are masked from 'package:graphics':
boxplot, hist
The following object is masked from 'package:utils':
history
The following objects are masked from 'package:base':
grep, gsub
vdem <- read_csv("Lab 3/data/rpir_vdem.csv")

Rows: 27013 Columns: 12
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): country_name
dbl (11): year, v2x_polyarchy, v2x_libdem, v2x_partipdem, v2x_delibdem, v2x_...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Before we proceed with any analysis, we need to revisit our research question.
We chose to focus on the following question: Are democratic countries more peaceful? First, we need to think about what our theoretical expectations are and what might help inform us about what our hypothesis should be.
A good starting point is the following theory: Liberal democracies protect minority rights, leading to less of a need for violent action within states.
If this theory holds, it would logically lead us to the following expectation: Those countries engaged in civil war are less liberally democratic. (Note: this inverts the research question, but we need to do this given our specific data.)
The next step in the research process is to think about how we operationalise these terms. We’ve got two terms we need to deal with:
Peaceful
Whether or not the state is engaged in civil war
We measure this through the variable e_civil_war
The liberal democracy V-Dem score
we measure this through the variable v2x_libdem
Ranges from 0-1 - but continuous within this range
“The liberal principle of democracy emphasizes the importance of protecting individual and minority rights against the tyranny of the state and the tyranny of the majority” V-Dem codebook
Like in previous weeks, the first step in getting to know our variables is to explore them descriptively. We can do this using the ‘descriptives’ command (from the jmv package). We are going to explore two variables at a time, so we will use the c() function to ask R to provide output on two variables in one piece of code. The code should look like:
descriptives(vdem, c("e_civil_war","v2x_libdem"))
DESCRIPTIVES
Descriptives
────────────────────────────────────────────────────
e_civil_war v2x_libdem
────────────────────────────────────────────────────
N 13879 24350
Missing 13134 2663
Mean 0.06095540 0.2179225
Median 0.000000 0.1230000
Standard deviation 0.2392571 0.2280718
Minimum 0.000000 0.003000000
Maximum 1.000000 0.8910000
────────────────────────────────────────────────────
Once you’ve written in the code, run the chunk.
If we look at the output in your text file, you will see that we get some descriptive statistics including mean/median/mode and standard deviation. How would we interpret these?
One thing to note about the data we have so far is that it contains observations from all years between 1789 and 2019. This is a vast time period, and given how much the number of democracies has changed over time, and how the nature of war and conflict has changed too, it might be better to look at a shorter time period to begin with. We are going to filter this data into an object called vdem_short where we keep only the results from 1980 and 2000.
To do this, we use the filter command. The filter command works by allowing us to take a subset of data from the original data object. Next, we need to specify the years of interest we wish to keep. We do this using year %in% c(1980, 2000), where year is the name of the variable, %in% tells the software we are selecting specific values from within that variable, and c(1980, 2000) selects the data where year is either 1980 or 2000. We store this in a new data object we call vdem_short. The code should look like this:
vdem_short <- filter(vdem, year %in% c(1980,2000))

Run this code chunk once you’ve got the code entered.
Next, we can check our descriptive statistics again, but we want to do this in a way that enables us to see the results for 1980 and 2000 separately. We can do this by modifying our previous descriptives command by using the option splitBy = “year”. What this does is tells the software that we want to see descriptives for our two variables, but we want to see them separately for 1980 and 2000. The code should look like this:
descriptives(vdem_short, c("e_civil_war","v2x_libdem"), splitBy = "year")
DESCRIPTIVES
Descriptives
───────────────────────────────────────────────────────────
year e_civil_war v2x_libdem
───────────────────────────────────────────────────────────
N 1980 141 154
2000 162 176
Missing 1980 16 3
2000 15 1
Mean 1980 0.09219858 0.2490584
2000 0.07407407 0.3869375
Median 1980 0.000000 0.1135000
2000 0.000000 0.3230000
Standard deviation 1980 0.2903375 0.2670067
2000 0.2627035 0.2732960
Minimum 1980 0.000000 0.01000000
2000 0.000000 0.01200000
Maximum 1980 1.000000 0.8720000
2000 1.000000 0.8830000
───────────────────────────────────────────────────────────
What do the findings show us? They show us that the mean of civil war in 1980 is 0.09 and in 2000 is 0.07. This illustrates that the amount of civil war conflict reduced between the two periods. Remember, as this is a binary variable (civil war either occurred or not), the mean exists as a proportion. The standard deviation also shrank from 0.29 to 0.26, suggesting that the variation in civil war across countries also declined between 1980 and 2000.
The liberal democracy index has a mean of 0.24 in 1980 and 0.38 in 2000. This suggests that liberal democracy rose between 1980 and 2000. The standard deviation rises slightly from 0.26 to 0.27, suggesting a greater dispersion in liberal democracy in the year 2000.
Now that we’ve had a look at our data and considered a theory, we need to set our hypotheses. As a reminder a hypothesis is:
- A proposition made without assuming its truth
- Prospective
- Forms the basis for the scientific method
There are two key types of hypothesis:
Null hypothesis (H0)
This is the proposition we actually test
Normally refers to “no effect” or “no difference” or “zero correlation”
- Hence “null”
Typically an exact value (0)
Alternative hypothesis (H1)
This is the proposition that holds if the null is false
E.g. “there is an effect” , “there is a difference”, “there is a correlation”
Could be any value (and so harder to test non-arbitrarily)
What should our hypotheses look like? As a reminder, our empirical question is ‘How democratic are countries engaged in war and not engaged in war?’
Using the mean, we can estimate the “average” level of democracy for those engaged in civil war and those not
Null hypothesis:
Average level of democracy in peaceful states equals average level of democracy in states undergoing civil wars
i.e. on average Dem.War − Dem.Peace = 0
Alternative hypothesis:
- Average level of democracy in peaceful states is not equal to average level of democracy in states undergoing civil wars
One thing to note is that when we come to examining our data we can infer the direction of the difference from the actual difference in values:
If Dem.war < (less than) Dem.peace
- Countries at war are less democratic
If Dem.war > (more than) Dem.peace
- Countries at war are more democratic
We can start exploring descriptively whether or not there are differences between democracies and authoritarian regimes in how much civil conflict they engage in by simply comparing their mean values. We can extend this from the descriptives command by using the splitBy option. This allows us, in essence, to stratify the results of liberal democracy by whether civil war broke out or not.
descriptives(vdem, "v2x_libdem", splitBy = "e_civil_war")
DESCRIPTIVES
Descriptives
────────────────────────────────────────────────────
e_civil_war v2x_libdem
────────────────────────────────────────────────────
N 0 12361
1 823
Missing 0 672
1 23
Mean 0 0.2598558
1 0.1258007
Median 0 0.1590000
1 0.08500000
Standard deviation 0 0.2427202
1 0.1227415
Minimum 0 0.003000000
1 0.009000000
Maximum 0 0.8910000
1 0.6330000
────────────────────────────────────────────────────
What do these results show us? The output provides us with a variety of descriptive statistics split by civil war. Mean average liberal democracy among states with no civil war is 0.25 versus 0.12 in states that had a civil war. This seems like a reasonably big difference! We also see that the standard deviation is bigger among states that did not have a civil war than among those that did. So democracy is more variable among states with no civil war.
The T-test in R
What we have done here is descriptively explore differences in mean averages between groups, however this is not an inferential test. It is still only descriptive and we are yet to understand the probability of finding a difference this big if we assumed that the null was the true state of the world.
A Primer on subsetting variables
In order to move on to examine t-tests, I want to talk you through a bit of code that we will use as we construct a t-test.
One thing we have to be able to do is subset variables. The following code explains how you do this. Imagine we want to access a particular variable such as e_civil_war. In R we could simply write object$variable and it would produce the output of that variable.
(Important note: I have added the [1:50] to make sure it only prints the first 50 cases. If you did not want to limit it to a smaller number of cases you would delete this; however, be warned that if you don’t limit it you’ll have several A4 pages of cases produced in the output.)
vdem$e_civil_war[1:50]

 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[26] NA NA NA NA NA NA NA NA 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0
As you can see, in this example we get 50 cases of the civil_war variable printed out.
We could if we wanted to also ask it to check the logical condition for every value of a variable separately. To do this, we simply add == (think of this as equal to) followed by the value of the variable. In this case we want to look at states where value = 1 (so where civil war occurred).
If we do this again with the subset 1:50, it will return a lot of NAs. These are cases where the data are missing: comparing a missing value (NA) with == 1 returns NA rather than TRUE or FALSE, while cases equal to 0 show as FALSE. A way around this is to use which, which returns the row numbers where e_civil_war equals 1, wrapped in head to take just the first 50 of those row numbers.
Try both of the codes below and you’ll see what it does.
vdem$e_civil_war[1:50] == 1

 [1] NA NA NA NA NA NA NA NA NA NA NA NA
[13] NA NA NA NA NA NA NA NA NA NA NA NA
[25] NA NA NA NA NA NA NA NA NA FALSE FALSE FALSE
[37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE TRUE
[49] FALSE FALSE
head(which(vdem$e_civil_war == 1), 50)

 [1] 44 47 48 59 60 61 62 63 64 65 66 67 70 71 72
[16] 73 88 122 123 124 125 126 127 128 129 130 131 132 135 136
[31] 138 139 140 141 142 1131 1433 1434 1435 1436 1453 1454 1455 1456 1457
[46] 1458 1459 1460 1461 1462
Finally, we might want to subset a variable on the condition of another variable. We can do this using square brackets. The first line of code below, vdem$v2x_libdem[vdem$e_civil_war == 1], selects values from the v2x_libdem column, but only for rows where e_civil_war is equal to 1. The expression vdem$e_civil_war == 1 creates a logical vector, marking TRUE for rows that meet the condition and FALSE for all others. When used inside the brackets, it filters the v2x_libdem column to include only those cases where e_civil_war is 1.
For the purposes of this exercise, it is worth noting that if many rows meet the condition, the output could be too long for a screen reader. To limit it to the first 50 cases, you can use head(vdem$v2x_libdem[vdem$e_civil_war == 1], 50). This ensures that only the first 50 matching values are displayed, making the output more manageable.
vdem$v2x_libdem[1:50][vdem$e_civil_war[1:50] == 1]

 [1] NA NA NA NA NA NA NA NA NA NA NA NA
[13] NA NA NA NA NA NA NA NA NA NA NA NA
[25] NA NA NA NA NA NA NA NA NA 0.117 0.121 0.134
head(vdem$v2x_libdem[vdem$e_civil_war == 1], 50)

 [1] NA NA NA NA NA NA NA NA NA NA NA NA
[13] NA NA NA NA NA NA NA NA NA NA NA NA
[25] NA NA NA NA NA NA NA NA NA 0.117 0.121 0.134
[37] 0.137 0.141 0.141 0.127 0.127 0.127 0.127 0.143 0.143 0.113 0.113 0.122
[49] 0.119 0.116
One final thing we need to do before we run our t-test is to filter data from one specific year into a new data object. The reason for this is that a t-test requires a grouping variable (civil war, yes or no) and a continuous variable (democracy). We don’t want these to vary over time, so we will use a version of our data where we only keep cases from the year 2000.
We are going to create a new data object called vdem_2000, and within this we are going to keep cases from our original object vdem where the variable year is equal to 2000. The code should look like this:
vdem_2000 <- filter(vdem, year == 2000)

Run the code chunk and you should have a new object in the environment.
T Tests using Base R
There are two different approaches that we can use to construct a t-test. The first we are going to examine is in base R. The command for a t-test in base R is t.test. Within this we need to specify that we want to examine the variable v2x_libdem where e_civil_war is equal to 1, and again where e_civil_war is equal to 0.
In this code, subsetting is done using vdem_2000$e_civil_war == 1 and vdem_2000$e_civil_war == 0 to create two separate groups for comparison. The expression vdem_2000$e_civil_war == 1 generates a logical vector that is TRUE for rows where e_civil_war equals 1 (indicating the presence of civil war) and FALSE otherwise. Placing this inside the square brackets selects only the values of v2x_libdem corresponding to countries experiencing civil war. The same logic applies to vdem_2000$e_civil_war == 0, which filters for countries without civil war. This method ensures that the t.test() function compares the v2x_libdem values of only those two distinct groups, rather than using all observations in the dataset.
t.test(vdem_2000$v2x_libdem[vdem_2000$e_civil_war == 1],
vdem_2000$v2x_libdem[vdem_2000$e_civil_war == 0])
Welch Two Sample t-test
data: vdem_2000$v2x_libdem[vdem_2000$e_civil_war == 1] and vdem_2000$v2x_libdem[vdem_2000$e_civil_war == 0]
t = -2.6837, df = 14.875, p-value = 0.0171
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.29273089 -0.03346933
sample estimates:
mean of x mean of y
0.2354167 0.3985168
The output for this might look a bit overwhelming at first. There are two things we really want to pay attention to: the t-statistic and the p-value. The means of x and y are also useful.
The test statistic is t = -2.68, with df = 14.88, and the p-value = 0.0171. Since the p-value is below 0.05, we reject the null hypothesis, suggesting a statistically significant difference in liberal democracy scores between the two groups.
Looking at the means, countries in civil war have a lower average liberal democracy score (0.235) compared to those without civil war (0.399). The 95% confidence interval for the difference in means is [-0.293, -0.033], meaning we are 95% confident that the true difference falls within this range. Since the interval does not include zero, this further supports the conclusion that civil war is associated with lower levels of liberal democracy.
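If you want to work with these numbers rather than just read them, t.test() returns a list whose components you can store and extract. A minimal sketch, using simulated stand-ins for the two groups so it runs on its own (on the real data you would store the t.test() call on vdem_2000 in the same way):

```r
# Simulated stand-ins for the two groups (hypothetical values)
set.seed(42)
war_group    <- rnorm(12,  mean = 0.24, sd = 0.10)
no_war_group <- rnorm(149, mean = 0.40, sd = 0.22)

# Store the result, then pull out individual components
res <- t.test(war_group, no_war_group)
res$statistic  # just the t-statistic
res$p.value    # just the p-value
res$conf.int   # the 95% confidence interval
res$estimate   # the two group means
```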
We can also perform the same test using ttestIS, which comes from the jmv package (the R package behind jamovi). This runs the same test but presents it in a way that might be easier to interpret and less cluttered. We are going to specify the data object (vdem_2000), our variable v2x_libdem and our grouping variable e_civil_war.
The ttestIS() function comes from the jmv package, which provides jamovi's analyses within R, and is used to perform an independent samples t-test on the vdem_2000 dataset. In this case, the test compares the mean values of v2x_libdem between two groups defined by the e_civil_war variable, where 1 represents countries experiencing civil war and 0 represents those without civil war. The dependent variable, "v2x_libdem", is specified as a character string, and the grouping variable is defined using group = "e_civil_war", allowing the function to automatically separate the data into the two groups. Unlike the standard t.test() function in base R, ttestIS() does not require manually subsetting the data. The argument students = FALSE disables the standard Student's t-test, which assumes equal variances, while welchs = TRUE enables Welch's t-test, which adjusts for potential differences in variance between the two groups. This approach provides a more user-friendly method for conducting t-tests, presenting results in a structured format that can also include descriptive statistics, effect sizes, and assumption checks when requested.
ttestIS(vdem_2000, "v2x_libdem", group = "e_civil_war",
students = FALSE, welchs = TRUE)
INDEPENDENT SAMPLES T-TEST
Independent Samples T-Test
─────────────────────────────────────────────────────────────────
Statistic df p
─────────────────────────────────────────────────────────────────
v2x_libdem Welch's t 2.683740 14.87451 0.0170990
─────────────────────────────────────────────────────────────────
Note. Hₐ: μ₀ ≠ μ₁
The output here is much more straightforward in content; it still provides the test statistic, degrees of freedom and p-value. As you can see the results are the same, and we would still reject the null hypothesis: it appears that there is a statistically significant difference between states that have had civil wars and those that haven't. What does this mean? Practically, the p-value tells us there is only about a 1.7% chance (p = 0.017) that we would see a difference in means as big as we do if the null were the true state of the world.
If we wanted to provide a visual of the t-test, we could use a boxplot similar to that used in Lab 1. We use the boxplot command, specify the outcome variable (v2x_libdem), then type the tilde (~) followed by our grouping variable (e_civil_war). We then use a comma and give the data source with data = vdem_2000. We can give the plot a title using main and label the y and x axes using ylab and xlab. Once we have run this command we can use the VI command from BrailleR to provide a written description of the boxplots.
BoxDem = boxplot(v2x_libdem ~ e_civil_war, data = vdem_2000,
main = "Boxplot of Liberal Democracy by Civil War Status",
ylab = "Liberal Democracy",
xlab = "Civil War Status (0 = No, 1 = Yes)")
VI(BoxDem)
This graph has 2 boxplots printed vertically
With the title: Boxplot of Liberal Democracy by Civil War Status
"" appears on the x-axis.
"" appears on the y-axis.
Tick marks for the y-axis are at: 0, 0.2, 0.4, 0.6, and 0.8
Group 0 has 149 values.
There are no outliers marked for this group
The whiskers extend to 0.012 and 0.883 from the ends of the box,
which are at 0.149 and 0.661
The median, 0.35 is 39 % from the lower end of the box to the upper end.
The upper whisker is 1.62 times the length of the lower whisker.
Group 1 has 12 values.
There are no outliers marked for this group
The whiskers extend to 0.018 and 0.633 from the ends of the box,
which are at 0.0705 and 0.3725
The median, 0.1995 is 43 % from the lower end of the box to the upper end.
The upper whisker is 4.96 times the length of the lower whisker.
Statistical Control in T-Tests
If we wanted to check how robust our findings are, we might want to consider using a control variable. This enables us to see whether the finding between civil war and liberal democracy holds even when we control for a third variable.
To demonstrate statistical control using a t-test, we need to examine how the relationship between civil war and liberal democracy changes while controlling for international armed conflict. The first step is to split the data based on the presence or absence of international armed conflict (represented by e_miinteco). This can be done by creating two groups: one group where both civil war (e_civil_war == 1) and international armed conflict (e_miinteco == 1) are present, and another where civil war is absent but international armed conflict is present.
For the first t-test, we subset the v2x_libdem variable for countries that have both civil war and international armed conflict. This is done with the expression vdem_2000$v2x_libdem[vdem_2000$e_civil_war == 1 & vdem_2000$e_miinteco == 1]. Then, we subset v2x_libdem for countries that do not have civil war but do have international armed conflict using vdem_2000$v2x_libdem[vdem_2000$e_civil_war == 0 & vdem_2000$e_miinteco == 1]. The t-test compares the means of v2x_libdem between these two groups to assess if there is a significant difference in liberal democracy between countries with and without civil war, while controlling for international armed conflict.
For the second t-test, we focus on countries that do not have international armed conflict. The first subset is for countries with civil war but no international armed conflict, using the expression vdem_2000$v2x_libdem[vdem_2000$e_civil_war == 1 & vdem_2000$e_miinteco == 0]. The second subset is for countries without civil war and no international armed conflict, represented by vdem_2000$v2x_libdem[vdem_2000$e_civil_war == 0 & vdem_2000$e_miinteco == 0]. Again, the t-test compares the means of v2x_libdem between these two groups.
By splitting the analysis into these two separate t-tests, we control for the influence of international armed conflict (e_miinteco). This approach allows us to examine how the relationship between civil war (e_civil_war) and liberal democracy (v2x_libdem) varies, depending on whether or not a country is also experiencing international armed conflict. The separate tests provide insight into whether civil war has a different impact on liberal democracy in countries with or without international armed conflict.
Your code should look like this:
# Subsetting for countries with international armed conflict
t.test(vdem_2000$v2x_libdem[vdem_2000$e_civil_war == 1 & vdem_2000$e_miinteco == 1],
vdem_2000$v2x_libdem[vdem_2000$e_civil_war == 0 & vdem_2000$e_miinteco == 1])
Welch Two Sample t-test
data: vdem_2000$v2x_libdem[vdem_2000$e_civil_war == 1 & vdem_2000$e_miinteco == 1] and vdem_2000$v2x_libdem[vdem_2000$e_civil_war == 0 & vdem_2000$e_miinteco == 1]
t = 0.81338, df = 10.437, p-value = 0.4342
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.1494044 0.3227377
sample estimates:
mean of x mean of y
0.2963333 0.2096667
# Subsetting for countries without international armed conflict
t.test(vdem_2000$v2x_libdem[vdem_2000$e_civil_war == 1 & vdem_2000$e_miinteco == 0],
vdem_2000$v2x_libdem[vdem_2000$e_civil_war == 0 & vdem_2000$e_miinteco == 0])
Welch Two Sample t-test
data: vdem_2000$v2x_libdem[vdem_2000$e_civil_war == 1 & vdem_2000$e_miinteco == 0] and vdem_2000$v2x_libdem[vdem_2000$e_civil_war == 0 & vdem_2000$e_miinteco == 0]
t = -3.0428, df = 6.097, p-value = 0.02227
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.42218174 -0.04663121
sample estimates:
mean of x mean of y
0.1745000 0.4089065
In the first t-test, where we compared countries with civil war and international armed conflict versus those without civil war but with international armed conflict, the p-value was 0.4342. Since this p-value is greater than 0.05, we fail to reject the null hypothesis, meaning that there is no statistically significant difference in liberal democracy between these two groups. This suggests that civil war, in the presence of international armed conflict, may not have a significant effect on liberal democracy.
However, in the second t-test (where we compared countries with civil war but no international armed conflict versus those with neither civil war nor international armed conflict), the p-value was 0.02227, which is less than 0.05, meaning that we reject the null hypothesis. This indicates a statistically significant difference in liberal democracy between these two groups, suggesting that civil war (when international armed conflict is absent) negatively impacts liberal democracy.
Overall, what does this mean? The fact that in the first test we fail to reject the null means we would class the result as at most demonstrating partial significance. The difference in liberal democracy between states that are in civil war and those that are not only holds in the absence of international armed conflict; the pattern does not hold when international armed conflict is present.
So, when we revisit our null hypothesis (the difference in the average response between the two groups is zero), we would have to fail to reject the null. In the presence of a confounding variable we can no longer be confident that the difference between the two groups is not zero.
Plotting with a Control Variable
To take into account the control variable e_miinteco (international armed conflict) in the boxplot, you can use the e_miinteco variable to create separate boxplots for different levels of international armed conflict. This allows you to visualize how the distribution of liberal democracy (v2x_libdem) differs across the two groups (with and without civil war) while controlling for the presence or absence of international armed conflict.
Here’s how you can modify the boxplot() code to include e_miinteco as a control variable:
BoxDem = boxplot(v2x_libdem ~ e_civil_war * e_miinteco, data = vdem_2000,
main = "Boxplot of Liberal Democracy by Civil War Status and International Armed Conflict",
ylab = "Liberal Democracy",
xlab = "Civil War Status (0 = No, 1 = Yes) and International Armed Conflict (0 = No, 1 = Yes)")
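Before interpreting boxplots split by two binary variables, it is worth checking how many observations fall into each of the four cells: with only 12 civil-war cases in 2000, some cells will be very small. A quick cross-tabulation shows this. The sketch below uses toy stand-in vectors so it runs on its own; on the real data you would pass vdem_2000$e_civil_war and vdem_2000$e_miinteco to table():

```r
# Toy stand-ins for the two binary flags (hypothetical values)
civil_war    <- c(0, 0, 1, 1, 0, 1, 0, 0)
int_conflict <- c(0, 1, 0, 1, 0, 1, 1, 0)

# Cross-tabulate to see how many observations fall in each of the 4 groups
table(civil_war, int_conflict)
```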
Conclusion
With this, we see the first practical application of the central limit theorem in helping us make inferential claims.
Next, we move on to the homework exercises. You can either keep working in the same quarto file or open up a new quarto file for this. If you keep working in the same quarto file you do not need to re-run the package loading, working directory and data loading steps and can instead proceed straight to Q1a.
Homework Exercises
Exercise 1 – loading and filtering the data
Once we have set up our quarto file, set our working directory and opened a sink file to output our findings into plain text, we can move on to loading our packages, loading our data and filtering the data.
Start by downloading the data rpir_vdem.csv from Blackboard and save it into the folder where you have located your working directory. Then we need to load the tidyverse and jmv packages. We can then read the data into R, saving it to an object called vdem. Notice the data is saved in .csv format, so we use the read_csv() function to open the data.
Your R script should look something like this:
library(tidyverse)
library(jmv)
library(BrailleR)
vdem <- read_csv("Lab 3/data/rpir_vdem.csv")
In the lab workbook, we cover how to filter data and store the result as a new data.frame object. Having read in the data, in this exercise we will put into practice some of the tools you have learned in the first two labs to compare levels of democracy over time, while practicing subsetting data using the filter command.
- Q1a. Create a new data.frame within R called vdem_2000 that only contains observations from the year (you guessed it) 2000. What was the average level of participatory democracy in 2000 across all the countries within this new dataset?
vdem_2000 <- filter(vdem, year == 2000)
descriptives(vdem_2000, "v2x_partipdem")
DESCRIPTIVES
Descriptives
───────────────────────────────────────
v2x_partipdem
───────────────────────────────────────
N 176
Missing 1
Mean 0.3171761
Median 0.2725000
Standard deviation 0.2104733
Minimum 0.01400000
Maximum 0.7400000
───────────────────────────────────────
# Or, as discussed last week, just to calculate the mean:
mean(vdem_2000$v2x_partipdem, na.rm = TRUE)
[1] 0.3171761
- Q1b. Create another data.frame but this time only containing observations from 1980 – make sure to store this data as an object with an appropriate name (Hint: see Q1a). Compare the average level of participatory democracy in 1980 with that in 2000. Is it higher or lower than in 2000?
vdem_1980 <- filter(vdem, year == 1980)
mean(vdem_1980$v2x_partipdem, na.rm = TRUE)
[1] 0.194729
- Q1c. What can we conclude?
# The average level of participatory democracy in 2000 was over 50% higher than its 1980 level.
Exercise 2 – Peace and deliberation?
In this exercise, we would like you to run and interpret a t-test on the full set of observations. Specifically, we would like to know whether there is a statistically significant difference between average levels of deliberative democracy between those observations involved in civil wars, and those not.
Note that for internal conflict to be classed as a ‘civil war’ the V-Dem dataset requires “at least one intra-state war with at least 1,000 battle deaths for each country-year.”
- Q2a. What is the null hypothesis for the t-test? And hence, what is the alternative hypothesis?
# Null: There is no difference in the population means between the two groups
# Alternative: There is a difference in the population means between the two groups
- Q2b. Calculate the relevant t-test (you may use either the ttestIS or t.test strategy). What is the p-value for the test?
ttestIS(vdem, vars = "v2x_delibdem", group = "e_civil_war",
students = FALSE, welchs = TRUE)
INDEPENDENT SAMPLES T-TEST
Independent Samples T-Test
────────────────────────────────────────────────────────────────────
Statistic df p
────────────────────────────────────────────────────────────────────
v2x_delibdem Welch's t 19.80916 927.2652 < .0000001
────────────────────────────────────────────────────────────────────
Note. Hₐ: μ₀ ≠ μ₁
# or
t.test(vdem$v2x_delibdem[vdem$e_civil_war == 1],
vdem$v2x_delibdem[vdem$e_civil_war == 0])
Welch Two Sample t-test
data: vdem$v2x_delibdem[vdem$e_civil_war == 1] and vdem$v2x_delibdem[vdem$e_civil_war == 0]
t = -19.809, df = 927.27, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.1388515 -0.1138190
sample estimates:
mean of x mean of y
0.1511950 0.2775303
- Q2c. On the basis of Q2b, can we reject the null hypothesis?
# Yes, since p < 0.05 (indeed p < 0.001, so highly statistically significant)
- Q2d. How large is the difference in means? Is this substantively important?
0.2775303 - 0.1511950
[1] 0.1263353
# About 0.12 - that's just over 10% of the range of the scale
# So not negligible, but neither is it particularly large.
# Substantively, it is a moderately small difference between groups
Exercise 3 – Cumulative measure of democracy
The five democracy measures in the V-dem dataset each touch on related, but substantively different, facets of the broader concept of “democracy”. Suppose now we wanted to know whether democracy overall is related to engagement in civil war, and let us focus just on the data for the year 1980, i.e. the vdem_1980 object you created in Q1.
The pipe operator (%>%) in R, which the {tidyverse} packages re-export from {magrittr}, is a tool that improves code readability and flow by passing the output of one function directly into the next. Instead of nesting multiple functions within one another, which can be difficult to read, the pipe operator allows users to write code in a linear, step-by-step manner.
The pipe works by taking the result from the left-hand side and passing it as the first argument to the function on the right. If a function requires additional arguments, they can be added normally. Using pipes simplifies code, improves readability, and makes debugging easier by allowing users to check intermediate steps more conveniently.
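A minimal sketch of the pipe, using dplyr's built-in starwars data so it runs on its own (the choice of dataset is just for illustration):

```r
library(dplyr)

# Nested style: read from the inside out
n_nested <- nrow(filter(starwars, species == "Human"))

# Piped style: read left to right, one step per line
n_piped <- starwars %>%
  filter(species == "Human") %>%
  nrow()

identical(n_nested, n_piped)  # TRUE: both styles compute the same thing
```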
Using the pipe operator, we can create “composite” measures by combining variables. One simple method is to scale each variable to the same range, and then create a new variable which is equal to the sum of the scaled variables. Fortunately for us, our five measures of democracy are already scaled between 0 and 1, so we can skip that step. We can create a new variable that combines these 5 variables by using the mutate command. Mutate is a command from dplyr that allows us to create a new variable on the basis of a range of existing variables or manipulate the shape and values of a variable. We’ll learn a lot about this next week but we start off with this illustration of a composite variable.
Therefore, we can create a new composite variable as follows:
vdem_2000 <- vdem_2000 %>%
mutate(NEW_VARIABLE_NAME = VARIABLE1 + VARIABLE2 + VARIABLE3 + VARIABLE4 + VARIABLE5)
We just need to substitute in the correct variable names. Notice also that, because we use the mutate command, we do not need to put our variable names in quotation marks.
- Q3a. Using the above abstract code to guide you, create a new variable within vdem_1980 called dem_comp (“democracy composite”) that is the sum of the five V-Dem democracy measures.
vdem_1980 <- vdem_1980 %>%
mutate(dem_comp = v2x_polyarchy + v2x_libdem + v2x_partipdem + v2x_delibdem + v2x_egaldem)
summary(vdem_1980$dem_comp)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
 0.1750  0.4597  0.6420  1.2959  1.8415  4.2210       3
- Q3b. Conduct a t-test between country-years that underwent civil war and those that did not, using the new dem_comp variable. Is the result statistically significant?
t.test(vdem_1980$dem_comp[vdem_1980$e_civil_war == 1],
vdem_1980$dem_comp[vdem_1980$e_civil_war == 0])
Welch Two Sample t-test
data: vdem_1980$dem_comp[vdem_1980$e_civil_war == 1] and vdem_1980$dem_comp[vdem_1980$e_civil_war == 0]
t = -5.6816, df = 42.623, p-value = 1.089e-06
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.2026081 -0.5724024
sample estimates:
mean of x mean of y
0.4898333 1.3773386
# Yes, statistically significant (p < 0.001)
- Q3c. Substantively, how does this difference in means compare to that in Q2?
# Here the difference is:
1.3773386 - 0.4898333
[1] 0.8875053
# Which is about 0.9.
# Remember that our scale is now between 0 and 5
# So this effect is just under 20% of the range
# I.e. about double that of the difference in Q2.
Exercise 4 – Internal conflict and democracy
So far we have focused on peace in terms of civil wars. But not all violence within a state reaches the level of at least 1,000 battle deaths per year. Some internal conflicts have fewer fatalities, yet still reflect major departures from a ‘peaceful’ state. Does the observed difference in means above hold if we relax our definition of internal conflict?
Fortunately, the V-Dem dataset also includes a broader measure of internal conflict: e_miinterc.
- Q4a. Using the vdem_1980 data, and the new dem_comp composite measure you created in Q3, conduct a t-test on the level of democracy between states that experienced internal conflict (e_miinterc == 1) and those that did not (e_miinterc == 0).
t.test(vdem_1980$dem_comp[vdem_1980$e_miinterc == 1],
vdem_1980$dem_comp[vdem_1980$e_miinterc == 0])
Welch Two Sample t-test
data: vdem_1980$dem_comp[vdem_1980$e_miinterc == 1] and vdem_1980$dem_comp[vdem_1980$e_miinterc == 0]
t = -1.7188, df = 7.3571, p-value = 0.1273
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.3864092 0.2126377
sample estimates:
mean of x mean of y
0.7737143 1.3606000
- Q4b. Is the result statistically significant?
# No, p = 0.13, which is greater than 0.05
- Q4c. What can we therefore infer?
# We fail to reject the null hypothesis
# that the two population means are equal.
# That is, the average level of democracy between those states
# that experienced internal conflict and those that did not are
# statistically indistinguishable.
Data citations
Coppedge, Michael, John Gerring, Carl Henrik Knutsen, Staffan I. Lindberg, Jan Teorell, David Altman, Michael Bernhard, M. Steven Fish, Adam Glynn, Allen Hicken, Anna Luhrmann, Kyle L. Marquardt, Kelly McMann, Pamela Paxton, Daniel Pemstein, Brigitte Seim, Rachel Sigman, Svend-Erik Skaaning, Jeffrey Staton, Steven Wilson, Agnes Cornell, Nazifa Alizada, Lisa Gastaldi, Haakon Gjerløw, Garry Hindle, Nina Ilchenko, Laura Maxwell, Valeriya Mechkova, Juraj Medzihorsky, Johannes von Romer, Aksel Sundstrom, Eitan Tzelgov, Yi-ting Wang, Tore Wig, and Daniel Ziblatt. 2020. “V-Dem Country–Year Dataset v10” Varieties of Democracy (V-Dem) Project. https://doi.org/10.23696/vdemds20.
Coppedge, Michael, John Gerring, Carl Henrik Knutsen, Staffan I. Lindberg, Jan Teorell, David Altman, Michael Bernhard, M. Steven Fish, Adam Glynn, Allen Hicken, Anna Luhrmann, Kyle L. Marquardt, Kelly McMann, Pamela Paxton, Daniel Pemstein, Brigitte Seim, Rachel Sigman, Svend-Erik Skaaning, Jeffrey Staton, Agnes Cornell, Lisa Gastaldi, Haakon Gjerløw, Valeriya Mechkova, Johannes von Romer, Aksel Sundstrom, Eitan Tzelgov, Luca Uberti, Yi-ting Wang, Tore Wig, and Daniel Ziblatt. 2020. “V-Dem Codebook v10” Varieties of Democracy (V-Dem) Project.
Appendix: Calculating Standard Errors, T-statistics and P-Values
The t-Test Formula (Screen-Readable Format)
The formula for the independent samples t-test is:
t = (Mean₁ - Mean₂) / Standard Error of the Difference
Where:
Mean₁ and Mean₂ are the sample means of the two groups being compared.
Standard Error of the Difference (SE_diff) accounts for the variability in the data and the sample sizes. It is calculated as:
SE_diff = sqrt((s₁² / n₁) + (s₂² / n₂))
Where:
s₁² and s₂² are the variances of the two groups. Variance measures how spread out the data points are.
n₁ and n₂ are the sample sizes for each group.
Explanation of Each Component
Mean Difference (Numerator: Mean₁ - Mean₂)
This represents the observed difference between the two sample means.
A larger difference suggests a potential effect, but we need to assess whether it is statistically significant.
Standard Error of the Difference (Denominator: SE_diff)
This measures how much variability we expect in the difference between sample means due to random chance.
It is influenced by the sample sizes (larger samples reduce standard error) and the variability within each group (higher variability increases standard error).
The square root operation ensures the standard error remains in the same unit as the original measurements.
t-Statistic (Final Calculation: t = Mean Difference / SE_diff)
This value tells us how many standard errors the observed difference is away from zero (the null hypothesis expectation).
A larger absolute t-value suggests a greater difference between groups relative to variability.
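The three steps above can be reproduced by hand in R. This sketch uses two small toy samples (hypothetical values) and then checks the hand-computed statistic against t.test(), which performs Welch's test by default:

```r
# Hand-computing the t-statistic for two toy samples (hypothetical values)
g1 <- c(0.12, 0.14, 0.11, 0.15, 0.13)
g2 <- c(0.35, 0.40, 0.38, 0.42, 0.36)

# SE_diff = sqrt(s1^2/n1 + s2^2/n2)
se_diff <- sqrt(var(g1) / length(g1) + var(g2) / length(g2))

# t = (Mean1 - Mean2) / SE_diff
t_stat <- (mean(g1) - mean(g2)) / se_diff

# t.test() (Welch by default) reports exactly the same statistic
unname(t.test(g1, g2)$statistic)
```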
How We Use the t-Statistic to Find the p-Value
The p-value tells us the probability of observing a t-statistic at least as extreme as the one we calculated, assuming the null hypothesis is true.
Steps to Obtain the p-Value from t
Determine Degrees of Freedom (df)
The degrees of freedom (df) help us find the shape of the t-distribution.
For an independent t-test, the df is calculated as:
df ≈ (n₁ + n₂ - 2) (when assuming equal variances). Welch's t-test, which we used above, instead approximates the df using the Welch–Satterthwaite formula, which is why the df values in our earlier output are not whole numbers.
Find the p-Value Using the t-Distribution
Once we have the t-statistic and degrees of freedom, we compare our t-value to a t-distribution.
The p-value is the probability of getting a t-value as extreme or more extreme than our observed t-value.
If the test is two-tailed, the p-value is the probability in both tails beyond ±t.
If the test is one-tailed, the p-value is the probability in one tail beyond t.
Compare the p-Value to the Significance Level (α = 0.05)
If p < 0.05, we reject the null hypothesis (the result is statistically significant).
If p ≥ 0.05, we fail to reject the null hypothesis (there is not enough evidence to conclude a significant difference).
Example of p-Value Calculation
Significant Result:
Mean₁ = 78, Mean₂ = 85
Standard error = 2
t = (78 - 85) / 2 = -3.5
Degrees of freedom = 28
Looking up t = -3.5 in a t-table or using statistical software, we get p = 0.002
Since p < 0.05, we reject the null hypothesis.
Non-Significant Result:
Mean₁ = 78, Mean₂ = 80
Standard error = 3
t = (78 - 80) / 3 = -0.67
Degrees of freedom = 28
Looking up t = -0.67, we get p = 0.51
Since p > 0.05, we fail to reject the null hypothesis.
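The two worked examples can be checked in R with pt(), the cumulative distribution function of the t-distribution; doubling the tail probability gives the two-tailed p-value:

```r
# Significant example: t = (78 - 85) / 2 = -3.5, df = 28
t1 <- (78 - 85) / 2
p1 <- 2 * pt(-abs(t1), df = 28)   # roughly 0.002

# Non-significant example: t = (78 - 80) / 3, about -0.67, df = 28
t2 <- (78 - 80) / 3
p2 <- 2 * pt(-abs(t2), df = 28)   # roughly 0.51
```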
Why the p-Value is Useful
Instead of looking up the critical value for a given significance level, the p-value lets us make a direct comparison:
If p < α (e.g., 0.05), the result is statistically significant.
If p > α, the result is not significant.
Statistical software calculates p-values automatically, making hypothesis testing easier than using t-tables.