Comparing means between groups is fundamental in statistical analysis and data science. Whether you are testing drug efficacy, evaluating marketing campaigns, or analyzing experimental data, the R Language provides powerful tools for mean comparison tests.
Mean Comparison Tests in R Language
Here, we learn the basics of performing mean comparison tests in the R Language: hypothesis testing for the one-sample test, the two-sample independent test, and the dependent sample test. We will also learn how to find p-values and critical region values from a distribution such as the t-distribution, and how to perform one-tailed and two-tailed hypothesis tests.
How to Perform One-Sample t-Test in R
A recent article in The Wall Street Journal reported that the 30-year mortgage rate is now less than 6%. A sample of eight small banks in the Midwest revealed the following 30-year rates (in percent):
4.8 | 5.3 | 6.5 | 4.8 | 6.1 | 5.8 | 6.2 | 5.6 |
At the 0.01 significance level (probability of type-I error), can we conclude that the 30-year mortgage rate for small banks is less than 6%?
Manual Calculations for One-Sample t-Test and Confidence Interval
The one-sample mean comparison test can be performed manually as follows.
# Manual way
X <- c(4.8, 5.3, 6.5, 4.8, 6.1, 5.8, 6.2, 5.6)
xbar <- mean(X)
s <- sd(X)
mu <- 6
n <- length(X)
df <- n - 1
# Calculated t statistic
tcal <- (xbar - mu) / (s / sqrt(n))
tcal
# 99% two-sided confidence interval for the mean
c(xbar - qt(0.995, df = df) * s / sqrt(n), xbar + qt(0.995, df = df) * s / sqrt(n))
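For reference, the manual code above computes the usual one-sample t statistic and two-sided confidence interval. The symbols $\mu_0$ (hypothesized mean) and $\alpha$ (significance level) are notation introduced here for illustration; with $\mu_0 = 6$, $n = 8$, and $\alpha = 0.01$ they correspond to `mu`, `n`, and the `0.995` quantile used in the code.

$$
H_0: \mu \ge \mu_0 \quad \text{versus} \quad H_1: \mu < \mu_0
$$

$$
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad df = n - 1
$$

$$
\bar{x} \pm t_{1 - \alpha/2,\; n-1} \, \frac{s}{\sqrt{n}}
$$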
Critical Values from t-Table
# Critical Value for Left Tail
qt(0.01, df = df, lower.tail = TRUE)
# Critical Value for Right Tail
qt(0.99, df = df, lower.tail = TRUE)
# Critical Value for Both Tails (use the negative and positive of this value)
qt(0.995, df = df)
Finding p-Values
# p-value (alternative is less)
pt(tcal, df = df)
# p-value (alternative is greater)
1 - pt(tcal, df = df)
# p-value (alternative is two-tailed, i.e., not equal to)
# -abs() makes this correct whether tcal is negative or positive
2 * pt(-abs(tcal), df = df)
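The critical value and the p-value lead to the same decision. The following is a minimal sketch of the left-tailed decision rule for the mortgage-rate example, assuming the 0.01 significance level stated in the question. The names `mu0`, `alpha`, `t_crit`, `p_value`, `reject_by_critical`, and `reject_by_p_value` are illustrative and not part of the original code.

```r
# Mortgage-rate sample from the example above
X <- c(4.8, 5.3, 6.5, 4.8, 6.1, 5.8, 6.2, 5.6)
mu0 <- 6        # hypothesized mean under H0
alpha <- 0.01   # significance level from the question

n <- length(X)
df <- n - 1
tcal <- (mean(X) - mu0) / (sd(X) / sqrt(n))

# Left-tailed test: H0: mu >= 6 versus H1: mu < 6
t_crit <- qt(alpha, df = df)    # left-tail critical value
p_value <- pt(tcal, df = df)    # area to the left of tcal

# Both rules express the same decision
reject_by_critical <- tcal < t_crit
reject_by_p_value <- p_value < alpha

cat("t =", round(tcal, 3), "critical value =", round(t_crit, 3),
    "p-value =", round(p_value, 4), "\n")
cat("Reject H0:", reject_by_p_value, "\n")
```

For this sample, the calculated statistic (about -1.62) does not fall below the critical value (about -3.00), so the data do not support the claim that the mean rate is less than 6% at the 0.01 level.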
Performing One-Sample Confidence Interval and t-Test Using Built-in Function
The one-sample mean comparison test can also be performed using the built-in t.test() function available in the R Language.
# Left Tail test
t.test(x = X, mu = 6, alternative = "less", conf.level = 0.99)
# Right Tail test
t.test(x = X, mu = 6, alternative = "greater", conf.level = 0.99)
# Two Tail test
t.test(x = X, mu = 6, alternative = "two.sided", conf.level = 0.99)
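The object returned by `t.test()` is a list of class `htest`, so its parts can be extracted and compared with the manual results. The sketch below assumes the same sample and hypothesized mean as above; `res` is an illustrative variable name.

```r
X <- c(4.8, 5.3, 6.5, 4.8, 6.1, 5.8, 6.2, 5.6)

res <- t.test(x = X, mu = 6, alternative = "less", conf.level = 0.99)

res$statistic   # calculated t statistic
res$parameter   # degrees of freedom
res$p.value     # p-value of the test
res$conf.int    # confidence interval (one-sided here, so the lower limit is -Inf)
res$estimate    # sample mean

# Compare with the manual calculation
tcal <- (mean(X) - 6) / (sd(X) / sqrt(length(X)))
all.equal(unname(res$statistic), tcal)
all.equal(res$p.value, pt(tcal, df = length(X) - 1))
```

Note that for a one-tailed alternative, `t.test()` reports a one-sided confidence bound, so it will not match the two-sided interval computed manually; use `alternative = "two.sided"` to reproduce that interval.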
How to Perform a Two-Sample t-Test in R
Suppose we have two samples stored in the vectors $X$ and $Y$, as shown in the R code below. We are interested in comparing the means of two groups of people regarding (say) their wages in a certain week.
X <- c(70, 82, 78, 70, 74, 82, 90)
Y <- c(60, 80, 91, 89, 77, 69, 88, 82)
Manual Calculations for Two-Sample t-Test and Confidence Interval
The manual calculation for the two-sample t-test is as follows.
nx <- length(X)
ny <- length(Y)
xbar <- mean(X)
sx <- sd(X)
ybar <- mean(Y)
sy <- sd(Y)
df <- nx + ny - 2
# Pooled Standard Deviation
SP <- sqrt(((nx - 1) * sx^2 + (ny - 1) * sy^2) / df)
tcal <- ((xbar - ybar) - 0) / (SP * sqrt(1/nx + 1/ny))
tcal
# 95% Confidence Interval
LL <- (xbar - ybar) - qt(0.975, df) * sqrt(SP^2 * (1/nx + 1/ny))
UL <- (xbar - ybar) + qt(0.975, df) * sqrt(SP^2 * (1/nx + 1/ny))
c(LL, UL)
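Written out, the code above corresponds to the pooled-variance (equal variances assumed) two-sample t-test. The subscripted symbols are notation introduced here to mirror the variable names `nx`, `ny`, `sx`, `sy`, and `SP`; the confidence interval shown uses $\alpha = 0.05$, matching the `0.975` quantile in the code.

$$
s_p = \sqrt{\frac{(n_x - 1)\, s_x^2 + (n_y - 1)\, s_y^2}{n_x + n_y - 2}}, \qquad df = n_x + n_y - 2
$$

$$
t = \frac{(\bar{x} - \bar{y}) - 0}{s_p \sqrt{\dfrac{1}{n_x} + \dfrac{1}{n_y}}}
$$

$$
(\bar{x} - \bar{y}) \pm t_{1 - \alpha/2,\; df} \; s_p \sqrt{\frac{1}{n_x} + \frac{1}{n_y}}
$$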
Finding p-values
# The p-value at the left-hand side of Critical Region
pt(tcal, df)
# The p-value for two-tailed Critical Region
# -abs() makes this correct whether tcal is negative or positive
2 * pt(-abs(tcal), df)
# The p-value at the right-hand side of Critical Region
1 - pt(tcal, df)
Finding Critical Values from the t-Table
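# Critical values, assuming a 0.05 significance level
# Left Tail (one-tailed test)
qt(0.05, df = df, lower.tail = TRUE)
# Right Tail (one-tailed test)
qt(0.95, df = df, lower.tail = TRUE)
# Both tails (two-tailed test): lower and upper critical values
qt(c(0.025, 0.975), df = df)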
Performing Two-Sample Confidence Interval and t-Test Using Built-in Function
The two-sample mean comparison test can also be performed using the built-in t.test() function in the R Language.
# Left Tail test
t.test(X, Y, alternative = "less", var.equal = TRUE)
# Right Tail test
t.test(X, Y, alternative = "greater", var.equal = TRUE)
# Two Tail test
t.test(X, Y, alternative = "two.sided", var.equal = TRUE)
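The argument `var.equal = TRUE` requests the pooled-variance test, which is what the manual calculation implements. By default (`var.equal = FALSE`), `t.test()` uses the Welch approximation to the degrees of freedom instead. The sketch below checks that the pooled result matches the manual one and shows how the degrees of freedom differ; `pooled` and `welch` are illustrative names.

```r
X <- c(70, 82, 78, 70, 74, 82, 90)
Y <- c(60, 80, 91, 89, 77, 69, 88, 82)

pooled <- t.test(X, Y, alternative = "two.sided", var.equal = TRUE)
welch  <- t.test(X, Y, alternative = "two.sided")  # default: var.equal = FALSE

# Manual pooled-variance calculation
nx <- length(X)
ny <- length(Y)
df <- nx + ny - 2
SP <- sqrt(((nx - 1) * var(X) + (ny - 1) * var(Y)) / df)
tcal <- (mean(X) - mean(Y)) / (SP * sqrt(1/nx + 1/ny))

# The built-in pooled test reproduces the manual statistic and df
all.equal(unname(pooled$statistic), tcal)
all.equal(unname(pooled$parameter), df)

# Degrees of freedom: pooled versus Welch
c(pooled = unname(pooled$parameter), welch = unname(welch$parameter))
```

The Welch degrees of freedom are generally not an integer and never exceed `nx + ny - 2`.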
Note that if the $X$ and $Y$ values are stored in a data frame, the two-sample t-test can be performed using the formula operator (~). Let’s first create the data frame from the vectors $X$ and $Y$.
data <- data.frame(values = c(X, Y), group = c(rep("A", nx), rep("B", ny)))
t.test(values ~ group, data = data, alternative = "less", var.equal = TRUE)
t.test(values ~ group, data = data, alternative = "greater", var.equal = TRUE)
t.test(values ~ group, data = data, alternative = "two.sided", var.equal = TRUE)
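For dependent (paired) samples, the same `t.test()` function can be used with `paired = TRUE`. The wage vectors above have different lengths and cannot be paired, so the sketch below uses small hypothetical before/after measurements that are not from the article and serve only to illustrate the call. It also shows that the paired test is equivalent to a one-sample t-test on the differences.

```r
# Hypothetical paired data (for illustration only, not from the article):
# the same six subjects measured before and after a treatment
before <- c(72, 80, 65, 90, 78, 84)
after  <- c(75, 79, 70, 94, 80, 89)

# Dependent (paired) sample t-test
paired <- t.test(after, before, paired = TRUE, alternative = "two.sided")
paired

# Equivalent one-sample t-test on the pairwise differences
d <- after - before
one_sample <- t.test(d, mu = 0, alternative = "two.sided")

all.equal(unname(paired$statistic), unname(one_sample$statistic))
all.equal(paired$p.value, one_sample$p.value)
```

The degrees of freedom are the number of pairs minus one, and the `alternative` argument works the same way as in the one-sample and two-sample cases.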
To learn more about probability distribution functions in R, see: Probability Distributions in R