Mean Comparison Tests in R

Comparing means between groups is fundamental in statistical analysis and data science. Whether you are testing drug efficacy, evaluating marketing campaigns, or analyzing experimental data, the R language provides powerful tools for mean comparison tests.


Here, we cover the basics of performing mean comparison tests in R: hypothesis tests for one sample, two independent samples, and dependent samples. We also show how to find p-values and critical-region values from a distribution such as the t-distribution, and how to perform one-tailed and two-tailed hypothesis tests.

How to Perform One-Sample t-Test in R

A recent article in The Wall Street Journal reported that the 30-year mortgage rate is now less than 6%. A sample of eight small banks in the Midwest revealed the following 30-year rates (in percent):

4.8, 5.3, 6.5, 4.8, 6.1, 5.8, 6.2, 5.6

At the 0.01 significance level (probability of type-I error), can we conclude that the 30-year mortgage rate for small banks is less than 6%?
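
Stated formally, this is a left-tailed test. As a sketch, writing $\mu$ for the mean 30-year mortgage rate of small banks (a symbol introduced here for illustration), the hypotheses can be written as:

$$H_0: \mu \ge 6 \qquad \text{versus} \qquad H_1: \mu < 6, \qquad \alpha = 0.01$$

The claim to be supported (a rate below 6%) goes in the alternative hypothesis, which is why the left tail of the t-distribution is the rejection region.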

Manual Calculations for One-Sample t-Test and Confidence Interval

A one-sample mean comparison test can be carried out manually, step by step.

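# Manual way
X <- c(4.8, 5.3, 6.5, 4.8, 6.1, 5.8, 6.2, 5.6)
xbar <- mean(X)   # sample mean
s <- sd(X)        # sample standard deviation
mu <- 6           # hypothesized mean
n <- length(X)    # sample size
df <- n - 1       # degrees of freedom
tcal <- (xbar - mu) / (s / sqrt(n))   # calculated t statistic
tcal
# 99% two-sided confidence interval for the mean
c(xbar - qt(0.995, df = df) * s / sqrt(n), xbar + qt(0.995, df = df) * s / sqrt(n))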
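
The calculation above corresponds to the usual one-sample t statistic and the two-sided 99% confidence interval. In the formulas below, $\mu_0$ stands for the hypothesized mean (`mu` in the code); that symbol is introduced here only for illustration:

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, \qquad \bar{x} \pm t_{0.995,\,n-1}\,\frac{s}{\sqrt{n}}$$

Here $\bar{x}$ is the sample mean, $s$ the sample standard deviation, $n$ the sample size, and $t_{0.995,\,n-1}$ the 0.995 quantile of the t-distribution with $n - 1$ degrees of freedom, which is what `qt(0.995, df = df)` returns.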


Critical Values from t-Table

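# Critical value for a left-tailed test (alpha = 0.01)
qt(0.01, df = df, lower.tail = TRUE)
# Critical value for a right-tailed test (alpha = 0.01)
qt(0.99, df = df, lower.tail = TRUE)
# Critical value for a two-tailed test (alpha = 0.01, i.e. 0.005 in each tail)
qt(0.995, df = df)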

Finding p-Values

# p-value (alternative is less)
pt(tcal, df = df)
# p-value (alternative is greater)
1 - pt(tcal, df = df)
# p-value (alternative is two-tailed, i.e. not equal to)
# -abs() keeps the result valid whether tcal is negative or positive
2 * pt(-abs(tcal), df = df)
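
The critical value and the p-value lead to the same decision. The following is a minimal sketch of both decision rules for the left-tailed mortgage-rate question; the names `alpha`, `t_crit`, `p_left`, `reject_by_critical`, and `reject_by_pvalue` are introduced here for illustration only.

```r
# Data and hypothesized mean from the mortgage-rate example
X <- c(4.8, 5.3, 6.5, 4.8, 6.1, 5.8, 6.2, 5.6)
mu <- 6
alpha <- 0.01   # significance level stated in the problem

n <- length(X)
df <- n - 1
tcal <- (mean(X) - mu) / (sd(X) / sqrt(n))

# Critical-value approach: reject H0 if tcal falls below the left-tail cutoff
t_crit <- qt(alpha, df = df)
reject_by_critical <- tcal < t_crit

# p-value approach: reject H0 if the left-tail p-value is below alpha
p_left <- pt(tcal, df = df)
reject_by_pvalue <- p_left < alpha

c(tcal = tcal, t_crit = t_crit, p_left = p_left)
c(critical = reject_by_critical, p_value = reject_by_pvalue)
```

For these data, both flags come out `FALSE`: the calculated statistic does not fall in the rejection region, so at the 0.01 level the sample does not support the claim that the mean rate is below 6%.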

Performing One-Sample Confidence Interval and t-test Using Built-in Function

The same one-sample mean comparison test can be performed with the built-in t.test() function in R.

# Left-tailed test
t.test(x = X, mu = 6, alternative = "less", conf.level = 0.99)
# Right-tailed test
t.test(x = X, mu = 6, alternative = "greater", conf.level = 0.99)
# Two-tailed test
t.test(x = X, mu = 6, alternative = "two.sided", conf.level = 0.99)
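
Instead of reading the printed output, the individual results can be taken from the object that `t.test()` returns. The sketch below stores the result in a variable (`res` and `two_sided` are names chosen here for illustration) and extracts its components.

```r
# Mortgage-rate data from the one-sample example
X <- c(4.8, 5.3, 6.5, 4.8, 6.1, 5.8, 6.2, 5.6)

# Left-tailed test, stored instead of printed
res <- t.test(x = X, mu = 6, alternative = "less", conf.level = 0.99)

res$statistic   # calculated t value
res$parameter   # degrees of freedom
res$p.value     # p-value for the chosen alternative
res$conf.int    # one-sided interval: lower limit is -Inf for "less"

# The two-sided call gives the interval that matches the manual calculation
two_sided <- t.test(x = X, mu = 6, alternative = "two.sided", conf.level = 0.99)
two_sided$conf.int
```

Note that a one-sided alternative produces a one-sided confidence interval, so only the `"two.sided"` call reproduces the interval built manually with `qt(0.995, df = df)`.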

How to Perform a Two-Sample t-Test in R

Suppose we have two samples stored in two vectors $X$ and $Y$, as shown in the R code below. We are interested in a mean comparison test between two groups of people regarding (say) their wages in a certain week.

X = c(70, 82, 78, 70, 74, 82, 90)
Y = c(60, 80, 91, 89, 77, 69, 88, 82)

Manual Calculations for Two-Sample t-Test and Confidence Interval

The manual calculation for the two-sample t-test, which here uses a pooled standard deviation and so assumes equal population variances, is as follows.

nx = length(X)
ny = length(Y)
xbar = mean(X)
sx = sd(X)
ybar = mean(Y)
sy = sd(Y)
df = nx + ny - 2

# Pooled standard deviation
SP = sqrt(((nx - 1) * sx^2 + (ny - 1) * sy^2) / df)
# Calculated t statistic (hypothesized difference is 0)
tcal = ((xbar - ybar) - 0) / (SP * sqrt(1/nx + 1/ny))
tcal
# 95% confidence interval for the difference in means
LL <- (xbar - ybar) - qt(0.975, df) * sqrt(SP^2 * (1/nx + 1/ny))
UL <- (xbar - ybar) + qt(0.975, df) * sqrt(SP^2 * (1/nx + 1/ny))
c(LL, UL)
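
Written as formulas, the quantities computed above are the pooled standard deviation, the t statistic for a hypothesized difference of zero, and the 95% confidence interval for the difference in means. The subscripted symbols mirror the variable names in the code (`SP`, `sx`, `sy`, `nx`, `ny`):

$$s_p = \sqrt{\frac{(n_x - 1)\,s_x^2 + (n_y - 1)\,s_y^2}{n_x + n_y - 2}}, \qquad t = \frac{(\bar{x} - \bar{y}) - 0}{s_p\sqrt{\dfrac{1}{n_x} + \dfrac{1}{n_y}}}$$

$$(\bar{x} - \bar{y}) \pm t_{0.975,\,n_x + n_y - 2}\; s_p\sqrt{\frac{1}{n_x} + \frac{1}{n_y}}$$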

Finding p-values

# The p-value for a left-tailed critical region
pt(tcal, df)
# The p-value for a two-tailed critical region
# -abs() keeps the result valid whether tcal is negative or positive
2 * pt(-abs(tcal), df)
# The p-value for a right-tailed critical region
1 - pt(tcal, df)

Finding Critical Values from the t-Table

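# Critical values at the 0.05 significance level
# Left-tailed test
qt(0.05, df = df, lower.tail = TRUE)
# Right-tailed test
qt(0.95, df = df, lower.tail = TRUE)
# Two-tailed test (reject if the absolute value of tcal exceeds this)
qt(0.975, df = df)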

Performing Two-Sample Confidence Interval and t-Test Using Built-in Function

The two-sample mean comparison test can also be performed with the built-in t.test() function. Setting var.equal = TRUE requests the pooled-variance test used in the manual calculation.

# Left-tailed test
t.test(X, Y, alternative = "less", var.equal = TRUE)
# Right-tailed test
t.test(X, Y, alternative = "greater", var.equal = TRUE)
# Two-tailed test
t.test(X, Y, alternative = "two.sided", var.equal = TRUE)
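
As a quick consistency check, the manual pooled-variance statistic can be compared with the value reported by `t.test()`. The sketch below also shows what happens when `var.equal` is left at its default of `FALSE`, in which case `t.test()` runs the Welch test with adjusted degrees of freedom. The object names `pooled` and `welch` are chosen here for illustration.

```r
# Wage data from the two-sample example
X <- c(70, 82, 78, 70, 74, 82, 90)
Y <- c(60, 80, 91, 89, 77, 69, 88, 82)

# Manual pooled-variance t statistic
nx <- length(X)
ny <- length(Y)
df <- nx + ny - 2
SP <- sqrt(((nx - 1) * var(X) + (ny - 1) * var(Y)) / df)
tcal <- (mean(X) - mean(Y)) / (SP * sqrt(1 / nx + 1 / ny))

# Built-in pooled test: statistic and two-tailed p-value should agree
pooled <- t.test(X, Y, alternative = "two.sided", var.equal = TRUE)
all.equal(unname(pooled$statistic), tcal)
all.equal(pooled$p.value, 2 * pt(-abs(tcal), df))

# Default call (var.equal = FALSE) is the Welch test, not the pooled test
welch <- t.test(X, Y, alternative = "two.sided")
welch$parameter   # fractional degrees of freedom, not nx + ny - 2
```

Which version is appropriate depends on whether the equal-variance assumption is reasonable for the data; the manual calculation in this section matches only the `var.equal = TRUE` call.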

Note that if the $X$ and $Y$ values are stored in a data frame, the two-sample t-test is performed using the formula symbol (~). Let's first make the data frame from the vectors $X$ and $Y$.

# Long format: one column of values, one column identifying the group
data <- data.frame(values = c(X, Y), group = c(rep("A", nx), rep("B", ny)))
t.test(values ~ group, data = data, alternative = "less", var.equal = TRUE)
t.test(values ~ group, data = data, alternative = "greater", var.equal = TRUE)
t.test(values ~ group, data = data, alternative = "two.sided", var.equal = TRUE)
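
With the formula interface, the direction of a one-tailed test depends on the order of the group levels: the difference is taken as first level minus second level. The sketch below illustrates this, assuming the same data as above; the column `group_rev` and the object names are hypothetical and introduced only for this example.

```r
# Wage data from the two-sample example, in long format
X <- c(70, 82, 78, 70, 74, 82, 90)
Y <- c(60, 80, 91, 89, 77, 69, 88, 82)
data <- data.frame(values = c(X, Y),
                   group = c(rep("A", length(X)), rep("B", length(Y))))

# Levels sort alphabetically, so "A" (the X values) is the first group
by_formula <- t.test(values ~ group, data = data,
                     alternative = "less", var.equal = TRUE)
by_vectors <- t.test(X, Y, alternative = "less", var.equal = TRUE)
all.equal(by_formula$statistic, by_vectors$statistic)

# Reversing the level order flips the sign of the statistic,
# so "less" now refers to B minus A
data$group_rev <- factor(data$group, levels = c("B", "A"))
rev_formula <- t.test(values ~ group_rev, data = data,
                      alternative = "less", var.equal = TRUE)
c(original = unname(by_formula$statistic),
  reversed = unname(rev_formula$statistic))
```

Checking the level order before running a one-tailed test helps ensure that `"less"` and `"greater"` refer to the intended direction.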


To learn more about the probability distribution functions used here, such as pt() and qt(), see Probability Distributions in R.
