Before performing a statistical test that assumes normality (for example, the one-sample t-test), one should check that assumption. In this article, we discuss the Shapiro-Wilk test in R. The hypotheses are
$H_0$: The data are normally distributed
$H_1$: The data are not normally distributed
Performing Shapiro-Wilk Test in R
To check the normality using the Shapiro-Wilk test in R, we will use a built-in data set of mtcars.
attach(mtcars)
shapiro.test(mpg)
The results indicate no significant departure from normality for the $mpg$ variable: the p-value from the Shapiro-Wilk test (about 0.12) is greater than the 0.05 level of significance, so the null hypothesis of normality is not rejected.
By looking at the p-value, one can decide whether or not to reject the null hypothesis of normality:
- If the p-value is less than the chosen significance level (e.g., 0.05), reject the null hypothesis and conclude that the data are likely not normally distributed.
- If the p-value is greater than the chosen significance level, fail to reject the null hypothesis; the data are consistent with normality, although this does not confirm it.
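The decision rule above can be sketched in a few lines of R. This is a minimal illustration rather than part of the original analysis: the names alpha, result, w_stat, p_val, and decision are chosen here for the example, and mtcars$mpg is used instead of attach() so that the snippet is self-contained.

```r
# Significance level (an assumption for this sketch; 0.05 is the usual choice)
alpha <- 0.05

# Run the Shapiro-Wilk test on mpg without relying on attach()
result <- shapiro.test(mtcars$mpg)

# The returned "htest" object stores the W statistic and the p-value
w_stat <- unname(result$statistic)
p_val  <- result$p.value

# Apply the decision rule described in the text
decision <- if (p_val < alpha) {
  "reject H0: evidence against normality"
} else {
  "fail to reject H0: no evidence against normality"
}
print(decision)
```

Storing the test result in an object, rather than only printing it, makes it easy to reuse the p-value in later steps of an analysis.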
Normality can also be assessed visually using a QQ plot.
# QQ plot from the base package
qqnorm(mpg, pch = 1, frame.plot = FALSE)
qqline(mpg, col = "red", lwd = 2)
From the base-package QQ plot, a few points can be seen to depart from the reference line, which suggests that the $mpg$ variable deviates somewhat from a normal distribution.
# QQ plot from the car package
library(car)
qqPlot(mpg)
From the QQ plot (with confidence interval band), one can observe that the $mpg$ variable is approximately normally distributed.
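To get a feel for what a clear departure from normality looks like, it can help to place the QQ plot of $mpg$ next to that of a sample known to be skewed. The sketch below is only illustrative: the simulated exponential sample, its size, and the seed are assumptions introduced here and are not part of the mtcars data.

```r
set.seed(1)  # arbitrary seed so the simulated sample is reproducible

# Hypothetical right-skewed sample for comparison (not from the article's data)
skewed <- rexp(200, rate = 1)

# Draw the two QQ plots side by side, then restore the plotting layout
old_par <- par(mfrow = c(1, 2))
qqnorm(mtcars$mpg, main = "mpg")
qqline(mtcars$mpg, col = "red", lwd = 2)
qqnorm(skewed, main = "Simulated exponential sample")
qqline(skewed, col = "red", lwd = 2)
par(old_par)

# Shapiro-Wilk p-values for the two samples
p_mpg    <- shapiro.test(mtcars$mpg)$p.value
p_skewed <- shapiro.test(skewed)$p.value
```

In the skewed sample the points should curve away from the reference line, and the Shapiro-Wilk p-value should be very small, whereas the points for $mpg$ stay comparatively close to the line.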
Note that
- The Shapiro-Wilk test is generally more powerful than other normality tests, such as the Kolmogorov-Smirnov test, particularly for small to moderate sample sizes. In R, shapiro.test() accepts between 3 and 5000 non-missing observations.
- It is important to visually inspect the data using a histogram or Q-Q plot to complement the Shapiro-Wilk test results for a more comprehensive assessment of normality.
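As a complement to the QQ plots, a histogram with a fitted normal curve can be drawn with base R. The following is a minimal sketch; the title, axis label, and colors are arbitrary choices made for this example.

```r
mpg <- mtcars$mpg

# Histogram on the density scale so that a density curve can be overlaid
h <- hist(mpg, freq = FALSE, main = "Histogram of mpg",
          xlab = "Miles per gallon", col = "lightgray")

# Normal density using the sample mean and standard deviation of mpg
curve(dnorm(x, mean = mean(mpg), sd = sd(mpg)),
      add = TRUE, col = "red", lwd = 2)
```

If the bars follow the red curve reasonably well, the histogram agrees with the conclusion drawn from the Shapiro-Wilk test; marked skewness or multiple peaks would point the other way.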