Shapiro-Wilk Test in R (2024)

One should check the assumption of normality before performing a statistical test that requires it (for example, the one-sample t-test). In this article, we discuss the Shapiro-Wilk test in R. The hypotheses are

$H_0$: The data are normally distributed

$H_1$: The data are not normally distributed

Performing Shapiro-Wilk Test in R

To check the normality using the Shapiro-Wilk test in R, we will use a built-in data set of mtcars.

attach(mtcars)
shapiro.test(mpg)

Figure: Shapiro-Wilk test output for checking the normality assumption

The results give no evidence against normality of the $mpg$ variable, as the p-value from the Shapiro-Wilk test (about 0.12) is greater than the 0.05 level of significance.

  • By looking at the p-value, one can decide whether to reject or fail to reject the null hypothesis of normality:
    • If the p-value is less than the chosen significance level (e.g., 0.05), reject the null hypothesis and conclude that the data are likely not normally distributed.
    • If the p-value is greater than the chosen significance level, fail to reject the null hypothesis. This suggests the data are consistent with normality, but it does not confirm normality.
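
The decision rule above can be sketched in code by inspecting the object that shapiro.test() returns. This is a minimal illustration: the 0.05 level and the names res, alpha, and decision are assumptions chosen for the example, not part of the original analysis.

```r
# Run the Shapiro-Wilk test on mpg without attaching the data frame
res <- shapiro.test(mtcars$mpg)

# The result is an "htest" object that stores the W statistic and the p-value
res$statistic
res$p.value

# Apply the decision rule at an assumed 5% significance level
alpha <- 0.05
decision <- if (res$p.value < alpha) {
  "reject H0: evidence against normality"
} else {
  "fail to reject H0: no evidence against normality"
}
decision
```

Referring to mtcars$mpg directly avoids attach(), which can mask other objects that share the same name.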

The normality can be visualized using a QQ plot.

# QQ Plot from Base Package
qqnorm(mpg, pch = 1, frame = FALSE)
qqline(mpg, col = "red", lwd = 2)

QQ Plot from Base Package

From the QQ plot of the base package, it can be seen that most points lie close to the reference line, while a few points deviate from it, which suggests a slight departure of the $mpg$ variable from normality.

# QQ plot from car Package
library(car)
qqPlot(mpg)

Figure: QQ plot from the car package

From the QQ plot (with confidence interval band), one can observe that the $mpg$ variable is approximately normally distributed.

Note that

  • The Shapiro-Wilk test is generally more powerful than other normality tests, such as the Kolmogorov-Smirnov test, particularly for small samples. In R, shapiro.test() accepts between 3 and 5000 non-missing values.
  • It is important to visually inspect the data using a histogram or Q-Q plot to complement the Shapiro-Wilk test results for a more comprehensive assessment of normality.
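
As a complement to the QQ plots, a histogram with a fitted normal curve can be sketched as follows. The names h, m, and s and the plot title are illustrative assumptions; the normal curve uses the sample mean and standard deviation of mpg.

```r
# Histogram of mpg on the density scale (freq = FALSE) so a density curve can be overlaid
h <- hist(mtcars$mpg, freq = FALSE, main = "Histogram of mpg", xlab = "mpg")

# Overlay a normal density with the sample mean and standard deviation
m <- mean(mtcars$mpg)
s <- sd(mtcars$mpg)
curve(dnorm(x, mean = m, sd = s), add = TRUE, col = "red", lwd = 2)
```

If the bars follow the red curve reasonably well, the plot agrees with a Shapiro-Wilk result that does not reject normality.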

https://itfeature.com

R Language: A Quick Reference Guide – IV

R Language: A Quick Reference Guide gives short descriptions of widely used R commands. It helps beginners and intermediate users of the R programming language quickly find the function they need. The Quick Reference is organized into groups of related functions. Let us start with R Language: A Quick Reference – IV.

This Quick Reference will help in performing different descriptive statistics on vectors, matrices, lists, data frames, arrays, and factors.

Basic Descriptive Statistics in R Language

The following is a list of widely used functions that help in computing descriptive statistics. They are not descriptive statistics functions themselves, but they serve as building blocks for computing other descriptive statistics.

| R Command | Short Description |
|---|---|
| sum(x1, x2, …, xn) | Computes the sum (total) of the $n$ numeric values given as arguments |
| prod(x1, x2, …, xn) | Computes the product of the $n$ numeric values given as arguments |
| min(x1, x2, …, xn) | Gives the smallest of all $n$ values given as arguments |
| max(x1, x2, …, xn) | Gives the largest of all $n$ values given as arguments |
| range(x1, x2, …, xn) | Gives both the smallest and the largest of all $n$ values given as arguments |
| pmin(x1, x2, …) | Returns the element-wise (parallel) minima of the input vectors |
| pmax(x1, x2, …) | Returns the element-wise (parallel) maxima of the input vectors |
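
A short sketch of these functions on two small vectors follows. The vectors x and y are made up for illustration. Note the difference between min(), which returns a single value, and pmin(), which compares the vectors position by position.

```r
# Two illustrative vectors of equal length
x <- c(4, 9, 2, 7)
y <- c(5, 1, 8, 3)

sum(x)       # 22
prod(x)      # 504
min(x, y)    # 1  (smallest value over both vectors)
max(x, y)    # 9  (largest value over both vectors)
range(x, y)  # 1 9
pmin(x, y)   # 4 1 2 3  (element-wise minima)
pmax(x, y)   # 5 9 8 7  (element-wise maxima)
```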

Statistical Descriptive Statistics in R Language

The following functions are used to compute measures of central tendency, measures of dispersion, and measures of position.

| R Command | Short Description |
|---|---|
| mean(x) | Computes the arithmetic mean of all elements in $x$ |
| sd(x) | Computes the standard deviation of all elements in $x$ |
| var(x) | Computes the variance of all elements in $x$ |
| median(x) | Computes the median of all elements in $x$ |
| quantile(x) | Computes the median, quartiles, and extremes in $x$ |
| quantile(x, p) | Computes the quantiles specified by $p$ |
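
The following sketch applies these functions to a small made-up vector x. Note that var() and sd() in R compute the sample versions, with divisor $n - 1$.

```r
# Illustrative data
x <- c(2, 4, 4, 5, 7, 8)

mean(x)     # 5
median(x)   # 4.5
var(x)      # 4.8  (sample variance, divisor n - 1)
sd(x)       # square root of var(x)

# Five-number style summary: minimum, quartiles, and maximum
quantile(x)

# Quantiles at user-specified probabilities
quantile(x, c(0.1, 0.9))
```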

Cumulative Summaries in R Language

The following functions compute running (cumulative) summaries of a vector, which are helpful in other descriptive calculations.

| R Command | Short Description |
|---|---|
| cumsum(x) | Computes the cumulative sum of $x$ |
| cumprod(x) | Computes the cumulative product of $x$ |
| cummin(x) | Computes the cumulative minimum of $x$ |
| cummax(x) | Computes the cumulative maximum of $x$ |
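
A brief sketch on a made-up vector shows that each function returns a vector of the same length as its input, where element $i$ summarizes the first $i$ values.

```r
# Illustrative data
x <- c(3, 1, 4, 2)

cumsum(x)   # 3 4 8 10
cumprod(x)  # 3 3 12 24
cummin(x)   # 3 1 1 1
cummax(x)   # 3 3 4 4
```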

Sorting and Ordering Elements in R Language

The sorting and ordering functions are especially useful in non-parametric methods.

| R Command | Short Description |
|---|---|
| sort(x) | Sorts all the elements of $x$ in ascending order |
| sort(x, decreasing = TRUE) | Sorts all the elements of $x$ in descending order |
| rev(x) | Reverses the elements in $x$ |
| order(x) | Gets the ordering permutation of $x$ |
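
The difference between sort() and order() is easiest to see on a small made-up vector: sort() returns the sorted values, while order() returns the positions that would sort them. The rank() call at the end is not part of the list above; it is included only because ranks are what many non-parametric methods work with.

```r
# Illustrative data
x <- c(30, 10, 40, 20)

sort(x)                     # 10 20 30 40
sort(x, decreasing = TRUE)  # 40 30 20 10
rev(x)                      # 20 40 10 30
order(x)                    # 2 4 1 3  (positions of the sorted values)
x[order(x)]                 # 10 20 30 40  (same as sort(x))

# Not in the list above: the rank of each element within x
rank(x)                     # 3 1 4 2
```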

Sequence and Repetition of Elements in R Language

These functions are used to generate a sequence of numbers or repeat the set of numbers $n$ times.

| R Command | Short Description |
|---|---|
| a:b | Generates a sequence of numbers from $a$ to $b$ in steps of size 1 |
| seq(n) | Generates a sequence of numbers from 1 to $n$ |
| seq(a, b) | Generates a sequence of numbers from $a$ to $b$ in steps of size 1; it is the same as a:b |
| seq(a, b, by = s) | Generates a sequence of numbers from $a$ to $b$ in steps of size $s$ |
| seq(a, b, length.out = n) | Generates a sequence of $n$ equally spaced numbers from $a$ to $b$ |
| rep(x, n) | Repeats the whole of $x$ $n$ times |
| rep(x, each = n) | Repeats each element of $x$ $n$ times before moving to the next element |
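
A quick sketch of the sequence and repetition functions follows; the numbers are arbitrary. It uses the full argument name length.out, which the shorter length = n matches by partial argument matching.

```r
3:7                        # 3 4 5 6 7
seq(5)                     # 1 2 3 4 5
seq(3, 7)                  # 3 4 5 6 7  (same as 3:7)
seq(0, 1, by = 0.25)       # 0.00 0.25 0.50 0.75 1.00
seq(0, 1, length.out = 5)  # 0.00 0.25 0.50 0.75 1.00

rep(c(1, 2), 3)            # 1 2 1 2 1 2  (whole vector repeated)
rep(c(1, 2), each = 3)     # 1 1 1 2 2 2  (each element repeated)
```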

The Poisson Regression in R

The Poisson regression model should be used when the dependent (response) variable is a count, that is, when its values can be assumed to follow a Poisson distribution. In R, the glm() function can be used to perform Poisson regression analysis.

Note that the lm() function fits simple and multiple linear regression models, which are appropriate when the dependent variable is continuous.

Poisson Regression Models in R Language

Statistical models such as linear or Poisson regression models can be fitted easily in the R language.

The Poisson regression is used to analyze count data.

For the Poisson model, let us consider another built-in data set warpbreaks. This data set describes the effect of wool type (A or B) and tension (Low, Medium, and High) on the number of warp breaks per loom, where a loom corresponds to a fixed length of yarn.

head(warpbreaks)

The $breaks$ variable is the response variable since it contains the number of breaks (a count). The $wool$ and $tension$ variables are taken as predictor variables.

pois_mod <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)

The output from the pois_mod object is

Figure: Output of the pois_mod object (Poisson regression using glm())

The glm() provides eight choices for a family with the following default link functions:

| Family | Default Link Function |
|---|---|
| binomial | (link = "logit") |
| gaussian | (link = "identity") |
| Gamma | (link = "inverse") |
| inverse.gaussian | (link = "1/mu^2") |
| poisson | (link = "log") |
| quasi | (link = "identity", variance = "constant") |
| quasibinomial | (link = "logit") |
| quasipoisson | (link = "log") |
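
The default links can also be read from the family objects themselves. The sketch below calls each family function with no arguments and collects the family and link components; the names fams and links are illustrative.

```r
# Create each family object with its default link
fams <- list(binomial(), gaussian(), Gamma(), inverse.gaussian(),
             poisson(), quasi(), quasibinomial(), quasipoisson())

# Collect the family name and default link name of each object
links <- data.frame(
  family = sapply(fams, function(f) f$family),
  link   = sapply(fams, function(f) f$link)
)
links
```

A non-default link is requested by passing it to the family function, for example poisson(link = "sqrt").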

The detailed output (estimation and testing of parameters) can be obtained as

summary(pois_mod)

Figure: Summary output of the Poisson regression model
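
Because the Poisson family uses a log link by default, the coefficients in the summary output are on the log scale. A minimal sketch of how they might be turned into rate ratios and predicted counts is given below; the names rate_ratios, new_loom, and pred are illustrative assumptions.

```r
# Refit the model so this example is self-contained
pois_mod <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)

# Exponentiated coefficients are multiplicative effects on the expected count
rate_ratios <- exp(coef(pois_mod))
rate_ratios

# Predicted mean number of breaks for wool A at low tension (the baseline levels)
new_loom <- data.frame(wool = "A", tension = "L")
pred <- predict(pois_mod, newdata = new_loom, type = "response")
pred
```

For the baseline combination (wool A, tension L), the predicted count equals the exponentiated intercept. For other combinations, the baseline count is multiplied by the corresponding rate ratios.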

Poisson Example

  • Number of cargo ships damaged by waves (McCullagh & Nelder, 1989).
  • Number of deaths due to AIDS in Australia per quarter (3-month periods) from January 1983 – June 1986.
  • Number of violent incidents exhibited over a 6-month period by patients who had been treated in the ER of a psychiatric hospital (Gardner, Mulvey, & Shaw, 1995).
  • Daily homicide counts in California (Grogger, 1990).
  • Founding of daycare centers in Toronto (Baum & Oliver, 1992).
  • Political party-switching among members of the US House of Representatives (King, 1988).
  • Number of presidential appointments to the Supreme Court (King, 1987).
  • Number of children in a classroom that a child lists as being their friend (unlimited nomination procedure, sociometric data).
  • Number of hard disk failures during a year.
  • Number of deaths due to SARS (Yu, Chan & Fung, 2006).
  • Number of arrests resulting from 911 calls.
  • Number of orders of protection issued.

FAQs about Poisson Regression in R

  1. What function is used in R to perform Poisson regression?
  2. Describe the important arguments of the glm() function used to fit a Poisson regression model in R.
  3. Give real-life examples of data sets for which Poisson regression may be performed.
  4. List the default link function of each family available in glm().
  5. How is the Poisson model different from linear regression models?