One-Way ANOVA in R: A Comprehensive Guide

In this post, we will learn about one-way ANOVA in R Language.

The two-sample t-test or z-test is used to compare the means of two groups drawn from independent populations. However, if there are more than two groups, One-Way ANOVA (analysis of variance) or one of its extensions can be used in R.

Introduction to One-Way ANOVA

The test statistic associated with ANOVA is the F-statistic (also called the F-ratio). In the ANOVA procedure, an observed F-value is computed and then compared with a critical F-value derived from the relevant F-distribution. The F-distribution is a family of distributions defined by two numbers (the numerator and denominator degrees of freedom). Note that the F-value cannot be negative, as it is a ratio of variances and variances are never negative.

The One-Way ANOVA is also known as one-factor ANOVA. It is the extension of the independent two-sample test for comparing means when there are more than two groups. The data in One-Way ANOVA is organized into several groups based on grouping variables (called factor variables too).

To compute the F-value, the ratio of the “variance between groups” to the “variance within groups” needs to be computed. The assumptions of ANOVA should also be checked before performing the ANOVA test. We will learn how to perform One-Way ANOVA in R.
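The ratio described above can be sketched by hand. The following is a minimal illustration, assuming the built-in mtcars data (mpg grouped by cyl); the variable names are chosen here for illustration and are not part of the original post.

```r
# Manual F-ratio for mpg grouped by number of cylinders (built-in mtcars data)
y <- mtcars$mpg
g <- factor(mtcars$cyl)

k <- nlevels(g)   # number of groups
n <- length(y)    # total number of observations

group_means <- tapply(y, g, mean)
group_sizes <- tapply(y, g, length)

# between-group and within-group sums of squares
ss_between <- sum(group_sizes * (group_means - mean(y))^2)
ss_within  <- sum((y - ave(y, g))^2)   # ave() returns each observation's group mean

# mean squares (variances) and their ratio
ms_between <- ss_between / (k - 1)
ms_within  <- ss_within / (n - k)
f_value    <- ms_between / ms_within

# upper-tail probability from the F-distribution with (k - 1, n - k) df
p_value <- pf(f_value, df1 = k - 1, df2 = n - k, lower.tail = FALSE)
f_value
p_value
```

The value of f_value should agree with the F value reported by summary() on the corresponding aov() fit.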

One-Way ANOVA in R

Suppose we are interested in whether the mean miles per gallon differs with the number of cylinders in an automobile, using the dataset “mtcars”.

Let us get some basic insight into the data before performing the ANOVA.

# load and attach the data mtcars
attach(mtcars)
# see the variable names and initial observations
head(mtcars)

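Let us find the mean mpg of each cylinder group. The model is fitted inline here because the object res is only created later in this post.

print(model.tables(aov(mpg ~ factor(cyl)), "means"), digits = 4)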

Let us draw the boxplot of each group

boxplot(mpg ~ cyl, main="Boxplot", xlab="Number of Cylinders", ylab="mpg")

Now, One-Way ANOVA can be performed in R using the aov( ) function. For example,

aov(mpg ~ cyl)

The variable “mpg” is continuous and the variable “cyl” is the grouping variable. From the output, note the degrees of freedom for the variable “cyl”: it will be one. This means the results are not correct, because the degrees of freedom should be two as there are three groups in “cyl”. The mode (data type) of the grouping variable required for ANOVA should be factor. For this purpose, the “cyl” variable can be converted to a factor as

cyl <- as.factor(cyl)

Now re-issue the aov( ) function as

aov(mpg ~ cyl)

Now the results will be as required. To get the ANOVA table, use the summary( ) function as

summary(aov (mpg ~ cyl))

Let’s store the ANOVA results obtained from aov( ) in an object, say res

res <- aov(mpg ~ cyl)
summary(res)
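The effect of the factor conversion on the degrees of freedom can be checked directly. The sketch below is one possible way to do it, assuming the data = argument of aov( ) is used instead of attach( ); the object names fit_numeric and fit_factor are illustrative only.

```r
# cyl treated as a numeric variable: a single slope, hence 1 degree of freedom
fit_numeric <- aov(mpg ~ cyl, data = mtcars)

# cyl treated as a factor: three groups, hence 3 - 1 = 2 degrees of freedom
fit_factor <- aov(mpg ~ factor(cyl), data = mtcars)

# first entry of the Df column belongs to the grouping term
df_numeric <- summary(fit_numeric)[[1]][["Df"]][1]
df_factor  <- summary(fit_factor)[[1]][["Df"]][1]

c(numeric = df_numeric, factor = df_factor)
```

Passing data = mtcars keeps the variables inside the data frame, so no copy of cyl is left in the workspace.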

Post-Hoc Analysis (Multiple Pairwise Comparison)

Post-hoc tests (multiple pairwise comparison tests) help in finding out which groups differ significantly from one another and which do not. The post-hoc tests allow for multiple pairwise comparisons without inflating the type-I error. To understand why this matters, suppose the level of significance (type-I error) is 5% for each comparison. With three groups there are three pairwise comparisons and, assuming the three tests are independent, the probability of making at least one type-I error (the family-wise error rate) will be

$1-(0.95 \times 0.95 \times 0.95) = 0.1426 \approx 14.3\%$

This is the probability of having at least one false alarm (type-I error), which is well above the nominal 5% level.
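The same arithmetic can be reproduced in R. This is a small sketch that assumes independent tests, as in the text; alpha and m are names chosen here for illustration.

```r
alpha <- 0.05   # per-comparison significance level
m <- 1:3        # number of independent comparisons

# probability of at least one type-I error among m independent tests
fwer <- 1 - (1 - alpha)^m
round(fwer, 4)
# 0.0500 0.0975 0.1426
```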

Tukey’s post-hoc test can be performed, and the differences in group means from Tukey’s test plotted, as follows.

# Tukey Honestly Significant Differences
TukeyHSD(res)
plot(TukeyHSD(res))
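The object returned by TukeyHSD( ) can also be inspected programmatically. The following sketch assumes a copy of mtcars (called cars here, an illustrative name) with cyl converted to a factor, and filters the pairwise comparisons by adjusted p-value.

```r
# copy of the data with cyl as a factor, so the model term is simply "cyl"
cars <- transform(mtcars, cyl = factor(cyl))
fit <- aov(mpg ~ cyl, data = cars)

tukey <- TukeyHSD(fit)

# one row per pair of groups; columns: diff, lwr, upr, p adj
pairs <- tukey$cyl
pairs

# keep only the pairs whose adjusted p-value is below 5%
significant <- pairs[pairs[, "p adj"] < 0.05, , drop = FALSE]
significant
```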

Diagnostic Plots (Checking Model Assumptions)

The diagnostic plots can be used to check the assumptions of homoscedasticity (constant variance) and normality of the residuals, and to detect influential observations.

layout(matrix(c(1,2,3,4), 2,2))
plot(res)

Figure: Diagnostic Plots for One-Way ANOVA in R

Levene’s Test

To check the assumption of homogeneity of variances across groups, Levene’s test can be used. For this purpose, the leveneTest( ) function from the car package can be used.

library(car)
leveneTest(res)
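If the car package is not available, the idea behind the test can be sketched in base R: Levene’s test is an ANOVA on the absolute deviations of each observation from its group center. The sketch below assumes the median as the center (the Brown-Forsythe variant); it is an illustration of the idea, not a replacement for leveneTest( ).

```r
y <- mtcars$mpg
g <- factor(mtcars$cyl)

# absolute deviation of each observation from its group median
abs_dev <- abs(y - ave(y, g, FUN = median))

# one-way ANOVA on the absolute deviations
levene_table <- anova(lm(abs_dev ~ g))
levene_table

# a small p-value suggests unequal variances across groups
levene_p <- levene_table[["Pr(>F)"]][1]
levene_p
```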


For loop in R | Simulating Data in R: A Comprehensive Guide

In R, as in other programming languages, the for loop (for statement) allows one to specify a set of commands that should be repeated a fixed number of times. The for loop in R is not limited to integers or even numbers in the input. Character vectors, logical vectors, lists, or even expressions can also be used in a for loop in R.

General Syntax of For Loop in R

The general syntax of the for loop is

for (variable in vector) {
   statements (R code)
}

The curly braces contain a set of commands so that these commands can be treated as a single command and repeated the desired number of times. However, if there is only a single statement, there is no need to use curly braces.

Let us understand the loop through different examples. Note that some of the examples can be done without the use of a for loop or with alternatives such as apply(), lapply(), sapply(), and tapply() functions.

Working Examples of For Loop in R

Example: Suppose you want to compute the squared values for 1 to 10. Let us do this using a for loop as shown below:

for (i in 1:10){
    squared <- i^2
    print(squared)
}

Note: if you write print(squared) command outside the for loop (after the curly braces), then the last result of the loop iteration will be displayed in the console only, that is, the square of the last number (n = 10) will be printed.

To store the result of each iteration in a variable (vector, matrix, data frame, or list), a container (variable) needs to be specified with the same length as the number of loop iterations. For example, the outcome of each iteration (from the above example) can be stored in a variable as,

result <- vector("numeric", 10)

for (i in seq_along(result)){
    squared <- i^2
    result[i] <- squared
}

result

Now results can be displayed without print() command as the results are stored in a container (vector variable). To store results in a data frame or matrix (in the form of the table) with iteration number, the above example can be extended as

result <- data.frame(matrix(nrow = 10, ncol = 2))
colnames(result) <- c("i", "Square")

for (i in 1:10 ){
    squared <- i ^ 2
    result[i, 1] <- i  # stores iteration number in 1st column of data frame
    result[i, 2] <- squared # stores iteration result in 2nd column of data frame
}

result
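As noted earlier, some of these loops can be written without a for loop. A minimal sketch of the same table built with vectorised arithmetic and with sapply( ) is given below; the object names are illustrative.

```r
i <- 1:10

# vectorised: ^ is applied to every element of i at once
result_vec <- data.frame(i = i, Square = i^2)

# sapply(): applies the function to each element and returns a vector
squares <- sapply(i, function(k) k^2)

result_vec
identical(result_vec$Square, squares)
```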

Nesting For Loop in R

Placing the loop inside the body of another loop is called nesting. For nested loops, the outer loop takes control of the iteration of the inner loop. The inner loop will be executed (iterated) n-times for every iteration of the outer loop. For example

for (i in 1:10){
    for (j in 1:5){
        print(i*j)
    }
}

There will be a total of 50 iterations. For each iteration of the first loop (outer loop), there will be five iterations in the inner loop.
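The iteration count can be verified with a counter. The sketch below uses an illustrative variable named count that is increased once per inner iteration.

```r
count <- 0

for (i in 1:10) {        # outer loop: 10 iterations
  for (j in 1:5) {       # inner loop: 5 iterations per outer iteration
    count <- count + 1
  }
}

count   # 10 * 5 = 50
```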

The break statement can be used inside a loop to stop the iteration when a certain condition occurs; as soon as the condition is satisfied, control passes out of the loop. For example,

x <- 1:5
for (i in x){
    if (i == 3){
        break
    }
    print(i)
}

It is also possible to jump to the next iteration using the next statement when certain conditions are satisfied. For example,

x <- 1:5

for (i in x){
    if (i == 2){
        next
    }
    print(i)
}
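To see the difference between the two statements side by side, the visited values can be collected in vectors instead of printed. This is a small sketch; stopped and skipped are illustrative names.

```r
x <- 1:5

# break: leaves the loop entirely once i reaches 3
stopped <- integer(0)
for (i in x) {
  if (i == 3) break
  stopped <- c(stopped, i)
}

# next: skips only the iteration where i equals 2
skipped <- integer(0)
for (i in x) {
  if (i == 2) next
  skipped <- c(skipped, i)
}

stopped   # 1 2
skipped   # 1 3 4 5
```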

Now consider an example of a for loop using a character vector:

v = LETTERS[1:10]
for(i in v){
    print(i)
}

Using For Loop in Simulations

In simulations, loops are used to generate or resample (bootstrap) data. For example, let’s create a variable having 1000 observations (n = 1000), where each observation is 80% of the previous observation plus 20% of a random noise term having a mean of 0 and a standard deviation of 1, that is, $y_t = 0.8\, y_{t-1} + 0.2\, \varepsilon_t$ with $\varepsilon_t \sim N(0, 1)$. The starting value is $y_1 = 1$.

y <- rep(1,1000)

for(i in 2:1000){
    y[i] = 0.8 * y[i-1] + 0.2 * rnorm(1)
}

y
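Because rnorm( ) draws new random numbers on every run, the series changes each time the loop is executed. One possible way to make the simulation reproducible is to fix the seed, as sketched below; the function name simulate_series and the seed value are assumptions made for illustration.

```r
# hypothetical helper: same recursion as above, with a fixed random seed
simulate_series <- function(n = 1000, seed = 123) {
  set.seed(seed)            # fix the random number stream
  y <- rep(1, n)            # y[1] = 1 is the starting value
  for (i in 2:n) {
    y[i] <- 0.8 * y[i - 1] + 0.2 * rnorm(1)
  }
  y
}

y1 <- simulate_series()
y2 <- simulate_series()
identical(y1, y2)   # TRUE, both runs use the same seed
```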

Consider another example of generating simulated data. Suppose you want to simulate data from a mixture of two normal distributions, repeat the simulation many times, and store the data from each repetition.

n = 100
res = list()
N = 1000
X = matrix(0, nrow = N, ncol = 2)

for(i in 1:n){
    U = runif(N, min = 0, max = 1)
    for(j in 1:N){
        # with probability 0.8 draw from N(mean 2.5, sd 3), otherwise from N(mean 2, sd 1)
        if (U[j] < 0.8){
            X[j,] <- rnorm(1, 2.5, 3)
        } else {
            X[j,] <- rnorm(1, 2, 1)
        }
    }

    res[[i]] = X
}

res[[100]]

Figure: Simulated Data using for loop in R

Note that each res[[i]] is a separate data set, which can be used for further calculations.
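The inner loop can also be avoided. The following is a sketch of a vectorised version under the same mixture assumptions (probability 0.8 for the first component); unlike the loop above, it stores each data set as a plain vector rather than a two-column matrix, and the names simulate_mixture and sims are illustrative.

```r
# hypothetical helper: one simulated mixture sample of size N
simulate_mixture <- function(N = 1000) {
  U <- runif(N)
  # choose the component for each observation in one vectorised step
  ifelse(U < 0.8, rnorm(N, 2.5, 3), rnorm(N, 2, 1))
}

set.seed(2024)   # arbitrary seed, only for reproducibility
sims <- replicate(100, simulate_mixture(), simplify = FALSE)

# each list element is a separate data set, e.g. compute its mean
sim_means <- sapply(sims, mean)
head(sim_means)
```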


Lists in R Language: Create, Name, and Append

Before understanding and working on lists in R Language, let’s review the other data types in R Language.

Each of the data types (vectors, matrices, and data frames) has some constraints. For example, vectors are one-dimensional and can store only one type of data. Matrices are two-dimensional, but they can also store only one type of data. Data frames, on the other hand, are two-dimensional and can store different types of data, but all columns of a data frame must have the same length.

Lists in R Language

Lists in R are a fundamental data structure used to store collections of elements. Unlike vectors, which can hold elements of only one data type, lists offer a flexible way to organize data of different types. Lists in R language have none of the constraints of vectors, matrices, and data frames. Each element of a list can contain any type of data, and the elements can have different lengths.

Lists are the most flexible data type in R, as a list can hold all kinds of different objects (vectors, matrices, data frames, functions, and even other lists). Let us consider some examples of how to create lists in R, and how elements of the list can be named, retrieved, and appended. A list is created using the list() function. For example,

mylist <- list('alpha', 'beta', 'gamma', 1:5, TRUE)

Note that lists are printed differently and they have their own form of indexing for retrieving each element of the list.

Retrieving List Elements

Elements of a list can be accessed using the subsetting technique (based on the list indexing mechanism). For example, to access the first three elements of the list created above, one can write in the R console,

mylist[1:3]

To examine one element of the list, double square brackets can be used with the required index number. For example, to access the fourth element of the list, one can use,

mylist[[4]]

Note that when normal subsetting (single square brackets) is used, a list is obtained as the result. When double square brackets are used, the contents of the element are obtained. Note the difference between the commands and the output of each command used below:

mylist[4]
mylist[[4]]
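The difference can be made explicit by checking the class of each result. This is a short sketch using the same mylist as defined above.

```r
mylist <- list('alpha', 'beta', 'gamma', 1:5, TRUE)

single <- mylist[4]     # a list of length 1 that contains the vector 1:5
double <- mylist[[4]]   # the integer vector 1:5 itself

class(single)   # "list"
class(double)   # "integer"

# arithmetic works on the contents, not on the one-element list
sum(double)     # 15
```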

Naming Elements of a List

The elements of a list can be named, and named elements can then be retrieved using the $ operator instead of square brackets. The first command below names the elements of the mylist object; the other two commands then give the same output.

names(mylist) <- c("a", "b", "c", "d", "e")

mylist$d
mylist[[4]]

Appending an Element to List

As with data frames, a new element of a list can be created by assigning something to an index (or name) that does not exist yet. For example, a sixth element is added to the list as

mylist$f <- c("Pass", "Fail")

Now the 6th element of the list contains a vector of the strings “Pass” and “Fail”. To check this,

mylist$f
mylist[[6]]
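Elements can also be appended by position or removed again. The sketch below uses a small separate list (named demo here, purely for illustration) so that mylist is left unchanged.

```r
demo <- list(a = 'alpha', b = 'beta')

demo[["c"]] <- 1:3                  # append by name
demo[[length(demo) + 1]] <- TRUE    # append by position (element gets no name)
demo$b <- NULL                      # assigning NULL removes an element

length(demo)   # 3
names(demo)    # "a" "c" ""
```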

To access the values within a specific list element, double square brackets can be combined with vector subsetting. For example, to access the third value of the fourth list element, one can use,

mylist[[4]][3] # 3rd element of 4th list element
mylist[[4]][1:3] # first three elements of 4th list element
mylist$d[3]

  • Remember to name elements for readability when dealing with complex data structures.
  • Lists are versatile for storing various data types, making them a powerful tool for data organization in R.

For further reading about lists see the link Lists in R.
