Backward Deletion Method Step by Step in R

Introduction to Backward Deletion Method

When you have many predictor variables, the goal is to find the most statistically significant model the data will support. There are two main approaches: forward stepwise regression and the backward deletion method.
In Forward Stepwise Regression: start with the single best variable and add variables one at a time, building the model into a more complex form.

In Backward Deletion (Backward Selection) Regression: put all the variables in the model, then reduce the model by removing variables one at a time until only significant terms remain.

Backward Deletion method (Step by Step Procedure)

The idea is to start with a big model and trim it until you get the best (most statistically significant) regression model. The drop1() command examines a linear model and reports the effect of removing each term from the existing model. Complete the following steps to perform a backward deletion. Note that R also has dedicated packages for backward and forward selection of predictors.

Step 1: (Full Model)

Step 1: To start, create a “full” model that includes all the variables at once. Since it would be tedious to type every variable name, use the dot notation as a shortcut:

mod <- lm(mpg ~ ., data = mtcars)

Step 2: Formula Function

Step 2: Let’s use the formula() function to see the response and predictor variables used in Step 1.

formula(mod)

Step 3: Drop1 Function

Step 3: Let’s use the drop1() function to see which term (predictor) should be deleted from the model:

drop1(mod)

Step 4: Remove the Term

Step 4: In the drop1() output, each row shows the AIC the model would have if that term were removed. Remove the term whose deletion gives the lowest AIC (that is, the least useful predictor), and re-form the model without it. The simplest way to do this is to copy the model formula to the clipboard, paste it into a new command, and edit out the term you do not want. For the mtcars full model, dropping cyl gives the lowest AIC in this example, so:

mod1 <- lm(mpg ~ . - cyl, data = mtcars)

Step 5: Examine the Effect

Step 5: Examine the effect of dropping another term by running the drop1() command once more:

drop1(mod1)

If dropping some variable would still lower the AIC, remove that variable and carry out this process repeatedly until no deletion improves the model, or until you have a model that you are happy with.
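The iterative drop1() procedure described above can also be automated with R's built-in step() function, which repeatedly removes the term whose deletion lowers the AIC the most. A minimal sketch:

```r
# Fit the full model, then let step() perform backward deletion by AIC
full <- lm(mpg ~ ., data = mtcars)
reduced <- step(full, direction = "backward", trace = 0)

# Inspect the final, trimmed model
formula(reduced)
```

Setting trace = 0 suppresses the step-by-step printout; leave it at the default to watch each deletion.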

FAQS about Backward Deletion Method in R

  1. Write a step-by-step procedure to perform the Backward Deletion Method in R.
  2. How can one examine the effect of dropping a term from the model?
  3. What is the use of the formula() function with an lm() model?
  4. What is the use of the drop1() function in R?

Learn more about the lm() function


Performing Linear Regression in R: A Quick Reference

Introduction to Performing Linear Regression in R

Regression builds a function of independent variables (also known as predictors, regressors, explanatory variables, or features) to predict a dependent variable (also called the response, target, or regressand). Here we will focus on performing linear regression in the R language.

Linear regression predicts the response with a linear function of the predictors, $$y=\beta_0+\beta_1x_1+\beta_2x_2+\cdots + \beta_kx_k,$$ where $x_1, x_2, \cdots, x_k$ are the predictors and $y$ is the response to be predicted.

Before performing the regression analysis, it is very helpful to compute the coefficient of correlation between the dependent and independent variables, and also to draw a scatter diagram.
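For the mtcars data used below, this preliminary check might look like the following sketch:

```r
# Correlation between miles per gallon and weight in the mtcars data
r <- cor(mtcars$mpg, mtcars$wt)
r  # about -0.87: a strong negative linear relationship

# Scatter diagram of the same pair of variables
plot(mpg ~ wt, data = mtcars)
```

The strong correlation suggests a simple linear model of mpg on wt is worth fitting.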

Performing Linear Regression in R

Load the mtcars data, and check the data structure using str().

str(mtcars)

If you have data stored in an external file such as a CSV, you can use the read.csv() function to load the data into R. To learn about importing data files in R, follow the link: Import Data files in R

Suppose we want to check the impact of weight (wt) on miles per gallon (mpg), test the significance of the regression coefficients, and examine other statistics to assess the goodness of our fitted model:

mod <- lm(mpg ~ wt, data = mtcars)
summary(mod)
[Figure: Performing Linear Regression in R, estimation and testing output]

Now look at the components of the results stored in mod:

names(mod)

Getting Coefficients and Different Regression Statistics

Let us get the coefficients of the fitted regression model in R

mod$coef
coef(mod)

To obtain the confidence intervals of the estimated coefficients, one can use the confint() function:

confint(mod)
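As a check on what confint() computes for an lm fit, the intervals can be reproduced from the coefficient table as estimate plus or minus a t-quantile times the standard error. A sketch:

```r
mod <- lm(mpg ~ wt, data = mtcars)

# Reproduce confint() by hand: estimate +/- t * standard error
est   <- coef(summary(mod))[, "Estimate"]
se    <- coef(summary(mod))[, "Std. Error"]
tcrit <- qt(0.975, df = df.residual(mod))
manual <- cbind(lower = est - tcrit * se, upper = est + tcrit * se)
manual
```

The rows of manual match the output of confint(mod) at the default 95% level.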

Fitted values from the regression model can be obtained by using fitted()

mod$fitted
fitted(mod)

The residuals of the regression model can be obtained using the residuals() (or resid()) function:

mod$resid
resid(mod)
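A useful sanity check is that the fitted values and the residuals add back to the observed response for every observation:

```r
mod <- lm(mpg ~ wt, data = mtcars)

# fitted value + residual = observed value, for every observation
recovered <- fitted(mod) + resid(mod)
all.equal(unname(recovered), mtcars$mpg)  # TRUE
```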

One can check the formula used to fit the simple or multiple regression model. It tells you which variable is used as the response and which are the explanatory variables:

formula(mod)

Graphical Representation of Relationship

To graphically visualize the relationship between variables or pairs of variables, one can use the plot() or pairs() functions. Let us draw the scatter diagram between the dependent variable mpg and the explanatory variable wt using the plot() function.

plot(mpg ~ wt, data = mtcars)
[Figure: scatter plot for performing linear regression in R]

One can add the best-fit line to the scatter plot. For this purpose, use abline() with an object of class “lm”, such as mod in this case:

abline(mod)
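With the fitted line in hand, mpg can also be predicted for new weights using predict(). A sketch with two hypothetical wt values:

```r
mod <- lm(mpg ~ wt, data = mtcars)

# Predict mpg for two hypothetical car weights (in 1000 lbs)
new_wt <- data.frame(wt = c(2.5, 3.5))
preds <- predict(mod, newdata = new_wt)
preds  # the heavier car gets a lower predicted mpg
```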

There are many other functions and R packages for fitting linear regression models in R.

FAQS about Performing Linear Regression Models in R

  1. What is the use of the abline() function in R?
  2. How can a simple linear regression model be visualized in R?
  3. How can one obtain the fitted/predicted values of a simple linear regression model in R?
  4. Write a command that saves the residuals of an lm() model in a variable.
  5. State the step-by-step procedure for performing linear regression in R.

To learn more about the lm() function in R, visit https://itfeature.com

lm Function in R: A Comprehensive Guide

Introduction to lm Function in R

Many generic functions are available for working with a fitted regression model, for example, testing the coefficients, computing the residuals, and obtaining predicted values. Therefore, a good grasp of the lm() function is necessary. It is assumed that you are familiar with performing regression analysis using the lm() function.

mod <- lm(mpg ~ hp, data = mtcars)

To learn about performing linear regression analysis using the lm() function, you can visit the article “Performing Linear Regression in R”.

Objects of “lm” Class

The object returned by the lm() function has class “lm”, and its underlying mode is a list.

class(mod)

The names of the components of an “lm” object can be queried via

names(mod)

All the components of an “lm” object can be accessed directly. For example,

mod$rank

mod$coef   # or mod$coefficients

Generic Functions of “lm” model

The following is a list of some generic functions for a fitted “lm” model.

  • print(): prints or displays the results in the R console
  • summary(): displays the regression coefficients, their standard errors, t-ratios, p-values, and significance
  • coef(): extracts the regression coefficients
  • residuals() or resid(): extracts the residuals of the fitted model
  • fitted() or fitted.values(): extracts the fitted values
  • anova(): performs comparisons of nested models
  • predict(): computes predicted values for new data
  • plot(): draws diagnostic plots of the regression model
  • confint(): computes confidence intervals for the regression coefficients
  • deviance(): computes the residual sum of squares
  • vcov(): computes the estimated variance-covariance matrix of the coefficients
  • logLik(): computes the log-likelihood
  • AIC(), BIC(): compute information criteria
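A few of these generics in action, using a model fitted on mtcars. For instance, deviance() returns the residual sum of squares, which can be verified against the residuals directly:

```r
mod <- lm(mpg ~ hp, data = mtcars)

# deviance() returns the residual sum of squares ...
rss <- deviance(mod)
# ... which matches summing the squared residuals directly
rss_direct <- sum(resid(mod)^2)
all.equal(rss, rss_direct)  # TRUE

AIC(mod)  # information criteria, useful for model comparison
BIC(mod)
```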

It is better to save the object returned by the summary() function.

The summary() function returns an object of class “summary.lm”, and its components can be queried via

sum_mod <- summary(mod)

names(sum_mod)
names( summary(mod) )
[Figure: lm class objects]

The components of the summary object can be accessed as

sum_mod$residuals
sum_mod$r.squared
sum_mod$adj.r.squared
sum_mod$df
sum_mod$sigma
sum_mod$fstatistic
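These components can be used to verify familiar quantities. For instance, r.squared equals one minus the ratio of the residual sum of squares to the total sum of squares, as this sketch shows:

```r
sum_mod <- summary(lm(mpg ~ hp, data = mtcars))

# R-squared = 1 - RSS/TSS
rss <- sum(sum_mod$residuals^2)
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
r2  <- 1 - rss / tss
all.equal(r2, sum_mod$r.squared)  # TRUE
```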

Computation and Visualization of Prediction and Confidence Interval

The confidence interval for estimated coefficients can be computed as

confint(mod, level = 0.95)

Note that the level argument is optional when the confidence level is 95% (significance level 5%), as this is the default.

The confidence interval for the mean response and the prediction interval for an individual response, at hp (regressor) values of 200 and 160, can be computed as

predict(mod, newdata=data.frame(hp = c(200, 160)), interval = "confidence" )
predict(mod, newdata=data.frame(hp = c(200, 160)), interval = "prediction" )

The prediction intervals can be used for computing and visualizing confidence bands. For example,

x <- seq(50, 350, length = 32)
pred <- predict(mod, newdata = data.frame(hp = x), interval = "prediction")

plot(mpg ~ hp, data = mtcars)
lines(pred[, 1] ~ x, col = 1) # fitted values
lines(pred[, 2] ~ x, col = 2) # lower limit
lines(pred[, 3] ~ x, col = 2) # upper limit
[Figure: visualization of prediction intervals and confidence band]

Regression Diagnostics

For diagnostic plots, the plot() function can be used; it provides four graphs:

  • residuals vs fitted values
  • QQ plot of standardized residuals
  • scale-location plot of the square root of the absolute standardized residuals against the fitted values
  • standardized residuals vs leverage
[Figure: diagnostic plots of the model from the lm function]

To draw only the QQ plot, say, use

plot(mod, which = 2)

The which argument selects which of the four graphs is produced.
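To view all four diagnostic graphs at once, the plotting region can be split into a 2-by-2 grid, as in this sketch:

```r
mod <- lm(mpg ~ hp, data = mtcars)

# Show all four diagnostic plots in one window
op <- par(mfrow = c(2, 2))  # split the device into a 2x2 grid
plot(mod)
par(op)                     # restore the previous graphics settings
```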

FAQS about lm() Functions in R

  1. What is the use of the lm() function in R?
  2. What is the class of the object returned by lm(), and what are its components?
  3. Describe the generic functions for an object of class “lm”.
  4. What are the important components of a summary.lm object?
  5. How can the components of a summary.lm object be accessed?
  6. How can confidence and prediction intervals for linear models be visualized in R?
  7. How are regression diagnostics performed in R?
  8. What is the use of the confint(), fitted(), coef(), anova(), vcov(), deviance(), and residuals() generic functions?
