lm Function in R: A Comprehensive Guide

Introduction to lm Function in R

Many generic functions are available for working with a fitted regression model, for example, for testing the coefficients, computing the residuals, obtaining predicted values, etc. Therefore, a good grasp of the lm() function and the object it returns is necessary. It is assumed that you already know how to perform a regression analysis using the lm() function.

mod <- lm(mpg ~ hp, data = mtcars)

Objects of “lm” Class

The object returned by the lm() function has the class “lm”. Objects of the “lm” class have the mode “list”.
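Both statements can be checked directly. This is a minimal sketch that refits the same model so that it runs on its own:

```r
# Fit the model used throughout this post
mod <- lm(mpg ~ hp, data = mtcars)

class(mod)   # "lm"
mode(mod)    # "list"
```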


The names of the components of an “lm” object can be queried via
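A minimal sketch, assuming the names() function is what is intended here (the same function is applied to the summary object later in this post):

```r
mod <- lm(mpg ~ hp, data = mtcars)

# Component names of the fitted model object
names(mod)
```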


All the components of an “lm” object can be accessed directly. For example,


mod$coef   # or mod$coefficients

Generic Functions of “lm” model

The following is the list of some generic functions for the fitted “lm” model.

Generic Function            Short Description
print()                     prints or displays the results in the R console
summary()                   displays regression coefficients, their standard errors, t-values, p-values, and significance codes
coef()                      extracts the regression coefficients
residuals(), resid()        extracts the residuals of the fitted model
fitted(), fitted.values()   extracts the fitted values
anova()                     compares nested models
predict()                   computes predicted values for new data
plot()                      draws diagnostic plots of the regression model
confint()                   computes confidence intervals for the regression coefficients
deviance()                  computes the residual sum of squares
vcov()                      computes the estimated variance-covariance matrix of the coefficients
logLik()                    computes the log-likelihood
AIC(), BIC()                compute information criteria
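As an illustration, a few of these generic functions can be applied to the fitted model and checked against each other. The variable names below (b, e, yhat, rss) are chosen only for this sketch:

```r
mod <- lm(mpg ~ hp, data = mtcars)

b    <- coef(mod)        # named vector: (Intercept) and hp
e    <- residuals(mod)   # one residual per observation
yhat <- fitted(mod)      # one fitted value per observation
rss  <- deviance(mod)    # residual sum of squares

# The pieces agree: observed = fitted + residual, and RSS = sum of squared residuals
all.equal(unname(yhat + e), mtcars$mpg)
all.equal(rss, sum(e^2))

AIC(mod)
BIC(mod)
```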

It is better to save the object returned by the summary() function, so that its components can be reused.

The summary() function returns an object of class “summary.lm”, and its components can be queried via

sum_mod <- summary(mod)

names( summary(mod) )

The components of the summary object can be extracted as
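A short sketch of extracting individual components with the $ operator. The components shown are a selection, not the full list:

```r
mod <- lm(mpg ~ hp, data = mtcars)
sum_mod <- summary(mod)

sum_mod$coefficients   # matrix with columns Estimate, Std. Error, t value, Pr(>|t|)
sum_mod$r.squared      # coefficient of determination
sum_mod$sigma          # residual standard error
sum_mod$fstatistic     # F statistic with its degrees of freedom
```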


Computation and Visualization of Prediction and Confidence Interval

The confidence interval for estimated coefficients can be computed as

confint(mod, level = 0.95)

Note that the level argument is optional when the confidence level is 95% (significance level 5%), since 0.95 is the default.

The confidence interval for the mean response and the prediction interval for an individual response, at hp (the regressor) equal to 200 and 160, can be computed as
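To see what confint() computes, the interval can be rebuilt by hand as estimate ± t critical value × standard error. This is a sketch for the 95% level only, and the names est, se, tcrit, and manual are introduced just for this illustration:

```r
mod <- lm(mpg ~ hp, data = mtcars)

est   <- coef(mod)                          # point estimates
se    <- sqrt(diag(vcov(mod)))              # standard errors
tcrit <- qt(0.975, df = df.residual(mod))   # two-sided 95% critical value

manual <- cbind(lower = est - tcrit * se, upper = est + tcrit * se)
manual

# Agrees with confint() up to numerical tolerance
all.equal(unname(manual), unname(confint(mod)))
```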

predict(mod, newdata=data.frame(hp = c(200, 160)), interval = "confidence" )
predict(mod, newdata=data.frame(hp = c(200, 160)), interval = "prediction" )

The prediction intervals can be computed over a grid of hp values and drawn as a band around the fitted line. For example,

x <- seq(50, 350, length.out = 32)   # grid of hp values
# the column in newdata must be named hp, as in the model formula
pred <- predict(mod, newdata = data.frame(hp = x), interval = "prediction")

plot(mpg ~ hp, data = mtcars)
lines(pred[, 1] ~ x, col = 1) # fitted values
lines(pred[, 2] ~ x, col = 2) # lower limit
lines(pred[, 3] ~ x, col = 2) # upper limit

Figure: Visualization of prediction intervals and confidence band

Regression Diagnostics

For diagnostic plots, the plot() function can be used. By default, it produces four graphs:

  • residuals vs fitted values
  • QQ plot of standardized residuals
  • scale-location plot of the square root of the absolute standardized residuals against fitted values
  • standardized residuals vs leverage

Figure: Diagnostic plots of the model from the lm function

To draw only the QQ plot, use

plot(mod, which = 2)

The which argument selects which of the diagnostic graphs is produced.
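As a sketch of both usages, the four default graphs can be placed on one page with par(mfrow = ...), and which can also take a vector to select more than one graph. The 2 x 2 layout is a choice made for this example:

```r
mod <- lm(mpg ~ hp, data = mtcars)

op <- par(mfrow = c(2, 2))   # arrange plots in a 2 x 2 grid
plot(mod)                    # the four default diagnostic plots
par(op)                      # restore the previous graphical settings

plot(mod, which = c(1, 2))   # only residuals vs fitted and the QQ plot
```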

Simple Linear Regression Model

Introduction to Simple Linear Regression Model

The linear regression model is typically estimated by the ordinary least squares (OLS) technique. The model in general form is

$$Y_i=x'_i\beta + \varepsilon_i, \quad\quad i=1,2,\cdots,n$$

In matrix notation

$$y=X\beta + \varepsilon,$$

where $y$ is an $n\times 1$ vector containing the values of the dependent variable and $X=(x_1,x_2,\cdots,x_n)'$ is the regressor matrix containing $n$ observations. The matrix $X$ is also called the model matrix (its columns represent the regressors). $\beta$ is a $p\times 1$ vector of regression coefficients, and $\varepsilon$ is an $n\times 1$ vector of error terms. To learn more about simple linear models, visit the link: Simple Linear Regression Models.

Estimating Regression Coefficients

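The regression coefficients $\beta$ can be estimated by

$$\hat{\beta} = (X'X)^{-1}X'y$$

The fitted values can be computed as

$$\hat{y} = X\hat{\beta}$$

The residuals are

$$\hat{\varepsilon} = y - \hat{y}$$

The residual sum of squares is

$$\hat{\varepsilon}'\hat{\varepsilon}$$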

The R language has excellent facilities for fitting linear models. The basic function for fitting linear models by the least squares method is the lm() function. The model is specified using formula notation.
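The least squares quantities can also be computed with plain matrix algebra and compared with lm(). The sketch below assumes the usual OLS expressions, namely the estimator (X'X)^{-1}X'y, fitted values X times the estimate, residuals y minus fitted values, and the residual sum of squares as the sum of squared residuals. The variable names are illustrative only:

```r
y <- mtcars$mpg
X <- cbind(1, mtcars$hp)   # model matrix: a column of ones and the regressor hp

# Solve the normal equations (X'X) b = X'y instead of inverting X'X explicitly
beta_hat <- solve(crossprod(X), crossprod(X, y))
y_hat    <- X %*% beta_hat   # fitted values
e_hat    <- y - y_hat        # residuals
rss      <- sum(e_hat^2)     # residual sum of squares

# Compare with lm()
mod <- lm(mpg ~ hp, data = mtcars)
all.equal(as.vector(beta_hat), unname(coef(mod)))
all.equal(rss, deviance(mod))
```

In practice, lm() uses a QR decomposition rather than the normal equations, which is numerically more stable. The direct formula is shown here only because it mirrors the notation.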

We will consider the mtcars dataset. Let $Y=mpg$ and $X=hp$; the simple linear regression model is

$$Y_i = \beta_1 + \beta_2\, hp_i + \varepsilon_i$$

where $\beta_1$ is the intercept and $\beta_2$ is the slope coefficient.

Fitting Simple Linear Regression Model in R

To fit this simple linear regression model in R, one can use:

attach(mtcars)   # makes the columns mpg and hp visible by name
mod <- lm(mpg ~ hp)

The lm() function uses the formula mpg ~ hp, with the response variable on the left of the tilde (~) and the predictor on the right. It is better to supply the data argument to lm(), so that the variables are looked up in the data frame rather than in the workspace. That is,

mod <- lm(mpg ~ hp, data = mtcars)

The lm() function returns an object of the class lm, saved here in a variable named mod (any other name can be used). Printing the object produces a brief report.

Hypothesis Testing of Regression Coefficients

To test hypotheses about the regression coefficients, the summary() function should be used. It gives more information about the fitted model, such as the standard error, t-value, and p-value of each coefficient. For example,
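A minimal sketch of the call, followed by one way to pull out the test of the slope only. The second line is an addition for illustration:

```r
mod <- lm(mpg ~ hp, data = mtcars)

# Full report: coefficients table, residual standard error, R-squared, F test
summary(mod)

# t-value and p-value for the slope coefficient of hp only
coef(summary(mod))["hp", c("t value", "Pr(>|t|)")]
```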


One can fit a regression model without an intercept term if required.

lm(mpg ~ hp -1, data = mtcars)

Graphical Representation of the Model

For the graphical representation of the model, one can use the plot() function to draw scatter points and the abline() function to draw the regression line.

plot(hp, mpg)

Note the order of variables in the plot() function. The first argument represents the predictor variable, while the second argument represents the response variable.

The function abline() adds a straight line to the current plot. The intercept and slope can be taken from a fitted model (the argument mod) or supplied manually.
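Both ways of calling abline() are sketched below. The manual call uses the a (intercept) and b (slope) arguments, and the red dotted style is chosen only to make the second line visible on top of the first:

```r
mod <- lm(mpg ~ hp, data = mtcars)
plot(mpg ~ hp, data = mtcars)   # hp on the x-axis, mpg on the y-axis

# Line taken from the fitted model
abline(mod)

# The same line, supplying intercept (a) and slope (b) manually
b <- coef(mod)
abline(a = b[1], b = b[2], col = "red", lty = 3)
```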

One can change the style of the regression line using the lty argument. Similarly, the color of the regression line can be changed from black to some other color using the col argument. That is,

plot(hp, mpg)
abline(mod, lty = 2, col = "blue")

Note that one can identify different observations on a graph using the identify() function. For example,

identify(hp, mpg)

Note: to identify a point, place the mouse pointer near it and press the left mouse button. To exit the identify procedure, press the right mouse button or the ESC key.

FAQs about Simple Linear Regression in R

  1. What is a simple linear regression model? How can it be fitted in the R language?
  2. How is the lm() function used to fit a simple linear regression model?
  3. How can estimation and testing of the regression coefficients be performed in R?
  4. What is the use of the summary() function in R? Explain.
  5. How can regression models be visualized in R?

Graphical Representations in R

Many graphical representations in R Language are available for qualitative and quantitative data types. This post will only discuss graphical representations in R such as histograms, bar plots, and box plots.

Creating Histogram in R

To visualize a single variable, the histogram can be drawn using the hist() function in R. The use of histograms is to judge the shape and distribution of data in a graphical way. Histograms are also used to check the normality of the variable.

Let us attach the data from the iris dataset.
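A minimal sketch of a basic histogram, assuming the variable of interest is Petal.Width (the variable named in the axis labels of the enhanced example). The column is referenced as iris$Petal.Width so that the block runs without attaching the data:

```r
# Basic histogram with default settings; the returned object is kept for inspection
h <- hist(iris$Petal.Width)

h$breaks   # bin boundaries chosen by R
h$counts   # number of observations in each bin
```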


We can enhance the histogram by using some arguments/parameters related to the hist() function in R. For example,

hist(iris$Petal.Width,
  xlab = "Petal Width",
  ylab = "Frequency",
  main = "Histogram of Petal Width from Iris Data set",
  breaks = 10,
  col = "dodgerblue",
  border = "orange")

If these arguments are not provided, R will attempt to intelligently guess them, especially the number of breaks. See the YouTube tutorial for graphical representations of the histogram.

Creating Barplots in R

Bar plots are a good choice for visual inspection of a categorical variable (or a numeric variable with a finite number of values) or a rank variable. Usually, one uses bar plots for comparison purposes. The barplot() function can be used to draw them.

barplot(table(mtcars$cyl),   # frequency of each number of cylinders
  ylab = "Frequency",
  xlab = "Cylinders (4, 6, 8)",
  main = "Number of cylinders",
  col = "green",
  border = "blue")

Creating Boxplots in R

One can use boxplots to judge the normality, skewness, and existence of outliers in the data, based on the five-number summary statistics.
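A minimal sketch for a single variable, assuming mpg from the mtcars dataset as the example. The fivenum() call is added to show the five numbers (minimum, lower hinge, median, upper hinge, maximum) on which the box is based:

```r
# Boxplot of a single numerical variable
boxplot(mtcars$mpg, ylab = "Miles per Gallon")

# Five-number summary of the same variable
fivenum(mtcars$mpg)
```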


One can also compare a numerical variable across the levels of a categorical/grouping variable. For example,

boxplot(mpg ~ cyl, data = mtcars)

R reads the formula mpg ~ cyl as: “Plot the mpg variable against the cyl variable using the dataset mtcars.” The symbol ~ is used to specify a formula in R.

boxplot(mpg ~ cyl, data = mtcars,
  xlab = "Cylinders",
  ylab = "Miles per Gallon",
  pch = 20,
  cex = 2,
  col = "pink",
  border = "black")


