Simple Linear Regression Model

The linear regression model is typically estimated by the ordinary least squares (OLS) technique. In general form, the model is

$$Y_i=x_i'\beta + \varepsilon_i, \quad\quad i=1,2,\cdots,n$$

In matrix notation

$$y=X\beta + \varepsilon,$$

where $y$ is an $n\times 1$ vector containing the values of the dependent variable, and $X=(x_1,x_2,\cdots,x_n)'$ is the $n\times p$ regressor matrix containing $n$ observations. The matrix $X$ is also called the model matrix (its columns represent the regressors). $\beta$ is a $p\times 1$ vector of regression coefficients, and $\varepsilon$ is an $n\times 1$ vector of error terms.

The regression coefficients $\beta$ can be estimated by OLS as

$$\hat{\beta}=(X'X)^{-1}X'y$$

The fitted values can be computed as

$$\hat{y}=X\hat{\beta}$$

The residuals are

$$\hat{\varepsilon} = y - \hat{y}$$

The residual sum of squares is

$$RSS=\hat{\varepsilon}'\hat{\varepsilon}=\sum_{i=1}^{n}\hat{\varepsilon}_i^2$$

The R language has excellent facilities for fitting linear models. The basic function for fitting linear models by the least squares method is lm(). The model is specified using formula notation.

We will consider the mtcars dataset. Let $Y=mpg$ and $X=hp$; the simple linear regression model is

$$Y_i = \beta_1 + \beta_2\, hp_i + \varepsilon_i$$

where $\beta_1$ is the intercept and $\beta_2$ is the slope coefficient.

To fit this simple linear regression model in R, one can use:

attach(mtcars)        # makes the columns mpg and hp visible by name
mod <- lm(mpg ~ hp)

The lm() function uses the formula mpg ~ hp, with the response variable on the left of the tilde (~) and the predictor on the right. It is better to supply the data argument to lm(), which removes the need to attach the dataset. That is,

mod <- lm(mpg ~ hp, data = mtcars)
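To connect the fitted object with the OLS formula $\hat{\beta}=(X'X)^{-1}X'y$, the following is a minimal sketch that computes the estimates, fitted values, residuals, and residual sum of squares by hand and compares them with the lm() fit. The names y, X, beta_hat, y_hat, e_hat, and rss are chosen here for illustration only; lm() itself uses a QR decomposition internally, so agreement is expected only up to numerical tolerance.

```r
# Sketch: reproduce the lm() estimates with the OLS matrix formulas.
# mtcars ships with R, so no extra packages are needed.
y <- mtcars$mpg                  # response vector (n x 1)
X <- cbind(1, mtcars$hp)         # model matrix: column of ones (intercept) and hp

# beta_hat = (X'X)^{-1} X'y, obtained by solving the normal equations
beta_hat <- solve(crossprod(X), crossprod(X, y))

y_hat <- X %*% beta_hat          # fitted values
e_hat <- y - y_hat               # residuals
rss   <- sum(e_hat^2)            # residual sum of squares

# Compare with the fit returned by lm()
mod <- lm(mpg ~ hp, data = mtcars)
all.equal(as.vector(beta_hat), unname(coef(mod)))
all.equal(as.vector(y_hat), unname(fitted(mod)))
all.equal(rss, deviance(mod))    # deviance() of an lm fit is its RSS
```

In practice one would rely on lm() rather than inverting $X'X$ directly; the hand computation is shown only to make the formulas concrete.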

The lm() function returns an object of class lm, saved here in the variable mod (the name is arbitrary). Printing the object produces a brief report. For hypothesis tests on the regression coefficients, the summary() function should be used. It gives more information about the fitted model, such as standard errors, t-values, and p-values for each coefficient. For example,

summary(mod)
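The printed summary can also be inspected programmatically. The sketch below is one possible way to do this: it pulls the coefficient table out of the summary object and reads off the standard error and p-value of the slope. The variable names s, coef_table, slope_se, and slope_p are illustrative.

```r
# Sketch: extract the coefficient table from a summary.lm object
mod <- lm(mpg ~ hp, data = mtcars)
s <- summary(mod)

# coef() on the summary returns a matrix with one row per coefficient and
# the columns "Estimate", "Std. Error", "t value", and "Pr(>|t|)"
coef_table <- coef(s)
print(coef_table)

slope_se <- coef_table["hp", "Std. Error"]   # standard error of the slope
slope_p  <- coef_table["hp", "Pr(>|t|)"]     # p-value for H0: slope = 0
```

Indexing by row and column name, rather than by position, keeps the code readable if more regressors are added later.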


One can fit a regression model without an intercept term if required.

lm(mpg ~ hp - 1, data = mtcars)
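As a small illustration of what removing the intercept does, the sketch below fits both versions and checks the no-intercept slope against its closed form $\sum x_i y_i / \sum x_i^2$ for a line through the origin. The object names with_int, without_int, and b0 are made up for this example.

```r
# Sketch: compare fits with and without an intercept term
with_int    <- lm(mpg ~ hp, data = mtcars)
without_int <- lm(mpg ~ hp - 1, data = mtcars)   # equivalently: mpg ~ 0 + hp

coef(with_int)      # two coefficients: (Intercept) and hp
coef(without_int)   # a single coefficient: hp

# For regression through the origin the slope is sum(x * y) / sum(x^2)
b0 <- with(mtcars, sum(hp * mpg) / sum(hp^2))
all.equal(unname(coef(without_int)), b0)
```

Dropping the intercept forces the fitted line through the origin, so the slope estimate generally differs from the one in the model with an intercept.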

For a graphical representation of the model, one can use the plot() function to draw the scatter points and the abline() function to draw the regression line.

plot(hp, mpg)

Note the order of the variables in plot(): the first argument is the predictor variable (plotted on the x-axis), while the second argument is the response variable (plotted on the y-axis).

The abline() function adds a line to the existing plot, using the intercept and slope taken from the fitted model mod or supplied manually through its a (intercept) and b (slope) arguments.

One can change the style of the regression line using the lty argument. Similarly, the color of the regression line can be changed from black to some other color using the col argument. That is,

plot(hp, mpg)
abline(mod, lty = 2, col = "blue")
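The manual form of abline() can be sketched as follows, assuming the intercept and slope are taken from coef(). The name cf is illustrative, and the columns are referenced as mtcars$hp and mtcars$mpg so that the snippet runs without attaching the dataset.

```r
# Sketch: draw the regression line from a fitted model and from manual values
mod <- lm(mpg ~ hp, data = mtcars)
cf  <- coef(mod)                  # cf[1] is the intercept, cf[2] is the slope

plot(mtcars$hp, mtcars$mpg, xlab = "hp", ylab = "mpg")
abline(mod, lty = 2, col = "blue")                   # line taken from the model
abline(a = cf[1], b = cf[2], lty = 3, col = "red")   # same line, given manually
```

Both calls draw the same line; the manual form is useful when the intercept and slope come from somewhere other than an lm object.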

Note that one can identify individual observations on a graph using the identify() function. For example,

identify(hp, mpg)

Figure: plot and identify functions in R

To identify a point, place the mouse pointer near it and press the left mouse button. To exit the identify procedure, press the right mouse button or the ESC key.
