Simple Linear Regression Model

The linear regression model is typically estimated by the ordinary least squares (OLS) technique. The model in general form is

$$Y_i=x'_i\beta + \varepsilon_i, \quad\quad i=1,2,\cdots,n$$

In matrix notation

$$y=X\beta + \varepsilon,$$

where $y$ is an $n\times 1$ vector containing the values of the dependent variable, and $X=(x_1,x_2,\cdots,x_n)'$ is the $n\times p$ matrix of regressors containing $n$ observations. The matrix $X$ is also called the model matrix (its columns represent the regressors). $\beta$ is a $p\times 1$ vector of regression coefficients, and $\varepsilon$ is an $n\times 1$ vector of error terms.

Estimating Regression Coefficients

The regression coefficients $\beta$ can be estimated as

$$\hat{\beta}=(X'X)^{-1}X'y$$

The fitted values can be computed as

$$\hat{y}=X\hat{\beta}$$

The residuals are

$$\hat{\varepsilon} = y - \hat{y}$$

The residual sum of squares is

$$\hat{\varepsilon}'\hat{\varepsilon}$$
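
The OLS estimator can be motivated as the value of $\beta$ that minimizes the sum of squared errors. A brief sketch of that step, assuming $X$ has full column rank so that $X'X$ is invertible (the symbol $S(\beta)$ is introduced here only for illustration):

$$S(\beta) = (y - X\beta)'(y - X\beta) = y'y - 2\beta'X'y + \beta'X'X\beta$$

$$\frac{\partial S}{\partial \beta} = -2X'y + 2X'X\beta = 0 \quad\Longrightarrow\quad X'X\hat{\beta} = X'y$$

Solving these normal equations for $\hat{\beta}$ gives $(X'X)^{-1}X'y$.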

R has excellent facilities for fitting linear models. The basic function for fitting linear models by the least squares method is lm(). The model is specified using formula notation.

We will consider the mtcars dataset. Let $Y=mpg$ and $X=hp$; the simple linear regression model is

$$Y_i = \beta_1 + \beta_2 hp_i + \varepsilon_i$$

where $\beta_1$ is the intercept and $\beta_2$ is the slope coefficient.
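
Before turning to lm(), the matrix formulas for the coefficients, fitted values, residuals, and residual sum of squares can be checked by hand for this model. The following is a minimal sketch, not the way lm() works internally; the object names X, y, beta_hat, y_hat, e_hat, and rss are chosen here for illustration only.

```r
# Build the model matrix by hand: a column of ones (intercept) and hp.
X <- cbind(intercept = 1, hp = mtcars$hp)
y <- mtcars$mpg

# Solve the normal equations X'X beta = X'y
# (numerically preferable to forming the inverse of X'X explicitly).
beta_hat <- solve(crossprod(X), crossprod(X, y))

y_hat <- X %*% beta_hat   # fitted values, X beta_hat
e_hat <- y - y_hat        # residuals, y - y_hat
rss   <- sum(e_hat^2)     # residual sum of squares, e'e

beta_hat
rss
```

The values in beta_hat should agree with the coefficients reported by lm(mpg ~ hp, data = mtcars), and rss with deviance() applied to that fit.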

Fitting Simple Linear Regression Model in R

To fit this simple linear regression model in R, one can follow:

attach(mtcars)

mod <- lm(mpg ~ hp)
mod

The lm() function uses the formula mpg ~ hp, with the response variable on the left of the tilde (~) and the predictor on the right. It is better to supply the data argument to lm(), so that the variables are looked up in the data frame rather than in attached objects. That is,

mod <- lm(mpg ~ hp, data = mtcars)

The lm() function returns an object of class lm, saved here in the variable mod (the name is arbitrary). Printing the object produces a brief report. To test hypotheses about the regression coefficients, use the summary() function. It gives more information about the fitted model, such as the standard errors, t-values, and p-values for each coefficient. For example,

summary(mod)
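
The pieces of the summary can also be extracted for further use rather than only printed. The sketch below shows one possible way to do this; the names s, coef_table, and ci are illustrative, and the 95% level is an assumption made for the example.

```r
mod <- lm(mpg ~ hp, data = mtcars)
s <- summary(mod)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(s)
coef_table

# Confidence intervals for the intercept and slope
ci <- confint(mod, level = 0.95)
ci

s$r.squared   # coefficient of determination
s$sigma       # residual standard error
```

Each row of coef_table corresponds to one coefficient, so the p-value for the slope can be read from coef_table["hp", "Pr(>|t|)"].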

One can fit a regression model without an intercept term if required.

lm(mpg ~ hp - 1, data = mtcars)
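
As a small check of what removing the intercept does, the sketch below fits the through-origin model in two equivalent formula notations and compares the slope with its closed form, the sum of hp times mpg divided by the sum of hp squared. The names mod0, mod0_alt, and slope_manual are illustrative.

```r
# Two equivalent ways to drop the intercept in a formula
mod0     <- lm(mpg ~ hp - 1, data = mtcars)
mod0_alt <- lm(mpg ~ 0 + hp, data = mtcars)

# Closed-form slope for regression through the origin
slope_manual <- sum(mtcars$hp * mtcars$mpg) / sum(mtcars$hp^2)

coef(mod0)                              # a single coefficient, for hp
all.equal(coef(mod0), coef(mod0_alt))   # both notations give the same fit
```

Note that the fitted line is now forced through the origin, so the slope generally differs from the one in the model with an intercept.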

Graphical Representation of the Model

For the graphical representation of the model, one can use the plot() function to draw scatter points and the abline() function to draw the regression line.

plot(hp, mpg)
abline(mod)

Note the order of the variables in plot(): the first argument is the predictor variable (x-axis) and the second argument is the response variable (y-axis).

The abline() function draws a line on the graph using the intercept and slope taken from the fitted model mod, or from values supplied manually.
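
To illustrate supplying the intercept and slope manually, the sketch below passes them through the a and b arguments of abline(). The name cf is illustrative, and the variables are taken from mtcars explicitly so the example does not rely on attach().

```r
mod <- lm(mpg ~ hp, data = mtcars)
cf  <- coef(mod)   # named vector: "(Intercept)" and "hp"

plot(mtcars$hp, mtcars$mpg, xlab = "hp", ylab = "mpg")

# a = intercept, b = slope; this draws the same line as abline(mod)
abline(a = cf[["(Intercept)"]], b = cf[["hp"]])
```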

One can change the style of the regression line using the lty argument. Similarly, the color of the regression line can be changed from black to another color using the col argument. That is,

plot(hp, mpg)
abline(mod, lty = 2, col = "blue")

Note that one can identify different observations on a graph using the identify() function. For example,

identify(hp, mpg)

To identify a point, place the mouse pointer near it and press the left mouse button. To exit the identify procedure, press the right mouse button or the ESC key.
