The linear regression model is typically estimated by the ordinary least squares (OLS) technique. In general form, the model is

$$Y_i = x_i'\beta + \varepsilon_i, \quad\quad i=1,2,\cdots,n$$

In matrix notation

$$y=X\beta + \varepsilon,$$

where $y$ is an $n\times 1$ vector containing the values of the dependent variable, and $X=(x_1,x_2,\cdots,x_n)'$ is the $n\times p$ matrix of regressors containing $n$ observations. The matrix $X$ is also called the model matrix (its columns represent the regressors). $\beta$ is a $p\times 1$ vector of regression coefficients, and $\varepsilon$ is an $n\times 1$ vector of error terms.

The regression coefficients $\beta$ can be estimated as

$$\hat{\beta}=(X'X)^{-1}X'y$$

The fitted values can be computed as

$$\hat{y}=X\hat{\beta}$$

The residuals are

$$\hat{\varepsilon} = y - \hat{y}$$

The residual sum of squares is

$$\hat{\varepsilon}'\hat{\varepsilon}$$
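The formulas above can be checked numerically. The following is a minimal sketch, assuming the built-in `mtcars` data with `mpg` as the response and `hp` as the only regressor; the names `beta_hat`, `y_hat`, `res`, `rss`, and `fit` are illustrative and not part of the original text. It solves the normal equations directly and compares the result with `lm()`.

```r
# Response vector y (n x 1) and model matrix X (n x p) with an intercept column
y <- mtcars$mpg
X <- cbind(Intercept = 1, hp = mtcars$hp)

# beta_hat = (X'X)^{-1} X'y, obtained by solving the normal equations
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Fitted values, residuals, and residual sum of squares
y_hat <- X %*% beta_hat
res   <- y - y_hat
rss   <- sum(res^2)          # same quantity as t(res) %*% res

# Compare with the built-in least squares fit
fit <- lm(mpg ~ hp, data = mtcars)
all.equal(as.vector(beta_hat), unname(coef(fit)))
all.equal(rss, deviance(fit))
```

Solving the system with `solve(crossprod(X), crossprod(X, y))` avoids forming the inverse explicitly. `lm()` itself uses a QR decomposition, so the two results agree only up to numerical tolerance, which is why `all.equal()` is used rather than `identical()`.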

The R language has excellent facilities for fitting linear models. The basic function for fitting linear models by the least squares method is the `lm()` function. The model is specified using formula notation.

We will consider the `mtcars` dataset. Let $Y=mpg$ and $X=hp$; the simple linear regression model is

$$Y_i = \beta_1 + \beta_2 hp_i + \varepsilon_i$$

where $\beta_1$ is the intercept and $\beta_2$ is the slope coefficient.

To fit this simple linear regression model in R, one can use:

attach(mtcars)
mod <- lm(mpg ~ hp)
mod

The `lm()` function uses the formula `mpg ~ hp`, with the response variable on the left of the tilde (`~`) and the predictor on the right. It is better to supply the `data` argument to the `lm()` function. That is,

mod <- lm(mpg ~ hp, data = mtcars)

The `lm()` function returns an object of class `lm`, saved here in the variable `mod` (any other name can be used). Printing the object produces a brief report. For hypothesis tests on the regression coefficients, the `summary()` function should be used. It gives more information about the fitted model, such as the standard error, t-value, and p-value of each coefficient. For example,

summary(mod)
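Individual pieces of the summary can also be extracted rather than read off the printed report. The sketch below assumes the same `mpg ~ hp` model; the variable names `s` and `coef_table` are illustrative only.

```r
mod <- lm(mpg ~ hp, data = mtcars)
s   <- summary(mod)

# Coefficient table: one row per coefficient, with columns
# "Estimate", "Std. Error", "t value", "Pr(>|t|)"
coef_table <- coef(s)
coef_table

# p-value for the slope coefficient of hp
coef_table["hp", "Pr(>|t|)"]

# Coefficient of determination and 95% confidence intervals
s$r.squared
confint(mod, level = 0.95)
```

Indexing the coefficient table by row and column name is less fragile than relying on positions if more regressors are added later.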

One can fit a regression model without an intercept term if required.

lm(mpg ~ hp - 1, data = mtcars)
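As a small illustration of the formula notation, the intercept can be dropped either with `- 1` or with `0 +`. The sketch below assumes the same `mtcars` variables; the names `m1` and `m2` are illustrative.

```r
# Two equivalent ways to remove the intercept in a model formula
m1 <- lm(mpg ~ hp - 1, data = mtcars)
m2 <- lm(mpg ~ 0 + hp, data = mtcars)

coef(m1)   # a single coefficient: the slope for hp
coef(m2)

# Both formulas describe the same model
all.equal(coef(m1), coef(m2))
```

Note that a model without an intercept forces the fitted line through the origin, so it should be used only when that restriction makes sense for the data.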

For a graphical representation of the model, one can use the `plot()` function to draw the scatter points and the `abline()` function to draw the regression line.

plot(hp, mpg)
abline(mod)

Note the order of the variables in the `plot()` function: the first argument represents the predictor variable, while the second argument represents the response variable.

The function `abline()` draws a line on the graph according to a slope and an intercept, which are either taken from the fitted model `mod` or supplied manually.
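To illustrate supplying the intercept and slope manually, here is a minimal sketch. It assumes the same `mpg ~ hp` model and uses the formula interface of `plot()` so that `attach()` is not required; the name `b` is illustrative.

```r
mod <- lm(mpg ~ hp, data = mtcars)
b   <- coef(mod)                        # b[1] = intercept, b[2] = slope

plot(mpg ~ hp, data = mtcars)           # response on the left, predictor on the right
abline(a = b[1], b = b[2])              # line given manually by intercept a and slope b
abline(h = mean(mtcars$mpg), lty = 3)   # horizontal reference line at the mean of mpg
```

The call `abline(a = b[1], b = b[2])` should draw the same line as `abline(mod)`, since `abline()` reads the two coefficients from the fitted model object.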

One can change the style of the regression line using the `lty` argument. Similarly, the color of the regression line can be changed from black to some other color using the `col` argument. That is,

plot(hp, mpg)
abline(mod, lty = 2, col = "blue")

Note that one can identify individual observations on a graph using the `identify()` function. For example,

identify(hp, mpg)

To identify a point, place the mouse pointer near the point and press the left mouse button. To exit from the `identify()` procedure, press the right mouse button or the `ESC` key on the keyboard.
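By default, `identify()` labels the selected points with their row index. The sketch below, which assumes the `mtcars` row names are the desired labels, passes them through the `labels` argument and uses the returned indices to look up the selected rows. The names `car_names` and `picked` are illustrative, and the interactive part is guarded so that the code does nothing in a non-interactive session.

```r
# Row names of mtcars (car models) used as point labels
car_names <- rownames(mtcars)

if (interactive()) {
  plot(mtcars$hp, mtcars$mpg)

  # Click near points to label them; finish with right click or ESC
  picked <- identify(mtcars$hp, mtcars$mpg, labels = car_names)

  # identify() returns the indices of the selected observations
  mtcars[picked, c("mpg", "hp")]
}
```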
