Introduction to Simple Linear Regression Model
The linear regression model is typically estimated by the ordinary least squares (OLS) technique. The model in general form is
$$Y_i = x'_i\beta + \varepsilon_i, \quad\quad i=1,2,\cdots,n$$
In matrix notation
$$y=X\beta + \varepsilon,$$
where $y$ is a vector of order $n\times 1$ that contains the values of the dependent variable, and $X=(x_1,x_2,\cdots,x_n)'$ is the matrix of regressors containing $n$ observations. The $X$ matrix is also called the model matrix (its columns represent the regressors). The $\beta$ is a $p\times 1$ vector of regression coefficients, and $\varepsilon$ is a vector of order $n\times 1$ containing the error terms. To learn more about simple linear models, visit the link: Simple Linear Regression Models.
Estimating Regression Coefficients
The regression coefficients $\beta$ can be estimated as
$$\hat{\beta}=(X'X)^{-1}X'y$$
The fitted values can be computed as
$$\hat{y}=X\hat{\beta}$$
The residuals are
$$\hat{\varepsilon} = y - \hat{y}$$
The residual sum of squares is
$$\hat{\varepsilon}'\hat{\varepsilon}$$
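As an illustration, these quantities can be computed directly with matrix algebra in R. The following is a minimal sketch, assuming the mtcars data used later in this post, with mpg as the response and hp as the single regressor:

y <- mtcars$mpg
X <- cbind(1, mtcars$hp)                        # model matrix with an intercept column
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y    # (X'X)^{-1} X'y
y_hat <- X %*% beta_hat                         # fitted values
e_hat <- y - y_hat                              # residuals
rss <- t(e_hat) %*% e_hat                       # residual sum of squares

In practice, the lm() function described next performs these computations (via a numerically stable QR decomposition) and returns them in a convenient object.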
The R language has excellent facilities for fitting linear models. The basic function for fitting linear models by the least squares method is the lm() function. The model is specified using formula notation.
We will consider the mtcars dataset. Let $Y=mpg$ and $X=hp$; the simple linear regression model is
$$Y_i = \beta_1 + \beta_2 hp_i + \varepsilon_i$$
where $\beta_1$ is the intercept and $\beta_2$ is the slope coefficient.
Fitting Simple Linear Regression Model in R
To fit this simple linear regression model in R, one can use the following:
attach(mtcars)
mod <- lm(mpg ~ hp)
mod
The lm() function uses a formula mpg ~ hp with the response variable on the left of the tilde (~) and the predictor on the right. It is better to supply the data argument to the lm() function. That is,
mod <- lm(mpg ~ hp, data = mtcars)
The lm() function returns an object of class lm, saved here in the variable mod (the name can be different). Printing the object produces a brief report.
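The estimated coefficients, fitted values, and residuals defined above can be extracted from the fitted object with the usual accessor functions, for example:

coef(mod)     # estimated regression coefficients
fitted(mod)   # fitted values
resid(mod)    # residuals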
Hypothesis Testing of Regression Coefficients
For hypothesis testing of the regression coefficients, the summary() function should be used. It provides more information about the fitted model, such as standard errors, t-values, and p-values for each coefficient. For example,
summary(mod)
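If the numbers are needed for further computation, the coefficient table shown by summary() can be extracted as a matrix, and confidence intervals for the coefficients can be obtained with confint(). For example,

coef(summary(mod))   # estimates, standard errors, t-values, and p-values as a matrix
confint(mod)         # confidence intervals for the regression coefficients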
One can fit a regression model without an intercept term if required.
lm(mpg ~ hp -1, data = mtcars)
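An equivalent way to suppress the intercept in R's formula notation is to use 0 in place of -1; both calls fit the same no-intercept model:

lm(mpg ~ 0 + hp, data = mtcars)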
Graphical Representation of the Model
For the graphical representation of the model, one can use the plot() function to draw scatter points and the abline() function to draw the regression line.
plot(hp, mpg)
abline(mod)
Note the order of variables in the plot() function: the first argument represents the predictor variable, while the second argument represents the response variable.
The abline() function adds a line to the graph according to the intercept and slope taken from the fitted object mod, or supplied manually.
One can change the style of the regression line using the lty argument. Similarly, the color of the regression line can be changed from black to some other color using the col argument. That is,
plot(hp, mpg)
abline(mod, lty = 2, col = "blue")
Note that one can identify different observations on a graph using the identify() function. For example,
identify(hp, mpg)
Note that to identify a point, place the mouse pointer near the point and press the left mouse button; to exit the identify procedure, press the right mouse button or the ESC key on the keyboard.
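As a small illustration, the selected points can also be labeled, for example with the car names stored in the row names of mtcars, by passing the labels argument to identify():

plot(hp, mpg)
identify(hp, mpg, labels = rownames(mtcars))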
FAQs about Simple Linear Regression in R
- What is a simple linear regression model? How can it be performed in the R Language?
- How is the lm() function used to fit a simple linear regression model?
- How can estimation and testing of the regression coefficients be performed in R?
- What is the use of the summary() function in R? Explain.
- How can visualization of regression models in R be performed?
Read more on Statistical models in R