Simple Linear Regression Model

Introduction to Simple Linear Regression Model

The linear regression model is typically estimated by the ordinary least squares (OLS) technique. The model in general form is

$$Y_i=x_i'\beta + \varepsilon_i, \quad\quad i=1,2,\cdots,n$$

In matrix notation

$$y=X\beta + \varepsilon,$$

where $y$ is an $n\times 1$ vector containing the values of the dependent variable and $X=(x_1,x_2,\cdots,x_n)'$ is the $n\times p$ matrix of regressors, with one row per observation. The matrix $X$ is also called the model matrix; its columns represent the regressors. $\beta$ is a $p\times 1$ vector of regression coefficients, and $\varepsilon$ is an $n\times 1$ vector of error terms. To learn more about simple linear models, see the post Simple Linear Regression Models.

Estimating Regression Coefficients

The regression coefficients $\beta$ can be estimated as

$$\hat{\beta}=(X'X)^{-1}X'y$$

The fitted values can be computed as

$$\hat{y}=X\hat{\beta}$$

The residuals are

$$\hat{\varepsilon} = y - \hat{y}$$

The residual sum of squares is

$$\hat{\varepsilon}'\hat{\varepsilon}$$

The R language has excellent facilities for fitting linear models. The basic function for fitting linear models by the least squares method is lm(). The model is specified using formula notation.

We will consider the mtcars dataset. Let $Y=mpg$ and $X=hp$; the simple linear regression model is

$$Y_i = \beta_1 + \beta_2 hp_i + \varepsilon_i$$

where $\beta_1$ is the intercept and $\beta_2$ is the slope coefficient.
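The estimator formulas above can be checked numerically for this model. The following is a minimal sketch, assuming the built-in mtcars data and a design matrix whose first column is a column of ones for the intercept; the object names (X, y, beta_hat, and so on) are illustrative choices.

```r
# Design matrix: a column of ones (intercept) and the regressor hp
X <- cbind(intercept = 1, hp = mtcars$hp)
y <- mtcars$mpg

# beta_hat = (X'X)^{-1} X'y, obtained by solving the normal equations
beta_hat <- solve(t(X) %*% X, t(X) %*% y)

# Fitted values, residuals, and residual sum of squares
y_hat <- X %*% beta_hat
res   <- y - y_hat
rss   <- as.numeric(t(res) %*% res)

beta_hat
rss
```

The normal-equations form is used here only because it mirrors the formula. lm() itself relies on a QR decomposition, which is numerically more stable.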

Fitting Simple Linear Regression Model in R

To fit this simple linear regression model in R, run:

attach(mtcars)

mod <- lm(mpg ~ hp)
mod

The lm() function uses the formula mpg ~ hp, with the response variable on the left of the tilde (~) and the predictor on the right. It is better to supply the data argument to the lm() function, so that the variables are looked up in the data frame rather than in attached data. That is,

mod <- lm(mpg ~ hp, data = mtcars)

The lm() function returns an object of class lm, saved here in the variable mod (any name can be used). Printing the object produces a brief report.
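The components of the fitted object can also be pulled out with extractor functions. A short sketch, assuming the same model fitted on mtcars:

```r
mod <- lm(mpg ~ hp, data = mtcars)

class(mod)             # class of the returned object
coef(mod)              # estimated intercept and slope
head(fitted(mod))      # first few fitted values
head(residuals(mod))   # first few residuals
```

Using coef(), fitted(), and residuals() is generally preferable to reaching into the object with $, because the extractors work across model classes.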

Hypothesis Testing of Regression Coefficients

To test hypotheses about the regression coefficients, use the summary() function. It reports more information about the fitted model, such as the standard error, t-value, and p-value of each coefficient. For example,

summary(mod)
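The quantities printed by summary() can also be extracted for further use. The sketch below assumes the same mtcars model; the names s and coef_table are illustrative.

```r
mod <- lm(mpg ~ hp, data = mtcars)
s   <- summary(mod)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(s)
coef_table

# p-value for the test of H0: slope of hp equals zero
coef_table["hp", "Pr(>|t|)"]

# Coefficient of determination and 95% confidence intervals
s$r.squared
confint(mod, level = 0.95)
```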

One can fit a regression model without an intercept term if required.

lm(mpg ~ hp - 1, data = mtcars)
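As a quick check on the no-intercept fit, the slope of a regression through the origin has the closed form $\sum x_i y_i / \sum x_i^2$. The sketch below compares that value with lm(); the object names are illustrative.

```r
mod0  <- lm(mpg ~ hp - 1, data = mtcars)
mod0b <- lm(mpg ~ 0 + hp, data = mtcars)   # equivalent way to drop the intercept

# Least squares slope through the origin: sum(x * y) / sum(x^2)
slope_by_hand <- with(mtcars, sum(hp * mpg) / sum(hp^2))

coef(mod0)
slope_by_hand
```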

Graphical Representation of the Model

For the graphical representation of the model, one can use the plot() function to draw scatter points and the abline() function to draw the regression line.

plot(hp, mpg)
abline(mod)

Note the order of variables in the plot() function. The first argument represents the predictor variable (x-axis), while the second argument represents the response variable (y-axis).

The function abline() adds a line to the existing plot, taking the intercept and slope either from the fitted model mod or from values supplied manually.
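Supplying the intercept and slope manually can be sketched as follows, using the a (intercept) and b (slope) arguments of abline(). The variable name est is an illustrative choice, and the columns are referenced through mtcars$ so that the example does not depend on attach().

```r
mod <- lm(mpg ~ hp, data = mtcars)
est <- coef(mod)   # est[1] is the intercept, est[2] is the slope

plot(mtcars$hp, mtcars$mpg, xlab = "hp", ylab = "mpg")

# Same line as abline(mod), but with intercept and slope given explicitly
abline(a = est[1], b = est[2])
```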

One can change the style of the regression line using the lty argument. Similarly, the color of the regression line can be changed from black to some other color using the col argument. That is,

plot(hp, mpg)
abline(mod, lty = 2, col = "blue")

Note that one can identify different observations on a graph using the identify() function. For example,

identify(hp, mpg)

To identify a point, place the mouse pointer near it and press the left mouse button. To exit the identify procedure, press the right mouse button or the ESC key.

FAQs about Simple Linear Regression in R

  1. What is a simple linear regression model? How can it be fitted in the R language?
  2. How is the lm() function used to fit a simple linear regression model? Explain in detail.
  3. How can estimation and testing of the regression coefficients be performed in R?
  4. What is the use of the summary() function in R? Explain.
  5. How can regression models be visualized in R?


Graphical Representations in R

Many graphical representations in R Language are available for both qualitative and quantitative data types. In this post, we will only discuss graphical representations in R such as histograms, bar plots, and box plots.

Creating Histogram in R

A histogram, drawn with the hist() function in R, visualizes a single numeric variable. Histograms are used to judge the shape of the distribution of the data graphically and to check the normality of a variable.

Let us attach the iris dataset.

attach(iris)
head(iris)
hist(Petal.Width)

We can enhance the histogram by using some arguments/parameters related to the hist() function in R. For example,

hist(Petal.Width,
  xlab = "Petal Width",
  ylab = "Frequency",
  main = "Histogram of Petal Width from Iris Data set",
  breaks = 10,
  col = "dodgerblue",
  border = "orange")

If these arguments are not provided, R will attempt to intelligently guess them, especially the number of breaks. See the YouTube tutorial for graphical representations of the histogram.
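To see which bins R actually chose, the object returned by hist() can be inspected. A small sketch, assuming the iris data; plot = FALSE suppresses the drawing and the name h is illustrative. When breaks is a single number it is treated as a suggestion, and R rounds the bin boundaries to "pretty" values.

```r
h <- hist(iris$Petal.Width, breaks = 10, plot = FALSE)

h$breaks   # bin boundaries actually used
h$counts   # number of observations in each bin
```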

Creating Barplots in R

Bar plots are well suited to visual inspection of a categorical variable (or a numeric variable with a finite number of values) or a rank variable. They are usually used for comparison purposes. The barplot() function can be used for visual inspection of a categorical variable.

attach(mtcars)   # mtcars is a built-in dataset, not a package
barplot(table(cyl))
barplot(table(cyl),
  ylab = "Frequency",
  xlab = "Cylinders (4, 6, 8)",
  main = "Number of cylinders",
  col = "green",
  border = "blue")
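The bar heights come directly from the frequency table, so it can help to look at the table first. A minimal sketch that avoids attach() by referring to the column as mtcars$cyl; the name counts is illustrative.

```r
counts <- table(mtcars$cyl)
counts              # 11 cars with 4 cylinders, 7 with 6, and 14 with 8

prop.table(counts)  # the same information as proportions

barplot(counts,
  xlab = "Cylinders",
  ylab = "Frequency")
```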

Creating Boxplots in R

One can use Boxplots to visualize the normality, skewness, and existence of outliers in the data based on five-number summary statistics.

boxplot(mpg)
boxplot(Petal.Width)
boxplot(Petal.Length)
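The five-number summary behind a box plot can be computed directly. The sketch below uses fivenum() and boxplot.stats() on mtcars$mpg; the object names are illustrative.

```r
# Minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(mtcars$mpg)
five

# Statistics as used for drawing the box plot
bs <- boxplot.stats(mtcars$mpg)
bs$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out     # observations flagged as outliers (more than 1.5 box lengths beyond the hinges)
```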

One can also compare the distribution of a numerical variable across the values of a categorical (grouping) variable. For example,

boxplot(mpg ~ cyl, data = mtcars)

R reads the formula mpg ~ cyl as: "Plot the mpg variable against the cyl variable using the dataset mtcars." The symbol ~ is used to specify a formula in R.

boxplot(mpg ~ cyl, data = mtcars,
  xlab = "Cylinders",
  ylab = "Miles per Gallon",
  pch = 20,
  cex = 2,
  col = "pink",
  border = "black")
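The numbers behind the grouped box plot are returned by boxplot() itself. A short sketch, with plot = FALSE so that nothing is drawn; the name b is illustrative.

```r
b <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)

b$names   # group labels
b$n       # number of observations in each group
b$stats   # one column per group: lower whisker, lower hinge, median, upper hinge, upper whisker

# The third row of b$stats holds the group medians
tapply(mtcars$mpg, mtcars$cyl, median)
```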


switch Statement in R

In the R language, the switch statement allows a variable to be tested for equality against a list of values. Each value in the list is called a case, and the variable being switched on is checked against each case. In terms of functionality, switch works much like a chain of if ... else if statements.

The basic syntax is

Basic Syntax of Switch Statement in R Language

switch(expression,
     case1,
     case2,
     case3,
     ...
)

The value of the expression is tested against the cases (case1, case2, …, casen). The one-line syntax is

switch(expression, case1, case2, ..., casen)

A default can also be added to a switch statement. When the expression is a character string, the default is written as a final unnamed argument, and it is evaluated when the value of the expression does not match any of the cases.
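A minimal sketch of a default case follows. The function name describe_operator is hypothetical and used only for illustration. An empty case (a name followed by nothing) falls through to the next non-empty case.

```r
describe_operator <- function(op) {
  switch(op,
         "+" = "addition",
         "-" = "subtraction",
         "*" = ,                  # empty case: falls through to "x"
         "x" = "multiplication",
         "unknown operator")      # default: final unnamed argument
}

describe_operator("+")   # "addition"
describe_operator("*")   # "multiplication"
describe_operator("?")   # "unknown operator"
```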

The following example is a simple command-line type calculator using R.

Simple Calculator Example

number1  <- 30
number2  <- 20
operator <- readline(prompt = "Enter any ARITHMETIC OPERATOR (+, -, *, ^, /, %/%, %%)!: ")

switch(operator,
       "+" = print(paste("Addition (number1+number2) = ", number1 + number2)),
       "-" = print(paste("Subtraction (number1-number2) = ", number1 - number2)),
       "*" = print(paste("Multiplication (number1*number2) = ", number1 * number2)),
       "^" = print(paste("Exponent (number1^number2) = ", number1 ^ number2)),
       "/" = print(paste("Division (number1/number2) = ", number1 / number2)),
       "%/%" = print(paste("Integer Division (number1 %/% number2) = ", number1 %/% number2)),
       "%%" = print(paste("Remainder (number1 %% number2) = ", number1 %% number2))
)

With the above example, one can easily perform some basic computations on two numbers. The operation performed depends on the input given to the readline() function, which is used as the expression in the switch. The operator value returned by readline() is matched against the options (cases) in the switch statement, and the result is displayed when a match is found.
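The same idea can be wrapped in a function so that it runs without interactive input. This is a sketch only: the function name calc and its arguments are hypothetical, and the final unnamed argument acts as a default that raises an error for an unrecognised operator.

```r
calc <- function(a, b, operator) {
  switch(operator,
         "+"   = a + b,
         "-"   = a - b,
         "*"   = a * b,
         "^"   = a ^ b,
         "/"   = a / b,
         "%/%" = a %/% b,
         "%%"  = a %% b,
         stop("Unknown operator: ", operator))   # default case
}

calc(30, 20, "+")     # 50
calc(30, 20, "%/%")   # 1
calc(30, 20, "%%")    # 10
```

Returning the value instead of printing it lets the result be stored or reused in later computations.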

Probability under the F-Curve

Consider another example: the F-table value (quantile of the F distribution) for one of several probabilities can be selected using switch, as given below.

# q contains the probabilities (areas to the left) under the F-curve
q <- c(0.25, 0.5, 0.75, 0.999)
test <- 3   # numeric, so switch() picks the third alternative by position
v1 <- 10    # numerator degrees of freedom
v2 <- 20    # denominator degrees of freedom

switch(test,
      "1" = print(qf(q[1], df1 = v1, df2 = v2, lower.tail = TRUE)),
      "2" = print(qf(q[2], df1 = v1, df2 = v2, lower.tail = TRUE)),
      "3" = print(qf(q[3], df1 = v1, df2 = v2, lower.tail = TRUE)),
      "4" = print(qf(q[4], df1 = v1, df2 = v2, lower.tail = TRUE))
)

With test = 3, the code above prints the F-table value for a probability of 0.75; changing test selects a different probability.
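Because qf() is vectorised over its probability argument, all four F-table values can also be computed in one call and then indexed. The sketch below shows this alternative, with illustrative object names. It also shows that a numeric switch returns NULL when the number exceeds the number of alternatives, since a numeric switch has no default.

```r
q  <- c(0.25, 0.5, 0.75, 0.999)
v1 <- 10
v2 <- 20

# One call gives the quantile for every probability in q
f_values <- qf(q, df1 = v1, df2 = v2)
names(f_values) <- q
f_values

# Selecting by position replaces the switch on test
test <- 3
f_values[test]

# A numeric switch with no matching position returns NULL (invisibly)
out_of_range <- switch(5, "a", "b")
is.null(out_of_range)
```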
