Weighted Least Squares In R: A Quick WLS Tutorial

Introduction to Weighted Least Squares in R

This post discusses the implementation of Weighted Least Squares (WLS) in R. The OLS method minimizes the sum of squared residuals, while WLS minimizes a weighted sum of squared residuals. The WLS technique is used when the OLS assumption of constant error variance (homoscedasticity) is violated.
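
In symbols, OLS minimizes $\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$, whereas WLS attaches a weight $w_i$ to each squared residual:

$$\min_{\beta} \sum_{i=1}^{n} w_i \left( y_i - x_i^\prime \beta \right)^2$$

Observations with larger error variance receive smaller weights, commonly $w_i = 1/\sigma_i^2$.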

The WLS technique is also known as weighted linear regression. It is a generalization of ordinary least squares (OLS) linear regression in which knowledge of the unequal variance of the observations is incorporated into the regression.

WLS in R Language

Let us perform the WLS in R.

Here we will use the mtcars dataset, which is built into R. You can attach it first using the attach() function so that its variables can be accessed directly. For example,

attach(mtcars)

Consider the following example of weighted least squares, in which the reciprocal of the $wt$ variable is used as the weights. Here an ordinary (unweighted) model and a weighted model are fitted, and then the fits of the models are compared using the anova() function.

# Model 1: ordinary least squares (no weights)
w_model1 <- lm(mpg ~ wt + hp, data = mtcars)

# Model 2: weighted least squares with weights 1/wt
w_model2 <- lm(mpg ~ wt + hp, data = mtcars, weights = 1/wt)
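
As a side note (an illustrative check, not part of the original tutorial), WLS with weights $w_i$ is equivalent to OLS after scaling the response and the predictors, including the intercept column, by $\sqrt{w_i}$. The following sketch confirms that the coefficients agree:

# Manual check: WLS equals OLS on sqrt(w)-scaled data
w  <- 1 / mtcars$wt
sw <- sqrt(w)

# Scale the response and the design matrix (with intercept column) row-wise by sqrt(w)
y_s <- sw * mtcars$mpg
X_s <- sw * cbind(1, mtcars$wt, mtcars$hp)

# Least-squares solution of the scaled system; should match coef(w_model2)
qr.solve(X_s, y_s)
coef(w_model2)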

Check/Test the Model Fit

To check the model fit, obtain summary statistics of the fitted models, and draw diagnostic plots, one can use the built-in functions as follows:

anova(w_model1, w_model2)
summary(w_model1)
summary(w_model2)

plot(w_model1)
plot(w_model2)
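
Before weighting, it is worth checking that the error variance really is non-constant. A minimal base-R sketch (not in the original post): plot the absolute residuals of the unweighted model against its fitted values; a visible trend suggests heteroscedasticity.

# Informal heteroscedasticity check for the unweighted model
plot(fitted(w_model1), abs(residuals(w_model1)),
     xlab = "Fitted values", ylab = "|Residuals|",
     main = "Spread of residuals vs fitted values")
lines(lowess(fitted(w_model1), abs(residuals(w_model1))), col = "red")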

The output of the first model (ordinary least squares) is:

Call:
lm(formula = mpg ~ wt + hp, data = mtcars)

Residuals:
   Min     1Q Median     3Q    Max 
-3.941 -1.600 -0.182  1.050  5.854 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 37.22727    1.59879  23.285  < 2e-16 ***
wt          -3.87783    0.63273  -6.129 1.12e-06 ***
hp          -0.03177    0.00903  -3.519  0.00145 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.593 on 29 degrees of freedom
Multiple R-squared:  0.8268,	Adjusted R-squared:  0.8148 
F-statistic: 69.21 on 2 and 29 DF,  p-value: 9.109e-12

The output of the second model (weighted least squares) is:

Call:
lm(formula = mpg ~ wt + hp, data = mtcars, weights = 1/wt)

Weighted Residuals:
    Min      1Q  Median      3Q     Max 
-2.2271 -1.0538 -0.3894  0.6397  3.7627 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 39.002317   1.541462  25.302  < 2e-16 ***
wt          -4.443823   0.688300  -6.456 4.59e-07 ***
hp          -0.031460   0.009776  -3.218  0.00317 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.554 on 29 degrees of freedom
Multiple R-squared:  0.8389,	Adjusted R-squared:  0.8278 
F-statistic: 75.49 on 2 and 29 DF,  p-value: 3.189e-12

Graphical Representation of Models

The graphical representation of both models is:

[Figure: Diagnostic plots for Model 1 (OLS)]
[Figure: Diagnostic plots for Model 2 (WLS)]

FAQs about Weighted Least Squares in R

  1. How can weighted least squares be performed in the R language?
  2. How can the lm() function be used to conduct WLS in R?
  3. What are the important arguments for performing a weighted least squares model in R?

Learn about Generalized Least Squares

Curvilinear Regression in R: A Quick Reference

Introduction to Curvilinear Regression in R Language

In this post, we will learn about some basics of curvilinear regression in R.

Curvilinear/non-linear regression analysis is used to determine whether a non-linear trend exists between $X$ and $Y$.

Adding more parameters to an equation generally results in a better fit to the data. Quadratic and cubic equations will always have an $R^2$ at least as high as that of the linear regression model. Similarly, a cubic equation will usually have a higher $R^2$ than a quadratic one.

Logarithmic and Polynomial Relationships

The logarithmic relationship can be described as follows:
$$Y = m\, \log(x) + c$$
The polynomial relationship can be described as follows:
$$Y = m_1 x + m_2 x^2 + m_3 x^3 + \cdots + m_n x^n + c$$

The logarithmic example is more akin to a simple regression, whereas the polynomial example is multiple regression. Logarithmic relationships are common in the natural world; you may encounter them in many circumstances. Drawing the relationships between response and predictor variables as a scatter plot is generally a good starting point.
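
As a brief illustration of how these two forms are written with lm() (the x and y vectors here are made up for demonstration only):

# Hypothetical data for illustration
x <- 1:20
y <- 3 * log(x) + rnorm(20, sd = 0.3)

fit_log  <- lm(y ~ log(x))                  # logarithmic: Y = m*log(x) + c
fit_poly <- lm(y ~ poly(x, 3, raw = TRUE))  # cubic polynomial in x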

Consider the following data, which are related in a curvilinear form:

Growth   Nutrient
 2        2
 9        4
11        6
12        8
13       10
14       16
17       22
19       28
17       30
18       36
20       48

Performing Curvilinear Regression in R

Let us perform a curvilinear regression in R language.

Growth <- c(2, 9, 11, 12, 13, 14, 17, 19, 17, 18, 20)
Nutrient <- c(2, 4, 6, 8, 10, 16, 22, 28, 30, 36, 48)
data <- as.data.frame(cbind(Growth, Nutrient))

library(ggplot2)

ggplot(data, aes(Nutrient, Growth)) +
  geom_point() +
  stat_smooth()
[Figure: Scatter plot of Growth against Nutrient with a smoothed curve]

The scatter plot suggests that the relationship is a logarithmic one.

Linear Regression in R

Let us carry out a linear regression using the lm() function, taking the $\log$ of the predictor variable rather than the variable itself.

mod <- lm(Growth ~ log(Nutrient), data = data)
summary(mod)

Call:
lm(formula = Growth ~ log(Nutrient), data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.2274 -0.9039  0.5400  0.9344  1.3097 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)     0.6914     1.0596   0.652     0.53    
log(Nutrient)   5.1014     0.3858  13.223 3.36e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.229 on 9 degrees of freedom
Multiple R-squared:  0.951,     Adjusted R-squared:  0.9456 
F-statistic: 174.8 on 1 and 9 DF,  p-value: 3.356e-07
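
To confirm that the logarithmic form beats a plain straight line on these data, one can fit both and compare them; a minimal sketch using AIC (not shown in the original post):

# Compare the logarithmic fit above with a straight-line fit
lin_mod <- lm(Growth ~ Nutrient, data = data)
AIC(lin_mod, mod)  # lower AIC indicates the better-fitting model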

FAQs about Curvilinear Regression in R

  1. Write in detail about curvilinear regression models.
  2. How can one visually guess the curvilinear relationship between the response and predictor variables?
  3. What may be the consequences if a curvilinear relationship is estimated using a simple linear regression model?

Learn about Performing Linear Regression in R

Learn Statistics

Backward Deletion Method Step by Step in R

Introduction to Backward Deletion Method

With many predictor variables, one can search for the most statistically significant model that the data support. There are two main choices: forward stepwise regression and the backward deletion method.

In forward stepwise regression: start with the single best variable and add more variables to build your model into a more complex form.

In backward deletion (backward selection) regression: put all the variables in the model and reduce the model by removing variables until you are left with only significant terms.

Backward Deletion Method (Step-by-Step Procedure)

Let’s start with a big model and trim it until we get the best (most statistically significant) regression model. The drop1() command examines a linear model and determines the effect of removing each term from it. Complete the following steps to perform a backward deletion. Note that R also offers dedicated packages for backward and forward selection of predictors.

Step 1: (Full Model)

Step 1: To start, create a “full” model (all variables at once in the model). Since it would be tedious to type out all the variables, one can use a shortcut, the dot notation.

mod <- lm(mpg ~ ., data = mtcars)

Step 2: Formula Function

Step 2: Let’s use the formula() function to see the response and predictor variables used in Step 1.

formula(mod)

Step 3: Drop1 Function

Step 3: Let’s use the drop1() function to see which term (predictor) should be deleted from the model:

drop1(mod)
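
In the drop1() output, each row shows the AIC of the model that would result from deleting that term; the <none> row is the current model. If you also want a significance test for each candidate deletion, drop1() accepts a test argument:

# Request an F-test for each single-term deletion alongside the AIC values
drop1(mod, test = "F")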

Step 4: Remove the Term

Step 4: In the drop1() output, look for the term whose removal yields the lowest AIC. Re-form the model without that variable (the one that is non-significant or whose deletion gives the lowest AIC). The simplest way to do this is to copy the model formula to the clipboard, paste it into a new command, and edit out the term you do not want. Here we assume, for illustration, that cyl is the term to drop:

mod1 <- lm(mpg ~ . - cyl, data = mtcars)  # assuming cyl gave the lowest AIC in drop1(mod)

Step 5: Examine the Effect

Step 5: Examine the effect of dropping another term by running the drop1() command once more:

drop1(mod1)

If the drop1() output shows a variable whose removal would lower the AIC, remove that variable, and carry out this process repeatedly until you have a model that you are happy with.
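
The whole loop can also be automated with base R's step() function, which repeatedly applies the same drop-the-lowest-AIC rule until no deletion improves the AIC; a minimal sketch:

# Automated backward deletion by AIC
final_mod <- step(mod, direction = "backward")
formula(final_mod)  # formula of the selected model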

FAQs about the Backward Deletion Method in R

  1. Write a step-by-step procedure to perform the backward deletion method in R.
  2. How can one examine the effect of dropping a term from the model?
  3. What is the use of the formula() function with an lm() model?
  4. What is the use of the drop1() function in R?

Learn more about lm() function
