Weighted Least Squares In R: A Quick WLS Tutorial

Introduction to Weighted Least Squares in R

This post discusses the implementation of Weighted Least Squares (WLS) in R. The OLS method minimizes the sum of squared residuals, whereas WLS minimizes a weighted sum of squared residuals, giving each observation its own weight. WLS is used when the OLS assumption of constant error variance (homoscedasticity) is violated.

WLS is also known as weighted linear regression. It is a generalization of ordinary least squares (OLS) in which knowledge of the variance of the observations is incorporated into the regression.
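To make the weighting concrete: with weights $w_i$, WLS chooses the coefficients that minimize
$$\sum_i w_i (y_i - \hat{y}_i)^2$$
which reduces to the OLS criterion when every $w_i = 1$. The following is a minimal sketch, not part of the original example, that solves the weighted normal equations by hand on the built-in mtcars data and compares the result with lm(). The weights 1/wt are borrowed from the example further down in this post, and the object names (X, y, w, beta_manual, beta_lm) are chosen only for illustration.

```r
# Illustrative sketch; weights 1/wt are an assumption taken from this post's example
X <- cbind(Intercept = 1, wt = mtcars$wt, hp = mtcars$hp)  # design matrix
y <- mtcars$mpg                                            # response
w <- 1 / mtcars$wt                                         # one weight per observation

# Weighted normal equations: (X' W X) b = X' W y
XtWX <- t(X) %*% (w * X)   # w * X multiplies row i of X by w[i]
XtWy <- t(X) %*% (w * y)
beta_manual <- drop(solve(XtWX, XtWy))

# The same model fitted through lm()
beta_lm <- coef(lm(mpg ~ wt + hp, data = mtcars, weights = 1 / wt))

# Compare the two sets of estimates
all.equal(unname(beta_manual), unname(beta_lm))
```

In practice lm() is preferable because it uses a QR decomposition, which is numerically more stable than forming and solving the normal equations directly.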

WLS in R Language

Let us perform the WLS in R.

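Here we will use the mtcars dataset, which ships with R. The example makes its columns available by name using the attach() function, although the lm() calls below pass data = mtcars and therefore do not depend on it. For example,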

attach(mtcars)

Consider the following example of weighted least squares in which the reciprocal of the $wt$ variable is used as weights. Two models are fitted: the first without weights (equivalent to OLS) and the second with weights $1/wt$. The two fits are then examined using the anova() and summary() functions.

# Model 1: no weights (ordinary least squares)
w_model1 <- lm(mpg ~ wt + hp, data = mtcars)

# Model 2: weighted by the reciprocal of wt
w_model2 <- lm(mpg ~ wt + hp, data = mtcars, weights = 1 / wt)

Check/Test the Model Fit

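To check the model fit, view summary statistics, and draw diagnostic plots of the fitted models, one can use the built-in functions as follows. Note that both models have the same predictors and the same residual degrees of freedom, so anova() cannot carry out an F-test between them; the summaries and diagnostic plots are more informative here.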

anova(w_model1, w_model2)
summary(w_model1)
summary(w_model2)

plot(w_model1)
plot(w_model2)
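
Beyond the printed summaries, it can help to put the two sets of estimates next to each other. The sketch below is only an illustration: it refits both models so that it runs on its own, and the table name coef_table and its column labels are made up here.

```r
# Refit both models so this snippet is self-contained
w_model1 <- lm(mpg ~ wt + hp, data = mtcars)                    # no weights
w_model2 <- lm(mpg ~ wt + hp, data = mtcars, weights = 1 / wt)  # weights 1/wt

# Coefficients side by side (column labels chosen for illustration)
coef_table <- cbind(unweighted = coef(w_model1), weighted = coef(w_model2))
round(coef_table, 4)

# Change in each estimate caused by the weighting
round(coef_table[, "weighted"] - coef_table[, "unweighted"], 4)
```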

The output of the first (unweighted) model is:

Call:
lm(formula = mpg ~ wt + hp, data = mtcars)

Residuals:
   Min     1Q Median     3Q    Max 
-3.941 -1.600 -0.182  1.050  5.854 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 37.22727    1.59879  23.285  < 2e-16 ***
wt          -3.87783    0.63273  -6.129 1.12e-06 ***
hp          -0.03177    0.00903  -3.519  0.00145 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.593 on 29 degrees of freedom
Multiple R-squared:  0.8268,	Adjusted R-squared:  0.8148 
F-statistic: 69.21 on 2 and 29 DF,  p-value: 9.109e-12

The output of the second (weighted) model is:

Call:
lm(formula = mpg ~ wt + hp, data = mtcars, weights = 1/wt)

Weighted Residuals:
    Min      1Q  Median      3Q     Max 
-2.2271 -1.0538 -0.3894  0.6397  3.7627 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 39.002317   1.541462  25.302  < 2e-16 ***
wt          -4.443823   0.688300  -6.456 4.59e-07 ***
hp          -0.031460   0.009776  -3.218  0.00317 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.554 on 29 degrees of freedom
Multiple R-squared:  0.8389,	Adjusted R-squared:  0.8278 
F-statistic: 75.49 on 2 and 29 DF,  p-value: 3.189e-12
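
The second summary reports "Weighted Residuals" and a smaller residual standard error, but these are measured on the weighted scale, so the value 1.554 is not directly comparable with the 2.593 of the unweighted fit. The following sketch, added here as an illustration, shows how the weighted quantities relate to the raw residuals. The variable names (w, e_raw, e_wtd, rse_manual) are assumptions made for the example.

```r
w_model2 <- lm(mpg ~ wt + hp, data = mtcars, weights = 1 / wt)

w     <- weights(w_model2)             # the weights passed to lm(), here 1/wt
e_raw <- residuals(w_model2)           # raw residuals: observed minus fitted
e_wtd <- weighted.residuals(w_model2)  # what summary() lists as "Weighted Residuals"

# Weighted residuals are the raw residuals scaled by sqrt(weight)
all.equal(e_wtd, sqrt(w) * e_raw)

# Residual standard error rebuilt from the weighted sum of squares
rse_manual <- sqrt(sum(w * e_raw^2) / df.residual(w_model2))
c(manual = rse_manual, from_summary = summary(w_model2)$sigma)
```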

Graphical Representation of Models

The graphical representation of both models is:

[Figure: Diagnostic plots for Model 1]
[Figure: Diagnostic plots for the WLS model, Model 2]
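
By default plot() on a fitted lm object draws its diagnostic plots one after another. A possible way to see them together, sketched here as a suggestion rather than as the method used for the figures above, is to set up a 2 x 2 plotting grid first. The name old_par is arbitrary.

```r
w_model2 <- lm(mpg ~ wt + hp, data = mtcars, weights = 1 / wt)

old_par <- par(mfrow = c(2, 2))  # 2 x 2 grid; par() returns the previous setting
plot(w_model2)                   # the default diagnostic plots for an lm fit
par(old_par)                     # restore the original layout
```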

FAQs about Weighted Least Squares in R

  1. How can weighted least squares be performed in the R language?
  2. How can the lm() function be used to conduct WLS in R?
  3. What are the important arguments for fitting a weighted least squares model in R?

Learn about Generalized Least Squares

Curvilinear Regression in R: A Quick Reference

Introduction to Curvilinear Regression in R Language

In this post, we will learn about some basics of curvilinear regression in R.

Curvilinear (non-linear) regression analysis is used to determine whether a non-linear trend exists between $X$ and $Y$.

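Adding more parameters to an equation never makes the fit to the same data worse. When fitted by least squares, a quadratic or cubic equation will always have an $R^2$ at least as high as the linear regression model, and a cubic equation will have an $R^2$ at least as high as a quadratic one. A higher $R^2$ alone therefore does not show that the more complex model is preferable.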
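
As an illustration of this point, the sketch below fits linear, quadratic, and cubic models to the Growth and Nutrient values used later in this post and collects $R^2$ and adjusted $R^2$ for each. Using poly() and the names fits, r2, and adj_r2 are choices made for this example only.

```r
Growth   <- c(2, 9, 11, 12, 13, 14, 17, 19, 17, 18, 20)
Nutrient <- c(2, 4, 6, 8, 10, 16, 22, 28, 30, 36, 48)

# Nested polynomial models of increasing degree
fits <- list(
  linear    = lm(Growth ~ Nutrient),
  quadratic = lm(Growth ~ poly(Nutrient, 2)),
  cubic     = lm(Growth ~ poly(Nutrient, 3))
)

# R-squared cannot decrease as terms are added; adjusted R-squared can
r2     <- sapply(fits, function(m) summary(m)$r.squared)
adj_r2 <- sapply(fits, function(m) summary(m)$adj.r.squared)
round(rbind(r2, adj_r2), 4)
```

Adjusted $R^2$ penalizes extra parameters, so it is one common, though not the only, way to judge whether the added terms are worthwhile.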

Logarithmic and Polynomial Relationships

The logarithmic relationship can be described as follows:
$$Y = m \log(x) + c$$
The polynomial relationship can be described as follows:
$$Y = m_1 x + m_2 x^2 + m_3 x^3 + \cdots + m_n x^n + c$$

The logarithmic example is more akin to a simple regression, whereas the polynomial example is multiple regression. Logarithmic relationships are common in the natural world; you may encounter them in many circumstances. Drawing the relationships between response and predictor variables as a scatter plot is generally a good starting point.
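
The difference between the two forms shows up in how they are written as lm() formulas. The sketch below is an illustration using the Growth and Nutrient values from later in this post: the logarithmic model has one slope, while a cubic polynomial (chosen here as an example degree) has three. The names log_fit and poly_fit are hypothetical.

```r
Growth   <- c(2, 9, 11, 12, 13, 14, 17, 19, 17, 18, 20)
Nutrient <- c(2, 4, 6, 8, 10, 16, 22, 28, 30, 36, 48)

# Logarithmic: Y = m log(x) + c, a simple regression on a transformed predictor
log_fit <- lm(Growth ~ log(Nutrient))

# Cubic polynomial: Y = m1 x + m2 x^2 + m3 x^3 + c, a multiple regression
# I() makes ^ mean arithmetic power rather than formula syntax
poly_fit <- lm(Growth ~ Nutrient + I(Nutrient^2) + I(Nutrient^3))

names(coef(log_fit))   # intercept and one slope
names(coef(poly_fit))  # intercept and three slopes
```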

Consider the following data that are related in a curvilinear form,

Growth   Nutrient
2        2
9        4
11       6
12       8
13       10
14       16
17       22
19       28
17       30
18       36
20       48

Performing Curvilinear Regression in R

Let us perform a curvilinear regression in R language.

library(ggplot2)  # provides ggplot(), geom_point(), and stat_smooth()

Growth <- c(2, 9, 11, 12, 13, 14, 17, 19, 17, 18, 20)
Nutrient <- c(2, 4, 6, 8, 10, 16, 22, 28, 30, 36, 48)
data <- data.frame(Growth, Nutrient)

# Scatter plot with a smoothed trend line
ggplot(data, aes(Nutrient, Growth)) +
  geom_point() +
  stat_smooth()
[Figure: Curvilinear Regression in R (scatter plot of Growth against Nutrient)]

The scatter plot suggests that the relationship is logarithmic.

Linear Regression in R

Let us carry out a linear regression using the lm() function by taking the $\log$ of the predictor variable rather than the basic variable itself.

data <- data.frame(Growth, Nutrient)
mod  <- lm(Growth ~ log(Nutrient), data = data)
summary(mod)

Call:
lm(formula = Growth ~ log(Nutrient), data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.2274 -0.9039  0.5400  0.9344  1.3097 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)     0.6914     1.0596   0.652     0.53    
log(Nutrient)   5.1014     0.3858  13.223 3.36e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.229 on 9 degrees of freedom
Multiple R-squared:  0.951,     Adjusted R-squared:  0.9456 
F-statistic: 174.8 on 1 and 9 DF,  p-value: 3.356e-07
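
Once the model is fitted, it can be used to predict Growth at other nutrient levels. The sketch below is an illustration only: the nutrient values 5, 20, and 40 are hypothetical, as are the names new_points, pred, and manual. It also recomputes the predictions by hand from the fitted equation, intercept plus slope times log(Nutrient).

```r
Growth   <- c(2, 9, 11, 12, 13, 14, 17, 19, 17, 18, 20)
Nutrient <- c(2, 4, 6, 8, 10, 16, 22, 28, 30, 36, 48)
data <- data.frame(Growth, Nutrient)
mod  <- lm(Growth ~ log(Nutrient), data = data)

# Hypothetical nutrient levels at which to predict Growth
new_points <- data.frame(Nutrient = c(5, 20, 40))
pred <- predict(mod, newdata = new_points)

# The same predictions from the fitted equation
manual <- coef(mod)[1] + coef(mod)[2] * log(new_points$Nutrient)

cbind(new_points, predicted = pred, by_hand = manual)
```

Because the log transformation is written inside the formula, predict() applies it automatically, so new data are supplied on the original Nutrient scale.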

FAQs about Curvilinear Regression in R

  1. Write in detail about curvilinear regression models.
  2. How can one visually identify a curvilinear relationship between the response and predictor variables?
  3. What may be the consequences if a curvilinear relationship is estimated using a simple linear regression model?

Learn about Performing Linear Regression in R


R Basics Online Quiz 7

R Basics Online Quiz: The R language is a free and open-source language developed by Ross Ihaka and Robert Gentleman in 1991 at the University of Auckland, New Zealand. The R Language is used for statistical computing and graphics to clean, analyze, and graph your data. Let us start with the R Basics Online Quiz.

This quiz is about R Basics, covering the topics of R sequence operator, R objects, R Environment, and many more.

1. In ggplot2, you can use the __________ function to specify the data frame to use for your plot.

2. The ______________ is your current R working environment that includes user-defined objects.

3. A sequence of integer values can be created using the operator

4. Which of the following describes R Language best

5. R is an _____________ programming language

6. Which of the following functions can a data analyst use to get a statistical summary of their dataset?

7. In 1991 R Language was created by Ross Ihaka and Robert Gentleman in the Department of Statistics at the University of ____________.

8. In R, an object name cannot start with

9. Which of the following software is used for statistical analysis in R

10. ___________ developed R language

11. A data analyst inputs the following command:
quartet %>% group_by(set) %>% summarize(mean(x), sd(x), mean(y), sd(y), cor(x, y)).
Which of the functions in this command can help them determine how strongly related their variables are?

12. GUI stands for

13. R was named partly after the first names of _____________ R authors.

14. The file “.RData” in the current R session is

15. A data analyst wants to create the date February 27th, 2027 using the lubridate functions. Which of the following are examples of code that would create this value?

16. R Language functionality is divided into a number of ________

17. Factors in R are used to represent the

18. What does CRAN stand for?

19. The R console is a tool that is used to write (insert) standard

20. What type of plot will the following code create?
ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))

