Many generic functions are available for computing regression coefficients, testing coefficients, computing residuals or predicted values, and so on. Therefore, a good grasp of the lm()
function is necessary. Suppose we have performed the regression analysis using the lm() function, as done in the previous lesson.
mod <- lm(mpg ~ hp, data = mtcars)
The object returned by the lm() function has class “lm”. Objects of class “lm” are stored as lists, that is, their mode is “list”.
class(mod)
The names of the components of an “lm” object can be queried via
names(mod)
All the components of an “lm” object can be accessed directly. For example,
mod$rank
mod$coef # or mod$coefficients
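As a quick check of the points above, the following sketch confirms that the fitted object is a list of class “lm” and that the extractor function coef() gives the same values as direct component access. It assumes only the built-in mtcars data used in this lesson.

```r
# Fit the same model as above on the built-in mtcars data
mod <- lm(mpg ~ hp, data = mtcars)

inherits(mod, "lm")   # is the object of class "lm"?
is.list(mod)          # is it stored as a list?
mode(mod)             # storage mode of the object

# mod$coef works because $ partially matches the component name "coefficients"
identical(mod$coef, mod$coefficients)

# The extractor function returns the same values as direct access
identical(coef(mod), mod$coefficients)
```

Extractor functions such as coef() are usually preferable to direct access with $, since they keep working if the internal layout of the object changes.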
The following is the list of some generic functions for the fitted “lm” model.
Generic Function | Short Description |
print() | prints or displays the results in the R console |
summary() | displays regression coefficients, their standard errors, t-ratios, p-values, and significance |
coef() | extracts regression coefficients |
residuals() or resid() | extracts residuals of the fitted model |
fitted() or fitted.values() | extracts fitted values |
anova() | performs comparisons of nested models |
predict() | computes predicted values for new data |
plot() | draws diagnostic plots of the regression model |
confint() | computes confidence intervals for the regression coefficients |
deviance() | computes the residual sum of squares |
vcov() | computes the estimated variance-covariance matrix of the coefficients |
logLik() | computes the log-likelihood |
AIC(), BIC() | compute information criteria |
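A few of the generic functions in the table can be tried together. The sketch below is only illustrative: the second model, mod_wt, is a hypothetical larger model introduced here so that anova() and AIC() have something to compare.

```r
mod <- lm(mpg ~ hp, data = mtcars)

# deviance() of an "lm" fit is the residual sum of squares
rss <- sum(residuals(mod)^2)
all.equal(deviance(mod), rss)

# Fitted values plus residuals reproduce the observed response
all.equal(unname(fitted(mod) + residuals(mod)), mtcars$mpg)

# Standard errors are the square roots of the diagonal of vcov()
se <- sqrt(diag(vcov(mod)))
se

# anova() compares nested models; mod_wt (hypothetical) adds wt as a regressor
mod_wt <- lm(mpg ~ hp + wt, data = mtcars)
anova(mod, mod_wt)

# Information criteria for both models side by side
AIC(mod, mod_wt)
BIC(mod, mod_wt)
```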
It is better to save the object returned by the summary() function.
The summary() function returns an object of class “summary.lm”, and its components can be queried via
sum_mod <- summary(mod)
names(sum_mod)
names(summary(mod)) # same result without saving the object
The components of the object returned by the summary() function can be obtained as
sum_mod$residuals
sum_mod$r.squared
sum_mod$adj.r.squared
sum_mod$df
sum_mod$sigma
sum_mod$fstatistic
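The saved summary object is also convenient for pulling out individual numbers. The following sketch, which uses the same model as above, extracts the coefficient table, recomputes R-squared from sums of squares, and derives the p-value of the overall F-test from the stored F statistic. The helper names coef_tab, rss, tss, and f are chosen here only for illustration.

```r
mod <- lm(mpg ~ hp, data = mtcars)
sum_mod <- summary(mod)

# Coefficient table: estimate, standard error, t value, and p-value
coef_tab <- coef(sum_mod)   # same as sum_mod$coefficients
colnames(coef_tab)

# p-value for the slope of hp
coef_tab["hp", "Pr(>|t|)"]

# R-squared recomputed from the residual and total sums of squares
rss <- sum(residuals(mod)^2)
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
all.equal(sum_mod$r.squared, 1 - rss / tss)

# p-value of the overall F-test from the F statistic and its degrees of freedom
f <- sum_mod$fstatistic
pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)
```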
The confidence interval for estimated coefficients can be computed as
confint(mod, level = 0.95)
Note that the level argument is optional when the confidence level is 95% (significance level of 5%), because 0.95 is its default value.
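To see where these intervals come from, the sketch below rebuilds them by hand as estimate ± t quantile × standard error, using the residual degrees of freedom of the model. The names est, se, t_crit, and manual_ci are illustrative only.

```r
mod <- lm(mpg ~ hp, data = mtcars)

est    <- coef(mod)                           # point estimates
se     <- sqrt(diag(vcov(mod)))               # standard errors
t_crit <- qt(0.975, df = df.residual(mod))    # two-sided 95% critical value

manual_ci <- cbind(lower = est - t_crit * se,
                   upper = est + t_crit * se)
manual_ci

# Agrees with confint() up to numerical tolerance
ci <- confint(mod)
all.equal(unname(manual_ci), unname(ci))

# A 99% interval for the slope only
confint(mod, "hp", level = 0.99)
```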
The confidence intervals for the mean response and the prediction intervals for an individual response, for hp
(regressor) equal to 200 and 160, can be computed as
predict(mod, newdata = data.frame(hp = c(200, 160)), interval = "confidence")
predict(mod, newdata = data.frame(hp = c(200, 160)), interval = "prediction")
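The two kinds of intervals share the same point predictions but differ in width, because an interval for an individual observation must also account for the error variance. A small sketch, with new_hp, conf_int, and pred_int as illustrative names:

```r
mod <- lm(mpg ~ hp, data = mtcars)
new_hp <- data.frame(hp = c(200, 160))   # column name must match the regressor

conf_int <- predict(mod, newdata = new_hp, interval = "confidence")
pred_int <- predict(mod, newdata = new_hp, interval = "prediction")

# Both matrices have the columns fit, lwr, and upr
colnames(conf_int)

# Same fitted values in both cases
all.equal(conf_int[, "fit"], pred_int[, "fit"])

# Interval widths: the prediction intervals are the wider ones
conf_int[, "upr"] - conf_int[, "lwr"]
pred_int[, "upr"] - pred_int[, "lwr"]
```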
The prediction intervals can be used for computing and visualizing prediction bands around the fitted line. For example,
x <- seq(50, 350, length.out = 32)
# the column in newdata must be named hp, like the regressor in the model
pred <- predict(mod, newdata = data.frame(hp = x), interval = "prediction")
plot(mpg ~ hp, data = mtcars)
lines(pred[, 1] ~ x, col = 1) # fitted values
lines(pred[, 2] ~ x, col = 2) # lower limit
lines(pred[, 3] ~ x, col = 2) # upper limit
For diagnostic plots, the plot() function can be used. By default, it provides four graphs:
- residuals vs fitted values
- QQ plot of standardized residuals
- scale-location plot of the square root of the absolute standardized residuals against fitted values
- standardized residuals vs leverage
To plot, say, only the QQ plot, use
plot(mod, which = 2)
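The which argument selects the graphs to be produced. Note that the residuals vs leverage plot is number 5, so the four default graphs correspond to which = c(1, 2, 3, 5).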
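When all four default graphs are wanted at once, one common approach is to split the plotting region into a 2 x 2 grid first. The sketch below assumes the default graphics device and restores the previous layout afterwards; old_par is an illustrative name.

```r
mod <- lm(mpg ~ hp, data = mtcars)

# Show the four default diagnostic plots on a single page
old_par <- par(mfrow = c(2, 2))   # 2 x 2 grid; the old settings are saved
plot(mod)
par(old_par)                      # restore the previous layout

# Only the residuals vs fitted plot and the QQ plot
plot(mod, which = c(1, 2))
```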