Generalized Linear Models (GLM) in R

The generalized linear models (GLM) can be used when the distribution of the response variable is non-normal or when the response variable is transformed into linearity. The GLMs are flexible extensions of linear models that are used to fit the regression models to non-Gaussian data.

Generalized Linear Models

The basic form of a Generalized linear model is
g(\mu_i) &= X_i’ \beta \\
&= \beta_0 + \sum\limits_{j=1}^p x_{ij} \beta_j
where $\mu_i=E(U_i)$ is the expected value of the response variable $Y_i$ given the predictors, $g(\cdot)$ is a smooth and monotonic link function that connects $\mu_i$ to the predictors, $X_i’=(x_{i0}, x_{i1}, \cdots, x_{ip})$ is the known vector having $i$th observations with $x_{i0}=1$, and $\beta=(\beta_0, \beta_1, \cdots, \beta_p)’$ is the unknown vector of regression coefficients.

The glm() is a function that can be used to fit a generalized linear model, using the form

mod <- glm(formula, family = gaussian, data = data.frame)

The family argument is a description of the error distribution and link function to be used in the model.

The class of generalized linear models is specified by giving a symbolic description of the linear predictor and a description of the error distribution.

Family NameLink Functions
binomiallogit , probit, cloglog
gaussianidentity, log, inverse
Gammaidentity, inverse, log
inverse gaussian$1/ \mu^2$, identity, inverse,log
poissonlogit, probit, cloglog, identity, inverse
quasilog, $1/ \mu^2$, sqrt

Generalized Linear Model Example in R

Consider the “cars” dataset available in R.

scatter.smooth(x=speed, y=dist, main = "Dist ~ Speed")

# Linear Model
lm(dist ~ speed, data = cars)
summary(lm(dist ~ speed, data = cars)

# Generalized Linear Model
glm(dist ~ speed, data=cars, family = "binomial")
plot(glm(dist ~ speed, data = cars))
summary(glm(dist ~ speed, data = cars))