The glm Function in R

Learn about the glm() function in R with this comprehensive Q&A guide. Understand logistic regression, Poisson regression, syntax, families, key components, use cases, model diagnostics, and goodness of fit. Includes a practical example of logistic regression using the glm() function in R.

What is the glm function in the R language?

The glm (Generalized Linear Models) function in R is a powerful tool for fitting linear models to data where the response variable may have a non-normal distribution. It extends the capabilities of traditional linear regression to handle various types of response variables through the use of link functions and exponential family distributions.

Since the distribution of the response depends on the stimulus (predictor) variables only through a single linear function, the same mechanism used for linear models can be used to specify the linear part of a generalized linear model.
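
As a minimal sketch of this shared mechanism, the snippet below fits the same formula with lm() and with glm() using the default gaussian family. The choice of mpg, wt, and hp from the built-in mtcars data is only an illustrative assumption.

```r
# The linear part is specified with the same formula in both calls
fit_lm  <- lm(mpg ~ wt + hp, data = mtcars)
fit_glm <- glm(mpg ~ wt + hp, family = gaussian, data = mtcars)

# With the gaussian family and identity link, the estimates should coincide
all.equal(coef(fit_lm), coef(fit_glm))
```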

What is Logistic Regression?

Logistic regression is used to predict a binary outcome from a given set of predictor variables, which may be continuous or categorical.

What is the Poisson Regression?

Poisson regression is used to predict an outcome variable that represents counts from a given set of predictor variables, which may be continuous or categorical.

What is the general syntax of the glm function in R Language?

The general syntax to fit a Generalized Linear Model with the glm() function in R is:

glm(formula, family = gaussian, data, weights, subset, na.action, start = NULL,
    etastart, mustart, offset, control = list(...), model = TRUE, method = "glm.fit",
    x = FALSE, y = TRUE, contrasts = NULL, ...)

What are families in R?

The class of Generalized Linear Models handled by facilities supplied in R includes Gaussian, Binomial, Poisson, Inverse Gaussian, and Gamma response distributions, and also quasi-likelihood models where the response distribution is not explicitly specified. In the latter case, the variance function must be specified as a function of the mean, but in other cases, this function is implied by the response distribution.
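
The following sketch illustrates the point about variance functions. It inspects the variance component of a few family objects and, for quasi(), sets the variance function explicitly. The numeric arguments are arbitrary illustrative means.

```r
# Family objects carry the variance function implied by the distribution
poisson()$variance(3)      # Poisson: the variance equals the mean
binomial()$variance(0.5)   # binomial: mu * (1 - mu)
Gamma()$variance(2)        # Gamma: mu^2

# For quasi(), the variance function is chosen explicitly by the user
quasi(link = "log", variance = "mu")$variance(3)
```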

Write about the Key components of glm Function in R

Formula

It specifies the relationship between variables, similar to lm(). For example,

y ~ x1 + x2 + x3  # main effects
y ~ x1*x2         # main effects plus interaction
y ~ .             # all other variables in the data as predictors
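
One way to see what a formula expands to is to look at the column names of the design matrix it generates. The sketch below uses model.matrix() on the built-in mtcars data; the variables chosen and the helper data frame small are assumptions for illustration.

```r
# Column names of the design matrix show the terms each formula generates
colnames(model.matrix(mpg ~ wt + hp, data = mtcars))
colnames(model.matrix(mpg ~ wt * hp, data = mtcars))

# With a dot, every other column in the data becomes a predictor
small <- mtcars[, c("mpg", "wt", "hp")]
colnames(model.matrix(mpg ~ ., data = small))
```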

Family

It defines the error distribution and link function. Common families are:

  • gaussian(): Normal distribution (default)
  • binomial(): Logistic regression (binary outcomes)
  • poisson(): Poisson regression (count data)
  • Gamma(): Gamma regression
  • inverse.gaussian(): Inverse Gaussian distribution
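
The families listed above are ordinary R functions that return family objects, so their link can be inspected directly. A small sketch follows; the probability 0.8 is an arbitrary illustrative value.

```r
fam <- binomial()
fam$link                  # name of the default link for this family

eta <- fam$linkfun(0.8)   # mean (a probability) -> linear predictor scale
fam$linkinv(eta)          # linear predictor -> back to the mean scale

poisson()$link
Gamma()$link
```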

Each family has a default link function (e.g., logit for binomial, log for Poisson), and other links can be selected through the link argument.

What are the common use cases of the glm() function?

Logistic Regression (Binary Outcomes)

model <- glm(outcome ~ predictor1 + predictor2, family = binomial(link = "logit"),
             data = mydata)

Poisson Regression (Count Data)

model <- glm(count ~ treatment + offset(log(exposure)), family = poisson(link = "log"),
             data = count_data)
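
The data frames mydata and count_data above are placeholders. As a hedged, self-contained sketch of the Poisson model with an offset, the code below simulates a hypothetical count_data; the sample size, rates, and seed are all assumptions made for illustration.

```r
set.seed(123)  # arbitrary seed, for reproducibility only

# Hypothetical data: counts observed over unequal exposure times
n <- 200
count_data <- data.frame(
  treatment = gl(2, n / 2, labels = c("control", "treated")),
  exposure  = runif(n, min = 1, max = 10)
)
true_rate <- ifelse(count_data$treatment == "treated", 0.5, 1.0)
count_data$count <- rpois(n, lambda = true_rate * count_data$exposure)

# offset(log(exposure)) fixes the coefficient of log(exposure) at 1,
# so the model describes the rate (count per unit of exposure)
model <- glm(count ~ treatment + offset(log(exposure)),
             family = poisson(link = "log"), data = count_data)

exp(coef(model))  # baseline rate and the treated/control rate ratio
```

Because the link is the log, exponentiating the coefficients returns them to the rate scale.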

What statistics can be computed after fitting a glm() model?

After fitting a model, one can use:

summary(model)   # Detailed output including coefficients
coef(model)      # Model coefficients
confint(model)   # Confidence intervals
predict(model)   # Predicted values (link scale by default; see type = "response")
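
A short sketch of how these extractor functions behave for a logistic model is given below. It assumes the mtcars model am ~ hp + wt purely as an illustration, and shows that predict() works on the link (log-odds) scale unless type = "response" is requested.

```r
model <- glm(am ~ hp + wt, family = binomial, data = mtcars)

eta  <- predict(model)                     # default: linear predictor (log-odds)
prob <- predict(model, type = "response")  # fitted probabilities

# Applying the inverse logit to the linear predictor gives the probabilities
all.equal(unname(plogis(eta)), unname(prob))

exp(coef(model))  # coefficients expressed as odds ratios
```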

What are model diagnostics and goodness-of-fit?

The following are built-in glm() model diagnostics and goodness of fit:

anova(model, test = "Chisq")  # Analysis of deviance
residuals(model)              # Various residual types available
plot(model)                   # Diagnostic plots
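
As an illustrative sketch of the residual types and their relation to the deviance, the code below again assumes the mtcars model am ~ hp + wt. The last line is a simple descriptive ratio, not a formal test.

```r
model <- glm(am ~ hp + wt, family = binomial, data = mtcars)

dev_res  <- residuals(model, type = "deviance")  # the default type
pear_res <- residuals(model, type = "pearson")

# The squared deviance residuals add up to the residual deviance
all.equal(sum(dev_res^2), deviance(model))

# Share of the null deviance accounted for by the predictors
1 - deviance(model) / model$null.deviance
```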

Give an example of logistic regression fitting using glm() function.

Consider the mtcars data set, where am (transmission type: 0 = automatic, 1 = manual) is the response variable and hp and wt are the predictors.

# Fit model
data(mtcars)
model <- glm(am ~ hp + wt, family = binomial, data = mtcars)

# View results
summary(model)

# Predict probabilities
predict(model, type = "response")

# Plot
par(mfrow = c(2, 2))
plot(model)
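
The fitted model can also be used for new observations. The sketch below predicts the probability of a manual transmission for a hypothetical car (the hp and wt values are made up) and cross-tabulates observed against predicted classes using an arbitrary 0.5 cut-off.

```r
model <- glm(am ~ hp + wt, family = binomial, data = mtcars)

# Hypothetical car: 120 horsepower, wt = 2.8 (wt is recorded in 1000 lbs)
new_car <- data.frame(hp = 120, wt = 2.8)
p_new <- predict(model, newdata = new_car, type = "response")
p_new

# Classify the fitted probabilities with an (arbitrary) 0.5 cut-off
predicted_am <- as.integer(fitted(model) > 0.5)
conf_tab <- table(observed = mtcars$am, predicted = predicted_am)
conf_tab
```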

Tips for Effective Use of the glm() Function

  1. Always check model assumptions and diagnostics
  2. For binomial models, the response can be:
    • A factor (first level = failure, others = success)
    • A numeric vector of 0/1 values
    • A two-column matrix of successes/failures
  3. Use drop1() or add1() for model selection
  4. Consider glm.nb() from the MASS package for overdispersed count data
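
Two of these tips can be sketched with the mtcars data. The factor coding, the variable name gearbox, and the data frame name mt2 are assumptions introduced here for illustration.

```r
# Numeric 0/1 response
fit_num <- glm(am ~ wt, family = binomial, data = mtcars)

# Same response as a factor: the first level ("automatic") is the failure
mt2 <- transform(mtcars,
                 gearbox = factor(am, levels = c(0, 1),
                                  labels = c("automatic", "manual")))
fit_fac <- glm(gearbox ~ wt, family = binomial, data = mt2)

all.equal(coef(fit_num), coef(fit_fac))  # both codings describe the same model

# drop1() reports the effect of removing each term in turn
drop1(glm(am ~ hp + wt, family = binomial, data = mtcars), test = "Chisq")
```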

The glm() function is fundamental to many statistical analyses in R, providing the flexibility to handle various types of response variables beyond the normal distribution.


The Poisson Regression in R

The Poisson regression model should be used when the dependent (response) variable is a count, that is, when its values can be assumed to follow a Poisson distribution. In R, the glm() function can be used to perform Poisson regression analysis.

Note that the lm() function fits simple and multiple linear regression models, which are appropriate when the dependent variable is continuous.

Poisson Regression Models in R Language

Statistical models such as linear or Poisson regression models can be fitted easily in the R language.

The Poisson regression is used to analyze count data.

For the Poisson model, let us consider another built-in data set warpbreaks. This data set describes the effect of wool type (A or B) and tension (Low, Medium, and High) on the number of warp breaks per loom, where a loom corresponds to a fixed length of yarn.

head(warpbreaks)

The breaks variable is the response variable since it contains the number of breaks (a count). The wool and tension variables are taken as predictor variables.

pois_mod <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)

Printing the pois_mod object shows the model call, the estimated coefficients, the degrees of freedom, the null and residual deviances, and the AIC.

The glm() provides eight choices for a family with the following default link functions:

Family              Default Link Function
binomial            (link = "logit")
gaussian            (link = "identity")
Gamma               (link = "inverse")
inverse.gaussian    (link = "1/mu^2")
poisson             (link = "log")
quasi               (link = "identity", variance = "constant")
quasibinomial       (link = "logit")
quasipoisson        (link = "log")

The detailed output (estimation and testing of parameters) can be obtained as

summary(pois_mod)
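
As a hedged sketch of how the fitted coefficients might be read, the code below exponentiates them and adds a rough overdispersion check. The comparison with the quasipoisson family is an illustration added here, not part of the original analysis.

```r
pois_mod <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)

# With the log link, exp(coefficient) is a multiplicative effect on the mean count
exp(coef(pois_mod))

# Rough overdispersion check: values well above 1 suggest extra-Poisson variation
dispersion <- sum(residuals(pois_mod, type = "pearson")^2) / df.residual(pois_mod)
dispersion

# quasipoisson gives the same coefficients but rescales the standard errors
quasi_mod <- glm(breaks ~ wool + tension, data = warpbreaks, family = quasipoisson)
all.equal(coef(pois_mod), coef(quasi_mod))
```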

Poisson Regression Examples

  • Number of cargo ships damaged by waves (McCullagh & Nelder, 1989).
  • Number of deaths due to AIDS in Australia per quarter (3-month periods) from January 1983 to June 1986.
  • Number of violent incidents exhibited over a 6-month period by patients who had been treated in the ER of a psychiatric hospital (Gardner, Mulvey, & Shaw, 1995).
  • Daily homicide counts in California (Grogger, 1990).
  • Founding of daycare centers in Toronto (Baum & Oliver, 1992).
  • Political party-switching among members of the US House of Representatives (King, 1988).
  • Number of presidential appointments to the Supreme Court (King, 1987).
  • Number of children in a classroom that a child lists as being their friend (unlimited nomination procedure, sociometric data).
  • Number of hard disk failures during a year.
  • Number of deaths due to SARS (Yu, Chan & Fung, 2006).
  • Number of arrests resulting from 911 calls.
  • Number of orders of protection issued.

FAQs about Poisson Regression in R

  1. What function is used in R to perform Poisson Regression?
  2. Write about important arguments of glm() function in R to perform the Poisson Regression Model.
  3. Give real-life examples of data sets, for which Poisson regression may be performed.
  4. List the default link function of each family.
  5. How is the Poisson model different from linear regression models?

Non-Linear Regression Model: A Comprehensive Guide

This article is about using and applying non-linear regression models in the R language. In the least squares method, the regression model is established in such a way that

"The sum of the squares of the vertical distances of different points (residuals) from the regression line is minimized"

When the relationship between the variables is not linear (one has a non-linear regression model), one may

  1. try to transform the data to linearize the relationship,
  2. fit a polynomial or spline model to the data, or
  3. fit a non-linear regression to the data.

Non-Linear Regression Model

In the non-linear regression model, a function specified by a set of parameters is fitted to the data. The non-linear least squares approach is used to estimate these parameters. In R, the nls() function estimates them iteratively: by default it uses the Gauss-Newton algorithm, which repeatedly approximates the non-linear function with a linear one and updates the parameter values until they converge.

Some frequently used non-linear regression models are listed in the table below.

Sr. No.   Name                                     Model
1         Michaelis-Menten                         $y=\frac{ax}{1+bx}$
2         Two-parameter asymptotic exponential     $y=a(1-e^{-bx})$
3         Three-parameter asymptotic exponential   $y=a-be^{-cx}$
4         Two-parameter logistic                   $y=\frac{e^{a+bx}}{1+e^{a+bx}}$
5         Three-parameter logistic                 $y=\frac{a}{1+be^{-cx}}$
6         Weibull                                  $y=a-be^{-cx^d}$
7         Gompertz                                 $y=e^{-be^{-cx}}$
8         Ricker curve                             $y=axe^{-bx}$
9         Bell-shaped                              $y=a\exp(-|bx|^2)$

Let’s fit the Michaelis-Menten non-linear function to the data given below.

x <- seq(1, 10, 1)
y <- c(3.7, 7.1, 11.9, 19, 27, 38.5, 51, 67.7, 85, 102)

nls_model <- nls(y ~ a * x/(1 + b * x), start = list(a = 1, b = 1))

summary(nls_model)

The output of the above code for the Michaelis-Menten non-linear function is

#### Output
Formula: y ~ a * x/(1 + b * x)

Parameters:
   Estimate Std. Error t value Pr(>|t|)    
a  4.107257   0.226711   18.12 8.85e-08 ***
b -0.060900   0.002708  -22.49 1.62e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.805 on 8 degrees of freedom

Number of iterations to convergence: 11 
Achieved convergence tolerance: 6.354e-06

Let us plot the fitted non-linear curve using predictions at 10 newly generated x-values.

new.data <- data.frame(x = seq(min(x), max(x), length.out = 10))

plot(x, y)
lines(new.data$x, predict(nls_model, newdata = new.data) )

The residual sum of squares and the confidence intervals for the estimated parameters can be obtained with the following commands:

sum(resid(nls_model)^2) 
# or 
print(sum(resid(nls_model)^2))
confint(nls_model) 
# or 
print(confint(nls_model))
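
The residual sum of squares is linked to the residual standard error reported by summary(). A minimal sketch follows, refitting the model from the example above so that the block is self-contained; the names rss and rse are introduced here for illustration.

```r
x <- seq(1, 10, 1)
y <- c(3.7, 7.1, 11.9, 19, 27, 38.5, 51, 67.7, 85, 102)
nls_model <- nls(y ~ a * x/(1 + b * x), start = list(a = 1, b = 1))

rss <- sum(resid(nls_model)^2)              # residual sum of squares
rse <- sqrt(rss / df.residual(nls_model))   # divide by residual degrees of freedom

# Should agree with the residual standard error reported by summary()
all.equal(rse, summary(nls_model)$sigma)
```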

Note that the formula passed to nls() does not use the special coding for linear terms, factors, interactions, etc. that lm() and glm() formulas use. The right-hand side of the formula is an ordinary expression that computes the expected value of the left-hand side. The start argument contains a list of starting values for the parameters used in the expression; the algorithm varies these values during fitting.
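
Starting values need not be guessed blindly. One possible approach, sketched here as an assumption rather than as the method used above, is to linearize the Michaelis-Menten form and take rough estimates from lm(). The names lin_fit, a0, b0, and nls_start are introduced only for this illustration.

```r
x <- seq(1, 10, 1)
y <- c(3.7, 7.1, 11.9, 19, 27, 38.5, 51, 67.7, 85, 102)

# y = a*x/(1 + b*x) rearranges to 1/y = (1/a)*(1/x) + b/a,
# which is linear in 1/x and can be fitted with lm()
lin_fit <- lm(I(1/y) ~ I(1/x))
a0 <- unname(1 / coef(lin_fit)[2])     # slope     = 1/a
b0 <- unname(coef(lin_fit)[1] * a0)    # intercept = b/a

# Use the rough estimates as starting values for nls()
nls_start <- nls(y ~ a * x/(1 + b * x), start = list(a = a0, b = b0))
coef(nls_start)
```

The linearized fit weights the observations differently from the non-linear least squares fit, so a0 and b0 serve only as starting points, not as final estimates.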
