Muhammad Imdad Ullah, Author at R Programming FAQs

Non-Linear Regression Model: A Comprehensive Guide

August 6, 2024 / October 16, 2022 by Muhammad Imdad Ullah

The article is about using and applying Non-Linear Regression Models in R Language. In the least squares method, the regression model is fitted in such a way that

"The sum of the squares of the vertical distances of different points (residuals) from the regression line is minimized"

When the relationship between the variables is not linear (that is, when a non-linear regression model is called for), one may

try to transform the data to linearize the relationship,
fit a polynomial or spline model to the data, or
fit a non-linear regression model to the data.
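As a minimal sketch of the first option, the snippet below linearizes an exponential relationship by taking logs and then fits it with lm(). The data are simulated purely for illustration; the variable names and the true values (2 and 0.3) are assumptions chosen for the example, not values from the article.

```r
# Simulated data (illustrative assumption): y = 2 * exp(0.3 * x) with multiplicative noise
set.seed(1)
x_sim <- 1:20
y_sim <- 2 * exp(0.3 * x_sim) * exp(rnorm(20, sd = 0.05))

# log(y) = log(2) + 0.3 * x is linear in x, so ordinary least squares applies
lin_fit <- lm(log(y_sim) ~ x_sim)

# Back-transform the intercept to recover the multiplicative constant
est <- c(a = exp(coef(lin_fit)[[1]]), b = coef(lin_fit)[[2]])
print(est)
```

The estimates should land close to the values used in the simulation. When no such transformation exists, the third option (a direct non-linear fit) is the natural fallback.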

Non-Linear Regression Model

In the non-linear regression model, a function specified by a set of parameters is fitted to the data. The non-linear least squares approach is used to estimate these parameters. In R, the nls() function estimates them iteratively: its default Gauss-Newton algorithm repeatedly approximates the non-linear function by a linear one around the current parameter values and updates the estimates until they converge.
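The idea of repeatedly linearizing the model can be sketched with a hand-written Gauss-Newton loop. This is a simplified illustration, not the internal code of nls() (which also applies step-halving and its own convergence criterion). The data are simulated, and the model, true values, and starting values are assumptions made for the example.

```r
# Simulated data (illustrative): two-parameter asymptotic exponential with noise
set.seed(42)
x_gn <- 1:10
y_gn <- 10 * (1 - exp(-0.5 * x_gn)) + rnorm(10, sd = 0.1)

theta <- c(a = 8, b = 0.3)  # starting values (assumed)
for (i in 1:25) {
  a <- theta[["a"]]
  b <- theta[["b"]]
  r <- y_gn - a * (1 - exp(-b * x_gn))        # current residuals
  J <- cbind(a = 1 - exp(-b * x_gn),           # derivative of the model w.r.t. a
             b = a * x_gn * exp(-b * x_gn))    # derivative of the model w.r.t. b
  step <- qr.solve(J, r)                       # linear least squares on the linearized model
  theta <- theta + step
  if (sqrt(sum(step^2)) < 1e-10) break         # stop once the update is negligible
}

# Compare with the built-in fit from the same starting values
nls_check <- nls(y_gn ~ a * (1 - exp(-b * x_gn)), start = list(a = 8, b = 0.3))
print(rbind(manual = theta, nls = coef(nls_check)))
```

Each pass solves an ordinary linear least squares problem for the parameter update, which is what "approximating the non-linear function by a linear one" amounts to.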

Some frequently used non-linear regression models are listed in the Table below.

Sr. No.	Name	Model
1)	Michaelis-Menten	$y=\frac{ax}{1+bx}$
2)	Two-parameter asymptotic exponential	$y=a(1-e^{-bx})$
3)	Three-parameter asymptotic exponential	$y=a-be^{-cx}$
4)	Two-parameter logistic	$y=\frac{e^{a+bx}}{1+e^{a+bx}}$
5)	Three-parameter logistic	$y=\frac{a}{1+be^{-cx}}$
6)	Weibull	$y=a-be^{-cx^d}$
7)	Gompertz	$y=e^{-be^{-cx}}$
8)	Ricker curve	$y=axe^{-bx}$
9)	Bell-shaped	$y=a \exp(-|bx|^2)$
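To get a feel for the shapes, a few of the tabulated models can be written as plain R functions and drawn with curve(). The function names and all parameter values below are hypothetical, picked only so that the curves fit on one plot.

```r
# A few models from the table written as plain R functions (names are illustrative)
michaelis_menten <- function(x, a, b) a * x / (1 + b * x)
asymptotic_exp2  <- function(x, a, b) a * (1 - exp(-b * x))
logistic3        <- function(x, a, b, c) a / (1 + b * exp(-c * x))
ricker           <- function(x, a, b) a * x * exp(-b * x)

# Arbitrary parameter values, chosen only to show the shapes
curve(michaelis_menten(x, a = 2, b = 0.5), from = 0, to = 10,
      ylim = c(0, 5), ylab = "y", lty = 1)
curve(asymptotic_exp2(x, a = 4, b = 0.4), add = TRUE, lty = 2)
curve(logistic3(x, a = 5, b = 20, c = 1), add = TRUE, lty = 3)
curve(ricker(x, a = 3, b = 0.5), add = TRUE, lty = 4)
legend("bottomright", lty = 1:4,
       legend = c("Michaelis-Menten", "Asymptotic exponential",
                  "Three-parameter logistic", "Ricker"))
```

Plotting a candidate model against the data in this way is also a practical route to reasonable starting values for nls().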

Let’s fit the Michaelis-Menten non-linear function to the data given below.

x <- seq(1, 10, 1)
y <- c(3.7, 7.1, 11.9, 19, 27, 38.5, 51, 67.7, 85, 102)

nls_model <- nls(y ~ a * x/(1 + b * x), start = list(a = 1, b = 1))

summary(nls_model)

The output of the above code for the Michaelis-Menten non-linear function is

#### Output
Formula: y ~ a * x/(1 + b * x)

Parameters:
   Estimate Std. Error t value Pr(>|t|)    
a  4.107257   0.226711   18.12 8.85e-08 ***
b -0.060900   0.002708  -22.49 1.62e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.805 on 8 degrees of freedom

Number of iterations to convergence: 11 
Achieved convergence tolerance: 6.354e-06

Let us plot the fitted curve using predictions at 10 newly generated x-values:

new.data <- data.frame(x = seq(min(x), max(x), length.out = 10))

plot(x, y)
lines(new.data$x, predict(nls_model, newdata = new.data) )

The sum of squared residuals and the confidence intervals of the estimated coefficients can be obtained by issuing the following commands:

sum(resid(nls_model)^2) 
# or 
print(sum(resid(nls_model)^2))
confint(nls_model) 
# or 
print(confint(nls_model))

Non-Linear Regression Models in R Output
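As a small cross-check, the residual sum of squares can also be read off with deviance(), and dividing it by the residual degrees of freedom gives the residual standard error reported by summary(). The sketch below refits the article's model so that it runs on its own; the names rss, rss_dev, and rse are introduced here only for illustration.

```r
# Refit the Michaelis-Menten model from the article so the snippet is self-contained
x <- seq(1, 10, 1)
y <- c(3.7, 7.1, 11.9, 19, 27, 38.5, 51, 67.7, 85, 102)
nls_model <- nls(y ~ a * x / (1 + b * x), start = list(a = 1, b = 1))

rss     <- sum(resid(nls_model)^2)            # residual sum of squares by hand
rss_dev <- deviance(nls_model)                # the same quantity via deviance()
rse     <- sqrt(rss / df.residual(nls_model)) # residual standard error

print(c(rss = rss, deviance = rss_dev, rse = rse, summary_sigma = summary(nls_model)$sigma))
```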

Note that the formula in nls() does not use the special coding for linear terms, factors, interactions, etc., that lm() formulas use; operators on the right-hand side keep their ordinary arithmetic meaning. The right-hand side of the formula computes the expected value of the left-hand side. The start argument contains the list of starting values for the parameters used in the expression, which the algorithm then varies.
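The difference in formula handling can be seen by fitting the same quadratic with lm() and nls(). In lm() the squared term has to be protected with I(), while in nls() the right-hand side is ordinary arithmetic with named parameters. The data are simulated and the parameter names b0, b1, b2 are assumptions made for this sketch.

```r
# Simulated quadratic data (illustrative)
set.seed(123)
x_q <- seq(0, 5, length.out = 30)
y_q <- 1 + 2 * x_q - 0.5 * x_q^2 + rnorm(30, sd = 0.2)

# lm(): formula operators have special meaning, so x^2 needs I()
lm_fit <- lm(y_q ~ x_q + I(x_q^2))

# nls(): the right-hand side is plain arithmetic; parameters must be named and given starts
nls_fit <- nls(y_q ~ b0 + b1 * x_q + b2 * x_q^2,
               start = list(b0 = 0, b1 = 0, b2 = 0))

print(cbind(lm = unname(coef(lm_fit)), nls = unname(coef(nls_fit))))
```

Because this model is linear in its parameters, both fits should agree up to numerical tolerance.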


Logistic Regression Models in R

June 20, 2024 / October 9, 2022 by Muhammad Imdad Ullah

The article is about the use and application of Logistic Regression Models in R Language. In logistic regression models, the response variable ($y$) is categorical, most commonly binary (dichotomous), taking values such as 1 or 0 (TRUE/FALSE). The model estimates the probability of a binary response based on a mathematical equation relating the response to the predictor(s). The built-in glm() function in R can be used to perform logistic regression analysis.

Probability and Odds Ratio

Logistic regression works with odds. If $p$ is the probability of success, the odds in favour of success are $\frac{p}{q}=\frac{p}{1-p}$.

Note that a probability can be converted to odds, and odds can be converted back to a probability. However, unlike a probability, odds can exceed 1. For example, if the probability of an event is 0.25, the odds in favour of that event are $\frac{0.25}{0.75}=0.33$, and the odds against the same event are $\frac{0.75}{0.25}=3$.
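The two conversions can be written as one-line R functions. The function names below are hypothetical helpers introduced for this sketch; the numbers reproduce the example in the text.

```r
# Hypothetical helper functions for converting between probability and odds
prob_to_odds <- function(p) p / (1 - p)
odds_to_prob <- function(odds) odds / (1 + odds)

odds_for     <- prob_to_odds(0.25)     # odds in favour: 0.25 / 0.75
odds_against <- prob_to_odds(0.75)     # odds against:   0.75 / 0.25
back         <- odds_to_prob(odds_for) # converting back recovers 0.25

print(c(odds_for = odds_for, odds_against = odds_against, back = back))
```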

Logistic Regression Models in R (Example)

In the built-in dataset “mtcars”, the column am describes the transmission type (0 = automatic, 1 = manual), which is binary. Let us fit logistic regression models with “am” as the response variable and “vs”, “wt”, “cyl”, and “hp” as regressors, as given:

Logistic Regression with one Dichotomous Predictor

logmodel1 <- glm(am ~ vs, family = "binomial", data = mtcars)
summary(logmodel1)

Logistic Regression with one Continuous Predictor

If the predictor variable is continuous, the logistic regression call in R is as given below:

logmodel2 <- glm(am ~ wt, family = "binomial", data = mtcars)
summary(logmodel2)
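Once such a model is fitted, predicted probabilities for new observations can be obtained with predict() and type = "response". The sketch below is self-contained; the object names and the weights 2, 3, and 4 (in 1000 lbs) are arbitrary values chosen for illustration.

```r
# Logistic regression of transmission type on weight
wt_model <- glm(am ~ wt, family = "binomial", data = mtcars)

# Hypothetical new cars with weights of 2, 3, and 4 (thousand lbs)
new_cars <- data.frame(wt = c(2, 3, 4))

# type = "response" returns probabilities rather than log-odds
pred_prob <- predict(wt_model, newdata = new_cars, type = "response")
print(round(pred_prob, 3))
```

Without type = "response", predict() returns values on the logit scale.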

Multiple Predictors in Logistic Regression

The following is an example of a logistic regression model with more than one predictor. Diagnostic plots are also drawn for this model.

logmodel3 <- glm(am ~ cyl + hp + wt, family = "binomial", data = mtcars)
summary(logmodel3)
plot(logmodel3)

Note: in the logistic regression model, dichotomous and continuous variables can be used as predictors.

Logistic Regression Models in R and Diagnostic Plots

In R, the coefficients returned by logistic regression are logits, that is, logs of the odds. To convert a logit to odds (or, for a slope coefficient, to an odds ratio), exponentiate it; to convert a logit to a probability, use $\frac{e^\beta}{1+e^\beta}$. For example,

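logmodel1 <- glm(am ~ vs, family = "binomial", data = mtcars)
logit_coef <- coef(logmodel1)               # coefficients on the logit (log-odds) scale
exp(logit_coef)                             # odds (intercept) and odds ratio (vs)
exp(logit_coef) / (1 + exp(logit_coef))     # inverse-logit transform of each coefficient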
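The inverse-logit transform is also available as the built-in plogis() function. As a sketch of how to read the result, the snippet below converts the linear predictor for each value of vs to a probability and compares it with the observed share of manual cars in that group; the object names are introduced here for illustration.

```r
vs_model <- glm(am ~ vs, family = "binomial", data = mtcars)
b0 <- coef(vs_model)[["(Intercept)"]]
b1 <- coef(vs_model)[["vs"]]

# plogis(z) computes exp(z) / (1 + exp(z))
p_vs0 <- plogis(b0)        # estimated P(am = 1) when vs = 0
p_vs1 <- plogis(b0 + b1)   # estimated P(am = 1) when vs = 1

# With a single binary predictor, these match the observed group proportions
obs_vs0 <- mean(mtcars$am[mtcars$vs == 0])
obs_vs1 <- mean(mtcars$am[mtcars$vs == 1])
print(rbind(model = c(p_vs0, p_vs1), observed = c(obs_vs0, obs_vs1)))
```

Note that the probability for vs = 1 comes from transforming the sum of the intercept and the slope, not the slope alone.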

Generalized Linear Models (GLM) in R

March 21, 2025 / August 9, 2022 by Muhammad Imdad Ullah

Generalized linear models (GLMs) can be used when the distribution of the response variable is non-normal or when the mean of the response is linear in the predictors only after a transformation (the link function). GLMs are flexible extensions of linear models that are used to fit regression models to non-Gaussian data.

Introduction to Generalized Linear Models

Generalized Linear Models (GLMs) in R are an extension of linear regression that allow for response variables with non-normal distributions. GLMs are used to model relationships between a dependent variable and one or more independent variables. Generalized Linear Models consist of three components:

Random Component: Specifies the probability distribution of the response variable (e.g., Gaussian, Binomial, Poisson).
Systematic Component: The linear predictor, which is a linear combination of the predictors (independent variables).
Link Function: Connects the mean of the response variable to the linear predictor (e.g., identity, logit, log).
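In R, the random component and the link function are bundled together in a family object, whose pieces can be inspected directly. The following is a small sketch using the binomial family; the variable name fam is arbitrary.

```r
# A family object carries the distribution, the link, and related functions
fam <- binomial(link = "logit")

fam$family          # name of the distribution: "binomial"
fam$link            # name of the link: "logit"
fam$linkfun(0.5)    # link applied to a mean of 0.5: log(0.5 / 0.5) = 0
fam$linkinv(0)      # inverse link applied to a linear predictor of 0: 0.5
fam$variance(0.5)   # variance function mu * (1 - mu) at mu = 0.5: 0.25
```

The systematic component is not part of the family object; it comes from the model formula.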

A regression model can be classified as either linear or non-linear.

Basic Form of a Generalized Linear Model

The basic form of a generalized linear model is
\begin{align*}
g(\mu_i) &= X_i' \beta \\
&= \beta_0 + \sum\limits_{j=1}^p x_{ij} \beta_j
\end{align*}
where $\mu_i=E(Y_i)$ is the expected value of the response variable $Y_i$ given the predictors, $g(\cdot)$ is a smooth and monotonic link function that connects $\mu_i$ to the predictors, $X_i'=(x_{i0}, x_{i1}, \cdots, x_{ip})$ is the known vector of predictor values for the $i$th observation with $x_{i0}=1$, and $\beta=(\beta_0, \beta_1, \cdots, \beta_p)'$ is the unknown vector of regression coefficients.
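The relation $g(\mu_i) = X_i' \beta$ can be checked numerically by building the linear predictor from the design matrix and applying the inverse link by hand. The sketch below uses a Poisson model with a log link on the built-in mtcars data; treating carb as a count response is an illustrative choice, not an example from the article.

```r
# Poisson GLM with log link: g(mu) = log(mu)
pois_fit <- glm(carb ~ wt, family = poisson(link = "log"), data = mtcars)

X    <- model.matrix(pois_fit)   # rows are X_i', first column is x_i0 = 1
beta <- coef(pois_fit)           # estimated (beta_0, beta_1)

eta <- drop(X %*% beta)          # linear predictor X_i' beta
mu  <- exp(eta)                  # inverse of the log link gives mu_i

# The hand-computed means agree with the fitted values stored in the model
print(all.equal(unname(mu), unname(fitted(pois_fit))))
```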

Syntax of glm() Function

In R, GLMs are fitted using the glm() function. The basic syntax of glm() function is

glm(formula, family, data)

formula: Specifies the model (e.g., y ~ x1 + x2).
family: Describes the distribution and link function (e.g., gaussian(link = "identity"), binomial(link = "logit"), poisson(link = "log")).
data: The dataset containing the variables.
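The family argument accepts a character string, a family function, or a call to a family function with an explicit link. The sketch below fits the same model in these three ways and then once more with a probit link; the object names f1 to f4 are arbitrary.

```r
# Three equivalent ways to request a binomial family with the default logit link
f1 <- glm(am ~ wt, family = "binomial", data = mtcars)
f2 <- glm(am ~ wt, family = binomial, data = mtcars)
f3 <- glm(am ~ wt, family = binomial(link = "logit"), data = mtcars)

# A non-default link has to be given explicitly
f4 <- glm(am ~ wt, family = binomial(link = "probit"), data = mtcars)

print(rbind(string = coef(f1), func = coef(f2), call = coef(f3)))
print(family(f4)$link)
```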

Fitting Generalized Linear Models

The glm() function fits a generalized linear model using the generic form below. The formula argument is the same as that used in the lm() function for the linear regression model.

mod <- glm(formula, family = gaussian, data = data.frame)

The family argument is a description of the error distribution and link function to be used in the model.

The class of generalized linear models is specified by giving a symbolic description of the linear predictor and a description of the error distribution. The link functions available for different families of the probability distribution of the response variable are given below. The family name can be used as an argument in the glm() function.

Link Functions for Different Families

Family Name	Link Functions
`binomial`	`logit`, `probit`, `cloglog`
`gaussian`	`identity`, `log`, `inverse`
`Gamma`	`identity`, `inverse`, `log`
`inverse.gaussian`	`1/mu^2`, `identity`, `inverse`, `log`
`poisson`	`identity`, `log`, `sqrt`
`quasi`	`logit`, `probit`, `cloglog`, `identity`, `inverse`, `log`, `1/mu^2`, `sqrt`

Generalized Linear Models, GLM Example in R

Consider the “cars” dataset available in R. Let us fit a generalized linear model to the data, taking “dist” as the response variable and “speed” as the predictor. Both the linear and the generalized linear models are fitted in the example below.

data(cars)
head(cars)

scatter.smooth(x = cars$speed, y = cars$dist, main = "Dist ~ Speed")

# Linear Model
lm_fit <- lm(dist ~ speed, data = cars)
summary(lm_fit)

# Generalized Linear Model (Gaussian family, identity link)
glm_fit <- glm(dist ~ speed, data = cars, family = "gaussian")
summary(glm_fit)
plot(glm_fit)   # diagnostic plots

Diagnostic Plots of Generalized Linear Models
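With the Gaussian family and identity link, the generalized linear model reduces to the ordinary linear model, so both fits should give the same coefficients. The following sketch checks this on the cars data; the object names are chosen for illustration.

```r
lm_cars  <- lm(dist ~ speed, data = cars)
glm_cars <- glm(dist ~ speed, family = gaussian(link = "identity"), data = cars)

# Same coefficient estimates from both fitting functions
print(all.equal(coef(lm_cars), coef(glm_cars)))

# For the Gaussian family, the residual deviance is the residual sum of squares
print(all.equal(deviance(glm_cars), sum(resid(lm_cars)^2)))
```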

Generalized Linear Models Types and Applications

GLM Type	Response Variable	Real-Life Example
Logistic Regression	Binary (0/1)	Customer churn, disease diagnosis
Poisson Regression	Count data	Insurance claims, website visits
Gamma Regression	Positive, skewed continuous	Insurance claim amounts, machine failure time
Multinomial Regression	Multi-category	Product choice, species classification
Negative Binomial Regression	Overdispersed count data	Accident counts, sick days
Ordinal Regression	Ordered categories	Customer satisfaction, disease severity
Tweedie Regression	Zero-inflated continuous	Insurance claims with many zeros
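As one illustration of the table, the sketch below fits a Poisson regression to the built-in warpbreaks data (number of warp breaks per loom) and computes a rough dispersion estimate. The choice of dataset and the Pearson-based dispersion check are assumptions made for this example; a value well above 1 suggests overdispersion, the situation in which the table's negative binomial row (for instance via MASS::glm.nb) may be a better fit.

```r
# Poisson regression for count data
pois_wb <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)
summary(pois_wb)

# Rough overdispersion check: Pearson chi-square divided by residual degrees of freedom
dispersion <- sum(residuals(pois_wb, type = "pearson")^2) / df.residual(pois_wb)
print(dispersion)
```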
