This article is about the use and application of logistic regression models in the R language. In logistic regression models, the response variable ($y$) takes categorical (binary, dichotomous) values such as 1 or 0 (TRUE/FALSE). The model estimates the probability of a binary response based on a mathematical equation relating the response variable to the predictor(s). The built-in glm() function in R can be used to perform logistic regression analysis.
Probability and Odds Ratio
Logistic regression works with odds. If $p$ is the probability of success, the odds in favour of success are $\frac{p}{q}=\frac{p}{1-p}$, where $q = 1-p$ is the probability of failure.
Note that a probability can be converted to odds, and odds can be converted back to a probability. However, unlike probability, odds can exceed 1. For example, if the probability of an event is 0.25, the odds in favour of that event are $\frac{0.25}{0.75}=0.33$, and the odds against the same event are $\frac{0.75}{0.25}=3$.
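These conversions can be checked directly in R; the values below are the ones from the example above:

```r
p <- 0.25                            # probability of the event
odds_for <- p / (1 - p)              # odds in favour: 0.25/0.75 = 0.33
odds_against <- (1 - p) / p          # odds against: 0.75/0.25 = 3
p_back <- odds_for / (1 + odds_for)  # odds converted back to probability: 0.25
```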
Logistic Regression Models in R (Example)
In the built-in dataset "mtcars", the column "am" describes the transmission mode (automatic or manual), which is a binary value (0 or 1). Let us fit logistic regression models between the response variable "am" and the regressors "vs", "hp", "wt", and "cyl", as given below:
Logistic Regression with one Dichotomous Predictor
logmodel1 <- glm(am ~ vs, family = "binomial", data = mtcars)
summary(logmodel1)
Logistic Regression with one Continuous Predictor
If the predictor variable is continuous, the logistic regression formula in R would be as given below:
logmodel2 <- glm(am ~ wt, family = "binomial", data = mtcars)
summary(logmodel2)
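Once a model such as logmodel2 is fitted, predict() with type = "response" returns fitted probabilities rather than logits. A quick sketch (the 3000-lb weight is an arbitrary illustrative value; wt in mtcars is measured in units of 1000 lb):

```r
logmodel2 <- glm(am ~ wt, family = "binomial", data = mtcars)

# Predicted probability of a manual transmission (am = 1)
# for a hypothetical car weighing 3000 lb (wt = 3)
predict(logmodel2, newdata = data.frame(wt = 3), type = "response")
```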
Multiple Predictors in Logistic Regression
The following is an example of a logistic regression model with more than one predictor. Diagnostic plots are also drawn for the model.
logmodel3 <- glm(am ~ cyl + hp + wt, family = "binomial", data = mtcars)
summary(logmodel3)
plot(logmodel3)
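To judge whether the extra predictors in logmodel3 are worthwhile, one option (not covered in the text above) is a likelihood-ratio test between nested models using anova():

```r
logmodel2 <- glm(am ~ wt, family = "binomial", data = mtcars)
logmodel3 <- glm(am ~ cyl + hp + wt, family = "binomial", data = mtcars)

# Likelihood-ratio (chi-squared) test:
# do cyl and hp improve the fit over wt alone?
anova(logmodel2, logmodel3, test = "Chisq")
```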
Note: in a logistic regression model, both dichotomous and continuous variables can be used as predictors.
In the R language, the coefficients returned by logistic regression are logits, that is, logs of the odds. To convert a logit to an odds ratio, exponentiate it; to convert a logit to a probability, use $\frac{e^\beta}{1+e^\beta}$. For example,
logmodel1 <- glm(am ~ vs, family = "binomial", data = mtcars)
logit_coef <- coef(logmodel1)
exp(logit_coef)                          # odds ratios
exp(logit_coef) / (1 + exp(logit_coef))  # probabilities
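As an aside, base R's plogis() is the inverse-logit function, so it performs the same logit-to-probability conversion in one step:

```r
logmodel1 <- glm(am ~ vs, family = "binomial", data = mtcars)
plogis(coef(logmodel1))  # same result as exp(b) / (1 + exp(b))
```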