The R language provides an interlocking suite of facilities that make fitting statistical models very simple. The default output from a fitted model in R is minimal; further details are obtained by calling extractor functions on the fitted object.
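As a minimal sketch of this workflow, the following fits a simple linear regression to simulated data (the data and the object name `fit` are illustrative assumptions, not part of the original text) and then asks for details through a few standard extractor functions.

```r
# Simulated data, for illustration only
set.seed(1)
x <- 1:20
y <- 2 + 3 * x + rnorm(20)

fit <- lm(y ~ x)        # fit a simple linear regression

print(fit)              # default output: only the call and the coefficients

# Extractor functions return further details on request
b   <- coef(fit)        # estimated coefficients
res <- residuals(fit)   # residuals, one per observation
summary(fit)            # standard errors, t-tests, R-squared, ...
anova(fit)              # analysis of variance table
```

Other extractors such as `fitted()`, `predict()`, and `plot()` work on the same fitted object.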
Defining Statistical Models in R Language
The template for a statistical model is a linear regression model with independent, homoscedastic errors, that is
$$y_i = \sum_{j=0}^p \beta_j x_{ij}+ e_i, \quad e_i \sim NID(0, \sigma^2), \quad i=1,2,\dots, n$$
In matrix form, the statistical model can be written as
$$y=X\beta+e$$
where $y$ is the dependent (response) variable and $X$ is the model matrix or design matrix (matrix of regressors), whose columns $x_0, x_1, \cdots, x_p$ are the determining variables. Usually, $x_0$ is a column of ones defining an intercept term in the statistical model.
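To make the matrix form concrete, here is a small sketch that builds the design matrix with `model.matrix()` and checks that the least-squares estimate from the normal equations agrees with `lm()`. The data values and the names `x1`, `x2`, and `beta_hat` are made up for illustration.

```r
# Small made-up data set, for illustration only
x1 <- c(1.2, 2.5, 3.1, 4.8, 5.0)
x2 <- c(0.5, 1.5, 1.0, 2.5, 2.0)
y  <- c(3.1, 5.9, 6.2, 10.4, 10.1)

# Design matrix X: a column of ones (the intercept) followed by x1 and x2
X <- model.matrix(~ x1 + x2)
print(X)

# Least-squares estimate from the normal equations (X'X) beta = X'y
beta_hat <- solve(crossprod(X), crossprod(X, y))

# lm() builds the same design matrix internally
fit <- lm(y ~ x1 + x2)
same <- all.equal(unname(coef(fit)), as.vector(beta_hat))
print(same)
```

`lm()` itself uses a QR decomposition rather than the normal equations; the explicit solve here is only meant to connect the code to $y = X\beta + e$.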
Statistical Model Examples
Suppose $y, x, x_0, x_1, x_2, \cdots$ are numeric variables and $X$ is a matrix. The following examples show how statistical models are specified in R.
- y ~ x or y ~ 1 + x
Both formulas imply the same simple linear regression model of $y$ on $x$. The first formula has an implicit intercept term and the second has an explicit one.
- y ~ 0 + x or y ~ -1 + x or y ~ x - 1
All of these imply the same simple linear regression model of $y$ on $x$ through the origin, that is, without an intercept term.
- log(y) ~ x1 + x2
Implies a multiple regression of the transformed variable $\log(y)$ on $x_1$ and $x_2$, with an implicit intercept term.
- y ~ poly(x, 2) or y ~ 1 + x + I(x^2)
Both imply a polynomial regression model of $y$ on $x$ of degree 2. The first formula uses orthogonal polynomials and the second uses explicit powers as the basis.
- y ~ X + poly(x, 2)
Implies a multiple regression of $y$ with a model matrix consisting of the matrix $X$ as well as polynomial terms in $x$ to degree 2.
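The example formulas can be tried out with `lm()`. The sketch below uses simulated data (all variable values and object names such as `fit_poly` are assumptions made for illustration). The explicit quadratic term is written as `I(x^2)`, which is the form R accepts for a squared regressor.

```r
# Simulated data, for illustration only
set.seed(42)
n  <- 50
x  <- runif(n, 1, 10)
x1 <- runif(n)
x2 <- runif(n)
y  <- exp(0.5 + 0.2 * x + 0.3 * x1 - 0.1 * x2 + rnorm(n, sd = 0.1))
X  <- cbind(x1, x2)                   # a numeric matrix of regressors

# Implicit and explicit intercept give the same fit
fit_a <- lm(y ~ x)
fit_b <- lm(y ~ 1 + x)
same_intercept <- all.equal(coef(fit_a), coef(fit_b))

# Regression through the origin: no intercept coefficient
fit_0 <- lm(y ~ 0 + x)
names(coef(fit_0))                    # "x"

# Multiple regression on a transformed response
fit_log <- lm(log(y) ~ x1 + x2)

# Degree-2 polynomial: orthogonal basis versus explicit powers
fit_poly <- lm(y ~ poly(x, 2))
fit_raw  <- lm(y ~ 1 + x + I(x^2))
same_fit <- all.equal(fitted(fit_poly), fitted(fit_raw))

# Matrix term combined with polynomial terms
fit_mat <- lm(y ~ X + poly(x, 2))
coef(fit_mat)
```

The two polynomial fits give the same fitted values but different coefficients, because `poly()` uses an orthogonal basis while `I(x^2)` uses the raw square.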
Note that the operator ~ defines a model formula in the R language. The form of an ordinary linear regression model is $response \sim op_1\,\, term_1\,\, op_2\,\, term_2\,\, op_3\,\, term_3\,\, \cdots$,
where
- The response is a vector or matrix defining the response (dependent) variable(s).
- $op_i$ is an operator, either + or -, implying the inclusion or exclusion of a term in the model. The first operator, $op_1$, is optional.
- $term_i$ is either a matrix or vector or 1. It may be a factor or a formula expression consisting of factors, vectors, or matrices connected by formula operators.
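The effect of the + and - operators can be inspected without fitting anything, by expanding a formula with `terms()`. The sketch below is illustrative; the formulas and the factor `g` are made-up examples.

```r
# A term added with + and then removed with - drops out of the model
labels_kept <- attr(terms(y ~ x1 + x2 - x2), "term.labels")
labels_kept                            # "x1"

# The intercept is the term 1; "- 1" excludes it
has_int    <- attr(terms(y ~ x), "intercept")       # 1: intercept present
has_no_int <- attr(terms(y ~ x - 1), "intercept")   # 0: intercept removed

# A factor term expands into indicator (dummy) columns in the model matrix
g <- factor(c("a", "b", "c", "a", "b", "c"))
factor_cols <- colnames(model.matrix(~ g))
factor_cols
```

With R's default treatment contrasts, the first factor level is absorbed into the intercept, so a three-level factor contributes two columns.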
FAQs about Statistical Models in R
- How are statistical models specified in the R language?
- How is linear regression performed in R using a formula?
- How can linear regression be performed without an intercept in R?
- How can polynomial regression be performed in R?
- Write about the ~ operator in R.