mctest R Package for Detection of Collinearity

In this post, I will discuss the existence and detection of collinearity among regressors using the mctest R Package.

The problem of multicollinearity plagues the numerical stability of regression estimates. It also causes some serious problems in the validation and interpretation of the regression model. Consider the usual multiple linear regression model,

$$y = X \beta+u$$

where $y$ is an $n\times 1$ vector of observations on the dependent variable, $X$ is a known design matrix of order $n \times p$ with full column rank $p$, $\beta$ is a $p \times 1$ vector of unknown parameters, and $u$ is an $n\times 1$ vector of random errors with mean zero and variance $\sigma^2 I_n$, where $I_n$ is an identity matrix of order $n$.

Collinearity Among Regressors

Linear dependence (relationship) among the regressors can impair the ability of the regression model to estimate its parameters. Multicollinearity is this lack of independence: interdependence within a set of regressors (predictors), usually signalled by high inter-correlations.

In the case of severe multicollinearity (mathematically, when the $X'X$ matrix is ill-conditioned), $X'X$ cannot be inverted reliably. Some of the possible consequences are implausible signs of coefficients, low t-ratios alongside high R-squared values, inflated standard errors, wider confidence intervals, a very large condition number (CN), and regression coefficient estimates that are non-significant and/or implausible in magnitude.
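The effect on the condition number and on standard errors can be seen in a small simulation. The following is a minimal sketch in base R, assuming simulated data (the variables `x1`, `x2`, `x3` and the noise levels are illustrative choices, not taken from the Hald example), and using one common definition of the condition number.

```r
# Simulated data with two nearly collinear regressors (illustrative only)
set.seed(123)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.01)   # x2 is almost an exact copy of x1
x3 <- rnorm(n)                   # unrelated regressor for comparison
y  <- 1 + 2 * x1 + 3 * x3 + rnorm(n)

# Eigenvalues of the correlation matrix of the regressors
R      <- cor(cbind(x1, x2, x3))
lambda <- eigen(R, symmetric = TRUE)$values

# Condition number: square root of the largest-to-smallest eigenvalue ratio
cn <- sqrt(max(lambda) / min(lambda))
cn

# Standard errors with and without the redundant regressor x2
se_collinear <- coef(summary(lm(y ~ x1 + x2 + x3)))[, "Std. Error"]
se_reduced   <- coef(summary(lm(y ~ x1 + x3)))[, "Std. Error"]

# How much the standard error of x1 is inflated by including x2
se_collinear["x1"] / se_reduced["x1"]
```

The near-zero smallest eigenvalue drives the condition number up, and the standard error of `x1` grows sharply once `x2` enters the model.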

Many diagnostic methods are available to check for collinearity among regressors, such as the variance inflation factor (VIF), pair-wise correlations among regressors, eigenvalues, CN, the Farrar and Glauber tests, Theil's measure, and Klein's rule.
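As an illustration of the most widely used of these measures, the VIF of regressor $x_j$ is $1/(1-R_j^2)$, where $R_j^2$ comes from regressing $x_j$ on the remaining regressors. The sketch below computes it in base R; the choice of the built-in `mtcars` data and of these four columns is an assumption made only for the example.

```r
# Regressors from the built-in mtcars data (an illustrative choice)
X <- mtcars[, c("disp", "hp", "wt", "drat")]

# VIF_j = 1 / (1 - R_j^2), with R_j^2 from the auxiliary regression
# of x_j on all the other regressors
vif_manual <- sapply(names(X), function(v) {
  aux <- lm(reformulate(setdiff(names(X), v), response = v), data = X)
  1 / (1 - summary(aux)$r.squared)
})

# Equivalent shortcut: diagonal of the inverse correlation matrix
vif_inverse <- diag(solve(cor(X)))

# Tolerance (TOL) is the reciprocal of the VIF
tol <- 1 / vif_manual

round(cbind(vif_manual, vif_inverse, tol), 3)
```

Both routes give the same values, so the auxiliary regressions are mainly useful for seeing where the number comes from.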

Our recently developed R package mctest computes several collinearity diagnostic measures to test for collinearity among regressors. We classify these measures as overall collinearity diagnostics and individual collinearity diagnostics. The overall collinearity diagnostics include the determinant of the $X'X$ matrix, the red indicator, the Farrar Chi-Square test, the Theil indicator, CN, and the sum of lambda inverse values. The individual collinearity diagnostics include VIF/TOL, the Farrar and Glauber Wi test, the relationship between $R^2$ and the F-test, corrected VIF (CVIF), and Klein's rule.

How to use the mctest R Package

You must install and load the mctest R Package before testing for collinearity among regressors. As an example, we use the Hald data, which is bundled with the mctest R Package.

The mctest R package has four functions: mctest(), omcdiag(), imcdiag(), and mc.plot(). The mctest() function gives overall and/or individual collinearity diagnostics, while omcdiag() and imcdiag() give the overall and individual diagnostics, respectively. The mc.plot() function draws graphs of the VIF and eigenvalues to judge collinearity among regressors.

mctest illustrative Example

Arguments of mctest with syntax are

mctest(x, y, type = c("o", "i", "b"), na.rm = TRUE, Inter = TRUE, method = NULL, corr = FALSE, detr = 0.01, red = 0.5, theil = 0.5, cn = 30, vif = 10, tol = 0.1, conf = 0.95, cvif = 10, leamer = 0.1, all = FALSE)

For the details of each argument, see the mctest package documentation. The following commands produce different collinearity diagnostics.

x <- Hald[ ,-1]     # X variables from Hald data
y <- Hald[ ,1]      # y variable from Hald data

mctest(x, y)        # default collinearity diagnostics
mctest(x, y, type = "i")  # individual collinearity diagnostics
mctest(x, y, type = "o") # overall collinearity diagnostics

Overall Collinearity Diagnostics in R

For the overall collinearity diagnostics, eigenvalues and condition numbers are also produced, whether or not the intercept term is included. The syntax of the omcdiag() function is

omcdiag(x, y, na.rm = TRUE, Inter = TRUE, detr = 0.01, red = 0.5, conf = 0.95, theil = 0.5, cn = 30, ...)

The function reports the determinant of the correlation matrix, the Farrar test of Chi-square, the red indicator, the sum of lambda inverse values, Theil's indicator, and CN.
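To make a few of these overall measures concrete, here is a rough base R sketch. It assumes the textbook form of the Farrar and Glauber Chi-square statistic, $\chi^2 = -\left[n - 1 - \frac{2p+5}{6}\right]\ln|R|$ with $p(p-1)/2$ degrees of freedom, and uses the built-in `mtcars` data purely for illustration; the exact computations inside omcdiag() may differ in detail.

```r
# Regressors from the built-in mtcars data (an illustrative choice)
X <- mtcars[, c("disp", "hp", "wt", "drat")]
n <- nrow(X)
p <- ncol(X)
R <- cor(X)

# Determinant of the correlation matrix: values near 0 suggest collinearity
det_R <- det(R)

# Farrar and Glauber Chi-square test (textbook form, assumed here)
fg_chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det_R)
fg_df    <- p * (p - 1) / 2
fg_pval  <- pchisq(fg_chisq, df = fg_df, lower.tail = FALSE)

# Sum of the inverse eigenvalues of the correlation matrix
lambda         <- eigen(R, symmetric = TRUE)$values
sum_inv_lambda <- sum(1 / lambda)

c(determinant = det_R, chisq = fg_chisq, df = fg_df,
  p_value = fg_pval, sum_inv_lambda = sum_inv_lambda)
```

The sum of inverse eigenvalues equals the trace of the inverse correlation matrix, so it grows as any single eigenvalue approaches zero.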

omcdiag(x, y, Inter=FALSE)
omcdiag(x, y)[1]

omcdiag(x, y, detr = 0.001, conf = 0.99)

The last command lowers the threshold for the determinant to 0.001 and sets the confidence level for the Farrar and Glauber test to 0.99.

Individual Collinearity Diagnostics in R

The general syntax for individual collinearity diagnostics in R through the mctest package is:

imcdiag(x, y, method = NULL, na.rm = TRUE, corr = FALSE, vif = 10, tol = 0.1, conf = 0.95, cvif = 10, leamer = 0.1, all = FALSE)

The imcdiag() function detects multicollinearity that is due to a particular X-variable. Its diagnostics include VIF, TOL, Klein's rule, CVIF, and the F&G tests of Chi-square and F-test.

imcdiag(x = x, y)
imcdiag(x = x, y, corr = TRUE) # correlation matrix
imcdiag(x = x, y, vif = 5, leamer = 0.05)  # with threshold of VIF and leamer method
imcdiag(x = x, y, all = TRUE)
imcdiag(x = x, y, all = TRUE, vif = 5, leamer = 0.2, cvif = 5)

Graphical Representation of VIF and Eigenvalues

mc.plot(x, y, Inter = FALSE, vif = 10, ev = 0.01)
mc.plot(x, y)
mc.plot(x, y, vif = 5, ev = 0.2)


Statistical Models in R Language: Secrets

R language provides an interlocking suite of facilities that make fitting statistical models very simple. The output from statistical models in R language is minimal and one needs to ask for the details by calling extractor functions.
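A short sketch of what "asking for the details" looks like in practice. The model and the built-in `mtcars` data are illustrative assumptions; the extractor functions themselves (`coef()`, `residuals()`, `summary()`, `anova()`) are part of base R.

```r
# Fit a linear model on the built-in mtcars data (illustrative choice)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Printing the fitted object shows only the call and the coefficients
fit

# Extractor functions return the details on request
coef(fit)                 # estimated coefficients
head(residuals(fit))      # first few residuals
summary(fit)$r.squared    # R-squared from the model summary
anova(fit)                # analysis of variance table
```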

Defining Statistical Models in R Language

The template for a statistical model is a linear regression model with independent, homoscedastic errors, that is
$$y_i = \sum_{j=0}^p \beta_j x_{ij}+ e_i, \quad e_i \sim NID(0, \sigma^2), \quad i=1,2,\dots, n$$

In matrix form, the statistical model can be written as

$$y=X\beta+e$$

where $y$ is the dependent (response) variable and $X$ is the model matrix or design matrix (matrix of regressors) with columns $x_0, x_1, \cdots, x_p$, the determining variables. Usually, $x_0$ is a column of ones defining an intercept term in the statistical model.
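The design matrix, including its column of ones, can be inspected directly with `model.matrix()`. The sketch below uses a small simulated data frame (the names `d`, `x1`, `x2` and the coefficients are assumptions for illustration) and checks that solving the normal equations by hand reproduces the `lm()` estimates.

```r
# Small simulated data set (illustrative only)
set.seed(1)
d   <- data.frame(x1 = rnorm(5), x2 = rnorm(5))
d$y <- 1 + 2 * d$x1 - d$x2 + rnorm(5)

# Design matrix implied by the formula: the first column is x_0, a column of ones
X <- model.matrix(y ~ x1 + x2, data = d)
X

# Least squares estimates from the normal equations (X'X) beta = X'y
beta_hat <- solve(crossprod(X), crossprod(X, d$y))

# Compare with the estimates from lm()
cbind(normal_equations = as.vector(beta_hat),
      lm = coef(lm(y ~ x1 + x2, data = d)))
```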

Statistical Model Examples

Suppose $y, x, x_0, x_1, x_2, \cdots$ are numeric variables and $X$ is a matrix. The following examples specify statistical models in R.

  • y ~ x    or   y ~ 1 + x
    Both formulas specify the same simple linear regression model of $y$ on $x$. The first formula has an implicit intercept term and the second has an explicit intercept term.
  • y ~ 0 + x  or  y ~ -1 + x  or  y ~ x - 1
    All of these specify the same simple linear regression model of $y$ on $x$ through the origin, that is, without an intercept term.
  • log(y) ~ x1 + x2
    Specifies a multiple regression of the transformed variable $\log(y)$ on $x_1$ and $x_2$, with an implicit intercept term.
  • y ~ poly(x, 2)  or  y ~ 1 + x + I(x^2)
    Both specify a polynomial regression model of $y$ on $x$ of degree 2. The first formula uses orthogonal polynomials as the basis and the second uses explicit powers.
  • y ~ X + poly(x, 2)
    Specifies a multiple regression of $y$ with a model matrix consisting of the design matrix $X$ as well as polynomial terms in $x$ to degree 2.
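The equivalences in the list above can be checked by fitting the formulas side by side. This is a minimal sketch on simulated data (the data-generating values are arbitrary assumptions); the explicit-power form is written as `I(x^2)`.

```r
# Simulated data with a quadratic trend (illustrative only)
set.seed(42)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x - 0.1 * x^2 + rnorm(50, sd = 0.5)

# Implicit and explicit intercept give the same fit
fit_implicit <- lm(y ~ x)
fit_explicit <- lm(y ~ 1 + x)
all.equal(coef(fit_implicit), coef(fit_explicit))

# Regression through the origin: no intercept coefficient is estimated
fit_origin <- lm(y ~ 0 + x)
names(coef(fit_origin))

# Orthogonal polynomials and explicit powers differ in their coefficients
# but describe the same quadratic, so the fitted values agree
fit_poly <- lm(y ~ poly(x, 2))
fit_raw  <- lm(y ~ 1 + x + I(x^2))
all.equal(fitted(fit_poly), fitted(fit_raw))
```

The coefficients of `fit_poly` and `fit_raw` are not directly comparable because the two bases differ, which is why the comparison is made on fitted values.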

Note that the operator ~ defines a model formula in R language. The form of an ordinary linear regression model is, $response\,\, \sim \,\, op_1\,\, term_1\,\, op_2\,\, term_2\,\, op_3\,\, term_3\,\, \cdots $,

where

  • The response is a vector or matrix defining the response (dependent) variable(s).
  • $op_i$ is an operator, either + or -, implying the inclusion or exclusion of a term in the model. The first operator, $op_1$, is optional.
  • $term_i$ is either a matrix or vector or 1. It may be a factor or a formula expression consisting of factors, vectors, or matrices connected by formula operators.

FAQS about Statistical Models in R

  1. How are statistical models specified in R Language?
  2. How is linear regression performed in R language using a formula?
  3. How can linear regression be performed without an intercept in R?
  4. How can polynomial regression be performed in R?
  5. Write about the ~ operator in R.

https://gmstat.com
https://itfeature.com

How to View Source Code of R Method/ Function?

This article is about viewing the source code of an R method or function. There are several ways to do this, and reading the source helps in understanding how a function works.

Source Code of R Method (Internal Functions)

To see the source code of an R method or an internal function (a function from the base packages), type the name of the function at the R prompt without parentheses, for example:

rowMeans
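Beyond printing the whole function, its parts can be pulled out separately. The following is a small sketch using base R helpers (`formals()`, `body()`, `deparse()`, `is.primitive()`); the choice of `rowMeans` and `sum` as examples is only for illustration.

```r
# Typing the name without parentheses prints the function's source
rowMeans

# Argument names of the function
names(formals(rowMeans))

# The body of the function as an unevaluated expression
body(rowMeans)

# The source as a character vector, one element per line
src <- deparse(rowMeans)
head(src, 3)

# Primitive functions are implemented in C, so no R source is shown
sum
is.primitive(sum)
```

For primitives such as `sum`, the R prompt shows only a reference to the internal implementation rather than R code.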

Functions or Methods from the S3 Class System

For S3 classes, the methods function can be used to list the methods for a particular generic function or class.

methods(predict)

Note the message "Non-visible functions are asterisked" in the output: an asterisk means that the function is not exported from its package's namespace.

One can still view its source code via the ::: operator, such as

stats:::predict.lm

or by using getAnywhere() function, such as

getAnywhere(predict.lm)

Note that the getAnywhere() function is useful because you don't need to know which package the function or method comes from.
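The same lookup can be tried on a tiny S3 system of your own. In the sketch below, the generic `area` and its method `area.circle` are hypothetical names defined only for this example; `methods()` and `getS3method()` are the standard tools from base R.

```r
# A hypothetical S3 generic and one method, defined only for illustration
area <- function(shape, ...) UseMethod("area")
area.circle <- function(shape, ...) pi * shape$r^2

# Dispatch picks area.circle for objects of class "circle"
unit_circle <- structure(list(r = 1), class = "circle")
area(unit_circle)

# List the methods available for the generic
methods(area)

# Retrieve the source of one specific method by generic and class
getS3method("area", "circle")

# The same call works for methods from installed packages
f <- getS3method("predict", "lm")
head(deparse(f), 3)
```

`getS3method()` takes the generic and the class as separate arguments, which avoids ambiguity when either name contains a dot.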

Functions or Methods from the S4 Class System

The S4 system is a newer method dispatch system and is an alternative to the S3 system. The 'Matrix' package provides examples of S4 functions, such as chol2inv.

library(Matrix)
chol2inv

The output already offers a lot of information. The call to standardGeneric indicates an S4 function. To list the S4 methods that are defined for it, use showMethods(), that is,

showMethods(chol2inv)

The getMethod() function can be used to see the source code of one of the methods, such as,

getMethod("chol2inv", "diagonalMatrix")
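The S4 workflow can also be tried without the Matrix package. In this sketch the generic `describe` and the class `Point` are hypothetical names created only for the example; `setGeneric()`, `setClass()`, `setMethod()`, `showMethods()`, `existsMethod()`, and `getMethod()` come from the methods package that ships with R.

```r
library(methods)

# A hypothetical S4 generic, class, and method, defined only for illustration
setGeneric("describe", function(object, ...) standardGeneric("describe"))
setClass("Point", representation(x = "numeric", y = "numeric"))
setMethod("describe", "Point", function(object, ...) {
  sprintf("Point at (%g, %g)", object@x, object@y)
})

# Printing the generic shows the standardGeneric() call, not the method code
describe
isGeneric("describe")

# List the defined methods, then retrieve the source of one of them
showMethods("describe")
existsMethod("describe", "Point")
getMethod("describe", "Point")

# The method in action
describe(new("Point", x = 1, y = 2))
```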

View Source Code of Unexported Functions

In the case of ts.union, the helper functions .cbindts and .makeNamesTs are unexported functions from the stats namespace. One can view the source code of such unexported functions using the ::: operator or the getAnywhere() function, for example:

stats:::.makeNamesTs
getAnywhere(.makeNamesTs)
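Whether a function is exported can also be checked programmatically. The sketch below uses `asNamespace()`, `getNamespaceExports()`, and `utils::getFromNamespace()` from base R; it assumes that `.makeNamesTs` is present in the installed version of stats, as in the example above.

```r
# The namespace environment holds exported and unexported objects alike
ns <- asNamespace("stats")

# The function exists inside the namespace ...
exists(".makeNamesTs", envir = ns, inherits = FALSE)

# ... but is not in the list of exported names
".makeNamesTs" %in% getNamespaceExports("stats")

# Two equivalent ways to retrieve the unexported function
f  <- get(".makeNamesTs", envir = ns)
f2 <- utils::getFromNamespace(".makeNamesTs", "stats")
identical(f, f2)

# Show its source and confirm which namespace it belongs to
f
environmentName(environment(f))
```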
