In this post, I will discuss the existence and detection of collinearity among regressors using the mctest R Package.
The problem of multicollinearity plagues the numerical stability of regression estimates. It also causes some serious problems in the validation and interpretation of the regression model. Consider the usual multiple linear regression model,
$$y = X \beta+u$$
where $y$ is an $n\times 1$ vector of observations on the dependent variable, $X$ is a known design matrix of order $n\times p$ having full column rank $p$, $\beta$ is a $p \times 1$ vector of unknown parameters, and $u$ is an $n\times 1$ vector of random errors with mean zero and variance $\sigma^2 I_n$, where $I_n$ is the identity matrix of order $n$.
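For reference, under this model the ordinary least squares estimator and its covariance matrix take the standard textbook form below. It is stated here only to show why the behaviour of the $X'X$ matrix matters for the stability of the estimates:
$$\hat{\beta} = (X'X)^{-1}X'y, \qquad \operatorname{Var}(\hat{\beta}) = \sigma^2 (X'X)^{-1}$$
When the columns of $X$ are nearly linearly dependent, $X'X$ is close to singular, so the elements of $(X'X)^{-1}$, and with them the variances of the estimates, become very large.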
Collinearity Among Regressors
A linear dependence (relationship) between regressors can impair the regression model’s ability to estimate its parameters. Multicollinearity is thus a lack of independence, or the presence of interdependence, among a set of regressors (predictors), usually signified by high inter-correlations.
In the case of severe multicollinearity (mathematically, when the $X'X$ matrix is ill-conditioned), $X'X$ cannot be inverted reliably. Possible consequences include implausible signs of coefficients, low t-ratios together with high R-squared values, inflated standard errors, wider confidence intervals, a very large condition number (CN), and regression coefficient estimates that are non-significant and/or of implausible magnitude.
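A minimal base R sketch of these symptoms follows. It uses simulated data rather than the Hald data, and all variable names are hypothetical, chosen only for illustration. It fits the same kind of model twice, once with an unrelated second regressor and once with a near copy of the first, and compares the standard error of one slope and the condition number of the standardized regressors.

```r
# Simulated data (an assumption for illustration; not the Hald data).
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2_indep <- rnorm(n)                    # unrelated to x1
x2_coll  <- x1 + rnorm(n, sd = 0.01)    # almost an exact copy of x1
y_indep <- 1 + 2 * x1 + 3 * x2_indep + rnorm(n)
y_coll  <- 1 + 2 * x1 + 3 * x2_coll  + rnorm(n)

fit_indep <- lm(y_indep ~ x1 + x2_indep)
fit_coll  <- lm(y_coll  ~ x1 + x2_coll)

# Standard error of the slope on x1 in each fit
se_indep <- summary(fit_indep)$coefficients["x1", "Std. Error"]
se_coll  <- summary(fit_coll)$coefficients["x1", "Std. Error"]

# Condition number of the standardized regressors
# (ratio of the largest to the smallest singular value)
cn_indep <- kappa(scale(cbind(x1, x2_indep)), exact = TRUE)
cn_coll  <- kappa(scale(cbind(x1, x2_coll)),  exact = TRUE)

print(c(se_indep = se_indep, se_coll = se_coll))
print(c(cn_indep = cn_indep, cn_coll = cn_coll))
```

In the collinear fit, the standard error and the condition number should both be far larger, even though the error variance is the same in the two simulations.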
Many diagnostic methods are available to check for the existence of collinearity among regressors, such as the variance inflation factor (VIF), values of pair-wise correlation among regressors, eigenvalues, CN, Farrar and Glauber tests, Theil’s measure, and Klein’s rule.
Our recently developed R package mctest computes several collinearity diagnostic measures to test for the existence of collinearity among regressors. We classify these measures as individual collinearity diagnostics and overall collinearity diagnostics. Overall collinearity diagnostics include the determinant of the $X'X$ matrix, the red indicator, the Farrar Chi-Square test, the Theil indicator, CN, and the sum of lambda inverse values. Individual collinearity diagnostics include VIF/TOL, the Farrar and Glauber Wi test, the relationship between $R^2$ and the F-test, corrected VIF (CVIF), and Klein’s rule.
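For orientation, the most commonly quoted of these measures have the following standard textbook definitions. The package’s exact scaling conventions may differ in detail, so treat these as a reference sketch and not as the package’s implementation:
$$\mathrm{VIF}_j = \frac{1}{1 - R_j^2}, \qquad \mathrm{TOL}_j = 1 - R_j^2 = \frac{1}{\mathrm{VIF}_j}, \qquad \mathrm{CN} = \sqrt{\frac{\lambda_{\max}}{\lambda_{\min}}}$$
Here $R_j^2$ is the coefficient of determination from regressing the $j$-th regressor on all the other regressors, and $\lambda_{\max}$ and $\lambda_{\min}$ are the largest and smallest eigenvalues of the (scaled) $X'X$ matrix. For example, $R_j^2 = 0.9$ gives $\mathrm{VIF}_j = 10$ and $\mathrm{TOL}_j = 0.1$.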
How to use the mctest R Package
To start testing for (detecting) collinearity among regressors, you must first install and load the mctest R package. As an example, we use the Hald data, which is bundled with the mctest R package.
The mctest R package has four functions, namely mctest(), omcdiag(), imcdiag(), and mc.plot(). The mctest() function can be used to obtain overall and/or individual collinearity diagnostics. The mc.plot() function draws graphs of the VIF and eigenvalues to help judge collinearity among regressors.
mctest illustrative Example
The syntax of mctest(), with its arguments, is
mctest(x, y, type = c("o", "i", "b"), na.rm = TRUE, Inter = TRUE, method = NULL, corr = FALSE, detr = 0.01, red = 0.5, theil = 0.5, cn = 30, vif = 10, tol = 0.1, conf = 0.95, cvif = 10, leamer = 0.1, all = all)
For the details of each argument, see the mctest package documentation. The following commands can be used to get different collinearity diagnostics.
library(mctest) # load the package, which also bundles the Hald data
x <- Hald[ ,-1] # X variables from Hald data
y <- Hald[ ,1] # y variable from Hald data
mctest(x, y) # default collinearity diagnostics
mctest(x, y, type = "i") # individual collinearity diagnostics
mctest(x, y, type = "o") # overall collinearity diagnostics
Overall Collinearity Diagnostics in R
For overall collinearity diagnostics, eigenvalues and condition numbers are also produced, with or without the intercept term. The syntax of the omcdiag() function is
omcdiag(x, y, na.rm = TRUE, Inter = TRUE, detr = 0.01, red = 0.5, conf = 0.95, theil = 0.5, cn = 30, ...)
It reports the determinant of the correlation matrix, the Farrar Chi-Square test, the red indicator, the sum of lambda inverse values, Theil’s indicator, and CN.
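As a rough illustration of what several of these overall measures compute, the following base R sketch works them out by hand. It assumes simulated regressors (not the Hald data), hypothetical variable names, and the usual textbook formulas applied to the correlation matrix of the regressors. The package may scale the data or treat the intercept differently, so its numbers need not match.

```r
# Simulated regressors (an assumption for illustration; not the Hald data).
set.seed(42)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2 + rnorm(n, sd = 0.05)   # nearly a linear combination of x1 and x2
X  <- cbind(x1, x2, x3)
p  <- ncol(X)

R      <- cor(X)                              # correlation matrix of the regressors
lambda <- eigen(R, symmetric = TRUE)$values   # its eigenvalues

det_R   <- det(R)                             # a value near 0 signals collinearity
cn      <- sqrt(max(lambda) / min(lambda))    # condition number
sum_inv <- sum(1 / lambda)                    # sum of inverse eigenvalues

# Farrar-Glauber chi-square statistic with p(p - 1)/2 degrees of freedom
fg_chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det_R)
fg_df    <- p * (p - 1) / 2
fg_pval  <- pchisq(fg_chisq, df = fg_df, lower.tail = FALSE)

print(c(det = det_R, CN = cn, sum_inv_lambda = sum_inv,
        FG_chisq = fg_chisq, FG_p_value = fg_pval))
```

Because the eigenvalues of a correlation matrix sum to $p$ and multiply to its determinant, a determinant near zero, a large CN, and a large sum of inverse eigenvalues are all driven by the same near-zero eigenvalue.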
omcdiag(x, y, Inter=FALSE)
omcdiag(x, y)[1]
omcdiag(x, y, detr = 0.001, conf = 0.99)
The last command sets a custom threshold for the determinant and a custom confidence level for the Farrar and Glauber test.
Individual Collinearity Diagnostics in R
The general syntax for individual collinearity diagnostics in R through the mctest package is:
imcdiag(x, y, method = NULL, na.rm = TRUE, corr = FALSE, vif = 10, tol = 0.1, conf = 0.95, cvif = 10, leamer = 0.1, all = all)
The imcdiag() function detects multicollinearity that is due to a particular X-variable. Its measures include VIF, TOL, Klein’s rule, CVIF, and the F&G Chi-square test and F-test.
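To make the individual measures concrete, here is a minimal base R sketch that computes VIF, TOL, and Klein’s rule by hand from auxiliary regressions. The data are simulated (not the Hald data), the variable names are hypothetical, and the formulas are the usual textbook ones, so this illustrates the idea and is not the package’s implementation.

```r
# Simulated data (an assumption for illustration; not the Hald data).
set.seed(7)
n  <- 60
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2 + rnorm(n, sd = 0.2)   # nearly a linear combination of x1 and x2
y  <- 1 + x1 - x2 + 0.5 * x3 + rnorm(n)
X  <- data.frame(x1, x2, x3)

# R^2 from regressing each regressor on all the other regressors
r2_aux <- sapply(names(X), function(v) {
  aux <- lm(reformulate(setdiff(names(X), v), response = v), data = X)
  summary(aux)$r.squared
})

vif <- 1 / (1 - r2_aux)   # variance inflation factor
tol <- 1 - r2_aux         # tolerance, the reciprocal of VIF

# Klein's rule: flag a regressor whose auxiliary R^2 exceeds the R^2 of the full model
r2_full <- summary(lm(y ~ ., data = data.frame(y, X)))$r.squared
klein   <- r2_aux > r2_full

print(data.frame(R2 = r2_aux, VIF = vif, TOL = tol,
                 VIF_gt_10 = vif > 10, Klein = klein))
```

The VIF values obtained this way equal the diagonal of the inverse correlation matrix of the regressors, which is a convenient cross-check.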
imcdiag(x = x, y)
imcdiag(x = x, y, corr = TRUE) # correlation matrix
imcdiag(x = x, y, vif = 5, leamer = 0.05) # with thresholds for VIF and Leamer’s method
imcdiag(x = x, y, all = TRUE)
imcdiag(x = x, y, all = TRUE, vif = 5, leamer = 0.2, cvif = 5)
Graphical Representation of VIF and Eigenvalues
mc.plot(x, y, Inter = FALSE, vif = 10, ev = 0.01)
mc.plot(x, y)
mc.plot(x, y, vif = 5, ev = 0.2)
For further details about collinearity diagnostic see