
mctest: An R package for Detection of Collinearity among Regressors

Multicollinearity undermines the numerical stability of regression estimates. It also causes serious problems in the validation and interpretation of the regression model. Consider the usual multiple linear regression model,

y = X \beta+u,

where y is an n\times 1 vector of observations on the dependent variable, X is a known design matrix of order n\times p having full column rank p, \beta is a p \times 1 vector of unknown parameters and u is an n\times 1 vector of random errors with mean zero and variance \sigma^2 I_n, where I_n is the identity matrix of order n.

A linear dependence (relationship) between regressors can impair the model's ability to estimate its parameters (the regression coefficients). Multicollinearity is thus a lack of independence, or the presence of interdependence, among a set of regressors (predictors), usually signalled by high inter-correlations.

In case of severe multicollinearity (ill-conditioning of the X'X matrix), some of the possible issues are implausible signs, low t-ratios despite high R-squared values, inflated standard errors, wider confidence intervals, a very large condition number (CN), and regression coefficient estimates that are non-significant and/or of implausible magnitude.
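A standard textbook result makes the link between these symptoms and X'X explicit. It is included here only as a reminder; the symbols \hat{\beta}, R_j^2, x_{ij} and \bar{x}_j are notation introduced for this note and are not taken from the package documentation. Under the model above, the ordinary least squares estimator and its covariance matrix are

\hat{\beta} = (X'X)^{-1} X'y, \qquad Var(\hat{\beta}) = \sigma^2 (X'X)^{-1},

so a nearly singular X'X inflates the variances. For a model with an intercept, the variance of the j-th slope coefficient can be written as

Var(\hat{\beta}_j) = \frac{\sigma^2}{(1 - R_j^2) \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2},

where R_j^2 is the coefficient of determination obtained by regressing the j-th regressor on the remaining regressors. The factor 1/(1 - R_j^2) is the variance inflation factor: as R_j^2 approaches 1, the standard error of \hat{\beta}_j grows without bound.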

Many diagnostic methods are available to check for collinearity among regressors, such as the variance inflation factor (VIF), pair-wise correlations among regressors, eigenvalues, the CN, the Farrar and Glauber tests, Theil's measure, Klein's rule, etc.

Our recently developed R package mctest computes several diagnostic measures to test for collinearity among regressors. We classify these measures as overall collinearity diagnostics and individual collinearity diagnostics. The overall collinearity diagnostics include the determinant of the X'X matrix, the Red indicator, the Farrar Chi-square test, the Theil indicator, the CN, and the sum of lambda inverse values. The individual collinearity diagnostics include VIF/TOL (tolerance), the Farrar and Glauber Wi test, the relationship between $R^2$ and the F-test, the corrected VIF (CVIF) and Klein's rule.

How to use mctest package

You must install and load the mctest package before testing for collinearity among regressors. As an example, we use the Hald data, which is bundled with the mctest package.

The mctest package has four functions, namely mctest(), omcdiag(), imcdiag() and mc.plot(). The mctest() function can be used to obtain overall and/or individual collinearity diagnostics. The mc.plot() function draws graphs of the VIFs and eigenvalues, giving a graphical judgement of collinearity among regressors.

mctest illustrative Example
The arguments of mctest() are

mctest(x, y, type = c("o", "i", "b"), na.rm = TRUE, Inter = TRUE, method = NULL, corr = FALSE, detr = 0.01, red = 0.5, theil = 0.5, cn = 30, vif = 10, tol = 0.1, conf = 0.95, cvif = 10, leamer = 0.1, all = all)

For details of each argument, see the mctest package documentation. The following commands can be used to get different collinearity diagnostics.

> library(mctest)   # load the package
> x <- Hald[ , -1]  # X variables from Hald data
> y <- Hald[ , 1]   # y variable from Hald data
> mctest(x, y)      # default collinearity diagnostics
> mctest(x, y, type = "i")  # individual collinearity diagnostics
> mctest(x, y, type = "o")  # overall collinearity diagnostics

Overall collinearity diagnostics

For overall collinearity diagnostics, eigenvalues and condition numbers are also produced, whether or not the intercept term is included. The syntax of the omcdiag() function is

omcdiag(x, y, na.rm = TRUE, Inter = TRUE, detr = 0.01, red = 0.5, conf = 0.95, theil = 0.5, cn = 30, ...)

The function reports the determinant of the correlation matrix, the Farrar Chi-square test, the Red indicator, the sum of lambda inverse values, Theil's indicator and the CN.

> omcdiag(x, y, Inter=FALSE)
> omcdiag(x, y)[1]
> omcdiag(x, y, detr = 0.001, conf = 0.99)

The last command sets a threshold for the determinant and a confidence level for the Farrar and Glauber test. Its output is:

mctest: overall collinearity diagnostics
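To make the overall measures more concrete, here is a minimal base-R sketch that computes a few of them by hand using common textbook definitions. It is not the package's implementation, and omcdiag() may scale the data or define the CN differently. The simulated regressors x1, x2 and x3 are hypothetical, with x2 built to be nearly collinear with x1.

```r
# Simulated regressors (hypothetical data, not the Hald data)
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.05)  # nearly a copy of x1
x3 <- rnorm(n)                  # unrelated regressor
X  <- cbind(x1, x2, x3)
p  <- ncol(X)

R     <- cor(X)   # correlation matrix of the regressors
det_R <- det(R)   # close to 0 under strong collinearity

# Farrar and Glauber Chi-square test of H0: regressors are orthogonal
chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det_R)
df    <- p * (p - 1) / 2
pval  <- pchisq(chisq, df, lower.tail = FALSE)

# Eigenvalue-based measures on the correlation matrix
lambda  <- eigen(R, symmetric = TRUE, only.values = TRUE)$values
sum_inv <- sum(1 / lambda)                   # sum of lambda inverse values
cn      <- sqrt(max(lambda) / min(lambda))   # one common definition of the CN

print(c(determinant = det_R, chisq = chisq, p.value = pval,
        sum.inv.lambda = sum_inv, CN = cn))
```

With data like these, the determinant should fall well below the detr = 0.01 default shown in the syntax above, and the Chi-square test should reject orthogonality.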

Individual collinearity diagnostics

imcdiag(x, y, method = NULL, na.rm = TRUE, corr = FALSE, vif = 10, tol = 0.1, conf = 0.95, cvif = 10, leamer = 0.1, all = all)

The imcdiag() function detects multicollinearity attributable to particular X-variables. Its diagnostics include VIF, TOL, Klein's rule, CVIF, and the F&G Chi-square test and F-test.

> imcdiag(x = x, y)
> imcdiag(x = x, y, corr = TRUE) # correlation matrix
> imcdiag(x = x, y, vif = 5, leamer = 0.05)   # with threshold of VIF and leamer method

mctest: individual collinearity diagnostics
> imcdiag(x = x, y, all = TRUE)
> imcdiag(x = x, y, all = TRUE, vif = 5, leamer = 0.2, cvif = 5)

mctest: individual collinearity diagnostics
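The individual diagnostics build on auxiliary regressions of each regressor on all the others. The following base-R sketch shows how VIF, TOL and a Klein's-rule flag can be obtained that way. It uses the usual textbook definitions, not necessarily the exact computations in imcdiag(), and the simulated variables x1, x2, x3 and y are hypothetical.

```r
# Simulated data (hypothetical, not the Hald data)
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.05)  # nearly a copy of x1
x3 <- rnorm(n)                  # unrelated regressor
y  <- 1 + x1 + x3 + rnorm(n)
X  <- data.frame(x1 = x1, x2 = x2, x3 = x3)

# Auxiliary R^2: regress each regressor on all the others
r2 <- sapply(names(X), function(j) {
  fit <- lm(reformulate(setdiff(names(X), j), response = j), data = X)
  summary(fit)$r.squared
})

vif <- 1 / (1 - r2)  # variance inflation factor
tol <- 1 - r2        # tolerance, the reciprocal of the VIF

# Klein's rule (as commonly stated): flag regressor j when its
# auxiliary R^2 exceeds the R^2 of the full regression of y on X
r2_y  <- summary(lm(y ~ ., data = cbind(y = y, X)))$r.squared
klein <- r2 > r2_y

print(data.frame(R2 = round(r2, 3), VIF = round(vif, 2),
                 TOL = round(tol, 4), Klein = klein))
```

The same VIFs appear on the diagonal of the inverse of the regressors' correlation matrix, diag(solve(cor(X))), which gives a quick cross-check.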

Graphical representation of VIF and Eigenvalues

> mc.plot(x, y, Inter = FALSE, vif = 10, ev = 0.01)
> mc.plot(x, y)
> mc.plot(x, y, vif = 5, ev =  0.2)

mctest: collinearity diagnostic measures

For further detail about collinearity diagnostic see
