Using R packages

Reading and Writing JSON files in R

A JSON file stores simple data structures and objects in JavaScript Object Notation (JSON) format. JSON is a standard, lightweight data-interchange format that is primarily used for transmitting data between a web application and a server. A JSON file is a text file that is language independent, self-describing, and easy to understand. Here we discuss reading and writing JSON files in the R language in detail, using the R package "rjson".

Because the JSON format is text only, it can be sent to and from a server and used as a data format by any programming language. The data in a JSON file can be nested and hierarchical. Let us start reading and writing JSON files in R.

Reading JSON files in R

R can read JSON files using the rjson package. First, install the rjson package by issuing the following command in the R console.

install.packages("rjson")

Let us create a JSON file. Copy the following lines into a text editor such as Notepad. Save the file with a .json extension, choosing "All files (*.*)" as the file type. Let the file name be "data.json", stored on the "D:" drive.

{ 
"ID":["1","2","3","4","5","6","7","8" ],
"Name":["Rick","Dan","Michelle","Ryan","Gary","Nina","Simon","Guru" ],
"Salary":["623.3","515.2","611","729","843.25","578","632.8","722.5" ],

"StartDate":[ "1/1/2012","9/23/2013","11/15/2014","5/11/2014","3/27/2015","5/21/2013",
"7/30/2013","6/17/2014"],
"Dept":[ "IT","Operations","IT","HR","Finance","IT","Operations","Finance"]
}
JSON data file

To read a JSON file, the rjson package needs to be loaded. Use the fromJSON() function to read the file.

# Load the rjson package.
library(rjson)
# Give the data file name to the function.
result <- fromJSON(file = "D:\\data.json")
# Print the result.
print(result)

The result is an R list, which can now be converted to a data frame for further analysis using the as.data.frame() function.

# Convert JSON file to a data frame.
json_data_frame <- as.data.frame(result)
print(json_data_frame)
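Because every value in data.json is quoted, all columns of the data frame arrive as text. The following is a minimal sketch of converting them to usable types. It assumes a shortened stand-in for the list returned by fromJSON() (first three records only), so it runs without the file, and it assumes the dates are written as month/day/year.

```r
# Stand-in for the list returned by fromJSON(); every field is a character vector.
result <- list(
  ID        = c("1", "2", "3"),
  Name      = c("Rick", "Dan", "Michelle"),
  Salary    = c("623.3", "515.2", "611"),
  StartDate = c("1/1/2012", "9/23/2013", "11/15/2014"),
  Dept      = c("IT", "Operations", "IT")
)

json_data_frame <- as.data.frame(result, stringsAsFactors = FALSE)

# Convert the text columns to proper types.
json_data_frame$ID        <- as.integer(json_data_frame$ID)
json_data_frame$Salary    <- as.numeric(json_data_frame$Salary)
json_data_frame$StartDate <- as.Date(json_data_frame$StartDate, format = "%m/%d/%Y")

# Inspect the column types and use the numeric column.
str(json_data_frame)
mean(json_data_frame$Salary)
```

After the conversion, numeric summaries and date arithmetic work as expected on the Salary and StartDate columns.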

Writing JSON objects to a .json file

To write a JSON object to a file, use the toJSON() function from the rjson package to prepare the JSON object, and then use the write() function to write it to a local file.

Let us create a list of objects as follows:

list1 <- vector(mode="list", length=2)
list1[[1]] <- c("apple", "banana", "rose")
list1[[2]] <- c("fruit", "fruit", "flower")

Convert the above list to JSON:

jsonData <- toJSON(list1)

Write the JSON object to a file:

write(jsonData, "output.json")
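The list above has no names, so it should be serialized as a JSON array of arrays. As a hedged sketch of a round trip, the example below uses a named list instead (the names `name` and `kind` are illustrative, not from the original), writes it to a temporary file rather than "output.json", and reads it back to check that nothing was lost. It assumes the rjson package is installed.

```r
library(rjson)

# A named list is written as a JSON object; the names become the keys.
items <- list(
  name = c("apple", "banana", "rose"),
  kind = c("fruit", "fruit", "flower")
)

json_text <- toJSON(items)

# Write to a temporary file so the example does not touch the working directory.
out_file <- tempfile(fileext = ".json")
write(json_text, out_file)

# Read the file back and look at one of the fields.
round_trip <- fromJSON(file = out_file)
print(round_trip$name)
```

Reading the file back is a quick way to confirm that the file on disk is valid JSON and that the structure survived the conversion.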

Read more about importing and exporting data in R: see the post

mctest: An R package for Detection of Collinearity among Regressors

In this post, I will discuss the existence and detection of collinearity among regressors.

The problem of multicollinearity plagues the numerical stability of regression estimates. It also causes some serious problems in the validation and interpretation of the regression model. Consider the usual multiple linear regression model,

$y = X \beta+u$,

where $y$ is an $n\times 1$ vector of observations on the dependent variable, $X$ is a known design matrix of order $n \times p$ having full column rank $p$, $\beta$ is a $p \times 1$ vector of unknown parameters, and $u$ is an $n\times 1$ vector of random errors with mean zero and variance $\sigma^2 I_n$, where $I_n$ is an identity matrix of order $n$.

Collinearity Among Regressors

The existence of linear dependence (relationship) between regressors can affect the regression model’s ability to estimate the model’s parameters (regression coefficients). Therefore, multicollinearity is a lack of independence or the presence of interdependence signified by usually high inter-correlations within a set of regressors (predictors).

In case of severe multicollinearity (mathematically, when the $X'X$ matrix is ill-conditioned), some of the possible issues are: the $X'X$ matrix cannot be inverted reliably, implausible signs of coefficients, low t-ratios, high R-squared values, inflated standard errors, wider confidence intervals, a very large condition number (CN), and non-significant regression coefficients and/or coefficient estimates of implausible magnitude.
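These symptoms are easy to reproduce with simulated data. The sketch below is an illustration under assumed settings (sample size, coefficients, and noise levels are arbitrary choices, not from the original): it fits the same model once with an independent second regressor and once with a second regressor that is almost a copy of the first, then compares standard errors and the condition number of $X'X$.

```r
set.seed(1)
n  <- 100
x1 <- rnorm(n)

x2_indep <- rnorm(n)                  # unrelated to x1
x2_coll  <- x1 + rnorm(n, sd = 0.01)  # almost a copy of x1

y_indep <- 1 + 2 * x1 + 3 * x2_indep + rnorm(n)
y_coll  <- 1 + 2 * x1 + 3 * x2_coll  + rnorm(n)

fit_indep <- lm(y_indep ~ x1 + x2_indep)
fit_coll  <- lm(y_coll ~ x1 + x2_coll)

# Standard errors of the coefficient estimates in each fit.
se_indep <- summary(fit_indep)$coefficients[, "Std. Error"]
se_coll  <- summary(fit_coll)$coefficients[, "Std. Error"]

# Condition number of X'X (largest singular value over smallest).
cond_indep <- kappa(crossprod(model.matrix(fit_indep)), exact = TRUE)
cond_coll  <- kappa(crossprod(model.matrix(fit_coll)), exact = TRUE)

print(rbind(independent = se_indep, collinear = se_coll))
print(c(independent = cond_indep, collinear = cond_coll))
```

In the collinear fit the standard errors of the slope coefficients and the condition number are much larger, even though the data-generating coefficients are the same in both cases.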

Many diagnostic methods are available to check the existence of collinearity among regressors, such as the variance inflation factor (VIF), values of pair-wise correlation among regressors, eigenvalues, CN, Farrar and Glauber tests, Theil's measure, and Klein's rule.
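To make the VIF concrete, here is a minimal base R sketch that computes it by hand in two equivalent ways. It uses the longley data set that ships with R as a stand-in (an assumption for illustration; the examples later in this post use the Hald data), and the choice of four regressors is arbitrary.

```r
# Four regressors from the longley data set bundled with base R.
X <- as.matrix(longley[, c("GNP", "Unemployed", "Armed.Forces", "Population")])

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on the others.
vif_by_regression <- sapply(colnames(X), function(j) {
  others <- X[, colnames(X) != j, drop = FALSE]
  r2 <- summary(lm(X[, j] ~ others))$r.squared
  1 / (1 - r2)
})

# Equivalent shortcut: the diagonal of the inverse correlation matrix.
vif_by_inverse <- diag(solve(cor(X)))

# Tolerance (TOL) is the reciprocal of the VIF.
tol_values <- 1 / vif_by_inverse

print(round(cbind(VIF = vif_by_inverse, TOL = tol_values), 4))

# Flag regressors using the common rule-of-thumb threshold of 10.
print(vif_by_inverse > 10)
```

Both routes give the same numbers; the regression form shows what the VIF measures, while the matrix form is the quicker computation.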

Our recently developed R package mctest computes several diagnostic measures to test the existence of collinearity among regressors. We classify these measures as overall collinearity diagnostics and individual collinearity diagnostics. Overall collinearity diagnostics include the determinant of the $X'X$ matrix, the red indicator, the Farrar chi-square test, the Theil indicator, the CN, and the sum of lambda inverse values. Individual collinearity diagnostics include VIF/TOL, the Farrar and Glauber Wi test, the relationship between $R^2$ and the F-test, corrected VIF (CVIF), and Klein's rule.

How to use the mctest Package

You must have installed and loaded the mctest package to start testing for collinearity among regressors. As an example, we use the Hald data, which are already bundled in the mctest package.

The mctest package has four functions, namely mctest(), omcdiag(), imcdiag(), and mc.plot(). The mctest() function can be used to obtain overall and/or individual collinearity diagnostics. The mc.plot() function can be used to draw graphs of VIF and eigenvalues for a graphical judgment of collinearity among regressors.

mctest illustrative Example

The syntax of the mctest() function, with its arguments, is

mctest(x, y, type = c("o", "i", "b"), na.rm = TRUE, Inter = TRUE, method = NULL, corr = FALSE, detr = 0.01, red = 0.5, theil = 0.5, cn = 30, vif = 10, tol = 0.1, conf = 0.95, cvif = 10, leamer = 0.1, all = all)

For the details of each argument, see the mctest package documentation. The following commands can be used to get different collinearity diagnostics.

> library(mctest)     # load the package (also provides the Hald data)
> x <- Hald[ ,-1]     # X variables from Hald data
> y <- Hald[ ,1]      # y variable from Hald data
> mctest(x, y)        # default collinearity diagnostics
> mctest(x, y, type = "i")  # individual collinearity diagnostics
> mctest(x, y, type = "o")  # overall collinearity diagnostics

Overall Collinearity Diagnostics

For overall collinearity diagnostics, eigenvalues and condition numbers are also produced, with or without the intercept term. The syntax of the omcdiag() function is

> omcdiag(x, y, na.rm = TRUE, Inter = TRUE, detr = 0.01, red = 0.5, conf = 0.95, theil = 0.5, cn = 30, ...)

The omcdiag() function reports the determinant of the correlation matrix, the Farrar chi-square test, the red indicator, the sum of lambda inverse values, Theil's indicator, and the CN.

> omcdiag(x, y, Inter=FALSE)
> omcdiag(x, y)[1] 
> omcdiag(x, y, detr = 0.001, conf = 0.99)

For the last command (with a threshold for the determinant and a confidence level for the Farrar and Glauber test), the output is

mctest: overall collinearity diagnostics
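Several of the overall measures can be approximated by hand in base R, which helps when interpreting them. The sketch below is an illustration only: it uses the longley data set from base R as a stand-in for the Hald data, works from the correlation matrix of the regressors, and uses textbook definitions (the red indicator formula in particular is my assumption about the definition). Its numbers may not match the omcdiag() output exactly, which also depends on the Inter argument.

```r
# Four regressors from the longley data set bundled with base R.
X <- as.matrix(longley[, c("GNP", "Unemployed", "Armed.Forces", "Population")])
R <- cor(X)          # correlation matrix of the regressors
p <- ncol(X)

# Eigenvalues of the correlation matrix.
lambda <- eigen(R, symmetric = TRUE)$values

# Determinant: values close to 0 suggest collinearity.
det_R <- det(R)

# Condition number: square root of largest over smallest eigenvalue.
cond_number <- sqrt(max(lambda) / min(lambda))

# Sum of the inverse eigenvalues (equals the sum of the VIFs).
sum_inv_lambda <- sum(1 / lambda)

# Red indicator (assumed definition): root mean square of the off-diagonal correlations.
red <- sqrt((sum(R^2) - p) / (p * (p - 1)))

# Compare against the default thresholds shown in the omcdiag() syntax.
print(c(determinant = det_R, CN = cond_number,
        sum_inv_lambda = sum_inv_lambda, red = red))
print(c(det_flag = det_R < 0.01, cn_flag = cond_number > 30, red_flag = red > 0.5))
```

The determinant equals the product of the eigenvalues, so a single eigenvalue near zero is enough to drive both the determinant down and the condition number up.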

Individual Collinearity Diagnostics

> imcdiag(x, y, method = NULL, na.rm = TRUE, corr = FALSE, vif = 10, tol = 0.1, conf = 0.95, cvif = 10, leamer = 0.1, all = all)

The imcdiag() function detects the existence of multicollinearity attributable to a particular X variable. Its diagnostics include VIF, TOL, Klein's rule, CVIF, and the F&G tests of Chi-square and F-test.

> imcdiag(x = x, y)
> imcdiag(x = x, y, corr = TRUE) # correlation matrix
> imcdiag(x = x, y, vif = 5, leamer = 0.05)  # with threshold of VIF and leamer method
mctest: individual collinearity diagnostics
> imcdiag(x = x, y, all = TRUE)
> imcdiag(x = x, y, all = TRUE, vif = 5, leamer = 0.2, cvif = 5)
mctest: individual collinearity diagnostics

Graphical Representation of VIF and Eigenvalues

> mc.plot(x, y, Inter = FALSE, vif = 10, ev = 0.01)
> mc.plot(x, y)
> mc.plot(x, y, vif = 5, ev = 0.2)
mctest: collinearity diagnostic measures

For further details about collinearity diagnostic see
