## Statistical Models in R Language

R language provides an interlocking suite of facilities that make fitting statistical models very simple. The output printed for a fitted model is minimal; further details are obtained by calling extractor functions.
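
As a minimal sketch of the idea of extractor functions, the following fits a simple regression and then asks for details. The built-in `cars` data set is an assumption chosen only for illustration.

```r
# Fit a simple linear regression on the built-in 'cars' data set
# (the data set is an assumption chosen for illustration).
fit <- lm(dist ~ speed, data = cars)

print(fit)            # minimal output: the call and the coefficients only

# Extractor functions return the details on request
coef(fit)             # estimated regression coefficients
head(residuals(fit))  # first few residuals
head(fitted(fit))     # first few fitted values
summary(fit)          # standard errors, t-tests, R-squared, ...
anova(fit)            # analysis of variance table
```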

Defining Statistical Models; Formulae in R Language

The template for a statistical model is a linear regression model with independent, homoscedastic errors, that is
$y_i = \sum_{j=0}^p \beta_j x_{ij}+ e_i, \quad e_i \sim NID(0, \sigma^2), \quad i=1,2,\dots, n$

In matrix form, the statistical model can be written as
$y=X\beta+e$,
where $y$ is the dependent (response) variable and $X$ is the model matrix or design matrix (matrix of regressors) with columns $x_0, x_1, \cdots, x_p$, the determining variables. Usually $x_0$ is a column of ones defining an intercept term in the statistical model.
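
The design matrix can be inspected directly with `model.matrix()`. The sketch below uses a small made-up data frame (the values and the names `d`, `x1`, `x2` are assumptions for illustration) and checks that solving the normal equations gives the same estimates as `lm()`.

```r
# Small made-up data set (values are illustrative only)
d <- data.frame(y  = c(2.1, 3.9, 6.2, 7.8, 10.1),
                x1 = c(1, 2, 3, 4, 5),
                x2 = c(0.5, 0.1, 0.9, 0.3, 0.7))

# Design matrix X for the model y ~ x1 + x2
X <- model.matrix(y ~ x1 + x2, data = d)
X                 # first column "(Intercept)" is the column of ones, x_0

# Least-squares estimate from the normal equations (X'X) b = X'y
beta_hat <- solve(crossprod(X), crossprod(X, d$y))

# The same estimates from lm()
coef(lm(y ~ x1 + x2, data = d))
```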

Statistical Model Examples
Suppose $y, x, x_0, x_1, x_2, \cdots$ are numeric variables and $X$ is a matrix. The following examples specify statistical models in R.

• y ~ x    or   y ~ 1 + x
Both formulae imply the same simple linear regression model of $y$ on $x$. The first formula has an implicit intercept term and the second has an explicit intercept term.
• y ~ 0 + x  or  y ~ -1 + x  or  y ~ x - 1
All these imply the same simple linear regression model of $y$ on $x$ through the origin, that is, without an intercept term.
• log(y) ~ x1 + x2
Implies multiple regression of the transformed variable, $\log(y)$, on $x_1$ and $x_2$ with an implicit intercept term.
• y ~ poly(x, 2)  or  y ~ 1 + x + I(x^2)
Both imply a polynomial regression model of $y$ on $x$ of degree 2. The first formula uses orthogonal polynomials and the second uses explicit powers as basis.
• y ~ X + poly(x, 2)
Multiple regression of $y$ with a model matrix consisting of the matrix $X$ as well as polynomial terms in $x$ to degree 2.
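
The formulae above can be tried on simulated data. This is only a sketch: the data-generating values and the object names (`fit_implicit`, `fit_origin`, and so on) are assumptions made for illustration.

```r
set.seed(1)                       # reproducible simulated data (illustrative)
x <- runif(50, 1, 10)
y <- exp(0.5 + 0.2 * x + rnorm(50, sd = 0.1))   # positive, so log(y) is defined

fit_implicit <- lm(y ~ x)         # implicit intercept
fit_explicit <- lm(y ~ 1 + x)     # explicit intercept, same model
fit_origin   <- lm(y ~ 0 + x)     # regression through the origin
fit_log      <- lm(log(y) ~ x)    # transformed response

# Degree-2 polynomial: orthogonal basis versus explicit powers
fit_poly <- lm(y ~ poly(x, 2))
fit_pow  <- lm(y ~ 1 + x + I(x^2))

all.equal(coef(fit_implicit), coef(fit_explicit))   # same coefficients
names(coef(fit_origin))                             # no "(Intercept)" entry
all.equal(fitted(fit_poly), fitted(fit_pow))        # same fitted values
```

The two polynomial fits have different coefficients because the bases differ, but they describe the same fitted curve.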

Note that the operator ~ is used to define a model formula in R language. The form of an ordinary linear regression model is $response\,\, \sim \,\, op_1\,\, term_1\,\, op_2\,\, term_2\,\, op_3\,\, term_3\,\, \cdots$,

where

response is a vector or matrix defining the response (dependent) variable(s).
$op_i$ is an operator, either + or -, implying the inclusion or exclusion of a term in the model. The first operator, $op_1$, is optional.
$term_i$ is either a vector or matrix expression, or 1, or a factor, or a formula expression consisting of factors, vectors or matrices connected by formula operators.
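
The effect of the + and - operators can be checked without fitting anything, by asking `terms()` which terms survive. The variable names below are placeholders and need not exist.

```r
# Terms kept after inclusion (+) and exclusion (-)
f <- y ~ x1 + x2 + x3 - x2
attr(terms(f), "term.labels")          # x2 has been removed

# Removing the intercept with "- 1"
attr(terms(y ~ x1 - 1), "intercept")   # 0 means no intercept, 1 means intercept

# update() edits an existing formula with the same operators
f2 <- update(f, . ~ . - x3 + x4)
f2
```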

## Source Code of R Method

There are different ways to view the source code of an R method or function. Reading the source helps in understanding how a function works.

Internal Functions
If you want to see the source code of an internal function (a function from the base packages), just type the name of the function at the R prompt, such as

> rowMeans

Functions or Methods from S3 Class System
For S3 classes, the methods() function can be used to list the methods for a particular generic function or class.

> methods(predict)

Note that the message “Non-visible functions are asterisked” means that an asterisked function is not exported from its package’s namespace.

One can still view the source code of such a function via the ::: operator, such as

> stats:::predict.lm

or by using the getAnywhere() function, such as

> getAnywhere(predict.lm)

Note that the getAnywhere() function is useful because you don’t need to know which package the function or method comes from.
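
The lookups above can also be done programmatically. The sketch below adds `getS3method()`, which is not mentioned in the text but is a standard utils function, as a third route to the same method.

```r
# All S3 methods registered for the generic predict()
methods(predict)

# Three ways to reach the same method
f1 <- stats:::predict.lm                 # namespace operator
f2 <- getS3method("predict", "lm")       # look up by generic and class
f3 <- getAnywhere("predict.lm")          # search every loaded namespace

f3$where                                 # where the object was found
head(deparse(f2), 5)                     # first lines of the source as text
```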

Functions or Methods from S4 Class System
The S4 system is a newer method dispatch system and is an alternative to the S3 system. The ‘Matrix’ package is an example of a package that defines S4 generics and methods.

> library(Matrix)
> chol2inv

The output already offers a lot of information. The call to standardGeneric is an indicator of an S4 function. To see the S4 methods defined for it, use showMethods(), that is

> showMethods(chol2inv)

The getMethod() function can be used to see the source code of one of the methods, such as
> getMethod("chol2inv", "diagonalMatrix")
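
The same inspection steps can be tried without installing any package. In the sketch below the class `Circle` and the generic `area` are hypothetical names defined only for this illustration; they are not part of Matrix or of base R.

```r
library(methods)   # usually attached already; loaded explicitly to be safe

# Hypothetical class and generic, defined only for this illustration
setClass("Circle", representation(r = "numeric"))
setGeneric("area", function(shape) standardGeneric("area"))
setMethod("area", "Circle", function(shape) pi * shape@r^2)

area                          # printing the generic shows standardGeneric("area")
showMethods("area")           # lists the signatures that have methods
getMethod("area", "Circle")   # shows the source of one method

area(new("Circle", r = 2))    # dispatches to the Circle method
```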

Functions that Call Unexported Functions
In the case of ts.union, the helpers .cbindts and .makeNamesTs are unexported functions from the stats namespace. One can view the source code of such unexported functions using the ::: operator or the getAnywhere() function, for example
> stats:::.makeNamesTs
> getAnywhere(.makeNamesTs)
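
Whether a name is exported can be checked against the namespace's export list. This is a sketch under the assumption that `.makeNamesTs` is present in the installed version of stats; internal helper names can change between R versions.

```r
exports <- getNamespaceExports("stats")

"ts.union" %in% exports          # is ts.union exported?
".makeNamesTs" %in% exports      # is the helper exported?

# Names that ts.union refers to in its body; internal helpers show up here
all.names(body(ts.union))

# getFromNamespace() fetches an object whether or not it is exported
helper <- getFromNamespace(".makeNamesTs", "stats")
is.function(helper)
```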

## Greek letters in R plot label and title

Question: How can one include Greek letters (symbols) in R plot labels?
Answer: Greek letters or symbols can be included in the titles and labels of a graph using the expression() function. Following are some examples.

Note that in these examples random data are generated from a normal distribution. You can use your own data set to produce graphs that have symbols or Greek letters in their labels or titles.

Example 1:

> mycoef <- rnorm(1000)
> hist(mycoef, main = expression(beta) )

where beta in the expression stands for the Greek letter (symbol) $\beta$. A histogram similar to the following will be produced.

Example 2:

> sample <- rnorm(mean=5, sd=1, n=100)
> hist(sample, main=expression(paste("sampled values, ", mu, "=5, ", sigma, "=1")))

where mu and sigma are the symbols $\mu$ and $\sigma$ respectively, which now appear in the title of the histogram.
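
In Example 2 the values 5 and 1 are typed into the title by hand. A possible alternative, not used in the original examples, is `bquote()`, which substitutes the current values of variables into the label. The names `m`, `s`, `smp` and `ttl` are assumptions for this sketch.

```r
m <- 5; s <- 1                      # illustrative parameter values
smp <- rnorm(n = 100, mean = m, sd = s)

# .(m) and .(s) are replaced by the current values of m and s
ttl <- bquote(paste("sampled values, ", mu == .(m), ", ", sigma == .(s)))
ttl

pdf(NULL)                           # null device; drop pdf() and dev.off() to draw on screen
hist(smp, main = ttl, xlab = expression(x[i]))
invisible(dev.off())
```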

Example 3:

> curve(dnorm, from=-3, to=3, n=1000, main="Normal Probability Density Function")

will produce the curve of the Normal probability density function ranging from $-3$ to $3$.

To add the normal density function formula, use the text() function together with expression() and paste(), that is

> text(-2, 0.3, expression(f(x)==paste(frac(1, sqrt(2*pi*sigma^2)), " ", e^{frac(-(x-mu)^2, 2*sigma^2)})), cex=1.2)

The formula now appears on the curve of the Normal probability density function.

Example 4:

> x <- dnorm(seq(-3, 3, 0.001))
> plot(seq(-3, 3, 0.001), cumsum(x)/sum(x), type="l", col="blue", xlab="x", main="Normal Cumulative Distribution Function")

This draws the curve of the Normal cumulative distribution function.

To add the formula, use the text() function together with expression() and paste(), that is

> text(-1.5, 0.7, expression(Phi(x)==paste(frac(1, sqrt(2*pi)), " ", integral(e^(-t^2/2)*dt, -infinity, x))), cex=1.2)

The plot now shows the curve of the Normal cumulative distribution function together with its formula.
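
Example 4 approximates the cumulative distribution function by a normalised running sum of density values. As a rough check, the sketch below compares that approximation with the built-in `pnorm()`; the object names are assumptions for illustration.

```r
grid       <- seq(-3, 3, 0.001)
dens       <- dnorm(grid)
approx_cdf <- cumsum(dens) / sum(dens)   # numerical approximation used in Example 4
exact_cdf  <- pnorm(grid)                # built-in normal CDF

max(abs(approx_cdf - exact_cdf))         # largest discrepancy on the grid

pdf(NULL)                                # null device; drop pdf() and dev.off() to draw on screen
plot(grid, exact_cdf, type = "l", col = "blue", xlab = "x",
     ylab = expression(Phi(x)), main = "Normal Cumulative Distribution Function")
lines(grid, approx_cdf, lty = 2, col = "red")   # approximation as a dashed line
invisible(dev.off())
```

The small discrepancy arises mainly because the running sum ignores the probability mass outside the interval from $-3$ to $3$.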

## Import Data using read.table function

Question: How can I check my working directory so that I can import my data into R?
Answer: To find the working directory, the getwd() function can be used, that is

> getwd()

Question: How can I change the working directory to my own path?
Answer: Use the setwd() function, that is

> setwd("d:/mydata")
> setwd("C:/Users/XYZ/Documents")

Question: I have a data set stored in text (ASCII) format that contains rectangular data. How can I read this data in tabular form? I have already set my working directory.
Answer: As the data file is already in the directory that is set as the working directory, use the following command

> mydata <- read.table("data.dat")
> mydata <- read.table("data.txt")

mydata is a named object that will hold the data from the file "data.dat" or "data.txt" in data frame format. Each variable in the data file will be named by default V1, V2, ….

Question: How can this stored data be accessed?
Answer: To access the stored data, write the data frame object name ("mydata") followed by the \$ sign and the name of the variable, or use square brackets. That is,

> mydata\$V1
> mydata\$V2
> mydata["V1"]
> mydata[, 1]
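
The access forms differ in what they return. The sketch below writes a small made-up file to a temporary location (the values and the name `path` are assumptions) so that it can be run anywhere.

```r
# Write a small header-less file to a temporary location (made-up values)
path <- tempfile(fileext = ".dat")
writeLines(c("1 2.5 a", "2 3.5 b", "3 4.5 c"), path)

mydata <- read.table(path)   # no header, so columns are named V1, V2, V3
names(mydata)

mydata$V1        # a vector
mydata[["V2"]]   # also a vector
mydata["V1"]     # a one-column data frame
mydata[, 1]      # a vector, selected by position
```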

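Question: My data file has the variable names in its first row. In the previous question, the variable names were V1, V2, V3, … How can I get the actual variable names stored in the first row of the data.dat file?
Answer: Set the header argument of read.table() to TRUE, that is

> mydata <- read.table("data.dat", header = TRUE)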

Question: How can I read a data file that is not stored in the working directory?
Answer: To access a data file that is not stored in the working directory, provide the complete path of the file to read.table().
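
A minimal sketch of reading by complete path follows. The temporary directory stands in for any folder outside the working directory, and the file contents and column names are made up for illustration.

```r
# Create a file with a header row in some other folder
dir_path  <- tempdir()                         # stand-in for any folder
full_path <- file.path(dir_path, "data.dat")   # complete path to the file
writeLines(c("height weight", "170 65", "182 80", "165 58"), full_path)

# The complete path is passed as the first argument
mydata <- read.table(full_path, header = TRUE) # first row supplies the names
names(mydata)
mydata$height
```

Using `file.path()` to build the path avoids having to worry about the path separator on different operating systems.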

Note that read.table() is used to read data from external files that normally have a special form:

• The first line of the file should have a name for each variable in the data frame. However, if the first row does not contain the names of the variables, then the header argument should be set to FALSE.
• Each additional line of the file has as its first item a row label, followed by the values for each variable.
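
When a file follows this special form, the header line has one field fewer than the data lines, and read.table() then treats the first line as names and the first column as row labels. The sketch below uses the `text` argument with made-up values instead of a file.

```r
# Header line has one field fewer than the data lines (made-up values)
txt <- "Price Floor Area
A 52.00 111 830
B 54.75 128 710
C 57.50 101 1000"

houses <- read.table(text = txt)   # header and row labels detected automatically
names(houses)
rownames(houses)

# Same number of fields on every line: no header is assumed by default
plain <- read.table(text = "1 2\n3 4")
names(plain)
```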

In R it is strongly suggested that variables be held in a data frame. For this purpose the read.table() function can be used. For further details about the read.table() function use

> help(read.table)

## Introduction to R plot() function

Question: Can we draw graphics in R language?
Answer: Yes. R language produces high quality statistical graphs. There are many useful and sophisticated kinds of graphs available in R.

Question: Where are graphics displayed in R?
Answer: In R, all graphs are produced in a window called the graphics window, which can be resized.

Question: What is the use of plot function in R?
Answer: In R, plot() is a generic function that can be used to make a variety of point and line graphs. plot() function can also be used to define a coordinate space.

Question: What are the arguments of plot() function?
Answer: There are many arguments used in the plot() function. Some of these arguments are x, y, type, xlab, ylab, etc. To see the full list of arguments of plot(), write the following command in the R console

> args(plot.default)

Question: Are all arguments required in a call to plot()?
Answer: No. The first two arguments x and y provide the horizontal and vertical coordinates of points or lines to be plotted and also define a data-coordinate system for the graph. At least the argument x is required.
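
Both points can be checked quickly. The sketch below lists the argument names of plot.default() and then calls plot() with a single made-up vector, in which case the values are plotted against their index.

```r
names(formals(plot.default))   # argument names of the default plot method

pdf(NULL)                      # null device; drop pdf() and dev.off() to draw on screen
v <- c(3, 1, 4, 1, 5, 9)       # made-up values
plot(v)                        # only x supplied: values are plotted against 1, 2, ..., 6
invisible(dev.off())
```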

Question: What is the use of the argument type in plot() function?
Answer: The argument type determines the type of the graph to be drawn. There are several types of graph that can be drawn. The default, type='p', plots points at the coordinates specified by the x and y arguments. Specifying type='l' produces a line graph, and type='n' sets up the plotting region to accommodate the data set but plots nothing.

Question: Are there other types of graphs?
Answer: Yes. Setting type='b' draws both points and lines. Setting type='h' draws histogram-like vertical lines, and setting type='s' and type='S' draws stair-step-like lines starting horizontally and vertically, respectively.
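
The types can be compared side by side. The sketch below draws one panel per type on made-up data; type='o' (points overplotted on lines) is included as well.

```r
x <- 1:10
y <- (x - 5.5)^2                  # made-up values for illustration
types <- c("p", "l", "n", "b", "h", "s", "S", "o")

pdf(NULL)                         # null device; drop pdf() and dev.off() to draw on screen
op <- par(mfrow = c(2, 4))        # 2 x 4 grid, one panel per type
for (tp in types) {
  plot(x, y, type = tp, main = paste0("type = '", tp, "'"))
}
par(op)                           # restore the previous layout
invisible(dev.off())
```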

Question: What is the use of xlim and ylim in plot() function?
Answer: The arguments xlim and ylim may be used to define the limits of the horizontal and vertical axes. Usually these arguments are unnecessary, because R picks reasonable limits from x and y.

Question: What is the purpose of the xlab and ylab arguments in the plot() function?
Answer: The xlab and ylab arguments take character strings to label the horizontal and vertical axes.
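
These arguments are often combined with type='n' to set up the coordinate space first and add the data afterwards. The data and labels below are made up for illustration.

```r
set.seed(42)                       # reproducible made-up data
x <- runif(30, 2, 8)
y <- 1 + 0.5 * x + rnorm(30, sd = 0.3)

pdf(NULL)                          # null device; drop pdf() and dev.off() to draw on screen
# type = "n" sets up the axes and labels but draws no data
plot(x, y, type = "n", xlim = c(0, 10), ylim = c(0, 6),
     xlab = "Predictor x", ylab = "Response y")
usr <- par("usr")                  # actual axis extents: c(x1, x2, y1, y2)
points(x, y, pch = 16)             # add the points afterwards
abline(lm(y ~ x), col = "blue")    # add a fitted line
invisible(dev.off())

usr                                # slightly wider than xlim and ylim
```

By default R extends each axis by 4 percent beyond the requested limits, which is why `usr` is a little wider than `xlim` and `ylim`.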

Question: Provide few examples of plot() function?
Answer: Suppose we have data on the variables x and y, such as

> x <- rnorm(100, mean=10, sd=10)
> y <- rnorm(100)
> plot(x, y)
> plot(x, y, xlab='X (Mean=10, SD=10)', ylab='Y (Mean=0, SD=1)', type='l')
> plot(x, y, xlab='X (Mean=10, SD=10)', ylab='Y (Mean=0, SD=1)', type='o')
> plot(x, y, xlab='X (Mean=10, SD=10)', ylab='Y (Mean=0, SD=1)', pch=10)