## Exploring Data in R

Examining data, particularly through graphical displays, is an important prelude to statistical data analysis and modeling. Note that there are some limitations on the kinds of graphs that we can create.

One should also be familiar with standard procedures for exploratory data analysis, statistical graphics, and data transformation. Graphical representations of data can be categorized by the nature (or type) of the variables, the number of variables, and the objective of the analysis. For example, if we are comparing groups, comparison graphs such as bar graphs can be used; if we are interested in the kind of relationship between variables, a scatter plot can be useful.
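As a rough illustration of matching the graph to the objective, the sketch below uses the built-in `mtcars` data. The choice of variables (`mpg`, `cyl`, `wt`) and the use of the group mean as the summary are assumptions made for the example only.

```r
# Objective: compare groups -> bar graph of a summary per group
mean_mpg <- tapply(mtcars$mpg, mtcars$cyl, mean)   # mean mpg for each cylinder count
barplot(mean_mpg, xlab = "Number of cylinders", ylab = "Mean mpg")

# Objective: study the relationship between two quantitative variables -> scatter plot
plot(mtcars$wt, mtcars$mpg, xlab = "wt", ylab = "mpg")
```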

**Distributional Displays:**

The distributional displays include stem-and-leaf displays, histograms, density estimates, quantile comparison plots, and box plots.

**Plots of the Relationship between Two Variables:**

Graphical representations of the relationship between two variables include various versions of scatter plots, scatter plot smoothers, bivariate density estimates, and parallel box plots.

**Multivariate Displays:**

Multivariate graphical representations include scatter plot matrices, coplots, and dynamic three-dimensional scatter plots.
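Several of the displays named in these lists can be produced with base R graphics alone. The following is a minimal sketch using the built-in `mtcars` data; the particular columns chosen for the scatter plot matrix and the conditioning variable for the coplot are illustrative assumptions.

```r
# Box plot of a single variable, then parallel box plots by group
boxplot(mtcars$mpg, ylab = "mpg")
bp <- boxplot(mpg ~ cyl, data = mtcars,
              xlab = "Number of cylinders", ylab = "mpg")

# Scatter plot matrix of a few quantitative variables
vars <- c("mpg", "wt", "hp", "disp")
pairs(mtcars[, vars])

# Coplot: mpg against wt, conditioned on the number of cylinders
coplot(mpg ~ wt | factor(cyl), data = mtcars)
```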

The following are some examples of exploring data in R:

**Stem and Leaf display and Histogram in R**

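    attach(mtcars)               # make the columns of mtcars (mpg, wt, ...) available by name
    hist(mpg)                    # histogram with the default bins
    hist(mpg, nclass=3, col=3)   # roughly three bins, filled with color 3
    stem(mpg)                    # stem-and-leaf display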
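To see which bins `hist()` actually chose, the histogram can be computed without drawing it. The sketch below also uses `with()` as one possible alternative to `attach()`; both choices are offered as illustrations rather than as the only way to do this.

```r
# Compute the histogram without drawing it
h <- hist(mtcars$mpg, plot = FALSE)
h$breaks   # bin boundaries chosen by hist()
h$counts   # number of observations in each bin

# with() gives access to the columns by name without attach()
with(mtcars, stem(mpg))
```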

**Density Estimates**

Consider the following R code for a representation of distribution by smoothing the histogram.

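    hist(mpg, probability=TRUE, ylab='Density')   # histogram on the density scale
    lines(density(mpg), lwd=2)                    # kernel density estimate, thicker line
    points(mpg, rep(0, length(mpg)), pch="|")     # one-dimensional scatter plot along the bottom
    lines(density(mpg, adjust=0.9), lwd=1)        # density estimate with a smaller bandwidth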

The `hist()` function constructs the histogram, with `probability = TRUE` specifying density scaling. The first call to `lines()` draws the density estimate on the graph, with `lwd=2` doubling the thickness of the line. The `points()` function draws a one-dimensional scatter plot along the bottom of the graph, using a vertical bar as the plotting symbol. The second call to `lines()` draws a density estimate computed with `adjust=0.9`, which sets the bandwidth to 0.9 times the default value.
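The effect of `adjust` can be checked directly, because `density()` returns the bandwidth it used in the `bw` component. A small sketch, with variable names chosen only for this example:

```r
d_default <- density(mtcars$mpg)                 # default bandwidth
d_narrow  <- density(mtcars$mpg, adjust = 0.9)   # 0.9 times the default bandwidth

d_default$bw
d_narrow$bw
d_narrow$bw / d_default$bw   # ratio of the two bandwidths
```

A smaller bandwidth gives a less smooth curve that follows the data more closely.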

**Quantile Comparison Plots**

Quantile plots help in comparing the distribution of a variable with a theoretical distribution such as the normal distribution.

    library(car)
    qqPlot(mpg)

Note that the `qqPlot()` function is available in the car package. The older `qq.plot()` function is defunct.
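If the car package is not installed, a similar comparison with the normal distribution can be sketched in base R. The code below first builds the plot by hand, pairing the sorted data with theoretical normal quantiles, and then draws the same kind of display with `qqnorm()` and `qqline()`. It is an illustration of the idea, not a reproduction of what `qqPlot()` draws.

```r
x <- mtcars$mpg
n <- length(x)

# Theoretical normal quantiles at the plotting positions used by qqnorm()
theo <- qnorm(ppoints(n))

# Sorted data against theoretical quantiles: the essence of a quantile comparison plot
plot(theo, sort(x), xlab = "Normal quantiles", ylab = "Sorted mpg")

# The same kind of display from base R, with a reference line through the quartiles
qq <- qqnorm(x)
qqline(x)
```

Points falling close to the reference line suggest that the variable is approximately normally distributed.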

**Relationship Graphs**

To explore the relationship between two quantitative variables, use the `plot()` function; for an enhanced scatter plot, use the `scatterplot()` function from the car package. This function plots the variables with least-squares and non-parametric regression lines. For example,

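    plot(mpg, wt)          # basic scatter plot
    scatterplot(mpg, wt)   # adds least-squares and non-parametric regression lines
    # identify the most unusual points by car name (row names of mtcars; car >= 3.0 syntax)
    scatterplot(mpg, wt, id=list(labels=rownames(mtcars), n=2))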
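The two kinds of line that an enhanced scatter plot overlays can be approximated in base R. This is a sketch under the assumption that a `lowess()` smooth is an acceptable stand-in for the non-parametric line; the smoother and its settings in car may differ.

```r
# Scatter plot with the same axes as plot(mpg, wt)
plot(mtcars$mpg, mtcars$wt, xlab = "mpg", ylab = "wt")

# Least-squares line of wt on mpg
fit <- lm(wt ~ mpg, data = mtcars)
abline(fit, lty = 2)

# Non-parametric smooth (lowess) of wt against mpg
sm <- lowess(mtcars$mpg, mtcars$wt)
lines(sm, lwd = 2)
```

Where the smooth departs visibly from the straight line, a linear relationship may not describe the data well.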
