Exploring Data in R: A Comprehensive R Tutorial

Exploring data, particularly through graphical examination and representation, is an important prelude to statistical data analysis and modeling. Note that there are some limitations on the kinds of graphs that we can create.

One should be familiar with standard procedures for exploratory data analysis, statistical graphics, and data transformation. We can categorize graphical representations of data by the nature (or type) of the variables, the number of variables, and the objective of the analysis. For example, if we are comparing groups, then comparison graphs such as bar graphs can be used. If we are interested in the kind of relationship between variables, then a scatter plot can be useful.

  • Distributional Displays:
    The distributional displays include stem and leaf displays, histograms, density estimates, quantile comparison plots, and box plots.
  • Plots of the Relationship between two variables:
    The graphical representations of the relationship between two variables include various versions of scatter plots, scatter plot smoothers, bivariate density estimates, and parallel box plots.
  • Multivariate Displays:
    Multivariate graphical representations include scatter plot matrices, coplots, and dynamic three-dimensional scatter plots.
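As a rough illustration of these three families of displays, the sketch below uses only base R graphics and the built-in mtcars data. The variables chosen (mpg, wt, hp, cyl) are assumptions made for this example.

```r
# Sketch only: base R graphics with the built-in mtcars data.
# The variables used here (mpg, wt, hp, cyl) are illustrative assumptions.
data(mtcars)

# Distributional display: box plot of a single variable
boxplot(mtcars$mpg, ylab = "Miles per gallon")

# Relationship between two variables: parallel box plots of mpg by cylinder count
boxplot(mpg ~ cyl, data = mtcars, xlab = "Cylinders", ylab = "Miles per gallon")

# Multivariate display: scatter plot matrix of three variables
pairs(mtcars[, c("mpg", "wt", "hp")])

# Multivariate display: conditioning plot of mpg against wt, one panel per cylinder count
coplot(mpg ~ wt | factor(cyl), data = mtcars)
```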

For exploring the data in R, the following are some examples:

Stem and Leaf Display and Histogram in R

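attach(mtcars)                   # make the mtcars columns (mpg, wt, ...) available by name
hist(mpg)                        # histogram with the default number of bins
hist(mpg, breaks = 3, col = 3)   # about 3 bins (a suggestion only), filled with color 3
stem(mpg)                        # stem and leaf display, printed in the console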

Exploring Data in R: Density Estimates

Consider the following R code, which represents the distribution by smoothing the histogram with a kernel density estimate.

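hist(mpg, probability = TRUE, ylab = 'Density')   # histogram on the density scale
lines(density(mpg), lwd = 2)                      # density estimate, double line width
points(mpg, rep(0, length(mpg)), pch = "|")       # one-dimensional scatter plot along the x-axis
lines(density(mpg, adjust = 0.9), lwd = 1)        # bandwidth set to 0.9 times the default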

The hist() function constructs the histogram, with probability = TRUE specifying density scaling. The first call to lines() draws the density estimate on the graph, with the line drawn at double thickness because of lwd = 2. The points() function draws a one-dimensional scatter plot at the bottom of the graph, using a vertical bar as the plotting symbol. The second call to lines() draws another density estimate with adjust = 0.9, which sets the bandwidth to 0.9 times the default value.
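To see what adjust does to the bandwidth, one can inspect the bw component of the objects returned by density(). This is a minimal sketch; the object names d_default and d_adjusted are introduced here only for illustration.

```r
# Sketch: check how adjust rescales the bandwidth chosen by density().
mpg <- mtcars$mpg

d_default  <- density(mpg)                # default bandwidth selection
d_adjusted <- density(mpg, adjust = 0.9)  # default bandwidth multiplied by 0.9

d_default$bw                   # bandwidth used by default
d_adjusted$bw                  # smaller bandwidth, so a slightly less smooth curve
d_adjusted$bw / d_default$bw   # ratio of the two bandwidths
```

A smaller value of adjust gives a wigglier curve that follows the data more closely, while a larger value gives a smoother one.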

Quantile Comparison Plots

Quantile plots help in comparing the distribution of a variable with a theoretical distribution such as the normal distribution.

library(car)
qqPlot(mpg)

Note that the qqPlot() function is available in the car package. The older qq.plot() function is defunct.
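If the car package is not installed, a similar normal quantile comparison plot can be sketched with base R alone. Note that qqnorm() and qqline() are a swapped-in alternative here, not the qqPlot() function itself, and they do not draw the confidence envelope that qqPlot() adds.

```r
# Sketch: normal quantile comparison plot with base R only (no car package needed).
mpg <- mtcars$mpg

qq <- qqnorm(mpg, main = "Normal Q-Q plot of mpg")  # sample vs. theoretical normal quantiles
qqline(mpg, col = "red")                            # reference line through the quartiles
```

Points that fall close to the reference line suggest that the variable is roughly normally distributed.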

Exploring Data: Relationship Graphs

To explore the relationship between two quantitative variables, use the plot() function; for an enhanced scatter plot of two variables, use the scatterplot() function from the car package. This function plots the variables together with least squares and non-parametric regression lines. For example,

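plot(mpg, wt)                      # basic scatter plot: mpg on the x-axis, wt on the y-axis
scatterplot(mpg, wt)               # enhanced scatter plot (car package)
# label unusual points with the car names (id = list(...) assumes car version 3.0 or later)
scatterplot(mpg, wt, id = list(labels = rownames(mtcars)))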
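The two kinds of lines can also be approximated with base R. The sketch below is an assumption-laden stand-in, not the internals of scatterplot(): it uses lm() for the least squares line and lowess() as the non-parametric smoother.

```r
# Sketch: a base R approximation of the lines that an enhanced scatter plot shows.
plot(mtcars$mpg, mtcars$wt, xlab = "mpg", ylab = "wt")

fit <- lm(wt ~ mpg, data = mtcars)   # least squares fit of wt on mpg
abline(fit, col = "blue")            # least squares line

# non-parametric smooth (lowess is used here as an illustrative choice)
lines(lowess(mtcars$mpg, mtcars$wt), col = "red", lty = 2)
```

If the two lines diverge noticeably, a straight line may not describe the relationship well.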


FAQs about R Language

  1. What do you mean by exploring data?
  2. What are the objectives of exploratory data analysis?
  3. What are the important visualizations for exploratory data analysis?
  4. For exploratory analysis, which graph is used for comparison purposes?
  5. For exploratory analysis, which graph is used to explore the relationship between variables?
  6. What is a quantile comparison plot?
  7. What is the objective of density estimation graphs?
  8. Name some of the multivariate plots used for EDA.


Greek Letters in R Plot Label and Title

Introduction to Greek Letters in R Plot

This post is about writing Greek letters in R plots, in their axis labels, titles, and legends. There are two main ways to include Greek letters in R plot labels:

  1. Using the expression Function
    This is the recommended approach as it provides more flexibility and control over the formatting of the Greek letters and mathematical expressions.
  2. Using raw Greek letter Codes
    This method is less common and requires memorizing the character codes for each Greek letter.
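The two approaches can be sketched side by side. The second part reads "character codes" as Unicode escapes, which is an assumption about what is meant; whether the characters display correctly depends on the graphics device and font.

```r
# Sketch of the two approaches; the data are random and purely illustrative.
x <- rnorm(50)

# 1. plotmath via expression(): alpha and beta are rendered as Greek letters
lab_expr <- expression(alpha + beta)
plot(x, main = lab_expr)

# 2. Unicode escapes inside an ordinary string: \u03b1 is alpha, \u03b2 is beta
# (assumes the graphics device and font can display these characters)
lab_code <- "\u03b1 + \u03b2"
plot(x, main = lab_code)
```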

Question: How can one include Greek letters (symbols) in R plot labels?
Answer: Greek letters or symbols can be included in the titles and labels of a graph using the expression() function. Some examples follow.

Note that in these examples random data is generated from a normal distribution. You can use your own data set to produce graphs that have symbols or Greek letters in their labels or titles.

Greek Letters in R Plot

The following are a few examples of writing Greek letters in R plot.

Example 1: Draw Histogram

mycoef <- rnorm(1000)                   # random data from a standard normal distribution
hist(mycoef, main = expression(beta))   # the title is rendered as the Greek letter beta

where beta in expression() is rendered as the Greek letter $\beta$. The resulting histogram has $\beta$ as its title.

Example 2:

sample <- rnorm(mean=5, sd=1, n=100)
hist(sample, main=expression( paste("sampled values, ", mu, "=5, ", sigma, "=1" )))

where mu and sigma are rendered as the symbols $\mu$ and $\sigma$ respectively. The resulting histogram has the title "sampled values, $\mu$=5, $\sigma$=1".
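In Example 2 the values 5 and 1 are typed into the title by hand. One possible alternative is bquote(), which substitutes the values of R variables into a plotmath expression. The names m, s, smpl, and ttl below are hypothetical and introduced only for this sketch.

```r
# Sketch: build the title from the values used to generate the data.
# m, s, smpl and ttl are hypothetical names introduced for this example.
m <- 5
s <- 1
smpl <- rnorm(n = 100, mean = m, sd = s)

# bquote() replaces everything wrapped in .() by its current value
ttl <- bquote(paste("sampled values, ", mu == .(m), ", ", sigma == .(s)))
hist(smpl, main = ttl)
```

With this approach, changing m or s updates the title automatically.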

Example 3:

curve(dnorm, from= -3, to=3, n=1000, main="Normal Probability Density Function")

This produces the curve of the Normal probability density function over the range $-3$ to $3$.

Normal Density Function

To add the formula of the normal density function to the plot, we use text() together with expression() and paste(), that is

text(-2, 0.3, expression(f(x) == paste(frac(1, sqrt(2*pi* sigma^2 ) ), " ", e^{frac(-(x-mu)^2, 2*sigma^2)})), cex=1.2)

The curve of the Normal probability density function now carries its formula inside the plot.

Example 4:

x <- dnorm( seq(-3, 3, 0.001))
plot(seq(-3, 3, 0.001), cumsum(x)/sum(x), 
           type="l", col="blue", xlab="x", 
           main="Normal Cumulative Distribution Function")

The code above draws the curve of the Normal cumulative distribution function.

To add the formula, use text() together with expression() and paste(), that is

text(-1.5, 0.7, 
       expression(Phi(x) == paste(frac(1, sqrt(2*pi)), " ", 
       integral(e^(-t^2/2)*dt, -infinity, x))), cex = 1.2)   # capital Phi denotes the CDF

The plot now shows the curve of the Normal cumulative distribution function together with its formula.
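Example 4 approximates the cumulative distribution function by a normalized cumulative sum of density values. As a cross-check, the sketch below compares that approximation with pnorm(), the exact standard normal CDF in base R. The names grid, approx_cdf, exact_cdf, and max_diff are introduced only for this illustration.

```r
# Sketch: compare the cumulative-sum approximation with the exact CDF from pnorm().
grid <- seq(-3, 3, 0.001)
approx_cdf <- cumsum(dnorm(grid)) / sum(dnorm(grid))  # approximation used in Example 4
exact_cdf  <- pnorm(grid)                             # exact standard normal CDF

plot(grid, exact_cdf, type = "l", col = "blue", xlab = "x", ylab = "Probability",
     main = "Normal Cumulative Distribution Function")
lines(grid, approx_cdf, col = "red", lty = 2)         # the two curves nearly coincide

max_diff <- max(abs(approx_cdf - exact_cdf))          # largest gap between the two curves
max_diff
```

The small gap arises because the approximation ignores the probability outside the interval from $-3$ to $3$.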

https://itfeature.com

https://gmstat.com

Plot Function in R

This article introduces the plot() function in the R language, explains the use and purpose of its arguments, and provides a few examples. With the plot() function, one can draw different graphical representations, and its arguments can be used to enhance the graph.

Introduction to Graphics in R Language

Question: Can we draw graphics in R language?
Answer: Yes. R language produces high-quality statistical graphs. There are many useful and sophisticated kinds of graphs available in R.

Question: Where are graphics displayed in R?
Answer: In R, all graphs are produced in a resizable window called the graphics window.

Question: What is the use of the plot function in R?
Answer: In R, plot() is a generic function that can be used to make a variety of point and line graphs. The plot() function can also be used to define a coordinate space.

Important Arguments of the Plot Function in R

Question: What are the arguments of the plot() function?
Answer: There are many arguments used in the plot() function. Some of these arguments are x, y, type, xlab, ylab, etc. To see the full list of arguments of plot(), run the following command in the R console:

args(plot.default)

Question: Is it necessary to specify all the arguments of the plot() function?
Answer: No. The first two arguments, x and y, provide the horizontal and vertical coordinates of the points or lines to be plotted and define a data-coordinate system for the graph. At least the argument x is required. Note that many of the arguments are set to default values in the plot() function.

Question: What is the use of the argument type in the plot() function?
Answer: In the plot() function, the argument type determines the type of graph to be drawn. Several types of graphs can be drawn. The default, type='p', plots points at the coordinates specified by the x and y arguments. Specifying type='l' produces a line graph, and type='n' sets up the plotting region to accommodate the data set but plots nothing.
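A common use of type='n' is to set up the coordinate space first and then add elements step by step. The sketch below uses random, purely illustrative data.

```r
# Sketch: set up an empty coordinate space, then add graphical elements to it.
# The data are random and purely illustrative.
x <- 1:20
y <- cumsum(rnorm(20))

plot(x, y, type = "n", xlab = "x", ylab = "y")  # axes and labels only, no data drawn
lines(x, y, col = "grey")                       # add a connecting line
points(x, y, pch = 16, col = "blue")            # add the points on top
```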

Other Types of Graphs: Setting type Argument

Question: Are there other types of graphs?
Answer: Yes. Setting type='b' draws both points and lines. Setting type='h' draws histogram-like vertical lines, and setting type='s' or type='S' draws stair-step lines that start horizontally or vertically, respectively.
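To compare these settings visually, the sketch below draws the same small data set once for each value of type in a 2-by-3 grid of panels. The data values are made up for illustration.

```r
# Sketch: the same data drawn with six different values of the type argument.
x <- 1:10
y <- c(3, 5, 2, 8, 6, 7, 4, 9, 5, 6)   # made-up values for illustration
types <- c("p", "l", "b", "h", "s", "S")

old_par <- par(mfrow = c(2, 3))        # 2 rows and 3 columns of panels
for (tp in types) {
  plot(x, y, type = tp, main = paste0("type = '", tp, "'"))
}
par(old_par)                           # restore the previous layout
```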

Question: What is the use of xlim and ylim in plot() function?
Answer: The arguments xlim and ylim may be used to define the limits of the horizontal and vertical axes. Usually, these arguments are unnecessary, because R language reasonably picks limits from x and y.
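As a small illustration, the sketch below draws the same random data twice, first with the default limits and then with limits fixed through xlim and ylim. The particular limits are arbitrary choices for this example.

```r
# Sketch: default axis limits versus limits fixed with xlim and ylim.
x <- rnorm(100, mean = 10, sd = 10)
y <- rnorm(100)

plot(x, y)                                        # limits picked from the data
plot(x, y, xlim = c(-40, 60), ylim = c(-4, 4))    # limits fixed by hand

usr <- par("usr")   # plotting region actually used: x-min, x-max, y-min, y-max
usr
```

Fixed limits are mainly useful when several plots have to share the same scale.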

Question: What is the purpose of the xlab and ylab arguments in the plot() function?
Answer: The xlab and ylab arguments take character strings that label the horizontal and vertical axes.

Examples of the plot() Function in R

Question: Provide a few examples of the R plot function.
Answer: The following are a few examples of the plot() function. Suppose you have a data set on variables x and y, such as

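x <- rnorm(100, mean = 10, sd = 10)   # 100 draws with mean 10 and SD 10
y <- rnorm(100)                       # 100 draws with mean 0 and SD 1 (the defaults)

plot(x, y)                                                                        # points (default)
plot(x, y, xlab = 'X (Mean=10, SD=10)', ylab = 'Y (Mean=0, SD=1)', type = 'l')   # lines
plot(x, y, xlab = 'X (Mean=10, SD=10)', ylab = 'Y (Mean=0, SD=1)', type = 'o')   # overplotted points and lines
plot(x, y, xlab = 'X (Mean=10, SD=10)', ylab = 'Y (Mean=0, SD=1)', pch = 10)     # plotting symbol number 10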
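With type='l', the points are joined in the order in which they appear in the data, so unsorted x values give a zigzag of crossing segments. One possible remedy, sketched below, is to order the observations by x before drawing the line. The name ord is introduced only for this example.

```r
# Sketch: order the observations by x before drawing a line graph.
x <- rnorm(100, mean = 10, sd = 10)
y <- rnorm(100)

ord <- order(x)   # indices that sort x in increasing order
plot(x[ord], y[ord], type = "l", xlab = "X (sorted)", ylab = "Y")
```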