Scatter Plots In R

Introduction to Scatter Plots in R Language

Scatter plots (scatter diagrams) are bivariate graphical representations for examining the relationship between two quantitative variables. They are essential for visualizing correlations and trends in data: a scatter plot helps identify the direction and strength of the relationship and shows whether the trend is linear or non-linear. If a data set contains more than two variables, one can draw a scatter plot matrix for all or selected pairs of quantitative variables.
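As a minimal sketch of a scatter matrix for more than two variables, the base R function pairs() can be applied to several numeric columns at once. The built-in mtcars data set and the four chosen columns are assumptions made only for illustration.

```r
# Assumption: the built-in mtcars data stands in for your own data set
vars <- mtcars[, c("mpg", "disp", "hp", "wt")]

# One scatter plot for every pair of the selected variables
pairs(vars, main = "Scatter plot matrix of four mtcars variables")

# Correlation matrix to read alongside the panels
cor_matrix <- round(cor(vars), 2)
print(cor_matrix)
```

Each off-diagonal panel is an ordinary scatter plot of one pair, so the direction and strength seen in a panel can be compared with the matching entry of the correlation matrix.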

Scatter plots in R can be drawn in several ways. Here we will discuss how to make several kinds of scatter plots in R.

The plot function in R

When two numeric vectors are supplied as arguments (one for the horizontal and the other for the vertical coordinates), the default behavior of the plot() function is to draw a scatter diagram. For example,

library(car)
attach(Prestige)
plot(income, prestige)

will draw a simple scatterplot of prestige by income.
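A possible alternative that avoids attach() is to give plot() a formula together with a data argument. The sketch below uses the built-in cars data set as a stand-in (an assumption; it is not the Prestige data), so it runs without any extra package.

```r
# Assumption: built-in cars data (speed, dist) stands in for Prestige
plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
     main = "Scatter plot from a formula")

# Strength and direction of the linear relationship
r <- cor(cars$speed, cars$dist)
print(r)
```

The formula is read as y ~ x, so the variable on the left goes on the vertical axis.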

The interpretation of a scatterplot is often assisted by enhancing the plot with least-squares or non-parametric regression lines. For this purpose, scatterplot() in the car package can be used; it also adds marginal boxplots for the two variables:

scatterplot(prestige ~ income, lwd = 3)

Note that in the scatterplot, the non-parametric regression curve is drawn by a local regression smoother. Local regression works by fitting a least-squares line in the neighborhood of each observation, placing greater weight on points closer to the focal observation. A fitted value for the focal observation is extracted from each local regression, and the resulting fitted values are connected to produce the non-parametric regression line.
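The local regression idea described above can be sketched by hand. The function local_linear(), its span argument, the tricube weights, and the use of the built-in cars data are all assumptions chosen for illustration; this is not the smoother that the car package actually uses.

```r
# Illustrative local-linear smoother; NOT the implementation used by car
local_linear <- function(x, y, span = 0.5) {
  n <- length(x)
  k <- max(3, ceiling(span * n))   # neighbours used in each local fit
  fitted <- numeric(n)
  for (i in seq_len(n)) {
    d <- abs(x - x[i])             # distance from the focal observation
    h <- sort(d)[k]                # bandwidth: distance to the k-th nearest point
    w <- ifelse(d < h, (1 - (d / h)^3)^3, 0)   # tricube weights, zero outside the window
    local_fit <- lm(y ~ x, weights = w)        # weighted least-squares line
    fitted[i] <- predict(local_fit, newdata = data.frame(x = x[i]))
  }
  fitted
}

# Assumption: built-in cars data stands in for the Prestige data
x <- cars$speed
y <- cars$dist
smooth_fit <- local_linear(x, y, span = 0.5)

plot(x, y, xlab = "speed", ylab = "dist")
ord <- order(x)
lines(x[ord], smooth_fit[ord], lwd = 2)   # connect the fitted values
```

A larger span uses more neighbours in each fit and gives a smoother curve; a smaller span follows local detail more closely.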

Coded Scatterplots

The scatterplot() function can also be used to create coded scatterplots, in which a categorical variable determines the color or plotting symbol of each point. For example, let us plot prestige by income, coded by the type of occupation:

scatterplot(prestige ~ income | type)

Note that variables in the scatterplot are given in a formula-style (as y ~ x | groups).

The coded scatterplot indicates that the relationship between prestige and income may well be linear within occupation types. The slope of the relationship looks steepest for blue-collar (bc) occupations, and least steep for professional and managerial occupations.
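A coded scatterplot can also be approximated in base R by mapping a factor to colors and symbols. The sketch below assumes the built-in iris data, with Species as the coding variable, purely as a stand-in for Prestige and type; the within-group slopes are computed to show how steepness can be compared across categories.

```r
# Assumption: built-in iris data, with Species as the coding variable
grp <- iris$Species

plot(iris$Sepal.Length, iris$Petal.Length,
     col = as.integer(grp),      # one color per category
     pch = as.integer(grp),      # one plotting symbol per category
     xlab = "Sepal length", ylab = "Petal length")
legend("topleft", legend = levels(grp),
       col = seq_along(levels(grp)), pch = seq_along(levels(grp)))

# Within-group least-squares slopes, to compare steepness across categories
slopes <- sapply(split(iris, grp), function(d) {
  coef(lm(Petal.Length ~ Sepal.Length, data = d))[["Sepal.Length"]]
})
print(round(slopes, 2))
```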

Jittering Scatter Plots

Jittering the data by adding a small random quantity to each coordinate serves to separate the overplotted points.

data(Vocab)
attach(Vocab)
# without jittering
plot(education, vocabulary)
# with jittering
plot(jitter(education), jitter(vocabulary))

The degree of jittering can be controlled via the factor argument. For example, specifying factor = 2 doubles the jitter.

plot(jitter(education, factor = 2), jitter(vocabulary, factor = 2))
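To see what the factor argument does to the values themselves, the following sketch jitters a made-up vector of tied integers (an assumption standing in for years of education) and measures the largest displacement.

```r
set.seed(42)                       # reproducible noise
x <- rep(1:5, each = 20)           # heavily tied values, as with years of education

j1 <- jitter(x)                    # default, factor = 1
j2 <- jitter(x, factor = 2)        # twice the default amount

# Largest displacement from the original values
max_shift <- c(factor1 = max(abs(j1 - x)), factor2 = max(abs(j2 - x)))
print(max_shift)

# For integer-spaced data the noise is uniform on +/- factor/5
op <- par(mfrow = c(1, 2))
plot(j1, jitter(x), main = "factor = 1")
plot(j2, jitter(x, factor = 2), main = "factor = 2")
par(op)
```

Because the noise is random, a jittered plot changes from run to run unless a seed is set.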

Let’s add the least-squares and non-parametric regression lines.

abline(lm(vocabulary ~ education), lwd = 3, lty = 2)
lines(lowess(education, vocabulary, f = 0.2), lwd = 3)

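The lowess() function (the name derives from locally weighted scatterplot smoothing, a form of locally weighted regression) returns the coordinates of the local regression curve, which is then drawn by lines(). The f argument sets the span of the local regression, that is, the proportion of points that influence the smooth at each value.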
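The effect of the span can be checked with a small sketch. The built-in cars data and the two f values are assumptions chosen for illustration.

```r
# Assumption: built-in cars data stands in for the Vocab data
x <- cars$speed
y <- cars$dist

wiggly     <- lowess(x, y, f = 0.2)   # small span: follows local detail
smooth_fit <- lowess(x, y, f = 0.8)   # large span: smoother curve

str(wiggly)                           # a list with components x and y

plot(x, y, xlab = "speed", ylab = "dist")
lines(wiggly, lwd = 2, lty = 2)       # lines() accepts the list directly
lines(smooth_fit, lwd = 2)
legend("topleft", legend = c("f = 0.2", "f = 0.8"), lty = c(2, 1), lwd = 2)
```

Since lowess() returns the x values in increasing order, the result can be passed to lines() without sorting.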

Using these different kinds of graphical representations of relationships between variables may help to identify some hidden information (hidden due to overplotting).

FAQs about Scatter Plots in R

  1. How can one draw a scatter plot in the R language?
  2. What is the importance of scatter plots?
  3. What function can be used to draw scatter plots in R?
  4. What is the use of the scatterplot() function in R?
  5. What is meant by a coded scatter plot?
  6. What are jittered scatter plots in R?
  7. What are the important arguments of the plot() function for drawing a scatter plot?

See more on plot() function

https://itfeature.com, https://gmstat.com

Greek Letters in R Plot Label and Title

Introduction to Greek Letters in R Plot

This post shows how to write Greek letters in R plots. There are two main ways to include Greek letters in plot labels (axis labels, title, legend):

  1. Using the expression Function
    This is the recommended approach as it provides more flexibility and control over the formatting of the Greek letters and mathematical expressions.
  2. Using raw Greek letter Codes
    This method is less common and requires memorizing the character codes for each Greek letter.
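The two approaches can be compared side by side. In the sketch below, "raw codes" is read as Unicode escapes inside ordinary character strings; that reading is an assumption, and whether the characters render correctly depends on the graphics device and font in use.

```r
# Assumption: "raw codes" is read here as Unicode escapes in ordinary strings
beta_char  <- "\u03b2"   # Greek small letter beta
sigma_char <- "\u03c3"   # Greek small letter sigma

set.seed(1)
vals <- rnorm(200)

# Approach 1: plotmath via expression()
hist(vals, main = expression(paste("Estimates of ", beta)))

# Approach 2: Unicode characters pasted into a plain character string
hist(vals, main = paste("Estimates of", beta_char, "with", sigma_char, "= 1"))
```

The expression() route also handles fractions, superscripts, and integrals, which plain strings cannot express.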

Question: How can one include Greek letters (symbols) in R plot labels?
Answer: Greek letters or symbols can be included in the titles and labels of a graph using the expression() function. Some examples follow.

Note that in these examples random data is generated from a normal distribution. You can use your own data set to produce graphs that have symbols or Greek letters in their labels or titles.

Greek Letters in R Plot

The following are a few examples of writing Greek letters in R plots.

Example 1: Draw Histogram

mycoef <- rnorm(1000)
hist(mycoef, main = expression(beta))

where beta in expression() is rendered as the Greek letter $\beta$, so the histogram is drawn with the symbol $\beta$ as its title.

Example 2:

sample <- rnorm(mean=5, sd=1, n=100)
hist(sample, main=expression( paste("sampled values, ", mu, "=5, ", sigma, "=1" )))

where mu and sigma are rendered as the symbols $\mu$ and $\sigma$ respectively. The histogram title therefore reads: sampled values, $\mu$=5, $\sigma$=1.
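Instead of typing the parameter values into the title by hand, they can be substituted from variables with bquote(). The variable names m, s, vals, and title_expr below are hypothetical and chosen only for this sketch.

```r
set.seed(1)
m <- 5
s <- 1
vals <- rnorm(n = 100, mean = m, sd = s)

# .() inserts the current values of m and s into the plotmath expression
title_expr <- bquote(paste("sampled values, ", mu, " = ", .(m), ", ",
                           sigma, " = ", .(s)))
hist(vals, main = title_expr)
```

With this form the title stays in step with the simulation settings when m or s is changed.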

Example 3:

curve(dnorm, from= -3, to=3, n=1000, main="Normal Probability Density Function")

will produce the curve of the normal probability density function over the range $-3$ to $3$.

Normal Density Function

To add the formula of the normal density function to the plot, use the text() function with expression() and paste(), that is

text(-2, 0.3, expression(f(x) == paste(frac(1, sqrt(2*pi* sigma^2 ) ), " ", e^{frac(-(x-mu)^2, 2*sigma^2)})), cex=1.2)

The curve of the normal probability density function is now annotated with its formula.
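As a quick check that the annotated formula matches what curve(dnorm, ...) draws, the density can be written out directly and compared with dnorm(). The helper name normal_pdf() is hypothetical.

```r
# Direct transcription of the density formula shown in the plot
normal_pdf <- function(x, mu = 0, sigma = 1) {
  1 / sqrt(2 * pi * sigma^2) * exp(-(x - mu)^2 / (2 * sigma^2))
}

x <- seq(-3, 3, by = 0.5)
check <- all.equal(normal_pdf(x), dnorm(x))
print(check)

# The same agreement holds for other parameter values
check2 <- all.equal(normal_pdf(x, mu = 5, sigma = 2), dnorm(x, mean = 5, sd = 2))
print(check2)
```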

Example 4:

x <- dnorm( seq(-3, 3, 0.001))
plot(seq(-3, 3, 0.001), cumsum(x)/sum(x), 
           type="l", col="blue", xlab="x", 
           main="Normal Cumulative Distribution Function")

This draws the curve of the normal cumulative distribution function, approximated here by the normalized cumulative sum of the density values.
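The cumulative-sum approach in Example 4 is a numerical approximation. A short sketch, using the same grid, compares it with the built-in pnorm(); the variable names are hypothetical.

```r
grid <- seq(-3, 3, by = 0.001)
dens <- dnorm(grid)

approx_cdf <- cumsum(dens) / sum(dens)   # normalized running sum, as in Example 4
exact_cdf  <- pnorm(grid)                # built-in normal CDF

# Largest disagreement; nonzero mainly because the grid is cut off at -3 and 3
max_diff <- max(abs(approx_cdf - exact_cdf))
print(max_diff)

plot(grid, approx_cdf, type = "l", col = "blue", xlab = "x", ylab = "F(x)")
lines(grid, exact_cdf, lty = 2)
```

For plotting purposes the two curves are nearly indistinguishable, but pnorm() is the safer choice when exact probabilities are needed.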

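To add the formula, use text() with expression() and paste(). The cumulative distribution function is conventionally written with the capital letter $\Phi$, which plotmath produces from Phi:

text(-1.5, 0.7, 
       expression(Phi(x) == paste(frac(1, sqrt(2*pi)), " ", 
       integral(e^{-t^2/2}*dt, -infinity, x))), cex = 1.2)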

The Curve of the Normal Cumulative Distribution Function

With the annotation added, the plot shows the curve of the normal cumulative distribution function together with its formula.


Plot Function in R

This article introduces the plot() function in the R language, explains the use and purpose of its arguments, and provides a few examples. With plot() one can draw different graphical representations, and its arguments can be used to enhance the graph.

Introduction to Graphics in R Language

Question: Can we draw graphics in R language?
Answer: Yes. R language produces high-quality statistical graphs. There are many useful and sophisticated kinds of graphs available in R.

Question: Where are graphics displayed in R?
Answer: In an interactive R session, graphs are produced in a graphics window, which can be resized.

Question: What is the use of the plot function in R?
Answer: In R, plot() is a generic function that can be used to make a variety of point and line graphs. The plot() function can also be used to define a coordinate space.
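What "generic" means in practice can be sketched with a few calls: the same plot() call produces different graphs depending on the class of its argument. The simulated data and variable names are assumptions for illustration.

```r
set.seed(10)
x <- rnorm(50)
g <- factor(sample(c("a", "b", "c"), 50, replace = TRUE))

op <- par(mfrow = c(2, 2))
plot(x)            # numeric vector: values against their index
plot(g)            # factor: bar chart of category counts
plot(g, x)         # factor and numeric: side-by-side boxplots
plot(density(x))   # "density" object: dispatches to plot.density()
par(op)

# Listing some of the methods registered for the generic
plot_methods <- as.character(methods(plot))
print(head(plot_methods))
```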

Important Arguments of the Plot Function in R

Question: What are the arguments of the plot() function?
Answer: The plot() function takes many arguments, such as x, y, type, xlab, and ylab. To see the full list of arguments of the default method, run the following command in the R console:

args(plot.default)

Question: Is it necessary to supply all of these arguments?
Answer: No. The first two arguments, x and y, provide the horizontal and vertical coordinates of the points or lines to be plotted and define a data-coordinate system for the graph. At least the argument x is required. Many of the other arguments have default values in the plot function.

Question: What is the use of the argument type in the plot() function?
Answer: In the R plot function, the argument type determines the type of graph to be drawn. The default, type='p', plots points at the coordinates specified by the x and y arguments. Specifying type='l' produces a line graph, and type='n' sets up the plotting region to accommodate the data but plots nothing.
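One common use of type='n' is to set up the coordinate space first and then add layers one at a time. The sketch below uses simulated data; the grouping by the sign of x is an arbitrary choice made for illustration.

```r
set.seed(3)
x <- rnorm(30)
y <- x + rnorm(30, sd = 0.5)

# Set up axes and limits from the data without drawing any points
plot(x, y, type = "n", xlab = "x", ylab = "y",
     main = "Built up on an empty coordinate space")

# Add layers one at a time
neg <- x < 0
points(x[neg],  y[neg],  pch = 1)             # open circles for negative x
points(x[!neg], y[!neg], pch = 16)            # filled circles for the rest
abline(lm(y ~ x), lty = 2)                    # least-squares line
```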

Other Types of Graphs: Setting type Argument

Question: Are there other types of graphs?
Answer: Yes. Setting type='b' draws both points and lines. Setting type='h' draws histogram-like vertical lines, and setting type='s' or type='S' draws stair-step lines that start horizontally or vertically, respectively.
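The differences between the type values are easiest to see side by side. The sketch below draws the same made-up data once per type in a grid of panels.

```r
x <- 1:10
y <- c(2, 5, 3, 8, 6, 9, 4, 7, 5, 10)   # made-up values for illustration

types <- c("p", "l", "b", "o", "h", "s", "S", "n")
op <- par(mfrow = c(2, 4), mar = c(3, 3, 2, 1))
for (tp in types) {
  plot(x, y, type = tp, main = paste0("type = '", tp, "'"))
}
par(op)
```

Here type='o' overplots points on lines, and the panel for type='n' shows only the axes.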

Question: What is the use of xlim and ylim in the plot() function?
Answer: The arguments xlim and ylim may be used to define the limits of the horizontal and vertical axes. Usually these arguments are unnecessary, because R picks reasonable limits from x and y.
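A small sketch shows how explicit limits are applied. The data values are made up, and par("usr") is used only to inspect the plotting region that R ends up with.

```r
x <- c(2, 4, 6, 8)
y <- c(-0.5, 0.2, 0.4, -0.1)

# Explicit limits wider than the data
plot(x, y, xlim = c(0, 10), ylim = c(-1, 1))

# par("usr") reports the plotting region actually used: c(x1, x2, y1, y2).
# With the default axis style, R pads each requested range by 4% at both ends.
usr <- par("usr")
print(usr)
```

Setting xaxs = "i" and yaxs = "i" in the plot() call removes this padding when the axes must end exactly at the given limits.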

Question: What is the purpose of the xlab and ylab arguments in the plot() function?
Answer: The xlab and ylab arguments take character strings that label the horizontal and vertical axes.

Examples of the Plot Function in R

Question: Provide a few examples of the R plot function.
Answer: The following are a few examples of the R plot function. Suppose you have data on the variables x and y, such as

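# x has mean 10 and SD 10; y is standard normal (mean 0, SD 1)
x <- rnorm(100, mean = 10, sd = 10)
y <- rnorm(100)

# default: points
plot(x, y)
# lines, with axis labels
plot(x, y, xlab = 'X (Mean=10, SD=10)', ylab = 'Y (Mean=0, SD=1)', type = 'l')
# points overplotted on lines
plot(x, y, xlab = 'X (Mean=10, SD=10)', ylab = 'Y (Mean=0, SD=1)', type = 'o')
# points drawn with plotting symbol 10
plot(x, y, xlab = 'X (Mean=10, SD=10)', ylab = 'Y (Mean=0, SD=1)', pch = 10)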
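With type='l', plot() joins the points in the order in which they appear in the vectors, so randomly ordered x values give a tangled line. A hedged sketch of one remedy is to sort by x first; the seed and panel titles are arbitrary choices.

```r
set.seed(7)
x <- rnorm(100, mean = 10, sd = 10)
y <- rnorm(100)

ord <- order(x)                       # indices that sort x in increasing order

op <- par(mfrow = c(1, 2))
plot(x, y, type = "l", main = "Data order")             # segments jump back and forth
plot(x[ord], y[ord], type = "l", main = "Sorted by x")  # line runs left to right
par(op)
```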