Scatter Plots In R

Introduction to Scatter Plots in R Language

Scatter plots (scatter diagrams) are bivariate graphical representations for examining the relationship between two quantitative variables. They are essential for visualizing correlations and trends in data: a scatter plot helps identify the direction and strength of the relationship and shows whether the trend is linear or non-linear. If a data set has more than two variables, one can draw a scatter plot matrix for all (or selected) pairs of quantitative variables.

Scatter plots in R can be drawn in several ways. Here, we will discuss how to make several kinds of scatter plots in R.
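The scatter plot matrix mentioned in the introduction can be sketched with base R's pairs() function. The built-in mtcars data and the three columns chosen here are only an illustration, not part of the original examples.

```r
# Scatter plot matrix for three quantitative variables (illustrative choice)
vars <- mtcars[, c("mpg", "wt", "hp")]
pairs(vars, main = "Scatter plot matrix of mpg, wt, and hp")

# Pairwise correlations give a numeric companion to the plots
cor_mat <- cor(vars)
print(round(cor_mat, 2))
```

Each off-diagonal panel is an ordinary scatter plot of one pair of variables, so the matrix can be read panel by panel.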

The plot Function in R

When two numeric vectors are passed to the plot() function (one for the horizontal and one for the vertical coordinates), its default behavior is to draw a scatter diagram. For example,

library(car)
attach(Prestige)
plot(income, prestige)

will draw a simple scatterplot of prestige by income.

The interpretation of a scatterplot is often assisted by enhancing the plot with least-squares or non-parametric regression lines. For this purpose, scatterplot() in the car package can be used; it also adds marginal boxplots for the two variables:

scatterplot(prestige ~ income, lwd = 3 )

Note that in the scatterplot, the non-parametric regression curve is drawn by a local regression smoother. Local regression works by fitting a least-squares line in the neighborhood of each observation, placing greater weight on points closer to the focal observation. A fitted value for the focal observation is extracted from each local regression, and the resulting fitted values are connected to produce the non-parametric regression line.
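The local regression idea described above can be sketched by hand in a few lines. This is only an illustration under stated assumptions: the function name local_fit, the tricube weights, the span of 0.5, and the built-in cars data are choices made here, and the smoother used by scatterplot() may differ in its details.

```r
# Hand-rolled local linear regression (illustration only, not car's exact smoother)
local_fit <- function(x, y, span = 0.5) {
  n <- length(x)
  k <- ceiling(span * n)                        # number of neighbours in each local fit
  sapply(seq_len(n), function(i) {
    d <- abs(x - x[i])                          # distance from the focal observation
    h <- sort(d)[k]                             # distance to the k-th nearest point
    w <- ifelse(d <= h, (1 - (d / h)^3)^3, 0)   # tricube weights, largest near x[i]
    dat <- data.frame(x = x, y = y, w = w)
    fit <- lm(y ~ x, data = dat, weights = w)   # weighted least-squares line
    unname(predict(fit, newdata = data.frame(x = x[i])))  # fitted value at the focal point
  })
}

fitted_vals <- local_fit(cars$speed, cars$dist, span = 0.5)
ord <- order(cars$speed)
plot(cars$speed, cars$dist, xlab = "Speed", ylab = "Stopping distance")
lines(cars$speed[ord], fitted_vals[ord], lwd = 2)   # connect the fitted values
```

A smaller span uses fewer neighbours per fit and gives a wigglier curve; a larger span gives a smoother one.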

Coded Scatterplots

The scatterplot() function can also be used to create coded scatterplots, in which a categorical variable determines the color or plotting symbol of each point. For example, let us plot prestige by income, coded by the type of occupation:

scatterplot(prestige ~ income | type)

Note that variables in the scatterplot are given in a formula-style (as y ~ x | groups).

The coded scatterplot indicates that the relationship between prestige and income may well be linear within occupation types. The slope of the relationship looks steepest for blue-collar (bc) occupations and least steep for professional and managerial occupations.
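A coded scatter plot with one least-squares line per group can also be sketched in base R. The built-in iris data is used here purely as a stand-in so the example runs without extra packages; the colors and symbols are arbitrary choices.

```r
# Base-R coded scatter plot: one colour and symbol per group (iris is a stand-in data set)
grp <- iris$Species
sym <- c(16, 17, 15)                                  # arbitrary symbol per group
plot(iris$Sepal.Length, iris$Petal.Length,
     col = as.integer(grp), pch = sym[as.integer(grp)],
     xlab = "Sepal length", ylab = "Petal length")
legend("topleft", legend = levels(grp), col = 1:3, pch = sym)

# Separate least-squares line and slope for each group
slopes <- sapply(levels(grp), function(g) {
  fit <- lm(Petal.Length ~ Sepal.Length, data = iris[grp == g, ])
  abline(fit, col = which(levels(grp) == g))
  unname(coef(fit)[2])
})
print(slopes)
```

Comparing the printed slopes is the numeric counterpart of judging by eye which group has the steepest line.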

Common Plot Symbols in R

R uses numeric values to represent different symbols. The following is a list of the most commonly used plot symbols and their corresponding numbers:

| Symbol | Code | Description |
| --- | --- | --- |
| Circle | 16 | Solid circle |
| Square | 15 | Solid square |
| Triangle | 17 | Solid triangle |
| Diamond | 18 | Solid diamond |
| Plus Sign | 3 | Plus sign |
| X | 4 | Cross (X) |
| Open Circle | 1 | Circle with no fill (default) |
| Open Square | 0 | Square with no fill |
| Open Triangle | 2 | Triangle with no fill |
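One way to check what any symbol code looks like is to draw all of them with their numbers. The sketch below does this for codes 0 to 25; the grid layout and sizes are arbitrary choices made for this illustration.

```r
# Draw plotting symbols 0 to 25 with their codes (layout values are arbitrary)
codes <- 0:25
gx <- codes %% 9            # column position
gy <- 2 - codes %/% 9       # row position, first row at the top
plot(gx, gy, pch = codes, cex = 2, bg = "grey",
     xlim = c(-0.5, 8.5), ylim = c(-0.5, 2.7),
     axes = FALSE, xlab = "", ylab = "", main = "pch codes 0 to 25")
text(gx, gy + 0.35, labels = codes, cex = 0.8)   # print each code above its symbol
```

The bg argument only affects symbols 21 to 25, which have a separate fill color.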

Customizing Your Scatter Plots in R

One can customize the scatter plot further by adjusting the point size, color, axis labels, title, and more. For example, the following draws a customized scatter plot with larger, colored points:

# Customized scatter plot
x <- 1:5                   # illustrative data
y <- c(2, 4, 6, 8, 10)     # illustrative data
plot(x, y,
     main="Customized Scatter Plot",
     xlab="X Axis Label", ylab="Y Axis Label",
     pch=17, col="red", cex=1.5,
     xlim=c(0, 6), ylim=c(0, 12))
  • pch=17: Uses a triangle symbol for points.
  • col="red": Changes the point color to red.
  • cex=1.5: Increases the point size.
  • xlim=c(0, 6) and ylim=c(0, 12): Sets the x and y axis limits.

Jittering Scatter Plots

Jittering the data by adding a small random quantity to each coordinate serves to separate the overplotted points.

data(Vocab)
attach(Vocab)
# without jittering
plot(education, vocabulary)
# with jittering
plot(jitter(education), jitter(vocabulary))

The degree of jittering can be controlled via a factor argument. For example, specifying factor = 2 doubles the jitter.

plot(jitter(education, factor = 2), jitter(vocabulary, factor = 2))
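The effect of the factor argument can be checked numerically on a small, heavily tied vector. The data and the seed below are made up for this sketch; with integer values one unit apart, jitter() adds uniform noise of at most factor/5.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
x <- rep(1:5, each = 20)         # heavily tied values, similar to years of education
j1 <- jitter(x)                  # default factor = 1
j2 <- jitter(x, factor = 2)      # doubled jitter

max(abs(j1 - x))                 # bounded by 0.2 for this data
max(abs(j2 - x))                 # bounded by 0.4 for this data
```

Because the noise is small relative to the spacing of the values, the jittered points still cluster visibly around their original positions.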

Let’s add the least-squares line and the non-parametric regression line.

abline(lm(vocabulary ~ education), lwd = 3, lty = 2)
lines(lowess(education, vocabulary, f = 0.2), lwd = 3)

The lowess() function (short for locally weighted scatterplot smoothing) returns the coordinates of the local regression curve, which lines() then draws. The f argument sets the span of the local regression, that is, the proportion of points that influence the fit at each value.
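To see what lowess() returns and how the span changes the curve, the following sketch compares two values of f. The built-in cars data is used as a stand-in for Vocab so the example runs without the car package, and the two spans are arbitrary.

```r
# Compare a narrow and a wide span on the built-in cars data (stand-in for Vocab)
fit_narrow <- lowess(cars$speed, cars$dist, f = 0.2)
fit_wide   <- lowess(cars$speed, cars$dist, f = 0.8)
str(fit_narrow)                            # a list with components x and y

plot(cars$speed, cars$dist, xlab = "Speed", ylab = "Stopping distance")
lines(fit_narrow, lwd = 2, lty = 2)        # small span: follows local wiggles
lines(fit_wide, lwd = 2)                   # large span: smoother curve
```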

Using these different kinds of graphical representations of relationships between variables may help to identify some hidden information (hidden due to overplotting).

FAQs about Scatter Plots in R

  1. How can one draw a scatter plot in R Language?
  2. What is the importance of scatter plots?
  3. What function can be used to draw scatter plots in R?
  4. What is the use of the scatterplot() function in R?
  5. What is meant by a coded scatter plot?
  6. What are jittering scatter plots in R?
  7. What are the important arguments of a plot() function to draw a scatter plot?
  8. What is meant by R Plot Symbols?


Summary

Scatter plots in R are essential for visualizing relationships between two continuous variables, detecting patterns, and identifying trends. You can customize point symbols and colors, add regression lines, and even add grid lines for clearer insights.

https://itfeature.com, https://gmstat.com

Exploring Data in R

Master Exploring Data in R with this essential guide! Learn how to use summary statistics, data visualization, and exploratory data analysis (EDA) techniques to uncover patterns, detect outliers, and prepare your datasets for machine learning. Perfect for data scientists, analysts, and researchers!

Introduction to Exploring Data in R Language

Examining (exploring) data, particularly through graphical displays, is an important prelude to statistical data analysis and modeling. Note that there are some limitations on the kinds of graphs that we can create.

One should be familiar with standard procedures for exploratory data analysis, statistical graphics, and data transformation. We can categorize the graphical representation of data based on the nature (or type) of the variables, the number of variables, and the objective of the analysis. For example, if we are comparing groups, then comparison graphs such as bar graphs can be used. If we are interested in the kind of relationship between variables, then a scatter plot can be useful.

  • Distributional Displays:
    The distributional displays include stem and leaf displays, histograms, density estimates, quantile comparison plots, and box plots.
  • Plots of the Relationship between two variables:
    The graphical representations of the relationship between two variables include various versions of scatter plots, scatter plot smoothers, bivariate density estimates, and parallel box plots.
  • Multivariate Displays:
    Multivariate graphical representations include scatter plot matrices, coplots, and dynamic three-dimensional scatter plots.

Before exploring data in R, it is important to understand the structure of your data set.

Understanding Your Data Structure

Use the following built-in functions to understand your data set first.

  • str() – Examine object structure
  • summary() – Quick statistical overview
  • head()/tail() – View first/last rows
  • dim() – Check dataset dimensions
  • class() – Identify variable types
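The functions listed above can be tried on any data frame. A minimal sketch using the built-in mtcars data (chosen here only because it ships with R):

```r
str(mtcars)               # structure: 32 observations of 11 numeric variables
summary(mtcars$mpg)       # five-number summary and mean of one variable
head(mtcars, 3)           # first three rows
tail(mtcars, 3)           # last three rows
dims <- dim(mtcars)       # number of rows and columns
dims
class(mtcars)             # class of the whole object
sapply(mtcars, class)     # class of each column
```

Running these first often reveals problems early, such as a numeric variable stored as character or an unexpected number of rows.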

Stem and Leaf Display and Histogram in R

attach(mtcars)
hist(mpg)                        # default histogram
hist(mpg, nclass = 3, col = 3)   # roughly three classes, filled with palette colour 3
stem(mpg)                        # stem-and-leaf display

Exploring Data in R: Density Estimates

Consider the following R code for a representation of distribution by smoothing the histogram.

hist(mpg, probability = TRUE, ylab = 'Density')
lines(density(mpg), lwd = 2)
points(mpg, rep(0, length(mpg)), pch = "|")
lines(density(mpg, adjust = 0.9), lwd = 1)

The hist() function constructs the histogram, with probability = TRUE specifying density scaling. The first lines() call draws the default density estimate with a doubled line thickness (lwd = 2). The points() function draws a one-dimensional scatter plot along the bottom of the graph, using a vertical bar as the plotting symbol. The second call to density() uses adjust = 0.9, which multiplies the default bandwidth by 0.9 (the default is adjust = 1) and so gives a slightly less smooth curve.
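The effect of adjust on the bandwidth can be verified directly from the objects that density() returns. This is a small check on the built-in mtcars data, not part of the original example.

```r
d_default <- density(mtcars$mpg)                 # adjust = 1 by default
d_narrow  <- density(mtcars$mpg, adjust = 0.9)   # bandwidth scaled by 0.9

d_default$bw                     # bandwidth chosen by the default rule
d_narrow$bw / d_default$bw       # ratio of the two bandwidths equals adjust
```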

Quantile Comparison Plots in R

Quantile plots help in comparing the distribution of a variable with a theoretical distribution, such as the normal distribution.

library(car)
qqPlot(mpg)

Note that the qqPlot() function is available in the car package. The older qq.plot() function is defunct.
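If the car package is not installed, a comparable normal quantile comparison plot can be sketched with base R's qqnorm() and qqline(). This is an alternative shown for illustration; unlike qqPlot(), it does not draw a confidence envelope.

```r
# Base-R normal quantile comparison plot for mpg
qq <- qqnorm(mtcars$mpg, main = "Normal Q-Q plot of mpg")
qqline(mtcars$mpg, lty = 2)      # reference line through the first and third quartiles
str(qq)                          # theoretical (x) and sample (y) quantiles
```

Points that stay close to the reference line suggest that the variable is roughly normally distributed.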

Exploring Data: Relationship Graphs

To explore the relationship between two quantitative variables, use the plot() function; for an enhanced scatter plot, use the scatterplot() function from the car package. It plots the variables along with least-squares and non-parametric regression lines. For example,

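plot(mpg, wt)
scatterplot(mpg, wt)
# Label unusual points with car names; id = list(...) is the car >= 3.0 syntax, n = 3 is arbitrary
scatterplot(mpg, wt, id = list(labels = rownames(mtcars), n = 3))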


FAQs about Exploring Data in R Language

  1. What do you mean by exploring data?
  2. What are the objectives of exploratory data analysis?
  3. What are the important visualizations for exploratory data analysis?
  4. For exploratory analysis, which graph is used for comparison purposes?
  5. For exploratory analysis, which graph is used to explore the relationship between variables?
  6. What is a quantile comparison plot?
  7. What is the objective of density estimation graphs?
  8. Name some of the multivariate plots used for EDA.


Greek Letters in R Plot Label and Title

In R, Greek letters and other mathematical symbols can be placed in plot titles, axis labels, and legends, while plot symbols mark the data points in scatter plots and other types of plots. Both can be customized to make your graphs more informative and easier to read.

Common Plot Symbols in R

R Language uses numeric values to represent different symbols. The following is a list of the most commonly used plot symbols and their corresponding numbers:

| Symbol | Code | Description |
| --- | --- | --- |
| Circle | 16 | Solid circle |
| Square | 15 | Solid square |
| Triangle | 17 | Solid triangle |
| Diamond | 18 | Solid diamond |
| Plus Sign | 3 | Plus sign |
| X | 4 | Cross (X) |
| Open Circle | 1 | Circle with no fill (default) |
| Open Square | 0 | Square with no fill |
| Open Triangle | 2 | Triangle with no fill |

Introduction to R Plot Symbols (Greek Letters)

This post is about writing Greek letters in the labels and titles of R plots. There are two main ways to include Greek letters in your R plot labels (axis labels, title, legend):

  1. Using the expression Function
    This is the recommended approach as it provides more flexibility and control over the formatting of the Greek letters and mathematical expressions.
  2. Using raw Greek letter Codes
    This method is less common and requires memorizing the character codes for each Greek letter.
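The two approaches can be compared side by side for a single letter. In this sketch the variable names and the random data are illustrative; the Unicode route is shown only as a string, because whether it displays correctly depends on the graphics device and font.

```r
# Two ways to put a Greek beta into a label
lab_expr <- expression(beta)    # plotmath expression, rendered by the graphics device
lab_code <- "\u03b2"            # Unicode escape for the same letter (code point U+03B2)

vals <- rnorm(200)              # illustrative data
hist(vals, main = lab_expr, xlab = "vals")

# To try the raw-code route, pass lab_code instead (for example main = lab_code).
utf8ToInt(lab_code)             # 946, the decimal value of U+03B2
```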

Question: How can one include Greek letters (symbols) in R plot labels?
Answer: Greek letters or symbols can be included in the titles and labels of a graph using the expression() function. Some examples follow.

Note that in these examples, random data is generated from a normal distribution. You can use your own data set to produce graphs that have symbols or Greek letters in their labels or titles.

Greek Letters in R Plot

The following are a few examples of writing Greek letters in R plot.

Example 1: Draw Histogram

mycoef <- rnorm (1000)
hist(mycoef, main = expression(beta) )

Here, beta inside expression() is rendered as the Greek letter $\beta$, so the histogram is drawn with $\beta$ as its title.

Example 2:

sample <- rnorm(mean=5, sd=1, n=100)
hist(sample, main=expression( paste("sampled values, ", mu, "=5, ", sigma, "=1" )))

Here, mu and sigma are rendered as $\mu$ and $\sigma$, respectively, in the histogram title.
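In Example 2 the values 5 and 1 are typed into the title by hand. A possible variation, sketched here with base R's bquote(), substitutes the values of variables into the label so that the title always matches the data. The variable names m, s, draws, and lab are illustrative.

```r
m <- 5                                   # mean used for sampling
s <- 1                                   # standard deviation used for sampling
draws <- rnorm(100, mean = m, sd = s)

# .(m) and .(s) are replaced by the current values of m and s
lab <- bquote(paste("sampled values, ", mu == .(m), ", ", sigma == .(s)))
hist(draws, main = lab)
```

Changing m or s and rerunning the code updates both the sample and the title.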

Example 3:

curve(dnorm, from= -3, to=3, n=1000, main="Normal Probability Density Function")

This produces a curve of the normal probability density function over the range $-3$ to $3$.

List of Common Greek Letters in R Plot

The following is a list of common Greek letters and their corresponding R expressions:

| Greek Letter | R Expression | R Example | Symbol |
| --- | --- | --- | --- |
| Alpha | alpha | expression(alpha) | $\alpha$ |
| Beta | beta | expression(beta) | $\beta$ |
| Gamma | gamma | expression(gamma) | $\gamma$ |
| Delta | delta | expression(delta) | $\delta$ |
| Theta | theta | expression(theta) | $\theta$ |
| Pi | pi | expression(pi) | $\pi$ |
| Sigma | sigma | expression(sigma) | $\sigma$ |
| Lambda | lambda | expression(lambda) | $\lambda$ |
| Rho | rho | expression(rho) | $\rho$ |
| Phi | phi | expression(phi) | $\phi$ |
| Mu | mu | expression(mu) | $\mu$ |
| Omega | omega | expression(omega) | $\omega$ |

Complex Mathematical Expressions in R Plot

One can also combine Greek letters with other mathematical notation, such as sums or integrals:

# Plot with complex mathematical expression
x = runif(100)
y = runif(100)
plot(x, y, main=expression(paste("Sum: ", sum(x[i]^2), " for all ", i)))

Normal Density Function

To add the formula of the normal density function to the curve from Example 3, use text() together with expression() and paste(), that is

text(-2, 0.3, expression(f(x) == paste(frac(1, sqrt(2*pi* sigma^2 ) ), " ", e^{frac(-(x-mu)^2, 2*sigma^2)})), cex=1.2)

The curve of the normal probability density function now carries its formula as an annotation.

Example 4:

x <- dnorm( seq(-3, 3, 0.001))
plot(seq(-3, 3, 0.001), cumsum(x)/sum(x), 
           type="l", col="blue", xlab="x", 
           main="Normal Cumulative Distribution Function")

This draws the normal cumulative distribution function as a blue line rising from 0 to 1.
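The cumulative sum in Example 4 is a numerical approximation on the interval from -3 to 3. As a quick check, it can be compared with base R's pnorm(), which returns the normal cumulative distribution function directly. The variable names below are illustrative.

```r
grid <- seq(-3, 3, 0.001)
dens <- dnorm(grid)
approx_cdf <- cumsum(dens) / sum(dens)   # approximation used in Example 4
exact_cdf  <- pnorm(grid)                # CDF computed by R

max_gap <- max(abs(approx_cdf - exact_cdf))
max_gap                                  # largest difference between the two curves

plot(grid, exact_cdf, type = "l", xlab = "x", ylab = "Probability",
     main = "Normal Cumulative Distribution Function")
lines(grid, approx_cdf, col = "blue", lty = 2)   # approximation overlaid
```

The small gap comes from the approximation ignoring the probability in the tails outside the plotted range.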

To add the formula, use text() together with expression() and paste(), that is

text(-1.5, 0.7, 
       expression(Phi(x) == paste(frac(1, sqrt(2*pi)), " ", 
       integral(e^{-t^2/2} * dt, -infinity, x))), cex = 1.2)

The Curve of the Normal Cumulative Distribution Function

The plot now shows the curve of the normal cumulative distribution function together with its formula.
