Introduction to Scatter Plots in R Language
Scatter plots (scatter diagrams) are bivariate graphical representations for examining the relationship between two quantitative variables. They are essential for visualizing correlations and trends: a scatter plot helps identify the direction and strength of the relationship and shows whether the trend is linear or non-linear. If a data set has more than two quantitative variables, one can draw a scatter plot matrix covering all (or selected) pairs of variables.
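As a brief illustration of a scatter diagram drawn for several pairs of variables at once, the sketch below uses the base R pairs() function. The built-in mtcars data set and the three columns chosen are assumptions made only for this example.

```r
# Assumption: mtcars (shipped with R) stands in for any data set
# with several quantitative variables.
vars <- c("mpg", "wt", "hp")

# pairs() draws one scatter plot for every pair of the selected columns.
pairs(mtcars[, vars], main = "Scatter plot matrix (illustrative)")
```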
Scatter plots in R can be drawn in several ways. Here, we will discuss how to make several kinds of scatter plots in R.
The plot Function in R
When two numeric vectors are provided as arguments to the plot() function in R (one for the horizontal and the other for the vertical coordinates), its default behavior is to draw a scatter diagram. For example,
library(car)
attach(Prestige)
plot(income, prestige)
will draw a simple scatterplot of prestige by income.
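The same kind of plot can be drawn without attach(), which avoids leaving variables on the search path. The sketch below is only illustrative: it assumes the built-in mtcars data set (not the Prestige data used here) so that it runs without extra packages.

```r
# Assumption: mtcars stands in for Prestige so no package is required.
# Two numeric vectors: the first gives x (horizontal), the second y (vertical).
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Equivalent formula interface (y ~ x) with a data argument.
plot(mpg ~ wt, data = mtcars)

# A numeric summary of the direction and strength of the linear relationship.
r <- cor(mtcars$wt, mtcars$mpg)
r
```

A negative value of r here corresponds to a downward-sloping cloud of points in the plot.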
The interpretation of a scatterplot is often assisted by enhancing the plot with least-squares or non-parametric regression lines. For this purpose, the scatterplot() function in the car package can be used; it also adds marginal boxplots for the two variables:
scatterplot(prestige ~ income, lwd = 3 )
Note that in the scatterplot, the non-parametric regression curve is drawn by a local regression smoother. Local regression works by fitting a least-squares line in the neighborhood of each observation, placing greater weight on points closer to the focal observation. A fitted value for the focal observation is extracted from each local regression, and the resulting fitted values are connected to produce the non-parametric regression line.
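The local regression idea described above can be sketched by hand. This is a simplified illustration, not the algorithm used by scatterplot(): the function name local_fit(), the span argument, the tricube weight function, and the mtcars example data are all assumptions, and production smoothers such as lowess() add refinements (for example, robustness iterations) that are omitted here.

```r
# Simplified local linear regression; local_fit(), span and the tricube
# weights are illustrative assumptions, not taken from the car package.
local_fit <- function(x, y, span = 0.5) {
  n <- length(x)
  k <- min(n, max(3, ceiling(span * n)))  # neighbourhood size per focal point
  fitted <- numeric(n)
  for (i in seq_len(n)) {
    d <- abs(x - x[i])                    # distances to the focal observation
    h <- sort(d)[k]                       # distance to the k-th nearest point
    w <- pmax(1 - (d / h)^3, 0)^3         # tricube: closer points weigh more
    fit <- lm(y ~ x, weights = w)         # weighted least-squares line
    fitted[i] <- predict(fit, newdata = data.frame(x = x[i]))
  }
  fitted
}

# Assumption: mtcars is used only as convenient example data.
ord <- order(mtcars$wt)
x <- mtcars$wt[ord]
y <- mtcars$mpg[ord]
plot(x, y, xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
lines(x, local_fit(x, y), lwd = 2)        # connect the fitted values
```

Each pass through the loop is one local regression; only the fitted value at the focal point is kept, and lines() joins those values into the curve.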
Coded Scatterplots
The scatterplot() function can also be used to create coded scatterplots, in which a categorical variable determines the color or plotting symbol of each point. For example, let us plot prestige by income, coded by the type of occupation:
scatterplot(prestige ~ income | type)
Note that the variables in the scatterplot are given in formula style (as y ~ x | groups).
The coded scatterplot indicates that the relationship between prestige and income may well be linear within occupation types. The slope of the relationship looks steepest for blue-collar (bc) occupations and least steep for professional and managerial occupations.
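Within-group slopes can also be compared numerically rather than by eye. The sketch below draws a coded scatter plot in base R and fits one least-squares line per group. It assumes the built-in mtcars data with cyl as the grouping variable (in place of Prestige and type) so that it runs without the car package.

```r
# Assumption: mtcars with cyl as the grouping factor replaces
# Prestige/type so the example runs in base R.
grp <- factor(mtcars$cyl)

# Color and symbol are both chosen by the integer code of each group.
plot(mpg ~ wt, data = mtcars,
     col = as.integer(grp), pch = as.integer(grp))
legend("topright", legend = levels(grp),
       col = seq_along(levels(grp)), pch = seq_along(levels(grp)),
       title = "cyl")

# Least-squares slope of mpg on wt within each group.
slopes <- sapply(split(mtcars, grp),
                 function(d) coef(lm(mpg ~ wt, data = d))[["wt"]])
slopes
```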
Common Plot Symbols in R
R uses numeric values to represent different symbols. The following is a list of the most commonly used plot symbols and their corresponding numbers:
Symbol | Code | Description |
---|---|---|
Solid Circle | 16 | Filled circle |
Solid Square | 15 | Filled square |
Solid Triangle | 17 | Filled triangle |
Solid Diamond | 18 | Filled diamond |
Plus Sign | 3 | Plus sign |
X | 4 | Cross (X) |
Open Circle | 1 | Circle with no fill (default) |
Open Square | 0 | Square with no fill |
Open Triangle | 2 | Triangle with no fill |
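A quick way to check which symbol a given code produces is to draw the codes side by side. The sketch below does this for a handful of codes; the particular selection and the layout are assumptions for illustration, and the full list is documented in ?points.

```r
# Codes to compare; see ?points for the full list.
codes <- c(0, 1, 2, 3, 4, 15, 16, 17, 18)

# Draw each symbol at its own x position, labelled with its code.
plot(seq_along(codes), rep(1, length(codes)),
     pch = codes, cex = 2, xaxt = "n", yaxt = "n",
     xlab = "pch code", ylab = "", ylim = c(0.5, 1.5))
axis(1, at = seq_along(codes), labels = codes)
```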
Customizing Your Scatter Plots in R
One can customize the scatter plot further by adjusting the point size, color, axis labels, title, and more. For example, the following draws a customized scatter plot with larger, colored points:
# Customized scatter plot (x and y are numeric vectors defined beforehand)
plot(x, y, main="Customized Scatter Plot", xlab="X Axis Label", ylab="Y Axis Label", pch=17, col="red", cex=1.5, xlim=c(0, 6), ylim=c(0, 12))
- pch=17: Uses a solid triangle symbol for points.
- col="red": Changes the point color to red.
- cex=1.5: Increases the point size to 1.5 times the default.
- xlim=c(0, 6) and ylim=c(0, 12): Set the x- and y-axis limits.
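For a version of this call that can be run directly, the vectors x and y need to exist first. The values below are hypothetical, invented only so that the points fall inside the axis limits used above.

```r
# Hypothetical data, invented only so the call can be run.
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

plot(x, y,
     main = "Customized Scatter Plot",
     xlab = "X Axis Label", ylab = "Y Axis Label",
     pch = 17, col = "red", cex = 1.5,
     xlim = c(0, 6), ylim = c(0, 12))
```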
Jittering Scatter Plots
Jittering the data by adding a small random quantity to each coordinate serves to separate the overplotted points.
data(Vocab)
attach(Vocab)
plot(education, vocabulary) # without jittering
plot(jitter(education), jitter(vocabulary))
The degree of jittering can be controlled via the factor argument. For example, specifying factor = 2 doubles the jitter.
plot(jitter(education, factor = 2), jitter(vocabulary, factor = 2))
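To see how much jitter() actually moves the points, one can measure the largest displacement for each setting. The sketch below uses simulated integer scores as a stand-in for the Vocab data; the simulated values, the seed, and the names shift1 and shift2 are assumptions for illustration. For integer-valued data like this, the default displacement should stay within about one fifth of the spacing between values, and factor = 2 doubles that bound.

```r
set.seed(1)  # make the random jitter reproducible

# Hypothetical integer-valued scores, so many points coincide.
education  <- sample(0:20, 500, replace = TRUE)
vocabulary <- sample(0:10, 500, replace = TRUE)

# Largest displacement applied by jitter() for two factor settings.
shift1 <- max(abs(jitter(education) - education))
shift2 <- max(abs(jitter(education, factor = 2) - education))
c(default = shift1, doubled = shift2)

plot(jitter(education, factor = 2), jitter(vocabulary, factor = 2))
```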
Let’s add the least-squares and non-parametric regression lines.
abline(lm(vocabulary ~ education), lwd = 3, lty = 2)
lines(lowess(education, vocabulary, f = 0.2), lwd = 3)
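The lowess function (short for locally weighted scatterplot smoothing, a form of locally weighted regression) returns the coordinates of the local regression curve, which is then drawn by lines(). The f argument sets the span of the smoother, that is, the proportion of points that influence the fit at each value; larger values of f give a smoother curve.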
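The effect of the span can be seen by drawing two lowess curves on the same plot. This is a minimal sketch that assumes the built-in mtcars data (not Vocab) so that it runs in base R; the object names lo_narrow and lo_wide are illustrative.

```r
# Assumption: mtcars is used only as convenient example data.
lo_narrow <- lowess(mtcars$wt, mtcars$mpg, f = 0.2)  # small span: wiggly curve
lo_wide   <- lowess(mtcars$wt, mtcars$mpg, f = 0.8)  # large span: smoother curve

str(lo_narrow)  # a list with components x (sorted) and y (smoothed values)

plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
lines(lo_narrow, lwd = 2, lty = 2)
lines(lo_wide, lwd = 2)
```

Because lowess() returns a list with x and y components, the result can be passed straight to lines().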
Using these different kinds of graphical representations of relationships between variables may help reveal information that would otherwise be hidden by overplotting.
FAQs about Scatter Plots in R
- How can one draw a scatter plot in R Language?
- What is the importance of scatter plots?
- What function can be used to draw scatter plots in R?
- What is the use of the scatterplot() function in R?
- What is meant by a coded scatter plot?
- What are jittering scatter plots in R?
- What are the important arguments of a plot() function to draw a scatter plot?
- What is meant by R Plot Symbols?
Summary
Scatter plots in R are essential for visualizing relationships between two continuous variables, detecting patterns, and identifying trends. You can customize the points, colors, add regression lines, and even incorporate grids for clearer insights.
https://itfeature.com, https://gmstat.com