R offers many ways to represent qualitative and quantitative data graphically. This post covers only the histogram, bar plot, and box plot.
Histogram
To visualize the distribution of a single numeric variable, a histogram can be drawn using the hist() function.
Let us use the data from the iris dataset.
attach(iris)        # make the columns of iris available by name
head(iris)          # show the first six rows
hist(Petal.Width)   # histogram of petal width
We can enhance the histogram by passing additional arguments to the hist() function. For example,
hist(Petal.Width, xlab = "Petal Width", ylab = "Frequency", main = "Histogram of Petal Width from Iris Data set", breaks = 10, col = "dodgerblue", border = "orange")
If these arguments are not provided, R chooses defaults for them, most notably the number of breaks, which it derives from the data (using Sturges' rule by default). See the YouTube tutorial for a graphical representation of the histogram.
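To see which bins R actually picks, the histogram can be computed without drawing it. The following is a minimal sketch, assuming the built-in iris data; the object names h_default, h_ten, and h_exact are chosen only for illustration.

```r
# Compute the histogram without drawing it, to inspect the bins R chose.
h_default <- hist(iris$Petal.Width, plot = FALSE)
h_default$breaks   # bin boundaries chosen by the default rule
h_default$counts   # number of observations in each bin

# A single number for `breaks` is only a suggestion; R rounds to "pretty" values.
h_ten <- hist(iris$Petal.Width, breaks = 10, plot = FALSE)
length(h_ten$breaks) - 1   # actual number of bins used

# An explicit vector of boundaries is used exactly as given.
h_exact <- hist(iris$Petal.Width, breaks = seq(0, 2.5, by = 0.25), plot = FALSE)
h_exact$breaks
```

Passing a vector of boundaries is the more predictable option when a specific bin width matters, since a single number may be adjusted by R.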
Barplots
Bar plots are well suited to the visual inspection of a categorical variable (or a numeric variable with a finite number of values) or a rank variable. For example,
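attach(mtcars)        # mtcars is a built-in dataset; attach it to refer to its columns by name
barplot(table(cyl))   # bar heights are the counts of each cylinder value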
barplot(table(cyl), ylab = "Frequency", xlab = "Cylinders (4, 6, 8)", main = "Number of cylinders ", col = "green", border = "blue")
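The bar heights come from the frequency table that table() returns, so it can help to look at that table directly. The sketch below also plots proportions instead of counts, which is an illustrative variation rather than something the examples above require; cyl_counts and cyl_props are names chosen here.

```r
# Frequency table of the number of cylinders in mtcars
cyl_counts <- table(mtcars$cyl)
cyl_counts

# Relative frequencies (proportions) instead of raw counts
cyl_props <- prop.table(cyl_counts)
cyl_props

# Same bar plot, with proportions on the vertical axis
barplot(cyl_props,
        xlab = "Cylinders", ylab = "Proportion",
        main = "Proportion of cars by number of cylinders")
```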
Boxplots
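Boxplots summarize a variable through its five-number summary (minimum, first quartile, median, third quartile, and maximum). They are used to judge symmetry or skewness, to get a rough impression of departure from normality, and to spot potential outliers in the data.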
boxplot(mpg)
boxplot(Petal.Width)
boxplot(Petal.Length)
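The numbers behind a boxplot can be inspected directly. This is a small sketch using the base R functions fivenum() and boxplot.stats() on the mpg variable; the object name bs is chosen only for illustration.

```r
# Five-number summary: minimum, lower hinge, median, upper hinge, maximum
fivenum(mtcars$mpg)

# boxplot.stats() returns the values the box plot is drawn from
bs <- boxplot.stats(mtcars$mpg)
bs$stats   # lower whisker end, lower hinge, median, upper hinge, upper whisker end
bs$out     # points flagged as outliers (by default, beyond 1.5 * IQR from the hinges)
```

The whisker ends in bs$stats match the minimum and maximum only when no points are flagged as outliers.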
However, we often compare a numerical variable for different values of a categorical variable. For example,
boxplot(mpg ~ cyl, data = mtcars)
R reads the formula mpg ~ cyl as: "plot the mpg variable against the cyl variable using the dataset mtcars". The symbol ~ is used to specify a formula in R.
boxplot(mpg ~ cyl, data = mtcars, xlab = "Cylinders", ylab = "Miles per Gallon", pch = 20, cex = 2, col = "pink", border = "black")
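The formula splits mpg into one group per value of cyl, and each box is drawn from that group's own summary. As a rough check, the sketch below computes the group medians with tapply() and compares them with what boxplot() returns when asked not to draw; group_medians and bp are illustrative names.

```r
# Median mpg within each cylinder group; these are the centre lines of the boxes
group_medians <- tapply(mtcars$mpg, mtcars$cyl, median)
group_medians

# boxplot() can return its statistics without drawing anything
bp <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)
bp$names   # group labels
bp$stats   # one column of five statistics per group
bp$n       # number of observations per group
```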
See How to perform descriptive statistics