Using ggplot2 in R Language

Introduction to using ggplot2 in R Language

ggplot2 is a popular R package that provides a flexible and elegant implementation of the grammar of graphics for creating a wide range of graphics. It breaks a plot down into fundamental components such as data, aesthetics, geometric objects, and statistical transformations. In this post, we will learn how to use ggplot2 in R Language.

There are three main strategies for plotting in R Language:

  1. base graphics, using functions such as plot(), points(), and par();
  2. lattice graphics, which produces attractive graphics, although it is not easy to create graphics for high-dimensional data with it;
  3. the ggplot2 package, which is an implementation of the “Grammar of Graphics”.

The ggplot2 package is built on the principle of layering graphical elements, which makes it flexible and customizable.

To plot using ggplot2 in R Language, a data.frame object is required as input. One then defines plot layers that stack on top of each other; each layer has visual/text elements whose properties (such as size, color, and opacity) are mapped to aesthetics. A very informative graph can be produced with this simple set of commands.

Before drawing high-quality, informative graphs, one needs to install the ggplot2 package. If ggplot2 is already installed, there is no need to run the command below again.

install.packages("ggplot2")
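
The installation step can be made conditional, so that the package is installed only when it is missing. The following is a minimal sketch; the helper name install_if_missing() is hypothetical (it is not part of R or ggplot2) and relies on the base function requireNamespace().

```r
# Hypothetical helper: install a package only if it is not already available
install_if_missing <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))  # already installed, nothing to do
  }
  install.packages(pkg)
  invisible(TRUE)             # an installation was attempted
}
```

Calling install_if_missing("ggplot2") once at the top of a script would then skip the download whenever the package is already present.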

Scatter Plot using ggplot2 in R

Let us draw a scatter plot of the variables $hp$ (horsepower) and $disp$ (displacement) from the mtcars dataset.

# first load the data set, say mtcars
# (attach() is not needed because ggplot() receives the data frame directly)
data(mtcars)

# load the ggplot2 library
library(ggplot2)

# now specify the dataset and variables
p <- ggplot(mtcars, aes(x = disp, y = hp))

# Add a plot layer with points
p <- p + geom_point()
print(p) # display/ show the plot

Note that geoms, aesthetics, and facets are three important concepts in drawing graphs with ggplot2, where

  • a geom is the type of the plot (points, lines, bars, and so on)
  • aesthetics are the shape, color, size, and alpha values used in the plot
  • facets are small multiples, each displaying a different subset of the data
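
As a rough illustration, the three concepts can appear together in a single plot. The sketch below assumes the ggplot2 package is installed; the choice of mpg for color and cyl for the panels is arbitrary.

```r
library(ggplot2)

# geom: points; aesthetics: x, y, and color; facets: one panel per value of cyl
p <- ggplot(mtcars, aes(x = disp, y = hp)) +
  geom_point(aes(color = mpg), alpha = 0.8) +
  facet_grid(. ~ cyl)

p  # display the plot
```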

When certain aesthetics are defined, an appropriate legend is chosen and displayed automatically.

p <- ggplot(mtcars, aes(x = disp, y = hp))
p <- p + geom_point(aes(color = mpg))
p

Updating Graphs using aesthetics (color, size, and shape)

Graphs can be updated by mapping variables to the aesthetics color, size, and shape. For example:

p <- ggplot(mtcars, aes(x = disp, y = hp))
p <- p + geom_point(aes(color = gear, size = wt))
p

Consider the following example. Here, the $gear$ variable is taken as a factor (grouping variable).

p <- ggplot(mtcars, aes(x = disp, y = hp))
p <- p + geom_point(aes(color = as.factor(gear), size = wt))
p

Note that the behaviour of the aesthetics is predictable and customizable.

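| Aesthetic | Discrete Variable | Continuous Variable |
|---|---|---|
| color | Rainbow of colors (one hue per group) | Gradient from dark blue to light blue (by default) |
| size | Discrete size steps (ggplot2 warns that this is not advised) | Point area scaled to the value (by default) |
| shape | Different shapes for each group | Does not work (ggplot2 raises an error) |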
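
Whether ggplot2 treats a variable as discrete or continuous depends on how it is stored: numeric columns are treated as continuous, while factors are treated as discrete. A small base-R sketch for checking this on the $gear$ variable (the name gear_f is chosen only for illustration):

```r
# gear is stored as a number, so ggplot2 would treat it as continuous
is.numeric(mtcars$gear)

# converting it to a factor makes it a discrete (grouping) variable
gear_f <- as.factor(mtcars$gear)
is.factor(gear_f)
levels(gear_f)   # the distinct groups that would each get their own color
```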

Faceting in ggplot2

A small multiple (sometimes called faceting, trellis chart, lattice chart, panel chart, or grid chart) is a series or grid of small similar graphics or charts for comparison purposes. Usually, these small multiples are used to display different subsets of the data and these multiples are useful for exploring some conditional relationship between variables (especially when data is large enough).

Let us examine the faceting of different types. The following are some examples of subsetting the scatterplot in facets

# Create a basic scatter plot
p <- ggplot(mtcars, aes(x = disp, y = hp))
p <- p + geom_point()

# columns are cyl categories
p1 <- p + facet_grid(. ~ cyl)

# rows are cyl categories
p2 <- p + facet_grid(cyl ~ .)

# rows are carb categories
p3 <- p + facet_grid(carb ~ .)

# columns are am categories
p4 <- p + facet_grid(. ~ am)

# plot all four in one figure
library(gridExtra)
grid.arrange(grobs = list(p1, p2, p3, p4), ncol = 2, top = "Facet Examples")
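
The facet_grid() function lays panels out strictly along rows and/or columns. As a complement, facet_wrap() wraps the panels of a single variable into a grid. The sketch below assumes ggplot2 is installed; the choice of cyl and ncol = 2 is only illustrative.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = disp, y = hp)) + geom_point()

# wrap the cyl panels into a grid with two columns
p_wrap <- p + facet_wrap(~ cyl, ncol = 2)
p_wrap
```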

https://itfeature.com

https://gmstat.com

Vector in R Language

A vector in R is an ordered collection of values of the same type, most often numbers. A vector can be thought of as a single column or a single row of a spreadsheet. Technically, however, an R vector has no column/row structure (it has no dimensions); its elements are simply ordered, so they can be referred to by index.
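
A short sketch of these properties, using an arbitrary vector v chosen for illustration:

```r
v <- c(2.5, 4, 6.5)

length(v)      # number of elements
is.vector(v)   # TRUE
dim(v)         # NULL: a vector has no row/column structure
v[2]           # elements are ordered, so they can be referred to by index
```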

Creating Vector in R

# Creating a vector with the c function

c(1, 4, 6, 7, 9)

c(1:5, 10)

A vector in R Language can also be created using the seq() function, which generates a sequence of numbers.

# Create a vector using seq() function

seq(1, 10, by = 2)
seq(0, 50, length.out = 11)
seq(1, 50, length.out = 11)
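
With by, the step is fixed and the number of elements follows from it. With length.out (which length = abbreviates), the number of elements is fixed and the step is computed as (to - from) / (length.out - 1). A small sketch of this arithmetic, with the names s1 and s2 chosen only for illustration:

```r
s1 <- seq(0, 50, length.out = 11)
s2 <- seq(1, 50, length.out = 11)

diff(s1)[1]   # step: (50 - 0) / 10 = 5
diff(s2)[1]   # step: (50 - 1) / 10 = 4.9
length(seq(1, 10, by = 2))   # 1, 3, 5, 7, 9 gives 5 elements
```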

A vector can also be created in R using the colon (:) operator. The following are examples:

# Create vector using : operator

1:10

## Output
[1]  1  2  3  4  5  6  7  8  9 10

5:1

## Output
[1] 5 4 3 2 1

Non-integer sequences can also be created in R Language.

# non-integer sequences
seq(0, 100*pi, by = pi)

One can assign a vector to a variable using the assignment operator (<-) or the equals sign (=). For example:

a <- 1:5
b <- seq(15, 3, length=5)
d <- a * b   # named d rather than c to avoid confusion with the c() function
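
Arithmetic on vectors is carried out element by element, and a single value is recycled across the whole vector. A brief sketch using the same a and b:

```r
a <- 1:5
b <- seq(15, 3, length.out = 5)   # 15 12 9 6 3

a * b    # element-wise product: 15 24 27 24 15
a + 10   # the single value 10 is recycled: 11 12 13 14 15
```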

There are a lot of built-in functions that can be used to perform different computations on vectors. For example,

a <- 1:5

# compute the total of elements of a vector
sum(a)

## Output
[1] 15

# product of elements of a vector
prod(a)

## Output
[1] 120

# average of the vector
mean(a)

## Output
[1] 3

# standard deviation and variance of a vector
sd(a)

## Output
[1] 1.581139

var(a)

## Output
[1] 2.5

One can extract the elements of a vector by using square brackets and the index of the component of the vector.

V <- seq(0, 100, by = 10)
V[] # gives all the elements of the vector

## Output
[1]   0  10  20  30  40  50  60  70  80  90 100

V[5] # 5th element of vector V

## Output
[1] 40

V[c(2, 4, 6, 8)] # 2nd, 4th, 6th, and 8th elements

## Output
[1] 10 30 50 70

V[-c(2, 4, 6, 8)] # elements except 2nd, 4th, 6th, and 8th element

## Output
[1]   0  20  40  60  80  90 100
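
Besides positive and negative positions, a logical condition can be placed inside the square brackets to select the elements for which it is TRUE. A minimal sketch with the same vector V (the conditions are arbitrary examples):

```r
V <- seq(0, 100, by = 10)

V[V > 50]            # elements greater than 50: 60 70 80 90 100
which(V > 50)        # their positions: 7 8 9 10 11
V[V %% 20 == 0]      # multiples of 20: 0 20 40 60 80 100
```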

Specific elements of a vector can be updated:

V[c(2, 4)] <- c(500, 600) # the 2nd and 4th elements are updated to 500 and 600

The important points about vectors in R language are:

  • Data Types: Vectors can hold logical, integer, double, character, complex, or raw data.
  • Creation: Use the c() function to combine elements into a vector.
  • Accessing Elements: Use indexing (square brackets) to access individual elements.
  • Vector Operations: Perform arithmetic, logical, and comparison operations on vectors.
  • Vectorization: R excels at vectorized operations, making calculations efficient.
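
A few of these points can be checked directly at the console. The sketch below uses arbitrary example values; typeof() reports the storage type of a vector.

```r
typeof(c(TRUE, FALSE))   # "logical"
typeof(1:3)              # "integer"
typeof(c(1.5, 2))        # "double"
typeof(c("a", "b"))      # "character"

# mixing types coerces every element to the most flexible type
mixed <- c(1, "a", TRUE)
typeof(mixed)            # "character"

# vectorized comparison, no explicit loop needed
x <- c(3, 8, 1)
x > 2                    # TRUE TRUE FALSE
```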

Learn How to Create User Defined Functions in R

Introduction to User Defined Functions in R

One can easily create user-defined functions in R Language. User-defined functions allow one to write custom blocks of code that can be reused throughout an analysis. This article presents some useful examples of how to write user-defined functions in R Language. Writing functions helps to produce more efficient and more elegant code.

Assigning Function to a Variable

Example 1: Create a simple function and assign it to a variable name, as we do with any other object.

f <- function(x, y = 0){
	z <- x + y
	z
}

x <- rnorm(10)
y <- rnorm(10)
f(x)      # uses the default y = 0
f(x, y)   # adds the two vectors element-wise
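
As a hedged illustration of how the default value y = 0 behaves, the sketch below defines the same f() and calls it with and without the second argument; the input values are arbitrary.

```r
f <- function(x, y = 0) {
  z <- x + y
  z            # the last evaluated expression is returned
}

f(1:3)           # y defaults to 0: 1 2 3
f(1:3, y = 10)   # 11 12 13
```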

Regression Coefficients

Example 2: Given an $n\times 1$ vector $y$ and a matrix $X$, the least-squares estimate of the regression coefficients is $\hat{\beta} = (X'X)^{-1}X'y$, where $(X'X)^{-1}$ is the inverse (in general, a generalized inverse) of $X'X$.

Beta <- function(x, y){
		X <- qr(x)
		qr.coef(X, y)
	}

xmat <- with(mtcars, cbind(1, hp, wt))   # design matrix with an intercept column
yvar <- mtcars$mpg
regcoef <- Beta(xmat, yvar)

The qr() function computes the QR decomposition of a matrix. The QR decomposition is used to solve the equation $Ax=b$ for a given matrix $A$ and vector $b$. It is very useful in computing regression coefficients and in applying the Newton-Raphson algorithm.
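
One way to check the QR-based coefficients is to compare them with the normal-equations formula and with lm(). The following is a sketch under the assumption that the model is mpg regressed on hp and wt, as in the example; the names b_qr, b_normal, and b_lm are chosen only for illustration.

```r
Beta <- function(x, y) {
  qr.coef(qr(x), y)   # least-squares coefficients via the QR decomposition
}

X <- cbind(intercept = 1, hp = mtcars$hp, wt = mtcars$wt)
y <- mtcars$mpg

b_qr     <- Beta(X, y)
b_normal <- drop(solve(t(X) %*% X, t(X) %*% y))   # normal equations
b_lm     <- coef(lm(mpg ~ hp + wt, data = mtcars))

# the three sets of coefficients should agree up to numerical error
all.equal(unname(b_qr), unname(b_normal))
all.equal(unname(b_qr), unname(b_lm))
```

The QR route avoids forming $X'X$ explicitly, which is generally the numerically safer choice.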

Removing all Objects from globalenv

Example 3: Create a function capable of removing all objects from the globalenv.

clear <- function(env = globalenv() ){
		obj = ls(envir = env)
		rm(list = obj, envir = env)
}

The clear() function removes all objects from the specified environment and seems to work correctly. However, clear() also removes itself, and as a result, it cannot be reused without being defined again.

The clear() function can be improved to keep the function clear() when all other objects are deleted.

clear <- function(env = globalenv()){
		objects <- objects(env)
		objects <- objects[objects != "clear"]
		rm(list = objects, envir = env)
		invisible(NULL)
	}
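
To experiment with this idea without wiping the global environment, the function can be tried on a scratch environment. The sketch below uses a hypothetical variant, clear_env(), with an assumed keep argument listing the names that must survive; neither name comes from the original example.

```r
# Hypothetical variant of clear(): `keep` lists object names that must survive
clear_env <- function(env, keep = character(0)) {
  objs <- ls(envir = env)
  rm(list = setdiff(objs, keep), envir = env)
  invisible(NULL)
}

sandbox <- new.env()          # a scratch environment, so globalenv() is untouched
assign("a", 1, envir = sandbox)
assign("b", 2, envir = sandbox)
assign("clear", clear_env, envir = sandbox)

clear_env(sandbox, keep = "clear")
ls(envir = sandbox)           # only "clear" remains
```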

Computing Measures of Central Tendency

Example 4: Create a function that computes some basic measures of central tendency.

center = function(x, type){
	switch(type, 
		mean = mean(x),
		median = median(x),
		trimmed = mean(x, trim = 0.1))
}

center(airquality$Temp, "mean")     # for calculation of mean
center(airquality$Temp, "median")   # for calculation of median
center(airquality$Temp, "trimmed")  # for calculation of trimmed mean
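
When type matches none of the alternatives, switch() returns NULL invisibly, so a misspelled type goes unnoticed. One possible safeguard, sketched here as a hypothetical variant named center2(), is to validate the argument with match.arg(), which also makes the first choice the default.

```r
center2 <- function(x, type = c("mean", "median", "trimmed")) {
  type <- match.arg(type)   # stops with an error for an unknown type
  switch(type,
    mean    = mean(x),
    median  = median(x),
    trimmed = mean(x, trim = 0.1)
  )
}

center2(airquality$Temp)             # defaults to the first choice, "mean"
center2(airquality$Temp, "median")

# an unknown type now raises an error, caught here for demonstration
result <- tryCatch(center2(airquality$Temp, "mode"),
                   error = function(e) "unknown type")
result
```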

Note that user-defined functions in R can incorporate conditional statements, loops, and other constructs to perform more advanced tasks. They can also have default parameter values for added flexibility.