Using ggplot2 in R Language

Introduction to using ggplot2 in R Language

ggplot2 is a popular R package that provides a flexible and elegant grammar of graphics for creating a wide range of graphics. It breaks plots down into fundamental components such as data, aesthetics, geometric objects, and statistical transformations. In this post, we will learn about using ggplot2 in R Language.

There are three strategies for plotting in R language.

  1. Base graphics, using functions such as plot(), points(), and par().
  2. Lattice graphics, which produce attractive plots but make graphics for high-dimensional data harder to build.
  3. The ggplot2 package, an implementation of the “Grammar of Graphics”.

ggplot2 is built on the principle of layering graphical elements, which makes it flexible and customizable.

To plot using ggplot2 in R Language, a data.frame object is required as input. One then defines plot layers that stack on top of each other; each layer has visual or text elements that are mapped to aesthetics (such as size, color, and opacity). With this small set of commands, a very informative graph can be produced.

Before drawing high-quality, informative graphs, one needs to install the ggplot2 package using the command below. This step is needed only once; skip it if ggplot2 is already installed.

install.packages("ggplot2")

Scatter Plot using ggplot2 in R

Let us draw a scatter plot of the variables $hp$ (horsepower) and $disp$ (displacement) from the mtcars dataset.

# first load the data set, say mtcars (it ships with R)
data(mtcars)

# load the ggplot2 library
library(ggplot2)

# now specify the dataset and variables
p <- ggplot(mtcars, aes(x = disp, y = hp))

# Add a plot layer with points
p <- p + geom_point()
print(p) # display/ show the plot

Note that geoms, aesthetics, and facets are three important concepts in drawing graphs using ggplot2, where

  • a geom is the type of plot (points, lines, bars, and so on)
  • aesthetics are the shape, color, size, and alpha values used in the plot
  • facets are small multiples, each displaying a different subset of the data

When certain aesthetics are defined, an appropriate legend is chosen and displayed automatically.

p <- ggplot(mtcars, aes(x = disp, y = hp))
p <- p + geom_point(aes(color = mpg))
p

Updating Graphs using aesthetics (color, size, and shape)

Graphs can be updated by mapping variables to the aesthetics color, size, and shape. For example:

p <- ggplot(mtcars, aes(x = disp, y = hp))
p <- p + geom_point(aes(color = gear, size = wt))
p

Consider the following example. Here, the $gear$ variable is taken as a factor (grouping variable).

p <- ggplot(mtcars, aes(x = disp, y = hp))
p <- p + geom_point(aes(color = as.factor(gear), size = wt))
p

Note that the behaviour of the aesthetics is predictable and customizable.

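Aesthetic | Discrete Variable               | Continuous Variable
----------|---------------------------------|--------------------------------------------------------
color     | Rainbow of colors               | Gradient from dark blue to light blue (ggplot2 default)
size      | Discrete size steps             | Point area scaled with the value
shape     | Different shapes for each group | Not supported (ggplot2 raises an error)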
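
As a rough sketch of what "customizable" can mean here, the example below overrides the default continuous color scale with a red-to-blue gradient and maps shape to a discrete variable. It assumes ggplot2 is installed; the choice of colors and of cyl as the shape variable is only illustrative.

```r
library(ggplot2)

# color follows a continuous variable, shape follows a discrete one
p_custom <- ggplot(mtcars, aes(x = disp, y = hp)) +
  geom_point(aes(color = mpg, shape = factor(cyl)), size = 3) +
  # replace the default gradient with an explicit red-to-blue one
  scale_color_gradient(low = "red", high = "blue") +
  labs(color = "mpg", shape = "cyl")

p_custom
```

Wrapping cyl in factor() matters: shape only accepts discrete variables, so a numeric column has to be converted first.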

Faceting in ggplot2

A small multiple (sometimes called faceting, trellis chart, lattice chart, panel chart, or grid chart) is a series or grid of small similar graphics or charts for comparison purposes. Usually, these small multiples are used to display different subsets of the data and these multiples are useful for exploring some conditional relationship between variables (especially when data is large enough).

Let us examine different types of faceting. The following are some examples of splitting the scatter plot into facets:

# Create a basic scatter plot
p <- ggplot(mtcars, aes(x = disp, y = hp))
p <- p + geom_point()

# columns are cyl categories
p1 <- p + facet_grid(. ~ cyl)

# rows are cyl categories
p2 <- p + facet_grid(cyl ~ .)

# rows are carb categories, columns are cyl categories
p3 <- p + facet_grid(carb ~ cyl)

# wrap plots by am
p4 <- p + facet_wrap(~ am)

# plot all four in one 
library(gridExtra)
grid.arrange(grobs = list(p1, p2, p3, p4), ncol = 2, top = "Facet Examples")

https://itfeature.com

https://gmstat.com

Vector in R Language

A vector in R is an ordered collection of values of the same type, most often numbers. A vector can be thought of as a single column or a single row of a spreadsheet. Strictly speaking, however, an R vector has no column/row structure; it is only ordered, which is why its elements can be referred to by index.

In R programming, vectors are the most basic data structure. Whether you’re new to R or brushing up on concepts, understanding vectors is essential, because they are the building blocks for more complex structures such as matrices, lists, and data frames.

Key Characteristics of Vectors

  • Support Vectorized Operations: Arithmetic and logical operations can be applied element-wise without loops.
  • Homogeneous: All elements must be of the same data type (such as numeric, character, logical, etc.).
  • Indexed: Elements can be accessed using indices (starting at 1).
  • Dynamic: Vectors can grow or shrink in size.
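
The four characteristics above can be seen in a few lines of R. This is only a small sketch; the variable names are made up for illustration.

```r
x <- c(2, 4, 6)
y <- c(1, 2, 3)

# Vectorized operations: applied element by element, no loop needed
total  <- x + y      # 3 6 9
bigger <- x > 3      # FALSE TRUE TRUE

# Homogeneous: mixing types coerces every element to one common type
mixed <- c(1, "a", TRUE)
class(mixed)         # "character"

# Indexed: positions start at 1
first <- x[1]        # 2

# Dynamic: append with c(), drop with a negative index
x <- c(x, 8)         # x is now 2 4 6 8
x <- x[-1]           # x is now 4 6 8
length(x)            # 3
```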

Types of Vectors in R Language

R supports several types of vectors based on the data they store:

(a) Numeric Vectors: Store real numbers (decimals or integers). For example: > c(1.5, 2.3, 4.0)

(b) Integer Vectors: Store whole numbers (explicitly defined with L). For example: > c(1L, 2L, 3L)

(c) Logical Vectors: Store TRUE, FALSE, or NA (missing value). For example: > c(TRUE, FALSE, NA)

(d) Character Vectors: Store text (strings). For example: > c("apple", "banana", "cherry")

(e) Complex Vectors: Store complex numbers. For example: > c(1+2i, 3+4i)
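
One way to check which type a vector has is typeof(). The short sketch below applies it to the examples above; note that numeric vectors written with decimals are stored as "double".

```r
# typeof() reports the storage type of each example vector
types <- sapply(
  list(
    numeric   = c(1.5, 2.3, 4.0),
    integer   = c(1L, 2L, 3L),
    logical   = c(TRUE, FALSE, NA),
    character = c("apple", "banana", "cherry"),
    complex   = c(1+2i, 3+4i)
  ),
  typeof
)
types
# the values are "double", "integer", "logical", "character", "complex"
```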

Creating Vectors in R

One can create vectors in R Language using:

  • c() function
  • seq()
  • : operator

# Creating a vector with the c() function

c(1, 4, 6, 7, 9)

c(1:5, 10)

A vector in R can also be created using seq(), which generates a sequence of numbers.

# Create a vector using seq() in R

seq(1, 10, by = 2)
seq(0, 50, length.out = 11)
seq(1, 50, length.out = 11)

A vector can also be created in R using the colon (:) operator, as the following examples show:

# Create vector in R using : operator

1:10

## Output
[1]  1  2  3  4  5  6  7  8  9 10

5:1

## Output
[1] 5 4 3 2 1

Creating Non-Integer Sequences in R

The non-integer sequences can also be created in the R Language.

# non-integer sequences
seq(0, 100*pi, by = pi)

Assigning Vector to Variable

One can assign a vector to a variable using the assignment operator (<-) or equal symbol (=). The examples are:

a <- 1:5
b <- seq(15, 3, length.out = 5)
d = a * b   # the equal symbol also assigns; "d" avoids reusing the name of c()

Performing Computation on Vectors

There are a lot of built-in functions that can be used to perform different computations on vectors. For example,

a <- 1:5

# compute the total of elements of a vector
sum(a)

## Output
15

# product of elements of a vector
prod(a)

## Output
120

# average of the vector
mean(a)

## Output
3

# standard deviation and variance of a vector
sd(a)

## Output 
1.581139

var(a)

## Output
2.5

Indexing and Slicing Vectors

One can extract the elements of a vector by using square brackets and the index of the component of the vector.

V <- seq(0, 100, by = 10)
V[] # gives all the elements of the vector

## Output
[1]   0  10  20  30  40  50  60  70  80  90 100

V[5] # 5th element of vector V

## Output
[1] 40

V[c(2, 4, 6, 8)] # 2nd, 4th, 6th, and 8th elements

## Output
[1] 10 30 50 70

V[-c(2, 4, 6, 8)] # elements except 2nd, 4th, 6th, and 8th element

## Output
[1]   0  20  40  60  80  90 100
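
Besides single positions and explicit index vectors, a contiguous slice or a logical condition can also go inside the square brackets. A minimal sketch, reusing the same vector V:

```r
V <- seq(0, 100, by = 10)

# slice a contiguous range of positions with the : operator
slice <- V[2:4]          # 10 20 30

# keep only the elements that satisfy a condition
large <- V[V > 50]       # 60 70 80 90 100

# which() returns the positions where the condition holds
pos <- which(V > 50)     # 7 8 9 10 11
```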

Updating Vector Elements

Specific elements of a vector can be updated by assigning new values to the chosen indices:

V[c(2, 4)] <- c(500, 600) # the 2nd and 4th elements are updated to 500 and 600

Special Vector Values

The following are special vector values used in R Language.

Special Value | Meaning       | Example
--------------|---------------|-------------
NA            | Missing value | c(1, NA, 3)
NaN           | Not a Number  | 0/0 → NaN
Inf           | Infinity      | 1/0 → Inf
NULL          | Empty object  | c() → NULL
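
These special values are usually detected with dedicated test functions rather than with ==. The sketch below shows the common ones from base R; the vector v is only an illustration.

```r
v <- c(1, NA, 3)

na_flags  <- is.na(v)              # FALSE TRUE FALSE
raw_sum   <- sum(v)                # NA, because one element is missing
clean_sum <- sum(v, na.rm = TRUE)  # 4, missing values are dropped first

is.nan(0/0)        # TRUE
is.na(0/0)         # TRUE, NaN is also treated as missing
is.infinite(1/0)   # TRUE
is.null(c())       # TRUE
length(NULL)       # 0
```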

Important Points About Vectors

The important points about vectors in R language are:

  • Data Types: Vectors can hold logical, integer, double, character, complex, or raw data.
  • Creation: Use the c() function to combine elements into a vector.
  • Accessing Elements: Use indexing (square brackets) to access individual elements.
  • Vector Operations: Perform arithmetic, logical, and comparison operations on vectors.
  • Vectorization: R excels at vectorized operations, making calculations efficient.

Learn How to Create User Defined Functions in R

Introduction to User Defined Functions in R

One can easily create user-defined functions in R Language. User-defined functions allow one to write custom blocks of code that can be reused throughout an analysis. This article presents some useful examples of how to write user-defined functions in R Language. Such functions help make code more efficient and often more elegant.

Assigning Function to a Variable

Example 1: Create a simple function and assign the function to a variable name as we do with any other objects.

f <- function(x, y = 0) {
  z <- x + y
  z
}

x <- rnorm(10)
f(x)      # y takes its default value 0, so x is returned unchanged
f(x, 1)   # adds 1 to every element of x

Regression Coefficients

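Example 2: Given an $n\times 1$ response vector $y$ and a design matrix $X$, the least-squares estimate of the regression coefficients is $\hat{\beta} = (X'X)^{-1}X'y$, where $(X'X)^{-1}$ is the inverse of $X'X$ (a generalized inverse when $X'X$ is singular).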

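Beta <- function(x, y) {
  X <- qr(x)       # QR decomposition of the design matrix
  qr.coef(X, y)    # least-squares coefficients
}

# design matrix with an intercept column, built without attach()
xmat <- with(mtcars, cbind(1, hp, wt))
yvar <- mtcars$mpg
regcoef <- Beta(xmat, yvar)
regcoef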

The qr() function computes the QR decomposition of a matrix. The QR decomposition is used to solve the equation $Ax=b$ for a given matrix $A$ and vector $b$. It is very useful in computing regression coefficients and in applying the Newton-Raphson algorithm.
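
As a sanity check, the QR-based coefficients can be compared with the normal-equations formula and with R's built-in lm(). This is a sketch under the assumption that the same mtcars variables (mpg, hp, wt) are used; the names b_qr, b_ne, and b_lm are chosen here only for illustration.

```r
# QR-based least-squares coefficients (same idea as Example 2)
Beta <- function(x, y) {
  qr.coef(qr(x), y)
}

X <- with(mtcars, cbind(intercept = 1, hp = hp, wt = wt))
y <- mtcars$mpg

b_qr <- Beta(X, y)
# normal equations: solve (X'X) b = X'y
b_ne <- drop(solve(crossprod(X), crossprod(X, y)))
# built-in linear model fit
b_lm <- coef(lm(mpg ~ hp + wt, data = mtcars))

# the three approaches should agree up to numerical tolerance
all.equal(unname(b_qr), unname(b_ne))
all.equal(unname(b_qr), unname(b_lm))
```

The QR route is generally preferred numerically because it avoids forming $X'X$ explicitly.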


Removing all Objects from globalenv

Example 3: Create a function capable of removing all objects from the globalenv.

clear <- function(env = globalenv() ){
		obj = ls(envir = env)
		rm(list = obj, envir = env)
}

The clear() function removes all objects from the specified environment and appears to work correctly. However, it also removes itself when it is defined in that environment, so it cannot be reused without being defined again.

The clear() function can be improved so that it keeps itself when all other objects are deleted.

clear <- function(env = globalenv()){
		objects <- objects(env)
		objects <- objects[objects != "clear"]
		rm(list = objects, envir = env)
		invisible(NULL)
	}
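
To try such a function without wiping the real workspace, it can be pointed at a throwaway environment. The sketch below restates the improved clear() (with the local variable renamed to objs) and uses a hypothetical environment called scratch.

```r
clear <- function(env = globalenv()) {
  objs <- ls(envir = env)            # names of visible objects in env
  objs <- objs[objs != "clear"]      # keep the function itself
  rm(list = objs, envir = env)
  invisible(NULL)
}

# a scratch environment, so the global workspace is left untouched
scratch <- new.env()
assign("a", 1:3, envir = scratch)
assign("b", "text", envir = scratch)

before <- ls(scratch)   # "a" "b"
clear(scratch)
after <- ls(scratch)    # character(0)
```

Note that ls() skips names beginning with a dot by default, so such hidden objects would survive this version of clear().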

Computing Measure of Central Tendency

Example 4: Create a function that can compute some basic measures of central tendency.

center = function(x, type){
	switch(type, 
		mean = mean(x),
		median = median(x),
		trimmed = mean(x, trim = 0.1))
}

# refer to the column directly instead of attaching the data set
center(airquality$Temp, "mean")     # for calculation of mean
center(airquality$Temp, "median")   # for calculation of median
center(airquality$Temp, "trimmed")  # for calculation of trimmed mean

Note that the user-defined functions in R can incorporate conditional statements, loops, and other functionalities to perform more advanced tasks. They can also have default parameter values for added flexibility.
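
A possible variation of Example 4 that uses default parameter values, a conditional check, and a loop is sketched below. The name center2, the trim argument, and the error message are illustrative assumptions, not part of the original example.

```r
center2 <- function(x, type = c("mean", "median", "trimmed"), trim = 0.1) {
  type <- match.arg(type)                       # the default is the first choice, "mean"
  if (!is.numeric(x)) stop("x must be numeric") # conditional input check
  switch(type,
         mean    = mean(x),
         median  = median(x),
         trimmed = mean(x, trim = trim))
}

center2(airquality$Temp)   # uses the default type, "mean"

# loop over all three measures and collect the results by name
results <- numeric(0)
for (tp in c("mean", "median", "trimmed")) {
  results[tp] <- center2(airquality$Temp, tp)
}
results
```

Using match.arg() means a misspelled type raises an error instead of silently returning NULL, which is what a plain switch() would do.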
