Introduction: Matrices in R

In R, a matrix is a two-dimensional rectangular data set in which all elements have the same mode (numeric, character, etc.) and all columns have the same length. A matrix can be created by passing a vector to the matrix() function.
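As a small illustration of the single-mode rule, the sketch below builds one purely numeric matrix and one from mixed input. The object names m_num and m_mix are chosen here for illustration only.

```r
# A matrix holds a single mode: mixing numbers and text
# coerces every element to character
m_num <- matrix(c(1, 2, 3, 4), nrow = 2)
m_mix <- matrix(c(1, "a", 3, "b"), nrow = 2)

mode(m_num)   # "numeric"
mode(m_mix)   # "character"
dim(m_num)    # 2 2
```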

The general syntax of creating matrices in R is:

matrix_name <- matrix(vector, nrow = r, ncol = c,
                         byrow = FALSE, dimnames = list(char_vector_rownames,
                                                        char_vector_colnames)
)

byrow = TRUE indicates that the matrix will be filled by rows; with the default, byrow = FALSE, it is filled by columns.

dimnames provides optional labels for the rows and columns.

Creating Matrices in R

Following the general syntax of the function matrix() in R, let us create a matrix from a vector of the integers 1 to 20.

Example 1:

# Generate a matrix with 5 rows and 4 columns (filled by columns, the default)
y1 <- matrix(1:20, nrow = 5, ncol = 4); y1

# Output
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20

# Same result with byrow = FALSE stated explicitly
y2 <- matrix(1:20, nrow = 5, ncol = 4, byrow = FALSE); y2

# Output
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20

# Fill the matrix by rows
y3 <- matrix(1:20, nrow = 5, ncol = 4, byrow = TRUE); y3

# Output
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
[4,]   13   14   15   16
[5,]   17   18   19   20

Example 2:

elements  <- c(11, 23, 29, 67)
# Named row_names/col_names to avoid masking the rownames() and colnames() functions
row_names <- c("R1", "R2")
col_names <- c("C1", "C2")

m1 <- matrix(elements, nrow = 2, ncol = 2, byrow = TRUE,
             dimnames = list(row_names, col_names))
m1

# Output
   C1 C2
R1 11 23
R2 29 67

Try Example 2 above with the following argument values:

nrow = 4 and ncol = 1, byrow = FALSE

Note the difference. You may also get errors related to the number of rows or columns. Therefore, if you change the number of rows or columns, ensure that the number of row names and column names matches the new dimensions.
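The sketch below shows one way the exercise can go. It assumes the same four elements as Example 2; the names res and m_col are illustrative. When the labels do not match the new shape, matrix() signals an error, which tryCatch() captures here so the script keeps running.

```r
elements <- c(11, 23, 29, 67)

# Two row names for a 4 x 1 matrix: matrix() signals an error,
# and tryCatch() returns its message instead of stopping the script
res <- tryCatch(
  matrix(elements, nrow = 4, ncol = 1, byrow = FALSE,
         dimnames = list(c("R1", "R2"), "C1")),
  error = function(e) conditionMessage(e)
)
res

# Labels that match the new shape (4 row names, 1 column name) work
m_col <- matrix(elements, nrow = 4, ncol = 1, byrow = FALSE,
                dimnames = list(c("R1", "R2", "R3", "R4"), "C1"))
m_col
```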

Matrix Operations in R Language

In the R language, there are some operators and functions that can be used to perform computation on one or more matrices. Some basic matrix operations in R are:

Matrix Operation      Operator/Function
Add/Subtract          +, -
Multiply              %*%
Transpose             t()
Inverse               solve()
Extract Diagonal      diag() (also described at the end)
Determinant           det()

The following are some examples related to these operators and matrix functions.

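m1 <- matrix(c(11, 23, 9, 35), nrow = 2)
m2 <- matrix(c(5, 19, 11, 20), nrow = 2)
m3 <- m1 + m2                  # element-wise addition
m4 <- m1 - m2                  # element-wise subtraction
m5 <- m1 %*% m2                # matrix multiplication
m6 <- m1 / m2                  # element-wise division
m1t <- t(m1)                   # transpose
m1tminv <- solve(m1t %*% m1)   # inverse of t(m1) %*% m1
diag(m1tminv)                  # diagonal elements of the inverse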

# Output
> m1
     [,1] [,2]
[1,]   11    9
[2,]   23   35

> m2
     [,1] [,2]
[1,]    5   11
[2,]   19   20

> m3
     [,1] [,2]
[1,]   16   20
[2,]   42   55

> m4
     [,1] [,2]
[1,]    6   -2
[2,]    4   15

> m5
     [,1] [,2]
[1,]  226  301
[2,]  780  953

> m6
         [,1]      [,2]
[1,] 2.200000 0.8181818
[2,] 1.210526 1.7500000

> m1t
     [,1] [,2]
[1,]   11   23
[2,]    9   35
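
The table of operations also lists det() and solve(), which the example above applies only to t(m1) %*% m1. A minimal sketch applying them to m1 itself follows; the name m1inv is illustrative.

```r
m1 <- matrix(c(11, 23, 9, 35), nrow = 2)

det(m1)                   # 11 * 35 - 9 * 23 = 178
m1inv <- solve(m1)        # inverse of m1
round(m1inv %*% m1, 10)   # identity matrix, up to floating-point rounding
diag(m1)                  # diagonal elements: 11 35
```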

Several other functions perform common computations on matrices in R. They are described below for a matrix $X$; you can substitute your own matrix.

Consider the following matrix X:

X <- matrix(1:20, nrow = 4, ncol = 5) 
X

Function          Description
rowSums(X)        Compute the sum of each row of the matrix $X$
colSums(X)        Compute the sum of each column of the matrix $X$
rowMeans(X)       Compute the average value of each row of the matrix $X$
colMeans(X)       Compute the average value of each column of the matrix $X$
diag(X)           Extract the diagonal elements of the matrix $X$, or create a matrix with the required diagonal elements, such as diag(1:5) or diag(5)
crossprod(X, X)   Compute X'X. It is a shortcut for t(X) %*% X
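
As a quick check of these functions, the sketch below applies each of them to the 4 x 5 matrix X defined above. The values in the comments follow from X having columns 1:4, 5:8, and so on.

```r
X <- matrix(1:20, nrow = 4, ncol = 5)

rowSums(X)    # 45 50 55 60
colSums(X)    # 10 26 42 58 74
rowMeans(X)   # 9 10 11 12
colMeans(X)   # 2.5 6.5 10.5 14.5 18.5
diag(X)       # 1 6 11 16 (X is not square, so 4 elements are returned)

# crossprod(X) gives the same 5 x 5 matrix as t(X) %*% X
all.equal(crossprod(X), t(X) %*% X)
```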

Obtaining Regression Coefficients using Matrices in R

Consider a dataset that has a response variable and a few regressors. There are many ways to create data (or variables): one can create a vector for each variable, a data frame for all of the variables, or matrices, or one can read data stored in a file.

Here we create vectors and bind them where required. We will then use matrices to obtain the regression coefficients.

y  <- c(5, 6, 7, 9, 8, 4, 3, 2, 1, 6, 0, 7)
x1 <- c(4, 5, 6, 7, 8, 3, 4, 9, 9, 8, 7, 5)
x2 <- c(10, 22, 23, 10, 11, 14, 15, 16, 17, 12, 11, 17)
x  <- cbind(1, x1, x2)

The cbind() function is used to create the matrix x. Note that a column of 1s is bound as well, so that the model includes an intercept term. Let us compute the $\beta$’s from OLS, $b = (X'X)^{-1}X'y$, using matrix functions and operators.

xt <- t(x)
xtx <- xt %*% x
xtxinv <- solve(xtx)
xty <- xt %*% y
b <- xtxinv %*% xty

The output is

#Output
x
        x1 x2
 [1,] 1  4 10
 [2,] 1  5 22
 [3,] 1  6 23
 [4,] 1  7 10
 [5,] 1  8 11
 [6,] 1  3 14
 [7,] 1  4 15
 [8,] 1  9 16
 [9,] 1  9 17
[10,] 1  8 12
[11,] 1  7 11
[12,] 1  5 17

xt
   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
      1    1    1    1    1    1    1    1    1     1     1     1
x1    4    5    6    7    8    3    4    9    9     8     7     5
x2   10   22   23   10   11   14   15   16   17    12    11    17

xtx
         x1   x2
    12   75  178
x1  75  515 1103
x2 178 1103 2854

xtxinv
xty
b
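
As a cross-check on the matrix computation above, the estimates can be compared with R's built-in lm() fit. This is a minimal sketch using the same data; the names b_solve and fit are illustrative.

```r
y  <- c(5, 6, 7, 9, 8, 4, 3, 2, 1, 6, 0, 7)
x1 <- c(4, 5, 6, 7, 8, 3, 4, 9, 9, 8, 7, 5)
x2 <- c(10, 22, 23, 10, 11, 14, 15, 16, 17, 12, 11, 17)
x  <- cbind(1, x1, x2)

# Normal equations: b = (X'X)^{-1} X'y
b <- solve(t(x) %*% x) %*% t(x) %*% y

# Same estimates without forming the inverse explicitly
b_solve <- solve(crossprod(x), crossprod(x, y))

# Built-in least squares fit for comparison
fit <- lm(y ~ x1 + x2)

# The three columns should agree up to floating-point error
cbind(matrix_ols = drop(b), solve_ols = drop(b_solve), lm = coef(fit))
```

Solving the linear system with solve(A, b) is generally preferred numerically over computing the inverse and multiplying, although both give the same result for a small, well-conditioned problem like this one.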
Figure: Computing regression coefficients using matrices in R


Creating Vectors in R

This article is about creating vectors in the R language. You will also learn quick methods for subsetting vectors in R and about vectorized operations. Learn how to create vectors in R using the c() and vector() functions. It is suited to beginners and data analysts.

Introduction to Vectors in R

Vectors in R are the building blocks of R Programming, used to store and manipulate data efficiently. Whether you are working with numbers, text, or logical values, mastering vector creation in R is essential for data analysis, statistical modeling, and machine learning.

Creating Vectors in R Using c() Function

The c() function can be used to create vectors of objects in R. It combines (concatenates) its arguments into a single one-dimensional object. The following examples create different types of vectors in R.

# Numeric vector
x <- c(1, 2, 5, 0.5, 10, 20, pi)
# Logical vector
x <- c(TRUE, FALSE, FALSE, T, T, F)
# Character vector
x <- c("a", "z", "good", "bad", "null hypothesis")
# Integer vector 
x <- 9 : 29   # (colon operator is used)
x <- c(1L, 5L, 0L, 15L)
# Complex vector
x <- c(1+0i, 2+4i, 0+0i)

Using vector() Function

The vector() function creates a vector of $n$ elements filled with a default value: zero for a numeric vector, an empty string for a character vector, FALSE for a logical vector, and 0+0i for a complex vector.

# Numeric vector of length 10 (default is zero)
x <- vector("numeric", length = 10)
# Integer vector of length 10 (default is integer zeros)
x <- vector("integer", length = 10)
# Character vector of length 10 (default is empty string)
x <- vector("character", length = 10)
# Logical vector of length 10 (default is FALSE)
x <- vector("logical", length = 10)
# Complex vector of length 10 (default is 0+0i)
x <- vector("complex", length=10)

Creating Vectors with Mixed Objects

When objects of different types are mixed in a vector, coercion occurs: every element is converted to the most flexible type present (in increasing order: logical, integer, numeric, complex, character).

The following are examples:

# coerce to character vector 
y <- c(1.2, "good")
y <- c("a", T)
# coerce to a numeric vector
y <- c(T, 2)

As the above examples show, coercion gives every element of the vector the same class.
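
The result of implicit coercion can be inspected with class(). The sketch below checks a few combinations, including two that are not in the examples above (integer with numeric, and numeric with complex).

```r
class(c(1.2, "good"))   # "character"
class(c("a", TRUE))     # "character"
class(c(TRUE, 2))       # "numeric"
class(c(1L, 2.5))       # "numeric"
class(c(1, 2+0i))       # "complex"

c(TRUE, 2)              # TRUE becomes 1
c("a", TRUE)            # TRUE becomes the string "TRUE"
```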

Explicitly Coercing Objects to Other Class

Objects can be explicitly coerced from one class to another using the as.character(), as.numeric(), as.integer(), as.complex(), and as.logical() functions. For example:

x <- 0:6
as.numeric(x)
as.logical(x)
as.character(x)
as.complex(x)

Note that nonsensical coercion results in NAs (missing values). For example,

x <- c("a", "b", "c")
as.numeric(x)
as.logical(x)
as.complex(x)
as.integer(x)
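
A short sketch of what these calls return is given below. The warning that as.numeric() raises is silenced with suppressWarnings() only to keep the output clean; the name num is illustrative.

```r
x <- c("a", "b", "c")
num <- suppressWarnings(as.numeric(x))   # every element becomes NA
num
is.na(num)

# Strings that do look like numbers convert cleanly
as.numeric(c("1", "2.5", "10"))

# as.logical() recognizes strings such as "TRUE" and "F"; others become NA
as.logical(c("TRUE", "F", "yes"))
```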

Vectorization in R

Many operations in the R Language are vectorized. The operations ( +, -, *, and / ) are performed element by element. For example,

x <- 1 : 4
y <- 6 : 9

# Arithmetics
x + y
x - y
x * y
x / y
# Logical Operation
x >= 2
x < 3
y == 8

Without vectorization (as in some other languages), one has to use a for loop to perform element-by-element operations on, say, vectors.
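
To make the comparison concrete, the sketch below adds two vectors with an explicit loop and then with the vectorized operator. The names z_loop and z_vec are illustrative.

```r
x <- 1:4
y <- 6:9

# Element-by-element addition with an explicit loop
z_loop <- numeric(length(x))
for (i in seq_along(x)) {
  z_loop[i] <- x[i] + y[i]
}

# Vectorized equivalent
z_vec <- x + y

z_loop                 # 7 9 11 13
all(z_loop == z_vec)   # TRUE
```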

Subsetting Vectors in R Language

Subsetting in the R language can be done easily. Subsetting a vector means extracting some of its elements. For this purpose, square brackets ([ ]) are used. For example:

x <- c(1, 6, 10, -15, 0, 13, 5, 2, 10, 9)

# Subsetting  Examples
x[1]   # extract first element of x vector
x[1:5] # extract first five values of x
x[-1]  # extract all values except first
x[x > 2] # extracts all elements that are greater than 2

head(x)  # extracts first 6 elements of x
tail(x)  # extracts last 6 elements of x

x[x > 5 & x < 10]  # extracts elements that are greater than 5 but less than 10

One can use the subset() function to extract the desired elements using logical operators. For example,

subset(x, x > 5)
subset(x, x > 5 & x < 10)
subset(x, !x < 0 )
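
One practical difference between square brackets and subset() shows up when the vector contains missing values. The sketch below uses a hypothetical vector v with one NA.

```r
v <- c(1, 6, NA, 10, -15)

v[v > 5]           # 6 NA 10  (the NA is kept)
subset(v, v > 5)   # 6 10     (the NA is dropped)
v[which(v > 5)]    # 6 10     (which() also ignores NA)
```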


Summary Statistics in R

In this article, you will learn how to compute summary statistics in the R language for a data set and, finally, create a data quality report file. Let us start learning “Computing Summary Statistics in R”.

We will treat each step as a task for better understanding. This also helps us complete the work in sequence.

Task 1: Load and View Data Set

It is better to confirm the working directory using getwd() and save your data in the working directory, or to save the data in the required folder and then set the path of this folder (directory) in R using the setwd() function.

getwd()
data <- read.csv("data.csv")

Task 2: Calculate Measure of Frequency Metrics in R

Before calculating the frequency metrics, it is better to check the data structure and some other useful information about the data. For example,

Note: here we are using the mtcars data set.

data <- mtcars
str(data)
head(data)
length(data$cyl)
length(unique(data$cyl))
table(data$cyl)

freq <- table(data$cyl)
freq <- sort(freq, decreasing = TRUE)
print(freq)

The above lines of code report the structure of the data set, the number of observations, the number of unique categories of the cylinder variable, its frequency table, and finally the sorted frequencies.
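
Relative frequencies are often more readable than raw counts. The sketch below uses prop.table() on the same cylinder variable; the name rel_freq is illustrative.

```r
freq <- table(mtcars$cyl)

rel_freq <- prop.table(freq)      # proportions that sum to 1
round(100 * rel_freq, 1)          # percentages
sort(freq, decreasing = TRUE)     # most common category first
names(freq)[which.max(freq)]      # "8"
```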

Task 3: Calculate the Measure of Central Tendency in R

Here we will calculate some available measures of central tendencies such as mean, median, and mode. One can easily calculate the measures of central tendency in R by following the commands below:

mean(data$mpg)
mean(data$mpg, na.rm = T)
median(data$mpg)
median(data$mpg, na.rm = T)

Note the use of the na.rm argument. If there are missing values in the data, then na.rm should be set to TRUE. Since the mtcars data set does not contain any missing values, the results for both calls will be the same.

There is no direct function to compute the most repeated value (the mode) of a variable; the built-in mode() function returns the storage mode of an object, not the statistical mode. However, the mode can be calculated by combining different functions. For example:

# for continuous variable
uniquevalues <- unique(data$hp)
uniquevalues[which.max(tabulate(match(data$hp, uniquevalues)))]
# for categorical variable
uniquevalues <- unique(data$cyl)
uniquevalues[which.max(tabulate(match(data$cyl, uniquevalues)))]
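
Since the same expression is repeated for each variable, it can be wrapped in a small helper. The function stat_mode() below is a hypothetical name, not part of base R; when several values tie, it returns the one that appears first in the vector.

```r
# Hypothetical helper (not part of base R): most frequent value of a vector
stat_mode <- function(v) {
  uniq <- unique(v)
  uniq[which.max(tabulate(match(v, uniq)))]
}

stat_mode(mtcars$cyl)              # 8
stat_mode(c("a", "b", "b", "c"))   # "b"
```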

Task 4: Calculate Measure of Dispersion in R Programming

The measures of dispersion, such as the range, variance, and standard deviation, can be computed with the following functions:

min(data$disp)
min(data$disp, na.rm = T)
max(data$disp)
max(data$disp, na.rm = T)
range(data$disp, na.rm = T)
var(data$disp, na.rm = T)
sd(data$disp, na.rm = T)
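
Note that range() returns the minimum and maximum rather than a single number. The sketch below derives a few related measures; the names disp and r are illustrative.

```r
disp <- mtcars$disp

r <- range(disp)         # c(minimum, maximum)
diff(r)                  # the range as a single number: max - min
IQR(disp)                # interquartile range
sd(disp) / mean(disp)    # coefficient of variation
sqrt(var(disp))          # same value as sd(disp)
```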

Task 5: Calculate Additional Quality Data Metrics

To compute more data metrics, we must be aware of the data type of the variables. Suppose we have numbers that are stored as the character data type. For example,

test <- as.character(1:3)

Finding the mean of such a character variable (the numbers are converted to the character class) will return NA with a warning.

mean(test)

[1] NA 
Warning message: In mean.default(test) : argument is not numeric or logical: returning NA

Therefore, one must be aware of the data type and class of the variable on which calculations are performed. The class of a variable in R can be checked using the class() function. For example:

class(data$hp)
class(mtcars)
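
Once the class is known to be character, converting the variable before the computation avoids the warning. A minimal sketch using the test vector from above:

```r
test <- as.character(1:3)

class(test)                    # "character"
is.numeric(test)               # FALSE
mean(as.numeric(test))         # 2
is.numeric(as.numeric(test))   # TRUE
```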

It may also be useful if we know the number of missing observations in the data set.

test2 <- c(NA, 2, 55, 10, NA)

sum(is.na(test2))
sum(is.na(data$hp))

Note that the data set we are using does not contain any missing values.
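
Missing values can also be counted for every column at once. The sketch below uses colSums() on the logical matrix returned by is.na(); the small data frame df is hypothetical and is included only because mtcars has no missing values to show.

```r
colSums(is.na(mtcars))   # zero for every column of mtcars
anyNA(mtcars)            # FALSE

# Hypothetical data frame with missing values
df <- data.frame(a = c(1, NA, 3), b = c(NA, NA, "z"),
                 stringsAsFactors = FALSE)
colSums(is.na(df))       # a = 1, b = 2
complete.cases(df)       # FALSE FALSE TRUE
```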

Task 6: Computing Summary Statistics in R on all Columns

There are functions in R that can be applied to each column to perform certain calculations on them. For example, the apply() function is used below to compute the number of observations in each column by passing the length function as an argument, and sapply() is used to compute the minimum of each column.

apply(data, MARGIN=2, length)

sapply(data, function(x) min(x, na.rm=T))
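
One difference between the two functions matters for data frames with mixed column types: apply() first converts the data frame to a matrix, so all columns share one type, while sapply() visits each column as it is. The data frame df below is a hypothetical example.

```r
df <- data.frame(n = c(1.5, 2.5), s = c("a", "b"),
                 stringsAsFactors = FALSE)

sapply(df, class)     # n: "numeric", s: "character"
apply(df, 2, class)   # both "character", because apply() works on as.matrix(df)
```

For an all-numeric data set such as mtcars the two approaches give the same results, which is why both work in the examples above.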

Let us create a user-defined function that can compute the minimum, maximum, mean, total, number of missing values, unique values, and data type of each variable (column) of the data frame.

quality_data <- function(df = NULL) {
  # Stop early when no data frame is supplied
  if (is.null(df))
    stop("Please pass a non-empty data frame")

  # One row per column of df, one summary measure per column of the result
  summary_tab <- do.call(data.frame,
    list(
      Min      = sapply(df, function(x) min(x, na.rm = TRUE)),
      Max      = sapply(df, function(x) max(x, na.rm = TRUE)),
      Mean     = sapply(df, function(x) mean(x, na.rm = TRUE)),
      Total    = apply(df, 2, length),
      NULLS    = sapply(df, function(x) sum(is.na(x))),
      Unique   = sapply(df, function(x) length(unique(x))),
      DataType = sapply(df, class)
    )
  )

  # Round the numeric summary columns to three decimal places
  nums <- vapply(summary_tab, is.numeric, FUN.VALUE = logical(1))
  summary_tab[, nums] <- round(summary_tab[, nums], digits = 3)

  return(summary_tab)
}

quality_data(data)

Task 7: Generate a Quality Data Report File

df_quality <- quality_data(data)
df_quality <- cbind(columns = rownames(df_quality),
                    data.frame(df_quality, row.names = NULL)  )

write.csv(df_quality, "Data Quality Report.csv", row.names = F)

# Same report with a time stamp (day-month-year-HHMMSS) in the file name
write.csv(df_quality, paste0("Data Quality Report ",
      format(Sys.time(), "%d-%m-%Y-%H%M%S"), ".csv"),
      row.names = F)

The write.csv() function will create a file that contains all the results produced by the quality_data() function.
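
To confirm that a report was written correctly, it can be read back with read.csv(). The sketch below is self-contained: report is a small hypothetical stand-in for the table built above, and the file is written to a temporary directory so nothing is left in the working directory.

```r
# Hypothetical stand-in for the report table built above
report <- data.frame(columns = c("mpg", "cyl"),
                     Min = c(10.4, 4), Max = c(33.9, 8))

# Time-stamped file name in a temporary directory
out_file <- file.path(tempdir(),
                      paste0("Data Quality Report ",
                             format(Sys.time(), "%d-%m-%Y-%H%M%S"), ".csv"))
write.csv(report, out_file, row.names = FALSE)

file.exists(out_file)         # TRUE
check <- read.csv(out_file)   # read the report back to inspect its contents
check
```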

That’s all about calculating descriptive statistics in R. There are many other descriptive measures, which we will cover in future posts.

To learn about importing and exporting different data files, see the post on Importing and Exporting Data in R.

FAQs in R

  1. What summary statistics can easily be computed in R?
  2. How to load the data set in the current workspace?
  3. What are the functions that can be used to compute different measures of dispersions in R Language?
  4. How to compute the summary statistics of all columns at once in R?
  5. Which measures of central tendency can be computed in R?
  6. What functions can be used to get information about the loaded dataset in R?
  7. How can missing observations be identified in R?
