Data Frames in R Language (2024)

Data frames in R are one of the most essential data structures. A data frame in R is a list with the class "data.frame". The data frame structure is used to store tabular data. Data frames in R Language are essentially lists of vectors of equal length, where each vector represents a column and each element of the vector corresponds to a row.
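
The list-with-a-class structure described above can be checked directly. The following is a minimal sketch; the data frame df and its values are hypothetical and used only for illustration.

```r
# Hypothetical example data frame; names and values are made up for illustration
df <- data.frame(id = 1:3, score = c(90.5, 82.0, 77.5))

class(df)            # "data.frame"
is.list(df)          # TRUE: a data frame is a list underneath
length(df)           # 2: one list element per column
sapply(df, length)   # each column has the same length (3)
```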

Data frames in R are the workhorse of data analysis, providing a flexible and efficient way to store, manipulate, and analyze data.

Restrictions on Data Frames in R

The following are restrictions on data frames in R:

  1. The components (columns or features) must be vectors (numeric, character, or logical), numeric matrices, factors, lists, or other data frames.
  2. Matrices, lists, and data frames provide as many variables to the new data frame as they have columns, elements, or variables, respectively.
  3. Numeric vectors, logical vectors, and factors are included as is. Since R 4.0.0, character vectors are also kept as character by default; in earlier versions (or with stringsAsFactors = TRUE) they are coerced to factors, whose levels are the unique values appearing in the vector.
  4. Vector structures appearing as variables of the data frame must all have the same length, and matrix structures must all have the same number of rows.
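
The restrictions above can be tried out with a short sketch. The object names (m, grp, df, df_f) are assumptions chosen for illustration. Note that in R 4.0.0 and later character columns stay character unless stringsAsFactors = TRUE is requested.

```r
# Illustrative objects (assumed names, not from the text)
m   <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("a", "b")))  # 3 x 2 numeric matrix
grp <- c("x", "y", "x")                                           # character vector

df <- data.frame(m, grp)
names(df)   # "a" "b" "grp": the matrix contributes one variable per column
dim(df)     # 3 rows and 3 columns

# Request factor conversion explicitly
df_f <- data.frame(m, grp, stringsAsFactors = TRUE)
levels(df_f$grp)   # "x" "y": the unique values of the character vector

# Components with incompatible lengths are rejected with an error
res <- try(data.frame(p = 1:3, q = 1:2), silent = TRUE)
inherits(res, "try-error")   # TRUE
```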

A data frame may for many purposes be regarded as a matrix with columns possibly of differing modes and attributes. It may be displayed in matrix form, and its rows and columns are extracted using matrix indexing conventions.
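
As a rough illustration of the matrix indexing conventions mentioned above, the sketch below uses a small hypothetical data frame df with columns of different modes.

```r
# Hypothetical data frame with a numeric and a logical column
df <- data.frame(age = c(20, 40, 33), smoker = c(TRUE, FALSE, TRUE))

dim(df)                      # 3 2, just like a matrix
second_row <- df[2, ]        # row 2, returned as a one-row data frame
age_col    <- df[, "age"]    # the age column, returned as a plain vector
one_value  <- df[2, "age"]   # a single element: 40
```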

Key Characteristics of Data Frame

  • Column-Based Operations: R language provides powerful functions and operators for performing operations on entire columns or subsets of columns, making data analysis and manipulation efficient.
  • Heterogeneous Data: Data frames can store data of different data types within the same structure, making them versatile for handling various kinds of data.
  • Named Columns: Each column in a data frame has a unique name, which is used to reference and access specific data within the frame.
  • Row-Based Indexing: Data frames are indexed based on their rows, allowing you to easily extract or manipulate data based on row numbers.

Making/ Creating Data Frames in R

Objects satisfying the restrictions placed on the columns (components) of a data frame may be used to form one using the function data.frame(). For example:

BMI <- data.frame(
  age = c(20, 40, 33, 45),
  weight = c(65, 70, 53, 69),
  height = c(62, 65, 55, 58)
)

Note that a list whose components conform to the restrictions of a data frame may be coerced into a data frame using the function as.data.frame().
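
A minimal sketch of this coercion, assuming a hypothetical list lst whose components are vectors of equal length:

```r
# Hypothetical list with two equal-length numeric components
lst <- list(age = c(20, 40, 33), weight = c(65, 70, 53))

df <- as.data.frame(lst)
is.data.frame(df)   # TRUE
names(df)           # "age" "weight": list names become column names
nrow(df)            # 3
```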

Other Way of Creating a Data Frame

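One can also read an entire data frame from an external file using the base R functions read.table() and read.csv(), or the package functions read_excel() (from the readxl package) and read_csv() (from the readr package). The latter two return tibbles, which are a subclass of data frames.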
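
The sketch below shows the base R route only. Since no real file is available here, it first writes a small temporary CSV file (an assumption made purely so the example is self-contained) and then reads it back with read.csv().

```r
# Write a tiny CSV to a temporary file so the example is self-contained
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(age = c(20, 40), weight = c(65, 70)), tmp, row.names = FALSE)

# Read the file back into a data frame
bmi <- read.csv(tmp)
str(bmi)      # two observations of two variables

unlink(tmp)   # clean up the temporary file
```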

Accessing and Manipulating Data

  • Accessing Data: Use column names or row indices to extract specific values or subsets of data.
  • Creating New Columns: Calculate new columns based on existing ones using arithmetic operations, logical expressions, or functions.
  • Grouping and Summarizing: Group data by specific columns and calculate summary statistics (e.g., mean, median, sum).
  • Sorting Data: Arrange rows in ascending or descending order based on column values.
  • Filtering Data: Select rows based on conditions using logical expressions and indexing.

# Create a data frame manually
data <- data.frame(
  Name = c("Ali", "Usman", "Hamza"),
  Age  = c(25, 30, 35),
  City = c("Multan", "Lahore", "Faisalabad")
)

# Accessing data
print(data$Age)      # Displays the "Age" column
print(data[2, ])  # Displays the second row

# Creating a new column
data$Age_Category <- ifelse(data$Age < 30, "Young", "Old")

# Filtering data
young_people <- data[data$Age < 30, ]

# Sort data
sorted_data <- data[order(data$Age), ]
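
Grouping and summarizing, listed among the operations above, can be sketched with the base R function aggregate(). The data frame sales and its values are hypothetical and chosen only for illustration.

```r
# Hypothetical sales data: two cities, two records each
sales <- data.frame(
  City   = c("Multan", "Lahore", "Multan", "Lahore"),
  Amount = c(100, 250, 300, 350)
)

# Mean Amount within each City
city_means <- aggregate(Amount ~ City, data = sales, FUN = mean)
print(city_means)   # one row per city: Lahore 300, Multan 200
```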

https://itfeature.com, https://gmstat.com

Logical Vectors in R: A Quick Guide

The logical vectors in R Language are the vectors whose elements are TRUE, FALSE, or NA (Not Available). R language allows the easy manipulation of logical (or relational) quantities. The TRUE and FALSE values are often used to represent the conditions or Boolean expressions.

In R, the reserved words TRUE and FALSE are often abbreviated as T and F, respectively. However, T and F are not reserved words; they are ordinary variables and can be overwritten by the user. It is therefore safer to write TRUE and FALSE in full.
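
A short sketch of why the abbreviations are risky. The variable names shadowed and restored are introduced here only to record the results.

```r
T <- 0                   # allowed: T is an ordinary variable, not a reserved word
shadowed <- isTRUE(T)    # FALSE, because T now holds 0
rm(T)                    # delete the user-defined T; the built-in T is visible again
restored <- isTRUE(T)    # TRUE
# TRUE <- 0              # not allowed: this line would raise an error if uncommented
```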

Logical vectors in R can be created by:

  • Direct assignment of TRUE and FALSE values to the elements of a vector
  • Applying conditions (logical or comparison operators) to the elements of a vector
  • Using the ifelse() function

Creating Logical Vectors in R Using Direct Assignment

v1 <- c(TRUE, FALSE, TRUE)
print(v1)
## Output
[1]  TRUE FALSE  TRUE

Creating Logical Vectors using Comparison Operators

x <- 5
y <- 10
v2 <- x > y
print(v2)
## Output
[1] FALSE

data <- c(1, 2, 3, 4, 5)
v3 <- data < 3
print(v3)
## Output
[1]  TRUE  TRUE FALSE FALSE FALSE

Creating Logical Vectors using ifelse Statement

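The ifelse() function can also be used to generate logical vectors in R Language. Note that a comparison such as data > 5 already returns a logical vector on its own, so ifelse() is mainly useful when the two outcomes should be something other than TRUE and FALSE. For example,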

data <- c(3, 4, 6, 8, 4, 4, 6, 10, -5)
v4 <- ifelse(data > 5, TRUE, FALSE)
print(v4)

## Output
[1] FALSE FALSE  TRUE  TRUE FALSE FALSE  TRUE  TRUE FALSE

As the examples above show, logical vectors are usually generated by conditions. The resulting logical vector has the same length as the vector to which the condition is applied: each element is TRUE where the corresponding element meets the condition and FALSE where it does not (or NA where the element is missing).
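
The sketch below illustrates the length rule and the effect of a missing value. The vector vals is hypothetical.

```r
# Hypothetical numeric vector with one missing value
vals <- c(4, NA, 9, 2)

flag <- vals > 3
flag                           # TRUE, NA, TRUE, FALSE
length(flag) == length(vals)   # TRUE: one result per element
is.na(flag)                    # FALSE, TRUE, FALSE, FALSE

# ifelse() with TRUE/FALSE outcomes gives the same result as the bare comparison
same <- identical(ifelse(vals > 3, TRUE, FALSE), flag)
```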

Logical Operators

The following is the list of comparison (relational) operators, each of which returns a logical vector:

Operator    Short Description
<           Less than
>           Greater than
<=          Less than or equal to
>=          Greater than or equal to
==          Exactly equal to
!=          Not equal to

In addition to the comparison operators, the logical operators are:

Operator    Short Description
& (and)     Takes two logical values and returns TRUE only if both values are TRUE
| (or)      Takes two logical values and returns TRUE if at least one value is TRUE
! (not)     Negates the logical value it is applied to
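
The and, or, and not operators work element by element on logical vectors. A minimal sketch with hypothetical vectors a, b, and x:

```r
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)

a & b   # TRUE FALSE FALSE FALSE: TRUE only where both are TRUE
a | b   # TRUE TRUE TRUE FALSE: TRUE where at least one is TRUE
!a      # FALSE FALSE TRUE TRUE: each value negated

# Combining two comparisons into one condition
x <- c(1, 5, 8, 12)
in_range <- x > 2 & x < 10   # FALSE TRUE TRUE FALSE
```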

Use of Logical Operators

Filtering Data

The logical vectors in R language are commonly used for filtering the data. For example,

data <- data.frame(x = c(1, 2, 3, 4, 5), y = c("a", "b", "c", "d", "e"))
filtered_data <- data[data$x > 3, ]
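
To make the mechanism explicit, the sketch below stores the logical vector first and then uses it as the row index. The name keep is an assumption, and subset() is shown as a base R alternative.

```r
data <- data.frame(x = c(1, 2, 3, 4, 5), y = c("a", "b", "c", "d", "e"))

keep <- data$x > 3            # FALSE FALSE FALSE TRUE TRUE
filtered_data <- data[keep, ] # rows 4 and 5

# Two conditions combined with & (rows 2, 3, and 5)
combined <- data[data$x > 1 & data$y != "d", ]

# subset() selects the same rows as data[keep, ]
via_subset <- subset(data, x > 3)
```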

Ordinary Arithmetic

Logical vectors may be used in ordinary arithmetic, in which case they are coerced into numeric vectors, FALSE becoming 0 and TRUE becoming 1. For example,

x <- c(TRUE, FALSE, FALSE, TRUE)
y <- c(5, 10, 6, 15)
x + y

## Output
[1]  6 10  6 16

sum(x)
## Output
[1] 2
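
Because TRUE counts as 1 and FALSE as 0, sum() and mean() applied to a condition give a count and a proportion. A small sketch with a hypothetical vector scores:

```r
# Hypothetical exam scores
scores <- c(55, 72, 48, 90, 66)

passed <- scores >= 60   # FALSE TRUE FALSE TRUE TRUE
sum(passed)              # 3: number of scores of at least 60
mean(passed)             # 0.6: proportion of scores of at least 60
as.numeric(passed)       # 0 1 0 1 1: the explicit numeric coercion
```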

Logical vectors in R language are a fundamental tool for working with conditions and Boolean expressions. Understanding how to create, manipulate, and use logical vectors is essential for effective data analysis and programming in R.

Vector Arithmetic in R: Made Easy 2024

This post is about vector arithmetic in R Language. In R, different mathematical operations can be performed on vectors; that is, vectors can be used in arithmetic expressions. The vector arithmetic operations are performed element by element.

It is important to note that vectors occurring in the same mathematical expression need not be of the same length (size). The shorter vectors in the arithmetic expression are recycled until they match the length of the longest vector.
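
The recycling rule can be seen in a short sketch. The vectors are hypothetical, and suppressWarnings() is used only to keep the last line quiet, since R warns when the longer length is not a multiple of the shorter one.

```r
x <- c(1, 2, 3, 4, 5, 6)
y <- c(10, 20)

x + y   # 11 22 13 24 15 26: y is recycled as 10 20 10 20 10 20
x * 2   # 2 4 6 8 10 12: a single number is recycled for every element

# Length 6 is not a multiple of 4: the result is still computed, with a warning
z <- suppressWarnings(x + c(100, 200, 300, 400))   # 101 202 303 404 105 206
```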

Vector Arithmetic Operations

The vector arithmetic operations can be performed using arithmetic operators and vector functions. The elementary arithmetic operators are +, -, *, /, and ^. Common mathematical functions such as log, exp, sin, cos, tan, and sqrt are also available and are applied to each element. The max() and min() functions return the largest and smallest elements of a vector, respectively. Similarly, the range() function returns a vector of length two containing the minimum and maximum values of the vector, that is, c(min(x), max(x)).

The length(x) function returns the number of elements (size or number of observations) in a vector say $x$, sum(x) gives the total (sum) of the elements in vector $x$, and prod(x) returns the product of elements.
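
A minimal sketch of the functions described above, applied to a hypothetical vector x:

```r
x <- c(4, 9, 16, 25)

sqrt(x)     # 2 3 4 5: applied element by element
max(x)      # 25
min(x)      # 4
range(x)    # 4 25, the same as c(min(x), max(x))
length(x)   # 4
sum(x)      # 54
prod(x)     # 14400
```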

Besides the simple arithmetic operators (+, -, *, and /), the examples below also use some of these functions to perform computations on a whole vector.

Vector Arithmetic in R: Examples

The basic vector arithmetic in R can be performed just like adding numbers on a calculator.

x <- c(1, 2, 3, 4, 5)
y <- c(4, 5, 6, 7, 8)

# Addition
x + y

# Subtraction
x - y

# Multiplication
x * y

# Division
x / y

# Exponentiation
x ^ y

One can compute the average (mean value) of a vector by performing arithmetic on it, such as

x <- c(5, 10, 5, 3, 5, 6, 7, 8, 4, 3, 10)
sum(x)/ length(x)

## Output
[1] 6

The built-in function for the computation of the average value of a vector is mean(), that is mean(x).

mean(x)

## Output
[1] 6

The sample variance can also be computed by performing arithmetic on a vector, say $x$.

sum((x - mean(x))^2)/ (length(x)-1)

## Output
[1] 6.2

The built-in function for the sample variance is var(x). Note that if the argument to var() is an $n$-by-$p$ matrix, the result is the $p$-by-$p$ sample covariance matrix.

var(x)

## Output
[1] 6.2
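
The matrix case of var() can be sketched as follows. The matrix m is hypothetical, with $n = 4$ rows and $p = 2$ columns.

```r
# Hypothetical 4-by-2 matrix with named columns
m <- cbind(a = c(1, 2, 3, 4), b = c(2, 4, 6, 9))

v <- var(m)
dim(v)                                        # 2 2: a p-by-p matrix
same_diag <- all.equal(v["a", "a"], var(m[, "a"]))   # diagonal holds each column's variance
same_cov  <- all.equal(v, cov(m))                    # var() on a matrix matches cov()
```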

The sort(x) function returns a vector of the same size as $x$ with the elements arranged in increasing order.

sort(x)

## Output
[1]  3  3  4  5  5  5  6  7  8 10 10

The min() and max() functions return the smallest and largest values among all of their arguments, even when several vectors are supplied.
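
A short sketch with two hypothetical vectors a and b. The related base R functions pmax() and pmin() are included for contrast, since they compare the vectors element by element instead of returning a single value.

```r
a <- c(3, 9, 4)
b <- c(7, 1, 8)

max(a, b)    # 9: the largest value across both vectors
min(a, b)    # 1: the smallest value across both vectors

pmax(a, b)   # 7 9 8: element-wise (parallel) maxima
pmin(a, b)   # 3 1 4: element-wise (parallel) minima
```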

In summary, vector arithmetic is a fundamental aspect of R programming, enabling efficient and concise mathematical operations on sequences of elements. By understanding the basic operations, vector recycling, and available functions, you can effectively leverage vectors to solve a wide range of problems in data analysis and scientific computing.

https://rfaqs.com
