Data frames are one of the most essential data structures in R. A data frame is a list with the class "data.frame" and is used to store tabular data. It is essentially a list of vectors of equal length, where each vector represents a column and each element of a vector corresponds to a row.
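The list nature of a data frame can be checked directly. The following is a minimal sketch; the object `df` and its values are made up for illustration and do not come from the text.

```r
# Hypothetical example data, used only for illustration
df <- data.frame(id = 1:3, score = c(90.5, 82, 77.25))

class(df)     # "data.frame"
is.list(df)   # TRUE: a data frame is a list underneath
typeof(df)    # "list"
length(df)    # 2, the number of columns (list elements)
nrow(df)      # 3, the common length of the column vectors
```

Because each column is a list element, `df[["score"]]` and `df$score` extract a column in the same way they would extract an element of an ordinary list.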
Data frames in R are the workhorse of data analysis, providing a flexible and efficient way to store, manipulate, and analyze data.
Restrictions on Data Frames in R
The following are restrictions on data frames in R:
- The components (columns or features) must be vectors (numeric, character, or logical), numeric matrices, factors, lists, or other data frames.
- Matrices, lists, and data frames provide as many variables to the new data frame as they have columns, elements, or variables, respectively.
- Numeric vectors, logical vectors, and factors are included as is. Since R 4.0.0, character vectors are also kept as character by default; in earlier versions, or when stringsAsFactors = TRUE is set, they are coerced to factors whose levels are the unique values appearing in the vector.
- Vector structures appearing as variables of the data frame must all have the same length, and matrix structures must all have the same number of rows.
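The restrictions above can be tried out in a short sketch. All object names and values here are hypothetical, and `stringsAsFactors` is set explicitly so that the result does not depend on the default of the installed R version.

```r
# A numeric matrix with two named columns contributes two variables
m <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("a", "b")))

df <- data.frame(m,
                 flag  = c(TRUE, FALSE, TRUE),
                 label = c("x", "y", "x"),
                 stringsAsFactors = FALSE)
names(df)               # "a" "b" "flag" "label"
is.character(df$label)  # TRUE: kept as character

# Requesting factor conversion explicitly
df_f <- data.frame(label = c("x", "y", "x"), stringsAsFactors = TRUE)
levels(df_f$label)      # "x" "y"

# Columns of length 3 and 2 cannot be combined, so data.frame() stops
msg <- tryCatch(
  data.frame(u = 1:3, v = 1:2),
  error = function(e) conditionMessage(e)
)
msg                     # the error message raised by data.frame()
```

Capturing the error with `tryCatch()` is only a convenience for the demonstration; in interactive use the call would simply stop with that message.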
A data frame may for many purposes be regarded as a matrix with columns possibly of differing modes and attributes. It may be displayed in matrix form, and its rows and columns are extracted using matrix indexing conventions.
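As a small illustration of the matrix-style view, the sketch below indexes a data frame with `[row, column]`. The data frame `df` is a made-up example.

```r
# Hypothetical data frame with columns of different modes
df <- data.frame(x = c(1, 2, 3),
                 y = c("a", "b", "c"),
                 stringsAsFactors = FALSE)

dim(df)          # 3 2 (rows, columns), as for a matrix
df[2, 1]         # 2: the element in row 2, column 1
df[, "y"]        # "a" "b" "c": a single column is returned as a vector
df[1:2, ]        # the first two rows, still a data frame

# drop = FALSE keeps the data frame structure for a single column
one_col <- df[, "y", drop = FALSE]
class(one_col)   # "data.frame"
```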
Key Characteristics of Data Frame
- Column-Based Operations: R language provides powerful functions and operators for performing operations on entire columns or subsets of columns, making data analysis and manipulation efficient.
- Heterogeneous Data: Data frames can store data of different data types within the same structure, making them versatile for handling various kinds of data.
- Named Columns: Each column in a data frame has a unique name, which is used to reference and access specific data within the frame.
- Row-Based Indexing: Data frames are indexed based on their rows, allowing you to easily extract or manipulate data based on row numbers.
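The characteristics listed above can be seen together in one short sketch. The data frame `df` and its column names are assumptions made for the example.

```r
# Hypothetical data frame mixing character, numeric, and logical columns
df <- data.frame(name  = c("a", "b", "c"),
                 value = c(1.5, 2.5, 3.5),
                 ok    = c(TRUE, FALSE, TRUE),
                 stringsAsFactors = FALSE)

# Heterogeneous data: each column keeps its own type
col_types <- sapply(df, class)
col_types          # name "character", value "numeric", ok "logical"

# Named columns: names are used to reference the data
names(df)          # "name" "value" "ok"

# Column-based operations work on a whole column at once
df$value * 2       # 3 5 7
mean(df$value)     # 2.5

# Row-based indexing: pick rows 1 and 3 by position
df[c(1, 3), ]
```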
Creating Data Frames in R
Objects satisfying the restrictions placed on the columns (components) of a data frame may be used to form one using the function data.frame(). For example:
BMI <- data.frame(
  age    = c(20, 40, 33, 45),
  weight = c(65, 70, 53, 69),
  height = c(62, 65, 55, 58)
)
Note that a list whose components conform to the restrictions of a data frame may be coerced into a data frame using the function as.data.frame().
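A minimal sketch of this coercion follows. The list `lst` and its values are hypothetical and chosen only so that both components have the same length.

```r
# A list whose components are equal-length vectors
lst <- list(age = c(20, 40, 33), weight = c(65, 70, 53))

df <- as.data.frame(lst)

class(df)    # "data.frame"
names(df)    # "age" "weight": the list names become column names
dim(df)      # 3 2
```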
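Other Ways of Creating a Data Frame
One can also read an entire data frame from an external file. Base R provides read.table() and read.csv(); the readr package provides read_csv(), and the readxl package provides read_excel() for Excel files. The functions from readr and readxl return a tibble, which is a data frame with some additional behaviour.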
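The base R readers can be tried without any external data by first writing a small file. This is only a sketch: the temporary file and its contents are invented for the example, and only base R functions are used so that no extra package is needed.

```r
# Write a small, made-up CSV file to a temporary location
tmp <- tempfile(fileext = ".csv")
writeLines(c("name,age", "Ali,25", "Usman,30"), tmp)

# read.csv() assumes a header row and comma-separated fields
df <- read.csv(tmp, stringsAsFactors = FALSE)
str(df)

# read.table() needs the header and separator stated explicitly
df2 <- read.table(tmp, header = TRUE, sep = ",", stringsAsFactors = FALSE)

# For this simple file both calls should give the same data frame
all.equal(df, df2)

unlink(tmp)  # remove the temporary file
```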
Accessing and Manipulating Data
- Accessing Data: Use column names or row indices to extract specific values or subsets of data.
- Creating New Columns: Calculate new columns based on existing ones using arithmetic operations, logical expressions, or functions.
- Grouping and Summarizing: Group data by specific columns and calculate summary statistics (e.g., mean, median, sum).
- Sorting Data: Arrange rows in ascending or descending order based on column values.
- Filtering Data: Select rows based on conditions using logical expressions and indexing.
# Create a data frame manually
data <- data.frame(
  Name = c("Ali", "Usman", "Hamza"),
  Age  = c(25, 30, 35),
  City = c("Multan", "Lahore", "Faisalabad")
)

# Accessing data
print(data$Age)    # Displays the "Age" column
print(data[2, ])   # Displays the second row

# Creating a new column
data$Age_Category <- ifelse(data$Age < 30, "Young", "Old")

# Filtering data
young_people <- data[data$Age < 30, ]

# Sort data
sorted_data <- data[order(data$Age), ]
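Grouping and summarizing, and sorting in descending order, can be sketched with base R as follows. The data frame `people` is a hypothetical variant of the example data with repeated cities, so that each group has more than one row; `aggregate()` and `tapply()` are one possible choice, not the only one.

```r
# Hypothetical data with two people per city
people <- data.frame(
  Name = c("Ali", "Usman", "Hamza", "Bilal"),
  Age  = c(25, 30, 35, 41),
  City = c("Multan", "Lahore", "Multan", "Lahore"),
  stringsAsFactors = FALSE
)

# Group by City and compute the mean Age in each group
avg_age <- aggregate(Age ~ City, data = people, FUN = mean)
avg_age   # Lahore 35.5, Multan 30

# The same grouping with tapply(), returning a named vector of medians
med_age <- tapply(people$Age, people$City, median)

# Sort rows by Age in descending order
oldest_first <- people[order(people$Age, decreasing = TRUE), ]
oldest_first$Name   # "Bilal" "Hamza" "Usman" "Ali"
```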