Data Frame in R Language

Introduction to Data Frame in R Language

In the R programming language, a data frame is a two-dimensional data structure made up of rows and columns. Every column must contain the same number of rows, that is, all columns are of equal length. The intersection of a row and a column can be considered a cell, and each cell of the data frame is identified by a combination of row number and column number.

A data frame in the R programming language has:

  • Rows: Represent individual observations or data points.
  • Columns: Represent variables or features being measured. Each column holds values for a single variable across all observations.
  • Data Types: Columns can hold data of different types, including numeric, character, logical (TRUE/FALSE), and factors (categorical variables).

One can modify, extract, and rearrange the contents of a data frame; this process is called data frame manipulation. To create a data frame, the following general syntax can be used.

Data Frame Syntax in R

The general syntax for creating a data frame in R is:

df <- data.frame(first_column  = c(data values separated by commas),
                 second_column = c(data values separated by commas),
                 ......
                )

An example data frame in the R programming language is:

df = data.frame(age = c(23, 24, 25, 26, 23, 25, 29, 20),
                marks = c(99, 80, 67, 56, 98, 65, 45, 77),
                grade = c("A", "A", "C", "D", "A", "B", "F", "B")
                )
print(df)
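
To connect the example with the description of rows, columns, and cells given earlier, the following minimal sketch inspects the same data frame. It only uses base R functions (dim(), sapply(), str()); the class reported for the grade column depends on the R version, so no output is shown for it.

```r
# Same example data frame as above
df <- data.frame(age   = c(23, 24, 25, 26, 23, 25, 29, 20),
                 marks = c(99, 80, 67, 56, 98, 65, 45, 77),
                 grade = c("A", "A", "C", "D", "A", "B", "F", "B"))

dim(df)            # number of rows and columns: 8 3
sapply(df, class)  # data type of each column
str(df)            # compact summary of the structure

# A single cell is addressed by its row and column
df[2, 3]           # row 2, column 3
df[2, "grade"]     # same cell, with the column referenced by name
```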

One can name or rename the columns and rows of the data frame:

# Naming / renaming columns 
colnames(df) <- c("Age", "Score", "Grade")

# Naming / renaming rows
row.names(df) <- c("1st", "2nd", "3rd", "4th", "5th", "6th", "7th", "8th")
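
When only one column needs a new name, it is not necessary to retype all of them. The sketch below is one possible way to do this in base R; the small data frame is a shortened, hypothetical version of the example above.

```r
df <- data.frame(age = c(23, 24, 25), marks = c(99, 80, 67))

# Rename only the column called "marks", leaving the others untouched
names(df)[names(df) == "marks"] <- "Score"

# Custom row names can be assigned and later reset to the default 1, 2, 3, ...
row.names(df) <- c("1st", "2nd", "3rd")
row.names(df) <- NULL

names(df)
```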

Subsetting a Data Frame

The subset() function creates a new data frame that contains only selected rows and/or columns of the original. Its select argument names the columns to keep (or, with a minus sign, the columns to drop), so unwanted columns can be removed in one step. To understand subsetting a data frame, let us create a data frame first.

# creating a data frame
df = data.frame(row1 = 0:3, row2 = 3:6, row3 = 6:9)

# creating a subset
df <- subset(df, select = c(row1, row2))
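
As a further illustration of subset(), the sketch below drops a column with a negative select, filters rows with a condition, and combines both. The object names dropped, filtered, and res are only illustrative.

```r
df <- data.frame(row1 = 0:3, row2 = 3:6, row3 = 6:9)

# Drop a column by negating it in `select`
dropped <- subset(df, select = -row3)

# Keep only the rows that satisfy a condition
filtered <- subset(df, subset = row1 > 1)

# Combine both: filter rows and choose columns in one call
res <- subset(df, subset = row1 > 1, select = c(row1, row3))
res
```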

Question: Data Frame in R Language

Suppose we have a frequency distribution of sales from a sample of 100 sales receipts.

Price Value    Number of Sales
0 to 20        16
20 to 40       18
40 to 60       14
60 to 80       24
80 to 100      20
100 to 120     8

Calculate the mean, median, variance, standard deviation, and coefficient of variation using R code.

Solution

# Create a data frame

df <- data.frame(lower_class = seq(0, 100, by = 20), upper_class=seq(20, 120, by=20), freq = c(16, 18, 14, 24, 20, 8))

# mid points
m <- (df["lower_class"] + df["upper_class"])/2

mf <- df["freq"] * m
mfsquare <- df["freq"] * m^2


data <- cbind(df, m, mf, mfsquare)
colnames(data) <- c("LL","UL", "freq" , "M", "mf", "mf2")

# Computation
# (names chosen so that the base functions var() and sd() are not masked)
avg      <- sum(data$mf) / sum(data$freq)
variance <- (sum(data$mf2) - sum(data$mf)^2 / sum(data$freq)) / (sum(data$freq) - 1)
std_dev  <- sqrt(variance)
cv       <- std_dev / avg * 100

## Outputs
paste("Mean =", round(avg, 3))
paste("Variance =", round(variance, 3))
paste("Standard Deviation =", round(std_dev, 3))
paste("Coefficient of Variation =", round(cv, 3))
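
The question also asks for the median, which the solution above does not compute. A minimal sketch follows, assuming the usual interpolation formula for grouped data, median = L + ((n/2 - cf) / f) * h, where L is the lower limit of the median class, cf the cumulative frequency before it, f its frequency, and h the class width. The variable names are illustrative and not part of the original solution.

```r
lower <- seq(0, 100, by = 20)    # lower class limits
freq  <- c(16, 18, 14, 24, 20, 8)
h     <- 20                      # class width

n  <- sum(freq)
cf <- cumsum(freq)               # cumulative frequencies
k  <- which(cf >= n / 2)[1]      # index of the median class

# cumulative frequency of the classes before the median class
cf_before <- if (k > 1) cf[k - 1] else 0

med <- lower[k] + (n / 2 - cf_before) / freq[k] * h
paste("Median =", round(med, 3))
```

For these data the median class is 60 to 80, and the interpolated median is about 61.667.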

Using Logical Conditions for Selecting Rows and Columns

For selecting rows and columns using logical conditions, we consider the iris data set. Suppose we are interested in selecting the rows whose Sepal.Length is higher than the median of Sepal.Length and whose Petal.Width >= 1.7. In the code below, each value in the Sepal.Length variable (column) is compared with the median value of Sepal.Length. Similarly, each value of Petal.Width is compared with 1.7, and only the rows that satisfy both conditions are extracted.

attach(iris) 

iris[(Sepal.Length > median(Sepal.Length) & Petal.Width >= 1.7), ]
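
The same selection can also be written without attach(), which some users prefer because it avoids name clashes in the search path. The sketch below shows two equivalent base R forms; sel1 and sel2 are illustrative names.

```r
# Columns are referenced explicitly through the data frame
sel1 <- iris[iris$Sepal.Length > median(iris$Sepal.Length) &
               iris$Petal.Width >= 1.7, ]

# subset() evaluates the condition inside the data frame
sel2 <- subset(iris, Sepal.Length > median(Sepal.Length) & Petal.Width >= 1.7)

all.equal(sel1, sel2)   # both approaches select the same rows
nrow(sel1)              # number of rows that satisfy both conditions
```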

One can also select only the numeric columns, only the factor columns, or only the rows of a certain species, as in the code below:

# Selecting Numeric Columns only
iris[ , sapply(iris, is.numeric)]

# Selecting factor columns only
iris[, sapply(iris, is.factor)]

# Selecting only certain Species
iris[Species == "virginica", ]

Omitting Missing Observations in a Data Frame

# Omit rows with missing data
na.omit(iris)

# Flag missing values in every cell (logical matrix, one column per variable)
apply(iris, 2, is.na)

# Keep only the rows that have no missing values
iris[complete.cases(iris), ]
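
Because the iris data set contains no missing values, the commands above return it unchanged. The sketch below uses a small hypothetical data frame (scores, invented for illustration) so that the effect of na.omit() and complete.cases() is visible.

```r
# Hypothetical data frame with missing values
scores <- data.frame(id    = 1:5,
                     marks = c(90, NA, 75, 60, NA),
                     grade = c("A", "B", NA, "D", "F"))

colSums(is.na(scores))             # number of missing values per column
scores[!complete.cases(scores), ]  # rows that contain at least one NA

clean <- na.omit(scores)           # drop those rows
nrow(clean)                        # 2 complete rows remain
```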

https://itfeature.com

https://gmstat.com

Important R Language Questions

This post covers R language questions that are commonly asked in interviews and in R-related examinations and tests.

R Language Questions

Question: What is a file in R?
Answer: A script file written in R has the file extension .R. Since R is a programming language designed for statistical computing and graphics, a file in R contains code that can be executed within the R software environment.
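
As a small illustration, the sketch below writes a two-line script to a temporary .R file and runs it with source(). The file contents and the object names are made up for the example.

```r
# Write a two-line script to a temporary file with the .R extension
script <- tempfile(fileext = ".R")
writeLines(c("x <- c(2, 4, 6)",
             "x_mean <- mean(x)"), script)

# Run the script in a separate environment; its objects are created there
env <- new.env()
source(script, local = env)
env$x_mean
```

A script file can also be run from the command line with Rscript followed by the file name.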

Question: What is a table in R?
Answer: A table in R is an object of class “table”, usually created with the table() function; it can be converted to a data frame with the as.data.frame() method. It is a data structure used to represent categorical data and frequency counts. A table provides a convenient way to summarize and organize data in tabular form, making it easier to analyze and interpret.
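
A minimal sketch of building a frequency table from a character vector; the grade values are reused from the data frame example earlier in this document, and tab is an illustrative name.

```r
grade <- c("A", "A", "C", "D", "A", "B", "F", "B")

tab <- table(grade)       # frequency count of each category
tab
class(tab)                # "table"

tab_df <- as.data.frame(tab)   # one row per category, with a Freq column
tab_df
```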

Factor Variables in R

Question: What is a factor variable in R language?
Answer: Factor variables are categorical variables whose values (levels) can be labeled with either strings or numbers. Factor variables are used in various types of graphics and, in particular, in statistical modeling, where the correct number of degrees of freedom is assigned to them.
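
The sketch below creates a factor and shows how a model assigns degrees of freedom to it: a factor with 5 levels uses 5 - 1 = 4 degrees of freedom. The grade and marks values are reused from the data frame example earlier in this document; the linear model is only an illustration.

```r
grade <- c("A", "A", "C", "D", "A", "B", "F", "B")
marks <- c(99, 80, 67, 56, 98, 65, 45, 77)

f <- factor(grade)
levels(f)       # distinct categories, sorted: "A" "B" "C" "D" "F"
nlevels(f)      # 5
as.integer(f)   # internal integer codes behind the labels

# In a linear model, the factor accounts for nlevels(f) - 1 degrees of freedom
fit <- lm(marks ~ f)
anova(fit)$Df   # 4 for the factor, 3 for the residuals
```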

Data Structure in R

Question: What is a data structure in R?
Answer: A data structure is a specialized format for organizing and storing data. General data structure types include the array, the file, the record, the table, the tree, and so on. R offers several data structures, each with its own characteristics and purposes. Common data structures in R are the vector, factor, matrix, array, data frame, and list.
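
A short sketch that creates one object of each of the common R data structures listed above and reports its class. The object names and values are arbitrary.

```r
v  <- c(1.5, 2.5, 3.5)                  # vector: elements of a single type
f  <- factor(c("low", "high", "low"))   # factor: categorical values
m  <- matrix(1:6, nrow = 2)             # matrix: two dimensions, single type
a  <- array(1:24, dim = c(2, 3, 4))     # array: any number of dimensions
l  <- list(num = 1, chr = "a", vec = v) # list: elements of mixed types
df <- data.frame(x = 1:3, y = c("a", "b", "c"))  # data frame: equal-length columns

# First class label of each object
classes <- sapply(list(v, f, m, a, l, df), function(obj) class(obj)[1])
classes
```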

scan() Function in R

Question: What is scan() in R?
Answer: The scan() function reads data values into a vector or list from the console or from a file. For example:

> z <- scan()
1: 12 5
3: 2
4:
Read 3 items

> z
[1] 12  5  2
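
The session above needs keyboard input. For scripts that run without a console, scan() can read from a string through its text argument, as in this sketch (the input strings are made up):

```r
# Parse numeric values from a string instead of the console
z <- scan(text = "12 5 2", quiet = TRUE)
z

# Read character data by changing `what` and the separator
s <- scan(text = "a,b,c", what = character(), sep = ",", quiet = TRUE)
s
```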

readline() Function in R

Question: What is readline() in R?
Answer: The readline() function reads a single line from the terminal in interactive use and returns it as a character string, so one can use it for inputting a line from the keyboard. (Reading some or all text lines from a connection is the job of the related readLines() function.) For example:

> w <- readline()
xyz vw u
> w
[1] "xyz vw u"
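
The sketch below shows one way to use readline() safely in a script that may also run non-interactively, and contrasts it with readLines(), which reads lines from a connection. The stand-in string and the text connection contents are assumptions made for the example.

```r
# readline() only waits for input in an interactive session
if (interactive()) {
  w <- readline(prompt = "Enter some text: ")
} else {
  w <- "xyz vw u"   # stand-in value for non-interactive runs
}

# Split the input string into words
words <- strsplit(w, " ")[[1]]

# readLines() reads lines from a connection (here, an in-memory text connection)
con <- textConnection(c("first line", "second line"))
lines <- readLines(con)
close(con)
length(lines)
```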


R Language Reference Guide III: A Quick Guide

This post is an R Language Reference Guide to subsetting vectors, lists, matrices, and data frames in the R language.

R Language A Quick Reference

This R Language Reference Guide is about learning R programming through short descriptions of widely used commands. It will help beginners and intermediate users of the R programming language to get help with different functions quickly. This R language reference is classified into different groups. Let us start with the R Language Reference Guide – III.

This R Language Quick Reference contains R commands about subsetting in R, such as subsetting of vectors, matrices, lists, data frames, arrays, and factors. It also discusses setting the different properties related to R language data types.

Subsetting Vectors: Quick R Language Reference

The following are ways to subset or slice the values from a vector.

R Command            Short Description
x[1:5]               Select elements of $x$ by index
x[-(1:5)]            Exclude elements of $x$ by index
x[c(TRUE, FALSE)]    Select elements of $x$ corresponding to TRUE values
x[c("a", "b")]       Select elements of $x$ by name
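
The commands in the table can be tried on a small named vector. The vector below is hypothetical; note that a logical index shorter than the vector is recycled.

```r
x <- c(a = 10, b = 20, c = 30, d = 40, e = 50, f = 60)

x[1:5]             # first five elements
x[-(1:5)]          # everything except the first five: f = 60
x[c(TRUE, FALSE)]  # logical index is recycled: elements 1, 3, and 5
x[c("a", "b")]     # by name
x[x > 25]          # by logical condition
```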

Subsetting Lists in R Language

The following methods are used to subset or slice a list in R Language.

R Command            Short Description
x[1:5]               Extract a sublist of the list $x$
x[-(1:5)]            Extract a sublist by excluding elements of list $x$
x[c(TRUE, FALSE)]    Extract a sublist with logical subscripts
x[c("a", "b")]       Extract a sublist by name
x[[2]]               Extract an element of the list $x$
x[["a"]]             Extract the element with the name "a" from list $x$
x$a                  Extract the element with the name "a" from list $x$
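
The difference between single brackets (which return a sublist) and double brackets or $ (which return the element itself) can be seen in the following sketch on a hypothetical list.

```r
x <- list(a = 1:3, b = "text", c = TRUE)

x[1:2]          # sublist with elements a and b (still a list)
x[-1]           # sublist without the first element
x[c("a", "c")]  # sublist by name
x[[2]]          # the second element itself: "text"
x[["a"]]        # the element named "a": 1 2 3
x$a             # same as x[["a"]]
```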

Subsetting Matrices in R: A Quick Reference

To subset or extract certain elements from a matrix follow the ways described below.

R Command      Short Description
x[i, j]        Extracts the elements of matrix $x$ specified by row $i$ and column $j$
x[i, j] = v    Sets or resets the elements of matrix $x$ specified by row $i$ and column $j$
x[i, ]         Extracts the $i$th row of matrix $x$
x[i, ] = v     Sets or resets the $i$th row of matrix $x$
x[ , j]        Extracts the $j$th column of matrix $x$
x[ , j] = v    Sets or resets the $j$th column of matrix $x$
x[i]           Subsets matrix $x$ as a vector
x[i] = v       Sets or resets the $i$th elements (treated as a vector operation)
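
A small sketch of the matrix commands above on a hypothetical 2 x 3 matrix. R stores matrices column by column, which determines what a single index such as x[4] returns.

```r
x <- matrix(1:6, nrow = 2)   # 2 x 3 matrix, filled column by column

x[2, 3]        # element in row 2, column 3: 6
x[1, ]         # first row: 1 3 5
x[, 2]         # second column: 3 4
x[4]           # fourth element in column-major order: 4

x[1, ] <- 0    # reset the whole first row
x[2, 3] <- 99  # reset a single element
x
```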

Subsetting a Data Frame in R Language

One can easily subset or slice a Data Frame in R.

R Command                            Short Description
df[i, j]                             Matrix-style subsetting of a data frame, specified by $i$th row and $j$th column
df[i, j] = v                         Sets or resets a subset of a data frame
subset(df, subset = i)               Subset of the $i$ cases/observations of a data frame
subset(df, select = i)               Subset of the $i$ variables/columns of a data frame
subset(df, subset = i, select = j)   Subset of the $i$ cases and $j$ variables of a data frame
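
The data frame commands can be tried on a small example. The sketch below uses a hypothetical data frame with two columns; res is an illustrative name.

```r
df <- data.frame(age   = c(23, 24, 25, 26),
                 marks = c(99, 80, 67, 56))

df[2, "marks"]                  # single cell: 80
df[df$marks > 60, ]             # rows selected by a logical condition
df[1, "marks"] <- 100           # reset one value

subset(df, subset = age > 23)   # cases where age exceeds 23
subset(df, select = marks)      # only the marks column

# Cases and variables selected together
res <- subset(df, subset = age > 23, select = marks)
res
```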