Data Frame in R Language

Introduction to Data Frame in R Language

In the R programming language, a data frame is a two-dimensional data structure made up of rows and columns. Every column must have the same length, that is, the same number of rows. The intersection of a row and a column is a cell, and each cell is identified by its row number and column number.

A data frame in the R programming language has:

  • Rows: Represent individual observations or data points.
  • Columns: Represent variables or features being measured. Each column holds values for a single variable across all observations.
  • Data Types: Columns can hold data of different types, including numeric, character, logical (TRUE/FALSE), and factors (categorical variables).
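As a quick illustration of these points, here is a minimal sketch with a small, made-up data frame (the name `students` and its values are hypothetical) that mixes character, numeric, logical, and factor columns. str() reports the type of each column, and a single cell is addressed by its row and column position or by name. Character columns stay character only in R 4.0 and later, where stringsAsFactors defaults to FALSE.

```r
# Hypothetical example data: one row per student, one column per variable
students <- data.frame(
  name    = c("Ali", "Sara", "Omar"),   # character column
  score   = c(88.5, 92.0, 79.5),        # numeric column
  passed  = c(TRUE, TRUE, FALSE),       # logical column
  section = factor(c("A", "B", "A"))    # factor (categorical) column
)

str(students)          # type of each column
dim(students)          # number of rows and columns: 3 4
students[2, 3]         # cell in row 2, column 3: TRUE
students[2, "score"]   # same row, column chosen by name: 92
```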

The contents of a data frame can be modified, extracted, and rearranged; this process is called data frame manipulation. A data frame is created with the data.frame() function, following the general syntax below.

Data Frame Syntax in R

The general syntax for creating a data frame in R is:

df <- data.frame(first_column  = c(value1, value2, ...),
                 second_column = c(value1, value2, ...),
                 ...
                 )

An example of a data frame in the R programming language is:

df = data.frame(age = c(23, 24, 25, 26, 23, 25, 29, 20),
                marks = c(99, 80, 67, 56, 98, 65, 45, 77),
                grade = c("A", "A", "C", "D", "A", "B", "F", "B")
                )
print(df)

The columns and rows of a data frame can be named or renamed with colnames() and row.names():

# Naming / renaming columns 
colnames(df) <- c("Age", "Score", "Grade")

# Naming / renaming rows
row.names(df) <- c("1st", "2nd", "3rd", "4th", "5th", "6th", "7th", "8th")
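Once rows and columns have names, they can be used for indexing instead of positions. The sketch below rebuilds the example data frame with those names (using the row.names argument of data.frame()) so it runs on its own, then selects a cell and whole rows by name.

```r
# Rebuild the example data frame with the column and row names used above
df <- data.frame(Age   = c(23, 24, 25, 26, 23, 25, 29, 20),
                 Score = c(99, 80, 67, 56, 98, 65, 45, 77),
                 Grade = c("A", "A", "C", "D", "A", "B", "F", "B"),
                 row.names = c("1st", "2nd", "3rd", "4th",
                               "5th", "6th", "7th", "8th"))

colnames(df)            # "Age" "Score" "Grade"
df["3rd", "Score"]      # cell addressed by row and column name: 67
df[c("1st", "8th"), ]   # whole rows selected by name
```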

Subsetting a Data Frame

The subset() function returns a new data frame that keeps only the rows satisfying a logical condition and/or the columns listed in its select argument; the original data frame is unchanged unless the result is assigned back to it. To understand subsetting a data frame, let us create a data frame first.

# creating a data frame with three columns
df = data.frame(col1 = 0:3, col2 = 3:6, col3 = 6:9)

# keeping only col1 and col2 (the result overwrites df)
df <- subset(df, select = c(col1, col2))
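subset() can also filter rows and drop columns. A minimal sketch on a toy data frame (column names chosen for illustration): the second argument is a row condition, a negative name in select drops that column, and the last line shows the equivalent base-R indexing with [ ].

```r
df <- data.frame(col1 = 0:3, col2 = 3:6, col3 = 6:9)

# rows where col1 > 1, keeping only col1 and col3
s1 <- subset(df, col1 > 1, select = c(col1, col3))

# drop col3 instead of listing the columns to keep
s2 <- subset(df, select = -col3)

# same selection as s1 using plain indexing
s3 <- df[df$col1 > 1, c("col1", "col3")]
```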

Question: Data Frame in R Language

Suppose we have a frequency distribution of sales from a sample of 100 sales receipts.

Price Value    Number of Sales
0 to 20        16
20 to 40       18
40 to 60       14
60 to 80       24
80 to 100      20
100 to 120     8

Using R, calculate the mean, median, variance, standard deviation, and coefficient of variation.

Solution

# Create a data frame

df <- data.frame(lower_class = seq(0, 100, by = 20), upper_class=seq(20, 120, by=20), freq = c(16, 18, 14, 24, 20, 8))

# mid points
m <- (df["lower_class"] + df["upper_class"])/2

mf <- df["freq"] * m
mfsquare <- df["freq"] * m^2


data <- cbind(df, m, mf, mfsquare)
colnames(data) <- c("LL","UL", "freq" , "M", "mf", "mf2")

# Computation
avg = sum(data$mf)/sum(data$freq)
var = (sum(data$mf2) - sum(data$mf)^2 / sum(data$freq))/(sum(data$freq)-1)
sd = sqrt(var)
CV = sd/avg * 100

## Outputs
paste("Mean = ", round(avg, 3))
paste("Variance = ", round(var, 3))
paste("Standard Deviation = ", round(sd, 3))
paste("Coefficient of Variation = ", round(CV, 3))
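The question also asks for the median, which the code above does not compute. For grouped data, a standard formula is Median = L + ((n/2 - cf) / f) × h, where L is the lower limit of the median class, n the total frequency, cf the cumulative frequency before the median class, f the frequency of the median class, and h the class width. A minimal sketch, assuming the classes are contiguous as in this table:

```r
# Frequency table from the question
df <- data.frame(lower_class = seq(0, 100, by = 20),
                 upper_class = seq(20, 120, by = 20),
                 freq = c(16, 18, 14, 24, 20, 8))

n  <- sum(df$freq)                           # total frequency
cf <- cumsum(df$freq)                        # cumulative frequencies
k  <- which(cf >= n / 2)[1]                  # row index of the median class
L  <- df$lower_class[k]                      # lower limit of median class
h  <- df$upper_class[k] - df$lower_class[k]  # width of median class
cf_before <- if (k > 1) cf[k - 1] else 0     # cumulative freq. before it
f  <- df$freq[k]                             # frequency of median class

median_grouped <- L + (n / 2 - cf_before) / f * h
paste("Median = ", round(median_grouped, 3))
```

Here n/2 = 50 falls in the 60 to 80 class (cumulative frequency 48 before it), so the median is 60 + (2/24) × 20.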

Using Logical Conditions for Selecting Rows and Columns

To select rows and columns using logical conditions, we use the built-in iris data set. Suppose we want the rows whose Sepal.Length is greater than the median of Sepal.Length and whose Petal.Width is at least 1.7. In the code below, each value of the Sepal.Length variable (column) is compared with the median of Sepal.Length, and each value of Petal.Width is compared with 1.7; the & operator keeps only the rows that satisfy both conditions.

attach(iris) 

iris[(Sepal.Length > median(Sepal.Length) & Petal.Width >= 1.7), ]
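attach() places the columns of iris on the search path, where they can mask, or be masked by, other objects with the same names. A sketch of the same row selection without attach(), using the base R functions with() and subset(), both of which evaluate column names inside the data frame:

```r
# logical row index computed inside iris with with()
keep <- with(iris, Sepal.Length > median(Sepal.Length) & Petal.Width >= 1.7)
sel1 <- iris[keep, ]

# the same condition passed to subset()
sel2 <- subset(iris, Sepal.Length > median(Sepal.Length) & Petal.Width >= 1.7)

nrow(sel1)   # number of rows meeting both conditions
```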

Columns can also be selected by type, and rows by the level of a factor:

# Selecting Numeric Columns only
iris[ , sapply(iris, is.numeric)]

# Selecting factor columns only
iris[, sapply(iris, is.factor)]

# Selecting only certain Species
 iris[Species == "virginica", ]

Omitting Missing Observations in a Data Frame

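# Omit rows with missing data
na.omit(iris)

# logical matrix flagging missing values (TRUE = missing)
is.na(iris)

# number of missing values in each column
colSums(is.na(iris))

# keep only complete rows (same rows as na.omit())
iris[complete.cases(iris), ]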
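The iris data set contains no missing values, so na.omit() and complete.cases() keep all 150 of its rows. A small hypothetical data frame with NA entries (the name `scores` and its values are made up) makes the behaviour visible:

```r
# Hypothetical data with two missing values
scores <- data.frame(id    = 1:4,
                     math  = c(80, NA, 75, 90),
                     stats = c(70, 85, NA, 95))

colSums(is.na(scores))             # missing values per column: id 0, math 1, stats 1
complete.cases(scores)             # TRUE FALSE FALSE TRUE
na.omit(scores)                    # keeps rows 1 and 4
scores[complete.cases(scores), ]   # same rows as na.omit()
```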

https://itfeature.com

https://gmstat.com

Important MCQs R Package Development 13

This post is an MCQs quiz on R package development. The quiz also contains questions about Git. There are 17 questions in total, and some of them have more than one correct answer. Let us start with the MCQs on R package development.

Online MCQs about R Package Development

1. What does the is_a() function do in the context of testthat?

2. What is the purpose of the DESCRIPTION file in a package?

3. Which of the following files and folders are required in an R package?

4. When a test fails in a call to expect_that(), what happens?

5. What does the ::: operator do?

6. Which of the following statements correctly describes how R functions should be defined within the package directory?

7. How is attaching a package namespace different from loading a namespace?

8. What is Git?

9. What is the purpose of the Imports field in the DESCRIPTION file?

10. Which of the following are good reasons for open-sourcing your software?

11. The GNU General Public License is called a copyleft license because

12. Which of the following functions from the `devtools` package are you likely to use often, rather than just once per package, when building a package?

13. In which sub-directory of an R package should tests be placed?

14. For packages that require C code, what should be installed on your system?

15. Which of the following files and subdirectories will be included in the initial package directory if you create a new package using the ‘create’ function from ‘devtools’?

16. Which of the following are good reasons to build an R Package?

17. What is a pull request on GitHub?

Important Online Python Quiz 2

This post contains a quiz about Python, with answers. The MCQs in this Python quiz cover data frames in Python (pandas), some basic concepts, and an introduction to Python. Let us start with the quiz.

Please go to Important Online Python Quiz 2 to view the test

Python Quiz with Answers

  • We have a data frame called df. Which line of code aggregates the data based on a column (col_A) and counts the number of rows?
  • We have a data frame called df. Which method is used to create a line chart of two columns $a$ and $b$?
  • We want to delete a list of columns from our data frame df. Which one of these methods is used to delete a column in a data frame?
  • Which method creates the correlation matrix of the numerical columns in a data frame df?
  • How would you access the column “symboling” from the data frame df?
  • We have a data frame called df. Which pandas property is used to check the columns’ data types?
  • We have a data frame called df, choose the correct pandas property that shows you the number of rows and columns in the data frame.
  • Suppose you have a data frame named df. What does the method df.head(12) do to the data frame?
  • A data professional wants to merge two pandas data frames so that only the keys present in both data frames are included in the merge. What technique can they use to do so?
  • How do you determine the median of data using Pandas?
  • How would you get the columns for temperature and rainfall from a data frame using Python?
  • What will be the datatype of the output of df[‘A’] where ‘df’ is a data frame and ‘A’ is one of the columns?
  • A data professional is working with a NumPy array that has three rows and two columns. They want to change the data into two rows and three columns. What method can they use to do so?
  • Which data structure is [1,2,3,4]?
  • What is indicated by the term null?
  • What is the name of the attribute that we want to predict?
  • What is the difference between a list and a tuple in Python?
  • What is the syntax to create a tuple in Python?
  • What happens when you add an int and a float?