Important MCQs On dplyr in R 16

This post presents multiple-choice questions on the dplyr package in the R language. There are 20 MCQs about the package and its use. Let us start with the quiz on dplyr.

Online Multiple Choice Questions about R and dplyr package

1. What does the dplyr verb ‘Arrange’ do?
2. How can a new column/variable (total_price) be created in dplyr as the sum of two existing columns/variables, price1 and price2?
3. The dplyr verb Arrange is responsible for what action?
4. What does the dplyr verb ‘Mutate’ do?
5. What is the function of the dplyr verb Select?
6. What does the dplyr verb mutate do?
7. What does the dplyr verb ‘Select’ do?
8. In dplyr, what does the slice() function do?
9. What symbol is used in dplyr to hold verbs together in a single phrase?
10. How does Summarise work?
11. What does the dplyr verb ‘Filter’ do to a data frame?
12. The ________ function is similar to the existing subset() function in R but is quite a bit faster.
13. What is the purpose of the ungroup() function in dplyr?
14. What is the function of the dplyr verb Filter?
15. Reproducibility tools for reports like knitr help with:
16. In dplyr, what is the purpose of the %>% operator (known as the pipe operator)?
17. Example tools for reproducible report writing are:
18. What is the purpose of the distinct() function in dplyr?
19. What does the dplyr verb ‘Group By’ do?
20. What is the function of the dplyr verb Group By?


An Introduction to dplyr Package

The dplyr package is used for data manipulation and transformation. It provides a set of functions (often called verbs) for common data manipulation tasks, including filtering rows, selecting columns, arranging rows, creating new columns, grouping, summarizing, and joining data frames.

The package is part of the tidyverse, a collection of R packages designed to work together seamlessly for data analysis and visualization.

Some key functions available in dplyr R Package include:

  • filter(): Used to subset rows based on specified conditions.
  • select(): Used to choose specific columns from a data frame.
  • arrange(): Used to reorder rows based on one or more columns.
  • mutate(): Used to create new columns or modify existing ones.
  • group_by(): Used to group data by one or more variables.
  • summarize() (or summarise()): Used to compute summary statistics for groups of data.
  • inner_join(), left_join(), right_join(), full_join(): Used to merge data frames based on common keys.
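
As a rough sketch of how these verbs chain together, the example below builds two small made-up data frames (the names orders, managers, region, price1, price2, and manager are assumptions for illustration, not from any real dataset). It creates total_price with mutate() and then filters, groups, summarises, joins, sorts, and selects. It assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical toy data; all names and values are made up for illustration
orders <- data.frame(
  region = c("east", "west", "east", "west", "east"),
  price1 = c(10, 20, 30, 40, 50),
  price2 = c(1, 2, 3, 4, 5)
)
managers <- data.frame(
  region  = c("east", "west"),
  manager = c("Ali", "Sara")
)

region_totals <- orders %>%
  mutate(total_price = price1 + price2) %>%      # new column from two existing ones
  filter(total_price > 20) %>%                   # keep rows meeting a condition
  group_by(region) %>%                           # one group per region
  summarise(mean_total = mean(total_price),      # one summary row per group
            n_orders   = n()) %>%
  left_join(managers, by = "region") %>%         # add the manager for each region
  arrange(desc(mean_total)) %>%                  # sort rows, largest mean first
  select(region, manager, mean_total, n_orders)  # choose and order columns

print(region_totals)
```

Each verb takes a data frame as its first argument and returns a data frame, which is what lets the %>% operator pass the result of one step into the next.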

The dplyr package provides a powerful and efficient toolkit for data manipulation in R.


https://itfeature.com, https://gmstat.com

Simple Random Sampling in R: Explained Easy

Introduction to Simple Random Sampling in R

Simple random sampling (SRS) is the most basic method of drawing a probability sample. A sample of $n$ units is selected from a population of $N$ units in such a way that each of the $\binom{N}{n}$ possible samples has the same chance of being selected. The specific sample can be chosen using a random number generator on a computer. In this post, we will learn about simple random sampling in R, that is, how to select the elements of a sample using simple random sampling.

The following commands will generate random permutations of $n$ integers or random samples from a population of numbers.

Random permutation of integers $1$ to $n$

The call sample(n) returns a random permutation of the integers 1 to n.

sample(10)

## Output
[1]  5  8  9  4  3  2  1  6 10  7
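
Because sample() draws at random, the output shown above will differ from run to run. A minimal sketch of how to make a draw repeatable with set.seed() follows; the seed value 2024 is an arbitrary choice made for illustration.

```r
# Fix the random number generator state so the draw can be repeated
set.seed(2024)           # arbitrary seed, chosen only for illustration
perm1 <- sample(10)

set.seed(2024)           # resetting the same seed reproduces the same draw
perm2 <- sample(10)

identical(perm1, perm2)  # TRUE: both permutations are the same
sort(perm1)              # the integers 1 to 10, each appearing exactly once
```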

Random permutation of elements in a vector $x$

A random permutation of the elements of a vector $x$ can be obtained using sample(x).

x <- c(20, 25, 19, -15, 4, 21, -1, 0, 23)
sample(x)

## Output
[1]  21  25   0   4  20 -15  -1  19  23

Random Sample of $n$ items from $x$ without replacement

A random sample of $n$ elements can be drawn from a vector $x$ without replacement using sample(x, n).

x <- c(20, 25, 19, -15, 4, 21, -1, 0, 23)
sample(x, 5)

## Output
[1] -1 19 21 23  0

Random sample of $n$ items from $x$ with replacement

A random sample of $n$ items from vector $x$ can be selected with replacement using sample(x, 5, replace = TRUE).

x <- c(20, 25, 19, -15, 4, 21, -1, 0, 23)
sample(x, 5, replace = TRUE)

## Output
[1]  0 -1  4 19 -1

Random Sample with Probabilities

A random sample of $n$ items can be drawn from $x$ with each element having a different probability of selection. A vector of probability weights, one per element of $x$, is supplied through the prob argument of sample(). R rescales the weights internally, so they need not sum to one, although a vector that sums to one is easier to read.

x <- c(23, 45, 69, -1, .9, 4, 25, 19)
p <- c(.1, .1, 0, 0, .2, .3, .1, .2)
sum(p)

sample(x, 5, replace = TRUE, prob = p)

## Output
[1]  4 19 19 19 45
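
One way to check that the weights behave as expected is to draw a large sample with replacement and compare the observed proportions with p. This is a sketch only; the sample size of 10000 and the seed are arbitrary assumptions.

```r
x <- c(23, 45, 69, -1, .9, 4, 25, 19)
p <- c(.1, .1, 0, 0, .2, .3, .1, .2)

set.seed(1)                                   # arbitrary seed for repeatability
draws <- sample(x, 10000, replace = TRUE, prob = p)

# Observed proportion of each element, in the same order as x
observed <- as.vector(table(factor(draws, levels = x))) / length(draws)
round(cbind(x, p, observed), 3)

# Elements with zero weight (69 and -1) are never selected
any(draws %in% c(69, -1))                     # FALSE
```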

Random Selection of Integers without Replacement

A random selection of $n$ integers from the integers 1 to $N$ without replacement can be made using sample(N, n).

sample(1000, 10)

## Output
[1] 138 147 911 523 586 163 915 966 951 245
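
In practice, the integers returned by sample(N, n) are often used as row indices to pull a simple random sample from a sampling frame. The sketch below uses a made-up data frame named pop_frame with columns id and y; these names and the simulated values are assumptions for illustration.

```r
set.seed(10)                                 # arbitrary seed for repeatability
N <- 1000                                    # population size
pop_frame <- data.frame(id = 1:N,            # hypothetical sampling frame
                        y  = rnorm(N, mean = 50, sd = 10))

n <- 10
rows <- sample(N, n)                         # n distinct row numbers from 1 to N
srs  <- pop_frame[rows, ]                    # the selected units

nrow(srs)                                    # 10
anyDuplicated(srs$id)                        # 0: no unit is selected twice
```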

From a simple random sample, one can estimate the population mean $\mu$ and the variance of that estimate.

Let $y_1, y_2, \cdots, y_n$ be the measurements obtained from the simple random sampling of $n$ units from the population. The estimator of population mean $\mu$ is

$$\hat{\mu} = \frac{1}{n} \sum\limits_{i=1}^n y_i$$

with estimated variance of $\hat{\mu}$ given by

$$\widehat{\mathrm{var}}(\hat{\mu}) = \frac{s^2}{n} \left( \frac{N-n}{N}\right)$$

where $s^2 = \frac{1}{n-1} \sum\limits_{i=1}^n (y_i - \overline{y})^2$.
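
The two formulas above can be computed directly in R. The following is a minimal sketch under assumed values: the population y_pop is simulated, and N = 1000 and n = 50 are arbitrary choices.

```r
set.seed(42)                                   # arbitrary seed
N <- 1000                                      # population size (assumed)
n <- 50                                        # sample size (assumed)
y_pop <- rnorm(N, mean = 100, sd = 15)         # hypothetical population values

y <- sample(y_pop, n)                          # simple random sample without replacement

mu_hat  <- mean(y)                             # estimator of the population mean
s2      <- var(y)                              # sample variance with divisor n - 1
var_hat <- (s2 / n) * ((N - n) / N)            # estimated variance of mu_hat
se_hat  <- sqrt(var_hat)                       # estimated standard error

c(mu_hat = mu_hat, var_hat = var_hat, se_hat = se_hat)
```

The factor (N - n) / N is the finite population correction; it shrinks the estimated variance as the sample covers a larger share of the population.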


Statistical Power Analysis in R: A Comprehensive Guide

Introduction to Power Analysis

This post is about statistical power analysis in R. First, consider the meaning of power in statistics. Power is the probability ($1-\beta$) of detecting an effect given that the effect really exists. Equivalently, it is the probability of correctly rejecting the null hypothesis when it is false.

Consider a simple study comparing drug-A with a placebo, and suppose the drug is truly effective. The power is the probability of finding a statistically significant difference between the two groups (drug-A and placebo). Suppose the power is $1-\beta=0.8$. If the study were repeated many times, about 80% of the studies would show a statistically significant difference between the drug-A and placebo groups, and about 20% would not. Therefore, the probability of a Type-II error is $\beta=0.2$.
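
The idea of repeating the study many times can be illustrated by simulation. The sketch below first uses power.t.test() from the stats package to find the per-group sample size for 80% power, then simulates repeated two-group studies and counts how often the t-test rejects. The effect size (a mean difference of 0.5 with standard deviation 1), the two-sided 5% level, and the number of simulated studies are all assumptions chosen for illustration.

```r
# Sample size per group for 80% power (two-sample, two-sided t-test)
res <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80)
n_per_group <- ceiling(res$n)
n_per_group

# Simulate many repetitions of the drug-A versus placebo study
set.seed(123)                                  # arbitrary seed
n_sims <- 2000
rejected <- replicate(n_sims, {
  placebo <- rnorm(n_per_group, mean = 0,   sd = 1)
  drug_a  <- rnorm(n_per_group, mean = 0.5, sd = 1)
  t.test(drug_a, placebo, var.equal = TRUE)$p.value < 0.05
})

mean(rejected)        # share of studies that detect the effect; should be near 0.80
1 - mean(rejected)    # share of studies that miss it (an estimate of beta)
```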

One-Sample Power

The following plot illustrates a one-sample, one-tailed (greater than) t-test. When the null hypothesis $H_0:\mu = \mu_0$ is true, the test statistic $t$ follows the null distribution indicated by the hatched area. Under the specific alternative hypothesis $H_1:\mu = \mu_1$, the test statistic $t$ follows the distribution shown by the solid area.

The $\alpha$ is the probability of making a type-I error (that is, rejecting $H_0$ when it is true), and “crit. val.” marks the location of the critical value $t_{crit}$ associated with $H_0$ on the scale of the data. The rejection region is the area under the $H_0$ curve to the right of “crit. val.”, that is, at least as far from $\mu_0$ as the critical value.

The test’s power ($1-\beta$) is the green area, the area under $H_1$ in the rejection region. A type-II error is made when $H_1$ is true, but we fail to reject $H_0$ in the red region.

Type-II Error and Power Analysis in R

#One Sample Power

x <- seq(-4, 4, length = 1000)
hx <- dnorm(x, mean = 0, sd = 1)

plot(x, hx, type = "n", xlim = c(-4, 8), ylim = c(0, 0.5),
     main = expression (paste("Type-II Error (", beta, ") and Power (", 1 - beta, ")")), 
     axes = FALSE)

# one-tailed shift: place mu[1] at 1.7 times the upper 5% critical value
shift <- qnorm(1 - 0.05, mean = 0, sd = 1) * 1.7
xfit2 <- x + shift
yfit2 <- dnorm(xfit2, mean = shift, sd = 1)

axis (1, at = c(-qnorm(0.05), 0, shift), labels = expression("crit. val.", mu[0], mu[1]))
axis(1, at = c(-4, 4 + shift), labels = expression(-infinity, infinity), 
     lwd = 1, lwd.tick = FALSE)

# The alternative hypothesis area 
# the red - underpowered area

lb <- min(xfit2)               # lower bound
ub <- round(qnorm(0.95), 2)    # upper bound
col1 = "#CC2222"

i <- xfit2 >= lb & xfit2 <= ub
polygon(c(lb, xfit2[i], ub), c(0, yfit2[i],0), col = col1)

# The green area where the power is
col2 = "#22CC22"
i <- xfit2 >= ub
polygon(c(ub, xfit2[i], max(xfit2)), c(0, yfit2[i], 0), col = col2)

# Outline the alternative hypothesis
lines(xfit2, yfit2, lwd = 2)

# Print null hypothesis area
col_null = "#AAAAAA"
polygon (c(min(x), x, max(x)), c(0, hx, 0), col = col_null,
         lwd = 2, density = c(10, 40), angle = -45, border = 0)

lines(x, hx, lwd = 2, lty = "dashed", col=col_null)

axis(1, at = (c(ub, max(xfit2))), labels = c("", expression(infinity)), col = col2,
     lwd = 1, lwd.tick = FALSE)

#Legend
legend("topright", inset = 0.015, title = "Color", 
       c("Null Hypothesis", "Type-II error", "Power"), fill = c(col_null, col1, col2), 
       angle = -45, density = c(20, 1000, 1000), horiz = FALSE)

abline(v=ub, lwd=2, col="#000088", lty = "dashed")
arrows(ub, 0.45, ub+1, 0.45, lwd=3, col="#008800")
arrows(ub, 0.45, ub-1, 0.45, lwd=3, col="#880000")
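
The shaded areas in the plot can also be computed numerically. Note that the plotting code draws normal curves with dnorm(), so the sketch below uses the normal distribution rather than the t distribution. It reuses the same assumed values: a one-sided level of 0.05 and an alternative mean 1.7 times the critical value.

```r
alpha <- 0.05
crit  <- qnorm(1 - alpha)                  # critical value under H0
mu1   <- 1.7 * crit                        # mean under H1, the 'shift' used in the plot

beta  <- pnorm(crit, mean = mu1, sd = 1)   # red area: fail to reject although H1 is true
power <- 1 - beta                          # green area: reject H0 when H1 is true

round(c(crit = crit, mu1 = mu1, beta = beta, power = power), 3)
```

The plotting code rounds the critical value to two decimals before shading, so areas read off the figure may differ slightly from these values.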