Comparing Two Sample Means in R

Comparing two sample means in R is straightforward, because all the classical tests are available in the stats package, which is loaded by default. Common comparison tests include (i) the one-sample mean test, (ii) the two independent sample means test, and (iii) the dependent (paired) samples test. When the population standard deviation is known, or the sample size (number of observations in the sample) is large enough ($n \ge 30$), tests based on the normal distribution (z-tests) are performed.
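The stats package has no built-in two-sample z-test, so the sketch below shows one under the assumption that the population standard deviations are known. The helper name z_test_two and the data vectors x and y are hypothetical, chosen only for illustration; for large samples, the sample standard deviations sd(x) and sd(y) are often plugged in for sigma_x and sigma_y.

```r
# Hypothetical helper: two-sample z-test for the difference of means,
# assuming the population standard deviations sigma_x and sigma_y are known
z_test_two <- function(x, y, sigma_x, sigma_y, mu = 0) {
  se <- sqrt(sigma_x^2 / length(x) + sigma_y^2 / length(y))  # standard error of mean(x) - mean(y)
  z <- (mean(x) - mean(y) - mu) / se                          # test statistic
  p_value <- 2 * pnorm(-abs(z))                               # two-sided p-value
  list(statistic = z, p.value = p_value)
}

# Illustrative data (not from this post)
x <- c(10.2, 9.8, 10.5, 10.1)
y <- c(9.6, 9.9, 9.4, 9.7)
res <- z_test_two(x, y, sigma_x = 0.3, sigma_y = 0.3)
res
```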

Data for Two Sample Means

Consider the following data set on the “latent heat of fusion of ice (cal/gm)” from Rice (1995).

Method A: 79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04, 79.97, 80.05, 80.03, 80.02, 80.00, 80.02
Method B: 80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97

Let us draw boxplots to compare these two methods. The comparison also helps in checking the assumptions of the independent two-sample test, for example, whether the two groups have a similar spread.

Note that one can read the data using the scan() function, create vectors with c(), or read the data from files such as *.txt and *.csv. In this tutorial, the measurements for Method A and Method B are stored in the vectors $A$ and $B$.

A = c(79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04, 79.97, 80.05, 80.03, 80.02, 80.00, 80.02)
B = c(80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97)
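As an alternative to typing c(...), the sketch below reads the same values with scan() and then round-trips them through a CSV file. The temporary file stands in for a real data file, and the column names method and value are assumptions made for illustration. Storing the data in this "long" format (one row per measurement) also allows the formula interface of t.test().

```r
# Read the same values with scan(); quiet = TRUE suppresses the "Read n items" message
A <- scan(text = "79.98 80.04 80.02 80.04 80.03 80.03 80.04 79.97 80.05 80.03 80.02 80.00 80.02",
          quiet = TRUE)
B <- scan(text = "80.02 79.94 79.98 79.97 79.97 80.03 79.95 79.97", quiet = TRUE)

# Store the data in long format (one row per measurement) in a temporary CSV file
heat <- data.frame(method = rep(c("A", "B"), times = c(length(A), length(B))),
                   value  = c(A, B))
csv_file <- tempfile(fileext = ".csv")
write.csv(heat, csv_file, row.names = FALSE)

# Read the file back and split the measurements by method
heat_in <- read.csv(csv_file)
A_csv <- heat_in$value[heat_in$method == "A"]
B_csv <- heat_in$value[heat_in$method == "B"]

# Formula interface: compares the values of the groups defined by method
t_formula <- t.test(value ~ method, data = heat_in)
```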

Draw a Boxplot of Samples

The boxplots of the two methods indicate that the first group (Method A) tends to give higher results than the second one (Method B).

boxplot(A, B, names = c("Method A", "Method B"), ylab = "Latent heat of fusion (cal/gm)")
[Figure: boxplots of Method A and Method B]

Comparing Two Sample Means in R using t.test() Function

The unpaired t-test (independent two-sample test) for the equality of the means can be done using the function t.test() in R Language.

t.test(A, B)
[Figure: output of t.test(A, B)]

From the results above, the p-value (0.006939) is less than the 0.05 level of significance, so the null hypothesis of equal means is rejected: on average, the two methods give statistically different values of the latent heat of fusion of ice.
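By default, t.test() runs Welch's test, which uses a separate variance for each sample and an approximate (Welch–Satterthwaite) degrees of freedom. The minimal sketch below recomputes the statistic, degrees of freedom, and p-value by hand and compares them with the components of the object returned by t.test(); the names se, t_welch, df_welch, and p_welch are illustrative.

```r
A <- c(79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04, 79.97, 80.05, 80.03, 80.02, 80.00, 80.02)
B <- c(80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97)

res <- t.test(A, B)  # Welch two-sample t-test (var.equal = FALSE is the default)
res$statistic        # t statistic
res$parameter        # Welch degrees of freedom
res$p.value          # two-sided p-value
res$conf.int         # 95% confidence interval for the difference in means

# The same quantities computed by hand
vA <- var(A) / length(A)
vB <- var(B) / length(B)
se <- sqrt(vA + vB)
t_welch <- (mean(A) - mean(B)) / se
df_welch <- (vA + vB)^2 / (vA^2 / (length(A) - 1) + vB^2 / (length(B) - 1))
p_welch <- 2 * pt(-abs(t_welch), df_welch)
```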

Testing the Equality of Variances of Samples

Note that, by default, t.test() in R does not assume that the two samples have equal variances; it performs the Welch two-sample t-test. The F-test can be used to test the equality of the two variances, provided that both samples come from normal populations.

var.test(A, B)
[Figure: output of var.test(A, B)]
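To see what var.test() computes, the sketch below forms the F statistic as the ratio of the two sample variances and derives the two-sided p-value from the F distribution with $n_A - 1$ and $n_B - 1$ degrees of freedom. The variable names are illustrative.

```r
A <- c(79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04, 79.97, 80.05, 80.03, 80.02, 80.00, 80.02)
B <- c(80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97)

vt <- var.test(A, B)

# F statistic: ratio of the sample variances
f_ratio <- var(A) / var(B)
dfA <- length(A) - 1  # numerator degrees of freedom
dfB <- length(B) - 1  # denominator degrees of freedom

# Two-sided p-value: twice the smaller tail probability
p_lower <- pf(f_ratio, dfA, dfB)
p_f <- 2 * min(p_lower, 1 - p_lower)
```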

From the above results, there is no evidence that the variances of the two samples differ significantly, as the p-value is greater than the 0.05 level of significance. Therefore, one can use the classical two-sample t-test, which assumes equal variances, by setting var.equal = TRUE.

t.test(A, B, var.equal = TRUE)

For comparison, the output of the Welch test, t.test(A, B), from the previous section is:

## Output
        Welch Two Sample t-test

data:  A and B
t = 3.2499, df = 12.027, p-value = 0.006939
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.01385526 0.07018320
sample estimates:
mean of x mean of y 
 80.02077  79.97875 
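When var.equal = TRUE is passed to t.test(), R pools the two sample variances and uses $n_A + n_B - 2$ degrees of freedom. The minimal base-R sketch below computes the pooled-variance t statistic by hand and compares it with t.test(); the names nA, nB, sp2, and t_pooled are illustrative.

```r
A <- c(79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04, 79.97, 80.05, 80.03, 80.02, 80.00, 80.02)
B <- c(80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97)

nA <- length(A)
nB <- length(B)

# Pooled variance: weighted average of the two sample variances
sp2 <- ((nA - 1) * var(A) + (nB - 1) * var(B)) / (nA + nB - 2)

# Classical (equal-variance) two-sample t statistic
t_pooled <- (mean(A) - mean(B)) / sqrt(sp2 * (1 / nA + 1 / nB))

res_pooled <- t.test(A, B, var.equal = TRUE)
res_pooled$parameter  # df = nA + nB - 2 = 19
```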

https://rfaqs.com

https://gmstat.com

Important MCQs On dplyr in R 16

This post presents 20 multiple-choice questions (MCQs) about the dplyr package in R Language and its use. Let us start with the quiz on dplyr in R Language.

Online Multiple Choice Questions about R and dplyr package

1. What symbol is used in dplyr that holds verbs together in a single phrase?

2. What does the dplyr verb ‘Group By’ do?

3. What does the dplyr verb ‘Mutate’ do?

4. What does the dplyr verb mutate do?

5. What is the function of the dplyr verb Group By?

6. What is the purpose of the ungroup() function in dplyr?

7. What does the dplyr verb ‘Filter’ do to a data frame?

8. How can a new column/variable (total_price) be created in dplyr as the sum of two existing columns/variables, price1 and price2?

9. What does the dplyr verb ‘Arrange’ do?

10. Reproducibility tools for reports, like knitr, help with:

11. How does Summarise work?

12. What does the dplyr verb ‘Select’ do?

13. The ______ function is similar to the existing subset() function in R but is quite a bit faster.

14. In dplyr, what does the slice() function do?

15. What is the function of the dplyr verb Select?

16. In dplyr, what is the purpose of the %>% operator (known as the pipe operator)?

17. What is the purpose of the distinct() function in dplyr?

18. Example tools for reproducible report writing are:

19. The dplyr verb Arrange is responsible for what action?

20. What is the function of the dplyr verb Filter?

An Introduction to dplyr Package

The dplyr package is used for data manipulation and transformation. It provides a set of functions that make common data manipulation tasks easy, including (1) filtering, (2) grouping, (3) summarizing, (4) arranging, and (5) joining data frames.

The package is part of the tidyverse, a collection of R packages designed to work together seamlessly for data analysis and visualization.

Some key functions available in the dplyr package include:

  • filter(): Used to subset rows based on specified conditions.
  • select(): Used to choose specific columns from a data frame.
  • arrange(): Used to reorder rows based on one or more columns.
  • mutate(): Used to create new columns or modify existing ones.
  • group_by(): Used to group data by one or more variables.
  • summarize(): Used to compute summary statistics for groups of data.
  • left_join(), inner_join(), right_join(), full_join(): Used to merge data frames based on common keys.
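The short pipeline below is a sketch of how these verbs chain together with the %>% pipe, assuming the dplyr package is installed. It uses the built-in mtcars data set; the lookup table cyl_labels and the new column names (hp_per_wt, avg_mpg, and so on) are hypothetical, chosen only for illustration.

```r
library(dplyr)

# filter(): keep rows that satisfy a condition
efficient <- mtcars %>% filter(mpg > 25)

# select(), mutate(), group_by(), summarize(), arrange() chained with %>%
cars_summary <- mtcars %>%
  select(mpg, cyl, hp, wt) %>%                 # keep four columns
  mutate(hp_per_wt = hp / wt) %>%              # new column from existing ones
  group_by(cyl) %>%                            # one group per number of cylinders
  summarize(n = n(),                           # number of cars in each group
            avg_mpg = mean(mpg),
            avg_hp_per_wt = mean(hp_per_wt)) %>%
  arrange(desc(avg_mpg))                       # highest average mpg first

# left_join(): merge with a (hypothetical) lookup table on the common key cyl
cyl_labels <- data.frame(cyl = c(4, 6, 8), label = c("four", "six", "eight"))
labelled <- cars_summary %>% left_join(cyl_labels, by = "cyl")
labelled
```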

The dplyr package provides a powerful and efficient toolkit for data manipulation in R.


Simple Random Sampling in R: Explained Easy

Introduction to Simple Random Sampling in R

Simple random sampling (SRS) is the most basic method of taking a probability sample. A sample of $n$ units is selected from a population of $N$ units in such a way that each of the $\binom{N}{n}$ possible samples has the same chance of being selected. The specific sample can be chosen using a random number generator on a computer. In this post, we will learn about simple random sampling in R, that is, how to select the elements of a sample using simple random sampling.
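The defining property above, that every one of the $\binom{N}{n}$ possible samples is equally likely, can be checked by simulation. The sketch below assumes a toy population of $N = 4$ units and samples of size $n = 2$, so there are choose(4, 2) = 6 possible samples, each of which should appear about 1/6 of the time; the number of repetitions is an arbitrary choice.

```r
set.seed(1)  # for reproducibility
N <- 4       # population size (toy example)
n <- 2       # sample size

choose(N, n)  # number of possible samples: 6

# Draw many SRSs and record each one as a label such as "1-3"
draws <- replicate(60000, paste(sort(sample(N, n)), collapse = "-"))

# Relative frequency of each possible sample; each should be close to 1/6
props <- table(draws) / length(draws)
props
```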

The following commands will generate random permutations of $n$ integers or random samples from a population of numbers.

Random permutation of integers $1$ to $n$

sample(n) returns a random permutation of the integers $1$ to $n$.

sample(10)

## Output
[1]  5  8  9  4  3  2  1  6 10  7

Random permutation of elements in a vector $x$

A random permutation of the elements of a vector $x$ can be obtained using sample(x).

x <- c(20, 25, 19, -15, 4, 21, -1, 0, 23)
sample(x)

## Output
[1]  21  25   0   4  20 -15  -1  19  23
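One caveat: when $x$ is a single number $\ge 1$, sample(x) treats it as $n$ and permutes 1 to $x$ instead of returning $x$ itself. The R documentation for sample() suggests a safer helper, sketched below; the name resample is only a convention from that help page, not a built-in function.

```r
# sample(10) permutes 1:10, even if 10 was meant as a vector with one element
length(sample(10))  # 10

# Safer helper: always samples from the elements of x, whatever its length
resample <- function(x, ...) x[sample.int(length(x), ...)]

set.seed(1)
resample(10)              # returns 10, the single element of x
resample(c(3, 7, 9))      # a permutation of 3, 7, 9
```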

Random Sample of $n$ items from $x$ without replacement

A random sample of $n$ elements can be selected from a vector $x$ without replacement using sample(x, n).

x <- c(20, 25, 19, -15, 4, 21, -1, 0, 23)
sample(x, 5)

## Output
[1] -1 19 21 23  0

Random sample of $n$ items from $x$ with replacement

A random sample of $n$ items can be selected from a vector $x$ with replacement using sample(x, n, replace = TRUE); below, $n = 5$.

x <- c(20, 25, 19, -15, 4, 21, -1, 0, 23)
sample(x, 5, replace = TRUE)

## Output
[1]  0 -1  4 19 -1
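The practical difference between the two schemes is that, without replacement, each element can be drawn at most once, so the sample size cannot exceed length(x); with replacement, any sample size is allowed and values may repeat. The sketch below illustrates this with the same vector $x$; the names s1, s2, and err are illustrative.

```r
x <- c(20, 25, 19, -15, 4, 21, -1, 0, 23)

# Without replacement: no element is drawn twice
s1 <- sample(x, 5)
anyDuplicated(s1)  # 0, because x itself has no repeated values

# Asking for more items than x holds fails unless replace = TRUE
err <- tryCatch(sample(x, 20), error = function(e) e)
conditionMessage(err)

# With replacement: any sample size, and values can repeat
s2 <- sample(x, 20, replace = TRUE)
```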

Random Sample with Probabilities

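To select a random sample of $n$ items from $x$ in which the elements of $x$ have different probabilities of selection, supply a vector of probabilities (the prob argument) with one entry for each element of $x$. In this example, the probabilities sum to one; sample() also accepts non-negative weights that do not sum to one and rescales them internally.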

x <- c(23, 45, 69, -1, .9, 4, 25, 19)
p <- c(.1, .1, 0, 0, .2, .3, .1, .2)
sum(p)  # check that the probabilities sum to 1

sample(x, 5, replace = TRUE, prob = p)

## Output
[1]  4 19 19 19 45
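A single sample of 5 items says little about whether the probabilities are respected. The sketch below draws a large sample with replacement and compares the observed relative frequencies with p; it also passes the unnormalized weights p * 10 to show that the weights are rescaled. The sample size and the seed are arbitrary choices.

```r
x <- c(23, 45, 69, -1, .9, 4, 25, 19)
p <- c(.1, .1, 0, 0, .2, .3, .1, .2)

set.seed(123)  # for reproducibility
draws <- sample(x, 100000, replace = TRUE, prob = p)

# Observed relative frequency of each element of x, next to its probability
freq <- sapply(x, function(v) mean(draws == v))
round(rbind(value = x, p = p, observed = freq), 3)

# Weights that do not sum to one are rescaled by sample()
draws_w <- sample(x, 100000, replace = TRUE, prob = p * 10)
freq_w <- sapply(x, function(v) mean(draws_w == v))
```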

Random Selection of Integers without Replacement

A random selection of $n$ integers from the integers 1 to $N$ without replacement can be done using sample(N, n).

sample(1000, 10)

## Output
[1] 138 147 911 523 586 163 915 966 951 245
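In practice, the population units are often the rows of a data frame. Drawing sample(N, n) row indices, as above, gives an SRS of rows. The sketch below treats the 32 cars in the built-in mtcars data set as the population; the sample size of 5 and the chosen columns are arbitrary.

```r
set.seed(2024)           # for reproducibility
N <- nrow(mtcars)        # population size (32 cars)
n <- 5                   # sample size
rows <- sample(N, n)     # SRS of row indices, without replacement
srs_cars <- mtcars[rows, c("mpg", "cyl", "wt")]
srs_cars
```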

Using an SRS, one can estimate the population mean $\mu$ and the variance of its estimator $\hat{\mu}$.

Let $y_1, y_2, \cdots, y_n$ be the measurements obtained from a simple random sample of $n$ units from the population. The estimator of the population mean $\mu$ is

$$\hat{\mu} = \frac{1}{n} \sum\limits_{i=1}^n y_i$$

with estimated variance of $\hat{\mu}$ given by

$$\widehat{\operatorname{var}}(\hat{\mu}) = \frac{s^2}{n} \left( \frac{N-n}{N}\right)$$

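where $s^2 = \frac{1}{n-1} \sum\limits_{i=1}^n (y_i - \overline{y})^2$ is the sample variance, $\overline{y} = \hat{\mu}$ is the sample mean, and $(N-n)/N$ is the finite population correction.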
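The two formulas above translate directly into R. The sketch below is a minimal implementation; the helper name srs_mean_estimate is hypothetical, and the illustrative SRS of 10 cars from mtcars is an assumption made only to have some data. Note that var() in R already uses the $n-1$ divisor, matching $s^2$.

```r
# Hypothetical helper: SRS estimate of the population mean and its estimated
# variance, including the finite population correction (N - n) / N
srs_mean_estimate <- function(y, N) {
  n <- length(y)
  mu_hat <- mean(y)                    # (1/n) * sum(y_i)
  s2 <- var(y)                         # sample variance with divisor n - 1
  var_hat <- (s2 / n) * ((N - n) / N)  # estimated var(mu_hat)
  list(mu_hat = mu_hat, var_hat = var_hat, se = sqrt(var_hat))
}

# Illustrative use: an SRS of 10 cars from the 32 cars in mtcars
set.seed(1)
y <- sample(mtcars$mpg, 10)
srs_mean_estimate(y, N = nrow(mtcars))
```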
