Special Values in R Programming: A Quick Guide

R has several special values, namely NA, Inf, -Inf, NaN, and NULL.

Special Values in R Programming Language

Several formalized special values are used for numeric variables. Calculations involving special values often result in special values. In statistics, a real-world phenomenon should not include a special value. Therefore, it is desirable to handle special values before performing any statistical analysis, especially inferential analysis. Moreover, many functions in R produce errors or warnings when a variable contains special values.

The NA values in R (NA stands for Not Available) represent missing observations. A missing value may occur due to non-response of the respondent, or may arise when a vector is extended beyond its current length. For example,

v = c(1, 5, 6)
v[5] = 4
v

## Output
[1]  1  5  6 NA  4
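
As a small sketch of how a missing value propagates through a calculation, the vector from the example above can be rebuilt and summarised. The functions is.na() and mean() with its na.rm argument are base R; the variable name is only illustrative.

```r
# Rebuild the vector from the example: position 4 is filled with NA
v <- c(1, 5, 6)
v[5] <- 4

is.na(v)               # flags the missing position
mean(v)                # NA, because one element is missing
mean(v, na.rm = TRUE)  # mean of the four observed values
```

Dropping the NA with na.rm = TRUE is only one possible choice; whether it is appropriate depends on why the value is missing.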

To learn about how to handle missing values in R, see the article: Handling Missing Values in R

Inf and -Inf in R represent numbers that are too large in magnitude to be stored, which can occur during computation. Inf stands for positive infinity and -Inf for negative infinity. Inf or -Inf also results when a nonzero value is divided by 0. For example,

2 ^ 1024
## Output
[1] Inf

-2^1024

## Output
[1] -Inf

1/0

## Output
[1] Inf

-Inf + 1e10

## Output
[1] -Inf
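
The overflow behaviour above can be checked against the largest finite double on the current platform. This is a minimal sketch using only base R; the vector name big is illustrative.

```r
# Largest finite double on this platform (about 1.8e308 under IEEE 754)
.Machine$double.xmax

# 2^1023 is still finite, 2^1024 overflows, and -1/0 gives -Inf
big <- c(2^1023, 2^1024, -1 / 0)
big

is.infinite(big)   # FALSE for the finite value, TRUE for Inf and -Inf
```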

Sometimes a computation will produce a result that makes little sense. In such cases, R often returns NaN (Not a Number). For example,

Inf - Inf

## Output
[1] NaN

0/0

## Output
[1] NaN

In R, the null object is represented by the symbol NULL. It is often used as a function argument to indicate that no value was assigned to the argument. Additionally, some functions may return NULL. Note that NULL is not the same as NA, Inf, -Inf, or NaN.
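
The difference between NULL and NA can be sketched as follows. The function describe() and its label argument are hypothetical and exist only to illustrate NULL as a default argument.

```r
x_na   <- c(1, NA, 3)    # NA occupies a position in the vector
x_null <- c(1, NULL, 3)  # NULL is dropped when vectors are combined

length(x_na)     # 3
length(x_null)   # 2

is.null(NULL)    # TRUE
is.null(NA)      # FALSE

# Hypothetical function that uses NULL to mean "no value supplied"
describe <- function(x, label = NULL) {
  if (is.null(label)) "unlabelled" else paste("label:", label)
}

describe(1)
describe(1, "height")
```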

Getting Information about Special Values

Also, examine the output of str(), typeof(), and length() for Inf, -Inf, NA, NaN, and NULL.
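
One way to inspect all five values at once is to place them in a list and apply the functions to each element. This is only a sketch; the list name specials is an assumption made for illustration.

```r
# Collect the special values in a named list (a list can hold NULL as an element)
specials <- list("Inf" = Inf, "-Inf" = -Inf, "NA" = NA, "NaN" = NaN, "NULL" = NULL)

types <- sapply(specials, typeof)   # storage type of each value
lens  <- sapply(specials, length)   # number of elements in each value

types
lens

str(NA)     # compact description of a single value
str(NULL)
```

Note that a plain NA is of type logical, while Inf, -Inf, and NaN are doubles, and NULL is the only one with length zero.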

It is worth noting that special values in numeric variables are not elements of the mathematical set of real numbers. The is.finite() function determines whether values are regular values or special values. It accepts only vector objects. For example,

is.finite(c(1, Inf, NaN, NA))
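
To see how is.finite() relates to the other base R predicates, the same vector can be passed to each of them. The following is a small sketch; the name x is illustrative.

```r
x <- c(1, Inf, NaN, NA)

is.finite(x)    # TRUE only for the regular number
is.infinite(x)  # TRUE only for Inf (and -Inf)
is.nan(x)       # TRUE only for NaN
is.na(x)        # TRUE for both NaN and NA
```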

A function can be written to deal with every numerical column in a data frame. For example,

special <- function(x){
    if (is.numeric(x)){
        return(!is.finite(x))
    }else {
        return (is.na(x))
    }
}

sapply(airquality, special)

The user-defined special() function tests each column of the data frame (airquality). It flags every special value if the column is numeric; otherwise it only checks for NA.
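
The logical matrix returned by sapply() is long, so a summary is often more useful. The sketch below repeats the special() function so that it runs on its own, then counts the flagged values per column and keeps only the rows without any special value. The names special_counts and clean are illustrative.

```r
# Same idea as the special() function above, written compactly
special <- function(x) {
  if (is.numeric(x)) !is.finite(x) else is.na(x)
}

# Logical matrix: one column per variable, TRUE where a value is special
flags <- sapply(airquality, special)

# Number of special values in each column
special_counts <- colSums(flags)
special_counts

# Keep only the rows that contain no special value in any column
clean <- airquality[rowSums(flags) == 0, ]
nrow(clean)
```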


https://itfeature.com

https://gmstat.com

Comparing Two Sample Means in R


Comparing two sample means in R is straightforward because all the classical tests are available in the stats package. The common comparison tests are (i) the one-sample mean test, (ii) the test of two independent sample means, and (iii) the dependent (paired) sample test. When the population standard deviation is known, or the sample size (number of observations in the sample) is large enough ($n \ge 30$), tests based on the normal distribution are performed.

Data for Two Sample Means

Consider the following data set on the “latent heat of the fusion of ice (cal/gm)” from Rice, 1995.

Method A: 79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04, 79.97, 80.05, 80.03, 80.02, 80.00, 80.02
Method B: 80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97

Let us draw boxplots to compare these two methods. The comparison will help in checking the assumptions of the independent two-sample test.

Note that one can read the data using the scan() function, create vectors, or even read the above data from data files such as *.txt and *.csv. In this tutorial, we assume vectors $A$ and $B$ for method A and method B.

A = c(79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04, 79.97, 80.05, 80.03, 80.02, 80.00, 80.02)
B = c(80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97)

Draw a Boxplot of Samples

Let us draw a boxplot for each method. The boxplots indicate that the first group tends to give higher results than the second one.

boxplot(A, B)
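
A labelled version of the plot may be easier to read. The sketch below uses the names and ylab arguments of boxplot(); the axis label text is an assumption based on the data description. The object returned by boxplot() also holds the five-number summaries, so the medians can be read from it.

```r
A <- c(79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04, 79.97, 80.05, 80.03, 80.02, 80.00, 80.02)
B <- c(80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97)

# Draw the boxplots with group labels and keep the returned summary
bp <- boxplot(A, B,
              names = c("Method A", "Method B"),
              ylab  = "Latent heat of fusion (cal/gm)")

# Third row of bp$stats holds the median of each group
bp$stats[3, ]
```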

Comparing Two Sample Means in R using t.test() Function

The unpaired t-test (independent two-sample test) for the equality of the means can be done using the function t.test() in R Language.

t.test(A, B)
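## Output
        Welch Two Sample t-test

data:  A and B
t = 3.2499, df = 12.027, p-value = 0.006939
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.01385526 0.07018320
sample estimates:
mean of x mean of y 
 80.02077  79.97875 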

From the results above, the p-value (0.006939) is less than the 0.05 level of significance, which means that the mean latent heat of fusion of ice differs significantly between the two methods.
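
The same decision can be made programmatically, because t.test() returns an object whose components can be extracted. This is a minimal sketch; the names res and alpha are illustrative.

```r
A <- c(79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04, 79.97, 80.05, 80.03, 80.02, 80.00, 80.02)
B <- c(80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97)

res   <- t.test(A, B)   # Welch test (unequal variances) by default
alpha <- 0.05           # level of significance

res$statistic           # t statistic
res$parameter           # degrees of freedom
res$p.value             # p-value
res$conf.int            # 95 percent confidence interval for the difference

res$p.value < alpha     # TRUE means the null hypothesis is rejected
```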

Testing the Equality of Variances of Samples

Note that t.test() in R does not assume equality of variances in the two samples by default. However, the F-test can be used to test the equality of the variances, provided that the two samples are from normal populations.

var.test(A, B)

From the above results, there is no evidence that the variances of the two samples differ significantly, as the p-value is greater than the 0.05 level of significance. It means that one can use the classical t-test that assumes the equality of the variances.
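
The two steps described above can be chained in code: the p-value of the F-test decides whether the pooled-variance t-test is requested. This is only a sketch of that workflow under the normality assumption stated earlier; the names vt, use_pooled, and tt are illustrative.

```r
A <- c(79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04, 79.97, 80.05, 80.03, 80.02, 80.00, 80.02)
B <- c(80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97)

vt <- var.test(A, B)             # F-test for equality of variances
vt$p.value

# Use the pooled-variance test only if equal variances are not rejected
use_pooled <- vt$p.value > 0.05
tt <- t.test(A, B, var.equal = use_pooled)

tt$method                        # which t-test was actually run
tt$parameter                     # degrees of freedom
```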

t.test(A, B, var.equal = TRUE)

## Output
        Two Sample t-test

data:  A and B
t = 3.4722, df = 19, p-value = 0.002551
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.01669058 0.06734788
sample estimates:
mean of x mean of y 
 80.02077  79.97875 
https://rfaqs.com


Important MCQs On dplyr in R

This post presents 20 Multiple Choice Questions about the dplyr package in R Language and its use. Let us start with the Quiz on dplyr in R Language.

Online Multiple Choice Questions about R and dplyr package

1. What symbol is used in dplyr that holds verbs together in a single phrase?
2. How can a new column/variable (total_price) be created in dplyr with the sum of two existing columns/variables price1 and price2?
3. In dplyr, what is the purpose of the %>% operator (known as pipe operator)
4. Example tools for reproducible report writing are:
5. What does the dplyr verb mutate do?
6. What is the purpose of ungroup() function in dplyr?
7. What does the dplyr verb ‘Arrange’ do?
8. What is the function of the dplyr verb Group By?
9. The dplyr verb Arrange is responsible for what action?
10. In dplyr, what does the slice() function do?
11. The dplyr verb ‘Select’ does?
12. What does the dplyr verb ‘Group By’ do?
13. What is the purpose of the distinct() function in dplyr?
14. Reproducibility tools for reports like knitr help with:
15. How does Summarise work?
16. What is the function of the dplyr verb Select?
17. ———– function is similar to the existing subset() function in R but is quite a bit faster.
18. The dplyr verb ‘Filter‘ does what to a data frame?
19. What does the dplyr verb ‘Mutate’ do?
20. What is the function of the dplyr verb Filter?

An Introduction to dplyr Package

The dplyr package is used for data manipulation and transformation. It gives a set of functions that make it easy to perform common data manipulation tasks, which include (1) filtering, (2) grouping, (3) summarizing, (4) arranging, and (5) joining data frames.

The package is part of the tidyverse, a collection of R packages designed to work together seamlessly for data analysis and visualization.

Some key functions available in dplyr R Package include:

  • filter(): Used to subset rows based on specified conditions.
  • select(): Used to choose specific columns from a data frame.
  • arrange(): Used to reorder rows based on one or more columns.
  • mutate(): Used to create new columns or modify existing ones.
  • group_by(): Used to group data by one or more variables.
  • summarize(): Used to compute summary statistics for groups of data.
  • inner_join(), left_join(), and the related *_join() functions: Used to merge data frames based on common keys.
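
The verbs listed above are typically chained with the %>% pipe. The following is a minimal sketch that assumes the dplyr package is installed; the data frame orders and its columns (shop, price1, price2) are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data; the column names are illustrative only
orders <- data.frame(
  shop   = c("a", "a", "b", "b", "c"),
  price1 = c(10, 20, 5, 15, 8),
  price2 = c(1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)

result <- orders %>%
  filter(price1 >= 8) %>%                        # keep rows with price1 of at least 8
  mutate(total_price = price1 + price2) %>%      # add a derived column
  group_by(shop) %>%                             # one group per shop
  summarise(mean_total = mean(total_price)) %>%  # one summary row per group
  arrange(desc(mean_total))                      # largest mean first

result
```

Each verb takes a data frame as its first argument and returns a data frame, which is what allows the steps to be piped one after another.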

The dplyr package provides a powerful and efficient toolkit for data manipulation in R.


Simple Random Sampling in R: Explained Easy

Introduction to Simple Random Sampling in R

Simple random sampling (SRS) is the most basic method of drawing a probability sample. A sample of $n$ units is selected from a population of $N$ units such that each of the $\binom{N}{n}$ possible samples has the same chance of being selected. The choice of the specific sample can be made using a random number generator on a computer. In this post, we will learn about simple random sampling in R, that is, the selection of elements in a sample using simple random sampling.

The following commands will generate random permutations of $n$ integers or random samples from a population of numbers.

Random permutation of integers $1$ to $n$

The call sample(n) generates a random permutation of the integers $1$ to $n$.

sample(10)

## Output
[1]  5  8  9  4  3  2  1  6 10  7
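
Because the result is random, repeated calls give different permutations. A seed can be fixed with set.seed() to make a run reproducible. This is a small sketch; the seed value 2024 is arbitrary.

```r
set.seed(2024)            # arbitrary seed, chosen only for illustration
first <- sample(10)

set.seed(2024)            # resetting the same seed repeats the same draw
second <- sample(10)

identical(first, second)  # TRUE
sort(first)               # every integer from 1 to 10 appears exactly once
```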

Random permutation of elements in a vector $x$

A random permutation of the elements of a vector $x$ can be generated using sample(x).

x <- c(20, 25, 19, -15, 4, 21, -1, 0, 23)
sample(x)

## Output
[1]  21  25   0   4  20 -15  -1  19  23

Random Sample of $n$ items from $x$ without replacement

A random selection of $n$ elements from a vector $x$ without replacement can be made using sample(x, n).

x <- c(20, 25, 19, -15, 4, 21, -1, 0, 23)
sample(x, 5)

## Output
[1] -1 19 21 23  0

Random sample of $n$ items from $x$ with replacement

A random sample of $n$ items from vector $x$ can be selected with replacement using sample(x, n, replace = TRUE).

x <- c(20, 25, 19, -15, 4, 21, -1, 0, 23)
sample(x, 5, replace = TRUE)

## Output
[1]  0 -1  4 19 -1
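
The practical difference between the two modes can be sketched as follows: without replacement no element is drawn twice, and asking for more items than the vector holds is an error, while with replacement any sample size is allowed. The names no_rep, too_many, and with_rep are illustrative, and the seed is arbitrary.

```r
x <- c(20, 25, 19, -15, 4, 21, -1, 0, 23)
set.seed(1)   # arbitrary seed for reproducibility

# Without replacement: the values of x are distinct, so no duplicates can occur
no_rep <- sample(x, 5)
anyDuplicated(no_rep)   # 0 means no duplicated value

# Requesting 12 items from 9 without replacement fails; capture the message
too_many <- tryCatch(sample(x, 12), error = function(e) conditionMessage(e))
too_many

# With replacement the same request is valid
with_rep <- sample(x, 12, replace = TRUE)
length(with_rep)
```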

Random Sample with Probabilities

A random sample of $n$ items can also be drawn from $x$ when the elements of $x$ have differing probabilities of selection. A vector giving the probability of each element in $x$ is required. Note that the probabilities used here sum to one; sample() rescales weights that do not.

x <- c(23, 45, 69, -1, .9, 4, 25, 19)
p <- c(.1, .1, 0, 0, .2, .3, .1, .2)
sum(p)

sample(x, 5, replace = TRUE, prob = p)

## Output
[1]  4 19 19 19 45
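
With a larger number of draws, the observed proportions should be close to the supplied probabilities, and elements with probability zero should never appear. The following sketch checks this empirically; the seed and the number of draws are arbitrary choices.

```r
x <- c(23, 45, 69, -1, .9, 4, 25, 19)
p <- c(.1, .1, 0, 0, .2, .3, .1, .2)

set.seed(42)   # arbitrary seed for reproducibility
draws <- sample(x, 10000, replace = TRUE, prob = p)

# Observed proportion of each element, in the original order of x
round(table(factor(draws, levels = x)) / length(draws), 2)

# Elements with probability zero are never selected
any(draws %in% c(69, -1))
```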

Random Selection of Integers without Replacement

The random selection of $n$ integers from the integers 1 to $N$ without replacement can be done using sample(N, n).

sample(1000, 10)

## Output
[1] 138 147 911 523 586 163 915 966 951 245
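
The selected integers can serve as row indices, which gives a simple random sample of the rows of a data frame. The sketch below uses the built-in mtcars data only as a stand-in population; the sample size and the seed are arbitrary.

```r
set.seed(10)          # arbitrary seed for reproducibility

N <- nrow(mtcars)     # population size (number of rows)
n <- 8                # sample size, chosen only for illustration

idx <- sample(N, n)   # n distinct row numbers between 1 and N
srs <- mtcars[idx, ]  # the sampled rows

idx
nrow(srs)
```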

The sample can be used to estimate the population mean $\mu$ and the variance of the estimator $\hat{\mu}$.

Let $y_1, y_2, \cdots, y_n$ be the measurements obtained from the simple random sampling of $n$ units from the population. The estimator of population mean $\mu$ is

$$\hat{\mu} = \frac{1}{n} \sum\limits_{i=1}^n y_i$$

with estimated variance of $\hat{\mu}$ given by

$$\widehat{\mathrm{var}}(\hat{\mu}) = \frac{s^2}{n} \left( \frac{N-n}{N}\right)$$

where $s^2 = \frac{1}{n-1} \sum\limits_{i=1}^n (y_i - \overline{y})^2$ and $\overline{y} = \hat{\mu}$ is the sample mean.
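
The two formulas can be turned into a short function. This is a minimal sketch, not a base R function: the name srs_mean() and its arguments are hypothetical, and the sample values are made up so that the arithmetic is easy to follow.

```r
# Hypothetical helper: SRS estimate of the population mean and its variance
srs_mean <- function(y, N) {
  n <- length(y)
  stopifnot(n >= 2, N >= n)
  mu_hat  <- mean(y)                    # (1/n) * sum of y_i
  var_hat <- var(y) / n * (N - n) / N   # s^2 / n times the finite population correction
  c(mean = mu_hat, var = var_hat, se = sqrt(var_hat))
}

# Made-up sample of n = 4 measurements from a population of N = 20 units
y   <- c(2, 4, 6, 8)
est <- srs_mean(y, N = 20)
est
```

Here var(y) uses the divisor $n-1$, which matches the definition of $s^2$ above. When $n = N$ the correction factor is zero, so the estimated variance vanishes, as expected for a complete census.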
