Muhammad Imdad Ullah, Author at R Programming FAQs

Special Values in R Programming: A Quick Guide

August 29, 2024 by Muhammad Imdad Ullah

The R programming language has several special values, namely NA, Inf, -Inf, NaN, and NULL.

Special Values in R Programming Language

For numeric variables, R uses several formally defined special values. Calculations involving special values often result in special values. In statistics, a measurement of a real-world phenomenon should not be a special value. It is therefore desirable to handle special values before performing any statistical analysis, especially inferential analysis. Moreover, many functions in R return special values, or raise errors or warnings, when a variable contains special values.

The NA values in R (NA stands for Not Available) represent the missing observations. A missing value may occur due to the non-response of the respondent or may arise when the vector size is expanded. For example,

v = c(1, 5, 6)
v[5] = 4
v

## Output
[1]  1  5  6 NA  4
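The effect of such a missing value on later calculations can be sketched as follows. This is a minimal illustration that reuses the vector v from the example above; is.na() and the na.rm argument of mean() are standard base R.

```r
v <- c(1, 5, 6)
v[5] <- 4                # position 4 was never assigned, so R fills it with NA

is.na(v)                 # flags the missing position
sum(is.na(v))            # number of missing values
mean(v)                  # NA: a missing value propagates through the calculation
mean(v, na.rm = TRUE)    # mean of the observed values only
```

Many summary functions in base R (sum(), sd(), max(), and others) accept na.rm = TRUE in the same way.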

To learn about how to handle missing values in R, see the article: Handling Missing Values in R

Inf and -Inf values in R represent positive infinity and negative infinity, respectively. They arise when a computation produces a number too large in magnitude to be represented. Inf or -Inf also results when a nonzero value is divided by 0. For example,

2 ^ 1024
## Output
[1] Inf

-2^1024

## Output
[1] -Inf

1/0

## Output
[1] Inf

-Inf + 1e10

## Output
[1] -Inf
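As a rough sketch of where the overflow happens, the largest finite double is stored in .Machine$double.xmax, and anything beyond it becomes Inf. The variable name big is only illustrative.

```r
big <- .Machine$double.xmax    # largest finite double, roughly 1.8e308
big * 2                        # overflows to Inf
-big * 2                       # overflows to -Inf
-1 / 0                         # a negative number divided by zero gives -Inf
is.infinite(c(1, Inf, -Inf))   # FALSE TRUE TRUE
Inf > big                      # TRUE: Inf compares larger than any finite number
```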


Sometimes a computation will produce a result that makes little sense. In such cases, R often returns NaN (Not a Number). For example,

Inf - Inf

## Output
[1] NaN

0/0

## Output
[1] NaN
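A few more ways NaN can appear, and how it behaves once it is there, are sketched below. The names r1 to r4 are only illustrative; suppressWarnings() is used because sqrt() and log() also emit a "NaNs produced" warning for negative input.

```r
r1 <- Inf * 0                       # undefined product
r2 <- suppressWarnings(sqrt(-1))    # square root of a negative number
r3 <- suppressWarnings(log(-1))     # logarithm of a negative number
r4 <- NaN + 1                       # NaN propagates through arithmetic

is.nan(c(r1, r2, r3, r4))           # TRUE TRUE TRUE TRUE
NaN == NaN                          # NA: use is.nan() rather than == to test for NaN
```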

In R, the Null object is represented by the symbol NULL. It is often used as an argument in functions to represent that no value was assigned to the argument. Additionally, some functions may return NULL. Note that the NULL is not the same as NA, Inf, -Inf, or NaN.
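The use of NULL as a "no value supplied" default can be sketched with a small function. The function average() and its weights argument are hypothetical names introduced only for this illustration.

```r
# Hypothetical helper: weights = NULL signals that no weights were supplied
average <- function(x, weights = NULL) {
  if (is.null(weights)) {
    mean(x)                      # plain mean when no weights are given
  } else {
    weighted.mean(x, weights)    # weighted mean otherwise
  }
}

average(c(1, 2, 3))                 # 2
average(c(1, 2, 3), c(3, 1, 1))     # (3 + 2 + 3) / 5 = 1.6

# Unlike NA, NULL takes up no position in a vector
length(c(1, NULL, 3))               # 2
length(c(1, NA, 3))                 # 3
```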

Getting Information about Special Values

Also, look at the output of str(), typeof(), and length() for Inf, -Inf, NA, NaN, and NULL.
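One possible way to inspect all five values at once is to store them in a list and apply typeof() and length() to each element. The list name specials is only illustrative; a list is used because NULL cannot be stored in an atomic vector.

```r
specials <- list("Inf" = Inf, "-Inf" = -Inf, "NA" = NA, "NaN" = NaN, "NULL" = NULL)

types <- sapply(specials, typeof)   # storage type of each value
sizes <- sapply(specials, length)   # NULL is the only one with length 0

types
sizes
str(Inf)                            # compact description of a single value
typeof(NA_real_)                    # "double": the numeric flavour of NA
```

Note that a plain NA is of type logical; it is coerced to the numeric NA_real_ when it appears inside a numeric vector.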

It is worth noting that the special values in numeric variables indicate values that are not elements of the mathematical set of real numbers. One can use the is.finite() function to determine whether the values are regular or special. The is.finite() function accepts only vector objects. For example,

is.finite(c(1, Inf, NaN, NA))
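To see how is.finite() relates to the other base R predicates, the same vector can be passed to each of them. This is only a side-by-side sketch; the vector name x is illustrative.

```r
x <- c(1, Inf, NaN, NA)

is.finite(x)      # TRUE FALSE FALSE FALSE
is.infinite(x)    # FALSE TRUE FALSE FALSE
is.nan(x)         # FALSE FALSE TRUE FALSE
is.na(x)          # FALSE FALSE TRUE TRUE  (is.na() is also TRUE for NaN)
```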

A function can be written to deal with every numerical column in a data frame. For example,

special <- function(x){
    if (is.numeric(x)){
        return(!is.finite(x))
    }else {
        return (is.na(x))
    }
}

sapply(airquality, special)

The user-defined special() function tests each column of the data frame object (airquality). It flags every special value if the column is numeric; otherwise, it checks only for NA.
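Since sapply() returns a logical matrix here, the flags can be summarised per column or per row. The following is a minimal sketch that assumes the built-in airquality data set; the names flags and clean_rows are only illustrative.

```r
special <- function(x) {
  if (is.numeric(x)) !is.finite(x) else is.na(x)
}

flags <- sapply(airquality, special)   # logical matrix: one column per variable

colSums(flags)                         # number of special values in each column
clean_rows <- rowSums(flags) == 0      # rows without any special value
sum(clean_rows)                        # how many such rows there are
```

The logical vector clean_rows could then be used to subset the data, for example airquality[clean_rows, ].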

https://itfeature.com

https://gmstat.com

Comparing Two Sample Means in R

August 24, 2024 by Muhammad Imdad Ullah


One can easily compare two sample means in R, as all the classical tests are available in the stats package. There are different comparison tests, such as (i) the one-sample mean test, (ii) the two independent sample means test, and (iii) the dependent (paired) sample test. When the population standard deviation is known, or the sample size (number of observations in the sample) is large enough ($n \ge 30$), tests based on the normal distribution are performed.

Data for Two Sample Means

Consider the following data set on the “latent heat of the fusion of ice (cal/gm)” from Rice, 1995.

Method A	79.98	80.04	80.02	80.04	80.03	80.03	80.04	79.97	80.05	80.03	80.02	80.00	80.02
Method B	80.02	79.94	79.98	79.97	79.97	80.03	79.95	79.97

Let us draw boxplots to compare these two methods. The comparison will help in checking the assumptions of the independent two-sample test.

Note that one can read the data using the scan() function, create vectors, or even read the above data from data files such as *.txt and *.csv. In this tutorial, we assume vectors $A$ and $B$ for method A and method B.

A = c(79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04, 79.97, 80.05, 80.03, 80.02, 80.00, 80.02)
B = c(80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97)
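Before plotting or testing, a quick numerical summary of the two vectors can be useful. The sketch below repeats the data so that it runs on its own; the helper describe() is a hypothetical name introduced only for this illustration.

```r
A <- c(79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04, 79.97, 80.05, 80.03, 80.02, 80.00, 80.02)
B <- c(80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97)

# Hypothetical helper: sample size, mean, and standard deviation of a vector
describe <- function(x) c(n = length(x), mean = mean(x), sd = sd(x))

rbind(A = describe(A), B = describe(B))
```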

Draw a Boxplot of Samples

Let us draw a boxplot for each method. The boxplots indicate that the first method tends to give higher results than the second one.

boxplot(A, B)
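A labelled version of the same plot might look like the sketch below. The axis label and title text are assumptions made for this illustration; names, ylab, and main are standard boxplot() arguments, and the returned list holds the five-number summaries used to draw the boxes.

```r
A <- c(79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04, 79.97, 80.05, 80.03, 80.02, 80.00, 80.02)
B <- c(80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97)

bp <- boxplot(A, B,
              names = c("Method A", "Method B"),      # group labels on the x-axis
              ylab  = "Latent heat of fusion (cal/gm)",
              main  = "Comparison of two methods")

bp$stats[3, ]    # medians of the two groups (third row of the summary matrix)
```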

Comparing Two Sample Means in R using t.test() Function

The unpaired t-test (independent two-sample test) for the equality of the means can be done using the function t.test() in R Language.

t.test(A, B)

The test reports a p-value of 0.006939, which is less than the 0.05 level of significance. This means that, on average, the two methods differ significantly from each other with respect to the latent heat of fusion of ice.
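The individual pieces of the test result can also be extracted programmatically, since t.test() returns a list of class "htest". The sketch below repeats the data so that it runs on its own; the name res and the 0.05 threshold variable alpha are only illustrative.

```r
A <- c(79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04, 79.97, 80.05, 80.03, 80.02, 80.00, 80.02)
B <- c(80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97)

res <- t.test(A, B)      # Welch two-sample t-test (the default)

res$statistic            # t value
res$parameter            # degrees of freedom
res$p.value              # p-value
res$conf.int             # 95 percent confidence interval for the difference in means

alpha <- 0.05
res$p.value < alpha      # TRUE means the difference is significant at this level
```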

Testing the Equality of Variances of Samples

Note that, by default, the t.test() function does not assume the equality of variances in the two samples; it performs the Welch test. However, the F-test can be used to test the equality of the variances, provided that the two samples are from normal populations.

var.test(A, B)

The p-value of this F-test is greater than the 0.05 level of significance, so there is no evidence that the variances of the two samples differ significantly. It means that one can use the classical t-test that assumes the equality of the variances.
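The two steps, checking the variances and then choosing the t-test, can be chained as in the rough sketch below. The names vt, equal_var, and tt are illustrative, and selecting the test from a preliminary F-test is only one simple convention, not the only defensible approach.

```r
A <- c(79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04, 79.97, 80.05, 80.03, 80.02, 80.00, 80.02)
B <- c(80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97)

vt <- var.test(A, B)                         # F-test for equality of variances
equal_var <- vt$p.value > 0.05               # TRUE if equal variances are not rejected

tt <- t.test(A, B, var.equal = equal_var)    # pooled t-test if TRUE, Welch otherwise
tt$method                                    # name of the test that was actually run
tt$parameter                                 # degrees of freedom (n1 + n2 - 2 when pooled)
tt$p.value
```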

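t.test(A, B, var.equal = TRUE)

For comparison, the Welch test t.test(A, B), which does not assume equal variances, gives the following output:

## Output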
        Welch Two Sample t-test

data:  A and B
t = 3.2499, df = 12.027, p-value = 0.006939
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.01385526 0.07018320
sample estimates:
mean of x mean of y 
 80.02077  79.97875

https://rfaqs.com


Important MCQs On dplyr in R 16

September 16, 2024 | August 10, 2024 by Muhammad Imdad Ullah

This post presents multiple-choice questions on the dplyr package in the R language. There are 20 MCQs about the package and its use. Let us start with the quiz on dplyr in R.

MCQs dplyr in R Language

What is the function of the dplyr verb Filter?
What is the function of the dplyr verb Select?
What is the function of the dplyr verb Group By?
How does Summarise work?
What does the dplyr verb mutate do?
The dplyr verb Arrange is responsible for what action?
The dplyr verb ‘Filter’ does what to a data frame?
The dplyr verb ‘Select’ does?
What does the dplyr verb ‘Group By’ do?
What does the dplyr verb ‘Arrange’ do?
What does the dplyr verb ‘Mutate’ do?
What symbol is used in dplyr that holds verbs together in a single phrase?
Example tools for reproducible report writing are:
Reproducibility tools for reports like knitr help with:
What is the purpose of the distinct() function in dplyr?
In dplyr, what is the purpose of the %>% operator (known as the pipe operator)?
———– function is similar to the existing subset() function in R but is quite a bit faster.
What is the purpose of ungroup() function in dplyr?
In dplyr, what does the slice() function do?
How can a new column/variable (total_price) be created in dplyr with the sum of two existing columns/variables price1 and price2?

An Introduction to dplyr Package

The dplyr package is used for data manipulation and transformation. It gives a set of functions that make it easy to perform common data manipulation tasks, which include (1) filtering, (2) grouping, (3) summarizing, (4) arranging, and (5) joining data frames.

The package is part of the tidyverse, a collection of R packages designed to work together seamlessly for data analysis and visualization.

Some key functions available in dplyr R Package include:

filter(): Used to subset rows based on specified conditions.
select(): Used to choose specific columns from a data frame.
arrange(): Used to reorder rows based on one or more columns.
mutate(): Used to create new columns or modify existing ones.
group_by(): Used to group data by one or more variables.
summarize(): Used to compute summary statistics for groups of data.
left_join(), inner_join(), and the other join functions: Used to merge data frames based on common keys (dplyr has no function named join() itself).
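The functions listed above can be combined in a single pipeline, as in the sketch below. It assumes the dplyr package is installed; the data frames orders and shops and all of their column names are hypothetical and exist only for this illustration.

```r
library(dplyr)

# Hypothetical data, made up for this example
orders <- data.frame(
  shop   = c("a", "a", "b", "b", "c"),
  price1 = c(10, 20, 5, 15, 30),
  price2 = c(1, 2, 3, 4, 5)
)
shops <- data.frame(shop = c("a", "b", "c"),
                    city = c("Lahore", "Multan", "Karachi"))

result <- orders %>%
  mutate(total_price = price1 + price2) %>%    # create a new column
  filter(total_price > 10) %>%                 # keep rows that meet a condition
  select(shop, total_price) %>%                # keep only the needed columns
  group_by(shop) %>%                           # group rows by shop
  summarize(mean_total = mean(total_price),    # one summary row per group
            n = n()) %>%
  arrange(desc(mean_total)) %>%                # sort groups, largest mean first
  left_join(shops, by = "shop")                # attach the city of each shop

print(result)
```

Each verb takes a data frame as its first argument and returns a data frame, which is what allows the steps to be chained with the %>% operator.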

The dplyr package provides a powerful and efficient toolkit for data manipulation in R.
