The post Special Values in R Programming: A Quick Guide appeared first on R Language Frequently Asked Questions.

R has several special values: `NA`, `Inf`, `-Inf`, `NaN`, and `NULL`.
Several formalized special values are used for numeric variables. Calculations involving special values often result in special values themselves. In statistics, a measured real-world phenomenon should not include a special value, so it is desirable to handle special values before performing any statistical analysis, especially inferential analysis. In addition, functions in R may produce errors or warnings when a variable contains special values.

`NA` values in R (NA stands for Not Available) represent missing observations. A missing value may occur because a respondent did not answer, or may arise when a vector is extended beyond its current length. For example,

v = c(1, 5, 6)
v[5] = 4
v
## Output
[1] 1 5 6 NA 4

To learn about how to handle missing values in R, see the article: Handling Missing Values in R
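As a small illustration of how a missing value propagates through a calculation, the sketch below uses a made-up vector `scores` (a hypothetical example, not data from this post) and the standard `is.na()` and `na.rm` tools from base R.

```r
# Hypothetical survey scores with one missing response
scores <- c(4, 7, NA, 9)

na_flags  <- is.na(scores)               # marks which entries are missing
raw_mean  <- mean(scores)                # NA, because the missing value propagates
safe_mean <- mean(scores, na.rm = TRUE)  # drops NA before averaging

print(na_flags)
print(raw_mean)
print(safe_mean)
```

Whether dropping the missing values is appropriate depends on why they are missing, so `na.rm = TRUE` is a convenience rather than a general remedy.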

`Inf` and `-Inf` values in R represent numbers that are too large in magnitude to be stored, which can occur during computation. `Inf` stands for positive infinity and `-Inf` for negative infinity. `Inf` or `-Inf` also results when a nonzero value is divided by 0. For example,

2 ^ 1024
## Output
[1] Inf
-2^1024
## Output
[1] -Inf
1/0
## Output
[1] Inf
-Inf + 1e10
## Output
[1] -Inf
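The overflow behaviour can also be checked programmatically. The following sketch relies on `.Machine$double.xmax` (the largest finite double on the current platform) and `is.infinite()`, both from base R; the variable names are only illustrative.

```r
largest  <- .Machine$double.xmax  # largest finite double on this platform
overflow <- largest * 2           # exceeds the representable range, so R returns Inf
neg_div  <- -1 / 0                # a negative number divided by zero gives -Inf

# is.infinite() is TRUE for both Inf and -Inf
flags <- is.infinite(c(1, overflow, neg_div))

print(overflow)
print(neg_div)
print(flags)
```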

Sometimes a computation will produce a result that makes little sense. In such cases, R often returns `NaN` (Not a Number). For example,

Inf - Inf
## Output
[1] NaN
0/0
## Output
[1] NaN
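One detail that is easy to miss is that `is.na()` treats `NaN` as missing, while `is.nan()` does not treat `NA` as `NaN`. A minimal sketch with an illustrative vector `x`:

```r
# One NaN, one NA, and one regular value
x <- c(0 / 0, NA, 1)

nan_flags <- is.nan(x)  # TRUE only for the NaN entry
na_flags  <- is.na(x)   # TRUE for both the NaN and the NA entry

print(nan_flags)
print(na_flags)
```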

In R, the null object is represented by the symbol `NULL`. It is often used as the default value of a function argument to indicate that no value was assigned to the argument. Additionally, some functions may return `NULL`. Note that `NULL` is not the same as `NA`, `Inf`, `-Inf`, or `NaN`.
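The sketch below illustrates both points with a hypothetical helper `describe()` (not part of any package): `NULL` as a "not supplied" default, and the way `NULL` vanishes when combined into a vector while `NA` keeps its place.

```r
# Hypothetical helper: `label` defaults to NULL, meaning "not supplied"
describe <- function(x, label = NULL) {
  if (is.null(label)) {
    label <- "unnamed"
  }
  paste(label, "has", length(x), "values")
}

print(describe(1:3))
print(describe(1:3, label = "ids"))

# NULL is dropped when combined; NA occupies a position
with_null <- c(1, NULL, 3)
with_na   <- c(1, NA, 3)
print(length(with_null))
print(length(with_na))
```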

Also, look at the results of `str()`, `typeof()`, and `length()` for `Inf`, `-Inf`, `NA`, `NaN`, and `NULL`.
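One way to run this comparison in a single pass is to collect the values in a list and apply `typeof()` and `length()` to each element. The list name `vals` is just an illustrative choice.

```r
# Collect the special values in a named list (list() keeps the NULL element)
vals <- list("Inf" = Inf, "-Inf" = -Inf, "NA" = NA, "NaN" = NaN, "NULL" = NULL)

types <- sapply(vals, typeof)  # storage type of each value
sizes <- sapply(vals, length)  # number of elements in each value

print(types)
print(sizes)
```

`Inf`, `-Inf`, and `NaN` are stored as doubles, a bare `NA` is logical, and `NULL` has its own type and length 0.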

It is worth noting that the special values in numeric variables indicate values that are not elements of the mathematical set of real numbers. One can use the `is.finite()` function to determine whether values are regular or special. The `is.finite()` function only accepts vector objects. For example,

is.finite(c(1, Inf, NaN, NA))
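To make the result of this call explicit, and to illustrate the restriction to vectors, the sketch below stores the flags and then tries `is.finite()` on the built-in `airquality` data frame inside `tryCatch()`. The expectation that the data frame call fails reflects base R's behaviour for list-like objects as far as I am aware.

```r
x <- c(1, Inf, NaN, NA)

finite_flags <- is.finite(x)  # TRUE only for the regular value 1
print(finite_flags)

# A data frame is a list, not an atomic vector, so is.finite() signals an error
df_result <- tryCatch(is.finite(airquality), error = function(e) e)
print(inherits(df_result, "error"))
```

This is why a data frame has to be checked column by column.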

A function can be written to deal with every numerical column in a data frame. For example,

special <- function(x) {
  if (is.numeric(x)) {
    # numeric column: flag NA, NaN, Inf, and -Inf
    return(!is.finite(x))
  } else {
    # non-numeric column: only NA can occur
    return(is.na(x))
  }
}
sapply(airquality, special)

The user-defined `special()` function tests each column of the data frame object (airquality). It flags every special value if the column is numeric; otherwise, it only checks for `NA`.
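Because the call returns a large logical matrix, a per-column count is often easier to read. The sketch below repeats the `special()` definition so it runs on its own and then sums the flags with `colSums()`; the name `counts` is illustrative.

```r
# Same helper as in the text, repeated so this block is self-contained
special <- function(x) {
  if (is.numeric(x)) {
    return(!is.finite(x))
  } else {
    return(is.na(x))
  }
}

# sapply() gives one logical column per variable; colSums() counts the TRUEs
counts <- colSums(sapply(airquality, special))
print(counts)
```

In the `airquality` data shipped with R, only `Ozone` and `Solar.R` contain special values (37 and 7 missing entries, respectively).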


The post Comparing Two Sample Means in R appeared first on R Language Frequently Asked Questions.

One can easily compare two sample means in R, since all the classical tests are available in the **stats** package. Comparison tests include (i) the one-sample mean test, (ii) the test for two independent sample means, and (iii) the dependent (paired) sample test. When the population standard deviation is known, or the sample size (number of observations in the sample) is large enough ($n \ge 30$), tests based on the normal distribution are performed.
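Base R has no dedicated z-test function, so a test based on the normal distribution is usually computed by hand. The following is a minimal sketch for the one-sample case, assuming a known population standard deviation; the data `x`, the hypothesized mean `mu0`, and `sigma` are all made-up values for illustration.

```r
# Hypothetical sample, hypothesized mean, and known population sd
x     <- c(5.1, 4.9, 5.3, 5.2, 4.8, 5.0, 5.4, 5.1)
mu0   <- 5
sigma <- 0.2

# z statistic: standardized distance of the sample mean from mu0
z <- (mean(x) - mu0) / (sigma / sqrt(length(x)))

# Two-sided p-value from the standard normal distribution
p_value <- 2 * pnorm(-abs(z))

print(z)
print(p_value)
```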

Consider the following data set on the “latent heat of the fusion of ice (cal/gm)” from Rice, 1995.

| Method | Latent heat of fusion of ice (cal/gm) |
|---|---|
| Method A | 79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04, 79.97, 80.05, 80.03, 80.02, 80.00, 80.02 |
| Method B | 80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97 |

Let us draw boxplots to compare these two methods. The comparison will help in checking the assumptions of the independent two-sample test.

Note that one can read the data using the scan() function, create vectors, or even read the above data from data files such as *.txt and *.csv. In this tutorial, we assume vectors $A$ and $B$ for method A and method B.
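As one possible way of reading the values with `scan()`, the sketch below passes the numbers as a text string through the `text` argument instead of a file, so it runs without any external data file. `quiet = TRUE` only suppresses the "Read n items" message.

```r
# Read the measurements from whitespace-separated text
A <- scan(text = "79.98 80.04 80.02 80.04 80.03 80.03 80.04
                  79.97 80.05 80.03 80.02 80.00 80.02", quiet = TRUE)
B <- scan(text = "80.02 79.94 79.98 79.97 79.97 80.03 79.95 79.97",
          quiet = TRUE)

print(length(A))
print(length(B))
```

For data stored in a file, the same call would take a file name as its first argument instead of `text`.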

A = c(79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04, 79.97, 80.05, 80.03, 80.02, 80.00, 80.02)
B = c(80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97)

The boxplots for the two methods indicate that the first group tends to give higher results than the second one.

boxplot(A, B)

The unpaired t-test (independent two-sample test) for the equality of the means can be done using the function `t.test()` in the R language.

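t.test(A, B)
## Output
Welch Two Sample t-test
data: A and B
t = 3.2499, df = 12.027, p-value = 0.006939
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.01385526 0.07018320
sample estimates:
mean of x mean of y
80.02077 79.97875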

The test gives a p-value of 0.006939, which is less than the 0.05 level of significance. This means that the mean latent heat of fusion of ice differs significantly between the two methods.
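The object returned by `t.test()` can also be stored and queried, which avoids copying numbers from the printed output. A minimal sketch, with the vectors repeated so it runs on its own and `res` as an illustrative name:

```r
A <- c(79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04, 79.97, 80.05,
       80.03, 80.02, 80.00, 80.02)
B <- c(80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97)

res <- t.test(A, B)  # Welch test by default

p_val       <- res$p.value      # p-value of the test
ci          <- res$conf.int     # confidence interval for the mean difference
significant <- p_val < 0.05     # decision at the 5% level

print(p_val)
print(ci)
print(significant)
```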

Note that `t.test()` does not assume equality of variances in the two samples by default (it performs the Welch test). However, the F-test can be used to test the equality of the variances, provided that the two samples are from normal populations.

var.test(A, B)

The test gives no evidence that the variances of the two samples differ significantly, as the p-value is greater than the 0.05 level of significance. This means that one can use the classical t-test, which assumes equality of the variances.
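The F-test result can be inspected in the same way as any other test object. The sketch below, with the vectors repeated so it is self-contained, extracts the variance ratio and the p-value; `vt` and `equal_var_plausible` are illustrative names.

```r
A <- c(79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04, 79.97, 80.05,
       80.03, 80.02, 80.00, 80.02)
B <- c(80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97)

vt <- var.test(A, B)

f_ratio <- unname(vt$statistic)            # F statistic = var(A) / var(B)
equal_var_plausible <- vt$p.value > 0.05   # no evidence against equal variances

print(f_ratio)
print(vt$p.value)
print(equal_var_plausible)
```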

t.test(A, B, var.equal = TRUE)
## Output
Two Sample t-test
data: A and B
t = 3.4722, df = 19, p-value = 0.002551
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.01669058 0.06734788
sample estimates:
mean of x mean of y
80.02077 79.97875


The post Important MCQs On dplyr in R appeared first on R Language Frequently Asked Questions.

Online Multiple Choice Questions about R and the dplyr package

- What is the function of the dplyr verb Filter?
- What is the function of the dplyr verb Select?
- What is the function of the dplyr verb Group By?
- How does *Summarise* work?
- What does the dplyr verb mutate do?
- The dplyr verb *Arrange* is responsible for what action?
- The dplyr verb ‘*Filter*’ does what to a data frame?
- The dplyr verb ‘*Select*’ does?
- What does the dplyr verb ‘*Group By*’ do?
- What does the dplyr verb ‘*Arrange*’ do?
- What does the dplyr verb ‘*Mutate*’ do?
- What symbol is used in dplyr that holds verbs together in a single phrase?
- Example tools for reproducible report writing are:
- Reproducibility tools for reports like knitr help with:
- What is the purpose of the *distinct*() function in dplyr?
- In dplyr, what is the purpose of the %>% operator (known as the pipe operator)?
- The __________ function is similar to the existing *subset*() function in R but is quite a bit faster.
- What is the purpose of the *ungroup*() function in dplyr?
- In dplyr, what does the *slice*() function do?
*slice*() function do? - How can a new column/variable (total_price) be created in dplyr with the sum of two existing columns/variables price1 and price2?

The `dplyr` package is used for data manipulation and transformation. It gives a set of functions that make it easy to perform common data manipulation tasks, which include (1) filtering, (2) grouping, (3) summarizing, (4) arranging, and (5) joining data frames.

The package is part of the *tidyverse*, a collection of R packages designed to work together seamlessly for data analysis and visualization.

Some key functions available in the `dplyr` R package include:

- `filter()`: Used to subset rows based on specified conditions.
- `select()`: Used to choose specific columns from a data frame.
- `arrange()`: Used to reorder rows based on one or more columns.
- `mutate()`: Used to create new columns or modify existing ones.
- `group_by()`: Used to group data by one or more variables.
- `summarize()`: Used to compute summary statistics for groups of data.
- `inner_join()`, `left_join()`, and the other join functions: Used to merge data frames based on common keys.
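To see several of these verbs working together, here is a small sketch that chains them with the `%>%` pipe. It assumes the `dplyr` package is installed; the data frame `orders` and its values are made up for illustration, with the column names `price1`, `price2`, and `total_price` borrowed from the quiz question on creating a new column.

```r
library(dplyr)

# Hypothetical toy data
orders <- data.frame(
  region = c("north", "north", "south", "south", "south"),
  price1 = c(10, 20, 5, 15, 25),
  price2 = c(1, 2, 3, 4, 5)
)

result <- orders %>%
  filter(price1 >= 10) %>%                        # keep rows meeting a condition
  mutate(total_price = price1 + price2) %>%       # add a derived column
  select(region, total_price) %>%                 # keep only the needed columns
  group_by(region) %>%                            # group rows by region
  summarize(mean_total = mean(total_price),       # one summary row per group
            n = n()) %>%
  arrange(desc(mean_total))                       # sort groups, largest mean first

print(result)
```

Each verb takes a data frame and returns a data frame, which is what allows the steps to be read top to bottom as a single pipeline.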

The `dplyr` package provides a powerful and efficient toolkit for data manipulation in R.
