Frequency Table in R: Factors Variable

Recall that in the R language, a factor is a variable that defines a partition into groups. A single factor variable can be used to create a simple frequency table in R, while a pair of factors can be used to define a two-way cross-classification (a contingency table). For this purpose, the table() function creates frequency tables. The factors supplied to table() must be of equal length.

Frequency Table in R of Categorical/ Group/ Factor Variable

We will use the “mtcars” dataset. For the variable $gear$, let us create a frequency table using the table() function. The table() function counts how many times each gear code occurs in the data vector. For example,

attach(mtcars)

freq <- table(gear)
freq
Figure: Frequency table using a factor

The freq object gives a table of the frequencies of each gear code in the sample. Note that the frequencies are ordered and labeled by the levels attribute of the factor (table() converts gear to a factor internally).
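As a minimal sketch of how the levels attribute controls the ordering and labeling, the snippet below builds the same table without attach() and then with a factor whose level order is chosen explicitly. The reordered levels and the extra unused level 6 are illustrative assumptions, not part of the original example.

```r
# Same table as above, without attaching the data frame
freq <- table(mtcars$gear)

# A factor with explicitly chosen levels; level 6 never occurs in the data
gear_f <- factor(mtcars$gear, levels = c(5, 4, 3, 6))
freq_f <- table(gear_f)

freq_f             # entries follow the order of the levels: 5, 4, 3, 6
prop.table(freq)   # relative frequencies of the original table
```

An unused level still appears in the table with a count of zero, which is useful when several samples must be reported on a common set of categories.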

Frequency Distribution of a Continuous Variable

One can also create a frequency distribution table for a continuous variable. Suppose that, from the mtcars data set, we are interested in creating a frequency table of the $mpg$ variable. For this purpose, we first need to define the cut points (bins) that form the classes/groups of the frequency table. For example,

table(cut(mpg, 10+5*(0:5)))

## Output
(10,15] (15,20] (20,25] (25,30] (30,35] 
      6      12       8       2       4 

The cut() function is used to split the continuous data vector into groups. The groups are defined by creating a sequence of values using 10+5*(0:5), that is

10+5*(0:5)

## Output
10 15 20 25 30 35

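The cut() function assigns each observation of mpg to a class defined by the cut points breaks = 10+5*(0:5), and the table() function then counts the observations in each class. By default, each class is closed on the right, so a value of exactly 15 falls in (10,15].

Figure: Frequency table of a continuous variable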
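The two steps can also be kept separate and the result stored as a data frame, which makes it easy to add relative and cumulative frequencies. This is only a sketch; the names breaks, classes, freq_tab, rel_freq, and cum_freq are chosen here for illustration.

```r
breaks  <- 10 + 5 * (0:5)                     # 10 15 20 25 30 35
classes <- cut(mtcars$mpg, breaks = breaks)   # factor of class intervals

freq_tab <- as.data.frame(table(classes))     # columns: classes, Freq
freq_tab$rel_freq <- freq_tab$Freq / sum(freq_tab$Freq)   # relative frequency
freq_tab$cum_freq <- cumsum(freq_tab$Freq)                # cumulative frequency
freq_tab
```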

Creating Graph of Frequency Table

For the frequency table created above, one can easily create different graphical representations, such as pie charts and bar plots of the frequency table. For example,

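freq <- table(cut(mpg, 10 + 5 * (0:5)))
pie(freq)                            # pie chart of the class frequencies
hist(mpg, breaks = 10 + 5 * (0:5))   # a histogram needs the raw data, not the table
barplot(freq)                        # bar plot of the class frequencies
plot(freq)                           # default plot method for a table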
Figures: Bar plot and pie chart of the frequency table in R

Note that for $k$ factor arguments, the result is a $k$-way array of frequencies.
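For instance, with $k = 2$ the result is a two-way table. The following sketch cross-classifies the number of cylinders and the number of gears in mtcars; the choice of cyl as the second variable is an assumption made here for illustration.

```r
# Two factor arguments give a two-way (3 x 3) table of frequencies
two_way <- table(cyl = mtcars$cyl, gear = mtcars$gear)
two_way

dim(two_way)          # one dimension per factor
addmargins(two_way)   # append row and column totals
```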

https://itfeature.com, https://gmstat.com

Types of Objects in R

The R language operates on entities known as objects. There are various types of objects in R, such as vectors, matrices, factors, lists, data frames, and functions. In R, objects are classified into several types based on their structure and content.

Matrices

Arrays are multi-dimensional generalizations of vectors: they are vectors indexed by two or more indices and displayed in a special way. A matrix is the two-dimensional case, with rows and columns of homogeneous elements. The class of a matrix object is “matrix” (in R 4.0.0 and later, class() reports both “matrix” and “array”). For more details, see the post on matrices.

Factors

Factors are used to handle categorical data. A factor variable has a set of levels that define the groups or categories of the variable. For more details, see the post on factors.

Lists

Lists are a general form of vectors in which the various elements need not be of the same type, that is, lists may contain heterogeneous data. The elements of a list are often vectors or lists themselves. Lists are a convenient way to return the different results of a statistical computation, as they may contain different types of data objects. For more details, see the post on lists.

Data Frame

Data frame objects are similar in structure to matrix objects. Unlike matrices, data frames may contain columns of different types, that is, heterogeneous data. Think of a data frame as a “data matrix” with one row per observational unit but with (possibly) both numerical and categorical variables. Many experiments are best described by data frames: the treatments are categorical, but the response (output) is numeric. For more details, see the post on data frames.

Functions

Functions are themselves objects. In the R language, functions can be stored in the project’s workspace. Functions provide a quick, simple, and convenient way to extend the functionality and power of R. For more about functions and their customization, see the post on functions.

Examples of Different Types of Objects in R

# Length-one vectors (R has no separate scalar type)
x <- 5L       # Integer (a plain 5 would be stored as a double)
y <- 3.14159  # Numeric (double)
z <- "Hello"  # Character
b <- TRUE     # Logical

# Vector types
numbers <- c(1, 2, 3, 4)                  # Numeric vector
fruits <- c("apple", "banana", "orange")  # Character vector
bools <- c(TRUE, FALSE, TRUE)             # Logical vector

# Data frame
df <- data.frame(
  name = c("Ali", "Babar", "Usman"),
  age = c(25, 30, 28),
  city = c("Multan", "Lahore", "Karachi")
)

# Matrix
mat <- matrix(1:9, nrow = 3, ncol = 3)

# List
my_list <- list(
  numbers = numbers,
  fruits = fruits,
  df = df
)

# Factor
colors <- factor(c("red", "blue", "green", "red"))
Figure: Types of objects in the R language
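To see how R itself classifies such objects, one can inspect them with typeof() (the internal storage type) and class() (the type used for method dispatch). The sketch below is self-contained; the list objs and its element names are hypothetical and exist only for this illustration.

```r
# One example object per type discussed above
objs <- list(
  integer   = 5L,
  double    = 3.14159,
  character = "Hello",
  logical   = TRUE,
  matrix    = matrix(1:9, nrow = 3),
  factor    = factor(c("red", "blue", "green", "red")),
  list      = list(1, "a"),
  dataframe = data.frame(age = c(25, 30)),
  func      = function(x) x^2
)

sapply(objs, typeof)   # internal storage type of each object
lapply(objs, class)    # class attribute used for method dispatch
```

Note that a factor is stored as an integer vector and a data frame is stored as a list, even though their classes are “factor” and “data.frame”.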

Special Values in R Programming: A Quick Guide

There are some special values in the R programming language, namely NA, Inf, -Inf, NaN, and NULL.

Special Values in R Programming Language

For numeric variables, several formalized special values are used. Calculations involving special values often result in special values. From a statistical point of view, a measurement of a real-world phenomenon should not be a special value. Therefore, it is desirable to handle special values before performing any statistical analysis, especially inferential analysis. Moreover, many functions in R return NA or give errors or warnings when a variable contains special values.

The NA values in R (NA stands for Not Available) represent the missing observations. A missing value may occur due to the non-response of the respondent or may arise when the vector size is expanded. For example,

v = c(1, 5, 6)
v[5] = 4
v

## Output
[1]  1  5  6 NA  4

To learn about how to handle missing values in R, see the article: Handling Missing Values in R
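As a brief sketch of the most common first steps, the code below locates the missing value created above and shows how it propagates through a computation unless it is removed explicitly.

```r
v <- c(1, 5, 6)
v[5] <- 4                # position 4 is filled with NA

is.na(v)                 # flags the missing position
sum(is.na(v))            # number of missing values
mean(v)                  # NA: missing values propagate
mean(v, na.rm = TRUE)    # mean of the observed values only
```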

The Inf and -Inf values in R represent positive and negative infinity, respectively. They occur when a computation produces a number too large in magnitude to be represented. Inf or -Inf also results when a nonzero value is divided by 0. For example,

2 ^ 1024
## Output
[1] Inf

-2^1024

## Output
[1] -Inf

1/0

## Output
[1] Inf

-Inf + 1e10

## Output
[1] -Inf
Figure: Special values in the R programming language
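The point at which a number becomes “too big” depends on the double-precision format used by R. The following sketch uses the built-in .Machine list to show the largest finite double and checks a few values with is.finite() and is.infinite().

```r
.Machine$double.xmax             # largest finite double, about 1.797693e+308
is.finite(2^1023)                # TRUE: still representable
is.finite(2^1024)                # FALSE: overflows to Inf
.Machine$double.xmax * 2         # Inf
is.infinite(c(1/0, -1/0, 0/0))   # TRUE TRUE FALSE (0/0 is NaN, not infinite)
```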

Sometimes a computation will produce a result that makes little sense. In such cases, R often returns NaN (Not a Number). For example,

Inf - Inf

## Output
[1] NaN

0/0

## Output
[1] NaN

In R, the null object is represented by the symbol NULL. It is often used as an argument in functions to indicate that no value was assigned to the argument. Additionally, some functions may return NULL. Note that NULL is not the same as NA, Inf, -Inf, or NaN.
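The difference between NULL and NA can be sketched as follows. The describe() function and its label argument are hypothetical and serve only to illustrate NULL as a default argument.

```r
length(NULL)     # 0: NULL has no elements
length(NA)       # 1: NA is a placeholder for one missing value
is.null(NULL)    # TRUE
c(1, NULL, 3)    # NULL vanishes when combined
c(1, NA, 3)      # NA is kept as a missing element

# Hypothetical function: label defaults to NULL, meaning "not supplied"
describe <- function(x, label = NULL) {
  if (is.null(label)) label <- "value"
  paste(label, "=", x)
}

describe(10)
describe(10, label = "speed")
```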

Getting Information about Special Values

It is also instructive to look at the output of str(), typeof(), and length() for Inf, -Inf, NA, NaN, and NULL.
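One way to do this for all five values at once is sketched below. The list vals and its element names are assumptions made for this illustration.

```r
# Collect the special values in a named list (a list can hold NULL as an element)
vals <- list(pos_inf = Inf, neg_inf = -Inf, na = NA, nan = NaN, null = NULL)

types   <- sapply(vals, typeof)   # storage type of each value
lengths <- sapply(vals, length)   # number of elements in each value
data.frame(type = types, length = lengths)

# Compact structure of each value
for (v in vals) str(v)
```

Inf, -Inf, and NaN are doubles, a plain NA is logical, and NULL has its own type and length zero.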

It is worth noting that the special values in numeric variables indicate values that are not elements of the mathematical set of real numbers. One can use the is.finite() function to determine whether values are regular or special. The is.finite() function only accepts vector objects. For example,

is.finite(c(1, Inf, NaN, NA))
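To see how is.finite() relates to the other predicates for special values, the following sketch applies all four of them to the same vector. The data frame checks is created only for side-by-side display.

```r
x <- c(1, Inf, NaN, NA)

checks <- data.frame(
  x           = x,
  is.finite   = is.finite(x),     # TRUE only for regular numbers
  is.infinite = is.infinite(x),   # TRUE only for Inf and -Inf
  is.nan      = is.nan(x),        # TRUE only for NaN
  is.na       = is.na(x)          # TRUE for both NA and NaN
)
checks
```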

A function can be written to deal with every numerical column in a data frame. For example,

special <- function(x){
    if (is.numeric(x)){
        return(!is.finite(x))
    }else {
        return (is.na(x))
    }
}

sapply(airquality, special)
Figure: Special values in R programming

The user-defined special() function tests each column of the data frame object (airquality). The function flags every special value if the column is numeric; otherwise, it only checks for NA.
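Because the result is a logical matrix, it can be summarized directly. The sketch below repeats the function in a compact form so that it runs on its own, then counts the special values per column and keeps only the rows without any; the names flags and clean are chosen here for illustration.

```r
# Compact version of the function above
special <- function(x) {
  if (is.numeric(x)) !is.finite(x) else is.na(x)
}

flags <- sapply(airquality, special)   # logical matrix, one column per variable
colSums(flags)                         # number of special values per column

clean <- airquality[rowSums(flags) == 0, ]   # rows without any special value
nrow(clean)
```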
