Descriptive Summary in R

Introduction to Descriptive Summary in R

Statistics is the study of data: describing properties of data (descriptive statistics) and drawing conclusions about a population from the information in a sample (inferential statistics). This article discusses how to compute descriptive summaries (descriptive statistics) in R.

Example: Twenty elementary school children were asked whether they live with both parents (B), father only (F), mother only (M), or someone else (S), and how many brothers they have. The responses of the children are as follows:

Case  Sex     No. of Brothers    Case  Sex     No. of Brothers
M     Female  3                  B     Male    2
B     Female  2                  F     Male    1
B     Female  3                  B     Male    0
M     Female  4                  M     Male    0
F     Male    3                  M     Male    3
S     Male    1                  B     Female  4
B     Male    2                  B     Female  3
M     Male    2                  F     Male    2
F     Female  4                  B     Female  1
B     Female  3                  M     Female  2


Suppose the following computations, all related to the descriptive summary in R, are required:

  • Construct a frequency distribution table in R for the case (living arrangement) of each child.
  • Draw bar and pie charts of the frequency distribution for each category using R code.

Creating the Frequency Table in R

# Enter the data in the vector form 
x <- c("M", "B", "B", "M", "F", "S", "B", "M", "F", "B", "B", "F", "B", "M", "M", "B", "B", "F", "B", "M") 

# Create the frequency table with the table() function
tabx <- table(x); tabx

# Output
x
B F M S 
9 4 6 1 
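
A frequency distribution table usually reports relative frequencies next to the counts. The following is a minimal sketch, assuming the same responses as above; the object names rel_freq and freq_table are illustrative and not part of the original example.

```r
# Responses copied from the example above
x <- c("M", "B", "B", "M", "F", "S", "B", "M", "F", "B",
       "B", "F", "B", "M", "M", "B", "B", "F", "B", "M")

tabx <- table(x)

# Relative frequencies (proportions that sum to 1)
rel_freq <- prop.table(tabx)

# Combine counts and percentages in one data frame
freq_table <- data.frame(
  case      = names(tabx),
  frequency = as.vector(tabx),
  percent   = as.vector(round(100 * rel_freq, 1))
)
print(freq_table)
```

Keeping counts and percentages in one data frame makes the table easy to export or to pass on to plotting functions.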

Draw a Bar Chart and Pie Chart from the Frequency Table

# Drawing the bar chart for the resulting table in Green color with main title, x label and y label 

barplot(tabx, xlab = "x", ylab = "Frequency", main = "Sample of Twenty elementary school children ",col = "Green") 

# Drawing the pie chart for the resulting table with main title.
pie(tabx, main = "Sample of Twenty elementary school children ")

Descriptive Statistics for Air Quality Data

Consider the air quality data for computing numerical and graphical descriptive summaries in R. The airquality data set ships with R in the datasets package.

attach(airquality)
# To choose the temperature degree only
Temperature = airquality[, 4]
hist(Temperature)

hist(Temperature, main = "Maximum daily temperature at La Guardia Airport", xlab = "Temperature in degrees Fahrenheit", xlim = c(50, 100), col = "darkmagenta", freq = TRUE)

h <- hist(Temperature, ylim = c(0,40))
text(h$mids, h$counts, labels=h$counts, adj=c(0.5, -0.5))

In the above histogram, the frequency of each bar is drawn at the top of each bar by using the text() function.

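Note that to change the number of classes or the class width, we can pass explicit break points built with the seq() function. Dividing the range from the minimum to the maximum into $n$ equal-width classes requires $n+1$ break points, that is, length.out = n + 1. The call below uses 7 break points and therefore produces 6 classes.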

hist(Temperature, breaks = seq(min(Temperature), max(Temperature), length.out = 7))
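
The relationship between break points and classes can be checked without drawing anything. This is a small sketch, assuming the same temperature column; n_classes and brks are illustrative names.

```r
Temperature <- airquality$Temp   # same column as airquality[, 4]

n_classes <- 6                   # hypothetical choice of n
brks <- seq(min(Temperature), max(Temperature), length.out = n_classes + 1)

# plot = FALSE returns the histogram object without drawing it
h <- hist(Temperature, breaks = brks, plot = FALSE)

length(h$breaks)   # number of break points
length(h$counts)   # number of classes
sum(h$counts)      # every observation falls in some class
```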

Median for Ungrouped Data

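Numeric descriptive statistics such as the median, mean, mode, and other summary statistics can be computed. Note that base R has no built-in function for the statistical mode; the mode() function returns the storage mode of an object instead.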

median(Temperature)
## Output 79
mean(Temperature)
summary(Temperature)
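
Because base R offers no function for the statistical mode, one possible approach is to tabulate the values and pick the most frequent one. The helper stat_mode() below is a hypothetical name used only for this sketch; it returns all tied values when there is more than one mode.

```r
# Hypothetical helper: most frequent value(s) of a vector
stat_mode <- function(v) {
  counts <- table(v)
  modes <- names(counts)[counts == max(counts)]
  # table() stores values as character names; convert back for numeric input
  if (is.numeric(v)) as.numeric(modes) else modes
}

stat_mode(c(1, 2, 2, 3))
stat_mode(c("a", "b", "b", "a"))   # ties: both values are returned
stat_mode(airquality$Temp)
```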

A customized function for the computation of the median can be created. For example

arithmetic.median <- function(xx) {
  sorted <- sort(xx)          # sort() also drops missing values
  n <- length(sorted)
  if (n %% 2 == 0) {
    # even length: average the two middle values
    (sorted[n / 2] + sorted[n / 2 + 1]) / 2
  } else {
    # odd length: take the middle value
    sorted[ceiling(n / 2)]
  }
}
arithmetic.median(Temperature)

Computing Quartiles and IQR

The quantiles (quartiles, deciles, and percentiles) can be computed using the function quantile() in R. The interquartile range (IQR) can be computed using the IQR() function; note the capital letters, since R is case-sensitive.

y = airquality[, 4]  # temperature variable

quantile(y)

quantile(y, probs = c(0.25,0.5,0.75))
quantile(y, probs = c(0.30,0.50,0.70,0.90))

IQR(y)
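
As a quick check, the value returned by IQR() is the difference between the third and first quartiles given by quantile(). The sketch below also shows how deciles can be requested through the probs argument; the object name q is illustrative.

```r
y <- airquality$Temp

q <- quantile(y, probs = c(0.25, 0.75))
q

# IQR() is the distance between the third and first quartiles
unname(q[2] - q[1])
IQR(y)

# Deciles: the 10th, 20th, ..., 90th percentiles
quantile(y, probs = seq(0.1, 0.9, by = 0.1))
```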

One can create a custom function for the computation of Quartiles and IQR. For example,

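quart <- function(x) {
  x <- sort(x)
  n <- length(x)
  m <- (n + 1) / 2            # position of the median
  if (floor(m) != m) {
    # even n: split the data into two equal halves
    l <- m - 1 / 2; u <- m + 1 / 2
  } else {
    # odd n: exclude the middle value from both halves
    l <- m - 1; u <- m + 1
  }
  q1 <- median(x[1:l])
  q3 <- median(x[u:n])
  c(Q1 = q1, Q3 = q3, IQR = q3 - q1)
}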

quart(y)
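
The halves-based rule used in the custom function does not always agree with the default method of quantile(), which interpolates between order statistics (type 7). The sketch below illustrates the difference on a small vector; quart_halves() is a hypothetical compact restatement of the same rule, written here only so the example is self-contained.

```r
# Hypothetical compact version of the halves-based rule used above
quart_halves <- function(x) {
  x <- sort(x)
  n <- length(x)
  half <- floor(n / 2)              # size of each half; middle value dropped if n is odd
  q1 <- median(x[1:half])
  q3 <- median(x[(n - half + 1):n])
  c(Q1 = q1, Q3 = q3, IQR = q3 - q1)
}

v <- 1:8
quart_halves(v)                       # Q1 = 2.5, Q3 = 6.5, IQR = 4
quantile(v, probs = c(0.25, 0.75))    # 2.75 and 6.25 under the default type 7
```

Both answers are legitimate quartile definitions, so small discrepancies between a hand-written function and quantile() are expected.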

FAQs in R Language

  1. How can one perform descriptive statistics in the R language?
  2. Discuss the strategy for creating a frequency table in R.
  3. How can pie charts and bar charts be drawn in the R language? Discuss the commands and their important arguments.
  4. Which built-in function is used to compute the quartiles of a data set?
  5. You are interested in computing the median for grouped and ungrouped data in R. Write a customized R function.
  6. Create a user-defined function that computes the quartiles and IQR of an input data set.

https://itfeature.com

https://gmstat.com

Components of Functions in R Language

Components of Function in R Language

Functions in the R Language are objects with three basic components. The components of functions in R language are:

  • Formal arguments
  • Body
  • Environment

Let us discuss the components of functions in R language in detail.

Formal Argument in R

The formal arguments are the named parameters listed in the function definition. They are covered in detail in the guide to formal arguments later in this document.

Body of a Function

The body of a function is parsed R statements. The body of a function is usually a collection of statements in braces but it can be a single statement, a symbol, or even a constant.

Environment of a Function

The environment of a function is the environment that was active at the time that the function was created. The environment of a function is a structural component of the function and belongs to the function itself.
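
Each of the three components can be inspected with a base R accessor: formals(), body(), and environment(). The function add_xy() below is a hypothetical example defined only for illustration.

```r
# Hypothetical function used only for illustration
add_xy <- function(x, y = 2) {
  x + y
}

formals(add_xy)       # the argument list, including the default for y
body(add_xy)          # the parsed statements inside the braces
environment(add_xy)   # the environment in which add_xy was created
```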

The “Return Value” of a function can be considered a fourth component of a function in R language.

Return Value of a Function

The value of the last expression evaluated within a function is returned by the function and is therefore available for assignment. A function can return only a single object, but in practice this is not a limitation, because a list containing any number of objects can be returned. Objects can be returned visibly or invisibly. This choice does not affect assignment; it only affects whether the result is printed when the function is called without assignment.

y <- function(n){
  out <- runif(n)
  cat (head(out))
  invisible(out)
}
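
The two points above, returning several objects in a list and returning a value invisibly, can be sketched as follows. The function describe() is a hypothetical example; withVisible() is a base R function that reports whether a value would be printed automatically.

```r
# Hypothetical function returning several results in one list
describe <- function(v) {
  list(mean = mean(v), median = median(v), n = length(v))
}

res <- describe(c(2, 4, 9))
res$mean
res$n

# withVisible() reports whether a value would be auto-printed
withVisible(invisible(5))$visible
withVisible(5)$visible
```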

Function Closures in R

A function closure or closure is a function together with a referencing environment. Almost all functions in R are closures as they remember the environment where they were created. The functions that cannot be classified as closures, and therefore do not have a referencing environment, are known as primitives.

In R, primitive functions call the underlying C code directly. The functions sum() and c() are good cases in point; because they have no referencing environment, the following call returns NULL:

environment(sum)
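
The difference between primitives and closures can be checked with is.primitive() and typeof(). This is a short sketch using only base R functions.

```r
is.primitive(sum)     # TRUE
is.primitive(c)       # TRUE
environment(sum)      # NULL: primitives have no referencing environment

typeof(sum)           # "builtin"
typeof(mean)          # "closure"
environment(mean)     # the base namespace
```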

When we call a function, a new environment is created to hold the function’s execution, and normally that environment is destroyed when the function exits. But if we define a function g() that returns a function f(), the environment where f() is created is the execution environment of g(); that is, the execution environment of g() is the referencing environment of f(). As a consequence, the execution environment of g() is not destroyed when g() exits, but persists as long as f() exists. Finally, because f() remembers all objects bound to its referencing environment, f() remembers all objects bound to the execution environment of g().

We can use the referencing environment of f() to hold any objects, and these will be available to f().

g <- function(){
  y <- 1
  function(x){
    x + y
  }
}

f1 <- g()

print(f1)

f1(3)
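
The persistence of the execution environment can be observed directly by inspecting the environment of the returned function. This is a minimal sketch that repeats the definition of g() so that it runs on its own; env is an illustrative name.

```r
g <- function() {
  y <- 1
  function(x) {
    x + y
  }
}

f1 <- g()

# The referencing environment of f1 is the execution environment of g()
env <- environment(f1)
ls(env)              # "y"
get("y", envir = env)

# Objects in that environment can be changed, and f1 sees the change
assign("y", 10, envir = env)
f1(3)                # 13
```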

Closures can be used to write functions that, in turn, generate new functions. This allows us to have a double layer of development: a first layer that does all the complex work common to all functions, and a second layer that defines the details of each function.

f <- function(i){
  function(x){
    x+i
  }
}
f1 <- f(1)
f2 <- f(2)
f2(4)
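
Another sketch of the same two-layer pattern is a factory that builds power functions. The name make_power() is hypothetical. The call to force() is a common precaution: R evaluates arguments lazily, so forcing p fixes its value at the time the new function is created.

```r
# Hypothetical factory: returns a function that raises its input to power p
make_power <- function(p) {
  force(p)            # evaluate p now, so later changes outside do not leak in
  function(x) x^p
}

square <- make_power(2)
cube   <- make_power(3)

square(4)   # 16
cube(2)     # 8
```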

By understanding these components, one can effectively create and use functions to enhance one’s R programming.

Note that in R language:

  • Functions can be nested within other functions.
  • Functions can access variables from the environment where they are defined (lexical scoping).
  • R provides many built-in functions for various tasks.
  • One can create customized functions to automate repetitive tasks and improve code readability.


https://rfaqs.com


Formal Arguments in R: Quick Guide

Introduction to Formal Arguments in R

Formal arguments in R are the variables named in a function’s definition. They act as placeholders for the values supplied when the function is called. The formals() function returns the formal arguments of a function as an object of class pairlist, which can be thought of as something similar to a list with an important difference:

is.null(pairlist())
is.null(list())

That is, a pairlist of length zero is NULL, while a list of length zero is not.
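
The formals of an existing function can be inspected in the same way. The sketch below looks at mean.default(), the default method behind mean(); fm is an illustrative name.

```r
fm <- formals(mean.default)

class(fm)     # "pairlist"
names(fm)     # "x" "trim" "na.rm" "..."
fm$na.rm      # the default value, FALSE

# formals() returns NULL for primitives, which have no formals to inspect
formals(sum)
```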

Specifying Formal Arguments Positions

Formal arguments in R can be specified by position or by name and we can mix positional matching with matching by name. The following are equivalent.

mean(x = 1:5, trim = 0.1)
mean(1:5, trim = 0.1)
mean(x = 1:5, 0.1)
mean(1:5, 0.1)
mean(trim = 0.1, x = 1:5)
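
The matching rule behind these equivalent calls is that named arguments are matched first and the remaining arguments are then filled in by position. A small sketch with a hypothetical two-argument function, sub_ab(), makes the order visible because subtraction is not symmetric.

```r
# Hypothetical function used to show how arguments are matched
sub_ab <- function(a, b) a - b

sub_ab(10, 3)          # positional: a = 10, b = 3
sub_ab(b = 3, a = 10)  # by name, in any order
sub_ab(b = 3, 10)      # named b is matched first, 10 then fills a
```

All three calls return 7, since in each case a is 10 and b is 3.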

Functions Formals Default Values

A formal argument may also take the form symbol = default; unless a different value is supplied in the call, the argument takes its default value. For example, mean() (through its default method) has a third argument, na.rm, that defaults to FALSE. As a result, passing a vector containing NA values to mean() returns NA.

mean(c(1, 2, NA))

By specifying na.rm = TRUE, we get the mean of all non-missing elements of the vector.

mean(c(1, 2, NA), na.rm = TRUE)

We can redefine the mean() function so that na.rm defaults to TRUE by simply changing the formals of its default method:

formals(mean.default)$na.rm <- TRUE
mean(c(1, 2, NA))

Now we have a copy of mean.default() in our globalenv:

exists("mean.default", envir = globalenv())

Also, notice that its environment is still the base namespace:

environment(mean.default)

The ... Argument in a Function

The ... argument of a function is special: it can absorb any number of arguments, named (symbol = value) or unnamed. It appears as an entry named ... in the list of formals of the function:

h <- function(x, ...){
  0
}

formals(h)

The ... argument can be used if the number of arguments is unknown. Suppose one wants to define a function that counts the number of rows of any given number of data frames. One can write:

count_rows <- function(...){
  dfs <- list(...)     # collect the data frames passed through ...
  lapply(dfs, nrow)
}

count_rows(airquality, cars)
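
Names supplied in the call are preserved when ... is collected into a list, which makes the output easier to read. The sketch below is a hypothetical variant of the row-counting function that returns a named vector; n_args() is an illustrative helper that uses ...length(), available in R 3.5.0 and later.

```r
# Hypothetical variant of the row-counting function above
count_rows <- function(...) {
  dfs <- list(...)          # collect all arguments passed through ...
  sapply(dfs, nrow)         # names supplied in the call are kept
}

count_rows(air = airquality, cars = cars)

# ...length() gives the number of arguments in ... without evaluating them
n_args <- function(...) ...length()
n_args(1, "a", TRUE)
```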

By effectively using formal arguments in R Language, one can create reusable and adaptable functions that make the R code more concise and efficient.
