Some Descriptive Statistics in R

Descriptive Statistics in R

There are numerous functions in the R language for computing descriptive statistics. Here, we use the built-in mtcars dataset to illustrate them, but you can use any dataset of your own choice. To learn what descriptive statistics are, read the posts in the Basic Statistics section.

Getting Dataset Information in R

Before computing any descriptive or inferential statistics, it is a good idea to get some basic information about the data. This helps you understand the mode (type) of each variable in the dataset.

# attach the mtcars dataset
attach(mtcars)

# data structure
str(mtcars)

You will see the dataset mtcars contains 32 observations and 11 variables.
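As a quick check of the dimensions and of the type of each variable, base R's dim() can be combined with sapply() and class(). This small sketch accesses mtcars directly rather than relying on attach(), so it runs on its own.

```r
# number of rows (observations) and columns (variables)
dims <- dim(mtcars)
print(dims)

# class (type) of every variable in the data frame
var_types <- sapply(mtcars, class)
print(var_types)
```

All eleven variables in mtcars are stored as numeric, even those such as cyl or am that are categorical in meaning.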

It is also good practice to inspect the first and last few rows of the dataset.

# for the first six rows
head(mtcars)

# for the last six rows
tail(mtcars)
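Both head() and tail() accept an n argument when six rows are more or fewer than you need. A minimal sketch showing three rows from each end:

```r
# first three and last three rows of mtcars
first_rows <- head(mtcars, n = 3)
last_rows  <- tail(mtcars, n = 3)
print(first_rows)
print(last_rows)
```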

Getting Numerical Descriptive Statistics in R

To get a quick overview of the whole dataset, use the summary() function. It can also be applied to each variable in the dataset separately.

summary(mtcars)
summary(mpg)

Note that for a numeric variable the summary() function returns the five-number summary (minimum, first quartile, median, third quartile, and maximum) together with the mean of the variable passed as its argument. Compare the output of the following two lines, where cyl is treated first as a numeric variable and then as a factor.

summary(cyl)
summary(factor(cyl))

Remember that R chooses the appropriate summary according to the data type (class) of a variable, so defining or changing the type of a variable changes the descriptive statistics you get. If a categorical variable is defined as a factor, the summary() function returns a frequency table of its levels.
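To see how summary() adapts to the class of its argument, the sketch below compares the numeric summary of mpg with values computed directly by quantile() and mean(), and the factor summary of cyl with table(). It is only a check of the behaviour described above; the object names are illustrative.

```r
x <- mtcars$mpg

# numeric variable: Min., 1st Qu., Median, Mean, 3rd Qu., Max.
s_num <- summary(x)
print(s_num)

# the same six values computed directly
q <- quantile(x)                          # 0%, 25%, 50%, 75%, 100%
direct <- c(q[1:3], mean(x), q[4:5])

# factor variable: summary() returns the count for each level
cyl_f <- factor(mtcars$cyl)
s_fac <- summary(cyl_f)
print(s_fac)
print(table(cyl_f))
```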

Several other functions compute individual descriptive statistics instead of the summary() function.

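# average value
mean(mpg)
# median value
median(mpg)
# minimum and maximum values
min(mpg); max(mpg)
# quantiles, percentiles, deciles (probs must lie between 0 and 1)
quantile(mpg, probs = c(0.10, 0.20, 0.30, 0.70, 0.90))
# variance and standard deviation
var(mpg); sd(mpg)
# inter-quartile range
IQR(mpg)
# range (returns the minimum and the maximum)
range(mpg)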
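var() and sd() use the sample (n - 1) denominator, IQR() is the third quartile minus the first, and range() returns two numbers rather than their difference. The sketch below checks these from the textbook formulas and uses seq() to request all nine deciles at once; the variable names are illustrative.

```r
x <- mtcars$mpg
n <- length(x)

# sample variance: sum of squared deviations divided by n - 1
var_manual <- sum((x - mean(x))^2) / (n - 1)
sd_manual  <- sqrt(var_manual)

# inter-quartile range: third quartile minus first quartile
iqr_manual <- unname(quantile(x, 0.75) - quantile(x, 0.25))

# range as a single number: maximum minus minimum
range_width <- diff(range(x))

# all nine deciles (10%, 20%, ..., 90%)
deciles <- quantile(x, probs = seq(0.1, 0.9, by = 0.1))
print(deciles)
```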

Creating a Frequency Table in R

We can produce a frequency table and a relative frequency table for any categorical variable.

# frequency table
freq <- table(cyl); freq
# relative frequency table
rf <- prop.table(freq); rf
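Relative frequencies are proportions, so they sum to 1; multiplying by 100 and rounding gives percentages. A minimal sketch that puts counts, proportions, and percentages side by side (the column names are illustrative):

```r
freq <- table(mtcars$cyl)      # counts per number of cylinders
rf   <- prop.table(freq)       # proportions
pct  <- round(100 * rf, 1)     # percentages, one decimal place

freq_df <- data.frame(count   = as.vector(freq),
                      prop    = round(as.vector(rf), 3),
                      percent = as.vector(pct),
                      row.names = names(freq))
print(freq_df)
```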

[Figure: Barplot and pie chart]
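A bar plot and a pie chart of a frequency table can be drawn with base R's barplot() and pie(). This is a minimal sketch; the titles and axis labels are illustrative choices.

```r
freq <- table(mtcars$cyl)

# bar plot of counts; barplot() invisibly returns the bar midpoints
mids <- barplot(freq, main = "Cars by number of cylinders",
                xlab = "Cylinders", ylab = "Frequency")

# pie chart of the same counts, labelled with the cylinder values
pie(freq, labels = names(freq), main = "Cars by number of cylinders")
```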

Creating a Contingency Table (Cross-Tabulation)

A contingency table summarizes the relationship between two categorical variables. The xtabs() or table() function can be used to produce a cross-tabulation (contingency table).

xtabs(~cyl + gear, data = mtcars)
table(cyl, gear)
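addmargins() appends row and column totals to a contingency table, and prop.table() with a margin argument gives row (or column) proportions. A sketch on the same cyl by gear table, with the dimensions named for readability:

```r
tab <- table(cyl = mtcars$cyl, gear = mtcars$gear)

# counts with row and column totals (labelled "Sum")
print(addmargins(tab))

# row proportions: each row sums to 1 (margin = 1)
row_prop <- prop.table(tab, margin = 1)
print(round(row_prop, 2))
```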

Finding a Correlation between Variables

By default, the cor() function computes Pearson's correlation coefficient, which measures the strength and direction of the linear relationship between two variables.

cor(mpg, wt)
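Pearson's coefficient is the covariance divided by the product of the two standard deviations. The sketch checks that identity and shows that cor() applied to a data frame returns a correlation matrix; the choice of mpg, wt, and hp is illustrative.

```r
x <- mtcars$mpg
y <- mtcars$wt

# Pearson's r = cov(x, y) / (sd(x) * sd(y))
r_manual  <- cov(x, y) / (sd(x) * sd(y))
r_builtin <- cor(x, y)

# correlation matrix for several variables at once
cm <- cor(mtcars[, c("mpg", "wt", "hp")])
print(round(cm, 2))
```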

However, if the variables are heavily skewed, Spearman's rank correlation, a non-parametric method, can be used instead.

cor(mpg, wt, method = "spearman")
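Spearman's coefficient is Pearson's correlation applied to the ranks of the data, which is why it is less sensitive to skewness and outliers. A sketch of that equivalence; rank() assigns average ranks to tied values by default.

```r
x <- mtcars$mpg
y <- mtcars$wt

# Spearman's rho = Pearson correlation of the ranks
rho_manual  <- cor(rank(x), rank(y))
rho_builtin <- cor(x, y, method = "spearman")
print(c(manual = rho_manual, builtin = rho_builtin))
```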

A scatter plot can be drawn with the plot() function.

plot(mpg ~ wt)
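attach() is convenient but can mask other objects with the same names, so it is common to detach the data when finished or to avoid attach() altogether. Many functions accept a data argument, and with() evaluates an expression using the columns of a data frame. A sketch of the same scatter plot and correlation without attach(); the axis labels follow the variable descriptions in the mtcars help page.

```r
# formula interface with an explicit data argument: no attach() needed
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)", ylab = "Miles per US gallon",
     main = "mpg versus weight")

# with() evaluates the expression using the columns of mtcars
r <- with(mtcars, cor(mpg, wt))
print(r)

# if mtcars was attached earlier, detach it when done
if ("mtcars" %in% search()) detach(mtcars)
```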

Learn more about the plot() function: plot() function

Visit: Learn Basic Statistics

Discover more from R Language Frequently Asked Questions