Here, we will consider the `mtcars` data to get descriptive statistics in R. You can use a dataset of your own choice.

**Getting Dataset Information**

Let us get some basic information about the data. It helps to understand the mode (type) of the variables in the dataset.

    # attach the mtcars dataset
    attach(mtcars)

    # data structure
    str(mtcars)

You will see that the dataset `mtcars` contains 32 observations and 11 variables.
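The counts reported by `str()` can also be checked directly. The following is a minimal sketch; it refers to the data frame by name, so it runs whether or not `mtcars` has been attached. The name `col_classes` is only an illustrative choice.

```r
# Number of rows (observations) and columns (variables)
dim(mtcars)
nrow(mtcars)
ncol(mtcars)

# Class of every column; in mtcars all columns are stored as numeric
col_classes <- sapply(mtcars, class)
col_classes
```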

It is also best to inspect the first and last rows of the dataset.

    # for the first six rows
    head(mtcars)

    # for the last six rows
    tail(mtcars)

**Getting Numerical Information about Dataset and Variables**

To get a quick overview of the dataset, the `summary()` function can also be used. It can be applied to the whole data frame or separately to each of the variables in the dataset.

    summary(mtcars)
    summary(mpg)
    summary(gear)

Note that, for a numeric variable, the `summary()` function provides the five-number summary (minimum, first quartile, median, third quartile, and maximum) together with the mean of the variable used as the argument. Note the difference between the output of the following code.

    summary(cyl)
    summary(factor(cyl))

Note that R chooses the descriptive statistics according to the data type of the variable. If a categorical variable is defined as a factor, the `summary()` function returns a frequency table instead of numerical summaries.
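To see this effect on several variables at once, one option is to convert the categorical columns to factors in a copy of the data. The sketch below assumes a copy named `mtcars_f` (a hypothetical name) so that the built-in dataset stays unchanged; the labels for `am` follow the coding given in the `mtcars` help page (0 = automatic, 1 = manual).

```r
# Work on a copy so the built-in dataset is left untouched
mtcars_f <- mtcars

# Declare the categorical variables as factors
mtcars_f$cyl <- factor(mtcars_f$cyl)
mtcars_f$am  <- factor(mtcars_f$am, levels = c(0, 1),
                       labels = c("automatic", "manual"))

# Numeric column: five-number summary and mean; factor columns: counts
summary(mtcars_f[, c("mpg", "cyl", "am")])

# For a single factor, summary() returns the counts per level
cyl_counts <- summary(mtcars_f$cyl)
cyl_counts
```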

Several other functions can be used instead of the `summary()` function to compute individual statistics.

    # average value
    mean(mpg)

    # median value
    median(mpg)

    # minimum value
    min(mpg)

    # maximum value
    max(mpg)

    # quartiles, percentiles, deciles (probs must lie between 0 and 1)
    quantile(mpg)
    quantile(mpg, probs = c(0.1, 0.2, 0.3, 0.7, 0.9))

    # variance and standard deviation
    var(mpg)
    sd(mpg)

    # inter-quartile range
    IQR(mpg)

    # range (returns the minimum and the maximum)
    range(mpg)
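Several of these statistics are related to one another, and checking the relations is a quick way to confirm what each function returns. The following sketch uses `mtcars$mpg` directly so that it does not depend on `attach()`; the object names `mpg_values` and `q` are illustrative only.

```r
mpg_values <- mtcars$mpg

# The inter-quartile range is the third quartile minus the first quartile
q <- quantile(mpg_values, probs = c(0.25, 0.75))
unname(q[2] - q[1])
IQR(mpg_values)

# range() returns c(min, max); diff() turns it into a single spread value
diff(range(mpg_values))

# The standard deviation is the square root of the variance
sqrt(var(mpg_values))
sd(mpg_values)

# All nine deciles in one call; probabilities are given on the 0 to 1 scale
quantile(mpg_values, probs = seq(0.1, 0.9, by = 0.1))
```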

**Creating a Frequency Table**

We can produce a frequency table and a relative frequency table for any categorical variable.

    # frequency table
    freq <- table(cyl); freq

    # relative frequency table
    rf <- prop.table(freq)

    # bar plots and pie charts of both tables
    barplot(freq)
    barplot(rf)
    pie(freq)
    pie(rf)
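The two tables can also be combined into a single display. This is a minimal sketch, assuming a percentage column rounded to one decimal place is wanted; `freq_table` and its column names are hypothetical choices, not part of any fixed convention.

```r
# Counts and proportions for the number of cylinders
freq <- table(mtcars$cyl)
rf   <- prop.table(freq)

# Relative frequencies always add up to 1
sum(rf)

# Combine counts and percentages in one data frame
freq_table <- data.frame(
  cyl       = names(freq),
  frequency = as.vector(freq),
  percent   = round(100 * as.vector(rf), 1)
)
freq_table
```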

**Creating a Contingency Table (Cross-Tabulation)**

A contingency table summarizes the relationship between two categorical variables. The `xtabs()` or `table()` functions can be used to produce a cross-tabulation (contingency table).

    xtabs(~ cyl + gear, data = mtcars)
    table(cyl, gear)
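A cross-tabulation is often easier to read with totals and proportions added. The sketch below uses `addmargins()` and `prop.table()` from base R; the names `tab` and `row_prop` are illustrative.

```r
# Cross-tabulation of cylinders (rows) by gears (columns)
tab <- xtabs(~ cyl + gear, data = mtcars)

# Append row and column totals
addmargins(tab)

# Row proportions: the distribution of gears within each cylinder group
row_prop <- prop.table(tab, margin = 1)
round(row_prop, 2)
```

Setting `margin = 2` instead would give column proportions, that is, the distribution of cylinders within each gear group.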

**Finding a Correlation between Variables**

The `cor()` function can be used to find the degree of linear relationship between variables. By default, it uses Pearson’s method.

    cor(mpg, wt)

However, if the variables are heavily skewed, Spearman’s rank correlation, a non-parametric method, can be used instead.

    cor(mpg, wt, method = "spearman")
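When more than two variables are of interest, `cor()` also accepts a data frame or matrix and returns the full correlation matrix. The following is a sketch under the assumption that `mpg`, `wt`, `hp`, and `disp` are the variables to compare; this selection is arbitrary and only for illustration.

```r
# Select a few numeric variables from mtcars
vars <- mtcars[, c("mpg", "wt", "hp", "disp")]

# Pearson correlation matrix (the default method)
cor_pearson <- cor(vars)
round(cor_pearson, 2)

# Spearman correlation matrix for the same variables
cor_spearman <- cor(vars, method = "spearman")
round(cor_spearman, 2)
```

Each entry of the matrix equals the pairwise result, for example `cor_pearson["mpg", "wt"]` matches `cor(mtcars$mpg, mtcars$wt)`.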

A scatter plot can be drawn using the `plot()` function.

    plot(mpg ~ wt)
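The basic plot can be made more readable with axis labels and a title. The sketch below also overlays a least-squares line, which is an optional addition for illustration rather than something the scatter plot requires; the axis labels follow the units given in the `mtcars` help page, and `fit` is a hypothetical object name.

```r
# Scatter plot of fuel consumption against weight, with explicit labels
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per (US) gallon",
     main = "mpg versus wt in mtcars")

# Optional: add a dashed least-squares line to show the overall trend
fit <- lm(mpg ~ wt, data = mtcars)
abline(fit, lty = 2)
```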

