Probability - R Programming FAQs

Exploring Data Distribution in R

December 3, 2024 / December 1, 2024 by Muhammad Imdad Ullah

Exploring Data Distribution in R Language

Suppose we have univariate data and need to examine its distribution. R offers a variety of tools for exploring univariate data distributions. The simplest way is to look at the numbers: summary() and fivenum() give numerical summaries, while stem() produces a stem-and-leaf display of the values. This post covers the basics of exploring data distribution in the R Language.

Five Number Summary and Stem and Leaf Plot

One can use numeric and visual tools in exploring data distribution. For example, using the eruptions variable of the built-in faithful data set,

attach(faithful)
summary(eruptions)

## Output
 Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.600   2.163   4.000   3.488   4.454   5.100 

fivenum(eruptions)

## Output
 1.6000 2.1585 4.0000 4.4585 5.1000
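The second and fourth values returned by fivenum() differ slightly from the quartiles reported by summary(). This is because fivenum() returns Tukey's hinges, while summary() relies on quantile() with its default interpolation. A minimal sketch of the comparison follows; it reads faithful$eruptions directly instead of using attach(), and the variable names quartiles and hinges are only illustrative.

```r
# Compare the quartiles used by summary() with the hinges used by fivenum()
eruptions <- faithful$eruptions

# Default quantile() interpolation, as used by summary()
quartiles <- quantile(eruptions, probs = c(0.25, 0.75))

# Tukey's hinges: the 2nd and 4th elements of the five-number summary
hinges <- fivenum(eruptions)[c(2, 4)]

print(quartiles)
print(hinges)
```

For large samples the two definitions are usually very close, as they are here.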

stem(eruptions)

Figure: stem-and-leaf display of eruptions

Histogram and Density Plot

The stem-and-leaf display is like a histogram, which can be drawn in R with the hist() function. The boxplot() function can also be used to visualize the distribution of the data.

hist(eruptions)

# make the bins smaller, and make a plot of density
hist(eruptions, seq(1.6, 5.2, 0.2), prob = TRUE)
lines(density(eruptions, bw = 0.1))
rug(eruptions) # show the actual data points

Figure: histogram of eruptions with density curve and rug

More elegant density plots can be made with the density() function, and the line added to the histogram above was produced by it. The bandwidth bw was chosen by trial and error, because the default gives too much smoothing (it usually does for “interesting” densities). Better automated methods of bandwidth selection are also available; in the above example, bw = "SJ" gives good results.
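The bandwidth choices discussed above can be compared side by side. The following is a minimal sketch, assuming the built-in faithful data; the object names d_default, d_manual, and d_sj are illustrative only.

```r
# Compare three bandwidth choices for the density of eruptions
eruptions <- faithful$eruptions

d_default <- density(eruptions)              # default bandwidth rule
d_manual  <- density(eruptions, bw = 0.1)    # trial-and-error value from the text
d_sj      <- density(eruptions, bw = "SJ")   # Sheather-Jones automatic selector

# Bandwidths actually used by each estimate
print(c(default = d_default$bw, manual = d_manual$bw, SJ = d_sj$bw))

# Overlay the three estimates on the histogram
hist(eruptions, seq(1.6, 5.2, 0.2), prob = TRUE)
lines(d_manual, lty = 1)
lines(d_sj, lty = 2)
lines(d_default, lty = 3)
legend("topleft", legend = c("bw = 0.1", "bw = \"SJ\"", "default"),
       lty = 1:3, bty = "n")
```

Printing the bandwidths makes it easy to see how much less smoothing the Sheather-Jones selector applies than the default rule for these data.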

Empirical Cumulative Distribution Function

One can also plot the empirical cumulative distribution function by using the ecdf() function.

plot(ecdf(eruptions), do.points = FALSE, verticals = TRUE)

For the right-hand mode (eruptions of longer than 3 minutes), let us fit a normal distribution and overlay the fitted CDF.

long <- eruptions[eruptions > 3]
plot(ecdf(long), do.points = FALSE, verticals = TRUE)
x <- seq(3, 5.4, 0.01)
lines(x, pnorm(x, mean = mean(long), sd = sd(long)), lty = 3)
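The object returned by ecdf() is itself a function, so the empirical and fitted CDFs can also be compared numerically. The sketch below assumes the same long subset as above, rebuilt from faithful$eruptions so that it runs on its own; Fn, emp_4, fit_4, and max_gap are illustrative names.

```r
# Evaluate the empirical CDF as a function and compare it with the fitted normal CDF
long <- faithful$eruptions[faithful$eruptions > 3]
Fn <- ecdf(long)                 # step function: proportion of values <= its argument

emp_4 <- Fn(4)                                        # empirical P(X <= 4)
fit_4 <- pnorm(4, mean = mean(long), sd = sd(long))   # fitted normal P(X <= 4)
print(c(empirical = emp_4, fitted = fit_4))

# Largest vertical gap between the two curves on the plotting grid
x <- seq(3, 5.4, 0.01)
max_gap <- max(abs(Fn(x) - pnorm(x, mean = mean(long), sd = sd(long))))
print(max_gap)
```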

par(pty = "s")
qqnorm(long)
qqline(long)

The Quantile-Quantile (Q-Q) plot of long shows a reasonable fit but a shorter right tail than one would expect from a normal distribution. One can compare it with some simulated data from a t-distribution.

x <- rt(250, df = 5)
qqnorm(x)
qqline(x)

which will usually show longer tails than expected for a normal distribution, because the data are a random sample from a t distribution with 5 degrees of freedom.

Normality Test in R

Formal tests of normality are also available. The Shapiro-Wilk test, run with the shapiro.test() function, tests the null hypothesis that the data come from a normal distribution.

shapiro.test(eruptions)
## Output
		Shapiro-Wilk normality test

data:  eruptions
W = 0.84592, p-value = 9.036e-16

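The Kolmogorov-Smirnov test, run with the ks.test() function, compares the data with a fully specified distribution. Note that the call below supplies no mean or sd, so the data are compared with the standard normal distribution $N(0, 1)$, which explains the very large D statistic.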

ks.test(eruptions, "pnorm")

## Output
        Asymptotic one-sample Kolmogorov-Smirnov test

data:  eruptions
D = 0.94857, p-value < 2.2e-16
alternative hypothesis: two-sided

Warning message:
In ks.test.default(eruptions, "pnorm") :
  ties should not be present for the one-sample Kolmogorov-Smirnov test
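To compare the data with a normal distribution that has the same mean and standard deviation as the sample, the parameters can be passed on to pnorm through ks.test(). The following is a minimal sketch, not a definitive procedure: because the parameters are estimated from the same data, the reported p-value is only approximate, and the object names ks_std and ks_fit are illustrative.

```r
# Kolmogorov-Smirnov test against N(0, 1) versus a normal with matched mean and sd
eruptions <- faithful$eruptions

# The ties warning is expected here (the durations are rounded), so it is suppressed
ks_std <- suppressWarnings(ks.test(eruptions, "pnorm"))
ks_fit <- suppressWarnings(
  ks.test(eruptions, "pnorm", mean = mean(eruptions), sd = sd(eruptions))
)

print(ks_std$statistic)   # distance from the standard normal CDF
print(ks_fit$statistic)   # distance from the matched normal CDF
print(ks_fit$p.value)
```

The D statistic drops once the mean and sd are matched, yet the p-value should still be small for these bimodal data.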

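Both tests above give very small p-values, so the hypothesis that eruptions follows a normal distribution is rejected, in line with the two modes visible in the histogram. By combining the above techniques, one can gain valuable insights into the distribution of univariate data, identify potential outliers, and assess normality assumptions for further statistical analysis.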


Binomial Random Numbers Generation in R

July 29, 2024 / January 6, 2024 by Muhammad Imdad Ullah

We will learn how to generate Bernoulli or binomial random numbers in R, using the example of flipping a coin. This tutorial deals with generating random numbers according to different statistical probability distributions in R, with a focus on binomial random numbers.

Binomial Random Numbers in R

In a Bernoulli trial, an event either happens or it does not. A coin flip, for example, has two outcomes: head or tail (either a head occurs, or it does not and a tail occurs). For an unbiased coin, heads and tails will each occur about 50% of the time in the long run. To generate binomial random numbers in R, use the rbinom(n, size, prob) command.

rbinom(n, size, prob) # the command has three parameters, namely

where
n is the number of observations,
size is the number of trials (it may be zero or more), and
prob is the probability of success on each trial, for example 1/2.

Examples of Generating Binomial Random Numbers

# One coin is tossed 10 times with probability of success 0.5
# (a fair, unbiased coin, since p = 1/2)
rbinom(n = 10, size = 1, prob = 1/2)
# Output (random, will vary): 1 1 0 0 1 1 1 1 0 1

# Two coins are tossed 10 times with probability of success 0.5
rbinom(n = 10, size = 2, prob = 1/2)
# Output (random, will vary): 2 1 2 1 2 0 1 0 0 1

# One coin is tossed one hundred thousand times with probability of success 0.5
# (write 100000 without a comma, otherwise R reads two separate arguments)
rbinom(n = 100000, size = 1, prob = 1/2)

# Store the simulation results in the vector x (five coins per toss)
n <- 100000
x <- rbinom(n = n, size = 5, prob = 1/2)

# Total number of successes (heads) in x
sum(x)

# Frequency distribution of the number of heads
table(x)

# Relative frequency distribution, in percent
pct <- table(x) / n * 100

# Plot the relative frequencies (estimated probabilities)
plot(table(x) / n, ylab = "Probability", main = "size=5, prob=0.5")
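As a check on the simulation, the relative frequencies can be compared with the exact binomial probabilities from dbinom(). The sketch below assumes a fixed seed (set.seed(123) is an arbitrary choice made here for reproducibility), and the names simulated, theoretical, and comparison are illustrative.

```r
# Compare simulated relative frequencies with exact binomial probabilities
set.seed(123)   # arbitrary seed, only so that the run is reproducible
n <- 100000
x <- rbinom(n = n, size = 5, prob = 1/2)

# factor() with explicit levels keeps outcomes that never occurred as zero counts
simulated   <- as.numeric(table(factor(x, levels = 0:5))) / n
theoretical <- dbinom(0:5, size = 5, prob = 1/2)

comparison <- data.frame(heads = 0:5, simulated, theoretical)
print(comparison)
```

With one hundred thousand draws, the two columns should agree to about two decimal places.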


Probability Distributions in R

April 6, 2025 / December 26, 2020 by Muhammad Imdad Ullah

The article is a discussion about Probability Distributions in R Language. Master probability distributions in R with this comprehensive guide! Learn how to work with Normal, Binomial, Poisson, Exponential, and other key distributions using built-in R functions. Discover practical examples for PDFs, CDFs, random sampling, statistical tests, and data fitting. Perfect for data scientists, statisticians, and R programmers!

Probability distributions are the foundation of statistical analysis and data modeling. Whether you are performing hypothesis testing, simulations, or predictive modeling, understanding how to use probability distributions in R is essential. We often make probabilistic statements when working with statistical Probability Distributions. We want to know four things:

The density (PDF) at a particular value,
The distribution (CDF) at a particular value,
The quantile value corresponding to a particular probability, and
A random draw of values from a particular distribution.

Probability Distributions in R Language

The R language has built-in functions for obtaining the density, the distribution function, the quantiles, and random numbers for many distributions.

Consider a random variable $X$ which is $N(\mu = 2, \sigma^2 = 16)$. We want to:

1) Calculate the value of PDF at $x=3$ (that is, the height of the curve at $x=3$)

dnorm(x = 3, mean = 2, sd = sqrt(16))

# equivalent calls
dnorm(x = 3, mean = 2, sd = 4)
dnorm(x = 3, 2, 4)

2) Calculate the value of the CDF at $x=3$ (that is, $P(X\le 3)$)

pnorm(q = 3, mean = 2, sd = 4)

3) Calculate the quantile for probability 0.975

qnorm(p = 0.975, mean = 2, sd = 4)

4) Generate a random sample of size $n = 10$

rnorm(n = 10, mean = 2, sd = 4)
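The four calls above can be tied together in one short script. This is a minimal sketch for $X \sim N(\mu = 2, \sigma^2 = 16)$; the variable names and the seed are illustrative choices, not part of the original example.

```r
# The d/p/q/r family for X ~ N(mean = 2, sd = 4)
mu <- 2
sigma <- 4   # sd is the square root of the variance 16

dens  <- dnorm(3, mean = mu, sd = sigma)       # height of the PDF at 3, approx 0.0967
prob  <- pnorm(3, mean = mu, sd = sigma)       # P(X <= 3), approx 0.5987
quant <- qnorm(0.975, mean = mu, sd = sigma)   # 97.5% quantile, approx 9.84

# qnorm() is the inverse of pnorm(), so this recovers 3
back <- qnorm(prob, mean = mu, sd = sigma)

set.seed(1)   # arbitrary seed for a reproducible sample
draws <- rnorm(10, mean = mu, sd = sigma)

print(c(dens = dens, prob = prob, quant = quant, back = back))
print(draws)
```

Note that all four functions take the standard deviation, not the variance, as the sd argument.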

There are many probability distributions available in the R Language. The following are some commonly used ones, with their density, quantile, distribution, and random-number functions.

Distribution	Density	Quantile	Distribution (CDF)	Random
Binomial	`dbinom()`	`qbinom()`	`pbinom()`	`rbinom()`
t	`dt()`	`qt()`	`pt()`	`rt()`
Poisson	`dpois()`	`qpois()`	`ppois()`	`rpois()`
F	`df()`	`qf()`	`pf()`	`rf()`
Chi-Square	`dchisq()`	`qchisq()`	`pchisq()`	`rchisq()`

Observe that a prefix (d, q, p, or r) is added to the name of each distribution. The following table lists the distribution names used in R and their parameters.

Distribution	Distribution Name in R	Parameters
Binomial	`binom`	size = number of trials, prob = probability of success for one trial
Geometric	`geom`	prob = probability of success for one trial
Poisson	`pois`	lambda = mean
Beta	`beta`	shape1, shape2
Chi-Square	`chisq`	df = degrees of freedom
F	`f`	df1, df2 = degrees of freedom
Logistic	`logis`	location, scale
Normal	`norm`	mean, sd
Student’s t	`t`	df = degrees of freedom
Weibull	`weibull`	shape, scale
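Combining a prefix with a distribution name from the table gives the function to call. A few illustrative calls follow; the particular values and variable names are chosen only for demonstration.

```r
# prefix + distribution name = function name

# d + binom: P(exactly 2 successes in 5 trials with prob 0.5) = 10/32 = 0.3125
p_two_heads <- dbinom(2, size = 5, prob = 0.5)

# p + pois: P(X <= 2) for a Poisson variable with mean 3, approx 0.4232
p_at_most_2 <- ppois(2, lambda = 3)

# q + chisq: 95% quantile of the chi-square distribution with 1 df, approx 3.84
crit_value <- qchisq(0.95, df = 1)

# r + geom: five random draws (number of failures before the first success)
set.seed(42)   # arbitrary seed for reproducibility
geom_draws <- rgeom(5, prob = 0.2)

print(c(p_two_heads, p_at_most_2, crit_value))
print(geom_draws)
```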

Visualizing Density Function in R

A density function in R, for example dnorm(), can be used to draw a graph of the normal (or any other) distribution. Let us compare two normal distributions, both with mean = 20, one with sd = 6 and the other with sd = 3.

For this purpose, we need $x$-axis values covering $\overline{x} \pm 3SD \Rightarrow 20 \pm 3\times 6$, that is, from 2 to 38.

xaxis <- seq(0, 40, 0.5)
y1 <- dnorm(xaxis, 20, 6)
y2 <- dnorm(xaxis, 20, 3)

plot(xaxis, y2, type = "l", main = "comparing two normal distributions", col = "blue")

lines(xaxis, y1, col = "red")

Figure: comparing two normal probability distributions in R

Finding Probabilities in R

Probabilities for the normal distribution can be computed in R using the pnorm() function.

#Left Tailed Probability
pnorm(1.96)

#Area between two Z-scores
pnorm(1.96) - pnorm(-1.96)

Finding Right-Tailed Probabilities

1 - pnorm(1.96)
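The right-tailed probability can also be requested directly through the lower.tail argument of pnorm(), which avoids the subtraction. A short sketch (the variable names are illustrative):

```r
# Two equivalent ways to get the right-tailed probability P(Z > 1.96)
right_by_subtraction <- 1 - pnorm(1.96)
right_direct <- pnorm(1.96, lower.tail = FALSE)   # approx 0.025

print(c(right_by_subtraction, right_direct))
```

For very large Z-scores, the lower.tail = FALSE form is generally preferable because it avoids subtracting a number very close to 1.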

Solving Real Problems

Suppose you took a standardized test that has a mean of 500 and a standard deviation of 100. You got 720 marks (score). You are interested in the approximate percentile on this test.

To solve this problem, find the Z-score of 720, which is $(720 - 500)/100 = 2.2$, and then use pnorm() to find the percentile of your score.

zscore <- scale(x = 720, center = 500, scale = 100)  # returns a 1 x 1 matrix

# each of the following gives the same percentile
pnorm(2.2)
pnorm(zscore[1,1])
pnorm(zscore[1])
pnorm(zscore[1, ])
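The standardization step can also be skipped by passing the mean and standard deviation to pnorm() directly. A minimal sketch of both routes, with illustrative variable names:

```r
# Percentile of a score of 720 on a test with mean 500 and sd 100
p_z      <- pnorm((720 - 500) / 100)            # via the Z-score 2.2
p_direct <- pnorm(720, mean = 500, sd = 100)    # same result without standardizing

print(round(100 * p_direct, 1))   # about 98.6, i.e. roughly the 99th percentile
```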

Who Is This For?

✅ Data Scientists – Enhance statistical modeling
✅ Statisticians – Apply distributions in hypothesis testing
✅ R Programmers – Master distribution functions for simulations
✅ Students & Researchers – Learn with hands-on examples
