Probability Distributions in R

This article discusses probability distributions in the R language.

We often make probabilistic statements when working with statistical probability distributions. For a given distribution, we usually want to know four things:

The density (PDF) at a particular value,
The distribution (CDF) at a particular value,
The quantile value corresponding to a particular probability, and
A random draw of values from a particular distribution.

Probability Distributions in R Language

R provides functions for computing the density, the cumulative distribution, and the quantiles of many distributions, and for generating random numbers from them.

Consider a random variable $X$ which is $N(\mu = 2, \sigma^2 = 16)$. We want to:

1) Calculate the value of PDF at $x=3$ (that is, the height of the curve at $x=3$)

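# sd is the standard deviation, so the variance of 16 enters as sqrt(16) = 4
dnorm(x = 3, mean = 2, sd = sqrt(16))

# Equivalent calls: sd given directly, then mean and sd matched by position
dnorm(x = 3, mean = 2, sd = 4)
dnorm(x = 3, 2, 4)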

2) Calculate the value of the CDF at $x=3$ (that is, $P(X\le 3)$)

pnorm(q = 3, mean = 2, sd = 4)

3) Calculate the quantile for probability 0.975

qnorm(p = 0.975, mean = 2, sd = 4)

4) Generate a random sample of size $n = 10$

rnorm(n = 10, mean = 2, sd = 4)
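The four functions above are connected: qnorm() undoes pnorm(), and the draws from rnorm() can be made repeatable by fixing the random seed first. The following is a minimal sketch of both points, assuming the same $N(\mu = 2, \sigma^2 = 16)$ variable; the variable names and the seed value 123 are chosen only for illustration.

```r
# Illustrative values for the N(mean = 2, sd = 4) variable used in this section
mu <- 2
sigma <- 4

# pnorm() maps a value to a cumulative probability ...
p <- pnorm(q = 3, mean = mu, sd = sigma)

# ... and qnorm() maps that probability back to the original value
x_back <- qnorm(p = p, mean = mu, sd = sigma)
print(x_back)

# Setting the same seed before each call reproduces the same random sample
set.seed(123)  # arbitrary seed, an assumption for this example
draw1 <- rnorm(n = 10, mean = mu, sd = sigma)
set.seed(123)
draw2 <- rnorm(n = 10, mean = mu, sd = sigma)
print(identical(draw1, draw2))
```

Without a call to set.seed(), each run of rnorm() returns a different sample.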

There are many probability distributions available in the R Language. I will list only a few.

Distribution	Density	Quantile	Distribution (CDF)	Random numbers
Binomial	`dbinom( )`	`qbinom( )`	`pbinom( )`	`rbinom( )`
t	`dt( )`	`qt( )`	`pt( )`	`rt( )`
Poisson	`dpois( )`	`qpois( )`	`ppois( )`	`rpois( )`
F	`df( )`	`qf( )`	`pf( )`	`rf( )`
Chi-Square	`dchisq( )`	`qchisq( )`	`pchisq( )`	`rchisq( )`

Observe that a prefix (d, q, p, or r) is added to the name of each distribution in R.

Distribution	Distribution Name in R	Parameters
Binomial	`binom`	size = number of trials, prob = probability of success for one trial
Geometric	`geom`	prob = probability of success for one trial
Poisson	`pois`	lambda = mean
Beta	`beta`	shape1, shape2
Chi-Square	`chisq`	df = degrees of freedom
F	`f`	df1, df2 = degrees of freedom
Logistic	`logis`	location, scale
Normal	`norm`	mean, sd
Student’s t	`t`	df = degrees of freedom
Weibull	`weibull`	shape, scale
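The same prefix-plus-name pattern applies to the other distributions in the table. As a short sketch, the calls below combine each prefix with a different distribution name; the argument values are made up for illustration, and in R the binomial parameters are passed as `size` and `prob`.

```r
# Binomial density: probability of exactly 3 successes in 10 trials, p = 0.5
b <- dbinom(x = 3, size = 10, prob = 0.5)
print(b)

# Poisson CDF: probability of at most 2 events when the mean is 3
pp <- ppois(q = 2, lambda = 3)
print(pp)

# Chi-square quantile: value with 95% of the distribution below it, df = 1
qc <- qchisq(p = 0.95, df = 1)
print(qc)

# Weibull random numbers: 5 draws with shape = 2 and scale = 1
w <- rweibull(n = 5, shape = 2, scale = 1)
print(w)
```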

Drawing the Density Function

The density function dnorm() can be used to draw the graph of a normal distribution; the density function of any other distribution works the same way. Let us compare two normal distributions, both with mean = 20, one with sd = 6 and the other with sd = 3.

For this purpose, we need $x$-axis values that cover roughly $\overline{x} \pm 3SD \Rightarrow 20 \pm 3\times 6$, that is, the range from 2 to 38.

xaxis <- seq(0, 40, 0.5)
y1 <- dnorm(xaxis, 20, 6)
y2 <- dnorm(xaxis, 20, 3)

plot(xaxis, y2, type = "l", main = "comparing two normal distributions", col = "blue")

points(xaxis, y1, type="l", col = "red")

Comparing Normal Probability Distributions in R
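Note that the narrower curve (sd = 3) is plotted first: it has the taller peak, so it sets a $y$-axis range that fits both curves. As an alternative sketch, the same comparison can be drawn with curve(), which evaluates the density over a range without building the $x$ values by hand; the legend labels and position are choices made for this example.

```r
# Peak heights at the mean: the sd = 3 curve is twice as tall as the sd = 6 curve
peak_narrow <- dnorm(20, mean = 20, sd = 3)
peak_wide <- dnorm(20, mean = 20, sd = 6)

# Draw the taller curve first so that the y-axis covers both densities
curve(dnorm(x, mean = 20, sd = 3), from = 0, to = 40,
      col = "blue", ylab = "density",
      main = "Comparing two normal distributions")

# add = TRUE overlays the second curve on the existing plot
curve(dnorm(x, mean = 20, sd = 6), from = 0, to = 40,
      col = "red", add = TRUE)

# Label the two curves
legend("topright", legend = c("sd = 3", "sd = 6"),
       col = c("blue", "red"), lty = 1)
```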

Finding Probabilities in R

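For the normal distribution, probabilities can be computed in R using the pnorm() function. With no mean or sd supplied, pnorm() uses the standard normal distribution (mean = 0, sd = 1).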

#Left Tailed Probability
pnorm(1.96)

#Area between two Z-scores
pnorm(1.96) - pnorm(-1.96)

Finding Right-Tailed Probabilities

1 - pnorm(1.96)
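The right-tailed probability can also be requested directly through the `lower.tail` argument of pnorm(), which avoids the subtraction. A brief sketch, with variable names chosen for illustration:

```r
# Right tail by subtraction, as above
right_a <- 1 - pnorm(1.96)

# Right tail requested directly from pnorm()
right_b <- pnorm(1.96, lower.tail = FALSE)
print(c(right_a, right_b))

# By symmetry of the normal curve, both tails together equal twice the left tail
two_tail <- 2 * pnorm(-1.96)
print(two_tail)
```

The two right-tail results agree, and the two-tailed value is the complement of the area between the Z-scores -1.96 and 1.96.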

Solving a Real Problem

Suppose you took a standardized test whose scores have a mean of 500 and a standard deviation of 100, and you scored 720. You are interested in the approximate percentile of your score on this test.

To solve this problem, find the Z-score of 720 and then use pnorm() to find the percentile of your score.

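# scale() returns a 1 x 1 matrix holding (720 - 500) / 100 = 2.2
zscore <- scale(x = 720, center = 500, scale = 100)

# Equivalent ways of passing the Z-score to pnorm()
pnorm(2.2)
pnorm(zscore[1,1])
pnorm(zscore[1])
pnorm(zscore[1, ])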
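The Z-score step can also be skipped by passing the mean and standard deviation to pnorm() directly. The following sketch uses the figures from the problem above; the variable names are chosen for illustration.

```r
# Figures from the test-score problem
score <- 720
test_mean <- 500
test_sd <- 100

# Z-score by hand: (720 - 500) / 100
z <- (score - test_mean) / test_sd
p_from_z <- pnorm(z)

# Same probability without standardizing first
p_direct <- pnorm(score, mean = test_mean, sd = test_sd)

print(z)
print(round(p_direct, 3))
```

Both approaches give a probability of about 0.986, so a score of 720 sits at roughly the 98.6th percentile.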
