This article discusses probability distributions in the R language.
When working with a probability distribution, we usually want to know one of four things:
- The density (PDF) at a particular value,
- The distribution (CDF) at a particular value,
- The quantile value corresponding to a particular probability, and
- A random draw of values from a particular distribution.
Probability Distributions in R Language
The R language has built-in functions for computing the density, the distribution function, and quantiles, and for generating random numbers, for many probability distributions.
Consider a random variable $X$ which is $N(\mu = 2, \sigma^2 = 16)$. We want to:
1) Calculate the value of PDF at $x=3$ (that is, the height of the curve at $x=3$)
# The three calls below are equivalent
dnorm(x = 3, mean = 2, sd = sqrt(16))
dnorm(x = 3, mean = 2, sd = 4)
dnorm(x = 3, 2, 4)
2) Calculate the value of the CDF at $x=3$ (that is, $P(X\le 3)$)
pnorm(q = 3, mean = 2, sd = 4)
3) Calculate the quantile for probability 0.975
qnorm(p = 0.975, mean = 2, sd = 4)
4) Generate a random sample of size $n = 10$
rnorm(n = 10, mean = 2, sd = 4)
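The four calls above are related to one another, and a short check can make that concrete. The sketch below is only an illustration: the variable names (`mu`, `sigma`, `p`, `x_back`, `area`) and the seed value are assumptions chosen for this example, not part of the original text.

```r
# X ~ N(mean = 2, variance = 16), so the standard deviation is 4
mu <- 2
sigma <- sqrt(16)

# qnorm() inverts pnorm(): the quantile of P(X <= 3) is 3 again
p <- pnorm(q = 3, mean = mu, sd = sigma)
x_back <- qnorm(p = p, mean = mu, sd = sigma)
all.equal(x_back, 3)

# The CDF is the area under the density: integrating dnorm() up to 3
# gives (numerically) the same value as pnorm()
area <- integrate(dnorm, lower = -Inf, upper = 3, mean = mu, sd = sigma)$value
abs(area - p) < 1e-4

# set.seed() makes the random draw reproducible (the seed value is arbitrary)
set.seed(123)
x <- rnorm(n = 10, mean = mu, sd = sigma)
length(x)
```

Without `set.seed()`, each call to `rnorm()` returns a different sample.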
There are many probability distributions available in the R Language. I will list only a few.
Distribution | Density | Quantile | Distribution (CDF) | Random |
Binomial | dbinom( ) | qbinom( ) | pbinom( ) | rbinom( ) |
t | dt( ) | qt( ) | pt( ) | rt( ) |
Poisson | dpois( ) | qpois( ) | ppois( ) | rpois( ) |
F | df( ) | qf( ) | pf( ) | rf( ) |
Chi-Square | dchisq( ) | qchisq( ) | pchisq( ) | rchisq( ) |
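Observe that a prefix is added to the R name of each distribution: d for the density, q for the quantile, p for the distribution function (CDF), and r for random numbers.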
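As a small illustration of the prefix convention, the sketch below applies all four prefixes to one distribution. The choice of a binomial with 10 trials and success probability 0.5, and the variable names, are assumptions made for this example.

```r
# Binomial distribution with 10 trials and success probability 0.5
d_val <- dbinom(x = 3, size = 10, prob = 0.5)    # P(X = 3)
p_val <- pbinom(q = 3, size = 10, prob = 0.5)    # P(X <= 3)
q_val <- qbinom(p = 0.5, size = 10, prob = 0.5)  # smallest x with P(X <= x) >= 0.5
r_val <- rbinom(n = 5, size = 10, prob = 0.5)    # five random draws

d_val  # 0.1171875
p_val  # 0.171875
q_val  # 5
```

Note that in `rbinom()` the argument `n` is the number of random values to draw, while `size` is the number of trials.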
Distribution | Distribution Name in R | Parameters |
Binomial | binom | size = number of trials, prob = probability of success on each trial |
Geometric | geom | prob = probability of success on each trial |
Poisson | pois | lambda = mean |
Beta | beta | shape1, shape2 |
Chi-Square | chisq | df=degrees of freedom |
F | f | df1, df2 degrees of freedom |
Logistic | logis | location, scale |
Normal | norm | mean, sd |
Student’s t | t | df=degrees of freedom |
Weibull | weibull | shape, scale |
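The parameter names in the table are passed as named arguments to the corresponding functions. The sketch below shows a few such calls; the specific parameter values and variable names are arbitrary choices made for illustration.

```r
# Poisson density at x = 2 with mean lambda = 3
pois_d <- dpois(x = 2, lambda = 3)

# Chi-square CDF with 1 degree of freedom, evaluated at 1.96^2
chisq_p <- pchisq(q = 1.96^2, df = 1)

# A chi-square variable with 1 df is a squared standard normal, so this
# equals the area between -1.96 and 1.96 under the standard normal curve
norm_area <- pnorm(1.96) - pnorm(-1.96)
all.equal(chisq_p, norm_area)

# Quantile of Student's t with 10 degrees of freedom
t_q <- qt(p = 0.975, df = 10)

# Three random draws from a Weibull distribution
weib_r <- rweibull(n = 3, shape = 2, scale = 1)
```

Naming the arguments explicitly avoids mistakes when a function has several parameters, such as `df1` and `df2` for the F distribution.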
Drawing the Density Function
The density function dnorm() can be used to draw a graph of the normal distribution; the d-function of any other distribution can be used in the same way. Let us compare two normal distributions, both with mean = 20, one with sd = 6 and the other with sd = 3.
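For this purpose, we need $x$-axis values that cover about three standard deviations on either side of the mean for the wider curve: $\mu \pm 3\sigma \Rightarrow 20 \pm 3\times 6$, that is, from 2 to 38. The code below uses the slightly wider range from 0 to 40.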
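# x-axis values from 0 to 40 in steps of 0.5
xaxis <- seq(0, 40, 0.5)
# Densities of the two normal distributions at each x value
y1 <- dnorm(xaxis, 20, 6)
y2 <- dnorm(xaxis, 20, 3)
# Draw the narrower curve first so that the y-axis is tall enough for both
plot(xaxis, y2, type = "l", main = "comparing two normal distributions", col = "blue")
points(xaxis, y1, type = "l", col = "red")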
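The x-axis range could also be derived from the three-standard-deviation rule instead of being typed by hand. The following is a minimal sketch under that assumption; the names `mu`, `sds`, and `x_limits` are introduced here for illustration only.

```r
mu <- 20
sds <- c(6, 3)

# Mean plus or minus three times the larger standard deviation
x_limits <- mu + c(-3, 3) * max(sds)
x_limits  # 2 38

# Grid of x values and the two densities evaluated on it
xaxis <- seq(x_limits[1], x_limits[2], by = 0.5)
y1 <- dnorm(xaxis, mean = mu, sd = sds[1])
y2 <- dnorm(xaxis, mean = mu, sd = sds[2])

# The distribution with the smaller sd has the taller peak,
# which is why it is convenient to plot it first
max(y2) > max(y1)
```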
Finding Probabilities in R
Probabilities for the normal distribution can be computed in R with the pnorm() function.
# Left-tailed probability
pnorm(1.96)
# Area between two Z-scores
pnorm(1.96) - pnorm(-1.96)
Finding Right-Tailed Probabilities
1 - pnorm(1.96)
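The same right-tailed probability can be obtained in two other ways, sketched below: with the `lower.tail` argument of pnorm(), or from the symmetry of the standard normal distribution. The variable names are illustrative.

```r
# Right-tailed probability computed three equivalent ways
right_a <- 1 - pnorm(1.96)
right_b <- pnorm(1.96, lower.tail = FALSE)  # upper tail directly
right_c <- pnorm(-1.96)                     # symmetry of the normal curve

all.equal(right_a, right_b)
all.equal(right_a, right_c)
```

For values far in the tail, `lower.tail = FALSE` tends to be more accurate than subtracting from 1, because it avoids rounding a probability that is very close to 1.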
Solving a Real Problem
Suppose you took a standardized test that has a mean of 500 and a standard deviation of 100, and you scored 720. You are interested in the approximate percentile of your score on this test.
To solve this problem, treating the test scores as approximately normally distributed, find the Z-score of 720 and then use pnorm( ) to find the percentile of your score.
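# scale() returns a 1 x 1 matrix holding the Z-score (720 - 500) / 100 = 2.2
zscore <- scale(x = 720, 500, 100)
# All of the calls below give the same probability
pnorm(2.2)
pnorm(zscore[1,1])
pnorm(zscore[1])
pnorm(zscore[1, ])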
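As an alternative sketch, the Z-score step can be skipped by passing the mean and standard deviation of the test directly to pnorm(). The variable names below are illustrative, and the calculation still assumes the scores are approximately normal.

```r
score <- 720

# Route 1: standardize by hand, then use the standard normal CDF
z <- (score - 500) / 100
p_z <- pnorm(z)

# Route 2: let pnorm() standardize internally
p_direct <- pnorm(score, mean = 500, sd = 100)

all.equal(p_z, p_direct)

# Percentile, rounded to one decimal place
round(100 * p_direct, 1)  # 98.6
```

Under these assumptions, a score of 720 sits at roughly the 98.6th percentile.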