## Probability Distributions in R

We often make probabilistic statements when working with statistical probability distributions. For a given distribution, we usually want one of four things:

- The density (PDF) at a particular value,
- The cumulative probability (CDF) at a particular value,
- The quantile value corresponding to a particular probability, and
- A random draw of values from a particular distribution.

R has built-in functions for obtaining the density, the distribution function, quantiles, and random values for many distributions.

Consider a random variable $X \sim N(\mu = 2, \sigma^2 = 16)$. R's normal functions take the standard deviation ($\sigma = 4$), not the variance. We want to:

1) Calculate the value of the PDF at $x=3$ (that is, the height of the curve at $x=3$). The three calls below are equivalent:

```r
dnorm(x = 3, mean = 2, sd = sqrt(16))
dnorm(x = 3, mean = 2, sd = 4)
dnorm(x = 3, 2, 4)
```

2) Calculate the value of the CDF at $x=3$ (that is, $P(X\le 3)$)

```r
pnorm(q = 3, mean = 2, sd = 4)
```

3) Calculate the quantile for probability 0.975

```r
qnorm(p = 0.975, mean = 2, sd = 4)
```

4) Generate a random sample of size $n = 10$

```r
rnorm(n = 10, mean = 2, sd = 4)
```
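The four calls above can be cross-checked against each other. The following is a minimal sketch, not part of the original example: the variable names (`mu`, `sigma`, `manual`, `draws`) and the seed value are illustrative assumptions. It checks that `qnorm( )` inverts `pnorm( )`, that `dnorm( )` agrees with the normal density formula, and that fixing a seed makes the random draw reproducible.

```r
mu <- 2
sigma <- 4   # standard deviation, i.e. sqrt(16)

# qnorm() is the inverse of pnorm(): feeding the quantile back recovers 0.975
q <- qnorm(p = 0.975, mean = mu, sd = sigma)
p_back <- pnorm(q = q, mean = mu, sd = sigma)

# dnorm() agrees with the normal density formula evaluated by hand at x = 3
x <- 3
manual <- exp(-(x - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi))
same_density <- all.equal(manual, dnorm(x = x, mean = mu, sd = sigma))

# Setting a seed (123 is an arbitrary choice) makes the draw repeatable
set.seed(123)
draws <- rnorm(n = 10, mean = mu, sd = sigma)
set.seed(123)
draws_again <- rnorm(n = 10, mean = mu, sd = sigma)
identical(draws, draws_again)
```

Without `set.seed( )`, each call to `rnorm( )` returns a different sample.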

Many probability distributions are available in R. Only a few are listed here.

| Distribution | Density | Quantile | CDF | Random |
|---|---|---|---|---|
| Binomial | `dbinom( )` | `qbinom( )` | `pbinom( )` | `rbinom( )` |
| t | `dt( )` | `qt( )` | `pt( )` | `rt( )` |
| Poisson | `dpois( )` | `qpois( )` | `ppois( )` | `rpois( )` |
| F | `df( )` | `qf( )` | `pf( )` | `rf( )` |
| Chi-Square | `dchisq( )` | `qchisq( )` | `pchisq( )` | `rchisq( )` |

Observe that each function name is formed by adding a prefix (`d`, `q`, `p`, or `r`) to the distribution's name in R.
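To illustrate the naming scheme, here is a small sketch using the Poisson distribution. The choice of `lambda = 3` and the variable names are assumptions made only for this illustration.

```r
lambda <- 3

d_val <- dpois(x = 2, lambda = lambda)    # d: P(X = 2)
p_val <- ppois(q = 2, lambda = lambda)    # p: P(X <= 2)
q_val <- qpois(p = 0.5, lambda = lambda)  # q: smallest x with P(X <= x) >= 0.5
r_val <- rpois(n = 5, lambda = lambda)    # r: five random draws

# For a discrete distribution, the CDF is the running sum of the density
all.equal(p_val, sum(dpois(x = 0:2, lambda = lambda)))
```

The same four prefixes work with every distribution name, so `dt( )`, `pt( )`, `qt( )`, and `rt( )` follow the identical pattern for the t distribution.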

| Distribution | Distribution Name in R | Parameters |
|---|---|---|
| Binomial | `binom` | `size` = number of trials, `prob` = probability of success on each trial |
| Geometric | `geom` | `prob` = probability of success on each trial |
| Poisson | `pois` | `lambda` = mean |
| Beta | `beta` | `shape1`, `shape2` |
| Chi-Square | `chisq` | `df` = degrees of freedom |
| F | `f` | `df1`, `df2` = degrees of freedom |
| Logistic | `logis` | `location`, `scale` |
| Normal | `norm` | `mean`, `sd` |
| Student’s t | `t` | `df` = degrees of freedom |
| Weibull | `weibull` | `shape`, `scale` |
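As a hedged illustration of how these parameters are passed, the sketch below calls a few of the listed distributions with named arguments. The numeric values and variable names are arbitrary choices for the example. Note that in R's binomial functions the number of trials is the argument `size` and the success probability is `prob`; in `rbinom( )`, `n` is the number of random values to draw.

```r
# Binomial with 10 trials and success probability 0.5
b_density <- dbinom(x = 3, size = 10, prob = 0.5)   # P(X = 3)  = 120/1024
b_cdf     <- pbinom(q = 3, size = 10, prob = 0.5)   # P(X <= 3) = 176/1024
b_draws   <- rbinom(n = 5, size = 10, prob = 0.5)   # five draws, each between 0 and 10

# Chi-square with 1 degree of freedom: the 95% quantile equals the
# square of the two-sided 5% normal critical value
chi_q <- qchisq(p = 0.95, df = 1)
all.equal(chi_q, qnorm(0.975)^2)

# Student's t with 10 degrees of freedom: upper 2.5% critical value
t_q <- qt(p = 0.975, df = 10)
```

Naming the arguments, as done here, avoids mistakes when distributions take their parameters in different orders.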

**Drawing the Density Function**

The density function `dnorm( )` can be used to draw a graph of the normal distribution; the `d` function of any other distribution can be used in the same way. Let us compare two normal distributions, both with mean 20, one with sd = 6 and the other with sd = 3.

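For this purpose, we need $x$-axis values that cover about three standard deviations on either side of the mean of the wider curve: $\mu \pm 3\sigma = 20 \pm 3\times 6$, that is, from 2 to 38. The code uses the slightly wider range from 0 to 40.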

```r
xaxis <- seq(0, 40, 0.5)
y1 <- dnorm(xaxis, 20, 6)
y2 <- dnorm(xaxis, 20, 3)

# Plot the taller curve (sd = 3) first so the y-axis fits both curves
plot(xaxis, y2, type = "l", main = "comparing two normal distributions", col = "blue")
points(xaxis, y1, type = "l", col = "red")
```
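An alternative way to draw the same comparison is base R's `curve( )`, which evaluates an expression in `x` over a range, so no grid of $x$ values has to be built by hand. This is a sketch of one possible approach, not the only one; the axis label, legend text, and the `peak_ratio` variable are assumptions added for illustration.

```r
# The sd = 3 curve is exactly twice as tall at the mean as the sd = 6 curve,
# which is why it is drawn first: it fixes a y-axis range that fits both.
peak_ratio <- dnorm(20, mean = 20, sd = 3) / dnorm(20, mean = 20, sd = 6)

curve(dnorm(x, mean = 20, sd = 3), from = 0, to = 40,
      col = "blue", ylab = "density",
      main = "comparing two normal distributions")
curve(dnorm(x, mean = 20, sd = 6), add = TRUE, col = "red")

# A legend identifies which curve is which
legend("topright", legend = c("sd = 3", "sd = 6"),
       col = c("blue", "red"), lty = 1)
```

Both curves enclose a total area of 1, so the narrower distribution must be taller.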

**Finding Probabilities**

```r
# Left-tailed probability: P(Z <= 1.96)
pnorm(1.96)

# Area between two Z-scores: P(-1.96 <= Z <= 1.96)
pnorm(1.96) - pnorm(-1.96)
```

**Finding Right-Tailed Probabilities**

```r
1 - pnorm(1.96)
```
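The right tail can also be requested directly through the `lower.tail` argument of `pnorm( )`. The sketch below (variable names are illustrative) shows that the approaches agree for moderate values, and that `lower.tail = FALSE` keeps precision far out in the tail, where subtracting from 1 does not.

```r
# Three equivalent ways to get P(Z > 1.96)
right_a <- 1 - pnorm(1.96)
right_b <- pnorm(1.96, lower.tail = FALSE)
right_c <- pnorm(-1.96)                      # by symmetry of the normal curve
all.equal(right_a, right_b)
all.equal(right_b, right_c)

# Far in the tail, pnorm(10) rounds to exactly 1, so the subtraction gives 0,
# while lower.tail = FALSE still returns a tiny positive probability
far_subtract <- 1 - pnorm(10)
far_direct   <- pnorm(10, lower.tail = FALSE)
```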

**Solving a Real Problem**

Suppose you took a standardized test that has a mean of 500 and a standard deviation of 100, and you scored 720. You are interested in your approximate percentile on this test.

To solve this problem, find the Z-score of 720, which is $(720 - 500)/100 = 2.2$, and then use `pnorm( )` to find the percentile of your score.

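```r
# scale() subtracts the center (500) and divides by the scale (100);
# it returns a 1 x 1 matrix, so the value has to be extracted before use
zscore <- scale(x = 720, center = 500, scale = 100)

pnorm(2.2)             # using the Z-score directly

# Equivalent ways of extracting the Z-score from the matrix
pnorm(zscore[1, 1])
pnorm(zscore[1])
pnorm(zscore[1, ])
```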
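The standardization step can also be skipped, because `pnorm( )` accepts the mean and standard deviation directly. The following sketch uses illustrative variable names (`score`, `mu`, `sigma`) that do not appear in the original example, and adds the reverse question (which score corresponds to a given percentile) as an assumed extension.

```r
score <- 720
mu    <- 500
sigma <- 100

# Manual Z-score, as in the text
z <- (score - mu) / sigma            # 2.2
via_z <- pnorm(z)

# Same probability without standardizing first
direct <- pnorm(score, mean = mu, sd = sigma)
all.equal(via_z, direct)

# Express the result as a percentile
percentile <- round(100 * direct, 1)

# Reverse question: the score at the 90th percentile (roughly 628)
score_90 <- qnorm(0.90, mean = mu, sd = sigma)
```

A score of 720 therefore sits at roughly the 98.6th percentile.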