Muhammad Imdad Ullah, Author at R Programming FAQs

Simple Random Sampling in R: Explained Easy

August 8, 2024 by Muhammad Imdad Ullah

Introduction to Simple Random Sampling in R

Simple random sampling (SRS) is the most basic method of taking a probability sample. A sample of $n$ units is selected from a population of $N$ units in such a way that each of the $\binom{N}{n}$ possible samples has the same chance of being selected. The choice of the specific sample can be made using a random number generator on a computer. In this post, we will learn about simple random sampling in R, that is, how to select the elements of a sample using simple random sampling.
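To make the $\binom{N}{n}$ count concrete, the following is a minimal sketch using a hypothetical toy population of five labelled units (the labels and sizes are assumptions chosen only for illustration). It counts the possible samples with choose() and lists them with combn().

```r
# Hypothetical toy population of N = 5 labelled units
population <- c("A", "B", "C", "D", "E")
N <- length(population)
n <- 2

# Number of distinct samples of size n
n_samples <- choose(N, n)

# Enumerate every possible sample: one column per sample
all_samples <- combn(population, n)

# Under SRS each possible sample has the same selection probability
prob_each <- 1 / n_samples

print(n_samples)
print(prob_each)
```

For realistic values of $N$ and $n$ the number of possible samples is far too large to enumerate, which is why the sample is drawn with a random number generator instead.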

The following commands will generate random permutations of $n$ integers or random samples from a population of numbers.

Random permutation of integers $1$ to $n$

The call sample(n) returns a random permutation of the integers 1 to n.

sample(10)

## Output
[1]  5  8  9  4  3  2  1  6 10  7
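Because the draw is random, the output shown here will differ from run to run. A minimal sketch of how to make a draw repeatable with set.seed() follows; the seed value 123 is an arbitrary choice for illustration.

```r
set.seed(123)          # 123 is an arbitrary seed chosen for illustration
first <- sample(10)

set.seed(123)          # resetting the same seed replays the same draw
second <- sample(10)

identical(first, second)   # TRUE
sort(first)                # a permutation contains each of 1..10 exactly once
```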

Random permutation of elements in a vector $x$

A random permutation of the elements of a vector x can be obtained using sample(x).

x <- c(20, 25, 19, -15, 4, 21, -1, 0, 23)
sample(x)

## Output
[1]  21  25   0   4  20 -15  -1  19  23

Random Sample of $n$ items from $x$ without replacement

A random selection of $n$ elements from a vector $x$ without replacement can be made using sample(x, n).

x <- c(20, 25, 19, -15, 4, 21, -1, 0, 23)
sample(x, 5)

## Output
[1] -1 19 21 23  0
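One documented behaviour of sample() is worth keeping in mind: when x is a single number of at least 1, sample(x, ...) draws from 1:x rather than from x itself. The sketch below shows one way to guard against this by indexing with sample.int(); the helper name safe_sample is hypothetical and not part of base R.

```r
x <- 10   # a vector that happens to have only one element

# sample(x, 1) would draw from 1:10 here, not from the single value 10.
# Indexing through sample.int() always draws from the elements of x.
safe_sample <- function(x, size, replace = FALSE) {
  x[sample.int(length(x), size, replace = replace)]
}

safe_sample(x, 1)                        # always 10
safe_sample(c(20, 25, 19, -15, 4), 3)    # three distinct elements of the vector
```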

Random sample of $n$ items from $x$ with replacement

A random sample of $n$ items from a vector $x$ can be selected with replacement using sample(x, n, replace = TRUE). With replacement, the same element may be selected more than once.

x <- c(20, 25, 19, -15, 4, 21, -1, 0, 23)
sample(x, 5, replace = TRUE)

## Output
[1]  0 -1  4 19 -1

Random Sample with Probabilities

A random sample of $n$ items can be drawn from $x$ with the elements of $x$ having different probabilities of selection. A vector giving one probability for each element of $x$ is required. The probabilities should sum to one; sample() rescales weights that do not, provided they are non-negative and not all zero.

x <- c(23, 45, 69, -1, .9, 4, 25, 19)
p <- c(.1, .1, 0, 0, .2, .3, .1, .2)
sum(p)   # check that the probabilities sum to one

sample(x, 5, replace = TRUE, prob = p)

## Output
[1]  4 19 19 19 45
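As a rough check of the probability vector, one can draw a large sample and compare the observed relative frequencies with p. This is a sketch only: the seed and the number of draws (10000) are arbitrary assumptions, and the observed frequencies will match p only approximately.

```r
set.seed(2024)   # arbitrary seed so the check is repeatable

x <- c(23, 45, 69, -1, .9, 4, 25, 19)
p <- c(.1, .1, 0, 0, .2, .3, .1, .2)

draws <- sample(x, 10000, replace = TRUE, prob = p)

# Observed relative frequency of each value, in the order of x
observed <- as.numeric(table(factor(draws, levels = x))) / length(draws)
round(cbind(value = x, target = p, observed = observed), 3)

# Weights that do not sum to one are rescaled by sample()
w <- 10 * p
draws_w <- sample(x, 5, replace = TRUE, prob = w)
```

Elements with probability zero (69 and -1 here) never appear in the sample.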

Random Selection of Integers without Replacement

A random selection of $n$ integers from the integers 1 to $N$, without replacement, can be done using sample(N, n).

sample(1000, 10)

## Output
[1] 138 147 911 523 586 163 915 966 951 245

From the sample, one can estimate the population mean $\mu$ and the variance of the estimator $\hat{\mu}$.

Let $y_1, y_2, \cdots, y_n$ be the measurements obtained from the simple random sampling of $n$ units from the population. The estimator of population mean $\mu$ is

$$\hat{\mu} = \frac{1}{n} \sum\limits_{i=1}^n y_i$$

with estimated variance of $\hat{\mu}$ given by

$$\widehat{\text{var}}(\hat{\mu}) = \frac{s^2}{n} \left( \frac{N-n}{N}\right)$$

where $s^2 = \frac{1}{n-1} \sum\limits_{i=1}^n (y_i - \overline{y})^2$ and $\overline{y} = \hat{\mu}$ is the sample mean.
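The two formulas above can be sketched in R as follows. The population here is simulated and purely hypothetical (its size, mean, and standard deviation are assumptions for illustration), and the variable names are not from the original post.

```r
set.seed(42)   # arbitrary seed for a repeatable illustration

# Hypothetical population of N = 1000 measurements (simulated, not real data)
N <- 1000
population <- rnorm(N, mean = 50, sd = 10)

# Simple random sample of n = 50 units, drawn without replacement
n <- 50
y <- sample(population, n)

mu_hat  <- mean(y)                     # estimator of the population mean
s2      <- var(y)                      # sample variance with divisor n - 1
var_hat <- (s2 / n) * ((N - n) / N)    # estimated variance with finite population correction
se_hat  <- sqrt(var_hat)               # estimated standard error of mu_hat

c(mu_hat = mu_hat, var_hat = var_hat, se_hat = se_hat)
```

The factor $(N-n)/N$ is the finite population correction; it shrinks the estimated variance as the sample covers a larger share of the population.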


Statistical Power Analysis in R: A Comprehensive Guide

August 7, 2024 by Muhammad Imdad Ullah

Introduction to Power Analysis

This post is about statistical power analysis in R. First, what does power mean in statistics? Power is the probability ($1-\beta$) of detecting an effect given that the effect really exists; that is, the probability of correctly rejecting the null hypothesis when it is false.

Consider a simple study comparing drug A with a placebo, and suppose the drug is truly effective. The power is the probability that the study finds a statistically significant difference between the two groups (drug A and placebo). Suppose the power is $1-\beta=0.8$. If the study were repeated many times, about 80% of the repetitions would show a statistically significant difference between the drug A and placebo groups, and about 20% would not. The probability of a Type-II error is therefore $\beta=0.2$.
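The idea of 80% power can be sketched with power.t.test() from the stats package, followed by a small simulation of repeated studies. The design values (a true mean difference of 0.5, a common standard deviation of 1, and 2000 simulated studies) are hypothetical and chosen only for illustration.

```r
# Hypothetical design values: true mean difference 0.5, common SD 1
res_n <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80,
                      type = "two.sample", alternative = "two.sided")
n_per_group <- ceiling(res_n$n)

# Power actually achieved with the rounded-up group size
res_p <- power.t.test(n = n_per_group, delta = 0.5, sd = 1, sig.level = 0.05)
beta  <- 1 - res_p$power

# Simulate many repetitions of the study and count significant results
set.seed(1)   # arbitrary seed
significant <- replicate(2000, {
  drug    <- rnorm(n_per_group, mean = 0.5, sd = 1)
  placebo <- rnorm(n_per_group, mean = 0,   sd = 1)
  t.test(drug, placebo, var.equal = TRUE)$p.value < 0.05
})
sim_power <- mean(significant)

c(n_per_group = n_per_group, power = res_p$power, beta = beta, sim_power = sim_power)
```

The simulated proportion of significant results should be close to the analytic power, with some Monte Carlo error.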

One-Sample Power

The following plot is for a one-sample, one-tailed (greater-than) t-test. In the graph, when the null hypothesis $H_0:\mu = \mu_0$ is true, the test statistic $t$ follows the null distribution indicated by the hatched area. Under the specific alternative hypothesis $H_1:\mu = \mu_1$, the test statistic $t$ follows the distribution shown by the solid area.

$\alpha$ is the probability of making a type-I error (that is, rejecting $H_0$ when it is true), and "crit. val." is the location of the $t_{crit}$ value associated with $H_0$ on the scale of the data. The rejection region is the area under $H_0$ that lies at least as far from $\mu_0$ as "crit. val." does.

The test’s power ($1-\beta$) is the green area, the area under $H_1$ in the rejection region. A type-II error is made when $H_1$ is true but we fail to reject $H_0$; its probability $\beta$ is the red area.

Type-II Error and Power Analysis in R

# One-sample power

x <- seq(-4, 4, length = 1000)
hx <- dnorm(x, mean = 0, sd = 1)

plot(x, hx, type = "n", xlim = c(-4, 8), ylim = c(0, 0.5),
     main = expression (paste("Type-II Error (", beta, ") and Power (", 1 - beta, ")")), 
     axes = FALSE)

# One-tailed shift: place the alternative mean at 1.7 times the critical value
shift <- qnorm(1 - 0.05, mean = 0, sd = 1) * 1.7
xfit2 <- x + shift
yfit2 <- dnorm(xfit2, mean = shift, sd = 1)

axis(1, at = c(-qnorm(0.05), 0, shift), labels = expression("crit. val.", mu[0], mu[1]))
axis(1, at = c(-4, 4 + shift), labels = expression(-infinity, infinity),
     lwd = 1, lwd.ticks = 0)

# Alternative hypothesis area
# The red region is the Type-II error (beta): H1 is true but H0 is not rejected

lb <- min(xfit2)               # lower bound
ub <- round(qnorm(0.95), 2)    # upper bound
col1 = "#CC2222"

i <- xfit2 >= lb & xfit2 <= ub
polygon(c(lb, xfit2[i], ub), c(0, yfit2[i],0), col = col1)

# The green area where the power is
col2 = "#22CC22"
i <- xfit2 >= ub
polygon(c(ub, xfit2[i], max(xfit2)), c(0, yfit2[i], 0), col = col2)

# Outline the alternative hypothesis
lines(xfit2, yfit2, lwd = 2)

# Print null hypothesis area
col_null = "#AAAAAA"
polygon (c(min(x), x, max(x)), c(0, hx, 0), col = col_null,
         lwd = 2, density = c(10, 40), angle = -45, border = 0)

lines(x, hx, lwd = 2, lty = "dashed", col=col_null)

axis(1, at = c(ub, max(xfit2)), labels = c("", expression(infinity)), col = col2,
     lwd = 1, lwd.ticks = 0)

# Legend
legend("topright", inset = 0.015, title = "Color", 
       c("Null Hypothesis", "Type-II error", "Power"), fill = c(col_null, col1, col2), 
       angle = -45, density = c(20, 1000, 1000), horiz = FALSE)

abline(v=ub, lwd=2, col="#000088", lty = "dashed")
arrows(ub, 0.45, ub+1, 0.45, lwd=3, col="#008800")
arrows(ub, 0.45, ub-1, 0.45, lwd=3, col="#880000")
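The plotting code draws standard normal curves (dnorm) as a stand-in for the t distributions, so the red and green areas can be computed directly with pnorm(). The following sketch uses the same significance level (0.05) and the same shift factor (1.7) as the plot; the variable names alpha, crit, beta, and power are introduced here for illustration.

```r
alpha <- 0.05
crit  <- qnorm(1 - alpha)    # critical value under H0 (standard normal)
shift <- 1.7 * crit          # mean of the alternative curve used in the plot

beta  <- pnorm(crit, mean = shift, sd = 1)   # red area: fail to reject H0 under H1
power <- 1 - beta                            # green area: reject H0 under H1

round(c(crit = crit, beta = beta, power = power), 3)
```

The plot rounds the critical value to two decimals before shading, so the shaded areas agree with these values only up to that rounding.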


Important Python Quiz with Answers 3

August 2, 2024 by Muhammad Imdad Ullah

The post is about the Python Quiz with answers. There are 20 Multiple-Choice Questions about Python. The topics covered in the quiz are introduction to Python, Data Structures, Importing and Exporting Files, Control Structures (if statements and loops), and graphical representations of the data. Let us start with the Python Quiz with Answers.

Python Quiz with Answers

How can you access the length of a list in Python?
Which command will grab the last few rows of a data frame?
Which data structure is {'one': 1, 'two': 2}?
In Python, what types of data can tuples contain?
In Python, the ________ statement sets a piece of code to run only when the condition of the if statement is false.
In Python, when does an else statement execute a piece of code?
A ________ is a body of reusable code for performing specific processes or tasks.
Which of these for loop statements would raise an error (assume columns is an array)?
Which of these print statements would output an error message in Python?
How do you print “X is large” if $X$ is greater than 28 in Python?
What defines the body of a decision construct in Python?
How do you add an element to a set in Python?
How do you access the value of a dictionary key in Python?
How can you access a specific element in a list in Python?
Which Python module provides the pairplot method used to create a pair plot?
Which of the following data structures are immutable, meaning that values cannot be changed in place?
Which of the following are valid keywords for loops in Python?
What keyword is used to create a function?
Which Python libraries were used to create the boxplots?
Which of the following statements accurately describe NumPy arrays? Select all that apply.
