Simple Random Sampling in R: Explained Easy

Introduction to Simple Random Sampling in R

Simple random sampling (SRS) is the most basic method of drawing a probability sample. A sample of $n$ units is selected from a population of $N$ units in such a way that each of the $\binom{N}{n}$ possible samples has the same chance of being selected. The specific sample can be chosen using a random number generator on a computer. In this post, we will learn how to select the elements of a sample by simple random sampling in R.
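To make the "equally likely samples" property concrete, the following minimal sketch enumerates all $\binom{N}{n}$ samples for a small toy population and checks by simulation that each one turns up with roughly the same relative frequency. The values N = 5 and n = 2, the seed, and the number of repetitions are assumptions chosen only for illustration.

```r
# Hypothetical toy population: labels 1..5, sample size 2 (illustrative values)
N <- 5
n <- 2

# Number of distinct samples of size n
n_samples <- choose(N, n)          # 10

# Enumerate every possible sample (one per column)
all_samples <- combn(N, n)

# Draw many simple random samples and record which subset was drawn
set.seed(1)                        # arbitrary seed, only for reproducibility
draws <- replicate(20000, paste(sort(sample(N, n)), collapse = "-"))

# Relative frequencies should all be close to 1 / choose(N, n) = 0.1
rel_freq <- table(draws) / length(draws)
print(round(rel_freq, 3))
```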

The following commands will generate random permutations of $n$ integers or random samples from a population of numbers.

Random permutation of integers $1$ to $n$

The call sample(n) generates a random permutation of the integers $1$ to $n$.

sample(10)

## Output
[1]  5  8  9  4  3  2  1  6 10  7
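Because the output changes on every run, it can help to fix the seed of the random number generator when results need to be reproduced. A minimal sketch (the seed value 123 is arbitrary, and the names p1 and p2 are only illustrative):

```r
set.seed(123)          # arbitrary seed
p1 <- sample(10)

set.seed(123)          # same seed again
p2 <- sample(10)

identical(p1, p2)      # TRUE: same seed, same permutation
sort(p1)               # the integers 1 to 10, each appearing exactly once
```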

Random permutation of elements in a vector $x$

A random permutation of the elements of a vector $x$ can be generated using sample(x).

x <- c(20, 25, 19, -15, 4, 21, -1, 0, 23)
sample(x)

## Output
[1]  21  25   0   4  20 -15  -1  19  23
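One caveat is worth knowing: when x is a single number of value at least 1, sample(x) treats it as sample(1:x) and permutes the integers 1 to x rather than returning x itself. The sketch below illustrates this and shows one common workaround, which is to sample positions and then index into the vector. The helper name safe_sample is hypothetical and not part of R.

```r
x1 <- c(7)                 # a vector that happens to have length one
perm <- sample(x1)         # a permutation of 1:7, not the single value 7
length(perm)               # 7

# Hypothetical helper: sample positions, then index into the vector
safe_sample <- function(x, size = length(x), ...) {
  x[sample.int(length(x), size, ...)]
}

safe_sample(x1)            # 7
safe_sample(c(20, 25, 19)) # a permutation of the three values
```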

Random Sample of $n$ items from $x$ without replacement

A random selection of $n$ elements from a vector $x$ without replacement can be made using sample(x, n).

x <- c(20, 25, 19, -15, 4, 21, -1, 0, 23)
sample(x, 5)

## Output
[1] -1 19 21 23  0

Random sample of $n$ items from $x$ with replacement

A random sample of $n$ items from vector $x$ can be selected with replacement using sample(x, n, replace = TRUE).

x <- c(20, 25, 19, -15, 4, 21, -1, 0, 23)
sample(x, 5, replace = TRUE)

## Output
[1]  0 -1  4 19 -1
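The practical difference between the two modes shows up when $n$ exceeds the length of $x$: sampling with replacement still works, while sampling without replacement stops with an error. A small sketch that uses tryCatch() to capture that error (the sample size 12 and the names s_with and res are chosen only for illustration):

```r
x <- c(20, 25, 19, -15, 4, 21, -1, 0, 23)   # 9 elements

# With replacement, the sample may be larger than the population
s_with <- sample(x, 12, replace = TRUE)
length(s_with)                               # 12
anyDuplicated(s_with) > 0                    # TRUE: 12 draws from 9 values must repeat

# Without replacement, asking for more than length(x) elements is an error
res <- tryCatch(sample(x, 12), error = function(e) "error")
res                                          # "error"
```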

Random Sample with Probabilities

A random sample of $n$ items can be drawn from $x$ with the elements of $x$ having differing probabilities of selection. A vector of probabilities, one for each element in $x$, is supplied through the prob argument. The probabilities should sum to one; if they do not, sample() rescales the weights internally.

x <- c(23, 45, 69, -1, .9, 4, 25, 19)
p <- c(.1, .1, 0, 0, .2, .3, .1, .2)
sum(p)   # check that the probabilities sum to one

sample(x, 5, replace = TRUE, prob = p)

## Output
[1]  4 19 19 19 45
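One way to convince oneself that the probabilities are respected is to draw a large sample and compare the relative frequency of each value with its probability. The sketch below does this; the sample size of 100000 and the seed are arbitrary choices.

```r
x <- c(23, 45, 69, -1, .9, 4, 25, 19)
p <- c(.1, .1, 0, 0, .2, .3, .1, .2)

set.seed(2024)                                   # arbitrary seed
big <- sample(x, 1e5, replace = TRUE, prob = p)

# Relative frequency of each value, in the order of x
rel_freq <- sapply(x, function(v) mean(big == v))
round(cbind(value = x, prob = p, rel_freq = rel_freq), 3)

# Elements with probability 0 (69 and -1) are never drawn
any(big %in% c(69, -1))                          # FALSE
```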

Random Selection of Integers without Replacement

The random selection of $n$ integers from the integers $1$ to $N$ without replacement can be done using sample(N, n).

sample(1000, 10)

## Output
[1] 138 147 911 523 586 163 915 966 951 245

From a simple random sample, one can estimate the population mean $\mu$ and the variance of the estimator $\hat{\mu}$.

Let $y_1, y_2, \cdots, y_n$ be the measurements obtained from the simple random sampling of $n$ units from the population. The estimator of population mean $\mu$ is

$$\hat{\mu} = \frac{1}{n} \sum\limits_{i=1}^n y_i$$

with estimated variance of $\hat{\mu}$ given by

$$\widehat{\mathrm{var}}(\hat{\mu}) = \frac{s^2}{n} \left( \frac{N-n}{N}\right)$$

where $s^2 = \frac{1}{n-1} \sum\limits_{i=1}^n (y_i - \overline{y})^2$ and $\overline{y} = \hat{\mu}$ is the sample mean.
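The two formulas translate directly into R. The following is a minimal sketch under stated assumptions: the population pop is simulated here purely for illustration (it is not data from this post), and the names mu_hat, s2, and var_hat are chosen to mirror the symbols above.

```r
set.seed(10)                                 # arbitrary seed
pop <- round(rnorm(200, mean = 50, sd = 10)) # hypothetical population, N = 200
N <- length(pop)
n <- 20

y <- sample(pop, n)                          # simple random sample without replacement

mu_hat  <- mean(y)                           # estimator of the population mean
s2      <- var(y)                            # sample variance with divisor n - 1
var_hat <- (s2 / n) * ((N - n) / N)          # estimated variance of mu_hat

c(mu_hat = mu_hat, var_hat = var_hat, se = sqrt(var_hat))
```

The factor $(N-n)/N$ is the finite population correction: it shrinks the estimated variance as the sample covers a larger share of the population.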


Simulating Coin Tossing: Game of Chance

Introduction to Simulating Coin Tossing

Simulation provides a straightforward way of approximating probabilities. To simulate a game of chance, one repeats a particular random experiment (coin tossing, dice rolling, and/or card drawing) a large number of times. The probability of an outcome is then approximated by the relative frequency of that outcome in the repeated experiments.

The use of simulation experiments to better understand probability patterns is called the Monte Carlo Method.

Practical Example: Simulating Coin Tossing Experiment

Let person “A” and person “B” play a simple game involving repeated tosses of a fair coin. On a given toss, if a head is observed, “A” wins one dollar from “B”; otherwise, if a tail is observed, “A” gives one dollar to “B”. If “A” starts with zero dollars, we are interested in his fortune as the game is played for 50 tosses.

Simulating The Game in R using sample() Function

For the above scenario, one can simulate this game using the R function sample(). A’s winnings on a particular toss will be $1$ or $-1$ with equal probability. His winnings on 50 repeated tosses can be considered a sample of size 50 selected with replacement from the set $\{-1, 1\}$.

options(width = 60)
sample(c(-1, 1), size = 50, replace = TRUE)

# output
 [1] -1  1  1 -1 -1  1 -1  1  1 -1  1 -1  1  1 -1 -1  1 -1 -1 -1  1  1  1 -1 -1
[26]  1 -1  1 -1  1 -1 -1  1  1  1 -1 -1  1 -1 -1 -1  1 -1  1  1 -1  1 -1  1  1

Graphical Representation of the Simulations

One can represent the outcomes graphically, as coded below. Note that the results will differ on each run of the code because the samples are drawn randomly.

results <- sample( c(-1, 1), size = 50,replace = TRUE)

x = table(results)
names(x) = c("loss", "win")

barplot(x)

Extended Example

Suppose “A” is interested in his cumulative winnings as he plays this game. One needs to store his winnings on each toss in the variable win. The function cumsum() computes the cumulative sums of the individual values, and the cumulative winnings are stored in cum.win.

win = sample(c(-1, 1), size = 50, replace = TRUE)
cum.win = cumsum(win)
cum.win

# Output for two different executions
 [1] -1 -2 -1 -2 -1  0  1  0 -1  0  1  0  1  2  1  2  1  2  3  4  5  6  7  6  5
[26]  4  5  4  5  6  7  6  7  8  9  8  7  6  5  4  5  4  3  2  1  0 -1 -2 -3 -4

 [1] 1 2 1 0 1 2 1 2 1 2 1 2 3 4 5 4 5 6 7 8 7 8 7 6 7 6 5 4 5 4 3 4 3 4 5 4 3 2
[39] 3 4 3 4 3 2 3 4 5 6 5 4

The sequence of cumulative winnings can be extended and plotted for four games. The cumulative win/loss score of each game is plotted in a combined 2 by 2 layout to visualize the situation in all four games.

par(mfrow = c(2, 2))
for (j in 1:4) {
  win = sample(c(-1, 1), size = 50, replace = TRUE)
  plot(cumsum(win), type = "l", ylim = c(-15, 15))
  abline(h = 0)   # break-even line
}


The horizontal line in each graph is drawn at the break-even point. Points above the horizontal line show that the player is winning, while points below it show that the player is losing.

One can make a customized function for the situation discussed above. For example,

# customized function 
winloss <- function (n=50){
    win = sample(c(-1,1), size = n, replace = T)
    sum(win)
}

# Insights about win/ loss situation
F = replicate(1000, winloss() )

table(F)

# output
F
-22 -18 -16 -14 -12 -10  -8  -6  -4  -2   0   2   4   6   8  10  12  14  16  18 
  3   6   7  22  28  52  60  78  82 128  93 103  95  72  50  48  34  24   8   3 
 20  22  24 
  1   2   1 


par(mfrow = c(1, 1) )

plot(table(F))
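The simulated distribution can be checked against an exact result. “A” breaks even after 50 tosses exactly when 25 of them are heads, so the break-even probability is a binomial probability. The sketch below compares the simulated relative frequency with that exact value; winloss() is repeated so the snippet runs on its own, and the variable name fortune, the seed, and the 10000 repetitions are illustrative choices.

```r
winloss <- function(n = 50) {
  sum(sample(c(-1, 1), size = n, replace = TRUE))
}

set.seed(50)                                  # arbitrary seed
fortune <- replicate(10000, winloss())

# Simulated probability that "A" breaks even after 50 tosses
p_sim <- mean(fortune == 0)

# Exact value: break-even means exactly 25 heads in 50 fair tosses
p_exact <- dbinom(25, size = 50, prob = 0.5)

c(simulated = p_sim, exact = p_exact)
```

The final fortune is always an even number, because it equals the number of heads minus the number of tails in an even number of tosses; this is why only even values appear in the table.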


Simulation in R for Sampling (2024)

Introduction to Simulation in R Language

This post is about simulation for sampling in the R programming language. It contains useful examples for generating samples and then performing basic calculations on the generated data.

Simulations are a powerful tool in R for exploring “what-if” scenarios without the need for real-world data. One can use R Language to simulate data from various probability distributions or even design customized functions for more complex simulations.

Question 1: Simulate a coin toss 20 times.

sample(c("H", "T"), 20, replace=T)

Question 2: Write R commands to find the 95% confidence interval for the mean (unknown variance) using a sample drawn from the following population.

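yp <- c(111, 150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110, 121, 115, 114)
N  <- length(yp)                       # population size
ys <- sample(yp, 5)                    # simple random sample without replacement
n  <- length(ys)                       # sample size
mys <- mean(ys)                        # sample mean
vys <- var(ys)                         # sample variance
vybar <- (vys / n) * ((N - n) / N)     # estimated variance of the sample mean
sdr <- sqrt(vybar)                     # standard error
error <- qt(0.975, df = n - 1) * sdr   # t quantile, since the variance is estimated
ll <- mys - error
ul <- mys + error
c(ll, ul)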
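To see how such an interval behaves under repeated sampling, one can repeat the whole procedure many times and count how often the interval contains the true population mean. The sketch below is one way to do this, under stated assumptions: it uses a t-based interval with the sample variance and the finite population correction, and the helper name one_interval, the seed, and the 5000 repetitions are hypothetical choices. With a sample as small as $n = 5$, the observed coverage may differ from the nominal 95%.

```r
yp <- c(111, 150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110, 121, 115, 114)
N  <- length(yp)
n  <- 5
mu <- mean(yp)                               # true population mean

# Hypothetical helper: one sample, one t-based interval with finite population correction
one_interval <- function() {
  ys  <- sample(yp, n)
  se  <- sqrt((var(ys) / n) * ((N - n) / N))
  err <- qt(0.975, df = n - 1) * se
  c(ll = mean(ys) - err, ul = mean(ys) + err)
}

set.seed(7)                                  # arbitrary seed
ci <- replicate(5000, one_interval())        # 2 x 5000 matrix with rows "ll" and "ul"

# Proportion of intervals that contain the true mean
coverage <- mean(ci["ll", ] <= mu & mu <= ci["ul", ])
coverage
```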

Sampling without Replacement and Histogram

Question 3: Suppose we have the population ye <- c(112, 114, 119, 125, 158, 117, 135, 141, 185, 128). Draw $k=100$ samples, each of size $n=3$, from this population using Simple Random Sampling without Replacement (SRSWOR). Compute the mean of each sample and draw the histogram of the sample means generated.

k <- 100; n <- 3
m1 <- numeric(k)          # storage for the k sample means
ye <- c(112, 114, 119, 125, 158, 117, 135, 141, 185, 128)

for (i in 1:k) {
  s <- sample(ye, n)      # SRSWOR of size n
  m1[i] <- mean(s)
}

m1
hist(m1)
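The simulated sample means can also be compared with theory. Under SRSWOR, the sample mean is unbiased for the population mean, and its variance is $\frac{S^2}{n}\left(\frac{N-n}{N}\right)$, where $S^2$ is the population variance with divisor $N-1$. A minimal sketch of this check follows; it uses a larger number of samples ($k = 10000$) than the question, an assumption made only to reduce simulation noise.

```r
ye <- c(112, 114, 119, 125, 158, 117, 135, 141, 185, 128)
N <- length(ye); n <- 3; k <- 10000          # k is larger than in the question

set.seed(3)                                  # arbitrary seed
m1 <- replicate(k, mean(sample(ye, n)))

# The average of the sample means should be close to the population mean
c(mean_of_means = mean(m1), population_mean = mean(ye))

# The variance of the sample means should be close to the theoretical value
theo_var <- (var(ye) / n) * ((N - n) / N)
c(simulated = var(m1), theoretical = theo_var)
```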

Question 4: Write R code that generates a population of 500 values from a normal distribution with mean = 20 and standard deviation = 30. Select 5000 samples, each of size 50, using the systematic sampling technique, and estimate the mean from each sample. Find the mean and variance of the 5000 sample means.

N = 500; n = 50;
k = N/n; m = c();
pop <- rnorm (N, mean=20, sd=30)

for(i in 1:5000){
  start <- sample(1: k, 1)
  s <- seq(start, N, k)
  sys.sample <- pop[s]
  m[i] = mean(sys.sample)
}

mean(m); var(m)

Question 5: Why do we use simulation for sampling?
Answer: A simulation study is useful for evaluating a sampling strategy. We can generate populations that reflect specific situations. From the generated population, a sample of size $n$ is drawn $k$ times, and the estimator is computed from each sample. The variance of the $k$ estimates is then calculated to examine the efficiency of the strategy.
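As an illustration of this idea, the sketch below evaluates two sampling strategies on the same generated population: simple random sampling without replacement and systematic sampling with a random start. The population settings follow Question 4; the names m_srs, m_sys, and reps and the seed are hypothetical choices, and the outcome of the comparison depends on the population generated.

```r
set.seed(99)                                   # arbitrary seed
N <- 500; n <- 50; k <- N / n; reps <- 5000
pop <- rnorm(N, mean = 20, sd = 30)            # population as in Question 4

# Strategy 1: simple random sampling without replacement
m_srs <- replicate(reps, mean(sample(pop, n)))

# Strategy 2: systematic sampling with a random start in 1..k
m_sys <- replicate(reps, {
  start <- sample(k, 1)
  mean(pop[seq(start, N, by = k)])
})

# Both sets of estimates should centre on the population mean;
# the strategy with the smaller variance is the more efficient one here
rbind(SRS        = c(mean = mean(m_srs), var = var(m_srs)),
      systematic = c(mean = mean(m_sys), var = var(m_sys)))
mean(pop)
```

Note that systematic sampling with interval $k = 10$ can produce only 10 distinct samples, so the 5000 systematic estimates take at most 10 different values.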

Coin Toss Experiment in R

Question 6: Write R code to simulate a coin-tossing experiment.

# Define the Number of tosses of a coin
n_tosses <- 100

# Simulate coin tosses (1 for heads, 0 for tails)
coin_tosses <- sample(c(0, 1), n_tosses, replace = TRUE)

# Calculate the proportion of heads
prop_heads <- mean(coin_tosses)

# Display results
cat("Number of Heads:", sum(coin_tosses), "\n")
cat("Proportion of Heads:", prop_heads, "\n")

# Plot the results
barplot(c(sum(coin_tosses), n_tosses - sum(coin_tosses)),
        names.arg = c("Heads", "Tails"),
        col = c("skyblue", "salmon"),
        main = "Coin Toss Simulation"
       )
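A natural extension is to track the proportion of heads as the tosses accumulate, which shows the relative frequency settling near the theoretical probability of 0.5. The following is a minimal sketch; the number of tosses, the seed, and the name running_prop are illustrative assumptions.

```r
set.seed(11)                                    # arbitrary seed
n_tosses <- 5000
coin_tosses <- sample(c(0, 1), n_tosses, replace = TRUE)

# Proportion of heads after each toss
running_prop <- cumsum(coin_tosses) / seq_along(coin_tosses)

plot(running_prop, type = "l", ylim = c(0, 1),
     xlab = "Number of tosses", ylab = "Proportion of heads")
abline(h = 0.5, lty = 2)                        # theoretical probability of heads

running_prop[n_tosses]                          # final proportion of heads
```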

One can adapt these examples for more complex statistical simulations or specific scenarios by modifying the simulation process and analyzing the results accordingly. Simulations are commonly used in various fields, such as statistics, finance, and operations research, to model and analyze uncertain or random processes.
