Simulation in R for Sampling (2024)

Introduction to Simulation in R Language

This post covers simulation for sampling in the R programming language. It contains examples that generate samples and then perform basic calculations on the generated data.

Simulations are a powerful tool in R for exploring “what-if” scenarios without the need for real-world data. One can use R to simulate data from various probability distributions or write custom functions for more complex simulations.
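As a minimal sketch of both ideas, the code below draws from three built-in distributions and then wraps a small simulation in a custom function. The seed, the parameter values, and the function name roll_two_dice are illustrative assumptions, not part of the original examples.

```r
set.seed(123)  # assumed seed, only so the draws are reproducible

# Draws from three built-in distributions (parameter values are illustrative)
x_norm  <- rnorm(10, mean = 50, sd = 5)       # normal
x_binom <- rbinom(10, size = 20, prob = 0.3)  # binomial
x_unif  <- runif(10, min = 0, max = 1)        # uniform

# Hypothetical custom simulator: sum of two fair dice, repeated n_rolls times
roll_two_dice <- function(n_rolls) {
  die1 <- sample(1:6, n_rolls, replace = TRUE)
  die2 <- sample(1:6, n_rolls, replace = TRUE)
  die1 + die2
}

dice_sums <- roll_two_dice(1000)
table(dice_sums)  # frequency of each total from 2 to 12
```

The same pattern (generate, then summarise) is what the questions below follow.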

Question 1: Simulate a coin toss 20 times.

sample(c("H", "T"), 20, replace = TRUE)

Question 2: Write R commands to find the 95% confidence interval for the mean (variance unknown) using a sample of size 5 drawn from the following population.

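yp <- c(111, 150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110, 121, 115, 114)
N  <- length(yp)            # population size
ys <- sample(yp, 5)         # simple random sample without replacement
n  <- length(ys)            # sample size
mys <- mean(ys)             # sample mean
vys <- var(ys)              # sample variance (the population variance is unknown)
vybar <- (1 - n/N) * vys/n  # estimated variance of the sample mean, with finite population correction
sdr <- sqrt(vybar)          # standard error of the sample mean
error <- qt(0.975, df = n - 1) * sdr  # t quantile, because the variance is estimated
ll <- mys - error           # lower limit
ul <- mys + error           # upper limit
c(lower = ll, upper = ul)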
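
Simulation can also be used to check how often such an interval captures the true population mean. The sketch below repeats the sampling many times and records the proportion of intervals that contain mean(yp). It assumes an interval of the form mean ± t × standard error with a finite population correction; the seed and the number of repetitions (reps) are hypothetical choices.

```r
set.seed(1)  # assumed seed, for reproducibility only
yp <- c(111, 150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110, 121, 115, 114)
N <- length(yp)
n <- 5
reps <- 2000  # hypothetical number of repetitions

covered <- logical(reps)
for (i in 1:reps) {
  ys   <- sample(yp, n)                      # SRSWOR of size n
  se   <- sqrt((1 - n/N) * var(ys) / n)      # standard error with fpc
  half <- qt(0.975, df = n - 1) * se         # half-width of the interval
  covered[i] <- abs(mean(ys) - mean(yp)) <= half
}

mean(covered)  # empirical coverage of the nominal 95% interval
```

With a population this small and skewed, the empirical coverage may differ from the nominal 95%.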

Sampling without Replacement and Histogram

Question 3: Given the population ye <- c(112, 114, 119, 125, 158, 117, 135, 141, 185, 128), simulate $k=100$ samples of size $n=3$ using Simple Random Sampling without Replacement (SRSWOR). Compute the mean of each sample and draw a histogram of the sample means.

k <- 100; n <- 3
ye <- c(112, 114, 119, 125, 158, 117, 135, 141, 185, 128)
m1 <- numeric(k)         # storage for the k sample means

for(i in 1:k){
  s <- sample(ye, n)     # sample() draws without replacement by default
  m1[i] <- mean(s)
}

m1
hist(m1)
Figure: histogram of the simulated sample means

Question 4: Write R code that generates a population of 500 values from a normal distribution with mean = 20 and standard deviation = 30. Select 5000 samples, each of size 50, using systematic sampling, and estimate the mean of each sample. Find the mean and variance of the 5000 sample means.

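N <- 500; n <- 50
k <- N/n                       # sampling interval
m <- numeric(5000)             # storage for the sample means
pop <- rnorm(N, mean = 20, sd = 30)

for(i in 1:5000){
  start <- sample(1:k, 1)      # random start between 1 and k
  s <- seq(start, N, by = k)   # every k-th unit from the random start
  sys.sample <- pop[s]
  m[i] <- mean(sys.sample)
}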

mean(m); var(m)
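
Because $N = nk$ here, there are only $k = 10$ distinct systematic samples, one per starting point, so the 5000 draws repeat the same 10 means. A minimal sketch that enumerates them directly is given below; the seed and the names all_means and true_var are assumptions made for illustration.

```r
set.seed(2024)  # assumed seed, for reproducibility only
N <- 500
n <- 50
k <- N / n      # sampling interval, and also the number of possible samples
pop <- rnorm(N, mean = 20, sd = 30)

# Mean of the systematic sample for every possible starting point 1, ..., k
all_means <- sapply(1:k, function(start) mean(pop[seq(start, N, by = k)]))

mean(all_means)  # average over all possible samples
mean(pop)        # population mean; the two agree because N = n * k

# Variance of the systematic-sample mean over its k equally likely values
true_var <- mean((all_means - mean(all_means))^2)
true_var
```

The value of var(m) from a long simulation should be close to true_var computed on the same population.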

Question 5: Why do we use simulation for sampling?
Answer: A simulation study is useful for evaluating a sampling strategy. We can generate populations that reflect specific situations. From the generated population, a sample of size $n$ is drawn $k$ times, and the estimator is computed from each sample. The variance of the $k$ estimates is then calculated to examine the efficiency of the strategy.
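
The procedure described in this answer can be sketched as follows for SRSWOR with the sample mean as the estimator. The population is the one from Question 3; the seed and the number of replications are hypothetical. The empirical variance of the $k$ estimates is compared with the textbook variance of the sample mean under SRSWOR, $(1 - n/N)S^2/n$.

```r
set.seed(10)  # assumed seed, for reproducibility only
ye <- c(112, 114, 119, 125, 158, 117, 135, 141, 185, 128)
N <- length(ye)
n <- 3
k <- 10000    # hypothetical number of replications

# Draw k samples of size n without replacement and keep each sample mean
est <- replicate(k, mean(sample(ye, n)))

empirical_var   <- var(est)                   # variance of the k estimates
theoretical_var <- (1 - n/N) * var(ye) / n    # SRSWOR variance of the sample mean

c(empirical = empirical_var, theoretical = theoretical_var)
```

Repeating this for a competing sampling design on the same population and comparing the two empirical variances is one way to judge which design is more efficient.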

Coin Toss Experiment in R

Question 6: Write R code to simulate a coin-tossing experiment.

# Define the number of coin tosses
n_tosses <- 100

# Simulate coin tosses (1 for heads, 0 for tails)
coin_tosses <- sample(c(0, 1), n_tosses, replace = TRUE)

# Calculate the proportion of heads
prop_heads <- mean(coin_tosses)

# Display results
cat("Number of Heads:", sum(coin_tosses), "\n")
cat("Proportion of Heads:", prop_heads, "\n")

# Plot the results
barplot(c(sum(coin_tosses), n_tosses - sum(coin_tosses)),
        names.arg = c("Heads", "Tails"),
        col = c("skyblue", "salmon"),
        main = "Coin Toss Simulation"
       )

One can adapt these examples for more complex statistical simulations or specific scenarios by modifying the simulation process and analyzing the results accordingly. Simulations are commonly used in various fields, such as statistics, finance, and operations research, to model and analyze uncertain or random processes.
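
As one possible adaptation, the sketch below changes the coin-toss example to a biased coin and tracks how the proportion of heads settles as the number of tosses grows. The probability of heads (p_heads), the seed, and the number of tosses are hypothetical values chosen for illustration.

```r
set.seed(42)      # assumed seed, for reproducibility only
n_tosses <- 10000
p_heads  <- 0.7   # hypothetical probability of heads

# prob gives the weights for 0 (tails) and 1 (heads), in that order
coin_tosses <- sample(c(0, 1), n_tosses, replace = TRUE,
                      prob = c(1 - p_heads, p_heads))
prop_heads <- mean(coin_tosses)

# Running proportion of heads after each toss
running_prop <- cumsum(coin_tosses) / seq_along(coin_tosses)

plot(running_prop, type = "l",
     xlab = "Number of tosses", ylab = "Proportion of heads",
     main = "Biased Coin Simulation")
abline(h = p_heads, lty = 2)  # reference line at the assumed probability
```

The prob argument of sample() is the only change needed to move from a fair coin to a biased one.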
