Simulation in R for Sampling (2024)

This post is about simulation for sampling in the R programming language. It contains some basic examples of generating samples and then performing simple calculations on the generated data.

Question 1: Simulate a coin toss 20 times.

sample(c("H", "T"), 20, replace = TRUE)
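
Because sample() draws at random, each run of the command above gives a different sequence. A minimal sketch, assuming reproducible output is wanted, fixes the seed with set.seed() first. The seed value 123 and the object names tosses and tosses_again are arbitrary choices made for illustration.

```r
# Fix the random seed so the same "random" tosses are produced on every run
set.seed(123)
tosses <- sample(c("H", "T"), 20, replace = TRUE)
tosses

# Count how many heads and tails were observed
table(tosses)

# Re-setting the same seed reproduces the identical sequence
set.seed(123)
tosses_again <- sample(c("H", "T"), 20, replace = TRUE)
identical(tosses, tosses_again)
```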

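Question 2: Write R commands to find the 95% confidence interval for the mean (unknown variance), using a sample drawn from the following population.

yp <- c(111, 150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110, 121, 115, 114)
N <- length(yp)                  # population size
ys <- sample(yp, 5)              # simple random sample without replacement
n <- length(ys)                  # sample size
mys <- mean(ys)                  # sample mean
vys <- var(ys)                   # sample variance (estimates the unknown population variance)
vybar <- (1 - n/N) * vys/n       # estimated variance of the sample mean, with finite population correction
sdr <- sqrt(vybar)               # standard error of the sample mean
error <- qt(0.975, df = n - 1) * sdr   # t quantile, because the variance is estimated
ll <- mys - error                # lower limit
ul <- mys + error                # upper limit
c(ll, ul)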

Question 3: If we have a population ye <- c(112, 114, 119, 125, 158, 117, 135, 141, 185, 128) then simulate this population with $k=100$ and $n=3$ for Simple Random Sampling without Replacement (SRSWOR). Also, find out the sample mean. Draw the histogram of the sample means generated.

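k <- 100; n <- 3
ye <- c(112, 114, 119, 125, 158, 117, 135, 141, 185, 128)
m1 <- numeric(k)          # storage for the k sample means
for(i in 1:k){
  s <- sample(ye, n)      # SRSWOR: sample() draws without replacement by default
  m1[i] <- mean(s)
}
m1
hist(m1)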

Question 4: Write R code to generate a population of 500 values from a normal distribution with mean = 20 and standard deviation = 30. Select 5000 samples, each of size 50, using the systematic sampling technique, and estimate the mean of each sample. Find the mean and variance of the 5000 means.

N <- 500; n <- 50
k <- N/n; m <- c()
pop <- rnorm(N, mean = 20, sd = 30)

for(i in 1:5000){
  start <- sample(1:k, 1)
  s <- seq(start, N, k)
  sys.sample <- pop[s]
  m[i] = mean(sys.sample)
}
mean(m); var(m)
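
With N = 500 and n = 50, the sampling interval is k = 10, so only 10 distinct systematic samples exist and the 5000 means above can take at most 10 different values. As a rough sketch, all possible systematic samples can be enumerated directly. The names pop2 and all_means are hypothetical, and a fresh population is generated so that the block runs on its own.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
N <- 500; n <- 50
k <- N / n                        # sampling interval
pop2 <- rnorm(N, mean = 20, sd = 30)

# One systematic sample per possible random start 1, 2, ..., k
all_means <- sapply(1:k, function(start) mean(pop2[seq(start, N, by = k)]))

length(all_means)                 # number of distinct systematic samples (k)
mean(all_means)                   # matches the population mean, since N = n * k
mean(pop2)
```

Each population unit falls in exactly one of the k samples, which is why the average of the k sample means agrees with the population mean.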

Question 5: Why do we use simulation for sampling?
Answer: A simulation study is useful for evaluating a sampling strategy. We can generate populations that reflect specific situations. After generating the population, a sample of size $n$ is drawn $k$ times. From each sample, the estimate is computed. The variance of the $k$ estimates is then calculated to examine the efficiency of the estimator.
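
The idea can be sketched by comparing two strategies on the population from Question 3: simple random sampling without replacement and with replacement. This is only an illustration; the number of replications (k = 2000), the seed, and the object names are assumptions.

```r
set.seed(42)                          # arbitrary seed
pop <- c(112, 114, 119, 125, 158, 117, 135, 141, 185, 128)  # population from Question 3
k <- 2000                             # number of simulated samples (assumed)
n <- 3                                # sample size

# Sample means under sampling without and with replacement
means_wor <- replicate(k, mean(sample(pop, n)))
means_wr  <- replicate(k, mean(sample(pop, n, replace = TRUE)))

# Both estimators are unbiased, so the averages should be close to the population mean
c(population = mean(pop), wor = mean(means_wor), wr = mean(means_wr))

# The strategy with the smaller variance of the k estimates is the more efficient one
c(wor = var(means_wor), wr = var(means_wr))
```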

Question 6: Write R code to simulate a coin-tossing experiment.

# Define the Number of tosses of a coin
n_tosses <- 100

# Simulate coin tosses (1 for heads, 0 for tails)
coin_tosses <- sample(c(0, 1), n_tosses, replace = TRUE)

# Calculate the proportion of heads
prop_heads <- mean(coin_tosses)

# Display results
cat("Number of Heads:", sum(coin_tosses), "\n")
cat("Proportion of Heads:", prop_heads, "\n")

# Plot the results
barplot(c(sum(coin_tosses), n_tosses - sum(coin_tosses)),
        names.arg = c("Heads", "Tails"),
        col = c("skyblue", "salmon"),
        main = "Coin Toss Simulation")

One can adapt these examples for more complex statistical simulations or specific scenarios by modifying the simulation process and analyzing the results accordingly. Simulations are commonly used in various fields, such as statistics, finance, and operations research, to model and analyze uncertain or random processes.


Using R as a Calculator

On the Windows operating system, the R installer creates an icon for R on the desktop and a Start Menu item. Double-click the R icon to start R; the R console opens, where you type R commands.

The greater-than sign (>) in the console is the prompt symbol. In this tutorial, we use R as a calculator by typing simple mathematical expressions at the prompt (>). Anything that can be computed on a pocket calculator can also be computed at the R prompt. After typing an expression at the prompt, press the Enter key to execute it.

Using R Language As a Calculator

Some examples of using R as a calculator are as follows:

> 1 + 2   #adds two or more numbers
> 1 - 2   #subtracts two or more numbers
> 1 * 2   #multiplies two or more numbers
> 1 / 2   #divides two or more numbers
> 1 %/% 2 #gives the integer part of the quotient
> 2 ^ 1   #gives exponentiation
> 31 %% 7 #gives the remainder after division

The operators +, -, *, /, and ^ also work for complex numbers; %/% and %% are not defined for them.

Upon pressing the Enter key, the result of the expression appears, prefixed by a number in square brackets:

> 1 + 2
[1] 3

The [1] indicates that the line of output begins with the first element of the result.

One can also use R as an advanced scientific calculator. Calculations that are available on scientific calculators can also be done easily in R, for example:

> sqrt(5)      #Square root of a number
> log(10)      #Natural log of a number
> sin(45)      #Trigonometric function (sine); the argument is in radians
> pi           #pi value 3.141593
> exp(2)       #Antilog, e raised to a power
> log10(5)     #Log of a number to base 10
> factorial(5) #Factorial of a number, e.g. 5!
> abs(1/-2)    #Absolute value of a number
> 2*pi/360     #Number of radians in one Babylonian degree of a circle

Remember that R prints very large or very small numbers in scientific notation.
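
Two points from the list above are easy to trip over: the trigonometric functions work in radians, and large results are shown in scientific notation. A short sketch, with the object names deg and rad chosen only for illustration:

```r
# Trigonometric functions expect radians, so convert degrees first
deg <- 45
rad <- deg * pi / 180
sin(rad)                 # sine of 45 degrees
sqrt(2) / 2              # the same value, for comparison

# Very large or very small numbers are printed in scientific notation
factorial(20)
1 / factorial(20)

# format() can show a fixed (non-scientific) representation if preferred
format(factorial(20), scientific = FALSE)
```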

Order of Precedence/ Operations

The R language also uses parentheses to group operations, following the usual rules for the order of operations. For example:

> 1 - 2/3   #It first computes 2/3 and then subtracts it from 1
> (1-2)/3   #It first computes (1-2) and then divides it by 3
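
A few further cases where the order of operations may be surprising are sketched here; when in doubt, parentheses make the intended order explicit.

```r
# Exponentiation binds tighter than unary minus
-2^2        # evaluated as -(2^2)
(-2)^2      # parentheses force the minus to apply first

# Exponentiation is evaluated right to left
2^3^2       # evaluated as 2^(3^2)
(2^3)^2

# The sequence operator ':' binds tighter than multiplication
1:3 * 2     # evaluated as (1:3) * 2
1:(3 * 2)
```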

The R language recognizes certain goofs, such as trying to divide by zero or operating on missing values.

> 1/0   #Division by zero gives infinity (Inf)
> 0/0   #Undefined, gives not a number (NaN)
> "one"/2   #Error: a character string cannot be divided by a number
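
These special values can be detected with the base functions is.finite(), is.infinite(), is.nan(), and is.na(). The following sketch also uses tryCatch() to capture the error from the last example instead of stopping; the object names x and msg are illustrative.

```r
x <- c(1 / 0, -1 / 0, 0 / 0, NA, 5)

is.finite(x)   # TRUE only for ordinary numbers
is.infinite(x) # TRUE for Inf and -Inf
is.nan(x)      # TRUE only for NaN
is.na(x)       # TRUE for both NA and NaN

# Dividing a character string by a number is an error; tryCatch() captures the message
msg <- tryCatch("one" / 2, error = function(e) conditionMessage(e))
msg
```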


Introduction to R Language


What is R (Language)

R is an open-source (GPL) programming language for statistical computing and graphics, modeled on the S language and S-PLUS. The S language was developed at AT&T Bell Laboratories, beginning in the mid-1970s. Robert Gentleman and Ross Ihaka started R as a research project in the Department of Statistics at the University of Auckland in the early 1990s, and it was released under the GPL in 1995.

The R language is currently maintained by the R Core Team (an international team of volunteer developers). The R Project website is the main site for information about R. From this site, information about obtaining the software, accompanying packages, and many other sources of documentation (help files) can be obtained.

R provides a wide variety of statistical and graphical techniques, such as linear and non-linear modeling, classical statistical tests, time-series analysis, classification, and multivariate analysis. It is an integrated suite of software facilities for data manipulation, calculation, and graphical display. It includes

  • Effective data handling and storage facilities
  • A suite of operators for calculations on arrays, particularly matrices
  • A large, coherent, integrated collection of intermediate tools for data analysis
  • Graphical facilities for data analysis
  • A programming language with conditionals, loops, user-defined recursive functions, and input-output facilities.
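
Two of these facilities, the matrix operators and user-defined recursive functions, can be illustrated with a brief sketch. The matrix A and the function fact are made up for this example.

```r
# A 2 x 2 matrix, filled column by column
A <- matrix(c(2, 1, 1, 3), nrow = 2)

A * A          # element-wise multiplication
A %*% A        # matrix multiplication
t(A)           # transpose
solve(A)       # inverse
A %*% solve(A) # recovers the identity matrix, up to rounding error

# A small user-defined recursive function
fact <- function(n) if (n <= 1) 1 else n * fact(n - 1)
fact(5)
```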

Obtaining R Software

R can be downloaded from the R Project site as ready-to-run (binary) files for several operating systems, such as Windows, Mac OS X, Linux, and Solaris. The source code for R is also available for download and can be compiled for other platforms. R simplifies many statistical computations, as it is a very powerful statistical language with many statistical routines (programming code) developed by people from all over the world and freely available from the R Project website as “packages”. The basic installation of R contains a powerful set of tools, including the basic packages required for data handling and data analysis.

Many users of R think of R as a statistical system, but it is an environment within which statistical techniques are implemented. The R language can also be extended via packages.
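
As a small sketch of how packages are used in practice: packages shipped with R are attached with library(), and requireNamespace() can check whether an add-on package is installed. The package name MASS is used here only as an example.

```r
# Packages shipped with R can be attached directly
library(stats)

# List the packages currently attached to the search path
search()

# Check whether an add-on package is installed before trying to use it;
# "MASS" is only an example name
if (requireNamespace("MASS", quietly = TRUE)) {
  library(MASS)
} else {
  message("Package not installed; install.packages(\"MASS\") would download it from CRAN")
}
```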

Installing R

For the Windows operating system, the binary version is available from http://cran.r-project.org/bin/windows/base/. At the time of writing, the latest version was R-3.0.0 (installer “R-3.0.0-win.exe”), released on 03 April 2013 by Duncan Murdoch.
After downloading the binary file, double-click it; an almost automatic installation of R will start, although a customized installation option is also available. Follow the instructions during the installation procedure. Once the installation is complete, the R icon appears on your computer desktop.

The R Console

When R starts, you will see the R console window, where you type commands to get the required results. Commands are typed at the R console command prompt. You can edit previously typed commands using the left, right, up, and down arrow keys and the Home, End, Backspace, Insert, and Delete keys. The up and down arrow keys scroll through the command history. It is also possible to type commands in a file and then execute the file using the source function in the R console.
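
Running commands from a file with source() can be sketched as follows. A temporary file is used so that the example is self-contained; in practice the path would point to your own script, and the names script, a, and b are illustrative.

```r
# Write two commands to a temporary script file
script <- tempfile(fileext = ".R")
writeLines(c("a <- 1 + 2",
             "b <- sqrt(a * 12)"), script)

# Run every command in the file, as if typed at the prompt
source(script)
a
b

unlink(script)  # remove the temporary file
```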

Books on R Programming Language

The following books can be useful for learning the R and S languages.

  • “Psychologie statistique avec R” by Yvonnick Noel. Pratique R. Springer, 2013.
  • “Instant R: An introduction to R for Statistical Analysis” by Sarah Stowell. Jotunheim Publishing, 2012.
  • “Financial Risk Modeling and Portfolio Optimization with R” by Bernhard Pfaff. Wiley, Chichester, UK, 2012.
  • “An R Companion to Applied Regression” by John Fox and Sanford Weisberg. Sage Publications, Thousand Oaks, CA, USA, 2nd Edition, 2011.
  • “R Graphs Cookbook” by Hrishi Mittal. Packt Publishing, 2011.
  • “R in Action” by Rob Kabacoff. Manning, 2010.
  • “The Statistical Analysis with R Beginners Guide” by John M. Quick. Packt Publishing, 2010.
  • “Introducing Monte Carlo Methods with R” by Christian Robert and George Casella. Use R. Springer, 2010.
  • “R for SAS and SPSS users” by Robert A. Muenchen. Springer Series in Statistics and Computing. Springer, 2009.

Reading, Creating, and Importing Data in R

There are many ways to read data into the R language. Here we also learn how to import data into R. We can also generate certain kinds of patterned data. Some of these ways are

Reading Data from the Keyboard Directly

For small data (a few observations), one can enter data in vector form directly at the R console, such as

x <- c(1, 2, 3, 4, 5)
y <- c('a', 'b', 'c')

In vector form, data can span several lines: R keeps reading until the closing parenthesis is typed, such as

x <- c(1, 2,
       3, 4)

Note that it is more convenient to use the scan function, which prompts with the index of the next entry.

Using Scan Function

For small data sets, it is convenient to read data from the console using the scan function. The values can be entered on separate lines, separated by spaces and/or tabs. After entering all the required data, pressing the Enter key on an empty line terminates the scanning.

X <-scan()
     3 4 5
     4 5 6 7
     2 3 4 5 6 6

Reading String Data using the “what” Option

y <- scan(what=" ")
      red green blue
      white

The scan function can also be used to import data from files. The scan function returns a list or a vector, while the read.table function returns a data frame. This means that the scan function is less useful for inputting “rectangular” data.
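
The difference in return types can be seen without typing at the console by passing the data through the text argument, which both functions accept. This is only a sketch; the object names v, cols, and df and the small data values are made up.

```r
# scan() returns a plain vector
v <- scan(text = "3 4 5 4 5 6 7", quiet = TRUE)
v
is.vector(v)

# Character data need the 'what' argument
cols <- scan(text = "red green blue white", what = "", quiet = TRUE)
cols

# read.table() returns a data frame with rows and columns
df <- read.table(text = "x y
1 a
2 b
3 c", header = TRUE)
df
class(df)
```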

Reading data from ASCII or plain text files into R as Data Frame

The read.table function reads any type of delimited ASCII file, containing numeric and/or character values. Reading data into R with read.table is often the easiest and most reliable method. The default delimiter is white space (one or more spaces or tabs).

data <- read.table(file=file.choose()) #select from dialog box
data <- read.table("http://itfeature.com/test.txt", header=TRUE) # read from web site

Note that the read.table command can also be used for reading data from the computer disk by providing the appropriate path in quotation marks, such as

data <- read.table("D:/data.txt", header=TRUE) # read from your computer

If some values are missing (blank) in a white-space-delimited file, the rows have unequal numbers of fields, and read.table stops with an error. The easiest way to fix this is to use a file with an explicit delimiter, such as a comma, and to specify it with the sep argument.

data <- read.table("http://itfeature.com/missing_comma.txt", header=TRUE, sep=",")

Comma-delimited files can be read by the read.table function with the sep argument, but they can also be read by the read.csv function, written specifically for comma-delimited files. To display the contents of the data, use the print() function or type the object name.

data <-read.csv(file=file.choose())
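
How an explicit delimiter handles an empty field can be tried out without any file by supplying the data through the text argument. The data below are invented for illustration, as are the names csv_text, d1, and d2.

```r
# Comma-delimited text with an empty field in the second data row
csv_text <- "id,score,group
1,10,a
2,,b
3,30,a"

d1 <- read.table(text = csv_text, header = TRUE, sep = ",")
d2 <- read.csv(text = csv_text)

d1                 # the empty numeric field is read as NA
all.equal(d1, d2)  # both calls should give the same data frame here
is.na(d1$score)
```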

Reading in fixed formatted files

To read data in fixed format, use the read.fwf function; its widths argument gives the width (number of characters) of each variable. In this format, variable names are not present in the first line; therefore, they must be added after reading the data. Variable names can be assigned with the dimnames function, using bracket notation to indicate that the names are attached to the variables (columns) of the data. There are several other ways to do this task.

data <- read.fwf("http://itfeature.com/test_fixed.txt", widths=c(8,1,3,1,1,1))
dimnames(data)[[2]] <- c("v1", "v2", "v3", "v4", "v5", "v6")
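
Since the file on the web site may not be available, the same steps can be tried on a small fixed-width file created locally. This is a sketch under assumed data: the file contents, the widths (3, 2, 4), and the variable names are all made up, and names() is used here as one of the alternative ways of attaching variable names.

```r
# Create a small fixed-width file: three fields of width 3, 2 and 4
fwf_file <- tempfile(fileext = ".txt")
writeLines(c("101AB12.5",
             "102CD 7.0",
             "103EF 9.2"), fwf_file)

dat <- read.fwf(fwf_file, widths = c(3, 2, 4))
names(dat) <- c("id", "code", "value")   # attach variable names after reading
dat
str(dat)

unlink(fwf_file)                         # remove the temporary file
```

The col.names argument, which read.fwf passes on to read.table, is another way to supply the names in the same call.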

Import Data In R

Importing data in R is fairly simple. For Stata and Systat, use the foreign package. For SPSS and SAS, the recommended package is Hmisc, for its ease of use and functionality. See the Quick-R section on packages for information on obtaining and installing these packages. Examples of importing data in R are provided below.

From Excel

On Windows systems, you can use the RODBC package to access Excel files. The first row of the Excel file should contain variable/column names.

# Excel file name is myexcel and worksheet name is mysheet
library(RODBC)
channel <- odbcConnectExcel("c:/myexcel.xls")
mydata <- sqlFetch(channel, "mysheet")
odbcClose(channel)

From SPSS

# In SPSS: first save the SPSS dataset in transport format
get file='c:\data.sav'.
export outfile='c:\data.por'.

# In R
library(Hmisc)
mydata <- spss.get("c:/data.por", use.value.labels=TRUE)   # "use.value.labels" option converts value labels to R factors.

From SAS

# In SAS: save the SAS dataset in transport format
libname out xport 'c:/mydata.xpt';
data out.data;
set sasuser.data;
run;

# In R
library(Hmisc)
mydata <- sasxport.get("c:/mydata.xpt")
# character variables are converted to R factors

From Stata

# input Stata file
library(foreign)
mydata <- read.dta("c:/data.dta")

From Systat

# input Systat file (Systat data files use the .sys or .syd extension)
library(foreign)
mydata <- read.systat("c:/mydata.syd")

Accessing Data in R Library

Many R packages, including the car package, contain data sets. For example, to access the Duncan data frame from the car package, type the following commands at the R console:

library(car)
data(Duncan)
attach(Duncan)

Some Important Commands for Dataframes

data        #displays the entire data set in the console
head(data)  #displays the first 6 rows of the data frame
tail(data)  #displays the last 6 rows of the data frame
str(data)   #displays the names of the variables and their types
names(data) #shows the variable names only
rename(V1, Variable1, dataFrame=data) # renames V1 to Variable1; note that the epicalc package must be installed
ls()        #shows a list of objects that are available
attach(data)#attaches the data frame to the R search path, which makes it easy to access variables by name.
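
These commands can be tried on a data set that ships with R. The sketch below uses the built-in mtcars data and renames a column with base R only, as an alternative that does not need an add-on package; the new column name miles_per_gallon is an arbitrary choice.

```r
df <- head(mtcars, 10)   # mtcars ships with R in the datasets package

head(df, 3)              # first 3 rows
tail(df, 3)              # last 3 rows
str(df)                  # variable names and types
names(df)                # variable names only
dim(df)                  # number of rows and columns

# Rename a column using base R only (no add-on package needed)
names(df)[names(df) == "mpg"] <- "miles_per_gallon"
names(df)

# with() evaluates an expression inside the data frame without attach()
with(df, mean(miles_per_gallon))
```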