Import Data in R, Reading, and Creating Data

There are many ways to read data into R. Here we will also learn how to import data from other statistical software into R, and how R can generate certain kinds of patterned data. Some of these ways are:

Reading Data from the Keyboard Directly

For small data sets (a few observations), one can enter data in vector form directly at the R console, for example:

x <- c(1, 2, 3, 4, 5)
y <- c('a', 'b', 'c')

Data in vector form can also span several lines: leave out the closing parenthesis (ending each unfinished line with a comma) until all the data have been entered, for example:

x <- c(1, 2,
       3, 4)

Note that it is often more convenient to use the scan function, which prompts you with the index of the next entry.

Using Scan Function

For small data sets, it is convenient to read data from the console using the scan function. Values can be entered on one or more lines, separated by spaces and/or tabs. After all the data have been entered, pressing Enter on an empty line (i.e., pressing Enter twice) terminates the scan.

X <- scan()
1:   3 4 5
4:   4 5 6 7
8:   2 3 4 5 6 6
14:
Read 13 items
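The console session above is interactive, so it cannot be pasted into a script as-is. As a minimal sketch for trying the same input non-interactively, scan's text argument can stand in for the typed lines (the string below simply mirrors the session above; quiet = TRUE only suppresses the "Read 13 items" message):

```r
# Each "\n" plays the role of pressing Enter at the scan() prompt
X <- scan(text = "3 4 5\n4 5 6 7\n2 3 4 5 6 6", quiet = TRUE)

length(X)  # 13 values, as in the console session
X
```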

Reading String Data using the “what” Option

y <- scan(what=" ")
1:    red green blue
4:    white
5:
Read 4 items

The scan function can also be used to import data from a file. However, scan returns a vector (or a list), whereas the read.table function returns a data frame. This makes scan less useful for inputting "rectangular" (table-like) data.
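To see the difference in return types, here is a small sketch on made-up inline data. It reads the data through the text argument rather than a file, an assumption made only so the example is self-contained:

```r
txt <- "red 1\ngreen 2\nblue 3"

# scan() with a character template returns one flat character vector
v <- scan(text = txt, what = "", quiet = TRUE)
class(v)   # "character"

# scan() with a list template returns a list, one component per field
l <- scan(text = txt, what = list(colour = "", value = 0), quiet = TRUE)
class(l)   # "list"

# read.table() returns a data frame with one column per field
df <- read.table(text = txt, col.names = c("colour", "value"))
class(df)  # "data.frame"
```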

Reading data from ASCII or plain text files into R as a Data Frame

The read.table function reads delimited ASCII (plain-text) files containing numeric and/or character values and returns a data frame. It is one of the easiest and most reliable ways to read data into R. By default, fields are separated by white space (one or more blanks or tabs).

data <- read.table(file=file.choose()) #select from dialog box

data <- read.table("http://itfeature.com/test.txt", header=TRUE) # read from a website

The read.table function can also read data from your computer's disk if you provide the file path in quotation marks, for example:

data <- read.table("D:/data.txt", header=TRUE) # read from your computer
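As a self-contained sketch of the default white-space delimiter, the made-up data below is passed through read.table's text argument (an assumption for illustration; in practice you would give a file path or URL as above). The columns are separated by differing numbers of spaces and one tab, and read.table still splits them correctly:

```r
# Hypothetical data: columns separated by different amounts of white space
txt <- "name  age height
ali   25  170
sara\t30  165"

people <- read.table(text = txt, header = TRUE)  # default sep = "" means any white space
people
str(people)  # check the type of each column
```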

If some values are missing, read.table with the default white-space delimiter may fail with an error, because an empty field cannot be detected between blanks and the row simply looks short. The easiest fix is to use a file with an explicit delimiter (such as a comma) and specify it with the sep argument:

data <- read.table("http://itfeature.com/missing_comma.txt", header=TRUE, sep=",")
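A small self-contained sketch of the problem and the fix, using read.table's text argument instead of a file (both tiny data sets are made up for illustration): with white space as the separator the second row is just too short, while a comma-delimited row marks the gap with an empty field, which read.table reads as NA in a numeric column.

```r
# Hypothetical data: the first data row has no value for b
ws_txt  <- "a b\n1\n3 4"   # white-space delimited: the gap is invisible
csv_txt <- "a,b\n1,\n3,4"  # comma delimited: the empty field is explicit

# With the default white-space separator, the short row causes an error
res <- tryCatch(read.table(text = ws_txt, header = TRUE),
                error = function(e) e)
inherits(res, "error")  # TRUE

# With sep = ",", the empty numeric field is read as NA
ok <- read.table(text = csv_txt, header = TRUE, sep = ",")
ok$b  # NA 4
```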

Comma-delimited files can be read with read.table and the sep argument, but they can also be read with the read.csv function, which is written specifically for comma-delimited files (it assumes header = TRUE by default). To display the contents of the resulting data frame, use the print() function or simply type the object's name.

data <- read.csv(file=file.choose() )
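For simple comma-delimited data, read.csv behaves like read.table with header = TRUE and sep = "," preset. A quick sketch checking this on made-up inline data (the text argument is used only to keep the example self-contained):

```r
csv_txt <- "id,score\n1,90\n2,85"

a <- read.csv(text = csv_txt)                              # header = TRUE, sep = "," by default
b <- read.table(text = csv_txt, header = TRUE, sep = ",")  # same settings spelled out

all.equal(a, b)  # TRUE
print(a)         # or just type a
```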

Reading in fixed formatted files

To read data in fixed format, use the read.fwf function; its widths argument gives the width (number of character columns) of each variable. In this format, variable names are not in the first line, so they must be added after reading the data. One way is to assign them with the dimnames function, where the [[2]] index indicates that we are attaching names to the variables (columns) of the data frame. There are several other ways to do this task (for example, names() or the col.names argument of read.fwf).

data <- read.fwf("http://itfeature.com/test_fixed.txt", widths = c(8,1,3,1,1,1))

dimnames(data)[[2]] <- c("v1", "v2", "v3", "v4", "v5", "v6")
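Since the file above may not always be reachable, here is a self-contained sketch that writes two made-up fixed-width records to a temporary file (the 3-character id, 2-digit score and 1-letter grade layout is hypothetical) and reads them back, naming the columns both afterwards and while reading:

```r
# Hypothetical fixed-width records: 3-character id, 2-digit score, 1-letter grade
f <- tempfile(fileext = ".txt")
writeLines(c("00112A", "00234B"), f)

fixed_data <- read.fwf(f, widths = c(3, 2, 1))
dimnames(fixed_data)[[2]] <- c("id", "score", "grade")  # name the columns afterwards

# Alternative: supply the names while reading
fixed_data2 <- read.fwf(f, widths = c(3, 2, 1),
                        col.names = c("id", "score", "grade"))

fixed_data
unlink(f)  # remove the temporary file
```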

Import Data In R

Importing data from other statistical software into R is fairly simple. For Stata and Systat, use the foreign package. For SPSS and SAS, the Hmisc package is recommended for its ease of use and functionality. See the Quick-R section on packages for information on obtaining and installing these packages. Examples of importing data into R are provided below.

From Excel

On Windows systems, you can use the RODBC package to access Excel files (odbcConnectExcel works only with 32-bit R). The first row of the Excel file should contain the variable (column) names.

# Excel file name is myexcel and WorkSheet name is mysheet
library(RODBC)
channel <- odbcConnectExcel("c:/myexcel.xls")
mydata <- sqlFetch(channel, "mysheet") 
odbcClose(channel)

From SPSS

# First, save the SPSS dataset in transport format (SPSS syntax)
get file='c:\data.sav'.
export outfile='c:\data.por'.

# in R
library(Hmisc)
mydata <- spss.get("c:/data.por", use.value.labels=TRUE)
# the use.value.labels option converts value labels to R factors

From SAS

# save the SAS dataset in transport format (SAS code)
libname out xport 'c:/mydata.xpt';
data out.data;
set sasuser.data;
run;

# in R
library(Hmisc)
mydata <- sasxport.get("c:/mydata.xpt")
# character variables are converted to R factors

From Stata

# input Stata file (read.dta reads files from Stata versions 5 to 12)
library(foreign)
mydata <- read.dta("c:/data.dta")

From Systat

# input Systat file
library(foreign)
mydata <- read.systat("c:/mydata.syd")
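If no Stata file is at hand, the foreign functions can still be tried out. This sketch writes a made-up data frame to a temporary .dta file with write.dta and reads it back with read.dta; the data and the temporary file are assumptions for illustration only:

```r
library(foreign)  # ships with R as a recommended package

# Write a small hypothetical data frame to a temporary Stata file
df <- data.frame(id = 1:3, group = c("a", "b", "c"))
f <- tempfile(fileext = ".dta")
write.dta(df, f)

# Read it back as you would a real Stata file
mydata <- read.dta(f)
str(mydata)
unlink(f)  # remove the temporary file
```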

Accessing Data in R Library

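Many R packages, including the car package, contain data sets. For example, to access the Duncan data frame from the car package, type the following commands at the R console (in recent versions the data sets live in the companion carData package, which library(car) loads automatically):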

library(car)
data(Duncan)
attach(Duncan)

Some Important Commands for Dataframes

data         # prints the entire data set in the console
head(data)   # displays the first 6 rows of the data frame
tail(data)   # displays the last 6 rows of the data frame
str(data)    # displays the structure: variable names, their types, and the first few values
names(data)  # shows the variable names only
rename(V1, Variable1, dataFrame=data) # renames V1 to Variable1; requires the epicalc package to be installed and loaded
ls()         # lists the objects in the current workspace
attach(data) # attaches the data frame to the R search path, so its variables can be accessed by name
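Here is a sketch of several of these commands on a small hypothetical data frame with default-style column names. Since epicalc may not be available, the renaming step uses a plain base-R alternative (swapped in for illustration, not the epicalc rename function), and with() is shown as one way to refer to columns by name without attaching the data frame:

```r
# A small hypothetical data frame with default-style column names
data <- data.frame(V1 = 1:10, V2 = letters[1:10])

head(data)   # first 6 rows
tail(data)   # last 6 rows
str(data)    # variable names and types
names(data)  # "V1" "V2"

# Base-R way to rename V1 to Variable1 without the epicalc package
names(data)[names(data) == "V1"] <- "Variable1"
names(data)  # "Variable1" "V2"

# with() gives name-based access to columns without attach()
with(data, mean(Variable1))  # 5.5
```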

Important R Programming MCQs 11

This post is an R Programming MCQs Quiz with Answers. The quiz covers MCQs about RStudio, data analysis in R, and some basics of the R programming language. Let us start with the R Programming MCQs Quiz.

Online Multiple Choice Questions about R Programming Language

1. Many data analysts prefer to use a programming language for which of the following reasons?

2. RStudio includes which of the following panes?

3. A data analyst is searching for a tool that will allow them to communicate instructions that a computer can run. What tool should they use?

4. What type of software application is RStudio?

5. The R programming language can be used for which of the following tasks?

6. Which of the following are the benefits of open-source code? Select all that apply.

7. A data analyst is working with spreadsheet data. The analyst imports the data from the spreadsheet into RStudio. Where in RStudio can the analyst find the imported data?

8. What tool gives data analysts the highest level of control over their data analysis?

9. What are the benefits of using a programming language for data analysis?

10. An analyst includes the following calculation in their R programming: midyear_sales <- (quarter_1_sales + quarter_2_sales) - overhead_costs. Which variable will the total from this calculation be assigned to?

11. What are the benefits of using a programming language to work with your data?

12. A data analyst wants to use a programming language that they can modify. What type of programming language should they use?

13. A data analyst needs to quickly create a series of scatterplots to visualize a very large dataset. What should they use for the analysis?

14. A data analyst wants to write R code that they can access again after they close their current session in RStudio. Where should they write their code?

15. When using RStudio, what does the installed.packages() function do?

16. If you write code directly in the R source editor, RStudio can save your code when you close your current session.

17. In data analytics, what is CRAN?

18. Why do analysts use comments in R programming?

19. Programming involves _____ a computer to perform an action or set of actions.

20. A data analyst is searching for an open-source tool that will allow them to reproduce every step of their analysis, including data cleaning and transformations, calculations, and visualizations. What tool is the best option?

Binomial Random Numbers Generation in R

In this tutorial, we will learn how to generate Bernoulli and binomial random numbers (from the binomial distribution) in R, using the example of flipping a coin. R can generate random numbers from many statistical probability distributions; here our focus is on generating binomial random numbers.

Binomial Random Numbers in R

In a Bernoulli trial, an event either happens or it does not. For example, a coin flip has two outcomes, head or tail (either a head occurs, or it does not, i.e., a tail occurs). For an unbiased coin, each outcome has probability 0.5, so in the long run heads occur about 50% of the time. A binomial random number counts the successes in a given number of independent Bernoulli trials. To generate binomial random numbers in R, use the rbinom(n, size, prob) function.

rbinom(n, size, prob)   # the function has three parameters

where
$n$ is the number of observations (how many random numbers to generate)
$size$ is the number of trials (zero or more)
$prob$ is the probability of success on each trial, for example 1/2
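A short sketch of what the three parameters control. set.seed() is added here (not in the original) only so the run is reproducible; the specific values n = 8 and size = 4 are arbitrary choices for illustration:

```r
set.seed(123)  # fix the random seed so the results can be reproduced

# n = 8 observations, each the number of successes in size = 4 trials with prob = 0.5
x <- rbinom(n = 8, size = 4, prob = 0.5)
length(x)             # 8: one value per observation
all(x >= 0 & x <= 4)  # TRUE: each value lies between 0 and size

# size = 1 gives Bernoulli (0/1) outcomes, e.g. one coin flip per observation
flips <- rbinom(n = 10, size = 1, prob = 0.5)
all(flips %in% c(0, 1))  # TRUE
```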

Examples of Generating Binomial Random Numbers

  • One coin is tossed 10 times with a probability of success = 0.5, i.e. the coin is fair (unbiased, p = 1/2)
    rbinom(n=10, size=1, prob=1/2)
    OUTPUT (random; differs from run to run): 1 1 0 0 1 1 1 1 0 1
  • Two coins are tossed 10 times with a probability of success = 0.5 (each value is the number of heads)
    rbinom(n=10, size=2, prob=1/2)
    OUTPUT (random; differs from run to run): 2 1 2 1 2 0 1 0 0 1
  • One coin is tossed one hundred thousand times with a probability of success = 0.5
    rbinom(n=100000, size=1, prob=1/2)
  • Five coins are tossed one hundred thousand times, and the simulation results are stored in the vector $x$
    x <- rbinom(n=100000, size=5, prob=1/2)
    # count the 1's in the x vector
    sum(x == 1)
    # find the frequency distribution
    table(x)
    # create a relative frequency (percentage) table
    pct <- table(x) / length(x) * 100
    # plot the relative frequency (probability) distribution
    plot(table(x) / length(x), ylab = "Probability", main = "size=5, prob=0.5")
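To check that the simulated frequencies behave as expected, they can be compared with the exact binomial probabilities from dbinom(). This sketch adds set.seed() for reproducibility and uses factor() so that every outcome 0 to 5 appears in the table even if it never occurred:

```r
set.seed(1)
n <- 100000
x <- rbinom(n = n, size = 5, prob = 0.5)

# Simulated relative frequencies for each possible outcome 0..5
empirical <- as.vector(table(factor(x, levels = 0:5))) / n
# Exact binomial probabilities for the same outcomes
theoretical <- dbinom(0:5, size = 5, prob = 0.5)

round(rbind(empirical, theoretical), 4)
max(abs(empirical - theoretical))  # small; shrinks as n grows
mean(x)                            # close to size * prob = 2.5
```

With one hundred thousand observations, the two rows should agree to roughly two decimal places.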