R FAQS

R Frequently Asked Questions (FAQS)

Reading, Creating, and Importing Data in R

There are many ways to read data into R. This section shows how to import data into R and how to enter small data sets by hand. R can also generate certain kinds of patterned data. Some of these methods are described below.

Reading Data from the Keyboard Directly

For small data sets (a few observations), one can enter the data directly at the R console as a vector, such as

x <- c(1,2,3,4,5)
y <-c('a', 'b', 'c')

In vector form, data can span several lines: leave out the right parenthesis (and end each unfinished line with a comma) until the data are complete, such as

x <- c(1, 2,
       3, 4)

Note that it is often more convenient to use the scan function, which prompts with the index of the next entry.

Using Scan Function

For small data sets, it is convenient to read data from the console with the scan function. Values can be entered on one or more lines, separated by spaces and/or tabs. After all the data have been entered, pressing the Enter key twice (that is, entering a blank line) ends the scan.

X <-scan()
     3 4 5
     4 5 6 7
     2 3 4 5 6 6

Reading String Data using the “what” Option

y <- scan(what=" ")
      red green blue
      white

The scan function can also be used to import data from a file. The scan function returns a list or a vector, while the read.table function returns a data frame. This means that the scan function is less useful for inputting "rectangular" data (rows of observations by columns of variables).
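The difference in return types can be seen without typing at the console. The following is a minimal sketch that uses the text argument of scan and read.table so that the input comes from a string; the values and column names are made up for illustration.

```r
# scan() returns a plain vector; text= supplies the input as a string
x <- scan(text = "3 4 5
4 5 6 7", quiet = TRUE)
is.vector(x)   # TRUE
length(x)      # 7

# read.table() returns a data frame with rows and named columns
df <- read.table(text = "a b
1 2
3 4", header = TRUE)
class(df)      # "data.frame"
```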

Reading data from ASCII or plain text files into R as Data Frame

The read.table function reads delimited ASCII (plain text) files, which may contain both numeric and character values. For such files, read.table is usually the easiest and most reliable way to get data into R. By default, fields are separated by white space (one or more spaces or tabs).

data <- read.table(file=file.choose()) # select the file from a dialog box
data <- read.table("http://itfeature.com/test.txt", header=TRUE) # read from a website

Note that the read.table function can also read data from the computer's disk when given the appropriate path in quotation marks, such as

data <- read.table("D:/data.txt", header=TRUE) # read from your computer

If some values are missing from a whitespace-delimited file, the rows end up with different numbers of fields and read.table stops with an error. The easiest fix is to use a file with an explicit delimiter (a comma, for example) and to name that delimiter with the sep argument; empty numeric fields are then read as missing values (NA).

data <- read.table("http://itfeature.com/missing_comma.txt", header=TRUE, sep=",")

Comma-delimited files can be read with the read.table function and the sep argument, but they can also be read with the read.csv function, which is written specifically for comma-delimited files. To display the contents of the data, use the print() function or type the name of the object.

data <-read.csv(file=file.choose())
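The behaviour with missing values can be checked on a small in-memory example. This is only a sketch: the text below stands in for a hypothetical comma-delimited file, and the column names are invented.

```r
# Hypothetical file contents: the second row has an empty "score" field
txt <- "id,score,group
1,10,a
2,,b
3,30,c"

# With sep = ",", the empty numeric field is read as NA
d1 <- read.table(text = txt, header = TRUE, sep = ",")

# read.csv() uses header = TRUE and sep = "," by default
d2 <- read.csv(text = txt)

is.na(d1$score[2])   # TRUE
print(d2)            # display the contents
```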

Reading in fixed formatted files

To read data in a fixed-width format, use the read.fwf function; its widths argument gives the width (number of characters) of each variable. In this format, the variable names are usually not on the first line, so they must be added after reading the data. Here the names are added with the dimnames function, where the bracket notation [[2]] indicates that the names are attached to the variables (columns) of the data frame. There are several other ways to do this task, such as the names() or colnames() functions.

data <- read.fwf("http://itfeature.com/test_fixed.txt", widths=c(8,1,3,1,1,1))
dimnames(data)[[2]] <- c("v1", "v2", "v3", "v4", "v5", "v6")
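The same steps can be tried offline. The sketch below writes two made-up fixed-width lines to a temporary file (the contents and the widths 3, 2, 1 are assumptions for illustration) and then reads and names the columns.

```r
# Two hypothetical fixed-width records: 3 + 2 + 1 characters per line
tf <- tempfile(fileext = ".txt")
writeLines(c("abc12x", "def34y"), tf)

fw <- read.fwf(tf, widths = c(3, 2, 1))

# Attach names to the columns; names(fw) <- ... would do the same
dimnames(fw)[[2]] <- c("v1", "v2", "v3")
fw

unlink(tf)   # remove the temporary file
```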

Import Data In R

Importing data from other statistical software into R is fairly simple. For Stata and Systat, use the foreign package. For SPSS and SAS, the Hmisc package is recommended for its ease of use and functionality. See the Quick-R section on packages for information on obtaining and installing these packages. Examples of importing data in R are provided below.

From Excel

On Windows systems, you can use the RODBC package to access Excel files. The first row of the Excel file should contain variable/column names.

# Excel file name is myexcel and WorkSheet name is mysheet
library(RODBC)
channel <- odbcConnectExcel("c:/myexcel.xls")
mydata <- sqlFetch(channel, "mysheet") 
odbcClose(channel)

From SPSS

# In SPSS: first save the SPSS dataset in transport format
get file='c:\data.sav'.
export outfile='c:\data.por'.

# In R
library(Hmisc)
mydata <- spss.get("c:/data.por", use.value.labels=TRUE)   # the use.value.labels option converts value labels to R factors

From SAS

# In SAS: save the SAS dataset in transport format
libname out xport 'c:/mydata.xpt';
data out.data;
set sasuser.data;
run;

# In R
library(Hmisc)
mydata <- sasxport.get("c:/mydata.xpt")
# character variables are converted to R factors

From Stata
# input Stata file
library(foreign)
mydata <- read.dta("c:/data.dta")

From Systat
# input Systat file
library(foreign)
mydata <- read.systat("c:/mydata.dta")

Accessing Data in R Library

Many R packages, including the car package, contain data sets. For example, to access the Duncan data frame from the car package, type the following commands at the R console:

library(car)
data(Duncan)
attach(Duncan)

Some Important Commands for Dataframes

data         # displays the entire data set in the console
head(data)   # displays the first 6 rows of the data frame
tail(data)   # displays the last 6 rows of the data frame
str(data)    # displays the names of the variables and their types
names(data)  # shows the variable names only
rename(V1, Variable1, dataFrame=data) # renames V1 to Variable1; note that the epicalc package must be installed
ls()         # shows a list of the objects that are available
attach(data) # attaches the data frame to the R search path, which makes it easy to access variables by name
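As a rough illustration of these commands, the sketch below uses the built-in mtcars data set as a stand-in for data. The rename is done with base R (an alternative to the epicalc function above, not the method the list describes), and with() is used in place of attach() to reach a variable by name.

```r
data <- head(mtcars, 10)   # built-in data set used as a stand-in

head(data)                 # first 6 rows
tail(data)                 # last 6 rows
str(data)                  # variable names and types
names(data)                # variable names only

# Base-R rename: change "mpg" to "miles_per_gallon"
names(data)[names(data) == "mpg"] <- "miles_per_gallon"

# Access a variable by name without attaching the data frame
avg <- with(data, mean(miles_per_gallon))
```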

Important Frequently Asked Questions about R

This post covers some frequently asked questions about the R language. These questions will help you prepare for examinations and interviews.

Frequently Asked Questions About R

Question: What is a Compiler in R Language?
Answer: A compiler is software that transforms computer code (source code) to another computer language (target language, i.e., object code).

Question: What is a package in R Language?
Answer: An R package is a collection of R functions, compiled code, sample data, and help documentation. R packages are stored in a directory called "library" in the R environment. R also installs a set of standard packages during installation.

Question: What is JIT?
Answer:
JIT stands for "Just in Time" compiler. It is a method to improve the run-time performance of a computer program by compiling code while the program is running.
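In R, byte-code compilation is provided by the compiler package that ships with R. The sketch below compiles a function explicitly with cmpfun; the function slow_sum is a made-up example, and since recent R versions enable the JIT by default, the explicit call is shown only to make the step visible.

```r
library(compiler)

# A hypothetical loop-based function
slow_sum <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + i
  total
}

# Byte-compile the function explicitly; results are unchanged
fast_sum <- cmpfun(slow_sum)
fast_sum(100)   # 5050
```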

Question: What is procedural Programming in R Language?
Answer:
Procedural programming is derived from structured programming and is based on the concept of the procedure call. Procedures are also known as routines, subroutines, or functions. A procedure contains a series of computational steps to be carried out, and any procedure may be called at any point during a program's execution.

Question: What is the recycling of elements in a vector?
Answer: When a mathematical operation (such as addition, subtraction, multiplication, or division) is performed on two vectors of different lengths, the elements of the shorter vector are reused (recycled) to complete the operation.

vect1 <- c(4, 1, 4, 5, 6, 9)
vect2 <- c(2, 5)

vect1 * vect2 

# Output:
# [1]  8  5  8 25 12 45

The elements of vect2 are recycled to complete the operation of all elements of vect1.
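When the longer length is not a multiple of the shorter one, R still recycles but issues a warning. A small sketch using the same vect1 (the four-element vector is an added example):

```r
vect1 <- c(4, 1, 4, 5, 6, 9)
vect2 <- c(2, 5)

res1 <- vect1 * vect2   # 8 5 8 25 12 45: vect2 is used three times

# Length 6 is not a multiple of 4, so R warns; the shorter vector
# is reused as 1, 2, 3, 4, 1, 2
res2 <- suppressWarnings(vect1 + c(1, 2, 3, 4))
res2                    # 5 3 7 9 7 11
```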

Question: What is the difference between a data frame and a matrix in R Language?
Answer: In R, a data frame can contain heterogeneous data (different columns may hold different types of variables), while a matrix contains homogeneous data (all elements of the matrix have the same type).
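The difference shows up when numbers and text are combined. A minimal sketch with made-up values:

```r
# In a matrix, mixed input is coerced to one common type (here character)
m <- cbind(id = 1:3, name = c("a", "b", "c"))
typeof(m)            # "character"

# In a data frame, each column keeps its own type
df <- data.frame(id = 1:3, name = c("a", "b", "c"))
sapply(df, class)    # id is integer; name is character (factor before R 4.0)
```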

See Questions about R language Missing Values

R Language: A Quick Reference – IV

R Programming: A Quick Reference

R language: A Quick Reference is about learning R Programming with a short description of the widely used commands. It will help the learner and intermediate user of the R Programming Language to get help with different functions quickly. This Quick Reference is classified into different groups. Let us start with R Language: A Quick Reference – IV.

This R Language: A Quick Reference contains R commands about performing different descriptive statistics on vectors, matrices, lists, data frames, arrays, and factors.

Basic Descriptive Statistics in R Language

The following is a list of widely used functions that are helpful in computing descriptive statistics. They are not descriptive statistics in their own right, but they serve as building blocks for computing them.

R Command                 Short Description
sum(x1, x2, ..., xn)      Computes the sum/total of the $n$ numeric values given as arguments
prod(x1, x2, ..., xn)     Computes the product of the $n$ numeric values given as arguments
min(x1, x2, ..., xn)      Gives the smallest of the $n$ values given as arguments
max(x1, x2, ..., xn)      Gives the largest of the $n$ values given as arguments
range(x1, x2, ..., xn)    Gives both the smallest and the largest of the $n$ values given as arguments
pmin(x1, x2, ...)         Returns the parallel (element-wise) minima of the input vectors
pmax(x1, x2, ...)         Returns the parallel (element-wise) maxima of the input vectors
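The difference between min/max and pmin/pmax is easiest to see on two short vectors. A small sketch with made-up values:

```r
a <- c(3, 8, 1)
b <- c(5, 2, 7)

sum(a, b)     # 26
prod(a)       # 24
min(a, b)     # 1: one overall minimum
max(a, b)     # 8: one overall maximum
range(a, b)   # 1 8
pmin(a, b)    # 3 2 1: minimum at each position
pmax(a, b)    # 5 8 7: maximum at each position
```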

Statistical Descriptive Statistics in R Language

The following is a list of functions used to compute measures of central tendency (mean and median), measures of dispersion (standard deviation and variance), and measures of location (quantiles and median).

R Command         Short Description
mean(x)           Computes the arithmetic mean of all elements in $x$
sd(x)             Computes the standard deviation of all elements in $x$
var(x)            Computes the variance of all elements in $x$
median(x)         Computes the median of all elements in $x$
quantile(x)       Computes the median, quartiles, and extremes of $x$
quantile(x, p)    Computes the quantiles of $x$ specified by $p$
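A short worked example with a made-up vector; the quantile values assume R's default quantile type.

```r
x <- c(2, 4, 6, 8, 10)

mean(x)            # 6
median(x)          # 6
var(x)             # 10 (sample variance, divisor n - 1)
sd(x)              # square root of the variance
q  <- quantile(x)  # 0%, 25%, 50%, 75%, 100%: 2 4 6 8 10
q9 <- quantile(x, 0.9)   # 90th percentile: 9.2
```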

Cumulative Summaries in R Language

The following functions are also helpful in computing the other descriptive calculations.

R Command     Short Description
cumsum(x)     Computes the cumulative sum of $x$
cumprod(x)    Computes the cumulative product of $x$
cummin(x)     Computes the cumulative minimum of $x$
cummax(x)     Computes the cumulative maximum of $x$
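Each cumulative function returns a vector of the same length as its input. A sketch with a made-up vector:

```r
x <- c(3, 1, 4, 1, 5)

cumsum(x)    # 3 4 8 9 14
cumprod(x)   # 3 3 12 12 60
cummin(x)    # 3 1 1 1 1
cummax(x)    # 3 3 4 4 5
```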

Sorting and Ordering Elements in R Language

The sorting and ordering functions are especially useful in non-parametric methods.

R Command                     Short Description
sort(x)                       Sorts all elements of $x$ in ascending order
sort(x, decreasing = TRUE)    Sorts all elements of $x$ in descending order
rev(x)                        Reverses the elements of $x$
order(x)                      Gets the ordering permutation of $x$ (the indices that sort $x$)
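The relationship between sort and order can be sketched as follows (the vector is a made-up example): order returns positions, and indexing with those positions gives the sorted vector.

```r
x <- c(30, 10, 20)

sort(x)                      # 10 20 30
sort(x, decreasing = TRUE)   # 30 20 10
rev(x)                       # 20 10 30
idx <- order(x)              # 2 3 1: positions of the smallest, next, largest
x[idx]                       # 10 20 30, the same as sort(x)
```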

Sequence and Repetition of Elements in R Language

These functions are used to generate a sequence of numbers or repeat the set of numbers $n$ times.

R Command              Short Description
a:b                    Generates a sequence of numbers from $a$ to $b$ in steps of size 1
seq(n)                 Generates a sequence of numbers from 1 to $n$
seq(a, b)              Generates a sequence of numbers from $a$ to $b$ in steps of size 1; it is the same as a:b
seq(a, b, by=s)        Generates a sequence of numbers from $a$ to $b$ in steps of size $s$
seq(a, b, length=n)    Generates a sequence of $n$ equally spaced numbers from $a$ to $b$
rep(x, n)              Repeats the whole of $x$ $n$ times
rep(x, each=n)         Repeats each element of $x$ $n$ times
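A few concrete calls, with made-up arguments, show the difference between the forms. The sketch spells the length argument out in full as length.out.

```r
2:5                          # 2 3 4 5
seq(4)                       # 1 2 3 4
seq(2, 5)                    # 2 3 4 5, the same as 2:5
seq(0, 1, by = 0.25)         # 0.00 0.25 0.50 0.75 1.00
seq(0, 10, length.out = 3)   # 0 5 10
rep(c(1, 2), 3)              # 1 2 1 2 1 2
rep(c(1, 2), each = 2)       # 1 1 2 2
```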


R Language: A Quick Reference – III

R Programming: A Quick Reference

R language: A Quick Reference is about learning R Programming with a short description of the widely used commands. It will help the learner and intermediate user of the R Programming Language to get help with different functions quickly. This Quick Reference is classified into different groups. Let us start with R Language: A Quick Reference – III.

This R Language: A Quick Reference contains R commands about subsetting of vectors, matrices, lists, data frames, arrays, and factors. It also discusses setting the different properties related to R language data types.

Subsetting Vectors in R Language

The following are ways to subset or slice the values from a vector.

R Command            Short Description
x[1:5]               Select elements of $x$ by index
x[-(1:5)]            Exclude elements of $x$ by index
x[c(TRUE, FALSE)]    Select elements of $x$ corresponding to the TRUE values
x[c("a", "b")]       Select elements of $x$ by name
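The four forms can be sketched on a small named vector (the names and values are made up):

```r
x <- c(a = 10, b = 20, c = 30, d = 40, e = 50, f = 60)

by_index   <- x[1:2]              # elements a and b
by_exclude <- x[-(1:4)]           # everything except the first four: e and f
by_logical <- x[c(TRUE, FALSE)]   # the logical index is recycled: a, c, e
by_name    <- x[c("a", "b")]      # the same elements as by_index
```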

Subsetting Lists in R Language

The following methods are used to subset or slice a list in R Language.

R Command            Short Description
x[1:5]               Extracts a sublist of the list $x$
x[-(1:5)]            Extracts a sublist by excluding elements of list $x$
x[c(TRUE, FALSE)]    Extracts a sublist with logical subscripts
x[c("a", "b")]       Extracts a sublist by name
x[[2]]               Extracts an element of the list $x$
x[["a"]]             Extracts the element with the name "a" from list $x$
x$a                  Extracts the element with the name "a" from list $x$
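The key distinction is that single brackets return a list, while double brackets and $ return the element itself. A sketch with a made-up list:

```r
lst <- list(a = 1:3, b = "text", c = TRUE)

sub    <- lst[1:2]              # a list holding elements a and b
by_lgl <- lst[c(TRUE, FALSE)]   # recycled logical index: elements a and c
elem   <- lst[[2]]              # the element itself: "text"

identical(lst[["a"]], lst$a)    # TRUE: both give the vector 1:3
```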

Subsetting Matrices in R Language

To subset or extract certain elements from a matrix, use the forms described below.

R Command      Short Description
x[i, j]        Extracts the elements of matrix $x$ specified by row $i$ and column $j$
x[i, j] = v    Sets or resets the elements of matrix $x$ specified by row $i$ and column $j$
x[i, ]         Extracts the $i$th row of matrix $x$
x[i, ] = v     Sets or resets the $i$th row of matrix $x$
x[ , j]        Extracts the $j$th column of matrix $x$
x[ , j] = v    Sets or resets the $j$th column of matrix $x$
x[i]           Subsets matrix $x$ as a vector (elements are counted column by column)
x[i] = v       Sets or resets the $i$th element (treated as a vector operation)
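A small sketch on a made-up 2 x 3 matrix; note that R fills and counts matrix elements column by column.

```r
m <- matrix(1:6, nrow = 2)   # columns are (1, 2), (3, 4), (5, 6)

cell  <- m[2, 3]   # 6
row1  <- m[1, ]    # 1 3 5
col2  <- m[, 2]    # 3 4
fourth <- m[4]     # 4: the fourth element in column-major order

m[1, ] <- 0        # reset the first row
m[, 2]             # 0 4
```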

Subsetting a Data Frame in R Language

One can easily subset or slice a Data Frame in R.

R Command                         Short Description
df[i, j]                          Matrix-style subsetting of a data frame, specified by the $i$th row and $j$th column
df[i, j] = dfv                    Sets or resets a subset of a data frame
subset(df, subset = i)            Subset of the cases/observations for which the logical condition $i$ is true
subset(df, select = i)            Subset of the $i$ variables/columns of a data frame
subset(df, subset=i, select=j)    Subset of the $i$ cases and $j$ variables of a data frame
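The following sketch applies these forms to a small made-up data frame; the column names id, score, and grp are assumptions for illustration.

```r
df <- data.frame(id    = 1:4,
                 score = c(55, 70, 85, 90),
                 grp   = c("a", "b", "a", "b"))

one_cell <- df[2, "score"]                              # 70
rows     <- subset(df, subset = score > 60)             # 3 rows
cols     <- subset(df, select = c(id, score))           # 2 columns
both     <- subset(df, subset = grp == "a", select = score)  # scores 55 and 85
```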
