There are many ways to read data into R, and it is also possible to generate certain kinds of patterned data. This tutorial covers the most common ways to import data in R. Some of them are:
Reading Data from the Keyboard Directly
For small data sets (a few observations), data can be entered in vector form directly at the R console, for example:
x <- c(1, 2, 3, 4, 5)
y <- c('a', 'b', 'c')
In vector form, the data can span several lines: as long as the closing parenthesis has not been typed, R keeps waiting for more input, for example:
x <- c(1, 2,
       3, 4)
Note that the scan function is often more convenient, because it prompts with the index of the next entry.
Using Scan Function
For small data sets, it is convenient to read data from the console with the scan function. Values can be typed on one or more lines, separated by spaces and/or tabs. Once all the data have been entered, pressing the Enter key on an empty line (that is, pressing Enter twice) ends the input.
X <- scan()
1: 3 4 5
4: 4 5 6 7
8: 2 3 4 5 6 6
14:
Read 13 items
Reading String Data using the “what” Option
y <- scan(what=" ")
1: red green blue
4: white
5:
Read 4 items
The scan function can also be used to import data. It returns a vector or a list, while the read.table function returns a data frame. This makes the scan function less convenient for inputting “rectangular” data.
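The difference between the two return types can be checked without typing anything at the console. The following is a minimal sketch, assuming the text= argument is used to supply the input that would otherwise be typed or read from a file; the values and the column names id and colour are made up for illustration.

```r
# Non-interactive sketch: text = stands in for keyboard or file input
x <- scan(text = "3 4 5 4 5 6 7", quiet = TRUE)                    # numeric vector
y <- scan(text = "red green blue white", what = "", quiet = TRUE)  # character vector

# The same kind of input read as a rectangular table (illustrative values)
df <- read.table(text = "id colour
1 red
2 green", header = TRUE)

is.vector(x)       # scan() returns a plain vector
is.data.frame(df)  # read.table() returns a data frame
```

Here quiet = TRUE only suppresses the "Read n items" message.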
Reading data from ASCII or plain text files into R as Data Frame
The read.table function reads any delimited ASCII file, which may contain both numeric and character values. It is the easiest and most reliable way to read such data into R. By default, fields are separated by white space (one or more spaces or tabs).
data <- read.table(file = file.choose())  # select the file from a dialog box
data <- read.table("http://itfeature.com/test.txt", header = TRUE)  # read from a web site
Note that the read.table function can also read a file from the computer disk when given the appropriate path in quotation marks, such as
data <- read.table("D:/data.txt", header = TRUE)  # read from your computer
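To try read.table without depending on a particular drive or web site, a small file can be written to a temporary location first. This is only a sketch: the file contents, the object name people, and the column names are hypothetical.

```r
# Write a small white-space-delimited file to a temporary location (made-up data)
path <- tempfile(fileext = ".txt")
writeLines(c("name  age height",
             "Ali   23  170.5",
             "Sara  31  162.0"), path)

# header = TRUE takes the first line as the column names;
# the default sep = "" splits fields on any run of spaces or tabs
people <- read.table(path, header = TRUE)
str(people)
```

Because the default separator is any amount of white space, the uneven spacing in the file does not matter.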
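When a white-space-delimited file has blank entries for missing values, read.table cannot tell which field is missing, and it typically stops with an error saying that a line did not have the expected number of elements. The easiest way to fix this error is to use a file with an explicit delimiter, such as a comma, and to specify that delimiter with the sep argument, so that an empty field is read as a missing value.
data <- read.table("http://itfeature.com/missing_comma.txt", header = TRUE, sep = ",")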
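The effect of the delimiter on missing values can be reproduced with a few in-memory lines. The sketch below uses made-up data and hypothetical object names; na.strings = c("NA", "") is an extra assumption added here so that empty character fields are also treated as missing.

```r
# White-space-delimited input with a blank entry: the second data line has only two fields
ws_result <- tryCatch(
  read.table(text = c("id score group", "1 10 a", "2  b"), header = TRUE),
  error = function(e) conditionMessage(e)  # keep the error message instead of stopping
)
ws_result

# Comma-delimited input: empty fields keep their position and become NA
csv_lines <- c("id,score,group",
               "1,10,a",
               "2,,b",    # score is missing
               "3,30,")   # group is missing
scores <- read.table(text = csv_lines, header = TRUE, sep = ",",
                     na.strings = c("NA", ""))
scores
```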
Comma-delimited files can be read with the read.table function and the sep argument, but they can also be read with the read.csv function, which is written specifically for comma-delimited files. To display the imported data, use the print() function or type the name of the object.
data <- read.csv(file = file.choose())
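As a quick check that the two approaches agree, the same file can be read both ways and the results compared. This is a sketch with a hypothetical temporary file and made-up values.

```r
# Create a small comma-delimited file in a temporary location (made-up data)
path <- tempfile(fileext = ".csv")
writeLines(c("city,population",
             "Lahore,11",
             "Multan,2"), path)

a <- read.csv(path)                               # header = TRUE and sep = "," are the defaults
b <- read.table(path, header = TRUE, sep = ",")   # same file, delimiter given explicitly

all.equal(a, b)  # compares the two data frames
print(a)         # typing a at the console displays the same thing
```

For more complicated files the two functions can differ, since read.csv also changes other defaults such as quote, fill, and comment.char.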
Reading in fixed formatted files
To read data in fixed format, use the read.fwf function; its widths argument gives the width (number of characters) of each variable. In this format the variable names are not in the first line of the file, so they must be added after reading the data. Here the names are added with the dimnames function, using bracket notation to indicate that the names are attached to the variables (columns) of the data. There are several other ways to do this task, for example with names() or colnames().
data <- read.fwf("http://itfeature.com/test_fixed.txt", widths = c(8, 1, 3, 1, 1, 1))
dimnames(data)[[2]] <- c("v1", "v2", "v3", "v4", "v5", "v6")
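One of the other ways to name the columns is to pass col.names directly to read.fwf. The sketch below assumes a hypothetical fixed-width file in which every line is 15 characters long, matching the widths 8, 1, 3, 1, 1, 1; the contents are made up.

```r
# Two fixed-width records: 8 + 1 + 3 + 1 + 1 + 1 = 15 characters each (made-up data)
path <- tempfile(fileext = ".txt")
writeLines(c("20240101A123101",
             "20240102B456010"), path)

# widths splits each line into fields; col.names names them while reading
fixed <- read.fwf(path, widths = c(8, 1, 3, 1, 1, 1),
                  col.names = c("v1", "v2", "v3", "v4", "v5", "v6"))
fixed
```

Naming the columns at read time avoids a separate renaming step, but assigning names afterwards gives the same column names.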
Import Data In R
Importing data into R is fairly simple. For Stata and Systat, use the foreign package. For SPSS and SAS, the Hmisc package is recommended for its ease of use and functionality. See the Quick-R section on packages for information on obtaining and installing these packages. Examples of importing data in R are provided below.
From Excel
On Windows systems, you can use the RODBC package to access Excel files. The first row of the Excel file should contain variable/column names.
# Excel file name is myexcel and worksheet name is mysheet
library(RODBC)
channel <- odbcConnectExcel("c:/myexcel.xls")
mydata <- sqlFetch(channel, "mysheet")
odbcClose(channel)
From SPSS
# First save the SPSS dataset in transport format (SPSS syntax)
get file = 'c:\data.sav'.
export outfile = 'c:\data.por'.
# Then, in R
library(Hmisc)
mydata <- spss.get("c:/data.por", use.value.labels = TRUE)
# the "use.value.labels" option converts value labels to R factors
From SAS
# save the SAS dataset in transport format (SAS code)
libname out xport 'c:/mydata.xpt';
data out.data;
set sasuser.data;
run;
# in R
library(Hmisc)
mydata <- sasxport.get("c:/mydata.xpt")
# character variables are converted to R factors
From Stata
# input Stata file
library(foreign)
mydata <- read.dta("c:/data.dta")
From Systat
# input Systat file (Systat data files normally have a .sys or .syd extension)
library(foreign)
mydata <- read.systat("c:/mydata.syd")
Accessing Data in R Library
Many R packages, including the car package, contain data sets. For example, to access the Duncan data frame from the car package, type the following commands at the R console:
library(car)
data(Duncan)
attach(Duncan)
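The same pattern works for data sets that ship with R itself, which is handy when the car package is not installed. The sketch below uses mtcars from the built-in datasets package as a stand-in for Duncan, and uses with() instead of attach() as one possible way to refer to the variables by name.

```r
# Load a data set that comes with base R (stand-in for Duncan)
data("mtcars", package = "datasets")
head(mtcars, 3)  # first three rows

# with() evaluates an expression inside the data frame without attaching it
mean_mpg <- with(mtcars, mean(mpg))
mean_mpg
```

Calling data(package = "datasets") lists the data sets available in a package.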
Some Important Commands for Dataframes
data          # displays the entire data set in the console
head(data)    # displays the first 6 rows of the data frame
tail(data)    # displays the last 6 rows of the data frame
str(data)     # displays the variable names and their types
names(data)   # shows the variable names only
rename(V1, Variable1, dataFrame = data)  # renames V1 to Variable1; note that the epicalc package must be installed
ls()          # shows a list of the objects that are available
attach(data)  # attaches the data frame to the R search path, which makes its variables easy to access by name
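These commands can be tried on a small data frame built in memory. The sketch below uses a hypothetical data frame df, and renames a column with base R's names() as an alternative that does not need the epicalc package.

```r
# A small made-up data frame with default-style column names
df <- data.frame(V1 = 1:8, V2 = letters[1:8])

first_rows <- head(df)  # first 6 rows
last_rows  <- tail(df)  # last 6 rows
str(df)                 # variable names and their types

# Base R renaming: replace the name "V1" with "Variable1"
names(df)[names(df) == "V1"] <- "Variable1"
names(df)
```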