Reading and Writing Data in R

Reading Data in R

The following functions read (import) data into R.

  • read.table() and read.csv() for reading tabular data
  • readLines() for reading lines of a text file
  • source() for reading in R code files (inverse of dump)
  • dget() for reading in R code files (inverse of dput)
  • load() for reading in saved workspaces.

Writing Data to files

The following functions write (export) data to files.

  • write.table() and write.csv() export data to delimited text files, including CSV and tab-delimited formats.
  • writeLines() writes text lines to a text-mode connection.
  • dump() takes a vector of names of R objects and produces text representations of the objects on a file (or connection). A dump file can usually be sourced into another R session.
  • dput() writes an ASCII text representation of an R object to a file (or connection); dget() uses one to recreate the object.
  • save() writes an external representation of R objects to the specified file.
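
The read and write functions listed above come in pairs. The following is a minimal sketch of each round trip; the object names, file names, and the use of a temporary directory are assumptions made for illustration only.

```r
# Work in a temporary directory so that no existing files are touched.
tmp <- tempfile("rw_demo_")
dir.create(tmp)

x <- data.frame(a = 1:3, b = c("u", "v", "w"))
y <- c(2.5, 3.5)

# dput() / dget(): text representation of a single object
dput(x, file = file.path(tmp, "x.R"))
x_dget <- dget(file.path(tmp, "x.R"))

# dump() / source(): several objects, recreated under their own names
dump(c("x", "y"), file = file.path(tmp, "objs.R"))
rm(y)
source(file.path(tmp, "objs.R"), local = TRUE)  # y exists again

# save() / load(): external (binary) representation
save(x, y, file = file.path(tmp, "objs.RData"))
rm(x, y)
load(file.path(tmp, "objs.RData"))              # x and y exist again

# writeLines() / readLines(): plain lines of text
writeLines(c("first line", "second line"), file.path(tmp, "notes.txt"))
lines_back <- readLines(file.path(tmp, "notes.txt"))
```

Note that dget() returns the object, so the result has to be assigned, whereas source() and load() recreate the objects under the names they were saved with.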

Reading data files with read.table()

The read.table() function is one of the most commonly used functions for reading data into R. It has a few important arguments.

  • file, the name of a file, or a connection
  • header, logical indicating if the file has a header line
  • sep, a string indicating how the columns are separated
  • colClasses, a character vector indicating the class of each column in the data set
  • nrows, the maximum number of rows to read in
  • comment.char, a character string indicating the comment character
  • skip, the number of lines to skip from the beginning
  • stringsAsFactors, should character variables be coded as factors?
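
The arguments above can be combined in a single call. The sketch below first writes a small semicolon-separated file to a temporary location and then reads it back; the file contents and the column names (id, name, score) are hypothetical.

```r
# Create a small example file (hypothetical contents).
path <- tempfile(fileext = ".txt")
writeLines(c(
  "exported by a hypothetical instrument",  # dropped by skip = 1
  "% a comment line",                       # dropped by comment.char = "%"
  "id;name;score",
  "1;anna;9.5",
  "2;ben;7.25",
  "3;cara;8.0"
), path)

scores <- read.table(
  path,
  header = TRUE,             # the first line read holds the variable names
  sep = ";",                 # columns are separated by semicolons
  skip = 1,                  # skip one line at the beginning
  comment.char = "%",        # lines starting with % are comments
  colClasses = c("integer", "character", "numeric"),
  nrows = 2,                 # read at most two rows of data
  stringsAsFactors = FALSE   # keep character columns as character
)
scores
```

Supplying colClasses and nrows is optional, but it spares R the work of guessing the column types and the size of the table.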

read.table() and read.csv() Examples

data <- read.table("foo.txt")
data <- read.table("D:\\datafiles\\mydata.txt")
data <- read.csv("D:\\datafiles\\mydata.csv")

With its default arguments, read.table() skips lines that begin with a #, works out how many rows there are (and how much memory needs to be allocated), and determines the type of variable in each column of the table. Note that read.csv() does not treat # as a comment character by default.
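
This default behaviour can be checked on a small file. The following sketch assumes a hypothetical space-separated file written to a temporary location.

```r
path <- tempfile(fileext = ".txt")
writeLines(c(
  "# this line is treated as a comment and skipped",
  "10 2.5 apple TRUE",
  "20 3.5 pear FALSE"
), path)

d <- read.table(path)   # all arguments left at their defaults
col_types <- sapply(d, class)
col_types
```

Here V1 is read as integer, V2 as numeric, and V4 as logical. V3 is read as character in R 4.0.0 and later, and as a factor in earlier versions, where stringsAsFactors defaulted to TRUE.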

Writing data files with write.table()

The following are a few important arguments of the write.table() function.

  • x, the object to be written, typically a data frame
  • file, the name of the file which the data are to be written to
  • sep, the field separator string
  • col.names, a logical value indicating whether the column names of x are to be written along with x, or a character vector of column names to be written
  • row.names, a logical value indicating whether the row names of x are to be written along with x, or a character vector of row names to be written
  • na, the string to use for missing values in the data
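
As an illustration of these arguments, the sketch below writes a small data frame with a missing value to a temporary tab-delimited file and then shows the lines that were written. The data frame and the missing-value code -999 are hypothetical; quote = FALSE is an additional write.table() argument, used here so that the column names are written without quotation marks.

```r
x <- data.frame(id = 1:3, value = c(1.5, NA, 3.5))
path <- tempfile(fileext = ".txt")

write.table(
  x,
  file = path,
  sep = "\t",          # tab-delimited
  col.names = TRUE,    # write the column names as the first line
  row.names = FALSE,   # do not write the row names
  na = "-999",         # string used for missing values
  quote = FALSE        # no quotation marks around names or strings
)

written <- readLines(path)
written
```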

write.table() and write.csv() Examples

x <- data.frame(a=5, b=10, c=pi)
write.table(x, file = "data.csv", sep = ",")
write.table(x, "c:\\mydata.txt", sep = "\t")
write.csv(x, file = "data.csv")
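
The two ways of writing a comma-separated file shown above do not produce identical files. The following sketch, which writes to temporary files instead of data.csv, compares the header lines: write.table() writes only the column names, while write.csv() adds an empty field for the row-names column.

```r
x <- data.frame(a = 5, b = 10, c = pi)
f_table <- tempfile(fileext = ".csv")
f_csv   <- tempfile(fileext = ".csv")

write.table(x, file = f_table, sep = ",")
write.csv(x, file = f_csv)

header_table <- readLines(f_table)[1]   # three fields: "a","b","c"
header_csv   <- readLines(f_csv)[1]     # four fields:  "","a","b","c"

# Read the CSV back, using the first column as row names.
back <- read.csv(f_csv, row.names = 1)
```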

For further details about reading and writing data see Importing data

Import Data using read.table function

Question: How can I check my working directory so that I can import my data into R?
Answer: To find the working directory, use the command getwd(), that is

> getwd()

Question: How can I change the working directory to my own path?
Answer: Use the function setwd(), that is

> setwd("d:/mydata")
> setwd("C:/Users/XYZ/Documents")
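
The two functions are often used together. The sketch below remembers the current working directory, switches to a temporary directory (an assumption, used here in place of a real data folder), writes a file there using only its name, and then switches back.

```r
old_wd <- getwd()                   # remember where we started

demo_dir <- tempfile("wd_demo_")    # hypothetical data folder
dir.create(demo_dir)
setwd(demo_dir)

# A bare file name is now resolved relative to demo_dir.
write.table(data.frame(x = 1:2), "data.txt")
found <- file.exists("data.txt")

setwd(old_wd)                       # restore the original working directory
```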

Question: I have a data set stored in text (ASCII) format that contains rectangular data. How can I read this data in tabular form? I have already set my working directory.
Answer: As the data file is already in the working directory, use one of the following commands

> mydata <- read.table("data.dat")
> mydata <- read.table("data.txt")

mydata is a named object that holds the data from the file “data.dat” or “data.txt” as a data frame. By default, the variables in the data file are named V1, V2, ….

Question: How can this stored data be accessed?
Answer: To access the stored data, write the name of the data frame object (“mydata”) followed by the $ sign and the name of the variable, or use square brackets with the name or position of the column. That is,

mydata$V1
mydata$V2
mydata["V1"]
mydata[,1]
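
These forms do not all return the same kind of object. The sketch below uses a small hypothetical data frame in place of the imported mydata to show the difference.

```r
# Hypothetical stand-in for a data frame returned by read.table().
mydata <- data.frame(V1 = c(10, 20, 30), V2 = c("a", "b", "c"))

v_dollar  <- mydata$V1       # a numeric vector
v_double  <- mydata[["V1"]]  # the same vector, selected by name
v_matrix  <- mydata[, 1]     # the same vector, selected by position
df_single <- mydata["V1"]    # a data frame with a single column
```

Use the single-bracket form with a name when the result should remain a data frame, and the other forms when a plain vector is needed.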

Question: My data file has the variable names in its first row. In the previous question, the variables were named V1, V2, V3, …. How can I get the actual variable names stored in the first row of the data.dat file?
Answer: Instead of reading the data file with the default values of the arguments, use

> read.table("data.dat", header = TRUE)

Question: How can I read a data file that is not stored in the working directory?
Answer: To access a data file that is not stored in the working directory, provide the complete path of the file, such as

> read.table("d:/data.dat", header = TRUE)
> read.table("d:/Rdata/data.txt", header = TRUE)

Note that read.table() reads data from external files that normally have a special form:

  • The first line of the file should have a name for each variable in the data frame. However, if the first row does not contain the variable names, the header argument should be set to FALSE.
  • Each additional line of the file has as its first item a row label, followed by the values for each variable.
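
A file in this form can be read without setting any arguments: when the first line has one fewer field than the remaining lines, read.table() treats it as the header and uses the first column as row names. The sketch below assumes a hypothetical file with two variables (height, weight) and two labelled rows.

```r
path <- tempfile(fileext = ".dat")
writeLines(c(
  "height weight",   # header: one name per variable, no name for the labels
  "anna 160 55",     # row label followed by the values
  "ben 175 70"
), path)

d <- read.table(path)   # header is detected from the file layout
rownames(d)
names(d)
d["ben", "weight"]
```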

In R, it is strongly recommended to hold the variables of a data set in a data frame, and read.table() returns its result as one. For further details about the read.table() function, use

help(read.table)

 

R FAQs about Data Frame

Please load the required data set before running the commands given below. The examples in these FAQs use the iris data set, which is already available in R. At the R prompt, write data(iris).

Question: How can a column in a data frame be named or renamed?
Answer: Suppose you want to rename the 3rd column of the data frame; then at the R prompt write

> names(iris)[3] <- "new_name"

Suppose you want to rename the second and fourth columns of the data frame

> names(iris)[c(2, 4)] <- c("A", "D")

Note that the names(iris) command returns the names of the columns of a data frame.
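
Renaming by position depends on the order of the columns. The sketch below works on a copy of iris (named df here, an assumption made so that the original data set stays unchanged) and also shows renaming a column by its current name.

```r
df <- iris                          # work on a copy

names(df)[3] <- "new_name"          # rename the 3rd column by position
names(df)[c(2, 4)] <- c("A", "D")   # rename the 2nd and 4th columns

# Rename by current name, which does not depend on the column order.
names(df)[names(df) == "Species"] <- "species"

names(df)
```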

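Question: How can you determine the column information of a data frame, such as the names, types, and missing values?
Answer: Two built-in functions in R give information about the columns of a data frame: str() shows the name and type of each column, and summary() shows summary statistics for each column, including the number of missing values (NA's) where there are any.

> str(iris)
> summary(iris)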
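
When the same information is needed as an R object instead of printed output, it can be computed directly. A short sketch (the names col_classes and na_counts are chosen here for illustration):

```r
col_classes <- sapply(iris, class)    # type of each column
na_counts   <- colSums(is.na(iris))   # number of missing values per column

col_classes
na_counts
```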

Question: How can a data frame be exported from R so that it can be used in other statistical software?
Answer: Use the write.csv() command to export the data in comma-separated values (CSV) format.

> write.csv(iris, "iris.csv", row.names = FALSE)
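
To check the export, the file can be read back into R. The sketch below writes to a temporary file instead of iris.csv; stringsAsFactors = TRUE is used so that Species is read back as a factor, as in the original data.

```r
path <- tempfile(fileext = ".csv")
write.csv(iris, path, row.names = FALSE)

back <- read.csv(path, stringsAsFactors = TRUE)
dim(back)     # same number of rows and columns as iris
names(back)   # same column names, no extra row-names column
```

With row.names = FALSE the file contains only the five data columns, which is usually what other software expects.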

Question: How can one select a particular row or column of a data frame?
Answer: The easiest way is to use the indexing notation [].

Suppose you want to select the first column only; then at the R prompt, write

> iris[, 1]

Suppose you want to select the first column and also put its content in a new vector, then

> new <- iris[, 1]

Suppose you want to select several columns, for example columns 1, 3, and 5, then

> newdata <- iris[, c(1, 3, 5)]

Suppose you want to select the first and third rows, then

> iris[c(1, 3), ]
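
The sketch below collects these selections and adds two related forms: drop = FALSE, which keeps a single column as a data frame, and selection by a logical condition and column names. The object names are chosen for illustration.

```r
first_col    <- iris[, 1]                 # numeric vector
first_col_df <- iris[, 1, drop = FALSE]   # data frame with one column
some_cols    <- iris[, c(1, 3, 5)]        # columns 1, 3, and 5
rows_1_3     <- iris[c(1, 3), ]           # first and third rows

# Rows chosen by a condition, columns chosen by name.
setosa <- iris[iris$Species == "setosa", c("Sepal.Length", "Species")]
```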

Question: How can missing values in a data frame be dealt with?
Answer: In R, it is easy to deal with missing values. Suppose you want to import a file named “file.csv” that contains missing values represented by a “.” (period); then at the R prompt write

> data <- read.csv("file.csv", na.strings = ".")

If missing values are represented as “NA”, which is the default value of na.strings, then write

> dataset <- read.csv("file.csv", na.strings = "NA")

For data that is already in R (here iris), the rows that contain missing values can be removed with

> data <- na.omit(iris)
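
The following sketch puts these steps together on a small hypothetical CSV file, written to a temporary location, in which a period marks a missing score.

```r
path <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,4.5", "2,.", "3,6.0"), path)

raw <- read.csv(path, na.strings = ".")   # "." is read as NA
missing_rows <- is.na(raw$score)          # FALSE TRUE FALSE

clean <- na.omit(raw)                     # drop rows with any NA
mean_score <- mean(raw$score, na.rm = TRUE)   # ignore NA in the calculation
```

na.omit() removes whole rows, whereas na.rm = TRUE keeps the data frame as it is and only leaves the missing values out of a single calculation.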