read.table Function in R (2016): A Comprehensive Guide

This post explains how to import data using the read.table() function in R. You will also learn what a file path is and how to get and set the working directory in R. The read.table() function is the standard tool for importing tabular data, typically from text files, into the R environment. It converts tabular data from a flat-file format into a data frame, R's basic structure for rectangular data.

Question: How can I check my working directory so that I can import my data in R?
Answer: To find the working directory, use the getwd() command, that is

getwd()

Question: How can I change the working directory to my path?
Answer: Use the setwd() function with the path written using forward slashes (or doubled backslashes), even on Windows, that is

setwd("d:/mydata")
setwd("C:/Users/XYZ/Documents")
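A minimal sketch of changing the working directory and then restoring it is given below. It uses tempdir() only because that folder is guaranteed to exist on any machine; the object names old_wd and new_wd are illustrative.

```r
# Remember the current working directory so it can be restored later.
old_wd <- getwd()

# Switch to a folder that is known to exist (the session's temporary folder).
setwd(tempdir())
new_wd <- getwd()
print(new_wd)

# Return to the original working directory.
setwd(old_wd)
```

Restoring the old directory at the end keeps the rest of the session unaffected by the change.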

Import Data using read.table Function in R

Question: I have a data set stored in text format (ASCII) that contains rectangular data. How can I read this data in tabular form? I have already set my working directory.
Answer: Because the data file is already in the working directory, import it with read.table() using only the file name.

mydata <- read.table("data.dat")
mydata <- read.table("data.txt")

mydata is a named object that holds the data from the file “data.dat” or “data.txt” as a data frame. Because no names are supplied, the variables are named V1, V2, … by default.
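To try this without having a data file at hand, the sketch below first writes a small whitespace-separated file to a temporary location and then reads it back. The file contents are made up for illustration.

```r
# Create a small whitespace-separated text file in a temporary location.
tmp <- tempfile(fileext = ".txt")
writeLines(c("1 2.5 a", "2 3.5 b", "3 4.5 c"), tmp)

# Read it with the default arguments: no header is assumed.
mydata <- read.table(tmp)

names(mydata)  # the default names V1, V2, V3
str(mydata)    # column types guessed by read.table()

# Remove the temporary file.
unlink(tmp)
```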

Question: How can this stored data be accessed?
Answer: To access a variable, write the data frame object name (“mydata”) followed by the $ sign and the name of the variable. Square brackets also work, selecting a column by name or by position. That is,

mydata$V1
mydata$V2
mydata["V1"]
mydata[ , 1]
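These forms do not all return the same kind of object. The sketch below builds a small stand-in data frame (its values are made up) and compares the results.

```r
# A small stand-in for a data frame returned by read.table().
mydata <- data.frame(V1 = c(10, 20, 30), V2 = c("a", "b", "c"))

x1 <- mydata$V1       # a plain vector
x2 <- mydata[["V1"]]  # the same vector, selected by name
x3 <- mydata["V1"]    # a data frame with one column
x4 <- mydata[, 1]     # a vector again, selected by position

class(x1)
class(x3)
```

Use the single-bracket form with a name when the result should stay a data frame, and $ or [[ ]] when a vector is needed.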

Question: My data file has variable names in the first row of the data file. In the previous question, the variable names were V1, V2, V3, … How can I get the actual names of the variables stored in the first row of the data.dat file?
Answer: Instead of reading a data file with default values of arguments, use

read.table("data.dat", header = TRUE)
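The effect of the header argument can be seen by reading the same file both ways. The file below is a made-up example written to a temporary location.

```r
# A small file whose first row holds the variable names.
tmp <- tempfile(fileext = ".dat")
writeLines(c("id height group", "1 2.5 a", "2 3.5 b"), tmp)

# Without header = TRUE the names row is read as a row of data.
no_header <- read.table(tmp)

# With header = TRUE the first row becomes the column names.
with_header <- read.table(tmp, header = TRUE)

names(no_header)
names(with_header)

unlink(tmp)
```

Note that in no_header every column is read as text, because the names row is mixed in with the numbers.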

Question: I want to read a data file that is not stored in the working directory.
Answer: To read a data file that is not stored in the working directory, provide the complete path of the file, such as

read.table("d:/data.dat", header = TRUE)
read.table("d:/Rdata/data.txt", header = TRUE)
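A full path can also be built with file.path() and checked with file.exists() before reading. In the sketch below the folder "Rdata" and the file "data.txt" are hypothetical and are created under tempdir() only so that the example runs anywhere.

```r
# Build a path to a hypothetical folder and file under the temporary folder.
data_dir <- file.path(tempdir(), "Rdata")
dir.create(data_dir, showWarnings = FALSE)
data_file <- file.path(data_dir, "data.txt")

# Write a small example file there.
writeLines(c("x y", "1 2", "3 4"), data_file)

# Read the file only if the path really points to an existing file.
if (file.exists(data_file)) {
  mydata <- read.table(data_file, header = TRUE)
  print(mydata)
}
```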

Note that read.table() expects external files to have a particular form:

  • The first line of the file should have a name for each variable in the data frame. If the first row does not contain variable names, the header argument should be left at (or set to) FALSE.
  • Each additional line of the file holds the values for each variable, optionally preceded by a row label as its first item. When the first line has one fewer item than the remaining lines, read.table() treats it as a header and uses the first column as row names.
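The row-label form can be illustrated with a short sketch. The file contents (Price, Floor, and the labels h1 and h2) are made up; the point is that the first line has one fewer item than the lines that follow.

```r
# First line: two variable names. Later lines: a row label plus two values.
tmp <- tempfile(fileext = ".dat")
writeLines(c("Price Floor", "h1 52.00 111", "h2 54.75 128"), tmp)

# No header argument is given: read.table() detects the layout itself.
houses <- read.table(tmp)

names(houses)     # variable names taken from the first line
rownames(houses)  # row labels taken from the first item of each line

unlink(tmp)
```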

In R, it is strongly recommended to hold the variables of a data set together in a data frame, and read.table() does this for you when importing. For further details about the read.table() function, use

help(read.table)

Important Arguments of read.table Function:

  • file: (required argument) it is used to specify the path to the file one wants to read.
  • header: A logical value (TRUE or FALSE) indicating whether the first line of the file contains column names. The default value is set to FALSE.
  • sep: The separator that segregates values between columns. The default is set to white space. One can specify other delimiters like commas (“,”) or tabs (“\t”).
  • as.is: A logical vector, or a vector of column indices or names, specifying which character columns should be kept as character vectors instead of being converted to factors. It does not affect the conversion of columns to logical or numeric.
  • colClasses: A vector specifying the data type for each column. This ensures that the data is read in the intended format (e.g., numeric, character) instead of relying on R's guess.
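As a rough illustration of how sep and colClasses work together, the sketch below reads a made-up comma-separated file twice. The column names id, name, and score are assumptions for the example.

```r
# A comma-separated file whose id column has leading zeros.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,name,score", "001,Ann,3.5", "002,Bob,4.0"), tmp)

# Let read.table() guess the types: id is converted to integer.
guessed <- read.table(tmp, header = TRUE, sep = ",")

# State the type of each column explicitly: id stays as text.
typed <- read.table(tmp, header = TRUE, sep = ",",
                    colClasses = c("character", "character", "numeric"))

guessed$id
typed$id

unlink(tmp)
```

Setting colClasses is worthwhile whenever a column such as an identifier or postal code only looks numeric.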

Data Frame in R Language

Please load the required data set before running the commands given below. The examples in these R FAQs use the iris data set, which ships with R. At the R prompt, write data(iris).

Naming/ Renaming Columns in a Data Frame

Question: How do you name or rename a column in a data frame?
Answer: Suppose you want to change/rename the 3rd column of the data frame, then at the R prompt write

names(iris)[3] <- "new_name"

Suppose you want to change the second and fourth columns of the data frame

names(iris)[c(2, 4)] <- c("A", "D")

Note that the names(iris) command lists the name of each column in a data frame.
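Renaming by position breaks if the column order changes, so a column can also be renamed by matching its current name. The sketch below works on a copy called df (an illustrative name) so that the built-in iris data stays untouched.

```r
# Work on a copy so the original iris data set is not modified.
df <- iris

# Rename by position: the 3rd column.
names(df)[3] <- "new_name"

# Rename by matching the current name instead of counting positions.
names(df)[names(df) == "Species"] <- "species"

names(df)
```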

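Question: How can you determine column information of a data frame, such as the names, types, and missing values?
Answer: Two built-in functions in R report information about the columns of a data frame: str() shows the name and type of each column, and summary() gives per-column summaries, including the count of NA values when any are present.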

str(iris)
summary(iris)
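When only the types or the missing-value counts are needed, they can be extracted directly. The sketch below is one possible way; the object names col_types and na_counts are illustrative.

```r
# Make sure the built-in iris data set is loaded.
data(iris)

# The class (type) of every column.
col_types <- sapply(iris, class)
col_types

# The number of missing values in every column.
na_counts <- colSums(is.na(iris))
na_counts
```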

Exporting a Data Frame in R

Question: How can a data frame be exported in R so that it can be used in other statistical software?
Answer: Use the write.csv command to export the data in comma-separated format (CSV).

write.csv(iris, "iris.csv", row.names = FALSE)
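A quick way to check an export is to read the file back and compare it with the original. The sketch below writes to tempdir() instead of the working directory so that it leaves no file behind in your project; the object names are illustrative.

```r
# Export iris to a CSV file in the temporary folder.
out_file <- file.path(tempdir(), "iris.csv")
write.csv(iris, out_file, row.names = FALSE)

# Read the file back; Species is restored as a factor.
iris_back <- read.csv(out_file, stringsAsFactors = TRUE)

# Compare the size and column names of the two data frames.
dim(iris_back)
identical(names(iris), names(iris_back))
```

Using row.names = FALSE avoids an extra unnamed first column of row numbers in the exported file.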

Question: How can one select a particular row or column of a data frame?
Answer: The easiest way is to use the indexing notation [], with the row index before the comma and the column index after it.

Suppose you want to select the first column only, then at the R prompt, write

iris[,1]

Suppose we want to select the first column and also want to put the content in a new vector, then

new <- iris[,1]

Suppose you want to select different columns, for example, columns 1, 3, and 5, then

newdata <- iris[, c(1, 3, 5)]

Suppose you want to select the first and third rows, then

iris[c(1, 3), ]
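The sketch below collects a few variations of the indexing notation, including selection by column name and the drop = FALSE option that keeps a single column as a data frame. The object names are illustrative.

```r
# A single column comes back as a vector by default.
first_col <- iris[, 1]

# drop = FALSE keeps the result as a one-column data frame.
first_col_df <- iris[, 1, drop = FALSE]

# Rows 1 and 3, all columns.
rows_1_3 <- iris[c(1, 3), ]

# Columns can also be selected by name.
by_name <- iris[, c("Sepal.Length", "Species")]

rows_1_3
head(by_name)
```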

Dealing with Missing Values in a Data Frame

Question: How do you deal with missing values in a data frame?
Answer: In R, it is easy to deal with missing values. Suppose you want to import a file named “file.csv” that contains missing values represented by a “.” (period), then at the R prompt write

data <- read.csv("file.csv", na.strings = ".")

If missing values are represented as “NA”, which is already the default for na.strings, then write

dataset <- read.csv("file.csv", na.strings = "NA")

For a data set already available in R (here iris), rows containing missing values can be dropped with

data <- na.omit(iris)
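Since iris itself contains no missing values, the effect of na.omit() is easier to see on a small made-up file. The sketch below writes such a file to a temporary location, reads it with the period treated as missing, and then drops the incomplete rows.

```r
# A small CSV file in which "." marks a missing value.
tmp <- tempfile(fileext = ".csv")
writeLines(c("x,y", "1,2.5", ".,3.0", "3,."), tmp)

# Treat "." as NA while reading.
dataset <- read.csv(tmp, na.strings = ".")
is.na(dataset)

# Keep only the rows that have no missing values.
complete <- na.omit(dataset)
complete

unlink(tmp)
```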
