Import Data Files into R

You can import data files produced by other software, such as MS-Excel, Minitab, SPSS, SAS, and Stata, into R.

Reading Excel Files

The XLConnect package can be used to import MS-Excel data files into R. First, install the package:

install.packages("XLConnect")

You can check whether the package is already installed with:

any(grepl("XLConnect", installed.packages()))

If the package is installed, load it into your session with the library( ) function before reading data from MS-Excel files.
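
As a minimal sketch of this check-then-load pattern, the snippet below wraps base R's requireNamespace( ) in a small helper. The helper name is_installed is hypothetical and used only for illustration; nothing is downloaded, and the user is simply told which command to run.

```r
# Hypothetical helper: TRUE if a package can be loaded, FALSE otherwise
is_installed <- function(pkg) {
  isTRUE(requireNamespace(pkg, quietly = TRUE))
}

if (is_installed("XLConnect")) {
  library(XLConnect)   # attach the package for this session
} else {
  message("XLConnect is not installed; run install.packages(\"XLConnect\")")
}
```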

Suppose you have a data file (Hald.xlsx) stored at the path "D:\STAT\STA-654\Hald.xlsx". To read this file, the readWorksheetFromFile( ) function can be used. Note that in R code the path is written with forward slashes (or doubled backslashes). For example,

library(XLConnect) 
data <- readWorksheetFromFile("D:/stat/sta-654/Hald.xlsx", sheet=1)

Since an Excel workbook can contain more than one sheet, use the sheet argument to specify which sheet you want to load into R. In this example, data from sheet 1 will be loaded.

If you want to work with the whole workbook (all sheets), first load it with the loadWorkbook( ) function, then read the required worksheet as a data frame with readWorksheet( ). For example,

wb <- loadWorkbook("D:/stat/sta-654/Hald.xlsx") 
df <- readWorksheet(wb, sheet = 1)

The readxl package can also be used to read MS-Excel files more easily.

library(readxl) 
df <- read_excel("Data file with path")

Note that MS-Excel files with the extension *.xls or *.xlsx can be read. The sheet argument can also be added, just as with the XLConnect package.
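
For instance, the sheet argument of read_excel( ) accepts either a position or a sheet name. The sketch below assumes the readxl package is installed and uses the example workbook that ships with it (located with readxl_example( )), not the Hald file discussed above.

```r
# Runs only if readxl is available
if (requireNamespace("readxl", quietly = TRUE)) {
  path   <- readxl::readxl_example("datasets.xlsx")  # example workbook bundled with readxl
  sheets <- readxl::excel_sheets(path)                # names of all sheets in the workbook

  by_position <- readxl::read_excel(path, sheet = 1)          # first sheet, by position
  by_name     <- readxl::read_excel(path, sheet = sheets[1])  # same sheet, by name

  print(sheets)
  print(dim(by_position))
}
```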

Read SPSS Data Files

To read SPSS files, install the foreign package. After loading the foreign package, the read.spss( ) function can be used to load an SPSS data file into R. For example,

library(foreign) 
mydata <- read.spss ("SPSS data file with path", to.data.frame = TRUE)

The argument to.data.frame is set to TRUE so that the data are returned as a data frame. SPSS data files often contain value labels; if you do not want variables with value labels to be converted into R factors with the corresponding levels, set use.value.labels = FALSE. For example,

library(foreign) 
mydata <- read.spss("SPSS data file with path", 
                    to.data.frame = TRUE, 
                    use.value.labels = FALSE)
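
To see the effect of use.value.labels, the sketch below reads the same file twice and counts how many columns come back as factors. It assumes the example file electric.sav that is distributed with the foreign package; the object names labelled and raw are illustrative.

```r
library(foreign)

# Example SPSS file shipped with the foreign package
sav <- system.file("files", "electric.sav", package = "foreign")

# Value labels become factor levels (the default)
labelled <- read.spss(sav, to.data.frame = TRUE)
# Underlying codes are kept instead of labels
raw <- read.spss(sav, to.data.frame = TRUE, use.value.labels = FALSE)

# Number of factor columns under each setting
c(labelled = sum(sapply(labelled, is.factor)),
  raw      = sum(sapply(raw, is.factor)))
```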

Reading Stata Data Files

To import Stata files, the read.dta( ) function from the foreign package can be used. For example,

library(foreign) 
mydata <- read.dta("STATA file with path")
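
If no Stata file is at hand, one way to try read.dta( ) is to write a small data frame with write.dta( ), also from the foreign package, and read it back. This is only a sketch with made-up values and a temporary file. Note that, according to the foreign documentation, read.dta( ) handles files in Stata version 5 to 12 format, so files saved by newer Stata releases may need a different reader.

```r
library(foreign)

# Made-up data for illustration
scores <- data.frame(id = 1:3, score = c(2.5, 3.5, 4.5))

dta_file <- tempfile(fileext = ".dta")
write.dta(scores, dta_file)        # write a Stata-format file
stata_back <- read.dta(dta_file)   # read it back into R

str(stata_back)
```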

Reading SAS Data Files

The read.sas7bdat( ) function from the sas7bdat package can be used to read SAS data files into R. For example,

library(sas7bdat) 
mydata <- read.sas7bdat("SAS data file with path")

Reading Minitab Data Files

Minitab data files (saved as Minitab Portable Worksheets) can be imported into R using the read.mtp( ) function from the foreign package.

library(foreign) 
mydata <- read.mtp("Minitab data file with path")

Reading RDA or RData Files

R data files (*.RDA or *.RData) can be read easily using the load( ) function.

load("filename.RDA")
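
Unlike the read functions above, load( ) does not return the data; it restores the saved objects under their original names and invisibly returns those names. A small round trip with save( ), using a temporary file and illustrative object names, might look like this:

```r
scores <- c(10, 20, 30)
labels <- c("a", "b", "c")

rda_file <- tempfile(fileext = ".RData")
save(scores, labels, file = rda_file)   # store both objects in one file

rm(scores, labels)                      # remove them from the workspace
restored <- load(rda_file)              # objects reappear under their saved names

restored   # names of the objects that were loaded
scores     # the restored vector
```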

Reading Text (*.txt) Files

Data already saved in a text (*.txt) file, or in a file created by MS-Excel, SPSS, or some other software, can be imported into R. Before importing data stored in a file (for example, before reading a text (*.txt) file), keep the following points in mind:

  1. Data produced by spreadsheets usually reserve the first row as a header (names of variables), while the first column is used to identify the sampling unit (observation number).
  2. Avoid blank spaces in names and field values; otherwise each word may be interpreted as a separate variable, resulting in errors.
  3. To concatenate words, use a full stop (.) instead of a space between words.
  4. Give variables short or abbreviated names.
  5. Avoid variable names that contain symbols such as ?, $, %, ^, *, (, ), -, #, <, >, /, |, \, [, ], {, and }.
  6. Delete any comments you have made in your Excel file.
  7. Make sure missing values in your dataset are indicated with NA.
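
The naming rules can be checked in R itself. As a rough sketch, the base function make.names( ) shows how R would rewrite column names that break these rules; read.table( ) applies the same conversion by default through its check.names argument. The names below are made up for illustration.

```r
raw_names <- c("plant height", "yield%", "2nd.cut", "dose")

clean_names <- make.names(raw_names)
clean_names
# "plant.height" "yield."       "X2nd.cut"     "dose"
```

Spaces and symbols are replaced by a full stop, and a name that starts with a digit gets an X prefix, so it is usually cleaner to fix the names in the file before importing.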

Preparing Your R Workspace

Before importing data into R, it is better to delete all objects using the following line of code:

rm(list = ls() )

The rm( ) function removes objects from a specified environment. Since no argument is passed to the ls( ) function, it lists every object in the current workspace, so all datasets and user-defined functions will be deleted.
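
Because rm(list = ls()) wipes the whole workspace, it can be worth trying the idea on a scratch environment first. The sketch below uses a separate environment (the name scratch is arbitrary) so that nothing in your own session is removed.

```r
scratch <- new.env()
assign("mydata", data.frame(x = 1:3), envir = scratch)
assign("myfun", function(x) x^2, envir = scratch)

ls(scratch)                              # "mydata" "myfun"
rm(list = ls(scratch), envir = scratch)  # same pattern as rm(list = ls())
ls(scratch)                              # character(0)
```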

Confirm your working directory before importing a file to R, using

getwd()

If needed, change your working directory, for example:

setwd("D:\\Stat\\STA-654")

Note that you may first have to create the directory (folder) at the path used above.
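
A sketch of creating the folder before switching to it is given below. It builds the path with file.path( ) under a temporary directory rather than on drive D:, so the folder names are only stand-ins for your own.

```r
target <- file.path(tempdir(), "Stat", "STA-654")

if (!dir.exists(target)) {
  dir.create(target, recursive = TRUE)   # create intermediate folders as well
}

old_wd <- setwd(target)   # setwd() returns the previous working directory
getwd()
setwd(old_wd)             # restore the original working directory
```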

Reading Text (*.txt) Files

Reading text (*.txt) files in R is straightforward. If you have data in a *.txt file or a tab-delimited text file, you can import it with the read.table( ) function. Suppose we have a data file named "Hald.txt" stored at the path "D:\STAT\STA-654\Hald.txt". The following line of code reads this file into R:

datafile <- read.table ("D:/stat/sta-654/Hald.txt", header = TRUE)

If the data are stored at a web address, you can import them in the same way:

datafile <- read.table ("http://itfeature.com/wp-content/uploads/2020/03/Hald.txt", header = TRUE)

Note that the first argument of read.table() gives the name (with path and extension) of the file that you want to import into R. The header argument specifies whether the first line of the data file contains the column names. The Hald.txt file will be imported as a data.frame object.
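
The effect of the header argument can be seen on a small file. The sketch below writes a made-up tab-delimited file to a temporary location (it is not the Hald data) and reads it with and without header = TRUE.

```r
txt_file <- tempfile(fileext = ".txt")
writeLines(c("height\tweight",
             "150\t52",
             "160\t61",
             "170\t70"), txt_file)

with_header    <- read.table(txt_file, header = TRUE)
without_header <- read.table(txt_file, header = FALSE)

names(with_header)      # "height" "weight"
names(without_header)   # "V1" "V2"
class(with_header)      # "data.frame"
```

Without header = TRUE the column names are read as a data row, so the result has one extra row and its columns are no longer numeric.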

Reading Data from R Library

Here we will discuss how to read data from an R library (package). Many R packages contain datasets. For example, the car package provides the Duncan dataset, which can be used for learning and practicing different R functions (in recent versions of car, the data are supplied by the companion carData package, which is loaded together with car). To use the Duncan data, first load the car package; note that the package must already be installed. Let us make use of the Duncan dataset.

library(car)
data(Duncan)
attach(Duncan)

If the car package is not installed on your system, you can install it using the following command. Note that your system must be connected to the internet.

install.packages("car")

The attach( ) function makes each variable accessible without prefixing it with the dataset name. After attaching the Duncan dataset, one can access a variable, say education, directly instead of writing Duncan$education. Let us apply some functions to this dataset.

head(Duncan)

The head( ) function displays the first six observations with their variable names in a tabular format. It helps in understanding the structure of the dataset.

summary(Duncan)

For quantitative variables, the summary( ) function will provide five-number summary statistics with the mean value. For qualitative variables, the summary( ) function will provide the frequency of each group.
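
As a quick illustration on a dataset that ships with R (iris, used here instead of Duncan so that no extra package is needed), summary( ) returns six statistics for a numeric variable and counts for a factor:

```r
num_summary <- summary(iris$Sepal.Length)   # quantitative variable
names(num_summary)
# "Min."    "1st Qu." "Median"  "Mean"    "3rd Qu." "Max."

fac_summary <- summary(iris$Species)        # qualitative variable (factor)
fac_summary
#     setosa versicolor  virginica
#         50         50         50
```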

To draw a scatter plot, one can use the plot( ) function. For example,

plot(education, income)

Figure: Scatter plot of Education and Income

The scatter plot shows the strength and direction of the relationship between ‘Percentage of occupational incumbents in 1950 who were high school graduates’ and ‘Percentage of occupational incumbents in the 1950 US Census who earned $3,500 or more per year’.

To check how many observations (rows) and columns are in a dataset, one can use the nrow( ) and ncol( ) functions. For example,

nrow(Duncan)
ncol(Duncan)

To get the definition of a dataset and its variables, one can read the dataset documentation:

?Duncan

To see the list of datasets available in the currently loaded packages, call the data( ) function:

data( )

It is best practice to attach only one dataset at a time, whether it comes from an R library or is imported from a data file. To remove a data frame from the search path, use the detach( ) function.
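
A minimal sketch of the attach and detach cycle, using the built-in mtcars data (an assumption made so the example runs without the car package):

```r
attach(mtcars)
mean_mpg <- mean(mpg)          # mpg is found without writing mtcars$mpg
detach(mtcars)                 # remove mtcars from the search path

"mtcars" %in% search()         # FALSE once detached
mean_mpg == mean(mtcars$mpg)   # TRUE
```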

Exercise

Try the following datasets and make use of all the functions discussed in this lecture.

mtcars
iris
ToothGrowth
PlantGrowth
USArrests

Reading and Writing data in R

Reading Data in R

Here we will discuss reading and writing data in R. The following are some functions for reading (importing) data into R.

  • read.table() and read.csv() for reading tabular data
  • readLines() for reading lines of a text file
  • source() for reading in R code files (inverse of dump)
  • dget() for reading in R objects stored as text (inverse of dput)
  • load() for reading in saved workspaces

Writing Data to files

The following are a few functions for writing (exporting) data to files.

  • write.table() and write.csv() export tabular data to delimited text files, including comma-separated (csv) and tab-delimited formats.
  • writeLines() writes text lines to a text-mode connection.
  • dump() takes a vector of names of R objects and produces text representations of the objects on a file (or connection). A dump file can usually be sourced into another R session.
  • dput() writes an ASCII text representation of an R object to a file (or connection); dget() uses such a file to recreate the object.
  • save() writes an external representation of R objects to the specified file.
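
The writing functions pair up with reading functions. The sketch below, with made-up object names and temporary files, round-trips one object through dput( ) and dget( ), and another through dump( ) and source( ).

```r
x <- data.frame(a = 1:3, b = c("u", "v", "w"))

# dput()/dget(): a single object; its name is not stored in the file
dput_file <- tempfile(fileext = ".R")
dput(x, file = dput_file)
x_copy <- dget(dput_file)
all.equal(x, x_copy)    # TRUE

# dump()/source(): one or more objects, stored together with their names
y <- c(first = 1.5, second = 2.5)
dump_file <- tempfile(fileext = ".R")
dump("y", file = dump_file)
rm(y)
source(dump_file)       # recreates y under its original name
y
```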

Reading data files with read.table()

The read.table() function is one of the most commonly used functions for reading data into R. It has a few important arguments.

  • file, the name of a file, or a connection
  • header, logical indicating if the file has a header line
  • sep, a string indicating how the columns are separated
  • colClasses, a character vector indicating the class of each column in the data set
  • nrows, the maximum number of rows to read from the dataset
  • comment.char, a character string indicating the comment character
  • skip, the number of lines to skip from the beginning
  • stringsAsFactors, a logical indicating whether character variables should be converted to factors (the default is FALSE since R 4.0.0)
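
To make these arguments concrete, the sketch below reads a small made-up file stored in a temporary location. The file has a title line to skip, a semicolon separator, and more rows than we choose to read.

```r
raw_file <- tempfile(fileext = ".txt")
writeLines(c("Survey export, illustrative only",
             "id;group;score",
             "1;A;4.5",
             "2;B;3.0",
             "3;A;5.5"), raw_file)

survey <- read.table(raw_file,
                     header = TRUE,
                     sep = ";",
                     skip = 1,      # ignore the title line
                     nrows = 2,     # read only the first two data rows
                     colClasses = c("integer", "character", "numeric"),
                     stringsAsFactors = FALSE)

str(survey)
```

Supplying colClasses spares R from guessing the column types, which can also speed up reading of large files.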

read.table() and read.csv() Examples

data <- read.table("foo.txt")
data <- read.table("D:\\datafiles\\mydata.txt")
data <- read.csv("D:\\datafiles\\mydata.csv")

With its default settings, read.table() will automatically skip lines that begin with a #, and figure out how many rows there are (and how much memory needs to be allocated). R also figures out what type of variable is in each column of the table.
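
Comment handling is one place where the two functions differ in their defaults: read.table( ) uses comment.char = "#", whereas read.csv( ) uses comment.char = "" and therefore keeps such lines unless told otherwise. A small sketch with a made-up file:

```r
csv_file <- tempfile(fileext = ".csv")
writeLines(c("a,b",
             "# a comment line",
             "1,2",
             "3,4"), csv_file)

# read.table(): the comment line is dropped
nrow(read.table(csv_file, header = TRUE, sep = ","))   # 2

# read.csv(): the comment line is kept as a data row
nrow(read.csv(csv_file))                               # 3

# Asking read.csv() to treat # as a comment character
nrow(read.csv(csv_file, comment.char = "#"))           # 2
```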

Writing data files with write.table()

The following are a few important arguments commonly used in the write.table() function.

  • x, the object to be written, typically a data frame
  • file, the name of the file which the data are to be written to
  • sep, the field separator string
  • col.names, a logical value indicating whether the column names of x are to be written along with x, or a character vector of column names to be written
  • row.names, a logical value indicating whether the row names of x are to be written along with x, or a character vector of row names to be written
  • na, the string to use for missing values in the data

write.table() and write.csv() Examples

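x <- data.frame(a = 5, b = 10, c = pi)
# Comma-separated file written with write.table()
write.table(x, file = "data.csv", sep = ",")
# Tab-delimited file written to an absolute Windows path
write.table(x, "c:\\mydata.txt", sep = "\t")
# write.csv() uses a comma separator by default (overwrites data.csv from above)
write.csv(x, file = "data.csv")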
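
A round trip is a simple way to check what was written. The sketch below uses a temporary file and a made-up data frame; row.names = FALSE keeps R's automatic row numbers out of the file, and na = "" writes missing values as empty fields.

```r
out <- data.frame(id = 1:3, value = c(2.5, NA, 7.25))

csv_out <- tempfile(fileext = ".csv")
write.csv(out, file = csv_out, row.names = FALSE, na = "")

readLines(csv_out)            # the raw text that was written
back <- read.csv(csv_out)     # empty numeric fields are read back as NA
all.equal(out, back)          # TRUE
```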

