Introduction to R Language

This post is an introduction to the R language. It covers a short history of R, how to obtain and install it, and the R console.

Introduction to R Language

R is an open-source (GPL) programming language for statistical computing and graphics, modeled on the S and S-PLUS languages. The S language was developed at AT&T Bell Laboratories, starting in the mid-1970s. Robert Gentleman and Ross Ihaka began R as a research project in the Department of Statistics at the University of Auckland in the early 1990s, and it was released as free software in 1995.

The R language is currently maintained by the R Core Team (an international team of volunteer developers). The R Project website (https://www.r-project.org) is the main source of information about R. From this site you can obtain the software, the accompanying packages, and documentation such as manuals and help files.

R is an integrated suite of software facilities for data manipulation, calculation, and graphical display. It provides a wide variety of statistical and graphical techniques, such as linear and non-linear modeling, classical statistical tests, time-series analysis, classification, and multivariate analysis. It includes:

  • Effective data handling and storage facilities
  • A suite of operators for calculations on arrays, particularly matrices
  • A large, coherent, integrated collection of intermediate tools for data analysis
  • Graphical facilities for data analysis
  • A programming language with conditionals, loops, user-defined recursive functions, and input-output facilities.
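
The features listed above can be illustrated with a small sketch. The object and function names below (m, factorial_recursive, even_total) are made up for this illustration and do not come from the original post.

```r
# A 2 x 2 matrix, filled column by column
m <- matrix(c(1, 2, 3, 4), nrow = 2)

# Operators for calculations on matrices
m_elementwise    <- m * m      # element-wise product
m_matrix_product <- m %*% m    # matrix product
m_inverse        <- solve(m)   # matrix inverse

# A user-defined recursive function (hypothetical name)
factorial_recursive <- function(n) {
  if (n <= 1) {
    return(1)
  }
  n * factorial_recursive(n - 1)
}

# A loop combined with a condition: sum of the even numbers from 1 to 10
even_total <- 0
for (i in 1:10) {
  if (i %% 2 == 0) {
    even_total <- even_total + i
  }
}

print(m_matrix_product)
print(factorial_recursive(5))
print(even_total)
```

Typing these lines at the R prompt prints the matrix product, the factorial of 5, and the sum of the even numbers.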

Obtaining R Software

R can be downloaded from the R Project site as ready-to-run (binary) files for several operating systems, such as Windows, macOS, and Linux. The source code is also available for download and can be compiled for other platforms. R simplifies many statistical computations: it is a powerful statistical language, and many statistical routines (programming code) developed by people from all over the world are freely available from the R Project website as “packages”. The basic installation of R already contains a powerful set of tools, including the basic packages required for data handling and data analysis.

Many users of R think of R as a statistical system, but it is an environment within which statistical techniques are implemented. The R language can also be extended via packages.
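
As a rough sketch of how packages appear in practice, the following lists the packages that ship with a standard installation and loads one of them. The variable name base_pkgs is made up for this illustration. The install.packages() call is left as a comment because it needs an internet connection.

```r
# Names of the packages with priority "base" in the current installation
base_pkgs <- rownames(installed.packages(priority = "base"))
print(base_pkgs)

# Load a package that is already installed
library(stats)

# Install an additional package from CRAN (requires internet access)
# install.packages("car")
```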

Installing R

For the Windows operating system, the binary version is available from https://cran.r-project.org/bin/windows/base/ as the installer file “R-4.4.1-win.exe”. At the time of writing, R-4.4.1 (“Race for Your Life”), released on 15 June 2024 by Duncan Murdoch, is the latest version of R.

After downloading the binary file, double-click it. A largely automatic installation of R will start, although a customized installation option is also available. Follow the instructions during the installation procedure. Once the installation is complete, an R icon appears on your computer desktop.

The R Console

When R starts, you will see the R console window, where you type commands at the command prompt to get the required results. You can edit a command on the prompt using the left and right arrow keys and the Home, End, Backspace, Insert, and Delete keys. The up and down arrow keys scroll through the command history, so recent commands can be recalled and edited. It is also possible to type commands in a file and then execute the file using the source function in the R console.
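
A minimal sketch of running commands from a file with source() follows. The script is written to a temporary file so that the example is self-contained. In practice you would pass the path of your own script file.

```r
# Create a small script file (a temporary file is used for illustration)
script_path <- tempfile(fileext = ".R")
writeLines(c("x <- c(2, 4, 6)",
             "x_mean <- mean(x)"),
           script_path)

# Execute every command in the file, as if typed at the console
source(script_path)

print(x_mean)
```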

Books on R Programming Language

The following books can be useful for learning the R and S languages.

  • “Practicing R for Statistical Computing” by Aslam, M., and Imdad Ullah, M., Springer, 2023.
  • “Psychologie statistique avec R” by Yvonnick Noel. Pratique R. Springer, 2013.
  • “Instant R: An introduction to R for Statistical Analysis” by Sarah Stowell. Jotunheim Publishing, 2012.
  • “Financial Risk Modeling and Portfolio Optimization with R” by Bernhard Pfaff. Wiley, Chichester, UK, 2012.
  • “An R Companion to Applied Regression” by John Fox and Sanford Weisberg. Sage Publications, Thousand Oaks, CA, USA, 2nd Edition, 2011.
  • “R Graphs Cookbook” by Hrishi Mittal. Packt Publishing, 2011.
  • “R in Action” by Rob Kabacoff. Manning, 2010.
  • “The Statistical Analysis with R Beginners Guide” by John M. Quick. Packt Publishing, 2010.
  • “Introducing Monte Carlo Methods with R” by Christian Robert and George Casella. Use R. Springer, 2010.
  • “R for SAS and SPSS users” by Robert A. Muenchen. Springer Series in Statistics and Computing. Springer, 2009.

Import Data in R, Reading, and Creating Data

There are many ways to read data into R. This post shows how to import data in R and how to enter or generate data directly. Some of the methods are:

Reading Data from the Keyboard Directly

For small data sets (a few observations), you can enter data as a vector directly on the R Console, such as

x <- c(1, 2, 3, 4, 5)
y <- c('a', 'b', 'c')

A vector can span several lines: R keeps reading until the closing parenthesis is typed, such as

x <- c(1, 2,
       3, 4)

Note that it is more convenient to use the scan function, which displays the index of the next entry as a prompt.

Using Scan Function

For small data sets, it is convenient to read data from the console using the scan function. Values can be entered on one or more lines, separated by spaces and/or tabs. After all the data have been entered, pressing the Enter key twice (that is, entering a blank line) terminates the scan.

X <- scan()
1:   3 4 5
4:   4 5 6 7
8:   2 3 4 5 6 6
14:
Read 13 items

Reading String Data using the “what” Option

y <- scan(what=" ")
1:    red green blue
4:    white
5:
Read 4 items

The scan function can also be used to import data from a file. It returns a list or a vector, whereas the read.table function returns a data frame, so scan is less convenient for inputting “rectangular” data.
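
The interactive sessions above cannot be run from a script, so here is a non-interactive sketch that passes the values through scan's text argument. The variable names and sample values are made up for this illustration.

```r
# Numeric data: scan returns a numeric vector by default
x <- scan(text = "3 4 5 4 5 6 7", quiet = TRUE)

# Character data: what = "" asks scan for strings
y <- scan(text = "red green blue white", what = "", quiet = TRUE)

# A list in 'what' makes scan return a list with one component per field
z <- scan(text = "a 1 b 2", what = list(name = "", value = 0), quiet = TRUE)

print(x)
print(y)
print(z)
```

Because z is a list rather than a data frame, it has to be converted (for example with as.data.frame) before it can be used as a rectangular data set.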

Reading data from ASCII or plain text files into R as Data Frame

The read.table function reads any delimited ASCII file, which may contain both numeric and character values, and returns a data frame. It is the easiest and most reliable way to read such data into R. By default, fields are separated by white space (one or more spaces or tabs).

data <- read.table(file=file.choose()) #select from dialog box

data <- read.table("http://itfeature.com/test.txt", header=TRUE) # read from web site

Note that the read.table command can also read data from the computer disk if you provide the file path in quotation marks, such as

data <- read.table("D:/data.txt", header=TRUE) # read from your computer
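
The URL and the disk path above may not exist on your machine, so the following sketch reads the same kind of whitespace-delimited data from a character string through read.table's text argument. The column names and values are invented for this illustration.

```r
# Whitespace-delimited data with a header line (made-up values)
raw_text <- "id group score
1 a 10.5
2 b 12.0
3 a 9.8"

# header = TRUE tells read.table that the first line holds the variable names
example_data <- read.table(text = raw_text, header = TRUE)

str(example_data)
```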

If missing values are left blank in a whitespace-delimited file, the rows have different numbers of fields, and read.table stops with an error. The easiest fix is to use a file with an explicit delimiter, such as a comma, and to specify that delimiter with the sep argument.

data <- read.table("http://itfeature.com/missing_comma.txt", header=TRUE, sep=",")
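
A small sketch of this behaviour, using made-up data passed through the text argument: a row with fewer whitespace-separated fields fails, while a comma-delimited version with blank fields is read with NA in the gaps. Setting na.strings = "" is an assumption added here so that blank character fields are also treated as missing.

```r
# A whitespace-delimited row with fewer fields than the header causes an error
ragged <- try(read.table(text = "x y z\n1 2 3\n4 5", header = TRUE),
              silent = TRUE)
print(inherits(ragged, "try-error"))

# With an explicit comma delimiter, blank fields are read as missing values
raw_csv <- "id,group,score
1,a,10.5
2,,12.0
3,b,"
with_missing <- read.table(text = raw_csv, header = TRUE, sep = ",",
                           na.strings = "")
print(with_missing)
```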

Comma-delimited files can be read with the read.table function and the sep argument, but they can also be read with the read.csv function, which is written specifically for comma-delimited files. To display the contents of the imported data, use the print() function or type the name of the object.

data <- read.csv(file=file.choose())
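
Since file.choose() opens a dialog box, here is a non-interactive sketch that first writes a small CSV file to a temporary location and then reads it back with read.csv. The data values and object names are invented for this illustration.

```r
# Write a small comma-delimited file (temporary file, made-up values)
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, score = c(10.5, 12, 9.8)),
          csv_path, row.names = FALSE)

# read.csv assumes a header line and a comma delimiter by default
from_csv <- read.csv(csv_path)

print(from_csv)
```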

Reading in fixed formatted files

To read data in fixed format, use the read.fwf function. Its widths argument gives the width (number of characters) of each variable. In this format the variable names are not in the first line of the file, so they must be added after reading the data. One way is to use the dimnames function with bracket notation, which indicates that the names are being attached to the variables (columns) of the data frame. There are several other ways to do this task.

data <- read.fwf("http://itfeature.com/test_fixed.txt", widths = c(8,1,3,1,1,1))

dimnames(data)[[2]] <- c("v1", "v2", "v3", "v4", "v5", "v6")
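
One of the other ways is to supply the names while reading, through the col.names argument. The sketch below writes two made-up fixed-width records to a temporary file so that it runs without the web file. The record contents are invented for this illustration.

```r
# Two fixed-width records: 8 + 1 + 3 + 1 + 1 + 1 = 15 characters each
fixed_path <- tempfile(fileext = ".txt")
writeLines(c("12345678A123B1C",
             "87654321B456D2E"),
           fixed_path)

# Split each record by character widths and name the columns while reading
fixed <- read.fwf(fixed_path,
                  widths = c(8, 1, 3, 1, 1, 1),
                  col.names = c("v1", "v2", "v3", "v4", "v5", "v6"))

print(fixed)
```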

Import Data In R

Importing data in R is fairly simple. For Stata and Systat files, use the foreign package. For SPSS and SAS files, the Hmisc package is recommended for its ease of use and functionality. See the Quick-R section on packages for information on obtaining and installing these packages. Examples of importing data in R are provided below.

From Excel

On Windows systems, you can use the RODBC package to access Excel files. Note that the odbcConnectExcel function is only usable with 32-bit Windows. The first row of the Excel file should contain variable/column names.

# Excel file name is myexcel and WorkSheet name is mysheet
library(RODBC)
channel <- odbcConnectExcel("c:/myexcel.xls")
mydata <- sqlFetch(channel, "mysheet") 
odbcClose(channel)

From SPSS

# In SPSS: first save the SPSS dataset in transport (portable) format
get file='c:\data.sav'.
export outfile='c:\data.por'.

# In R
library(Hmisc)
mydata <- spss.get("c:/data.por", use.value.labels=TRUE)
# The "use.value.labels" option converts value labels to R factors.

From SAS

# In SAS: save the SAS dataset in transport format
libname out xport 'c:/mydata.xpt';
data out.data;
set sasuser.data;
run;

# In R
library(Hmisc)
mydata <- sasxport.get("c:/mydata.xpt")
# character variables are converted to R factors

From Stata

# input Stata file
library(foreign)
mydata <- read.dta("c:/data.dta")

From Systat

# input Systat file
library(foreign)
mydata <- read.systat("c:/mydata.syd")

Accessing Data in R Library

Many R packages, including the car package, contain data sets. For example, to access the Duncan data frame from the car package, type the following commands on the R Console:

library(car)
data(Duncan)
attach(Duncan)
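
The same pattern works for data sets that ship with R itself. The sketch below uses the built-in mtcars data frame (not part of the original post) so that it runs without installing car. It uses with() as an alternative to attach(), which avoids adding the data frame to the search path.

```r
# Load a data frame from the 'datasets' package that comes with R
data(mtcars)

# Evaluate an expression using the columns of the data frame,
# without attaching it to the search path
mean_mpg <- with(mtcars, mean(mpg))

print(mean_mpg)
```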

Some Important Commands for Dataframes

data         # displays the entire data set in the console
head(data)   # displays the first 6 rows of the data frame
tail(data)   # displays the last 6 rows of the data frame
str(data)    # displays the names of the variables and their types
names(data)  # shows the variable names only
rename(V1, Variable1, dataFrame=data) # renames V1 to Variable1; note that the epicalc package must be installed
ls()         # shows a list of objects that are available
attach(data) # attaches the data frame to the R search path, which makes it easy to access variables by name
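
A short sketch of these commands on a small, made-up data frame follows. Instead of the epicalc rename function, it renames the column with base R only. That is a swapped-in alternative, not the method shown above.

```r
# A small data frame with default-style column names (made-up values)
survey <- data.frame(V1 = c(5, 3, 8), V2 = c("a", "b", "c"))

str(survey)           # variable names and types
print(names(survey))  # variable names only

# Rename V1 to Variable1 using base R (no extra package needed)
names(survey)[names(survey) == "V1"] <- "Variable1"

first_rows <- head(survey, 2)   # first 2 rows
last_rows  <- tail(survey, 2)   # last 2 rows

print(first_rows)
print(last_rows)
```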

Important R Programming MCQs 11

This post is an R Programming MCQs quiz with answers. The quiz covers MCQs about RStudio, data analysis in R, and some basics of the R programming language. Let us start with the R Programming MCQs Quiz.

Online Multiple Choice Questions about R Programming Language

1. Why do analysts use comments in R programming?

2. When using RStudio, what does the installed.packages() function do?

3. If you write code directly in the R source editor, RStudio can save your code when you close your current session.

4. Which of the following are the benefits of open-source code? Select all that apply.

5. Many data analysts prefer to use a programming language for which of the following reasons?

6. A data analyst wants to write R code that they can access again after they close their current session in RStudio. Where should they write their code?

7. In data analytics, what is CRAN?

8. What are the benefits of using a programming language to work with your data?

9. Programming involves _____ a computer to perform an action or set of actions.

10. RStudio includes which of the following panes?

11. A data analyst needs to quickly create a series of scatterplots to visualize a very large dataset. What should they use for the analysis?

12. A data analyst wants to use a programming language that they can modify. What type of programming language should they use?

13. An analyst includes the following calculation in their R programming:
midyear_sales <- (quarter_1_sales + quarter_2_sales) - overhead_costs
Which variable will the total from this calculation be assigned to?

14. The R programming language can be used for which of the following tasks?

15. What tool gives data analysts the highest level of control over their data analysis?

16. A data analyst is searching for an open-source tool that will allow them to reproduce every step of their analysis, including data cleaning and transformations, calculations, and visualizations. What tool is the best option?

17. What type of software application is RStudio?

18. What are the benefits of using a programming language for data analysis?

19. A data analyst is searching for a tool that will allow them to communicate instructions that a computer can run. What tool should they use?

20. A data analyst is working with spreadsheet data. The analyst imports the data from the spreadsheet into RStudio. Where in RStudio can the analyst find the imported data?