R FAQs Interview Questions

This post covers frequently asked R interview questions. It contains some basic questions that are commonly asked in interviews, along with their answers.

The following are frequently asked R interview questions with detailed answers:

Why Should One Adopt the R Programming Language?

  • R is one of the most widely used languages for statistical data analysis and machine learning. In R, one can create objects, functions, and packages.
  • R is an open-source programming language.
  • R supports a very wide range of statistical analyses and data manipulation tasks.
  • It is used in many fields, such as finance, marketing, and sports.
  • R is extensible, and its contributor community is known for its active contributions.
  • Many of R’s standard functions are written in R itself. R has also become faster over time and serves as a glue language for connecting code written in other languages.

What are the programming features of R?

  • Packages are part of R programming. An R package collects a set of R functions into a single unit.
  • R’s programming features include database input, exporting data, viewing data, variable labels, handling missing data, etc.
  • R is an interpreted language, so one can use it through a command-line interpreter.
  • R supports matrix arithmetic.
  • R supports procedural programming with functions and object-oriented programming with generic functions.
  • Procedural programming includes procedures, records, modules, and procedure calls, while object-oriented programming includes classes, objects, and methods.
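
The points on matrix arithmetic and generic functions can be illustrated with a minimal sketch. The generic area() and the class "circle" are hypothetical names chosen only for illustration; they are not part of base R.

```r
# Matrix arithmetic: %*% is matrix multiplication, * is element-wise
A <- matrix(1:4, nrow = 2)   # filled column-wise: rows are (1, 3) and (2, 4)
B <- diag(2)                 # 2 x 2 identity matrix
AB <- A %*% B                # multiplying by the identity returns A
At <- t(A)                   # transpose of A

# Object-oriented programming with an S3 generic function.
# 'area' and the class "circle" are illustrative names only.
area <- function(shape, ...) UseMethod("area")
area.circle <- function(shape, ...) pi * shape$r^2
area.default <- function(shape, ...) stop("area() is not defined for this class")

circ <- structure(list(r = 2), class = "circle")
circ_area <- area(circ)      # dispatches to area.circle()
```

Here the generic area() chooses the method from the class of its first argument, which is how S3 dispatch works for built-in generics such as print() and summary().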

Is R a Slow Language?

  • R programs can be slow; however, well-written R code is usually fast enough.
  • Speed was not the primary design criterion of the R language.
  • R is designed to make programming easier.
  • Slow programs are often the result of bad programming practices or of not understanding how R works.
  • When more speed is needed, there are various options for calling C or C++ functions from R.

Why is R important for data science?

  • R is an interpreted language, so one can run R code without a compiler.
  • R interprets the code directly, which makes developing and testing code easier.
  • R is a vector language, so many calculations are done on whole vectors: one can apply a function to an entire vector without writing a loop. Vectorized code is usually more concise, and often faster, than the equivalent explicit loop.
  • R is widely used in biology and genetics as well as in statistics. R is a Turing-complete language, so any computable task can be programmed in it.
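
The remark about vectors can be made concrete with a small sketch. The vector x and the variable names are made up for illustration.

```r
x <- c(1, 2, 3, 4, 5)

# Vectorized: the arithmetic is applied to every element at once
squares_vec <- x^2

# Equivalent explicit loop, shown only for comparison
squares_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

identical(squares_vec, squares_loop)   # TRUE
n_large <- sum(x > 2)                  # 3: counts the elements greater than 2
```

Both versions give the same result; the vectorized form is shorter and avoids the bookkeeping of the loop index.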

Why is R Good for Business?

  • The most important reason why R is good for business is that it is open-source and free. R is also very good for data visualization, and it offers far more capabilities than many earlier tools and computing languages.
  • For businesses that want to make data-driven decisions, the shortage of data science talent is a big problem. Companies use R as their platform and recruit trained R users.

What are the statistical and programming features of the R Language?

  1. Statistical Features
  • Basic Statistics: Measures of central tendency (mean, median, mode, etc.), measures of dispersion (range, standard deviation, variance), quartiles, etc.
  • Static graphics: Basic plots, graphic maps, scatter plots, line charts, etc.
  • Probability distributions: Normal, Poisson, Binomial, t, F, Beta, Gamma, etc.
  • Inferential Statistics: Comparison tests (one sample, two samples, ANOVA, etc.), correlation and regression analysis, non-parametric tests, etc.
  • Multivariate Analysis: Principal Component Analysis (PCA), Factor Analysis, Canonical Correlation, etc.
  2. Programming Features
  • Distributed Computing: Open-source, high-performance distributed computing platforms are available for the R language. They split tasks between multiple processing nodes to reduce execution time and analyze large datasets.
  • R packages: An R package is a collection of R functions, compiled code, documentation, and sample data. By default, R installs a set of packages during installation.
  • R is an interpreted language: R does not need a compiler to build a program from the code. R directly interprets the provided code into lower-level calls and pre-compiled code.
  • Compatible Programming Language: Most R functions are written in R itself, while C, C++, or FORTRAN can be used for computationally heavy tasks. Java, .NET, Python, C, C++, and FORTRAN can also be used to manipulate R objects directly.
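
A few of the statistical features listed above can be sketched with functions from base R and the built-in mtcars dataset. The choice of variables and the object names are assumptions made for illustration.

```r
mpg <- mtcars$mpg   # built-in dataset shipped with R

# Basic statistics
centre <- c(mean = mean(mpg), median = median(mpg))
spread <- c(var = var(mpg), sd = sd(mpg))
quartiles <- quantile(mpg)

# Probability distributions: the prefixes d, p, q, r give the density,
# cumulative probability, quantile, and random numbers
p_norm <- pnorm(1.96)                       # about 0.975
q_norm <- qnorm(0.975)                      # about 1.96
d_binom <- dbinom(2, size = 5, prob = 0.5)  # 0.3125

# Inferential statistics
tt <- t.test(mpg ~ am, data = mtcars)       # two-sample t-test
r <- cor(mtcars$mpg, mtcars$wt)             # correlation coefficient
fit <- lm(mpg ~ wt, data = mtcars)          # simple linear regression

# Multivariate analysis: PCA on four standardized variables
pca <- prcomp(mtcars[, c("mpg", "disp", "hp", "wt")], scale. = TRUE)
summary(pca)
```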

https://itfeature.com, https://gmstat.com

Generating Regular Sequences in R

R has several facilities for generating commonly used sequences of numbers. The following are used to generate regular sequences in R for data analysis tasks:

  • Colon Operator (:)
  • seq() Function
  • rep() Function

Usually, the functions for generating regular sequences in R are used to create index vectors, vectors of evenly spaced numbers, repeated patterns, and sequences for plotting.

Colon Operator (:)

The colon operator generates a sequence of consecutive integers; for example, 1:30 is the vector c(1, 2, …, 29, 30). The colon operator has high priority within an expression; for example, 2*1:15 is evaluated as 2*(1:15), which is the vector c(2, 4, …, 28, 30).

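Let us set $n=10$ and then compare the sequences 1:n-1 and 1:(n-1). Because the colon operator is evaluated before the subtraction, 1:n-1 means (1:n)-1:

n = 10
1:n-1
## Output
[1] 0 1 2 3 4 5 6 7 8 9

1:(n-1)
## Output
[1] 1 2 3 4 5 6 7 8 9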

The expression 30:1 may be used to generate a sequence backward.

30:1
## Output
 [1] 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6
[26]  5  4  3  2  1
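
A few more cases show how the colon operator interacts with other operators and with non-integer start values. This is a small sketch; the object names are arbitrary.

```r
a <- -1:3       # unary minus binds tighter than ':'   -> -1 0 1 2 3
b <- -(1:3)     # negate the whole sequence            -> -1 -2 -3
p <- 2^1:3      # '^' also binds tighter than ':'      -> 2 3
h <- 1.5:4      # non-integer start, steps of 1        -> 1.5 2.5 3.5

class(1:3)      # "integer"
class(h)        # "numeric"
```

When in doubt about precedence, parentheses around the intended range make the expression unambiguous.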

The seq() Function

The seq() function offers more flexibility and control over generating sequences. It has five arguments, only some of which need to be specified in any one call. The first two arguments specify the beginning and end of the sequence.

Like other R functions, the arguments to seq() can also be given in named form, in which case the order in which they appear is irrelevant. The first two arguments of seq() may be named from = value and to = value. Therefore seq(1, 30), seq(from = 1, to = 30), and seq(to = 30, from = 1) are all the same as 1:30. The next two arguments may be named by = value and length.out = value (which can be abbreviated to length = value); they specify a step size and a length for the sequence, respectively. If neither is given, the step defaults to by = 1. Some examples of the seq() function are

seq(1, 20)
## Output
[1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

seq(from = 1, to = 20)
## Output 
[1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

seq(from = 1, to = 20, by = 1)
## Output
[1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

seq(-5, 5, by = 0.2)
## Output
 [1] -5.0 -4.8 -4.6 -4.4 -4.2 -4.0 -3.8 -3.6 -3.4 -3.2 -3.0 -2.8 -2.6 -2.4 -2.2
[16] -2.0 -1.8 -1.6 -1.4 -1.2 -1.0 -0.8 -0.6 -0.4 -0.2  0.0  0.2  0.4  0.6  0.8
[31]  1.0  1.2  1.4  1.6  1.8  2.0  2.2  2.4  2.6  2.8  3.0  3.2  3.4  3.6  3.8
[46]  4.0  4.2  4.4  4.6  4.8  5.0

seq(length = 51, from = -5, by = 0.2)   # same sequence as seq(-5, 5, by = 0.2)

Note that if only the first two arguments are given the result is the same as the colon operator. For example, seq(2, 10) results in the same output as 2:10.

The length.out argument may be used to generate a given number of evenly spaced values between from and to; for example,

# generate a sequence of evenly spaced numbers between 0 and 1
seq(from = 0, to = 1, length.out = 11) 

## Output
[1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
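
The relation between by and length.out can be checked with a short sketch. The particular end points are arbitrary; with from, to, and length.out given, the step is (to - from) / (length.out - 1).

```r
down  <- seq(10, 1, by = -3)            # decreasing sequence: 10 7 4 1
even  <- seq(1, 10, length.out = 4)     # step = (10 - 1) / (4 - 1) = 3 -> 1 4 7 10
s     <- seq(0, 1, length.out = 5)      # step = (1 - 0) / (5 - 1) = 0.25
steps <- diff(s)                        # 0.25 0.25 0.25 0.25
short <- seq(1, 10, by = 4)             # 1 5 9: stops at the last value not exceeding 'to'
```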

The fifth argument may be named along.with = vector (abbreviated here to along = vector). It is normally used as the only argument, and it creates the sequence 1, 2, …, length(vector), or the empty sequence if the vector is empty. For example,

x = rnorm(10)
seq(along = x)

## Output
[1]  1  2  3  4  5  6  7  8  9 10
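
The empty-vector case mentioned above is where this form matters most. The sketch below uses the base R helpers seq_along() and seq_len(), which are not discussed in the original text, to show the difference from 1:length(x).

```r
x <- rnorm(10)
idx <- seq_along(x)              # 1 2 ... 10, same as seq(along.with = x)

empty <- numeric(0)
idx_empty <- seq_along(empty)    # integer(0): a loop over it runs zero times
idx_bad <- 1:length(empty)       # 1 0: usually not what is wanted in a loop

first_three <- seq_len(3)        # 1 2 3
none <- seq_len(0)               # integer(0)
```

Using seq_along(x) instead of 1:length(x) in a for loop avoids iterating over c(1, 0) when x happens to be empty.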

The rep() Function

The rep() function is used for replicating or repeating an object in various ways. The simplest form of the rep() function is

rep(1:5, times = 5)

## Output
[1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5

The call rep(1:5, times = 5) puts five copies of 1:5 end-to-end. Another useful form of the rep() function is

rep(1:5, each = 5)

## Output
[1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4 5 5 5 5 5

The call rep(1:5, each = 5) repeats each element of 1:5 five times before moving on to the next element.
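
The times and each arguments can also be combined, and times may be a vector. The following sketch shows a few such variations; the inputs are arbitrary.

```r
r1 <- rep(1:3, times = c(3, 2, 1))     # 1 1 1 2 2 3
r2 <- rep(1:3, each = 2, times = 2)    # 1 1 2 2 3 3 1 1 2 2 3 3
r3 <- rep(1:3, length.out = 7)         # 1 2 3 1 2 3 1
r4 <- rep(c("a", "b"), times = 2)      # "a" "b" "a" "b"
```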

Frequently Asked Questions about Generating Sequences

  • Describe the R functions that are used to generate regular sequences.
  • What is the use of the seq() function in R?
  • Give some examples of the colon operator in R.
  • Describe the rep() function in R with examples.
  • What is the length.out argument in the seq() function?
  • Write about the important arguments of the seq() function in R.
  • How can one generate a sequence backward? Give an example.

Frequency Table in R: Factors Variable

Recall that in R a factor is a variable that defines a partition into groups. A single factor can be used to create a simple frequency table, while a pair of factors can be used to define a two-way cross-classification (a contingency table). For this purpose, the table() function creates frequency tables. The frequency table is calculated from factors of equal length.

Frequency Table in R of a Categorical/Group/Factor Variable

We will use the mtcars dataset. For the variable $gear$, let us create a frequency table using the table() function. The table() function counts the number of observations for each gear code in the data vector. For example,

attach(mtcars)

freq <- table(gear)
freq
## Output
gear
 3  4  5 
15 12  5 

The freq object contains the frequency of each gear code in the sample (gear is stored as a numeric vector in mtcars, and table() converts it to a factor). Note that the frequencies are ordered and labeled by the levels attribute of the factor.

Frequency Distribution of a Continuous Variable

One can also create a frequency distribution table for a continuous variable. Suppose that, from the mtcars dataset, we are interested in creating a frequency table of the $mpg$ variable. For this purpose, we first need to define the cut points (bins) that define the classes/groups of the frequency table. For example,

table(cut(mpg, 10+5*(0:5)))

## Output
(10,15] (15,20] (20,25] (25,30] (30,35] 
      6      12       8       2       4 

The cut() function is used to split the continuous data vector into groups. The cut points of the groups are defined by creating a sequence of values using 10+5*(0:5), that is

10+5*(0:5)

## Output
[1] 10 15 20 25 30 35

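The cut() function assigns each observation of mpg to one of the intervals defined by breaks = 10+5*(0:5); by default, the intervals are open on the left and closed on the right, such as (10,15]. The table() function then counts the number of observations in each interval, which gives the frequency table shown above.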
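
The effect of the interval convention can be seen in a short sketch. It uses the right argument of cut() and the prop.table() function, neither of which appears in the original text, so treat it as an illustration under those assumptions.

```r
mpg <- mtcars$mpg
breaks <- 10 + 5 * (0:5)             # 10 15 20 25 30 35

# Default: intervals are closed on the right, e.g. (10,15]
freq_right <- table(cut(mpg, breaks = breaks))                 # 6 12 8 2 4

# right = FALSE gives left-closed intervals, e.g. [10,15)
freq_left <- table(cut(mpg, breaks = breaks, right = FALSE))   # 5 13 8 2 4

# Relative frequencies of the right-closed table
rel_freq <- prop.table(freq_right)
```

The two tables differ only because one car has mpg exactly equal to 15, which falls into (10,15] in the first table and into [15,20) in the second.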

Creating Graph of Frequency Table

For the frequency table created above, one can easily create different graphical representations, such as pie charts and bar plots of the frequency table. For example,

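freq <- table(cut(mpg, 10+5*(0:5)))
pie(freq)       # pie chart of the class frequencies
barplot(freq)   # bar plot of the class frequencies
plot(freq)      # vertical-line plot of the table
# hist() needs the raw data rather than the table of counts
hist(mpg, breaks = 10+5*(0:5))

[Figures: bar plot and pie chart of the frequency table]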

Note that when table() is given $k$ factor arguments, the result is a $k$-way array of frequencies.
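
As a sketch of the two-way and $k$-way cases, the following cross-classifies gear with cyl (and then also am) from the same mtcars dataset. The choice of these variables is an assumption made for illustration.

```r
# Two-way table: gear codes in rows, number of cylinders in columns
two_way <- table(gear = mtcars$gear, cyl = mtcars$cyl)
two_way
dim(two_way)                             # 3 3

gear_totals <- margin.table(two_way, 1)  # row totals: 15 12 5
addmargins(two_way)                      # table with row and column totals added

# Three factors give a three-way array of frequencies
three_way <- table(mtcars$gear, mtcars$cyl, mtcars$am)
dim(three_way)                           # 3 3 2
```

The row totals of the two-way table reproduce the one-way frequency table of gear.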
