R Functions Explained

Learn key R functions such as sort(), search(), subset(), sample(), all(), and any() with practical examples. Discover how to check if an element exists in a vector and understand the differences between all() and any(). Perfect for R beginners!

Which function is used for sorting in the R Language?

Several functions in R can be used for sorting data. The most commonly used R functions for sorting are:

  • sort(): Sorts a vector in ascending or descending order. The general syntax is sort(x, decreasing = FALSE, na.last = NA)
  • order(): Returns the indices that would sort a vector (useful for sorting data frames). The general syntax is order(..., na.last = TRUE, decreasing = FALSE)
  • arrange(): Sorts a data frame by one or more columns (requires the dplyr package). The general syntax is arrange(.data, …, .by_group = FALSE)
# sort() Function
vec <- c(3, 1, 4, 1, 5)
sort(vec)                     # Ascending (default): 1 1 3 4 5
sort(vec, decreasing = TRUE)  # Descending: 5 4 3 1 1

# order() Function
df <- data.frame(name = c("Ali", "Usman", "Umar"), age = c(25, 20, 30))
df[order(df$age), ]  # Sort data frame by age (ascending)

# arrange() Function from dplyr package
library(dplyr)
df %>% arrange(age)               # Ascending
df %>% arrange(desc(age))         # Descending

Why is the search() function used?

In R language, the search() function is used to display the current search path of R objects (such as functions, datasets, variables, etc.). This shows the order in which R looks for objects when you reference them.

What does the search() function do?

  • Lists all attached packages and environments in the order R searches them.
  • Helps diagnose issues when multiple packages have functions with the same name (name conflicts).
  • Shows where R will look when you call a function or variable.
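
The points above can be seen in a short session. This is a minimal sketch: stats4 is used only because it ships with R and is usually not attached at startup, and the exact search path on your machine will differ.

```r
# Search path before attaching anything extra
before <- search()
print(before)                   # ".GlobalEnv" comes first, "package:base" last

# Attaching a package inserts it near the front of the search path
library(stats4)                 # illustrative choice; any installed package works
after <- search()
print(setdiff(after, before))   # the entry added by library()

# find() reports which search-path entries define a given name
find("sd")
```

When two attached packages define the same name, the one that appears earlier in search() is used unless you qualify the call with pkg::name.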

What is the use of the subset() and sample() functions in R?

In R language, subset() and sample() are two useful functions for data manipulation and sampling:

  • subset(): extracts subsets of data frames or vectors that meet a condition. The general syntax is subset(x, subset, select, …)
  • sample(): draws a random sample from a dataset, with or without replacement. The general syntax is sample(x, size, replace = FALSE, prob = NULL)

Examples of subset() and sample() are given below:

# Example data frame
df <- data.frame(
  name = c("Ali", "Usman", "Aziz", "Daood"),
  age = c(25, 30, 22, 28),
  salary = c(50000, 60000, 45000, 70000)
)

# Filter rows where age > 25
subset(df, age > 25)

# Filter rows and select specific columns
subset(df, salary > 50000, select = c(name, salary))
# Randomly sample 3 numbers from 1 to 10 without replacement
sample(1:10, 3)

# Sample with replacement (possible duplicates)
sample(1:5, 10, replace = TRUE)

# Sample rows from a data frame
df[sample(nrow(df), 2), ]  # Picks 2 random rows
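
Because sample() is random, the results above change from run to run. A minimal sketch of making a draw reproducible with set.seed() and of using the prob argument follows. The seed value 42 and the weights are arbitrary choices for illustration.

```r
# Same seed, same draw
set.seed(42)
a <- sample(1:10, 3)
set.seed(42)
b <- sample(1:10, 3)
identical(a, b)   # TRUE

# Weighted sampling with replacement: "H" is drawn with probability 0.7
flips <- sample(c("H", "T"), size = 10, replace = TRUE, prob = c(0.7, 0.3))
table(flips)
```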

What is the use of all() and any()?

In R language, the all() and any() functions are logical functions used to evaluate conditions across vectors or arrays.

  • all() function: checks if all elements of a logical vector are TRUE. It returns TRUE only if every element in the input is TRUE, otherwise, it returns FALSE. The general syntax is all(..., na.rm=FALSE)
  • any() function: checks if at least one element of a logical vector is TRUE. It returns TRUE if any element is TRUE and FALSE only if all are FALSE. The general syntax is any(..., na.rm = FALSE)

The examples of all() and any() functions are:

x <- c(TRUE, TRUE, FALSE)
all(x)  # FALSE (not all elements are TRUE)

y <- c(5, 10, 15)
all(y > 3)  # TRUE (all elements are greater than 3)
x <- c(TRUE, FALSE, FALSE)
any(x)  # TRUE (at least one element is TRUE)

y <- c(2, 4, 6)
any(y > 5)  # TRUE (6 is greater than 5)

Note that if NA is present and na.rm = FALSE, any() returns NA unless a TRUE value exists.
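
The same kind of rule holds for all(): it returns NA when an NA is present unless a FALSE value already decides the answer. A short sketch of both cases, using small made-up vectors:

```r
z <- c(TRUE, NA, FALSE)
any(z)                           # TRUE: one TRUE decides the result
all(z)                           # FALSE: one FALSE decides the result

any(c(FALSE, NA))                # NA: the missing value could be TRUE
all(c(TRUE, NA))                 # NA: the missing value could be FALSE

# na.rm = TRUE drops missing values before evaluating
any(c(FALSE, NA), na.rm = TRUE)  # FALSE
all(c(TRUE, NA), na.rm = TRUE)   # TRUE
```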

What are the key differences between all() and any()?

The key differences between all() and any() are:

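| Function | Returns TRUE when            | Returns FALSE when            |
|----------|------------------------------|-------------------------------|
| all()    | All elements are TRUE        | At least one element is FALSE |
| any()    | At least one element is TRUE | All elements are FALSE        |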

What is the R command to check if element 15 is present in a vector $x$?

One can check whether an element (say, 15) is present in a vector x using any of the following:

  • %in% Operator
  • any() with logical comparison
  • which() to find the position of 15
# %in%
x <- c(10, 15, 20, 25)
15 %in% x  # Returns TRUE
30 %in% x  # Returns FALSE

# any()
x <- c(5, 10, 15)
any(x == 15)  # TRUE
any(x == 99)  # FALSE

# which()
x <- c(10, 15, 20, 15)
which(x == 15)  # Returns c(2, 4)


The glm Function in R

Learn about the glm function in R with this comprehensive Q&A guide. Understand logistic regression, Poisson regression, syntax, families, key components, use cases, model diagnostics, and goodness of fit. Includes a practical example for logistic regression using glm() function in R.

What is the glm function in the R language?

The glm (Generalized Linear Models) function in R is a powerful tool for fitting linear models to data where the response variable may have a non-normal distribution. It extends the capabilities of traditional linear regression to handle various types of response variables through the use of link functions and exponential family distributions.

Since the distribution of the response depends on the stimulus variables through a single linear function only, the same mechanism as was used for linear models can still be used to specify the linear part of a generalized model.

What is Logistic Regression?

Logistic regression is used to predict a binary outcome from a given set of predictor variables, which may be continuous or categorical.

What is the Poisson Regression?

Poisson regression is used to predict an outcome variable that represents counts from a given set of predictor variables, which may be continuous or categorical.
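
As a minimal sketch of a Poisson regression, the built-in warpbreaks data set (number of warp breaks per loom, by wool type and tension) can be used. The choice of data set and the object name pois_model are illustrative assumptions, and no claim is made that this model fits the data well.

```r
# Count response (breaks) modelled with two categorical predictors
data(warpbreaks)
pois_model <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)

coef(pois_model)        # coefficients on the log scale
exp(coef(pois_model))   # multiplicative effects on the expected count
```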

What is the general syntax of the glm function in R Language?

The general syntax of the glm() function for fitting a Generalized Linear Model in R is:

glm(formula, family = gaussian, data, weights, subset, na.action, start = NULL,
    etastart, mustart, offset, control = list(...), model = TRUE, method = "glm.fit",
    x = FALSE, y = TRUE, contrasts = NULL, ...)

What are families in R?

The class of Generalized Linear Models handled by facilities supplied in R includes Gaussian, Binomial, Poisson, Inverse Gaussian, and Gamma response distributions, and also quasi-likelihood models where the response distribution is not explicitly specified. In the latter case, the variance function must be specified as a function of the mean, but in other cases, this function is implied by the response distribution.

What are the key components of the glm function in R?

Formula

It specifies the relationship between variables, similar to lm(). For example,

y ~ x1 + x2 + x3  # main effects
y ~ x1*x2         # main effects plus interaction
y ~ .             # all other columns in data as predictors

Family

It defines the error distribution and link function. Common families are:

  • gaussian(): Normal distribution (default)
  • binomial(): Logistic regression (binary outcomes)
  • poisson(): Poisson regression (count data)
  • Gamma(): Gamma regression
  • inverse.gaussian(): Inverse Gaussian distribution
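
A family is an ordinary R object, so its distribution name and default link can be inspected directly. The sketch below only queries the family constructors listed above; the object name fam is illustrative.

```r
fam <- binomial()
fam$family                       # "binomial"
fam$link                         # "logit"

# Default links of the other families
gaussian()$link                  # "identity"
poisson()$link                   # "log"
Gamma()$link                     # "inverse"
inverse.gaussian()$link          # "1/mu^2"

# A non-default link is requested through the link argument
binomial(link = "probit")$link   # "probit"
```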

Each family has a default link function (e.g., logit for binomial, log for Poisson), and a different link can be requested through the link argument.

What are the common use cases of the glm() function?

Logistic Regression (Binary Outcomes)

model <- glm(outcome ~ predictor1 + predictor2, family = binomial(link = "logit"),
             data = mydata)

Poisson Regression (Count Data)

model <- glm(count ~ treatment + offset(log(exposure)), family = poisson(link = "log"),
             data = count_data)

What statistics can be computed after fitting a glm() model?

After fitting a model, one can use:

summary(model)   # Detailed output including coefficients
coef(model)      # Model coefficients
confint(model)   # Confidence intervals
predict(model)   # Predicted values

What are model diagnostics and goodness-of-fit?

The following are built-in glm() model diagnostics and goodness of fit:

anova(model, test = "Chisq")  # Analysis of deviance
residuals(model)              # Various residual types available
plot(model)                   # Diagnostic plots
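
The comment "various residual types" refers to the type argument of residuals(). A small sketch, assuming a logistic model fitted to the built-in mtcars data (the object names are illustrative):

```r
model <- glm(am ~ hp + wt, family = binomial, data = mtcars)

dev_res  <- residuals(model, type = "deviance")   # the default type
pear_res <- residuals(model, type = "pearson")

# The squared deviance residuals add up to the residual deviance
sum(dev_res^2)
deviance(model)
df.residual(model)   # residual degrees of freedom
```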

Give an example of logistic regression fitting using glm() function.

Consider the mtcars data set, where am (transmission: 0 = automatic, 1 = manual) is the response variable:

# Fit model
data(mtcars)
model <- glm(am ~ hp + wt, family = binomial, data = mtcars)

# View results
summary(model)

# Predict probabilities
predict(model, type = "response")

# Plot
par(mfrow = c(2, 2))
plot(model)

Tips for effective use of the glm() function

  1. Always check model assumptions and diagnostics
  2. For binomial models, the response can be:
    • A factor (first level = failure, others = success)
    • A numeric vector of 0/1 values
    • A two-column matrix of successes/failures
  3. Use drop1() or add1() for model selection
  4. Consider glm.nb() from the MASS package for overdispersed count data
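
Tips 3 and 4 can be sketched with built-in data. The ratio of the Pearson chi-square statistic to the residual degrees of freedom is one common, informal check for overdispersion. It is shown here as an assumption about how one might screen a Poisson model, not as a formal test, and the data sets and object names are illustrative.

```r
# Tip 3: single-term deletions with likelihood-ratio (chi-square) tests
logit_model <- glm(am ~ hp + wt, family = binomial, data = mtcars)
d1 <- drop1(logit_model, test = "Chisq")
d1

# Tip 4: informal overdispersion check for a Poisson model
pois_model <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)
dispersion <- sum(residuals(pois_model, type = "pearson")^2) /
  df.residual(pois_model)
dispersion   # values well above 1 suggest overdispersion
```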

The glm() function is fundamental to many statistical analyses in R, providing the flexibility to handle response variables that do not follow a normal distribution.


Use of Important Functions in R

Looking for the most important functions in R? This blog post answers key questions like creating frequency tables (table()), redirecting output (sink()), transposing data, calculating standard deviation, performing t-tests, ANOVA, and more. Perfect for R beginners and data analysts!

  • Important functions in R
  • R programming cheat sheet
  • Frequency table in R (table())
  • How to use sink() in R
  • Transpose data in R (t())
  • Standard deviation in R (sd())
  • T-test, ANOVA, and Shapiro-Wilk test in R
  • Correlation and covariance in R
  • Scatterplot matrices (pairs())
  • Diagnostic plots in R

This Q&A-style guide to important functions in R covers essential functions with clear examples, helping you master data manipulation, statistical tests, and visualization in R. Whether you’re a beginner or an intermediate user, this post will strengthen your R programming skills!

Which function is used to create a frequency table in R?

In R, a frequency table can be created using the table() function.
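
A short sketch using the built-in mtcars data (the data set is an illustrative choice):

```r
# One-way frequency table: number of cars per cylinder count
cyl_counts <- table(mtcars$cyl)
cyl_counts

# Two-way table: cylinders by transmission type
table(mtcars$cyl, mtcars$am)

# Relative frequencies
prop.table(cyl_counts)
```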

What is the use of sink() function?

The sink() function in R is used to redirect R output (such as the results of computations, printed messages, or console output) to a file instead of displaying it in the console. This is particularly useful for saving logs, results of analyses, or any other text output generated by R scripts.
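
A minimal sketch of redirecting output to a file and restoring the console afterwards. A temporary file is used here only so the example cleans up after itself; in practice you would pass your own file name.

```r
out_file <- tempfile(fileext = ".txt")   # illustrative target file

sink(out_file)                 # start redirecting output to the file
print(summary(mtcars$mpg))
cat("Done\n")
sink()                         # stop redirecting; output returns to the console

readLines(out_file)            # inspect what was written
```

Each call to sink(file) should be paired with a later sink() call; otherwise output keeps going to the file.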

Explain what transpose is and how it is performed.

Transposing reshapes data by swapping the rows and columns of a matrix or data frame, which is often needed before analysis. It is performed with the t() function.
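
For example, with a small made-up matrix and data frame:

```r
m <- matrix(1:6, nrow = 2)   # 2 rows, 3 columns
m
t(m)                         # 3 rows, 2 columns
dim(t(m))                    # 3 2

# t() on a data frame returns a matrix, not a data frame
df_t <- t(data.frame(a = 1:2, b = 3:4))
class(df_t)
```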

What is the length function in R?

The length() function in R gets or sets the length of a vector, list, or other object. It can be used with all R objects. For an environment, it returns the number of objects in it, and for NULL it returns 0.
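
A brief sketch of these cases (the objects are made up for illustration):

```r
v <- 1:5
length(v)                    # 5
length(list(1, "a", TRUE))   # 3
length(NULL)                 # 0
length(mtcars)               # 11: for a data frame, the number of columns

e <- new.env()
assign("a", 1, envir = e)
length(e)                    # 1: number of objects in the environment

# Setting the length pads with NA or truncates
length(v) <- 7
v                            # 1 2 3 4 5 NA NA
length(v) <- 3
v                            # 1 2 3
```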

What is the difference between seq(4) and seq_along(4)?

seq(4) returns the vector from 1 to 4, i.e., c(1, 2, 3, 4), whereas seq_along(4) returns a sequence as long as its argument. Since 4 is a vector of length 1, the result is simply 1.
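
The difference matters most when looping over a vector that may be empty. A short sketch with made-up vectors:

```r
seq(4)                 # 1 2 3 4
seq_along(4)           # 1

x <- c(10, 20, 30)
seq_along(x)           # 1 2 3

# With an empty vector, seq_along() gives an empty sequence,
# while 1:length() counts down from 1 to 0
empty <- numeric(0)
seq_along(empty)       # integer(0)
1:length(empty)        # 1 0
```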

Vector $v$ is c(1,2,3,4) and list $x$ is list(5:8). What is the output of v*x[[1]]?

x[[1]] extracts the vector 5:8 from the list, so the multiplication is element-wise:

[1]  5 12 21 32


How do you get the standard deviation for a vector $x$?

Use sd(x, na.rm = TRUE). The argument na.rm = TRUE drops missing values before the standard deviation is computed.

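$x$ is the vector c(5,9.2,3,8.51,NA). What is the output of mean(x)?

The output will be NA, because mean() does not drop missing values by default. Use mean(x, na.rm = TRUE) to compute the mean of the non-missing values.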
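
The effect of na.rm on both mean() and sd() can be checked directly with the vector from the question:

```r
x <- c(5, 9.2, 3, 8.51, NA)

mean(x)                  # NA
mean(x, na.rm = TRUE)    # 6.4275

sd(x)                    # NA
sd(x, na.rm = TRUE)      # standard deviation of the four non-missing values
```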


How can one compute correlation and covariance in R?

Correlation is computed with the cor() function and covariance with the cov() function.
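
A short sketch using two columns of the built-in mtcars data (an illustrative choice):

```r
r  <- cor(mtcars$mpg, mtcars$wt)   # Pearson correlation (the default)
cv <- cov(mtcars$mpg, mtcars$wt)   # covariance
r
cv

# Correlation is the covariance scaled by both standard deviations
cv / (sd(mtcars$mpg) * sd(mtcars$wt))

# Correlation matrix for several columns, and a rank-based alternative
cor(mtcars[, c("mpg", "wt", "hp")])
cor(mtcars$mpg, mtcars$wt, method = "spearman")
```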

How to create scatterplot matrices?

The pairs() function (base graphics) or the splom() function (lattice package) can be used to create scatterplot matrices.
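
A minimal sketch with base graphics, using three columns of mtcars chosen for illustration:

```r
vars <- mtcars[, c("mpg", "wt", "hp")]

# One scatterplot for every pair of columns
pairs(vars, main = "Scatterplot matrix of mpg, wt, and hp")

# The same plot through the formula interface
pairs(~ mpg + wt + hp, data = mtcars)
```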

What is the use of diagnostic plots?

Diagnostic plots are used to check a fitted model for normality of the residuals, heteroscedasticity, and influential observations.

What is the principal() function?

It is defined in the psych package and is used to extract, and optionally rotate, principal components.

Define mshapiro.test()?

It is a function defined in the mvnormtest package. It performs the Shapiro-Wilk test for multivariate normality.

Define bartlett.test().

The bartlett.test() function provides a parametric k-sample test of the equality of variances.
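
A short sketch with the built-in InsectSprays data (insect counts under six sprays), chosen for illustration. Note that the function in the stats package is spelled bartlett.test().

```r
# Test whether the variance of count is the same across the six sprays
bt <- bartlett.test(count ~ spray, data = InsectSprays)
bt

bt$statistic   # Bartlett's K-squared
bt$parameter   # degrees of freedom: number of groups minus 1
bt$p.value     # a small p-value is evidence against equal variances
```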

Define anova() function.

The anova() function computes analysis of variance (or deviance) tables for fitted models. When it is given two or more nested models, it compares them.
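
A minimal sketch of comparing two nested linear models on mtcars (the models are illustrative):

```r
fit_small <- lm(mpg ~ wt, data = mtcars)        # reduced model
fit_large <- lm(mpg ~ wt + hp, data = mtcars)   # adds hp to the reduced model

# F-test of whether adding hp improves the fit
cmp <- anova(fit_small, fit_large)
cmp
```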

Define plotmeans().

It is defined in the gplots package and produces a plot of group means, with confidence intervals, for a single factor.

Define loglm() function.

The loglm() function, from the MASS package, is used to fit log-linear models.

What is the t-test in R?

A t-test is used to determine whether the means of two groups are equal. In R, it is performed with the t.test() function.
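
A short sketch comparing mean fuel consumption between automatic and manual cars in mtcars (an illustrative choice of data):

```r
# Two-sample t-test of mpg by transmission type (am: 0 = automatic, 1 = manual).
# By default t.test() runs Welch's test, which does not assume equal variances.
tt <- t.test(mpg ~ am, data = mtcars)
tt

tt$estimate   # the two group means
tt$p.value    # a small p-value is evidence that the means differ

# Classical Student's t-test, assuming equal variances
t.test(mpg ~ am, data = mtcars, var.equal = TRUE)
```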
