Mastering Data Manipulation Functions in R

Learn essential data manipulation functions in R, such as with(), by(), subset(), sample(), and the concatenation functions, in this Q&A guide for students, researchers, and R programmers. The post breaks down these functions in a question-answer format, covering:
  • with() vs by() – When to use each for efficient data handling.
  • Concatenation functions (c(), paste(), cbind(), etc.) – Combine data like a pro.
  • subset() vs sample() – Filter data and generate random samples effortlessly.
The guide includes practical examples to boost R programming skills for data analysis, research, and machine learning.

Data Manipulation Functions in R

What are the with() and by() functions in R used for?

In R programming, with() and by() are two useful base functions for data manipulation and analysis.

  • with() Function: evaluates an expression inside a data environment (such as a data frame or list), so columns can be referenced without repeating the dataset name. The syntax and an example are:
    with(data, expr)

    df <- data.frame(x = 1:5, y = 6:10)
    with(df, x + y) # returns 7 9 11 13 15
  • by() Function: applies a function to subsets of a dataset split by one or more factors (similar to GROUP BY in SQL). The syntax and an example are:
    by(data, INDICES, FUN, ...)

    df <- data.frame(group = c("A", "B", "A", "B"), value = c(10, 20, 30, 40))
    by(df$value, df$group, mean) # computes the mean for each group

Use with() to simplify code when working with columns in a data frame.

Use by() (or dplyr/tidyverse alternatives) for group-wise computations.


Both with() and by() are base R functions, but modern alternatives from dplyr (mutate(), summarize(), group_by()) are often preferred for readability. The key differences between with() and by() are:

| Function | Purpose | Input | Output |
|----------|---------|-------|--------|
| with() | Evaluate expressions in a data environment | Data frame + expression | Result of expression |
| by() | Apply a function to groups of data | Data + grouping factor + function | Results by group |
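
As a rough illustration of the comparison above, the following base R sketch applies both functions to a small made-up data frame. The data and the names scores, total, and group_means are assumptions chosen for illustration only.

```r
# Hypothetical example data: two groups with two values each
scores <- data.frame(
  group = c("A", "B", "A", "B"),
  value = c(10, 20, 30, 40)
)

# with(): evaluate an expression using the columns of 'scores' directly
total <- with(scores, sum(value))

# by(): apply mean() to 'value' separately for each level of 'group'
group_means <- by(scores$value, scores$group, mean)

print(total)
print(group_means)

# tapply() computes the same group means and returns a plain named array
group_means_vec <- tapply(scores$value, scores$group, mean)
print(group_means_vec)
```

The tapply() call is included only because its result is easier to index by name than the printed object that by() returns.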

What are the concatenation functions in R?

In the R programming language, concatenation refers to combining values into vectors, lists, or other structures. The following are primary concatenation functions:

  • c() Basic Concatenation: is used to combine elements into a vector (atomic or list). It works with numbers, characters, logical values, and lists. The examples are
    x <- c(1, 2, 3)
    y <- c("a", "b", "c")
    z <- c(TRUE, FALSE, TRUE, TRUE)
  • paste() and paste0() String Concatenation: are used to combine strings (character vectors) with optional separators. The key difference between paste() and paste0() is the separator: paste() inserts a space by default, while paste0() uses no separator. The examples are:
    paste("Hello", "world")
    paste0("hello", "world")
    paste(c("A", "B"), 1:2, sep = "-")
  • cat() Print Concatenation: is used to concatenate outputs to the console/file (it is not used for storing results). It is useful for printing messages or writing to files. The example is:
    cat("R Frequently Asked Questions", "https://rfaqs.com", "\n")
  • append() Insert into Vectors/ Lists: is used to add elements to an existing vector/ list at a specified position.
    x <- c(1, 2, 3)
    append(x, 4, after = 2) # inserts 4 after position 2
  • cbind() and rbind() Matrix/Data Frame Concatenation: are used to combine objects column-wise and row-wise, respectively. They work with vectors, matrices, or data frames. The examples are:
    df1 <- data.frame(A = 1:2, B = c("X", "Y"))
    df2 <- data.frame(A = 3:4, B = c("Z", "W"))
    rbind(df1, df2) # stacks rows
    cbind(df1, C= c(10, 20)) # adds a new column
  • list() Concatenate into a list: is used to combine elements into a list (preserves structure, unlike c()). The example is:
    my_list <- list(1, "a", TRUE, 10:15) # keeps elements as separate list items

The key differences between these concatenation functions are:

| Function | Output Type | Use Case |
|----------|-------------|----------|
| c() | Atomic vector/list | Simple element concatenation |
| paste() | Character vector | String merging with separators |
| cat() | Console output | Printing/writing text |
| append() | Modified vector/list | Inserting elements at a position |
| cbind() | Matrix/data frame | Column-wise combination |
| rbind() | Matrix/data frame | Row-wise combination |
| list() | List | Preserves heterogeneous elements |
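
One difference that the table only hints at is type coercion. The short sketch below (variable names are illustrative assumptions) shows how c() converts mixed inputs to one common type while list() keeps each element as it is, and how the collapse argument of paste() joins a whole vector into a single string.

```r
# c() coerces mixed inputs to a single common type (here: character)
mixed_vec <- c(1, "a", TRUE)
print(class(mixed_vec))            # "character"

# list() keeps each element's original type
mixed_list <- list(1, "a", TRUE)
print(sapply(mixed_list, class))   # "numeric" "character" "logical"

# paste() with 'sep' works element-wise; 'collapse' then joins the results
labels <- paste(c("A", "B"), 1:2, sep = "-")
joined <- paste(labels, collapse = ", ")
print(joined)                      # "A-1, B-2"
```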

What is the use of subset() function and sample() function in R?

subset() and sample() are essential R functions for filtering data and random sampling, respectively. Use subset() to filter rows or select columns based on logical conditions; many find its syntax cleaner than df[df$age > 25, ]. Use sample() to draw random samples (such as for machine learning splits), shuffle data, or perform bootstrapping.

  • subset() function: is used to filter rows and select columns from a data frame based on conditions. It provides a cleaner syntax compared to base R subsetting with []. The syntax and example are:
    subset(data, subset, select)

    df <- data.frame(
    name = c("Ali", "Usman", "Imdad"),
    age = c(25, 30, 22),
    score = c(85, 90, 60))
    subset(df, age > 25)
    subset(df, age > 25, select = c(name, score))
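    Note that subset() is used most often with data frames, although it also has methods for vectors and matrices.
  • sample() Function: is used for random sampling from a vector. To sample rows of a data frame, sample the row indices and use them to index the data frame. It helps create train-test splits, bootstrap samples, and randomized data orderings. The syntax and example are: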
    sample(x, size, replace = FALSE, prob = NULL)

    sample(1:10, 3) # sample 3 numbers from 1 to 10 without replacement
    sample(1:6, 10, replace = TRUE) # 6 possible outcomes, sampled 10 times with replacement
    sample(letters[1:5]) # shuffle the letters a to e

The key differences between subset() and sample() are:

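| Feature | subset() | sample() |
|---------|----------|----------|
| Purpose | Filter data based on conditions | Randomly select elements/rows |
| Input | Data frames (also vectors, matrices) | Vectors (row indices for data frames) |
| Output | Subsetted data frame | Randomly sampled elements |
| Use Case | Data cleaning, filtering | Train-test splits, bootstrapping |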
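
To make the contrast concrete, here is a minimal sketch of a deterministic filter with subset() next to a random 70/30 train-test split built with sample(). The data frame people, the seed value, and the split proportion are all assumptions for illustration.

```r
set.seed(123)  # arbitrary seed so the random split is reproducible

# Hypothetical data: 10 rows
people <- data.frame(
  id  = 1:10,
  age = c(25, 30, 22, 41, 35, 19, 28, 33, 27, 45)
)

# subset(): deterministic filter on a logical condition
older <- subset(people, age > 25, select = c(id, age))

# sample(): draw 7 of the 10 row indices at random, without replacement
train_idx <- sample(nrow(people), size = 7)
train_set <- people[train_idx, ]
test_set  <- people[-train_idx, ]

print(nrow(older))
print(nrow(train_set))
print(nrow(test_set))
```

Sampling row indices rather than the data frame itself is the design choice here: calling sample() directly on a data frame would sample its columns, not its rows.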

Statistics and Data Analysis

Python NumPy MCQs 11

These Python NumPy MCQs are designed to test your understanding of fundamental NumPy concepts, including array creation, manipulation, and common operations. The Python NumPy MCQs Quiz consists of multiple-choice questions (MCQs) covering essential topics such as:

  • Array creation (np.array(), np.zeros(), np.arange())
  • Array properties (.shape, .size)
  • Basic operations (dot product, arithmetic)
  • NumPy terminology

Each question is followed by the correct answer, making it useful for self-assessment, interviews, or exam preparation. Whether you are a beginner or an intermediate Python programmer, this Python Quiz helps reinforce key NumPy skills efficiently. Let us start with the Python NumPy MCQs now.

1. How do you access the element at the second row and third column of a 2D NumPy array ‘arr’?

2. What does np.random.seed(42) do?

3. What Python libraries are commonly used for data mining?

4. Which function is used to create a NumPy array?

5. What is the output of np.zeros((2,3))?

6. Which is the correct way to create a $2\times2$ NumPy array filled with ones?

7. What result will the following lines of code give?
a=np.array([0,1])
b=np.array([1,0])
np.dot(a,b)

8. What does np.arange(5) produce?

9. Which of the following statements about creating and manipulating multi-dimensional arrays in Python using NumPy are true?

10. What is the primary purpose of the NumPy library in Python?

11. If you run the following lines of code, what values will the variable ‘out’ take?
X=np.array([[1,0,1],[2,2,2]])
out=X[0:2,2]

12. What line of code would produce the following: array([11, 11, 11, 11, 11])?

13. Which method returns the shape of a NumPy array?

14. After executing the given code, what value does $Z$ hold?
X=np.array([[1,0],[0,1]])
Y=np.array([[2,1],[1,2]])
Z=np.dot(X,Y)

15. Which of the following operations can be performed on NumPy arrays?

16. What does NumPy stand for?

17. Which of the following are valid ways to create a NumPy array?

18. If you run the following lines of code, what values will the variable $Z$ take?
X=np.array([[1,0],[0,1]])
Y=np.array([[2,2],[2,2]])
Z=np.dot(X,Y)

19. What does the value of $Z$ become after executing the following code?
X=np.array([[1,0],[0,1]])
Y=np.array([[0,1],[1,0]])
Z=X+Y

20. What outcome do the following lines of code produce?
a=np.array([0,1,0,1,0])
b=np.array([1,0,1,0,1])
a+b



How to Save Data in R

This post is about saving and reading data in the R language. Learn how to save and read data in R with this comprehensive guide. Discover functions like write.csv(), saveRDS(), and read.table(), understand keyboard input using readline() and scan(), and master file-based data loading for matrices and datasets. Perfect for beginners and intermediate R users!

How can you Save the Data in R Language?

There are many ways to save data in R. In a menu-driven GUI such as R Commander, click Data –> Active Data Set –> Export Active Data, then click OK in the dialogue box that appears to save the data. The other ways to save data in R are:

Saving to CSV Files

# Base R
write.csv(your_DataFrame, "path/to/file.csv", row.names = FALSE)

# readr (tidyverse) package
library(readr)
write_csv(your_DataFrame, "path/to/file.csv")

Saving to MS Excel Files

To save data to Excel files, the writexl or openxlsx package can be used

library(writexl)
write_xlsx(your_DataFrame, "path/to/file.xlsx")

Saving to R’s Native Formats

One or more objects can be saved to a single .RData file with save(), while saveRDS() saves a single object to an .rds file.

# .RData file
save(object1, object2, file = "path/to/data.RData")

# .rds file
saveRDS(your_DataFrame, "path/to/data.rds")
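
The sketch below shows a round trip for both native formats. It writes to temporary files created with tempfile() so nothing on disk is overwritten; the object names scores and threshold are assumptions for illustration.

```r
# Hypothetical objects to save
scores <- data.frame(name = c("Ali", "Usman"), score = c(85, 90))
threshold <- 50

# .rds: one object per file; readRDS() returns it, so any name can be used
rds_path <- tempfile(fileext = ".rds")
saveRDS(scores, rds_path)
restored <- readRDS(rds_path)
print(identical(scores, restored))

# .RData: several objects per file; load() restores them under their
# original names and returns those names as a character vector
rdata_path <- tempfile(fileext = ".RData")
save(scores, threshold, file = rdata_path)
rm(scores, threshold)
loaded_names <- load(rdata_path)
print(loaded_names)
```

A practical consequence of this difference is that readRDS() never silently replaces an existing object in the workspace, whereas load() will overwrite any object that shares a name with one stored in the file.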

Saving to Text Files

Data can be saved to text files using the following commands:

# Using base R
write.table(your_DataFrame, "path/to/file.txt", sep = "\t", row.names = FALSE)

# Using the readr package
library(readr)
write_delim(your_DataFrame, "path/to/file.txt", delim = "\t")

Saving to JSON File Format

The data can be saved to a JSON file format using the jsonlite package.

library(jsonlite)
write_json(your_DataFrame, "path/to/file.json")

Saving Data to Databases

Data can also be written to SQL databases (such as SQLite or PostgreSQL). For example, with SQLite:

library(DBI)
library(RSQLite)

# Create a database connection
con <- dbConnect(RSQLite::SQLite(), "path/to/database.db")

# Write a data frame to the database
dbWriteTable(con, "table_name", your_DataFrame)

# Disconnect when done
dbDisconnect(con)

Saving Data to Other Statistical Software Formats

The haven package can be used to save data for SPSS, Stata, or SAS. For example

library(haven)
write_sav(your_DataFrame, "path/to/file.sav")  # SPSS file format
write_dta(your_DataFrame, "path/to/file.dta")  # Stata file format

It is important to note that

  • File Paths: Use absolute file paths, for example, D:/projects/data.csv, or relative paths such as data/file.csv.
  • Overwriting: By default, R will overwrite existing files. Add checks to avoid accidental loss, for example,
    if (!file.exists("file.csv")) {
      write.csv(your_DataFrame, "file.csv")
    }

How to Read Data from the Keyboard?

To read data from the keyboard, one can use the following functions:

  • scan(): reads values typed at the keyboard into a vector or list
  • readline(): reads a single line of text typed at the keyboard and returns it as a string
  • print(): displays values on the console; it shows output rather than reading input.

Explain How to Read Data or a Matrix from a File?

  • read.table(): the read.table() function is usually used to read data. Its header argument defaults to FALSE, so the argument can be omitted when the file has no header row.
  • Use read.csv() (or read.table() with sep = ",") to read files exported from a spreadsheet in which columns are separated by commas instead of white space. For MS Excel files, use a package function such as read.xls() from gdata or read_excel() from readxl.
  • When you read a matrix using read.table(), the resulting object is a data frame, even when all entries are numeric. The as.matrix() function converts it to a matrix like this:
    as.matrix(x)
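
The points above can be sketched end to end as follows. The example writes a small matrix to a temporary file, reads it back with read.table(), and converts the result with as.matrix(); the file location and the names m, x, and mat are assumptions for illustration.

```r
# Hypothetical 2 x 3 matrix written to a temporary text file
m <- matrix(1:6, nrow = 2)
path <- tempfile(fileext = ".txt")
write.table(m, path, row.names = FALSE, col.names = FALSE)

# header = FALSE is the default, so it is not specified here
x <- read.table(path)
print(class(x))      # "data.frame"

# Convert the data frame to a matrix and drop the automatic V1, V2, ... names
mat <- as.matrix(x)
dimnames(mat) <- NULL
print(mat)
```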

What is scan() Function in R?

The scan() function is used to read data into a vector or list from the console or a file.

z <- scan()
1: 12 5
3: 5
4:
Read 3 items

z
### Output
12 5 5
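
The session above needs an interactive console. As a non-interactive sketch, the text argument of scan() can stand in for typed input, which is handy for trying the function in a script; the input strings here are made up for illustration.

```r
# Numeric input: scan() reads doubles by default
z <- scan(text = "12 5 5", quiet = TRUE)
print(z)

# Character input: 'what' sets the type of the values to read
w <- scan(text = "xyz vw u", what = character(), quiet = TRUE)
print(w)
```

Setting quiet = TRUE suppresses the "Read 3 items" message shown in the interactive session.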

What is readline() Function in R?

The readline() function reads a single line typed at the keyboard and returns it as a string. (Reading text lines from a file or other connection is done with readLines() instead.) For example,

w <- readline()
xyz vw u

w

## Output

xyz vw u
