R Frequently Asked Questions

Statistical Computing and Graphics in R

Factors in R (Categorical Data)


Factors in the R language represent categorical data. A factor can be ordered or unordered. One can think of a factor as an integer vector in which each integer has a label. Modeling functions such as lm() and glm() treat factors specially. A factor stores the distinct categories of the data as its levels, and it can be created from both character and integer values.

Using factors with labels is better than using plain integers because factors are self-describing: a variable with the values “Male” and “Female” is easier to interpret than a variable with the values 1 and 2.
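The difference between ordered and unordered factors can be sketched as follows. The `sizes` data below is hypothetical and is not part of the original example; it only illustrates that an ordered factor supports comparisons between its levels.

```r
# Hypothetical data: clothing sizes with a natural ordering
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"),
                ordered = TRUE)

is.ordered(sizes)   # TRUE
sizes < "large"     # TRUE FALSE TRUE TRUE (compares positions in the level order)
max(sizes)          # large (the highest level present in the data)
```

For an unordered factor, comparisons such as `<` are not meaningful and return NA with a warning.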

Creating a Simple Factor

The following code creates a simple factor that has two levels:

# Simple factor with two levels
x <- factor(c("yes", "yes", "no", "yes", "no"))
# frequency of each level
table(x)

# strip the class to show the underlying integer codes and the levels attribute
unclass(x)
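As a small sketch of how the factor above is stored, the base R functions levels(), nlevels(), and as.integer() show the labels and the integer codes separately:

```r
x <- factor(c("yes", "yes", "no", "yes", "no"))

levels(x)       # "no" "yes" (levels are sorted alphabetically by default)
nlevels(x)      # 2
as.integer(x)   # 2 2 1 2 1 (position of each value in levels(x))
table(x)        # no: 2, yes: 3
```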

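By default, the levels of a factor are sorted alphabetically, so "no" comes before "yes" in the example above. The order of the levels can be set explicitly using the levels argument to factor(). This can be important in linear modeling because the first level is used as the baseline level.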

x <- factor(c("yes","yes","no","yes","no"), levels = c("yes","no"))
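A minimal sketch of how the level order determines the baseline is given below. The variable names are illustrative only, and the model.matrix() line assumes R's default treatment contrasts.

```r
answers <- c("yes", "yes", "no", "yes", "no")

# Default: alphabetical order, so "no" is the first (baseline) level
x_default <- factor(answers)
levels(x_default)     # "no" "yes"

# Explicit order: "yes" becomes the baseline
x_explicit <- factor(answers, levels = c("yes", "no"))
levels(x_explicit)    # "yes" "no"

# relevel() moves a chosen level to the front of an existing factor
x_releveled <- relevel(x_default, ref = "yes")
levels(x_releveled)   # "yes" "no"

# The dummy column is built for the non-baseline level
colnames(model.matrix(~ x_explicit))   # "(Intercept)" "x_explicitno"
```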

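The levels of a factor can be renamed using the labels argument of factor(). The abbreviated form label also works because R matches argument names partially, but the full name is clearer. The labels argument replaces the old values of the variable with new ones, matched in order to the levels argument. For example,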

# numeric labels are stored as the character levels "1" and "2"
x <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no"), labels = c(1, 2))

x <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no"), labels = c("level-1", "level-2"))

x <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no"), labels = c("group-1", "group-2"))
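To check the effect of relabeling, one can inspect the result. This sketch uses the full argument name `labels`, and the `answers` vector is introduced here only for readability:

```r
answers <- c("yes", "yes", "no", "yes", "no")

# "yes" is relabeled as "group-1" and "no" as "group-2"
x <- factor(answers, levels = c("yes", "no"),
            labels = c("group-1", "group-2"))

levels(x)         # "group-1" "group-2"
as.character(x)   # "group-1" "group-1" "group-2" "group-1" "group-2"
table(x)          # group-1: 3, group-2: 2
```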

Suppose you have a factor whose levels are numbers, and you want to compute the mean of the underlying values. Calling mean() on the numeric vector returns its average, but calling mean() on the factor returns NA with a warning, because a factor is not numeric. To calculate the mean of the original numeric values of the factor f, convert the factor back to numbers through its levels, using the levels() function. For example,

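# numeric vector
v <- c(10, 20, 20, 50, 10, 20, 10, 50, 20)
# vector converted to factor
f <- factor(v)

# mean of the vector
mean(v)

# mean of the factor: returns NA with a warning
mean(f)

# mean of the original values, recovered through the levels
mean(as.numeric(levels(f)[f]))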
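A common pitfall is to call as.numeric() directly on the factor, which returns the internal integer codes rather than the original values. The sketch below contrasts that with two conversions that do recover the values:

```r
v <- c(10, 20, 20, 50, 10, 20, 10, 50, 20)
f <- factor(v)

# Pitfall: these are the level codes, not the data
as.numeric(f)                  # 1 2 2 3 1 2 1 3 2

# Two ways to recover the original numbers
as.numeric(as.character(f))    # 10 20 20 50 10 20 10 50 20
as.numeric(levels(f))[f]       # same result; converts each level only once

mean(as.numeric(levels(f))[f]) == mean(v)   # TRUE
```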

Use of cut( ) Function to Create a Factor Variable

The cut() function can also be used to convert a numeric variable into a factor. The breaks argument describes how ranges of numbers are converted to factor levels. If breaks is set to a single number, the factor is created by dividing the range of the variable into that number of equal-length intervals. If a vector of values is given to breaks, those values are used as the breakpoints, and the number of levels of the resulting factor is one less than the number of values in the vector. For example,

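# use the mpg column of the built-in mtcars data set
mpg <- mtcars$mpg
# three equal-length intervals
cut(mpg, breaks = 3)

# four intervals from five explicit breakpoints
factors <- cut(mpg, breaks = c(10, 18, 25, 30, 35))

table(factors)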

You will notice that the default labels for the levels produced by the cut() function show the actual ranges of values used to divide the variable, for example (10,18] and (18,25]. By default each interval is open on the left and closed on the right.
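The default range labels can be replaced with descriptive names through the labels argument of cut(). The following is a sketch only: the three group names and the breakpoints 10, 18, 25, 35 are chosen here for illustration.

```r
mpg <- mtcars$mpg

# Three intervals with custom labels instead of "(10,18]" and so on
mpg_group <- cut(mpg, breaks = c(10, 18, 25, 35),
                 labels = c("low", "medium", "high"))
levels(mpg_group)   # "low" "medium" "high"
table(mpg_group)

# A value equal to a breakpoint falls in the interval it closes;
# right = FALSE makes intervals closed on the left instead
cut(c(18, 25), breaks = c(10, 18, 25, 35))                  # (10,18] (18,25]
cut(c(18, 25), breaks = c(10, 18, 25, 35), right = FALSE)   # [18,25) [25,35)
```

Values that fall outside the outermost breakpoints become NA, so the breaks should cover the full range of the data.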

R Frequently Asked Questions 2023 . Powered by WordPress
