Numeric Data Type in R Language

This article covers the numeric data type in the R language. In R, decimal values are stored as the numeric data type, which is also the default data type for numbers.


Assigning a decimal value to a variable $x$ creates a variable of the numeric data type. For example,

x <- 6.2
print(x)

Since numeric data types consist of numbers, one can perform different mathematical operations such as addition, subtraction, multiplication, division, etc.
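As a minimal sketch of these operations, the snippet below applies the basic arithmetic operators to x and to a second value w. The variable w is hypothetical and added here only for illustration.

```r
x <- 6.2   # numeric value from the example above
w <- 2     # hypothetical second value, added for illustration

x + w      # addition
x - w      # subtraction
x * w      # multiplication
x / w      # division
x^w        # exponentiation
```

Each result is itself a numeric value, so the operations can be chained freely.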

Class of Numeric Data Type

In R, the class of numeric variables is numeric. One can check the class of a numeric object ($x$) by using the class() function.

class(x)

Converting Character Type to Numeric Type in R

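In the R language, the as.numeric() function converts a vector of character values to a numeric vector. Note that in versions of R before 4.0.0, data.frame() and read.table() converted character columns to factors by default, and calling as.numeric() on a factor returns its internal integer codes rather than the original values.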
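The following sketch illustrates the conversion on a small character vector. The vectors chr, bad, and f are hypothetical examples, not part of the original article. It also shows two behaviours worth knowing: text that cannot be parsed becomes NA, and applying as.numeric() directly to a factor yields level codes.

```r
# Hypothetical character vector; the values are illustrative only
chr <- c("1.5", "20", "3e2")
num <- as.numeric(chr)
class(chr)   # "character"
class(num)   # "numeric"

# Text that cannot be parsed as a number becomes NA (with a warning)
bad <- suppressWarnings(as.numeric(c("7", "seven")))
is.na(bad)

# Pitfall with factors: as.numeric() returns the internal level codes
f <- factor(c("10", "20", "10"))
as.numeric(f)                  # level codes, not the original values
as.numeric(as.character(f))    # converts via character to recover the values
```

Converting a factor through as.character() first is a common way to avoid the level-code pitfall.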

One can confirm the data type of an object by using the is.numeric() function. For example,

is.numeric(x)

If is.numeric(x) returns TRUE, the data type of the variable/object $x$ is numeric. Let’s assign a whole number to a variable $y$ and then check the class of $y$:

y <- 2
class(y)
[1] "numeric"

This shows that the default data type for numbers in the R language is numeric, even for whole numbers. One can also use the typeof() function to check the storage type of a variable; for numeric values it returns "double".
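A short sketch comparing the three checks on the same object may help here; it reuses the variable y from the example above.

```r
y <- 2
class(y)        # "numeric"
typeof(y)       # "double": numeric values are stored in double precision
is.numeric(y)   # TRUE
```

The class() function reports the class of the object, while typeof() reports the internal storage type.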

Creating Numeric Vectors In R

One can also create a numeric vector by using the numeric() function in R. It creates a vector of zeros of the requested length. For example,

z <- numeric(5)
print(z)

[1] 0 0 0 0 0

class(z)

[1] "numeric"


Other methods also exist for creating numeric vectors. Note that the numeric data type differs from the integer data type.
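As an illustration, the sketch below shows two other common ways to build numeric vectors and how an integer differs from the default numeric (double) type. The variable names a, b, and n_int are hypothetical.

```r
# Other common ways to build numeric vectors
a <- c(1.5, 2, 3.25)          # combine values with c()
b <- seq(0, 1, by = 0.25)     # regular sequence from 0 to 1
class(a)                      # "numeric"
class(b)                      # "numeric"

# The L suffix requests an integer instead of the default double
n_int <- 2L
class(n_int)       # "integer"
typeof(n_int)      # "integer"
is.numeric(n_int)  # TRUE: is.numeric() is also TRUE for integers

# The colon operator creates an integer vector, not a double vector
typeof(1:4)
```

Because is.numeric() returns TRUE for both types, class() or typeof() is the more reliable way to tell an integer from a double.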


In summary, the numeric data type is a fundamental data type for numerical computations in R. Understanding its properties, and when the integer data type may be a better fit, is essential for effective data analysis in R.


Best Statistical Inference Quiz in R 14

This article contains a Statistical Inference quiz in the R language with answers. There are 16 questions in the “Statistical Inference Quiz in R Language”. The MCQs cover probability and regression models. Let us start with the Statistical Inference Quiz in R.

Statistical Inference Quiz in R Language

1. The respiratory disturbance index (RDI), a measure of sleep disturbance, for a specific population has a mean of 15 (sleep events per hour) and a standard deviation of 10. They are not normally distributed. Give your best estimate of the probability that a sample average RDI of 100 people is between 14 and 16 events per hour.
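One way to approach this question is sketched below, under the assumption that the central limit theorem applies to the mean of 100 observations. The sample mean is then approximately normal with mean 15 and standard error 10 / sqrt(100). The variable names are illustrative.

```r
mu <- 15       # population mean
sigma <- 10    # population standard deviation
n <- 100       # sample size

se <- sigma / sqrt(n)   # standard error of the sample mean

# Approximate P(14 < sample mean < 16) under the normal approximation
prob <- pnorm(16, mean = mu, sd = se) - pnorm(14, mean = mu, sd = se)
prob
```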

 
 
 
 

2. Consider the following data set. What is the intercept for fitting the model with $x$ as the predictor and $y$ as the outcome?
x <- c(0.8, 0.47, 0.51, 0.73, 0.36, 0.58, 0.57, 0.85, 0.44, 0.42)
y <- c(1.39, 0.72, 1.55, 0.48, 1.19, -1.59, 1.23, -0.65, 1.49, 0.05)

 
 
 
 

3. Consider the following PMF shown below in R
x <- 1:4
p <- x/sum(x)
temp <- rbind(x, p)
rownames(temp) <- c("X", "Prob")
temp
What is the mean?
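A minimal sketch of the calculation: the mean of a discrete distribution is the sum of each value multiplied by its probability. The name pmf_mean is hypothetical.

```r
x <- 1:4
p <- x / sum(x)    # probabilities proportional to x

# Sanity check: the probabilities of a PMF must sum to 1
stopifnot(isTRUE(all.equal(sum(p), 1)))

pmf_mean <- sum(x * p)   # E[X] = sum over x of x * P(X = x)
pmf_mean
```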

 
 
 
 

4. Consider the mtcars data set. Fit a model with mpg as the outcome that includes the number of cylinders as a factor variable and weight as a confounder. Give the adjusted estimate for the expected change in mpg comparing 8 cylinders to 4.

 
 
 
 

5. Consider the following data set
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
Give the hat diagonal for the most influential point

 
 
 
 

6. Consider a standard uniform density. The mean for this density is 0.5 and the variance is 1 / 12. You sample 1,000 observations from this distribution and take the sample mean. What value would you expect it to be near?

 
 
 
 

7. Consider the following data set
x <- c(0.8, 0.47, 0.51, 0.73, 0.36, 0.58, 0.57, 0.85, 0.44, 0.42)
y <- c(1.39, 0.72, 1.55, 0.48, 1.19, -1.59, 1.23, -0.65, 1.49, 0.05)
Fit the regression through the origin and get the slope treating $y$ as the outcome and $x$ as the regressor.

(Hint, do not center the data since we want regression through the origin, not through the means of the data.)

 
 
 
 

8. Consider the mtcars data set. Fit a model with mpg as the outcome that includes the number of cylinders as a factor variable and weight as a confounder. Now fit a second model with mpg as the outcome that also includes the interaction between the number of cylinders (as a factor variable) and weight. Give the P-value for the likelihood ratio test comparing the two models and suggest a model using 0.05 as a type I error rate significance benchmark.

 
 
 
 

9. Do data(mtcars) from the datasets package and fit the regression model with mpg as the outcome and weight as the predictor. Give the slope coefficient.

 
 
 
 

10. Brain volume for adult women is normally distributed with a mean of about 1,100 cc and a standard deviation of 75 cc. What brain volume represents the 95th percentile?

 
 
 
 

11. Consider the following data set

x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)

Give the slope dfbeta for the point with the highest hat value.

influence.measures(fit5)$infmat[which.max(abs(influence.measures(fit5)$infmat[, 2])), 2]
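The expression above refers to an object fit5 that the question does not define. The sketch below assumes fit5 is the simple linear regression of y on x; that definition is an assumption, not something stated in the quiz. It uses hatvalues() to locate the point with the highest hat value and then reads the slope column of the influence matrix.

```r
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)

fit5 <- lm(y ~ x)                       # assumed definition of fit5
im <- influence.measures(fit5)$infmat   # matrix of influence diagnostics

hat_vals <- hatvalues(fit5)   # hat (leverage) diagonals
idx <- which.max(hat_vals)    # index of the point with the highest hat value

hat_vals[idx]   # hat diagonal of that point
im[idx, 2]      # column 2: standardized dfbeta (dfbetas) for the slope
```

Selecting the row by hat value matches the wording of the question directly, whereas the expression in the question selects the row with the largest absolute slope dfbeta.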

 
 
 
 

12. Consider the mtcars data set. Fit a model with mpg as the outcome that includes the number of cylinders as a factor variable and weight entered in the model as follows:

lm(mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)

How is the wt coefficient interpreted?

 
 
 
 

13. Consider the mtcars data set. Fit a model with mpg as the outcome that includes numbers of cylinders as a factor variable and weight as a possible confounding variable. Compare the effect of 8 versus 4 cylinders on mpg for the adjusted and unadjusted by-weight models. Here, adjusted means including the weight variable as a term in the regression model and unadjusted means the model without weight included. What can be said about the effect comparing 8 and 4 cylinders after looking at models with and without weight included?

 
 
 
 

14. The number of people showing up at a bus stop is assumed to be Poisson with a mean of 5 people per hour. You watch the bus stop for 3 hours. About what’s the probability of viewing 10 or fewer people?
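A hedged sketch of one approach: if arrivals are Poisson with a rate of 5 per hour, the count over 3 hours is Poisson with mean 5 * 3, and ppois() gives the cumulative probability. The variable names are illustrative.

```r
rate_per_hour <- 5
hours <- 3
lambda <- rate_per_hour * hours   # expected count over the 3 hours

# P(X <= 10) for X ~ Poisson(lambda)
p_10_or_fewer <- ppois(10, lambda = lambda)
p_10_or_fewer
```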

 
 
 
 

15. You flip a fair coin 5 times. About what’s the probability of getting 4 or 5 heads?

 
 
 
 

16. Suppose that diastolic blood pressures (DBPs) for men aged 35-44 are normally distributed with a mean of 80 (mm Hg) and a standard deviation of 10. About what is the probability that a random 35-44-year-old has a DBP less than 70?

 
 
 
 



Shapiro-Wilk Test in R (2024)

One should check the assumption of normality before performing a statistical test that relies on it, such as the one-sample t-test. In this article, we will discuss the Shapiro-Wilk test in R. The hypotheses are

$H_0$: The data are normally distributed

$H_1$: The data are not normally distributed

Performing Shapiro-Wilk Test in R

To check normality using the Shapiro-Wilk test in R, we will use the built-in mtcars data set.

attach(mtcars)
shapiro.test(mpg)

The p-value from the Shapiro-Wilk test (approximately 0.12) is greater than the 0.05 level of significance, so the null hypothesis of normality is not rejected. The test gives no significant evidence that the $mpg$ variable departs from a normal distribution.

  • By looking at the p-value, one can decide whether to reject or fail to reject the null hypothesis of normality:
    • If the p-value is less than the chosen significance level (e.g., 0.05), reject the null hypothesis and conclude that the data are likely not normally distributed.
    • If the p-value is greater than the chosen significance level, fail to reject the null hypothesis. This suggests the data might be normal, but it does not confirm normality.
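The decision rule above can be sketched in code as follows. The names alpha, res, and decision are hypothetical; shapiro.test() returns a list whose p.value element holds the p-value.

```r
alpha <- 0.05                     # chosen significance level
res <- shapiro.test(mtcars$mpg)   # mtcars is available in a standard R session

res$statistic   # W statistic
res$p.value     # p-value of the test

if (res$p.value < alpha) {
  decision <- "reject H0: data are likely not normally distributed"
} else {
  decision <- "fail to reject H0: no significant evidence against normality"
}
decision
```

Referring to the column as mtcars$mpg avoids relying on attach(), so the snippet runs on its own.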

The normality can be visualized using a QQ plot.

# QQ Plot from Base Package
qqnorm(mtcars$mpg, pch = 1, frame.plot = FALSE)
qqline(mtcars$mpg, col = "red", lwd = 2)


In the QQ plot from the base package, a few points deviate from the reference line, which suggests that the $mpg$ variable may depart slightly from a normal distribution.

# QQ plot from car Package
library(car)   # the car package must be installed
qqPlot(mtcars$mpg)

From the QQ plot (with confidence interval band), one can observe that the $mpg$ variable is approximately normally distributed.

Note that

  • The Shapiro-Wilk test is generally regarded as more powerful than other normality tests such as the Kolmogorov-Smirnov test, particularly for small samples. In R, shapiro.test() requires between 3 and 5000 observations.
  • It is important to visually inspect the data using a histogram or Q-Q plot to complement the Shapiro-Wilk test results for a more comprehensive assessment of normality.
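As a sketch of the histogram check mentioned above, the snippet below draws a histogram of mpg on the density scale and overlays a normal curve with the sample mean and standard deviation. The number of breaks and the plot labels are arbitrary choices made for illustration.

```r
# Histogram of mpg on the density scale
h <- hist(mtcars$mpg, freq = FALSE, breaks = 10,
          main = "Histogram of mpg", xlab = "Miles per gallon")

# Overlay a normal curve using the sample mean and standard deviation
curve(dnorm(x, mean = mean(mtcars$mpg), sd = sd(mtcars$mpg)),
      add = TRUE, col = "red", lwd = 2)
```

If the bars follow the red curve reasonably well, the histogram agrees with the conclusion of the test; large gaps or a long tail would call for a closer look.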

https://itfeature.com