Student and Instructor of Statistics and business mathematics.
Completed my Ph.D. in Statistics from the Department of Statistics, Bahauddin Zakariya University, Multan, Pakistan.
I like Applied Statistics, Mathematics and Statistical Computing.
Statistical and mathematical software I use: SAS, STATA, GRETL, EVIEWS, R, SPSS, and VBA in MS Excel.
I like to use LaTeX for typesetting articles, theses, etc.
One should check the assumption of normality before performing a statistical test that requires it. In this article, we will discuss the Shapiro-Wilk test of normality in R, which is applied to a single sample (for example, as a check before a one-sample t-test). The hypotheses are
$H_0$: The data are normally distributed
$H_1$: The data are not normally distributed
Performing Shapiro-Wilk Test in R
To check normality with the Shapiro-Wilk test in R, we will use the built-in mtcars data set.
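# attach mtcars so that its columns (e.g., mpg) can be referred to by name
attach(mtcars)
# Shapiro-Wilk test of normality for miles per gallon (mpg)
shapiro.test(mpg)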
The p-value from the Shapiro-Wilk test is greater than the 0.05 level of significance, so we fail to reject the null hypothesis: the data give no evidence that the $mpg$ variable departs from normality.
By comparing the p-value with the chosen significance level, one can decide whether to reject or fail to reject the null hypothesis of normality:
If the p-value is less than the chosen significance level (e.g., 0.05), reject the null hypothesis and conclude that the data are likely not normally distributed.
If the p-value is greater than the chosen significance level, one fails to reject the null hypothesis; the data are consistent with normality, although this does not confirm that they are normal.
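The decision rule above can be sketched as a small R function. The name `check_normality()` is a hypothetical helper introduced only for illustration; it relies on base R's `shapiro.test()`, whose result stores the p-value in `$p.value`. The default `alpha = 0.05` is an assumption matching the example level in the text.

```r
# Hypothetical helper: apply the Shapiro-Wilk decision rule at level alpha
check_normality <- function(x, alpha = 0.05) {
  result <- shapiro.test(x)   # htest object with $statistic and $p.value
  p <- result$p.value
  if (p < alpha) {
    "Reject H0: the data are likely not normally distributed"
  } else {
    "Fail to reject H0: no evidence against normality"
  }
}

check_normality(mtcars$mpg)   # mpg from the built-in mtcars data set
check_normality(exp(1:30))    # strongly right-skewed values for contrast
```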
Normality can also be assessed visually using a Q-Q plot.
# QQ plot from the base package (frame.plot = FALSE removes the box around the plot)
qqnorm(mpg, pch = 1, frame.plot = FALSE)
qqline(mpg, col = "red", lwd = 2)
In the Q-Q plot from the base package, a few points deviate from the reference line, which suggests a mild departure from normality for the $mpg$ variable.
# QQ plot from the car package (install once with install.packages("car"))
library(car)
qqPlot(mpg)
The Q-Q plot from the car package adds a confidence envelope around the reference line; as the points fall largely within this envelope, the $mpg$ variable can be regarded as approximately normally distributed.
Note that
The Shapiro-Wilk test generally has more power than other normality tests, such as the Kolmogorov-Smirnov test, especially for small samples. Note that R's shapiro.test() accepts only sample sizes between 3 and 5000.
It is important to visually inspect the data using a histogram or Q-Q plot to complement the Shapiro-Wilk test results for a more comprehensive assessment of normality.
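As a complement to the test, a histogram with a fitted normal curve can be drawn in base R. This is a sketch: overlaying a normal density whose mean and standard deviation are estimated from the data is one common choice, not the only one.

```r
mpg_values <- mtcars$mpg   # variable to inspect

# Histogram on the density scale so bars and curve are comparable
h <- hist(mpg_values, freq = FALSE, main = "Histogram of mpg", xlab = "mpg")

# Overlay a normal density using the sample mean and standard deviation;
# inside curve(), x is the grid of values at which the density is drawn
curve(dnorm(x, mean = mean(mpg_values), sd = sd(mpg_values)),
      add = TRUE, col = "red", lwd = 2)
```

If the bars roughly follow the red curve, the visual check agrees with a non-significant Shapiro-Wilk result.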
In the R programming language, a data frame is a two-dimensional data structure made up of rows and columns. All columns must have the same length, that is, the same number of rows. The intersection of a row and a column is a cell, and each cell is identified by a combination of row number and column number.
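To illustrate row-column addressing, the sketch below uses the built-in mtcars data frame; a cell can be reached by row and column number, or by row and column name.

```r
dim(mtcars)                  # number of rows and columns
mtcars[1, 1]                 # cell in row 1, column 1
mtcars["Mazda RX4", "mpg"]   # the same cell addressed by names
mtcars[1, ]                  # the whole first row (a one-row data frame)
head(mtcars[, "mpg"])        # first values of the mpg column (a vector)
```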
A data frame in the R programming language has:
Rows: Represent individual observations or data points.
Columns: Represent variables or features being measured. Each column holds values for a single variable across all observations.
Data Types: Columns can hold data of different types, including numeric, character, logical (TRUE/FALSE), and factors (categorical variables).
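The type of each column can be checked with `sapply()` and `class()`. For example, the built-in iris data frame, used later in this post, has four numeric columns and one factor column.

```r
# Class of each column in iris
sapply(iris, class)

# Compact overview: dimensions, column types, and first values
str(iris)
```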
One can modify, extract, and re-arrange the contents of a data frame; this process is called data frame manipulation.
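The three kinds of manipulation mentioned above can each be done with base R. The data frame `scores` and its values are hypothetical, made up only for this sketch.

```r
# Hypothetical example data
scores <- data.frame(name  = c("Ali", "Sara", "Omar", "Hina"),
                     marks = c(72, 85, 64, 91),
                     stringsAsFactors = FALSE)

# Modify: add a logical column derived from an existing one
scores$passed <- scores$marks >= 70

# Extract: rows meeting a condition, keeping two columns
top <- scores[scores$marks > 80, c("name", "marks")]

# Re-arrange: sort rows by marks (highest first) and reorder the columns
ranked <- scores[order(scores$marks, decreasing = TRUE),
                 c("marks", "name", "passed")]
ranked
```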
Data Frame Syntax in R
The general syntax for creating a data frame in R is
df <- data.frame(column1 = c(values separated by commas),
                 column2 = c(values separated by commas),
                 ...
)
In practice, each argument to data.frame() is a vector of the same length, and the argument names become the column names.
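A small example following this syntax is sketched below; the column names and values are hypothetical and chosen only to show numeric, character, and logical columns side by side.

```r
# Hypothetical data: three columns of different types, four rows each
df <- data.frame(id     = 1:4,
                 city   = c("Multan", "Lahore", "Karachi", "Quetta"),
                 member = c(TRUE, FALSE, TRUE, TRUE),
                 stringsAsFactors = FALSE)

df          # print the data frame
nrow(df)    # number of rows (observations)
ncol(df)    # number of columns (variables)
names(df)   # column names taken from the argument names
```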
The subset() function can be used to create a new data frame that keeps or drops specified column(s); the original data frame is unchanged unless the result is assigned back to it. To understand subsetting a data frame, let us create a data frame first.
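# creating a data frame with three integer columns
df <- data.frame(col1 = 0:3, col2 = 3:6, col3 = 6:9)
# creating a subset that keeps col1 and col2
df_keep <- subset(df, select = c(col1, col2))
# dropping col3 instead gives the same columns
df_drop <- subset(df, select = -col3)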
Question: Data Frame in R Language
Suppose we have a frequency distribution of sales from a sample of 100 sales receipts.
Price value    Number of sales
0 to 20        16
20 to 40       18
40 to 60       14
60 to 80       24
80 to 100      20
100 to 120     8
Calculate the mean, median, variance, standard deviation, and coefficient of variation of this distribution using R.
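One possible R solution treats each class midpoint as representative of its class, the usual grouped-data approach. The median uses linear interpolation within the median class, and the variance uses the sample divisor n - 1; both are assumptions (a population variance would divide by n instead).

```r
# Sketch: summary measures for the grouped sales data.
# Assumptions: each class is represented by its midpoint, the median is
# found by linear interpolation, and the variance uses the divisor n - 1.
lower <- seq(0, 100, by = 20)        # lower class limits: 0, 20, ..., 100
upper <- lower + 20                  # upper class limits: 20, 40, ..., 120
freq  <- c(16, 18, 14, 24, 20, 8)    # number of sales in each class
mid   <- (lower + upper) / 2         # class midpoints
n     <- sum(freq)                   # total number of receipts

# Mean of grouped data
grouped_mean <- sum(freq * mid) / n

# Median: find the class holding the (n/2)-th value, then interpolate
cum_freq  <- cumsum(freq)
k         <- which(cum_freq >= n / 2)[1]        # index of the median class
cf_before <- if (k > 1) cum_freq[k - 1] else 0  # cumulative frequency before it
width     <- upper[k] - lower[k]                # class width
grouped_median <- lower[k] + (n / 2 - cf_before) / freq[k] * width

# Sample variance, standard deviation, and coefficient of variation (%)
grouped_var <- sum(freq * (mid - grouped_mean)^2) / (n - 1)
grouped_sd  <- sqrt(grouped_var)
grouped_cv  <- grouped_sd / grouped_mean * 100

round(c(mean = grouped_mean, median = grouped_median,
        variance = grouped_var, sd = grouped_sd, cv = grouped_cv), 2)
```

Under these assumptions the mean is 57.6 and the median is about 61.67; dividing by n instead of n - 1 would give the population variance.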
Using Logical Conditions for Selecting Rows and Columns
For selecting rows and columns using logical conditions, consider the iris data set. Suppose we are interested in selecting the rows whose Sepal.Length is greater than the median of Sepal.Length and whose Petal.Width is at least 1.7. Each value of the Sepal.Length column is compared with the median of Sepal.Length, and each value of Petal.Width is compared with 1.7; the rows satisfying both conditions are then extracted for these two columns.
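A base R sketch of this selection follows, first with logical indexing and then with subset(); keeping only the Sepal.Length and Petal.Width columns is an assumption based on the description.

```r
# Logical condition: both comparisons must be TRUE for a row to be kept
keep <- iris$Sepal.Length > median(iris$Sepal.Length) & iris$Petal.Width >= 1.7

# Logical indexing: rows where keep is TRUE, two named columns
selected <- iris[keep, c("Sepal.Length", "Petal.Width")]
head(selected)

# Equivalent selection with subset(); column names are used unquoted
selected2 <- subset(iris,
                    Sepal.Length > median(Sepal.Length) & Petal.Width >= 1.7,
                    select = c(Sepal.Length, Petal.Width))
```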
This post is an MCQ quiz on R package development. The quiz also contains questions about Git. There are 17 questions in total, and some of them have more than one correct answer. Let us start with the MCQs on R package development.
Online MCQs about R Package Development
MCQs R Package Development with Answers
Which of the following are good reasons to build an R Package?
Which of the following files and folders are required in an R package?
Which of the following files and subdirectories will be included in the initial package directory if you create a new package using the ‘create’ function from ‘devtools’?
Which of the following functions from the devtools package are you likely to use often, rather than just once per package, when building a package?
What is the purpose of the DESCRIPTION file in a package?
Which of the following statements correctly describes how R functions should be defined within the package directory?
How is attaching a package namespace different from loading a namespace?
For packages that require C code, what should be installed on your system?
What is the purpose of the Imports field in the DESCRIPTION file?
Which of the following are good reasons for open-sourcing your software?
When a test fails in a call to expect_that(), what happens?
What does the is_a() function do in the context of testthat?
In which sub-directory of an R package should tests be placed?
What is Git?
What is a pull request on GitHub?
The GNU General Public License is called a copyleft license because