R FAQs: Handling Missing Values in R

Question: How do missing values in R differ from those in other statistical packages?

Answer: Missing values (NA) cannot be used in comparisons, as already discussed in the previous post on missing values in R: any comparison involving NA returns NA. In other statistical packages, a “missing value” is usually assigned a code that is very high or very low in magnitude, such as 99 or -99. These coded values are treated as missing, yet they can still be compared with other values, and other values can be compared with them. In R, NA is used for all kinds of missing data, while other packages represent missing strings and missing numbers differently, for example, empty quotation marks for strings, and periods or very large or very small numbers for numeric data. Similarly, in R a non-NA value is never interpreted as missing, whereas other packages designate particular values as system-missing.
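
When data come from a package that uses coded missing values, those codes have to be converted to NA before analysis. The lines below are a minimal sketch of one way to do this; the vector `score`, the data frame `dat`, and the choice of 99 and -99 as missing-value codes are assumptions made for illustration only.

```r
# Hypothetical scores in which 99 and -99 were used as missing-value codes
score <- c(12, 99, 15, -99, 18)

# Without recoding, the codes are treated as real numbers and distort summaries
mean(score)

# Recode the sentinel values to NA
score[score %in% c(99, -99)] <- NA
score

# Summaries now need to be told how to treat NA
mean(score)                 # NA, because some values are missing
mean(score, na.rm = TRUE)   # mean of the three observed values

# At import time, na.strings can do the same job
# (inline text stands in for a real file here)
dat <- read.csv(text = "id,score\n1,12\n2,99\n3,-99",
                na.strings = c("99", "-99"))
is.na(dat$score)
```

Recoding at import with `na.strings` is convenient when the same codes are used in every column; recoding with `%in%` is safer when a value such as 99 is a legitimate observation in some columns.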

Question: What are the NA options in R?

Answer: In the previous post on missing values, I introduced the is.na() function as a tool for both finding and creating missing values. is.na() is one of several functions built around NA. Most of the other functions for missing values (NA) are possible settings for the na.action option. The possible na.action settings within R are:

  • na.omit() and na.exclude(): These functions return the object with observations removed if they contain any missing (NA) values. The difference between na.omit() and na.exclude() shows up in some prediction and residual functions: with na.exclude(), the results are padded with NA so that they match the length of the original data.
  • na.pass(): This function returns the object unchanged.
  • na.fail(): This function returns the object only if it contains no missing values; otherwise it signals an error.

To understand these NA options, run the following lines of code.

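getOption("na.action")    # the current default, normally "na.omit"
(m <- as.data.frame(matrix(c(1:5, NA), ncol = 2)))   # third row of V2 is NA
na.omit(m)       # drops the row containing NA
na.exclude(m)    # also drops it, but records the row for later padding
try(na.fail(m))  # signals an error because m contains NA
na.pass(m)       # returns m unchanged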
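
The difference between na.omit() and na.exclude() only becomes visible once a model is fitted. The following is a small sketch using lm(); the data frame `d` and its values are made up for illustration.

```r
# Hypothetical data: the third response value is missing
d <- data.frame(x = c(1, 2, 3, 4, 5),
                y = c(2.1, 3.9, NA, 8.2, 9.8))

fit_omit    <- lm(y ~ x, data = d, na.action = na.omit)
fit_exclude <- lm(y ~ x, data = d, na.action = na.exclude)

# Both fits use the same four complete rows, so the coefficients agree
all.equal(coef(fit_omit), coef(fit_exclude))

# na.omit: residuals exist only for the rows that were used
length(residuals(fit_omit))

# na.exclude: residuals are padded with NA to line up with the rows of d
length(residuals(fit_exclude))
residuals(fit_exclude)
```

Padding is useful when residuals or fitted values are to be attached back to the original data frame as a new column.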

Note that it is wise both to investigate the missing values in your data set and to read the help files of every function you intend to use for handling missing values. You should either be aware of, and comfortable with, the default treatment of missing values, or specify the treatment you want for your analysis.

R FAQs: Missing Values

Question: Can missing values be handled in R?
Answer: Yes, missing values can be handled in R. The way of dealing with them differs from that of other statistical software such as SPSS, SAS, Stata, and EViews.

Question: How are missing values represented in the R language?
Answer: In R, missing values appear as NA. Note that NA is neither a string nor a numeric value; it is a reserved marker for a missing value.
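
As a brief illustration of this point, the lines below inspect the type of NA. By default NA is a logical constant, and R provides typed variants that it uses automatically when NA sits inside a numeric or character vector.

```r
typeof(NA)             # "logical"
typeof(NA_integer_)    # "integer"
typeof(NA_real_)       # "double"
typeof(NA_character_)  # "character"

# NA takes on the type of the vector it is placed in
typeof(c(1.5, NA))     # "double"
typeof(c("a", NA))     # "character"

# The string "NA" is ordinary text, not a missing value
is.na("NA")            # FALSE
is.na(NA_character_)   # TRUE
```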

Question: Can an R user introduce missing values in a matrix or vector?
Answer: Yes, an R user can create (introduce) missing values in a vector or matrix. For example,

    x <- c(1, 2, 3, 4, NA, 6, 7, 8, 9, 10)
    y <- c("a", "b", "c", NA, "NA")

Note that in vector y the fifth value, the string "NA", is not a missing value.

Question: How can one check whether there are missing values in a vector or matrix?
Answer: To check which values in a vector or matrix are recognized as missing by R, use the is.na() function. It returns a logical vector (or matrix) of TRUE and FALSE values: TRUE indicates that the value at that index is missing, while FALSE indicates that it is not. For example,

is.na(x)    # the fifth element is TRUE; all others are FALSE
is.na(y)    # the fourth element is TRUE; all others are FALSE

Note that "NA" in the second vector is a character string, not a missing value; therefore is.na() returns FALSE for it.
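
Because is.na() returns a logical vector, it combines naturally with other base R functions to count, locate, or drop missing values. A short sketch using the same x and y vectors:

```r
x <- c(1, 2, 3, 4, NA, 6, 7, 8, 9, 10)
y <- c("a", "b", "c", NA, "NA")

anyNA(x)          # TRUE: at least one value is missing
sum(is.na(x))     # 1: number of missing values
which(is.na(x))   # 5: position of the missing value
which(is.na(y))   # 4: the string "NA" at position 5 is not counted
mean(is.na(y))    # 0.2: proportion of missing values
x[!is.na(x)]      # the observed values only
```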

Question: In the R language, can missing values be used in comparisons?
Answer: No, missing values in R cannot be used in comparisons: any comparison involving NA returns NA rather than TRUE or FALSE. NA is used for all kinds of missing data, whether the vector is numeric (like x) or character (like y), and non-NA values such as the string "NA" are never interpreted as missing. Run the following commands to see this:

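x < 0        # FALSE everywhere except the fifth element, which is NA
y == NA      # NA for every element, so this cannot be used to find missing values
x1 <- x      # work on a copy of x
is.na(x1) <- which(x1 == 7); x1   # assuming the intended test is x == 7: sets that element to NA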
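
The practical consequence is that NA propagates through logical subsetting. The following sketch, again using the vector x from above, shows the effect and two common ways around it.

```r
x <- c(1, 2, 3, 4, NA, 6, 7, 8, 9, 10)

NA == NA                   # NA, not TRUE
x > 5                      # the fifth element of the result is NA

# Logical subsetting carries the NA along
x[x > 5]                   # NA 6 7 8 9 10

# which() keeps only the positions that are TRUE, so the NA is dropped
x[which(x > 5)]            # 6 7 8 9 10

# Counting matches needs na.rm = TRUE
sum(x > 5, na.rm = TRUE)   # 5
```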

Question: Can you provide an example of introducing NA in a matrix?
Answer: The first command below creates a 3 x 3 matrix in which every element is NA; the second creates a matrix with NA in selected positions only.

matrix(NA, nrow=3, ncol=3)
matrix(c(NA,1,2,3,4,5,6,NA, NA), nrow=3, ncol=3)
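
Once a matrix contains missing values, it is often useful to know where they are. The lines below are a small sketch built on the second matrix above; the name `m` is chosen here for illustration.

```r
m <- matrix(c(NA, 1, 2, 3, 4, 5, 6, NA, NA), nrow = 3, ncol = 3)

is.na(m)                          # logical matrix of the same shape
which(is.na(m), arr.ind = TRUE)   # row and column of each NA
colSums(is.na(m))                 # NAs per column: 1 0 2
rowSums(is.na(m))                 # NAs per row:    1 1 1

# An existing element can also be set to NA by index
m[2, 2] <- NA
sum(is.na(m))                     # 4
```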