Introduction to Lexical Scoping
Lexical scoping in R is the set of rules that governs how R looks up the value of a symbol. For example:
x <- 10
In this example, scoping is the set of rules that R applies to go from symbol $x$ to its value 10.
Types of Scoping
R has two types of scoping:
- Lexical scoping: implemented automatically at the language level.
- Dynamic scoping: used in select functions to save typing during interactive analysis.
Lexical scoping looks up symbol values based on how functions were nested when they were created, not how they are nested when they are called. With lexical scoping, you do not need to know how a function is called to figure out where the value of a variable will be looked up. You just need to look at the function’s definition.
Basic Principles of Lexical Scoping in R Language
There are four basic principles behind R’s implementation of lexical scoping: name masking, functions vs variables, a fresh start, and dynamic lookup.
Name Masking
The following example illustrates the basic principle of lexical scoping:
f <- function() {
  x <- 1
  y <- 2
  c(x, y)
}
f()
If a name is not defined inside a function, R will look one level up.
x <- 2
g <- function() {
  y <- 1
  c(x, y)
}
g()
The same rules apply if a function is defined inside another function: look inside the current function, then where the function was defined, and so on, all the way up to the global environment, and then on to other loaded packages.
x <- 1
h <- function() {
  y <- 2
  i <- function() {
    z <- 3
    c(x, y, z)
  }
  i()
}
h()
rm(x, h)
The same rules apply to closures, functions created by other functions. The following function, j(), returns a function.
How does R know what the value of y is after the function has been called? It works because k preserves the environment in which it was defined and because the environment includes the value of y.
j <- function(x) {
  y <- 2
  function() {
    c(x, y)
  }
}
k <- j(1)
k()
rm(j, k)
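To make the preserved environment visible, the closure can be inspected directly. The following is a minimal sketch that repeats the definitions of j and k so that it runs on its own; the names k_env, k_names, k_y, and k_result are introduced here only for illustration, while environment(), ls(), and get() are base R functions.

```r
j <- function(x) {
  y <- 2
  function() {
    c(x, y)
  }
}
k <- j(1)

# The enclosing environment of k is the environment created by the call j(1)
k_env <- environment(k)

# Names bound in that environment: the argument x and the local variable y
k_names <- sort(ls(k_env))

# The value of y is still available there after j() has returned
k_y <- get("y", envir = k_env)

k_result <- k()
print(k_names)
print(k_result)
```

Because k carries this environment with it, the lookup of x and y succeeds no matter where k is later called from.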
Functions vs Variables
Finding functions works the same way as finding variables:
l <- function(x) {
  x + 1
}
m <- function() {
  l <- function(x) {
    x * 2
  }
  l(10)
}
m()
If you are using a name in a context where it’s obvious that you want a function (e.g. f(3)), R will ignore objects that are not functions while it is searching. In the following example, n takes on a different value depending on whether R is looking for a function or a variable.
n <- function(x) {
  x / 2
}
o <- function() {
  n <- 10
  n(n)
}
o()
Fresh Start
What happens to values between invocations of a function? Consider the function below: (i) what will happen the first time you run it, and (ii) what will happen the second time? (If you have not seen exists() before, it returns TRUE if there’s a variable of that name, otherwise it returns FALSE.)
j <- function() {
  if (!exists("a")) {
    a <- 1
  } else {
    a <- a + 1
  }
  print(a)
}
j()
j()
You might be surprised that the function returns the same value, 1, every time (provided no variable named a exists in the global environment). This is because every time a function is called, a new environment is created to host its execution. A function has no way to tell what happened the last time it was run; each invocation is completely independent (but see mutable states).
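The claim that each call gets its own environment can be checked directly. The sketch below uses a hypothetical helper, current_env(), which is not part of the original example; it returns the environment created for the call, so that two calls can be compared with identical().

```r
# Hypothetical helper: returns the environment created for this particular call
current_env <- function() {
  environment()
}

e1 <- current_env()
e2 <- current_env()

# FALSE: the two calls were hosted by two different environments
same_env <- identical(e1, e2)
print(same_env)
```

Since the two environments are distinct objects, nothing assigned locally during the first call is visible during the second.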
Dynamic Lookup
Lexical scoping determines where to look for values, not when to look for them. R looks for values when the function is run, not when it’s created. This means that the output of a function can be different depending on objects outside its environment:
f <- function() {
  x
}
x <- 15
f()
x <- 20
f()
You generally want to avoid this behavior because it means the function is no longer self-contained.
One way to detect this problem is the findGlobals() function from the codetools package. This function lists all the external dependencies of a function:
f <- function() x + 1
codetools::findGlobals(f)
Another way to try to solve the problem would be to manually change the environment of the function to emptyenv(), an environment that contains absolutely nothing:
environment(f) <- emptyenv()
This doesn’t work because R relies on lexical scoping to find everything, even the + operator. It’s never possible to make a function completely self-contained because you must always rely on functions defined in base R or other packages.
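The failure can be observed without aborting the session by catching the error. This is a minimal sketch that assumes a brace-free definition of f, so that + is the first function R has to find; the variable msg is introduced here for illustration, and tryCatch() and conditionMessage() are base R.

```r
f <- function() x + 1
environment(f) <- emptyenv()

# Capture the error message instead of stopping the script
msg <- tryCatch(
  f(),
  error = function(e) conditionMessage(e)
)
print(msg)
```

The message names the function that could not be found, which shows that the lookup fails on the operator itself before x is ever consulted.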
Since all standard operators in R are functions, you can override them with your alternatives.
`(` <- function(e1) {
  if (is.numeric(e1) && runif(1) < 0.1) {
    e1 + 1
  } else {
    e1
  }
}
replicate(50, (1 + 2))
rm("(")  # remove the override again
This introduces a pernicious bug: 10% of the time, 1 will be added to any numeric calculation inside parentheses. This is another good reason to regularly restart with a clean R session!
Bound Symbol or Variable
If a symbol is bound to a function argument, it is called a bound symbol or variable. If a symbol is not bound to a function argument, it is called a free symbol or variable.
If a free variable is looked up in the environment in which the function is called, the scoping is said to be dynamic. If a free variable is looked up in the environment in which the function was originally defined, the scoping is said to be static or lexical. R, like Scheme, is lexically scoped. S-PLUS, by contrast, looks up free variables in the global environment rather than in the environment where the function was defined.
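The difference between the two lookup rules can be sketched in R itself. In the example below, all function and variable names are hypothetical; dynamic lookup is only emulated by asking for x in the caller's environment via parent.frame(), which is an illustration and not how R resolves free variables by default.

```r
x <- "global"

# Lexical lookup: the free variable x is resolved where the function was defined
lexical_lookup <- function() {
  x
}

# Emulated dynamic lookup: resolve x in the environment of the caller instead
dynamic_lookup <- function() {
  get("x", envir = parent.frame())
}

caller <- function() {
  x <- "caller"
  c(lexical = lexical_lookup(), dynamic = dynamic_lookup())
}

result <- caller()
print(result)
```

The lexical version ignores the x defined inside caller() because that variable was not in scope where lexical_lookup() was written, whereas the emulated dynamic version picks it up.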
y = 20
foo = function() {
  y = 10
  # closure returned by the foo function
  function(x) {
    x + y
  }
}
bar = foo()
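foo returns an anonymous function, that is, a function that has no name. After bar = foo(), bar is a function in the global environment, just like foo. However, the function computing $x + y$ was created in the environment of foo, not in the global environment, so it finds y = 10 there rather than the global y = 20. In other words, foo has a function as its return value, which is then bound to bar in the global environment.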
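Calling bar makes the effect of the captured environment concrete. The sketch below repeats the definitions so that it runs on its own; bar_value and captured_y are names introduced here for illustration.

```r
y <- 20
foo <- function() {
  y <- 10  # local to foo; captured by the function that foo returns
  function(x) {
    x + y
  }
}
bar <- foo()

# y is found in the environment created by the call to foo, not in the global one
bar_value <- bar(5)
captured_y <- environment(bar)$y
print(bar_value)
print(captured_y)
```

Here bar(5) evaluates to 15 rather than 25, and the global y remains 20 throughout.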