Explain a Lazy Evaluation Quirk

The goal of:

adders <- lapply(1:10, function(x)  add(x) )

is to create a list of adder functions: the first adds 1 to its input, the second adds 2, and so on. Because of lazy evaluation, the argument x is not evaluated when each adder is created, but only when the adder is first called. By that time the lapply loop has finished, so when you call the first adder and the promise for x is finally evaluated, x no longer yields 1 but the value it held at the end of the loop, i.e. 10.

Therefore, lazy evaluation makes every adder postpone evaluating x until after the lapply loop has completed, so they all capture the same value, i.e. 10. The solution Hadley suggests is to force x to be evaluated immediately, bypassing lazy evaluation, which yields the correct functions with the correct x values.
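For concreteness, here is a sketch of the setup (the definition of add follows the one in Advanced R) with the force() fix applied:

```r
add <- function(x) {
  force(x)  # evaluate the promise for x now, so the closure captures
            # the current value rather than the loop's final one
  function(y) x + y
}

adders <- lapply(1:10, function(x) add(x))
adders[[1]](10)   # 11, i.e. 1 + 10
adders[[10]](10)  # 20, i.e. 10 + 10
```

Note that on R 3.2.0 and later, lapply() itself forces its arguments (see the release note quoted further down), so this example works even without the explicit force(x).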

Understanding the lazy evaluation example in Advanced R

Since x is a parameter of the function, it is bound as soon as the function is called: all parameters show up when you run ls() inside a function, even though their promises may not have been evaluated yet. A variable can exist before it's evaluated.
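A small illustration of this (the function name is made up): the parameter is listed by ls(), and the function can even return normally while the argument's promise is never evaluated at all:

```r
g <- function(a) {
  vars <- ls()  # "a" is listed even though its promise is unevaluated
  print(vars)
  "done"
}

# The argument is never used, so its promise is never forced
# and the stop() call never runs:
g(stop("never evaluated"))  # prints "a", returns "done"
```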

Can you more clearly explain lazy evaluation in R function operators?

When you do

what_is_love <- function(f) {
  function(...) {
    cat('f is', f, '\n')
  }
}

the inner function creates a closure over f, but the catch is that until you actually use a variable passed to a function, it remains a "promise" and is not actually evaluated. If you want to "capture" the current value of f, then you need to force the evaluation of the promise; you can use the force() function for this.

what_is_love <- function(f) {
  force(f)
  function(...) {
    cat('f is', f, '\n')
  }
}
funs <- lapply(c('love', 'cherry'), what_is_love)

funs[[1]]()
# f is love
funs[[2]]()
# f is cherry

Without force(), f remains a promise inside both of the functions in your list. It is not evaluated until you call the function, and by that point the promise evaluates to the last value f took, which is "cherry".

As @MartinMorgran pointed out, this behavior changed in R 3.2.0. From the release notes:

Higher order functions such as the apply functions and Reduce()
now force arguments to the functions they apply in order to
eliminate undesirable interactions between lazy evaluation and
variable capture in closures. This resolves PR#16093.

How to not fall into R's 'lazy evaluation trap'

You are creating functions with implicit parameters, which isn't necessarily best practice. In your example, the implicit parameter is i. Another way to rework it would be:

library(functional)
myprint <- function(x) print(x)
funs <- lapply(1:10, function(i) Curry(myprint, i))
funs[[1]]()
# [1] 1
funs[[2]]()
# [1] 2

Here, we explicitly specify the parameters to the function by using Curry. Note we could have curried print directly but didn't here for illustrative purposes.

Curry creates a new version of the function with some parameters pre-specified. This makes the parameter specification explicit and avoids the issues you are running into, because Curry forces evaluation (there is a version that doesn't, but it wouldn't help here).

Another option is to capture the entire environment of the parent function, copy it, and make it the parent env of your new function:

funs2 <- lapply(
  1:10, function(i) {
    fun.res <- function() print(i)
    environment(fun.res) <- list2env(as.list(environment()))  # force parent env copy
    fun.res
  }
)
funs2[[1]]()
# [1] 1
funs2[[2]]()
# [1] 2

but I don't recommend this, since you will potentially be copying a whole bunch of variables you may not even need. Worse, this gets a lot more complicated if you have nested layers of functions that create functions. The only benefit of this approach is that you can continue your implicit parameter specification, but again, that seems like bad practice to me.

Lazy evaluation example with NULL in Advanced R

This is just the way the && operator works. It's called short-circuiting and is separate from lazy evaluation.

Lazy evaluation refers to the way function arguments are evaluated. In particular, arguments are evaluated only when (and if) they are actually used in the function. For example, consider the following function

f <- function(a, b) NULL

that does nothing and returns NULL. The arguments a and b are never evaluated because they are unused: they don't appear in the body of f, so you can call f with any expressions you want as arguments (as long as they are syntactically correct), because the expressions won't be evaluated. E.g.

> f(1, 2)
NULL
> f(43$foo, unboundvariableblablabla)
NULL

Without lazy evaluation, the arguments would be evaluated first and then passed to the function, so the call above would fail, because trying to evaluate 43$foo gives an error:

> 43$foo
Error in 43$foo : $ operator is invalid for atomic vectors

Lazy evaluation of `which` function arguments?

With if there can be efficiency gains as a result of that behavior. It is documented to work that way, and I don't think it is due to lazy evaluation. Even if you force()-ed that expression, it would still only evaluate a series of &&'s until it reached a single FALSE. See this help page:

?Logic

@XuWang probably deserves the credit for emphasizing the difference between "&" and "&&". The "&" operator works on vectors and returns vectors. The "&&" operator acts on scalars (actually vectors of length 1) and returns a vector of length 1. When offered a vector of length > 1 on either side, it works only on the first value of each and emits a warning. It is only the "&&" version that does what is being called "lazy" evaluation. You can see that the "&" operator is not acting in a "lazy" fashion with a simple test:

fn1 <- function(x) print(x)
fn2 <- function(x) print(x)
x1 <- sample(c(TRUE, FALSE), 10, replace=TRUE)

fn1(x1) & fn2(x1) # the first two indicate evaluation of both sides regardless of first value
# [1] FALSE FALSE TRUE FALSE TRUE TRUE FALSE FALSE FALSE FALSE
# [1] FALSE FALSE TRUE FALSE TRUE TRUE FALSE FALSE FALSE FALSE
# [1] FALSE FALSE TRUE FALSE TRUE TRUE FALSE FALSE FALSE FALSE
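By contrast, "&&" does short-circuit: when the left-hand side is already FALSE, the right-hand side is never evaluated. A small check with a side-effecting helper (the name noisy is made up):

```r
noisy <- function(x) {
  cat("evaluated\n")  # the side effect reveals whether this side ran
  x
}

FALSE && noisy(TRUE)  # FALSE; "evaluated" is never printed
TRUE && noisy(TRUE)   # TRUE; here the right-hand side must be evaluated
```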

lazy evaluation in ocaml

Each call to lseq constructs a new value of type 'a llist. The value consists of two parts: the first is the list element produced at this step, and the second is a function that will produce the rest of the list. Since that function is not called yet, the recursion doesn't loop forever.

In layman's terms, a list is a pair holding one value and a phone number you can call to get the rest. So if you need more values, you make more calls, e.g.,

(* assuming a lazy-list type along these lines: *)
type 'a llist = LazyList of 'a * (unit -> 'a llist)

let rec print_lseq (LazyList (x, next)) =
  print_int x;
  print_lseq (next ())

Of course, this function will never terminate and will print an infinite sequence of numbers.

Concerning your example, lseq 5 is an infinite sequence that looks like 5, 6, 7, .... It is not eagerly constructed in memory; rather, it is like a recipe for constructing the sequence.

What's the advantage using lazy evaluation in Queue data structure?

The lazy version of the check function (and thus snoc) is actually O(1), because it performs the reverse using lazy operations, i.e., (++) and reverse are both lazy. That is where the credit is banked. You start to pay when you take head or tail. Moreover, due to hidden mutability (laziness is in effect a restricted form of mutability), you pay for this credit only once, even if you explore different futures of the queue. There is a very interesting blog post on the banker's queue (and on the batch queue) that may help you understand why this makes a difference.

The performance of (++) with lazy evaluation

If you access the whole resulting list, lazy evaluation won't save any computation; it only delays the work until you need each particular element, but in the end the same thing has to be computed.

If you traverse the concatenated list xs ++ ys, accessing each element of the first part (xs) adds a small constant overhead: a check whether xs has been exhausted yet.

So, it makes a big difference if you associate ++ to the left or to the right.

  • If you associate n lists of length k to the left like (..(xs1 ++ xs2) ... ) ++ xsn then accessing each of the first k elements will take O(n) time, accessing each of the next k ones will take O(n-1) etc. So traversing the whole list will take O(k n^2). You can check that

    sum $ foldl (++) [] (replicate 100000 [1])

    takes really long.

  • If you associate n lists of length k to the right like xs1 ++ ( ..(xsn_1 ++ xsn) .. ) then you'll get only constant overhead for each element, so traversing the whole list will be only O(k n). You can check that

    sum $ foldr (++) [] (replicate 100000 [1])

    is quite reasonable.


Edit: This is just the magic hidden behind ShowS. If you convert each string xs to showString xs :: String -> String (showString is just an alias for (++)) and compose these functions, then no matter how you associate their composition, at the end they will be applied from right to left - just what we need to get the linear time complexity. (This is simply because (f . g) x is f (g x).)

You can check that both

length $ (foldl (.) id (replicate 1000000 (showString "x"))) ""

and

length $ (foldr (.) id (replicate 1000000 (showString "x"))) ""

run in a reasonable time (foldr is a bit faster because it has less overhead when composing functions from the right, but both are linear in the number of elements).


