Debugging package::function() although lazy evaluation is used

The following works for me.

I have my package TestDebug with my function

myfun <- function(a,b) {return(a+b)}

If I run the script

debug(TestDebug::myfun)
TestDebug::myfun(1,2)

The debugger steps right into the source of TestDebug::myfun(), not into the :: function as it does when you place a browser() before the call to TestDebug::myfun(1,2).

As you mention in your question: in real-life situations TestDebug::myfun(1,2) often calls otherpackage::myfun2(1,2). If you try to step into otherpackage::myfun2(1,2) you will end up inside the :: function again.

To prevent this, I add the functions called inside other functions to the debug list on the fly:

As soon as you are on the line inside TestDebug::myfun() where otherpackage::myfun2(1,2) is called, I run debug(otherpackage::myfun2) in the console. After that I can step into otherpackage::myfun2(1,2) without problems and end up in the source code of otherpackage::myfun2() (..and not in the source code of ::).

Don't forget to call undebug(otherpackage::myfun2) after you're sure that your problem is not inside otherpackage::myfun2(), to prevent the debugger from jumping into otherpackage::myfun2() the next time it is called.

If you prefer, you can also use debugonce(otherpackage::myfun2) (instead of debug(..)) to debug a function only once.

Lazy evaluation in Haskell's do notation using the trace function

My guess would be because of "lazy evaluation".

Note that you never use the returned value. In other words, the result of the "return" has not been queried yet (and never will be), nor is it useful for anything. The argument of return is not itself in a "monadic" context, so there is no reason to evaluate it; the unevaluated "call-tree" is simply passed along as the result.

In other words, it remains on the "call-tree" until someone wants to pick it up.

For the second case it is clear that the trace will be called: the monadic actions are executed until a "return" is reached, and before that return is reached all necessary actions are performed, including emitting the debug output if needed.

Example in ghci:

$ ghci
GHCi, version 7.6.3: http://www.haskell.org/ghc/ :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Prelude> import Debug.Trace
Prelude Debug.Trace> do return (trace "debug message 1" ())
Prelude Debug.Trace> do trace "debug message 2" (return ())
debug message 2

Same for runhaskell. If you write these two programs:

program1.hs:

import Debug.Trace

main = do return (trace "debug message 1" ())

program2.hs:

import Debug.Trace

main = do
    trace "debug message 2" (return ())

Then the console reads:

$ runhaskell program1.hs
$ runhaskell program2.hs
debug message 2
$

If, however, you write an IO Bool (thus with a return value) and you later use that value, the trace will be executed, for instance:

testFun :: IO Bool
testFun = do
    trace "foo" $ return $ trace "Hello" True

main :: IO ()
main = do
    b <- testFun
    print b

This will result in:

$ runhaskell program3.hs
foo
Hello
True
$

If, however, you omit the print b and put return () instead, Haskell has no interest in what is returned and thus doesn't print the trace:

testFun :: IO Bool
testFun = do
    trace "foo" $ return $ trace "Hello" True

main :: IO ()
main = do
    b <- testFun
    return () -- we ran `testFun` but are not interested in the result

The result is:

$ runhaskell program4.hs
foo
$

How is debugging achieved in a lazy functional programming language?

Nothing prevents you from using breakpoints in a lazily evaluated functional program. The difference from eager evaluation is when the program will stop at the breakpoint and what the trace will look like. The program will stop when the expression the breakpoint is set on is actually being reduced (obviously).

Instead of the stack trace you're used to, you get the list of reductions that led up to the reduction of the expression with the breakpoint on it.

Small silly example. You have this Haskell program.

add_two x = 2 + x

times_two x = 2 * x

foo = times_two (add_two 42)

And you put a breakpoint on the first line (add_two), then evaluate foo. When the program stops on the breakpoint, in an eager language you'd expect to have a trace like

add_two
foo

and times_two hasn't even begun to be evaluated, but in the GHCi debugger you get

-1  : foo (debug.hs:5:17-26)
-2  : times_two (debug.hs:3:14-18)
-3  : times_two (debug.hs:3:0-18)
-4  : foo (debug.hs:5:6-27)
<end of history>

which is the list of reductions that led up to the reduction of the expression you put the breakpoint on. Note that it looks like times_two "called" foo even though it does not do so explicitly. You can see from this that the evaluation of 2 * x in times_two (-2) did force the evaluation of (add_two 42) (-1) from the foo line. From there you can perform a step as in an imperative debugger (perform the next reduction).

Another difference from debugging in an eager language is that variables may be as-yet-unevaluated thunks. For example, if you go back to step -2 in the above trace and inspect x, you'll find it's still an unevaluated thunk (indicated by brackets in GHCi).
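
For illustration, here is a minimal GHCi debugger session sketch for the small program above (output omitted, since the exact source spans depend on your file): :break sets the breakpoint, :trace evaluates foo while recording the reduction history, :history lists those reductions, :back moves back through them, :print inspects a variable without forcing it, and :continue resumes evaluation.

ghci> :load debug.hs
ghci> :break add_two
ghci> :trace foo
ghci> :history
ghci> :back
ghci> :print x
ghci> :continue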

For far more detailed information and examples (how to step through the trace, inspect values, ...), see the GHCi Debugger section in the GHC manual. There's also the Leksah IDE which I haven't used yet as I'm a VIM and terminal user, but it has a graphical frontend to the GHCi debugger according to the manual.

You also asked about print statements. With pure functions this is not so easily possible, as a print statement would have to live in the IO monad. Say you have a pure function

foo :: Int -> Int

and wish to add a trace statement to it: the print would produce an action in the IO monad, so you'd have to change the signature of the function you want to put that trace statement in, and the signatures of the functions that call it, and so on.

This is not a good idea. So, you need some way to break purity to achieve trace statements. In Haskell, this can be done with unsafePerformIO. There's the Debug.Trace module that already has a function

trace :: String -> a -> a

which outputs the string and returns the second parameter. It would be impossible to write as a pure function (well, if you intend to really output the string, that is). It uses unsafePerformIO under the hood. You can put that into a pure function to output a trace print.
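
For illustration, a minimal sketch of dropping trace into a pure function (the function double and the message text are made up for this example):

import Debug.Trace (trace)

-- a pure function instrumented with trace; the message is printed
-- as a side effect whenever the result is actually demanded
double :: Int -> Int
double x = trace ("double called with " ++ show x) (x * 2)

main :: IO ()
main = print (double 21)   -- prints the trace message, then 42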

Would you have to program a monad for every section of code you want to test?

I'd suggest rather the opposite: make as many functions pure as possible (I'm assuming here you mean the IO monad for printing; monads are not necessarily impure). Lazy evaluation allows you to separate IO code from processing code very cleanly.

Whether imperative debugging techniques are a good idea or not depends on the situation (as usual). I find testing with QuickCheck/SmallCheck much more useful than unit testing in imperative languages, so I'd go that route first to avoid as much debugging as possible. QuickCheck properties actually make nice concise function specifications (a lot of test code in imperative languages looks like just another blob of code to me).
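
As a small illustration (the property below is a generic example, not one from the answer), a QuickCheck property reads like a concise specification and is checked against many random inputs:

import Test.QuickCheck

-- reversing a list twice gives back the original list
prop_reverseTwice :: [Int] -> Bool
prop_reverseTwice xs = reverse (reverse xs) == xs

main :: IO ()
main = quickCheck prop_reverseTwice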

One trick to avoid a lot of debugging is to decompose the function into many smaller subfunctions and test as many of them as possible. This may be a bit unusual when coming from imperative programming, but it's a good habit no matter what language you're using.

Then again, debugging != testing and if something goes wrong somewhere, breakpoints and traces may help you out.

Python: how to do lazy debug logging

Look at this part of the documentation.

Update: Logging already supports lazy evaluation, but slightly differently from the way described in your comment. For example, see the following script:

import logging

def expensive_func(*args):
    print('Expensive func called: %s' % (args,))
    return sum(args)

class DeferredMessage(object):
    def __init__(self, func, *args):
        self.func = func
        self.args = args

    def __str__(self):
        return 'Message {0}'.format(self.func(*self.args))

if __name__ == '__main__':
    logging.basicConfig()
    logging.info(DeferredMessage(expensive_func, 1, 2))
    logging.warning(DeferredMessage(expensive_func, 3, 4))
    logging.error(DeferredMessage(expensive_func, 5, 6))

When the above script is run, it should print

Expensive func called: (3, 4)
WARNING:root:Message 7
Expensive func called: (5, 6)
ERROR:root:Message 11

which shows that a potentially expensive function is only called when necessary. The example above can, of course, be generalised to allow the format string to be passed to the DeferredMessage, and to use kwargs, and so on.

Lazy logger message string evaluation

The logging module already has partial support for what you want to do. Do this:

log.debug("Some message: a=%s b=%s", a, b)

... instead of this:

log.debug("Some message: a=%s b=%s" % (a, b))

The logging module is smart enough to not produce the complete log message unless the message actually gets logged somewhere.

To apply this feature to your specific request, you could create a lazyjoin class.

class lazyjoin:
    def __init__(self, s, items):
        self.s = s
        self.items = items

    def __str__(self):
        return self.s.join(self.items)

Use it like this (note the use of a generator expression, adding to the laziness):

logger.info('Stupid log message %s', lazyjoin(' ', (str(i) for i in range(20))))

Here is a demo that shows this works.

>>> import logging
>>> logging.basicConfig(level=logging.INFO)
>>> logger = logging.getLogger("log")
>>> class DoNotStr:
...     def __str__(self):
...         raise AssertionError("the code should not have called this")
...
>>> logger.info('Message %s', DoNotStr())
Traceback (most recent call last):
...
AssertionError: the code should not have called this
>>> logger.debug('Message %s', DoNotStr())
>>>

In the demo, the logger.info() call hit the assertion error, while logger.debug() did not get that far.

Strange debug results in R Studio

Most R packages don't include copies of the package's source code (the keep.source option controls this behavior; you can read more about it here). When you debug code from this kind of package, RStudio has to guess at both what the code looked like originally and where the current execution point is in that code. This process includes some heuristics and does occasionally misinterpret the execution point. So what you're seeing is not the actual execution point jumping around, but RStudio making the wrong guess about where it is in the deparsed code.

Code from your own R scripts (and from packages built with keep.source) has source references (see the R Journal article), which allow RStudio to line up the code and the execution point precisely.

I wasn't able to reproduce this with your example (it appears to reference a variable m that it doesn't declare); if you can resolve that and send it to me (jonathan at rstudio dot com) I'll see if we can improve the heuristics so this case works better.

Does calling map twice on the same list force iterating through the list twice?

Why is it thanks to laziness? I know that laziness means that it does not evaluate expression till it has to but... How does it force iterating list once instead of declared number of maps/filters?

If you are only interested in the first element of a list, map will not evaluate the rest of the list.

Imagine that you are interested in evaluating map foo (map foo [1,2]) to head normal form. The outer map foo is then called with the unevaluated expression map foo [1,2] as its list argument.

map :: (a -> b) -> [a] -> [b] is implemented as [src]:

map :: (a -> b) -> [a] -> [b]
map _ [] = []
map f (x:xs) = f x : map f xs

It thus will evaluate the subexpression map foo [1,2] to head normal form: it has to know whether the outermost data constructor is the empty list [] or a "cons" (_:_).

So it will evaluate the second map, but only up to the outer data constructor. This thus means that it will see that the list [1,2] is not empty, and thus it will produce foo 1 : map foo [2]. It will not evaluate foo 1, since that is not necessary either.

Since we now know that the subexpression has a "cons" as data constructor, the outer map function will return foo (foo 1) : map foo (map foo [2]). It thus moved the "cursor" of the original list one step further, but it will not enumerate over the entire list, nor will it evaluate the function foo on the elements.

If we want to print the head, for example with print (head (map foo (map foo [1,2]))), then head will be evaluated, and head (foo (foo 1) : map foo (map foo [2])) will just return foo (foo 1). It will again not evaluate foo (foo 1) yet. Now print needs to evaluate the first item, since it has to write it to the output channel, so it evaluates foo (foo 1). In order to evaluate foo (foo 1), it first has to evaluate foo 1, since that value is needed for the multiplication, so foo 1 is evaluated to 2, and foo 2 is then evaluated to 4.

But we thus saved a lot of cycles by not evaluating the entire list, or the foo (foo …) parts of other elements.
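
To make the walkthrough above concrete, here is the same evaluation written out as a reduction sketch (assuming foo doubles its argument, i.e. foo x = x * 2, as in the trace example further down):

print (head (map foo (map foo [1,2])))
  = print (head (map foo (foo 1 : map foo [2])))         (inner map exposes a cons)
  = print (head (foo (foo 1) : map foo (map foo [2])))   (outer map exposes a cons)
  = print (foo (foo 1))                                   (head picks the first element)
  = print (foo 2)                                         (foo 1 is forced: 1 * 2)
  = print 4                                               (foo 2 is forced: 2 * 2)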

How can I verify that it iterates only once?

There are tools like visualize-cbn [GitHub] and ghc-vis [Hackage] that can help visualize how laziness is handled by Haskell. They let you see how evaluating one element forces the evaluation of other elements, which can help in understanding recursive patterns.

Bonus question - I know that side effects are ... but it would be pretty nice if I could print something to debug, e.g.

One can make use of trace :: String -> a -> a to print a message, for example:

import Debug.Trace(trace)

foo p = trace "evaluated foo" (p * 2)

If you then evaluate print (head (map foo (map foo [1,2]))), you see that it only evaluates foo twice, not four times. Indeed:

ghci> import Debug.Trace(trace)
ghci> foo p = trace "evaluated foo" (p * 2)
ghci> print (head (map foo (map foo [1,2])))
evaluated foo
evaluated foo
4

Haskell: can Lazy Evaluation help to stop a voting earlier?

You can do this without parallel evaluation trivially, using lazy naturals. In this case I chose to use the peano-inf package on hackage: http://hackage.haskell.org/package/peano-inf

import Number.Peano.Inf
import Debug.Trace
import Data.List

myList = [trace "1" True, trace "2" True, trace "3" False, trace "4" True, trace "5" True]

btoNat True = 1 :: Nat
btoNat False = 0 :: Nat

ans = sum $ map btoNat myList
{-
*Main> ans > 2
1
2
3
4
True
-}

Note that 5 is not printed in the trace, because evaluation is cut short before then.

To do this with parallelism requires manually spawning and killing threads, etc, which is fine but certainly less pleasant.

Note that the above code uses the standard sum. This sort of uncommon use case is why sum is not made as strict as possible, though many feel that isn't worth it.
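
If you prefer not to depend on the peano-inf package (or on how strict the standard sum happens to be), the same early cut-off can be spelled out by hand. A minimal sketch, where the names Nat, countTrue, atLeast and votes are made up for the example:

import Debug.Trace

data Nat = Z | S Nat

-- count the True votes as a lazily built natural number
countTrue :: [Bool] -> Nat
countTrue []       = Z
countTrue (b : bs) = if b then S (countTrue bs) else countTrue bs

-- does a natural have at least n successors? Forces only as much as needed.
atLeast :: Int -> Nat -> Bool
atLeast n _ | n <= 0 = True
atLeast _ Z          = False
atLeast n (S m)      = atLeast (n - 1) m

votes :: [Bool]
votes = [ trace (show i) v
        | (i, v) <- zip [1 :: Int ..] [True, True, False, True, True] ]

main :: IO ()
main = print (atLeast 3 (countTrue votes))   -- stops forcing votes once three Trues are seen

As with the peano-inf version, the fifth element is never forced, so its trace message is never printed.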


