
In trying to learn Haskell, I have implemented a pi calculation in order to understand functions and recursion properly.

Using the Leibniz Formula for calculating pi, I came up with the following, which prints pi to the tolerance of the given parameter, and the number of recursive function calls in order to get that value:

reverseSign :: (Fractional a, Ord a) => a -> a 
reverseSign num = ((if num > 0
                        then -1
                        else 1) * (abs(num) + 2))

piCalc :: (Fractional a, Integral b, Ord a) => a -> (a, b)
piCalc tolerance = piCalc' 1 0.0 tolerance 0

piCalc' :: (Ord a, Fractional a, Integral b) => a -> a -> a -> b -> (a, b)
piCalc' denom prevPi tolerance count = if abs(newPi - prevPi) < tolerance
                                        then (newPi, count)
                                        else piCalc' (reverseSign denom) newPi tolerance (count + 1)
                                        where newPi = prevPi + (4 / denom)

So when I run this in GHCI, it seems to work as expected:

*Main> piCalc 0.001
(3.1420924036835256,2000)

But if I set my tolerance too fine, this happens:

*Main> piCalc 0.0000001
(3.1415927035898146,*** Exception: stack overflow

This seems wholly counter-intuitive to me; the actual calculation works fine, but just trying to print how many recursive calls fails??

Why is this so?

  • In case you don't know what a thunk is (I didn't when I started Haskell!), it's basically an unevaluated computation. In your first example, before you print count, it won't have a value of 2000; it will have a value of ...+1)+1)+1)+1)+1) (I omitted the 2000 left-parentheses at the start :P). Only when you print it is it actually added up. Commented Jan 30, 2013 at 9:54
  • I'll just add to what @DanielBuckmaster said that the important point is that the thunks keep building up, taking more and more memory, while one naively expects count to be something like an Int (constant in space). You'll get used to this, but it's definitely something that can bite you. Commented Jan 30, 2013 at 9:56

2 Answers


The count is never evaluated during the computation, so it is left as a huge chain of thunks until the very end. Forcing that chain when the result is printed is what overflows the stack.

You can force its evaluation during the computation by enabling the BangPatterns extension and writing piCalc' denom prevPi tolerance !count = ...

So why do we only need to force the evaluation of count? Well, all the other arguments are evaluated in the if. We actually need to inspect them all before calling piCalc' again, so thunks aren't building up; we need the actual values, not just "promises that they can be computed"! count, on the other hand, is never needed during the computation, and can remain as a series of thunks until the very end.
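For reference, here is a minimal sketch of the whole program with the bang pattern applied. It assumes the question's definitions are otherwise unchanged; the guards are only a stylistic rewrite of the original if/then/else.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Next Leibniz denominator with alternating sign: 1, -3, 5, -7, ...
reverseSign :: (Fractional a, Ord a) => a -> a
reverseSign num = (if num > 0 then -1 else 1) * (abs num + 2)

piCalc :: (Fractional a, Integral b, Ord a) => a -> (a, b)
piCalc tolerance = piCalc' 1 0.0 tolerance 0

piCalc' :: (Ord a, Fractional a, Integral b) => a -> a -> a -> b -> (a, b)
-- The bang forces count on every call, so no chain of (+ 1) thunks builds up.
piCalc' denom prevPi tolerance !count
  | abs (newPi - prevPi) < tolerance = (newPi, count)
  | otherwise = piCalc' (reverseSign denom) newPi tolerance (count + 1)
  where
    newPi = prevPi + 4 / denom
```

With this change, piCalc 0.001 should still report 2000 calls as in the question, and finer tolerances should print their count instead of overflowing the stack.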



This is a variant of the traditional foldl (+) 0 [1..1000000] stack overflow. The problem is that the count value is never evaluated during the evaluation of piCalc'. This means that it just carries an ever-growing set of thunks representing the addition to be done if needed. When it is needed, the fact that evaluating it requires stack depth proportional to the number of thunks causes the overflow.
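The foldl case can be sketched as follows. The names lazySum and strictSum are made up for illustration. Whether the lazy version actually overflows depends on the GHC version, the optimization flags, and the configured stack limit, so treat this as a demonstration of the shape of the problem rather than a guaranteed crash.

```haskell
import Data.List (foldl')

-- Lazy left fold: the accumulator becomes (((0 + 1) + 2) + ...) as nested
-- thunks, which are only collapsed when the final result is demanded.
lazySum :: Integer
lazySum = foldl (+) 0 [1 .. 1000000]

-- Strict left fold: foldl' forces the accumulator at every step,
-- so it stays a plain number throughout.
strictSum :: Integer
strictSum = foldl' (+) 0 [1 .. 1000000]
```

The count argument of piCalc' plays the same role as the lazy accumulator in lazySum.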

The simplest solution makes use of the BangPatterns extension, changing the start of piCalc' to

piCalc' denom prevPi tolerance !count = ...

This forces the value of count to be evaluated when the pattern is matched, which means that it will never grow a giant chain of thunks.

Equivalently, and without the use of an extension, you could write it as

piCalc' denom prevPi tolerance count = count `seq` ...

This is exactly equivalent semantically to the above solution, but it uses seq explicitly instead of implicitly via a language extension. This makes it more portable, but a bit more verbose.
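Spelled out in full, the seq variant might look like the sketch below. It assumes the rest of the question's code stays as it was and needs no language extension.

```haskell
-- Next Leibniz denominator with alternating sign: 1, -3, 5, -7, ...
reverseSign :: (Fractional a, Ord a) => a -> a
reverseSign num = (if num > 0 then -1 else 1) * (abs num + 2)

piCalc :: (Fractional a, Integral b, Ord a) => a -> (a, b)
piCalc tolerance = piCalc' 1 0.0 tolerance 0

piCalc' :: (Ord a, Fractional a, Integral b) => a -> a -> a -> b -> (a, b)
piCalc' denom prevPi tolerance count =
  -- seq evaluates count before the if expression is evaluated.
  count `seq`
    if abs (newPi - prevPi) < tolerance
      then (newPi, count)
      else piCalc' (reverseSign denom) newPi tolerance (count + 1)
  where
    newPi = prevPi + 4 / denom
```

Because count is an integral value, evaluating it with seq evaluates it completely, which is all that is needed here.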

As for why the approximation of pi is not a long sequence of nested thunks, but count is: piCalc' branches on the result of a computation that requires the values of newPi, prevPi, and tolerance. It must examine those values before it decides if it's done or if it needs to run another iteration. It's that branch that causes the evaluation to be performed (when the function application is performed, which usually means something is pattern-matching on the result of the function.) On the other hand, nothing in the calculation of piCalc' depends on the value of count, so it isn't evaluated during the calculation.

3 Comments

Can you explain why the thunking is not happening to the computed value of pi in this example, but only to the count?
@DanielBuckmaster That is because piCalc' branches on the result of a computation that requires the values of newPi, prevPi, and tolerance. It must examine those values before it decides if it's done or if it needs to run another iteration. It's that branch that causes the evaluation to be performed (when the function application is performed, which usually means something is pattern-matching on the result of the function.)
Thanks! I think that'd be very valuable to have in an answer. It's the reason why count causes a stack overflow and not the actual calculation.
