
I would like to get a handle on writing memory-efficient Haskell code. One thing I ran across is that there is no dead-easy way to make Python-style list generators/iterators (that I could find).

Small example:

Finding the sum of the integers from 1 to 100000000 without using the closed form formula.

In Python that can be done quickly with minimal use of memory as sum(xrange(100000000)). In Haskell the analogue would be sum [1..100000000]. However, this uses up a lot of memory. I thought using foldl or foldr would be fine, but even those use a lot of memory and are slower than Python. Any suggestions?

  • Haskell lists are lazy, which makes them closer to Python generators/iterators than Python lists, except that their interface looks like simple lists (because they are; Haskell is just a lazy language). Commented May 15, 2016 at 6:36
  • Compiled with -O2. sum is implemented using folds in the source code. This sum allocates around 3 GB of memory and takes 1.8 seconds on my machine, which is about 3x as long as Python's print sum(xrange(100000001)) - which is why I was surprised. I thought laziness would fix the memory issue and that speed should be around equal. Thanks for the help. Commented May 15, 2016 at 7:00
  • @Helix No, laziness makes it harder to understand space usage. Note that Python has a very limited "laziness". Sure, generators produce one element at a time; however, when doing a function call the arguments are still evaluated before the call is performed. In Haskell arguments are not evaluated before the function call but only when, during the call, the value is needed. This means that chains of function calls can create big unevaluated structures. Commented May 15, 2016 at 7:14
  • btw: if compiled with -O2, while sum [...] will allocate around 3 GB in total, yes - but not all at the same time (it's 44,312 bytes maximum residency on my system) Commented May 15, 2016 at 7:36
  • same with foldl' of course Commented May 15, 2016 at 7:36
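For reference, the strict-fold variant the comments allude to could look like this (a sketch added here, not code from the thread); with a strict accumulator and an Int annotation it runs in constant space:

```haskell
import Data.List (foldl')

-- foldl' forces the accumulator at each step, so no thunk chain builds up;
-- the Int annotation keeps GHC from defaulting the literals to Integer.
main :: IO ()
main = print (foldl' (+) 0 [1 .. 100000000 :: Int])
```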

1 Answer


TL;DR: I think the culprit in this case is GHC defaulting the numeric literals to Integer.

Admittedly I do not know enough about Python, but my first guess would be that Python switches to a "bigint" representation only when necessary - so all its calculations here are done with a machine integer, a.k.a. a 64-bit Int on my machine.
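As a quick illustration of the defaulting (my own sketch, not from the original answer): without an annotation, GHC's default (Integer, Double) rule picks Integer for the literals, while an explicit :: Int pins them to the machine-word type.

```haskell
-- Both expressions compute the same value; only the representation differs.
-- The first is what defaulting silently gives you, the second the 64-bit
-- machine integer the answer recommends.
main :: IO ()
main = do
  print (sum [1 .. 1000] :: Integer)
  print (sum [1 .. 1000] :: Int)
```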

A first check with

$> ghci
GHCi, version 7.10.3: http://www.haskell.org/ghc/  :? for help
Prelude> maxBound :: Int
9223372036854775807

reveals that the result of the sum (5000000050000000) is less than that number, so we need not fear an Int overflow.
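The same sanity check can be scripted (using the closed form here purely for the bound check; this snippet is an addition, not part of the original answer):

```haskell
-- Closed form n*(n+1)/2, computed in Integer, then compared against the
-- largest representable Int to confirm the result fits.
main :: IO ()
main = do
  let n     = 100000000 :: Integer
      total = n * (n + 1) `div` 2
  print total
  print (total <= fromIntegral (maxBound :: Int))
```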

I have guessed your example programs to look roughly like this:

sum.py

print(sum(xrange(100000000)))

sum.hs

main :: IO ()
main = print $ sum [1..100000000]

To make things explicit I added the type annotation (100000000 :: Integer), compiling it with

$ > stack build --ghc-options="-O2 -with-rtsopts=-sstderr"

and ran your example,

$ > stack exec -- time sum
5000000050000000
   3,200,051,872 bytes allocated in the heap
         208,896 bytes copied during GC
          44,312 bytes maximum residency (2 sample(s))
          21,224 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      6102 colls,     0 par    0.013s   0.012s     0.0000s    0.0000s
  Gen  1         2 colls,     0 par    0.000s   0.000s     0.0001s    0.0001s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    1.725s  (  1.724s elapsed)
  GC      time    0.013s  (  0.012s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    1.739s  (  1.736s elapsed)

  %GC     time       0.7%  (0.7% elapsed)

  Alloc rate    1,855,603,449 bytes per MUT second

  Productivity  99.3% of total user, 99.4% of total elapsed

1.72user 0.00system 0:01.73elapsed 99%CPU (0avgtext+0avgdata 4112maxresident)k

and indeed the ~3 GB of total allocation is reproduced (note, though, that the maximum residency is only ~44 KB at any one time).

Changing the annotation to (100000000 :: Int) altered the behaviour drastically:
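The annotated program presumably looked like this (my reconstruction of the file described above):

```haskell
-- Same program as sum.hs, but with the literal pinned to Int so no
-- Integer boxing happens.
main :: IO ()
main = print (sum [1 .. 100000000 :: Int])
```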

$ > stack build
$ > stack exec -- time sum
5000000050000000
          51,872 bytes allocated in the heap
           3,408 bytes copied during GC
          44,312 bytes maximum residency (1 sample(s))
          17,128 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0         0 colls,     0 par    0.000s   0.000s     0.0000s    0.0000s
  Gen  1         1 colls,     0 par    0.000s   0.000s     0.0001s    0.0001s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    0.034s  (  0.034s elapsed)
  GC      time    0.000s  (  0.000s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    0.036s  (  0.035s elapsed)

  %GC     time       0.2%  (0.2% elapsed)

  Alloc rate    1,514,680 bytes per MUT second

  Productivity  99.4% of total user, 102.3% of total elapsed

0.03user 0.00system 0:00.03elapsed 91%CPU (0avgtext+0avgdata 3496maxresident)k
0inputs+0outputs (0major+176minor)pagefaults 0swaps

For the interested

The behaviour of the Haskell version does not change much if you use libraries like conduit or vector (both boxed and unboxed).

sample programs

sumC.hs

import Data.Conduit
import Data.Conduit.List as CL

main :: IO ()
main = do res <- CL.enumFromTo 1 100000000 $$ CL.fold (+) (0 :: Int)
          print res

sumV.hs

import           Data.Vector.Unboxed as V
{-import           Data.Vector as V-}

main :: IO ()
main = print $ V.sum $ V.enumFromTo (1::Int) 100000000

Funnily enough, the version using

main = print $ V.sum $ V.enumFromN (1::Int) 100000000

is doing worse than the above - even though the documentation says otherwise.

enumFromN :: (Unbox a, Num a) => a -> Int -> Vector a

O(n) Yield a vector of the given length containing the values x, x+1 etc. This operation is usually more efficient than enumFromTo.

Update

@Carsten's comment made me curious, so I had a look into the sources for integer - well, integer-simple to be precise, because for Integer there exist other implementations, integer-gmp and integer-gmp2, which use libgmp.

data Integer = Positive !Positive | Negative !Positive | Naught

-------------------------------------------------------------------
-- The hard work is done on positive numbers

-- Least significant bit is first

-- Positive's have the property that they contain at least one Bit,
-- and their last Bit is One.
type Positive = Digits
type Positives = List Positive

data Digits = Some !Digit !Digits
            | None
type Digit = Word#

data List a = Nil | Cons a (List a)

So when using Integer there is quite a bit of memory overhead compared to Int, or rather to an unboxed Int# - I would guess this should get optimized away, though I have not confirmed that.

So an Integer costs (if I calculate correctly)

  • 1 x Word for the sum-type tag (here Positive)
  • n x (Word + Word) for each Some constructor and its Digit
  • 1 x Word for the terminating None

i.e. a memory overhead of 2 + 2·n Words, where n is the number of Word-sized digits, for each Integer in that calculation - plus a bit more for the accumulator.
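That layout can be tallied with a small sketch (hypothetical helper names, assuming the integer-simple representation shown above):

```haskell
-- Number of Word-sized "digits" integer-simple needs for a positive Integer.
wordDigits :: Integer -> Int
wordDigits 0 = 0
wordDigits n = 1 + wordDigits (n `div` (2 ^ (64 :: Int)))

-- Words used in total: 1 for the tag, 2 per digit (Some cell + Digit),
-- and 1 for the terminating None.
integerWords :: Integer -> Int
integerWords n = 1 + 2 * max 1 (wordDigits (abs n)) + 1

main :: IO ()
main = print (integerWords 5000000050000000)  -- the sum fits in one digit
```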


3 Comments

wow interesting - I can understand the difference in time but the difference in memory consumption is puzzling me
@Carsten I tried to answer your question too - but I am not sure if everything is correct; I am by no means an expert on that topic
@epsilonhalbe: IIRC, integer-gmp uses something along data Integer = Positive !ByteStuff# | Negative !ByteStuff# | SmallInt !Int#. So for small integers, it should perform the same as Int, but the checks whether (+) needs more space than Int# provides probably take their additional toll (and I'm not sure of the conditions that switch to Positive from SmallInt).
