TL;DR - I think the culprit in this case is - defaulting of GHC to Integer.
Admittedly I do not know enough about python, but my first guess would be that python switches to "bigint" only if necessary - therefore all calculations are done with Int a.k.a. 64-bit integer on my machine.
A first check with
$> ghci
GHCi, version 7.10.3: http://www.haskell.org/ghc/ :? for help
Prelude> maxBound :: Int
9223372036854775807
reveals that the result of the sum (5000000050000000) is less than that number so we have no fear of Int overflow.
I have guessed your example programs to look roughly like this
sum.py
print(sum(xrange(100000000)))
sum.hs
main :: IO ()
main = print $ sum [1..100000000]
To make things explicit I added the type annotation (100000000 :: Integer), compiling it with
$ > stack build --ghc-options="-O2 -with-rtsopts=-sstderr"
and ran your example,
$ > stack exec -- time sum
5000000050000000
3,200,051,872 bytes allocated in the heap
208,896 bytes copied during GC
44,312 bytes maximum residency (2 sample(s))
21,224 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 6102 colls, 0 par 0.013s 0.012s 0.0000s 0.0000s
Gen 1 2 colls, 0 par 0.000s 0.000s 0.0001s 0.0001s
INIT time 0.000s ( 0.000s elapsed)
MUT time 1.725s ( 1.724s elapsed)
GC time 0.013s ( 0.012s elapsed)
EXIT time 0.000s ( 0.000s elapsed)
Total time 1.739s ( 1.736s elapsed)
%GC time 0.7% (0.7% elapsed)
Alloc rate 1,855,603,449 bytes per MUT second
Productivity 99.3% of total user, 99.4% of total elapsed
1.72user 0.00system 0:01.73elapsed 99%CPU (0avgtext+0avgdata 4112maxresident)k
and indeed the ~3GB of memory consumption is reproduced.
Changing the annotation to (100000000 :: Int) - altered the behaviour drastically
$ > stack build
$ > stack exec -- time sum
5000000050000000
51,872 bytes allocated in the heap
3,408 bytes copied during GC
44,312 bytes maximum residency (1 sample(s))
17,128 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 0 colls, 0 par 0.000s 0.000s 0.0000s 0.0000s
Gen 1 1 colls, 0 par 0.000s 0.000s 0.0001s 0.0001s
INIT time 0.000s ( 0.000s elapsed)
MUT time 0.034s ( 0.034s elapsed)
GC time 0.000s ( 0.000s elapsed)
EXIT time 0.000s ( 0.000s elapsed)
Total time 0.036s ( 0.035s elapsed)
%GC time 0.2% (0.2% elapsed)
Alloc rate 1,514,680 bytes per MUT second
Productivity 99.4% of total user, 102.3% of total elapsed
0.03user 0.00system 0:00.03elapsed 91%CPU (0avgtext+0avgdata 3496maxresident)k
0inputs+0outputs (0major+176minor)pagefaults 0swaps
For the interested
The behaviour of the haskell version does not change a lot if you use libraries like conduit or vector (both boxed and unboxed).
sample programs
sumC.hs
import Data.Conduit
import Data.Conduit.List as CL
main :: IO ()
main = do res <- CL.enumFromTo 1 100000000 $$ CL.fold (+) (0 :: Int)
print res
sumV.hs
import Data.Vector.Unboxed as V
{-import Data.Vector as V-}
main :: IO ()
main = print $ V.sum $ V.enumFromTo (1::Int) 100000000
funny enough the version using
main = print $ V.sum $ V.enumFromN (1::Int) 100000000
is doing worse than the above - even though the documentation says otherwise.
enumFromN :: (Unbox a, Num a) => a -> Int -> Vector a
O(n) Yield a vector of the given length containing the values x, x+1
etc. This operation is usually more efficient than enumFromTo.
Update
@Carsten's comment made me curious - so I had a look into the sources for integer - well integer-simple to be precise, because for Integer there exists other versions integer-gmp and integer-gmp2 using libgmp.
data Integer = Positive !Positive | Negative !Positive | Naught
-------------------------------------------------------------------
-- The hard work is done on positive numbers
-- Least significant bit is first
-- Positive's have the property that they contain at least one Bit,
-- and their last Bit is One.
type Positive = Digits
type Positives = List Positive
data Digits = Some !Digit !Digits
| None
type Digit = Word#
data List a = Nil | Cons a (List a)
so when using Integer there is quite a bit of memory overhead compared to Int or rather unboxed Int# - I guess as this should be optimized, (though I have not confirmed that).
So Integer is (if I calculate correctly)
- 1 x
Word for the sum-type-tag (here Positive
- n x (
Word + Word) for Some and the Digit part
- 1 x
Word for the last None
a memory overhead of (2 + floor(log_10(n)) for each Integer in that calculation + a bit more for the accumulator.
print sum(xrange(100000001))which is why I was surprised. I thought laziness would fix the memory issue and that speed should be around equal. Thanks for the help.-O2whilesum [...]will allocate around 3gb in total yes - but not at the same time (it's44,312 bytes maximum residencyon my system)foldl'of course