Still working on my SHA1 implementation in Haskell. I've now got a working implementation and this is the inner loop:
iterateBlock' :: Int -> [Word32] -> Word32 -> Word32 -> Word32 -> Word32 -> Word32 -> [Word32]
iterateBlock' 80 ws a b c d e = [a, b, c, d, e]
iterateBlock' t (w:ws) a b c d e = iterateBlock' (t+1) ws a' b' c' d' e'
where
a' = rotate a 5 + f t b c d + e + w + k t
b' = a
c' = rotate b 30
d' = c
e' = d
The profiler tells me that this function takes 1/3 of the runtime of my implementation. I can think of no way to further optimize it other than maybe inlining the temp variables but I believe -O2 will do that for me anyway.
Can anyone see a significant optimization that can be further applied?
FYI the k and f calls are below. They are so simple I don't think there is a way to optimize these other. Unless the Data.Bits module is slow?
f :: Int -> Word32 -> Word32 -> Word32 -> Word32
f t b c d
| t <= 19 = (b .&. c) .|. ((complement b) .&. d)
| t <= 39 = b `xor` c `xor` d
| t <= 59 = (b .&. c) .|. (b .&. d) .|. (c .&. d)
| otherwise = b `xor` c `xor` d
k :: Int -> Word32
k t
| t <= 19 = 0x5A827999
| t <= 39 = 0x6ED9EBA1
| t <= 59 = 0x8F1BBCDC
| otherwise = 0xCA62C1D6
Word32and manually unroll the loop. Short of that, try it with a strict/unpacked structure holdinga,b,c,d, ande; then you'd only have one variable needing passed (and you'd be sure to put a bang pattern on it, right?).(<=)with a table lookup, though I'm not sure it will help much.MagicHashes and use the GHC primops.