3

I'm playing with some Data Parallel Haskell code and found myself in need of a prefix sum. However I didn't see any basic operator in the dph package for prefix sum.

I rolled my own, but, since I'm new to dph, I'm not sure if it's properly taking advantage of parallelization:

{-# LANGUAGE ParallelArrays #-}
{-# OPTIONS_GHC -fvectorise #-}

module PrefixSum ( scanP ) where
import Data.Array.Parallel (lengthP, indexedP, mapP, zipWithP, concatP, filterP, singletonP, sliceP, (+:+), (!:))
import Data.Array.Parallel.Prelude.Int ((<=), (-), (==), Int, mod)
-- hide prelude
import qualified Prelude 

-- assuming zipWithP (a -> b -> c) given 
-- [:a:] of length n and
-- [:b:] of length m, n /= m
-- will return
-- [:c:] of length min n m

scanP :: (a -> a -> a) -> [:a:] -> [:a:]
scanP f xs = if lengthP xs <= 1
                then xs
                else head +:+ tail
  where -- [: x_0, x_2, ..., x_2n :]
        evens = mapP snd . filterP (even . fst) $ indexedP xs
        -- [: x_1, x_3 ... :]
        odds = mapP snd . filterP (odd . fst)  $ indexedP xs
        lenEvens = lengthP evens
        lenOdds = lengthP odds
        -- calculate the prefix sums [:w:] of the pair sums [:z:]
        psums = scanP f $ zipWithP f evens odds
        -- calculate the total prefix sums as 
        -- [: x_0, w_0, f w_0 x_2, w_1, f w_1 x_4, ..., 
        head = singletonP (evens !: 0)
        body = concatP . zipWithP (\p e -> [: p, f p e :]) psums $ sliceP 1 lenOdds evens
        -- ending at either
        --    ... w_{n-1}, f w_{n-1} x_2n :]
        -- or
        --    ... w_{n-1}, f w_{n-1} x_2n, w_n :]
        -- depending on whether the length of [:x:] is 2n+1 or 2n+2
        tail = if lenEvens == lenOdds then body +:+ singletonP (psums !: (lenEvens - 1)) else body

-- reimplement some of Prelude so it can be vectorised
f $ x = f x
infixr 0 $
(.) f g y = f (g y)

snd (a,b) = b
fst (a,b) = a

even n = n `mod` 2 == 0
odd n = n `mod` 2 == 1
6
  • Hmm, is it even parallelizable? Seems like a pretty serial idea, but maybe I am missing something. Commented Jan 31, 2012 at 15:16
  • @luqui: The parallel prefix sum algorithm for an array of size n takes O(log n) parallel rounds of computation. There are two phases. In the forward phase, given {a_i | i \in [0,2n-1] } you calculate { a_2i + a_{2i+1} | i \in [0,n-1] } using n/2 parallel additions. In the backward phase, given { \sum_0^{2i+1} a_j | i \in [0,n-1] } and { a_i | i \in [0,2n-1] }, you calculate { \sum_0^i a_j | i \in [0, 2n-1 } using n/2 parallel additions. Commented Jan 31, 2012 at 16:24
  • @luqui: naturally, this only works properly for associative +, since there's the inherent assumption that (a_0 + a_1) + (a_2 + a_3) == ((a_0 + a_1) + a_2) + a_3 Commented Jan 31, 2012 at 16:27
  • For a normal array, this would be scanl, but I'm not seeing that in dph-par. Commented Jan 31, 2012 at 23:23
  • right. scanl is O(n) work in O(n) time, scanP is meant to be O(n) work in O(log n) time. Commented Feb 1, 2012 at 10:03

1 Answer 1

2

Parallel prefix scans are supported, in fact, they're rather fundamental. So just pass (+) as your associative operator.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.