I am trying to parse a binary file into a Haskell vector. I can load my file into a regular list, but since I have more than 10000000 elements in each file, performance is terrible.
To parse the binary file, I use Data.Binary.Get and Data.Binary.IEEE754, since I intend to read float values. I am trying to build my vector as a mutable vector and then return it frozen.
I end up at a point where I have a problem because Get is not an instance of Control.Monad.Primitive.PrimMonad, which looks pretty obscure to me.
```haskell
import qualified Data.ByteString.Lazy as B
import qualified Data.Vector.Unboxed.Mutable as UM
import qualified Data.Vector.Unboxed as U
import Data.Binary.Get
import Data.Binary.IEEE754

type MyVectorOfFloats = U.Vector Float

main :: IO ()
main = do
  -- Lazily read the content of the file as a ByteString
  file_content <- B.readFile "vec.bin"
  -- Parse the ByteString and get the vector
  -- (runGet is pure, so its result is bound with let, not <-)
  let vec = runGet (readWithGet 10) file_content :: MyVectorOfFloats
  -- Do something useful with it...
  return ()

readWithGet :: Int
            -> Get MyVectorOfFloats  -- ^ Operates in the Get monad
readWithGet n = do
  -- Initialize a mutable vector of the desired size.
  -- This is where GHC complains: UM.new (like UM.unsafeWrite and
  -- U.unsafeFreeze) needs a PrimMonad instance, and Get has none.
  vec <- UM.new n
  -- Initialize the vector with values obtained from the Get monad
  fill vec 0
  -- Finally return a frozen version of the vector
  U.unsafeFreeze vec
  where
    fill v i
      | i < n = do
          -- Hopefully read one float32 from the Get monad
          f <- getFloat32le
          -- Place the value inside the vector.
          -- In the real situation, I would do more complex decoding
          -- with my float value f
          UM.unsafeWrite v i f
          -- and go to the next value to read
          fill v (i + 1)
      | otherwise = return ()
```
The example above is quite simple; in my situation I have run-length-like decoding to do, but the problem stays the same.
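For the run-length case, one way around the missing instance is to use Get only for parsing and do the mutable filling in ST, which is a PrimMonad. This is a minimal sketch under stated assumptions: the on-disk layout (each run is a little-endian Word32 count followed by a little-endian float32 value) is hypothetical, the names getRuns, expandRuns, decodeRLE and encodeRuns are illustrative, and getFloatle/putFloatle come from binary >= 0.8.4 rather than Data.Binary.IEEE754.

```haskell
import Control.Monad (forM_)
import Data.Binary.Get (Get, getFloatle, getWord32le, isEmpty, runGet)
import Data.Binary.Put (putFloatle, putWord32le, runPut)
import qualified Data.ByteString.Lazy as L
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Unboxed.Mutable as UM

-- Hypothetical format: repeated (count :: Word32 LE, value :: float32 LE)
-- runs until the input is exhausted. Parsing stays pure, inside Get.
getRuns :: Get [(Int, Float)]
getRuns = do
  done <- isEmpty
  if done
    then return []
    else do
      c <- getWord32le
      v <- getFloatle
      rest <- getRuns
      return ((fromIntegral c, v) : rest)

-- Expand the runs into an unboxed vector. U.create runs its argument in
-- ST, which is a PrimMonad, so UM.new and UM.set are allowed here.
expandRuns :: [(Int, Float)] -> U.Vector Float
expandRuns runs = U.create $ do
  let total = sum (map fst runs)
  mv <- UM.new total
  let go _ [] = return ()
      go off ((c, v) : rest) = do
        -- Write the value c times, starting at offset off
        UM.set (UM.slice off c mv) v
        go (off + c) rest
  go 0 runs
  return mv

-- Parse with Get, then fill in ST.
decodeRLE :: L.ByteString -> U.Vector Float
decodeRLE = expandRuns . runGet getRuns

-- Helper that writes the hypothetical format, e.g. for tests.
encodeRuns :: [(Int, Float)] -> L.ByteString
encodeRuns runs = runPut $ forM_ runs $ \(c, v) -> do
  putWord32le (fromIntegral c)
  putFloatle v
```

The intermediate list holds one cell per run rather than per element, so it stays small when runs are long; the per-element writes happen directly in the mutable vector.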
First, do the libraries I selected seem adequate for my use? I currently do not really need the whole vector in memory at once; I can operate on chunks. Something from pipes or Conduit looks interesting.
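Since the data can be processed in chunks, here is a dependency-free sketch of that idea (pipes or conduit would give a more structured version of the same thing). It folds over a lazy ByteString a fixed number of floats at a time. foldFloatChunks and encodeFloats are hypothetical helpers, and getFloatle/putFloatle assume binary >= 0.8.4.

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.Binary.Get (getFloatle, runGet)
import Data.Binary.Put (putFloatle, runPut)
import qualified Data.ByteString.Lazy as L
import qualified Data.Vector.Unboxed as U

-- Fold over little-endian float32 values, k floats at a time. Each chunk is
-- decoded into a small unboxed vector and passed to the step function.
-- Consumed input can be garbage collected, so with a strict step function
-- memory use is roughly proportional to k, not to the file size.
-- Trailing bytes that do not form a whole float are ignored.
foldFloatChunks :: Int -> (s -> U.Vector Float -> s) -> s -> L.ByteString -> s
foldFloatChunks k step acc0 bs0
  | k <= 0 = error "foldFloatChunks: chunk size must be positive"
  | otherwise = go acc0 bs0
  where
    go !acc bs
      | L.null bs = acc
      | otherwise =
          let (chunk, rest) = L.splitAt (fromIntegral (4 * k)) bs
              n = fromIntegral (L.length chunk) `div` 4
              v = runGet (U.replicateM n getFloatle) chunk
           in go (step acc v) rest

-- Helper to produce input in the assumed format.
encodeFloats :: [Float] -> L.ByteString
encodeFloats = runPut . mapM_ putFloatle
```

With the lazy ByteString from readFile, something like foldFloatChunks 65536 (\s v -> s + U.sum v) 0 file_content would keep only about one chunk resident, provided nothing else holds on to file_content.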
Do I have to make Get an instance of Control.Monad.Primitive.PrimMonad to do what I want?
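For the simple fixed-count case from the question, no instance seems necessary: vector's replicateM only requires Monad, so it can build the unboxed vector directly inside Get. A minimal sketch, using binary's getFloatle (binary >= 0.8.4) in place of getFloat32le; encodeFloats is a hypothetical test helper.

```haskell
import Data.Binary.Get (Get, getFloatle, runGet)
import Data.Binary.Put (putFloatle, runPut)
import qualified Data.ByteString.Lazy as L
import qualified Data.Vector.Unboxed as U

-- Read n little-endian float32 values into an unboxed vector.
-- U.replicateM needs only a Monad, so it runs directly in Get,
-- with no PrimMonad instance and no explicit mutable vector.
readWithGet :: Int -> Get (U.Vector Float)
readWithGet n = U.replicateM n getFloatle

-- Helper to produce input in the assumed format.
encodeFloats :: [Float] -> L.ByteString
encodeFloats = runPut . mapM_ putFloatle
```

Depending on the vector version, replicateM in a monad that is not IO or ST may still go through an intermediate list internally, so it is worth benchmarking on the real 10000000-element files.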
I think I could try some unfolding pattern to build the vector without mutable state.
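The unfolding idea can be sketched with U.unfoldrN, using the remaining input as the seed and runGetOrFail to decode one float per step. unfoldFloats and encodeFloats are illustrative names, and getFloatle/putFloatle assume binary >= 0.8.4.

```haskell
import Data.Binary.Get (getFloatle, runGetOrFail)
import Data.Binary.Put (putFloatle, runPut)
import qualified Data.ByteString.Lazy as L
import qualified Data.Vector.Unboxed as U

-- Build the vector with an unfold: the seed is the unconsumed input, and
-- each step decodes one float and returns it along with the rest.
-- At most n elements are produced; the unfold stops early if the input
-- runs out (runGetOrFail then returns Left).
unfoldFloats :: Int -> L.ByteString -> U.Vector Float
unfoldFloats n = U.unfoldrN n step
  where
    step bs = case runGetOrFail getFloatle bs of
      Left _ -> Nothing
      Right (rest, _, f) -> Just (f, rest)

-- Helper to produce input in the assumed format.
encodeFloats :: [Float] -> L.ByteString
encodeFloats = runPut . mapM_ putFloatle
```

Because unfoldrN is given an upper bound, the vector can be allocated up front; starting a fresh decoder for every element does add some overhead, though.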
The base PrimMonad instances are IO and ST. Any other monad will do, though, if you can write it as a transformer over one of these. There's quite an interesting discussion going on right now about whether this should be possible for basically all monads, including Get, but at any rate Get doesn't have a working instance, so you can't do it right away.

The best performance for such a huge load of binary data is certainly to fiddle with low-level Storable. Get always loads data in its entirety. Do you really want all 10000000 numbers in memory at once? If you could stream the data in chunks, you'd be a lot happier with the memory use.
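In the same low-level spirit, though using a strict ByteString and bit operations rather than Storable, the floats can be decoded without Get at all. This sketch assumes little-endian float32 data and GHC.Float.castWord32ToFloat (base >= 4.10); decodeFloatsLE and encodeFloats are hypothetical names.

```haskell
import Data.Binary.Put (putFloatle, runPut)
import Data.Bits (shiftL, (.|.))
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Unsafe as BU
import qualified Data.Vector.Unboxed as U
import Data.Word (Word32)
import GHC.Float (castWord32ToFloat)

-- Decode a strict ByteString of little-endian float32 values without Get.
-- U.generate knows the final length and fills the vector directly.
-- unsafeIndex skips bounds checks; this is safe because i < n implies
-- 4 * i + 3 < BS.length bs. Trailing bytes are ignored.
decodeFloatsLE :: BS.ByteString -> U.Vector Float
decodeFloatsLE bs = U.generate n at
  where
    n = BS.length bs `div` 4
    byte j = fromIntegral (BU.unsafeIndex bs j) :: Word32
    at i =
      let o = 4 * i
       in castWord32ToFloat $
            byte o
              .|. (byte (o + 1) `shiftL` 8)
              .|. (byte (o + 2) `shiftL` 16)
              .|. (byte (o + 3) `shiftL` 24)

-- Helper to produce input in the assumed format.
encodeFloats :: [Float] -> BS.ByteString
encodeFloats = L.toStrict . runPut . mapM_ putFloatle
```

Applied to a whole file read with Data.ByteString.readFile, this needs the entire file in memory; applied to fixed-size strict chunks, it fits the streaming approach suggested above.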