UPDATE: I have made most of the changes Peter suggested. The largest boost in performance came indeed from using the ST Monad to modify a Vector and from substituting the MonadRandom with the mersenne-random package.
The code I was using to modify Vectors is the following:
modifyVectorElement :: Int -> a -> V.Vector a -> V.Vector a
modifyVectorElement !i !x !v = V.modify (\v' -> write v' i x) v
, which I change to:
modifyVectorElement :: Int -> a -> Vector a -> Vector a
modifyVectorElement !i !x !v = runST $ unsafeThaw v >>= (\mv -> write mv i x >> unsafeFreeze mv)
Including all the other changes Peter suggested, brought the time for a 1000 iterations from ~30s to ~3.7s, which is about 25-35% slower than the C equivalent code. This is very satisfactory, but I'm sure more optimizations may be possible.
I'm still surprised by how big a difference optimization makes in Haskell compared to imperative languages, where beyond having a good algorithm, the performance gains from optimizations are much smaller.
For anyone that may be interested, I have updated the code on GitHub with the latest changes.