This is my Haskell submission. It will not be the fastest here, as my parallel implementation in C++ was ~4 times faster. But it was instructive, since my first Haskell try was ~20 times slower than C++. What I learned:
Use
Data.ByteString.Char8instead of the naiveread. I expected reading the file to be the most problematic, and it still is the slowest part. I just can't get ahead with optimizing it any more.Used an unboxed vector to accumulate the counts. It gave a little speedup. Maybe I can get more with mutable vectors?
Compiling with -O2 had some effect, but not a big one.
My first time using the GHC profiler.
Using ghc-9.6.6, vector-0.13
CPU: i7-1260P
Time: ~0.08 sec (avg. over 10 samples)
{-#LANGUAGE TupleSections #-}
module Main (main) where
import Data.Maybe (fromJust)
import Data.List (unfoldr)
import Data.Char (isDigit)
import qualified Data.ByteString.Char8 as B
import qualified Data.Vector.Unboxed as V
-- Functions are broken out for a more granular profiler output
counts :: [Int] -> V.Vector Int
counts lst = V.unsafeAccum (+) (V.replicate 1000 0) (map (,1) lst)
getCounts :: IO (V.Vector Int)
getCounts = do
numStrs <- B.readFile "data/1M_random_numbers.txt"
let nums = unfoldr (B.readInt . B.dropWhile (not . isDigit)) numStrs
return $ counts nums
main :: IO ()
main = do
cnt <- getCounts
let maxOccurrence = V.foldr max 0 cnt
print $ fromJust (V.findIndex (== maxOccurrence) cnt)
For comparison, the C++ version (compiled with GCC, using C++23 standard):
Time: ~0.024 sec
#include <fstream>
#include <string>
#include <algorithm>
#include <print>
#include <cstdlib>
int main() {
std::ifstream numStrs("../haskell/intcount/data/1M_random_numbers.txt");
if (!numStrs.is_open()) {
std::println("File open failed");
return EXIT_FAILURE;
}
// Just plunk it on the stack, fastest for small vectors like this
int accum[1000];
std::fill(accum, accum + 1000, 0);
std::string line;
while (std::getline(numStrs, line)) {
int num = std::atoi(line.data());
accum[num]++;
}
int* maxOccurrence = std::max_element(accum, accum + 1000);
std::println("Mode: {}", (int)(maxOccurrence - accum));
return EXIT_SUCCESS;
}