Skip to main content
1 of 2

This is my Haskell submission. It will not be the fastest here, as my parallel implementation in C++ was ~4 times faster. But it was instructive, since my first Haskell try was ~20 times slower than C++. What I learned:

  1. Use Data.ByteString.Char8 instead of the naive read. I expected reading the file to be the most problematic, and it still is the slowest part. I just can't get ahead with optimizing it any more.

  2. Used an unboxed vector to accumulate the counts. It gave a little speedup. Maybe I can get more with mutable vectors?

  3. Compiling with -O2 had some effect, but not a big one.

  4. My first time using the GHC profiler.

Using ghc-9.6.6, vector-0.13
CPU: i7-1260P

Time: ~0.08 sec (avg. over 10 samples)

{-#LANGUAGE TupleSections #-}

module Main (main) where

import Data.Maybe (fromJust)
import Data.List (unfoldr)
import Data.Char (isDigit)

import qualified Data.ByteString.Char8 as B
import qualified Data.Vector.Unboxed as V

-- Functions are broken out for a more granular profiler output
counts :: [Int] -> V.Vector Int
counts lst = V.unsafeAccum (+) (V.replicate 1000 0) (map (,1) lst)

getCounts :: IO (V.Vector Int)
getCounts = do
    numStrs <- B.readFile "data/1M_random_numbers.txt"
    let nums = unfoldr (B.readInt . B.dropWhile (not . isDigit)) numStrs
    return $ counts nums

main :: IO ()
main = do
    cnt <- getCounts
    let maxOccurrence = V.foldr max 0 cnt

    print $ fromJust (V.findIndex (== maxOccurrence) cnt)

For comparison, the C++ version (compiled with GCC, using C++23 standard):
Time: ~0.024 sec

#include <fstream>
#include <string>
#include <algorithm>

#include <print>
#include <cstdlib>

int main() {
    std::ifstream numStrs("../haskell/intcount/data/1M_random_numbers.txt");

    if (!numStrs.is_open()) {
        std::println("File open failed");
        return EXIT_FAILURE;
    }
    
    // Just plunk it on the stack, fastest for small vectors like this
    int accum[1000];
    std::fill(accum, accum + 1000, 0);

    std::string line;
    while (std::getline(numStrs, line)) {
        int num = std::atoi(line.data());
        accum[num]++;
    }

    int* maxOccurrence = std::max_element(accum, accum + 1000);
    std::println("Mode: {}", (int)(maxOccurrence - accum));

    return EXIT_SUCCESS;
}