How to reuse efficiently input from stdin in Haskell

Question

I understand that I should not try to read from stdin a second time, because it fails with a "handle is closed" error (see "Haskell IO - handle closed"). For example, in the code below:

main = do
  x <- getContents
  putStrLn $ map id x
  x <- getContents     --problem line
  putStrLn x

the second call x <- getContents will cause the error:

test: <stdin>: hGetContents: illegal operation (handle is closed)

Of course, I can omit the second x <- getContents line and simply reuse x:

main = do
  x <- getContents
  putStrLn $ map id x
  putStrLn x

But will this become a performance/memory issue? Will GHC have to keep all of the contents read from stdin in the main memory?

I imagine the first time around when x is consumed, GHC can throw away the portions of x that are already processed. So theoretically, GHC could only use a small amount of constant memory for the processing. But since we are going to use x again (and again), it seems that GHC cannot throw away anything. (Nor can it read again from stdin).

Is my understanding about the memory implications here correct? And if so, is there a fix?

Yes, it has to keep all stdin in memory. I can't see how a fix would be possible: do you want your program to discard data and still have it available later? Maybe you are looking for something like do x <- getContents ; y <- useMyDataSomehow x ; useMoreData y ? Anyway your problem might be solved by the pipes/conduit libraries (at the cost of writing your program in the style required by those libraries)
– chi, Commented Feb 11, 2017 at 21:12
You have it exactly right: do {cs <- getContents; putStrLn cs} uses basically no memory, while do {cs <- getContents; putStrLn cs; putStrLn cs} accumulates all of cs the first time around. How you 'get around' this will depend on what you are doing. Do you literally want to putStrLn twice? Or, e.g., record the length and print, or what?
– Michael, Commented Feb 11, 2017 at 21:14
Roughly speaking, I am looking to use Haskell to implement the UNIX tee command or a variant of it.
– thor, Commented Feb 11, 2017 at 21:24

melpomene · Accepted Answer · 2017-02-11 21:40:24Z

Yes, your understanding is correct: if you reuse x, GHC has to keep it all in memory.

I think a possible fix is to consume it lazily (once).

Let's say you want to output x to several output handles hdls :: [Handle]. The naive approach is:

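import Control.Monad (forM_)
import System.IO (hPutStr)

main :: IO ()
main = do
    x <- getContents
    -- hdls :: [Handle] is assumed to be in scope, as described above
    forM_ hdls $ \hdl ->
        hPutStr hdl x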

This will read stdin into x as the first hPutStr traverses the string (at least for unbuffered handles, hPutStr is simply a loop that calls hPutChar for each character in the string). Because x is still needed for the remaining handles, the whole string then stays in memory.

Alternatively:

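import Control.Monad (forM_)
import System.IO (hPutChar)

main :: IO ()
main = do
    x <- getContents
    -- hdls :: [Handle] is assumed to be in scope, as before
    forM_ x $ \c ->
        forM_ hdls $ \hdl ->
            hPutChar hdl c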

Here we've transposed the loops: Instead of iterating over the handles (and for each handle iterating over the input characters), we iterate over the input characters, and for each character, we print it to each handle.

I haven't tested it, but this form should guarantee that we don't need a lot of memory because each input character c is used once and then discarded.
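The same transposition can be done per chunk instead of per character, which avoids one hPutChar call per byte. The following is only a sketch, not part of the original answer: it swaps String for a lazy ByteString (from the bytestring package that ships with GHC), and teeChunks plus the choice of stdout and stderr as targets are illustrative names and assumptions. Each chunk is written to every handle before the next one is demanded, so memory use should stay roughly constant, although I have not measured it.

```haskell
import Control.Monad (forM_)
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import System.IO

-- Hypothetical helper: write each chunk of the lazy input to every
-- handle before moving on to the next chunk, so a chunk can be
-- garbage-collected once all handles have received it.
teeChunks :: [Handle] -> BL.ByteString -> IO ()
teeChunks hdls input =
    forM_ (BL.toChunks input) $ \chunk ->
        forM_ hdls $ \hdl ->
            B.hPut hdl chunk

main :: IO ()
main = do
    input <- BL.getContents
    -- stdout and stderr are placeholder targets for illustration only
    teeChunks [stdout, stderr] input
```

Note that input is referenced exactly once. Using it a second time anywhere in main would bring back the original problem, because the whole stream would then have to be retained.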

4 Comments

thor Over a year ago

Thanks. I can see how your example works. But I probably need something more complex than the synchronized handles here. For example, what if I need to reverse or sort the stream in the second round? I guess somewhere down the line, it will break.

Michael Over a year ago

@tinlyx reverse and sort will force all the input to be accumulated; they are textbook non-streaming operations. Things like length or word count will work.

thor Over a year ago

Thanks for the clarification. So if I use e.g. reverse on a big stream, will I get the same memory issue anyway, regardless of whether it's used in a second round?

Michael Over a year ago

Yes. Even just do {cs <- getContents; putStrLn (reverse cs)}, or anything like it, has to accumulate the whole input, without any second use. Similarly, sort can't proceed until it knows it has the smallest element, which might be the last.
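To illustrate the streaming case mentioned in these comments: when two summaries of the input are needed, computing them in one strict left fold avoids a second traversal, so nothing has to be retained. This is a minimal sketch under my own assumptions. The names Counts, step and countBoth are made up for the example, and counting characters and newlines stands in for whatever summaries are actually needed.

```haskell
import Data.List (foldl')

-- Running totals with strict fields, so no chain of thunks builds up:
-- characters seen so far and newlines seen so far.
data Counts = Counts !Int !Int
    deriving (Eq, Show)

-- Update both totals for one input character.
step :: Counts -> Char -> Counts
step (Counts cs ls) c = Counts (cs + 1) (if c == '\n' then ls + 1 else ls)

-- One traversal that produces both results at once.
countBoth :: String -> Counts
countBoth = foldl' step (Counts 0 0)

main :: IO ()
main = do
    x <- getContents
    -- x is used exactly once, so consumed input can be collected
    let Counts chars newlines = countBoth x
    putStrLn (show chars ++ " characters, " ++ show newlines ++ " newlines")
```

By contrast, calling length x and then length (lines x) separately would traverse x twice and keep the whole input alive between the two passes.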
