I am streaming the download of an S3 file using amazonka, and I use the sinkBody function to consume the response body. Currently, I download the file as follows:

getFile bucketName fileName = do
    resp <- send (getObject (BucketName bucketName) fileName)
    sinkBody (resp ^. gorsBody) sinkLazy

where sinkBody :: MonadIO m => RsBody -> ConduitM ByteString Void (ResourceT IO) a -> m a. To run in constant memory, I thought that sinkLazy would be a good option for getting a value out of the conduit stream.

After this, I would like to save the lazy bytestring of data (S3 file) into a local file, for which I use this code:

-- fetch stream of data from S3
bytestream <- liftIO $ AWS.runResourceT $ runAwsT awsEnv $ getFile serviceBucket key

-- create a file
liftIO $ writeFile filePath  ""

-- write content of stream into the file (strict version), keeps data in memory...
liftIO $ runConduitRes $ yield bytestream .| mapC B.toStrict .| sinkFile filePath

But this code has the flaw that the whole lazy bytestring has to be materialized in memory, which means that it cannot run in constant space.

  • Is there any way that I can use conduit to yield a lazy bytestring and save it into a file in constant memory?

  • Or is there any other approach that does not use sinkLazy and still saves the data into a file in constant space?

EDIT

I also tested writing the lazy bytestring directly to a file, as follows, but this uses about twice the file size in memory. (The writeFile is from Data.ByteString.Lazy.)

bytestream <- liftIO $ AWS.runResourceT $ runAwsT awsEnv $ getFile serviceBucket key
writeFile filename bytestream

1 Answer

Well, the purpose of a streaming library like conduit is to realize some of the benefits of lazy data structures and actions (lazy ByteStrings, lazy I/O, etc.) while better controlling memory usage. The purpose of the sinkLazy function is to take data out of the conduit ecosystem with its well controlled memory footprint and back into the wild West of lazy objects with associated space leaks. So, that's your problem right there.

Rather than sink the stream out of conduit and into a lazy ByteString, you probably want to keep the data in conduit and sink the stream directly into the file, using something like sinkFile. I don't have an AWS test program up and running, but the following type checks and probably does what you want:

import Conduit
import Control.Lens
import Network.AWS
import Network.AWS.S3

getFile bucketName fileName outputFileName = do
    resp <- send (getObject (BucketName bucketName) fileName)
    sinkBody (resp ^. gorsBody) (sinkFile outputFileName)
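
To see the same pattern without an AWS account, here is a minimal sketch that swaps the S3 body for a made-up in-memory source. The names fakeBody and streamToFile are hypothetical and not part of amazonka or conduit; only replicateC, sinkFile and runConduitRes come from the conduit package. Each chunk is written to the file as it is produced, so nothing accumulates the way it does with sinkLazy.

```haskell
import Conduit
import qualified Data.ByteString as B

-- Hypothetical stand-in for the S3 response body:
-- yields n chunks of 1 KiB each (byte value 120, i.e. 'x').
fakeBody :: Monad m => Int -> ConduitT i B.ByteString m ()
fakeBody n = replicateC n (B.replicate 1024 120)

-- Stream the chunks straight into the file. The data never leaves
-- the conduit pipeline, so no lazy ByteString is built up.
streamToFile :: FilePath -> Int -> IO ()
streamToFile path n = runConduitRes $ fakeBody n .| sinkFile path
```

Calling streamToFile "out.bin" 4096 should produce a 4 MiB file; running it with +RTS -s is one way to check that the maximum residency stays small regardless of n.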

2 Comments

Also, if you don't want to pass the output filename to getFile, then getFile should just return the RsBody and let the caller sink it.
Thank you so much. I think you are right that if we leave the conduit ecosystem, we are in Haskell's lazy world. For some reason I thought that the way to go was to write to a file from outside the conduit ecosystem. Thanks again!
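
The suggestion in the first comment could look roughly like the sketch below. It is untested and assumes the amazonka 1.x API used in the answer (send, getObject, gorsBody, sinkBody); getBody and saveTo are hypothetical names introduced only for illustration. The body still has to be consumed inside the same ResourceT scope that issued the request.

```haskell
import Conduit
import Control.Lens ((^.))
import Network.AWS
import Network.AWS.S3

-- Hypothetical helper: return the streaming body and let the caller
-- decide where to sink it.
getBody :: MonadAWS m => BucketName -> ObjectKey -> m RsBody
getBody bucket key = do
    resp <- send (getObject bucket key)
    return (resp ^. gorsBody)

-- Hypothetical caller: sink the body directly into a local file,
-- staying inside conduit the whole time.
saveTo :: MonadAWS m => FilePath -> BucketName -> ObjectKey -> m ()
saveTo path bucket key = do
    body <- getBody bucket key
    sinkBody body (sinkFile path)
```

With this split, the same getBody can feed other sinks (for example a hashing or decompressing pipeline) without changing the download code.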
