
I'm using the Goamz package and could use some help getting bucket.Multi to stream an HTTP GET response to S3.

I'll be downloading a 2+ GB file via chunked HTTP and I'd like to stream it directly into an S3 bucket.

It appears that I need to wrap resp.Body in something so I can pass an implementation of s3.ReaderAtSeeker to multi.PutAll.

// set up s3
auth, _ := aws.EnvAuth()
s3Con := s3.New(auth, aws.USEast)
bucket := s3Con.Bucket("bucket-name")

// make http request to URL
resp, err := http.Get(export_url)
if err != nil {
    fmt.Printf("Get error %v\n", err)
    return
}

defer resp.Body.Close()

// set up multipart upload
multi, err := bucket.InitMulti(s3Path, "text/plain", s3.Private, s3.Options{})
if err != nil {
    fmt.Printf("InitMulti error %v\n", err)
    return
}

// Need struct that implements: s3.ReaderAtSeeker
// type ReaderAtSeeker interface {
//  io.ReaderAt
//  io.ReadSeeker
// }

rs := // Question: what can I wrap `resp.Body` in?

parts, err := multi.PutAll(rs, 5120)
if err != nil {
    fmt.Printf("PutAll error %v\n", err)
    return
}

err = multi.Complete(parts)
if err != nil {
    fmt.Printf("Complete error %v\n", err)
    return
}

Currently I get the following (expected) error when trying to run my program:

./main.go:50: cannot use resp.Body (type io.ReadCloser) as type s3.ReaderAtSeeker in argument to multi.PutAll:
    io.ReadCloser does not implement s3.ReaderAtSeeker (missing ReadAt method)

2 Answers


You haven't indicated exactly which Goamz fork you're using to access the S3 API, but I'm assuming it's this one: https://github.com/mitchellh/goamz/.

Since your file is of significant size, a possible solution is to use multi.PutPart, which gives you more control than multi.PutAll. Using a bytes.Reader from the standard library, your approach would be:

  1. Get the Content-Length from the response header.
  2. Compute the number of parts needed from the Content-Length and your part size.
  3. Loop over the number of parts, read a chunk of resp.Body into a bytes.Reader, and call multi.PutPart.
  4. Get the uploaded parts from multi.ListParts.
  5. Call multi.Complete with those parts.

I don't have access to S3 so I can't test my hypothesis, but the above could be worth exploring if you haven't already; a rough sketch follows.
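
Here is a rough, untested sketch of those steps, reusing resp and multi from your snippet and assuming the mitchellh/goamz API where PutPart takes a part number and an io.ReadSeeker (the partSize value and the error handling are just illustrative):

// Also needs "bytes" and "io" imported.
const partSize = 5 * 1024 * 1024 // S3 parts (except the last) must be at least 5 MB

// 1. total size from the Content-Length header
// (note: with chunked transfer encoding this may be -1, in which
// case you'd loop until io.EOF instead of computing numParts)
totalSize := resp.ContentLength

// 2. number of parts
numParts := int((totalSize + partSize - 1) / partSize)

// 3. read each chunk into memory and upload it as one part
buf := make([]byte, partSize)
for i := 1; i <= numParts; i++ {
    n, err := io.ReadFull(resp.Body, buf) // the last part may be short
    if err != nil && err != io.ErrUnexpectedEOF {
        fmt.Printf("read error %v\n", err)
        return
    }

    // bytes.Reader implements io.ReaderAt and io.ReadSeeker
    if _, err := multi.PutPart(i, bytes.NewReader(buf[:n])); err != nil {
        fmt.Printf("PutPart error %v\n", err)
        return
    }
}

// 4. collect the uploaded parts
parts, err := multi.ListParts()
if err != nil {
    fmt.Printf("ListParts error %v\n", err)
    return
}

// 5. finish the multipart upload
err = multi.Complete(parts)
if err != nil {
    fmt.Printf("Complete error %v\n", err)
    return
}

Only one part's worth of data is buffered at a time, so even a 2+ GB download should stay at roughly partSize of memory.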




A simpler approach is to use http://github.com/minio/minio-go.

It implements PutObject(), a fully managed, self-contained operation for uploading large files. It automatically performs multipart uploads, in parallel, for anything over 5 MB of data. If no pre-defined ContentLength is specified, it keeps uploading until it reaches EOF.

The following example shows how to do this when you don't have a pre-defined input length, only a streaming io.Reader. In this example I've used os.Stdin as a stand-in for your chunked input.

package main

import (
    "log"
    "os"

    "github.com/minio/minio-go"
)

func main() {
    config := minio.Config{
        AccessKeyID:     "YOUR-ACCESS-KEY-HERE",
        SecretAccessKey: "YOUR-PASSWORD-HERE",
        Endpoint:        "https://s3.amazonaws.com",
    }
    s3Client, err := minio.New(config)
    if err != nil {
        log.Fatalln(err)
    }

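    // A size of 0 with a plain io.Reader means the length is unknown;
    // the client keeps uploading (multipart for larger payloads) until EOF.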
    err = s3Client.PutObject("mybucket", "myobject", "application/octet-stream", 0, os.Stdin)
    if err != nil {
        log.Fatalln(err)
    }
}
$ echo "Hello my new-object" | go run stream-object.go

