
I've been trying to transfer a large file using LWP (or a web service API that depends on LWP), and no matter how I approach it the process falls apart at a certain point. On a whim, I watched top while my script ran and noticed that memory usage balloons to over 40 GB right before things start failing.

I thought the issue was the S3 APIs I was using initially, so I decided to use LWP::UserAgent to connect to the server myself. Unfortunately the issue remains with plain LWP: memory usage still balloons, and although the script runs longer before failing, it got about halfway through the transfer and then hit a segmentation fault.

Simply reading the file I want to transfer in segments works just fine and never takes memory usage above 1.4GB:

use POSIX;

my $filename = "/backup/2022-12-13/accounts/backup.tar.gz";
my $size = -s $filename; 
my $chunkSize = (1024*1024*100);
my $parts = ceil($size / $chunkSize);

# open 9.6 GB file
open(my $file, '<', $filename) or die("Error reading file, stopped");
binmode($file); 

for (my $i = 0; $i <= $parts; $i++) {
    my $chunk;
    my $offset = $i * $chunkSize + 1;

    read($file, $chunk, $chunkSize, $offset);

    # Code to do what I need to do with the chunk goes here.
    sleep(5);

    print STDOUT "Uploaded $i of $parts.\n";
}

However, adding in the LWP code suddenly raises memory usage significantly and, as I said, eventually ends in a segmentation fault (at 55% of the transfer). Here's a minimal, complete, reproducible example:

use POSIX;
use LWP::UserAgent;
use HTTP::Request::Common;
use Net::Amazon::Signature::V4;
use Amazon::S3;
my $awsSignature = Net::Amazon::Signature::V4->new( $config{'access_key_id'}, $config{'access_key'}, 'us-east-1', 's3' );

# Get Upload ID from Amazon.
our $simpleS3 = Amazon::S3->new({
    aws_access_key_id  => $config{'access_key_id'},
    aws_secret_access_key => $config{'access_key'},
    retry => 1
}); 
my $bucket = $simpleS3->bucket($bucketName); 
my $uploadId = $bucket->initiate_multipart_upload('somebigobject');

my $filename = "/backup/2022-12-13/accounts/backup.tar.gz";
my $size = -s $filename; 
my $chunkSize = (1024*1024*100);
my $parts = ceil($size / $chunkSize);

# open 9.6 GB file
open(my $file, '<', $filename) or die("Error reading file, stopped");
binmode($file); 

for (my $i = 0; $i <= $parts; $i++) {
    my $chunk;
    my $offset = $i * $chunkSize + 1;

    read($file, $chunk, $chunkSize, $offset);

    # Code to do what I need to do with the chunk goes here.
    my $request = HTTP::Request::Common::PUT("https://bucket.s3.us-east-1.amazonaws.com/somebigobject?partNumber=" . ($i + 1) . "&uploadId=" . $uploadId);
    $request->header('Content-Length' => length($chunk));
    $request->content($chunk);
    my $signed_request = $awsSignature->sign( $request );
    
    my $ua = LWP::UserAgent->new();
    my $response = $ua->request($signed_request);
    
    my $etag = $response->header('Etag');
    
    # Try to make sure nothing lingers after this loop ends.
    $signed_request = '';
    $request = '';
    $response = '';
    $ua = '';           
        
    ($partList{$i + 1}) = $etag =~ m#^"(.*?)"$#;

    print STDOUT "Uploaded $i of $parts.\n";
}

The same issue occurs, just even sooner in the process, if I use Paws::S3, Net::Amazon::S3::Client or Amazon::S3. It appears each chunk somehow stays in memory. As the code progresses I can see a gradual but significant increase in memory usage until it hits that wall at around 40 GB. Here's the bit that replaces sleep(5) in the real-world code:

        $partList{$i + 1} = $bucket->upload_part_of_multipart_upload('some-big-object', $uploadId, $i + 1, $chunk);
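
A quick way to check whether the chunks themselves are growing (rather than LWP or the S3 client holding onto them) is to log each chunk's size inside that loop. This is only a diagnostic sketch added for illustration, not part of the original code:

    # Diagnostic only: each part should be at most $chunkSize bytes,
    # and only the final part should be smaller.
    printf STDERR "part %d: chunk is %d bytes\n", $i + 1, length($chunk);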

The final code that fails because it uses so much memory:

use POSIX;
use Amazon::S3;

our $simpleS3 = Amazon::S3->new({
    aws_access_key_id  => $config{'access_key_id'},
    aws_secret_access_key => $config{'access_key'},
    retry => 1
});
my $bucket = $simpleS3->bucket($bucketName);

my $filename = "/backup/2022-12-13/accounts/backup.tar.gz";
my $size = -s $filename; 
my $chunkSize = (1024*1024*100);
my $parts = ceil($size / $chunkSize);
my %partList;

my $uploadId = $bucket->initiate_multipart_upload('some-big-object');

# open 9.6 GB file
open(my $file, '<', $filename) or die("Error reading file, stopped");
binmode($file); 

for (my $i = 0; $i <= $parts; $i++) {
    my $chunk;
    my $offset = $i * $chunkSize + 1;

    read($file, $chunk, $chunkSize, $offset);

    # Code to do what I need to do with the chunk goes here.
    $partList{$i + 1} = $bucket->upload_part_of_multipart_upload('some-big-object', $uploadId, $i + 1, $chunk);

    print STDOUT "Uploaded $i of $parts.\n";
}
  • I know nothing about the Amazon::S3 API, but is it really necessary to keep the return value of every call to upload_part_of_multipart_upload()? I would guess that each of those return values is some sort of object which is holding onto the chunk it uploaded. Commented Dec 15, 2022 at 12:30
  • @DaveMitchell In theory, the return value is just the "etag" (essentially an MD5 sum, as I understand it) that is ultimately passed back to the server to reassemble the parts. I need to look more closely at that value, but having tried to Dump $bucket, it does appear that the Amazon::S3 object stores the chunk for some odd reason. (A sketch for measuring this follows these comments.) Commented Dec 15, 2022 at 17:21
  • I updated my question, because I believe I've actually narrowed the issue to something happening with LWP, not anything specific to Amazon S3 APIs. Commented Dec 16, 2022 at 0:02
  • Possibly related to Memory leak with Perl's LWP using HTTPS. I don't know if this memory leak is fast enough to cause your problem, though. Commented Dec 16, 2022 at 0:18
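
To actually measure whether the client object or the part list is retaining the uploaded chunks, the Devel::Size module from CPAN can report the deep size of a data structure. This is a hypothetical diagnostic sketch (the variable names match the question's code), not something from the original post:

use Devel::Size qw(total_size);

# Deep (recursive) sizes in bytes; a number that grows by roughly the
# chunk size after every part would confirm the chunks are being retained.
print STDERR "S3 client object: ", total_size($simpleS3),  " bytes\n";
print STDERR "bucket object:    ", total_size($bucket),    " bytes\n";
print STDERR "part list:        ", total_size(\%partList), " bytes\n";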

1 Answer


The problem wasn't actually LWP or the S3 API, but a stupid error in how I was reading the file. I was using read($file, $chunk, $chunkSize, $offset);.

The fourth argument to read() is an offset into the target scalar, not into the file, so each call padded $chunk with $offset bytes of filler before appending the data it read; I had assumed it moved the read position in the file by that much. Because $offset grew with every iteration, the chunks grew larger and larger until the process finally crashed. Instead, the code needs to be:

seek ($file, $offset, 0);
read ($file, $chunk, $chunkSize);

That seeks to the correct position in the file and produces the expected chunk size.
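
For reference, here is a minimal sketch of the corrected read loop, assuming the same file path and 100 MB part size as above. Since the parts are read sequentially, the seek() is strictly optional, but it makes the offset arithmetic explicit:

use POSIX qw(ceil);

my $filename  = "/backup/2022-12-13/accounts/backup.tar.gz";
my $size      = -s $filename;
my $chunkSize = 1024 * 1024 * 100;
my $parts     = ceil($size / $chunkSize);

open(my $file, '<', $filename) or die("Error reading file, stopped");
binmode($file);

for (my $i = 0; $i < $parts; $i++) {
    my $chunk;

    # Move to the start of this part, then read at most $chunkSize bytes.
    # read()'s optional fourth argument is an offset into $chunk, not the
    # file, which is why it must not be used for positioning.
    seek($file, $i * $chunkSize, 0);
    my $bytesRead = read($file, $chunk, $chunkSize);

    # Only the last part should come back smaller than $chunkSize.
    print STDOUT "Read part ", ($i + 1), " of $parts ($bytesRead bytes)\n";
}
close($file);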
