
Today I’m gonna show you how to download a file from a URL straight into S3 from a Lambda function, without using temporary storage.

As you know, the /tmp directory in AWS Lambda functions has only 512 MB by default. If you want to download a bigger file, it won’t fit there. I’ll show you how to stream the file and upload it to S3 directly.

And for the same price you get a bonus: you’ll see how to do this even if the file is gzipped.

Let’s see the code that makes the magic:

import zlib
import boto3
import logging
import urllib3  # installed alongside boto3; botocore's vendored copy was removed

log = logging.getLogger()
log.setLevel(logging.INFO)

s3 = boto3.client('s3')


class GzippedResponse():
    READ_BLOCK_SIZE = 1024*8

    def __init__(self, response):
        self.response = response
        self.d = zlib.decompressobj(16 + zlib.MAX_WBITS)

    def read(self, amount=None):
        # Empty bytes must only mean end of stream, so read until data decompresses
        while data := self.response.read(amount or self.READ_BLOCK_SIZE):
            if out := self.d.decompress(data):
                return out
        return self.d.flush()


def lambda_handler(event, context):
    url = event["url"]
    bucket = event['bucket']
    is_gzipped = event["is_gzipped"]
    file_name = event['file_name']

    log.info(f"Downloading file from {url}")

    http = urllib3.PoolManager()
    response = http.request('GET', url, preload_content=False)

    if is_gzipped:
        fileobj = GzippedResponse(response)
    else:
        fileobj = response

    path = f"s3://{bucket}/{file_name}"
    log.info(f"Uploading file to {path}")

    s3.upload_fileobj(fileobj, bucket, file_name)

Let’s check some of the important lines:

  • The request is made with preload_content=False, so the body isn’t loaded into memory up front and the file is streamed
  • When the file is gzipped, we wrap the response in the GzippedResponse class. upload_fileobj only needs a file-like object with a read method, which is why the class implements read
  • We use the upload_fileobj method, which performs a multipart upload in multiple threads when necessary
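
To see the wrapper idea in isolation, here is a minimal local sketch with no AWS involved. An io.BytesIO stands in for the streamed HTTP response, and a small loop stands in for the consumer, which I'm assuming (as with upload_fileobj) just calls read(amount) repeatedly until it gets empty bytes. The names StreamingGunzip and drain are made up for this example.

```python
import gzip
import io
import zlib


class StreamingGunzip:
    """File-like wrapper that gunzips another file-like object on the fly."""

    def __init__(self, source, block_size=8 * 1024):
        self.source = source
        self.block_size = block_size
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer
        self.decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)

    def read(self, amount=None):
        # Return b"" only once the source is exhausted
        while data := self.source.read(amount or self.block_size):
            if out := self.decompressor.decompress(data):
                return out
        return self.decompressor.flush()


def drain(fileobj, amount):
    """Stand-in for the consumer: read until an empty chunk, count the chunks."""
    chunks = []
    while chunk := fileobj.read(amount):
        chunks.append(chunk)
    return b"".join(chunks), len(chunks)


original = b"hello streaming world\n" * 10_000
compressed = gzip.compress(original)

# Feed the compressed bytes 64 at a time, as a slow network might
restored, calls = drain(StreamingGunzip(io.BytesIO(compressed)), 64)
print(len(compressed), len(original), restored == original, calls)
```

Only one small compressed block and its decompressed output are in memory at any moment, which is the same reason the Lambda never needs the whole file on disk.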

To use this lambda you must invoke it with these parameters:

{
  "url": "https://link.to/gzipped.gz",
  "bucket": "my-bucket",
  "is_gzipped": true,
  "file_name": "my-folder/extracted.txt"
}
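
If you’d rather build that event from Python than type the JSON by hand, something like this sketch could work. build_event is a hypothetical helper, not part of the Lambda above; it only fills in the four keys the handler reads and serializes them, which also avoids quoting slips. The resulting string is what you’d use as the test event, or as the Payload argument of boto3’s Lambda invoke call.

```python
import json


def build_event(url, bucket, file_name, is_gzipped=False):
    """Hypothetical helper: build the JSON event that lambda_handler reads."""
    event = {
        "url": url,                      # where the file is downloaded from
        "bucket": bucket,                # destination S3 bucket
        "is_gzipped": bool(is_gzipped),  # True to gunzip while streaming
        "file_name": file_name,          # destination key inside the bucket
    }
    return json.dumps(event)


payload = build_event(
    "https://link.to/gzipped.gz",
    "my-bucket",
    "my-folder/extracted.txt",
    is_gzipped=True,
)
print(payload)
```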

I’ve tried this code with a 235 MB gzipped file (908 MB uncompressed). It took 76 seconds to download it and the max memory used was 187 MB. Not bad 😎