Today I’m gonna show you how to download a file to S3 from a Lambda without using temporary space.
As you know, the /tmp directory in AWS Lambda functions has only 512 MB by default. If you want to download a big file, you can’t use this directory. I’ll show you how to stream the file and upload it to S3 directly.
And for the same price you’ll get a bonus: you’ll see how to do this even if the file is gzipped.
Let’s see the code that makes the magic:
import logging
import zlib

import boto3
# urllib3 is a dependency of botocore, so it is available wherever boto3 is.
# The old botocore.vendored copy no longer provides PoolManager.
import urllib3

log = logging.getLogger()
log.setLevel(logging.INFO)

s3 = boto3.client('s3')


class GzippedResponse():
    """File-like wrapper that gunzips the HTTP response as it is read."""

    READ_BLOCK_SIZE = 1024 * 8

    def __init__(self, response):
        self.response = response
        # 16 + MAX_WBITS makes zlib expect a gzip header and trailer.
        self.d = zlib.decompressobj(16 + zlib.MAX_WBITS)

    def read(self, amount=None):
        if amount is None:
            amount = self.READ_BLOCK_SIZE
        # Read compressed bytes from the socket and return them decompressed.
        data = self.response.read(amount)
        return self.d.decompress(data)


def lambda_handler(event, context):
    url = event['url']
    bucket = event['bucket']
    is_gzipped = event['is_gzipped']
    file_name = event['file_name']

    log.info(f"Downloading file from {url}")
    http = urllib3.PoolManager()
    # preload_content=False keeps the body on the socket instead of in memory.
    response = http.request('GET', url, preload_content=False)

    if is_gzipped:
        fileobj = GzippedResponse(response)
    else:
        fileobj = response

    path = f"s3://{bucket}/{file_name}"
    log.info(f"Uploading file to {path}")
    # upload_fileobj only needs an object with a read() method.
    s3.upload_fileobj(fileobj, bucket, file_name)
Let’s check some of the important lines:
- In the http.request call we pass preload_content=False, so the content is not preloaded and the file is streamed.
- If the file is gzipped, we wrap the response with the GzippedResponse class. This class must have a read method, which here decompresses each chunk as it is read.
- We use the upload_fileobj method, which performs a multipart upload in multiple threads if necessary.
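If you want to play with the wrapper idea without a Lambda, here is a minimal local sketch. It is not the code above: the names GzippedStream and read_all are made up for this example, an in-memory io.BytesIO stands in for the HTTP response, and read loops until the decompressor produces output, so an empty result only ever means end of stream.

```python
import gzip
import io
import zlib


class GzippedStream:
    """Hypothetical file-like wrapper that gunzips another file-like object."""

    READ_BLOCK_SIZE = 1024 * 8

    def __init__(self, raw):
        self.raw = raw
        # 16 + MAX_WBITS makes zlib expect a gzip header and trailer.
        self.decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)

    def read(self, amount=None):
        if amount is None:
            amount = self.READ_BLOCK_SIZE
        # Keep reading until some decompressed bytes come out or the
        # underlying stream is exhausted.
        while True:
            data = self.raw.read(amount)
            if not data:
                return self.decompressor.flush()
            chunk = self.decompressor.decompress(data)
            if chunk:
                return chunk


def read_all(stream, amount=64):
    """Drain a file-like object the way a consumer of read() would."""
    parts = []
    while True:
        chunk = stream.read(amount)
        if not chunk:
            break
        parts.append(chunk)
    return b"".join(parts)


# An in-memory gzip payload stands in for the streamed HTTP body.
original = b"hello streaming world\n" * 10_000
compressed = io.BytesIO(gzip.compress(original))
restored = read_all(GzippedStream(compressed))
```

Note that a single read can return more bytes than the amount requested, because the amount counts compressed bytes going in, not decompressed bytes coming out.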
To use this lambda you must invoke it with these parameters:
{
"url": "https://link.to/gzipped.gz",
"bucket": "my-bucket",
"is_gzipped": true,
"file_name": "my-folder/extracted.txt"
}
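One way to send these parameters is from Python with boto3. This is just a sketch: build_event and invoke_downloader are helper names I made up, and "s3-stream-downloader" is a placeholder for whatever your function is called. It uses an asynchronous invocation (InvocationType="Event") on the assumption that you don’t want the caller to wait for a long download.

```python
import json


def build_event(url, bucket, file_name, is_gzipped=False):
    """Build the event dictionary that lambda_handler reads its settings from."""
    return {
        "url": url,
        "bucket": bucket,
        "is_gzipped": is_gzipped,
        "file_name": file_name,
    }


def invoke_downloader(function_name, event):
    """Queue an asynchronous invocation; needs boto3 and AWS credentials."""
    import boto3  # imported here so build_event works without boto3

    client = boto3.client("lambda")
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="Event",  # asynchronous: returns once the event is queued
        Payload=json.dumps(event).encode("utf-8"),
    )
    return response["StatusCode"]


event = build_event(
    url="https://link.to/gzipped.gz",
    bucket="my-bucket",
    file_name="my-folder/extracted.txt",
    is_gzipped=True,
)
payload = json.dumps(event)

# Hypothetical function name; uncomment to run against your own account.
# invoke_downloader("s3-stream-downloader", event)
```

With an asynchronous invocation you won’t get the result back, so check the function’s logs or the bucket to confirm the upload finished.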
I’ve tried this code with a gzipped file of 235 MB (908 MB uncompressed). It took 76 seconds to download it and the max memory used was 187 MB. Not bad 😎