Scenario:
Using AWS Lambda (Node.js), I want to process large files from S3 (> 1 GB).
The 512 MB /tmp filesystem limit means I can't copy the S3 input there.
I can certainly increase the Lambda memory allocation so the files fit in memory.
Should I then pass the in-memory buffer to ffmpeg? (In Node.js, how?)
Or should I just create an EFS mount point and use that as the transcoding scratchpad?
You can simply use HTTP(S) as the input protocol for ffmpeg.
Lambda has a maximum memory limit of 10 GB, and data transfer speed from S3 was around 300 MB per second the last time I tested. So if your videos are at most 1 GB and you are not doing a memory-intensive transformation, this approach should work fine:
ffmpeg -i "https://public-qk.s3.ap-southeast-1.amazonaws.com/sample.mp4" -ss 00:00:10 -vframes 1 -f image2 "image%03d.jpg"
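The example above reads a public object; for a private object you would typically generate a presigned URL first and pass that to ffmpeg instead. A minimal boto3 sketch (bucket and key are placeholders):

import boto3

s3 = boto3.client("s3")

# Presigned GET URL valid for one hour; bucket and key are hypothetical placeholders.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-private-bucket", "Key": "sample.mp4"},
    ExpiresIn=3600,
)
print(url)  # pass this URL to ffmpeg as the -i input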
ffmpeg works on files, so an alternative would be to set up a Unix pipe and have ffmpeg read from that pipe while you constantly feed it the S3 stream.
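A rough sketch of that idea in Python, streaming the S3 object into ffmpeg's stdin; the bucket, key, and output settings are placeholders, and note that some containers (for example MP4 files with the moov atom at the end) do not read well from a pipe:

import subprocess
import boto3

s3 = boto3.client("s3")
body = s3.get_object(Bucket="my-input-bucket", Key="sample.mp4")["Body"]

# ffmpeg reads from stdin ("pipe:0") and writes a thumbnail series to /tmp.
proc = subprocess.Popen(
    ["ffmpeg", "-i", "pipe:0", "-ss", "00:00:10", "-vframes", "1",
     "-f", "image2", "/tmp/image%03d.jpg"],
    stdin=subprocess.PIPE,
)

try:
    # Feed the S3 stream into the pipe in 1 MB chunks.
    for chunk in body.iter_chunks(chunk_size=1024 * 1024):
        proc.stdin.write(chunk)
except BrokenPipeError:
    pass  # ffmpeg may close stdin once it has read enough
proc.stdin.close()
proc.wait()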
But you may want to consider running this as an ECS task instead: you wouldn't have the time constraint, and you wouldn't have the same storage constraint either. The cold start on Fargate would be 1-2 minutes though, which may not be acceptable.
Lambda now supports up to 10 GB of ephemeral storage:
https://aws.amazon.com/blogs/aws/aws-lambda-now-supports-up-to-10-gb-ephemeral-storage/
Update it with the CLI:
$ aws lambda update-function-configuration --function-name PDFGenerator --ephemeral-storage '{"Size": 10240}'
Related
I'm trying to copy some files from one bucket to another (same region) and getting a speed of around 315 MB/s. However, I'm doing this in Lambda, where there is a 15-minute timeout limit, so bigger files hit the timeout.
Below is the code snippet I'm using (in Python). Is there any other way I can speed it up? Any input is welcome.
import boto3
from botocore.config import Config

# Client built from temporary credentials; SigV4 signing is forced.
s3_client = boto3.client(
    's3',
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,
    aws_session_token=session_token,
    config=Config(signature_version='s3v4')
)

# Managed copy: bucket_pair["input"] is the {"Bucket": ..., "Key": ...} copy source.
s3_client.copy(bucket_pair["input"],
               bucket_pair["output"]["Bucket"],
               bucket_pair["output"]["Key"])
I saw many posts about passing a chunk size and so on, but I don't see those options in ALLOWED_COPY_ARGS. Thanks.
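As a side note, if I remember the boto3 API correctly, chunk size and concurrency are not ExtraArgs; they are passed to copy() as a TransferConfig through the Config keyword. A rough sketch with illustrative values:

import boto3
from boto3.s3.transfer import TransferConfig

s3_client = boto3.client('s3')

# Chunk size and concurrency go in a TransferConfig, not ExtraArgs.
# The threshold, chunk size, and concurrency below are illustrative.
config = TransferConfig(multipart_threshold=64 * 1024 * 1024,
                        multipart_chunksize=256 * 1024 * 1024,
                        max_concurrency=20)

s3_client.copy(bucket_pair["input"],
               bucket_pair["output"]["Bucket"],
               bucket_pair["output"]["Key"],
               Config=config)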
You can use a Step Function to iterate over all objects and copy them. To increase throughput you can use a Map state:
https://docs.aws.amazon.com/step-functions/latest/dg/amazon-states-language-map-state.html
If you don't want to use Step Functions, you can use one producer Lambda to write all object keys into an SQS queue and a consumer Lambda to copy them to the respective target (a rough sketch follows below).
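A minimal sketch of the producer side, assuming a hypothetical queue URL and source bucket; a consumer Lambda would then read these messages and call copy() for each object:

import json
import boto3

s3 = boto3.client("s3")
sqs = boto3.client("sqs")

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/copy-jobs"  # placeholder
SOURCE_BUCKET = "my-source-bucket"  # placeholder

def lambda_handler(event, context):
    # Enumerate every object in the source bucket and enqueue one copy job per key.
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=SOURCE_BUCKET):
        for obj in page.get("Contents", []):
            sqs.send_message(
                QueueUrl=QUEUE_URL,
                MessageBody=json.dumps({"Bucket": SOURCE_BUCKET, "Key": obj["Key"]}),
            )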
A different option would be to use S3 Replication:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/replication.html
But I'm not sure whether that fits your use case.
I have a 6 TB file on an AWS EC2 instance, and I want to split it into multiple 1 TB files so that they can be uploaded to an AWS S3 bucket.
I use this command:
split -b1T -d myfile myfil.
but it runs so slowly that after one hour only 60 GB had been split out.
How can I make it faster? Or is there another way to split binary files more quickly?
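One way to avoid the intermediate files entirely is to read the big file in byte ranges and stream them straight into S3 multipart uploads, one S3 object per 1 TB piece. A rough sketch, assuming boto3 on the instance; bucket name, key prefix, and part size are placeholders (S3 caps objects at 5 TB and 10,000 parts, which is why 1 TB pieces are needed anyway):

import boto3

s3 = boto3.client("s3")

BUCKET = "my-target-bucket"   # placeholder
SOURCE = "/data/myfile"       # placeholder path to the 6 TB file
CHUNK = 1024 ** 4             # 1 TiB per S3 object
PART = 512 * 1024 ** 2        # 512 MiB per multipart part (2048 parts per object)

with open(SOURCE, "rb") as f:
    index = 0
    while True:
        key = f"myfile.part{index:02d}"
        upload = s3.create_multipart_upload(Bucket=BUCKET, Key=key)
        parts, sent = [], 0
        # Upload up to 1 TiB of the source file as parts of this object.
        while sent < CHUNK:
            data = f.read(min(PART, CHUNK - sent))
            if not data:
                break
            resp = s3.upload_part(Bucket=BUCKET, Key=key,
                                  UploadId=upload["UploadId"],
                                  PartNumber=len(parts) + 1, Body=data)
            parts.append({"ETag": resp["ETag"], "PartNumber": len(parts) + 1})
            sent += len(data)
        if not parts:
            # Nothing left to read: clean up the empty upload and stop.
            s3.abort_multipart_upload(Bucket=BUCKET, Key=key, UploadId=upload["UploadId"])
            break
        s3.complete_multipart_upload(Bucket=BUCKET, Key=key,
                                     UploadId=upload["UploadId"],
                                     MultipartUpload={"Parts": parts})
        index += 1
        if sent < CHUNK:
            break  # reached end of file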
I have my application running in a Docker container on AWS Fargate (the serverless compute engine), where I'm using a bind volume and tee to capture application logs as follows:
/app.bin 2>&1 | tee /logs/app.log
AWS bind volumes can grow to a maximum of 200 GB, so I want to truncate the older logs once the size of my log file (app.log) reaches some specific threshold (let's say 190 GB).
Any ideas on how to achieve this log rotation?
I figured out that maybe I can use the logrotate command along with tee. What are the other ways? How should I do it?
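Besides logrotate, one alternative is to replace tee with a small Python reader that writes through the standard library's size-capped RotatingFileHandler. A rough sketch; the log path, per-file size cap, and backup count are placeholder values:

# Usage (replacing tee): /app.bin 2>&1 | python3 rotate_logs.py
import logging
import logging.handlers
import sys

# Keep each log file under ~1 GB and retain 9 rotated backups; tune to the 200 GB volume.
handler = logging.handlers.RotatingFileHandler(
    "/logs/app.log", maxBytes=1024 ** 3, backupCount=9)
handler.setFormatter(logging.Formatter("%(message)s"))

logger = logging.getLogger("app")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

# Read the piped stdout line by line; the handler rotates the file on size.
for line in sys.stdin:
    logger.info(line.rstrip("\n"))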
I am transferring around 150 files of 1 GB each to S3 using the aws s3 cp command in a loop, which takes around 20 s per file, so about 50 minutes in total. If I put all the files in one directory and use folder copy with --recursive (which is multithreaded), it takes up to 40 minutes. I tried changing the S3 CLI config by raising the concurrent requests to 20 and increasing the bandwidth limit, but it takes almost the same time. What is the best way to reduce the time?
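If the CLI settings don't change much, a boto3 sketch that uploads several files in parallel, each with its own multipart transfer, may be worth trying. The bucket name, source directory, and tuning values are placeholders:

import concurrent.futures
import pathlib
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")
BUCKET = "my-target-bucket"        # placeholder
SRC = pathlib.Path("/data/files")  # placeholder directory with the 150 files

# Multipart settings applied to each individual file.
config = TransferConfig(multipart_chunksize=64 * 1024 * 1024, max_concurrency=10)

def upload(path: pathlib.Path) -> str:
    s3.upload_file(str(path), BUCKET, path.name, Config=config)
    return path.name

# Upload several files at once, on top of the per-file multipart parallelism.
files = (p for p in SRC.iterdir() if p.is_file())
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    for name in pool.map(upload, files):
        print("uploaded", name)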
I am trying to transform data from CSV to JSON in AWS Lambda (using Python 3). The file size is 65 MB, so the function times out before the process completes and the entire execution fails.
I need to know how to handle such a case: AWS Lambda should process as much of the data as it can within the timeout period, and the remaining payload should be kept in an S3 bucket.
Below is the transformation code:
import json
import boto3
import csv
import os

json_content = {}

def lambda_handler(event, context):
    s3_source = boto3.resource('s3')
    if event:
        # Pull the object key and event time out of the S3 trigger event.
        fileObj = event['Records'][0]
        fileName = str(fileObj['s3']['object']['key'])
        eventTime = fileObj['eventTime']

        # Read the whole CSV into memory.
        fileObject = s3_source.Object('inputs3', fileName)
        data = fileObject.get()['Body'].read().decode('utf-8-sig').split()
        arr = []
        csvreader = csv.DictReader(data)
        # getFile_extensionName / extension_type are helpers defined elsewhere (not shown).
        newFile = getFile_extensionName(fileName, extension_type)
        for row in csvreader:
            arr.append(dict(row))
        json_content['Employees'] = arr
        print("Json Content is", json_content)

        # Write the accumulated JSON back to S3.
        s3_source.Object('s3-output', "output.json").put(
            Body=bytes(json.dumps(json_content).encode('utf-8-sig')))
        print("File Uploaded")

    return {
        'statusCode': 200,
        'fileObject': eventTime,
    }
AWS Lambda function configuration:
Memory: 640 MB
Timeout: 15 min
Since your function is timing out, you only have two options:
Increase the amount of assigned memory. This will also increase the amount of CPU assigned to the function, so it should run faster. However, this might not be enough to avoid the timeout.
or
Don't use AWS Lambda.
The most common use-case for AWS Lambda functions is for small microservices, sometimes only running for a few seconds or even a fraction of a second.
If your use-case runs for over 15 minutes, then it probably isn't a good candidate for AWS Lambda.
You can look at alternatives such as running your code on an Amazon EC2 instance or using a Fargate container.
It looks like your function is running out of memory:
Memory Size: 1792 MB Max Memory Used: 1792
Also, it only ran for 12 minutes:
Duration: 723205.42 ms
(723 seconds ≈ 12 minutes)
Therefore, you should either:
Increase memory (but this costs more), or
Change your program so that, instead of accumulating the JSON string in memory, you continually write it out to a local file under /tmp/ and then upload the resulting file to Amazon S3 (a rough sketch of this appears after this answer)
However, the maximum disk storage space provided to an AWS Lambda function is 512 MB, and it appears that your output file is bigger than this. Therefore, increasing memory would be the only option. The increased expense of assigning more resources to the Lambda function suggests that you might be better off using EC2 or Fargate rather than Lambda.
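If disk space turns out not to be the blocker (ephemeral storage can now be raised to 10 GB, as noted earlier in the thread), a rough sketch of option 2 could look like the following. It streams the CSV line by line and writes the JSON to a file under /tmp before uploading, instead of accumulating everything in memory; the event parsing and the 's3-output' bucket name follow the original code:

import csv
import json
import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    record = event['Records'][0]['s3']
    bucket = record['bucket']['name']
    key = record['object']['key']

    # Stream the CSV from S3 line by line instead of reading it all at once.
    body = s3.get_object(Bucket=bucket, Key=key)['Body']
    lines = (line.decode('utf-8-sig') for line in body.iter_lines())

    # Write the JSON incrementally to ephemeral storage.
    out_path = '/tmp/output.json'
    with open(out_path, 'w') as out:
        out.write('{"Employees": [')
        for i, row in enumerate(csv.DictReader(lines)):
            if i:
                out.write(',')
            out.write(json.dumps(row))
        out.write(']}')

    # Upload the finished file to the output bucket from the original code.
    s3.upload_file(out_path, 's3-output', 'output.json')
    return {'statusCode': 200}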