I have a few files in an S3 bucket and all of them need to be converted (3 output files per input file).
The conversion rules are the same for all files.
Is it possible to do this? How can it be implemented with the Node AWS SDK?
Do I need any extra service for it?
You can create a MediaConvert JobTemplate.
After this you can start one MediaConvert job for each file in S3.
If you want to start this every time a file is added, your safest bet is to create a Lambda function that is triggered when a new file is added to the S3 bucket and then starts a new MediaConvert job using the saved JobTemplate.
Make sure you don't start a job for the outputs of the MediaConvert job, though.
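For example, the S3-triggered Lambda could look roughly like this (a minimal sketch, assuming the AWS SDK for JavaScript v2; the template name, role ARN, endpoint variable, and outputs/ prefix are placeholders to adapt):

```javascript
// Minimal sketch: start a MediaConvert job from a saved JobTemplate for each
// object in the S3 event. Template name, role ARN, endpoint and prefix are placeholders.
const AWS = require('aws-sdk');

const mediaConvert = new AWS.MediaConvert({
  endpoint: process.env.MEDIACONVERT_ENDPOINT, // account-specific MediaConvert endpoint
});

exports.handler = async (event) => {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));

    // Skip files written by MediaConvert itself so outputs don't trigger new jobs.
    if (key.startsWith('outputs/')) continue;

    await mediaConvert
      .createJob({
        JobTemplate: 'my-conversion-template',   // placeholder: the saved JobTemplate
        Role: process.env.MEDIACONVERT_ROLE_ARN, // placeholder: IAM role MediaConvert assumes
        Settings: {
          Inputs: [{ FileInput: `s3://${bucket}/${key}` }],
        },
      })
      .promise();
  }
};
```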
I'm a Node.js developer and new to AWS. I'm working on a task where I need to check whether a file has been uploaded to an S3 bucket in the last 90 days or not.
Usually, when a file is uploaded to S3, a Lambda is triggered and the file's data is stored in a cache.
But if it has not been uploaded, I need to trigger a Lambda function and load that file's data into the cache.
Is there any way to check whether a file has been uploaded to the S3 bucket using Node.js, so that I can trigger the Lambda?
Would a cron job be useful to check for the file upload, or is there a better approach?
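One common pattern is a second Lambda on a cron-style schedule (an EventBridge rule) that checks the object's LastModified timestamp and refreshes the cache only when the file is older than 90 days. A rough sketch, assuming the AWS SDK for JavaScript v2; the bucket, key, and cache-refresh helper are placeholders:

```javascript
// Rough sketch of a scheduled check: if the object has not been (re)uploaded in
// the last 90 days, load its data into the cache. Bucket/key are placeholders.
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

const NINETY_DAYS_MS = 90 * 24 * 60 * 60 * 1000;

async function refreshCacheFromS3(bucket, key) {
  // Placeholder: fetch the object and push its contents into your cache of choice.
  const obj = await s3.getObject({ Bucket: bucket, Key: key }).promise();
  // ... write obj.Body to the cache here ...
}

exports.handler = async () => {
  const head = await s3
    .headObject({ Bucket: 'my-bucket', Key: 'path/to/file.csv' })
    .promise();

  const ageMs = Date.now() - head.LastModified.getTime();

  if (ageMs > NINETY_DAYS_MS) {
    // No fresh upload in the last 90 days: load the existing file into the cache.
    await refreshCacheFromS3('my-bucket', 'path/to/file.csv');
  }
};
```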
I'll have hundreds or thousands of tiny PDF files that I'll need to zip into one big zip file and upload to S3. My current solution is as follows:
A Node.js service sends a request with JSON data describing all the PDF files I need to create and zip to a Lambda function.
The Lambda function processes the data, creates each PDF file as a buffer, pushes each buffer into a zip archiver, finalizes the archive, and finally streams the zip archive in chunks to S3 using a PassThrough stream.
I basically copied the solution below.
https://gist.github.com/amiantos/16bacc9ed742c91151fcf1a41012445e?permalink_comment_id=3804034#gistcomment-3804034
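For reference, that pattern boils down to something like the following (a condensed sketch, assuming the AWS SDK for JavaScript v2 and the archiver package; bucket, key, and file names are placeholders):

```javascript
// Condensed sketch of the linked pattern: zip in-memory buffers and stream the
// archive to S3 through a PassThrough stream.
const AWS = require('aws-sdk');
const archiver = require('archiver');
const { PassThrough } = require('stream');

const s3 = new AWS.S3();

async function zipBuffersToS3(pdfBuffers, bucket, key) {
  const passThrough = new PassThrough();

  // Start the managed upload, reading from the stream as the archive is written.
  const uploadPromise = s3
    .upload({ Bucket: bucket, Key: key, Body: passThrough, ContentType: 'application/zip' })
    .promise();

  const archive = archiver('zip');
  archive.pipe(passThrough);

  // Append each in-memory PDF buffer to the archive.
  pdfBuffers.forEach((buf, i) => archive.append(buf, { name: `file-${i}.pdf` }));

  await archive.finalize(); // flush the archive into the PassThrough stream
  return uploadPromise;     // resolves once S3 has received the whole object
}
```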
Although this is a working solution, it is not scalable: creating the PDF buffers, archiving the zip, and uploading to S3 all happen in a single Lambda execution, which takes 20-30 seconds or more depending on the size of the final zip file. I have configured the Lambda with 10 GB of memory and the maximum 15-minute timeout, because every 100 MB of zip needs roughly 1 GB of memory, otherwise it times out after exhausting its resources. My zip can sometimes be 800 MB or more, which means it will require 8 GB of memory or more.
I want to use S3 multipart upload and somehow invoke multiple Lambda functions in parallel to achieve this. It's fine if I have to split creating the PDF buffers, zipping, and uploading to S3 into separate Lambdas, but I need to optimize this somehow and make it run in parallel.
I see this post's answer with some nice details and an example, but it seems to be for a single large file.
Stream and zip to S3 from AWS Lambda Node.JS
https://gist.github.com/vsetka/6504d03bfedc91d4e4903f5229ab358c
Is there any way I can optimize this? Any ideas and suggestions would be great. Keep in mind that the end result needs to be one big zip file. Thanks.
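For anyone unfamiliar with it, the multipart upload mentioned above comes down to three S3 calls. An illustrative sketch, assuming the AWS SDK for JavaScript v2, with the caveat that every part except the last must be at least 5 MB; in a fan-out design, each uploadPart call could come from a separate Lambda as long as they all share the same UploadId:

```javascript
// Illustrative sketch of the raw S3 multipart-upload API: create, upload parts
// (here in parallel), then complete. Part buffers and names are placeholders.
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

async function multipartUpload(bucket, key, partBuffers) {
  const { UploadId } = await s3
    .createMultipartUpload({ Bucket: bucket, Key: key })
    .promise();

  // Every part except the last must be at least 5 MB.
  const parts = await Promise.all(
    partBuffers.map((body, i) =>
      s3
        .uploadPart({ Bucket: bucket, Key: key, UploadId, PartNumber: i + 1, Body: body })
        .promise()
        .then(({ ETag }) => ({ ETag, PartNumber: i + 1 }))
    )
  );

  // S3 assembles the parts into one object in PartNumber order.
  return s3
    .completeMultipartUpload({
      Bucket: bucket,
      Key: key,
      UploadId,
      MultipartUpload: { Parts: parts },
    })
    .promise();
}
```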
I have created a Python Lambda function which is executed as soon as a .zip file lands in a particular folder of an S3 bucket. Now there may be a situation where no file is uploaded to S3 within a certain time period (for example, by 10 AM). How can I get an alert when no file has arrived?
You can use CloudWatch alarms. You can set an alarm that fires when no events (e.g. Lambda invocations) are recorded for a metric over a given period.
It has only basic configuration options, but IMHO it's the simplest solution.
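For example, an alarm on the Lambda's Invocations metric with TreatMissingData set to breaching fires when the S3-triggered function has not run at all during the chosen window. A minimal sketch, assuming the AWS SDK for JavaScript v2; the alarm name, function name, SNS topic ARN, and one-hour window are placeholders:

```javascript
// Minimal sketch: alarm when the S3-triggered Lambda was not invoked in the last hour.
const AWS = require('aws-sdk');
const cloudwatch = new AWS.CloudWatch();

cloudwatch
  .putMetricAlarm({
    AlarmName: 'no-zip-arrived',
    Namespace: 'AWS/Lambda',
    MetricName: 'Invocations',           // invocation count of the S3-triggered Lambda
    Dimensions: [{ Name: 'FunctionName', Value: 'process-zip-lambda' }],
    Statistic: 'Sum',
    Period: 3600,                        // evaluate over one hour
    EvaluationPeriods: 1,
    Threshold: 1,
    ComparisonOperator: 'LessThanThreshold',
    TreatMissingData: 'breaching',       // no data points at all also raises the alarm
    AlarmActions: ['arn:aws:sns:eu-west-1:123456789012:file-alerts'], // placeholder topic
  })
  .promise()
  .then(() => console.log('alarm created'));
```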
I have a huge .csv file on my local machine. I want to load that data into DynamoDB (eu-west-1, Ireland). How would you do that?
My first approach was:
Iterate the CSV file locally
Send a row to AWS via a curl -X POST -d '<row>' .../connector/mydata
Process the previous call within a Lambda and write to DynamoDB
I do not like that solution because:
There are too many requests
If I send the data without the CSV header information, I have to hardcode the Lambda
If I send the data with the CSV header, there is too much traffic
I was also considering putting the file in an S3 bucket and processing it with a Lambda, but the file is huge and Lambda's memory and time limits scare me.
I am also considering doing the job on an EC2 machine, but then I either lose reactivity (if I turn the machine off while it is not in use) or lose money (if I never turn it off).
I was told that Kinesis may be a solution, but I am not convinced.
Please tell me what you think the best approach would be to get the huge CSV file into DynamoDB if you were me. I want to minimise the workload for a "second" upload.
I prefer using Node.js or R. Python may be acceptable as a last solution.
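As a point of comparison for the one-row-per-request concern above, DynamoDB's BatchWriteItem accepts up to 25 items per call, so the rows can at least be batched. A rough sketch, assuming the AWS SDK for JavaScript v2, a simple comma-separated file with a header row and no quoted fields, and a placeholder table name (unprocessed items would still need retry handling):

```javascript
// Rough sketch: read a simple CSV and write it to DynamoDB in batches of 25,
// the BatchWriteItem limit. File path and table name are placeholders.
const fs = require('fs');
const AWS = require('aws-sdk');

const dynamo = new AWS.DynamoDB.DocumentClient({ region: 'eu-west-1' });

async function loadCsv(path, tableName) {
  const [headerLine, ...lines] = fs.readFileSync(path, 'utf8').trim().split('\n');
  const headers = headerLine.split(',');

  // Turn each CSV row into an object keyed by the header names.
  const items = lines.map((line) => {
    const values = line.split(',');
    return Object.fromEntries(headers.map((h, i) => [h, values[i]]));
  });

  // BatchWriteItem accepts at most 25 items per request.
  for (let i = 0; i < items.length; i += 25) {
    await dynamo
      .batchWrite({
        RequestItems: {
          [tableName]: items.slice(i, i + 25).map((Item) => ({ PutRequest: { Item } })),
        },
      })
      .promise();
    // In production, check the response's UnprocessedItems and retry them.
  }
}
```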
If you want to do it the AWS way, then AWS Data Pipeline may be the best approach:
Here is a tutorial that does a bit more than you need, but should get you started:
The first part of this tutorial explains how to define an AWS Data Pipeline pipeline to retrieve data from a tab-delimited file in Amazon S3 to populate a DynamoDB table, use a Hive script to define the necessary data transformation steps, and automatically create an Amazon EMR cluster to perform the work.
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-importexport-ddb-part1.html
If all your data is in S3, you can use AWS Data Pipeline's predefined template to 'import DynamoDB data from S3'. It should be straightforward to configure.
I'm having a user upload a file directly to S3 storage. My issue is that while that file is uploading, I want to run a worker to check whether the upload is done and, if it is, do some processing.
If a file upload is in progress, is there a way I can find it separately from boto3 using its key and bucket? If I can find it, can I check whether the upload is done?
Please check S3Transfer.
It takes a callback parameter which gives you the upload/download progress.