Alert for no file uploaded to S3 within a particular time - python-3.x

I have created a Python Lambda function which gets executed as soon as a .zip file lands in a particular folder in an S3 bucket. Now there may be a situation where no file is uploaded to S3 within a certain time period (for example, by 10 AM). How can I get an alert to track that no file has arrived?

You may use CloudWatch alarms. You can set an alarm that fires when no data points (e.g. Lambda invocations) are recorded for a metric over a given period.
It has only basic options to configure, but IMHO it's the simplest solution.
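A minimal sketch of that idea with boto3, assuming a hypothetical function name and SNS topic ARN; with TreatMissingData set to "breaching", an hour with zero invocations raises the alert:

import boto3

cloudwatch = boto3.client("cloudwatch")

# Hypothetical names -- replace with your Lambda function name and SNS topic ARN.
FUNCTION_NAME = "process_zip_upload"
ALERT_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:no-file-alerts"

cloudwatch.put_metric_alarm(
    AlarmName="no-s3-file-processed",
    Namespace="AWS/Lambda",
    MetricName="Invocations",
    Dimensions=[{"Name": "FunctionName", "Value": FUNCTION_NAME}],
    Statistic="Sum",
    Period=3600,                     # evaluate one-hour windows
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator="LessThanThreshold",
    TreatMissingData="breaching",    # no invocations at all counts as an alarm
    AlarmActions=[ALERT_TOPIC_ARN],
)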

Related

How to check if a file is uploaded to S3 bucket in last X days using Node.JS

I'm a Node.js developer and new to AWS. I'm working on a task where I need to check whether a file has been uploaded to an S3 bucket in the last 90 days or not.
Usually, when a file is uploaded to S3, a Lambda is triggered and its data is stored in a cache.
But if nothing has been uploaded, I need to trigger a Lambda function and load that file's data into the cache.
Is there any way to check if a file has been uploaded to the S3 bucket using Node.js so that I could trigger the Lambda?
Would a cron job be useful to check for the file upload, or is there a better approach?
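One way to check the most recent upload time is to list the objects under the prefix and compare LastModified against a cutoff. A rough sketch in Python (the question asks about Node.js, where the same S3 calls exist); the bucket name and prefix are placeholder assumptions:

from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client("s3")

# Hypothetical bucket and prefix -- adjust to your layout.
BUCKET = "my-bucket"
PREFIX = "incoming/"

def uploaded_within_days(days):
    """Return True if any object under PREFIX was modified in the last `days` days."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
        for obj in page.get("Contents", []):
            if obj["LastModified"] >= cutoff:
                return True
    return False

if not uploaded_within_days(90):
    print("No upload in the last 90 days -- invoke the fallback Lambda here")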

Trigger S3 Object without upload

I have a feeling the answer to my question is just the right Google search term that I am missing, but here we go.
I need to trigger all objects in an S3 bucket without uploading them. The reason is that I have a Lambda that gets triggered on PutObject and I want to reprocess all those files again. These are huge images and re-uploading them does not sound like a good idea.
I am trying to do this in Node.js, but any language that anyone is comfortable with will help and I will translate.
Thanks
Amazon S3 Event can trigger an AWS Lambda function when an object is created/deleted/replicated.
However, it is not possible to "trigger the object" -- the object would need to be created/deleted/replicated to cause the Amazon S3 Event to be generated.
As an alternative, you could create a small program that lists the objects in the bucket, and then directly invokes the AWS Lambda function, passing the object details in the event message to make it look like it came from Amazon S3. There is a sample S3 Event in the Lambda 'test' function -- you could copy this template and have your program insert the appropriate bucket and object key. Your Lambda function would then process it exactly as if an S3 Event had triggered the function.
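A rough sketch of that approach in Python, assuming a hypothetical bucket and function name; the payload mimics the shape of an S3 ObjectCreated:Put notification so the existing handler can process it unchanged:

import json

import boto3

s3 = boto3.client("s3")
lambda_client = boto3.client("lambda")

# Hypothetical names -- replace with your own bucket and function.
BUCKET = "my-image-bucket"
FUNCTION_NAME = "process-image"

def fake_s3_event(bucket, key):
    """Build a minimal payload shaped like an S3 ObjectCreated:Put notification."""
    return {
        "Records": [
            {
                "eventSource": "aws:s3",
                "eventName": "ObjectCreated:Put",
                "s3": {
                    "bucket": {"name": bucket, "arn": "arn:aws:s3:::" + bucket},
                    "object": {"key": key},
                },
            }
        ]
    }

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET):
    for obj in page.get("Contents", []):
        lambda_client.invoke(
            FunctionName=FUNCTION_NAME,
            InvocationType="Event",  # asynchronous, like a real S3 trigger
            Payload=json.dumps(fake_s3_event(BUCKET, obj["Key"])).encode("utf-8"),
        )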
In addition to what is explained above, you can use AWS S3 Batch Operations.
We used this to encrypt existing objects in the S3 bucket which were not encrypted earlier.
It was the easiest out-of-the-box solution, available in the S3 console itself.
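For reference, a hedged sketch of creating such a Batch Operations job from code rather than the console, invoking a Lambda per object listed in a CSV manifest (the account ID, all ARNs and the manifest ETag are placeholder assumptions):

import boto3

s3control = boto3.client("s3control")

# Hypothetical values -- replace with your own account, ARNs and manifest.
ACCOUNT_ID = "123456789012"
LAMBDA_ARN = "arn:aws:lambda:us-east-1:123456789012:function:process-image"
BATCH_ROLE_ARN = "arn:aws:iam::123456789012:role/s3-batch-operations-role"
MANIFEST_ARN = "arn:aws:s3:::my-image-bucket/manifests/all-objects.csv"
MANIFEST_ETAG = "replace-with-the-manifest-objects-etag"

s3control.create_job(
    AccountId=ACCOUNT_ID,
    ConfirmationRequired=False,
    Operation={"LambdaInvoke": {"FunctionArn": LAMBDA_ARN}},
    Manifest={
        "Spec": {
            "Format": "S3BatchOperations_CSV_20180820",
            "Fields": ["Bucket", "Key"],
        },
        "Location": {"ObjectArn": MANIFEST_ARN, "ETag": MANIFEST_ETAG},
    },
    Report={"Enabled": False},
    Priority=10,
    RoleArn=BATCH_ROLE_ARN,
)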
You could also loop through all objects in the bucket and add a tag. Next, adjust your trigger event to include tag changes. Code sample in bash to follow after I test it.
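While the promised bash sample is not shown here, a rough Python sketch of the same tagging loop might look like this (bucket name and tag are assumptions):

import boto3

s3 = boto3.client("s3")

# Hypothetical bucket and tag -- adjust as needed.
BUCKET = "my-image-bucket"

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET):
    for obj in page.get("Contents", []):
        # Adding or changing a tag emits an s3:ObjectTagging:Put event,
        # which the existing Lambda trigger can also be subscribed to.
        s3.put_object_tagging(
            Bucket=BUCKET,
            Key=obj["Key"],
            Tagging={"TagSet": [{"Key": "reprocess", "Value": "true"}]},
        )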

Issues using Lambda and BOTO3 to copy files between buckets [duplicate]

I am currently exploring storing the attachments of an email separately from the .eml file itself. I have an SES rule set that delivers an inbound email to a bucket. When the bucket retrieves the email, an S3 Put Lambda function parses the raw email (MIME format), base64 decodes the attachment buffers, and does a putObject for each attachment and the original .eml file to a new bucket.
My problem is that this Lambda function does not trigger for emails with attachments exceeding ~3-4 MB. The email is received and stored in the initial bucket, but the function does not trigger when it is received. Also, the event does not appear in CloudWatch. However, the function works perfectly fine when manually testing it with a hardcoded S3 Put payload, and also when manually uploading a .eml file to the assigned bucket.
Do you have any idea why there is this limitation? Perhaps this is a permissions issue with the bucket, or maybe an issue with the assigned Lambda role? From manual testing I’ve found this is by no means a timeout or max-memory issue.
The larger files are almost certainly being uploaded via S3 Multipart Upload instead of a regular Put operation. You need to configure your Lambda subscription to also be notified of Multipart uploads. It sounds like the function is only subscribed to s3:ObjectCreated:Put events currently, and you need to add s3:ObjectCreated:CompleteMultipartUpload to the configuration.
I faced the same issue. If the ETag of the file you uploaded to S3 ends with a hyphen followed by a number, it denotes that the file was uploaded using a multipart upload. Subscribing to the CompleteMultipartUpload event resolved the issue.
I was getting the same issue. Despite having s3:ObjectCreated:CompleteMultipartUpload as an event notification, the trigger failed.
I later realized that the issue was with the Lambda's timeout period. This could also be a potential cause.
As per the AWS docs, to listen to all object-created events you can subscribe to s3:ObjectCreated:*.
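A hedged sketch of updating the bucket's notification configuration with boto3 so the function receives every object-created event, including multipart completions (bucket name and Lambda ARN are placeholders):

import boto3

s3 = boto3.client("s3")

# Hypothetical values -- replace with your bucket and function ARN.
BUCKET = "inbound-email-bucket"
LAMBDA_ARN = "arn:aws:lambda:us-east-1:123456789012:function:parse-email"

# Note: this call replaces the bucket's existing notification configuration,
# so include any other notifications you already rely on.
s3.put_bucket_notification_configuration(
    Bucket=BUCKET,
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": LAMBDA_ARN,
                # s3:ObjectCreated:* covers Put, Post, Copy and CompleteMultipartUpload.
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    },
)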

Is it possible to convert multiple files in MediaConvert AWS service?

I have got a few files in an S3 bucket and all of them need to be converted (3 output files per 1 input file).
The conversion rules are the same for all files.
Is it possible to do this? How can it be implemented with the Node AWS SDK?
Do I need any extra service for it?
You can create a MediaConvert JobTemplate.
After this you can start one MediaConvert job for each file in S3.
If you want to start this every time a file is added, for instance, your safest bet is to create a Lambda that gets triggered when a new file is added to the S3 bucket and then starts a new MediaConvert job using the saved JobTemplate.
Make sure you don't start a job for the outputs of the MediaConvert job, though.
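A rough Python sketch of such a Lambda starting a job from a saved template (the question asks about the Node SDK, which has equivalent calls; the template name and role ARN are assumptions):

import boto3

# MediaConvert uses an account-specific endpoint, discovered here once at cold start.
endpoint = boto3.client("mediaconvert").describe_endpoints()["Endpoints"][0]["Url"]
mediaconvert = boto3.client("mediaconvert", endpoint_url=endpoint)

# Hypothetical names -- replace with your own template and IAM role.
JOB_TEMPLATE = "my-three-output-template"
MEDIACONVERT_ROLE_ARN = "arn:aws:iam::123456789012:role/MediaConvertRole"

def handler(event, context):
    """Start a templated MediaConvert job for the S3 object that triggered the event."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]
    mediaconvert.create_job(
        JobTemplate=JOB_TEMPLATE,
        Role=MEDIACONVERT_ROLE_ARN,
        Settings={"Inputs": [{"FileInput": "s3://{}/{}".format(bucket, key)}]},
    )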

Spark - how to keep data integrity when writing files to appended folder

In my organization we have an application that gets events and stores them on S3, partitioned by day. Some of the events are offline, which means that while writing we append the files to the proper folder (according to the date of the offline event).
We get the events by reading folder paths from a queue (SQS) and then reading the data from the folders we got. Each folder will contain data from several different event dates.
The problem is that if the application fails for some reason after one of the stages has completed, I have no idea what was already written to the output folder, and I can't delete it all because there is already other data there.
Our current solution is writing to HDFS; after the application finishes, a script copies the files to S3 (using s3-dist-cp).
But that doesn't seem very elegant.
My current approach is to write my own FileOutputCommitter that adds an applicationId prefix to all written files, so in case of error I know what to delete.
So what I'm actually asking is: is there an existing solution to this within Spark, and if not, what do you think about my approach?
--edit--
After chatting with @Yuval Itzchakov I decided to have the application write to a temporary path and add that path to an AWS SQS queue. An independent process is triggered every x minutes, reads folders from SQS and copies them with s3-dist-cp from the temporary path to the final destination. In the application I wrapped the main method with a try-catch; if I catch an exception I delete the temp folder.
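A minimal sketch of that wrapper in PySpark (queue URL, bucket layout and partition column are assumptions); the copy from the temporary path to the final destination is left to the separate s3-dist-cp process described above:

import json
import uuid

import boto3
from pyspark.sql import SparkSession

# Hypothetical locations -- adjust to your layout.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/pending-copies"
TEMP_ROOT = "s3://my-events-bucket/tmp"

spark = SparkSession.builder.appName("offline-events").getOrCreate()
sqs = boto3.client("sqs")
temp_path = "{}/{}-{}".format(TEMP_ROOT, spark.sparkContext.applicationId, uuid.uuid4())

def delete_temp_folder(path):
    """Best-effort cleanup of the temporary output prefix."""
    bucket, _, prefix = path.replace("s3://", "").partition("/")
    boto3.resource("s3").Bucket(bucket).objects.filter(Prefix=prefix).delete()

def main():
    df = spark.read.json("s3://my-events-bucket/incoming/")  # placeholder source
    df.write.partitionBy("event_date").parquet(temp_path)
    # Only a fully successful write gets queued for the s3-dist-cp copy step.
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps({"path": temp_path}))

try:
    main()
except Exception:
    # Nothing was queued, so the partial output is safe to remove.
    delete_temp_folder(temp_path)
    raise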
