get amazon S3 to send http request upon file upload - node.js

I need my Node.js application to receive an HTTP request with a file name whenever a file is uploaded to my S3 bucket.
I would like some recommendations on the simplest, most straightforward way to achieve this.
So far I see 3 ways to do this, but I feel I'm overthinking it, and there surely exist better options:
1/ file uploaded to S3 -> S3 sends a notification to SNS -> SNS sends an HTTP request to my application
2/ file uploaded to S3 -> a Lambda function is triggered and sends an HTTP request to my application
3/ make my application watch the bucket on a regular basis and do something when a file is uploaded
Thanks
P.S. yes, I'm really new to Amazon services :)

SNS: Will work OK, but you'll have to manage the SNS topic subscription. You also won't have any control over the format of the HTTP POST.
Lambda: This is what I would go with. It gives you the most control (a minimal handler sketch is below).
Polling the bucket: how would you efficiently check for new objects? This isn't a good solution.
You could also have S3 post the new-object events to SQS, and configure your application to poll the SQS queue instead of listening for an HTTP request.
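A minimal sketch of the Lambda route, assuming the bucket's ObjectCreated notification is wired to the function and that your application exposes a (hypothetical) /s3-uploads endpoint; the host and path are placeholders to adjust:

    // Sketch of a Lambda handler for S3 ObjectCreated events: it forwards each
    // uploaded object's key to an application endpoint over HTTPS.
    // APP_HOST and the /s3-uploads path are placeholders for your own service.
    const https = require('https');

    exports.handler = async (event) => {
      for (const record of event.Records || []) {
        const bucket = record.s3.bucket.name;
        // S3 URL-encodes object keys in event payloads (spaces become '+').
        const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));
        const body = JSON.stringify({ bucket, key });

        await new Promise((resolve, reject) => {
          const req = https.request(
            {
              hostname: process.env.APP_HOST, // e.g. api.example.com (placeholder)
              path: '/s3-uploads',            // hypothetical endpoint in your app
              method: 'POST',
              headers: { 'Content-Type': 'application/json' },
            },
            (res) => { res.resume(); res.on('end', resolve); }
          );
          req.on('error', reject);
          req.end(body);
        });
      }
      return { statusCode: 200 };
    };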

SNS - If you want to call multiple services when S3 is updated, then I would suggest SNS. You create a topic for SNS, and that topic can have multiple subscribers. Later, if you want to add more HTTP endpoints, it is as simple as subscribing them to the topic.
Lambda - If you need to send a notification to only one HTTP endpoint, then I would strongly recommend this.
SQS - You don't need SQS in this scenario. SQS is mainly for decoupling components and is the best fit for microservices, but you can use it with other messaging systems as well.
You don't need to build something of your own to regularly monitor the bucket for changes, since services like Lambda and SNS already handle that.
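Wiring the trigger itself is a one-time bucket-notification configuration, usually done in the console but also scriptable. A sketch with the AWS SDK for JavaScript v3, where the bucket name and function ARN are placeholders:

    // One-time setup: tell the bucket to invoke the Lambda on every new object.
    // (The Lambda also needs a resource policy allowing S3 to invoke it; the
    // console adds that automatically, or use `aws lambda add-permission`.)
    const { S3Client, PutBucketNotificationConfigurationCommand } = require('@aws-sdk/client-s3');

    const s3 = new S3Client({ region: 'us-east-1' });

    async function wireUploadTrigger() {
      await s3.send(new PutBucketNotificationConfigurationCommand({
        Bucket: 'my-upload-bucket', // placeholder
        NotificationConfiguration: {
          LambdaFunctionConfigurations: [{
            // placeholder ARN of the forwarding function
            LambdaFunctionArn: 'arn:aws:lambda:us-east-1:123456789012:function:notify-my-app',
            Events: ['s3:ObjectCreated:*'],
          }],
        },
      }));
    }

    wireUploadTrigger().catch(console.error);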

Related

nodejs and sqs: image upload to s3 with post-processing

I have a Node.js server in Express where users can sign up and add an avatar. This avatar is, after uploading to the server, cropped to 200x200 px along with some other image alterations done with sharp. I want to optimize this process by adding a worker thread connected to Amazon SQS. My strategy (please correct me if I'm wrong):
Upload the file (raw) to an S3 folder without uploading it first to my Node.js server (so user --> S3 instead of user --> node --> S3).
Add a message to an SQS queue from Node.js with a payload saying that a new avatar was uploaded to S3 (with the URL of the avatar and the ID of the user in the JSON payload of the message).
Start a worker thread that listens to the queue for new messages. When it receives a message, the worker thread does the modifications to the file (cropping it, ...) and uploads it back to S3.
I have a few questions about this strategy:
How do I add a worker thread to my Node.js server? I have a DigitalOcean droplet with 2 CPUs, and I'm using PM2 to spawn my Node.js server on both CPUs. How do I add a worker thread to this setup? Or should I add a second server for the worker?
Can I do database manipulations in a worker thread?
Thanks in advance!
You can check out BullMQ for these kinds of requirements, where you need to post-process logic separately from your client logic.
To put it simply, you need a message-queue-worker system to post-process jobs. This can be achieved with many queue systems, like RabbitMQ, or the one I mentioned earlier (which I personally prefer for Node.js).
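A minimal sketch of what the producer and worker could look like with BullMQ, assuming a local Redis instance; the queue name, job name, and the resizeAvatar helper are placeholders standing in for the real sharp/S3 work:

    // producer.js - enqueue a job once the raw avatar is on S3
    const { Queue } = require('bullmq');

    const avatarQueue = new Queue('avatars', { connection: { host: '127.0.0.1', port: 6379 } });

    async function enqueueAvatarJob(userId, s3Key) {
      await avatarQueue.add('resize', { userId, s3Key });
    }

    // worker.js - run this as its own PM2 process (or on a second server)
    const { Worker } = require('bullmq');

    // Placeholder for the real logic: download from S3, crop with sharp,
    // upload back, update the database.
    async function resizeAvatar(userId, s3Key) {
      console.log(`would crop ${s3Key} to 200x200 for user ${userId}`);
    }

    const worker = new Worker('avatars', async (job) => {
      const { userId, s3Key } = job.data;
      await resizeAvatar(userId, s3Key);
    }, { connection: { host: '127.0.0.1', port: 6379 } });

    worker.on('failed', (job, err) => console.error(`Job ${job.id} failed:`, err));

Running the worker as its own PM2 app keeps it off the web server's event loop, and since it is an ordinary Node.js process it can talk to the database just like the Express server does.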

Make AWS Lambda send back a file after several minutes of processing

I have a Node app that takes a URL, scrapes some text with Puppeteer, and translates it using DeepL before sending me back the result in a txt file. It works as expected locally, but since there are a lot of URLs to visit and I want to learn, I'm trying to make this app work with AWS Lambda and a Docker image.
I was thinking about using a GET/POST request to send the URL to API Gateway to trigger my Lambda and then wait for it to send me back the txt file. The issue is that the whole process takes 2-3 minutes to complete and send back the file. That is not a problem locally, but I know you should not have an HTTP request wait 3 minutes before returning.
I don't really know how to tackle this problem. Should I create a local server and make the Lambda post a request to my IP address once it is done?
I'm at a loss here.
Thanks in advance!
There are a few alternatives for what is essentially an asynchronous processing concern.
Poke the Lambda with the data it needs (via an API, the SDK, or the CLI), then have it write its results to an S3 bucket. You could poll the S3 bucket for the results asynchronously and pull them down; obviously this requires some scripting (see the sketch below).
Another approach would be to have the Lambda post the results to an SNS topic that you've subscribed to.
That said, I'm not entirely sure what is meant by local IP, but I would avoid pushing data directly to a self-managed server (or your local IP); rather, I would use one of the AWS "decoupling" services like SNS, SQS, or even S3 to split apart the processing steps. This way it's possible to make many requests and pull down the data as needed.
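A rough sketch of the first option with the AWS SDK for JavaScript v3, assuming the Lambda receives a URL and a job ID in its event (e.g. via an asynchronous Invoke from the SDK or CLI); the bucket name, key scheme, and the scrapeAndTranslate helper are assumptions standing in for the existing Puppeteer/DeepL code:

    const { S3Client, PutObjectCommand, GetObjectCommand } = require('@aws-sdk/client-s3');

    const s3 = new S3Client({ region: 'us-east-1' });
    const BUCKET = 'my-translation-results'; // placeholder bucket

    // Hypothetical stand-in for the Puppeteer + DeepL pipeline.
    async function scrapeAndTranslate(url) {
      return `translated text for ${url}`;
    }

    // Lambda side: do the slow work, then drop the result into S3 instead of
    // holding an HTTP response open for minutes.
    exports.handler = async (event) => {
      const { url, jobId } = event; // payload of an async Invoke
      const text = await scrapeAndTranslate(url);
      await s3.send(new PutObjectCommand({
        Bucket: BUCKET,
        Key: `results/${jobId}.txt`,
        Body: text,
        ContentType: 'text/plain',
      }));
      return { jobId };
    };

    // Client side (separate script): poll until the result file appears.
    async function waitForResult(jobId) {
      for (;;) {
        try {
          const res = await s3.send(new GetObjectCommand({ Bucket: BUCKET, Key: `results/${jobId}.txt` }));
          return await res.Body.transformToString();
        } catch (err) {
          if (err.name !== 'NoSuchKey') throw err;
          await new Promise((r) => setTimeout(r, 10000)); // not there yet, retry in 10s
        }
      }
    }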

How can we get data related to an existing Bluemix Message Hub topic through node.js?

I have created a topic using Node.js and mapped it to Message Hub using MessageHub.prototype.topics.create(topic). I want to point Node.js at an existing Message Hub topic and consume data from it. Is there any function for this?
@rajeswari
I guess you're using the Node.js module that wraps the Message Hub REST API.
Feel free to have a look at an example using a native Node.js client for Kafka:
https://github.com/ibm-messaging/message-hub-samples/tree/master/kafka-nodejs-console-sample
@rajeswari the topics.create(topicName) call simply requests that the topic be created in Message Hub; when its Promise resolves, the JSON response of that request is available.
If you want to retrieve messages from an existing topic, you can just skip that step and proceed directly to creating a ConsumerInstance via MessageHub.prototype.consume, then call MessageHub.ConsumerInstance.prototype.get(topicName) on the returned ConsumerInstance.
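Hedged since I don't have the message-hub-rest docs to hand, but based on the calls named above, consuming from an existing topic might look roughly like this; the consumer group, instance name, offset option, and topic name are assumptions:

    // Rough sketch using the message-hub-rest module's consume()/get() calls.
    // Exact option names may differ from the module's documentation.
    const MessageHub = require('message-hub-rest');

    const services = JSON.parse(process.env.VCAP_SERVICES || '{}');
    const instance = new MessageHub(services);

    instance.consume('my_consumer_group', 'my_consumer_instance', { 'auto.offset.reset': 'largest' })
      .then((response) => {
        const consumerInstance = response[0];
        // No topics.create() needed: just read from the topic that already exists.
        return consumerInstance.get('existing_topic_name');
      })
      .then((messages) => {
        console.log('Received:', messages);
      })
      .catch(console.error);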

Use case of AWS lambda to nodejs project on Elastic Beanstalk

I have a thumbnailing function running on Lambda, and I want to deploy it on Elastic Beanstalk. Lambda did a lot of background work for me, so when I deploy my function to Elastic Beanstalk it doesn't work properly the way I expect.
My Lambda function thumbnails all images in a given folder of a given S3 bucket and stores several differently sized images in the same location when it's triggered. However, when I deployed it to Beanstalk, it is no longer triggered by any S3 events.
I know the rough steps to fix this, but I need to know a few specific things:
Before creating a Lambda function, we need to configure event resources.
I want to know if I can somehow pass them in Beanstalk. I'm thinking about passing a JSON object into my Node.js function, but I don't know exactly how.
I don't know if I should put my function in an infinite loop to monitor event notifications from S3.
I want to combine this Node.js function with another independent Node.js service using Express, and I want to display a summary message in the browser about how many images have been thumbnailed. But currently, with the Lambda package structure, I'm exporting a function handler to other JS files. How can I expose internal data to a static hjs/jade page?
How can I get the notifications from S3?
In brief, if it isn't worth adding such complexity to deploy the Lambda function to Beanstalk, should I just leave it as a Lambda function?
Regarding Elastic Beanstalk vs. AWS Lambda, I think Lambda is going to be more scalable, as well as cheaper, for this sort of task. And I think saving status information to a DynamoDB table would be a quick and easy way to make statistics available that you can display in your web application, while preventing those statistics from disappearing if you redeploy or restart your application. Saving that data in DynamoDB would also allow you to have more than one EC2 instance serving your website in Elastic Beanstalk without having to worry about synchronizing that data across servers somehow.
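For the statistics idea, a minimal sketch of keeping a running thumbnail count in DynamoDB with the AWS SDK for JavaScript v3; the table name and key attribute are assumptions:

    // Increment a counter each time an image is thumbnailed, and read it back
    // for display in the web app. Assumes a table 'thumbnail-stats' with a
    // string partition key 'statName'.
    const { DynamoDBClient } = require('@aws-sdk/client-dynamodb');
    const { DynamoDBDocumentClient, UpdateCommand, GetCommand } = require('@aws-sdk/lib-dynamodb');

    const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({ region: 'us-east-1' }));
    const TABLE = 'thumbnail-stats'; // placeholder

    async function recordThumbnail() {
      await ddb.send(new UpdateCommand({
        TableName: TABLE,
        Key: { statName: 'imagesThumbnailed' },
        UpdateExpression: 'ADD #count :one', // atomic increment
        ExpressionAttributeNames: { '#count': 'count' },
        ExpressionAttributeValues: { ':one': 1 },
      }));
    }

    async function getThumbnailCount() {
      const { Item } = await ddb.send(new GetCommand({
        TableName: TABLE,
        Key: { statName: 'imagesThumbnailed' },
      }));
      return Item ? Item.count : 0;
    }

    module.exports = { recordThumbnail, getThumbnailCount };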
Regarding sending S3 notifications to Elastic Beanstalk, you would need to do the following (a sketch of the receiving endpoint follows the list):
Configure S3 to send notifications to an SNS topic
Configure the SNS topic to send notifications to an HTTP endpoint
Configure an HTTP endpoint in your Beanstalk application to receive and process those notifications
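A minimal sketch of step 3: an Express route in the Beanstalk app that confirms the SNS subscription and then handles the forwarded S3 events; the /sns/s3-events path is a placeholder:

    // SNS posts JSON with a text/plain content type, so read the raw body
    // as text and parse it ourselves.
    const express = require('express');
    const https = require('https');

    const app = express();
    app.use(express.text({ type: '*/*' }));

    app.post('/sns/s3-events', (req, res) => {
      const message = JSON.parse(req.body);

      if (message.Type === 'SubscriptionConfirmation') {
        // Visit the SubscribeURL once to confirm the subscription.
        https.get(message.SubscribeURL);
      } else if (message.Type === 'Notification') {
        // The Message field contains the original S3 event as a JSON string.
        const s3Event = JSON.parse(message.Message);
        for (const record of s3Event.Records || []) {
          console.log('Image uploaded:', record.s3.bucket.name, record.s3.object.key);
          // ...kick off thumbnailing and update the stats here...
        }
      }

      res.sendStatus(200);
    });

    app.listen(process.env.PORT || 8080);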

Can you post to multiple EC2 instances behind an ELB to trigger events?

I use a continuous-deployment style with my node.js application where I:
Update the application code in a specific branch on GitHub
Use a GitHub webhook to post to a URL defined on my server
The server then evaluates whether the 'push' event was on the branch its codebase is mapped to and updates its codebase if so (using naught, which spins up the new version before shutting down the old one).
With a single server this is a breeze, but if I have two servers behind an ELB, is there a way to post to both of them and trigger them to ensure they check their application code against the latest push? I would expect only one instance to receive the post under normal conditions which means other instances would have old application code.
Probably lots of ways to do this, but one option is to enlist the help of AWS SNS. Have each of your machines subscribe to an SNS topic using the HTTP/POST action as a webhook.
You would need to get GitHub to post the message directly to the SNS topic (if possible), or use one of your machines to post that initial message to SNS in response to the GitHub webhook (a sketch of that relay is below).
Once the message gets to SNS it will be fanned out to all the subscribers (in your case two), and it would be easy to add more machines in the future.
With just a handful of machines this would only cost pennies.
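If GitHub can't post to SNS directly, a sketch of the relay from one machine's webhook handler using the AWS SDK for JavaScript v3; the topic ARN and route path are placeholders:

    // Relay the GitHub push webhook to an SNS topic so every instance behind
    // the ELB (each subscribed over HTTP) receives it and can check the branch.
    const express = require('express');
    const { SNSClient, PublishCommand } = require('@aws-sdk/client-sns');

    const sns = new SNSClient({ region: 'us-east-1' });
    const TOPIC_ARN = process.env.DEPLOY_TOPIC_ARN; // placeholder

    const app = express();
    app.use(express.json());

    app.post('/github-webhook', async (req, res) => {
      // Forward the ref so each subscriber can decide whether the push was on
      // the branch its codebase is mapped to.
      await sns.send(new PublishCommand({
        TopicArn: TOPIC_ARN,
        Message: JSON.stringify({
          ref: req.body.ref,
          repository: req.body.repository && req.body.repository.full_name,
        }),
      }));
      res.sendStatus(202);
    });

    app.listen(process.env.PORT || 3000);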
