nodejs and sqs: image upload to s3 with post-processing

I have a nodejs server in express where users can sign up and add an avatar. After uploading to the server, this avatar is cropped to 200x200 px and run through some other image alterations with sharp. I want to optimize this process by adding a worker thread connected to Amazon SQS. My strategy (please correct me if I'm wrong):
1. Upload the raw file to an S3 folder without uploading it to my nodejs server first (so user --> S3 instead of user --> node --> S3).
2. Add a message to an SQS queue from nodejs saying that a new avatar has been uploaded to S3 (with the URL of the avatar and the ID of the user in the JSON payload of the message).
3. Start a worker thread that listens to the queue for new messages. When it receives one, the worker thread does the modifications to the file (cropping it, ...) and uploads it back to S3. A rough sketch of what I have in mind for this step is below.
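Something like this is what I'm picturing for the worker, assuming the AWS SDK v3 clients (@aws-sdk/client-sqs, @aws-sdk/client-s3) and sharp; the queue URL, bucket, and message shape are placeholders:

```js
const { SQSClient, ReceiveMessageCommand, DeleteMessageCommand } = require('@aws-sdk/client-sqs');
const { S3Client, GetObjectCommand, PutObjectCommand } = require('@aws-sdk/client-s3');
const sharp = require('sharp');

const sqs = new SQSClient({});
const s3 = new S3Client({});
const QUEUE_URL = process.env.AVATAR_QUEUE_URL; // placeholder
const BUCKET = process.env.AVATAR_BUCKET;       // placeholder

async function poll() {
  const { Messages = [] } = await sqs.send(new ReceiveMessageCommand({
    QueueUrl: QUEUE_URL,
    MaxNumberOfMessages: 1,
    WaitTimeSeconds: 20, // long polling
  }));

  for (const msg of Messages) {
    const { key, userId } = JSON.parse(msg.Body); // payload shape is up to you

    // Download the raw avatar, crop it to 200x200, upload the result.
    const raw = await s3.send(new GetObjectCommand({ Bucket: BUCKET, Key: key }));
    const cropped = await sharp(Buffer.from(await raw.Body.transformToByteArray()))
      .resize(200, 200) // default "cover" fit crops to exactly 200x200
      .toBuffer();
    await s3.send(new PutObjectCommand({
      Bucket: BUCKET,
      Key: `avatars/${userId}.jpg`, // placeholder destination key
      Body: cropped,
    }));

    // Delete the message only after processing succeeded.
    await sqs.send(new DeleteMessageCommand({
      QueueUrl: QUEUE_URL,
      ReceiptHandle: msg.ReceiptHandle,
    }));
  }
  setImmediate(poll);
}

poll().catch(console.error);
```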
I have a few questions about this strategy:
How do I add a worker thread to my nodejs server? I have a DigitalOcean droplet with 2 CPUs, and I'm using PM2 to run my nodejs server on both CPUs. How do I add a worker thread to this setup? Or should I add a second server that runs the worker?
Can I do database manipulations in a worker thread?
Thanks in advance!

You can check out BullMQ for these kinds of requirements, where you need to post-process work separately from your client-facing logic.
To put it simply, you need a message-queue-worker system to post-process jobs. This can be achieved with many of the available queue systems, like RabbitMQ, or the one I mentioned earlier (which I personally prefer for NodeJS).
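A minimal BullMQ sketch (this assumes a Redis instance is available; the 'avatars' queue name and job payload are illustrative):

```js
const { Queue, Worker } = require('bullmq');

const connection = { host: '127.0.0.1', port: 6379 };

// Producer side, e.g. inside your express upload handler:
const avatarQueue = new Queue('avatars', { connection });

async function enqueueAvatarJob(userId, s3Key) {
  await avatarQueue.add('crop', { userId, s3Key });
}

// Consumer side, ideally a separate process (its own PM2 app):
const worker = new Worker('avatars', async (job) => {
  const { userId, s3Key } = job.data;
  // download from S3, crop with sharp, upload back, update the DB...
}, { connection, concurrency: 2 });

worker.on('failed', (job, err) => console.error(job.id, err));
```

Running the worker as its own PM2 app would also let the web server and the worker scale independently.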

Related

How can I implement a Queue in an express App

I'm working with an express app which is deployed on an EC2 instance.
This app gets requests from an AWS Lambda with some data to handle a web scraping service (it is deployed on EC2 because deploying it on AWS Lambda is difficult).
The problem is that I need to implement a queue service in the express app to avoid opening more than X browsers at the same time, depending on the instance size.
How can I implement a queue service that waits for a web scraping request to finish before launching another one?
Time is not a problem, because this is a scheduled task that executes in the early morning.
A simple in-memory queue would not be enough, as you would not want requests to be lost in case of a crash.
If you are OK with losing requests on an app crash, or think the probability of one is very low, then a node module like the one below could be handy.
https://github.com/mcollina/fastq
If reliability is important, then Amazon SQS should be good to go.
https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/sqs-examples-send-receive-messages.html
Queue the work in the request handler, and have a simple timeout-based handler that listens to the queue and performs the tasks.
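For the in-memory option, here is a rough sketch with fastq capping scraping at one browser at a time (the scrape function body and request shape are illustrative):

```js
const express = require('express');
const fastq = require('fastq');

async function scrape(job) {
  // launch the browser, scrape job.url, close the browser...
}

// Concurrency of 1: only one scrape runs at a time, the rest wait in the queue.
const queue = fastq.promise(scrape, 1);

const app = express();
app.use(express.json());

app.post('/scrape', (req, res) => {
  // Enqueue and acknowledge immediately; fastq starts the job
  // once the previous scrape has finished.
  queue.push({ url: req.body.url }).catch(console.error);
  res.sendStatus(202);
});

app.listen(3000);
```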

Nodejs upload and process video

I'm looking for the best approach for my application.
I have video upload functionality. The front-end sends an upload/video request with the video file attached, then my back-end handles this request: it reduces the size and quality of the video (using fluent-ffmpeg), then creates a thumbnail image based on the first frame of the video, then uploads the video and its thumbnail image to the AWS S3 bucket, and finally returns the compressed video and thumbnail to the front-end.
The problem I have is that all of those (back-end) tasks for compressing, creating the thumbnail, and uploading are very time consuming, and sometimes (depending on the video size and duration) my nginx server will return a 504 Gateway Time-out, which is expected. The question is:
How do I handle this case? Should I use web sockets to notify the front-end about the progress of processing the video, or do I even need to wait until all of those actions are completed?
My goal is to have functionality where I can upload a video, show some progress bar for the video processing, and let the user "play" with the application without being required to wait until the video is processed successfully.
Seems like this is an architectural problem. Here is one solution that I prefer.
Use a queue and store progress in some key-value db. You may be unfamiliar with queues, so I would recommend checking some queue-related tutorials. As you are using Amazon, SQS might be interesting to you. In Rails you can check sidekiq. Laravel has Laravel Horizon.
While each job is in progress, design the app so it can report its progress, like 50%, 60%, etc.
Process the thumbnails etc. in the queue too.
And if you want to scale, you can simply increase the number of queue workers. I think that's how others handle it too.
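A rough sketch of the progress-reporting part, assuming fluent-ffmpeg (which the question already uses) and Redis as the key-value db; key names and the output size are made up:

```js
const ffmpeg = require('fluent-ffmpeg');
const { createClient } = require('redis');

async function transcode(videoId, inputPath, outputPath) {
  const redis = createClient();
  await redis.connect();

  await new Promise((resolve, reject) => {
    ffmpeg(inputPath)
      .size('?x720') // reduce size/quality; pick whatever fits your needs
      .on('progress', (p) => {
        // p.percent is fluent-ffmpeg's estimate of completion
        redis.set(`video:${videoId}:progress`, String(Math.round(p.percent || 0)))
          .catch(() => {});
      })
      .on('end', resolve)
      .on('error', reject)
      .save(outputPath);
  });

  await redis.set(`video:${videoId}:progress`, '100');
  await redis.quit();
}
```

The front-end can then poll an endpoint that reads `video:<id>:progress` and render the progress bar from that.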

How can my node.js server on AWS handle multiple users generating images when node calls a Python program?

I have containerized node.js code that runs on ECS. When multiple users use node.js to call a .py image generating program, only 1 user gets the image; the rest get errors. I wonder if it is appropriate to use Lambda so that the image generation can run in parallel.
For some reason, the containerized code, which uses docker, works locally, but not on AWS when multiple users access the .py function.
If users send images from a mobile application, you can use aws-sdk to upload images from the mobile device to AWS S3, and configure a Lambda trigger for the image upload.
This Lambda will process the image data and return you the result.
Since Lambda comes from the serverless world, it can handle a really big amount of invocations.
So, if you/your team can add aws-sdk to the mobile application, it's a nice approach to upload images directly from the device to S3, trigger a Lambda for the image processing, and change the user's data in some storage.
If you have an untrusted environment, like a user's browser, it's not OK to upload images directly from the browser to S3, since to achieve this goal you would have to expose AWS access keys.
So, in that case, it's OK to upload the image to the server and transfer it from the server to S3.
After this, the logic stays the same: trigger AWS Lambda, process the data, and update it in storage.
This behaviour will reduce the load on the server and allow you to work on features, instead of working on image storing and other stuff that would just bother your server.
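A minimal sketch of such a Lambda, triggered by S3 ObjectCreated events (the processing step itself is left as a comment):

```js
const { S3Client, GetObjectCommand } = require('@aws-sdk/client-s3');
const s3 = new S3Client({});

exports.handler = async (event) => {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    // S3 URL-encodes object keys in event notifications.
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));

    const obj = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
    const image = Buffer.from(await obj.Body.transformToByteArray());
    // ...process the image here, then update the user's data in your storage
  }
};
```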

get amazon S3 to send http request upon file upload

I need my nodejs application to receive an http request with a file name when a file is uploaded to my S3 bucket.
I would like some recommendations on the most simple/straightforward way to achieve this.
So far I see 3 ways to do this, but I feel I'm overthinking it, and surely better options exist:
1/ file uploaded on S3 -> S3 sends a notification to SNS -> SNS sends an http request to my application
2/ file uploaded on S3 -> a lambda function is triggered and sends an http request to my application
3/ make my application watch the bucket on a regular basis and do something when a file is uploaded
thanks
ps. yes, I'm really new to amazon services :)
SNS: Will work OK, but you'll have to manage the SNS topic subscription. You also won't have any control over the format of the HTTP POST.
Lambda: This is what I would go with. It gives you the most control.
Polling: How would you efficiently check for new objects, exactly? This isn't a good solution.
You could also have S3 post the new-object events to SQS and configure your application to poll the SQS queue instead of listening for an HTTP request.
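A rough sketch of that polling approach, assuming the bucket's event notifications already target the queue (the queue URL is a placeholder):

```js
const { SQSClient, ReceiveMessageCommand, DeleteMessageCommand } = require('@aws-sdk/client-sqs');
const sqs = new SQSClient({});
const QUEUE_URL = process.env.S3_EVENTS_QUEUE_URL; // placeholder

async function pollOnce() {
  const { Messages = [] } = await sqs.send(new ReceiveMessageCommand({
    QueueUrl: QUEUE_URL,
    WaitTimeSeconds: 20, // long polling
  }));

  for (const msg of Messages) {
    const body = JSON.parse(msg.Body);
    // S3 sends a test event without Records when the notification is set up.
    for (const record of body.Records || []) {
      console.log('new file:', record.s3.object.key);
    }
    await sqs.send(new DeleteMessageCommand({
      QueueUrl: QUEUE_URL,
      ReceiptHandle: msg.ReceiptHandle,
    }));
  }
}
```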
SNS - If you want to call multiple services when S3 is updated, then I would suggest SNS. You can create a topic for SNS, and that topic can have multiple subscribers. Later, if you want to add more HTTP endpoints, it would be as simple as subscribing them to the topic.
Lambda - If you need to send the notification to only one HTTP endpoint, then I would strongly recommend this.
SQS - You don't need SQS in this scenario. SQS is mainly for decoupling components and would be the best fit for microservices, but you can use it with other messaging systems as well.
You don't need to build something of your own to regularly monitor the bucket for changes, as services like Lambda and SNS already exist for this.

Image processing server using NodeJS

I have a NodeJS server that is serving out web requests, and users can also upload images to my server. I am currently processing these images on the same server. This was really a stop-gap solution until I could do something better, as the image processing eats up a lot of CPU and I don't want to do it on my web server.
The type of processing is very simple:
Auto-orient the image: POST the image to the processing server and have it return an oriented image.
Create thumbnails: POST the image to the processing server and have it return a thumbnail.
I would like this image processing server to be usable by many other servers. I also want it to run with little memory/CPU, i.e. if I have more CPU/memory available then great, but I don't want it to crash when deployed on a server with limited resources.
The way I currently handle this is with a config file that specifies how many images I can process concurrently. If I have very limited resources, I only process 1 image at a time while queuing the rest. This works fine right now because the web server and the image processing server are one and the same.
My standalone image processing server will have to 'queue' requests so that it can limit the number of images it processes at a time.
I don't really want to store the images on the processing server, mainly because my web server knows the data structure (which images belong with which content). So I really want to just make a request to a server and get back the processed image.
Some questions:
1. Is a NodeJS implementation a good solution?
2. Should this be a RESTful HTTP API? I am wondering about this because I am not sure I can pull it off given the requirement of only processing a certain number of images at a time. If I get 15 requests at the same time, I am not sure what to do here; I don't want to process the first while the others wait for a response (the processed image).
3. I could use sockets to connect my web server with my image processing server to avoid question #2. This isn't as 'featureful', as it may be harder to use with other servers, but it does get me out of the request/response problem I have in #2 above, i.e. the web server could push an image to be processed over the socket and the processing server could just answer back on the socket whenever it is done.
4. Is there some open source solution that will run on Linux/Unix that suits my needs? This seems like a pretty common problem; it should be fun to implement, but I would hate to reinvent the wheel if I could use/contribute to another product. I have seen Apache::ImageMagick, but I cannot POST images: they have to reside on the server already. There are also some Windows-only solutions that I cannot use.
Please help...
Why not break down the process into discrete steps:
1. User uploads image and gets some kind of confirmation message that the image has been successfully uploaded and is being processed.
2. Place a request for image processing in the queue.
3. Perform the image processing.
4. Inform the user that image processing is complete.
This fits your requirements: users can upload many images in a short period of time without overloading the system (they simply go into the queue), and image processing can happen on a separate server without moving the files.
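A minimal sketch of such a processing server, assuming fastq for the in-process queue (with the concurrency read from config, as in the question) and sharp for the actual work; the endpoint, header, and callback mechanics are made up for illustration:

```js
const express = require('express');
const fastq = require('fastq');
const sharp = require('sharp');

// How many images to process at once, from config.
const CONCURRENCY = Number(process.env.MAX_CONCURRENT || 1);

async function processImage(job) {
  const thumb = await sharp(job.buffer)
    .rotate()      // no-arg rotate() auto-orients using EXIF
    .resize(200)   // thumbnail width; height keeps the aspect ratio
    .toBuffer();
  // Step 4: inform the caller (Node 18+ global fetch; callback URL is made up).
  await fetch(job.callbackUrl, { method: 'POST', body: thumb });
}

const queue = fastq.promise(processImage, CONCURRENCY);

const app = express();
app.post('/process', express.raw({ type: 'image/*', limit: '20mb' }), (req, res) => {
  // Steps 1-2: confirm receipt right away and queue the work; excess
  // requests wait in the queue instead of being processed concurrently.
  queue.push({ buffer: req.body, callbackUrl: req.get('x-callback-url') })
    .catch(console.error);
  res.sendStatus(202);
});

app.listen(3000);
```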
