Image processing server using NodeJS

I have a NodeJS server that serves web requests; users can also upload images to my server. I am currently processing these images on the same server. This was only a stopgap until I could do something better, since the image processing eats up a lot of CPU and I don't want to do it on my web server.
The type of processing is very simple:
Auto-orient the image: POST the image to the processing server and have it return an oriented image.
Create thumbnails: POST the image to the processing server and have it return a thumbnail.
I would like many other servers to be able to hit this image processing server. I also want it to run with little memory/CPU; i.e., if I have more CPU/memory available then great, but I don't want it to crash if deployed on a server with limited resources.
The way I currently handle this is with a config file that specifies how many images I can process concurrently. If I have very limited resources I will only process one image at a time while queuing the rest. This works fine right now because the web server and the image processing server are one and the same.
My standalone image processing server will have to 'queue' requests so that it can limit the number of images it is processing at a time.
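That per-config concurrency limit can be expressed as a small in-process queue. A minimal sketch, assuming the limit comes from an environment variable and that processImage() is a hypothetical stand-in for the real auto-orient/thumbnail work:

    // At most `maxConcurrent` images are processed at once; the rest wait in a FIFO queue.
    const maxConcurrent = Number(process.env.MAX_CONCURRENT) || 1;

    const pending = [];
    let active = 0;

    // Hypothetical placeholder for the real ImageMagick call.
    async function processImage(buffer) {
      return buffer;
    }

    // Callers get a promise that resolves with the processed image once a slot frees up.
    function enqueue(imageBuffer) {
      return new Promise((resolve, reject) => {
        pending.push({ imageBuffer, resolve, reject });
        drain();
      });
    }

    function drain() {
      while (active < maxConcurrent && pending.length > 0) {
        const job = pending.shift();
        active++;
        processImage(job.imageBuffer)
          .then(job.resolve, job.reject)
          .finally(() => { active--; drain(); });
      }
    }

    module.exports = { enqueue };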
I don't really want to store the images on the processing server. Mainly because my web server knows the data structure (which images belong with which content). So I really want to just make a request to a server and get back the processed image.
Some questions:
Is a NodeJS implementation a good solution?
Should this be a RESTful HTTP API? I am wondering about this because I am not sure I can pull it off given the requirement of only processing a certain number of images at a time. If I get 15 requests at the same time I am not sure what to do here; I don't want to process the first while the others wait for a response (the processed image).
I could use sockets to connect my web server with my image processing server to avoid the problem in #2. This isn't as 'featureful', as it may be harder to use with other servers, but it does get me out of the request/response problem above; i.e., the web server could push an image to be processed over the socket and the processing server could just answer back on the socket whenever it is done.
Is there some open-source solution that runs on Linux/Unix and suits my needs? This seems like a pretty common problem and should be fun to implement, but I would hate to reinvent the wheel if I could use/contribute to another product. I have seen Apache::ImageMagick, but I cannot POST images to it; they have to reside on the server already. There are also some Windows-only solutions that I cannot use.
Please help...

Why not break the process down into discrete steps?
User uploads an image and gets some kind of confirmation message that the image has been uploaded successfully and is being processed.
Place a request for image processing in the queue.
Perform the image processing.
Inform the user that the image processing is complete.
This fits your requirements: users can upload many images in a short period of time without overloading the system (they simply go into the queue), and image processing can happen on a separate server without moving the files.
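A minimal sketch of that flow with Express, assuming an in-memory queue, made-up route names, and a placeholder processImage() step; in practice the queue would live in Redis or a database and the worker would run on the separate processing server:

    const express = require('express');
    const crypto = require('crypto');
    const app = express();

    const queue = [];          // step 2: pending jobs
    const results = new Map(); // jobId -> status/result

    // Placeholder for the real auto-orient/thumbnail step.
    async function processImage(buffer) { return buffer; }

    // Step 1: accept the upload, confirm immediately, and queue the work.
    app.post('/images', express.raw({ type: 'image/*', limit: '20mb' }), (req, res) => {
      const jobId = crypto.randomUUID();
      queue.push({ jobId, image: req.body });
      results.set(jobId, { status: 'queued' });
      res.status(202).json({ jobId, message: 'Image uploaded, processing queued' });
    });

    // Step 4: the user (or the web server) checks back for the result.
    app.get('/images/:jobId', (req, res) => {
      res.json(results.get(req.params.jobId) || { status: 'unknown' });
    });

    // Step 3: a single worker loop drains the queue one job at a time.
    let busy = false;
    setInterval(async () => {
      if (busy || queue.length === 0) return;
      busy = true;
      const job = queue.shift();
      try {
        results.set(job.jobId, { status: 'processing' });
        const output = await processImage(job.image);
        results.set(job.jobId, { status: 'done', bytes: output.length });
      } catch (err) {
        results.set(job.jobId, { status: 'failed' });
      } finally {
        busy = false;
      }
    }, 250);

    app.listen(3000);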

Related

Nodejs upload and process video

I'm looking for the best approach for my application.
I have video upload functionality. The front-end will send an upload/video request with the video file attached, then my back-end will handle the request: it will reduce the size and quality of the video (using fluent-ffmpeg), then create a thumbnail image based on the first frame of the video, then upload the video and its thumbnail image to the AWS S3 bucket, and finally return the compressed video and thumbnail to the front-end.
The problem I have is that all of those back-end tasks (compressing, creating the thumbnail and uploading) are very time consuming, and sometimes (depending on the video size and duration) my nginx server will return a 504 Gateway Time-out, which is to be expected. The question is:
How do I handle this case? Should I use web sockets to notify the front-end of the video-processing progress, or should I avoid waiting until all of those actions are completed?
My goal is to have functionality where I can upload a video and show a progress bar for the video processing, and the user is able to "play" with the application rather than being required to wait until the video is processed successfully.
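For reference, the compress and thumbnail steps described above usually look roughly like this with fluent-ffmpeg (a sketch; the paths, target bitrate/size and an ffmpeg binary installed on the machine are all assumptions):

    const ffmpeg = require('fluent-ffmpeg');

    // Re-encode the upload at a lower bitrate and resolution.
    function compress(inputPath, outputPath) {
      return new Promise((resolve, reject) => {
        ffmpeg(inputPath)
          .videoBitrate('1000k')   // example target bitrate
          .size('1280x?')          // scale to 1280 wide, keep the aspect ratio
          .on('end', resolve)
          .on('error', reject)
          .save(outputPath);
      });
    }

    // Grab a frame near the start of the video as the thumbnail.
    function makeThumbnail(inputPath, folder) {
      return new Promise((resolve, reject) => {
        ffmpeg(inputPath)
          .on('end', resolve)
          .on('error', reject)
          .screenshots({ timestamps: ['00:00:01'], filename: 'thumb.png', folder, size: '320x?' });
      });
    }

    module.exports = { compress, makeThumbnail };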
Seems like this is an architectural problem. Here is one solution that I prefer.
Use a queue and store progress in some key-value DB. You may be unfamiliar with queues, so I would recommend checking some queue-related tutorials. As you are using Amazon, SQS might be interesting to you. In Rails you can check Sidekiq; Laravel has Laravel Horizon.
While each job is in progress, design the app so it can report its progress, like 50%, 60%, etc.
Process thumbnails, etc. on the queue too.
And if you want to scale, you can simply increase the number of queue workers. I think that's how others handle it as well.
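In a Node back-end, a Redis-backed queue library such as Bull can hold those jobs and expose per-job progress for an endpoint or socket to relay to the front-end. A minimal sketch, assuming Bull with a local Redis instance; the queue name and the transcode/createThumbnail/uploadToS3 helpers are placeholders:

    const Queue = require('bull');
    const videoQueue = new Queue('video-processing', 'redis://127.0.0.1:6379');

    // Placeholder steps standing in for fluent-ffmpeg and the S3 upload.
    async function transcode(filePath) {}
    async function createThumbnail(filePath) {}
    async function uploadToS3(filePath) {}

    // Producer (upload route): enqueue the job and respond to the client right away.
    async function enqueueVideo(filePath, videoId) {
      const job = await videoQueue.add({ filePath, videoId });
      return job.id; // hand this id to the front-end so it can poll for progress
    }

    // Worker: update progress as each stage completes.
    videoQueue.process(async (job) => {
      const { filePath } = job.data;
      await transcode(filePath);
      await job.progress(50);
      await createThumbnail(filePath);
      await job.progress(75);
      await uploadToS3(filePath);
      await job.progress(100);
    });

    // A progress endpoint (or socket emitter) can read the value back, e.g.:
    //   const job = await videoQueue.getJob(jobId);
    //   const percent = job ? await job.progress() : 0;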

NodeJS API sync uploaded files through SFTP

I have a NodeJS REST API which has endpoints for users to upload assets (mostly images). I distribute my assets through a CDN. The way I do it right now is to call my endpoint /assets/upload with a multipart form; the API creates the DB resource for the asset and then uses SFTP to transfer the image to the CDN origin. Upon success I respond with the URL of the uploaded asset.
I noticed that the most expensive operation for relatively small files is the connection to the origin through SFTP.
So my first question is:
1. Is it a bad idea to always keep the connection alive so that I can always reuse it to sync my files?
My second question is:
2. Is it a bad idea to have my API handle the SFTP transfer to the CDN origin, or should I consider having a CDN origin that could handle the HTTP request itself?
Short answer: (1) it is not a bad idea to keep the connection alive, but it comes with complications; I recommend trying without reusing connections first. And (2) the upload should go through the API, but there may be ways to optimize how the API-to-CDN transfer happens.
Long Answer:
1. Is it a bad idea to always keep the connection alive so that I can always reuse it to sync my files?
It is generally not a bad idea to keep the connection alive. Reusing connections can improve site performance, generally speaking.
However, it does come with some complications. You need to make sure the connection is up, and that if the connection goes down you recreate it. There are cases where the SFTP client thinks that the connection is still alive but it actually isn't, and you need to do a retry. You also need to make sure that while one request is using a connection, no other requests can use it. You would possibly want a pool of connections to work with, so that you can service multiple requests at the same time.
If you're lucky, the SFTP client library already handles this (see if it supports connection pools). If you aren't, you will have to do it yourself.
My recommendation - try to do it without reusing the connection first, and see if the site's performance is acceptable. If it isn't, then consider reusing connections. Be careful though.
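If you do decide to try reuse, here is a minimal sketch of caching one connection and reconnecting on failure, assuming the ssh2-sftp-client package; the config values and the single-retry policy are illustrative, and a real pool would also need to keep concurrent requests from sharing one in-flight transfer:

    const Client = require('ssh2-sftp-client');

    // Placeholder credentials for the CDN origin.
    const config = { host: 'cdn-origin.example.com', username: 'deploy', password: 'secret' };

    let cached = null;

    async function getConnection() {
      if (!cached) {
        const client = new Client();
        await client.connect(config);
        cached = client; // only cache once the connection has succeeded
      }
      return cached;
    }

    async function uploadAsset(buffer, remotePath) {
      try {
        const sftp = await getConnection();
        return await sftp.put(buffer, remotePath);
      } catch (err) {
        // The cached connection may have silently died; reconnect once and retry.
        cached = null;
        const sftp = await getConnection();
        return await sftp.put(buffer, remotePath);
      }
    }

    module.exports = { uploadAsset };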
2. Is it a bad idea to have my API handle the SFTP transfer to the CDN origin, or should I consider having a CDN origin that could handle the HTTP request itself?
It is generally a good idea to have the HTTP request go through the API for a couple of reasons:
For security reasons, you want your CDN upload credentials to be stored on your API and not on your client (website or mobile app). You should assume that your website code can be seen (via view source) and that people can generally decompile or reverse engineer mobile apps, so they would be able to see your credentials in the code.
This hides implementation details from the client, so you can change this in the future without the client code needing to change.
@tarun-lalwani's suggestion is actually a good one: use S3 to store the image, and use a Lambda trigger to upload it to the CDN. There are a couple of Node.js libraries that allow you to stream the image from your API's HTTP request to the S3 bucket directly. This means that you don't have to worry about disk space on your machine instance.
Regarding your question on @tarun-lalwani's comment: one way to do it is to use the S3 image URL path until the Lambda function is finished. S3 can serve images too, if given the proper permissions to do so. Then, after the Lambda function has finished uploading to the CDN, you just replace the image path in your DB.
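As an illustration of the streaming idea, here is a sketch of piping an upload straight to S3 from the request, assuming the aws-sdk (v2) package, that the client sends the raw file bytes rather than a multipart form, and a placeholder bucket name and key scheme:

    const express = require('express');
    const AWS = require('aws-sdk');

    const s3 = new AWS.S3(); // credentials/region come from the environment
    const app = express();

    app.post('/assets/upload', (req, res) => {
      const key = `uploads/${Date.now()}`; // placeholder key scheme

      // s3.upload accepts a readable stream as Body, so the request body is
      // streamed to S3 without ever being written to local disk.
      s3.upload({ Bucket: 'my-assets-bucket', Key: key, Body: req }, (err, data) => {
        if (err) return res.status(500).json({ error: 'upload failed' });
        res.json({ url: data.Location });
      });
    });

    app.listen(3000);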

Simple message passing Nodejs server accepting only 4 requests at a time

We have a simple Express node server deployed on Windows Server 2012 that receives GET requests with just 3 parameters. It does some minor processing on these parameters, has a very simple in-memory node-cache for caching some of these parameter combinations, and interfaces with an external license server to fetch a license for the requesting user and set it in a cookie. It then interfaces with some workers via a load balancer (running with zmq) to download some large files (in chunks, which it unzips, extracts and writes to some directories) and display them to the user. On deploying these files, some other calls to the workers are initiated as well.
The node server does not talk to any database or disk. It simply waits for responses from the load balancer running on some other machines (these are long operations, typically taking between 2-3 minutes to send a response). So, essentially, the computation and database interactions happen on other machines. The node server is only a simple message passing/handshaking server that waits for responses in event handlers, initiates other requests and renders the response.
We are not using the 'cluster' module or nginx at the moment. With a bare-bones node server, is it possible to accept and process at least 16 requests simultaneously? Pages such as http://adrianmejia.com/blog/2016/03/23/how-to-scale-a-nodejs-app-based-on-number-of-users/ mention that a simple node server can handle only 2-9 requests at a time, but even with our bare-bones implementation, no more than 4 requests are accepted at a time.
Is using the cluster module or nginx necessary even in this case? How do we scale this application to a few hundred users to begin with?
An Express server can handle many more than 9 requests at a time, especially if it isn't talking to a database.
The article you're referring to assumes some database access on each request and serving static assets via node itself rather than a CDN, all of it taking place on a single CPU with 1 GB of RAM. That's a database and a web server all running on a single core with minimal RAM.
There really are no hard numbers on this sort of thing; you build it and see how it performs. If it doesn't perform well enough, put a reverse proxy in front of it like nginx or haproxy to do load balancing.
However, based on your problem, if you really are running into a bottleneck where only 4 connections are possible at a time, it sounds like you're keeping those connections open far too long and blocking others. Better to have those long-running processes kicked off by node, close the connections, and then have those servers call back somehow when they're done.
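A sketch of that shape, with made-up routes and a stubbed startLongJob() standing in for the call to your zmq load balancer; the node server answers with 202 immediately, frees the connection, and the worker reports back later:

    const express = require('express');
    const crypto = require('crypto');

    const app = express();
    app.use(express.json());

    const jobs = new Map(); // jobId -> status/result (stand-in for a real store)

    // Stub: in the real app this would hand the work to the zmq load balancer
    // and tell it to POST to /jobs/:id/done when finished.
    async function startLongJob(jobId, params) {}

    app.get('/process', (req, res) => {
      const jobId = crypto.randomUUID();
      jobs.set(jobId, { status: 'pending' });

      startLongJob(jobId, req.query).catch(() => jobs.set(jobId, { status: 'failed' }));

      // Respond right away instead of holding this connection open for 2-3 minutes.
      res.status(202).json({ jobId });
    });

    // The worker calls back here when it is done.
    app.post('/jobs/:id/done', (req, res) => {
      jobs.set(req.params.id, { status: 'done', result: req.body });
      res.sendStatus(204);
    });

    // The client polls here (or you push the result over a socket instead).
    app.get('/jobs/:id', (req, res) => {
      res.json(jobs.get(req.params.id) || { status: 'unknown' });
    });

    app.listen(3000);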

Pass data between multiple NodeJS servers

I am still pretty new to NodeJS and want to know if I am looking at this in the wrong way.
Background:
I am making an app that runs once a week, generates a report, and then emails it out to a list of recipients. My initial reason for using Node was that I have an existing front end already built using Angular and I wanted to be able to reuse code in order to simplify maintenance. My main idea was to have 4+ individual node apps running in parallel on our server.
The first app would use node-cron in order to run every Sunday. This would check the database for all scheduled tasks and retrieve the stored parameters for the reports it is running.
The next app is a simple queue that would store the scheduled tasks and pass them to the worker tasks.
The actual pdf generation would be somewhat CPU intensive, so this would be a cluster of n apps that would retrieve and run individual reports from the queue.
When done making the pdf, they would pass to a final email app that would send the file out.
My main concern is communication between the apps. At the moment I am setting up the 3 lower levels (i.e. all but the scheduler) on separate ports with Express, and opening HTTP requests to them when needed. Is there a better way to handle this? Would the basic 'net' module work better than the 'http' package? Is Express even necessary for something like this, or would I be better off running everything as a basic http/net server? So far the only real use I've made of Express is to listen on a path for PUT requests and to parse the incoming JSON. I was led to asking here because, in tracking logs so far, I see that every so often the HTTP request is reset; this doesn't appear to affect the data received by the child process, but I would still like to avoid errors in my code.
I think that this kind of decoupling could leverage some sort of stateful priority queue with features like retry on failure, clustering, etc.
I've used Kue.js in the past with great success; it's Redis-backed and has nice documentation and a nice interface: http://automattic.github.io/kue/
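A minimal sketch of how the scheduler, the PDF workers and the email step could talk through Kue instead of HTTP between the apps; it assumes a local Redis instance, and buildPdf() is a hypothetical stand-in for the real report generation:

    const kue = require('kue');
    const queue = kue.createQueue(); // connects to the local Redis by default

    // Hypothetical stand-in for the real PDF generation step.
    async function buildPdf(reportId) {
      return `/tmp/report-${reportId}.pdf`;
    }

    // Scheduler app: enqueue a report job instead of making an HTTP call.
    queue
      .create('generate-report', { reportId: 42, recipients: ['someone@example.com'] })
      .attempts(3) // retry on failure
      .save();

    // PDF worker app(s): pull jobs off the shared Redis queue, two at a time.
    queue.process('generate-report', 2, (job, done) => {
      buildPdf(job.data.reportId)
        .then((pdfPath) => {
          // Hand off to the email app by queuing the next step.
          queue.create('send-email', { pdfPath, recipients: job.data.recipients }).save();
          done();
        })
        .catch(done);
    });

    // The email app would process 'send-email' jobs the same way.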

Load test a Backbone App

I've got an NGinx/Node/Express3/Socket.io/Redis/Backbone/Backbone.Marionette app that proxies requests to a PHP/MySQL REST API. I need to load test the entire stack as a whole.
My app takes advantage of static asset caching with NGinx and clustering with node/express, and Socket.io is made multi-core capable using Redis. All that's to say, I've gone through a lot of trouble to try and make sure it can stand up to the load.
I hit it with 50,000 users in 10 seconds using blitz.io and it didn't even blink, which concerned me because I wanted to see it crash, or at least breathe a little heavy. But 50k was the max you could throw at it with that tool, indicating to me that they expect you not to reasonably be able to, or need to, handle more than that. That's when I realized it wasn't actually incurring the load I was expecting, because the real load is only initiated after the page loads and the Backbone app starts up, kicks off the socket connection, and requests the data from the REST API endpoint (on a different server).
So, here's my question:
How can I load test the entire app as a whole? I need the load test to tax the server in the same way that the clients actually will, which means:
Request the single page Backbone app from my NGinx/Node/Express server
Kick off requests for the static assets from NGinx (simulating what the browser would do)
Kick off requests to the REST API (PHP/MySQL running on a different server)
Create the connection to the Socket.io service (running on NGinx/Node/Express, utilizing Redis to handle multi-core junk)
If the testing tool uses a browser-like environment to load the page up, parsing the JS and running it, everything will be copasetic (NGinx/Node/Express server will get hit and so will the PHP/MySQL server). Otherwise, the testing tool will need to simulate this by firing off at least a dozen different kinds of requests nearly simultaneously. Otherwise it's like stress testing a door by looking at it 10,000 times (that is to say, it's pointless).
I need to ensure my app can handle 1,000 users hitting it in under a minute all loading the same page.
You should learn to use Apache JMeter: http://jmeter.apache.org/
You can perform stress tests with it; see this tutorial: https://www.youtube.com/watch?v=8NLeq-QxkSw
As you said, "I need the load test to tax the server in the same way that the clients actually will."
That means the tests are agnostic to the technology you are using.
I highly recommend JMeter; it is widely used and you can integrate it with Jenkins and do a lot of cool stuff with it.
