Don't let deployments cancel file uploads - node.js

We have an application that accepts file uploads from the user.
Whenever we deploy our application we stop the application process and start it again. All lengthy processing is done before we actually stop the application so the actual downtime is fairly small (a few seconds).
However, when stopping the process we also kill active requests to our application (i.e. file uploads).
What would be a good way to handle this? I have a few ideas:
Extract the file upload handler into a separate service?
Make the restart more "intelligent": tell the process to stop accepting new requests and wait for the currently active requests to finish before killing it

You've just listed two of essentially three solutions I can think of :-)
The third would be a multi-tier deployment with a smart load balancer and a deploy process smart enough to know which node to restart and when.
If it is a smaller-scale app with no significant impact, I would go with what seems to me the simpler option: track active uploads and wait for them to finish on restart. You keep just one app to maintain, you know? But it makes the upload logic more complex.
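A minimal sketch of that idea, assuming a plain Node http server and a deploy script that sends SIGTERM rather than SIGKILL (the /upload route and /tmp destination are placeholders):

    const http = require('http');
    const fs = require('fs');
    const path = require('path');

    let activeUploads = 0;
    let shuttingDown = false;

    const server = http.createServer((req, res) => {
      if (shuttingDown) {
        // Refuse new work during the restart window.
        res.writeHead(503, { Connection: 'close' });
        return res.end('restarting, please retry shortly\n');
      }
      if (req.method === 'POST' && req.url === '/upload') {
        activeUploads += 1;
        const dest = fs.createWriteStream(
          path.join('/tmp', `upload-${Date.now()}`) // placeholder destination
        );
        req.pipe(dest);
        dest.on('finish', () => {
          activeUploads -= 1;
          res.end('stored\n');
        });
      } else {
        res.end('ok\n');
      }
    });

    server.listen(3000);

    // On SIGTERM: stop accepting new work, let in-flight uploads drain,
    // then exit so the deploy can start the new version.
    process.on('SIGTERM', () => {
      shuttingDown = true;
      // close() stops new connections and calls back once existing sockets
      // end; on newer Node you may also want server.closeIdleConnections()
      // so idle keep-alive sockets don't hold the process open.
      server.close(() => process.exit(0));
      setInterval(() => {
        console.log(`waiting for ${activeUploads} upload(s) to finish`);
      }, 1000).unref();
    });

The deploy script then becomes: send SIGTERM, wait for the old process to exit, start the new version.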
However, if the uploads are important enough, and they seem to be, it may be worth extracting them into a separate service. Not just because of deploys, but also to protect you from unexpected crashes and shutdowns. You would then have to decide how to communicate completed uploads from that service, and also handle the client response, etc.
In my view, one app to maintain and deploy is simpler than two, but of course also a bit less robust.
So the answer really depends on your needs and resources, right?

Related

Azure Logic App that does heavy processing?

I want to create a custom Azure Logic App that does some heavy processing, and I am reading as much as I can about it. I want to describe what I wish to do as I currently understand it, and I am hoping someone can point out where my understanding is incorrect, or point out a better way to do this.
What I want to do is take an application that runs a heavy computational process on a 3D mesh and turn it into a node to use in Azure Logic App flows.
What I am thinking so far, in a basic form, is this:
HTTP Trigger App: This logic app receives a reference to a 3D mesh to be processed, saves the mesh to Azure Storage, and passes that reference to the next logic app.
Mesh Computation Process App: This receives the Azure Storage reference to the 3D mesh. It then launches a high-performance server with many CPUs and GPUs; that server downloads the mesh, processes it, and uploads the result back to Azure Storage. The app then passes the reference to the processed mesh to the next logic app. Finally, it shuts down the high-performance server so it doesn't consume resources unnecessarily.
Email Notification App: This receives the Azure Storage reference to the processed mesh, then fires off an email with the download link to the user.
Is this possible? From what I've read so far it appears to be; I just want someone to verify this in case I've severely misunderstood something.
Also, I am hoping to get a little guidance on the mechanism for launching and shutting down a high-performance server within the 'Mesh Computation Process App'. The only place the Azure documentation mentions asynchronous, long-running task processing in Logic Apps is on this page:
https://learn.microsoft.com/en-us/azure/logic-apps/logic-apps-create-api-app
It says that you need to launch an API App or a Web App to receive the Azure Logic App request and then report status back to Azure Logic Apps. I was wondering: is it possible to do this in a serverless manner? The 'Mesh Computation Process App' would fire off an Azure Function which spins up the high-performance server; another Azure Function would periodically ping that server for status until it completes; at that point a final Azure Function would trigger the server to shut down and signal to the 'Mesh Computation Process App' that it is done, so the flow continues on to the next logic app. Is it possible to do it in that manner?
Any comments or guidance on how to better approach or think about this would be appreciated. This is the first time I've dug into Azure, so I am simultaneously trying to get properly oriented in Azure and to build a system like this.
It should be possible. At the moment I'm not exactly sure whether Logic Apps themselves can create all of those things for you, but it can definitely be done with Azure Functions, in a serverless manner.
For your second step, if I understand correctly, you need it to run for a long time just so that it can pass something along once the VM is done? You don't really need that. In a serverless design, try not to think in terms of long-running tasks; remember that everything is an event.
Putting something into Azure Blob Storage is an event you can react to, which removes your need for linking the apps together.
Your first step saves the mesh to Azure Storage, and that's it; it doesn't need to do anything else.
Your second app triggers on the inserted blob to initiate processing.
The VM processes the mesh and puts the result back into storage.
The email app triggers on the result being put into a "processed" folder.
Another app triggers on the same file to shut down the VM.
This way you remove the long-running state management and the direct chaining of apps; instead, each app does only what it needs to do, and the next one triggers automatically on the results of the previous flow.
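As an illustration, the "shut down the VM" (or email) step can be a blob-triggered Azure Function written in JavaScript. A rough sketch, where the container path and binding names are my assumptions:

function.json:

    {
      "bindings": [
        {
          "name": "mesh",
          "type": "blobTrigger",
          "direction": "in",
          "path": "meshes/processed/{name}",
          "connection": "AzureWebJobsStorage"
        }
      ]
    }

index.js:

    // Runs once per blob landing under meshes/processed/: no polling,
    // no explicit chaining between apps.
    module.exports = async function (context, mesh) {
      context.log(`processed mesh: ${context.bindingData.name} (${mesh.length} bytes)`);
      // React here: shut the VM down, notify the user, etc.
    };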
If you do need some kind of state management/orchestration across all of your steps, and you want to stay serverless, look into Durable Azure Functions. They are serverless, but the actions they take and the results they get are stored in Table Storage, so an orchestration can be recreated and restored to the state it was in before. Everything is done for you automatically; it just constrains a bit what exactly you can do inside the function so that it stays durable.
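A sketch of what that orchestration could look like with the durable-functions JavaScript package (the activity names StartVm, CheckVmStatus, StopVm, and SendEmail are made up; each would be its own activity function):

    const df = require('durable-functions');

    module.exports = df.orchestrator(function* (context) {
      const meshRef = context.df.getInput(); // reference from the HTTP trigger
      const vmId = yield context.df.callActivity('StartVm', meshRef);

      // Poll the VM with durable timers instead of a long-running process;
      // between ticks the orchestrator is unloaded and later replayed
      // from its history in Table Storage.
      let done = false;
      while (!done) {
        done = yield context.df.callActivity('CheckVmStatus', vmId);
        if (!done) {
          const nextTick = new Date(context.df.currentUtcDateTime.getTime() + 60 * 1000);
          yield context.df.createTimer(nextTick);
        }
      }

      yield context.df.callActivity('StopVm', vmId);
      yield context.df.callActivity('SendEmail', meshRef);
    });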
The actual state management you might want is something that keeps track of all the VMs and tries to reuse them, instead of spending time spinning them up and killing them. But don't complicate it too much for now.
https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-overview
Of course you still need to think about error handling: what happens if your VM just dies without uploading anything? You don't want results to silently go missing. You can trigger special flows to handle retries/errors, maybe send different emails, etc.

Nodejs scaling and prioritising functions

We have a node application running on the server that gets hit a lot and has to compile a zip file for download. That works well so far but I am nervous we will hit a point where performance becomes an issue.
(The application is currently running with forever on an Ubuntu 14.04 machine.)
I am now being asked to add all kinds of new features to the app; these are secondary and should not decrease the performance of the main function (the zip download). It would be OK for those additional features to fail when the app is hit too many times, in favour of the main zipping process.
What is the best practice here? Creating a REST API for the secondary features and putting everything into a waiting list? Surely it isn't enough to just create a second app and spawn a new process each time the main zip process finishes? How can I ensure the most redundancy? I'm not talking about a multi-core cluster setup or load balancing on NGINX, but a smart way of prioritising application functions at the application level.
I hope this is not too broad. Cheers
First off, everything should be using async I/O, no synchronous I/O anywhere in your server. That's the #1 rule for building a scalable node.js server.
Second off, the highest-priority tasks that have any significant CPU usage should be allowed to use multiple cores. If, as you say, the highest-priority task is creating the zip download, then you should make sure that operation can take advantage of multiple cores.
You can accomplish that either with clustering (your whole server runs multiple instances, each of which can be on a separate core) or by creating a set of processes specifically for creating the zip files, with a work queue in the main process that feeds those processes work and collects the results. The second option is likely a bit more complex to code than clustering, but it does prioritize zip file creation: only one core serves the other server needs while all the other cores work on zip files, whereas clustering shares all cores among all server responsibilities.
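A rough sketch of that second option (zip-worker.js and the message shapes are my own; the actual zipping could use any archive library):

    // zip-pool.js: a fixed pool of child processes dedicated to zipping,
    // fed from a queue in the main server process.
    const { fork } = require('child_process');
    const os = require('os');

    const POOL_SIZE = Math.max(1, os.cpus().length - 1); // leave a core for the server
    const queue = [];  // pending jobs: { files, callback }
    const idle = [];   // workers waiting for a job

    for (let i = 0; i < POOL_SIZE; i++) {
      const worker = fork('./zip-worker.js');
      worker.on('message', (result) => {
        worker.job.callback(null, result); // hand the finished zip back
        worker.job = null;
        idle.push(worker);
        pump();
      });
      idle.push(worker);
    }

    function pump() {
      while (idle.length && queue.length) {
        const worker = idle.pop();
        worker.job = queue.shift();
        worker.send(worker.job.files); // the worker replies via process.send()
      }
    }

    // The HTTP handlers call this; jobs queue up when all workers are busy.
    function createZip(files, callback) {
      queue.push({ files, callback });
      pump();
    }

    module.exports = { createZip };

zip-worker.js would then be little more than:

    process.on('message', (files) => {
      // ... build the archive here ...
      process.send({ zipPath: '/tmp/out.zip', count: files.length }); // placeholder result
    });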
At the pure application level, your server can maintain a work queue of all incoming work, no matter what kind, and prioritize that work. For example, if an API call comes in and there are already N zip file requests in the queue, you could immediately fail the API call to keep it from building up on the server. I don't think I'd personally recommend that solution unless your API calls are really heavy operations, because it's very hard for a developer to reliably use your API if it regularly just fails on them. They would generally find it better for the API to just be slow sometimes than to regularly fail.
You might not even need a queue; you could just use a counter to keep track of how many zip-file requests are "in process". But you'd have to make absolutely sure the counter stays accurate in all cases: if it ever accumulated an error, you might end up failing all API requests until the server was restarted.
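A sketch of the counter approach, assuming an Express app (the threshold, route names, and helper functions are invented for illustration):

    const express = require('express');
    const app = express();

    let zipsInProgress = 0;
    const MAX_ZIPS = 20; // tune to what your hardware actually sustains

    // Hypothetical stand-ins for the real work:
    const buildZip = async (files) => '/tmp/bundle.zip';
    const collectStats = () => ({ zipsInProgress });

    app.post('/zip', express.json(), async (req, res) => {
      zipsInProgress++;
      try {
        res.download(await buildZip(req.body.files));
      } catch (err) {
        res.status(500).end();
      } finally {
        // Decrement on success *and* failure, or the counter drifts upward
        // and the server sheds load forever (the risk noted above).
        zipsInProgress--;
      }
    });

    // Secondary features yield while zip work is piling up.
    app.get('/api/stats', (req, res) => {
      if (zipsInProgress >= MAX_ZIPS) {
        return res.status(503).set('Retry-After', '5').end();
      }
      res.json(collectStats());
    });

    app.listen(3000);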

I'm not sure how to correctly configure my server setup

This is kind of a multi-tiered question, and my end goal is to establish the best way to set up my server, which will host a website as well as a service (using Socket.io) for an iOS (and eventually an Android) app. Both the app service and the website are going to be written in node.js, as I need high concurrency and scaling for the app service, and I figured that while I'm at it I may as well do the website in node too, because performance wouldn't be that much different from something like Apache (from my understanding).
Also, the website has a lower priority than the app service; the app service should receive significantly higher traffic than the website (though in the long run this may change). Money isn't my greatest priority here, but it is a limiting factor; I feel that having a service with 99.9% uptime (since 100% uptime appears to be virtually impossible in the long run) is more important than saving money at the cost of more downtime.
Firstly, I understand that having one node process per CPU core is the best way to fully utilise a multi-core CPU, and after researching I now understand that running more than one per core is inefficient because the CPU has to context-switch between the processes. Why is it, then, that whenever I see code showing how to use the built-in cluster module in node.js, the master process creates a number of workers equal to the number of cores? That would mean 9 processes on an 8-core machine (1 master process and 8 worker processes). Is this because the master process usually just restarts worker processes when they crash or end, and therefore does so little that it doesn't matter that it shares a CPU core with another node process?
If that is the case, I am planning to have the workers provide the app service and have the master process manage the workers but also host a webpage providing statistical information on the server's state and all other relevant information (number of clients connected, worker restart count, error logs, etc.). Is this a bad idea? Would it be better to have this webpage running on a separate worker and leave the master process just to manage the workers?
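The kind of example I keep seeing looks roughly like this (my sketch; the port and logging are placeholders):

    const cluster = require('cluster');
    const os = require('os');
    const http = require('http');

    if (cluster.isMaster) {
      let restarts = 0;
      for (let i = 0; i < os.cpus().length; i++) cluster.fork();

      // The master mostly sleeps: it only wakes to replace dead workers,
      // which is presumably why "cores + 1 processes" doesn't hurt.
      cluster.on('exit', (worker, code) => {
        restarts++;
        console.log(`worker ${worker.process.pid} died (code ${code}), restarting`);
        cluster.fork();
      });
      // The statistics webpage I describe could be served from here too.
    } else {
      http.createServer((req, res) => {
        res.end(`handled by worker ${process.pid}\n`);
      }).listen(3000); // workers share this port via the master
    }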
So overall I want the following elements: a service to handle requests from the app (my main source of traffic); a website (fairly simple, a couple of pages and a registration form); an SQL database to store user information; a webpage (probably locally hosted on the server machine) which only I can access and which shows information about the server (users connected, worker restarts, server logs, other useful information, etc.); and apparently NGINX would be a good idea where I'm handling multiple node processes accepting connections from the app.
After doing research I've also found that it would probably be best to host on a VPS initially. At first, when the amount of traffic the app service receives will most likely be fairly low, I could run all of those elements on one VPS. Or would it be best to run them on separate VPSes, except for the website and the server-status webpage, which I could run on the same one? I guess this way, if there is a hardware failure and something goes down, not everything does; and I could run 2 instances of the app service on 2 different VPSes so that if one goes down the other is still functioning. Would this just be overkill? I doubt I would need multiple app service instances to support the traffic load for a while, but it would help reduce the apparent downtime for users.
Maybe this all depends on what I value more and have the time to do: a more complex server setup that costs more and is maybe a little unnecessary, but guarantees a consistent and reliable service; or a cheaper and simpler setup that may succumb to downtime from coding errors and server hardware issues.
Also, it's worth noting that I've never had any real experience with production-level servers, so in some ways I've jumped in at the deep end with this. I feel like I've come a long way in the past half a year and am getting a fairly good grasp of what I need to do; I could just do with some advice from someone with experience who has an idea of the roadblocks I may come across, and of whether I'm causing myself unnecessary problems with this kind of setup.
Any advice is greatly appreciated, thanks for taking the time to read my question.

Wait for critical sections to complete in a graceful node.js shutdown

I want to update my node application on production, but users are using it for things like credit card transactions.
I run supervisor, but I would like to wait until all critical sections (like saving data or sending important information) are complete before it restarts.
Check out "up" by LearnBoost: zero-downtime reloads built on top of the "distribute" load balancer.
Read more from here:
http://www.devthought.com/2012/01/29/staying-up-with-node-js/
Another one is ncluster.
Creating a planned downtime seems the most straightforward thing to do: notify the users and stop critical transactions a few minutes before the downtime, always choose the right time to go offline, and keep the window as short as possible for your users.
You could also delegate the various sections of your app to other applications, for example processing payments in a separate process that you communicate with through a message queue.
This clearly depends on your needs. By the way, be sure to disclose a planned downtime to your users; they will be happy to come back later.
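If you stay with a single process, a small sketch of waiting for critical sections on shutdown (this assumes your supervisor can be told to send SIGTERM rather than killing the process outright; the helper name is made up):

    let critical = 0;
    let shuttingDown = false;

    // Wrap anything that must not be interrupted (saving data,
    // charging a card) in this helper.
    async function inCriticalSection(work) {
      if (shuttingDown) throw new Error('restarting, not accepting new work');
      critical++;
      try {
        return await work();
      } finally {
        critical--;
      }
    }

    process.on('SIGTERM', () => {
      shuttingDown = true;
      const wait = setInterval(() => {
        if (critical === 0) {
          clearInterval(wait);
          process.exit(0); // the supervisor now starts the new version
        }
      }, 250);
    });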

Which one to use: Windows services or threading?

We have a web application built using ASP.NET 3.5 with SQL Server as the database. It is quite big and is used by around 300 super users to manage around 5000 staff.
Now we are implementing SMS functionality in the application, which means users will be able to send and receive SMS messages. Every two minutes the third party's SMS server is polled to check whether there are any new messages. Outgoing SMS messages are also held in a queue and sent at intervals of 15 to 30 minutes.
I want this checking and sending process to run in the background of the application all the time, even if the user closes the browser window.
I need some advice on how to do this.
Will using a thread achieve this, or do I need to create a Windows service for it, or are there other options?
More information:
I want to execute a task on a timer. What will happen if I close the browser window? The task won't be completed, will it?
For example, say I am saving 10 records to the database at an interval of 5 minutes, meaning that every 5 minutes, when the timer tick event fires, a record is inserted into the database.
How do I keep this task running if I close the browser window?
I tried looking at Windows services, but how do I pass a generic collection of data to one for processing?
There really is no thread-versus-service choice: a service can be (and usually is!) multi-threaded, and a thread can start a service.
There are three basic choices:
Somehow start another thread running when a user logs in. This is probably a very poor choice for what you want, as you cannot really keep it running once the user session is lost.
Write a fully fledged Windows service which starts on OS startup and continues running until the server is shut down. You can make it dependent on the SQL Server service, so it starts after the DB is available. This is the "best" solution but may be overkill for your purposes. Also, you need to know the services API to write it properly, as you need to respond correctly to shutdown and status requests.
Schedule your task periodically using either the Windows scheduler or, preferably, the scheduler built into SQL Server; I think this would be the most suitable option for your needs.
Distinguish between what the browser is doing and what's happening server-side.
Your web app sits server-side waiting for requests from whatever browsers may be running, and services those requests; in doing so, I guess it may well put messages on a queue and look in a database for any new messages.
You want the daemon processor, which talks to the third-party SMS service, to be triggered by time rather than by browser activity. Either of your suggestions would work:
A completely independent service could run and work against the queues and database.
Your web app, which I assume is already a service, could spawn a thread.
In either case there are a few technical questions about avoiding race conditions between the browser-request processing and the daemon, but databases and queueing systems can deal with that.
So I would decide between a stand-alone daemon and a background thread like this:
Which is easier to implement? I'm a Java EE developer; in my app server I have an API for specifying code to be run on a timer, and the API deals with the threading issues, so for me that's very easy. I don't know what you have available. Timers are not quite as trivial as they may appear, so having a reliable API is beneficial. If this were a more complex requirement, where the daemon code were gnarly and might possibly interfere with the web app code, then I might prefer to keep it conspicuously separate.
Which is easier to deploy and administer? Deploying a separate web app and daemon, or deploying one thing? In the Java EE world we could have a single Enterprise Application with all the code, so there is a single thing to deploy, start, and control.
One other thing to consider: scaling and resilience. You might choose to run more than one copy of your web app, either to provide fail-over or just because you need the extra power. In that case, how many daemons would you have? Would it be a problem to have two running? You might need some extra code to mediate between them, for example logging the time of the last run in the database, so each daemon can say "Oh, my buddy already did the 10:30 job, I'll go back to sleep".
