Google Cloud Node JS - Will all setTimeOuts be erased after each deploy? - node.js

I just want to understand if the server is fully restarted (clean state) or if it manages to migrate some of the memory data that was running from last version.
I have heard of node apps having 10k - 100k timeouts concurrently. On every new version (gcloud app deploy) of my server, will I lose all the pending functions that were not executed and I would need to reschedule all of them?

A node.js server has no memory of its own from the previous time it was run. So, if you want a server to retain some state from one time it's run until the next (like after a restart), then you have to maintain that state yourself in a persistent store and read that state back in when your server starts.
will I lose all the pending functions that were not executed and I would need to reschedule all of them?
Yes, you will lose them all. node.js does not persist timers itself. You will need to persist them yourself or use a different mechanism that does the persistence for you.

Related

node: persist data after process termination

I'm using node-cache to cache data from a CLI application that observes changes in files and caches them to avoid new data processing.
the problem is that I noticed that this cache is destroyed on each command, since each time the tool is called in the terminal a new instance is generated and the old one is destroyed. probably, the data is also destroyed.
I need to keep, for a specific TTL, two things in cache/memory, even if the process ends:
the processed data
the specific instance of fs.watcher, watching and executing caching operations
the question is: how do i do it? I've been searching for days on the internet and trying alternatives and I can't find a solution.
I need to keep ... things in cache/memory, even if the process ends
That's, pretty much by definition, not possible. When a process terminates, all its resources are freed up for use by something else (barring a memory-leak bug in the OS itself).
It sounds like you need to refactor your app into a service that can run in the background and separate front-end that can communicate with it.

Scheduling function calls in a stateless Node.js application

I'm trying to figure out a design pattern for scheduling events in a stateless Node back-end with multiple instances running simultaneously.
Use case example:
Create an message object with a publish date/time and save it to a database
Optionally update the publishing time or delete the object
When the publish time is reached, the message content is sent to a 3rd party API endpoint
Right now my best idea is to use bee-queue or bull to queue delayed jobs. It should be able to store the state and ensure that the job is executed only once. However, I feel like it might introduce a single point of failure, especially when maintaining state on Redis for months and then hoping that the future version of the queue library is still working.
Another option is a service worker that polls the database for upcoming events every n minutes, but this seems like a potential scaling issue down the line for multi-tenant SaaS.
Are there more robust design patterns for solving this?
Don't worry about redis breaking. It's pretty stable, and eventually you can decide to freeze the version.
If there are jobs that will be executed in the future I would suggest a database, like Mongo or Redis, with a disk-store. So you will survive a reboot, you don't have to reinvent the wheel, and already have a nice set of tools for scalability.

If multiple jobs exist in the event loop for one process. What happens to the remaining jobs if the current job crashes the process?

In Node.js cluster mode, if multiple jobs exist in the event loop for one process, should the current job crash the process, what happens to the remaining job?
I'm assuming the remaining jobs in the event loop would go unfulfilled or return a server error. My question is, why is this an acceptable risk? Why would someone opt to use Node.js cluster mode in production then, rather than use something like PHP in production, where there is no risk of this, because PHP handles each request in its own process.
Edit:
Obviously this doesn't just apply to Node.js cluster mode. It can happen on a single instance, in which case obviously the end user would just get a server error. Cluster mode just happens to be my personal use case.
I'm looking for a way to pick back up a job in the queue job should a previous job cause the process to exit, before the subsequent job gets a change to be fulfilled. I am currently reading about how you can use a tool like RabbitMQ to handle your job queue outside of the node.js cluster, and each cluster instance just pulls jobs from the RabbitMQ queue. If anyone has any input on that, that would also be greatly appreciated.
If multiple jobs exist in the event loop for one process. What happens to the remaining jobs if the current job crashes the process?
If a node.js process crashes, the same thing happens to it that happens to any other process. All open sockets get automatically disconnected and the client will receive an immediate close on their socket (socket connection dropped essentially).
If you were using a Java server that was in the middle of handling 10 requests (perhaps in threads) and it crashed, the consequences would be the same. All 10 socket connections would get dropped.
If process isolation from one request to another is your #1 criteria for selecting a server environment, then I guess you wouldn't pick any environment that ever serves multiple requests from the same process. But, you would give up a lot of get that. One of the reasons for the node.js design is that is scales really, really well for a high number of concurrent connections that are all doing mostly I/O things (disk, networking, database stuff, etc...) which happens to be most web servers. Whereas a design that fires up a new process for every incoming connection does not scale as well for a large number of concurrent connections because a process is a much more heavy-weight thing in the eyes of the operating system (memory usage, other system resource usage, task switching overhead, etc...) than the way node.js does things.
And, there are obviously hundreds of other considerations too when choosing a server environment. So, you kind of have to look at the whole picture of what you're designing for and make the best set of tradeoffs.
In general, I wouldn't put this issue anywhere on the radar for why you should choose one over the other unless you expect to be running risky code (perhaps out of your control) that crashes a lot and this issue is therefore more important in your deployment than all the other differences. And, if that was the case, I'd probably isolate the risky code to its own process (even when using nodejs) to alleviate any pain from that crash. You could have a process pool waiting to process risky things. For example, if you were running code submitted by a user, I might run that code in its own isolated VM.
If you're just worried about your own code crashing a lot, then you probably have bigger problems and need more extensive unit testing, more robust error handling and need to take advantage of other tools just as a linter and other code analysis tools to find potential problem areas. With proper design, implementation and error handling, you should be able to keep a single incoming request from harming anything other than itself. That's certainly the philosophy that every server environment that serves multiple requests from the same process advises and the people/companies deploying those servers use.

Nodejs scaling and prioritising functions

We have a node application running on the server that gets hit a lot and has to compile a zip file for download. That works well so far but I am nervous we will hit a point where performance becomes an issue.
(The application is currently running with forever on a ubuntu 14.04 machine.)
I am now asked to add all kinds of new features to the app which are more secondary and should not decrease the performance of the main function (the zip download). It would be OK to have those additional features fail in case the app is hit too many times in favour of the main zipping process.
What is the best practise here. Creating a REST API for the secondary features and put everything into a waiting list? It surely isn't enough to just create a second app and spawn a new process each time the main zip process finishes? How Can I ensure the most redundancy? I'm not talking about a multi-core cluster setup or load-balancing on NGINX, but a smart way of prioritising application functions on application level.
I hope this is not too broad. Cheers
First off, everything should be using async I/O, no synchronous I/O anywhere in your server. That's the #1 rule for building a scalable node.js server.
Second off, the highest priority tasks that have any significant CPU usage should be allowed to use multiple cores. If, as you say, the highest priority tasks is creating the zip download, then you should makes sure that that operation can take advantage of multiple cores.
You can accomplish that either with clustering (your whole server runs multiple instances that can each be on a separate core) or by creating a set of processes specifically for creating the zip files and then create a work queue in the main process that feeds these other processes work and gets the result back from them. This second option is likely a bit more complex to code than clustering, but it does prioritize the zip file creation so only one core is serving other server needs and all other cores of working on zip file creation. Clustering shares all cores with all server responsibilities.
At the pure server application level, your server can maintain a work queue of all incoming work to be done no matter what kind and it can prioritize that work. For example, if an API call comes in and there are already N zip file requests in the queue, you could immediately fail the API call to keep it from building up on the server. I don't think I'd personally recommend that solution unless your API calls are really heavy operations because it's very hard for a developer to reliably use your API if it regularly just fails on them. They would generally find it better for the API to just be slow sometimes than to regularly fail.
You might not even have to use a queue, you could just use a counter to keep track of how many ZIP file requests were "in process", but you'd have to make absolutely sure the counter was accurate in all cases. If there was ever an accumulating error in the counter, then you might just end up failing all API requests until your server was restarted.

Less resources for steady client connections, why?

I heard of node.js is very suitable for applications where a persistent connection from the browser to the server is needed. That "long-polling" technique is used, that allows to send updates to the user in real time without needing a lot of server resources. A more traditional server model would need a thread for every single user.
My question, what is done instead, how are the requests served differently?
Why doesn't it take so much resources?
Nodejs is event-driven. The node script is started and then loops continuously, waiting for events to be fired, until stopped. Once running, the overhead associated with loading is done.
Compare this to a more traditional language such as c#.net or PHP, where a request causes the server to load and run the script and it's dependencies. The script then does its' task (often serving a web page) and then shuts down. Another page is requested, the whole process starts again.

Resources