Detect "master" process in iisnode - node.js

I'm running a node.js app in a 32 core machine with iisnode. IIS creates 32 processes to use all the CPU cores effectively and uses named pipes for each process.
I need to start a small task scheduler and run some other code but only in one of the processes, I don't want the 32 processes to run the same code at the same time. As this is not a node.js cluster I can't use cluster.isMaster and I'm not aware that iisnode gives an ID to each process as PM2 does (read here).
Is there an easy way to run some code but only in one of all the created processes? I know I could use database lock but I was hoping to find a simpler way before having to do that.

I ended up using the proper-lockfile package that worked perfectly in my case!

Related

Two node.js processes are running while I don't suppose that. Can I kill them?

While I don't suppose that there are any processes for node.js on my windows computer, but there are two node.js processes that I can see on task manager.
I don't mean to run any node.js right now. But I have two node.js processes like above.
I had run node.js process by pm2 module before, so it affects badly maybe.
Is it ok to kill the two processes manually from task manager? Or, either process has any other purposes than executing program I wrote, so I should keep either of them or both?

Electron running multiple main processes vs multiple browser windows

I'm running electron on linux server for web scraping. And currently I'm running new electron command for each task. But it results in high cpu usage. Now thinking about running single electron instance, and create new BrowserWindow for each task. It will take some time to adapt the code base for this style, so I wanted to ask here first. Will it make a difference in cpu usage, and how much?
Basically, creating a new NodeJS process will result in re-parsing your application's code, which will highly affect your CPU usage. Creating only a new BrowserWindow will only create a new renderer process, which is way more efficient.
If your application is packaged, e.g. with electron-packager, then creating a new instance will also affect your CPU usage like creating another NodeJS process, because that packaged (aka compiled) application has a copy of NodeJS in it, which is enough to run your code, but still affects the CPU usage.
But the decision depends on how you use the server. If you only run the Electron application to carry out the tasks that have been defined by you, adapting your working code would have no to only a low benefit. If you want to release this application and/or that server is used by some other tasks, e.g. a web server, it would be a real benefit if you adapt your code.
Running multiple instances of the main nodejs process with the default configuration is not actually supported or tested. You'll find that any features that persists data to disk either don't work, or don't work as expected (ie. localstorage, indexeddb, sessions, etc).
https://github.com/electron/electron/issues/2493
You can work around this by changing the data directory for each instance so they don't trample over each other but this is likely to use a lot of disk space and you'd need a way to keep track of all these data directories.
A single main process with multiple renderers is nearly always the answer.

How to prioritize express requests/responds over other intensive server related tasks

My node application currently has two main modules:
a scraper module
an express server
The former is very server intensive task which indefinately runs in a loop. It scrapes information from over more than 100 urls, crunches the data and puts it into a mongodb database (using mongoose). This process runs over and over and over. :P
The latter part, my express server, responds to http/socket get requests and returns the crunched data which was written to the db by the scraper to the requesting client.
I'd like to optimize the performance of my server so that the express requests and responds get prioritized over the server intensive task(s). A client should be able to get the requested data asap, without having the scraper eat up all of my server resources.
I though about putting the server intensive task or the express server into its own thread, but then I stumbled upon cluster, and child processes; and now I'm totally confused which approach would be the right one for my situation.
One of the benefits I'm having is that there is a clear seperation between the writing part of my application and the reading part. The scraper writes stuff to the db, express reads from the db (no post/put/delete/...) calls are exposed. So, I -guess- I won't run into threading problems with different threads trying to write to the same db.
Any good suggestions? Thanks in advance!
Resources like cpu and memory required by processes are managed by the operative system. You should not waste your time writing that logic within your source code.
I think you should look at the problem from outside your source code files. Once they ran they are processes. Processes are managed, as I said, by the OS.
Firstly I would split that on two separate commands.
One being the scraper module (eg npm run scraper, that runs something like node scraper.js).
The other one being your express server (eg npm start, that runs something like node server.js).
This approach will let you configure that within your OS or your cluster.
A rapid approach for that will be to use docker.
With two docker containers running your projects with cpu usage limitations. This is fairly easy to do and does not require for you to lift a new server... and at the same time it provides the
isolation level you need to scale it to many servers in the future.
Steps to do this:
Learn a little about docker and docker compose and install them in your server
Build a docker image for your application (you can upload it to a free private image that docker hub gives you for free)
Build a docker compose for your two services using that image, with the cpu configuration you need (you can set both cpu and memory limits easily)
An alternative to that would be running the two commands (npm run scraper and npm start) with some tool like cpulimit, nice/niceness and ionice, or something else like namespaces and cgroups manually (but docker does that for you).
PD: Also, I would recommend to rethink your backend process. Maybe it's better to run it every 12 hours or something like that, instead of all the time, and you may run it from within cron instead of a loop.

Node.js Clusters with Additional Processes

We use clustering with our express apps on multi cpu boxes. Works well, we get the maximum use out of AWS linux servers.
We inherited an app we are fixing up. It's unusual in that it has two processes. It has an Express API portion, to take incoming requests. But the process that acts on those requests can run for several minutes, so it was build as a seperate background process, node calling python and maya.
Originally the two were tightly coupled, with the python script called by the request to upload the data. But this of course was suboptimal, as it would leave the client waiting for a response for the time it took to run, so it was rewritten as a background process that runs in a loop, checking for new uploads, and processing them sequentially.
So my question is this: if we have this separate node process running in the background, and we run clusters which starts up a process for each CPU, how is that going to work? Are we not going to get two node processes competing for the same CPU. We were getting a bit of weird behaviour and crashing yesterday, without a lot of error messages, (god I love node), so it's bit concerning. I'm assuming Linux will just swap the processes in and out as they are being used. But I wonder if it will be problematic, and I also wonder about someone getting their web session swapped out for several minutes while the longer running process runs.
The smart thing to do would be to rewrite this to run on two different servers, but the files that maya uses/creates are on the server's file system, and we were not given the budget to rebuild the way we should. So, we're stuck with this architecture for now.
Any thoughts now possible problems and how to avoid them would be appreciated.
From an overall architecture prospective, spawning 1 nodejs per core is a great way to go. You have a lot of interdependencies though, the nodejs processes are calling maya which may use mulitple threads (keep that in mind).
The part that is concerning to me is your random crashes and your "process that runs in a loop". If that process is just checking the file system you probably have a race condition where the nodejs processes are competing to work on the same input/output files.
In theory, 1 nodejs process per core will work great and should help to utilize all your CPU usage. Linux always swaps the processes in and out so that is not an issue. You could start multiple nodejs per core and still not have an issue.
One last note, be sure to keep an eye on your memory usage, several linux distributions on EC2 do not have a swap file enabled by default, running out of memory can be another silent app killer, best to add a swap file in case you run into memory issues.

Node.js cluster module: sharing cores with other processes without overlap

Suppose I have a 16 core machine and I want a Node.js webserver to use 8 cores, and I want a Java REST/API server to use the other 8. The idea being that Node would be making API calls over localhost, even though the API server may handle requests coming from elsewhere.
I'm using Node's cluster module to accomplish this; basically, doing cluster.fork() 8x. I assume Java has its own ways of utilizing X cores, but others are writing that piece of software.
Is there anything I need to do, from a node perspective, to ensure that my code doesn't run on any of the same cores Java is running on? It seems simple enough to run fork() n/2 times, where n = numCpus, and if that's all I need to worry about, great, but I wanted to ask this question to make sure I'm not missing something important.
It is up the the kernel scheduler to manage this, but you can potentially give the kernel some hints by setting the processor affinity. For example, on linux, you might be able to use taskset to set process affinity for each of the started processes.

Resources