Distributed computing, running code on one server on start - node.js

I am running a nodejs server on heroku, using the throng package to spin up workers to process api requests. I am also using more than on dyno on heroku. For cron and job processing, I use bull queue to distribute the load across servers, but I came across something I am not sure how to do in this distributed environment.
I want to have only one server execute code immediately on startup. In this case, I want to open up a change stream listener for mongodb. But I only want to do this on one worker on one server, not every server.
I am not sure how to do this running in heroku, any suggestions?

Related

Node Worker Threads vs Heroku Workers

I'm trying to understand difference between Node Worker Threads vs Heroku Workers.
We have a single Dyno for our main API running Express.
Would it make sense to have a separate worker Dyno for our intensive tasks such as processing a large file.
worker: npm run worker
Some files we process are up to 20mb and some processes take longer than the 30s limit to run so kills the connection before it comes back.
Then could I add Node Worker Threads in the worker app to create child processes to handle the requests or is the Heroku worker enough on its own?
After digging much deeper into this and successfully implementing workers to solve the original issue, here is a summary for anyone who comes across the same scenario.
Node worker threads and Heroku workers are similar in that they intend to run code on separate threads in Node that do not block the main thread. How you use and implement them differs and depends on use case.
Node worker threads
These are the new way to create clustered environments on NODE. You can follow the NODE docs to create workers or use something like microjob to make it much easier to setup and run separate NODE threads for specific tasks.
https://github.com/wilk/microjob
This works great and will be much more efficient as they will run on separate worker threads preventing I/O blocking.
Using worker threads on Heroku on a Web process did not solve my problem as the Web process still times out after a query hits 30s.
Important difference: Heroku Workers Do not!
Heroku Workers
These are separate virtual Dyno containers on Heroku within a single App. They are separate processes that run without all the overhead the Web process runs, such as http.
Workers do not listen to HTTP requests. If you are using Express with NODE you need a web process to handle incoming http requests and then a Worker to handle the jobs.
The challenge was working out how to communicate between the web and worker processes. This is done using Redis and Bull Query together to store data and send messages between the processes.
Finally, Throng makes it easier to create a clustered environment using a Procfile, so it is ideal for use with Heroku!
Here is a perfect example that implements all of the above in a starter project that Heroku has made available.
https://devcenter.heroku.com/articles/node-redis-workers
It may make more sense for you to keep a single dyno and scale it up, which means multiple instances will be running in parallel.
See https://devcenter.heroku.com/articles/scaling

node.js with child processes in docker

I have a node.js web application that runs on my amazon aws server using nginx and pm2. The application processes files for the user, which is done using a job system and child processes. In short, when the application starts via pm2, i create a child process for each cpu core of the server. Each child process (worker) then completes jobs from the job queue.
My question is, could i replicate this in docker or would i need to modify it somehow. One assumption i had was that i would need to create a container for the database, one container for the application, and then multiple worker containers to do the processing, so that if one crashes i just spin up another worker.
I have been doing research online, including a udemy course to get my head around this stuff, but i haven't come across an example or something i can relate to my problem/question.
Any help, reading material or suggestions would be greatly appreciated.
Containers run at the same performance level as the host OS. There is no process performance hit. I created a whitepaper with Docker and HPE on this.
You wouldn't use pm2 or nodemon, which are meant to start multiple processes of your node app and restart them if they fail. That's the job of Docker now.
If in Swarm, you'd just increase the replica count of your service to be similar to the number of CPU/threads you'd want to run at the same time in the swarm.
I don't mention the nodemon/pm2 thing for Swarm in my node-docker-good-defaults so I'll at that as an issue to update it for.

Nodejs and batch jobs

I have an server side nodejs express app that responds to requests from front end clients. I need to implement a batch job that will run every hour. If I implement the batch job in the same service, does it mean that the service is 'occupied' until the cron job is completed and it will not be able to serve any requests?
Should I create a separate service instead that will run the batch job?
If your batch job does not occupy the cpu 100%, then the server will still serve requests. Every time you do async io or wait for timers, there is plenty of time for the express root routine to deal with requests.
I don't know what your job and server are doing, look at it from Separation of Concerns. If the job scheduler and server belong together, then implement them together, otherwise I recommend to make two services out of it.
Have worked with a lot batch jobs using JAVA programming language and have processed and moved Millions of rows data as ETL tool. And have experience with Core NODE framework with Linux crone schedulers Its works much more faster then previous JAVA operations with low level servers while with java have used heavy configuration server still was facing memory issues.
You can implement crone jobs using NODEJS you have to make SH files and command with node to run JS files directly and call your DBs connection or CSVs to load or transfer.

How to prioritize express requests/responds over other intensive server related tasks

My node application currently has two main modules:
a scraper module
an express server
The former is very server intensive task which indefinately runs in a loop. It scrapes information from over more than 100 urls, crunches the data and puts it into a mongodb database (using mongoose). This process runs over and over and over. :P
The latter part, my express server, responds to http/socket get requests and returns the crunched data which was written to the db by the scraper to the requesting client.
I'd like to optimize the performance of my server so that the express requests and responds get prioritized over the server intensive task(s). A client should be able to get the requested data asap, without having the scraper eat up all of my server resources.
I though about putting the server intensive task or the express server into its own thread, but then I stumbled upon cluster, and child processes; and now I'm totally confused which approach would be the right one for my situation.
One of the benefits I'm having is that there is a clear seperation between the writing part of my application and the reading part. The scraper writes stuff to the db, express reads from the db (no post/put/delete/...) calls are exposed. So, I -guess- I won't run into threading problems with different threads trying to write to the same db.
Any good suggestions? Thanks in advance!
Resources like cpu and memory required by processes are managed by the operative system. You should not waste your time writing that logic within your source code.
I think you should look at the problem from outside your source code files. Once they ran they are processes. Processes are managed, as I said, by the OS.
Firstly I would split that on two separate commands.
One being the scraper module (eg npm run scraper, that runs something like node scraper.js).
The other one being your express server (eg npm start, that runs something like node server.js).
This approach will let you configure that within your OS or your cluster.
A rapid approach for that will be to use docker.
With two docker containers running your projects with cpu usage limitations. This is fairly easy to do and does not require for you to lift a new server... and at the same time it provides the
isolation level you need to scale it to many servers in the future.
Steps to do this:
Learn a little about docker and docker compose and install them in your server
Build a docker image for your application (you can upload it to a free private image that docker hub gives you for free)
Build a docker compose for your two services using that image, with the cpu configuration you need (you can set both cpu and memory limits easily)
An alternative to that would be running the two commands (npm run scraper and npm start) with some tool like cpulimit, nice/niceness and ionice, or something else like namespaces and cgroups manually (but docker does that for you).
PD: Also, I would recommend to rethink your backend process. Maybe it's better to run it every 12 hours or something like that, instead of all the time, and you may run it from within cron instead of a loop.

Proper implementation of node.js worker process on Heroku Cedar

I have a process that needs to be run periodically - on a Rails app, it would be a worker process. Is there an equivalent for node.js on Heroku?
I'm currently using node-cron to run the periodic process on the same server as my web application. Issues here are:
With only 1 web process, it won't run when the server idles
It will block incoming connections while running
When scaling, the process doesn't need to be run on multiple servers
If it is the case that Heroku simply does not yet handle this, I'm interested in seeing other Node PAAS providers solution here.
you must have missed the new Heroku Cedar app that does all of what you want - read more at
http://devcenter.heroku.com/articles/node-js

Resources