Replicating a CherryPy process on a multi-core server

I have a CherryPy app that sits behind nginx (reverse proxy) and handles CPU-intensive requests. Since CherryPy's thread pool model doesn't really help with parallelism (because of the GIL), how do I replicate the CherryPy process per core to utilize all my server's cores? And how do I handle the load balancing? I prefer not to add middleware, but if it's not possible otherwise, I will.

You can either start multiple CherryPy servers, one per available core, or deploy your CherryPy application in a WSGI container such as uWSGI.
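For the first option, a minimal sketch, assuming the hypothetical port numbers and app class below: run one copy of the script per core, each bound to its own port, and let an nginx upstream block round-robin across them, which also covers the load-balancing part without extra middleware.

    # serve.py - hedged sketch: one CherryPy process per core, each on its own port.
    # Start e.g. "python serve.py 8081" through "python serve.py 8084" and point an
    # nginx upstream at 127.0.0.1:8081-8084.
    import sys
    import cherrypy

    class App:
        @cherrypy.expose
        def index(self):
            return "hello"

    if __name__ == "__main__":
        cherrypy.config.update({
            "server.socket_host": "127.0.0.1",
            "server.socket_port": int(sys.argv[1]),  # port passed on the command line
        })
        cherrypy.quickstart(App())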

Related

One Node.js instance or multiple, on a single-core VPS

I'm confused about this.
Let's assume I have a 1-core VPS with a Node.js server running.
Now, I launch another Node.js instance and a load balancer to distribute requests (on the same VPS).
Will performance increase because I will have two Node.js servers sharing the work?
Or will it decrease because one node is already enough to handle all the requests, so adding another one plus the load balancer will just consume more of the VPS?
If you create more instances than you have CPUs, then while one instance has an active process running, the other instances will compete for CPU to serve any incoming request, and that leads to spending more CPU than you save in time. The difference may be negligible, but having the same number of instances as cores will give better performance.

What is a preferred production setup for Flask-SocketIO? Confused about Gunicorn as it can only be spawned with one worker

From the deployment section of the documentation, Gunicorn can only work with one worker process with Flask-SocketIO. I was wondering what the preferred way to deploy a Flask-SocketIO server is. Currently I have a regular Flask app served by a multi-worker Gunicorn server that nginx proxy-passes to. While I don't have any load balancing, I expect concurrency to be handled by the multiple Gunicorn workers, and not having that for the WebSockets server concerns me a little.
Maybe I misunderstand the way eventlets/greenlets function, but I see uWSGI as the only other alternative that I have not explored. Is it worth getting into the learning curve of uWSGI for this purpose?
Both Gunicorn and uWSGI have very limited built-in load balancers that do not support the sticky sessions Flask-SocketIO requires.
If you want to use multiple workers with either of these frameworks, you need to start several single-worker servers on their own ports and then put nginx in front as the load balancer.
In addition to nginx, you need to add a message queue (RabbitMQ, Redis, etc.) that all of these processes can use to coordinate their efforts.
See the documentation on deploying multiple servers here: https://flask-socketio.readthedocs.io/en/latest/deployment.html#using-multiple-workers
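As a rough sketch of that layout (the module name app:app, the ports, and the Redis URL are all assumptions to adapt):

    # app.py - hedged sketch of one single-worker Flask-SocketIO server.
    # Start one per core, each on its own port, for example:
    #   gunicorn --worker-class eventlet -w 1 -b 127.0.0.1:5001 app:app
    #   gunicorn --worker-class eventlet -w 1 -b 127.0.0.1:5002 app:app
    # then have nginx balance across those ports with ip_hash for stickiness.
    from flask import Flask
    from flask_socketio import SocketIO

    app = Flask(__name__)
    # The message queue is what lets the separate processes coordinate
    # broadcasts; the Redis URL here is an assumption.
    socketio = SocketIO(app, message_queue="redis://127.0.0.1:6379/0")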

Which gunicorn worker type to use for machine learning inference?

gunicorn version: 19.9.0
python version: 3.7.0
We have a named entity recognition (NER) machine-learning inference application built with Flask + Gunicorn (no nginx).
The app receives a request with the details of a document stored in the cloud. It fetches the document from the cloud, runs NER on it, stores the results in the cloud (if it succeeds), and sends a success/failure response to the client. The SLA is that the client receives a response within a minute of sending a request. The NER task runs multiple models in parallel using Python multiprocessing and is quite CPU-intensive.
We are currently using a single Gunicorn 'sync' worker with 7 threads. We are using only one worker, since the models take up a lot of memory. This setup mostly works alright, except that some threads just vanish in the middle of processing a request, after fetching a document from the cloud (no errors in the logs), which forces us to restart Gunicorn every few hours.
According to the gunicorn docs:
The default synchronous workers assume that your application is resource-bound in terms of CPU and network bandwidth. Generally this means that your application shouldn't do anything that takes an undefined amount of time. An example of something that takes an undefined amount of time is a request to the internet.
Our app does make requests to the internet (cloud storage) and is also CPU-intensive.
What's the best worker class to use in this situation? Also, is there a better workers+threads combination?
When you use the sync worker with 7 threads, Gunicorn automatically converts it to the gthread worker type; the presence of a threads setting makes Gunicorn switch the worker class.
You will likely see more improvement from using more workers. If your models are too big to load in every worker, consider TensorFlow Serving (https://www.tensorflow.org/tfx/serving/docker). Aside from other benefits like request batching, TF Serving loads and runs the models in a separate tensorflow_model_server process, so only one copy of the ML models is needed. That frees up your Gunicorn server, so you can spin up more Gunicorn workers without worrying about each worker loading its own models into memory.
Now that you can use more workers, first try more sync workers with no threads setting; the sync worker is supposed to be good for CPU-bound applications.
Then try gevent workers, again with no threads setting; the gevent worker uses pseudo-threads and is supposed to be very good for I/O-bound applications.
Compare the results with sync workers and gevent workers and see which one is better.
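As a concrete starting point for that comparison, a hedged gunicorn.conf.py sketch (the bind address and per-core worker count are assumptions to tune):

    # gunicorn.conf.py - sketch for the sync run; change worker_class to
    # "gevent" (with gevent installed) for the I/O-bound comparison run.
    import multiprocessing

    bind = "0.0.0.0:8000"
    workers = multiprocessing.cpu_count()  # one worker per core as a baseline
    worker_class = "sync"                  # note: no "threads" setting, per the above

Start it with gunicorn -c gunicorn.conf.py app:app (module name assumed) and compare throughput and latency between the two runs.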

Node Worker Threads vs Heroku Workers

I'm trying to understand the difference between Node worker threads and Heroku workers.
We have a single dyno for our main API running Express.
Would it make sense to have a separate worker dyno for our intensive tasks, such as processing a large file?
worker: npm run worker
Some files we process are up to 20 MB, and some jobs take longer than the 30-second limit to run, so the connection is killed before the response comes back.
Could I then add Node worker threads in the worker app to create child processes to handle the requests, or is the Heroku worker enough on its own?
After digging much deeper into this and successfully implementing workers to solve the original issue, here is a summary for anyone who comes across the same scenario.
Node worker threads and Heroku workers are similar in that both aim to run code off the main thread so it is not blocked. How you use and implement them differs and depends on the use case.
Node worker threads
These are the new way to create clustered environments in Node. You can follow the Node docs to create workers, or use something like microjob to make it much easier to set up and run separate Node threads for specific tasks.
https://github.com/wilk/microjob
This works great and is much more efficient, as the jobs run on separate worker threads and stop I/O from blocking the main thread.
Using worker threads in a Heroku web process did not solve my problem, though, as the web process still times out once a request hits 30s.
Important difference: Heroku workers do not!
Heroku Workers
These are separate virtual dyno containers on Heroku within a single app. They are separate processes that run without all the overhead the web process carries, such as HTTP.
Workers do not listen for HTTP requests. If you are using Express with Node, you need a web process to handle incoming HTTP requests and a worker to handle the jobs.
The challenge was working out how to communicate between the web and worker processes. This is done using Redis and Bull Queue together to store data and send messages between the processes.
Finally, Throng makes it easier to create a clustered environment using a Procfile, so it is ideal for use with Heroku!
Here is a perfect example that implements all of the above in a starter project that Heroku has made available.
https://devcenter.heroku.com/articles/node-redis-workers
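For reference, the web/worker split described above comes down to a two-line Procfile; the script names here are hypothetical:

    web: node server.js
    worker: node worker.js

Heroku runs each line as its own dyno type, so the worker dyno can churn through queued jobs from Redis without being subject to the 30-second HTTP timeout that applies to the web dyno.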
It may make more sense for you to keep a single dyno and scale it up, which means multiple instances will be running in parallel.
See https://devcenter.heroku.com/articles/scaling

Node.js on a multi-core system

"Node.js is limited to a single thread". how the nodeJS will react when we are deploying in Multi-Core systems? will it boost the performance?
The JavaScript running in the Node.js V8 engine is single-threaded, but the underlying libuv multi-platform support library is multi-threaded, and those threads are distributed across the CPU cores by the operating system according to its scheduling algorithm. So even with your JavaScript application running asynchronously (and single-threaded) at the top level, you still benefit from multiple cores under the covers.
As others have mentioned, the Node.js cluster module is an excellent way to exploit multiple cores for concurrency at the application (JavaScript/V8) level, and since Express is cluster-aware, you can have multiple worker processes executing server logic concurrently without needing a unique listening port for each process. Impressive.
As others have mentioned, you will need Redis or an equivalent to share data among the cluster worker processes. You will also want a logging facility that is cluster-aware, so that the cluster master and all worker processes can log to a single shared log file. The Node log4node module is a good choice here, and it works with logrotate.
Typical web examples show using the number of cores detected at runtime as the number of cluster worker processes to fork, but I prefer to make that a configuration option in a config.yaml file so I can tune the number of worker processes running the main JavaScript application as needed.
Node.js runs in one thread, but you can start multiple Node.js processes.
If you are, for example, building a web server, you can route every request to one of the Node.js processes.
Edit: As hereandnow78 and vkurchatkin suggested, maybe the best way to use the power of a multi-core system is the Node.js cluster module.
The cluster module is the solution.
But you need to know that the Node.js cluster works by forking child processes, which means each process cannot share data with the others directly.
To share data, you need to use Redis or another IMDG (in-memory data grid) to share data across the cluster nodes.
