Sharing single file between gunicorn workers - python-3.x

I'm trying to create a Flask application with gunicorn. I set the number of gunicorn workers to (multiprocessing.cpu_count() * 2) + 1, as the documentation recommends.
The problem is that my Flask application needs to write to a single file whenever an HTTP request comes in. If more than one worker writes to it at the same time, the application throws errors.
Is it possible to define some lock between gunicorn workers?
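One common way to serialize writers across gunicorn worker processes is an OS-level file lock such as fcntl.flock (gunicorn is Unix-only, so fcntl is available). A minimal sketch, not from the original post; the path and route are made up for illustration:

import fcntl
from flask import Flask, request

app = Flask(__name__)
SHARED_FILE = "/tmp/shared.log"  # hypothetical path, adjust to your setup

@app.route("/write", methods=["POST"])
def write():
    # Append mode: each worker process holds its own file handle.
    with open(SHARED_FILE, "a") as f:
        # flock is an advisory OS-level lock, so it serializes writers
        # across separate worker processes, not just threads.
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            f.write(request.get_data(as_text=True) + "\n")
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
    return "ok"

A cross-platform alternative is the third-party filelock package, which wraps the same idea.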

Related

Cloud Run Qs :: max-instances + concurrency + threads (gunicorn thread)

(I'm learning Cloud Run and acknowledge this is not strictly development or code related, but I'm hoping some GCP engineer can clarify this)
I have a Python application running with gunicorn + Flask... just a PoC for now, that's why the configuration is minimal.
The Cloud Run deploy has the following flags:
--max-instances 1
--concurrency 5
--memory 128Mi
--platform managed
The gunicorn_cfg.py file has the following configuration:
workers=1
worker_class="gthread"
threads=3
I'd like to know:
1) max-instances :: if I were to adjust this, does that mean a new physical server machine is provisioned whenever needed? Or does the service achieve it by pulling the container image and simply starting a new container instance (docker run ...) on the same physical machine, effectively sharing that machine with other container instances?
2) concurrency :: does one running container instance receive multiple concurrent requests (e.g. 5 concurrent requests processed by 3 running container instances)? Or does each concurrent request trigger the start of a new container instance (docker run ...)?
3) lastly, can I effectively reach concurrency > 5 by adjusting the gunicorn thread settings? For ex. 5x3=15 in this case, i.e. 15 concurrent requests being served by 3 running container instances? If that's true, are there any pros/cons to adjusting threads vs adjusting the Cloud Run concurrency?
additional info:
- It's an I/O-intensive application (not CPU-intensive). It simply grabs the HTTP request and publishes to Pub/Sub.
thanks a lot
First of all, it's not appropriate on Stack Overflow to ask "cocktail questions" where you ask 5 things at a time. Please limit yourself to 1 question at a time in the future.
You're not supposed to worry about where containers run (physical machines, VMs, ...). --max-instances limits the number of container instances that your app is allowed to scale to. This is to prevent ending up with a huge bill if someone maliciously sends too many requests to your app.
This is documented at https://cloud.google.com/run/docs/about-concurrency. If you specify --concurrency=10, your container can be routed up to 10 in-flight requests at a time, so make sure your app can handle 10 requests at a time.
Yes, read the Gunicorn documentation. Test locally whether your settings let gunicorn handle 5 requests at the same time... Cloud Run's --concurrency setting just ensures you don't get more than 5 requests to 1 container instance at any moment.
I also recommend you read the official docs more thoroughly before asking, and perhaps also the cloud-run-faq, which pretty much answers all of these.
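For reference, with --concurrency 5 the gunicorn_cfg.py above would need at least 5 threads per container so that no request waits for a free thread; a sketch, assuming the gthread worker class from the question:

workers = 1
worker_class = "gthread"
threads = 5  # at least Cloud Run --concurrency, so 5 in-flight requests each get a thread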

How to process several HTTP requests with Flask

I have a question regarding Flask, Waitress and parallel processing of HTTP requests.
I have read that Flask alone can only process one HTTP request at a time.
In the table below I have put all the possible configurations, and I would like your feedback on the number of HTTP requests that I can process in parallel.
| | Only Flask | Flask and Waitress |
|------------------------|------------|--------------------|
| 1 CPU & 1 core | 1 request | 1 request |
| 1 CPU & 4 cores | 1 request | 4 requests |
| 2 CPUs & 1 core each | 1 request | 2 requests |
| 2 CPUs & 4 cores each | 1 request | 8 requests |
I ask these questions because a colleague told me that we can process several thousand HTTP requests with an Apache server using only 1 CPU and 1 core!
So, how should I handle the maximum number of HTTP requests in parallel?
Let me clear up the confusion for you.
When you develop locally with Flask, you use the built-in server, which is single-threaded. That means it will only process one request at a time. This is one of the reasons why you shouldn't simply set FLASK_ENV=production and run it in a production environment: the built-in server is not capable of running in those environments. Once you change FLASK_ENV to production and run it, you'll find a warning in the terminal.
Now, coming on to how to run Flask in a production environment: CPUs, cores, threads and other stuff.
To run Flask in a production environment, you need a proper application server that can run your Flask application. Here Gunicorn comes in; it is compatible with Flask and one of the most sought-after ways of running Flask.
In gunicorn, you have different ways to configure an optimal setup based on the specs of your server.
You can tune it via the following:
Worker class - the type of worker to use
Number of workers
Number of threads
The way you calculate the maximum number of concurrent requests is as follows:
Taking a 4-core server as an example:
As per the gunicorn documentation, the optimal number of workers is suggested as (2 * num_of_cores) + 1, which in this case becomes (2*4)+1 = 9.
The suggested number of threads is 2 to 4 x num_of_cores, which at the upper end comes out to 4*4 = 16.
So now you have 9 workers with 16 threads each. Each thread can handle one request at a time, so you can have 9*16 = 144 concurrent connections.
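Putting those numbers into an actual invocation (myapp:app is a placeholder for your module and application object):

gunicorn myapp:app --worker-class gthread --workers 9 --threads 16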
The calculation is similar for Waitress, but I prefer Gunicorn, so you'll need to check out the Waitress docs for its configuration.
Now coming to Web Servers
Until now, what you have configured is an application server to run Flask. This works, but you shouldn't expose an application server directly to the internet. Instead, it's always suggested to deploy Flask behind a reverse proxy like Nginx. Nginx acts as a full-fledged web server capable of handling real-world workloads.
So, in a gist, you could use a combination from the list below as per your requirements:
Flask + Application Server + Web Server, where
the application server is one of Gunicorn, uWSGI, Gevent, Twisted Web, Waitress, etc., and the web server is one of Nginx, Apache, Traefik, Caddy, etc.

NodeJS Express - Send specific routes to specific cluster workers?

I have a central API server starting cluster worker instances. Each instance has a specific bigger job, and there might be manipulations I want to do on that specific instance only. This was the rough idea I had in mind:
API Server with express, master process
instance 1: GET /instances/1/*
instance 2: GET /instances/2/*
Each instance is a separate worker process, and I was hoping I could delegate all API requests for a specific worker directly to that worker (to execute functions in that worker).
instance/:id does represent the WorkerID.
The client might request the logs where workerID = x, so GET /instances/x/logs.
The goal here is that master routes all requests for instance X to the sub-process identified as x.
This is not for load distribution across workers that are essentially clones/mirrors.
Each of my workers may be performing a variation of a long-running task (days, weeks, or months long). Methods are shared across all workers, but if I call /instances/x/logs I only want to query that on that specific worker process. That's what I'm trying to figure out.
// route these to subprocess x
GET /instances/x/logs
POST /instances/x/settings
// route these to subprocess y
GET /instances/y/logs
POST /instances/y/settings
// route these to subprocess z
GET /instances/z/logs
POST /instances/z/settings
// make a new worker process, returns worker ID as reference
POST /instances/
I saw that I can have multiple express listeners on the same port across different processes, but if I understood correctly, requests are then load-balanced across the workers automatically. I can't route specific requests to specific workers based on the path, can I?
Each instance is a separate worker process, and I was hoping I could delegate all API requests for specific worker, directly to the worker (to execute functions in that worker).
Indeed you can do that, but unless your instance/:id represents the worker ID, you've hit a dead end.
Let's assume the following example, where :id is not a worker id:
W - Worker
W1 /instances/1/:method - has the following methods: names, cities, cars
W2 /instances/2/:method - has the following methods: names, fruits, stats
The HTTP client will want to access:
GET /instances/1/names - that's great, names exists on both workers. - TRUE
GET /instances/2/fruits - fruits exists at this path on W2, BUT if the balancer serves the request to W1 you'll get an error, because fruits doesn't exist on W1. - FALSE
Final answer:
You cannot make workers pop up and serve your will; the best you can do is set up some communication between the master and the workers, or, for methods that require heavy processing, trigger them on whichever worker is least loaded based on CPU usage. But look at the good part: if a worker dies, you can fork a new one without crashing your whole app.
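To make "communication between master & workers" concrete, here is a rough sketch of the routing idea. It is deliberately in Python with multiprocessing rather than the Node cluster module, to match the other examples on this page; every name in it is made up:

import multiprocessing as mp

def worker(worker_id, conn):
    # Each worker owns its own state, e.g. its own logs.
    logs = [f"worker {worker_id} started"]
    while True:
        method = conn.recv()  # the master routes one request here
        if method == "logs":
            conn.send(logs)
        else:
            conn.send(f"unknown method {method!r}")

if __name__ == "__main__":
    # The master keeps one duplex pipe per worker, keyed by worker id,
    # so a path like /instances/x/logs maps to pipes["x"].
    pipes = {}
    for wid in ("x", "y"):
        parent, child = mp.Pipe()
        mp.Process(target=worker, args=(wid, child), daemon=True).start()
        pipes[wid] = parent

    # GET /instances/x/logs would translate to:
    pipes["x"].send("logs")
    print(pipes["x"].recv())  # ['worker x started']

In Node the equivalent would be cluster.fork() plus worker.send() and process.on('message'), keeping a map from your instance id to the worker handle.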

Falcon/gunicorn code initialization (run once) with multiple workers (singleton?)

I'm using Falcon 1.4.1 and Gunicorn 19.9.0 inside Docker.
I'm having trouble figuring out the best way to initialize the application - running some code once when my REST API starts, instead of once per worker. I have 3 or more workers running for my application.
I've tried using the gunicorn on_starting server hook, but it still ran once per worker. In my gunicorn_conf.py file:
def on_starting(server):
    print('Here I am')
I also tried the gunicorn preload_app setting, which I'm happily using in production now and which does allow application initialization to run once before the workers start.
But I also want to use the gunicorn reload setting so that file changes restart the application, and that directly conflicts with the preload_app setting.
Maybe I just want too much :) Anyone have any ideas? I saw some attempts to get a lock file with multiprocessing, but it turns out you get one lock file per worker.
I am not able to understand properly what exactly you want to achieve. It would help if you posted the error output as well.
As you mention, you are able to run your code once using the Gunicorn preload_app setting instead of once per worker.
Now, you can reload Gunicorn instances on file change using the following command:
gunicorn --workers 3 -b localhost:5000 main:app --reload
If this is not what you are looking for, then share the error output here, since you mention "I saw some attempts to get a lock file with multiprocessing, but turns out you get a lockfile/worker." I will try my best to help you.
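If preload_app and reload really cannot coexist, another pattern is to let all workers race for a non-blocking exclusive lock on one fixed path, so only the winner runs the initialization. A sketch under that assumption (run_expensive_initialization is a hypothetical placeholder for your run-once code):

import fcntl

_lock_handle = None  # keep a reference so the lock lives as long as the worker

def init_once():
    global _lock_handle
    # Every worker opens the SAME fixed path; tempfile would give each
    # worker its own file and hence its own (useless) lock.
    f = open("/tmp/app_init.lock", "w")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        f.close()
        return  # another worker won the race; skip initialization
    _lock_handle = f
    run_expensive_initialization()  # hypothetical run-once code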

parallel requests with gunicorn on heroku

I've pushed an I/O-bound gunicorn/Flask service to Heroku. The Heroku docs advise either increasing the number of gunicorn workers or using an async worker class such as gevent. I tried the following Procfiles, but the service still handles the file upload requests serially. I've added no application-level locks.
Multiple processes Procfile:
web: gunicorn service:app --log-file=- --workers 4
Multiple threads Procfile:
web: gunicorn service:app --log-file=- --threads 4 --worker-class gevent
All the service does is receive a JSON request, deserialize it, and upload the binary to S3. The logs suggest the limiting factor is that each request is handled only after the previous one has completed.
Is there something inherent to heroku or to flask that prevents multiple requests being handled in parallel?
AFAIK the code is agnostic to the number of workers, but is it also agnostic to the number of threads? Or should I add some kind of support for that in the code?
