I've pushed an I/O-bound gunicorn/Flask service to Heroku. The Heroku docs advise either increasing the number of gunicorn workers or using an async worker class such as gevent. I tried the following Procfiles, but the service still handles file upload requests serially. I've added no application-level locks.
Multiple processes Procfile:
web: gunicorn service:app --log-file=- --workers 4
Multiple threads Procfile:
web: gunicorn service:app --log-file=- --threads 4 --worker-class gevent
All the service does is receive a JSON request, deserialize it, and upload the binary to S3. The logs suggest the limiting factor is that each request is handled only after the previous one has completed.
Is there something inherent to Heroku or to Flask that prevents multiple requests from being handled in parallel?
AFAIK the code is agnostic to the number of workers, but is it also agnostic to the number of threads? Or should I add some kind of support in the code?
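For context, a minimal sketch (the file and endpoint names are hypothetical, not from the service above) that can be used to check whether a given Procfile actually serves requests in parallel; a deliberately slow handler stands in for the S3 upload:

# slowcheck.py -- hypothetical test app; with a working concurrency
# setup, N parallel requests to /slow should all finish in roughly
# 5 seconds total, not N * 5 seconds.
import time
from flask import Flask

app = Flask(__name__)

@app.route('/slow')
def slow():
    time.sleep(5)  # stand-in for the S3 upload
    return 'done'

Running this with gunicorn slowcheck:app --workers 4 and firing several curl requests at once makes serial-vs-parallel behavior easy to see in the timings.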
Related
In terms of Gunicorn workers and threads, I need to know if there is a maximum number of threads that can be assigned to a single worker. I have a Flask application for the UI part of a trading bot. It allows the user to create new instances of a trading bot, and when a new instance is created, it is added as a new thread. There is a bot manager that allows stopping and starting bot instances as threads and also keeps track of running threads.
The app is dockerized. Gunicorn runs with a single worker (since workers don't share memory between them, or is there a way for the bot manager to speak with other workers?). How many threads can I start on a single worker? Should I specify the number of threads for gunicorn?
Currently gunicorn is fired up with following command:
gunicorn app:application --worker-tmp-dir /dev/shm --bind 0.0.0.0:8000 --timeout 600 --workers 1
Can I start, let's say, 8 threads on a single worker? What benefits would that bring?
gunicorn app:application --worker-tmp-dir /dev/shm --bind 0.0.0.0:8000 --timeout 600 --workers=1 --threads=8
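As an aside, a minimal sketch (names are hypothetical, not taken from the question's code) of how a thread-based bot manager typically lives inside a single worker process, using an Event to stop a bot cooperatively:

import threading

class BotManager:
    def __init__(self):
        self._bots = {}  # bot name -> (thread, stop_event)

    def start_bot(self, name, run_loop):
        stop = threading.Event()
        t = threading.Thread(target=run_loop, args=(stop,), daemon=True)
        self._bots[name] = (t, stop)
        t.start()

    def stop_bot(self, name):
        thread, stop = self._bots.pop(name)
        stop.set()       # ask the bot loop to exit
        thread.join()    # wait for it to finish

Because everything lives in one process, the manager can see and control every bot thread; with multiple workers, each process would have its own separate manager.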
(I'm learning Cloud Run; I acknowledge this is not development- or code-related, but I'm hoping some GCP engineer can clarify this.)
I have a Python application running gunicorn + Flask... just a PoC for now, which is why the configuration is minimal.
The gcloud run deploy command has the following flags:
--max-instances 1
--concurrency 5
--memory 128Mi
--platform managed
The gunicorn_cfg.py file has the following configuration:
workers=1
worker_class="gthread"
threads=3
I'd like to know:
1) max-instances :: if I were to adjust this, does that mean a new physical server machine is provisioned whenever needed? Or does the service achieve it by pulling a container image and simply starting a new container instance (docker run ...) on the same physical server machine, effectively sharing that machine with other container instances?
2) concurrency :: does one running container instance receive multiple concurrent requests (e.g. 5 concurrent requests processed by 3 running container instances)? Or does each concurrent request trigger the start of a new container instance (docker run ...)?
3) lastly, can I effectively reach concurrency > 5 by adjusting the gunicorn thread settings? For example, 5x3=15 in this case: 15 concurrent requests being served by 3 running container instances? If so, are there any pros/cons to adjusting threads vs. adjusting Cloud Run concurrency?
additional info:
- It's an I/O-intensive application (not CPU-intensive). It simply grabs the HTTP request and publishes to Pub/Sub.
Thanks a lot.
First of all, it's not appropriate on Stack Overflow to ask "cocktail questions" where you ask 5 things at a time. Please limit yourself to one question at a time in the future.
You're not supposed to worry about where containers run (physical machines, VMs, ...). --max-instances limits the number of container instances that your app is allowed to scale to. This prevents you from ending up with a huge bill if someone maliciously sends too many requests to your app.
This is documented at https://cloud.google.com/run/docs/about-concurrency. If you specify --concurrency=10, your container instance will have at most 10 in-flight requests routed to it at a time. So make sure your app can handle 10 requests at a time.
Yes, read the Gunicorn documentation. Test locally whether your settings let gunicorn handle 5 requests at the same time. Cloud Run's --concurrency setting just ensures you don't get more than 5 requests to one container instance at any moment.
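As a rule of thumb (my own judgment, not an official recommendation), size gunicorn so that workers * threads is at least the Cloud Run --concurrency value. With the flags from the question, that might look like:

# gunicorn_cfg.py -- sized so one instance can actually serve the
# 5 in-flight requests Cloud Run may route to it (1 * 5 >= 5)
workers = 1
worker_class = "gthread"
threads = 5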
I also recommend reading the official docs more thoroughly before asking, and perhaps also the cloud-run-faq, which pretty much answers all of these.
I'm trying to create a Flask application with gunicorn. I set the number of gunicorn workers to (multiprocessing.cpu_count() * 2) + 1, as the documentation recommends.
There is a problem because my Flask application needs to write something to a single file when an HTTP request comes in. If more than one worker does this at the same time, the application produces errors.
Is it possible to define some lock shared between the gunicorn workers?
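One common approach (a sketch, not the only option) is an OS-level file lock, which is held per file rather than per process and therefore serializes writers across all gunicorn workers:

import fcntl

def append_line(path, line):
    # flock is advisory but shared across processes on the same host,
    # so only one worker can hold the exclusive lock at a time.
    with open(path, 'a') as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            f.write(line + '\n')
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)

Note that fcntl is Unix-only; on other platforms, a library such as filelock provides the same idea.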
I'm using Falcon 1.4.1 and Gunicorn 19.9.0 inside Docker.
I'm having trouble figuring out the best way to initialize the application: running some code once when my REST API is started instead of once per worker. I have 3 or more workers running for my application.
I've tried using the gunicorn on_starting server hook, but it still ran once per worker. In my gunicorn_conf.py file:
def on_starting(server):
    print('Here I am')
I also tried the gunicorn preload_app setting, which I'm happily using in production now and which does allow application initialization to run once before the workers start.
However, I also want to be able to use the gunicorn reload setting so that file changes restart the application, and that directly conflicts with the preload_app setting.
I may just want too much :) Anyone have any ideas or solutions? I saw some attempts to acquire a lock file with multiprocessing, but it turns out you get one lock file per worker.
I am not able to understand exactly what you want to achieve; it would help if you posted your error output as well.
As you mention, you are able to run your code once using the Gunicorn preload_app setting instead of once per worker.
You can reload Gunicorn on file changes using the following command:
gunicorn --workers 3 -b localhost:5000 main:app --reload
If this is not what you are looking for, then share the error output here, since you mention that "I saw some attempts to get a lock file with multiprocessing, but turns out you get a lockfile/worker." I will try my best to help you.
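For the once-only initialization itself, here is a minimal gunicorn_conf.py sketch, assuming the init can live in the master process: per the gunicorn docs, on_starting runs once in the master (arbiter), before any workers are forked, and worker restarts triggered by reload do not re-run it.

# gunicorn_conf.py
import os

workers = 3
reload = True  # workers restart on file changes; the master persists

def on_starting(server):
    # Runs once in the master process, before workers are forked.
    print('one-time init, master pid=%s' % os.getpid())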
Web dynos can handle HTTP requests, and while web dynos handle them, worker dynos can process jobs for them. But I don't know how to make web dynos and worker dynos communicate with each other.
For example, I want to receive an HTTP request on a web dyno, send it to a worker dyno, process the job and send the result back to the web dyno, and then show the result on the web page.
Is this possible in Node.js (with RabbitMQ, Kue, etc.)?
I could not find an example in the Heroku documentation.
Or should I implement all the code in web dynos and scale web dynos only?
As the high-level article on background jobs and queuing suggests, your web dynos will need to communicate with your worker dynos via an intermediate mechanism (often a queue).
To accomplish what it sounds like you're hoping to do, follow this general approach:
Web request is received by the web dyno
Web dyno adds a job to the queue
Worker dyno receives job off the queue
Worker dyno executes job, writing incremental progress to a shared component
Browser-side polling requests status of job from the web dyno
Web dyno queries shared component for progress of background job and sends state back to browser
Worker dyno completes execution of the job and marks it as complete in shared component
Browser-side polling requests status of job from the web dyno
Web dyno queries shared component for progress of background job and sends completed state back to browser
As far as actual implementation goes I'm not too familiar with the best libraries in Node.js, but the components that glue this process together are available on Heroku as add-ons.
Queue: AMQP is a well-supported queue protocol and the CloudAMQP add-on can serve as the message queue between your web and worker dynos.
Shared state: You can use one of the Postgres add-ons to share the state of a job being processed, or something more performant such as Memcache or Redis.
So, to summarize, you must use an intermediate add-on component to communicate between dynos on Heroku. While this approach involves a little more engineering, the result is a properly decoupled and scalable architecture.
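To make the flow concrete, here is a minimal sketch of the enqueue/consume/poll pattern with Redis as both queue and shared state. It's written in Python purely for illustration; the same shape applies with any Node.js Redis client:

import json
import redis

r = redis.Redis()

def enqueue(job_id, payload):            # runs on the web dyno
    r.rpush('jobs', json.dumps({'id': job_id, 'payload': payload}))
    r.hset('job:' + job_id, 'status', 'queued')

def work_once():                         # runs on the worker dyno
    _, raw = r.blpop('jobs')             # blocks until a job arrives
    job = json.loads(raw)
    r.hset('job:' + job['id'], 'status', 'running')
    # ... execute the actual job here ...
    r.hset('job:' + job['id'], 'status', 'done')

def job_status(job_id):                  # polled via the web dyno
    return r.hget('job:' + job_id, 'status')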
From what I can tell, Heroku does not supply a way of communicating for you, so you will have to build that yourself. In order to communicate with another process using Node, you will probably have to deal with the process's stdin/out/err manually, something like this:
var fs = require('fs');

// Attach to a running process's stdio through the /proc filesystem
// (Linux-specific, and assumes permission to open the target's fds).
var attachToProcess = function(pid) {
    return {
        stdin: fs.createWriteStream('/proc/' + pid + '/fd/0'),
        stdout: fs.createReadStream('/proc/' + pid + '/fd/1'),
        stderr: fs.createReadStream('/proc/' + pid + '/fd/2')
    };
};

// Look up the worker's pid, then write to its stdin.
// (fs.readFile is async, so the pid is only available in the callback.)
fs.readFile('/path/to/worker.pid', 'utf8', function(err, pid) {
    if (err) { throw err; }
    var worker = attachToProcess(Number(pid));
    worker.stdin.write(...);
});
Then, in your worker process, you will have to store the pid in that pid file:
// process.pid is a number; fs.writeFile expects a string or buffer.
fs.writeFile('/path/to/worker.pid', String(process.pid), function(err) {
    if (err) { throw err; }
});
I haven't actually tested any of this, so it will likely take some tinkering and building on it, but I think the basic idea is clear.
Edit
I just noticed that you tagged this with "redis" as well, and thought I should add that you can also use Redis pub/sub to communicate between your various processes, as explained in the node_redis README.
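A tiny sketch of that pub/sub pattern (again in Python for illustration; node_redis exposes the same publish/subscribe primitives):

import redis

# worker process: announce progress on a channel
redis.Redis().publish('job-updates', 'job 42 done')

# web process: listen for announcements
p = redis.Redis().pubsub()
p.subscribe('job-updates')
for message in p.listen():
    if message['type'] == 'message':
        print(message['data'])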