I have a question regarding Flask, Waitress, and parallel processing of HTTP requests.
I have read that Flask alone can only process one HTTP request at a time.
In the table below I have put all the possible configurations, and I would like your feedback on the number of HTTP requests that each can process in parallel.
| | Only Flask | Flask and Waitress |
|------------------------|-----------|--------------------|
| 1 CPU & 1 core | 1 request | 1 request |
| 1 CPU & 4 cores | 1 request | 4 requests |
| 2 CPUs & 1 core each | 1 request | 2 requests |
| 2 CPUs & 4 cores each | 1 request | 8 requests |
I ask these questions because a colleague told me that we can process several thousand HTTP requests with an Apache server with only 1 CPU and 1 core!
So, how should I handle the maximum number of HTTP requests in parallel?
Let me clear up the confusion for you.
When you develop locally with Flask, you use the built-in development server, which is single-threaded: it processes only one request at a time. This is one of the reasons you shouldn't simply set FLASK_ENV=production and run the built-in server in a production environment; it is not designed for those workloads. Once you change FLASK_ENV to production and run it, you'll find a warning in the terminal.
Now, coming to how to run Flask in a production environment: CPUs, cores, threads, and other details.
To run Flask in a production environment, you need a proper application server that can run your Flask application. This is where Gunicorn comes in: it is compatible with Flask and one of the most popular ways of running Flask applications.
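For reference, here is a minimal sketch of a Flask app that the configuration examples below assume (app.py is a hypothetical module name, not something from your setup):

```python
# app.py -- a minimal Flask app used by the sketches below (hypothetical name)
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello, World!"
```

You would then point Gunicorn at it with something like `gunicorn app:app`.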
Gunicorn gives you several ways to configure it optimally based on the specs of your server. You can tune the following:
- Worker class: the type of worker to use
- Number of workers
- Number of threads
The way you calculate the maximum number of concurrent requests is as follows:
Take a 4-core server as an example.
As per the Gunicorn documentation, the suggested optimal number of workers is (2 * num_of_cores) + 1, which in this case becomes (2 * 4) + 1 = 9.
The suggested number of threads per worker is 2 to 4 * num_of_cores, which in this case comes out to 4 * 4 = 16.
So now you have 9 workers with 16 threads each. Each thread handles one request at a time, so you can handle 9 * 16 = 144 concurrent requests.
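Expressed as a Gunicorn config file, the calculation above might look like this minimal sketch (the file name and the 4x multiplier are assumptions, not tuned values):

```python
# gunicorn_conf.py -- a sketch of the sizing rule above (hypothetical file name)
import multiprocessing

cores = multiprocessing.cpu_count()   # 4 on the example server
workers = (2 * cores) + 1             # (2 * 4) + 1 = 9
threads = 4 * cores                   # 4 * 4 = 16 threads per worker
worker_class = "gthread"              # threaded workers, needed for threads > 1
```

You would start it with something like `gunicorn -c gunicorn_conf.py app:app`.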
A similar calculation can be done for Waitress. I prefer Gunicorn, so you'll need to check the Waitress docs for its configuration options.
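That said, a minimal Waitress sketch might look like this (the module name and thread count are assumptions, not tuned values):

```python
# serve.py -- minimal Waitress sketch; names and thread count are assumptions
from waitress import serve

from app import app  # the hypothetical Flask app from the sketch above

# Waitress uses a single process with a thread pool; threads=16 mirrors
# the per-worker thread count from the Gunicorn calculation above.
serve(app, host="0.0.0.0", port=8080, threads=16)
```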
Now coming to Web Servers
Until now, what you have configured is an application server to run Flask. This works, but you shouldn't expose an application server directly to the internet. Instead, it's always suggested to deploy Flask behind a reverse proxy like Nginx. Nginx acts as a full-fledged web server capable of handling real-world workloads.
So, in a gist, you could use a combination from the list below as per your requirements:
Flask + Application Server + Web Server, where
the Application Server is one of Gunicorn, uWSGI, Gevent, Twisted Web, Waitress, etc., and the Web Server is one of Nginx, Apache, Traefik, Caddy, etc.
I am using an IIS web garden for long-running requests, with 15 worker processes.
With, for example, 3 browsers, multiple worker processes are typically used.
With Apache JMeter, however, all requests use the same worker process.
Is there a way to force the use of multiple worker processes?
This may have at least 2 explanations:
- You have a hard-coded ID or session ID in your test plan. Check for their presence and remove them, and add an HTTP Cookie Manager to your test plan.
- You have a load balancer that works in Source IP persistence mode; in this case you need to either change the policy to Round Robin or add 2 other machines.
If you are using 1 thread with X iterations and expecting different workers, then check that:
- the Cookie Manager is configured to clear cookies on each iteration, and
- the Thread Group has "Same user on each iteration" unchecked.
If the issue persists, please share your test plan, and check that you don't have a hardcoded ID somewhere in a Header Manager that pins everything to 1 worker.
A well-behaved JMeter script should produce the same network footprint as the real browser does, so if you're observing inconsistencies, most probably your JMeter configuration does not match the requests the real browser sends.
Make sure that your JMeter test is doing what it is supposed to do by inspecting request/response details using the View Results Tree listener.
Use a third-party tool like Wireshark or Fiddler to capture the requests originating from the browser and from JMeter, detect the differences, and amend your JMeter configuration to eliminate them.
More information: How to make JMeter behave more like a real browser
In the absolute majority of cases, a JMeter script that does not work as expected suffers from missing or improperly implemented correlation of dynamic values.
I have a Node.js server which I'm running with PM2 on a Red Hat system with 1 core. The main purpose of the server is to let users upload data. At peak times around 75 users connect to the server at the same time and upload data (the data is uploaded in chunks of 1 MB and concatenated by the server at the end). An upload can take a while (around 10 minutes).
Currently, I'm starting the server using the following command
pm2 start server.js -i max -o ./logs/out.log -e ./logs/err.log
That means I'm starting it in cluster mode. I don't know if that is necessary with only 1 core. Should I instead just use fork mode (i.e. removing -i max)? Do I also have to use pm2 scale server 75 so that I have 75 workers, i.e. one worker for each user? Or else, how can I scale it to 75 concurrent users?
If you have only one core, cluster mode will have the same efficiency as fork mode and will only use the one available core. The cluster is useful if your computer/server has more than one core.
If you are asking whether your server will have 75 threads, one per user: no, because Node.js is single-threaded and will use only one thread for all the incoming requests.
That single-threaded server is still capable of holding 75 open connections concurrently, because the I/O (such as receiving upload chunks) is handled asynchronously by the event loop.
(I'm learning Cloud Run. I acknowledge this is not development- or code-related, but I'm hoping some GCP engineer can clarify this.)
I have a Python application running with gunicorn + Flask... just a PoC for now, hence the minimal configuration.
My gcloud run deploy command has the following flags:
--max-instances 1
--concurrency 5
--memory 128Mi
--platform managed
My gunicorn_cfg.py file has the following configuration:
workers=1
worker_class="gthread"
threads=3
I'd like to know:
1) max-instances :: if I were to adjust this, does that mean a new physical server machine is provisioned whenever needed? Or does the service achieve that by pulling a container image and simply starting a new container instance (docker run ...) on the same physical machine, effectively sharing it with other container instances?
2) concurrency :: does one running container instance receive multiple concurrent requests (e.g. 5 concurrent requests processed by 3 running container instances)? Or does each concurrent request trigger the start of a new container instance (docker run ...)?
3) lastly, can I effectively reach concurrency > 5 by adjusting the gunicorn thread settings? E.g. 5x3=15 in this case, i.e. 15 concurrent requests being served by 3 running container instances? If that's possible, are there any pros/cons to adjusting threads vs. adjusting Cloud Run concurrency?
additional info:
- It's an I/O-intensive application (not CPU-intensive): it simply accepts the HTTP request and publishes to Pub/Sub.
thanks a lot
First of all, it's not appropriate on Stack Overflow to ask "cocktail questions" where you ask 5 things at a time. Please limit yourself to 1 question at a time in the future.
You're not supposed to worry about where containers run (physical machines, VMs, ...). --max-instances limits the number of container instances that your app is allowed to scale to. This prevents you from ending up with a huge bill if someone maliciously sends too many requests to your app.
This is documented at https://cloud.google.com/run/docs/about-concurrency. If you specify --concurrency=10, your container can be routed to have at most 10 in-flight requests at a time. So make sure your app can handle 10 requests at a time.
Yes, read the Gunicorn documentation. Test locally whether your settings let gunicorn handle 5 requests at the same time. Cloud Run's --concurrency setting just ensures that no more than 5 requests are routed to 1 container instance at any moment.
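For example, here is a minimal sketch of a gthread configuration sized to exactly match --concurrency=5 (an assumption on my part, not an official recommendation):

```python
# gunicorn_cfg.py -- sized so one instance can serve the 5 in-flight requests
# Cloud Run may route to it (a sketch, not a tuned configuration)
workers = 1
worker_class = "gthread"
threads = 5  # one thread per concurrent request allowed by --concurrency=5
```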
I also recommend that you read the official docs more thoroughly before asking, and perhaps also the cloud-run-faq, which pretty much answers all of these questions.
My scenario is pretty common:
A user makes a request that kicks off a pretty intensive job. This might take 1 hour or more, so we don't want to block other requests.
What are my options if the server has only 1 CPU and 512 MB of RAM? What if another 3 similar requests were made? Can Node.js handle such a case?
Note: It's fine if the user has to wait for the results for a day or more.
Note 2: I am hosting my app on Heroku (Hobby Plan).
We have a .NET 2.0 Remoting server running in Single-Call mode under IIS7. It has two APIs, say:
DoLongRunningCalculation() - has a lot of database requests and can take a long time to execute.
HelloWorld() - just returns "Hello World".
We stress-tested the remoting server (on a Windows 7 machine) in a worst-case scenario by bombarding it randomly with the two API calls, and found that if we go beyond 10 client requests, the HelloWorld response (which generally takes less than 0.1 sec) starts taking longer and longer, going into many seconds. Our objective is that the long-running remoting calls should not block the short-running calls. Here are the performance counters for ASP.NET v2.0.50727 with 20 client threads running:
Requests Queued: 0
Requests Executing: (Max:10)
Worker Processes Running: 0
Pipeline Instance Mode: (Max:10)
Requests in Application Queue: 0
We've tried setting maxConcurrentRequestsPerCPU to "5000" in the registry as per Thomas's blog (ASP.NET Thread Usage on IIS 7.0 and 6.0), but it hasn't helped. Based on the above data, it appears that the number of concurrent requests is stuck at 10.
So, the questions are:
How do we go about increasing the number of concurrent requests? The main objective is that the long-running remoting calls must not block the short-running calls.
Why is Max Requests Executing always stuck at 10?
Thanks in advance.
Windows 7 has a 20-inbound-connection limit. XP and prior were limited to 10 (not sure about Vista). This is likely the cause of your drop in performance. Try testing on an actual server OS that doesn't have this arbitrary connection limit.