Is Gunicorn's gthread async worker analogous to Waitress? - wsgi

I've read some posts from 2013 that the Gunicorn team was planning to build a threaded buffering layer worker model, similar to how Waitress works. Is that what the gthread async worker does? The gthread workers were released with version 19.0 in 2014.
Waitress has a master async thread that buffers requests, and enqueues each request to one of its sync worker threads when the request I/O is finished.
Gunicorn gthread doesn't have much documentation, but it sounds similar. From the docs:
The worker gthread is a threaded worker. It accepts connections in the main loop; accepted connections are added to the thread pool as a connection job.
I only ask because I am not super knowledgeable about python async I/O code, though a cursory reading of the gthread.py seems to indicate that it is a socket-buffering process that protects worker threads from long-I/O requests (and buffers the response I/O as well).
https://github.com/benoitc/gunicorn/blob/master/gunicorn/workers/gthread.py

The threaded worker in Gunicorn does not buffer I/O and does not read the request body in the main thread.
The main loop asynchronously handles calling accept()[1], but then the socket is immediately submitted to the thread pool[2].

Gunicorn doesn't have HTTP request buffering, which is something you can find in Waitress. Waitress also has the advantage that it fully supports Windows.

Related

Why "worker_threads" when we have default worker pool?

I understand the cluster approach clearly, since it deploys entirely separate processes. And I assume the "worker_threads" library was created for a good reason, but I still need to clear up this point:
In a normal single-threaded process, the event loop thread has the default worker pool to offload its heavy I/O tasks, so the main thread is not blocked.
At the same time, user-defined "worker threads" serve the same purpose, each with its own event loop and Node.js instance.
What is the point of spawning those event loops and Node.js instances when they are not the bottleneck, given that libuv is meant to manage spawning the workers?
Does this mean the default worker pool may not be enough? Is it just a matter of quantity, or a difference in concept?
There are two types of operations (calls) in Node.js: blocking and non-blocking.
Non-blocking calls
Node.js uses libuv for non-blocking I/O. Network, file, and DNS I/O operations run asynchronously through libuv. Node.js uses the following scheme:
Asynchronous system APIs are used by Node.js whenever possible, but where they do not exist, libuv's threadpool is used to create asynchronous Node.js APIs based on synchronous system APIs. Node.js APIs that use the threadpool are:
all fs APIs, other than the file watcher APIs and those that are explicitly synchronous
asynchronous crypto APIs such as crypto.pbkdf2(), crypto.scrypt(), crypto.randomBytes(), crypto.randomFill(), crypto.generateKeyPair()
dns.lookup()
all zlib APIs, other than those that are explicitly synchronous
So we don't have direct access to the libuv thread pool from JavaScript. We can define our own uses of the thread pool through C++ add-ons.
Blocking calls
Node.js executes blocking code on the main thread. fs.readFileSync(), compression, encryption, image resizing, and calculating primes over a large range are some examples of blocking operations. The golden rule of Node.js is to never block the event loop (main thread). We can run these operations asynchronously by creating a child process with the cluster module or the child_process module. But creating a child process is heavy in terms of OS resources, and that is why worker_threads was born.
With worker_threads you can run blocking JavaScript code in a worker thread, which unblocks the main thread, and you can communicate with the parent (main) thread via message passing. Worker threads are lightweight compared to a child process.
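To make the message-passing idea concrete, here is a minimal sketch of offloading a CPU-bound job (counting primes, one of the examples above) to a worker thread. The function names countPrimes and countPrimesInWorker are made up for illustration, and the worker source is kept in a string with the eval option only so that the example fits in a single file; a real project would more likely put the worker in its own file.

```javascript
const { Worker } = require('worker_threads');

// Worker source kept as a string (hypothetical layout, single-file demo only).
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');

  // Naive trial division: deliberately CPU-bound.
  function countPrimes(limit) {
    let count = 0;
    for (let n = 2; n <= limit; n++) {
      let isPrime = true;
      for (let d = 2; d * d <= n; d++) {
        if (n % d === 0) { isPrime = false; break; }
      }
      if (isPrime) count++;
    }
    return count;
  }

  // Send the result back to the parent thread.
  parentPort.postMessage(countPrimes(workerData.limit));
`;

// Runs the blocking computation off the main thread and resolves with its result.
function countPrimesInWorker(limit) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { limit } });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

// Usage: the main thread's event loop stays free while the worker computes.
countPrimesInWorker(100000).then((count) => {
  console.log('primes found:', count);
});
```

The only data crossing the thread boundary here is the limit passed in through workerData and the single number posted back, which keeps the message-passing overhead small compared with the work being done.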
Read more here:
https://nodesource.com/blog/worker-threads-nodejs
https://blog.insiderattack.net/deep-dive-into-worker-threads-in-node-js-e75e10546b11

Can nodejs worker threads be used for executing long running file I/O based javascript code?

I can see that Node.js is bringing in multi-threading support via its worker threads module. My current assumption (I have not yet explored this personally) is that I can offload a long-running or CPU-intensive operation to these worker threads.
I want to understand the behaviour if this long running piece of code has some intermittent event callbacks or chain of promises. Do these callbacks still execute on the worker threads, or do they get passed on back to the main thread?
If these promises come back to main thread, the advantage of executing the worker thread may be lost.
Can someone clarify?
Update => Some context of the question
I have an HTTP request that initiates some background processing and returns a 202 status. After receiving such a request, I start the background processing via
setTimeout(function () {
  // performs long-running file read operations...
});
and immediately return a 202 to the caller.
However, I have observed that while this background operation is running, other HTTP requests are either not processed at all or are very sluggish at best.
My hypothesis is that this continuous I/O processing of a million+ lines fills the event loop with so many callbacks and promises that the main thread cannot get to other pending I/O tasks, such as accepting new requests.
I have explored the nodejs cluster option and this works well, as the long task is delegated to one of the child processes, and other instances of cluster are available to take up additional requests.
But I was thinking that worker threads might solve the same problem, without the overhead of cloning the process.
I assume each worker thread would have its own event loop.
So if you emit an event in a worker thread, only that thread would receive it and trigger the callback. The same for promises, if you create a promise within a worker, it will only be resolved by that worker.
This is supported by their statement in the documentation regarding Class: Worker: Most Node.js APIs are available inside of it (with some exceptions that are not related to event processing).
However they mention this earlier in the docs:
Workers are useful for performing CPU-intensive JavaScript operations; do not use them for I/O, since Node.js’s built-in mechanisms for performing operations asynchronously already treat it more efficiently than Worker threads can.
I think some small scale async code in worker threads would be fine, but having more callbacks/promises would hurt performance. Some benchmarks could shed some light on this.
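As a rough way to check the "each worker has its own event loop" assumption, the sketch below lets a worker run a timer callback and a promise continuation while the main thread is deliberately blocked. The names measureWorkerTicks and busyWait, the 5 ms interval, and the shared counter are all choices made for this illustration, not anything prescribed by the Node.js docs.

```javascript
const { Worker } = require('worker_threads');

// Worker source as a string (single-file demo only).
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  const ticks = new Int32Array(workerData.shared);

  // This timer callback is scheduled on the worker's own event loop.
  setInterval(() => Atomics.add(ticks, 0, 1), 5);

  // A promise created in the worker is also settled in the worker.
  Promise.resolve().then(() => parentPort.postMessage('ready'));
`;

// Blocks the calling thread's event loop for roughly `ms` milliseconds.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) { /* spin */ }
}

// Resolves with the number of worker timer callbacks that ran
// while the main thread was blocked.
function measureWorkerTicks(blockMs) {
  return new Promise((resolve, reject) => {
    const shared = new SharedArrayBuffer(4);
    const ticks = new Int32Array(shared);
    const worker = new Worker(workerSource, { eval: true, workerData: { shared } });
    worker.once('error', reject);
    worker.once('message', async () => {
      const before = Atomics.load(ticks, 0);
      busyWait(blockMs); // the main event loop cannot run anything here
      const after = Atomics.load(ticks, 0);
      await worker.terminate();
      resolve(after - before);
    });
  });
}

measureWorkerTicks(200).then((n) => {
  console.log('worker callbacks that ran while main was blocked:', n);
});
```

If the worker's callbacks had to go through the main thread, the counter could not advance during the busy wait. A non-zero count is consistent with callbacks and promises created in a worker staying on that worker.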

worker pool vs libuv's threadpool in node.js

I was reading the Node.js docs about the worker pool and came across two terms that I thought were the same: worker pool and libuv's threadpool.
Here is the point of confusion (from node.js doc url):
These are the Node module APIs that make use of this Worker Pool:
I/O-intensive
DNS: dns.lookup(), dns.lookupService().
File System: All file system APIs except fs.FSWatcher() and those that are explicitly synchronous use libuv's threadpool.
Here is my understanding so far:
event loop -> can be considered main thread
worker pool -> which is implemented by libuv, so in that case worker pool thread is actually libuv thread.
So how does the worker pool do anything without libuv's threads?
The "Worker Pool" and "libuv's threadpool" are the same. The reason you're misunderstanding is due to the formulation of that sentence. As a non-native English speaker myself, I can see why.
This:
File System: All file system APIs except fs.FSWatcher() and those that are explicitly synchronous use libuv's threadpool.
could instead be formulated like this:
File System: All file system APIs use libuv's threadpool, except fs.FSWatcher() and those that are explicitly synchronous.
A better formulation can be seen on the docs for UV_THREADPOOL_SIZE cli option, as seen here:
Node.js APIs that use the threadpool are:
...
all zlib APIs, other than those that are explicitly synchronous

Node.js threadpool

I've been reading a lot about how Node.js works and why it could be a better choice when you are dealing with many I/O requests. The main advantage is that Node.js uses a single-threaded model: one main thread (the event loop) that uses a worker thread in the background for each I/O operation, so it is always available to serve more requests. This is in contrast to the regular request-response model, which assigns a thread to each request; when there are no more threads in the thread pool, new requests must wait in a queue until some thread finishes.
So can't Node.js have the same issue when assigning a worker to each I/O operation, given that the threadpool has a limited number of threads?
Thank you
node.js does not use threads at all for incoming network requests. Incoming requests are queued by the underlying socket infrastructure and a queued request is serviced through the internal node.js event queue when node.js finishes up a prior operation and then goes to the event queue for the next thing to do.
The limit on how many incoming network requests can be in flight at once will most likely be dictated by the underlying OS/TCP stack and how many requests it will queue before refusing the next incoming connection. The HTTP library in node.js does do some connection pooling (in the interest of increasing performance) when making lots of outbound requests to the same host, but that is different from incoming requests, and the connection pooling can be bypassed if it is not desirable.
There are other parts of node.js that do use an internal thread pool to make the async behavior work (such as disk I/O). If you try to run more async disk operations than there are threads in the thread pool, then the thread pool will queue the request to start running when a thread frees up. Since the interface to the requests are async, it can just add the event to an internal queue and then service it later when it has a thread available to allocate for it.
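The queuing behaviour described above can be observed with a small sketch. It uses crypto.pbkdf2 instead of disk reads, purely because it is another threadpool-backed API that is easy to make slow enough to measure; the helper name runPoolJobs, the iteration count, and the job count of 8 are arbitrary choices for illustration. The default threadpool size is 4 unless UV_THREADPOOL_SIZE is set.

```javascript
const crypto = require('crypto');
const { promisify } = require('util');

const pbkdf2 = promisify(crypto.pbkdf2);

// Starts `count` threadpool-backed operations at once and records
// how long after the start each one finished.
async function runPoolJobs(count) {
  const start = Date.now();
  const jobs = Array.from({ length: count }, async (_, i) => {
    await pbkdf2('secret', 'salt', 50000, 32, 'sha256');
    return { job: i, finishedAfterMs: Date.now() - start };
  });
  return Promise.all(jobs);
}

// With more jobs than pool threads, the extra jobs wait in the pool's
// queue; none are rejected, they just finish later.
runPoolJobs(8).then((results) => console.table(results));
```

On a machine with enough cores, the finish times would be expected to cluster in batches roughly the size of the pool, though the exact timings depend on the hardware and on the UV_THREADPOOL_SIZE setting.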

What kind of operations are handled by nodejs worker threads?

I need some clarification on what exactly are the nodejs worker threads doing.
I found contradicting info on this one. Some people say worker threads handle all IO, others say they handle only blocking posix requests (for which there is no async version).
For example, assume that I am not blocking the main loop myself with some unreasonable processing. I am just invoking functions from available modules and providing the callbacks. I can see that if this requires some blocking or computationally expensive operation, then it is handed to a worker thread. But for async I/O, is it initiated from the main libuv loop? Or is it passed to a worker thread, to be initiated from there?
Also, would a Node.js worker thread ever initiate a blocking (synchronous) I/O operation when the OS supports an async mode to do the same thing? Is it documented anywhere what kinds of operations may end up blocking a worker thread for a longer time?
I'm asking this because there is a fixed-size worker pool and I want to avoid making mistakes with it. Thanks.
Network I/O on all platforms is done on the main thread. File I/O is a different story: on Windows it is done truly asynchronously and non-blocking, but on all other platforms synchronous file I/O operations are performed in a thread pool to be async and non-blocking.
