I am currently using the CassFuture callback to implement an asynchronous pattern for processing Cassandra queries. It appears that all of the callbacks are coming in on the same thread. Is this the expected behavior?
Assuming you are using the current version of the driver (v2.9.0), the number of IO threads is configured via cass_cluster_set_num_threads_io() before creating a session and defaults to 1. If you increase the number of IO threads in the cluster configuration, you will notice that CassFuture callbacks begin to show up on different threads.
NOTE: If your callback is slow, consider running it on a separate thread; otherwise the callback might block IO operations for other requests being executed on the callback/IO thread. Another recommendation is not to use the full number of cores/virtual cores available in your hardware configuration, as this could starve your client application (and potentially OS services) of resources.
Related
The cluster approach is clear to me, since it spawns entirely separate processes. And I guess the professional programmers created the worker_threads library for a good reason... but I still need to clear up this point for my understanding:
In a normal single-threaded process, the event loop thread has the aid of the default worker pool to offload its heavy I/O tasks, so the main thread is not blocked.
At the same time, user-defined worker threads will be used for the same reason, each with their own event loop and Node.js instance.
What's the point of spawning those extra event loops and Node.js instances when they are not the bottleneck, given that libuv is already meant to spawn the workers?
Does this mean that the default worker pool may not be enough? Is it just a matter of quantity, or of concept?
There are two types of operations (calls) in Node.js: blocking and non-blocking.
Non-blocking
Node.js uses libuv for non-blocking IO. Network, file, and DNS IO operations are run asynchronously by libuv. Node.js uses the following scheme:
Asynchronous system APIs are used by Node.js whenever possible, but where they do not exist, libuv's thread pool is used to create asynchronous Node APIs based on synchronous system APIs. Node.js APIs that use the thread pool are:
- all fs APIs, other than the file watcher APIs and those that are explicitly synchronous
- asynchronous crypto APIs such as crypto.pbkdf2(), crypto.scrypt(), crypto.randomBytes(), crypto.randomFill(), crypto.generateKeyPair()
- dns.lookup()
- all zlib APIs, other than those that are explicitly synchronous
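For instance, here is a minimal sketch (standard library only; the password/salt values are just placeholders) contrasting one of the thread-pool-backed calls listed above with its synchronous counterpart:

```js
// thread-pool-sketch.js -- crypto.pbkdf2() is one of the APIs listed above
// that is handed off to libuv's thread pool (4 threads by default, tunable
// via the UV_THREADPOOL_SIZE environment variable).
const crypto = require('crypto');

// Asynchronous variant: the key derivation runs on a thread-pool worker,
// so the main thread continues immediately.
crypto.pbkdf2('secret', 'salt', 100000, 64, 'sha512', (err, key) => {
  if (err) throw err;
  console.log('async pbkdf2 finished:', key.toString('hex').slice(0, 16));
});

// Synchronous variant: runs on the main thread and blocks the event loop.
const key = crypto.pbkdf2Sync('secret', 'salt', 100000, 64, 'sha512');
console.log('sync pbkdf2 finished:', key.toString('hex').slice(0, 16));

console.log('main thread is free while the async variant is still running');
```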
So we don't have direct access to the libuv thread pool, but we may define our own uses of it through C++ add-ons.
Blocking calls
Node.js executes blocking code on the main thread. fs.readFileSync(), compression algorithms, encrypting data, image resizing, and calculating primes over a large range are some examples of blocking operations. Node.js's golden rule is: never block the event loop (main thread). We can execute these operations asynchronously by creating a child process using the cluster module or the child_process module. But creating a child process is heavy in terms of OS resources, and that's why worker_threads was born.
Using worker_threads you can execute blocking JavaScript code in a worker thread, hence unblocking the main thread, and you can communicate with the parent thread (main thread) via message passing. Worker threads are still lightweight compared to a child process.
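A rough sketch of that pattern, using only the built-in worker_threads module (the prime-counting loop is just a placeholder for any blocking JavaScript):

```js
// worker-sketch.js -- the same file is run as both the main thread and the worker.
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

if (isMainThread) {
  // Main thread: spawn a worker for the CPU-bound job and stay responsive.
  const worker = new Worker(__filename, { workerData: { limit: 5000000 } });
  worker.on('message', (count) => console.log('primes found:', count));
  worker.on('error', (err) => console.error(err));
  console.log('main thread is not blocked while the worker computes');
} else {
  // Worker thread: blocking JavaScript runs here on a separate event loop.
  let count = 0;
  for (let n = 2; n < workerData.limit; n++) {
    let prime = true;
    for (let d = 2; d * d <= n; d++) {
      if (n % d === 0) { prime = false; break; }
    }
    if (prime) count++;
  }
  parentPort.postMessage(count); // results go back via message passing
}
```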
Read more here:
https://nodesource.com/blog/worker-threads-nodejs
https://blog.insiderattack.net/deep-dive-into-worker-threads-in-node-js-e75e10546b11
I can see that Node.js is bringing in multi-threading support via its worker threads module. My current assumption (I have not yet explored it personally) is that I can offload a long-running/CPU-intensive operation to these worker threads.
I want to understand the behaviour if this long-running piece of code has some intermittent event callbacks or a chain of promises. Do these callbacks still execute on the worker threads, or do they get passed back to the main thread?
If these promises come back to the main thread, the advantage of using the worker thread may be lost.
Can someone clarify?
Update: some context for the question
I have an HTTP request that initiates some background processing and returns a 202 status. After receiving such a request, I start the background processing via
setTimeout(function() { // performs long-running file read operations.. })
and immediately return a 202 to the caller.
However, I have observed that while this background operation is running, other HTTP requests are either not being processed or are very sluggish at best.
My hypothesis is that this continuous I/O processing of a million+ lines is filling up the event loop with callbacks/promises, so the main thread is unable to process other pending I/O tasks such as accepting new requests.
I have explored the Node.js cluster option and this works well, as the long task is delegated to one of the child processes, and other instances of the cluster are available to take up additional requests.
But I was thinking that worker threads might solve the same problem without the overhead of cloning the process.
I assume each worker thread would have its own event loop.
So if you emit an event in a worker thread, only that thread will receive it and trigger the callback. The same goes for promises: if you create a promise within a worker, it will only be resolved by that worker.
This is supported by their statement in the documentation regarding Class: Worker: Most Node.js APIs are available inside of it (with some exceptions that are not related to event processing).
However they mention this earlier in the docs:
Workers are useful for performing CPU-intensive JavaScript operations; do not use them for I/O, since Node.js’s built-in mechanisms for performing operations asynchronously already treat it more efficiently than Worker threads can.
I think some small scale async code in worker threads would be fine, but having more callbacks/promises would hurt performance. Some benchmarks could shed some light on this.
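As a small sketch of that assumption (inline worker code via the eval option, built-in modules only): timers and promises created inside the worker are scheduled on the worker's own event loop, and only the final message crosses back to the main thread.

```js
const { Worker } = require('worker_threads');

const workerCode = `
  const { parentPort, threadId } = require('worker_threads');
  const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
  (async () => {
    // This promise chain is created and resolved on the worker's event loop.
    await delay(100);
    parentPort.postMessage({ threadId, note: 'promise resolved inside the worker' });
  })();
`;

const worker = new Worker(workerCode, { eval: true });
worker.on('message', (msg) => {
  // The main thread only sees the message, not the worker's internal callbacks.
  console.log('main thread received:', msg);
});
```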
I understand that the power of Node.js is that it processes all user requests on a single thread working on a queue of requests. The idea is that there is no context switching of this thread and no system calls.
input thread ---> | request queue | ---> output thread (processes a task itself if it does not cause a system call, else delegates it to the thread pool).
The thread pool will:
- execute tasks involving system calls (usually somewhat long-running ones, e.g. IO tasks)
- put the results back in the queue as another request task
- which will then be processed by the single thread working on the queue
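A rough illustration of that flow using fs.readFile, one of the thread-pool-backed APIs (reading the script's own file just to have something to read):

```js
const fs = require('fs');

console.log('1. main thread: hand the read off to the thread pool');

// The read itself runs on a libuv thread-pool worker; its completion is
// queued as another task and the callback runs back on the main thread.
fs.readFile(__filename, 'utf8', (err, contents) => {
  if (err) throw err;
  console.log('3. main thread: callback picked up from the queue,',
              contents.length, 'characters read');
});

console.log('2. main thread: keeps working on other tasks in the meantime');
```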
My question is: inevitably, Node.js code will need to put data in an RDBMS or JMS system. This is most definitely synchronous (even putting a message on JMS is synchronous, although producer and consumer are not synchronous with each other). So the thread pool processing these IO tasks will not only make system calls but also be blocked during this period. JDBC in any case does not support asynchronous calls (I guess due to the need to be transactional, and maybe security issues, since transaction and security contexts are attached to threads).
So how do we actually put data in an RDBMS efficiently from a Node.js server?
In Apache, we have a single thread for each incoming request. Each thread consumes memory space. The memory spaces don't collide with each other, which is why each request serves its purpose.
How does this happen in Node.js, as it has single-threaded execution? A single memory space is used by all incoming requests. Why don't the requests collide with each other? What differentiates them?
As you noticed yourself, an event-based model allows the given memory to be shared more efficiently, as the overhead of re-executing a stack again and again is minimized.
However, to make an event-based or single-threaded model non-blocking, you have to get back to threads somewhere, and this is where Node's "IO engine" libuv comes in.
libuv supplies an API which, underneath, manages IO tasks in a thread pool when an IO task is done asynchronously. Using a thread pool means the main process is not blocked; however, extensive JavaScript operations can still block it (this is why there is the cluster module, which allows spawning multiple worker processes).
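A minimal sketch of that cluster pattern (built-in cluster, http, and os modules; port 3000 is arbitrary, and cluster.isPrimary is called cluster.isMaster on older Node versions):

```js
const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isPrimary) {
  // Fork one worker process per CPU core so heavy JavaScript in one process
  // does not stall the others.
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
  cluster.on('exit', (worker) => {
    console.log(`worker ${worker.process.pid} exited, forking a replacement`);
    cluster.fork();
  });
} else {
  http.createServer((req, res) => {
    res.end(`handled by process ${process.pid}\n`);
  }).listen(3000);
}
```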
I hope this answers your question; if not, feel free to comment!
I don't understand several things about Node.js. Every information source says that Node.js is more scalable than standard threaded web servers due to the lack of thread locking and context switching, but I wonder: if Node.js doesn't use threads, how does it handle concurrent requests in parallel? What does the event I/O model mean?
Your help is much appreciated.
Thanks
Node is completely event-driven. Basically the server consists of one thread processing one event after another.
A new request coming in is one kind of event. The server starts processing it and when there is a blocking IO operation, it does not wait until it completes and instead registers a callback function. The server then immediately starts to process another event (maybe another request). When the IO operation is finished, that is another kind of event, and the server will process it (i.e. continue working on the request) by executing the callback as soon as it has time.
So the server never needs to create additional threads or switch between threads, which means it has very little overhead. If you want to make full use of multiple hardware cores, you just start multiple instances of Node.js.
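A rough sketch of that idea with the built-in http and fs modules (the file path is just a placeholder): the slow read is registered with a callback, and the same single thread immediately goes back to serving other requests.

```js
const http = require('http');
const fs = require('fs');

http.createServer((req, res) => {
  if (req.url === '/report') {
    // Register a callback for the (possibly slow) read and return right away;
    // the single thread keeps accepting other connections in the meantime.
    fs.readFile('./big-report.txt', 'utf8', (err, data) => {
      if (err) {
        res.writeHead(500);
        return res.end('read failed');
      }
      res.end(data);
    });
  } else {
    res.end('hello'); // cheap requests are answered immediately
  }
}).listen(3000);
```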
Update
At the lowest level (C++ code, not Javascript), there actually are multiple threads in node.js: there is a pool of IO workers whose job it is to receive the IO interrupts and put the corresponding events into the queue to be processed by the main thread. This prevents the main thread from being interrupted.
Although the question was answered a long time ago, I'm adding my thoughts on it.
Node.js is a single-threaded JavaScript runtime environment. Basically, its creator Ryan Dahl's concern was that parallel processing using multiple threads is not the right way, or is too complicated.
If Node.js doesn't use threads, how does it handle concurrent requests in parallel?
Ans: It's wrong to say that it doesn't use threads; Node.js uses threads, but in a smart way. It uses a single thread to serve all the HTTP requests and multiple threads in a thread pool (in libuv) for handling any blocking operation.
Libuv: A library to handle asynchronous I/O.
What does the event I/O model mean?
Ans: The right term is non-blocking I/O. It almost never blocks, as the Node.js official site says. When a request comes to the Node server, it never queues the request; it takes the request and starts executing it. If it involves a blocking operation, that operation is sent off to the worker threads and a callback is registered for it; as soon as the operation finishes, the callback is placed on the event queue, processed by the event loop, and after that the response is created and sent to the respective client.
Node.js is a JavaScript runtime environment. Both the browser and Node.js run on the V8 JavaScript engine. Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient. Node.js applications use a single-threaded event loop architecture to handle concurrent clients. Actually, its main event loop is single-threaded, but most of the I/O work happens on separate threads, because the I/O APIs in Node.js are asynchronous/non-blocking by design in order to accommodate the main event loop.

Consider a scenario where we ask a backend database for the details of user1 and user2 and then print them on the screen/console. The response to each request takes time, but both user data requests can be carried out independently and at the same time.

When 100 people connect at once, rather than having different threads, Node will loop over those connections and fire off any events your code should know about. If a connection is new, it will tell you. If a connection has sent you data, it will tell you. If the connection isn't doing anything, it will skip over it rather than taking up precious CPU time on it. Everything in Node is based on responding to these events. So, as a result, the CPU stays focused on that one process and doesn't have a bunch of threads competing for attention. There is no buffering in a Node.js application; it simply outputs the data in chunks.
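A small sketch of that user1/user2 scenario (fetchUserFromDb is a hypothetical stand-in for whatever asynchronous database call is actually used):

```js
// Hypothetical asynchronous database call with simulated latency.
function fetchUserFromDb(id) {
  return new Promise((resolve) => {
    setTimeout(() => resolve({ id, name: `user${id}` }), 200);
  });
}

// Both requests are issued at once; the event loop stays free while they are pending.
async function printUsers() {
  const [user1, user2] = await Promise.all([fetchUserFromDb(1), fetchUserFromDb(2)]);
  console.log(user1, user2);
}

printUsers();
```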
Though it's been answered, I would just like to share my understanding in simple terms.
Node.js uses a library called libuv. libuv is written in the C language and uses the concept of threads. These threads are called workers, and these workers take care of the multiple requests from clients.
Parallel processing in Node.js is achieved with the help of two concepts:
- Asynchronous
- Non-blocking IO