I am trying to build a multithreaded system on a Vert.x server. Our system will receive concurrent requests. I have a single event loop running with 20 worker threads, and I am worried because I am passing variables by reference between functions.
Now my question is: does each thread share the memory pool, which might cause a thread to fetch an older value of a variable? Or will Vert.x handle this internally?
Since Vert.x itself is asynchronous, there is no need to introduce other threading APIs. Set the worker pool name (setWorkerPoolName), the pool size (setWorkerPoolSize), and the number of verticle instances (setInstances) on your deployment options to handle the required level of concurrency.
The cluster module is clear to me, as it deploys entirely separate processes. And I guess the professional programmers made the worker_threads library for a good reason... but I still need to clear up this point for my understanding:
In a normal single-threaded process, the event loop thread has the aid of the default worker pool to offload its heavy I/O tasks, so the main thread is not blocked.
At the same time, user-defined "worker threads" will be used for the same reason, each with its own event loop and Node.js instance.
What's the point of spawning those event loops and Node.js instances when they are not the bottleneck, given that libuv is the component intended to manage and spawn the workers?
Does this mean that the default worker pool may not be enough? Is it just a matter of quantity, or of concept?
There are two types of operations (calls) in Node.js: blocking and non-blocking.
Non-blocking calls
Node.js uses libuv for non-blocking I/O. Network, file, and DNS I/O operations are run asynchronously by libuv. Node.js uses the following scheme:
Asynchronous system APIs are used by Node.js whenever possible, but where they do not exist, libuv's thread pool is used to create asynchronous Node APIs on top of synchronous system APIs. The Node.js APIs that use the thread pool are:
- all fs APIs, other than the file watcher APIs and those that are explicitly synchronous
- asynchronous crypto APIs such as crypto.pbkdf2(), crypto.scrypt(), crypto.randomBytes(), crypto.randomFill(), crypto.generateKeyPair()
- dns.lookup()
- all zlib APIs, other than those that are explicitly synchronous
So we don't have direct access to the libuv thread pool, but we may define our own uses of it via C++ add-ons.
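You can see the pool at work with one of the APIs listed above; a minimal sketch using crypto.pbkdf2() (timings will vary by machine, and the pool size can be raised via the UV_THREADPOOL_SIZE environment variable):

    // Four pbkdf2 calls are handed to libuv's thread pool (default size 4),
    // so they run in parallel and finish at roughly the same time; a fifth
    // call would wait in the pool's queue for a free thread.
    const crypto = require('crypto');

    const start = Date.now();
    for (let i = 1; i <= 4; i++) {
      crypto.pbkdf2('password', 'salt', 100000, 64, 'sha512', () => {
        console.log(`call ${i} done after ${Date.now() - start} ms`);
      });
    }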
Blocking calls
Node.js executes blocking code on the main thread. fs.readFileSync(), compression algorithms, encrypting data, image resizing, and calculating primes over a large range are some examples of blocking operations. Node.js's golden rule is: never block the event loop (the main thread). We can execute these operations asynchronously by creating a child process using the cluster module or the child_process module. But creating a child process is heavy in terms of OS resources, and that's why worker threads were born.
Using worker threads you can execute blocking JavaScript code in a worker thread, unblocking the main thread, and you can communicate with the parent thread (the main thread) via message passing. Worker threads are still lightweight compared to a child process.
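A minimal sketch of that pattern using the built-in worker_threads module, moving a prime-counting loop off the main thread (the limit of 1e7 is an arbitrary choice):

    const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

    if (isMainThread) {
      // Main thread: offload the CPU-bound loop and stay responsive.
      const worker = new Worker(__filename, { workerData: 1e7 });
      worker.on('message', (count) => console.log(`primes found: ${count}`));
      worker.on('error', console.error);
    } else {
      // Worker thread: the blocking loop runs here, so the main event loop
      // is never stalled; the result goes back via message passing.
      let count = 0;
      for (let n = 2; n < workerData; n++) {
        let isPrime = true;
        for (let d = 2; d * d <= n; d++) {
          if (n % d === 0) { isPrime = false; break; }
        }
        if (isPrime) count++;
      }
      parentPort.postMessage(count);
    }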
Read more here:
https://nodesource.com/blog/worker-threads-nodejs
https://blog.insiderattack.net/deep-dive-into-worker-threads-in-node-js-e75e10546b11
I am currently using the CassFuture callback to implement an asynchronous pattern for processing Cassandra queries. It appears that all of the callbacks are coming in on the same thread. Is this the expected behavior?
Assuming you are using the current version of the driver (v2.9.0), the number of I/O threads is configured via cass_cluster_set_num_threads_io() before creating a session, and it defaults to 1. If you increase the number of I/O threads in the cluster configuration, you will notice that CassFuture callbacks begin to show up on different threads.
NOTE: If your callback is slow, consider running it on a separate thread; otherwise the callback might block I/O operations for other requests being executed on the callback/I/O thread. Another recommendation is not to use the full number of cores/virtual cores available in your hardware configuration, as this could starve your client application (and potentially OS services) of resources.
I understand that the power of Node.js is that it processes all user requests on a single thread working on a queue of requests. The idea is that there is no context switching of this thread and no system calls.
input thread ---> | request queue | ---> output thread (processes a task itself if it causes no system call, else delegates it to the thread pool)
The thread pool will:
- execute tasks involving system calls (usually somewhat long-running ones, e.g. I/O tasks)
- put the results back as another task in the queue
- which will then be processed by the single thread working on the queue
My question is: inevitably, Node.js code will need to put data into an RDBMS or a JMS system. This is most definitely synchronous (even putting a message on JMS is synchronous, although producer and consumer are not synchronous with each other). So the thread pool processing these I/O tasks will not only make system calls but will also be blocked during this period. JDBC in any case does not support asynchronous calls (I guess due to the need to be transactional, and maybe security issues, since transaction and security contexts are attached to threads).
So how do we actually put data into an RDBMS efficiently from a Node.js server?
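For context, Node.js database drivers typically speak the database's wire protocol over non-blocking sockets rather than wrapping blocking client libraries, so no thread is parked per query. A minimal sketch, assuming the popular pg (node-postgres) driver, a reachable database, and a hypothetical orders table:

    const { Pool } = require('pg'); // npm install pg

    // The driver talks to Postgres over a non-blocking socket, so no
    // libuv thread is blocked while the query is in flight.
    const pool = new Pool({ connectionString: 'postgres://user:pass@localhost/db' });

    async function saveOrder(id, total) {
      // The event loop stays free for other requests until the row is written.
      const res = await pool.query(
        'INSERT INTO orders (id, total) VALUES ($1, $2) RETURNING id',
        [id, total]
      );
      return res.rows[0].id;
    }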
In Apache, we have a single thread for each incoming request. Each thread consumes its own memory space, and the memory spaces don't collide with each other, which is why each request is served correctly.
How does this happen in Node.js, given that it has single-threaded execution? A single memory space is used by all incoming requests. Why don't the requests collide with each other? What differentiates them?
As you noticed yourself, an event-based model allows the given memory to be shared more efficiently, as the overhead of re-executing a stack again and again is minimized.
However, to make an event-based or single-threaded model non-blocking, you have to get back to threads somewhere, and this is where Node's "I/O engine" libuv comes in.
libuv supplies an API which, underneath, manages I/O tasks in a thread pool when an I/O task is performed asynchronously. Using a thread pool means the main process is not blocked by I/O; heavy JavaScript operations, however, can still block it (this is why there is the cluster module, which allows spawning multiple worker processes, as sketched below).
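A minimal sketch of that cluster pattern (cluster.isPrimary is called cluster.isMaster on Node versions before 16):

    const cluster = require('cluster');
    const http = require('http');
    const os = require('os');

    if (cluster.isPrimary) {
      // Fork one worker process per CPU; each worker gets its own event
      // loop and its own memory space.
      for (let i = 0; i < os.cpus().length; i++) {
        cluster.fork();
      }
    } else {
      // The primary distributes incoming connections across the workers.
      http.createServer((req, res) => {
        res.end(`handled by pid ${process.pid}\n`);
      }).listen(3000);
    }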
I hope this answers your question; if not, feel free to comment!
In a recent course at school about networking / operating systems, I learned about thread pools. Now, the basic functionality is pretty straightforward and I understand it.
However, what's not specified in my book is what happens when the thread pool is exhausted. For example, you have a pool with 20 threads in it and 20 connected clients. Another client tries to connect, but there are no threads left in the pool; what happens then? Does the client go into a queue? Does the system make another thread to put in the pool? Something else?
The answer depends highly on your language, your operating system, and your pool implementation.
what happens when the thread pool is exhausted? Another client tries to connect but there are no threads left in the pool, what happens then? Does the client go into a queue?
Typically in a server situation it depends on the socket settings: either the socket connection gets queued by the OS or the connection gets refused. This is usually not handled by the thread pool. On Unix-like operating systems, this queue or "backlog" is handled by the listen method.
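For illustration, here is how the backlog hint surfaces in Node.js (the concept is the same in any language; 511 happens to be Node's default):

    const net = require('net');

    const server = net.createServer((socket) => {
      socket.end('handled\n');
    });

    // The third argument to listen() is the backlog hint: how many pending,
    // not-yet-accepted connections the OS will queue for this socket. Once
    // the backlog is full, further connection attempts are refused or time
    // out instead of reaching the application.
    server.listen(8080, '0.0.0.0', 511);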
Does the system make another thread to put in the pool?
This depends on the thread pool. Some pools are fixed size, so no more threads will be added. Other pools are "cached" thread pools, so a free thread will be reused, or a new one created if none is available. Many web servers have max-thread settings on their pools so remote users can't thrash the system by starting too many concurrent connections.
It depends on the policy used by the thread pool:
- the pool size can be static, and when a new thread is requested the caller will wait on a synchronization primitive such as a semaphore, or the request can be pushed onto a queue (see the sketch below)
- the pool size can be unlimited, but this may be dangerous because creating too many threads can greatly reduce performance; more often than not it is bounded between a min and a max set by the pool user
- the pool can use a dynamic policy depending on the context: hardware resources like CPU or RAM, OS resources like synchronization primitives and threads, current process resources (memory, threads, handles...)
An example of a smart thread-pool: http://www.codeproject.com/Articles/7933/Smart-Thread-Pool
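And a minimal sketch of the first policy (static size plus a FIFO queue), written with Node's worker_threads since that is the runtime discussed above; the squaring task is just a stand-in:

    const { Worker, isMainThread, parentPort } = require('worker_threads');

    if (!isMainThread) {
      // Worker side: handle one task per message and send the result back.
      parentPort.on('message', (n) => parentPort.postMessage(n * n));
    } else {
      // Main side: a fixed-size pool of two workers; excess tasks wait in
      // a FIFO queue until a worker becomes idle.
      const idle = [new Worker(__filename), new Worker(__filename)];
      const queue = [];

      function dispatch(worker, { task, resolve }) {
        worker.once('message', (result) => {
          resolve(result);
          const next = queue.shift();
          if (next) dispatch(worker, next);
          else idle.push(worker);
        });
        worker.postMessage(task);
      }

      function run(task) {
        return new Promise((resolve) => {
          const job = { task, resolve };
          const worker = idle.pop();
          if (worker) dispatch(worker, job);
          else queue.push(job);
        });
      }

      // Five tasks, two workers: three tasks queue instead of spawning threads.
      Promise.all([1, 2, 3, 4, 5].map((n) => run(n))).then((results) => {
        console.log(results); // [1, 4, 9, 16, 25]
        idle.forEach((w) => w.terminate());
      });
    }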
It depends on the thread pool implementation. They might be put on a queue, they might get a new thread created for them, or they might even just get an error message saying come back later. Or if you are the one implementing the thread pool, you can do whatever you want.