Why is Redis single-threaded (event-driven) rather than multithreaded?

I am trying to understand the basics of Redis.
One thing that keeps coming up everywhere is that Redis is single-threaded, and that this makes operations atomic. But I am unable to imagine how this works internally. I have the following doubt.
Don't we design a server as single-threaded when it is an I/O-bound application (like Node.js), where the thread is freed for another request after initiating an I/O operation and returns data to the client once the I/O operation finishes (providing concurrency)? But in the case of Redis all data is available in main memory; we are not going to do I/O operations at all. So why is Redis single-threaded? And what happens if the first request takes too much time? Will the remaining requests have to keep waiting?

TL;DR: A single thread makes Redis simpler, and Redis is still I/O bound.
Memory is I/O. Redis is still I/O bound. When Redis is under heavy load and reaches its maximum requests per second, it is usually starved for network bandwidth or memory bandwidth, and is usually not using much of the CPU. There are certain commands for which this won't be true, but for most use cases Redis will be severely I/O bound by network or memory.
Unless memory and network speeds suddenly get orders of magnitude faster, being single-threaded is usually not an issue. If you need to scale beyond one or a few threads (i.e. a master<->slave<->slave setup), you are already looking at Redis Cluster. In that case you can set up a cluster instance per CPU core if you are somehow CPU starved and want to maximize the number of threads.
I am not very familiar with the Redis source or internals, but I can see how using a single thread makes it easy to implement lockless atomic actions. Threads would make this more complex and don't appear to offer large advantages, since Redis is not CPU bound. Implementing concurrency at a level above a Redis instance seems like a good solution, and is what Redis Sentinel and Redis Cluster help with.
What happens to other requests when Redis takes a long time?
Those other requests will block while Redis completes the long request. If needed, you can test this using the CLIENT PAUSE command.
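For example, here is a minimal sketch assuming the ioredis client for Node and a local server. CLIENT PAUSE makes the server stop processing client commands for a while, so a concurrent GET issued on a second connection only returns once the pause expires, just as it would behind a genuinely slow command:

    import Redis from "ioredis";

    // Two separate connections to a local Redis instance.
    const pauser = new Redis();
    const reader = new Redis();

    async function main() {
      // Suspend processing of commands from all clients for 3 seconds.
      await pauser.client("PAUSE", 3000);

      const start = Date.now();
      // This GET queues behind the pause, just like a request
      // queued behind a slow command would.
      await reader.get("some-key");
      console.log(`GET returned after ${Date.now() - start} ms`); // ~3000 ms

      pauser.disconnect();
      reader.disconnect();
    }

    main().catch(console.error);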

The correct answer is Carl's, of course. However.
In Redis v4 we're seeing the beginning of a shift from being mostly single-threaded to selectively and carefully multithreaded. Modules and thread-safe contexts are one example of that. Another two are the new UNLINK command and the ASYNC mode for FLUSHDB/FLUSHALL. Future plans are to offload more work that's currently being done by the main event loop (e.g. I/O-bound tasks) to worker threads.
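As a quick sketch of UNLINK and the ASYNC flush mode (again assuming the ioredis client; the key name is made up), both return promptly and leave the expensive memory reclamation to background threads:

    import Redis from "ioredis";

    const redis = new Redis();

    async function cleanup() {
      // DEL would free a large value synchronously on the main thread;
      // UNLINK removes the key from the keyspace immediately and defers
      // the actual memory reclamation to a background thread.
      await redis.unlink("huge:sorted:set");

      // Likewise, ASYNC mode flushes the dataset in the background
      // instead of blocking the event loop while every key is freed.
      await redis.flushall("ASYNC");

      redis.disconnect();
    }

    cleanup().catch(console.error);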

From the Redis website:
Redis uses a mostly single threaded design. This means that a single process serves all the client requests, using a technique called multiplexing. This means that Redis can serve a single request in every given moment, so all the requests are served sequentially. This is very similar to how Node.js works as well. However, both products are not often perceived as being slow. This is caused in part by the small amount of time to complete a single request, but primarily because these products are designed to not block on system calls, such as reading data from or writing data to a socket.
I said that Redis is mostly single threaded since actually from Redis 2.4 we use threads in Redis in order to perform some slow I/O operations in the background, mainly related to disk I/O, but this does not change the fact that Redis serves all the requests using a single thread.
Memory access is not an I/O operation.

Related

Should I use BLPOP, or yield-based busy waiting on hundreds of redis keys?

The former sounds good, but I have the following concerns:
blocking hundreds of connections might be wasteful, depending on the I/O multiplexing strategy used, and
the Redis server has to deal with more concurrent connections, since each one is long-running.
The alternative approach would be the latter:
instead of busy waiting indefinitely, yield the thread on every N-th iteration.
Note that the number of connections would increase in proportion to the number of instances. Besides those two, a fixed pool of BLPOP executors could be introduced, but that could easily become the bottleneck if some of the Redis lists are idle.
I/O multiplexing is not busy waiting.
The BLPOP command is an actual command sent to the Redis server; it involves I/O multiplexing on the server side, not busy waiting in the client code.
I don't see any point in busy waiting, with or without a thread yield.
As an alternative, I suggest using the PUB/SUB functionality supported by Redis, in case you need to wait on other events besides a list receiving an element.
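For illustration, a sketch with the ioredis client (the queue name is made up): the dedicated connection below sits in a server-side blocked state inside BLPOP, consuming no CPU on either end until an element arrives.

    import Redis from "ioredis";

    // A blocked connection cannot issue other commands, so use a
    // dedicated connection per blocking consumer.
    const consumer = new Redis();

    async function consume() {
      for (;;) {
        // Block server-side until "jobs" receives an element
        // (timeout 0 = wait forever). No polling, no thread yields:
        // the server's event loop simply keeps the reply pending.
        const res = await consumer.blpop("jobs", 0);
        if (res) {
          const [key, value] = res;
          console.log(`popped ${value} from ${key}`);
        }
      }
    }

    consume().catch(console.error);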

Details of how Node.js works?

I want to ask some clarifying questions about NodeJS, which I think are poorly explained in the resources I have studied.
Many sources say that NodeJS is not suitable for complex calculations, because it is single-threaded and requests are executed sequentially.
I created the simplest possible server on Node and wrote an endpoint that takes about 10 seconds to execute (a loop). Then I made 10 consecutive requests via Postman, and indeed, each subsequent request began executing only after the previous one had returned a response.
Do I understand correctly that in this case, if the execution time of one endpoint is approximately 300 ms and 700 users access the server at the same time, then the waiting time for the last user will be a critical 700 × 300 = 210,000 ms?
I have also heard that an advantage of NodeJS is its ability to support a large number of simultaneous connections. What does this mean, and why is it a plus if the response for the last user from the previous question will still take very long?
Another statement I came across is that libuv allows many I/O operations to be performed at the same time. How does that work if NodeJS processes requests sequentially anyway?
Thank you very much!
TL;DR: I/O operations don't block the single execution thread. CPU-intensive tasks DO block the thread, and a NodeJS web server is not a good option in that case.
Yes, if your endpoint needs 300 ms of synchronous (CPU) work to complete the operation, the last user will wait 210,000 ms.
NodeJS is good at handling a large number of connections when the work it needs to do is I/O bound. It is not a good choice if the endpoint needs a lot of CPU time.
I/O operations happen at a different layer and take essentially zero CPU time. That means that once the I/O operation is fired, NodeJS can accept new calls to the endpoint. NodeJS then polls the operating system for completed I/O calls whenever it's not using the CPU, and executes the callbacks. This is what allows it to handle a large number of concurrent requests without one user having to wait for others to finish.
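A minimal sketch of the difference using Node's built-in http module (the routes and the 300 ms figure are only illustrative):

    import http from "node:http";

    http.createServer((req, res) => {
      if (req.url === "/io") {
        // Simulated async I/O: the timer is handed off to the runtime
        // and the thread is immediately free to serve other requests.
        setTimeout(() => res.end("io done\n"), 300);
      } else {
        // Simulated CPU work: a synchronous 300 ms busy loop that
        // blocks the single thread; nothing else runs in the meantime.
        const end = Date.now() + 300;
        while (Date.now() < end) { /* spin */ }
        res.end("cpu done\n");
      }
    }).listen(3000);

Fire 700 concurrent requests at /io and they all finish in roughly 300 ms; fire them at /cpu and the last one waits roughly 700 × 300 = 210,000 ms, as in the question above.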

Is sharing TBB's thread pool with an HTTP server a good idea?

I know this is a weird question, but hear me out. I'm working on a high-throughput, compute-heavy HTTP backend server in C++. It is quite straightforward:
Spin up an HTTP server
Receive a request
Do a lot of math
This step is parallelized using TBB
Send the result back (takes about 20 ms)
There's no limit on how soon the response has to go out, but the lower the worst case, the better.
Now my bottleneck is that the server part uses a different thread pool than TBB. Thus, when TBB is busy doing math, the server may suddenly get tens of new requests; the server-side threads then get scheduled and cause a lot of cache misses and branch prediction failures.
A solution I came up with is to share TBB's thread pool with the server. Then no request will be registered while TBB is busy, and requests will be processed immediately after TBB is free.
Is this a good idea? Or could it have potential problems?
This is difficult to answer without knowing what that other thread pool is doing. If it handles file or network I/O, then combining it with a CPU-intensive pool can be a pessimization, since I/O does not consume CPU.
Normally there should be a small pool, or maybe even a single thread, handling the accept loop and async I/O, handing new requests off to the worker pool for processing and sending the results back to the network.
Try to avoid mixing CPU-intensive work with I/O work, as it makes resource utilization difficult to manage. Having said that, sometimes it's just easier, and it's never good to run at 100% CPU anyway. So yes, you should try having just one pool, but measure the performance before and after the change.

If Redis is single-threaded, how can it be so fast?

I'm currently trying to understand some basic implementation details of Redis. I know that Redis is single-threaded, and I have already stumbled upon the following question: Redis is single-threaded, then how does it do concurrent I/O?
But I still think I didn't understand it right. AFAIK, Redis uses the reactor pattern with one single thread. So if I understood this right, there is a watcher (which handles FDs and incoming/outgoing connections) that delegates the work to its registered event handlers. They do the actual work and post e.g. their responses as an event back to the watcher, which transfers the responses back to the clients. But what happens if a request (R1) of a client takes, let's say, about 1 minute, and another client creates another (fast) request (R2)? Then, since Redis is single-threaded, R2 cannot be delegated to the right handler until R1 is finished, right? In a multithreaded environment you could just start each handler on its own thread, so the "main" thread just accepts and responds to I/O connections and all other work is carried out in separate threads.
If it really just queues the I/O handling and the handler logic, it could never be as fast as it is. What am I missing here?
You're not missing anything, besides perhaps the fact that most operations in Redis complete in a couple of microseconds. Long-running operations indeed block the server during their execution.
Let's say there were 10,000 users doing live data pulls with HMGET, taking 10 seconds each, while on the other side the server was broadcasting updates using HMSET; Redis could only issue the set at the end of the queue.
Redis is only good for queuing and limited processing, like lazily inserting last-login info, but not for live info broadcasting; in that case, memcached would be the right choice. Redis is single-threaded and processes commands FIFO.

How can NodeJS scale an enterprise application?

Suppose I have an enterprise Java application that basically does the following:
gather user input, query the backend databases (maybe multiple), run some algorithm (say, some in-memory calculation on the queried data sets to produce statistics, etc.), then return the data in some HTML pages.
My question is: if the bottleneck of the application is the DB query, how can NodeJS help me in this scenario, since I still need to run all that post-DB algorithm before I render the page? What would the application architecture look like?
Of course Node can't speed up your storage layer, or make a single request that incurs so much backend processing complete any faster for the end user. But what it can do is avoid tying up a thread in the application server's thread pool: the single thread can continue on its loop while that work is going on and accept another request.
That other request might be a cheaper one that returns as soon as its work is done. That can also happen in an application server with a thread-pool model, unless all the threads in the pool are tied up blocked on I/O requests (along with the overhead of each thread). Then the cheaper request gets queued waiting for a thread from the pool, because they are all blocked, whereas Node's single thread would simply loop around and serve the cheap request.
This works because Node mandates that all I/O is async, and the only work that blocks the loop is your code. Hence the saying "everything in Node runs in parallel except your code". While it's possible to write async code in other application servers and achieve similar results, many offer non-async thread-pool models where the coding is easier but sometimes less scalable.
For example, this Hanselman post illustrates how ASP.NET is capable of doing async requests, but it's not the common model most have used.
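To make that concrete, here is a minimal sketch of the Node side; the db object below is a stand-in for any real async database driver with a promise-based API. The await suspends only this request's continuation, not the thread, so the loop keeps accepting other connections while the query is in flight:

    import http from "node:http";

    // Stand-in for a real async database driver: resolves after a
    // simulated 50 ms round trip, like the promise APIs of pg or mysql2.
    const db = {
      query: (sql: string): Promise<string[]> =>
        new Promise((resolve) =>
          setTimeout(() => resolve(["row1", "row2"]), 50)
        ),
    };

    http.createServer(async (req, res) => {
      // The query is dispatched and the single thread returns to the
      // event loop, free to accept other requests while it is in flight.
      const rows = await db.query("SELECT * FROM stats");

      // Only this part (your code) actually occupies the thread.
      res.setHeader("Content-Type", "text/html");
      res.end(`<p>processed ${rows.length} rows</p>`);
    }).listen(3000);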
