Server design for synchronous I/O services - multithreading

I'm currently trying to decide on a design for a TCP server where the services the server will provide consist of performing synchronous I/O (tons of DB queries - existing code!).
The system that this server will be part of has a couple of hundred clients that are typically all connected simultaneously and stay connected for several hours.
Basically all client requests are the result of human interaction, so the frequency is low but the response time should be as fast as possible.
As I said, the service implementation must perform synchronous I/O, so a fully event-based server is obviously out of the question.
Threads seem like a natural choice for serializing blocking I/O, but you often see advice not to use more threads than CPU cores.
Currently I'm leaning towards using a thread pool with a number of threads that is actually higher than the core count, since the threads will mostly be blocking anyway.
Would such a design be reasonable? What alternatives exist for a server with these requirements?

It seems (yes, again from other posts, not my direct experience) that 200 .NET managed threads 'cause all kinds of problems'. 200 unmanaged threads is not a big problem on Windows/Linux.
200 persistent database connections, however, seems like a lot! You may wish to pool them, or use a thread pool for DB access with suitable inter-thread comms.
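To make that concrete, here is a minimal sketch in Java of the design you're leaning towards: a fixed pool sized well above the core count, because the workers spend nearly all their time blocked. The port number, pool size, and request handling are placeholder assumptions; the DB side would use its own connection pool (e.g. HikariCP) rather than one connection per client thread.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BlockingPoolServer {
    public static void main(String[] args) throws IOException {
        // Far more threads than cores is fine here: each worker spends
        // almost all of its time blocked on the socket or the database.
        ExecutorService workers = Executors.newFixedThreadPool(200);

        try (ServerSocket listener = new ServerSocket(9000)) {
            while (true) {
                Socket client = listener.accept();      // one connection per submitted task
                workers.submit(() -> handle(client));
            }
        }
    }

    private static void handle(Socket client) {
        try (client) {
            // Hypothetical placeholder: read a request from the socket, run the
            // existing synchronous DB code against a pooled connection, and
            // write the response back. The thread simply blocks throughout.
        } catch (IOException e) {
            // log and drop the connection
        }
    }
}
```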

Related

Node.js vs ASP.Net Core: Response times when doing I/O heavy operations under stress test?

Let's assume we are stress testing two servers that perform database read/write operations without caching and make network calls. One is a Node.js server, and the other is an ASP.Net Core one that utilizes Tasks. Assume that both perform the same operations, receive the same requests per second, and run on machines with equal processing power.
I've read that Node.js is very performant on I/O operations, but I can't get my head around it. Sure, the awaited DB and network calls run in a non-blocking way; however, each of these calls is handled by an individual thread from a limited thread pool. This doesn't sound performant to me at all. I would be forced to implement a caching mechanism to mitigate this, which is something I don't really like. So either Node.js is not the greatest choice for these kinds of operations, or my knowledge is incorrect.
As for ASP.NET Core, I don't know its internals, but I'm pretty sure it doesn't have the thread limitation issues Node.js has, so by that logic it should have shorter response times. Yet I still can't be sure, given concerns about resource consumption and context-switching costs.
So which of these two would theoretically have shorter response times, and why exactly?

What is I/O Scaling Problem in nodeJS Performance?

Can you please explain what is the I/O Scaling Problem in Node.js Performance?
I am reading the book "Beginning Node.js" by Basarat Ali Syed (Apress), but its explanation of I/O scaling is not enough for me.
A server typically has a mix of computational things to do and I/O things to do (getting data from somewhere like a database, a disk, or another server). On today's servers with fast multi-core processors, it is more common for a given server request to be limited by I/O than by CPU.
So, if you're going to scale a server to handle lots of requests with good performance, you have to find a way to handle lots of I/O requests efficiently, because that's probably what your server is limited by. This is the "I/O scaling problem": how to scale your server and code architecture to handle lots of I/O requests very efficiently.
It so happens that the node.js single-threaded architecture with asynchronous I/O is very efficient at doing lots of I/O and can be more efficient than other server architectures that use multiple threads and blocking I/O calls.
If you go to your Table of Contents in that book, you will see the following:
Understanding Node.js Performance
The I/O Scaling Problem
Traditional Web Servers Using a Process Per Request
Traditional Web Servers Using a Thread Pool
The Nginx way
Node.js Performance Secret
I don't have the book myself, but I would presume that "The I/O Scaling Problem" section of the book describes this for you, and you can then read about the node.js performance secret for how it handles it. Servers that use a process or a thread per request take more system resources to keep lots of requests in flight at the same time (which is one key to handling lots of requests). The node.js non-blocking I/O model, on the other hand, is very efficient at handling lots and lots of in-flight requests.
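As a rough illustration of that non-blocking style (my sketch, not from the book): with Java's asynchronous channels, one accept loop never blocks and each connection is serviced by callbacks rather than a dedicated thread. The port and buffer size are arbitrary, and a real server would chain a write-completion handler instead of firing and forgetting the echo.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousServerSocketChannel;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;

public class AsyncEchoServer {
    public static void main(String[] args) throws IOException, InterruptedException {
        AsynchronousServerSocketChannel server =
                AsynchronousServerSocketChannel.open().bind(new InetSocketAddress(9000));

        server.accept(null, new CompletionHandler<AsynchronousSocketChannel, Void>() {
            @Override
            public void completed(AsynchronousSocketChannel client, Void att) {
                server.accept(null, this);  // re-arm: keep accepting while this client is served
                ByteBuffer buf = ByteBuffer.allocate(1024);
                client.read(buf, buf, new CompletionHandler<Integer, ByteBuffer>() {
                    @Override
                    public void completed(Integer bytesRead, ByteBuffer b) {
                        b.flip();
                        client.write(b);    // echo back; no thread sits blocked in between
                        // (a real server would chain a write-completion handler here)
                    }
                    @Override
                    public void failed(Throwable exc, ByteBuffer b) { /* log and drop */ }
                });
            }
            @Override
            public void failed(Throwable exc, Void att) { /* log */ }
        });

        Thread.currentThread().join();      // keep the JVM alive; the callbacks do the work
    }
}
```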

Efficient way to process many threads of same Application

I have a multi-client, single-server application where clients and the server are connected through sockets. Client and server are on different machines.
In the client application, the client socket connects to the server and periodically sends data to it.
In the server application, the server socket listens for clients to connect. When a client connects, a new thread is created for that client to receive its data.
For example: 1 client = 1 thread created by the server for receiving data. If there are 10,000 clients, the server creates 10,000 threads. This seems neither good nor scalable.
My application is in Java.
Is there an alternative approach to this problem?
Thanks in advance
This is a typical C10K problem. There are patterns to solve this; one example is the Reactor pattern.
Java NIO is another way, where incoming requests can be processed in a non-blocking way. See a reference implementation here
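The core of such a non-blocking server in plain Java NIO looks roughly like this. This is a minimal sketch, not the linked reference implementation; the port, buffer size, and echo behavior are assumptions, and partial writes are ignored for brevity.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class NioEchoServer {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(9000));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        ByteBuffer buf = ByteBuffer.allocate(1024);
        while (true) {                           // one thread, any number of clients
            selector.select();                   // block until some channel is ready
            Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
            while (keys.hasNext()) {
                SelectionKey key = keys.next();
                keys.remove();
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    SocketChannel client = (SocketChannel) key.channel();
                    buf.clear();
                    int n = client.read(buf);    // never blocks: the key said it's ready
                    if (n < 0) { client.close(); continue; }
                    buf.flip();
                    client.write(buf);           // echo (partial writes ignored for brevity)
                }
            }
        }
    }
}
```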
Yes, you should not need a separate thread for each client. There's a good tutorial here that explains how to use await to handle asynchronous socket communication. Once you receive data over the socket you can use a fixed number of threads. The tutorial also covers techniques to handle thousands of simultaneous communications.
Unfortunately, given the complexity, it's not possible to post the code here, so although link-only answers are frowned upon ...
I would say it's a perfect candidate for an Erlang/Elixir application. WhatsApp, RabbitMQ...
Erlang processes are cheap and fast to start, Erlang manages the scheduling for you so you don't have to think about the number of threads, CPUs, or even machines, and Erlang manages garbage collection for each process once you don't need it anymore.
Haskell is slow; Erlang is fast enough for most applications that are not doing heavy calculations, and even then you can use it and hand the heavy lifting off to a C process.
What are you writing in?
Yes, you can use the Actor model, with e.g. Akka or Akka.net. This allows you to create millions of actors that run on e.g. 4 threads. Erlang is a programming language that implements the actor model natively.
However, actors and non-blocking code won't do you much good if you are relying on blocking library calls for backend services that you rely on, such as (the most prominent example in the JVM world) JDBC calls.
There is also a rather interesting approach that Haskell uses, called green threads. It means that the runtime threads are very lightweight and are dynamically mapped to OS threads. It also means that you get a certain amount of scalability "for free", with no need to write non-blocking I/O code. It does, however, require a good I/O manager in the runtime to schedule the I/O operations efficiently, and GHC Haskell has had a substantial amount of work put into that in recent years.
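For a flavor of the actor style, here is a minimal sketch using Akka's classic Java API. The class name and message type are made up for illustration, and the actual socket plumbing (e.g. via Akka I/O) is omitted.

```java
import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;

// Each connected client could be represented by one of these instead of one thread:
// the actor occupies no thread at all while its mailbox is empty.
public class ClientSession extends AbstractActor {
    @Override
    public Receive createReceive() {
        return receiveBuilder()
                .match(String.class, data -> {
                    // React to a message from "this" client; the reply goes to
                    // whoever sent it (dead letters in this little demo).
                    getSender().tell("ack: " + data, getSelf());
                })
                .build();
    }

    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("server");
        ActorRef session = system.actorOf(Props.create(ClientSession.class), "client-1");
        session.tell("some data", ActorRef.noSender());
    }
}
```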

Redis is single-threaded, then how does it do concurrent I/O?

Trying to grasp some basics of Redis, I came across an interesting blog post.
The author states:
Redis is single-threaded with epoll/kqueue and scale indefinitely in terms of I/O concurrency.
I surely misunderstand the whole threading thing, because I find this statement puzzling. If a program is single-threaded, how does it do anything concurrently? Why is it so great that Redis operations are atomic, if the server is single-threaded anyway?
Could anybody please shed some light on the issue?
Well, it depends on how you define concurrency.
In server-side software, concurrency and parallelism are often considered different concepts. In a server, supporting concurrent I/O means the server is able to serve several clients by executing several flows corresponding to those clients with only one computation unit. In this context, parallelism would mean the server is able to perform several things at the same time (with multiple computation units), which is different.
For instance, a bartender is able to look after several customers while he can only prepare one beverage at a time. So he can provide concurrency without parallelism.
This question has been debated here:
What is the difference between concurrency and parallelism?
See also this presentation from Rob Pike.
A single-threaded program can definitely provide concurrency at the I/O level by using an I/O (de)multiplexing mechanism and an event loop (which is what Redis does).
Parallelism has a cost: with the multiple sockets/multiple cores you can find on modern hardware, synchronization between threads is extremely expensive. On the other hand, the bottleneck of an efficient storage engine like Redis is very often the network, well before the CPU. Isolated event loops (which require no synchronization) are therefore seen as a good design to build efficient, scalable, servers.
The fact that Redis operations are atomic is simply a consequence of the single-threaded event loop. The interesting point is atomicity is provided at no extra cost (it does not require synchronization). It can be exploited by the user to implement optimistic locking and other patterns without paying for the synchronization overhead.
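For example, the optimistic locking mentioned above is typically built on WATCH/MULTI/EXEC. A minimal sketch using the Jedis client follows; the key name and retry loop are illustrative assumptions, and the exact way a conflict is signalled can vary between client versions.

```java
import java.util.List;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.Transaction;

public class OptimisticIncrement {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            List<Object> result = null;
            while (result == null) {                    // retry until we win the race
                jedis.watch("counter");                 // EXEC aborts if "counter" changes
                String raw = jedis.get("counter");
                long current = (raw == null) ? 0 : Long.parseLong(raw);
                Transaction tx = jedis.multi();
                tx.set("counter", Long.toString(current + 1));
                result = tx.exec();                     // null means a concurrent write won
            }
        }
    }
}
```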
OK, Redis is single-threaded at user level; OTOH, all asynchronous I/O is supported by kernel thread pools and/or split-level drivers.
'Concurrent', to some, includes distributing network events to socket state-machines. It's single-threaded and runs on one core (at user level), so I would not refer to this as concurrent. Others differ.
'Scale indefinitely in terms of I/O concurrency' is just being economical with the truth. They might be believed more if they said 'can scale better than one thread per client, providing the clients don't ask for much', though they might then feel obliged to add 'but can be blown away under heavy load by other async solutions that use all cores at user level'.

Why are event-based network applications inherently faster than threaded ones?

We've all read the benchmarks and know the facts: event-based asynchronous network servers are faster than their threaded counterparts. Think lighttpd or Zeus vs. Apache or IIS. Why is that?
I think event-based vs. thread-based is not the real question; it is non-blocking multiplexed I/O with selectable sockets vs. a thread-pool solution.
In the first case, a single 'listener' handles all input as it arrives, regardless of which connection it belongs to, so there is no blocking on the reads. The listener thread passes data on to worker threads of various types, rather than dedicating one thread to each connection. Again, there is no blocking on writing any of the data, so the data handler can run with it separately. Because this solution is mostly I/O reads/writes, it doesn't occupy much CPU time, and your application can use that time to do whatever it wants.
In a thread-pool solution, individual threads handle each connection, each one 'listening', so they have to share time by context switching in and out. In this solution the CPU and I/O operations live in the same thread, which gets a time slice, so per thread you end up waiting (blocked) for I/O operations to complete, which could traditionally be done without using CPU time.
Google for non-blocking I/O for more detail, and you can probably find some comparisons vs. thread pools too.
(if anyone can clarify these points, feel free)
Event-driven applications are not inherently faster.
From Why Events Are a Bad Idea (for High-Concurrency Servers):
We examine the claimed strengths of events over threads and show that the weaknesses of threads are artifacts of specific threading implementations and not inherent to the threading paradigm. As evidence, we present a user-level thread package that scales to 100,000 threads and achieves excellent performance in a web server.
This was in 2003. Surely the state of threading on modern OSs has improved since then.
Writing the core of an event-based server means re-inventing cooperative multitasking (Windows 3.1 style) in your code, most likely on an OS that already supports proper pre-emptive multitasking, and without the benefit of transparent context switching. This means that you have to manage state on the heap that would normally be implied by the instruction pointer or stored in a stack variable. (If your language has them, closures ease this pain significantly. Trying to do this in C is a lot less fun.)
This also means you gain all of the caveats cooperative multitasking implies. If one of your event handlers takes a while to run for any reason, it stalls that event thread, and totally unrelated requests lag. Even lengthy CPU-intensive operations have to be sent somewhere else to avoid this. When you're talking about the core of a high-concurrency server, 'lengthy operation' is a relative term, on the order of microseconds for a server expected to handle 100,000 requests per second. I hope the virtual memory system never has to pull pages from disk for you!
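To make the "state on the heap" point concrete, here is an illustrative sketch (in Java, with a made-up length-prefixed protocol) of the per-connection state an event handler must carry explicitly, where a thread-per-connection server would just keep local variables on the stack.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

// Per-connection state for an event loop reading length-prefixed messages.
// In a blocking, thread-per-connection server all of this would simply be
// local variables; here it must survive on the heap between readiness events.
final class MessageReader {
    private final ByteBuffer header = ByteBuffer.allocate(4); // 4-byte length prefix
    private ByteBuffer body;                                  // null until header completes

    /** Called whenever the socket is readable; returns a complete message or null. */
    byte[] onReadable(SocketChannel ch) throws IOException {
        if (body == null) {
            ch.read(header);
            if (header.hasRemaining()) return null;           // resume here on the next event
            header.flip();
            body = ByteBuffer.allocate(header.getInt());
        }
        ch.read(body);
        if (body.hasRemaining()) return null;                 // still mid-message
        body.flip();
        byte[] msg = new byte[body.remaining()];
        body.get(msg);
        header.clear();                                       // reset for the next message
        body = null;
        return msg;
    }
}
```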
Getting good performance from an event-based architecture can be tricky, especially when you consider latency and not just throughput. (Of course, there are plenty of mistakes you can make with threads as well. Concurrency is still hard.)
A couple of important questions for the author of a new server application:
How do threads perform on the platforms you intend to support today? Are they going to be your bottleneck?
If you're still stuck with a bad thread implementation: why is nobody fixing this?
It really depends on what you're doing; event-based programming is certainly tricky for nontrivial applications. Being a web server is really a very trivial, well-understood problem, and both event-driven and threaded models work pretty well on modern OSs.
Correctly developing more complex server applications in an event model is generally pretty tricky - threaded applications are much easier to write. This may be the deciding factor rather than performance.
It isn't really about the threads. It's about the way the threads are used to service requests. For something like lighttpd you have a single thread that services multiple connections via events. For older versions of Apache you had a process per connection, and the process woke up on incoming data, so you ended up with a very large number of processes when there were lots of requests. Now, however, Apache is event-based as well: see Apache MPM event.
