How many long-lived concurrent TCP sessions are considered reasonable? - linux

I am developing a TCP server on Linux that should initially handle thousands of concurrent clients, which are intended to be long-lived. However, after starting to implement some functionality, I added a thread pool for calls that are blocking and should be handled separately, such as database or disk access.
After some tests under high load, with "many" asynchronous functions being requested, my server starts to lag because too many tasks are enqueued: they arrive faster than they can be processed. Each task is resolved in nanoseconds, but there are thousands of them. I understand this is completely normal.
I could of course scale out behind a load balancer or buy better servers with more cores. But in practice, as a standard in the industry, how many concurrent long-lived TCP sessions are considered a "good" number for a server like the one I'm describing? How can I tell that the number of concurrent connections I handle is "good enough"?

Unfortunately there's no magic number to answer your question, but here are some considerations to help you find your number:
First of all, every operating system has its own limit on simultaneous connections, because file descriptors (and, for outgoing connections, port numbers) are finite. Check that you're not exceeding that limit, or every new connection will be refused by your server.
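These limits can be inspected programmatically on the host; a minimal sketch in Python (the `/proc` path is Linux-specific, hence the fallback):

```python
import resource

# Per-process cap on open file descriptors; each TCP connection consumes one.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"fd limit: soft={soft} hard={hard}")

# The ephemeral port range limits *outgoing* connections per (src, dst) pair;
# it is not what caps accepted connections on a listening server.
try:
    with open("/proc/sys/net/ipv4/ip_local_port_range") as f:
        lo, hi = map(int, f.read().split())
    print(f"ephemeral ports available: {hi - lo + 1}")
except FileNotFoundError:
    pass  # not Linux
```

Raising the fd limit (`ulimit -n`, `LimitNOFILE` under systemd) is usually the first tuning step for a many-connection server.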
To identify how many simultaneous connections are okay for you, you must establish a maximum response time for your service.
Keep in mind that even with simultaneous connections, multicore CPUs, etc., the responses all go out over the same network link and can get bottlenecked there. So I advise you to run a load test and a stress test against your architecture to find your acceptable latency limit.
TL;DR: There's no magic answer; you should run a load test and a stress test to find it.
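A latency measurement can start as small as this sketch (Python; a toy in-process echo server stands in for the real service, and a real load test would drive many concurrent clients):

```python
import socket
import threading
import time

# Toy latency probe: an in-process echo server plus one client timing
# round-trips. The measurement idea (collect samples, read percentiles)
# is the same one a full load test applies at scale.

def handle(conn: socket.socket) -> None:
    with conn:
        while data := conn.recv(1024):
            conn.sendall(data)

def serve(srv: socket.socket) -> None:
    while True:
        try:
            conn, _ = srv.accept()
        except OSError:
            return  # listener closed
        threading.Thread(target=handle, args=(conn,), daemon=True).start()

srv = socket.socket()
srv.bind(("127.0.0.1", 0))   # any free port
srv.listen()
threading.Thread(target=serve, args=(srv,), daemon=True).start()

latencies = []
cli = socket.create_connection(srv.getsockname())
for _ in range(100):
    t0 = time.perf_counter()
    cli.sendall(b"ping")
    cli.recv(1024)
    latencies.append(time.perf_counter() - t0)
cli.close()
srv.close()

latencies.sort()
print(f"p50={latencies[49] * 1e6:.0f}us p99={latencies[98] * 1e6:.0f}us")
```

Pick your acceptable p99 latency first, then raise the connection count until you cross it; that crossing point is "your number".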

Related

Can clients with slow connection break down blocking-socket-based server?

By definition of blocking sockets, calls to send() or recv() block until the whole networking operation is finished. This can take some time, especially over TCP when talking to a client with a slow connection. This is of course solved by introducing threads and thread pools. But what happens if all threads are blocked by slow clients? For example, suppose your server wants to serve 10,000+ clients with 100 threads, sending data to all users every second; each thread would then have to call send() 100 times every second. What happens if at some point 100 clients are connected whose connections are so slow that one call to send()/recv() takes 5 seconds to complete (or an attacker causes this on purpose)? In that case all 100 threads are blocked and everyone else waits. How is this generally solved? Adding more threads to the thread pool is probably not a solution, since there can always be more slow clients, and a really high number of threads would introduce even more problems with context switching, resource consumption, etc.
Can clients with slow connection break down blocking-socket-based server?
Yes, they can. And it does consume resources on the server side. And if too much of this happens, you can end up with a form of "denial of service".
Note that this is worse if you use blocking I/O on the server side, because you are tying down a thread while the response is being sent. But it is still a problem with non-blocking I/O. In the latter case, you consume server-side sockets, port numbers, and memory to buffer the responses waiting to be sent.
If you want to guard your server against the effects of slow clients, it needs to implement a timeout on sending responses. If the server finds that it is taking too long to write a response ... for whatever reason ... it should simply close the socket.
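That send-timeout guard can be sketched in Python (the five-second budget and the helper's name are arbitrary assumptions for illustration):

```python
import socket

def send_with_deadline(conn: socket.socket, payload: bytes,
                       seconds: float = 5.0) -> bool:
    """Try to write a response; give up on clients that drain too slowly."""
    conn.settimeout(seconds)     # bounds each blocking send on this socket
    try:
        conn.sendall(payload)
        return True
    except socket.timeout:
        conn.close()             # slow (or malicious) client: free the thread
        return False
```

sendall() blocks once the kernel send buffer fills up, which is exactly when a slow receiver hurts; the timeout bounds that wait instead of letting it tie up a thread indefinitely.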
Typical web servers do this by default.
Finally, as David notes, using non-blocking I/O will make your server more scalable. You can handle more simultaneous requests with fewer resources. But there are still limits to how much a single server can scale. Beyond a certain point you need a way to spread the request load over multiple servers.
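For comparison, here is a minimal non-blocking readiness loop using Python's `selectors` module (an illustrative sketch, not a production server):

```python
import selectors
import socket

# One thread multiplexes many sockets: instead of blocking on a single
# recv(), the loop asks the OS which sockets are readable right now.
sel = selectors.DefaultSelector()

def accept(srv: socket.socket) -> None:
    conn, _ = srv.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, echo)

def echo(conn: socket.socket) -> None:
    data = conn.recv(1024)   # won't block: the socket reported readable
    if data:
        conn.sendall(data)   # NB: a full send buffer still needs handling
    else:
        sel.unregister(conn)
        conn.close()

srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen()
srv.setblocking(False)
sel.register(srv, selectors.EVENT_READ, accept)

def serve_once(timeout: float = 1.0) -> None:
    for key, _ in sel.select(timeout):
        key.data(key.fileobj)   # dispatch to accept() or echo()
```

A slow client here costs a buffered socket and some memory, not a whole blocked thread, which is why this model scales further before falling over.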

What's the relationship between QPS/TPS, response time and number of concurrent users

Some Concepts:
TPS means Transactions per second
Response time is the total amount of time it takes to respond to a request for service
Is this formula true?
TPS = number of concurrent users / response time
It is true if transactions happen sequentially and in only one thread (on one TCP connection) per user. In reality, however, when talking about web browsers, they will use multiple concurrent connections when talking to a host. 6 concurrent connections is quite common, so the host will then get TPS = 6 x concurrent users / response time.
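As a back-of-the-envelope helper (the numbers in the example are illustrative, not measurements):

```python
def tps(concurrent_users: int, response_time_s: float,
        conns_per_user: int = 1) -> float:
    """Transactions per second, assuming each connection issues
    requests sequentially (one in flight at a time)."""
    return concurrent_users * conns_per_user / response_time_s

# 1000 users, 200 ms responses, browsers opening 6 parallel connections:
print(tps(1000, 0.2, conns_per_user=6))  # 30000.0
```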
Also, the browser will sometimes be blocked and not fetch things. Sometimes because it is executing code, sometimes because it cannot perform some operations simultaneously with other operations. See http://www.browserscope.org for more info.
Also, of course, clients (whether they are humans using a browser or e.g. a mobile phone app talking to its backend via a REST API) don't usually make requests back to back, continuously, at the highest possible rate. That is probably not a very realistic test case. Usually, clients will make a bunch of requests and then fall silent for a while, until the user does something new in the application that requires more data from the backend.

handling millions of requests/second: how does load balancer(main server thread) works

What will happen:
If I write a server application backed with a thread pool of millions of threads and it gets millions of requests per second
I have worked on developing web services. The web service was deployed on 1000's of computers with a front end load balancer. The load balancer's job was to distribute the traffic amongst the servers that actually process the web requests.
So my question is: since the process running inside the load balancer itself HAS to be single-threaded to listen for web requests on a port, how does it handle accepting millions of requests per second? The load balancer might be busy delegating a task; what happens to the incoming request at that instant?
In my opinion, not all clients will be handled, since there will be only a single request-handler thread to pass incoming requests on to the thread pool.
By that logic, no multithreaded server should ever work.
I wonder how Facebook/Amazon handle millions of requests per second.
You are right, it won't work. There is a limit to how much a single computer can process, which has nothing to do with how many threads it is running.
The way Amazon and Facebook etc. handle it is to have hundreds or thousands of servers spread throughout the world and pass the requests out to those various servers. This is a massive subject, though, so if you want to know more I suggest you read up on distributed computing and come back if you have specific questions.
With the edit, the question makes much more sense. It is not hard to distribute millions of requests per second. A distribution operation should take somewhere in the vicinity of tens of nanoseconds, as it merely consists of pushing the received socket into a queue. No biggie.
As soon as that's done, the balancer is ready to accept the next request.
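That accept-then-enqueue pattern can be sketched like this (Python; a toy single-machine stand-in for a real balancer, with a trivial echo standing in for the actual work):

```python
import queue
import socket
import threading

# The accept loop does the bare minimum (one enqueue per connection) and
# goes straight back to accept(); workers do the per-connection work.
pending: "queue.Queue[socket.socket]" = queue.Queue()

def acceptor(srv: socket.socket) -> None:
    while True:
        try:
            conn, _ = srv.accept()
        except OSError:
            return               # listener closed: shut down
        pending.put(conn)        # cheap hand-off, then the next accept

def worker() -> None:
    while True:
        conn = pending.get()
        with conn:
            data = conn.recv(1024)        # toy "processing"
            conn.sendall(b"ok:" + data)

srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(128)                  # backlog absorbs bursts during hand-off
threading.Thread(target=acceptor, args=(srv,), daemon=True).start()
for _ in range(4):
    threading.Thread(target=worker, daemon=True).start()
```

The `listen()` backlog is what holds connections that arrive while the acceptor is mid-enqueue, which is why that brief busyness doesn't drop requests.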

Using Fleck Websocket for 10k simultaneous connections

I'm implementing a WebSocket Secure (wss://) service for an online game where all users will be connected to the service for as long as they are playing. This means a high number of simultaneous connections, although traffic won't be a big problem, as the service is used for chat, storage, and notifications, not for real-time data synchronization.
I wanted to use Alchemy-Websockets, but it doesn't support TLS (wss://), so I have to look at another library like Fleck (or something else).
Alchemy has been tested with a high number of simultaneous connections, but I couldn't find similar tests for Fleck, so I need to get some real-world info from users of Fleck.
I know that Fleck is non-blocking and uses async calls, but I need some real-world info, because it might be abusing threads, the garbage collector, or some other aspect that wouldn't be visible at a lower number of connections.
I will use C# for the client as well, so I need neither hybiXX compatibility nor a fallback; I just need scalability and TLS support.
I finally added Mono support to WebSocketListener.
Check here how to run WebSocketListener in Mono.
10K connections is no small thing. WebSocketListener is asynchronous and it scales well. I have done tests with 10K connections and it should be fine.
My tests show that WebSocketListener is almost as fast and scalable as the Microsoft one, and performs better than Fleck, Alchemy, and others.
I ran a test on a Windows machine with a Core2Duo E8400 processor and 4 GB of RAM.
The results were not encouraging, as it started delaying handshakes after reaching ~1000 connections, i.e. it would take about one minute to accept a new connection.
These results improved when I used XSockets, which reached 8000 simultaneous connections before the same thing happened.
I tried to test on a Linux VPS with Mono, but I don't have enough experience with Linux administration, and a few system settings related to TCP, etc. needed to be changed to allow a high number of concurrent connections, so I could only reach ~1000 on the default settings; after that the app crashed (in both the Fleck test and the XSockets test).
On the other hand, I tested node.js, and it seemed simpler to manage a very high number of connections, as node didn't crash when it reached the TCP limits.
All the tests were echo tests: the server sends each message back to the client who sent it and to one random other connected client, and each connected client sends a random ~30-character text message to the server at a random interval between 0 and 30 seconds.
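The per-client behavior described in that test can be sketched roughly as follows (Python rather than the original C#; `run_test_client` is a hypothetical name, and the parameters are scaled down for illustration):

```python
import random
import socket
import string
import time

def run_test_client(host: str, port: int, rounds: int,
                    max_wait_s: float = 30.0) -> int:
    """One simulated player: random ~30-char messages at random intervals."""
    sent = 0
    with socket.create_connection((host, port)) as conn:
        for _ in range(rounds):
            time.sleep(random.uniform(0, max_wait_s))
            msg = "".join(random.choices(string.ascii_letters, k=30))
            conn.sendall(msg.encode())
            conn.recv(1024)   # wait for the server's echo before continuing
            sent += 1
    return sent
```

In the real test each client would run this loop concurrently (thousands of instances), which is what exposes the scaling behavior a single client cannot.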
I know my tests are not generic enough, and I encourage anyone to run their own tests instead, but I just wanted to share my experience.
When we decided to try Fleck, we implemented a wrapper for the Fleck server and a JavaScript client API so that we could send acknowledgment messages back to the server. We wanted to test the performance of the server: message delivery time, percentage of lost messages, etc. The results were pretty impressive for us, and we are currently using Fleck in our production environment.
We have 4000-5000 concurrent connections during peak hours. On average, 40 messages are sent per second. The acknowledged-message ratio (acknowledged messages / total sent messages) never drops below 0.994. The average round-trip for messages is around 150 milliseconds (the time between the server sending a message and receiving its ack). Finally, we have not had any memory-related problems with the Fleck server even under heavy usage.

How is it possible to have more then 1 concurrent connection in IIS?

From what I understand, the HTTP protocol is stateless. To me that means it is only ever serving one connection at a time.
Even if there are 1,000,000 users trying to access a site, it can only ever be serving one connection at a time.
So when I see a setting in IIS saying "Maximum number of concurrent users" (or similar) it makes me wonder, what does this mean?
In theory, it can be any number, until you run out of TCP connections.
In reality, it is limited by your hardware, your applications, and what the applications/users are doing. You need to stress test your server.
Hope it helps.
