The MongoDB docs suggest reducing the TCP keepalive time for better performance:
If you experience socket errors between clients and servers or between members of a sharded cluster or replica set that do not have other reasonable causes, check the TCP keepalive value (for example, the tcp_keepalive_time value on Linux systems). A common keepalive period is 7200 seconds (2 hours); however, different distributions and macOS may have different settings.
However, the docs do not explain why this helps or how it improves performance. From my (limited) understanding, connections created by mongo shards and replicas have their own keepalive time, which might be much shorter than the global Linux keepalive value. So Mongo might break the connection as per its config, and creating a new connection should ideally not take too much time.
How does reducing the Linux TCP keepalive setting improve performance?
I think a shorter keepalive setting on the DB side or client side will keep fewer total connections, but with a higher percentage of active connections between server and client or between servers. A smaller connection pool will also use fewer resources on both the server and the client side.
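To make the keepalive discussion concrete, here is a minimal Python sketch, not taken from the MongoDB docs, that enables keepalive on a single socket and estimates how long a dead peer can go unnoticed on an idle connection. The helper names and the 120-second idle value are assumptions for illustration; the TCP_KEEP* option names are the Linux ones and are skipped on platforms that lack them.

```python
import socket

def enable_keepalive(sock, idle=120, interval=75, probes=9):
    """Turn on TCP keepalive for one socket and, where supported, tune it."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # TCP_KEEPIDLE / TCP_KEEPINTVL / TCP_KEEPCNT are Linux option names;
    # skip any that this platform's socket module does not expose.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)

def worst_case_detection_seconds(idle, interval, probes):
    """Upper bound on how long an idle connection to a dead peer stays open."""
    return idle + interval * probes
```

With the common Linux defaults (7200 s idle, 75 s between probes, 9 probes) the bound is 7875 seconds; lowering only the idle time to 120 s brings it to 795 seconds. One plausible reading of the advice is therefore that dead or silently dropped connections are noticed and replaced much sooner, rather than that live connections get faster.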
I am using Cassandra with the following setup:
21 nodes, AWS EC2 i3.2xlarge, version 3.11.4.
The application opens about 5000 connections per node (so about 100k connections per cluster) using the DataStax Java driver.
The application uses autoscaling and frequently opens and closes connections.
The number of connections opened at once by the app servers can reach 500 per node (they open simultaneously on all nodes, so about 10k connections open at the same time across the cluster).
This causes load spikes on Cassandra and increases read and write latency.
I have noticed that each time connections open or close, there is a high number of reads from system_auth.roles and system_auth.role_permissions.
How can I prevent the load and resolve this issue?
You need to modify your application to work with as few connections as possible. Keep the following in mind:
Create the Cluster/Session object once at startup and keep it. Initializing a session is a very expensive operation; it adds load to Cassandra and to your application as well.
You may increase the number of simultaneous requests per connection instead of opening new connections. The protocol allows up to 32k requests per connection. However, if you have too many requests in flight, it's a sign that your Cassandra cluster can't keep up with the workload and can't answer fast enough. See the documentation on connection pooling.
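The "create once and keep it" advice can be sketched as a lazily initialized, shared session holder. This is only an illustration in Python: `FakeSession` and `get_session` are hypothetical stand-ins, not classes or functions from the DataStax driver.

```python
import threading

class FakeSession:
    """Stand-in for an expensive driver session (hypothetical, not a real driver class)."""
    created = 0  # counts how many sessions were ever built

    def __init__(self):
        FakeSession.created += 1

_session = None
_lock = threading.Lock()

def get_session():
    """Return the one shared session, creating it on first use only."""
    global _session
    if _session is None:
        with _lock:
            if _session is None:  # re-check: another thread may have won the race
                _session = FakeSession()
    return _session
```

Every request handler would call `get_session()` instead of building its own session, so autoscaling events add at most one session per application process rather than one per request or per worker.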
I am developing a TCP server on Linux which can initially handle thousands of concurrent clients, which are intended to be long-lived. However, after starting to implement some functionality, I made a thread pool for calls that are blocking and should be done separately, like database or disk access.
After some tests under high load, requesting "many" asynchronous functions, my server starts to lag because many tasks are enqueued: they arrive faster than they can be processed. Each task is solved in nanoseconds, but there are thousands of them. I do understand this is totally normal.
I could of course scale out behind a load balancer or buy better servers with more cores. However, in practice and as a standard in the industry, how many concurrent long-lived TCP sessions are considered a "good" number for a server like the one I'm describing? How can I tell that the number of concurrent connections I have is "good enough"?
Unfortunately there's no magic number to answer your question, but here are some considerations to help you find your number:
First of all, every operating system has its own maximum number of simultaneous connections because the port numbers are finite. So check that you are not exceeding this number; otherwise every new connection will be refused by your server.
To identify how many simultaneous connections are acceptable for you, you must establish a maximum response time for your service.
Keep in mind that even with simultaneous connections, multicore CPUs, etc., the responses go out over the same network and get bottlenecked. Thus I advise you to run a load test and a stress test on your architecture to find your acceptable latency limit.
TL;DR: There's no magic answer; you should run a load test and a stress test to find it.
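As a rough illustration of turning load-test results into a number, the Python sketch below picks the largest tested connection count whose latency percentile still meets a response-time budget. The function names, the 99th percentile, and the sample figures are all assumptions for the example, not measurements.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of latencies."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def max_connections_within_budget(results, budget_ms, pct=99):
    """results maps a tested connection count to its measured latencies (ms).

    Walks the counts in increasing order and returns the last one before the
    percentile first exceeds the budget, or None if even the smallest fails.
    """
    best = None
    for count in sorted(results):
        if percentile(results[count], pct) > budget_ms:
            break
        best = count
    return best

# Made-up latencies (ms) from three hypothetical load-test runs.
runs = {1000: [5, 6, 7, 8], 5000: [10, 12, 20, 40], 10000: [50, 80, 200, 400]}
print(max_connections_within_budget(runs, budget_ms=50))
```

The "good enough" number is then whatever the function returns for the response-time limit you settled on, and it only holds for the hardware and workload you actually tested.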
I'm running into a lot of trouble when trying to seed a lot of torrents (> 10k) with libtorrent.
The problems include:
Choking my network connection
Tracker requests timing out (libtorrent tracker error)
When using auto-manage, they go from checking to seeding very slowly, even when my active_seeding is set to unlimited.
I used to let them be auto-managed, but I found that this makes nearly all of them unavailable.
Here are my current settings:
sessionSettings.setActiveDownloads(5);
sessionSettings.setActiveLimit(-1);
sessionSettings.setActiveSeeds(-1);
sessionSettings.setActiveDHTLimit(5);
sessionSettings.setPeerConnectTimeout(25);
sessionSettings.announceDoubleNAT(true);
sessionSettings.setUploadRateLimit(0);
sessionSettings.setDownloadRateLimit(0);
sessionSettings.setHalfOpenLimit(5);
sessionSettings.useReadCache(false);
sessionSettings.setMaxPeerlistSize(500);
My current method is to loop over all my 10k+ torrents and call torrent.resume(). When using automanage, this only starts ~50 of the torrents, and the others start at a rate of about 1 torrent per 10 minutes, which wouldn't work. When not using automanage, it chokes my connection.
BUT when I resume only 30 of them, they all seem to seed correctly, so my next plan is to try to resume() them in groups, either with a time delay or after they've received a tracker_reply.
I tried to glean what I could from this post, but I don't know what my settings should be specifically:
http://blog.libtorrent.org/2012/01/seeding-a-million-torrents/
I'd really appreciate someone sharing their settings for seeding thousands of torrents.
When not using automanage, it chokes my connection.
Since you say it can run either on a hosted server or on a domestic internet connection, you will not have much choice but to throttle torrent startups. Domestic internet connections are generally behind consumer-grade routers and possibly CGNAT, both of which have fairly small NAT tables that will eventually choke on concurrently established TCP connections (peer-to-peer connections, tracker announces) or UDP pseudo-connections (UDP trackers, µTP, DHT).
So to run many torrents at once, you will have to limit all active maintenance traffic of that kind so that the torrents start up only listening passively for incoming connections.
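One way to throttle startups is to resume the torrents in small groups with a pause between groups, along the lines of the batches of 30 mentioned in the question. The Python sketch below assumes only that each torrent object has a `resume()` method; the function name, batch size, and delay are illustrative guesses, not recommended libtorrent settings.

```python
import time

def resume_in_batches(torrents, batch_size=30, delay_seconds=60.0, sleep=time.sleep):
    """Call resume() on torrents in fixed-size groups, pausing between groups."""
    for start in range(0, len(torrents), batch_size):
        for torrent in torrents[start:start + batch_size]:
            torrent.resume()
        # Pause after every group except the last, so tracker announces and
        # new NAT table entries from one group settle before the next starts.
        if start + batch_size < len(torrents):
            sleep(delay_seconds)
```

The `sleep` parameter exists only so the pause can be replaced in tests. Waiting for a tracker reply instead of a fixed delay would need an alert loop, which this sketch does not attempt.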
I am trying to build a TCP server using Spring Integration in which the number of open connections may run into the thousands at any point in time. My key concerns are:
1. The maximum number of concurrent client connections that can be managed, given that sessions would be live for a long period of time.
2. What is advised in case connections exceed the limit specified in (1).
Something along the lines of a cluster of servers would be helpful.
There's no mechanism to limit the number of connections allowed. You can, however, limit the workload by using fixed thread pools. You could also use an ApplicationListener to get TcpConnectionOpenEvents and immediately close the socket if your limit is exceeded (perhaps sending some error to the client first).
Of course you can have a cluster, together with some kind of load balancer.
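The "close the socket if your limit is exceeded" idea boils down to a thread-safe counter that is consulted on every connection-open event. The sketch below shows that logic in Python only; `ConnectionLimiter`, `on_open`, and `on_close` are hypothetical names, not part of Spring Integration, and wiring it to real open and close events is left out.

```python
import threading

class ConnectionLimiter:
    """Tracks open connections and rejects new ones above a fixed limit."""

    def __init__(self, max_connections):
        self.max_connections = max_connections
        self._open = 0
        self._lock = threading.Lock()

    def on_open(self):
        """Return True to keep the new connection, False to close it at once."""
        with self._lock:
            if self._open >= self.max_connections:
                return False
            self._open += 1
            return True

    def on_close(self):
        """Call once for every connection that on_open() accepted."""
        with self._lock:
            if self._open > 0:
                self._open -= 1
```

A listener would call `on_open()` when a connection opens and close the socket immediately when it returns False, optionally after sending an error to the client.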
One of our Linux servers today experienced problems opening outbound requests.
I've reviewed the answer to "Increasing the maximum number of tcp/ip connections in linux", and it appears that we are well within the maximum limits.
At the time, netstat -an showed approximately 700 established connections.
Any new socket connections would fail, but nothing would be written to /var/log.
All connections are long-term, and usually open for several hours at a time.
Is there any logging that would help determine what configuration parameter we are bumping against?
nf_conntrack_tcp_timeout_established
It turns out that there’s another timeout value you need to be concerned with. The established connection timeout. Technically this should only apply to connections that are in the ESTABLISHED state, and a connection should get out of this state when a FIN packet goes through in either direction. This doesn’t appear to happen and I’m not entirely sure why.
So how long do connections stay in this table then? The default value for nf_conntrack_tcp_timeout_established is 432000 seconds. I’ll wait for you to do the long division…
Fun times.
I changed the timeout value to 10 minutes (600 seconds), and within a few days I noticed conntrack_count go down steadily until it sat at a very manageable level of a few thousand.
We did this by adding another line to the sysctl file:
net.netfilter.nf_conntrack_tcp_timeout_established=600
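For reference, the default of 432000 seconds works out to 5 days. To check how full the connection tracking table is, a small Python sketch like the one below can compare the current entry count with the maximum. It assumes the nf_conntrack module is loaded and exposes `nf_conntrack_count` and `nf_conntrack_max` under /proc/sys/net/netfilter; the function name is made up for this example.

```python
from pathlib import Path

def conntrack_usage(base="/proc/sys/net/netfilter"):
    """Return (count, maximum, fraction_used), or None if the files are unavailable."""
    try:
        count = int(Path(base, "nf_conntrack_count").read_text())
        maximum = int(Path(base, "nf_conntrack_max").read_text())
    except (OSError, ValueError):
        # Missing files usually mean the conntrack module is not loaded.
        return None
    fraction = count / maximum if maximum else 0.0
    return count, maximum, fraction

print(conntrack_usage())
```

A fraction close to 1.0 would suggest that the table, rather than the socket limits, is what new connections are running into.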