Deal with a large number of outgoing TCP connections in Linux

I'm building a service that periodically connects to a large number of devices (thousands) via TCP sockets. These connections are all established at the same time, a couple of API commands are sent through each, and then the sockets are closed. All devices are in the same subnet.
The trouble starts after about 1000 devices: new connections are no longer established. After waiting a couple of minutes, everything is back to normal. My first guess was that the maximum number of socket connections had been reached, and after reading through many similar questions and tutorials I modified some kernel networking parameters: various cache sizes, the maximum number of open files, tcp_tw_reuse, somaxconn. Unfortunately, this had little to no effect.
The problem does not seem to be burst-related: the first time I run the script, it works fine, but when I run it again a couple of minutes later, I start seeing these errors. My best guess is that the number of open sockets builds up over time, possibly in the TIME_WAIT state. On the other hand, setting the tcp_tw_reuse parameter (which seems perfect for this scenario) did not have any noticeable effect. I close the sockets via Python's socket.close().
It may be important to stress that this question is not about a high-load server, but about a high-load client: the connections are outgoing. Many of the server-related questions I found were answered with the solutions I described above.
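For reference, a minimal Python sketch like the one below (not the actual script) counts local sockets per TCP state by parsing /proc/net/tcp on Linux; running it right after the second, failing run should show whether TIME_WAIT entries really are piling up.

```python
# Diagnostic sketch: count local TCP sockets per state by parsing /proc/net/tcp
# (Linux only). State code 06 is TIME_WAIT.
from collections import Counter

STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}

def socket_states(path="/proc/net/tcp"):
    counts = Counter()
    with open(path) as f:
        next(f)  # skip the header line
        for line in f:
            fields = line.split()
            counts[STATES.get(fields[3], fields[3])] += 1  # 4th column is the state
    return counts

if __name__ == "__main__":
    for state, n in socket_states().most_common():
        print(f"{state:12s} {n}")
```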

Related

How to catch/record a "Burst" in HAProxy and/or NodeJS traffic

We have a real-time service which receives binary messages from different sources (internal and external); using a couple of NodeJS instances and one HAProxy instance configured to route TCP traffic, we deliver them to our end users and to the different services that consume the messages. HAProxy version is 1.8.14, NodeJS is 6.14.3, both hosted on a CentOS 7 machine.
Now we've got a complex problem with some "bursts" on the outbound interface of the HAProxy instance. We are not sure whether the burst is real (e.g. some messages got stuck in Node and then the network gets flooded with messages) or whether the problem is some kind of misconfiguration or an indirect effect of some other service (both of the latter are more likely, as we sometimes get these bursts around midnight, when we have minimal to zero load).
The issue is annoying right now, but it might become critical, as it floods our outbound traffic and our real-time services experience lag or brief downtime during working hours.
My question is, how can we track and record the nature or the content of these messages with minimum overhead? I've been reading through HAProxy docs to find a way to monitor this, which can be achieved by using a Unix socket, but we are worried about a couple of things:
How much is the overhead of using this socket?
Can we track what is going on in the servers using this socket? Or does it only give us stats?
Is there a way to "catch/echo" the contents of these messages, or find out some information about them, with minimum overhead?
Please let me know if you have any questions regarding this problem.
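For reference, the runtime (stats) socket is cheap to poll and mainly exposes counters and admin commands rather than message contents. A small Python sketch of polling it, assuming a "stats socket /var/run/haproxy.sock" line in haproxy.cfg (the path is just an example, adjust to your configuration):

```python
# Hedged sketch: poll HAProxy's runtime socket for per-frontend/backend byte
# counters, so outbound bursts can be correlated with a specific proxy.
import csv
import io
import socket

SOCK_PATH = "/var/run/haproxy.sock"  # assumption: configured via "stats socket"

def show_stat(path=SOCK_PATH):
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(path)
        s.sendall(b"show stat\n")          # CSV output of all proxies/servers
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:
                break
            chunks.append(data)
    raw = b"".join(chunks).decode().lstrip("# ")
    return list(csv.DictReader(io.StringIO(raw)))

if __name__ == "__main__":
    for row in show_stat():
        print(row["pxname"], row["svname"], "bytes_out:", row.get("bout"))
```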

Network map/mon tool clogs up telnet daemon / login processes

We have an embedded Linux (Kernel 2.6.x / Busybox) system (IP camera/web server) which is being tripped over by a network mapping/monitoring tool (specifically The Dude but I think the problem is a general one) repeatedly probing the Telnet port.
The sequence of events is this:
The tool probes port 23
Our system's Telnet daemon (busybox telnetd) spawns a new /bin/login process
The tool, having satisfied itself there's something there, skips merrily on its way (it neither logs in nor closes the connection)
This keeps happening (every N seconds) until there are so many sockets open that our system can no longer serve a web page for lack of sockets, and there are hundreds of /bin/login processes hanging around.
Apologies for vagueness, full logs & wireshark captures are on a different PC at this moment.
As I see it, we need to do a couple of things:
Put some sort of timeout on the telnet client / /bin/login process if no login attempt is made
Put some sort of limit on the number of ports the telnet client can have open at any time
Kill off hanging / zombie sockets (TCP timeout / keepalive config?)
I'm not 100% clear on the correct approach to these three, given that the device is also serving web pages and streaming video so changes to system globals may impact the other services. I was a little surprised that Busybox seems to be open to what's effectively the world's slowest DDOS attack.
Edit:
I've worked out what I think is a reasonable way round this, a new question started for ironing out the wrinkles in that idea. Basically, login exits as soon as someone logs in, so we can kill logins with (relative) impunity when a new instance is launched.
The tool, having satisfied itself there's something there, skips merrily on its way (it neither logs in nor closes the connection)
That is an issue, and in fact a type of DoS attack.
This keeps happening (every N seconds) until there are so many sockets open that our system can no longer serve a web page for lack of sockets, and there are hundreds of /bin/login processes hanging around.
Apart from stopping the DoS attack, you might want to mitigate it using some tools. In this specific case you could configure low TCP timeouts (e.g. keepalive), so that the sockets are closed soon after they are opened, due to the inactivity on the other end, and the login processes are terminated as well.
Well, let me answer my own question by linking to my own answer on killing zombie logins each time a new one is spawned.
The solution is forcing telnetd to run a script instead of /bin/login, where the script kills any other instances of /bin/login before running a new one. Not ideal, but it hopefully solves the problem.
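The linked answer implements this as a small wrapper script. Purely to illustrate the idea (kill any lingering /bin/login instances, then exec a fresh one), here is a Python sketch; the wrapper path, the telnetd -l hookup and the use of Busybox pidof are assumptions, and on a real Busybox target this would more likely be a tiny ash script.

```python
#!/usr/bin/env python3
# Illustration only: telnetd is pointed at this wrapper instead of /bin/login
# (e.g. telnetd -l /usr/sbin/login-wrapper). It reaps abandoned login sessions
# before replacing itself with a fresh /bin/login.
import os
import signal
import subprocess

def pids_of(name):
    # Busybox provides pidof; empty output means no matching process.
    out = subprocess.run(["pidof", name], capture_output=True, text=True).stdout
    return [int(p) for p in out.split()]

def main():
    for pid in pids_of("login"):
        try:
            os.kill(pid, signal.SIGTERM)   # login exits on successful login, so
        except ProcessLookupError:         # only abandoned sessions linger here
            pass
    os.execv("/bin/login", ["/bin/login"])  # replace ourselves with a fresh login

if __name__ == "__main__":
    main()
```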

Concurrent networking in Scala

I have a working prototype of a concurrent Scala program using Actors. I am now trying to fine-tune the number of different Actors, etc.
One stage of the processing requires fetching new data via the internet. Of course, there is nothing I can really do to speed that aspect up. However, I figure if I launch a bunch of requests in parallel, I can bring down the total time. The question, therefore, is:
=> Is there a limit on concurrent networking in Scala or on Unix systems (such as the maximum number of sockets)? If so, how can I find out what it is?
In Linux, there is a limit on the number of file descriptors each program can have open. This can be seen using ulimit -n. There is also a system-wide limit in /proc/sys/fs/file-max.
Another limit is the number of connections that the Linux firewall can track. If you are using the iptables connection tracking firewall this value is in /proc/sys/net/netfilter/nf_conntrack_max.
Another limit is of course TCP/IP itself. From a single local IP address you can only have on the order of 64k connections to the same remote host and port, because each connection needs a unique combination of (localIP, localPort, remoteIP, remotePort); in practice the usable local ports are bounded by the ephemeral port range (net.ipv4.ip_local_port_range).
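These limits can all be inspected at runtime. A quick sketch, shown in Python for brevity (the same files can be read from the JVM or with a shell one-liner); the conntrack file only exists when connection tracking is loaded:

```python
# Read the limits mentioned above on a Linux box.
import resource

def read_proc(path):
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return "n/a"

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("per-process fds (soft/hard):", soft, hard)  # same as ulimit -n
print("system-wide fd limit:", read_proc("/proc/sys/fs/file-max"))
print("conntrack limit:", read_proc("/proc/sys/net/netfilter/nf_conntrack_max"))
print("ephemeral port range:", read_proc("/proc/sys/net/ipv4/ip_local_port_range"))
```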
Regarding speeding things up via concurrent connections: it isn't as easy as just using more connections.
It depends on where the bottlenecks are. If your local connection is being fully used, adding more connections will only slow things down. If you are connecting to the same remote server and its connection is fully used, more will only slow it down.
Where you can get a benefit is when your local connection is not fully used and you are connecting to multiple remote hosts.
If you look at web browsers, you will see they have limits on how many connections will be made to the same remote server. They also have limits on how many connections will be made in total.

Comet and node.js - how many simultaneous connections could we expect on an EC2 server?

With a comet server running on node.js - how many simultaneous connections could we expect to get out of an EC2 server?
Anyone done this before and found a reasonable limit?
Our particular application only needs to push data to the clients fairly infrequently; it's more the maximum number of simultaneous connections per server that is a worry for us. We're looking at somewhere between 200k and 500k, I think, and I'm trying to figure out if comet is going to be workable without a monstrous fleet of servers...
If you are running Linux, get to know the contents of /proc/sys/net/ipv4.
In particular, net.ipv4.netfilter.ip_conntrack_max (net.netfilter.nf_conntrack_max on newer kernels) will let you increase the maximum number of connections the kernel's connection tracking will hold, but when you start plugging in really big numbers you will run into other problems. For instance, you might need to reduce orphan_retries because you will statistically be more likely to have orphans. And with really big numbers, it is entirely possible that kernel lookup algorithms will slow down significantly. You need to carefully tune the TCP settings.
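As a hedged example, the same tunables can be bumped by writing /proc/sys directly (equivalent to sysctl -w, and requiring root); the paths differ between kernel versions and the values below are placeholders, not recommendations:

```python
# Write sysctls under /proc/sys, skipping knobs this kernel does not expose.
import os

TUNABLES = [
    ("/proc/sys/net/netfilter/nf_conntrack_max", "524288"),        # newer kernels
    ("/proc/sys/net/ipv4/netfilter/ip_conntrack_max", "524288"),   # older kernels
    ("/proc/sys/net/ipv4/tcp_orphan_retries", "1"),                # placeholder value
]

for path, value in TUNABLES:
    if not os.path.exists(path):
        print(f"skip {path} (not present on this kernel)")
        continue
    with open(path, "w") as f:
        f.write(value)
    print(f"set {path} = {value}")
```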
If I were in your shoes, I would compare at least two OSes, such as Linux and FreeBSD or OpenSolaris/Illumos.
On FreeBSD you will need to change settings in /boot/loader.conf
On OpenSolaris/Illumos you will need to read the documentation for the ndd command.

Can I Open Multiple Connections to a HTTP Server?

I'm writing a small software component in order to download resources from a web server (IIS).
But it seems that the system's performance is not acceptable. Now I'm planning to increase the number of connections to the web server by spawning multiple threads.
My question is, can I improve performance by using multiple threads? Moreover, does the web server allow me to open several simultaneous connections?
Thanks
Upul
All properly configured web servers should be able to handle multiple connections from the same source. This allows, for example, a browser to download two images from a page at once.
Some servers may place an upper limit on the number of concurrent connections they will accept from one client, but this will usually be a high number. Using up to 6 connections is normally safe.
As for whether it will actually improve performance, that depends on your situation. If you have a very fast connection to an internet backbone and find that the throughput you are getting from the remote server does not take advantage of it, then multithreading can often improve speed. If you are already maxing out your own connection to the internet, or the remote server's connection, then it can't do anything.
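As a rough illustration of the multi-connection approach (in Python for brevity; the URLs and worker count are placeholders, and keeping the pool at 4-6 connections per host stays within what most servers tolerate):

```python
# Download several resources in parallel over separate connections.
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

URLS = [
    "http://example.com/file1.bin",  # placeholder URLs
    "http://example.com/file2.bin",
    "http://example.com/file3.bin",
]

def fetch(url):
    with urlopen(url, timeout=30) as resp:
        return url, len(resp.read())

with ThreadPoolExecutor(max_workers=4) as pool:   # one connection per worker
    for url, size in pool.map(fetch, URLS):
        print(f"{url}: {size} bytes")
```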
Web servers do allow simultaneous connections. There should not be any problem opening several unless the application logic prevents multiple connections for the same client. To further clarify my point: if you require a login before downloading resources and your application does not allow multiple simultaneous logins, then you will get stuck there. There are applications which do not allow a lot of connections from the same source for security reasons.
