Can I disable TCP window scaling for a single application? - linux

In Linux, can an application enable or disable TCP window scaling for the TCP/IP connections it creates, as opposed to making a system-wide modification through sysctl with the net.ipv4.tcp_window_scaling parameter?

No, you can't. There are no per-process APIs for sockets at all, just per-socket APIs and global kernel configurations.
But you don't need to modify the scale settings directly. You just need to set the socket receive buffer size you want prior to connecting. The appropriate window scale is then negotiated during the connect handshake. If you want no window scaling, make sure your socket receive buffer is < 64k before connecting. In the case of accepted sockets, that is set on the listening socket.
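As a minimal sketch of that advice (the address, port and buffer size below are placeholders, not anything from the original answer), the receive buffer is capped below 64 KiB before connect(); for a server you would make the same setsockopt() call on the listening socket before accept():

    /* Sketch: keep the receive buffer under 64 KiB before connect(), so a
     * 16-bit window is enough and no window scaling needs to be negotiated. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) { perror("socket"); return 1; }

        int rcvbuf = 32 * 1024;               /* < 64 KiB (example value)      */
        if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf)) < 0)
            perror("setsockopt(SO_RCVBUF)");  /* must happen before connect()  */

        struct sockaddr_in peer;
        memset(&peer, 0, sizeof(peer));
        peer.sin_family = AF_INET;
        peer.sin_port   = htons(80);                       /* example port    */
        inet_pton(AF_INET, "192.0.2.1", &peer.sin_addr);   /* example address */

        if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) < 0)
            perror("connect");

        close(fd);
        return 0;
    }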

Related

How can I read the number of received TCP segments for a socket

I have an active TCP connection in a user-mode application that has no root privileges. I was wondering if there is a way to programmatically check the number of TCP segments, or the total size of the TCP/IP headers, that the other side has sent me. I know there is no portable way, but perhaps the Linux kernel offers this information? Does anybody know?

Too many persistent TCP connections

We have around 500 clients connected to a Linux RedHat ES 5 server.
Recently we have noticed that the server still holds connections to clients that were rebooted without first stopping the application that communicates with the server.
A netstat on the client always shows only one established connection to the server. After a client reboot, communication runs over a newly established connection. On the server side, the old connection is sometimes closed and sometimes stays in the ESTABLISHED state, so we end up with a growing number of established connections to each client.
Because various client operating systems are affected, I think this isn't an application issue but rather one with the Linux OS on the server.
I tried to tune the values of
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 9
without success.
I also tried raising the maximum file handle value from 1024 to 2048, but connections still never get closed, not even after the TCP keepalive time expires.
Does somebody have an idea what could cause that strange behaviour?
Those settings allow you to configure the default keep-alive behavior (when keep-alives are enabled). However, they do not make keep-alives automatic. The feature must still be explicitly enabled on a per-socket basis via the SO_KEEPALIVE socket option.
See http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/ for details. From section 3:
Remember that keepalive support, even if configured in the kernel, is not the default behavior in Linux. Programs must request keepalive control for their sockets using the setsockopt interface.
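As a rough sketch of what "request keepalive control ... using the setsockopt interface" looks like in practice (the numeric values simply mirror the sysctls quoted in the question; TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are the Linux per-socket overrides for those defaults):

    /* Sketch: enable keep-alives on one socket and override the kernel-wide
     * defaults for just this connection. 'fd' is an existing TCP socket. */
    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    int enable_keepalive(int fd)
    {
        int on    = 1;
        int idle  = 600;  /* seconds of idle time before the first probe   */
        int intvl = 10;   /* seconds between probes                        */
        int cnt   = 9;    /* unanswered probes before the connection drops */

        if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
            return -1;
        /* Per-socket overrides of tcp_keepalive_time / _intvl / _probes. */
        if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0)
            return -1;
        if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0)
            return -1;
        if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0)
            return -1;
        return 0;
    }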

Permanent TCP connection for administration use

I am facing the following situation:
I have several devices (embedded devices running Arch Linux) and I would like to have administration access to each device at any time. The problem is that the devices are behind a NAT, so establishing a connection from a server to a device is not possible. How could I achieve this?
I thought I could write a simple service running on the device that opens a connection to a server at startup. This TCP connection remains open and can be used from the server to administer the device. But is it a good idea to keep TCP connections open for a long time? If I have a lot of devices, for example 1000, will I have a problem on the server side with 1000 open TCP connections?
Is there maybe another way?
Thanks a lot!
But is it a good idea to keep TCP connections open for a long time?
It's not necessarily a bad idea; although in practice the connections will fail from time to time (e.g. due to network reconfiguration, temporary network outages, etc.), so your clients should contain logic to reconnect automatically when this happens. Also note that TCP will usually not detect when a completely idle TCP connection no longer has connectivity, so to avoid "zombie connections" that aren't actually connected, you may want to either enable SO_KEEPALIVE, or have your clients and/or server send the (very occasional) bit of dummy data on the socket just to goose the TCP stack into checking whether connectivity still exists on the socket.
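For illustration only, a rough sketch of the device-side logic described here (the server name and port are placeholders, error handling is reduced to "back off and retry", and SO_KEEPALIVE is enabled so a dead connection is eventually noticed even when idle):

    /* Sketch: keep one TCP connection to the admin server open at all times,
     * reconnecting with a simple capped exponential backoff when it drops. */
    #include <netdb.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static int connect_to_server(const char *host, const char *port)
    {
        struct addrinfo hints, *res, *rp;
        int fd = -1;

        memset(&hints, 0, sizeof(hints));
        hints.ai_family   = AF_UNSPEC;
        hints.ai_socktype = SOCK_STREAM;
        if (getaddrinfo(host, port, &hints, &res) != 0)
            return -1;

        for (rp = res; rp != NULL; rp = rp->ai_next) {
            fd = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol);
            if (fd < 0)
                continue;
            int on = 1;
            setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));
            if (connect(fd, rp->ai_addr, rp->ai_addrlen) == 0)
                break;
            close(fd);
            fd = -1;
        }
        freeaddrinfo(res);
        return fd;
    }

    int main(void)
    {
        unsigned backoff = 1;
        for (;;) {
            int fd = connect_to_server("admin.example.com", "4000"); /* placeholders */
            if (fd < 0) {
                sleep(backoff);
                if (backoff < 60)
                    backoff *= 2;   /* exponential backoff, capped at 60 s */
                continue;
            }
            backoff = 1;
            /* ... serve administration commands until the connection drops ... */
            close(fd);
        }
    }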
If i have a lot of devices, for example 1000, will i have a problem on the server side with 1000 open TCP connections?
Scaling is definitely an issue you'll need to think about. For example, select() is typically implemented to only handle up to a fixed number of connections (often 1024), or if your server is using the thread-per-connection model, you'd find that a process with 1000+ threads is not very efficient. Check out the c10k problem article for lots of interesting details about various approaches and how well they scale up (or don't).
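As a hedged sketch of one Linux-specific approach that avoids both the select() limit and thread-per-connection (an epoll event loop, one of the techniques the c10k article covers; listen_fd is assumed to be an already-listening, non-blocking socket and the data handling is a placeholder):

    /* Sketch: single-threaded epoll loop accepting and servicing many clients. */
    #include <string.h>
    #include <sys/epoll.h>
    #include <sys/socket.h>
    #include <unistd.h>

    void event_loop(int listen_fd)
    {
        int ep = epoll_create1(0);
        struct epoll_event ev, events[256];

        memset(&ev, 0, sizeof(ev));
        ev.events  = EPOLLIN;
        ev.data.fd = listen_fd;
        epoll_ctl(ep, EPOLL_CTL_ADD, listen_fd, &ev);

        for (;;) {
            int n = epoll_wait(ep, events, 256, -1);
            for (int i = 0; i < n; i++) {
                int fd = events[i].data.fd;
                if (fd == listen_fd) {
                    int client = accept(listen_fd, NULL, NULL);
                    if (client >= 0) {
                        ev.events  = EPOLLIN;
                        ev.data.fd = client;
                        epoll_ctl(ep, EPOLL_CTL_ADD, client, &ev);
                    }
                } else {
                    char buf[4096];
                    ssize_t r = read(fd, buf, sizeof(buf));
                    if (r <= 0) {             /* client went away */
                        epoll_ctl(ep, EPOLL_CTL_DEL, fd, NULL);
                        close(fd);
                    } else {
                        /* ... handle the administration traffic here ... */
                    }
                }
            }
        }
    }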
Is there maybe another way?
If you don't need immediate access to the clients, you could always have them check in periodically instead (e.g. once every 5 minutes); or you could have them occasionally send a UDP packet to the server instead of keeping a TCP connection open all the time, just to let the server know they are present, and have the server indicate to them somehow (e.g. by updating a well-known web page that the clients check from time to time) when it wants one of them to open a full TCP connection. Or maybe just use multiple servers to share the load...
The only limit I know of is imposed by state tracking in the iptables code. Check the value of net.ipv4.netfilter.ip_conntrack_max on both sides if you're using this to make sure you have enough headroom for other activities.
If you set the socket option SO_KEEPALIVE before the connect() call, the kernel will send TCP keepalives to make sure the far end is still there. This will mean that connections won't linger forever in the event of a reboot.

Can I get TCP window size for current connection before I send data with Linux API send or sendto

Is there a Linux API I can use to retrieve the TCP window size of the current TCP connection before I send data with send or sendto?
The reason I need this: if the current TCP window size is less than the length of the data I need to send, I can yield the CPU manually and do something else in other threads or processes.
Maybe there's a better method, but actually only the use of a RAW SOCKET comes to mind.
Handling data at the transport layer gives you access to the TCP header (and therefore to the 16-bit window field).
The downside is that you have to handle the TCP stack between you and the peer yourself, which is a bit crazy and laborious.
This is an example of what you need to do in order to JUST send a SYN to a host. Avoid the final loop, otherwise it starts a SYN-flood attack against your peer :)
TCP with RAW SOCKETS
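In the same spirit, a rough sketch of the receiving side of that approach: open a raw socket and read the 16-bit window field straight out of each TCP header. This is not the linked example; it requires root/CAP_NET_RAW, it sees every TCP segment reaching the host, and the value printed is the raw, unscaled field, so real code would filter on the ports of the connection it cares about:

    /* Sketch: receive TCP segments (IP header included on raw sockets) and
     * print the 16-bit advertised window from each TCP header. */
    #include <netinet/in.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_RAW, IPPROTO_TCP);   /* needs CAP_NET_RAW */
        if (fd < 0) { perror("socket"); return 1; }

        unsigned char buf[65536];
        for (int i = 0; i < 10; i++) {                      /* just a few packets */
            ssize_t len = recv(fd, buf, sizeof(buf), 0);
            if (len < 20)
                continue;
            size_t ip_hlen = (buf[0] & 0x0f) * 4;           /* IHL, in bytes */
            if ((size_t)len < ip_hlen + 20)
                continue;
            const unsigned char *tcp = buf + ip_hlen;
            uint16_t src_port = (tcp[0] << 8) | tcp[1];
            uint16_t dst_port = (tcp[2] << 8) | tcp[3];
            uint16_t window   = (tcp[14] << 8) | tcp[15];   /* unscaled value */
            printf("%u -> %u window=%u\n", src_port, dst_port, window);
        }
        close(fd);
        return 0;
    }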

tcp_tw_reuse vs tcp_tw_recycle : Which to use (or both)?

I have a website and application which use a significant number of connections. It normally has about 3,000 connections statically open, and can receive anywhere from 5,000 to 50,000 connection attempts in a few seconds time frame.
I have had the problem of running out of local ports to open new connections due to TIME_WAIT status sockets. Even with tcp_fin_timeout set to a low value (1-5), this seemed to just be causing too much overhead/slowdown, and it would still occasionally be unable to open a new socket.
I've looked at tcp_tw_reuse and tcp_tw_recycle, but I am not sure which of these would be the preferred choice, or if using both of them is an option.
According to Linux documentation, you should use the TCP_TW_REUSE flag to allow reusing sockets in TIME_WAIT state for new connections.
It seems to be a good option when dealing with a web server that has to handle many short TCP connections left in a TIME_WAIT state.
As described here, TCP_TW_RECYCLE can cause problems when using load balancers...
EDIT (to add some warnings ;) ):
As mentioned in a comment by @raittes, the "problems when using load balancers" point is about public-facing servers. When recycle is enabled, the server can't distinguish new incoming connections from different clients behind the same NAT device.
NOTE: net.ipv4.tcp_tw_recycle has been removed from Linux in 4.12 (4396e46187ca tcp: remove tcp_tw_recycle).
SOURCE: https://vincent.bernat.im/en/blog/2014-tcp-time-wait-state-linux
pevik mentioned an interesting blog post going the extra mile in describing all available options at the time.
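For completeness, a small hedged sketch of flipping the reuse knob from code by writing to its standard procfs path, which has the same effect as sysctl -w net.ipv4.tcp_tw_reuse=1 (root required, and only after weighing the caveats above and below):

    /* Sketch: enable net.ipv4.tcp_tw_reuse via its procfs entry. */
    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("/proc/sys/net/ipv4/tcp_tw_reuse", "w");
        if (!f) {
            perror("fopen");
            return 1;
        }
        fputs("1\n", f);
        fclose(f);
        return 0;
    }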
Modifying kernel options must be seen as a last-resort option, and should generally be avoided unless you know what you are doing... and if that were the case, you would not be asking for help here. Hence, I would advise against doing that.
The most useful piece of advice I can provide is to point out what defines a network connection: the quadruplet (client address, client port, server address, server port). Every unique combination of those four values is a distinct connection.
If you can make the available ports pool bigger, you will be able to accept more concurrent connections:
Client addresses & client ports: you cannot multiply these (they are out of your control)
Server ports: you can only change these by tweaking a kernel parameter; less critical than changing TCP buckets or reuse, provided you know how many ports you need to leave available for other processes on your system
Server addresses: add addresses to your host and balance traffic across them:
behind L4 systems already sized for your load, or directly
by resolving your domain name to multiple IP addresses (and hoping the load will be shared across addresses, through DNS for instance)
According to the VMware document, the main difference is that TCP_TW_REUSE works only on outbound communications.
TCP_TW_REUSE uses server-side time-stamps to allow the server to use a time-wait socket port number for outbound communications once the time-stamp is larger than the last received packet. The use of these time-stamps allows duplicate packets or delayed packets from the old connection to be discarded safely.
TCP_TW_RECYCLE uses the same server-side time-stamps, however it affects both inbound and outbound connections. This is useful when the server is the first party to initiate connection closure. This allows a new client inbound connection from the source IP to the server. Due to this difference, it causes issues where client devices are behind NAT devices, as multiple devices attempting to contact the server may be unable to establish a connection until the Time-Wait state has aged out in its entirety.
