Is it possible to make a system call or write a kernel module to put a TCP connection directly into the ESTABLISHED state without going through the three-way handshake, assuming the correct SYN sequence and ACK numbers are provided dynamically?
You may like to have a look at TCP Fast Open, which modern Linux kernels implement:
TCP Fast Open (TFO) is an extension to speed up the opening of successive Transmission Control Protocol (TCP) connections between two endpoints. It works by using a TFO cookie (a TCP option) in the initial SYN packet to authenticate a previously connected client. If successful, the server may start sending data to the client before the final ACK packet of the three-way handshake is received, skipping a round trip and lowering the latency at the start of data transmission.
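As a rough illustration, a listening socket can ask for TFO with the Linux TCP_FASTOPEN socket option. This is only a sketch: the helper name make_tfo_listener and the queue length of 16 are made up for the example, and it assumes a Linux kernel whose net.ipv4.tcp_fastopen sysctl also has the server bit enabled (setting the option alone may not be enough). Note that TFO still performs a handshake; it only lets data travel earlier.

```python
import socket

def make_tfo_listener(host="127.0.0.1", port=0, queue_len=16):
    """Create a listening TCP socket and try to enable TCP Fast Open on it.

    Hypothetical helper for illustration. Returns (socket, tfo_enabled).
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))

    tfo_enabled = False
    # TCP_FASTOPEN is not defined on every platform, so look it up defensively.
    opt = getattr(socket, "TCP_FASTOPEN", None)
    if opt is not None:
        try:
            # The value is the maximum number of pending TFO requests.
            srv.setsockopt(socket.IPPROTO_TCP, opt, queue_len)
            tfo_enabled = True
        except OSError:
            # Kernel or sandbox refused the option; fall back to plain TCP.
            pass

    srv.listen(128)
    return srv, tfo_enabled
```

If tfo_enabled comes back False, the socket still works as an ordinary listener.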
I have a requirement to enable TCP keepalive on any connections and now I am struggling with the results from our test case. I think this is because I do not really understand when the first keepalive probe is sent. I read the following in the documentation for tcp_keepalive_time on Linux:
the interval between the last data packet sent (simple ACKs are not considered data) and the first keepalive probe; after the
connection is marked to need keepalive, this counter is not used any
further
Some other sources state that this is the time a connection is idle, but they do not further define what this means. I also looked into Stevens to find a more formal definition of this, because I am wondering what "the last data packet sent" actually means when considering retransmissions.
In my test case, I have a connection where data is only sent from a server to a client at rather high rates. To test keepalive, we unplugged the cable on the client's NIC. I can now see that the network stack tries to send the data and enters the retransmission state, but no keep alive probe is sent. Is it correct that keep alive probes are not sent during retransmission?
I have a connection where data is only sent from a server to a client
at rather high rates.
Then you'll never see keepalives. Keepalives are sent when there is "silence on the wire". RFC 1122 has some explanation of keepalives.
A "keep-alive" mechanism periodically probes the other end of a
connection when the connection is otherwise idle, even when there is
no data to be sent
Back to your question:
Some other sources state that this is the time a connection is idle,
but they do not further define what this means.
This is how long TCP will wait before poking the peer "hoy! still alive?".
$ cat /proc/sys/net/ipv4/tcp_keepalive_time
7200
In other words, you've been using a TCP connection and it has been great. However, for the past 2 hours there hasn't been anything to send. Is it reasonable to assume the connection is still alive? Is it reasonable to assume all the middleboxes in the middle still have state about your connection? Opinions vary and keepalives aren't part of RFC793.
The TCP specification does not include a keep-alive mechanism because it
could: (1) cause perfectly good connections to break during transient
Internet failures; (2) consume unnecessary bandwidth ("if no one is
using the connection, who cares if it is still good?")
To test keepalive, we unplugged the cable on the client's NIC.
This isn't testing keepalive. This is testing your TCP's retransmit strategy, i.e. how many times and how often TCP will try to get your message across. On a Linux box this (likely) ends up testing net.ipv4.tcp_retries2:
How many times to retry before killing alive TCP connection. RFC 1122
says that the limit should be longer than 100 sec. It is too small
number. Default value 15 corresponds to 13-30min depending on RTO.
But RFC5482 - TCP User Timeout Option provides more ways to influence it.
The TCP user timeout controls how long transmitted data may remain
unacknowledged before a connection is forcefully closed.
Back to the question:
Is it correct that keep alive probes are not sent during retransmission?
It makes sense: TCP is already trying to elicit a response from the other peer, an empty keepalive would be superfluous.
Linux-specific (2.4+) options to influence keepalive
TCP_KEEPCNT The maximum number of keepalive probes TCP should send before dropping the connection.
TCP_KEEPIDLE The time (in seconds) the connection needs to remain idle before TCP starts sending keepalive probes, if the socket option SO_KEEPALIVE has been set on this socket.
TCP_KEEPINTVL The time (in seconds) between individual keepalive probes.
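A minimal sketch of setting these three options per socket, instead of relying on the system-wide defaults under /proc/sys/net/ipv4. The helper name enable_keepalive and the values (60 s idle, 10 s interval, 5 probes) are arbitrary choices for the example, and the TCP_KEEP* constants are assumed to exist only on Linux-like platforms, so they are looked up defensively.

```python
import socket

def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on SO_KEEPALIVE and, where available, tune the per-socket timers.

    Hypothetical helper. Returns the values the kernel reports back.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    applied = {}
    for name, value in (
        ("TCP_KEEPIDLE", idle),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", count),       # unanswered probes before giving up
    ):
        opt = getattr(socket, name, None)
        if opt is None:
            continue  # option not exposed on this platform
        sock.setsockopt(socket.IPPROTO_TCP, opt, value)
        applied[name] = sock.getsockopt(socket.IPPROTO_TCP, opt)
    return applied
```

With these example values, an idle connection to a dead peer would be dropped after roughly 60 + 5 * 10 seconds; as discussed above, this does not apply while unacknowledged data is being retransmitted.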
Linux-specific (2.6.37+) option to influence TCP User Timeout
TCP_USER_TIMEOUT The maximum amount of time in
milliseconds that transmitted data may remain unacknowledged before
TCP will forcibly close connection.
So for example your application could use this option to determine how long the connection survives when there is no connectivity (similar to your NIC-unplugging example). E.g. if you have reason to believe the client will come back (perhaps they closed the laptop lid? spotty wireless access?) you can specify a timeout of 12 hours and when they do come back the connection will still function.
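The 12-hour case could look roughly like this. It is a sketch only: set_user_timeout is a hypothetical helper, and it assumes a Linux build of Python where socket.TCP_USER_TIMEOUT is defined. The option takes milliseconds.

```python
import socket

def set_user_timeout(sock, timeout_ms):
    """Set TCP_USER_TIMEOUT (milliseconds) if the platform exposes it.

    Hypothetical helper. Returns the value read back, or None if unsupported.
    """
    opt = getattr(socket, "TCP_USER_TIMEOUT", None)
    if opt is None:
        return None
    sock.setsockopt(socket.IPPROTO_TCP, opt, timeout_ms)
    return sock.getsockopt(socket.IPPROTO_TCP, opt)

TWELVE_HOURS_MS = 12 * 60 * 60 * 1000  # 43,200,000 ms
```

Calling set_user_timeout(sock, TWELVE_HOURS_MS) on a connected socket asks the kernel to keep retrying unacknowledged data for up to 12 hours before giving up.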
When I use TCP to send data, the write() function only ensures the data has been copied to the TCP send buffer. If TCP doesn't send the data successfully, how do I know? Is there a signal, or something else?
Short Answer: you don't. Conventionally the remote TCP peer sends a response, acknowledging your data. This is the 1st step toward building an application level protocol, atop TCP, your transport level protocol.
Longer Answer: This problem is the prime motivation for higher level protocols such as HTTP, STOMP, IMAP, etc.
but if TCP doesn't send data successfully, how do I know?
The write() system call can return -1 and set errno to indicate an error; however, you cannot know how much data the remote peer has received and how much it has not. That question is best answered by the remote peer.
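A toy version of such an application-level acknowledgement might look like this. Everything here is invented for illustration (the 4-byte length prefix, the single ACK byte, and the function names); real protocols such as HTTP define their own framing and responses.

```python
import socket
import struct
import threading

ACK = b"\x06"  # arbitrary one-byte acknowledgement chosen for this sketch

def recv_exact(sock, n):
    """Read exactly n bytes or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed before full message arrived")
        buf.extend(chunk)
    return bytes(buf)

def send_with_ack(sock, payload, timeout=5.0):
    """Send a length-prefixed message and wait for the peer's ACK byte.

    Returns True if the peer acknowledged; raises on timeout or disconnect.
    """
    sock.settimeout(timeout)
    sock.sendall(struct.pack("!I", len(payload)) + payload)
    return recv_exact(sock, 1) == ACK

def serve_one(sock):
    """Receive one length-prefixed message, acknowledge it, return the data."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    data = recv_exact(sock, length)
    sock.sendall(ACK)
    return data
```

The sender only treats the message as delivered once the ACK byte arrives; a timeout or a closed connection tells it that delivery is unknown.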
I'm trying to write a TCP transparent proxy to run on Linux.
I want to, upon receipt of an incoming connection, initiate a corresponding outgoing connection, but only accept (SYN|ACK) the incoming connection if the outgoing connection is successful.
TCP_DEFER_ACCEPT doesn't do what I want -- it always sends a SYN|ACK.
The question is: how do I accept TCP connections, but defer the SYN|ACK, with the Linux sockets API?
You can do that with Linux, but not via the socket API. You would use the NFQUEUE target which allows you to redirect some packets to userspace and decide their fate from within your program.
Obviously, you'd still have to parse the packet in userspace, but searching for a few TCP flags should not be that hard and does not require a complete TCP stack. And this way Linux still does the whole networking job.
In your case, it seems possible to use both NFQUEUE and the classical sockets API. The first will give you early decisions, the latter access to the TCP stream data, although I have never tried it.
See https://home.regit.org/netfilter-en/using-nfqueue-and-libnetfilter_queue/ for instance.
Is there a Linux API that lets me retrieve the TCP window size of the current TCP connection before I send data with send or sendto?
The reason I need this is that if the current TCP window size is less than the length of the data I need to send, I can release the CPU manually and do something else in other threads or processes.
Maybe there's a better method, but the only thing that comes to mind is a RAW SOCKET.
Handling data at the transport layer gives you access to the TCP header (and therefore to the 16-bit window field).
The downside is that you have to handle the TCP stack between you and the peer yourself, which is a bit crazy and laborious.
This is an example of what you need to do in order to JUST send a SYN to a host. Remove the final loop, otherwise it starts a SYN flood attack against your peer :)
TCP with RAW SOCKETS
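If the real goal is simply not to block when the peer cannot take more data, a non-blocking socket may be enough. This is a different technique from the raw-socket approach above and does not read the window field itself; it only asks the kernel whether the send buffer currently has room. The helper name try_send is hypothetical, and the socket is assumed to have been put in non-blocking mode beforehand.

```python
import select
import socket

def try_send(sock, data, wait=0.0):
    """Queue as much of data as the kernel accepts right now.

    Assumes sock is non-blocking. Returns the number of bytes queued,
    which is 0 when the socket is not writable within `wait` seconds.
    """
    _, writable, _ = select.select([], [sock], [], wait)
    if not writable:
        return 0  # send buffer is full; the caller can do other work
    try:
        return sock.send(data)
    except BlockingIOError:
        # Lost a race between select() and send(); treat it as "not now".
        return 0
```

A return value smaller than len(data) means the rest has to be retried later, at which point the thread is free to work on something else.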
After sending some tcp data by any method (mine is below)
DataOutputStream outToServer = new DataOutputStream(clientSocket.getOutputStream());
outToServer.writeBytes(string);
How can I verify in Java that the TCP data was sent successfully? Or is there any way of reading the ACK received from the TCP server?
You cannot. Operating systems typically do not expose this to applications.
If you need to know whether data has made it to the other end, you need to implement acks at your application protocol, not at the transport level that TCP concerns itself with.
I always use Wireshark to debug TCP apps. It is a packet capture tool that shows you the individual packets with their ACKs, retransmits, etc.
It does not work from within your code, but it does allow you to double-check the behavior of your app.
Check Wireshark.