TCP handshake with SOCK_RAW socket - linux

Ok, I realize this situation is somewhat unusual, but I need to establish a TCP connection (the 3-way handshake) using only raw sockets (in C, in linux) -- i.e. I need to construct the IP headers and TCP headers myself. I'm writing a server (so I have to first respond to the incoming SYN packet), and for whatever reason I can't seem to get it right. Yes, I realize that a SOCK_STREAM will handle this for me, but for reasons I don't want to go into that isn't an option.
The tutorials I've found online on using raw sockets all describe how to build a SYN flooder, but this is somewhat easier than actually establishing a TCP connection, since you don't have to construct a response based on the original packet. I've gotten the SYN flooder examples working, and I can read the incoming SYN packet just fine from the raw socket, but I'm still having trouble creating a valid SYN/ACK response to an incoming SYN from the client.
So, does anyone know a good tutorial on using raw sockets that goes beyond creating a SYN flooder, or does anyone have some code that could do this (using SOCK_RAW, and not SOCK_STREAM)? I would be very grateful.
MarkR is absolutely right -- the problem is that the kernel is sending reset packets in response to the initial packet because it thinks the port is closed. The kernel is beating me to the response and the connection dies. I was using tcpdump to monitor the connection already -- I should have been more observant and noticed that there were TWO replies one of which was a reset that was screwing things up, as well as the response my program created. D'OH!
The solution that seems to work best is to use an iptables rule, as suggested by MarkR, to block the outbound packets. However, there's an easier way to do it than using the mark option, as suggested. I just match whether the reset TCP flag is set. During the course of a normal connection this is unlikely to be needed, and it doesn't really matter to my application if I block all outbound reset packets from the port being used. This effectively blocks the kernel's unwanted response, but not my own packets. If the port my program is listening on is 9999 then the iptables rule looks like this:
iptables -t filter -I OUTPUT -p tcp --sport 9999 --tcp-flags RST RST -j DROP

You want to implement part of a TCP stack in userspace... this is ok, some other apps do this.
One problem you will come across is that the kernel will be sending out (generally negative, unhelpful) replies to incoming packets. This is going to screw up any communication you attempt to initiate.
One way to avoid this is to use an IP address and interface that the kernel does not have its own IP stack using- which is fine but you will need to deal with link-layer stuff (specifically, arp) yourself. That would require a socket lower than IPPROTO_IP, SOCK_RAW - you need a packet socket (I think).
It may also be possible to block the kernel's responses using an iptables rule- but I rather suspect that the rules will apply to your own packets as well somehow, unless you can manage to get them treated differently (perhaps applying a netfilter "mark" to your own packets?)
Read the man pages
socket(7)
ip(7)
packet(7)
Which explain about various options and ioctls which apply to types of sockets.
Of course you'll need a tool like Wireshark to inspect what's going on. You will need several machines to test this, I recommend using vmware (or similar) to reduce the amount of hardware required.
Sorry I can't recommend a specific tutorial.
Good luck.

I realise that this is an old thread, but here's a tutorial that goes beyond the normal SYN flooders: http://www.enderunix.org/docs/en/rawipspoof/
Hope it might be of help to someone.

I can't help you out on any tutorials.
But I can give you some advice on the tools that you could use to assist in debugging.
First off, as bmdhacks has suggested, get yourself a copy of wireshark (or tcpdump - but wireshark is easier to use). Capture a good handshake. Make sure that you save this.
Capture one of your handshakes that fails. Wireshark has quite good packet parsing and error checking, so if there's a straightforward error it will probably tell you.
Next, get yourself a copy of tcpreplay. This should also include a tool called "tcprewrite".
tcprewrite will allow you to split your previously saved capture files into two - one for each side of the handshake.
You can then use tcpreplay to play back one side of the handshake so you have a consistent set of packets to play with.
Then you use wireshark (again) to check your responses.

I don't have a tutorial, but I recently used Wireshark to good effect to debug some raw sockets programming I was doing. If you capture the packets you're sending, wireshark will do a good job of showing you if they're malformed or not. It's useful for comparing to a normal connection too.

There are structures for IP and TCP headers declared in netinet/ip.h & netinet/tcp.h respectively. You may want to look at the other headers in this directory for extra macros & stuff that may be of use.
You send a packet with the SYN flag set and a random sequence number (x). You should receive a SYN+ACK from the other side. This packet will have an acknowledgement number (y) that indicates the next sequence number the other side is expecting to receive as well as another sequence number (z). You send back an ACK packet that has sequence number x+1 and ack number z+1 to complete the connection.
You also need to make sure you calculate appropriate TCP/IP checksums & fill out the remainder of the header for the packets you send. Also, don't forget about things like host & network byte order.
TCP is defined in RFC 793, available here: http://www.faqs.org/rfcs/rfc793.html

Depending on what you're trying to do it may be easier to get existing software to handle the TCP handshaking for you.
One open source IP stack is lwIP (http://savannah.nongnu.org/projects/lwip/) which provides a full tcp/ip stack. It is very possible to get it running in user mode using either SOCK_RAW or pcap.

if you are using raw sockets, if you send using different source mac address to the actual one, linux will ignore the response packet and not send an rst.

Related

Dropping packets with netcat using a UDP transfer?

I'm working on sending large data files between two Linux computers via a 10 Gigabit Ethernet cable and netcat with a UDP transfer, but seem to be having issues.
After running several tests, I've come to the conclusion that netcat is the issue. I've tested the UDP transfer using [UDT][1], [Tsunami-UDP]2, and a Python UDT transfer as well, and all of which have not had any packet loss issues.
On the server side, we've been doing:
cat "bigfile.txt" | pv | nc -u IP PORT
then on the client side, we've been doing:
nc -u -l PORT > "outputFile.txt"
A few things that we've noticed:
On one of the computers, regardless of whether it's the client or server, it just "hangs". That is to say, even once the transfer is complete, Linux doesn't kill the process and move to the next line in the terminal.
If we run pipe view on the receiving side as well, the incoming data rate is significantly lower than what the sending side thinks it's sending.
Running Wireshark doesn't show any packet loss.
Running the system performance monitor in Linux shows that the incoming data rate (for the receiving side) is the same as the outgoing data rate from the sending side. This is in contrast to what pipe view thinks (see #2)
We're not sure where the issue is with netcat, and if there is a way around it. Any help/insights would be greatly appreciated.
Also, for what it's worth, using netcat with a TCP transfer works fine. And, I do understand that UDP isn't known for reliability, and that packet loss should be expected, but it's the protocol we must use.
Thanks
It could well be that the sending instance is sending the data too fast for the receiving instance. Note that this can occur even if you see no drops on the receiving NIC (as you seem to be saying), because the loss can occur at OS level instead. Your OS could have its UDP buffers overflowing. Run this command:
watch -d "cat /proc/net/snmp | grep -w Udp"
To see if your RcvbufErrors field is non-zero and/or growing while your file transfer is going on.
This answer (How to send only one UDP packet with netcat?) says that nc sends one packet per line. Assuming that's true, this could lead to a significantly higher number of packets than your other transfer mechanisms. Presumably, as #Smeeheey suggested, you're running out of receive buffers on the receiving end.
To cause your sending end to exit, you can add -q 1 to the command line (exit 1 second after seeing end of file).
But there's no way that the the receiving end nc can know when the transfer is complete. This is why these other mechanisms are "protocols" -- they have mechanisms built into them to communicate the bounds of a file. Raw UDP has no concept of end of file.
Tuning the Linux networking stack is a bit complicated, as there are many components to tune to figure out where data is being dropped.
If possible/feasible, I'd recommend that you start by monitoring packet drops throughout the entire network stack. Once you've done that, you can determine where exactly packets are being dropped and then adjust tuning parameters as needed. There are a lot of different files to measure with lots of different fields. I wrote a detailed blog post about monitoring and tuning each component of the Linux networking stack from top to bottom. It's a bit difficult to summarize all the information there, but take a look, I think it can help guide you.

Send TCP SYN packet with payload

Is it possible to send a SYN packet with self-defined payload when initiating TCP connections? My gut feeling is that it is doable theoretically. I'm looking for a easy way to achieve this goal in Linux (with C or perhaps Go language) but because it is not a standard behavior, I didn't find helpful information yet. (This post is quite similar while it is not very helpful.)
Please help me, thanks!
EDIT: Sorry for the ambiguity. Not only the possibility for such task, I'm also looking for a way, or even sample codes to achieve it.
As far as I understand (and as written in a comment by Jeff Bencteux in another answer), TCP Fast Open addresses this for TCP.
See this LWN article:
Eliminating a round trip
Theoretically, the initial SYN segment could contain data sent by the initiator of the connection: RFC 793, the specification for TCP, does permit data to be included in a SYN segment. However, TCP is prohibited from delivering that data to the application until the three-way handshake completes.
...
The aim of TFO is to eliminate one round trip time from a TCP conversation by allowing data to be included as part of the SYN segment that initiates the connection.
Obviously if you write your own software on both sides, it is possible to make it work however you want. But if you are relying on standard software on either end (such as, for example, a standard linux or Windows kernel), then no, it isn't possible, because according to TCP, you cannot send data until the session is established, and the session isn't established until you get an acknowledgment to your SYN from the other peer.
So, for example, if you send a SYN packet that also includes additional payload to a linux kernel (caveat: this is speculation to some extent since I haven't actually tried it), it will simply ignore the payload and proceed to acknowledge (SYN/ACK) or reject (with RST) the SYN depending on whether there's a listener.
In any case, you could try this, but since you're going "off the reservation" so to speak, you would need to craft your own raw packets; you won't be able to convince your local OS to create them for you.
The python scapy package could construct it:
#!/usr/bin/env python2
from scapy.all import *
sport = 3377
dport = 2222
src = "192.168.40.2"
dst = "192.168.40.135"
ether = Ether(type=0x800, dst="00:0c:29:60:57:04", src="00:0c:29:78:b0:ff")
ip = IP(src=src, dst=dst)
SYN = TCP(sport=sport, dport=dport, flags='S', seq=1000)
xsyn = ether / ip / SYN / "Some Data"
packet = xsyn.build()
print(repr(packet))
TCP Fast open do that. But both ends should speak TCP fast open. QUIC a new protocol is based to solve this problem AKA 0-RTT.
I had previously stated it was not possible. In the general sense, I stand by that assessment.
However, for the client, it is actually just not possible using the connect() API. There is an alternative connect API when using TCP Fast Open. Example:
sfd = socket(AF_INET, SOCK_STREAM, 0);
sendto(sfd, data, data_len, MSG_FASTOPEN,
(struct sockaddr *) &server_addr, addr_len);
// Replaces connect() + send()/write()
// read and write further data on connected socket sfd
close(sfd);
There is no API to allow the server to attach data to the SYN-ACK sent to the client.
Even so, enabling TCP Fast Open on both the client and server may allow you to achieve your desired result, if you only mean data from the client, but it has its own issues.
If you want the same reliability and data stream semantics of TCP, you will need a new reliable protocol that has the initial data segment in addition to the rest of what TCP provides, such as congestion control and window scaling.
Luckily, you don't have to implement it from scratch. The UDP protocol is a good starting point, and can serve as your L3 for your new L4.
Other projects have done similar things, so it may be possible to use those instead of implementing your own. Consider QUIC or UDT. These protocols were implemented over the existing UDP protocol, and thus avoid the issues faced with deploying TCP Fast Open.

Finding out the number of dropped packets in raw sockets

I am developing a program that sniffs network packets using a raw socket (AF_PACKET, SOCK_RAW) and processes them in some way.
I am not sure whether my program runs fast enough and succeeds to capture all packets on the socket. I am worried that the recieve buffer for this socket occainally gets full (due to traffic bursts) and some packets are dropped.
How do I know if packets were dropped due to lack of space in the
socket's receive buffer?
I have tried running ss -f link -nlp.
This outputs the number of bytes that are currently stored in the revice buffer for that socket, but I can not tell if any packets were dropped.
I am using Ubuntu 14.04.2 LTS (GNU/Linux 3.13.0-52-generic x86_64).
Thanks.
I was having a similar problem as you. I knew that tcpdump was able to to generate statistics about packet drops, so I tried to figure out how it did that. By looking at the code of tcpdump, I noticed that it is not generating those statistic by itself, but that it is using the libpcap library to get those statistics. The libpcap is on the other hand getting those statistics by accessing the if_packet.h header and calling the PACKET_STATISTICS socket option (at least I think so, but I'm no C expert).
Therefore, I saw only two solutions to the problem:
I had to interact somehow with the linux header files from my Pyhton script to get the packet statistics, which seemed a bit complicated.
Use the Python version of libpcap which is pypcap to get those information.
Since I had no clue how to do the first thing, I implemented the second option. Here is an example how to get packet statistics using pypcap and how to get the packet data using dpkg:
import pcap
import dpkt
import socket
pc=pcap.pcap(name="eth0", timeout_ms=10000, immediate=True)
def packet_handler(ts,pkt):
#printing packet statistic (packets received, packets dropped, packets dropped by interface
print pc.stats()
#example packet parsing using dpkt
eth=dpkt.ethernet.Ethernet(pkt)
if eth.type != dpkt.ethernet.ETH_TYPE_IP:
return
ip =eth.data
layer4=ip.data
ipsrc=socket.inet_ntoa(ip.src)
ipdst=socket.inet_ntoa(ip.dst)
pc.loop(0,packet_handler)
tpacket_stats structure is defined in linux/packet.h header file
Create variable using the tpacket_stats structre and pass it to getSockOpt with PACKET_STATISTICS SOL_SOCKET options will give packets received and dropped count.
-- some times drop can be due to buffer size
-- so if you want to decrease the drop count check increasing the buffersize using setsockopt function
First off, switch your operating system.
You need a reliable, network oriented operating system. Not some pink fluffy "ease of use" with "security" functionality enabled. NetBSD or Gentoo/ArchLinux (the bare installations, not the GUI kitted ones).
Start a simultaneous tcpdump on a network tap and capture the traffic you're supposed to receive along side of your program and compare the results.
There's no efficient way to check if you've received all the packets you intended to on the receiving end since the packets might be dropped on a lower level than you anticipate.
Also this is a question for Unix # StackOverflow, there's no programming here what I can see, at least there's no code.
The only certain way to verify packet drops is to have a much more beefy sender (perhaps a farm of machines that send packets) to a single client, record every packet sent to your reciever. Have the statistical data analyzed and compared against your senders and see how much you dropped.
The cheaper way is to buy a network tap or even more ad-hoc enable port mirroring in your switch if possible. This enables you to dump as much traffic as possible into a second machine.
This will give you a more accurate result because your application machine will be busy as it is taking care of incoming traffic and processing it.
Further more, this is why network taps are effective because they split the communication up into two channels, the receiving and sending directions of your traffic if you will. This enables you to capture traffic on two separate machines (also using tcpdump, but instead of a mirrored port, you get a more accurate traffic mirroring).
So either use port mirroring
Or you buy one of these:

Linux: How to send a whole packet to a specific port on another host?

I have captured a TCP packet using libpcap, and I want to send this whole packet(without modifying it) to a specific port on another host(which has another sniffer listening to that port).
Is there any way I can do this?
Thanks a lot!
You didn't specify which programming language you're using and what you've tried so far.
Change the IP address field to the target IP and the TCP port field to the port you want. Don't forget to update both checksums.
If what you want is TCP forwarding, the Linux kernel already does this for you.
netcat may work in this case although I think you may have to reconstruct the header, have not tried.
How to escape hex values in netcat
The other option is to use iptables to tee the packet to the other sniffer while still catching it in you package analyzer
http://www.bjou.de/blog/2008/05/howto-copyteeclone-network-traffic-using-iptables/
Another option is using a port mirror, this goes by a few differnt names depending on the switch being used but it allows you to set a port on a a switch to be essentially a hub.
I think your best bet if you can't get netcat to work is to use iptables and you can add filters to that even.
I don't know whether you HAVE to use C or not, but even if you do, I would recommend building a prototype with Python/Scapy to begin with.
Using scapy, here are the steps:
Read the pcap file using rdpcap().
Grab the destination IP address and TCP destination port number (pkt.getlayer(IP).dst, pkt.getlayer(TCP).dport) and save it as a string in a format that you want (e.g. payload = "192.168.1.1:80").
Change the packet's destination IP address and the destination port number so that it can be received by the other host that is listening on the particular port.
Add the payload on top of the packet (pkt = pkt / payload)
Send the packet (sendp(pkt, iface='eth0'))
You will have to dissect the packet on the other host to grab the payload. Without knowing exactly what is on top of the TCP layer in the original packet, I can't give you an accurate code for this, but should be relatively straight forward.
This is all quite easy with Python/Scapy but I expect it to be much harder with C, having to manually calculate the correct offsets and checksums and things. Good luck, and I hope this helps.

I need a TCP option (ioctl) to send data immediately

I've got an unusual situation: I'm using a Linux system in an embedded situation (Intel box, currently using a 2.6.20 kernel.) which has to communicate with an embedded system that has a partially broken TCP implementation. As near as I can tell right now they expect each message from us to come in a separate Ethernet frame! They seem to have problems when messages are split across Ethernet frames.
We are on the local network with the device, and there are no routers between us (just a switch).
We are, of course, trying to force them to fix their system, but that may not end up being feasible.
I've already set TCP_NODELAY on my sockets (I connect to them), but that only helps if I don't try to send more than one message at a time. If I have several outgoing messages in a row, those messages tend to end up in one or two Ethernet frames, which causes trouble on the other system.
I can generally avoid the problem by using a timer to avoid sending messages too close together, but that obviously limits our throughput. Further, if I turn the time down too low, I risk network congestion holding up packet transmits and ending up allowing more than one of my messages into the same packet.
Is there any way that I can tell whether the driver has data queued or not? Is there some way I can force the driver to send independent write calls in independent transport layer packets? I've had a look through the socket(7) and tcp(7) man pages and I didn't find anything. It may just be that I don't know what I'm looking for.
Obviously, UDP would be one way out, but again, I don't think we can make the other end change anything much at this point.
Any help greatly appreciated.
IIUC, setting the TCP_NODELAY option should flush all packets (i.e. tcp.c implements setting of NODELAY with a call to tcp_push_pending_frames). So if you set the socket option after each send call, you should get what you want.
You cannot work around a problem unless you're sure what the problem is.
If they've done the newbie mistake of assuming that recv() receives exactly one message then I don't see a way to solve it completely. Sending only one message per Ethernet frame is one thing, but if multiple Ethernet frames arrive before the receiver calls recv() it will still get multiple messages in one call.
Network congestion makes it practically impossible to prevent this (while maintaining decent throughput) even if they can tell you how often they call recv().
Maybe, set TCP_NODELAY and set your MTU low enough so that there would be at most 1 message per frame? Oh, and add "dont-fragment" flag on outgoing packets
Have you tried opening a new socket for each message and closing it immediately? The overhead may be nauseating,but this should delimit your messages.
In the worst case scenario you could go one level lower (raw sockets), where you have better control over the packets sent, but then you'd have to deal with all the nitty-gritty of TCP.
Maybe you could try putting the tcp stack into low-latency mode:
echo 1 > /proc/sys/net/ipv4/tcp_low_latency
That should favor emitting packets as quickly as possible over combining data. Read the man on tcp(7) for more information.

Resources