Proper way to establish boundaries for a tcp connection - node.js

My question is about how to properly handle data received over a TCP connection. Establishing a TCP connection creates a stream. Suppose I want to send a message that has a beginning and an end. Since the data flows through the stream without any boundaries, how can I identify the beginning and the end of a message? I thought of putting special characters at the beginning and the end of my message in order to recognize them, but I wonder whether that is a proper way to do it. My question is therefore: how can I properly establish message boundaries on a TCP connection? (I'm using Node.js on the client side and Java on the server side.)
Thank you in advance.

A plain TCP connection needs some sort of protocol which defines the data format so the receiving end knows how to interpret what is being sent. For example, HTTP is one such protocol, WebSocket is another. There are thousands of existing protocols. I'd suggest you find one that is a good match for what you want to do and use it rather than building your own.
Different protocols use different schemes for defining the data format and thus different ways of delineating pieces of your data. For example, HTTP ends each header line with \r\n, uses the form xxxx: yyyy on each line, and uses a blank line to mark the end of the headers.
Other protocols use a binary format that define message packets with a message type, a message length and a message payload.
There are literally hundreds of different ways to do it. Since there are so many pre-built choices out there, one can usually find an existing protocol that is a decent match and use a pre-built server and client for each end rather than writing your own protocol generating and parsing code.
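As a minimal sketch of the binary approach mentioned above (a message type, a message length, and a payload), the following Python code frames messages with a 5-byte header and reassembles them from arbitrarily split stream chunks. The header layout and the names encode_frame and FrameDecoder are assumptions made up for illustration, not part of any existing protocol. Python is used only for brevity; the same layout could be written and parsed from Node.js and Java.

```python
import struct

# Assumed header layout: 1-byte message type, 4-byte big-endian payload length.
HEADER = struct.Struct("!BI")


def encode_frame(msg_type, payload):
    """Prefix the payload with its type and length."""
    return HEADER.pack(msg_type, len(payload)) + payload


class FrameDecoder:
    """Reassembles complete frames from chunks of a byte stream."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk):
        """Add received bytes and return a list of (type, payload) tuples
        for every frame that is now complete."""
        self._buf.extend(chunk)
        frames = []
        while len(self._buf) >= HEADER.size:
            msg_type, length = HEADER.unpack_from(self._buf)
            end = HEADER.size + length
            if len(self._buf) < end:
                break  # payload not fully received yet
            frames.append((msg_type, bytes(self._buf[HEADER.size:end])))
            del self._buf[:end]
        return frames
```

The receiver calls feed() with whatever each read returns. Because the length travels in the header, it does not matter how TCP happens to split or merge the writes.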

Related

WinSock: Check if transfered file is complete

I'm planning to make a file transport using sockets(TCP) on Windows with C++. Hence it would be quite convenient to see if the transferred file has been received completely and correctly. What would be the best (and maybe also the easiest) way to do this?
Every byte the receiver gets over TCP arrives intact and in order; TCP ensures that: https://en.wikipedia.org/wiki/Transmission_Control_Protocol. The receiver only needs to know how many bytes to expect.
Do not try to recalculate checksums of individual packets or anything like that; you will only introduce errors. TCP is a reliable transport: every transfer is automatically segmented, protected with checksums, reassembled, and verified on receipt, so it works out of the box.
If you're really paranoid, or simply need to create digital proof of transmission, you need to choose another protocol entirely, something like SCTP, perhaps.
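One way to tell "complete" from "connection dropped halfway" is to send the size first and have the receiver read exactly that many bytes. Here is a rough sketch in Python rather than WinSock/C++, only to keep it short. The 8-byte length prefix and the names send_blob, recv_exact, and recv_blob are assumptions for illustration.

```python
import socket
import struct


def send_blob(sock, data):
    """Send an 8-byte big-endian length followed by the data itself."""
    sock.sendall(struct.pack("!Q", len(data)) + data)


def recv_exact(sock, n):
    """Read exactly n bytes, or raise if the peer closes the connection early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(min(65536, n - len(buf)))
        if not chunk:
            raise ConnectionError(
                "connection closed after %d of %d bytes" % (len(buf), n))
        buf.extend(chunk)
    return bytes(buf)


def recv_blob(sock):
    """Read the length prefix, then exactly that many bytes of data."""
    (length,) = struct.unpack("!Q", recv_exact(sock, 8))
    return recv_exact(sock, length)
```

If recv_blob() returns, the whole file arrived; if the sender disappears partway through, it raises instead of silently returning a short file.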

Send TCP SYN packet with payload

Is it possible to send a SYN packet with a self-defined payload when initiating TCP connections? My gut feeling is that it is theoretically doable. I'm looking for an easy way to achieve this in Linux (with C or perhaps Go), but because it is not standard behavior, I haven't found helpful information yet. (This post is quite similar, but it is not very helpful.)
Please help me, thanks!
EDIT: Sorry for the ambiguity. I'm asking not only whether this is possible, but also for a way, or even sample code, to achieve it.
As far as I understand (and as written in a comment by Jeff Bencteux in another answer), TCP Fast Open addresses this for TCP.
See this LWN article:
Eliminating a round trip
Theoretically, the initial SYN segment could contain data sent by the initiator of the connection: RFC 793, the specification for TCP, does permit data to be included in a SYN segment. However, TCP is prohibited from delivering that data to the application until the three-way handshake completes.
...
The aim of TFO is to eliminate one round trip time from a TCP conversation by allowing data to be included as part of the SYN segment that initiates the connection.
Obviously if you write your own software on both sides, it is possible to make it work however you want. But if you are relying on standard software on either end (such as, for example, a standard linux or Windows kernel), then no, it isn't possible, because according to TCP, you cannot send data until the session is established, and the session isn't established until you get an acknowledgment to your SYN from the other peer.
So, for example, if you send a SYN packet that also includes additional payload to a linux kernel (caveat: this is speculation to some extent since I haven't actually tried it), it will simply ignore the payload and proceed to acknowledge (SYN/ACK) or reject (with RST) the SYN depending on whether there's a listener.
In any case, you could try this, but since you're going "off the reservation" so to speak, you would need to craft your own raw packets; you won't be able to convince your local OS to create them for you.
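The Python scapy package can construct it:
#!/usr/bin/env python3
# Build (but do not send) a SYN segment that carries a payload.
from scapy.all import Ether, IP, TCP

sport = 3377
dport = 2222
src = "192.168.40.2"
dst = "192.168.40.135"

# Link, network, and transport layers; the addresses are examples for a test LAN.
ether = Ether(type=0x800, dst="00:0c:29:60:57:04", src="00:0c:29:78:b0:ff")
ip = IP(src=src, dst=dst)
SYN = TCP(sport=sport, dport=dport, flags='S', seq=1000)

# Stacking a bytes payload after the TCP layer puts the data in the SYN itself.
xsyn = ether / ip / SYN / b"Some Data"
packet = xsyn.build()  # serialize the packet to raw bytes
print(repr(packet))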
TCP Fast Open does that, but both ends must support TCP Fast Open. QUIC, a newer protocol, is designed in part to solve this same problem, also known as 0-RTT.
I had previously stated it was not possible. In the general sense, I stand by that assessment.
However, for the client, it is actually just not possible using the connect() API. There is an alternative connect API when using TCP Fast Open. Example:
sfd = socket(AF_INET, SOCK_STREAM, 0);
sendto(sfd, data, data_len, MSG_FASTOPEN,
       (struct sockaddr *) &server_addr, addr_len);
// Replaces connect() + send()/write()
// read and write further data on connected socket sfd
close(sfd);
There is no API to allow the server to attach data to the SYN-ACK sent to the client.
Even so, enabling TCP Fast Open on both the client and server may allow you to achieve your desired result, if you only mean data from the client, but it has its own issues.
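For comparison, here is a hedged Python sketch of the same client-side idea. It assumes a Linux build of Python where socket.MSG_FASTOPEN is defined, and it falls back to an ordinary connect() when the flag is missing or the kernel refuses it. The function name send_with_fast_open is made up for this example. Whether the payload actually rides in the SYN still depends on kernel settings and on the client already holding a Fast Open cookie for that server.

```python
import socket


def send_with_fast_open(host, port, payload):
    """Open a TCP connection and send payload, trying TCP Fast Open first.

    Returns the connected socket so the caller can keep reading and writing.
    """
    flag = getattr(socket, "MSG_FASTOPEN", None)  # only defined on some platforms
    if flag is not None:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            # On an unconnected TCP socket this replaces connect() + send().
            sent = sock.sendto(payload, flag, (host, port))
            sock.sendall(payload[sent:])  # finish the job after a partial send
            return sock
        except OSError:
            sock.close()  # Fast Open unavailable or refused; fall back below
    # Ordinary three-way handshake followed by a normal send.
    sock = socket.create_connection((host, port))
    sock.sendall(payload)
    return sock
```

From the application's point of view the result is the same either way: a connected socket with the first bytes already queued. The only thing Fast Open can change is how many round trips that took.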
If you want the same reliability and data stream semantics of TCP, you will need a new reliable protocol that has the initial data segment in addition to the rest of what TCP provides, such as congestion control and window scaling.
Luckily, you don't have to implement it from scratch. The UDP protocol is a good starting point, and can serve as your L3 for your new L4.
Other projects have done similar things, so it may be possible to use those instead of implementing your own. Consider QUIC or UDT. These protocols were implemented over the existing UDP protocol, and thus avoid the issues faced with deploying TCP Fast Open.

Determine if there is Data left on the socket and discard it

I'm writing an interface under Linux that gets data from a TCP socket. The user provides a buffer in which the received data is stored. If the provided buffer is too small, I just want to return an error.
The first problem is determining whether the buffer was too small. The recv() function only returns the number of bytes actually written into the buffer. If I use the MSG_TRUNC flag described on the recv() man page, it still returns the same value.
The second problem is discarding the data still queued in the socket. If I determine that the provided buffer was too small, I just want to erase everything that is left on the socket. Is there any way to do so other than closing and reopening the socket, or just receiving until nothing is left?
Best Regards
Toby
As documented in the man page, MSG_TRUNC reports the real message length only for datagram-style sockets (e.g. UDP), so it will not work the way you want for a TCP socket, which is stream based. There are literally hundreds of posts on Stack Overflow and elsewhere about preserving application message boundaries on TCP (hint: you need to do this yourself; TCP is a byte-stream interface and doesn't do it for you), so I won't go into the details here. Suffice it to say, you need a mechanism on the recv() side for knowing how big an application "message" or "packet" is in order to do what you want over TCP (or you need to switch to UDP).
For TCP, if you need to "drain" the socket, reading until there is no data left would work. However, again, you need to consider message boundaries as mentioned above so that you do not read through one "message" and start eating into the next. The most important point to remember is that TCP provides a byte-stream interface and will not necessarily preserve your concept of application-level packets or messages.
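To make that concrete, here is a small sketch, in Python rather than C for brevity, of a receiver that knows each message's size up front and can therefore discard an oversized message without eating into the next one. It assumes the sender prefixes every message with a 4-byte big-endian length; that framing and the names recv_exact and recv_message are assumptions for illustration, not anything recv() gives you.

```python
import socket
import struct


def recv_exact(sock, n):
    """Read exactly n bytes from the stream or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        buf.extend(chunk)
    return bytes(buf)


def recv_message(sock, max_size):
    """Return the next message, or None if it was larger than max_size.

    An oversized message is read and thrown away in small pieces, so the
    stream is left positioned at the start of the following message.
    """
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    if length > max_size:
        remaining = length
        while remaining:
            remaining -= len(recv_exact(sock, min(4096, remaining)))
        return None
    return recv_exact(sock, length)
```

Returning None here plays the role of the "buffer too small" error: the caller learns the message did not fit, and the socket is still usable for the next message.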

TCP handshake with SOCK_RAW socket

Ok, I realize this situation is somewhat unusual, but I need to establish a TCP connection (the 3-way handshake) using only raw sockets (in C, in linux) -- i.e. I need to construct the IP headers and TCP headers myself. I'm writing a server (so I have to first respond to the incoming SYN packet), and for whatever reason I can't seem to get it right. Yes, I realize that a SOCK_STREAM will handle this for me, but for reasons I don't want to go into that isn't an option.
The tutorials I've found online on using raw sockets all describe how to build a SYN flooder, but this is somewhat easier than actually establishing a TCP connection, since you don't have to construct a response based on the original packet. I've gotten the SYN flooder examples working, and I can read the incoming SYN packet just fine from the raw socket, but I'm still having trouble creating a valid SYN/ACK response to an incoming SYN from the client.
So, does anyone know a good tutorial on using raw sockets that goes beyond creating a SYN flooder, or does anyone have some code that could do this (using SOCK_RAW, and not SOCK_STREAM)? I would be very grateful.
MarkR is absolutely right -- the problem is that the kernel is sending reset packets in response to the initial packet because it thinks the port is closed. The kernel is beating me to the response and the connection dies. I was using tcpdump to monitor the connection already -- I should have been more observant and noticed that there were TWO replies one of which was a reset that was screwing things up, as well as the response my program created. D'OH!
The solution that seems to work best is to use an iptables rule, as suggested by MarkR, to block the outbound packets. However, there's an easier way to do it than using the mark option, as suggested. I just match whether the reset TCP flag is set. During the course of a normal connection this is unlikely to be needed, and it doesn't really matter to my application if I block all outbound reset packets from the port being used. This effectively blocks the kernel's unwanted response, but not my own packets. If the port my program is listening on is 9999 then the iptables rule looks like this:
iptables -t filter -I OUTPUT -p tcp --sport 9999 --tcp-flags RST RST -j DROP
You want to implement part of a TCP stack in userspace... this is ok, some other apps do this.
One problem you will come across is that the kernel will be sending out (generally negative, unhelpful) replies to incoming packets. This is going to screw up any communication you attempt to initiate.
One way to avoid this is to use an IP address and interface on which the kernel is not running its own IP stack, which is fine, but you will need to deal with link-layer stuff (specifically, ARP) yourself. That would require a socket lower than IPPROTO_IP, SOCK_RAW: you need a packet socket (I think).
It may also be possible to block the kernel's responses using an iptables rule, but I rather suspect that the rules will apply to your own packets as well somehow, unless you can manage to get them treated differently (perhaps by applying a netfilter "mark" to your own packets?).
Read the man pages
socket(7)
ip(7)
packet(7)
These explain the various options and ioctls that apply to each type of socket.
Of course you'll need a tool like Wireshark to inspect what's going on. You will need several machines to test this, I recommend using vmware (or similar) to reduce the amount of hardware required.
Sorry I can't recommend a specific tutorial.
Good luck.
I realise that this is an old thread, but here's a tutorial that goes beyond the normal SYN flooders: http://www.enderunix.org/docs/en/rawipspoof/
Hope it might be of help to someone.
I can't help you out on any tutorials.
But I can give you some advice on the tools that you could use to assist in debugging.
First off, as bmdhacks has suggested, get yourself a copy of wireshark (or tcpdump - but wireshark is easier to use). Capture a good handshake. Make sure that you save this.
Capture one of your handshakes that fails. Wireshark has quite good packet parsing and error checking, so if there's a straightforward error it will probably tell you.
Next, get yourself a copy of tcpreplay. This should also include a tool called "tcprewrite".
tcprewrite will allow you to split your previously saved capture files into two - one for each side of the handshake.
You can then use tcpreplay to play back one side of the handshake so you have a consistent set of packets to play with.
Then you use wireshark (again) to check your responses.
I don't have a tutorial, but I recently used Wireshark to good effect to debug some raw sockets programming I was doing. If you capture the packets you're sending, wireshark will do a good job of showing you if they're malformed or not. It's useful for comparing to a normal connection too.
There are structures for IP and TCP headers declared in netinet/ip.h & netinet/tcp.h respectively. You may want to look at the other headers in this directory for extra macros & stuff that may be of use.
You send a packet with the SYN flag set and a random sequence number (x). You should receive a SYN+ACK from the other side. This packet will have an acknowledgement number (y) that indicates the next sequence number the other side is expecting to receive as well as another sequence number (z). You send back an ACK packet that has sequence number x+1 and ack number z+1 to complete the connection.
You also need to make sure you calculate appropriate TCP/IP checksums & fill out the remainder of the header for the packets you send. Also, don't forget about things like host & network byte order.
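The sequence-number arithmetic and the checksum are the two things most likely to be wrong in a hand-built SYN/ACK, so here is a rough Python sketch of both (Python only to keep it short; the same logic carries over to C). It packs a 20-byte TCP header with no options and computes the checksum over the IPv4 pseudo-header plus the segment. The function names and the fixed window size are assumptions for illustration, and a real reply would usually also echo options such as MSS.

```python
import socket
import struct

SYN, ACK = 0x02, 0x10  # TCP flag bits


def internet_checksum(data):
    """16-bit one's-complement sum used by IP, TCP, and UDP (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return ~total & 0xFFFF


def build_tcp_segment(src_ip, dst_ip, sport, dport, seq, ack, flags, payload=b""):
    """Pack a 20-byte TCP header (no options) plus payload with a valid checksum."""
    header = struct.pack("!HHIIBBHHH", sport, dport, seq, ack,
                         5 << 4,   # data offset: 5 32-bit words
                         flags,
                         65535,    # window size (arbitrary choice here)
                         0, 0)     # checksum placeholder, urgent pointer
    # The checksum also covers a pseudo-header taken from the IP layer.
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip) +
              struct.pack("!BBH", 0, socket.IPPROTO_TCP, len(header) + len(payload)))
    checksum = internet_checksum(pseudo + header + payload)
    return header[:16] + struct.pack("!H", checksum) + header[18:] + payload


def build_syn_ack(server_ip, client_ip, server_port, client_port,
                  client_seq, server_seq):
    """Reply to a SYN: our own sequence number, and ack = client's seq + 1."""
    ack = (client_seq + 1) & 0xFFFFFFFF  # the SYN consumes one sequence number
    return build_tcp_segment(server_ip, client_ip, server_port, client_port,
                             server_seq, ack, SYN | ACK)
```

struct.pack with "!" already emits network byte order, which takes care of the host/network byte-order issue for these fields. A handy self-check is that recomputing the checksum over a finished segment (pseudo-header included) should give 0.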
TCP is defined in RFC 793, available here: http://www.faqs.org/rfcs/rfc793.html
Depending on what you're trying to do it may be easier to get existing software to handle the TCP handshaking for you.
One open source IP stack is lwIP (http://savannah.nongnu.org/projects/lwip/) which provides a full tcp/ip stack. It is very possible to get it running in user mode using either SOCK_RAW or pcap.
If you are using raw sockets and send with a source MAC address different from the real one, Linux will ignore the response packet and not send an RST.

Probability of finding TCP packets with the same payload?

I had a discussion with a developer earlier today re identifying TCP packets going out on a particular interface with the same payload. He told me that the probability of finding a TCP packet that has an equal payload (even if the same data is sent out several times) is very low due to the way TCP packets are constructed at system level. I was aware this may be the case due to the system's MTU settings (usually 1500 bytes) etc., but what sort of probability stats am I really looking at? Are there any specific protocols that would make it easier identifying matching payloads?
It is the protocol running over tcp that defines the uniqueness of the payload, not the tcp protocol itself.
For example, you might naively think that HTTP requests would all be identical when asking for a server's home page, but the referrer and user agent strings make the payloads different.
Similarly, if the response is dynamically generated, it may have a date header:
Date: Fri, 12 Sep 2008 10:44:27 GMT
So that will render the response payloads different. However, subsequent payloads may be identical, if the content is static.
Keep in mind that the actual packets will be different because of differing sequence numbers, which are supposed to be incrementing and pseudorandom.
Chris is right. More specifically, two or three pieces of information in the packet header should be different:
the sequence number (whose initial value is intended to be unpredictable), which increases with the number of bytes transmitted.
the timestamp, a field containing two timestamps (although this field is optional).
the checksum, since both the payload and header are checksummed, including the changing sequence number.
EDIT: Sorry, my original idea was ridiculous.
You got me interested, so I googled a little bit and found this. If you wanted to write your own tool, you would probably have to inspect each payload. The easiest way would probably be some sort of hash/checksum to check for identical payloads. Just make sure you are checking the payload, not the whole packet.
As for the statistics I will have to defer to someone with greater knowledge on the workings of TCP.
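The hash idea above could look roughly like this in Python. It assumes the TCP payloads have already been extracted from the capture as bytes objects (how you get them is outside this sketch), and the function name group_identical_payloads is made up for illustration.

```python
import hashlib
from collections import defaultdict


def group_identical_payloads(payloads):
    """Return {sha256 hex digest: [indexes]} for payloads seen more than once.

    Only the payload bytes are hashed, never the headers, so differing
    sequence numbers, timestamps, and checksums do not hide a match.
    """
    groups = defaultdict(list)
    for index, payload in enumerate(payloads):
        groups[hashlib.sha256(payload).hexdigest()].append(index)
    return {digest: idx for digest, idx in groups.items() if len(idx) > 1}
```

Each value in the result lists the positions of packets that carried byte-for-byte identical payloads.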
Sending the same PAYLOAD is probably fairly common (particularly if you're running some sort of network service). If you mean sending out the same tcp segment (header and all) or the whole network packet (ip and up), then the probability is substantially reduced.
