Probability of finding TCP packets with the same payload?

Probability of finding TCP packets with the same payload? - statistics

I had a discussion with a developer earlier today re identifying TCP packets going out on a particular interface with the same payload. He told me that the probability of finding a TCP packet that has an equal payload (even if the same data is sent out several times) is very low due to the way TCP packets are constructed at system level. I was aware this may be the case due to the system's MTU settings (usually 1500 bytes) etc., but what sort of probability stats am I really looking at? Are there any specific protocols that would make it easier identifying matching payloads?

It is the protocol running over tcp that defines the uniqueness of the payload, not the tcp protocol itself.
For example, you might naively think that HTTP requests would all be identical when asking for a server's home page, but the referrer and user agent strings make the payloads different.
Similarly, if the response is dynamically generated, it may have a date header:
Date: Fri, 12 Sep 2008 10:44:27 GMT
So that will render the response payloads different. However, subsequent payloads may be identical, if the content is static.
Keep in mind that the actual packets will be different because of differing sequence numbers, which are supposed to be incrementing and pseudorandom.

Chris is right. More specifically, two or three pieces of information in the packet header should be different:
the sequence number (which is
intended to be unpredictable) which
is increases with the number of
bytes transmitted and received.
the timestamp, a field containing two
timestamps (although this field is optional).
the checksum, since both the payload and header are checksummed, including the changing sequence number.

EDIT: Sorry, my original idea was ridiculous.
You got me interested so I googled a little bit and found this. If you wanted to write your own tool you would probably have to inspect each payload, the easiest way would probably be some sort of hash/checksum to check for identical payloads. Just make sure you are checking the payload, not the whole packet.
As for the statistics I will have to defer to someone with greater knowledge on the workings of TCP.

Sending the same PAYLOAD is probably fairly common (particularly if you're running some sort of network service). If you mean sending out the same tcp segment (header and all) or the whole network packet (ip and up), then the probability is substantially reduced.

Related

WinSock: Check if transfered file is complete

I'm planning to make a file transport using sockets(TCP) on Windows with C++. Hence it would be quite convenient to see if the transferred file has been received completely and correctly. What would be the best (and maybe also the easiest) way to do this?

after all of the Bytes are sent everything has been received correctly - TCP will ensure that : https://en.wikipedia.org/wiki/Transmission_Control_Protocol
Do not try to re-calculate checksums of individual packets or something, you will introduce errors. TCP is somewhat reliable, every transfer is automatically segmented, padded with checksums, reassembled and checked for a matching checksum - thats a pretty reliable transport protocol right there, it will work out-of-the-box
If you're really paranoid or simply need to create digital proof of transmission you need to choose another protocol entirely - something like SCTP, perhaps

Finding out the number of dropped packets in raw sockets

I am developing a program that sniffs network packets using a raw socket (AF_PACKET, SOCK_RAW) and processes them in some way.
I am not sure whether my program runs fast enough and succeeds to capture all packets on the socket. I am worried that the recieve buffer for this socket occainally gets full (due to traffic bursts) and some packets are dropped.
How do I know if packets were dropped due to lack of space in the
socket's receive buffer?
I have tried running ss -f link -nlp.
This outputs the number of bytes that are currently stored in the revice buffer for that socket, but I can not tell if any packets were dropped.
I am using Ubuntu 14.04.2 LTS (GNU/Linux 3.13.0-52-generic x86_64).
Thanks.

I was having a similar problem as you. I knew that tcpdump was able to to generate statistics about packet drops, so I tried to figure out how it did that. By looking at the code of tcpdump, I noticed that it is not generating those statistic by itself, but that it is using the libpcap library to get those statistics. The libpcap is on the other hand getting those statistics by accessing the if_packet.h header and calling the PACKET_STATISTICS socket option (at least I think so, but I'm no C expert).
Therefore, I saw only two solutions to the problem:
I had to interact somehow with the linux header files from my Pyhton script to get the packet statistics, which seemed a bit complicated.
Use the Python version of libpcap which is pypcap to get those information.
Since I had no clue how to do the first thing, I implemented the second option. Here is an example how to get packet statistics using pypcap and how to get the packet data using dpkg:
import pcap
import dpkt
import socket
pc=pcap.pcap(name="eth0", timeout_ms=10000, immediate=True)
def packet_handler(ts,pkt):
#printing packet statistic (packets received, packets dropped, packets dropped by interface
print pc.stats()
#example packet parsing using dpkt
eth=dpkt.ethernet.Ethernet(pkt)
if eth.type != dpkt.ethernet.ETH_TYPE_IP:
return
ip =eth.data
layer4=ip.data
ipsrc=socket.inet_ntoa(ip.src)
ipdst=socket.inet_ntoa(ip.dst)
pc.loop(0,packet_handler)
tpacket_stats structure is defined in linux/packet.h header file
Create variable using the tpacket_stats structre and pass it to getSockOpt with PACKET_STATISTICS SOL_SOCKET options will give packets received and dropped count.
-- some times drop can be due to buffer size
-- so if you want to decrease the drop count check increasing the buffersize using setsockopt function

First off, switch your operating system.
You need a reliable, network oriented operating system. Not some pink fluffy "ease of use" with "security" functionality enabled. NetBSD or Gentoo/ArchLinux (the bare installations, not the GUI kitted ones).
Start a simultaneous tcpdump on a network tap and capture the traffic you're supposed to receive along side of your program and compare the results.
There's no efficient way to check if you've received all the packets you intended to on the receiving end since the packets might be dropped on a lower level than you anticipate.
Also this is a question for Unix # StackOverflow, there's no programming here what I can see, at least there's no code.
The only certain way to verify packet drops is to have a much more beefy sender (perhaps a farm of machines that send packets) to a single client, record every packet sent to your reciever. Have the statistical data analyzed and compared against your senders and see how much you dropped.
The cheaper way is to buy a network tap or even more ad-hoc enable port mirroring in your switch if possible. This enables you to dump as much traffic as possible into a second machine.
This will give you a more accurate result because your application machine will be busy as it is taking care of incoming traffic and processing it.
Further more, this is why network taps are effective because they split the communication up into two channels, the receiving and sending directions of your traffic if you will. This enables you to capture traffic on two separate machines (also using tcpdump, but instead of a mirrored port, you get a more accurate traffic mirroring).
So either use port mirroring
Or you buy one of these:

Heartbleed: Payloads and padding

I am left with a few questions after reading the RFC 6520 for Heartbeat:
https://www.rfc-editor.org/rfc/rfc6520
Specifically, I don't understand why a heartbeat needs to include arbitrary payloads or even padding for that matter. From what I can understand, the purpose of the heartbeat is to verify that the other party is still paying attention at the other end of the line.
What does these variable length custom payloads provide that a fixed request and response do not?
E.g.
Alice: still alive?
Bob: still alive!
After all, FTP uses the NOOP command to keep connections alive, which seem to work fine.

There is, in fact, a reason for this payload/padding within RFC 6520
From the document:
The user can use the new HeartbeatRequest message,
which has to be answered by the peer with a HeartbeartResponse
immediately. To perform PMTU discovery, HeartbeatRequest messages
containing padding can be used as probe packets, as described in
[RFC4821].
>In particular, after a number of retransmissions without
receiving a corresponding HeartbeatResponse message having the
expected payload, the DTLS connection SHOULD be terminated.
>When a HeartbeatRequest message is received and sending a
HeartbeatResponse is not prohibited as described elsewhere in this
document, the receiver MUST send a corresponding HeartbeatResponse
message carrying an exact copy of the payload of the received
HeartbeatRequest.
If a received HeartbeatResponse message does not contain the expected
payload, the message MUST be discarded silently. If it does contain
the expected payload, the retransmission timer MUST be stopped.
Credit to pwg at HackerNews. There is a good and relevant discussion there as well.

(The following is not a direct answer, but is here to highlight related comments on another question about Heartbleed.)
There are arguments against the protocol design that allowed an arbitrary limit - either that there should have been no payload (or even echo/heartbeat feature) or that a small finite/fixed payload would have been a better design.
From the comments on the accepted answer in Is the heartbleed bug a manifestation of the classic buffer overflow exploit in C?
(R..) In regards to the last question, I would say any large echo request is malicious. It's consuming server resources (bandwidth, which costs money) to do something completely useless. There's really no valid reason for the heartbeat operation to support any length but zero
(Eric Lippert) Had the designers of the API believed that then they would not have allowed a buffer to be passed at all, so clearly they did not believe that. There must be some by-design reason to support the echo feature; why it was not a fixed-size 4 byte buffer, which seems adequate to me, I do not know.
(R..) .. Nobody thinking from a security standpoint would think that supporting arbitrary echo requests is reasonable. Even if it weren't for the heartbleed overflow issue, there may be cryptographic weaknesses related to having such control over the content the peer sends; this seems unlikely, but in the absence of a strong reason to support a[n echo] feature, a cryptographic system should not support it. It should be as simple as possible.

While I don't know the exact motivation behind this decision, it may have been motivated by the ICMP echo request packets used by the ping utility. In an ICMP echo request, an arbitrary payload of data can be attached to the packet, and the destination server will return exactly that payload if it is reachable and responding to ping requests. This can be used to verify that data is being properly sent across the network and that payloads aren't being corrupted in transit.

Is a UDP packet guaranteed to be complete, practical sense, if delivered?

Is it a well known fact that UDP (User Datagram Protocol) is not secure, because the order of the packets sent with it may not be delivered in order, even at all. However if an UDP packet is delivered. Are the information in that packet in practical sense (99.99% and above), guaranteed to be correct?
Is a UDP packet quaranteed to be complete (not corrupted) if delivered, in practical sense (99.99% and above)?
Thanks in advance!

No for two reasons:
UDP checksums are not mandatory (with IPv4). So corrupted packets can be delivered to applications.
Internet checksums can clash much more frequently than other hashes. So even if the checksum matches, the data may be corrupted.

I am no expert but as far as I know, although there isn't any guarantee that the package reaches the destination at all in the most cases it should be correct if it reaches the destination. I think that should be the case because normally there is an error check (Frame Check Sum) on the Data Link Layer.

Order Of Things

I have a sniffer in C++ where I'm getting the Source IP, Destination IP, Control Bit, and Sequence number. I am also getting the IP header and then the TCP info. I want to get the content type of the packets. Do I need to reassemble the packets to do that? Or can I use http request and respond to get the content type of the packets. Any help is appreciated, thank you!

There is no "content type". TCP will only provide an octet stream for the layer above TCP to interpret. If you are sniffing HTTP over TCP, you will have to assemble the packets, and parse the HTTP yourself.
Have you considered using Wireshark?
Update
By assembling the TCP packets into the octet stream, you basically append the payload of the TCP packets into one big byte array. Make sure you pay attention to the sequence number of the TCP packets, because the packets may arrive out of order.
Parsing the HTTP content is much trickier. The first headers are always in ASCII. They specify the content type and content length. It's the content type part that is tricky. Stuff may be encoded in a variety of encoding techniques, and they may be enveloped with yet another encoding technique (zip stream, SSL, etc).
TCP RFC: http://www.faqs.org/rfcs/rfc793.html
HTTP 1.1 RFC: http://www.faqs.org/rfcs/rfc2616.html
It might be a good idea to see how both Wireshark and WinPcap does it. I'm not sure if WinPcap contains filters and decoders for HTTP (basically bringing you the content of the HTTP) or not. At any rate, it might be worth checking out the code.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string