Can somebody shed some light on what this strange DHT response means?

Sometimes I receive these strange responses from other nodes. The transaction ID matches my request's transaction ID, as does the remote IP, so I tend to believe that the node responded with this, but it looks like some sort of mix of a response and a request:
d1:q9:find_node1:rd2:id20:.éV0özý.?tj­N.?.!2:ip4:DÄ.^7:nodes.v26:.ï?M.:iSµLW.Ðä¸úzDÄ.^æCe1:t2:..1:y1:re
Worst of all, it is malformed. Look at 7:nodes.v: it means the key nodes.v gets added to the dictionary, when it is supposed to be 5:nodes. So, I'm lost. What is it?

The internet is unreliable, and remote nodes can be buggy. You have to code defensively. Do not assume that everything you receive will be valid.
Remote peers might:
send invalid bencoding; discard those, don't even try to recover.
send truncated messages; usually not recoverable unless it happens to be the very last e of the root dictionary.
omit mandatory keys; you can either ignore those messages or return an error message.
contain corrupted data.
include unknown keys beyond the mandatory ones; this is not an error, just treat them as if they weren't there, for the sake of forward compatibility.
actually be attackers trying to fuzz your implementation or use you as a DoS amplifier.
I also suspect that some really shoddy implementations are built on whatever string type their programming language provides and handle encodings incorrectly, instead of using arrays of uint8 as bencoding demands. There's nothing that can be done about those; ignore them, or occasionally send an error message.
Note that the specified dictionary keys are usually ASCII-mappable, but this is not a requirement; e.g. there are some tracker response types that actually use random binary data as dictionary keys.
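To make this concrete, here is a minimal sketch in C++ (my illustration, not code from any real client) of the kind of strict well-formedness check this implies: reject anything that isn't valid bencoding, never try to repair it, and cap the nesting depth so hostile input can't blow the stack. A real KRPC validator would additionally check mandatory keys, key ordering, and value types.

#include <cstddef>
#include <string>

// Validate a bencoded string: <digits>:<bytes>. std::string is used purely
// as a byte container, matching bencoding's arrays-of-uint8 semantics.
static bool checkString(const std::string& s, size_t& i) {
    size_t start = i, len = 0;
    while (i < s.size() && s[i] >= '0' && s[i] <= '9')
        len = len * 10 + size_t(s[i++] - '0');
    if (i == start || i - start > 7) return false;      // no digits, or absurd length
    if (i >= s.size() || s[i++] != ':') return false;
    if (len > s.size() - i) return false;               // truncated string: reject
    i += len;
    return true;
}

static bool checkValue(const std::string& s, size_t& i, int depth) {
    if (depth > 32 || i >= s.size()) return false;      // depth cap against hostile input
    char c = s[i];
    if (c == 'i') {                                     // integer: i<digits>e (simplified)
        ++i;
        if (i < s.size() && s[i] == '-') ++i;
        size_t digits = 0;
        while (i < s.size() && s[i] >= '0' && s[i] <= '9') { ++i; ++digits; }
        return digits > 0 && i < s.size() && s[i++] == 'e';
    }
    if (c == 'l' || c == 'd') {                         // list or dictionary
        const bool dict = (c == 'd');
        ++i;
        while (i < s.size() && s[i] != 'e') {
            if (dict && !checkString(s, i)) return false;   // dict keys must be strings
            if (!checkValue(s, i, depth + 1)) return false;
        }
        if (i >= s.size()) return false;                // truncated container: reject
        ++i;                                            // consume the closing 'e'
        return true;
    }
    return checkString(s, i);                           // anything else must be a string
}

// A KRPC message must be a single, complete dictionary with nothing after it.
bool looksLikeValidKrpc(const std::string& msg) {
    size_t i = 0;
    return !msg.empty() && msg[0] == 'd' && checkValue(msg, i, 0) && i == msg.size();
}

Anything failing this check goes straight in the bin; messages that do parse can then be checked for the mandatory keys of their message type.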
Here are a few examples of junk I'm seeing[1] that even fails bdecoding:
d1:ad2:id20:�w)��-��t����=?�������i�&�i!94h�#7U���P�)�x��f��YMlE���p:q9Q�etjy��r7�:t�5�����N��H�|1�S�
d1:e�����������������H#
d1:ad2:id20:�����:��m�e��2~�����9>inm�_hash20:X�j�D��nY��-������X�6:noseedi1ee1:q9:get_peers1:t2:�=1:v4:LT��1:y1:qe
d1:ad2:id20:�����:��m�e��2~�����9=inl�_hash20:X�j�D��nY���������X�6:noseedi1ee1:q9:get_peers1:t2:�=1:v4:LT��1:y1:qe
d1:ad2:id20:�����:��m�e��2~�����9?ino�_hash20:X�j�D��nY���������X�6:noseedi1ee1:q9:get_peers1:t2:�=1:v4:LT��1:y1:qe
[1] Character count preserved; all non-printable, ASCII-incompatible bytes replaced with the Unicode replacement character.

Related

Unused byte sequences in protobuf messages (for delimiter implementation)

I need to transfer data over a serial port. In order to ensure integrity of the data, I want a small envelope protocol around each protobuf message. I thought about the following:
message type (1 byte)
message size (2 bytes)
protobuf message (N bytes)
(checksum; optional)
The message type will mostly be a mapping between messages defined in proto files. However, if a message gets corrupted or some bytes are lost, the message size will not be correct and all subsequent bytes cannot be interpreted any more. One way to solve this would be the introduction of delimiters between messages, but for that I need to choose something that is not used by protobuf. Is there a byte sequence that is never used by any protobuf message?
I also thought about a different way: if the master finds out that messages are corrupted, it should reset the communication to a clean start. For that, I want the master to send a RESTART command to the slave. The slave should answer with an ACK and then start sending complete messages again. All bytes received between RESTART and ACK are to be discarded by the master. I want to encode ACK and RESTART as special messages, but with that approach I face the same problem: I need to find byte sequences for ACK and RESTART that are not used by any protobuf messages.
Maybe I am also taking the wrong approach - feel free to suggest other approaches to deal with lost bytes.
Is there a byte sequence that is never used by any protobuf message?
No; it is a binary serializer and can contain arbitrary binary payloads (especially in the bytes type), so you cannot use sentinel values. A length prefix is fine (your "message size" header), and a checksum may be a pragmatic option. Alternatively, you could impose an artificial sentinel to follow each message (maybe a GUID chosen per-connection as part of the initial handshake), and use that to double-check that everything looks correct.
One way to help recover packet synchronization after a rare problem is to put a synchronization word at the beginning of each message, and use the checksum to check for valid messages.
This means that you put a constant value, e.g. 0x12345678, before your message type field. Then, if a message fails the checksum check, you can recover by finding the next 0x12345678 in your data.
Even though that value could sometimes occur in the middle of a message, it doesn't matter much: the checksum check will very probably catch that there isn't a real message at that position, and you can search forward until you find the next marker.
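A rough sketch of that scheme in C++ (my illustration: the 0x12345678 sync word and the 16-bit length field are arbitrary choices, and the additive checksum is a toy stand-in for a real CRC):

#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

constexpr uint32_t SYNC = 0x12345678;

// Toy checksum for illustration; use a CRC-16/32 in practice.
static uint16_t checksum(const uint8_t* p, size_t n) {
    uint16_t sum = 0;
    while (n--) sum = static_cast<uint16_t>(sum + *p++);
    return sum;
}

// Frame layout: sync(4) | type(1) | size(2, big-endian) | payload | checksum(2).
// The payload size must fit in 16 bits.
std::vector<uint8_t> encodeFrame(uint8_t type, const std::vector<uint8_t>& payload) {
    std::vector<uint8_t> f;
    f.push_back(uint8_t(SYNC >> 24)); f.push_back(uint8_t(SYNC >> 16));
    f.push_back(uint8_t(SYNC >> 8));  f.push_back(uint8_t(SYNC & 0xFF));
    f.push_back(type);
    f.push_back(static_cast<uint8_t>(payload.size() >> 8));
    f.push_back(static_cast<uint8_t>(payload.size() & 0xFF));
    f.insert(f.end(), payload.begin(), payload.end());
    const uint16_t c = checksum(f.data() + 4, f.size() - 4);  // covers type+size+payload
    f.push_back(uint8_t(c >> 8)); f.push_back(uint8_t(c & 0xFF));
    return f;
}

// Extract one frame from buf if possible. On a checksum failure, skip the bad
// sync word and search for the next one -- the recovery step described above.
std::optional<std::vector<uint8_t>> decodeFrame(std::vector<uint8_t>& buf, uint8_t& type) {
    for (;;) {
        size_t i = 0;                                   // find the next sync word
        for (; i + 4 <= buf.size(); ++i) {
            const uint32_t w = (uint32_t(buf[i]) << 24) | (uint32_t(buf[i+1]) << 16) |
                               (uint32_t(buf[i+2]) << 8) | buf[i+3];
            if (w == SYNC) break;
        }
        buf.erase(buf.begin(), buf.begin() + i);        // drop garbage before the sync
        if (buf.size() < 9) return std::nullopt;        // header + checksum not here yet
        const size_t len = (size_t(buf[5]) << 8) | buf[6];
        if (buf.size() < 9 + len) return std::nullopt;  // payload incomplete, wait for more
        const uint16_t got = (uint16_t(buf[7 + len]) << 8) | buf[8 + len];
        if (got == checksum(buf.data() + 4, 3 + len)) {
            type = buf[4];
            std::vector<uint8_t> payload(buf.begin() + 7, buf.begin() + 7 + len);
            buf.erase(buf.begin(), buf.begin() + 9 + len);
            return payload;
        }
        buf.erase(buf.begin(), buf.begin() + 4);        // bad frame: skip this sync word
    }
}

The key property is that a corrupted frame costs you only the bytes up to the next genuine sync word, not the rest of the stream.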

What is BitTorrent peer (Deluge) saying?

I'm writing a small app to test out how torrent p2p works and I created a sample torrent and am seeding it from my Deluge client. From my app I'm trying to connect to Deluge and download the file.
The torrent in question is a single-file torrent (file is called A - without any extension), and its data is the ASCII string Test.
Referring to this I was able to submit the initial handshake and also get a valid response back.
Immediately afterwards, Deluge sends even more data. From the 5th byte onward it would seem to be a bitfield message, but I'm not sure what to make of it. I read that torrent clients may send a mixture of bitfield and have messages to show which parts of the torrent they possess. (My client isn't sending any bitfield, since it assumes it doesn't have any part of the file in question.)
If my understanding is correct, it's stating that the message size is 2: one byte for the identifier plus the payload. If that's the case, why is it sending so much more data, and what is that extra data supposed to be?
Same thing happens after my app sends an interested command. Deluge responds with a 1-byte message of unchoke (but then again appends more data).
And finally, when it actually submits the piece, I'm not sure what to make of the data. The first underlined byte is 84, which corresponds to the letter T, as expected, but I cannot make much sense of the rest of the data.
Note that the link in question does not really specify how the clients should exchange messages once the initial handshake is completed. I just assumed I should send interested and request based on what seemed to make sense to me, but I might be completely off.
I don't think Deluge is sending the additional bytes you're seeing.
If you look at them, you'll notice that all of the extra bytes are bytes that already existed in the handshake message, which should have been the longest message you received so far.
I think you're reading new messages into the same buffer, without zeroing it out or anything, so you're seeing bytes from earlier messages again, following the bytes of the latest message you read.
Consider checking if the network API you're using has a way to check the number of bytes actually received, and only look at that slice of the buffer, not the entire thing.
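For illustration, here is a sketch of that fix using POSIX recv() (the dispatch at the end is a placeholder, and this assumes the fixed-size handshake has already been read and consumed separately). Only the first n bytes of the scratch buffer are valid after each call, so exactly that slice is appended to a growing stream buffer, and complete length-prefixed messages are peeled off its front:

#include <cstdint>
#include <sys/socket.h>   // POSIX recv(); adapt for your platform's socket API
#include <vector>

void readLoop(int sock) {
    std::vector<uint8_t> stream;                        // bytes received so far
    uint8_t chunk[4096];
    for (;;) {
        const ssize_t n = recv(sock, chunk, sizeof chunk, 0);
        if (n <= 0) break;                              // connection closed, or error
        stream.insert(stream.end(), chunk, chunk + n);  // only the first n bytes are new

        // Peer-wire messages after the handshake: a 4-byte big-endian length,
        // then <length> bytes, the first of which is the message ID.
        while (stream.size() >= 4) {
            const uint32_t len = (uint32_t(stream[0]) << 24) | (uint32_t(stream[1]) << 16) |
                                 (uint32_t(stream[2]) << 8) | stream[3];
            if (stream.size() < 4 + size_t(len)) break; // message not complete yet
            if (len > 0) {
                const uint8_t id = stream[4];
                (void)id;  // dispatch on id here (choke/unchoke/bitfield/piece/...)
            }                                           // len == 0 is a keep-alive
            stream.erase(stream.begin(), stream.begin() + 4 + len);
        }
    }
}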

Heartbleed: Payloads and padding

I am left with a few questions after reading the RFC 6520 for Heartbeat:
https://www.rfc-editor.org/rfc/rfc6520
Specifically, I don't understand why a heartbeat needs to include arbitrary payloads or even padding for that matter. From what I can understand, the purpose of the heartbeat is to verify that the other party is still paying attention at the other end of the line.
What do these variable-length custom payloads provide that a fixed request and response do not?
E.g.
Alice: still alive?
Bob: still alive!
After all, FTP uses the NOOP command to keep connections alive, which seems to work fine.
There is, in fact, a reason for this payload/padding within RFC 6520.
From the document:
The user can use the new HeartbeatRequest message, which has to be answered by the peer with a HeartbeartResponse immediately. To perform PMTU discovery, HeartbeatRequest messages containing padding can be used as probe packets, as described in [RFC4821].

In particular, after a number of retransmissions without receiving a corresponding HeartbeatResponse message having the expected payload, the DTLS connection SHOULD be terminated.

When a HeartbeatRequest message is received and sending a HeartbeatResponse is not prohibited as described elsewhere in this document, the receiver MUST send a corresponding HeartbeatResponse message carrying an exact copy of the payload of the received HeartbeatRequest.

If a received HeartbeatResponse message does not contain the expected payload, the message MUST be discarded silently. If it does contain the expected payload, the retransmission timer MUST be stopped.
Credit to pwg at HackerNews. There is a good and relevant discussion there as well.
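For concreteness, here is a sketch (mine, following the message structure defined in RFC 6520) of parsing a HeartbeatMessage defensively: a 1-byte type, a 16-bit payload_length, the payload to echo, and at least 16 bytes of padding. The bounds check below, that the claimed payload must actually fit inside the bytes received, is precisely the check whose absence in OpenSSL produced Heartbleed:

#include <cstdint>
#include <optional>
#include <vector>

struct Heartbeat {
    uint8_t type;                  // 1 = heartbeat_request, 2 = heartbeat_response
    std::vector<uint8_t> payload;  // echoed back verbatim in the response
};

std::optional<Heartbeat> parseHeartbeat(const std::vector<uint8_t>& msg) {
    if (msg.size() < 3 + 16) return std::nullopt;       // type + length + minimum padding
    const uint16_t payload_length = (uint16_t(msg[1]) << 8) | msg[2];
    // The critical check: payload_length must fit in the bytes actually
    // received, leaving at least 16 bytes of padding (RFC 6520: discard
    // silently if payload_length is too large).
    if (size_t(payload_length) + 3 + 16 > msg.size()) return std::nullopt;
    Heartbeat hb;
    hb.type = msg[0];
    hb.payload.assign(msg.begin() + 3, msg.begin() + 3 + payload_length);
    return hb;
}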
(The following is not a direct answer, but is here to highlight related comments on another question about Heartbleed.)
There are arguments against the protocol design that allowed an arbitrary payload length - either that there should have been no payload at all (or even no echo/heartbeat feature), or that a small fixed-size payload would have been a better design.
From the comments on the accepted answer in Is the heartbleed bug a manifestation of the classic buffer overflow exploit in C?
(R..) In regards to the last question, I would say any large echo request is malicious. It's consuming server resources (bandwidth, which costs money) to do something completely useless. There's really no valid reason for the heartbeat operation to support any length but zero
(Eric Lippert) Had the designers of the API believed that then they would not have allowed a buffer to be passed at all, so clearly they did not believe that. There must be some by-design reason to support the echo feature; why it was not a fixed-size 4 byte buffer, which seems adequate to me, I do not know.
(R..) .. Nobody thinking from a security standpoint would think that supporting arbitrary echo requests is reasonable. Even if it weren't for the heartbleed overflow issue, there may be cryptographic weaknesses related to having such control over the content the peer sends; this seems unlikely, but in the absence of a strong reason to support a[n echo] feature, a cryptographic system should not support it. It should be as simple as possible.
While I don't know the exact motivation behind this decision, it may have been motivated by the ICMP echo request packets used by the ping utility. In an ICMP echo request, an arbitrary payload of data can be attached to the packet, and the destination server will return exactly that payload if it is reachable and responding to ping requests. This can be used to verify that data is being properly sent across the network and that payloads aren't being corrupted in transit.

How long should a message header/prefix be?

I've worked with a few protocols, and written my own. I have written some message formats with only 1 char to identify the message, and some with 4 chars. I don't feel that I'm experienced enough to tell which is better, so I'm looking for an answer which describes in which scenario one might be better than the other.
For performance, you would imagine that sending 2 bytes (A%1i) is faster than sending 5 bytes (ABCD%1i). However, I have noticed that when writing the protocol with the 1-byte prefix, if you have a bug which causes your code to not read enough data from the socket, you might get garbage data coming into your system.
So is the purpose of a 4-byte prefix just to provide a guarantee that your message is clean? Is it worth it for the performance you sacrifice? Do you really sacrifice any performance at all? Maybe it's better to have a 2- or 3-byte prefix?
I'm not sure if this question should be specific to TCP, or whether it applies to all transport protocols. Advice on this would be interesting.
Update: For interest, I will mention that Synergy uses 4-byte message prefixes, so for a mouse move delta the header is the same size as the actual data. Some have suggested just having a 1 or 2 byte prefix to improve efficiency. I wonder what drawbacks this would have?
Update: Also, I wonder if only the handshake really matters, if you're worried about garbage data. Synergy has a long handshake (a few bytes), so are the 4-byte message prefixes needed? I made a protocol recently that has only a 1-byte handshake, and that turned out to be a bad idea, since incompatible protocols were spamming the system with bad data (off the back of this, I might recommend at least having a long handshake).
The purpose of the header is to make it easier to solve the frame synchronization problem (byte alignment in serial communication).
To synchronize, the receiver looks for anything in the data stream that "looks like" a start-of-message header.
If you have lots of different kinds of valid start-of-message headers, and all of them are 1 byte long, then you will inevitably get a lot of "false frame synchronizations" -- garbage from something that "looks like" a start-of-message header, but isn't.
It would be better to pick some other header that makes it "unlikely" that anything in the serial data stream "looks like" a valid start-of-message header.
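As a back-of-the-envelope illustration (mine, not from the original answer): in uniformly random garbage, a given 1-byte marker false-matches on average once every 256 bytes, while a 4-byte marker false-matches once every 2^32 bytes. A quick experiment bears this out:

#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <random>
#include <vector>

int main() {
    std::mt19937 rng(42);
    std::vector<uint8_t> garbage(1000000);              // 1 MB of random "line noise"
    for (auto& b : garbage) b = static_cast<uint8_t>(rng());

    const uint8_t mark1 = 0x7E;                         // illustrative 1-byte header
    const uint8_t mark4[4] = {0x12, 0x34, 0x56, 0x78};  // illustrative 4-byte header
    size_t hits1 = 0, hits4 = 0;
    for (size_t i = 0; i < garbage.size(); ++i) {
        if (garbage[i] == mark1) ++hits1;
        if (i + 4 <= garbage.size() &&
            std::equal(mark4, mark4 + 4, garbage.begin() + i)) ++hits4;
    }
    // Expect roughly 3900 one-byte false syncs, and almost certainly zero
    // four-byte false syncs.
    std::printf("1-byte false syncs: %zu, 4-byte false syncs: %zu\n", hits1, hits4);
}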
It is inevitable that you will get garbage data coming into your system, no matter how you design the packet header.
Whatever you use to handle these other problems (such as occasional bit errors in the middle of the message) should also be adequate to handle the occasional "false frame synchronization" garbage.
In some systems, any bad data is quickly overwritten by fresh new good data, and if you blink you might never see the bad data.
Other systems need at least some sort of error detection in the footer to reject the bad data.
Yet other systems need to not only detect such errors, but somehow keep re-sending that message -- until both sides are convinced that an error-free version of that message has been successfully received.
As Oleksi implied, in some systems the latency is not significantly different between sending a single binary bit (100 ms) and sending 10 bytes (102.4 ms).
So the advantages of using a tiny header (2.4% less latency!) may not be worth it compared to the advantages of using a more verbose header (easier debugging; easier to make backward-compatible and forward-compatible; easier to test the effect of minor changes "in isolation" without upgrading both sides in lockstep to the new protocol which is completely incompatible with the old protocol).
Perhaps you could get the best of both worlds by (a) keeping the verbose, easy-to-debug headers on messages that are so rarely used that the effect of tiny headers is too small to measure (which I suspect is nearly all messages), and (b) introducing a "tiny header" format for any kind of message where the effect of tiny headers is noticeably better, or at least measurable.
It looks like the Synergy protocol is flexible enough to add such a "tiny header" format in a way that is easily distinguishable from the other kinds of message headers.
I use Synergy between my laptop and a few desktop machines. I am glad someone is trying to make it even better.
The performance will depend on the content of the message you are sending. If your content is several kilobytes, it doesn't really matter how many bytes your header is. For now, I would choose the scheme that's easiest to work with, because the performance difference between sending one byte or four bytes is going to be negligible compared to the actual data that you're sending.

Understanding protocols

Guys, I need some insight here.
I know the definition of a protocol, but being new to C++ programming, this is quite a challenging task. I am creating a multi-threaded chat using SDL/C++ as a learning experience, and now I have hit a hump which I need to overcome, but understanding it is a little more difficult than I had thought. I need to make a chat protocol of some sort, I think... but I'm stumped. Up until this point I have been sending messages as strings of characters. Now that I'm improving the application to the point where clients can register and log in, I need a better way of communicating with my clients and server.
thank you.
Create objects that represent a message, then serialize the object, send it over the network, then deserialize at the other end.
For example, you could create a class called LoginMessage that contains two fields. One for a user name, and one for a password. To login, you would do something like:
LoginMessage msg;                             // stack allocation; no new/delete to leak
msg.username = "Fred";
msg.password = "you'll never guess";
std::string serialized_msg = serialize(msg);  // serialize() is whatever your library provides
// send the bytes over the network
You would do something similar at the other end to convert the byte stream back into an object.
There are APIs for creating message objects and serializing them for you. Here are two popular ones. Both should suit your needs.
Protocol Buffers by Google
Thrift By Facebook
If you want the serialized messages to be readable, you can use YAML. Google has an API called yaml-cpp for serializing data to YAML format.
UPDATE:
Those APIs are for making your own protocol; they just handle the conversion of messages from object form to byte-stream form. They do have features for the actual transport of messages over the network, but you don't need to use them. How you design your protocol is up to you, and if you want to create messages by hand, you can do that too.
I'll give you some ideas for creating your own message format.
This is one way to do it.
Have the first 4 bytes of the message represent the length of the message as an unsigned integer. This is necessary to figure out where one message ends and where the next one starts. You will need to convert between host and network byte order when reading and writing to/from these four bytes.
Have the 5th byte represent the message type. For example, you could use a 1 to indicate a login request, a 2 to indicate a login response, and a 3 to indicate a chat message. This byte is necessary for interpreting the meaning of the remaining bytes.
The remaining bytes would contain the message contents. For example, if it was a login message, you would encode the username and password into these bytes somehow. If it is a chat message, these bytes would contain the chat text.
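A sketch of what framing such a message could look like (the message-type values are the illustrative ones above; whether the length counts the type byte as well as the contents is your choice, as long as both ends agree):

#include <arpa/inet.h>   // htonl(); POSIX
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

enum MsgType : uint8_t { LOGIN_REQUEST = 1, LOGIN_RESPONSE = 2, CHAT = 3 };

// Layout: length(4, network byte order) | type(1) | contents(N).
// Here the length counts the type byte plus the contents.
std::vector<uint8_t> frameMessage(MsgType type, const std::string& contents) {
    const uint32_t len = htonl(static_cast<uint32_t>(1 + contents.size()));
    std::vector<uint8_t> out(4);
    std::memcpy(out.data(), &len, 4);                   // 4-byte length header
    out.push_back(type);                                // 1-byte message type
    out.insert(out.end(), contents.begin(), contents.end());
    return out;
}

// Example: frameMessage(CHAT, "hello") produces a 10-byte wire message.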
