Can someone explain the gibberish characters at the end of every .torrent file?
The picture shows the understandable information along with only a part of the gibberish section. It just seems like the comprehensible part ends so abruptly at the pink pipeline I painted.
By the way, I am viewing it in VIM with UTF-8 encoding, which torrent files should be encoded with if I am not mistaken.
The data you are referring to is the value for the dictionary entry with a key of pieces. The 6:pieces129140: before your marked position indicates that the entry's key has a length of 6 characters, which allows us to determine that the key is pieces. The 129140 which follows the key is the length of the entry's value, in bytes. This data structure is a result of bencoding.
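As a rough illustration of how that length prefix is read, here is a minimal Python sketch. The helper name read_bencoded_string and the 40-byte sample are made up for the example; a real file would carry the 129140-byte value from the question, which works out to 129140 / 20 = 6457 hashes.

```python
def read_bencoded_string(data: bytes, pos: int = 0):
    """Read one bencoded byte string (<length>:<bytes>) starting at pos.

    Returns the string's bytes and the position just past it.
    """
    colon = data.index(b":", pos)
    length = int(data[pos:colon])        # decimal length before the colon
    start = colon + 1
    end = start + length
    if end > len(data):
        raise ValueError("string runs past the end of the data")
    return data[start:end], end

# Stand-in for the tail of an info dictionary: the key "pieces" followed by
# a value holding two fake 20-byte hashes (40 raw bytes).
sample = b"6:pieces40:" + bytes(range(40))

key, pos = read_bencoded_string(sample)
value, pos = read_bencoded_string(sample, pos)
print(key, len(value))
```

The value is read purely by its declared length, which is why it can hold arbitrary bytes that are not valid UTF-8.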
The pieces dictionary entry in the .torrent file contains the SHA1 hashes for all pieces concatenated into one long string. Hashes are important as they allow the user to ensure the pieces they have downloaded are valid. Using hashes for individual pieces is better than only having the hash for the whole file, as it reduces the wasted data; you don't have to download the whole file before your client realises that the data is invalid.
SHA1 hashes consist of 20 bytes, which are stored as raw bytes in the .torrent file. This is why the data appears malformed in your editor.
pieces maps to a string whose length is a multiple of 20. It is to be subdivided into strings of length 20, each of which is the SHA1 hash of the piece at the corresponding index.
Taken from this BitTorrent protocol specification document.
Related
I am trying to build a .torrent file interpreter. The problem is that I can't seem to understand how to go about interpreting the pieces value. I am aware that the pieces key contains a concatenation of the SHA-1 hashes for each piece and that SHA-1 contains 20 bytes. A result of this is that the final output should be a multiple of 20 bytes. However, after counting the bytes from the pieces value as a string or in hexadecimal form it still does not satisfy this. How should I interpret the pieces key?
You can use bencode and bdecode to get the pieces value easily. I think you should first read the BEP for more details. You can also look at this and use it as an example.
From looking at a real torrent file, I found that the SHA-1 hashes had to be taken from its hexadecimal string format, but I previously thought that it was wrong because the byte length of the hash was not a multiple of 20. It turns out I forgot to add a leading 0 to hexadecimal values that were only 1 character long (e.g. a had to be changed to 0a).
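A small Python sketch of that pitfall, using a made-up 20-byte value rather than a real hash: formatting each byte without zero padding loses characters, while two digits per byte always gives 40.

```python
# Made-up 20-byte value standing in for one SHA-1 hash.
digest = bytes([0x0A, 0xFF, 0x00, 0x1B]) + bytes(16)

# Naive per-byte formatting drops leading zeros, so 0x0a becomes "a".
naive = "".join(format(b, "x") for b in digest)

# Two hex digits per byte keeps the length at exactly 2 * 20 = 40 characters.
padded = "".join(format(b, "02x") for b in digest)

print(len(naive), len(padded))
print(padded == digest.hex())  # bytes.hex() pads the same way
```

Counting raw bytes instead of hex characters avoids the problem entirely.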
According to wikipedia:
When the number of bytes to encode is not divisible by three (that is,
if there are only one or two bytes of input for the last 24-bit
block), then the following action is performed:
Add extra bytes with value zero so there are three bytes, and perform
the conversion to base64.
However, if we get an extra \0 character at the end, the last 6 bits of the input have a value of 0, and the number 0 must be base64-encoded as A. The character = doesn't even belong to the base64 encoding table.
I know that those extra null characters don't belong to the original binary string, so we use a different character (=) to avoid confusion. But the Wikipedia article and a thousand other sites don't say that. They say that the newly constructed string must be base64-encoded (a sentence which strictly implies the use of the transformation table).
Are all of these sites wrong?
Any sequence of four characters chosen from the main base64 set will represent precisely three octets worth of data. Consequently, if the total length of the file to be encoded is not a multiple of three, it will be necessary to either:
Allow the encoded file to have a length which is not a multiple of 4.
Allow the encoded file to have characters outside the main set of 64.
If the former approach were used, then concatenating files whose length
was not a multiple of three would be likely to yield a file that might
appear valid but would contain bogus information. For example, a file
with length 32 would expand to ten groups of four base64 characters plus
three more for the final pair of octets (total 43). Concatenating another
file with length 32 would yield a total of 86 characters which might look
valid, but information from the second half would not decode correctly.
Using the latter approach, concatenation of files whose length was not a
multiple of three would yield a result that could be unambiguously parsed
or, at worst, recognized as invalid (the base64 Standard does not regard
as valid a file that contains "=" anywhere but at the end, but one could
write a decoder that could process such files unambiguously). In any case,
having such a file be regarded as invalid would be better than having a file
which appeared valid but which produces incorrect data when decoded.
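The 32-octet example can be reproduced with a short Python sketch. The input bytes are arbitrary, and padding is re-added at the very end only because Python's decoder expects a length that is a multiple of 4; this is an illustration, not part of any standard.

```python
import base64

data = bytes(range(32))               # arbitrary 32-octet "file"

padded = base64.b64encode(data)       # 44 characters, ending in one "="
unpadded = padded.rstrip(b"=")        # 43 characters, as in the example

# Former approach: concatenate two unpadded encodings (86 characters).
joined = unpadded + unpadded
# Pad the end only so that the decoder accepts the overall length.
decoded = base64.b64decode(joined + b"=" * (-len(joined) % 4))

first_half_ok = decoded[:32] == data   # first file survives
second_half_ok = decoded[32:] == data  # second file is shifted by 6 bits

# Latter approach: the "=" in the middle marks where the first file ended.
joined_padded = padded + padded
boundary_visible = b"=" in joined_padded[:-1]

print(len(unpadded), len(joined), first_half_ok, second_half_ok, boundary_visible)
```

The unpadded stream decodes without any error, yet the second half comes out wrong, which is the failure mode the padding character guards against.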
The way I understand the torrent file format is that it contains a field pieces, which specifies a hash list of each piece's SHA-1 hash. But, does it specify how large each piece should be and at which byte the division should occur? How does the client know how to divide the original file?
Thanks
You are looking for the "piece length" in the Info dictionary. Every piece is of equal length except for the final piece, which is irregular. The number of pieces is thus determined by 'ceil( total length / piece size )'.
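A minimal Python sketch of that calculation; the function name piece_layout and the example sizes are made up for illustration.

```python
def piece_layout(total_length: int, piece_length: int):
    """Return (number of pieces, size of the final piece) for a download."""
    if total_length <= 0 or piece_length <= 0:
        raise ValueError("lengths must be positive")
    num_pieces = -(-total_length // piece_length)   # integer ceil
    last_piece = total_length - (num_pieces - 1) * piece_length
    return num_pieces, last_piece

# Example: 1,000,000 bytes of content split into 256 KiB pieces.
num_pieces, last_piece = piece_layout(1_000_000, 262_144)
print(num_pieces, last_piece, num_pieces * 20)  # 4 213568 80
```

The last number is the expected length of the pieces string, since each piece contributes one 20-byte hash.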
https://wiki.theory.org/BitTorrentSpecification#Info_Dictionary
Does a torrent file contain the hash ID for every piece or not?
That is, if the content (the data to be downloaded) consists of 1000 pieces, does the torrent file hold a hash ID for each piece?
If it contains all the hash IDs, how do I get them from the torrent file?
And if not, what technique should I use to get them?
The .torrent file contains the SHA-1 hash of each piece in the file(s).
To access these hashes, decode the file (it's bencoded). In the dictionary under the key "info" there is a key called "pieces" which is a string. This string contains all piece hashes concatenated. Each hash is 20 bytes long.
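Those steps could look roughly like the following Python sketch. It uses a small hand-written decoder rather than any particular bencode library, does little error checking, and runs on a tiny hand-built stand-in instead of a real .torrent file; the names bdecode and piece_hashes are chosen for this example.

```python
def bdecode(data: bytes, pos: int = 0):
    """Decode one bencoded value at pos; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":                      # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":                      # list: l<items>e
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":                      # dictionary: d<key><value>...e
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            result[key], pos = bdecode(data, pos)
        return result, pos + 1
    if lead.isdigit():                    # byte string: <length>:<bytes>
        colon = data.index(b":", pos)
        length = int(data[pos:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError(f"unexpected byte at offset {pos}")


def piece_hashes(torrent_bytes: bytes):
    """Return the list of 20-byte SHA-1 hashes stored under info/pieces."""
    meta, _ = bdecode(torrent_bytes)
    pieces = meta[b"info"][b"pieces"]
    if len(pieces) % 20 != 0:
        raise ValueError("pieces length is not a multiple of 20")
    return [pieces[i:i + 20] for i in range(0, len(pieces), 20)]


# Hand-built stand-in for a .torrent file holding two fake hashes.
fake = b"d4:infod12:piece lengthi16384e6:pieces40:" + bytes(range(40)) + b"ee"
hashes = piece_hashes(fake)
print(len(hashes), hashes[0].hex())
```

For a real file, read it in binary mode (open(path, "rb")) so the raw hash bytes are not mangled by text decoding.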
I am doing a security review on a system.
From one part of the system to another, information is sent using an encrypted string.
This string is over 400 characters long, but within it are 4 sets of 10 identical characters. I am assuming that the data that was encrypted also has this pattern, for example the word "parameters".
I have tested encrypting a string containing several identical strings with DES, but do not get the same pattern.
The question is: is there an encryption method that would produce this result? Or have the parts been encrypted separately and concatenated?
An encryption system with short key length and no correlation between blocks (e.g. ECB mode) would encrypt short runs of identical plain text identically. It could also just be coincidence, of course.
If what you're seeing is real, it's mostly about the encryption mode, not the cipher. Likely culprits are a block cipher in ECB mode (which is usually a bad idea), or the pseudo-"stream" cipher of XORing the plaintext with a short password string repeated over and over. The latter is a really bad idea; with it, the odds of two copies of the same plaintext at random positions encoding to the same thing are 1 in passwordlength.
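The repeated-password XOR case is easy to reproduce. In this Python sketch the key and plaintext are invented for the demonstration: copies of the same word encrypt identically only when they start at the same offset modulo the password length.

```python
def xor_repeating(data: bytes, key: bytes) -> bytes:
    """XOR data with key repeated over and over (insecure, demo only)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

key = b"secret"                                   # hypothetical 6-byte password
plaintext = b"parameters--parameters-parameters"  # copies at offsets 0, 12, 23

ciphertext = xor_repeating(plaintext, key)

same_alignment = ciphertext[0:10] == ciphertext[12:22]   # 0 and 12 are both 0 mod 6
other_alignment = ciphertext[0:10] == ciphertext[23:33]  # 23 is 5 mod 6

print(same_alignment, other_alignment)
```

Four identical 10-character runs in the real data would fit this pattern if the repeated plaintext happened to land on matching offsets each time.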
By the way, it's best to be clear what format you're looking at the data in. If it's hex, okay. If it's base64, you should decode it before you look at it -- identical strings won't always look identical after base64 encoding depending on their alignment to a 3-byte boundary.
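A quick Python sketch of the alignment effect, with made-up inputs: the same word placed at offset 0 and at offset 1 produces different base64 text, while the decoded bytes still contain it in both cases.

```python
import base64

word = b"parameters"

at_offset_0 = base64.b64encode(word + b"__")         # word starts on a 3-byte boundary
at_offset_1 = base64.b64encode(b"_" + word + b"_")   # word shifted by one byte

# Encoding of the first 9 bytes of the word ("parameter"), exactly 12 characters.
marker = base64.b64encode(word[:9])

print(marker in at_offset_0, marker in at_offset_1)
print(word in base64.b64decode(at_offset_1))
```

Comparing the decoded bytes rather than the base64 text avoids missing repeats that are merely misaligned.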
And just for illustration, here's a discussion of ECB mode on Wikipedia including pictures of the entropy problem with ECB -- scroll down to the pictures of Tux.
What do you mean by "4 sets of 10 identical characters"?
If you mean 4 identical substrings of length 10, it may be the Caesar cipher, which is totally insecure, as it can be deciphered by a human in no time. Another possibility is the use of an XOR cipher with a badly chosen key.