Does a torrent contain the hashes for every piece of a file? - p2p

Does a torrent file contain a hash ID for each piece or not?
That is, if the content (the data to be downloaded) consists of 1000 pieces, does the torrent file hold a hash ID for each piece?
If it contains all the hash IDs, how do I get them from the torrent file,
and if not, what technique should I use to get them?

The .torrent file contains the SHA-1 hash of each piece in the file(s).
To access these hashes, decode the file (it's bencoded). In the dictionary under the key "info" there is a key called "pieces" whose value is a string. This string contains all piece hashes concatenated. Each hash is 20 bytes long.
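As a rough illustration of the steps above, here is a minimal sketch in Python. The names bdecode and piece_hashes are made up for this example and do not come from any library; a real client would more likely use an existing bencode package and stricter validation.

```python
def bdecode(data: bytes, pos: int = 0):
    """Decode one bencoded value starting at pos; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":                       # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":                       # list: l<items>e
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":                       # dictionary: d<key><value>...e
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            value, pos = bdecode(data, pos)
            result[key] = value
        return result, pos + 1
    # byte string: <length>:<raw bytes>
    colon = data.index(b":", pos)
    length = int(data[pos:colon])
    start = colon + 1
    return data[start:start + length], start + length


def piece_hashes(torrent_bytes: bytes) -> list[bytes]:
    """Return the 20-byte SHA-1 hashes stored under info -> pieces."""
    meta, _ = bdecode(torrent_bytes)
    pieces = meta[b"info"][b"pieces"]
    if len(pieces) % 20 != 0:
        raise ValueError("pieces length is not a multiple of 20")
    return [pieces[i:i + 20] for i in range(0, len(pieces), 20)]
```

Read the .torrent file in binary mode (open(path, "rb")) before passing its contents in; the hashes are raw bytes, not text.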

Related

How to get pieces value from .torrent file

I am trying to build a .torrent file interpreter. The problem is that I can't seem to understand how to go about interpreting the pieces value. I am aware that the pieces key contains a concatenation of the SHA-1 hashes for each piece and that a SHA-1 hash is 20 bytes long. As a result, the length of the value should be a multiple of 20 bytes. However, after counting the bytes of the pieces value, as a string or in hexadecimal form, it still does not satisfy this. How should I interpret the pieces key?
Here we use bencode and bdecode, and the pieces value can be obtained easily. I think you should first read the BEP for more details. You can also look at this and use it as an example.
From looking at a real torrent file, I found that the SHA-1 hashes had to be taken from their hexadecimal string format, but I previously thought that this was wrong because the length of the hash data was not a multiple of 20 bytes. It turns out I forgot to add a leading 0 to hexadecimal values that were only 1 character long (e.g. a had to be changed to 0a).
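The padding problem described here can be reproduced in a few lines. This is only a sketch; to_hex_naive and to_hex_padded are hypothetical helper names, and the sample digest is an arbitrary 20-byte value chosen so that exactly one byte is below 0x10.

```python
def to_hex_naive(digest: bytes) -> str:
    # Drops the leading zero of any byte below 0x10, so the result
    # can come out shorter than two characters per byte.
    return "".join(format(b, "x") for b in digest)


def to_hex_padded(digest: bytes) -> str:
    # Always two hex characters per byte, so 20 bytes give 40 characters.
    return "".join(format(b, "02x") for b in digest)


# Arbitrary 20-byte sample: one byte (0x0a) needs padding, the rest do not.
sample = bytes([0x0A]) + bytes(range(0x10, 0x23))
print(len(sample), len(to_hex_naive(sample)), len(to_hex_padded(sample)))
```

In Python the built-in bytes.hex() already pads each byte, so to_hex_padded is spelled out here only to make the difference visible.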

Where is the compressed dictionary stored in zlib?

In the dictionary implementation, it calculates the Adler checksum of the dictionary data, places it in the header, and sets the FDICT flag in the header, right? When a match is found in the data, deflate needs to point to the dictionary instead of compressing it. For this we need to store the compressed dictionary data somewhere. Where is the compressed dictionary data stored?
You need to store the dictionary on the receiving end where the decoder can find it.
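To make that concrete, here is a small sketch using Python's zlib module, where both sides are handed the same dictionary through the zdict argument. The function names and the sample dictionary are assumptions for this example only.

```python
import zlib


def compress_with_dict(data: bytes, dictionary: bytes) -> bytes:
    # The dictionary primes the compressor; it is not written to the output.
    comp = zlib.compressobj(zdict=dictionary)
    return comp.compress(data) + comp.flush()


def decompress_with_dict(blob: bytes, dictionary: bytes) -> bytes:
    # The receiver must already hold the same dictionary bytes.
    decomp = zlib.decompressobj(zdict=dictionary)
    return decomp.decompress(blob) + decomp.flush()


# Hypothetical shared dictionary, agreed on out of band.
shared = b"announce info piece length pieces name"
blob = compress_with_dict(b"piece length and pieces", shared)
print(decompress_with_dict(blob, shared))
```

The stream itself carries only the FDICT flag and the Adler-32 of the dictionary, which lets the decoder check that it was given the right one.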

How does the client divide the file?

The way I understand the torrent file format is that it contains a field pieces, which specifies a hash list of each piece's SHA-1 hash. But, does it specify how large each piece should be and at which byte the division should occur? How does the client know how to divide the original file?
Thanks
You are looking for the "piece length" key in the info dictionary. Every piece has that length except for the final piece, which may be shorter. The number of pieces is thus determined by 'ceil( total length / piece length )'.
https://wiki.theory.org/BitTorrentSpecification#Info_Dictionary
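A short sketch of that arithmetic, with made-up function names (piece_count, piece_span) and the total length passed in directly:

```python
def piece_count(total_length: int, piece_length: int) -> int:
    # Integer form of ceil(total_length / piece_length).
    return -(-total_length // piece_length)


def piece_span(index: int, total_length: int, piece_length: int) -> tuple[int, int]:
    """Return the (start, end) byte offsets of a piece; end is exclusive."""
    start = index * piece_length
    end = min(start + piece_length, total_length)  # the final piece may be short
    return start, end


# Example: a 1000-byte payload split into 256-byte pieces.
count = piece_count(1000, 256)
print(count, [piece_span(i, 1000, 256) for i in range(count)])
```

For a multi-file torrent the offsets apply to the files concatenated in the order they are listed, so a piece can straddle a file boundary.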

The .torrent file contains gibberish characters

Can someone explain the gibberish characters at the end of every .torrent file?
The picture shows the understandable information along with only a part of the gibberish section. It just seems like the comprehensible part ends so abruptly at the pink pipeline I painted.
By the way, I am viewing it in VIM with UTF-8 encoding, which torrent files should be encoded with if I am not mistaken.
The data you are referring to is the value for the dictionary entry with a key of pieces. The 6:pieces129140: before your marked position indicates that the entry's key has a length of 6 characters, which allows us to determine that the key is pieces. The 129140 which follows the key is the length of the entry's value, in bytes. This data structure is a result of bencoding.
The pieces dictionary entry in the .torrent file contains the SHA1 hashes for all pieces concatenated into one long string. Hashes are important as they allow the user to ensure the pieces they have downloaded are valid. Using hashes for individual pieces is better than only having the hash for the whole file, as it reduces the wasted data; you don't have to download the whole file before your client realises that the data is invalid.
SHA1 hashes consist of 20 bytes, which are stored as raw bytes in the .torrent file. This is why the data appears malformed in your editor.
pieces maps to a string whose length is a multiple of 20. It is to be subdivided into strings of length 20, each of which is the SHA1 hash of the piece at the corresponding index.
Taken from this BitTorrent protocol specification document.
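In the file from the question, a pieces value of 129140 bytes works out to 129140 / 20 = 6457 hashes. The check a client performs on a downloaded piece could look roughly like the sketch below; expected_hash and piece_is_valid are hypothetical names, not part of any library.

```python
import hashlib

HASH_LEN = 20  # size of a SHA-1 digest in bytes


def expected_hash(pieces: bytes, index: int) -> bytes:
    # Slice the raw 20-byte hash for the piece at the given index.
    return pieces[index * HASH_LEN:(index + 1) * HASH_LEN]


def piece_is_valid(piece_data: bytes, pieces: bytes, index: int) -> bool:
    # Hash the downloaded data and compare it with the stored hash.
    return hashlib.sha1(piece_data).digest() == expected_hash(pieces, index)
```

A piece that fails this check can be discarded and requested again without touching the rest of the download.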

SHA-1 Digest Reduction

I'm using QR Code barcodes to store UUIDs in my system and I need to check that the barcodes generated are mine and not someone else's. I also need to keep the encoded data short so that the QR Codes remain in the lower version range and remain easy to scan.
My approach is to take the UUID raw value (a 128-bit number) and a 16-bit checksum and then Base64 encode that data before converting it to a QR code. So far so good, this works perfectly.
To generate the checksum I take the string version of the UUID and combine it with a long secret string to produce a SHA-1 hash. But this hash is too long, so I XOR all the odd bytes together to produce half the checksum, and likewise with the even bytes to produce the other half.
What worries me is that I have compromised the SHA-1 system needlessly by XORing it down. Would it be better to just take two unmanipulated bytes from somewhere within the result? I accept that a 16-bit checksum won't be as secure as a 160-bit checksum, but that is a price I have to pay for usability with the barcodes. What I really don't want to find is that I've now provided a checksum that is easy to crack as the UUID is transmitted in the clear.
If there is a better way of generating the checksum that would also be a suitable answer to the question. As always many thanks for your time or just reading this, double plus good thanks if you post an answer.
There's no reason to do any XORing. Simply taking the first two bytes will be as (in)secure.
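A minimal sketch of the truncation approach, assuming the checksum is SHA-1 over the UUID string followed by the secret, as the question describes. The names checksum16 and build_payload are invented for this example.

```python
import hashlib
import uuid


def checksum16(item_id: uuid.UUID, secret: bytes) -> bytes:
    # Hash the UUID's string form plus the secret, keep the first two bytes.
    digest = hashlib.sha1(str(item_id).encode("ascii") + secret).digest()
    return digest[:2]


def build_payload(item_id: uuid.UUID, secret: bytes) -> bytes:
    # 16 bytes of UUID + 2 bytes of checksum = 18 bytes (144 bits).
    return item_id.bytes + checksum16(item_id, secret)
```

Either way the checksum has only 65536 possible values, which is the limit the answer refers to.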
To keep the code version as small as possible, you might want to convert the 144-bit value to a decimal string and encode that. QR Codes have different character sets and encode numbers efficiently. Base64 can only be encoded as 8-bit values in QR codes, so you add 30% right there.
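The size difference can be estimated with a short sketch. It counts only data bits (QR numeric mode packs 3 digits into 10 bits, byte mode uses 8 bits per character) and ignores the mode indicator and length field; the helper names are made up for this example.

```python
import base64

PAYLOAD_BYTES = 18                                   # 144 bits
DECIMAL_WIDTH = len(str(2 ** (8 * PAYLOAD_BYTES) - 1))  # digits needed for any value


def to_decimal(payload: bytes) -> str:
    # Fixed-width decimal string so every payload has the same length.
    return str(int.from_bytes(payload, "big")).zfill(DECIMAL_WIDTH)


def from_decimal(text: str) -> bytes:
    return int(text).to_bytes(PAYLOAD_BYTES, "big")


def numeric_mode_bits(n_digits: int) -> int:
    # 10 bits per group of 3 digits; 7 bits for 2 leftover, 4 bits for 1.
    full, rest = divmod(n_digits, 3)
    return full * 10 + (0, 4, 7)[rest]


def byte_mode_bits(n_chars: int) -> int:
    return n_chars * 8


payload = bytes(range(PAYLOAD_BYTES))
print(numeric_mode_bits(len(to_decimal(payload))),
      byte_mode_bits(len(base64.b64encode(payload))))
```

Under these assumptions the decimal form needs 44 digits (147 data bits) against 24 Base64 characters (192 data bits), which is roughly the 30% mentioned above.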
