Where is the compressed dictionary stored in zlib? - linux

In the dictionary implementation, zlib calculates the Adler-32 checksum of the dictionary data, places it in the header, and sets the FDICT flag in the header, right? When a match is found in the data, deflate needs to point to the dictionary instead of compressing the data again. For this, the compressed dictionary data has to be stored somewhere. Where is the compressed dictionary data stored?

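The dictionary itself is not stored in the compressed stream; only its Adler-32 checksum is, as an identifier. You need to store the dictionary on the receiving end where the decoder can find it.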
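A minimal sketch of what this looks like from Python's zlib module, assuming both sides have agreed on the dictionary out of band. The dictionary contents, the sample message, and the helper names compress_with_dict and decompress_with_dict are made up for illustration. It also peeks at the zlib header to show that only the FDICT flag and the dictionary's Adler-32 travel with the data.

```python
import zlib

# Hypothetical preset dictionary; both sender and receiver must already have it.
PRESET_DICT = b"GET /index.html HTTP/1.1\r\nHost: "


def compress_with_dict(data: bytes, zdict: bytes) -> bytes:
    """Compress data using a preset dictionary."""
    comp = zlib.compressobj(zdict=zdict)
    return comp.compress(data) + comp.flush()


def decompress_with_dict(blob: bytes, zdict: bytes) -> bytes:
    """Decompress data; the receiver supplies the same dictionary."""
    decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()


message = b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n"
blob = compress_with_dict(message, PRESET_DICT)

# zlib header layout: byte 0 is CMF, byte 1 is FLG. Bit 5 of FLG is FDICT.
# When FDICT is set, the next four bytes are the dictionary's Adler-32,
# most significant byte first.
fdict_set = bool(blob[1] & 0x20)
dict_id = int.from_bytes(blob[2:6], "big")

restored = decompress_with_dict(blob, PRESET_DICT)
```

Here dict_id equals zlib.adler32(PRESET_DICT), which is how the decoder can check that it was handed the right dictionary.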

Related

how to use zlib.compressobj(..., zdict)

I'm trying to preset zlib's dictionary for compression. As of Python 3.3, the zlib.compressobj function offers this option. The docs say it should be a bytearray or a bytes object, e.g. b"often-found".
Now: how do I pass multiple strings, ordered ascending by their likelihood of occurring, as suggested in the docs? Is there a secret delimiter, e.g. b"likely,more-likely,most-likely"?
No, there is no delimiter needed. The dictionary is simply a resource in which to look for strings that match portions of the data to be compressed. Therefore strings that are likely to occur can simply be concatenated, or even overlapped if their starts and ends match. For example, if you want the words "lighthouse" and "household" to be available, you can just put "lighthousehold" in the dictionary.
Since it takes more bits to represent matches that are further back, you would put the most likely matches at the end of the dictionary.
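As a rough sketch of that advice: the dictionary below concatenates a made-up rarely seen string with the overlapped "lighthousehold", placing the more likely material last. The sample data and the helper names deflate and inflate are assumptions for illustration only.

```python
import zlib

# No delimiters: just concatenate, least likely first and most likely last.
# "lighthouse" and "household" share "house", so they can overlap.
zdict = b"rarely-seen" + b"lighthousehold"


def deflate(data: bytes, zdict: bytes = None) -> bytes:
    """Compress data, optionally with a preset dictionary."""
    if zdict is None:
        comp = zlib.compressobj()
    else:
        comp = zlib.compressobj(zdict=zdict)
    return comp.compress(data) + comp.flush()


def inflate(blob: bytes, zdict: bytes) -> bytes:
    """Decompress data that was compressed with the given dictionary."""
    decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()


sample = b"household lighthouse household"
with_dict = deflate(sample, zdict)
without_dict = deflate(sample)

# Compare the two sizes to see whether the dictionary helps for this input.
sizes = (len(with_dict), len(without_dict))
round_trip = inflate(with_dict, zdict)
```

Whether the dictionary pays off depends on the input; for very short data the extra four-byte dictionary identifier in the header can outweigh the savings, so it is worth measuring on representative samples.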

The .torrent file contains gibberish characters

Can someone explain the gibberish characters at the end of every .torrent file?
The picture shows the understandable information along with only a part of the gibberish section. It just seems like the comprehensible part ends so abruptly at the pink pipeline I painted.
By the way, I am viewing it in VIM with UTF-8 encoding, which torrent files should be encoded with if I am not mistaken.
The data you are referring to is the value for the dictionary entry with a key of pieces. The 6:pieces129140: before your marked position indicates that the entry's key has a length of 6 characters, which allows us to determine that the key is pieces. The 129140 which follows the key is the length of the entry's value, in bytes. This data structure is a result of bencoding.
The pieces dictionary entry in the .torrent file contains the SHA1 hashes for all pieces concatenated into one long string. Hashes are important as they allow the user to ensure the pieces they have downloaded are valid. Using hashes for individual pieces is better than only having the hash for the whole file, as it reduces the wasted data; you don't have to download the whole file before your client realises that the data is invalid.
SHA1 hashes consist of 20 bytes, which are stored as raw bytes in the .torrent file. This is why the data appears malformed in your editor.
pieces maps to a string whose length is a multiple of 20. It is to be subdivided into strings of length 20, each of which is the SHA1 hash of the piece at the corresponding index.
Taken from this BitTorrent protocol specification document.
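The subdivision described above can be sketched as follows. The function name split_piece_hashes and the two sample pieces are hypothetical; a real value would come from a decoded .torrent file. For the file in the question, the 129140-byte value would hold 129140 / 20 = 6457 hashes.

```python
import hashlib

SHA1_LEN = 20  # a SHA-1 digest is 20 raw bytes


def split_piece_hashes(pieces: bytes) -> list:
    """Split the concatenated 'pieces' value into one 20-byte hash per piece."""
    if len(pieces) % SHA1_LEN != 0:
        raise ValueError("pieces length is not a multiple of 20")
    return [pieces[i:i + SHA1_LEN] for i in range(0, len(pieces), SHA1_LEN)]


# Made-up stand-in for the raw value stored under the "pieces" key.
pieces_value = hashlib.sha1(b"piece 0").digest() + hashlib.sha1(b"piece 1").digest()

hashes = split_piece_hashes(pieces_value)

# Hex is the readable form; the raw bytes are what look like gibberish in an editor.
readable = [h.hex() for h in hashes]
```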

how to write xdr packed data in binary files

I want to convert an array into xdr format and save it in binary format. Here's my code:
# myData is a pandas DataFrame whose 3rd column is int (but it could be anything else)
import xdrlib
p=xdrlib.Packer()
p.pack_list(myData[2],p.pack_int)
newFile=open("C:\\Temp\\test.bin","wb")
# not sure what to put
# p.get_buffer() returns a string as per document, but how can I provide xdr object?
newFile.write(???)
newFile.close()
How can I provide the XDR-"packed" data to newFile.write function?
Thanks
XDR is a pretty raw data format. Its specification (RFC 1832) doesn't specify any file headers, or anything else, beyond the encoding of various data types.
The binary string you get from p.get_buffer() is the XDR encoding of the data you've fed to p. There is no other kind of XDR object.
I suspect that what you want is simply newFile.write(p.get_buffer()).
Unrelated to the XDR question, I'd suggest using a with statement to take care of closing the file.
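A sketch of the whole flow with a with statement. Because xdrlib was deprecated in Python 3.11 and removed in 3.13, this version swaps in the struct module and reproduces by hand what Packer.pack_list with pack_int emits: each item is preceded by an unsigned 1, and the list is terminated by an unsigned 0, all as 4-byte big-endian words. The helper name xdr_pack_int_list, the sample values, and the temporary file path are assumptions for illustration; with xdrlib available, the bytes would simply come from p.get_buffer().

```python
import os
import struct
import tempfile


def xdr_pack_int_list(values) -> bytes:
    """Encode ints the way xdrlib's pack_list(values, pack_int) would."""
    out = bytearray()
    for value in values:
        out += struct.pack(">I", 1)      # "another item follows" marker
        out += struct.pack(">i", value)  # the item itself, signed 32-bit
    out += struct.pack(">I", 0)          # end-of-list marker
    return bytes(out)


packed = xdr_pack_int_list([1, 2])

# Hypothetical output location; the question used C:\Temp\test.bin.
path = os.path.join(tempfile.mkdtemp(), "test.bin")

# The with statement closes the file even if write() raises.
with open(path, "wb") as new_file:
    new_file.write(packed)

with open(path, "rb") as check_file:
    on_disk = check_file.read()
```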

how to write output in a file as dictionary not string

I'm running an analysis on a huge file, which takes several hours to finish. The result is a dictionary that I need for the next steps, so I want to save the output in a file to keep it. But when I write the output to a file, the dictionary is converted to a str and saved that way, and Python cannot interpret the saved str as a dictionary later.
For example, my output dictionary is
output={a:[1,2]}
When I save it, it is saved as:
'{a:[1,2]}' # cannot be interpreted as a dictionary by Python for further use later!
Is there any way I could save my output as a dictionary in a file, or any way Python could convert the string from the file back into a dictionary?
If the dictionary consists solely of values of simple types, then you can use dump() and load() from the json module to produce and retrieve a text representation.
The representations produced by str() are not meant to be valid Python in all cases. It might be possible to reparse the representations produced by repr(), but json is a safe bet, and it is language-independent and cross-platform.
If the dictionary contains non-simple types, the json module has provisions allowing you to provide your own marshalling/unmarshalling for them.
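A minimal sketch of both cases. The file location, the date value, and the helper names encode_extra and decode_extra are made up for illustration; the "__date__" tag is just one possible convention for marking a non-simple type.

```python
import json
import os
import tempfile
from datetime import date

# Simple types only: dump to a file now, load it back later.
output = {"a": [1, 2]}
path = os.path.join(tempfile.mkdtemp(), "output.json")  # hypothetical location

with open(path, "w") as fh:
    json.dump(output, fh)

with open(path) as fh:
    restored = json.load(fh)


# Non-simple types: supply your own marshalling and unmarshalling hooks.
def encode_extra(obj):
    """Called by json for objects it cannot serialize on its own."""
    if isinstance(obj, date):
        return {"__date__": obj.isoformat()}
    raise TypeError(f"cannot serialize {type(obj).__name__}")


def decode_extra(mapping):
    """Called by json for every decoded JSON object (dict)."""
    if "__date__" in mapping:
        return date.fromisoformat(mapping["__date__"])
    return mapping


text = json.dumps({"when": date(2020, 1, 2)}, default=encode_extra)
back = json.loads(text, object_hook=decode_extra)
```

One caveat: JSON object keys are always strings, so a dictionary with integer keys comes back with string keys.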

Torrent contain all Hashes for each piece of file..?

Does a torrent file contain the hash for each piece or not?
That is, if the content (the data to be downloaded) consists of 1000 pieces, does the torrent file hold a hash for each piece?
If it contains all the hashes, how do I get them from the torrent file?
And if not, what technique should I use to get these hashes?
The .torrent file contains the SHA-1 hash of each piece in the file(s).
To access these hashes, decode the file (it's bencoded). In the dictionary under the key "info" there is a key called "pieces" whose value is a string. This string contains all piece hashes concatenated. Each hash is 20 bytes long.
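A rough sketch of those steps, using a small hand-written bencode decoder rather than any particular library. The function name bdecode and the tiny in-memory sample are hypothetical; for a real file you would read the bytes with open(path, "rb").

```python
import hashlib


def bdecode(data: bytes, pos: int = 0):
    """Decode one bencoded value starting at pos; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":                      # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":                      # list: l<items>e
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":                      # dictionary: d<key><value>...e
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            value, pos = bdecode(data, pos)
            result[key] = value
        return result, pos + 1
    if lead.isdigit():                    # byte string: <length>:<bytes>
        colon = data.index(b":", pos)
        length = int(data[pos:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError(f"invalid bencode at offset {pos}")


# Made-up two-piece torrent, built in memory for illustration.
fake_hashes = (hashlib.sha1(b"first piece").digest()
               + hashlib.sha1(b"second piece").digest())
sample = (b"d4:infod6:lengthi32e4:name8:demo.bin"
          b"12:piece lengthi16e6:pieces40:" + fake_hashes + b"ee")

torrent, _ = bdecode(sample)
pieces = torrent[b"info"][b"pieces"]

# One 20-byte SHA-1 hash per piece, in piece order.
hashes = [pieces[i:i + 20] for i in range(0, len(pieces), 20)]
```

Keys come back as bytes (b"info", b"pieces") because bencoded strings are raw byte strings, not text.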
