I receive some binary data over a socket in Node.js, compressed with zlib.
I do not have access to the source that originates the messages.
When I try to decompress the data, I sometimes get errors from zlib.
Below is an example; I encoded the two messages in hex for convenience.
I am sure they are both zlib-compressed (the "789c" header gives me that confidence), but I cannot understand why the test1 message works and the test2 message doesn't.
Maybe a dictionary is needed?
Maybe there is a version mismatch between the compression and decompression algorithms?
I think I can rule out a problem with reading the data, since both messages are read the same way.
Help from a zlib expert would be much appreciated.
const zlib = require("zlib");
let test1 = "789c4d8e410e82301444f79ea2f9fb021f51206971e5c2ad7a01da7e840894d06a20c6bb8bbad0e54c66f29ed84d5dcbee34bac6f61230886057ac4439d1e0f777eabd2b84b2662e84b6bda7c933dd364b2dc139e5815d6e8d91f040aa22131bcdb7581a9e6c50f11c938c27695646b45e672a534f60aeb6c361d9a7e936e2790c3f701c2006086121e84365ba267d75b74e02b0ba74a7d9795a0202b35545e3791e48c2f17d08bf7ee1bff3ea051dd44446";
let test2 = "789c4d8e410e82301444f79ca2f9fb0205059ab4b072e156bd006d3f42544a683110e3dd455de8722633794f54f3ed4aee38bacef6125818435506a29e71f0bb3bf6de954259b39442dbdee3ec89be766b2dc139e5819ca7ce4878285d37292b38cd52cde826ce38e55b8334df2aad8d490aaeea2710d7da61bfeef33c8b294fe0074e42c642065129f04325ba457d71d34d0290b676c7c5795c5c0303629b06c7d332a084c3fb107dfda27fe7e0054ca744b6";
console.log(zlib.unzipSync(Buffer.from(test1, "hex")).toString()); // correct output
console.log(zlib.unzipSync(Buffer.from(test2, "hex")).toString()); // ERROR: data check
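The second test message is, in fact, invalid: zlib's "data check" error means that the Adler-32 checksum stored in the last four bytes of the stream does not match the decompressed data. Either the message was corrupted in transit (including within your code), or it was corrupted when it was created.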
One thing to check, if you are on a Windows operating system, is whether you are reading the file in binary mode. If you are not, it is in fact possible for some inputs to be corrupted and not others, depending on whether they happen to contain bytes that text mode translates, such as line-ending characters.
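As a diagnostic, you can take the zlib stream apart yourself. The Node.js sketch below (the helper names `adler32` and `diagnoseZlib` are hypothetical) checks the 2-byte RFC 1950 header, inflates the raw deflate body with `zlib.inflateRawSync` so that no checksum is verified, and then compares the Adler-32 trailer by hand. The header check also bears on the dictionary question: in `789c` the FDICT bit (0x20 of the second byte) is clear, so no preset dictionary is involved. If the body inflates but the checksums differ, either the output is not what the sender compressed or the trailer itself was damaged; if `inflateRawSync` throws, the compressed bits are broken.

```javascript
const zlib = require("zlib");

// Adler-32 checksum as defined in RFC 1950 (two running sums modulo 65521).
function adler32(buf) {
  let a = 1;
  let b = 0;
  for (const byte of buf) {
    a = (a + byte) % 65521;
    b = (b + a) % 65521;
  }
  return ((b << 16) | a) >>> 0; // force an unsigned 32-bit result
}

// Splits a zlib stream into its 2-byte header, raw deflate body and 4-byte
// big-endian Adler-32 trailer, inflates the body without any checksum
// verification, and compares the checksum by hand.
function diagnoseZlib(buf) {
  const [cmf, flg] = buf;
  if ((cmf & 0x0f) !== 8 || (cmf * 256 + flg) % 31 !== 0) {
    throw new Error("not a valid zlib header");
  }
  if (flg & 0x20) {
    throw new Error("FDICT set: a preset dictionary is required");
  }
  const body = buf.subarray(2, buf.length - 4);
  const output = zlib.inflateRawSync(body); // throws if the deflate data is broken
  const expected = buf.readUInt32BE(buf.length - 4);
  const actual = adler32(output);
  return { output, expected, actual, checksumOk: expected === actual };
}
```

For example, `diagnoseZlib(Buffer.from(test2, "hex"))` tells you which of the two failure modes you are in.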
Related
I'm writing a small app to learn how BitTorrent P2P works. I created a sample torrent and am seeding it from my Deluge client. From my app I'm trying to connect to Deluge and download the file.
The torrent in question is a single-file torrent (the file is called A, with no extension), and its data is the ASCII string Test.
Following this, I was able to send the initial handshake and get a valid response back.
Immediately afterwards, Deluge sends even more data. Judging by the 5th byte, it seems to be a bitfield message, but I'm not sure what to make of it. I read that torrent clients may send a mixture of Bitfield and Have messages to show which pieces of the torrent they possess. (My client isn't sending any bitfield, since it assumes it has no part of the file in question.)
If my understanding is correct, it states that the message size is 2: one byte for the identifier plus the payload. If that's the case, why is it sending so much more data, and what is that data supposed to be?
The same thing happens after my app sends an interested message: Deluge responds with a 1-byte unchoke message (but again appends more data).
And finally, when it actually sends the piece, I'm not sure what to make of the data. The first underlined byte is 84, which corresponds to the letter T as expected, but I can't make much sense of the rest of the data.
Note that the link in question does not really specify in what order clients should send messages once the initial handshake is completed. I just assumed I should send interested and then request, based on what seemed to make sense to me, but I might be completely off.
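To make the length-prefixed layout concrete, here is a minimal Node.js sketch of how a client could build the interested and request messages described in BEP 3 (the names `MSG`, `encodeMessage` and `encodeRequest` are hypothetical). A downloading client typically sends interested, waits for unchoke, and only then sends request messages. For a one-piece torrent like this one, a bitfield from a seeder would be the 6 bytes `00 00 00 02 05 80`: length 2, id 5, and a single payload byte whose high bit marks piece 0.

```javascript
// Peer-wire message IDs from BEP 3.
const MSG = {
  choke: 0, unchoke: 1, interested: 2, notInterested: 3,
  have: 4, bitfield: 5, request: 6, piece: 7, cancel: 8,
};

// Every message after the handshake is <4-byte big-endian length><id><payload>,
// where the length counts the id byte plus the payload.
function encodeMessage(id, payload = Buffer.alloc(0)) {
  const header = Buffer.alloc(5);
  header.writeUInt32BE(1 + payload.length, 0);
  header[4] = id;
  return Buffer.concat([header, payload]);
}

// request payload: <index><begin><length>, each a 4-byte big-endian integer.
function encodeRequest(index, begin, length) {
  const payload = Buffer.alloc(12);
  payload.writeUInt32BE(index, 0);
  payload.writeUInt32BE(begin, 4);
  payload.writeUInt32BE(length, 8);
  return encodeMessage(MSG.request, payload);
}

// For the 4-byte torrent in the question: one piece, so request bytes 0..3.
const interested = encodeMessage(MSG.interested);
const request = encodeRequest(0, 0, 4);
console.log(interested.toString("hex")); // 0000000102
console.log(request.length); // 17
```

The piece message that comes back has the same shape: length 13, id 7, then 4 bytes of index, 4 bytes of begin, and the block bytes `T`, `e`, `s`, `t`.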
I don't think Deluge is sending the additional bytes you're seeing.
If you look at them, you'll notice that all of the extra bytes are bytes that already existed in the handshake message, which should have been the longest message you received so far.
I think you're reading new messages into the same buffer, without zeroing it out or anything, so you're seeing bytes from earlier messages again, following the bytes of the latest message you read.
Consider checking if the network API you're using has a way to check the number of bytes actually received, and only look at that slice of the buffer, not the entire thing.
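A minimal Node.js sketch of that advice (the class name `PeerMessageParser` is hypothetical): keep a growing buffer of only the bytes actually received, and cut complete messages off the front using the 4-byte length prefix, leaving any partial message for the next read. It assumes the 68-byte handshake has already been consumed.

```javascript
// Accumulates raw socket chunks and returns complete BitTorrent peer-wire
// messages (<4-byte big-endian length><1-byte id><payload>, per BEP 3).
class PeerMessageParser {
  constructor() {
    this.pending = Buffer.alloc(0);
  }

  // `chunk` must contain only the bytes actually received, e.g.
  // buf.subarray(0, nread) if your API reuses one fixed-size buffer.
  push(chunk) {
    this.pending = Buffer.concat([this.pending, chunk]);
    const messages = [];
    while (this.pending.length >= 4) {
      const length = this.pending.readUInt32BE(0);
      if (this.pending.length < 4 + length) break; // wait for more bytes
      if (length === 0) {
        messages.push({ id: null, payload: Buffer.alloc(0) }); // keep-alive
      } else {
        messages.push({
          id: this.pending[4],
          payload: this.pending.subarray(5, 4 + length),
        });
      }
      this.pending = this.pending.subarray(4 + length);
    }
    return messages;
  }
}
```

Node's own net sockets normally hand you a fresh Buffer per `'data'` event, so there you can pass each chunk straight to `push`. The stale-bytes problem described above appears with APIs that reuse one buffer, such as the `onread` option of `net.connect`, whose callback receives `nread`, the number of valid bytes.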
I'm currently working on a Node.js project. I want to be able to read, modify and write a ZIP file without saving it to the file system (we receive it over TCP and send it back after the modifications have been made), and so far that looks possible because of the ZIP format's simple structure. Currently I'm referring to this documentation.
So a ZIP file has a simple structure:
File header 1
File data 1
File data descriptor 1
File header 2
File data 2
File data descriptor 2
...
[other records, not important yet]
First we need to read the file header, which contains a compressed size field; that would be the perfect way to read file data 1 by its length. But it isn't: this field may contain 0 or 0xFFFFFFFF, and those values don't describe the actual length. In that case we have to read the file data without knowing its length. But how?
The compression/decompression algorithm descriptions look pretty complex to me, and I plan to use zlib for the compression itself anyway. So if something useful is described there, I missed the point.
Can someone explain the proper way to read those files?
P.S. Please avoid suggesting npm modules. I don't only want to solve the problem; I also want to understand how things work.
Note: I'm assuming you want to read and process the zip file as it comes off the socket, rather than reading the complete zip file into memory before processing. Both options are valid.
I'd initially ignore the cases where the compressed size has a value of 0 or 0xFFFFFFFF. The former is only present in zip files created in streaming mode, the latter in zip files larger than 4 GB (ZIP64).
Dealing with them adds a lot of complexity; you can add support for them later if necessary. Whether you ever need to support the 0/0xFFFFFFFF cases depends on the nature of the zip files you intend to process.
When the compression method is deflated (8), use zlib for compression/decompression. You also need to support compression method stored (0). It gets used for very small files where compression isn't appropriate.
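For the simple case described above, here is a minimal Node.js sketch (the names `readLocalEntry` and `LOCAL_FILE_HEADER_SIG` are hypothetical) that walks local file headers using the field offsets from the ZIP application note. One detail worth knowing: ZIP stores method-8 data as raw deflate, without the zlib header and Adler-32 trailer, so in Node.js the matching calls are `zlib.inflateRawSync`/`zlib.deflateRawSync` rather than `inflateSync`/`deflateSync`. The sketch assumes the bytes are already in one Buffer; when processing straight off the socket you would buffer until 30 + name length + extra length + compressed size bytes are available.

```javascript
const zlib = require("zlib");

const LOCAL_FILE_HEADER_SIG = 0x04034b50; // "PK\x03\x04" read little-endian

// Reads one local file entry starting at `offset`. All header fields are
// little-endian. Only the simple case is handled: sizes present in the
// header (flag bit 3 clear) and no ZIP64 (0xFFFFFFFF) sizes.
function readLocalEntry(buf, offset) {
  if (buf.readUInt32LE(offset) !== LOCAL_FILE_HEADER_SIG) {
    return null; // some other record, e.g. the central directory
  }
  const flags = buf.readUInt16LE(offset + 6);
  const method = buf.readUInt16LE(offset + 8);
  const compressedSize = buf.readUInt32LE(offset + 18);
  const uncompressedSize = buf.readUInt32LE(offset + 22);
  const nameLength = buf.readUInt16LE(offset + 26);
  const extraLength = buf.readUInt16LE(offset + 28);

  if (flags & 0x0008) {
    throw new Error("streaming entry: sizes follow the data in a data descriptor");
  }
  if (compressedSize === 0xffffffff || uncompressedSize === 0xffffffff) {
    throw new Error("ZIP64 entry: real sizes are in the extra field");
  }

  const nameStart = offset + 30;
  // Decoded as UTF-8 for simplicity; strictly, that only applies when flag
  // bit 11 is set (otherwise the spec says IBM code page 437).
  const name = buf.toString("utf8", nameStart, nameStart + nameLength);
  const dataStart = nameStart + nameLength + extraLength;
  const dataEnd = dataStart + compressedSize;
  if (dataEnd > buf.length) throw new Error("entry data is truncated");
  const raw = buf.subarray(dataStart, dataEnd);

  let data;
  if (method === 0) {
    data = raw; // stored: bytes are copied as-is
  } else if (method === 8) {
    data = zlib.inflateRawSync(raw); // raw deflate, no zlib header/trailer
  } else {
    throw new Error(`unsupported compression method ${method}`);
  }
  if (data.length !== uncompressedSize) throw new Error("size mismatch");

  return { name, method, data, nextOffset: dataEnd };
}
```

Calling it repeatedly with `nextOffset` walks the entries until it returns `null` at the central directory.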
Sometimes I receive these strange responses from other nodes. The transaction ID matches my request's transaction ID, and so does the remote IP, so I tend to believe the node really sent this, but it looks like some sort of mix of a response and a request:
d1:q9:find_node1:rd2:id20:.éV0özý.?tjN.?.!2:ip4:DÄ.^7:nodes.v26:.ï?M.:iSµLW.Ðä¸úzDÄ.^æCe1:t2:..1:y1:re
Worst of all, it is malformed. Look at 7:nodes.v: it means the key nodes.v gets added to the dictionary, whereas it is supposed to be 5:nodes. So I'm lost. What is this?
The internet is unreliable and remote nodes can be buggy. You have to code defensively: do not assume that everything you receive will be valid.
Remote peers might
send invalid bencoding; discard those, don't even try to recover.
send truncated messages; these are usually not recoverable unless the missing part happens to be the very last e of the root dictionary.
omit mandatory keys; you can either ignore those messages or return an error message.
send corrupted data.
include unknown keys beyond the mandatory ones; this is not an error, just treat them as if they weren't there, for the sake of forward compatibility.
actually be attackers trying to fuzz your implementation or use you as a DoS amplifier.
I also suspect that some really shoddy implementations use whatever string type their programming language provides and mishandle the encoding, instead of using arrays of uint8 as bencoding demands. There's nothing that can be done about those; ignore them or occasionally send an error message.
Specified dictionary keys are usually ASCII-mappable, but this is not a requirement. E.g. there are some tracker response types that actually use random binary data as dictionary keys.
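A minimal sketch of the "code defensively" advice, in Node.js to match the other examples on this page: a strict decoder (hypothetical name `bdecode`) that works on bytes rather than strings and throws on anything invalid or truncated, so the caller can simply drop the message. Byte strings stay Buffers because they may hold binary data, dictionary keys become latin1 strings so every byte value survives, and nesting depth is capped as a cheap guard against hostile input. It does not check that dictionary keys are sorted.

```javascript
// Strict bencode decoder (BEP 3) over raw bytes. Anything malformed throws.
function bdecode(buf, maxDepth = 64) {
  let pos = 0;

  function fail(reason) {
    throw new Error(`invalid bencoding at byte ${pos}: ${reason}`);
  }

  // Reads ASCII decimal digits up to `terminator` and consumes the terminator.
  // Leading zeros are rejected, and so is "-0" for integers.
  function readNumber(terminator, allowNegative) {
    const start = pos;
    while (pos < buf.length && buf[pos] !== terminator) pos++;
    if (pos >= buf.length) fail("truncated number");
    const text = buf.toString("latin1", start, pos);
    const ok = allowNegative ? /^(0|-?[1-9][0-9]*)$/ : /^(0|[1-9][0-9]*)$/;
    if (!ok.test(text)) fail(`bad number "${text}"`);
    pos++; // skip the terminator
    return Number(text); // precision is lost above 2^53; fine for a sketch
  }

  function decode(depth) {
    if (depth > maxDepth) fail("nesting too deep");
    if (pos >= buf.length) fail("truncated");
    const c = buf[pos];
    if (c === 0x69) { // 'i' -> integer: i<digits>e
      pos++;
      return readNumber(0x65, true);
    }
    if (c >= 0x30 && c <= 0x39) { // digit -> byte string: <length>:<bytes>
      const length = readNumber(0x3a, false);
      if (pos + length > buf.length) fail("string runs past end of input");
      const bytes = buf.subarray(pos, pos + length);
      pos += length;
      return bytes;
    }
    if (c === 0x6c || c === 0x64) { // 'l' -> list, 'd' -> dictionary
      pos++;
      const isDict = c === 0x64;
      const result = isDict ? new Map() : [];
      for (;;) {
        if (pos >= buf.length) fail("truncated container");
        if (buf[pos] === 0x65) { // 'e' closes the container
          pos++;
          return result;
        }
        if (isDict) {
          const key = decode(depth + 1);
          if (!Buffer.isBuffer(key)) fail("dictionary key is not a byte string");
          result.set(key.toString("latin1"), decode(depth + 1));
        } else {
          result.push(decode(depth + 1));
        }
      }
    }
    fail(`unexpected byte 0x${c.toString(16)}`);
  }

  const value = decode(0);
  if (pos !== buf.length) fail("trailing bytes after the root value");
  return value;
}
```

Wrapping each call in `try`/`catch` and discarding the message on error matches the "don't even try to recover" policy; unknown keys simply end up as extra Map entries that the caller never reads.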
Here are a few examples of junk I'm seeing[1] that even fail bdecoding:
d1:ad2:id20:�w)��-��t����=?�������i�&�i!94h�#7U���P�)�x��f��YMlE���p:q9Q�etjy��r7�:t�5�����N��H�|1�S�
d1:e�����������������H#
d1:ad2:id20:�����:��m�e��2~�����9>inm�_hash20:X�j�D��nY��-������X�6:noseedi1ee1:q9:get_peers1:t2:�=1:v4:LT��1:y1:qe
d1:ad2:id20:�����:��m�e��2~�����9=inl�_hash20:X�j�D��nY���������X�6:noseedi1ee1:q9:get_peers1:t2:�=1:v4:LT��1:y1:qe
d1:ad2:id20:�����:��m�e��2~�����9?ino�_hash20:X�j�D��nY���������X�6:noseedi1ee1:q9:get_peers1:t2:�=1:v4:LT��1:y1:qe
[1] Character count preserved; all non-printable, ASCII-incompatible bytes were replaced with the Unicode replacement character.
I am migrating a system from the old server (Slackware) to the new one (Red Hat). The system includes some .gdbm files. I found that on my new server, when running
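use GDBM_File;
use Fcntl;    # exports O_RDONLY
my $WEB_SERVICES = 'file.gdbm';
my %webservices;
tie( %webservices, 'GDBM_File', $WEB_SERVICES, O_RDONLY, 0 ) or die "Cannot tie $WEB_SERVICES: $!";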
%webservices turns out to be empty, although this worked fine on my old server.
So my question is: can .gdbm files simply be transferred (using scp) from one server to another with a different operating system and a different version of gdbm?
I also read the documentation at http://www.gnu.org.ua/software/gdbm/manual/gdbm.html#SEC12, which says .gdbm files need to be converted into a flat format before being sent over the network, but I'm still not sure how to do that.
Please help, thanks in advance!
On the old system, tie the hash to the GDBM file and dump the hash. Move the dump to the new system. There, read the dump into a hash and tie a new GDBM file to write it out.
For dumping, use a platform-independent serialisation format (Sereal is best), or, if the dump needs to be human-readable, Data::Dumper or similar for writing and Data::Undump for reading.
I need to exchange both protobuf-net objects and files between computers and am trying to figure out the best way to do that. Is there a way for A to inform B that the object that follows is a protobuf or a file? Alternatively, when a file is transmitted, is there a way to know that the file has ended and that the Byte[] that follows is a protobuf?
Using C# 4.0, Visual Studio 2010
Thanks, Manish
This has nothing to do with protobuf or files, and everything to do with your comms protocol, specifically "framing". This simply means: how you demarcate sub-messages in a single stream. For example, if this is a raw socket, you might choose to send (all of):
a brief message type, maybe a byte: 01 for a file, 02 for a protobuf message of a particular type
a length prefix (typically 4 bytes, in network byte order)
the payload, consisting of the previous number of bytes
Then rinse and repeat for each message.
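A minimal sketch of that framing, in Node.js to match the other examples here rather than the asker's C#. The type codes `FRAME_FILE`/`FRAME_PROTOBUF` and the helpers `encodeFrame`/`createFrameDecoder` are hypothetical; the payload of a protobuf frame would be the already-serialized message bytes. The decoder copes with frames split across, or packed into, arbitrary chunks, which is exactly what a TCP stream can deliver.

```javascript
// Frame layout: <1-byte type><4-byte big-endian length><payload>
const FRAME_FILE = 0x01; // hypothetical type codes
const FRAME_PROTOBUF = 0x02;

function encodeFrame(type, payload) {
  const header = Buffer.alloc(5);
  header[0] = type;
  header.writeUInt32BE(payload.length, 1);
  return Buffer.concat([header, payload]);
}

// Incremental decoder: feed it chunks as they arrive; it returns every
// complete frame and keeps any partial frame for the next call.
function createFrameDecoder() {
  let pending = Buffer.alloc(0);
  return function feed(chunk) {
    pending = Buffer.concat([pending, chunk]);
    const frames = [];
    while (pending.length >= 5) {
      const length = pending.readUInt32BE(1);
      if (pending.length < 5 + length) break; // frame not complete yet
      frames.push({ type: pending[0], payload: pending.subarray(5, 5 + length) });
      pending = pending.subarray(5 + length);
    }
    return frames;
  };
}
```

Because the receiver always knows how many payload bytes belong to the current frame, it also knows exactly where a file ends and the next protobuf message begins.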
You don't state what comms you are using, so I can't be more specific.
By the way, another approach would be to treat a file as a protobuf message with a byte[] member, though that is mainly suitable for small files.