Preon encode() does not fill the remaining bits up to the byte boundary

I have a message that encodes a variable-length sequence of 7-bit characters. Unfortunately, those characters are stored in the message packed as 7 bits each, which means the last byte of the message is not necessarily aligned to a byte boundary.
Decoding a message with Preon works fine, but when encoding the previously decoded message with Preon and comparing the byte arrays, the arrays do not match in length.
The encoded byte array is one byte smaller than the original one.
I debugged Preon because I suspected a bug, but it works as designed: when a write ends off a byte boundary, Preon holds the remaining bits until the next write() call to the BitChannel occurs. But for the last byte there is no further call.
The question is, is there a way to tell Preon to flush the remaining buffer?
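The buffering behaviour described above can be sketched with a minimal bit writer (Python used for illustration; Preon's actual BitChannel is a Java interface, and whether it exposes an explicit flush depends on the version):

```python
class BitWriter:
    """Minimal sketch of a bit channel that buffers a partial byte."""
    def __init__(self):
        self.out = bytearray()
        self.acc = 0      # bits accumulated so far
        self.nbits = 0    # how many bits are in acc

    def write(self, value: int, nbits: int):
        for i in reversed(range(nbits)):   # MSB-first
            self.acc = (self.acc << 1) | ((value >> i) & 1)
            self.nbits += 1
            if self.nbits == 8:            # full byte reached: emit it
                self.out.append(self.acc)
                self.acc, self.nbits = 0, 0

    def flush(self):
        # Pad the dangling bits with zeros so the last partial byte is emitted.
        if self.nbits:
            self.out.append(self.acc << (8 - self.nbits))
            self.acc, self.nbits = 0, 0

w = BitWriter()
for ch in b'Hi!':        # three 7-bit characters = 21 bits
    w.write(ch, 7)
print(len(w.out))        # 2 -- the last 5 bits are still buffered
w.flush()
print(len(w.out))        # 3 -- flush emits the final, padded byte
```

Without the explicit flush() the output stays one byte short, which matches the length mismatch described above.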

Related

I have attempted to supply hand-written shellcode, but it is being read as a string rather than as bytes. What next?

How do I get "\x90" to be read as the byte value corresponding to the x86 NOP instruction when supplied as a field within the standard argument list in Linux? I have a buffer being filled all the way to 10, with the next 8 bytes then overwritten with the new return address, at least that is the intent. Because the supplied byte sequence is read as characters rather than as bytes, I do not know how to fix this. What next?
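A common way to keep the payload as raw bytes is to build it from bytes literals rather than str. A sketch under stated assumptions: the offset (10) is taken from the question, while the return address here is a made-up placeholder, not a real target:

```python
import struct

NOP = b"\x90"                    # x86 NOP opcode as a raw byte, not text
ret_addr = 0x7FFFFFFFDEAD        # hypothetical address for illustration only

payload = NOP * 10                        # fill the buffer up to the overflow point
payload += struct.pack("<Q", ret_addr)    # 8-byte little-endian return address

# payload is a bytes object, so "\x90" stays the single byte 0x90,
# never the four characters backslash-x-9-0.
print(payload.hex())
```

Passing such a payload on a command line typically goes through something like `subprocess` with the bytes written to stdin or an argument, so it never round-trips through a text encoding.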

python3 bytes construct adding spurious bytes

I am new to python 3. I am sending bytes across the wire.
When I send s.send(b'\x8f\x35\x4a\x5f') and look at the trace, I only see 5f4a358f.
However, if I create a variable:
test=(['\x8f\x35\x4a\x5f'])
print(str(''.join(test).encode()))
I receive b'\xc2\x8f5J_'
As you can see, there is an extra byte \xc2.
My question is two-fold:
1) Why, when using str.encode() on a string whose characters are already the "encoded" values, is an extra byte \xc2 added, whereas the literal byte string b'\x8f\x35\x4a\x5f' gets no extra encoding?
2) If I am passing bytes into a variable used as a buffer to send data across a socket, how does one create and send a set of literal bytes (e.g. a b'' literal) programmatically so that no \xc2 byte is added on the wire?
Thank you all for your time! I really appreciate the help.
Because it's not encoded; it's text consisting of U+008F U+0035 U+004A U+005F. And then, when you encode it (as UTF-8, by default), the extra byte is added. Either use bytes in the first place, or encode as Latin-1. But use bytes.
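The difference is easy to see in the REPL; a short sketch:

```python
s = '\x8f\x35\x4a\x5f'          # a str: code points U+008F U+0035 U+004A U+005F

# U+008F is outside ASCII, so UTF-8 needs two bytes for it:
assert s.encode('utf-8') == b'\xc2\x8f5J_'

# Latin-1 maps U+0000..U+00FF one-to-one onto bytes, so no extra byte:
assert s.encode('latin-1') == b'\x8f5J_'

# Cleanest fix: use a bytes literal in the first place and skip encoding.
assert b'\x8f\x35\x4a\x5f' == b'\x8f5J_'
```

The last assertion also shows why the trace prints `5J_`: \x35, \x4a and \x5f are just the printable characters 5, J and _.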

problems sending bytes greater than 0x7F over the python3 serial port

I'm working with Python 3 and cannot find an answer to my little problem.
My problem is sending a byte greater than 0x7F over the serial port from my Raspberry Pi.
example:
import serial
ser=serial.Serial("/dev/ttyAMA0")
a=0x7F
ser.write(bytes(chr(a), 'UTF-8'))
works fine! The receiver gets 0x7F
if a equals 0x80
a=0x80
ser.write(bytes(chr(a), 'UTF-8'))
the receiver gets two bytes: 0xC2 0x80
If I change the encoding to UTF-16, the receiver reads
0xFF 0xFE 0x80 0x00
The receiver should get only 0x80!
What's wrong? Thanks for your answers.
The UTF-8 specification says that words that are 1 byte/octet long start with a 0 bit. Because 0x80 is "10000000" in binary, it must be encoded as the two-byte sequence 0xC2 0x80, "11000010 10000000" (2 bytes/octets). 0x7F is 01111111, so when reading it, the decoder knows it is only 1 byte/octet long.
UTF-16 says that all words are represented as 2 bytes/octets and has a Byte Order Mark which essentially tells the reader which one is the most significant octet (i.e. the endianness).
Check on UTF-8 for full specifications, but essentially you are moving from the end of the 1 byte range, to the start of the 2 byte range.
I don't understand why you want to send your own custom 1-byte words, but what you are really looking for is any SBCS (Single Byte Character Set) which has a character for those bytes you specify. UTF-8/UTF-16 are MBCS, which means when you encode a character, it may give you more than a single byte.
Before UTF-? came along, everything was SBCS, which meant that any code page you selected was coded using 8-bits. The problem arose when 256 characters were not enough, and they had to make code pages like IBM273 (IBM EBCDIC Germany) and ISO-8859-1 (ANSI Latin 1; Western European) to interpret what "0x2C" meant. Both the sender and receiver needed to set their code page identifier to the same, or they wouldn't understand each other. There is further confusion because these SBCS code pages don't always use the full 256 characters, so "0x7F" may not even exist / have a meaning.
What you could do is encode it to something like codepage 737/IBM 00737, send the "Α" (Greek Alpha) character and it should encode it as 0x80.
If it doesn't work, I'm not sure whether you can send the raw byte through pySerial, as the write() method seems to require an encoding; you may need to look into the source code to see the lower-level details.
a=0x80
ser.write(bytes(chr(a), 'ISO-8859-1'))
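For what it's worth, in Python 3 pySerial's write() takes a bytes-like object, so the text-encoding detour can be skipped entirely with ser.write(bytes([a])). A minimal sketch of the three options, without the serial port:

```python
a = 0x80

# chr(a) is the single code point U+0080; UTF-8 encodes it as two bytes:
assert bytes(chr(a), 'utf-8') == b'\xc2\x80'

# ISO-8859-1 (Latin-1) maps U+0000..U+00FF one-to-one onto single bytes:
assert bytes(chr(a), 'iso-8859-1') == b'\x80'

# Simplest: build the byte directly and pass that to ser.write().
assert bytes([a]) == b'\x80'
```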

Dealing with Padding / Stuff Bits in Entropy-Encoded JPEG

When decoding entropy-encoded DC values in JPEG (or the entropy-encoded prediction differences in lossless JPEG), how do I distinguish between 1-bits that have been stuffed to pad a byte before a marker and a Huffman-coded value?
For example if I see:
0xAF 0xFF 0xD9
and I have already consumed the bits in [0xA], how can I tell whether the next 0xF is padding or should be decoded?
This is from the JPEG spec:

F.1.2.3 Byte stuffing
In order to provide code space for marker codes which can be located in the compressed image data without decoding, byte stuffing is used.
Whenever, in the course of normal encoding, the byte value X’FF’ is created in the code string, a X’00’ byte is stuffed into the code string. If a X’00’ byte is detected after a X’FF’ byte, the decoder must discard it. If the byte is not zero, a marker has been detected, and shall be interpreted to the extent needed to complete the decoding of the scan.
Byte alignment of markers is achieved by padding incomplete bytes with 1-bits. If padding with 1-bits creates a X’FF’ value, a zero byte is stuffed before adding the marker.
There are only two possibilities for an FF value in the compressed data stream.
Restart Marker; or
FF00 representing FF.
If you are decoding a stream, you will know from the restart interval when to expect a restart marker. When you hit the point in decoding where you should find a restart marker, you discard the remaining bits in the current byte.
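The byte-level unstuffing a decoder performs can be sketched as follows (the tuple representation is illustrative, and the sketch assumes well-formed input that never ends on a bare 0xFF):

```python
def unstuff(data: bytes):
    """Split entropy-coded data into coded bytes and markers.

    Yields ('byte', value) for coded bytes (FF00 collapses to FF) and
    ('marker', value) for FFxx markers such as restart markers or EOI.
    """
    i = 0
    out = []
    while i < len(data):
        b = data[i]
        if b == 0xFF:
            nxt = data[i + 1]
            if nxt == 0x00:
                out.append(('byte', 0xFF))   # stuffed zero byte: discard it
            else:
                out.append(('marker', nxt))  # restart marker, EOI, etc.
            i += 2
        else:
            out.append(('byte', b))
            i += 1
    return out

# 0xAF is ordinary coded data; FFD9 is the EOI marker, so any leftover
# bits in 0xAF before it are 1-bit padding and are simply dropped.
print(unstuff(b'\xAF\xFF\xD9'))
# -> [('byte', 175), ('marker', 217)]
```

This matches the example above: once the marker is recognized, the remaining bits of the current byte are discarded rather than decoded.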

PKCS#5 padding

I have texts of 20 octets and of 32 octets. So the first one is a complete 16-byte block, and the 32-octet one is 26 bytes of actual data. When I encrypt the file using AES-CBC mode, no padding is applied to the first one, but padding is applied to the second: the number of zeros needed to make it 32, i.e. the 32nd byte will be 5 and the bytes before it are zeros. When I encrypt the file with the key, I get some ciphertext.
My question is: since bytes 27-31 are zeros, when the text is encrypted, should the algorithm give me the same ciphertext for bytes 27-31? And how will I know that zeros were added and that 5 is the 32nd byte, given that the values are encrypted?
Correct me if I am wrong.
According to RFC 2898, which defines PKCS#5 padding, every padding byte contains the length of the padding in bytes. Therefore, if you read the last byte of the last decrypted block you received, you know how many padding bytes you can discard.
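A sketch of the scheme (PKCS#5/#7 style: every padding byte equals the pad length, which differs from the zero-padding described in the question):

```python
def pad(data: bytes, block_size: int = 16) -> bytes:
    # Pad length is 1..block_size; a full extra block of padding is
    # added when the data already ends on a block boundary.
    n = block_size - len(data) % block_size
    return data + bytes([n]) * n

def unpad(data: bytes) -> bytes:
    n = data[-1]                 # last byte says how many bytes to strip
    return data[:-n]

msg = b'A' * 26                  # 26 octets, as in the question
padded = pad(msg)
print(len(padded), padded[-6:])  # 32  b'\x06\x06\x06\x06\x06\x06'
assert unpad(padded) == msg
```

Note that the ciphertext of the padded block looks completely random either way: CBC mixes the previous ciphertext block into each plaintext block before encryption, so identical plaintext bytes do not produce identical ciphertext bytes.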
