Dealing with Padding / Stuff Bits in Entropy-Coded JPEG

When decoding entropy encoded DC values in JPEG (or the entropy encoded prediction differences in lossless JPEG), how do I distinguish between 1 bits that have been stuffed to pad a byte before a marker and a Huffman coded value?
For example if I see:
0xAF 0xFF 0xD9
and I have already consumed the bits in [0xA], how can I tell if the next 0xF is padded or should be decoded?
This is from the JPEG Spec:
F.1.2.3 Byte stuffing
In order to provide code space for marker codes which can be located in the compressed image data without decoding, byte stuffing is used. Whenever, in the course of normal encoding, the byte value X'FF' is created in the code string, a X'00' byte is stuffed into the code string. If a X'00' byte is detected after a X'FF' byte, the decoder must discard it. If the byte is not zero, a marker has been detected, and shall be interpreted to the extent needed to complete the decoding of the scan. Byte alignment of markers is achieved by padding incomplete bytes with 1-bits. If padding with 1-bits creates a X'FF' value, a zero byte is stuffed before adding the marker.

There are only two possibilities for an FF value in the compressed data stream:
a restart marker; or
FF 00, representing a literal FF data byte.
If you are decoding a stream, you will know from the restart interval when to expect a restart marker. When you hit the point in decoding where you should find a restart marker, you discard the remaining bits in the current byte.
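At the byte level, the stuffing rule can be sketched as a tiny helper (hypothetical, not taken from any real decoder; it assumes the scan data is already in memory):

```java
// Sketch: walks the entropy-coded bytes of a scan, discarding the stuffed
// 0x00 after 0xFF and stopping when a real marker (0xFF followed by a
// non-zero byte) is reached.
public class StuffedByteReader {
    private final byte[] data;
    private int pos = 0;
    private int marker = -1;   // second marker byte once a marker is seen

    public StuffedByteReader(byte[] data) { this.data = data; }

    /** Returns the next entropy-coded byte, or -1 at a marker or end of data. */
    public int nextByte() {
        if (marker >= 0 || pos >= data.length) return -1;
        int b = data[pos++] & 0xFF;
        if (b == 0xFF) {
            int next = pos < data.length ? data[pos++] & 0xFF : -1;
            if (next == 0x00) return 0xFF;  // stuffed zero: literal 0xFF data byte
            marker = next;                  // real marker (e.g. 0xD9 = EOI)
            return -1;
        }
        return b;
    }

    /** The marker code that ended the scan, or -1 if none seen yet. */
    public int marker() { return marker; }
}
```

For the bytes 0xAF 0xFF 0xD9 above, this yields 0xAF and then reports the EOI marker 0xD9; any padding 1-bits left in 0xAF simply go unconsumed once the decoder has emitted the expected number of data units.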

ITU T.87 JPEG LS Standard and sample .jls SOS encoded streams have no escape sequence 0xFF 0x00

ITU T.81 states the following:
B.1.1.2 Markers
Markers serve to identify the various structural parts of the compressed data formats. Most markers start marker segments containing a related group of parameters; some markers stand alone. All markers are assigned two-byte codes: an X'FF' byte followed by a byte which is not equal to 0 or X'FF' (see Table B.1). Any marker may optionally be preceded by any number of fill bytes, which are bytes assigned code X'FF'.
NOTE – Because of this special code-assignment structure, markers make it possible for a decoder to parse the compressed data and locate its various parts without having to decode other segments of image data.
B.1.1.5 Entropy-coded data segments
An entropy-coded data segment contains the output of an entropy-coding procedure. It consists of an integer number of bytes, whether the entropy-coding procedure used is Huffman or arithmetic.
NOTES
(1) Making entropy-coded segments an integer number of bytes is performed as follows: for Huffman coding, 1-bits are used, if necessary, to pad the end of the compressed data to complete the final byte of a segment. For arithmetic coding, byte alignment is performed in the procedure which terminates the entropy-coded segment (see D.1.8).
(2) In order to ensure that a marker does not occur within an entropy-coded segment, any X'FF' byte generated by either a Huffman or arithmetic encoder, or an X'FF' byte that was generated by the padding of 1-bits described in NOTE 1 above, is followed by a "stuffed" zero byte (see D.1.6 and F.1.2.3).
The well-known Stuff_0() function is named in many other places as well.
I am not sure where standard ITU T.87 stands with regard to the encoding escape sequence 0xFF 0x00 specified by standard ITU T.81:
Either standard ITU T.87 itself does not specify this but expects it, in which case the standard test samples are incorrectly formed, since they clearly do not contain the encoding escape sequence 0xFF 0x00 in their encoded streams. For example, 0xFF 0x7F, 0xFF 0x2F, and other sequences can be found in the encoded streams of the .jls test samples, namely "T8C0E3.JLS", and no one has noticed it in all these years;
Or standard ITU T.87 actually overrides ITU T.81 regarding this rule for encoded streams and does not allow encoding of the escape sequence.
In a decoder we could add logic that treats the byte after 0xFF as data (rather than skipping it) when it is not 0x00 and the component is not fully decoded. But if a .jls file does not use the escape sequence and we encounter the sequence 0xFF 0x00, should we skip the 0x00 byte or not?
I would like some clarification on standard ITU T.87 JPEG-LS encoding and what the correct procedure is. Should we, or shouldn't we, encode the escape sequence 0xFF 0x00 in encoded streams?
The answer:
ITU T.87, Annex A, point A.1 "Coding parameters and compressed image data", pass 3:
Marker segments are inserted in the data stream as specified in Annex D. In order to provide for easy detection of marker segments, a single byte with the value X'FF' in a coded image data segment shall be followed with the insertion of a single bit '0'. This inserted bit shall occupy the most significant bit of the next byte. If the X'FF' byte is followed by a single bit '1', then the decoder shall treat the byte which follows as the second byte of a marker, and process it in accordance with Annex C. If a '0' bit was inserted by the encoder, the decoder shall discard the inserted bit, which does not form part of the data stream to be decoded.
NOTE 2 – This marker segment detection procedure differs from the one specified in CCITT Rec. T.81 | ISO/IEC 10918-1.
JPEG-LS (T.87) thus overrides the T.81 JPEG standard: in a JPEG-LS encoded data stream, a 0xFF byte within coded data is always followed by a byte with a value between 0x00 and 0x7F (inclusive).
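The difference is easy to see in a small bit-reader sketch (the class name is mine; it assumes the coded segment is already in memory): after any 0xFF byte, only the low seven bits of the following byte carry data, because the most significant bit is the inserted '0'.

```java
// Sketch of the T.87 rule: a byte that follows 0xFF contributes only
// 7 data bits, since its MSB is the bit the encoder inserted.
public class JpegLsBitReader {
    private final byte[] data;
    private int bytePos = 0;
    private int bitPos = 0;            // bits already consumed from the current byte
    private boolean prevWasFF = false;

    public JpegLsBitReader(byte[] data) { this.data = data; }

    /** Reads one data bit, MSB first, skipping the bit inserted after 0xFF. */
    public int readBit() {
        int bitsInByte = prevWasFF ? 7 : 8;  // only 7 data bits follow an 0xFF byte
        if (bitPos == bitsInByte) {
            prevWasFF = (data[bytePos] & 0xFF) == 0xFF;
            bytePos++;
            bitPos = 0;
            bitsInByte = prevWasFF ? 7 : 8;
        }
        int b = data[bytePos] & 0xFF;
        int shift = bitsInByte - 1 - bitPos; // after 0xFF, data starts at bit 6
        bitPos++;
        return (b >> shift) & 1;
    }
}
```

A full decoder would additionally check whether the byte after 0xFF has its MSB set; if so, that byte is the second byte of a marker and must be handed to the marker parser instead of being read as bits.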

JPEG SOS specification

I am parsing a JPG in java byte by byte. I am then writing same image byte by byte, and I have come across an oddity. I have tried looking at the spec but I see no reference.
At the end of the SOS section there are three bytes that most sources say to 'skip'. But if I write 0x00 0x00 0x00 then JavaFX complains about an invalid value. If I write 0x00 0x3f 0x00 then there is no complaint. (The three-byte sequence is what was produced by GIMP in the original file.)
I came across an indirect reference to this in the GoLang repo
// - the bytes "\x00\x3f\x00". Section B.2.3 of the spec says that for
// sequential DCTs, those bytes (8-bit Ss, 8-bit Se, 4-bit Ah, 4-bit Al)
// should be 0x00, 0x3f, 0x00<<4 | 0x00.
My question is should I just write 0x3f at this position, or does the value depend upon something else?
In a sequential JPEG scan this value has no meaning. The standard says to set it to 63, but that tells the decoder nothing, since you have to process all 64 DCT coefficients in a sequential scan anyway.
In a progressive scan this value means A LOT.
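Since the values are fixed for a sequential scan, the tail of the SOS header can simply be written as constants (a sketch; the class and method names are mine):

```java
// Sketch: builds the last three bytes of a baseline (sequential) SOS header,
// per section B.2.3: Ss = 0, Se = 63, and Ah/Al packed into one byte.
public class SosTail {
    public static byte[] sequentialScanTail() {
        byte ss = 0;                       // spectral selection start
        byte se = 63;                      // spectral selection end (0x3f, all 64 coefficients)
        byte ahAl = (byte) ((0 << 4) | 0); // successive approximation high/low nibbles
        return new byte[] { ss, se, ahAl };
    }
}
```

For a progressive scan, these bytes instead select the coefficient band (Ss..Se) and the successive-approximation bit positions, so they must be computed per scan.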

Problems with SHA 2 Hashing and Java

I am working on implementing the SHA-2 cryptographic functions as described at https://en.wikipedia.org/wiki/SHA-2.
I am examining the lines that say:
begin with the original message of length L bits
append a single '1' bit
append K '0' bits, where K is the minimum number >= 0 such that L + 1 + K + 64 is a multiple of 512
append L as a 64-bit big-endian integer, making the total post-processed length a multiple of 512 bits
I do not understand the last two lines. If my string is short, can its length after adding K '0' bits be 512? How should I implement this in Java code?
First of all, it should be made clear that the "string" that is talked about is not a Java String but a bit string. These algorithms are binary/bit based. The implementation will generally not handle bits but bytes. So there is a translation phase where you should see bytes instead of bits.
SHA-2 operates on blocks of 512 bits (SHA-224/256) or 1024 bits (SHA-384/512). So basically you have a 64- or 128-byte buffer that you are filling before operating on it. You could also directly cache the data in 32-bit int fields (SHA-224/256) or 64-bit long fields (SHA-384/512), as that is the word size that is operated on.
Now the padding is a relatively simple procedure, called bit-padding. As it is used in big-endian mode (SHA-2 fortunately uses this instead of the braindead little-endian mode in SHA-3), the padding consists of a single bit set on the highest-order bit of a byte, with the rest filled by zeros. That makes for a value of (byte) 0x80, which must be put in the buffer.
If you cannot place this padding byte because the buffer is full, then you will have to process the previous block and then set the first byte of the now-available buffer to (byte) 0x80. In newer Java you can also write (byte) 0b1000_0000, by the way, which is more explicit.
Now you simply add zeros until you have 8 or 16 bytes left, depending on the hash output size used. If there aren't enough bytes, then fill to the end, process the block, and restart filling with zero bytes until you have 8 or 16 bytes left again.
Now, finally, you have to encode the number of bits in those 8 or 16 bytes you have left. So multiply your input length by eight, and make sure you encode those bytes as you would expect in Java, with the least significant bits as far to the right as possible. You might want to use https://docs.oracle.com/javase/8/docs/api/java/nio/ByteBuffer.html#putLong-long- for this if you don't want to program it yourself. You can probably forget about anything over 2^56 bytes anyway, so if you have SHA-384/SHA-512 then simply set the first eight bytes to zero.
And that's it, except that you still need to process that last block and then use as many bytes from the left as required for your particular output size.
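The steps above can be sketched for SHA-256 (64-byte blocks, 8-byte length field) under the simplifying assumption that the whole message is in memory rather than streamed; the class name is mine:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class Sha256Padding {
    /** Returns the message with 0x80, zero bytes, and the 64-bit bit length appended. */
    public static byte[] pad(byte[] message) {
        long bitLength = (long) message.length * 8;
        // Room for the 0x80 byte and the 8-byte length, rounded up to 64-byte blocks.
        int paddedLen = ((message.length + 1 + 8 + 63) / 64) * 64;
        byte[] out = Arrays.copyOf(message, paddedLen);   // tail is zero-filled
        out[message.length] = (byte) 0x80;                // the single '1' bit
        ByteBuffer.wrap(out, paddedLen - 8, 8).putLong(bitLength); // big-endian length
        return out;
    }
}
```

An empty message pads to exactly one 64-byte block; a 56-byte message no longer leaves room for the length field in its own block and pads to 128 bytes.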

Preon encode() does not fill up remaining bits until the byte boundary is reached

I have a message in which a variable-length run of 7-bit characters is encoded. Unfortunately those characters are stored in the message as 7 bits each, which means the last byte of the message is not necessarily aligned to a byte boundary.
Decoding a message with Preon works fine, but when encoding the previously decoded message with Preon and comparing the byte arrays, the arrays do not match in length.
The encoded byte array is one byte smaller than the original one.
I debugged Preon because I assumed a bug, but it works as designed. When a byte boundary is reached, Preon stores the remaining bits until the next write() call to the BitChannel occurs. But for the last byte there is no further call.
The question is, is there a way to tell Preon to flush the remaining buffer?

PKCS#5 padding

I have text of 20 octets and of 32 octets. So the first one is a complete 16-byte block and 32 octets is 26 bytes. When I encrypt the file using AES-CBC mode, padding will not be done for the first one, but padding will be done for the second one, with the number of zeros needed to make it 32. I.e., the 32nd byte will be 5 and the rest of them are zeros. When I encrypted the file with the key, I got some ciphertext.
My question is: since bytes 27-31 are zeros, when the text is encrypted, should the algorithm give me the same ciphertext for bytes 27-31? Or how will I know that the zeros were added and that 5 is the 32nd byte, since the value is encrypted?
Correct me if I am wrong.
According to RFC 2898, which defines PKCS#5 padding, each padding byte contains the length of the padding (in bytes). Therefore, if you read the last byte of the last decrypted block, you will find how many padding bytes to discard. Note that the padding is not zeros followed by a length byte: a 27-byte plaintext padded to 32 bytes ends in five bytes that each have the value 5.
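A sketch of that rule (strictly speaking this is the PKCS#7 generalization, which is what Java's "PKCS5Padding" cipher suffix implements for AES's 16-byte blocks; the class name is mine):

```java
import java.util.Arrays;

public class Pkcs5 {
    /** Appends padding: every pad byte holds the pad length (1..blockSize). */
    public static byte[] pad(byte[] input, int blockSize) {
        int padLen = blockSize - (input.length % blockSize);
        byte[] out = Arrays.copyOf(input, input.length + padLen);
        Arrays.fill(out, input.length, out.length, (byte) padLen);
        return out;
    }

    /** Strips padding after decryption by reading the last byte. */
    public static byte[] unpad(byte[] padded) {
        int padLen = padded[padded.length - 1] & 0xFF;
        return Arrays.copyOf(padded, padded.length - padLen);
    }
}
```

Note that an input that is already block-aligned gains a full extra block of padding, so the decryptor can always trust the last byte.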
