JPEG SOS specification - jpeg

I am parsing a JPG in Java byte by byte and then writing the same image back out byte by byte, and I have come across an oddity. I have looked at the spec but cannot find a reference to it.
At the end of the SOS segment header there are three bytes that most sources say to 'skip'. But if I write 0x00, 0x00, 0x00, then Java(FX) complains about an invalid value. If I write 0x00, 0x3f, 0x00, there is no complaint. (That three-byte sequence is what GIMP produced in the original file.)
I came across an indirect reference to this in the Go repo:
// - the bytes "\x00\x3f\x00". Section B.2.3 of the spec says that for
// sequential DCTs, those bytes (8-bit Ss, 8-bit Se, 4-bit Ah, 4-bit Al)
// should be 0x00, 0x3f, 0x00<<4 | 0x00.
My question is should I just write 0x3f at this position, or does the value depend upon something else?

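In a sequential JPEG scan these values carry no information. The standard says to set Ss to 0, Se to 63 (0x3f), and Ah and Al to 0, but that tells the decoder nothing: it has to process all 64 DCT coefficients in a sequential scan anyway. So for a sequential image, always write 0x00, 0x3f, 0x00.
In a progressive scan these values mean A LOT: Ss and Se select the band of coefficients coded in the scan, and Ah and Al control successive approximation.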
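As a rough illustration, the three trailing bytes of the SOS header (Ss, Se, and the packed Ah/Al nibbles named in the Go comment quoted in the question) could be built like this. The function name `sos_spectral_bytes` is made up for this sketch and is not part of any library.

```python
def sos_spectral_bytes(ss=0, se=63, ah=0, al=0):
    """Return the last three bytes of an SOS header: Ss, Se, (Ah << 4) | Al.

    The defaults are the values required for a sequential DCT scan.
    """
    if not (0 <= ss <= 63 and 0 <= se <= 63):
        raise ValueError("Ss and Se must be in 0..63")
    if not (0 <= ah <= 15 and 0 <= al <= 15):
        raise ValueError("Ah and Al are 4-bit fields")
    return bytes([ss, se, (ah << 4) | al])


# Sequential scan: always 00 3f 00.
print(sos_spectral_bytes().hex())
```

For a progressive file the same three bytes would have to be copied from (or computed for) each individual scan rather than hard-coded.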

Related

I have attempted to supply hand written shellcode, but it is being read as a string and not as bytes, what next?

How do I get "\x90" to be read as the byte value corresponding to the x86 NOP instruction when supplied as a field within the standard argument list in Linux? I have a buffer being stuffed all the way to 10 and then being overwritten into the next 8 bytes with the new return address, at least so I would like. Because the byte sequence being supplied is not read as a byte sequence but rather as characters, I do not know how to fix this. What next?

ITU T.87 JPEG LS Standard and sample .jls SOS encoded streams have no escape sequence 0xFF 0x00

ITU T.81 states the following:
B.1.1.2 Markers
Markers serve to identify the various structural parts of the compressed data formats. Most markers start marker segments containing a related group of parameters; some markers stand alone. All markers are assigned two-byte codes: an X’FF’ byte followed by a byte which is not equal to 0 or X’FF’ (see Table B.1). Any marker may optionally be preceded by any number of fill bytes, which are bytes assigned code X’FF’.
NOTE – Because of this special code-assignment structure, markers make it possible for a decoder to parse the compressed data and locate its various parts without having to decode other segments of image data.
B.1.1.5 Entropy-coded data segments
An entropy-coded data segment contains the output of an entropy-coding procedure. It consists of an integer number of bytes, whether the entropy-coding procedure used is Huffman or arithmetic.
NOTES
(1) Making entropy-coded segments an integer number of bytes is performed as follows: for Huffman coding, 1-bits are used, if necessary, to pad the end of the compressed data to complete the final byte of a segment. For arithmetic coding, byte alignment is performed in the procedure which terminates the entropy-coded segment (see D.1.8).
(2) In order to ensure that a marker does not occur within an entropy-coded segment, any X’FF’ byte generated by either a Huffman or arithmetic encoder, or an X’FF’ byte that was generated by the padding of 1-bits described in NOTE 1 above, is followed by a “stuffed” zero byte (see D.1.6 and F.1.2.3).
The well-known Stuff_0() function is also named in many other places in the standard.
I am not sure where ITU T.87 stands with regard to the 0xFF 0x00 escape sequence specified by ITU T.81. I see three possibilities:
ITU T.87 itself does not specify this but expects it;
or the standard test samples are incorrectly formed, since they clearly do not have the 0xFF 0x00 escape sequence in their encoded streams (for example, 0xFF 0x7F, 0xFF 0x2F, and other sequences can be found in the encoded streams of the .jls test samples, namely "T8C0E3.JLS"), and no one has noticed in all these years;
or ITU T.87 actually overrides ITU T.81 regarding this rule for encoded streams and does not allow the escape sequence to be encoded.
In the decoder we could add logic so that when 0xFF is followed by a byte other than 0x00, that byte is used rather than skipped if the component is not yet fully decoded. But if a .jls file does not use the escape sequence and we encounter a 0xFF 0x00 sequence, should we skip the 0x00 byte or not?
I would like some clarification on ITU T.87 JPEG-LS encoding and what the correct procedure is. Should we, or shouldn't we, encode the escape sequence 0xFF 0x00 in encoded streams?
The answer:
ITU T.87 - ANNEX A - point A1 Coding parameters and compressed image data - pass 3
Marker segments are inserted in the data stream as specified in Annex
D. In order to provide for easy detection of marker segments, a single
byte with the value X'FF' in a coded image data segment shall be
followed with the insertion of a single bit '0'. This inserted bit
shall occupy the most significant bit of the next byte. If the X'FF'
byte is followed by a single bit '1', then the decoder shall treat the
byte which follows as the second byte of a marker, and process it in
accordance with Annex C. If a '0' bit was inserted by the encoder, the
decoder shall discard the inserted bit, which does not form part of
the data stream to be decoded.
NOTE 2 – This marker segment detection
procedure differs from the one specified in CCITT Rec. T.81 | ISO/IEC
10918-1.
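JPEG-LS (T.87) therefore overrides the T.81 rule: in the encoded data stream, a 0xFF byte is followed by a byte with a value between 0x00 and 0x7F (inclusive), whose most significant bit is the stuffed '0' bit and whose remaining seven bits are data. So 0xFF 0x7F and 0xFF 0x2F in the test samples are valid, and no 0xFF 0x00 byte escape should be written.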
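The bit-stuffing rule quoted from Annex A can be sketched as a small bit extractor. This is only an illustration under simplifying assumptions: the function name `jpegls_payload_bits` is invented here, the whole segment is held in memory, and a trailing lone 0xFF is treated as data.

```python
def jpegls_payload_bits(data):
    """Return the data bits of a JPEG-LS coded segment as a list of 0/1.

    After a 0xFF data byte, the top bit of the next byte is a stuffed 0
    and is dropped. Reading stops at the first marker, i.e. 0xFF followed
    by a byte whose most significant bit is 1.
    """
    bits = []
    after_ff = False
    for i, byte in enumerate(data):
        if byte == 0xFF and i + 1 < len(data) and data[i + 1] & 0x80:
            break  # 0xFF followed by a '1' bit: this is a marker
        nbits = 7 if after_ff else 8  # skip the stuffed bit after 0xFF
        for shift in range(nbits - 1, -1, -1):
            bits.append((byte >> shift) & 1)
        after_ff = byte == 0xFF
    return bits


# 0xFF 0x7F is ordinary data here: 8 one-bits, then 7 one-bits.
print(len(jpegls_payload_bits(bytes([0xFF, 0x7F, 0xFF, 0xD9]))))
```

Note that a 0xFF 0x00 pair is handled by the same rule: the top bit of 0x00 is discarded and the remaining seven zero bits are data, which differs from T.81, where the whole 0x00 byte is discarded.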

problems sending bytes greater 0x7F python3 serial port

I'm working with Python 3 and cannot find an answer to my little problem.
My problem is sending a byte greater than 0x7F over the serial port with my Raspberry Pi.
Example:
import serial
ser=serial.Serial("/dev/ttyAMA0")
a=0x7F
ser.write(bytes(chr(a), 'UTF-8'))
works fine! The receiver gets 0x7F
if a equals 0x80
a=0x80
ser.write(bytes(chr(a), 'UTF-8'))
the receiver gets two bytes: 0xC2 0x80
If I change the encoding to UTF-16, the receiver reads
0xFF 0xFE 0x80 0x00
The receiver should get only 0x80!
What's wrong? Thanks for your answers.
The UTF-8 specification says that characters encoded as a single byte/octet start with a 0 bit. Because 0x80 is "10000000" in binary, it has to be encoded as two bytes/octets, "11000010 10000000" (0xC2 0x80). 0x7F is 01111111, so a reader knows it is only 1 byte/octet long.
UTF-16 represents characters as 2-byte/octet units and has a Byte Order Mark (the 0xFF 0xFE you received), which essentially tells the reader which octet is the most significant (the endianness).
Check the UTF-8 specification for full details, but essentially you are moving from the end of the 1-byte range to the start of the 2-byte range.
I don't understand why you want to send your own custom 1-byte words, but what you are really looking for is any SBCS (Single Byte Character Set) which has a character for those bytes you specify. UTF-8/UTF-16 are MBCS, which means when you encode a character, it may give you more than a single byte.
Before UTF-? came along, everything was SBCS, which meant that any code page you selected was coded using 8-bits. The problem arose when 256 characters were not enough, and they had to make code pages like IBM273 (IBM EBCDIC Germany) and ISO-8859-1 (ANSI Latin 1; Western European) to interpret what "0x2C" meant. Both the sender and receiver needed to set their code page identifier to the same, or they wouldn't understand each other. There is further confusion because these SBCS code pages don't always use the full 256 characters, so "0x7F" may not even exist / have a meaning.
What you could do is encode it to something like codepage 737/IBM 00737, send the "Α" (Greek Alpha) character and it should encode it as 0x80.
If that doesn't work, note that it is bytes(), not pyserial's write() method, that requires an encoding: write() takes a bytes object, so any single-byte encoding that maps the character to 0x80 will do. ISO-8859-1 maps code points 0 to 255 directly to the same byte values:
a=0x80
ser.write(bytes(chr(a), 'ISO-8859-1'))
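The difference between the encodings can be checked without a serial port. The sketch below only builds the byte strings; the `bytes([value])` form is an alternative that is not in the original answer and skips text encoding entirely.

```python
value = 0x80

as_utf8 = bytes(chr(value), "utf-8")        # two bytes: lead byte + continuation
as_latin1 = bytes(chr(value), "iso-8859-1")  # one byte, same value as the code point
as_raw = bytes([value])                      # one byte built directly from the integer

print(as_utf8.hex(), as_latin1.hex(), as_raw.hex())
```

Either of the one-byte results could then be passed to `ser.write(...)`, since pyserial's `write()` accepts a bytes object.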

x64 opcodes and scaled byte index

I think I'm getting the ModR/M byte down but I'm still confused by the effective memory address/scaled indexing byte. I'm looking at these sites: http://www.sandpile.org/x86/opc_rm.htm, http://wiki.osdev.org/X86-64_Instruction_Encoding. Can someone encode an example with the destination address being in a register where the SIB is used? Say, for example, adding an 8-bit register to an address in an 8-bit register with SIB used?
Also when I use the ModR/M byte of 0x05 is that (*) relative to the current instruction pointer? Is it 32 or 64 bits when in 64 bit mode?
Is the SIB always used with a source or destination address?
A memory address is never in an 8-bit register, but here's an example of using SIB:
add byte [rax + rdx], 1
This is an instance of add rm8, imm8, 80 /0 ib. /0 indicates that the r field in the ModR/M byte is zero. We must use a SIB here but don't need a displacement, so we can use 00b for the mod and 100b for the rm, to form 04h for the ModR/M byte (44h and 84h also work, but waste space encoding a zero displacement). Looking in the SIB table now, both registers are used with "scale 1", so the base and index are mostly interchangeable (rsp cannot be an index, but we're not using it here). So the SIB byte can be 10h or 02h.
Just putting the bytes in a row now:
80 04 10 01
; or
80 04 02 01
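The byte arithmetic above can be reproduced with a few shifts. This is a minimal sketch, not an assembler: the helper names are invented, only the eight legacy registers are covered (no REX prefix), and the displacement-free form is assumed.

```python
# Register numbers as used in the ModR/M and SIB fields (no REX extension).
REG = {"rax": 0, "rcx": 1, "rdx": 2, "rbx": 3,
       "rsp": 4, "rbp": 5, "rsi": 6, "rdi": 7}


def modrm(mod, reg, rm):
    """Pack the ModR/M byte: mod (2 bits), reg (3 bits), rm (3 bits)."""
    return (mod << 6) | (reg << 3) | rm


def sib(scale_bits, index, base):
    """Pack the SIB byte: scale (2 bits), index (3 bits), base (3 bits)."""
    return (scale_bits << 6) | (index << 3) | base


def add_byte_base_index_imm8(base, index, imm8):
    """Encode `add byte [base + index], imm8` as 80 /0 ib with mod=00."""
    if index == "rsp":
        raise ValueError("rsp cannot be used as an index")
    if base == "rbp":
        raise ValueError("with mod=00, base=101b means disp32 with no base")
    return bytes([0x80,
                  modrm(0b00, 0, 0b100),           # rm=100b selects a SIB byte
                  sib(0b00, REG[index], REG[base]),  # scale 1
                  imm8 & 0xFF])


print(add_byte_base_index_imm8("rax", "rdx", 1).hex(" "))
print(add_byte_base_index_imm8("rdx", "rax", 1).hex(" "))
```

Swapping which register is treated as base and which as index gives the two alternative encodings listed above.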
Also when I use the ModR/M byte of 0x05 is that (*) relative to the current instruction pointer? Is it 32 or 64 bits when in 64 bit mode?
Yes. You saw the note, I'm sure. The displacement itself is always a signed 32-bit value; the resulting address can be either 64 or 32 bits, depending on whether you used an address size override or not. In every reasonable case, it will be rip + sdword. Using the other form (eip + sdword) gives you a truncated result, and I can't immediately imagine any circumstances under which that makes sense to do (for general lea math sure, but not for pointers). Probably (this is speculation though) that possibility only exists to make the address size override work reasonably uniformly.
Is the SIB always used with a source or destination address?
Depends on what you mean. Certainly, if you have a SIB, it will encode a source or destination (because what else is there?) (you might argue that the SIB that can appear in nop rm encodes nothing because nop has neither sources nor destinations). If you mean "which one does it encode", it can be either one. Looking over all instructions, it can most often appear in a source operand. But obviously there are many cases where it can encode the destination - example: see above. If you mean "is it always used", well no, see that table that you were looking at.

Dealing with Padding / Stuff Bits Entropy Encoded JPEG

When decoding entropy encoded DC values in JPEG (or the entropy encoded prediction differences in lossless JPEG), how do I distinguish between 1 bits that have been stuffed to pad a byte before a marker and a Huffman coded value?
For example if I see:
0xAF 0xFF 0xD9
and I have already consumed the bits in [0xA], how can I tell if the next 0xF is padded or should be decoded?
This is from the JPEG Spec:
F.1.2.3 Byte stuffing
In order to provide code space for marker codes
which can be located in the compressed image data without decoding,
byte stuffing is used.
Whenever, in the course of normal encoding, the
byte value X’FF’ is created in the code string, a X’00’ byte is
stuffed into the code string. If a X’00’ byte is detected after a
X’FF’ byte, the decoder must discard it. If the byte is not zero, a
marker has been detected, and shall be interpreted to the extent
needed to complete the decoding of the scan.
Byte alignment of markers
is achieved by padding incomplete bytes with 1-bits. If padding with
1-bits creates a X’FF’ value, a zero byte is stuffed before adding the
marker.
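There are only two possibilities for an FF value inside the compressed data of a scan:
a restart marker; or
FF00 representing FF.
If you are decoding a stream, you will know from the restart interval when to expect a restart marker. When you hit the point in decoding where you should find a restart marker, you discard the remaining bits in the current byte. The end of the scan works the same way: the frame and scan headers tell you how many MCUs the scan contains, so once the last one is decoded, any bits left in the current byte (the trailing 0xF in your 0xAF 0xFF 0xD9 example) are padding, and the FF D9 that follows is the EOI marker.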
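The byte-level part of this (F.1.2.3) can be separated from the Huffman decoding. The sketch below is an assumption-laden illustration: the function name `unstuff_jpeg_segment` is invented, the input is expected to start at the first entropy-coded byte, and deciding which bits of the last payload byte are padding is still left to the decoder's MCU count.

```python
def unstuff_jpeg_segment(data):
    """Split one T.81 entropy-coded segment into (payload, marker).

    FF 00 becomes a single FF data byte, FF FF is treated as a fill byte,
    and FF followed by anything else ends the segment; that second byte is
    returned as the marker code (None if the data ends first).
    """
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        if byte != 0xFF:
            out.append(byte)
            i += 1
            continue
        if i + 1 >= len(data):
            break  # truncated input: lone FF at the end
        nxt = data[i + 1]
        if nxt == 0x00:
            out.append(0xFF)  # stuffed zero byte is discarded
            i += 2
        elif nxt == 0xFF:
            i += 1  # fill byte before a marker
        else:
            return bytes(out), nxt
    return bytes(out), None


payload, marker = unstuff_jpeg_segment(bytes([0xAF, 0xFF, 0xD9]))
print(payload.hex(), hex(marker))
```

With the example from the question, the payload is the single byte AF and the marker is D9 (EOI); whether the low nibble of AF is data or padding depends on whether all MCUs have already been decoded.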

Resources