There is a question with the same title, but unfortunately it doesn't help me.
I am trying to parse the data of an SOS marker. All the documentation I can find says that the marker (0xFFDA) is followed by a two-byte number defining how long the segment is -- for example here -- as with most variable-size markers. But I don't seem to understand it correctly in this case, even though it works for every other marker type.
I checked multiple files but just can't get this right. Doesn't this number define how long the complete SOS field is? For a baseline JPEG there should be exactly one SOS segment, followed by the End Of Image marker. If it is progressive there can be multiple SOS segments, but all of them should still have a length field.
I have one picture with an SOF0 marker, so it should be baseline. I believe this is the correct SOFn marker because the image resolution can be found after it. With a hex editor I found three 0xFFDA markers, and all of them have 0x000C in the following two bytes. So, as I understand it, each of those segments should be exactly 12 bytes long. But in all three cases no new marker follows after 12 bytes of data. I guess the very last one is the scan I am looking for, because whenever the value 0xFF comes up it is followed by 0x00 -- except for the restart markers.
Are the two bytes following 0xFFDA not a length field?
EDIT:
So thanks to the comments and answer: there is no length field for the actual compressed data, and the only way to know where it ends seems to be decoding it.
Why does a Baseline DCT image have multiple scans? I would understand why it has two: the main image and a thumbnail. But what is the third scan?
But there is one more thing. As I understand the DRI marker (Define Restart Interval), it contains the interval after which a scan should have a restart marker (0xFFD0 - 0xFFD7). But I seem to misunderstand that too, or I'm not doing it right. For example, one marker contained the restart interval 0x0140. In the following scan I started from the beginning and searched for the first 0xFFD0, but it came after 862 bytes instead of 320.
The SOS marker introduces the compressed data, the most complex part of the JPEG stream. The SOFn marker indicates the format of the data. SOF0 and SOF1 are processed identically. SOF2 (progressive) is quite a bit different. (The rest of the SOFn markers are not in common use or commonly supported.)
The length is that of the SOS header, not the compressed data. Most of the header is only applicable to progressive scans (SOF2).
The compressed data follows the header. The compressed data has no length. You have to scan through the data to find the next marker.
Summary of how to find next marker after SOS marker (0xFFDA):
Skip the first 3 bytes after the SOS marker (2 bytes of header size + 1 byte for the number of image components in the scan).
Search for the next FFxx marker, skipping every FF00 and everything in the range FFD0 to FFD7 (restart markers), because those are part of the scan data.
*This is a summary of the comments below user3344003's post, plus my own knowledge, plus Table B.1 from https://www.w3.org/Graphics/JPEG/itu-t81.pdf.
*Based on Table B.1 I also suspect that the values FF01 and FF02 through FFBF should be skipped in step 2, but I am not sure whether they can appear as part of the encoded SOS data.
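The search described above can be sketched in code. A minimal Python illustration (names are mine; this variant skips the whole SOS header using its length field rather than only the first 3 bytes, which also covers multi-component scans):

```python
def find_next_marker(data: bytes, sos_offset: int) -> int:
    """Return the offset of the next real marker after the scan data.

    `data` is the whole JPEG stream; `sos_offset` points at the 0xFF
    of the SOS marker (0xFFDA). Returns -1 if no marker is found.
    """
    # 2 bytes marker + big-endian header length (the length counts itself)
    header_len = (data[sos_offset + 2] << 8) | data[sos_offset + 3]
    i = sos_offset + 2 + header_len  # first byte of entropy-coded data
    while i < len(data) - 1:
        if data[i] == 0xFF:
            nxt = data[i + 1]
            if nxt == 0x00 or 0xD0 <= nxt <= 0xD7:
                i += 2   # stuffed byte or restart marker: part of the scan
                continue
            return i     # a real marker (e.g. FFD9, FFC4, another FFDA)
        i += 1
    return -1
```

For a baseline image, calling this on the last SOS should land on the EOI marker (0xFFD9).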
Regarding the additional question above:
Why does a Baseline DCT image have multiple scans? I would understand why it has two: the main image and a thumbnail. But what is the third scan?
If the image stream contains an APP2 marker (0xFFE2), which tells us it can be (but does not have to be) a Multi Picture JPEG (a tag named MPF), we can have more than 3 images. In general, APP markers can store anything; there are a lot of standards related to APP segments in JPEG files.
The first table on this page lists the "things" that can be stored in each APP segment:
https://exiftool.org/TagNames/JPEG.html
Related
I'm in the process of writing a JPEG file decoder to learn about the workings of JPEG files. In working through ITU-T81, which specifies JPEG, I ran into the following regarding the DQT segment for quantization tables:
In many of JPEG's segments there is an n parameter, which you read from the segment and which indicates how many iterations of the following item there are. In the DQT case, however, it just says "multiple", and it's not defined how many multiples there are. One can possibly infer it from Lq, but the way this "multiple" is defined is a bit of an anomaly compared to the other segments.
For anyone who is familiar with this specification, what is the right way to determine how many multiples, or n, of (Pq, Tq, Q0..Q63) there should be?
Take the length field (Lq), subtract 2 for the length bytes themselves, and divide by the size of one table: 65 bytes for 8-bit precision (one Pq/Tq byte plus 64 one-byte entries) or 129 bytes for 16-bit precision. That is N.
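Since tables of different precisions can be mixed in one segment, it is safest to walk the segment rather than divide. A Python sketch (illustrative; `segment` is assumed to start at the two Lq bytes, right after the 0xFFDB marker):

```python
def count_dqt_tables(segment: bytes) -> int:
    """Count quantization tables in a DQT segment.

    Lq counts its own two bytes; each table is one Pq/Tq byte followed
    by 64 entries of 1 byte (Pq = 0) or 2 bytes (Pq = 1).
    """
    lq = (segment[0] << 8) | segment[1]
    i, n = 2, 0
    while i < lq:
        pq = segment[i] >> 4            # precision: 0 -> 8-bit, 1 -> 16-bit
        i += 1 + 64 * (pq + 1)          # Pq/Tq byte + 64 or 128 table bytes
        n += 1
    return n
```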
I wrote a jpeg compressor/decompressor years ago that can handle lossless and lossy jpeg files. It works well, but it doesn't always decode jpeg streams in DICOM files correctly.
I know jpeg well, but I know little about DICOM. Lossless jpeg in DICOM can't possibly be compliant with the jpeg ISO standard; there must be some modification, either hard-coded or controlled by a parameter somewhere in the DICOM file outside of the jpeg stream.
My code fails on most of the sample DICOM files (compsamples_jpeg.tar) at:
ftp://medical.nema.org/MEDICAL/Dicom/DataSets/WG04/
Here's what happens when I decode the first lossless jpeg (IMAGES\JPLL\CT1_JPLL) in this set:
dicom decoded image
The left image is rendered from my code, the right was rendered by an online DICOM reader:
https://www.ofoct.com/viewer/dicom-viewer-online.html
(x)MedCon, an open source DICOM reader, fails at the exact same pixel as my code, so I'm not the only one who has this problem.
https://xmedcon.sourceforge.net
I have read this jpeg stream byte by byte, drawn the huffman tree and calculated the huffman codes with pencil and paper, and my code does exactly what it is supposed to do. Here are the huffman codes:
0 00
4 01
3 100
5 101
1 1100
2 1101
6 1110
7 11110
8 111110
9 1111110
12 11111110
11 111111110
10 1111111110
15 11111111110
Here is the compressed data after the SOS marker:
ff 00 de 0c 00 (the 00 after ff is a stuffed byte)
11111111 11011110 00001100 00000000
11111111110 si=15
111100000110000 diff=30768
The online viewer says the first pixel value is -3024. If this is correct, the first diff value should be -3024, but it is not.
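For reference, the hand decode above can be reproduced mechanically. A Python sketch using the code table I listed (the helper names are mine):

```python
# Huffman table from above: code string -> SSSS (number of magnitude bits)
CODES = {
    "00": 0, "01": 4, "100": 3, "101": 5, "1100": 1, "1101": 2,
    "1110": 6, "11110": 7, "111110": 8, "1111110": 9, "11111110": 12,
    "111111110": 11, "1111111110": 10, "11111111110": 15,
}

def decode_first_diff(raw: bytes):
    """Decode the first (SSSS, diff) pair from the entropy-coded data."""
    # remove the stuffed 0x00 after each 0xFF, then build a bit string
    destuffed = raw.replace(b"\xff\x00", b"\xff")
    bits = "".join(f"{b:08b}" for b in destuffed)
    # greedy Huffman decode of the SSSS category (codes are prefix-free)
    code, pos = "", 0
    while code not in CODES:
        code += bits[pos]
        pos += 1
    ssss = CODES[code]
    # read SSSS magnitude bits; a leading 0 means a negative value
    mag = bits[pos:pos + ssss]
    diff = int(mag, 2)
    if mag[0] == "0":
        diff -= (1 << ssss) - 1
    return ssss, diff

print(decode_first_diff(bytes.fromhex("ff00de0c00")))  # (15, 30768)
```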
After this, my code correctly decodes about 2/5 of the image, but then decodes a wildly inaccurate diff value:
d2 a1 fe ff 00 e0 (the 00 after ff is a stuffed byte)
1010111 10100001 11111110 11111111 11100000
101 si=5
01111 diff=-16
01 si=4
0000 diff=-15
111111110 si=11 ????
11111111111 diff=2047
If you look at the image decoded by the online viewer, there is no radical change in pixel intensity at this location, so the si=11 value can't be correct.
I am sure I have a good understanding of jpeg, but jpeg streams in DICOM don't seem to follow the jpeg standard. What extensions/changes are made to jpeg streams when they are embedded in DICOM files?
DICOM specifies the use of ISO 10918 just as it is written, so there is nothing magic about the use of lossless JPEG in DICOM images. The only additional steps are reinterpreting the always-unsigned output of the decoded bitstream as signed (depending on Pixel Representation) and applying the Rescale Slope and Intercept to turn the decoded "stored pixel values" into whatever "values" a viewer might report (e.g., as Hounsfield Units), as Paolo describes. Or to put it another way: do not rely on the "pixel values" reported by a viewer being the same as the direct output of the decoded bitstream.
For reference, here are the sections in DICOM that address the use of 10918 in general:
http://dicom.nema.org/medical/dicom/current/output/chtml/part05/sect_8.2.html#sect_8.2.1
http://dicom.nema.org/medical/dicom/current/output/chtml/part05/sect_A.4.html#sect_A.4.1
DICOM encoders may split individual compressed frames into separate fragments, as in the case of this sample, which deliberately uses fragmentation to test decoding capability. I expect you know that and have taken care of reassembling the compressed bit stream across fragment boundaries (i.e., removing the fixed-length Item tags between fragments):
http://dicom.nema.org/medical/dicom/current/output/chtml/part05/sect_A.4.html
Though some encoders may be buggy, I don't think that is the case for IMAGES\JPLL\CT1_JPLL in the NEMA sample dataset, which I created many years ago using the Stanford PVRG codec.
My own decoder (minimal as it is) at http://www.dclunie.com/pixelmed/software/codec/ has no problem with it. The source is available, so if you want to recompile it with some of the debugging messages turned on to track each decoded value, predictor input value, restart at the beginning of each row, etc., to compare with your own logic, feel free.
Finally, since JPEG lossless is used rarely outside DICOM, you may find it hard to obtain other samples to test with. One such source that comes to mind is the USF digitized mammography collection (medical, but not DICOM), at http://marathon.csee.usf.edu/Mammography/Database.html.
David
PS. I did check which codec XMedCon is using at https://sourceforge.net/projects/xmedcon/ and it seems to use some copy of the Cornell lossless code; so it may be vulnerable to the same bug described in the post that BitBank referred to (https://groups.google.com/forum/#!topic/comp.protocols.dicom/Yl5GkZ8ggOE) or some other error. I didn't try to decipher the source code to see.
The first pixel's value is indeed -3024 as the online dicom viewer says:
You correctly decode the first amplitude as 30768, but the first pixel's prediction is the default 2^(P-1) = 32768, and therefore its real value is 32768 + 30768 = 63536. This is an unsigned value.
Now, the Pixel Representation tag says that the file's values are in two's complement (signed); therefore, when we interpret the most significant bit as a sign, the value becomes -2000.
When we apply the value in the Rescale Intercept tag (-1024), the value of the first pixel becomes -3024.
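That chain of steps, with the values from this thread, can be written out as follows (a Python sketch; Rescale Slope is assumed to be 1 here):

```python
import struct

amplitude = 30768      # first decoded diff from the bitstream
predictor = 32768      # default prediction 2**(P-1) for 16-bit precision
stored = predictor + amplitude                       # 63536, unsigned

# Pixel Representation = 1: reinterpret as two's-complement signed 16-bit
signed = struct.unpack("<h", struct.pack("<H", stored))[0]   # -2000

# apply Rescale Slope (1) and Rescale Intercept (-1024) from the DICOM header
slope, intercept = 1, -1024
value = slope * signed + intercept                   # -3024, what the viewer reports
print(value)
```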
However, my codec doesn't find any amplitude 2047 near the row 179, so maybe your codec is going out of sync somehow: the loss of sync is also visible in the subsequent rows (they are all shifted to the right).
I'm very interested in understanding how graphic file formats (PNG, JPG, GIF) work. Are there any code examples that demonstrate how these files are made and also how they are interpreted (e.g., viewed in a browser)?
Regardless of which graphic file format you are working with, you need to understand the basic nature that all graphic files have in common.
File Header
File Type, Version, (Time & Date Stamp - if included)
Possible data structure/s info or chunks
Flags for which color type to expect, whether compression is used and which type, byte order (endianness), whether there is transparency, and other various flags.
Image Data Info
Width normally in pixels sometimes in pels, bits or bytes
Height normally in pixels sometimes in pels, bits or bytes
Bits Per Pixel or Pixel Depth
Image Size in Bytes: numPixelsWidth * numPixelsHeight * ((bits or bytes) for each pixel)
Color Type: - Each Pixel has color data which can vary
Gray Scale
Palette
Color RGB
Color RGBA
Possible Others
If Compression Is Present Which Coding and Encoding Is Used
The actual image data
Once you understand this basic structure, parsing image files becomes easier once you know the specification of the file structure you are working with. When you know how many bytes make up all the headers and chunks, you can advance your file pointer to the structure that reads in or writes out all the pixel (color) data. In many cases the pixel data is 32 bits per pixel, such that each channel of RGBA -- Red, Green, Blue, and Alpha -- is 8 bits, or one byte, the same as an unsigned char. This is represented in either a structure or a two-dimensional array. Either way, once you know the file's structure and how to read in the actual image or color data, you can easily store it into a single array. What you do with it from there depends on your application's needs.
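As a small illustration of the flat-array layout described above (the function name is mine; it assumes row-major RGBA data at 4 bytes per pixel):

```python
def pixel_offset(x: int, y: int, width: int, bytes_per_pixel: int = 4) -> int:
    """Byte offset of pixel (x, y) in a flat row-major pixel buffer."""
    return (y * width + x) * bytes_per_pixel

# e.g. in a 640-wide RGBA image, pixel (10, 2) starts at byte (2*640 + 10)*4
print(pixel_offset(10, 2, 640))  # 5160
```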
The most detailed information can be obtained by reading the file format specification and implementing a parser in the language you know best.
A good approach is to read the format and transform it into an array of four-byte tuples (RGBA: the red, green, blue and alpha parts of a color). This gives you an intermediate format for easy conversion between formats. At the same time, most APIs support displaying this raw format.
A good format to get started with is BMP. As old as it is, if this is your first encounter with writing a parser it is a safe and "easy" format. A good second format is PNG. Start with the uncompressed variations and later add the compression.
Next step is TGA to learn reading chunks or JPG to learn more about compression.
Extra tip: some implementations of writers contain(ed) errors, causing images to violate the format. Others added extra features that never made it into the official specs. When writing a parser this can be a real pain, so when you run into problems, always second-guess the image you are trying to read. A good binary/hex file reader/editor can be a very helpful tool. I used AXE; if I remember correctly, it allows you to overlay the hex codes with a format so you can quickly recognize the header and chunks.
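As a starting point for the BMP suggestion above, a minimal header parse might look like this (a Python sketch; it assumes the common 14-byte file header plus 40-byte BITMAPINFOHEADER and no exotic variants):

```python
import struct

def read_bmp_header(path: str):
    """Read width, height, bits per pixel and pixel-data offset from a BMP."""
    with open(path, "rb") as f:
        file_header = f.read(14)              # 'BM', file size, data offset
        assert file_header[:2] == b"BM", "not a BMP file"
        data_offset = struct.unpack_from("<I", file_header, 10)[0]
        info = f.read(40)                     # BITMAPINFOHEADER
        width, height = struct.unpack_from("<ii", info, 4)   # signed ints
        bpp = struct.unpack_from("<H", info, 14)[0]
    return width, height, bpp, data_offset
```

With the header parsed, seeking to `data_offset` puts the file pointer at the start of the pixel rows (bottom-up when the height is positive).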
I'm developing a C++ project, in which I have to compress bitmap data in JPEG format, and write as output a .jpg file that could be opened by a normal image viewer.
I cannot use any kind of libraries.
All the JPEG compression is done; the only thing I'm missing is how to write the result to the file correctly.
I already looked into the JFIF format file specification, and googled a lot, but can't figure out how to do it.
More in detail: I should have all the headers written correctly in the file. What I really miss is, once I have the 3 color components ready to be written, how do I write them? (In which order do the components go? How do I handle subsampled components? Is there other stuff?)
EDIT:
Link to a sample output image (starting from a random generated 8x8 RGB bitmap).
https://dl.dropbox.com/u/46024798/out.jpg
The headers of the image (should) specify that this is a JPEG 8x8px with 3 color components, subsampling 4:4:4.
More in detail, what I did is:
Generate 3 random 8x8 blocks, with values in range [0..255]
Subtract 128 from all the elements (now in range [-128..127])
Apply the Discrete Cosine Transform to the 3 blocks
Quantize the result
Put the results of quantization in zig-zag order
Look up the values to write to the file in the Huffman tables (with the End Of Block marker and that kind of stuff)
And for the JPEG compression, that should be ok.
Then I write the file:
First, I write the SOI header, the APP0 marker, the "magic string" JFIF, version, units, density and thumbnail info
Then the quantization table
Then the Start Of Frame marker, with image precision, dimensions, number of components, subsampling info, a DC Huffman Table and an AC Huffman Table
Then the Start Of Scan header (probably where I messed up), in which I point to the IDs of the Huffman tables to use for each component, plus other fields whose meaning I don't exactly know (spectral selection?? successive approximation??)
Finally, I write the Huffman Encoded values in this order:
All the Y block
All the Cb block
All the Cr block
And End Of Image
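Two file-writing details worth double-checking at this stage are the segment length convention (the two length bytes count themselves and the payload, but not the marker) and byte stuffing in the entropy-coded data. A hedged sketch (Python for illustration; the function names are mine):

```python
import struct

def marker_segment(marker: int, payload: bytes) -> bytes:
    """Build a marker segment: big-endian marker, then a 2-byte length
    that counts itself plus the payload (but not the marker)."""
    return struct.pack(">HH", marker, len(payload) + 2) + payload

def stuff(entropy: bytes) -> bytes:
    """Byte-stuff entropy-coded data: insert 0x00 after every 0xFF so a
    decoder doesn't mistake data bytes for markers."""
    return entropy.replace(b"\xff", b"\xff\x00")

# e.g. an SOS segment would be marker_segment(0xFFDA, sos_payload),
# followed by stuff(huffman_bytes), then the EOI marker b"\xff\xd9".
```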
In my jpeg file there are a few FFDA markers. At which marker does my data start, and how do I know from which marker to decode the file?
The JPEG standard has many options that are not used very often. A typical color image will have 3 color components (Y, Cr, Cb) interleaved in a single scan (one FFDA marker). They can also be stored in any combination in separate scans. A progressive JPEG image encodes multiple scans with more and more detail (AC coefficients) in each successive scan, but the standard allows any combination of color components and coefficients to be mixed in different scans. I have only seen one case of a non-progressive JPEG with separate scans for each color component; it came from an IP camera.
Your JPEG is probably progressive, which means you have to decode the data after at least the first FFDA marker; that will give you an intermediate result.
If this is your first attempt at making a JPEG decoder, you should choose another image and try to implement a baseline decoder instead. Progressive images add a lot of complexity to the problem.