I am trying to encode raw yuv422 bytes from a camera to a jpeg file in c++. I have reviewed different source codes that do this. I am still having trouble understanding the order in which the compressed mcu blocks are placed into the jpeg file. What i would like to do is encode each block in a sequential pattern left to right top to down. What items in the jpeg file govern this (header parameters, zig-zag table...etc)? I presume there must be a parameter(s) becaue it doen't appear that the mcu blocks are always place in the same order, based on the different source codes that i have reviewed. If this is the case how would a jpeg decoder(reader) know the order of the mcu blocks.
Thanks
The MCUs are encoded left to right, top to bottom.
The zig-zag is within a block.
Related
I'm trying to find out if there's a way to determine if an AAC-encoded audio track is encoded with Dolby Pro Logic II data. Is there a way of examining the file such that you can see this information? I have for example encoded a media file in Handbrake with (truncated to audio options) -E av_aac -B 320 --mixdown dpl2 and this is the audio track output that mediainfo shows:
Audio #1
ID : 2
Format : AAC
Format/Info : Advanced Audio Codec
Format profile : LC
Codec ID : 40
Duration : 2h 5mn
Bit rate mode : Variable
Bit rate : 321 Kbps
Channel(s) : 2 channels
Channel positions : Front: L R
Sampling rate : 48.0 KHz
Compression mode : Lossy
Stream size : 288 MiB (3%)
Title : Stereo / Stereo
Language : English
Encoded date : UTC 2017-04-11 22:21:41
Tagged date : UTC 2017-04-11 22:21:41
but I can't tell if there's anything in this output that would suggest that it's encoded with DPL2 data.
tl:dr; it's probably possible; it may be easier if you're a programmer.
Because the information encoded is just a stereo analog pair, there is no guaranteed way of detecting a Dolby Pro Logic II (DPL2) signal therein, unless you specifically store your own metadata saying "this is a DPL2 file." But you can probably make a pretty good guess.
All of the old analog Dolby Surround formats, including DPL2, store surround information in two channels by inverting the phase of the surround or surrounds and then mixing them into the original left and right channels. Dolby Surround type decoders, including DPL2, attempt to recover this information by inverting the phase of one of the two channels and then looking for similarities in these signal pairs. This is either done trivially, as in Dolby Surround, or else these similarities are artificially biased to be pushed much further to the left or right, or the left or right surround, as in DPL2.
So the trick is to detect whether important data is being stored in the surround channel(s). I'll sketch out for you a method that might work, and I'll try to express it without writing code, but it's up to you to implement and refine it to your liking.
Crop the first N seconds or so of program content into a stereo file, where N is between one and thirty. Call this file Input.
Mix down the Input stereo channels to a new mono file at -3dB per channel. Call this file Center.
Split the left and right channels of Input into separate files. Call these Left and Right.
Invert the right channel. Call this file RightInvert.
Mix down the Left and RightInvert channels to a new mono file at -3dB per channel. Call this file Surround.
Determine the RMS and peak dB of the Surround file.
If the RMS or peak DB of the Surround file are below "a tolerance", stop; the original file is either mono or center-panned and hence contains no surround information. You'll have to experiment with several DPL2 and non-DPL2 sources to see what these tolerances are, but after a dozen or so files the numbers should become clear. I'm guessing around -30 dB or so.
Invert the Center file into a new file. Call this file CenterInvert.
Mix the CenterInvert file into the Surround file at 0 dB (both CenterInvert and Surround should be mono). Call this new file SurroundInvert.
Determine the RMS and peak dB of the SurroundInvert file.
If either the RMS and/or peak dB of SurroundInvert are below "a tolerance," stop; your original source contains panned left or right front information, not surround information. You'll have to experiment with several DPL2 and non-DPL2 sources to see what these tolerances are, but after a dozen or so files the numbers should become clear -- I'm guessing around -35 dB or so.
If you've gotten this far, your original Input probably contains surround information, and hence is probably a member of the Dolby Surround family of encodings.
I've written this algorithm out such that you can do each of these steps with a specific command in sox. If you want to be fancier, instead of doing the RMS/peak value step in sox, you could run an ebur128 program and check your levels in LUFS against a tolerance. If you want to be even fancier, after you create the Surround and Center files, you could filter out all frequencies higher than 7kHz and do de-emphasis on them, just like a real DPL2 decoder would.
To keep this algorithm simple, I've sketched it out entirely in the amplitude domain. The calculation of the SurroundLevel file would probably be a lot more accurately done in the frequency domain, if you know how to calculate the magnitude and angle of FFT bins and you use windows of 30 to 100 ms. But this cheapo version above should get you started.
One last caution. AAC is a modern psychoacoustic codec, which means that it likes to play games with stereo phasing and imaging to achieve its compression. So I consider it likely that the mere act of encapsulating DPL2 into an AAC stream will likely hose some of the imaging present in DPL2. To be candid, neither DPL2 nor AAC belongs anywhere in this pipeline. If you must store an analog stream originally encoded with DPL2, do it in a lossless format like WAV or FLAC, not AAC.
As of this writing, operational concepts behind Dolby Pro Logic (I) are here. These basic concepts still apply to DPL2; operational concepts for DPL2 are here.
If the file has more than one channel, you can with some certainty assume that they are used for surround purposes, although they could be just multiple tracks.
In this case it falls on a playing system to do with channels as it "thinks" best. (if file header doesn't say what to do)
But your file is stereo. If you want to know whether it is a virtual surround file then you look in header for an encoder field to see which encoder was used.
This may help somewhat, although not much. Mostly encoder field is left empty, and second thing is that the encoder doesn't have to be same as the recoder that mixed down the surround data.
I.e. the recoder will first create raw PCM data, then feed it to some encoder to produce compressed file. (AAC or whatever)
Also, there are many applications and versions vary, so might the encoder field, so tracking all of them would be nasty work.
However, you can, with over 60% certainty, deduce whether something is virtual surround or not by examining the data.
This would be advanced DSP and, for speed, even machine learning may be involved.
You would have to find out whether the stereo signals contain certain features of HRTF (head related transfer function).
This may be achieved by examining intensity difference and delay features between same sounds appearing in time domain and harmonic features (characteristic frequency changes) in frequency domain.
You would have to do both, because one without another may just tell you that something is very good stereo recording,, not a virtual surround.
I don't know whether there are HRTF specific features mapped somewhere already, or you would need to do it by yourself.
It's a very complicated solution that takes a lot of time to make properly. Also it's performance would be problematic.
With this method you can also break the stereo mixdown to the nearly original surround channels.
But for stereo to surround conversion other methods are used and they sound well.
If you are determined to perform such a detection, dedicate half a year or more of hard work if no HRTF features are mapped, few weeks if they are,
brace yourself for big stress and I wish you luck. I have done something similar. It is a killer.
If you want an out of the box solution, then the answer to your question is no, unless header provides you with encoder field and the encoder is distinctive and known to be used only for doing surround to stereo conversion.
I do not think anyone did this from actual data as I described, or if they did it is a part of commercial product. Doing what you want is not usually needed, but it can be done.
Ow, BTW, try googling HRTF inversion, it might give some help.
How to read the pixel values of jpeg image using c/c++, without using any library.
I read about how compression takes place in jpeg in my course,i want header information.
For the syntax of the file you can check wikipedia.
Each segment has its own marker. The variable length segments have a two byte field for their length. So far is not really a problem, as you are able to extract all segments using this information (or at least it seems so on a first glance).
The more problematic part is to actually do something useful with the data inside the segments. The wikipedia page provides information on this topic, but it will require quite some mathematic knowledge to actually decode and grab the pixels.
Finally found some really helpful links..
link 1
link 2
Thanks for help and support.
I'm working with C# and a usb webcam that supports YUY2 or MJPG image formats. Thus far I've always had it in YUY2 mode and that works fine. Recently I tried changing the format to MJPG thinking that it would then feed my program one JPEG image per frame capture. It appears to almost do that. When I try to display the buffer, my app always takes an exception which is vague, but seems to indicate that the stream is invalid. I then copied one of the buffers to a file and tried to view it with IrfanView and it tells me that there is no huffman table. Looking at the buffer with a binary editor, I see it does have the SOI and EOF JPEG markers (and several others); however, it doesn't contain a huffman table marker. Any ideas what I'm doing wrong here? I've read a bit about JPEG and apparently there are cases where images can use a standard huffman table to reduce file size; however, if that's the case, how do I insert this into the image (if appropriate)?
This is with reference to Microsoft Lifecam by the way.
Part of the Motion-JPEG standard for AVI files is that a fixed Huffman table will be used so that it doesn't have to be stored in every frame.
I have some ideas I would like to experiment with relating to data compression, but am finding it difficult to decipher some parts of how the standard are applied "in real life". I would like to look at some sample compressed files to observe how the the blocks are arranged and the huffman tree(s) are structured.
Are there any tools in existence which can help visualize this for a given compressed file (zip/gzip/deflate etc)? I'm picturing something like a tree view or some form of graph visualizer.
You might be interested in this (if you are still interested that is :-P)
http://jvns.ca/blog/2013/10/24/day-16-gzip-plus-poetry-equals-awesome/
I made a "entropy image" tool.
The entropy_image tool replaces each pixel
with the (estimated) number of bits
necessary to encode that pixel using range coding or Huffman compression.
I hope this isn't the only compression visualization tool in the world.
So, to simplify my life I want to be able to append from 1 to 7 additional characters on the end of some jpg images my program is processing*. These are dummy padding (fillers, etc - probably all 0x00) just to make the file size a multiple of 8 bytes for block encryption.
Having tried this out with a few programs, it appears they are fine with the additional characters, which occur after the FF D9 that specifies the end of the image - so it appears that the file format is well defined enough that the 'corruption' I'm adding at the end shouldn't matter.
I can always post process the files later if needed, but my preference is to do the simplest thing possible - which is to let them remain (I'm decrypting other file types and they won't mind, so having a special case is annoying).
I figure with all the talk of Steganography hullaballo years ago, someone has some input here...
(encryption processing by 8 byte blocks, I don't want to save pre-encrypted file size, so append 0x00 to input data, and leave them there after decoding)
No, you can add bits to the end of a jpg file, without making it unusable. The heading of the jpg file tells how to read it, so the program reading it will stop at the end of the jpg data.
In fact, people have hidden zip files inside jpg files by appending the zip data to the end of the jpg data. Because of the way these formats are structured, the resulting file is valid in either format.
You can .. but the results may be unpredictable.
Even though there is enough information in the format to tell the client to ignore the extra data it is likely not a case the programmer tested for.
A paranoid program might look at the size, notice the discrepancy and decide it won't process your file because clearly it doesn't fully understand it. This is particularly likely when reading data from the web when random bytes in a file could be considered a security risk.
You can embed your data in the XMP tag within a JPEG (or EXIF or IPTC fields for that matter).
XMP is XML so you have a fair bit of flexibility there to do you own custom stuff.
It's probably not the simplest thing possible but putting your data here will maintain the integrity of the JPEG and require no "post processing".
You data will then show up in other imaging software such as PhotoShop, which may not be ideal.
As others have stated, you have no control how programs process image files and therefore some programs may find the images valid others may not.
However, there is a bigger issue here. Judging by your question, I'm deducing you're practicing "security through obscurity." It's widely considered a very bad practice. Use Google to find a plethora of articles about the topic.