Distribution of bytes within jpeg files - statistics

When observing compressed data, I expect an almost uniformly distributed byte stream. Using the chi-square test to measure the distribution, I get this result for ZIP files and other compressed data, but not for JPG files. I have spent the last few days looking for reasons, but I cannot find any.
When calculating the entropy of JPGs, I get a high result (e.g. 7.95 bits/byte). I thought there must be a connection between the entropy and the distribution: the entropy is high when every byte appears with almost the same probability. But when using chi-square, I get a p-value of about 4.5e-5...
I just want to understand how different distributions influence the test results... I thought I could measure the same property with both tests, but obviously I cannot.
Thank you very much for any hint!
tom

Distribution in JPEG files
Ignoring the meta-information and the JPEG header data, the payload of a JPEG consists of blocks describing Huffman tables or encoded MCUs (Minimum Coded Units, square blocks of size 16x16). There may be others, but these are the most frequent ones.
These blocks are delimited by the marker 0xFF 0xSS, where 0xSS is a specific start code. Here is the first problem: 0xFF is a bit more frequent than expected, as twalberg mentioned in the comments.
A 0xFF byte may also occur inside an encoded MCU. To distinguish this normal payload from the start of a new block, it is stuffed as 0xFF 0x00. If the distribution of the unstuffed payload were perfectly uniform, 0x00 would appear twice as often in the stuffed data. To make matters worse, every MCU is padded with binary ones to reach byte alignment (a slight bias towards larger values), and the padding may require stuffing again.
There may also be other factors I'm not aware of. If you need more information, you would have to provide the JPEG file. A quick way to see the effect is to count marker and stuffing bytes directly, as in the sketch below.
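A minimal Java sketch of such a count (the file name is just a placeholder, not a file from this thread):
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Count how often 0xFF appears in a JPEG and how many of those occurrences
// are stuffing sequences (0xFF 0x00) inside the entropy-coded data.
public class JpegMarkerCount {
    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get("image.jpg"));
        int ff = 0, stuffed = 0;
        for (int i = 0; i < data.length; i++) {
            if ((data[i] & 0xFF) == 0xFF) {
                ff++;
                if (i + 1 < data.length && data[i + 1] == 0x00) {
                    stuffed++; // 0xFF 0x00 = stuffed data byte, not a marker
                }
            }
        }
        System.out.printf("0xFF bytes: %d (%.4f%% of file), stuffed: %d%n",
                ff, 100.0 * ff / data.length, stuffed);
    }
}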
And about your basic assumption:
for rand_data:
dd if=/dev/urandom of=rand_data count=4096 bs=256
for rand_pseudo (Python 2):
s = "".join(chr(i) for i in range(256))
with open("rand_pseudo", "wb") as f:
    for i in range(4096):
        f.write(s)
Both should be uniform regarding byte-values, shouldn't they? ;)
$ ll rand_*
-rw-r--r-- 1 apuch apuch 1048576 2012-12-04 20:11 rand_data
-rw-r--r-- 1 apuch apuch 1048967 2012-12-04 20:13 rand_data.tar.gz
-rw-r--r-- 1 apuch apuch 1048576 2012-12-04 20:14 rand_pseudo
-rw-r--r-- 1 apuch apuch 4538 2012-12-04 20:15 rand_pseudo.tar.gz
A uniform distribution might indicate high entropy, but it's not a guarantee: rand_pseudo has a perfectly uniform byte histogram and therefore 8 bits/byte of order-0 entropy, yet gzip squeezes it down to a few kilobytes. Also, rand_data might consist of 1 MB of 0x00; it's extremely unlikely, but possible.

Here you can find two files: the first one is random data generated with /dev/urandom (about 46 MB), the second one is a normal JPG file (about 9 MB). It is obvious that the symbols of the JPG file are not as evenly distributed as those from /dev/urandom.
If I compare both files:
Entropy:
JPG: 7.969247 bits/byte
RND: 7.999996 bits/byte
P-value of chi-square test:
JPG: 0
RND: 0.3621
How can the entropy be so high when the chi-square test rejects uniformity so clearly?!

Here is my Java code:
public static double getShannonEntropy_Image(BufferedImage actualImage) {
    int n = 0;
    Map<Integer, Integer> occ = new HashMap<>();
    for (int i = 0; i < actualImage.getHeight(); i++) {
        for (int j = 0; j < actualImage.getWidth(); j++) {
            int pixel = actualImage.getRGB(j, i);
            int red = (pixel >> 16) & 0xff;
            int green = (pixel >> 8) & 0xff;
            int blue = pixel & 0xff;
            // 0.2989 * R + 0.5870 * G + 0.1140 * B greyscale conversion
            int d = (int) Math.round(0.2989 * red + 0.5870 * green + 0.1140 * blue);
            if (occ.containsKey(d)) {
                occ.put(d, occ.get(d) + 1);
            } else {
                occ.put(d, 1);
            }
            ++n;
        }
    }
    double e = 0.0;
    for (Map.Entry<Integer, Integer> entry : occ.entrySet()) {
        double p = (double) entry.getValue() / n;
        e += p * log2(p); // log2 is a small helper: Math.log(p) / Math.log(2.0)
    }
    return -e;
}
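For comparison, the chi-square statistic over the raw bytes of a file could be computed along these lines (a minimal sketch, not the code actually used above; the file name is a placeholder):
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Chi-square goodness-of-fit of the byte histogram against a uniform distribution.
// A statistic far above the 255 degrees of freedom means the byte values are
// measurably non-uniform, even if the entropy is close to 8 bits/byte.
public class ByteChiSquare {
    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get("image.jpg"));
        long[] counts = new long[256];
        for (byte b : data) {
            counts[b & 0xFF]++;
        }
        double expected = data.length / 256.0;
        double chi2 = 0.0;
        for (long c : counts) {
            double diff = c - expected;
            chi2 += diff * diff / expected;
        }
        System.out.printf("chi-square = %.2f (df = 255)%n", chi2);
    }
}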

Related

An Algorithm for producing fake audio visualizer

Does anybody know an algorithm for making a random series of numbers (say 100 Java bytes, each >= -127 and <= 127) which, when drawn as a bar chart, would look similar to a regular audio spectrum, like those SoundCloud ones?
I'm trying to write one; it uses multiple random and sine calculations, but the result is very ugly, something between a sine wave and an old toothbrush. I would be very thankful if you could direct me to one which is aesthetically convincing.
An algorithm with an explanation (and/or picture) is fine. Pseudocode would be very nice. Actual Java code is a bonus. :D
Edit:
This is the code I'm using right now. It's convoluted, but I'm basically adding a random deviation to a sine wave with random amplitude (which I'm not sure was a good idea).
private static final int FREQ = 7;
private static final double DEG_TO_RAD = Math.PI / 180;
private static final int MAX_AMPLITUDE = 127;
private static final float DEVIATION = 0.1f; // 10 percent is maximum deviation

private void makeSinusoidRandomBytes() {
    byte[] bytes = new byte[AUDIO_VISUALIZER_DENSITY];
    for (int i = 0; i < AUDIO_VISUALIZER_DENSITY; i++) {
        int amplitude = random.nextInt(MAX_AMPLITUDE) - MAX_AMPLITUDE / 2;
        byte dev = (byte) (random.nextInt((int) Math.max(Math.abs(2 * DEVIATION * amplitude), 1))
                - Math.abs(DEVIATION * amplitude));
        bytes[i] = (byte) (Math.sin(i * FREQ * DEG_TO_RAD) * amplitude - dev);
    }
    this.bytes = bytes;
}
A real soundwave is actually a combination of sine waves of different frequencies and amplitudes added together, not random deviations from a sine wave. The difficult part will be choosing a combination of wave amplitudes and frequencies that produces output you subjectively like! However, most sound waves have a base frequency and then a number of overtones which "fit into" that wavelength - for example, an overtone at 3/2 of the base frequency with 2/3 of the base amplitude. By combining these overtones and scaling the resulting waveform to the -127 to +127 range, you'll get an actual soundwave.
The following code is C#, but close enough to Java to give you an idea. It's from a game, where I needed to combine many sine waves together to create various types of oscillating effects:
/// <summary>
/// Return a value between 0 and 1 based on a sine-wave oscillating with a given combination of periods at a given point in time
/// </summary>
/// <param name="time">time to get wave value at</param>
/// <param name="periods">lengths of waves</param>
/// <returns>height of wave</returns>
public static float MultiPulse(float time, params float[] periods)
{
    float c = 0;
    foreach (float p in periods)
    {
        float cp = (MathHelper.Pi / p) * time;
        float s = ((float)Math.Sin(cp) + 1) / 2;
        c += s / periods.Length;
    }
    return c;
}
You probably want to modify that to allow you to specify different amplitudes as well as periods for the waves you are combining.
By combining many widely varying amplitudes and periods (frequencies) you should by trial and error be able to get something convincing.
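A Java sketch of such a variant with per-wave amplitudes might look like this (the method and its normalization are my own assumption, not part of the original answer):
// Sum of sine waves with individual periods and amplitudes,
// scaled to the -127..127 byte range expected by the visualizer.
public static byte[] multiPulseBytes(int length, double[] periods, double[] amplitudes) {
    byte[] out = new byte[length];
    double totalAmp = 0;
    for (double a : amplitudes) {
        totalAmp += a;
    }
    for (int i = 0; i < length; i++) {
        double v = 0;
        for (int w = 0; w < periods.length; w++) {
            v += amplitudes[w] * Math.sin(Math.PI * i / periods[w]);
        }
        out[i] = (byte) Math.round(127 * v / totalAmp); // stays within -127..127
    }
    return out;
}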
Based on the idea see sharper gave me, this is the code I'm using right now:
int mainAmp = random.nextInt(MAX_AMPLITUDE) - MAX_AMPLITUDE / 2;
int overtoneAmp = random.nextInt(MAX_AMPLITUDE * 2 / 3) - MAX_AMPLITUDE / 3;
int overtone2Amp = random.nextInt(MAX_AMPLITUDE * 4 / 7) - MAX_AMPLITUDE / 2 * 7;
int mainFreq = random.nextInt(7) + 7;
int overtoneFreq = mainFreq * 3 / 2;
int overtone2Freq = mainFreq * 7 / 4;
byte[] bytes = new byte[AUDIO_VISUALIZER_DENSITY];
for (int i = 0; i < AUDIO_VISUALIZER_DENSITY; i++) {
    bytes[i] = (byte) (Math.sin(i * mainFreq * DEG_TO_RAD) * mainAmp
            + Math.sin(i * overtoneFreq * DEG_TO_RAD) * overtoneAmp
            + Math.sin(i * overtone2Freq * DEG_TO_RAD) * overtone2Amp);
}
Main frequency is between 8 and 15 for my app. You can play with those. The other two overtones I'm using are (2 - 1/2)x & (2 - 1/4)x of main frequency. You can add more like (2 - 1/8)x etc. Or use another series of frequencies. I also randomize the amplitude to get a unique wave each time.
These are some waves I'm drawing using this code:

μ-Law algorithm implementation

Here is a mu-law encoder taken from NAudio. The question is: how does the formula correspond to this code? I can understand that MuLawCompressTable is essentially the logarithm, but I don't get the part about the mantissa and why it is taken as is.
private const int cBias = 0x84;
private const int cClip = 32635;
private static readonly byte[] MuLawCompressTable = new byte[256]
{
0,0,1,1,2,2,2,2,3,3,3,3,3,3,3,3,
4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,
5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,
5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,
6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7
};
public static byte LinearToMuLawSample(short sample)
{
    // We get the sign
    int sign = (sample >> 8) & 0x80;
    if (sign != 0)
        sample = (short)-sample;
    if (sample > cClip)
        sample = cClip;
    sample = (short)(sample + cBias);
    int exponent = (int)MuLawCompressTable[(sample >> 7) & 0xFF];
    int mantissa = (sample >> (exponent + 3)) & 0x0F;
    int compressedByte = ~(sign | (exponent << 4) | mantissa);
    return (byte)compressedByte;
}
They are different. See the Wikipedia page on mu-law, http://en.wikipedia.org/wiki/Mulaw
There are two forms of this algorithm: an analog version, and a quantized digital version.
You quote the formula for the "analog" version - a compressive mapping from -1..1 to -1..1, which emphasizes the basic idea of mu-law, i.e. that the quantized value encodes more detail (uses a smaller quantization step) for smaller values, so the introduced quantization error is roughly proportional to the overall amplitude of the signal.
The "digital" version is a piecewise-linear approximation to this basic idea, with some additional shifts to further simplify processing.
Here's a plot comparing the two. You can see the stairsteps in the green line (mu_digital) corresponding to the discrete 7-bit values, and you can also spot the different linear sections approximating the smooth blue line.

V4L2_PIX_FMT_YUYV: convert from YUYV to RGB24?

I'm capturing image data from a webcam using Video4Linux2. The pixel format returned by the device is V4L2_PIX_FMT_YUYV. According to http://linuxtv.org/downloads/v4l-dvb-apis/V4L2-PIX-FMT-YUYV.html this is the same as YUV422, so I used a YUV422 to RGB24 conversion based on the description at http://paulbourke.net/dataformats/yuv/ .
Amazingly, the result is a strange violet/green picture. So it seems YUYV is something different from YUV422 (and there is also a pixel format V4L2_PIX_FMT_YUV422P, which is/is not the same?).
So I'm totally confused now: how can I convert a V4L2_PIX_FMT_YUYV bitmap to real RGB24? Are there any examples out there?
Too long to put in a comment...
4:2:2 is not a pixel format; it is just notation for how the chroma data has been subsampled. According to the linuxtv link, V4L2_PIX_FMT_YUYV is identical to YUYV, also known as YUY2.
The ultimate reference on the subject is http://www.fourcc.org. Have a look at what it says about YUY2 at http://www.fourcc.org/yuv.php#YUYV
                 Horizontal   Vertical
Y Sample Period      1            1
V Sample Period      2            1
U Sample Period      2            1
To verify that the input format is indeed YUYV, you can use a viewer I wrote using SDL, which natively supports this format (among others):
https://github.com/figgis/yuv-viewer
See also http://www.fourcc.org/fccyvrgb.php for correct formulas for rgb/yuv-conversion.
Take it from there and drop me a comment if you need further assistance...
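As an illustration of what such a conversion looks like, here is a Java sketch for one YUYV macropixel (two output pixels), using the common BT.601 full-range formulas; the exact coefficients depend on your source, so treat these as an assumption rather than the one correct set:
// Convert one YUYV macropixel (4 bytes: Y0 U Y1 V) to two RGB24 pixels.
static void yuyvToRgb24(byte[] yuyv, int in, byte[] rgb, int out) {
    int y0 = yuyv[in] & 0xFF;
    int u  = (yuyv[in + 1] & 0xFF) - 128;
    int y1 = yuyv[in + 2] & 0xFF;
    int v  = (yuyv[in + 3] & 0xFF) - 128;
    writePixel(rgb, out, y0, u, v);
    writePixel(rgb, out + 3, y1, u, v);
}

static void writePixel(byte[] rgb, int out, int y, int u, int v) {
    rgb[out]     = (byte) clamp(Math.round(y + 1.402f * v));
    rgb[out + 1] = (byte) clamp(Math.round(y - 0.344f * u - 0.714f * v));
    rgb[out + 2] = (byte) clamp(Math.round(y + 1.772f * u));
}

static int clamp(int x) {
    return Math.max(0, Math.min(255, x));
}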
I had a similar problem and the issue was endianness. V4L returns pixel data as a series of bytes which I was casting to 16 bit ints. Because of the endianness of my machine the Y and Cb (or Y and Cr for odd pixels) values were getting swapped and I was getting a weird violet/green image.
The solution was just to change how I was extracting Y, Cb and Cr from my 16 bit ints. That is to say, instead of this:
int y = (pixbuf[i] & 0xFF00) >> 8;
int u = pixbuf[(i / 2) * 2] & 0xFF;
int v = pixbuf[(i / 2) * 2 + 1] & 0xFF;
I should have done this:
int y = (pixbuf[i] & 0xFF);
int u = (pixbuf[(i / 2) * 2] & 0xFF00) >> 8;
int v = (pixbuf[(i / 2) * 2 + 1] & 0xFF00) >> 8;
Or indeed just processed them as a sequence of bytes like a sensible person...

What exactly does a Sample Rate of 44100 sample?

I'm using the FMOD library to extract PCM from an MP3. I get the whole 2-channel, 16-bit thing, and I also get that a sample rate of 44100 Hz means 44,100 samples of "sound" per second. What I don't get is what exactly the 16-bit value represents. I know how to plot coordinates on an x-y axis, but what am I plotting? The y axis represents time, the x axis represents what? Sound level? Is that the same as amplitude? How do I determine the different sounds that compose this value? I mean, how do I get a spectrum from a 16-bit number?
This may be a separate question, but it's actually what I really need answered: how do I get the amplitude every 25 milliseconds? Do I take the 44,100 values and divide by 40 (40 * 0.025 s = 1 s)? That gives 1102.5 samples; so would I feed 1102 values into a black box that gives me the amplitude for that moment in time?
I edited the original post to add code I plan to test soon (note: I changed the frame length from 25 ms to 40 ms):
// 44100 / 25 frames = 1764 samples per frame -> 1764 * 2 channels * 2 bytes [16 bit sample] = 7056 bytes
private const int CHUNKSIZE = 7056;

uint bytesread = 0;
var squares = new double[CHUNKSIZE / 4];
const double scale = 1.0d / 32768.0d;
do
{
    result = sound.readData(data, CHUNKSIZE, ref read);
    Marshal.Copy(data, buffer, 0, CHUNKSIZE);
    // PCM samples are 16 bit little endian
    Array.Reverse(buffer);
    for (var i = 0; i < buffer.Length; i += 4)
    {
        var avg = scale * (Math.Abs((double)BitConverter.ToInt16(buffer, i)) + Math.Abs((double)BitConverter.ToInt16(buffer, i + 2))) / 2.0d;
        squares[i >> 2] = avg * avg;
    }
    var rmsAmplitude = ((int)(Math.Floor(Math.Sqrt(squares.Average()) * 32768.0d))).ToString("X2");
    fs.Write(buffer, 0, (int)read);
    bytesread += read;
    statusBar.Text = "writing " + bytesread + " bytes of " + length + " to output.raw";
} while (result == FMOD.RESULT.OK && read == CHUNKSIZE);
After loading the mp3, it seems my rmsAmplitude is in the range 3C00 to 4900. Have I done something wrong? I was expecting a wider spread.
Yes, a sample represents amplitude (at that point in time).
To get a spectrum, you typically convert it from the time domain to the frequency domain.
Last question: multiple approaches are used; you may want the RMS (root mean square).
Generally, the x axis is the time value and y axis is the amplitude. To get the frequency, you need to take the Fourier transform of the data (most likely using the Fast Fourier Transform [fft] algorithm).
To use one of the simplest "sounds", let's assume you have a single-frequency tone with frequency f. This is represented (in the amplitude/time domain) as y = sin(2 * pi * f * x).
If you convert that into the frequency domain, you just end up with Frequency = f.
Each sample represents the voltage of the analog signal at a given time.
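To make the "amplitude every 40 ms" part concrete, here is a minimal Java sketch of the RMS of one frame of interleaved 16-bit stereo samples (the 1764 samples per channel follow the comment in the code above; everything else is an assumption, not the asker's actual setup):
// RMS amplitude of one frame of interleaved 16-bit stereo PCM.
// samples holds left/right pairs; 1764 pairs = 40 ms at 44100 Hz.
static double frameRms(short[] samples) {
    double sumSquares = 0.0;
    for (int i = 0; i < samples.length; i += 2) {
        // average the two channels and normalize to -1..1
        double mono = (samples[i] + samples[i + 1]) / 2.0 / 32768.0;
        sumSquares += mono * mono;
    }
    return Math.sqrt(sumSquares / (samples.length / 2));
}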

RGB 24 to 16-bit color conversion - Colors are darkening

I noticed that my routine to convert from 24-bit RGB888 to 16-bit RGB565 progressively darkens the colors each time a conversion round trip takes place... The formula uses linear interpolation, like so...
typedef struct _RGB24 RGB24;
struct _RGB24 {
    BYTE b;
    BYTE g;
    BYTE r;
};

RGB24 *s; // source
WORD *d;  // destination
WORD r;
WORD g;
WORD b;

// Code to convert from 24-bit to 16-bit
r = (WORD)((double)(s[x].r * 31) / 255.0);
g = (WORD)((double)(s[x].g * 63) / 255.0);
b = (WORD)((double)(s[x].b * 31) / 255.0);
d[x] = (r << REDSHIFT) | (g << GREENSHIFT) | (b << BLUESHIFT);

// Code to convert from 16-bit to 24-bit
s[x].r = (BYTE)((double)(((d[x] & REDMASK) >> REDSHIFT) * 255) / 31.0);
s[x].g = (BYTE)((double)(((d[x] & GREENMASK) >> GREENSHIFT) * 255) / 63.0);
s[x].b = (BYTE)((double)(((d[x] & BLUEMASK) >> BLUESHIFT) * 255) / 31.0);
The conversion from 16-bit to 24-bit is similar but with the interpolation reversed... I don't understand how the values keep getting lower and lower each time a color is cycled through the equations if they are inverses of each other... Originally there was no cast to double, but I figured that if I made it a floating-point divide it would not have the falloff... but it still does...
When you convert your double values to WORD, the values are being truncated. For example,
(126 * 31) / 255 = 15.318, which is truncated to 15. Because the values are truncated, they get progressively lower on each iteration. You need to introduce rounding (by adding 0.5 to the calculated values before converting them to integers).
Continuing the example, you then take 15 and convert back:
(15 * 255) / 31 = 123.387, which truncates to 123, so the original 126 has already dropped to 123 after one round trip.
Don't use floating point for something this simple. The normal way I've seen is to truncate on the down-conversion but extend on the up-conversion (so 0b11111 becomes 0b11111111).
// Code to convert from 24-bit to 16 bit
r = s[x].r >> (8-REDBITS);
g = s[x].g >> (8-GREENBITS);
b = s[x].b >> (8-BLUEBITS);
d[x] = (r << REDSHIFT) | (g << GREENSHIFT) | (b << BLUESHIFT);
// Code to convert from 16-bit to 24-bit
s[x].r = (d[x] & REDMASK) >> REDSHIFT; // 000abcde
s[x].r = s[x].r << (8-REDBITS) | s[x].r >> (2*REDBITS-8); // abcdeabc
s[x].g = (d[x] & GREENMASK) >> GREENSHIFT; // 00abcdef
s[x].g = s[x].g << (8-GREENBITS) | s[x].g >> (2*GREENBITS-8); // abcdefab
s[x].b = (d[x] & BLUEMASK) >> BLUESHIFT; // 000abcde
s[x].b = s[x].b << (8-BLUEBITS) | s[x].b >> (2*BLUEBITS-8); // abcdeabc
Casting a double to WORD doesn't round the value; it truncates the fractional part. You need to use some kind of rounding routine to get rounding behavior; typically you want to round half to even. There is a Stack Overflow question on how to round in C++ if you need it.
Also note that the conversion from 24 bit to 16 bits permanently loses information. It's impossible to fit 24 bits of information into 16 bits, of course. You can't get it back by conversion from 16 bits back to 24 bits.
It is because 16-bit, 24-bit and 32-bit formats allocate a different number of bits to each of the R, G and B channels, so the same color is stored with different precision in each format.
For more background, look up the concept of color depth (bit color) on Wikipedia; I hope it helps.
Since you're converting to double anyway, at least use it to avoid overflow, i.e. replace
r = (WORD)((double)(s[x].r * 31) / 255.0);
with
r = (WORD)round(s[x].r / 255.0 * 31.0);
This way the compiler should also fold 31.0 / 255.0 into a constant.
Obviously, if this has to be repeated for huge quantities of pixels, it would be preferable to create and use a LUT (lookup table) instead.
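A minimal sketch of such lookup tables (in Java for illustration; the 5/6/5 channel layout matches the code above, everything else is an assumption):
// Precomputed tables for 8-bit <-> 5/6-bit channel conversion with rounding.
static final int[] TO5 = new int[256];   // 8-bit value -> 5-bit value
static final int[] TO6 = new int[256];   // 8-bit value -> 6-bit value
static final int[] FROM5 = new int[32];  // 5-bit value -> 8-bit value
static final int[] FROM6 = new int[64];  // 6-bit value -> 8-bit value

static {
    for (int i = 0; i < 256; i++) {
        TO5[i] = (int) Math.round(i * 31.0 / 255.0);
        TO6[i] = (int) Math.round(i * 63.0 / 255.0);
    }
    for (int i = 0; i < 32; i++) FROM5[i] = (int) Math.round(i * 255.0 / 31.0);
    for (int i = 0; i < 64; i++) FROM6[i] = (int) Math.round(i * 255.0 / 63.0);
}

// 24-bit -> 16-bit (RGB565)
static int pack565(int r, int g, int b) {
    return (TO5[r] << 11) | (TO6[g] << 5) | TO5[b];
}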
