V4L2_PIX_FMT_YUYV: convert from YUYV to RGB24? - linux

I'm capturing image data from a webcam using Video4Linux2. The pixel format returned by the device is V4L2_PIX_FMT_YUYV. According to http://linuxtv.org/downloads/v4l-dvb-apis/V4L2-PIX-FMT-YUYV.html this is the same as YUV422 so I used a YUV422 to RGB24 conversion based on the description at http://paulbourke.net/dataformats/yuv/ .
Amazingly the result is a strange violet/green picture. So it seems YUYV is something different from YUV422 (and there is also a pixel format V4L2_PIX_FMT_YUV422P, which may or may not be the same?).
So I'm totally confused now: how can I convert a V4L2_PIX_FMT_YUYV bitmap to real RGB24? Are there any examples out there?

Too long to put in a comment...
4:2:2 is not a pixel format; it is just a notation for how the chroma data has been subsampled. According to the linuxtv link, V4L2_PIX_FMT_YUYV is identical to YUYV or YUY2.
The ultimate reference on the subject is http://www.fourcc.org. Have a look at what it says about YUY2 at http://www.fourcc.org/yuv.php#YUYV
                  Horizontal   Vertical
Y Sample Period        1           1
V Sample Period        2           1
U Sample Period        2           1
To verify that the input format is indeed YUYV, you can use a viewer I wrote using SDL, which natively supports this format (among others):
https://github.com/figgis/yuv-viewer
See also http://www.fourcc.org/fccyvrgb.php for correct formulas for rgb/yuv-conversion.
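For illustration, here is a minimal sketch in C of such a conversion (not taken from the viewer above; it assumes full-range YUV and uses one of the simpler formula sets from that page, with clamping):
#include <stdint.h>

static uint8_t clamp_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* src: width*height*2 bytes of packed YUYV (Y0 U Y1 V per pixel pair),
   dst: width*height*3 bytes of packed RGB24; width is assumed to be even */
void yuyv_to_rgb24(const uint8_t *src, uint8_t *dst, int width, int height)
{
    for (int i = 0; i < width * height / 2; i++) {
        int y0 = src[4 * i + 0];
        int u  = src[4 * i + 1] - 128;
        int y1 = src[4 * i + 2];
        int v  = src[4 * i + 3] - 128;

        /* first pixel of the pair */
        dst[6 * i + 0] = clamp_u8(y0 + (int)(1.403 * v));                    /* R */
        dst[6 * i + 1] = clamp_u8(y0 - (int)(0.344 * u) - (int)(0.714 * v)); /* G */
        dst[6 * i + 2] = clamp_u8(y0 + (int)(1.770 * u));                    /* B */
        /* second pixel reuses the same chroma pair */
        dst[6 * i + 3] = clamp_u8(y1 + (int)(1.403 * v));
        dst[6 * i + 4] = clamp_u8(y1 - (int)(0.344 * u) - (int)(0.714 * v));
        dst[6 * i + 5] = clamp_u8(y1 + (int)(1.770 * u));
    }
}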
Take it from there and drop me a comment if you need further assistance...

I had a similar problem and the issue was endianness. V4L returns pixel data as a series of bytes which I was casting to 16 bit ints. Because of the endianness of my machine the Y and Cb (or Y and Cr for odd pixels) values were getting swapped and I was getting a weird violet/green image.
The solution was just to change how I was extracting Y, Cb and Cr from my 16 bit ints. That is to say, instead of this:
int y = (pixbuf[i] & 0xFF00) >> 8;
int u = pixbuf[(i / 2) * 2] & 0xFF;
int v = pixbuf[(i / 2) * 2 + 1] & 0xFF;
I should have done this:
int y = (pixbuf[i] & 0xFF);
int u = (pixbuf[(i / 2) * 2] & 0xFF00) >> 8;
int v = (pixbuf[(i / 2) * 2 + 1] & 0xFF00) >> 8;
Or indeed just processed them as a sequence of bytes like a sensible person...
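In C, the byte-wise version might look like this (just a sketch; the buffer is addressed as plain bytes, so host endianness no longer matters):
#include <stdint.h>

/* extract Y, Cb, Cr for pixel i from a packed YUYV buffer (Y0 U Y1 V ...) */
static void yuyv_pixel(const uint8_t *buf, int i, int *y, int *u, int *v)
{
    *y = buf[2 * i];               /* luma of pixel i             */
    *u = buf[(i / 2) * 4 + 1];     /* Cb shared by the pixel pair */
    *v = buf[(i / 2) * 4 + 3];     /* Cr shared by the pixel pair */
}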


Programmatically Lighten or Darken a hex color in lua - nvim highlight colors

The goal is to programmatically change a hex color's brightness in Lua.
This post contains several nice examples for JS: Programmatically Lighten or Darken a hex color (or rgb, and blend colors)
I tried to convert one of these functions, but I'm still pretty new to Lua programming. It only needs to work with hex values; RGB or other variants are not needed. Therefore, I thought the "simpler" answers could serve as inspiration, but I still had no luck with it.
Eventually it will be used to manipulate highlight colors in nvim. I'm getting the color codes with a function I wrote:
local function get_color(synID, what)
    local command = 'echo synIDattr(hlID("' .. synID .. '"),' .. '"' .. what .. '"' .. ')'
    return vim.api.nvim_command_output(command)
end
I wouldn't resort to bit ops in Lua 5.2 and lower, especially as Lua 5.1 lacks them (LuaJIT however does provide them); use multiplication, floor division & mod instead, and take care to clamp your values:
local function clamp(component)
    return math.min(math.max(component, 0), 255)
end

function LightenDarkenColor(col, amt)
    local num = tonumber(col, 16)
    local r = math.floor(num / 0x10000) + amt
    local g = (math.floor(num / 0x100) % 0x100) + amt
    local b = (num % 0x100) + amt
    return string.format("%#x", clamp(r) * 0x10000 + clamp(g) * 0x100 + clamp(b))
end
Especially with the introduction of bit operators in Lua 5.3, the JavaScript references work with minimal changes:
function LightenDarkenColor(col, amt)
    col = tonumber(col, 16)
    return string.format("%#x", ((col & 0x0000FF) + amt) | ((((col >> 8) & 0x00FF) + amt) << 8) | (((col >> 16) + amt) << 16))
end

print(LightenDarkenColor("3F6D2A", 40))
parseInt became tonumber, and toString(16) became string.format("%#x", ...).
Note that this function does not perform any error handling on overflows.
The second function on the linked page can be ported the same way. var would be a local in Lua.
For Lua 5.2 and below, you need to use the bit library functions instead of the operators. I ported the second function instead, since it would otherwise get very unreadable very quickly:
function LightenDarkenColor(col, amt)
    local num = tonumber(col, 16)
    local r = bit.rshift(num, 16) + amt
    local b = bit.band(bit.rshift(num, 8), 0x00FF) + amt
    local g = bit.band(num, 0x0000FF) + amt
    local newColor = bit.bor(g, bit.bor(bit.lshift(b, 8), bit.lshift(r, 16)))
    return string.format("%#x", newColor)
end

Distribution of bytes within jpeg files

When observing compressed data, I expect an almost uniformly distributed byte stream. When using the chi-square test to measure the distribution, I get this result e.g. for ZIP files and other compressed data, but not for JPG files. I have spent the last few days looking for reasons for this, but I cannot find any.
When calculating the entropy of JPGs, I get a high result (e.g. 7.95 bits/byte). I thought there must be a connection between the entropy and the distribution: the entropy is high when every byte appears with almost the same probability. But when using chi-square, I get a p-value of about 4.5e-5...
I just want to understand how different distributions influence the test results... I thought I could measure the same property with both tests, but obviously I cannot.
Thank you very much for any hint!
tom
Distribution in JPEG files
Ignoring the meta-information and the JPEG header data, the payload of a JPEG consists of blocks describing Huffman tables or encoded MCUs (Minimum Coded Units, square blocks of size 16x16). There may be others, but these are the most frequent ones.
Those blocks are delimited by 0xFF 0xSS, where 0xSS is a specific start code. Here is the first problem: 0xFF is a bit more frequent, as twalberg mentioned in the comments.
A 0xFF byte may also occur inside an encoded MCU. To distinguish this normal payload from the start of a new block, 0xFF 0x00 is inserted (byte stuffing). If the distribution of the unstuffed payload were perfectly uniform, 0x00 would appear about twice as often in the stuffed data. To make things worse, every MCU is filled up with binary ones to get byte alignment (a slight bias towards larger values), and we might need stuffing again.
There may also be other factors I'm not aware of. If you need more information, you would have to provide the JPEG file.
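To illustrate the stuffing rule, here is a minimal C sketch (not from any JPEG library) that strips the stuffed zero bytes from an entropy-coded segment before building a byte histogram; restart markers are ignored for simplicity:
#include <stddef.h>
#include <stdint.h>

/* copy n bytes from in to out (out must hold at least n bytes), dropping the
   0x00 that is stuffed after every literal 0xFF; returns the unstuffed size */
size_t jpeg_unstuff(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        out[o++] = in[i];
        if (in[i] == 0xFF && i + 1 < n && in[i + 1] == 0x00)
            i++;   /* skip the stuffed zero byte */
    }
    return o;
}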
And about your basic assumption:
for rand_data:
dd if=/dev/urandom of=rand_data count=4096 bs=256
for rand_pseudo (python):
s = "".join(chr(i) for i in range(256))
with file("rand_pseudo", "wb") as f:
for i in range(4096):
f.write(s)
Both should be uniform regarding byte-values, shouldn't they? ;)
$ ll rand_*
-rw-r--r-- 1 apuch apuch 1048576 2012-12-04 20:11 rand_data
-rw-r--r-- 1 apuch apuch 1048967 2012-12-04 20:13 rand_data.tar.gz
-rw-r--r-- 1 apuch apuch 1048576 2012-12-04 20:14 rand_pseudo
-rw-r--r-- 1 apuch apuch 4538 2012-12-04 20:15 rand_pseudo.tar.gz
A uniform distribution might indicate high entropy, but it's not a guarantee. Also, rand_data might consist of 1 MB of 0x00. It's extremely unlikely, but possible.
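If you want to compare both measures on the same data, here is a small self-contained C sketch (an illustration, not code from the question) that computes the byte entropy and the raw chi-square statistic against a uniform expectation; the p-value then comes from the chi-square distribution with 255 degrees of freedom:
#include <math.h>
#include <stddef.h>
#include <stdint.h>

void byte_stats(const uint8_t *buf, size_t n, double *entropy, double *chi2)
{
    double count[256] = { 0.0 };
    for (size_t i = 0; i < n; i++)
        count[buf[i]] += 1.0;

    double h = 0.0, x2 = 0.0, expected = (double)n / 256.0;
    for (int b = 0; b < 256; b++) {
        if (count[b] > 0.0) {
            double p = count[b] / (double)n;
            h -= p * log2(p);              /* Shannon entropy in bits/byte */
        }
        double d = count[b] - expected;
        x2 += d * d / expected;            /* chi-square statistic, 255 dof */
    }
    *entropy = h;
    *chi2 = x2;
}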
Here you can find two files: the first one is random data, generated with /dev/urandom (about 46 MB), the second one is a normal JPG file (about 9 MB). It is obvious that the symbols of the JPG file are not as equally distributed as in /dev/urandom.
If I compare both files:
Entropy:
  JPG: 7.969247 bits/byte
  RND: 7.999996 bits/byte
P-value of the chi-square test:
  JPG: 0
  RND: 0.3621
How can the entropy come out so high?!?
Here is my Java code:
// Needs: java.awt.image.BufferedImage, java.util.Map, java.util.HashMap
public static double getShannonEntropy_Image(BufferedImage actualImage) {
    int n = 0;
    Map<Integer, Integer> occ = new HashMap<>();
    for (int i = 0; i < actualImage.getHeight(); i++) {
        for (int j = 0; j < actualImage.getWidth(); j++) {
            int pixel = actualImage.getRGB(j, i);
            int red   = (pixel >> 16) & 0xff;
            int green = (pixel >> 8) & 0xff;
            int blue  = pixel & 0xff;
            // 0.2989 * R + 0.5870 * G + 0.1140 * B greyscale conversion
            int d = (int) Math.round(0.2989 * red + 0.5870 * green + 0.1140 * blue);
            if (occ.containsKey(d)) {
                occ.put(d, occ.get(d) + 1);
            } else {
                occ.put(d, 1);
            }
            ++n;
        }
    }
    double e = 0.0;
    for (Map.Entry<Integer, Integer> entry : occ.entrySet()) {
        double p = (double) entry.getValue() / n;
        e += p * log2(p);
    }
    return -e;
}

// Helper, since java.lang.Math has no log2
private static double log2(double x) {
    return Math.log(x) / Math.log(2);
}

What exactly does a Sample Rate of 44100 sample?

I'm using the FMOD library to extract PCM from an MP3. I get the whole 2-channel, 16-bit thing, and I also get that a sample rate of 44100 Hz means 44,100 samples of "sound" in 1 second. What I don't get is: what exactly does the 16-bit value represent? I know how to plot coordinates on an xy axis, but what am I plotting? The y axis represents time, the x axis represents what? Sound level? Is that the same as amplitude? How do I determine the different sounds that compose this value? I mean, how do I get a spectrum from a 16-bit number?
This may be a separate question, but it's actually what I really need answered: How do I get the amplitude every 25 milliseconds? Do I take 44,100 values and divide by 40 (40 * 0.025 seconds = 1 sec)? That gives 1102.5 samples; so would I feed 1102 values into a black box that gives me the amplitude for that moment in time?
Edited the original post to add code I plan to test soon (note: I changed the frame length from 25 ms to 40 ms):
// 44100 / 25 frames = 1764 samples per frame -> 1764 * 2 channels * 2 bytes [16 bit sample] = 7056 bytes
private const int CHUNKSIZE = 7056;

uint bytesread = 0;
var squares = new double[CHUNKSIZE / 4];
const double scale = 1.0d / 32768.0d;
do
{
    result = sound.readData(data, CHUNKSIZE, ref read);
    Marshal.Copy(data, buffer, 0, CHUNKSIZE);
    // PCM samples are 16 bit little endian
    Array.Reverse(buffer);
    for (var i = 0; i < buffer.Length; i += 4)
    {
        var avg = scale * (Math.Abs((double)BitConverter.ToInt16(buffer, i)) + Math.Abs((double)BitConverter.ToInt16(buffer, i + 2))) / 2.0d;
        squares[i >> 2] = avg * avg;
    }
    var rmsAmplitude = ((int)(Math.Floor(Math.Sqrt(squares.Average()) * 32768.0d))).ToString("X2");
    fs.Write(buffer, 0, (int) read);
    bytesread += read;
    statusBar.Text = "writing " + bytesread + " bytes of " + length + " to output.raw";
} while (result == FMOD.RESULT.OK && read == CHUNKSIZE);
After loading the mp3, it seems my rmsAmplitude is in the range 3C00 to 4900. Have I done something wrong? I was expecting a wider spread.
Yes, a sample represents amplitude (at that point in time).
To get a spectrum, you typically convert it from the time domain to the frequency domain.
For your last question: multiple approaches are used - you may want the RMS.
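For illustration, a minimal C sketch of an RMS computation over one frame of 16-bit samples (not FMOD-specific; names are illustrative):
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* RMS amplitude of one frame of 16-bit PCM samples, normalised to 0.0 .. ~1.0 */
double frame_rms(const int16_t *samples, size_t count)
{
    double sum = 0.0;
    for (size_t i = 0; i < count; i++) {
        double s = samples[i] / 32768.0;   /* scale to [-1, 1) */
        sum += s * s;
    }
    return sqrt(sum / (double)count);
}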
Generally, the x axis is the time value and y axis is the amplitude. To get the frequency, you need to take the Fourier transform of the data (most likely using the Fast Fourier Transform [fft] algorithm).
To use one of the simplest "sounds", let's assume you have a pure tone with frequency f. In the amplitude/time domain this is represented as y = sin(2 * pi * f * x), where x is time.
If you convert that into the frequency domain, you just end up with a single spike at frequency f.
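To make "convert it into the frequency domain" concrete, here is a deliberately naive C sketch of a DFT magnitude computation (illustrative only; in practice you would use an FFT library):
#include <math.h>
#include <stddef.h>

/* naive O(n^2) DFT: magnitude of each frequency bin for one block of samples;
   bin k corresponds to k * samplerate / n Hz */
void dft_magnitude(const double *x, size_t n, double *mag /* n/2 entries */)
{
    const double pi = 3.14159265358979323846;
    for (size_t k = 0; k < n / 2; k++) {
        double re = 0.0, im = 0.0;
        for (size_t t = 0; t < n; t++) {
            double w = 2.0 * pi * (double)k * (double)t / (double)n;
            re += x[t] * cos(w);
            im -= x[t] * sin(w);
        }
        mag[k] = sqrt(re * re + im * im);
    }
}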
Each sample represents the voltage of the analog signal at a given time.

RGB 24 to 16-bit color conversion - Colors are darkening

I noticed that my routine to convert from 24-bit RGB888 to 16-bit RGB565 darkened the colors progressively each time a conversion took place... The formula uses linear interpolation like so...
typedef struct _RGB24 RGB24;
struct _RGB24 {
    BYTE b;
    BYTE g;
    BYTE r;
};
RGB24 *s; // source
WORD *d;  // destination
WORD r;
WORD g;
WORD b;
// Code to convert from 24-bit to 16-bit
r = (WORD)((double)(s[x].r * 31) / 255.0);
g = (WORD)((double)(s[x].g * 63) / 255.0);
b = (WORD)((double)(s[x].b * 31) / 255.0);
d[x] = (r << REDSHIFT) | (g << GREENSHIFT) | (b << BLUESHIFT);
// Code to convert from 16-bit to 24-bit
s[x].r = (BYTE)((double)(((d[x] & REDMASK) >> REDSHIFT) * 255) / 31.0);
s[x].g = (BYTE)((double)(((d[x] & GREENMASK) >> GREENSHIFT) * 255) / 63.0);
s[x].b = (BYTE)((double)(((d[x] & BLUEMASK) >> BLUESHIFT) * 255) / 31.0);
The conversion from 16-bit to 24-bit is similar but with reverse interpolation... I don't understand how the values keep getting lower and lower each time a color is cycled through the equation if they are opposites... Originally there was no cast to double, but I figured if I made it a floating point divide it would not have the falloff... but it still does...
When you convert your double values to WORD, the values are being truncated. For example,
(127 * 31) / 255 = 15.439, which is truncated to 15. Because the values are truncated, they get progressively lower through each iteration. You need to introduce rounding (by adding 0.5 to the calculated values before converting them to integers).
Continuing the example, you then take 15 and convert back:
(15 * 255)/31 = 123.387 which truncates to 123
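As a concrete illustration, a small integer-only C sketch of the rounded down-scaling (the helper name is illustrative):
#include <stdint.h>

/* scale an 8-bit channel down to 'bits' bits, rounding to nearest instead of
   truncating: adding 127 (roughly half of 255) before the division rounds */
static uint16_t scale_down(uint8_t value, int bits)
{
    int maxval = (1 << bits) - 1;   /* 31 for 5 bits, 63 for 6 bits */
    return (uint16_t)((value * maxval + 127) / 255);
}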
Don't use floating point for something simple like this. The normal way I've seen is to truncate on the down-conversion but extend by bit replication on the up-conversion (so 0b11111 goes to 0b11111111).
// Code to convert from 24-bit to 16 bit
r = s[x].r >> (8-REDBITS);
g = s[x].g >> (8-GREENBITS);
b = s[x].b >> (8-BLUEBITS);
d[x] = (r << REDSHIFT) | (g << GREENSHIFT) | (b << BLUESHIFT);
// Code to convert from 16-bit to 24-bit
s[x].r = (d[x] & REDMASK) >> REDSHIFT; // 000abcde
s[x].r = s[x].r << (8-REDBITS) | s[x].r >> (2*REDBITS-8); // abcdeabc
s[x].g = (d[x] & GREENMASK) >> GREENSHIFT; // 00abcdef
s[x].g = s[x].g << (8-GREENBITS) | s[x].g >> (2*GREENBITS-8); // abcdefab
s[x].b = (d[x] & BLUEMASK) >> BLUESHIFT; // 000abcde
s[x].b = s[x].b << (8-BLUEBITS) | s[x].b >> (2*BLUEBITS-8); // abcdeabc
Casting double to WORD doesn't round the double value - it truncates the fractional part. You need to use some kind of rounding routine to get rounding behavior. Typically you want to round half to even. There is a Stack Overflow question on how to round in C++ if you need it.
Also note that the conversion from 24 bit to 16 bits permanently loses information. It's impossible to fit 24 bits of information into 16 bits, of course. You can't get it back by conversion from 16 bits back to 24 bits.
This is because 16-bit, 24-bit and 32-bit color simply allot a different number of bits to each of the R, G and B channels, so the same color is stored with different precision in each format.
For background, look up the concept of color depth (bits per pixel) on Wikipedia; hope it will help you.
Since you're converting to double anyway, at least use it to avoid overflow and to round properly, i.e. replace
r = (WORD)((double)(s[x].r * 31) / 255.0);
with
r = (WORD)round(s[x].r * (31.0 / 255.0));
This way the compiler can also fold 31.0 / 255.0 into a constant.
Obviously if this has to be repeated for huge quantities of pixels, it would be preferable to create and use a LUT (lookup table) instead.
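A sketch of what such a LUT could look like (names and the rounding choice are illustrative):
#include <stdint.h>

static uint8_t lut5[256];   /* 0..255 -> 0..31 */
static uint8_t lut6[256];   /* 0..255 -> 0..63 */

/* build the tables once, then index them per pixel instead of dividing */
static void init_luts(void)
{
    for (int i = 0; i < 256; i++) {
        lut5[i] = (uint8_t)((i * 31 + 127) / 255);
        lut6[i] = (uint8_t)((i * 63 + 127) / 255);
    }
}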

What's the most effective way to interpolate between two colors? (pseudocode and bitwise ops expected)

Making a Blackberry app, want a Gradient class. What's the most effective way (as in, speed and battery life) to interpolate two colors? Please be specific.
// Java, of course
int c1 = 0xFFAA0055; // color 1, ARGB
int c2 = 0xFF00CCFF; // color 2, ARGB
float st = 0f; // the current step in the interpolation, between 0 and 1
Help from here on.
Should I separate each channel of each color, convert them to decimal and interpolate? Is there a simpler way?
interpolatedChannel = red1+((red2-red1)*st)
interpolatedChannel = interpolatedChannel.toString(16)
^ Is this the right thing to do? If speed and effectiveness
are important in a mobile app, should I use bitwise operations?
Help me!
You'll have to separate channels, but there's no need to convert them to decimal.
For example, if you allow 256 possible gradients:
red = red1 + ((red2 - red1) * stage / 256)
EDIT: Since you said you don't know much about bit management, here's a quick way to split an ARGB int into channels:
int alpha = (color >>> 24) & 0xff;
int red   = (color >> 16) & 0xff;
int green = (color >> 8) & 0xff;
int blue  = color & 0xff;
And combining them back:
color = (alpha << 24) | (red << 16) | (green << 8) | blue;
From here, the details should normally be handled by the compiler optimizations. If anything, you're looking for the best algorithm.
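For completeness, here is that per-channel approach written out as a plain C function (a sketch; the names and the 0..256 step convention are illustrative):
#include <stdint.h>

/* linear interpolation of two ARGB colors; step runs from 0 (first color)
   to 256 (second color), and every 8-bit channel is interpolated separately */
uint32_t lerp_argb(uint32_t c1, uint32_t c2, int step)
{
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        int ch1 = (int)((c1 >> shift) & 0xff);
        int ch2 = (int)((c2 >> shift) & 0xff);
        int ch  = ch1 + (ch2 - ch1) * step / 256;
        out |= (uint32_t)ch << shift;
    }
    return out;
}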
private function interpolateColorsCompact( a:int, b:int, lerp:Number ):int
{
    var MASK1:int = 0xff00ff;
    var MASK2:int = 0x00ff00;
    var f2:int = 256 * lerp;
    var f1:int = 256 - f2;
    return ((((( a & MASK1 ) * f1 ) + ( ( b & MASK1 ) * f2 )) >> 8 ) & MASK1 )
         | ((((( a & MASK2 ) * f1 ) + ( ( b & MASK2 ) * f2 )) >> 8 ) & MASK2 );
}
Not sure if this is the most compact way of doing it, but it uses fewer local variables and fewer operators than the classic method that splits the colors into 3 channels first.
Oh - and sorry that this is ActionScript, but it should be clear how to convert it to Java.
Updated my answer (found a better way):
The following technique will lose 1 bit precision per channel, but it's extremely fast, since you won't have to split the colors into channels:
int color1 = ...;
int color2 = ...;
int interpolatedColor = ((color1 & 0xFEFEFEFE) >>> 1) +
                        ((color2 & 0xFEFEFEFE) >>> 1);
So, first you AND both colors with 0xFEFEFEFE. This clears the lowest bit of each channel (reducing precision, as I said). After that, you can safely halve the whole value (implemented as an unsigned right shift by 1, so a set alpha bit doesn't sign-extend). Finally, you just add up the two halves.
Just the Java version of /u/Quasimondo's answer (masks kept to the red/blue and green pairs so the intermediate products fit in an int; the first color's alpha is carried through):
public static int mixColors(int a, int b, float ratio) {
    int mask1 = 0x00ff00ff;                 // red and blue
    int mask2 = 0x0000ff00;                 // green
    int f2 = (int) (256 * ratio);
    int f1 = 256 - f2;
    return (a & 0xff000000)                 // keep the first color's alpha
         | (((((a & mask1) * f1) + ((b & mask1) * f2)) >> 8) & mask1)
         | (((((a & mask2) * f1) + ((b & mask2) * f2)) >> 8) & mask2);
}
If you only need an exact 50/50 mix you can drop the multiplications:
public static int mixColors(int a, int b) {
    int mask1 = 0x00ff00ff;
    int mask2 = 0x0000ff00;
    return (a & 0xff000000)                 // keep the first color's alpha
         | ((((a & mask1) + (b & mask1)) >> 1) & mask1)
         | ((((a & mask2) + (b & mask2)) >> 1) & mask2);
}
