Binary to 16 Bit unsigned & 16 bit signed magnitude - decimal

I missed a day of class due to illness, so I checked out my prof's material for that day online, and I'm stuck on this. His notes don't explain how to do it. I can do conversions between the bases (decimal to octal, hex, binary, etc.), but I can't do this.
Any help? An example would really help me understand quickly. I'll post his slideshow example:
1010 0000 0100 0101 as an unsigned value
= (1 * 2^15) + (1 * 2^13) + (1 * 2^6) + (1 * 2^2) + (1 * 2^0)
= 32,768 + 8,192 + 64 + 4 + 1
= 41,029 base 10
1010 0000 0100 0101 as a signed value
= - [(1 * 2^13) + (1 * 2^6) + (1 * 2^2) + (1 * 2^0)]
= -8,261 base 10
I guess I should really attend class even when I'm sick.

The sign is bit 15 (the leftmost bit). So all you have to do is count the places that hold ones (i.e. 2^place) and add them together.

The difference between a signed integer and an unsigned integer is that one of the bits, in this case the leftmost bit, is used to indicate whether the value is positive or negative. Here, if the leftmost bit is 1 the value is negative, and if the leftmost bit is 0 the value is positive.
So in the example that your professor gave,
1010 0000 0100 0101
can be interpreted as either a signed integer or an unsigned integer, depending on the situation. When interpreted as an unsigned integer, every bit contributes to the magnitude:
(1 * 2^15) + (1 * 2^13) + (1 * 2^6) + (1 * 2^2) + (1 * 2^0) = 41,029
When interpreted as a signed (sign-magnitude) value, the leftmost bit supplies the sign and the remaining bits supply the magnitude:
- [(1 * 2^13) + (1 * 2^6) + (1 * 2^2) + (1 * 2^0)] = -8,261
Hope this helps!
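If it helps to see it in code, here is a minimal C++ sketch of both interpretations (the bit pattern is the one from the professor's example; note that real machines use two's complement rather than sign-magnitude, so we decode the bits manually):

#include <cstdint>
#include <cstdio>

int main() {
    uint16_t raw = 0xA045; // 1010 0000 0100 0101

    // Unsigned: every set bit i contributes 2^i.
    unsigned unsignedValue = raw; // 41029

    // Sign-magnitude: bit 15 is the sign, bits 0..14 are the magnitude.
    int sign = (raw >> 15) ? -1 : 1;
    int magnitude = raw & 0x7FFF; // 8261
    int signedValue = sign * magnitude; // -8261

    printf("unsigned: %u\n", unsignedValue);
    printf("sign-magnitude: %d\n", signedValue);
    return 0;
}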

Related

Is there a limit to the size of a BigInt or BigUint in Rust?

Is there no limit to the size of a BigInt or BigUint from the num crate in Rust? I see that in Java its length is bounded by Integer.MAX_VALUE, since it is stored as an array of int.
I did go through the documentation but could not really deduce my answer from:
A BigUint-typed value BigUint { data: vec!(a, b, c) } represents a
number (a + b * big_digit::BASE + c * big_digit::BASE^2).
big_digit::BASE being defined as
pub const BASE: DoubleBigDigit = 1 << BITS
BITS in turn is 32
So is the BigInt represented internally as (a + b * 2^32 + c * (2^32)^2)?
TL;DR: the maximum number that can be represented is roughly:
3.079 x 10^22212093154093428519
I suppose that nothing useful needs such a big number to be represented. You can be certain that num_bigint will do the job, whatever use you have for it.
In theory, there is no limit to the size of num's big integers, since the documentation says nothing about one (version 0.1.44). However, there is a concrete limit that we can calculate:
BigUint is a Vec<BigDigit>, and BigDigit is a u32. As far as I know, Rust does not define a maximum size for a Vec, but since the maximum possible allocation is isize::MAX bytes, the maximum number of BigDigits (i.e. u32s) is:
MAX_LEN = isize::MAX / sizeof(u32)
With this information, we can deduce that the maximum of a num::BigUint (and a num::BigInt as well) in the current implementation is:
(u32::MAX + 1)^MAX_LEN - 1 = (2^32)^MAX_LEN - 1
To get this formula, we mimic the way we would calculate u8::MAX, for example:
bit::MAX is 1,
the length is 8,
so the maximum is (bit::MAX + 1) ^ 8 - 1 = 255
Here is the full derivation, starting from the formula given by the num documentation:
a + b * big_digit::BASE + c * big_digit::BASE^2 + ...
For the maximum value, every digit equals u32::MAX; let's call that a. Let's call big_digit::BASE b for convenience. So the max number is:
sum(a * b^n) where n is from 0 to (MAX_LEN - 1)
if we factorize, we get:
a * sum(b^n) where n is from 0 to (MAX_LEN - 1)
The general formula for the sum of x^i for i from 0 to n is (x^(n + 1) - 1) / (x - 1). So, because n is MAX_LEN - 1, the result is:
a * (b^(MAX_LEN - 1 + 1) - 1) / (b - 1)
We replace a and b with the right value, and the biggest representable number is:
u32::MAX * ((2^32)^MAX_LEN - 1) / (2^32 - 1)
u32::MAX is 2^32 - 1, so this can be simplified into:
(2^32)^MAX_LEN - 1 = 2^(32 * MAX_LEN) - 1
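As a quick sanity check of that closed form, here is a small C++ sketch (my own illustration, not the num crate) that uses 8-bit digits instead of 32-bit ones, so b = 2^8 and MAX_LEN = 4; the maximum should then be (2^8)^4 - 1 = 2^32 - 1:

#include <cstdint>
#include <cstdio>

int main() {
    const uint64_t a = 0xFF;   // digit::MAX, the analogue of u32::MAX
    const uint64_t b = a + 1;  // base = 2^8
    const int max_len = 4;     // number of digits

    // Direct sum: sum(a * b^n) for n = 0 .. max_len-1
    uint64_t sum = 0;
    uint64_t pow = 1;
    for (int n = 0; n < max_len; ++n) {
        sum += a * pow;
        pow *= b;
    }

    // Closed form: a * (b^max_len - 1) / (b - 1) = b^max_len - 1
    uint64_t closed = a * ((pow - 1) / (b - 1)); // pow is now b^max_len

    printf("sum:    %llu\n", (unsigned long long)sum);    // 4294967295
    printf("closed: %llu\n", (unsigned long long)closed); // 4294967295
    return 0;
}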

Converting 8 bit color into RGB value

I'm implementing global illumination in my game engine with "reflective shadow maps". An RSM contains, among other things, a color texture. To save memory, I'm packing the 24-bit color value into an 8-bit value. OK, I know how to pack it. But how do I unpack it? I had the idea of creating a 1D texture with an 8-bit palette of 256 different colors; my 8-bit color would be the index of a pixel in that texture.
I'm not sure how to generate this kind of texture.
Are there any mathematical ways to convert an 8-bit value into RGB?
#edit
The color is in this format:
RRR GGG BB
#edit2:
And I'm packing my colour like this:
int packed = (red / 32 << 5) + (green / 32 << 2) + (blue / 64);
// the int is actually a byte; the C# compiler complains if it's declared as byte
#edit3:
Alright, I found a way to do this, I think. Tell me if it's wrong:
int r = (packed >> 5) * 32;
int g = ((packed >> 2) << 3) * 32;
int b = (packed << 6) * 64;
#edit4: It's wrong...
In javascript
Encode
encodedData = (Math.floor((red / 32)) << 5) + (Math.floor((green / 32)) << 2) + Math.floor((blue / 64));
Decode
red = (encodedData >> 5) * 32;
green = ((encodedData & 28) >> 2) * 32;
blue = (encodedData & 3) * 64;
While decoding, we use the AND operator to extract the desired bits and discard the rest; for green we then also have to shift right to drop the bits below it.
While encoding, Math.floor is used to truncate the decimal part; if we rounded instead, the packed value could exceed 255 and become a 9-bit number.
UPDATE 1
Dividing the channels by 32 or 64 does not give accurate results.
RRRGGGBB
R/G = 3 bits; the max value is 111 in binary, which is 7 in decimal.
B = 2 bits; the max value is 11 in binary, which is 3 in decimal.
We should divide R and G by a value equal to or greater than 255/7, and B by a value equal to or greater than 255/3.
We should also use Math.round in place of Math.floor, because rounding gives more accurate results.
To convert an 8-bit value [0, 255] into 3 bits [0, 7], the 0 is not a problem, but remember that 255 should map to 7, so the formula should be Red3 = Red8 * 7 / 255.
To convert 24bit color into 8bit,
8bitColor = ((Red * 7 / 255) << 5) + ((Green * 7 / 255) << 2) + (Blue * 3 / 255)
To reverse,
Red = (Color >> 5) * 255 / 7
Green = ((Color >> 2) & 0x07) * 255 / 7
Blue = (Color & 0x03) * 255 / 3
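Putting the corrected formulas together, here is a hedged C++ sketch of the pack/unpack pair (my own translation of the formulas above, not the original poster's C# code; the + 127 terms perform round-to-nearest in pure integer arithmetic):

#include <cstdint>
#include <cstdio>

// Pack 8-bit-per-channel RGB into one RRRGGGBB byte, rounding to nearest.
uint8_t pack_rgb332(uint8_t r, uint8_t g, uint8_t b) {
    uint8_t r3 = (r * 7 + 127) / 255; // 0..7, rounded
    uint8_t g3 = (g * 7 + 127) / 255; // 0..7, rounded
    uint8_t b2 = (b * 3 + 127) / 255; // 0..3, rounded
    return (r3 << 5) | (g3 << 2) | b2;
}

// Unpack RRRGGGBB back to 8-bit channels, scaling so 7 -> 255 and 3 -> 255.
void unpack_rgb332(uint8_t packed, uint8_t& r, uint8_t& g, uint8_t& b) {
    r = ((packed >> 5) & 0x07) * 255 / 7;
    g = ((packed >> 2) & 0x07) * 255 / 7;
    b = (packed & 0x03) * 255 / 3;
}

int main() {
    uint8_t r, g, b;
    unpack_rgb332(pack_rgb332(255, 128, 64), r, g, b);
    printf("%d %d %d\n", r, g, b); // 255 145 85
    return 0;
}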

How would you average two 32-bit colors packed into an integer?

I'm trying to average two colors.
My original (horrible) implementation is as follows:
// color is a union
int ColorAverage(int c1, int c2) {
    color C1(c1);
    color C2(c2);
    return color(
        (unsigned char)(0.5f * C1.a + 0.5f * C2.a),
        (unsigned char)(0.5f * C1.r + 0.5f * C2.r),
        (unsigned char)(0.5f * C1.g + 0.5f * C2.g),
        (unsigned char)(0.5f * C1.b + 0.5f * C2.b)
    ).c;
}
My current solution is as follows (which performs considerably better):
int ColorAverage(int c1, int c2) {
    unsigned char* b1 = reinterpret_cast<unsigned char*>(&c1);
    unsigned char* b2 = reinterpret_cast<unsigned char*>(&c2);
    int value;
    unsigned char* bv = reinterpret_cast<unsigned char*>(&value);
    bv[0] = (b1[0] + b2[0]) / 2;
    bv[1] = (b1[1] + b2[1]) / 2;
    bv[2] = (b1[2] + b2[2]) / 2;
    bv[3] = (b1[3] + b2[3]) / 2;
    return value;
}
However, it's still quite slow (it's about 3% of my frame time).
I did find a solution for 24bit, but it does not apply to 32bit (the alpha is lost):
#define AVERAGE(a, b) ( ((((a) ^ (b)) & 0xfffefefeL) >> 1) + ((a) & (b)) )
http://www.compuphase.com/graphic/scale3.htm#HSIEH1
Try extending your mask to 32 bits, like this:
#define AVERAGE(a, b) ( ((((a) ^ (b)) & 0xfefefefeL) >> 1) + ((a) & (b)) )
Edit: I did a quick check, and it appears to work for my test case. Nice formula, by the way!
The goal is to take the following operation:
(a + b) / 2 = ((a ^ b) >> 1) + (a & b)
and apply it to all four bytes of the integer. (The identity works because a + b = (a ^ b) + 2 * (a & b): XOR sums the bits without carries, and AND picks out the carry bits.) If this were just one byte, the right shift by 1 bit would simply discard the right-most bit. In this case, however, the right-most bit of each of the upper 3 bytes isn't discarded; it's shifted into the neighboring byte. The idea to keep in mind is that you need to mask off the last bit of each byte so that it doesn't 'contaminate' the neighbor during the shift. For example (using a 16-bit value and the mask 0xfefe to keep the example short), say that a ^ b is this:
a XOR b = 1011 1101 1110 1001
A right-shift of 1 bit, without the mask, would look like this:
(a XOR b) >> 1 = 0101 1110 1111 0100
Which is wrong. The mask zeros out the last bit of each byte so that this doesn't happen:
(a XOR b) AND 0xfefe = 1011 1100 1110 1000
Then you can shift this value safely to the right:
((a XOR b) AND 0xfefe) >> 1 = 0101 1110 0111 0100
So:
#define AVERAGE(a, b) ( ((((a) ^ (b)) & 0xfefefefeL) >> 1) + ((a) & (b)) )
One thing to keep in mind is that C does not distinguish arithmetic right shift from logical right shift in its >> operator. You'll want to make sure that the integers you are shifting are unsigned, to prevent implementation-specific signed-integer shift voodoo.
EDIT: I think #dasblinkenlight may have beaten me to this answer. Just beware of shifting signed integers and you should be good.
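For what it's worth, here is a small self-contained C++ check of the extended mask against the naive per-byte average (the test colors are arbitrary, and the operands are unsigned, per the caveat above):

#include <cstdint>
#include <cstdio>

#define AVERAGE(a, b) ( ((((a) ^ (b)) & 0xfefefefeUL) >> 1) + ((a) & (b)) )

int main() {
    uint32_t c1 = 0x80FF4020; // ARGB
    uint32_t c2 = 0x40103060;

    uint32_t fast = AVERAGE(c1, c2);

    // Reference: average each byte independently.
    uint32_t ref = 0;
    for (int i = 0; i < 4; ++i) {
        uint32_t b1 = (c1 >> (8 * i)) & 0xFF;
        uint32_t b2 = (c2 >> (8 * i)) & 0xFF;
        ref |= ((b1 + b2) / 2) << (8 * i);
    }

    printf("fast=%08X ref=%08X\n", (unsigned)fast, (unsigned)ref); // both 60873840
    return 0;
}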

RGB 24 to 16-bit color conversion - Colors are darkening

I noticed that my routine to convert between 24-bit RGB888 and 16-bit RGB565 resulted in the colors darkening progressively each time a conversion took place... The formula uses linear interpolation, like so...
typedef struct _RGB24 RGB24;
struct _RGB24 {
    BYTE b;
    BYTE g;
    BYTE r;
};
RGB24 *s; // source
WORD *d; // destination
WORD r;
WORD g;
WORD b;
// Code to convert from 24-bit to 16 bit
r = (WORD)((double)(s[x].r * 31) / 255.0);
g = (WORD)((double)(s[x].g * 63) / 255.0);
b = (WORD)((double)(s[x].b * 31) / 255.0);
d[x] = (r << REDSHIFT) | (g << GREENSHIFT) | (b << BLUESHIFT);
// Code to convert from 16-bit to 24-bit
s[x].r = (BYTE)((double)(((d[x] & REDMASK) >> REDSHIFT) * 255) / 31.0);
s[x].g = (BYTE)((double)(((d[x] & GREENMASK) >> GREENSHIFT) * 255) / 63.0);
s[x].b = (BYTE)((double)(((d[x] & BLUEMASK) >> BLUESHIFT) * 255) / 31.0);
The conversion from 16-bit back to 24-bit is similar, but with the interpolation reversed... I don't understand how the values keep getting lower and lower each time a color is cycled through the equations if they are opposites of each other... Originally there was no cast to double, but I figured that making it a floating-point divide would avoid the falloff... but it still does...
When you convert your double values to WORD, the values are truncated. For example, (126 * 31) / 255 = 15.318, which is truncated to 15. Because the values are truncated, they get progressively lower through each iteration. You need to introduce rounding, by adding 0.5 to the calculated values before converting them to integers.
Continuing the example, you then take 15 and convert back:
(15 * 255)/31 = 123.387 which truncates to 123
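For example, the down-conversion with rounding could look like this (same variables as the question's snippet, just with 0.5 added before the truncating cast):

r = (WORD)((s[x].r * 31) / 255.0 + 0.5);
g = (WORD)((s[x].g * 63) / 255.0 + 0.5);
b = (WORD)((s[x].b * 31) / 255.0 + 0.5);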
Don't use floating point for something as simple as this. The normal way I've seen is to truncate on the down-conversion but extend on the up-conversion (so 0b11111 goes to 0b11111111).
// Code to convert from 24-bit to 16 bit
r = s[x].r >> (8-REDBITS);
g = s[x].g >> (8-GREENBITS);
b = s[x].b >> (8-BLUEBITS);
d[x] = (r << REDSHIFT) | (g << GREENSHIFT) | (b << BLUESHIFT);
// Code to convert from 16-bit to 24-bit
s[x].r = (d[x] & REDMASK) >> REDSHIFT; // 000abcde
s[x].r = s[x].r << (8-REDBITS) | s[x].r >> (2*REDBITS-8); // abcdeabc
s[x].g = (d[x] & GREENMASK) >> GREENSHIFT; // 00abcdef
s[x].g = s[x].g << (8-GREENBITS) | s[x].g >> (2*GREENBITS-8); // abcdefab
s[x].b = (d[x] & BLUEMASK) >> BLUESHIFT; // 000abcde
s[x].b = s[x].b << (8-BLUEBITS) | s[x].b >> (2*BLUEBITS-8); // abcdeabc
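A nice property of this truncate-down/replicate-up scheme, which is easy to verify, is that a second down-conversion reproduces the same 16-bit value, so the colors stop drifting after the first round trip. A small stand-alone C++ check for one 5-bit channel (my own test, using the same REDBITS convention as above):

#include <cstdio>

int main() {
    const int REDBITS = 5;
    for (int r = 0; r < 256; ++r) {
        int r5 = r >> (8 - REDBITS);                                // down: truncate
        int r8 = (r5 << (8 - REDBITS)) | (r5 >> (2 * REDBITS - 8)); // up: replicate
        int r5again = r8 >> (8 - REDBITS);
        if (r5again != r5) printf("drift at %d\n", r); // never prints
    }
    printf("round trip is stable\n");
    return 0;
}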
Casting double to WORD doesn't round the double value - it truncates the decimal digits. You need to use some kind of rounding routine to get rounding behavior. Typically you want to round half to even. There is a Stack Overflow question on how to round in C++ if you need it.
Also note that the conversion from 24 bit to 16 bits permanently loses information. It's impossible to fit 24 bits of information into 16 bits, of course. You can't get it back by conversion from 16 bits back to 24 bits.
It is because 16-bit, 24-bit, and 32-bit formats assign a different number of bits to each of the R, G, and B channels, so every conversion rescales the channel values and can lose precision.
For background, look up the concept of color depth on Wikipedia; I hope it will help you.
Since you're converting to double anyway, at least use it to avoid overflow, i.e. replace
r = (WORD)((double)(s[x].r * 31) / 255.0);
with
r = (WORD)round(s[x].r * (31.0 / 255.0));
This way the compiler should also fold 31.0 / 255.0 into a constant.
Obviously if this has to be repeated for huge quantities of pixels, it would be preferable to create and use a LUT (lookup table) instead.
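For instance, a sketch of such LUTs (my own illustration, using lround from <cmath>; the tables are filled once at startup and then each pixel costs only lookups and shifts):

#include <cmath>
#include <cstdint>

// Built once at startup; maps an 8-bit channel to 5 or 6 bits with rounding.
static uint8_t to5[256], to6[256];
// Maps a 5- or 6-bit channel back to 8 bits.
static uint8_t from5[32], from6[64];

void init_luts() {
    for (int i = 0; i < 256; ++i) {
        to5[i] = (uint8_t)lround(i * 31.0 / 255.0);
        to6[i] = (uint8_t)lround(i * 63.0 / 255.0);
    }
    for (int i = 0; i < 32; ++i) from5[i] = (uint8_t)lround(i * 255.0 / 31.0);
    for (int i = 0; i < 64; ++i) from6[i] = (uint8_t)lround(i * 255.0 / 63.0);
}

// Per pixel it's then just lookups and shifts, e.g.:
// d[x] = (to5[s[x].r] << REDSHIFT) | (to6[s[x].g] << GREENSHIFT) | (to5[s[x].b] << BLUESHIFT);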

Partition line into equal parts

This is a geometry question.
I have a line between two points A and B and want to separate it into k equal parts. I need the coordinates of the points that partition the line between A and B.
Any help is highly appreciated.
Thanks a lot!
You just need a weighted average of A and B.
C(t) = A * (1-t) + B * t
or, in 2-D
Cx = Ax * (1-t) + Bx * t
Cy = Ay * (1-t) + By * t
When t=0, you get A.
When t=1, you get B.
When t=.25, you get a point 25% of the way from A to B
etc
So, to divide the line into k equal parts, make a loop and find C, for t=0/k, t=1/k, t=2/k, ... , t=k/k
for(int i = 0; i < 38; i++)
{
    Points[i].x = m_Pos.x * (1 - (i / 38.0)) + m_To.x * (i / 38.0);
    Points[i].y = m_Pos.y * (1 - (i / 38.0)) + m_To.y * (i / 38.0);
    if(i == 0 || i == 37 || i == 19)
        dbg_msg("CLight", "(%d)\nPos(%f,%f)\nTo(%f,%f)\nPoint(%f,%f)", i, m_Pos.x, m_Pos.y, m_To.x, m_To.y, Points[i].x, Points[i].y);
}
prints:
[4c7cba40][CLight]: (0)
Pos(3376.000000,1808.000000)
To(3400.851563,1726.714111)
Point(3376.000000,1808.000000)
[4c7cba40][CLight]: (19)
Pos(3376.000000,1808.000000)
To(3400.851563,1726.714111)
Point(3388.425781,1767.357056)
[4c7cba40][CLight]: (37)
Pos(3376.000000,1808.000000)
To(3400.851563,1726.714111)
Point(3400.851563,1726.714111)
which looks fine but then my program doesn't work :D.
but your method works so thanks
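For reference, here is a minimal stand-alone C++ version of the answer's formula (Point is a hypothetical type for the example; it returns the k+1 points t = 0/k, ..., k/k, including both endpoints):

#include <cstdio>
#include <vector>

struct Point { double x, y; };

// Returns the k+1 points that split segment AB into k equal parts.
std::vector<Point> partition(Point a, Point b, int k) {
    std::vector<Point> pts;
    for (int i = 0; i <= k; ++i) {
        double t = (double)i / k;
        pts.push_back({ a.x * (1 - t) + b.x * t,
                        a.y * (1 - t) + b.y * t });
    }
    return pts;
}

int main() {
    for (const Point& p : partition({0, 0}, {10, 5}, 4))
        printf("(%g, %g)\n", p.x, p.y); // (0,0) (2.5,1.25) (5,2.5) (7.5,3.75) (10,5)
    return 0;
}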
