decoding deflate blocks after (HCLEN + 4) x 3 bits

So in the deflate algorithm each block starts off with a 3-bit header. Per RFC 1951:
Each block of compressed data begins with 3 header bits
containing the following data:
first bit BFINAL
next 2 bits BTYPE
Assuming BTYPE is 10 (compressed with dynamic Huffman codes), the next 14 bits are as follows:
5 Bits: HLIT, # of Literal/Length codes - 257 (257 - 286)
5 Bits: HDIST, # of Distance codes - 1 (1 - 32)
4 Bits: HCLEN, # of Code Length codes - 4 (4 - 19)
The next (HCLEN + 4) x 4 bits represent the code lengths.
What happens after that is less clear to me.
RFC1951 § 3.2.7. Compression with dynamic Huffman codes (BTYPE=10) says this:
HLIT + 257 code lengths for the literal/length alphabet,
encoded using the code length Huffman code
HDIST + 1 code lengths for the distance alphabet,
encoded using the code length Huffman code
Doing infgen -ddis on 1589c11100000cc166a3cc61ff2dca237709880c45e52c2b08eb043dedb78db8851e (produced by doing gzdeflate('A_DEAD_DAD_CEDED_A_BAD_BABE_A_BEADED_ABACA_BED')) gives this:
zeros 65 ! 0110110 110
lens 3 ! 0
lens 3 ! 0
lens 4 ! 101
lens 3 ! 0
lens 3 ! 0
zeros 25 ! 0001110 110
lens 3 ! 0
zeros 138 ! 1111111 110
zeros 22 ! 0001011 110
lens 4 ! 101
lens 3 ! 0
lens 3 ! 0
zeros 3 ! 000 1111
lens 2 ! 100
lens 0 ! 1110
lens 0 ! 1110
lens 2 ! 100
lens 2 ! 100
lens 3 ! 0
lens 3 ! 0
I note that 65 is the decimal ASCII code of "A", which presumably explains "zeros 65".
"lens" occurs 16 times, which is equal to HCLEN + 4.
In RFC1951 § 3.2.2. Use of Huffman coding in the "deflate" format there's this:
2) Find the numerical value of the smallest code for each
code length:
code = 0;
bl_count[0] = 0;
for (bits = 1; bits <= MAX_BITS; bits++) {
    code = (code + bl_count[bits-1]) << 1;
    next_code[bits] = code;
}
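(A runnable Python transcription of that step, for experimenting; bl_count is assumed to already hold the count of codes of each bit length:)

def smallest_codes(bl_count, max_bits):
    # RFC 1951 step 2: the numerically smallest code for each code length.
    next_code = [0] * (max_bits + 1)
    code = 0
    bl_count[0] = 0
    for bits in range(1, max_bits + 1):
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code
    return next_code

# The RFC's own example (lengths 3,3,3,3,3,2,4,4 give bl_count [0,0,1,5,2]):
print(smallest_codes([0, 0, 1, 5, 2], 4))  # [0, 0, 0, 2, 14]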
So maybe that's what "zeros 65" is, but then what about "zeros 25", "zeros 138" and "zeros 22"? 25, 138 and 22, as ASCII codes, do not appear in the uncompressed text.
Any ideas?

First, a correction: the next (HCLEN + 4) x 3 bits (not x 4) represent the code lengths of the code length alphabet.
The number of lens entries has nothing to do with HCLEN. The sequence of zeros and lens entries represents the code lengths for the 269 codes: 259 literal/length codes (HLIT + 257, so HLIT = 2) plus 10 distance codes (HDIST + 1, so HDIST = 9). If you add up the zeros counts and the number of lens entries, you get 269.
A zero-length symbol means it does not appear in the compressed data. There are no literal bytes in the data in the range 0..64, so it starts with 65 zeros. The first symbol coded is then an 'A', with length 3.
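To see where infgen's zeros/lens lines come from, here is a minimal Python sketch (an assumed helper, not infgen itself) of how the code length alphabet's repeat symbols from RFC 1951 § 3.2.7 expand into those 269 code lengths; the (symbol, extra) pairs are what you get after Huffman-decoding the bit stream:

def expand_code_lengths(pairs):
    # pairs: (symbol, extra) with the extra-bits value already read.
    lengths = []
    for sym, extra in pairs:
        if sym <= 15:
            lengths.append(sym)                     # literal length ("lens")
        elif sym == 16:
            lengths += [lengths[-1]] * (3 + extra)  # copy previous 3-6 times
        elif sym == 17:
            lengths += [0] * (3 + extra)            # 3-10 zeros ("zeros")
        elif sym == 18:
            lengths += [0] * (11 + extra)           # 11-138 zeros ("zeros")
    return lengths

# "zeros 65" is symbol 18 with extra value 54 (11 + 54 = 65), then "lens 3":
out = expand_code_lengths([(18, 54), (3, None)])
print(len(out), out[-1])  # 66 3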

Related

Converting Base64 to Hex confusion

I was working on a problem for converting base64 to hex and the problem prompt said as an example:
3q2+7w== should produce deadbeef
But if I do that manually, using the base64 digit set ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/ I get:
3 110111
q 101010
2 110110
+ 111110
7 111011
w 110000
As a binary string:
110111 101010 110110 111110 111011 110000
grouped into fours:
1101 1110 1010 1101 1011 1110 1110 1111 0000
to hex
d e a d b e e f 0
So shouldn't it be deadbeef0 and not deadbeef? Or am I missing something here?
Base64 is meant to encode bytes (8 bit).
Your base64 string has 6 characters plus 2 padding chars (=), so you could theoretically encode 6*6 bits = 36 bits, which would equal nine 4-bit hex digits. But in fact you must think in bytes, and then you only have 4 bytes (32 bits) of significant information. The remaining 4 bits (the extra '0') must be ignored.
You can calculate the number of insignificant bits as:
y : insignificant bits
x : number of base64 characters (without padding)
y = (x*6) mod 8
So in your case:
y = (6*6) mod 8 = 4
So you have 4 insignificant bits at the end that you need to ignore.
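A quick check of both points in Python (the standard library already thinks in bytes for you):

import base64

print(base64.b64decode('3q2+7w==').hex())  # deadbeef -- only 4 bytes survive

# The insignificant-bits formula, y = (x*6) mod 8:
x = len('3q2+7w')   # base64 characters without padding
print((x * 6) % 8)  # 4 -- the trailing '0' nibble to ignore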

Convert any length signed hexadecimal number to signed decimal number (Excel)

Question
When faced with signed hexadecimal numbers of unknown length, how can one use Excel formulas to easily convert those hexadecimal numbers to decimal numbers?
Example
Hex
---
00
FF
FE
FD
0A
0B
Use this deeply nested formula:
=HEX2DEC(N)-IF(ISERR(FIND(LEFT(IF(ISEVEN(LEN(N)),N,CONCAT(0,N))),"01234567")),16^LEN(IF(ISEVEN(LEN(N)),N,CONCAT(0,N))),0)
where N is a cell containing hexadecimal data.
This formula becomes more readable when expanded:
=HEX2DEC(N) -
  /* check if sign bit is present in leftmost nibble, padding to an even number of digits if necessary */
  IF( ISERR( FIND( LEFT( IF( ISEVEN(LEN(N))
                           , N
                           , CONCAT(0,N)
                           )
                       )
                 , "01234567"
                 )
           )
    /* offset if sign bit is present */
    , 16^LEN( IF( ISEVEN(LEN(N))
                , N
                , CONCAT(0,N)
                )
            )
    /* do not offset if sign bit is absent */
    , 0
    )
and may be read as "First, convert the hexadecimal value to an unsigned decimal value. Then offset the unsigned decimal value if the leftmost nibble of the data contains a sign bit; else do not offset."
Example Conversion
Hex | Dec
-----|----
00 | 0
FF | -1
FE | -2
FD | -3
0A | 10
0B | 11
Let the A1 cell contain a 1-byte hexadecimal string of any case.
To get the 2's complement decimal value of this string, use the following:
=HEX2DEC(A1)-IF(HEX2DEC(A1) > 127, 256, 0)
For an arbitrary length of bytes, use the following:
=HEX2DEC(A1) - IF(HEX2DEC(A1) > POWER(2, 4*LEN(A1))/2 - 1, POWER(2, 4*LEN(A1)), 0)
I usually use the MOD function, but it needs addition and subtraction of half the maximum value. For an 8-bit hex number:
=MOD(HEX2DEC(A1) + 2^7, 2^8) - 2^7
Of course it can be made a generic formula based on length:
=MOD(HEX2DEC(A1) + 2^(4*LEN(A1)-1), 2^(4*LEN(A1))) - 2^(4*LEN(A1)-1)
But sometimes the input value has lost its leading zeroes, or you are working with hex values of an arbitrary bit length (I usually have to decode registers from microcontrollers, where a 16-bit register may hold 3 signed values). I prefer keeping the bit length in a separate column:
=MOD(HEX2DEC(A1) + 2^(B1-1), 2^(B1)) - 2^(B1-1)
Example conversion
HEX | bit # | Dec
-----|-------|------
0 | 8 | 0
FF | 8 | -1
FF | 16 | 255
FFFE | 16 | -2
2FF | 10 | -257
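For anyone who wants to sanity-check the spreadsheet, here is the same MOD arithmetic in Python (hex2signed is just an illustrative name, not a standard function):

def hex2signed(h, bits=None):
    # Interpret hex string h as two's complement of the given bit width,
    # defaulting to 4 bits per hex digit like the LEN(A1)-based formulas.
    if bits is None:
        bits = 4 * len(h)
    return (int(h, 16) + 2**(bits - 1)) % 2**bits - 2**(bits - 1)

print(hex2signed('FF'))       # -1
print(hex2signed('FF', 16))   # 255
print(hex2signed('2FF', 10))  # -257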

Why do I get an `Incorrect Padding` Error while trying to decode my base32 string?

I get an Incorrect padding error while trying to decode a BASE32 string in python using the base64.b32decode() function. I think I have my padding correct. Where have I gone wrong?
import base64
my_string=b'SOMESTRING2345'
print(my_string)
print("length : "+str(len(my_string)))
print("length % 8 : "+str(len(my_string)%8))
p_my_string = my_string+b'='*(8-(len(my_string)%8))
print("\nPadded:\n"+str(p_my_string))
print("length : "+str(len(p_my_string)))
b32d = base64.b32decode(p_my_string)
print("\nB32 decode : " + str(b32d))
print("length : " + str(len(b32d)))
Running this code gets me
b'SOMESTRING2345'
length : 14
length % 8 : 6
Padded:
b'SOMESTRING2345=='
length : 16
---------------------------------------------------------------------------
Error Traceback (most recent call last)
<ipython-input-2-9fe7cf88581a> in <module>()
10 print("length : "+str(len(p_my_string)))
11
---> 12 b32d = base64.b32decode(p_my_string)
13 print("\nB32 decode : " + str(b32d))
14 print("length : " + str(len(b32d)))
/opt/anaconda3/lib/python3.6/base64.py in b32decode(s, casefold, map01)
244 decoded[-5:] = last[:-4]
245 else:
--> 246 raise binascii.Error('Incorrect padding')
247 return bytes(decoded)
248
Error: Incorrect padding
However, if I change my_string to b'SOMESTRING23456', I get the code working perfectly with the output -
b'SOMESTRING23456'
length : 15
length % 8 : 7
Padded:
b'SOMESTRING23456='
length : 16
B32 decode : b'\x93\x98IN(i\xb5\xbew'
length : 9
There are no legal 14-character base32 strings. Any remainder beyond a multiple of 8 can only be 2, 4, 5, or 7 characters long, so the padding must always be 6, 4, 3 or 1 = characters; any other length is invalid. Since a remainder of 6 characters is not a legal base32 encoding, b32decode() can't do anything but reject the '5' character that sits where a valid = padding character is required.
A base32 character encodes 5 bits and a byte is always 8 bits long. That means that you don’t need padding for inputs of a multiple of 5 bytes (5 times 8 == 40 bits, which can be encoded cleanly in 8 characters).
Any remainder over a multiple of 5 is encoded thus
1 byte = 8 bits: 2 characters (10 bits)
2 bytes = 16 bits: 4 characters (20 bits)
3 bytes = 24 bits: 5 characters (25 bits)
4 bytes = 32 bits: 7 characters (35 bits)
14 characters would hold 70 bits, which is 8 bytes (64 bits) with 6 bits to spare, so character 14 would carry no meaning!
So for any base32 string with a remainder of 1, 3, or 6 characters you will always get an Incorrect padding exception, regardless of how many = characters you add.
Note that the last character in a remainder encodes a limited number of bits, so it is also going to fall in a specific range. For 2 characters (encoding 1 byte) the second character only encodes 3 bits with the last 2 bits left at 0, so only A, E, I, M, Q, U, Y and 4 are possible (every 4th character of the base32 alphabet, A-Z + 2-7). With 4 characters the last character represents just one bit, so only A and Q are legal. 5 characters leaves 1 redundant bit, so every second character can be used (A, C, E, etc.), and for 7 characters and 3 redundant bits, every 8th character (A, I, Q, Y).
A decoder can choose to accept all possible base32 characters at that last position and just mask off the bits that are still needed, so for 2 characters a B or 7 or any of the other invalid characters can still lead to a successful decode; but then there is no difference between AA, AB, AC and AD: all 4 use only the top 3 bits of the second character, and all 4 sequences decode to the hex value 0x00.
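Here is a small sketch of a padding helper that refuses the impossible lengths up front (b32decode_padded is an illustrative name):

import base64

def b32decode_padded(s: bytes) -> bytes:
    # Unpadded base32 lengths mod 8 can only be 0, 2, 4, 5 or 7.
    if len(s) % 8 in (1, 3, 6):
        raise ValueError('%d characters can never be valid base32' % len(s))
    return base64.b32decode(s + b'=' * (-len(s) % 8))

print(b32decode_padded(b'SOMESTRING23456'))  # b'\x93\x98IN(i\xb5\xbew'
# b32decode_padded(b'SOMESTRING2345') raises: a remainder of 6 is never valid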

Compare two fractions (no floating-point)

Using integers ONLY (no floating-point), is there a way to determine which of two fractions is greater?
For example, say we have these two fractions:
1000/51 = 19(.60) && 1000/52 = 19(.23)
If we were to use floating point numbers, obviously the first fraction is greater; however, both fractions equal 19 if we use integers only. How might one find out which is greater without using floating point math?
I have tried to get the remainder using the % operator, but it does not seem to work in all cases.
1/2 can be thought of as one apple shared between two people, so each person gets 0.5 apples.
So 1000/51 can be read as 1000 apples shared among 51 people.
1000/51 > 1000/52, because the number of apples is the same but we are sharing them among more people.
That is the simple case. A more complex example:
Which is greater, 1213/109 or 1245/115?
1245 is greater than 1213 and 115 is greater than 109. Take the differences:
1245 - 1213 = 32, and 115 - 109 = 6. Since 1245/115 = (1213 + 32)/(109 + 6) is the mediant of 1213/109 and 32/6, it lies between them, so it is enough to compare 1213/109 to 32/6.
32/6 is less than 6, and 6*109 = 654 < 1213, so 32/6 < 6 < 1213/109, and therefore 1213/109 > 1245/115.
1213/109 1245/115
1213/109 32/6 # make diff 1245 - 1213 = 32 115 - 109 = 6
# compare diff to 1213/109
1213 > 109 * 6
# then
1213/109 > 1245/115
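For completeness, the standard integer-only test is plain cross-multiplication (this is not the difference trick above, just the textbook approach; with positive denominators, a/b > c/d exactly when a*d > c*b):

def compare_fractions(a, b, c, d):
    # Return 1 if a/b > c/d, -1 if a/b < c/d, 0 if equal; requires b, d > 0.
    lhs, rhs = a * d, c * b
    return (lhs > rhs) - (lhs < rhs)

print(compare_fractions(1000, 51, 1000, 52))    # 1: 1000/51 is greater
print(compare_fractions(1213, 109, 1245, 115))  # 1: 1213/109 is greater

In a language with fixed-width integers the products can overflow, so C code would need a wide enough type; Python's arbitrary-precision integers make that a non-issue.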

How do the different parts of an ICC file work together?

I took apart an ICC file from http://www.brucelindbloom.com/index.html?MunsellCalcHelp.html with a lookup table using ICC Profile Inspector. The ICC file is supposed to convert Lab to Uniform LAB.
The files it outputs include headers, a matrix (3x3 identity matrix), Input and Output curves, and a lookup table. What do these files mean? And how are they related to the color transform?
The header contents are:
InputChan: 3
OutputChan: 3
Input_Entries: 258
Output_Entries: 256
Clut_Size: 51
The InputCurves file has entries like:
0 0 0 0
1 256 255 255
2 512 510 510
...
256 65535 65280 65280
257 65535 65535 65535
The OutputCurves file has entries like:
0 0 0 0
1 256 257 257
2 512 514 514
...
254 65024 65278 65278
255 65280 65535 65535
And the lookup table entries look like:
0 0 0 25968
1 0 0 26351
2 0 0 26789
...
132649 65535 65535 49667
132650 65535 65535 50603
I'd like to understand how an input LAB color maps to an output value. I'm especially confused because a & b values can be negative.
I believe I understand how this works after skimming through http://www.color.org/specification/ICC1v43_2010-12.pdf
This explanation may have some off-by-1 errors, but it should be generally correct.
The input values are LAB, and the L, a, and b values are mapped using tables 39 & 40 in section 10.8 lut16Type. Then the 258 values in the input curves are uniformly spaced across those L, a, & b ranges. The output values are 16-bit, so 0-65535.
The same goes for the CLUT. There are 51^3 entries (51 was chosen by the ICC file author). Each dimension (L, a, b) is split uniformly across this space as well, so grid index 0 corresponds to 0 and grid index 50 corresponds to 65535 from the previous section (note that 0-50 is 51 entries). The first 51 rows are for L = 0 and a = 0, incrementing b. Every 51 rows, the a value increases by 1, and every 51*51 rows, the L value increases by 1.
So given L, a, and b values adjusted by the input curves, figure out their index (0-50) and look those up in the CLUT (l_ind*51*51+a_ind*51+b_ind), which will give you 3 more values.
Now the output curves come in. It's another set of curves that work just like the input curves. The outputs can then get mapped back using the same values from Tables 39 & 40.
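Putting that together, here is a rough Python sketch of the lut16Type pipeline (all names are illustrative; a real CMM interpolates between CLUT grid points instead of snapping to the nearest one):

GRID = 51  # Clut_Size from the header

def transform(lab16, input_curves, clut, output_curves):
    # lab16: three 0-65535 channel values after the table 39/40 encoding.
    # 1. Per-channel input curves (258 entries each).
    vals = [input_curves[i][v * 257 // 65535] for i, v in enumerate(lab16)]
    # 2. CLUT lookup: quantise each channel to a grid index 0..50.
    l_ind, a_ind, b_ind = (v * (GRID - 1) // 65535 for v in vals)
    out = clut[l_ind * GRID * GRID + a_ind * GRID + b_ind]  # a 3-value row
    # 3. Per-channel output curves (256 entries each).
    return [output_curves[i][v * 255 // 65535] for i, v in enumerate(out)]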
