I was working on a problem for converting base64 to hex and the problem prompt said as an example:
3q2+7w== should produce deadbeef
But if I do that manually, using the base64 digit set ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/ I get:
3 110111
q 101010
2 110110
+ 111110
7 111011
w 110000
As a binary string:
110111 101010 110110 111110 111011 110000
grouped into fours:
1101 1110 1010 1101 1011 1110 1110 1111 0000
to hex
d e a d b e e f 0
So shouldn't it be deadbeef0 and not deadbeef? Or am I missing something here?
Base64 is meant to encode bytes (8 bits each).
Your base64 string has 6 characters plus 2 padding chars (=), so you could theoretically encode 6 * 6 bits = 36 bits, which would equal nine 4-bit hex digits. But in fact you must think in bytes, and then you only have 4 bytes (32 bits) of significant information. The remaining 4 bits (the extra '0') must be ignored.
You can calculate the number of insignificant bits as:
y : insignificant bits
x : number of base64 characters (without padding)
y = (x*6) mod 8
So in your case:
y = (6*6) mod 8 = 4
So you have 4 insignificant bits at the end that you need to ignore.
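As a small illustration (a sketch of mine, not part of the answer), the same rule can be checked in Python, where the standard library drops the insignificant bits for you:

import base64

encoded = "3q2+7w=="
chars = encoded.rstrip("=")               # 6 significant characters
insignificant = (len(chars) * 6) % 8      # (6*6) mod 8 = 4 bits to ignore

print(insignificant)                      # 4
print(base64.b64decode(encoded).hex())    # deadbeef -- the trailing 0 nibble is gone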
Well, the subject says it: what is the format of the .Xauthority file? I have searched a lot but, unfortunately, found nothing. Is there some document describing this format, or does the structure need to be extracted from the xauth source files?
Probably not exactly what you are looking for, but I'm putting this in an answer just for the formatting.
The .Xauthority is an array of structures:
typedef struct xauth {
    unsigned short family;
    unsigned short address_length;
    char *address;
    unsigned short number_length;
    char *number;
    unsigned short name_length;
    char *name;
    unsigned short data_length;
    char *data;
} Xauth;
You would probably still need to be able to decode each entry -- if nothing else by slogging through the source: Xauth.h
For example:
$ od -xc --endian=big .Xauthority | more
0000000 0100 0007 6d61 7869 6d75 7300 0130 0012
001 \0 \0 \a m a x i m u s \0 001 0 \0 022
0000020 4d49 542d 4d41 4749 432d 434f 4f4b 4945
M I T - M A G I C - C O O K I E
0000040 2d31 0010 c0ac 9e9c ee82 ef59 f406 b7f9
- 1 \0 020 300 254 236 234 356 202 357 Y 364 006 267 371
0000060 b745 254e 0100 0007 6d61 7869 6d75 7300
267 E % N 001 \0 \0 \a m a x i m u s \0
The first short is 0x0100, indicating the family.
The next short is 0x0007, the length of the address.
The next 7 bytes are the address: maximus.
The next short is 0x0001, the length of the seat number.
The next byte is 0x30, ASCII '0', the seat number.
The next short is 0x0012, decimal 18, the length of the name.
The next 18 bytes are the name: MIT-MAGIC-COOKIE-1.
The next short is 0x0010, decimal 16, the length of the data.
And the next 16 bytes are the data: 0xc0ac through 0x254e.
Then it starts over.
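To make that walkthrough concrete, here is a rough Python sketch (mine, not the Xau library) that decodes the first entry of the dump above:

import struct

entry = bytes.fromhex(
    "010000076d6178696d757300013000124d49542d4d414749432d434f4f4b4945"
    "2d310010c0ac9e9cee82ef59f406b7f9b745254e"
)

def counted(buf, off):
    (length,) = struct.unpack_from(">H", buf, off)   # 2-byte big-endian length
    return buf[off + 2:off + 2 + length], off + 2 + length

(family,) = struct.unpack_from(">H", entry, 0)
address, off = counted(entry, 2)
number, off = counted(entry, off)
name, off = counted(entry, off)
data, off = counted(entry, off)

print(hex(family), address, number, name, data.hex())
# 0x100 b'maximus' b'0' b'MIT-MAGIC-COOKIE-1' 'c0ac9e9cee82ef59f406b7f9b745254e'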
Here are some documents for your reference.
Cookie-based access (the .Xauthority file) follows the Inter-Client Exchange (ICE) Protocol and is implemented in the Inter-Client Exchange Library; you will find more format details in the appendix sections.
For example, Appendix B describes the common MIT-MAGIC-COOKIE-1 authentication method.
The correct specification is in the documentation of the Xau library.
The .Xauthority file is a binary file consisting of a sequence of entries
in the following format:
2 bytes Family value (second byte is as in protocol HOST)
2 bytes address length (always MSB first)
A bytes host address (as in protocol HOST)
2 bytes display "number" length (always MSB first)
S bytes display "number" string
2 bytes name length (always MSB first)
N bytes authorization name string
2 bytes data length (always MSB first)
D bytes authorization data string
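A minimal reader for the whole file, following this layout, might look like the sketch below (the field names mirror the Xauth struct shown earlier; this is an illustration under that assumption, not the reference implementation):

import struct

def read_xauth_entries(path=".Xauthority"):
    def counted(f):
        (length,) = struct.unpack(">H", f.read(2))   # 2-byte length, MSB first
        return f.read(length)

    entries = []
    with open(path, "rb") as f:
        while True:
            head = f.read(2)                 # family field, or end of file
            if len(head) < 2:
                break
            (family,) = struct.unpack(">H", head)
            address = counted(f)             # host address
            number = counted(f)              # display "number" string
            name = counted(f)                # authorization name, e.g. MIT-MAGIC-COOKIE-1
            data = counted(f)                # authorization data (the cookie)
            entries.append((family, address, number, name, data))
    return entries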
So in the deflate algorithm each block starts off with a 3-bit header:
Each block of compressed data begins with 3 header bits
containing the following data:
first bit BFINAL
next 2 bits BTYPE
Assuming BTYPE is 10 (compressed with dynamic Huffman codes) then the next 14 bits are as follows:
5 Bits: HLIT, # of Literal/Length codes - 257 (257 - 286)
5 Bits: HDIST, # of Distance codes - 1 (1 - 32)
4 Bits: HCLEN, # of Code Length codes - 4 (4 - 19)
The next (HCLEN + 4) x 4 bits represent the code lengths.
What happens after that is less clear to me.
RFC1951 § 3.2.7. Compression with dynamic Huffman codes (BTYPE=10) says this:
HLIT + 257 code lengths for the literal/length alphabet,
encoded using the code length Huffman code
HDIST + 1 code lengths for the distance alphabet,
encoded using the code length Huffman code
Doing infgen -ddis on 1589c11100000cc166a3cc61ff2dca237709880c45e52c2b08eb043dedb78db8851e (produced by doing gzdeflate('A_DEAD_DAD_CEDED_A_BAD_BABE_A_BEADED_ABACA_BED')) gives this:
zeros 65 ! 0110110 110
lens 3 ! 0
lens 3 ! 0
lens 4 ! 101
lens 3 ! 0
lens 3 ! 0
zeros 25 ! 0001110 110
lens 3 ! 0
zeros 138 ! 1111111 110
zeros 22 ! 0001011 110
lens 4 ! 101
lens 3 ! 0
lens 3 ! 0
zeros 3 ! 000 1111
lens 2 ! 100
lens 0 ! 1110
lens 0 ! 1110
lens 2 ! 100
lens 2 ! 100
lens 3 ! 0
lens 3 ! 0
I note that 65 is the decimal ASCII code of "A", which presumably explains "zeros 65".
"lens" occurs 16 times, which is equal to HCLEN + 4.
In RFC1951 § 3.2.2. Use of Huffman coding in the "deflate" format there's this:
2) Find the numerical value of the smallest code for each
code length:
code = 0;
bl_count[0] = 0;
for (bits = 1; bits <= MAX_BITS; bits++) {
    code = (code + bl_count[bits-1]) << 1;
    next_code[bits] = code;
}
So maybe that's what "zeros 65" is but then what about "zeros 25", "zeros 138" and "zeros 22"? 25, 138 and 22, in ASCII, do not appear in the compressed text.
Any ideas?
The next (HCLEN + 4) x 3 bits represent the code lengths.
The number of lens's has nothing to do with HCLEN. The sequence of zeros and lens entries represents the code lengths of the 269 (259 + 10) literal/length and distance codes. If you add up the zeros and the number of lens entries, you get 269.
A zero-length symbol means it does not appear in the compressed data. There are no literal bytes in the data in the range 0..64, so it starts with 65 zeros. The first symbol coded is then an 'A', with length 3.
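To see those header fields in the raw bytes, here is a small Python sketch (my own, not infgen) that reads the deflate bit fields least-significant-bit first from the data in the question:

data = bytes.fromhex(
    "1589c11100000cc166a3cc61ff2dca237709880c45e52c2b08eb043dedb78db8851e"
)
pos = 0  # bit position; deflate fills bytes starting at the least significant bit

def bits(n):
    global pos
    value = 0
    for i in range(n):
        value |= ((data[pos >> 3] >> (pos & 7)) & 1) << i
        pos += 1
    return value

bfinal = bits(1)   # 1  -> last block
btype  = bits(2)   # 2  -> binary 10, dynamic Huffman codes
hlit   = bits(5)   # 2  -> 259 literal/length codes
hdist  = bits(5)   # 9  -> 10 distance codes
hclen  = bits(4)   # 12 -> 16 code length code lengths follow, 3 bits each

print(bfinal, btype, hlit + 257, hdist + 1, hclen + 4)   # 1 2 259 10 16
# After those, the 259 + 10 = 269 literal/length and distance code lengths
# follow, encoded with the code length code -- the zeros/lens entries above.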
I get an Incorrect padding error while trying to decode a BASE32 string in python using the base64.b32decode() function. I think I have my padding correct. Where have I gone wrong?
import base64
my_string=b'SOMESTRING2345'
print(my_string)
print("length : "+str(len(my_string)))
print("length % 8 : "+str(len(my_string)%8))
p_my_string = my_string+b'='*(8-(len(my_string)%8))
print("\nPadded:\n"+str(p_my_string))
print("length : "+str(len(p_my_string)))
b32d = base64.b32decode(p_my_string)
print("\nB32 decode : " + str(b32d))
print("length : " + str(len(b32d)))
Running this code gets me
b'SOMESTRING2345'
length : 14
length % 8 : 6
Padded:
b'SOMESTRING2345=='
length : 16
---------------------------------------------------------------------------
Error Traceback (most recent call last)
<ipython-input-2-9fe7cf88581a> in <module>()
10 print("length : "+str(len(p_my_string)))
11
---> 12 b32d = base64.b32decode(p_my_string)
13 print("\nB32 decode : " + str(b32d))
14 print("length : " + str(len(b32d)))
/opt/anaconda3/lib/python3.6/base64.py in b32decode(s, casefold, map01)
244 decoded[-5:] = last[:-4]
245 else:
--> 246 raise binascii.Error('Incorrect padding')
247 return bytes(decoded)
248
Error: Incorrect padding
However, if I change my_string to b'SOMESTRING23456', I get the code working perfectly with the output -
b'SOMESTRING23456'
length : 15
length % 8 : 7
Padded:
b'SOMESTRING23456='
length : 16
B32 decode : b'\x93\x98IN(i\xb5\xbew'
length : 9
There are no legal 14-character base32 strings. Any remainder beyond a multiple of 8 can only be 2, 4, 5, or 7 characters long, so the padding must always be 6, 4, 3 or 1 = characters; any other length is invalid. Since a remainder of 6 characters is not a legal base32 encoding, the base64.b32decode() function can't do anything but reject the input: the final '5' sits where a valid = padding character would have to be.
A base32 character encodes 5 bits and a byte is always 8 bits long. That means that you don’t need padding for inputs of a multiple of 5 bytes (5 times 8 == 40 bits, which can be encoded cleanly in 8 characters).
Any remainder over a multiple of 5 is encoded thus
1 byte = 8 bits: 2 characters (10 bits)
2 bytes = 16 bits: 4 characters (20 bits)
3 bytes = 24 bits: 5 characters (25 bits)
4 bytes = 32 bits: 7 characters (35 bits)
14 characters would hold 70 bits, which is 8 bytes (64 bits) with 6 bits to spare, so character 14 would carry no meaning!
So for any base32 string with a remainder of 1, 3, or 6 characters you will always get an Incorrect padding exception, regardless of how many = characters you add.
Note that the last character in a remainder encodes a limited number of bits so is also going to fall in a specific range; for 2 characters (encoding 1 byte) the second character only encodes 3 bits with the last 2 bits left at 0, so only A, E, I, M, Q, U, Y and 4 are then possible (so every 4th character of the base32 alphabet, A-Z + 2-7). With 4 characters the last character represents just one bit, so only A and Q are legal. 5 characters leaves 1 redundant bit so every second character can be used (A, C, E, etc) and for 7 characters and 3 redundant bits, every 8th character (A, I, Q, Y).
A decoder can choose to accept all possible base32 characters at that last position and simply keep only the bits that are still needed, so for 2 characters a B or 7 or any of the other invalid characters can still lead to a successful decode; but then there is no difference between AA, AB, AC and AD: all 4 will only use the top 3 bits of the second character, and all 4 sequences decode to the hex value 0x00.
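As a quick Python illustration (a sketch using the strings from the question), padding to a multiple of 8 is not enough; the remainder length decides whether decoding can succeed at all:

import base64, binascii

def pad_b32(s):
    return s + b"=" * (-len(s) % 8)        # pad up to a multiple of 8

for raw in (b"SOMESTRING2345", b"SOMESTRING23456"):
    try:
        print(raw, "->", base64.b32decode(pad_b32(raw)))
    except binascii.Error as exc:          # the 14-character input ends up here
        print(raw, "-> rejected:", exc)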
This question is derived from the comments on my previous SO question.
I am confused by the PLC's interpretation of BCD and decimal.
The PLC documentation somehow implies that BCD = decimal:
The instruction reads the content of D300, 0100, as BCD. Referring to Cyber Slueth Omega's answer and an online BCD-to-hex converter, 0100 (BCD) = 4 (decimal) = 4 (hex), but the documentation indicates 0100 (BCD) = 100 (decimal).
Why?
BCD is HEX
BCD is not binary
HEX is not binary
BCD and HEX are representations of binary information.
The only difference is in how you decide to interpret the numbers. Some PLC instructions will take a piece of word memory and tell you "I, the TIM instruction, promise to treat the raw data in D300 as BCD data". It is still hex data, but the instruction interprets it differently.
If D300 = [x2486] --> the timer (as an example) will wait 248.6 seconds, even though hex 2486 = 9350 decimal. You can treat hex data as anything: if you treat it as encoded BCD you get one answer, if you treat it as a plain unsigned binary number you get another, and so on.
If D300 = [x1A3D] --> TIM will throw an error flag because D300 contains non-BCD hex digits.
Further, the above example is showing HEX digits - not BINARY digits. It is confusing because they chose [x0100] as their example - only zeroes and ones. When you are plugging this into your online converter you are doing it wrong - you are converting binary 0100 to decimal 4. Hexadecimal is not binary - hex is a base16 representation of binary.
Anatomy of a D-memory location is this:
16 bits         | xxxx | xxxx | xxxx | xxxx |   /BINARY/
     --->           |      |      |      |
4 bits/digit        D4     D3     D2     D1     /HEX/
example:
D300 = 1234     | 0001 | 0010 | 0011 | 0100 |
     ---->          1      2      3      4
example:
D300 = 2F6B     | 0010 | 1111 | 0110 | 1011 |
     ---->          2      F      6      B
example (OP!):
D300 = 0100     | 0000 | 0001 | 0000 | 0000 |
     ---->          0      1      0      0
A D-memory location can store values from x0000 -> xFFFF (decimal 0-65535). A D-memory location which is used to store BCD values, however, can only use decimal digits. A->F are not allowed. This reduces the range of a 16-bit memory location to 0000->9999.
Counting up you would go:
Decimal   BCD    HEX
1         0001   0001
2         0002   0002
3         0003   0003
4         0004   0004
5         0005   0005
6         0006   0006
7         0007   0007
8         0008   0008
9         0009   0009
10        0010   000A
11        0011   000B
12        0012   000C
13        0013   000D
14        0014   000E
15        0015   000F
16        0016   0010
17        0017   0011
18        0018   0012
19        0019   0013
20        0020   0014
...etc
Going the other way, if you wish to pass a decimal value to a memory location and have it stored as pure hex (not BCD hex!) you use the '&' symbol instead of '#'.
For example -> [MOV #123 D300]
This moves HEX value x0123 to memory location D300. If you use D300 in a future operation which interprets this as a hexadecimal number then it will have a decimal value of 291. If you use it in an instruction which interprets it as a BCD value then it will have a decimal value of 123.
If instead you do [MOV &123 D300]
This moves the decimal value 123 to D300 and stores it as a hexadecimal number -> [x007B]! If you use D300 now in a future operation which interprets this as a hexadecimal number it will have a decimal value of 123. If you try to use it in an instruction which interprets it as a BCD value you will get an ERROR because [x007B] contains the hex digit 'B' which is not a valid BCD digit.
Binary-coded decimal is encoded as hex digits that have a limited range of 0-9. This means that 0x0100 should be read as 100 when BCD is meant. Numbers with hexadecimal digits from A to F are not valid BCD numbers.
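A short Python sketch (illustrative only; the function names are mine, and this is not PLC syntax) makes the two readings of the same 16-bit word explicit:

def bcd_to_int(word):
    # Read a 16-bit word as four BCD digits, e.g. 0x0100 -> 100
    value = 0
    for shift in (12, 8, 4, 0):
        digit = (word >> shift) & 0xF
        if digit > 9:                       # A-F are not valid BCD digits
            raise ValueError("not a BCD value: %X" % word)
        value = value * 10 + digit
    return value

def int_to_bcd(value):
    # Pack a decimal value (0-9999) into four BCD digits, e.g. 123 -> 0x0123
    word = 0
    for shift in (0, 4, 8, 12):
        word |= (value % 10) << shift
        value //= 10
    return word

print(bcd_to_int(0x0100))      # 100   -- what the documentation means
print(0x0100)                  # 256   -- the same bits read as a plain number
print(hex(int_to_bcd(123)))    # 0x123 -- the bit pattern that MOV #123 stores
# bcd_to_int(0x1A3D) raises ValueError, like the TIM error flag above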
I'm decoding a JPEG file. I have generated the Huffman tables and quantization tables, and I have reached the point where I have to decode the DC and AC elements. For example, let's say I have the following data:
FFDA 00 0C 03 01 00 02 11 03 11 00 3F 00 F2 A6 2A FD 54 C5 5F FFD9
If we ignore a few bytes from the SOS marker, my real data starts from the F2 byte. So let's write it in binary (starting from the F2 byte):
1111 0010 1010 0110 0010 1010 1111 1101 0101 0100 1100 0101 0101 1111
F 2 A 6 2 A F D 5 4 C 5 5 F
When decoding, the first element is the luminance DC element, so let's decode it.
[1111 0]010 1010 0110 0010 1010 1111 1101 0101 0100 1100 0101 0101 1111
F 2 A 6 2 A F D 5 4 C 5 5 F
So 11110 is the Huffman code (in my case) for element 08. This means that the next 8 bits are my DC value. When I take the next 8 bits, the value is:
1111 0[010 1010 0]110 0010 1010 1111 1101 0101 0100 1100 0101 0101 1111
F 2 A 6 2 A F D 5 4 C 5 5 F
DC element value is -171.
Here is my problem: next comes the luminance AC value, but I don't really understand what the standard says for the case when the AC coefficient is non-zero. Thanks!
The DC values, as you've seen, are defined as the number of "extra" bits which specify the positive or negative DC value. The AC coefficients are encoded differently because most of them are 0. The Huffman table defines each entry for AC coefficients with a "skip" value and an "extra bits" length. The skip value is how many AC coefficients to skip before storing the value, and the extra bits are treated the same way as DC values. When decoding AC coefficients, you decode values from 1 to 63, but the way the encoding of the MCU ends can vary. You can have an actual value stored at index 63, or, if you're at index > 48, you could get a ZRL (zero run length = 16 zeros), or any combination which takes you past the end. A simplified decode loop:
void DecodeMCU(signed short *MCU)   /* assumes MCU[] was zeroed by the caller */
{
    int index;
    unsigned short code, skip, extra;

    MCU[0] = decodeDC();              /* DC coefficient, coded as a size + extra bits */
    index = 1;
    while (index < 64)                /* AC coefficients 1..63, in zigzag order */
    {
        code = decodeAC();            /* Huffman-decoded run/size byte */
        if (code == 0)                /* EOB: the rest of the block is zeros */
            break;
        skip = code >> 4;             /* upper nibble: run of zero coefficients to skip */
        extra = code & 0xf;           /* lower nibble: number of extra bits */
        index += skip;
        MCU[index++] = calcACValue(extra);   /* read the extra bits, apply the sign rule */
    }
}
The color components can be interleaved (typical) or stored in separate scans. The elements are encoded in zigzag order in each MCU (low frequency elements first). The number of 8x8 blocks of coefficients which define an MCU varies depending on the color subsampling. For 1:1, there will be 1 Y followed by 1 Cr and 1 Cb. For typical digital camera images, the horizontal axis is subsampled, so you will get 2 Y blocks followed by 1 Cr and 1 Cb. The quality setting of the compressed image determines the quantization table used and how many zero AC coefficients are encoded. The lower the quality, the more of each MCU will be zeros. When you do the inverse DCT on your MCU, the number of zeros will determine how much detail is preserved in your 8x8, 16x8, 8x16 or 16x16 block of pixels. Here are the basic steps:
1) Entropy decode the 8x8 coefficient blocks, each color component is stored separately
2) De-zigzag and de-quantize the coefficients
3) Perform inverse DCT on the coefficients (might be 6 8x8 blocks for 4:2:0 subsampling)
4) Convert the colorspace from YCrCb to RGB or whatever you need
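As a small illustration (a Python sketch of my own, not the poster's decoder), the sign/extend rule is the same for the DC and AC extra bits, and an AC Huffman symbol simply packs the zero run and the size into one byte; the 0x23 below is a made-up example value:

def extend(bits, size):
    # JPEG 'extend' rule: if the leading extra bit is 0, the value is negative
    if size == 0:
        return 0
    if bits < (1 << (size - 1)):
        return bits - (1 << size) + 1
    return bits

print(extend(0b01010100, 8))         # -171, the DC value decoded in the question

run_size = 0x23                      # hypothetical AC run/size byte
run, size = run_size >> 4, run_size & 0xF
print(run, size)                     # skip 2 zero coefficients, then a 3-bit value follows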