Question
When faced with signed hexadecimal numbers of unknown length, how can one use Excel formulas to easily convert those hexadecimal numbers to decimal numbers?
Example
Hex
---
00
FF
FE
FD
0A
0B
Use this deeply nested formula:
=HEX2DEC(N)-IF(ISERR(FIND(LEFT(IF(ISEVEN(LEN(N)),N,CONCAT(0,N))),"01234567")),16^LEN(IF(ISEVEN(LEN(N)),N,CONCAT(0,N))),0)
where N is a cell containing hexadecimal data.
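This formula becomes more readable when expanded (the /* */ comments are annotations for the reader only; Excel does not accept comments inside a formula):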
=HEX2DEC(N) -
  /* check if sign bit is present in leftmost nibble, padding to an even number of digits if necessary */
  IF( ISERR( FIND( LEFT( IF( ISEVEN(LEN(N))
                           , N
                           , CONCAT(0,N)
                           )
                       )
                 , "01234567"
                 )
           )
      /* offset if sign bit is present */
    , 16^LEN( IF( ISEVEN(LEN(N))
                , N
                , CONCAT(0,N)
                )
            )
      /* do not offset if sign bit is absent */
    , 0
    )
and may be read as "First, convert the hexadecimal value to an unsigned decimal value. Then offset the unsigned decimal value if the leftmost nibble of the data contains a sign bit; else do not offset."
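As a cross-check outside Excel, here is a minimal Python sketch of the same logic (the function name signed_hex_padded is hypothetical). It pads an odd-length string with a leading 0, then subtracts 16^length when the leftmost digit is 8-F. Unlike HEX2DEC, which accepts at most 10 hex digits, Python's int() has no length limit.

```python
def signed_hex_padded(hex_str: str) -> int:
    """Mirror of the nested Excel formula (hypothetical helper)."""
    # Pad to an even number of digits, like IF(ISEVEN(LEN(N)), N, CONCAT(0,N))
    padded = hex_str if len(hex_str) % 2 == 0 else "0" + hex_str
    value = int(padded, 16)  # unsigned value, like HEX2DEC
    # Sign bit is set when the leftmost digit is not one of 0-7
    if padded[0] not in "01234567":
        value -= 16 ** len(padded)
    return value


for h in ["00", "FF", "FE", "FD", "0A", "0B"]:
    print(h, signed_hex_padded(h))
```

Because of the padding step, an odd-length input such as FFF is treated as the positive value 0FFF (4095).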
Example Conversion
Hex | Dec
-----|----
00 | 0
FF | -1
FE | -2
FD | -3
0A | 10
0B | 11
Let cell A1 contain a 1-byte hexadecimal string (upper- or lowercase).
To get the two's-complement decimal value of this string, use the following:
=HEX2DEC(A1)-IF(HEX2DEC(A1) > 127, 256, 0)
For a string of arbitrary length (assuming its leading zeros are kept), use the following:
=HEX2DEC(A1) - IF(HEX2DEC(A1) > POWER(2, 4*LEN(A1))/2 - 1, POWER(2, 4*LEN(A1)), 0)
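A minimal Python sketch of this length-based threshold formula (signed_hex_threshold is a hypothetical name): each hex digit contributes 4 bits, and any unsigned value above the largest positive value 2^(bits-1) - 1 has the full range 2^bits subtracted.

```python
def signed_hex_threshold(hex_str: str) -> int:
    """Sketch of HEX2DEC(A1) - IF(HEX2DEC(A1) > 2^(4*LEN)/2 - 1, 2^(4*LEN), 0)."""
    full_range = 2 ** (4 * len(hex_str))  # 4 bits per hex digit
    value = int(hex_str, 16)              # unsigned value, like HEX2DEC
    if value > full_range // 2 - 1:       # above the largest positive value
        value -= full_range
    return value


print(signed_hex_threshold("7F"), signed_hex_threshold("80"), signed_hex_threshold("FFFE"))
```

As with the Excel version, the bit width comes from the string length, so leading zeros must be preserved.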
I usually use the MOD function, but it requires adding and then subtracting half of the full range. For an 8-bit hex number:
=MOD(HEX2DEC(A1) + 2^7, 2^8) - 2^7
It can of course be made generic by basing the bit count on the string length:
=MOD(HEX2DEC(A1) + 2^(4*LEN(A1)-1), 2^(4*LEN(A1))) - 2^(4*LEN(A1)-1)
But sometimes the input value has lost its leading zeros, or you may be working with hex values of arbitrary bit length (I often have to decode microcontroller registers where, for example, a 16-bit register holds 3 signed values). I prefer keeping the bit length in a separate column:
=MOD(HEX2DEC(A1) + 2^(B1-1), 2^(B1)) - 2^(B1-1)
Example conversion
HEX | bit # | Dec
-----|-------|------
0 | 8 | 0
FF | 8 | -1
FF | 16 | 255
FFFE | 16 | -2
2FF | 10 | -257
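The MOD approach with an explicit bit width can be sketched in Python as follows (signed_from_bits is a hypothetical name); the loop reproduces the example table above.

```python
def signed_from_bits(hex_str: str, bits: int) -> int:
    """Sketch of MOD(HEX2DEC(A1) + 2^(B1-1), 2^B1) - 2^(B1-1)."""
    half = 2 ** (bits - 1)
    # Python's % with a positive modulus is non-negative, like Excel's MOD
    return (int(hex_str, 16) + half) % (2 ** bits) - half


table = [("0", 8), ("FF", 8), ("FF", 16), ("FFFE", 16), ("2FF", 10)]
for h, b in table:
    print(h, b, signed_from_bits(h, b))
```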
So in the deflate algorithm, each block starts off with a 3-bit header:
Each block of compressed data begins with 3 header bits
containing the following data:
first bit BFINAL
next 2 bits BTYPE
Assuming BTYPE is 10 (compressed with dynamic Huffman codes), the next 14 bits are as follows:
5 Bits: HLIT, # of Literal/Length codes - 257 (257 - 286)
5 Bits: HDIST, # of Distance codes - 1 (1 - 32)
4 Bits: HCLEN, # of Code Length codes - 4 (4 - 19)
The next (HCLEN + 4) x 4 bits represent the code lengths.
What happens after that is less clear to me.
RFC1951 § 3.2.7. Compression with dynamic Huffman codes (BTYPE=10) says this:
HLIT + 257 code lengths for the literal/length alphabet,
encoded using the code length Huffman code
HDIST + 1 code lengths for the distance alphabet,
encoded using the code length Huffman code
Doing infgen -ddis on 1589c11100000cc166a3cc61ff2dca237709880c45e52c2b08eb043dedb78db8851e (produced by doing gzdeflate('A_DEAD_DAD_CEDED_A_BAD_BABE_A_BEADED_ABACA_BED')) gives this:
zeros 65 ! 0110110 110
lens 3 ! 0
lens 3 ! 0
lens 4 ! 101
lens 3 ! 0
lens 3 ! 0
zeros 25 ! 0001110 110
lens 3 ! 0
zeros 138 ! 1111111 110
zeros 22 ! 0001011 110
lens 4 ! 101
lens 3 ! 0
lens 3 ! 0
zeros 3 ! 000 1111
lens 2 ! 100
lens 0 ! 1110
lens 0 ! 1110
lens 2 ! 100
lens 2 ! 100
lens 3 ! 0
lens 3 ! 0
I note that 65 is the decimal ASCII code of "A" (0x41), which presumably explains "zeros 65".
"lens" occurs 16 times, which is equal to HCLEN + 4.
In RFC1951 § 3.2.2. Use of Huffman coding in the "deflate" format there's this:
2) Find the numerical value of the smallest code for each
code length:
code = 0;
bl_count[0] = 0;
for (bits = 1; bits <= MAX_BITS; bits++) {
code = (code + bl_count[bits-1]) << 1;
next_code[bits] = code;
}
So maybe that's what "zeros 65" is, but then what about "zeros 25", "zeros 138" and "zeros 22"? The characters with codes 25, 138 and 22 do not appear in the compressed text.
Any ideas?
The next (HCLEN + 4) x 3 bits represent the code lengths.
The number of lens entries has nothing to do with HCLEN. The sequence of zeros and lens entries gives the code lengths for the 269 (259 + 10) literal/length and distance codes: if you add up the zeros counts and the number of lens entries, you get 269.
A zero-length symbol means it does not appear in the compressed data. There are no literal bytes in the data in the range 0..64, so it starts with 65 zeros. The first symbol coded is then an 'A', with length 3.
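To make this concrete, here is a Python sketch that starts from infgen's zeros/lens listing above (transcribed by hand; it does not decode the bit stream itself), expands it into 269 per-symbol code lengths, splits them into the 259 literal/length and 10 distance lengths, and assigns canonical codes following the RFC 1951 § 3.2.2 steps quoted in the question. The names RUNS, expand and canonical_codes are hypothetical.

```python
# (kind, value) pairs transcribed from the infgen -ddis output above:
# "zeros n" = n symbols with no code, "lens n" = one symbol with code length n
RUNS = [("zeros", 65), ("lens", 3), ("lens", 3), ("lens", 4), ("lens", 3),
        ("lens", 3), ("zeros", 25), ("lens", 3), ("zeros", 138), ("zeros", 22),
        ("lens", 4), ("lens", 3), ("lens", 3), ("zeros", 3), ("lens", 2),
        ("lens", 0), ("lens", 0), ("lens", 2), ("lens", 2), ("lens", 3),
        ("lens", 3)]
NUM_LITLEN, NUM_DIST = 259, 10  # HLIT + 257 and HDIST + 1 for this stream


def expand(runs):
    """Turn the zeros/lens sequence into one code length per symbol."""
    lengths = []
    for kind, n in runs:
        if kind == "zeros":
            lengths.extend([0] * n)
        else:
            lengths.append(n)
    return lengths


def canonical_codes(lengths, max_bits=15):
    """Assign codes as in RFC 1951 section 3.2.2, steps 1-3."""
    bl_count = [0] * (max_bits + 1)
    for ln in lengths:
        if ln:
            bl_count[ln] += 1  # bl_count[0] stays 0
    next_code = [0] * (max_bits + 1)
    code = 0
    for bits in range(1, max_bits + 1):
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code
    codes = {}
    for sym, ln in enumerate(lengths):
        if ln:
            codes[sym] = format(next_code[ln], f"0{ln}b")
            next_code[ln] += 1
    return codes


lengths = expand(RUNS)
litlen, dist = lengths[:NUM_LITLEN], lengths[NUM_LITLEN:]
for sym, code in canonical_codes(litlen).items():
    name = repr(chr(sym)) if sym < 256 else sym  # 256 = end of block
    print(name, code)
print(canonical_codes(dist))
```

Only symbols 65-69 ('A' to 'E'), 95 ('_') and 256-258 get nonzero lengths in the literal/length alphabet: the characters of the input text, plus the end-of-block symbol and two length codes.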
I have a couple of million records, each with a string like
"00 00 01 00 00 01 00 01 00 00 00 00 01 01 00 01 00 00 00 00 01"
The string has a length of 56. Every position holds either a 0 or a 1.
My job is to parse each record's string two positions at a time
(there are no spaces; they are shown only for clarity).
If there is a 1 in position 2, increment var1 by 1.
If there is ALSO a 1 in position 4 (I don't care about the leading "0"s
in positions 1/3/5/7...55), increment var2 by 1, and so on up to 28 variables.
The entire 56-character string must be parsed two characters at a time. Potentially
28 variables could have to be incremented (not realistic; most likely only five or six),
and they could be found anywhere in the string, beginning to end (as long as they are
in positions 2/4/6/8 up to 56).
This is what my boss gave me:
if substr(BigString,2,1)='1' then var1+1;
OK. Fine.
A) There are 27 more places to evaluate in the string.
B) There are a couple of million records.
28 nested IF-THEN-DO blocks doesn't sound like an answer (it's all I could think of), at least not to me.
Thanks.
I think the author is looking for a DO-loop method, so my suggestion is a macro %DO loop or an ARRAY statement in a DATA step.
data _null_;
  text = '000001000001000100000000010100010000000001';
  y = length(text);
  array Var[28];
  do i = 1 to dim(Var);
    /* sum statement: add 1 when the even position 2*i holds a '1' */
    Var[i] + (substrn(text,i*2,1)='1');
    put i= Var[i]=;
  end;
run;
Kind of easy, isn't it?
Array the variables that are to be potentially incremented according to string. A DO loop can examine each part of the string and conditionally apply the needed increment.
The SUM statement <variable>+<expression> means the variable's value is automatically retained from row to row.
Due to the nature of retained variables, you might want only the final var1-var28 values at the last row in the data. The question does not have enough info regarding what is to be done with the var<n> variables.
Example:
Presume the string is named op_string (op for operation). This relies on a logical evaluation returning 1 for true and 0 for false:
data want(keep=var1-var28);
  set have end=done;
  array var var1-var28;
  do index = 1 to 28;
    var(index) + (substr(op_string, 2 * index, 1) = '1'); * Add 0 or 1 according to logic eval;
  end;
  if done; * output one row at the end of the data set;
run;
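For comparison, here is the same DO-loop idea sketched in Python rather than SAS (count_even_positions and records are hypothetical names). Like the DATA step above, it keeps one running total per even position across all records and returns only the final totals.

```python
def count_even_positions(records, n_vars=28):
    """counts[k] plays the role of var(k+1): how many records have '1'
    at 1-based position 2*(k+1)."""
    counts = [0] * n_vars
    for s in records:
        for k in range(n_vars):
            # 1-based position 2*(k+1) is 0-based index 2*k + 1; slicing
            # returns '' past the end instead of raising, much like SUBSTRN
            if s[2 * k + 1:2 * k + 2] == "1":
                counts[k] += 1
    return counts


records = ["0000010000010001", "0100000000000001"]
print(count_even_positions(records, n_vars=8))
```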
Use COUNTC() to count the number of 1s in the string:
data want;
set have;
value = countc(op_string, '1');
run;
If I understood the problem correctly, this could be a solution:
/* example with same row*/
data test;
a="00000100000100010000000001010001000000000100000000011110";output;
a="10000100000100010000000001010001000000000100011100011101";output;
a="01000100000100010000000001010001000000000100000001000000";output;
a="10100100000100010000000001010001000000000111111111111110";output;
a="01100100000100010000000001010001000000000101010101010101";output;
a="00000100000100010000000001010001000000000100001100101010";output;
run;
/* work by rows*/
%macro x;
%let i=1;
data test_output(drop=i);
set test;
i=1;
%do %while (&i<=56);
var&i.=0;
var&i.=var&i.+input(substr(a,&i,1), best8.);
%let i=%eval(&i.+1);
%end;
run;
%mend;
%x;
/* results:
a var1 var2 var3 var4 var5 var6 var7 . .
00000100000100010000000001010001000000000100000000011110 0 0 0 0 0 1 0 .......
10000100000100010000000001010001000000000100011100011101 1 0 0 0 0 1 0 .......
01000100000100010000000001010001000000000100000001000000 0 1 0 0 0 1 0 .......
10100100000100010000000001010001000000000111111111111110 1 0 1 0 0 1 0 .......
01100100000100010000000001010001000000000101010101010101 0 1 1 0 0 1 0 .......
00000100000100010000000001010001000000000100001100101010 0 0 0 0 0 1 0 .......
*/
I was working on a problem for converting base64 to hex and the problem prompt said as an example:
3q2+7w== should produce deadbeef
But if I do that manually, using the base64 digit set ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/ I get:
3 110111
q 101010
2 110110
+ 111110
7 111011
w 110000
As a binary string:
110111 101010 110110 111110 111011 110000
grouped into fours:
1101 1110 1010 1101 1011 1110 1110 1111 0000
to hex
d e a d b e e f 0
So shouldn't it be deadbeef0 and not deadbeef? Or am I missing something here?
Base64 is meant to encode bytes (8 bits).
Your base64 string has 6 characters plus 2 padding characters (=), so you could theoretically encode 6*6 bits = 36 bits, which would equal nine 4-bit hex digits. But you must think in bytes, and then you only have 4 bytes (32 bits) of significant information. The remaining 4 bits (the extra '0') must be ignored.
You can calculate the number of insignificant bits as:
y : insignificant bits
x : number of base64 characters (without padding)
y = (x*6) mod 8
So in your case:
y = (6*6) mod 8 = 4
So you have 4 insignificant bits at the end that you need to ignore.
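The procedure above can be sketched in Python: map each character to its 6-bit value, drop the (x*6) mod 8 insignificant trailing bits, and regroup the rest into hex digits. The function name b64_to_hex is hypothetical; the standard library's base64.b64decode is used only as a cross-check.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64_to_hex(s: str) -> str:
    data = s.rstrip("=")  # padding carries no bits
    bits = "".join(format(ALPHABET.index(c), "06b") for c in data)
    extra = (len(data) * 6) % 8  # insignificant trailing bits
    if extra:
        bits = bits[:-extra]
    # what is left is a whole number of bytes, so it splits evenly into nibbles
    return "".join(format(int(bits[i:i + 4], 2), "x") for i in range(0, len(bits), 4))


print(b64_to_hex("3q2+7w=="))
print(base64.b64decode("3q2+7w==").hex())
```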
I have survey data with the age of individuals in a variable named agen. Originally, the variable was a string, so I converted it to numeric using the encode command. When I tried to generate a new variable hhage holding the age of the household head, the new variable was inconsistent.
The commands I used are the following:
encode agen, gen(age)
gen hhage=age if relntohrp==1
The new variable is not consistent: when I browsed it, the age of the household head in the first household is 65, while the new variable gives 63. In the second household, hhage reports 28 instead of 33 as the age of the household head. And so on.
Run help encode and you can read:
Do not use encode if varname contains numbers that merely happen to be stored
as strings; instead, use generate newvar = real(varname) or destring;
see real() or [D] destring.
For example:
clear all
set more off
input id str5 age
1 "32"
2 "14"
3 "65"
4 "54"
5 "98"
end
list
destring age, gen(age3)
encode age, gen(age2)
list, nolabel
Note the difference between using encode and destring. The former assigns numerical codes (1, 2, 3, ...) to the string values, while destring converts the string values to numbers. You can see this by stripping the value labels when you list:
. list, nolabel
+------------------------+
| id age age3 age2 |
|------------------------|
1. | 1 32 32 2 |
2. | 2 14 14 1 |
3. | 3 65 65 4 |
4. | 4 54 54 3 |
5. | 5 98 98 5 |
+------------------------+
A simple list or browse may confuse you because encode assigns the sequence of natural numbers but also assigns value labels equal to the original strings:
. list
+------------------------+
| id age age3 age2 |
|------------------------|
1. | 1 32 32 32 |
2. | 2 14 14 14 |
3. | 3 65 65 65 |
4. | 4 54 54 54 |
5. | 5 98 98 98 |
+------------------------+
The nolabel option shows the "underlying" data.
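To illustrate why the codes differ from the ages, here is a rough Python analogue of the two commands (encode_like and destring_like are hypothetical names, not Stata functions). It assumes, as the listing above shows, that encode numbers the distinct strings 1, 2, 3, ... in sorted order and keeps the original strings as value labels.

```python
def encode_like(values):
    """Illustrative analogue of encode: one integer code per distinct
    string, assigned in sorted (string) order, plus a label map."""
    labels = sorted(set(values))
    code_of = {s: i for i, s in enumerate(labels, start=1)}
    return [code_of[v] for v in values], {i: s for s, i in code_of.items()}


def destring_like(values):
    """Illustrative analogue of destring: parse each string as a number."""
    return [int(v) for v in values]


ages = ["32", "14", "65", "54", "98"]
codes, value_labels = encode_like(ages)
print(codes)                              # underlying codes, as in "list, nolabel"
print([value_labels[c] for c in codes])   # what a labelled list displays
print(destring_like(ages))
```

Because the sort is on strings, the codes follow the ages here only by coincidence; a value such as "100" would sort before "14".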
You mention it is inconsistent, but for future questions, posting the exact input and results is more useful to those trying to help you.
Try taking a look at this method. It sounds like you may have slipped up somewhere in your method.
Anyone who ever had to draw text in a graphics application for pre-Windows operating systems (e.g. DOS) will know what I'm asking for.
Each ASCII character can be represented by an 8x8 pixel matrix. Each matrix can be represented by an 8-byte code (each byte is a bit mask for one line of the matrix, each 1 bit representing a white pixel and each 0 bit a black pixel).
Does anyone know where I can find the byte codes for the basic ASCII characters?
Thanks,
BW
Would this do?
Hope this helps.
There are some good ones here; maybe not 8x8, but still easy to parse.
A 5x7 typeface would take less space than 8x8.
Do you need any characters that are missing from this?
Self-answering because user P i hasn't (they posted it in a comment on the question).
This GitHub repo is exactly what I was looking for:
dhepper/font8x8
From the README:
8x8 monochrome bitmap font for rendering
A collection of header files containing a 8x8 bitmap font.
font8x8.h contains all available characters
font8x8_basic.h contains unicode points U+0000 - U+007F
font8x8_latin.h contains unicode points U+0000 - U+00FF
Author: Daniel Hepper daniel#hepper.net
License: Public Domain
Encoding
Every character in the font is encoded row-wise in 8 bytes.
The least significant bit of each byte corresponds to the first pixel in a
row.
The character 'A' (0x41 / 65) is encoded as
{ 0x0C, 0x1E, 0x33, 0x33, 0x3F, 0x33, 0x33, 0x00}
0x0C => 0000 1100 => ..XX....
0x1E => 0001 1110 => .XXXX...
0x33 => 0011 0011 => XX..XX..
0x33 => 0011 0011 => XX..XX..
0x3F => 0011 1111 => XXXXXX..
0x33 => 0011 0011 => XX..XX..
0x33 => 0011 0011 => XX..XX..
0x00 => 0000 0000 => ........
To access the nth pixel in a row, right-shift by n and mask with 1:
. . X X . . . .
| | | | | | | |
(0x0C >> 0) & 1 == 0-+ | | | | | | |
(0x0C >> 1) & 1 == 0---+ | | | | | |
(0x0C >> 2) & 1 == 1-----+ | | | | |
(0x0C >> 3) & 1 == 1-------+ | | | |
(0x0C >> 4) & 1 == 0---------+ | | |
(0x0C >> 5) & 1 == 0-----------+ | |
(0x0C >> 6) & 1 == 0-------------+ |
(0x0C >> 7) & 1 == 0---------------+
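A minimal Python sketch of this decoding rule, rendering the 'A' glyph from its 8 bytes (FONT_A and render are hypothetical names). Pixel n of a row is (byte >> n) & 1, so the least significant bit is the leftmost pixel, as in font8x8.

```python
FONT_A = [0x0C, 0x1E, 0x33, 0x33, 0x3F, 0x33, 0x33, 0x00]


def render(glyph):
    rows = []
    for byte in glyph:
        # pixel n of the row is bit n, so bit 0 (LSB) is drawn first
        rows.append("".join("X" if (byte >> n) & 1 else "." for n in range(8)))
    return rows


print("\n".join(render(FONT_A)))
```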
It depends on the font. Search Google for 8x8 pixel fonts and you'll find a lot of different ones.
Converting from an image to a byte code table is trivial. Loop through the image one 8x8 block at a time, reading the pixels and setting the bytes.
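A sketch of that loop in Python, under stated assumptions: the image has already been loaded as a 2D list of booleans (True = lit pixel), glyphs sit in 8x8 cells read left to right and top to bottom, and bits are packed least-significant-bit-first like font8x8. glyph_bytes, sheet_to_table and A_ROWS are hypothetical names; loading the actual image file is out of scope.

```python
A_ROWS = ["..XX....", ".XXXX...", "XX..XX..", "XX..XX..",
          "XXXXXX..", "XX..XX..", "XX..XX..", "........"]


def glyph_bytes(pixels, gx, gy):
    """Pack the 8x8 cell at glyph column gx, glyph row gy into 8 bytes."""
    out = []
    for y in range(8):
        row = pixels[gy * 8 + y]
        byte = 0
        for x in range(8):
            if row[gx * 8 + x]:
                byte |= 1 << x  # leftmost pixel -> least significant bit
        out.append(byte)
    return out


def sheet_to_table(pixels):
    n_rows, n_cols = len(pixels) // 8, len(pixels[0]) // 8
    return [glyph_bytes(pixels, gx, gy) for gy in range(n_rows) for gx in range(n_cols)]


# A one-glyph "image" built from the 'A' rows (True = lit pixel)
pixels = [[c == "X" for c in row] for row in A_ROWS]
print([hex(b) for b in sheet_to_table(pixels)[0]])
```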
http://cone3d.gamedev.net/cone3d/gfxsdl/tut4-2.gif
You could parse this bitmap and extract the byte matrices from it.