I've been writing a basic hex dump program in D. My program works as I expect it to, based on test files, but a key formatting function isn't passing the unittest I've written for it.
As the formatted output is exactly as I intended it, I concluded that my unittest doesn't express exactly what I wanted it to.
string getOutput(ubyte[] bin) {
char[] hex =cast(char[]) ("\t\t0\t1\t2\t3\t4\t5\t6\t7\n");
for(int line; 8 * line < bin.length; line++) {
if (8 * line + 8 >= bin.length) {
hex ~= format("%0#0 10x%(\t%1 0 2x%)\n", line*8, bin[8*line..bin.length]);
}
else {
hex ~= format("%0#0 10x%(\t%1 0 2x%)\n", line*8, bin[8*line..8*line + 8]);
}
}
return cast(string) hex;
}
///Example of getOutput()
unittest {
ubyte[9] a = [0xf0, 0xe1, 0xd2, 0xc3, 0xb4, 0xa5, 0x96, 0x87, 0x78];
string b = getOutput(a);
write(b);
assert(b ==
r" 0 1 2 3 4 5 6 7
0000000000 f0 e1 d2 c3 b4 a5 96 87
0000000008 78
");
}
I included write(b) in the unit test to manually inspect the contents. Copy-pasted from my console, b is:
0 1 2 3 4 5 6 7
0000000000 f0 e1 d2 c3 b4 a5 96 87
0x00000008 78
I've spent several hours reading the D lexical documentation for strings, but am nowhere closer to understanding my problem, hence I'm throwing it to you, in the hope I'm not missing something too obvious.
Many thanks.
So in the deflate algorithm each block starts off with a 3 bit header:
Each block of compressed data begins with 3 header bits
containing the following data:
first bit BFINAL
next 2 bits BTYPE
Assuming BTYPE is 10 (compressed with dynamic Huffman codes) then the next 14 bits are as follows:
5 Bits: HLIT, # of Literal/Length codes - 257 (257 - 286)
5 Bits: HDIST, # of Distance codes - 1 (1 - 32)
4 Bits: HCLEN, # of Code Length codes - 4 (4 - 19)
The next (HCLEN + 4) x 3 bits represent the code lengths.
What happens after that is less clear to me.
RFC1951 § 3.2.7. Compression with dynamic Huffman codes (BTYPE=10) says this:
HLIT + 257 code lengths for the literal/length alphabet,
encoded using the code length Huffman code
HDIST + 1 code lengths for the distance alphabet,
encoded using the code length Huffman code
Doing infgen -ddis on 1589c11100000cc166a3cc61ff2dca237709880c45e52c2b08eb043dedb78db8851e (produced by doing gzdeflate('A_DEAD_DAD_CEDED_A_BAD_BABE_A_BEADED_ABACA_BED')) gives this:
zeros 65 ! 0110110 110
lens 3 ! 0
lens 3 ! 0
lens 4 ! 101
lens 3 ! 0
lens 3 ! 0
zeros 25 ! 0001110 110
lens 3 ! 0
zeros 138 ! 1111111 110
zeros 22 ! 0001011 110
lens 4 ! 101
lens 3 ! 0
lens 3 ! 0
zeros 3 ! 000 1111
lens 2 ! 100
lens 0 ! 1110
lens 0 ! 1110
lens 2 ! 100
lens 2 ! 100
lens 3 ! 0
lens 3 ! 0
I note that 65 is the decimal encoding of "A" in ASCII, which presumably explains "zeros 65".
"lens" occurs 16 times, which is equal to HCLEN + 4.
In RFC1951 § 3.2.2. Use of Huffman coding in the "deflate" format there's this:
2) Find the numerical value of the smallest code for each
code length:
code = 0;
bl_count[0] = 0;
for (bits = 1; bits <= MAX_BITS; bits++) {
code = (code + bl_count[bits-1]) << 1;
next_code[bits] = code;
}
So maybe that's what "zeros 65" is but then what about "zeros 25", "zeros 138" and "zeros 22"? 25, 138 and 22, in ASCII, do not appear in the compressed text.
Any ideas?
The next (HCLEN + 4) x 3 bits represent the code lengths.
The number of lens entries has nothing to do with HCLEN. The sequence of zeros and lens items represents the code lengths for the 269 (259+10) literal/length and distance codes. If you add up the zeros and the number of lens items, you get 269.
A zero-length symbol means it does not appear in the compressed data. There are no literal bytes in the data in the range 0..64, so it starts with 65 zeros. The first symbol coded is then an 'A', with length 3.
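To make the counting concrete, here is a minimal Python sketch that expands the infgen listing from the question into a flat list of code lengths. The names LISTING, N_LITLEN and expand are made up for illustration, and the split at 259 is taken from the "259+10" above rather than decoded from the header bits.

```python
# Items transcribed from the infgen listing in the question.
# ("zeros", n) means n consecutive code lengths of 0;
# ("lens", n) means one code length equal to n.
LISTING = [
    ("zeros", 65), ("lens", 3), ("lens", 3), ("lens", 4), ("lens", 3), ("lens", 3),
    ("zeros", 25), ("lens", 3),
    ("zeros", 138), ("zeros", 22),
    ("lens", 4), ("lens", 3), ("lens", 3),
    ("zeros", 3),
    ("lens", 2), ("lens", 0), ("lens", 0), ("lens", 2), ("lens", 2),
    ("lens", 3), ("lens", 3),
]

# Assumption: 259 literal/length codes, as in the "259+10" above.
N_LITLEN = 259


def expand(listing):
    """Flatten the run-length items into one list of code lengths."""
    lengths = []
    for kind, n in listing:
        if kind == "zeros":
            lengths.extend([0] * n)  # n symbols that never occur
        else:
            lengths.append(n)        # one symbol with code length n
    return lengths


lengths = expand(LISTING)
litlen, dist = lengths[:N_LITLEN], lengths[N_LITLEN:]

print(len(lengths))
# Show the literal bytes that actually received a code.
for symbol, length in enumerate(litlen):
    if length and symbol < 256:
        print(repr(chr(symbol)), length)
```

The only literal bytes that end up with a non-zero length are A, B, C, D, E and _, which are exactly the characters in the input text. "zeros 25" is simply the gap between 'E' (69) and '_' (95).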
Have a couple million records with a string like
"00 00 01 00 00 01 00 01 00 00 00 00 01 01 00 01 00 00 00 00 01"
String has a length of 56. All positions are filled with either a 0 or a 1.
My job is to parse the string of each record every two positions
(there are no spaces, that is just for clarification).
If there is a 1 in position two that means increment var1 +1
If there is ALSO a 1 in position four, (don't care about leading "0"'s
in position 1/3/5/7...55, etc.) increment var2 + 1, up to 28 variables.
The entire 56 len string must be parsed every two characters. Potentially
there could be 28 variables that have to be incremented, (but not realistic,
most likely there is only five or six) which could be found in any part of the
string, beginning to end (as long as they are in position 2/4/6/8 up to 56, etc.)
This is what my boss gave me:
if substr(BigString,2,1)='1' then var1+1;
OK. Fine.
A) There are 27 more places to evaluate in the string.
B) there are a couple million records.
28 nested if then do loops doesn't sound like an answer (all I could think of). At least not to me.
Thanx.
I think the author is looking for a do-loop method, so my suggestion is a macro %do loop or an array statement in a data step.
data _null_;
text = '000001000001000100000000010100010000000001';
y = length(text);
array Var[28];
do i = 1 to dim(Var);
Var[i] + (substrn(text,i*2,1)='1');
put i = Var[i]=;
end;
run;
Kind of easy, isn't it?
Array the variables that are to be potentially incremented according to string. A DO loop can examine each part of the string and conditionally apply the needed increment.
The SUM statement <variable>+<expression> means the variable's value is automatically retained from row to row.
Due to the nature of retained variables, you might want only the final var1-var28 values at the last row in the data. The question does not have enough info regarding what is to be done with the var<n> variables.
Example:
Presume string is named op_string (op for operation). Utilize logical evaluation result True is 1 and False is 0
data want(keep=var1-var28);
set have end=done;
array var var1-var28;
do index = 1 to 28;
var(index) + (substr(op_string, 2 * index, 1) = '1'); * Add 0 or 1 according to logic eval;
end;
if done; * output one row at the end of the data set;
run;
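For readers who want to check the logic outside SAS, the same accumulation can be sketched in Python. This is only an illustration of the idea, not a replacement for the data step; count_flags and its n_vars parameter are hypothetical names, and the records are assumed to be plain strings of '0' and '1'.

```python
def count_flags(records, n_vars=28):
    """Count, per variable, how many records have '1' in the even positions.

    1-based positions 2, 4, ..., 2*n_vars correspond to the 0-based
    indexes 1, 3, ..., 2*n_vars - 1.
    """
    counts = [0] * n_vars
    for rec in records:
        # Take every second character, starting at the second one.
        for i, ch in enumerate(rec[1:2 * n_vars:2]):
            if ch == "1":
                counts[i] += 1
    return counts


# Tiny made-up sample with only two flag positions per record.
sample = ["0100", "0101"]
print(count_flags(sample, n_vars=2))
```

As in the data step, one pass over the records is enough, and the work per record is a fixed 28 checks regardless of how many flags are set.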
Use COUNTC() to count the number of 1's in the string then.
data want;
set have;
value = countc(op_string, '1');
run;
If I understood the problem correctly, this could be the solution:
EDITED 2. solution:
/* example with same row*/
data test;
a="00000100000100010000000001010001000000000100000000011110";output;
a="10000100000100010000000001010001000000000100011100011101";output;
a="01000100000100010000000001010001000000000100000001000000";output;
a="10100100000100010000000001010001000000000111111111111110";output;
a="01100100000100010000000001010001000000000101010101010101";output;
a="00000100000100010000000001010001000000000100001100101010";output;
run;
/* work by rows*/
%macro x;
%let i=1;
data test_output(drop=i);
set test;
i=1;
%do %while (&i<=56);
var&i.=0;
var&i.=var&i.+input(substr(a,&i,1), best8.);
%let i=%eval(&i.+1);
%end;
run;
%mend;
%x;
/* results:
a var1 var2 var3 var4 var5 var6 var7 . .
00000100000100010000000001010001000000000100000000011110 0 0 0 0 0 1 0 .......
10000100000100010000000001010001000000000100011100011101 1 0 0 0 0 1 0 .......
01000100000100010000000001010001000000000100000001000000 0 1 0 0 0 1 0 .......
10100100000100010000000001010001000000000111111111111110 1 0 1 0 0 1 0 .......
01100100000100010000000001010001000000000101010101010101 0 1 1 0 0 1 0 .......
00000100000100010000000001010001000000000100001100101010 0 0 0 0 0 1 0 .......
*/
I have read in several manuals and online sources that the running time of "simple string concatenation" is O(n^2).
The algorithm is this: we take the first 2 strings, create a new string, copy the characters of the 2 original strings in the new string, and repeat this process over and over again until all strings are concatenated. We are not using StringBuilder or similar implementations: just a simple string concatenation.
I think the running time should be something like O(kn) where k = number of strings, n = total number of characters.
You don't copy the same characters n times, but k times, so it should not be O(n^2). For example, if you have 2 strings, it's just O(n).
Basically it's n + (n-x) + (n-y) + (n-z)... but k times, not n times.
Where am I wrong?
A precise problem statement is necessary here:
There are two metrics to consider: How much space is required and how much time is required.
This note looks at the time requirements.
The concatenation operation is specified to only concatenate two of the strings at a time, with concatenation being performed with left association:
((k1 + k2) + k3) ...
There are two parameters that may be considered, and two ways of looking at the second parameter.
The first parameter is the total size (in characters) of the strings which are to be concatenated.
The second parameter is either the number of strings which are to be concatenated, or is the size of each of the strings which are to be concatenated.
Considering the first case:
n - Total size (in characters) of the strings to be concatenated.
k - Total number of strings to be concatenated.
The time for the concatenation is roughly:
(n/k) * (k^2) / 2
Or, to within a constant factor:
n * k
n * k
Then, for a fixed 'k', the concatenation time is linear!
Considering instead the second case:
n - Total size of the strings
m - Size of each of the sub-strings
This corresponds to the prior case but with:
k = n / m
The prior estimate then becomes:
n * k = n * (n / m) = n^2 / m
That is, for a fixed 'm', the concatenation time is quadratic.
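The two cases can be checked with a small counting sketch. It does not time anything; it only counts how many characters get copied under the stated model (left-associative concatenation, a fresh string built at every step). The function name copies_for_left_concat is made up for this illustration.

```python
def copies_for_left_concat(sizes):
    """Count characters copied when strings of the given sizes are joined
    left to right, building a brand-new string at every step."""
    copied = 0
    acc = 0  # length of the string built so far
    for i, size in enumerate(sizes):
        acc += size
        if i > 0:
            # Each '+' copies everything built so far plus the new piece.
            copied += acc
    return copied


# Fixed k = 4: doubling the piece size doubles the work (linear in n).
print(copies_for_left_concat([3] * 4), copies_for_left_concat([6] * 4))

# Fixed m = 10: doubling the number of pieces roughly quadruples the work.
print(copies_for_left_concat([10] * 100), copies_for_left_concat([10] * 200))
```

So both statements are right, depending on which parameter is held fixed: the cost is about n * k / 2, which is linear in n for a fixed number of strings and quadratic in n for a fixed string size.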
If you write some tests and look at the byte code, you will see that StringBuilder is used to implement concatenation, and sometimes it pre-allocates the internal array to make this more efficient. That is clearly not O(n^2) complexity.
Here is the Java code.
public static void main(String[] args) {
String[] william = {
"To ", "be ", "or ", "not ", "to ", ", that", "is ", "the ",
"question."
};
String quote = "";
for (String word : william) {
quote += word;
}
}
Here is the byte code.
public static void main(java.lang.String[] args);
0 bipush 9
2 anewarray java.lang.String [16]
5 dup
6 iconst_0
7 ldc <String "To "> [18]
9 aastore
10 dup
11 iconst_1
12 ldc <String "be "> [20]
14 aastore
15 dup
16 iconst_2
17 ldc <String "or "> [22]
19 aastore
20 dup
21 iconst_3
22 ldc <String "not "> [24]
24 aastore
25 dup
26 iconst_4
27 ldc <String "to "> [26]
29 aastore
30 dup
31 iconst_5
32 ldc <String ", that"> [28]
34 aastore
35 dup
36 bipush 6
38 ldc <String "is "> [30]
40 aastore
41 dup
42 bipush 7
44 ldc <String "the "> [32]
46 aastore
47 dup
48 bipush 8
50 ldc <String "question."> [34]
52 aastore
53 astore_1 [william]
54 ldc <String ""> [36]
56 astore_2 [quote]
57 aload_1 [william]
58 dup
59 astore 6
61 arraylength
62 istore 5
64 iconst_0
65 istore 4
67 goto 98
70 aload 6
72 iload 4
74 aaload
75 astore_3 [word]
76 new java.lang.StringBuilder [38]
79 dup
80 aload_2 [quote]
81 invokestatic java.lang.String.valueOf(java.lang.Object) : java.lang.String [40]
84 invokespecial java.lang.StringBuilder(java.lang.String) [44]
87 aload_3 [word]
88 invokevirtual java.lang.StringBuilder.append(java.lang.String) : java.lang.StringBuilder [47]
91 invokevirtual java.lang.StringBuilder.toString() : java.lang.String [51]
94 astore_2 [quote]
95 iinc 4 1
98 iload 4
100 iload 5
102 if_icmplt 70
Question
When faced with signed hexadecimal numbers of unknown length, how can one use Excel formulas to easily convert those hexadecimal numbers to decimal numbers?
Example
Hex
---
00
FF
FE
FD
0A
0B
Use this deeply nested formula:
=HEX2DEC(N)-IF(ISERR(FIND(LEFT(IF(ISEVEN(LEN(N)),N,CONCAT(0,N))),"01234567")),16^LEN(IF(ISEVEN(LEN(N)),N,CONCAT(0,N))),0)
where N is a cell containing hexadecimal data.
This formula becomes more readable when expanded:
=HEX2DEC(N) -
/* check if sign bit is present in leftmost nibble, padding to an even number of digits if necessary */
IF( ISERR( FIND( LEFT( IF( ISEVEN(LEN(N))
, N
, CONCAT(0,N)
)
)
, "01234567"
)
)
/* offset if sign bit is present */
, 16^LEN( IF( ISEVEN(LEN(N))
, N
, CONCAT(0,N)
)
)
/* do not offset if sign bit is absent */
, 0
)
and may be read as "First, convert the hexadecimal value to an unsigned decimal value. Then offset the unsigned decimal value if the leftmost nibble of the data contains a sign bit; else do not offset."
Example Conversion
Hex | Dec
-----|----
00 | 0
FF | -1
FE | -2
FD | -3
0A | 10
0B | 11
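If it helps to see the same logic outside Excel, here is a rough Python sketch of what the nested formula does: pad to an even number of digits, convert as unsigned, then subtract 16^length when the leftmost digit is 8 or above. The function name signed_from_hex_text is made up for this illustration.

```python
def signed_from_hex_text(text):
    """Interpret a hex string as a two's-complement number whose width is
    the number of digits, padded to a whole number of bytes."""
    # Pad to an even number of digits, like CONCAT(0, N) in the formula.
    padded = text if len(text) % 2 == 0 else "0" + text
    value = int(padded, 16)  # unsigned value, like HEX2DEC(N)
    # A leftmost digit of 0-7 means the sign bit is clear.
    if padded[0] in "01234567":
        return value
    # Otherwise offset by 16^(number of digits).
    return value - 16 ** len(padded)


for sample in ["00", "FF", "FE", "FD", "0A", "0B"]:
    print(sample, signed_from_hex_text(sample))
```

One consequence of the padding is that an odd-length input such as "F" is treated as "0F" and therefore always comes out non-negative.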
Let the A1 cell contain a 1 byte hexadecimal string of any case.
To get the 2's complement decimal value of this string, use the following:
=HEX2DEC(A1)-IF(HEX2DEC(A1) > 127, 256, 0)
For an arbitrary length of bytes, use the following:
=HEX2DEC(A1) - IF(HEX2DEC(A1) > POWER(2, 4*LEN(A1))/2 - 1, POWER(2, 4*LEN(A1)), 0)
I usually use the MOD function, but it needs addition and subtraction of half the max value. For an 8-bit hex number:
=MOD(HEX2DEC(A1) + 2^7, 2^8) - 2^7
Of course it can be made a generic formula based on length:
=MOD(HEX2DEC(A1) + 2^(4*LEN(A1)-1), 2^(4*LEN(A1))) - 2^(4*LEN(A1)-1)
But sometimes input value has lost leading zeroes or maybe you are using hex values of an arbitrary length (I usually have to decode registers from microcontrollers where maybe a 16-bit register has been used for 3 signed values). I prefer keeping bit length in a separate column:
=MOD(HEX2DEC(A1) + 2^(B1-1), 2^(B1)) - 2^(B1-1)
Example conversion
HEX | bit # | Dec
-----|-------|------
0 | 8 | 0
FF | 8 | -1
FF | 16 | 255
FFFE | 16 | -2
2FF | 10 | -257
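The MOD trick with an explicit bit length translates directly to other languages. A minimal Python sketch, with signed_from_hex as a made-up name, that reproduces the table above:

```python
def signed_from_hex(hex_text, bits):
    """Two's-complement value of hex_text interpreted as a bits-wide field."""
    value = int(hex_text, 16)
    half = 1 << (bits - 1)  # 2^(bits-1), half of the range
    # Shift up by half the range, wrap with the modulus, shift back down.
    return (value + half) % (1 << bits) - half


rows = [("0", 8), ("FF", 8), ("FF", 16), ("FFFE", 16), ("2FF", 10)]
for hex_text, bits in rows:
    print(hex_text, bits, signed_from_hex(hex_text, bits))
```

Keeping the width as a separate argument is what makes "FF" come out as -1 for 8 bits but 255 for 16 bits.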
I've seen this same question asked around, but it's always with something like:
val1, val2 = input("Enter 2 numbers")
My problem is different.
I have two strings, str1 and str2. I want to compare them byte-by-byte such that the output would look something like this:
str1 str2
0A 0A
20 20
41 41
42 42
43 43
31 31
32 32
33 33
2E 21
So, I've tried various syntaxes to compare them, but it always ends in the same error. Here's one of my latest attempts:
#!/usr/bin/python3
for c1, c2 in (tuple("\n ABC123."), tuple("\n ABC123!")):
print("%02X %02X" % (ord(c1), ord(c2)))
And the error:
$ python3 test.py
Traceback (most recent call last):
File "test.py", line 1, in <module>
ValueError: too many values to unpack (expected 2)
Of course, this line:
for c1, c2 in (tuple("\n ABC123."), tuple("\n ABC123!")):
has gone through many different iterations:
for c1, c2 in "asdf", "asdf"
for c1, c2 in list("asdf"), list("asdf")
for c1, c2 in tuple("asdf"), tuple("asdf")
for c1, c2 in (tuple("asdf"), tuple("asdf"))
for (c1, c2) in (tuple("asdf"), tuple("asdf"))
All of which threw the same error.
I don't think I quite understand python's zipping/unzipping syntax, and I'm just about ready to resort to hacking a low-level solution together.
Any ideas?
Okay, so I ended up doing this:
for char in zip(s1,s2):
print("%02X %02X" % ( ord(char[0]), ord(char[1]) ))
However, I notice that if I happen to have two lists of differing lengths, the longer list seems to get truncated at the end. For example:
s1 = "\n ABC123."
s2 = "\n ABC123!."
0A 0A
20 20
41 41
42 42
43 43
31 31
32 32
33 33
2E 21
# !! <-- There is no "2E"
So I guess I could work around that by checking the len() of each string, and then padding the shorter one to meet the longer one.
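An alternative to padding by hand might be itertools.zip_longest from the standard library, which keeps going until the longer input is exhausted and fills the gaps with None by default. A minimal sketch, assuming a blank column is acceptable where one string has run out; hex_pairs is a made-up helper name.

```python
from itertools import zip_longest


def hex_pairs(s1, s2):
    """Return one 'XX YY' line per position, leaving a blank column
    where one of the strings has already ended."""
    lines = []
    for c1, c2 in zip_longest(s1, s2):
        # zip_longest yields None once a string is exhausted.
        left = "%02X" % ord(c1) if c1 is not None else "  "
        right = "%02X" % ord(c2) if c2 is not None else "  "
        lines.append(left + " " + right)
    return lines


s1 = "\n ABC123."
s2 = "\n ABC123!."
for line in hex_pairs(s1, s2):
    print(line)
```

With the two strings above, the final line has an empty first column and 2E in the second, so the trailing "." of s2 is no longer dropped.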