SAS loop through 56 character string extract every two characters - string

Have a couple million records with a string like
"00 00 01 00 00 01 00 01 00 00 00 00 01 01 00 01 00 00 00 00 01"
String has a length of 56. All positions are filled with either a 0 or a 1.
My job is to parse the string of each record every two positions
(there are no spaces, that is just for clarification).
If there is a 1 in position two that means increment var1 +1
If there is ALSO a 1 in position four, (don't care about leading "0"'s
in position 1/3/5/9...55, etc.) increment var2 + 1, up to 28 variables.
The entire 56 len string must be parsed every two characters. Potentially
there could be 28 variables that have to be incremented, (but not realistic,
most likely there is only five or six) which could be found in any part of the
string, beginning to end (as long as they are in position 2/4/6/8 up to 56, etc.)
This is what my boss gave me:
if substr(BigString,2,1)='1' then var1+1;
OK. Fine.
A) There are 27 more places to evaluate in the string.
B) there are a couple million records.
28 nested if then do loops doesn't sound like an answer (all I could think of). At least not to me.
Thanx.

I think the author is looking for a DO-loop method, so my suggestion is a macro %do or an array statement in a data step.
data _null_;
text = '000001000001000100000000010100010000000001';
y = length(text);
array Var[28];
do i = 1 to dim(Var);
Var[i] + (substrn(text,i*2,1)='1');
put i = Var[i]=;
end;
run;
Kind of easy, isn't it?

Array the variables that are to be potentially incremented according to string. A DO loop can examine each part of the string and conditionally apply the needed increment.
The SUM statement <variable>+<expression> means the variable's value is automatically retained from row to row.
Due to the nature of retained variables, you might want only the final var1-var28 values at the last row in the data. The question does not have enough info regarding what is to be done with the var<n> variables.
Example:
Presume string is named op_string (op for operation). Utilize logical evaluation result True is 1 and False is 0
data want(keep=var1-var28);
set have end=done;
array var var1-var28;
do index = 1 to 28;
var(index) + (substr(op_string, 2 * index, 1) = '1'); * Add 0 or 1 according to logic eval of the single character at position 2*index;
end;
if done; * output one row at the end of the data set;
run;
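For readers who want to check the logic outside SAS, here is a minimal Python sketch of the same idea: one running total per even position, incremented by the truth value of the comparison. The function name `count_flags` and the sample records are made up for illustration and are not part of the question's data.

```python
def count_flags(records, n_flags=28):
    """Return one running total per flag, summed over all records.

    Flag k (1-based) lives at 1-based string position 2*k,
    which is 0-based Python index 2*k - 1.
    """
    totals = [0] * n_flags
    for op_string in records:
        for k in range(1, n_flags + 1):
            pos = 2 * k - 1  # 0-based index of the k-th even position
            # Slicing (rather than indexing) tolerates strings shorter than 56.
            totals[k - 1] += op_string[pos:pos + 1] == "1"
    return totals


# Hypothetical sample data: two 56-character records.
sample = ["01" + "00" * 27, "0101" + "00" * 26]
totals = count_flags(sample)
```

As in the SAS version, the comparison result (True counts as 1, False as 0) is added directly, so no nested IF statements are needed.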

If all you need is the number of 1's in each string, use COUNTC():
data want;
set have;
value = countc(op_string, '1');
run;

If I understood the problem correctly, this could be the solution (second, edited version):
/* example with same row*/
data test;
a="00000100000100010000000001010001000000000100000000011110";output;
a="10000100000100010000000001010001000000000100011100011101";output;
a="01000100000100010000000001010001000000000100000001000000";output;
a="10100100000100010000000001010001000000000111111111111110";output;
a="01100100000100010000000001010001000000000101010101010101";output;
a="00000100000100010000000001010001000000000100001100101010";output;
run;
/* work by rows*/
%macro x;
%let i=1;
data test_output(drop=i);
set test;
i=1;
%do %while (&i<=56);
var&i.=0;
var&i.=var&i.+input(substr(a,&i,1), best8.);
%let i=%eval(&i.+1);
%end;
run;
%mend;
%x;
/* results:
a var1 var2 var3 var4 var5 var6 var7 . .
00000100000100010000000001010001000000000100000000011110 0 0 0 0 0 1 0 .......
10000100000100010000000001010001000000000100011100011101 1 0 0 0 0 1 0 .......
01000100000100010000000001010001000000000100000001000000 0 1 0 0 0 1 0 .......
10100100000100010000000001010001000000000111111111111110 1 0 1 0 0 1 0 .......
01100100000100010000000001010001000000000101010101010101 0 1 1 0 0 1 0 .......
00000100000100010000000001010001000000000100001100101010 0 0 0 0 0 1 0 .......
*/

Convert any length signed hexadecimal number to signed decimal number (Excel)

Question
When faced with signed hexadecimal numbers of unknown length, how can one use Excel formulas to easily convert those hexadecimal numbers to decimal numbers?
Example
Hex
---
00
FF
FE
FD
0A
0B
Use this deeply nested formula:
=HEX2DEC(N)-IF(ISERR(FIND(LEFT(IF(ISEVEN(LEN(N)),N,CONCAT(0,N))),"01234567")),16^LEN(IF(ISEVEN(LEN(N)),N,CONCAT(0,N))),0)
where N is a cell containing hexadecimal data.
This formula becomes more readable when expanded:
=HEX2DEC(N) -
/* check if sign bit is present in leftmost nibble, padding to an even number of digits if necessary */
IF( ISERR( FIND( LEFT( IF( ISEVEN(LEN(N))
, N
, CONCAT(0,N)
)
)
, "01234567"
)
)
/* offset if sign bit is present */
, 16^LEN( IF( ISEVEN(LEN(N))
, N
, CONCAT(0,N)
)
)
/* do not offset if sign bit is absent */
, 0
)
and may be read as "First, convert the hexadecimal value to an unsigned decimal value. Then offset the unsigned decimal value if the leftmost nibble of the data contains a sign bit; else do not offset."
Example Conversion
Hex | Dec
-----|----
00 | 0
FF | -1
FE | -2
FD | -3
0A | 10
0B | 11
Let the A1 cell contain a 1 byte hexadecimal string of any case.
To get the 2's complement decimal value of this string, use the following:
=HEX2DEC(A1)-IF(HEX2DEC(A1) > 127, 256, 0)
For an arbitrary length of bytes, use the following:
=HEX2DEC(A1) - IF(HEX2DEC(A1) > POWER(2, 4*LEN(A1))/2 - 1, POWER(2, 4*LEN(A1)), 0)
I usually use the MOD function, but it requires adding and then subtracting half the maximum value. For an 8-bit hex number:
=MOD(HEX2DEC(A1) + 2^7, 2^8) - 2^7
Of course it can be made a generic formula based on length:
=MOD(HEX2DEC(A1) + 2^(4*LEN(A1)-1), 2^(4*LEN(A1))) - 2^(4*LEN(A1)-1)
But sometimes input value has lost leading zeroes or maybe you are using hex values of an arbitrary length (I usually have to decode registers from microcontrollers where maybe a 16-bit register has been used for 3 signed values). I prefer keeping bit length in a separate column:
=MOD(HEX2DEC(A1) + 2^(B1-1), 2^(B1)) - 2^(B1-1)
Example conversion
HEX | bit # | Dec
-----|-------|------
0 | 8 | 0
FF | 8 | -1
FF | 16 | 255
FFFE | 16 | -2
2FF | 10 | -257
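The MOD-based formula can be cross-checked outside Excel. The following Python sketch mirrors it under the assumption that the bit width defaults to four bits per hex digit when it is not given. The name `hex_to_signed` is made up for this example.

```python
def hex_to_signed(hex_str, bits=None):
    """Interpret hex_str as a two's complement number of the given bit width.

    Mirrors =MOD(HEX2DEC(A1) + 2^(B1-1), 2^(B1)) - 2^(B1-1).
    If bits is omitted, 4 bits per hex digit are assumed.
    """
    if bits is None:
        bits = 4 * len(hex_str)
    half = 1 << (bits - 1)  # 2^(bits-1)
    return (int(hex_str, 16) + half) % (1 << bits) - half


# The rows of the example table above.
results = [
    hex_to_signed("0", 8),
    hex_to_signed("FF", 8),
    hex_to_signed("FF", 16),
    hex_to_signed("FFFE", 16),
    hex_to_signed("2FF", 10),
]
```

Keeping the bit width as a separate argument plays the same role as the separate column in the spreadsheet: it avoids guessing the width from a string that may have lost its leading zeroes.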

WYSIWYG String escape key representation

I've been writing a basic hex dump program in D. My program works as I expect it to, based on test files, but a key formatting function isn't passing the unittest I've written for it.
As the formatted output is exactly as I intended it, I concluded that my unittest doesn't express exactly what I wanted it to.
string getOutput(ubyte[] bin) {
char[] hex =cast(char[]) ("\t\t0\t1\t2\t3\t4\t5\t6\t7\n");
for(int line; 8 * line < bin.length; line++) {
if (8 * line + 8 >= bin.length) {
hex ~= format("%0#0 10x%(\t%1 0 2x%)\n", line*8, bin[8*line..bin.length]);
}
else {
hex ~= format("%0#0 10x%(\t%1 0 2x%)\n", line*8, bin[8*line..8*line + 8]);
}
}
return cast(string) hex;
}
///Example of getOutput()
unittest {
ubyte[9] a = [0xf0, 0xe1, 0xd2, 0xc3, 0xb4, 0xa5, 0x96, 0x87, 0x78];
string b = getOutput(a);
write(b);
assert(b ==
r" 0 1 2 3 4 5 5 7
0000000000 f0 e1 d2 c3 b4 a5 96 87
0000000008 78
");
}
I included write(b) in the unit test to manually inspect the contents. Copy-pasted from my console, b is:
0 1 2 3 4 5 6 7
0000000000 f0 e1 d2 c3 b4 a5 96 87
0x00000008 78
I've spent several hours reading the D lexical documentation for strings, but am nowhere closer to understanding my problem, hence I'm throwing it to you, in the hope I'm not missing something too obvious.
Many thanks.

Excel: ignoring 0 at start of numbers

I have a list of times which I want to add to a string
0900 1730
0900 1730
1000 1700
0930 1700
I need to break these up into hours and minutes like so
09 00 17 30
09 00 17 30
10 00 17 00
09 30 17 00
To do this I am using the MID() function to get the first two characters from the cell and then the last two. But when I do this for numbers that start with 0 or contain 00, the leading 0 is dropped, like so:
0930 = ",MID(B2,1,2),",",MID(B2,3,2)," output - 93 0 what i want = 09 30
0900 = ",MID(B2,1,2),",",MID(B2,3,2)," output - 90 0 what i want = 09 00
1000 = ",MID(B2,1,2),",",MID(B2,3,2)," output - 10 0 what i want = 10 00
Is there a way to solve this?
You can use a mid of a pre-formatted block:
=MID(RIGHT("0000"&B2,4),1,2) =MID(RIGHT("0000"&B2,4),3,2)
This should give you two strings like 09 & 30.
If you want two numeric values you can add a value function:
=VALUE(MID(RIGHT("0000"&B2,4),1,2))
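The pad-then-slice idea behind RIGHT("0000"&B2,4) is not specific to Excel. As a rough illustration, the same steps in Python look like this (the function name `split_time` is hypothetical):

```python
def split_time(value):
    """Left-pad to four characters with zeros, then split into hours and minutes."""
    text = str(value).zfill(4)  # 930 -> "0930", like RIGHT("0000"&B2,4)
    return text[:2], text[2:4]  # like MID(...,1,2) and MID(...,3,2)


hours, minutes = split_time(930)
```

The padding step is what restores the leading zero that is lost when the cell holds the number 930 rather than the text 0930.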
Another way is to place a single quote (') before the 0, so that the cell stores 0930 as text.
Your formula will then work as it is, with no changes needed.
So the value 0930 would be entered as '0930

Wordnet database has letters in weird/invalid places

I was noticing that some lines in the database files (like data.verb) are not following the correct format. (The database format is outlined here).
02286687 40 v 0a fall_upon d strike 0 come_upon 9 light_upon 0 chance_upon 0 come_across 2 chance_on 0 happen_upon 0 attain d discover 0 003 # 02285629 v 0000 + 07214432 n 0a01 + 00043195 n 0a01 01 + 08 00 | find unexpectedly; "the archeologists chanced upon an old tomb"; "she struck a goldmine"; "The hikers finally struck the main path to the lake"
Where the w_cnt 0a should be the number 10. This also happens in other places like:
02575723 41 v 08 flim-flam 0 play_a_joke_on 1 play_tricks 0 trick 0 fob 0 fox 0 pull_a_fast_one_on 0 play_a_trick_on 0 008 # 02575082 v 0000 + 10022759 n 0602 + 00171618 n 0401 + 10463714 n 0404 + 06760722 n 0401 + 00752954 n 0401 + 00779248 n 010c ~ 02578384 v 0000 02 + 09 00 + 30 04 | deceive somebody; "We tricked the teacher into thinking that class would be cancelled next week"
Where 010c isn't a valid number. Unless [digit][letter] is a valid format, but is not described in the documentation I have read so far.
Why are there random letters among the numbers?
Looks like the numbers are in hexadecimal format - A is 10, for example.
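A quick way to confirm this is to parse the fields as base 16. The sketch below assumes, based on my reading of the WordNet database format description, that w_cnt is a two-digit hexadecimal integer and that the four-character pointer field is two two-digit hexadecimal numbers (source word, then target word). The function names are made up for illustration.

```python
def parse_w_cnt(field):
    """Parse the word-count field, assumed to be two hex digits."""
    return int(field, 16)


def parse_source_target(field):
    """Split a 4-character pointer field into (source, target) word numbers.

    Assumption: the first two hex digits are the source word number,
    the last two are the target word number.
    """
    return int(field[:2], 16), int(field[2:], 16)


w_cnt = parse_w_cnt("0a")
pointer = parse_source_target("010c")
```

Under that reading, 0a matches the ten words listed in the first synset, and 010c points from word 1 of the current synset to word 12 of the target synset.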

Choose For Random Strings In Commodore 64 BASIC

I have these variable declarations in my program:
X="MAGENTA"
Y="CYAN"
Z="TAN"
A="KHAKI"
Now what I want is to randomly choose one of these and PRINT it. But how to do this?
My BASIC is pretty rusty but you should just be able to use something like:
10 X$ = "MAGENTA"
20 Y$ = "CYAN"
30 Z$ = "TAN"
40 A$ = "KHAKI"
50 N = INT(RND(1) * 4)
60 IF N = 0 THEN PRINT X$
70 IF N = 1 THEN PRINT Y$
80 IF N = 2 THEN PRINT Z$
90 IF N = 3 THEN PRINT A$
or, putting it in a subroutine for code re-use:
10 X$ = "MAGENTA"
20 Y$ = "CYAN"
30 Z$ = "TAN"
40 A$ = "KHAKI"
50 GOSUB 1000
60 PRINT RC$
70 END
1000 TV = INT(RND(1) * 4)
1010 IF TV = 0 THEN RC$ = X$
1020 IF TV = 1 THEN RC$ = Y$
1030 IF TV = 2 THEN RC$ = Z$
1040 IF TV = 3 THEN RC$ = A$
1050 RETURN
Of course, you probably should be using arrays for that sort of thing so you can just use:
10 DIM A$(3)
20 A$(0) = "MAGENTA"
30 A$(1) = "CYAN"
40 A$(2) = "TAN"
50 A$(3) = "KHAKI"
60 PRINT A$(INT(RND(1)*4))
The above answer is correct and comprehensive.
This answer, on the other hand, is not, but I was actually doing a little bit of Commodore BASIC last month and decided that string indexing CAN be useful, sometimes, so here's a non-answer that sort of reframes your problem.
100 X$ = "MAGENTACYAN   TAN    KHAKI  "
110 PRINT MID$(X$,INT(RND(1)*4)*7, 7)
This code gets a random int from 0 to 3, then uses that to find the start index into a single string that contains all four entries, each of which is padded out (where necessary) to 7 characters. That padding is needed because the final parameter to MID$ is the length of the substring to be extracted.
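The fixed-width indexing idea can be sketched in Python as follows. This is only an illustration of the technique, not a translation of the BASIC line: Python slices start at 0, whereas MID$ positions start at 1, so a direct port would need the start position adjusted by one. The names `WIDTH`, `PACKED`, `field`, and `pick_random` are invented for this example.

```python
import random

WIDTH = 7
NAMES = ["MAGENTA", "CYAN", "TAN", "KHAKI"]

# Pack all entries into one string, each padded with spaces to WIDTH characters.
PACKED = "".join(name.ljust(WIDTH) for name in NAMES)


def field(index):
    """Return entry number index (0-based) with its padding stripped."""
    start = index * WIDTH  # 0-based start of the field
    return PACKED[start:start + WIDTH].rstrip()


def pick_random():
    """Pick one of the packed entries at random."""
    return field(random.randrange(len(PACKED) // WIDTH))


choice = pick_random()
```

Building the packed string with `ljust` rather than typing the padding by hand avoids miscounting spaces, which is the main hazard of this approach.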
WHY BOTHER?
When to consider indexing over an array:
(1) when your string data is near-uniform length, and
(2) when you have a LOT of little strings.
If those two conditions are true, then the full code, including the data, is more compact, and takes less memory due to allocating fewer pointers.
P.S. Bonus point if you find that I've made an off-by-one error!
Here's another way to do it, using one variable for the output and ON..GOSUB to set it based on a random number in the range [1..4].
10 ON INT(RND(1)*4+1) GOSUB 100,110,120,130
20 PRINT A$
30 END
100 A$ = "MAGENTA":RETURN
110 A$ = "CYAN":RETURN
120 A$ = "TAN":RETURN
130 A$ = "KHAKI":RETURN
