SAS Index on Array - string

I am trying to search for a keyword in a description field (descr) and, if it is there, flag that record as a match (which keyword it matches on is not important). I am having an issue where the do loop is going through all entries of the array. I am not sure if this is because my do loop is incorrect or because my index command is incorrect.
data JE.KeywordMatchTemp1;
set JE.JEMasterTemp;
if _n_ = 1 then do;
do i = 1 by 1 until (eof);
set JE.KeyWords end=eof;
array keywords[100] $30 _temporary_;
keywords[i] = Key_Words;
end;
end;
match = 0;
do i = 1 to 100 until(match=1);
if index(descr, keywords[i]) then match = 1;
end;
drop i;
run;

Add another condition to your DO loop to have it terminate when any match is found. You might want to also remember how many entries are in the array. Also make sure to use INDEX() function properly.
data JE.KeywordMatchTemp1;
if _n_ = 1 then do;
do i = 1 by 1 until (eof);
set JE.KeyWords end=eof;
array keywords[100] $30 _temporary_;
keywords[i] = Key_Words;
end;
last_i = i ;
retain last_i ;
end;
set JE.JEMasterTemp;
match = 0;
do i = 1 to last_i while (match=0) ;
if index(descr, trim(keywords[i]) ) then match = 1;
end;
drop i last_i;
run;

You have two problems; both of which would be easy to see in a small compact example (suggestion: put an example like this in your question in the future).
data partials;
input keyword $;
datalines;
home
auto
car
life
whole
renter
;;;;
run;
data master;
input @1 description $50.;
datalines;
Mutual Fund
State Farm Automobile Insurance
Checking Account
Life Insurance with Geico
Renter's Insurance
;;;;
run;
data want;
set master;
array keywords[100] $ _temporary_;
if _n_=1 then do;
do _i = 1 by 1 until (eof);
set partials end=eof;
keywords[_i] = keyword;
end;
end;
match=0;
do _m = 1 to dim(keywords) while (match=0 and keywords[_m] ne ' ');
if find(lowcase(description),lowcase(keywords[_m]),1,'t') then match=1;
end;
run;
Two things to look at here. First, notice the addition to the while. This guarantees we never try to match " " (which will always match if you have any spaces in your strings). The second is the t option in find (I note you have to add the 1 for start position, as for some reason the alternate version doesn't work at least for me) which trims spaces from both arguments. Otherwise it looks for "auto " instead of "auto".
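Both effects come from SAS padding fixed-length character values with trailing blanks. The snippet below is only a language-neutral sketch of that behaviour in Python, not SAS code: `pad` and `matches` are hypothetical helpers, and the width of 8 is an assumption chosen to mirror the default length of `keyword` in the example above.

```python
def pad(value, width=8):
    """Mimic a fixed-width character variable: right-pad with blanks."""
    return value.ljust(width)


def matches(description, keywords, trim):
    """Return True if any non-blank keyword occurs in description."""
    haystack = description.lower()
    for kw in keywords:
        if kw.strip() == "":
            break  # unused array slots are all blank: stop, never match on them
        needle = kw.rstrip() if trim else kw  # trim=True plays the role of the 't' modifier
        if needle.lower() in haystack:
            return True
    return False


keywords = [pad(k) for k in ("home", "auto", "car", "life", "whole", "renter")]
keywords += [pad("")] * 4  # empty slots at the end of the array
desc = "State Farm Automobile Insurance"

print(pad("") in pad(desc, 50))             # True: a blank keyword "matches" the padding
print(matches(desc, keywords, trim=False))  # False: looks for "auto    "
print(matches(desc, keywords, trim=True))   # True: looks for "auto"
```

The first line shows why the blank check in the while condition matters, and the last two show the difference trimming makes.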

Related

SAS - put empty space on string

I have a script to write a SAS program (txt) that looks like this:
/********* Import excel spreadsheet with model specs *****************/
proc import file = "&mydir\sample.xls" out = model dbms = xls replace;
run;
/********* Create program model *****************/
data model;
set model;
dlb = resolve(dlb);
dub = resolve(dub);
run;
data model;
set model;
where2 = tranwrd(where,"="," ");
where2 = tranwrd(where2,"<"," ");
where2 = tranwrd(where2,">"," ");
nword = countw(where2);
bounds = trim(dlb)!!" "!!trim(dub);
bounds = tranwrd(bounds,"="," ");
bounds = tranwrd(bounds,"<"," ");
bounds = tranwrd(bounds,">"," ");
nbounds = countw(bounds);
run;
proc sql noprint;
select max(nword) into: max_word from model ;
select max(nbounds) into: max_aux from model ;
select name into: list_var separated by " " from dictionary.columns where libname = "WORK" and memname = "IMP" ;
quit;
/******* Generate Model ********/
%macro generate_model;
data model;
set model;
attrib wherev length = $500.;
do i = 1 to countw(where2);
%do j = 1 %to %sysfunc(countw(&list_var));
if upcase(scan(where2,i)) = "%upcase(%scan(&list_var,&j))" and scan(where2,i) not in ("0","1","2","3","4","5","6","7","8","9") then do;
if missing(wherev) then wherev = trim(scan(where2,i));
else if index(wherev,trim(scan(where2,i))) = 0 then do;
wherev = trim(wherev)!!" "!!trim(scan(where2,i));
end;
end;
%end;
end;
drop i where2;
run;
data model;
set model;
attrib aux length = $500.;
do i = 1 to countw(bounds);
%do j = 1 %to %sysfunc(countw(&list_var));
if upcase(scan(bounds,i)) = "%upcase(%scan(&list_var,&j))" and scan(bounds,i) not in ("0","1","2","3","4","5","6","7","8","9") then do;
if missing(aux) then aux = trim(scan(bounds,i));
else if index(aux,trim(scan(bounds,i))) = 0 then do;
aux = trim(aux)!!" "!!trim(scan(bounds,i));
end;
end;
%end;
end;
drop i bounds;
run;
%mend;
%generate_model;
data outem.bound;
set outem.model;
attrib txt length = $2000.;
txt = "******************Macros for variable"!!trim(dep)!!"******;";
output;
txt = "%"!!"macro bound"!!trim(dep)!!";";
output;
if not missing(lb) then do;
txt ="LB="!!trim(lb)!!";";
output;
end;
if not missing(ub) then do;
txt ="UB="!!trim(ub)!!";";
output;
end;
if not missing(dlb) and not missing(lb) then do;
txt ="LB=MAX(LB,"!!trim(dlb)!!");";
output;
end;
if not missing(dlb) and missing(lb) then do;
txt ="LB="!!trim(dlb)!!";";
output;
end;
if not missing(dub) and not missing(ub) then do;
txt ="UB=MIN(UB,"!!trim(dub)!!");";
output;
end;
if not missing(dub) and missing(ub) then do;
txt ="UB="!!trim(dub)!!";";
output;
end;
txt = "%"!!"mend;";
output;run;
data outem.imp;
set outem.bound;
file "&mydir\3_generate_models\3_model.sas" lrecl = 2000;
put txt;
run;
The program works fine; however, I can't manage to put empty space before UB or LB.
The output looks like this:
%macro boundHC0340;
LB= 1;
UB= 9;
%mend;
But I would like to get this:
%macro boundHC0340;
   LB= 1;
   UB= 9;
%mend;
The code already has some attempts to put empty space before UB and LB, but so far I couldn't manage.
I can put other characters and strings in there. I just can't put empty space before UB and LB in order to produce indented code.
I've tried something like this:
txt =" LB="!!trim(lb)!!";";
But the empty space before LB does nothing.
However, if I write this:
txt ="******LB="!!trim(lb)!!";";
I get the asterisks on my program.
Any idea of what I'm missing here?
Thank you very much for your support.
Best regards
Ps: here's the hyperlink to sample xls file: sample.xls
Assuming that you have built the variable TXT with the value you want to see you just need to add a format to your final step. To avoid writing a lot of useless trailing blanks use the $VARYING format. You will need to calculate the length of your string to use that format.
data outem.imp;
set outem.bound;
file "&mydir\3_generate_models\3_model.sas" lrecl = 2000;
length= lengthn(txt);
put txt $varying2000. length;
run;
But it is probably easier to just skip all of the concatenation and use the power of the PUT statement itself to write the program directly from your data. Then you can use things like column pointer controls (@3), named values (lb=) and other features of the PUT statement to format your program file.
data _null_;
set outem.model;
file "&mydir\3_generate_models\3_model.sas" ;
put 72*'*' ';'
/ '* Macros for variable ' dep ';'
/ 72*'*' ';'
/ '%macro bound' dep ';'
;
if not missing(lb) then put @3 lb= ';' ;
if not missing(ub) then put @3 ub= ';' ;
if not missing(dlb) and not missing(lb) then put
@3 'LB=MAX(LB,' dlb ');'
;
if not missing(dlb) and missing(lb) then put
@3 'LB=' dlb ';'
;
if not missing(dub) and not missing(ub) then put
@3 'UB=MIN(UB,' dub ');'
;
if not missing(dub) and missing(ub) then put
@3 'UB=' dub ';'
;
put '%mend bound' dep ';';
run;
Although, looking at the logic of those IF statements, why not reduce them to:
put @3 'LB=MAX(' lb ',' dlb ');' ;
put @3 'UB=MIN(' ub ',' dub ');' ;
I think this is the result of SAS applying left alignment by default for the $w. format of your variable when you use your put statement. You can override this by applying a format in the put statement and specifying what alignment you want to use:
data _null_;
file "%sysfunc(pathname(work))\example.txt";
a = " text here";
/*Approach 1 - default behaviour*/
/*No leading spaces on this line in output file (default)*/
put a;
/*Approach 2 - $varying + right alignment*/
/*We need to right align text while preserving the number of leading spaces, so use $varying. */
/*If every line is the same length, we can use $w. instead*/
/*Use -r to override the default format alignment*/
varlen = length(a);
put a $varying2000.-r varlen;
/*Approach 3 - manually specify indentation*/
/*Alternatively - ditch the leading spaces and tell SAS which column to start at*/
put @4 a;
run;
Try changing the last part of your code so it looks a bit like this (fix paths and dataset names as appropriate):
data bound;
set model;
attrib txt length = $2000.;
txt = "******************Macros for variable"!!trim(dep)!!"******;";
output;
txt = "%"!!"macro bound"!!trim(dep)!!";";
output;
if not missing(lb) then do;
/* LEADING SPACES ADDED HERE */
/* LEADING SPACES ADDED HERE */
/* LEADING SPACES ADDED HERE */
txt =" LB="!!trim(lb)!!";";
output;
end;
if not missing(ub) then do;
/* LEADING SPACES ADDED HERE */
/* LEADING SPACES ADDED HERE */
/* LEADING SPACES ADDED HERE */
txt =" UB="!!trim(ub)!!";";
output;
end;
if not missing(dlb) and not missing(lb) then do;
txt ="LB=MAX(LB,"!!trim(dlb)!!");";
output;
end;
if not missing(dlb) and missing(lb) then do;
txt ="LB="!!trim(dlb)!!";";
output;
end;
if not missing(dub) and not missing(ub) then do;
txt ="UB=MIN(UB,"!!trim(dub)!!");";
output;
end;
if not missing(dub) and missing(ub) then do;
txt ="UB="!!trim(dub)!!";";
output;
end;
txt = "%"!!"mend;";
output;
run;
data _null_;
set bound;
file "%sysfunc(pathname(work))\example.sas" lrecl = 2000;
varlen = length(txt);
put txt $varying2000.-r varlen;
run;
x "notepad ""%sysfunc(pathname(work))\example.sas""";
Contents of example.sas (based on sample xls):
******************Macros for variableHC0340******;
%macro boundHC0340;
   LB= 1;
   UB= 9;
%mend;

Using Formula Language to return column values

I'm trying to return all values stored in the tNames variable.
Values exist in the fields. The option "Show multiple values as separate entries" has already been selected, but none of the names is returned.
Below is the sample code:
tNames := "";
@For(n := 0; n <= QuestionCount - 1; n := n + 1;
tNames := tNames + ", " + @Implode(@GetField("ChecklistContact_" +
@Text(n));",")
);
@Trim(tNames)
I don't know why it's not returning anything; I would appreciate your help.
The code below returns only the contact with index 0, but I want to return all contacts in each document.
tCount := 0;
@For(n := 0; n <= QuestionCount - 1; n := n + 1;
tCount := tCount + @If(@GetField("ChecklistContact_" + @Text(n)) = ""; 0; 1)
);
@GetField("ChecklistContact_" + @Text(tCount))
Following comments from Richard, the code below returns the required values, but I would prefer not to hard-code the field names.
Is there any way of using a for loop to return field names and values?
tNames := "";
tNames:= @GetField("ChecklistContact_1") : @GetField("ChecklistContact_2") : ... @GetField("ChecklistContact_7");
@Trim(tNames)
I don't believe that IBM's documentation says this explicitly, but I don't think @GetField works in column value formulas. The doc says that it works in the "current document", and there is no current document when the formula is executing in a view.
Assuming you know what the maximum number for N is, the way to do this is with a simple list:
ChecklistContact_1 : ChecklistContact_2 : ChecklistContact_3 : ... : ChecklistContact_N
If N is large, this will be a lot of typing, but you'll only have to do it once and copying and pasting and editing the numbers will make it go pretty quickly.
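If even the copy-and-paste feels tedious, the list text can be generated once outside Notes and pasted into the column formula. This is just a throwaway sketch in Python; `field_list` is a hypothetical helper, and the prefix and the count of 7 are taken from the question.

```python
def field_list(prefix, n):
    """Build 'prefix1 : prefix2 : ... : prefixN' for pasting into a column formula."""
    return " : ".join(f"{prefix}{i}" for i in range(1, n + 1))


print(field_list("ChecklistContact_", 7))
```

The printed line is the literal formula text; nothing here runs inside the view itself.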
It might sound inelegant but, if you can, create a new computed field with your column formula in your form and then use that new field in your column. Also, from a performance standpoint you will be better off.
Maybe use your loop to create the list of fieldnames as Richard suggested, then display tNames

Storing string references

Problem
There are multiple ways to store a string reference, so how would you do it in the example code? Currently the problem is that storing an access to a string causes the error "non-local pointer cannot point to local object". Is storing 'First and 'Last to refer to a string the preferable way?
String reference storage
This record stores a reference to a string. First and Last are supposed to point into a string. I think Name should be able to do the same, but that causes "non-local pointer cannot point to local object" when a local string is assigned to it. So the current workaround is to use First and Last.
type Segment is record
First : Positive;
Last : Positive;
Length : Natural := 0;
Name : access String;
end record;
Assigning sub string reference
The commented line is causing non-local pointer cannot point to local object. This is because Item is local. Source is not local and that is the string I want sub string references from.
procedure Find (Source : aliased String; Separator : Character; Last : out Natural; Item_Array : out Segment_Array) is
P : Positive := Source'First;
begin
for I in Item_Array'Range loop
declare
Item : aliased String := Separated_String_Next (Source, Separator, P);
begin
exit when Item'Length = 0;
Item_Array (I).Length := Item'Length;
Item_Array (I).First := Item'First;
Item_Array (I).Last := Item'Last;
--Item_Array (I).Name := Item'Access;
Last := I;
end;
end loop;
end;
Example
with Ada.Text_IO;
with Ada.Integer_Text_IO;
procedure Main is
use Ada.Text_IO;
use Ada.Integer_Text_IO;
function Separated_String_Next (Source : String; Separator : Character; P : in out Positive) return String is
A : Positive := P;
B : Positive;
begin
while A <= Source'Last and then Source(A) = Separator loop
A := A + 1;
end loop;
P := A;
while P <= Source'Last and then Source(P) /= Separator loop
P := P + 1;
end loop;
B := P - 1;
while P <= Source'Last and then Source(P) = Separator loop
P := P + 1;
end loop;
return Source (A .. B);
end;
type Segment is record
First : Positive;
Last : Positive;
Length : Natural := 0;
Name : access String;
end record;
type Segment_Array is array (Integer range <>) of Segment;
procedure Find (Source : String; Separator : Character; Last : out Natural; Item_Array : out Segment_Array) is
P : Positive := Source'First;
begin
for I in Item_Array'Range loop
declare
Item : aliased String := Separated_String_Next (Source, Separator, P);
begin
exit when Item'Length = 0;
Item_Array (I).Length := Item'Length;
Item_Array (I).First := Item'First;
Item_Array (I).Last := Item'Last;
--Item_Array (I).Name := Item'Access;
Last := I;
end;
end loop;
end;
Source : String := ",,Item1,,,Item2,,Item3,,,,,,";
Item_Array : Segment_Array (1 .. 100);
Last : Natural;
begin
Find (Source, ',', Last, Item_Array);
Put_Line (Source);
Put_Line ("Index First Last Name");
for I in Item_Array (Item_Array'First .. Last)'Range loop
Put (I, 5);
Put (Item_Array (I).First, 6);
Put (Item_Array (I).Last, 5);
Put (" ");
Put (Source (Item_Array (I).First .. Item_Array (I).Last));
New_Line;
end loop;
end;
Output
,,Item1,,,Item2,,Item3,,,,,,
Index First Last Name
1 3 7 Item1
2 11 15 Item2
3 18 22 Item3
The error message tells you exactly what is wrong : Item is a string declared locally, i.e. on the stack, and you are assigning its address to an access type (pointer). I hope I don't need to explain why that won't work.
The immediate answer - which isn't wrong but isn't best practice either, is to allocate space for a new string - in a storage pool or on the heap - which is done with new.
Item : access String := new String'(Separated_String_Next (Source, Separator, P));
...
Item_Array (I).Name := Item;
Note that some of the other record members, Length at least, appear to be completely redundant, since each is merely a copy of the string's eponymous attribute; they should probably be eliminated (unless there's a part of the picture I can't see).
There are better answers. Sometimes you need to use access types, and handle their object lifetimes and all the ways they can go wrong. But more often their appearance is a hint that something in the design can be improved : for example:
Unbounded_String may manage your strings more simply
You could use the length as a discriminant on the Segment record, and store the actual string (not an Access) in the record itself
Ada.Containers are a standard library of containers to abstract over handling the storage yourself (much as the STL is used in C++).
If you DO decide you need access types, it's better to use a named access type, type Str_Access is access String; - then you can create a storage pool specific to Str_Access types, and release the entire pool in one operation, to simplify object lifetime management and eliminate memory leaks.
Note the above essentially "deep copies" the slices of the Source string. If there is a specific need to "shallow copy" it - i.e. refer to the specific substrings in place - AND you can guarantee its object lifetime, this answer is not what you want. If so, please clarify the intent of the question.
For a "shallow copy" the approach in the question essentially fails because Item is already a deep copy ... on the stack.
The closest approach I can see is to make the source string aliased ... you MUST do so, as you want each Segment to refer to it ... and pass its access to the Find procedure.
Then each Segment becomes a tuple of First, Last, (redundant Length) and access to the entire string (rather than a substring).
procedure Find (Source : access String; Separator : Character;
Last : out Natural; Item_Array : out Segment_Array) is
P : Positive := Source'First;
begin
for I in Item_Array'Range loop
declare
Item : String := Separated_String_Next (Source.all, Separator, P);
begin
exit when Item'Length = 0;
...
Item_Array (I).Name := Source;
Last := I;
end;
end loop;
end;
Source : aliased String := ",,Item1,,,Item2,,Item3,,,,,,";
...
Find (Source'access, ',', Last, Item_Array);
for I in Item_Array (Item_Array'First .. Last)'Range loop
...
Put (Item_Array (I).Name(Item_Array (I).First .. Item_Array (I).Last));
New_Line;
end loop;
A helper to extract a string from a Segment would probably be useful:
function get(S : Segment) return String is
begin
return S.Name(S.First .. S.Last);
end get;
...
Put (get(Item_Array (I)));
The only rationale I can see for such a design is where the set of strings to be parsed or dissected will barely fit in memory so duplication must be avoided. Perhaps also embedded programming or some such discipline where dynamic (heap) allocation is discouraged or even illegal.
I see no solution involving address arithmetic within a string, since an array is not merely its contents - if you point within it, you lose the attributes. You can make the same criticism of the equivalent C design : you can identify the start of a substring with a pointer, but you can't just stick a null terminator at the end of the substring without breaking the original string.
Given the bigger picture ... what you need, rather than the low level details of how you want to achieve it, there are probably better solutions.

SAS simplify the contents of a variable

In SAS, I have a variable V containing the following value:
V=1996199619961996200120012001
I'd like to create these 2 variables:
V1=19962001 (= different modalities)
V2=42 (= the first modality appears 4 times and the second one appears 2 times)
Any idea ?
Thanks for your help.
Luc
For your first question (if I understand the pattern correctly), you could extract the first four characters and the last four characters:
a = substr(variable, 1,4);
b = substrn(variable,max(1,length(variable)-3),4);
You could then concatenate the two.
c = cats(a,b);
For the second, the COUNT function can be used to count occurrences of a string within a string:
http://support.sas.com/documentation/cdl/en/lefunctionsref/63354/HTML/default/viewer.htm#p02vuhb5ijuirbn1p7azkyianjd8.htm
Hope this helps :)
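To make the intended result concrete, here is a small sketch of the same idea outside SAS, in Python: cut the value into fixed-width chunks, keep each distinct chunk in order of first appearance, and count occurrences. `simplify` is a hypothetical helper and the chunk width of 4 is an assumption based on the year-like values in the question.

```python
def simplify(v, width=4):
    """Return (distinct chunks joined, their counts joined) for a fixed-width string."""
    chunks = [v[i:i + width] for i in range(0, len(v), width)]
    counts = {}  # dicts keep insertion order, so first appearance is preserved
    for chunk in chunks:
        counts[chunk] = counts.get(chunk, 0) + 1
    v1 = "".join(counts)
    v2 = "".join(str(n) for n in counts.values())
    return v1, v2


print(simplify("1996199619961996200120012001"))  # ('19962001', '43')
```

For the value exactly as posted, 2001 occurs three times, so the counts come out as 43. Note that joining counts as digits only stays unambiguous while every count is below 10.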
Make it a bit more general;
%let modeLength = 4;
%let maxOccur = 100; ** in the input **;
%let maxModes = 10; ** in the output **;
Where does a certain occurrence start?;
%macro occurStart(occurNo);
&modeLength.*&occurNo.-%eval(&modeLength.-1)
%mend;
Read the input;
data simplified ;
infile datalines truncover;
input v $%eval(&modeLength.*&maxOccur.).;
Declare output and work variables;
format what $&modeLength..
v1 $%eval(&modeLength.*&maxModes.).
v2 $&maxModes..;
array w {&maxModes.}; ** what **;
array c {&maxModes.}; ** count **;
Discover unique modes and count them;
countW = 0;
do vNo = 1 to length(v)/&modeLength.;
what = substr(v, %occurStart(vNo), &modeLength.);
do wNo = 1 to countW;
if what eq w(wNo) then do;
c(wNo) = c(wNo) + 1;
goto foundIt;
end;
end;
countW = countW + 1;
w(countW) = what;
c(countW) = 1;
foundIt:
end;
Report results in v1 and v2;
do wNo = 1 to countW;
substr(v1, %occurStart(wNo), &modeLength.) = w(wNo);
substr(v2, wNo, 1) = put(c(wNo),1.);
put _N_= v1= v2=;
end;
keep v1 v2;
The data I tested with;
datalines;
1996199619961996200120012001
197019801990
20011996199619961996200120012001
;
run;

Strange behaviour when simply adding strings in Lazarus - FreePascal

The program has several "encryption" algorithms. This one should blockwise reverse the input. "He|ll|o " becomes "o |ll|He" (block length of 2).
I add two strings, in this case appending the result string to the current "block" string and making that the result. When I add the result first and then the block, it works fine and gives me back the original string. But when I try to reverse the order, it just gives me the last "block".
Several other functions that are used for "rotation" are above.
//amount of blocks
function amBl(i1:integer;i2:integer):integer;
begin
if (i1 mod i2) <> 0 then result := (i1 div i2) else result := (i1 div i2) - 1;
end;
//calculation of block length
function calcBl(keyStr:string):integer;
var i:integer;
begin
result := 0;
for i := 1 to Length(keyStr) do
begin
result := (result + ord(keyStr[i])) mod 5;
result := result + 2;
end;
end;
//desperate try to add strings
function append(s1,s2:string):string;
begin
insert(s2,s1,Length(s1)+1);
result := s1;
end;
function rotation(inStr,keyStr:string):string;
var //array of chars -> string
block,temp:string;
//position in block variable
posB:integer;
//block length and block count variable
bl, bc:integer;
//null character as placeholder
n : ansiChar;
begin
//calculating block length 2..6
bl := calcBl(keyStr);
setLength(block,bl);
result := '';
temp := '';
{n := #00;}
for bc := 0 to amBl(Length(inStr),bl) do
begin
//filling block with chars starting from back of virtual block (in inStr)
for posB := 1 to bl do
begin
block[posB] := inStr[bc * bl + posB];
{if inStr[bc * bl + posB] = ' ' then block[posB] := n;}
end;
//adding the block in front of the existing result string
temp := result;
result := block + temp;
//result := append(block,temp);
//result := concat(block,temp);
end;
end;
(full code http://pastebin.com/6Uarerhk)
After all the loops "result" has the right value, but in the last step (between "result := block + temp" and the "end;" of the function) "block" completely replaces the content of "result" with itself; it no longer adds result at the end.
And as you can see, I even used a temp variable to try to work around that. It doesn't change anything, though.
I am 99.99% certain that your problem is due to a subtle bug in your code. However, your deliberate efforts to hide the relevant code mean that we're really shooting in the dark. You haven't even been clear about where you're seeing the shortened Result: GUI Control/Debugger/Writeln
The irony is that you have all the information at your fingertips to provide a small concise demonstration of your problem - including sample input and expected output.
So without the relevant information, I can only guess; I do think I have a good hunch though.
Try the following code and see if you have a similar experience with S3:
S1 := 'a'#0;
S2 := 'bc';
S3 := S1 + S2;
The reason for my hunch is that #0 is a valid character in a string: but whenever that string needs to be processed as PChar, #0 will be interpreted as a string terminator. This could very well cause the "strange behaviour" you're seeing.
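The effect can be imitated outside Pascal. The Python sketch below is only an analogy: `as_c_string` is a hypothetical helper that stands in for any API treating the string as NUL-terminated, which is an assumption about where the truncated value is being displayed.

```python
def as_c_string(s):
    """Return the part of s that a NUL-terminated API would see."""
    return s.split("\x00", 1)[0]


s1 = "a\x00"
s2 = "bc"
s3 = s1 + s2

print(len(s3))          # 4: the concatenation itself kept every character
print(as_c_string(s3))  # a: everything after the NUL is invisible to such an API
```

So the string can hold the right data while whatever displays it shows only the part before the first #0.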
So it's quite probable that you have at least one of the following 2 bugs in your code:
You are always processing 1 too many characters; with the extra character being #0.
When your input string has an odd number of characters: your algorithm (which relies on pairs of characters) adds an extra character with value #0.
Edit
With the additional source code, my hunch is confirmed:
Suppose you have a 5 character string, and key that produces block length 2.
Your inner loop (for posB := 1 to bl do) will read beyond the length of inStr on the last iteration of the outer loop.
So if the next character in memory happens to be #0, you will be doing exactly as described above.
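For comparison, a version of the block reversal that never reads past the end of the input could look like the following. It is a sketch in Python rather than Pascal, `reverse_blocks` is a hypothetical name, and letting the final block be shorter than the others is my assumption about the intended behaviour.

```python
def reverse_blocks(text, block_len):
    """Reverse the order of fixed-size blocks; the final block may be shorter."""
    if block_len < 1:
        raise ValueError("block_len must be at least 1")
    # Slicing stops at the end of the string, so no index runs past the input.
    blocks = [text[i:i + block_len] for i in range(0, len(text), block_len)]
    return "".join(reversed(blocks))


print(repr(reverse_blocks("Hello ", 2)))  # 'o llHe'
print(repr(reverse_blocks("Hello", 2)))   # 'ollHe'
```

In the Pascal code the equivalent fix is to limit the inner loop to the characters that actually remain in inStr for the last block.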
Additional problem. You have the following code:
//calculating block length 2..6
bl := calcBl(keyStr);
Your assumption in the comment is wrong. From the implementation of calcBl, if keyStr is empty, your result will be 0.
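A direct transcription of calcBl makes this easy to check. The Python below is a sketch that assumes single-byte characters, so that `ord` returns the same values as Pascal's ord for this input; `calc_bl` is just the transliterated name.

```python
def calc_bl(key):
    """Transcription of calcBl: running (sum mod 5) + 2 over the key characters."""
    result = 0
    for ch in key:
        result = (result + ord(ch)) % 5
        result += 2
    return result


print(calc_bl(""))   # 0: the loop body never runs
print(calc_bl("a"))  # 4: (0 + 97) % 5 = 2, plus 2
```

For any non-empty key the result does stay within 2..6, so only the empty key needs guarding. A block length of 0 would then reach the mod in amBl as a divisor.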
