SAS - put leading spaces in a string

I have a script that writes a SAS program to a text file. The script looks like this:
/********* Import Excel spreadsheet with model specs *****************/
proc import file = "&mydir\sample.xls" out = model dbms = xls replace;
run;
/********* Create program model *****************/
data model;
set model;
dlb = resolve(dlb);
dub = resolve(dub);
run;
data model;
set model;
where2 = tranwrd(where,"="," ");
where2 = tranwrd(where2,"<"," ");
where2 = tranwrd(where2,">"," ");
nword = countw(where2);
bounds = trim(dlb)!!" "!!trim(dub);
bounds = tranwrd(bounds,"="," ");
bounds = tranwrd(bounds,"<"," ");
bounds = tranwrd(bounds,">"," ");
nbounds = countw(bounds);
run;
proc sql noprint;
select max(nword) into: max_word from model ;
select max(nbounds) into: max_aux from model ;
select name into: list_var separated by " " from dictionary.columns where libname = "WORK" and memname = "IMP" ;
quit;
/******* Generate Model ********/
%macro generate_model;
data model;
set model;
attrib wherev length = $500.;
do i = 1 to countw(where2);
%do j = 1 %to %sysfunc(countw(&list_var));
if upcase(scan(where2,i)) = "%upcase(%scan(&list_var,&j))" and scan(where2,i) not in ("0","1","2","3","4","5","6","7","8","9") then do;
if missing(wherev) then wherev = trim(scan(where2,i));
else if index(wherev,trim(scan(where2,i))) = 0 then do;
wherev = trim(wherev)!!" "!!trim(scan(where2,i));
end;
end;
%end;
end;
drop i where2;
run;
data model;
set model;
attrib aux length = $500.;
do i = 1 to countw(bounds);
%do j = 1 %to %sysfunc(countw(&list_var));
if upcase(scan(bounds,i)) = "%upcase(%scan(&list_var,&j))" and scan(bounds,i) not in ("0","1","2","3","4","5","6","7","8","9") then do;
if missing(aux) then aux = trim(scan(bounds,i));
else if index(aux,trim(scan(bounds,i))) = 0 then do;
aux = trim(aux)!!" "!!trim(scan(bounds,i));
end;
end;
%end;
end;
drop i bounds;
run;
%mend;
%generate_model;
data outem.bound;
set outem.model;
attrib txt length = $2000.;
txt = "******************Macros for variable"!!trim(dep)!!"******;";
output;
txt = "%"!!"macro bound"!!trim(dep)!!";";
output;
if not missing(lb) then do;
txt ="LB="!!trim(lb)!!";";
output;
end;
if not missing(ub) then do;
txt ="UB="!!trim(ub)!!";";
output;
end;
if not missing(dlb) and not missing(lb) then do;
txt ="LB=MAX(LB,"!!trim(dlb)!!");";
output;
end;
if not missing(dlb) and missing(lb) then do;
txt ="LB="!!trim(dlb)!!";";
output;
end;
if not missing(dub) and not missing(ub) then do;
txt ="UB=MIN(UB,"!!trim(dub)!!");";
output;
end;
if not missing(dub) and missing(ub) then do;
txt ="UB="!!trim(dub)!!";";
output;
end;
txt = "%"!!"mend;";
output;run;
data outem.imp;
set outem.bound;
file "&mydir\3_generate_models\3_model.sas" lrecl = 2000;
put txt;
run;
The program works fine; however, I can't manage to put leading spaces before UB or LB.
The output looks like this:
%macro boundHC0340;
LB= 1;
UB= 9;
%mend;
But I would like to get this:
%macro boundHC0340;
  LB= 1;
  UB= 9;
%mend;
I have already made some attempts to put leading spaces before UB and LB, but so far without success.
I can put other characters and strings there. I just can't put blanks before UB and LB in order to produce indented code.
I've tried something like this:
txt =" LB="!!trim(lb)!!";";
But the blank before LB disappears in the output file.
However, if I write this:
txt ="******LB="!!trim(lb)!!";";
the asterisks do appear in my program.
Any idea what I'm missing here?
Thank you very much for your support.
Best regards
PS: here's a link to the sample xls file: sample.xls

Assuming that you have built the variable TXT with the value you want to see, you just need to add a format to the PUT statement in your final step. To avoid writing lots of useless trailing blanks, use the $VARYING format. You will need to calculate the length of your string to use that format.
data outem.imp;
set outem.bound;
file "&mydir\3_generate_models\3_model.sas" lrecl = 2000;
/* $VARYING needs the actual length; LENGTHN returns 0 for a blank TXT */
len = lengthn(txt);
/* Write exactly LEN characters, leading blanks included */
put txt $varying2000. len;
run;
But it is probably easier to skip all of the concatenation and use the power of the PUT statement itself to write the program directly from your data. Then you can use things like column pointer controls (@3), named output (lb=) and other features of the PUT statement to format your program file.
data _null_;
set outem.model;
file "&mydir\3_generate_models\3_model.sas" ;
put 72*'*' ';'
/ '* Macros for variable ' dep ';'
/ 72*'*' ';'
/ '%macro bound' dep ';'
;
if not missing(lb) then put @3 lb= ';' ;
if not missing(ub) then put @3 ub= ';' ;
if not missing(dlb) and not missing(lb) then put
@3 'LB=MAX(LB,' dlb ');'
;
if not missing(dlb) and missing(lb) then put
@3 'LB=' dlb ';'
;
if not missing(dub) and not missing(ub) then put
@3 'UB=MIN(UB,' dub ');'
;
if not missing(dub) and missing(ub) then put
@3 'UB=' dub ';'
;
put '%mend bound' dep ';';
run;
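Although, looking at the logic of those IF statements, why not reduce them to:
put @3 'LB=MAX(' lb ',' dlb ');' ;
put @3 'UB=MIN(' ub ',' dub ');' ;
(This relies on MAX and MIN ignoring missing arguments, so a missing numeric LB or UB written as a period is harmless. If DLB or DUB can be blank character values, the generated call would have an empty argument, so keep the IF tests for that case.)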
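For anyone who wants to try the PUT-driven approach without the spreadsheet, here is a small self-contained sketch. The values of DEP, LB and UB are made up to mimic one row of OUTEM.MODEL, and the program is written to a temporary fileref (OUT, a hypothetical name) instead of the real path. The second step copies the generated file to the log so you can check the indentation.

```sas
/* Temporary file instead of "&mydir\3_generate_models\3_model.sas" */
filename out temp;

data _null_;
  length dep $8;
  /* Hypothetical values standing in for one row of OUTEM.MODEL */
  dep = 'HC0340';
  lb  = 1;
  ub  = 9;
  file out;
  /* Single quotes keep the macro processor from acting on %macro/%mend */
  put '%macro bound' dep ';';
  /* @3 moves the column pointer to column 3, so these lines are indented */
  put @3 lb= ';';
  put @3 ub= ';';
  put '%mend bound' dep ';';
run;

/* Echo the generated program to the log; _INFILE_ keeps the leading blanks */
data _null_;
  infile out;
  input;
  put _infile_;
run;
```

Because the indentation comes from the pointer control rather than from blanks stored in a variable, there is nothing for list output to strip.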

I think this is the result of SAS applying left alignment by default for the $w. format of your variable when you use your put statement. You can override this by applying a format in the put statement and specifying what alignment you want to use:
data _null_;
file "%sysfunc(pathname(work))\example.txt";
a = " text here";
/*Approach 1 - default behaviour*/
/*No leading spaces on this line in output file (default)*/
put a;
/*Approach 2 - $varying + right alignment*/
/*We need to right align text while preserving the number of leading spaces, so use $varying. */
/*If every line is the same length, we can use $w. instead*/
/*Use -r to override the default format alignment*/
varlen = length(a);
put a $varying2000.-r varlen;
/*Approach 3 - manually specify indentation*/
/*Alternatively - ditch the leading spaces and tell SAS which column to start at*/
put @4 a;
run;
Try changing the last part of your code so it looks a bit like this (fix paths and dataset names as appropriate):
data bound;
set model;
attrib txt length = $2000.;
txt = "******************Macros for variable"!!trim(dep)!!"******;";
output;
txt = "%"!!"macro bound"!!trim(dep)!!";";
output;
if not missing(lb) then do;
/* LEADING SPACES ADDED HERE */
txt =" LB="!!trim(lb)!!";";
output;
end;
if not missing(ub) then do;
/* LEADING SPACES ADDED HERE */
txt =" UB="!!trim(ub)!!";";
output;
end;
if not missing(dlb) and not missing(lb) then do;
txt ="LB=MAX(LB,"!!trim(dlb)!!");";
output;
end;
if not missing(dlb) and missing(lb) then do;
txt ="LB="!!trim(dlb)!!";";
output;
end;
if not missing(dub) and not missing(ub) then do;
txt ="UB=MIN(UB,"!!trim(dub)!!");";
output;
end;
if not missing(dub) and missing(ub) then do;
txt ="UB="!!trim(dub)!!";";
output;
end;
txt = "%"!!"mend;";
output;
run;
data _null_;
set bound;
file "%sysfunc(pathname(work))\example.sas" lrecl = 2000;
varlen = length(txt);
put txt $varying2000.-r varlen;
run;
x "notepad ""%sysfunc(pathname(work))\example.sas""";
Contents of example.sas (based on sample xls):
******************Macros for variableHC0340******;
%macro boundHC0340;
 LB= 1;
 UB= 9;
%mend;

Related

MemSQL - populate a table using a comma-separated string value

I need to pass a list of values to a MemSQL procedure.
Is there a way to transform the comma-separated integer values in the input string into a table?
MemSQL doesn't yet have something like the Python string split function that converts a delimited string into an array of strings. In MemSQL 6.5, the best method would be to do something like this using the LOCATE built-in function.
-- Create the target table first so the procedure can reference it
create table t(i int);
delimiter //
create or replace procedure insert_split_string(input text, d text) as
declare
position int = 1;
newPosition int = -1;
begin
-- Insert every element that is followed by a delimiter
while newPosition != 0 loop
newPosition = locate(d, input, position);
if newPosition != 0 then
insert into t values(substring(input, position, newPosition - position));
-- Skip past the delimiter (assumes a one-character delimiter)
position = newPosition + 1;
end if;
end loop;
-- Add the last delimited element
insert into t values(substring_index(input, d, -1));
end //
delimiter ;
call insert_split_string("1,2,3,4,5", ",");
select * from t;

SAS Index on Array

I am trying to search for a keyword in a description field (descr) and, if one is there, flag that row as a match (which keyword it matches on is not important). I am having an issue with the DO loop going through all entries of the array, and I am not sure whether this is because my DO loop is incorrect or because my INDEX call is incorrect.
data JE.KeywordMatchTemp1;
set JE.JEMasterTemp;
if _n_ = 1 then do;
do i = 1 by 1 until (eof);
set JE.KeyWords end=eof;
array keywords[100] $30 _temporary_;
keywords[i] = Key_Words;
end;
end;
match = 0;
do i = 1 to 100 until(match=1);
if index(descr, keywords[i]) then match = 1;
end;
drop i;
run;
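Add another condition to your DO loop so that it terminates as soon as a match is found. You might also want to remember how many entries are in the array, so the loop does not check the empty slots. Also make sure to use the INDEX() function properly: trim the keyword, otherwise the trailing blanks of the $30 array element become part of the search string.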
data JE.KeywordMatchTemp1;
if _n_ = 1 then do;
do i = 1 by 1 until (eof);
set JE.KeyWords end=eof;
array keywords[100] $30 _temporary_;
keywords[i] = Key_Words;
end;
last_i = i ;
retain last_i ;
end;
set JE.JEMasterTemp;
match = 0;
do i = 1 to last_i while (match=0) ;
if index(descr, trim(keywords[i]) ) then match = 1;
end;
drop i last_i;
run;
You have two problems, both of which would be easy to see in a small, compact example (suggestion: put an example like this in your question in the future).
data partials;
input keyword $;
datalines;
home
auto
car
life
whole
renter
;;;;
run;
data master;
input #1 description $50.;
datalines;
Mutual Fund
State Farm Automobile Insurance
Checking Account
Life Insurance with Geico
Renter's Insurance
;;;;
run;
data want;
set master;
array keywords[100] $ _temporary_;
if _n_=1 then do;
do _i = 1 by 1 until (eof);
set partials end=eof;
keywords[_i] = keyword;
end;
end;
match=0;
do _m = 1 to dim(keywords) while (match=0 and keywords[_m] ne ' ');
if find(lowcase(description),lowcase(keywords[_m]),1,'t') then match=1;
end;
run;
Two things to look at here. First, notice the addition to the WHILE condition. This guarantees we never try to match " " (which will always match if your strings contain any spaces), and it stops the loop at the first empty array slot. The second is the t modifier in FIND, which trims trailing blanks from both arguments (note that I had to add the 1 for the start position, because for some reason the form without it didn't work, at least for me). Otherwise it looks for "auto" padded with blanks to the variable's length instead of "auto".
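To see the trailing-blank problem in isolation, here is a minimal sketch using one keyword and one description from the sample data above. KW is given length $8, which is what `input keyword $;` produces by default; the dataset name KEYWORD_DEMO is just for illustration.

```sas
data keyword_demo;
  length kw $8 descr $50;
  kw    = 'auto';                              /* stored as 'auto' followed by 4 blanks */
  descr = 'State Farm Automobile Insurance';
  /* Searches for 'auto    ' including the padding, so nothing is found */
  untrimmed = index(lowcase(descr), kw);
  /* TRIM drops the padding, so 'auto' is found inside 'automobile' */
  trimmed   = index(lowcase(descr), trim(kw));
  /* FIND modifiers: i = ignore case, t = trim trailing blanks; 1 = start position */
  with_t    = find(descr, kw, 'it', 1);
  put untrimmed= trimmed= with_t=;
run;
```

UNTRIMMED should be 0, while the TRIM and FIND versions should both return 12, the position of "Automobile".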

Comparing strings with symbols from different alphabets

I want to compare two strings that contain symbols from different alphabets (e.g. Russian and English). I want symbols that look alike to be treated as equal.
E.g. in the word "Mom" the letter "o" is from the English alphabet (code 006F in Unicode), and in the word "Mоm" the letter "о" is from the Russian alphabet (code 043E in Unicode). So ("Mom" = "Mоm") => false, but I want it to be true. Is there a standard SAS function for this, or should I write a macro to do it?
Thanks!
I would do it like this:
First I would make a map, i.e. which letter in the Russian alphabet corresponds to which letter in the English alphabet. Example:
б = b
в = v
...
I would store this map in a separate table or as macro variables.
Then I would create a macro loop with the TRANWRD function that loops through the map.
An example could look like this:
data _null_;
stringBefore = "без";
stringAfter = tranwrd(stringBefore,"а","a");
stringAfter = tranwrd(stringAfter,"б","b");
stringAfter = tranwrd(stringAfter,"в","v");
...
run;
After this transformation, I think you can compare your strings.
I also coded some functions to deal with keyboard layout misprints. Here is the code:
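A minimal sketch of the map-then-compare idea for the "Mom" example from the question. Instead of a macro loop of TRANWRD calls, it swaps in a single KTRANSLATE call (the same function used in the FCMP code that follows), with the map held in two parallel strings of look-alike letters. It assumes a UTF-8 SAS session and that the second literal really contains the Cyrillic о (U+043E); HOMOGLYPH_DEMO is an illustrative dataset name.

```sas
data homoglyph_demo;
  length a b b_fixed $20;
  a = 'Mom';   /* all Latin letters: o is U+006F */
  b = 'Mоm';   /* middle letter is Cyrillic о, U+043E */
  /* KTRANSLATE(source, to, from): Cyrillic look-alikes (from) -> Latin (to) */
  b_fixed = ktranslate(b, 'AaBEeKkMOoPpCcTXx', 'АаВЕеКкМОоРрСсТХх');
  same_raw   = (a = b);        /* raw comparison */
  same_fixed = (a = b_fixed);  /* comparison after mapping */
  put same_raw= same_fixed=;
run;
```

In a UTF-8 session the raw comparison should be 0 and the mapped comparison 1. Only letters listed in the map are normalized, so the map has to cover every look-alike pair you care about.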
/***************************************************************************/
/* FUNCTION count_rus_letters RETURNS NUMBER OF CYRILLIC LETTERS IN STRING */
/***************************************************************************/
proc fcmp outlib=sasuser.userfuncs.mystring;
FUNCTION count_rus_letters(string $);
length letter $2;
rus_count=0;
len=klength(string);
do i=1 to len;
letter=ksubstr(string,i,1);
if letter in ("А","а","Б","б","В","в","Г","г","Д","д","Е","е","Ё","ё","Ж","ж"
"З","з","И","и","Й","й","К","к","Л","л","М","м","Н","н","О","о","П","п","Р","р",
"С","с","Т","т","У","у","Ф","ф","Х","х","Ц","ц","Ч","ч","Ш","ш","Щ","щ","Ъ","ъ"
"Ы","ы","Ь","ь","Э","э","Ю","ю","Я","я")
then rus_count+1;
end;
return(rus_count);
endsub;
run;
/**************************************************************************/
/* FUNCTION count_eng_letters RETURNS NUMBER OF ENGLISH LETTERS IN STRING */
/**************************************************************************/
proc fcmp outlib=sasuser.userfuncs.mystring;
FUNCTION count_eng_letters(string $);
length letter $2;
eng_count=0;
len=klength(string);
do i=1 to len;
letter=ksubstr(string,i,1);
if (rank('A') <= rank(letter) <= rank('Z')) or (rank('a') <= rank(letter) <= rank('z'))
then eng_count+1;
end;
return(eng_count);
endsub;
run;
/**************************************************************************/
/* FUNCTION is_string_russian RETURNS 1 IF NUMBER OF RUSSIAN SYMBOLS IN */
/* STRING >= NUMBER OF ENGLISH SYMBOLS */
/**************************************************************************/
proc fcmp outlib=sasuser.userfuncs.mystring;
FUNCTION is_string_russian(string $);
length letter $2 result 8;
eng_count=0;
rus_count=0;
len=klength(string);
do i=1 to len;
letter=ksubstr(string,i,1);
if letter in ("А","а","Б","б","В","в","Г","г","Д","д","Е","е","Ё","ё","Ж","ж"
"З","з","И","и","Й","й","К","к","Л","л","М","м","Н","н","О","о","П","п","Р","р",
"С","с","Т","т","У","у","Ф","ф","Х","х","Ц","ц","Ч","ч","Ш","ш","Щ","щ","Ъ","ъ"
"Ы","ы","Ь","ь","Э","э","Ю","ю","Я","я")
then rus_count+1;
if (rank('A') <= rank(letter) <= rank('Z')) or (rank('a') <= rank(letter) <= rank('z'))
then eng_count+1;
end;
if rus_count>=eng_count
then result=1;
else result=0;
return(result);
endsub;
run;
/**************************************************************************/
/* FUNCTION fix_layout_misprints REPLACES MISPRINTED SYMBOLS BY ANALYSING */
/* LANGUAGE OF THE STRING (FOR ENGLISH STRING RUSSIAN SYMBOLS ARE */
/* REPLACED BY ENGLISH COPIES AND FOR RUSSIAN STRING SYMBOLS ARE */
/* REPLACED BY RUSSIAN COPIES) */
/**************************************************************************/
proc fcmp outlib=sasuser.userfuncs.mystring;
FUNCTION fix_layout_misprints(string $) $ 1000;
length letter $2 result $1000;
eng_count=0;
rus_count=0;
len=klength(string);
do i=1 to len;
letter=ksubstr(string,i,1);
if letter in ("А","а","Б","б","В","в","Г","г","Д","д","Е","е","Ё","ё","Ж","ж"
"З","з","И","и","Й","й","К","к","Л","л","М","м","Н","н","О","о","П","п","Р","р",
"С","с","Т","т","У","у","Ф","ф","Х","х","Ц","ц","Ч","ч","Ш","ш","Щ","щ","Ъ","ъ"
"Ы","ы","Ь","ь","Э","э","Ю","ю","Я","я")
then rus_count+1;
if (rank('A') <= rank(letter) <= rank('Z')) or (rank('a') <= rank(letter) <= rank('z'))
then eng_count+1;
end;
if rus_count>=eng_count
then result=ktranslate(string,"АаВЕеКкМОоРрСсТХх","AaBEeKkMOoPpCcTXx");
else result=ktranslate(string,"AaBEeKkMOoPpCcTXx","АаВЕеКкМОоРрСсТХх");
return(result);
endsub;
run;
/***********/
/* EXAMPLE */
/***********/
options cmplib=sasuser.userfuncs;
data _null_;
good_str="Иванов";
err_str="Ивaнов";
fixed_str=fix_layout_misprints(err_str);
put "Good string=" good_str;
put "Error string=" err_str;
put "Fixed string=" fixed_str;
rus_count_in_err=count_rus_letters(err_str);
put "Count of Cyrillic symbols in error string=" rus_count_in_err;
eng_count_in_err=count_eng_letters(err_str);
put "Count of English symbols in error string=" eng_count_in_err;
is_error_str_russian=is_string_russian(err_str);
put "Is error string language Russian=" is_error_str_russian;
if (good_str ne err_str)
then put "Before clearing - strings are not equal to each other";
if (good_str = fixed_str)
then put "After clearing - strings are equal to each other";
run;

SAS simplify the contents of a variable

In SAS, I've a variable V containing the following value
V=1996199619961996200120012001
I'd like to create these 2 variables
V1=19962001 (= the different modalities)
V2=43 (= the first modality appears 4 times and the second one appears 3 times)
Any ideas?
Thanks for your help.
Luc
For your first question (if I understand the pattern correctly), you could extract the first four characters and the last four characters:
a = substr(variable, 1, 4);
b = substrn(variable, max(1, length(variable)-3), 4);
You could then concatenate the two.
c = cats(a, b);
For the second, the COUNT function can be used to count occurrences of a string within a string:
http://support.sas.com/documentation/cdl/en/lefunctionsref/63354/HTML/default/viewer.htm#p02vuhb5ijuirbn1p7azkyianjd8.htm
Hope this helps :)
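Putting those pieces together in one DATA step gives the following sketch. Like the answer, it assumes the value contains exactly two 4-character modes, one at the start and one at the end, and that each count is a single digit; MODES_DEMO, MODE1 and MODE2 are illustrative names. For the sample value (1996 four times, then 2001 three times) it should give V1=19962001 and V2=43.

```sas
data modes_demo;
  length v $100 mode1 mode2 $4 v1 $8 v2 $2;
  v = '1996199619961996200120012001';
  /* First and last 4-character blocks (assumes exactly two modes) */
  mode1 = substr(v, 1, 4);
  mode2 = substrn(v, max(1, length(v) - 3), 4);
  v1 = cats(mode1, mode2);
  /* COUNT gives the occurrences of each block; CATS joins the two counts */
  v2 = cats(count(v, mode1), count(v, mode2));
  put v1= v2=;
run;
```

For values with more than two modes, such as 197019801990, the more general approach in the next answer is needed.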
*Make it a bit more general;
%let modeLength = 4;
%let maxOccur = 100; ** in the input **;
%let maxModes = 10; ** in the output **;
*Where does a certain occurrence start?;
%macro occurStart(occurNo);
&modeLength.*&occurNo.-%eval(&modeLength.-1)
%mend;
*Read the input;
data simplified ;
infile datalines truncover;
input v $%eval(&modeLength.*&maxOccur.).;
*Declare output and work variables;
format what $&modeLength..
v1 $%eval(&modeLength.*&maxModes.).
v2 $&maxModes..;
array w {&maxModes.} $&modeLength.; ** what (character, to hold the modes) **;
array c {&maxModes.}; ** count **;
*Discover unique modes and count them;
countW = 0;
do vNo = 1 to length(v)/&modeLength.;
what = substr(v, %occurStart(vNo), &modeLength.);
do wNo = 1 to countW;
if what eq w(wNo) then do;
c(wNo) = c(wNo) + 1;
goto foundIt;
end;
end;
countW = countW + 1;
w(countW) = what;
c(countW) = 1;
foundIt: ;
end;
*Report results in v1 and v2;
do wNo = 1 to countW;
substr(v1, %occurStart(wNo), &modeLength.) = w(wNo);
substr(v2, wNo, 1) = put(c(wNo),1.);
put _N_= v1= v2=;
end;
keep v1 v2;
*The data I tested with;
datalines;
1996199619961996200120012001
197019801990
20011996199619961996200120012001
;
run;

Send subset of data to SAS DS2 thread

I have a dataset with 5 groups and I want to use the DS2 procedure in SAS to concurrently compute group means.
Simulated dataset:
data sim;
call streaminit(7);
do group = 1 to 5;
do pt = 1 to 500;
x = rand('ERLANG', group);
output;
end;
end;
run;
How I envision it working is that each of 5 threads receives a subset of the data corresponding to a particular group. The mean of x is calculated on each subset like so:
proc ds2;
thread t / overwrite=yes;
dcl double n sum mean;
method init();
n = 0;
sum = 0;
mean = .;
end;
method run();
set sim; /* Or perhaps a subsetted dataset */
sum + x;
n + 1;
end;
method term();
mean = sum / n;
output;
end;
endthread;
...
quit;
The problem is, if you call a thread that processes a dataset like below, rows are sent to the 5 threads all willy-nilly (i.e. irrespective of groups).
data test / overwrite=yes;
dcl thread t t_instance;
method run();
set from t_instance threads=5;
end;
enddata;
How can I tell SAS to subset the data by group and pass each subset to its own thread?
I believe you have to add the BY statement inside the run() method, and then add some code to deal with the BY group (i.e., if you want output for last.group, add code to do so and clear the totals). DS2 is supposed to be smart and use one thread per BY group (or, at least, process an entire BY group in one thread). I'm not sure you will see a great improvement if you're reading from disk (since the threading advantage is probably smaller than the disk read time), but who knows.
The only changes below are in run() (term() no longer outputs anything), plus a PROC MEANS step to check the results.
data sim;
call streaminit(7);
do group = 1 to 5;
do pt = 1 to 500;
x = rand('ERLANG', group);
output;
end;
end;
run;
proc ds2;
thread t / overwrite=yes;
dcl double n sum mean ;
method init();
n = 0;
sum = 0;
mean = .;
end;
method run();
set sim;
by group;
sum + x;
n + 1;
if last.group then do;
mean = sum / n;
output;
n=0;
sum=0;
end;
end;
method term();
end;
endthread;
run;
data test / overwrite=yes;
dcl thread t t_instance;
method run();
set from t_instance threads=5;
end;
enddata;
run;
quit;
proc means data=sim;
class group;
var x;
run;
