Inconsistent behaviour concatenating macro variables - string

I am trying to create a string by concatenating several variables/delimiters within a macro:
%macro write_to_string();
%let delim = = ;
%let string = %sysfunc(catx(%str( ),
&string, \,
step start,
%nrstr(%superq(delim)),
&etls_stepStartTime,
|,
output table,
%nrstr(%superq(delim)),
&SYSLAST,
|,
transform return code,
%nrstr(%superq(delim)),
&trans_rc));
%mend;
The macro is called at the end of several transformations (within SAS DI), so the string keeps having text appended at the end.
If each instance of %nrstr(%superq(delim)) is replaced with some other delimiter, say :, then the above macro behaves as intended. But with the code as above, what I get is a 0 followed by the last string to have been appended.
I am quite ignorant about macro variables and functions and am struggling to understand:
Why the choice of delimiter seems to affect whether the string is properly appended
Why macro variables sometimes need to be referenced with the preceding & and sometimes not.
Any help is greatly appreciated!
EDIT
The input variables in the code above are autogenerated by the SAS DI system and reset after each transformation in the job. The values look something like
&etls_stepStartTime = 16FEB2017:17:25:37
&SYSLAST = WORK.MY_TABLE_NAME
&trans_rc = 0
Here the value of &trans_rc will indicate the error/warning status of the last transformation that ran.
So my desired output (with the &delim variable working) would be values of the form
step start = 16FEB2017:17:25:37 | output table = WORK.MY_TABLE_NAME | transform return code = 0
delimited by \. As mentioned above, what I get is only the last value (the one corresponding to the last transformation) with a preceding 0\, unless I change the delimiter to some non-reserved character constant.

Don't use %SYSFUNC() with the CAT... series of functions. First, you don't need them: in macro code you can just place the text where you want it. Second, those functions accept either numeric or character arguments, so SAS has to try to figure out whether the text your macro code generates for each argument represents a number or a character string. That is probably why the equal signs result in zeros: SAS is treating the equal sign as an equality test, so the zero means that the values on each side are not equal.
%let string =&string \ step start &delim &etls_stepStartTime ;
%let string =&string | output table &delim &SYSLAST ;
%let string =&string | transform return code &delim &trans_rc ;
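To make the target value concrete, here is a minimal Python sketch (not SAS) of what the three %let statements accumulate across calls. append_step is a hypothetical helper; its arguments stand in for &etls_stepStartTime, &SYSLAST and &trans_rc, using the sample values from the EDIT.

```python
def append_step(log, start_time, table, rc, delim="="):
    """Append one step record to the running log string (hypothetical helper).

    start_time, table and rc stand in for &etls_stepStartTime, &SYSLAST and &trans_rc.
    """
    record = (
        f"step start {delim} {start_time}"
        f" | output table {delim} {table}"
        f" | transform return code {delim} {rc}"
    )
    # %let trims leading and trailing blanks; strip() plays that role here,
    # so the very first record starts with a backslash rather than a blank.
    return f"{log} \\ {record}".strip()


log = ""
log = append_step(log, "16FEB2017:17:25:37", "WORK.MY_TABLE_NAME", "0")
print(log)
```

Each further call appends one more record after a " \ " separator, which is the behaviour the original CATX version was meant to have.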

Related

Python Function with 2 values

Homework Question I am struggling with
Specification:
The third function you will write should be called ‘excelPrep’. Your function should take one (1) argument: a string that will contain the Excel formula. The function should return two (2) values: first, a string containing the modified Excel formula; and second, an integer containing the number of dollar signs removed.
Example Test Case:
excelPrep('=SUM($A$4:$A$12)')
returns
=sum(a4:a12)
and
4
I will not write the entire code, since this is Stack Overflow and not a homework helper, so I think you should complete it on your own.
The function should be something like:
Remove the $ signs by checking every character in the string with a for loop, and at the same time increment a counter so that you know how many $ signs you've removed. That turns the input =SUM($A$4:$A$12) into =SUM(A4:A12).
You could return the value now, except that the assignment also specifies that the letters should be lowercase. Make a new string variable and append each character of =SUM(A4:A12), calling .lower() on it if it is not a digit. That leaves you with =sum(a4:a12). (Since .lower() leaves digits and symbols unchanged, calling it once on the whole string gives the same result.)
To return two values, end your function with return stringVariable, integerVariable. Just be careful whenever you call the function: you will need two variables to store the outputs, like a, b = excelPrep("=SUM($A$4:$A$12)"), which gives a = "=sum(a4:a12)" and b = 4.
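In keeping with not writing the homework for you, here is a small, unrelated Python sketch of just the return-two-values mechanism: shout is a hypothetical function that returns a string and an integer, and the caller unpacks them into two variables.

```python
def shout(text):
    """Hypothetical example: return a string and an integer together."""
    return text.upper(), len(text)   # Python packs the two values into a tuple


loud, length = shout("hello")        # unpack into two variables
print(loud, length)                  # HELLO 5

pair = shout("hi")                   # or keep the tuple as a single value
print(pair)                          # ('HI', 2)
```

excelPrep would follow the same pattern, with the modified formula as the first value and the dollar-sign count as the second.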
Hope that helps.

How to assign multiple lines to a string variable in Matlab

I have a few lines of text like this:
abc
def
ghi
and I want to assign these multiple lines to a Matlab variable for further processing.
I am copying these from a very large text file and want to process them in MATLAB instead of saving the text into a file and then reading it line by line.
I tried to handle the above lines as a single string but get an error when trying to assign it to a variable:
x = 'abc
def
ghi'
Error:
x = 'abc
|
Error: String is not terminated properly.
Any suggestions which could help me understand and solve the issue will be highly appreciated.
I frequently do this, namely copy text from elsewhere which I want to hard-code into a MATLAB script (in my case it's generally SQL code I want to manipulate and call from MATLAB).
To achieve this I have a helper function in clipboard2cellstr.m defined as follows:
function clipboard2cellstr
str = clipboard('paste');
str = regexprep(str, '''', ''''''); % Double any single quotes
strs = regexp(str, '\s*\r?\n\r?', 'split');
cs = sprintf('{\n''%s''\n}', strjoin(strs, sprintf('''\n''')));
clipboard('copy', cs);
disp(cs)
disp('(Copied to Clipboard)')
end
I then copy the text using Ctrl-c (or however) and run clipboard2cellstr. This changes the contents of the clipboard to something I can paste into the MATLAB editor using Ctrl-v (or however).
For example, copying this line
and this line
and this one, and then running the function generates this:
{
'For example, copying this line'
'and this line'
'and this one, and then running the function generates this:'
}
which is valid MATLAB which can be pasted directly in.
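The text transformation that clipboard2cellstr performs can be sketched in Python, with the clipboard calls left out: double any single quotes, split on line breaks (dropping trailing whitespace before each break), and wrap each line in quotes inside braces. text_to_matlab_cellstr is a hypothetical name used only for this illustration.

```python
import re


def text_to_matlab_cellstr(text):
    """Turn multi-line text into a MATLAB cell-array literal of char vectors."""
    text = text.replace("'", "''")              # MATLAB escapes ' by doubling it
    lines = re.split(r"\s*\r?\n\r?", text)      # same pattern as the regexp call
    body = "'\n'".join(lines)                   # close one quoted line, open the next
    return "{\n'" + body + "'\n}"


print(text_to_matlab_cellstr("For example, copying this line\nand this line"))
```

As in the MATLAB version, the result is plain text that can be pasted into the editor as a valid cell array.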
Your error is because you ended the line when MATLAB was expecting a closing quote character. You must use array notation to have multi-line or multi-element arrays.
You can assign like this if you use array notation
x = ['abc'
'def'
'hij']
>> x = 3×3 char array
Note: with this method, your rows must have the same number of characters, as you are really dealing with a character array. You can think of a character array like a numeric matrix, hence why it must be "rectangular".
If you have MATLAB R2016b or newer, you can use the string data type (creating strings with double quotes requires R2017a or newer). This uses double quotes "..." rather than single quotes '...', and can be multi-line. You must still use array notation:
x = ["abc"
"def"
"hijk"]
>> x = 3×1 string array
We can have different numbers of characters in each line, as this is simply a 3 element string array, not a character array.
Alternatively, use a cell array of character arrays (or strings)
x = {'abc'
'def'
'hijk'}
>> x = 3×1 cell array
Again, you can have character arrays or strings of different lengths within a cell array.
In all of the above examples, a newline is simply for readability and can be replaced by a semi-colon ; to denote the next line of the array.
The option you choose will depend on what you want to do with the text. If you're reading from a file, I would suggest the string array or the cell array, as they can deal with different length lines. For backwards compatibility, use a cell array. You may find cellfun relevant for operating on cell arrays. For native string operations, use a string array.

Sort letters in string alphabetically- SAS

I would like to sort the letters in a string alphabetically.
E.g.
'apple' = 'aelpp'
The only function I have seen that is somewhat similar is SORTC, but I would like to avoid splitting each word into an array of letters if possible.
Joe's right - there is no built-in function that does this. You have two options here that I can see:
1. Split your string into an array and sort the array using call sortc. You can do this fairly painlessly using call pokelong, provided that you have first defined an array of sufficient length.
2. Implement a sorting algorithm of your choice. If you choose to go down this route, I would suggest using substr on the left of the = sign to change individual characters without rewriting the whole string.
Here's an example of how you might do #1. #2 would be much more work.
data _null_;
myword = 'apple';
array letters[5] $1;
call pokelong(myword,addrlong(letters1),5); /*Limit # of chars to copy to the length of array*/
call sortc(of letters[*]);
myword = cat(of letters[*]);
putlog _all_;
run;
N.B. for an array of length 5 as used here, make sure you only write the first 5 characters of the string into memory at the start of the array when using call pokelong in order to avoid overflowing past the end of the array - otherwise you could overwrite some other arbitrary section of memory when processing longer values of myword. This could cause undesirable side effects, e.g. application / system crashes. Also, this technique for populating the array will not work in SAS University Edition - if you're using that, you'll need to use a do-loop instead.
I did a little test of this - sorting 2m random words of length 100 consisting of characters chosen from the whole ASCII printable range took about 15 seconds using a single CPU of a several-years-old PC - slightly less time than it took to create the test dataset.
data have;
length myword $100;
do i = 1 to 2000000;
do j = 1 to 100;
substr(myword,j,1) = byte(32 + int(ranuni(1) * 95)); /* printable ASCII, 32-126 */
end;
output;
end;
drop i j;
run;
data want;
set have;
array letters[100] $1;
call pokelong(myword,addrlong(letters1),100); /*Limit # of chars to copy to the length of array*/
call sortc(of letters[*]);
myword = cat(of letters[*]);
drop letters:;
run;
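For option 2, one natural choice when sorting the characters of a short string is a counting sort, since there are only 256 possible single-byte values. The sketch below is in Python rather than SAS and assumes single-byte (ASCII/Latin-1) characters; sort_letters is a hypothetical name.

```python
def sort_letters(word):
    """Sort the characters of a string with a counting sort over byte values 0-255."""
    counts = [0] * 256                       # one tally per possible single-byte character
    for ch in word:
        counts[ord(ch)] += 1                 # raises IndexError for non-single-byte characters
    # Emit each character as many times as it was seen, in code-point order
    return "".join(chr(code) * n for code, n in enumerate(counts) if n)


print(sort_letters("apple"))                 # aelpp
```

In SAS, RANK() and BYTE() play the roles of ord() and chr(), so the same idea could be written with a 256-element numeric array of counts.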

Hyphen with strings in PROC FORMAT

I am working with ICD-9 codes and am creating something of a mapping between codes and an integer:
proc format library = &formatlib;
invalue category other = 0
'410'-'410.99', '425.4'-'425.99' = 1
I have searched and searched, but haven't been able to find an explanation of how that range actually works when it comes to formatting.
Take the first range, for example. I assume SAS interprets '410'-'410.99' as "take every value between the inclusive range [410, 410.99] and convert it to a 1. Please correct me if I'm wrong in that assumption. Does SAS treat these seeming strings as floating-point decimals, then? I think that must be the case if these are to be numerical ranges for formatting all codes within the range.
I'm coming to SAS from the worlds of R and Python, so the way quote characters are used in SAS is sometimes unclear to me (for example, no quotes are used in %let foo = bar).
When SAS compares string values with normal comparison operators, what it does is compare the byte representation of each character in the string, one at a time, until it reaches a difference.
So what you'll see here is that an input string is compared to the 'start' string and, if greater than or equal to it, compared to the 'end' string; if it is also less than or equal to 'end', it is mapped to 1. If it falls in none of the listed ranges, it is mapped to 0 (the other= value).
Importantly, this means that some nonsensical results could occur - see the last row of the following test, for example.
proc format;
invalue category other = 0
'410'-'410.99', '425.4'-'425.99' = 1
;
quit;
data test;
input #1 testval $6.;
category=input(testval,category.);
datalines;
425.23
425.45
425.40
410#
410.00
410.AA
410.7A
;;;;
run;
410.7A is compared to 410 and found greater, because '4'='4', '1'='1', '0'='0' and then '.' > ' '. Then 410.7A is compared to 410.99 and found less, because '4'='4', '1'='1', '0'='0', '.'='.' and then '7' < '9'. The A is irrelevant to the comparison. But on the row above it you can see that 410.AA is not in the range, since 'A' is '41'x in ASCII, which is not less than '9' ('39'x).
Note that all SAS strings are padded to their full length with spaces. This can be important in string comparisons, because space is the lowest-valued printable character (if you consider space printable). Thus any character you're likely to compare to space will be higher; for example, the fourth row (410#) is a 1 because # falls between space and . in the ASCII table! But change it to / and it fails. Similarly, change it to byte(13) (through code) and it fails, because it is then less than space (so 410^M, with ^M representing byte(13), is less than the start value 410). In informats and formats, SAS treats the start and end values as whatever length it needs; so if you're reading a 6-character string, it treats them as length 6 and fills the rest with spaces.
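The comparison rules above can be checked with a small Python sketch that pads every value to the informat width with blanks and then compares character by character. It assumes an ASCII collating sequence (Python compares strings by code point, which matches byte order for ASCII text); category and RANGES are hypothetical names mirroring the PROC FORMAT above.

```python
RANGES = [("410", "410.99"), ("425.4", "425.99")]   # start/end pairs from the PROC FORMAT
WIDTH = 6                                           # the $6. informat width


def category(value, ranges=RANGES, width=WIDTH):
    """Emulate the CATEGORY informat: 1 if the blank-padded value is in any range, else 0."""
    v = value.ljust(width)                          # SAS pads strings with blanks
    for start, end in ranges:
        # Inclusive range check, compared character by character
        if start.ljust(width) <= v <= end.ljust(width):
            return 1
    return 0                                        # the OTHER= value


for s in ["425.23", "425.45", "425.40", "410#", "410.00", "410.AA", "410.7A"]:
    print(s, category(s))
```

With the test values above, the sketch maps 425.23 and 410.AA to 0 and everything else to 1, consistent with the explanation; 410/ and 410 followed by byte(13) both come out as 0.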

Extracting a specific word and a number of tokens on each side of it from each string in a column in SAS?

Extracting a specific word and a number of tokens on each side of it from each string in a column in SAS EG ?
For example,
row1: the sun is nice
row2: the sun looks great
row3: the sun left me
Is there code that would produce the following result column (2 words, where sun is the first):
SUN IS
SUN LOOKS
SUN LEFT
and possibly a second column with a COUNT in case of duplicate matches.
So if there were 20 SUN LOOKS, they would be grouped and have a count of 20.
Thanks
I think you can use the functions findw() and scan() to do what you want. Both of those functions operate on the concept of word boundaries. findw() returns the position of the word in the string; with the 'e' modifier used below, it returns the word number rather than the character position. Once you know that, you can use scan() in a loop to get the word and the words following it.
Here is a simple example to show you the concept. It is by no means a finished or polished solution, but is intended to point you in the right direction. The input data set (text) contains the sentences you provided in your question, with slight modifications, in a variable named sentence. The data step finds the word "sun" in the sentence and creates a variable named fragment that contains 3 words ("sun" + the next 2 words).
data text2;
set text;
length fragment $15;
word = 'sun'; * search term;
fragment_len = 3; * number of words in target output;
word_pos = findw(sentence, word, ' ', 'e');
if word_pos then do;
do i = 0 to fragment_len-1;
fragment = catx(' ', fragment, scan(sentence, word_pos+i));
end;
end;
run;
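The same word-boundary idea, together with the counting the question asks for, can be sketched in Python. fragment_after is a hypothetical helper that matches whole words rather than substrings, as FINDW does; Counter plays roughly the role a PROC FREQ step would play in SAS. The sample list repeats one sentence only to show a count greater than 1.

```python
from collections import Counter


def fragment_after(sentence, word, n_words=2):
    """Return `word` and the words after it (n_words in total), upper-cased, or None."""
    words = sentence.split()
    lowered = [w.lower() for w in words]      # case-insensitive here (FINDW needs the 'i' modifier)
    target = word.lower()
    if target not in lowered:
        return None                           # no whole-word match, like FINDW returning 0
    start = lowered.index(target)             # word number, 0-based here
    return " ".join(words[start:start + n_words]).upper()


sentences = ["the sun is nice", "the sun looks great", "the sun left me", "the sun looks great"]
counts = Counter(f for f in (fragment_after(s, "sun") for s in sentences) if f is not None)
for fragment, n in counts.most_common():
    print(fragment, n)
```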
Here is a partial print of the output data set.
You can use a combination of the INDEX, SUBSTR and SCAN functions to achieve this functionality.
INDEX - takes two arguments and returns the position at which a given substring appears in a string. You might use:
INDEX(str,'sun')
SUBSTR - simply returns a substring of the provided string, taking a second numeric argument referring to the starting position of the substring. Combine this with your INDEX function:
SUBSTR(str,INDEX(str,'sun'))
This returns the substring of str from the point where the word 'sun' first appears.
SCAN - returns the 'words' from a string, taking the string as the first argument, followed by a number referring to the 'word'. There is also an optional third argument that specifies the delimiters; if you omit it, SCAN uses a default set of delimiters that includes the blank (plus several punctuation characters), so you wouldn't need it in your example.
To pick out the word after 'sun' you might do this:
SCAN(SUBSTR(str,INDEX(str,'sun')),2)
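One caveat with the INDEX approach is that INDEX finds substrings, not words, so 'sun' also matches inside 'sunny'. A minimal Python sketch of the SCAN(SUBSTR(str,INDEX(str,'sun')),2) expression makes this visible; it treats the blank as the only delimiter, and word_after_substring is a hypothetical name.

```python
def word_after_substring(text, target):
    """Mimic SCAN(SUBSTR(str, INDEX(str, target)), 2) with the blank as the only delimiter."""
    pos = text.find(target)            # like INDEX, but 0-based and -1 when not found
    if pos < 0:
        return ""                      # INDEX would return 0 here
    words = text[pos:].split()         # SUBSTR from the match onward, split into words
    return words[1] if len(words) > 1 else ""


print(word_after_substring("the sun is nice", "sun"))        # is
print(word_after_substring("the sunny sun is nice", "sun"))  # sun (matched inside "sunny")
```

In SAS it is also worth checking that INDEX returned a value greater than 0 before calling SUBSTR.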
Now all that's left to do is build a new string containing the words of interest. That can be achieved with concatenation operators. To see how to concatenate two strings, run this illustrative example:
data _NULL_;
a = 'Hello';
b = 'World';
c = a||' - '||b;
put c;
run;
The log should contain this line:
Hello - World
as a result of displaying the value of the c variable using the PUT statement. There are a number of functions that can be used to concatenate strings; see CAT, CATX and CATS in the documentation for some examples.
Hopefully there is enough here to help you.
