Hyphen with strings in PROC FORMAT - string

I am working with ICD-9 codes and am creating a mapping from codes to integers:
proc format library = &formatlib;
invalue category other = 0
'410'-'410.99', '425.4'-'425.99' = 1
I have searched and searched, but haven't been able to find an explanation of how that range actually works when it comes to formatting.
Take the first range, for example. I assume SAS interprets '410'-'410.99' as "take every value in the inclusive range [410, 410.99] and convert it to a 1". Please correct me if I'm wrong in that assumption. Does SAS treat these seeming strings as floating-point decimals, then? I think that must be the case if these are to be numerical ranges for formatting all codes within the range.
I'm coming to SAS from the worlds of R and Python, and thus the way quote characters are used in SAS is sometimes unclear (like when using %let foo = bar... no quotes are used).

When SAS compares string values with normal comparison operators, what it does is compare the byte representation of each character in the string, one at a time, until it reaches a difference.
So what you're going to see here is this: when a string is input, it is compared to the 'start' string and, if it is greater than or equal to start, it is then compared to the 'end' string. If it is also less than or equal to end, it evaluates to 1. If that fails for every pair listed, it evaluates to 0.
Importantly, this means that some nonsensical results could occur - see the last row of the following test, for example.
proc format;
invalue category other = 0
'410'-'410.99', '425.4'-'425.99' = 1
;
quit;
data test;
input #1 testval $6.;
category=input(testval,category.);
datalines;
425.23
425.45
425.40
410#
410.00
410.AA
410.7A
;;;;
run;
410.7A is compared to 410 and found greater, as '4'='4', '1'='1', '0'='0', '.' > ' ', so greater. Then 410.7A is compared to 410.99 and found less, as '4'='4', '1'='1', '0'='0', '.'='.', '7' < '9', so less. The A is irrelevant to the comparison. But on the row above it (410.AA) you see it's not in the range, since A is ASCII 41x and that is not less than '9' (ASCII 39x).
Note that all SAS strings are filled to their full length by spaces. This can be important in string comparisons, because space is the lowest-valued printable character (if you consider space printable). Thus any character you're likely to compare to space will be higher - so for example the fourth row (410#) is a 1 because # is between space and . in the ASCII table! But change that to / and it fails. Similarly, change it to byte(13) (through code) and it fails - because it is then less than space (so 410^M, with ^M representing byte(13), is less than start (410)). In informats and formats, SAS will treat the format/informat start/end as being whatever length it needs to be - so if you're reading a 6 long string, it will treat it as length 6 and fill the rest with spaces.
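The comparison described above can be simulated outside SAS. The following Python sketch is only a model of that explanation, not SAS itself: it assumes every value is blank-padded to the informat width (6 here) and compared byte by byte as ASCII, with both range endpoints inclusive. The names in_range and category are made up for this illustration.

```python
RANGES = [("410", "410.99"), ("425.4", "425.99")]

def in_range(value, start, end, width=6):
    """Blank-pad all three strings to `width` and compare them bytewise."""
    v, s, e = (x.ljust(width).encode("ascii") for x in (value, start, end))
    return s <= v <= e  # bytes compare lexicographically, one byte at a time

def category(value):
    """Model of the informat: 1 if the value falls in any range, else 0."""
    return 1 if any(in_range(value, s, e) for s, e in RANGES) else 0

tests = ["425.23", "425.45", "425.40", "410#", "410.00",
         "410.AA", "410.7A", "410/", "410\r"]
for testval in tests:
    print(repr(testval), category(testval))
```

Under these assumptions 410# and 410.7A map to 1, while 410.AA, 410/ and 410 followed by a carriage return map to 0, which matches the behaviour described above.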

Related

SyntaxError: Unexpected number in JSON at position 182 [duplicate]

I'm importing some JSON files into my Parse.com project, and I keep getting the error "invalid key:value pair".
It states that there is an unexpected "8".
Here's an example of my JSON:
{
"Manufacturer":"Manufacturer",
"Model":"THIS IS A STRING",
"Description":"",
"ItemNumber":"Number12345",
"UPC":083456789012,
"Cost":"$0.00",
"DealerPrice":" $0.00 ",
"MSRP":" $0.00 ",
}
If I update the JSON by either removing the 0 from "UPC":083456789012, or converting it to "UPC":"083456789012", it becomes valid.
Can JSON really not accept an integer that begins with 0, or is there a way around the problem?
A leading 0 indicates an octal number in JavaScript. An octal number cannot contain an 8; therefore, that number is invalid.
Moreover, JSON doesn't (officially) support octal numbers, so formally the JSON is invalid, even if the number would not contain an 8. Some parsers do support it though, which may lead to some confusion. Other parsers will recognize it as an invalid sequence and will throw an error, although the exact explanation they give may differ.
Solution: If you have a number, don't ever store it with leading zeroes. If you have a value that needs to have a leading zero, don't treat it as a number, but as a string. Store it with quotes around it.
In this case, you've got a UPC which needs to be 12 digits long and may contain leading zeroes. I think the best way to store it is as a string.
It is debatable, though. If you treat it as a barcode, seeing the leading 0 as an integral part of it, then string makes sense. Other types of barcodes can even contain alphabetic characters.
On the other hand, a UPC is a number, and the fact that it's left-padded with zeroes to 12 digits could be seen as a display property. Actually, if you left-pad it to 13 digits by adding an extra 0, you've got an EAN code, because EAN is a superset of UPC.
If you have a monetary amount, you might display it as € 7.30, while you store it as 7.3, so it could also make sense to store a product code as a number.
But that decision is up to you. I can only advise you to use a string, which is my personal preference for these codes. If you choose a number, then you'll have to remove the 0 to make it work.
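To see the three variants side by side, here is a small sketch using Python's standard json module as the parser. That choice is an assumption made for illustration: the question used a different parser, and error messages differ between parsers. The helper name try_parse is hypothetical.

```python
import json

def try_parse(text):
    """Return the parsed UPC value, or a note that the parser rejected it."""
    try:
        return json.loads(text)["UPC"]
    except json.JSONDecodeError as err:
        return f"rejected: {err.msg}"

print(try_parse('{"UPC": 083456789012}'))    # leading zero: not valid JSON
print(try_parse('{"UPC": 83456789012}'))     # valid number, but the zero is gone
print(try_parse('{"UPC": "083456789012"}'))  # string keeps all 12 digits
```

Only the quoted form round-trips the code exactly as written.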
One of the more confusing parts of JavaScript is that if a number starts with a 0 that isn't immediately followed by a ., it represents an octal, not a decimal.
JSON borrows from JavaScript syntax but avoids confusing features, so it simply bans numbers with leading zeros (unless they are followed by a .) outright.
Even if this wasn't the case, there would be no reason to expect the 0 to still be in the number when it was parsed, since 02 and 2 are just different representations of the same number (if you force decimal).
If the leading zero is important to your data, then you probably have a string and not a number.
"UPC":"083456789012"
A product code is an identifier, not something you do maths with. It should be a string.
Formally, it is because JSON uses DecimalIntegerLiteral in its JSONNumber production:
JSONNumber ::
-_opt DecimalIntegerLiteral JSONFraction_opt ExponentPart_opt
And DecimalIntegerLiteral may only start with 0 if it is 0:
DecimalIntegerLiteral ::
0
NonZeroDigit DecimalDigits_opt
The rationale behind this is probably:
In the JSON grammar - to reuse constructs from the main ECMAScript grammar.
In the main ECMAScript grammar - to make it easier to distinguish DecimalIntegerLiteral from HexIntegerLiteral and OctalIntegerLiteral, and from OctalIntegerLiteral in the first place.
See these productions:
HexIntegerLiteral ::
0x HexDigit
0X HexDigit
HexIntegerLiteral HexDigit
...
OctalIntegerLiteral ::
0 OctalDigit
OctalIntegerLiteral OctalDigit
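The JSONNumber production quoted above translates fairly directly into a regular expression. This is a sketch in Python and the expansion of JSONFraction and ExponentPart (which the answer does not spell out) is an assumption based on the usual JSON number syntax. The name is_json_number is made up here.

```python
import re

# -_opt DecimalIntegerLiteral JSONFraction_opt ExponentPart_opt
# DecimalIntegerLiteral is either a lone 0 or a non-zero digit plus more digits.
JSON_NUMBER = re.compile(r"-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?")

def is_json_number(text):
    """True if the whole text matches the JSONNumber shape."""
    return JSON_NUMBER.fullmatch(text) is not None

for candidate in ["0", "0.5", "-12", "1e10", "083456789012", "00", "-01"]:
    print(candidate, is_json_number(candidate))
```

The first alternative in the pattern is what rules out 083456789012: after a leading 0, the only things allowed are a fraction, an exponent, or the end of the number.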
The UPC should be in string format. In the future you may also get other types of codes, such as GS128 or string-based product identification codes. Set your DB column to be a string.
If an integer starts with 0 in JavaScript, it is considered to be the octal (base 8) value of the integer instead of the decimal (base 10) value. For example:
var a = 065; //Octal Value
var b = 53; //Decimal Value
a == b; //true
I think the easiest way to send your number by JSON is to send it as a string.
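The arithmetic behind that comparison can be checked by parsing the same digits in an explicit base. This sketch uses Python's int(text, base) purely as a calculator. It is not how a JavaScript engine is implemented.

```python
octal_value = int("065", 8)     # 6*8 + 5
decimal_value = int("065", 10)  # leading zero is simply ignored in base 10
print(octal_value, decimal_value)

def is_octal(text):
    """True if every digit of text is a valid base-8 digit."""
    try:
        int(text, 8)
        return True
    except ValueError:
        return False

print(is_octal("065"), is_octal("083456789012"))
```

The UPC from the question fails the octal check because it contains the digits 8 and 9, which is why an octal-aware parser complains about an unexpected 8.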

Legal Statute Sorting Algorithm (Algorithmic Challenge)

I have designed a down and dirty sorting algorithm for New Jersey Legal Statutes, but I'm looking for a better way. Statutes are formatted in the following manner:
Title - A number < 999 and may include up to 2 letters after the title. Ex. 26, 26A, 26AA
Chapter - Formatted exactly the same as title.
Paragraph - A number < 999 which may be followed by 1-2 letters, and/or a decimal point, and/or a number < 999, and/or a letter, and/or another number. Ex. 25, 25.26, 25a, 25a.26, 25a.26b, 25aa.26, etc.
I convert the title and paragraph to decimals by stripping out the whole number at the beginning and dividing it by 1000, giving me a decimal value which is converted to a string. I have assigned all letters (converted to lowercase) string values from 01-26. I append those values to the end of the initial whole number string sequentially. Any numbers also mixed in are appended as their string value.
The obvious bottleneck is the mess of possibilities in the paragraph section. I have actually split that up to paragraph (pre any decimal) and section. I apply the above logic to the broken down sections if they exist.
As for the sorting order: 17 < 17A < 17AA < 17B < 18.
An example value conversion of 17B:26bb-2a5.1a5 would break down as the following:
Title- .01702
Chapter- .0260202
Paragraph- .002015
Section- .001015
Some more examples of statutes:
17:2-3
18B:2a-1
19AA:3-56g
26:56a-16
1:56-12.123
2:34-15.12a
The method I've devised is pretty dirty. I had to split it up in sections to ensure I had the correct values for each 'section' converting the whole number part to a decimal. I'm also using JS(Node) which doesn't handle large numbers well.
If anyone has a more efficient/clean way, any thoughts, or feedback, I'd greatly appreciate it.
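One possible alternative, sketched in Python rather than Node, is to avoid numeric encoding entirely and sort on a structured key. The sketch assumes the layout title:chapter-paragraph[.section], splits each part into runs of digits and runs of letters, and tags each run so that numbers compare numerically and letters alphabetically. The function names and the rule that a bare number sorts before the same number with a suffix are assumptions, chosen to reproduce 17 < 17A < 17AA < 17B < 18.

```python
import re

def component_key(part):
    """Split e.g. '26bb' or '2a5' into tagged runs: (0, number) or (1, letters)."""
    key = []
    for digits, letters in re.findall(r"(\d+)|([a-z]+)", part.lower()):
        key.append((0, int(digits)) if digits else (1, letters))
    return tuple(key)

def statute_key(citation):
    """Build a sort key from title, chapter, paragraph and optional section."""
    # An en dash sometimes appears in place of the hyphen; treat them alike.
    title, rest = citation.replace("\u2013", "-").split(":", 1)
    chapter, _, tail = rest.partition("-")
    paragraph, _, section = tail.partition(".")
    return tuple(component_key(p) for p in (title, chapter, paragraph, section))

statutes = ["18B:2a-1", "17:2-3", "26:56a-16", "1:56-12.123",
            "19AA:3-56g", "2:34-15.12a"]
ordered = sorted(statutes, key=statute_key)
print(ordered)
```

Because the key is a tuple of small integers and short strings, there is no large number to overflow and no fixed limit on how many letters or digits a component may contain.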

Sort letters in string alphabetically- SAS

I would like to sort the letters in a string alphabetically.
E.g.
'apple' = 'aelpp'
The only function I have seen that is somewhat similar is SORTC, but I would like to avoid splitting each word into an array of letters if possible.
Joe's right - there is no built-in function that does this. You have two options here that I can see:
1. Split your string into an array and sort the array using call sortc. You can do this fairly painlessly using call pokelong provided that you have first defined an array of sufficient length.
2. Implement a sorting algorithm of your choice. If you choose to go down this route, I would suggest using substr on the left of the = sign to change individual characters without rewriting the whole string.
Here's an example of how you might do #1. #2 would be much more work.
data _null_;
myword = 'apple';
array letters[5] $1;
call pokelong(myword,addrlong(letters1),5); /*Limit # of chars to copy to the length of array*/
call sortc(of letters[*]);
myword = cat(of letters[*]);
putlog _all_;
run;
N.B. for an array of length 5 as used here, make sure you only write the first 5 characters of the string into memory at the start of the array when using call pokelong in order to avoid overflowing past the end of the array - otherwise you could overwrite some other arbitrary section of memory when processing longer values of myword. This could cause undesirable side effects, e.g. application / system crashes. Also, this technique for populating the array will not work in SAS University Edition - if you're using that, you'll need to use a do-loop instead.
I did a little test of this - sorting 2m random words of length 100 consisting of characters chosen from the whole ASCII printable range took about 15 seconds using a single CPU of a several-years-old PC - slightly less time than it took to create the test dataset.
data have;
length myword $100;
do i = 1 to 2000000;
do j = 1 to 100;
substr(myword,j,1) = byte(32 + int(ranuni(1) * (126 - 32)));
end;
output;
end;
drop i j;
run;
data want;
set have;
array letters[100] $1;
call pokelong(myword,addrlong(letters1),100); /*Limit # of chars to copy to the length of array*/
call sortc(of letters[*]);
myword = cat(of letters[*]);
drop letters:;
run;
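For comparison, the second option (a hand-written sort that overwrites individual characters in place) might look like the following. This is a sketch in Python rather than SAS: the bytearray stands in for assigning through substr on the left of the = sign, insertion sort is just one possible choice of algorithm, and the name sort_letters is made up here.

```python
def sort_letters(word):
    """Insertion sort over a mutable byte buffer; returns the sorted string."""
    buf = bytearray(word, "ascii")
    for i in range(1, len(buf)):
        current = buf[i]
        j = i - 1
        # Shift larger characters one position right to make room.
        while j >= 0 and buf[j] > current:
            buf[j + 1] = buf[j]
            j -= 1
        buf[j + 1] = current
    return buf.decode("ascii")

print(sort_letters("apple"))
```

Insertion sort is quadratic in the word length, so for long strings the array-based approach with a built-in sort is likely to be the better choice.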

Inconsistent behaviour concatenating macro variables

I am trying to create a string by concatenating several variables/delimiters within a macro:
%macro write_to_string();
%let delim = = ;
%let string = %sysfunc(catx(%str( ),
&string, \,
step start,
%nrstr(%superq(delim)),
&etls_stepStartTime,
|,
output table,
%nrstr(%superq(delim)),
&SYSLAST,
|,
transform return code,
%nrstr(%superq(delim)),
&trans_rc));
%mend;
The macro is called at the end of several transformations (within SAS DI), so the string keeps having text appended at the end.
If each instance of %nrstr(%superq(delim)) is replaced with some other delimiter, : say, then the above macro behaves as intended. But with the code as above, what I get is a 0 followed by the last string to have been appended.
I am quite ignorant about macro variables and functions and am struggling to understand:
Why the choice of delimiter seems to affect whether the string is properly appended
Why macro variables sometimes need to be referenced with the preceding & and sometimes not.
Any help is greatly appreciated!
EDIT
The input variables in the code above are autogenerated by the SAS DI system and reset after each transformation in the job. The values look something like
&etls_stepStartTime = 16FEB2017:17:25:37
&SYSLAST = WORK.MY_TABLE_NAME
&trans_rc = 0
Here the value of &trans_rc will indicate the error/warning status of the last transformation that ran.
So my desired output (with the &delim variable working) would be values of the form
step start = 16FEB2017:17:25:37 | output table = WORK.MY_TABLE_NAME | transform return code = 0
delimited by \. As mentioned above, what I get is only the last value (the one corresponding to the last transformation) with a preceding 0\, unless I change the delimiter to some non-reserved character constant.
Don't use %SYSFUNC() with the CAT... series of functions. First of all, you don't need them, since in macro code you can just place the text where you want it. Second, those functions can work on either numeric or character arguments, which means that SAS has to try to figure out whether the text your macro code generates as arguments represents a number or a character string. That is probably why the equal signs result in zeros: SAS is treating the equal sign as an equality test, so the zero means that the values on each side are not equal.
%let string =&string \ step start &delim &etls_stepStartTime ;
%let string =&string | output table &delim &SYSLAST ;
%let string =&string | transform return code &delim &trans_rc ;

Array of Strings in Fortran 77

I've a question about Fortran 77 and I've not been able to find a solution.
I'm trying to store an array of strings defined as the following:
character matname(255)*255
Which is an array of 255 strings of length 255.
Later I read the list of names from a file and I set the content of the array like this:
matname(matcount) = mname
EDIT: Actually mname value is hardcoded as mname = 'AIR' of type character*255, it is a parameter of a function matadd() which executes the previous line. But this is only for testing, in the future it will be read from a file.
Later on I want to print it with:
write(*,*) matname(matidx)
But it seems to print all the 255 characters, it prints the string I assigned and a lot of garbage.
So that is my question, how can I know the length of the string stored?
Should I have another array with all the lengths?
And how can I know the length of the string read?
Thanks.
You can use this function to get the length (without blank tail)
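      integer function strlen(st)
c     Length of st without trailing blanks (0 if st is all blanks).
      integer i
      character st*(*)
      i = len(st)
c     Test i before indexing so an all-blank string never reads st(0:0).
   10 if (i .gt. 0) then
         if (st(i:i) .eq. ' ') then
            i = i - 1
            goto 10
         endif
      endif
      strlen = i
      return
      end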
Got from here: http://www.ibiblio.org/pub/languages/fortran/ch2-13.html
PS: When you write matname(matidx) you get the whole string (all 255 characters), so that is your string plus blanks or garbage.
The function Timotei posted will give you the length of the string as long as the unused tail of the string contains only spaces. If you are assigning the values in the program, that should be true, because Fortran character assignment pads the rest of the variable with blanks.
However, if you are reading in from a file you might pick up other control characters at the end of the lines (particularly carriage return and/or line feed characters, \r and/or \n depending on your OS). You should also toss those out in the function to get the correct string length. Otherwise you could get some funny print statements as those characters are printed as well.
Here is my version of the function that checks for alternate white space characters at the end besides spaces.
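      integer function strlen(st)
c     Length of st ignoring trailing blanks, CR, LF and tab characters.
c     char(13), char(10) and char(9) are used instead of backslash
c     escapes, which are a compiler extension.
      integer i
      character st*(*)
      character c
      i = len(st)
c     Test i before indexing so an all-blank string never reads st(0:0).
   10 if (i .gt. 0) then
         c = st(i:i)
         if (c .eq. ' ' .or. c .eq. char(13) .or.
     +       c .eq. char(10) .or. c .eq. char(9)) then
            i = i - 1
            goto 10
         endif
      endif
      strlen = i
      return
      end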
If there are other characters in the "garbage" section this still won't work completely.
Assuming that it does work for your data, however, you can then change your write statement to look like this:
write(*,*) matname(matidx)(1:strlen(matname(matidx)))
and it will print out just the actual string.
As to whether or not you should use another array to hold the lengths of the strings, that is up to you. The strlen() function is O(n), whereas looking up the length in a table is O(1). If you find yourself computing the lengths of these static strings often, it may improve performance to compute each length once when the string is read in, store the lengths in an array, and look them up when you need them. However, if you don't notice the slowdown, I wouldn't worry about it.
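The backward scan and the length table can both be modelled in a few lines. The sketch below is Python, not Fortran: ljust(255) stands in for blank padding on assignment, the set of ignorable characters is the one used in the function above, and the names strlen and lengths are illustrative only.

```python
def strlen(st, ignorable=" \r\n\t"):
    """Scan back from the end, skipping blanks and line-ending characters."""
    i = len(st)
    while i > 0 and st[i - 1] in ignorable:  # i > 0 guards the all-blank case
        i -= 1
    return i

# Fixed-width storage: every name is padded with blanks to 255 characters.
matname = ["AIR".ljust(255), "WATER\r\n".ljust(255), " " * 255]

# Compute each length once and look it up afterwards (the O(1) table).
lengths = [strlen(name) for name in matname]

for name, n in zip(matname, lengths):
    print(repr(name[:n]), n)
```

The explicit i > 0 test matters for the all-blank entry, where the scan would otherwise run past the start of the string.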
Depending on the compiler that you are using, you may be able to use the trim() intrinsic function to remove any trailing spaces from a string, then process it as you normally would, i.e.
character(len=25) :: my_string
my_string = 'AIR'
write (*,*) ':', trim(my_string), ':'
should print :AIR:.
Edit:
Better yet, it looks like there is a len_trim() function that returns the length of a string after it has been trimmed.
Intel and Compaq Visual Fortran have the intrinsic function LEN_TRIM(STRING), which returns the length without trailing blanks.
If you want to suppress leading blanks, use "Adjust Left", i.e. ADJUSTL(STRING).
In these Fortran compilers I also note a useful feature: if you pass a string in to a function or subroutine as an argument, and inside the subroutine it is declared as CHARACTER*(*), then using the LEN(STRING) function in the subroutine returns the actual length of the string passed in, not the length of the string as declared in the calling program.
Example:
CHARACTER*1000 STRING
.
.
CALL SUBNAM(STRING(1:72))
SUBROUTINE SUBNAM(STRING)
CHARACTER*(*) STRING
LEN(STRING) will be 72, not 1000
