Adding commas as thousands separators to a numeric string in batch

I was wondering if there is simple code to add commas as thousands separators to a variable, taking into account the number of digits. For example, if the variable equals 123456 I would want it to become 123,456, but if the variable were equal to 1234567 then I would want it to turn into 1,234,567.

ECHO Processing %num%
SET "withcommas="
:subl
IF DEFINED num SET "withcommas=%num:~-3%,%withcommas%"&SET "num=%num:~0,-3%"&GOTO subl
SET "withcommas=%withcommas:~0,-1%"
ECHO result=%withcommas%
This converts num to the required format. Note: num will be undefined once the code has run.
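For readers who want to check the logic outside of batch, here is a minimal Python sketch of the same right-to-left grouping. The function name add_commas is made up for illustration; the loop mirrors the IF DEFINED ... GOTO loop above rather than using any built-in number formatting.

```python
def add_commas(num: str) -> str:
    """Insert a comma every three digits, working from the right."""
    groups = []
    while num:                      # mirrors IF DEFINED num ... GOTO subl
        groups.insert(0, num[-3:])  # like %num:~-3% (last three characters)
        num = num[:-3]              # like %num:~0,-3% (drop last three)
    return ",".join(groups)


print(add_commas("123456"))
print(add_commas("1234567"))
```

Joining the groups at the end avoids the trailing comma that the batch version has to strip afterwards.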

Related

Inconsistent behaviour concatenating macro variables

I am trying to create a string by concatenating several variables/delimiters within a macro:
%macro write_to_string();
%let delim = = ;
%let string = %sysfunc(catx(%str( ),
&string, \,
step start,
%nrstr(%superq(delim)),
&etls_stepStartTime,
|,
output table,
%nrstr(%superq(delim)),
&SYSLAST,
|,
transform return code,
%nrstr(%superq(delim)),
&trans_rc));
%mend;
The macro is called at the end of several transformations (within SAS DI), so the string keeps having text appended at the end.
If each instance of %nrstr(%superq(delim)) is replaced with some other delimiter, say :, then the above macro behaves as intended. But with the code as above, what I get is a 0 followed by the last string to have been appended.
I am quite ignorant about macro variables and functions and am struggling to understand:
Why the choice of delimiter seems to affect whether the string is properly appended
Why macro variables sometimes need to be referenced with the preceding & and sometimes not.
Any help is greatly appreciated!
EDIT
The input variables in the code above are autogenerated by the SAS DI system and reset after each transformation in the job. The values look something like
&etls_stepStartTime = 16FEB2017:17:25:37
&SYSLAST = WORK.MY_TABLE_NAME
&trans_rc = 0
Here the value of &trans_rc will indicate the error/warning status of the last transformation that ran.
So my desired output (with the &delim variable working) would be values of the form
step start = 16FEB2017:17:25:37 | output table = WORK.MY_TABLE_NAME | transform return code = 0
delimited by \. As mentioned above, what I get is only the last value (the one corresponding to the last transformation) with a preceding 0\, unless I change the delimiter to some non-reserved character constant.
Don't use %SYSFUNC() with the CAT... series of functions. First of all, you don't need them: in macro code you can just place the text where you want it. Second, those functions can work on either numeric or character arguments, which means that SAS has to try to figure out whether the text that your macro code generates as the arguments represents a number or a character string. That is probably why the equal signs result in zeros: SAS is treating the equal sign as an equality test, so the zero means that the values on each side are not equal.
%let string =&string \ step start &delim &etls_stepStartTime ;
%let string =&string | output table &delim &SYSLAST ;
%let string =&string | transform return code &delim &trans_rc ;
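As a rough illustration only (Python, not SAS), the three %let statements amount to plain text concatenation, with no function trying to interpret the pieces. The function name append_step and its parameters are hypothetical; the sample values are the ones given in the question.

```python
def append_step(string, step_start, output_table, trans_rc, delim="="):
    """Append one transformation's details to the running log string."""
    parts = [
        f"step start {delim} {step_start}",
        f"output table {delim} {output_table}",
        f"transform return code {delim} {trans_rc}",
    ]
    # strip() imitates %let discarding leading and trailing blanks
    return (f"{string} \\ " + " | ".join(parts)).strip()


log = append_step("", "16FEB2017:17:25:37", "WORK.MY_TABLE_NAME", 0)
print(log)
```

Because the delimiter is only ever treated as text here, an equal sign is appended literally instead of being evaluated as a comparison.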

Hyphen with strings in PROC FORMAT

I am working with ICD-9 codes and am creating something of a mapping between codes and an integer:
proc format library = &formatlib;
invalue category other = 0
'410'-'410.99', '425.4'-'425.99' = 1
I have searched and searched, but haven't been able to find an explanation of how that range actually works when it comes to formatting.
Take the first range, for example. I assume SAS interprets '410'-'410.99' as "take every value in the inclusive range [410, 410.99] and convert it to a 1". Please correct me if I'm wrong in that assumption. Does SAS treat these seeming strings as floating-point decimals, then? I think that must be the case if these are to be numerical ranges for formatting all codes within the range.
I'm coming to SAS from the worlds of R and Python, and thus the way quote characters are used in SAS is sometimes unclear (like when using %let foo = bar... no quotes are used).
When SAS compares string values with normal comparison operators, what it does is compare the byte representation of each character in the string, one at a time, until it reaches a difference.
So what you're going to see here is that when a string is input, it will be compared to the 'start' string and, if greater than or equal to start, then compared to the 'end' string; if it is also less than or equal to end, it evaluates to 1. If it falls in none of the listed ranges, it evaluates to zero (the 'other' value).
Importantly, this means that some nonsensical results could occur - see the last row of the following test, for example.
proc format;
invalue category other = 0
'410'-'410.99', '425.4'-'425.99' = 1
;
quit;
data test;
input #1 testval $6.;
category=input(testval,category.);
datalines;
425.23
425.45
425.40
410#
410.00
410.AA
410.7A
;;;;
run;
410.7A is compared to 410 and found greater, as '4'='4', '1'='1', '0'='0', '.' > ' ', so greater. Then 410.7A is compared to 410.99 and found less, as '4'='4', '1'='1', '0'='0', '.'='.', '7' < '9', so less. The A is irrelevant to the comparison. But on the row above it (410.AA) you see it's not in the range, since A is ASCII hex 41 and that is not less than '9' (ASCII hex 39).
Note that all SAS strings are filled to their full length by spaces. This can be important in string comparisons, because space is the lowest-valued printable character (if you consider space printable). Thus any character you're likely to compare to space will be higher - so for example the fourth row (410#) is a 1 because # is between space and . in the ASCII table! But change that to / and it fails. Similarly, change it to byte(13) (through code) and it fails - because it is then less than space (so 410^M, with ^M representing byte(13), is less than start (410)). In informats and formats, SAS will treat the format/informat start/end as being whatever length it needs to be - so if you're reading a 6 long string, it will treat it as length 6 and fill the rest with spaces.
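The comparison described above can be imitated outside SAS. The following Python sketch is only an emulation under stated assumptions: ASCII encoding, values and range ends padded with spaces to a width of 6, and inclusive ranges. The names RANGES and category are made up for the example.

```python
RANGES = [("410", "410.99"), ("425.4", "425.99")]


def category(value: str, width: int = 6) -> int:
    """Return 1 if the space-padded value falls in any range, else 0."""
    padded = value.ljust(width).encode("ascii")
    for start, end in RANGES:
        lo = start.ljust(width).encode("ascii")
        hi = end.ljust(width).encode("ascii")
        # bytes objects compare byte by byte, left to right
        if lo <= padded <= hi:
            return 1
    return 0


for v in ["425.23", "425.45", "425.40", "410#", "410.00", "410.AA", "410.7A"]:
    print(v, category(v))
```

Swapping the # for / or for a carriage return puts the value outside the range, matching the explanation above.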

Fortran read of data with * to signify similar data

My data looks like this
-3442.77 -16749.64 893.08 -3442.77 -16749.64 1487.35 -3231.45 -16622.36 902.29
.....
159*2539.87 10*0.00 162*2539.87 10*0.00
which means I start with either 7 or 8 reals per line and then (towards the end) have 159 values of 2539.87 followed by 10 values of 0 followed by 162 of 2539.87 etc. This seems to be a space-saving method as previous versions of this file format were regular 6 reals per line.
I am already reading the data into a string because of not knowing whether there are 7 or 8 numbers per line. I can therefore easily spot lines that contain *. But what then? I suppose I have to identify the location of each * and then identify the integer number before and real value after before assigning to an array. Am I missing anything?
Read the line. Split it into tokens delimited by whitespace. Replace the * in tokens that have it with a space. Then read one or two values from the string, depending on whether there was an asterisk or not. Sample code follows:
REAL, DIMENSION(big) :: data
CHARACTER(LEN=40) :: token
INTEGER :: iptr, count, idx
REAL :: val
iptr = 1
DO WHILE (there_are_tokens_left)
... ! Get the next token into "token"
idx = INDEX(token, "*")
IF (idx == 0) THEN
READ(token, *) val
count = 1
ELSE
! Replace "*" with space and read two values from the string
token(idx:idx) = " "
READ(token, *) count, val
END IF
data(iptr:iptr+count-1) = val ! Add "val" "count" times to the list of values
iptr = iptr + count
END DO
Here I have arbitrarily set the length of the token to be 40 characters. Adjust it according to what you expect to find in your input files.
BTW, for the sake of completeness, this method of compressing something by replacing repeating values with value/repetition-count pairs is called run-length encoding (RLE).
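The same token-by-token expansion can be sketched in a few lines of Python, which may help when checking the expected number of values. The function name expand_line is hypothetical, and the sketch assumes tokens are separated by whitespace and a repeated value is written as count*value.

```python
def expand_line(line: str) -> list:
    """Expand count*value tokens into a flat list of floats."""
    values = []
    for token in line.split():
        if "*" in token:
            # "159*2539.87" means 159 copies of 2539.87
            count, val = token.split("*", 1)
            values.extend([float(val)] * int(count))
        else:
            values.append(float(token))
    return values


expanded = expand_line("159*2539.87 10*0.00 162*2539.87 10*0.00")
print(len(expanded))
```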
Your input data may have been written in a form suitable for list directed input (where the format specification in the READ statement is simply *). List directed input supports the r*c form that you see, where r is a repeat count and c is the constant to be repeated.
If the total number of input items is known in advance (perhaps it is fixed for that program, perhaps it is defined by earlier entries in the file) then reading the file is as simple as:
REAL :: data(size_of_data)
READ (unit, *) data
For example, for the last line shown in your example on its own, size_of_data would need to be 341, from 159+10+162+10.
With list directed input the data can span across multiple records (multiple lines) - you don't need to know how many items are on each line in advance - just how many appear in the next "block" of data.
List directed input has a few other "features" like this, which is why it is generally not a good idea to use it to parse "arbitrary" input that hasn't been written with it in mind - use an explicit format specification instead (which may require creating the format specification on the fly to match the width of the input field if that is not known ahead of time).
If you don't know (or cannot calculate) the number of items in advance of the READ statement then you will need to do the parsing of the line yourself.

Reading a string with spaces in Fortran

Using read(*,*) in Fortran doesn't seem to work if the string to be read from the user contains spaces.
Consider the following code:
character(Len = 1000) :: input = ' '
read(*,*) input
If the user enters the string "Hello, my name is John Doe", only "Hello," will be stored in input; everything after the space is disregarded. My assumption is that the compiler assumes that "Hello," is the first argument, and that "my" is the second, so to capture the other words, we'd have to use something like read(*,*) input1, input2, input3... etc. The problem with this approach is that we'd need to create large character arrays for each input, and need to know exactly how many words will be entered.
Is there any way around this? Some function that will actually read the whole sentence, spaces and all?
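character(100) :: line
! Prompt without a newline; advance='no' is the standard form of the
! compiler-specific "\" edit descriptor
write(*,'(A)', advance='no') "Enter some text: "
! The (A) format reads the whole line, spaces included
read(*,'(A)') line
write(*,'(A)') line
end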
... will read a line of text of maximum length 100 (enough for most practical purposes) and write it out back to you. Modify to your liking.
Instead of read(*, *), try read(*, '(a)'). I'm no Fortran expert, but the second argument to read is the format specifier (equivalent to the second argument to sscanf in C). * there means list format, which you don't want. You can also say a14 if you want to read 14 characters as a string, for example.
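To make the effect of the (A) format concrete, here is a small Python sketch of what ends up in the variable. It assumes default blank padding: the record is truncated to the variable's length if it is longer and padded with blanks if it is shorter. The function name read_a is made up for illustration.

```python
def read_a(record: str, length: int) -> str:
    """Imitate reading one record into a fixed-length character variable."""
    text = record.rstrip("\r\n")            # the line terminator is not stored
    return text[:length].ljust(length)      # truncate, then pad with blanks


line = read_a("Hello, my name is John Doe\n", 100)
print(repr(line.rstrip()))
```

The spaces inside the sentence survive because the whole record is taken as one field instead of being split into separate values.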

Array of Strings in Fortran 77

I've a question about Fortran 77 and I've not been able to find a solution.
I'm trying to store an array of strings defined as the following:
character matname(255)*255
Which is an array of 255 strings of length 255.
Later I read the list of names from a file and I set the content of the array like this:
matname(matcount) = mname
EDIT: Actually mname value is hardcoded as mname = 'AIR' of type character*255, it is a parameter of a function matadd() which executes the previous line. But this is only for testing, in the future it will be read from a file.
Later on I want to print it with:
write(*,*) matname(matidx)
But it seems to print all the 255 characters, it prints the string I assigned and a lot of garbage.
So that is my question, how can I know the length of the string stored?
Should I have another array with all the lengths?
And how can I know the length of the string read?
Thanks.
You can use this function to get the length (without blank tail)
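      integer function strlen(st)
      character st*(*)
      integer i
c     Scan backwards for the last non-blank character.
c     The loop ends with i = 0 if the string is entirely blank,
c     so st(0:0) is never referenced.
      do 10 i = len(st), 1, -1
         if (st(i:i) .ne. ' ') goto 20
   10 continue
   20 strlen = i
      return
      end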
Got from here: http://www.ibiblio.org/pub/languages/fortran/ch2-13.html
PS: When you say matname(matidx) you get the whole string (255 characters), so that is your string plus trailing blanks or garbage.
The function Timotei posted will give you the length of the string as long as everything after the part you are interested in is spaces. That should be true if you are assigning the values in the program, since Fortran pads a character assignment with blanks when the value is shorter than the variable.
However, if you are reading in from a file you might pick up other control characters at the end of the lines (particularly carriage return and/or line feed characters, \r and/or \n depending on your OS). You should also toss those out in the function to get the correct string length. Otherwise you could get some funny print statements as those characters are printed as well.
Here is my version of the function that checks for alternate white space characters at the end besides spaces.
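      function strlen(st)
      integer i, strlen
      character st*(*)
      character c
c     Skip trailing blanks, carriage returns, line feeds and tabs.
c     char(13), char(10) and char(9) are used because standard Fortran
c     does not treat '\r', '\n' or '\t' as escape sequences.
      do 10 i = len(st), 1, -1
         c = st(i:i)
         if (c .ne. ' ' .and. c .ne. char(13) .and.
     +       c .ne. char(10) .and. c .ne. char(9)) goto 20
   10 continue
   20 strlen = i
      return
      end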
If there are other characters in the "garbage" section this still won't work completely.
Assuming that it does work for your data, however, you can then change your write statement to look like this:
write(*,*) matname(matidx)(1:strlen(matname(matidx)))
and it will print out just the actual string.
As to whether or not you should use another array to hold the lengths of the strings, that is up to you. The strlen() function is O(n) whereas looking up the length in a table is O(1). If you find yourself computing the lengths of these static strings often, it may improve performance to compute the length once when they are read in, store them in an array and look them up if you need them. However, if you don't notice the slowdown, I wouldn't worry about it.
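As a language-neutral check of the idea, here is a short Python sketch of the trailing-whitespace-aware length, plus the "compute once and look up" option. The names strlen, matname and lengths echo the discussion but are otherwise illustrative, and the set of characters treated as trailing garbage (space, carriage return, line feed, tab) is an assumption taken from the function above.

```python
def strlen(st: str) -> int:
    """Length of st ignoring trailing blanks, CR, LF and tab characters."""
    return len(st.rstrip(" \r\n\t"))


# A fixed-length "Fortran-style" string: the value padded with blanks to 255
matname = ["AIR".ljust(255), "WATER\r\n".ljust(255)]

# Compute each length once, then look it up instead of rescanning
lengths = [strlen(name) for name in matname]

for name, n in zip(matname, lengths):
    print(name[:n], n)
```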
Depending on the compiler that you are using, you may be able to use the trim() intrinsic function to remove any trailing spaces from a string, then process it as you normally would, i.e.
character(len=25) :: my_string
my_string = 'AIR'
write (*,*) ':', trim(my_string), ':'
should print :AIR:.
Edit:
Better yet, it looks like there is a len_trim() function that returns the length of a string after it has been trimmed.
Intel and Compaq Visual Fortran have the intrinsic function LEN_TRIM(STRING), which returns the length without trailing blanks.
If you want to get rid of leading blanks, use "Adjust Left", i.e. ADJUSTL(STRING), which shifts them to the end of the string.
In these Fortrans I also note a useful feature: if you pass a string to a function or subroutine as an argument, and inside the subroutine it is declared as CHARACTER*(*), then using the LEN(STRING) function in the subroutine returns the actual length of the string passed in, not the length of the string as declared in the calling program.
Example:
CHARACTER*1000 STRING
.
.
CALL SUBNAM(STRING(1:72))
SUBROUTINE SUBNAM(STRING)
CHARACTER*(*) STRING
LEN(STRING) will be 72, not 1000

Resources