Fortran read of data with * to signify similar data - string

My data looks like this
-3442.77 -16749.64 893.08 -3442.77 -16749.64 1487.35 -3231.45 -16622.36 902.29
.....
159*2539.87 10*0.00 162*2539.87 10*0.00
which means I start with either 7 or 8 reals per line and then (towards the end) have 159 values of 2539.87 followed by 10 values of 0 followed by 162 of 2539.87 etc. This seems to be a space-saving method as previous versions of this file format were regular 6 reals per line.
I am already reading the data into a string because of not knowing whether there are 7 or 8 numbers per line. I can therefore easily spot lines that contain *. But what then? I suppose I have to identify the location of each * and then identify the integer number before and real value after before assigning to an array. Am I missing anything?

Read the line. Split it into tokens delimited by whitespace(s). Replace the * in tokens that have it with space. Then read from the string one or two values, depending on wheather there was an asterisk or not. Sample code follows:
REAL, DIMENSION(big) :: data
CHARACTER(LEN=40) :: token
INTEGER :: iptr, count, idx
REAL :: val
iptr = 1
DO WHILE (there_are_tokens_left)
... ! Get the next token into "token"
idx = INDEX(token, "*")
IF (idx == 0) THEN
READ(token, *) val
count = 1
ELSE
! Replace "*" with space and read two values from the string
token(idx:idx) = " "
READ(token, *) count, val
END IF
data(iptr:iptr+count-1) = val ! Add "val" "count" times to the list of values
iptr = iptr + count
END DO
Here I have arbitrarily set the length of the token to be 40 characters. Adjust it according to what you expect to find in your input files.
BTW, for the sake of completeness, this method of compressing something by replacing repeating values with value/repetition-count pairs is called run-length encoding (RLE).

Your input data may have been written in a form suitable for list directed input (where the format specification in the READ statement is simply ''*''). List directed input supports the r*c form that you see, where r is a repeat count and c is the constant to be repeated.
If the total number of input items is known in advance (perhaps it is fixed for that program, perhaps it is defined by earlier entries in the file) then reading the file is as simple as:
REAL :: data(size_of_data)
READ (unit, *) data
For example, for the last line shown in your example on its own ''size_of_data'' would need to be 341, from 159+10+162+10.
With list directed input the data can span across multiple records (multiple lines) - you don't need to know how many items are on each line in advance - just how many appear in the next "block" of data.
List directed input has a few other "features" like this, which is why it is generally not a good idea to use it to parse "arbitrary" input that hasn't been written with it in mind - use an explicit format specification instead (which may require creating the format specification on the fly to match the width of the input field if that is not know ahead of time).
If you don't know (or cannot calculate) the number of items in advance of the READ statement then you will need to do the parsing of the line yourself.

Related

Inner workings of map() in a specific parsing situation

I know there are already at least two topics that explain how map() works but I can't seem to understand its workings in a specific case I encountered.
I was working on the following Python exercise:
Write a program that computes the net amount of a bank account based a
transaction log from console input. The transaction log format is
shown as following:
D 100
W 200
D means deposit while W means withdrawal. Suppose the following input
is supplied to the program:
D 300
D 300
W 200
D 100
Then, the output should be:
500
One of the answers offered for this exercise was the following:
total = 0
while True:
s = input().split()
if not s:
break
cm,num = map(str,s)
if cm=='D':
total+=int(num)
if cm=='W':
total-=int(num)
print(total)
Now, I understand that map applies a function (str) to an iterable (s), but what I'm failing to see is how the program identifies what is a number in the s string. I assume str converts each letter/number/etc in a string type, but then how does int(num) know what to pick as a whole number? In other words, how come this code doesn't produce some kind of TypeError or ValueError, because the way I see it, it would try and make an integer of (for example) "D 100"?
first
cm,num = map(str,s)
could be simplified as
cm,num = s
since s is already a list of strings made of 2 elements (if the input is correct). No need to convert strings that are already strings. s is just unpacked into 2 variables.
the way I see it, it would try and make an integer of (for example) "D 100"?
no it cannot, since num is the second parameter of the string.
if input is "D 100", then s is ['D','100'], then cm is 'D' and num is '100'
Then since num represents an integer int(num) is going to convert num to its integer value.
The above code is completely devoid of error checking (number of parameters, parameters "type") but with the correct parameters it works.
and map is completely useless in that particular example too.
The reason is the .split(), statement before in the s = input().split(). This creates a list of the values D and 100 (or ['D', '100']), because the default split character is a space ( ). Then the map function applies the str operation to both 'D' and '100'.
Now the map, function is not really required because both values upon input are automatically of the type str (strings).
The second question is how int(num) knows how to convert a string. This has to do with the second (implicit) argument base. Similar to how .split() has a default argument of the character to split on, so does num have a default argument to convert to.
The full code is similar to int(num, base=10). So as long as num has the values 0-9 and at most 1 ., int can convert it properly to the base 10. For more examples check out built in int.

Set position according part of an objectname

I'm trying to script something in blender3D using python.
I've got a bunch of objects in my scene and want to translate them using a the numerical part of their objectname.
First of all i collect objects from the scene by matching a part of their name.
root_obj = [obj for obj in scene.objects if fnmatch.fnmatchcase(obj.name, "*_Root")]
This gives me a list with:[bpy.data.objects['01_Root'],bpy.data.objects['02_Root'],bpy.data.objects['03_Root'],bpy.data.objects['00_Root']]
My goal is to move these objects 15x their corresponding part of the name. So '00_Root' doesnt have to move, but '01_Root' has to move 15 blender units and '02_Root' 30 blender units.
How do i exctract the numberpart of the names and use them as translation values.
I'm a pretty newb with python so i would appreciate all the help i can get.
A string is a list of characters, each character can be accessed by index starting with 0, get the first character with name[0], the second with name[1]. As with any list you can use slicing to get a portion of the list. If the value is always the first two characters you can get the value with name[:2] you can them turn that into an integer with int() or a float with float(). Combined that becomes,
val = int(name[:2])
You then have a number you can calculate the new location with.
obj.location.x = val * 15
If the number of digits in the name might vary you can use split() to break the string on a specific separating character. This returns a list of items between the specified character, so if you want the first item to turn into an integer.
name = '02_item'
val = int(name.split('_')[0])
Using split also allows multiple values in a name.
name = '2_12_item'
val1 = int(name.split('_')[0])
val2 = int(name.split('_')[1])

Hyphen with strings in PROC FORMAT

I am working with IC9 codes and am creating somewhat of a mapping between codes and an integer:
proc format library = &formatlib;
invalue category other = 0
'410'-'410.99', '425.4'-'425.99' = 1
I have searched and searched, but haven't been able to find an explanation of how that range actually works when it comes to formatting.
Take the first range, for example. I assume SAS interprets '410'-'410.99' as "take every value between the inclusive range [410, 410.99] and convert it to a 1. Please correct me if I'm wrong in that assumption. Does SAS treat these seeming strings as floating-point decimals, then? I think that must be the case if these are to be numerical ranges for formatting all codes within the range.
I'm coming to SAS from the worlds of R and Python, and thus the way quote characters are used in SAS sometimes is unclear (like when using %let foo = bar... not quotes are used).
When SAS compares string values with normal comparison operators, what it does is compare the byte representation of each character in the string, one at a time, until it reaches a difference.
So what you're going to see here is when a string is input, it will be compared to the 'start' string and, if greater than start, then compared to the 'end' string, and if less than end, evaluated to a 1; if it's not for each pair listed, then evaluated to a zero.
Importantly, this means that some nonsensical results could occur - see the last row of the following test, for example.
proc format;
invalue category other = 0
'410'-'410.99', '425.4'-'425.99' = 1
;
quit;
data test;
input #1 testval $6.;
category=input(testval,category.);
datalines;
425.23
425.45
425.40
410#
410.00
410.AA
410.7A
;;;;
run;
410.7A is compared to 410 and found greater, as '4'='4', '1'='1', '0'='0', '.' > ' ', so greater . Then 410.7A is compared to 410.99 and found less, as '4'='4', '1'='1', '0'='0', '7' < '9', so less. The A is irrelevant to the comparison. But on the row above it you see it's not in the sequence, since A is ASCII 41x and that is not less than '9' (ASCII 39x).
Note that all SAS strings are filled to their full length by spaces. This can be important in string comparisons, because space is the lowest-valued printable character (if you consider space printable). Thus any character you're likely to compare to space will be higher - so for example the fourth row (410#) is a 1 because # is between and . in the ASCII table! But change that to / and it fails. Similarly, change it to byte(13) (through code) and it fails - because it is then less than space (so 410^M, with ^M representing byte(13), is less than start (410)). In informats and formats, SAS will treat the format/informat start/end as being whatever the length that it needs to - so if you're reading a 6 long string, it will treat it as length 6 and fill the rest with spaces.

Splitting a Comma-Separated String in Scala: Missing Trailing Empty Strings?

I have a data file in csv format.
I am trying to split each line using the basic split command line.split(',')
But when I get a string like this "2,,",
instead of returning an array as thus Array(2,"","")
I just get an Array: Array(2).
I am most definitely missing something basic, could someone help point out the correct way to split a comma separated string here?
This is inherited from Java. You can achieve behavior you want by using the split(String regex, int limit) overload:
"2,,".split(",", -1) // = Array(2, "", "")
Note the String instead of Char.
As explained by the Java Docs, the limit parameter is used as follows:
The limit parameter controls the number of times the pattern is
applied and therefore affects the length of the resulting array. If
the limit n is greater than zero then the pattern will be applied at
most n - 1 times, the array's length will be no greater than n, and
the array's last entry will contain all input beyond the last matched
delimiter. If n is non-positive then the pattern will be applied as
many times as possible and the array can have any length. If n is zero
then the pattern will be applied as many times as possible, the array
can have any length, and trailing empty strings will be discarded.
Source
Using split(separator: Char) will call the overload above, using a limit of zero.

Groovy split without final trim

I'm parsing a CVS file like the following:
"07555555555",25.70,18/11/2010,01/03/2011,N,133,0,36,,896,537,547,,Mr,John,Doe,,
"07555555555",10.15,26/01/2011,01/03/2011,N,16,0,100,,896,537,547,,Mrs,Jane,Doe,,jane#doe.com
The thing is that when using a script like this:
file.eachLine{ line ->
items = line.split(",")
println items.length
}
The result is like the following:
16
18
Which makes me thing that the split function removes a final values. I need it to have all the items even if they are empty. Any idea?
I think you want to do:
items = line.split(',', -1)
to make sure you get all the tokens
(according to the javadoc):
The limit parameter controls the
number of times the pattern is applied
and therefore affects the length of
the resulting array. If the limit n is
greater than zero then the pattern
will be applied at most n - 1 times,
the array's length will be no greater
than n, and the array's last entry
will contain all input beyond the last
matched delimiter. If n is
non-positive then the pattern will be
applied as many times as possible and
the array can have any length. If n is
zero then the pattern will be applied
as many times as possible, the array
can have any length, and trailing
empty strings will be discarded.

Resources