Variable length strings in fortran - string

So I'm just getting started learning Fortran because apparently it's still used for a lot of scientific computing. One of the things that I already hate about it is that compared to C++, strings are a nightmare. At the moment, I'm just trying to find a simple way to read in a string provided by the user and then spit it back out without all the trailing whitespace.
Here's the code I have right now, which according to what I've read should work under later Fortran standards, i.e. 2003/2008 (though I could easily have made errors of which I'm unaware). I'm trying to compile it on the MinGW version of gfortran that came with Code::Blocks 13.12.
program tstring
implicit none
character(100) io_name
character(len=:), allocatable :: final_name
print *, "What is your name, O master?"
read *, io_name
final_name = trim(io_name)
print *, "Excellent, master ", final_name, "!"
end program
It compiles just fine, but still has a massive amount of whitespace between final_name and the "!". The whitespace is somewhat dependent on the number of characters I give to io_name, but not in a particularly logical manner (15 characters gives more whitespace than 30, for example). Particularly bewildering to me is that if I give certain numbers of characters to io_name (between about 17 and 22) then instead of printing out the name, the program runs into a segfault.
Perhaps the most difficult part of this for me is that good Fortran documentation is very hard to find, especially for the 2003 and later standards. So if anybody could point me to some good documentation, I'd really appreciate it!

Get a more recent compiler. Gfortran 4.8 (AFAIK the oldest supported version) should work just fine with your code.
Strings are not really any nightmare, just stop thinking in C++ and trying to translate that in Fortran. It is quite possible to write tokenizers, parsers and DSL interpreters in Fortran.
Regarding the resources, it is off-topic here and there are numerous books and tutorials out there. Search for Fortran Resources at http://fortranwiki.org/fortran/show/HomePage

Related

allow(uncommon_codepoints) is ignored unless specified at crate level

I wrote this the other day:
let µ = ... some expression ...
(As it happens, the µ sign is easily typeable on my keyboard, just AltGr+m. This is why I have a habit to use this letter quite often especially when it is about small values.)
Now I got this:
identifier contains uncommon Unicode codepoints
`#[warn(uncommon_codepoints)]` on by default
No problem, I'll just allow it, I thought, and put this at the front:
#![allow(uncommon_codepoints)]
But no, it's utterly hesitant against greek:
allow(uncommon_codepoints) is ignored unless specified at crate level
`#[warn(unused_attributes)]` on by default
I would think it is at least debatable what "uncommon" exactly is. But I'm not really interested in that discussion, as long as I can turn it off.
So please ... how exactly do I specify something at the crate level? I tried it in main.rs and libs.rs but it wont accept it.
Edit
This really starts to become interesting:
I put the line
#![allow(uncommon_codepoints)]
as line 1 in main.rs and it now stops complaining about the unused_attribute. However, the "uncommon codepoint" warning still appears when compiling the file that contains it (i.e. with cargo build). I am at rustc 1.58.1 (stable, AFAIK)
I also found out that what my keyboard produces is not U+03BC GREEK SMALL LETTER MU but U+00B5 MICRO SIGN. It's still a letter, lowercase. Now, the interesting thing is: the uncommon Unicode warning does not appear for a genuine greek Mu, but for the micro sign it does!
Is there any other place I can turn off annoying and (from my point of view utterly useless) warnings? In general, I highly appreciate rust's detailed and often helpful warnings (though lately I found myself making an unused HashSet just to avoid the warnings about unused imports --- hey I know I will need this later, so please stop nagging), but this unicode thingy is a bit overdone. Its a valid variable name according to rust lexical syntax, and I really do want to use it. Period.

Fortran internal write suddenly an error according to Intel compiler

A Fortran code I am working on has several lines similar to
WRITE(filename, '(A16,"_",I4,".dat")') filename, indx
This code has been successfully compiled and run literally hundreds of times, on many different platforms and pretty much all major compilers. But suddenly the newest (or, new, anyway) Intel compiler doesn't like it. It gives a warning message "forrtl: .... Internal file write-to-self; undefined results". After this line executes, "filename", which was a reasonable character array, becomes blank.
I suppose the problem is that filename is both an input into the write and the destination of the internal write. The fix is easy enough. It works to replace filename as the destination with something like filename_tmp. But as I have said, this has never been necessary until now.
So I am wondering, does filename as both an input and destination violate the Fortran standard, but all these compilers have been turning a blind eye to it for all these years, and now Intel is getting strict? Or is Intel being "snobbish"? Or outright buggy?
Execution1 of the write statement of the question has always been explicitly prohibited.
We currently see (F2018 12.6.4.5.1 p7):
During the execution of an output statement that specifies an internal file, no part of that internal file shall be referenced, defined, or become undefined as the result of evaluating any output list item.
filename is an internal file, and the evaluation of the output list item filename is a reference to that internal file.
This is not a programming violation that the compiler is required to detect, so you can view this as a case of improved diagnostic capability/pickiness of the compiler as you desire. No Fortran program is harmed by the change in this behaviour of the compiler.
Fortran 66 didn't have internal files (or character types), of course, and Fortrans 77, 90 and 95 used different words for the same effect (see for example, F90 9.4.4):
If an internal file has been specified, an input/output list item must not be in the file or associated with the file.
In case it looks like this is more restrictive, from Fortran 2003 the restrictions for input and output statements are stated separately (only output was quoted above, p8 for input).
1 Note the use of execution: there's nothing wrong with the statement itself as a statement. It is allowed to exist in source code that isn't reached. Checking this statement when compiling is not a simple matter.

Read substrings from a string containing multiplication [duplicate]

This question already has answers here:
'*' and '/' not recognized on input by a read statement
(2 answers)
Closed 4 years ago.
I am a scientist programming in Fortran, and I came up with a strange behaviour. In one of my programs I have a string containing several "words", and I want to read all words as substrings. The first word starts with an integer and a wildcard, like "2*something".
When I perform an internal read on that string, I expect to read all wods, but instead, the READ function repeatedly reads the first substring. I do not understand why, nor how to avoid this behaviour.
Below is a minimalist sample program that reproduces this behaviour. I would expect it to read the three substrings and to print "3*a b c" on the screen. Instead, I get "a a a".
What am I doing wrong? Can you please help me and explain what is going on?
I am compiling my programs under GNU/Linux x64 with Gfortran 7.3 (7.3.0-27ubuntu1~18.04).
PROGRAM testread
IMPLICIT NONE
CHARACTER(LEN=1024):: string
CHARACTER(LEN=16):: v1, v2, v3
string="3*a b c"
READ(string,*) v1, v2, v3
PRINT*, v1, v2, v3
END PROGRAM testread
You are using list-directed input (the * format specifier). In list-directed input, a number (n) followed by an asterisk means "repeat this item n times", so it is processed as if the input was a a a b c. You would need to have as input '3*a' b c to get what you want.
I will use this as another opportunity to point out that list-directed I/O is sometimes the wrong choice as its inherent flexibility may not be what you want. That it has rules for things like repeat counts, null values, and undelimited strings is often a surprise to programmers. I also often see programmers complaining that list-directed input did not give an error when expected, because the compiler had an extension or the programmer didn't understand just how liberal the feature can be.
I suggest you pick up a Fortran language reference and carefully read the section on list-directed I/O. You may find you need to use an explicit format or change your program's expectations.
Following the answer of #SteveLionel, here is the relevant part of the reference on list-directed sequential READ statements (in this case, for Intel Fortran, but you could find it for your specific compiler and it won't be much different).
A character string does not need delimiting apostrophes or quotation marks if the corresponding I/O list item is of type default character, and the following is true:
The character string does not contain a blank, comma (,), or slash ( / ).
The character string is not continued across a record boundary.
The first nonblank character in the string is not an apostrophe or a quotation mark.
The leading character is not a string of digits followed by an asterisk.
A nondelimited character string is terminated by the first blank, comma, slash, or end-of-record encountered. Apostrophes and quotation marks within nondelimited character strings are transferred as is.
In total, there are 4 forms of sequential read statements in Fortran, and you may choose the option that best fits your need:
Formatted Sequential Read:
To use this you change the * to an actual format specifier. If you know the length of the strings at advance, this would be as easy as '(a3,a2,a2)'. Or, you could come with a format specifier that matches your data, but this generally demands you knowing the length or format of stuff.
Formatted Sequential List-Directed:
You are currently using this option (the * format descriptor). As we already showed you, this kind of I/O comes with a lot of magic and surprising behavior. What is hitting you is the n*cte thing, that is interpreted as n repetitions of cte literal.
As said by Steve Lionel, you could put quotation marks around the problematic word, so it will be parsed as one-piece. Or, as proposed by #evets, you could split or break your string using the intrinsics index or scan. Another option could be changing your wildcard from asterisk to anything else.
Formatted Namelist:
Well, that could be an option if your data was (or could be) presented in the namelist format, but I really think it's not your case.
Unformatted:
This may not apply to your case because you are reading from a character variable, and an internal READ statement can only be formatted.
Otherwise, you could split your string by means of a function instead of a I/O operation. There is no intrinsic for this, but you could come with one without much trouble (see this thread for reference). As you may have noted already, manipulating strings in fortran is... awkward, at least. There are some libraries out there (like this) that may be useful if you are doing lots of string stuff in Fortran.

Heuristic for using symbols rather than strings in J

While the real reason to use J's symbols (s: ' Abe Bill Chad') rather than string arrays ('Abe','Bill',:'Chad') or boxed lists of strings ('Abe';'Bill';'Chad') is that it is the best solution (most efficient/convenient for man or machine), what is the rule of thumb for when to use symbols?
The vocabulary page for s: mentions efficient "searching, sorting, and comparisons." Do you start using symbols from the beginning if there's any chance you'll be searching, sorting, or comparing? Or do you only use symbols once you recognize that your code is working around the limitations of the other options? Or something a bit more nuanced in between?
My experience is that for most use cases symbol (s:) is not necessary to provide acceptable performance which matches the advice in the section on Symbol in J for C Programmers that you use symbols if you find your program taking a lot of time matching strings. It also warns that:
there is no way to tell the interpreter to free the resources for a
single string; this can be a problem if your symbol table is large and
changes dynamically.
For this reason the Vocabulary page for Symbol on JWiki suggests symbols are most useful when:
a limited number of strings appear repeatedly
the set of symbols is known and unchanging

A language in which everything compiles

I'm trying to do some research for a new project, and I need to create objects dynamically from random data.
For this to work, I need a language / compiler that doesn't have problems with weird uncompilable code lying around.
Basically, I need the random code to compile (or be interpreted) as much as possible - Meaning that the uncompilable parts will be ignored, and only the compilable parts will create the objects (which could be ran).
Object Oriented-ness is not a must, but is a very strong advantage.
I thought of ASM, but it's very messy, and I'd probably need a more readable code
Thanks!
It sounds like you're doing something very much like genetic programming; even if you aren't, GP has to solve some of the same problems—using randomness to generate valid programs. The approach to this that is typically used is to work with a syntax tree: rather than storing x + y * 3 - 2, you store something like the following:
Then, instead of randomly changing the syntax, one can randomly change nodes in the tree instead. And if x should randomly change to, say, +, you can statically know that this means you need to insert two children (or not, depending on how you define +).
A good choice for a language to work with for this would be any Lisp dialect. In a Lisp, the above program would be written (- (+ x (* y 3)) 2), which is just a linearization of the syntax tree using parentheses to show depth. And in fact, Lisps expose this feature: you can just as easily work with the object '(- (+ x (* y 3)) 2) (note the leading quote). This is a three-element list, whose first element is -, second element is another list, and third element is 2. And, though you might or might not want it for your particular application, there's an eval function, such that (eval '(- (+ x (* y 3)) 2)) will take in the given list, treat it as a Lisp syntax tree/program, and evaluate it. This is what makes Lisps so attractive for doing this sort of work; Lisp syntax is basically a reification of the syntax-tree, and if you operate at the syntax-tree level, you can work on code as though it was a value. Lisp won't help you read /dev/random as a program directly, but with a little interpretation layered on top, you should be able to get what you want.
I should also mention, though I don't know anything about it (not that I know much about ordinary genetic programming) the existence of linear genetic programming. This is sort of like the assembly model that you mentioned—a linear stream of very, very simple instructions. The advantage here would seem to be that if you are working with /dev/random or something like it, the amount of interpretation needed is very small; the disadvantage would be, as you mentioned, the low-level nature of the code.
I'm not sure if this is what you're looking for, but any programming language can be made to function this way. For any programming language P, define the language Palways as follows:
If p is a valid program in P, then p is a valid program in Palways whose meaning is the same as its meaning in P.
If p is not a valid program in P, then p is a valid program in Palways whose meaning is the same as a program that immediately terminates.
For example, I could make the language C++always so that this program:
#include <iostream>
using namespace std;
int main() {
cout << "Hello, world!" << endl;
}
would compile as "Hello, world!", while this program:
Hahaha! This isn't legal C++ code!
Would be a legal program that just does absolutely nothing.
To solve your original problem, just take any OOP language like Java, Smalltalk, etc. and construct the appropriate Javaalways, Smalltalkalways, etc. language from it. Again, I'm not sure if this is at all what you're looking for, but it could be done very easily.
Alternatively, consider finding a grammar for any OOP language and then using that grammar to produce random syntactically valid programs. You could then filter those programs down by using the Palways programming language for that language to eliminate syntactically but not semantically valid programs.
Divide the ASCII byte values into 9 classes (division modulo 9 would help). Then assign then to Brainfuck codewords (see http://en.wikipedia.org/wiki/Brainfuck). Then interpret as Brainfuck.
There you go, any sequence of ASCII characters is a program. Not that it's going to do anything sensible... This approach has a much better chance, compared to templatetypedef's answer, to get a nontrivial program from a random byte sequence.
Text Editors
You could try feeding random character strings to an editor like Emacs or VI. Many (most?) characters will perform an editing action but some will do nothing (other than beep, perhaps). You would have to ensure that the random code mutator never generates the character sequence that exits the editor. However, this experience would be much like programming a Turing machine -- the code is not too readable.
Mathematica
In Mathematica, undefined symbols and other expressions evaluate to themselves, without error. So, that language might be a viable choice if you can arrange for the random code mutator to always generate well-formed expressions. This would be readily achievable since the basic Mathematica syntax is trivial, making it easy to operate on syntactic units rather than at the character level. It would be even easier if the mutator were written in Mathematica itself since expression-munging is Mathematica's forte. You could define a mini-language of valid operations within a Mathematica package that does not import the system-defined symbols. This would allow you to generate well-formed expressions to your heart's content without fear of generating a dangerous expression, like DeleteFile[FileNames["*.*", "/", Infinity]].
I believe Common Lisp should suit your needs. I always have some code in my SLIME/Emacs session that wouldn't compile. You can always tweak things, redefine functions in run-time. It is actually very good for prototyping.
A few years ago it took me quite a while to learn. But nowadays we have quicklisp and everything is so much easier.
Here I describe my development environment:
Install lisp on my linux machine
PS: I want to give an example, where Common Lisp was useful for me:
Up to maybe 2004 I used to write small programs in C (the keep it simple Unix way).
The last 3 years I had to get lots of different hardware running. Motorized stages, scientific cameras, IO cards.
The cameras turned out to be quite annoying. Usually you have to cool them down to -50 degree celsius or so and (in some SDKs) they don't like it when you close them. But this
is exactly how my C development cycle worked: write (30s), compile (1s), run (0.1s), repeat.
Eventually I decided to just use Common Lisp. Often it is straight forward to define the foreign function interfaces to talk to the SDKs and I can do this without ever leaving the running Lisp image. I start the editor in the morning define the open-device function, to talk to the device and after 3 hours I have enough of the functions implemented to set gain, temperature, region of interest and obtain the video.
Then I can often put the SDK manual away and just use the camera.
I used the same interactive programming approach when I have to parse some webpage or some weird XML.

Resources