How to input a string value of unknown length from console in Fortran? [duplicate] - string

I would like to use deferred-length character strings in a "simple" manner to read user input. The reason that I want to do this is that I do not want to have to declare the size of a character string before knowing how large the user input will be. I know that there are "complicated" ways to do this. For example, the iso_varying_string module can be used: https://www.fortran.com/iso_varying_string.f95. Also, there is a solution here: Fortran Character Input at Undefined Length. However, I was hoping for something as simple, or almost as simple, as the following:
program main
character(len = :), allocatable :: my_string
read(*, '(a)') my_string
write(*,'(a)') my_string
print *, allocated(my_string), len(my_string)
end program
When I run this program, the output is:
./a.out
here is the user input
F 32765
Notice that there is no output from write(*,'(a)') my_string. Why?
Also, my_string has not been allocated. Why?
Why isn't this a simple feature of Fortran? Do other languages have this simple feature? Am I lacking some basic understanding about this issue in general?

vincentjs's answer isn't quite right.
Modern (2003+) Fortran does allow automatic allocation and re-allocation of strings on assignment, so a sequence of statements such as this
character(len=:), allocatable :: string
...
string = 'Hello'
write(*,*)
string = 'my friend'
write(*,*)
string = 'Hello '//string
write(*,*)
is correct and will work as expected and write out 3 strings of different lengths. At least one compiler in widespread use, the Intel Fortran compiler, does not engage 2003 semantics by default so may raise an error on trying to compile this. Refer to the documentation for the setting to use Fortran 2003.
However, this feature is not available when reading a string so you have to resort to the tried and tested (aka old-fashioned if you prefer) approach of declaring a buffer of sufficient size for any input and of then assigning the allocatable variable. Like this:
character(len=long) :: buffer
character(len=:), allocatable :: string
...
read(*,*) buffer
string = trim(buffer)
No, I don't know why the language standard forbids automatic allocation on read, just that it does.

Deferred length character is a Fortran 2003 feature. Note that many of the complicated methods linked to are written against earlier language versions.
With Fortran 2003 support, reading a complete record into a character variable is relatively straight forward. A simple example with very minimal error handling below. Such a procedure only needs to be written once, and can be customized to suit a user's particular requirements.
PROGRAM main
USE, INTRINSIC :: ISO_FORTRAN_ENV, ONLY: INPUT_UNIT
IMPLICIT NONE
CHARACTER(:), ALLOCATABLE :: my_string
CALL read_line(input_unit, my_string)
WRITE (*, "(A)") my_string
PRINT *, ALLOCATED(my_string), LEN(my_string)
CONTAINS
SUBROUTINE read_line(unit, line)
! The unit, connected for formatted input, to read the record from.
INTEGER, INTENT(IN) :: unit
! The contents of the record.
CHARACTER(:), INTENT(OUT), ALLOCATABLE :: line
INTEGER :: stat ! IO statement IOSTAT result.
CHARACTER(256) :: buffer ! Buffer to read a piece of the record.
INTEGER :: size ! Number of characters read from the file.
!***
line = ''
DO
READ (unit, "(A)", ADVANCE='NO', IOSTAT=stat, SIZE=size) buffer
IF (stat > 0) STOP 'Error reading file.'
line = line // buffer(:size)
! An end of record condition or end of file condition stops the loop.
IF (stat < 0) RETURN
END DO
END SUBROUTINE read_line
END PROGRAM main

Deferred length arrays are just that: deferred length. You still need to allocate the size of the array using the allocate statement before you can assign values to it. Once you allocate it, you can't change the size of the array unless you deallocate and then reallocate with a new size. That's why you're getting a debug error.
Fortran does not provide a way to dynamically resize character arrays like the std::string class does in C++, for example. In C++, you could initialize std::string var = "temp", then redefine it to var = "temporary" without any extra work, and this would be valid. This is only possible because the resizing is done behind the scenes by the functions in the std::string class (it doubles the size if the buffer limit is exceeded, which is functionally equivalent to reallocateing with a 2x bigger array).
Practically speaking, the easiest way I've found when dealing with strings in Fortran is to allocate a reasonably large character array that will fit most expected inputs. If the size of the input exceeds the buffer, then simply increase the size of your array by reallocateing with a larger size. Removing trailing white space can be done using trim.

You know that there are "complicated" ways of doing what you want. Rather than address those, I'll answer your first two "why?"s.
Unlike intrinsic assignment a read statement does not have the target variable first allocated to the correct size and type parameters for the thing coming in (if it isn't already like that). Indeed, it is a requirement that the items in an input list be allocated. Fortran 2008, 9.6.3, clearly states:
If an input item or an output item is allocatable, it shall be allocated.
This is the case whether the allocatable variable is a character with deferred length, a variable with other deferred length-type parameters, or an array.
There is another way to declare a character with deferred length: giving it the pointer attribute. This doesn't help you, though, as we also see
If an input item is a pointer, it shall be associated with a definable target ...
Why you have no output from your write statement is related to why you see that the character variable isn't allocated: you haven't followed the requirements of Fortran and so you can't expect the behaviour that isn't specified.
I'll speculate as to why this restriction is here. I see two obvious ways to relax the restriction
allow automatic allocation generally;
allow allocation of a deferred length character.
The second case would be easy:
If an input item or an output item is allocatable, it shall be allocated unless it is a scalar character variable with deferred length.
This, though, is clumsy and such special cases seem against the ethos of the standard as a whole. We'd also need a carefully thought out rule about alloction for this special case.
If we go for the general case for allocation, we'd presumably require that the unallocated effective item is the final effective item in the list:
integer, allocatable :: a(:), b(:)
character(7) :: ifile = '1 2 3 4'
read(ifile,*) a, b
and then we have to worry about
type aaargh(len)
integer, len :: len
integer, dimension(len) :: a, b
end type
type(aaargh), allocatable :: a(:)
character(9) :: ifile = '1 2 3 4 5'
read(ifile,*) a
It gets quite messy very quickly. Which seems like a lot of problems to resolve where there are ways, of varying difficulty, of solving the read problem.
Finally, I'll also note that allocation is possible during a data transfer statement. Although a variable must be allocated (as the rules are now) when appearing in input list components of an allocated variable of derived type needn't be if that effective item is processed by defined input.

Related

String (constants) Literals pointer

I'm relatively new to C++ and I searched for an answer to my question, however I got more confused. As I understand, string literals must be pointed by "const" pointers, since are considered to be readable only. I also understand the pointer itself is not constant (and could be changed), but actually it is pointing to a string constant.I also understand that the string itself cannot be modified. So in this example:
const char* cstr="string";
*cstr = 'a';
I get an error: "assignment of read-only location."
Now, if I define my C-string as following, and define a pointer to it, I'll be able to change the string:
char str[7]="string";
char* cstr = str;
*cstr = 'a';
cout << cstr <<endl;
the string will be modified (output --> a), means the first element of the string is changes. My two questions are:
1- why in the second example I am able to modify the C-string but in the first case I cannot make any changes to the string? 2- In both cases I am using pointers, but in the first case I should Use constant char pointer?
When you use the syntax
const char* cstr="string";
C++ defines:
An array of 7 character in the read-only section of memory, with the contents string\0 in it.
pointer on the stack (or in the writable global section of memory), with the address of that array.
However, when you use the syntax:
char str[7]="string";
C++ defines:
An array of 7 character on the stack (or in the writable global section of memory), with the contents "string\0" in it.
In the first case, the actual values are in read-only memory, so you can't change them. In the second case, they are in writable memory (stack or global).
C++ tries to enforce this semantic, so if the definition is read-only memory, you should use a const pointer.
Note that not all architectures have read-only memory, but because most of them do, and C++ might want to use the read-only memory feature (for better correctness), then C++ programmers should assume (for the purpose of pointer types) that constants are going to be placed in read-only memory.

Using aligned memory for Fortran FFTs (FFTW) without memory leaks

I want to use the modern Fortran interface of FFTW, but in a way that allows simple function calls like ifftshift(fft_c2c(vec)*exp(vec)) et cetera. This is my understanding of how to do this (I also understand that doing a new plan every call is not the most efficient thing). Currently this code is functional (returns correct results); however, there is a memory leak so that repeated calls result in losses. I'm not quite sure where though! I had hoped that the association of the return variable `fft' with the only unfreed memory would result in no leaks but this is evidently not true. What am I missing, and how can I better structure what I want to do with proper modern fortran? Thanks!
function fft_c2c(x) result(fft)
integer :: N
type(C_PTR) :: plan
complex(C_DOUBLE_COMPLEX), pointer :: fft(:)
complex(C_DOUBLE_COMPLEX), dimension(:), intent(in) :: x
! Use an auxiliary array that is allocated with fftw_alloc_complex
! to ensure memory alignment for performance, see FFTW docs
complex(C_DOUBLE_COMPLEX), pointer :: x_align(:)
type(C_PTR) :: p
N = size(x)
p = fftw_alloc_complex(int(N, C_SIZE_T))
call c_f_pointer(p, fft, [N]);
p = fftw_alloc_complex(int(N, C_SIZE_T))
call c_f_pointer(p, x_align, [N]);
plan = fftw_plan_dft_1d(N, x_align, fft, FFTW_FORWARD, FFTW_MEASURE);
! FFTW overwrites x_align and fft during planning process, so assign
! data here
x_align = x
call fftw_execute_dft(plan, x_align, fft);
call fftw_free(p);
end function fft_c2c
You can't do that easily. You are forcing your notin of "modern"="everything is a function" on Fortran, here it does not fit that well (or not at all).
For the meory leaks the rule is simple - deallocate all the pointers. Using them for the result variable is a guarantee of a memory leak. If you need local allocted aligned memory, you need to locally allocate it, copy the data there, copy the data out and deallocate it.
Every pointer in Fortran need explicit deallocation, there is no reference counting or garbage collection to deallocate them for you.
You think about just using the nonaligned memory with the appropriate flags and measure the difference, you seem not to care about the top performance anyway.
Finally, doingFFTW_MEASURE before every transform is not just "not the most efficient thing", it is an absolute performance disaster. You should, at the very least, use FFTW_ESTIMATE to mitigate it.

Why is string manipulation more expensive?

I've heard this so many times, that I have taken it for granted. But thinking back on it, can someone help me realize why string manipulation, say comparison etc, is more expensive than say an integer, or some other primitive?
8bit example:
1 bit can be 1 or 0. With 2 bits you can represent 0, 1, 2, and 3. And so on.
With a byte you have 2^8 possibilities, from 0 to 255.
In a string a single letter is stored in a byte, so "Hello world" is 11 bytes.
If I want to do 100 + 100, 100 is stored in 1 byte of memory, I need only two bytes to sum two numbers. The result will need again 1 byte.
Now let's try with strings, "100" + "100", this is 3 bytes plus 3 bytes and the result, "100100" needs 6 bytes to be stored.
This is over-simplified, but more or less it works in this way.
The int data type in C# was carefully selected to be a good match with processor design. Which can store an int in a cpu register, a storage location that's an easy factor of 3 faster than memory. And a single cpu instruction to compare values of type int. The CMP instruction runs in less than a single cpu cycle, a fraction of a nano-second.
That doesn't work nearly as well for a string, it is a variable length data type and every single char in the string must be compared to test for equality. So it is automatically proportionally slower by the size of the string. Furthermore, string comparison is afflicted by culture dependent comparison rules. The kind that make "ss" and "ß" equal in German and "Aa" and "Å" equal in Danish. Nothing subtle to deal with, taken care of by highly optimized table-driven code inside the CLR. It can't beat CMP.
I've always thought it was because of the immutability of strings. That is, every time you make a change to the string, it requires allocating memory for a whole new string (rather than modifying the original in place).
Probably a woefully naive understanding but perhaps someone else can expound further.
There are several things to consider when looking at the "cost" of manipulating strings.
There is the cost in terms of memory usage, there is the cost in terms of CPU cycles used, and there is a cost associated with the complexity of the code involved.
Integer manipulation (Add, Subtract, Multipy, Divide, Compare) is most often done by the CPU at the hardware level, in few (or even 1) instruction. When the manipulation is done, the answer fits back in the same size chunk of memory.
Strings are stored in blocks of memory, which have to be manipulated a byte or word at a time. Comparing two 100 character long strings may require 100 separate comparison operations.
Any manipulation that makes a string longer will require, either moving the string to a bigger block of memory, or moving other stuff around in memory to allow growing the existing block.
Any manipulation that leaves the string the same, or smaller, could be done in place, if the language allows for it. If not, then again, a new block of memory has to be allocated and contents moved.

Atomic reads and writes of properties

This question has been asked before but i still don't understand it fully so here it goes.
If i have a class with a property (a non-nullable double or int) - can i read and write the property with multiple threads?
I have read somewhere that since doubles are 64 bytes, it is possible to read a double property on one thread while it is being written on a different one. This will result in the reading thread returning a value that is neither the original value nor the new written value.
When could this happen? Is it possible with ints as well? Does it happen with both 64 and 32 bit applications?
I haven't been able to replicate this situation in a console
If i have a class with a property (a non-nullable double or int) - can i read and write the property with multiple theads?
I assume you mean "without any synchronization".
double and long are both 64 bits (8 bytes) in size, and are not guaranteed to be written atomically. So if you were moving from a value with byte pattern ABCD EFGH to a value with byte pattern MNOP QRST, you could potentially end up seeing (from a different thread) ABCD QRST or MNOP EFGH.
With properly aligned values of size 32 bits or lower, atomicity is guaranteed. (I don't remember seeing any guarantees that values will be properly aligned, but I believe they are by default unless you force a particular layout via attributes.) The C# 4 spec doesn't even mention alignment in section 5.5 which deals with atomicity:
Reads and writes of the following data types are atomic: bool, char, byte, sbyte, short, ushort, uint, int, float, and reference types. In addition, reads and writes of enum types with an underlying type in the previous list are also atomic. Reads and writes of other types, including long, ulong, double, and decimal, as well as user-defined types, are not guaranteed to be atomic. Aside from the library functions designed for that purpose, there is no guarantee of atomic read-modify-write, such as in the case of increment or decrement.
Additionally, atomicity isn't the same as volatility - so without any extra care being taken, a read from one thread may not "see" a write from a different thread.
These operations are not atomic, that's why the Interlocked class exists in the first place, with methods like Increment(Int32) and Increment(Int64).
To ensure thread safety, you should use at least this class, if not more complex locking (with ReaderWriterLockSlim, for example, in case you want to synchronize access to groups of properties).

No error message in Haskell

Just out of curiosity, I made a simple script to check speed and memory efficiency of constructing a list in Haskell:
wasteMem :: Int -> [Int]
wasteMem 0 = [199]
wasteMem x = (12432483483467856487256348746328761:wasteMem (x-1))
main = do
putStrLn("hello")
putStrLn(show (wasteMem 10000000000000000000000000000000000))
The strange thing is, when I tried this, it didn't run out of memory or stack space, it only prints [199], the same as running wasteMem 0. It doesn't even print an error message... why? Entering this large number in ghci just prints the number, so I don't think it's a rounding or reading error.
Your program is using a number greater than maxBound :: Int32. This means it will behave differently on different platforms. For GHC x86_64 Int is 64 bits (32 bits otherwise, but the Haskell report only promises 29 bits). This means your absurdly large value (1x10^34) is represented as 4003012203950112768 for me and zero for you 32-bit folks:
GHCI> 10000000000000000000000000000000000 :: Int
4003012203950112768
GHCI> 10000000000000000000000000000000000 :: Data.Int.Int32
0
This could be made platform independent by either using a fixed-size type (ex: from Data.Word or Data.Int) or using Integer.
All that said, this is a poorly conceived test to begin with. Haskell is lazy, so the amount of memory consumed by wastedMem n for any value n is minimal - it's just a thunk. Once you try to show this result it will grab elements off the list one at a time - first generating "[12432483483467856487256348746328761, and leaving the rest of the list as a thunk. The first value can be garbage collected before the second value is even considered (a constant-space program).
Adding to Thomas' answer, if you really want to waste space, you have to perform an operation on the list, which needs the whole list in memory at once. One such operation is sorting:
print . sort . wasteMem $ (2^16)
Also note that it's almost impossible to estimate the run-time memory usage of your list. If you want a more predictable memory benchmark, create an unboxed array instead of a list. This also doesn't require any complicated operation to ensure that everything stays in memory. Indexing a single element in an array already makes sure that the array is in memory at least once.

Resources