I have a specific data format with 'n' (arbitrary) rows and 4 columns. If 'n' is 10, the example data looks like this:
1.01e+00 -2.01e-02 -3.01e-01 4.01e+02
1.02e+00 -2.02e-02 -3.02e-01 4.02e+02
1.03e+00 -2.03e-02 -3.03e-01 4.03e+02
1.04e+00 -2.04e-02 -3.04e-01 4.04e+02
1.05e+00 -2.05e-02 -3.05e-01 4.05e+02
1.06e+00 -2.06e-02 -3.06e-01 4.06e+02
1.07e+00 -2.07e-02 -3.07e-01 4.07e+02
1.08e+00 -2.08e-02 -3.08e-01 4.07e+02
1.09e+00 -2.09e-02 -3.09e-01 4.09e+02
1.10e+00 -2.10e-02 -3.10e-01 4.10e+02
The constraints on this input are:
data should have '4' columns.
data separated by white spaces.
I want to implement a feature that checks whether the input file has 4 columns in every row, and I built my own based on M.S.B's answer to the post "Reading data file in Fortran with known number of lines but unknown number of entries in each line".
program readtest
  use :: iso_fortran_env
  implicit none
  character(len=512) :: buffer
  integer :: i, i_line, n, io, pos, pos_tmp, n_space
  integer,parameter :: max_len = 512
  character(len=max_len) :: filename

  filename = 'data_wrong.dat'
  open(42, file=trim(filename), status='old', action='read')

  print *, '+++++++++++++++++++++++++++++++++++'
  print *, '+ Count lines +'
  print *, '+++++++++++++++++++++++++++++++++++'

  n = 0
  i_line = 0
  do
    pos = 1
    pos_tmp = 1
    i_line = i_line+1
    read(42, '(a)', iostat=io) buffer

    ! (*1) Count blank spaces.
    n_space = 0
    do
      pos = index(buffer(pos+1:), " ") + pos
      if (pos /= 0) then
        if (pos > pos_tmp+1) then
          ! First blank after a non-blank run: count it.
          n_space = n_space+1
          pos_tmp = pos
        else
          ! Consecutive blank: do not count it again.
          pos_tmp = pos
        end if
      end if
      if (pos == max_len) then
        exit
      end if
    end do
    pos_tmp = pos

    if (io /= 0) then
      exit
    end if
    print *, '> line : ', i_line, ' n_space : ', n_space
    n = n+1
  end do

  print *, ' >> number of line = ', n
end program
If I run the above program with an input file containing some wrong rows, as follows,
1.01e+00 -2.01e-02 -3.01e-01 4.01e+02
1.02e+00 -2.02e-02 -3.02e-01 4.02e+02
1.03e+00 -2.03e-02 -3.03e-01 4.03e+02
1.04e+00 -2.04e-02 -3.04e-01 4.04e+02
1.05e+00 -2.05e-02 -3.05e-01 4.05e+02
1.06e+00 -2.06e-02 -3.06e-01 4.06e+02
1.07e+00 -2.07e-02 -3.07e-01 4.07e+02
1.0 2.0 3.0
1.08e+00 -2.08e-02 -3.08e-01 4.07e+02 1.00
1.09e+00 -2.09e-02 -3.09e-01 4.09e+02
1.10e+00 -2.10e-02 -3.10e-01 4.10e+02
The output is like this,
+ Count lines +
> line : 1 n_space : 4
> line : 2 n_space : 4
> line : 3 n_space : 4
> line : 4 n_space : 4
> line : 5 n_space : 4
> line : 6 n_space : 4
> line : 7 n_space : 4
> line : 8 n_space : 3 (*2)
> line : 9 n_space : 5 (*3)
> line : 10 n_space : 4
> line : 11 n_space : 4
>> number of line = 11
You can see that the wrong rows are properly detected as I intended (see (*2) and (*3)), and I can write 'if' statements to produce error messages.
But I think my code is extremely ugly, since I had to do something like (*1) in the code to count consecutive white spaces as one space. I think there should be a much more elegant way to ensure that each row contains exactly 4 columns, say,
read(*,'4(X, A)') line
(which didn't work)
Also, my program would fail if a line is longer than 'max_len', which is set to 512 in this case. 512 should be enough for most practical purposes, but I also want my checking subroutine to be robust in this respect.
So, I want to improve my subroutine in at least these aspects:
It should be more elegant (not like (*1)).
It should be more general (especially with regard to 'max_len').
Does anyone have experience in building this kind of input-checking subroutine?
Any comments would be highly appreciated.
Thank you for reading the question.
Without knowledge of the exact data format, I think it would be rather difficult to achieve what you want (or at least, I wouldn't know how to do it).
In the most general case, I think your space counting idea is the most robust and correct.
It can be adapted to avoid the maximum string length problem you describe.
In the following code, I go through the data as an unformatted, stream access file.
Basically, you read every character and take note of newlines and spaces.
As you did, you use spaces to count the columns (skipping repeated spaces) and new_line characters to count the rows.
However, here we are not reading the entire line as a string and going through it to find spaces; we read char by char, which avoids the fixed string length problem, and we also end up with a single loop. Hope it helps.
EDIT: now handles white space at the beginning and end of a line, and empty lines.
program readtest
  use :: iso_fortran_env
  implicit none
  character :: old_char, new_char
  integer :: line, io, cols
  logical :: beg_line
  integer,parameter :: max_len = 512
  character(len=max_len) :: filename

  filename = 'data_wrong.txt'

  ! Output format to be used later
  100 format (a, 3x, i0, a, 3x , i0)

  open(42, file=trim(filename), status='old', action='read', &
       form="unformatted", access="stream")

  ! set utils
  old_char = " "
  line = 0
  beg_line = .true.
  cols = 0

  ! Start scanning char by char
  do
    read(42, iostat = io) new_char

    ! Exit if EOF
    if (io < 0) then
      exit
    end if

    ! Deal with empty lines
    if (beg_line .and. new_char==new_line(new_char)) then
      line = line + 1
      write(*, 100, advance="no") "Line number:", line, &
                                  "; Columns number:", cols
      write(*,'(6x, a)') "EMPTYLINE"
    ! Deal with beginning of line for white spaces
    elseif (beg_line) then
      beg_line = .false.
    ! this indicates new columns
    elseif (new_char==" " .and. old_char/=" ") then
      cols = cols + 1
    ! End of line: time to print
    elseif (new_char==new_line(new_char)) then
      ! The last column is not followed by a space unless the line has trailing blanks
      if (old_char/=" ") then
        cols = cols+1
      end if
      line = line + 1
      ! Printing out results
      write(*, 100, advance="no") "Line number:", line, &
                                  "; Columns number:", cols
      if (cols == 4) then
        write(*,'(6x, a5)') "OK"
      else
        write(*,'(6x, a5)') "ERROR"
      end if
      ! Restart with a new line (reset counters)
      cols = 0
      beg_line = .true.
    end if

    old_char = new_char
  end do
end program
This is the output of this program:
Line number: 1; Columns number: 4 OK
Line number: 2; Columns number: 4 OK
Line number: 3; Columns number: 4 OK
Line number: 4; Columns number: 4 OK
Line number: 5; Columns number: 4 OK
Line number: 6; Columns number: 4 OK
Line number: 7; Columns number: 4 OK
Line number: 8; Columns number: 3 ERROR
Line number: 9; Columns number: 5 ERROR
Line number: 10; Columns number: 4 OK
Line number: 11; Columns number: 4 OK
If you knew your data format, you could read each line into a vector of dimension 4 and use the iostat variable to print an error for each line where iostat comes back nonzero.
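As a minimal sketch of that idea, assuming every column is supposed to be a real number and that each line has already been read into a character variable: the module and function names below (column_check, has_exactly_n_reals) are made up for illustration. The function tries a list-directed internal read of n values, which must succeed, and then of n+1 values, which must fail. Note that list-directed input has its own rules (a slash ends the list, "2*1.0" counts as two values, consecutive commas are null values), so this is not a strict format check.

```fortran
module column_check
   implicit none
   private
   public :: has_exactly_n_reals
contains
   ! Hypothetical helper: .true. when LINE holds exactly N values
   ! that a list-directed read accepts as reals.
   logical function has_exactly_n_reals(line, n) result(ok)
      character(len=*), intent(in) :: line
      integer, intent(in)          :: n
      real    :: vals(n + 1)
      integer :: ios

      ! Reading n values must succeed (fails on too few values or bad text) ...
      read(line, *, iostat=ios) vals(1:n)
      ok = (ios == 0)
      if (.not. ok) return

      ! ... and reading n+1 values must fail, otherwise there is an extra column.
      read(line, *, iostat=ios) vals(1:n + 1)
      ok = (ios /= 0)
   end function has_exactly_n_reals
end module column_check
```

A caller would read each line into a buffer (for example with a '(a)' format) and then test has_exactly_n_reals(buffer, 4).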
Instead of counting whitespace you can use manipulation of substrings to get what you want. A simple example follows:
program foo
   implicit none
   character(len=512) str  ! Assume str is sufficiently long buffer
   integer fd, cnt, n
   open(newunit=fd, file='test.dat', status='old')
   do
      cnt = 0
      read(fd,'(A)',end=10) str
      str = adjustl(str)  ! Eliminate possible leading whitespace
      do
         n = index(str, ' ')  ! Find first space
         if (n /= 0) then
            ! Echo the leading token, then drop it from str
            write(*, '(A)', advance='no') str(1:n)
            str = adjustl(str(n+1:))
         end if
         if (len_trim(str) == 0) exit  ! Trailing whitespace
         cnt = cnt + 1
      end do
      ! Four columns means three tokens were followed by more text
      if (cnt /= 3) then
         write(*,'(A)') ' Error'
      else
         write(*,*)  ! Finish the echoed line
      end if
   end do
10 close(fd)
end program foo
This should read any line of reasonable length (up to the line limit your compiler defaults to, which is generally 2GB nowadays). You could change it to stream I/O to have no limit, but most Fortran compilers have trouble reading stream I/O from stdin, which this example reads from. So if the line looks anything like a list of numbers, it should read them, tell you how many it read, and let you know if it had an error reading any value as a number (character strings, strings bigger than the size of a REAL value, ...). All the parts here are explained on the Fortran Wiki, but to keep it short this is a stripped-down version that just puts the pieces together. The oddest behavior it has is that if you enter something like this, with a slash in it,
10 20,,30,40e4 50 / this is a list of numbers
it would treat everything after the slash as a comment and not generate a non-zero status return while returning five values. For a more detailed explanation of the code I think the annotated pieces on the Wiki explain how it works. In the search, look for "getvals" and "readline".
So with this program you can read a line and if the return status is zero and the number of values read is four you should be good except for a few dusty corners where the lines would definitely not look like a list of numbers.
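module M_getvals
public getvals, readline
contains

subroutine getvals(line,values,icount,ierr)
implicit none
character(len=*),intent(in) :: line
real :: values(:)
integer,intent(out) :: icount, ierr
character(len=:),allocatable :: buffer
character(len=len(line)) :: words(size(values))
integer :: ios, i
   ierr=0
   icount=0
   words=' '
   ! A trailing slash ends the list-directed read without an end-of-file status
   buffer=trim(line)//'/'
   read(buffer,*,iostat=ios) words
   do i=1,size(values)
      if(words(i).eq.'') cycle
      ! Convert each non-blank word to a number
      read(words(i),*,iostat=ios) values(icount+1)
      if(ios.eq.0)then
         icount=icount+1
      else
         ierr=ios
         write(*,*)'*getvals* WARNING:['//trim(words(i))//'] is not a number'
      endif
   enddo
end subroutine getvals

subroutine readline(line,ier)
implicit none
character(len=:),allocatable,intent(out) :: line
integer,intent(out) :: ier
integer,parameter :: buflen=1024
character(len=buflen) :: buffer
integer :: isize
   line=''
   ier=0
   do
      ! Read the record in chunks without advancing past its end
      read(*,iostat=ier,fmt='(a)',advance='no',size=isize) buffer
      if(isize.gt.0) line=line//buffer(:isize)
      if(is_iostat_eor(ier))then  ! end of record: the whole line has been read
         ier=0
         exit
      elseif(ier.ne.0)then        ! end of file or error
         exit
      endif
   enddo
   line=trim(line)
end subroutine readline

end module M_getvals

program tryit
use M_getvals, only: getvals, readline
implicit none
character(len=:),allocatable :: line
real,allocatable :: values(:)
integer :: icount, ier, ierr
   do
      call readline(line,ier)
      if(ier.ne.0) exit
      ! A line cannot hold more values than this
      if(allocated(values)) deallocate(values)
      allocate(values(len(line)/2+1))
      call getvals(line,values,icount,ierr)
      write(*,'(*(g0,1x))')'VALUES=',values(:icount),'NUMBER OF VALUES=',icount,'STATUS=',ierr
   enddo
end program tryit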
Honestly, it should work reasonably with just about any line you throw at it.
If you are always reading four values, using list-directed I/O and checking the iostat= value on READ and checking if you hit EOR would be very simple (just a few lines) but since you said you wanted to read lines of arbitrary length I am assuming four values on a line was just an example and you wanted something very generic.
I know that IACHAR(s) returns the code for the ASCII character in the first character position of the string s, but I need to convert the entire string to an integer. I have a small number of strings (around 30, each consisting of at most 20 characters). Is there any way to convert each of them to a unique integer in Fortran 90?
You can read a string into an integer variable:
module str2int_mod
contains

  elemental subroutine str2int(str,int,stat)
    implicit none
    ! Arguments
    character(len=*),intent(in) :: str
    integer,intent(out) :: int
    integer,intent(out) :: stat

    ! Internal read: stat is zero on success, non-zero otherwise
    read(str,*,iostat=stat) int
  end subroutine str2int

end module

program test
  use str2int_mod
  implicit none
  character(len=20) :: str(3)
  integer :: int(3), stat(3)
  integer :: i

  str(1) = '123' ! Valid integer
  str(2) = '-1' ! Also valid
  str(3) = 'one' ! invalid

  ! Elemental call: converts all three strings at once
  call str2int(str,int,stat)

  do i=1,3
    if ( stat(i) == 0 ) then
      print *,i,int(i)
    else
      print *,'Conversion of string ',i,' failed!'
    end if
  end do
end program
You can use the read() method as suggested, or you could use faiNumber for Fortran (faiNumber-Fortran), which I wrote. In my tests, faiNumber-Fortran ran about 10x faster than read() (tested with gfortran8 with build versions legacy, f95, f2003, and f2018).
Also, if you use faiNumber-Fortran, you are guarded against invalid strings such as "1 abc", "125 7895", and so on. The read() procedure parses those formats without complaint (tested with gfortran8 with build versions legacy, f95, f2003, and f2018), whereas faiNumber will notify you that the input string is invalid.
For version one you get two versions: one for use in pure procedures, which is slightly slower than the version that can only be used by impure procedures.
FaiNumber-Fortran also lets you choose where to start and end in your string. Below is a small example of what you can do. There is a lot more than the example shows; nonetheless, I documented the code very thoroughly (I hope). The example is for the version built as an all-pure-procedures library.
program example
    ! For 64/128, use fnDecimalUtil64/fnDecimalUtil128.
    ! To use procedures of 64/128, the right module has to be used.
    use fnDecimalUtil
    implicit none
    ! For 64/128, integer kinds are k_int64/k_int128.
    integer(k_int32) :: resultValue, startpos, endpos
    ! Where there is an error code return, it will always be an int32 value.
    integer(k_int32) :: errorInt
    logical :: errorLogical

    ! For 64/128, call decToInt64/decToInt128.
    call decToInt32("123", resultValue, errorLogical)
    if ( errorLogical .eqv. .FALSE. ) then
        print *, resultValue
    else
        print *, "There was an error during parsing."
    end if

    startpos = 13
    endpos = 17
    call decToInt32(" This here($12345)can be parse with start and end", &
                    resultValue, errorLogical, startpos, endpos)
    if ( errorLogical .eqv. .FALSE. ) then
        print *, resultValue
    else
        print *, "There was an error during parsing."
    end if

    ! This procedure below is where you need to know what was wrong
    ! during parsing the input string.
    ! This may run slower if the strings are long. The TrueError procedure
    ! has exactly the same feature as the normal one, they are just
    ! different by how errors are handled.
    ! Empty string will be checked first then error 5.
    ! If error 5 is encountered, nothing else will be checked. For error
    ! 5, startpos will be checked first before endpos.
    ! For 64/128, call decToInt64TrueError/decToInt128TrueError
    startpos = 12
    call decToInt32TrueError(" line 24: 1278421", resultValue, errorInt, startpos) ! startpos can be used without endpos,
    if ( errorInt == 0 ) then
        print *, resultValue
    else if ( errorInt == 1 ) then
        print *, "The input string was empty."
    else if ( errorInt == 2 ) then
        print *, "The input string contained an invalid decimal integer."
    else if ( errorInt == 3 ) then
        print *, "The input string contained a value that is smaller than the minimum value of the data type."
    else if ( errorInt == 4 ) then
        print *, "The input string contained a value that is larger than the maximum value of the data type."
    else if ( errorInt == 5 ) then
        print *, "It was either startpos > length, endpos < startpos, or endpos < 1."
    end if
end program example
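If you would rather stay with the intrinsic read() and still reject inputs like "1 abc" or "125 7895", one possible approach is to check the characters yourself before converting. The following is only a sketch in plain Fortran, not part of faiNumber-Fortran; the names strict_int_mod and strict_str2int are hypothetical. It accepts an optional sign followed by decimal digits, ignores surrounding blanks, and leaves out-of-range values to whatever iostat the compiler's read() reports.

```fortran
module strict_int_mod
   implicit none
   private
   public :: strict_str2int
contains
   ! Hypothetical helper: convert STR to an integer, rejecting anything
   ! other than an optional sign followed by decimal digits.
   subroutine strict_str2int(str, val, stat)
      character(len=*), intent(in) :: str
      integer, intent(out)         :: val
      integer, intent(out)         :: stat   ! 0 on success, non-zero otherwise
      character(len=:), allocatable :: s
      integer :: first

      val = 0
      stat = 1
      s = trim(adjustl(str))                 ! drop leading and trailing blanks
      if (len(s) == 0) return                ! empty input
      first = 1
      if (s(1:1) == '+' .or. s(1:1) == '-') first = 2
      if (first > len(s)) return             ! a sign with no digits
      ! verify() returns the position of the first character that is not a digit
      if (verify(s(first:), '0123456789') /= 0) return
      ! The text is now known to be a plain integer; let read() convert it
      read(s, *, iostat=stat) val
   end subroutine strict_str2int
end module strict_int_mod
```

With this check in front, "  -42 " converts to -42 with stat equal to 0, while "1 abc" and "125 7895" come back with a non-zero stat instead of being silently truncated to the first number.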
I would like to read a data file with a Fortran program, where each line is a list of integers.
Each line has a variable number of integers, separated by a given character (space, comma...).
Sample input:
I have a solution to split lines, which I find rather convoluted:
module split
    implicit none
contains
    function string_to_integers(str, sep) result(a)
        integer, allocatable :: a(:)
        integer :: i, j, k, n, m, p, r
        character(*) :: str
        character :: sep, c
        character(:), allocatable :: tmp

        !First pass: find number of items (m), and maximum length of an item (r)
        n = len_trim(str)
        m = 1
        j = 0
        r = 0
        do i = 1, n
            if(str(i:i) == sep) then
                m = m + 1
                r = max(r, j)
                j = 0
            else
                j = j + 1
            end if
        end do
        r = max(r, j)
        allocate(a(m))
        allocate(character(r) :: tmp)

        !Second pass: copy each item into temporary string (tmp),
        !read an integer from tmp, and write this integer in the output array (a)
        tmp(1:r) = " "
        j = 0
        k = 0
        do i = 1, n
            c = str(i:i)
            if(c == sep) then
                k = k + 1
                read(tmp, *) p
                a(k) = p
                tmp(1:r) = " "
                j = 0
            else
                j = j + 1
                tmp(j:j) = c
            end if
        end do
        !The last item is not followed by a separator
        k = k + 1
        read(tmp, *) p
        a(k) = p
    end function
end module
My question:
Is there a simpler way to do this in Fortran? I mean, reading a list of values where the number of values to read is unknown. The above code looks awkward, and file I/O does not look easy in Fortran.
Also, the main program has to read lines with unknown and unbounded length. I am able to read lines if I assume they are all the same length (see below), but I don't know how to read unbounded lines. I suppose it would need the stream features of Fortran 2003, but I don't know how to write this.
Here is the current program:
program read_data
    use split
    implicit none
    integer :: q
    integer, allocatable :: a(:)
    character(80) :: line

    open(unit=10, file="input.txt", action="read", status="old", form="formatted")
    do
        read(10, "(A80)", iostat=q) line
        if(q /= 0) exit
        !Lines starting with "#" are skipped
        if(line(1:1) /= "#") then
            a = string_to_integers(line, ",")
            print *, ubound(a), a
        end if
    end do
    close(10)
end program
A comment about the question: usually I would do this in Python, for example converting a line would be as simple as a = [int(x) for x in line.split(",")], and reading a file is likewise almost a trivial task. And I would do the "real" computing stuff with a Fortran DLL. However, I'd like to improve my Fortran skills on file I/O.
I don't claim it is the shortest possible, but it is much shorter than yours. And once you have it, you can reuse it. I don't completely agree with the claims that Fortran is bad at string processing; I do tokenization, recursive descent parsing and similar stuff just fine in Fortran, although it is easier in some other languages with richer libraries. Sometimes you can use libraries written in other languages (especially C and C++) from Fortran too.
If you always use the comma you can remove the replacing by comma and thus shorten it even more.
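function string_to_integers(str, sep) result(a)
    integer, allocatable :: a(:)
    character(*) :: str
    character :: sep
    integer :: i, n_sep

    ! Count the separators and turn each one into a comma
    n_sep = 0
    do i = 1, len_trim(str)
      if (str(i:i)==sep) then
        n_sep = n_sep + 1
        str(i:i) = ','
      end if
    end do
    ! One more item than separators; list-directed read does the splitting
    allocate(a(n_sep+1))
    read(str,*) a
end function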
Potential for shortening: view the str as a character array using equivalence or transfer and use count() inside of allocate to get the size of a.
The code assumes that there is just one separator between each number and there is no separator before the first one. If multiple separators are allowed between two numbers, you have to check whether the preceding character is a separator or not
do i = 2, len_trim(str)
if (str(i:i)==sep .and. str(i-1:i-1)/=sep) then
n_sep = n_sep + 1
str(i:i) = ','
end if
end do
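The count() shortening mentioned above could look roughly like the sketch below. It is a variant written for illustration, so the names split_count and string_to_integers_count are made up, and it works on a local copy instead of modifying str. It keeps the same assumption of exactly one separator between numbers, and it returns a zero-size array for a blank string.

```fortran
module split_count
   implicit none
   private
   public :: string_to_integers_count
contains
   ! Hypothetical variant: size the result with count() on a character-array
   ! view of the string obtained through transfer().
   function string_to_integers_count(str, sep) result(a)
      character(len=*), intent(in) :: str
      character, intent(in)        :: sep
      integer, allocatable         :: a(:)
      character(len=len_trim(str)) :: work               ! trimmed working copy
      character                    :: chars(len_trim(str))

      if (len_trim(str) == 0) then
         allocate(a(0))
         return
      end if

      work  = str
      chars = transfer(work, chars)          ! string viewed as single characters
      allocate(a(count(chars == sep) + 1))   ! one more item than separators
      where (chars == sep) chars = ','       ! commas let list-directed input split
      work = transfer(chars, work)           ! back to a scalar string
      read(work, *) a
   end function string_to_integers_count
end module split_count
```

The explicit counting loop disappears, at the cost of two transfer() calls and two automatic temporaries of the line's length.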
My answer is probably too simplistic for your goals but I have spent a lot of time recently reading in strange text files of numbers. My biggest problem is finding where they start (not hard in your case) then my best friend is the list-directed read.
read(unit=10,fmt=*) a
will read all of the data into vector 'a', done deal. With this method you will not know which line any piece of data came from. If you want to allocate it, you can read the file once and use some algorithm to make the array at least as large as it needs to be, for example by counting the number of lines, given that you know the maximum amount of data per line (say 21).
line_counter = 0
do
   ! An empty input list just skips one record
   read(unit=10, fmt=*, iostat=status)
   if (status /= 0) exit
   line_counter = line_counter + 1
end do
If you then want to eliminate zero values, you can remove them, or pre-seed the 'a' vector with a negative number (if you don't expect any in the data) and then remove all of those.
Another approach stemming from the other suggestion is to first count the commas then do a read where the loop is controlled by
do j = 1, line_counter ! You determined this on your first read
   ! a is now a 2 dimensional array (line_counter, maxNumberPerLine)
   ! You have a separate vector numberOfCommas(j) from before,
   ! so read only as many values as line j actually holds
   read(unit=11,fmt=*) a(j,1:numberOfCommas(j)+1)
end do
And now you can do whatever you want with these two arrays because you know all the data, which line it came from, and how many data were on each line.
I think this is quite a basic question, but I can't seem to find the answer. I'm trying to read a file of the following form:
1 filedir/i03j12_fort.4
71 filedir/i04j01_fort.4
224 filedir/i04j02_fort.4
I use the following command to get the initial integer, plus the 'i' and 'j' values from the filename (ldir is a string containing the length of filedir):
read(filenumber,'(i6,'//ldir//'x,i2,x,i2)') n,pix_i,pix_j
The problem is that the amount of whitespace preceding the integer varies between files, so I have to manually change the width each time. I have also tried not specifying a format, and reading the whole filename as a string, i.e.
read(filenumber,*) n, filename
but the filename returns weird characters (n works though).
Is there any format statement that will read the integer up to the first whitespace it finds, to replace the 'i6' I have above?
No - you will need to process the file "manually". Read it into a string, go looking for the first non-blank, then go looking for the next blank, then use internal io to read the relevant bits, etc.
As you have found, list directed io (using * as the format specifier) has some surprising features - one of them being that the slash character (/) in input means "stop reading here and leave remaining variables in the IO list as they were". This doesn't work well when you have paths that contain slashes!
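As a rough sketch of that manual processing, assuming the line has already been read into a string with an '(A)' format: the helper below (hypothetical names leading_int_mod and split_leading_int) locates the first non-blank group with VERIFY and SCAN, reads it as an integer through internal I/O with an explicit I edit descriptor whose width is built at run time, and returns the remainder of the line untouched, slashes included.

```fortran
module leading_int_mod
   implicit none
   private
   public :: split_leading_int
contains
   ! Hypothetical helper: split LINE into a leading integer N and the
   ! remaining text REST. STAT is 0 on success, non-zero otherwise.
   subroutine split_leading_int(line, n, rest, stat)
      character(len=*), intent(in)               :: line
      integer, intent(out)                       :: n
      character(len=:), allocatable, intent(out) :: rest
      integer, intent(out)                       :: stat
      integer           :: first, last
      character(len=16) :: fmt

      n = 0
      rest = ''
      stat = 1
      first = verify(line, ' ')              ! first non-blank character
      if (first == 0) return                 ! blank line
      last = scan(line(first:), ' ')         ! first blank after the number
      if (last == 0) then
         last = len(line)                    ! the number runs to the end of the line
      else
         last = first + last - 2             ! absolute position of the last digit
      end if
      ! Build e.g. '(I3)' so exactly the characters of the number are read
      write(fmt, '(a,i0,a)') '(I', last - first + 1, ')'
      read(line(first:last), fmt, iostat=stat) n
      if (stat /= 0) return
      rest = trim(adjustl(line(last + 1:)))  ! everything after the number
   end subroutine split_leading_int
end module leading_int_mod
```

Because the filename is taken as a plain substring rather than through list-directed input, the slash in the path has no special meaning here.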
Just for fun...
PROGRAM read_some_things
  IMPLICIT NONE
  ! Some number bigger than most of the lines to be read.
  INTEGER, PARAMETER :: line_buffer_size = 28
  ! Index of the start of the value of i in filename.
  INTEGER, PARAMETER :: pos_i_in_filename = 10
  ! Index of the start of the value of j in filename.
  INTEGER, PARAMETER :: pos_j_in_filename = 13

  CALL process_a_file
CONTAINS
  SUBROUTINE process_a_file
    INTEGER :: unit                    ! Unit number for IO.
    CHARACTER(:), ALLOCATABLE :: line  ! A line from the file.
    INTEGER :: iostat                  ! IOSTAT code.
    CHARACTER(256) :: iomsg            ! IOMSG to go with IOSTAT
    INTEGER :: n, i, j                 ! Numbers of interest.

    OPEN( NEWUNIT=unit, FILE='2015-01-09 read_some_things.txt', &
        ACTION='READ', STATUS='OLD' )
    DO
      CALL read_a_line(unit, line, iostat, iomsg)
      IF (iostat /= 0) THEN
        IF (IS_IOSTAT_END(iostat)) EXIT
        PRINT "('Error number ',I0,' reading file: ',A)", &
            iostat, TRIM(iomsg)
        ERROR STOP
      END IF
      ! What to do with an empty record? Here: skip it.
      IF (LEN_TRIM(line) == 0) CYCLE
      CALL chop_a_line(line, n, i, j)
      PRINT "(2X,I0,1X,I0,1X,I0)", n, i, j
    END DO
    CLOSE(unit)
  END SUBROUTINE process_a_file

  ! Parse a line into numbers of interest.
  SUBROUTINE chop_a_line(line, n, i, j)
    CHARACTER(*), INTENT(IN) :: line  ! The line to chop.
    INTEGER, INTENT(OUT) :: n         ! Things we got...
    INTEGER, INTENT(OUT) :: i
    INTEGER, INTENT(OUT) :: j
    ! Various significant character positions in the line.
    INTEGER :: first_non_blank_pos
    INTEGER :: next_blank_pos
    INTEGER :: before_filename_pos
    ! Buffer for assembling a format specification.
    CHARACTER(100) :: fmt

    ! Find start of first non-blank group.
    first_non_blank_pos = VERIFY(line, ' ')
    ! Tolerate its non-existence - this may be zero.
    ! Find start of the following blank group, starting from after
    ! the beginning of the first non-blank group.
    next_blank_pos = SCAN(line(first_non_blank_pos+1:), ' ')
    ! It had better exist. If it doesn't, confuse user.
    IF (next_blank_pos == 0) ERROR STOP 'I didn''t draw any blanks'
    next_blank_pos = next_blank_pos + first_non_blank_pos
    ! Find start of the second group of non-blanks, backup one.
    before_filename_pos = VERIFY(line(next_blank_pos:), ' ')
    ! It had better exist. If it doesn't, annoy user.
    IF (before_filename_pos == 0) ERROR STOP 'Line in file with no file!'
    ! Note -2 to backup one and remember position before filename.
    before_filename_pos = before_filename_pos + next_blank_pos - 2
    ! This specifies:
    ! - read all prior to filename as integer,
    ! - then skip to start of i, read I2,
    ! - then skip to start of j, read I2.
    WRITE (fmt, "('(I',I0,',T',I0,',I2,T',I0,',I2)')") &
        before_filename_pos, &
        before_filename_pos + pos_i_in_filename, &
        before_filename_pos + pos_j_in_filename
    READ (line, fmt) n, i, j
  END SUBROUTINE chop_a_line

  ! Read a record into a character variable. Pretty common task...
  SUBROUTINE read_a_line(unit, line, iostat, iomsg)
    INTEGER, INTENT(IN) :: unit                     ! Unit to read from.
    CHARACTER(:), INTENT(OUT), ALLOCATABLE :: line  ! The record read.
    INTEGER, INTENT(OUT) :: iostat                  ! +ve on error, -ve on eof.
    CHARACTER(*), INTENT(OUT) :: iomsg              ! IOMSG if iostat /= 0
    ! Buffer to read record fragment.
    CHARACTER(line_buffer_size) :: buffer
    INTEGER :: size  ! Amount read per read.

    line = ''
    DO
      ! Read a bit without always advancing to the next record.
      READ ( unit, "(A)", ADVANCE='NO', SIZE=size, IOSTAT=iostat, &
          IOMSG=iomsg ) buffer
      IF (iostat > 0) RETURN  ! Bail on fail.
      ! Philosophical discussion about whether EOF is possible
      ! and SIZE /= 0 goes here (consider STREAM access).
      line = line // buffer(:size)  ! Append what we got.
      ! Exit loop on end of file or end of record.
      IF (iostat < 0) EXIT
    END DO
    ! End of record is expected, not a relevant condition to return.
    IF (IS_IOSTAT_EOR(iostat)) iostat = 0
  END SUBROUTINE read_a_line
END PROGRAM read_some_things
For completeness, here is my chosen solution, which relies on the fact that the string following the integer always starts with '/' (as it's a filepath):
! first determine how much whitespace is around the first integer
! and store this as string ln
read(20, '(a)') filestring
write(ln, "(I1)") INDEX(filestring, ' /')-1
! use ln as the integer width
read(filenumber,'(i'//ln//','//ldir//'x,i2,x,i2)') n,pix_i,pix_j
