Read scientific formatted numbers from txt

I would like to read and store numbers in scientific notation from a formatted txt file in which the numbers are separated by tabs.
This is what I have so far:
IMPLICIT NONE
REAL,ALLOCATABLE,DIMENSION(2) :: data(:,:)
INTEGER :: row,column
INTEGER :: j,i
CHARACTER(len=30) :: filename
CHARACTER(len=30) :: format
filename='data.txt'
open(86,file=filename,err=10)
write(*,*)'open data file'
read(86, *) row
read(86, *) column
allocate(data(row,column))
format='(ES14.7)'
do i=1,row
read(86,format) data(i,:)
enddo
close(86)
This is what the txt file looks like:
200
35
2.9900E-35 2.8000E-35 2.6300E-35 2.4600E-35 2.3100E-35 2.1600E-35 ...
The problem is that it doesn't read and store the correct values from the txt to the data variable. Is it the format causing the problem?
I would also like to know how to count the number of columns in this case. (I can count the rows by using read(86,*) in a for loop.)

Yes, your format does not match the data you show. A better one would be read(99,'(6(E11.4,1X))') myData(i,:).
However, I am not sure you need a format for reading at all.
The following example is pretty close to what you are trying to do, and it works both with and without a format.
program readdata
implicit none
real, allocatable :: myData(:,:)
real :: myLine
integer :: i, j, myRow, myColumn
character(len=30) :: myFileName
character(len=30) :: myFormat
myFileName='data.dat'
open(99, file=myFileName)
write(*,*)'open data file'
read(99, *) myRow
read(99, *) myColumn
allocate(myData(myRow,myColumn))
do i=1,myRow
read(99,*) myData(i,:)
!read(99,'(6(E11.4,1X))') myData(i,:)
print*, myData(i,:)
enddo
close(99)
end program readdata
To test, I assumed that the file always starts with the row and column counts, as in your example, so my test data was the following.
2
6
2.9900E-35 2.8000E-35 2.6300E-35 2.4600E-35 2.3100E-35 2.1600E-35
2.9900E-35 2.8000E-35 2.6300E-35 2.4600E-35 2.3100E-35 2.1600E-35
If you really want to read your files with a format and the number of columns is not constant, you may need a format that depends on a variable; please see related discussions here.
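As a minimal sketch of a format that depends on a variable, the format string can be built at run time with an internal write. The module and function names (`row_format_mod`, `make_row_format`) are hypothetical, and the sketch assumes every field is exactly 10 characters wide followed by a single separator character; the widths must be adapted to the real file. With tabs or irregular spacing, list-directed input (`*`) remains the safer choice.

```fortran
module row_format_mod
  implicit none
contains
  ! Build a format such as '(6(E10.4,1X))' for ncol fixed-width fields.
  ! Assumption: each field is 10 characters wide plus one separator character.
  function make_row_format(ncol) result(fmt)
    integer, intent(in) :: ncol
    character(len=:), allocatable :: fmt
    character(len=32) :: buf
    ! Internal write: the integer ncol becomes part of the format text
    write(buf, '(a,i0,a)') '(', ncol, '(E10.4,1X))'
    fmt = trim(buf)
  end function make_row_format
end module row_format_mod
```

The result can then be passed wherever a format is expected, for example `read(99, make_row_format(myColumn)) myData(i,:)`.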

Though there is no direct command to count the number of items in a line, we can count the number of periods or exponent letters (E|e|D|d) by using the scan intrinsic. For example,
program main
implicit none
character(100) str
integer n
read( *, "(a)" ) str
call countreal( str, n )
print *, "number of items = ", n
contains
subroutine countreal( str, num )
implicit none
character(*), intent(in) :: str
integer, intent(out) :: num
integer pos, offset
num = 0
pos = 0
do
offset = scan( str( pos + 1 : ), "." ) !! (1) search for periods
!! offset = scan( str( pos + 1 : ), "EeDd" ) !! (2) search for (E|e|D|d)
if ( offset > 0 ) then
pos = pos + offset
num = num + 1
print *, "pos=", pos, "num=", num !! just for check
else
return
endif
enddo
endsubroutine
end
Please note that pattern (1) works only when all items have periods, while pattern (2) works only when all items have exponents:
# When compiled with (1)
$ echo "2.9900 2.8000E-35 2.6300D-35 2.46 2.31" | ./a.out
pos= 2 num= 1
pos= 10 num= 2
pos= 22 num= 3
pos= 34 num= 4
pos= 40 num= 5
number of items = 5
# When compiled with (2)
$ echo "2.9900E-35 2.8000D-35 2.6300e-35 2.4600d-35" | ./a.out
pos= 7 num= 1
pos= 19 num= 2
pos= 31 num= 3
pos= 43 num= 4
number of items = 4
For more general purposes, it may be more convenient to write a custom "split()" function that separates items at white space (or use an external library that supports a split function).
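A rough sketch of the counting half of such a split function is given here. It counts items separated by runs of blanks or tabs, so it does not depend on every item having a period or an exponent. The names `split_count_mod` and `count_tokens` are made up for illustration, and commas or other separators are deliberately not handled.

```fortran
module split_count_mod
  implicit none
contains
  ! Count the items in line that are separated by runs of blanks or tabs.
  pure function count_tokens(line) result(n)
    character(len=*), intent(in) :: line
    integer :: n
    integer :: i
    logical :: in_token
    character(len=*), parameter :: seps = ' ' // achar(9)   ! blank and tab

    n = 0
    in_token = .false.
    do i = 1, len_trim(line)
      if (scan(line(i:i), seps) > 0) then
        in_token = .false.          ! separator: the current item (if any) ends
      else if (.not. in_token) then
        in_token = .true.           ! first character of a new item
        n = n + 1
      end if
    end do
  end function count_tokens
end module split_count_mod
```

After reading a whole line with `read(unit, '(a)') str`, `count_tokens(str)` gives the number of columns on that line.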

Related

Analyzing input from an internal unit in fortran [duplicate]

I have specific dataformat, say 'n' (arbitrary) row and '4' columns. If 'n' is '10', the example data would go like this.
1.01e+00 -2.01e-02 -3.01e-01 4.01e+02
1.02e+00 -2.02e-02 -3.02e-01 4.02e+02
1.03e+00 -2.03e-02 -3.03e-01 4.03e+02
1.04e+00 -2.04e-02 -3.04e-01 4.04e+02
1.05e+00 -2.05e-02 -3.05e-01 4.05e+02
1.06e+00 -2.06e-02 -3.06e-01 4.06e+02
1.07e+00 -2.07e-02 -3.07e-01 4.07e+02
1.08e+00 -2.08e-02 -3.08e-01 4.07e+02
1.09e+00 -2.09e-02 -3.09e-01 4.09e+02
1.10e+00 -2.10e-02 -3.10e-01 4.10e+02
Constraints in building this input would be
data should have '4' columns.
data separated by white spaces.
I want to implement a feature to check whether the input file has '4' columns in every row, and built my own based on M.S.B's answer in the post "Reading data file in Fortran with known number of lines but unknown number of entries in each line".
program readtest
use :: iso_fortran_env
implicit none
character(len=512) :: buffer
integer :: i, i_line, n, io, pos, pos_tmp, n_space
integer,parameter :: max_len = 512
character(len=max_len) :: filename
filename = 'data_wrong.dat'
open(42, file=trim(filename), status='old', action='read')
print *, '+++++++++++++++++++++++++++++++++++'
print *, '+ Count lines +'
print *, '+++++++++++++++++++++++++++++++++++'
n = 0
i_line = 0
do
pos = 1
pos_tmp = 1
i_line = i_line+1
read(42, '(a)', iostat=io) buffer
(*1)! Count blank spaces.
n_space = 0
do
pos = index(buffer(pos+1:), " ") + pos
if (pos /= 0) then
if (pos > pos_tmp+1) then
n_space = n_space+1
pos_tmp = pos
else
pos_tmp = pos
end if
endif
if (pos == max_len) then
exit
end if
end do
pos_tmp = pos
if (io /= 0) then
exit
end if
print *, '> line : ', i_line, ' n_space : ', n_space
n = n+1
end do
print *, ' >> number of line = ', n
end program
If I run the above program with an input file that contains some wrong rows, as follows,
1.01e+00 -2.01e-02 -3.01e-01 4.01e+02
1.02e+00 -2.02e-02 -3.02e-01 4.02e+02
1.03e+00 -2.03e-02 -3.03e-01 4.03e+02
1.04e+00 -2.04e-02 -3.04e-01 4.04e+02
1.05e+00 -2.05e-02 -3.05e-01 4.05e+02
1.06e+00 -2.06e-02 -3.06e-01 4.06e+02
1.07e+00 -2.07e-02 -3.07e-01 4.07e+02
1.0 2.0 3.0
1.08e+00 -2.08e-02 -3.08e-01 4.07e+02 1.00
1.09e+00 -2.09e-02 -3.09e-01 4.09e+02
1.10e+00 -2.10e-02 -3.10e-01 4.10e+02
The output is like this,
+++++++++++++++++++++++++++++++++++
+ Count lines +
+++++++++++++++++++++++++++++++++++
> line : 1 n_space : 4
> line : 2 n_space : 4
> line : 3 n_space : 4
> line : 4 n_space : 4
> line : 5 n_space : 4
> line : 6 n_space : 4
> line : 7 n_space : 4
> line : 8 n_space : 3 (*2)
> line : 9 n_space : 5 (*3)
> line : 10 n_space : 4
> line : 11 n_space : 4
>> number of line = 11
And you can see that the wrong rows are properly detected as I intended (see (*2) and (*3)), and I can write 'if' statements to make some error messages.
But I think my code is 'extremely' ugly, since I had to do something like (*1) in the code to count consecutive white spaces as one space. I think there should be a much more elegant way to ensure the rows contain only '4' columns each, say,
read(*,'4(X, A)') line
(which didn't work)
My program would also fail if a line is longer than 'max_len', which is set to '512' in this case. '512' should be enough for most practical purposes, but I also want my checking subroutine to be robust in this respect.
So, I want to improve my subroutine in at least these aspects:
Make it more elegant (not as in (*1))
Make it more general (especially with regard to 'max_len')
Does anyone have experience in building this kind of input-checking subroutine?
Any comments would be highly appreciated.
Thank you for reading the question.
Without knowledge of the exact data format, I think it would be rather difficult to achieve what you want (or at least, I wouldn't know how to do it).
In the most general case, I think your space counting idea is the most robust and correct.
It can be adapted to avoid the maximum string length problem you describe.
In the following code, I go through the data as an unformatted, stream-access file.
Basically, you read every character and take note of new lines and spaces.
As you did, you use spaces to count the columns (skipping double spaces) and new_line characters to count the rows.
However, here we are not reading the entire line as a string and going through it to find spaces; we read char by char, which avoids the fixed string length problem, and we also end up with a single loop. Hope it helps.
EDIT: now handles white space at the beginning and end of a line, and empty lines
program readtest
use :: iso_fortran_env
implicit none
character :: old_char, new_char
integer :: line, io, cols
logical :: beg_line
integer,parameter :: max_len = 512
character(len=max_len) :: filename
filename = 'data_wrong.txt'
! Output format to be used later
100 format (a, 3x, i0, a, 3x , i0)
open(42, file=trim(filename), status='old', action='read', &
form="unformatted", access="stream")
! set utils
old_char = " "
line = 0
beg_line = .true.
cols = 0
! Start scanning char by char
do
read(42, iostat = io) new_char
! Exit if EOF
if (io < 0) then
exit
end if
! Deal with empty lines
if (beg_line .and. new_char==new_line(new_char)) then
line = line + 1
write(*, 100, advance="no") "Line number:", line, &
"; Columns number:", cols
write(*,'(6x, a)') "EMPTYLINE"
! Deal with beginning of line for white spaces
elseif (beg_line) then
beg_line = .false.
! this indicates new columns
elseif (new_char==" " .and. old_char/=" ") then
cols = cols + 1
! End of line: time to print
elseif (new_char==new_line(new_char)) then
if (old_char/=" ") then
cols = cols+1
endif
line = line + 1
! Printing out results
write(*, 100, advance="no") "Line number:", line, &
"; Columns number:", cols
if (cols == 4) then
write(*,'(6x, a5)') "OK"
else
write(*,'(6x, a5)') "ERROR"
end if
! Restart with a new line (reset counters)
cols = 0
beg_line = .true.
end if
old_char = new_char
end do
end program
This is the output of this program:
Line number: 1; Columns number: 4 OK
Line number: 2; Columns number: 4 OK
Line number: 3; Columns number: 4 OK
Line number: 4; Columns number: 4 OK
Line number: 5; Columns number: 4 OK
Line number: 6; Columns number: 4 OK
Line number: 7; Columns number: 4 OK
Line number: 8; Columns number: 3 ERROR
Line number: 9; Columns number: 5 ERROR
Line number: 10; Columns number: 4 OK
Line number: 11; Columns number: 4 OK
If you knew your data format, you could read each line into a vector of dimension 4 and use the iostat variable to print an error for each line where iostat is an integer greater than 0.
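A minimal sketch of that iostat idea is shown here, assuming each line has already been read into a character variable and is then parsed with a list-directed internal read. The names `column_check_mod` and `has_ncol_reals` are hypothetical. One detail worth noting: a line that is too short ends the internal read with an end-of-file status, which is negative rather than positive, so the sketch tests for nonzero. A line with extra items is caught by a second read that asks for one value more and is expected to run out of data.

```fortran
module column_check_mod
  implicit none
contains
  ! True when line holds exactly ncol items and all of them read as reals.
  ! Assumption: the line contains no '/' and no consecutive commas, which
  ! list-directed input treats specially.
  function has_ncol_reals(line, ncol) result(ok)
    character(len=*), intent(in) :: line
    integer, intent(in) :: ncol
    logical :: ok
    real :: vals(ncol + 1)
    integer :: ios_exact, ios_extra

    ! Reading ncol values must succeed ...
    read(line, *, iostat=ios_exact) vals(1:ncol)
    ! ... and reading one more must hit the end of the line.
    read(line, *, iostat=ios_extra) vals(1:ncol + 1)
    ok = (ios_exact == 0) .and. is_iostat_end(ios_extra)
  end function has_ncol_reals
end module column_check_mod
```

In a loop over lines read with `read(unit, '(a)', iostat=io) buffer`, a call such as `has_ncol_reals(buffer, 4)` would flag both the three-column and the five-column rows of the sample file.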
Instead of counting whitespace you can use manipulation of substrings to get what you want. A simple example follows:
program foo
implicit none
character(len=512) str ! Assume str is sufficiently long buffer
integer fd, cnt, m, n
open(newunit=fd, file='test.dat', status='old')
do
cnt = 0
read(fd,'(A)',end=10) str
str = adjustl(str) ! Eliminate possible leading whitespace
do
n = index(str, ' ') ! Find first space
if (n /= 0) then
write(*, '(A)', advance='no') str(1:n)
str = adjustl(str(n+1:))
end if
if (len_trim(str) == 0) exit ! Trailing whitespace
cnt = cnt + 1
end do
! cnt counts the gaps between items, so 4 columns give cnt == 3
if (cnt /= 3) then
write(*,'(A)') ' Error'
else
write(*,*)
end if
end do
10 close(fd)
end program foo
This should read any line of reasonable length (up to the line limit your compiler defaults to, which is generally 2GB nowadays). You could change it to stream I/O to have no limit, but most Fortran compilers have trouble reading stream I/O from stdin, which this example reads from. So if the line looks anything like a list of numbers, it should read them, tell you how many it read, and let you know if it had an error reading any value as a number (character strings, strings bigger than the size of a REAL value, ...). All the parts here are explained on the Fortran Wiki, but to keep it short this is a stripped-down version that just puts the pieces together. The oddest behavior is what happens if you enter something like this, with a slash in it:
10 20,,30,40e4 50 / this is a list of numbers
it would treat everything after the slash as a comment and not generate a non-zero status return while returning five values. For a more detailed explanation of the code I think the annotated pieces on the Wiki explain how it works. In the search, look for "getvals" and "readline".
So with this program you can read a line and if the return status is zero and the number of values read is four you should be good except for a few dusty corners where the lines would definitely not look like a list of numbers.
module M_getvals
implicit none
private
public getvals, readline
contains
subroutine getvals(line,values,icount,ierr)
character(len=*),intent(in) :: line
real :: values(:)
integer,intent(out) :: icount, ierr
character(len=:),allocatable :: buffer
character(len=len(line)) :: words(size(values))
integer :: ios, i
ierr=0
words=' '
buffer=trim(line)//"/"
read(buffer,*,iostat=ios) words
icount=0
do i=1,size(values)
if(words(i).eq.'') cycle
read(words(i),*,iostat=ios)values(icount+1)
if(ios.eq.0)then
icount=icount+1
else
ierr=ios
write(*,*)'*getvals* WARNING:['//trim(words(i))//'] is not a number'
endif
enddo
end subroutine getvals
subroutine readline(line,ier)
character(len=:),allocatable,intent(out) :: line
integer,intent(out) :: ier
integer,parameter :: buflen=1024
character(len=buflen) :: buffer
integer :: last, isize
line=''
ier=0
INFINITE: do
read(*,iostat=ier,fmt='(a)',advance='no',size=isize) buffer
if(isize.gt.0)line=line//buffer(:isize)
if(is_iostat_eor(ier))then
last=len(line)
if(last.ne.0)then
if(line(last:last).eq.'\\')then
line=line(:last-1)
cycle INFINITE
endif
endif
ier=0
exit INFINITE
elseif(ier.ne.0)then
exit INFINITE
endif
enddo INFINITE
line=trim(line)
end subroutine readline
end module M_getvals
program tryit
use M_getvals, only: getvals, readline
implicit none
character(len=:),allocatable :: line
real,allocatable :: values(:)
integer :: icount, ier, ierr
INFINITE: do
call readline(line,ier)
if(allocated(values))deallocate(values)
allocate(values(len(line)/2+1))
if(ier.ne.0)exit INFINITE
call getvals(line,values,icount,ierr)
write(*,'(*(g0,1x))')'VALUES=',values(:icount),'NUMBER OF VALUES=',icount,'STATUS=',ierr
enddo INFINITE
end program tryit
Honestly, it should work reasonably with just about any line you throw at it.
PS:
If you are always reading four values, using list-directed I/O and checking the iostat= value on READ and checking if you hit EOR would be very simple (just a few lines) but since you said you wanted to read lines of arbitrary length I am assuming four values on a line was just an example and you wanted something very generic.

Number of Occurrences

I am new to Python and trying to use it for competitive programming.
This is the question:
You are given an unsorted array of characters 'n' and a character 'x'. You have to find the number of times x occurs in the character array.
Input format:
The first line contains an integer T, the number of test cases. Then T test cases follow. Each test case consists of two lines. The first line contains N, the length of the array. The second line contains N space-separated characters. Example
Input
2
4
a b c d
5
a b x c d
Output
0
1
In C++ I know we start like this:
int t;
cin>>t;
while(t--)
{
int n;
cin>>n;
int arr[n];
for(int i=0;i<n;i++)
{
cin>>arr[i];
}
...
How do we do the same in python?
This is my code:
t = int(input())
while t:
n = int(input())
arr = [x for x in raw_input().split()]
res = 0
for i in arr:
if i == 'x':
res += 1
t -=1
print(res)
Getting runtime error for this
I think the issue is with how I'm taking inputs to run test cases but not sure
#juanpa arrivillaga answered your question, but it may help you understand that answer better to see an actual implementation.
Example:
from io import StringIO
io_buffer = """2
4
abcd
5
abxcd
"""
with StringIO(io_buffer) as buffer:
num_test_cases = int(buffer.readline())
for _ in range(num_test_cases):
num_chars = int(buffer.readline())
char_line = buffer.readline().strip()
# This is the important part - everything else is overhead
x_count = char_line.count("x")
print(f"character string: {char_line:>5}, length: {num_chars}, x count: {x_count}")
Output:
character string: abcd, length: 4, x count: 0
character string: abxcd, length: 5, x count: 1

Reading a float and modifying variable based file in fortran : correct format

I am trying to write a code where I want to read a variable (Delta), and based on that I am trying to copy the corresponding file (TESTDIR/Delta0.5_DOS_2D_TBM.data DOS.data for Delta=0.5) to the present directory.
Program Modify_variable_based_file
character(LEN=100):: command
character(LEN=10):: chDelta
real*8:: Delta
Print*,'Enter Delta'
Read*,Delta
write(chDelta,'(f0.1)') Delta
print*,'chDelta=',chDelta,' Delta=',Delta
command='cp TESTDIR/Delta' // trim(adjustl(chDelta)) //'_DOS_2D_TBM.data DOS.data'
call system(command)
End Program Modify_variable_based_file
However, I can see chDelta is .5 instead of 0.5 when I input Delta.
Can you suggest me the correct format? And is there an alternative where I can avoid the string conversion?
Note that here my files are named with number having the most significant digit on the left of the decimal, i.e. if it Delta is 1.5, file is Delta1.5_DOS_2D_TBM.data. Zero arises before the decimal only when there are no other significant digits.
If you need only one digit after the decimal point, you can use f20.1 etc such that
character(LEN=100) :: chDelta !! use a sufficiently large buffer
write( chDelta,'(f20.1)' ) Delta
chDelta = adjustL( chDelta )
Then the chDelta corresponding to Delta=0.5 becomes "0.5" (note that adjustL() removes all the blanks before 0). Similarly, you can retain 4 digits by using the format
write( chDelta,'(f20.4)' ) Delta
which gives chDelta = "0.5000". To obtain a flexible number of nonzero digits after the decimal point, we may need to remove unnecessary zeros manually. This can be done, for example, by searching for the last nonzero digit and removing the trailing zeros.
real*8 x( 5 )
character(100) str
x(:) = [ 1.0d0, 0.2d0, 1.23d0, -123.456d0, 123.678901d0 ]
do i = 1, 5
write( str, "(f20.4)" ) x(i)
call truncate( str, 4 )
print *, "file", trim(str), ".dat"
enddo
...
subroutine truncate( str, dmax )
implicit none
character(*), intent(inout) :: str
integer, intent(in) :: dmax !! maximum number of nonzero digits
integer :: dot, k, last
str = adjustL( str )
dot = index( str, '.' )
do k = dot + dmax, dot, -1
if ( str( k:k ) /= '0' ) then
last = k
exit
endif
enddo
if ( last == dot ) last = last + 1 !! retain at least one digit
str = str( 1:last )
end
Then the output becomes
file1.0.dat
file0.2.dat
file1.23.dat
file-123.456.dat
file123.6789.dat
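Putting the first suggestion together for the original file-copy problem, a sketch of a helper that builds the source path from Delta might look like this. The names `delta_name_mod` and `delta_filename` are hypothetical, and the sketch assumes one digit after the decimal point and a compiler that prints the optional leading zero for a wide F field, as the answer above reports.

```fortran
module delta_name_mod
  implicit none
contains
  ! Return e.g. 'TESTDIR/Delta0.5_DOS_2D_TBM.data' for delta = 0.5.
  function delta_filename(delta) result(name)
    real(kind=kind(1.0d0)), intent(in) :: delta
    character(len=:), allocatable :: name
    character(len=32) :: buf
    ! A wide field leaves room for the leading zero; adjustl drops the padding
    write(buf, '(f20.1)') delta
    name = 'TESTDIR/Delta' // trim(adjustl(buf)) // '_DOS_2D_TBM.data'
  end function delta_filename
end module delta_name_mod
```

The copy itself could then be issued with the standard Fortran 2008 intrinsic instead of the non-standard `system`, for example `call execute_command_line('cp ' // delta_filename(Delta) // ' DOS.data')`.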

Equivalent of adding strings to a loop, for strings (matlab)?

How would I be able to do the equivalent of this with strings:
a = [1 2 3; 4 5 6];
c = [];
for i=1:5
b = a(1,:)+i;
c = [c;b];
end
c =
2 3 4
3 4 5
4 5 6
5 6 7
6 7 8
Basically looking to combine several strings into a Matrix.
You're growing a variable in a loop, which is a kind of sin in Matlab :) So I'm going to show you some better ways of doing array concatenation.
There's cell strings:
>> C = {
'In a cell string, it'
'doesn''t matter'
'if the strings'
'are not of equal length'};
>> C{2}
ans =
doesn't matter
Which you could use in a loop like so:
% NOTE: always pre-allocate everything before a loop
C = cell(5,1);
for ii = 1:5
% assign some random characters
C{ii} = char( '0'+round(rand(1+round(rand*10),1)*('z'-'0')) );
end
There's ordinary arrays, which have as a drawback that you have to know the size of all your strings beforehand:
a = [...
'testy' % works
'droop'
];
b = [...
'testing' % ERROR: CAT arguments dimensions
'if this works too' % are not consistent.
];
For these cases, use char:
>> b = char(...
'testing',...
'if this works too'...
);
b =
'testing '
'if this works too'
Note how char pads the first string with spaces to fit the length of the second string. Now again: don't use this in a loop, unless you've pre-allocated the array, or if there really is no other way to go.
Type help strfun on the Matlab command prompt to get an overview of all string-related functions available in Matlab.
You mean storing a string on each matrix position? You can't do that, since matrices are defined over basic types. You can have a CHAR on each position:
>> a = 'bla';
>> b = [a; a]
b <2x3 char> =
bla
bla
>> b(2,3) = 'e'
b =
bla
ble
If you want to store matrices, use a cell array (MATLAB reference, Blog of Loren Shure), which are kind of similar but using "{}" instead of "()":
>> c = {a; a}
c =
'bla'
'bla'
>> c{2}
ans =
bla
