I have a character string array in Fortran as ' results: CI- Energies --- th= 89 ph=120'. How do I extract the characters '120' from the string and store into a real variable?
The string is written in the file 'input.DAT'. I have written the Fortran code as:
implicit real*8(a-h,o-z)
character(39) line
open(1,file='input.DAT',status='old')
read(1,'(A)') line,phi
write(*,'(A)') line
write(*,*)phi
end
Upon execution it shows:
At line 5 of file string.f (unit = 1, file = 'input.dat')
Fortran runtime error: End of file
I have given '39' as the dimension of the character array as there are 39 characters including 'spaces' in the string upto '120'.
Assuming that the real number you want to read appears after the last equal sign in the string, you can use the SCAN intrinsic function to find that location and then READ the number from the rest of the string, as shown in the following program.
program xreadnum
implicit none
integer :: ipos
integer, parameter :: nlen = 100
character (len=nlen) :: str
real :: xx
str = "results: CI- Energies --- th= 89 ph=120"
ipos = scan(str, "=", back=.true.)
print*, "reading real variable from '" // trim(str(1+ipos:)) // "'"
read (str(1+ipos:),*) xx
print*, "xx = ", xx
end program xreadnum
! gfortran output:
! reading real variable from '120'
! xx = 120.000000
To convert string s into a real type variable r:
READ(s, "(Fw.d)") r
Here w is the total field width and d is the number of digits after the decimal point. If there is no decimal point in the input string, values of w and d might affect the result, e.g.
s = '120'
READ(s, "(F3.0)") r ! r <-- 120.0
READ(s, "(F3.1)") r ! r <-- 12.0
Answer to another part of the question (how to extract substring with particular number to convert) strongly depends on the format of the input strings, e.g. if all the strings are formed by fixed-width fields, it's possible to skip irrelevant part of the string:
s = 'a=120'
READ(s(3:), "(F3.0)") r
Related
I have specific dataformat, say 'n' (arbitrary) row and '4' columns. If 'n' is '10', the example data would go like this.
1.01e+00 -2.01e-02 -3.01e-01 4.01e+02
1.02e+00 -2.02e-02 -3.02e-01 4.02e+02
1.03e+00 -2.03e-02 -3.03e-01 4.03e+02
1.04e+00 -2.04e-02 -3.04e-01 4.04e+02
1.05e+00 -2.05e-02 -3.05e-01 4.05e+02
1.06e+00 -2.06e-02 -3.06e-01 4.06e+02
1.07e+00 -2.07e-02 -3.07e-01 4.07e+02
1.08e+00 -2.08e-02 -3.08e-01 4.07e+02
1.09e+00 -2.09e-02 -3.09e-01 4.09e+02
1.10e+00 -2.10e-02 -3.10e-01 4.10e+02
Constraints in building this input would be
data should have '4' columns.
data separated by white spaces.
I want to implement a feature to check whether the input file has '4' columns in every row, and built my own based on the 'M.S.B's answer in the post Reading data file in Fortran with known number of lines but unknown number of entries in each line.
program readtest
use :: iso_fortran_env
implicit none
character(len=512) :: buffer
integer :: i, i_line, n, io, pos, pos_tmp, n_space
integer,parameter :: max_len = 512
character(len=max_len) :: filename
filename = 'data_wrong.dat'
open(42, file=trim(filename), status='old', action='read')
print *, '+++++++++++++++++++++++++++++++++++'
print *, '+ Count lines +'
print *, '+++++++++++++++++++++++++++++++++++'
n = 0
i_line = 0
do
pos = 1
pos_tmp = 1
i_line = i_line+1
read(42, '(a)', iostat=io) buffer
(*1)! Count blank spaces.
n_space = 0
do
pos = index(buffer(pos+1:), " ") + pos
if (pos /= 0) then
if (pos > pos_tmp+1) then
n_space = n_space+1
pos_tmp = pos
else
pos_tmp = pos
end if
endif
if (pos == max_len) then
exit
end if
end do
pos_tmp = pos
if (io /= 0) then
exit
end if
print *, '> line : ', i_line, ' n_space : ', n_space
n = n+1
end do
print *, ' >> number of line = ', n
end program
If I run the above program with a input file with some wrong rows like follows,
1.01e+00 -2.01e-02 -3.01e-01 4.01e+02
1.02e+00 -2.02e-02 -3.02e-01 4.02e+02
1.03e+00 -2.03e-02 -3.03e-01 4.03e+02
1.04e+00 -2.04e-02 -3.04e-01 4.04e+02
1.05e+00 -2.05e-02 -3.05e-01 4.05e+02
1.06e+00 -2.06e-02 -3.06e-01 4.06e+02
1.07e+00 -2.07e-02 -3.07e-01 4.07e+02
1.0 2.0 3.0
1.08e+00 -2.08e-02 -3.08e-01 4.07e+02 1.00
1.09e+00 -2.09e-02 -3.09e-01 4.09e+02
1.10e+00 -2.10e-02 -3.10e-01 4.10e+02
The output is like this,
+++++++++++++++++++++++++++++++++++
+ Count lines +
+++++++++++++++++++++++++++++++++++
> line : 1 n_space : 4
> line : 2 n_space : 4
> line : 3 n_space : 4
> line : 4 n_space : 4
> line : 5 n_space : 4
> line : 6 n_space : 4
> line : 7 n_space : 4
> line : 8 n_space : 3 (*2)
> line : 9 n_space : 5 (*3)
> line : 10 n_space : 4
> line : 11 n_space : 4
>> number of line = 11
And you can see that the wrong rows are properly detected as I intended (see (*2) and (*3)), and I can write 'if' statements to make some error messages.
But I think my code is 'extremely' ugly since I had to do something like (*1) in the code to count consecutive white spaces as one space. I think there would be much more elegant way to ensure the rows contain only '4' column each, say,
read(*,'4(X, A)') line
(which didn't work)
And also my program would fail if the length of 'buffer' exceeds 'max_len' which is set to '512' in this case. Indeed '512' should be enough for most practical purposes, I also want my checking subroutine to be robust in this way.
So, I want to improve my subroutine in at least these aspects
Want it to be more elegant (not as (*1))
Be more general (especially in regards to 'max_len')
Does anyone has some experience in building this kind of input-checking subroutine ??
Any comments would be highly appreciated.
Thank you for reading the question.
Without knowledge of the exact data format, I think it would be rather difficult to achieve what you want (or at least, I wouldn't know how to do it).
In the most general case, I think your space counting idea is the most robust and correct.
It can be adapted to avoid the maximum string length problem you describe.
In the following code, I go through the data as an unformatted, stream access file.
Basically you read every character and take note of new_lines and spaces.
As you did, you use spaces to count to columns (skipping double spaces) and new_line characters to count the rows.
However, here we are not reading the entire line as a string and going through it to find spaces; we read char by char, avoiding the fixed string length problem and we also end up with a single loop. Hope it helps.
EDIT: now handles white spaces at beginning at end of line and empty lines
program readtest
use :: iso_fortran_env
implicit none
character :: old_char, new_char
integer :: line, io, cols
logical :: beg_line
integer,parameter :: max_len = 512
character(len=max_len) :: filename
filename = 'data_wrong.txt'
! Output format to be used later
100 format (a, 3x, i0, a, 3x , i0)
open(42, file=trim(filename), status='old', action='read', &
form="unformatted", access="stream")
! set utils
old_char = " "
line = 0
beg_line = .true.
cols = 0
! Start scannig char by char
do
read(42, iostat = io) new_char
! Exit if EOF
if (io < 0) then
exit
end if
! Deal with empty lines
if (beg_line .and. new_char==new_line(new_char)) then
line = line + 1
write(*, 100, advance="no") "Line number:", line, &
"; Columns: Number", cols
write(*,'(6x, a5)') "EMPTYLINE"
! Deal with beginning of line for white spaces
elseif (beg_line) then
beg_line = .false.
! this indicates new columns
elseif (new_char==" " .and. old_char/=" ") then
cols = cols + 1
! End of line: time to print
elseif (new_char==new_line(new_char)) then
if (old_char/=" ") then
cols = cols+1
endif
line = line + 1
! Printing out results
write(*, 100, advance="no") "Line number:", line, &
"; Columns: Number", cols
if (cols == 4) then
write(*,'(6x, a5)') "OK"
else
write(*,'(6x, a5)') "ERROR"
end if
! Restart with a new line (reset counters)
cols = 0
beg_line = .true.
end if
old_char = new_char
end do
end program
This is the output of this program:
Line number: 1; Columns number: 4 OK
Line number: 2; Columns number: 4 OK
Line number: 3; Columns number: 4 OK
Line number: 4; Columns number: 4 OK
Line number: 5; Columns number: 4 OK
Line number: 6; Columns number: 4 OK
Line number: 7; Columns number: 4 OK
Line number: 8; Columns number: 3 ERROR
Line number: 9; Columns number: 5 ERROR
Line number: 10; Columns number: 4 OK
Line number: 11; Columns number: 4 OK
If you knew your data format, you could read your lines in a vector of dimension 4 and use iostat variable to print out an error on each line where iostat is an integer greater than 0.
Instead of counting whitespace you can use manipulation of substrings to get what you want. A simple example follows:
program foo
implicit none
character(len=512) str ! Assume str is sufficiently long buffer
integer fd, cnt, m, n
open(newunit=fd, file='test.dat', status='old')
do
cnt = 0
read(fd,'(A)',end=10) str
str = adjustl(str) ! Eliminate possible leading whitespace
do
n = index(str, ' ') ! Find first space
if (n /= 0) then
write(*, '(A)', advance='no') str(1:n)
str = adjustl(str(n+1:))
end if
if (len_trim(str) == 0) exit ! Trailing whitespace
cnt = cnt + 1
end do
if (cnt /= 3) then
write(*,'(A)') ' Error'
else
write(*,*)
end if
end do
10 close(fd)
end program foo
this should read any line of reasonable length (up to the line limit your compiler defaults to, which is generally 2GB now-adays). You could change it to stream I/O to have no limit but most Fortran compilers have trouble reading stream I/O from stdin, which this example reads from. So if the line looks anything like a list of numbers it should read them, tell you how many it read, and let you know if it had an error reading any value as a number (character strings, strings bigger than the size of a REAL value, ....). All the parts here are explained on the Fortran Wiki, but to keep it short this is a stripped down version that just puts the pieces together. The oddest behavior it would have is that if you entered something like this with a slash in it
10 20,,30,40e4 50 / this is a list of numbers
it would treat everything after the slash as a comment and not generate a non-zero status return while returning five values. For a more detailed explanation of the code I think the annotated pieces on the Wiki explain how it works. In the search, look for "getvals" and "readline".
So with this program you can read a line and if the return status is zero and the number of values read is four you should be good except for a few dusty corners where the lines would definitely not look like a list of numbers.
module M_getvals
private
public getvals, readline
implicit none
contains
subroutine getvals(line,values,icount,ierr)
character(len=*),intent(in) :: line
real :: values(:)
integer,intent(out) :: icount, ierr
character(len=:),allocatable :: buffer
character(len=len(line)) :: words(size(values))
integer :: ios, i
ierr=0
words=' '
buffer=trim(line)//"/"
read(buffer,*,iostat=ios) words
icount=0
do i=1,size(values)
if(words(i).eq.'') cycle
read(words(i),*,iostat=ios)values(icount+1)
if(ios.eq.0)then
icount=icount+1
else
ierr=ios
write(*,*)'*getvals* WARNING:['//trim(words(i))//'] is not a number'
endif
enddo
end subroutine getvals
subroutine readline(line,ier)
character(len=:),allocatable,intent(out) :: line
integer,intent(out) :: ier
integer,parameter :: buflen=1024
character(len=buflen) :: buffer
integer :: last, isize
line=''
ier=0
INFINITE: do
read(*,iostat=ier,fmt='(a)',advance='no',size=isize) buffer
if(isize.gt.0)line=line//buffer(:isize)
if(is_iostat_eor(ier))then
last=len(line)
if(last.ne.0)then
if(line(last:last).eq.'\\')then
line=line(:last-1)
cycle INFINITE
endif
endif
ier=0
exit INFINITE
elseif(ier.ne.0)then
exit INFINITE
endif
enddo INFINITE
line=trim(line)
end subroutine readline
end module M_getvals
program tryit
use M_getvals, only: getvals, readline
implicit none
character(len=:),allocatable :: line
real,allocatable :: values(:)
integer :: icount, ier, ierr
INFINITE: do
call readline(line,ier)
if(allocated(values))deallocate(values)
allocate(values(len(line)/2+1))
if(ier.ne.0)exit INFINITE
call getvals(line,values,icount,ierr)
write(*,'(*(g0,1x))')'VALUES=',values(:icount),'NUMBER OF VALUES=',icount,'STATUS=',ierr
enddo INFINITE
end program tryit
Honesty, it should work reasonably with just about any line you throw at it.
PS:
If you are always reading four values, using list-directed I/O and checking the iostat= value on READ and checking if you hit EOR would be very simple (just a few lines) but since you said you wanted to read lines of arbitrary length I am assuming four values on a line was just an example and you wanted something very generic.
local data = "here is a string"
local no = 12
foo = string.format("%50s %05d",data,no)
print(foo:len(),string.format("%q",foo))
defines foo as a string of specific length
" here is a string 00012"
However, is there an easy way to get
"here is a string 00012"
I know, that I can fill up the string data with spaces
while data:len() < 50 do data = data.." " end
Add a minus to format string %-50s to align text to the left:
foo = string.format("%-50s %05d","here is a string", 12)
print(foo:len(), foo)
Output:
56 here is a string 00012
Allowed flags:
- : left align result inside field
+ : always prefix with a sign, using + if field positive
0 : left-fill with zeroes rather than spaces
(space) : If positive, put a space where the + would have been
# : Changes the behaviour of various formats, as follows:
For octal conversion (o), prefixes the number with 0 - if necessary.
For hex conversion (x), prefixes the number with 0x
For hex conversion (X), prefixes the number with 0X
For e, E and f formats, always show the decimal point.
For g and G format, always show the decimal point, and do not truncate trailing zeroes.
The option to 'always show the decimal point' would only apply if you had the precision set to 0.
I want to write a code in MATLAB that converts a letter into NATO alphabet. Such as the word 'hello' would be re-written as Hotel-Echo-Lima-Lima-Oscar. I have been having some trouble with the code. So far I have the following:
function natoText = textToNato(plaintext)
plaintext = lower(plaintext);
r = zeros(1, length(plaintext))
%Define my NATO alphabet
natalph = ["Alpha","Bravo","Charlie","Delta","Echo","Foxtrot","Golf", ...
"Hotel","India","Juliet","Kilo","Lima","Mike","November","Oscar", ...
"Papa","Quebec","Romeo","Sierra","Tango","Uniform","Victor",...
"Whiskey","Xray","Yankee","Zulu"];
%Define the normal lower alphabet
noralpha = ['a' : 'z'];
%Now we need to make a loop for matlab to check for each letter
for i = 1:length(text)
for j = 1:26
n = r(i) == natalph(j);
if noralpha(j) == text(i) : n
else r(i) = r(i)
natoText = ''
end
end
end
for v = 1:length(plaintext)
natoText = natoText + r(v) + ''
natoText = natoText(:,-1)
end
end
I know the above code is a mess and I am a bit in doubt what really I have been doing. Is there anyone who knows a better way of doing this? Can I modify the above code so that it works?
It is because now when I run the code, I am getting an empty plot, which I don't know why because I have not asked for a plot in any lines.
You can actually do your conversion in one line. Given your string array natalph:
plaintext = 'hello'; % Your input; could also be "hello"
natoText = strjoin(natalph(char(lower(plaintext))-96), '-');
And the result:
natoText =
string
"Hotel-Echo-Lima-Lima-Oscar"
This uses a trick that character arrays can be treated as numeric arrays of their ASCII equivalent values. The code char(lower(plaintext))-96 converts plaintext to lowercase, then to a character array (if it isn't already) and implicitly converts it to a numeric vector of ASCII values by subtracting 96. Since 'a' is equal to 97, this creates an index vector containing the values 1 ('a') through 26 ('z'). This is used to index the string array natalph, and these are then joined together with hyphens.
I would like to read a data file with a Fortran program, where each line is a list of integers.
Each line has a variable number of integers, separated by a given character (space, comma...).
Sample input:
1,7,3,2
2,8
12,44,13,11
I have a solution to split lines, which I find rather convoluted:
module split
implicit none
contains
function string_to_integers(str, sep) result(a)
integer, allocatable :: a(:)
integer :: i, j, k, n, m, p, r
character(*) :: str
character :: sep, c
character(:), allocatable :: tmp
!First pass: find number of items (m), and maximum length of an item (r)
n = len_trim(str)
m = 1
j = 0
r = 0
do i = 1, n
if(str(i:i) == sep) then
m = m + 1
r = max(r, j)
j = 0
else
j = j + 1
end if
end do
r = max(r, j)
allocate(a(m))
allocate(character(r) :: tmp)
!Second pass: copy each item into temporary string (tmp),
!read an integer from tmp, and write this integer in the output array (a)
tmp(1:r) = " "
j = 0
k = 0
do i = 1, n
c = str(i:i)
if(c == sep) then
k = k + 1
read(tmp, *) p
a(k) = p
tmp(1:r) = " "
j = 0
else
j = j + 1
tmp(j:j) = c
end if
end do
k = k + 1
read(tmp, *) p
a(k) = p
deallocate(tmp)
end function
end module
My question:
Is there a simpler way to do this in Fortran? I mean, reading a list of values where the number of values to read is unknown. The above code looks awkward, and file I/O does not look easy in Fortran.
Also, the main program has to read lines with unknown and unbounded length. I am able to read lines if I assume they are all the same length (see below), but I don't know how to read unbounded lines. I suppose it would need the stream features of Fortran 2003, but I don't know how to write this.
Here is the current program:
program read_data
use split
implicit none
integer :: q
integer, allocatable :: a(:)
character(80) :: line
open(unit=10, file="input.txt", action="read", status="old", form="formatted")
do
read(10, "(A80)", iostat=q) line
if(q /= 0) exit
if(line(1:1) /= "#") then
a = string_to_integers(line, ",")
print *, ubound(a), a
end if
end do
close(10)
end program
A comment about the question: usually I would do this in Python, for example converting a line would be as simple as a = [int(x) for x in line.split(",")], and reading a file is likewise almost a trivial task. And I would do the "real" computing stuff with a Fortran DLL. However, I'd like to improve my Fortran skills on file I/O.
I don't claim it is the shortest possible, but it is much shorter than yours. And once you have it, you can reuse it. I don't completely agree with these claims how Fotran is bad at string processing, I do tokenization, recursive descent parsing and similar stuff just fine in Fortran, although it is easier in some other languages with richer libraries. Sometimes you can use the libraries written in other languages (especially C and C++) in Fortran too.
If you always use the comma you can remove the replacing by comma and thus shorten it even more.
function string_to_integers(str, sep) result(a)
integer, allocatable :: a(:)
character(*) :: str
character :: sep
integer :: i, n_sep
n_sep = 0
do i = 1, len_trim(str)
if (str(i:i)==sep) then
n_sep = n_sep + 1
str(i:i) = ','
end if
end do
allocate(a(n_sep+1))
read(str,*) a
end function
Potential for shortening: view the str as a character array using equivalence or transfer and use count() inside of allocate to get the size of a.
The code assumes that there is just one separator between each number and there is no separator before the first one. If multiple separators are allowed between two numbers, you have to check whether the preceding character is a separator or not
do i = 2, len_trim(str)
if (str(i:i)==sep .and. str(i-1:i-1)/=sep) then
n_sep = n_sep + 1
str(i:i) = ','
end if
end do
My answer is probably too simplistic for your goals but I have spent a lot of time recently reading in strange text files of numbers. My biggest problem is finding where they start (not hard in your case) then my best friend is the list-directed read.
read(unit=10,fmt=*) a
will read in all of the data into vector 'a', done deal. With this method you will not know which line any piece of data came from. If you want to allocate it then you can read the file once and figure out some algorithm to make the array larger than it needs to be, like maybe count the number of lines and you know a max data amount per line (say 21).
status = 0
do while ( status == 0)
line_counter = line_counter + 1
read(unit=10,, iostat=status, fmt=*)
end do
allocate(a(counter*21))
If you want to then eliminate zero values you can remove them or pre-seed the 'a' vector with a negative number if you don't expect any then remove all of those.
Another approach stemming from the other suggestion is to first count the commas then do a read where the loop is controlled by
do j = 1, line_counter ! You determined this on your first read
read(unit=11,fmt=*) a(j,:) ! a is now a 2 dimensional array (line_counter, maxNumberPerLine)
! You have a separate vector numberOfCommas(j) from before
end do
And now you can do whatever you want with these two arrays because you know all the data, which line it came from, and how many data were on each line.
I have a homework program I have run into a problem with. We basically have to take a word (such as MATLAB) and have the function give us the correct score value for it using the rules of Scrabble. There are other things involved such as double word and double point values, but what I'm struggling with is converting to ASCII. I need to get my string into ASCII form and then sum up those values. We only know the bare basics of strings and our teacher is pretty useless. I've tried converting the string into numbers, but that's not exactly working out. Any suggestions?
function[score] = scrabble(word, letterPoints)
doubleword = '#';
doubleletter = '!';
doublew = [findstr(word, doubleword)]
trouble = [findstr(word, doubleletter)]
word = char(word)
gameplay = word;
ASCII = double(gameplay)
score = lower(sum(ASCII));
Building on Francis's post, what I would recommend you do is create a lookup array. You can certainly convert each character into its ASCII equivalent, but then what I would do is have an array where the input is the ASCII code of the character you want (with a bit of modification), and the output will be the point value of the character. Once you find this, you can sum over the points to get your final point score.
I'm going to leave out double points, double letters, blank tiles and that whole gamut of fun stuff in Scrabble for now in order to get what you want working. By consulting Wikipedia, this is the point distribution for each letter encountered in Scrabble.
1 point: A, E, I, O, N, R, T, L, S, U
2 points: D, G
3 points: B, C, M, P
4 points: F, H, V, W, Y
5 points: K
8 points: J, X
10 points: Q, Z
What we're going to do is convert your word into lower case to ensure consistency. Now, if you take a look at the letter a, this corresponds to ASCII code 97. You can verify that by using the double function we talked about earlier:
>> double('a')
97
As there are 26 letters in the alphabet, this means that going from a to z should go from 97 to 122. Because MATLAB starts indexing arrays at 1, what we can do is subtract each of our characters by 96 so that we'll be able to figure out the numerical position of these characters from 1 to 26.
Let's start by building our lookup table. First, I'm going to define a whole bunch of strings. Each string denotes the letters that are associated with each point in Scrabble:
string1point = 'aeionrtlsu';
string2point = 'dg';
string3point = 'bcmp';
string4point = 'fhvwy';
string5point = 'k';
string8point = 'jx';
string10point = 'qz';
Now, we can use each of the strings, convert to double, subtract by 96 then assign each of the corresponding locations to the points for each letter. Let's create our lookup table like so:
lookup = zeros(1,26);
lookup(double(string1point) - 96) = 1;
lookup(double(string2point) - 96) = 2;
lookup(double(string3point) - 96) = 3;
lookup(double(string4point) - 96) = 4;
lookup(double(string5point) - 96) = 5;
lookup(double(string8point) - 96) = 8;
lookup(double(string10point) - 96) = 10;
I first create an array of length 26 through the zeros function. I then figure out where each letter goes and assign to each letter their point values.
Now, the last thing you need to do is take a string, take the lower case to be sure, then convert each character into its ASCII equivalent, subtract by 96, then sum up the values. If we are given... say... MATLAB:
stringToConvert = 'MATLAB';
stringToConvert = lower(stringToConvert);
ASCII = double(stringToConvert) - 96;
value = sum(lookup(ASCII));
Lo and behold... we get:
value =
10
The last line of the above code is crucial. Basically, ASCII will contain a bunch of indexing locations where each number corresponds to the numerical position of where the letter occurs in the alphabet. We use these positions to look up what point / score each letter gives us, and we sum over all of these values.
Part #2
The next part where double point values and double words come to play can be found in my other StackOverflow post here:
Calculate Scrabble word scores for double letters and double words MATLAB
Convert from string to ASCII:
>> myString = 'hello, world';
>> ASCII = double(myString)
ASCII =
104 101 108 108 111 44 32 119 111 114 108 100
Sum up the values:
>> total = sum(ASCII)
total =
1160
The MATLAB help for char() says (emphasis added):
S = char(X) converts array X of nonnegative integer codes into a character array. Valid codes range from 0 to 65535, where codes 0 through 127 correspond to 7-bit ASCII characters. The characters that MATLABĀ® can process (other than 7-bit ASCII characters) depend upon your current locale setting. To convert characters into a numeric array, use the double function.
ASCII chart here.