Reading a file of lists of integers in Fortran - string

I would like to read a data file with a Fortran program, where each line is a list of integers.
Each line has a variable number of integers, separated by a given character (space, comma...).
Sample input:
1,7,3,2
2,8
12,44,13,11
I have a solution to split lines, which I find rather convoluted:
module split
implicit none
contains
function string_to_integers(str, sep) result(a)
integer, allocatable :: a(:)
integer :: i, j, k, n, m, p, r
character(*) :: str
character :: sep, c
character(:), allocatable :: tmp
!First pass: find number of items (m), and maximum length of an item (r)
n = len_trim(str)
m = 1
j = 0
r = 0
do i = 1, n
if(str(i:i) == sep) then
m = m + 1
r = max(r, j)
j = 0
else
j = j + 1
end if
end do
r = max(r, j)
allocate(a(m))
allocate(character(r) :: tmp)
!Second pass: copy each item into temporary string (tmp),
!read an integer from tmp, and write this integer in the output array (a)
tmp(1:r) = " "
j = 0
k = 0
do i = 1, n
c = str(i:i)
if(c == sep) then
k = k + 1
read(tmp, *) p
a(k) = p
tmp(1:r) = " "
j = 0
else
j = j + 1
tmp(j:j) = c
end if
end do
k = k + 1
read(tmp, *) p
a(k) = p
deallocate(tmp)
end function
end module
My question:
Is there a simpler way to do this in Fortran? I mean, reading a list of values where the number of values to read is unknown. The above code looks awkward, and file I/O does not look easy in Fortran.
Also, the main program has to read lines with unknown and unbounded length. I am able to read lines if I assume they are all the same length (see below), but I don't know how to read unbounded lines. I suppose it would need the stream features of Fortran 2003, but I don't know how to write this.
Here is the current program:
program read_data
use split
implicit none
integer :: q
integer, allocatable :: a(:)
character(80) :: line
open(unit=10, file="input.txt", action="read", status="old", form="formatted")
do
read(10, "(A80)", iostat=q) line
if(q /= 0) exit
if(line(1:1) /= "#") then
a = string_to_integers(line, ",")
print *, ubound(a), a
end if
end do
close(10)
end program
A comment about the question: usually I would do this in Python, for example converting a line would be as simple as a = [int(x) for x in line.split(",")], and reading a file is likewise almost a trivial task. And I would do the "real" computing stuff with a Fortran DLL. However, I'd like to improve my Fortran skills on file I/O.

I don't claim it is the shortest possible, but it is much shorter than yours. And once you have it, you can reuse it. I don't completely agree with these claims how Fotran is bad at string processing, I do tokenization, recursive descent parsing and similar stuff just fine in Fortran, although it is easier in some other languages with richer libraries. Sometimes you can use the libraries written in other languages (especially C and C++) in Fortran too.
If you always use the comma you can remove the replacing by comma and thus shorten it even more.
function string_to_integers(str, sep) result(a)
integer, allocatable :: a(:)
character(*) :: str
character :: sep
integer :: i, n_sep
n_sep = 0
do i = 1, len_trim(str)
if (str(i:i)==sep) then
n_sep = n_sep + 1
str(i:i) = ','
end if
end do
allocate(a(n_sep+1))
read(str,*) a
end function
Potential for shortening: view the str as a character array using equivalence or transfer and use count() inside of allocate to get the size of a.
The code assumes that there is just one separator between each number and there is no separator before the first one. If multiple separators are allowed between two numbers, you have to check whether the preceding character is a separator or not
do i = 2, len_trim(str)
if (str(i:i)==sep .and. str(i-1:i-1)/=sep) then
n_sep = n_sep + 1
str(i:i) = ','
end if
end do

My answer is probably too simplistic for your goals but I have spent a lot of time recently reading in strange text files of numbers. My biggest problem is finding where they start (not hard in your case) then my best friend is the list-directed read.
read(unit=10,fmt=*) a
will read in all of the data into vector 'a', done deal. With this method you will not know which line any piece of data came from. If you want to allocate it then you can read the file once and figure out some algorithm to make the array larger than it needs to be, like maybe count the number of lines and you know a max data amount per line (say 21).
status = 0
do while ( status == 0)
line_counter = line_counter + 1
read(unit=10,, iostat=status, fmt=*)
end do
allocate(a(counter*21))
If you want to then eliminate zero values you can remove them or pre-seed the 'a' vector with a negative number if you don't expect any then remove all of those.
Another approach stemming from the other suggestion is to first count the commas then do a read where the loop is controlled by
do j = 1, line_counter ! You determined this on your first read
read(unit=11,fmt=*) a(j,:) ! a is now a 2 dimensional array (line_counter, maxNumberPerLine)
! You have a separate vector numberOfCommas(j) from before
end do
And now you can do whatever you want with these two arrays because you know all the data, which line it came from, and how many data were on each line.

Related

Analyzing input from an internal unit in fortran [duplicate]

I have specific dataformat, say 'n' (arbitrary) row and '4' columns. If 'n' is '10', the example data would go like this.
1.01e+00 -2.01e-02 -3.01e-01 4.01e+02
1.02e+00 -2.02e-02 -3.02e-01 4.02e+02
1.03e+00 -2.03e-02 -3.03e-01 4.03e+02
1.04e+00 -2.04e-02 -3.04e-01 4.04e+02
1.05e+00 -2.05e-02 -3.05e-01 4.05e+02
1.06e+00 -2.06e-02 -3.06e-01 4.06e+02
1.07e+00 -2.07e-02 -3.07e-01 4.07e+02
1.08e+00 -2.08e-02 -3.08e-01 4.07e+02
1.09e+00 -2.09e-02 -3.09e-01 4.09e+02
1.10e+00 -2.10e-02 -3.10e-01 4.10e+02
Constraints in building this input would be
data should have '4' columns.
data separated by white spaces.
I want to implement a feature to check whether the input file has '4' columns in every row, and built my own based on the 'M.S.B's answer in the post Reading data file in Fortran with known number of lines but unknown number of entries in each line.
program readtest
use :: iso_fortran_env
implicit none
character(len=512) :: buffer
integer :: i, i_line, n, io, pos, pos_tmp, n_space
integer,parameter :: max_len = 512
character(len=max_len) :: filename
filename = 'data_wrong.dat'
open(42, file=trim(filename), status='old', action='read')
print *, '+++++++++++++++++++++++++++++++++++'
print *, '+ Count lines +'
print *, '+++++++++++++++++++++++++++++++++++'
n = 0
i_line = 0
do
pos = 1
pos_tmp = 1
i_line = i_line+1
read(42, '(a)', iostat=io) buffer
(*1)! Count blank spaces.
n_space = 0
do
pos = index(buffer(pos+1:), " ") + pos
if (pos /= 0) then
if (pos > pos_tmp+1) then
n_space = n_space+1
pos_tmp = pos
else
pos_tmp = pos
end if
endif
if (pos == max_len) then
exit
end if
end do
pos_tmp = pos
if (io /= 0) then
exit
end if
print *, '> line : ', i_line, ' n_space : ', n_space
n = n+1
end do
print *, ' >> number of line = ', n
end program
If I run the above program with a input file with some wrong rows like follows,
1.01e+00 -2.01e-02 -3.01e-01 4.01e+02
1.02e+00 -2.02e-02 -3.02e-01 4.02e+02
1.03e+00 -2.03e-02 -3.03e-01 4.03e+02
1.04e+00 -2.04e-02 -3.04e-01 4.04e+02
1.05e+00 -2.05e-02 -3.05e-01 4.05e+02
1.06e+00 -2.06e-02 -3.06e-01 4.06e+02
1.07e+00 -2.07e-02 -3.07e-01 4.07e+02
1.0 2.0 3.0
1.08e+00 -2.08e-02 -3.08e-01 4.07e+02 1.00
1.09e+00 -2.09e-02 -3.09e-01 4.09e+02
1.10e+00 -2.10e-02 -3.10e-01 4.10e+02
The output is like this,
+++++++++++++++++++++++++++++++++++
+ Count lines +
+++++++++++++++++++++++++++++++++++
> line : 1 n_space : 4
> line : 2 n_space : 4
> line : 3 n_space : 4
> line : 4 n_space : 4
> line : 5 n_space : 4
> line : 6 n_space : 4
> line : 7 n_space : 4
> line : 8 n_space : 3 (*2)
> line : 9 n_space : 5 (*3)
> line : 10 n_space : 4
> line : 11 n_space : 4
>> number of line = 11
And you can see that the wrong rows are properly detected as I intended (see (*2) and (*3)), and I can write 'if' statements to make some error messages.
But I think my code is 'extremely' ugly since I had to do something like (*1) in the code to count consecutive white spaces as one space. I think there would be much more elegant way to ensure the rows contain only '4' column each, say,
read(*,'4(X, A)') line
(which didn't work)
And also my program would fail if the length of 'buffer' exceeds 'max_len' which is set to '512' in this case. Indeed '512' should be enough for most practical purposes, I also want my checking subroutine to be robust in this way.
So, I want to improve my subroutine in at least these aspects
Want it to be more elegant (not as (*1))
Be more general (especially in regards to 'max_len')
Does anyone has some experience in building this kind of input-checking subroutine ??
Any comments would be highly appreciated.
Thank you for reading the question.
Without knowledge of the exact data format, I think it would be rather difficult to achieve what you want (or at least, I wouldn't know how to do it).
In the most general case, I think your space counting idea is the most robust and correct.
It can be adapted to avoid the maximum string length problem you describe.
In the following code, I go through the data as an unformatted, stream access file.
Basically you read every character and take note of new_lines and spaces.
As you did, you use spaces to count to columns (skipping double spaces) and new_line characters to count the rows.
However, here we are not reading the entire line as a string and going through it to find spaces; we read char by char, avoiding the fixed string length problem and we also end up with a single loop. Hope it helps.
EDIT: now handles white spaces at beginning at end of line and empty lines
program readtest
use :: iso_fortran_env
implicit none
character :: old_char, new_char
integer :: line, io, cols
logical :: beg_line
integer,parameter :: max_len = 512
character(len=max_len) :: filename
filename = 'data_wrong.txt'
! Output format to be used later
100 format (a, 3x, i0, a, 3x , i0)
open(42, file=trim(filename), status='old', action='read', &
form="unformatted", access="stream")
! set utils
old_char = " "
line = 0
beg_line = .true.
cols = 0
! Start scannig char by char
do
read(42, iostat = io) new_char
! Exit if EOF
if (io < 0) then
exit
end if
! Deal with empty lines
if (beg_line .and. new_char==new_line(new_char)) then
line = line + 1
write(*, 100, advance="no") "Line number:", line, &
"; Columns: Number", cols
write(*,'(6x, a5)') "EMPTYLINE"
! Deal with beginning of line for white spaces
elseif (beg_line) then
beg_line = .false.
! this indicates new columns
elseif (new_char==" " .and. old_char/=" ") then
cols = cols + 1
! End of line: time to print
elseif (new_char==new_line(new_char)) then
if (old_char/=" ") then
cols = cols+1
endif
line = line + 1
! Printing out results
write(*, 100, advance="no") "Line number:", line, &
"; Columns: Number", cols
if (cols == 4) then
write(*,'(6x, a5)') "OK"
else
write(*,'(6x, a5)') "ERROR"
end if
! Restart with a new line (reset counters)
cols = 0
beg_line = .true.
end if
old_char = new_char
end do
end program
This is the output of this program:
Line number: 1; Columns number: 4 OK
Line number: 2; Columns number: 4 OK
Line number: 3; Columns number: 4 OK
Line number: 4; Columns number: 4 OK
Line number: 5; Columns number: 4 OK
Line number: 6; Columns number: 4 OK
Line number: 7; Columns number: 4 OK
Line number: 8; Columns number: 3 ERROR
Line number: 9; Columns number: 5 ERROR
Line number: 10; Columns number: 4 OK
Line number: 11; Columns number: 4 OK
If you knew your data format, you could read your lines in a vector of dimension 4 and use iostat variable to print out an error on each line where iostat is an integer greater than 0.
Instead of counting whitespace you can use manipulation of substrings to get what you want. A simple example follows:
program foo
implicit none
character(len=512) str ! Assume str is sufficiently long buffer
integer fd, cnt, m, n
open(newunit=fd, file='test.dat', status='old')
do
cnt = 0
read(fd,'(A)',end=10) str
str = adjustl(str) ! Eliminate possible leading whitespace
do
n = index(str, ' ') ! Find first space
if (n /= 0) then
write(*, '(A)', advance='no') str(1:n)
str = adjustl(str(n+1:))
end if
if (len_trim(str) == 0) exit ! Trailing whitespace
cnt = cnt + 1
end do
if (cnt /= 3) then
write(*,'(A)') ' Error'
else
write(*,*)
end if
end do
10 close(fd)
end program foo
this should read any line of reasonable length (up to the line limit your compiler defaults to, which is generally 2GB now-adays). You could change it to stream I/O to have no limit but most Fortran compilers have trouble reading stream I/O from stdin, which this example reads from. So if the line looks anything like a list of numbers it should read them, tell you how many it read, and let you know if it had an error reading any value as a number (character strings, strings bigger than the size of a REAL value, ....). All the parts here are explained on the Fortran Wiki, but to keep it short this is a stripped down version that just puts the pieces together. The oddest behavior it would have is that if you entered something like this with a slash in it
10 20,,30,40e4 50 / this is a list of numbers
it would treat everything after the slash as a comment and not generate a non-zero status return while returning five values. For a more detailed explanation of the code I think the annotated pieces on the Wiki explain how it works. In the search, look for "getvals" and "readline".
So with this program you can read a line and if the return status is zero and the number of values read is four you should be good except for a few dusty corners where the lines would definitely not look like a list of numbers.
module M_getvals
private
public getvals, readline
implicit none
contains
subroutine getvals(line,values,icount,ierr)
character(len=*),intent(in) :: line
real :: values(:)
integer,intent(out) :: icount, ierr
character(len=:),allocatable :: buffer
character(len=len(line)) :: words(size(values))
integer :: ios, i
ierr=0
words=' '
buffer=trim(line)//"/"
read(buffer,*,iostat=ios) words
icount=0
do i=1,size(values)
if(words(i).eq.'') cycle
read(words(i),*,iostat=ios)values(icount+1)
if(ios.eq.0)then
icount=icount+1
else
ierr=ios
write(*,*)'*getvals* WARNING:['//trim(words(i))//'] is not a number'
endif
enddo
end subroutine getvals
subroutine readline(line,ier)
character(len=:),allocatable,intent(out) :: line
integer,intent(out) :: ier
integer,parameter :: buflen=1024
character(len=buflen) :: buffer
integer :: last, isize
line=''
ier=0
INFINITE: do
read(*,iostat=ier,fmt='(a)',advance='no',size=isize) buffer
if(isize.gt.0)line=line//buffer(:isize)
if(is_iostat_eor(ier))then
last=len(line)
if(last.ne.0)then
if(line(last:last).eq.'\\')then
line=line(:last-1)
cycle INFINITE
endif
endif
ier=0
exit INFINITE
elseif(ier.ne.0)then
exit INFINITE
endif
enddo INFINITE
line=trim(line)
end subroutine readline
end module M_getvals
program tryit
use M_getvals, only: getvals, readline
implicit none
character(len=:),allocatable :: line
real,allocatable :: values(:)
integer :: icount, ier, ierr
INFINITE: do
call readline(line,ier)
if(allocated(values))deallocate(values)
allocate(values(len(line)/2+1))
if(ier.ne.0)exit INFINITE
call getvals(line,values,icount,ierr)
write(*,'(*(g0,1x))')'VALUES=',values(:icount),'NUMBER OF VALUES=',icount,'STATUS=',ierr
enddo INFINITE
end program tryit
Honesty, it should work reasonably with just about any line you throw at it.
PS:
If you are always reading four values, using list-directed I/O and checking the iostat= value on READ and checking if you hit EOR would be very simple (just a few lines) but since you said you wanted to read lines of arbitrary length I am assuming four values on a line was just an example and you wanted something very generic.

How would you solve the letter changer in Julia?

I found this challenge:
Using your language, have the function LetterChanges(str) take the str parameter being passed and modify it using the following algorithm. Replace every letter in the string with the letter following it in the alphabet (ie. c becomes d, z becomes a). Then capitalize every vowel in this new string (a, e, i, o, u) and finally return this modified string.
I am new in Julia, and I was challenging myself in this challenge. I found this challenge very hard in Julia lang and I could not find a solution.
Here I tried to solve in the way below, but I got error: the x value is not defined
How would you solve this?
function LetterChanges(stringis::AbstractString)
alphabet = "abcdefghijklmnopqrstuvwxyz"
vohels = "aeiou"
for Char(x) in split(stringis, "")
if x == 'z'
x = 'a'
elseif x in vohels
uppercase(x)
else
Int(x)+1
Char(x)
println(x)
end
end
end
Thank you
As a side note:
The proposed solution works properly. However, if you would need high performance (which you probably do not given the source of your problem) it is more efficient to use string builder:
function LetterChanges2(str::AbstractString)
v = Set("aeiou")
#sprint(sizehint=sizeof(str)) do io # use on Julia 0.7 - new keyword argument
sprint() do io # use on Julia 0.6.2
for c in str
c = c == 'z' ? 'a' : c+1 # we assume that we got only letters from 'a':'z'
print(io, c in v ? uppercase(c) : c)
end
end
end
it is over 10x faster than the above.
EDIT: for Julia 0.7 this is a bit faster:
function LetterChanges2(str::AbstractString)
v = BitSet(collect(Int,"aeiouy"))
sprint(sizehint=sizeof(str)) do io # use on Julia 0.7 - new keyword argument
for c in str
c = c == 'z' ? 'a' : c+1 # we assume that we got only letters from 'a':'z'
write(io, Int(c) in v ? uppercase(c) : c)
end
end
end
There is a logic error. It says "Replace every letter in the string with the letter following it in the alphabet. Then capitalize every vowel in this new string". Your code checks, if it is a vowel. Then it capitalizes it or replaces it. That's different behavior. You have to first replace and then to check if it is a vowel.
You are replacing 'a' by 'Z'. You should be replacing 'z' by 'a'
The function split(stringis, "") returns an array of strings. You can't store these strings in Char(x). You have to store them in x and then you can transform theses string to char with c = x[1].
After transforming a char you have to store it in the variable: c = uppercase(c)
You don't need to transform a char into int. You can add a number to a char: c = c + 1
You have to store the new characters in a string and return them.
function LetterChanges(stringis::AbstractString)
# some code
str = ""
for x in split(stringis, "")
c = x[1]
# logic
str = "$str$c"
end
return str
end
Here's another version that is a bit faster than #BogumilKaminski's answer on version 0.6, but that might be different on 0.7. On the other hand, it might be a little less intimidating than the do-block magic ;)
function changeletters(str::String)
vowels = "aeiouy"
carr = Vector{Char}(length(str))
i = 0
for c in str
newchar = c == 'z' ? 'a' : c + 1
carr[i+=1] = newchar in vowels ? uppercase(newchar) : newchar
end
return String(carr)
end
At the risk of being accused of cheating, this is a dictionary-based approach:
function change_letters(s::String)::String
k = collect('a':'z')
v = vcat(collect('b':'z'), 'A')
d = Dict{Char, Char}(zip(k, v))
for c in Set("eiou")
d[c - 1] = uppercase(d[c - 1])
end
b = IOBuffer()
for c in s
print(b, d[c])
end
return String(take!(b))
end
It seems to compare well in speed terms with the other Julia 0.6 methods for long strings (e.g. 100,000 characters). There's a bit of unnecessary overhead in constructing the dictionary which is noticeable on small strings, but I'm far too lazy to type out the 'a'=>'b' construction long-hand!

Converting letters into NATO alphabet in MATLAB

I want to write a code in MATLAB that converts a letter into NATO alphabet. Such as the word 'hello' would be re-written as Hotel-Echo-Lima-Lima-Oscar. I have been having some trouble with the code. So far I have the following:
function natoText = textToNato(plaintext)
plaintext = lower(plaintext);
r = zeros(1, length(plaintext))
%Define my NATO alphabet
natalph = ["Alpha","Bravo","Charlie","Delta","Echo","Foxtrot","Golf", ...
"Hotel","India","Juliet","Kilo","Lima","Mike","November","Oscar", ...
"Papa","Quebec","Romeo","Sierra","Tango","Uniform","Victor",...
"Whiskey","Xray","Yankee","Zulu"];
%Define the normal lower alphabet
noralpha = ['a' : 'z'];
%Now we need to make a loop for matlab to check for each letter
for i = 1:length(text)
for j = 1:26
n = r(i) == natalph(j);
if noralpha(j) == text(i) : n
else r(i) = r(i)
natoText = ''
end
end
end
for v = 1:length(plaintext)
natoText = natoText + r(v) + ''
natoText = natoText(:,-1)
end
end
I know the above code is a mess and I am a bit in doubt what really I have been doing. Is there anyone who knows a better way of doing this? Can I modify the above code so that it works?
It is because now when I run the code, I am getting an empty plot, which I don't know why because I have not asked for a plot in any lines.
You can actually do your conversion in one line. Given your string array natalph:
plaintext = 'hello'; % Your input; could also be "hello"
natoText = strjoin(natalph(char(lower(plaintext))-96), '-');
And the result:
natoText =
string
"Hotel-Echo-Lima-Lima-Oscar"
This uses a trick that character arrays can be treated as numeric arrays of their ASCII equivalent values. The code char(lower(plaintext))-96 converts plaintext to lowercase, then to a character array (if it isn't already) and implicitly converts it to a numeric vector of ASCII values by subtracting 96. Since 'a' is equal to 97, this creates an index vector containing the values 1 ('a') through 26 ('z'). This is used to index the string array natalph, and these are then joined together with hyphens.

Matlab, order cells of strings according to the first one

I have 2 cell of strings and I would like to order them according to the first one.
A = {'a';'b';'c'}
B = {'b';'a';'c'}
idx = [2,1,3] % TO FIND
B=B(idx);
I would like to find a way to find idx...
Use the second output of ismember. ismember tells you whether or not values in the first set are anywhere in the second set. The second output tells you where these values are located if we find anything. As such:
A = {'a';'b';'c'}
B = {'b';'a';'c'}
[~,idx] = ismember(A, B);
Note that there is a minor typo when you declared your cell arrays. You have a colon in between b and c for A and a and c for B. I placed a semi-colon there for both for correctness.
Therefore, we get:
idx =
2
1
3
Benchmarking
We have three very good algorithms here. As such, let's see how this performs by doing a benchmarking test. What I'm going to do is generate a 10000 x 1 random character array of lower case letters. This will then be encapsulated into a 10000 x 1 cell array, where each cell is a single character array. I construct A this way, and B is a random permutation of the elements in A. This is the code that I wrote to do this for us:
letters = char(97 + (0:25));
rng(123); %// Set seed for reproducibility
ind = randi(26, [10000, 1]);
lettersMat = letters(ind);
A = mat2cell(lettersMat, ones(10000,1), 1);
B = A(randperm(10000));
Now... here comes the testing code:
clear all;
close all;
letters = char(97 + (0:25));
rng(123); %// Set seed for reproducibility
ind = randi(26, [10000, 1]);
lettersMat = letters(ind);
A = mat2cell(lettersMat, 1, ones(10000,1));
B = A(randperm(10000));
tic;
[~,idx] = ismember(A,B);
t = toc;
fprintf('ismember: %f\n', t);
clear idx; %// Make sure test is unbiased
tic;
[~,idx] = max(bsxfun(#eq,char(A),char(B)'));
t = toc;
fprintf('bsxfun: %f\n', t);
clear idx; %// Make sure test is unbiased
tic;
[~, indA] = sort(A);
[~, indB] = sort(B);
idx = indB(indA);
t = toc;
fprintf('sort: %f\n', t);
This is what I get for timing:
ismember: 0.058947
bsxfun: 0.110809
sort: 0.006054
Luis Mendo's approach is the fastest, followed by ismember, and then finally bsxfun. For code compactness, ismember is preferred but for performance, sort is better. Personally, I think bsxfun should win because it's such a nice function to use ;).
This seems to be significantly faster than using ismember (although admittedly less clear than #rayryeng's answer). With thanks to #Divakar for his correction on this answer.
[~, indA] = sort(A);
[~, indB] = sort(B);
idx = indA(indB);
I had to jump in as it seems runtime performance could be a criteria here :)
Assuming that you are dealing with scalar strings(one character in each cell), here's my take that works even when you have not-commmon elements between A and B and uses the very powerful bsxfun and as such I am really hoping this would be runtime-efficient -
[v,idx] = max(bsxfun(#eq,char(A),char(B)'));
idx = v.*idx
Example -
A =
'a' 'b' 'c' 'd'
B =
'b' 'a' 'c' 'e'
idx =
2 1 3 0
For a specific case when you have no not-common elements between A and B, it becomes a one-liner -
[~,idx] = max(bsxfun(#eq,char(A),char(B)'))
Example -
A =
'a' 'b' 'c'
B =
'b' 'a' 'c'
idx =
2 1 3

Fortran: Set operations

Fortran: There are two large arrays of integers, the goal is to find out if they have any number in common or not, how?
You may consider that both are in the same size (case 1) or in different sizes (case 2). It is possible also that they have many common numbers repeated, so this should be handled to avoid unnecessary search or operators.
The simplest way is to do Brute-Force search which is not appropriate. We are thinking about SET operations similar to Python as the following:
a = set([integers])
b = set([integers])
incommon = len(a.intersection(b)) > 0 #True if so, otherwise False
So for example:
a = [1,2,3,4,5]
b = [0,6,7,8,9]
sa = set(a)
sb = set(b)
incommon = len(sa.intersection(sb)) > 0
>>> incommon: False
b = [0,6,7,8,1]
incommon = len(sa.intersection(sb)) > 0
>>> incommon: True
How to implement this in Fortran? note that arrays are of large size (>10000) and the operation would repeat for million times!
Updates:
[regarding the comment for the question] We absolutely have tried many ways that we knew. As mentioned BFS method, for example. It works but is not efficient for two reasons: 1) the nature of the method which requires large iterations, 2) the code we could implement. The accepted answer (by yamajun) was very informative to us much more than the question itself. How easy implementation of Quick-Sort, Shrink and Isin all are very nicely thought and elegantly implemented. Our appreciation goes for such prompt and perfect solution.
Maybe this will work.
added from here
The main idea is using intrinsic function ANY().
ANY(x(:) == y) returns .true. if a scalar value y exists in an array x. When y is also an array ANY(x == y) returns x(1)==y(1) & x(2)==y(2) &..., so we have to use do loop for each element of y.
Now we try to delete duplicate numbers in the arrays.
First we sort the arrays. Quick-sort can be written concisely in a Haskell-like manner.
(Reference : Arjen Markus, ACM Fortran Forum 27 (2008) 2-5.)
But because recursion consumes stacks, Shell-sort might be a better choice, which does not require extra memories. It is often stated in textbooks that Shell-sort works in O(N^3/2~5/4), but it works much faster using special gap functions.wikipedia
Next we delete duplicate numbers by comparing successive elements using the idea of zip pairs. [x(2)/=x(1), ..., x(n)/=x(n-1)] We need to add extra one element to match array size. The intrinsic function PACK() is used as a Filter.
to here
program SetAny
implicit none
integer, allocatable :: ia(:), ib(:)
! fortran2008
! allocate(ia, source = [1,2,3,4,5,2,3,4,5,2,3,4,5,2,3,4,5,2,3,4,5,2,3,4,5,2,3,4,5,2,3,4,5,2,3,4,5])
! allocate(ib, source = [0,6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9])
allocate(ia(size([1,2,3,4,5,2,3,4,5,2,3,4,5,2,3,4,5,2,3,4,5,2,3,4,5,2,3,4,5,2,3,4,5,2,3,4,5])))
allocate(ib(size([0,6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9])))
ia = [1,2,3,4,5,2,3,4,5,2,3,4,5,2,3,4,5,2,3,4,5,2,3,4,5,2,3,4,5,2,3,4,5,2,3,4,5]
ib = [0,6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9]
print *, isin( shrnk( ia ), shrnk( ib ) )
stop
contains
logical pure function isin(ia, ib)
integer, intent(in) :: ia(:), ib(:)
integer :: i
isin = .true.
do i = 1, size(ib)
if ( any(ia == ib(i)) ) return
end do
isin = .false.
return
end function isin
pure function shrnk(ia) result(res)
integer, intent(in) :: ia(:)
integer, allocatable :: res(:) ! f2003
integer :: iwk(size(ia))
iwk = qsort(ia)
res = pack(iwk, [.true., iwk(2:) /= iwk(1:)]) ! f2003
return
end function shrnk
pure recursive function qsort(ia) result(res)
integer, intent(in) :: ia(:)
integer :: res(size(ia))
if (size(ia) .lt. 2) then
res = ia
else
res = [ qsort( pack(ia(2:), ia(2:) &lt ia(1)) ), ia(1), qsort( pack(ia(2:), ia(2:) &gt= ia(1)) ) ]
end if
return
end function qsort
end program SetAny
Shell sort
pure function ssort(ix) ! Shell Sort
integer, intent(in) :: ix(:)
integer, allocatable :: ssort(:)
integer :: i, j, k, kmax, igap, itmp
ssort = ix
kmax = 0
do ! Tokuda's gap sequence ; h_k=Ceiling( (9(9/4)^k-4)/5 ), h_k &lt 4N/9 ; O(N)~NlogN
if ( ceiling( (9.0 * (9.0 / 4.0)**(kmax + 1) - 4.0) / 5.0 ) &gt size(ix) * 4.0 / 9.0 ) exit
kmax = kmax + 1
end do
do k = kmax, 0, -1
igap = ceiling( (9.0 * (9.0 / 4.0)**k - 4.0) / 5.0 )
do i = igap, size(ix)
do j = i - igap, 1, -igap
if ( ssort(j) &lt= ssort(j + igap) ) exit
itmp = ssort(j)
ssort(j) = ssort(j + igap)
ssort(j + igap) = itmp
end do
end do
end do
return
end function ssort

Resources