Fortran: Set operations - search

Fortran: There are two large arrays of integers, the goal is to find out if they have any number in common or not, how?
You may consider that both are in the same size (case 1) or in different sizes (case 2). It is possible also that they have many common numbers repeated, so this should be handled to avoid unnecessary search or operators.
The simplest way is to do Brute-Force search which is not appropriate. We are thinking about SET operations similar to Python as the following:
a = set([integers])
b = set([integers])
incommon = len(a.intersection(b)) > 0 #True if so, otherwise False
So for example:
a = [1,2,3,4,5]
b = [0,6,7,8,9]
sa = set(a)
sb = set(b)
incommon = len(sa.intersection(sb)) > 0
>>> incommon: False
b = [0,6,7,8,1]
incommon = len(sa.intersection(sb)) > 0
>>> incommon: True
How to implement this in Fortran? note that arrays are of large size (>10000) and the operation would repeat for million times!
Updates:
[regarding the comment for the question] We absolutely have tried many ways that we knew. As mentioned BFS method, for example. It works but is not efficient for two reasons: 1) the nature of the method which requires large iterations, 2) the code we could implement. The accepted answer (by yamajun) was very informative to us much more than the question itself. How easy implementation of Quick-Sort, Shrink and Isin all are very nicely thought and elegantly implemented. Our appreciation goes for such prompt and perfect solution.

Maybe this will work.
added from here
The main idea is using intrinsic function ANY().
ANY(x(:) == y) returns .true. if a scalar value y exists in an array x. When y is also an array ANY(x == y) returns x(1)==y(1) & x(2)==y(2) &..., so we have to use do loop for each element of y.
Now we try to delete duplicate numbers in the arrays.
First we sort the arrays. Quick-sort can be written concisely in a Haskell-like manner.
(Reference : Arjen Markus, ACM Fortran Forum 27 (2008) 2-5.)
But because recursion consumes stacks, Shell-sort might be a better choice, which does not require extra memories. It is often stated in textbooks that Shell-sort works in O(N^3/2~5/4), but it works much faster using special gap functions.wikipedia
Next we delete duplicate numbers by comparing successive elements using the idea of zip pairs. [x(2)/=x(1), ..., x(n)/=x(n-1)] We need to add extra one element to match array size. The intrinsic function PACK() is used as a Filter.
to here
program SetAny
implicit none
integer, allocatable :: ia(:), ib(:)
! fortran2008
! allocate(ia, source = [1,2,3,4,5,2,3,4,5,2,3,4,5,2,3,4,5,2,3,4,5,2,3,4,5,2,3,4,5,2,3,4,5,2,3,4,5])
! allocate(ib, source = [0,6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9])
allocate(ia(size([1,2,3,4,5,2,3,4,5,2,3,4,5,2,3,4,5,2,3,4,5,2,3,4,5,2,3,4,5,2,3,4,5,2,3,4,5])))
allocate(ib(size([0,6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9])))
ia = [1,2,3,4,5,2,3,4,5,2,3,4,5,2,3,4,5,2,3,4,5,2,3,4,5,2,3,4,5,2,3,4,5,2,3,4,5]
ib = [0,6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9]
print *, isin( shrnk( ia ), shrnk( ib ) )
stop
contains
logical pure function isin(ia, ib)
integer, intent(in) :: ia(:), ib(:)
integer :: i
isin = .true.
do i = 1, size(ib)
if ( any(ia == ib(i)) ) return
end do
isin = .false.
return
end function isin
pure function shrnk(ia) result(res)
integer, intent(in) :: ia(:)
integer, allocatable :: res(:) ! f2003
integer :: iwk(size(ia))
iwk = qsort(ia)
res = pack(iwk, [.true., iwk(2:) /= iwk(1:)]) ! f2003
return
end function shrnk
pure recursive function qsort(ia) result(res)
integer, intent(in) :: ia(:)
integer :: res(size(ia))
if (size(ia) .lt. 2) then
res = ia
else
res = [ qsort( pack(ia(2:), ia(2:) &lt ia(1)) ), ia(1), qsort( pack(ia(2:), ia(2:) &gt= ia(1)) ) ]
end if
return
end function qsort
end program SetAny
Shell sort
pure function ssort(ix) ! Shell Sort
integer, intent(in) :: ix(:)
integer, allocatable :: ssort(:)
integer :: i, j, k, kmax, igap, itmp
ssort = ix
kmax = 0
do ! Tokuda's gap sequence ; h_k=Ceiling( (9(9/4)^k-4)/5 ), h_k &lt 4N/9 ; O(N)~NlogN
if ( ceiling( (9.0 * (9.0 / 4.0)**(kmax + 1) - 4.0) / 5.0 ) &gt size(ix) * 4.0 / 9.0 ) exit
kmax = kmax + 1
end do
do k = kmax, 0, -1
igap = ceiling( (9.0 * (9.0 / 4.0)**k - 4.0) / 5.0 )
do i = igap, size(ix)
do j = i - igap, 1, -igap
if ( ssort(j) &lt= ssort(j + igap) ) exit
itmp = ssort(j)
ssort(j) = ssort(j + igap)
ssort(j + igap) = itmp
end do
end do
end do
return
end function ssort

Related

Pythonic way of finding differences between two strings

a = "3030104AF43B000A3F1D200619D09FE00403031324354650004FFFFF"
b = "3030104BE3B000C3DF1D200617183BA00403030335F5B6F0004FFFFF"
Let's say a and b are two hexadecimal string values of which are equal and even in length. They also share the same format such that differences occurring between both strings always happen at the same position (But, initially I do not know which position these differences occur). For example a's first six digits are the same as b's first six digits. i.e., 303010. The following 4 digits are different 4AF43B in a compared with b, after which the next two digit are the same for a and b (00). This pattern follows on until the end of both strings.
I have written code to store the differences occurring as different elements in a list.
seed = "3030104AF43B000A3F1D200619D09FE00403031324354650004FFFFF"
seed2 = "3030104BE3B000C3DF1D200617183BA00403030335F5B6F0004FFFFF"
seed = seed.rstrip("FF")
seed2 = seed2.rstrip("FF")
differences_list1 = []
differences_list2 = []
sequence1 = ""
sequence2 = ""
for pair in range(int(len(seed) / 2)):
data_pair1 = seed[pair * 2:(pair * 2) + 2]
data_pair2 = seed2[pair * 2:(pair * 2) + 2]
if data_pair1 == data_pair2:
if sequence1 == "" and sequence2 == "":
continue
# here, we know it is not an empty sequence
differences_list1.append(sequence1)
differences_list2.append(sequence2)
sequence1 = ""
sequence2 = ""
continue
# when they are not equal to each other
sequence1 = sequence1 + data_pair1
sequence2 = sequence2 + data_pair2
print(str(differences_list1))
print(str(differences_list2))
Output (which I want):
['4AF43B', '0A3F', '19D09FE0', '1324354650']
['4BE3B0', 'C3DF', '17183BA0', '0335F5B6F0']
I've gotten the output as I desired somehow but I would like to know how can I improve/write my code in a more pythonic way (specifically python3.9)?

How would you solve the letter changer in Julia?

I found this challenge:
Using your language, have the function LetterChanges(str) take the str parameter being passed and modify it using the following algorithm. Replace every letter in the string with the letter following it in the alphabet (ie. c becomes d, z becomes a). Then capitalize every vowel in this new string (a, e, i, o, u) and finally return this modified string.
I am new in Julia, and I was challenging myself in this challenge. I found this challenge very hard in Julia lang and I could not find a solution.
Here I tried to solve in the way below, but I got error: the x value is not defined
How would you solve this?
function LetterChanges(stringis::AbstractString)
alphabet = "abcdefghijklmnopqrstuvwxyz"
vohels = "aeiou"
for Char(x) in split(stringis, "")
if x == 'z'
x = 'a'
elseif x in vohels
uppercase(x)
else
Int(x)+1
Char(x)
println(x)
end
end
end
Thank you
As a side note:
The proposed solution works properly. However, if you would need high performance (which you probably do not given the source of your problem) it is more efficient to use string builder:
function LetterChanges2(str::AbstractString)
v = Set("aeiou")
#sprint(sizehint=sizeof(str)) do io # use on Julia 0.7 - new keyword argument
sprint() do io # use on Julia 0.6.2
for c in str
c = c == 'z' ? 'a' : c+1 # we assume that we got only letters from 'a':'z'
print(io, c in v ? uppercase(c) : c)
end
end
end
it is over 10x faster than the above.
EDIT: for Julia 0.7 this is a bit faster:
function LetterChanges2(str::AbstractString)
v = BitSet(collect(Int,"aeiouy"))
sprint(sizehint=sizeof(str)) do io # use on Julia 0.7 - new keyword argument
for c in str
c = c == 'z' ? 'a' : c+1 # we assume that we got only letters from 'a':'z'
write(io, Int(c) in v ? uppercase(c) : c)
end
end
end
There is a logic error. It says "Replace every letter in the string with the letter following it in the alphabet. Then capitalize every vowel in this new string". Your code checks, if it is a vowel. Then it capitalizes it or replaces it. That's different behavior. You have to first replace and then to check if it is a vowel.
You are replacing 'a' by 'Z'. You should be replacing 'z' by 'a'
The function split(stringis, "") returns an array of strings. You can't store these strings in Char(x). You have to store them in x and then you can transform theses string to char with c = x[1].
After transforming a char you have to store it in the variable: c = uppercase(c)
You don't need to transform a char into int. You can add a number to a char: c = c + 1
You have to store the new characters in a string and return them.
function LetterChanges(stringis::AbstractString)
# some code
str = ""
for x in split(stringis, "")
c = x[1]
# logic
str = "$str$c"
end
return str
end
Here's another version that is a bit faster than #BogumilKaminski's answer on version 0.6, but that might be different on 0.7. On the other hand, it might be a little less intimidating than the do-block magic ;)
function changeletters(str::String)
vowels = "aeiouy"
carr = Vector{Char}(length(str))
i = 0
for c in str
newchar = c == 'z' ? 'a' : c + 1
carr[i+=1] = newchar in vowels ? uppercase(newchar) : newchar
end
return String(carr)
end
At the risk of being accused of cheating, this is a dictionary-based approach:
function change_letters(s::String)::String
k = collect('a':'z')
v = vcat(collect('b':'z'), 'A')
d = Dict{Char, Char}(zip(k, v))
for c in Set("eiou")
d[c - 1] = uppercase(d[c - 1])
end
b = IOBuffer()
for c in s
print(b, d[c])
end
return String(take!(b))
end
It seems to compare well in speed terms with the other Julia 0.6 methods for long strings (e.g. 100,000 characters). There's a bit of unnecessary overhead in constructing the dictionary which is noticeable on small strings, but I'm far too lazy to type out the 'a'=>'b' construction long-hand!

Fortran nested WHERE statement

I have a Fortran 90 source code with a nested WHERE statement. There is a problem but it seems difficult to understand what exactly happens. I would like to transform it into DO-IF structure in order to debug. What it is not clear to me is how to translate the nested WHERE.
All the arrays have the same size.
WHERE (arrayA(:) > 0)
diff_frac(:) = 1.5 * arrayA(:)
WHERE (diff_frac(:) > 2)
arrayC(:) = arrayC(:) + diff_frac(:)
ENDWHERE
ENDWHERE
My option A:
DO i=1, SIZE(arrayA)
IF (arrayA(i) > 0) THEN
diff_frac(i) = 1.5 * arrayA(i)
DO j=1, SIZE(diff_frac)
IF (diff_frac(j) > 2) THEN
arrayC(j) = arrayC(j) + diff_frac(j)
ENDIF
ENDDO
ENDIF
ENDDO
My option B:
DO i=1, SIZE(arrayA)
IF (arrayA(i) > 0) THEN
diff_frac(i) = 1.5 * arrayA(i)
IF (diff_frac(i) > 2) THEN
arrayC(i) = arrayC(i) + diff_frac(i)
ENDIF
ENDIF
ENDDO
Thank you
According to the thread "Nested WHERE constructs" in comp.lang.fortran (particularly Ian's reply), it seems that the first code in the Question translates to the following:
do i = 1, size( arrayA )
if ( arrayA( i ) > 0 ) then
diff_frac( i ) = 1.5 * arrayA( i )
endif
enddo
do i = 1, size( arrayA )
if ( arrayA( i ) > 0 ) then
if ( diff_frac( i ) > 2 ) then
arrayC( i ) = arrayC( i ) + diff_frac( i )
endif
endif
enddo
This is almost the same as that in Mark's answer except for the second mask part (see below). Key excerpts from the F2008 documents are something like this:
7.2.3 Masked array assignment – WHERE (page 161)
7.2.3.2 Interpretation of masked array assignments (page 162)
... 2. Each statement in a WHERE construct is executed in sequence.
... 4. The mask-expr is evaluated at most once.
... 8. Upon execution of a WHERE statement that is part of a where-body-construct, the control mask is established to have the value m_c .AND. mask-expr.
... 10. If an elemental operation or function reference occurs in the expr or variable of a where-assignment-stmt or in a mask-expr, and is not within the argument list of a nonelemental function reference, the operation is performed or the function is evaluated only for the elements corresponding to true values of the control mask.
If I understand the above thread/documents correctly, the conditional diff_frac( i ) > 2 is evaluated after arrayA( i ) > 0, so corresponding to double IF blocks (if I assume that A .and. B in Fortran does not specify the order of evaluation).
However, as noted in the above thread, the actual behavior may depend on compilers... For example, if we compile the following code with gfortran5.2, ifort14.0, or Oracle fortran 12.4 (with no options)
integer, dimension(4) :: x, y, z
integer :: i
x = [1,2,3,4]
y = 0 ; z = 0
where ( 2 <= x )
y = x
where ( 3.0 / y < 1.001 ) !! possible division by zero
z = -10
end where
end where
print *, "x = ", x
print *, "y = ", y
print *, "z = ", z
they all give the expected result:
x = 1 2 3 4
y = 0 2 3 4
z = 0 0 -10 -10
But if we compile with debugging options
gfortran -ffpe-trap=zero
ifort -fpe0
f95 -ftrap=division (or with -fnonstd)
gfortran and ifort abort with floating-point exception by evaluating y(i) = 0 in the mask expression, while f95 runs with no complaints. (According to the linked thread, Cray behaves similarly to gfortran/ifort, while NAG/PGI/XLF are similar to f95.)
As a side note, when we use "nonelemental" functions in WHERE constructs, the control mask does not apply and all the elements are used in the function evaluation (according to Sec. 7.2.3.2, sentence 9 of the draft above). For example, the following code
integer, dimension(4) :: a, b, c
a = [ 1, 2, 3, 4 ]
b = -1 ; c = -1
where ( 3 <= a )
b = a * 100
c = sum( b )
endwhere
gives
a = 1 2 3 4
b = -1 -1 300 400
c = -1 -1 698 698
which means that sum( b ) = 698 is obtained from all the elements of b, with the two statements evaluated in sequence.
Why not
WHERE (arrayA(:) > 0)
diff_frac(:) = 1.5 * arrayA(:)
ENDWHERE
WHERE (diff_frac(:) > 2 .and. arrayA(:) > 0)
arrayC(:) = arrayC(:) + diff_frac(:)
ENDWHERE
?
I won't say it can't be done with nested wheres, but I don't see why it has to be. Then, if you must translate to do loops, the translation is very straightforward.
Your own attempts suggest you think of where as a kind of looping construct, I think it's better to think of it as a masked assignment (which is how it's explained in the language standard) in which each individual assignment happens at the same time. These days you might consider translating into do concurrent constructs.
Sorry about deflecting the question a bit, but this is interesting. I am not sure that I can tell how the nested where is going to be compiled. It may even be one of those cases that push the envelope.
I agree with High Performance Mark that where is best thought of as a masking operation and then it is unclear (to me) whether your "A" or "B" will result.
I do think that his solution should be the same as your nested where.
My point: Since this is tricky to even discern, can you write new code instead of this, from scratch? Not to translate it, but delete it, forget about it, and write code to do the job.
If you know exactly what this piece of code needs to do, its pre- and post- conditions, then it shouldn't be difficult. If you don't know that then the algorithm may be too entangled in which case this should be rewritten anyway. There may be subtleties involved between what this was intended to do and what it does. You say you are debugging this code already.
Again, sorry to switch context but I think that there is a possibility that this is one of those situations where code is best served by a complete rewrite.
If you want to keep it and only write loops for debugging: Why not write them and compare output?
Run it with where as it is, then run it with "A" instead, then with "B". Print values.

Reading a file of lists of integers in Fortran

I would like to read a data file with a Fortran program, where each line is a list of integers.
Each line has a variable number of integers, separated by a given character (space, comma...).
Sample input:
1,7,3,2
2,8
12,44,13,11
I have a solution to split lines, which I find rather convoluted:
module split
implicit none
contains
function string_to_integers(str, sep) result(a)
integer, allocatable :: a(:)
integer :: i, j, k, n, m, p, r
character(*) :: str
character :: sep, c
character(:), allocatable :: tmp
!First pass: find number of items (m), and maximum length of an item (r)
n = len_trim(str)
m = 1
j = 0
r = 0
do i = 1, n
if(str(i:i) == sep) then
m = m + 1
r = max(r, j)
j = 0
else
j = j + 1
end if
end do
r = max(r, j)
allocate(a(m))
allocate(character(r) :: tmp)
!Second pass: copy each item into temporary string (tmp),
!read an integer from tmp, and write this integer in the output array (a)
tmp(1:r) = " "
j = 0
k = 0
do i = 1, n
c = str(i:i)
if(c == sep) then
k = k + 1
read(tmp, *) p
a(k) = p
tmp(1:r) = " "
j = 0
else
j = j + 1
tmp(j:j) = c
end if
end do
k = k + 1
read(tmp, *) p
a(k) = p
deallocate(tmp)
end function
end module
My question:
Is there a simpler way to do this in Fortran? I mean, reading a list of values where the number of values to read is unknown. The above code looks awkward, and file I/O does not look easy in Fortran.
Also, the main program has to read lines with unknown and unbounded length. I am able to read lines if I assume they are all the same length (see below), but I don't know how to read unbounded lines. I suppose it would need the stream features of Fortran 2003, but I don't know how to write this.
Here is the current program:
program read_data
use split
implicit none
integer :: q
integer, allocatable :: a(:)
character(80) :: line
open(unit=10, file="input.txt", action="read", status="old", form="formatted")
do
read(10, "(A80)", iostat=q) line
if(q /= 0) exit
if(line(1:1) /= "#") then
a = string_to_integers(line, ",")
print *, ubound(a), a
end if
end do
close(10)
end program
A comment about the question: usually I would do this in Python, for example converting a line would be as simple as a = [int(x) for x in line.split(",")], and reading a file is likewise almost a trivial task. And I would do the "real" computing stuff with a Fortran DLL. However, I'd like to improve my Fortran skills on file I/O.
I don't claim it is the shortest possible, but it is much shorter than yours. And once you have it, you can reuse it. I don't completely agree with these claims how Fotran is bad at string processing, I do tokenization, recursive descent parsing and similar stuff just fine in Fortran, although it is easier in some other languages with richer libraries. Sometimes you can use the libraries written in other languages (especially C and C++) in Fortran too.
If you always use the comma you can remove the replacing by comma and thus shorten it even more.
function string_to_integers(str, sep) result(a)
integer, allocatable :: a(:)
character(*) :: str
character :: sep
integer :: i, n_sep
n_sep = 0
do i = 1, len_trim(str)
if (str(i:i)==sep) then
n_sep = n_sep + 1
str(i:i) = ','
end if
end do
allocate(a(n_sep+1))
read(str,*) a
end function
Potential for shortening: view the str as a character array using equivalence or transfer and use count() inside of allocate to get the size of a.
The code assumes that there is just one separator between each number and there is no separator before the first one. If multiple separators are allowed between two numbers, you have to check whether the preceding character is a separator or not
do i = 2, len_trim(str)
if (str(i:i)==sep .and. str(i-1:i-1)/=sep) then
n_sep = n_sep + 1
str(i:i) = ','
end if
end do
My answer is probably too simplistic for your goals but I have spent a lot of time recently reading in strange text files of numbers. My biggest problem is finding where they start (not hard in your case) then my best friend is the list-directed read.
read(unit=10,fmt=*) a
will read in all of the data into vector 'a', done deal. With this method you will not know which line any piece of data came from. If you want to allocate it then you can read the file once and figure out some algorithm to make the array larger than it needs to be, like maybe count the number of lines and you know a max data amount per line (say 21).
status = 0
do while ( status == 0)
line_counter = line_counter + 1
read(unit=10,, iostat=status, fmt=*)
end do
allocate(a(counter*21))
If you want to then eliminate zero values you can remove them or pre-seed the 'a' vector with a negative number if you don't expect any then remove all of those.
Another approach stemming from the other suggestion is to first count the commas then do a read where the loop is controlled by
do j = 1, line_counter ! You determined this on your first read
read(unit=11,fmt=*) a(j,:) ! a is now a 2 dimensional array (line_counter, maxNumberPerLine)
! You have a separate vector numberOfCommas(j) from before
end do
And now you can do whatever you want with these two arrays because you know all the data, which line it came from, and how many data were on each line.

MATLAB generate combination from a string

I've a string like this "FBECGHD" and i need to use MATLAB and generate all the required possible permutations? In there a specific MATLAB function that does this task or should I define a custom MATLAB function that perform this task?
Use the perms function. A string in matlab is a list of characters, so it will permute them:
A = 'FBECGHD';
perms(A)
You can also store the output (e.g. P = perms(A)), and, if A is an N-character string, P is a N!-by-N array, where each row corresponds to a permutation.
If you are interested in unique permutations, you can use:
unique(perms(A), 'rows')
to remove duplicates (otherwise something like 'ABB' would give 6 results, instead of the 3 that you might expect).
As Richante answered, P = perms(A) is very handy for this. You may also notice that P is of type char and it's not convenient to subset/select individual permutation. Below worked for me:
str = 'FBECGHD';
A = perms(str);
B = cellstr(reshape(A,7,[])');
C = unique(B);
It also appears that unique(A, 'rows') is not removing duplicate values:
>> A=[11, 11];
>> unique(A, 'rows')
ans =
11 11
However, unique(A) would:
>> unique(A)
ans =
11
I am not a matlab pro by any means and I didn't investigate this exhaustively but at least in some cases it appears that reshape is not what you want. Notice that below gives 999 and 191 as permutations of 199 which isn't true. The reshape function as written appears to operate "column-wise" on A:
>> str = '199';
A = perms(str);
B = cellstr(reshape(A,3,[])');
C = unique(B);
>> C
C =
'191'
'199'
'911'
'919'
'999'
Below does not produce 999 or 191:
B = {};
index = 1;
while true
try
substring = A(index,:);
B{index}=substring;
index = index + 1;
catch
break
end
end
C = unique(B)
C =
'199' '919' '991'

Resources