Calling fftw routines from pure subroutines in Fortran90 - multithreading

Multithreaded FFTW can be set up as described on the FFTW homepage. Instead, we want to call the serial FFTW routines inside an OpenMP parallel region using multiple threads. We want variable and fourier_variable to be thread-safe. This could be done by using PURE subroutines and declaring variable and fourier_variable inside them. The question here concerns calling FFTW routines such as fftw_execute_dft_r2c from within a PURE subroutine.
A stripped-down version of the code is presented here just for reference (the full code is an optimisation solver involving many FFTW calls).
PROGRAM main
USE fft_call
REAL(8), DIMENSION(1:N, 1:N) :: variable
COMPLEX(C_DOUBLE_COMPLEX), DIMENSION(1:N/2+1, 1:N) :: fourier_variable
INTEGER :: JJ
!$OMP PARALLEL
!$OMP DO
DO JJ = 1, 5
call fourier_to_physical(fourier_variable, variable)
END DO
!$OMP END DO
!$OMP END PARALLEL
END PROGRAM main
MODULE fft_call
contains
PURE SUBROUTINE fourier_to_physical( fourier_variable, variable)
IMPLICIT NONE
REAL(8), DIMENSION(1:N, 1:N) :: variable
COMPLEX(C_DOUBLE_COMPLEX), DIMENSION(1:N/2+1, 1:N), INTENT(OUT) :: fourier_variable
CALL fftw_execute_dft_r2c (plan_fwd, variable, fourier_variable)
END SUBROUTINE fourier_to_physical
END MODULE fft_call
The error while calling fftw_plan_dft_r2c_2d from the PURE subroutine fourier_to_physical:
Error: Function reference to 'fftw_plan_dft_r2c' at (1) is to a non-PURE procedure within a PURE procedure
The question: is there a way to call FFTW routines like fftw_execute_dft_r2c from within a PURE subroutine in Fortran90?
Or, in other words, are there PURE versions of fftw_execute_dft_r2c that we can call from PURE procedures? We are beginners with OpenMP.
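A minimal sketch of one alternative, under stated assumptions: thread safety does not require PURE. The FFTW manual states that plan execution (fftw_execute and its new-array variants such as fftw_execute_dft_r2c) is thread-safe, while the planner routines are not. The sketch therefore creates the plan once in serial code and gives each thread its own private arrays. The grid size N = 64, the routine names init_plan and physical_to_fourier, and the use of FFTW_UNALIGNED (so that the plan may be applied to arrays whose alignment differs from the planning arrays) are assumptions for illustration, not part of the original code. It needs to be linked against FFTW3 and has not been tuned.

```fortran
module fft_call
  use, intrinsic :: iso_c_binding
  implicit none
  include 'fftw3.f03'
  integer, parameter :: N = 64      ! hypothetical grid size
  type(C_PTR) :: plan_fwd           ! created once, then only read by the threads
contains
  subroutine init_plan()
    ! Planning is NOT thread-safe: call this outside any parallel region.
    real(C_DOUBLE) :: tmp_in(N, N)
    complex(C_DOUBLE_COMPLEX) :: tmp_out(N/2+1, N)
    ! Dimensions are passed in reversed order (Fortran is column-major).
    plan_fwd = fftw_plan_dft_r2c_2d(N, N, tmp_in, tmp_out, &
                                    ior(FFTW_ESTIMATE, FFTW_UNALIGNED))
  end subroutine init_plan

  ! Deliberately not PURE: the FFTW interfaces are not declared PURE.
  subroutine physical_to_fourier(variable, fourier_variable)
    real(C_DOUBLE), intent(inout) :: variable(N, N)
    complex(C_DOUBLE_COMPLEX), intent(out) :: fourier_variable(N/2+1, N)
    ! New-array execute: same plan, caller-supplied arrays.
    call fftw_execute_dft_r2c(plan_fwd, variable, fourier_variable)
  end subroutine physical_to_fourier
end module fft_call

program main
  use fft_call
  implicit none
  real(C_DOUBLE) :: variable(N, N)
  complex(C_DOUBLE_COMPLEX) :: fourier_variable(N/2+1, N)
  integer :: jj

  call init_plan()
  ! Each thread works on its own private copies of the arrays.
  !$omp parallel do private(variable, fourier_variable)
  do jj = 1, 5
    variable = real(jj, C_DOUBLE)
    call physical_to_fourier(variable, fourier_variable)
  end do
  !$omp end parallel do
  call fftw_destroy_plan(plan_fwd)
end program main
```

The design choice here is that only the plan is shared, and it is never modified after creation; everything a thread writes to is private to that thread.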

Related

How should threads verify that the queue is full while it is being modified at the same time by another thread?

I asked a question yesterday on the site and received an answer on the source of my error but I am still confused on how to solve the problem.
So I'm trying to replace a synchronization barrier with a function called by threads that have a rank other than 1. Thread number 1 is responsible for filling the queue.
I had the idea that threads (rank not equal to 1) should check if the queue is full. If it is not, they call the already predefined sleep function. If it is full, we can be sure that we can pass without fearing a race condition.
if (num_thread==1) then
do i_task=first_task,last_task
tasklist_GRAD(i_task)%state=STATE_READY
call queue_enqueue_data(master_queue,tasklist_GRAD(i_task)) !< add the list elements to the queue (full queue)
end do
end if
if (num_thread .ne. 1) then
call wait_thread(master_queue)
end if
!!$ !$OMP BARRIER !!!!!!! retired OMP BARR
call master_worker_execution(self,var,master_queue,worker_queue,first_task,last_task,nthreads,num_thread,lck)
Here is the definition of the wait function:
subroutine wait_thread(master_queue)
type(QUEUE_STRUCT),pointer,asynchronous::master_queue !< the master queue of tasks
do while (.not. queue_full(master_queue))
call system_sleep(1)
end do
end subroutine wait_thread
The problem is that thread number 1 writes to master_queue while, at the same time, the other threads access it to check whether it is full. This can lead to a race condition.
Here is the definition of queue_full function:
recursive logical function queue_full( queue )
type(QUEUE_STRUCT),asynchronous, intent(in) :: queue
!$OMP CRITICAL
queue_full = (queue%size == queue%capacity)
!$OMP END CRITICAL
end function queue_full
When I run it I do not get a segmentation fault, but I do not get the results that the version with OMP_BARRIER produces either.
Usually I get values displayed, but now I only get a blinking cursor.
My question is: is there any way to solve this? Is it impossible to replace OMP_BARRIER?
I tried to add the attribute asynchronous for the master_queue so the declaration became like this : type(QUEUE_STRUCT),pointer,asynchronous::master_queue !< the master queue of tasks. The critical directive in the definition of queue_full is something that I added too but in vain.
Any help, please ?
EDIT:
I just tried this method but I don't know if it really replaces OMP_BARRIER. Here is the new module time (nothing complicated, system_sleep and wait functions):
module time
use QUEUE
contains
recursive subroutine system_sleep(wait)
use,intrinsic :: iso_c_binding, only: c_int
integer,intent(in) :: wait
integer(kind=c_int):: waited
interface
function c_usleep(usecs) bind (C,name="usleep")
import
integer(c_int) :: c_usleep
integer(c_int),intent(in),VALUE :: usecs   ! usleep takes microseconds
end function c_usleep
end interface
if(wait.gt.0)then
waited=c_usleep(int(wait,kind=c_int))
endif
end subroutine system_sleep
recursive subroutine wait(full)
logical,intent(in)::full
do
call system_sleep(1000)
if (full .eqv. .true.) EXIT
end do
end subroutine wait
end module time
and this is how I replaced OMP_BARRIER:
full = .false.
first_task=5
last_task=6
if (num_thread==1) then
do i_task=first_task,last_task
tasklist_GRAD(i_task)%state=STATE_READY
call queue_enqueue_data(master_queue,tasklist_GRAD(i_task)) !< add the list elements to the queue (full queue)
end do
full=.true.
end if
if (num_thread .ne. 1) then !!!!!!!!!! how to replace OMP_BARRIER
call wait(full) !!! wait until full equal to true
end if
call master_worker_execution(self,var,master_queue,worker_queue,first_task,last_task,nthreads,num_thread,lck)
I want also to add that the shared variable full is of type logical.
Is this an efficient way to get rid of an explicit barrier?
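A hedged sketch of how such a flag hand-off is commonly written, modelled on the producer/consumer pattern in the OpenMP examples document. One likely reason the wait(full) loop above never ends is that full is an intent(in) dummy argument read without any synchronisation, so the compiler may assume it cannot change inside the loop. Here the flag is read and written with atomic constructs and the data is published with flush. The names ready, seen and payload are hypothetical; payload stands in for the filled queue, and thread 0 plays the role of the producer. Whether this is faster than a barrier is not something the sketch shows.

```fortran
program flag_handoff
  use omp_lib, only: omp_get_thread_num
  implicit none
  integer :: ready, seen, payload
  logical :: ok

  ready = 0
  payload = 0
  ok = .true.

  !$omp parallel default(shared) private(seen) reduction(.and.:ok)
  if (omp_get_thread_num() == 0) then
    payload = 42          ! stands in for filling the queue
    !$omp flush           ! publish the data before raising the flag
    !$omp atomic write
    ready = 1
    !$omp flush
  else
    do                    ! spin until the producer raises the flag
      !$omp atomic read
      seen = ready
      if (seen == 1) exit
      !$omp flush
    end do
    !$omp flush           ! make sure the data is re-read after the flag
    ok = (payload == 42)
  end if
  !$omp end parallel

  print *, ok             ! T: every waiting thread saw the published data
end program flag_handoff
```

The waiting threads spin instead of sleeping; a call such as system_sleep could be placed inside the loop if busy-waiting is too costly.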

Private and public variables inside a module and a subroutine in OpenMP

I am trying to parallelize a fairly complicated simulation code used for oil field calculations.
My question is this: suppose I declare a variable, some allocatable arrays and a few subroutines in a module, and then use this module in another module or subroutine that contains the parallel region. Will those variables and arrays be private to each thread (i.e. each thread has its own copy, and changes made to a variable in one thread are not seen by other threads), or will they be shared?
Like this:
module m2
implicit none
integer :: global
contains
subroutine s2()
implicit none
integer :: tid
tid = ! Some calculation
end subroutine s2
end module m2
module m1
use m2
implicit none
contains
subroutine s1()
integer :: i
!$omp parallel do
do i=1, 9
call s2()
end do
!$omp end parallel do
end subroutine s1
end module m1
Will tid and global be private or shared?
Any help is greatly appreciated!
Module variables are always shared in OpenMP unless you use the threadprivate directive. See "Difference between OpenMP threadprivate and private" for a detailed description of threadprivate. So global will be shared.
The local variable tid is declared in a subroutine that is called from the parallel region. Therefore it will be private unless it has the save attribute.
(Note that an initialization like integer :: tid = 0 also adds the save attribute implicitly, so be careful.)
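A small sketch of the three cases described above, assuming made-up names (counters, shared_total, my_count, bump): a plain module variable is shared and needs protection when updated, a threadprivate module variable has one copy per thread, and a local variable without save is private to each call.

```fortran
module counters
  implicit none
  integer :: shared_total = 0     ! module variable: shared by all threads
  integer :: my_count = 0         ! module variable made per-thread below
  !$omp threadprivate(my_count)
contains
  subroutine bump()
    integer :: step               ! local, no save: private to each call
    step = 1
    my_count = my_count + step    ! no race: each thread has its own copy
    !$omp atomic update
    shared_total = shared_total + step   ! shared: must be protected
  end subroutine bump
end module counters

program demo
  use counters
  implicit none
  integer :: i

  !$omp parallel do
  do i = 1, 9
    call bump()
  end do
  !$omp end parallel do

  print *, shared_total           ! 9, regardless of the number of threads
end program demo
```

Writing integer :: step = 1 instead would give step the save attribute and turn it into a single shared variable, which is the pitfall mentioned in the note above.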

Parallelize mixed f77/f90 Fortran code?

I have a code written mostly in f77 however there are also routines written with the f90 syntax.
I've been reading about how to use OpenMP in each case, but now I am unsure what to do when both syntaxes appear in the same code.
More specifically should I use
use omp_lib
or
include 'omp_lib.h'
but in the same .f file I have both syntaxes. What if I use both?
I am compiling with gfortran 4.8.4.
If I use let's say
use omp_lib (meaning f90 syntax)
Then I have to use the corresponding syntax
!$omp parallel shared ( var1, var2 ) private ( i, j )
!$omp do
do j = 1, n
do i = 1, m
var1(i,j) = var2
end do
end do
!$omp end do
!$omp end parallel
However a few lines down I'll have a do loop written in f77
c$omp parallel shared ( w ) private ( i, j )
c$omp do
      do 20 i = 10, 1, -2
         write(*,*) 'i =', i
   20 continue
c$omp end do
c$omp end parallel
But will this way of writing the parallel loop still be recognised if I use use omp_lib?
The same question applies in the opposite case.
First, you need the use or include only if you call OpenMP procedures. If you just have a couple of !$omp directives it is not needed.
You only need this, when you call a procedure like:
call omp_set_num_threads(1)
or
tid = omp_get_thread_num()
but not for the directives.
For Fortran 90 and later code (even in fixed form which looks like Fortran 77) use
use omp_lib
because the compiler is then better able to check if you call the functions properly (it has an explicit interface to them).
If the code is true Fortran 77 (and not just looking like it), you have to use the include although technically Fortran 77 doesn't even know that. It is a common non-standard extension to Fortran 77.
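To illustrate the distinction, here is a minimal free-form sketch (the program name and variables are made up). The directive-only loop needs neither use omp_lib nor the include file; the use statement is there solely because of the call to omp_get_max_threads. How a directive is spelled depends on the source form, not on use versus include: in fixed-form source the sentinels !$omp, c$omp and *$omp are all accepted, while free-form source accepts only !$omp.

```fortran
program omp_lib_usage
  ! Needed only because omp_get_max_threads is called below.
  use omp_lib, only: omp_get_max_threads
  implicit none
  integer :: i, total

  total = 0
  ! Directive only: this loop compiles the same without "use omp_lib".
  !$omp parallel do reduction(+:total)
  do i = 1, 100
    total = total + i
  end do
  !$omp end parallel do

  print *, 'sum =', total                        ! sum is 5050
  print *, 'max threads =', omp_get_max_threads()
end program omp_lib_usage
```

With gfortran this would be compiled with -fopenmp; without that flag the directives are treated as ordinary comments.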

Is it legal for the index used under !$omp atomic to differ from the host loop's index variable?

I came across a question while learning how to avoid data conflicts when multiple threads may read and write the same location, using the OpenMP directive !$omp atomic.
The code snippet below was made up for my question. I am wondering whether it is legal in FORTRAN to use a different index (here j) in the statement under !$omp atomic than the loop index variable i, which is the one immediately following the directive !$omp parallel do private(a,b)? Thanks.
program main
...
integer :: i,j
integer, dimension(10000) :: vx,vy,va,vb
...
va=0
!$omp parallel do private(j)
do i=1,10000
j=merge(vx(i),vy(i),mod(i,2)==1)
!$omp atomic update
va(j)=va(j)+vb(j)
end do
!$omp end parallel do
...
end program
Furthermore, is it OK to loop on an atomic directive?
program main
...
integer :: i,j,k
integer, dimension(10000) :: vx,vy
integer, dimension(12,10000) :: va,vb
...
va=0
!$omp parallel do private(j,k)
do i=1,10000
j=merge(vx(i),vy(i),mod(i,2)==1)
do k=1,12
!$omp atomic update
va(k,j)=va(k,j)+vb(k,j)
enddo
end do
!$omp end parallel do
...
end program
Yes, why not? It is just an update of a memory address; there is no difference. There would not even be much sense in using atomic with i in your case, as different threads have different values of i.
BUT be aware of the race condition on j: several threads write to it, so it should be private.
Your second example adds nothing new; it is the same situation and still legal.
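A self-contained sketch of the same pattern, with made-up data (idx and hist are hypothetical names): the loop runs over i, the atomic update targets an element selected through the private index j, and several iterations may hit the same element.

```fortran
program atomic_indirect
  implicit none
  integer, parameter :: n = 10000, nbins = 10
  integer :: i, j
  integer :: idx(n), hist(nbins)

  ! Many different i map to the same bin, so updates can collide.
  do i = 1, n
    idx(i) = mod(i, nbins) + 1
  end do

  hist = 0
  !$omp parallel do private(j)
  do i = 1, n
    j = idx(i)                 ! j is private: no race on the index itself
    !$omp atomic update
    hist(j) = hist(j) + 1      ! atomic protects the shared element hist(j)
  end do
  !$omp end parallel do

  print *, all(hist == n / nbins)   ! T: every bin received 1000 updates
end program atomic_indirect
```

Without the atomic directive the counts could come out too low, because two threads may read the same old value of hist(j) before either writes back.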

To host functions within a threaded subroutine

I encountered a problem while porting my Fortran project to OpenMP. In my original code, two functions named add and mpy are passed to a threaded subroutine submodel, which forwards the respective function to another subroutine defined in a module toolbox.
Now, for my new code, I am wondering whether there is a way to produce exactly the same outcome as my original code, but with a small twist: moving the two functions add and mpy so that they are hosted (i.e., contained) within the subroutine submodel.
Thanks.
Lee
--- My original code consists of four files: MAIN.F90, MODEL.F90, VARIABLE.F90, and TOOLBOX.F90
OUTPUT:
--- addition ---
3 7 11 15
--- multiplication ---
2 12 30 56
Press any key to continue . . .
MAIN.F90
program main
use model
implicit none
call sandbox()
end program main
MODEL.F90
module model
use omp_lib
use variable
implicit none
contains
subroutine submodel(func,x,y)
implicit none
interface
function func(z)
implicit none
integer :: z,func
end function func
end interface
integer :: x,y
call tool(func,x,y)
end subroutine submodel
function add(a)
implicit none
integer :: a,add
add=a+thread_private
end function add
function mpy(m)
implicit none
integer :: m,mpy
mpy=m*thread_private
end function mpy
subroutine sandbox()
implicit none
integer :: a(4),b(4),c(4),i
a=[((i),i=1,7,2)]
b=[((i),i=2,8,2)]
!$omp parallel do
do i=1,4
thread_private=b(i)
call submodel(add,a(i),c(i))
enddo
!$omp end parallel do
write(6,'(a)') '--- addition ---'
write(6,'(4(i5))') c
!$omp parallel do
do i=1,4
thread_private=b(i)
call submodel(mpy,a(i),c(i))
enddo
!$omp end parallel do
write(6,'(a)') '--- multiplication ---'
write(6,'(4(i5))') c
end subroutine sandbox
end module model
TOOLBOX.F90
module toolbox
implicit none
contains
subroutine tool(funct,input,output)
implicit none
interface
function funct(x)
implicit none
integer :: x,funct
end function funct
end interface
integer :: input,output
output = funct(input)
end subroutine tool
end module toolbox
VARIABLE.F90
module variable
use toolbox
implicit none
integer :: thread_private
!$omp threadprivate(thread_private)
end module variable
Is it possible to simply rearrange them in this way? (I have tried and apparently it failed):
subroutine submodel(func,x,y)
implicit none
interface
function func(z)
implicit none
integer :: z,func
end function func
end interface
integer :: x,y
call tool(func,x,y)
contains
function add(a)
implicit none
integer :: a,add
add=a+thread_private
end function add
function mpy(m)
implicit none
integer :: m,mpy
mpy=m*thread_private
end function mpy
end subroutine submodel
You can make the two procedures internal to the subroutine submodel exactly as you did in your last code snippet. The problem is that you cannot pass these two functions as actual arguments from outside of the subroutine, because you have no access to them there.
Even if you have procedure pointers to them stored somewhere, these would be invalid as soon as the original run of submodel that could have created them ended.
I would think about using some switch:
subroutine submodel(switch,x,y)
implicit none
integer :: switch,x,y
select case(switch)
case(USE_ADD)
call tool(add,x,y)
case(USE_MPY)
call tool(mpy,x,y)
case default
stop "unknown switch value"
end select
contains
function add(a)
implicit none
integer :: a,add
add=a+thread_private
end function add
function mpy(m)
implicit none
integer :: m,mpy
mpy=m*thread_private
end function mpy
end subroutine submodel
Another option is to keep your original design.
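For completeness, a compact sketch of how the switch variant might be wired together in a single file. The values chosen for USE_ADD and USE_MPY are assumptions (the answer does not define them), and tool is folded into the model module here only to keep the example short. Passing an internal procedure as an actual argument requires a Fortran 2008 compiler.

```fortran
module variable
  implicit none
  integer :: thread_private
  !$omp threadprivate(thread_private)
end module variable

module model
  use variable
  implicit none
  integer, parameter :: USE_ADD = 1, USE_MPY = 2   ! hypothetical values
contains
  subroutine tool(funct, input, output)
    interface
      function funct(x)
        integer :: x, funct
      end function funct
    end interface
    integer :: input, output
    output = funct(input)
  end subroutine tool

  subroutine submodel(switch, x, y)
    integer :: switch, x, y
    select case (switch)          ! the caller picks the function by number
    case (USE_ADD)
      call tool(add, x, y)
    case (USE_MPY)
      call tool(mpy, x, y)
    case default
      stop "unknown switch value"
    end select
  contains
    function add(a)
      integer :: a, add
      add = a + thread_private
    end function add
    function mpy(m)
      integer :: m, mpy
      mpy = m * thread_private
    end function mpy
  end subroutine submodel
end module model

program main
  use model
  implicit none
  integer :: a(4), b(4), c(4), i
  a = [(i, i = 1, 7, 2)]
  b = [(i, i = 2, 8, 2)]
  !$omp parallel do
  do i = 1, 4
    thread_private = b(i)
    call submodel(USE_ADD, a(i), c(i))
  end do
  !$omp end parallel do
  write (6, '(4(i5))') c          ! 3 7 11 15, as in the original output
end program main
```

Calling submodel with USE_MPY in the same loop would reproduce the multiplication line of the original output.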

Resources