How can I replace an OMP BARRIER with a wait function?

I want to get rid of the !$OMP BARRIER in the following code, so I thought I would replace it with a wait function.
With !$OMP BARRIER:
if (num_thread==1) then
   do i_task=first_task,last_task
      tasklist_GRAD(i_task)%state=STATE_READY
      call queue_enqueue_data(master_queue,tasklist_GRAD(i_task)) !< add the list elements to the queue (full queue)
   end do
end if
!$OMP BARRIER ! the barrier to retire
call master_worker_execution(self,var,master_queue,worker_queue,first_task,last_task,nthreads,num_thread,lck)
Without !$OMP BARRIER:
if (num_thread==1) then
   omp_start=omp_get_wtime() !start
   do i_task=first_task,last_task
      tasklist_GRAD(i_task)%state=STATE_READY
      call queue_enqueue_data(master_queue,tasklist_GRAD(i_task)) !< add the list elements to the queue (full queue)
   end do
   omp_end=omp_get_wtime() !end
end if
if (num_thread .ne. 1) then
   call wait(int(omp_end-omp_start)*1000)
end if
call master_worker_execution(self,var,master_queue,worker_queue,first_task,last_task,nthreads,num_thread,lck)
The definition of the wait subroutine:
subroutine wait(omp_start,omp_end)
   real(kind=REAL64),intent(in)::omp_start,omp_end
   real(kind=REAL64)::time
   time=omp_end-omp_start
   call sleep(int(time))
end subroutine wait
The barrier should make the threads other than thread number 1 wait for thread number 1 to finish queueing master_queue. That's why I thought of replacing it with a wait function.
When executing, I get a segfault, due to a thread-safety issue I guess. I also have doubts about using the INT function, because I declared omp_start and omp_end as real(kind=REAL64).
EDIT:
I modified the wait subroutine according to the answer I got and did the following:
subroutine wait(master_queue)
   type(QUEUE_STRUCT),pointer::master_queue !< the master queue of tasks
   do while (.not. queue_full(master_queue))
      call sleep(1)
   end do
end subroutine wait
Unfortunately, I'm not getting the same results as with the !$OMP BARRIER.
logical function queue_full( queue )
   type(QUEUE_STRUCT), intent(in) :: queue
   queue_full = (queue%size == queue%capacity)
end function queue_full

The reason you are getting a segfault is that omp_start and omp_end are set by thread 1 and read by the other threads. As you currently have it, the order in which this happens is undefined, so they can (and likely will) be read before they are set.
There is a more fundamental problem, however. It looks like you just want the other threads to wait for thread 1 to finish, but there is no way to know in advance how long that will take. As such, there is no way to implement such a timed wait function. This is the whole reason for using barriers in the first place.
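If a barrier really must be avoided, the usual alternative is not a timed wait but a shared completion flag accessed with OpenMP atomics, so the waiting threads reliably observe thread 1's update. A minimal sketch under that assumption (the names ready and tmp are illustrative, not from the question):

logical :: ready   ! shared; set to .false. before the parallel region
logical :: tmp     ! private copy used in the spin loop

if (num_thread==1) then
   ! ... enqueue all the tasks as above ...
   !$omp atomic write
   ready = .true.
else
   do
      !$omp atomic read
      tmp = ready
      if (tmp) exit   ! thread 1 has finished queueing
      call sleep(1)   ! avoid burning CPU while spinning
   end do
end if

Note that this only signals completion of the enqueueing; unlike a barrier, it does not make thread 1 wait for anyone.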

Related

How should threads verify that the queue is full while it is being modified by another thread?

I asked a question yesterday on the site and received an answer identifying the source of my error, but I am still confused about how to solve the problem.
I'm trying to replace a synchronization barrier with a function called by the threads whose rank is not 1. Thread number 1 is responsible for filling the queue.
My idea was that the threads with rank other than 1 should check whether the queue is full. If it is not, they call the already-defined sleep function. If it is full, we can be sure we can proceed without fearing a race condition.
if (num_thread==1) then
   do i_task=first_task,last_task
      tasklist_GRAD(i_task)%state=STATE_READY
      call queue_enqueue_data(master_queue,tasklist_GRAD(i_task)) !< add the list elements to the queue (full queue)
   end do
end if
if (num_thread .ne. 1) then
   call wait_thread(master_queue)
end if
!!$OMP BARRIER ! the retired OMP BARRIER
call master_worker_execution(self,var,master_queue,worker_queue,first_task,last_task,nthreads,num_thread,lck)
Here is the definition of the wait function:
subroutine wait_thread(master_queue)
   type(QUEUE_STRUCT),pointer,asynchronous::master_queue !< the master queue of tasks
   do while (.not. queue_full(master_queue))
      call system_sleep(1)
   end do
end subroutine wait_thread
The problem is that thread number 1 modifies (writes to) master_queue while, at the same time, the other threads access it to check whether it is full. This can lead to a race condition.
Here is the definition of queue_full function:
recursive logical function queue_full( queue )
   type(QUEUE_STRUCT),asynchronous, intent(in) :: queue
   !$OMP CRITICAL
   queue_full = (queue%size == queue%capacity)
   !$OMP END CRITICAL
end function queue_full
When running, I don't get a segmentation fault, but I don't get the same results as the code with !$OMP BARRIER either.
Usually values are displayed, but now I just get a blinking cursor.
My question is: is there any way to solve this? Is it impossible to replace OMP_BARRIER?
I tried adding the asynchronous attribute to master_queue, so the declaration became type(QUEUE_STRUCT),pointer,asynchronous::master_queue. The CRITICAL directive in the definition of queue_full is also something I added, but in vain.
Any help, please?
EDIT:
I just tried the following method, but I don't know whether it really replaces the OMP BARRIER. Here is the new module time (nothing complicated: the system_sleep and wait subroutines):
module time
   use QUEUE
contains

   recursive subroutine system_sleep(wait)
      use,intrinsic :: iso_c_binding, only: c_int
      integer,intent(in) :: wait
      integer(kind=c_int) :: waited
      interface
         function c_usleep(usecs) bind(C,name="usleep")
            import
            integer(c_int) :: c_usleep
            integer(c_int),intent(in),value :: usecs !< note: usleep sleeps for microseconds, not milliseconds
         end function c_usleep
      end interface
      if (wait.gt.0) then
         waited=c_usleep(int(wait,kind=c_int))
      end if
   end subroutine system_sleep

   recursive subroutine wait(full)
      logical,intent(in)::full
      do
         call system_sleep(1000)
         if (full) EXIT
      end do
   end subroutine wait

end module time
and this is how I replaced OMP_BARRIER:
full = .false.
first_task=5
last_task=6
if (num_thread==1) then
   do i_task=first_task,last_task
      tasklist_GRAD(i_task)%state=STATE_READY
      call queue_enqueue_data(master_queue,tasklist_GRAD(i_task)) !< add the list elements to the queue (full queue)
   end do
   full=.true.
end if
if (num_thread .ne. 1) then ! this is what replaces the OMP BARRIER
   call wait(full) ! wait until full is .true.
end if
call master_worker_execution(self,var,master_queue,worker_queue,first_task,last_task,nthreads,num_thread,lck)
I also want to add that the shared variable full is of type logical.
Is this an efficient way to get rid of an explicit barrier?
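For reference, a spin-wait of this shape only works if each iteration actually re-reads the shared flag: here full is received by wait as an intent(in) dummy argument and is never flushed, so the compiler may legally read it once and spin forever. A minimal sketch of the same idea with an explicit flush, spinning on the shared variable directly (an illustration, not the original code):

if (num_thread==1) then
   ! ... enqueue the tasks as above ...
   full=.true.
   !$omp flush(full)      ! publish the update to the other threads
else
   do
      !$omp flush(full)   ! force a fresh read of the shared flag
      if (full) exit
      call system_sleep(1000)
   end do
end if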

Does the main thread wait for the fiber started via ConcurrentEffect or not?

We are doing IO operations that we want to run in a separate thread, and the main thread should not wait for them.
def seperateThread(action: F[Unit]): F[Unit] =
  ConcurrentEffect[F].start(action).void
If I call this function like below,
for {
  _ <- service.seperateThread(request, languageId, cacheItinerary, slices, pricing)
} yield {}
will it perform the seperateThread operation in a different fiber and return F[Unit] immediately, or wait for the operation to complete?
Starting a fiber is a non-blocking operation, so the application will proceed to the next instruction right away.
In order to wait for the operation running in another fiber to complete, you need to invoke the join operation on the fiber object. You can't do that in your implementation, since you've called void and thus discarded the returned reference to the fiber.
If you change your method like this:
def seperateThread[F[_]: ConcurrentEffect: Functor: Sync](action: F[Unit]): F[Fiber[F, Unit]] =
  ConcurrentEffect[F].start(action)
then you'd be able to use reference to created fiber to join:
for {
  fiber <- ConcurrentEffect[IO].start(IO(println("Hello from another fiber!")))
  // _ <- do some more operations in parallel ...
  result <- fiber.join // here you can access the value returned by the fiber
  // (in your case it's Unit, so you can just ignore it)
} yield result
Using a fiber's start directly is not advised in most cases, since it can lead to resource leaks. You should consider using background instead, which returns a Resource that will automatically cancel and clean up the fiber at the end of processing.

omp_get_num_threads() and omp_get_thread_num() returning nonsense

I am just starting out with using OpenMP in Fortran using the Intel Fortran compiler and Visual Studio 2015. In the project properties I have "Fortran -> Language -> Process OpenMP Directives" set to "Generate Parallel Code (/Qopenmp)"
I have a simple program starting as follows:
program hellothreads
integer threads, id
call omp_set_num_threads(3)
threads = omp_get_num_threads()
print *,"there are", threads, "threads"
This produces
there are -2147483648 threads
which there certainly aren't. Setting the number of threads seems to work OK though, since:
!$OMP Parallel private(id) shared(threads)
threads = omp_get_num_threads()
id = omp_get_thread_num()
print *, "hello from thread", id, "out of", threads
!$OMP end Parallel
outputs
hello from thread -2147483648 out of -2147483648
hello from thread -2147483648 out of -2147483648
hello from thread -2147483648 out of -2147483648
and continuing with:
!$OMP Parallel private(id) shared(threads)
threads = omp_get_num_threads()
id = omp_get_thread_num()
print *, "this is thread", id, "of", threads
!$OMP end Parallel
outputs
this is thread -2147483648 of -2147483648
this is thread -2147483648 of -2147483648
Finally, there is different weird behaviour if I call the OpenMP functions inside a print statement, e.g.:
!$OMP Parallel private(id) shared(threads)
print *, "this is thread", omp_get_num_threads(), "of", omp_get_thread_num()
!$OMP end Parallel
stop
end
Outputs
this is thread NaN of NaN
this is thread NaN of NaN
What is wrong with my configuration and/or code?
Use implicit none in ALL your Fortran programs!!!
After doing that you will see that the functions are not declared anywhere and are therefore implicitly assumed to return reals. The nonsense real value is then converted to an integer value and stored in the variables you print.
As @francescalus recommends in the comments, with use omp_lib you bring in a module that contains the correct declarations of the OpenMP functions, and it will help you check whether you are using them correctly.
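Putting the two fixes together, a minimal corrected sketch of the program (note also that omp_get_num_threads() returns 1 when called outside a parallel region, so it is queried inside one here):

program hellothreads
   use omp_lib      ! correct declarations of the OpenMP routines
   implicit none
   integer :: threads, id

   call omp_set_num_threads(3)

   !$OMP Parallel private(id) shared(threads)
   threads = omp_get_num_threads()
   id = omp_get_thread_num()
   print *, "hello from thread", id, "out of", threads
   !$OMP end Parallel
end program hellothreads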

Is it legal for the index used in !$omp atomic to differ from its host loop's index variable?

I came across a question while learning how to avoid data conflicts when multiple threads may read and write the same location, using the OpenMP directive !$omp atomic.
Shown below is a code snippet made up for my question. I am wondering whether it is legal in Fortran to use a different index (here j) for !$omp atomic than the loop index variable i, i.e. the index of the loop immediately following the !$omp parallel do directive? Thanks.
program main
   ...
   integer :: i,j
   integer, dimension(10000) :: vx,vy,va,vb
   ...
   va=0
   !$omp parallel do private(j)
   do i=1,10000
      j=merge(vx(i),vy(i),mod(i,2)==1)
      !$omp atomic update
      va(j)=va(j)+vb(j)
   end do
   !$omp end parallel do
   ...
end program
Furthermore, is it OK to put an atomic directive inside an inner loop?
program main
   ...
   integer :: i,j,k
   integer, dimension(10000) :: vx,vy
   integer, dimension(12,10000) :: va,vb
   ...
   va=0
   !$omp parallel do private(j,k)
   do i=1,10000
      j=merge(vx(i),vy(i),mod(i,2)==1)
      do k=1,12
         !$omp atomic update
         va(k,j)=va(k,j)+vb(k,j)
      end do
   end do
   !$omp end parallel do
   ...
end program
Yes, why not? It is just an update of a memory location; there is no difference. There wouldn't even be much sense in using atomic with i in your case, as different threads have different values of i.
BUT be aware of the race condition on j: you are writing to it from more than one thread, so it must be private (as the private(j) clause in your snippets ensures).
Your second example adds nothing new; it is the same situation, and still legal.
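For intuition, the atomic update behaves roughly like a critical section around that single memory update, only much cheaper (an illustrative comparison, not an exact equivalence: atomic only synchronises with other atomic accesses to the same location):

! conceptually similar to the atomic update, but with far more overhead
!$omp critical
va(j)=va(j)+vb(j)
!$omp end critical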

Thread and Synchronization

I'm confused about how threads and synchronization work. I am working through a sample problem that is described like so:
There are two threads, P and Q. The variable counter is shared by both threads.
Modification of counter in one thread is visible to the other thread. The
increment instruction adds one to the variable, storing the new value.
1 global integer counter = 0
2
3 thread P()
4 incr(counter)
5 print counter
6 end
7
8 thread Q()
9 print counter
10 incr(counter)
11 print counter
12 incr(counter)
13 end
There are three print statements that output the value of counter. In the output
list below, indicate whether the given output is possible and, if it is, give
the interleaving of instructions (using thread and line numbers) of P and Q that
can lead to that output.
The example asks whether the output 122 is possible, and states that it can be produced by the interleaving P4, Q9, Q10, P5, Q11, Q12. I can't wrap my head around how this works.
Walk through that interleaving step by step:
- P4: thread P starts first and increments counter by one; counter is now 1.
- Q9: thread P is suspended; thread Q starts, reads counter and prints its value ("1").
- Q10: thread Q increments counter, which is now 2.
- P5: thread Q is suspended; thread P continues, reads counter and prints its value ("2"), then terminates.
- Q11: thread Q continues, reads and prints counter ("2").
- Q12: thread Q increments counter by one.
The output therefore is: "122"
That's one possible sequence of execution. Generally speaking, you can never tell when a thread gets suspended and when it continues; that's the whole point of this exercise. By adding synchronization mechanisms (which this example completely lacks) you can regain control over the sequence of execution.
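For example, in Fortran/OpenMP terms (the pseudocode above is language-agnostic, so this is just a sketch assuming a shared integer counter), wrapping each increment-and-print pair in a critical section would force those two statements to execute as a unit, eliminating interleavings such as the one traced above:

!$omp critical (counter_section)
counter = counter + 1    ! incr(counter)
print *, counter         ! the print is now paired atomically with the increment
!$omp end critical (counter_section)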
