How do I parallelize GPars Actors? - multithreading

My understanding of GPars Actors may be off so please correct me if I'm wrong. I have a Groovy app that polls a web service for jobs. When one or more jobs are found it sends each job to a DynamicDispatchActor I've created, and the job is handled. The jobs are completely self-contained and don't need to return anything to the main thread. When multiple jobs come in at once I'd like them to be processed in parallel, but no matter what configuration I try the actor processes them first in first out.
To give a code example:
def poolGroup = new DefaultPGroup(new DefaultPool(true, 5))
def actor = poolGroup.messageHandler {
    when { Integer msg ->
        println("I'm number ${msg} on thread ${Thread.currentThread().name}")
        Thread.sleep(1000)
    }
}
def integers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
integers.each {
    actor << it
}
This prints out:
I'm number 1 on thread Actor Thread 31
I'm number 2 on thread Actor Thread 31
I'm number 3 on thread Actor Thread 31
I'm number 4 on thread Actor Thread 31
I'm number 5 on thread Actor Thread 31
I'm number 6 on thread Actor Thread 31
I'm number 7 on thread Actor Thread 31
I'm number 8 on thread Actor Thread 31
I'm number 9 on thread Actor Thread 31
I'm number 10 on thread Actor Thread 31
With a slight pause in between each print out. Also notice that each printout happens from the same Actor/thread.
What I'd like to see here is the first 5 numbers are printed out instantly because the thread pool is set to 5, and then the next 5 numbers as those threads free up. Am I completely off base here?

To make it run as you expect, there are a few changes to make:
import groovyx.gpars.group.DefaultPGroup
import groovyx.gpars.scheduler.DefaultPool
def poolGroup = new DefaultPGroup(new DefaultPool(true, 5))
def closure = {
    when { Integer msg ->
        println("I'm number ${msg} on thread ${Thread.currentThread().name}")
        Thread.sleep(1000)
        stop()
    }
}
def integers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
def actors = integers.collect { poolGroup.messageHandler(closure) << it }
actors*.join()
Full gist file: https://gist.github.com/wololock/7f1348e04f68710e42d2
Then the output will be:
I'm number 5 on thread Actor Thread 5
I'm number 4 on thread Actor Thread 4
I'm number 1 on thread Actor Thread 1
I'm number 3 on thread Actor Thread 3
I'm number 2 on thread Actor Thread 2
I'm number 6 on thread Actor Thread 3
I'm number 9 on thread Actor Thread 4
I'm number 7 on thread Actor Thread 2
I'm number 8 on thread Actor Thread 5
I'm number 10 on thread Actor Thread 1
Now let's look at what changed. In your original example you worked with a single actor only. You defined poolGroup correctly, but then you created one actor and sent every message to that single instance, and an actor handles its messages one at a time. To run the computations in parallel, create one message handler per input from poolGroup and send the input to it; the pool group manages the threads the actors run on. This is what we do in:
def actors = integers.collect { poolGroup.messageHandler(closure) << it }
It creates a collection of actors, each started with one input. The pool group makes sure that the specified pool size is not exceeded. Then you have to join each actor, which can be done with Groovy's spread operator: actors*.join(). Thanks to that, the application waits until all actors have stopped before it terminates. That's why we have to call stop() in the when closure of the message handler's body: without it the actor won't terminate, because the pool group does not know that the actors have done their job; they might still be waiting for another message.
Alternative solution
We can also consider an alternative solution that uses GPars parallel collection iteration:
import groovyx.gpars.GParsPool
// This is a toy example, but let's assume that this processor is a
// stateless component shared between threads.
class Processor {
    void process(int number) {
        println "${Thread.currentThread().name} starting with number ${number}"
        Thread.sleep(1000)
    }
}
def integers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Processor processor = new Processor()
GParsPool.withPool 5, {
    integers.eachParallel { processor.process(it) }
}
In this example a single stateless Processor instance is shared between threads, and the computations run in parallel over multiple input values.
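For comparison, the same "10 independent jobs on at most 5 threads" idea can be sketched with the plain Java concurrency API, which is also usable from Groovy. This is only an illustration, not part of GPars: the class name PooledJobs, the run method and its parameters are made up for this sketch, and the counters exist only to show how many jobs ran at the same time.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: independent jobs on a fixed-size thread pool.
class PooledJobs {
    // Returns {jobsProcessed, peakNumberOfJobsRunningAtOnce}.
    static int[] run(int jobs, int poolSize, long workMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        AtomicInteger processed = new AtomicInteger();
        for (int i = 1; i <= jobs; i++) {
            final int number = i;
            pool.execute(() -> {
                // Track how many jobs are active right now.
                peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                System.out.println("I'm number " + number + " on thread "
                        + Thread.currentThread().getName());
                try {
                    Thread.sleep(workMillis); // simulated work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                running.decrementAndGet();
                processed.incrementAndGet();
            });
        }
        pool.shutdown(); // accept no new jobs, finish the queued ones
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] { processed.get(), peak.get() };
    }

    public static void main(String[] args) {
        int[] result = run(10, 5, 1000);
        System.out.println(result[0] + " jobs done, at most " + result[1] + " at once");
    }
}
```

The pool never runs more than poolSize jobs at once; the remaining jobs wait in the executor's queue until a thread frees up.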
I've tried to figure out the case you mentioned in a comment, but I'm not sure whether a single actor can process multiple messages at a time. Statelessness of an actor only means that it does not change its internal state while processing a message and does not store any other information in actor scope. It would be great if someone could correct me if my reasoning is wrong :)
I hope this will help you. Best!

Related

Efficient use of Future

Is there any difference between these two approaches?
val upload = for {
  done <- Future {
    println("uploadingStart")
    uploadInAmazonS3 //take 10 to 12 sec
    println("uploaded")
  }
} yield done
println("uploadingStart")
val upload = for {
  done <- Future {
    uploadInAmazonS3 //take 10 to 12 sec
  }
} yield done
println("uploadingStart")
I want to know in terms of thread blocking.
Is the thread blocked while executing these three lines
println("uploadingStart")
uploadInAmazonS3 //take 10 to 12 sec
println("uploaded")
in the first case, while in the second case it is not blocked?
Or is the thread equally busy in both cases?
The code inside the Future is executed by some thread from the ExecutionContext (thread pool).
Yes, the thread which executes this part
println("uploadingStart")
uploadInAmazonS3 //take 10 to 12 sec
println("uploaded")
will be blocked, but not the calling thread (main thread).
In the second case both println statements are executed by the main thread. Since the main thread simply proceeds after creating the future, the println statements are executed without any delay.
The difference is that in the former code the println calls run when the body of the future is actually executed, whereas in the second one they run on the calling thread around the point where the future is created, regardless of whether the upload has started or finished.
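The same behaviour can be observed with Java's CompletableFuture, used here only as an analogy to Scala's Future. In this sketch slowUpload is a hypothetical stand-in for uploadInAmazonS3, and the class and method names are made up for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: the slow call blocks a pool thread, not the caller.
class UploadDemo {
    // Stand-in for the slow upload; returns the name of the thread that ran it.
    static String slowUpload(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return Thread.currentThread().getName();
    }

    // Returns {callerThread, workerThread, whether the future was already done right after creation}.
    static String[] run(long millis) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture<String> upload =
                    CompletableFuture.supplyAsync(() -> slowUpload(millis), pool);
            // The caller gets here immediately; the upload is still running.
            boolean doneRightAway = upload.isDone();
            // Only this explicit wait blocks the calling thread.
            String worker = upload.join();
            return new String[] {
                Thread.currentThread().getName(), worker, String.valueOf(doneRightAway)
            };
        } finally {
            pool.shutdown();
        }
    }
}
```

The pool thread is busy for the whole duration of the upload in both variants from the question; only the thread on which the println calls run differs.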

Perl Queue and Threads abnormal exit

I am quite new to Perl, especially Perl Threads.
I want to accomplish:
Have 5 threads that will enqueue data (random numbers) into a Thread::Queue.
Have 3 threads that will dequeue data from the Thread::Queue.
The complete code that I wrote to achieve the above:
#!/usr/bin/perl -w
use strict;
use threads;
use Thread::Queue;
my $queue = new Thread::Queue();
our @Enquing_threads;
our @Dequeuing_threads;
sub buildQueue
{
    my $TotalEntry=1000;
    while($TotalEntry-- >0)
    {
        my $query = rand(10000);
        $queue->enqueue($query);
        print "Enque thread with TID " .threads->tid . " got $query,";
        print "Queue Size: " . $queue->pending . "\n";
    }
}
sub process_Queue
{
    my $query;
    while ($query = $queue->dequeue)
    {
        print "Dequeu thread with TID " .threads->tid . " got $query\n";
    }
}
push @Enquing_threads,threads->create(\&buildQueue) for 1..5;
push @Dequeuing_threads,threads->create(\&process_Queue) for 1..3;
Issues that I am Facing:
The threads are not running as concurrently as expected.
The entire program abnormally exit with following console output:
Perl exited with active threads:
8 running and unjoined
0 finished and unjoined
0 running and detached
Enque thread with TID 5 got 6646.13585023883,Queue Size: 595
Enque thread with TID 1 got 3573.84104215917,Queue Size: 595
Any help on code-optimization is appreciated.
This behaviour is to be expected: When the main thread exits, all other threads exit as well. If you don't care, you can $thread->detach them. Otherwise, you have to manually $thread->join them, which we'll do.
The $thread->join waits for the thread to complete, and fetches the return value (threads can return values just like subroutines, although the context (list/void/scalar) has to be fixed at spawn time).
We will detach the threads that enqueue data:
threads->create(\&buildQueue)->detach for 1..5;
Now for the dequeueing threads, we put them into a lexical variable (why are you using globals?), so that we can join them later:
my @dequeue_threads = map threads->create(\&process_Queue), 1 .. 3;
Then wait for them to complete:
$_->join for @dequeue_threads;
We know that the detached threads will finish execution before the program exits, because the only way for the dequeueing threads to exit is to exhaust the queue.
Except for one and a half bugs. You see, there is a difference between an empty queue and a finished queue. If the queue is just empty, the dequeueing threads will block on $queue->dequeue until they get some input. The traditional solution is to keep dequeueing while the value they get is defined. We can break the loop by putting as many undef values into the queue as there are threads reading from it. More modern versions of Thread::Queue have an end method that makes dequeue return undef for all subsequent calls.
The problem is when to end the queue. We should do this after all enqueueing threads have exited. Which means we should wait for them manually. Sigh.
my @enqueueing = map threads->create(\&enqueue), 1..5;
my @dequeueing = map threads->create(\&dequeue), 1..3;
$_->join for @enqueueing;
$queue->enqueue(undef) for 1..3;
$_->join for @dequeueing;
And in sub dequeue: while(defined( my $item = $queue->dequeue )) { ... }.
Using the defined test fixes another bug: rand can return zero, which is false in Perl and would end the original loop early, although this is quite unlikely and will slip through most tests. The contract of rand is that it returns a pseudo-random floating point number from zero (inclusive) up to some upper bound (exclusive): a number from the interval [0, x). The bound defaults to 1.
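The "one end marker per reader" idea is not specific to Perl. As a rough sketch in Java, an empty Optional can play the role of undef. The class name EndMarkerDemo, the run method and the thread counts are assumptions made for this illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: finish a shared queue with one end marker per consumer.
class EndMarkerDemo {
    // Returns the total number of real items the consumers dequeued.
    static int run(int producers, int consumers, int perProducer) {
        BlockingQueue<Optional<Double>> queue = new LinkedBlockingQueue<>();
        AtomicInteger consumed = new AtomicInteger();
        List<Thread> enqueuers = new ArrayList<>();
        List<Thread> dequeuers = new ArrayList<>();
        for (int p = 0; p < producers; p++) {
            Thread t = new Thread(() -> {
                for (int i = 0; i < perProducer; i++) {
                    queue.add(Optional.of(ThreadLocalRandom.current().nextDouble(10_000)));
                }
            });
            enqueuers.add(t);
            t.start();
        }
        for (int c = 0; c < consumers; c++) {
            Thread t = new Thread(() -> {
                try {
                    // An empty Optional is the end marker; 0.0 is still a valid item.
                    while (queue.take().isPresent()) {
                        consumed.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            dequeuers.add(t);
            t.start();
        }
        try {
            for (Thread t : enqueuers) t.join();          // all data is in the queue
            for (int c = 0; c < consumers; c++) {
                queue.add(Optional.empty());              // one marker per consumer
            }
            for (Thread t : dequeuers) t.join();          // consumers drain, then exit
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return consumed.get();
    }
}
```

Because the markers are enqueued only after every producer has been joined, they are guaranteed to sit behind all real items, so no consumer exits while data is still pending.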
If you don't want to join the enqueueing threads manually, you could use a semaphore to signal completion. A semaphore is a multithreading primitive that can be incremented and decremented, but not below zero. If a decrement operation would let the count drop below zero, the call blocks until another thread raises the count. If the start count is 1, this can be used as a flag to block resources.
We can also start with a negative value 1 - $NUM_THREADS and have each thread increment the value, so that it can only be decremented again once all threads have finished.
use threads; # make a habit of importing `threads` as the first thing
use strict; use warnings;
use feature 'say';
use Thread::Queue;
use Thread::Semaphore;
use constant {
    NUM_ENQUEUE_THREADS => 5, # it's good to fix the thread counts early
    NUM_DEQUEUE_THREADS => 3,
};
sub enqueue {
    my ($out_queue, $finished_semaphore) = @_;
    my $tid = threads->tid;
    # iterate over ranges instead of using the while($maxval --> 0) idiom
    for (1 .. 1000) {
        $out_queue->enqueue(my $val = rand 10_000);
        say "Thread $tid enqueued $val";
    }
    $finished_semaphore->up;
    # try a non-blocking decrement. Returns true only for the last thread exiting.
    if ($finished_semaphore->down_nb) {
        $out_queue->end; # for sufficiently modern versions of Thread::Queue
        # $out_queue->enqueue(undef) for 1 .. NUM_DEQUEUE_THREADS;
    }
}
sub dequeue {
    my ($in_queue) = @_;
    my $tid = threads->tid;
    while(defined( my $item = $in_queue->dequeue )) {
        say "thread $tid dequeued $item";
    }
}
# create the queue and the semaphore
my $queue = Thread::Queue->new;
my $enqueuers_ended_semaphore = Thread::Semaphore->new(1 - NUM_ENQUEUE_THREADS);
# kick off the enqueueing threads -- they handle themselves
threads->create(\&enqueue, $queue, $enqueuers_ended_semaphore)->detach for 1..NUM_ENQUEUE_THREADS;
# start and join the dequeuing threads
my @dequeuers = map threads->create(\&dequeue, $queue), 1 .. NUM_DEQUEUE_THREADS;
$_->join for @dequeuers;
Don't be surprised if the threads do not seem to run in parallel, but sequentially: this task (enqueueing a random number) is very fast and is not well suited for multithreading (enqueueing is more expensive than creating a random number).
Here is a sample run where each enqueuer only creates two values:
Thread 1 enqueued 6.39390993005694
Thread 1 enqueued 0.337993319585337
Thread 2 enqueued 4.34504733960242
Thread 2 enqueued 2.89158054485114
Thread 3 enqueued 9.4947585773571
Thread 3 enqueued 3.17079715055542
Thread 4 enqueued 8.86408863197179
Thread 5 enqueued 5.13654995317669
Thread 5 enqueued 4.2210886147538
Thread 4 enqueued 6.94064174636395
thread 6 dequeued 6.39390993005694
thread 6 dequeued 0.337993319585337
thread 6 dequeued 4.34504733960242
thread 6 dequeued 2.89158054485114
thread 6 dequeued 9.4947585773571
thread 6 dequeued 3.17079715055542
thread 6 dequeued 8.86408863197179
thread 6 dequeued 5.13654995317669
thread 6 dequeued 4.2210886147538
thread 6 dequeued 6.94064174636395
You can see that 5 managed to enqueue a few things before 4. The threads 7 and 8 don't get to dequeue anything, 6 is too fast. Also, all enqueuers are finished before the dequeuers are spawned (for such a small number of inputs).

Several producers, one consumer: Avoid starvation

Given an instance of the producer-consumer problem, where several producers send messages to a single consumer: What techniques do you recommend to avoid starvation of producers, when some of the messages arrive "at the same time" to the consumer. Until now I am considering:
Choosing "non-deterministically" by sampling some probability distribution (not sure how, considering that a different number of messages are arrived at different time stamps).
Using some counters and putting a producer to sleep for a while after it has sent n messages.
If you can have a priority queue, I think each producer can keep a counter of the messages it has sent. The queue then orders by that message number and by creation date, so that a message is delivered before another one if its number is lower than the other message's.
In Java
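import java.util.Comparator;
import java.util.Date;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.PriorityBlockingQueue;

class Message { // or you can implement Comparable<Message>
    final Date created = new Date();
    final int messageNumber;
    public Message(int m) { this.messageNumber = m; }
}

// Lower message numbers come out first; ties are broken by creation time.
// PriorityBlockingQueue has no comparator-only constructor, so an initial
// capacity is passed as well (11 is its default).
BlockingQueue<Message> queue = new PriorityBlockingQueue<Message>(11, new Comparator<Message>() {
    public int compare(Message m1, Message m2) {
        if (m1.messageNumber != m2.messageNumber) {
            return Integer.compare(m1.messageNumber, m2.messageNumber);
        }
        return m1.created.compareTo(m2.created);
    }
});

class Provider {
    int currentMessage = 0;
    void send() {
        queue.offer(new Message(currentMessage++));
    }
}
So if Producer 1 adds 5 elements to the queue (first) and Producer 2 adds 1, the queue will hold the following, with the head of the queue at the bottom: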
P1: 5
P1: 4
P1: 3
P1: 2
P2: 1
P1: 1
One of the simplest and best ways is to process the messages in the order of their arrival (a simple FIFO queue would do the trick). It doesn't matter even if multiple messages arrive at the same time. This way, none of the producers will be starved.
One thing I would make sure of is that the consumer consumes messages faster than the producers produce them. If not, the producers may end up waiting for the consumer, and there won't be any advantage in having multiple producers for a single consumer.
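A minimal sketch of the FIFO approach in Java, assuming a bounded queue so that producers block (instead of piling up messages) when the consumer falls behind. The class name FifoConsumerDemo, the run method and the capacity value are illustrative assumptions; the fair flag of ArrayBlockingQueue makes blocked producers get access in the order in which they started waiting.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: several producers, one consumer, one fair FIFO queue.
class FifoConsumerDemo {
    // Returns how many messages the consumer received from each producer.
    static int[] run(int producers, int perProducer, int capacity) {
        // fair = true: waiting producers are served in FIFO order
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity, true);
        int[] received = new int[producers];
        Thread[] threads = new Thread[producers];
        for (int p = 0; p < producers; p++) {
            final int id = p;
            threads[p] = new Thread(() -> {
                try {
                    for (int i = 0; i < perProducer; i++) {
                        queue.put(id); // blocks while the queue is full
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[p].start();
        }
        try {
            // The calling thread acts as the single consumer.
            for (int i = 0; i < producers * perProducer; i++) {
                received[queue.take()]++;
            }
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return received;
    }
}
```

Every message that enters the queue is eventually taken in arrival order, so no producer can be skipped indefinitely; the capacity only limits how far the producers can run ahead of the consumer.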

Limit number of threads in Groovy

How can I limit number of threads that are being executed at the same time?
Here is sample of my algorithm:
for(i = 0; i < 100000; i++) {
    Thread.start {
        // Do some work
    }
}
I would like to make sure that once the number of threads in my application hits 100, the algorithm pauses/waits until the number of threads in the app goes below 100.
Currently "some work" takes some time to do and I end up with a few thousand threads in my app. Eventually it runs out of threads and "some work" crashes. I would like to fix it by limiting the number of threads that can be in use at one time.
Please let me know how to solve my issue.
I believe you are looking for a ThreadPoolExecutor in the Java Concurrency API. The idea here is that you can define a maximum number of threads in a pool and then instead of starting new Threads with a Runnable, just let the ThreadPoolExecutor take care of managing the upper limit for Threads.
Start here: http://docs.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/ThreadPoolExecutor.html
import java.util.concurrent.*;
import java.util.*;
def queue = new ArrayBlockingQueue<Runnable>( 50000 )
def tPool = new ThreadPoolExecutor(5, 500, 20, TimeUnit.SECONDS, queue);
for(i = 0; i < 5000; i++) {
    tPool.execute {
        println "Blah"
    }
}
tPool.shutdown() // finish the queued tasks, then let the pool threads exit
Parameters for the ThreadPoolExecutor constructor: corePoolSize (5) is the number of threads to keep in the pool even when they are idle; maximumPoolSize (500) is the maximum number of threads to create; the 3rd and 4th arguments state that idle threads beyond the core size are kept around for 20 seconds before they are terminated; and the queue argument is a blocking queue that stores queued tasks. Note that threads beyond the core size are only created once the queue is full.
What you'll want to play around with is the queue size and also how to handle rejected tasks. If you need to execute 100k tasks, you'll either have to have a queue that can hold 100k tasks, or you'll have to have a strategy for handling rejected tasks.
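One possible strategy for rejected tasks, sketched here in Java (the same classes can be used from Groovy), is ThreadPoolExecutor.CallerRunsPolicy: when both the pool and the queue are full, the submitting thread runs the task itself, which pauses the submission loop. The class name ThrottledSubmit, the run method and the sizes used are assumptions for illustration only.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a small bounded queue plus CallerRunsPolicy as back-pressure.
class ThrottledSubmit {
    // Returns {tasksCompleted, peakNumberOfTasksRunningAtOnce}.
    static int[] run(int tasks, int poolSize, int queueCapacity) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                poolSize, poolSize, 20, TimeUnit.SECONDS,
                new ArrayBlockingQueue<Runnable>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            // If pool and queue are full, this call runs the task on the
            // submitting thread, so the loop cannot race ahead.
            pool.execute(() -> {
                peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                try {
                    Thread.sleep(1); // simulated work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                running.decrementAndGet();
                completed.incrementAndGet();
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] { completed.get(), peak.get() };
    }
}
```

With this policy no task is dropped and the queue can stay small; at most poolSize tasks run on pool threads, plus possibly one on the submitting thread.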

Seeking help with a MT design pattern

I have a queue of 1000 work items and an n-proc machine (assume n =
4). The main thread spawns n (=4) worker threads at a time (250 outer
iterations) and waits for all threads to complete before processing
the next n (=4) items, until the entire queue is processed:
for(i= 0 to queue.Length / numprocs)
{
    for(j= 0 to numprocs)
    {
        CreateThread(WorkerThread,WorkItem)
    }
    WaitForMultipleObjects(threadHandle[])
}
The work done by each (worker) thread is not homogeneous. Therefore, in
one batch (of n), if thread 1 spends 1000 s doing work and the other 3
threads only 1 s, the above design is inefficient, because after 1 s the
other 3 processors are idling. Besides, there is no pooling: 1000
distinct threads are being created.
How do I use the NT thread pool (I am not familiar enough with it, hence
the long-winded question) and QueueUserWorkItem to achieve the above? The
following constraints should hold:
The main thread requires that all work items are processed before
it can proceed, so I would think that a wait-all construct like the one
above is required.
I want to create as many threads as processors (i.e. not 1000 threads
at a time).
Also, I don't want to create 1000 distinct events, pass them to the worker
threads, and wait on all events using the QueueUserWorkItem API or
otherwise.
The existing code is in C++. I prefer C++ because I don't know C#.
I suspect that the above is a very common pattern and was looking for
input from you folks.
I'm not a C++ programmer, so I'll give you some half-way pseudo code for it
tcount = 0
maxproc = 4
while queue_item = queue.get_next() # depends on implementation of queue
                                    # may well be:
                                    # for i=0; i<queue.length; i++
    while tcount == maxproc
        wait 0.1 seconds # or some other interval that isn't as cpu intensive
                         # as continuously running the loop
    tcount += 1 # must be atomic (reading the value and writing the new
                # one must happen without interruption from other
                # threads). In C++ a plain ++tcount on an int is not
                # atomic; use std::atomic<int> or InterlockedIncrement.
    new thread(worker, queue_item)
while tcount > 0 # the main thread waits for the remaining workers
    wait 0.1 seconds

function worker(item)
    # ...do stuff with item here...
    tcount -= 1 # must be atomic
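The counter-and-sleep loop above can also be expressed with a counting semaphore, which blocks instead of polling. The sketch below is in Java rather than C++ and only illustrates the shape of the idea; the class name BoundedWorkers, the run method and the simulated work are assumptions, not Win32 thread pool code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: never more than maxProc workers at once, no polling.
class BoundedWorkers {
    // Returns {itemsProcessed, peakNumberOfWorkersRunningAtOnce}.
    static int[] run(int items, int maxProc) {
        Semaphore slots = new Semaphore(maxProc); // plays the role of tcount
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        AtomicInteger done = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        try {
            for (int i = 0; i < items; i++) {
                slots.acquire(); // blocks while maxProc workers are active
                Thread t = new Thread(() -> {
                    try {
                        peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                        Thread.sleep(2); // ...do stuff with the item here...
                        done.incrementAndGet();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        running.decrementAndGet();
                        slots.release(); // frees the slot for the next item
                    }
                });
                threads.add(t);
                t.start();
            }
            // The main thread proceeds only after every item is processed.
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] { done.get(), peak.get() };
    }
}
```

Like the pseudo code, this still creates one thread per work item; a fixed pool of maxProc long-lived worker threads pulling from the queue would avoid that cost as well.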
