I have 100+ tasks to do. I could run them in a loop, but that would be slow.
I want to run these jobs in threads, say 10 threads.
There are no dependencies between the jobs; each can run independently and should stop if it fails.
I want these threads to pick up my jobs and run them. There should never be more than 10 threads in total, otherwise it may harm the server.
The threads should keep picking up jobs until all are finished.
A job should be stopped in its thread when it times out.
I searched for information about this on the Internet and found Thread::Pool, Thread::Queue...
But I can't be sure which one is better for my case. Could anyone give me some advice?
You could use Thread::Queue and threads.
Communication between threads is much easier than communication between processes (IPC).
To fork or not to fork?
use strict;
use warnings;
use threads;
use Thread::Queue;

my $q = Thread::Queue->new();    # A new empty queue

# Worker threads
threads->create(sub {
    # Test with defined() so that a false item such as 0 does not end the loop
    while (defined(my $item = $q->dequeue())) {
        # Do work on $item
    }
})->detach() for 1..10;    # for 10 threads

my $dbh = ...
while (1) {
    # get items from db
    my @items = get_items_from_db($dbh);
    # Send work to the threads
    $q->enqueue(@items);
    print "Pending items: " . $q->pending() . "\n";
    sleep 15;    # check the DB every 15 secs
}
I'd never use Perl threads. The reason is that they aren't, conceptually speaking, threads: you have to specify what data is to be shared between the threads. Each thread runs its own Perl interpreter. That's why they are called interpreter threads, or ithreads. Needless to say, this consumes a lot of memory just to run things in parallel. fork() shares all the memory up to the fork point (copy-on-write on most modern Unix systems). So if the tasks are independent, always use fork. It's also the most Unix way of doing things.
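As a rough illustration of the fork approach for the original question, here is a minimal sketch that uses only core Perl and keeps at most 10 child processes alive at once. The task list and the run_task body are hypothetical placeholders, not part of the original post.

```perl
use strict;
use warnings;

my @tasks       = (1 .. 25);   # hypothetical stand-ins for the real jobs
my $max_workers = 10;          # never more than this many children at once
my $running     = 0;
my $failed      = 0;

# Hypothetical job body; here every task simply succeeds.
sub run_task {
    my ($task) = @_;
    return 1;
}

for my $task (@tasks) {
    # At the limit: block until one child exits before forking another.
    if ($running >= $max_workers) {
        wait();
        $failed++ if $? != 0;
        $running--;
    }
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: report success or failure through the exit status.
        exit(run_task($task) ? 0 : 1);
    }
    $running++;
}

# Reap the remaining children.
while (wait() != -1) {
    $failed++ if $? != 0;
}

print "All tasks done, $failed failed\n";
```

A failing task only affects its own child process, which matches the requirement that jobs are independent. Parallel::ForkManager from CPAN wraps the same pattern if a dependency is acceptable.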
Related
It looks like thread_sleep is not ending properly.
I could handle it using thread queues, semaphores and such but I am interested in what the problem may be here.
This code never ends, as the threads->list() size never decreases.
use strict;
use warnings;
use Thread;

my @threads;
my $count = 0;
while ( scalar( @threads ) < 10 ) {
    my $thr = threads->create( 'thread_sleep' );
    push @threads, $thr;
    $count++;
    print "Spawned Thread nr. $count\n";
    while ( threads->list() > 4 ) {
        print "too many threads, sleeping a second...\n";
        sleep( 1 );
    }
}

sub thread_sleep {
    sleep( 5 );
}
Threads work a lot like processes -- after a thread exits, it stays around in the thread list as a "zombie" thread until another thread (not necessarily its parent) calls $thr->join to collect its return value.
You aren't calling $thr->join anywhere, so these threads are piling up. You can use threads->list(threads::joinable) to check which threads have exited and are now joinable.
(Alternatively, consider using Parallel::ForkManager to manage multiple worker processes. Perl interpreter threads are messy and are best avoided.)
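A minimal sketch of the join-based fix, assuming a Perl built with ithreads and the threads module rather than Thread. The sleep is shortened to one second and the limit of 4 live threads is an arbitrary choice for the demonstration.

```perl
use strict;
use warnings;
use threads;

my $max_active = 4;     # assumed limit on unjoined threads
my $total      = 10;
my $spawned    = 0;
my $joined     = 0;

sub thread_sleep { sleep 1 }    # shortened from 5 seconds for the demo

while ($spawned < $total) {
    threads->create('thread_sleep');
    $spawned++;

    # Reap finished threads so that they leave the thread list.
    while (threads->list() >= $max_active) {
        my @done = threads->list(threads::joinable);
        if (@done) {
            for my $thr (@done) { $thr->join; $joined++ }
        }
        else {
            sleep 1;    # nothing has finished yet
        }
    }
}

# Join whatever is still running at the end.
for my $thr (threads->list()) { $thr->join; $joined++ }

print "Joined $joined threads\n";
```

Because every finished thread is joined, the count returned by threads->list() drops again and the outer loop can make progress.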
You are relying on a very outdated Perl module that was introduced in 1998 and removed from core in 2007. You don't say what version of Perl you're running, but software doesn't work like automobiles, where a pristine example of a very old edition is laudable.
You need to update your installation.
The documentation for the Thread module says this (markup original):
DEPRECATED
The Thread module served as the frontend to the old-style thread model, called 5005threads, that was introduced in release 5.005. That model was deprecated, and has been removed in version 5.10.
For old code and interim backwards compatibility, the Thread module has been reworked to function as a frontend for the new interpreter threads (ithreads) model. However, some previous functionality is not available. Further, the data sharing models between the two thread models are completely different, and anything to do with data sharing has to be thought differently. With ithreads, you must explicitly share() variables between the threads.
You are strongly encouraged to migrate any existing threaded code to the new model (i.e., use the threads and threads::shared modules) as soon as possible.
I'm creating a Perl application that runs multiple threads, each performing a time-consuming task. This is what I have so far:
use strict;
use warnings;
use threads;

my @file_list = ("file1", "file2", "file3");
my @jobs;
my @failed_jobs;
my $timeout = 10;    # 10 seconds timeout

foreach my $s (@file_list) {
    push @jobs, threads->create(sub {
        # time consuming task
    });
}
$_->join for @jobs;
The problem is that the time-consuming task may sometimes get stuck (or take more than $timeout seconds to run). When that happens, I want to get the name of the file, push it to @failed_jobs, and then kill that thread. However, I want to continue with the other threads. When all threads are either killed or completed, I want to exit.
Can someone tell me how to modify my above code to achieve this?
Thanks
If you want the ability to kill the task, you don't want threads but processes.
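A minimal sketch of the process-based variant, assuming a Unix-like system. Each file gets its own child process, and alarm lets the kernel terminate a child that exceeds the timeout. The process_file body is a hypothetical stand-in in which "file2" hangs, and the timeout is shortened to 2 seconds for the demonstration.

```perl
use strict;
use warnings;

my @file_list = ("file1", "file2", "file3");
my $timeout   = 2;    # seconds; shortened here for demonstration
my (%pid_to_file, @failed_jobs);

# Hypothetical stand-in for the time-consuming task: "file2" gets stuck.
sub process_file {
    my ($file) = @_;
    sleep 10 if $file eq 'file2';
    return;
}

for my $file (@file_list) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: with no handler installed, SIGALRM terminates the process.
        alarm $timeout;
        process_file($file);
        exit 0;
    }
    $pid_to_file{$pid} = $file;
}

# Parent: reap every child and record those that did not exit cleanly.
while ((my $pid = wait()) != -1) {
    push @failed_jobs, $pid_to_file{$pid} if $? != 0;
}

print "Failed: @failed_jobs\n";
```

The parent never has to kill anything itself: a child that is killed by the alarm shows up with a non-zero wait status, and the other children continue unaffected.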
I wrote a very short Perl script that uses multiple threads.
My problem is that the thread I created is not joinable. So I am wondering: what conditions make a thread joinable?
What is the limit of a thread in Perl?
#!/usr/bin/env perl
#
#
use lib "$::XCATROOT/lib/perl";
use strict;
use threads;
use Safe;

sub test
{
    my $parm = shift;
}

my $newchassis = ["1", "2", "3"];
my @snmp_threads;
for my $item (@$newchassis)
{
    my $thread = threads->create(\&test, $item);
    push @snmp_threads, $thread;
}
for my $t (@snmp_threads)
{
    $t->join();
}
This can be very tricky, as it works fine on RHEL 6.3 but fails on SLES 11sp2.
Though there is no code, I will go ahead and assume that you are using join foreach @threads; to join the threads. How you join the threads depends on the post-processing. Without seeing your code it's difficult to know what you are doing, but this is how it works:
If the post-processing step needs all threads to finish before beginning work, then the wait for individual threads is unavoidable.
If the post-processing step is specific to the results of each thread, it should be possible to make the post-processing part of the thread itself.
In both cases, $_->join foreach @threads; is the way to go.
If there is no need to wait for the threads to finish, use the detach command instead of join. However, any results that the threads may return will be discarded.
Are you sure you have provided a valid post-processing scenario for your activity?
First of all, I'm new to Perl.
I want to make multiple (e.g. 160) HTTP GET requests on a REST API in Perl. Executing them one after another takes much time, so I was thinking of running the requests in parallel. Therefore I used threads to execute more requests at the same time and limited the number of parallel requests to 10.
This worked just fine the first time I ran the program. The second time, I ran out of memory after the 40th request.
Here's the code (@urls contains the 160 URLs for the requests):
while (@urls) {
    my @threads;
    for (my $j = 0; $j < 10 and @urls; $j++) {
        my $url = shift(@urls);
        push @threads, async { $ua->get($url) };
    }
    for my $thread (@threads) {
        my $response = $thread->join;
        print "$response\n";
    }
}
So my question is, why am I NOT running out of memory the first time but the second time (am I missing something crucial in my code)? And what can I do to prevent it?
Or is there a better way of executing parallel GET requests?
I'm not sure why you would get an OOM error on a second run when you don't get one on the first run; when you run a Perl script and the perl binary exits, it releases all of its memory back to the OS. Nothing is kept between executions. Is the exact same data being returned by the REST service each time? Maybe there's more data the second time you run, and it's pushing you over the edge.
One problem I notice is that you're launching 10 threads and running them to completion, then spawning 10 more threads. A better solution may be a worker-thread model. Spawn 10 threads (or however many you want) at the start of the program, put the URLs into a queue, and allow the threads to process the queue themselves. Here's a quick example that may help:
use strict;
use warnings;
use threads;
use Thread::Queue;

# $ua and @urls are assumed to be set up as in the question
my $q = Thread::Queue->new();
my @thr = map {
    threads->create(sub {
        my @responses = ();
        while (defined (my $url = $q->dequeue())) {
            push @responses, $ua->get($url);
        }
        return @responses;
    });
} 1..10;

$q->enqueue($_) for @urls;
$q->enqueue(undef) for 1..10;    # one undef per worker tells it to exit

foreach (@thr) {
    my @responses_of_this_thread = $_->join();
    print for @responses_of_this_thread;
}
Note, I haven't tested this to make sure it works. In this example, you create a new thread queue and spawn up 10 worker threads. Each thread will block on the dequeue method until there is something to be read. Next, you queue up all the URLs that you have, and an undef for each thread. The undef will allow the threads to exit when there is no more work to perform. At this point, the threads will go through and process the work, and you will gather the responses via the join at the end.
Whenever I need an asynchronous solution in Perl, I first look at the POE framework. In this particular case I used the POE HTTP Request module, which allows us to send multiple requests simultaneously and provides a callback mechanism where you can process your HTTP responses.
Perl threads are scary and can crash your application, especially when you join or detach them. If responses do not take a long time to process, a single threaded POE solution would work beautifully.
Sometimes, though, we have to rely on threading because the application gets blocked by long-running tasks. In those cases, I create a certain number of threads BEFORE initiating anything in the application. Then with Thread::Queue I pass the data from the main thread to these workers AND never join/detach them; always keep them around for stability purposes.
(Not an ideal solution for every case.)
POE supports threads now and each thread can run a POE::Kernel. The kernels can communicate with each other through TCP sockets (for which POE provides nice non-blocking interfaces).
The memory consumption of the following code increases in the course of its execution.
What could be going wrong? Is there something else I need to do to exit cleanly from the thread?
#!/usr/bin/perl -w
use strict;
my ($i, $URL);
my @Thread;
my $NUM_THREADS = 4;
my @response :shared = ();
while (1)
{
    for ($i = 0; $i < $NUM_THREADS; $i++)
    {
        if ( $response[$i] is processed)
        {
            $URL = FindNextURL();
            $Thread[$i] = new threads \&Get, $i, $URL;
            $Thread[$i]->detach();
        }
    }
    # wait for at least one $response[$i]
    # if ready process it
}
sub Get
{
    my $i = $_[0];
    my $URL = $_[1];
    $response[$i] = FetchURL($URL);
    return;
}
From http://perldoc.perl.org/threads.html:
"On most systems, frequent and continual creation and destruction of threads can lead to ever-increasing growth in the memory footprint of the Perl interpreter. While it is simple to just launch threads and then ->join() or ->detach() them, for long-lived applications, it is better to maintain a pool of threads, and to reuse them for the work needed, using queues to notify threads of pending work. The CPAN distribution of this module contains a simple example (examples/pool_reuse.pl) illustrating the creation, use and monitoring of a pool of reusable threads."
Please try to use a pool of threads.
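A minimal sketch of such a pool, assuming a Perl built with ithreads. This is not the examples/pool_reuse.pl script mentioned in the documentation, only an illustration of the same idea. The fetch_url function and the example.com URLs are hypothetical placeholders for FetchURL and FindNextURL from the question.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $num_workers = 4;
my $work        = Thread::Queue->new();    # URLs waiting to be fetched
my $results     = Thread::Queue->new();    # responses sent back to the main thread

# Hypothetical stand-in for FetchURL from the question.
sub fetch_url {
    my ($url) = @_;
    return "response for $url";
}

# Create the pool once; each worker is reused for many URLs.
my @pool;
for (1 .. $num_workers) {
    push @pool, threads->create(sub {
        while (defined(my $url = $work->dequeue())) {
            $results->enqueue(fetch_url($url));
        }
    });
}

my @urls = map { "http://example.com/$_" } 1 .. 20;    # placeholder URLs
$work->enqueue(@urls);
$work->enqueue(undef) for 1 .. $num_workers;    # one stop marker per worker

$_->join for @pool;

# All workers have finished, so a non-blocking drain is sufficient.
my @responses;
while (defined(my $r = $results->dequeue_nb())) {
    push @responses, $r;
}
print scalar(@responses), " responses collected\n";
```

Only four threads are ever created here, regardless of how many URLs are processed, which avoids the repeated create and detach cycle that the documentation warns about.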