Threads in Perl: order of detach() - multithreading

I am trying to get into Perl's use of threads. Reading the documentation I came across the following code:
use threads;
my $thr = threads->create(\&sub1); # Spawn the thread
$thr->detach(); # Now we officially don't care any more
sleep(15); # Let thread run for awhile
sub sub1 {
my $count = 0;
while (1) {
$count++;
print("\$count is $count\n");
sleep(1);
}
}
The goal, it seems, would be to create one thread running sub1 for 15 seconds, and in the mean time print some strings. However, I don't think I understand what's going on at the end of the programme.
First of all, detach() is defined as follows:
Once a thread is detached, it'll run until it's finished; then Perl
will clean up after it automatically.
However, when does the subroutine finish? while(1) never finishes, and I can't find anything in the documentation for sleep() saying that it breaks out of a loop. On top of that, from the point we detach, we are 'waiting for the thread to finish and then cleaning up after it' for 15 seconds. If we are waiting for the subroutine to finish, why do we need sleep() in the main script? Its position seems awkward to me; it suggests that the main programme sleeps for 15 seconds. But what is the point of that? The main programme (thread?) sleeps while the sub-thread keeps running, but how is the subroutine then terminated?
I guess the idea is that after the sleep is done, the subroutine ends, after which we can detach/clean up. But how is this syntactically clear? Where in the definition of sleep does it say that sleep terminates a subroutine (and why), and how does it know which one to terminate if there is more than one thread?

All threads end when the program ends. The program ends when the main thread ends. The sleep in the main thread merely keeps the program running for a short time, after which the main thread (and therefore the program, and therefore all created threads) ends.
So what's up with detach? It just says "I'm never going to bother joining to this thread, and I don't care what it returns". If you don't either detach a thread or join to it, you'd get a warning when the program ends.
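As a rough illustration of the difference between join and detach, here is a minimal sketch, assuming a Perl built with ithreads. The anonymous subs and the variable names are made up for the example; only create, join, detach and is_detached come from the threads module.

```perl
use strict;
use warnings;
use threads;

# A joined thread: join() blocks until the thread finishes and
# hands back whatever the thread's sub returned.
my $joined = threads->create(sub { return 6 * 7 });
my $answer = $joined->join();
print "joined thread returned $answer\n";

# A detached thread: it can no longer be joined and its return
# value is thrown away. If the main thread exits first, the
# detached thread is simply stopped along with the program.
my $detached = threads->create(sub { return 6 * 7 });
$detached->detach();
my $state = $detached->is_detached() ? "detached" : "joinable";
print "second thread is $state\n";
```

The first print shows the value carried back by join; nothing comparable exists for the detached thread.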

Detaching a thread means "I don't care any more", and that does actually mean that when your process exits, the thread will error and terminate.
Practically speaking, I don't think you ever want to detach a thread in Perl. Just add a join at the end of your code so it can exit cleanly, and signal the thread via a semaphore or Thread::Queue to tell it to terminate.
$_ -> join for threads -> list;
Will do the trick.
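One way the Thread::Queue signalling could look is sketched below. It assumes Thread::Queue 3.01 or later (for the end method); the worker sub, the item count and the number of workers are invented for illustration.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $queue = Thread::Queue->new();

# Hypothetical worker: counts the items it receives.
sub worker {
    my $handled = 0;
    # dequeue() blocks until an item is available. Once the queue
    # has been ended and is empty, it returns undef and the loop stops.
    while (defined(my $item = $queue->dequeue())) {
        $handled++;
    }
    return $handled;
}

my @workers;
for (1 .. 2) {
    my $thr = threads->create(\&worker);
    push @workers, $thr;
}

$queue->enqueue($_) for 1 .. 10;
$queue->end();    # tell the workers that no more items will arrive

# Join every thread that is still joinable and add up the counts.
my $total = 0;
$total += $_->join for threads->list;
print "workers handled $total items\n";
```

Ending the queue acts as the termination signal, so no thread has to be detached or killed.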
That code example is, in my opinion, a bad example. It's just plain messy to sleep so a detached thread has a chance to complete, when you could just join and know that it's finished. This is especially true of Perl threads: it's deceptive to assume they're lightweight and can be trivially started (and detached). If you're ever spawning so many that the overhead of joining them is too high, then you're using Perl threads wrong and should probably fork instead.
You're quite right: the thread will never terminate, and so your code will always have a 'dirty' exit.
So instead I'd rewrite:
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;
my $run : shared;
$run = 1;
sub sub1 {
my $count = 0;
while ($run) {
$count++;
print("\$count is $count\n");
sleep(1);
}
print "Terminating\n";
}
my $thr = threads->create( \&sub1 ); # Spawn the thread
sleep(15); # Let thread run for awhile
$run = 0;
$thr->join;
That way your main thread signals the worker to say "I'm done" and waits for it to finish its current loop.

Related

perl : Threads do not die when main process dies

I have a Perl program that spawns several threads. Each thread processes some task (by firing off other system commands etc.) and then, when it's all done, waits.
Once all threads are done, they fire a signal to the parent process. The parent then loads up new jobs and signals the threads to go work on these new tasks.
So ideally, this program would run forever.
Now, if I kill it on the command line with kill -9 MainProgram.pl, it's not killed! I see the output of the jobs the threads are running, and then I also see that after they are done, they get new jobs and just go on and on...
I am absolutely confounded. If I do a kill -9 MainProgram.pl, it is supposed to kill all threads it owns, right?
Regardless of what the threads are out doing...
And even if the threads are doing I/O and so they wait for the I/O to get done, I would expect the thread to die after its current task is done. But clearly, main is reloading jobs too, as threads just keep continuing...
Is this kind of behaviour seen in Perl?
EDIT: Some of the code in mainProgram.pl
use threads;
use threads::shared;
for (my $count = 0; $count <= $threadNum-1; $count++) {
$t = threads->new(\&handleEvent, $count) ;
push(@threads, $t);
}
#Parent thread:
while(1) {
lock($parentSignal);
cond_wait($parentSignal);
getEvents();
while(@eventCount == 0){
sleep($parent_sleep_time);
getEvents(); #Try to get events again until you get some new stuff to process
}
cond_broadcast($threadsDone); # threads go work on this
}
Thanks
From what I understand, you're supposed to either join() or detach() on all threads prior to exiting.
From the POD:
If the program exits without all threads having either been joined or
detached, then a warning will be issued.
Source: http://metacpan.org/pod/threads

Perl Threads - Capture Exit

I have code that spawns two threads. The first is a system command which launches an application. The second monitors the program. I'm new to perl threads so I have a few questions...
my $thr1 = threads->new(system($cmd));
sleep(FIVEMINUTES);
my $thr2 = threads->new(\&check);
my $rth1 = $thr1->join();
my $rth2 = $thr2->join();
1) Do I need a second thread to monitor the program? You can think of my subroutine call to &check as an infinite while loop which checks a text file for stuff the application produces. Could I just do this:
my $thr1 = threads->new(system($cmd));
sleep(FIVEMINUTES);
&check;
2) I'm trying to figure out what my parent is doing after I run this code. So after I launch line 1 it will spawn that new thread, sleep, then spawn that second thread and then sit at that first join and wait. It will not execute the rest of my code until it joins at that first join. Is this correct or am I wrong? If I am wrong, then how does it work?
3) My first thread, the one that launches the application, can be killed unexpectedly. When this happens, I have nothing to catch that and kill the threads. It just says:
"Thread 1 terminated abnormally: Undefined subroutine &main::65280 called at myScript.pl line 109." and then hangs there.
What could I do to get it to end the other threads? I need it to send an email before the program ends as well which I can do by just calling &email (another subroutine I made).
Thanks
First of all,
my $thr1 = threads->new(system($cmd));
should be
my $thr1 = threads->new(sub { system($cmd) });
or simply
my $thr1 = async { system($cmd) };
You don't need to start a third thread. As you suspected, the main thread and the one executing system are sufficient.
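To make the difference concrete: threads->new(system($cmd)) runs the command in the main thread first and then passes its numeric wait status to threads->new, which treats a non-reference argument as a subroutine name. That is where "Undefined subroutine &main::65280" comes from (65280 is exit code 255 shifted left by 8 bits). The sketch below, assuming a Perl built with ithreads, wraps the call in an anonymous sub instead; the child command (the current perl exiting with code 3) is just a stand-in for $cmd.

```perl
use strict;
use warnings;
use threads;

# Stand-in command: run the current perl and exit with code 3.
my @cmd = ($^X, '-e', 'exit 3');

# Called directly, system() runs here in the main thread and
# returns the wait status (exit code << 8).
my $status = system(@cmd);

# Wrapped in an anonymous sub, the same call runs inside the new
# thread, and join() returns the sub's return value.
my $thr        = threads->create(sub { system(@cmd) });
my $thr_status = $thr->join();

printf "main: exit code %d, thread: exit code %d\n",
    $status >> 8, $thr_status >> 8;
```

Both numbers are the same here; the point is only where and when system() gets executed.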
What if the command finishes executing in less than five minutes? The following replaces sleep with a signal.
use threads;
use threads::shared;
my $done :shared = 0;
my $thr1 = async {
system($cmd);
lock($done);
$done = 1;
cond_signal($done);
};
{ # Wait up to $timeout for the thread to end.
lock($done);
my $timeout = time() + 5*60;
1 while !$done && cond_timedwait($done, $timeout);
if (!$done) {
# ... there was a timeout ...
}
}
$thr1->join();
In 2004-2006 I had the same challenges with a Perl app running 24/7 on Winblows... The only approach that worked was to use XML state files on disk to communicate the status of each component of the system, and to make sure that, if threads are used, all state file handling occurs within a closure code block {} (big gotcha). The app ran for at least 3 years on 100 machines 24/7 without errors...
If you are on a Unix-like OS I would suggest to use forks and interprocess communication.
Use cpan modules, do not reinvent the wheel..
Multithreading in Perl is a little hard to deal with, I would suggest using the fork() commands instead. I will attempt to answer your questions to the best of my ability.
1) It seems to me like two threads/processes are the way to go here, as you need to check your data asynchronously.
2) Your parent works exactly as you describe.
3) The reason for your thread hanging could be that you never terminate your second thread. You said it was an infinite loop, is there any exit condition?

Does Perl's thread join method wait for the first thread to finish if I launch more than one thread?

foreach $name (@project_list)
{
# a thread is created for each project
my $t = threads->new(\&do_work, $name);
push(@threads, $t);
}
foreach (@threads) {
my $thrd = $_->join;
print "Thread $thrd done\n";
}
sub do_work {
# execute some commands here...
}
The project_list is a list of 40 items. When I spawn a thread for each item, will the join method wait for the first thread to finish and then move over to the next one and so on?
If that is the case, then is it possible to avoid it? I mean some threads will finish faster than others, so why wait?
Please let me know if more information is required.
Thank you.
$_->join waits for the thread designated by $_ to finish. Since you're pushing them in order, and foreach traverses the list in order, yes, you'll wait first for the first thread.
But that doesn't matter since you're waiting for all threads to finish. It doesn't matter if you wait for the fastest finishers or the slowest ones first - you'll be waiting for everyone anyway.
Why wait? It depends on the scope of the post-processing step.
If the post-processing step needs all threads to finish before beginning work, then the wait for individual threads is unavoidable.
If the post-processing step is specific to the results of each thread, it should be possible to make the post-processing part of the thread itself.
In both cases, $_->join foreach @threads; is the way to go.
If there is no need to wait for the threads to finish, use the detach command instead of join. However, any results that the threads may return will be discarded.
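A small sketch of joining in creation order and collecting each thread's return value, assuming a Perl built with ithreads. The project names and the body of do_work are placeholders, not the asker's real code.

```perl
use strict;
use warnings;
use threads;

my @project_list = qw(alpha beta gamma);    # hypothetical project names

# Placeholder for the real work: just upper-case the name.
sub do_work {
    my ($name) = @_;
    return uc $name;
}

my @threads;
foreach my $name (@project_list) {
    my $t = threads->create(\&do_work, $name);
    push @threads, $t;
}

# join() blocks on each thread in turn. A thread that finished
# early simply waits to be collected, so the total waiting time
# is set by the slowest thread, not by the join order.
my @results;
foreach my $t (@threads) {
    my $result = $t->join();
    push @results, $result;
    print "Thread ", $t->tid(), " returned $result\n";
}
```

Because the threads are joined in the order they were created, the results come back in the same order as the project list.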
The main thread has to live for the duration of all threads, so you need to know when they are all done. This can be done with queues or semaphores. Semaphores are the simplest:
use Thread::Semaphore;
my $S = Thread::Semaphore->new();
foreach $name (@project_list)
{
# a thread is created for each project
my $t = threads->new(\&do_work, $name);
$t->detach(); # never joined, so detach it to avoid the exit warning
$S->down_force(); # take one for each thread
}
$S->down(); # this blocks until worker threads release one each
print "All threads done\n";
sub do_work {
# execute some commands here...
$S->up(); # here the worker gives one back when done.
}

Threaded perl and signal handlers

I am using the Thread::Pool module in perl to parallelize some perl code. This process takes a while and occasionally I will kill it from the command line with a SIGINT. Doing so causes the program to end abruptly, as I expected. This leaves some messy temporary files around, so I'd like to install a signal handler. I did this:
sub INT_handler{
#clean up code
exit(1);
}
$SIG{'INT'} = 'INT_handler';
before creating the thread pool and starting the threads. Now when I send the SIGINT, the worker threads that are running die, but the pool promptly launches another set of workers to handle the next set of jobs and continues running. Why doesn't the call to exit in the signal handler exit the main thread? What do I need to do to stop the process from running?
Edited in response to mob's comment
** Further edit **
Here is an example I wrote up.
use Thread::Pool;
sub INT_handler{
print "Handler\n";
exit(1);
}
$SIG{'INT'}='INT_handler';
sub f{
print "Started a thread " . rand(10000) . "\n";
sleep(10);
}
my $pool;
my $submit = \&f;
if (0){
$pool = Thread::Pool->new({do=>'f', workers=>5});
$submit = sub{ $pool->job; }
}
for (my $i = 0; $i < 100; $i++){ $submit->(); }
$pool->shutdown if defined $pool;
with 0, I see the expected result
h:57 Sep 15 16:15:19> perl tp.pl
Started a thread 3224.83224635111
Handler
but with 1, this happens
h:57 Sep 15 16:14:56> perl tp.pl
Started a thread 5034.63673711853
Started a thread 9300.99967009486
Started a thread 1394.45532885478
Started a thread 3356.0428193687
Started a thread 1424.4741558014
etc and the handler doesn't get entered and the process continues running. I had to kill the process with a signal other than SIGINT. Without the handler, both cases simply exit when passed a SIGINT.
This is more a hint rather than a definitive answer, but it appears your main thread is never in the "safe" state to run the signal handler. It does work when you enable Perl's unsafe signals:
PERL_SIGNALS=unsafe perl tp.pl
See perlipc for more information on safe and unsafe signals -- maybe it will lead you in the right direction to implement it with safe signals (as it probably should be).
(update by mob) Building on Michal's original insight, this workaround with Perl::Unsafe::Signals also gets the handler to work as you'd expect
use Perl::Unsafe::Signals;
...
UNSAFE_SIGNALS {
$pool->shutdown if defined $pool;
};
So clearly it is something about Perl's safe signalling mechanism that is interfering with the signal on its way to the handler. I wonder if this would be fixed by putting an UNSAFE_SIGNALS { ... } block inside of Thread::Pool::shutdown. Either way, I would file a bug report about this.

Perl ithreads: Do some math instead of sleeping

I am using Perl ithreads and things work fine, unless I decide to have threads sleep.
Let's say my routine thread_job is passed as the entry point for several threads to start running concurrently.
sub thread_job
{
...
sleep 2;
#do other stuff here
}
If I don't have a sleep, I have no issues with the threads running and they do their tasks fine. If I add a sleep, my script hangs. I am running this from a Windows command prompt, if that helps.
Since I do need to sleep, and I'm guessing there's an issue with using sleep on my current setup, I intend to have the thread do something for a while instead of sleeping. Is there any mathematical operation I could perform for this?
Try using Win32::Sleep instead. (Note that it takes milliseconds as an argument, not seconds.)
Calling sleep() can block the entire process (that is, all the threads) on some platforms.
You can instead block a single thread by calling select(). Do something like this:
sub thread_job {
...
my $delay = 2;
select(undef, undef, undef, $delay);
...
}
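Here is a small, self-contained sketch of the select() approach, assuming a Perl built with ithreads. The 0.25 second delay, the thread count and the timing code are only there for illustration; Time::HiRes is used just to measure the pause.

```perl
use strict;
use warnings;
use threads;
use Time::HiRes qw(time);

sub thread_job {
    my ($delay) = @_;
    my $start = time();
    # Four-argument select with no filehandles just waits for the
    # timeout, which may be a fraction of a second.
    select(undef, undef, undef, $delay);
    return time() - $start;
}

my @threads;
for my $n (1 .. 3) {
    my $thr = threads->create(\&thread_job, 0.25);
    push @threads, $thr;
}

my $begin   = time();
my @slept   = map { scalar $_->join() } @threads;
my $elapsed = time() - $begin;

printf "each thread paused about %.2fs; joining all took %.2fs\n",
    $slept[0], $elapsed;
```

If the pauses really are per thread, joining all three should take roughly one delay rather than three.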
