Perl Threads - Capture Exit - multithreading

I have code that spawns two threads. The first runs a system command that launches an application; the second monitors the program. I'm new to Perl threads, so I have a few questions:
my $thr1 = threads->new(system($cmd));
sleep(FIVEMINUTES);
my $thr2 = threads->new(\&check);
my $rth1 = $thr1->join();
my $rth2 = $thr2->join();
1) Do I need a second thread to monitor the program? You can think of my subroutine call to &check as an infinite while loop that checks a text file for output the application produces. Could I just do this:
my $thr1 = threads->new(system($cmd));
sleep(FIVEMINUTES);
&check;
2) I'm trying to figure out what my parent is doing after I run this code. So after I launch line 1 it will spawn that new thread, sleep, then spawn that second thread and then sit at that first join and wait. It will not execute the rest of my code until it joins at that first join. Is this correct or am I wrong? If I am wrong, then how does it work?
3) My first thread, the one that launches the application, can be killed unexpectedly. When this happens, I have nothing to catch it and kill the other threads. It just says:
"Thread 1 terminated abnormally: Undefined subroutine &main::65280 called at myScript.pl line 109." and then hangs there.
What could I do to get it to end the other threads? I also need it to send an email before the program ends, which I can do by calling &email (another subroutine I wrote).
Thanks

First of all,
my $thr1 = threads->new(system($cmd));
should be
my $thr1 = threads->new(sub { system($cmd) });
or simply
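my $thr1 = async { system($cmd) };
The original line runs system($cmd) immediately, in the main thread, and then passes its return value to threads->new as the name of the subroutine to run. That is where "Undefined subroutine &main::65280" comes from: 65280 is the raw status of a command that exited with code 255 (255 << 8 = 65280).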
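A minimal sketch of the corrected pattern, assuming a Perl built with ithreads: the command runs inside the async block and its exit code comes back through join(). The helper name run_in_thread is hypothetical, and $^X (the running perl binary) stands in for the real application. Decoding with $? >> 8 is also how a raw status such as 65280 maps to exit code 255.

```perl
use strict;
use warnings;
use threads;

# Hypothetical helper: run a command in its own thread and hand the
# command's exit code back to the caller through join().
sub run_in_thread {
    my @cmd = @_;
    # async takes a BLOCK (a code reference), so system() runs inside
    # the new thread instead of running first in the caller.
    my $thr = async {
        system(@cmd);
        return -1 if $? == -1;    # the command could not be started
        return $? >> 8;           # exit code is the high byte of $?
    };
    return $thr->join;            # blocks until the thread finishes
}

my $code = run_in_thread($^X, '-e', 'exit 3');
print "command exited with $code\n";
```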
You don't need to start a third thread. As you suspected, the main thread and the one executing system are sufficient.
What if the command finishes in less than five minutes? The following replaces the fixed sleep with a condition variable that the thread signals when system returns.
use threads;
use threads::shared;
my $done :shared = 0;
my $thr1 = async {
    system($cmd);
    lock($done);
    $done = 1;
    cond_signal($done);
};
{   # Wait up to $timeout for the thread to end.
    lock($done);
    my $timeout = time() + 5*60;
    1 while !$done && cond_timedwait($done, $timeout);
    if (!$done) {
        # ... there was a timeout ...
    }
}
$thr1->join();

In 2004-2006 I had the same challenges with a Perl application running 24/7 on Windows. The only approach that worked was to use XML state files on disk to communicate the status of each component of the system, and, if threads are used, to make sure every state-file operation happens within a closure code block {} (big gotcha). The app ran for at least 3 years on 100 machines, 24/7, without errors.
If you are on a Unix-like OS, I would suggest using fork and interprocess communication.
Use CPAN modules; do not reinvent the wheel.
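Since fork is suggested as the alternative on Unix-like systems, here is a rough sketch of the original task with a child process instead of a thread: run the command, poll it with a non-blocking waitpid, and kill it if it exceeds a timeout. The name run_with_timeout is hypothetical, the deadline only has whole-second resolution, and only core modules are used.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

# Hypothetical helper: run @cmd in a child process and wait at most
# $timeout seconds. Returns the exit code, or undef on timeout.
sub run_with_timeout {
    my ($timeout, @cmd) = @_;
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        exec(@cmd) or POSIX::_exit(127);   # only reached if exec fails
    }
    my $deadline = time() + $timeout;
    while (time() < $deadline) {
        # Non-blocking reap: returns $pid once the child has exited.
        return $? >> 8 if waitpid($pid, WNOHANG) == $pid;
        select(undef, undef, undef, 0.1);  # poll every 100 ms
    }
    kill 'TERM', $pid;                     # timed out: stop and reap the child
    waitpid($pid, 0);
    return;
}

my $result = run_with_timeout(5, $^X, '-e', 'exit 4');
print defined $result ? "exit code $result\n" : "timed out\n";
```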

Multithreading in Perl is a little hard to deal with; I would suggest using fork() instead. I will attempt to answer your questions to the best of my ability.
1) It seems to me that two threads/processes are the way to go here, as you need to check your data asynchronously.
2) Your parent works exactly as you describe.
3) Your program may hang because your second thread never terminates. You said it was an infinite loop; is there any exit condition?
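To make the exit condition concrete, here is a sketch in which the main thread runs the command while a second thread plays the role of the &check loop, polling a shared flag. However the command ends (normal exit, non-zero exit, or killed by a signal), the main thread sets the flag, joins the monitor, and only then raises the alert. check and email below are hypothetical stand-ins for the asker's subroutines, and $^X stands in for the real application.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $stop :shared = 0;

# Stand-in for the asker's &check: loops until told to stop.
sub check {
    my $polls = 0;
    until ($stop) {
        $polls++;                              # ... inspect the output file here ...
        select(undef, undef, undef, 0.05);
    }
    return $polls;
}

# Hypothetical stand-in for the asker's &email subroutine.
sub email { warn "ALERT: @_\n" }

# Run the command, monitor it from a second thread, and always
# stop the monitor once the command ends, however it ends.
sub supervise {
    my @cmd = @_;
    $stop = 0;
    my $monitor = threads->create(\&check);
    system(@cmd);
    my $status = $?;                           # raw wait status, or -1
    $stop = 1;                                 # exit condition for the monitor loop
    $monitor->join;
    email("command failed, raw status $status") if $status != 0;
    return $status == -1 ? -1 : $status >> 8;
}

print "exit code: ", supervise($^X, '-e', 'exit 1'), "\n";
```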

Related

Threads in Perl: order of detach()

I am trying to get into Perl's use of threads. Reading the documentation I came across the following code:
use threads;
my $thr = threads->create(\&sub1); # Spawn the thread
$thr->detach(); # Now we officially don't care any more
sleep(15); # Let thread run for awhile
sub sub1 {
    my $count = 0;
    while (1) {
        $count++;
        print("\$count is $count\n");
        sleep(1);
    }
}
The goal, it seems, is to create one thread that runs sub1 for 15 seconds and, in the meantime, prints some strings. However, I don't think I understand what's going on at the end of the programme.
First of all, detach() is defined as follows:
Once a thread is detached, it'll run until it's finished; then Perl
will clean up after it automatically.
However, when does the subroutine finish? while(1) never finishes, and I can't find anything in the documentation of sleep() saying it would break a loop. On top of that, from the point we detach we are 'waiting for the script to finish and then cleaning it up' for 15 seconds, so if we are waiting for the subroutine to finish, why do we need sleep() in the main script? Its position is awkward to me; it suggests that the main programme sleeps for 15 seconds. But what is the point of that? The main programme (thread?) sleeps while the sub-thread keeps running, but how is the subroutine then terminated?
I guess the idea is that after sleep-ing is done, the subroutine ends, after which we can detach/clean up. But how is this syntactically clear? Where in the definition of sleep does it say that sleep terminates a subroutine (and why), and how does it know which one to terminate if there is more than one thread?
All threads end when the program ends. The program ends when the main thread ends. The sleep in the main thread is merely keeping the program running a short time, after which the main thread (therefore the program, therefore all created threads) also end.
So what's up with detach? It just says "I'm never going to bother joining this thread, and I don't care what it returns". If you neither detach a thread nor join it, you'll get a warning when the program ends.
Detaching a thread means "I don't care any more", and that does actually mean that when your process exits, the thread will error and terminate.
Practically speaking, I don't think you ever want to detach a thread in Perl. Just add a join at the end of your code so it can exit cleanly, and signal the thread via a semaphore or Thread::Queue to make it terminate.
$_ -> join for threads -> list;
Will do the trick.
That code example is, in my opinion, a bad example. It's just plain messy to sleep so a detached thread has a chance to complete, when you could just join and know that it's finished. This is especially true of Perl threads: it's misleading to assume they're lightweight and can therefore be trivially started (and detached). If you're ever spawning so many that the overhead of joining them is too high, then you're using Perl threads wrong and should probably fork instead.
You're quite right: the thread will never terminate, so your code will always have a 'dirty' exit.
So instead I'd rewrite:
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;
my $run : shared;
$run = 1;
sub sub1 {
    my $count = 0;
    while ($run) {
        $count++;
        print("\$count is $count\n");
        sleep(1);
    }
    print "Terminating\n";
}
my $thr = threads->create( \&sub1 ); # Spawn the thread
sleep(15); # Let thread run for awhile
$run = 0;
$thr->join;
That way your main thread signals the worker to say "I'm done" and waits for it to finish its current loop iteration.
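The same shared-flag idea scales to several threads, and it pairs naturally with the $_->join for threads->list idiom. In this sketch (worker and start_and_stop are hypothetical names), the thread objects are never stored; threads->list finds them again at the end, and each worker's return value is summed. Because the creating call discards its result, scalar context is requested explicitly so join still returns a value.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $run :shared = 1;

# Worker waits until the main thread clears $run, then returns its id.
sub worker {
    my ($id) = @_;
    select(undef, undef, undef, 0.05) while $run;
    return $id;
}

sub start_and_stop {
    my ($n) = @_;
    $run = 1;
    # Explicit scalar context: the thread objects are not kept here.
    threads->create({ context => 'scalar' }, \&worker, $_) for 1 .. $n;
    $run = 0;                                # tell every worker to finish
    my $sum = 0;
    $sum += $_->join for threads->list;      # all non-joined, non-detached threads
    return $sum;
}

print start_and_stop(4), "\n";
```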

Perl parallel HTTP requests - out of memory

First of all, I'm new to Perl.
I want to make multiple (e.g. 160) HTTP GET requests on a REST API in Perl. Executing them one after another takes much time, so I was thinking of running the requests in parallel. Therefore I used threads to execute more requests at the same time and limited the number of parallel requests to 10.
This worked fine the first time I ran the program, but the second time I ran 'out of memory' after the 40th request.
Here's the code (@urls contains the 160 URLs for the requests):
while (@urls) {
    my @threads;
    for (my $j = 0; $j < 10 and @urls; $j++) {
        my $url = shift(@urls);
        push @threads, async { $ua->get($url) };
    }
    for my $thread (@threads) {
        my $response = $thread->join;
        print "$response\n";
    }
}
So my question is, why am I NOT running out of memory the first time but the second time (am I missing something crucial in my code)? And what can I do to prevent it?
Or is there a better way of executing parallel GET requests?
I'm not sure why you would get an OOM error on a second run when you don't get one on the first run; when a Perl script finishes and the perl binary exits, it releases all of its memory back to the OS. Nothing is kept between executions. Is the exact same data being returned by the REST service each time? Maybe there's more data the second time you run it, and that's pushing you over the edge.
One problem I notice is that you're launching 10 threads and running them to completion, then spawning 10 more threads. A better solution may be a worker-thread model. Spawn 10 threads (or however many you want) at the start of the program, put the URLs into a queue, and allow the threads to process the queue themselves. Here's a quick example that may help:
use strict;
use warnings;
use threads;
use Thread::Queue;
my $q = Thread::Queue->new();
my @thr = map {
    threads->create(sub {
        my @responses = ();
        while (defined (my $url = $q->dequeue())) {
            push @responses, $ua->get($url);
        }
        return @responses;
    });
} 1..10;
$q->enqueue($_) for @urls;
$q->enqueue(undef) for 1..10;
foreach (@thr) {
    my @responses_of_this_thread = $_->join();
    print for @responses_of_this_thread;
}
Note, I haven't tested this to make sure it works. In this example, you create a new thread queue and spawn up 10 worker threads. Each thread will block on the dequeue method until there is something to be read. Next, you queue up all the URLs that you have, and an undef for each thread. The undef will allow the threads to exit when there is no more work to perform. At this point, the threads will go through and process the work, and you will gather the responses via the join at the end.
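A variation on that worker pool that can be run without network access: fake_get is a hypothetical stand-in for $ua->get, and each job carries its index so results can be put back in input order. Results come back through a second Thread::Queue instead of through join's return value, and the queued items are array references (Thread::Queue can pass references to plain arrays between threads).

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical stand-in for $ua->get($url), so the sketch runs offline.
sub fake_get {
    my ($url) = @_;
    return "fetched $url";
}

# Fixed pool of workers; each result is tagged with its input index.
sub fetch_all {
    my ($workers, @urls) = @_;
    my $work    = Thread::Queue->new;
    my $results = Thread::Queue->new;
    my @thr = map {
        threads->create(sub {
            while (defined(my $job = $work->dequeue)) {
                my ($i, $url) = @$job;
                $results->enqueue([$i, fake_get($url)]);
            }
            return;
        });
    } 1 .. $workers;
    $work->enqueue([$_, $urls[$_]]) for 0 .. $#urls;
    $work->enqueue(undef) for 1 .. $workers;   # one stop marker per worker
    $_->join for @thr;
    my @out;
    while (defined(my $r = $results->dequeue_nb)) {
        $out[ $r->[0] ] = $r->[1];             # restore input order
    }
    return @out;
}

print "$_\n" for fetch_all(3, map { "http://example.com/$_" } 1 .. 5);
```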
Whenever I need an asynchronous solution in Perl, I first look at the POE framework. In this particular case I used the POE HTTP request module, which lets you send multiple requests simultaneously and provides a callback mechanism where you can process your HTTP responses.
Perl threads are scary and can crash your application, especially when you join or detach them. If responses do not take a long time to process, a single threaded POE solution would work beautifully.
Sometimes, though, we have to rely on threading because the application gets blocked by long-running tasks. In those cases, I create a certain number of threads BEFORE initiating anything in the application. Then, with Thread::Queue, I pass the data from the main thread to these workers AND never join/detach them; I always keep them around for stability purposes.
(Not an ideal solution for every case.)
POE supports threads now, and each thread can run a POE::Kernel. The kernels can communicate with each other through TCP sockets (for which POE provides nice non-blocking interfaces).

Threaded perl and signal handlers

I am using the Thread::Pool module in perl to parallelize some perl code. This process takes a while and occasionally I will kill it from the command line with a SIGINT. Doing so causes the program to end abruptly, as I expected. This leaves some messy temporary files around, so I'd like to install a signal handler. I did this:
sub INT_handler {
    # clean up code
    exit(1);
}
$SIG{'INT'} = 'INT_handler';
before creating the thread pool and starting the threads. Now when I send the SIGINT, the worker threads that are running die, but the pool promptly launches another set of workers to handle the next set of jobs and continues running. Why doesn't the call to exit in the signal handler exit the main thread? What do I need to do to stop the process?
Edited in response to mob's comment
** Further edit **
Here is an example I wrote up.
use Thread::Pool;
sub INT_handler {
    print "Handler\n";
    exit(1);
}
$SIG{'INT'} = 'INT_handler';
sub f {
    print "Started a thread " . rand(10000) . "\n";
    sleep(10);
}
my $pool;
my $submit = \&f;
if (0) {
    $pool = Thread::Pool->new({do => 'f', workers => 5});
    $submit = sub { $pool->job; }
}
for (my $i = 0; $i < 100; $i++) { $submit->(); }
$pool->shutdown if defined $pool;
with 0, I see the expected result
h:57 Sep 15 16:15:19> perl tp.pl
Started a thread 3224.83224635111
Handler
but with 1, this happens
h:57 Sep 15 16:14:56> perl tp.pl
Started a thread 5034.63673711853
Started a thread 9300.99967009486
Started a thread 1394.45532885478
Started a thread 3356.0428193687
Started a thread 1424.4741558014
etc., and the handler is never entered and the process keeps running. I had to kill the process with a signal other than SIGINT. Without the handler, both cases simply exit when sent a SIGINT.
This is more a hint rather than a definitive answer, but it appears your main thread is never in the "safe" state to run the signal handler. It does work when you enable Perl's unsafe signals:
PERL_SIGNALS=unsafe perl tp.pl
See perlipc for more information on safe and unsafe signals -- maybe it will lead you in the right direction to implement it with safe signals (as it probably should be).
(update by mob) Building on Michal's original insight, this workaround with Perl::Unsafe::Signals also gets the handler to work as you'd expect:
use Perl::Unsafe::Signals;
...
UNSAFE_SIGNALS {
$pool->shutdown if defined $pool;
};
So clearly it is something about Perl's safe signalling mechanism that is interfering with the signal on its way to the handler. I wonder if this would be fixed by putting an UNSAFE_SIGNALS { ... } block inside of Thread::Pool::shutdown. Either way, I would file a bug report about this.
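For reference, the usual pattern under Perl's safe signals is a handler that only records the signal, plus a main loop that checks the flag between units of work and runs the cleanup itself. The sketch below (process_jobs, cleanup and @temp_files are hypothetical names) sends itself SIGINT to simulate Ctrl-C. It only helps while the main thread keeps returning to Perl code; it does not by itself unblock a main thread that is waiting inside something like Thread::Pool's shutdown, which appears to be the situation in the question.

```perl
use strict;
use warnings;

my $interrupted = 0;
# Under safe signals the handler runs between Perl ops; keep it tiny
# and just record that the signal arrived.
$SIG{INT} = sub { $interrupted = 1 };

my @temp_files;    # hypothetical: temporary files to remove on exit

sub cleanup {
    unlink @temp_files;
    @temp_files = ();
    return;
}

# Check the flag between jobs; always clean up before returning.
sub process_jobs {
    my ($n, $interrupt_at) = @_;
    my $done = 0;
    for my $job (1 .. $n) {
        last if $interrupted;
        # Simulate Ctrl-C arriving while job $interrupt_at runs.
        kill 'INT', $$ if defined $interrupt_at && $job == $interrupt_at;
        $done++;    # ... one unit of work would go here ...
    }
    cleanup();
    return $done;
}

my $done = process_jobs(10, 3);
print "stopped after $done jobs\n";
```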

How do I use Perl's `Thread::Pool::Simple`?

I'm using Thread::Pool::Simple for multi-threading.
I have a couple of questions which are quite general to multi-threading, I guess:
Each of my threads might die if something unexpected happens. This is totally acceptable to me, since it means some of my assertions are wrong and I need to redesign the code. Currently, when any thread dies, the main program (calling thread) also dies, yielding something like:
Perl exited with active threads:
0 running and unjoined
0 finished and unjoined
4 running and detached
Are these "running and detached" threads zombies? Are they "dangerous" in any way? Is there a way to kill all of them if any of the threads dies? What is the common solution for such scenarios?
Generally, my jobs are independent. However, I pass each of them, as an argument, a unique hash which is taken from one big hash of hashes. The thread might change this personal hash (but it can't get to the large hash; it doesn't even know about it). Hence, I guess I don't need any locks etc. Am I missing anything?
When your main program exits, all threads are terminated.
Perl threads work in one of two ways.
1) You can use join:
my $thr = threads->create(...);
# do something else while thread works
my $return = $thr->join(); # wait for thread to terminate and fetch return value
2) You can use detach:
my $thr = threads->create(...);
$thr->detach(); # thread will discard return value and auto-cleanup when done
That message lists the threads that hadn't been cleaned up before the program terminated.
"Running and unjoined" is case 1, still running. "Finished and unjoined" is case 1, finished but the return value hasn't been fetched yet. "Running and detached" is case 2, still running.
So it's saying you have 4 threads that had been detached but hadn't finished before the program died. You can't tell from that whether they would have finished if the program had run longer, or they were stuck in an infinite loop, or deadlocked, or what.
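There is no clean way to kill sibling threads when one dies, but the main program can at least notice failures instead of exiting with threads still running. A sketch, assuming a threads version that provides the error() method: start every job, join all of them so nothing is left running at exit, then inspect each thread's $@. The name run_jobs is hypothetical, and this uses plain threads rather than Thread::Pool::Simple.

```perl
use strict;
use warnings;
use threads;

# Start one thread per job, join them all, and collect the error ($@)
# of every thread that died instead of letting it go unnoticed.
sub run_jobs {
    my @jobs = @_;                              # list of code references
    my @thr  = map { threads->create($_) } @jobs;
    my @errors;
    for my $t (@thr) {
        $t->join;
        my $err = $t->error;                    # undef if the thread ended normally
        push @errors, $err if defined $err;
    }
    return @errors;
}

my @errors = run_jobs(sub { 1 }, sub { die "bad assertion\n" }, sub { 2 });
print "failed: $_" for @errors;
```

A dying thread still prints its own "terminated abnormally" warning; the difference is that the main program now decides what to do next.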
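You shouldn't need any locks for the situation you describe: unless a variable is explicitly shared (e.g. with threads::shared), each thread works on its own copy of the data.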
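The no-locks answer rests on the fact that ithreads clone all unshared data into each new thread. This small sketch (the %all_jobs hash and bump_in_thread are hypothetical) shows a thread modifying its per-job hash while the parent's copy stays unchanged; the flip side is that any changes you want to keep have to be handed back, for example as the thread's return value.

```perl
use strict;
use warnings;
use threads;

# Unshared data: every new thread gets its own clone of this hash.
my %all_jobs = (a => { count => 1 }, b => { count => 2 });

sub bump_in_thread {
    my ($key) = @_;
    my $thr = threads->create(sub {
        my $job = $all_jobs{$key};    # the thread's private copy
        $job->{count} += 100;
        return $job->{count};
    });
    return $thr->join;
}

print bump_in_thread('a'), " vs $all_jobs{a}{count}\n";   # 101 vs 1
```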

Perl ithreads: Do some math instead of sleeping

I am using perl ithreads and things work fine, unless I decide to have threads sleep.
Let's say my routine thread_job is passed as the entry point for several threads that start running concurrently.
sub thread_job {
    ...
    sleep 2;
    # do other stuff here
}
If I don't have a sleep, I have no issues: the threads run and do their tasks fine. If I add a sleep, my script hangs. I am running this from a Windows command prompt, if that helps.
Since I do need to sleep, and I'm guessing there's an issue with using sleep on my current setup, I intend to have the thread do something for a while instead of sleeping. Is there any mathematical operation I could perform for that?
Try using Win32::Sleep instead. (Note that it takes milliseconds as an argument, not seconds.)
Calling sleep() blocks the entire process (that is, all the threads).
You can instead block a single thread by calling select(). Do something like this:
sub thread_job {
    ...
    my $delay = 2;
    select(undef, undef, undef, $delay);
    ...
}
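To check the select() approach, this sketch starts three threads that each wait one second with select(undef, undef, undef, $delay) and measures the total wall time with Time::HiRes. If the waits overlap, the total should be close to one second rather than three; swapping the select call for sleep is a quick way to see how sleep behaves on a given platform. thread_job and run_parallel here are simplified, hypothetical versions, not the asker's routine.

```perl
use strict;
use warnings;
use threads;
use Time::HiRes qw(time);

# select() with three undef handles is a plain timed wait.
sub thread_job {
    my ($delay) = @_;
    select(undef, undef, undef, $delay);
    return $delay;
}

# Start $n threads that each wait $delay seconds; return elapsed wall time.
sub run_parallel {
    my ($n, $delay) = @_;
    my $start = time();
    my @thr = map { threads->create(\&thread_job, $delay) } 1 .. $n;
    $_->join for @thr;
    return time() - $start;
}

printf "3 threads x 1s took %.2fs\n", run_parallel(3, 1);
```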
