How to interrupt an I/O call in a perl thread? - multithreading

Looks like times of old good alarm() call are over, so how to interrupt blocking read()'s and write()'s made from perl thread (using that brand new 'threads' module) assuming the code making those blocking calls cannot be changed? Actual problem is stucking communication with Modbus device so I've created a simple testcase not to dip you into RS-485 hell:
use threads;
use IO::Handle;
threads->create(sub {
$io = IO::Handle->new_from_fd(fileno(STDIN), 'r') or die;
$br = read $io, $buf, 100;
warn "read: $br";
});
while(1) {threads->yield()};
Here, warn() is never executed unless you hit Ctrl-D on keyboard. Is there any simple solution to timeout that read() call?

Rather than threading (which is ill-supported in Perl at best, and mostly frowned upon), why not try one of the event modules instead? That's the standard solution to concurrency in Perl.

Related

Basic perl threading: threads->list() does not decrease

It looks like thread_sleep is not ending properly.
I could handle it using thread queues, semaphores and such but I am interested in what the problem may be here.
This code never ends, as the threads->list() size never decreases.
use strict;
use warnings;
use Thread;
my #threads;
my $count = 0;
while ( scalar( #threads ) < 10 ) {
my $thr = threads->create( 'thread_sleep' );
push #threads, $thr;
$count++;
print "Spawned Thread nr. $count\n";
while ( threads->list() > 4 ) {
print "too many threads, sleeping a second...\n";
sleep( 1 );
}
}
sub thread_sleep {
sleep( 5 );
}
Threads work a lot like processes -- after a thread exits, it stays around in the thread list as a "zombie" thread until another thread (not necessarily its parent) calls $thr->join to collect its return value.
You aren't calling $thr->join anywhere, so these threads are piling up. You can use threads->list(threads::joinable) to check which threads have exited and are now joinable.
(Alternatively, consider using Parallel::ForkManager to manage multiple worker processes. Perl interpreter threads are messy and are best avoided.)
You are relying on a very outdated Perl module that was introduced in 1998 and removed from core in 2007. You don't say what version of Perl you're running, but software doesn't work like automobiles where a pristine example of very old edition is laudable
You need to update your installation
The documentation for the Thread module says this (markup original)
DEPRECATED
The Thread module served as the frontend to the old-style thread model, called 5005threads, that was introduced in release 5.005. That model was deprecated, and has been removed in version 5.10.
For old code and interim backwards compatibility, the Thread module has been reworked to function as a frontend for the new interpreter threads (ithreads) model. However, some previous functionality is not available. Further, the data sharing models between the two thread models are completely different, and anything to do with data sharing has to be thought differently. With ithreads, you must explicitly share() variables between the threads.
You are strongly encouraged to migrate any existing threaded code to the new model (i.e., use the threads and threads::shared modules) as soon as possible.

Does join in perl threads block SIGALRM?

I have a small sample program that hangs on perl 5.16.3. I am attempting to use an alarm to trigger if two threads don't finish working in time in a much more complicated program, but this boils down the gist of it. I know there's plenty of other ways to do this, but for the sake of argument, let's say I'm stuck with the code the way it is. I'm not sure if this is a bug in perl, or something that legitimately shouldn't work.
I have researched this on the Internet, and it seems like mixing alarms and threads is generally discouraged, but I've seen plenty of examples where people claim that this is a perfectly reasonable thing to do, such as this other SO question, Perl threads with alarm. The code provided in the accepted answer on that question also hangs on my system, which is why I'm wondering if maybe this is something that's now broke, at least as of 5.16.3.
It appears that in the code below, if I call join before the alarm goes off, the alarm never triggers. If I replace the join with while(1){} and go into a busy-wait loop, then the alarm goes off just fine, so it appears that join is blocking the SIGALRM for some reason.
My expectation is that the join happens, and then a few seconds later I see "Alarm!" printed on the screen, but this never happens, so long as that join gets called before the alarm goes off.
#!/usr/bin/env perl
use strict;
use warnings;
use threads;
sub worker {
print "Worker thread started.\n";
while(1){}
}
my $thread = threads->create(\&worker);
print "Setting alarm.\n";
$SIG{ALRM} = sub { print "Alarm!\n" };
alarm 2;
print "Joining.\n";
$thread->join();
The problem has nothing to do with threads. Signals are only processed between Perl ops, and join is written in C, so the signal will only be handled when join returns. The following demonstrates this:
#!/usr/bin/env perl
use strict;
use warnings;
use threads;
sub worker {
print "Worker thread started.\n";
for (1..5) {
sleep(1);
print(".\n");
}
}
my $thread = threads->create(\&worker);
print "Setting alarm.\n";
$SIG{ALRM} = sub { print "Alarm!\n" };
alarm 2;
print "Joining.\n";
$thread->join();
Output:
Setting alarm.
Joining.
Worker thread started.
.
.
.
.
.
Alarm!
join is essentially a call to pthread_join. Unlike other blocking system calls, pthread_join does not get interrupted by signals.
By the way, I renamed $tid to $thread since threads->create returns a thread object, not a thread id.
I'm going to post an answer to my own question to add some detail to ikegami's response above, and summarize our conversation, which should save future visitors from having to read through the huge comment trail it collected.
After discussing things with ikegami, I went and did some more reading on perl signals, consulted some other perl experts, and discovered the exact reason why join isn't being "interrupted" by the interpreter. As ikegami said, signals only get delivered in between perl operations. In perl, this is called Deferred Signals, or Safe Signals.
Deferred Signals were released in 5.8.0, back in 2002, which could be one of the reasons I was seeing older posts on the Net which don't appear to work. They probably worked with "unsafe signals", which act more like signal delivery that we're used to in C. In fact, as of 5.8.1, you can turn off deferred signal delivery by setting the environment variable PERL_SIGNALS=unsafe before executing your script. When I do this, the threads::join call is indeed interrupted as I was expecting, just as pthread_join is interrupted in C in this same scenario.
Unlike other I/O operations, like read, which returns EINTR when a signal interrupts it, threads::join doesn't do this. Under the hood it's a call to the C library call pthread_join, which the man page confirms does not return EINTR. Under deferred signals, when the interpreter gets the SIGALRM, it schedules delivery of the signal, deferring it, until the threads::join->pthread_join library call returns. Since pthread_join doesn't "interrupt" and return EINTR, my SIGALRM is effectively being swallowed by the threads::join. With other I/O operations, they would "interrupt" and return EINTR, giving the perl interpreter a chance to deliver the signal and then restart the system call via SA_RESTART.
Obviously, running in unsafe signals mode is probably a Bad Thing, so as an alternative, according to perlipc, you can use the POSIX module to install a signal handler directly via sigaction. This then makes the one particular signal "unsafe".

Perl Threads - Capture Exit

I have code that spawns two threads. The first is a system command which launches an application. The second monitors the program. I'm new to perl threads so I have a few questions...
my $thr1 = threads->new(system($cmd));
sleep(FIVEMINUTES);
my $thr2 = threads->new(\&check);
my $rth1 = $thr1->join();
my $rth2 = $thr2->join();
1) Do I need a second thread to monitor the program? You can think of my sub routine call to &check as a infinite while loop which checks a text file for stuff the application produces. Could I just do this:
my $thr1 = threads->new(system($cmd));
sleep(FIVEMINUTES);
&check;
2) I'm trying to figure out what my parent is doing after I run this code. So after I launch line 1 it will spawn that new thread, sleep, then spawn that second thread and then sit at that first join and wait. It will not execute the rest of my code until it joins at that first join. Is this correct or am I wrong? If I am wrong, then how does it work?
3) My first thread the one that launches the application can be killed unexpectedly. when this happens, I have nothing to catch that and kill the threads. It just says:
"Thread 1 terminated abnormally: Undefined subroutine &main::65280 called at myScript.pl line 109." and then hangs there.
What could I do to get it to end the other threads? I need it to send an email before the program ends as well which I can do by just calling &email (another subroutine I made).
Thanks
First of all,
my $thr1 = threads->new(system($cmd));
should be
my $thr1 = threads->new(sub { system($cmd) });
or simply
my $thr1 = async { system($cmd) };
You don't need to start a third thread. As you suspected, the main thread and the one executing system are sufficient.
What if the command finishes executing in less than five minutes? The following replaces sleep with a signal.
use threads;
use threads::shared;
my $done :shared = 0;
my $thr1 = async {
system($cmd);
lock($done);
$done = 1;
cond_signal($done);
};
{ # Wait up to $timeout for the thread to end.
lock($done);
my $timeout = time() + 5*60;
1 while !$done && cond_timedwait($done, $timeout);
if (!$done) {
... there was a timeout ...
}
}
$thr1->join();
In 2004-2006 I had the same challenges for 24/7 running perl app on Winblows... The only approach working was to use xml state files on disk to communicate the status of each component of the system... and make sure if threads are used every stat file handling occurred within a closure code block {} (big gotcha) The app ran at least 3 years on 100 machines 24/7 without errors ...
If you are on a Unix-like OS I would suggest to use forks and interprocess communication.
Use cpan modules, do not reinvent the wheel..
Multithreading in Perl is a little hard to deal with, I would suggest using the fork() commands instead. I will attempt to answer your questions to the best of my ability.
1) It seems to me like two threads/processes are the way to go here, as you need to check asynchronously check your data.
2) Your parent works exactly as you describe.
3) The reason for your thread hanging could be that you never terminate your second thread. You said it was an infinite loop, is there any exit condition?

Perl parallel HTTP requests - out of memory

First of all, I'm new to Perl.
I want to make multiple (e.g. 160) HTTP GET requests on a REST API in Perl. Executing them one after another takes much time, so I was thinking of running the requests in parallel. Therefore I used threads to execute more requests at the same time and limited the number of parallel requests to 10.
This worked just fine for the first time I ran the program, the second time I ran 'out of memory' after the 40th request.
Here's the code: (#urls contains the 160 URLs for the requests)
while(#urls) {
my #threads;
for (my $j = 0; $j < 10 and #urls; $j++) {
my $url = shift(#urls);
push #threads, async { $ua->get($url) };
}
for my $thread (#threads) {
my $response = $thread->join;
print "$response\n";
}
}
So my question is, why am I NOT running out of memory the first time but the second time (am I missing something crucial in my code)? And what can I do to prevent it?
Or is there a better way of executing parallel GET requests?
I'm not sure why you would get a OOM error on a second run when you don't get one on the first run; when you run a Perl script and the perl binary exits, it'll release all of it's memory back to the OS. Nothing is kept between executions. Is the exact same data being returned by the REST service each time? Maybe there's more data the second time you run and it's pushing you over the edge.
One problem I notice is that you're launching 10 threads and running them to completion, then spawning 10 more threads. A better solution may be a worker-thread model. Spawn 10 threads (or however many you want) at the start of the program, put the URLs into a queue, and allow the threads to process the queue themselves. Here's a quick example that may help:
use strict;
use warnings;
use threads;
use Thread::Queue;
my $q = Thread::Queue->new();
my #thr = map {
threads->create(sub {
my #responses = ();
while (defined (my $url = $q->dequeue())) {
push #responses, $ua->get($url);
}
return #responses;
});
} 1..10;
$q->enqueue($_) for #urls;
$q->enqueue(undef) for 1..10;
foreach (#thr) {
my #responses_of_this_thread = $_->join();
print for #responses_of_this_thread;
}
Note, I haven't tested this to make sure it works. In this example, you create a new thread queue and spawn up 10 worker threads. Each thread will block on the dequeue method until there is something to be read. Next, you queue up all the URLs that you have, and an undef for each thread. The undef will allow the threads to exit when there is no more work to perform. At this point, the threads will go through and process the work, and you will gather the responses via the join at the end.
Whenever I need an asynchronous solution Perl, I first look at the POE framework. In this particular case I used POE HTTP Request module that will allow us to send multiple requests simultaneously and provide a callback mechanism where you can process your http responses.
Perl threads are scary and can crash your application, especially when you join or detach them. If responses do not take a long time to process, a single threaded POE solution would work beautifully.
Sometimes though, we have to a rely on threading because application gets blocked due to long running tasks. In those cases, I create a certain number of threads BEFORE initiating anything in the application. Then with Thread::Queue I pass the data from the main thread to these workers AND never join/detach them; always keep them around for stability purposes.
(Not an ideal solution for every case.)
POE supports threads now and each thread can run a POE::Kernel. The kernels can communicate with each other through TCP sockets (which POE provides nice unblocking interfaces).

How to implement a thread safe timer on linux?

As we know, doing things in signal handlers is really bad, because they run in an interrupt-like context. It's quite possible that various locks (including the malloc() heap lock!) are held when the signal handler is called.
So I want to implement a thread safe timer without using signal mechanism.
How can I do?
Sorry, actually, I'm not expecting answers about thread-safe, but answers about implementing a timer on Unix or Linux which is thread-safe.
Use usleep(3) or sleep(3) in your thread. This will block the thread until the timeout expires.
If you need to wait on I/O and have a timer expire before any I/O is ready, use select(2), poll(2) or epoll(7) with a timeout.
If you still need to use a signal handler, create a pipe with pipe(2), do a blocking read on the read side in your thread, or use select/poll/epoll to wait for it to be ready, and write a byte to the write end of your pipe in the signal handler with write(2). It doesn't matter what you write to the pipe - the idea is to just get your thread to wake up. If you want to multiplex signals on the one pipe, write the signal number or some other ID to the pipe.
You should probably use something like pthreads, the POSIX threads library. It provides not only threads themselves but also basic synchronization primitives like mutexes (locks), conditions, semaphores. Here's a tutorial I found that seems to be decent:
http://www.yolinux.com/TUTORIALS/LinuxTutorialPosixThreads.html
For what it's worth, if you're totally unfamiliar with multithreaded programming, it might be a little easier to learn it in Java or Python, if you know either of those, than in C.
I think the usual way around the problems you describe is to make the signal handlers do only a minimal amount of work. E.g. setting some timer_expired flag. Then you have some thread that regularly checks whether the flag has been set, and does the actual work.
If you don't want to use signals I suppose you'd have to make a thread sleep or busy-wait for the specified time.
Use a Posix interval timer, and have it notify via a signal. Inside the signal handler function almost none of C's functions, like printf() can be used, as they aren't re-entrant.
Use a single global flag, declared static volatile for your signal handler to manipulate. The handler should literally have this one line of code, and NOTHING else; This flag should impact the flow control elsewhere in the 1 & Only thread in the program.
static volatile bool g_zig_instead_of_zag_flg = false;
...
void signal_handler_fnc()
g_zig_instead_of_zag_flg = true;
return
int main() {
if(false == g_zig_instead_of_zag) {
do_zag();
} else {
do_zig();
g_zig_instead_of_zag = false;
return 0;
}
Michael Kerrisk's The Linux Programming Interface has examples of both methods, and a few more, but the examples come with a lot of his own private functions you have to get working, and the examples carefully avoid many of the gotchas they should explore, so not great.
Using the Poxix interval timer that notifies via a thread makes everything a lot worse, and AFAICT, that notification method is pretty much useless. I only say pretty much because I am allowing that there may be SOME case where doing nothing in the main() thread, and everything in the handler thread is useful, but I sure can't think of any such case.

Resources