Thread-safe logger for Tcl - multithreading

I need a logging library for my multi-threaded Tcl aplication. Can I use standard logger package? If I can, what restrictions are applied in multi-threading environment?
I'd like to share logging services among the threads, if possible.
Thanks

Tcl threads do not share data (unless you explicitly use certain facilities from the Thread package) and instead are communicating via message passing. So it seems like the way to go would be to setup a dedicated "logger" thread and just queue logging messages into it from the worker threads.
Otherwise the point of contention will probably be somewhere in the OS resource used by the logger to actually write data.
Update Okay, here's a working sketch of what I actually proposed to implement:
package require Tcl 8.5
package require Thread
proc make_worker_thread {logger_id body} {
set newbody [list set ::logger $logger_id]
append newbody \n {
proc ::log {severity msg} {
global logger
thread::send $logger [list ::log $severity $msg]
}
} \n $body
thread::create $newbody
}
set logger [thread::create {
package require logger
proc log {severity msg} {
puts "hey, that's it: ($severity) $msg"
}
puts "logger thread created: [thread::id]"
thread::wait
}]
for {set i 0} {$i < 3} {incr i} {
make_worker_thread $logger {
proc post_msg {} {
log notice "A message from [thread::id]"
after 1000 ::post_msg
}
puts "worker thread created: [thread::id]"
after 1000 ::post_msg
thread::wait
}
}
vwait forever
This code creates one logger thread and four worker threads each of which posts a message to the logger thread once per second. The code runs until manually interrupted. The logger thread just simple-mindedly outputs the message it was passed to the console, but as someone else in this thread already mentioned, you could probably use the "logger" package from Tcllib, if you need fancy stuff like facilities.
To reiterate my points:
The logger package itself does not presumably know anything about threading.
Tcl threads are well-separated and usually communicate via message passing.
Hence create a thread for the logger and teach worker threads send messages to it; therefore working threads are not concerned with how logger is implemented.
P.S. In the worker threads, you can use [thread::send -async ...] to make sending log messages fully asynchronous.

A Logging API for Tcl
This implementation is thread safe. Because of the general
purpose the C-functions do not require a tcl-interpreter.

It depends a bit on what you want to achieve with a multithreaded use of logger.
If you just have the use case to not block your worker threads while writing log messages to disk, the simplest way is to use logger normally and configure a simple logproc that does a thread::send -async to some logging thread (which might itself use logger with appenders to write the actual log files) with your log message (basically what has been sketched in the accepted answer).
If you want to use loggers option to disable/enable logging for the whole program, across various threads, you need to do a little more work to propagate loglevel changes to all threads via custom lvlchangeproc's.

Here is my "multithreading" wrapper for logger package:
# replacement for logger::init procedure from logger package
proc ::mylogger::init { service } {
set log [logger::init $service]
foreach lvl [logger::levels] {
interp alias {} log_to_file_$lvl {} ::mylogger::log $lvl $service
${log}::logproc $lvl log_to_file_$lvl
}
return $log
}
proc mylogger::server { } {
set t [thread::create {
proc log { level txt } {
set msg "\[[clock format [clock seconds] -format "%Y-%m-%dT%H:%M:%S"]\]\t$level\t$txt"
puts stderr $msg
}
# enter to event loop
thread::wait
}]
tsv::set measure-logger loggerThread $t
}
proc ::mylogger::log { level service txt } {
set t [tsv::get measure-logger loggerThread]
thread::send -async $t [list log $level "$service\t$txt"]
}
# EXAMPLE
# start logging thread
# should be called once from main application thread
::mylogger::server
# create logger
# may be called from any thread
set log [mylogger::init myservice]
# log a message
# may be called from the thread the "mylogger::init myservice" was called in
${log}::debug myservice "Hello, World!"
# wait a second
after 1000

Related

Perl set timeout within thread fails: 'Alarm clock'

I have a threaded application and would like to set timeouts for the threads. Peldoc for alarm suggests to use a eval-die pair and catch the ALRM signal. However, this fails with threads producing the error Alarm clock:
use strict; use warnings;
require threads;
require threads::shared;
my $t = threads->create( sub {
eval {
$SIG{ALRM} = sub { die "alarm\n" };
alarm 2;
main();
alarm 0;
};
if ($#){
die $# unless $# eq "alarm\n";
print "timed out\n";
}
}
);
my #r = $t->join;
print "done\n";
sub main {
sleep 3;
}
This post suggests that alarm is called without signal handler in the threads library. Another post is about this problem and answers suggest to use fork and waitpid, but I would really like to use threads. Another post claims to come up with a solution, but this still gives the Alarm clock error for me. I tried to catch Alarm clock in the if ($#), but no success. Any idea how I could make this work?
The whole idea of using alarm in threads is problematic.
Signals are sent to processes, not threads.
What if two threads want to use alarm?
You'll have to implement your own system. The following is an attempt at a general solution:
package Threads::Alarm;
use strict;
use warnings;
use threads;
use threads::shared;
use Exporter qw( import );
our #EXPORT_OK = qw( alarm thread_alarm );
# A list of "$time:$tid" strings sorted by ascending time.
my #alarms :shared;
sub thread_alarm {
my ($wait) = #_;
my $tid = threads->tid();
lock #alarms;
# Cancel existing alarm for this thread, if any.
for my $i (0..$#alarms) {
if ((split(/:/, $alarms[$i]))[1] == $tid) {
splice(#alarms, $i, 1);
last;
}
}
# Create an alarm
if ($wait) {
my $when = time() + $wait;
# A binary search would be better.
my $i;
for ($i=0; $i<#alarms; ++$i) {
last if $when < (split(/:/, $alarms[$i]))[0];
}
splice(#alarms, $i, 0, "$when:$tid");
}
# Notify others of change to #alarms.
cond_broadcast(#alarms);
}
{
no warnings 'once';
*alarm = \&thread_alarm;
}
threads->create(sub {
while (1) {
my $thread;
{
lock #alarms;
while (1) {
# Wait for an alarm request to come in.
cond_wait(#alarms) while !#alarms;
# Grab the soonest alarm.
my ($when, $tid) = split(/:/, $alarms[0]);
# Check if the thread still exists.
my $thread = threads->object($tid)
or last;
# Wait for the #alarms to change or for the alarm time.
last if !cond_timedwait(#alarms, $when);
}
# Before releasing the lock, remove the alarm we're about to raise.
shift(#alarms);
# Notify others of change to #alarms.
# Doesn't actually do anything at this time.
cond_broadcast(#alarms);
}
$thread->kill('ALRM') if $thread;
}
})->detach();
1;
Completely untested. Well, I made sure it compiles, but that's it.
Note that threads->kill doesn't send a real signal (since those are sent to processes, not threads), so the OS won't interrupt any operation (e.g. sleep, wait). Simple solution: Send a real signal to a handler that does nothing right after calling threads->kill. Maybe I should have written a solution that was based around the actual SIGALRM.

Async Logger. Can I lose/delay log entries?

I'm implementing my own logging framework. Following is my BaseLogger which receives the log entries and push it to the actual Logger which implements the abstract Log method.
I use the C# TPL for logging in an Async manner. I use Threads instead of TPL. (TPL task doesn't hold a real thread. So if all threads of the application end, tasks will stop as well, which will cause all 'waiting' log entries to be lost.)
public abstract class BaseLogger
{
// ... Omitted properties constructor .etc. ... //
public virtual void AddLogEntry(LogEntry entry)
{
if (!AsyncSupported)
{
// the underlying logger doesn't support Async.
// Simply call the log method and return.
Log(entry);
return;
}
// Logger supports Async.
LogAsync(entry);
}
private void LogAsync(LogEntry entry)
{
lock (LogQueueSyncRoot) // Make sure we ave a lock before accessing the queue.
{
LogQueue.Enqueue(entry);
}
if (LogThread == null || LogThread.ThreadState == ThreadState.Stopped)
{ // either the thread is completed, or this is the first time we're logging to this logger.
LogTask = new new Thread(new ThreadStart(() =>
{
while (true)
{
LogEntry logEntry;
lock (LogQueueSyncRoot)
{
if (LogQueue.Count > 0)
{
logEntry = LogQueue.Dequeue();
}
else
{
break;
// is it possible for a message to be added,
// right after the break and I leanve the lock {} but
// before I exit the loop and task gets 'completed' ??
}
}
Log(logEntry);
}
}));
LogThread.Start();
}
}
// Actual logger implimentations will impliment this method.
protected abstract void Log(LogEntry entry);
}
Note that AddLogEntry can be called from multiple threads at the same time.
My question is, is it possible for this implementation to lose log entries ?
I'm worried that, is it possible to add a log entry to the queue, right after my thread exists the loop with the break statement and exits the lock block, and which is in the else clause, and the thread is still in the 'Running' state.
I do realize that, because I'm using a queue, even if I miss an entry, the next request to log, will push the missed entry as well. But this is not acceptable, specially if this happens for the last log entry of the application.
Also, please let me know whether and how I can implement the same, but using the new C# 5.0 async and await keywords with a cleaner code. I don't mind requiring .NET 4.5.
Thanks in Advance.
While you could likely get this to work, in my experience, I'd recommend, if possible, use an existing logging framework :) For instance, there are various options for async logging/appenders with log4net, such as this async appender wrapper thingy.
Otherwise, IMHO since you're going to be blocking a threadpool thread during your logging operation anyway, I would instead just start a dedicated thread for your logging. You seem to be kind-of going for that approach already, just via Task so that you'd not hold a threadpool thread when nothing is logging. However, the simplification in implementation I think benefits just having the dedicated thread.
Once you have a dedicated logging thread, you then only need have an intermediate ConcurrentQueue. At that point, your log method just adds to the queue and your dedicated logging thread just does that while loop you already have. You can wrap with BlockingCollection if you need blocking/bounded behavior.
By having the dedicated thread as the only thing that writes, it eliminates any possibility of having multiple threads/tasks pulling off queue entries and trying to write log entries at the same time (painful race condition). Since the log method is now just adding to a collection, it doesn't need to be async and you don't need to deal with the TPL at all, making it simpler and easier to reason about (and hopefully in the category of 'obviously correct' or thereabouts :)
This 'dedicated logging thread' approach is what I believe the log4net appender I linked to does as well, FWIW, in case that helps serve as an example.
I see two race conditions off the top of my head:
You can spin up more than one Thread if multiple threads call AddLogEntry. This won't cause lost events but is inefficient.
Yes, an event can be queued while the Thread is exiting, and in that case it would be "lost".
Also, there's a serious performance issue here: unless you're logging constantly (thousands of times a second), you're going to be spinning up a new Thread for each log entry. That will get expensive quickly.
Like James, I agree that you should use an established logging library. Logging is not as trivial as it seems, and there are already many solutions.
That said, if you want a nice .NET 4.5-based approach, it's pretty easy:
public abstract class BaseLogger
{
private readonly ActionBlock<LogEntry> block;
protected BaseLogger(int maxDegreeOfParallelism = 1)
{
block = new ActionBlock<LogEntry>(
entry =>
{
Log(entry);
},
new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = maxDegreeOfParallelism,
});
}
public virtual void AddLogEntry(LogEntry entry)
{
block.Post(entry);
}
protected abstract void Log(LogEntry entry);
}
Regarding the loosing waiting messages on app crush because of unhandled exception, I've bound a handler to the event AppDomain.CurrentDomain.DomainUnload. Goes like this:
protected ManualResetEvent flushing = new ManualResetEvent(true);
protected AsyncLogger() // ctor of logger
{
AppDomain.CurrentDomain.DomainUnload += CurrentDomain_DomainUnload;
}
protected void CurrentDomain_DomainUnload(object sender, EventArgs e)
{
if (!IsEmpty)
{
flushing.WaitOne();
}
}
Maybe not too clean, but works.

fork and waitpid fail on linux. Without hitting the hard or soft limits

I have a process that must create and close threads on demand.
Each thread forks a new process using open2. Sometimes after executing the program for a long time open2 fails to fork the process sometimes and gives a "Can not allocate memory error", sometimes this happens for threads too.I know that the Linux has soft and hard limits but the number of the concurrent threads and processes for my server does not exceed those values.
Is there something like a counter for number of processes and threads that eliminates thread and process creation after sometime?
If it is so how servers like Postgres work for a long period of time?
The project has multiple processes that communicate using TCP, but the part that causes the error that i described in a frond end to mplayer, that is written in Perl. The code is as follows:
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;
use POSIX ":sys_wait_h";
use IPC::Open2;
use 5.010;
use Config;
BEGIN
{
if(!$Config{useithreads})
{
die "Your perl does not compiled with threading support.";
}
}
use threads;
use threads::shared;
use constant
{
SERVER_PORT=>5000,
#Remote request packet fields
PACKET_REQTYPE=>0,
PACKET_FILENAM=>1,
PACKET_VOLMLVL=>2,
PACKET_ENDPOSI=>3,
PACKET_SEEKPOS=>4,
#our request typs
PLAY_REQUEST=>1,
STOP_REQUEST=>2,
INFO_REQUEST=>3,
VOCH_REQUEST=>4,
PAUS_REQUEST=>5,
PLPA_REQUEST=>6,
SEEK_REQUEST=>7,
#Play states
STATE_PAUS=>0,
STATE_PLAY=>1,
STATE_STOP=>2,
};
#The following line must be added because of a bad behavior in the perl thread library that causes a SIGPIPE to be generated under heavy usage of the threads.
$SIG{PIPE} = 'IGNORE';
#This variable holds the server socket object
my $server_socket;
#This array is used to hold objects of our all threads
my #thread_objects;
#create the server socket
$server_socket=IO::Socket::INET->new(LocalPort=>SERVER_PORT,Listen=>20,Proto=>'tcp',Reuse=>1) or
die "Creating socket error ($#)";
#Now try to accept remote connections
print "Server socket created successfully now try to accept remote connections on port: ".SERVER_PORT."\n";
while(my $client_connection=$server_socket->accept())
{
push #thread_objects,threads->create(\&player_thread,$client_connection);
$thread_objects[$#thread_objects]->detach();
}
#This subroutine is used to play something using tcp-based commands
sub player_thread
{
my $client_socket=shift;
#create a new select object
my $selector=IO::Select->new($client_socket);
#this variabe is used to pars our request
my #remote_request;
#getting th thread id of the current thread
my $tid=threads->self()->tid;
#This variable is used to hold the pid of mplayer child
my $mp_pid=-1;
#Mplayer stdin and stdout file descriptors
my ($MP_STDIN,$MP_STDOUT);
#This variable is used to check if we are playing something now or not
my $is_playing=STATE_STOP;
print "Client thread $tid created.\n";
while(1)
{
#check to see if we can read anything from our handler
#print "Before select\n";
#my #ready=$selector->can_read();
#print "After select: #ready\n";
#now the data is ready for reading so we read it here
my $data=<$client_socket>;
#This means if the connection is closed by the remote end
if(!defined($data))
{
print "Remote connection has been closed in thread $tid mplayer id is: $mp_pid and state is: $is_playing.\n";
#if we have an mplayer child when remote connection is closed we must wait for it
#so that is work is done
if($mp_pid!=-1 and $is_playing ==STATE_PLAY)
{
waitpid $mp_pid,0;
$is_playing=STATE_STOP;
}
elsif($is_playing==STATE_PAUS and $mp_pid!=-1)
{
print "thread $tid is in the paused state, we must kill mplayer.\n";
print $MP_STDIN "quit\n";
waitpid $mp_pid,0;
$is_playing=STATE_STOP;
}
last;
}#if
#FIXME:: Here we must validate our argument
#Now we try to execute the command
chomp($data);
#remote_request=split ",",$data;
print "#remote_request\n";
#Trying to reap the death child and change the state of the thread
my $dead_child=-1;
$dead_child=&reaper($mp_pid);
if($dead_child)
{
$is_playing=STATE_STOP;
$mp_pid=-1;
}
given($remote_request[PACKET_REQTYPE])
{
when($_==PLAY_REQUEST)
{
print "Play request\n";
if($is_playing==STATE_STOP)
{
eval{$mp_pid=open2($MP_STDOUT,$MP_STDIN,"mplayer -slave -really-quiet -softvol -volume ".$remote_request[PACKET_VOLMLVL]." -endpos ".$remote_request[PACKET_ENDPOSI]." ./".$remote_request[PACKET_FILENAM]);};
print "Some error occurred in open2 system call: $#\n" if $#;
$is_playing=STATE_PLAY;
print "Mplayer pid: $mp_pid.\n";
}
}
when($_==STOP_REQUEST)
{
print "Stop request\n";
if($is_playing != STATE_STOP)
{
print $MP_STDIN "pausing_keep stop\n";
#FIXME:: Maybe we should use WNOHANG here
my $id=waitpid $mp_pid,0;
print "Mplayer($id) stopped.\n";
$is_playing=STATE_STOP;
$mp_pid=-1;
}
}
when($_==PAUS_REQUEST)
{
print "pause request\n";
if($is_playing !=STATE_STOP)
{
print $MP_STDIN "pausing_keep pause\n";
$is_playing=STATE_PAUS;
}
}
when($_==VOCH_REQUEST)
{
print "volume change request\n";
if($is_playing !=STATE_STOP)
{
print $MP_STDIN "pausing_keep volume ".$remote_request[PACKET_VOLMLVL]." 1\n";
}
}
when($_==INFO_REQUEST)
{
my $id;
$id=&reaper($mp_pid);
if($id > 0)
{
print "Mplayer($id) stopped.\n";
$is_playing=STATE_STOP;
$mp_pid=-1;
}
given($is_playing)
{
when($_==STATE_STOP)
{
print $client_socket "Stopped\n";
}
when($_==STATE_PAUS)
{
print $client_socket "Paused\n";
}
when($_==STATE_PLAY)
{
print $client_socket "Playing\n";
}
}
}
when ($_==PLPA_REQUEST)
{
print "play paused request\n";
if($is_playing==STATE_STOP)
{
eval{$mp_pid=open2($MP_STDOUT,$MP_STDIN,"mplayer -slave -really-quiet -softvol -volume ".$remote_request[PACKET_VOLMLVL]." -endpos ".$remote_request[PACKET_ENDPOSI]." ./".$remote_request[PACKET_FILENAM]);};
print "Some error occurred in open2 system call: $#\n" if $#;
print $MP_STDIN "pausing_keep pause\n";
$is_playing=STATE_PAUS;
}
}
when ($_==SEEK_REQUEST)
{
print "Seek request\n";
if($is_playing != STATE_STOP)
{
my $seek_pos=abs $remote_request[PACKET_SEEKPOS];
print $MP_STDIN "seek $seek_pos 2\n";
$is_playing=STATE_PLAY;
}
}
default
{
warn "Invalid request($_)!!!";
next;
}
}#Given
}#while
$client_socket->close();
print "Thread $tid is exiting now, the child mplayer pid is: $mp_pid and state is: $is_playing.\n";
}
#The following subroutine takes a pid and if that pid is grater than 0 it tries to reap it
#if it is successful returns pid of the reaped process else 0
sub reaper
{
my $pid=shift;
if($pid > 0)
{
my $id=waitpid($pid,WNOHANG);
if($id > 0)
{
return $id;
}
}
return 0;
}
"Can not allocate memory error" is what it says, either the user exceeded its memory quota (check with ulimit -m, compare to ps ux) or you're really out of memory (free).
The limits for max user processes are only indirectly connected - if you fork() more processes then the user's memory quota permits, fork() will fail with ENOMEM.
You also might want to see:
What are some conditions that may cause fork() or system() calls to fail on Linux?
I finally found the problem, it is because of a memory leak in the Perl's thread module that causes the memory to grow after a long time. Then open2 can not allocate memory and fails.

Prevent tcl thread from being blocked by main event loop

I am trying to run a thread continuously and not have it become blocked by the tcl main event loop.
Here is a simple example of what I'm trying to do:
#!/bin/sh
#\
exec tclsh "$0" "$#"
package require Thread
set ::a_thread [thread::create {thread::wait}]
proc start_a {} {
thread::send $::a_thread {
puts "Running a thread"
}
after 1000 a_start
}
proc infinite_loop {} {
while {1} {
puts "Loop"
after 500
}
}
start_a
infinite_loop
vwait forever
In this code, the infinite_loop proc is called and the main event loop runs infinitely. I would like it if the a_thread could still run in the background though. How can I achieve this?
The main event loop is not blocking your thread. Instead you are using the main event loop to shedule scripts to be executed in the thread. Instead, run the scheduler in the thread itself:
Code tested and works as expected:
thread::send $::a_thread {
proc loop {} {
puts "running a thread"
after 1000 loop
}
loop
}
while 1 {
puts "loop"
after 500
}
The answer is, of course, the one given by slebetman. However, one way to debug this sort of thing (especially in more complex cases) is to prefix the messages printed by each thread by the result of thread::id, and to make sure you print a message at the start of each time round the loop. For example:
package require Thread
set ::a_thread [thread::create {thread::wait}]
proc start_a {} {
puts "[thread::id]: Dispatch to $::a_thread"
thread::send $::a_thread {
puts "[thread::id]: Running a thread"
}
after 1000 a_start
}
proc infinite_loop {} {
while {1} {
puts "[thread::id]: Loop"
after 500
}
}
start_a
infinite_loop
puts "[thread::id]: Start main event loop"
vwait forever
That would have told you that the dispatch was happening once, that the running in the other thread is happening synchronously (thread::send waits for the script to finish executing by default), and that the infinite loop is preventing the startup of the main event loop (and hence the rescheduling of the dispatch). Since you didn't know who was doing what, of course there was confusion!

Perl Threads and Unsafe Signals

So I recently wanted to thread one of my Perl programs to increase its speed. Taking in a list of websites, I wanted to start a thread for each url and get the content of each website and then look for a company description on the page. Once one thread found a result, or all thread's didn't, I wanted to exit, write my result, and read in urls for my next company.
The problem that I see is that I use the Perl::Unsafe::Signals module inside of the function that I call when creating a thread. I need the unsafe signals to interrupt regular expressions that get "stuck". However this seems to cause all sorts of problems, mainly having the program crash and the error msg "Alarm Clock" shown.
Therefore, is there a way to use Perl::Unsafe::Signals and threads safely? Is there a way to timeout a regular expression in another way by sending a signal to the function ( like I send a 'KILL' signal below?) Thanks.
Note: I stripped down the code to all pertinent parts, let me know if you need more.
use threads ('exit' => 'threads_only');
use threads::shared;
my #descrip;
share(#descrip);
my $lock;
share($lock);
URL:foreach my $url(#unique_urls) {
#skip blank urls
if(!$url) { next URL; }#if
#find description
my $thread = threads->create(\&findCompanyDescription, $PREV_COMPANY, $PREV_BASE_URL, $url);
#while a description has not been found and there are still active threads, keep looking
#there may be a better way to do this, but this seems to work for me
while(!#descrip && threads->list() != 0) {;}
#kill all threads, write output, read in next batch of urls
my #threads = threads->list();
foreach(#threads) { print("detaching\n"); $_->kill('KILL')->detach(); }#foreach
#######SUBROUTINE CALLED BY THREAD CREATE
sub findCompanyDescription {
my($company_full, $base_url, $url) = #_;
my($descrip, $raw_meta, $raw) = '';
my #company;
$SIG{'KILL'} = sub { alarm(0); threads->exit(); };
eval {
local $SIG{ALRM} = sub { die("alarm\n") }; # NB: \n required
alarm(5);
use Perl::Unsafe::Signals;
UNSAFE_SIGNALS {
while($company) {
my #matches = ($content =~ m!.*<([\w\d]+).*?>\s*about\s+$company[\w\s\-_]*<.*?>(?:<.*?>|\s)*(.*?)</\1.*?>!sig);
MATCH:for(my $ndx=1; $ndx<#matches; $ndx+=2) {
($raw, $descrip) = &filterResult($matches[$ndx], $company_full);
if($descrip) {
$company = undef;
last(MATCH);
}#if
}#for
#reduce the company name and try again
$company = &reduceCompanyName($company);
}#while
alarm(0);
};#unsafe_signals
};#eval
if($#) {
if($# eq "alarm\n" && $DEBUG) { print("\nWebpage Timeout [].\n"); }#if
}#if
if($descrip) { lock($lock); {
#descrip = ($PREV_ID, $company_full, $base_url, $url, 1, $raw, $descrip); }
}#if
In general, "unsafe" signals are unsafe for both single threaded and multi-threaded. You've only increased your peril by using threads and unsafe signals. Perl's usual safe signal handler sets the flag signal_pending without meaningfull interrupting execution. The VM checks that flag when it's between opcodes.
Your regexp execution is a single, "atomic" opcode. Of course, the regexp itself is another VM with its own opcodes but we don't have currently visibility into that for the perl signal handler.
Frankly, I've no good idea about how to interrupt the regexp engine. It's got some global C state which in the past prior to perl-5.10 prevented it from being reentrant. It might not be safe for universal interruptability like you're trying. If you really wanted it to be fully interruptible, you might want to fork and have your child process do the regexp and communicate the results back over a pipe.
require JSON;
require IO::Select;
my $TIMEOUT_SECONDS = 2.5; # seconds
my ( $read, $write );
pipe $read, $write;
my #matches;
my $pid = fork;
if ( $pid ) {
my $select = IO::Select->new( $read );
if ( $select->can_read( $TIMEOUT_SECONDS ) ) {
local $/;
my $json = <$read>;
if ( $json ) {
my $matches_ref = JSON::from_json( $json );
if ( $matches_ref ) {
#matches = #$matches_ref;
}
}
}
waitpid $pid, 0;
}
else {
my #r = $content =~ m!.*<([\w\d]+).*?>\s*about\s+$company[\w\s\-_]*<.*?>(?:<.*?>|\s)*(.*?)</\1.*?>!sig;
my $json = JSON::to_json( \ #r );
print { $write } $json;
close $write;
exit;
}
IMHO, mixing signals and threads is a challenging task per se (i.e. w/o perl-specific things).
Remember that even in a single-threaded program you can safely call only async-signal-safe functions from the signal handler because the program may be interrupted at any point.
Perl adds another layer of abstraction, so I have no idea about safety of calling "die" from signal handler in case of unsafe signals.
If I remember properly, SIGALRM is asynchronous signal, so it must be handled synchronously. Your way of handling it is generally incorrect in multi-threaded programs.
Moreover, IMHO perl threads just do not work as most people expect.
Just avoid using them and use processes instead.
P.S.
The following line doesn't make sense:
$SIG{'KILL'} = sub { alarm(0); threads->exit(); };
SIGKILL (as well as SIGSTOP) cannot be caught.
I'm not really specialist on Perl-MT, but one thing you apparently is missing is that signals are global to the whole process - they are not thread specific. On POSIX systems you can't set a signal handler for a thread: signals are delivered to the whole process. IOW alarm() call affects the whole process, not only the thread which calls it. And even local %SIG in MT context doesn't do what one might think it does - because local is a thing of syntax.

Resources