Perl - Using LWP in a multithreaded program

Here's my code:
use LWP;
use threads;
use strict;
use warnings;
my @thrds;
for (1..100)
{
    push @thrds, threads->create( 'doit' );
}
for (@thrds)
{
    $_->join();
}

sub doit
{
    LWP::UserAgent->new->get("http://dx.doi.org/10.1002/aoc.1067");
}
I'm using Windows 7 x64 and ActivePerl 5.20.2 x64; I also tried Strawberry Perl.
I get a bunch of errors:
Thread ... terminated abnormally: Can't locate object method "_uric_escape" via package "URI" at .../URI.pm line 81
String found where operator expected at (eval 10) line 8, near "croak 'usage: $io->getlines()'"
(Do you need to predeclare croak?)
If I add
sleep 1;
before
push @thrds, threads->create( 'doit' );
then everything is fine.
What's the problem?

I'm not sure why, but there seem to be problems with dynamically loaded modules. Explicitly loading them before thread creation solves the problem. In other words, add the following:
use Carp qw( );
use URI qw( );
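For reference, a minimal version of the script from the question with those preloads in place would look something like this (the exact set of modules worth preloading may differ between LWP versions):

use strict;
use warnings;
use LWP;
use threads;

# Preload modules that LWP would otherwise load lazily at request time,
# so every thread starts with fully initialised copies of them
use Carp qw( );
use URI  qw( );

my @thrds;
push @thrds, threads->create('doit') for 1 .. 100;
$_->join() for @thrds;

sub doit {
    LWP::UserAgent->new->get("http://dx.doi.org/10.1002/aoc.1067");
}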

I think the problem here will be memory footprint. You are - after all - loading the LWP library, and then cloning your process 100 times.
Contrary to popular belief, threads in perl are not even remotely lightweight. They are not well suited to this model of usage - each thread is a 'full copy' of your process, and that's just ... not a great plan.
My copy of Perl - ActivePerl 5.20.2 - doesn't exhibit the same problem (I don't think - I didn't actually want to spam that website you list).
I would suggest instead that you rewrite your threading to use Thread::Queue and a lower degree of parallelism:
use strict;
use warnings;
use LWP;
use threads;
use Thread::Queue;

my $workers = 10;
my $work_q  = Thread::Queue->new();
my $url     = "http://localhost:80";

sub worker_thread {
    while ( my $url = $work_q->dequeue ) {
        LWP::UserAgent->new->get($url);
    }
}

threads->create( \&worker_thread ) for 1 .. $workers;

for ( 1 .. 100 ) {
    $work_q->enqueue($url);
}
$work_q->end;

foreach my $thread ( threads->list ) {
    $thread->join();
}
Alternatively, a fork-based approach may work better:
use strict;
use warnings;
use Parallel::ForkManager;
use LWP;

my $manager = Parallel::ForkManager->new(10);
for ( 1 .. 100 ) {
    $manager->start and next;
    LWP::UserAgent->new->get("http://localhost:80");
    $manager->finish;
}
$manager->wait_all_children;
Edit:
A bit more testing on your sample: I do get similar runtime errors on RHEL running Perl 5.20.2, though only sporadically. They vary somewhat, which really does suggest some kind of race condition is going on here - odd, because threads are supposed to be standalone. I cannot see an obvious source, though it might be as simple as too many open file descriptors or too much memory consumed; this is quite a hefty memory burden. In particular, my kernel kills the process because of memory exhaustion, thanks to a footprint of a few gigabytes, which is a pretty good reason not to use this approach.

This is very straightforward using the Mojolicious framework. The Mojo::UserAgent class is designed to work asynchronously with the help of the Mojo::IOLoop module, and although synchronous transactions are available, they are implemented as special cases of the standard asynchronous calls.
use strict;
use warnings;
use Mojo::UserAgent;
use Mojo::IOLoop;

my $ua = Mojo::UserAgent->new( max_redirects => 5 );

my $n;
for ( 1 .. 100 ) {
    $ua->get('http://dx.doi.org/10.1002/aoc.1067' => \&completed);
    ++$n;
}

Mojo::IOLoop->start;

sub completed {
    my ($ua, $tx) = @_;
    Mojo::IOLoop->stop unless --$n > 0;
}
A quick benchmark gave the following results
100 synchronous GET requests took 178 seconds
100 asynchronous GET requests took 10 seconds

I know this is an old topic, but just yesterday I ran into this problem. The info below may be useful to someone.
I was getting several kinds of errors from LWP in similar code (parallel http requests), including
"String found where operator expected at ... near "croak ..."
"Deep recursion on subroutine "IO::Socket::new" ... Out of memory!"
Sometimes requests just failed (returned status 500 or 501).
The problem seems to be gone after I added the following line:
use IO::File;
Don't ask me why, I have no idea :) Figured this out accidentally.
I am using Strawberry Perl 5.24.0.
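Combined with the earlier answer's suggestion, the preload block at the top of such a script would then read something like:

# Load everything LWP might otherwise pull in lazily, before any threads exist
use Carp qw( );
use URI  qw( );
use IO::File;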

Related

Limit Execution Time in Node

I am working on a node-based MUD game and I would like to limit the amount of time any one command can execute before it gets killed (e.g. 1000ms). I found a module called Tripwire which seems promising but it does not appear to be actively maintained. Tripwire does work as advertised. It manages to force an exception if someone creates an endless loop, but it does not support any resumption of the original script thread.
I am looking for either:
(1) A similar but actively maintained Node module that can interrupt and resume the original event thread, or,
(2) A working example of V8's Isolate::IsExecutionTerminating + Isolate::CancelTerminateExecution (I forked Tripwire but I haven't done any meaningful C++ in a long time and am now just beating my head against the wall).
I have only been able to find test cases so far (which is at least something). I am really hoping that someone has already tackled this, though.
Test cases:
https://chromium.googlesource.com/v8/v8/+/ad55afcb459dafda1cf48e676985717fd7eae786/test/cctest/test-thread-termination.cc
I know this is a bit vague.
I ended up instrumenting the script by passing it through acorn and generating my own final script. I am hoping that the sandbox is locked down to prevent users from escaping it. Example of "compiled" output:
createPermissions(expr) {
let __mec = __bfc(this || GameMaster, 'public', 'createPermissions', __FILE__, false); try { let parts = expr.split('/');
for (let i = 0; i < parts.length; i++) {
__ala(); let foo = parts.slice(0, i).join('/');
} } finally { __efc(__mec, 'createPermissions'); }
}
This new "language" supports public, protected, package, and private variables/methods (by maintaining its own internal call stack, execution context, etc). The directives are "reserved words" (e.g. __bfc=begin function call, __ala=assert loop alarm).
Thanks @jfriend00 for the suggestion.
For those who are curious: Transpiler Module

node.js multithreading with max child count

I need to write a script that takes an array of values and, in a multithreaded way (forks?), runs another script with a value from the array as a parameter, with a cap on the number of running forks so that it waits for a script to finish if more than n are already running. How do I do that?
There is a module named child_process, but I'm not sure how to get this done, as it always waits for the child to terminate.
Basically, in PHP it would be something like this (written from memory; it may contain some syntax errors):
<?php
declare(ticks = 1);

$data = file('data.txt');
$max = 20;
$child = 0;

function sig_handler($signo) {
    global $child;
    switch ($signo) {
        case SIGCHLD:
            $child -= 1;
    }
}

pcntl_signal(SIGCHLD, "sig_handler");

foreach ($data as $dataline) {
    $dataline = trim($dataline);
    while ($child >= $max) {
        sleep(1);
    }
    $child++;
    $pid = pcntl_fork();
    if ($pid) {
        // SOMETHING WENT WRONG? NEVER HAPPENS!
    } else {
        exec("php processdata.php \"$dataline\"");
        exit;
    } //fork
}

while ($child != 0) {
    sleep(1);
}
?>
After the conversation in the comments, here's how to have Node execute your PHP script.
Since you're calling an external command, there's no need to create a new thread. The Node.js event loop understands that calls to external commands are async operations, and it can execute all of them at the same time.
You can see different ways for executing an external process in this SO question (linked answer may be the best in your case).
However, since you're already moving everything to Node, you may even consider rewriting your "process.php" script to Node.js code. Since, as you explained, that script connects to remote servers and databases and uses nslookup (which you may not really need with Node.js), you won't need any separate thread: they're all async operations that Node.js excels at performing.

How to set up an internet connectivity detector for a Net::IRC bot?

I have an IRC bot written in Perl, using the deprecated, undocumented and unloved Net::IRC library. Still, it runs just fine... unless the connection goes down. It appears that the library ceased to be updated before support for reconnecting was implemented. The obvious solution would be to rewrite the whole bot to make use of the library's successors, but that would unfortunately require rewriting the whole bot.
So I'm interested in workarounds.
The current setup I have is supervisord configured to restart the bot whenever the process exits unexpectedly, and a cron job to kill the process whenever internet connectivity is lost.
This does not work as I would like it to, because the bot seems incapable of detecting that it has lost connectivity due to internet outage. It will happily continue running, doing nothing, pretending to still be connected to the IRC server.
I have the following code as the main program loop:
while (1) {
    $irc->do_one_loop;
    # can add stuff here
}
What I would like it to do is:
a) detect that the internet has gone down,
b) wait until the internet has come back up,
c) exit the script, so that supervisord can resurrect it.
Are there any other, better ways of doing this?
EDIT: The in-script method did not work, for unknown reasons. I'm trying to make a separate script to solve it.
#!/usr/bin/perl
use Net::Ping::External;

while (1) {
    # wait until the connection goes down...
    while (Net::Ping::External::ping(host => "8.8.8.8")) { sleep 5; }
    # ...then wait until it comes back up...
    sleep 5 until Net::Ping::External::ping(host => "8.8.8.8");
    # ...and kill the bot so supervisord can resurrect it
    system("sudo kill `pgrep -f 'perl painbot.pl'`");
}
Assuming that do_one_loop will not hang (may need to add some alarm if it does), you'll need to actively poll something to tell whether or not the network is up. Something like this should work to ping every 5 seconds after a failure until you get a response, then exit.
use Net::Ping::External;

sub connectionCheck {
    return if Net::Ping::External::ping(host => "8.8.8.8");
    sleep 5 until Net::Ping::External::ping(host => "8.8.8.8");
    exit;
}
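Called from the main loop shown in the question, that might look like:

while (1) {
    $irc->do_one_loop;
    connectionCheck();   # returns immediately while the network is up
}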
Edit:
Since do_one_loop does seem to hang, you'll need some way to wrap a timeout around it. The amount of time depends on how long you expect it to run for, and how long you are willing to wait if it becomes unresponsive. A simple way to do this is using alarm (assuming you are not on Windows):
local $SIG{ALRM} = sub { die "Timeout\n" };
eval {
    alarm 30;    # give do_one_loop at most 30 seconds
    $irc->do_one_loop;
    alarm 0;
};
if ( $@ && $@ =~ /^Timeout/ ) {
    # do_one_loop hung; handle it here, e.g. by calling connectionCheck()
}
The Net::IRC main loop has support for timeouts and scheduled events.
Try something like this (I haven't tested it, and it's been 7 years since I last used the module...):
# connect to IRC, add event handlers, etc.

$time_of_last_ping = $time_of_last_pong = time;
$irc->timeout(30);

# Can't handle PONG in Net::IRC (!), so handle the "No origin specified" error
# (this may not work for you; you may rather do this some other way)
$conn->add_handler(409, sub { $time_of_last_pong = time });

while (1) {
    $irc->do_one_loop;

    # check internet connection: send PING to server
    if ( time - $time_of_last_ping > 30 ) {
        $conn->sl("PING");    # Should be "PING anything"
        $time_of_last_ping = time;
    }
    last if time - $time_of_last_pong > 90;
}

Using semaphores while modifying a file in Perl

I'm kind of a newbie to threads in Perl.
I have a file with a list of projects (each project on a separate line), and I want to build those projects in parallel.
Currently, each thread:
opens the file in "read" mode
saves a list of some projects (= some file lines)
closes the file
opens the file again, in "write" mode
rewrites it without the lines that were selected
In order to make sure each thread is the only one to access the file, I'm trying to use a semaphore.
For some reason, thread collisions occur, and I can't figure out what I'm doing wrong.
I can see (in my "REPORT", which also records the current time for each build) that different threads select the same projects from the "shared" file (it happens only once in a while, but still...).
I'm not even sure if my $semaphore declaration is legal as a "my" variable.
Any help would be truly appreciated!!
Thanks.
Here's a part of my code:
my $semaphore = Thread::Semaphore->new();

sub build_from_targets_list {
    #...
    open(REPORT, "+>$REPORT_tmp");    # Open for output
    #....
    @threads = ();
    for ($i = 0; $i < $number_of_cores; $i++) {
        my $thr = threads->new(\&compile_process, $i, *REPORT);
        push @threads, $thr;
    }
    $_->join for @threads;
    close(REPORT);
}

### Some stuff..

sub compile_process {
    *REPORT = shift(@_);
    #...
    while (1) {
        $semaphore->down();

        open(DATA_FILE, $shared_file);
        flock(DATA_FILE, 2);
        while ($record = <DATA_FILE>) {
            chomp($record);
            push(@temp_target_list, $record);
        }
        # ... choose some lines (=projects)...
        # remove the projects that should be built by this thread:
        for ($k = 0; $k < $num_of_targets_in_project; $k++) {
            shift(@temp_target_list);
        }
        close(DATA_FILE);

        open(REWRITE, ">$shared_file");
        flock(REWRITE, 2);
        seek(REWRITE, 0, 0);
        foreach $temp_target (@temp_target_list) {
            print REWRITE "$temp_target\n";
        }
        close(REWRITE);

        ## ... BUILD selected projects...
        $semaphore->up();
    }
}
First, some basic cleanup of how you're dealing with files. No point in trying to debug a thread problem if it's a simple file issue.
One must check that any file commands (open, close, flock, seek, etc...) succeed. Either stick some "or die"s on there or use autodie.
Second is the use of a hard-coded constant for flock. Those are system dependent, and it's hard to remember which mode 2 is. Fcntl provides the constants.
You're opening the data file for reading with an exclusive lock (2 is usually the exclusive lock). That should probably be a shared lock. This is unlikely to cause a problem, but it will cause your threads to block unnecessarily.
Finally, use lexical filehandles instead of a globally scoped glob. This reduces the chance of different parts of the program trampling on each other's filehandles.
use Fcntl qw(:flock);
use autodie;
open (my $data_fh, $shared_file);
flock($data_fh, LOCK_SH);
As a side note, the seek $fh, 0, 0 after opening a file for writing is unnecessary. The same goes for the seek constants as for flock: use Fcntl to get the constants.
An additional bug is that you're passing in $i, *REPORT but compile_process thinks *REPORT is the first argument. And again, the use of global filehandles means that passing it in is redundant; use lexical filehandles.
Now that's out of the way, your basic algorithm seems flawed. compile_process has each thread reading the whole data file into the thread-local array @temp_target_list, shifting some entries off that local array and writing the rest out. Because @temp_target_list is per thread, there's no coordination. Unless $num_of_targets_in_project is shared and doing some sort of off-screen coordination, but that's not shown.
File based locking is always going to be a little slice of hell. Threads have much better mechanisms for coordination. There's a much easier way to do this.
Assuming the file isn't too large, read each line into a shared array. Then have each thread take items to work on from that array. The array is shared, so as each element is removed the array will update for all the threads. Something like...
use strict;
use warnings;
use autodie;
use threads;
use threads::shared;

my $Max_Threads = 5;
my @Todo : shared;

open my $fh, "<", $work_file;
@Todo = <$fh>;
close $fh;

my @threads;
for (1..$Max_Threads) {
    push @threads, threads->new(\&compile_process);
}
$_->join for @threads;

sub compile_process {
    while( my $work = shift @Todo ) {
        ...do whatever with $work...
    }
}
If the file is too large to be held in memory, you can use Thread::Queue to build a queue of work items and add to it dynamically.
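A rough sketch of that variant, with the same worker count as above and a hypothetical projects.txt as the work list, might look like:

use strict;
use warnings;
use autodie;
use threads;
use Thread::Queue;

my $Max_Threads = 5;
my $work_file   = 'projects.txt';    # hypothetical path to the project list
my $queue       = Thread::Queue->new;

my @threads = map { threads->create(\&compile_process) } 1 .. $Max_Threads;

# Feed the queue a line at a time instead of slurping the whole file
open my $fh, "<", $work_file;
while ( my $line = <$fh> ) {
    chomp $line;
    $queue->enqueue($line);
}
close $fh;
$queue->end;    # lets dequeue return undef once the queue drains

$_->join for @threads;

sub compile_process {
    while ( defined( my $work = $queue->dequeue ) ) {
        # ...do whatever with $work...
    }
}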

How can I modify my Perl script to use multiple processors?

Hi, I have a simple script that takes a file and runs another Perl script on it. The script does this to every picture file in the current folder. This is running on a machine with 2 quad-core Xeon processors, 16 GB of RAM, running Red Hat Linux.
The first script, work.pl, basically calls magicplate.pl, passing some parameters and the name of the file for magicplate.pl to process. Magic Plate takes about a minute to process each image. Because work.pl performs the same function over 100 times, and because the system has multiple processors and cores, I was thinking about splitting the task up so that it could run multiple times in parallel. I could split the images up into different folders if necessary. Any help would be great. Thank you
Here is what I have so far:
use strict;
use warnings;

my @initialImages = <*>;

foreach my $file (@initialImages) {
    if ($file =~ /.png/) {
        print "processing $file...\n";
        my @tmp = split(/\./, $file);
        my $name = "";
        for (my $i = 0; $i < (@tmp - 1); $i++) {
            if ($name eq "") { $name = $tmp[$i]; } else { $name = $name . "." . $tmp[$i]; }
        }
        my $exten = $tmp[(@tmp - 1)];
        my $orig = $name . "." . $exten;
        system("perl magicPlate.pl -i " . $orig . " -min 4 -max 160 -d 1");
    }
}
You should consider NOT creating a new process for each file that you want to process -- It's horribly inefficient, and probably what is taking most of your time here. Just loading up Perl and whatever modules you use over and over ought to be creating some overhead. I recall a poster on PerlMonks that did something similar, and ended up transforming his second script into a module, reducing the worktime from an hour to a couple of minutes. Not that you should expect such a dramatic improvement, but one can dream..
With the second script refactored as a module, here's an example of thread usage, in which BrowserUK creates a thread pool, feeding it jobs through a queue.
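As a rough sketch of that approach, assuming magicPlate.pl were refactored into a module exposing a process() function (the MagicPlate name and its interface are hypothetical here), a small thread pool fed through Thread::Queue could look like:

use strict;
use warnings;
use threads;
use Thread::Queue;
use MagicPlate;    # hypothetical module factored out of magicPlate.pl

my $workers = 8;    # roughly one worker per core
my $queue   = Thread::Queue->new;

# Worker pool: each thread pulls file names until the queue is ended
my @pool = map {
    threads->create(sub {
        while ( defined( my $file = $queue->dequeue ) ) {
            # hypothetical call; adapt to however the module is factored
            MagicPlate::process($file, min => 4, max => 160, d => 1);
        }
    });
} 1 .. $workers;

$queue->enqueue($_) for grep { /\.png$/i } glob '*';
$queue->end;
$_->join for @pool;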
You could use Parallel::ForkManager (set $MAX_PROCESSES to the number of files processed at the same time):
use strict;
use warnings;
use Parallel::ForkManager;

my $MAX_PROCESSES = 4;    # number of files to process at the same time
my $pm = Parallel::ForkManager->new($MAX_PROCESSES);

my @initialImages = <*>;

foreach my $file (@initialImages) {
    if ($file =~ /.png/) {
        print "processing $file...\n";
        my @tmp = split(/\./, $file);
        my $name = "";
        for (my $i = 0; $i < (@tmp - 1); $i++) {
            if ($name eq "") { $name = $tmp[$i]; } else { $name = $name . "." . $tmp[$i]; }
        }
        my $exten = $tmp[(@tmp - 1)];
        my $orig = $name . "." . $exten;

        my $pid = $pm->start and next;
        system("perl magicPlate.pl -i " . $orig . " -min 4 -max 160 -d 1");
        $pm->finish;    # Terminates the child process
    }
}
$pm->wait_all_children;
But as suggested by Hugmeir, running the Perl interpreter again and again for each new file is not a good idea.
Import "maigcplate" and use threading.
Start magicplate.pl in the background (you would need to add process throttling)
Import "magicplate" and use fork (add process throttling and a kiddy reaper)
Make "maigcplate" a daemon with a pool of workers = # of CPUs
use an MQ implementation for communication
use sockets for communication
Use webserver(nginx, apache, ...) and wrap in REST for a webservice
etc...
All these center around creating multiple workers that can each run on their own cpu. Certain implementations will use resources better (those that don't start a new process) and be easier to implement and maintain.
