I want to use threads in Perl to increase the speed of my program ... for example i want to use 20 threads in this code:
use IO::Socket;
my $in_file2 = 'rang.txt';
open DAT,$in_file2;
my #ip=<DAT>;
close DAT;
chomp(#ip);
foreach my $ip(#ip)
{
$host = IO::Socket::INET->new(
PeerAddr => $ip,
PeerPort => 80,
proto => 'tcp',
Timeout=> 1
)
and open(OUT, ">>port.txt");
print OUT $ip."\n";
close(OUT);
}
In the above code we give a list of ips and scan a given port. I want use threads in this code. Is there any other way to increase the speed of my code?
Thanks.
Instead of using threads, you might want to look into AnyEvent::Socket, or Coro::Socket, or POE, or Parallel::ForkManager.
Read the Perl threading tutorial.
Perl can do both threading and forking. "threads" is officially not recommended - in no small part because it's not well understood, and - perhaps slightly counterintutively - isn't lightweight like threads are in some programming languages.
If you are particularly keen to thread, the 'worker' model of threading works much better than spawning a thread per task. You might do the latter in some languages - in perl it's very inefficient.
As such you might do something like this:
#!/usr/bin/env perl
use strict;
use warnings;
use threads;
use Thread::Queue;
use IO::Socket;
my $nthreads = 20;
my $in_file2 = 'rang.txt';
my $work_q = Thread::Queue->new;
my $result_q = Thread::Queue->new;
sub ip_checker {
while ( my $ip = $work_q->dequeue ) {
chomp($ip);
$host = IO::Socket::INET->new(
PeerAddr => $ip,
PeerPort => 80,
proto => 'tcp',
Timeout => 1
);
if ( defined $host ) {
$result_q->enqueue($ip);
}
}
}
sub file_writer {
open( my $output_fh, ">>", "port.txt" ) or die $!;
while ( my $ip = $result_q->dequeue ) {
print {$output_fh} "$ip\n";
}
close($output_fh);
}
for ( 1 .. $nthreads ) {
push( #workers, threads->create( \&ip_checker ) );
}
my $writer = threads->create( \&file_writer );
open( my $dat, "<", $in_file2 ) or die $!;
$work_q->enqueue(<$dat>);
close($dat);
$work_q->end;
foreach my $thr (#workers) {
$thr->join();
}
$result_q->end;
$writer->join();
This uses a queue to feed a set of (20) worker threads with an IP list, and work their way through them, collating and printing results through the writer thread.
But as threads aren't really recommended any more, a better way might be to use Parallel::ForkManager which with your code might go a bit like this:
#!/usr/bin/env perl
use strict;
use warnings;
use Fcntl qw ( :flock );
use IO::Socket;
my $in_file2 = 'rang.txt';
open( my $input, "<", $in_file2 ) or die $!;
open( my $output, ">", "port.txt" ) or die $!;
my $manager = Parallel::ForkManager->new(20);
foreach my $ip (<$input>) {
$manager->start and next;
chomp($ip);
my $host = IO::Socket::INET->new(
PeerAddr => $ip,
PeerPort => 80,
proto => 'tcp',
Timeout => 1
);
if ( defined $host ) {
flock( $output, LOCK_EX ); #exclusive or write lock
print {$output} $ip, "\n";
flock( $output, LOCK_UN ); #unlock
}
$manager->finish;
}
$manager->wait_all_children;
close($output);
close($input);
You need to be particularly careful of file IO when multiprocessing, because the whole point is your execution sequence is no longer well defined. So it's insanely easy to end up with different threads clobbering files that another thread has open, but hasn't flushed to disk.
I note your code - you seem to rely on failing a file open, in order to not print to it. That's not a nice thing to do, especially when your file handle is not lexically scoped.
But in both multiprocessing paradigms I outlined above (there are others, these are the most common) you still have to deal with the file IO serialisation. Note that your 'results' will be in a random order in both, because it'll very much depend on when the task completes. If that's important to you, then you'll need to collate and sort after your threads or forks complete.
It's probably generally better to look towards forking - as said above, in threads docs:
The "interpreter-based threads" provided by Perl are not the fast, lightweight system for multitasking that one might expect or hope for. Threads are implemented in a way that make them easy to misuse. Few people know how to use them correctly or will be able to provide help.
The use of interpreter-based threads in perl is officially discouraged.
Related
I am trying to use a perl threads and sockets, to create a simple client/server application. The problem comes in client side, when trying to mix STDIN with socket reading in different threads. Note, I succeeded with using Tkx as workaround to STDIN. But my intention is to build short samples for teaching purposes only, in several programming languages. I intend to create the program minimal, with no UI, as simple as possible.
Here is the code of client, which has the problems:
use strict;
use IO::Socket::INET;
use threads;
our $sock = new IO::Socket::INET ( PeerAddr => 'localhost', PeerPort => 1234, Proto => 'tcp') or die "cannot connect to localhost";
my $thr = threads->create(\&msg_proc, $sock); #, $sock);
for (my $msg; $msg ne "exit";)
{
$msg = <STDIN>;
$msg =~ s/[\r\n\s\0]*\Z//g;
$sock->send ("$msg\n");# or die "can't send";
};
$sock->close();
print "exit main thread\n";
$thr->join() if $thr->is_running();
print "exit all\n";
sub msg_proc
{
my $svr = shift;
my $i = 0;
while ($svr)
{
sleep(1);
$svr->send ("{slept $i}\n") or die "server closed connection";
my $svr_msg = <$svr> or die "server closed connection";
$svr_msg =~ s/[\r\n\s\0]*\Z//g;
print "From server: <<$i $svr_msg>>\n";
$i++;
}
print "sock exit\n";
}
The problem starts when I remove the line $svr->send in the thread procedure msg_proc. In this case the client fails to read normally from STDIN, looks like interfering with socket reading operation <$svr>. Looks like it interferes with <STDIN> operation. For some reason these two can't coexist in parallel. Note, the C++ and Java versions of these demos do not have this problem.
This is the code of the server:
use strict;
use IO::Socket::INET;
use threads;
my $sock = IO::Socket::INET -> new (LocalPort => 1234, Proto => 'tcp', Listen => 1, Reuse => 1) or die "Could not create socket: $!\n";
print "server started\n";
my $i = 0;
while (my $s = $sock->accept())
{
print "New connection\n";
my $thr = threads->create(\&client_proc, $s, $i);
$s->send("welcome $i\n");
$i++;
}
sub client_proc
{
my $client = shift;
my $client_no = shift;
print "## client $client_no started\n";
$client->send("hello $client_no\n");
for (my $msg; $msg ne "exit";)
{
$msg = <$client> or die "client $client_no closed connection";
$msg =~ s/[\r\n\s\0]*\Z//;
print "From client $client_no: '$msg' len: ", length ($msg), "\n";
$client->send( "client $client_no: $msg\n" ) or die "client $client_no closed connection";
};
$client->close();
print "## client $i exit\n";
}
As I understand from here, the use of interpreter-based threads in perl is officially discouraged. But I can't understand, what interpreter-based threads actually means, and what is exactly the alternative provided. Does this mean using threads is discouraged in perl at all?
Note: I am using Strawberry Perl 5.32.1 64 bit under Windows 11, zip package, no msi. But under ActiveState Perl the problem was identically the same.
Here's the guts of the program using Parallel::ForkManager. It seems to stop at 200 proccesses, sometimes its around 30, depending on the size of the pgsql query that collects URLs to send to Mojo::UserAgent. There seems to be some hard limits somewhere? Is there a better way to write this so that I don't run into those limits? The machine its running on has 16 CPUs and 128GB of memory, so it can certainly run more than 200 proccesses that will die after the Mojo::UserAgent timeout, which is generally 2 seconds.
use Parallel::ForkManager;
use Mojo::Base-strict;
use Mojo::UserAgent;
use Mojo::Pg;
use Math::Random::Secure qw(rand irand);
use POSIX qw(strftime);
use Socket;
use GeoIP2::Database::Reader;
use File::Spec::Functions qw(:ALL);
use File::Basename qw(dirname);
use feature 'say';
$max_kids = 500;
sub do_auth {
...
push( #url, $authurl );
}
do_auth();
my $pm = Parallel::ForkManager->new($max_kids);
LINKS:
foreach my $linkarray (#url) {
$pm->start and next LINKS; # do the fork
my $ua = Mojo::UserAgent->new( max_redirects => 5, timeout => $timeout );
$ua->get($url);
$pm->finish;
}
$pm->wait_all_children;
For your example code (fetching a URL) I would never use Forkmanager. I would use Mojo::IOLoop::Delay or non-blocking calling style.
use Mojo::UserAgent;
use feature 'say';
my $ua = Mojo::UserAgent->new;
$ua->inactivity_timeout(15);
$ua->connect_timeout(15);
$ua->request_timeout(15);
$ua->max_connections(0);
my #url = ("http://stackoverflow.com/questions/41253272/joining-a-view-and-a-table-in-mysql",
"http://stackoverflow.com/questions/41252594/develop-my-own-website-builder",
"http://stackoverflow.com/questions/41251919/chef-mysql-server-configuration",
"http://stackoverflow.com/questions/41251689/sql-trigger-update-error",
"http://stackoverflow.com/questions/41251369/entity-framework-how-to-add-complex-objects-to-db",
"http://stackoverflow.com/questions/41250730/multi-dimensional-array-from-matching-mysql-columns",
"http://stackoverflow.com/questions/41250528/search-against-property-in-json-object-using-mysql-5-6",
"http://stackoverflow.com/questions/41249593/laravel-time-difference",
"http://stackoverflow.com/questions/41249364/variable-not-work-in-where-clause-php-joomla");
foreach my $linkarray (#url) {
# Run all requests at the same time
$ua->get($linkarray => sub {
my ($ua, $tx) = #_;
say $tx->res->dom->at('title')->text;
});
}
Mojo::IOLoop->start unless Mojo::IOLoop->is_running;
Most likely you are running into an operating system limit on threads or processes. The quick and dirty way to fix this would be to increase the limit, which is usually configurable. That said, rewriting the code not to use so many short lived threads is a more scalable solution.
sub worker {
# Curse stuff
initscr();
cbreak();
noecho();
my $fh = FileHandle->new;
open $fh, q{-|},
"$myexe #pre_args make #after_args 2>&1"
or croak 'Cannot open';
process_output($fh);
my $err = close $fh;
endwin();
return;
}
sub process_output {
my ($fh) = #_;
while (my $line = <$fh>) {
#do stuff
}
}
ReadMode 3;
threads->create(\&worker);
while (threads->list(threads::running)) {
my $char = ReadKey -1, *STDIN;
if ($char) {
if ($char eq 'q') {
endwin();
kill('INT', $$);
threads->exit();
}
}
}
ReadMode 0;
foreach my $thr (threads->list) {
$thr->join();
When I press 'q':
Perl exited with active threads:
1 running and unjoined
0 finished and unjoined
0 running and detached
and then I did ps -fu myuserid
I saw that $myexe was still running
Q1) How can i force child process to exit? threads->exit() didnt seem to work
The most obvious problem with the sample program is that it is using multiple threads for the curses library. That won't work. (curses is not thread-safe). If you have to do this, keep all of the curses work in the same thread.
You call exit but don't detach or join the threads.
Stick:
foreach my $thr ( threads -> list() ) {
$thr -> join;
}
at the end, and your 'main' code will wait for your threads to (all) exit.
However - threads -> exit() is for exiting the current thread. See:
http://perldoc.perl.org/threads.html#EXITING-A-THREAD
In order to terminate another thread, you need something like threads -> kill. Either send a 'proper' kill signal, or use a signal handler for SIGUSR1 or similar.
I'd probably approach it a little differently though - define a shared variable:
my $done : shared;
And then test it within the while loop, so you've a normal execution flow rather than a mid flight kill.
Your kill ( INT, $$ ) is going to be killing your main process, and just drop the threads on the floor. That's not good style really.
So - to backtrack a bit - the problem you're having - I think - is because 'signals' in perl aren't what you're expecting them to be. Perl uses safe signals, which makes a blocking call (such as a read from a filehandle) block the signal.
http://perldoc.perl.org/perlipc.html#Deferred-Signals-%28Safe-Signals%29
So I wouldn't normally suggest using signals within threads are a good idea - they're a little erratic, which isn't good for program predictability.
You can 'throw' and 'catch' signals, by defining a signal handler within the thread:
$SIG{'USR1'} = sub { print "Caught USR1"; die };
And then 'call' it using:
$worker -> kill ( 'USR1' );
But in certain circumstances, that won't 'bail out' immediately in the way you expect.
For example - this will work:
#!/usr/bin/perl
use strict;
use warnings;
use threads;
sub worker {
print Dumper \%SIG;
my $tail_pid = open ( my $tail_proc, "-|", "tail -f /var/log/messages" );
$SIG{'USR1'} = sub { print "Caught USR1\nKilling $tail_pid"; kill ( 'TERM', $tail_pid ); die; threads -> exit() };
print "Pre-loop\n";
while ( 1 ) {
print "Thread processing\n";
sleep 1;
}
print "Done";
return;
}
my $worker = threads -> create ( \&worker );
sleep 2;
print "Sending kill\n";
$worker -> kill ( 'SIGUSR1' );
sleep 2;
print "waiting for join\n";
$worker -> join();
But if your while loop is reading from the file handle - it's a blocking call, so the 'kill' will be held until the block lifts.
E.g.
while ( <$tail_proc> ) {
Will go into a block pending IO, and your thread won't 'get' the signal until IO occurs, and the thread continues processing. That might be sufficient for your needs though. Otherwise you're potentially looking at select or IO::Select to test if the handle is readable.
So what you may want to do instead is just kill the process that's 'feeding' your while loop - because by doing so, it'll unblock and the while will become undef and exit.
E.g.:
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;
my $kill_pid : shared;
sub worker {
$kill_pid = open ( my $tail_proc, "-|", "tail -f /var/log/messages" );
print "Pre-loop\n";
while ( <$tail_proc> ) {
print "Thread processing\n";
print;
sleep 1;
}
print "Done";
return;
}
my $worker = threads -> create ( \&worker );
sleep 2;
print "Sending kill\n";
if ( defined $kill_pid ) { print "killing tail, $kill_pid\n"; kill ( 'TERM', $kill_pid ); };
sleep 2;
print "waiting for join\n";
$worker -> join();
I am trying to create a socket object that can be shared among threads. The following code does not work because socket object is a GLOB. How could the socket object be shared? Can it be done this way?
my $socket1 = IO::Socket::INET->new(
Proto => "tcp",
PeerAddr => "localhost",
PeerPort => "888",
) or die "couldn't connect: $!";
my $socket_shared =shared_clone($socket1);
....
my $thr1 = threads->create(\&Thread_1);
$thr1->join();
sub Thread_1 {
lock($socket_cpy);
my $data = "Msg.\n";
$socket1->send($data);
$socket1->recv($data,1024);
}
ERROR: Unsupported ref type: GLOB at line (7 here, where the shared_clone is called).
Can I suggest instead that you don't try and share a socket between threads? That feels like something that's going to be asking for concurrency issues between them.
Whilst there are (probably) ways of doing that, I'd suggest instead - have a single thread 'responsible' for the IO, and use something like Thread::Queue to interact with it.
E.g. something like:
use strict;
use warnings;
use threads;
use Thread::Queue;
my $output_q = Thread::Queue->new();
my $nthreads = 1;
sub socket_thread {
my $socket1 = IO::Socket::INET->new(
Proto => "tcp",
PeerAddr => "localhost",
PeerPort => "888",
) or die "couldn't connect: $!";
while ( my $data = $output_q->dequeue() ) {
$socket1->send($data);
$socket1->recv( $data, 1024 );
}
}
sub worker_thread {
$output_q->enqueue( "Some text from thread: ",
threads->self->tid(), "\n" );
}
for ( 1 .. $nthreads ) {
threads->create( \&worker_thread );
}
foreach my $thr ( threads->list() ) {
$thr->join();
}
That way you don't need to pass the socket around at all, you can use one or more queues to serialise your IO. This - to my mind - is one of the stronger reasons to thread in perl in the first place - you've better IPC mechanisms to work with.
Otherwise I'd be tempted to suggest using forks, which (on Unix) are generally more efficient.
I was wanting to implement some logging for a threaded script I have, and I came across File::Tee. However, when attempting to ppm the module on a Windows box, it's not found (and according to activestate, not supported on Windows).
I really liked that you could lock file access though, by doing something like:
tee STDOUT, {mode => '>>', open => '$ENV{DOM}\\threaded_build.log', lock => 1};
tee STDERR, {mode => '>>', open => '$ENV{DOM}\\threaded_debug.log', lock => 1};
Is there a cross-platform, thread-safe alternative?
File::Tee takes extra care to handle output generated by external programs run through system or XS code that doesn't go through perlio. I think that's what makes it incompatible with Windows.
IO::Tee is more cross-platform and I don't think making it thread safe would be too hard to do. The sync code in File::Tee just looks like:
flock($teefh, LOCK_EX) if $target->{lock};
print $teefh $cp;
flock($teefh, LOCK_UN) if $target->{lock};
You could accomplish the same thing in IO::Tee by modifying a couple of methods:
use Fcntl ':flock';
no warnings 'redefine';
sub IO::Tee::PRINT
{
my $self = shift;
my $ret = 1;
foreach my $fh (#$self) {
flock($fh, LOCK_EX);
undef $ret unless print $fh #_;
flock($fh, LOCK_UN);
}
return $ret;
}
sub IO::Tee::PRINTF
{
my $self = shift;
my $fmt = shift;
my $ret = 1;
foreach my $fh (#$self) {
flock($fh, LOCK_EX);
undef $ret unless printf $fh $fmt, #_;
flock($fh, LOCK_UN);
}
return $ret;
}