Shared socket object among Perl threads - multithreading

I am trying to create a socket object that can be shared among threads. The following code does not work because the socket object is a GLOB. How can the socket object be shared? Can it be done this way?
my $socket1 = IO::Socket::INET->new(
    Proto    => "tcp",
    PeerAddr => "localhost",
    PeerPort => "888",
) or die "couldn't connect: $!";
my $socket_shared = shared_clone($socket1);
....
my $thr1 = threads->create(\&Thread_1);
$thr1->join();

sub Thread_1 {
    lock($socket_shared);
    my $data = "Msg.\n";
    $socket1->send($data);
    $socket1->recv($data, 1024);
}
ERROR: Unsupported ref type: GLOB (reported at the line where shared_clone is called).

Can I suggest instead that you don't try to share a socket between threads? That feels like it's asking for concurrency issues between them.
While there are (probably) ways of doing that, I'd suggest instead having a single thread 'responsible' for the IO, and using something like Thread::Queue to interact with it.
E.g. something like:
use strict;
use warnings;
use threads;
use Thread::Queue;
use IO::Socket::INET;

my $output_q = Thread::Queue->new();
my $nthreads = 1;

sub socket_thread {
    my $socket1 = IO::Socket::INET->new(
        Proto    => "tcp",
        PeerAddr => "localhost",
        PeerPort => "888",
    ) or die "couldn't connect: $!";
    while ( my $data = $output_q->dequeue() ) {
        $socket1->send($data);
        $socket1->recv( $data, 1024 );
    }
}

sub worker_thread {
    $output_q->enqueue(
        "Some text from thread: " . threads->self->tid() . "\n" );
}

my $io_thread = threads->create( \&socket_thread );
for ( 1 .. $nthreads ) {
    threads->create( \&worker_thread );
}
foreach my $thr ( threads->list() ) {
    $thr->join() unless $thr == $io_thread;
}
$output_q->end();
$io_thread->join();
That way you don't need to pass the socket around at all; you can use one or more queues to serialise your IO. To my mind this is one of the stronger reasons to thread in Perl in the first place: you have better IPC mechanisms to work with.
Otherwise I'd be tempted to suggest using forks, which (on Unix) are generally more efficient.

Related

Perl threads, sockets and STDIN interference

I am trying to use Perl threads and sockets to create a simple client/server application. The problem is on the client side, when trying to mix STDIN with socket reading in different threads. Note that I succeeded using Tkx as a workaround for STDIN, but my intention is to build short samples for teaching purposes only, in several programming languages. I intend to keep the program minimal, with no UI, as simple as possible.
Here is the code of client, which has the problems:
use strict;
use IO::Socket::INET;
use threads;

our $sock = new IO::Socket::INET (
    PeerAddr => 'localhost',
    PeerPort => 1234,
    Proto    => 'tcp'
) or die "cannot connect to localhost";

my $thr = threads->create(\&msg_proc, $sock);
for (my $msg; $msg ne "exit";)
{
    $msg = <STDIN>;
    $msg =~ s/[\r\n\s\0]*\Z//g;
    $sock->send("$msg\n"); # or die "can't send";
};
$sock->close();
print "exit main thread\n";
$thr->join() if $thr->is_running();
print "exit all\n";

sub msg_proc
{
    my $svr = shift;
    my $i = 0;
    while ($svr)
    {
        sleep(1);
        $svr->send("{slept $i}\n") or die "server closed connection";
        my $svr_msg = <$svr> or die "server closed connection";
        $svr_msg =~ s/[\r\n\s\0]*\Z//g;
        print "From server: <<$i $svr_msg>>\n";
        $i++;
    }
    print "sock exit\n";
}
The problem starts when I remove the line $svr->send in the thread procedure msg_proc. In that case the client fails to read normally from STDIN; the socket read <$svr> appears to interfere with the <STDIN> operation. For some reason the two can't coexist in parallel. Note that the C++ and Java versions of these demos do not have this problem.
This is the code of the server:
use strict;
use IO::Socket::INET;
use threads;

my $sock = IO::Socket::INET->new(
    LocalPort => 1234,
    Proto     => 'tcp',
    Listen    => 1,
    Reuse     => 1
) or die "Could not create socket: $!\n";
print "server started\n";

my $i = 0;
while (my $s = $sock->accept())
{
    print "New connection\n";
    my $thr = threads->create(\&client_proc, $s, $i);
    $s->send("welcome $i\n");
    $i++;
}

sub client_proc
{
    my $client = shift;
    my $client_no = shift;
    print "## client $client_no started\n";
    $client->send("hello $client_no\n");
    for (my $msg; $msg ne "exit";)
    {
        $msg = <$client> or die "client $client_no closed connection";
        $msg =~ s/[\r\n\s\0]*\Z//;
        print "From client $client_no: '$msg' len: ", length($msg), "\n";
        $client->send("client $client_no: $msg\n") or die "client $client_no closed connection";
    };
    $client->close();
    print "## client $client_no exit\n";
}
As I understand from here, the use of interpreter-based threads in Perl is officially discouraged. But I can't understand what "interpreter-based threads" actually means, and what exactly the provided alternative is. Does this mean using threads is discouraged in Perl at all?
Note: I am using Strawberry Perl 5.32.1 64 bit under Windows 11 (zip package, no MSI), but under ActiveState Perl the problem was identical.

Perl parallel crawler multithreading

I have a multithreaded Perl crawler which works fine if I declare the URLs in an array. However, if I read the URLs from a DB, I get a "segmentation failure" error. Please help me fix this issue. Thanks.
Direct URL declaration
use 5.012;
use warnings;
use threads;
use Thread::Queue;
use LWP::UserAgent;

use constant THREADS => 10;

my $queue = Thread::Queue->new();
my @URLs = qw(
    http://www.example.com
    http://www.example.com1
    http://www.example.com2
);
print @URLs;

my @threads;
for (1..THREADS) {
    push @threads, threads->create(sub {
        my $ua = LWP::UserAgent->new;
        $ua->timeout(5); # short timeout for easy testing.
        while (my $task = $queue->dequeue) {
            my $response = eval { $ua->get($task)->status_line };
            say "$task --> $response";
        }
    });
}
$queue->enqueue(@URLs);
$queue->enqueue(undef) for 1..THREADS;
# ... here work is done
$_->join foreach @threads;
Trying to read the URLs from DB
my $dbh = DBI->connect("DBI:mysql:$database;host=$server", $username, $password) # Get the rows from database
    || die "Could not connect to database: $DBI::errstr";
my $sth = $dbh->prepare('select cname,url,xpath,region from competitors') # query to select required fields
    || die "$DBI::errstr";
$sth->execute();

if ($sth->rows < 0) {
    print "Sorry, no domains found.\n";
}
else {
    while (my $results = $sth->fetchrow_hashref) {
        my $competitor = $results->{cname};
        my $url        = $results->{url};
        my $xpath      = $results->{xpath};
        my $region     = $results->{region};
        push(my @all, $url);

        use constant THREADS => 10;
        my $queue = Thread::Queue->new();
        my @URLs = @all;
        my @threads;
        for (1..THREADS) {
            push @threads, threads->create(sub {
                my $ua = LWP::UserAgent->new;
                $ua->timeout(500); # short timeout for easy testing.
                while (my $task = $queue->dequeue) {
                    my $response = eval { $ua->get($task)->status_line };
                    print "$task --> $response";
                }
            });
        }
        $queue->enqueue(@URLs);
        $queue->enqueue(undef) for 1..THREADS;
        # ... here work is done
        $_->join foreach @threads;
    }
} # close db
$sth->finish;
$dbh->disconnect;
Expected o/p
www.example.com-->200 ok
www.example.com1-->200 ok
Current o/p
Segmentation error
Your $sth and $dbh are still in scope when you create the threads, so each thread gets a copy of them, which is a no-no: newly created threads must make their own connections to the database. Handles can't be shared across threads.
Better scoping of your variables should avoid the problem.
use strict;
use warnings;
use threads;
use DBI;
use LWP::UserAgent;
use Thread::Queue 3.01 qw( );

use constant NUM_WORKERS => 10;

sub worker {
    my ($ua, $url) = @_;
    ...
}

{
    my $q = Thread::Queue->new();

    for (1..NUM_WORKERS) {
        async {
            my $ua = LWP::UserAgent->new();
            while ( my $url = $q->dequeue() ) {
                eval { worker($ua, $url); 1 }
                    or warn $@;
            }
        };
    }

    {
        my $dbh = DBI->connect(..., { RaiseError => 1 });
        my $sth = $dbh->prepare('SELECT ...');
        $sth->execute();
        while ( my $row = $sth->fetchrow_hashref() ) {
            $q->enqueue($row->{url});
        }
    }

    $q->end();
    $_->join for threads->list;
}
You should declare @all outside of the while loop; then, once the URLs are pushed, close that loop and go on:
my $dbh = DBI->connect("DBI:mysql:$database;host=$server", $username, $password) # Get the rows from database
    || die "Could not connect to database: $DBI::errstr";
my $sth = $dbh->prepare('select cname,url,xpath,region from competitors') # query to select required fields
    || die "$DBI::errstr";
$sth->execute();

# >> declare your URL array before starting to fetch
my @URLs;
if ($sth->rows < 0) {
    print "Sorry, no domains found.\n";
}
else {
    while (my $results = $sth->fetchrow_hashref) {
        my $competitor = $results->{cname};
        my $url        = $results->{url};
        my $xpath      = $results->{xpath};
        my $region     = $results->{region};
        push(@URLs, $url);
    }
}
$sth->finish;
$dbh->disconnect;

use constant THREADS => 10;
my $queue = Thread::Queue->new();
my @threads;
for (1..THREADS) {
    push @threads, threads->create(sub {
        my $ua = LWP::UserAgent->new;
        $ua->timeout(500); # short timeout for easy testing.
        while (my $task = $queue->dequeue) {
            my $response = eval { $ua->get($task)->status_line };
            print "$task --> $response";
        }
    });
}
$queue->enqueue(@URLs);
$queue->enqueue(undef) for 1..THREADS;
# ... here work is done
$_->join foreach @threads;
Segfaults are very rare as a result of Perl code. They're memory related, and usually mean a problem in external binaries (I'd bet on DBI here).
Threads in particular have a lot of legacy issues - they're getting better in newer versions of Perl though. I would strongly recommend upgrading to a recent version of Perl if you haven't already. I know that isn't always an option, but it's a good idea.
It's really hard to second guess your problem, as I don't have your DB, so I can't recreate it.
I would suggest that generally there are a few things you can do to keep threads 'clean'. The way your code works, the DB handles are in scope within the threads; I would avoid doing that. Declare the thread sub right at the top, with as narrow a scope as possible.
I will note though:
push ( my @all, $url );
probably doesn't do what you think it does!
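To see why, here is a small self-contained sketch (the data is made up for illustration): declaring the array inside the push() creates a fresh, empty array on every iteration, which is discarded immediately.

```perl
use strict;
use warnings;

# Declaring the array inside push() creates a brand-new @all each time
# through the loop, so every copy holds one element and is then thrown away.
for my $url (qw( a b c )) {
    push( my @all, $url );
    # @all has exactly 1 element here, on every iteration
}

# Declaring the array once, outside the loop, accumulates as intended.
my @all;
for my $url (qw( a b c )) {
    push @all, $url;
}
print scalar(@all), "\n";    # prints 3
```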
But yes, taking your code I'd put it like this:
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;
use DBI;
use LWP::UserAgent;

my $num_threads = 10;
my $work_q = Thread::Queue->new();

sub worker {
    my $ua = LWP::UserAgent->new;
    $ua->timeout(500); # short timeout for easy testing.
    while ( my $task = $work_q->dequeue ) {
        my $response = eval { $ua->get($task)->status_line };
        print "$task --> $response";
    }
}

## fetch_list
sub fetch_url_list {
    my $dbh = DBI->connect( "DBI:mysql:$database;host=$server",
        $username, $password ) # Get the rows from database
        || die "Could not connect to database: $DBI::errstr";
    my $sth =
        $dbh->prepare( 'select cname,url,xpath,region from competitors'
        ) # query to select required fields
        || die "$DBI::errstr";
    $sth->execute();
    if ( $sth->rows < 0 ) {
        print "Sorry, no domains found.\n";
    }
    else {
        while ( my $results = $sth->fetchrow_hashref ) {
            my $competitor = $results->{cname};
            my $url        = $results->{url};
            my $xpath      = $results->{xpath};
            my $region     = $results->{region};
            $work_q->enqueue($url);
        }
    }
    $sth->finish;
    $dbh->disconnect;
}

for ( 1 .. $num_threads ) {
    threads->create( \&worker );
}
fetch_url_list();
$work_q->end;
foreach my $thr ( threads->list() ) {
    $thr->join();
}
This way none of your threads have the DB stuff in scope, and the DB doesn't have thread stuff in scope, which reduces the odds of 'pollution' causing you problems. In particular, threads 'copy' everything in the current scope when they start, which can do really weird things when those variables are objects (such as the DB handles).
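As a small illustration of that copying behaviour (a hypothetical counter, not code from the question): each thread works on its own copy of the variables in scope, so changes made inside a thread are invisible to the parent unless threads::shared is used.

```perl
use strict;
use warnings;
use threads;

# The thread gets its own *copy* of $counter at creation time.
my $counter = 0;
my $t = threads->create( sub { $counter++; return $counter; } );
my $in_thread = $t->join();

# The thread's copy reached 1; the parent's copy is untouched.
print "in thread: $in_thread, in parent: $counter\n";   # in thread: 1, in parent: 0
```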
Failing that, I would consider looking at a 'forking' approach. Threads are good at passing data back and forth, but forks are generally more efficient (definitely on Unix-based systems) when you don't need to pass data back and forth - and you don't; you're just running a test and printing a result.

Memory leak in perl server

I'm trying to write a multithreaded server in Perl (Windows x64). When connecting to it from another computer, I found that memory and handle usage kept going up, even though I maintained only one connection at a time. After thousands of trials it had used up nearly all system memory. I can't figure out the reason.
Here is the server side:
use IO::Socket::INET;
use threads;

sub session_thread
{
    my $client_socket = $_[0];
    my $client_address = $client_socket->peerhost();
    my $client_port = $client_socket->peerport();
    print "connection from $client_address:$client_port\n";
    my $data = "";
    $client_socket->recv($data, 1024);
    print "$client_address:$client_port says: $data";
    $data = "ok";
    $client_socket->send($data);
    shutdown($client_socket, 1);
    $client_socket->close();
    threads->exit();
}

$| = 1;
my $socket = new IO::Socket::INET (
    LocalHost => '0.0.0.0',
    LocalPort => '7777',
    Proto     => 'tcp',
    Listen    => 5,
    ReuseAddr => 1
);
die "cannot create socket $!\n" unless $socket;
print "server waiting for client connection on port 7777\n";

while(1)
{
    my $client_socket = $socket->accept();
    threads->create('session_thread', $client_socket);
}
$socket->close();
Thanks.
You either have to wait for the thread to finish by joining it or tell Perl that you don't care about the thread's return value and that Perl itself should clean up the data once the thread exits. The latter seems to match your use case and is done by detaching.
Also note that using exit is not necessary in your example. You can simply return from the thread's subroutine normally. exit is used for ending a thread from a deeper nesting level within the program.
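A minimal, self-contained sketch of create-and-detach (illustrative only; no sockets involved). In the server above, the equivalent change is to call ->detach() on the thread returned by threads->create('session_thread', $client_socket).

```perl
use strict;
use warnings;
use threads;

# With ->detach() the parent never join()s: Perl discards the thread's
# return value and frees its storage as soon as the sub returns.
for my $client_no ( 1 .. 3 ) {
    threads->create( sub { return "handled $_[0]"; }, $client_no )->detach();
}

# Give the detached threads a moment to finish; exiting the program
# while detached threads are still running produces a warning.
sleep 1;
print "accept loop would continue here\n";
```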

Perl multithreading - quit on first thread success

I'm by no means an expert in Perl or multithreading, but I'm convinced I'm "doing it wrong" and need some guidance on how to alter it so that I don't get thread-exit warnings.
As you can see, this program reads in argument 0, does a lookup to find each of the IP addresses associated with a host name, then tests each IP to see if ssh is running.
The purpose of this script is to spawn a tcp test for each host and return the first successful tcp connect.
Can anyone suggest a way of doing this that is more reliable and doesn't require a sleep?
use strict;
use warnings;

BEGIN {
    use Config;
    $Config{useithreads} or die('Recompile Perl with threads to run this program.');
}
use threads;
use IO::Socket::INET;

$| = 1;

unless ($ARGV[0]) { die("error please use argument") }
my $timeoutval = 3;
my $testHost = $ARGV[0];
my $dlquery = `dig $testHost | grep 'IN A'`;
my $SUCCESS = 0;

sub testSSHhost {
    my $fqdn = shift;
    my $socket = new IO::Socket::INET (
        PeerHost => $fqdn,
        PeerPort => '22',
        Proto    => 'tcp',
        Timeout  => $timeoutval,
    ) or return "ERROR in Socket Creation : $!\n";
    my $tcpData = <$socket>;
    $socket->close();
    if ($tcpData && $tcpData =~ /SSH/) {
        print "$fqdn\n";
        $SUCCESS = 1;
        exit(0);
    }
    return $fqdn;
}

my @threads;
for my $line (split(/\n/, $dlquery)) {
    my @linesplit = split(/ /, $line);
    $linesplit[0] =~ s/\.$//;
    $linesplit[0] =~ s/ *//g;
    my $t = threads->new(\&testSSHhost, $linesplit[0]);
    push(@threads, $t);
}
while (!$SUCCESS) { sleep 0.3 }
Really what I'm trying to avoid is the "A thread exited while 2 threads were running." warning, or a "segmentation fault" message.
Something like this (not tested!):
use Modern::Perl;
use threads;
use Thread::Queue;
use IO::Socket::INET;

$| = 1;
my $testHost = $ARGV[0];
my $dlquery = `dig $testHost | grep 'IN A'`;
my $config = { NUMBER_OF_THREADS => 5 }; # how many threads you gonna use?
my $queue = Thread::Queue->new;
my $queue_processed = Thread::Queue->new;

for my $line ( split( /\n/, $dlquery ) ) {
    my ($ip) = split( / /, $line );
    $ip =~ s/\.$//;
    $ip =~ s/ *//g;
    $queue->enqueue($ip);
}

foreach my $thread_id ( 1 .. $config->{NUMBER_OF_THREADS} ) {
    $queue->enqueue(undef);
    my $thread = threads->create( \&testSSHhost )->detach();
}

while ( $queue->pending() ) {
    my $result = $queue_processed->dequeue();
    if ( $result->{status} ) {
        say $result->{ip};
    }
}

sub testSSHhost {
    while ( my $fqdn = $queue->dequeue() ) {
        my $status = 0;
        my $socket = new IO::Socket::INET(
            PeerHost => $fqdn,
            PeerPort => 22,
            Proto    => 'tcp',
            Timeout  => 3,
        ) or return "ERROR in Socket Creation : $!\n";
        my $tcpData = <$socket>;
        $socket->close();
        if ( $tcpData && $tcpData =~ /SSH/ ) {
            $status = 1;
        }
        $queue_processed->enqueue( { ip => $fqdn, status => $status, } );
    }
    return 0;
}
You can achieve it with queues:
http://search.cpan.org/dist/Thread-Queue/lib/Thread/Queue.pm
Before spawning the threads, you create a queue, and then let the threads push the successful IP address into it. The parent process then blocks on dequeuing until something pops up.
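A self-contained sketch of that pattern (the host names are made up, and the "connect" is simulated): workers push any success into a shared queue, and the parent blocks on dequeue() taking whichever answer arrives first, with no polling loop and no sleep.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $results = Thread::Queue->new;

for my $candidate (qw( host-a host-b host-c )) {
    threads->create( sub {
        my ($fqdn) = @_;
        # ... a real worker would attempt the port-22 connect here ...
        $results->enqueue($fqdn) if $fqdn eq 'host-b';  # pretend only this one succeeds
    }, $candidate )->detach();
}

my $winner = $results->dequeue;     # blocks until the first success
print "first success: $winner\n";   # prints "first success: host-b"
```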

How to use threads in Perl?

I want to use threads in Perl to increase the speed of my program... for example, I want to use 20 threads in this code:
use IO::Socket;

my $in_file2 = 'rang.txt';
open DAT, $in_file2;
my @ip = <DAT>;
close DAT;
chomp(@ip);

foreach my $ip (@ip)
{
    $host = IO::Socket::INET->new(
        PeerAddr => $ip,
        PeerPort => 80,
        proto    => 'tcp',
        Timeout  => 1
    )
    and open(OUT, ">>port.txt");
    print OUT $ip . "\n";
    close(OUT);
}
In the above code we take a list of IPs and scan a given port. I want to use threads in this code. Is there any other way to increase the speed of my code?
Thanks.
Instead of using threads, you might want to look into AnyEvent::Socket, or Coro::Socket, or POE, or Parallel::ForkManager.
Read the Perl threading tutorial.
Perl can do both threading and forking. "threads" is officially not recommended - in no small part because it's not well understood, and - perhaps slightly counterintuitively - isn't lightweight like threads are in some programming languages.
If you are particularly keen to thread, the 'worker' model of threading works much better than spawning a thread per task. You might do the latter in some languages - in Perl it's very inefficient.
As such you might do something like this:
#!/usr/bin/env perl
use strict;
use warnings;
use threads;
use Thread::Queue;
use IO::Socket;

my $nthreads = 20;
my $in_file2 = 'rang.txt';

my $work_q   = Thread::Queue->new;
my $result_q = Thread::Queue->new;

sub ip_checker {
    while ( my $ip = $work_q->dequeue ) {
        chomp($ip);
        my $host = IO::Socket::INET->new(
            PeerAddr => $ip,
            PeerPort => 80,
            Proto    => 'tcp',
            Timeout  => 1
        );
        if ( defined $host ) {
            $result_q->enqueue($ip);
        }
    }
}

sub file_writer {
    open( my $output_fh, ">>", "port.txt" ) or die $!;
    while ( my $ip = $result_q->dequeue ) {
        print {$output_fh} "$ip\n";
    }
    close($output_fh);
}

my @workers;
for ( 1 .. $nthreads ) {
    push( @workers, threads->create( \&ip_checker ) );
}
my $writer = threads->create( \&file_writer );

open( my $dat, "<", $in_file2 ) or die $!;
$work_q->enqueue(<$dat>);
close($dat);
$work_q->end;

foreach my $thr (@workers) {
    $thr->join();
}
$result_q->end;
$writer->join();
This uses a queue to feed a set of (20) worker threads an IP list and have them work their way through it, collating and printing results via the writer thread.
But as threads aren't really recommended any more, a better way might be to use Parallel::ForkManager, which with your code might go a bit like this:
#!/usr/bin/env perl
use strict;
use warnings;
use Fcntl qw( :flock );
use IO::Socket;
use Parallel::ForkManager;

my $in_file2 = 'rang.txt';

open( my $input,  "<", $in_file2 )  or die $!;
open( my $output, ">", "port.txt" ) or die $!;

my $manager = Parallel::ForkManager->new(20);
foreach my $ip (<$input>) {
    $manager->start and next;
    chomp($ip);
    my $host = IO::Socket::INET->new(
        PeerAddr => $ip,
        PeerPort => 80,
        Proto    => 'tcp',
        Timeout  => 1
    );
    if ( defined $host ) {
        flock( $output, LOCK_EX ); # exclusive or write lock
        print {$output} $ip, "\n";
        flock( $output, LOCK_UN ); # unlock
    }
    $manager->finish;
}
$manager->wait_all_children;
close($output);
close($input);
You need to be particularly careful with file IO when multiprocessing, because the whole point is that your execution sequence is no longer well defined. It's very easy to end up with different threads clobbering a file that another thread has open but hasn't flushed to disk.
I note that your code seems to rely on a failed file open in order to not print to the file. That's not a nice thing to do, especially when the file handle is not lexically scoped.
But in both multiprocessing paradigms outlined above (there are others; these are the most common) you still have to deal with serialising the file IO. Note that your results will be in a random order in both, because it very much depends on when each task completes. If that's important to you, collate and sort after your threads or forks complete.
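The collate-and-sort step can be sketched like this, using a results queue like the one in the threaded example above (the IPs are placeholders standing in for worker output):

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Results land in the queue in completion order; after end() is called,
# drain the queue into an array and sort for deterministic output.
my $result_q = Thread::Queue->new;
$result_q->enqueue($_) for qw( 10.0.0.3 10.0.0.1 10.0.0.2 );
$result_q->end;

my @results;
while ( defined( my $ip = $result_q->dequeue ) ) {
    push @results, $ip;
}
print "$_\n" for sort @results;    # 10.0.0.1, 10.0.0.2, 10.0.0.3
```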
It's probably generally better to look towards forking - as the threads documentation itself says:
The "interpreter-based threads" provided by Perl are not the fast, lightweight system for multitasking that one might expect or hope for. Threads are implemented in a way that make them easy to misuse. Few people know how to use them correctly or will be able to provide help.
The use of interpreter-based threads in perl is officially discouraged.
