Why are my Perl threads stuck in an infinite loop? (Thread::Queue) - multithreading

I am using Thread::Queue to push an array onto a queue and process each element of it using threads. Below is a simplified version of my program to demonstrate what is happening.
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;
# Define queue
my $QUEUE :shared = new Thread::Queue();
# Define values
my #values = qw(string1 string2 string3);
# Enqueue values
$QUEUE->enqueue(#values);
# Get thread limit
my $QUEUE_SIZE = $QUEUE->pending();
my $thread_limit = $QUEUE_SIZE;
# Create threads
for my $i (1 .. $thread_limit) {
my $thread = threads->create(\&work);
}
# Join threads
my $i = 0;
for my $thread (threads->list()) {
$thread->join();
}
print "COMPLETE\n";
# Thread work function
sub work {
while (my $value = $QUEUE->dequeue()) {
print "VALUE: $value\n";
sleep(5);
print "Finished sleeping\n";
}
print "Got out of loop\n";
}
When I run this code I get the following output and then it just hangs forever:
VALUE: string1
VALUE: string2
VALUE: string3
Finished sleeping
Finished sleeping
Finished sleeping
Once the queue reaches its end, the while loop should break and the script should continue but it doesn't appear to ever get out of the loop.
Why is this getting stuck?

Since you never call $QUEUE->end(), your threads are blocking on dequeue() waiting for more entries to appear.
So, ensure you do call $QUEUE->end() after the last call to enqueue, or before joining the threads.

Related

Threaded code exits before all tasks are complete

I am trying to take a portion of an existing script and have it run multiple nmap scans simultaneously to increase the speed of the script.
I initially tried using fork, but it was suggested to me that I should be using threads instead as I am doing this on a Windows box. I modified a code snippet I found online and it partially works.
I am using a list of 23 IP addresses. I have been able to open 10 threads and scan the first 10 addresses, but then the code exits. Ideally, the code would open a new thread each time one exits so that there are always 10 threads running, until it reaches the remainder, in this case there would be three. Then only 3 threads would be open.
This entire code needs to be run inside a subroutine that I have in my original sequential code. I am using ping instead of the nmap command to test.
#!/usr/bin/Perl
use strict;
use threads;
my $i = 0;
my #lines;
# Define the number of threads
my $num_of_threads = 10;
# use the initThreads subroutine to create an array of threads.
my #threads = initThreads();
my #files = glob( "./ping.txt" ) or die "Can't open CMS HostInventory$!"; # Open the CMS Host Inventory csv files for parsing
foreach my $file ( #files ) {
open (CONFIG, '<', $file) or die "Can't ip360_DNS File$!";
#lines = <CONFIG>;
chomp (#lines);
}
# Loop through the array:
foreach ( #threads ) {
# Tell each thread to perform our 'doOperation()' subroutine.
$_ = threads->create(\&doOperation);
}
# This tells the main program to keep running until all threads have finished.
foreach ( #threads ) {
$_->join();
}
print "\nProgram Done!\nPress Enter to exit";
$a = <>;
####################### SUBROUTINES ############################
sub initThreads{
my #initThreads;
for ( my $i = 1; $i <= $num_of_threads; $i++ ) {
push(#initThreads, $i);
}
return #initThreads;
}
sub doOperation{
# Get the thread id. Allows each thread to be identified.
my $id = threads->tid();
my $ip = ($id - 1);
system("ping $lines[$ip] >> ./input/$lines[$ip].txt");
print "Thread $id done!\n";
# Exit the thread
threads->exit();
}

Missing characters while reading input with threads

Let's say we have a script which open a file, then read it line by line and print the line to the terminal. We have a sigle thread and a multithread version.
The problem is than the resulting output of both scripts is almost the same, but not exactly. In the multithread versions there are about ten lines which missed the first 2 chars. I mean, if the real line is something line "Stackoverflow rocks", I obtain "ackoverflow rocks".
I think that this is related to some race condition since if I adjust the parameters to create a lot of little workers, I get more faults than If I use less and bigger workers.
The single thread is like this:
$file = "some/file.txt";
open (INPUT, $file) or die "Error: $!\n";
while ($line = <STDIN>) {
print $line;
}
The multithread version make's use of the thread queue and this implementation is based on the #ikegami approach:
use threads qw( async );
use Thread::Queue 3.01 qw( );
use constant NUM_WORKERS => 4;
use constant WORK_UNIT_SIZE => 100000;
sub worker {
my ($job) = #_;
for (#$job) {
print $_;
}
}
my $q = Thread::Queue->new();
async { while (defined( my $job = $q->dequeue() )) { worker($job); } }
for 1..NUM_WORKERS;
my $done = 0;
while (!$done) {
my #lines;
while (#lines < WORK_UNIT_SIZE) {
my $line = <>;
if (!defined($line)) {
$done = 1;
last;
}
push #lines, $line;
}
$q->enqueue(\#lines) if #lines;
}
$q->end();
$_->join for threads->list;
I tried your program and got similar (wrong) results. Instead of Thread::Semaphore I used lock from threads::shared around the print as it's simpler to use than T::S, i.e.:
use threads;
use threads::shared;
...
my $mtx : shared;
sub worker
{
my ($job) = #_;
for (#$job) {
lock($mtx); # (b)locks
print $_;
# autom. unlocked here
}
}
...
The global variable $mtx serves as a mutex. Its value doesn't matter, even undef (like here) is ok.
The call to lock blocks and returns only if no other threads currently holds the lock on that variable.
It automatically unlocks (and thus makes lock return) when it goes out of scope. In this sample that happens
after every single iteration of the for loop; there's no need for an extra {…} block.
Now we have syncronized the print calls…
But this didn't work either, because print does buffered I/O (well, only O). So I forced unbuffered output:
use threads;
use threads::shared;
...
my $mtx : shared;
$| = 1; # force unbuffered output
sub worker
{
# as above
}
...
and then it worked. To my surprise I could then remove the lock and it still worked. Perhaps by accident. Note that your script will run significantly slower without buffering.
My conclusion is: you're suffering from buffering.

Using threads in loop

I want to use threads in loops. The way I want to use this is that start threads in a loop and wait for them to finish. Once all threads are finished then sleep for some predefined number of time and then start those threads again.
Actually I want to run these threads once every hour and that is why I am using sleep. Also I know that hourly run can be done via cron but I can't do that so I am using sleep.
I am getting this error when I am trying to run my code:
Thread already joined at ex.pl line 33.
Perl exited with active threads:
5 running and unjoined
0 finished and unjoined
0 running and detached
This is my code:
use strict;
use warnings;
use threads;
use Thread::Queue;
my $queue = Thread::Queue->new();
my #threads_arr;
sub main {
$queue->enqueue($_) for 1 .. 5;
$queue->enqueue(undef) for 1 .. 5;
}
sub thread_body {
while ( my $num = $queue->dequeue() ) {
print "$num is popped by " . threads->tid . " \n";
sleep(5);
}
}
while (1) {
my $main_thread = threads->new( \&main );
push #threads_arr, threads->create( \&thread_body ) for 1 .. 5;
$main_thread->join();
foreach my $x (#threads_arr) {
$x->join();
}
sleep(1);
print "sleep \n";
}
Also I am able to see other questions similar to this but I am not able to get any of them.
Your #threads_arr array never gets cleared after you join the first 5 threads. The old (already joined) threads still exist in the array the second time around the loop, so Perl throws the "Thread already joined" error when attempting to join them. Clearing or locally initializing #threads_arr every time around the loop will fix the problem.
#threads_arr=(); # Reinitialize the array
push #threads_arr, threads->create( \&thread_body ) for 1 .. 5;

Perl: write value in thread

I am trying to get text of two large files. To speed it up i tried threads.
Before i used threads the script worked, now it does not.
The problem is: I save everything I read in the file into a hash.
When i print out the size (or keys/values) after the read-in in the sub (which the thread executed) it shows a correct number > 0, when i print out the size of the hash anywhere else (after the threads have run) it shows me 0.
print ": ".keys(%c);
is used 2 times, and has different output each time.
(In the final programm 2 Threads are running and a method to compare the stuff is called after the threads finished)
Example code:
my %c;
my #threads = initThreads();
#threads[0] = threads->create(\&ce);
foreach(#threads){
$_->join();
}
print ": ".keys(%c);
sub initThreads{
my #initThreads;
for(my $i = 0; $i<2;$i++){
push(#initThreads, $i);
}
return #initThreads;
}
sub ce(){
my $id = threads->tid();
open my $file, "<", #arg1[1] or die $!;
my #cXY;
my #cDa;
while(my $line = <$file>){
# some regex and push to arrays, works
#c{#cXY} = #cDa;
}
print "Thread $id is done\n";
close $file;
print ": ".keys(%c);
threads->exit();
}
Do i have to run the things after the first 2 threads finished in another thread which waits until the first two are finished?
Or what am i doing wrong with threads?
Thanks.
%c isn't shared across your threads.
use threads;
use threads::shared
my %c :shared;
See threads::shared.
In Perl, threads don't share memory. Each thread operates on a different copy of %c, so the changes aren't reflected to the parent thread. While sharing a variable across threads is possible, this is not generally advisable.
Make use of the possibility to return data from a thread. E.g
my %c = map %{ $_->join }, #threads; # flatten all returned hashes
sub ce {
my %hash;
...;
return \%hash;
}
Some other suggestions:
use strict; use warnings; if you aren't already.
use better variable names.
you only seem to be spawning one thread (in $threads[0]).
my #array; for (my $i = 0; $i < 2; $i++){ push(#array, $i) } is equivalent to my #array = 0 .. 1.
#arg1 is not declared in the current scope.
manually exiting a thread is not neccessary in your case.

How do I kill Perl threads?

In this program I create a fork, and then call domultithreading from it. It then creates a few threads.
sub domultithreading {
#Not my function
my ($num) = #_;
my #thrs;
my $i = 0;
my $connectionsperthread = 50;
while ( $i < $num ) {
$thrs[$i] = threads->create( \&doconnections, $connectionsperthread, 1 );
$i += $connectionsperthread;
}
my #threadslist = threads->list();
while ( $#threadslist > 0 ) {
$failed = 0;
}
}
sub kill {
#how can I kill the threads made in domultithreading?
kill 9, $pid;
print "\nkilling $pid\n";
}
I then want to be able to kill the fork and its threads, however I can't figure it out. Any suggestions?
Thanks a lot
Perl provides two concurrency models: Processes and Threads. While you shouldn't neccessarily mix these two without a good reason, threads do model processes quite closely, so we can nearly treat them as such. Specifically, we can send signals to threads.
Processes can be signalled with the kill function: kill SIGNAL => $pid, while threads can be signalled with the kill method: $thr->kill(SIGNAL). This method returns the thread object. Signals can be intercepted when setting signal handlers in the %SIG hash.
This means that every process TERM signal handler TERMs all the child threads like
$_->kill(9)->join() for threads->list;
and every thread TERM signal handler simply exits the thread, or does cleaning up:
threads->exit; # exit the current thread
There are actually few different ways to kill a thread in Perl, depending on what you want to achieve.
Let's take the following code as example:
use strict;
use warnings;
use threads;
use Thread::Queue;
# Create the shared queue (used by the threads):
my $queue = Thread::Queue->new();
sub main {
# Initialise the shared queue:
$queue->enqueue("A", "B", "C", "D", "E", "F");
print "Total number of items: " . $queue->pending() . "\n";
$queue->end(); # signal that there is no more work to be sent...
# Create 3 threads:
threads->create('do') for ( 0..2 );
print "Number of current threads: " . threads->list() . "\n";
foreach my $thread ( threads->list() ) { # for each thread...
$thread->join(); # wait the thread to finish all its work...
print "Number of items in the queue: " . $queue->pending() . "\n" if defined $queue->pending();
print "Number of current threads: " . threads->list() . "\n";
}
}
sub do {
# Retrieve the current thread ID:
my $threadID = threads->self()->tid();
# Setup the thread's kill signal handler:
local $SIG{KILL} = sub { threads->exit() };
while ( defined (my $item = $queue->dequeue()) ) { # for each element in the queue...
print "(Thread-" . $threadID . "): Do something with item '$item'...\n";
sleep 1 + $threadID;
print "(Thread-" . $threadID . "): Finished to use item '$item'...\n";
}
}
main();
The code above spawns 3 threads, each of which will take and process an element of the shared queue till the queue is empty.
In this case, since we declared that no more element will be added to the queue (i.e. $queue->end()), the threads will be joined (to the main) once they had processed all the elements of the queue. Indeed, using $thread->join() we are saying to the main to wait for $thread to join.
If we omit to declare $queue->end(), the threads will not join the main but stay pending for new elements of the queue.
Now, if we want to kill the threads, we have two options: killing the threads but letting them to finish what they are doing first or simply (brutally) killing the threads immediately. In Perl, both are achieved via Thread Signalling.
In the first case (i.e. if we want to tell the threads to finish their work and, after, to stop processing the shared queue), we should use $thread->kill('KILL')->join():
foreach my $thread ( threads->list() ) { # for each thread...
$thread->kill('KILL')->join(); # wait the thread finish its work and kill it...
print "Number of items in the queue: " . $queue->pending() . "\n" if defined $queue->pending();
print "Number of current threads: " . threads->list() . "\n";
}
On the other hand, in the latter case (i.e. if we want to kill the threads immediately), we should use $thread->kill('KILL')->kill():
foreach my $thread ( threads->list() ) { # for each thread...
$thread->kill('KILL')->kill(); # kill the thread immediately...
print "Number of items in the queue: " . $queue->pending() . "\n" if defined $queue->pending();
print "Number of current threads: " . threads->list() . "\n";
}
Of course, if you want to kill the thread from within itself, you just need to call threads->exit() or simply use return:
sub do {
...
threads->exit(); # kill the thread...
...
}

Resources