Threaded code exits before all tasks are complete - multithreading

I am trying to take a portion of an existing script and have it run multiple nmap scans simultaneously to increase the speed of the script.
I initially tried using fork, but it was suggested to me that I should be using threads instead as I am doing this on a Windows box. I modified a code snippet I found online and it partially works.
I am using a list of 23 IP addresses. I have been able to open 10 threads and scan the first 10 addresses, but then the code exits. Ideally, the code would open a new thread each time one exits so that there are always 10 threads running, until it reaches the remainder, in this case there would be three. Then only 3 threads would be open.
This entire code needs to be run inside a subroutine that I have in my original sequential code. I am using ping instead of the nmap command to test.
#!/usr/bin/Perl
use strict;
use threads;
my $i = 0;
my #lines;
# Define the number of threads
my $num_of_threads = 10;
# use the initThreads subroutine to create an array of threads.
my #threads = initThreads();
my #files = glob( "./ping.txt" ) or die "Can't open CMS HostInventory$!"; # Open the CMS Host Inventory csv files for parsing
foreach my $file ( #files ) {
open (CONFIG, '<', $file) or die "Can't ip360_DNS File$!";
#lines = <CONFIG>;
chomp (#lines);
}
# Loop through the array:
foreach ( #threads ) {
# Tell each thread to perform our 'doOperation()' subroutine.
$_ = threads->create(\&doOperation);
}
# This tells the main program to keep running until all threads have finished.
foreach ( #threads ) {
$_->join();
}
print "\nProgram Done!\nPress Enter to exit";
$a = <>;
####################### SUBROUTINES ############################
sub initThreads{
my #initThreads;
for ( my $i = 1; $i <= $num_of_threads; $i++ ) {
push(#initThreads, $i);
}
return #initThreads;
}
sub doOperation{
# Get the thread id. Allows each thread to be identified.
my $id = threads->tid();
my $ip = ($id - 1);
system("ping $lines[$ip] >> ./input/$lines[$ip].txt");
print "Thread $id done!\n";
# Exit the thread
threads->exit();
}

Related

perl multithreading perl exited with active threads

I write perl script to do a certain but I needed to mutlithread it, but I get this:
Perl exited with active threads:
2 running and unjoined
0 finished and unjoined
0 running and detached
Here is the code:
use Net::Ping;
use threads;
use Benchmark qw(:hireswallclock);
use threads::shared;
my $starttime = Benchmark->new;
my $finishtime;
my $timespent;
my $num_of_threads = 2;
my #threads = initThreads();
my $inFile = $ARGV[0] ;
open(IN , "<$inFile") or die "can not find $inFile" ;
my #output: shared = <IN>;
chomp (#output) ;
my $chunk_size = #output / 2;
print($chunk_size);
#############################
######## PROCEDURES ########
#############################
# Subroutine that intializes an array that will contain all our threads:
sub initThreads{
my #initThreads; # Our array
for(my $i=1; $i<=$num_of_threads; $i++){
push(#initThreads, $i);
}
return #initThreads;
}
sub doScript{
my $id = threads->tid();
print "//////////////////////////////////////////////////////////////////////////////////////////////Starting thread $id\n";
my ($start, $end, $output) = #_;
for my $i ($start .. $end) {
## some stuff done
sleep 1 if 0.2 > rand;
}
print "/////////////////////////////////////////////////////////////////////////////////////////////////////////////////Thread $id done!\n";
threads->exit();
}
########################
######## MAIN ########----------------------------------------------------------------
########################
for my $chunk(1 .. 2){
my $start = ($chunk - 1) * $chunk_size;
push #threads, threads->create(
\&doScript,
$start,
($start + $chunk_size - 1),
\#output,
);
print("finish");
}
# This tells the main program to keep running until all threads have finished.
foreach(#threads){
threads->join();
}
$finishtime = Benchmark->new;
$timespent = timediff($finishtime,$starttime);
print "\nDone!\nSpent ". timestr($timespent);
#print "\nProgram Done!\nPress Enter to exit\n";
$a = <>;
close (IN);
I even replaced the join block with this:
$_->join() for threads->list();
the error was solved but the script seemed not to do anything, threads started and terminated without doing anything.
Can anyone help me with this !
The reason you get that error is exactly as said on the tin - your code exited before the threads were closed. It can commonly happen when an exit or die is triggered early.
In particular I think your problem may lie within your reusing of #threads and initThreads().
The latter returns a list of numbers, not any threads. And then you push a couple more threads to the end of the list later, when you're doing a create.
That looks a lot like some sort of logic error to me.
But the major problem I think will be this:
foreach(#threads){
threads->join();
}
You're not actually joining a specific thread. What you probably want is:
foreach my $thr ( #threads ) {
$thr -> join();
}
At least, you would, if you hadn't manually populated #threads with [1,2] thanks to my #threads = initThreads();

Missing characters while reading input with threads

Let's say we have a script which open a file, then read it line by line and print the line to the terminal. We have a sigle thread and a multithread version.
The problem is than the resulting output of both scripts is almost the same, but not exactly. In the multithread versions there are about ten lines which missed the first 2 chars. I mean, if the real line is something line "Stackoverflow rocks", I obtain "ackoverflow rocks".
I think that this is related to some race condition since if I adjust the parameters to create a lot of little workers, I get more faults than If I use less and bigger workers.
The single thread is like this:
$file = "some/file.txt";
open (INPUT, $file) or die "Error: $!\n";
while ($line = <STDIN>) {
print $line;
}
The multithread version make's use of the thread queue and this implementation is based on the #ikegami approach:
use threads qw( async );
use Thread::Queue 3.01 qw( );
use constant NUM_WORKERS => 4;
use constant WORK_UNIT_SIZE => 100000;
sub worker {
my ($job) = #_;
for (#$job) {
print $_;
}
}
my $q = Thread::Queue->new();
async { while (defined( my $job = $q->dequeue() )) { worker($job); } }
for 1..NUM_WORKERS;
my $done = 0;
while (!$done) {
my #lines;
while (#lines < WORK_UNIT_SIZE) {
my $line = <>;
if (!defined($line)) {
$done = 1;
last;
}
push #lines, $line;
}
$q->enqueue(\#lines) if #lines;
}
$q->end();
$_->join for threads->list;
I tried your program and got similar (wrong) results. Instead of Thread::Semaphore I used lock from threads::shared around the print as it's simpler to use than T::S, i.e.:
use threads;
use threads::shared;
...
my $mtx : shared;
sub worker
{
my ($job) = #_;
for (#$job) {
lock($mtx); # (b)locks
print $_;
# autom. unlocked here
}
}
...
The global variable $mtx serves as a mutex. Its value doesn't matter, even undef (like here) is ok.
The call to lock blocks and returns only if no other threads currently holds the lock on that variable.
It automatically unlocks (and thus makes lock return) when it goes out of scope. In this sample that happens
after every single iteration of the for loop; there's no need for an extra {…} block.
Now we have syncronized the print calls…
But this didn't work either, because print does buffered I/O (well, only O). So I forced unbuffered output:
use threads;
use threads::shared;
...
my $mtx : shared;
$| = 1; # force unbuffered output
sub worker
{
# as above
}
...
and then it worked. To my surprise I could then remove the lock and it still worked. Perhaps by accident. Note that your script will run significantly slower without buffering.
My conclusion is: you're suffering from buffering.

csv format issue using multithreading in perl

I'm running a perl script consisting of 30 threads to run a subroutine. For each thread, I'm supplying 100 data. In the subroutine, after the code does what its supposed to, I'm storing the output in a csv file. However, I find that on execution, the csv file has some data overlapped. For example, in the csv file, I'm storing name, age, gender, country this way-
print OUTPUT $name.",".$age.",".$gender.",".$country.",4\n";
The csv file should have outputs as such-
Randy,35,M,USA,4
Tina,76,F,UK,4
etc.
However, in the csv file, I see that some columns has overlapped or has been entered haphazardly in this way-
Randy,35,M,USA,4
TinaMike,76,UK
23,F,4
Is it because some threads are executing at the same time? What could I do to avoid this? I'm using the print statement only after I'm getting the data. Any suggestions?
4 is the group id which will remain constant.
Below is the code snippet:
#!/usr/bin/perl
use DBI;
use strict;
use warnings;
use threads;
use threads::shared;
my $host = "1.1.1.1";
my $database = "somedb";
my $user = "someuser";
my $pw = "somepwd";
my #threads;
open(PUT,">/tmp/file1.csv") || die "can not open file";
open(OUTPUT,">/tmp/file2.csv") || die "can not open file";
my $dbh = DBI->connect("DBI:mysql:$database;host=$host", $user, $pw ,) || die "Could not connect to database: $DBI::errstr";
$dbh->{'mysql_auto_reconnect'} = 1;
my $sql = qq{
//some sql to get a primary keys
};
my $sth = $dbh->prepare($sql);
$sth->execute();
while(my #request = $sth->fetchrow_array())
{
#get other columns and print to file1.csv
print PUT $net.",".$sub.",4\n";
$i++; #this has been declared before
}
for ( my $count = 1; $count <= 30; $count++) {
my $t = threads->new(\&sub1, $count);
push(#threads,$t);
}
foreach (#threads) {
my $num = $_->join;
print "done with $num\n";
}
sub sub1 {
my $num = shift;
//calculated start_num and end_num based on an internal logic
for(my $x=$start_num; $x<=$end_num; $x++){
print OUTPUT $name.",".$age.",".$gender.",".$country.",4\n";
$j++; #this has been declared before
}
sleep(1);
return $num;
}
I have problem in the file2 which has the OUTPUT handler
You are multithreading and printing to a file from multiple threads. This will always end badly - print is not an 'atomic' operation, so different prints can interrupt each other.
What you need to do is serialize your output such that this cannot happen. The simplest way is to use a lock or a semaphore:
my $print_lock : shared;
{
lock $print_lock;
print OUTPUT $stuff,"\n";
}
when the 'lock' drifts out of scope, it'll be released.
Alternatively, have a separate thread that 'does' file IO, and use Thread::Queue to feed lines to it. Depends somewhat on whether you need any ordering/processing of the contents of 'OUTPUT'.
Something like:
use Thread::Queue;
my $output_q = Thread::Queue -> new();
sub output_thread {
open ( my $output_fh, ">", "output_filename.csv" ) or die $!;
while ( my $output_line = $output_q -> dequeue() ) {
print {$output_fh} $output_line,"\n";
}
close ( $output_fh );
sub doing_stuff_thread {
$output_q -> enqueue ( "something to output" ); #\n added by sub!
}
my $output_thread = threads -> create ( \&output_thread );
my $doing_stuff_thread = threads -> create ( \&doing_stuff_thread );
#wait for doing_stuff to finish - closing the queue will cause output_thread to flush/exit.
$doing_stuff_thread -> join();
$output_q -> end;
$output_thread -> join();
Open the File handle globally, then try using flock on the file handle as demonstrated:
sub log_write {
my $line = shift;
flock(OUTPUT, LOCK_EX) or die "can't lock: $!";
seek(OUTPUT, 0, SEEK_END) or die "can't fast forward: $!";
print OUTPUT $line;
flock(OUTPUT, LOCK_UN) or die "can't unlock: $!";
}
Other example:
perlfaq5 - I still don't get locking. I just want to increment the number in the file. How can I do this?

Perl: write value in thread

I am trying to get text of two large files. To speed it up i tried threads.
Before i used threads the script worked, now it does not.
The problem is: I save everything I read in the file into a hash.
When i print out the size (or keys/values) after the read-in in the sub (which the thread executed) it shows a correct number > 0, when i print out the size of the hash anywhere else (after the threads have run) it shows me 0.
print ": ".keys(%c);
is used 2 times, and has different output each time.
(In the final programm 2 Threads are running and a method to compare the stuff is called after the threads finished)
Example code:
my %c;
my #threads = initThreads();
#threads[0] = threads->create(\&ce);
foreach(#threads){
$_->join();
}
print ": ".keys(%c);
sub initThreads{
my #initThreads;
for(my $i = 0; $i<2;$i++){
push(#initThreads, $i);
}
return #initThreads;
}
sub ce(){
my $id = threads->tid();
open my $file, "<", #arg1[1] or die $!;
my #cXY;
my #cDa;
while(my $line = <$file>){
# some regex and push to arrays, works
#c{#cXY} = #cDa;
}
print "Thread $id is done\n";
close $file;
print ": ".keys(%c);
threads->exit();
}
Do i have to run the things after the first 2 threads finished in another thread which waits until the first two are finished?
Or what am i doing wrong with threads?
Thanks.
%c isn't shared across your threads.
use threads;
use threads::shared
my %c :shared;
See threads::shared.
In Perl, threads don't share memory. Each thread operates on a different copy of %c, so the changes aren't reflected to the parent thread. While sharing a variable across threads is possible, this is not generally advisable.
Make use of the possibility to return data from a thread. E.g
my %c = map %{ $_->join }, #threads; # flatten all returned hashes
sub ce {
my %hash;
...;
return \%hash;
}
Some other suggestions:
use strict; use warnings; if you aren't already.
use better variable names.
you only seem to be spawning one thread (in $threads[0]).
my #array; for (my $i = 0; $i < 2; $i++){ push(#array, $i) } is equivalent to my #array = 0 .. 1.
#arg1 is not declared in the current scope.
manually exiting a thread is not neccessary in your case.

In Perl, how can I wait for threads to end in parallel?

I have a Perl script that launches 2 threads,one for each processor. I need it to wait for a thread to end, if one thread ends a new one is spawned. It seems that the join method blocks the rest of the program, therefore the second thread can't end until everything the first thread does is done which sort of defeats its purpose.
I tried the is_joinable method but that doesn't seem to do it either.
Here is some of my code :
use threads;
use threads::shared;
#file_list = #ARGV; #Our file list
$nofiles = $#file_list + 1; #Real number of files
$currfile = 1; #Current number of file to process
my %MSG : shared; #shared hash
$thr0 = threads->new(\&process, shift(#file_list));
$currfile++;
$thr1 = threads->new(\&process, shift(#file_list));
$currfile++;
while(1){
if ($thr0->is_joinable()) {
$thr0->join;
#check if there are files left to process
if($currfile <= $nofiles){
$thr0 = threads->new(\&process, shift(#file_list));
$currfile++;
}
}
if ($thr1->is_joinable()) {
$thr1->join;
#check if there are files left to process
if($currfile <= $nofiles){
$thr1 = threads->new(\&process, shift(#file_list));
$currfile++;
}
}
}
sub process{
print "Opening $currfile of $nofiles\n";
#do some stuff
if(some condition){
lock(%MSG);
#write stuff to hash
}
print "Closing $currfile of $nofiles\n";
}
The output of this is :
Opening 1 of 4
Opening 2 of 4
Closing 1 of 4
Opening 3 of 4
Closing 3 of 4
Opening 4 of 4
Closing 2 of 4
Closing 4 of 4
First off, a few comments on the code itself. You need to make sure you have:
use strict;
use warnings;
at the beginning of every script. Second:
#file_list = #ARGV; #Our file list
$nofiles = $#file_list + 1; #Real number of files
is unnecessary as an array in scalar context evaluates to the number of elements in the array. That is:
$nofiles = #ARGV;
would correctly give you the number of files in #ARGV regardless of the value of $[.
Finally, the script can be made much simpler by partitioning the list of files before starting the threads:
use strict; use warnings;
use threads;
use threads::shared;
my #threads = (
threads->new(\&process, #ARGV[0 .. #ARGV/2]),
threads->new(\&process, #ARGV[#ARGV/2 + 1 .. #ARGV - 1]),
);
$_->join for #threads;
sub process {
my #files = #_;
warn "called with #files\n";
for my $file ( #files ) {
warn "opening '$file'\n";
sleep rand 3;
warn "closing '$file'\n";
}
}
Output:
C:\Temp> thr 1 2 3 4 5
called with 1 2 3
opening '1'
called with 4 5
opening '4'
closing '4'
opening '5'
closing '1'
opening '2'
closing '5'
closing '2'
opening '3'
closing '3'
Alternatively, you can let the threads move on to the next task as they are done:
use strict; use warnings;
use threads;
use threads::shared;
my $current :shared;
$current = 0;
my #threads = map { threads->new(\&process, $_) } 1 .. 2;
$_->join for #threads;
sub process {
my ($thr) = #_;
warn "thread $thr stared\n";
while ( 1 ) {
my $file;
{
lock $current;
return unless $current < #ARGV;
$file = $ARGV[$current];
++ $current;
}
warn "$thr: opening '$file'\n";
sleep rand 5;
warn "$thr: closing '$file'\n";
}
}
Output:
C:\Temp> thr 1 2 3 4 5
thread 1 stared
1: opening '1'
1: closing '1'
1: opening '2'
thread 2 stared
2: opening '3'
2: closing '3'
2: opening '4'
1: closing '2'
1: opening '5'
1: closing '5'
2: closing '4'
I think you need to move the code that pulls the next file from the list into the threads themselves.
So every thread would not just process one file, but continue to process until the list is empty.
This way, you also save on the overhead of creating new threads all the time.
Your main thread will then join both of them.
Of course, this requires synchronization on the list (so that they do not pull the same data). Alternately, you could split the list into two (one for each thread), but that might result in an unlucky distribution.
(PS: No Perl God, just a humble monk)

Resources