Keeping count with threads in perl - multithreading

Im trying to count whenever a thread is done in perl, and print the count. but this is not working. i keep getting either "0" or "1", im trying to add to the count then print the count right after the get request is made.
use strict;
use threads;
use LWP::UserAgent;
our $MAX //= $ARGV[1];
my $list = $ARGV[0];
open my $handle, '<', $list;
chomp(my #array = <$handle>);
close $handle;
my $lines = `cat $list | wc -l`;
my $count = 0;
my #threads;
foreach $_ (#array) {
push #threads, async{
my #chars = ("a".."z");
my $random = join '', map { #chars[rand #chars] } 1 .. 6;
my $ua = LWP::UserAgent->new;
my $url = $_ . '?session=' . $random;
my $response = $ua->get($url);
count++;
print $count;
};
sleep 1 while threads->list( threads::running ) > $MAX;
}
$_->join for #threads;

Just to summarise points in comments by #choroba and myself, and not leave the question without an answer.
You would need to include:
use threads::shared;
in your code, along with all the other use elements.
And to indicate that variable $count is shared:
my $count :shared = 0;
EDIT As per Ikegami's comment, you would have to lock the variable if you want to modify it, to avoid problems of concurrency.
{
lock($count);
$count++;
print $count;
}
And that should be enough for the variable $count to be shared.

Related

Perl multithreading slower when memory usage getting high

Hi all~ I has written a very simple code in Perl using multithreading. The codes are as follows.
#!/bin/perl
use strict;
use threads;
use Benchmark qw(:hireswallclock);
my $starttime;
my $finishtime;
my $timespent;
my $num_of_threads = 1;
my $total_size = 10000000;
my $chunk_size = int($total_size / $num_of_threads);
if($total_size % $num_of_threads){
$chunk_size++;
}
my #threads = ();
$starttime = Benchmark->new;
for(my $i = 0; $i < $num_of_threads; $i++) {
my $thread = threads->new(\&search);
push (#threads, $thread);
}
foreach my $thread (#threads) {
$thread->join();
}
my $finishtime = Benchmark->new;
$timespent = timediff($finishtime, $starttime);
print "$num_of_threads threads used in ".timestr($timespent)."\nDone!\n";
sub search{
my $i = 0;
while($i < $chunk_size){
$i++;
}
return 1;
}
This piece of codes works fine as when increasing the number of threads, it will run faster.
However, when adding additional lines in the middle, which will create an array in big size, the code will run more slowly when adding more threads. The codes with the additional lines are shown below.
#!/bin/perl
use strict;
use threads;
use Benchmark qw(:hireswallclock);
my $starttime;
my $finishtime;
my $timespent;
my $num_of_threads = 1;
my $total_size = 10000000;
my $chunk_size = int($total_size / $num_of_threads);
if($total_size % $num_of_threads){
$chunk_size++;
}
##########Additional codes##########
print "Preparing data...\n";
$starttime = Benchmark->new;
my #array = ();
for(my $i = 0; $i < $total_size; $i++){
my $rn = rand();
push(#array, $rn);
}
$finishtime = Benchmark->new;
$timespent = timediff($finishtime, $starttime);
print "Used ".timestr($timespent)."\n";
######################################
my #threads = ();
$starttime = Benchmark->new;
for(my $i = 0; $i < $num_of_threads; $i++) {
my $thread = threads->new(\&search);
push (#threads, $thread);
}
foreach my $thread (#threads) {
$thread->join();
}
my $finishtime = Benchmark->new;
$timespent = timediff($finishtime, $starttime);
print "$num_of_threads threads used in ".timestr($timespent)."\nDone!\n";
sub search{
my $i = 0;
while($i < $chunk_size){
$i++;
}
return 1;
}
I am so confused regarding such a behaviour in Perl's multithreading. Does anyone has any idea what may go wrong here?
Thanks!
You have to remember that when using ithreads - interpreter threads - the entire Perl interpreter, including code and memory, is cloned into the new thread. So the more data to be cloned, the longer it takes. There are ways to control what is cloned; take a look at the threads perldoc.
You should do as little as possible and not even load many modules until after you have spawned your threads.
If you do have a lot of data that will be used by all the threads, make it shared with threads::shared. Then to share a data structure use shared_clone(). You cannot simply share() anything, other than a simple variable. That shared variable can only contain plain scalars or other shared references.
If you are going to consume or pump that data, make it a queue instead with the Thread::Queue module. It automatically shares the values and takes care of the locking. After spawning your pool of worker threads, control them with Thread::Semaphore. That way they won't terminate before you've given them anything to do. You can also prevent race conditions.
https://metacpan.org/pod/threads::shared
HTH
Thank you all for pointing me to the relative directions! I have learned and tried different things, inlucding how to use shared and queue, which should be able to solve the problem. So I revised the script as follows:
#!/bin/perl
use strict;
use threads;
use threads::shared;
use Thread::Queue;
use Benchmark qw(:hireswallclock);
my $starttime;
my $finishtime;
my $timespent;
my $num_of_threads = shift #ARGV;
my $total_size = 100000;
######Initiation of a 2D queue######
print "Preparing queue...\n";
$starttime = Benchmark->new;
my $queue = Thread::Queue->new();
for(my $i = 0; $i < $total_size; $i++){
my $rn1 = rand();
my $rn2 = rand();
my #interval :shared = sort($rn1, $rn2);
$queue->enqueue(\#interval);
}
$finishtime = Benchmark->new;
$timespent = timediff($finishtime, $starttime);
print "Used ".timestr($timespent)."\n";
#####################################
$starttime = Benchmark->new;
my $queue_copy = $queue; #Copy the 2D queue so that the original queue can be kept\
for(my $i = 0; $i < $num_of_threads; $i++) {
my $thread = threads->create(\&search, $queue_copy);
}
foreach my $thread (threads->list()) {
$thread->join();
}
$finishtime = Benchmark->new;
$timespent = timediff($finishtime, $starttime);
print "$num_of_threads threads used in ".timestr($timespent)."\nDone!\n";
#####################################
sub search{
my $temp_queue = $_[0];
while(my $temp_interval = $temp_queue->dequeue_nb()){
#Do something
}
return 1;
}
What I was trying to do was , first, to make a queue of arrays, with each containing two numbers. A copy of the queue was made as I want to preserve the original queue when going through it. Then the copied queue was gone through using multithreads. However, I still found out that it ran slower when more threads were added, which I have no clue why.

Need to open a file and replace multiple strings

I have a really big xml file. It has certain incrementing numbers inside, which i would like to replace with a different incrementing number. I've looked and here is what someone suggested here before. Unfortunately i cant get it to work :(
In the code below all instances of 40960 should be replaced with 41984, all instances of 40961 with 41985 etc. Nothing happens. What am i doing wrong?
use strict;
use warnings;
my $old = 40960;
my $new = 41984;
my $string;
my $file = 'file.txt';
rename($file, $file.'.bak');
open(IN, '<'.$file.'.bak') or die $!;
open(OUT, '>'.$file) or die $!;
$old++;
$new++;
for (my $i = 0; $i < 42; $i++) {
while(<IN>) {
$_ =~ s/$old/$new/g;
print OUT $_;
}
}
close(IN);
close(OUT);
Other answers give you better solutions to your problem. Mine concentrates on explaining why your code didn't work.
The core of your code is here:
$old++;
$new++;
for (my $i = 0; $i < 42; $i++) {
while(<IN>) {
$_ =~ s/$old/$new/g;
print OUT $_;
}
}
You increment the values of $old and $new outside of your loops. And you never change those values again. So you're only making the same substitution (changing 40961 to 41985) 42 times. You never try to change any other numbers.
Also, look at the while loop that reads from IN. On your first iteration (when $i is 0) you read all of the data from IN and the file pointer is left at the end of the file. So when you go into the while loop again on your second iteration (and all subsequent iterations) you read no data at all from the file. You need to reset the file pointer to the start of your file at the end of each iteration.
Oh, and the basic logic is wrong. If you think about it, you'll end up writing each line to the output file 42 times. You need to do all possible substitutions before writing the line. So your inner loop needs to be the outer loop (and vice versa).
Putting those suggestions together, you need something like this:
my $old = 40960;
my $change = 1024;
while (<IN>) {
# Easier way to write your loop
for my $i ( 1 .. 42 ) {
my $new = $old + $change;
# Use \b to mark word boundaries
s/\b$old\b/$new/g;
$old++;
}
# Print each output line only once
print OUT $_;
}
Here's an example that works line by line, so the size of file is immaterial. The example assumes you want to replace things like "45678", but not "fred45678". The example also assumes that there is a range of numbers, and you want them replaced with a new range offset by a constant.
#!/usr/bin/perl
use strict;
use warnings;
use constant MIN => 40000;
use constant MAX => 90000;
use constant DIFF => +1024;
sub repl { $_[0] >= MIN && $_[0] <= MAX ? $_[0] + DIFF : $_[0] }
while (<>) {
s/\b(\d+)\b/repl($1)/eg;
print;
}
exit(0);
Invoked with the file you want to transform as an argument, it produces altered output on stdout. With the following input ...
foo bar 123
40000 50000 60000 99999
fred60000
fred 60000 fred
... it produces this output.
foo bar 123
41024 51024 61024 99999
fred60000
fred 61024 fred
There are a couple of classic Perlisms here, but the example shouldn't be hard to follow if you RTFM appropriately.
Here is an alternative way which reads the input file into a string and does all the substitutions at once:
use strict;
use warnings;
{
my $old = 40960;
my $new = 41984;
my ($regexp) = map { qr/$_/ } join '|', map { $old + $_ } 0..41;
my $file = 'file.txt';
rename($file, $file.'.bak');
open(IN, '<'.$file.'.bak') or die $!;
my $str = do {local $/; <IN>};
close IN;
$str =~ s/($regexp)/do_subst($1, $old, $new)/ge;
open(OUT, '>'.$file) or die $!;
print OUT $str;
close OUT;
}
sub do_subst {
my ( $old, $old_base, $new_base ) = #_;
my $i = $old - $old_base;
my $new = $new_base + $i;
return $new;
}
Note: Can probably be made more efficient by using Regexp::Assemble

Perl daemon doesn't go through entire loop

I'm trying to make a program that works like a very simple server in Perl.
The program itself is meant to work as a library catalogue, giving the user options of searching for books by title or author, and borrowing or returning books. The list of books is provided in a separate file.
Basically, it's supposed to take requests (files) from "Requests" folder, process them, and then give answers (also files) in "Answers" folder. After the process is over, it deletes the old requests and repeats the process (answers are deleted by the client after they are accepted).
It's meant to run as a daemon, but for some reason only the loop responsible for deleting the request files works in the background - the requests are not processed into answers, but are just deleted. Whenever a new request appears, it's almost immediately deleted.
I'm learning to use daemons and tried to emulate what is in this thread.
#!/usr/bin/perl
use warnings;
use strict;
use Proc::Daemon;
#FUNCTIONS DEFINTIONS
sub FindAuthor
{
#try to find book by this author in the catalogue
}
sub FindTitle
{
#try to find book with this title in the catalogue
}
sub CheckIfCanBeReturned
{
#check if the book is borrowed and by whom
}
#attempt at daemonization
Proc::Daemon::Init;
my $continueWork = 1;
$SIG{TERM} = sub { $continueWork = 0 };
while ( $continueWork )
{
sleep(2);
my #RequestFilesArray = `ls /home/Ex5/Requests`;
#list all requests currently in the Request folder
for ( my $b = 0; $b < #RequestFilesArray; $b++)
{
my $cut = `printf "$RequestFilesArray[$b]" | wc -m`;
$cut = $cut - 1;
$RequestFilesArray[$b] = substr $RequestFilesArray[$b], 0, $cut;
}
#the requests are formatted in such way,
#that the first 2 letters indicate what the client wants to be done
#and the rest is taken parameters used in the processing
for (my $i = 0; $i < #RequestFilesArray; $i++)
{
my $UserRequest = `tail -1 Requests/$RequestFilesArray[$i]`;
my $fix = `printf "$UserRequest" | wc -m`;
$fix = $fix - 1;
$UserRequest = substr $UserRequest, 0, $fix;
my $RequestType = substr $UserRequest, 0, 2;
my $RequestedValue = substr $UserRequest, 3;
my $RequestNumber = $i;
if ($RequestType eq "fa")
{
#FIND BY AUTHOR
my #results = FindAuthor ($RequestedValue);
my $filename = "/home/Ex5/Answers/" . $RequestFilesArray[$RequestNumber];
open (my $answerFile, '>', $filename) or die "$!";
for (my $a = 0; $a < #results; $a++)
{
print $answerFile $results[$a],"\n";
}
close $answerFile;
}
elsif ($RequestType eq "ft")
{
#FIND BY TITLE
my #results = FindTitle ($RequestedValue);
my $filename = "/home/Ex5/Answers/" . $RequestFilesArray[$RequestNumber];
open ( my $answerFile, '>', $filename) or die "$!";
for (my $a = 0; $a < #results; $a++)
{
print $answerFile $results[$a],"\n";
}
close $answerFile;
}
elsif ($RequestType eq "br")
{
#BOOK RETURN
my $result = CheckIfCanBeReturned ($RequestedValue, $RequestFilesArray[$RequestNumber]);
my $filename = "/home/Ex5/Answers/" . $RequestFilesArray[$RequestNumber];
open ( my $answerFile, '>', $filename) or die "$!";
print $answerFile $result;
close $answerFile;
}
elsif ($RequestType eq "bb")
{
#BOOK BORROW
my $result = CheckIfCanBeBorrowed ($RequestedValue, $RequestFilesArray[$RequestNumber]);
my $filename = "/home/Ex5/Answers/" . $RequestFilesArray[$RequestNumber];
open ( my $answerFile, '>', $filename) or die "$!";
print $answerFile $result;
close $answerFile;
}
else
{
print "something went wrong with this request";
}
}
#deleting processed requests
for ( my $e = 0; $e < #RequestFilesArray; $e++)
{
my $removeReq = "/home/Ex5/Requests/" . $RequestFilesArray[$e];
unlink $removeReq;
}
#$continueWork =0;
}
You have written way too much code before attempting to test it. You have also started shell processes at every opportunity rather than learning the correct way to achieve things in Perl
The first mistake is to use ls to discover what jobs are waiting. ls prints multiple files per line, and you treat the whole of each line as a file name, using the bizarre printf "$RequestFilesArray[$b]" | wc -m instead of length $RequestFilesArray[$b]
Things only get worse after that
I suggest the following
Start again from scratch
Write your program in Perl. Perl isn't a shell language
Advance in very small increments, making sure that your code compiles and does what it is supposed to every three or four lines. It does wonders for the confidence to know that you're enhancing working code rather than creating a magical sequence of random characters
Learn how to debug. You appear to be staring at your code hoping for inspiration to strike in the manner of someone staring at their car engine in the hope of seeing why it won't start
Delete request files as part of processing the request, and only once the request has been processed and the answer file successfully written. It shouldn't be done in a separate loop
Taking what you provided, here's some pseudocode I've devised for you that you can use as somewhat of a template. This is by NO MEANS exhaustive. I think the advice #Borodin gave is sound and prudent.
This is all untested and much of the new stuff is pseudocode. However, hopefully, there are some breadcrumbs from which to learn. Also, as I stated above, your use of Proc::Daemon::Init is suspect. At the very least, it is so minimally used that it is gobbling up whatever error(s) is/are occurring and you've no idea what's wrong with the script.
#!/usr/bin/perl -wl
use strict;
use File::Basename;
use File::Spec;
use Proc::Daemon;
use Data::Dumper;
# turn off buffering
$|++;
#FUNCTIONS DEFINTIONS
sub FindAuthor
{
#try to find book by this author in the catalogue
}
sub FindTitle
{
#try to find book with this title in the catalogue
}
sub CheckIfCanBeReturned
{
#check if the book is borrowed and by whom
}
sub tail
{
my $file = shift;
# do work
}
sub find_by
{
my $file = shift;
my $val = shift;
my $by = shift;
my #results;
my $xt = 0;
# sanity check args
# do work
if ( $by eq 'author' )
{
my #results = FindByAuthor(blah);
}
elsif ( $by eq 'blah' )
{
#results = blah();
}
#...etc
# really should use File::Spec IE
my $filename = File::Spec->catfile('home', 'Ex5', 'Answers', $file);
# might be a good idea to either append or validate you're not clobbering
# an existent file here because this is currently clobbering.
open (my $answerFile, '>', $filename) or die "$!";
for ( #results )
{
print $answerFile $_,"\n";
}
close $answerFile;
# have some error checking in place and set $xt to 1 if an error occurs
return $xt;
}
#attempt at daemonization
# whatever this is is completely broken methinks.
#Proc::Daemon::Init;
my $continueWork++;
my $r_dir = '/home/user/Requests';
$SIG{TERM} = sub { $continueWork = 0 };
# going with pseudocode
while ( $continueWork )
{
#list all requests currently in the Request folder
my #RequestFilesArray = grep(/[^\.]/, <$r_dir/*>);
#the requests are formatted in such way,
#that the first 2 letters indicate what the client wants to be done
#and the rest is taken parameters used in the processing
for my $request_file ( #RequestFilesArray )
{
my $result = 0;
$request_file = basename($request_file);
my $cut = length($request_file) - 1;
my $work_on = substr $request_file, 0, $cut;
my $UserRequest = tail($request_file);
my $fix = length($UserRequest) - 1;
$UserRequest = substr $UserRequest, 0, $fix;
my $RequestType = substr $UserRequest, 0, 2;
my $RequestedValue = substr $UserRequest, 3;
if ($RequestType eq "fa")
{
#FIND BY AUTHOR
$result = find_by($request_file, $RequestedValue, 'author');
}
elsif ($RequestType eq "ft")
{
#FIND BY TITLE
$result = find_by($request_file, $RequestedValue, 'title');
}
elsif ($RequestType eq "br")
{
#BOOK RETURN
$result = CheckIfCanBeReturned ($RequestedValue, $request_file) or handle();
}
elsif ($RequestType eq "bb")
{
#BOOK BORROW
$result = CheckIfCanBeBorrowed ($RequestedValue, $request_file) or handle();
}
else
{
print STDERR "something went wrong with this request";
}
}
#deleting processed requests
if ( $result == 1 )
{
unlink $work_on;
}
sleep(2);
}
Take special note to my "mild" attempt and DRYing up your code by using the find_by subroutine. You had a LOT of duplicate code in your original script, which I moved into a single sub routine. DRY eq 'Don't Repeat Yourself'.

Perl, using a linux command on users stored in a hash

Here is the code:
#!/usr/bin/perl
use warnings;
use strict;
use utf8;
my #temparray;
my $count = 0;
my #lastarray;
my $lastbash;
#Opens the file /etc/shadow and puts the users with an uid over 1000 but less that 65000 into an array.
open( my $passwd, "<", "/etc/passwd") or die "/etc/passwd failed to open.\n";
while (my $lines = <$passwd>) {
my #splitarray = split(/\:/, $lines );
if( $splitarray[2] >= 1000 && $splitarray[2] < 65000) {
$temparray[$count] =$splitarray[0];
print "$temparray[$count]\n";
$count++;
}
}
close $passwd;
foreach (#temparray) {
$lastbash = qx(last $temparray);
print "$lastbash\n";
}
What I want to do is use the built in linux command "last" on all the users stored in the #temparray. And i want the output to be like this:
user1:10
user2:22
Where 22 and 10 being the number of times they logged in. How can I achieve this ?
I have tried several different ways but I always end up with errors.
The following should perform the task as requested:
#!/usr/bin/perl
use warnings;
use strict;
use utf8;
my #temparray;
my $count = 0;
my #lastarray;
my $lastbash;
#Opens the file /etc/shadow and puts the users with an uid over 1000 but less that 65000 into an array.
open( my $passwd, "<", "/etc/passwd") or die "/etc/passwd failed to open.\n";
while (my $lines = <$passwd>) {
my #splitarray = split(/\:/, $lines );
if( $splitarray[2] >= 1000 && $splitarray[2] < 65000) {
$temparray[$count] =$splitarray[0];
print "$temparray[$count]\n";
$count++;
}
}
close $passwd;
foreach (#temparray) {
my #lastbash = qx(last $_); #<----Note the lines read in go to the $_ variable. Note use of my. You also read the text into array.
print $_.":".#lastbash."\n"; #<----Note the formatting. Reading #lastbash returns the number of elements.
}
You don't really need the $count, you could just do push #temparray, $splitarray[0].
That said, I'm not sure why you need #temparray either... You can just run the command against the users as you find them.
my $passwd = '/etc/passwd';
open( my $fh, '<', $passwd )
or die "Could not open file '$passwd' : $!";
my %counts;
# Get `last` counts and store them %counts
while ( my $line = <$fh> ) {
my ( $user, $uid ) = ( split( /:/, $line ) )[ 0, 2 ];
if ( $uid >= 1000 && $uid < 65000 ) {
my $last = () = qx{last $user};
$counts{$user} = $last
}
}
close $fh;
# Sort %counts keys by value (in descending order)
for my $user ( sort { $counts{$b} <=> $counts{$a} } keys %counts ) {
printf "%s:\t %3d\n", $user, $counts{$user};
}

Calculating the Mean from aPerl Script

I m still in here. ;)
I've got this code from a very expert guy, and I'm shy to ask him this basic questions...anyway this is my question now; this Perl Script prints the median of a column of numbers delimited space, and, I added some stuff to get the size of it, now I'm trying to get the sum of the same column. I did and got not results, did I not take the right column? ./stats.pl 1 columns.txt
#!/usr/bin/perl
use strict;
use warnings;
my $index = shift;
my $filename = shift;
my $columns = [];
open (my $fh, "<", $filename) or die "Unable to open $filename for reading\n";
for my $row (<$fh>) {
my #vals = split/\s+/, $row;
push #{$columns->[$_]}, $vals[$_] for 0 .. $#vals;
}
close $fh;
my #column = sort {$a <=> $b} #{$columns->[$index]};
my $offset = int($#column / 2);
my $length = 2 - #column % 2;
my #medians = splice(#column, $offset, $length);
my $median;
$median += $_ for #medians;
$median /= #medians;
print "MEDIAN = $median\n";
################################################
my #elements = #{$columns->[$index]};
my $size = #elements;
print "SIZE = $size\n";
exit 0;
#################################################
my $sum = #{$columns->[$index]};
for (my $size=0; $size < $sum; $size++) {
my $mean = $sum/$size;
};
print "$mean\n";
thanks in advance.
OK some pointers to get you going :
You can put all the numbers into an array :
my #result = split(m/\d+/, $line);
#average
use List::Util qw(sum);
my $sum = sum(#result);
You can then access individual columns with $result[$index] where index is the number of column you want to access.
Also note that :
$total = $line + $total;
$count = $count + 1;
Can be rewritten as :
$total += $line;
$count += 1;
Finally make sure that you are reading the file :
put a "debugging" print into the while loop :
print $line, "\n";
This should get you going :)

Resources