Reassembling chunks of a binary file in Perl in a multithreaded script - multithreading

I'm trying to break a binary file into chunks for a multipart upload and then have them reassembled correctly. I'm using threads to try to speed up transfers. The result hasn't validated using md5sum so I decided to try to just to reassemble it locally and see what happens. The local result comes out ok with a small text file, but not a large binary file, when multithreaded. It works either way when single threaded.
Here's a simple (admittedly inefficient) implementation of a single threaded version that works:
use strict;
use POSIX;
my $file = 'test.txt';
my $chunkSize = 3;
my $size = -s $file;
my $parts = ceil($size / $chunkSize);
my %output;
open(my $fileHandle, '<', $file) or die("Error reading file, stopped");
binmode($fileHandle);
for (my $i = 0; $i < $parts; $i++) {
my $chunk;
my $offset = $i * $chunkSize;
seek ($fileHandle, $offset, 0);
if ($chunkSize > $size) {
$chunkSize = $size;
}
elsif (($chunkSize + $offset) > $size) {
$chunkSize = $size - $offset;
}
my $length = read($fileHandle, $chunk, $chunkSize);
say STDERR "Chunk size for $i: " . $chunkSize . "(" . ($chunkSize + $offset) . "/$size) starting at " . $offset .": ";# . $chunk;
$output{ $i } = $chunk;
}
close($fileHandle);
open(my $fileHandle, '>', $file . '.comp') or die("Error reading file, stopped");
binmode($fileHandle);
for (1...$parts) {
print $fileHandle $output{ $_ };
}
close ($fileHandle);
Here's the multithreaded version that does not:
use strict;
use POSIX;
use threads;
use threads::shared;
use Thread::Queue;
my $file = '/backup/2022-12-13/accounts/test.tar.gz';
my $chunkSize = 500 * 1024 * 1024;
#Or my $chunkSize = 3; when using it with the smaller text test file.
my $size = -s $file;
my $parts = ceil($size / $chunkSize);
my %output :shared;
my #queueData;
for (0...$parts) { push (#queueData, $_); }
my $overallProgress :shared;
my $queue = Thread::Queue->new(#queueData);
open(my $fileHandle, '<', $file) or die("Error reading file, stopped");
binmode($fileHandle);
my #threads;
for(1..4) {
push #threads, threads->create( sub {
while( my $i = $queue->dequeue_nb ) {
my $chunk;
my $offset = ($i - 1) * $chunkSize;
seek ($fileHandle, $offset, 0);
if ($chunkSize > $size) {
$chunkSize = $size;
}
elsif (($chunkSize + $offset) > $size) {
$chunkSize = $size - $offset;
}
my $length = read($fileHandle, $chunk, $chunkSize);
say STDERR "Chunk size for $i: " . $chunkSize . "(" . ($chunkSize + $offset) . "/$size) starting at " . $offset .": ";
$output{ $i } = $chunk;
}
});
}
$_->join for #threads;
close($fileHandle);
open(my $fileHandle, '>', $file . '.comp') or die("Error reading file, stopped");
binmode($fileHandle);
for (1...$parts) {
print "Reassembling part $_...";
print $fileHandle $output{ $_ };
}
close ($fileHandle);
If I run that code on this simple text file, it works:
1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
Result of md5sum on both source and destination file:
6d8d0b2c6a06f88e13167fffacb1b74e /home/tbutler/text.txt
6d8d0b2c6a06f88e13167fffacb1b74e /home/tbutler/text.txt.comp
But, if I take a 9 GB tarball (and increase the chunk size to 500 MB by changing the appropriate line to my $chunkSize = 500 * 1024 * 1024) and run it likewise, the md5sum result is not matching. In fact the resultant file is approximately a gigabyte smaller than the original file and I can't figure out why.

Related

Waiting for a defined period of time for the input in Perl

What is the way to say
wait for the 10 sec for the input
if no input recognized
print something
in Perl?
IO::Select and can_read with a timeout.
#!/usr/bin/env perl
use strict;
use warnings;
use IO::Select;
my $select = IO::Select->new();
$select->add( \*STDIN );
my $input = "NONE";
if ( $select->can_read(10) ) {
$input = <STDIN>;
}
print "Got input of $input\n";
You can use alarm(),
use strict;
use warnings;
sub timeout {
my ($f, $sec) = #_;
return eval {
local $SIG{ALRM} = sub { die };
alarm($sec);
$f->();
alarm(0);
1;
};
}
my $text;
my $ok = timeout(sub{ $text = <STDIN>; }, 10);
if ($ok) {
print "input: $text";
}
else {
print "timeout occurred\n";
}

Count Entries in hash and how frequent they change

I have a script that reads CAN bus signals and displays them to screen but I need to add a signal counter and frequency.
So I need to count how many times has this $id appeared so far and how many milliseconds ago did was it added to the hash table.
#!/usr/bin/perl -w
use strict;
open my $can_bus, "./receivetest |"
or die "Couldn't read from CAN bus: $!";
my %last_value;
while (<$can_bus>) {
if( /^(........:...) (........) (.*)$/ )
{
my ($something, $id, $payload) = ($1,$2,$3);
$last_value{ $id } = $payload;
system('clear'); #clear screen
#Print Table
for my $id (sort keys %last_value) {
print "$id\t $last_value{ $id }\n";
}
}
else {
warn "ignore unknown line: ";
warn $_;
}
}
This is my code so far.
You can store different values for one $id key if you expand your hash by adding more keys after the $id key. For example:
if (defined $last_value{ $id } ){
$last_value{ $id }{COUNT} += 1;
my $time_diff = $now_milli - $last_value{ $id }{TIME};
$last_value{ $id }{TIME} = $now_milli;
$last_value{ $id }{DIFF} = $time_diff;
$last_value{ $id }{PAYLOAD} = $payload;
}else{
$last_value{ $id }{TIME} = $now_milli;
$last_value{ $id }{DIFF} = "NA";
$last_value{ $id }{COUNT} = 1;
$last_value{ $id }{PAYLOAD} = $payload;
}
To get current time in milliseconds you can use Time::HiRes qw/gettimeofday/ which is a part of Perl core:
use Time::HiRes qw/gettimeofday/;
my $now_milli = 1000 * gettimeofday();
Finally, to print information stored in the %last_value hash:
foreach my $id (keys %last_value){
print "ID: ", $id, "\n";
print "Count: ", $last_value{$id}{COUNT}, "\n";
print "Time from last: ", $last_value{$id}{DIFF}, "\n";
print "Payload: ", $last_value{$id}{PAYLOAD}, "\n";
}

Segmentation fault on merging threads (Perl)

I had some working code that I've tried to multithread using the tutorial on dreamincode: http://www.dreamincode.net/forums/topic/255487-multithreading-in-perl/
The example code there seems to work fine, but I can't for the life of me work out why mine isn't. From putting in debug messages it seems to get all the way to the end of the subroutine with all of the threads, and then sit there for a while before hitting a segmentation fault and dumping the core. That being said I've also not managed to find the core dump files anywhere (Ubuntu 13.10).
If anyone has any suggested reading, or can see the error in the rather messy code below I'd be eternally grateful.
#!/usr/bin/env perl
use Email::Valid;
use LWP::Simple;
use XML::LibXML;
use Text::Trim;
use threads;
use DB_File;
use Getopt::Long;
my $sourcefile = "thislevel.csv";
my $startOffset = 0;
my $chunk = 10000;
my $num_threads = 8;
$result = GetOptions ("start=i" => \$startOffset, # numeric
"chunk=i" => \$chunk, # numeric
"file=s" => \$sourcefile, # string
"threads=i" => \$num_threads, #numeric
"verbose" => \$verbose); # flag
$tie = tie(#filedata, "DB_File", $sourcefile, O_RDWR, 0666, $DB_RECNO)
or die "Cannot open file $sourcefile: $!\n";
my $filenumlines = $tie->length;
if ($filenumlines>$startOffset + $chunk){
$numlines = $startOffset + $chunk;
} else {
$numlines = $filenumlines;
}
open (emails, '>>emails.csv');
open (errorfile, '>>errors.csv');
open (nxtlvl, '>>nextlevel.csv');
open (donefile, '>>donelines.csv');
my $line = '';
my $found = false;
my $linenum=0;
my #threads = initThreads();
foreach(#threads){
$_ = threads->create(\&do_search);
}
foreach(#threads){
$_->join();
}
close nxtlvl;
close emails;
close errorfile;
close donefile;
sub initThreads{
# An array to place our threads in
my #initThreads;
for(my $i = 1;$i<=$num_threads;$i++){
push(#initThreads,$i);
}
return #initThreads;
}
sub do_search{
my $id = threads->tid();
my $linenum=$startOffset-1+$id;
my $parser = XML::LibXML->new();
$parser->set_options({ recover => 2,
validation => 0,
suppress_errors => 1,
suppress_warnings => 1,
pedantic_parser => 0,
load_ext_dtd => 0, });
while ($linenum < $numlines) {
$found = false;
#full_line = split ',', $filedata[$linenum-1];
$line = trim(#full_line[1]);
$this_url = trim(#full_line[2]);
print "Thread $id Scanning $linenum of $filenumlines\: ";
printf "%.3f\%\n", 100 * $linenum / $filenumlines;
my $content = get trim($this_url);
if (!defined($content)) {
print errorfile "$this_url, no content\n";
}elsif (length($content)<100) {
print errorfile "$this_url, short\n";
}else {
my $doc = $parser->load_html(string => $content);
if(defined($doc)){
for my $anchor ( $doc->findnodes("//a[\#href]") )
{
$is_email = substr $anchor->getAttribute("href") ,7;
if(Email::Valid->address($is_email)) {
printf emails "%s, %s\n", $line, $is_email;
$found = true;
} else{
$link = $anchor->getAttribute("href");
if (substr lc(trim($link)),0,4 eq "http"){
printf nxtlvl "%s, %s\n", $line, $link;
} else {
printf nxtlvl "%s, %s/%s\n", $line, $line, $link;
}
}
}
}
if ($found=false){
my #lines = split '\n',$content;
foreach my $cline (#lines){
my #words = split ' ',$cline;
foreach my $word (#words) {
my #subwords = split '"',$word ;
foreach my $subword (#subwords) {
if(Email::Valid->address($subword)) {
printf emails "%s, %s\n", $line, $subword;
}
}
}
}
}
}
printf donefile "%s\n",$linenum;
$linenum = $linenum + $num_threads;
}
threads->exit();
}
In addition to sundry coding errors that mean my code should never ever be used as an example for other visitors, DB_File is not a thread-safe module.
Annoyingly, and perhaps misleadingly, it works absolutely as it should do right up until you close the threads that have been successfully accessing the file throughout your code.

script does not fetch properly

#!/usr/bin/perl
if (! eval "require LWP::UserAgent;")
{
$ret = "LWP::UserAgent not found";
}
if ( exists $ARGV[0]) {
if ($ret)
{
print "no ($ret)\n";
exit 1;
}
my $ua = LWP::UserAgent->new(timeout => 5);
my $response = $ua->request(HTTP::Request->new('GET',$ARGV[0]));
my #content = split (/\n/, $response->content);
my $active_connections = -1;
if ($content[0] =~ /^Active connections:\s+(\d+)\s*$/i) {
$active_connections = $1;
}
my $accepts = -1;
my $handled = -1;
my $requests = -1;
if ($content[2] =~ /^\s+(\d+)\s+(\d+)\s+(\d+)\s*$/) {
$accepts = $1;
$handled = $2;
$requests = $3;
}
my $reading = -1;
my $writing = -1;
my $waiting = -1;
if ($content[3] =~ /Reading: (\d+) Writing: (\d+) Waiting: (\d+)\s*$/) {
$reading = $1;
$writing = $2;
$waiting = $3;
}
print "nginx_active:$active_connections nginx_reading:$reading nginx_writing:$writing nginx_waiting:$waiting ";
print "\n";
}
My nginx status page:
Active connections: 2
server accepts handled requests
2 2 2
Reading: 0 Writing: 1 Waiting: 1
running:
./get_nginx_clients_status.pl http://IP/nginx_status
nginx_active:-1 nginx_reading:-1 nginx_writing:-1 nginx_waiting:-1
I should be getting:
nginx_accepts:113869 nginx_handled:113869 nginx_requests:122594
Any idea what could be wrong?
Your code states:
my $accepts = -1;
my $handled = -1;
my $requests = -1;
And your output is:
nginx_active:-1 nginx_reading:-1 nginx_writing:-1 nginx_waiting:-1
Therefore, because you test:
$content[2] =~ /^\s+(\d+)\s+(\d+)\s+(\d+)\s*$/
... it follows that $content[2] does not contain the information your regular expression demands. Have you tried adding print $content[2] . "\n"; to help debug your script?
The Debugger
Please consider use of the Perl debugger. It will answer obvious problems very quickly for you.
Start your script using the -d flag. Then step through your code one line at a time using n to go to the next line, l to list the code at the current point, and x #content (for example) to list the contents of variables each step of the way.

Perl help: Can't call method "display" on an undefined value

Let start with the basic back ground. We recently brought our web hosting in house.
There are few old website still use Perl. I have no experience with Perl.
Let's Begin. We have a this sub website on our main domain.
Public link : http://www.gatewayrehab.org/eap/
When you goto website we get the following error message
"Software error:
Can't call method "display" on an undefined value at /var/www/www.gatewayrehab.org/app/webroot/eap/index.cgi line 47."
Looking at the EAP website/directory all files look in place with proper permission, again I have no experience with Perl/Cgi. Below is the index.cgi file :
#!/usr/bin/perl -w
### the main control file used in the system
BEGIN { unshift #INC, qw(./cgi-bin/include/); }
### send all fatal errors to the browser
use CGI::Carp qw(fatalsToBrowser);
use CGI qw(:standard);
use Error_Handler;
use File_Handler;
use Cookie_Handler;
require "./cgi-bin/setup.cgi";
do "./cgi-bin/include/common.cgi";
### initialize the file handling module
my $File = new File_Handler;
### initialize the cookie handling module
my $Cookie = new Cookie_Handler;
###parse
$ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/;
if ($ENV{'REQUEST_METHOD'} eq "POST"){
read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
} else {
$buffer = $ENV{'QUERY_STRING'};
}
#pairs = split(/&/, $buffer);
#&error_check;
foreach $pair (#pairs){
($name, $value) = split(/=/, $pair);
$value =~ tr/+/ /;
$value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
$name =~ tr/+/ /;
$name =~ s/\breq\_//ig;
$name =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
$name =~ tr/A-Z/a-z/;
$name = trim($name);
$FORM{$name} = trim($value);
}
my %cookiedata = $Cookie -> get_cookies();
### read the summary database
my $summary_ref = $File -> read($login_summary)|| $Error -> display("$!". __LINE__);
my (#summary) = #$summary_ref;
### read the companies database
my $companies_ref = $File -> read($companies_db)|| $Error -> display("$!". __LINE__);
my (#companies) = #$companies_ref;
my %COMP = ();
foreach (#companies) {
$_ =~ s/\n|\r//g;
my ($c_num, $c_name) = split(/\t/, $_);
$COMP{$c_num} = $c_name;
}
if ( $cookiedata{'LOGIN'} != 1 ) {
my $found = 0;
my $company_number = $ENV{'REMOTE_USER'};
$company_number =~ s/s|e|w//g;
foreach (#summary) {
$_ =~ s/\n|\r//g;
my #field = split(/\t/, $_);
$field[0] = &trim($field[0]);
$field[2] = &trim($field[2]);
$field[3] = &trim($field[3]);
$field[4] = &trim($field[4]);
$field[5] = &trim($field[5]);
$field[6] = &trim($field[6]);
if ( $field[0] eq "$company_number" ) {
$found = 1;
my $firstletters = substr($ENV{'REMOTE_USER'}, 0, 2);
$firstletters = trim($firstletters);
if ( $firstletters ne "sw" && $firstletters ne "lf" ) {
$firstletters = substr($firstletters, 0, 1);
}
if ( lc($firstletters) eq "e" ) {
$field[3] = ($field[3] + 1);
} elsif ( lc($firstletters) eq "s" ) {
$field[2] = ($field[2] + 1);
} elsif ( lc($firstletters) eq "w" ) {
$field[4] = ($field[4] + 1);
} elsif ( lc($firstletters) eq "sw" ) {
$field[5] = ($field[2] + 1);
} elsif ( lc($firstletters) eq "lf" ) {
$field[6] = ($field[6] + 1);
} else {
$field[3] = ($field[3] + 1);
}
$_ = join("\t", #field);
}
}
if ( $found == 1 ) {
# write data back to file
# append to summary file
open(LOG, ">$login_summary") || $Error -> display("$!". __LINE__);
flock(LOG,2);
foreach (#summary) {
print LOG $_ ."\n";
}
flock(LOG,8);
close(LOG);
#$File -> file($login_summary);
#$File -> data(\#summary);
#$File -> write() || $Error -> display("$!". __LINE__);
} else {
$e = 0;
$s = 0;
$w = 0;
$sw = 0;
$lf = 0;
my $firstletters = substr($ENV{'REMOTE_USER'}, 0, 2);
$firstletters = trim($firstletters);
if ( $firstletters ne "sw" && $firstletters ne "lf" ) {
$firstletters = substr($firstletters, 0, 1);
}
if ( lc($firstletters) eq "e" ) {
$e = 1;
} elsif ( lc($firstletters) eq "s" ) {
$s = 1;
} elsif ( lc($firstletters) eq "w" ) {
$w = 1;
} elsif ( lc($firstletters) eq "sw" ) {
#$sw = 1;
$s = 1;
} elsif ( lc($firstletters) eq "lf" ) {
$lf = 1;
} else {
$e = 1;
}
# append to summary file
open(LOG, ">>$login_summary") || $Error -> display("$!". __LINE__);
flock(LOG,2);
print LOG $company_number ."\t". $COMP{$company_number} ."\t". $s ."\t". $e ."\t". $w . "\t". $sw ."\t". $lf ."\n";
flock(LOG,8);
close(LOG);
}
my (#login_logs) = ();
my $logline = "";
$login_logs[0] = $ENV{'REMOTE_USER'};
$login_logs[1] = $ENV{'REMOTE_ADDR'};
$login_logs[2] = time();
open(LOG, ">>$login_logs") || $Error -> display("$!". __LINE__);
flock(LOG,2);
print LOG $ENV{'REMOTE_USER'} ."\t". $ENV{'REMOTE_ADDR'} ."\t". time() ."\n";
flock(LOG,8);
close(LOG);
print "Set-Cookie: LOGIN=1";
print "; path=$cookiepath; domain=$cookiedomain;\n";
}
my $firstletters = substr($ENV{'REMOTE_USER'}, 0, 2);
$firstletters = trim($firstletters);
if ( $firstletters ne "sw" && $firstletters ne "lf") {
$firstletters = substr($firstletters, 0, 1);
}
if ( lc($firstletters) eq "e" ) {
print "Location: http://www.gatewayrehab.org/eap/new/employee/member.htm\n\n";
} elsif ( lc($firstletters) eq "s" ) {
print "Location: http://www.gatewayrehab.org/eap/supervisor/\n\n";
} elsif ( lc($firstletters) eq "w" ) {
print "Location: http://www.gatewayrehab.org/eap/new/worklife/member.htm\n\n";
} elsif ( lc($firstletters) eq "sw" ) {
print "Location: http://www.gatewayrehab.org/eap/supervisor-wl/\n\n";
exit;
} elsif ( lc($firstletters) eq "lf" ) {
print "Location: http://www.gatewayrehab.org/eap/legalandfinancial/\n\n";
exit;
} else {
print "Location: http://www.gatewayrehab.org/eap/new/employee/member.htm\n\n";
}
#output html
print "Content-type: text/html\n\n";
print "<h1>hello world!</h1>";
$e = `perl -ver`;
$r = `whereis perl5`;
$z = `whereis sendmail`;#
$w = `top`;#
$d = `w`;
print "<pre>perl version:<br>$e<hr>perl path:<br>$r<hr>sendmail path:<br>$z<hr>top:<br>$w<hr>w:<br>$d<hr>environment vars:<br>";##
while (($key, $val) = each %ENV) {
print "$key = $val\n";
}
$x= 'lowercase';
print "<hr>path tranlsated(NT)<br>$ENV{'PATH_TRANSLATED'}</pre>";
#$x = uc($x);
print "<br>$x";
exit;
Please let me know what I am missing. If you need to look at more "included" files let me know.
Also here is the link for our cgi config. http://www.gatewayrehab.org/eap/cgi-bin/cgi.cgi
Thank You.
The error comes from this line: my $summary_ref = $File -> read($login_summary)|| $Error -> display("$!". __LINE__);. It means $Error doesn't exist or its value is undef. And indeed, I don't see such a variable being declared or initialised. Maybe it's suppose to be exported by Error_Handler???
This error is happening when trying to report another error. You could try replacing (if only temporarily) $Error -> display("$!". __LINE__); with die($!) and checking your server's error log for the error message. That said, it's surely "No such file or directory" or "Permission denied", so maybe it's not worth the time to find out the exact message. (Upd: Actually, I think the message will be "redirected" to your browser, so that makes things easier.)
I'm guessing here, but it looks like it's trying to read the file named by $login_summary. I have no idea where this is set (if at all!), so you might want to find out its value, and maybe where it's getting set.
As ikegami pointed out, the error you are seeing indicates that $Error is not being initialized, and looking at the rest of the script, I would guess that what is needed (first of all) is to initialize it in the same manner as the $File and $Cookie variables. Add this line after line 20 in your script:
my $Error = new Error_Handler;
That might give you a nicer error message, but it will probably just tell you what you already discovered when you added your die($!); line: 'No such file or directory'.
Your script is also doing a file called ./cgi-bin/include/common.cgi. Check this file for your $login_summary variable, to know what file it's trying to access.
I have resolved the problem with a quick fix. I don't know why it work but it does for me.
Here what i did...After reading online i found that adding "-w" on the header of all .(dot)cgi files make it work.
I do hope there is a better method to add "-w" in one place then adding it on all .cgi files.
In short change #!/usr/bin/perl to #!/usr/bin/perl -w
Thanks all.

Resources