Perl: remove a string block from a file and save the file

I have a file that looks like this:
string 1 {
abc { session 1 }
fairPrice {
ID LU0432618274456
Source 4
service xyz
}
}
string 2 {
abc { session 23 }
fairPrice {
ID LU036524565456171
Source 4
service tzu
}
}
My program should read the file, take a search parameter (for example "string 1"), find the complete block up to its closing "}", and remove that block from the file. Can someone assist with that? I have some code so far, but how can I do the removal and save the result back to the same file?
my $fh = IO::File->new( "$fname", "r" ) or die ( "ERROR: Strategy file \"$fname\" not found." );
while($line=<$fh>)
{
if ($line =~ /^\s*string 1\s*\w+\s*\{\s*$/) {
$inside_json_msg = 1;
$msg_json .= $line;
}
else {
if ($inside_json_msg)
{
if ($line =~ m/^\}\s*$/) {
$msg_json.= $line if defined($line);
$inside_json_msg = 0;
} else {
$msg_json .= $line;
}
}
}
}

Your code mentions JSON, but your data isn't JSON. If it is JSON and you've just transcribed it badly, then please use a JSON library.
But if your data isn't JSON, then something like this will do the trick.
#!/usr/bin/perl
use strict;
use warnings;
my $match = shift or die "I need a string to match\n";
while (<DATA>) {
# If this is the start of a block we want to remove...
if (/^\s*\Q$match\E\s+\{/) {
# Set $braces to 1 (or 0 if the block closes on this line)
my $braces = /}/ ? 0 : 1;
# While $braces is non-zero
while ($braces) {
# Read the next line of the file
$_ = <DATA>;
# Increment or decrement $braces as appropriate
$braces-- if /}/;
$braces++ if /{/;
}
} else {
# Otherwise, just print the line
print;
}
}
__DATA__
string 1 {
abc { session 1 }
fairPrice {
ID LU0432618274456
Source 4
service xyz
}
}
string 2 {
abc { session 23 }
fairPrice {
ID LU036524565456171
Source 4
service tzu
}
}
Currently, this just prints the output to the console. And I use the DATA filehandle for easier testing. Switching to use real filehandles is left as an exercise for the reader :-)
Update: I decided that I didn't like all the incrementing and decrementing of $braces using regex matches. So here's another (improved?) version that uses y/.../.../ to count the occurrences of opening and closing braces in the line. It's possible that this version might be slightly less readable (the syntax highlighter certainly thinks so).
#!/usr/bin/perl
use strict;
use warnings;
my $match = shift or die "I need a string to match\n";
while (<DATA>) {
if (/^\s*\Q$match\E\s+\{/) {
my $braces = y/{// - y/}//;
while ($braces) {
$_ = <DATA>;
$braces -= y/}//;
$braces += y/{//;
}
} else {
print;
}
}
__DATA__
string 1 {
abc { session 1 }
fairPrice {
ID LU0432618274456
Source 4
service xyz
}
}
string 2 {
abc { session 23 }
fairPrice {
ID LU036524565456171
Source 4
service tzu
}
}
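The counting works because `y///` (an alias for `tr///`) returns the number of characters it matched and, with an empty replacement list, leaves the string unchanged. A small sketch of the per-line arithmetic the loop relies on; `brace_delta` is a hypothetical helper name, not part of the answer:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Net change in brace depth for one line: opening braces minus closing braces.
# tr/// with an empty replacement list only counts; it does not modify $line.
sub brace_delta {
    my ($line) = @_;
    my $open  = ($line =~ tr/{//);
    my $close = ($line =~ tr/}//);
    return $open - $close;
}

for my $line ('string 1 {', 'abc { session 1 }', '}') {
    printf "%-20s %+d\n", $line, brace_delta($line);
}
```

A block is finished as soon as the running total of these deltas, starting from the header line, returns to zero.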
Update 2: Ok, I originally said that dealing with real filehandles would be left as an exercise for the reader. But here's a version that does that.
#!/usr/bin/perl
use strict;
use warnings;
my $match = shift or die "I need a string to match\n";
open my $fh, '+<', 'data' or die $!;
# Read all the data from the file
my @data = <$fh>;
# Empty the file
seek $fh, 0, 0;
truncate $fh, 0;
my $x = 0;
while ($x <= $#data) {
$_ = $data[$x++];
if (/^\s*\Q$match\E\s+\{/) {
my $braces = y/{// - y/}//;
while ($braces) {
$_ = $data[$x++];
$braces -= y/}//;
$braces += y/{//;
}
} else {
print $fh $_;
}
}
Currently, I've hard-coded the filename to be data. I hope it's obvious how to fix that.
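One way to drop the hard-coded name is to take the file and the block name from the command line. The sketch below is an assumption rather than the answer's own code: it reuses the same brace counting, but writes the result to a temporary file in the same directory and renames it over the original, so an interrupted run cannot leave a truncated file. `remove_block` is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Temp qw(tempfile);

# Remove every block whose header line starts with $match (e.g. "string 1 {")
# and rewrite $file with whatever remains.
sub remove_block {
    my ($file, $match) = @_;

    open my $in, '<', $file or die "Can't read $file: $!";
    my @kept;
    while (my $line = <$in>) {
        if ($line =~ /^\s*\Q$match\E\s+\{/) {
            # Skip lines until the braces opened by this block balance out.
            my $braces = ($line =~ tr/{//) - ($line =~ tr/}//);
            while ($braces > 0 and defined(my $next = <$in>)) {
                $braces += ($next =~ tr/{//) - ($next =~ tr/}//);
            }
            next;
        }
        push @kept, $line;
    }
    close $in;

    # Write a temporary file next to the original, then rename it into place.
    my ($out, $tmp) = tempfile(DIR => dirname($file));
    print {$out} @kept or die "Can't write $tmp: $!";
    close $out or die "Can't close $tmp: $!";
    # tempfile() creates files with mode 0600; copy the original permissions.
    chmod((stat $file)[2] & 07777, $tmp) or die "Can't chmod $tmp: $!";
    rename $tmp, $file or die "Can't rename $tmp to $file: $!";
}

# Usage: perl remove_block.pl FILE 'string 1'
remove_block(@ARGV) if @ARGV == 2;
```

The rename only replaces the file atomically when the temporary file is on the same filesystem, which is why it is created in the original's directory.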

You can use Text::Balanced to break the text into blocks delimited by {}, in a way that also keeps the text preceding and following the blocks.
From that list, drop the element containing the skip pattern (string 1 here) and the block that follows it, and keep everything else. Then overwrite the source file with the result.
use warnings;
use strict;
use Path::Tiny;
use Text::Balanced qw(extract_bracketed extract_multiple);
my $file = shift // die "Usage: $0 file\n"; #/
my $text = path($file)->slurp;
# returns: 'string 1', BLOCK, 'string 2', BLOCK (may have spaces/newlines)
my @elems = extract_multiple(
$text, [ sub { extract_bracketed($text, '{}') } ]
);
my $skip_phrase = 'string 1';
my (@text_keep, $skip);
for (@elems) {
if (/$skip_phrase/) {
$skip = 1;
next;
}
elsif ($skip) {
$skip = 0;
next
}
push @text_keep, $_;
}
print for @text_keep;
# Overwrite source; uncomment when tested
#open my $fh_out, '>', $file or die "Can't open $file: $!";
#print $fh_out $_ for @text_keep;
Tested with files with more text and blocks, both before and after the one to drop.
Another tool that can be used to extract delimited chunks is in Regexp::Common, see this post.

I would use proper JSON as the format and jq as the processor for that format. Rewriting a hack in Perl does not make much sense.
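If the data could be stored as real JSON, removing a block becomes a `delete` on a decoded hash. A sketch with the core JSON::PP module, assuming a hypothetical JSON layout in which each block is a top-level key; `remove_key` is an illustrative name. The jq equivalent would be `jq 'del(."string 1")' file.json`.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical JSON layout for the question's data: one top-level key per block.
my $json_text = <<'END_JSON';
{
  "string 1": {
    "abc": { "session": 1 },
    "fairPrice": { "ID": "LU0432618274456", "Source": 4, "service": "xyz" }
  },
  "string 2": {
    "abc": { "session": 23 },
    "fairPrice": { "ID": "LU036524565456171", "Source": 4, "service": "tzu" }
  }
}
END_JSON

# Decode, delete the named block, and re-encode (canonical = sorted keys).
sub remove_key {
    my ($text, $key) = @_;
    my $json = JSON::PP->new->canonical->pretty;
    my $data = $json->decode($text);
    delete $data->{$key};
    return $json->encode($data);
}

print remove_key($json_text, 'string 1');
```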

Here is an example using Regexp::Grammars:
use feature qw(say);
use strict;
use warnings;
use Data::Printer;
use Regexp::Grammars;
{
my ($block_name, $block_num) = @ARGV;
my $parser = qr!
<nocontext:>
<blocks>
<rule: blocks> <[block]>+
<rule: block> <block_name> <block_num> <braced_item>
<token: block_name> \w+
<token: block_num> \d+
<rule: braced_item> \{ (?: <escape> | <braced_item> | [^{}] )* \}
<token: escape> \\ .
!xms;
my $data = read_file('cfg.txt');
if ($data =~ $parser) {
print_blocks( $/{blocks}{block}, $block_name, $block_num );
}
else {
warn "No match";
}
}
sub print_blocks {
my ( $blocks, $block_name, $block_num ) = @_;
for my $block (@$blocks) {
next if ($block->{block_name} eq $block_name)
&& ($block->{block_num} == $block_num);
say $block->{block_name}, " ", $block->{block_num},
" ", $block->{braced_item}{braced_item};
}
}
sub read_file {
my ( $fn ) = @_;
open ( my $fh, '<', $fn ) or die "Could not open file '$fn': $!";
my $str = do { local $/; <$fh> };
close $fh;
return $str;
}

Related

Perl: pass a module's subroutine to threads

I am trying to pass a subroutine from a self-written module to threads using the following code.
This is my first time using threads, so I'm not very familiar with them.
Main script (shortened)
#!/usr/bin/perl -w
use strict;
use threads;
use lib 'PATH TO LIB';
use goldstandard;
my $delete_raw_files = 0;
my $outfolder = /PATH/;
my %folder = goldstandard -> create_folder($outfolder,$delete_raw_files);
&tagging if $tagging == 1;
sub tagging{
my %hash = goldstandard -> tagging_hash(\%folder);
my @threads;
foreach(keys %hash){
if($_ =~ m/mate/){
my $arguments = "goldstandard -> mate_tagging($hash{$_}{raw},$hash{$_}{temp},$hash{$_}{tagged},$mate_anna,$mate_model)";
push(@threads,$arguments);
}
if($_ =~ m/morpheus/){
my $arguments = "goldstandard -> morpheus_tagging($hash{$_}{source},$hash{$_}{tagged},$morpheus_stemlib,$morpheus_cruncher)";
push(@threads,$arguments)
}
}
foreach(@threads){
my $thread = threads->create($_);
$thread ->join();
}
}
Module
package goldstandard;
use strict;
use warnings;
sub mate_tagging{
my $Referenz = shift;
my $input = shift;
my $output_temp_dir = shift;
my $output_mate_human = shift;
my $anna = shift;
my $model = shift;
opendir(DIR,"$input");
my @dir = readdir(DIR);
my $anzahl = @dir;
foreach(@dir){
unless($_ =~ m/^\./){
my $name = $_;
my $path = $input . $_;
my $out_temp = $output_temp_dir . $name;
my $out_mate_human_final = $output_mate_human . $name;
qx(java -Xmx10G -classpath $anna is2.tag.Tagger -model $model -test $path -out $out_temp);
open(OUT, "> $out_mate_human_final");
open(TEMP, "< $out_temp");
my $output_text;
while(<TEMP>){
unless($_ =~ m/^\s+$/){
if ($_ =~ m/^\d+\t(.*?)\t_\t_\t_\t(.*?)\t_\t/) {
my $tags = $2;
my $words = $1;
print OUT "$words\t$tags\n";
}
}
}
}
}
}
sub morpheus_tagging{
my $Referenz = shift;
my $input = shift;
my $output = shift;
my $stemlib = shift;
my $cruncher = shift;
opendir(DIR,"$input");
my @dir = readdir(DIR);
foreach(@dir){
unless($_ =~ m/^\./){
my $name = $_;
my $path = $input . $_;
my $out = $output . $name;
qx(env MORPHLIB='$stemlib' '$cruncher' < '$path' > '$out');
}
}
}
1;
Executing this code gets me
Thread 1 terminated abnormally: Undefined subroutine &main::goldstandard -> morpheus_tagging(...) called at ... line 43.
I guess either the way I am calling the threads or the way I am providing the arguments is wrong. I hope someone can help me with that. I also found something about safe and unsafe modules, but I'm not sure if this is really the problem. Thanks in advance.
You must pass the name of a sub or a reference to a sub, plus arguments, to threads->create. So you need something like
my $method_ref = $invoker->can($method_name);
threads->create($method_ref, $invoker, @args);
That said, passing arguments to threads->create has issues that can be avoided by using a closure.
threads->create(sub { $invoker->$method_name(@args) })
The above can be written more simply as follows:
async { $invoker->$method_name(@args) }
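A self-contained sketch of the three forms above, assuming a perl built with thread support; the `Tagger` package and its `tag` method are hypothetical stand-ins for `goldstandard` and its tagging subs.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;

# Hypothetical stand-in for the goldstandard module.
package Tagger;
sub tag {
    my ($class, $word, $tag) = @_;
    return "$class: $word/$tag";
}

package main;

my ($invoker, $method_name, @args) = ('Tagger', 'tag', 'Haus', 'NN');

# 1. Code reference plus arguments; the invoker is passed explicitly.
my $method_ref = $invoker->can($method_name) or die "No method $method_name";
my $t1 = threads->create($method_ref, $invoker, @args);

# 2. A closure: the method call is written normally inside the sub.
my $t2 = threads->create(sub { $invoker->$method_name(@args) });

# 3. async BLOCK is shorthand for threads->create(sub BLOCK).
my $t3 = async { $invoker->$method_name(@args) };

# All three run concurrently; join() waits for each and returns its result.
my @results = map { $_->join } $t1, $t2, $t3;
print "$_\n" for @results;
```

The original error came from passing a string of Perl code to threads->create, which Perl then looked up as a sub name in package main.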
Applied to your tagging sub, this gives the following:
sub tagging {
my %hash = goldstandard->tagging_hash(\%folder);
my @jobs;
for (keys %hash) {
if (/mate/) {
push @jobs, [ 'goldstandard', 'mate_tagging',
$hash{$_}{raw},
$hash{$_}{temp},
$hash{$_}{tagged},
$mate_anna,
$mate_model,
];
}
if (/morpheus/) {
push @jobs, [ 'goldstandard', 'morpheus_tagging',
$hash{$_}{source},
$hash{$_}{tagged},
$morpheus_stemlib,
$morpheus_cruncher,
];
}
}
my @threads;
for my $job (@jobs) {
my ($invoker, $method_name, @args) = @$job;
push @threads, async { $invoker->$method_name(@args) };
}
$_->join for @threads;
}
or just
sub tagging {
my %hash = goldstandard->tagging_hash(\%folder);
my @threads;
for (keys %hash) {
if (/mate/) {
push @threads, async {
goldstandard->mate_tagging(
$hash{$_}{raw},
$hash{$_}{temp},
$hash{$_}{tagged},
$mate_anna,
$mate_model,
);
};
}
if (/morpheus/) {
push @threads, async {
goldstandard->morpheus_tagging(
$hash{$_}{source},
$hash{$_}{tagged},
$morpheus_stemlib,
$morpheus_cruncher,
);
};
}
}
$_->join for @threads;
}
Note that I delayed the calls to join until after all the threads were created. Your way made it so only one thread would run at a time.
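The difference is easy to measure. In this sketch the `job` sub is hypothetical and the timings assume an otherwise idle machine: three threads that each sleep for half a second take roughly 1.5 s when each one is joined right after it is created, but roughly 0.5 s when all of them are created before any is joined.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Time::HiRes qw(time sleep);

# Hypothetical unit of work: half a second of waiting.
sub job { sleep 0.5; return threads->tid() }

# Joining right after each create: the threads run one after another.
my $t0 = time();
threads->create(\&job)->join for 1 .. 3;
my $serial = time() - $t0;

# Creating all threads first, then joining: the waits overlap.
$t0 = time();
my @thrs = map { threads->create(\&job) } 1 .. 3;
$_->join for @thrs;
my $parallel = time() - $t0;

printf "joined one by one: %.2fs, joined after creating all: %.2fs\n",
    $serial, $parallel;
```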
But what we have isn't great. We have no way of limiting how many threads are active at a time, and we (expensively) create many threads instead of reusing them. We can use a worker pool to solve both of these problems.
use constant NUM_WORKERS => 5;
use Thread::Queue 3.01 qw( );
my $q;
sub tagging {
my %hash = goldstandard->tagging_hash(\%folder);
for (keys %hash) {
# Queue items must be shareable data, and code references are not,
# so enqueue the method name and its arguments instead of a closure.
if (/mate/) {
$q->enqueue([ 'mate_tagging',
$hash{$_}{raw},
$hash{$_}{temp},
$hash{$_}{tagged},
$mate_anna,
$mate_model,
]);
}
if (/morpheus/) {
$q->enqueue([ 'morpheus_tagging',
$hash{$_}{source},
$hash{$_}{tagged},
$morpheus_stemlib,
$morpheus_cruncher,
]);
}
}
}
{
$q = Thread::Queue->new();
for (1..NUM_WORKERS) {
async {
while ( my $job = $q->dequeue() ) {
my ($method_name, @args) = @$job;
goldstandard->$method_name(@args);
}
};
}
... call tagging and whatever ...
$q->end();
$_->join() for threads->list();
}

Sharing a thread variable without making it global (Perl)

I'm trying to write a simple script that uses threads and shares a variable, but I don't want to make this variable global to the whole script. Below is a simplified example.
use strict;
use warnings;
use threads;
use threads::shared;
my $val:shared;
# Create threads
for my $i (1 .. 5) {
threads->create(\&do_something, $i);
}
# Wait for all threads to complete
map { $_->join(); } threads->list();
# $val is global to the script so this line will work!
print "VAL IS: $val\n";
sub do_something {
my $i = shift;
print "Doing something with thread $i!\n";
{
lock $val;
$val = "SOMETHING IS $i";
print "$val\n\n";
}
}
Output:
Doing something with thread 1!
SOMETHING IS 1
Doing something with thread 2!
SOMETHING IS 2
Doing something with thread 3!
SOMETHING IS 3
Doing something with thread 4!
SOMETHING IS 4
Doing something with thread 5!
SOMETHING IS 5
VAL IS: SOMETHING IS 5
How can I get this effect without making $val accessible to the whole script? In other words, how can I make it so attempting to print VAL IS: $val will fail, but the variable will still be successfully shared by the threads?
I can't define it like this:
# Create threads
for my $i (1 .. 5) {
my $val:shared;
threads->create(\&do_something, $i);
}
Or I will get:
Global symbol "$val" requires explicit package
What is the right way to lexically scope a shared variable?
Pass a reference to it as an argument.
sub do_something {
my ($id, $lock_ref) = @_;
print("$id: Started\n");
{
lock $$lock_ref;
print("$id: Exclusive\n");
sleep(1);
}
print("$id: done.\n");
}
{
my $lock :shared;
for my $id (1..5) {
async { do_something($id, \$lock); };
}
}
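A runnable version of the same idea, using only threads and threads::shared: the shared counter is a `my` variable inside `run_workers` (a hypothetical name), so code outside that sub cannot name it, yet every worker updates it through the reference it was given.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;

# Each worker receives a reference to the shared counter; it never sees a
# file-scoped variable.
sub worker {
    my ($id, $count_ref) = @_;
    for (1 .. 100) {
        lock $$count_ref;    # lock the shared scalar behind the reference
        $$count_ref++;
    }                        # the lock is released at the end of each iteration
    return $id;
}

# The shared variable lives only inside this sub.
sub run_workers {
    my ($n) = @_;
    my $count :shared = 0;
    my @thrs = map { threads->create(\&worker, $_, \$count) } 1 .. $n;
    $_->join for @thrs;
    return $count;
}

print "Final count: ", run_workers(5), "\n";
```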
Or scope it so only the worker subs can see it.
{
my $lock :shared;
sub do_something {
my ($id) = @_;
print("$id: Started\n");
{
lock $lock;
print("$id: Exclusive\n");
sleep(1);
}
print("$id: done.\n");
}
}
for my $id (1..5) {
async { do_something($id); };
}
You can limit the scope of the shared variable with a bare block around the sub (make sure Perl sees the shared variable's declaration before the threads are created):
# ..
{
my $val:shared;
sub do_something {
my $i = shift;
print "Doing something with thread $i!\n";
{
lock $val;
$val = "SOMETHING IS $i";
print "$val\n\n";
}
}
}
# Create threads
for my $i (1 .. 5) {
threads->create(\&do_something, $i);
}
# ...

How to pause and resume a multithread perl script?

I have written a Perl script that should pause and resume: when the user presses Ctrl+C it should pause, and on pressing c it should resume. But it is not working as expected. Can anyone help me find the mistake I am making?
use strict;
use threads;
use threads::shared;
use Thread::Suspend;
use Lens;
$SIG{'INT'} = 'Pause';
#$| = 1;
print chr(7);
my $nthreads = 64;
my @thrs;
for(1..$nthreads)
{
print "START $_ \n";
my ($thr) = threads->create(\&worker, $_);
push @thrs ,$thr;
}
$_->join for @thrs;
exit;
sub worker
{
my $id = shift;
my $tmp;
my $lens = Lens->new("172.16.1.65:2000");
die "cannot create object" unless defined $lens;
die "cannot connect to XRay at " unless defined $lens->open("172.16.1.65:2000");
for(1..100000)
{
print "Thread $id \n";
}
print "$id>LOAD EXIT\n";
}
sub Pause
{
sleep(1);
print "\nCaught ^C\n";
print "Press \"c\" to continue, \"e\" to exit: ";
$_->suspend() for @thrs;
while (1)
{
my $input = lc(getc());
chomp ($input);
if ($input eq 'c') {
#clock($hour,$min,$sec);
$_->resume() for @thrs;
return;
}
elsif ($input eq 'e') {
exit 1;
}
}
}
Well, you haven't been too specific as to how it's "not working properly". But I would suggest looking at using Thread::Semaphore for a 'suspend' mechanism.
I would also suggest not using signals and instead doing something like:
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Semaphore;
use Term::ReadKey;
my $nthreads = 64;
my $thread_semaphore = Thread::Semaphore->new($nthreads);
sub worker {
for ( 1 .. 10 ) {
$thread_semaphore->down();
print threads->self->tid(), "\n";
sleep 1;
$thread_semaphore->up();
}
}
for ( 1 .. $nthreads ) {
threads->create( \&worker );
}
my $keypress;
ReadMode 4;
while ( threads->list(threads::running) ) {
while ( not defined( $keypress = ReadKey(-1) )
and threads->list(threads::running) )
{
print "Waiting\nRunning:". threads->list(threads::running) . "\n";
sleep 1;
}
print "Got $keypress\n";
if ( $keypress eq "p" ) {
print "Pausing...";
$thread_semaphore -> down_force($nthreads);
print "All paused\n";
}
if ( $keypress eq "c" ) {
print "Resuming...";
$thread_semaphore -> up ( $nthreads );
}
}
ReadMode 0;
foreach my $thr ( threads->list ) {
$thr->join();
}
It 'suspends' the workers by driving the semaphore's count to zero (or below), and relies on the threads checking, at each down(), whether they should stop there.
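A non-interactive sketch of the same mechanism, with the keypresses replaced by fixed delays (an assumption made so it can run unattended; `$gate` and `$done` are illustrative names). It pauses with `down($nthreads)` rather than `down_force`: `down` blocks until every worker has released its token, so once it returns no worker is mid-task, whereas `down_force` returns immediately and a worker that already holds a token finishes its current unit first.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Semaphore;
use Time::HiRes qw(sleep);

my $nthreads = 4;
my $gate     = Thread::Semaphore->new($nthreads);
my $done :shared = 0;    # units of work completed so far

# Each worker holds one token while it does a unit of work.
sub worker {
    for (1 .. 200) {
        $gate->down();                 # blocks while the main thread holds all tokens
        { lock $done; $done++; }
        $gate->up();
        sleep 0.001;
    }
}

my @thrs = map { threads->create(\&worker) } 1 .. $nthreads;

sleep 0.05;
$gate->down($nthreads);    # pause: returns only when no worker holds a token
my $paused_at = do { lock $done; $done };
sleep 0.2;                 # workers stay blocked in down() meanwhile
my $still_at  = do { lock $done; $done };
$gate->up($nthreads);      # resume

$_->join for @thrs;
print "paused at $paused_at, still $still_at after waiting, finished at $done\n";
```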
I think the root of your problem, though, is probably signal propagation: your signal handler is global across your threads. You might find that configuring $SIG{'INT'} separately for your threads yields better results (e.g. set the signal handler to 'IGNORE' at the start of your code, and set specific ones in the threads and the main thread once the threads have been spawned).

Perl Multithread Program

I am new to Perl. I want to write a Perl script using threads. I have a few files, say 20, and want to process them using 5 threads in 4 batches. I am printing the thread number. After one batch completes, the thread number should start again at 1 for the next batch. But instead, it creates 20 threads. Please help. My code is as follows:
#!/usr/bin/perl -w
use strict;
use warnings;
use threads;
use threads::shared;
my $INPUT_DIR="/home/Documents/myscript/IMPORTLDIF/";
opendir(DIR, $INPUT_DIR) ;
my @files = grep { /^InputFile/ } readdir DIR;
my $count = #files;
#print "Total Files: $count \n";
my @threads;
my $noofthread = 5;
my $nooffiles = $count;
my $noofbatch = $nooffiles / $noofthread;
#print "No of batch: $noofbatch \n";
my $fileIndex = 0;
my $batch = 1;
while ($fileIndex < $nooffiles) {
print "Batch: $batch \n";
for (my $i=0; $i < $noofthread && $fileIndex < $nooffiles ; $i++) {
my $t = threads->new(\&doOperation, $files[$fileIndex], $i)->join;
push(@threads, $t);
$fileIndex++;
print "FileIndex: $fileIndex \n";
}
$batch++;
}
sub doOperation () {
my $ithread = threads->tid() ;
print "Thread Index : [id=$ithread]\n" ;
foreach my $item (@_){
my $filename = $item;
print "Filename name: $filename \n";
}
}
Edited program using thread queue:
#!/usr/bin/perl -w
# This is compiled with threading support
use strict;
use warnings;
use threads;
use Thread::Queue;
my $q = Thread::Queue->new(); # A new empty queue
# Worker thread
my $INPUT_DIR="/home/Documents/myscript/IMPORTLDIF/";
opendir(DIR, $INPUT_DIR) or die "Cannot opendir: $!";
my @thrs = threads->create(\&doOperation ) for 1..5;#for 5 threads
#my @files = `ls -1 /home/Documents/myscript/IMPORTLDIF/`;
my @files = grep { /^Input/ } readdir DIR or die "File not present. \n";
chomp(@files);
#add files to queue
foreach my $f (@files){
# Send work to the thread
$q->enqueue($f);
print "Pending items: " . $q->pending()."\n";
}
$q->enqueue('_DONE_') for @thrs;
$_->join() for @thrs;
sub doOperation () {
my $ithread = threads->tid() ;
while (my $filename = $q->dequeue()) {
# Do work on $item
return 1 if $filename eq '_DONE_';
print "[id=$ithread]\t$filename\n";
}
return 1;
}
You are spawning a thread and then waiting for it to complete before spawning the next, each thread handling one file. That is why you see as many threads as you have files.
my $t = threads->new(\&doOperation, $files[$fileIndex], $i)->join;
^^^^--- This will block
Instead try something like this:
....
# split the workload into N batches
#
while (my @batch = splice(@files, 0, $batch_size)) {
push @threads, threads->new(\&doOperation, @batch);
}
# now wait for all workers to finish
#
for my $thr (@threads) {
$thr->join;
}
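A runnable sketch of the batching idea, with 20 hypothetical file names in place of the directory listing and `do_operation` as an illustrative worker. Each thread is created in scalar context so that join returns the single count the worker reports.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;

# Hypothetical input standing in for the 20 InputFile* names.
my @files      = map { "InputFile$_" } 1 .. 20;
my $nthreads   = 5;
my $batch_size = int((@files + $nthreads - 1) / $nthreads);   # rounds up: 4

# Each thread handles a whole batch and reports how many files it saw.
sub do_operation {
    my @batch = @_;
    my $tid   = threads->tid();
    print "[thread $tid] $_\n" for @batch;
    return scalar @batch;
}

# Start every thread first; join only after all of them are running.
my @threads;
my @work = @files;
while (my @batch = splice(@work, 0, $batch_size)) {
    my $thr = threads->create(\&do_operation, @batch);   # scalar context
    push @threads, $thr;
}

my $processed = 0;
$processed += $_->join for @threads;
print "Threads: ", scalar(@threads), ", files processed: $processed\n";
```

Thread IDs from threads->tid() are never reused, so they will not restart at 1 for each batch; pass your own index as an argument if you need one.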
As an aside, Thread::Queue and Thread::Pool might lead to better designs for the work you want to do.
You could use Thread::Queue, create a pool of worker threads (5 in the example below), and pass them items to work on.
To fork or not to fork?
use strict;
use warnings;
use threads;
use Thread::Queue;
my $q = Thread::Queue->new(); # A new empty queue
# Worker thread
my @thrs;
push @thrs, threads->create(\&doOperation ) for 1..5;#for 5 threads
my @files = `ls -1 /tmp/`;chomp(@files);
#add files to queue
foreach my $f (@files){
# Send work to the thread
$q->enqueue($f);
print "Pending items: " . $q->pending() . "\n";
}
$q->enqueue('_DONE_') for @thrs;
$_->join() for threads->list();
sub doOperation () {
my $ithread = threads->tid() ;
while (my $filename = $q->dequeue()) {
# Do work on $item
return 1 if $filename eq '_DONE_';
print "[id=$ithread]\t$filename\n";
}
return 1;
}
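With Thread::Queue 3.01 or later, the `_DONE_` markers can be replaced by `end()`: once the queue has been ended and drained, `dequeue()` returns undef. A sketch with hypothetical file names; threads are created in scalar context so each join returns the worker's count.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue 3.01;

my $q = Thread::Queue->new();

sub do_operation {
    my $ithread = threads->tid();
    my $count   = 0;
    # After end() has been called, dequeue() returns undef once the queue is
    # empty, which ends this loop; no '_DONE_' marker is needed.
    while (defined(my $filename = $q->dequeue())) {
        print "[id=$ithread]\t$filename\n";
        $count++;
    }
    return $count;
}

my @thrs;
for (1 .. 5) {
    my $thr = threads->create(\&do_operation);   # scalar context for join()
    push @thrs, $thr;
}

# Hypothetical file names standing in for the directory listing.
$q->enqueue(map { "file$_.txt" } 1 .. 12);
$q->end();

my $total = 0;
$total += $_->join() for @thrs;
print "Processed $total files\n";
```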

Perl: How to push a hash into an array that is outside of a subroutine

I originally experimented with sending a hash object through Thread::Queue, but according to this link, my versions of Thread::Queue and threads::shared are too old. Unfortunately, since the system I'm testing on isn't mine, I can't upgrade.
I then tried to use a common array to store my hashes. Here is the code so far:
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;
use constant NUM_WORKERS => 10;
my @out_array;
test1();
sub test1
{
my $in_queue = Thread::Queue->new();
foreach (1..NUM_WORKERS) {
async {
while (my $job = $in_queue->dequeue()) {
test2($job);
}
};
}
my @sentiments = ("Axe Murderer", "Mauler", "Babyface", "Dragon");
$in_queue->enqueue(@sentiments);
$in_queue->enqueue(undef) for 1..NUM_WORKERS;
$_->join() for threads->list();
foreach my $element (@out_array) {
print "element: $element\n";
}
}
sub test2
{
my $string = $_[0];
my %hash = (Skeleton => $string);
push @out_array, \%hash;
}
However, at the end of the procedure, @out_array is always empty. If I remove the threading parts of the script, then @out_array is correctly populated. I suspect I'm implementing threading incorrectly here.
How would I correctly populate @out_array in this instance?
You need to make it shared:
use threads::shared;
my @out_array :shared;
I don't think you need to lock it if all you do is push onto it, but if you did, you'd use
lock @out_array;
You also need to share any array or hash referenced by a value you push onto it, using the tools in threads::shared. shared_clone makes a shared copy; share(%hash) would discard the hash's existing contents.
push @out_array, shared_clone(\%hash);
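A runnable sketch of that fix, assuming a threads::shared recent enough to provide `shared_clone` (the asker's older version may not; in that case declare the hash `:shared` instead, as in the self-answer further down). Each thread pushes a shared copy of its hash onto the shared array, locking the array around the push.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;

my @out_array :shared;

sub test2 {
    my ($string) = @_;
    my %hash = (Skeleton => $string);
    # A reference to a private hash cannot be stored in a shared array;
    # shared_clone() makes a shared deep copy that can.
    my $shared = shared_clone(\%hash);
    lock @out_array;
    push @out_array, $shared;
    return;
}

my @workers = map {
    my $name = $_;
    threads->create(sub { test2($name) });
} ("Axe Murderer", "Mauler", "Babyface", "Dragon");
$_->join for @workers;

print "element: $_->{Skeleton}\n" for @out_array;
```

The order of the elements depends on which thread reaches the push first.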
Though as I mentioned earlier, I'd use a Thread::Queue.
sub test2 {
my ($string) = @_;
my %hash = ( Skeleton => $string );
return \%hash;
}
...
my $response_q = Thread::Queue->new();
my $running :shared = NUM_WORKERS;
...
async {
while (my $job = $request_q->dequeue()) {
$response_q->enqueue(test2($job));
}
{ lock $running; $response_q->enqueue(undef) if !--$running; }
};
...
$request_q->enqueue(@sentiments);
$request_q->enqueue(undef) for 1..NUM_WORKERS;
while (my $response = $response_q->dequeue()) {
print "Skeleton: $response->{Skeleton}\n";
}
$_->join() for threads->list();
Note the lack of anything thread-specific in test2. This is good: you should always strive for separation of concerns.
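For reference, a self-contained version of that outline, with the elided parts filled in by assumptions (four workers and the asker's four sample strings):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;

use constant NUM_WORKERS => 4;

# Pure function: no threading concerns here.
sub test2 {
    my ($string) = @_;
    return { Skeleton => $string };
}

my $request_q  = Thread::Queue->new();
my $response_q = Thread::Queue->new();
my $running :shared = NUM_WORKERS;

for (1 .. NUM_WORKERS) {
    async {
        while (defined(my $job = $request_q->dequeue())) {
            $response_q->enqueue(test2($job));
        }
        # The last worker to finish tells the main thread there is no more output.
        { lock $running; $response_q->enqueue(undef) if !--$running; }
    };
}

my @sentiments = ("Axe Murderer", "Mauler", "Babyface", "Dragon");
$request_q->enqueue(@sentiments);
$request_q->enqueue(undef) for 1 .. NUM_WORKERS;

my @skeletons;
while (defined(my $response = $response_q->dequeue())) {
    push @skeletons, $response->{Skeleton};
    print "Skeleton: $response->{Skeleton}\n";
}
$_->join() for threads->list();
```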
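You need to return your data from the threads. The context join() returns in is fixed when a thread is created, so create each thread in list context (push does that) and return everything it collected:
....
my @thrs;
for (1..NUM_WORKERS) {
push @thrs, async {
my @data;
while (my $job = $in_queue->dequeue()) {
push @data, test2($job);
}
return @data;
};
}
...
for my $thr (@thrs) {
my @data = $thr->join();
#now you have this thread's return values in @data
}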
sub test2
{
my $string = $_[0];
my %hash = (Skeleton => $string);
return \%hash;
}
I found my answer in the example here.
I had to change three things:
share the @out_array outside both subs
share the %hash in test2
add return; to the end of test2
Code outside both subs:
my @out_array : shared = ();
test2 sub:
sub test2
{
my $string = $_[0];
my %hash : shared;
$hash{Skeleton} = $string;
push @out_array, \%hash;
return;
}
