Extract text and write to a new file in Perl

Simply put, I want to extract text from a file and save that text to a new file, using Perl.
Here is my code, thus far:
#!/usr/local/bin/perl
use warnings;
use strict;
use File::Slurp;
use FileHandle;
use Fcntl qw(:DEFAULT :flock :seek); # Import LOCK_* constants
my $F_IN = FileHandle->new("<$ARGV[0]");
my $F_OUT = FileHandle->new(">PerlTest.txt");
while (my $line = $F_IN->getline) {
$line =~ m|foobar|g;
$F_OUT->print($line);
# I want to only copy the text that matches, not the whole line.
# I changed the example text to 'foobar' to avoid confusion.
}
$F_IN->close();
$F_OUT->close();
Obviously, it's copying the line. How can I extract and print specific text from a file, instead of the whole line?

If it can only happen once per line:
while (<>) {
print "$1\n" if /(thebigredpillow)/;
}
If it can happen multiple times per line:
while (<>) {
while (/(thebigredpillow)/g) {
print "$1\n";
}
}
Usage:
script file.in >file.out

You could use capturing parentheses to grab the matched string:
while (my $line = $F_IN->getline) {
if ($line =~ m|(thebigredpillow)|) {
$F_OUT->print("$1\n");
}
}
See perldoc perlre.

#!/usr/local/bin/perl
use warnings;
use strict;
use IO::All;
my @lines = io($ARGV[0])->slurp;
foreach (@lines) {
if (/thebigredpillow/g) {
$_ >> io('PerlTest.txt');
}
}


(Perl) How to turn string into date format and find most recent?

I'm using Perl and have an input file with multiple dates such as 17/04/2021 written in it as text.
How could I go about turning them into date formats and then comparing them to see which one is the most recent?
Input file format:
01/09/2020
23/10/2019
12/06/2022
15/08/2017
Perl Script:
#! /usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
my $InputFile = "path/to/file.input";
open(FH, '<', $InputFile) or die $!;
while(my $Line = <FH>)
{
}
close(FH);
Thanks.
Dates in the format yyyymmdd can be compared directly, numerically or lexically. So turn the dates around:
use warnings;
use strict;
use feature 'say';
# use List::Util qw(max);
die "Usage: $0 file\n" if not @ARGV;
my @dates;
while (<>) {
chomp;
push @dates, join '', reverse split '/';
}
@dates = sort { $a <=> $b } @dates; # latest: $dates[-1]
say for @dates;
# Or, if only the last one is needed (uncomment 'use' statement)
# my $latest_date = max @dates;
The "diamond operator" <> reads, line by line, the files submitted on the command line when used in scalar context. The separator argument to split is still a regular expression, even though I use '' delimiters (instead of /\//). Its next (optional) argument, an expression yielding the string to split, defaults to the $_ variable.
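A minimal standalone sketch of the reversal trick described above (the sample date is just an illustration):

```perl
use strict;
use warnings;

# Split a dd/mm/yyyy date on '/' (the ''-delimited separator is still a
# regex), reverse the pieces, and join them into a yyyymmdd string that
# compares correctly both numerically and lexically.
my $line     = '15/08/2017';                      # dd/mm/yyyy
my $sortable = join '', reverse split '/', $line;
print "$sortable\n";                              # prints 20170815
```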
Also see
reverse,
join,
sort, and List::Util, as needed.
Can do it in a command-line program ("one-liner") as well
perl -wnlE'push @d, join "", reverse split "/"; }{ say for sort @d' file
where }{ stands for the beginning of an END { } block. Or, for the latest date only
perl -MList::Util=max -wnlE'... }{ say max @d' file
If you'd like it more compact,
use warnings;
use strict;
use feature 'say';
say for sort map { chomp; join '', reverse split '/' } <>;
That same diamond operator in list context returns all lines at once; here its output is fed to map, which imposes list context.
Or on the command line
perl -wE'say for sort map { chomp; join "", reverse split "/" } <>' file
strptime is (always) your friend:
#!/usr/bin/env perl
use 5.12.10;
use Time::Piece;
my $fmt='%d/%m/%Y';
my @t;
while( <DATA> ){
chop;
eval { push @t, Time::Piece->strptime($_, $fmt) } or
say STDERR "Unexpected format in input: $_";
}
say $_->strftime($fmt) foreach sort @t;
__DATA__
01/09/2020
01/09/2020
23/10/2019
12/06/2022
15/08/2017
To do this as a one-liner, you can do:
perl -MTime::Piece -0777 -aE '$f="%d/%m/%Y";
say foreach sort { $a <=> $b } map Time::Piece->strptime($_, $f), @F'
The one-liner is not quite the same, though, as it will handle multiple dates on a line, while the script is strict about each line containing only one date.
Here's one way:
#! /usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
use Time::Local;
my $InputFile = $ARGV[0];
open(my $fh, '<', $InputFile) or die $!;
## A hash to hold the times so we can sort later
my %seconds;
while(my $Line = <$fh>){
chomp($Line);
my ($day, $month, $year) = split(/\//, $Line);
my $secondsSinceTheEpoch = timelocal(0, 0, 0, $day, $month-1, $year);
$seconds{$secondsSinceTheEpoch}++
}
close($fh);
my @sortedSeconds = sort {$a <=> $b} keys(%seconds);
print "$sortedSeconds[0]\n"; ## earliest; the most recent is $sortedSeconds[-1]
Or, if you're into the whole brevity thing:
#! /usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
use Time::Local;
## A hash to hold the times so we can sort later
my %seconds;
while(<>){
chomp();
my ($day, $month, $year) = split(/\//);
$seconds{timelocal(0, 0, 0, $day, $month-1, $year)}++
}
my @sortedSeconds = sort {$a <=> $b} keys(%seconds);
print "$sortedSeconds[0]\n"; ## earliest; the most recent is $sortedSeconds[-1]
In both cases, you need to pass the file to the script as an argument:
$ foo.pl file
1502744400

Perl script to search a word inside the directory

I'm looking for a Perl script to grep for a string in all files inside a directory, equivalent to this bash command:
grep -r 'word' /path/to/dir
This is a fairly canonical task, yet I couldn't find straight answers using what is possibly the easiest and simplest tool for the job, the handy Path::Tiny:
use warnings;
use strict;
use feature 'say';
use Data::Dump; # dd
use Path::Tiny; # path
my $dir = shift // '.';
my $pattern = qr/word/;
my $ret = path($dir)->visit(
sub {
my ($entry, $state) = @_;
return if not -f;
for ($entry->lines) {
if (/$pattern/) {
print "$entry: $_";
push @{$state->{$entry}}, $_;
}
}
},
{ recurse => 1 }
);
dd $ret; # print the returned complex data structure
The way a file is read here, using lines, is just one way to do it. It may not be suitable for extremely large files, as it reads all lines at once; in that case it is better to read line by line.
The visit method is built on the iterator method, which accomplishes this task cleanly as well:
my $iter = path($dir)->iterator({ recurse => 1 });
my $info;
while (my $e = $iter->()) {
next if not -f $e;
# process the file $e as needed
# /$pattern/ and push @{$info->{$e}}, $_ and print "$e: $_"
#     for $e->lines;
}
Here we have to provide a data structure to accumulate information but we get more flexibility.
The -f filetest used above, for a "plain" file, is still somewhat permissive; it allows for swap files, for example, which some editors keep during a session (vim, for instance). Those will result in all kinds of matches. To stay with purely ASCII or UTF-8 files, use the -T test.
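A small sketch of the difference (the file name binary.tmp is a throwaway used only for this demonstration): a file containing NUL bytes passes -f but fails the -T text heuristic.

```perl
use strict;
use warnings;

# -f accepts any plain file; -T rejects files whose first block looks
# binary (NUL bytes guarantee the heuristic reports "not text").
my $name = 'binary.tmp';   # illustrative throwaway file
open my $fh, '>', $name or die "Cannot write $name: $!";
binmode $fh;
print $fh "\x00\xff" x 16;
close $fh;

print "plain file\n" if -f $name;   # printed
print "text file\n"  if -T $name;   # not printed
unlink $name;
```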
Otherwise, there are libraries for recursive traversal and searching, for example File::Find (or File::Find::Rule) or Path::Iterator::Rule.
For completeness, here is a take with the core File::Find
use warnings;
use strict;
use feature 'say';
use File::Find;
my @dirs = @ARGV ? @ARGV : '.';
my $pattern = qr/word/;
my %res;
find( sub {
return if not -T; # ASCII or UTF-8 only
open my $fh, '<', $_ or do {
warn "Error opening $File::Find::name: $!";
return;
};
while (<$fh>) {
if (/$pattern/) {
chomp;
push @{$res{$File::Find::name}}, $_
}
}
}, @dirs
);
for my $k (keys %res) {
say "In file $k:";
say "\t$_" for @{$res{$k}};
}

How to get Perl to loop over all files in a directory?

I have a Perl script with contains
open (FILE, '<', "$ARGV[0]") || die "Unable to open $ARGV[0]\n";
while (defined (my $line = <FILE>)) {
# do stuff
}
close FILE;
and I would like to run this script on all .pp files in a directory, so I have written a wrapper script in Bash
#!/bin/bash
for f in /etc/puppet/nodes/*.pp; do
/etc/puppet/nodes/brackets.pl $f
done
Question
Is it possible to avoid the wrapper script and have the Perl script do it instead?
Yes.
The for f in ...; translates to the Perl
for my $f (...) { ... } (in the case of lists) or
while (my $f = ...) { ... } (in the case of iterators).
The glob expression that you use (/etc/puppet/nodes/*.pp) can be evaluated inside Perl via the glob function: glob '/etc/puppet/nodes/*.pp'.
Together with some style improvements:
use strict; use warnings;
use autodie; # automatic error handling
while (defined(my $file = glob '/etc/puppet/nodes/*.pp')) {
open my $fh, "<", $file; # lexical file handles, automatic error handling
while (defined( my $line = <$fh> )) {
# do stuff
}
close $fh;
}
Then:
$ /etc/puppet/nodes/brackets.pl
This isn’t quite what you asked, but another possibility is to use <>:
while (<>) {
my $line = $_;
# do stuff
}
Then you would put the filenames on the command line, like this:
/etc/puppet/nodes/brackets.pl /etc/puppet/nodes/*.pp
Perl opens and closes each file for you. (Inside the loop, the current filename and line number are $ARGV and $. respectively.)
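A minimal sketch of those per-file variables in action (closing the ARGV handle at end-of-file is the standard idiom to reset $. between files):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Print each input line prefixed with its file name and line number.
# $ARGV is the file currently being read by <>, and $. is its line
# counter; close ARGV at eof so $. restarts for the next file.
while (<>) {
    print "$ARGV:$.: $_";
    close ARGV if eof;
}
```

Run it as script.pl file1.pp file2.pp and every line comes out tagged file:line.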
Jason Orendorff has the right answer:
From Perlop (I/O Operators)
The null filehandle <> is special: it can be used to emulate the behavior of sed and awk, and any other Unix filter program that takes a list of filenames, doing the same to each line of input from all of them. Input from <> comes either from standard input, or from each file listed on the command line.
This doesn't require opendir. It doesn't require using globs or hard coding stuff in your program. This is the natural way to read in all files that are found on the command line, or piped from STDIN into the program.
With this, you could do:
$ myprog.pl /etc/puppet/nodes/*.pp
or
$ myprog.pl /etc/puppet/nodes/*.pp.backup
or even:
$ cat /etc/puppet/nodes/*.pp | myprog.pl
Take a look at this documentation; it explains all you need to know:
#!/usr/bin/perl
use strict;
use warnings;
my $dir = '/tmp';
opendir(DIR, $dir) or die $!;
while (my $file = readdir(DIR)) {
# We only want files
next unless (-f "$dir/$file");
# Use a regular expression to find files ending in .pp
next unless ($file =~ m/\.pp$/);
open (FILE, '<', "$dir/$file") || die "Unable to open $dir/$file\n"; # readdir returns bare names
while (defined (my $line = <FILE>)) {
# do stuff
}
}
closedir(DIR);
exit 0;
I would suggest putting all the filenames into an array and then using that array as the parameters list for your Perl method or script. Please see the following code:
use Data::Dumper;
$dirname = "/etc/puppet/nodes";
opendir ( DIR, $dirname ) || die "Error in opening dir $dirname\n";
my @files = grep {/\.pp$/} readdir(DIR);
print Dumper(\@files);
closedir(DIR);
Now you can pass \@files as a parameter to any Perl method.
my @x = <*>;
foreach ( @x ) {
chomp;
if ( -f "$_" ) {
print "process $_\n";
# do stuff
next;
};
};
Perl can shell out to execute system commands in various ways, the most straightforward is using backticks ``
use strict;
use warnings FATAL => 'all';
my @ls = `ls /etc/puppet/nodes/*.pp`;
for my $f ( @ls ) {
chomp $f; # backtick output keeps trailing newlines
open (my $FILE, '<', $f) || die "Unable to open $f\n";
while (defined (my $line = <$FILE>)) {
# do stuff
}
close $FILE;
}
(Note: you should always use strict; and use warnings;)

Perl Inserting a string from a file after every occurence of a slash in a url

I have the following URL's:
FILE1.txt
http://www.stackoveflow.com/dog/cat/rabbit/hamster/
192.168.192.168/lion/tiger/elephant/
FILE2.txt
HELLO
GOODBYE
The output I am trying to achieve:
http://www.stackoveflow.com/dogHELLO/cat/rabbit/hamster/
http://www.stackoveflow.com/dog/catHELLO/rabbit/hamster/
http://www.stackoveflow.com/dog/cat/rabbitHELLO/hamster/
http://www.stackoveflow.com/dog/cat/rabbit/hamsterHELLO/
http://www.stackoveflow.com/dog/cat/rabbit/hamster/HELLO
http://www.stackoveflow.com/dogGOODBYE/cat/rabbit/hamster/
http://www.stackoveflow.com/dog/catGOODBYE/rabbit/hamster/
http://www.stackoveflow.com/dog/cat/rabbitGOODBYE/hamster/
http://www.stackoveflow.com/dog/cat/rabbit/hamsterGOODBYE/
http://www.stackoveflow.com/dog/cat/rabbit/hamster/GOODBYE
192.168.192.168/lionHELLO/tiger/elephant/
192.168.192.168/lion/tigerHELLO/elephant/
192.168.192.168/lion/tiger/elephantHELLO/
192.168.192.168/lion/tiger/elephant/HELLO
192.168.192.168/lionGOODBYE/tiger/elephant/
192.168.192.168/lion/tigerGOODBYE/elephant/
192.168.192.168/lion/tiger/elephantGOODBYE/
192.168.192.168/lion/tiger/elephant/GOODBYE
As you can see, the strings HELLO and GOODBYE are inserted after every slash, and if there is already a string after the slash it will append the HELLO and GOODBYE after that (e.g. http://www.stackoveflow.com/dogHELLO/cat/rabbit/hamster/ and so on).
What I have tried
use strict;
use warnings;
my @f1 = do {
open my $fh, '<', 'FILE1.txt';
<$fh>;
};
chomp @f1;
my @f2 = do {
open my $fh, '<', 'FILE2.txt';
<$fh>;
};
chomp @f2;
for my $f1 (@f1) {
my @fields = $f1 =~ m{[^/]+}g;
for my $f2 (@f2) {
for my $i (0 .. $#fields) {
my @new = @fields;
$new[$i] .= $f2;
print qq{/$_/\n} for join '/', @new;
}
print "\n\n";
}
}
#courtesy of Borodin
However, this code does not cater for URLs that have slashes in the http:// part, as these are replaced with http:HELLO/ when they should not be.
Also, it does not put HELLO or GOODBYE after the slash if there is no string already there, e.g. http://www.stackoveflow.com/dog/cat/rabbit/hamster/<--SHOULD PUT HELLO AFTER THIS SLASH AS WELL BUT DOESN'T
It appears that this code removes then re-inserts the slashes with the strings from FILE2.txt, as opposed to inserting HELLO and GOODBYE in the correct place to start with.
My question
Is there a better method of going about achieving the output I require or is there something I can do to my existing code to cater for the problems described above?
Your help is much appreciated, many thanks
Here is the algorithm in prose:
Open FILE2.txt. Read in all lines, removing the newlines. We call the array @words.
Open FILE1.txt. We call the file handle $fh.
As long as we can read a $line from $fh:
Remove the newline, remove starting and ending slashes.
Split the $line at every slash, call the array @animals.
Loop through the @words, calling each element $word:
Loop through the indices of the @animals, calling each index $i:
Make a @copy of the @animals.
Append the $word to the $i-th element of @copy.
Join the @copy with slashes, surround it with slashes, and print with newline.
Print an empty line.
This program will do what you ask.
use strict;
use warnings;
use autodie;
my @f1 = do {
open my $fh, '<', 'FILE1.txt';
<$fh>;
};
chomp @f1;
my @f2 = do {
open my $fh, '<', 'FILE2.txt';
<$fh>;
};
chomp @f2;
for my $f1 (@f1) {
my @fields = $f1 =~ m{[^/]+}g;
for my $f2 (@f2) {
for my $i (0 .. $#fields) {
my @new = @fields;
$new[$i] .= $f2;
print qq{/$_/\n} for join '/', @new;
}
print "\n\n";
}
}
output
/dogHELLO/cat/rabbit/hamster/
/dog/catHELLO/rabbit/hamster/
/dog/cat/rabbitHELLO/hamster/
/dog/cat/rabbit/hamsterHELLO/
/dogGOODBYE/cat/rabbit/hamster/
/dog/catGOODBYE/rabbit/hamster/
/dog/cat/rabbitGOODBYE/hamster/
/dog/cat/rabbit/hamsterGOODBYE/
/lionHELLO/tiger/elephant/
/lion/tigerHELLO/elephant/
/lion/tiger/elephantHELLO/
/lionGOODBYE/tiger/elephant/
/lion/tigerGOODBYE/elephant/
/lion/tiger/elephantGOODBYE/
Rather than splitting the line on every slash, you can do it all with a regex.
Updated version:
#!/usr/bin/perl
use strict;
use warnings;
my @insert_words = qw/HELLO GOODBYE/;
while (<DATA>)
{
chomp;
foreach my $word (@insert_words)
{
my $repeat = 1;
while ((my $match=$_) =~ s|(?<!/)(?:/(?!/)[^/]*){$repeat}[^/]*\K|$word|)
{
print "$match\n";
$repeat++;
}
print "\n";
}
}
__DATA__
/dog/cat/rabbit/hamster/
http://www.stackoverflow.com/dog/cat/rabbit/hamster/
The key is the substitution operator: s|(?<!/)(?:/(?!/)[^/]*){$repeat}[^/]*\K|$word|.
(?<!/) and (?!/) are negative look-behind and look-ahead, respectively. They ensure that we are only matching a single /, thus ignoring http://.
(?:/(?!/)[^/]*){$repeat} is a non-capturing group that must match a specified number of times, and we increase that number until it no longer matches.
I had to use [^/]* instead of [^/]+ to meet your requirement of matching at the end of the string. That is why both the look-behind and the look-ahead are needed.
\K means "match everything up to this point, but don't include it in the match itself." Thus we don't have to worry about including the whole beginning of the string that matched in the replacement.
Note: The /r modifier is another way to perform the substitution without modifying the original string. However, it requires Perl 5.14 (thanks, amon), so I removed it from the example.
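A stripped-down sketch of \K on its own, using a made-up URL to show that only the text after \K is replaced:

```perl
use strict;
use warnings;

# \K keeps everything matched before it out of the replaced text, so the
# substitution inserts at the position right after "dog" instead of
# rewriting the whole matched prefix.
my $url = 'http://example.com/dog/cat/';
(my $copy = $url) =~ s{http://[^/]+/dog\K}{HELLO};
print "$copy\n";   # prints http://example.com/dogHELLO/cat/
```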

Perl: adding a string to $_ is producing strange results

I wrote a super simple script:
#!/usr/bin/perl -w
use strict;
open (F, "<ids.txt") || die "fail: $!\n";
my @ids = <F>;
foreach my $string (@ids) {
chomp($string);
print "$string\n";
}
close F;
This is producing an expected output of all the contents of ids.txt:
hello
world
these
annoying
sourcecode
lines
Now I want to add a file-extension: .txt for every line. This line should do the trick:
#!/usr/bin/perl -w
use strict;
open (F, "<ids.txt") || die "fail: $!\n";
my @ids = <F>;
foreach my $string (@ids) {
chomp($string);
$string .= ".txt";
print "$string\n";
}
close F;
But the result is as follows:
.txto
.txtd
.txte
.txtying
.txtcecode
Instead of appending ".txt" to my lines, the first 4 letters of my string will be replaced by ".txt" Since I want to check if some files exist, I need the full filename with extension.
I have tried chop, chomp, substitution (s/\n//), join, and whatever. But the result is still a replacement instead of an append.
Where is the mistake?
chomp does not remove both \r and \n if the file has DOS line endings and you are running on Linux/Unix.
What you are seeing is actually the original string, a carriage return, and the extension, which overwrites the first 4 characters on the display.
If the incoming file has DOS/Windows line endings you must remove both:
s/\R+$//
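A small sketch of the difference, with the string built inline rather than read from a file:

```perl
use strict;
use warnings;

# On Unix, chomp strips only the trailing "\n", leaving the "\r" of a
# DOS line ending in place; s/\R+$// removes any kind of line break.
my $dos = "hello\r\n";

(my $chomped  = $dos) =~ s/\n$//;    # what chomp does on Unix
(my $stripped = $dos) =~ s/\R+$//;   # handles \r\n, \r, and \n

printf "chomp-like: %d chars, s/\\R+\$//: %d chars\n",
    length $chomped, length $stripped;   # prints 6 and 5
```

The leftover "\r" in the six-character string is exactly what makes the terminal overwrite the start of the line.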
A useful debugging technique when you are not quite sure why your data is getting set to what it is is to dump it with Data::Dumper:
#!/usr/bin/perl -w
use strict;
use Data::Dumper ();
$Data::Dumper::Useqq = 1; # important to be able to actually see differences in whitespace, etc
open (F, "<ids.txt") || die "fail: $!\n";
my @ids = <F>;
foreach my $string (@ids) {
chomp($string);
print "$string\n";
print Data::Dumper::Dumper( { 'string' => $string } );
}
close F;
have you tried this?
foreach my $string (@ids) {
chomp($string);
print $string.".txt\n";
}
I'm not sure what's wrong with your code, though. These results are strange.
