Perl script to search a word inside the directory - linux

I'am looking for a perl script to grep for a string in all files inside a directory .
bash command .
Code:
grep -r 'word' /path/to/dir

This is a fairly canonical task while I couldn't find straight answers with a possibly easiest and simples tool for the job, the handy Path::Tiny
use warnings;
use strict;
use feature 'say';
use Data::Dump; # dd
use Path::Tiny; # path
my $dir = shift // '.';
my $pattern = qr/word/;
my $ret = path($dir)->visit(
sub {
my ($entry, $state) = #_;
return if not -f;
for ($entry->lines) {
if (/$pattern/) {
print "$entry: $_";
push #{$state->{$entry}}, $_;
}
}
},
{ recurse => 1 }
);
dd $ret; # print the returned complex data structure
The way a file is read here, using lines, is just one way to do that. It may not be suitable for extremely large files as it reads all lines at once, where one better read line by line.
The visit method is based on iterator, which accomplishes this task cleanly as well
my $iter = path($dir)->iterator({ recurse => 1 });
my $info;
while (my $e = $iter->()) {
next if not -f $e;
# process the file $e as needed
#/$pattern/ and push #{$info->{$e}}, $_ and print "$e: $_"
# for $e->lines
}
Here we have to provide a data structure to accumulate information but we get more flexibility.
The -f filetest used above, of a "plain" file, is still somewhat permissive; it allows for swap files, for example, which some editors keep during a session (vim for instance). Those will result in all kinds of matches. To stay with purely ASCII or UTF-8 files use -T test.
Otherwise, there are libraries for recursive traversal and searching, for example File::Find (or File::Find::Rule) or Path::Iterator::Rule.
For completeness, here is a take with the core File::Find
use warnings;
use strict;
use feature 'say';
use File::Find;
my #dirs = #ARGV ? #ARGV : '.';
my $pattern = qr/word/;
my %res;
find( sub {
return if not -T; # ASCII or UTF-8 only
open my $fh, '<', $_ or do {
warn "Error opening $File::Find::name: $!";
return;
};
while (<$fh>) {
if (/$pattern/) {
chomp;
push #{$res{$File::Find::name}}, $_
}
}
}, #dirs
);
for my $k (keys %res) {
say "In file $k:";
say "\t$_" for #{$res{$k}};
}

Related

search multi line string from multiple files in a directory

the string to to be searched is:
the file_is being created_automaically {
period=20ns }
the perl script i am using is following ( this script is working fine for single line string but not working for multi line )
#!/usr/bin/perl
my $dir = "/home/vikas";
my #files = glob( $dir . '/*' );
#print "#files";
system ("rm -rf $dir/log.txt");
my $list;
foreach $list(#files){
if( !open(LOGFILE, "$list")){
open (File, ">>", "$dir/log.txt");
select (File);
print " $list \: unable to open file";
close (File);
else {
while (<LOGFILE>){
if($_ =~ /".*the.*automaically.*\{\n.*period\=20ns.*\}"/){
open (File, ">>", "$dir/log.txt");
select (File);
print " $list \: File contain the required string\n";
close (File);
break;
}
}
close (LOGFILE);
}
}
This code does not compile, it contains errors that causes it to fail to execute. You should never post code that you have not first tried to run.
The root of your problem is that for a multiline match, you cannot read the file in line-by-line mode, you have to slurp the whole file into a variable. However, your program contains many flaws. I will demonstrate. Here follows excerpts of your code (with fixed indentation and missing curly braces).
First off, always use:
use strict;
use warnings;
This will save you many headaches and long searches for hidden problems.
system ("rm -rf $dir/log.txt");
This is better done in Perl, where you can control for errors:
unlink "$dir/log.txt" or die "Cannot delete '$dir/log.txt': $!";
foreach my $list (#files) {
# ^^
Declare the loop variable in the loop itself, not before it.
if( !open(LOGFILE, "$list")){
open (File, ">>", "$dir/log.txt");
select (File);
print " $list \: unable to open file";
close (File);
You never have to explicitly select a file handle before you print to it. You just print to the file handle: print File "....". What you are doing is just changing the STDOUT file handle, which is not a good thing to do.
Also, this is error logging, which should go to STDERR instead. This can be done simply by opening STDERR to a file at the beginning of your program. Why do this? If you are not debugging a program at a terminal, for example via the web or some other process where STDERR does not show up on your screen. Otherwise it is just extra work while debugging.
open STDERR, ">", "$dir/log.txt" or die "Cannot open 'log.txt' for overwrite: $!";
This has the added benefit of you not having to delete the log first. And now you do this instead:
if (! open LOGFILE, $list ) {
warn "Unable to open file '$list': $!";
} else ....
warn goes to STDERR, so it is basically the same as print STDERR.
Speaking of open, you should use three argument open with explicit file handle. So it becomes:
if (! open my $fh, "<", $list )
} else {
while (<LOGFILE>) {
Since you are looking for a multiline match, you need to slurp the file(s) instead. This is done by setting the input record separator to undef. Typically like this:
my $file = do { local $/; <$fh> }; # $fh is our file handle, formerly LOGFILE
Next how to apply the regex:
if($_ =~ /".*the.*automaically.*\{\n.*period\=20ns.*\}"/) {
$_ =~ is optional. A regex automatically matches against $_ if no other variable is used.
You should probably not use " in the regex. Unless you have " in the target string. I don't know why you put it there, maybe you think strings need to be quoted inside a regex. If you do, that is wrong. To match the string you have above, you do:
if( /the.*automaically.*{.*period=20ns.*}/s ) {
You don't have to escape \ curly braces {} or equal sign =. You don't have to use quotes. The /s modifier makes . (wildcard character period) also match newline, so we can remove \n. We can remove .* from start or end of string, because that is implied, regex matches are always partial unless anchors are used.
break;
The break keyword is only used with the switch feature, which is experimental, plus you don't use it, or have it enabled. So it is just a bareword, which is wrong. If you want to exit a loop prematurely, you use last. Note that we don't have to use last because we slurp the file, so we have no loop.
Also, you generally should pick suitable variable names. If you have a list of files, the variable that contains the file name should not be called $list, I think. It is logical that it is called $file. And the input file handle should not be called LOGFILE, it should be called $input, or $infh (input file handle).
This is what I get if I apply the above to your program:
use strict;
use warnings;
my $dir = "/home/vikas";
my #files = glob( $dir . '/*' );
my $logfile = "$dir/log.txt";
open STDERR, ">", $logfile or die "Cannot open '$logfile' for overwrite: $!";
foreach my $file (#files) {
if(! open my $input, "<", $file) {
warn "Unable to open '$file': $!";
} else {
my $txt = do { local $/; <$fh> };
if($txt =~ /the.*automaically.*{.*period=20ns.*}/) {
print " $file : File contain the required string\n";
}
}
}
Note that the print goes to STDOUT, not to the error log. It is not common practice to have STDOUT and STDERR to the same file. If you want, you can simply redirect output in the shell, like this:
$ perl foo.pl > output.txt
The following sample code demonstrates usage of regex for multiline case with logger($fname,$msg) subroutine.
Code snippet assumes that input files are relatively small and can be read into a variable $data (an assumption is that computer has enough memory to read into).
NOTE: input data files should be distinguishable from rest files in home directory $ENV{HOME}, in this code sample these files assumed to match pattern test_*.dat, perhaps you do not intend to scan absolutely all files in your home directory (there could be many thousands of files but you interested in a few only)
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';
my($dir,$re,$logfile);
$dir = '/home/vikas/';
$re = qr/the file_is being created_automaically \{\s+period=20ns\s+\}/;
$logfile = $dir . 'logfile.txt';
unlink $logfile if -e $logfile;
for ( glob($dir . "test_*.dat") ) {
if( open my $fh, '<', $_ ) {
my $data = do { local $/; <$fh> };
close $fh;
logger($logfile, "INFO: $_ contains the required string")
if $data =~ /$re/gsm;
} else {
logger($logfile, "WARN: unable to open $_");
}
}
exit 0;
sub logger {
my $fname = shift;
my $text = shift;
open my $fh, '>>', $fname
or die "Couldn't to open $fname";
say $fh $text;
close $fh;
}
Reference: regex modifies, unlink, perlvar

finding a file in directory using perl script

I'm trying to develop a perl script that looks through all of the user's directories for a particular file name without the user having to specify the entire pathname to the file.
For example, let's say the file of interest was data.list. It's located in /home/path/directory/project/userabc/data.list. At the command line, normally the user would have to specify the pathname to the file like in order to access it, like so:
cd /home/path/directory/project/userabc/data.list
Instead, I want the user just to have to enter script.pl ABC in the command line, then the Perl script will automatically run and retrieve the information in the data.list. which in my case, is count the number of lines and upload it using curl. the rest is done, just the part where it can automatically locate the file
Even though very feasible in Perl, this looks more appropriate in Bash:
#!/bin/bash
filename=$(find ~ -name "$1" )
wc -l "$filename"
curl .......
The main issue would of course be if you have multiple files data1, say for example /home/user/dir1/data1 and /home/user/dir2/data1. You will need a way to handle that. And how you handle it would depend on your specific situation.
In Perl that would be much more complicated:
#! /usr/bin/perl -w
eval 'exec /usr/bin/perl -S $0 ${1+"$#"}'
if 0; #$running_under_some_shell
use strict;
# Import the module File::Find, which will do all the real work
use File::Find ();
# Set the variable $File::Find::dont_use_nlink if you're using AFS,
# since AFS cheats.
# for the convenience of &wanted calls, including -eval statements:
# Here, we "import" specific variables from the File::Find module
# The purpose is to be able to just type '$name' instead of the
# complete '$File::Find::name'.
use vars qw/*name *dir *prune/;
*name = *File::Find::name;
*dir = *File::Find::dir;
*prune = *File::Find::prune;
# We declare the sub here; the content of the sub will be created later.
sub wanted;
# This is a simple way to get the first argument. There is no
# checking on validity.
our $filename=$ARGV[0];
# Traverse desired filesystem. /home is the top-directory where we
# start our seach. The sub wanted will be executed for every file
# we find
File::Find::find({wanted => \&wanted}, '/home');
exit;
sub wanted {
# Check if the file is our desired filename
if ( /^$filename\z/) {
# Open the file, read it and count its lines
my $lines=0;
open(my $F,'<',$name) or die "Cannot open $name";
while (<$F>){ $lines++; }
print("$name: $lines\n");
# Your curl command here
}
}
You will need to look at the argument-parsing, for which I simply used $ARGV[0] and I do dont know what your curl looks like.
A more simple (though not recommended) way would be to abuse Perl as a sort of shell:
#!/usr/bin/perl
#
my $fn=`find /home -name '$ARGV[0]'`;
chomp $fn;
my $wc=`wc -l '$fn'`;
print "$wc\n";
system ("your curl command");
Following code snippet demonstrates one of many ways to achieve desired result.
The code takes one parameter, a word to look for in all subdirectories inside file(s) data.list. And prints out a list of found files in a terminal.
The code utilizes subroutine lookup($dir,$filename,$search) which calls itself recursively once it come across a subdirectory.
The search starts from current working directory (in question was not specified a directory as start point).
use strict;
use warnings;
use feature 'say';
my $search = shift || die "Specify what look for";
my $fname = 'data.list';
my $found = lookup('.',$fname,$search);
if( #$found ) {
say for #$found;
} else {
say 'Not found';
}
exit 0;
sub lookup {
my $dir = shift;
my $fname = shift;
my $search = shift;
my $files;
my #items = glob("$dir/*");
for my $item (#items) {
if( -f $item && $item =~ /\b$fname\b/ ) {
my $found;
open my $fh, '<', $item or die $!;
while( my $line = <$fh> ) {
$found = 1 if $line =~ /\b$search\b/;
if( $found ) {
push #{$files}, $item;
last;
}
}
close $fh;
}
if( -d $item ) {
my $ret = lookup($item,$fname,$search);
push #{$files}, $_ for #$ret;
}
}
return $files;
}
Run as script.pl search_word
Output sample
./capacitor/data.list
./examples/data.list
./examples/test/data.list
Reference:
glob,
Perl file test operators

How to get Perl to loop over all files in a directory?

I have a Perl script with contains
open (FILE, '<', "$ARGV[0]") || die "Unable to open $ARGV[0]\n";
while (defined (my $line = <FILE>)) {
# do stuff
}
close FILE;
and I would like to run this script on all .pp files in a directory, so I have written a wrapper script in Bash
#!/bin/bash
for f in /etc/puppet/nodes/*.pp; do
/etc/puppet/nodes/brackets.pl $f
done
Question
Is it possible to avoid the wrapper script and have the Perl script do it instead?
Yes.
The for f in ...; translates to the Perl
for my $f (...) { ... } (in the case of lists) or
while (my $f = ...) { ... } (in the case of iterators).
The glob expression that you use (/etc/puppet/nodes/*.pp) can be evaluated inside Perl via the glob function: glob '/etc/puppet/nodes/*.pp'.
Together with some style improvements:
use strict; use warnings;
use autodie; # automatic error handling
while (defined(my $file = glob '/etc/puppet/nodes/*.pp')) {
open my $fh, "<", $file; # lexical file handles, automatic error handling
while (defined( my $line = <$fh> )) {
do stuff;
}
close $fh;
}
Then:
$ /etc/puppet/nodes/brackets.pl
This isn’t quite what you asked, but another possibility is to use <>:
while (<>) {
my $line = $_;
# do stuff
}
Then you would put the filenames on the command line, like this:
/etc/puppet/nodes/brackets.pl /etc/puppet/nodes/*.pp
Perl opens and closes each file for you. (Inside the loop, the current filename and line number are $ARGV and $. respectively.)
Jason Orendorff has the right answer:
From Perlop (I/O Operators)
The null filehandle <> is special: it can be used to emulate the behavior of sed and awk, and any other Unix filter program that takes a list of filenames, doing the same to each line of input from all of them. Input from <> comes either from standard input, or from each file listed on the command line.
This doesn't require opendir. It doesn't require using globs or hard coding stuff in your program. This is the natural way to read in all files that are found on the command line, or piped from STDIN into the program.
With this, you could do:
$ myprog.pl /etc/puppet/nodes/*.pp
or
$ myprog.pl /etc/puppet/nodes/*.pp.backup
or even:
$ cat /etc/puppet/nodes/*.pp | myprog.pl
take a look at this documentation it explains all you need to know
#!/usr/bin/perl
use strict;
use warnings;
my $dir = '/tmp';
opendir(DIR, $dir) or die $!;
while (my $file = readdir(DIR)) {
# We only want files
next unless (-f "$dir/$file");
# Use a regular expression to find files ending in .pp
next unless ($file =~ m/\.pp$/);
open (FILE, '<', $file) || die "Unable to open $file\n";
while (defined (my $line = <FILE>)) {
# do stuff
}
}
closedir(DIR);
exit 0;
I would suggest to put all filenames to array and then use this array as parameters list to your perl method or script. Please see following code:
use Data::Dumper
$dirname = "/etc/puppet/nodes";
opendir ( DIR, $dirname ) || die "Error in opening dir $dirname\n";
my #files = grep {/.*\.pp/} readdir(DIR);
print Dumper(#files);
closedir(DIR);
Now you can pass \#files as parameter to any perl method.
my #x = <*>;
foreach ( #x ) {
chomp;
if ( -f "$_" ) {
print "process $_\n";
# do stuff
next;
};
};
Perl can shell out to execute system commands in various ways, the most straightforward is using backticks ``
use strict;
use warnings FATAL => 'all';
my #ls = `ls /etc/puppet/nodes/*.pp`;
for my $f ( #ls ) {
open (my $FILE, '<', $f) || die "Unable to open $f\n";
while (defined (my $line = <$FILE>)) {
# do stuff
}
close $FILE;
}
(Note: you should always use strict; and use warnings;)

Extract text and write to a new file in Perl

Simply put, I want to extract text from a file and save that text to a new file, using Perl.
Here is my code, thus far:
#!/usr/local/bin/perl
use warnings;
use strict;
use File::Slurp;
use FileHandle;
use Fcntl qw(:DEFAULT :flock :seek); # Import LOCK_* constants
my $F_IN = FileHandle->new("<$ARGV[0]");
my $F_OUT = FileHandle->new(">PerlTest.txt");
while (my $line = $F_IN->getline) {
$line =~ m|foobar|g;
$F_OUT->print($line);
# I want to only copy the text that matches, not the whole line.
# I changed the example text to 'foobar' to avoid confusion.
}
$F_IN->close();
$F_OUT->close();
Obviously, it's copying the line. How can I extract and print specific text from a file, instead of the whole line?
If it can only happen once per line:
while (<>) {
print "$1\n" if /(thebigredpillow)/;
}
If it can happen multiple times per line:
while (<>) {
while (/(thebigredpillow)/g) {
print "$1\n";
}
}
Usage:
script file.in >file.out
You could use capturing parentheses to grab the matched string:
while (my $line = $F_IN->getline) {
if ($line =~ m|(thebigredpillow)|) {
$F_OUT->print("$1\n");
}
}
See perldoc perlre.
#!/usr/local/bin/perl
use warnings;
use strict;
use IO::All;
my #lines = io($ARGV[0])->slurp;
foreach(#lines) {
if(/thebigredpillow/g) {
$_ >> io('PerlTest.txt');
}
}

Perl: adding a string to $_ is producing strange results

I wrote a super simple script:
#!/usr/bin/perl -w
use strict;
open (F, "<ids.txt") || die "fail: $!\n";
my #ids = <F>;
foreach my $string (#ids) {
chomp($string);
print "$string\n";
}
close F;
This is producing an expected output of all the contents of ids.txt:
hello
world
these
annoying
sourcecode
lines
Now I want to add a file-extension: .txt for every line. This line should do the trick:
#!/usr/bin/perl -w
use strict;
open (F, "<ids.txt") || die "fail: $!\n";
my #ids = <F>;
foreach my $string (#ids) {
chomp($string);
$string .= ".txt";
print "$string\n";
}
close F;
But the result is as follows:
.txto
.txtd
.txte
.txtying
.txtcecode
Instead of appending ".txt" to my lines, the first 4 letters of my string will be replaced by ".txt" Since I want to check if some files exist, I need the full filename with extension.
I have tried to chop, chomp, to substitute (s/\n//), joins and whatever. But the result is still a replacement instead of an append.
Where is the mistake?
Chomp does not remove BOTH \r and \n if the file has DOS line endings and you are running on Linux/Unix.
What you are seeing is actually the original string, a carriage return, and the extension, which overwrites the first 4 characters on the display.
If the incoming file has DOS/Windows line endings you must remove both:
s/\R+$//
A useful debugging technique when you are not quite sure why your data is getting set to what it is is to dump it with Data::Dumper:
#!/usr/bin/perl -w
use strict;
use Data::Dumper ();
$Data::Dumper::Useqq = 1; # important to be able to actually see differences in whitespace, etc
open (F, "<ids.txt") || die "fail: $!\n";
my #ids = <F>;
foreach my $string (#ids) {
chomp($string);
print "$string\n";
print Data::Dumper::Dumper( { 'string' => $string } );
}
close F;
have you tried this?
foreach my $string (#ids) {
chomp($string);
print $string.".txt\n";
}
I'm not sure what's wrong with your code though. these results are strange

Resources