How to use `diff` on files whose paths contain whitespace - linux

I am trying to find the differences between files, but the filename and directory name contain white space. I am trying to execute the command in a Perl script.
diff /home/users/feroz/logs/back_up20161112/Security File/General Security.csv /home/users/feroz/logs/back_up20161113/Security File/General Security.csv
open( my $FH, '>', $logfile ) or die "Cannot open the file '$logfile' $!";
foreach $filename ( keys %filenames ) {
    $old_file = $parent_directory . $previous_date . $search_directory . "$filenames{$filename}";
    $new_file = $parent_directory . $current_date . $search_directory . "$filenames{$filename}";
    if ( !-e $old_file ) {
        #print ("\nFile does not exist in previos date backup");
        print $FH "\nERROR:'$old_file' ---- does not exist in the backup directory ";
    }
    elsif ( !-e $new_file ) {
        #print ("\n The file does not exist in current directory");
        print $FH "\nERROR:'$new_file' --- does not exist in the present directory ";
    }
    else {
        # print $FH "\nDifference between the files $filenames{$filename} of $previous_date and $current_date ";
        my $cmd = 'diff $old_file $new_file| xargs -0';
        open( my $OH, '|-', $cmd ) or die "Failed to read the output";
        while ( <OH> ) {
            print $FH "$_";
        }
        close $OH;
    }
}

To be absolutely safe, use String::ShellQuote:
use String::ShellQuote;
my $old_file2 = shell_quote($old_file);
my $new_file2 = shell_quote($new_file);
`diff $old_file2 $new_file2`;

Thank you for showing your Perl code
Single quotes don't interpolate, so that passes the literal strings $old_file and $new_file to the command instead of those variables' contents. The shell then tries to expand them as shell variables.
I suggest that you write this instead
my $cmd = qq{diff '$old_file' '$new_file' | xargs -0};
open( my $OH, '-|', $cmd ) or die "Failed to read the output";
That uses double quotes (qq{...}) around the command string so that the variables are interpolated. The file paths have single quotes around them to tell the shell to treat each one as a single string. The open mode is also changed from '|-' to '-|', because you want to read the command's output rather than write to its input.
This won't work if there's a chance that your file paths could contain a single quote, but that's highly unusual.

Pass arguments out-of-band to avoid the need to shell-quote them, rather than interpolating them into a string which is parsed by a shell as a script. Substituting filenames as literal text into a script generates exposure to shell injection attacks -- the shell-scripting equivalent to the family of database security bugs known as SQL injection.
Without Any Shell At All
The pipe to xargs -0 appears to be serving no purpose here. Eliminating it allows this to be run without any shell involved at all:
open(my $fh, "-|", "diff", $old_file, $new_file)
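A slightly fuller sketch of the shell-free approach, assuming a hypothetical helper named run_capture (it is not part of the original answer). Because the command is handed to open as a list, each argument reaches the program verbatim, spaces included. The demo uses the running Perl interpreter ($^X) as the child process so that it does not depend on any particular files existing.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command without a shell and return its output lines.
sub run_capture {
    my @cmd = @_;
    # List-form pipe open: Perl executes $cmd[0] directly, so no shell parses the arguments.
    open(my $fh, '-|', @cmd) or die "Cannot run '$cmd[0]': $!";
    my @lines = <$fh>;
    # close returns false when the command exits non-zero (diff does so when the
    # files differ), so the result is deliberately not treated as fatal here.
    close $fh;
    return @lines;
}

# Each path arrives in the child as exactly one argument despite the embedded spaces.
my @seen = run_capture($^X, '-e', 'print "$_\n" for @ARGV',
                       'Security File/General Security.csv', 'other file.csv');
print "child saw: $_" for @seen;
```

With diff, the call would be run_capture('diff', $old_file, $new_file).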
With Shell Arguments Passed Out-Of-Band From Script Text
If you really do want the shell to be invoked, the safe thing to do is to keep the script text an audited constant, and have it retrieve arguments from either the argv list passed to the shell or the environment.
# Putting $1 and $2 in double quotes ensures that the shell treats their contents as literal.
# The "_" is used for $0 in the shell.
my $shell_script = 'diff "$1" "$2" | xargs -0';
open(my $fh, "-|",
     "sh", "-c", $shell_script,
     "_", $old_file, $new_file)
    or die "Cannot run sh: $!";

You can either
put every path segment that contains whitespace inside quotes
diff /home/users/feroz/logs/back_up20161112/"Security File"/"General Security.csv" /home/users/feroz/logs/back_up20161113/"Security File"/"General Security.csv"
or escape each whitespace character with a backslash
diff /home/users/feroz/logs/back_up20161112/Security\ File/General\ Security.csv /home/users/feroz/logs/back_up20161113/Security\ File/General\ Security.csv


The string to be searched for is:
the file_is being created_automaically {
period=20ns }
The Perl script I am using is the following (it works fine for a single-line string but not for a multi-line one):
my $dir = "/home/vikas";
my @files = glob( $dir . '/*' );
#print "@files";
system ("rm -rf $dir/log.txt");
my $list;
foreach $list(@files){
if( !open(LOGFILE, "$list")){
open (File, ">>", "$dir/log.txt");
select (File);
print " $list \: unable to open file";
close (File);
else {
while (<LOGFILE>){
if($_ =~ /".*the.*automaically.*\{\n.*period\=20ns.*\}"/){
open (File, ">>", "$dir/log.txt");
select (File);
print " $list \: File contain the required string\n";
close (File);
close (LOGFILE);
This code does not compile; it contains errors that cause it to fail to execute. You should never post code that you have not first tried to run.
The root of your problem is that for a multiline match you cannot read the file in line-by-line mode; you have to slurp the whole file into a variable. However, your program contains many other flaws. I will demonstrate. Here follow excerpts of your code (with fixed indentation and the missing curly braces added).
First off, always use:
use strict;
use warnings;
This will save you many headaches and long searches for hidden problems.
system ("rm -rf $dir/log.txt");
This is better done in Perl, where you can control for errors:
unlink "$dir/log.txt" or die "Cannot delete '$dir/log.txt': $!";
foreach my $list (@files) {
#       ^^
Declare the loop variable in the loop itself, not before it.
if( !open(LOGFILE, "$list")){
open (File, ">>", "$dir/log.txt");
select (File);
print " $list \: unable to open file";
close (File);
You never have to explicitly select a file handle before you print to it. You just print to the file handle: print File "....". What you are doing is just changing the STDOUT file handle, which is not a good thing to do.
Also, this is error logging, which should go to STDERR instead. This can be done simply by opening STDERR to a file at the beginning of your program. Why do this? It is useful when you are not running the program at a terminal, for example when it runs via the web or some other process where STDERR does not show up on your screen. Otherwise it is just extra work while debugging.
open STDERR, ">", "$dir/log.txt" or die "Cannot open 'log.txt' for overwrite: $!";
This has the added benefit of you not having to delete the log first. And now you do this instead:
if (! open LOGFILE, $list ) {
warn "Unable to open file '$list': $!";
} else ....
warn goes to STDERR, so it is basically the same as print STDERR.
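A minimal sketch of that idea, assuming a hypothetical helper named with_stderr_to that redirects STDERR to a log file only while a piece of code runs and then restores it. The temporary log file and the message text are made up for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: send STDERR to $logfile while $code runs, then restore it.
sub with_stderr_to {
    my ($logfile, $code) = @_;
    open my $saved, '>&', \*STDERR or die "Cannot duplicate STDERR: $!";
    open STDERR, '>', $logfile     or die "Cannot open '$logfile' for overwrite: $!";
    $code->();
    close STDERR;
    open STDERR, '>&', $saved      or die "Cannot restore STDERR: $!";
    return;
}

# Demo: the warn below lands in the log file instead of on the terminal.
my ($tmp, $logname) = tempfile(UNLINK => 1);
close $tmp;
with_stderr_to($logname, sub { warn "Unable to open file 'example.txt'\n" });

# Read the log back to show what was written.
open(my $in, '<', $logname) or die "Cannot read '$logname': $!";
my $logged = do { local $/; <$in> };
close $in;
print "log contains: $logged";
```

A program that should log for its whole lifetime can skip the save and restore steps and simply reopen STDERR once at the top, as shown above.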
Speaking of open, you should use the three-argument form of open with a lexical file handle. So it becomes:
if (! open my $fh, "<", $list )
} else {
while (<LOGFILE>) {
Since you are looking for a multiline match, you need to slurp the file(s) instead. This is done by setting the input record separator to undef. Typically like this:
my $file = do { local $/; <$fh> }; # $fh is our file handle, formerly LOGFILE
Next how to apply the regex:
if($_ =~ /".*the.*automaically.*\{\n.*period\=20ns.*\}"/) {
$_ =~ is optional. A regex automatically matches against $_ if no other variable is used.
You should probably not use " in the regex. Unless you have " in the target string. I don't know why you put it there, maybe you think strings need to be quoted inside a regex. If you do, that is wrong. To match the string you have above, you do:
if( /the.*automaically.*\{.*period=20ns.*\}/s ) {
You don't have to escape the equal sign =, and you don't have to use quotes. Keep the backslash on the opening curly brace, though: newer versions of Perl warn about, or reject, an unescaped literal { in some positions, and escaping it is always safe. The /s modifier makes . (the wildcard character, period) also match newline, so we can remove \n. We can remove .* from the start and end of the pattern, because that is implied: regex matches are always partial unless anchors are used.
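The effect of slurping plus /s can be checked with a small sketch. The helper name file_matches and the temporary file are assumptions made for the demonstration; the sample text is the two-line string from the question.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: slurp a whole file and test it against a pattern.
sub file_matches {
    my ($path, $re) = @_;
    open(my $fh, '<', $path) or die "Cannot open '$path': $!";
    my $text = do { local $/; <$fh> };   # with $/ undefined, <$fh> returns the whole file
    close $fh;
    return $text =~ $re ? 1 : 0;
}

# Write the two-line sample from the question to a temporary file.
my ($tmp, $tmpname) = tempfile(UNLINK => 1);
print {$tmp} "the file_is being created_automaically {\nperiod=20ns }\n";
close $tmp;

my $with_s    = qr/the.*automaically.*\{.*period=20ns.*\}/s;  # . may cross the newline
my $without_s = qr/the.*automaically.*\{.*period=20ns.*\}/;   # . stops at the newline

print "with /s:    ", file_matches($tmpname, $with_s), "\n";
print "without /s: ", file_matches($tmpname, $without_s), "\n";
```

Only the pattern compiled with /s matches, because without it .* cannot get past the newline between { and period.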
The break keyword is only used with the switch feature, which is experimental, plus you don't use it, or have it enabled. So it is just a bareword, which is wrong. If you want to exit a loop prematurely, you use last. Note that we don't have to use last because we slurp the file, so we have no loop.
Also, you generally should pick suitable variable names. If you have a list of files, the variable that contains the file name should not be called $list, I think. It is logical that it is called $file. And the input file handle should not be called LOGFILE, it should be called $input, or $infh (input file handle).
This is what I get if I apply the above to your program:
use strict;
use warnings;
my $dir = "/home/vikas";
my @files = glob( $dir . '/*' );
my $logfile = "$dir/log.txt";
open STDERR, ">", $logfile or die "Cannot open '$logfile' for overwrite: $!";
foreach my $file (@files) {
    if (! open my $input, "<", $file) {
        warn "Unable to open '$file': $!";
    } else {
        my $txt = do { local $/; <$input> };   # slurp the whole file
        if ($txt =~ /the.*automaically.*\{.*period=20ns.*\}/s) {
            print " $file : File contain the required string\n";
        }
    }
}
Note that the print goes to STDOUT, not to the error log. It is not common practice to have STDOUT and STDERR to the same file. If you want, you can simply redirect output in the shell, like this:
$ perl > output.txt
The following sample code demonstrates using a regex for the multi-line case, with a logger($fname, $msg) subroutine.
The snippet assumes that the input files are relatively small and can be read into a variable $data (that is, the computer has enough memory to hold each file).
NOTE: the input data files should be distinguishable from the other files in the home directory. In this sample they are assumed to match the pattern test_*.dat; you probably do not intend to scan every file in your home directory (there could be many thousands of files when you are interested in only a few).
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';
my $dir = '/home/vikas/';
my $re = qr/the file_is being created_automaically \{\s+period=20ns\s+\}/;
my $logfile = $dir . 'logfile.txt';
unlink $logfile if -e $logfile;
for ( glob($dir . "test_*.dat") ) {
    if( open my $fh, '<', $_ ) {
        my $data = do { local $/; <$fh> };   # slurp the whole file
        close $fh;
        logger($logfile, "INFO: $_ contains the required string")
            if $data =~ $re;
    } else {
        logger($logfile, "WARN: unable to open $_");
    }
}
exit 0;
# Append one line of text to the log file
sub logger {
    my $fname = shift;
    my $text = shift;
    open my $fh, '>>', $fname
        or die "Couldn't open $fname: $!";
    say $fh $text;
    close $fh;
}
Reference: regex modifiers, unlink, perlvar

I am trying the following code in one of my Perl scripts and getting an error. How do I execute the following shell command and store its output in a variable?
#!/usr/bin/perl -w
my $p = $( PROCS=`echo /proc/[0-9]*|wc -w|tr -d ' '`; read L1 L2 L3 DUMMY < /proc/loadavg ; echo ${L1}:${L2}:${L3}:${PROCS} );
print $p;
Bareword found where operator expected at /tmp/ line 3, near "$( PROCS"
(Missing operator before PROCS?)
syntax error at /tmp/ line 3, near "$( PROCS"
Unterminated <> operator at /tmp/ line 3.
What is wrong?
my $p = $( PROCS=`echo /proc/[0-9]*|wc -w|tr -d ' '`; read L1 L2 L3 DUMMY < /proc/loadavg ; echo ${L1}:${L2}:${L3}:${PROCS} );
Isn't perl. It's how you'd execute a command in bash.
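To run a command in perl you can:
use system, which runs the command and returns its exit status (the output is not captured).
put your command in backticks, which returns the command's output.
use qx (quote-execute), which is the same as backticks with delimiters of your choice.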
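A short sketch of the difference between these options. The child command here is the running Perl interpreter itself ($^X), chosen only so the example does not depend on any other program being installed.

```perl
use strict;
use warnings;

# $^X is the path of the running perl; it stands in for "some external command".
my @cmd = ($^X, '-e', 'print "hello from child\n"; exit 3');

# system() lets the child write to our STDOUT and returns the wait status, not the output.
my $status    = system(@cmd);
my $exit_code = $status >> 8;

# qx// (the same as backticks) captures what the child printed.
# The command is a string here, so it goes through the shell; fine for this fixed text.
my $output = qx{"$^X" -e "print 42"};

print "system exit code: $exit_code\n";
print "qx output: $output\n";
```

To store a command's output in a variable, as the question asks, backticks or qx are the ones to reach for.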
However, you're enumerating a directory there, wordcounting, tr-ing and reading. So you don't actually need to do all that using a shell command. And indeed, I'd discourage you from doing so, because that's just a way to make a mess with no productive benefit.
Looks like what you're after as an end result is the 3 load average samples and a count of number of processes. Is that right?
In which case:
my $proc_count = scalar ( () = glob ( "/proc/[0-9]*" ));
open ( my $la, "<", "/proc/loadavg" ) or warn $!;
print join ( ":", split ( /\s+/, <$la> ), $proc_count ),"\n";
Something like that, anyway.
Simply printing a shell command in your Perl script won't actually execute it. You have to tell Perl that it's an external command, which you can do with system:
use strict;
use warnings;
my $command = q{
    PROCS=`echo /proc/[0-9]*|wc -w|tr -d ' '`;
    read L1 L2 L3 DUMMY < /proc/loadavg;
    echo ${L1}:${L2}:${L3}:${PROCS}
};
system($command);   # to capture the output instead: my $p = `$command`;
(Note that you should put use strict; use warnings; at the top of every Perl script you write.)
However, it's generally better to use native Perl functionality instead of system. All you're doing is reading from files, which Perl is perfectly capable of doing:
use strict;
use warnings;
use 5.010;
my @procs = glob '/proc/[0-9]*';
my $file = '/proc/loadavg';
open my $fh, '<', $file or die "Failed to open '$file': $!";
my $load = <$fh>;
say(join ':', (split ' ', $load)[0..2], scalar @procs);
Even better might be to use the Proc::ProcessTable module, which provides a consistent interface to the /proc filesystem across different flavors of *nix. It got some bad reviews early on but is supposedly getting bugfixes now; I haven't used it myself but you might take a look.

I have a Perl script which contains
open (FILE, '<', "$ARGV[0]") || die "Unable to open $ARGV[0]\n";
while (defined (my $line = <FILE>)) {
    # do stuff
}
close FILE;
and I would like to run this script on all .pp files in a directory, so I have written a wrapper script in Bash
for f in /etc/puppet/nodes/*.pp; do
    /etc/puppet/nodes/ $f
done
Is it possible to avoid the wrapper script and have the Perl script do it instead?
The for f in ...; translates to the Perl
for my $f (...) { ... } (in the case of lists) or
while (my $f = ...) { ... } (in the case of iterators).
The glob expression that you use (/etc/puppet/nodes/*.pp) can be evaluated inside Perl via the glob function: glob '/etc/puppet/nodes/*.pp'.
Together with some style improvements:
use strict; use warnings;
use autodie; # automatic error handling
while (defined(my $file = glob '/etc/puppet/nodes/*.pp')) {
    open my $fh, "<", $file; # lexical file handles, automatic error handling
    while (defined( my $line = <$fh> )) {
        # do stuff
    }
    close $fh;
}
$ /etc/puppet/nodes/
This isn’t quite what you asked, but another possibility is to use <>:
while (<>) {
    my $line = $_;
    # do stuff
}
Then you would put the filenames on the command line, like this:
/etc/puppet/nodes/ /etc/puppet/nodes/*.pp
Perl opens and closes each file for you. (Inside the loop, the current filename and line number are $ARGV and $. respectively.)
Jason Orendorff has the right answer:
From Perlop (I/O Operators)
The null filehandle <> is special: it can be used to emulate the behavior of sed and awk, and any other Unix filter program that takes a list of filenames, doing the same to each line of input from all of them. Input from <> comes either from standard input, or from each file listed on the command line.
This doesn't require opendir. It doesn't require using globs or hard coding stuff in your program. This is the natural way to read in all files that are found on the command line, or piped from STDIN into the program.
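A small sketch of how <> walks through several files, assuming a hypothetical helper named count_lines. It localizes @ARGV so the behaviour can be shown without real command-line arguments; the two temporary files are created only for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: count lines per file using <>, as if the names came from the command line.
sub count_lines {
    local @ARGV = @_;          # <> reads the files named in @ARGV
    my %count;
    while (my $line = <>) {
        $count{$ARGV}++;       # $ARGV holds the name of the file currently being read
    }
    return %count;
}

# Two throwaway input files: one with two lines, one with a single line.
my ($fh1, $name1) = tempfile(UNLINK => 1);
print {$fh1} "a\nb\n";
close $fh1;
my ($fh2, $name2) = tempfile(UNLINK => 1);
print {$fh2} "c\n";
close $fh2;

my %count = count_lines($name1, $name2);
print "$_: $count{$_} line(s)\n" for sort keys %count;
```

In a real script the localizing is unnecessary: a plain while (<>) loop reads whatever files the shell expanded onto the command line.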
With this, you could do:
$ /etc/puppet/nodes/*.pp
$ /etc/puppet/nodes/*.pp.backup
or even:
$ cat /etc/puppet/nodes/*.pp |
Take a look at this documentation; it explains all you need to know.
use strict;
use warnings;
my $dir = '/tmp';
opendir(DIR, $dir) or die $!;
while (my $file = readdir(DIR)) {
    # We only want files
    next unless (-f "$dir/$file");
    # Use a regular expression to find files ending in .pp
    next unless ($file =~ m/\.pp$/);
    # readdir returns bare names, so prepend the directory when opening
    open (FILE, '<', "$dir/$file") || die "Unable to open $dir/$file\n";
    while (defined (my $line = <FILE>)) {
        # do stuff
    }
    close FILE;
}
closedir(DIR);
exit 0;
I would suggest putting all the filenames into an array and then passing that array as a parameter list to your Perl subroutine or script. See the following code:
use Data::Dumper;
my $dirname = "/etc/puppet/nodes";
opendir ( DIR, $dirname ) || die "Error in opening dir $dirname\n";
my @files = grep { /\.pp$/ } readdir(DIR);
closedir(DIR);
print Dumper(\@files);
Now you can pass \@files as a parameter to any Perl subroutine. Note that readdir returns bare file names, so prepend $dirname before opening them.
my @x = <*>;
foreach ( @x ) {
    if ( -f "$_" ) {
        print "process $_\n";
        # do stuff
    }
}
Perl can shell out to execute system commands in various ways, the most straightforward is using backticks ``
use strict;
use warnings FATAL => 'all';
chomp( my @ls = `ls /etc/puppet/nodes/*.pp` );   # strip the trailing newline from each name
for my $f ( @ls ) {
    open (my $FILE, '<', $f) || die "Unable to open $f\n";
    while (defined (my $line = <$FILE>)) {
        # do stuff
    }
    close $FILE;
}
(Note: you should always use strict; and use warnings;)

The shell script will be passed a string of arguments. The position of the key/value I am looking to parse out may change over time, i.e. it may come before or after another key at any time so parsing between two keys wouldn't be an option.
I am looking to parse the domain key out of a string like this:
maxpark 0 maxsub n domain maxlst n max_defer_fail_percentage user oli force no_cache_update 0 maxpop n maxaddon 0 locale en contactemail
The key would be "domain" the value would be "". The domain key could have more than one '.' in it so I would need to grab the entire domain key.
I am not the best with regular expressions but I imagine using 'sed' is what I'm going to need to do.
I am accessing this full string using $*, if I could simply reference the key by accessing $DOMAIN that would be great, but since my only option is to access based on position, $3, and the position could change, that isn't an option
Solved the problem using PERL.
#!/usr/bin/perl -w
use strict;
my %OPTS = @ARGV;
open(FILE, "</var/named/$OPTS{'domain'}.db") || die "File not found";
my @lines = <FILE>;
close(FILE);
my @newlines;
foreach(@lines) {
    $_ =~ s/$LOCAL_IP/$PUBLIC_IP/g;
    push(@newlines, $_);
}
open(FILE, ">/var/named/$OPTS{'domain'}.db") || die "File not found";
print FILE @newlines;
close(FILE);
If you do have perl, just use this one-liner from your shell script.
domain=$( echo $* | perl -ne '/domain\s([^\s]+)\s/ and print "$1"' )
Or if you'd rather just do it with sed:
domain=$( echo $* | sed 's/.*\<domain \([^ ]\+\).*/\1/' )

Mac OS X does not have the useful Linux command rename, which has the following format:
rename 'perl-regex' list-of-files
So here's what I have put together but it does not rename any files ($new is always the same as $file):
#!/usr/bin/env perl -w
use strict;
use File::Copy 'move';
my $regex=shift;
my @files=@ARGV;
for my $file (@files)
{
    my $new=$file;
    $new =~ "$regex"; # this is where the problem is !!!
    if ($new ne $file)
    {
        print STDOUT "$file --> $new \n";
        move $file, ${new} or warn "Could not rename $file to $new";
    }
}
It is as if I am not passing the regexp and if I hard code it to
$new =~ s/TMP/tmp/;
it will work just fine...
Any thoughts?
$operator = 's/TMP/tmp/';
print $operator;
doesn't magically evaluate the operator, so it should be no surprise that
$operator = 's/TMP/tmp/';
$x =~ $operator;
doesn't either. If you want to evaluate Perl code, you're going to have to pass it to the Perl interpreter. You can access it using eval EXPR.
$operator = 's/TMP/tmp/';
eval('$x =~ '.$operator.'; 1')
    or die $@;
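The same idea as a small reusable sketch, assuming a hypothetical helper named apply_op. It evaluates the expression against $_, which is how the rename utility treats its first argument. Note that string eval runs whatever code it is given, so this is only appropriate for expressions you typed yourself.

```perl
use strict;
use warnings;

# Hypothetical helper: apply a Perl expression such as 's/TMP/tmp/' to a name and return the result.
sub apply_op {
    my ($op, $name) = @_;
    local $_ = $name;              # the expression operates on $_, like s/// with no target
    eval $op;                      # string eval: compiles and runs the text in $op
    die "Bad expression '$op': $@" if $@;
    return $_;
}

print apply_op('s/TMP/tmp/', 'report.TMP'), "\n";
print apply_op('y/A-Z/a-z/', 'README'), "\n";
```

A name the expression does not change comes back unmodified, which makes the "did anything change?" test from the question straightforward.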
You cannot put the whole statement s/TMP/tmp/ in a variable. You can, though, do something like
$new =~ s/$find/$replace/;
$find being your regex and $replace what you want to replace the matches with.
If you still want to pass the whole statement, you might want to take a look at eval().
There are two ways this can be solved elegantly
Require two separate command-line arguments: one for the regex, and one for the replacement. This is inelegant and restrictive.
my ($search, $replace, @files) = @ARGV;
my $new = $file;
$new =~ s/$search/'"' . $replace . '"'/ee; # /ee wraps the replacement in double quotes and evals it,
                                           # allowing us to interpolate vars
Invoked like my-rename '(.*)\.txt' '@{[our $i++]}-$' *.txt. This allows executing almost any code⁽¹⁾ via string variable interpolation.
(1): no nested regexes in older perls
Just allow arbitrary Perl code, similar to perl -ne'...'. The semantics of the -n switch are that the current line is passed as $_. It would make sense to pass filenames as $_, and use the value of the last statement as the new filename. This would lead to something like
# somewhat tested
my ($eval_content, @files) = @ARGV;
my $code = eval q' sub {
    no strict; # could be helpful ;-)
    my @_out_;
    FILENAME:
    for (@_) {
        my $_orig_ = $_;
        push @_out_, [ $_orig_ => do { ' . $eval_content . q' } ];
        # or
        # do { " . $eval_content . " };
        # push @_out_, [ $_orig_, $_ ];
        # if you want to use $_ as out-argument (like -p).
        # Can lead to more concise code.
    }
    return @_out_;
} ';
die "Eval error: $@" if $@;
for my $rename ($code->(@files)) {
    my ($from, $to) = @$rename;
    # rename $from to $to here
}
This could be invoked like my-rename 'next FILENAME if /^\./; our $i++; s/(.*)\.txt/$i-$; $_' *.txt. That skips all files starting with a dot, registers a global variable $i, puts a number counting upwards from one in front of each filename, and changes the extension. Then we return $_ in the last statement.
The loop builds pairs of the original and the new filename, which can be processed in the second loop.
This is probably quite flexible, and not overly inefficient.
Well, it is already a Perl utility, and it's on CPAN: You can use the module which comes with that utility, File::Rename, in a direct way:
#!/usr/bin/env perl
use File::Rename qw(rename);
rename @ARGV, sub { s/TMP/tmp/ }, 'verbose';
Another possibility is to concatenate the module and the script from that distribution and put the resulting file somewhere in your $PATH.
Better download the real script with no dependencies:
#!/usr/bin/perl -w
# This script was developed by Robin Barker (,
# from Larry Wall's original script eg/rename from the perl source.
# This script is free software; you can redistribute it and/or modify it
# under the same terms as Perl itself.
# Larry(?)'s RCS header:
# RCSfile: rename,v Revision: 4.1 Date: 92/08/07 17:20:30
# $RCSfile: rename,v $$Revision: 1.5 $$Date: 1998/12/18 16:16:31 $
# $Log: rename,v $
# Revision 1.5 1998/12/18 16:16:31 rmb1
# moved to perl/source
# changed man documentation to POD
# Revision 1.4 1997/02/27 17:19:26 rmb1
# corrected usage string
# Revision 1.3 1997/02/27 16:39:07 rmb1
# added -v
# Revision 1.2 1997/02/27 16:15:40 rmb1
# *** empty log message ***
# Revision 1.1 1997/02/27 15:48:51 rmb1
# Initial revision
use strict;
use Getopt::Long;
my ($verbose, $no_act, $force, $op);
die "Usage: rename [-v] [-n] [-f] perlexpr [filenames]\n"
unless GetOptions(
'v|verbose' => \$verbose,
'n|no-act' => \$no_act,
'f|force' => \$force,
) and $op = shift;
$verbose++ if $no_act;
if (!@ARGV) {
    print "reading filenames from STDIN\n" if $verbose;
    @ARGV = <STDIN>;
    chomp(@ARGV);
}
for (@ARGV) {
    my $was = $_;
    eval $op;
    die $@ if $@;
    next if $was eq $_; # ignore quietly
    if (-e $_ and !$force)
    {
        warn "$was not renamed: $_ already exists\n";
    }
    elsif ($no_act or rename $was, $_)
    {
        print "$was renamed as $_\n" if $verbose;
    }
    else
    {
        warn "Can't rename $was $_: $!\n";
    }
}
=head1 NAME
rename - renames multiple files
=head1 SYNOPSIS
B<rename> S<[ B<-v> ]> S<[ B<-n> ]> S<[ B<-f> ]> I<perlexpr> S<[ I<files> ]>
=head1 DESCRIPTION
C<rename>
renames the filenames supplied according to the rule specified as the
first argument.
The I<perlexpr>
argument is a Perl expression which is expected to modify the C<$_>
string in Perl for at least some of the filenames specified.
If a given filename is not modified by the expression, it will not be
renamed.
If no filenames are given on the command line, filenames will be read
via standard input.
For example, to rename all files matching C<*.bak> to strip the extension,
you might say
rename 's/\.bak$//' *.bak
To translate uppercase names to lower, you'd use
rename 'y/A-Z/a-z/' *
=head1 OPTIONS
=over 8
=item B<-v>, B<--verbose>
Verbose: print names of files successfully renamed.
=item B<-n>, B<--no-act>
No Action: show what files would have been renamed.
=item B<-f>, B<--force>
Force: overwrite existing files.
=back
=head1 ENVIRONMENT
No environment variables are used.
=head1 AUTHOR
Larry Wall
=head1 SEE ALSO
mv(1), perl(1)
=head1 DIAGNOSTICS
If you give an invalid Perl expression you'll get a syntax error.
=head1 BUGS
The original C<rename> did not check for the existence of target filenames,
so had to be used with care. I hope I've fixed that (Robin Barker).
