Simple perl opendir - linux

I am completely new to perl and have just been learning it. I came across this script I need to run that has some network Tstat trace data. However, I get an error 'Cannot parse date.'
The code that generates this is here
foreach my $dir (@trace_dirs) {
undef @traces;
opendir(DIR, $dir) || die "Can't open dir: $dir \n";
@traces = grep { /.out$/ && -d "$dir/$_" } readdir(DIR);
foreach my $trace (@traces) {
$trace =~ /^(\d\d)_(\d\d)_(\d\d)_(\w\w\w)_(\d\d\d\d)/;
$trace_date=&ParseDate("$3/$4/$5 $1:$2") || die "Cannot parse date \n";
$traces{$trace_date} = $trace;
$trace_dir{$trace_date} = $dir;
}
closedir DIR;
}
Can someone tell me what this code is looking for?

When you run into problems like this, throw yourself a bone by looking at the data you are trying to play with. Make sure that the value in $trace is what you expect and that the date string you create is what you expect:
print "Trace is [$trace]\n";
if( $trace =~ /^(\d\d)_(\d\d)_(\d\d)_(\w\w\w)_(\d\d\d\d)/ ) {
my $date = "$3/$4/$5 $1:$2";
print "date is [$date]\n";
$trace_date= ParseDate( $date ) || die "Cannot parse date [$date]\n";
}
I'm guessing that the value in $4, which apparently is a string like 'Jan', 'Feb', and so on, isn't something that ParseDate likes.
Note that you should only use the capture variables after a successful pattern match, lest they be left over from a different match.

However, I get an error 'Cannot parse date.'
You get the error due to the line:
$trace =~ /^(\d\d)_(\d\d)_(\d\d)_(\w\w\w)_(\d\d\d\d)/;
The script expects every entry in the directory with the extension .out to have a proper timestamp at the beginning of its name, and that line of the script lacks any error handling.
Try adding some check here, e.g.:
unless($trace =~ /^(\d\d)_(\d\d)_(\d\d)_(\w\w\w)_(\d\d\d\d)/) {
warn "WRN: Malformed file name: $trace\n";
next;
}
That checks whether the file name matches; if it doesn't, a warning is printed and the entry is skipped.
Alternatively you can also add the check to the grep {} readdir() line:
@traces = grep { /.out$/ && /^(\d\d)_(\d\d)_(\d\d)_(\w\w\w)_(\d\d\d\d)/ && -d "$dir/$_" } readdir(DIR);
to filter out misplaced .out files (hm, actually directories) before they reach the loop which calls the ParseDate function.
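As a minimal sketch of the checks suggested in both answers, the match and the date string can be wrapped in a small helper. The helper name trace_date_string and the sample file names are made up for illustration; the real script would hand the resulting string to ParseDate.

```perl
use strict;
use warnings;

# Hypothetical helper: returns "DD/Mon/YYYY HH:MM" for names such as
# 14_30_05_Jan_2010.out, or undef when the name does not match.
sub trace_date_string {
    my ($trace) = @_;
    # Only touch $1..$5 inside the branch where the match succeeded.
    if ( $trace =~ /^(\d\d)_(\d\d)_(\d\d)_(\w\w\w)_(\d\d\d\d)/ ) {
        return "$3/$4/$5 $1:$2";
    }
    return;
}

# Sample names (assumptions): one well-formed, one malformed.
for my $trace ( '14_30_05_Jan_2010.out', 'notes.out' ) {
    my $date = trace_date_string($trace);
    if ( defined $date ) {
        print "date is [$date]\n";
    }
    else {
        warn "Malformed file name: $trace\n";
    }
}
```

Printing the string in brackets before it reaches ParseDate makes it easy to see whether the month abbreviation or the field order is what the parser rejects.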

Related

search multi line string from multiple files in a directory

The string to be searched for is:
the file_is being created_automaically {
period=20ns }
The Perl script I am using follows (it works fine for a single-line string, but not for a multi-line one):
#!/usr/bin/perl
my $dir = "/home/vikas";
my @files = glob( $dir . '/*' );
#print "@files";
system ("rm -rf $dir/log.txt");
my $list;
foreach $list(@files){
if( !open(LOGFILE, "$list")){
open (File, ">>", "$dir/log.txt");
select (File);
print " $list \: unable to open file";
close (File);
else {
while (<LOGFILE>){
if($_ =~ /".*the.*automaically.*\{\n.*period\=20ns.*\}"/){
open (File, ">>", "$dir/log.txt");
select (File);
print " $list \: File contain the required string\n";
close (File);
break;
}
}
close (LOGFILE);
}
}
This code does not compile; it contains errors that cause it to fail to execute. You should never post code that you have not first tried to run.
The root of your problem is that for a multiline match, you cannot read the file in line-by-line mode; you have to slurp the whole file into a variable. However, your program contains many flaws. I will demonstrate. Here follow excerpts of your code (with fixed indentation and missing curly braces).
First off, always use:
use strict;
use warnings;
This will save you many headaches and long searches for hidden problems.
system ("rm -rf $dir/log.txt");
This is better done in Perl, where you can control for errors:
unlink "$dir/log.txt" or die "Cannot delete '$dir/log.txt': $!";
foreach my $list (@files) {
# ^^
Declare the loop variable in the loop itself, not before it.
if( !open(LOGFILE, "$list")){
open (File, ">>", "$dir/log.txt");
select (File);
print " $list \: unable to open file";
close (File);
You never have to explicitly select a file handle before you print to it. You just print to the file handle: print File "....". What you are doing is just changing the STDOUT file handle, which is not a good thing to do.
Also, this is error logging, which should go to STDERR instead. This can be done simply by reopening STDERR to a file at the beginning of your program. That is worth doing when the program runs somewhere STDERR does not show up on your screen, for example under a web server or some other process. If you are debugging at a terminal, it is just extra work.
open STDERR, ">", "$dir/log.txt" or die "Cannot open 'log.txt' for overwrite: $!";
This has the added benefit of you not having to delete the log first. And now you do this instead:
if (! open LOGFILE, $list ) {
warn "Unable to open file '$list': $!";
} else ....
warn goes to STDERR, so it is basically the same as print STDERR.
Speaking of open, you should use three-argument open with a lexical file handle. So it becomes:
if (! open my $fh, "<", $list )
} else {
while (<LOGFILE>) {
Since you are looking for a multiline match, you need to slurp the file(s) instead. This is done by setting the input record separator to undef. Typically like this:
my $file = do { local $/; <$fh> }; # $fh is our file handle, formerly LOGFILE
Next how to apply the regex:
if($_ =~ /".*the.*automaically.*\{\n.*period\=20ns.*\}"/) {
$_ =~ is optional. A regex automatically matches against $_ if no other variable is used.
You should probably not use " in the regex. Unless you have " in the target string. I don't know why you put it there, maybe you think strings need to be quoted inside a regex. If you do, that is wrong. To match the string you have above, you do:
if( /the.*automaically.*{.*period=20ns.*}/s ) {
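You don't have to escape the equal sign =, and you don't have to use quotes. Curly braces {} often work unescaped as well, although newer Perl versions warn about or reject an unescaped literal { in some positions, so writing \{ is the safer habit. The /s modifier makes . (wildcard character period) also match newline, so we can remove \n. We can remove .* from the start and end of the pattern, because that is implied: regex matches are always partial unless anchors are used.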
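To see the effect of slurping plus /s in isolation, here is a small sketch. It reads from an in-memory file handle instead of a real file, and the sample text is an assumption modelled on the string in the question.

```perl
use strict;
use warnings;

# Sample content (an assumption): the target string spans two lines.
my $content = "header\nthe file_is being created_automaically {\nperiod=20ns }\nfooter\n";

# Open the scalar as a file so the slurp idiom can be shown without touching disk.
open my $fh, '<', \$content or die "Cannot open in-memory file: $!";
my $text = do { local $/; <$fh> };    # undef $/ reads everything in one go
close $fh;

# With /s the dot also matches the newline between "{" and "period".
my $with_s    = $text =~ /the.*automaically.*\{.*period=20ns.*\}/s ? 1 : 0;
# Without /s the dot stops at the newline, so the same pattern fails.
my $without_s = $text =~ /the.*automaically.*\{.*period=20ns.*\}/  ? 1 : 0;

print "with /s: $with_s, without /s: $without_s\n";
```

The left brace is escaped here only to stay on the safe side across Perl versions; the match logic is otherwise the same as in the pattern above.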
break;
The break keyword is only used with the switch feature, which is experimental, plus you don't use it, or have it enabled. So it is just a bareword, which is wrong. If you want to exit a loop prematurely, you use last. Note that we don't have to use last because we slurp the file, so we have no loop.
Also, you generally should pick suitable variable names. If you have a list of files, the variable that contains the file name should not be called $list, I think. It is logical that it is called $file. And the input file handle should not be called LOGFILE, it should be called $input, or $infh (input file handle).
This is what I get if I apply the above to your program:
use strict;
use warnings;
my $dir = "/home/vikas";
my @files = glob( $dir . '/*' );
my $logfile = "$dir/log.txt";
open STDERR, ">", $logfile or die "Cannot open '$logfile' for overwrite: $!";
foreach my $file (@files) {
if(! open my $input, "<", $file) {
warn "Unable to open '$file': $!";
} else {
my $txt = do { local $/; <$input> };
if($txt =~ /the.*automaically.*\{.*period=20ns.*}/s) {
print " $file : File contain the required string\n";
}
}
}
Note that the print goes to STDOUT, not to the error log. It is not common practice to have STDOUT and STDERR to the same file. If you want, you can simply redirect output in the shell, like this:
$ perl foo.pl > output.txt
The following sample code demonstrates usage of regex for multiline case with logger($fname,$msg) subroutine.
Code snippet assumes that input files are relatively small and can be read into a variable $data (an assumption is that computer has enough memory to read into).
NOTE: input data files should be distinguishable from rest files in home directory $ENV{HOME}, in this code sample these files assumed to match pattern test_*.dat, perhaps you do not intend to scan absolutely all files in your home directory (there could be many thousands of files but you interested in a few only)
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';
my($dir,$re,$logfile);
$dir = '/home/vikas/';
$re = qr/the file_is being created_automaically \{\s+period=20ns\s+\}/;
$logfile = $dir . 'logfile.txt';
unlink $logfile if -e $logfile;
for ( glob($dir . "test_*.dat") ) {
if( open my $fh, '<', $_ ) {
my $data = do { local $/; <$fh> };
close $fh;
logger($logfile, "INFO: $_ contains the required string")
if $data =~ /$re/gsm;
} else {
logger($logfile, "WARN: unable to open $_");
}
}
exit 0;
sub logger {
my $fname = shift;
my $text = shift;
open my $fh, '>>', $fname
or die "Couldn't open $fname: $!";
say $fh $text;
close $fh;
}
Reference: regex modifiers, unlink, perlvar

finding a file in directory using perl script

I'm trying to develop a perl script that looks through all of the user's directories for a particular file name without the user having to specify the entire pathname to the file.
For example, let's say the file of interest was data.list. It's located in /home/path/directory/project/userabc/data.list. At the command line, the user would normally have to specify the full pathname in order to access the file, like so:
cd /home/path/directory/project/userabc/data.list
Instead, I want the user to enter just script.pl ABC on the command line, and have the Perl script locate data.list automatically and retrieve the information in it, which in my case means counting the number of lines and uploading the result using curl. The rest is done; I only need the part that locates the file automatically.
Even though very feasible in Perl, this looks more appropriate in Bash:
#!/bin/bash
filename=$(find ~ -name "$1" )
wc -l "$filename"
curl .......
The main issue would of course be if you have multiple files data1, say for example /home/user/dir1/data1 and /home/user/dir2/data1. You will need a way to handle that. And how you handle it would depend on your specific situation.
In Perl that would be much more complicated:
#! /usr/bin/perl -w
eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
if 0; #$running_under_some_shell
use strict;
# Import the module File::Find, which will do all the real work
use File::Find ();
# Set the variable $File::Find::dont_use_nlink if you're using AFS,
# since AFS cheats.
# for the convenience of &wanted calls, including -eval statements:
# Here, we "import" specific variables from the File::Find module
# The purpose is to be able to just type '$name' instead of the
# complete '$File::Find::name'.
use vars qw/*name *dir *prune/;
*name = *File::Find::name;
*dir = *File::Find::dir;
*prune = *File::Find::prune;
# We declare the sub here; the content of the sub will be created later.
sub wanted;
# This is a simple way to get the first argument. There is no
# checking on validity.
our $filename=$ARGV[0];
# Traverse desired filesystem. /home is the top-directory where we
# start our search. The sub wanted will be executed for every file
# we find
File::Find::find({wanted => \&wanted}, '/home');
exit;
sub wanted {
# Check if the file is our desired filename
if ( /^\Q$filename\E\z/) {   # \Q...\E so dots in the name match literally
# Open the file, read it and count its lines
my $lines=0;
open(my $F,'<',$name) or die "Cannot open $name";
while (<$F>){ $lines++; }
print("$name: $lines\n");
# Your curl command here
}
}
You will need to look at the argument parsing, for which I simply used $ARGV[0], and I don't know what your curl command looks like.
A simpler (though not recommended) way would be to abuse Perl as a sort of shell:
#!/usr/bin/perl
#
my $fn=`find /home -name '$ARGV[0]'`;
chomp $fn;
my $wc=`wc -l '$fn'`;
print "$wc\n";
system ("your curl command");
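The multiple-match concern raised earlier in this answer can be handled by collecting every hit before acting on any of them. The following is only a sketch: the helper name find_by_name is made up, and a temporary directory stands in for /home so the example can run anywhere.

```perl
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Hypothetical helper: return every regular file called $filename below $top.
sub find_by_name {
    my ( $top, $filename ) = @_;
    my @hits;
    # File::Find changes into each directory, so $_ is the bare file name.
    find( sub { push @hits, $File::Find::name if -f $_ && $_ eq $filename }, $top );
    my @sorted = sort @hits;
    return @sorted;
}

# Demo tree in a temporary directory (an assumption, standing in for /home).
my $top = tempdir( CLEANUP => 1 );
for my $sub (qw(dir1 dir2)) {
    make_path("$top/$sub");
    open my $out, '>', "$top/$sub/data.list" or die "Cannot write: $!";
    print {$out} "line\n" x 3;
    close $out;
}

my @found = find_by_name( $top, 'data.list' );
if ( @found > 1 ) {
    warn "More than one match, using the first of:\n";
    warn "  $_\n" for @found;
}
print "Using $found[0]\n";
```

Whether to take the first hit, abort, or ask the user is a policy decision that depends on the situation; the sketch only shows where that decision would go.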
Following code snippet demonstrates one of many ways to achieve desired result.
The code takes one parameter, a word to look for in all subdirectories inside file(s) data.list. And prints out a list of found files in a terminal.
The code utilizes the subroutine lookup($dir,$filename,$search), which calls itself recursively whenever it comes across a subdirectory.
The search starts from the current working directory (the question did not specify a directory as the starting point).
use strict;
use warnings;
use feature 'say';
my $search = shift || die "Specify what look for";
my $fname = 'data.list';
my $found = lookup('.',$fname,$search);
if( @$found ) {
say for @$found;
} else {
say 'Not found';
}
exit 0;
sub lookup {
my $dir = shift;
my $fname = shift;
my $search = shift;
my $files = [];   # start with an empty list so the caller can always dereference it
my @items = glob("$dir/*");
for my $item (@items) {
if( -f $item && $item =~ /\b$fname\b/ ) {
my $found;
open my $fh, '<', $item or die $!;
while( my $line = <$fh> ) {
$found = 1 if $line =~ /\b$search\b/;
if( $found ) {
push @{$files}, $item;
last;
}
}
close $fh;
}
if( -d $item ) {
my $ret = lookup($item,$fname,$search);
push @{$files}, $_ for @$ret;
}
}
return $files;
}
Run as script.pl search_word
Output sample
./capacitor/data.list
./examples/data.list
./examples/test/data.list
Reference:
glob,
Perl file test operators

Having a small Issue running a Perl scripts IF statement.

I created a small script in Perl and I am really new to this. I'm supposed to have a script that looks at a given argument and creates a directory tree in the directory named by that argument. This part of the script works. The second part (the nested if statement) does not: when you give no argument, it asks you to input a directory of your choice. I believe the nested if statement is failing because of the $file input, but I'm not entirely sure what's wrong. This is probably something really simple, but I have not been able to find the solution. Thank you in advance for the help and tips.
#! /usr/bin/perl
if ($#ARGV == -1)
{
print "Please enter default directory:";
my $file=<STDIN>;
if (-d $file)
{
chdir $file;
system("mkdir Data");
system("mkdir Data/Image");
system("mkdir Data/Cache");
print "Structure Created";
}
else
{
print "Directory does not exsist";
}
}
else
{
chdir $ARGV[0];
system("mkdir Data");
system("mkdir Data/Image");
system("mkdir Data/Cache");
print ("Structure Created");
}
print ("\n");
The test -d $file is failing because what is entered via STDIN also has the newline, after the string that specifies the directory name. You need chomp($file);
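A short sketch makes the effect of the trailing newline visible. It uses a temporary directory in place of whatever the user would type, and builds the "typed" string by hand rather than reading STDIN; both are assumptions made so the example is self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir   = tempdir( CLEANUP => 1 );   # stands in for the directory the user types
my $input = "$dir\n";                  # what <STDIN> would return: name plus newline

# With the newline still attached, the path names something that does not exist.
my $before = -d $input ? 'directory' : 'not a directory';
chomp $input;                          # strip the trailing newline
my $after  = -d $input ? 'directory' : 'not a directory';

print "before chomp: $before\n";
print "after chomp: $after\n";
```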
However, there are a few more points I would like to bring up.
Most importantly, there is repeated code in both branches. You really do not want to do that. It can, and does, cause trouble later. Instead, decide on the directory name, and then make it.
Second, there is no reason to go out to the system in order to make a directory. It is far better to do it in Perl, and there are good modules for this.
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Path qw(make_path);
my $dir;
if (not @ARGV) {
print "Please enter default directory: ";
$dir = <STDIN>;
chomp $dir;
}
else {
$dir = $ARGV[0];
}
die "No directory $dir" if not -d $dir;
my $orig_cwd = getcwd();   # remember where we started; chdir itself only returns true/false
chdir $dir or die "Can't chdir to $dir: $!";
my @dirs = map { "Data/$_" } qw(Image Cache);
my @dirs_made = make_path( @dirs, { verbose => 1 } );
print "Created directories:\n";
print "$_\n" for @dirs_made;
I build the directory list using map to avoid repeating the Data/... prefix, and for later flexibility. You can of course just type the names in, but that tends to invite silly mistakes.
I used File::Path to make the directories. It builds the whole path, like mkdir -p, and has a few other useful options that you can pass in { }, including error handling. There are other modules as well, for example Path::Tiny with its mkpath (and a lot of other goodies).
Note that chdir returns only true or false, so if you need to get back later, record the current working directory first (for example with getcwd from the core Cwd module), and check chdir for errors. But you don't have to chdir at all if there is no other reason for it. Just include the $dir name in the map:
# No chdir needed here
my @dirs = map { "$dir/Data/$_" } qw(Image Cache);

Errors in declaration when trying to parse a csv file

I'm trying to parse a CSV file that is formatted like this:
dog cats,yellow blue tomorrow,12445
birds,window bank door,-novalue-
birds,window door,5553
aspirin man,red,567
(there is no value where -novalue- is written)
use strict;
use warnings;
my $filename = 'in.txt';
my $filename2 = 'out.txt';
open(my $in, '<:encoding(UTF-8)', $filename)
or die "Could not open file '$filename' $!";
my $word = "";
while (my $row = <$in>) {
chomp $row;
my @fields = split(/,/,$row);
#Save the first word of the second column
($word) = split(/\s/,$fields[1]);
if ($word eq 'importartWord')
{
printf $out "$fields[0]".';'."$word".';'."$fields[2]";
}
else #keep as it was
{
printf $out "$fields[0]".';'."$fields[1]".';'."$fields[2]";
}
Use of uninitialized value $word in string ne at prueba7.pl line 22, <$in> line 10.
No matter where I define $word I cannot stop receiving that error and can't understand why. I think I have initialized $word correctly. I would really appreciate your help here.
Please if you are going to suggest using Text::CSV post a working code example since I haven't been able to apply it for the propose I have explained here. That's the reason I ended up writing the above code.
P.S.:
Because I know you are going to ask for my previous code using Text::CSV, here it is:
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new({ sep_char => ';', binary => 1 }) or
die "Cannot use CSV: ".Text::CSV->error_diag ();
# directory where esc_prim2.csv is located
my $file = 'C:\Users\Sergio\Desktop\GIS\perl\esc_prim2.csv';
my $sal = 'C:\Users\Sergio\Desktop\GIS\perl\esc_prim3.csv';
open my $data, "<:encoding(utf8)", "$file" or die "$file: $!";
open my $out, ">:encoding(utf8)", "$sal" or die "$sal: $!";
$csv->eol ("\r\n");
#initializing variables
my $row = "";
my $word = "";
my $validar = 0;
my $line1 = "";
my @mwords = [""];#Just a try to initialize mwords... doesn't work, error keeps showing
#save the first line with field names on the other file
$line1 = <$data>;
$csv->parse($line1);
my #fields = $csv->fields();
$csv->print($out,[$fields[0], $fields[1], $fields[2]]);
while ($row = <$data>) {
if ($csv->parse($row)) {
@fields = $csv->fields();
#save first word of the field's second element
@mwords = split (/\s/, $fields[1]);
#keep the first one
$word = $mwords[0];
printf($mwords[0]);
#if that word is not one of SAN, EL y LA... writes a line in the new file with the updated second field.
$validar = ($word ne 'SAN') && ($word ne 'EL') && ($word ne 'LA');
if ($validar)
{
$csv->print($out,[$fields[0], $word, $fields[2]]);
}
else { #Saves the line in the new file as it was in the old one.
$csv->print($out,[$fields[0], $fields[1], $fields[2]]);
}
} else {#error processing row
warn "The row could not be processed\n";
}
}
close $data or die "$file: $!";
close $out or die "$sal: $!";
Here the line where $validar is declared brings the same error of "uninitialized value" although I did it.
I also tried the push @rows, $row; approach, but I don't really know how to handle $rows[$i], since the elements are references to arrays (pointers) and I know they can't be operated on like plain variables... I couldn't find a working example of how to use them.
I think you're misunderstanding the error. It's not a problem with the declaration of the variable, but with the data that you're putting into the variable.
Use of uninitialized value
This means that you are trying to use a value that is undefined (not undeclared). That means you are using a variable that you haven't given a value.
You can get more details about the warning (and it's a warning, not an error) by adding use diagnostics to your code. You'll get something like this:
(W uninitialized) An undefined value was used as if it were already
defined. It was interpreted as a "" or a 0, but maybe it was a mistake.
To suppress this warning assign a defined value to your variables.
To help you figure out what was undefined, perl will try to tell you
the name of the variable (if any) that was undefined. In some cases
it cannot do this, so it also tells you what operation you used the
undefined value in. Note, however, that perl optimizes your program
and the operation displayed in the warning may not necessarily appear
literally in your program. For example, "that $foo" is usually
optimized into "that " . $foo, and the warning will refer to the
concatenation (.) operator, even though there is no . in
your program.
So, when you're populating $word, it's not getting a value. Presumably, that's because some lines in your input file have an empty record there.
I have no way of knowing whether or not that's a valid input for your program, so I can't really give any helpful suggestions on how to fix this.
The error message you provided ends with: line 22, <$in> line 10. but your question doesn't show line 10 of the data ($in), so this answer requires some speculation. I'd say that the second field, $fields[1], of line 10 of in.txt is empty.
Consequently, this line: ($word) = split(/\s/,$fields[1]); leaves $word undefined, because split returns an empty list for an empty string. As a result, any later use of it, be it the ne operator (as displayed in the message) or anything else, is going to generate the warning.
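The behaviour described above can be reproduced with two made-up rows, one of which has an empty second field. This is only a sketch; defaulting to an empty string is one possible policy, and skipping the row would be another.

```perl
use strict;
use warnings;

# Made-up sample rows; the second one has an empty second field.
my @rows = ( 'birds,window door,5553', 'fish,,42' );
my @first_words;

for my $row (@rows) {
    my @fields = split /,/, $row;
    # split returns an empty list for an empty string, so $word can be undef.
    my ($word) = split /\s/, $fields[1];
    # Give it a defined value before it is compared or printed.
    $word = '' unless defined $word;
    push @first_words, $word;
    print "first word: [$word]\n";
}
```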
As an aside - there's little point in interpolating a variable in a string on its own; instead of "$fields[0]", say $fields[0] unless you're going to put something else in there, like "$fields[0];". You may want to consider replacing
printf $out "$fields[0]".';'."$word".';'."$fields[2]";
with
printf $out $fields[0] . ';' . $word . ';' . $fields[2];
or
printf $out "$fields[0];$word;$fields[2]";
Of course, TMTOWTDI - so you may want to tell me to mind my own business instead. :-)

Issues with reducing duplicate output from log file search

This website has been a great help since I'm getting back into programming and I'm attempting to write a simple perl script that will analyze apache log files from a directory (multiple domains), pull the last 1000 lines of each log file, strip the IP addresses from the log file and then compare them with a known block list of bot spammers.
So far I've got the script working except for one issue. Let's say I have the IP address 10.128.45.5 in two log files. The script analyzes each log file in turn, stripping and reducing the IPs to one PER log file, but what I'm trying to do is narrow that down even further, to one per run of the script, regardless of whether the same IP appears across multiple log files.
Here's the code I've gotten so far, sorry if it's a bit messy.
#!/usr/bin/perl
# Extract IP's from apache access logs for the last hour and matches with forum spam bot list.
# The fun work of Daniel Pearson
use strict;
use warnings;
use Socket;
# Declarations
my ($file,$list,@files,%ips,$match,$path,$sort);
my $timestamp = localtime(time);
# Check to see if matching file exists
$list ='list';
if (-e $list) {
# Delete the file so we can download a new one if it exists
print "File Exists!";
print "Deleting File $list\n";
unlink($list);
}
sleep(5);
system ("wget http://www.domain.com/list");
sleep(5);
my $dir = $ARGV[0] or die "Need to specify the log file directory\n";
opendir(DIR, "$dir");
@files = grep(/\.*$/,readdir(DIR));
closedir(DIR);
foreach my $file(@files) {
my $sum = 0;
if (-d $file) {
print "Skipping Directory $file\n";
}
else {
$path = "$dir$file";
open my $path, "-|", "/usr/bin/tail", "-1000", "$path" or die "could not start tail on $path: $!";
my %ips;
while (my $line = <$path>) {
chomp $line;
if ($line =~ m/(?!0+\.0+\.0+\.0+$)(([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5]))/g) {
my $ip = $1;
$ips{$ip} = $ip;
}
}
}
foreach my $key (sort keys %ips) {
open ("files","$list");
while (my $sort = <files>) {
chomp $sort;
if ($key =~ $sort) {
open my $fh, '>>', 'banned.out';
print "Match Found we need to block it $key\n";
print $fh "$key:$timestamp\n";
close $fh;
}
}
}
}
Any advice that could be given I would be grateful for.
To achieve the task:
Move my %ips outside of (above) the foreach my $file (@files) loop.
Move foreach my $key ( sort keys %ips ) outside of (below) the foreach my $file (@files) loop.
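A stripped-down sketch of that restructuring follows. The log lines are held in an in-memory hash instead of being read through tail, and the IP pattern is simplified; both are assumptions made to keep the example short and self-contained.

```perl
use strict;
use warnings;

# Made-up log data: the same address appears in two different "files".
my %logs = (
    'a.log' => [ '10.128.45.5 - - GET /', '192.0.2.7 - - GET /x' ],
    'b.log' => [ '10.128.45.5 - - GET /y' ],
);

my %ips;    # declared once, outside the per-file loop
for my $file ( sort keys %logs ) {
    for my $line ( @{ $logs{$file} } ) {
        # Simplified pattern: a dotted quad at the start of the line.
        $ips{$1} = 1 if $line =~ /^(\d{1,3}(?:\.\d{1,3}){3})\b/;
    }
}

# The block-list comparison would run here, once, after all files are read.
my @unique = sort keys %ips;
print "$_\n" for @unique;
```

Because the hash survives across files, each address is a key only once, so the comparison against the block list sees it a single time per run.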
