Setting Binary Transfer mode - linux

My Perl script below is very basic. It copies a .zip file located on one server and transfers it to another.
#!/usr/bin/perl -w
use strict;
use warnings;
my $remotehost ="XXXXXX";
my $remotepath = "/USA/Fusion_Keyword_Reports";
my $remoteuser = "XXXXXXX";
my $remotepass = "XXXXXXX";
my $inputfile ="/fs/fs01/crmdata/SYWR/AAM/list8.txt";
my $remotefile1;
#my $DIR="/fs/fs01/crmdata/SYWR/AAM";
open (FILEIN, "<", $inputfile) or die "can't open list8 file";
while (my $line =<FILEIN>) {
if ($line =~ m /Keywords-Report(.*?)/i && $line !~ m/Keywords-Report-loopback/i) {
print $line;
$remotefile1 =$line;
last;
}
}
close FILEIN;
print "remotefile $remotefile1\n";
my $DIR1="/fs/fs01/crmdata/SYWR/AAM/$remotefile1";
my $cmd= "ftp -in";
my $ftp_command = "open $remotehost
user $remoteuser $remotepass
cd $remotepath
asc
get $remotefile1
bye
";
open (CMD, "|$cmd");
print CMD $ftp_command;
close (CMD);
exit(0);
When I run the script it does work but I get an error and the file that gets transferred is corrupted as a result.
226 Transfer complete.
WARNING! 40682 bare linefeeds received in ASCII mode.
File may not have transferred correctly.
I did some reading and I think I need to set the transfer mode to binary, but I am really not sure how to do that in my script, and I am not sure that it is the right solution either. I would really appreciate your thoughts about this error. If setting the transfer mode to binary will fix this problem, can you please show me where I would do that?

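Yes, switching to binary mode is the right fix. The asc command puts the ftp client into ASCII mode, which rewrites line endings during the transfer; for a .zip file (binary data) that corrupts the archive, and the "bare linefeeds" warning is the client telling you exactly that. Replace asc with binary in the command block: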
my $ftp_command = "open $remotehost
user $remoteuser $remotepass
cd $remotepath
binary
get $remotefile1
bye
";

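Note also that $remotefile1 is taken straight from list8.txt, so it still carries a trailing newline; a chomp $remotefile1; after the loop keeps the file name clean.
If you would rather not pipe commands to the ftp client at all, the core Net::FTP module lets you set the transfer mode explicitly. Here is a minimal sketch using the same placeholder host, path and credentials as your script (the file name is hypothetical, just for illustration):
#!/usr/bin/perl
use strict;
use warnings;
use Net::FTP;

my $remotehost  = "XXXXXX";
my $remotepath  = "/USA/Fusion_Keyword_Reports";
my $remoteuser  = "XXXXXXX";
my $remotepass  = "XXXXXXX";
my $remotefile1 = "Keywords-Report-example.zip";   # hypothetical name for illustration

# Connect and log in
my $ftp = Net::FTP->new($remotehost, Passive => 1)
    or die "Cannot connect to $remotehost: $@";
$ftp->login($remoteuser, $remotepass) or die "Cannot login: ", $ftp->message;
$ftp->cwd($remotepath)                or die "Cannot cd to $remotepath: ", $ftp->message;

# Binary mode: no newline translation, so the .zip arrives intact
$ftp->binary;
$ftp->get($remotefile1) or die "get failed: ", $ftp->message;
$ftp->quit;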

search multi line string from multiple files in a directory

The string to be searched for is:
the file_is being created_automaically {
period=20ns }
The Perl script I am using is below (it works fine for a single-line string but not for a multi-line one):
#!/usr/bin/perl
my $dir = "/home/vikas";
my @files = glob( $dir . '/*' );
#print "@files";
system ("rm -rf $dir/log.txt");
my $list;
foreach $list (@files){
if( !open(LOGFILE, "$list")){
open (File, ">>", "$dir/log.txt");
select (File);
print " $list \: unable to open file";
close (File);
else {
while (<LOGFILE>){
if($_ =~ /".*the.*automaically.*\{\n.*period\=20ns.*\}"/){
open (File, ">>", "$dir/log.txt");
select (File);
print " $list \: File contain the required string\n";
close (File);
break;
}
}
close (LOGFILE);
}
}
This code does not compile; it contains errors that cause it to fail to execute. You should never post code that you have not first tried to run.
The root of your problem is that for a multiline match you cannot read the file line by line; you have to slurp the whole file into a variable. However, your program contains many other flaws. I will demonstrate. Here follow excerpts of your code (with fixed indentation and missing curly braces).
First off, always use:
use strict;
use warnings;
This will save you many headaches and long searches for hidden problems.
system ("rm -rf $dir/log.txt");
This is better done in Perl, where you can control for errors:
unlink "$dir/log.txt" or die "Cannot delete '$dir/log.txt': $!";
foreach my $list (@files) {
# ^^
Declare the loop variable in the loop itself, not before it.
if( !open(LOGFILE, "$list")){
open (File, ">>", "$dir/log.txt");
select (File);
print " $list \: unable to open file";
close (File);
You never have to explicitly select a file handle before you print to it. You just print to the file handle: print File "....". What you are doing is just changing the default output file handle, which is not a good thing to do.
Also, this is error logging, which should go to STDERR instead. This can be done simply by reopening STDERR to a file at the beginning of your program. Why do this? Because if the program is not run at a terminal, for example via the web or some other process, STDERR does not show up on your screen. Otherwise it is just extra work while debugging.
open STDERR, ">", "$dir/log.txt" or die "Cannot open 'log.txt' for overwrite: $!";
This has the added benefit of you not having to delete the log first. And now you do this instead:
if (! open LOGFILE, $list ) {
warn "Unable to open file '$list': $!";
} else ....
warn goes to STDERR, so it is basically the same as print STDERR.
Speaking of open, you should use the three-argument open with a lexical file handle. So it becomes:
if (! open my $fh, "<", $list )
} else {
while (<LOGFILE>) {
Since you are looking for a multiline match, you need to slurp the file(s) instead. This is done by setting the input record separator to undef. Typically like this:
my $file = do { local $/; <$fh> }; # $fh is our file handle, formerly LOGFILE
Next how to apply the regex:
if($_ =~ /".*the.*automaically.*\{\n.*period\=20ns.*\}"/) {
$_ =~ is optional. A regex automatically matches against $_ if no other variable is used.
You should probably not use " in the regex, unless you actually have " in the target string. I don't know why you put it there; maybe you think strings need to be quoted inside a regex. They do not. To match the string you have above, you do:
if( /the.*automaically.*{.*period=20ns.*}/s ) {
You don't have to escape the curly braces {} or the equals sign = with a backslash, and you don't need the quotes. The /s modifier makes . (the wildcard) also match newline, so we can remove the \n. We can also drop .* from the start and end of the pattern, because that is implied; regex matches are always partial unless anchors are used.
break;
The break keyword is only used with the switch feature, which is experimental, and you are neither using it nor have it enabled. So here it is just a bareword, which is wrong. If you want to exit a loop prematurely, you use last. Note that in this case we don't need last at all: because we slurp the file, there is no loop.
Also, you generally should pick suitable variable names. If you have a list of files, the variable that contains the file name should not be called $list, I think. It is logical that it is called $file. And the input file handle should not be called LOGFILE, it should be called $input, or $infh (input file handle).
This is what I get if I apply the above to your program:
use strict;
use warnings;

my $dir = "/home/vikas";
my @files = glob( $dir . '/*' );
my $logfile = "$dir/log.txt";

open STDERR, ">", $logfile or die "Cannot open '$logfile' for overwrite: $!";

foreach my $file (@files) {
    if (! open my $input, "<", $file) {
        warn "Unable to open '$file': $!";
    } else {
        my $txt = do { local $/; <$input> };   # slurp the whole file
        if ($txt =~ /the.*automaically.*{.*period=20ns.*}/s) {
            print " $file : File contain the required string\n";
        }
    }
}
Note that the print goes to STDOUT, not to the error log. It is not common practice to have STDOUT and STDERR to the same file. If you want, you can simply redirect output in the shell, like this:
$ perl foo.pl > output.txt
The following sample code demonstrates the use of a multiline regex together with a logger($fname,$msg) subroutine.
The snippet assumes that the input files are relatively small and can each be read into the variable $data (that is, that the computer has enough memory to hold a whole file).
NOTE: the input data files should be distinguishable from the other files in the home directory $ENV{HOME}; in this sample they are assumed to match the pattern test_*.dat. You probably do not intend to scan absolutely every file in your home directory (there could be many thousands of files while you are only interested in a few).
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';
my($dir,$re,$logfile);
$dir = '/home/vikas/';
$re = qr/the file_is being created_automaically \{\s+period=20ns\s+\}/;
$logfile = $dir . 'logfile.txt';
unlink $logfile if -e $logfile;
for ( glob($dir . "test_*.dat") ) {
    if ( open my $fh, '<', $_ ) {
        my $data = do { local $/; <$fh> };
        close $fh;
        logger($logfile, "INFO: $_ contains the required string")
            if $data =~ /$re/gsm;
    } else {
        logger($logfile, "WARN: unable to open $_");
    }
}

exit 0;

sub logger {
    my $fname = shift;
    my $text  = shift;
    open my $fh, '>>', $fname
        or die "Couldn't open $fname";
    say $fh $text;
    close $fh;
}
Reference: regex modifiers, unlink, perlvar

Adding custom header to specific files in a directory

I would like to add a unique one-line header to each FOCUS*.tsv file in a specified directory. After that, I would like to combine all of these files into one file.
First I tried the sed command:
my $cmd9 = `sed -i '1i$SampleID[4]' $tsv_file`; print $cmd9;
It looked like it worked, but after I combined all of these files into one file in the next section of the code, the inserted row was listed four times for each file.
I then tried the following Perl script to accomplish the same thing, but it deleted the content of the file and only printed out the added header.
I'm looking for the simplest way to accomplish this. Here is what I've tried.
#!perl
use strict;
use warnings;
use Tie::File;
my $home="/data/";
my $tsv_directory = $home."test_all_runs/".$ARGV[0];
my $tsvfiles = $home."test_all_runs/".$ARGV[0]."/tsv_files.txt";
my @run_directory = ();
@run_directory = split /\//, $tsv_directory;
print "The run directory is #############".$run_directory[3]."\n";
my $cmd = `ls $tsv_directory/FOCUS*\.tsv > $tsvfiles`; #print "$cmd";
my $cmda = "ls $tsv_directory/FOCUS*\.tsv > $tsvfiles"; #print "$cmda";
my @tsvfiles =();
#this code opens the tsv_files.txt file and passes each line into an array for individual manipulation
open(TXT2, "$tsvfiles");
while (<TXT2>){
push (@tsvfiles, $_);
}
close(TXT2);
foreach (@tsvfiles){
chop($_);
}
#this loop works fine
for my $tsv_file (@tsvfiles){
open my $in, '>', $tsv_file or die "Can't write new file: $!";
open my $out, '>', "$tsv_file.new" or die "Can't write new file: $!";
$tsv_file =~ m|([^/]+)-oncomine.tsv$| or die "Can't extract Sample ID";
my $sample_id = $1;
#print "The sample ID is ############## $sample_id\n";
my $headerline = $run_directory[3]."/".$sample_id;
print $out $headerline;
while( <$in> ) {
print $out $_;
}
close $out;
close $in;
unlink($tsv_file);
rename("$tsv_file.new", $tsv_file);
}
Thank you
Apparently, the wrong '>' when opening the file for reading was the problem and it got solved.
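For the record, reading the existing file needs the '<' mode in that open, along the lines of:
open my $in, '<', $tsv_file or die "Can't open $tsv_file: $!";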
However, I'd like to make a few comments on some of the rest of the code.
The list of files is built by running external ls redirected to a file, then reading this file into an array. However, that is exactly the job of glob and all of that is replaced by
my @tsvfiles = glob "$tsv_directory/FOCUS*.tsv";
Then you don't need the chomp either, and the chop that is used would actually hurt since it removes the last character, not only the newline (or really $/).
Use of chop is probably not what you want. If you are removing the linefeed ($/), use chomp.
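For example:
chop  $line;    # removes the last character, whatever it is
chomp $line;    # removes a trailing $/ (normally the newline) only, if present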
To extract a match and assign it, a common idiom is
my ($sample_id) = $tsv_file =~ m|([^/]+)-oncomine.tsv$|
    or die "Can't extract Sample ID from '$tsv_file'";
Note that I also put the file name into the die message, so we actually know which file caused the problem. (A failed pattern match does not set $!, so that variable would not help here.)
The unlink and rename appear to be overwriting one file with another. You can do that by using move from the core module File::Copy
use File::Copy qw(move);
move ("$tsv_file.new", $tsv_file)
    or die "Can't move $tsv_file.new to $tsv_file: $!";
which renames the .new file to $tsv_file, overwriting it.
As for how the files need to be combined, a more precise explanation would be needed.
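Putting those pieces together, a minimal sketch of the header-adding loop might look like the following. It reuses the directory layout and the -oncomine.tsv naming from your script; treat it as a starting point rather than a drop-in replacement.
#!/usr/bin/perl
use strict;
use warnings;
use File::Copy qw(move);

my $home          = "/data/";
my $tsv_directory = $home . "test_all_runs/" . $ARGV[0];
my @run_directory = split /\//, $tsv_directory;
my @tsvfiles      = glob "$tsv_directory/FOCUS*.tsv";    # no external ls needed

for my $tsv_file (@tsvfiles) {
    my ($sample_id) = $tsv_file =~ m|([^/]+)-oncomine\.tsv$|
        or die "Can't extract Sample ID from '$tsv_file'";

    my $headerline = $run_directory[3] . "/" . $sample_id . "\n";   # newline so the header is its own line

    open my $in,  '<', $tsv_file       or die "Can't read '$tsv_file': $!";
    open my $out, '>', "$tsv_file.new" or die "Can't write '$tsv_file.new': $!";

    print $out $headerline;            # new first line
    while (my $line = <$in>) {         # then the original content
        print $out $line;
    }
    close $in;
    close $out;

    move("$tsv_file.new", $tsv_file)
        or die "Can't move '$tsv_file.new' to '$tsv_file': $!";
}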

batch download from URL

I want to download thousands of files from a URL. Each line in "FileName.txt" contains the name of a file to download. I am using a Perl script to take each file name from "FileName.txt" and download it after a random delay. I run the script as "./program.pl Filename.txt".
Filename.txt
A
B
C
B
program.pl
#!/usr/bin/perl
$file1=$ARGV[0];
open(FP1, $file1);
while($s1=<FP1>)
{ chomp ($s1);
$range = 5;
$minimum = 3;
$random_number = int(rand($range)) + $minimum;
`wget --wait="$random_number" "http://URL=$s1"`;
}
I am getting output for the first few files but not for the remaining ones. For the remaining files, $ emacs fileD.txt gives
[13] 29699
Could you kindly tell me why I am getting "[13] 29699", and what is the best way to download files after a random time interval? Sorry, the pasted program around the while loop did not come through correctly. Thanks
You don't show what the lines in FileName.txt actually contain, but presumably some of them contain &, which puts the process into the background (that is what output like [13] 29699 means). You should use single quotes around wget's argument, or use the list form of system.
Further, wget's --wait parameter is only relevant if you are using wget itself to traverse links from a given URL. In your case, you need your Perl script to sleep between invoking wget for each URL:
#!/usr/bin/env perl

use strict;
use warnings;

use constant WAIT_MINIMUM => 3;
use constant WAIT_RANGE   => 5;

my ($url_list_file) = @ARGV;
defined($url_list_file)
    or die "Need URL list\n";

open my $fh, '<', $url_list_file
    or die "Cannot open '$url_list_file': $!";

while (my $url = <$fh>) {
    $url =~ s/\R\z//;                    # strip the line ending
    my @cmd = (wget => "http://$url");
    print "@cmd\n";
    my $error = system @cmd;
    if ($error) {
        warn "'@cmd' failed: $?";
    }
    sleep WAIT_MINIMUM + rand(WAIT_RANGE);
}
What is URL= supposed to mean? wget takes the URL as a plain parameter. It seems you need:
`wget --wait=$random_number 'http://$s1'`;

Search filesystem via perl script while ignoring remote mounts

I've written a perl script that is designed to search a server for world writable files. After some testing, though, I've found that I made a mistake in the logic. Specifically, I've told it to not search /. My initial thought behind this was that I was looking for locally mounted volumes while avoiding those of a remote variety (CIFS, NFS, what-have-you).
What I failed to take into consideration is that not every directory has a unique volume. As a result, by excluding / in my scan, I've missed several directories that should be included. Now I need to rework the script to include those while still excluding remote volumes.
#!/usr/bin/perl
# Directives which establish our execution environment
use warnings;
use strict;
use Fcntl ':mode';
use File::Find;
no warnings 'File::Find';
no warnings 'uninitialized';
# Variables used throughout the script
my $DIR = "/var/log/tivoli/";
my $MTAB = "/etc/mtab";
my $PERMFILE = "world_writable_w_files.txt";
my $TMPFILE = "world_writable_files.tmp";
my $EXCLUDE = "/usr/local/etc/world_writable_excludes.txt";
# Compile a list of mountpoints that need to be scanned
my @mounts;
# Create the filehandle for the /etc/mtab file
open MT, "<${MTAB}" or die "Cannot open ${MTAB}, $!";
# We only want the local mountpoints that are not "/"
while (<MT>) {
if ($_ =~ /ext[34]/) {
my @line = split;
push(@mounts, $line[1]) unless ($_ =~ /root/);
}
}
close MT;
# Read in the list of excluded files
my $regex = do {
open EXCLD, "<${EXCLUDE}" or die "Cannot open ${EXCLUDE}, $!\n";
my @ignore = <EXCLD>;
chomp @ignore;
local $" = '|';
qr/@ignore/;
};
# Create the output file path if it doesn't already exist.
mkdir "${DIR}" or die "Cannot execute mkdir on ${DIR}, $!" unless (-d "${DIR}");
# Create the filehandle for writing the findings
open WWFILE, ">${DIR}${TMPFILE}" or die "Cannot open ${DIR}${TMPFILE}, $!";
foreach (@mounts) {
# The anonymous subroutine which is executed by File::Find
find sub {
return unless -f; # Is it a regular file...
# ...and world writable.
return unless (((stat)[2] & S_IWUSR) && ((stat)[2] & S_IWGRP) && ((stat)[2] & S_IWOTH));
# Add the file to the list of found world writable files unless it is
# in the list if exclusions
print WWFILE "$File::Find::name\n" unless ($File::Find::name =~ $regex);
}, $_;
}
close WWFILE;
# If no world-writable files have been found ${TMPFILE} should be zero-size;
# Delete it so Tivoli won't alert
if (-z "${DIR}${TMPFILE}") {
unlink "${DIR}${TMPFILE}";
} else {
rename("${DIR}${TMPFILE}","${DIR}${PERMFILE}") or die "Cannot rename file ${DIR}${TMPFILE}, $!";
}
I'm at a bit of a loss as to how to approach this now. I know I can obtain the necessary information using stat -f -c %T but I don't see a similar option for perl's built-in stat (unless I'm misinterpreting the descriptions for output fields; perhaps it is found in one of the S_ variables?).
I'm just looking for a push in the right direction. I'd really rather not drop to a shell command to obtain this information.
EDIT: I've found this answer to a similar question, but it seems to be not entirely helpful. When I test the built-in stat against a CIFS mount I get 18. Perhaps what I need is a comprehensive list of values that could be returned for remote files to compare against?
EDIT2: This is the script in its new form which meets the requirements:
#!/usr/bin/perl
# Directives which establish our execution environment
use warnings;
use strict;
use Fcntl ':mode';
use File::Find;
no warnings 'File::Find';
no warnings 'uninitialized';
# Variables used throughout the script
my $DIR = "/var/log/tivoli/";
my $MTAB = "/etc/mtab";
my $PERMFILE = "world_writable_w_files.txt";
my $TMPFILE = "world_writable_files.tmp";
my $EXCLUDE = "/usr/local/etc/world_writable_excludes.txt";
my $ROOT = "/";
my @devNum;
# Create an array of the file stats for "/"
my @rootStats = stat("${ROOT}");
# Compile a list of mountpoints that need to be scanned
my @mounts;
open MT, "<${MTAB}" or die "Cannot open ${MTAB}, $!";
# We only want the local mountpoints
while (<MT>) {
if ($_ =~ /ext[34]/) {
my @line = split;
push(@mounts, $line[1]);
}
}
close MT;
# Build an array of each mountpoint's device number for future comparison
foreach (@mounts) {
my @stats = stat($_);
push(@devNum, $stats[0]);
}
# Read in the list of excluded files and create a regex from them
my $regExcld = do {
open XCLD, "<${EXCLUDE}" or die "Cannot open ${EXCLUDE}, $!\n";
my @ignore = <XCLD>;
chomp @ignore;
local $" = '|';
qr/@ignore/;
};
# Create a regex to compare file device numbers to.
my $devRegex = do {
chomp @devNum;
local $" = '|';
qr/@devNum/;
};
# Create the output file path if it doesn't already exist.
mkdir "${DIR}" or die "Cannot execute mkdir on ${DIR}, $!" unless (-d "${DIR}");
# Create our filehandle for writing our findings
open WWFILE, ">${DIR}${TMPFILE}" or die "Cannot open ${DIR}${TMPFILE}, $!";
foreach (@mounts) {
# The anonymous subroutine which is executed by File::Find
find sub {
# Is it in a basic directory, ...
return if $File::Find::dir =~ /sys|proc|dev/;
# ...a regular file, ...
return unless -f;
# ...local, ...
my #dirStats = stat($File::Find::name);
return unless $dirStats[0] =~ $devRegex;
# ...and world writable?
return unless (((stat)[2] & S_IWUSR) && ((stat)[2] & S_IWGRP) && ((stat)[2] & S_IWOTH));
# If so, add the file to the list of world writable files unless it is
# in the list if exclusions
print(WWFILE "$File::Find::name\n") unless ($File::Find::name =~ $regExcld);
}, $_;
}
close WWFILE;
# If no world-writable files have been found ${TMPFILE} should be zero-size;
# Delete it so Tivoli won't alert
if (-z "${DIR}${TMPFILE}") {
unlink "${DIR}${TMPFILE}";
} else {
rename("${DIR}${TMPFILE}","${DIR}${PERMFILE}") or die "Cannot rename file ${DIR}${TMPFILE}, $!";
}
The dev field result from stat() tells you the device number the inode lives on. That can be used to distinguish different mount points, as they'll have a different device number from the one you started at.
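A minimal sketch of that idea, assuming you simply want to stay on the filesystem the scan started on (the starting directory here is a placeholder):
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

my $start     = "/";                  # placeholder starting point
my $start_dev = (stat $start)[0];     # device number of the starting filesystem

find(sub {
    my @st = stat $_ or return;
    # Prune any directory that lives on a different device, i.e. another
    # mount point (including remote CIFS/NFS filesystems crossed on the way).
    if (-d _ && $st[0] != $start_dev) {
        $File::Find::prune = 1;
        return;
    }
    return unless -f _;
    print "$File::Find::name\n";      # a local, regular file
}, $start);
The same device-number test can replace the mtab parsing in your script: compare (stat $File::Find::name)[0] against the device numbers you collected, and prune anything that does not match.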

Issues with reducing duplicate output from log file search

This website has been a great help since I'm getting back into programming. I'm attempting to write a simple Perl script that will analyze Apache log files from a directory (multiple domains), pull the last 1000 lines of each log file, strip out the IP addresses, and then compare them with a known block list of bot spammers.
So far I've got the script working except for one issue. Let's say I have the IP address 10.128.45.5 in two log files. The script analyzes each log file in turn, reducing the IPs to one entry per log file, but what I'm trying to do is narrow that down even further to one entry per run of the script, regardless of whether the same IP appears across multiple log files.
Here's the code I've gotten so far, sorry if it's a bit messy.
#!/usr/bin/perl
# Extract IP's from apache access logs for the last hour and matches with forum spam bot list.
# The fun work of Daniel Pearson
use strict;
use warnings;
use Socket;
# Declarations
my ($file,$list,@files,%ips,$match,$path,$sort);
my $timestamp = localtime(time);
# Check to see if matching file exists
$list ='list';
if (-e $list) {
# Delete the file so we can download a new one if it exists
print "File Exists!";
print "Deleting File $list\n";
unlink($list);
}
sleep(5);
system ("wget http://www.domain.com/list");
sleep(5);
my $dir = $ARGV[0] or die "Need to specify the log file directory\n";
opendir(DIR, "$dir");
@files = grep(/\.*$/,readdir(DIR));
closedir(DIR);
foreach my $file(@files) {
my $sum = 0;
if (-d $file) {
print "Skipping Directory $file\n";
}
else {
$path = "$dir$file";
open my $path, "-|", "/usr/bin/tail", "-1000", "$path" or die "could not start tail on $path: $!";
my %ips;
while (my $line = <$path>) {
chomp $line;
if ($line =~ m/(?!0+\.0+\.0+\.0+$)(([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5]))/g) {
my $ip = $1;
$ips{$ip} = $ip;
}
}
}
foreach my $key (sort keys %ips) {
open ("files","$list");
while (my $sort = <files>) {
chomp $sort;
if ($key =~ $sort) {
open my $fh, '>>', 'banned.out';
print "Match Found we need to block it $key\n";
print $fh "$key:$timestamp\n";
close $fh;
}
}
}
}
I would be grateful for any advice.
To achieve the task (a sketch follows below):
Move my %ips outside of (above) the foreach my $file (@files) loop.
Move foreach my $key ( sort keys %ips ) outside of (below) the foreach my $file (@files) loop.
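A rough sketch of the restructured flow, with your matching details simplified (the block-list handling and the IP regex are pared down here, and the file paths are the placeholders from your script):
#!/usr/bin/perl
use strict;
use warnings;

my $dir  = $ARGV[0] or die "Need to specify the log file directory\n";
my $list = 'list';                      # block list downloaded earlier in the script
my $timestamp = localtime(time);

my %ips;                                # one shared hash for the whole run

# Pass 1: collect unique IPs across ALL log files
opendir my $dh, $dir or die "Cannot open $dir: $!";
my @files = grep { -f "$dir$_" } readdir $dh;
closedir $dh;

for my $file (@files) {
    open my $tail, "-|", "/usr/bin/tail", "-1000", "$dir$file"
        or die "could not start tail on $dir$file: $!";
    while (my $line = <$tail>) {
        if ($line =~ /((?:\d{1,3}\.){3}\d{1,3})/) {
            $ips{$1} = 1;               # duplicates collapse here, across files
        }
    }
    close $tail;
}

# Pass 2: compare the de-duplicated IPs against the block list once
open my $blocklist, '<', $list or die "Cannot open $list: $!";
chomp(my @banned = <$blocklist>);
close $blocklist;

open my $out, '>>', 'banned.out' or die "Cannot open banned.out: $!";
for my $ip (sort keys %ips) {
    if (grep { $ip eq $_ } @banned) {
        print "Match Found we need to block it $ip\n";
        print {$out} "$ip:$timestamp\n";
    }
}
close $out;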
