Adding files to zipped folder in perl - linux

I have the following Perl script, which is intended to accept a user's directory as a command-line argument, archive all of that user's data files into a zip file, and then delete the original data. The script works, but when it is run again with a different user as the argument, it overwrites the previous data in the userData.zip file. I have searched and have not been able to find how to do this. It should continue to accept users as arguments and append their folders to the userData.zip file.
Any help is appreciated.
use 5.14.2;
use strict;
use warnings;
use Archive::Zip qw( :ERROR_CODES :CONSTANTS );
use File::Path;
my ($DATAFILEIN, $DATAFILEOUT);
my ($new,$zip);
use constant COLUMNS => 6;
sub main {
verifyArguments();
setDataFileIn();
zipFiles();
deleteUserFiles();
#setDataFileOut();
#printData();
#writeData();
}
sub verifyArguments {
if (!(@ARGV) || !(-e $ARGV[0])) {
die "\n\nYou must specify correct file name upon command invocation.\n\n";
}
}
sub setDataFileIn {
$DATAFILEIN = $ARGV[0];
}
sub zipFiles {
print "\nBacking up ".$DATAFILEIN."\n";
sleep 1;
$zip = Archive::Zip->new();
opendir (DIR, $DATAFILEIN) or die $!;
while (my $file = readdir(DIR)) {
# Use -f to look for a file
next unless (-f $DATAFILEIN."\\".$file);
$zip->addFile($DATAFILEIN."\\".$file, );
print "Added $file to zip\n";
}
closedir(DIR);
my $fileName = $DATAFILEIN;
unless ( $zip->writeToFileNamed('userData.zip') == AZ_OK ) {
die 'write error';
}
print "Successfully backed up $fileName to userData.zip\n";
}
sub deleteUserFiles{
rmtree($DATAFILEIN);
}
main();

Have you read this portion of the Archive::Zip FAQ?
Can't Read/modify/write same Zip file
Q: Why can't I open a Zip file, add a member, and write it back? I get
an error message when I try.
A: Because Archive::Zip doesn't (and can't, generally) read file
contents into memory, the original Zip file is required to stay around
until the writing of the new file is completed.
The best way to do this is to write the Zip to a temporary file and
then rename the temporary file to have the old name (possibly after
deleting the old one).
Archive::Zip v1.02 added the archive methods overwrite() and
overwriteAs() to do this simply and carefully.
See examples/updateZip.pl for an example of this technique.
I don't see $zip->overwrite() in your code.
The best place to find information on CPAN modules is http://metacpan.org. In this case, the Archive::Zip page. That page has a documentation link to Archive::Zip::FAQ. You can read it there, or you can probably just type perldoc Archive::Zip::FAQ on your system where you have the module installed.
The examples are part of the downloaded package. If you used the cpan command to install Archive::Zip, then the examples would be in the build location. By default, that would be ~/.cpan/build/Archive-Zip-*/examples.
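As a rough illustration of the read-then-overwrite approach from the FAQ, here is a minimal sketch. The sub name append_dir_to_zip is made up for this example, and it assumes Linux-style "/" path separators rather than the "\\" used in the question. It reads the existing archive (if there is one), adds the new files, and then calls overwrite() so the earlier members are kept.

```perl
use strict;
use warnings;
use Archive::Zip qw( :ERROR_CODES :CONSTANTS );

# Hypothetical helper: append every plain file in $dir to $zip_name,
# keeping whatever members the archive already contains.
sub append_dir_to_zip {
    my ($dir, $zip_name) = @_;
    my $zip = Archive::Zip->new();

    # Load the existing archive first so its members are preserved.
    my $exists = -e $zip_name;
    if ($exists) {
        $zip->read($zip_name) == AZ_OK or die "Cannot read $zip_name\n";
    }

    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    for my $file (readdir $dh) {
        my $path = "$dir/$file";
        next unless -f $path;
        $zip->addFile($path, $path) or die "Cannot add $path\n";
    }
    closedir $dh;

    # overwrite() only works on an archive that was read from a file;
    # for a brand new archive, write it out under its name instead.
    my $status = $exists ? $zip->overwrite() : $zip->writeToFileNamed($zip_name);
    $status == AZ_OK or die "Cannot write $zip_name\n";
    return $zip->numberOfMembers();
}
```

In the question's script this would replace the body of zipFiles, called as append_dir_to_zip($DATAFILEIN, 'userData.zip').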

Related

Having a small Issue running a Perl scripts IF statement.

I created a small script in Perl and I am really new to this. The script is supposed to look at the argument it is given and create a directory tree inside that directory. This part of the script works. The second part (the nested if statement) does not: when you do not give an argument, it asks you to input a directory of your choice, and then fails. I believe the nested if statement is going wrong because of the $file input, but I'm not entirely sure what's wrong. This is probably something really simple, but I have not been able to find the solution. Thank you in advance for the help and tips.
#! /usr/bin/perl
if ($#ARGV == -1)
{
print "Please enter default directory:";
my $file=<STDIN>;
if (-d $file)
{
chdir $file;
system("mkdir Data");
system("mkdir Data/Image");
system("mkdir Data/Cache");
print "Structure Created";
}
else
{
print "Directory does not exsist";
}
}
else
{
chdir $ARGV[0];
system("mkdir Data");
system("mkdir Data/Image");
system("mkdir Data/Cache");
print ("Structure Created");
}
print ("\n");
The test -d $file fails because the string read from STDIN still ends with the newline that was typed after the directory name, so the name being tested does not match any directory. You need chomp($file); before the test.
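To make the effect of the trailing newline concrete, here is a small sketch. The sub name is_existing_dir is invented for illustration; it just chomps a line of input before applying -d.

```perl
use strict;
use warnings;

# Hypothetical helper: strip the trailing newline from a line of input,
# then test whether it names an existing directory.
sub is_existing_dir {
    my ($line) = @_;
    chomp $line;    # removes a trailing "\n" (really $/), if present
    return (-d $line) ? 1 : 0;
}
```

Without the chomp, the string passed to -d would still end in "\n" and would normally not name any directory.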
However, there are a few more points I would like to bring up.
Most importantly, there is repeated code in both branches. You really do not want to do that. It can, and does, cause trouble later. Instead, decide on the directory name, and then make it.
Second, there is no reason to go out to the system in order to make a directory. It is far better to do it in Perl, and there are good modules for this.
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Path qw(make_path);
my $dir;
if (not @ARGV) {
print "Please enter default directory: ";
$dir = <STDIN>;
chomp $dir;
}
else {
$dir = $ARGV[0];
}
die "No directory $dir" if not -d $dir;
my $orig_cwd = getcwd();
chdir $dir or die "Can't chdir to $dir: $!";
my @dirs = map { "Data/$_" } qw(Image Cache);
my @dirs_made = make_path( @dirs, { verbose => 1 } );
print "Created directories:\n";
print "$_\n" for @dirs_made;
I build the directory list using map to avoid repeating the Data/... string, and for later flexibility. You can of course just type the names in, but that tends to invite silly mistakes.
I used File::Path to make the directories. It builds the whole path, like mkdir -p, and has a few other useful options that you can pass in { }, including error handling. There are other modules as well, for example Path::Tiny with its mkpath (and a lot of other goodies).
Note that before chdir you probably want to record the current working directory (chdir itself returns only true or false, so I use getcwd from Cwd), and that you want to check for error. But you don't have to chdir, if there are no other reasons for that. Just include the $dir name in the map
# No chdir needed here
my @dirs = map { "$dir/Data/$_" } qw(Image Cache);
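The error handling mentioned above could look roughly like the following sketch. It uses the error option of make_path, which collects failures in an array reference instead of dying. The sub name make_data_tree is made up for this example.

```perl
use strict;
use warnings;
use File::Path qw(make_path);

# Hypothetical helper: create Data/Image and Data/Cache under $base
# and return the list of directories that were actually created.
sub make_data_tree {
    my ($base) = @_;
    my @dirs = map { "$base/Data/$_" } qw(Image Cache);

    # With the 'error' option, make_path records problems in $err
    # (an array ref of { path => message } hashes) rather than dying.
    my @made = make_path( @dirs, { error => \my $err } );
    if ($err && @$err) {
        for my $diag (@$err) {
            my ($path, $message) = %$diag;
            die "Problem creating " . ($path || 'directories') . ": $message\n";
        }
    }
    return @made;
}
```

Called as make_data_tree($dir), it does the work of the three system("mkdir ...") calls in the question without changing directory.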

Adding custom header to specific files in a directory

I would like to add a unique one-line header to each FOCUS*.tsv file in a specified directory. After that, I would like to combine all of these files into one file.
First I tried the sed command.
my $cmd9 = `sed -i '1i$SampleID[4]' $tsv_file`; print $cmd9;
It looked like it worked but after I’ve combined all of these files into one file in the next section of the code, the inserted row was listed four times for each file.
I’ve tried the following Perl script to accomplish the same but it deleted the content of the file and only prints out the added header.
I’m looking for the simplest way to accomplish what I’m looking for.
Here is what I’ve tried.
#!perl
use strict;
use warnings;
use Tie::File;
my $home="/data/";
my $tsv_directory = $home."test_all_runs/".$ARGV[0];
my $tsvfiles = $home."test_all_runs/".$ARGV[0]."/tsv_files.txt";
my @run_directory = (); @run_directory = split /\//, $tsv_directory; print "The run directory is #############".$run_directory[3]."\n";
my $cmd = `ls $tsv_directory/FOCUS*\.tsv > $tsvfiles`; #print "$cmd";
my $cmda = "ls $tsv_directory/FOCUS*\.tsv > $tsvfiles"; #print "$cmda";
my @tsvfiles =();
#this code opens the vcf_files.txt file and passes each line into an array for indidivudal manipulation
open(TXT2, "$tsvfiles");
while (<TXT2>){
push (@tsvfiles, $_);
}
close(TXT2);
foreach (@tsvfiles){
chop($_);
}
#this loop works fine
for my $tsv_file (@tsvfiles){
open my $in, '>', $tsv_file or die "Can't write new file: $!";
open my $out, '>', "$tsv_file.new" or die "Can't write new file: $!";
$tsv_file =~ m|([^/]+)-oncomine.tsv$| or die "Can't extract Sample ID";
my $sample_id = $1;
#print "The sample ID is ############## $sample_id\n";
my $headerline = $run_directory[3]."/".$sample_id;
print $out $headerline;
while( <$in> ) {
print $out $_;
}
close $out;
close $in;
unlink($tsv_file);
rename("$tsv_file.new", $tsv_file);
}
Thank you
Apparently, the wrong '>' when opening the file for reading was the problem and it got solved.
However, I'd like to make a few comments on some of the rest of the code.
The list of files is built by running external ls redirected to a file, then reading this file into an array. However, that is exactly the job of glob and all of that is replaced by
my @tsvfiles = glob "$tsv_directory/FOCUS*.tsv";
Then you don't need the chomp either. The chop that is used would actually hurt, since it removes the last character whatever it is, not only a newline (or really $/). If you are removing the line ending, use chomp.
To extract a match and assign it, a common idiom is
my ($sample_id) = $tsv_file =~ m|([^/]+)-oncomine\.tsv$|
or die "Can't extract Sample ID from $tsv_file";
Note that I also added the file name to the message, so that we know which file failed. (A failed match does not set $!, so there is no system error to print here.)
The unlink and rename appear to be overwriting one file with another. You can do that by using move from the core module File::Copy
use File::Copy qw(move);
move ("$tsv_file.new", $tsv_file)
or die "Can't move $tsv_file.new to $tsv_file: $!";
which renames the .new file to $tsv_file, so overwriting it.
As for how the files need to be combined, more precise explanation would be needed.
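Putting the points above together, a minimal sketch of the header step might look like this. The sub name prepend_header is invented for illustration. It opens the original for reading with '<' (the question used '>', which truncates the file), writes the header and the old content to a .new file, and then moves that over the original.

```perl
use strict;
use warnings;
use File::Copy qw(move);

# Hypothetical helper: prepend $header as the first line of $file,
# going through a temporary "$file.new" as in the question.
sub prepend_header {
    my ($file, $header) = @_;
    open my $in,  '<', $file       or die "Can't read $file: $!";
    open my $out, '>', "$file.new" or die "Can't write $file.new: $!";
    print $out "$header\n";
    print $out $_ while <$in>;
    close $in;
    close $out or die "Can't close $file.new: $!";
    move("$file.new", $file) or die "Can't move $file.new to $file: $!";
    return;
}
```

Inside the question's loop this would be called as prepend_header($tsv_file, $headerline). Note that it adds a newline after the header, which the original print $out $headerline did not.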

How to tail file in perl when copytruncate is used

Problem
I have created a simple perl script to read log files and process the data asynchronously.
The reading sub also checks for changes in inode number so a new filehandle is created when the logs rotate.
The problem I am facing is that when copytruncate is used in log rotation, the inode does not change when the file is rotated.
This shouldn't be an issue, as the script should just continue reading the file, but for some reason that I cannot immediately see, no new lines are ever read once the logs rotate.
Question
How can I modify the script below (or completely scrap it and start again) to continuously tail a file that is rotated by logrotate using copytruncate, in Perl?
Code
use strict;
use warnings;
use threads;
use Thread::Queue;
use threads::shared;
my $logq = Thread::Queue->new();
my %Servers :shared;
my %servername :shared;
#########
#This sub just reads the data off the queue and processes it, i have
#reduced it to a simple print statement for simplicity.
#The sleep is to prevent it from eating cpu.
########
sub process_data
{
while(sleep(5)){
if ($logq->pending())
{
while($logq->pending() > 0){
my $data = $logq->dequeue();
print "Data:$data\n";
}
}
}
}
sub read_file
{
my $myFile=$_[0];
#Get the argument and assign to var.
open(my $logfile,'<',$myFile) || die "error";
#open file
my $Inode=(stat($logfile))[1];
#Get the current inode
seek $logfile, 0, 2;
#Go to the end of the file
for (;;) {
while (<$logfile>) {
chomp( $_ );
$logq->enqueue( $_ );
#Add lines to queue for processing
}
sleep 5;
if($Inode != (stat($myFile))[1]){
close($logfile);
while (! -e $myFile){
sleep 2;
}
open($logfile,'<',$myFile) || die "error";
$Inode=(stat($logfile))[1];
}
#Above checks if the inode has changed and the file exists still
seek $logfile, 0, 1;
#Remove eof
}
}
my $thr1 = threads->create(\&read_file,"test");
my $thr4 = threads->create(\&process_data);
$thr1->join();
$thr4->join();
#Creating the threads, can add more log files for processing or multiple processing sections.
Possibly relevant info
Log config for logrotate contains
compress
compresscmd /usr/bin/bzip2
uncompresscmd /usr/bin/bunzip2
daily
rotate 5
notifempty
missingok
copytruncate
for this file.
Specs
GNU bash, version 3.2.57(1)-release (s390x-ibm-linux-gnu)
perl, v5.10.0
(if logrotate has version and someone knows how to check then i will also add that)
Any more info needed just ask.
The reason this was failing is pretty obvious once you look at what copytruncate does: it copies the original file and then truncates the current one.
Whilst this ensures that the inode is kept, it creates another problem.
The way I currently tail the file is by simply staying at the end and clearing the EOF flag. This means that when the file is truncated, the pointer stays at the position of the last line before truncation, which in turn means that no more lines are read until the file grows back to that position.
The obvious solution then is to check the size of the file and reset the pointer if it is ever pointing past the end of the file.
I found it easier, though, to just check that the file size never got smaller, using the two changes below.
my $fileSize=(stat($logfile))[7];
#Added after the inode is assigned
and changing
if($Inode != (stat($myFile))[1]){
to
if($Inode != (stat($myFile))[1] || (stat($myFile))[7] < $fileSize){
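For comparison, the first idea (reset the pointer whenever it points past the end of the file) could be sketched as follows. This is not the change I actually made, just an illustration; the sub name read_new_lines is invented. It compares the current size from stat with the read position from tell, so it does not depend on a size recorded when the file was opened.

```perl
use strict;
use warnings;

# Hypothetical helper: return any new lines from an open log handle,
# rewinding first if the file is now shorter than our read position.
sub read_new_lines {
    my ($fh) = @_;
    my $size = (stat($fh))[7];
    if (defined $size && $size < tell($fh)) {
        seek($fh, 0, 0);    # file was truncated: start again from the top
    }
    else {
        seek($fh, 0, 1);    # clear the EOF condition, stay in place
    }
    my @lines = <$fh>;
    chomp @lines;
    return @lines;
}
```

One limitation of this sketch: if the file is truncated and then grows past the old position before the next check, the truncation goes unnoticed, so the polling interval needs to be short relative to how fast the log grows.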

batch download from URL

I want to download thousands of files from a URL. Each line in "FileName.txt" contains the name of a file to download. I am using a Perl script to take each file name from "FileName.txt" and download it after a random time. I run the script as "./program.pl Filename.txt"
Filename.txt
A
B
C
B
program.pl
#!/usr/bin/perl
$file1=$ARGV[0];
open(FP1, $file1);
while($s1=<FP1>)
{ chomp ($s1);
$range = 5;
$minimum = 3;
$random_number = int(rand($range)) + $minimum;
`wget --wait="$random_number" "http://URL=$s1"`;
}
I get output for the first few files but not for the remaining ones. For the remaining files, $ emacs fileD.txt gives
[13] 29699
Could you kindly tell me why I am getting "[13] 29699", and what the best way is to download each file after a random time interval? Sorry, the while line of the program does not display the file handle correctly. Thanks
You don't show where $id comes from, but presumably some URLs contain &, which makes the shell put the process in the background (a line such as [13] 29699 is the shell reporting the job number and process ID of a background job). You should use single quotes for wget's argument or use the list form of system.
Further, wget's wait parameter is only relevant if you are using wget itself to traverse links from a given URL. In your case, you need your Perl script to sleep between invoking wget for each URL:
#!/usr/bin/env perl
use strict;
use warnings;
use constant WAIT_MINIMUM => 3;
use constant WAIT_RANGE => 5;
my ($url_list_file) = @ARGV;
defined($url_list_file)
or die "Need URL list\n";
open my $fh, '<', $url_list_file
or die "Cannot open '$url_list_file': $!";
while (my $url = <$fh>) {
$url =~ s/\R\z//;
my @cmd = (wget => "http://$url");
print "@cmd\n";
my $error = system @cmd;
if ($error) {
warn "'@cmd' failed: $?";
}
sleep WAIT_MINIMUM + rand(WAIT_RANGE);
}
What does URL= mean? wget takes the URL as a plain parameter. It seems you need
`wget --wait=$random_number 'http://$s1'`;

Search filesystem via perl script while ignoring remote mounts

I've written a perl script that is designed to search a server for world writable files. After some testing, though, I've found that I made a mistake in the logic. Specifically, I've told it to not search /. My initial thought behind this was that I was looking for locally mounted volumes while avoiding those of a remote variety (CIFS, NFS, what-have-you).
What I failed to take into consideration is that not every directory has a unique volume. As a result, by excluding / in my scan, I've missed several directories that should be included. Now I need to rework the script to include those while still excluding remote volumes.
#!/usr/bin/perl
# Directives which establish our execution environment
use warnings;
use strict;
use Fcntl ':mode';
use File::Find;
no warnings 'File::Find';
no warnings 'uninitialized';
# Variables used throughout the script
my $DIR = "/var/log/tivoli/";
my $MTAB = "/etc/mtab";
my $PERMFILE = "world_writable_w_files.txt";
my $TMPFILE = "world_writable_files.tmp";
my $EXCLUDE = "/usr/local/etc/world_writable_excludes.txt";
# Compile a list of mountpoints that need to be scanned
my @mounts;
# Create the filehandle for the /etc/mtab file
open MT, "<${MTAB}" or die "Cannot open ${MTAB}, $!";
# We only want the local mountpoints that are not "/"
while (<MT>) {
if ($_ =~ /ext[34]/) {
my @line = split;
push(@mounts, $line[1]) unless ($_ =~ /root/);
}
}
close MT;
# Read in the list of excluded files
my $regex = do {
open EXCLD, "<${EXCLUDE}" or die "Cannot open ${EXCLUDE}, $!\n";
my @ignore = <EXCLD>;
chomp @ignore;
local $" = '|';
qr/@ignore/;
};
# Create the output file path if it doesn't already exist.
mkdir "${DIR}" or die "Cannot execute mkdir on ${DIR}, $!" unless (-d "${DIR}");
# Create the filehandle for writing the findings
open WWFILE, ">${DIR}${TMPFILE}" or die "Cannot open ${DIR}${TMPFILE}, $!";
foreach (@mounts) {
# The anonymous subroutine which is executed by File::Find
find sub {
return unless -f; # Is it a regular file...
# ...and world writable.
return unless (((stat)[2] & S_IWUSR) && ((stat)[2] & S_IWGRP) && ((stat)[2] & S_IWOTH));
# Add the file to the list of found world writable files unless it is
# in the list if exclusions
print WWFILE "$File::Find::name\n" unless ($File::Find::name =~ $regex);
}, $_;
}
close WWFILE;
# If no world-writable files have been found ${TMPFILE} should be zero-size;
# Delete it so Tivoli won't alert
if (-z "${DIR}${TMPFILE}") {
unlink "${DIR}${TMPFILE}";
} else {
rename("${DIR}${TMPFILE}","${DIR}${PERMFILE}") or die "Cannot rename file ${DIR}${TMPFILE}, $!";
}
I'm at a bit of a loss as to how to approach this now. I know I can obtain the necessary information using stat -f -c %T but I don't see a similar option for perl's built-in stat (unless I'm misinterpreting the descriptions for output fields; perhaps it is found in one of the S_ variables?).
I'm just looking for a push in the right direction. I'd really rather not drop to a shell command to obtain this information.
EDIT: I've found this answer to a similar question, but it seems to be not entirely helpful. When I test the built-in stat against a CIFS mount I get 18. Perhaps what I need is a comprehensive list of values that could be returned for remote files to compare against?
EDIT2: This is the script in its new form which meets the requirements:
#!/usr/bin/perl
# Directives which establish our execution environment
use warnings;
use strict;
use Fcntl ':mode';
use File::Find;
no warnings 'File::Find';
no warnings 'uninitialized';
# Variables used throughout the script
my $DIR = "/var/log/tivoli/";
my $MTAB = "/etc/mtab";
my $PERMFILE = "world_writable_w_files.txt";
my $TMPFILE = "world_writable_files.tmp";
my $EXCLUDE = "/usr/local/etc/world_writable_excludes.txt";
my $ROOT = "/";
my @devNum;
# Create an array of the file stats for "/"
my @rootStats = stat("${ROOT}");
# Compile a list of mountpoints that need to be scanned
my @mounts;
open MT, "<${MTAB}" or die "Cannot open ${MTAB}, $!";
# We only want the local mountpoints
while (<MT>) {
if ($_ =~ /ext[34]/) {
my @line = split;
push(@mounts, $line[1]);
}
}
close MT;
# Build an array of each mountpoint's device number for future comparison
foreach (@mounts) {
my @stats = stat($_);
push(@devNum, $stats[0]);
}
# Read in the list of excluded files and create a regex from them
my $regExcld = do {
open XCLD, "<${EXCLUDE}" or die "Cannot open ${EXCLUDE}, $!\n";
my @ignore = <XCLD>;
chomp @ignore;
local $" = '|';
qr/@ignore/;
};
# Create a regex to compare file device numbers to.
# Anchored so that, for example, device 5 does not also match 25.
my $devRegex = do {
chomp @devNum;
local $" = '|';
qr/^(?:@devNum)$/;
};
# Create the output file path if it doesn't already exist.
mkdir("${DIR}") or die "Cannot execute mkdir on ${DIR}, $!" unless (-d "${DIR}");
# Create our filehandle for writing our findings
open WWFILE, ">${DIR}${TMPFILE}" or die "Cannot open ${DIR}${TMPFILE}, $!";
foreach (@mounts) {
# The anonymous subroutine which is executed by File::Find
find sub {
# Is it in a basic directory, ...
return if $File::Find::dir =~ /sys|proc|dev/;
# ...a regular file, ...
return unless -f;
# ...local, ...
my @dirStats = stat($File::Find::name);
return unless $dirStats[0] =~ $devRegex;
# ...and world writable?
return unless (((stat)[2] & S_IWUSR) && ((stat)[2] & S_IWGRP) && ((stat)[2] & S_IWOTH));
# If so, add the file to the list of world writable files unless it is
# in the list if exclusions
print(WWFILE "$File::Find::name\n") unless ($File::Find::name =~ $regExcld);
}, $_;
}
close WWFILE;
# If no world-writable files have been found ${TMPFILE} should be zero-size;
# Delete it so Tivoli won't alert
if (-z "${DIR}${TMPFILE}") {
unlink "${DIR}${TMPFILE}";
} else {
rename("${DIR}${TMPFILE}","${DIR}${PERMFILE}") or die "Cannot rename file ${DIR}${TMPFILE}, $!";
}
The dev field result from stat() tells you the device number the inode lives on. That can be used to distinguish different mount points, as they'll have a different device number from the one you started at.
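A minimal sketch of that comparison, under a couple of assumptions: the sub names local_devices and is_local_world_writable are invented for this example, a hash lookup is used in place of the regex built in the question, and "world writable" is taken to mean only that the S_IWOTH bit is set (the question's script additionally requires the user and group write bits).

```perl
use strict;
use warnings;
use Fcntl ':mode';

# Hypothetical helper: build a lookup of the device numbers (stat field 0)
# of the given local mount points.
sub local_devices {
    my @mounts = @_;
    my %dev;
    for my $mount (@mounts) {
        my @st = stat($mount) or next;    # skip mount points we cannot stat
        $dev{ $st[0] } = 1;
    }
    return \%dev;
}

# Hypothetical helper: true if $path is a regular file that lives on one
# of the known devices and is writable by "other".
sub is_local_world_writable {
    my ($path, $devices) = @_;
    my @st = lstat($path) or return 0;
    return 0 unless S_ISREG($st[2]);
    return 0 unless $devices->{ $st[0] };
    return ($st[2] & S_IWOTH) ? 1 : 0;
}
```

Inside the File::Find callback this would be used as return unless is_local_world_writable($File::Find::name, $devices); with $devices built once from the list of local mount points.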
