How to tail a file in Perl when copytruncate is used - Linux

Problem
I have created a simple perl script to read log files and process the data asynchronously.
The reading sub also checks for changes in inode number so a new filehandle is created when the logs rotate.
The problem I am facing is that when copytruncate is used in log rotation, the inode does not change when the file is rotated.
This shouldn't be an issue, as the script should just continue reading the file, but for some reason that I cannot immediately see, as soon as the logs rotate no new lines are ever read.
Question
How can I modify the script below (or completely scrap it and start again) to continuously tail, using Perl, a file which is rotated by logrotate with copytruncate?
Code
use strict;
use warnings;
use threads;
use Thread::Queue;
use threads::shared;

my $logq = Thread::Queue->new();
my %Servers :shared;
my %servername :shared;

#########
# This sub just reads the data off the queue and processes it; I have
# reduced it to a simple print statement for simplicity.
# The sleep is to prevent it from eating CPU.
#########
sub process_data
{
    while (sleep(5)) {
        if ($logq->pending())
        {
            while ($logq->pending() > 0) {
                my $data = $logq->dequeue();
                print "Data:$data\n";
            }
        }
    }
}

sub read_file
{
    # Get the argument and assign it to a variable.
    my $myFile = $_[0];
    # Open the file.
    open(my $logfile, '<', $myFile) || die "error";
    # Get the current inode.
    my $Inode = (stat($logfile))[1];
    # Go to the end of the file.
    seek $logfile, 0, 2;
    for (;;) {
        while (<$logfile>) {
            chomp($_);
            # Add lines to the queue for processing.
            $logq->enqueue($_);
        }
        sleep 5;
        # Check whether the inode has changed and the file still exists.
        if ($Inode != (stat($myFile))[1]) {
            close($logfile);
            while (! -e $myFile) {
                sleep 2;
            }
            open($logfile, '<', $myFile) || die "error";
            $Inode = (stat($logfile))[1];
        }
        # Clear the eof flag.
        seek $logfile, 0, 1;
    }
}

# Create the threads; more log files can be added for processing,
# or multiple processing sections.
my $thr1 = threads->create(\&read_file, "test");
my $thr4 = threads->create(\&process_data);
$thr1->join();
$thr4->join();
Possibly relevant info
Log config for logrotate contains
compress
compresscmd /usr/bin/bzip2
uncompresscmd /usr/bin/bunzip2
daily
rotate 5
notifempty
missingok
copytruncate
for this file.
Specs
GNU bash, version 3.2.57(1)-release (s390x-ibm-linux-gnu)
perl, v5.10.0
(If logrotate has a version and someone knows how to check it, then I will also add that.)
If any more info is needed, just ask.

So the reason this was failing is pretty obvious when you look at what copytruncate does: it copies the original file and then truncates the current one.
Whilst this ensures that the inode is kept, it creates another problem.
As the current way I tail the file is simply to stay at the end and clear the eof flag, when the file is truncated the pointer stays at the position of the last line before truncation, which in turn means that no more lines are read until the file grows back past that position.
The obvious solution, then, is to check the size of the file and reset the pointer if it is ever pointing past the end of the file.
I found it easier to just check that the file size never gets smaller, though, using the two changes below.
my $fileSize=(stat($logfile))[7];
#Added after the inode is assigned
and changing
if($Inode != (stat($myFile))[1]){
to
if($Inode != (stat($myFile))[1] || (stat($myFile))[7] < $fileSize){
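Putting the two pieces together, the bottom of the read loop could end up looking roughly like the sketch below (untested). The only things added to the original script are the $fileSize variable described above and keeping it up to date between checks, which is not part of the original answer but avoids missing a truncation that happens after the file has grown:
sleep 5;
if ($Inode != (stat($myFile))[1] || (stat($myFile))[7] < $fileSize) {
    # Either the inode changed (classic rotation) or the file shrank
    # (copytruncate); reopening restarts reading at the new beginning.
    close($logfile);
    while (! -e $myFile) {
        sleep 2;
    }
    open($logfile, '<', $myFile) || die "error";
    $Inode    = (stat($logfile))[1];
    $fileSize = (stat($logfile))[7];
} else {
    # Remember the current size so the next comparison still notices
    # a shrink even after the file has grown past its original size.
    $fileSize = (stat($myFile))[7];
}
# Clear the eof flag.
seek $logfile, 0, 1;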

Related

How to add a line to multiple text files under different sub-directories only if the line doesn't exist?

I have a directory called "technology", and under the "technology" directory I have multiple sub-directories nested up to two levels at most.
Under every sub-directory I have at least one "*.txt" file, which could have 20 to 30 lines of entries in it.
Now, I want to add a "Remarks" line in every *.txt file across the sub-directories, but only if the line is not there already.
I am getting the list of all files under the sub-directories using:
find ./ -name '*.txt'
I am using the Perl one-liner below to update entries with new remarks, as shown.
/technology$ perl -p -i -e 's/Remarks.*/Remarks: NEW Value/' `find ./ -name *.txt`
The problem with the above command is that it only updates the existing Remarks field.
How can I add an entry (a one-line remark) to ONLY those files that don't actually have it already?
I want to add the line to only those files that don't contain a "Remarks" line.
It isn't entirely clear whether you want to update the rows containing Remarks in the files that already contain such a line, or whether you want to leave those files unchanged and only add a line at the end of those files that don't contain remarks.
Fortunately, there isn't much difference. This code should work for "edit existing Remarks lines; add a new Remarks line if there isn't one already":
#!/usr/bin/env perl -i
use strict;
use warnings;

my $num_remarks = 0;
while (<>)
{
    $num_remarks++ if s/Remarks.*/Remarks: NEW Value $$/;
    print;
}
continue
{
    if (eof)
    {
        print "Remarks: NEW Value $$\n" if $num_remarks == 0;
        $num_remarks = 0;
    }
}
The alternative requirement "leave existing Remarks lines unchanged; add a new Remarks line if there isn't one" can be handled with:
#!/usr/bin/env perl -i
use strict;
use warnings;

my $num_remarks = 0;
while (<>)
{
    $num_remarks++ if m/Remarks/; # m// instead of s///
    print;
}
continue
{
    if (eof)
    {
        print "Remarks: NEW Value $$\n" if $num_remarks == 0;
        $num_remarks = 0;
    }
}
There are probably shorter ways to write this code. Both variants include the PID of the Perl process in the 'Remarks' line when it is added. This makes it easier to see when things are changed.

Using Net::OpenSSH tail the message file and grep

I am using Net::OpenSSH
my $ssh = Net::OpenSSH->new("$linux_machine_host")
Using the SSH object, a few commands are executed multiple times over N hours.
At times I need to look for any error messages, such as Timeout, in the /var/adm/messages file.
My suggestion
$ssh->capture2("echo START >> /var/adm/messages");
$ssh->capture2("some command which will be run in background for n hours");
$ssh->capture2("echo END >> /var/adm/messages");
Then read all lines between START and END and grep for the required error message.
$ssh->capture2("grep -A 100000 "START" /var/adm/messages | grep -B 100000 END");`
Without writing START and END into the message file, can I tail the var/adm/message file at some point and capture any new messages appearing afterwards.
Are there any Net::OpenSSH methods which would capture new lines and write them into a file?
You can read the messages file via SFTP (see Net::SFTP::Foreign):
# untested!
use Net::SFTP::Foreign::Constants qw(:flags);
...
my $sftp = $ssh->sftp;

# open the messages file creating it if it doesn't exist
# and move to the end:
my $fh = $sftp->open("/var/adm/messages",
                     SSH2_FXF_READ|SSH2_FXF_CREAT)
    or die $sftp->error;
seek($fh, 0, 2);

$ssh->capture2("some command which...");

# look for the size of /var/adm/messages now so that we
# can ignore any lines that may be appended while we are
# reading it:
my $end = (stat $fh)[7];

# and finally read any lines added since we opened it:
my @msg;
while (1) {
    my $pos = tell $fh;
    last if $pos < 0 or $pos >= $end;
    my $line = <$fh>;
    last unless defined $line;
    push @msg, $line;
}
Note that you are not taking into account that the messages file may be rotated. Handling that would require more convoluted approaches.

Split a huge file in LINUX into multiple small files (each less than 100MB) splitting at a specific line with pattern match

I have the below source file (~10GB) and I need to split it into several small files (<100MB each), and each file should have the same header record. The tricky part is that I can't just split the file at any random line with some split command. Records belonging to an agent shouldn't be split across multiple files. For simplicity I am only showing 2 agents here (there are thousands of them in the real file).
Inout.csv
Src,AgentNum,PhoneNum
DWH,Agent_1234,phone1
NULL,NULL,phone2
NULL,NULL,phone3
DWH,Agent_5678,phone1
NULL,NULL,phone2
NULL,NULL,phone3
DWH,Agent_9999,phone1
NULL,NULL,phone2
NULL,NULL,phone3
Output1.csv
Src,AgentNum,PhoneNum
DWH,Agent_1234,phone1
NULL,NULL,phone2
NULL,NULL,phone3
Output2.csv
Src,AgentNum,PhoneNum
DWH,Agent_5678,phone1
NULL,NULL,phone2
NULL,NULL,phone3
DWH,Agent_9999,phone1
NULL,NULL,phone2
NULL,NULL,phone3
#!/bin/bash
#Calculate filesize in bytes
FileSizeBytes=`du -b $FileName | cut -f1`
#Check for the file size
if [[ $FileSizeBytes -gt 100000000 ]]
then
    echo "Filesize is greater than 100MB"
    NoOfLines=`wc -l < $FileName`
    AvgLineSize=$((FileSizeBytes / NoOfLines))
    LineCountInEachFile=$((100000000 / AvgLineSize))
    #Section for splitting the files
else
    echo "Filesize is already less than 100MB. No splitting needed"
    exit 0
fi
I am new to UNIX, but I am trying this bash script on my own and am kind of stuck at splitting the files. I am not expecting somebody to give me a full script; I am looking for any simple approach or recommendation, possibly using other simple alternatives like sed or such. Many thanks in advance!
Here is a rough idea of how to do it in Perl. Please modify the regular expression if it doesn't exactly match your actual data. I only tested it on your dummy data.
#!/usr/bin/perl -w
my $l=<>; chomp($l); my $header=$l;
my $agent=""; my $fh;
while ($l=<>) {
    chomp($l);
    if ($l=~m/^\s*[^,]+,(Agent_\d+),[^,]+/) {
        $agent="$1";
        open($fh,">","${agent}.txt") or die "$!";
        print $fh $header."\n";
    }
    print $fh $l."\n";
}
Use it as follows:
./perlscript.pl < inputfile.txt
If you don't have perl (check for perl at /usr/bin/perl or some other such location), I will try to do an awk script. Let me know if you find problems running the above script.
In response to your updated request that you only want to split the file, with each output file less than 100MB, with no agent's records split across two files, and with the header printed in each output file, here is a rough idea of how you can accomplish that. It doesn't do an exact cut (because you would need to calculate before you write). If you set $maxfilesize to a value like 95*1024*1024 or 99*1024*1024, that should keep each file under 100MB (for example, if the maximum size of any one agent's records is less than 5MB, then set $maxfilesize to 95*1024*1024).
#!/usr/bin/perl -w
# Max file size, approximately in bytes
#
# For 99MB make it as 99*1024*1024
#
my $maxfilesize=95*1024*1024;
#my $maxfilesize=400;
my $l=<>; chomp($l); my $header=$l;
my $fh;
my $filecounter=0;
my $filename="";
my $filesize=1000000000000; # big dummy size for first iteration
while ($l=<>) {
    chomp($l);
    if ($l=~m/^\s*[^,]+,Agent_\d+,[^,]+/) {
        if ($filesize>$maxfilesize) {
            print "FileSize: $filesize\n";
            $filecounter++; $filename=sprintf("outfile_%05d",$filecounter);
            print "Opening New File: $filename\n";
            open($fh,">","${filename}.txt") or die "$!";
            print $fh $header."\n";
            $filesize=length($header);
        }
    }
    print $fh $l."\n";
    $filesize+=length($l);
    print "FileSize: $filesize\n";
}
If you want more precise cuts than this, I will update it to buffer the data before printing.
Step 1. Save the header.
Step 2. Create a variable "content" to temporarily save what the program is going to read.
Step 3. Start reading the next lines; in Python:
if line.startswith("DWH"):
    if content != "":
        # if len(content) has reached your predefined size limit, write
        # your_header + content to the next output file and reinitialise
        # the buffer with content = ""; otherwise it is still under the
        # limit, so just fall through and keep buffering
        pass
    content += line   # start collecting the new agent's records
else:
    content += line   # continuation line; keep adding to the buffer
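For reference, here is a rough Perl sketch (untested) of that buffering idea, staying close to the earlier script's conventions; the flush_buffer helper and the outfile_NNNNN.txt naming are only illustrative choices, not part of the original answer.
#!/usr/bin/perl -w
use strict;

# Flush threshold; keep it comfortably below 100MB, as discussed above.
my $maxfilesize = 95*1024*1024;

my $header      = <>;    # first line, repeated at the top of every output file
my $content     = "";    # buffer holding complete agent records
my $filecounter = 0;

sub flush_buffer {
    return unless length $content;
    $filecounter++;
    open(my $fh, ">", sprintf("outfile_%05d.txt", $filecounter)) or die "$!";
    print $fh $header, $content;
    close($fh);
    $content = "";
}

while (my $line = <>) {
    # A new agent starts on a "DWH" line; flush first if the buffer has
    # already reached the limit, so one agent's records never span two files.
    flush_buffer() if $line =~ /^\s*DWH,/ and length($content) >= $maxfilesize;
    $content .= $line;
}
flush_buffer();    # write out whatever is left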

Adding files to zipped folder in perl

I have the following perl script that is intended to accept command line arguments that will archive all of a users data files into a zip file and then delete the original data. The script does alright, but when run again with a different user as the argument, it overwrites the previous data in the userData.zip file. I have searched and not been able to find how to perform this task. It should continue to accept users as an argument and append their folders to the userData.zip file.
Any help is appreciated.
use 5.14.2;
use strict;
use warnings;
use Archive::Zip qw( :ERROR_CODES :CONSTANTS );
use File::Path;
my ($DATAFILEIN, $DATAFILEOUT);
my ($new,$zip);
use constant COLUMNS => 6;
sub main {
verifyArguments();
setDataFileIn();
zipFiles();
deleteUserFiles();
#setDataFileOut();
#printData();
#writeData();
}
sub verifyArguments {
if (!(#ARGV) || !(-e $ARGV[0])) {
die "\n\nYou must specify correct file name upon command invocation.\n\n";
}
}
sub setDataFileIn {
$DATAFILEIN = $ARGV[0];
}
sub zipFiles {
print "\nBacking up ".$DATAFILEIN."\n";
sleep 1;
$zip = Archive::Zip->new();
opendir (DIR, $DATAFILEIN) or die $!;
while (my $file = readdir(DIR)) {
# Use -f to look for a file
next unless (-f $DATAFILEIN."\\".$file);
$zip->addFile($DATAFILEIN."\\".$file, );
print "Added $file to zip\n";
}
closedir(DIR);
my $fileName = $DATAFILEIN;
unless ( $zip->writeToFileNamed('userData.zip') == AZ_OK ) {
die 'write error';
}
print "Successfully backed up $fileName to userData.zip\n";
}
sub deleteUserFiles{
rmtree($DATAFILEIN);
}
main();
Have you read this portion of the Archive::Zip FAQ?
Can't Read/modify/write same Zip file
Q: Why can't I open a Zip file, add a member, and write it back? I get
an error message when I try.
A: Because Archive::Zip doesn't (and can't, generally) read file
contents into memory, the original Zip file is required to stay around
until the writing of the new file is completed.
The best way to do this is to write the Zip to a temporary file and
then rename the temporary file to have the old name (possibly after
deleting the old one).
Archive::Zip v1.02 added the archive methods overwrite() and
overwriteAs() to do this simply and carefully.
See examples/updateZip.pl for an example of this technique.
I don't see $zip->overwrite() in your code.
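For reference, a minimal sketch (untested) of the read-then-overwrite pattern the FAQ describes, as it might slot into zipFiles(); the $zipName and $existing variables are only illustrative:
my $zipName  = 'userData.zip';
my $existing = -e $zipName;

$zip = Archive::Zip->new();
if ($existing) {
    # Load the existing archive so previously backed-up users are preserved.
    $zip->read($zipName) == AZ_OK or die 'read error';
}

# ... the existing readdir()/addFile() loop goes here ...

if ($existing) {
    # The archive was read from disk, so write to a temporary file and
    # rename it over the original; overwrite() does exactly that.
    $zip->overwrite() == AZ_OK or die 'write error';
} else {
    # First run: nothing was read, so a plain write is fine.
    $zip->writeToFileNamed($zipName) == AZ_OK or die 'write error';
}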
The best place to find information on CPAN modules is http://metacpan.org. In this case, the Archive::Zip page. That page has a documentation link to Archive::Zip::FAQ. You can read it there, or you can probably just type perldoc Archive::Zip::FAQ on your system where you have the module installed.
The examples are part of the downloaded package. If you used the cpan command to install Archive::Zip, then the examples would be in the build location. By default, that would be ~/.cpan/build/Archive-Zip-*/examples.

Search filesystem via perl script while ignoring remote mounts

I've written a perl script that is designed to search a server for world writable files. After some testing, though, I've found that I made a mistake in the logic. Specifically, I've told it to not search /. My initial thought behind this was that I was looking for locally mounted volumes while avoiding those of a remote variety (CIFS, NFS, what-have-you).
What I failed to take into consideration is that not every directory has a unique volume. As a result, by excluding / in my scan, I've missed several directories that should be included. Now I need to rework the script to include those while still excluding remote volumes.
#!/usr/bin/perl
# Directives which establish our execution environment
use warnings;
use strict;
use Fcntl ':mode';
use File::Find;
no warnings 'File::Find';
no warnings 'uninitialized';

# Variables used throughout the script
my $DIR = "/var/log/tivoli/";
my $MTAB = "/etc/mtab";
my $PERMFILE = "world_writable_w_files.txt";
my $TMPFILE = "world_writable_files.tmp";
my $EXCLUDE = "/usr/local/etc/world_writable_excludes.txt";

# Compile a list of mountpoints that need to be scanned
my @mounts;

# Create the filehandle for the /etc/mtab file
open MT, "<${MTAB}" or die "Cannot open ${MTAB}, $!";

# We only want the local mountpoints that are not "/"
while (<MT>) {
    if ($_ =~ /ext[34]/) {
        my @line = split;
        push(@mounts, $line[1]) unless ($_ =~ /root/);
    }
}
close MT;

# Read in the list of excluded files
my $regex = do {
    open EXCLD, "<${EXCLUDE}" or die "Cannot open ${EXCLUDE}, $!\n";
    my @ignore = <EXCLD>;
    chomp @ignore;
    local $" = '|';
    qr/@ignore/;
};

# Create the output file path if it doesn't already exist.
mkdir "${DIR}" or die "Cannot execute mkdir on ${DIR}, $!" unless (-d "${DIR}");

# Create the filehandle for writing the findings
open WWFILE, ">${DIR}${TMPFILE}" or die "Cannot open ${DIR}${TMPFILE}, $!";

foreach (@mounts) {
    # The anonymous subroutine which is executed by File::Find
    find sub {
        return unless -f; # Is it a regular file...
        # ...and world writable.
        return unless (((stat)[2] & S_IWUSR) && ((stat)[2] & S_IWGRP) && ((stat)[2] & S_IWOTH));
        # Add the file to the list of found world writable files unless it is
        # in the list of exclusions
        print WWFILE "$File::Find::name\n" unless ($File::Find::name =~ $regex);
    }, $_;
}
close WWFILE;

# If no world-writable files have been found ${TMPFILE} should be zero-size;
# delete it so Tivoli won't alert
if (-z "${DIR}${TMPFILE}") {
    unlink "${DIR}${TMPFILE}";
} else {
    rename("${DIR}${TMPFILE}","${DIR}${PERMFILE}") or die "Cannot rename file ${DIR}${TMPFILE}, $!";
}
I'm at a bit of a loss as to how to approach this now. I know I can obtain the necessary information using stat -f -c %T but I don't see a similar option for perl's built-in stat (unless I'm misinterpreting the descriptions for output fields; perhaps it is found in one of the S_ variables?).
I'm just looking for a push in the right direction. I'd really rather not drop to a shell command to obtain this information.
EDIT: I've found this answer to a similar question, but it seems to be not entirely helpful. When I test the built-in stat against a CIFS mount I get 18. Perhaps what I need is a comprehensive list of values that could be returned for remote files to compare against?
EDIT2: This is the script in its new form which meets the requirements:
#!/usr/bin/perl
# Directives which establish our execution environment
use warnings;
use strict;
use Fcntl ':mode';
use File::Find;
no warnings 'File::Find';
no warnings 'uninitialized';

# Variables used throughout the script
my $DIR = "/var/log/tivoli/";
my $MTAB = "/etc/mtab";
my $PERMFILE = "world_writable_w_files.txt";
my $TMPFILE = "world_writable_files.tmp";
my $EXCLUDE = "/usr/local/etc/world_writable_excludes.txt";
my $ROOT = "/";
my @devNum;

# Create an array of the file stats for "/"
my @rootStats = stat("${ROOT}");

# Compile a list of mountpoints that need to be scanned
my @mounts;

open MT, "<${MTAB}" or die "Cannot open ${MTAB}, $!";

# We only want the local mountpoints
while (<MT>) {
    if ($_ =~ /ext[34]/) {
        my @line = split;
        push(@mounts, $line[1]);
    }
}
close MT;

# Build an array of each mountpoint's device number for future comparison
foreach (@mounts) {
    my @stats = stat($_);
    push(@devNum, $stats[0]);
}

# Read in the list of excluded files and create a regex from them
my $regExcld = do {
    open XCLD, "<${EXCLUDE}" or die "Cannot open ${EXCLUDE}, $!\n";
    my @ignore = <XCLD>;
    chomp @ignore;
    local $" = '|';
    qr/@ignore/;
};

# Create a regex to compare file device numbers to.
my $devRegex = do {
    chomp @devNum;
    local $" = '|';
    qr/@devNum/;
};

# Create the output file path if it doesn't already exist.
mkdir "${DIR}" or die "Cannot execute mkdir on ${DIR}, $!" unless (-d "${DIR}");

# Create our filehandle for writing our findings
open WWFILE, ">${DIR}${TMPFILE}" or die "Cannot open ${DIR}${TMPFILE}, $!";

foreach (@mounts) {
    # The anonymous subroutine which is executed by File::Find
    find sub {
        # Is it in a basic directory, ...
        return if $File::Find::dir =~ /sys|proc|dev/;
        # ...a regular file, ...
        return unless -f;
        # ...local, ...
        my @dirStats = stat($File::Find::name);
        return unless $dirStats[0] =~ $devRegex;
        # ...and world writable?
        return unless (((stat)[2] & S_IWUSR) && ((stat)[2] & S_IWGRP) && ((stat)[2] & S_IWOTH));
        # If so, add the file to the list of world writable files unless it is
        # in the list of exclusions
        print(WWFILE "$File::Find::name\n") unless ($File::Find::name =~ $regExcld);
    }, $_;
}
close WWFILE;

# If no world-writable files have been found ${TMPFILE} should be zero-size;
# delete it so Tivoli won't alert
if (-z "${DIR}${TMPFILE}") {
    unlink "${DIR}${TMPFILE}";
} else {
    rename("${DIR}${TMPFILE}","${DIR}${PERMFILE}") or die "Cannot rename file ${DIR}${TMPFILE}, $!";
}
The dev field result from stat() tells you the device number the inode lives on. That can be used to distinguish different mount points, as they'll have a different device number from the one you started at.
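As a concrete illustration of that idea, here is a minimal sketch (untested) of using the device number to prune foreign filesystems during the walk; the starting directory and variable names are only examples:
use File::Find;

my $start    = "/";
my $root_dev = (stat $start)[0];    # device number of the starting filesystem

find(sub {
    # Skip any directory that lives on a different device (remote mounts,
    # other local filesystems, pseudo-filesystems, and so on).
    if (-d $_ && (stat _)[0] != $root_dev) {
        $File::Find::prune = 1;
        return;
    }
    # ... the per-file permission checks go here ...
}, $start);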
