Arrange Log Entries into Dated Files - linux

I'm trying to split a large log file, containing entries for several months, into separate log files by date. There are thousands of lines like the following:
Sep 4 11:45 kernel: Entry
Sep 5 08:44 syslog: Entry
I'd like to split it up so that files named logfile.20090904 and logfile.20090905 contain the corresponding entries.
I've written a program that reads each line and sends it to the appropriate file, but it runs quite slowly (especially since I have to turn a month name into a number). I've also considered running a grep for every day, which would require finding the first date in the file, but that seems slow as well.
Is there a more optimal solution? Maybe I'm missing a command line program that would work better.
Here is my current solution:
#!/bin/bash
FILE=$1    # the log file to split
while read -r line; do
    dts="${line:0:6}"
    dt=`date -d "$dts" +'%Y%m%d'`
    # Note that I could do some caching here of the date, assuming
    # that dates are together.
    echo "$line" >> "$FILE.$dt" 2> /dev/null
done < "$FILE"

@OP, try not to use bash's while-read loop to iterate over a big file. It's tried and proven to be slow, and furthermore you are calling the external date command for every line of the file you read. Here's a more efficient way, using only gawk:
gawk 'BEGIN{
    m = split("Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec", mth, "|")
}
{
    for (i = 1; i <= m; i++) { if (mth[i] == $1) { month = i } }
    tt = "2009 " month " " $2 " 00 00 00"
    date = strftime("%Y%m%d", mktime(tt))
    print $0 > FILENAME "." date
}' logfile
output
$ more logfile
Sep 4 11:45 kernel: Entry
Sep 5 08:44 syslog: Entry
$ ./shell.sh
$ ls -1 logfile.*
logfile.20090904
logfile.20090905
$ more logfile.20090904
Sep 4 11:45 kernel: Entry
$ more logfile.20090905
Sep 5 08:44 syslog: Entry
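A small refinement of the gawk approach (my sketch, not part of the answer above): build a month-name-to-number lookup table once in the BEGIN block instead of scanning the array for every input line. The hardcoded year 2009 is inherited from the answer and is an assumption about the data:
gawk 'BEGIN{
    m = split("Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec", mth, "|")
    # invert the list once: mnum["Jan"] = 1 ... mnum["Dec"] = 12
    for (i = 1; i <= m; i++) mnum[mth[i]] = i
}
{
    tt = "2009 " mnum[$1] " " $2 " 00 00 00"    # assumes year 2009, as above
    out = FILENAME "." strftime("%Y%m%d", mktime(tt))
    print $0 > out
}' logfile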

The quickest thing given what you've already done would be to simply name the files "Sep 4" and so on, then rename them all at the end - that way all you have to do is read a certain number of characters, no extra processing.
If for some reason you don't want to do that, but you know the dates are in order, you could cache the previous date in both forms, and do a string comparison to find out whether you need to run date again or just use the old cached date.
Finally, if speed really keeps being an issue, you could try Perl or Python instead of bash. You're not doing anything too crazy here, though (besides starting a subshell and a date process for every line, which we've already figured out how to avoid), so I don't know how much it'll help.
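For reference, a minimal sketch of that caching idea (assuming, as the original script's comment notes, that lines with the same date are adjacent):
#!/bin/bash
FILE=$1
prev_dts=""
while read -r line; do
    dts="${line:0:6}"
    # only call the external date command when the date prefix changes
    if [[ $dts != "$prev_dts" ]]; then
        dt=$(date -d "$dts" +'%Y%m%d')
        prev_dts=$dts
    fi
    echo "$line" >> "$FILE.$dt"
done < "$FILE"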

A skeleton of the script:
BIG_FILE=big.txt
# remove $BIG_FILE when the script exits
trap 'rm -f "$BIG_FILE"' EXIT
cat $FILES > "$BIG_FILE" || { echo "cat failed"; exit 1; }
# sort the file by date, in place
sort -M "$BIG_FILE" -o "$BIG_FILE" || { echo "sort failed"; exit 1; }
while read -r line; do
    # extract the date part from the line ...
    DATE_STR=${line:0:12}
    # a new date - create a new file
    if [[ $DATE_STR != "$PREV_DATE_STR" ]]; then
        # close the file descriptor of the previous "dated" file
        exec 5>&-
        PREV_DATE_STR=$DATE_STR
        # open the new "dated" file for writing
        FILE_NAME= ... set to file name ...
        exec 5>"$FILE_NAME" || { echo "exec failed"; exit 1; }
    fi
    printf '%s\n' "$line" >&5 || { echo "print failed"; exit 1; }
done < "$BIG_FILE"
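One way to fill in the FILE_NAME placeholder (my sketch, using the logfile.YYYYMMDD naming from the question; it assumes DATE_STR is narrowed to just the date prefix, e.g. ${line:0:6}, so date runs once per day rather than once per line):
# hypothetical filename derivation from the line's date prefix
FILE_NAME="logfile.$(date -d "$DATE_STR" +%Y%m%d)"
Because exec 5> truncates the file it opens, it matters that all lines for a given date arrive consecutively, which the sort -M above guarantees, so each dated file is opened exactly once.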

This script executes the inner loop 365 or 366 times, once for each day of the year, instead of iterating over each line of the log file:
#!/bin/bash
month=0
months=(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)
for eom in 31 29 31 30 31 30 31 31 30 31 30 31
do
(( month++ ))
echo "Month $month"
if (( month == 2 )) # see what day February ends on
then
eom=$(date -d "3/1 - 1 day" +%-d)
fi
for (( day=1; day<=eom; day++ ))
do
grep "^${months[$month - 1]} $day " dates.log > temp.out
if [[ -s temp.out ]]
then
mv temp.out file.$(date -d $month/$day +"%Y%m%d")
else
rm temp.out
fi
# instead of creating a temp file and renaming or removing it,
# you could go ahead and let grep create empty files and let find
# delete them at the end, so instead of the grep and if/then/else
# immediately above, do this:
# grep --color=never "^${months[$month - 1]} $day " dates.log > file.$(date -d $month/$day +"%Y%m%d")
done
done
# if you let grep create empty files, then do this:
# find -type f -name "file.2009*" -empty -delete

Related

How to grep a string variable with spaces in a bash script [duplicate]

This question already has answers here:
Grep error due to expanding variables with spaces
(3 answers)
Closed 5 years ago.
Solution: My date variables were in the wrong format (the day number and the month were flipped). I changed this, then used the if statement proposed by @PesaThe below instead of my test.
Original Post:
I am writing a bash script to run as part of my servers' daily maintenance tasking. This particular job is to search for entries in input_file matching yesterday's and today's time stamps. Here are my date variables.
today=$(date "+%a, %b %d %Y")
yesterday=$(date --date=yesterday "+%a, %b %d %Y")
Here are the declarations, which are exactly as they should be:
declare -- adminLogLoc="/opt/sc/admin/logs/"
declare -- adminLog="/opt/sc/admin/logs/201801.log"
declare -- today="Tue, Jan 02 2018"
declare -- yesterday="Mon, Jan 01 2018"
declare -- report="/maintenance/daily/2018-01-02_2.2.txt"
Here are some actual log entries like those I need output. These were found with grep $today $adminLog | grep error
Tue, 02 Jan 2018 14:38:50 +0000||error|WARNING|13|Query #2464 used to generate source data is inactive.
Tue, 02 Jan 2018 14:38:50 +0000||error|WARNING|13|Query #2468 used to generate source data is inactive.
Tue, 02 Jan 2018 14:38:50 +0000||error|WARNING|13|Query #2470 used to generate source data is inactive.
Tue, 02 Jan 2018 14:38:50 +0000||error|WARNING|13|Query #2474 used to generate source data is inactive.
Here is the if statement I am trying to run:
# Check for errors yesterday
if [ $(grep $yesterday $adminLog|grep "error") != "" ]; then
echo "No errors were found for $yesterday." >> $report
else
$(grep $yesterday $adminLog|grep "error") >> $report
fi
# Check for errors today (at the time the report is made, there
# probably won't be many, if any at all)
if [ $(grep $today $adminLog|grep "error") != "" ]; then
echo "No errors were found for $today." >> $report
else
$(grep $today $adminLog|grep "error") >> $report
fi
I have tried this several ways, such as putting double quotes around the variables in the test, and so on. When I run the grep search on the command line after setting the variables, it works perfectly, but when I run it inside the test brackets, grep treats each word (i.e. Tue, Jan... and so on) as an individual argument. I have also tried
grep $yesterday $adminLog 2> /dev/null | grep -q error
if [ $? = "0" ] ; then
with no luck.
How can I get this test to work so I can input the specified entry into my log file? Thank you.
Could you please try the following script and let me know if it helps? This snippet simply prints yesterday's and today's logs; if you want to capture them in an output file or similar, it can be adjusted accordingly.
#!/bin/bash
today=$(date "+%a, %b %d %Y")
yesterday=$(date --date=yesterday "+%a, %b %d %Y")
todays=$(grep "$today" Input_file)
yesterdays=$(grep "$yesterday" Input_file)
if [[ -n $todays ]]
then
    echo "$todays"
else
    echo "No logs found for today's date."
fi
if [[ -n $yesterdays ]]
then
    echo "$yesterdays"
else
    echo "No logs found for yesterday's date."
fi
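For reference, a sketch of the corrected format the asker describes (assuming GNU date; putting %d before %b produces "Tue, 02 Jan 2018", matching the log entries, and $adminLog and $report are the variables declared in the question):
today=$(date "+%a, %d %b %Y")
yesterday=$(date --date=yesterday "+%a, %d %b %Y")
errors=$(grep "$yesterday" "$adminLog" | grep error)
if [[ -n $errors ]]; then
    echo "$errors" >> "$report"
else
    echo "No errors were found for $yesterday." >> "$report"
fi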

How to grep log files during a specific time period [duplicate]

This question already has answers here:
Extract data from log file in specified range of time [duplicate]
(5 answers)
Closed 6 years ago.
Okay, so I have log files and I would like to search within specific ranges. These ranges will be different throughout the day. Below is a piece of a log file, and this is the only piece I can show you; sorry, work stuff. I am using the cat command, if that matters.
Working EXAMPLE : cat /dir/dir/dir/2014-07-30.txt | grep *someword* | cut -d',' -f1,4,3,7
2014-07-30 19:17:34.542 ;; (p=0,siso=0)
The above gets me the info I need along with the time stamp, but it shows all time ranges, and that is what I would like to correct. Let's say I only want the range 18 to 20 in the first column of the time.
Actual --> 2014-07-30 19:17:34.542 ;; (p=0,siso=0)
Only range I am looking for --> [18-20]:00:00.000 ;; (p=0,siso=0)
I am not worried about the 00s as they can be any digit.
Thanks for looking. I have not used much in the way of scripting as you can tell from my example, but any help is greatly appreciated.
I have included a log file, the colons and commas are where they should be.
2014-07-30 14:33:19.259 ;; (p=0,ser=0,siso=0) IN ### Word:Numbers=00000,word=None something goes here and here (something here andhere:here also here:2222),codeword=8,codeword=0,Noideanumbers=00000000,something=something, ;;
Using awk:
logsearch() {
    grep "$3" "$4" | awk -v start="$1" -v end="$2" '{split($2, a, /:/)} (a[1] >= start) && (a[1] <= end)'
}
# logsearch <START> <END> <PATTERN> <FILE>
logsearch 18 20 '*someword*' /dir/dir/dir/2014-07-30.txt
Or with only awk (possibly different pattern quoting requirements):
logsearch2() {
    awk -v start="$1" -v end="$2" -v pat="$3" '($0 ~ pat) {split($2, a, /:/)} ($0 ~ pat) && (a[1] >= start) && (a[1] <= end)' "$4"
}
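Usage mirrors the first function; note the pattern here is a plain regular expression, not the shell-glob-style *someword* from the question:
# logsearch2 <START> <END> <PATTERN> <FILE>
logsearch2 18 20 'someword' /dir/dir/dir/2014-07-30.txt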
Not having seen the original input data, I'm guessing from your cut what's going on.
Will this give you something similar to your desired outcome?
awk -F, '/someword/ && $4 ~ /^(18|19|20)/{printf "%s %s %s %s\n", $1,$4,$3,$7}' /dir/dir/dir/2014-07-30.txt
That said: a bit of sample data typically goes a long way!
Edit1:
Given the input line you added to both your comment and the original post the following awk statement does what you're asking:
awk '/something/ && $2 ~ /^(18|19|20)/{printf "%s %s %s %s\n", $1,$2,$3,$4}' /path/to/your/input_file
This is a very interesting question. The pure bash solution offers quite a bit of flexibility in how you process the entries once you've identified those falling within the date/time range of interest. The simplest way in bash is to get your start time and stop time in seconds since the epoch, test each log entry to determine whether it falls within that range, and then do something with the matching entries. The basic logic involved is relatively short. The width of the date_time field within the log can be set by passing it as argument 4; set the default dwidth as needed (currently 15, to match the syslog and journalctl formats). The only required argument is the logfile name. If no start/stop time is specified, it will find all entries:
## set filename, set start time and stop time (in seconds since epoch)
# and time_field width (number of chars that make up date in log entry)
lfname=${1}
test -n "$2" && starttm=`date --date "$2" +%s` || starttm=0
test -n "$3" && stoptm=`date --date "$3" +%s` || stoptm=`date --date "Jan 01 2037 00:01:00" +%s`
dwidth=${4:-15}
## read each line from the log file and act on only those with
# date_time between starttm and stoptm (inclusive)
while IFS=$'\n' read line || test -n "$line"; do
    test "${line:0:1}" != - || continue              # exclude journalctl first line
    logtm=`date --date "${line:0:$dwidth}" +%s`      # log entry time in seconds since epoch
    if test $logtm -ge $starttm && test $logtm -le $stoptm ; then
        echo "logtm: ${line:0:$dwidth} => $logtm"
    fi
done < "${lfname}"
working example:
#!/bin/bash

## log date format len
# journalctl 15
# syslog 15
# your log example 23

function usage {
    test -n "$1" && printf "\n Error: %s\n" "$1"
    printf "\n usage : %s logfile ['start datetime' 'stop datetime' tmfield_width]\n\n" "${0//*\//}"
    printf " example: ./date-time-diff.sh syslog \"Jul 31 00:15:02\" \"Jul 31 00:18:30\"\n\n"
    exit 1
}

## test for required input & respond to help
test -n "$1" || usage "insufficient input."
test "$1" = "-h" || test "$1" = "--help" && usage

## set filename, set start time and stop time (in seconds since epoch)
# and time_field width (number of chars that make up date in log entry)
lfname=${1}
test -n "$2" && starttm=`date --date "$2" +%s` || starttm=0
test -n "$3" && stoptm=`date --date "$3" +%s` || stoptm=`date --date "Jan 01 2037 00:01:00" +%s`
dwidth=${4:-15}

## read each line from the log file and act on only those with
# date_time between starttm and stoptm (inclusive)
while IFS=$'\n' read line || test -n "$line"; do
    test "${line:0:1}" != - || continue              # exclude journalctl first line
    logtm=`date --date "${line:0:$dwidth}" +%s`      # log entry time in seconds since epoch
    if test $logtm -ge $starttm && test $logtm -le $stoptm ; then
        echo "logtm: ${line:0:$dwidth} => $logtm"
    fi
done < "${lfname}"

exit 0
usage:
$ ./date-time-diff.sh -h
usage : date-time-diff.sh logfile ['start datetime' 'stop datetime' tmfield_width]
example: ./date-time-diff.sh syslog "Jul 31 00:15:02" "Jul 31 00:18:30"
Remember to quote your start and stop time strings. This was tested with 20 entries in the logfile, between Jul 31 00:12:58 and Jul 31 00:21:10.
test output:
$ ./date-time-diff.sh jc.log "Jul 31 00:15:02" "Jul 31 00:18:30"
logtm: Jul 31 00:15:02 => 1406783702
logtm: Jul 31 00:15:10 => 1406783710
logtm: Jul 31 00:15:11 => 1406783711
logtm: Jul 31 00:15:11 => 1406783711
logtm: Jul 31 00:15:11 => 1406783711
logtm: Jul 31 00:15:11 => 1406783711
logtm: Jul 31 00:18:30 => 1406783910
Depending on what you need, another one of the solutions may fit your needs, but if you need to be able to process or manipulate the matching log entries, it is hard to beat a BASH script.
You can pipe the results to grep again (using grep -E, since plain grep doesn't understand \d):
cat /dir/dir/dir/2014-07-30.txt | grep someword | cut -d',' -f1,4,3,7 \
| grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2} (1[89]|20)'
I don't have enough reputation to comment, but as minopret suggested, do one grep at a time.
Here is one of the solutions to get the 18-20 range:
grep ' 18:\| 19:\| 20:' filename.txt
I have found the answer in the form I was looking for:
cat /dir/dir/dir/2014-07-30.txt | grep *someword* | cut -d',' -f1,4,3,7 | egrep '[^ ]+ (2[0-2]):[0-9]'
This command gets me all the information I need from the cut, greps for the someword I need, and with the egrep I can search the times I need.

`sed` pattern matching?

Permissions links Owner Group Size Date Time Directory or file
-rwxr--r-- 1 User1 root 26 2012-04-12 19:51 MyFile.txt
drwxrwxr-x 3 User2 csstf 4096 2012-03-15 00:12 MyDir
I'm having trouble with the pattern matching needed to extract certain details from output like the above. I need to write a shell script that reports the following details.
I need to use a pipe in this question: when I run ls -la | prog.sh, it needs to show the details below.
The major part I don't get is how to use sed pattern matching.
1. Total number of lines read.
2. Total number of different users (owners).
3. Total number of files with execute permission for the owner.
4. The top 3 largest directories.
This is what I have tried so far
#!/bin/bash
while read j
do
    B=`sed -n '$=' $1`
    echo "total number of lines read = $B"
done
The while loop reads the output of ls -la line by line and you need to process each line and maintain variables for the information you need.
Here is a sample script to get you started:
#!/bin/bash
declare -i lineCount=0
declare -i executePermissionCount=0

# an array to keep track of owners
declare -a owners=()

# read each line into an array called lineFields
while read -r -a lineFields
do
    # the owner is the third element in the array
    owner="${lineFields[2]}"

    # check if we have already seen this owner before
    found=false
    for i in "${owners[@]}"
    do
        if [[ $i == $owner ]]
        then
            found=true
        fi
    done

    # if we haven't seen this owner, add it to the array
    if ! $found
    then
        owners+=( "$owner" )
    fi

    # check if this file has owner execute permission
    permission="${lineFields[0]}"

    # the 4th character should be x
    if [[ ${permission:3:1} == "x" ]]
    then
        (( executePermissionCount++ ))
    fi

    # increment line count
    (( lineCount++ ))
done

echo "Number of lines: $lineCount"
echo "Number of different owners: ${#owners[@]}"
echo "Number of files with execute permission: $executePermissionCount"

Export variables to another script

I am making 2 scripts. The first script is going to take a file, and then move it to a directory named "Trash". The second script will recover this file and send it back to its original directory. So far I have the first script moving the file correctly.
Here is my code so far:
For delete.sh
FILE=$1
DATEDELETED=$(date)
DIRECTORY=$(pwd)
mv $1 Trash
echo $FILE
echo $DATEDELETED
echo $DIRECTORY
Output:
trashfile
Sun Mar 2 21:37:21 CST 2014
/home/user
For undelete.sh:
PATH=/home/user/Trash
for file in $PATH
do
$file | echo "deleted on" | date -r $file
done
echo "Enter the filename to undelete from the above list:"
EDIT: So I realized that I don't need variables; I can just list all the files in the Trash directory and edit the output to what I want. I'm having a little trouble with my for statement though. I'm getting these two errors:
./undelete.sh: line 6: date: command not found
./undelete.sh: line 6: /home/user/Trash: Is a directory
So I'm not exactly sure what I'm doing wrong in my for statement.
Here is the expected output:
file1 deleted on Tue Mar 16 17:15:34 CDT 2010
file2 deleted on Tue Mar 16 17:15:47 CDT 2010
Enter the filename to undelete from the above list:
Well, I've accomplished what could be a workaround for your scenario.
Basically, from script1.sh you can write echo "script2variable=$script1variable" >> script2.sh, then use the source command to load that script later from any script you like. You may have to play with the ideas involved.
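A minimal illustration of the idea (hypothetical variable names), before the full scripts below:
#!/bin/bash
# script1.sh: persist a variable for a later script to pick up
script1variable="hello"
echo "script2variable=$script1variable" >> script2.sh

# later, in any other script:
source script2.sh
echo "$script2variable"    # prints: hello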
Good Luck!
Delete Script file
#!/bin/bash
# delete.sh file
# Usage: ./delete.sh [filename]
#DATEDELETED=$(date) #not best solution for this kind of application
DIR=$(pwd)
fn=$1
#Specify your trash directory or partition
trash=~/trash
#Set path and filename for simple use in the script
trashed=$trash/$fn.tgz
#Send variables to new manifest script.
echo "#!/bin/bash" > $1.mf.sh
echo "DIR=$DIR" >> $1.mf.sh
# Questionable need?
#echo "DATEDELETED=$DATEDELETED" >> $1.mf.sh
echo "mv $1 $DIR" >> $1.mf.sh
echo Compressing
tar -cvpzf $trashed $1 $1.mf.sh
if [ $? -ne 0 ]; then
    echo Compression Failed
else
    echo completed trash compression successfully
    echo Trashbin lists the file as $trashed
    rm $1 -f
    rm $1.mf.sh -f
    echo file removed successfully
fi
exit 0
Restore Script File
#!/bin/bash
# restore.sh
# Usage: ./restore.sh
# filename not required for this script, use index selection
fn=$1
#SET LOCATION OF TRASH DIRECTORY
trash=~/trash
listfile=($(ls $trash))
#SET COUNTER FOR LISTING FILES = 0
i=0
#THIS IS JUST A HEADER FOR YOUR OUTPUT.
clear #cleanup your shell
echo -e INDEX '\t' Filename '\t' '\t' '\t' Date Removed
#Echo a list of files from the array with the counter.
for FILES in "${listfile[@]}"
do
    echo -e $i '\t' $FILES '\t' "deleted on $(date -r $trash/$FILES)"
    let "i += 1"
done
#Output total number of items from the ls directory.
echo -e '\n' $i file\(s\) found in the trash!
# 1 Offset for 1 = 0, 2 = 1, and so on.
let "i -= 1"
#Require input of a single, double, or triple digit number only.
#Loop back prompt if not a number.
while true
do
    read -p "Enter an index number for file restoration: " indx
    case $indx in
        [0-9]|[0-9][0-9]|[0-9][0-9][0-9] ) break ;;
        * ) read -p "Please enter a valid number 0-$i: " indx ;;
    esac
done
#
script=$(echo ${listfile[$indx]}|sed 's/\.tgz/\.mf.sh/g')
tar -xvpzf $trash/${listfile[$indx]}
rm $trash/${listfile[$indx]}
sleep 2
chmod +x $script
source $script
rm $script
Run the script with source
source <yourscript>
or
. ./<yourscript>
In your case
. ./delete.sh && ./undelete.sh
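Why source matters here: a script run normally executes in a child process, so any variables it sets vanish when it exits, whereas sourcing runs it in the current shell. A quick illustration (hypothetical vars.sh containing exported_var="hello"):
$ ./vars.sh && echo "$exported_var"     # prints an empty line - the child's variable is gone
$ . ./vars.sh && echo "$exported_var"   # prints: hello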
Hope this will help

How can I find and delete files based on date in a linux shell script without find?

PLEASE NOTE THAT I CANNOT USE 'find' IN THE TARGET ENVIRONMENT
I need to delete all files more than 7 days old in a linux shell script. Something like:
FILES=./path/to/dir
for f in $FILES
do
echo "Processing $f file..."
# take action on each file. $f store current file name
# perhaps stat each file to get the last modified date and then delete files with date older than today -7 days.
done
Can I use 'stat' to do this? I was trying to use
find *.gz -mtime +7 -delete
but discovered that I cannot use find on the target system (there is no permission for the cron user and this can't be changed). Target system is Redhat Enterprise.
The file names are formatted like this:
gzip > /mnt/target03/rest-of-path/web/backups/DATABASENAME_`date "+%Y-%m-%d"`.gz
This should work:
#!/bin/sh
DIR="/path/to/your/files"
now=$(date +%s)
DAYS=7
for file in "$DIR/"*
do
    if [ $(( $(stat "$file" -c '%Y') + 86400 * $DAYS )) -lt "$now" ]
    then
        # process / rm / whatever the file...
        echo "$file is older than $DAYS days"
    fi
done
A bit of explanation: stat <file> -c '%Y' gives the modification time of the file as seconds since the UNIX epoch, and $(date +%s) gives the current UNIX timestamp. Then there's just a simple check to see whether the file's timestamp, plus $DAYS days' worth of seconds, is less than the current timestamp, which means the file is older than $DAYS days.
Since you have the time in the filename, you can use that to drive the deletion. Here's some code that does that.
This script gets the current time in seconds since the epoch and then calculates the timestamp of 7 days ago. It then parses each filename, converts the date embedded in it to a timestamp, and compares timestamps to determine which files to delete. Using timestamps gets rid of all the hassles of working with dates directly (leap years, different days in months, etc.).
The actual remove is just echoed so you can test the code first.
# function to get timestamp X days prior to input timestamp
# arg1 = number of days past input timestamp
# arg2 = timestamp ( e.g. 1324505111 ) seconds past epoch
getTimestampDaysInPast () {
    daysinpast=$1
    seconds=$2
    while [ $daysinpast -gt 0 ] ; do
        daysinpast=`expr $daysinpast - 1`
        seconds=`expr $seconds - 86400`
    done
    # make midnight
    mod=`expr $seconds % 86400`
    seconds=`expr $seconds - $mod`
    echo $seconds
}

# get current time in seconds since epoch
getCurrentTime() {
    echo `date +"%s"`
}

# parse format and convert time to timestamp
# e.g. 2011-12-23 -> 1324505111
# arg1 = filename with date string in format %Y-%m-%d
getFileTimestamp () {
    filename=$1
    date=`echo $filename | sed "s/[^0-9\-]*\([0-9\-]*\).*/\1/g"`
    ts=`date -d "$date" +"%s"`
    echo $ts
}

########################### MAIN ############################
# Expect directory where files are to be deleted to be first
# arg on commandline. If not provided then use current working
# directory
FILEDIR=`pwd`
if [ $# -gt 0 ] ; then
    FILEDIR=$1
fi
cd $FILEDIR

now=`getCurrentTime`
mustBeBefore=`getTimestampDaysInPast 7 $now`

SAVEIFS=$IFS
# need this to loop around spaces within filenames
IFS=$(echo -en "\n\b")

# for safety change this glob to something more restrictive
for f in * ; do
    filetime=`getFileTimestamp $f`
    echo "$filetime lt $mustBeBefore"
    if [ $filetime -lt $mustBeBefore ] ; then
        # drop the echo when you have tested this on your system
        echo "rm -f $f"
    fi
done

# only need this if you are going to be doing something else
IFS=$SAVEIFS
If you prefer to rely on the date in the filenames, you can use this routine, which checks whether one date is older than another:
is_older(){
    local dtcmp=`date -d "$1" +%Y%m%d`; shift
    local today=`date -d "$*" +%Y%m%d`
    [ $((today - dtcmp)) -gt 0 ]
}
and then you can loop through filenames, passing '-7 days' as the second date:
for filename in *
do
    dt_file=`echo $filename | grep -o -E '[12][0-9]{3}(-[0-9]{2}){2}'`
    if is_older "$dt_file" -7 days; then
        echo "rm $filename"    # rm $filename or whatever
    fi
done
In the is_older routine, date -d "-7 days" +%Y%m%d returns the date of 7 days ago in numeric form, ready for the comparison.
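For instance (hypothetical dates, assuming GNU date):
$ date +%Y%m%d                    # suppose today prints 20111230
$ date -d "-7 days" +%Y%m%d       # then this prints 20111223
$ is_older "2011-12-20" -7 days && echo older   # 20111220 < 20111223, so: older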
DIR=''    # set this to the directory you want to scan
now=$(date +%s)
for file in "$DIR/"*
do
    echo $(($(stat "$file" -c '%Z') + 86400 * 7))
    echo "----------"
    echo $now
done
