Get/extract the data from log file of last 3 minutes? [duplicate] - linux

I have an agent.log file that is updated at regular intervals. Entries look like this:
2014-01-07 03:43:35,223 INFO ...some data
I want to extract the data from the last 3 minutes. Is there a way to do this with a bash script?

Try the solution below:
awk \
    -v start="$(date +"%F %R" --date="@$(( $(date +%s) - 180 ))")" \
    -v end="$(date +"%F %R")" \
    '$0 ~ start, $0 ~ end' \
    agent.log
The start variable holds the timestamp of 3 minutes (180 seconds) before the current time.
The end variable holds the current time.
$0 ~ start, $0 ~ end is an awk range pattern: it selects the lines from the first match of start up to the first match of end.
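Note that the range pattern only selects anything if a line with the exact start minute actually exists in the file; if nothing was logged during that minute, the output is empty. A more tolerant sketch (assuming every line starts with a "%F %R"-style timestamp, i.e. the first 16 characters) compares the timestamp prefix as a string instead, so missing minutes don't matter:
start="$(date +"%F %R" --date="@$(( $(date +%s) - 180 ))")"
awk -v start="$start" 'substr($0, 1, 16) >= start' agent.log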

date +"%F %R" gives you the current time down to the minute.
grep '^'"$(date +"%F %R")" agent.log will select the last minute from the file
Now for the previous two minutes it's more tricky... I have developed some scripts that can do complete time manipulation in relative or absolute, and it may be simpler than fiddling with date...
2 minutes ago in the right format: date --date="#$(($(date +"%s") - 2*60))" +"%F %R"
Merge all 3:
NOW=$(date +"%F %R")
M1=$(date --date="@$(( $(date +%s) - 1*60 ))" +"%F %R")
M2=$(date --date="@$(( $(date +%s) - 2*60 ))" +"%F %R")
grep -E "^($NOW|$M1|$M2)" agent.log

My answer assumes the following:
using bash and standard UNIX/Linux commands
the last log line is the start time, not the actual server time
no assumptions are made about how far apart the lines' dates are (minutes, days, years, etc.)
the script should be easy to adapt to the inverse selection, or to a specific from-to interval
#!/bin/bash
# this script expects descending dates in the log file (newest entries first, unlike most real-life logs)!
FILE=$1
INTV=180 # seconds
while read LINE
do
    if [ -z "$LAST_LOG_LINE" ]
    then
        # remember the interval start line (the first = newest line), then skip to the next one
        LAST_LOG_LINE=$(date --date="$(echo "$LINE" | sed -e 's/,.*//')" +%s)
        continue
    fi
    ACT_LOG_LINE=$(date --date="$(echo "$LINE" | sed -e 's/,.*//')" +%s)
    # stop reading as soon as a line is more than $INTV (180 s) older than the start line
    if [ $(( LAST_LOG_LINE - ACT_LOG_LINE )) -gt "$INTV" ]
    then
        break
    fi
    # otherwise print the line
    echo "$LINE"
done < "$FILE"
Testing:
2014-01-07 03:43:35,223 INFO ...some data
2014-01-07 03:42:35,223 INFO ...some data
2014-01-07 03:41:35,223 INFO ...some data
2014-01-07 03:40:35,223 INFO ...some data
2014-01-07 02:43:35,223 INFO ...some data
2014-01-07 01:43:35,223 INFO ...some data
2014-01-06 03:43:35,223 INFO ...some data
$ /tmp/stack.sh /tmp/log
2014-01-07 03:42:35,223 INFO ...some data
2014-01-07 03:41:35,223 INFO ...some data
2014-01-07 03:40:35,223 INFO ...some data
$
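Real log files usually grow with ascending timestamps, i.e. the newest entries are at the end. In that case one option (an untested sketch, relying on bash process substitution) is to feed the script the file reversed with tac, so the newest lines come first:
/tmp/stack.sh <(tac /tmp/log)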

I think you may be somewhat better off using Python in this case. Even if no line is stamped exactly 3 minutes ago, this still picks up every log entry written between 3 minutes ago and the time the script is called. It is both concise and more robust than some of the previous solutions.
#!/usr/bin/env python
from datetime import datetime, timedelta

with open('agent.log') as f:
    for line in f:
        # the timestamp is everything before the comma, e.g. "2014-01-07 03:43:35"
        logdate = datetime.strptime(line.split(',')[0], '%Y-%m-%d %H:%M:%S')
        if logdate >= datetime.now() - timedelta(minutes=3):
            print(line)

A Ruby solution (tested on ruby 1.9.3)
You can pass days, hours, minutes or seconds as a parameter, and it will search for the expression in the specified file (or directory, in which case it appends '/*' to the name).
In your case just call the script like so: $0 -m 3 "expression" log_file
Note: if you know the location of 'ruby', change the shebang (the first line of the script) to that absolute path, for security reasons.
#! /usr/bin/env ruby
require 'date'
require 'pathname'

if ARGV.length != 4
  $stderr.print "usage: #{$0} -d|-h|-m|-s time expression log_file\n"
  exit 1
end

begin
  total_amount = Integer ARGV[1]
rescue ArgumentError
  $stderr.print "error: parameter 'time' must be an Integer\n"
  $stderr.print "usage: #{$0} -d|-h|-m|-s time expression log_file\n"
  exit 1
end

if ARGV[0] == "-m"
  gap = Rational(60, 86400)
  time_str = "%Y-%m-%d %H:%M"
elsif ARGV[0] == "-s"
  gap = Rational(1, 86400)
  time_str = "%Y-%m-%d %H:%M:%S"
elsif ARGV[0] == "-h"
  gap = Rational(3600, 86400)
  time_str = "%Y-%m-%d %H"
elsif ARGV[0] == "-d"
  time_str = "%Y-%m-%d"
  gap = 1
else
  $stderr.print "usage: #{$0} -d|-h|-m|-s time expression log_file\n"
  exit 1
end

pn = Pathname.new(ARGV[3])
if pn.exist?
  log = (pn.directory?) ? ARGV[3] + "/*" : ARGV[3]
else
  $stderr.print "error: file '" << ARGV[3] << "' does not exist\n"
  $stderr.print "usage: #{$0} -d|-h|-m|-s time expression log_file\n"
  exit 1
end

search_str = ARGV[2]
now = DateTime.now

total_amount.times do
  now -= gap
  system "cat " << log << " | grep '" << now.strftime(time_str) << ".*" << search_str << "'"
end

Related

How can I “fill in the blanks” for a command in a script from user input?

I'm trying to build a script that asks the user running it for a time clock number and a DC number, which I intend to use to fill in the Xs in
/u/applic/tna/shell/tc_software_update.sh tmcxx.s0xxxx.us REFURBISHED
However, I am stumped as to how to take the user's input and fill in those Xs in that command within the script. This script is in its earliest stages, so it's very rough right now. Thank you for responding. Here's the script skeleton I'm working on:
#!/bin/bash
#This server is intended to speed up the process to setup timeclocks from DC tickets
#Defines time clock numbers
timeclocks="01|02|03|04|05|06|07|08|09|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35"
#Defines DC number
echo "What is the DC number?"
read dc
#Defines TMC number
echo "What is the Time Clock number?"
read number
if $number == $timeclocks && $dc == ???; then
/u/applic/tna/shell/tc_software_update.sh tmcxx.s0xxxx.us REFURBISHED
Do you mean invoking $ /u/applic/tna/shell/tc_software_update.sh tmc${number}.s0${dc}.us REFURBISHED?
Consider the following snippet:
[test.sh]
read x
read y
echo "x=${x}, y=${y}"
$ sh test.sh
5
4
x=5, y=4
Further on, you can use command line arguments ($1, $2, etc.) instead of the user input (see the sketch after the next snippet).
Modelling this on your script:
timeclocks=( {1..35} )
printf '%s' "DC number: "; read dc
printf '%s' "Time Clock number: "; read tmc
tmc=$( printf '%02d' "$tmc" )
dc=$( printf '%04d' "$dc" )
tmc_valid=$( for t in "${timeclocks[@]}"; do \
    [[ 10#$tmc -eq $t ]] && echo true && break; \
done )
[[ "$tmc_valid" = "true" && "$dc" = "???" ]] && \
    /u/applic/tna/shell/tc_software_update.sh tmc${tmc}.s0${dc}.us REFURBISHED
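As mentioned above, the same thing can be driven by command line arguments instead of read. A minimal sketch (the script name is hypothetical, the DC-number check against '???' from the original if is left out since its intended value isn't shown, and the update-script path is taken from the question):
#!/bin/bash
# usage (hypothetical name): ./tc_setup.sh DC_NUMBER TIMECLOCK_NUMBER
dc_raw=$1
tmc_raw=$2
if (( 10#$tmc_raw >= 1 && 10#$tmc_raw <= 35 )); then
    tmc=$(printf '%02d' "$(( 10#$tmc_raw ))")   # zero-pad to 2 digits
    dc=$(printf '%04d' "$(( 10#$dc_raw ))")     # zero-pad to 4 digits
    /u/applic/tna/shell/tc_software_update.sh "tmc${tmc}.s0${dc}.us" REFURBISHED
else
    echo "invalid time clock number: $tmc_raw" >&2
    exit 1
fi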

How to display content added to a file in the past 5 minutes without scanning the whole file in Linux?

I have a DB error log file that grows continuously.
I want to set up error monitoring on that file, running every 5 minutes.
The problem is that I don't want to scan the whole file every 5 minutes (when the monitoring cron job runs), because it may grow very big in the future, and scanning the whole (big) file every 5 minutes would waste resources.
So I only want to scan the lines that were written to the log during the last 5-minute interval.
Each error recorded in the log has a timestamp prepended to it, like this:
180418 23:45:00 [ERROR] mysql got signal 11.
So I want to search for the pattern [ERROR] only in the lines added during the last 5 minutes (not the whole file) and write the output to another file.
Please help me here.
Feel free to ask if you need more clarification on my question.
I'm using RHEL 7 and trying to implement the above monitoring as a bash shell script.
Serializing the Byte Offset
This picks up where the last run left off, so if you run it every 5 minutes it only scans roughly 5 minutes of data.
Note that this implementation can knowingly scan data added during an invocation's run twice. This is a little sloppy, but it's much safer to scan overlapping data twice than to never read it at all, which is a risk you run if you rely on cron to start your program exactly on schedule (likewise, sleeps can overrun the requested time if the system is busy).
#!/usr/bin/env bash
file=$1; shift                    # first input: filename
grep_opts=( "$@" )                # remaining inputs: grep options

dir=$(dirname -- "$file")         # extract directory name to use for offset storage
basename=${file##*/}              # pick up file name w/o directory
size_file="$dir/.$basename.size"  # generate filename to use to store offset

if [[ -s $size_file ]]; then      # ...if we already have a file with an offset...
    old_size=$(<"$size_file")     # ...read it from that file
else
    old_size=0                    # ...otherwise start at the front.
fi

new_size=$(stat --format=%s -- "$file") || exit  # Figure out current size

if (( new_size < old_size )); then
    old_size=0                    # file was truncated, so we can't trust old_size
elif (( new_size == old_size )); then
    exit 0                        # no new contents, so no point in trying to search
fi

# read starting at old_size and grep only that content
dd iflag=skip_bytes skip="$old_size" if="$file" | grep "${grep_opts[@]}"; grep_retval=$?

# if the read failed, don't store an updated offset
(( ${PIPESTATUS[0]} != 0 )) && exit 1

# create a new tempfile to store the offset in
tempfile=$(mktemp -- "${size_file}.XXXXXX") || exit

# write to that temporary file...
printf '%s\n' "$new_size" > "$tempfile" || { rm -f "$tempfile"; exit 1; }

# ...and if that write succeeded, atomically replace the stored offset.
mv -- "$tempfile" "$size_file" || exit

exit "$grep_retval"
Alternate Mode: Bisect For The Timestamp
Note that this can miss content if you're relying on, say, cron to invoke your code every 5 minutes on-the-dot; storing byte offsets can thus be more accurate.
Using the bsearch tool by Ole Tange:
#!/usr/bin/env bash
file=$1; shift
start_date=$(date -d 'now - 5 minutes' '+%y%m%d %H:%M:%S')
byte_offset=$(bsearch --byte-offset "$file" "$start_date")
dd iflag=skip_bytes skip="$byte_offset" if="$file" | grep "$#"
Another approach could be something like this:
DB_FILE="FULL_PATH_TO_YOUR_DB_FILE"
current_db_size=$(du -b "$DB_FILE" | cut -f 1)
if [[ ! -a SOME_PATH_OF_YOUR_CHOICE/last_size_db_file ]] ; then
    tail --bytes "$current_db_size" "$DB_FILE" > SOME_PATH_OF_YOUR_CHOICE/log-file_$(date +%Y-%m-%d_%H-%M-%S)
else
    if [[ $(cat SOME_PATH_OF_YOUR_CHOICE/last_size_db_file) -gt $current_db_size ]] ; then
        previously_read_bytes=0
    else
        previously_read_bytes=$(cat SOME_PATH_OF_YOUR_CHOICE/last_size_db_file)
    fi
    new_bytes=$(( current_db_size - previously_read_bytes ))
    tail --bytes "$new_bytes" "$DB_FILE" > SOME_PATH_OF_YOUR_CHOICE/log-file_$(date +%Y-%m-%d_%H-%M-%S)
fi
printf '%s' "$current_db_size" > SOME_PATH_OF_YOUR_CHOICE/last_size_db_file
This writes to SOME_PATH_OF_YOUR_CHOICE/log-file_$(date +%Y-%m-%d_%H-%M-%S) all the bytes of DB_FILE that have not been written out before.
Note that $(date +%Y-%m-%d_%H-%M-%S) will be the current 'full' date at the time the log file is created.
You can make this a script and use cron to execute it every five minutes; something like this:
*/5 * * * * PATH_TO_YOUR_SCRIPT
Here is my approach:
First, read the whole log file as it exists so far.
When you reach the end, collect and read new lines for a given timespan (in my example 9 seconds, for faster testing, while my dummy server appends to the logfile every 3 seconds).
After the timespan, echo the cache, clear the cache (an array arr), then loop and sleep for a while so that this process doesn't consume all the CPU time.
First, my dummy logfile writer:
#!/bin/bash
#
# dummy logfile writer
#
while true
do
    s=$(( $(date +%s) % 3600 ))
    echo $s server msg
    sleep 3
done >> seconds.log
Started via ./seconds-out.sh &.
Now the more complicated part:
#!/bin/bash
#
# consume a logfile as written so far. Then, collect every new line
# and show it in an interval of $interval
#
interval=9 # 9 seconds
#
printf -v secnow '%(%s)T' -1
start=$(( secnow % (3600*24*365) ))
declare -a arr
init=false
while true
do
    read line
    printf -v secnow '%(%s)T' -1
    now=$(( secnow % (3600*24*365) ))
    # consume every line created in the past
    if (( ! init ))
    then
        # assume reading a line might not take longer than a second (rounded to whole seconds)
        while (( ${#line} > 0 && (now - start) < 2 ))
        do
            read line
            start=$now
            echo -n "."       # for debugging purposes, remove
            printf -v secnow '%(%s)T' -1
            now=$(( secnow % (3600*24*365) ))
        done
        init=1
        echo "init=$init"     # for debugging purposes, remove
    # collect new lines, display them every $interval seconds
    else
        if (( ${#line} > 0 ))
        then
            echo -n "-"       # for debugging purposes, remove
            arr+=("read: $line \n")
        fi
        if (( (now - start) > interval ))
        then
            echo -e "${arr[@]}"
            arr=()
            start=$now
        fi
    fi
    sleep .1
done < seconds.log
Output with the logfile generator writing every 3 seconds and running for some time before the read-seconds.sh script is started, with the debugging output activated:
./read-seconds.sh
.......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................init=1
---read: 1688 server msg
read: 1691 server msg
read: 1694 server msg
---read: 1697 server msg
read: 1700 server msg
read: 1703 server msg
----read: 1706 server msg
read: 1709 server msg
read: 1712 server msg
read: 1715 server msg
^C
Every dot represents a logfile line from the past, which is therefore skipped.
Every dash represents a newly collected logfile line.

Formatting bash/shell script time to days, hours, minutes and seconds

I have a backup script written in bash/shell, and I calculate the total runtime/execution time by doing this:
#!/bin/bash
# start time
res1=$(date +%s.%N)
### do some work here
# end time & calculate
res2=$(date +%s.%N)
dt=$(echo "$res2 - $res1" | bc)
dd=$(echo "$dt/86400" | bc)
dt2=$(echo "$dt-86400*$dd" | bc)
dh=$(echo "$dt2/3600" | bc)
dt3=$(echo "$dt2-3600*$dh" | bc)
dm=$(echo "$dt3/60" | bc)
ds=$(echo "$dt3-60*$dm" | bc)
# finished
printf " >>> Process Completed - Total Runtime (d:h:m:s) : %d:%02d:%02d:%02.4f\n" $dd $dh $dm $ds
echo " "
exit 0
This outputs the runtime in the d:h:m:s format from the printf above. How do you format the result so it looks something like this:
0 Days, 0 Hours, 0 Minutes and 0.0968 Seconds
It would be a bonus if it could intelligently show only the values > 0, like these examples:
7 Minutes and 5.126 Seconds
or
2 hours, 4 Minutes and 1.106 Seconds
or
7.215 Seconds etc...
You can use your last printf like this:
printf " >>> Process Completed - Total Runtime (d:h:m:s) : %d Days, %02d Hours, %02d Minutes, %02.4f Seconds\n" $dd $dh $dm $ds
However, I would suggest using awk to do all the calculations and formatting in awk itself, so that you avoid many invocations of bc.
Suggested awk script:
awk -v res1="$res1" -v res2="$res2" 'BEGIN {
    dt = res2 - res1
    dd = int(dt / 86400);  dt2 = dt - 86400 * dd
    dh = int(dt2 / 3600);  dt3 = dt2 - 3600 * dh
    dm = int(dt3 / 60);    ds = dt3 - 60 * dm
    printf " >>> Process Completed - Total Runtime (d:h:m:s) : %d Days, %02d Hours, %02d Minutes, %02.4f Seconds\n",
        dd, dh, dm, ds
}'
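The "show only values > 0" bonus isn't covered by either version. One way to get it (a sketch reusing the dd/dh/dm/ds variables computed with bc in the question's script, and using commas throughout for simplicity) is to build the output string piece by piece:
out=""
(( dd > 0 )) && out+="$dd Days, "
(( dh > 0 )) && out+="$dh Hours, "
(( dm > 0 )) && out+="$dm Minutes, "
out+="$(printf '%.4f' "$ds") Seconds"
echo " >>> Process Completed - Total Runtime : $out"
Since the seconds are always printed last, this produces, for example, "7 Minutes, 5.1260 Seconds" when days and hours are zero.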

How can you calculate the time span between two time entries in a file using a shell script?

In a Linux script: I have a file that has two time entries for each message within it, a 'received time' and a 'source time'. There are hundreds of messages within the file.
I want to calculate the elapsed time between the two times.
2014-07-16T18:40:48Z (received time)
2014-07-16T18:38:27Z (source time)
The source time is 3 lines after the received time, not that it matters.
Info on the input data:
The input lines are as follows:
TimeStamp: 2014-07-16T18:40:48Z
and 2 lines later there is a line containing a bunch of messages, and within that line, multiple times:
sourceTimeStamp="2014-07-16T18:38:27Z"
If you have GNU's date (not busybox's), you can give difference in seconds with:
#!/bin/bash
A=$(date -d '2014-07-16T18:40:48Z' '+%s')
B=$(date -d '2014-07-16T18:38:27Z' '+%s')
echo "$(( A - B )) seconds"
For busybox's date and ash (a reasonably modern BusyBox, v1.21.0):
#!/bin/ash
A=$(busybox date -d '2014-07-16 18:40:48' '+%s')
B=$(busybox date -d '2014-07-16 18:38:27' '+%s')
echo "$(( A - B )) seconds"
You should be able to use date like this (e.g.):
date +%s --date="2014-07-16T18:40:48Z"
to convert both timestamps into a unix timestamp. Getting the time difference between them is then reduced to a simple subtraction.
Does this help?
I would use awk. The following script searches for the lines of interest, converts the time value into a UNIX timestamp and saves them in the start, end variables. At the end of the script the difference will get calculated and printed:
timediff.awk:
/received time/ {
    "date -d " $1 " +%s" | getline end
}
/source time/ {
    "date -d " $1 " +%s" | getline start
    exit
}
END {
    printf "%s seconds in between\n", end - start
}
Execute it like this:
awk -f timediff.awk log.file
Output:
141 seconds in between
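If the log really uses the TimeStamp: ... and sourceTimeStamp="..." lines described in the question, a variation of the same idea (a sketch assuming GNU date and that only the first received/source pair matters; handling every message would need a loop) is to pull the values out first and then subtract:
#!/bin/bash
# extract the first received and source timestamps from log.file, then print the difference in seconds
received=$(grep -o 'TimeStamp: [^ ]*' log.file | head -n 1 | cut -d' ' -f2)
source_ts=$(grep -o 'sourceTimeStamp="[^"]*"' log.file | head -n 1 | cut -d'"' -f2)
echo "$(( $(date -d "$received" +%s) - $(date -d "$source_ts" +%s) )) seconds"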

Generating multiple files with the same structure

I want to generate a series of files in which the file name of each file is increased by 1 (File1.txt, File2.txt, File3.txt, ... FileN.txt), where N = 250.
Each file has 2 lines:
AAAXXX (where XXX = 001 to 250, automatically increased for each file)
BBBYYY (where YYY = a 3-digit random number)
Example:
File1.txt:
AAA001
BBB175
File5.txt:
AAA005
BBB067
File102.txt:
AAA102
BBB765
I'm a newbie using Ubuntu Linux 12.04 - but I'm hoping someone can assist.
You can do it as follows:
#!/bin/bash
for i in {1..250}
do
    printf "AAA%03d\nBBB%03d\n" "${i}" "$(( RANDOM % 1000 ))" > "File${i}.txt"
done
Explanation:
for i in {1..250} - bash way of specifying an iteration from 1 to 250 with an increment of 1.
printf - shell printf command, used to print a formatted string
AAA - string literal (printed exactly as written)
%03d - format specifier: prints a decimal number zero-padded to 3 digits
\n - newline
BBB - another string literal
%03d - same as before
${i} - the value used for the first format specifier (%03d)
$(($RANDOM % 1000)) - $RANDOM is a shell variable that gives you a new random number each time you access it. The % 1000 takes the modulo, so you get a range between 0-999. This value is used for the second format specifier (%03d)
> File${i}.txt - output redirection; creates and writes to a file (overwrites the file if it already exists)
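As a quick sanity check after running the loop, one of the generated files should match the structure from the question (the BBB value is random, so it will differ between runs):
$ cat File5.txt
AAA005
BBB067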
Here's a quick one-liner that might start you off:
for i in {1..250}; do printf "AAA%03d\nBBB%03d\n" "$i" "$(( RANDOM % 1000 ))" > "File${i}.txt"; done
Using bash:
for i in {1..250}; do printf "AAA%03d\nBBB%03d\n" "$i" "$((RANDOM%1000))" > "File$i.txt"; done
You can write a bash script for this:
#!/bin/bash
for (( i=1; i<=250; i++ ))
do
    NUMBER=$(( (RANDOM % 900) + 100 ))   # always a 3-digit number (100-999)
    printf "AAA%03d\nBBB%d\n" "$i" "$NUMBER" > "File$i.txt"
done
