I am trying to parse a log and get the lines between two timestamps. I tried a sed approach like the one below, but I'm facing an issue with the regex.
Log pattern:
IP - - [20/Apr/2018:14:25:37 +0000] "GET / HTTP/1.1" 301 3936 "-" "
IP - - [20/Apr/2018:14:44:08 +0000]
----------------------------------
IP - - [20/Apr/2018:20:43:46 +0000]
I need to get the lines between 14:25 and 20:43 on 20th April, as the log contains other dates as well.
I tried this:
sed -n '/\[14:25/,/\[20:43/p' *-https_access.log.1
but it's not working.
Since you mentioned you want logs for 20th April, I'd suggest something like:
$ sed -n '/20\/Apr\/2018:14:25/,/20\/Apr\/2018:20:43/p' *-https_access.log.1
This is much less likely to produce false matches in case "20:43" occurs elsewhere in the log.
sed is not well suited here because it's hard to compare elements (like the day and the hour) with it.
With awk (self-commented):
awk -F '[ []' '
{
   # separate the date and the hour into two fields, then force awk to re-split the record
   sub(/:/, " ", $5); $0 = $0 ""
}
# print if it is the right day and the time is between the two bounds (string comparison works here)
$5 ~ /20.Apr.2018/ && $6 >= "14:25" && $6 < "20:44"
' YourFile
More generally, the date and the hour window could be passed to awk as variables instead of being hard-coded (a sketch of that follows).
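A rough sketch of that idea, assuming illustrative parameter names day, from and to (these names are mine, not from the original answer):
awk -F '[ []' -v day="20/Apr/2018" -v from="14:25" -v to="20:44" '
{
   # split the date and the time apart, then force awk to re-split the record
   sub(/:/, " ", $5); $0 = $0 ""
}
$5 == day && $6 >= from && $6 < to
' YourFile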
To print lines between match1 and match2 with sed or awk you can do:
sed -n '/match1/,/match2/p' inputfile
awk '/match1/,/match2/' inputfile
In your example, match1 is 20/Apr/2018:14:25 and match2 is 20/Apr/2018:20:43, so any of these commands should work for you:
sed -n '/20\/Apr\/2018:14:25/,/20\/Apr\/2018:20:43/p' inputfile
awk '/20\/Apr\/2018:14:25/,/20\/Apr\/2018:20:43/' inputfile
Or use | as sed's delimiter to avoid having to escape the slashes:
sed -n '\|20/Apr/2018:14:25|,\|20/Apr/2018:20:43|p' inputfile
The best solution is to use awk for this. What you need to do is convert your timestamps to Unix time and then do the comparisons. In GNU awk you can do this using mktime():
mktime(datespec [, utc-flag ]): Turn datespec into a timestamp in the same form as is returned by systime(). It is similar to the
function of the same name in ISO C. The argument, datespec, is a
string of the form YYYY MM DD HH MM SS [DST]. The string consists of
six or seven numbers representing, respectively, the full year
including century, the month from 1 to 12, the day of the month from 1
to 31, the hour of the day from 0 to 23, the minute from 0 to 59, the
second from 0 to 60, and an optional daylight-savings flag.
To convert your time format of the form 20/Apr/2018:14:25:37 +0000 into the form 2018 04 20 14 25 37 that mktime() expects, you can do:
awk -v tstart="20/Apr/2018:14:25:00" -v tend="20/Apr/2018:20:43:00" \
'function tounix(str) {
split(str,a,"/|:| ")
return mktime(a[3]" "month[a[2]]" "a[1]" "a[4]" "a[5]" "a[6])
}
BEGIN{
month["Jan"]="01";month["Feb"]="02";month["Mar"]="03"
month["Apr"]="04";month["May"]="05";month["Jun"]="06"
month["Jul"]="07";month["Aug"]="08";month["Sep"]="09"
month["Oct"]="10";month["Nov"]="11";month["Dec"]="12"
FS="\\[|\\]"
t1=tounix(tstart)
t2=tounix(tend)
}
{ t=tounix($2) }
(t1<=t && t<=t2)' <file>
This method is robust because it does true time comparisons, independent of leap years and day/month/year cross-overs. In contrast to the other solutions provided, it also does not require that the exact timestamps tstart and tend actually occur in the file.
I have a log file containing logs like this:
[Oct 13 09:28:15] WARNING.... Today is good day...
[Oct 13 09:28:15] Info... Tommorow will be...
[Oct 13 09:28:15] WARNING.... Yesterday was...
I need a shell command to count occurrences of a certain string in the last 5 minutes.
I have tried this:
$(awk -v d1="$(date --date="-5 min" "+%b %_d %H:%M:%S")" -v d2="$(date "+%b %_d %H:%M:%S")" '$0 > d1 && $0 < d2 || $0 ~ d2' "$1" |
grep -ci "$2")
and I am calling the script like this: sh ${script} /var/log/message "day", but it does not work.
Your immediate problem is that you are comparing dates in random string format. To Awk (and your computer generally) a string which starts with "Dec" is "less than" a string which starts with "Oct" (this is what date +%b produces). Generally, you would want both your log files and your programs to use dates in some standard computer-readable format, usually ISO 8601.
Unfortunately, though, sometimes you can't control that, and need to adapt your code accordingly. The solution then is to normalize the dates before comparing them.
awk -v d1="$(date -d '-5 min' +'%F-%T')" -v d2="$(date +'%F-%T')" '
BEGIN { split("Jan:Feb:Mar:Apr:May:Jun:Jul:Aug:Sep:Oct:Nov:Dec", m, ":")
        for (i=1; i<=12; ++i) mon["[" m[i]] = sprintf("%02d", i) }
{ timestamp = substr(d1, 1, 5) mon[$1] "-" $2 "-" $3 }
timestamp > d1 && timestamp <= d2' "$1" | grep -ci "$2"
This will not work across New Year boundaries, but should hopefully at least help get you started in the right direction. (I suppose you could check if the year in d2 is different, and then check if the month in $1 is January, and then add 1 to the year from d1 in timestamp; but I leave this as an exercise for the desperate. This still won't work across longer periods of time, but the OP calls for a maximum period of 5 minutes, so the log can't straddle multiple years. Or if it does, you have a more fundamental problem.)
Perhaps note as well that date -d is a GNU extension which is not portable to POSIX (so this will not work e.g. on MacOS without modifications).
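If I remember the BSD syntax correctly, the rough macOS/BSD equivalent of GNU date -d '-5 min' is the -v adjustment flag; this is an untested assumption, not part of the original answer:
date -v-5M +'%F-%T'    # on BSD/macOS date, -v-5M means "5 minutes ago"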
(Also, for production use, I would refactor the grep -ci into the Awk script; see also useless use of grep.)
Finally, the command substitution $(...) around your entire command line is wrong; this would instruct your shell to use the output from Awk and run it as a command.
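For completeness, here is a rough sketch of what folding the grep -ci into the Awk script could look like (IGNORECASE is a GNU awk feature, and the zero-padded month table is carried over from the snippet above):
awk -v d1="$(date -d '-5 min' +'%F-%T')" -v d2="$(date +'%F-%T')" -v pat="$2" '
BEGIN { IGNORECASE = 1                      # GNU awk only: case-insensitive ~ matching
        split("Jan:Feb:Mar:Apr:May:Jun:Jul:Aug:Sep:Oct:Nov:Dec", m, ":")
        for (i=1; i<=12; ++i) mon["[" m[i]] = sprintf("%02d", i) }
{ timestamp = substr(d1, 1, 5) mon[$1] "-" $2 "-" $3 }
timestamp > d1 && timestamp <= d2 && $0 ~ pat { n++ }
END { print n+0 }' "$1"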
I have a log file that I'm trying to reformat using sed/awk/grep, but I'm running into difficulties with the date format. The log looks like this:
1.2.3.4 - - [28/Mar/2019:11:43:58 +0000] "GET /e9bb2dddd28b/5.6.7.8/YL0000000000.rom HTTP/1.1" "-" "Yealink W52P 25.81.0.10 00:00:00:00:00:00" 404 - 1 5 0.146
I would like the output as so:
Yealink,1.2.3.4,28-03-2019 11:43:58
I have tried the following:
grep Yealink access.log | grep 404 | sed 's/\[//g' | awk '{print "Yealink,",$1,",",strftime("%Y-%m-%d %H:%M:%S", $4)}' | sed 's/, /,/g' | sed 's/ ,/,/g'
Edit: removing the [ before passing the date string to strftime, based on the comments - but it's still not working as expected.
However, this returns a null date, so clearly I have the strftime syntax wrong:
Yealink,1.2.3.4,1970-01-01 01:00:00
Update 2019-10-25: gawk is now getting strptime() in an extension library, see https://groups.google.com/forum/#!msg/comp.lang.awk/Ft6_h7NEIaE/tmyxd94hEAAJ
Original post:
See the gawk manual for strftime(): it doesn't accept a time in any format other than seconds since the epoch. If gawk had a strptime() THEN that would work, but it doesn't (and I can't persuade the maintainers to provide one), so you have to massage the timestamp into a format that mktime() can convert to seconds and then pass THAT to strftime(), e.g.:
$ awk '{
split($4,t,/[[\/:]/)
old = t[4] " " (index("JanFebMarAprMayJunJulAugSepOctNovDec",t[3])+2)/3 " " t[2] " " t[5] " " t[6] " " t[7];
secs = mktime(old)
new = strftime("%d-%m-%Y %T",secs);
print $4 ORS old ORS secs ORS new
}' file
[28/Mar/2019:11:43:58
2019 3 28 11 43 58
1553791438
28-03-2019 11:43:58
but of course you don't need mktime() or strftime() at all - just shuffle the date components around:
$ awk '{
split($4,t,/[[\/:]/)
new = sprintf("%02d-%02d-%04d %02d:%02d:%02d",t[2],(index("JanFebMarAprMayJunJulAugSepOctNovDec",t[3])+2)/3,t[4],t[5],t[6],t[7])
print $4 ORS new
}' file
[28/Mar/2019:11:43:58
28-03-2019 11:43:58
That will work in any awk, not just GNU awk, since it doesn't require time functions.
index("JanFebMarAprMayJunJulAugSepOctNovDec",t[3])+2)/3 is just the idiomatic way to convert a 3-char month name abbreviation (e.g. Mar) into the equivalent month number (3).
Another awk, thanks @EdMorton for reviewing the getline usage.
The idea here is to use the date command from within awk, since it accepts abbreviated month names:
$ date -d"28/Mar/2019:11:43:58 +0000" "+%F %T" # Fails
date: invalid date ‘28/Mar/2019:11:43:58 +0000’
$ date -d"28 Mar 2019:11:43:58 +0000" "+%F %T" # Again fails because of : before time section
date: invalid date ‘28 Mar 2019:11:43:58 +0000’
$ date -d"28 Mar 2019 11:43:58 +0000" "+%F %T" # date command works but incorrect time because of + in the zone
2019-03-28 17:13:58
$ date -d"28 Mar 2019 11:43:58" "+%F %T" # correct value after stripping +0000
2019-03-28 11:43:58
$
Results
awk -F"[][]" -v OFS=, '/Yealink/ {
split($1,a," "); #Format $1 to get IP
gsub("/", " ",$2); sub(":"," ",$2); sub("\\+[0-9]+","",$2); # Massage to get data value
cmd = "date -d\047" $2 "\047 \047+%F %T\047"; if ( (cmd | getline line) > 0 ) $2=line; close(cmd) # use system date
print "Yealink",a[1],$2
} ' access.log
Below is the file content
$ cat access.log
1.2.3.4 - - [28/Mar/2019:11:43:58 +0000] "GET /e9bb2dddd28b/5.6.7.8/YL0000000000.rom HTTP/1.1" "-" "Yealink W52P 25.81.0.10 00:00:00:00:00:00" 404 - 1 5 0.146
$
How can I get the value of "up" from the below command on Linux?
# w
01:16:08 up 20:29, 1 user, load average: 0.50, 0.34, 0.30
USER TTY LOGIN@ IDLE JCPU PCPU WHAT
root pts/0 00:57 0.00s 0.11s 0.02s w
# w | grep up
01:16:17 up 20:29, 1 user, load average: 0.42, 0.33, 0.29
On Linux, the easiest way to get the uptime in (fractional) seconds is via the 1st field of /proc/uptime (see man proc):
$ cut -d ' ' -f1 /proc/uptime
350735.47
To format that number the same way that w and uptime do, using awk:
$ awk '{s=int($1);d=int(s/86400);h=int(s % 86400/3600);m=int(s % 3600 / 60);
printf "%d days, %02d:%02d\n", d, h, m}' /proc/uptime
4 days, 01:25 # 4 days, 1 hour, and 25 minutes
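If all you need is the raw number of seconds in a shell variable rather than the formatted form, a minimal sketch (the variable name is arbitrary):
read -r uptime_secs _ < /proc/uptime   # first field = uptime in seconds; the second field (idle time) is discarded
echo "$uptime_secs"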
To answer the question as asked - parsing the output of w (or of uptime, whose output is the same as w's first output line and contains all the information of interest) - which also works on macOS/BSD, with a granularity of whole seconds:
A perl solution:
<(uptime) is a Bash process substitution that provides uptime's output as input to the perl command - see bottom.
$ perl -nle 'print for / up +((?:\d+ days?, +)?[^,]+)/' <(uptime)
4 days, 01:25
This assumes that days is the largest unit ever displayed.
perl -nle tells Perl to process the input line by line, without printing any output by default (-n), automatically stripping the trailing newline from each input line on input, and automatically appending one on output (-l); -e tells Perl to treat the next argument as the script (expression) to process.
print for /.../ tells Perl to output what each capture group (...) inside regex /.../ captures.
up + matches literal up, preceded by (at least) one space and followed by 1 or more spaces (+)
(?:\d+ days?, +)? is a non-capturing subexpression - due to ?: - that matches:
1 or more digits (\d+)
followed by a single space
followed by literal day, optionally followed by a literal s (s?)
the trailing ? makes the entire subexpression optional, given that a number-of-days part may or may not be present.
[^,]+ matches 1 or more (+) subsequent characters up to, but not including a literal , ([^,]) - this is the hh:mm part.
The overall capture group - the outer (...) - therefore captures the entire up-time expression, whether composed of hh:mm only or preceded by "<n> day(s)", and prints that.
<(uptime) is a Bash process substitution (<(...))
that, loosely speaking, presents uptime's output as a (temporary, self-deleting) file that perl can read via stdin.
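If process substitution is not available (e.g. in a plain POSIX sh script), the same thing works with an ordinary pipe:
uptime | perl -nle 'print for / up +((?:\d+ days?, +)?[^,]+)/'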
Something like this with GNU sed:
$ w |head -n1
02:06:19 up 3:42, 1 user, load average: 0.01, 0.05, 0.13
$ w |sed -r '1 s/.*up *(.*),.*user.*/\1/g;q'
3:42
$ echo "18:35:23 up 18 days, 9:08, 6 users, load average: 0.09, 0.31, 0.41" \
|sed -r '1 s/.*up *(.*),.*user.*/\1/g;q'
18 days, 9:08
Given that the format of the uptime depends on whether it is less or more than 24 hours, the best I could come up with is a double awk:
$ w
18:35:23 up 18 days, 9:08, 6 users,...
$ w | awk -F 'user|up ' 'NF > 1 {print $2}' \
| awk -F ',' '{for(i = 1; i < NF; i++) {printf("%s ",$i)}} END{print ""}'
18 days 9:08
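A single-awk variant along the same lines also seems possible - stripping everything up to "up" and everything from the user count onward - though I have only reasoned it through against the two sample lines shown above, so treat it as a sketch:
w | awk 'NR == 1 {
    sub(/.*up +/, "")                # drop the clock and the word "up"
    sub(/, *[0-9]+ users?.*/, "")    # drop ", N user(s), load average: ..."
    print
}'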
I am trying to grep a log file for entries within the last 24 hours. I came up with the following command:
grep "$(date +%F\ '%k')"\|"$(date +%F --date='yesterday')\ [$(date +%k)-23]" /path/to/log/file
I know regular expressions can be used in grep, but I am not very familiar with regex. You see, I am grepping for anything from today, or anything from yesterday from the current hour onwards. This isn't working, and I am guessing it's due to the way I am trying to pass a command as a variable in the regex of grep. I also wouldn't be opposed to using awk; with awk I came up with the following, but it is not checking the variables properly:
t=$(date +%F) | y=$(date +%F --date='yesterday') | hr=$(date +%k) | awk '{ if ($1=$t || $1=$y && $2>=$hr) { print $0 }}' /path/to/log/file
I would assume systime could be used with awk rather than setting variables, but I am not familiar with systime at all. Any suggestions with either command would be greatly appreciated! Oh, and here's the log formatting:
2012-12-26 16:33:16 SMTP connection from [127.0.0.1]:46864 (TCP/IP connection count = 1)
2012-12-26 16:33:16 SMTP connection from (localhost) [127.0.0.1]:46864 closed by QUIT
2012-12-26 16:38:19 SMTP connection from [127.0.0.1]:48451 (TCP/IP connection count = 1)
2012-12-26 16:38:21 SMTP connection from [127.0.0.1]:48451 closed by QUIT
2012-12-26 16:38:21 SMTP connection from [127.0.0.1]:48860 (TCP/IP connection count = 1)
Here's one way using GNU awk. Run like:
awk -f script.awk file
Contents of script.awk:
BEGIN {
time = systime()
}
{
spec = $1 " " $2
gsub(/[-:]/, " ", spec)
}
time - mktime(spec) < 86400
Alternatively, here's the one-liner:
awk 'BEGIN { t = systime() } { s = $1 " " $2; gsub(/[-:]/, " ", s) } t - mktime(s) < 86400' file
Also, the correct way to pass shell vars to awk is to use the -v flag. I've made a few adjustments to your awk command to show you what I mean, but I recommend against doing this:
awk -v t="$(date +%F)" -v y="$(date +%F --date='yesterday')" -v hr="$(date +%k)" '$1==t || $1==y && $2>=hr' file
Explanation:
So before awk starts processing the file, the BEGIN block is processed first. In this block we create a variable called time / t and set it using the systime() function. systime() simply returns the current time as the number of seconds since the system epoch. Then, for every line in your log file, awk creates another variable called spec / s, set to the first and second fields separated by a single space. Additionally, other characters like - and : need to be globally substituted with spaces for the mktime() function to work correctly, and this is done using gsub(). Then it's just a little mathematics to test if the datetime in the log file is within the last 24 hours (or exactly 86400 seconds). If the test is true, the line is printed. Maybe a little extra reading would help, see Time Functions and String Manipulation Functions. HTH.
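To make the gsub() step concrete, here is a tiny standalone GNU awk check; print s shows "2012 12 26 16 33 16", the form mktime() expects (the epoch value printed depends on your local timezone, so it is not shown here):
awk 'BEGIN { s = "2012-12-26 16:33:16"; gsub(/[-:]/, " ", s); print s; print mktime(s) }'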
I have a file with records containing timestamp fields that include a GMT offset. I want to use the sed command to replace that value on each record with a regular timestamp (without the GMT offset).
For example:
$ date -d '2012/11/01 00:50:22 -0800' '+%Y-%m-%d %H:%M:%S'
returns this value, which is what I am looking for:
2012-11-01 01:50:22
Except I want to perform that operation on every line of this file and apply the date command to the timestamp value. Here is a sample record:
"SB","6GV96644X48128125","","","","T0006",2012/10/03 13:08:43 -0700,"NJ"
Here is my code:
head -1 myfile | sed 's/,[0-9: /\-]\{25\},/,'"`date -d \1 '+%Y-%m-%d %H:%M:%S'`"',/
which doesn't work: it just ignores \1 and replaces the matched pattern with today's date:
"SB","6GV96644X48128125","","","","T0006",2012-11-14 01:00:00,"NJ"
I hoped that \1 would result in the matched pattern being passed to the date command and returning a regular timestamp value (as in the example I provided above, showing how date applies the GMT offset and returns a regular timestamp string), and that this would replace the old value on the record.
I would use awk instead. For example:
awk '{cmd="date -d \""$7"\" \"+%Y-%m-%d %H:%M:%S\"";
cmd | getline k; $7=k; print}' FS=, OFS=, myFile
This will replace the 7th field with the results of running the date command on the original contents of the 7th field.
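One caveat with this getline usage: the command is never closed, so a large file can eventually exhaust file descriptors, and if getline fails the previous value of k is silently reused. A slightly more defensive sketch of the same idea:
awk 'BEGIN { FS = OFS = "," }
{
  cmd = "date -d \"" $7 "\" \"+%Y-%m-%d %H:%M:%S\""
  if ((cmd | getline k) > 0)    # only overwrite the field if date produced output
      $7 = k
  close(cmd)                    # close the pipe so each line runs a fresh date
  print
}' myFile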
In sed (GNU sed, using the e flag to execute the date command built up in the pattern space):
head -1 datefile |
sed '
# double up % so it passes through the date format string unchanged
s/%/%%/g
# build a date(1) command line and execute it via the e flag
s/\(.*,\)\([0-9: /\-]\{25\}\)\(,.*\)/'"date -d '\2' '+\1%Y-%m-%d %H:%M:%S\3'"'/e'
This might work for you (GNU sed):
sed -r 's/^(([^,]*,){6})([^,]*)(.*)/printf "%s%s%s" '\''\1'\'' $(date -d '\''\3'\'' '\''+%Y-%m-%d %H:%M:%S'\'') '\''\4'\''/e;q' file
Or use:
head -1 datefile | sed -e 's?\(..\)/\(..\)/\(.... ..:..:..\)?'"date -d '\2/\1/\3' '+%s'"'?e'