How to get logs of the last hour in Linux using awk

I have a log file named source.log with timestamps in this format:
Fri, 09 Dec 2016 05:03:29 GMT 127.0.0.1
and I am using a script to extract the entries from the last hour.
Script:
awk -vDate=`date -d'now-1 hour' +[%d/%b/%Y:%H:%M:%S` '$4 > Date {print Date, $0}' source.log > target.log
But this script just reproduces the source file unchanged.
There is something wrong in the time format matching, which is why it does not restrict the output to the last hour's records.

I know I'm late to help the OP, but maybe this answer can help anyone else in this situation.
First, it's necessary to compare the whole date and not only the time part, because near midnight the time alone wraps around and "one hour ago" can fall on the previous day.
Note that awk can only compare strings and numbers. Some awk implementations have a mktime() function that converts a specifically formatted string into a UNIX timestamp, making datetime comparisons possible, but it doesn't accept arbitrary datetime formats, so we can't feed it these log lines directly.
The best option would be to change (if possible) the datetime format of the log entries, using a 'YYYYMMDDhhmmss' or ISO 8601 format. That way, comparing two datetimes is as simple as comparing strings or numbers.
But let's assume we can't change the log entries' date format, so we'll need to do the conversion ourselves inside awk:
awk -vDate="`date -d'now-1 hour' +'%Y%m%d%H%M%S'`" '
BEGIN{
for(i=0; i<12; i++)
MON[substr("JanFebMarAprMayJunJulAugSepOctNovDec", i*3+1, 3)] = sprintf("%02d", i+1);
}
toDate() > Date
function toDate(){
time = $5; gsub(/:/, "", time);
return $4 MON[$3] $2 time;
}' source.log
Explanation
-vDate=... sets the Date awk variable with the initial datetime (one hour ago).
The BEGIN section creates an array indexed by the month abbreviation (it is specific to English month names).
The toDate() function converts the line's fields into a string with the same format as the Date variable (YYYYMMDDhhmmss).
Finally, when the condition toDate() > Date is true, awk prints the current line (log entry).
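If you are on GNU awk, a variant of the same idea can hand the rearranged fields to mktime() and compare epoch seconds instead of strings. A minimal sketch of that alternative (not part of the original answer; mktime() is a GNU awk extension):

awk -v Limit="`date -d'now-1 hour' +%s`" '
BEGIN {
    # Unpadded month numbers are fine for mktime(): MON["Jan"] = 1, ...
    for (i = 0; i < 12; i++)
        MON[substr("JanFebMarAprMayJunJulAugSepOctNovDec", i*3+1, 3)] = i + 1
}
{
    t = $5; gsub(/:/, " ", t)                  # "05:03:29" -> "05 03 29"
    ts = mktime($4 " " MON[$3] " " $2 " " t)   # "YYYY MM DD hh mm ss" -> epoch seconds
    if (ts > Limit) print
}' source.log

The portable string-comparison version above remains the safer default if your awk may not be GNU awk.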

Related

How to edit lines in a text file in Linux - format the date to YYYY-MM-DD and then grep lines by time period

Can anyone help format this text file (YYYYMMDD) as a date-formatted (YYYY-MM-DD) text file using a bash script or the Linux command line? I am not sure how to start editing 23 million lines!
I have a YYYYMMDD-format text file:
3515034013|50008|20140601|20240730
and I want a YYYY-MM-DD-formatted text file (only the 3rd and 4th fields need to change, across 23 million lines):
3515034013|50008|2014-06-01|2024-07-30
I want to convert the YYYYMMDD-formatted text file to YYYY-MM-DD format, and after this manipulation I want to select specific lines from the file based on a time period, which is the end goal.
The end goal is to format the 3rd and 4th fields as YYYY-MM-DD and then grep lines by date from the reformatted file. The 3rd field is the start date and the 4th field is the end date. Let's say, for example, I need:
(01). lines whose end date (4th field) is before today, i.e. 2022-08-06 - all the old lines
(02). lines whose end date (4th field) is within 2 years from now, i.e. between 2022-08-06 and 2024-08-06
Please note: there are more than 23 million lines to edit and analyze based on the date.
How should I approach this problem? Which method is most time-efficient: awk, sed, or line-by-line editing in bash?
$ awk '
BEGIN { FS=OFS="|" }
{
    for ( i=3; i<=4; i++ ) {
        $i = substr($i,1,4) "-" substr($i,5,2) "-" substr($i,7)
    }
    print
}
' file
3515034013|50008|2014-06-01|2024-07-30
Here is a way to do it with sed. It has the same restrictions as steffen's answer: | as the field separator, and all dates having the same format, i.e. leading zeros in the month and day parts.
sed -E 's/^(.*[|])([0-9]{4})([0-9]{2})([0-9]{2})[|]([0-9]{4})([0-9]{2})([0-9]{2})$/\1\2-\3-\4|\5-\6-\7/g'
Here is what the regular expression does:
^(.*[|]) captures the first part of the string from the line start (^) up to a | into \1; this captures the first two columns, because the remaining part of the regex matches the remaining part of the line up to the line end.
([0-9]{4})([0-9]{2})([0-9]{2})[|] captures the parts of the first date field into \2 to \4; note the [|].
([0-9]{4})([0-9]{2})([0-9]{2})$ does the same for the second date column, anchored at the line end ($), capturing the parts into \5 to \7; note the $.
The replacement part \1\2-\3-\4|\5-\6-\7 inserts - at the appropriate places.
The capturing into \n happens because of the (...) parens in the regular expression.
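A quick sanity check of that expression on the sample line (the trailing g flag in the original is harmless but unnecessary, since the anchored pattern can match at most once per line):

printf '3515034013|50008|20140601|20240730\n' |
sed -E 's/^(.*[|])([0-9]{4})([0-9]{2})([0-9]{2})[|]([0-9]{4})([0-9]{2})([0-9]{2})$/\1\2-\3-\4|\5-\6-\7/'
3515034013|50008|2014-06-01|2024-07-30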
Here's one way to change the format with awk:
awk '{$3=substr($3,1,4) "-" substr($3,5,2) "-" substr($3,7,2); $4=substr($4,1,4) "-" substr($4,5,2) "-" substr($4,7,2); print}' FS='|' OFS='|'
It should work given that
| is only used for field separation
all dates have the same format
You can pipe the transformed lines to a new file or change the file in place. Of course you can do the same with sed or ed. I'd go for awk because you'd be able to extract your specific lines to an extra file in the same run.
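Building on that point, here is a hedged sketch of doing the reformat and the two date-range selections from the question in a single pass (the output file names expired.txt and active.txt are made up for illustration, and date -d is GNU date):

awk -v today="$(date +%Y-%m-%d)" -v upper="$(date -d '+2 years' +%Y-%m-%d)" '
BEGIN { FS=OFS="|" }
{
    # Reformat both date fields to YYYY-MM-DD
    for (i = 3; i <= 4; i++)
        $i = substr($i,1,4) "-" substr($i,5,2) "-" substr($i,7)
    # YYYY-MM-DD strings sort like the dates they represent,
    # so plain string comparison is enough
    if ($4 < today)       print > "expired.txt"   # (01): end date before today
    else if ($4 <= upper) print > "active.txt"    # (02): end date within two years
}' file

Because ISO dates compare lexicographically, no date parsing is needed for the range tests.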
This might work for you (GNU sed):
sed -E 's/^([^|]*\|[^|]*\|....)(..)(..\|....)(..)/\1-\2-\3-\4-/' file
Pattern match and insert - where desired.
Or if the file is only 4 columns:
sed -E 's/(..)(..\|....)(..)(..)$/-\1-\2-\3-\4/' file

How to grep logs within a date range in Unix

I have a log file abc.log in which each line begins with a date in date +%m%d%y format:
061019:12
062219:34
062319:56
062719:78
I want to see all the logs within this date range (from 7 days ago to the current date), i.e. from 062019 to 062719 in this case. The result should be:
062219:34
062319:56
062719:78
I have tried a few things to achieve this:
awk '/062019/,/062719/' abc.log
This gives me the correct answer, but I don't want to hard-code the date values, and when I try to achieve the same thing dynamically it does not give the correct result:
awk '/date --date "7 days ago" +%m%d%y/,/date +%m%d%y/' abc.log
Note:
date --date "7 days ago" +%m%d%y → 062019 (7 days back date)
date +%m%d%y → 062719 (Current date)
Any suggestions how this can be achieved?
Your middle-endian date format is unfortunate for sorting and comparison purposes. Y-m-d would have been much easier.
Your approach using , ranges in awk requires exactly one log entry per day (and that the log entries are sorted chronologically).
I would use perl, e.g. something like:
perl -MPOSIX=strftime -ne '
BEGIN { ($start, $end) = map strftime("%y%m%d", localtime $_), time - 60 * 60 * 24 * 7, time }
print if /^(\d\d)(\d\d)(\d\d):/ && "$3$1$2" ge $start && "$3$1$2" le $end' abc.log
Use strftime "%y%m%d" to get the ends of the date range in big-endian format (which allows for lexicographical comparisons).
Use a regex to extract day/month/year from each line into separate variables.
Compare the rearranged date fields of the current line to the ends of the range to determine whether to print the line.
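If you'd rather stay in awk, the same rearrange-then-compare idea might look like this sketch (assuming GNU date for the range endpoints):

awk -v start="$(date -d '7 days ago' +%y%m%d)" -v end="$(date +%y%m%d)" '
{
    # Rearrange mmddyy into yymmdd so string comparison follows date order
    d = substr($0,5,2) substr($0,1,2) substr($0,3,2)
    if (d >= start && d <= end) print
}' abc.log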
To get around the issue of looking for dates that may not be there, you could generate a pattern that matches any of the dates. Since there are only 8 of them, the pattern doesn't get too big; if you wanted to cover the last year, this approach would not scale as well:
for d in 7 6 5 4 3 2 1 0
do
    pattern="${pattern:+${pattern}\\|}$(date --date "${d} days ago" +%m%d%y)"
done
grep "^\\(${pattern}\\)" abc.log

Change date format from dd/mm/yyyy to yyyy-mm-dd in a file using shell scripting

I have a source file with 18 columns in which columns 10, 11 and 15 are in the format dd/mm/yyyy; all of these need to be converted to yyyy-mm-dd and written to a target file along with the other columns.
I am aware of date-formatting functions on variables but do not know how to apply them to specific columns in a file.
I don't have a machine available to test, but consider using awk with a little function, since you are doing the same thing 3 times. It will look something like this (in is a reserved word in awk, so the parameter needs a different name, and split takes the array as its second argument and the separator as its third):
awk '
function dodate(d,    a) {          # a (after the extra spaces) is a local variable
    split(d, a, "/")                # split the dd/mm/yyyy date into elements of array "a"
    return a[3] "-" a[2] "-" a[1]   # reassemble as yyyy-mm-dd
}
{ $10 = dodate($10); $11 = dodate($11); $15 = dodate($15); print }' yourFile
See the awk documentation for user-defined functions and split.
If the fields on each line are separated by commas, tell awk that with:
awk -F, -v OFS=, ...
(setting OFS too keeps the commas when awk rebuilds the modified records).
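A quick check with a made-up comma-separated line, shortened to 18 throwaway fields so that $10, $11 and $15 exist:

printf 'a,b,c,d,e,f,g,h,i,01/02/2003,04/05/2006,l,m,n,07/08/2009,p,q,r\n' |
awk -F, -v OFS=, '
function dodate(d,    a) {
    split(d, a, "/")
    return a[3] "-" a[2] "-" a[1]
}
{ $10 = dodate($10); $11 = dodate($11); $15 = dodate($15); print }'
a,b,c,d,e,f,g,h,i,2003-02-01,2006-05-04,l,m,n,2009-08-07,p,q,r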
Maybe you could use command awk to solve it.
As you have 3 columns containing dates (columns 10, 11, 15), here I assume a sample string whose field separator is |, with the date in the 4th column:
aa|bb|cc|29/09/2017|dd|ee|ff
Use awk's string-manipulation functions (gensub is GNU awk) to rearrange the date, then run it through date via getline to format it to the expected syntax.
The command is:
echo 'aa|bb|cc|29/09/2017|dd|ee|ff' | awk -F\| 'BEGIN{OFS="|"}{$4=gensub(/([0-9]{1,2})\/([0-9]{1,2})\/([0-9]{4})/,"\\3\\2\\1","g",$4); "date --date=\""$4"\" +\"%F\"" | getline a; $4=a; print $0}'
output is
aa|bb|cc|2017-09-29|dd|ee|ff
Hope this helps.
If you have the dateutils package installed, you can use dateutils.dconv
cat file | dateutils.dconv -S -i "%d/%m/%Y"
-i specify input date format
-S sed mode, process only the matched string and copy the rest
Input File
aa|bb|cc|29/09/2017|dd|ee|ff|02/10/2017|gg
Output
aa|bb|cc|2017-09-29|dd|ee|ff|2017-10-02|gg
I'd use the date command:
while read fmtDate
do
    date -d "${fmtDate}" "+%Y-%m-%d"
done
This reads one date per line from standard input and spawns a date process per line, so it is the slowest option for large files.

Awk timestamp greater than

I have a file from which I'm trying to print only the lines with a timestamp greater than or equal to 22:01, but I can't seem to get it to work correctly. As can be seen below, it still prints the 8:05 timestamps as well. Probably a schoolboy error, but I'm struggling to get this working, so any pointers in the right direction would be appreciated.
cat /tmp/m1.out | awk '$1>="22:01"'
22:05:42:710
23:05:42:710
8:05:42:710
8:05:42:710
8:05:42:710
8:05:42:710
8:05:42:710
Thanks,
Matt
The problem has been correctly identified in the comments. You are comparing against a string, which triggers a string comparison. In string comparison, "8:05:42:710" is greater than "22:01" because the first character "8" is greater than "2".
One option would be to split the time into its separate components and use numerical comparisons instead. Note that the minute test must only apply within the 22 o'clock hour; a plain $1 >= 22 && $2 >= 1 would wrongly drop times like 23:00:
awk -F: '$1 > 22 || ($1 == 22 && $2 >= 1)' /tmp/m1.out
If your logic is more complex, e.g. your file has more fields and you don't want to change the field separator, you can use split:
awk '{ split($1, pieces, /:/) } pieces[1] > 22 || (pieces[1] == 22 && pieces[2] >= 1)' file
Padding the field with a leading zero is a little more tricky and isn't necessary in your example, as a time with only one digit in the hours will never be greater than 22.
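If you did want the padded form, a minimal sketch of the idea: normalize hour and minute into a fixed-width string, after which plain string comparison is safe:

awk -F: '{ hm = sprintf("%02d%02d", $1, $2) } hm >= "2201"' /tmp/m1.out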
The best thing to do if possible would be to use a timestamp that is compatible with string comparison, although that would require control of whatever is producing the file you're working with.

awk: format date string from YYYYMMDD to YYYY-MM-DD

I have a CSV file which I parse using awk because I don't need all columns.
The problem I have is that one column is a date, but in the format YYYYMMDD, and I need it in YYYY-MM-DD and don't know how to achieve that.
I already tried split($27, a), but it doesn't split it (there is no separator character inside the field to split on), so I just get the whole string back.
Use your awk output as input to date -d, e.g.
$ date -d 20140918 +'%Y-%m-%d'
2014-09-18
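Wired together, that could look like this sketch (the comma separator and the file name are assumptions carried over from the question):

awk -F, '{ print $27 }' file.csv | while read d; do date -d "$d" +'%Y-%m-%d'; done

Note this spawns one date process per line, which is slow on big files; the substr approach below avoids that.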
You could use substr (awk strings are 1-indexed, so the first extraction must start at position 1, not 0):
printf "%s-%s-%s", substr($27,1,4), substr($27,5,2), substr($27,7,2)
Assuming that the 27th field was 20140318, this would produce 2014-03-18.
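Wrapped into a complete command (the -F, separator and the file name are assumptions, since the question only says "CSV"):

awk -F, '{ printf "%s-%s-%s\n", substr($27,1,4), substr($27,5,2), substr($27,7,2) }' file.csv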
