Conditional statement with awk - Linux

I'm new to Linux. I'm trying to get logs between two dates with gawk.
This is my log:
Oct 07 11:00:33 abcd
Oct 08 12:00:33 abcd
Oct 09 14:00:33 abcd
Oct 10 21:00:33 abcd
I can do it when both the start and end dates are given, but I have a problem when the start date, the end date, or both are missing, and I don't know how to check for that.
I've written the code below, but it has a syntax error:
sudo gawk -v year='2022' -v start='' -v end='2022:10:08 21:00:34' '
BEGIN{ gsub(/[:-]/," ", start); gsub(/[:-]/," ", end) }
{ dt=year" "$1" "$2" "$3; gsub(/[:-]/," ", dt) }
if(start && end){mktime(dt)>=mktime(start) && mktime(dt)<=mktime(end)}
else if(end){mktime(dt)<=mktime(end)}
else if(start){mktime(dt)>=mktime(start)} ' log.txt
How can I fix this code?
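The immediate problem is structural: in awk, if/else can only appear inside an action block { ... }; a bare condition must be written as a pattern expression. One way to avoid the branching entirely is to treat an empty start or end as "unbounded". A minimal sketch of that shape (plain numbers stand in for the mktime(...) values so it runs under any awk):

```sh
printf '%s\n' 5 15 25 | awk -v start='' -v end='20' '
# empty start/end means unbounded on that side; otherwise compare numerically
(start == "" || $1 + 0 >= start + 0) && (end == "" || $1 + 0 <= end + 0)
'
```

This prints 5 and 15; the same single pattern works with mktime(dt), mktime(start), and mktime(end) substituted for the numeric fields.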

I'd write:
gawk -v end="Oct 10 12:00:00" '
function to_epoch(timestamp, n, a) {
n = split(timestamp, a, /[ :]/)
return mktime(strftime("%Y", systime()) " " month[a[1]] " " a[2] " " a[3] " " a[4] " " a[5])
}
BEGIN {
split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m)
for (i=1; i<=12; i++) month[m[i]]=i
if (start) {_start = to_epoch(start)} else {_start = 0}
if (end) {_end = to_epoch(end)} else {_end = 2**31}
}
{ ts = to_epoch($0) }
_start <= ts && ts <= _end
' log.txt
You'd pass the start and/or end variables in the same datetime format that appears in the log file.

This would be easier with dateutils, e.g.:
<infile dategrep -i '%b %d %H:%M:%S' '>Oct 08 00:00:00' |
dategrep -i '%b %d %H:%M:%S' '<Oct 09 23:59:59'
Output:
Oct 08 12:00:33 abcd
Oct 09 14:00:33 abcd


How to skip a blank if a value is not there and print proper rows and columns

I have a details.txt file which has the below data:
size=190000
date=1603278566981
repo-name=testupload
repo-path=/home/test/testupload
size=140000
date=1603278566981
repo-name=testupload2
repo-path=/home/test/testupload2
size=170000
date=1603278566981
repo-name=testupload3
repo-path=/home/test/testupload3
and the below awk script processes it:
#!/bin/bash
awk -vOFS='\t' '
BEGIN{ FS="=" }
/^size/{
if(++count1==1){ header=$1"," }
sizeArr[++count]=$NF
next
}
/^#repo-name/{
if(++count2==1){ header=header OFS $1"," }
repoNameArr[count]=$NF
next
}
/^date/{
if(++count3==1){ header=header OFS $1"," }
dateArr[count]=$NF
next
}
/^#blob-name/{
if(++count4==1){ header=header OFS $1"," }
repopathArr[count]=$NF
next
}
END{
print header
for(i=1;i<=count;i++){
printf("%s,%s,%s,%s,%s\n",sizeArr[i],repoNameArr[i],dateArr[i],repopathArr[i])
}
}
' details.txt | tr -d '#' | awk -F, '{$3=substr($3,0,10)}1' OFS=, | sed 's/date/creationTime/g'
which prints values as expected (because the data has a repo-name):
size " repo-name" " creationTime" " blob-name"
10496000 testupload Fri 11 Dec 2020 07:35:56 AM CET testfile.tar11.gz
10496000 testupload Thu 10 Dec 2020 02:44:04 PM CET testfile.tar.gz
9602303 testupload Fri 11 Dec 2020 07:38:58 AM CET apache-maven-3.6.3-bin/apache-maven-3.6.3-bin.zip
but when something is missing in the file, the output gets the wrong format (here repo-name jumps to the last column's headers because the first few records don't have a repo-name value):
size " creationTimeime" " blob-name" " " repo-name"
261304 Thu 13 Feb 2020 08:50:02 AM CET temp 8963d25231b
29639 Thu 13 Feb 2020 08:50:00 AM CET temp 3780c72cab5
93699 Thu 13 Feb 2020 08:50:00 AM CET temp 209276c91ba
The column headers get wrongly printed, but the data gets printed perfectly. Is there anything that can detect that one of the fields is not there, skip it, and print the rest in the proper format?
If data is not available, it should keep the header the same; it should not disturb the header sequence.
My requirement:
If details.txt is missing any records, it should skip them, print a blank, and print as per the header.
The headers get disturbed if the repo-name field is not there, but the rest of the output is correct, so we need to keep the headers intact even if a field is missing.
Wrong:
size " creationTimeime" " blob-name" " " repo-name"
261304 Thu 13 Feb 2020 08:50:02 AM CET temp 8963d25231b
29639 Thu 13 Feb 2020 08:50:00 AM CET temp 3780c72cab5
93699 Thu 13 Feb 2020 08:50:00 AM CET temp 209276c91ba
Right:
size " repo-name" " creationTime" " blob-name"
10496000 testupload Fri 11 Dec 2020 07:35:56 AM CET testfile.tar11.gz
10496000 testupload Thu 10 Dec 2020 02:44:04 PM CET testfile.tar.gz
9602303 testupload Fri 11 Dec 2020 07:38:58 AM CET apache-maven-3.6.3-bin/apache-maven-3.6.3-bin.zip
Thanks
samurai
You may try this GNU awk command:
awk -F= -v OFS='\t' 'function prt(ind, name, s) {s=map[ind][name]; return (s==""?" ":s);} {map[NR][$1] = $2} END {print "Size", "Repo Name", "CreationTime", "Repo Path"; for (i=1; i<=NR; i+=4) print prt(i, "size"), prt(i+2, "repo-name"), prt(i+1, "date"), prt(i+3, "repo-path")}' file
Size Repo Name CreationTime Repo Path
190000 testupload 1603278566981 /home/test/testupload
140000 testupload2 1603278566981 /home/test/testupload2
170000 testupload3 1603278566981 /home/test/testupload3
To make it readable:
awk -F= -v OFS='\t' 'function prt(ind, name, s) {
s = map[ind][name]
return (s==""?" ":s)
}
{
map[NR][$1] = $2
}
END {
print "Size", "Repo Name", "CreationTime", "Repo Path"
for (i=1; i<=NR; i+=4)
print prt(i, "size"), prt(i+2, "repo-name"), prt(i+1, "date"), prt(i+3, "repo-path")
}' file
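That approach indexes records by line number, so it assumes every record is exactly four lines in a fixed order. A hypothetical alternative that does not assume that: treat each size= line as the start of a new record, collect whatever key=value pairs appear, and emit a blank placeholder for anything missing (field names match the sample data):

```sh
printf '%s\n' 'size=190000' 'date=1603278566981' 'repo-name=testupload' \
  'repo-path=/home/test/testupload' 'size=140000' 'date=1603278566982' \
  'repo-path=/home/test/norepo' |
awk -F= -v OFS='\t' '
BEGIN { print "Size", "Repo Name", "CreationTime", "Repo Path" }
function flush() {
  if ("size" in rec)
    print rec["size"], (rec["repo-name"] == "" ? " " : rec["repo-name"]), rec["date"], rec["repo-path"]
  split("", rec)                  # portable way to clear the array
}
$1 == "size" { flush() }          # a size= line starts a new record
{ rec[$1] = $2 }
END { flush() }'
```

The second record has no repo-name, so its Repo Name column comes out as a blank placeholder while the headers stay intact.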

Convert a text into time format using bash script

I am new to shell scripting. I have a tab-separated file, e.g.:
0018803 01 1710 2050 002571
0018951 01 1934 2525 003277
0019362 02 2404 2415 002829
0019392 01 2621 2820 001924
0019542 01 2208 2413 003434
0019583 01 1815 2134 002971
Here, the 3rd and 4th columns represent Start Time and End Time.
I want to convert these two columns into a proper time format so that I can get a 6th column with the exact time difference between columns 4 and 3, in hours and minutes.
The column 6 results would be 3:40, 5:51, 00:11, 1:59, 2:05, 3:19.
One way with awk:
$ cat test.awk
# create a function to split hour and minute
function f(h, x) {
h[0] = substr(x,1,2)+0
h[1] = substr(x,3,2)+0
}
{
f(start, $3);
f(end, $4);
span = end[1] - start[1] >= 0 \
? sprintf("%d:%02d", end[0]-start[0], end[1]-start[1]) \
: sprintf("%d:%02d", end[0]-start[0]-1, 60+end[1]-start[1]);
print $0 OFS span
}
then run the awk file as follows:
$ awk -f test.awk input_file
Edit: per @glenn jackman's suggestion, the code can be simplified (refer to @Kamil Cuk's method):
function g(x) {
return substr(x,1,2)*60 + substr(x,3,2)
}
{
span = g($4) - g($3)
printf("%s%s%d:%02d\n", $0, OFS, int(span/60), span%60)
}
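For instance, applying the simplified minute-based version to the first sample row (1710 to 2050):

```sh
printf '0018803 01 1710 2050 002571\n' |
awk '
function g(x) { return substr(x,1,2)*60 + substr(x,3,2) }   # HHMM -> total minutes
{ span = g($4) - g($3); printf "%d:%02d\n", int(span/60), span%60 }'
```

2050 is 1250 minutes and 1710 is 1030 minutes, so this prints 3:40.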
A simple bash solution using arithmetic expansion:
while IFS='' read -r l; do
IFS=' ' read -r _ _ st et _ <<<"$l"
d=$(( (10#${et:0:2} * 60 + 10#${et:2:2}) - (10#${st:0:2} * 60 + 10#${st:2:2}) ))
printf "%s %02d:%02d\n" "$l" "$((d/60))" "$((d%60))"
done < input_file_path
will output:
0018803 01 1710 2050 002571 03:40
0018951 01 1934 2525 003277 05:51
0019362 02 2404 2415 002829 00:11
0019392 01 2621 2820 001924 01:59
0019542 01 2208 2413 003434 02:05
0019583 01 1815 2134 002971 03:19
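One detail worth noting in the bash version: the 10# prefix forces base-10 interpretation, because zero-padded fields such as 08 would otherwise be treated as (invalid) octal constants in arithmetic expansion. A tiny illustration with a hypothetical end time of 0805:

```sh
# 08 hours, 05 minutes -> 485 total minutes; without 10# this would error out
bash -c 'et=0805; echo $(( 10#${et:0:2} * 60 + 10#${et:2:2} ))'
```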
Here is one in GNU awk using time functions: mktime to convert to epoch time, and strftime to convert the difference to the desired HH:MM format:
$ awk -v OFS="\t" '{
dt3="1970 01 01 " substr($3,1,2) " " substr($3,3,2) " 00"
dt4="1970 01 01 " substr($4,1,2) " " substr($4,3,2) " 00"
print $0,strftime("%H:%M",mktime(dt4)-mktime(dt3),1) # utc-flag=1, thanks @glennjackman :)
}' file
Output ($6 only):
03:40
05:51
00:11
01:59
02:05
03:19

Sed to extract log between two given dates

I am trying to extract the logs between two given dates. The code works fine if I specify the date like Apr 02 15:21:28, i.e. when I know the time with exact minutes and seconds. But the code fails when I pass values like
from Apr 02 15 to Apr 04 15 (here 15 is the hour). Actually, I want to make a script in which the user just needs to enter a day and a time (only in hours, no minutes or seconds).
#!/bin/bash
read -p " enter the App name : " app
file="/logs/$app/$app.log"
read -p " Enter the Date in this Format --'10 Jan 20 or Jan 10 20' : " first
read -p " Enter the End time of logs : " end
if [ -f "$file" ]
then
if grep -q "$first" "$file"; then
final_first=$first
fi
if grep -q "$end" "$file"; then
final_end=$end
fi
sed -n "/$final_first/,/$final_end/p" "$file" > "$app.txt"
else
echo "$app.log not found, Please check correct log name in deployer"
fi
Sample data:
Apr 07 12:39:15 DEBUG [http-0.0.0.0-8089-21] model.DSSAuthorizationModel - pathInfo : /about-ses
Apr 07 12:39:15 DEBUG [http-0.0.0.0-8089-21] servlet.CasperServlet - Request about to be serviced by model: com.ge.oilandgas.sts.model.SessionValidModel
I'd use a language with built-in datetime parsing or an easily included module, for example Perl:
first="Apr 07 12"
end="Apr 08 00"
perl -MTime::Piece -sane '
BEGIN {
$first_ts = Time::Piece->strptime($first, "%b %d %H")->epoch;
$end_ts = Time::Piece->strptime($end, "%b %d %H")->epoch;
}
$ts = Time::Piece->strptime(join(" ", @F[0..2]), "%b %d %T")->epoch;
print if $first_ts <= $ts and $ts <= $end_ts;
' -- -first="$first" -end="$end" <<END
Apr 07 11:39:15 DEBUG [http-0.0.0.0-8089-21] model.DSSAuthorizationModel - pathInfo : /about-ses
Apr 07 12:00:00 DEBUG [http-0.0.0.0-8089-21] model.DSSAuthorizationModel - pathInfo : /about-ses
Apr 07 12:39:15 DEBUG [http-0.0.0.0-8089-21] model.DSSAuthorizationModel - pathInfo : /about-ses
Apr 07 12:39:15 DEBUG [http-0.0.0.0-8089-21] servlet.CasperServlet - Request about to be serviced by model: com.ge.oilandgas.sts.model.SessionValidModel
Apr 07 23:59:59 DEBUG [http-0.0.0.0-8089-21] servlet.CasperServlet - Request about to be serviced by model: com.ge.oilandgas.sts.model.SessionValidModel
Apr 08 00:00:01 DEBUG [http-0.0.0.0-8089-21] servlet.CasperServlet - Request about to be serviced by model: com.ge.oilandgas.sts.model.SessionValidModel
END
This outputs:
Apr 07 12:00:00 DEBUG [http-0.0.0.0-8089-21] model.DSSAuthorizationModel - pathInfo : /about-ses
Apr 07 12:39:15 DEBUG [http-0.0.0.0-8089-21] model.DSSAuthorizationModel - pathInfo : /about-ses
Apr 07 12:39:15 DEBUG [http-0.0.0.0-8089-21] servlet.CasperServlet - Request about to be serviced by model: com.ge.oilandgas.sts.model.SessionValidModel
Apr 07 23:59:59 DEBUG [http-0.0.0.0-8089-21] servlet.CasperServlet - Request about to be serviced by model: com.ge.oilandgas.sts.model.SessionValidModel
Given your code, I would make the following change:
if ! [ -f "$file" ]; then
echo "$app.log not found, Please check correct log name in deployer"
exit 1
fi
grep -q "$first" "$file" && final_first="/$first/" || final_first='1'
grep -q "$end" "$file" && final_end="/$end/" || final_end='$'
sed -n "${final_first},${final_end}p" "$file" >"$app.txt"
That provides default addresses for the sed range: the first line and the last line.
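A self-contained sketch of that fallback behaviour (hypothetical log lines; the end date is deliberately absent from the log, so the range runs to the last line):

```sh
f=$(mktemp)
printf '%s\n' 'Apr 02 10:00 one' 'Apr 03 11:00 two' 'Apr 04 12:00 three' > "$f"
first='Apr 03'   # present in the log -> used as a /regex/ address
end='Apr 09'     # absent -> falls back to $, the last line
grep -q "$first" "$f" && a="/$first/" || a='1'
grep -q "$end" "$f" && b="/$end/" || b='$'
sed -n "${a},${b}p" "$f"
rm -f "$f"
```

This prints the Apr 03 and Apr 04 lines: the range starts at the first match of $first and, since $end never matches, ends at the last line instead of silently failing.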

The method parse_datetime from Perl's DateTime::Format::Strptime can't parse timezone name

I have a laptop with Ubuntu 12.04.
Executing the date command in the console gives this:
$ date
Thu May  8 15:28:12 WIB 2014
The Perl script below runs fine:
#!/usr/bin/perl
use DateTime::Format::Strptime;
$parser = DateTime::Format::Strptime->new( pattern => "%a %b %d %H:%M:%S %Y %Z");
$date = "Fri Sep 20 08:22:42 2013 WIB";
$dateimap = $parser->parse_datetime($date);
$date = $dateimap->strftime("%d-%b-%Y %H:%M:%S %z");
print "$date\n";
$date = "Fri Jan 8 16:49:34 2010 WIT";
$dateimap = $parser->parse_datetime($date);
$date = $dateimap->strftime("%d-%b-%Y %H:%M:%S %z");
print "$date\n";
The result is
20-Sep-2013 08:22:42 +0700
08-Jan-2010 16:49:34 +0900
But why is the timezone name "WIT" converted to the offset "+0900"?
AFAIK, WIT is Western Indonesian Time. IMHO it should have the offset "+0700", not "+0900".
The other computer runs CentOS 5.9.
Executing the date command on CentOS gives:
$ date
Thu May  8 15:38:24 WIT 2014
But executing the Perl script above results in this:
20-Sep-2013 08:22:42 +0700
Can't call method "strftime" on an undefined value at strptime.pl line 14.
Actually, the method parse_datetime can't parse a date which contains the "WIT" timezone;
the returned value $dateimap is empty (undef).
The CentOS machine's localtime has been set to Asia/Jakarta:
$ ls -l /etc/localtime
lrwxrwxrwx 1 root root 32 Sep 23 2013 /etc/localtime -> /usr/share/zoneinfo/Asia/Jakarta
Any suggestions?
Thank you.
Actually, the problem happens because the version of the module DateTime::Format::Strptime on CentOS 5.9 is 1.2000, while on Ubuntu 12.04 it is 1.54.
There is another problem when using the older version of DateTime::Format::Strptime.
$ perl -e '
> use DateTime::Format::Strptime;
> $parser = DateTime::Format::Strptime->new( pattern => "%a %b %d %H:%M:%S %Y");
> $datembox = "Wed Jan  1 06:42:18 2014 WIT";
> $date = $parser->parse_datetime($datembox);
> print "$date\n";'
$
If we squeeze the double space in the variable $datembox:
$ perl -e '
> use DateTime::Format::Strptime;
> $parser = DateTime::Format::Strptime->new( pattern => "%a %b %d %H:%M:%S %Y");
> $datembox = "Wed Jan  1 06:42:18 2014 WIT";
> $datembox =~ s/[\s]+/ /g;
> $date = $parser->parse_datetime($datembox);
> print "$date\n";'
2014-01-01T06:42:18
$

Awk/Perl convert textfile to csv with sensible format

I have a historical autogenerated logfile with the following format that I would like to convert to a csv file prior to uploading to a database
--------------------------------------
Thu Jul 8 09:34:12 BST 2010
BLUE Head 1
Duration = 20 s
Activity = 14.9 MBq
Sensitivity = 312 cps/MBq
--------------------------------------
Thu Jul 8 09:34:55 BST 2010
BLUE Head 1
Duration = 20 s
Activity = 14.9 MBq
Sensitivity = 318 cps/MBq
--------------------------------------
Thu Jul 8 10:13:39 BST 2010
RED Head 1
Duration = 20 s
Activity = 14.9 MBq
Sensitivity = 307 cps/MBq
--------------------------------------
Thu Jul 8 10:14:10 BST 2010
RED Head 1
Duration = 20 s
Activity = 14.9 MBq
Sensitivity = 305 cps/MBq
--------------------------------------
Mon Jul 19 10:11:18 BST 2010
BLUE Head 1
Duration = 20 s
Activity = 12.4 MBq
Sensitivity = 326 cps/MBq
--------------------------------------
Mon Jul 19 10:12:09 BST 2010
BLUE Head 1
Duration = 20 s
Activity = 12.4 MBq
Sensitivity = 333 cps/MBq
--------------------------------------
Mon Jul 19 10:13:57 BST 2010
RED Head 1
Duration = 20 s
Activity = 12.4 MBq
Sensitivity = 338 cps/MBq
--------------------------------------
Mon Jul 19 10:14:45 BST 2010
RED Head 1
Duration = 20 s
Activity = 12.4 MBq
Sensitivity = 340 cps/MBq
--------------------------------------
I would like to convert the logfile to the following format
Date,Camera,Head,Duration,Activity
08/07/10,BLUE,1,20,14.9
08/07/10,BLUE,1,20,14.9
08/07/10,RED,1,20,14.9
08/07/10,RED,1,20,14.9
I have used awk to get close to what I want:
awk 'BEGIN {print "Date,Camera,Head,Duration,Activity";RS = "--------------------------------------"; FS="\n";}; {OFS=",";split($3, a, " ");split($4,b, " "); split($5,c," ");print $2,a[1],a[3],b[3],c[3]}' sensitivity.txt > sensitivity.csv
which gives me
Date,Camera,Head,Duration,Activity
,,,,
Thu Jul 8 09:34:12 BST 2010,BLUE,1,20,14.9
Thu Jul 8 09:34:55 BST 2010,BLUE,1,20,14.9
Thu Jul 8 10:13:39 BST 2010,RED,1,20,14.9
Thu Jul 8 10:14:10 BST 2010,RED,1,20,14.9
How can I
(a) get rid of the 4 output field separators (the ,,,, line), and
(b) convert the date format from Thu Jul 8 09:34:12 BST 2010 to DD/MM/YY (can I do this in pure awk, or by piping to perl)?
@sudo_O's answer is fine but here's an alternative:
$ cat tst.awk
BEGIN{ RS="---+\n"; OFS=","; months="JanFebMarAprMayJunJulAugSepOctNovDec" }
NR==1{ print "Date","Camera","Head","Duration","Activity"; next }
{ print sprintf("%04d%02d%02d",$6,(match(months,$2)+2)/3,$3),$7,$9,$12,$16 }
$ gawk -f tst.awk file
Date,Camera,Head,Duration,Activity
20100708,BLUE,1,20,14.9
20100708,BLUE,1,20,14.9
20100708,RED,1,20,14.9
20100708,RED,1,20,14.9
20100719,BLUE,1,20,12.4
20100719,BLUE,1,20,12.4
20100719,RED,1,20,12.4
20100719,RED,1,20,12.4
Note that I used GNU awk above so I could set the RS to more than a single character. With other awks, just convert all the "---..." lines to a blank line or control character or something, and set RS accordingly before running the script.
If you don't like my suggested date format, tweak the sprintf() to suit.
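The month lookup in the script above relies on match returning the 1-based position of the abbreviation inside the packed months string; since each name is 3 characters wide, (pos+2)/3 maps positions 1, 4, 7, ... to 1, 2, 3, ...:

```sh
awk 'BEGIN {
  months = "JanFebMarAprMayJunJulAugSepOctNovDec"
  # "Jul" starts at position 19; (19 + 2) / 3 = 7
  printf "%02d\n", (match(months, "Jul") + 2) / 3
}'
```

This prints 07, which is then zero-padded into the date by the surrounding sprintf.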
This straightforward awk script will do the job:
BEGIN {
n=split("Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec",month,"|")
for (i=1;i<=n;i++) {
month_index[month[i]] = i
}
print "Date,Camera,Head,Duration,Activity"
}
/^-*$/{
i=0
next
}
{
i++
}
i==1{
printf "%02d/%02d/%02d,",$3,month_index[$2],substr($6,3)
}
i==2{
printf "%s,%d,",$1,$3
}
i==3{
printf "%d,",$3
}
i==4{
printf "%.1f\n",$3
}
Outputs:
$ awk -f script.awk file
08/07/10,BLUE,1,20,14.9
08/07/10,BLUE,1,20,14.9
08/07/10,RED,1,20,14.9
08/07/10,RED,1,20,14.9
19/07/10,BLUE,1,20,12.4
19/07/10,BLUE,1,20,12.4
19/07/10,RED,1,20,12.4
19/07/10,RED,1,20,12.4
I figured I would show how to actually parse the input, rather than just performing string transformations.
#! /usr/bin/env perl
use strict;
use warnings;
use Date::Parse;
use Date::Format;
use Text::CSV;
sub convert_date{
my $time = str2time($_[0]);
# iso 8601 style:
return time2str('%Y-%m-%d',$time); # YYYY-MM-DD
# or the outdated style output you wanted
return time2str('%d/%m/%y',$time); # DD/MM/YY
}
my %multiply_table = (
s => 1,
m => 60,
h => 60 * 60,
d => 60 * 60 * 24,
);
sub convert_duration{
my($d,$s) = $_[0] =~ /^ \s* (\d+) \s* (\w) \s* $/x;
die "Invalid duration '$_[0]'" unless $d && $s;
return $d * $multiply_table{$s};
}
my @field_list = qw'Date Camera Head Duration Activity';
my $csv = Text::CSV->new( { eol => "\n" } );
# print header
$csv->print( \*STDOUT, \@field_list );
# set record separator
local $/ = ('-' x 38) . "\n";
# parse data
while(<>){
chomp; # remove record separator
next unless $_; # skip empty section
my($time,$camdat,@fields) = split m/\n/; # split up the fields
my %data;
# split camera and head fields
@data{qw(Camera Head)} = split /\s+Head\s+/, $camdat;
# parse lines like:
# Duration = 20 s
# Activity = 14.9 MBq
# Sensitivity = 305 cps/MBq
for(@fields){
my($key,$value) = /(\w+) \s* = \s* (.*) /x;
$data{$key} = $value;
}
# at this point we start reducing precision
$data{Date} = convert_date( $time );
# remove measurement units
$data{Duration} = convert_duration($data{Duration}); # safe
$data{Activity} =~ s/[^\d]*$//; # unsafe
$csv->print(\*STDOUT, [@data{@field_list}]);
}
