Convert text into a time format using a bash script - Linux

I am new to shell scripting. I have a tab-separated file, e.g.,
0018803 01 1710 2050 002571
0018951 01 1934 2525 003277
0019362 02 2404 2415 002829
0019392 01 2621 2820 001924
0019542 01 2208 2413 003434
0019583 01 1815 2134 002971
Here, the 3rd and 4th columns represent Start Time and End Time.
I want to convert these two columns into a proper time format so that I can get a 6th column with the exact time difference between column 4 and column 3 in hours and minutes.
The column 6 results would be 3:40, 5:51, 00:11, 1:59, 2:05.

One way with awk:
$ cat test.awk
# create a function to split hour and minute
function f(h, x) {
    h[0] = substr(x,1,2)+0
    h[1] = substr(x,3,2)+0
}
{
    f(start, $3);
    f(end, $4);
    span = end[1] - start[1] >= 0 \
        ? sprintf("%d:%02d", end[0]-start[0], end[1]-start[1]) \
        : sprintf("%d:%02d", end[0]-start[0]-1, 60+end[1]-start[1]);
    print $0 OFS span
}
Then run the awk file as follows:
$ awk -f test.awk input_file
Edit: per @glenn jackman's suggestion, the code can be simplified (refer to @Kamil Cuk's method):
function g(x) {
    return substr(x,1,2)*60 + substr(x,3,2)
}
{
    span = g($4) - g($3)
    printf("%s%s%d:%02d\n", $0, OFS, int(span/60), span%60)
}
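Save the simplified version to its own file (the name test2.awk below is just an arbitrary choice) and run it the same way:
$ awk -f test2.awk input_file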

A simple bash solution using arithmetic expansion:
while IFS='' read -r l; do
    # split the line on spaces or tabs into its five fields
    IFS=$' \t' read -r _ _ st et _ <<<"$l"
    d=$(( (10#${et:0:2} * 60 + 10#${et:2:2}) - (10#${st:0:2} * 60 + 10#${st:2:2}) ))
    printf "%s %02d:%02d\n" "$l" "$((d/60))" "$((d%60))"
done < input_file_path
will output:
0018803 01 1710 2050 002571 03:40
0018951 01 1934 2525 003277 05:51
0019362 02 2404 2415 002829 00:11
0019392 01 2621 2820 001924 01:59
0019542 01 2208 2413 003434 02:05
0019583 01 1815 2134 002971 03:19
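A note on the 10# prefix used above: it forces base-10 interpretation, which matters for fields whose hour or minute part has a leading zero (e.g. 08), since bash arithmetic would otherwise try to read them as octal. A quick interactive check (the exact error wording may vary between bash versions):
$ st=1708; echo $(( ${st:2:2} + 0 ))
bash: 08: value too great for base (error token is "08")
$ st=1708; echo $(( 10#${st:2:2} + 0 ))
8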

Here is one in GNU awk using time functions: mktime to convert to epoch time and strftime to convert the difference to the desired HH:MM format:
$ awk -v OFS="\t" '{
    dt3 = "1970 01 01 " substr($3,1,2) " " substr($3,3,2) " 00"
    dt4 = "1970 01 01 " substr($4,1,2) " " substr($4,3,2) " 00"
    print $0, strftime("%H:%M", mktime(dt4)-mktime(dt3), 1)   # the trailing ,1 (UTC flag) is thanks to @glennjackman :)
}' file
Output ($6 only):
03:40
05:51
00:11
01:59
02:05
03:19

Related

conditional statement with awk

I'm new to Linux.
I'm trying to get logs between two dates with gawk.
This is my log:
Oct 07 11:00:33 abcd
Oct 08 12:00:33 abcd
Oct 09 14:00:33 abcd
Oct 10 21:00:33 abcd
I can do it when both the start and end dates are given,
but I have a problem when the start or end date (or both) is not given,
and I don't know how to check for that.
I've written the code below, but it has a syntax error.
sudo gawk -v year='2022' -v start='' -v end='2022:10:08 21:00:34' '
BEGIN{ gsub(/[:-]/," ", start); gsub(/[:-]/," ", end) }
{ dt=year" "$1" "$2" "$3; gsub(/[:-]/," ", dt) }
if(start && end){mktime(dt)>=mktime(start) && mktime(dt)<=mktime(end)}
else if(end){mktime(dt)<=mktime(end)}
else if(start){mktime(dt)>=mktime(start)} ' log.txt
How can I modify this code?
I'd write:
gawk -v end="Oct 10 12:00:00" '
function to_epoch(timestamp, n, a) {
    n = split(timestamp, a, /[ :]/)
    return mktime(strftime("%Y", systime()) " " month[a[1]] " " a[2] " " a[3] " " a[4] " " a[5])
}
BEGIN {
    split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m)
    for (i=1; i<=12; i++) month[m[i]] = i
    if (start) { _start = to_epoch(start) } else { _start = 0 }
    if (end)   { _end = to_epoch(end) }     else { _end = 2**31 }
}
{ ts = to_epoch($0) }
_start <= ts && ts <= _end
' log.txt
You'll pass the start and/or end variables with the same datetime format as appears in the log file.
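For instance, with the program saved to a file (filter.awk is just a hypothetical name), you can pass either bound or both:
gawk -v start="Oct 08 00:00:00" -f filter.awk log.txt
gawk -v start="Oct 08 00:00:00" -v end="Oct 09 23:59:59" -f filter.awk log.txt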
This would be easier with dateutils, e.g.:
<infile dategrep -i '%b %d %H:%M:%S' '>Oct 08 00:00:00' |
dategrep -i '%b %d %H:%M:%S' '<Oct 09 23:59:59'
Output:
Oct 08 12:00:33 abcd
Oct 09 14:00:33 abcd

How to extract log entries between one date/time and another using a bash script

I have 3 logfiles and need to extract a time period via bash; however, the script does not recognize the entries by date/time.
Can anyone help me with how a script using sed, awk, or even grep could extract log entries from one YYYY/MM/DD HH:MM:SS to another YYYY/MM/DD HH:MM:SS?
My log files are generated something like this:
2019-06-04-06.48.05.040000 INFO v65a8fe79:16a8d792e10:-d37:10.150.100.000 66.200.83.195 |36983 RD8jrq1limntMPACJ4iRx-D
2019-06-04-07.38.03.145000 INFO 2d5bb9b6:16a8d794bd9:-ae9:10.150.100.000 200.200.87.8 |37027 fje7hxh7yKCGZcEQOnPOafQ
2019-06-04-07.38.09.966000 INFO 65a8fe79:16a8d792e10:-d36:10.150.100.000 200.200.87.8 |37029 3hesLFH1cySQ1so0YSmShbV
2019-06-04-07.38.09.966000 INFO 2d5bb9b6:16a8d794bd9:-ae8:10.150.100.000 200.200.87.8 |37028 SykkGWSrAXh8yUG
and the others have this format:
2019-06-05 00:28:50,548 DEBUG [site.aq.application.object.context.DataContextFactoryImpl] - [Criado o DataContext com -389:192.193.10.250]
2019-06-05 00:28:50,550 INFO [site.aq.application.object.context.DataContextFactoryImpl] - [CacheableRegraUserAgentService: countFail=8, matchRate=0.6]
2019-06-05 00:28:50,554 DEBUG [site.aq.application.object.context.DataContextFactoryImpl] - [Liberado o dataContext com ID 2d5bb9b6:16a8d794bd9:-389:192.193.46.200]
2019-06-05 07:20:04,628 DEBUG [site.aq.application.object.context.DataContextFactoryImpl] - [Criado o DataContext com ID65a8fe79:16a8d792e10:-5e8:192.300.46.200]
I am a beginner, and based on examples I found, I tried to write this:
Create the file "lista-log.awk":
#!/usr/bin/gawk -f
BEGIN {
starttime = mktime(starttime)
endtime = mktime(endtime)
}
func in_range(n, start, end) {
return start <= n && n < end
}
match($0, /^([0-9]{4})-([0-9]{2})-([0-9]{2})\s/, m) &&
in_range(mktime(m[1] " " m[2] " " m[3] " 00 00 00"), starttime, endtime)
and at the prompt I run, for example:
./lista-log.awk -v starttime='2019 06 05 00 00 00' -v endtime='2019 06 05 04 39 00' arquivo.log.txt
but it seems the script does not recognize the field as a date/time, because it returns nothing.
The nice thing is that your date-time is in a sortable format, so all you need to do is write the following awk line:
awk -v tStart="2019-05-31-01.02.03.000000" -v tEnd="2019-06-01-21.02.03.000000" \
'($1 >= tStart) && ($1 < tEnd)' file
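That one-liner assumes the first log format, where the whole timestamp sits in $1. Here is a sketch that handles both formats shown above by normalizing the leading timestamp to a sortable "YYYY-MM-DD HH:MM:SS" string before comparing; it assumes the timestamp is always the first 19 characters of the line:
awk -v tStart="2019-06-05 00:00:00" -v tEnd="2019-06-05 04:39:00" '
{
    ts = substr($0, 1, 19)               # leading timestamp in either format
    if (substr(ts, 11, 1) == "-") {      # first format: 2019-06-04-06.48.05
        ts = substr(ts, 1, 10) " " substr(ts, 12)
        gsub(/\./, ":", ts)              # 06.48.05 -> 06:48:05
    }
}
ts >= tStart && ts < tEnd
' arquivo.log.txt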

find length of a fixed width file with a little twist

Hi Wonderful People/My Gurus and all kind-hearted people.
I have a fixed-width file and I'm currently trying to find the length of rows that contain a certain number of bytes. I tried a couple of awk commands, but they are not giving me the result I want. My fixed-width records should contain 208 bytes, but there are a few rows that don't. I'm trying to discover those records that don't have 208 bytes.
This command gave me the record length:
awk '{print length;exit}' file.text
Here I tried to print rows that contain 101 bytes, but it didn't work:
awk '{print length==101}' file.text
Any help/insights here would be highly appreciated.
With awk:
awk 'length() < 208' file
Well, length() gives you the number of characters, not bytes. These two can differ in a Unicode context. You can use the LANG environment variable to force awk to count bytes:
LANG=C awk 'length() < 208' file
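If you also want to see which records deviate and by how much, a small variation prints the record number along with its length:
LANG=C awk 'length($0) != 208 { print NR ": " length($0) }' file.text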
Perl to the rescue!
perl -lne 'print "$.:", length if length != 208' -- file.text
-n reads the input line by line
-l removes newlines from the input before processing it and adds them to print
The one-liner will print line number ($.) and the length of the line for each line whose length is different than 208.
If you're using gawk, then it's no issue, even in a typical UTF-8 locale:
length(s) gives the number of characters native to the locale (typically that means the number of UTF-8 characters), while match(s, /$/) - 1 gives the number of raw bytes. The latter also works for pure-binary input, without triggering any error messages in gawk's Unicode mode.
Best illustrated by example :
0000000 3347498554 3381184647 3182945161 171608122
: Ɔ ** LJ ** Ȉ ** ɉ ** 㷽 ** ** : 210 : \n
072 306 206 307 207 310 210 311 211 343 267 275 072 210 072 012
: ? 86 ? 87 ? 88 ? 89 ? ? ? : 88 : nl
58 198 134 199 135 200 136 201 137 227 183 189 58 136 58 10
3a c6 86 c7 87 c8 88 c9 89 e3 b7 bd 3a 88 3a 0a
0000020
# gawk profile, created Sat Oct 29 20:32:49 2022
BEGIN {
1 __ = "\306\206\307\207\310" (_="\210") \
"\311\211\343\267\275"
1 print "",__,_
1 STDERR = "/dev/stderr"
1 print ( match(_, /$/) - 1, "_" ) > STDERR # *A
1 print ( length(__), match(__, /$/) - 1 ) > STDERR # *B
1 print ( (__~_), match(__, (_) ".*") ) > STDERR # *C
1 print ( RSTART, RLENGTH ) > STDERR # *D
}
1 | _ *A # of bytes off "_" because it was defined as 0x88 \210
5 | 11 *B # of chars of "__", and
# of bytes of it :
# 4 x 2-byte UC
# + 1 x 3-byte UC = 11
1 | 3 *C # does byte \210 exist among larger string (true/1),
# and which unicode character is 1st to
# contain \210 - the 3rd one, by original definition
3 | 3 *D # notice I also added a ".*" to the tail of this match() :
# if the left-side string being tested is valid UTF-8,
# then this will match all the way to the end of string,
# inclusive, in which you can deduce :
#
# "\210 first appeared in 3rd-to-last utf-8 character"
Combining that inferred understanding :
RLENGTH = "3 chars to the end, inclusive",
with knowledge of how many to its left :
RSTART - 1 = "2 chars before",
yields a total count of 3 + 2 = 5, affirming length()'s result

Scripting - Iterating Numbers Using a While Loop (newusers command)

I am working on a script where I want to iterate over the numbers 1 to 15, but want them shown as 01 02 03 ... 13 14 15. Essentially what I am trying to do is add 15 users with the newusers command, feeding this script's output to it via input redirection (<). newusers needs its input in this format:
pw_name:pw_passwd:pw_uid:pw_gid:pw_gecos:pw_dir:pw_shell
Basically, it should look like this when I run the script with arguments:
cstuser01:EzVlK9Je8JvfQump:1001:1001:CST8177 user:/home/cstuser01:/bin/bash
cstuser02:EsKOfvhgnWpiBT6c:1002:1002:CST8177 user:/home/cstuser02:/bin/bash
cstuser03:qzQuR5vRgxdzY6dq:1003:1003:CST8177 user:/home/cstuser03:/bin/bash
I got most of it working but I am getting the error below:
./15users.sh: 57: ./15users.sh: Illegal number: 08
Here is my script so far (I took out a couple of sections with error checking):
#!/bin/sh -u
PATH=/bin:/usr/bin ; export PATH
umask=022
#num=1 (this variable is needed depending on which loop I use below)
user=$prefix"user"
uid=1001
gid=$uid
home=/home/$user
shell=/bin/bash
#echo "pw_name:pw_passwd:pw_uid:pw_gid:pw_gecos:pw_dir:pw_shell"
#PASSWD=$(openssl rand -base64 12)
I originally had this but ran into a few problems:
while [ $NUM -le 15 ] ; do
if [ $NUM -lt 10 ] ; then
NUM=0$NUM
fi
echo "$USER$NUM:$(openssl rand -base64 12):$UID:$GID:$GECO:$HOME$NUM:$SHELL"
UID=$(( UID + 1 ))
GID=$(( GID + 1 ))
NUM=$(( NUM + 1 ))
done
A friend of mine suggested this, and it works perfectly fine, but I am trying to future-proof this thing. What if I have 100 or 1,000 users to add?
for NUM in 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 ; do
echo "$USER$NUM:$(openssl rand -base64 12):$UID:$GID:$GECO:$HOME$NUM:$SHELL"
done
This didn't work:
for num in {01..15} ; do
i=09
echo "$(( 10#$num + 1 ))"
10
done
I then tried this, getting a syntax error:
./15users.sh: 50: ./15users.sh: Syntax error: Bad for loop variable
for (( num=1; num<=15; num++ )) ; do
printf "%02d\n" $num
done
I tried this as well, but seq prints vertically, not horizontally:
#iterate=$(seq -w 1 15)
for $iterate ; do
echo "$user$num:$(openssl rand -base64 12):$uid:$gid:$geco:$home$num:$shell"
done
To loop over 01 to 15, it is much simpler to use brace expansion:
$ for num in {01..15}; do echo "$num"; done
01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
In bash arithmetic, numbers beginning with 0 are treated as octal by default. Since 08 and 09 are not valid base-8 numbers, they will cause an error. To avoid that, explicitly specify the base:
$ i=09; echo $(( 10#$i + 1 ))
10
The expression 10#$i tells bash to interpret $i as a base-10 number.
Do NOT use all caps for your script variables. The system uses all caps and you don't want to accidentally overwrite a system variable.
In the case of UID, it is a read-only bash variable. Attempts by your script to assign UID will fail. Use lower or mixed-case for your script variables.
Another example of the all caps problem is $HOME. Note that the following code works:
$ openssl rand -base64 12
1bh9+dp+Ap7xFIBB
But the following fails:
$ (HOME=/home/user; openssl rand -base64 12)
zceZeWsQGpohTPvv
unable to write 'random state'
Apparently, openssl expects to have write-access to $HOME.
Assigning HOME to a non-existent directory causes an error.
So, again, do not use all caps for your script variables.
I won't try to diagnose your error message, but you're over-complicating what you're trying to achieve.
for i in {01..15}; do echo $i; done
01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
Bash supports C style loops as well:
$ for (( i=1; i<=15; i++ )); do printf "%02d\n" $i; done
01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
Just use printf with the zero-padding flag (%02d) to print the leading 0 and you have your output.
Since it hasn't been mentioned yet:
seq -w 1 15
seq -w 1 15 | while read num; do echo "n=$num"; done
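Putting the pieces together, here is a minimal sketch of the whole loop in bash. Note the bash shebang: the C-style for loop and printf -v are bashisms, and running them under #!/bin/sh (often dash) is the likely source of the "Bad for loop variable" error. The prefix and geco values are placeholders modelled on the question's variables:
#!/bin/bash
prefix=cst
geco="CST8177 user"
shell=/bin/bash
uid=1001

for (( i=1; i<=15; i++ )); do
    printf -v num "%02d" "$i"      # zero-padded counter: 01, 02, ... 15
    user="${prefix}user${num}"
    echo "$user:$(openssl rand -base64 12):$uid:$uid:$geco:/home/$user:$shell"
    uid=$(( uid + 1 ))
done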

Awk/Perl convert textfile to csv with sensible format

I have a historical autogenerated logfile with the following format that I would like to convert to a CSV file prior to uploading to a database:
--------------------------------------
Thu Jul 8 09:34:12 BST 2010
BLUE Head 1
Duration = 20 s
Activity = 14.9 MBq
Sensitivity = 312 cps/MBq
--------------------------------------
Thu Jul 8 09:34:55 BST 2010
BLUE Head 1
Duration = 20 s
Activity = 14.9 MBq
Sensitivity = 318 cps/MBq
--------------------------------------
Thu Jul 8 10:13:39 BST 2010
RED Head 1
Duration = 20 s
Activity = 14.9 MBq
Sensitivity = 307 cps/MBq
--------------------------------------
Thu Jul 8 10:14:10 BST 2010
RED Head 1
Duration = 20 s
Activity = 14.9 MBq
Sensitivity = 305 cps/MBq
--------------------------------------
Mon Jul 19 10:11:18 BST 2010
BLUE Head 1
Duration = 20 s
Activity = 12.4 MBq
Sensitivity = 326 cps/MBq
--------------------------------------
Mon Jul 19 10:12:09 BST 2010
BLUE Head 1
Duration = 20 s
Activity = 12.4 MBq
Sensitivity = 333 cps/MBq
--------------------------------------
Mon Jul 19 10:13:57 BST 2010
RED Head 1
Duration = 20 s
Activity = 12.4 MBq
Sensitivity = 338 cps/MBq
--------------------------------------
Mon Jul 19 10:14:45 BST 2010
RED Head 1
Duration = 20 s
Activity = 12.4 MBq
Sensitivity = 340 cps/MBq
--------------------------------------
I would like to convert the logfile to the following format
Date,Camera,Head,Duration,Activity
08/07/10,BLUE,1,20,14.9
08/07/10,BLUE,1,20,14.9
08/07/10,RED,1,20,14.9
08/07/10,RED,1,20,14.9
I have used awk to get close to what I want:
awk 'BEGIN {print "Date,Camera,Head,Duration,Activity";RS = "--------------------------------------"; FS="\n";}; {OFS=",";split($3, a, " ");split($4,b, " "); split($5,c," ");print $2,a[1],a[3],b[3],c[3]}' sensitivity.txt > sensitivity.csv
which gives me
Date,Camera,Head,Duration,Activity
,,,,
Thu Jul 8 09:34:12 BST 2010,BLUE,1,20,14.9
Thu Jul 8 09:34:55 BST 2010,BLUE,1,20,14.9
Thu Jul 8 10:13:39 BST 2010,RED,1,20,14.9
Thu Jul 8 10:14:10 BST 2010,RED,1,20,14.9
How can I
(a) get rid of the line containing only the 4 output field separators (,,,,), and
(b) convert the date format from Thu Jul 8 09:34:12 BST 2010 to DD/MM/YY (can I do this in pure awk, or by piping to perl)?
@sudo_O's answer is fine but here's an alternative:
$ cat tst.awk
BEGIN{ RS="---+\n"; OFS=","; months="JanFebMarAprMayJunJulAugSepOctNovDec" }
NR==1{ print "Date","Camera","Head","Duration","Activity"; next }
{ print sprintf("%04d%02d%02d",$6,(match(months,$2)+2)/3,$3),$7,$9,$12,$16 }
$ gawk -f tst.awk file
Date,Camera,Head,Duration,Activity
20100708,BLUE,1,20,14.9
20100708,BLUE,1,20,14.9
20100708,RED,1,20,14.9
20100708,RED,1,20,14.9
20100719,BLUE,1,20,12.4
20100719,BLUE,1,20,12.4
20100719,RED,1,20,12.4
20100719,RED,1,20,12.4
Note that I used GNU awk above so I could set the RS to more than a single character. With other awks just convert all the "---..."s lines to a blank line or control character or something and set RS accordingly before running the script.
If you don't like my suggested date format, tweak the sprintf() to suit.
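For example, a sketch of the record line changed to the DD/MM/YY layout the question asked for (same field positions as above):
{ print sprintf("%02d/%02d/%02d",$3,(match(months,$2)+2)/3,substr($6,3)),$7,$9,$12,$16 }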
This straightforward awk script will do the job:
BEGIN {
    n = split("Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec", month, "|")
    for (i=1; i<=n; i++) {
        month_index[month[i]] = i
    }
    print "Date,Camera,Head,Duration,Activity"
}
/^-*$/ {
    i = 0
    next
}
{
    i++
}
i==1 {
    printf "%02d/%02d/%02d,", $3, month_index[$2], substr($6,3)
}
i==2 {
    printf "%s,%d,", $1, $3
}
i==3 {
    printf "%d,", $3
}
i==4 {
    printf "%.1f\n", $3
}
Outputs:
$ awk -f script.awk file
08/07/10,BLUE,1,20,14.9
08/07/10,BLUE,1,20,14.9
08/07/10,RED,1,20,14.9
08/07/10,RED,1,20,14.9
19/07/10,BLUE,1,20,12.4
19/07/10,BLUE,1,20,12.4
19/07/10,RED,1,20,12.4
19/07/10,RED,1,20,12.4
I figured I would show how to actually parse the input, rather than just performing string transformations.
#! /usr/bin/env perl
use strict;
use warnings;

use Date::Parse;
use Date::Format;
use Text::CSV;

sub convert_date{
    my $time = str2time($_[0]);
    # iso 8601 style:
    return time2str('%Y-%m-%d', $time);   # YYYY-MM-DD
    # or the outdated style output you wanted
    return time2str('%d/%m/%y', $time);   # DD/MM/YY
}

my %multiply_table = (
    s => 1,
    m => 60,
    h => 60 * 60,
    d => 60 * 60 * 24,
);

sub convert_duration{
    my($d,$s) = $_[0] =~ /^ \s* (\d+) \s* (\w) \s* $/x;
    die "Invalid duration '$_[0]'" unless $d && $s;
    return $d * $multiply_table{$s};
}

my @field_list = qw'Date Camera Head Duration Activity';
my $csv = Text::CSV->new( { eol => "\n" } );

# print header
$csv->print( \*STDOUT, \@field_list );

# set record separator
local $/ = ('-' x 38) . "\n";

# parse data
while(<>){
    chomp;            # remove record separator
    next unless $_;   # skip empty section

    my($time,$camdat,@fields) = split m/\n/;   # split up the fields
    my %data;

    # split camera and head fields
    @data{qw(Camera Head)} = split /\s+Head\s+/, $camdat;

    # parse lines like:
    #   Duration    = 20 s
    #   Activity    = 14.9 MBq
    #   Sensitivity = 305 cps/MBq
    for(@fields){
        my($key,$value) = /(\w+) \s* = \s* (.*) /x;
        $data{$key} = $value;
    }

    # at this point we start reducing precision
    $data{Date} = convert_date( $time );

    # remove measurement units
    $data{Duration} = convert_duration($data{Duration});   # safe
    $data{Activity} =~ s/[^\d]*$//;                        # unsafe

    $csv->print(\*STDOUT, [@data{@field_list}]);
}
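A possible invocation (convert_log.pl is a hypothetical name for the script above; Date::Parse, Date::Format and Text::CSV are CPAN modules that need to be installed):
perl convert_log.pl sensitivity.txt > sensitivity.csv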
