Multiple search using awk - pattern + arithmetic condition - Linux

I am trying to use the awk command to fetch records from a log file that match the following two conditions:
pattern - EXEC_TIME
last column, i.e. EXEC_TIME, greater than 5000 ms.
I tried the command below, but it's not giving me the correct output (the 708 ms record should not match); I'm not sure whether awk can be used this way.
I am just learning awk, so any help will be appreciated.
awk -F ':' '/EXEC_TIME/&&$15>="5000"{print $2,$15}' TransactionInfoLogs.log
MP170420.0548.T00003[SERV] 9065 ms
OC170420.0655.T00001[SERV] 708 ms
Below is sample log file:
[TXN_ID]:MP170420.0548.T00003[SERV][SERV]:BLKSRVREQ[MSISDN]:8028359017[SV_CHRG_ID]:37152[RESP_CODE]:200[START]:Thu Apr 20 12:44:23 WAT 2017 [END]:Thu Apr 20 12:44:23 WAT 2017[EXEC_TIME]:9065 ms
[TXNID]:XX170420.1244.C01465[TYPE]:SERVICE_CHARGE_PAYER_PAYEE[AMT]:0[PR_MSISDN]:8028359017[PR_MFS]:101[PR_W_TYPE]:12[PR_PREBAL]:0[PR_BAL]:0[PY_MSISDN]:IND03[PY_MFS]:101[PY_W_TYPE]:null[PY_PRE
BAL]:2782239[PY_BAL]:2782239
[2017-04-20 12:44:29,552][http-bio-172.24.87.5-7890-exec-7365]-
[TXN_ID]:XX170420.1244.C01467[SERV]:null[MSISDN]:8080967233[RESP_CODE]:00066[START]:Thu Apr 20 12:44:29 WAT 2017 [END]:Thu Apr 20 12:44:29 WAT 2017[EXEC_TIME]:9 ms
[2017-04-20 12:44:36,634][http-bio-172.24.87.5-7890-exec-7364]-
[TXN_ID]:OC170420.0655.T00001[SERV]:null[MSISDN]:7016532415[RESP_CODE]:00066[START]:Thu Apr 20 12:44:36 WAT 2017 [END]:Thu Apr 20 12:44:36 WAT 2017[EXEC_TIME]:708 ms
[2017-04-20 12:44:45,820][http-bio-172.24.87.5-7890-exec-7359]-
[TXN_ID]:XX170420.1244.C01471[SERV]:null[MSISDN]:8026136275[RESP_CODE]:00066[START]:Thu Apr 20 12:44:45 WAT 2017 [END]:Thu Apr 20 12:44:45 WAT 2017[EXEC_TIME]:39 ms
[2017-04-20 12:44:46,010][http-bio-172.24.87.5-7890-exec-7366]-
[TXN_ID]:XX170420.1244.C01473[SERV]:BLKSRVREQ[MSISDN]:8127459541[SV_CHRG_ID]:37152[RESP_CODE]:200[START]:Thu Apr 20 12:44:45 WAT 2017 [END]:Thu Apr 20 12:44:46 WAT 2017[EXEC_TIME]:221 ms
[TXNID]:XX170420.1244.C01473[TYPE]:SERVICE_CHARGE_PAYER_PAYEE[AMT]:0[PR_MSISDN]:8127459541[PR_MFS]:101[PR_W_TYPE]:12[PR_PREBAL]:0[PR_BAL]:0[PY_MSISDN]:IND03[PY_MFS]:101[PY_W_TYPE]:null[PY_PRE
BAL]:2853870[PY_BAL]:2853870
[2017-04-20 12:44:49,989][http-bio-172.24.87.5-7890-exec-7371]-
[TXN_ID]:XX170420.1244.C01475[SERV]:BLKSRVREQ[MSISDN]:8089138902[SV_CHRG_ID]:37152[RESP_CODE]:200[START]:Thu Apr 20 12:44:49 WAT 2017 [END]:Thu Apr 20 12:44:49 WAT 2017[EXEC_TIME]:57 ms
[TXNID]:XX170420.1244.C01475[TYPE]:SERVICE_CHARGE_PAYER_PAYEE[AMT]:0[PR_MSISDN]:8089138902[PR_MFS]:101[PR_W_TYPE]:12[PR_PREBAL]:0[PR_BAL]:0[PY_MSISDN]:IND03[PY_MFS]:101[PY_W_TYPE]:null[PY_PRE
BAL]:3071459[PY_BAL]:3071459

Whenever you have name->value mappings in an input file, it's a good idea to first create an array of that mapping (n2v[] below); then you can just reference each field by its name rather than its position, e.g.:
$ cat tst.awk
{
    delete n2v
    while ( match($0,/\[[^]]+]:/) ) {
        if ( name != "" ) {
            value = substr($0,1,RSTART-1)
            sub(/\[.*/,"",value)
            n2v[name] = value
        }
        name = substr($0,RSTART+1,RLENGTH-3)
        $0 = substr($0,RSTART+RLENGTH)
    }
    value = $0
    n2v[name] = value
    for (name in n2v) {
        value = n2v[name]
        print name, "->", value
    }
}
$ head -1 file | awk -f tst.awk
EXEC_TIME -> 9065 ms
START -> Thu Apr 20 12:44:23 WAT 2017
RESP_CODE -> 200
SV_CHRG_ID -> 37152
TXN_ID -> MP170420.0548.T00003
END -> Thu Apr 20 12:44:23 WAT 2017
MSISDN -> 8028359017
SERV -> BLKSRVREQ
You can then tweak the above to do whatever you want:
$ cat tst.awk
{
    delete n2v
    while ( match($0,/\[[^]]+]:/) ) {
        if ( name != "" ) {
            value = substr($0,1,RSTART-1)
            sub(/\[.*/,"",value)
            n2v[name] = value
        }
        name = substr($0,RSTART+1,RLENGTH-3)
        $0 = substr($0,RSTART+RLENGTH)
    }
    value = $0
    n2v[name] = value
}
n2v["EXEC_TIME"]+0 > 5000 { print n2v["TXN_ID"], n2v["EXEC_TIME"] }
$ awk -f tst.awk file
MP170420.0548.T00003 9065 ms
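As an aside, the reason the original one-liner also printed the 708 ms record is that $15>="5000" is a string comparison (the right-hand side is a string constant), and "708 ms" sorts after "5000" lexically. Forcing a numeric comparison fixes the quick-and-dirty version, assuming the EXEC_TIME value really does always land in field 15 as it evidently does in your log:
awk -F ':' '/EXEC_TIME/ && $15+0 > 5000 {print $2, $15}' TransactionInfoLogs.log
The +0 coerces the field to a number (awk stops converting at the first non-numeric character, so "9065 ms" becomes 9065). The n2v approach above is still more robust, since it doesn't depend on field positions at all.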

Related

conditional statement with awk

I'm new to Linux.
I'm trying to get logs between two dates with gawk.
This is my log:
Oct 07 11:00:33 abcd
Oct 08 12:00:33 abcd
Oct 09 14:00:33 abcd
Oct 10 21:00:33 abcd
I can do it when both the start and end dates are given, but I have a problem when the start or end date (or both) is not given, and I don't know how to check for that.
I've written the code below, but it has a syntax error.
sudo gawk -v year='2022' -v start='' -v end='2022:10:08 21:00:34' '
BEGIN{ gsub(/[:-]/," ", start); gsub(/[:-]/," ", end) }
{ dt=year" "$1" "$2" "$3; gsub(/[:-]/," ", dt) }
if(start && end){mktime(dt)>=mktime(start) && mktime(dt)<=mktime(end)}
else if(end){mktime(dt)<=mktime(end)}
else if(start){mktime(dt)>=mktime(start)} ' log.txt
How can I modify this code ?
I'd write:
gawk -v end="Oct 10 12:00:00" '
function to_epoch(timestamp,    n, a) {
    n = split(timestamp, a, /[ :]/)
    return mktime(strftime("%Y", systime()) " " month[a[1]] " " a[2] " " a[3] " " a[4] " " a[5])
}
BEGIN {
    split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m)
    for (i=1; i<=12; i++) month[m[i]] = i
    if (start) {_start = to_epoch(start)} else {_start = 0}
    if (end)   {_end = to_epoch(end)}    else {_end = 2**31}
}
{ ts = to_epoch($0) }
_start <= ts && ts <= _end
' log.txt
You'll pass the start and/or end variables with the same datetime format as appears in the log file.
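For example, assuming the program body above is saved as range.awk (an illustrative name), filtering with a start bound only or with both bounds looks like:
gawk -v start="Oct 08 00:00:00" -f range.awk log.txt
gawk -v start="Oct 08 00:00:00" -v end="Oct 09 23:59:59" -f range.awk log.txt
Unset variables fall back to the open bounds (_start=0, _end=2**31) set in the BEGIN block.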
This would be easier with dateutils, e.g.:
<infile dategrep -i '%b %d %H:%M:%S' '>Oct 08 00:00:00' |
dategrep -i '%b %d %H:%M:%S' '<Oct 09 23:59:59'
Output:
Oct 08 12:00:33 abcd
Oct 09 14:00:33 abcd

How to skip a blank space if a value is not there and print proper rows and columns

I have one details.txt file which has the below data:
size=190000
date=1603278566981
repo-name=testupload
repo-path=/home/test/testupload
size=140000
date=1603278566981
repo-name=testupload2
repo-path=/home/test/testupload2
size=170000
date=1603278566981
repo-name=testupload3
repo-path=/home/test/testupload3
and the below awk script processes that:
#!/bin/bash
awk -vOFS='\t' '
BEGIN{ FS="=" }
/^size/{
    if(++count1==1){ header=$1"," }
    sizeArr[++count]=$NF
    next
}
/^#repo-name/{
    if(++count2==1){ header=header OFS $1"," }
    repoNameArr[count]=$NF
    next
}
/^date/{
    if(++count3==1){ header=header OFS $1"," }
    dateArr[count]=$NF
    next
}
/^#blob-name/{
    if(++count4==1){ header=header OFS $1"," }
    repopathArr[count]=$NF
    next
}
END{
    print header
    for(i=1;i<=count;i++){
        printf("%s,%s,%s,%s,%s\n",sizeArr[i],repoNameArr[i],dateArr[i],repopathArr[i])
    }
}
' details.txt | tr -d # |awk -F, '{$3=substr($3,0,10)}1' OFS=,|sed 's/date/creationTime/g'
which prints the values as expected (because the data has repo-name):
size " repo-name" " creationTime" " blob-name"
10496000 testupload Fri 11 Dec 2020 07:35:56 AM CET testfile.tar11.gz
10496000 testupload Thu 10 Dec 2020 02:44:04 PM CET testfile.tar.gz
9602303 testupload Fri 11 Dec 2020 07:38:58 AM CET apache-maven-3.6.3-bin/apache-maven-3.6.3-bin.zip
but when something is missing in the file, the output format goes wrong (here repo-name jumps to the last column's header, as the first few records don't have a repo-name value):
size " creationTimeime" " blob-name" " " repo-name"
261304 Thu 13 Feb 2020 08:50:02 AM CET temp 8963d25231b
29639 Thu 13 Feb 2020 08:50:00 AM CET temp 3780c72cab5
93699 Thu 13 Feb 2020 08:50:00 AM CET temp 209276c91ba
The column headers get printed wrongly, but the data gets printed perfectly. Is there anything that can detect when one of the fields is not there, skip it, and print the rest in the proper format?
If data is not available, it should keep the header the same; it should not disturb the header sequence.
My requirement:
If details.txt is missing any records, it should skip them, print a blank, and print as per the header.
Headers get disturbed if the repo-name field is not there, but the rest of the output is correct, so we need to keep the headers intact even if a field is missing.
Wrong:
size " creationTimeime" " blob-name" " " repo-name"
261304 Thu 13 Feb 2020 08:50:02 AM CET temp 8963d25231b
29639 Thu 13 Feb 2020 08:50:00 AM CET temp 3780c72cab5
93699 Thu 13 Feb 2020 08:50:00 AM CET temp 209276c91ba
Right:
size " repo-name" " creationTime" " blob-name"
10496000 testupload Fri 11 Dec 2020 07:35:56 AM CET testfile.tar11.gz
10496000 testupload Thu 10 Dec 2020 02:44:04 PM CET testfile.tar.gz
9602303 testupload Fri 11 Dec 2020 07:38:58 AM CET apache-maven-3.6.3-bin/apache-maven-3.6.3-bin.zip
Thanks
samurai
You may try this GNU awk:
awk -F= -v OFS='\t' 'function prt(ind, name, s) {s=map[ind][name]; return (s==""?" ":s);} {map[NR][$1] = $2} END {print "Size", "Repo Name", "CreationTime", "Repo Path"; for (i=1; i<=NR; i+=4) print prt(i, "size"), prt(i+2, "repo-name"), prt(i+1, "date"), prt(i+3, "repo-path")}' file
Size Repo Name CreationTime Repo Path
190000 testupload 1603278566981 /home/test/testupload
140000 testupload2 1603278566981 /home/test/testupload2
170000 testupload3 1603278566981 /home/test/testupload3
To make it readable:
awk -F= -v OFS='\t' 'function prt(ind, name,   s) {
    s = map[ind][name]
    return (s==""?" ":s)
}
{
    map[NR][$1] = $2
}
END {
    print "Size", "Repo Name", "CreationTime", "Repo Path"
    for (i=1; i<=NR; i+=4)
        print prt(i, "size"), prt(i+2, "repo-name"), prt(i+1, "date"), prt(i+3, "repo-path")
}' file
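Note that map[NR][$1] is a true multidimensional (array-of-arrays) subscript, which requires GNU awk 4.0 or later. For other awks, a sketch of the same logic using the classic SUBSEP comma subscript, under the same four-lines-per-record assumption:
awk -F= -v OFS='\t' 'function prt(ind, name,   s) {
    s = map[ind, name]
    return (s == "" ? " " : s)
}
{ map[NR, $1] = $2 }
END {
    print "Size", "Repo Name", "CreationTime", "Repo Path"
    for (i = 1; i <= NR; i += 4)
        print prt(i, "size"), prt(i+2, "repo-name"), prt(i+1, "date"), prt(i+3, "repo-path")
}' file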

How to monitor newly created file in a directory with bash?

I have a log directory that consists of a bunch of log files; one log file is created each time a system event happens. I want to write a one-line bash script that always monitors the file list and displays the content of any newly created file on the terminal.
Currently, all I have is a loop that displays the content of the whole directory:
for f in *; do cat $f; done
It lacks the monitoring feature that I want. One limitation of my system is that I do not have the watch command. I also don't have any package manager to install fancy tools; raw BSD is all I have. I do have tail; I was thinking of something like tail -F $(ls), but this tails each file instead of the file list.
In summary, I want to modify my script such that I can monitor the content of all newly created files.
First approach - use a hidden file in your dir (in my example it is named .watch). Then your one-liner might look like:
for f in $(find . -type f -newer .watch); do cat $f; done; touch .watch
Second approach - use inotify-tools: https://unix.stackexchange.com/questions/273556/when-a-particular-file-arrives-then-execute-a-procedure-using-shell-script/273563#273563
You can cram it into a one-liner if you want, but I'd recommend just running the script in the background:
#!/bin/bash
[ ! -d "$1" ] && {
    printf "error: argument is not a valid directory to monitor.\n"
    exit 1
}
while :; fname="$1/$(inotifywait -q -e modify -e create --format '%f' "$1")"; do
    cat "$fname"
done
Which will watch the directory given as the first argument, and cat any new or changed file in that directory. Example:
$ bash watchdir.sh my_logdir &
Which will then cat new or changed files in my_logdir.
Using inotifywait in monitor mode
First this little demo:
Open one terminal and run this:
ext=(php css other)
while :;do
    subname=''
    ((RANDOM%10))||printf -v subname -- "-%04x" $RANDOM
    date >/tmp/test$subname.${ext[RANDOM%3]}
    sleep 1
done
This will randomly create files named /tmp/test.php, /tmp/test.css and /tmp/test.other; occasionally (approx 1 time in 10) the name will instead be /tmp/test-XXXX.[css|php|other], where XXXX is a hexadecimal random number.
Open another terminal and run this:
waitPaths=(/{home,tmp})
while read file ;do
    if [ "$file" ] &&
       ( [ -z "${file##*.php}" ] || [ -z "${file##*.css}" ] ) ;then
        (($(stat -c %Y-%X $file)))||echo -n new
        echo file: $file, content:
        cat $file
    fi
done < <(
    inotifywait -qme close_write --format %w%f ${waitPaths[*]}
)
This may produce something like:
file: /tmp/test.css, content:
Tue Apr 26 18:53:19 CEST 2016
file: /tmp/test.php, content:
Tue Apr 26 18:53:21 CEST 2016
file: /tmp/test.php, content:
Tue Apr 26 18:53:23 CEST 2016
file: /tmp/test.css, content:
Tue Apr 26 18:53:25 CEST 2016
file: /tmp/test.php, content:
Tue Apr 26 18:53:27 CEST 2016
newfile: /tmp/test-420b.php, content:
Tue Apr 26 18:53:28 CEST 2016
file: /tmp/test.php, content:
Tue Apr 26 18:53:29 CEST 2016
file: /tmp/test.php, content:
Tue Apr 26 18:53:30 CEST 2016
file: /tmp/test.php, content:
Tue Apr 26 18:53:31 CEST 2016
Some explanation:
waitPaths=(/{home,tmp}) could be written waitPaths=(/home /tmp) or for only one directory: waitPaths=/var/log
the if condition keeps only filenames matching *.php or *.css
(($(stat -c %Y-%X $file)))||echo -n new compares the file's modification and access timestamps in order to flag brand-new files
inotifywait
-q to stay quiet (don't print more than required)
-m for monitor mode: the command doesn't terminate, but prints each matching event
-e close_write to react only to the specified kind of event
--format %w%f to output each event as path/file
Another way:
Here is a more sophisticated sample:
Listening for two kinds of events (CLOSE_WRITE | CREATE)
Using a list of new-file flags to know which files are new when the CLOSE_WRITE event occurs.
In the second console, hit Ctrl+C (or open a new terminal), then try this:
waitPaths=(/{home,tmp})
declare -A newFiles
while read path event file; do
    if [ "$file" ] && ( [ -z "${file##*.php}" ] || [ -z "${file##*.css}" ] ); then
        if [ "$event" ] && [ -z "${event//*CREATE*}" ]; then
            newFiles[$file]=1
        else
            if [ "${newFiles[$file]}" ]; then
                unset newFiles[$file]
                echo NewFile: $file, content:
                sed 's/^/>+ /' $path/$file
            else
                echo file: $file, content:
                sed 's/^/> /' $path/$file
            fi
        fi
    fi
done < <(inotifywait -qme close_write -e create ${waitPaths[*]})
May produce something like:
file: test.css, content:
> Tue Apr 26 22:16:02 CEST 2016
file: test.php, content:
> Tue Apr 26 22:16:03 CEST 2016
NewFile: test-349b.css, content:
>+ Tue Apr 26 22:16:05 CEST 2016
file: test.css, content:
> Tue Apr 26 22:16:08 CEST 2016
file: test.css, content:
> Tue Apr 26 22:16:10 CEST 2016
file: test.css, content:
> Tue Apr 26 22:16:13 CEST 2016
Watching for new files AND new lines in old files, using bash
Here is another solution using some bashisms, like associative arrays:
Sample:
wpath=/var/log
declare -A files
while : ;do
    while read -a crtfile ;do
        if [ "${crtfile:0:1}" = "-" ] &&
           [ "${crtfile[8]##*.}" != "gz" ] &&
           [ "${files[${crtfile[8]}]:-0}" -lt ${crtfile[4]} ] ;then
            printf "\e[47m## %-14s :- %(%a %d %b %y %T)T ##\e[0m\n" ${crtfile[8]} -1
            tail -c +$[1+${files[${crtfile[8]}]:-0}] $wpath/${crtfile[8]}
            files[${crtfile[8]}]=${crtfile[4]}
        fi
    done < <( /bin/ls -l $wpath )
    sleep 1
done
This will dump each file (with a filename not ending in .gz) in /var/log, watch for modifications or new files, and then dump the new lines.
Demo:
In a first terminal console, hit:
ext=(php css other)
( while :; do
    subname=''
    ((RANDOM%10)) || printf -v subname -- "-%04x" $RANDOM
    name=test$subname.${ext[RANDOM%3]}
    printf "%-16s" $name
    {
        date +"%a %d %b %y %T" | tee /dev/fd/5
        fortune /usr/share/games/fortunes/bofh-excuses
    } >> /tmp/$name
    sleep 1
done ) 5>&1
You need to have fortune installed with the BOFH excuses library.
If you really don't have fortune, you could use this instead:
LANG=C ext=(php css other)
( while :; do
    subname=''
    ((RANDOM%10)) || printf -v subname -- "-%04x" $RANDOM
    name=test$subname.${ext[RANDOM%3]}
    printf "%-16s" $name
    {
        date +"%a %d %b %y %T" | tee /dev/fd/5
        for ((1; RANDOM%5; 1))
        do
            printf -v str %$[RANDOM&12]s
            str=${str// /blah, }
            echo ${str%, }.
        done
    } >> /tmp/$name
    sleep 1
done ) 5>&1
This may output something like:
test.css Thu 28 Apr 16 12:00:02
test.php Thu 28 Apr 16 12:00:03
test.other Thu 28 Apr 16 12:00:04
test.css Thu 28 Apr 16 12:00:05
test.css Thu 28 Apr 16 12:00:06
test.other Thu 28 Apr 16 12:00:07
test.php Thu 28 Apr 16 12:00:08
test.css Thu 28 Apr 16 12:00:09
test.other Thu 28 Apr 16 12:00:10
test.other Thu 28 Apr 16 12:00:11
test.php Thu 28 Apr 16 12:00:12
test.other Thu 28 Apr 16 12:00:13
In a second terminal console, hit:
declare -A files
wpath=/tmp
while :; do
    while read -a crtfile; do
        if [ "${crtfile:0:1}" = "-" ] && [ "${crtfile[8]:0:4}" = "test" ] &&
           ( [ "${crtfile[8]##*.}" = "css" ] || [ "${crtfile[8]##*.}" = "php" ] ) &&
           [ "${files[${crtfile[8]}]:-0}" -lt ${crtfile[4]} ]; then
            printf "\e[47m## %-14s :- %(%a %d %b %y %T)T ##\e[0m\n" ${crtfile[8]} -1
            tail -c +$[1+${files[${crtfile[8]}]:-0}] $wpath/${crtfile[8]}
            files[${crtfile[8]}]=${crtfile[4]}
        fi
    done < <(/bin/ls -l $wpath)
    sleep 1
done
Every second, this will:
for all entries in the watched directory,
search for regular files (first character of the mode is -),
search for filenames beginning with test,
search for filenames ending in css or php,
compare the already-printed size with the current file size,
and if the new size is greater,
print out the new bytes by using tail -c and
store the new already-printed size,
then sleep 1 second.
This may output something like:
## test.css :- Thu 28 Apr 16 12:00:09 ##
Thu 28 Apr 16 12:00:02
BOFH excuse #216:
What office are you in? Oh, that one. Did you know that your building was built over the universities first nuclear research site? And wow, aren't you the lucky one, your office is right over where the core is buried!
Thu 28 Apr 16 12:00:05
BOFH excuse #145:
Flat tire on station wagon with tapes. ("Never underestimate the bandwidth of a station wagon full of tapes hurling down the highway" Andrew S. Tannenbaum)
Thu 28 Apr 16 12:00:06
BOFH excuse #301:
appears to be a Slow/Narrow SCSI-0 Interface problem
## test.php :- Thu 28 Apr 16 12:00:09 ##
Thu 28 Apr 16 12:00:03
BOFH excuse #36:
dynamic software linking table corrupted
Thu 28 Apr 16 12:00:08
BOFH excuse #367:
Webmasters kidnapped by evil cult.
## test.css :- Thu 28 Apr 16 12:00:10 ##
Thu 28 Apr 16 12:00:09
BOFH excuse #25:
Decreasing electron flux
## test.php :- Thu 28 Apr 16 12:00:13 ##
Thu 28 Apr 16 12:00:12
BOFH excuse #3:
electromagnetic radiation from satellite debris
Note: if a file is modified more than once between two checks, all modifications will be printed at the next check.
Although not really nice, the following gives (and repeats) the last 50 lines of the newest file in the current directory:
while true; do tail -n 50 $(ls -Art | tail -n 1); sleep 5; done
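A slightly quieter variant of the same idea (a sketch; it still parses ls, so it assumes tame file names) prints each newest file only once, when its name changes:
last=
while true; do
    newest=$(ls -Art | tail -n 1)
    if [ "$newest" != "$last" ]; then
        cat "$newest"       # dump each new file once
        last=$newest
    fi
    sleep 5
done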
You can refresh every minute using a cron job:
$ crontab -e
* * * * * /home/script.sh
If you need to refresh in less than a minute, you can use the sleep command inside your script, as sketched below.
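For example, a sketch of such a wrapper (an illustrative script that cron would start once a minute in place of /home/script.sh, rerunning the job every 20 seconds within that minute):
#!/bin/bash
# illustrative 20-second refresh driven by a once-a-minute cron entry
for i in 1 2 3; do
    /home/script.sh
    sleep 20
done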

Why does the Linux split program have weird behavior with large files (>20GB)?

I'm running the following command on my Ubuntu box:
split --number=l/5 /pathToSource.csv /pathToOutputDirectory
If I do an ls:
myUser#serverNAme:/pathToOutputDirectory> ls -la
total 21467452
drwxr-xr-x 2 myUser group 4096 Jun 23 08:51 .
drwxrwxrwx 4 myUser group 4096 Jun 23 08:44 ..
-rw-r--r-- 1 myUser group 10353843231 Jun 23 08:48 aa
-rw-r--r-- 1 myUser group 0 Jun 23 08:48 ab
-rw-r--r-- 1 myUser group 11376663825 Jun 23 08:51 ac
-rw-r--r-- 1 myUser group 0 Jun 23 08:51 ad
-rw-r--r-- 1 myUser group 252141913 Jun 23 08:51 ae
If I do a du over the ab and ad files:
$ du -h ab ad
0 ab
0 ad
As you can see, split divided the file non-homogeneously.
Does anyone know what's going on?
Can some unprintable character hang split?
Thank you.
Best Regards!
Francisco.
While this is unusual data, with an average line length of 114137, I'm not sure that fully describes the issue. You have 21982648969 bytes of data, so each bucket that split is trying to fill is 4396529793 bytes. That's larger than 2^32, so I wondered about a 32-bit overflow; are you on a 32-bit or 64-bit platform? Looking at the code, though, I don't see an overflow issue TBH. Note you could anonymize and compress the data, providing the following file for download somewhere:
tr -c '\n' . < /pathToSource.csv | xz > /pathToSource.csv.xz
It's also worth specifying your coreutils version, since the implementation changed a bit between v8.8 and v8.13.
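The zero-byte chunks themselves come from the whole-line bucketing of --number=l/N: each chunk targets roughly total_size/N bytes, but lines are never split, so a single line much longer than the chunk size overshoots into the byte ranges of the following chunks and leaves them empty. A tiny reproduction sketch (assuming GNU coreutils split):
# one ~100-byte line followed by six 2-byte lines; each of 3 chunks
# targets ~37 bytes, but the long first line alone overshoots chunk 2
printf '%0100d\n1\n2\n3\n4\n5\n6\n' 0 > demo.txt
split --number=l/3 demo.txt part.
wc -c part.*    # expect part.aa large, part.ab empty, part.ac small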
A workaround in Groovy, stripping non-printable characters before splitting:
class Sanitizer {
    public static void main(String[] args) {
        def textOnly = new File('/path/NoDanger.txt')
        def data = new File('/path/danger.txt')
        String line = null
        data.withReader { reader ->
            while ( ( line = reader.readLine() ) != null ) {
                /*char[] stringToCharArray = line.toCharArray();
                for(int i = 0; i < 5; i++ ){
                    char a = stringToCharArray[i]
                    int b = Character.getNumericValue(a);
                    println Integer.toHexString(b)
                    if (!(b =~ /\w/)) {
                        println "inside"
                    } else println "outside"
                }*/
                String newString = line.replaceAll("[^\\p{Print}]", "");
                textOnly << newString + "\n"
            }
        } //reader
    }
}

Awk/Perl convert text file to CSV with a sensible format

I have a historical autogenerated logfile with the following format that I would like to convert to a CSV file prior to uploading to a database:
--------------------------------------
Thu Jul 8 09:34:12 BST 2010
BLUE Head 1
Duration = 20 s
Activity = 14.9 MBq
Sensitivity = 312 cps/MBq
--------------------------------------
Thu Jul 8 09:34:55 BST 2010
BLUE Head 1
Duration = 20 s
Activity = 14.9 MBq
Sensitivity = 318 cps/MBq
--------------------------------------
Thu Jul 8 10:13:39 BST 2010
RED Head 1
Duration = 20 s
Activity = 14.9 MBq
Sensitivity = 307 cps/MBq
--------------------------------------
Thu Jul 8 10:14:10 BST 2010
RED Head 1
Duration = 20 s
Activity = 14.9 MBq
Sensitivity = 305 cps/MBq
--------------------------------------
Mon Jul 19 10:11:18 BST 2010
BLUE Head 1
Duration = 20 s
Activity = 12.4 MBq
Sensitivity = 326 cps/MBq
--------------------------------------
Mon Jul 19 10:12:09 BST 2010
BLUE Head 1
Duration = 20 s
Activity = 12.4 MBq
Sensitivity = 333 cps/MBq
--------------------------------------
Mon Jul 19 10:13:57 BST 2010
RED Head 1
Duration = 20 s
Activity = 12.4 MBq
Sensitivity = 338 cps/MBq
--------------------------------------
Mon Jul 19 10:14:45 BST 2010
RED Head 1
Duration = 20 s
Activity = 12.4 MBq
Sensitivity = 340 cps/MBq
--------------------------------------
I would like to convert the logfile to the following format
Date,Camera,Head,Duration,Activity
08/07/10,BLUE,1,20,14.9
08/07/10,BLUE,1,20,14.9
08/07/10,RED,1,20,14.9
08/07/10,RED,1,20,14.9
I have used awk to get close to what I want:
awk 'BEGIN {print "Date,Camera,Head,Duration,Activity";RS = "--------------------------------------"; FS="\n";}; {OFS=",";split($3, a, " ");split($4,b, " "); split($5,c," ");print $2,a[1],a[3],b[3],c[3]}' sensitivity.txt > sensitivity.csv
which gives me
Date,Camera,Head,Duration,Activity
,,,,
Thu Jul 8 09:34:12 BST 2010,BLUE,1,20,14.9
Thu Jul 8 09:34:55 BST 2010,BLUE,1,20,14.9
Thu Jul 8 10:13:39 BST 2010,RED,1,20,14.9
Thu Jul 8 10:14:10 BST 2010,RED,1,20,14.9
How can I
(a) get rid of the 4 output field separators in the ,,,, row, and
(b) convert the date format from Thu Jul 8 09:34:12 BST 2010 to DD/MM/YY? (Can I do this in pure awk, or by piping to Perl?)
@sudo_O's answer is fine but here's an alternative:
$ cat tst.awk
BEGIN{ RS="---+\n"; OFS=","; months="JanFebMarAprMayJunJulAugSepOctNovDec" }
NR==1{ print "Date","Camera","Head","Duration","Activity"; next }
{ print sprintf("%04d%02d%02d",$6,(match(months,$2)+2)/3,$3),$7,$9,$12,$16 }
$ gawk -f tst.awk file
Date,Camera,Head,Duration,Activity
20100708,BLUE,1,20,14.9
20100708,BLUE,1,20,14.9
20100708,RED,1,20,14.9
20100708,RED,1,20,14.9
20100719,BLUE,1,20,12.4
20100719,BLUE,1,20,12.4
20100719,RED,1,20,12.4
20100719,RED,1,20,12.4
Note that I used GNU awk above so I could set the RS to more than a single character. With other awks just convert all the "---..."s lines to a blank line or control character or something and set RS accordingly before running the script.
If you don't like my suggested date format, tweak the sprintf() to suit.
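For example, to produce the DD/MM/YY format asked for in the question, the last rule could become (same field positions, a sketch):
{ print sprintf("%02d/%02d/%02d",$3,(match(months,$2)+2)/3,$6%100),$7,$9,$12,$16 }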
This straightforward awk script will do the job:
BEGIN {
    n=split("Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec",month,"|")
    for (i=1;i<=n;i++) {
        month_index[month[i]] = i
    }
    print "Date,Camera,Head,Duration,Activity"
}
/^-*$/{
    i=0
    next
}
{
    i++
}
i==1{
    printf "%02d/%02d/%02d,",$3,month_index[$2],substr($6,3)
}
i==2{
    printf "%s,%d,",$1,$3
}
i==3{
    printf "%d,",$3
}
i==4{
    printf "%.1f\n",$3
}
Outputs:
$ awk -f script.awk file
08/07/10,BLUE,1,20,14.9
08/07/10,BLUE,1,20,14.9
08/07/10,RED,1,20,14.9
08/07/10,RED,1,20,14.9
19/07/10,BLUE,1,20,12.4
19/07/10,BLUE,1,20,12.4
19/07/10,RED,1,20,12.4
19/07/10,RED,1,20,12.4
I figured I would show how to actually parse the input, rather than just performing string transformations.
#! /usr/bin/env perl
use strict;
use warnings;

use Date::Parse;
use Date::Format;
use Text::CSV;

sub convert_date{
    my $time = str2time($_[0]);

    # iso 8601 style:
    return time2str('%Y-%m-%d',$time); # YYYY-MM-DD

    # or the outdated style output you wanted
    return time2str('%d/%m/%y',$time); # DD/MM/YY
}

my %multiply_table = (
    s => 1,
    m => 60,
    h => 60 * 60,
    d => 60 * 60 * 24,
);

sub convert_duration{
    my($d,$s) = $_[0] =~ /^ \s* (\d+) \s* (\w) \s* $/x;
    die "Invalid duration '$_[0]'" unless $d && $s;
    return $d * $multiply_table{$s};
}

my @field_list = qw'Date Camera Head Duration Activity';
my $csv = Text::CSV->new( { eol => "\n" } );

# print header
$csv->print( \*STDOUT, \@field_list );

# set record separator
local $/ = ('-' x 38) . "\n";

# parse data
while(<>){
    chomp; # remove record separator
    next unless $_; # skip empty section

    my($time,$camdat,@fields) = split m/\n/; # split up the fields
    my %data;

    # split camera and head fields
    @data{qw(Camera Head)} = split /\s+Head\s+/, $camdat;

    # parse lines like:
    # Duration = 20 s
    # Activity = 14.9 MBq
    # Sensitivity = 305 cps/MBq
    for(@fields){
        my($key,$value) = /(\w+) \s* = \s* (.*) /x;
        $data{$key} = $value;
    }

    # at this point we start reducing precision
    $data{Date} = convert_date( $time );

    # remove measurement units
    $data{Duration} = convert_duration($data{Duration}); # safe
    $data{Activity} =~ s/[^\d]*$//; # unsafe

    $csv->print(\*STDOUT, [@data{@field_list}]);
}
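A usage sketch (tocsv.pl is an illustrative name for the script above):
$ perl tocsv.pl sensitivity.txt > sensitivity.csv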
