I need to add a comma to the end of every fourth line. Here is an example of the current output, followed by what I am looking for.
("tester1",
"SERVICE_TICKET_CREATED",
"Thu Mar 19 23:27:57 UTC 2015",
"73.217.129.159")
("tester1",
"SERVICE_TICKET_CREATED",
"Fri Mar 20 00:31:59 UTC 2015",
"73.217.129.159")
And what I need:
("tester1",
"SERVICE_TICKET_CREATED",
"Thu Mar 19 23:27:57 UTC 2015",
"73.217.129.159"),
("tester1",
"SERVICE_TICKET_CREATED",
"Fri Mar 20 00:31:59 UTC 2015",
"73.217.129.159"),
Using awk
awk is well-suited to this:
$ awk '0==NR%4{$0=$0","} 1' file
("tester1",
"SERVICE_TICKET_CREATED",
"Thu Mar 19 23:27:57 UTC 2015",
"73.217.129.159"),
("tester1",
"SERVICE_TICKET_CREATED",
"Fri Mar 20 00:31:59 UTC 2015",
"73.217.129.159"),
How it works:
0==NR%4{$0=$0","}
NR is the line number. NR%4 is the line number modulo 4. Thus, 0 == NR%4 on every fourth line. For those lines, we add a comma at the end: $0=$0",".
1
This is awk's cryptic shorthand for print-the-line.
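The terse form can be expanded; here is a sketch on toy input (not the original file) with the implicit parts spelled out:

```shell
# Same program, long-hand:
# - NR % 4 == 0 selects every fourth line
# - the bare `1` in the original is an always-true pattern whose
#   default action is { print }, written explicitly here
printf '%s\n' a b c d e f g h |
awk 'NR % 4 == 0 { $0 = $0 "," }
     { print }'
```

Every fourth line (d and h here) gains a trailing comma; all lines are printed.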
Using sed
It looks like you want a comma after every line that ends with a closing parenthesis. If that is the case, then:
$ sed 's/)$/),/' file
("tester1",
"SERVICE_TICKET_CREATED",
"Thu Mar 19 23:27:57 UTC 2015",
"73.217.129.159"),
("tester1",
"SERVICE_TICKET_CREATED",
"Fri Mar 20 00:31:59 UTC 2015",
"73.217.129.159"),
If your goal is to add a comma after every closing ), then you can do the following:
sed 's/)$/),/'
This would accommodate records that differed in number of lines.
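If you'd rather select by line number than by the trailing parenthesis, GNU sed (not POSIX sed) supports a first~step address; a sketch on toy input:

```shell
# GNU sed extension: the address 0~4 matches lines 4, 8, 12, ...
printf '%s\n' a b c d e f g h | sed '0~4s/$/,/'
```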
You can just substitute any ending ")" with ")," by using this sed command:
sed 's/)$/),/' <your file>
Related
How to convert any date format to milliseconds in Linux/CentOS/RHEL with a shell script
Date formats like:
Thu, 23 Jun 2022 08:27:26 +0000,
Thu Jun 23 11:11:43 UTC 2022,
2022-06-23T08:28:23Z
These can be converted to an epoch timestamp with this command (note that %s%N gives nanoseconds, not milliseconds):
date -d "2022-06-23T08:28:23Z" +"%s%N"
Any other date format can also be converted with the above command, for example:
date -d "Thu, 23 Jun 2022 08:27:26 +0000" +"%s%N"
date -d "Thu Jun 23 11:11:43 UTC 2022" +"%s%N"
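One nuance: %N is the nanosecond field, so %s%N actually produces an epoch timestamp in nanoseconds. GNU date accepts a width on %N, so %s%3N yields true milliseconds:

```shell
# GNU date: %3N truncates the nanosecond field to 3 digits,
# so %s%3N prints seconds followed by milliseconds (epoch ms)
date -u -d "2022-06-23T08:28:23Z" +"%s%3N"   # → 1655972903000
```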
I'll preface this with the fact that I have no knowledge of awk (or maybe it's sed I need?) and fairly basic knowledge of grep and Linux, so apologies if this is a really dumb question. I find the man pages really difficult to decipher, and googling has gotten me quite far towards a solution, but not far enough to tie together the two things I need to do. On to the problem...
I have some log files that I'm trying to extract rows from that are on a Linux server, named in the format aYYYYMMDD.log, that are all along the lines of:
Starting Process A
Wed 27 Oct 18:15:39 BST 2021 >>> /dir/task1 start <<<
...
Wed 27 Oct 18:15:40 BST 2021 >>> /dir/task1 end <<<
Wed 27 Oct 18:15:40 BST 2021 >>> /dir/task2 start <<<
...
Wed 27 Oct 18:15:42 BST 2021 >>> /dir/task2 end <<<
...
...
Wed 27 Oct 18:15:53 BST 2021 >>> /dir/taskreporting start <<<
...
Wed 27 Oct 18:15:53 BST 2021 >>> Starting task90 <<<
...
Wed 27 Oct 18:15:54 BST 2021 >>> Finishing task90 <<<
Wed 27 Oct 18:15:54 BST 2021 >>> Starting task91 <<<
...
Wed 27 Oct 18:15:57 BST 2021 >>> Finishing task91 <<<
...
...
Wed 27 Oct 18:16:12 BST 2021 >>> Starting task99 <<<
...
Wed 27 Oct 18:16:27 BST 2021 >>> Finishing task99 <<<
...
Wed 27 Oct 18:16:27 BST 2021 >>> /dir/taskreporting end <<<
...
Ended Process A
(I've excluded the log rows that are irrelevant to my requirement.)
I need to find what tasks were run during the taskreporting task, which I have managed to do with the following command (thanks to this other stackoverflow post):
awk '/taskreporting start/{flag=1;next}/taskreporting end/{flag=0}flag' <specific filename>.log | grep 'Starting task\|Finishing task'
This works well when I run it against a single file and produces output like:
Wed 27 Oct 18:15:53 BST 2021 >>> Starting task90 <<<
Wed 27 Oct 18:15:54 BST 2021 >>> Finishing task90 <<<
Wed 27 Oct 18:15:54 BST 2021 >>> Starting task91 <<<
Wed 27 Oct 18:15:57 BST 2021 >>> Finishing task91 <<<
...
Wed 27 Oct 18:16:12 BST 2021 >>> Starting task99 <<<
Wed 27 Oct 18:16:27 BST 2021 >>> Finishing task99 <<<
which is pretty much what I want to see. However, as I have multiple files to extract from (having amended the filename in the above command appropriately, e.g. to *.log), I need to output the filename alongside the rows so that I know which file the info belongs to, e.g. I'd like to see:
a211027.log Wed 27 Oct 18:15:53 BST 2021 >>> Starting task90 <<<
a211027.log Wed 27 Oct 18:15:54 BST 2021 >>> Finishing task90 <<<
a211027.log Wed 27 Oct 18:15:54 BST 2021 >>> Starting task91 <<<
a211027.log Wed 27 Oct 18:15:57 BST 2021 >>> Finishing task91 <<<
...
a211027.log Wed 27 Oct 18:16:12 BST 2021 >>> Starting task99 <<<
a211027.log Wed 27 Oct 18:16:27 BST 2021 >>> Finishing task99 <<<
I've googled and it seems like {print FILENAME} is what I need, but I couldn't figure out where to add it into my current awk command. How can I amend my awk command to get it to add the filename to the beginning of the rows? Or is there a better way of achieving my aim?
As you have provided most of the answer yourself, all that is needed is {print FILENAME, $0}, which prints the filename in front of the rest of the line ($0):
awk '/taskreporting start/{flag=1;next}/taskreporting end/{flag=0}flag {print FILENAME, $0}' <specific filename>.log
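As an aside, the trailing grep can be folded into the awk program so that one process does both the range selection and the filtering; a sketch using the same patterns:

```shell
# Print FILENAME only for lines inside the taskreporting range
# that also match the Starting/Finishing task patterns
awk '/taskreporting start/ { flag = 1; next }
     /taskreporting end/   { flag = 0 }
     flag && /Starting task|Finishing task/ { print FILENAME, $0 }' *.log
```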
Below is the content of the file. I want to find the difference between the first field of each line and that of the previous line.
0.607401 # Tue Mar 27 04:30:01 IST 2018
0.607401 # Tue Mar 27 04:35:02 IST 2018
0.606325 # Tue Mar 27 04:40:02 IST 2018
0.606223 # Tue Mar 27 04:45:01 IST 2018
0.606167 # Tue Mar 27 04:50:02 IST 2018
0.605716 # Tue Mar 27 04:55:01 IST 2018
0.605716 # Tue Mar 27 05:00:01 IST 2018
0.607064 # Tue Mar 27 05:05:01 IST 2018
Output:
0
-0.001076
-0.000102
.019944
..
..
.001348
CODE:
awk '{s=$0;getline;print s-$0;next}' a.txt
However, this does not work as expected...
Could you help me please?
You can use the following awk code:
$ awk 'NR==1{save=$1;next}NR>1{printf "%.6f\n",($1-save);save=$1}' file
0.000000
-0.001076
-0.000102
-0.000056
-0.000451
0.000000
0.001348
and format the output as you want by modifying the printf.
The way you are currently doing it will skip lines: each getline consumes a line of its own, so you only get differences for the pairs (1,2), (3,4), ... instead of every consecutive pair.
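An equivalent form that avoids the save/next bookkeeping: keep the previous value of field 1 in a variable and print the difference from line 2 onward (shown on inline sample data rather than the original file):

```shell
printf '%s\n' '0.607401 # t1' '0.607401 # t2' '0.606325 # t3' |
awk 'NR > 1 { printf "%.6f\n", $1 - prev }   # difference vs previous line
     { prev = $1 }'                          # remember current value
```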
I have a CSV file with some database benchmark results; here is an example:
Date;dbms;type;description;W;D;S;results;time;id
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;570;265;50
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;420;215;50
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;500;365;50
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;530;255;50
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;870;265;99
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;620;215;99
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;700;365;99
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;530;255;99
I need to process all rows with the same id (value of the last column) and get this:
Date;dbms;type;description;W;D;S;time;results;results/time
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;sum column 8;sum column 9;(sum column 8 / sum column 9)
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;sum column 8;sum column 9;(sum column 8 / sum column 9)
For now I can only do the sum of column 8, with this awk command:
awk -F";" '{print;sum+=$8 }END{print "sum " sum}' ./file.CSV
Edit:
I need help with a modification of the script I am already using. Here is the real input data:
Date;dbms;type;description;W;D;time;TotalTransactions;NOTransactions;id
Mon Jun 15 14:53:41 CEST 2015;sqlite;in-memory;TPC-C test results;2;1;10;272270;117508;50
Mon Jun 15 15:03:46 CEST 2015;sqlite;in-memory;TPC-C test results;2;1;10;280080;110063;50
Mon Jun 15 15:13:53 CEST 2015;sqlite;in-memory;TPC-C test results;5;1;10;144170;31815;60
Mon Jun 15 15:13:53 CEST 2015;sqlite;in-memory;TPC-C test results;5;1;10;137570;33910;60
Mon Jun 15 15:24:04 CEST 2015;hsql;in-memory;TPC-C test results;2;1;10;226660;97734;70
Mon Jun 15 15:34:08 CEST 2015;hsql;in-memory;TPC-C test results;2;1;10;210420;95113;70
Mon Jun 15 15:44:16 CEST 2015;hsql;in-memory;TPC-C test results;5;1;10;288360;119328;80
Mon Jun 15 15:44:16 CEST 2015;hsql;in-memory;TPC-C test results;5;1;10;270360;124328;80
I need to sum the values in the time, TotalTransactions and NOTransactions columns, and then add a column with the value (sum NOTransactions / sum time).
I am using this script:
awk 'BEGIN {FS=OFS=";"}
(NR==1) {$10="results/time"; print $0}
(NR>1 && NF) {sum7[$10]+=$7; sum8[$10]+=$8; sum9[$10]+=$9; other[$10]=$0}
END {for (i in sum8)
{$0=other[i]; $7=sum7[i];$8=sum8[i]; $9=sum9[i]; $10=sprintf("%.0f", sum9[i]/sum7[i]); print}}' ./logsFinal.csv
gives me this output:
;;;;;;;;;results/time
Mon Jun 15 15:03:46 CEST 2015;sqlite;in-memory;TPC-C test results;2;1;20;552350;227571;11379
Mon Jun 15 15:13:53 CEST 2015;sqlite;in-memory;TPC-C test results;5;1;20;281740;65725;3286
Mon Jun 15 15:34:08 CEST 2015;hsql;in-memory;TPC-C test results;2;1;20;437080;192847;9642
Mon Jun 15 15:44:16 CEST 2015;hsql;in-memory;TPC-C test results;5;1;20;558720;243656;12183
Date;dbms;type;description;W;D;0;0;0;-nan
The values look good (except for the header row), but I need these results without the id column (I want to delete the id column).
So I need the same values, but instead of identifying the rows to process by the same value in the id column, it must be rows with the same values in the dbms AND W AND D columns.
You can use this awk:
awk 'BEGIN{ FS=OFS=";" }
NR>1 && NF {
s=""
for(i=1; i<=7; i++)
s=s $i OFS;
a[$NF]=s;
sum8[$NF]+=$8
sum9[$NF]+=$9
} END{
for (i in a)
print a[i] sum8[i], sum9[i], (sum9[i]?sum8[i]/sum9[i]:"NaN")
}' file
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;2020;1100;1.83636
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;2720;1100;2.47273
This awk program will print the modified header and modify the output to contain the sums and their division:
awk 'BEGIN {FS=OFS=";"}
(NR==1) {$10="results/time"; print $0}
(NR>1 && NF) {sum8[$10]+=$8; sum9[$10]+=$9; other[$10]=$0}
END {for (i in sum8)
{$0=other[i]; $8=sum8[i]; $9=sum9[i]; $10=(sum9[i]?sum8[i]/sum9[i]:"NaN"); print}}'
which gives:
Date;dbms;type;description;W;D;S;results;time;results/time
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;2020;1100;1.83636
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;2720;1100;2.47273
You don't seem to care for the ID in the result, but if you do, just replace $10= with $11=.
Also, if you need to sum things based on values of more than one column, you can create a temporary variable (a in the example below) which is a concatenation of two columns and use it as an index in the arrays, like this:
awk 'BEGIN {FS=OFS=";"}
(NR==1) {$10="results/time"; print $0}
(NR>1 && NF) {a=$5$6; sum8[a]+=$8; sum9[a]+=$9; other[a]=$0}
END {for (i in sum8)
{$0=other[i]; $8=sum8[i]; $9=sum9[i]; $10=(sum9[i]?sum8[i]/sum9[i]:"NaN"); print}}'
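One caveat with keys built by plain concatenation like a=$5$6: W=2,D=11 and W=21,D=1 would both yield the key 211 and be merged. awk's built-in SUBSEP separator avoids such collisions, and also makes it natural to include dbms in the key as the question asks. A sketch against the first (S-column) dataset; the choice of $2, $5, $6 as grouping fields is an assumption based on the question:

```shell
awk 'BEGIN { FS = OFS = ";" }
     NR == 1 { $10 = "results/time"; print }
     NR > 1 && NF {
         a = $2 SUBSEP $5 SUBSEP $6        # key: dbms, W, D
         sum8[a] += $8; sum9[a] += $9; other[a] = $0
     }
     END {
         for (i in sum8) {
             $0 = other[i]
             $8 = sum8[i]; $9 = sum9[i]
             $10 = (sum9[i] ? sum8[i] / sum9[i] : "NaN")
             print
         }
     }' file
```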
I'm trying to place an open parenthesis at the start of the first line and a closing one at the end of the 4th line of each record. Below is an example of the data, followed by the output that I am looking for.
tester1
SERVICE_TICKET_CREATED
Thu Mar 19 23:27:57 UTC 2015
192.168.1.3
tester2
SERVICE_TICKET_CREATED
Fri Mar 20 00:31:59 UTC 2015
192.168.1.2
(tester1
SERVICE_TICKET_CREATED
Thu Mar 19 23:27:57 UTC 2015
192.168.1.3)
(tester2
SERVICE_TICKET_CREATED
Fri Mar 20 00:31:59 UTC 2015
192.168.1.2)
Using awk, you can do it as:
awk 'NR%4==1{print "("$0; next} NR%4==0{print $0")"; next}1'
Test
$ awk 'NR%4==1{print "("$0; next} NR%4==0{print $0")"; next}1' input
(tester1
SERVICE_TICKET_CREATED
Thu Mar 19 23:27:57 UTC 2015
192.168.1.3)
(tester2
SERVICE_TICKET_CREATED
Fri Mar 20 00:31:59 UTC 2015
192.168.1.2)
Shorter version
awk 'NR%4==1{$0="("$0} NR%4==0{$0=$0")"}1'
sed -r 's/^/(/;N;N;N;s/$/)/' input
The N command appends the next line to the pattern space. s/^/(/ puts an opening paren at the beginning of the pattern space, and s/$/)/ puts a closing one at the end.
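One caveat: if the line count is not a multiple of 4, N hits end-of-input mid-group; GNU sed then prints the partial group (without the closing paren), while POSIX sed discards it. On complete groups it works as advertised; a runnable sketch on the sample records (the -r flag from the answer is dropped here, since no extended-regex feature is used):

```shell
printf '%s\n' tester1 SERVICE_TICKET_CREATED 'Thu Mar 19 23:27:57 UTC 2015' 192.168.1.3 \
              tester2 SERVICE_TICKET_CREATED 'Fri Mar 20 00:31:59 UTC 2015' 192.168.1.2 |
sed 's/^/(/;N;N;N;s/$/)/'
```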