sed place parentheses at the beginning and close on the 4th line - linux

I'm trying to place an open parenthesis at the beginning of the first line and a close parenthesis at the end of the 4th line. Below is an example of the data, followed by the output I am looking for.
tester1
SERVICE_TICKET_CREATED
Thu Mar 19 23:27:57 UTC 2015
192.168.1.3
tester2
SERVICE_TICKET_CREATED
Fri Mar 20 00:31:59 UTC 2015
192.168.1.2
(tester1
SERVICE_TICKET_CREATED
Thu Mar 19 23:27:57 UTC 2015
192.168.1.3)
(tester2
SERVICE_TICKET_CREATED
Fri Mar 20 00:31:59 UTC 2015
192.168.1.2)

Using awk you can do it as
awk 'NR%4==1{print "("$0; next} NR%4==0{print $0")"; next}1'
Test
$ awk 'NR%4==1{print "("$0; next} NR%4==0{print $0")"; next}1' input
(tester1
SERVICE_TICKET_CREATED
Thu Mar 19 23:27:57 UTC 2015
192.168.1.3)
(tester2
SERVICE_TICKET_CREATED
Fri Mar 20 00:31:59 UTC 2015
192.168.1.2)
Shorter version
awk 'NR%4==1{$0="("$0} NR%4==0{$0=$0")"}1'

sed -r 's/^/(/;N;N;N;s/$/)/' input
The N command appends the next input line to the pattern space, so after three Ns the pattern space holds four lines. s/^/(/ puts an opening paren at the beginning of the pattern space and s/$/)/ puts a closing one at its end.
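If you have GNU sed, its first~step addresses give another way to hit the 1st and 4th line of every group directly (a sketch that assumes GNU sed; first~step is a GNU extension, not POSIX):
sed '1~4s/^/(/;0~4s/$/)/' input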

Related

How to prepend the filename to extracted lines via awk and grep

I'll preface this with the fact that I have no knowledge of awk (or maybe it's sed I need?) and fairly basic knowledge of grep and Linux, so apologies if this is a really dumb question. I find the man pages really difficult to decipher, and googling has gotten me quite far in my solution, but not far enough to tie the two things I need to do together. On to the problem...
I have some log files that I'm trying to extract rows from that are on a Linux server, named in the format aYYYYMMDD.log, that are all along the lines of:
Starting Process A
Wed 27 Oct 18:15:39 BST 2021 >>> /dir/task1 start <<<
...
Wed 27 Oct 18:15:40 BST 2021 >>> /dir/task1 end <<<
Wed 27 Oct 18:15:40 BST 2021 >>> /dir/task2 start <<<
...
Wed 27 Oct 18:15:42 BST 2021 >>> /dir/task2 end <<<
...
...
Wed 27 Oct 18:15:53 BST 2021 >>> /dir/taskreporting start <<<
...
Wed 27 Oct 18:15:53 BST 2021 >>> Starting task90 <<<
...
Wed 27 Oct 18:15:54 BST 2021 >>> Finishing task90 <<<
Wed 27 Oct 18:15:54 BST 2021 >>> Starting task91 <<<
...
Wed 27 Oct 18:15:57 BST 2021 >>> Finishing task91 <<<
...
...
Wed 27 Oct 18:16:12 BST 2021 >>> Starting task99 <<<
...
Wed 27 Oct 18:16:27 BST 2021 >>> Finishing task99 <<<
...
Wed 27 Oct 18:16:27 BST 2021 >>> /dir/taskreporting end <<<
...
Ended Process A
(I've excluded the log rows that are irrelevant to my requirement.)
I need to find what tasks were run during the taskreporting task, which I have managed to do with the following command (thanks to this other stackoverflow post):
awk '/taskreporting start/{flag=1;next}/taskreporting end/{flag=0}flag' <specific filename>.log | grep 'Starting task\|Finishing task'
This works well when I run it against a single file and produces output like:
Wed 27 Oct 18:15:53 BST 2021 >>> Starting task90 <<<
Wed 27 Oct 18:15:54 BST 2021 >>> Finishing task90 <<<
Wed 27 Oct 18:15:54 BST 2021 >>> Starting task91 <<<
Wed 27 Oct 18:15:57 BST 2021 >>> Finishing task91 <<<
...
Wed 27 Oct 18:16:12 BST 2021 >>> Starting task99 <<<
Wed 27 Oct 18:16:27 BST 2021 >>> Finishing task99 <<<
which is pretty much what I want to see. However, as I have multiple files to extract (having amended the filename in the above command appropriately, e.g. to *.log), I need to output the filename alongside the rows, so that I know which file the info belongs to, e.g. I'd like to see:
a211027.log Wed 27 Oct 18:15:53 BST 2021 >>> Starting task90 <<<
a211027.log Wed 27 Oct 18:15:54 BST 2021 >>> Finishing task90 <<<
a211027.log Wed 27 Oct 18:15:54 BST 2021 >>> Starting task91 <<<
a211027.log Wed 27 Oct 18:15:57 BST 2021 >>> Finishing task91 <<<
...
a211027.log Wed 27 Oct 18:16:12 BST 2021 >>> Starting task99 <<<
a211027.log Wed 27 Oct 18:16:27 BST 2021 >>> Finishing task99 <<<
I've googled and it seems like {print FILENAME} is what I need, but I couldn't figure out where to add it into my current awk command. How can I amend my awk command to get it to add the filename to the beginning of the rows? Or is there a better way of achieving my aim?
As you have provided most of the answer yourself, all that is needed is {print FILENAME, $0}, which will add the filename in front of the rest of the content, $0:
awk '/taskreporting start/{flag=1;next}/taskreporting end/{flag=0}flag {print FILENAME, $0}' <specific filename>.log
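If you still only want the Starting/Finishing lines, the grep stage can be folded into the same awk call, which keeps FILENAME available (a sketch of the combined command):
awk '/taskreporting start/{flag=1;next} /taskreporting end/{flag=0} flag && /Starting task|Finishing task/ {print FILENAME, $0}' *.log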

I want to find difference between 2 numbers stored in a file using shell script

Below is the content of the file. I want to find the difference between the first field of each line and the first field of the previous line.
0.607401 # Tue Mar 27 04:30:01 IST 2018
0.607401 # Tue Mar 27 04:35:02 IST 2018
0.606325 # Tue Mar 27 04:40:02 IST 2018
0.606223 # Tue Mar 27 04:45:01 IST 2018
0.606167 # Tue Mar 27 04:50:02 IST 2018
0.605716 # Tue Mar 27 04:55:01 IST 2018
0.605716 # Tue Mar 27 05:00:01 IST 2018
0.607064 # Tue Mar 27 05:05:01 IST 2018
Expected output:
0
-0.001076
-0.000102
.019944
..
..
.001348
CODE:
awk '{s=$0;getline;print s-$0;next}' a.txt
However this does not work as expected...
Could you help me please?
You can use the following awk code:
$ awk 'NR==1{save=$1;next}NR>1{printf "%.6f\n",($1-save);save=$1}' file
0.000000
-0.001076
-0.000102
-0.000056
-0.000451
0.000000
0.001348
and format the output as you want by modifying the printf.
The way you are currently doing it skips lines: getline consumes the next line, so you compare lines in pairs (1 and 2, 3 and 4, ...) and never print the difference between lines 2 and 3, 4 and 5, and so on. It also computes previous minus current, while your expected output is current minus previous.
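A slightly shorter equivalent of the answer above, tracking the previous value unconditionally (a minimal sketch):
awk 'NR>1{printf "%.6f\n", $1-prev} {prev=$1}' file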

Bash - print all sequential lines from file and ignore non-sequential ones

I need to extract all sequential lines from a text file based on the sequence in the 4th column. This sequence is the current time, and there is only one entry for each second (so only one line). Sometimes in the file the sequence will break, because something has slowed down the script that creates it and it has skipped a second or two. As in the below example:
Thu Jun 8 14:17:31 CEST 2017 sync:1
Thu Jun 8 14:17:32 CEST 2017 sync:1
Thu Jun 8 14:17:33 CEST 2017 sync:1
Thu Jun 8 14:17:37 CEST 2017 sync:1 <--
Thu Jun 8 14:17:38 CEST 2017 sync:1
Thu Jun 8 14:17:39 CEST 2017 sync:1
Thu Jun 8 14:17:40 CEST 2017 sync:1
I need bash to ignore this line and continue without printing it, but still print everything before and after it. How should I go about that?
If you only care about the seconds field (e.g., 14:17:39 -> 15:22:40 is clearly not sequential, but this code will think it is; if your data is sufficiently simple this may be fine):
awk 'NR==1 || $6 == (p + 1)%60 ; {p=$6}' FS=':| +' input
Here FS=':| +' splits on colons or runs of spaces, so $6 is the seconds part of the timestamp.
To check the hour and minute, you could simply convert to seconds from midnight or add logic to compare the hours and minutes. Something like:
awk '{s=$4 * 3600 + $5 * 60 + $6} NR==1 || s == (p + 1)%86400 ; {p=s}' FS=':| +' input
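If day boundaries matter too, GNU awk can build a full epoch timestamp (a sketch assuming gawk, since mktime is a gawk extension; it works in the local timezone, ignores the CEST field, and assumes English month abbreviations):
awk 'BEGIN{split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m, " ")
           for(i in m) mon[m[i]] = i}                 # map month name -> number
     {t = mktime($8 " " mon[$2] " " $3 " " $4 " " $5 " " $6)}
     NR==1 || t == p+1; {p=t}' FS=':| +' input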

BASH - conditional sum of columns and rows in csv file

I have a CSV file with some database benchmark results. Here is an example:
Date;dbms;type;description;W;D;S;results;time;id
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;570;265;50
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;420;215;50
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;500;365;50
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;530;255;50
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;870;265;99
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;620;215;99
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;700;365;99
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;530;255;99
I need to process all rows with the same id (the value of the last column) and get this:
Date;dbms;type;description;W;D;S;results;time;results/time
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;sum column 8;sum column 9;(sum column 8 / sum column 9)
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;sum column 8;sum column 9;(sum column 8 / sum column 9)
For now I can only do the sum of column 8 with this awk command:
awk -F";" '{print;sum+=$8 }END{print "sum " sum}' ./file.CSV
Edit:
I need help with a modification of the script I am already using. Here is the real input data:
Date;dbms;type;description;W;D;time;TotalTransactions;NOTransactions;id
Mon Jun 15 14:53:41 CEST 2015;sqlite;in-memory;TPC-C test results;2;1;10;272270;117508;50
Mon Jun 15 15:03:46 CEST 2015;sqlite;in-memory;TPC-C test results;2;1;10;280080;110063;50
Mon Jun 15 15:13:53 CEST 2015;sqlite;in-memory;TPC-C test results;5;1;10;144170;31815;60
Mon Jun 15 15:13:53 CEST 2015;sqlite;in-memory;TPC-C test results;5;1;10;137570;33910;60
Mon Jun 15 15:24:04 CEST 2015;hsql;in-memory;TPC-C test results;2;1;10;226660;97734;70
Mon Jun 15 15:34:08 CEST 2015;hsql;in-memory;TPC-C test results;2;1;10;210420;95113;70
Mon Jun 15 15:44:16 CEST 2015;hsql;in-memory;TPC-C test results;5;1;10;288360;119328;80
Mon Jun 15 15:44:16 CEST 2015;hsql;in-memory;TPC-C test results;5;1;10;270360;124328;80
I need to sum the values in the time, TotalTransactions and NOTransactions columns and then add a column with the value (sum NOTransactions / sum time).
I am using this script:
awk 'BEGIN {FS=OFS=";"}
(NR==1) {$10="results/time"; print $0}
(NR>1 && NF) {sum7[$10]+=$7; sum8[$10]+=$8; sum9[$10]+=$9; other[$10]=$0}
END {for (i in sum8)
{$0=other[i]; $7=sum7[i];$8=sum8[i]; $9=sum9[i]; $10=sprintf("%.0f", sum9[i]/sum7[i]); print}}' ./logsFinal.csv
which gives me this output:
;;;;;;;;;results/time
Mon Jun 15 15:03:46 CEST 2015;sqlite;in-memory;TPC-C test results;2;1;20;552350;227571;11379
Mon Jun 15 15:13:53 CEST 2015;sqlite;in-memory;TPC-C test results;5;1;20;281740;65725;3286
Mon Jun 15 15:34:08 CEST 2015;hsql;in-memory;TPC-C test results;2;1;20;437080;192847;9642
Mon Jun 15 15:44:16 CEST 2015;hsql;in-memory;TPC-C test results;5;1;20;558720;243656;12183
Date;dbms;type;description;W;D;0;0;0;-nan
The values look good (except the header row), but I need these results without the id column (I want to delete the id column).
So I need the same values, but instead of identifying the rows to be processed together by the same value in the id column, it must be rows with the same values in the dbms AND W AND D columns.
You can use this awk:
awk 'BEGIN{ FS=OFS=";" }
NR>1 && NF {
    s=""
    for(i=1; i<=7; i++)
        s=s $i OFS
    a[$NF]=s
    sum8[$NF]+=$8
    sum9[$NF]+=$9
}
END{
    for (i in a)
        print a[i] sum8[i], sum9[i], (sum9[i]?sum8[i]/sum9[i]:"NaN")
}' file
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;2020;1100;1.83636
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;2720;1100;2.47273
This awk program will print the modified header and modify the output to contain the sums and their division:
awk 'BEGIN {FS=OFS=";"}
(NR==1) {$10="results/time"; print $0}
(NR>1 && NF) {sum8[$10]+=$8; sum9[$10]+=$9; other[$10]=$0}
END {for (i in sum8)
{$0=other[i]; $8=sum8[i]; $9=sum9[i]; $10=(sum9[i]?sum8[i]/sum9[i]:"NaN"); print}}'
which gives:
Date;dbms;type;description;W;D;S;results;time;results/time
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;2020;1100;1.83636
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;2720;1100;2.47273
You don't seem to care for the ID in the result, but if you do, just replace $10= with $11=.
Also, if you need to sum things based on values of more than one column, you can create a temporary variable (a in the example below) which is a concatenation of two columns and use it as an index in the arrays, like this:
awk 'BEGIN {FS=OFS=";"}
(NR==1) {$10="results/time"; print $0}
(NR>1 && NF) {a=$5$6; sum8[a]+=$8; sum9[a]+=$9; other[a]=$0}
END {for (i in sum8)
{$0=other[i]; $8=sum8[i]; $9=sum9[i]; $10=(sum9[i]?sum8[i]/sum9[i]:"NaN"); print}}'
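Note that a=$5$6 joins the two values with nothing in between, so for example W=2,D=11 and W=21,D=1 would collide into one group; awk's built-in SUBSEP makes a safer key. To group by dbms, W and D as you asked, the key would be built from $2, $5 and $6 (a sketch along the lines of the script above):
awk 'BEGIN {FS=OFS=";"}
(NR==1) {$10="results/time"; print $0}
(NR>1 && NF) {a=$2 SUBSEP $5 SUBSEP $6; sum8[a]+=$8; sum9[a]+=$9; other[a]=$0}
END {for (i in sum8)
{$0=other[i]; $8=sum8[i]; $9=sum9[i]; $10=(sum9[i]?sum8[i]/sum9[i]:"NaN"); print}}' file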

sed add comma at end of every 4th line

I need to add a comma to the end of every fourth line. Here is an example of the output, followed by what I am looking for.
("tester1",
"SERVICE_TICKET_CREATED",
"Thu Mar 19 23:27:57 UTC 2015",
"73.217.129.159")
("tester1",
"SERVICE_TICKET_CREATED",
"Fri Mar 20 00:31:59 UTC 2015",
"73.217.129.159")
And what I need
("tester1",
"SERVICE_TICKET_CREATED",
"Thu Mar 19 23:27:57 UTC 2015",
"73.217.129.159"),
("tester1",
"SERVICE_TICKET_CREATED",
"Fri Mar 20 00:31:59 UTC 2015",
"73.217.129.159"),
Using awk
awk is well-suited to this:
$ awk '0==NR%4{$0=$0","} 1' file
("tester1",
"SERVICE_TICKET_CREATED",
"Thu Mar 19 23:27:57 UTC 2015",
"73.217.129.159"),
("tester1",
"SERVICE_TICKET_CREATED",
"Fri Mar 20 00:31:59 UTC 2015",
"73.217.129.159"),
How it works:
0==NR%4{$0=$0","}
NR is the line number. NR%4 is the line number modulo 4. Thus, 0 == NR%4 on every fourth line. For those lines, we add a comma at the end: $0=$0",".
1
This is awk's cryptic shorthand for print-the-line.
Using sed
It looks like you want a comma after every line that ends with a close-parens. If that is the case, then:
$ sed 's/)$/),/' file
("tester1",
"SERVICE_TICKET_CREATED",
"Thu Mar 19 23:27:57 UTC 2015",
"73.217.129.159"),
("tester1",
"SERVICE_TICKET_CREATED",
"Fri Mar 20 00:31:59 UTC 2015",
"73.217.129.159"),
If your goal is to add a comma after every closing ), then you can do the following:
sed 's/)$/),/'
This would also accommodate records that differ in number of lines.
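Alternatively, if the records really are always exactly four lines each, GNU sed's step addresses can target every fourth line regardless of its content (a GNU extension, not POSIX; a sketch):
sed '0~4s/$/,/' file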
