How to extract data with a field value greater than a particular number - linux

I am trying to extract the entries/methods that take more than a particular number of milliseconds, but I cannot work out the correct field separator:
awk -F'=' '$3>960' file
awk -F'=||' '$3>960' file
This is a sample line:
logAlias=Overall|logDurationMillis=34|logTimeStart=2019-09-12_05:22:02.602|logTimeStop=2019-09-12_05:22:02.636|logTraceUID=43adbcaf55de|getMethod1=26|getMethod2=0|getMethod3=0|getMethod4=1|getMethod5=8
I either see no result at all, or I get every transaction back.

Here is a generic, robust and easily extendible way:
awk -F'|' '{
    # store every key=value pair on the line in the array f
    for (i = 1; i <= NF; ++i) {
        split($i, kv, "=")
        f[kv[1]] = kv[2]
    }
}
f["logDurationMillis"] > 960' file

You may use
awk -F'[=|]' '$4>960' file
Note that [=|] is a regex matching either = or |, and the value you want to compare against appears in the fourth field.
See online demo:
s="logAlias=Overall|logDurationMillis=34|logTimeStart=2019-09-12_05:22:02.602|logTimeStop=2019-09-12_05:22:02.636|logTraceUID=43adbcaf55de|getMethod1=26|getMethod2=0|getMethod3=0|getMethod4=1|getMethod5=8
logAlias=Overall|logDurationMillis=980|logTimeStart=2019-09-12_05:22:02.602|logTimeStop=2019-09-12_05:22:02.636|logTraceUID=43adbcaf55de|getMethod1=26|getMethod2=0|getMethod3=0|getMethod4=1|getMethod5=8"
awk -F'[=|]' '$4>960' <<< "$s"
Output:
logAlias=Overall|logDurationMillis=980|logTimeStart=2019-09-12_05:22:02.602|logTimeStop=2019-09-12_05:22:02.636|logTraceUID=43adbcaf55de|getMethod1=26|getMethod2=0|getMethod3=0|getMethod4=1|getMethod5=8

You could try the following; it may help in case the string does not appear at a fixed position.
awk '
match($0,/logDurationMillis=[0-9]+/){
    # RSTART+18 skips the 18-character "logDurationMillis=" prefix,
    # leaving RLENGTH-18 digits to compare
    if(substr($0,RSTART+18,RLENGTH-18)+0>960){
        print
    }
}
' Input_file
2nd Solution:
awk '
match($0,/logDurationMillis=[0-9]+/){
    val=substr($0,RSTART,RLENGTH)
    sub(/.*=/,"",val)
    if(val+0>960){
        print
    }
}
' Input_file
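To sanity-check either solution without an Input_file, feed a here-string whose duration exceeds the threshold (a trimmed-down variant of the sample line; the line prints because 980 > 960):
awk '
match($0,/logDurationMillis=[0-9]+/){
    val=substr($0,RSTART,RLENGTH)
    sub(/.*=/,"",val)
    if(val+0>960){ print }
}
' <<< 'logAlias=Overall|logDurationMillis=980|logTraceUID=43adbcaf55de'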

Here is how I do it:
awk -F'logDurationMillis=' '{split($2,a,"[|]")} a[1]>960' file
If it is the log duration logDurationMillis you are looking for, I set it as the field separator. That way I know for sure the text that follows it starts with the value to get. Then I split that text on | to get the number at the front of it. After that, a[1] holds your value and you can test it against whatever you need. There is no loop, so it should be fast.
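For example, run against a shortened version of the sample line with the duration raised above the threshold, the filter fires and the whole line is printed:
awk -F'logDurationMillis=' '{split($2,a,"[|]")} a[1]>960' <<< 'logAlias=Overall|logDurationMillis=980|logTraceUID=43adbcaf55de'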

Related

how to filter log data where a specified column is bigger than a value

I need to check logs in the format below. Normally I get them with cat ***.log | grep --color -a 'keywords'. I want to get all log lines where proctm > 30000µs. Does anyone know how to write the Linux command? Thank you.
traceid=1f8a1b84c113b31e2fd6589cbf339600|***|errcode=0|proctm=28887µs|usetrx= false|ormstats=0 0 0|
With your shown samples, you could try the following. It can be done within a single awk.
awk '
/keyword/ && match($0,/proctm=[^µ]*/){
    val=substr($0,RSTART,RLENGTH)
    sub(/[^0-9]+/,"",val)
    if(val+0>30000){ print }
}
' Input_file
Explanation: a detailed explanation of the above.
awk '                                    ##Start the awk program.
/keyword/ && match($0,/proctm=[^µ]*/){   ##Require the keyword, and match from proctm= up to (not including) µ.
val=substr($0,RSTART,RLENGTH)            ##Create val, holding the substring that matched the regex.
sub(/[^0-9]+/,"",val)                    ##Strip the leading non-digits (proctm=) from val.
if(val+0>30000){ print }                 ##If val is greater than 30000, print the current line.
}
' Input_file                             ##Mention *.log to pass all log files here.
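A minimal check, assuming a UTF-8 locale so that µ is matched as a single character, with errcode=0 standing in for the keyword placeholder and the duration raised above the threshold:
awk '
/errcode=0/ && match($0,/proctm=[^µ]*/){
    val=substr($0,RSTART,RLENGTH)
    sub(/[^0-9]+/,"",val)
    if(val+0>30000){ print }
}
' <<< 'traceid=1f8a1b84c113b31e2fd6589cbf339600|errcode=0|proctm=38887µs|usetrx= false|ormstats=0 0 0|'
The line is printed because 38887 exceeds 30000.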

How to edit output rows from awk with defined position?

Is there a way to solve this?
I have a bash script which creates a .dat and a .log file from source files.
I'm using awk with print and the positions of what I need to print. The problem is with the last position, ID2 (below). It should be just \*[0-9]{3}\*#, but in some cases there is a string matching [0-9]{12}\[00\]> in front of it.
Then the row looks, for example, like this:
2020-01-11 01:01:01;test;test123;123456789123[00]>*123*#
What I need is to remove that leading string, so the file contains:
2020-01-11 01:01:01;test;test123;*123*#
File structure:
YYYY-DD-MM HH:MM:SS;string;ID1;ID2
I will be happy for any advice, thanks.
awk 'BEGIN{FS=OFS=";"} {$NF=substr($NF,length($NF)-5)}1' file
Here we keep only the last 6 characters of the last field, with the semicolon as field separator. If there is nothing else in front of that *ID*#, we keep all of it.
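A quick check against the sample line from the question (not part of the original answer):
awk 'BEGIN{FS=OFS=";"} {$NF=substr($NF,length($NF)-5)}1' <<< '2020-01-11 01:01:01;test;test123;123456789123[00]>*123*#'
Output:
2020-01-11 01:01:01;test;test123;*123*#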
Delete everything before the first *:
$ awk 'BEGIN{FS=OFS=";"}{sub(/^[^*]*/,"",$NF)}1' file
Output:
2020-01-11 01:01:01;test;test123;*123*#
You could try the following, written and tested with your shown samples in GNU awk.
awk '
match($0,/[0-9]{12}\[[0-9]+\]>/) && /\*[0-9]{3}\*#/{
    print substr($0,1,RSTART-1) substr($0,RSTART+RLENGTH)
}
' Input_file
Explanation: a detailed explanation of the above.
awk '                                                    ##Start the awk program.
match($0,/[0-9]{12}\[[0-9]+\]>/) && /\*[0-9]{3}\*#/{     ##Match 12 digits, then [, one or more digits, ] and >; also require *3 digits*# on the line.
print substr($0,1,RSTART-1) substr($0,RSTART+RLENGTH)    ##If the condition is TRUE, print everything before the match followed by everything after it.
}
' Input_file                                             ##Mention the Input_file name here.
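For instance (with a reasonably recent gawk, where interval expressions such as {12} are enabled by default), the sample line comes out as requested:
awk '
match($0,/[0-9]{12}\[[0-9]+\]>/) && /\*[0-9]{3}\*#/{
    print substr($0,1,RSTART-1) substr($0,RSTART+RLENGTH)
}
' <<< '2020-01-11 01:01:01;test;test123;123456789123[00]>*123*#'
Output:
2020-01-11 01:01:01;test;test123;*123*#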

Grouping related rows of data into a single row in Linux

I have a csv file that is generated automatically every day and has output similar to the following example:
"N","3.5",3,"Bob","10/29/17"
"Y","4.5",5,"Bob","10/11/18"
"Y","5",6,"Bob","10/28/18"
"Y","3",1,"Jim",
"N","4",2,"Jim","09/29/17"
"N","2.5",4,"Joe","01/26/18"
I need to transform the text so that it is grouped by person (the fourth column), with all of a person's records on a single row, repeating columns 1, 2, 3 and 5 in that sequence. Some cells may be missing data but must remain in the sequence so that the columns line up. So the output I need will look like this:
"Bob","N","3.5",3,"10/29/17","Y","4.5",5,"10/11/18","Y","5",6,"10/28/18"
"Jim","Y","3",1,,"N","4",2,"09/29/17"
"Joe","N","2.5",4,"01/26/18"
I am open to using sed, awk, or pretty much any standard Linux command to get this task done. I've been trying to use awk, and though I get close, I can't figure out how to finish it.
Here is the command where I'm close. It lists the header and the names, but no other data:
awk -F"," 'NR==1; NR>1 {a[$4]=a[$4] ? i : ""} END {for (i in a) {print i}}' test2.csv
You need a little more code:
$ awk 'BEGIN {FS=OFS=","}
{k=$4; $4=$5; NF--; a[k]=(k in a?a[k] FS $0:$0)}
END {for(k in a) print k,a[k]}' file
"Bob","N","3.5",3,"10/29/17" ,"Y","4.5",5,"10/11/18" ,"Y","5",6,"10/28/18"
"Jim","Y","3",1, ,"N","4",2,"09/29/17"
"Joe","N","2.5",4,"01/26/18"
Note that the NF-- trick may not work in all awks.
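If your awk does not rebuild the record when NF is decremented, a portable variant with the same logic is to assemble the kept fields by hand (a sketch):
awk 'BEGIN {FS=OFS=","}
     {k=$4; rec=$1 OFS $2 OFS $3 OFS $5
      a[k]=(k in a ? a[k] OFS rec : rec)}
     END {for (k in a) print k, a[k]}' file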
You could also try the following, reading the Input_file twice; it prints the output in the same order in which the 4th-column values first appear in the Input_file.
awk '
BEGIN{
    FS=OFS=","
}
FNR==NR{
    a[$4]=a[$4]?a[$4] OFS $1 OFS $2 OFS $3 OFS $5:$4 OFS $1 OFS $2 OFS $3 OFS $5
    next
}
a[$4]{
    print a[$4]
    delete a[$4]
}
' Input_file Input_file
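With the shown sample this should print exactly the requested output, in the order the names first appear, including the empty cell for Jim's missing date:
"Bob","N","3.5",3,"10/29/17","Y","4.5",5,"10/11/18","Y","5",6,"10/28/18"
"Jim","Y","3",1,,"N","4",2,"09/29/17"
"Joe","N","2.5",4,"01/26/18"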
If there is any chance that any of the CSV values contains a comma, then a "CSV-aware" tool would be advisable to obtain a reliable but straightforward solution.
One approach would be to use one of the many readily available csv2tsv command-line tools. A variety of elegant solutions then becomes possible. For example, one could pipe the CSV into csv2tsv, awk, and tsv2csv.
Here is another solution that uses csv2tsv and jq:
csv2tsv < input.csv | jq -Rrn '
[inputs | split("\t")]
| group_by(.[3])[]
| sort_by(.[2])
| [.[0][3]] + ( map( del(.[3])) | add)
| @csv
'
This produces:
"Bob","N","3.5","3","10/29/17 ","Y","4.5","5","10/11/18 ","Y","5","6","10/28/18 "
"Jim","Y","3","1"," ","N","4","2","09/29/17 "
"Joe","N","2.5","4","01/26/18"
Trimming the excess spaces is left as an exercise :-)
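If you would rather do that trimming inline, one way (assuming jq 1.5 or later for gsub) is to clean each cell right after splitting:
csv2tsv < input.csv | jq -Rrn '
[inputs | split("\t") | map(gsub("^ +| +$"; ""))]
| group_by(.[3])[]
| sort_by(.[2])
| [.[0][3]] + ( map( del(.[3])) | add)
| @csv
'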

Use sed to find and replace a number followed by its successor in bash

I have a string that contains multiple occurrences of number ranges, which are separated by a comma, e.g.,
2-12,59-89,90-102,103-492,593-3990,3991-4930
Now I would like to find all directly neighbouring ranges and remove them from the string, i.e., remove anything of the form -(x),(x+1), to get something like this:
2-12,59-492,593-4930
Can anyone think of a method to accomplish this? I honestly cannot post anything that I have tried, because all my attempts were highly unsuccessful. It seems to me that it is not actually possible to find anything of the form -(x),(x+1) using sed, since that would require performing operations on, or comparisons against, a number found by the very command that is searching for it.
If everybody agrees that sed is NOT the correct tool for doing this, I will do it another way, but I am still interested if it's possible.
with awk
awk -F, -v RS="-" -v ORS="-" '$2!=$1+1' file
With the appropriate separator settings, the record is printed when the second field is not the first field plus one.
RS is the record separator and ORS is the output record separator.
test:
> awk -F, -v RS="-" -v ORS="-" '$2!=$1+1' <<< "2-12,59-89,90-102,103-492,593-3990,3991-4930"
2-12,59-492,593-4930
awk solution:
awk -F'-' '{ r=$1;
    for (i=2; i<=NF; i++) {
        split($i, a, ",");
        r=sprintf("%s%s", r, a[2]-a[1]==1? "" : FS $i)
    }
    print r
}' file
-F'-' - treat -(hyphen) as field separator
r - resulting string
split($i, a, ",") - split adjacent range boundaries into array a by separator ,
a[2]-a[1]==1 - crucial condition, reflects (x),(x+1)
The output:
2-12,59-492,593-4930
This might work for you (GNU sed):
sed -r ' s/^/\n/;:a;ta;s/\n([^-]*-)([0-9]*)(.*,)/\1\n\2\n\2\n\3/;Td;:b;s/(\n.*\n.*)9(_*\n)/\1_\2/;tb;s/(\n.*\n)(_*\n)/\10\2/;s/$/\n0123456789/;s/(\n.*\n[0-9]*)([0-8])(_*\n.*)\n.*\2(.).*/\1\4\3/;:z;tz;s/(\n.*\n[^_]*)_([^\n]*\n)/\10\2/;tz;:c;tc;s/([0-9]*-)\n(.*)\n(.*)\n,(\3)-/\n\1/;ta;s/\n(.*)\n.*\n,/\1,\n/;ta;:d;s/\n//g' file
This proof-of-concept sed solution iteratively increments the end of one range and compares it with the start of the next. If the comparison is true, it removes both and repeats; otherwise it moves on to the next range, repeating until all ranges have been compared.

Awk using index with Substring

I have a command to cut a string, and I would like to understand the details of controlling the index in Linux "awk".
I have two different cases, and I want to get the word "Test" from the example strings below.
1. "Test-01-02-03"
2. "01-02-03-Test-Ref1-Ref2"
The first one I can get like this:
substr("Test-01-02-03",0,index("Test-01-02-03","-"))
-> This brings back only "Test".
For the second case, I am not sure how I can get "Test" using the index function.
Do you have any idea how to do this in awk?
Thanks!
This is how to use index() to find/print a substring:
$ cat file
Test-01-02-03
01-02-03-Test-Ref1-Ref2
$ awk -v tgt="Test" 's=index($0,tgt){print substr($0,s,length(tgt))}' file
Test
Test
but that may not be the best solution for whatever your actual problem is.
For comparison here's how to do the equivalent with match() for an RE:
$ awk -v tgt="Test" 'match($0,tgt){print substr($0,RSTART,RLENGTH)}' file
Test
Test
and if you like the match() synopsis, here's how to write your own function to do it for strings:
awk -v tgt="Test" '
function strmatch(source,target) {
    SSTART = index(source,target)
    SLENGTH = length(target)
    return SSTART
}
strmatch($0,tgt){print substr($0,SSTART,SLENGTH)}
' file
If these lines are the direct input to awk, then the following work:
echo 'Test-01-02-03' | awk -F- '{print $1}' # First field
echo '01-02-03-Test-Ref1-Ref2' | awk -F- '{print $(NF-2)}' # Third field from the end.
If these lines are pulled out of a larger line in an awk script and need to be split again then the following snippets will do that:
str="Test-01-02-03"; split(str, a, /-/); print a[1]
str="01-02-03-Test-Ref1-Ref2"; numfields=split(str, a, /-/); print a[numfields-2]
