Adding new line before string matching regex in linux (Jenkins)

Hi, I'm trying to do some CSV manipulation before processing. Now I'm struggling with the following scenario.
Input file (no line breaks):
timeStamp,elapsed,label,responseCode,responseMessage,threadName,success,failureMessage,bytes,sentBytes,Latency,IdleTime 1611013105559,492,REST API,200,,REST API 1-1,true,,1221,32292,492,0 1611013107054,575,DB check,200,OK,REST API 1-1,true,,177,0,575,0 1611013251449,231,DB check,null 0,"java.sql.SQLException: Cannot create PoolableConnectionFactory (ORA-28040: No matching authentication protocol )",REST API 1-1,false,Row not inserted properly.,89,0,0,0
Desired output (new line before the timestamp):
timeStamp,elapsed,label,responseCode,responseMessage,threadName,success,failureMessage,bytes,sentBytes,Latency,IdleTime
1611013105559,492,REST API,200,,REST API 1-1,true,,1221,32292,492,0
1611013107054,575,DB check,200,OK,REST API 1-1,true,,177,0,575,0
1611013251449,231,DB check,null 0,"java.sql.SQLException: Cannot create PoolableConnectionFactory (ORA-28040: No matching authentication protocol )",REST API 1-1,false,Row not inserted properly.,89,0,0,0
Actual output:
timeStamp,elapsed,label,responseCode,responseMessage,threadName,success,failureMessage,bytes,sentBytes,Latency,IdleTime
[0-9]{13},492,REST API,200,,REST API 1-1,true,,1221,32292,492,0
[0-9]{13},575,DB check,200,OK,REST API 1-1,true,,177,0,575,0
[0-9]{13},231,DB check,null 0,"java.sql.SQLException: Cannot create PoolableConnectionFactory (ORA-28040: No matching authentication protocol )",REST API 1-1,false,Row not inserted properly.,89,0,0,0
Using this command:
awk -v patt=[0-9]{13} '$0 ~ patt {gsub(patt, "\n"patt)}1' < input.jtl > output.jtl
Can anyone help, please?
Regards Jan

With awk, could you please try the following, written and tested with the shown samples.
awk '{gsub(/[0-9]{13},[0-9]{3}/,ORS"&")} 1' Input_file > output.jtl
Explanation: gsub globally substitutes the matched regex [0-9]{13},[0-9]{3} with ORS (a newline) followed by the matched value itself (&). The trailing 1 prints the current line, edited or not.
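For instance, on a trimmed-down sample line (just a demonstration; the real input is the full .jtl line):
$ printf 'header 1611013105559,492,REST API\n' | awk '{gsub(/[0-9]{13},[0-9]{3}/,ORS"&")} 1'
header
1611013105559,492,REST API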

If you want to use a backreference, use gensub. In your case we might do
awk '{print gensub(/([0-9]{13})/, "\n\\1", "g")}' input.jtl
Note that I enclosed [0-9]{13} in (), making it the first (and only) group, which I then reference as \\1; g means global replacement (all occurrences). gensub returns a new string rather than modifying the record in place, so I print it. If you want to know more about gensub, read the String Functions docs.
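A quick run on a trimmed sample (gensub is GNU awk only):
$ echo 'x 1611013105559 y' | gawk '{print gensub(/([0-9]{13})/, "\n\\1", "g")}'
x
1611013105559 y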

You can use GNU sed like this:
sed -E 's/\<[0-9]{13}\>/\n&/g' input.jtl > output.jtl
Details:
-E - enables POSIX ERE syntax (less escaping required)
\<[0-9]{13}\> - matches a leading word boundary, thirteen digits and a trailing word boundary
\n& - replaces the match with a newline and the match itself
g - all occurrences on a line.
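For example (\<, \> and \n in the replacement are GNU extensions, hence the GNU sed requirement):
$ echo 'header 1611013105559,492' | sed -E 's/\<[0-9]{13}\>/\n&/g'
header
1611013105559,492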

Related

I am trying to use awk to extract a portion of each line in my file

I have a large file of user agent strings, and I want to extract one particular section of each request.
For input:
207.46.13.9 - - [22/Jan/2019:08:02:29 +0330] "GET /product/23474/%D9%84%DB%8C%D8%B2%D8%B1-%D8%A8%D8%AF%D9%86-%D8%AE%D8%A7%D9%86%DA%AF%DB%8C-%D8%B1%D9%85%DB%8C%D9%86%DA%AF%D8%AA%D9%88%D9%86-%D9%85%D8%AF%D9%84-Remington-Laser-Hair-Removal-IPL6250 HTTP/1.1" 200 41766 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" "-"
I am trying to get output:
23474
from after /product/ in the sample above.
I'm trying to use Awk, but I can't figure out how to get the regex expression that's required for this. I'm sure it's simpler than I think, but I'm quite new to this!
The pattern is the following:
RANDOMSTUFF/GET /product/XXXXX/MORERANDOMSTUFF
and I'm trying to grab XXXXX. I don't think I can use just the '/' since there will be other slashes in the line.
I've tried
awk 'BEGIN{FS="[GET \\/product\\/]"}{print $2}'
to try and use GET /product as a field separator, and then grab the next item. But I've realized this won't work (even if I got the regex expression right, which I didn't), since there might not be whitespace after the product ID I want to grab.
The square brackets you put around the FS turn it into a character class (matching any single one of those characters), which is not what you want here; but even after you fix that, the problem is that you then simply have two fields, as you are overriding the splitting on whitespace which Awk normally does.
Because the (horrible) date format always has exactly two slashes, I think you can actually do
awk -F / '/product/ { print $5 }' filename
Even though it divides the earlier part of the line into quite weird parts, the things after GET or PUT will always be $4, $5, etc.
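To see why the ID lands in $5, you can dump the first few slash-separated fields of a matching line (access.log is a stand-in for your file name):
$ awk -F/ '/product/ { for (i = 1; i <= 5; i++) print i, $i; exit }' access.log
1 207.46.13.9 - - [22
2 Jan
3 2019:08:02:29 +0330] "GET
4 product
5 23474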
If you wanted to keep your original idea, maybe try
awk 'BEGIN { FS="GET /product/" }
NF==2 {
    # second field is now everything after /product/ -- split on slash
    split($2, f, "/")
    print f[1]
}' file
... or very simply, brutally remove everything except the text you want;
awk '/\/product\// { sub(".*/product/", ""); sub("/.*", ""); print }' file
which might be better expressed as a simple sed script;
sed -n 's%.*GET /product/\([^/]*\)/.*%\1%p' file
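Run against the sample line, it prints just the ID; the % delimiters avoid having to escape every slash in the pattern (access.log again stands in for your file name):
$ sed -n 's%.*GET /product/\([^/]*\)/.*%\1%p' access.log
23474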

Matching emails from second column in one file against another file

I have two files, one with emails in it (useremail.txt), and another with email:phonenumber pairs (emailnumber.txt).
useremail.txt contains:
John smith:blabla#hotmail.com
David smith:haha#gmail.com
emailnumber.txt contains:
blabla#hotmail.com:093748594
So the solution needs to grab the email from the second column of useremail and then search through the emailnumber file and find matches and output John smith:093748594, so just the name and phone number.
I'm on Windows, so I need a gawk or grep solution. I have tried for a long time to get it to work with awk/grep and can't find the right solution; any help would be really appreciated.
Another in (GNU) awk:
$ awk '
BEGIN {
    # RS=ORS="\r\n"      # since you are using GNU awk, this replaces the sub()
    FS=OFS=":"           # input and output field separators
}
NR==FNR {                # processing the first file
    sub(/\r$/,"",$NF)    # remove the \r after the email OR uncomment RS above
    a[$2]=$1             # hash name, index on email
    next                 # on to the next record
}
($1 in a) {              # if email in second file matches one in hash
    print a[$1],$2       # output. If ORS uncommented above, output ends in \r;
                         # if not, you may want to add it to the print ... "\r"
}' useremail emailnumber
Output:
John smith:093748594
Since you tried the accepted answer in Linux and Windows and you use GNU awk, in the future you could set RS="\r?\n", which accepts both forms, \r\n and bare \n. However, I've recently run into a problem with that form in a specific condition (for which I've not yet filed a bug report).
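A minimal sketch of that more tolerant form, assuming GNU awk (the same script as above, with RS doing the work of the sub()):
gawk 'BEGIN { RS="\r?\n"; FS=OFS=":" }
NR==FNR { a[$2]=$1; next }
($1 in a) { print a[$1], $2 }' useremail.txt emailnumber.txt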
You could try this:
awk -F":" '(FNR==NR){a[$2]=$1}(FNR!=NR){print a[$1]":"$2}' useremail.txt emailnumber.txt
If there are entries in emailnumber.txt with no matching entry in useremail.txt:
awk -F":" '(FNR==NR){a[$2]=$1}(FNR!=NR){if(a[$1]){print a[$1]":"$2}}' useremail.txt emailnumber.txt

How to use awk for filtering (perl automation)

This is my txt file
type=0
vcpu_count=10
maste=0
h=0
p=0
memory=23.59
num=2
I want to get the vcpu_count and memory values and store them in some array through perl (automating script).
awk -F'=' '/vcpu_count/{printf "\n",$1}' .vmConfig.txt
I am using this command just to test on the terminal, but I am getting a blank line. How do I do it? I need to get these two values and check a condition.
If you are using Perl anyway, just use Perl for this too.
my %array;
open ($config, "<", ".vmConfig.txt") or die "$0: Could not open .vmConfig.txt: $!\n";
while (<$config>) {
    # keep only "vcpu_count=..." and "memory=..." lines
    next unless /^\s*(vcpu_count|memory)\s*=\s*(.*?)\s*\n/;
    $array{$1} = $2;    # key: setting name, value: its value
}
close($config);
If you don't want the result to be an associative array (aka hash), refactoring should be relatively easy.
The following awk may help you with the same.
1st solution:
awk '/vcpu_count/{print;next} /memory/{print}' Input_file
Output will be as follows:
vcpu_count=10
memory=23.59
2nd solution:
In case you want to print the values on a single line using printf, the following may help:
awk '/vcpu_count/{val=$0;next} /memory/{printf("%s AND %s\n",val,$0)}' Input_file
Output will be as follows:
vcpu_count=10 AND memory=23.59
When you use awk -F'=' '/vcpu_count/{printf "\n",$1}' .vmConfig.txt there are a couple of mistakes. Firstly, printf "\n" will only ever print a newline, as you have found. You need to add a format specifier - something like printf "%s\n", $2 will treat field 2 as a string and add it into the printed string. Checking out man printf at the command line will explain a bit more.
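For example, with the file shown above:
$ awk -F'=' '/vcpu_count/{printf "%s\n", $2}' .vmConfig.txt
10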
Secondly, as I changed there, when you used $1 you were using the first field, which is the key in this case (while $0 is the whole line).
tripleee's solution is probably the most appropriate, but if there is a particular reason to start awk to perform this before perl, the following may help.
As you have done, it splits on =, but then outputs as CSV, which you can change as appropriate. Even if the input lines are not always in the same order, it will output the values in a predictable order on a single line.
awk 'BEGIN {
    FS="="
    OFS=","    # tabs, etc. if wanted; delete for spaces
}
/vcpu_count/ {cpu=$2}
/memory/     {mem=$2}
END { print cpu, mem }' .vmConfig.txt
This gives
10,23.59

Please explain this awk script for taking Fixed Width to CSV

I'm learning some awk. I found an example online of taking a fixed width file and converting it to a csv file. There is just one part I do not understand, even after going through many man pages and online tutorials:
1: awk -v FIELDWIDTHS='1 10 4 2 2' -v OFS=',' '
2: { $1=$1 ""; print }
3: ' data.txt
That is verbatim from the sample online (found here).
What I don't understand is line 2. I get there is no condition, so the 'program' (contained in brackets) will always execute per record (line). I don't understand why it is doing the $1=$1 as well as the empty string statement "";. However, removing these causes incorrect behavior.
$1=$1 assigns a value to $1 (which just happens to be the same value it already had). Assigning any value to a field causes awk to rebuild the current record using the OFS value between fields (effectively replacing all FS or FIELDWIDTHS spacings with OFS).
$ echo 'a,b,c' | awk -F, -v OFS="-" '{print; $1=$1; print}'
a,b,c
a-b-c
The "" is because whoever wrote the script doesn't fully understand awk and thinks that's necessary to ensure numbers retain their precision by converting them to a string before the assignment.

Using Awk to process a file where each record has different fixed-width fields

I have some data files from a legacy system that I would like to process using Awk. Each file consists of a list of records. There are several different record types and each record type has a different set of fixed-width fields (there is no field separator character). The first two characters of the record indicate the type, from this you then know which fields should follow. A file might look something like this:
AAField1Field2LongerField3
BBField4Field5Field6VeryVeryLongField7Field8
CCField99
Using Gawk I can set the FIELDWIDTHS, but that applies to the whole file (unless I am missing some way of setting this on a record-by-record basis), or I can set FS to "" and process the file one character at a time, but that's a bit cumbersome.
Is there a good way to extract the fields from such a file using Awk?
Edit: Yes, I could use Perl (or something else). I'm still keen to know whether there is a sensible way of doing it with Awk though.
Hopefully this will lead you in the right direction. Assuming your multi-line records are guaranteed to be terminated by a 'CC' type row, you can pre-process your text file using simple if-then logic. I have presumed you require fields 1, 5 and 7 on one row; a sample awk script would be:
BEGIN {
    field1=""
    field5=""
    field7=""
}
{
    record_type = substr($0,1,2)
    if (record_type == "AA")
    {
        field1 = substr($0,3,6)
    }
    else if (record_type == "BB")
    {
        field5 = substr($0,9,6)
        field7 = substr($0,21,18)
    }
    else if (record_type == "CC")
    {
        print field1 "|" field5 "|" field7
    }
}
Create an awk script file called program.awk and pop that code into it. Execute the script using:
awk -f program.awk < my_multi_line_file.txt
Maybe you can use two passes:
1step.awk
/^AA/{printf "2 6 6 12" }
/^BB/{printf "2 6 6 6 18 6"}
/^CC/{printf "2 8" }
{printf "\n%s\n", $0}
2step.awk
NR%2 == 1 {FIELDWIDTHS=$0}
NR%2 == 0 {print $2}
And then
awk -f 1step.awk sample | awk -f 2step.awk
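For reference, the intermediate stream produced by the first pass interleaves a FIELDWIDTHS line before each record, so the second pass always reads the widths just before the data they describe:
2 6 6 12
AAField1Field2LongerField3
2 6 6 6 18 6
BBField4Field5Field6VeryVeryLongField7Field8
2 8
CCField99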
You probably need to suppress (or at least ignore) awk's built-in field separation code, and use a program along the lines of:
awk '/^AA/ { manually process record AA out of $0 }
/^BB/ { manually process record BB out of $0 }
/^CC/ { manually process record CC out of $0 }' file ...
The manual processing will be a bit fiddly - I suppose you'll need to use the substr function to extract each field by position, so what I've got as one line per record type will be more like one line per field in each record type, plus the follow-on printing.
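Fleshing out just the AA branch as an illustration (offsets taken from the sample record above):
awk '/^AA/ { f1 = substr($0, 3, 6)    # Field1
             f2 = substr($0, 9, 6)    # Field2
             f3 = substr($0, 15, 12)  # LongerField3
             print f1, f2, f3 }' file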
I do think you might be better off with Perl and its unpack feature, but awk can handle it too, albeit verbosely.
Could you use Perl and then select an unpack template based on the first two chars of the line?
Better to use some fully featured scripting language like perl or ruby.
What about two scripts? E.g. the 1st script inserts field separators based on the first characters, then the 2nd processes it.
Or, first of all, define some function in your AWK script which splits the lines into variables based on the input - I would go this way, for possible re-use.
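A rough sketch of that reusable-function idea (the function name and the widths passed in are made up for illustration):
awk '
function split_fixed(rec, widths, out,    i, n, pos, w) {
    # carve rec into out[] by the blank-separated widths string
    n = split(widths, w, " ")
    pos = 1
    for (i = 1; i <= n; i++) {
        out[i] = substr(rec, pos, w[i])
        pos += w[i]
    }
    return n
}
/^AA/ { split_fixed($0, "2 6 6 12", f); print f[2], f[4] }
' file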
