how to make sed match a pattern with multiple lines - multithreading

Background
I use sed a lot to track logs that begin with lines that match a certain pattern. I use this command:
sed -ne '/pattern/ p' infile >outfile
In the code, I simply prepend log lines with identifiers so that I can filter on them later (my identifiers are the order number and the thread number). So, for example, this log line:
9/14/2017 10:19:58 AM::: ORD7123::TH41361::Failed Checkout
is for order 7123 thread 41361. So the sed command above (if I'm filtering for all logs pertaining to order 7123) will look like:
sed -ne '/ORD7123/ p' infile >outfile
Problem
The problem happens when the log for a single order/thread combination spans multiple lines like so:
9/14/2017 10:19:58 AM::: ORD7123::TH41361::Failed Checking Out With Credit Card for # 3 times. Error: {
"order": "country is required",
"credit_card": "year is not a valid year"
}.
9/14/2017 10:19:59 AM::: ORD7123::TH41347::Successfully Got a something Solution
Using the above sed command, my output will look like this:
9/14/2017 10:19:58 AM::: ORD7123::TH41361::Failed Checking Out With Credit Card for # 3 times. Error: {
9/14/2017 10:19:59 AM::: ORD7123::TH41347::Successfully Got a something Solution
Suggestions/Analysis
We faced this problem before (when we had control over the creation of the logs), and the way we dealt with it was by replacing newlines with \n or something like that. In this case I don't have much control over how the logs are created, so I must deal with the logs as they are.
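One idea I'm considering (a rough sketch, assuming every new record starts with a date like 9/14/2017) is to first join continuation lines back onto the record they belong to, replacing the real newlines with a literal \n as we used to, and then keep using the original sed filter:
awk 'NR > 1 && !/^[0-9]+\/[0-9]+\/[0-9]+ / { printf "\\n%s", $0; next }   # continuation line: append it as a literal \n plus the text
     NR > 1 { print "" }                                                  # a dated line starts a new record: terminate the previous one
     { printf "%s", $0 }                                                  # begin the new record (no trailing newline yet)
     END { print "" }' infile | sed -ne '/ORD7123/ p' >outfile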

awk should be able to handle this, working the way logstash does when it collects log messages that span multiple lines. Looking at your example, it appears that you want to capture text between { ... } that goes over multiple lines. Hence you can use something like this:
awk '/ORD7123/{if (/{$/) p=1; print; next} p; p && /^}/{p=0}' file.log
If you don't always have { ... } as shown in the example logs, then you can use this awk command:
awk '/ORD7123/ {p=NR} NR==p+1 {p = (/^[0-9]{1,2}\/[0-9]{1,2}\/[0-9]{4} / ? 0 : NR)} p' file
9/14/2017 10:19:58 AM::: ORD7123::TH41361::Failed Checking Out With Credit Card for # 3 times. Error: {
"order": "country is required",
"credit_card": "year is not a valid year"
}.
9/14/2017 10:19:59 AM::: ORD7123::TH41347::Successfully Got a something Solution
This awk command looks for the starting date pattern in every log line, and if it doesn't find one, the line is considered a continuation of the previous log message.

awk solution:
awk -v p="ORD7123" '$0~p{ print;
    while(getline nl > 0 && (nl!~/^[0-9]+\/[0-9]{2}/ || nl~p)){
        print nl
    }
}' inputfile
Example output:
9/14/2017 10:19:58 AM::: ORD7123::TH41361::Failed Checking Out With Credit Card for # 3 times. Error: {
"order": "country is required",
"credit_card": "year is not a valid year"
}.
9/14/2017 10:19:59 AM::: ORD7123::TH41347::Successfully Got a something Solution

This might work for you (GNU sed):
sed ':a;/ORD7123/!d;:b;n;/^..\?\/..\?\/.... ..:..:.. /ba;bb' file
Match on the required string (ORD7123); otherwise delete the line. On a match, print the line and read subsequent lines, printing them until one begins with a date and time, at which point check that line for the required string again.
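For readability, here is the same script written out with comments (a sketch for GNU sed; save it as, say, multi.sed and run sed -f multi.sed file -- the file name is just for illustration):
:a
# keep only records whose first line contains the order id
/ORD7123/!d
:b
# auto-print the current line and load the next one
n
# a line starting with a date/time stamp begins a new record: test it at :a
/^..\?\/..\?\/.... ..:..:.. /ba
# anything else is a continuation line of the matched record: keep printing
bb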

sed is for simple substitutions on individual lines; that is all. You're not trying to do a simple substitution on an individual line, so you shouldn't be considering sed. Just use awk:
$ cat tst.awk
/^[0-9]/ { prt() }
{ rec = (rec=="" ? "" : rec ORS) $0 }
END { prt() }

function prt() {
    if ( rec ~ /ORD7123/ ) {
        print rec
    }
    rec = ""
}
$ awk -f tst.awk file
9/14/2017 10:19:58 AM::: ORD7123::TH41361::Failed Checking Out With Credit Card for # 3 times. Error: {
"order": "country is required",
"credit_card": "year is not a valid year"
}.
9/14/2017 10:19:59 AM::: ORD7123::TH41347::Successfully Got a something Solution
It can very easily be tweaked to remove the newlines in the middle of the records, if that would make further processing simpler, by changing ORS to OFS (or any other string you like) where the record is being built:
$ cat tst.awk
/^[0-9]/ { prt() }
{ rec = (rec=="" ? "" : rec OFS) $0 }
END { prt() }

function prt() {
    if ( rec ~ /ORD7123/ ) {
        print rec
    }
    rec = ""
}
$ awk -f tst.awk file
9/14/2017 10:19:58 AM::: ORD7123::TH41361::Failed Checking Out With Credit Card for # 3 times. Error: { "order": "country is required", "credit_card": "year is not a valid year" }.
9/14/2017 10:19:59 AM::: ORD7123::TH41347::Successfully Got a something Solution
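Since the order number changes from run to run, you could also avoid hardcoding ORD7123 by passing it in with -v; a small sketch of that tweak (the variable name ord is just an example):
$ cat tst.awk
/^[0-9]/ { prt() }
{ rec = (rec=="" ? "" : rec ORS) $0 }
END { prt() }

function prt() {
    # "ord" comes from the command line via -v
    if ( rec ~ ord ) {
        print rec
    }
    rec = ""
}
$ awk -v ord='ORD7123' -f tst.awk file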

Related

I need to make an awk script to parse text in a file. I am not sure if I am doing it correctly

Hi, I need to make an awk script in order to parse a CSV file and sort it in bash.
I need to get a list of presidents from Wikipedia and sort their years in office by year.
When it is all sorted out, each year needs to be in a text file.
I'm not sure I am doing it correctly.
Here is a portion of my csv file:
28,Woodrow Wilson,http:..en.wikipedia.org.wiki.Woodrow_Wilson,4.03.1913,4.03.1921,Democratic ,WoodrowWilson.gif,thmb_WoodrowWilson.gif,New Jersey
29,Warren G. Harding,http:..en.wikipedia.org.wiki.Warren_G._Harding,4.03.1921,2.8.1923,Republican ,WarrenGHarding.gif,thmb_WarrenGHarding.gif,Ohio
I want to include $2, which I think is the name, and sort by $4, which I think is the date the president took office.
Here is my actual awk file:
#!/usr/bin/awk -f
-F, '{
if (substr($4,length($4)-3,2) == "17")
{ print $2 > Presidents1700 }
else if (substr($4,length($4)-3,2) == "18")
{ print $2 > Presidents1800 }
else if (substr($4,length($4)-3,2) == "19")
{ print $2 > Presidents1900 }
else if (substr($4,length($4)-3,2) == "20")
{ print $2 > Presidents2000 }
}'
Here is my function running it:
SplitFile() {
    printf "Task 4: Splitting file based on century\n"
    awk -f $AFILE ${custFolder}/${month}/$DFILE
}
Where $AFILE is my awk file, and the directories listed on the right lead to my actual file.
Here is a portion of my output; it's actually several hundred lines long, but this is what a portion of it looks like:
awk: presidentData/10/presidents.csv:47: 46,Joseph Biden,http:..en.wikipedia.org.wiki.Joe_Biden,20.01.2021,Incumbent , Democratic , Joe_Biden.jpg,thmb_Joe_Biden.jpg,Pennsilvania
awk: presidentData/10/presidents.csv:47: ^ syntax error
awk: presidentData/10/presidents.csv:47: 46,Joseph Biden,http:..en.wikipedia.org.wiki.Joe_Biden,20.01.2021,Incumbent , Democratic , Joe_Biden.jpg,thmb_Joe_Biden.jpg,Pennsilvania
awk: presidentData/10/presidents.csv:47: ^ syntax error
I know the output is not very helpful; I would rather just screenshot it, but I can't. I tried getting help, but these online classes can be really hard and getting help at a distance is tough. The syntax errors above seem to be pointing to commas in the csv file.
After the edits, it's clear you are trying to classify the presidents by century, writing each one to a file for the century in which that president served.
As stated in my comments above, you don't include single quotes or command-line arguments in an awk script file. You use the BEGIN {...} rule to set the field separator FS = ",". Then there are several ways you can split things in the fourth field; split() is as easy as anything else.
That will leave you with the year from the fourth field in arr[3] (split() numbers the resulting pieces starting from 1). Then it is just a matter of comparing against the largest century first and decreasing from there, redirecting the output to the output file for that century.
Continuing with what you started, your awk script will look similar to:
#!/usr/bin/awk -f
BEGIN { FS = "," }
{
    split($4, arr, ".")
    if (arr[3] >= 2000)
        print $2 > "Presidents2000"
    else if (arr[3] >= 1900)
        print $2 > "Presidents1900"
    else if (arr[3] >= 1800)
        print $2 > "Presidents1800"
    else if (arr[3] >= 1700)
        print $2 > "Presidents1700"
}
Now make it executable (for convenience). Presuming the script is in the file pres.awk:
$ chmod +x pres.awk
Now simply call the awk script passing the .csv filename as the argument, e.g.
$ ./pres.awk my.csv
Now list the files named Presid* and see what is created:
$ ls -al Presid*
-rw-r--r-- 1 david david 33 Oct 8 22:28 Presidents1900
And verify the contents are what you needed:
$ cat Presidents1900
Woodrow Wilson
Warren G. Harding
Presuming that is the output you are looking for based on your attempt.
(Note: you need to quote the output file name so that, e.g., Presidents1900 isn't taken as an awk variable that hasn't been set yet.)
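For example, here is the difference the quotes make (a small illustration; gawk's exact error message may vary):
print $2 > Presidents1900      # unquoted: Presidents1900 is an unset awk variable, so the
                               # redirection target is an empty string and gawk reports an error
print $2 > "Presidents1900"    # quoted: output goes to a file literally named Presidents1900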
Let me know if you have further questions.

How to make awk grab string in between a second set of single-quotes

Please help, this is driving me mad.
I've got a standard Wp-config.php file and I'm trying to get awk to output only the database name, database username and password on a single line, but no matter what I try it spits out either irrelevant nonsense or syntax errors.
define('DB_NAME', 'pinkywp_wrdp1');
/** MySQL database username */
define('DB_USER', 'pinkywp_user1');
/** MySQL database password */
define('DB_PASSWORD', 'Mq2uMCLuGvfyw');
Desired output:
pinkywp_wrdp1 pinkywp_user1 Mq2uMCLuGvfyw
Actual output:
./dbinfo.sh: line 28: unexpected EOF while looking for matching `''
./dbinfo.sh: line 73: syntax error: unexpected end of file
With GNU awk:
Use ' as the field separator and, if the current line contains 5 columns, print the content of column 4 with a trailing blank.
awk -F "'" 'NF==5 {printf("%s ",$4)}' file
Output:
pinkywp_wrdp1 pinkywp_user1 Mq2uMCLuGvfyw
$ awk -F"'" '$1~/^define/ && $2~/^DB_/{ printf "%s%s", $4, (++cnt%3 ? OFS : ORS)}' file
pinkywp_wrdp1 pinkywp_user1 Mq2uMCLuGvfyw
A few awk solutions:
1) with GNU flavor:
awk -v RS="');" '{ printf "%s%s", (NR==1? "":OFS), substr($NF, 2) }END{ print "" }' file
2) quotes-independent solution:
awk -F', ' '/define/{
    gsub(/^["\047]|["\047]\);$/, "", $2);
    printf "%s%s", (NR==1? "":" "), $2
}
END{ print "" }' file
The output (for both approaches):
pinkywp_wrdp1 pinkywp_user1 Mq2uMCLuGvfyw
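If the point of dbinfo.sh is to use those three values later in the script, one option (a sketch; it assumes the file is named wp-config.php and that DB_NAME, DB_USER and DB_PASSWORD appear in that order) is to read the awk output straight into shell variables:
read -r dbname dbuser dbpass < <(
  awk -F"'" '$1~/^define/ && $2~/^DB_(NAME|USER|PASSWORD)$/ { printf "%s ", $4 } END { print "" }' wp-config.php
)
echo "DB: $dbname  user: $dbuser  pass: $dbpass"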

Move every x(dynamic) number of lines to a single line [Shell]

So I have data that looks like this
/blah
etc1: etc
etc2
etc3: etc
etc4
/blah
etc1: etc
etc2
etc3
/blah
etc1: etc
etc2
etc3: etc
etc4
/blah
etc1
etc2
So I can't use a specific number of lines; my thought was to use / as a delimiter and put every line after it, until the next /, on the same line (comma delimited?).
Ideal Expected Output:
/blah,etc1: etc,etc2,etc3: etc,etc4,,
/blah,etc1,etc2,etc3,,
/blah,etc1: etc,etc2,etc3: etc,etc4,,
/blah,etc1,etc2,,
Prefer shell/bash/ksh but an excel solution would work too.
Here's an awk solution:
awk '
/^\// { if (NR > 1) print ","; printf "%s,", $0; next }
{ gsub(/^ +| +$/, ""); printf "%s,", $0 }
END { print "," }
' file
Note that it assumes that the input file starts with a /blah-like line, but doesn't end with one.
Crammed into a (less readable) one-liner:
awk '/^\// {if(NR>1) print","; printf"%s,",$0; next} {gsub(/^ +| +$/, ""); printf"%s,",$0} END {print","}' file
A sed solution:
sed -r ':a;N;$!ba;s/\n\s+/,/g' input | sed 's/$/,,/'
you get:
/blah,etc1: etc,etc2,etc3: etc,etc4,,
/blah,etc1: etc,etc2,etc3,,
/blah,etc1: etc,etc2,etc3: etc,etc4,,
/blah,etc1,etc2,,

Search for a line in file and replace a next pattern matched line with newline in linux(Shell scripting)

I have a file with the below data. Let's call it myfile.xml:
.........
<header>unique_name</header>
......
somelines
......
<version>I need only this line</version>
......
......
<version>This is second match of version, which I dont want</version>
Now I'm in search of Linux commands that do the below things:
There can be many <header>.*</header> lines, but I need <header>unique_name</header>. This is a unique header name that I will hardcode. It appears only once in the file, but can appear anywhere in the file.
Search for the <version>.*</version> that appears after <header>unique_name</header> in myfile.xml; it should be replaced with <version>new version number</version>.
I've tried implementing this using grep, sed, and awk, but I could not get it to work. Please advise.
Input and Expected Output:
Input variables:
stringtoFIND=<header>unique_name</header>
newversionNUMBER=new_version_number
The myfile.xml file contents below:
<header>Some strings</header>
......Somelines...........
<version>I dont need this line, since header doesnt match stringtoFIND variable</version>
<header>unique_name</header>
.............
<version>I need only this line</version>
...........
..........
<version>I Dont need this line</version>
.........
Expected output
<header>Some strings</header>
......Somelines...........
<version>I dont need this line, since header doesnt match stringtoFIND variable</version>
<header>unique_name</header>
.............
<version>new_version_number</version>
...........
..........
<version>I Dont need this line</version>
.........
Using GNU awk for the 3rd arg to match():
$ cat tst.awk
match($0,/<header>(.*)<\/header>/,a) {
    inBlock = (a[1] == "unique_name" ? 1 : 0)
}
inBlock && match($0,/(.*<version>).*(<\/version>.*)/,a) {
    $0 = a[1] "new_version_number" a[2]
    inBlock = 0
}
{ print }
$ awk -f tst.awk file
<header>Some strings</header>
......Somelines...........
<version>I dont need this line, since header doesnt match stringtoFIND variable</version>
<header>unique_name</header>
.............
<version>new_version_number</version>
...........
..........
<version>I Dont need this line</version>
.........
You can do this with awk like this:
script.awk
/<header>unique_name<\/header>/ { found=1; done=0 }
/<version>.*<\/version>/ && found && !done {
    # replace version in $0
    gsub(/<version>.*<\/version>/, "<version>new_version_number</version>")
    done = 1
}
# implicitly print current $0:
1
Run the script: awk -f script.awk yourfile > newfile
Each line is printed, and replacement of the version is done according to the state in found and done.
A similar answer to the one by Lars Fischer:
#! /usr/bin/awk -f
/<header>.*<\/header>/ {
    looking = 0
}
/<header>unique_name<\/header>/ {
    looking = 1
}
looking && /<version>.*<\/version>/ {
    # match() sets RLENGTH to the length of the leading "  <version>" portion
    match($0, /^ *<version>/)
    $0 = substr($0, 1, RLENGTH) Version "</version>"
    looking = 0
}
{ print }
I construct the new version line instead of substituting it. In rules, I put the boolean before the regex because it's more efficient, not that you'll notice. I personally dislike ending the script with 1 to indicate printing, but that's just a style choice.
Invoke as
$ awk -v Version="$version" -f script.awk input
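Since the question also mentions sed, roughly the same idea can be expressed with a range address; a sketch (it assumes the <version> tag and its closing tag sit on one line, as in the sample):
sed '/<header>unique_name<\/header>/,/<\/version>/ s|<version>.*</version>|<version>new_version_number</version>|' myfile.xml
The range starts at the unique header line and ends at the first line containing </version> after it, so only that one version line is rewritten.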

AWK to find first occurrence of string and assign to variable for compare

I have written the following line of code, which splits the string at the first occurrence of the delimiter.
echo "$line" | awk -F':' '{ st = index($0,":");print "field1: "$1 "
=> " substr($0,st+1)}';
But I don't want to display it. I want to take both parts into variables, so I tried the following code:
explodetext="$line" | awk -F':' '{ st = index($0,":")}';
Sample data:
id:1
url:http://test.com
Expected output will be:
key=id
val=1
key=url
val=http://test.com
but it is not working as expected. Any solution?
Thanks
Your code, expanded:
echo "$line" \
| awk -F':' '
{
    st = index($0,":")
    print "field1: " $1 " => " substr($0,st+1)
}'
The output of this appears merely to split the line according to the first colon. From the sample data you've provided, it seems that your lines contain two fields, which are separated by the first colon found. This means you can't safely use awk's field separator to find your data (though you can use it for field names), making index() a reasonable approach.
One strategy might be to place your input into an array, for assessment:
#!/usr/bin/awk -f
BEGIN {
    FS=":"
}
{
    record[$1]=substr($0,index($0,":")+1);
}
END {
    if (record["id"] > 0) {
        printf("Record ID %d had a value of %s.\n", record["id"], record["url"])
    } else {
        print "No valid records found."
    }
}
I suppose that your text file input.txt is stored in the format given below:
id:1
url:http://test1.com
You could use the below piece of code, say awkscript, to achieve what you wish to do:
#!/bin/bash
awk '
BEGIN{FS=":"}
{
    if ($2 > 0) {
        if ( getline > 0){
            st = index($0,":")
            url = substr($0,st+1);
            system("echo Do something with " url);
        }
    }
}' $1
Run the code as ./awkscript input.txt
Note: I assume that the input file contains only one id/url pair, as you confirmed in your comment.
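If what you actually want is the key=/val= listing shown above, the same index() trick will produce it directly; a minimal sketch (assuming the data is in input.txt):
awk -F':' '{ print "key=" $1; print "val=" substr($0, index($0, ":") + 1) }' input.txt
which, for the sample data, prints:
key=id
val=1
key=url
val=http://test.com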
