Get first word of match of last line - linux

I want to parse through a log file formatted like this:
INFO: Successfully received REQUEST_ID: 1111 from 164.12.1.11
INFO: Successfully received REQUEST_ID: 2222 from 164.12.2.22
ERROR: Some error
INFO: Successfully received REQUEST_ID: 3333 from 164.12.3.33
INFO: Successfully received REQUEST_ID: 4444 from 164.12.4.44
WARNING: Some warning
INFO: Some other info
I want a script that outputs 4444. So extract the next word after ^.*REQUEST_ID: from the last line that contains the pattern ^.*REQUEST_ID.
What I have so far:
ID=$(sed -n -e 's/^.*REQUEST_ID: //p' $logfile | tail -n 1)
For lines that match the pattern, it deletes the matching text, leaving only what comes after the match, and prints it. Then I tail it to get the last line. How do I make it print only the first word?
And is there a more efficient way of doing this than piping to tail?
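One minimal tweak of the command above, keeping sed and tail but capturing only the first non-space run after the marker (a sketch; it assumes the ID never contains spaces):
ID=$(sed -n 's/^.*REQUEST_ID: \([^ ]*\).*/\1/p' "$logfile" | tail -n 1)   # keep only the word after "REQUEST_ID: "
echo "$ID"   # 4444 for the sample log above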

With awk:
awk '
$4 ~ /REQUEST_ID:/{val=$5}
END {print val}
' file.csv
$4 ~ /REQUEST_ID:/ : Match lines in which field #4 matches REQUEST_ID:.
{val=$5} : Store the value of field 5 in the variable val.
END {print val} : On closing the file, print the last value stored.
I have used a regex match to allow some variance in the string and still get a match. A more lenient match would be (matching anywhere in the line):
awk ' /REQUEST_ID/ {val=$5}
END {print val}
' file.csv
If you value (or need) speed more than robustness, then use an exact string comparison (quoting needed):
awk '
$4 == "REQUEST_ID:" {val=$5}
END {print val}
' file.csv
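Run against the sample log (saved as file.csv here), any of these variants should print the last stored value, for example:
$ awk '$4 ~ /REQUEST_ID:/{val=$5} END {print val}' file.csv
4444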

With GNU sed:
sed -nE 's/.* REQUEST_ID: ([0-9]+) .*/\1/p' file | tail -n 1
Output:
4444
With GNU grep:
grep -Po 'REQUEST_ID: \K[0-9]+' file | tail -n 1
Output:
4444
-P: Interpret PATTERN as a Perl regular expression.
-o: Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
\K: Keep the text matched so far out of the reported match, so only what follows it is printed.
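To put the result in a shell variable, as in the original attempt, the same pipeline can be wrapped in a command substitution (a sketch):
ID=$(grep -Po 'REQUEST_ID: \K[0-9]+' file | tail -n 1)
echo "$ID"   # 4444 for the sample log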

sed '/^.*REQUEST_ID: \([0-9]\{1,\}\) .*/ {s//\1/;h;}
$!d
x' ${logfile}
POSIX version.
It prints an empty line if there is no occurrence, otherwise the next word (assumed to be a number here).
Principle (see the commented version below):
if the line contains REQUEST_ID
extract the following number
put it in the hold buffer
if this is not the last line, delete the current content (and cycle to the next line)
on the last line, load the hold buffer (and print it as the cycle ends)
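The same script written out with comments, as a sketch (behaviour should be unchanged):
sed '
# lines with "REQUEST_ID: <number>": reduce the line to the number and copy it to the hold buffer
/^.*REQUEST_ID: \([0-9]\{1,\}\) .*/ {s//\1/;h;}
# every line except the last is deleted (nothing printed, cycle to the next line)
$!d
# on the last line, swap in the hold buffer; it is auto-printed as the cycle ends
x
' "${logfile}"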

You can match the number, replace the line with that value, and keep only the last one:
sed -n -e 's/^.*REQUEST_ID: \([0-9]*\).*$/\1/p' "$logfile" | tail -n 1

Print the field where the line and the column meet (here, field 5 of line 5):
awk 'FNR == 5 {print $5}' file
4444

Another awk alternative, if you don't know the position of the search word:
tac file | awk '{for(i=1;i<NF;i++) if($i=="REQUEST_ID:") {print $(i+1);exit}}'
Yet another one, without looping:
tac file | awk -vRS=" " 'n{print;exit} /REQUEST_ID:/{n=1}'
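For the sample log, either form should output the ID from the last matching line:
$ tac file | awk '{for(i=1;i<NF;i++) if($i=="REQUEST_ID:") {print $(i+1);exit}}'
4444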

Related

sed: Getting Two Extra Characters in the Results

How can I display the next two characters from sed results (wildcard characters and then stop the results)?
echo 'this is a test line' | sed 's/^.*te*/te../'
Expected result:
test
Actual result:
te.. line
You can use
sed -n 's/.*\(te..\).*/\1/p' <<< 'this is a test line'
Here,
-n - suppresses the default line output
.*\(te..\).* - matches any zero or more chars, then captures te followed by any two chars into Group 1, and then matches the rest of the string
\1 - replaces the whole match with the value of Group 1
p - only prints the result of the substitution.
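Alternatively, since the text you want is itself what the pattern matches, a plain grep -o would also do here (a sketch):
grep -o 'te..' <<< 'this is a test line'
test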
GNU AWK solution
echo 'this is a test line' | awk 'BEGIN{FPAT="te.."}{print $1}'
output
test
Explanation: Inform AWK to treat anything matching te.. as a field using FPAT (Field PATtern), then just print the 1st field.
(tested in GNU Awk 5.0.1)
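FPAT is a GNU Awk feature; on awks without it, a similar one-off extraction can be done with match() (a sketch):
echo 'this is a test line' | awk 'match($0, /te../) {print substr($0, RSTART, RLENGTH)}'
test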

How to retrieve a string which is located after a pattern in bash

I have a large file. I want to retrieve the word that is exactly after this string: "PatterStr()."
Two sample lines:
PatterStr().123232424 hhhhh 9999. test, test32312
66666666698977. PatterStr().8888
The output should be:
123232424
8888
When I use grep, the whole line is printed.
And when the pattern is found twice in a line, both occurrences should be printed, for instance:
PatterStr().123232424 hhhhh 9999. test, test32312. PatterStr().11111111
66666666698977. PatterStr().8888
The correct result:
123232424
11111111
8888
Could you please try the following.
awk '
{
while(match($0,/PatterStr\(\)\.[0-9]+/)){
value=substr($0,RSTART,RLENGTH)
sub(/.*\./,"",value)
print value
$0=substr($0,RSTART+RLENGTH)
value=""
}
}' Input_file
Output will be as follows.
123232424
11111111
8888
Explanation of the above code:
awk ' ##Starting awk program from here.
{
while(match($0,/PatterStr\(\)\.[0-9]+/)){ ##Starting a while loop whose match function matches the regex PatterStr(). followed by digits.
value=substr($0,RSTART,RLENGTH) ##Creating variable value as the sub-string of the current line starting at RSTART and running for RLENGTH characters.
sub(/.*\./,"",value) ##Deleting everything up to the last DOT in variable value, leaving only the number.
print value ##Printing variable value here.
$0=substr($0,RSTART+RLENGTH) ##Setting $0 to the rest of the current line, starting right after the match.
value="" ##Nullifying variable value here.
}
}' Input_file ##Mentioning the Input_file name here.
You can reduce the output of grep with the option -o or --only-matching. This will print only the matched parts of a matching line. To suppress the output of PatterStr() you can use a lookbehind.
cat bigfile | grep -Po '(?<=PatterStr\(\)\.)[\w]+'
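The cat is not strictly needed; grep can read the file directly:
grep -Po '(?<=PatterStr\(\)\.)[\w]+' bigfile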
This line does what you need
grep 'PatterStr()' large-file | sed "s/ /\n/g" | grep 'PatterStr()' | cut -f2 -d\.
Output:
123232424
11111111
8888
There are many ways how you can achieve this, you can for example do it with sed:
sed 's/ /\n/g' text-file.txt | sed -n 's/^PatterStr()\.\(.*\)/\1/p'
The first sed splits the content into separate lines by replacing each space with a newline; the second matches the lines starting with PatterStr(). and prints what comes directly after it.
With the help of ORS we get a "\n" between the two values printed from the first line.
awk -F'[. ]' 'NR == 1{print $2 ORS $NF}NR == 2{print $NF}' file
123232424
11111111
8888

grep with two or more words, one line by file with many files

I have
file 1.log:
text1 value11 text
text text
text2 value12 text
file 2.log:
text1 value21 text
text text
text2 value22 text
I want:
value11;value12
value21;value22
For now I grep the values into separate files and paste them together later into another file, but I don't think this is a very elegant solution because I need to read every file more than once. So I tried to extract all the data with a single cat | grep line, but the result is not what I expected.
I use:
cat *.log | grep -oP "(?<=text1 ).*?(?= )|(?<=text2 ).*?(?= )" | tr '\n' '; '
or
cat *.log | grep -oP "(?<=text1 ).*?(?= )|(?<=text2 ).*?(?= )" | xargs
but I get in each case:
value11;value12;value21;value22
value11 value12 value21 value22
Thank you so much.
Try:
$ awk -v RS='[[:space:]]+' '$0=="text1" || $0=="text2"{getline; printf "%s%s",sep,$0; sep=";"} ENDFILE{if(sep)print""; sep=""}' *.log
value11;value12
value21;value22
For those who prefer their commands spread over multiple lines:
awk -v RS='[[:space:]]+' '
$0=="text1" || $0=="text2" {
getline
printf "%s%s",sep,$0
sep=";"
}
ENDFILE {
if(sep)print""
sep=""
}' *.log
How it works
-v RS='[[:space:]]+'
This tells awk to treat any sequence of whitespace (newlines, blanks, tabs, etc) as a record separator.
$0=="text1" || $0=="text2"{getline; printf "%s%s",sep,$0; sep=";"}
This tells awk to look for records that match either text1 or text2. For those records, and those records only, the commands in curly braces are executed. Those commands are:
getline tells awk to read in the next record.
printf "%s%s",sep,$0 tells awk to print the variable sep followed by the word in the record.
After we print the first match, the command sep=";" is executed which tells awk to set the value of sep to a semicolon.
As we start each file, sep is empty. This means that the first match from any file is printed with no separator preceding it. All subsequent matches from the same file will have a ; to separate them.
ENDFILE{if(sep)print""; sep=""}
After the end of each file is reached, we print a newline if sep is not empty and then we set sep back to an empty string.
Alternative: Printing the second word if the first word ends with a number
In an alternative interpretation of the question (hat tip: David C. Rankin), we want to print the second word on any line for which the first word ends with a number. In that case, try:
$ awk '$1~/[0-9]$/{printf "%s%s",sep,$2; sep=";"} ENDFILE{if(sep)print""; sep=""}' *.log
value11;value12
value21;value22
In the above, $1~/[0-9]$/ selects the lines for which the first word ends with a number and printf "%s%s",sep,$2 prints the second field on that line.
Discussion
The original command was:
$ cat *.log | grep -oP "(?<=text1 ).*?(?= )|(?<=text2 ).*?(?= )" | tr '\n' '; '
value11;value12;value21;value22;
Note that, when using most unix commands, cat is rarely ever needed. In this case, for example, grep accepts a list of files. So, we could easily do without the extra cat process and get the same output:
$ grep -hoP "(?<=text1 ).*?(?= )|(?<=text2 ).*?(?= )" *.log | tr '\n' '; '
value11;value12;value21;value22;
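If you would rather stay with grep, the per-file grouping the question asks for can be recovered by looping over the files (a sketch; it assumes GNU grep, uses a slightly different but equivalent regex, and assumes the values contain no spaces):
for f in *.log; do
    grep -oP '(?<=text[12] )\S+' "$f" | paste -sd ';' -   # join each file's matches with ';'
done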
I agree with @John1024, and how you approach this problem will really depend on what the actual text you are looking for is. If, for instance, your lines of concern start with text{1,2,...} and what you want in the second field can be anything, then his approach is optimal. However, if the values in the first field can vary and what you are really interested in is records with valueXX in the second field, then an approach keying off the second field may be what you are looking for.
Taking your second field as an example: if the text you are interested in has the form valueXX (where XX is two or more digits at the end of the field), you can process only those records where the second field matches, use a simple conditional on FNR == 1 to control the ';' delimiter, and use ENDFILE to control the newline, similar to:
awk '$2 ~ /^value[0-9][0-9][0-9]*$/ {
printf "%s%s", (FNR == 1) ? "" : ";", $2
}
ENDFILE {
print ""
}' file1.log file2.log
Example Use/Output
$ awk '$2 ~ /^value[0-9][0-9][0-9]*$/ {
printf "%s%s", (FNR == 1) ? "" : ";", $2
}
ENDFILE {
print ""
}' file1.log file2.log
value11;value12
value21;value22
Look things over and consider your actual input files and then either one of these two approaches should get you there.
If I understood you correctly, you want the values but search for the text[12], i.e. you want the word after the matching search word, not the matching word itself:
$ awk -v s="^text[12]$" ' # set the search regex *
FNR==1 { # in the beginning of each file
b=b (b==""?"":"\n") # terminate current buffer with a newline
}
{
for(i=1;i<NF;i++) # iterate all but last word
if($i~s) # if current word matches search pattern
b=b (b~/^$|\n$/?"":";") $(i+1) # add following word to buffer
}
END { # after searching all files
print b # output buffer
}' *.log
Output:
value11;value12
value21;value22
* regex could be for example ^(text1|text2)$, too.

return all lines that match String1 in a file after the last matching String2 in the same file

I figured out how to get the line number of the last matching word in the file:
cat -n textfile.txt | grep " b " | tail -1 | cut -f 1
It gave me the value of 1787. So, I passed it manually to the sed command to search for the lines that contain the sentence "blades are down" after that line number, and it returned all the lines successfully:
sed -n '1787,$s/blades are down/&/p' myfile.txt
Is there a way to pass the line number from the first command to the second one through a variable or a file, so I can put them in a script to be executed automatically?
Thank you.
You can do this by just connecting your two commands with xargs. 'xargs -I %' allows you to take the stdin from a previous command and place it wherever you want in the next command. The '%' is where your '1787' will be written:
cat -n textfile.txt | grep " b " | tail -1 | cut -f 1 | xargs -I % sed -n %',$s/blades are down/&/p' myfile.txt
You can use:
command substitution to capture the result of the first command in a variable.
simple string concatenation to use the variable in your sed command
startLine=$(grep -n ' b ' textfile.txt | tail -1 | cut -d ':' -f1)
sed -n ${startLine}',$s/blades are down/&/p' myfile.txt
You don't strictly need the intermediate variable - you could simply use:
sed $(grep -n ' b ' textfile.txt | tail -1 | cut -d ':' -f1)',$s/blades are down/&/p' myfile.txt
but it may make sense to do error checking on the result of the command substitution first.
Note that I've streamlined the first command by using grep's -n option, which puts the line number separated with : before each match.
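For example, a sketch with a basic check on the result of the command substitution:
startLine=$(grep -n ' b ' textfile.txt | tail -1 | cut -d ':' -f1)
if [ -n "$startLine" ]; then
    sed -n "${startLine}"',$s/blades are down/&/p' myfile.txt
else
    # fall back with a message if ' b ' never occurs in the file
    echo "no ' b ' match found in textfile.txt" >&2
fi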
First we can get the "half" of the file after the last match of string2, then use grep to match all the string1 lines:
tac your_file | awk '{ if (match($0, "string2")) { exit ; } else {print;} }' | \
grep "string1"
Note that the order is reversed, which is fine if you don't care about the order. If you do care, just add another tac at the end of the pipeline.
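For example, the same pipeline with the extra tac appended:
tac your_file | awk '{ if (match($0, "string2")) { exit ; } else {print;} }' | \
grep "string1" | tac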
This might work for you (GNU sed):
sed -n '/\n/ba;/ b /h;//!H;$!d;x;//!d;s/$/\n/;:a;/\`.*blades are down.*$/MP;D' file
This reads through the file storing all lines following the last match of the first string (" b ") in the hold space.
At the end of file, it swaps to the hold space, checks that it does indeed have at least one match, then prints out those lines that match the second string ("blades are down").
N.B. it makes the end case (/\n/) possible by adding a new line to the end of the hold space, which will eventually be thrown away. This also caters for the last line edge condition.

grep match only lines in a specified range

Is it possible to use grep to match only lines with numbers in a pre-specified range?
For instance I want to list all lines with numbers in the range [1024, 2048] of a log that contain the word 'error'.
I would like to keep the '-n' functionality i.e. have the number of the matched line in the file.
Use sed first:
sed -ne '1024,2048p' file | grep ...
-n says don't print lines by default; 'x,y p' says print lines x through y inclusive (overriding the -n)
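To also keep the original line numbers, as the question asks, one way (a sketch) is to number the lines before slicing:
cat -n file | sed -n '1024,2048p' | grep 'error'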
sed -n '1024,2048{/error/{=;p}}' file | paste - -
Here /error/ is a pattern to match and = prints the line number.
Awk is a good tool for the job:
$ awk 'NR>=1024 && NR<=2048 && /error/ {print NR,$0}' file
In awk the variable NR contains the current line number and $0 contains the line itself.
The benefit of using awk is that you can easily change the output to display however you want. For instance, to separate the line number from the line with a colon followed by a TAB:
$ awk 'NR>=1024 && NR<=2048 && /error/ {print NR,$0}' OFS=':\t' file
