How do I get the multi-line text between two braces that contains a specific search string? - linux

I'm looking for a quick and easy one-liner to extract all brace-delimited text blocks containing a search string from a text file. I've just about googled myself crazy on this, but everyone seems to post only about getting the text between braces without a search string.
I've got a large text file with contents like this:
blabla
blabla {
blabla
}
blabla
blabla {
blabla
blablaeventblabla
}
blabla
The vast majority of bracketed entries do not contain the search string, which is "event".
What I am trying to extract is all the text (especially multi-line matches) between each pair of curly braces, but only if that text also contains the search string. So the output should look like this:
blabla {
blabla
blablaeventblabla
}
My shell is /usr/bin/bash on Linux. I've been trying various grep and awk commands, but I just can't get it to work:
awk '/{/,/event/,/}/' filepath
grep -iE "/{.*event.*/}" filepath
I was thinking this would be really easy, as it's a common task. What am I missing here?

This GNU awk (gawk) command should work:
awk -v RS='[^\n]*{|}' 'RT ~ /{/{p=RT} /event/{ print p $0 RT }' file
blabla {
blabla
blablaeventblabla
}
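RS='[^\n]*{|}' sets the input record separator to either a run of non-newline characters ending in {, or a }. RT is a gawk-specific variable that holds the text actually matched by RS for the current record, so p remembers the opening line of the block that follows.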
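If gawk is not available (RT is a gawk extension), a portable sketch is to buffer the lines from a line containing { to the next line containing }, and print the buffer only if it matches. The helper name extract_blocks and its argument order are my own; it assumes braces are not nested, as in the sample, and treats the search string as an awk regular expression.

```shell
# extract_blocks PATTERN FILE (hypothetical helper): print each brace-delimited
# block whose text matches PATTERN. POSIX awk, so the gawk-only RT is not needed.
# Assumes blocks are not nested; PATTERN is used as an awk regular expression.
extract_blocks() {
  awk -v pat="$1" '
    /[{]/ { inblock = 1; buf = "" }       # a line with "{" opens a new block
    inblock { buf = buf $0 "\n" }         # collect every line of the block
    inblock && /[}]/ {                    # a line with "}" closes the block
      if (buf ~ pat) printf "%s", buf     # print it only if it matches
      inblock = 0
    }
  ' "$2"
}
```

Usage: extract_blocks event filepath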

Another user had a nice sed answer which I really liked; unfortunately it seems to have disappeared for some reason.
Here it is for anyone who might be interested:
sed '/{/{:1; /}/!{N; b1;}; /event/p}; d' filepath
Explanation:
/{/ if the current line contains {, execute the next block
{ start block
:1; label for code to jump to
/}/! if the pattern space does not contain }, execute the next block
{ start block
N; append the next line to the pattern space
b1 jump to label 1
}; end block
/event/p if the pattern space contains the search string, print it
(at this point the pattern space contains a full block of lines,
from { to })
}; end block
d delete the pattern space
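The same sed idea can be wrapped in a small function that takes the search string as an argument. This is my own variation, not the original answer: it uses -n with an explicit p instead of the trailing d, and puts a ; after the branch label (GNU sed is assumed; BSD sed parses labels differently). The helper name sed_blocks is hypothetical. The search string is spliced into the sed address, so it must not contain / and is treated as a basic regular expression.

```shell
# sed_blocks PATTERN FILE (hypothetical helper, GNU sed assumed):
# print every { ... } block whose text matches PATTERN.
sed_blocks() {
  # -n: print nothing by default; p prints only the matching blocks.
  # :1 ... b1 keeps appending lines (N) until the pattern space contains "}".
  sed -n '/{/{:1; /}/!{N; b1;}; /'"$1"'/p;}' "$2"
}
```

Usage: sed_blocks event filepath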

Here is a modified version of this gem from 'leu' (thanks, leu, for enlightening us). This one does something very similar: it extracts everything between a line beginning with 'DEC::PKCS7[' and a line ending with ']!' (the markers are stripped; other lines pass through unchanged):
sed '/^DEC::PKCS7\[/{s///; :1; /\]\!$/!{N; b1;}; s///;};' file
Explanation:
/^DEC::PKCS7\[/ # if current line begins with 'DEC::PKCS7[' then execute next block
{ # start block
s///; # remove the leading 'DEC::PKCS7[' (an empty regex reuses the last one used)
:1; # label '1' for code to jump to
/\]\!$/! # if the line does not end with ']!' then execute next block
{ # start block
N; # add next line to pattern space
b1; # jump to label 1
}; # end block
s///; # remove the trailing ']!' (the empty regex now reuses /\]\!$/)
}; # end block
Notes:
This works for both single-line and multi-line blocks.
It will behave unexpectedly if ']!' appears in the middle of
the input.
This does not answer the question, which has already been answered very well.
I just hope it helps with other cases.
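If you want only the text between the markers, and not the other lines of the file, a variant of the command above adds -n and an explicit p. This is a sketch assuming GNU sed; the helper name extract_pkcs7 and the sample input are made up for illustration, and the end marker is written as ]!$ without backslashes, since ] and ! are ordinary characters there.

```shell
# extract_pkcs7 FILE (hypothetical helper, GNU sed assumed): print only the
# payloads between a line starting with 'DEC::PKCS7[' and a line ending in ']!'.
extract_pkcs7() {
  # s/// reuses the last regex applied: first the start marker, then the end one.
  sed -n '/^DEC::PKCS7\[/{s///; :1; /]!$/!{N; b1;}; s///; p;}' "$1"
}
```

For an input such as "DEC::PKCS7[line1" followed by "line2]!", this prints "line1" and "line2", and lines outside the markers are not printed.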

Related

How to extract a string after matching characters from a variable in shell script [duplicate]

This question already has answers here:
Can grep show only words that match search pattern?
I have a file with the following text:
classA = Something
classB = AB1234567
classC = Something more
classD = Something Else
Objective:
Using a shell script, I want to extract the text AB1234567 from the complete text above.
To start, I can read the second line of the text above with the following logic in my shell script:
secondLine=$(sed -n '2p' my_file)
echo "$secondLine"
secondLine outputs classB = AB1234567. How do I extract AB1234567 from classB = AB1234567 in my shell script?
Question:
Given that AB is always present in that particular part of the text in all the files I deal with, how can I make sed read all the numbers after AB?
Please note that classB = AB1234567 could end with a space or a newline, and I need to get the result into a variable.
Try:
sed '2{ s/^classB = \(AB[^ ]*\) *$/\1/;q } ;d' your_fileName
2 is the line number.
{ opens a sed group command.
s/ substitutes the following match
^ is the anchor for the beginning of the line
\(...\) is known as a capture group, with \1 as its back-reference
[^ ]* means any run of characters other than a space
\(AB[^ ]*\) captures AB followed by everything up to the first space (back-reference \1)
' *' (a space followed by *) means zero or more spaces
$ is the anchor for the end of the line
/ replace it with
\1 the back-reference to the capture group above
/ ends the substitution
q quits, to avoid reading the rest of the file unnecessarily
} closes the group command.
d deletes the lines before line 2, so they are not printed.
To get it into a variable:
your_variableName=$(sed '2{ s/^classB = \(AB[^ ]*\) *$/\1/;q } ;d' your_fileName)
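The question asks specifically for the numbers after AB. A minimal sed sketch for that, assuming the key is always classB and the digits immediately follow AB (digits_after_ab is a hypothetical helper name): -n suppresses the default output, and the p flag prints only the line on which the substitution succeeded.

```shell
# digits_after_ab FILE (hypothetical helper): print only the digits that
# follow "AB" on the "classB = ..." line; every other line is suppressed.
digits_after_ab() {
  sed -n 's/^classB = AB\([0-9]*\).*/\1/p' "$1"
}
```

Usage: your_variableName=$(digits_after_ab your_fileName)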
Could you please try the following? This should be easy in awk, assuming you want the 2nd line and only the digits in its last field.
secondLine=$(awk 'FNR==2{sub(/[^0-9]*/,"",$NF);print $NF}' Input_file)
You may try this awk:
awk -F ' *= *' '$1 ~ /B$/ { print $2 }' file
AB1234567
I'm not 100% sure this is what you're looking for, but if you know there's only a single element in the file that starts with AB, this will get it into a variable:
$ cat sample.txt
classA = Something
classB = AB1234567
classC = Something more
classD = Something Else
$ x=$(perl -ne 'print if s/^.*\s+(AB\S+)\s*$/$1/' sample.txt)
$ echo "the variable is: $x"
the variable is: AB1234567
Explanation of the regex:
^ beginning of line
.* anything
\s+ one or more whitespace characters
(AB\S+) captures AB followed by one or more non-whitespace characters
\s*$ zero or more whitespace characters followed by the end of the line
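If the line is already in a shell variable (secondLine, as in the question), plain POSIX parameter expansion can do the job without starting another process. This is a different technique from the answers above, and the sketch assumes the value follows the last "= " on the line and may be followed only by spaces.

```shell
# Sample line as read from the file (trailing spaces included on purpose).
secondLine='classB = AB1234567  '

value=${secondLine##*= }   # remove everything up to the last "= "   -> "AB1234567  "
value=${value%% *}         # remove the first space and what follows -> "AB1234567"
digits=${value#AB}         # remove the leading "AB"                -> "1234567"

printf '%s\n' "$value" "$digits"
```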

Is there a simple way to group multiple lines of a command output based on a beginning and end match?

Is there a simple way to group multiple lines which match a pattern into single lines?
Basically, the output of a command lists something like:
key1 blah blah = dict {
unrelated stuff {
}
something I actually want to match via grep or something
some common end term for key1 I can use as an end pattern match
}
... repeated for 100 similar keys
My end goal in this specific case is to strip entries from an XML file when they contain a specific sub-entry. I could do this (and solve a lot of other day-to-day problems) if each entry were on its own line instead of spanning several lines (grep for the matches, sed out the text after the bracket, etc.).
Something like:
print multi-line crap | merge beginningpattern endpattern | grep lines now that everything is merged
Basically, the 'merge' command would strip all linefeeds between each beginningpattern and the following endpattern (maybe putting a linefeed at the end).
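The merge step itself can be sketched in awk. The helper below is hypothetical (there is no standard merge command): it reads standard input, joins every line from one matching the start pattern through the next line matching the end pattern into a single space-separated line, and passes everything else through, so the result can be piped into grep. Both patterns are awk regular expressions.

```shell
# merge START STOP (hypothetical helper, not an existing command): read stdin,
# join each run of lines from a START line through the next STOP line into one
# space-separated line, and pass every other line through unchanged.
merge() {
  awk -v start="$1" -v stop="$2" '
    inrec {                                    # inside a record: append the line
      buf = buf " " $0
      if ($0 ~ stop) { print buf; inrec = 0 }  # STOP line closes the record
      next
    }
    $0 ~ start {                               # START line opens a new record
      buf = $0; inrec = 1
      if ($0 ~ stop) { print buf; inrec = 0 }  # START and STOP on the same line
      next
    }
    { print }                                  # lines outside any record
    END { if (inrec) print buf }               # flush a record with no STOP line
  '
}
```

For the output in the question, something like some_command | merge 'dict {' 'common end term' | grep 'something', where some_command and both patterns stand in for your own.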
awk with gsub would be the right way, if I understand your question correctly. For example (the <...> strings are placeholders for your own patterns; x is set while inside a block, and y is used to skip the block's opening line):
required_string=$(awk '/<yourStartingString>/ { x=1 } /<EndingString>/ { x=0; y=0 } { if (x==1 && y==1) { gsub(/(.*<grepforwhatyouneed>)|(<endgrep>)/,""); print } } { if (x==1 && y==0) y=1 }' "$i.xml")

How can I replace a `---`-separated block with the contents of a text file, using standard Linux tools?

I have a text file that looks something like this:
Some text here. This text is not replaced.
---
And then a wild block appears!
It has stuff in it that I'm trying to replace.
---
The block is no more. Nothing to replace here.
And another text file with contents to insert:
A multi-
line thing to replace.
This block is not demarcated
in the same way
as the other
And what I'm trying to do is replace the block delimited by --- with the contents of the text file, so that it looks like this:
Some text here. This text is not replaced.
A multi-
line thing to replace.
This block is not demarcated
in the same way
as the other
The block is no more. Nothing to replace here.
This is similar to this question, but I don't think that applies, since what I'm dealing with is a multi-line block, and it doesn't seem like sed is very good at that. Can awk or ruby or something do this?
Untested but will be close if not exactly what you want:
awk '
NR==FNR { file1 = file1 $0 RS; next }
/---/ {
    if (f) {
        printf "%s", file1
    }
    f = !f
    next
}
!f
' file1 file2
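Here is the same awk approach wrapped in a function, with a quick check. It is a sketch: the name replace_block and the argument order (replacement file first, document second) are my own, and the delimiter test is tightened to /^---$/ so that only lines consisting of exactly --- count as delimiters. Like any NR==FNR idiom, it misbehaves if the replacement file is empty, because the document's lines would then be read as if they belonged to the first file.

```shell
# replace_block INSERT_FILE TEXT_FILE (hypothetical helper): print TEXT_FILE
# with the block between two --- lines replaced by the contents of INSERT_FILE.
replace_block() {
  awk '
    NR == FNR { ins = ins $0 "\n"; next }   # first file: collect replacement text
    /^---$/ {                               # a line that is exactly ---
      if (inblock) printf "%s", ins         # closing delimiter: emit replacement
      inblock = !inblock
      next                                  # never print the delimiters
    }
    !inblock                                # print lines outside the block
  ' "$1" "$2"
}
```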

Replace text between two strings in file using linux bash

I have a file "acl.txt":
192.168.0.1
192.168.4.5
#start_exceptions
192.168.3.34
192.168.6.78
#end_exceptions
192.168.5.55
and another file, "exceptions":
192.168.88.88
192.168.76.6
I need to replace everything between #start_exceptions and #end_exceptions with the content of the exceptions file. I have tried many solutions from this forum, but none of them work.
EDITED:
OK, if you want to keep the #start and #end lines, I will revert to awk:
awk '
BEGIN {p=1}
/^#start/ {print;system("cat exceptions");p=0}
/^#end/ {p=1}
p' acl.txt
Thanks to @fedorqui for tweaks in the comments below.
Output:
192.168.0.1
192.168.4.5
#start_exceptions
192.168.88.88
192.168.76.6
#end_exceptions
192.168.5.55
p is a flag that says whether or not to print lines. It starts as 1, so all lines are printed until I find a line starting with #start. Then I cat the contents of the exceptions file and stop printing lines until I find a line starting with #end, at which point I set p back to 1 so the remaining lines get printed.
If you want output to a file, add "> newfile" to the very end of the command like this:
awk '
BEGIN {p=1}
/^#start/ {print;system("cat exceptions");p=0}
/^#end/ {p=1}
p' acl.txt > newfile
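A variant of the awk answer reads the exceptions file with getline instead of calling system("cat exceptions"), which avoids starting a shell and keeps all output inside awk itself. This is a sketch; the helper name replace_exceptions is hypothetical, and it assumes the markers start their lines, as in acl.txt.

```shell
# replace_exceptions EXCEPTIONS_FILE ACL_FILE (hypothetical helper): keep both
# marker lines, but swap the lines between them for the exceptions file.
replace_exceptions() {
  awk -v exc="$1" '
    /^#start_exceptions/ {
      print                                         # keep the start marker
      while ((getline line < exc) > 0) print line   # copy the exceptions file
      close(exc)                                    # allow a later re-read
      skip = 1
      next
    }
    /^#end_exceptions/ { skip = 0 }                 # end marker is printed below
    !skip                                           # print lines outside the block
  ' "$2"
}
```

Usage: replace_exceptions exceptions acl.txt > newfile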
YET ANOTHER VERSION IF YOU REALLY WANT TO USE SED
If you really, really want to do it with sed, you can use nested addresses: first select the lines from #start_exceptions to #end_exceptions, then, within that range, print the #start_exceptions line and queue the exceptions file after it (n, then r), and delete every line other than the #end_exceptions line:
sed '
/^#start/,/^#end/{
    /^#start/{
        n
        r exceptions
    }
    /^#end/!d
}
' acl.txt
Output:
192.168.0.1
192.168.4.5
#start_exceptions
192.168.88.88
192.168.76.6
#end_exceptions
192.168.5.55
ORIGINAL ANSWER
I think this will work:
sed -e '/^#end/r exceptions' -e '/^#start/,/^#end/d' acl.txt
When it finds a line matching /^#end/, it queues the exceptions file to be printed after that line. It also deletes everything from /^#start/ to /^#end/, including the two marker lines themselves.
I have left the matching slightly "loose" for clarity of expressing the technique.
You can use the following, based on the question "Replace string with contents of a file using sed":
$ sed $'/end/ {r exceptions\n} ; /start/,/end/ {d}' acl.txt
192.168.0.1
192.168.4.5
192.168.88.88
192.168.76.6
192.168.5.55
Explanation
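sed $'one_thing; another_thing' acl.txt performs the two actions. The $'...' form is bash ANSI-C quoting, used here so that \n becomes a real newline that ends the file name given to r.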
/end/ {r exceptions\n} if the line contains end, then read the file exceptions and append it.
/start/,/end/ {d} from a line containing start to a line containing end, delete all the lines.
I had a problem with Mark Setchell's solution in MinGW: the caret was not matching the beginning of the line. Does the detection of the separator really depend on it being at the beginning of the line?
I came up with this awk alternative...
$ awk -v data="$(<exceptions)" '
BEGIN {p=1}
/#start_exceptions/ {print; print data;p=0}
/#end_exceptions/ {p=1}
p
' acl.txt

How to remove a text block (pattern) from a file with sed/awk

I have imported thousands of text files that contain a piece of text I would like to remove.
It is not just a block of text but a pattern.
<!--
# Translator(s):
#
# username1 <email1>
# username2 <email2>
# usernameN <emailN>
#
-->
If the block appears, it lists one or more users with their email addresses.
I have another small awk program that accomplishes the task in very few lines of code. It can be used to remove ranges of text from a file; both the start and the stop regexp can be set.
# This block is a range pattern and matches all lines between (and including)
# the start '<!--' and the end '-->'; $0 holds each line of the range in turn.
# Usage: awk -f remove_email.awk yourfile
# The if statement is not needed to accomplish the task, but may be useful.
# It says: if the current line ($0) of the range contains a '#', print
# the string "Found an email..." (if uncommented).
# The 'next' command discards the current record, reads the next one,
# and starts the program again from the first rule.
/<!--/, /-->/ {
#if( $0 ~ /#/ ){
# print "Found an email and removed that!"
#}
next
}
# This line prints the body of the file to standard output - if not captured in
# the block above.
1 {
print
}
Save the code in 'remove_email.awk' and run it by:
awk -f remove_email.awk yourfile
This sed solution might work:
sed '/^<!--/,/^-->/{/^<!--/{h;d};H;/^-->/{x;/^<!--\n# Translator(s):\n#\(\n# [^<]*<email[0-9]\+>\)\+\n#\n-->$/!p};d}' file
An alternative (perhaps a better solution):
sed '/^<!--/{:a;N;/^-->/M!ba;/^<!--\n# Translator(s):\n#\(\n# \w\+ <[^>]\+>\)\+\n#\n-->/d}' file
This gathers up the lines from one that starts with <!-- to one that starts with -->, then pattern-matches on the collection: the second line is # Translator(s):, the third line is #, the fourth and possibly further lines follow # username <email address>, the penultimate line is # and the last line is -->. If the match succeeds, the entire collection is deleted; otherwise it is printed as normal.
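A quick way to check the sed approach on sample data, assuming GNU sed (for \w, \+ and the M flag). The helper name strip_translators and the sample addresses are made up; note the \+ after the group, since a bare + is a literal character in a basic regular expression.

```shell
# strip_translators FILE (hypothetical helper, GNU sed assumed): delete
# <!-- ... --> comments that have the Translator(s) layout; keep all others.
strip_translators() {
  # :a;N;...ba gathers lines until one starts with "-->" (M makes ^ per line).
  sed '/^<!--/{:a;N;/^-->/M!ba;/^<!--\n# Translator(s):\n#\(\n# \w\+ <[^>]\+>\)\+\n#\n-->/d}' "$1"
}
```

A comment such as "<!--", "note", "-->" does not match the translator layout, so it is printed unchanged.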
For this task you need look-ahead, which is normally done with a parser.
Another solution, though not a very efficient one, would be:
sed "s/-->/&\n/;s/<!--/\n&/" file | awk 'BEGIN {RS = "";FS = "\n"}/username/{print}'
HTH Chris
perl -i.orig -00 -pe 's/<!--\s+#\s*Translator.*?\s-->//gs' file1 file2 file3
Here is my solution, if I understood your problem correctly. Save the following to a file called remove_blocks.awk:
# See the beginning of the block, mark it
/<!--/ {
    state = "block_started"
}
# At the end of the block, if the block does not contain email, print
# out the whole block.
/^-->/ {
    if (!block_contains_user_email) {
        for (i = 0; i < count; i++) {
            print saved_line[i];
        }
        print
    }
    count = 0
    block_contains_user_email = 0
    state = ""
    next
}
# Encounter a block: save the lines and wait until the end of the block
# to decide if we should print it out
state == "block_started" {
    saved_line[count++] = $0
    if (NF>=3 && $3 ~ /@/) {
        block_contains_user_email = 1
    }
    next
}
# For everything else, print the line
1
Assume that your text file is in data.txt (or many files, for that matter):
awk -f remove_blocks.awk data.txt
The above command will print out everything in the text file, minus the blocks which contain user email.
