Grep for a string in a specific order - linux

I need to grep a large file for words (really a string of characters) in a specified order. I also need the string to be able to contain a colon ":". For example, if the words are "apple", "banana", and ":peach", I will get the line that says, "apple cherries banana cool :peach" but not "apple :peach cherries banana cool". I would really like to be able to have one string and not grep commands in other grep commands. I am not concerned about searching whole words only.

grep "apple.*banana.*:peach" file
e.g.
$ echo "apple cherries banana cool :peach" | grep "apple.*banana.*:peach"
apple cherries banana cool :peach
$ echo "apple :peach cherries banana cool" |grep "apple.*banana.*:peach"
$
Pretty simple regex. The "apple", "banana" and ":peach" portions are just literal strings. The .* in between them is two regex operators: the . matches any character, and the * says that the previous match can occur 0 or more times.
In essence we're saying find these literal strings in this order, with any number of characters between them (including none, so even "applebanana:peach" would match.)
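If the words come from a list rather than being typed by hand, the same pattern can be assembled automatically. The following is only a sketch: ordered_pattern is a hypothetical helper name, not part of grep, and it assumes basic (BRE) grep syntax, escaping the characters that are special there so each word is matched literally.

```shell
# Hypothetical helper: join the arguments into "w1.*w2.*w3".
# BRE metacharacters in each word are escaped so the word is matched literally.
ordered_pattern() {
  pattern=""
  for word in "$@"; do
    escaped=$(printf '%s' "$word" | sed 's/[][\.*^$]/\\&/g')
    # Insert ".*" between words, but not before the first one.
    pattern="${pattern:+$pattern.*}$escaped"
  done
  printf '%s\n' "$pattern"
}

# Prints only the first of the two sample lines.
printf '%s\n' 'apple cherries banana cool :peach' \
              'apple :peach cherries banana cool' |
  grep -e "$(ordered_pattern apple banana :peach)"
```

The colon needs no escaping because it has no special meaning in a grep pattern outside bracket expressions.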

Related

searching a multiple line pattern using grep regex

I am relatively new to Linux. I want to search a file for a pattern that starts with "Leonard is" and ends with "champion".
The pattern might also span multiple lines.
The input file (input.txt) may look like:
1 rabbit eats carrot Leonard is a champion
2 loin is the king of
3 jungle Leonard is a
4 Champion
5 Leonard is An exemplary
6 Champion
I want the output file to contain every occurrence of my pattern and nothing else:
1 Leonard is a champion
3 Leonard is a
4 Champion
5 Leonard is An exemplary
6 Champion
I have come very close with the following command:
cat input.txt | grep -ioE "Leonard.*Champion$"
but this command only returns
1 Leonard is a champion
and ignores the occurrences that span multiple lines.
If any approach other than grep would be useful, kindly let me know. Thanks!
Perl to the rescue:
perl -l -0777 -e 'print for <> =~ /(.*Leonard(?s:.*?)[Cc]hampion.*)/g' -- input.txt
-l adds newlines to prints
-0777 reads the whole file instead of processing it line by line
the diamond operator <> reads the input
.*? is like .*, i.e. it matches anything, but the ? means the shortest possible match is enough. That prevents the regex from matching everything between the first Leonard and last Champion.
. in a regex doesn't match a newline normally, but it does with the s modifier. (?s:.*?) localizes the changed behaviour, so other dots still don't match newlines.
You're looking for \s, which stands for whitespace; + stands for one or more.
Pattern: Leonard is a\s+Champion
See: https://regex101.com/r/qiNXhf/1
I use this tool with zero knowledge of regex, and it helps me a lot. See the notes at the bottom right, where all these symbols are explained.
The "." is described as "any character except newline", so what you're trying to achieve with . alone is not possible. I suggest using \s followed by * or + (as suggested above), though you still need to work out how to express it in a grep regular expression. There are also nice tools for regex testing, https://regexr.com/ for example.

Find two lines and replace with one

I am looking for a solution that would let me search text files on a Linux server, looking in each file for a pattern such as:
Text 123
Blue Green
And then replaces it with one line, every time it finds it in a file...
Order Blue Green
I am not sure what would be the easiest way to solve this. I have seen many guides using SED but only for finding one line and replacing it.
You ask about sed, here is an answer in sed.
Let me mention however, that while sed is fun for this kind of exercise, you probably should choose something else, more flexible and easier to learn; perl for example.
look for first line /Text 123/
when found start a loop :a
concat next line N
replace twins of searched text with single copy and print it
s/Text 123\nText 123/Text 123/p;
loop while that replaces ta;
try to replace s///
rely on concat being printed unchanged if replace does not trigger
Code:
sed "/Text 123/{:a;N;s/Text 123\nText 123/Text 123/p;ta;s/Text 123\nBlue Green/Order Blue Green/}"
Test input:
Text 123
Do not replace
Lala
Text 123
Blue Green
lulu
Text 123
Do not replace either
Text 123
Text 123
Blue Green
preceding should be replaced
Output:
Text 123
Do not replace
Lala
Order Blue Green
lulu
Text 123
Do not replace either
Text 123
Order Blue Green
preceding should be replaced
Platform: Windows and GNU sed version 4.2.1
Note:
On that platform, the sed line can use environment variables for the two text fragments, which you probably want to do:
sed "/%EnvVar2%/{:a;N;s/%EnvVar2%\n%EnvVar2%/%EnvVar2%/p;ta;s/%EnvVar2%\n%EnvVar%/Order %EnvVar%/}"
Platform2:
still Windows
using bash GNU bash, version 3.1.17(1)-release (i686-pc-msys)
GNU sed version 4.2.1 (same)
On this platform, variables can e.g. be used like:
sed "/${EnvVar2}/{:a;N;s/${EnvVar2}\n${EnvVar2}/${EnvVar2}/p;ta;s/${EnvVar2}\n${EnvVar}/Order ${EnvVar}/}"
On this platform it is important to use "..." in order to be able to use variables,
it does not work with '...'.
As @EdMorton has hinted, on all platforms be careful when the text you want to replace itself looks like a variable reference, e.g. "Text $123" in bash. In that case, do not use variables; quote the script with '...' instead of "..." so that the shell leaves the text alone.
sed is for simple substitutions on individual lines, that is all. If you find yourself trying to use constructs other than s, g, and p (with -n) then you are on the wrong track as all other sed constructs became obsolete in the mid-1970s when awk was invented.
Your problem is not doing substitutions on individual lines, it's on a multi-line record and to do that with GNU awk for multi-char RS is:
$ awk -v RS='^$' -v ORS= '{gsub(/Text 123\nBlue Green/,"Order Blue Green")}1' file
Order Blue Green
but there are several other approaches depending on your real needs.
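One such alternative, sketched here for an awk without multi-character RS support, is to remember the previous line and decide what to print once the next line is known. The function name merge_pair and the variables first and second are illustrative. Unlike the sed and gsub versions, it assumes each of the two lines must match exactly rather than as a substring.

```shell
# Hypothetical helper: replace the line pair ($1, $2) with "Order $2".
# Both lines must match exactly; the texts are passed with -v, so they are
# never interpolated into the awk program itself.
merge_pair() {
  awk -v first="$1" -v second="$2" '
    have_prev && prev == first && $0 == second {
      print "Order " second; have_prev = 0; next
    }
    have_prev { print prev }          # previous line was not part of a pair
    { prev = $0; have_prev = 1 }      # hold the current line back
    END { if (have_prev) print prev }
  '
}

printf '%s\n' 'Text 123' 'Text 123' 'Blue Green' 'lulu' |
  merge_pair 'Text 123' 'Blue Green'
```

Note that awk interprets backslash escapes in values given with -v, so fragments containing backslashes would need extra care.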

Use awk or sed to eliminate repeat entries in text file

I have a text file like this...
apples
berries
berries
cherries
and I want it to look like this...
apples
berries
cherries
That's it. I just want to eliminate doubled entries. I would prefer for this to be an awk or sed "one-liner" but if there's some other common bash tool that I have overlooked that would be fine.
sort -u file
if you are not worried about the order of the output.
Remove duplicates while retaining the original order (keyed on the whole line, $0, rather than only the first field):
awk '!a[$0]++' file
There is a special command for this task, called uniq:
$ uniq file
apples
berries
cherries
This requires that the duplicate lines are adjacent; equal lines that are not adjacent are not removed.
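The difference between the three approaches only shows up when duplicates are not adjacent. A small comparison, using a made-up sample list (the sample function is purely illustrative):

```shell
# Illustrative input: "berries" appears both adjacent and non-adjacent.
sample() { printf '%s\n' berries apples berries berries cherries; }

sample | uniq                # adjacent duplicates only: berries apples berries cherries
sample | sort -u             # sorted, all duplicates gone: apples berries cherries
sample | awk '!seen[$0]++'   # first occurrences, original order: berries apples cherries
```

Each command prints one entry per line; the comments list the entries in order.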

How to retain part of a string when using SED?

I have multiple lines in a file like these
APPLE JUICE
APPLE JAM
APPLE JELLY
I want to replace "APPLE" with "ORANGE" and append "SHOP" to the end of the string. The output would be
ORANGE JUICE SHOP
ORANGE JAM SHOP
ORANGE JELLY SHOP
How to do this in sed or vim?
EDIT1:
I found a solution that works in sed
#replace APPLE with ORANGE
sed -i s/APPLE/ORANGE/g foo.txt
#in a line containing ORANGE, append SHOP at the end of the line
sed -i '/ORANGE/s/$/ SHOP/' foo.txt
The problem now is that I can't get the second command to work in vim. So this is a vim question now.
No, @Kent's answer does not skip the rows that lack APPLE; it appends SHOP to every line in the file. You need to use RE sub-expressions:
printf '%s\n' "APPLE JUICE" "CHERRY JAM" "APPLE JELLY" | sed 's/^APPLE \(.*\)$/ORANGE \1 SHOP/'
ORANGE JUICE SHOP
CHERRY JAM
ORANGE JELLY SHOP
In vim itself try this one in command mode
:%s/APPLE \(.*\)/ORANGE \1 SHOP/g
here you go:
kent$ echo "APPLE JUICE
APPLE JAM
APPLE JELLY"|sed 's/\bAPPLE\b/ORANGE/;s/$/ SHOP/'
ORANGE JUICE SHOP
ORANGE JAM SHOP
ORANGE JELLY SHOP
I added word boundary, so that something like PINEAPPLE won't be replaced by PINEORANGE
in vim, with the same idea:
%s/\<APPLE\>/ORANGE/|%s/$/ SHOP/
EDIT
When I posted the answer, there was no requirement to append SHOP only when there is an ORANGE match; the OP updated the question after my answer.
anyway, I added another vim cmd:
%s/\<APPLE\>/ORANGE/g|%s/.*\<ORANGE\>.*/& SHOP/
@dyomas is right that @Kent's answer indiscriminately appends SHOP to all lines.
Here is a solution that does not use sub-expressions; instead, it uses the t command that tests for substitutions and only then appends SHOP.
sed 's/\bAPPLE\b/ORANGE/; t append; b; :append s/$/ SHOP/'
Because the t command jumps in case of substitution (but we need the opposite), we have to jump "over" a b command that otherwise aborts the line processing.
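The same effect can also be sketched by inverting the test: branch to the end of the script for lines that do not contain APPLE, so the remaining commands only see matching lines. This is an alternative to the t-based version, not a replacement for it. It assumes GNU sed (for \b), and rename_shop is a hypothetical function name.

```shell
# Hypothetical helper: rename APPLE to ORANGE and append SHOP,
# leaving every other line untouched. Assumes GNU sed for \b.
rename_shop() {
  sed -e '/\bAPPLE\b/!b' \
      -e 's/\bAPPLE\b/ORANGE/' \
      -e 's/$/ SHOP/'
}

printf '%s\n' 'APPLE JUICE' 'CHERRY JAM' 'PINEAPPLE PIE' 'APPLE JELLY' |
  rename_shop
```

Here b without a label jumps to the end of the script, so non-matching lines such as CHERRY JAM and PINEAPPLE PIE are printed unchanged.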

How to extract the first word that follows a string?

For example, say I have a text file example.txt that reads:
I like dogs.
My favorite dog is George because he is my dog.
George is a nice dog.
Now how do I extract "George" given that it is the first word that follows "My favorite dog is"?
What if there is more than one space, e.g.
My favorite dog is     George .....
Is there a way to reliably extract the word "George" regardless of the number of spaces between "My favorite dog is" and "George"?
If you do not have perl installed you can use sed:
sed -n 's/.*My favorite dog is  *\([a-zA-Z]*\).*/\1/p' example.txt
Pure Bash:
string='blah blah ! HEAT OF FORMATION 105.14088 93.45997 46.89387 blah blah'
pattern='HEAT OF FORMATION ([^[:blank:]]*)'
[[ $string =~ $pattern ]]
match=${BASH_REMATCH[1]}
You can do:
perl -ne 'print "$1\n" if /My favorite dog is\s+(\w+)/' example.txt
It outputs George.
If you are searching a file, especially a big one, using external tools like sed/awk/perl is faster than using pure bash loops and bash string manipulation.
sed -n 's/.*HEAT OF FORMATION[ \t]*\(.[^ \t]*\).*/\1/p' file
Pure bash string manipulation is only a good fit when you are processing a few simple strings inside your script, such as manipulating a variable.
