How to use grep to match two strings in the same line - string

How can I use grep to find two terms / strings in one line?
A line should be output (or an entry written to the log file) only if both terms / strings are found in it.
I have made the following attempts:
egrep -n --color '(string1.*string2)' debuglog.log
In this example, everything between the two strings is marked.
But I would like to see only the two found strings marked.
Is that possible?
Maybe this could be done with another tool; I am open to suggestions.

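The simplest solution is to first select only the lines that contain both strings and then grep twice to color the matches, e.g.:
egrep -n 'string1.*string2|string2.*string1' debuglog.log |
egrep --color=always 'string1' | egrep --color 'string2'
It is important to set --color=always on the middle grep; otherwise grep sees that its output is a pipe rather than a terminal and omits the color codes. The -n goes on the first grep so that the line numbers refer to the original file rather than to the already filtered stream.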

Here is a single-command awk solution that wraps each matched string in color codes (as written, it only selects lines where string1 comes before string2):
awk '/string1.*string2/{
gsub(/string1|string2/, "\033[01;31m\033[K&\033[m"); print}' file
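If the two strings may appear in either order, the same idea can test for each string separately and then color both. A minimal sketch, assuming the terms are plain text without regex metacharacters; the function name highlight_both is made up for illustration:

```shell
# highlight_both TERM1 TERM2 [FILE...]
# Hypothetical helper: prints "lineno:line" for every line that contains both
# terms (in any order) and wraps each occurrence in ANSI bold-red codes.
# The terms are used as awk regexes, so metacharacters are NOT escaped.
highlight_both() {
    t1=$1
    t2=$2
    shift 2
    awk -v t1="$t1" -v t2="$t2" '
        $0 ~ t1 && $0 ~ t2 {
            # & in the replacement stands for the matched text
            gsub("(" t1 ")|(" t2 ")", "\033[01;31m&\033[m")
            print NR ":" $0
        }' "$@"
}

# Demo: only the first line contains both terms.
printf 'string2 then string1\nonly string1 here\n' | highlight_both string1 string2
```

Because the line number comes from awk's NR, it refers to the original input, and only the two terms are colored, not the text between them.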

I know some people will disagree, but I think the best way is to do it like this:
Let's say this is your input:
$ cat > fruits.txt
apple banana
orange strawberry
coconut watermelon
apple peach
With this code you get exactly what you need, and the code looks nicer and cleaner:
awk '{ if ( $0 ~ /apple/ && $0 ~ /banana/ )
{
print $0
}
}' fruits.txt
But, as I said before, some people will disagree because it's too much typing. The short way with grep is to chain several greps, e.g.:
grep 'apple' fruits.txt | grep 'banana'
Regards!
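The chained-grep idea extends to any number of terms. A small sketch, assuming fixed-string (not regex) matching is wanted; match_all is a hypothetical helper name, not an existing command:

```shell
# match_all FILE TERM...
# Hypothetical helper: prints the lines of FILE that contain every TERM
# as a literal substring, in any order.
match_all() {
    file=$1
    shift
    awk '
        BEGIN {
            # Move the search terms out of ARGV so awk does not open them as files.
            for (i = 2; i < ARGC; i++) { term[i - 1] = ARGV[i]; ARGV[i] = "" }
            n = ARGC - 2
        }
        {
            for (i = 1; i <= n; i++)
                if (index($0, term[i]) == 0) next   # one term missing: skip the line
            print
        }' "$file" "$@"
}
```

With the fruits.txt above, match_all fruits.txt apple banana prints only the line apple banana.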

I am a little confused about what you really want, as there was no sample data or expected output, but:
$ cat foo
1
2
12
21
132
13
And the awk that prints the matching parts of the records:
$ awk '
/1/ && /2/ {
while(match($0,/1|2/)) {
b=b substr($0,RSTART,RLENGTH)
$0=substr($0,RSTART+RLENGTH)
}
print b
b=""
}' foo
12
21
12
but it fails to print overlapping matches.

Related

Filtering on a condition using the column names and not numbers

I am trying to filter a text file with columns based on two conditions. Due to the size of the file, I cannot use the column numbers (as there are thousands and are unnumbered) but need to use the column names. I have searched and tried to come up with multiple ways to do this but nothing is returned to the command line.
Here are a few things I have tried:
awk '($colname1==2 && $colname2==1) { count++ } END { print count }' file.txt
to filter out the columns based on their conditions
and
head -1 file.txt | tr '\t' | cat -n | grep "COLNAME
to try and return the possible column number related to the column.
An example file would be:
ID ad bd
1 a fire
2 b air
3 c water
4 c water
5 d water
6 c earth
Output would be:
2 (count of ad=c and bd=water)
With your input file and the implied conditions, this should work:
$ awk -v c1='ad' -v c2='bd' 'NR==1{n=split($0,h); for(i=1;i<=n;i++) col[h[i]]=i}
$col[c1]=="c" && $col[c2]=="water"{count++} END{print count+0}' file
2
Or you can replace c1 and c2 with the values in the script as well.
To find the column indices you can run
$ awk -v cols='ad bd' 'BEGIN{n=split(cols,c); for(i=1;i<=n;i++) colmap[c[i]]}
NR==1{for(i=1;i<=NF;i++) if($i in colmap) print $i,i; exit}' file
ad 2
bd 3
Or perhaps with this chain
$ sed 1q file | tr -s ' ' \\n | nl | grep -E 'ad|bd'
2 ad
3 bd
although this may produce false positives, because the regex matches anywhere on the line (including inside longer column names).
You can rewrite the awk to be more succinct
$ awk -v cols='ad bd' '{while(++i<=NF) if(FS cols FS ~ FS $i FS) print $i,i;
exit}' file
ad 2
bd 3
As I mentioned in an earlier comment, the answer at https://unix.stackexchange.com/a/359699/133219 shows how to do this:
awk -F'\t' '
NR==1 {
for (i=1; i<=NF; i++) {
f[$i] = i
}
}
($(f["ad"]) == "c") && ($(f["bd"]) == "water") { cnt++ }
END { print cnt+0 }
' file
2
I'm assuming your input is tab-separated due to the tr '\t' in the command in your question that looks like you're trying to convert tabs to newlines to convert column names to numbers. If I'm wrong and they're just separated by any chains of white space then remove -F'\t' from the above.
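For context on why the attempt in the question returns nothing: in awk, $colname1 means "the field whose number is stored in the variable colname1", and an unset variable evaluates to 0, so the test compares the whole line ($0) with 2. The header-lookup idea above can also print the matching rows instead of counting them, and complain when a column name is missing. A minimal sketch, assuming whitespace-separated input; filter_by_name is an illustrative name:

```shell
# filter_by_name FILE COL1 VAL1 COL2 VAL2
# Hypothetical helper: prints the header plus every row where column COL1
# equals VAL1 and column COL2 equals VAL2. Columns are looked up by name.
filter_by_name() {
    awk -v c1="$2" -v v1="$3" -v c2="$4" -v v2="$5" '
        NR == 1 {
            for (i = 1; i <= NF; i++) f[$i] = i   # header name -> field number
            if (!(c1 in f) || !(c2 in f)) {
                print "unknown column name" > "/dev/stderr"
                exit 2
            }
            print                                  # keep the header line
            next
        }
        $(f[c1]) == v1 && $(f[c2]) == v2' "$1"
}
```

For the example file, filter_by_name file.txt ad c bd water prints the header and rows 3 and 4; for tab-separated input, add -F'\t' to the awk call.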
Use the Miller toolkit (mlr) to manipulate tab-delimited files using column names. Below is a one-liner that filters a tab-delimited file (the format is specified with --tsv) and writes the results to STDOUT together with the header. The header is removed using tail and the lines are counted with wc.
mlr --tsv filter '$ad == "c" && $bd == "water"' file.txt | tail -n +2 | wc -l
Prints:
2
SEE ALSO:
miller manual
Note that miller can be easily installed, for example, using conda, like so:
conda create --name miller miller
For years it bugged me there is no succinct way in Unix to do this sort of thing, although miller is a pretty good tool for this. Recently I wrote pick to choose columns by name, and additionally modify, combine and add them by name, as well as filtering rows by clauses using column names. The solution to the above with pick is
pick -h #ad=c #bd=water < data.txt | wc -l
By default pick prints the header of the selected columns, -h is to omit it. To print columns you simply name them on the command line, e.g.
pick ad water < data.txt | wc -l
Pick has many modes, all of them focused on manipulating columns and selecting/filtering rows with a minimal amount of syntax.

Filtering by author and counting all numbers in a txt file - Linux terminal, bash

I need help with two things.
1) file.txt is a list of films in which the author, year of publication and title are on separate lines, e.g.
author1
year1
title1
author2
year2
title2
author3
year3
title3
author4
year4
title4
I need to show only book titles whose author is "Joanne Rowling"
2)
one.txt contains numbers and letters for example like:
dada4dawdaw54 232dawdawdaw 53 34dadasd
77dkwkdw
65 23 laka 23
I need to sum all of the numbers and get the total; here it should be 565.
I tried something like that:
awk '{for(i=1;i<=NF;i++)s+=$i}END{print s}' plik2.txt
but it doesn't make sense
For the 1st question, the solution of okulkarni is great.
For the 2nd question, one solution is
sed 's/[^0-9]/ /g' one.txt | awk '{for(i=1;i<=NF;i++) sum+= $i} END { print sum}'
The sed command converts all non-numeric characters into spaces, while the awk command sums the numbers, line by line.
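The same sum can be computed in awk alone by splitting each line on runs of non-digits. A minimal sketch; sum_numbers is a made-up name, and it assumes only unsigned integers matter (no signs or decimals):

```shell
# sum_numbers [FILE...]
# Hypothetical helper: adds up every run of digits found in the input.
sum_numbers() {
    awk '{
            # Split on runs of non-digit characters; empty pieces add 0.
            n = split($0, parts, /[^0-9]+/)
            for (i = 1; i <= n; i++) sum += parts[i]
         }
         END { print sum + 0 }' "$@"
}

# Demo with the sample lines from the question: prints 565
printf 'dada4dawdaw54 232dawdawdaw 53 34dadasd\n77dkwkdw\n65 23 laka 23\n' | sum_numbers
```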
For the first question, you just need to use grep. Specifically, you can do grep -A 2 "Joanne Rowling" file.txt. This will show all lines with "Joanne Rowling" and the two lines immediately after.
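Note that grep -A 2 prints the author and year lines as well, and GNU grep adds -- separator lines between groups that are not adjacent. If only the titles are wanted, one option is to let awk track the three-line records. A sketch, assuming every record is exactly author, year, title with no blank lines in between; titles_by is an illustrative name:

```shell
# titles_by AUTHOR [FILE...]
# Hypothetical helper: prints only the title line of each 3-line record
# whose author line equals AUTHOR exactly.
titles_by() {
    author=$1
    shift
    awk -v a="$author" '
        NR % 3 == 1 { keep = ($0 == a) }   # author line starts each record
        NR % 3 == 0 && keep                # title line ends it: print if kept
    ' "$@"
}
```

Usage would be titles_by "Joanne Rowling" file.txt.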
For the second question, you can also use grep by doing grep -Eo '[0-9]+' one.txt | paste -sd+ - | bc. grep -o prints every number it finds on its own line, paste joins those lines with a + between them, and bc evaluates the resulting expression.

grep lines that contain 1 character followed by another character

I'm working on my assignment and I've been stuck on this question, and I've tried looking for a solution online and my textbook.
The question is:
List all the lines in the f3.txt file that contain words with a character b not followed by a character e.
I'm aware you can do grep -i 'b' to find the lines that contain the letter b, but how can I make it so that it only shows the lines that contain b but not followed by the character e?
This will find a "b" that is not followed by "e":
$ echo "one be
two
bring
brought" | egrep 'b[^e]'
Or if perl is available but egrep is not:
$ echo "one be
two
bring
brought" | perl -ne 'print if /b[^e]/;'
And if you want to find lines with "b" not followed by "e" but no words that contain "be" (using the \w perl metacharacter to catch another character after the b), and avoiding any words that end with b:
$ echo "lab
bribe
two
bring
brought" | perl -ne 'print if /b\w/ && ! /be/'
So the final call would be:
$ perl -ne 'print if /b\w/ && ! /be/' f3.txt
Handling "edge" words that may exist and break the exercise, like lab, bribe and bob:
$ a="one
two
lab
bake
bob
aberon
bee
bell
bribe
bright
eee"
$ echo "$a" |grep -v 'be' |grep 'b.'
bake
bob
bright
You can go for the following two solutions:
grep -ie 'b[^e]' input_file.txt
or
grep -ie 'b.' input_file.txt | grep -vi 'be'
The first one uses a single regex:
'b[^e]' means b followed by any character that is not e
-i ignores case, so with this option lines containing B or b not directly followed by e or E are accepted
The second solution calls grep twice:
the first grep selects the lines that contain b followed by any character
the second grep filters the resulting lines with -v, rejecting those that contain be
both greps ignore case because of -i
If b must be followed by another character, use b. (regex for b followed by any character). If you also want to accept lines where b is the last character on the line, use plain b in the first grep call instead:
grep -ie 'b' input_file.txt | grep -vi 'be'
input:
BEBE
bebe
toto
abc
bobo
result:
abc
bobo
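One detail of the single-regex form: b[^e] needs some character after the b, so a line whose only b is its last character (such as lab) is not matched. A minimal sketch that also accepts a b at the end of the line, using extended-regex alternation; b_not_e is a made-up name:

```shell
# b_not_e [FILE...]
# Hypothetical helper: prints lines that contain a b which is either followed
# by a character other than e, or is the last character on the line.
b_not_e() {
    grep -E 'b([^e]|$)' "$@"
}

# Demo: prints "lab" and "bring"; "bee" and "two" are rejected.
printf 'lab\nbee\nbring\ntwo\n' | b_not_e
```

Like b[^e], this still prints a line such as bribe, because its first b is followed by r.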

sed command to strip a match found

I have a file "fruit.xml" that looks like the below:
FRUIT="Apples"
FRUIT="Bananas"
FRUIT="Peaches"
I want to use a single sed command to find all occurrences of FRUIT=" and extract the value between the quotes from every match.
So the result should look like:
Apples
Bananas
Peaches
This is the command I am using:
sed 's/.*FRUIT="//' fruit.xml
The problem is that it leaves the closing " at the end of the value I need, e.g. Apples".
Just catch the group and print it back: catch everything from " until another " is found with the () (or \(...\) if you don't use the -r option). Then, print it back with \1:
$ sed -r 's/.*FRUIT="([^"]*)"/\1/' file
Apples
Bananas
Peaches
You can also use field separators with awk: tell awk that your field separators are either FRUIT=" or ". This way, the desired content becomes the 2nd field.
$ awk -F 'FRUIT="|"' '{print $2}' file
Apples
Bananas
Peaches
To make your command work, just strip the " at the end of the line:
$ sed -e 's/.*FRUIT="//' -e 's/"$//' file
      ^^                 ^^^^^^^^^^^
      |                  replace " at the end of the line with nothing
      -e lets you use multiple commands
This would be enough if you want to keep the leading spaces,
sed 's/\bFRUIT="\([^"]*\)"/\1/' fruit.xml
OR
sed 's/\bFRUIT="\|"//g' fruit.xml
Try this; it replaces the whole line with the fruit found between the quotes:
sed 's/.*FRUIT="\(.*\)"/\1/' test.xml
Use a simple cut command
cut -d '"' -f2 fruit.xml
Output:
Apples
Bananas
Peaches
Assuming one occurrence per line and this format:
sed 's/.*="//;s/".*$//' fruit.xml
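In a real XML file most lines will not contain FRUIT=" at all, and the substitutions above print those lines unchanged. A minimal sketch that prints only the captured values, using sed -n together with the p flag; fruit_values is an illustrative name, and it assumes at most one FRUIT="..." per line:

```shell
# fruit_values [FILE...]
# Hypothetical helper: prints the text between the quotes of FRUIT="..."
# and suppresses every line that has no such attribute.
fruit_values() {
    # -n turns off automatic printing; the p flag prints only lines
    # where the substitution actually happened.
    sed -n 's/.*FRUIT="\([^"]*\)".*/\1/p' "$@"
}

# Demo: prints "Apples" and "Bananas", skipping the surrounding tags.
printf '<basket>\n  FRUIT="Apples"\n  FRUIT="Bananas" />\n</basket>\n' | fruit_values
```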

How to use Linux command(sed?) to delete specific lines in a file?

I have a file that contains a matrix. For example, I have:
1 a 2 b
2 b 5 b
3 d 4 b
4 b 7 b
I know it's easy to use sed command to delete specific lines with specific strings. But what if I only want to delete those lines where the second field's value is b (i.e., second line and fourth line)?
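You can use a regex in sed. In sed's default (basic) regex syntax a bare + is a literal plus sign, so use -E to make it mean "one or more" (\s is a GNU extension):
sed -i -E 's/^[0-9]+\s+b(\s.*)?$//' xxx_file
or
sed -i -E '/^[0-9]+\s+b(\s|$)/d' xxx_file
The first form empties the matching lines but leaves blank lines behind; the second deletes them entirely.
The "-i" option modifies the file's content directly; you can remove "-i" and redirect the output to another file instead.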
Awk works fine for this; just use the code below:
awk '{if ($2 != "b") print $0;}' file
If you want to learn more about awk, read its man page.
awk:
cat yourfile.txt | awk '{if($2!="b"){print;}}'
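Unlike sed -i, the awk commands above only write to standard output. A minimal sketch of an in-place variant that goes through a temporary file; drop_rows is a hypothetical helper name, and it assumes whitespace-separated fields:

```shell
# drop_rows FILE VALUE
# Hypothetical helper: removes from FILE every line whose second field
# equals VALUE, by filtering into a temporary file and copying it back.
drop_rows() {
    tmp=$(mktemp) || return 1
    if awk -v v="$2" '$2 != v' "$1" > "$tmp"; then
        # Copy the contents back so FILE keeps its owner and permissions.
        cat "$tmp" > "$1"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

For the matrix in the question, drop_rows file b leaves only the first and third lines.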
