grep lines that contain 1 character followed by another character - linux

I'm working on my assignment and I've been stuck on this question. I've tried looking for a solution online and in my textbook.
The question is:
List all the lines in the f3.txt file that contain words with a character b not followed by a character e.
I'm aware you can do grep -i 'b' to find the lines that contain the letter b, but how can I make it so that it only shows the lines that contain b but not followed by the character e?

This will find a "b" that is followed by any character other than "e" (a "b" at the very end of a line does not match):
$ echo "one be
two
bring
brought" | egrep 'b[^e]'
Or if perl is available but egrep is not:
$ echo "one be
two
bring
brought" | perl -ne 'print if /b[^e]/;'
And if you want lines that contain a "b" followed by another word character (the \w perl metacharacter, so a word that merely ends with b does not count) but that contain no "be" anywhere:
$ echo "lab
bribe
two
bring
brought" | perl -ne 'print if /b\w/ && ! /be/'
So the final call would be:
$ perl -ne 'print if /b\w/ && ! /be/' f3.txt

Handling "edge" words that may exist and break the exercise, like lab, bribe and bob:
$ a="one
two
lab
bake
bob
aberon
bee
bell
bribe
bright
eee"
$ echo "$a" |grep -v 'be' |grep 'b.'
bake
bob
bright

You can go for the following two solutions:
grep -ie 'b[^e]' input_file.txt
or
grep -ie 'b.' input_file.txt | grep -vi 'be'
The first one uses a single regex:
'b[^e]' means b followed by any character that is not e
-i ignores case, so lines containing B or b that is not directly followed by e or E are accepted
The second solution calls grep twice:
the first grep selects the lines that contain b followed by any character
the second grep filters those lines with -v, rejecting every line that contains be
both greps ignore case because of -i
If b must be followed by another character, use b. (a regex meaning b followed by any single character). If you also want to accept lines where b is the last character on the line, use plain b in the first grep call instead:
grep -ie 'b' input_file.txt | grep -vi 'be'
input:
BEBE
bebe
toto
abc
bobo
result:
abc
bobo
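The answers above read the question in slightly different ways, so here is a rough side-by-side sketch. The sample lines are made up for illustration (the real f3.txt is not shown in the question), and the variable names are only placeholders.

```shell
# Hypothetical sample input; the contents of f3.txt are not shown in the question.
sample=$(printf '%s\n' 'one be' two lab bribe bring)

# Reading 1: some "b" is followed by a character other than "e".
# A "b" at the end of a line has no following character, so "lab" is skipped.
strict=$(printf '%s\n' "$sample" | grep 'b[^e]')

# Reading 2: some "b" is followed by a non-"e" character or by end of line.
with_eol=$(printf '%s\n' "$sample" | grep -E 'b([^e]|$)')

# Reading 3: the line contains a "b" but no "be" anywhere.
no_be=$(printf '%s\n' "$sample" | grep 'b' | grep -v 'be')

printf 'strict:\n%s\n\nwith_eol:\n%s\n\nno_be:\n%s\n' "$strict" "$with_eol" "$no_be"
```

Note that "bribe" passes the first two readings (its first b is followed by r) but fails the third, so it is worth checking which reading the assignment intends.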

Related

How to use grep to match two strings in the same line

How can I use grep to find two terms / strings in one line?
The output, or an entry in the log file, should only be made if the two terms / strings have been found.
I have made the following attempts:
egrep -n --color '(string1.*string2)' debuglog.log
In this example, everything between the two strings is marked.
But I would like to see only the two found strings marked.
Is that possible?
Maybe you could do this with another tool, I am open for suggestions.
The simplest solution would be to first select only the lines that contain both strings and then grep twice to color the matches, e.g.:
egrep -n 'string1.*string2|string2.*string1' debuglog.log |
egrep --color=always 'string1' | egrep --color 'string2'
It is important to set color to always on the middle grep, otherwise it won't output the color information to the pipe. The -n belongs on the first grep so that the line numbers refer to the original file rather than to the already filtered lines.
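A possible variation, sketched under the assumption that your grep supports --color (GNU and BSD grep do): filter for both strings first, then color both in one final pass by giving grep two -e patterns. The temporary log file and its contents are invented for the example.

```shell
# Hypothetical log contents standing in for debuglog.log.
log=$(mktemp)
printf '%s\n' 'string1 then string2' 'only string1 here' 'string2 before string1' > "$log"

# Keep only lines containing both strings, in either order.
both=$(grep 'string1' "$log" | grep 'string2')

# One coloring pass: each -e adds a pattern, and every match gets highlighted.
printf '%s\n' "$both" | grep --color=always -e 'string1' -e 'string2'

rm -f "$log"
```

Because the coloring happens in a single grep, there are no escape codes in the input of a later grep to worry about.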
Here is single command awk solution that prefixes and suffixes matched strings with color codes:
awk '/string1.*string2/{
gsub(/string1|string2/, "\033[01;31m\033[K&\033[m"); print}' file
I know some people will disagree, but I think the best way is to do it like this:
Let's say this is your input:
$ cat > fruits.txt
apple banana
orange strawberry
coconut watermelon
apple peach
With this code you can get exactly what you need, and the code looks nicer and cleaner :
awk '{ if ( $0 ~ /apple/ && $0 ~ /banana/ )
{
print $0
}
}' fruits.txt
But, as I said before, some people will disagree because it's too much typing. The short way with grep is to chain several greps, e.g.:
grep 'apple' fruits.txt | grep 'banana'
I am a little confused about what you really want, as there was no sample data or expected output, but:
$ cat foo
1
2
12
21
132
13
And the awk that prints the matching parts of the records:
$ awk '
/1/ && /2/ {
while(match($0,/1|2/)) {
b=b substr($0,RSTART,RLENGTH)
$0=substr($0,RSTART+RLENGTH)
}
print b
b=""
}' foo
12
21
12
This fails to print overlapping matches, though.

Swapping the first word with itself 3 times only if there are 4 words only using sed

Hi, I'm trying to solve a problem using only sed commands, without pipelines. I am, however, allowed to redirect the output of a sed command to a file or to read its input from a file.
EX:
sed s/dog/cat/ >| tmp
or
sed s/dog/cat/ < tmp
Anyway, let's say I had a file F1 with these contents:
Hello hi 123
if a equals b
you
one abc two three four
dany uri four 123
The output should be:
if if if a equals b
dany dany dany uri four 123
Explanation: the program must only print lines that have exactly 4 words and when it prints them it must print the first word of the line 3 times.
I've tried doing commands like this:
sed '/[^ ]*.[^ ]*.[^ ]*/s/[^ ]\+/& & &/' F1
or
sed 's/[^ ]\+/& & &/' F1
but I can't figure out how to check with sed that a line has exactly 4 words.
Any help will be appreciated.
$ sed -En 's/^([^[:space:]]+)([[:space:]]+[^[:space:]]+){3}$/\1 \1 &/p' file
if if if a equals b
dany dany dany uri four 123
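The above uses a sed that supports EREs with a -E option (e.g. GNU and OSX seds).
If the fields are separated by single blanks (spaces or tabs), this works without -E. It copies the line to the hold space, strips everything except the blanks, deletes the line unless exactly three blanks remain, then restores the line and triples the first word:
sed 'h;s/[^[:blank:]]//g;/^[[:blank:]]\{3\}$/!d;x;s/\([^[:blank:]]*[[:blank:]]\)/\1\1\1/' infile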
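To check the ERE version against the sample from the question, here is a small self-contained sketch. It recreates F1 in a temporary file (the temp file and the result variable are just scaffolding for the example) and keeps to the "no pipeline" rule by letting sed read the file directly.

```shell
# Recreate the sample file F1 from the question in a temporary file.
f1=$(mktemp)
cat > "$f1" <<'EOF'
Hello hi 123
if a equals b
you
one abc two three four
dany uri four 123
EOF

# ^(word)(separator word){3}$ matches lines with exactly four words.
# In the replacement, \1 is the first word and & is the whole matched line,
# so "\1 \1 &" yields the first word three times in total.
# -n with the p flag prints only the lines where the substitution happened.
result=$(sed -En 's/^([^[:space:]]+)([[:space:]]+[^[:space:]]+){3}$/\1 \1 &/p' "$f1")
printf '%s\n' "$result"

rm -f "$f1"
```

Lines with leading or trailing whitespace would not match this pattern; that is an assumption about the input, not something the question states.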

Extract values from a fixed-width column

I have text file named file that contains the following:
Australia              AU 10
New Zealand            NZ 1
...
If I use the following command to extract the country names from the first column:
awk '{print $1}' file
I get the following:
Australia
New
...
Only the first word of each country name is output.
How can I get the entire country name?
Try this:
$ awk '{print substr($0,1,15)}' file
Australia
New Zealand
To complement Raymond Hettinger's helpful POSIX-compliant answer:
It looks like your country-name column is 23 characters wide.
In the simplest case, if you don't need to trim trailing whitespace, you can just use cut:
# Works, but has trailing whitespace.
$ cut -c 1-23 file
Australia
New Zealand
Caveat: GNU cut is not UTF-8 aware, so if the input is UTF-8-encoded and contains non-ASCII characters, the above will not work correctly.
To trim trailing whitespace, you can take advantage of GNU awk's nonstandard FIELDWIDTHS variable:
# Trailing whitespace is trimmed.
$ awk -v FIELDWIDTHS=23 '{ sub(" +$", "", $1); print $1 }' file
Australia
New Zealand
FIELDWIDTHS=23 declares the first field (reflected in $1) to be 23 characters wide.
sub(" +$", "", $1) then removes trailing whitespace from $1 by replacing any nonempty run of spaces (" +") at the end of the field ($1) with the empty string.
However, your Linux distro may come with Mawk rather than GNU Awk; use awk -W version to determine which one it is.
For a POSIX-compliant solution that trims trailing whitespace, extend Raymond's answer:
# Trailing whitespace is trimmed.
$ awk '{ c=substr($0, 1, 23); sub(" +$", "", c); print c}' file
Australia
New Zealand
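For anyone who wants to try the substr approach without the original file, here is a sketch that first generates fixed-width data and then slices it. The layout (23 characters for the name, 3 for the code, 2 for the number) is an assumption for illustration; only the 23-character name width comes from the answer above.

```shell
# Build sample fixed-width data; printf reuses the format for each group of three arguments.
# Assumed layout: name padded to 23 characters, code padded to 3, count right-aligned in 2.
data=$(mktemp)
printf '%-23s%-3s%2s\n' 'Australia' 'AU' '10' 'New Zealand' 'NZ' '1' > "$data"

# Column 1: characters 1-23, with trailing spaces trimmed.
names=$(awk '{ name = substr($0, 1, 23); sub(/ +$/, "", name); print name }' "$data")

# Column 2: the two-letter code starts at character 24.
codes=$(awk '{ print substr($0, 24, 2) }' "$data")

printf '%s\n' "$names" "$codes"
rm -f "$data"
```

substr takes a start position and a length, so each column only needs its offset and width, and this works in any POSIX awk.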
Another option is to get rid of the last two columns:
awk 'NF>2 && NF-=2' file
NF>2 is a guard that keeps only records with more than 2 fields. If your data is consistent, you can drop it and simply use:
awk 'NF-=2' file
This doesn't help when the data itself contains spaces, but often it doesn't:
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
foo bar baz etc...
In these cases it's really easy to get, say, the IMAGE column using tr to remove multiple spaces:
$ docker ps | tr --squeeze-repeats ' '
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
foo bar baz
Now you can pipe this (without the pesky header row) to cut:
$ docker ps | tr --squeeze-repeats ' ' | tail -n +2 | cut -d ' ' -f 2
foo

How to make filter on an output based on string lines?

I have the following output
$ mycommand
1=aaa
1=eee
12=cccc
15=bbb
And I have a string str containing:
eee
cccc
and I want to display only the lines that contain one of the strings listed in str.
So my output will be:
$ mycommand | use_awk_or_sed_or_any_command
1=eee
12=cccc
If you store the strings in a file, you can use grep with its -f option:
$ cat search
eee
cccc
$ grep -wf search file
1=eee
12=cccc
You might also need the -F option if your strings contain special characters like ., $ etc.
Say your command is echo -e "1=aaa\n1=eee\n12=cccc\n15=bbb", you could do
echo -e "1=aaa\n1=eee\n12=cccc\n15=bbb" | grep -wE "$(sed 'N;s/\n/|/' <<<"$str")"
The sed command simply replaces the newline (\n) between the two lines of $str with |, which grep -E (for extended regular expressions) uses to separate alternative patterns. This means that grep will print lines matching either eee or cccc. The -w option ensures that the match is an entire word, so that things like eeeeee will not be matched.
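As a possibly simpler sketch: grep treats a pattern argument that contains newlines as a list of patterns, one per line, so the multi-line string can be passed as it is. The mycommand function below is only a stand-in for the real command.

```shell
# The search strings, one per line, as in the question.
str=$(printf 'eee\ncccc')

# Stand-in for the real command (hypothetical; it just prints the sample output).
mycommand() { printf '%s\n' '1=aaa' '1=eee' '12=cccc' '15=bbb'; }

# -F: treat each line of $str as a fixed string, not a regex.
# -w: match whole words only, so "eee" does not match inside "eeeeee".
# -e: pass the newline-separated list as the pattern list.
filtered=$(mycommand | grep -wF -e "$str")
printf '%s\n' "$filtered"
```

This avoids building a regex at all, which also sidesteps trouble with special characters in the search strings.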

How to use Linux command(sed?) to delete specific lines in a file?

I have a file that contains a matrix. For example, I have:
1 a 2 b
2 b 5 b
3 d 4 b
4 b 7 b
I know it's easy to use sed command to delete specific lines with specific strings. But what if I only want to delete those lines where the second field's value is b (i.e., second line and fourth line)?
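You can use a regex in sed. This blanks out the matching lines (the empty lines stay in the file):
sed -i -E 's/^[0-9]+[[:space:]]+b([[:space:]].*)?$//' xxx_file
or, to delete them completely:
sed -i -E '/^[0-9]+[[:space:]]+b([[:space:]]|$)/d' xxx_file
The -E option enables extended regular expressions, where + means "one or more"; in sed's default basic syntax a bare + is a literal plus sign. Requiring whitespace or end of line after b keeps values such as "bar" from matching. The -i option modifies the file's content directly; you can remove -i and redirect the output to another file instead.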
Awk works fine here; just use the code below:
awk '{if ($2 != "b") print $0;}' file
If you want to learn more about awk, read its man page (man awk).
awk:
awk '{if($2!="b"){print;}}' yourfile.txt
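Unlike sed -i, standard awk has no in-place option, so if the file itself should change, one common pattern is to write to a temporary file and move it over the original. A minimal sketch, with the matrix recreated in a temporary file so it can be run as is (the file and variable names are placeholders):

```shell
# Recreate the sample matrix from the question.
matrix=$(mktemp)
printf '%s\n' '1 a 2 b' '2 b 5 b' '3 d 4 b' '4 b 7 b' > "$matrix"

# A bare condition with no action prints every record for which it is true,
# so this keeps the lines whose second field is not exactly "b".
tmp=$(mktemp)
awk '$2 != "b"' "$matrix" > "$tmp" && mv "$tmp" "$matrix"

kept=$(cat "$matrix")
printf '%s\n' "$kept"
rm -f "$matrix"
```

The && makes sure the original is only replaced when awk succeeded.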
