How to filter command output based on the lines of a string? - linux

I have the following output
$ mycommand
1=aaa
1=eee
12=cccc
15=bbb
And I have a string $str containing:
eee
cccc
and I want to display only the lines that contain one of the strings listed in $str.
So my output will be:
$ mycommand | use_awk_or_sed_or_any_command
1=eee
12=cccc

If you store the strings in a file, you can use grep with its -f option:
$ cat search
eee
cccc
$ grep -wf search file
1=eee
12=cccc
You might also need the -F option if your strings contain regex special characters such as . or $.
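If the patterns live in a shell variable rather than a file, process substitution (a bash/ksh/zsh feature) lets you feed them to grep -f without creating a temporary file; a small sketch, assuming $str holds one pattern per line:
$ mycommand | grep -wFf <(printf '%s\n' "$str")
1=eee
12=cccc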

Say your command is echo -e "1=aaa\n1=eee\n12=cccc\n15=bbb", you could do
echo -e "1=aaa\n1=eee\n12=cccc\n15=bbb" | grep -wE "$(sed 'N;s/\n/|/' <<<"$str")"
The sed command simply replaces the newline (\n) between the two lines with |, which grep -E (extended regular expressions) treats as alternation between patterns. This means that grep will print lines matching either eee or cccc. The -w ensures that the match is a whole word, so that things like eeeeee will not be matched.
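If $str may contain more than two lines, joining them all with paste is a bit more general than the two-line sed; a sketch under the same assumption that each line of $str is a plain word:
$ mycommand | grep -wE "$(paste -sd'|' - <<<"$str")"
1=eee
12=cccc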

Related

Swapping the first word with itself 3 times, only if there are exactly 4 words, using only sed

Hi, I'm trying to solve a problem using only sed commands and without using pipelines. However, I am allowed to redirect the result of a sed command to a file or to read from a file.
EX:
sed s/dog/cat/ >| tmp
or
sed s/dog/cat/ < tmp
Anyway, let's say I have a file F1 whose contents are:
Hello hi 123
if a equals b
you
one abc two three four
dany uri four 123
The output should be:
if if if a equals b
dany dany dany uri four 123
Explanation: the program must only print lines that have exactly 4 words and when it prints them it must print the first word of the line 3 times.
I've tried doing commands like this:
sed '/[^ ]*.[^ ]*.[^ ]*/s/[^ ]\+/& & &/' F1
or
sed 's/[^ ]\+/& & &/' F1
but I can't figure out how I can check with sed that there are exactly 4 words in a line.
Any help will be appreciated.
$ sed -En 's/^([^[:space:]]+)([[:space:]]+[^[:space:]]+){3}$/\1 \1 &/p' file
if if if a equals b
dany dany dany uri four 123
The above uses a sed that supports EREs with a -E option (e.g. GNU and OSX seds). The regex only matches lines consisting of one non-whitespace field (captured as \1) followed by exactly three more, i.e. exactly 4 words; since & is the whole matched line, the replacement \1 \1 & prints the first word twice more in front of the original line.
If the fields are tab separated
sed 'h;s/[^[:blank:]]//g;s/[[:blank:]]\{3\}//;/^$/!d;x;s/\([^[:blank:]]*[[:blank:]]\)/\1\1\1/' infile
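The same idea written out step by step, with each sed command on its own line and a comment above it (a sketch; assumes a sed, such as GNU sed, that accepts comment lines inside a quoted script):
sed '
# save a copy of the original line in the hold space
h
# delete every non-blank character, leaving only the separators
s/[^[:blank:]]//g
# remove exactly three separators
s/[[:blank:]]\{3\}//
# if anything is left, the line did not have exactly 4 words: delete it
/^$/!d
# otherwise restore the original line from the hold space
x
# repeat the first word (with its trailing separator) three times
s/\([^[:blank:]]*[[:blank:]]\)/\1\1\1/
' infile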

grep lines that contain one character not followed by another character

I'm working on my assignment and I've been stuck on this question; I've tried looking for a solution online and in my textbook.
The question is:
List all the lines in the f3.txt file that contain words with a character b not followed by a character e.
I'm aware that you can do grep -i 'b' to find the lines that contain the letter b, but how can I make it show only the lines that contain a b that is not followed by the character e?
This will find a "b" that is not followed by "e":
$ echo "one be
two
bring
brought" | egrep 'b[^e]'
Or if perl is available but egrep is not:
$ echo "one be
two
bring
brought" | perl -ne 'print if /b[^e]/;'
And if you want lines with a "b" not followed by "e", rejecting any line that contains "be" and skipping words that end with b (the \w perl metacharacter requires another word character after the b):
$ echo "lab
bribe
two
bring
brought" | perl -ne 'print if /b\w/ && ! /be/'
So the final call would be:
$ perl -ne 'print if /b\w/ && ! /be/' f3.txt
Excluding "edge" words that may exist and break the exercise, like lab, bribe and bob:
$ a="one
two
lab
bake
bob
aberon
bee
bell
bribe
bright
eee"
$ echo "$a" | grep -v 'be' | grep 'b.'
bake
bob
bright
You can go for the following two solutions:
grep -ie 'b[^e]' input_file.txt
or
grep -ie 'b.' input_file.txt | grep -vi 'be'
The first one uses a regex:
'b[^e]' means b followed by any symbol that is not e
-i is to ignore case; with this option, lines containing B or b that are not directly followed by e or E will be accepted
The second solution calls grep twice:
the first grep selects the lines that contain a b followed by some other character
the resulting lines are filtered by the second grep, using -v to reject lines containing be
both greps ignore the case by using -i
If b must absolutely be followed by another character, then use b. (regex meaning b followed by any other char); otherwise, if you also want to accept lines where b is not followed by any character at all, you can just use b in the first grep call instead of b., like this:
grep -ie 'b' input_file.txt | grep -vi 'be'
input:
BEBE
bebe
toto
abc
bobo
result:
abc
bobo
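If your grep has PCRE support (GNU grep's -P option), a negative lookahead expresses "b not followed by e" directly; note that, unlike b[^e], it also matches a b at the very end of a line. A sketch, not part of the original answers:
$ grep -iP 'b(?!e)' input_file.txt
abc
bobo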

Extract values from a fixed-width column

I have a text file named file that contains the following:
Australia              AU      10
New Zealand            NZ       1
...
If I use the following command to extract the country names from the first column:
awk '{print $1}' file
I get the following:
Australia
New
...
Only the first word of each country name is output.
How can I get the entire country name?
Try this:
$ awk '{print substr($0,1,15)}' file
Australia
New Zealand
To complement Raymond Hettinger's helpful POSIX-compliant answer:
It looks like your country-name column is 23 characters wide.
In the simplest case, if you don't need to trim trailing whitespace, you can just use cut:
# Works, but has trailing whitespace.
$ cut -c 1-23 file
Australia
New Zealand
Caveat: GNU cut is not UTF-8 aware, so if the input is UTF-8-encoded and contains non-ASCII characters, the above will not work correctly.
To trim trailing whitespace, you can take advantage of GNU awk's nonstandard FIELDWIDTHS variable:
# Trailing whitespace is trimmed.
$ awk -v FIELDWIDTHS=23 '{ sub(" +$", "", $1); print $1 }' file
Australia
New Zealand
FIELDWIDTHS=23 declares the first field (reflected in $1) to be 23 characters wide.
sub(" +$", "", $1) then removes trailing whitespace from $1 by replacing any nonempty run of spaces (" +") at the end of the field ($1) with the empty string.
However, your Linux distro may come with Mawk rather than GNU Awk; use awk -W version to determine which one it is.
For a POSIX-compliant solution that trims trailing whitespace, extend Raymond's answer:
# Trailing whitespace is trimmed.
$ awk '{ c=substr($0, 1, 23); sub(" +$", "", c); print c}' file
Australia
New Zealand
To get rid of the last two columns:
awk 'NF>2 && NF-=2' file
NF>2 is the guard that only lets through records with more than 2 fields. If your data is consistent, you can drop it and simply use:
awk 'NF-=2' file
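Note that assigning to NF makes awk rebuild $0 with OFS (a single space by default), so any fixed-width padding is lost; decreasing NF this way is also not guaranteed by POSIX, although GNU awk and mawk both support it. A quick sketch on the sample file:
$ awk 'NF>2 && NF-=2' file
Australia
New Zealand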
This isn't relevant in cases where your data has spaces within a column, but often it doesn't:
$ docker ps
CONTAINER ID   IMAGE   COMMAND   CREATED   STATUS   PORTS   NAMES
foo            bar     baz       etc...
In these cases it's really easy to get, say, the IMAGE column using tr to remove multiple spaces:
$ docker ps | tr --squeeze-repeats ' '
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
foo bar baz
Now you can pipe this (without the pesky header row) to cut:
$ docker ps | tr --squeeze-repeats ' ' | tail -n +2 | cut -d ' ' -f 2
bar
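Since awk already treats runs of blanks as a single separator, the same extraction can be done in one step; a sketch, again pulling the IMAGE column and skipping the header row:
$ docker ps | awk 'NR > 1 { print $2 }'
bar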

Deleting string up to the first occurrence of certain character

Is there a way to delete all the characters up to and including the first occurrence of a certain character?
123:abc
12:cba
1234:cccc
and the output would be:
abc
cba
cccc
Using sed:
sed 's/^[^:]*://' file
abc
cba
cccc
Or using awk:
awk -F: '{print $2}' file
abc
cba
cccc
You could use cut:
$ cut -d":" -f2- myfile.txt
use awk
echo "123:abc" | awk -F ":" '{print $2}'
-F means to use : as the separator to split the string.
{print $2} means to print the second substring.
If the data is in a variable, you can use parameter expansion:
$ var=123:abc
$ echo ${var#*:}
abc
$
The # removes the shortest match of the pattern *: (anything followed by a colon) from the front of the string. This matches your requirement, "delete all the characters up to and including the first occurrence of a certain character", rather than getting the second colon-delimited field.
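For comparison, ## removes the longest matching prefix instead of the shortest, which matters when the delimiter occurs more than once; a small sketch with a hypothetical value holding two colons:
$ var=123:abc:def
$ echo "${var#*:}"
abc:def
$ echo "${var##*:}"
def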

Extract Lines when Column K is empty with AWK/Perl

I have data that looks like this:
foo 78 xxx
bar yyy
qux 99 zzz
xuq xyz
They are tab delimited.
How can I extract lines where column 2 is empty, yielding
bar yyy
xuq xyz
I tried this but it doesn't seem to work:
awk '$2==""' myfile.txt
You need to specifically set the field separator to a TAB character:
> cat qq.in
foo 78 xxx
bar yyy
qux 99 zzz
xuq xyz
> cat qq.in | awk 'BEGIN {FS="\t"} $2=="" {print}'
bar yyy
xuq xyz
The default behaviour for awk is to treat an FS of SPACE (the default) as a special case. From the man page:
In the special case that FS is a single space, fields are separated by runs of spaces and/or tabs and/or newlines. (my italics)
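To see why the default splitting hides the empty column, compare what awk reports for $2 with and without an explicit tab separator; a quick sketch against the same qq.in file:
> awk '{ print NR": [" $2 "]" }' qq.in
1: [78]
2: [yyy]
3: [99]
4: [xyz]
> awk -F'\t' '{ print NR": [" $2 "]" }' qq.in
1: [78]
2: []
3: [99]
4: []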
perl -F'\t' -lane 'print if $F[1] eq q//' myfile.txt
Command Switches
-F tells Perl what delimiter to autosplit on (tabs in this case)
-a enables autosplit mode, splitting each line on the specified delimiter to populate the array @F
-l strips the trailing newline from each input line and automatically appends a newline "\n" at the end of each printed line
-n processes the file line-by-line
-e treats the first quoted argument as code and not a filename
grep -e '^.*\t\t.*$' myfile.txt
Will grep each line consisting of characters-tab-tab-characters (nothing between tabs).
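Depending on your grep, \t may not be interpreted inside a basic regular expression. Two alternatives that sidestep that (the first assumes a grep with PCRE support, i.e. GNU grep's -P; the second lets the shell insert the literal tabs):
grep -P '\t\t' myfile.txt
grep "$(printf '\t\t')" myfile.txt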
