Bash advanced grep solution - Linux

I have a file.txt on my Linux machine with the following structure:
file.txt:
full name
E-mail: email#email.com
Phone: 0123456789
full name
email#email.com
01/23456789
full name
e: email#email.com
00-223-445-56
.
.
.
etc
Some entries may contain only the name and a phone number, or only the name and an e-mail address.
I would like to use grep so that when I run
./myprogram.sh file.txt
it lists all of the e-mail addresses and phone numbers from the file. How can I do that when file.txt looks like the sample above?

You can start with something simple like this:
cat file.txt | grep -E "(#|[0-9]+)"
It gives you everything with a # (so e-mails) and everything with numbers (so phone numbers). You can use more advanced regular expressions for a better search (e-mails and phone numbers do have stricter rules...), but that's the idea.
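For example, a somewhat stricter (still rough) sketch, written against the sample above where addresses use # instead of @; the exact character classes are an assumption about what your real data allows:
grep -E '[[:alnum:]._%+-]+#[[:alnum:].-]+\.[[:alpha:]]{2,}|[0-9]([ /-]?[0-9]){5,}' file.txt
The first alternative matches something address-shaped, the second a run of at least six digits optionally separated by spaces, slashes or dashes; for real-world addresses you would normally match @ instead of #.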

egrep "#|[0-9]"
will match only lines that contain a "#" or at least one digit. Your example implies that the name lines contain neither.
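Against the sample file.txt above, that should print only the e-mail and phone lines, roughly:
$ egrep "#|[0-9]" file.txt
E-mail: email#email.com
Phone: 0123456789
email#email.com
01/23456789
e: email#email.com
00-223-445-56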

It isn't quite clear which output format you expect. If you want the e-mail addresses and phone numbers separated (is that what you want? The e-mail address <-> phone number pairing would then be lost), you could also use (GNU) sed:
sed -n -e '1 {s/\(.*\)/e-mail:\n\1/; P;};' \
-e '/#/ s/\(.\+[ \\\t]\)\{0,1\}\(.\+\)#\(.\+\)/\t\2\#\3/p;' \
-e '/[0-9]\+$/ H; $ {x; s/\n\([^0-9]*\)\([0-9]\+\)/\n\t\2/g; s/\(.*\)/\nphone:\n\1/p;}' \
file.txt

Do you actually want to use grep, or are you just using the phrase "use grep" to mean "filter"? Assuming the latter (since grep is the wrong tool for this), and assuming that each record has a single e-mail address in the final column of its 2nd line and a phone number in the final column of its 3rd line, and that records are separated by an empty line (one with no whitespace on it), you could do:
<file.txt awk '{print $NF}' | awk '{print $2,$3}' FS='\n' RS=
You could do it with a single awk invocation, but this is simpler and probably sufficient.
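If you do want the single-invocation version, here is a minimal sketch under the same assumptions (blank-line-separated records, e-mail in the last column of a record's 2nd line, phone in the last column of its 3rd):
awk -v RS= -v FS='\n' '{n2 = split($2, a, " "); n3 = split($3, b, " "); print a[n2], b[n3]}' file.txt
It reads each blank-line-separated record as one awk record, splits its 2nd and 3rd lines into words, and prints the last word of each.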

That's a job for awk, not grep:
$ awk '(NR%4)~/^[23]$/{print $NF}' file
email#email.com
0123456789
email#email.com
01/23456789
email#email.com
00-223-445-56
$ awk '(NR%4)~/^[23]$/{printf "%s%s", $NF, (++c%2?OFS:ORS)}' file
email#email.com 0123456789
email#email.com 01/23456789
email#email.com 00-223-445-56
$ awk '(NR%4)==2{print $NF}' file
email#email.com
email#email.com
email#email.com
$ awk '(NR%4)==3{print $NF}' file
0123456789
01/23456789
00-223-445-56
Take your pick, it's all trivial...

Related

Capturing string between 2 specific letters/words using shell scripting

I am trying to capture the string between 2 specific letters/words using sed/awk. This is what I am trying to do:
The input is a file test.log containing
Owner: CN=abc.samplecerrt.com,o=IN,DC=com
Owner: CN=abc1.samplecerrt.com,o=IN,DC=com
I want to extract only "CN=abc.samplecerrt.com"
I tried
sed 's/.*CN=\(.*\),.*/\1/p' test.log >> result.log
But this returns "abc.samplecerrt.com,o=IN,DC=com"
How do I go about this?
test file:
$ cat logs.txt
CN=abc.samplecerrt.com,o=IN,DC=com Owner: CN=abc1.samplecerrt.com,o=IN,DC=com
command and output:
$ grep -oP 'CN=(?:(?!CN=).)*?.com' logs.txt
CN=abc.samplecerrt.com
CN=abc1.samplecerrt.com
This might work for you (GNU sed):
sed -n 's/.*\(CN=[^,]*\).*/\1/p' file
Or:
sed 's/.*\(CN=[^,]*\).*/\1/p;d' file
The first solution turns off implicit printing with the -n option so as to act like grep.
It matches and captures the string CN= followed by zero or more non-comma characters, and prints the captured group \1 if a match is made.
The second solution is much the same except it deletes all lines and only prints the captured group as above.
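Run against the two-line test.log from the question, both commands should print something like:
CN=abc.samplecerrt.com
CN=abc1.samplecerrt.com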
With awk you can grab the field that holds the string you need. To do that, set FS to ":|,". Now if you run
awk -v FS=":|," '{print $2}' file
CN=abc.samplecerrt.com
CN=abc1.samplecerrt.com
you get the field for every line. But you only want one, so
awk -v FS=":|," '$2 !~ /abc1/ {print $2}' file
CN=abc.samplecerrt.com

bash: awk print within print

I need to grep some pattern and then print some output from within the matching line. Currently I am using the command below, which works fine, but I would like to eliminate the multiple pipes and use a single awk command to achieve the same output. Is there a way to do it with awk?
root#Server1 # cat file
Jenny:Mon,Tue,Wed:Morning
David:Thu,Fri,Sat:Evening
root#Server1 # awk '/Jenny/ {print $0}' file | awk -F ":" '{ print $2 }' | awk -F "," '{ print $1 }'
Mon
I want to get this output using a single awk command. Any help?
You can try something like:
awk -F: '/Jenny/ {split($2,a,","); print a[1]}' file
Try this:
awk -F'[:,]+' '/Jenny/{print $2}' file.txt
It uses multiple field-separator characters inside the [ ].
The + means one or more, since the value of -F is treated as a regex.
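To see how the fields line up with that separator, a quick check (output based on the sample file above):
$ awk -F'[:,]+' '/Jenny/{print $1, $2, $3, $4, $5}' file
Jenny Mon Tue Wed Morning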
For this particular job, I find grep to be slightly more robust.
Unless your company has a policy not to hire people named Eve.
(Try it out if you don't understand.)
grep -oP '^[^:]*Jenny[^:]*:\K[^,:]+' file
Or to do a whole-word match:
grep -oP '^[^:]*\bJenny\b[^:]*:\K[^,:]+' file
Or when you are confident that "Jenny" is the full name:
grep -oP '^Jenny:\K[^,:]+' file
Output:
Mon
Explanation:
The stuff up until \K speaks for itself: it selects the line(s) with the desired name.
[^,:]+ captures the day of week (in this case Mon).
\K cuts off everything preceding Mon.
-o cuts off anything following Mon.
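To make the Eve remark above concrete: Eve is a hypothetical name that is not in the sample file, yet the substring-matching awk answers would still pick up David's line, because "Evening" contains "Eve", while the anchored grep correctly prints nothing:
$ awk -F'[:,]+' '/Eve/{print $2}' file
Thu
$ grep -oP '^[^:]*\bEve\b[^:]*:\K[^,:]+' file
(no output)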

Using grep to find a line and print only 2 words from that line

I'm new to Linux and I decided to learn shell scripting. I have created a file data.txt that contains the following text:
12345 Nick Abrams A 10900
67890 George Kennedy I 20000
(text goes on...)
The first field is a card's pin number, the second is the name of the client, third is surname, fourth indicates whether a card is active (or inactive) and the last field is the client's balance. I need to write a script that receives the client's pin from keyboard and if that pin is written in the text file then the script should print the client's name and surname on the screen. I have used grep like this
grep "^$pin" data.txt
But it returns all of the client's details. I only need to print the second and third fields and ignore everything else. Is there any parameter to isolate the words I need?
Could you please try the following and let me know if it helps.
cat script.ksh
echo "Please enter your choice:"
read value
awk -v val="$value" '$1==val{print $2,$3}' Input_file
EDIT: Adding a solution with grep and cut in a script here too.
cat script.ksh
echo "Please enter your choice:"
read value
grep "$value" Input_file | cut -d" " -f2,3
Better to use sed:
sed -n 's/^'"$pin \([^ ]* [^ ]*\).*"'/\1/p' data.txt
This command matches a line that starts with $pin and prints only the part that matches the regex between \( and \).
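For instance, with pin=12345 and the sample data.txt from the question, it should print just the name fields:
$ pin=12345
$ sed -n 's/^'"$pin \([^ ]* [^ ]*\).*"'/\1/p' data.txt
Nick Abrams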
#!/bin/bash
echo "Enter PIN, please"
read pin
grep "${pin}" pins.txt | awk '{print $2" "$3}'
Input: 12345
Output: Nick Abrams
Or with cut:
#!/bin/bash
echo "Enter PIN, please"
read pin
grep "${pin}" pins.txt | cut -d' ' -f2,3

Add an extra column after grep content

I understand that grep can extract specific content from a file line by line.
I am just wondering how to add another column before or after each line as an index.
For example:
grep "aaa" text.txt > tmp.txt
In the tmp.txt file, we can see the content as follows,
aaawekjre
qejrteraaa
wrgeaaaere
However, I would like to add a specific index as an extra column.
Therefore, the tmp.txt might look like this:
John aaawekjre
John qejrteraaa
John wrgeaaaere
You can use awk:
awk '/aaa/{print "John", $0}' text.txt > tmp.txt
$ sed -n '/aaa/ s/^/John /p' text.txt
John aaawekjre
John qejrteraaa
John wrgeaaaere
How it works
-n
This tells sed not to print anything unless we explicitly ask it to.
/aaa/ s/^/John /p
This selects lines that contain aaa. For those lines, we do a substitution (s/^/John /) to put John at the beginning of the line and we print the line (p).
In this way, lines that do not contain aaa are never printed. Thus, there is no need for a separate grep process.
Try this:
grep "aaa" text.txt | awk '{print "John " $0}' > tmp.txt

How to print a line resulting from a search keyword in a column using the grep command

This is an example of a filename.csv file
"Sort Order","Common Name","Formal Name","Type","Sub Type"
"1","Afghanistan","Islamic State of Afghanistan","Independent State"
"2","Albania","Republic of Albania","Independent State"
"3","Algeria","People's Democratic Republic of Algeria","Independent State"
"4","Andorra","Principality of Andorra","Independent State"
"5","Angola","Republic of Angola","Independent State"
So what is the grep command to search for Angola in the Common Name column and print it like this:
"5","Angola","Republic of Angola","Independent State"
I know that we can use:
grep '"Algeria"' filename.csv
However, what if I want to be more specific? Let's say Algeria might also exist in another column, but we only need to print the line where it appears in the Common Name column.
I tried
grep '"Common Name | Algeria"' filename.csv
It doesn't seem to work.
You could try the grep command below to print the lines that contain the string Angola in the second column.
grep -E '^"[^"]*","Angola"' file
This could easily be done with awk:
awk -F, '$2=="\"Angola\""' file
Or try awk:
awk -F"," '$2~/Algeria/' file
Using the cut command:
grep Algeria filename.csv|cut -f2 -d','|awk -F '[""]' '{print $2}'
Output:
Algeria
Here is an explanation of the commands I added after your grep (I also dropped some unnecessary quotes you had used).
cut command:
-f extracts the 2nd field, based on the delimiter
-d sets the delimiter, which here is ,
awk command:
It extracts the value between the double quotes and prints it.
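As a rough illustration, here are the intermediate steps of that pipeline on the sample filename.csv (the cut output still carries the quotes, which the awk step strips):
$ grep Algeria filename.csv | cut -f2 -d','
"Algeria"
$ grep Algeria filename.csv | cut -f2 -d',' | awk -F '[""]' '{print $2}'
Algeria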