Add an extra column after grep content - linux

I understand that grep can extract specific content from a file line by line.
I'm just wondering how I can add another column before or after each line as an index.
For example:
grep "aaa" text.txt > tmp.txt
In the tmp.txt file, we can see the content as follows,
aaawekjre
qejrteraaa
wrgeaaaere
However, I would like to add a specific index as an extra column.
Therefore, the tmp.txt might look like this:
John aaawekjre
John qejrteraaa
John wrgeaaaere

You can use awk:
awk '/aaa/{print "John", $0}' text.txt > tmp.txt

$ sed -n '/aaa/ s/^/John /p' text.txt
John aaawekjre
John qejrteraaa
John wrgeaaaere
How it works
-n
This tells sed not to print anything unless we explicitly ask it to.
/aaa/ s/^/John /p
This selects lines that contain aaa. For those lines, we do a substitution (s/^/John /) to put John at the beginning of the line and we print the line (p).
In this way, lines that do not contain aaa are never printed. Thus, there is no need for a separate grep process.

try this
grep "aaa" text.txt | awk '{print "John " $0}' > tmp.txt
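All of the answers above hard-code the label; if the label comes from a shell variable instead, awk's -v option passes it in cleanly. A minimal sketch using the question's sample data (the name variable here is hypothetical):

```shell
# Recreate the sample file from the question (placeholder content).
printf '%s\n' aaawekjre qejrteraaa bbb wrgeaaaere > text.txt

# Pass the label in as an awk variable instead of hard-coding it.
name="John"
awk -v prefix="$name" '/aaa/ {print prefix, $0}' text.txt > tmp.txt
cat tmp.txt
```

This prints the three matching lines, each prefixed with John; the non-matching line bbb is dropped.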


How to make a strict match with awk

I am querying one file against another; they look as follows:
File1:
Angela S Darvill| text text text text
Helen Stanley| text text text text
Carol Haigh S|text text text text .....
File2:
Carol Haigh
Helen Stanley
Angela Darvill
This command:
awk 'NR==FNR{_[$1];next} ($1 in _)' File2.txt File1.txt
returns lines that overlap, but it does not do a strict match. With a strict match, only Helen Stanley should be returned.
How do you restrict awk on a strict overlap?
With your shown samples, please try the following. You were on the right track; you need to do two things: first, use the whole line as the array index in a while reading File2.txt, and second, set the field separator to | before awk starts reading File1.txt:
awk -F'|' 'NR==FNR{a[$0];next} $1 in a' File2.txt File1.txt
The command above doesn't work for me (I am on a Mac; I don't know whether that matters), but
awk 'NR==FNR{_[$0];next} ($1 in _)' File2.txt FS='|' File1.txt
worked well.
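To see the difference the whole-line key makes, here is the command run against the sample files from the question, recreated inline (note that this sample File2.txt has no trailing spaces):

```shell
# Recreate the question's sample data.
cat > File1.txt <<'EOF'
Angela S Darvill| text text text text
Helen Stanley| text text text text
Carol Haigh S|text text text text .....
EOF
cat > File2.txt <<'EOF'
Carol Haigh
Helen Stanley
Angela Darvill
EOF

# Whole lines of File2.txt become array keys; field 1 of File1.txt
# (split on |) must match a key exactly.
awk -F'|' 'NR==FNR{a[$0];next} $1 in a' File2.txt File1.txt
```

Only the Helen Stanley line is printed; "Carol Haigh S" and "Angela S Darvill" no longer match.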
You can also use grep, treating File2.txt as a list of regexes, to make an exact match. You can use sed to prepare the patterns. Here is an example:
sed -E 's/[ \t]*$//; s/^(.*)$/^\1|/' File2.txt
^Carol Haigh|
^Helen Stanley|
^Angela Darvill|
...
Then use process substitution to feed that sed output to grep as its -f argument:
grep -f <(sed -E 's/[ \t]*$//; s/^(.*)$/^\1|/' File2.txt) File1.txt
Helen Stanley| text text text text
Since your example File2.txt has trailing spaces, the sed has s/[ \t]*$//; as the first substitution. If your actual file does not have those trailing spaces, you can simplify to:
grep -f <(sed -E 's/.*/^&|/' File2.txt) File1.txt
Ed Morton brings up a good point that grep will still interpret regex metacharacters in File2.txt. You can use the -F flag so that only literal strings are matched (the ^ anchor is dropped here, since -F would treat it as a literal character):
grep -F -f <(sed -E 's/.*/&|/' File2.txt) File1.txt
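A sketch of the grep -F variant on the same sample data, writing the generated patterns to an intermediate file instead of using process substitution (so it also runs under plain sh):

```shell
# Recreate the question's sample data.
cat > File1.txt <<'EOF'
Angela S Darvill| text text text text
Helen Stanley| text text text text
Carol Haigh S|text text text text .....
EOF
cat > File2.txt <<'EOF'
Carol Haigh
Helen Stanley
Angela Darvill
EOF

# Append "|" to every name so "Carol Haigh" cannot match "Carol Haigh S|".
sed -E 's/.*/&|/' File2.txt > patterns.txt

# -F: fixed strings, -f: read patterns from a file.
grep -F -f patterns.txt File1.txt
```

Again only the Helen Stanley line survives, because "Carol Haigh|" is not a substring of "Carol Haigh S|".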

Using cat and grep to print line and its number but ignore at the same time blank lines

I have created a simple script that prints the contents of a text file using cat command. Now I want to print a line along with its number, but at the same time I need to ignore blank lines. The following format is desired:
1 George Jones Berlin 2564536877
2 Mike Dixon Paris 2794321976
I tried using
cat -n catalog.txt | grep -v '^$' catalog.txt
But I get the following results:
George Jones Berlin 2564536877
Mike Dixon Paris 2794321976
I have managed to get rid of the blank lines, but line's number is not printed. What am I doing wrong?
Here are the contents of catalog.txt:
George Jones Berlin 2564536877
Mike Dixon Paris 2794321976
Your command doesn't work because you also passed catalog.txt to grep, so grep read the file directly and ignored the numbered output that cat -n piped to it.
You can pipe grep's output to cat -n:
grep -v '^$' yourFile | cat -n
Example:
test.txt:
Hello
how
are
you
?
$ grep -v '^$' test.txt | cat -n
1 Hello
2 how
3 are
4 you
5 ?
At first glance, you should drop the file name from the grep command so that grep reads from stdin:
cat -n catalog.txt | grep -v '^$'
In your code, you supplied catalog.txt to grep, which made it read from the file and ignore its standard input. So you were grepping the file itself instead of the output of cat piped to grep's stdin.
To ignore the blank lines and then prepend line numbers, switch the order of grep and cat:
grep -v '^$' catalog.txt | cat -n
Another awk
$ awk 'NF{$0=FNR " " $0}NF' 48488182
1 George Jones Berlin 2564536877
3 Mike Dixon Paris 2794321976
The second line was blank in this case.
A single, simple, basic awk solution could help you here.
Solution 1:
awk 'NF{print FNR,$0}' Input_file
Solution 2: The above prints each line's original number, so a file with empty lines will show gaps in the numbering. If you want consecutive numbers that skip over the empty lines, the following may help:
awk '!NF{FNR--;next} NF{print FNR,$0}' Input_file
Solution 3: Using only grep, though the output will have a colon between the line number and the line:
grep -v '^$' Input_file | grep -n '.*'
Explanation of Solution 1:
NF: This condition checks that NF (awk's built-in variable holding the number of fields on the current line) is non-zero, i.e. that the line is not empty; if so, the action that follows runs.
{print FNR,$0}: This prints FNR (awk's built-in variable holding the current line number within the file) followed by $0, the current line.
This satisfies both of the OP's requirements: empty lines are skipped, and line numbers are printed along with the lines. I hope this helps.
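The answers above all agree on the order of operations: filter first, number second. A quick sketch with the question's data:

```shell
# Sample catalog with a blank line between the two records.
printf '%s\n' 'George Jones Berlin 2564536877' '' 'Mike Dixon Paris 2794321976' > catalog.txt

# Filter blank lines first, then let cat -n number what is left.
grep -v '^$' catalog.txt | cat -n
```

This prints the two records numbered 1 and 2 (cat -n pads the numbers with leading whitespace and a tab).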

Bash Advanced grep solution

I have a file.txt on my Linux which looks like the following structure:
file.txt:
full name
E-mail: email#email.com
Phone: 0123456789
full name
email#email.com
01/23456789
full name
e: email#email.com
00-223-445-56
.
.
.
etc
Some entries may have only the name and a phone number, or only the name and an e-mail address.
I would like to use grep so that when I run
./myprogram.sh file.txt
it lists all of the e-mail addresses and phone numbers from the file. How can I do that if file.txt looks like this?
You can start with something simple like this:
cat file.txt | grep -E "(#|[0-9]+)"
It gives you everything with a # (so e-mails) and everything with numbers (so phone numbers). You can use more advanced regular expressions for a better search (e-mails and phone numbers have stricter rules...), but that's the idea.
egrep "#|[0-9]"
will match only lines that contain a "#" or at least one digit. Your example says that the name line does not contain digits.
It isn't quite clear which output format you expect. If you want the e-mail addresses and phone numbers separated (is that what you want? The connection e-mail address <-> phone number would then be somewhat obscured), you could also use (GNU) sed:
sed -n -e '1 {s/\(.*\)/e-mail:\n\1/; P;};' \
-e '/#/ s/\(.\+[ \\\t]\)\{0,1\}\(.\+\)#\(.\+\)/\t\2\#\3/p;' \
-e '/[0-9]\+$/ H; $ {x; s/\n\([^0-9]*\)\([0-9]\+\)/\n\t\2/g; s/\(.*\)/\nphone:\n\1/p;}'
file.txt
Do you actually want to use grep, or are you just using the phrase "use grep" to mean "filter"? Assuming the latter (since grep is the wrong tool for this), and assuming that each record has a single e-mail address in the final column of its 2nd line and a phone number in the final column of its 3rd line, and that records are separated by blank lines, you could do:
<file.txt awk '{print $NF}' | awk '{print $2,$3}' FS='\n' RS=
You could do it with a single awk invocation, but this is simpler and probably sufficient.
That's a job for awk, not grep:
$ awk '(NR%4)~/^[23]$/{print $NF}' file
email#email.com
0123456789
email#email.com
01/23456789
email#email.com
00-223-445-56
$ awk '(NR%4)~/^[23]$/{printf "%s%s", $NF, (++c%2?OFS:ORS)}' file
email#email.com 0123456789
email#email.com 01/23456789
email#email.com 00-223-445-56
$ awk '(NR%4)==2{print $NF}' file
email#email.com
email#email.com
email#email.com
$ awk '(NR%4)==3{print $NF}' file
0123456789
01/23456789
00-223-445-56
Take your pick, it's all trivial...
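A sketch reproducing the 4-line record layout assumed by the last answer (name, e-mail, phone, blank line), so the NR%4 arithmetic can be checked:

```shell
# Reproduce the question's record layout: name, e-mail, phone, blank line.
cat > file.txt <<'EOF'
full name
E-mail: email#email.com
Phone: 0123456789

full name
email#email.com
01/23456789

EOF

# Lines 2 and 3 of every 4-line record hold the address and the number;
# $NF takes the last field, which skips any "E-mail:"/"Phone:" prefix.
awk '(NR%4)~/^[23]$/{print $NF}' file.txt
```

This prints the two addresses and two numbers, in record order.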

Linux: Sed command output of field 5 to a file

[The question includes an image of an /etc/passwd file.]
Linux: what single sed command can be used to output matches from /etc/passwd that have Smith or Jones in their description (5th field) to a file called smith_jones.txt?
I wouldn't use sed, but it looks like you're referencing a standard /etc/passwd file, so something that may do what you're looking for is this:
cat /etc/passwd | awk -F ":" '{if ($5 ~ /Smith/ || $5 ~ /Jones/) print}'
awk '{print $5}' is commonly used to print the 5th column of whatever is piped to it, in this case the /etc/passwd file. However, since the fields are not whitespace-separated, I've supplied the -F argument with the delimiter ":", as that is what separates the values.
It's then a fairly easy if statement essentially saying, if this string contains Smith OR Jones in it somewhere, print it.
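A runnable sketch of the same idea on made-up passwd-style lines (the users below are invented), condensing the if statement into awk's bare pattern form and adding the redirection the question asked for:

```shell
# A few made-up passwd-style lines; the description (GECOS) is field 5.
cat > passwd.sample <<'EOF'
jsmith:x:1001:1001:John Smith:/home/jsmith:/bin/bash
ajones:x:1002:1002:Alice Jones:/home/ajones:/bin/bash
bdoe:x:1003:1003:Bob Doe:/home/bdoe:/bin/bash
EOF

# Print lines whose 5th :-separated field mentions Smith or Jones,
# redirected to the requested output file.
awk -F: '$5 ~ /Smith|Jones/' passwd.sample > smith_jones.txt
cat smith_jones.txt
```

Only the jsmith and ajones lines land in smith_jones.txt.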

How to print a line resulting from a search keyword in a column using the grep command

This is an example of a filename.csv file
"Sort Order","Common Name","Formal Name","Type","Sub Type"
"1","Afghanistan","Islamic State of Afghanistan","Independent State"
"2","Albania","Republic of Albania","Independent State"
"3","Algeria","People's Democratic Republic of Algeria","Independent State"
"4","Andorra","Principality of Andorra","Independent State"
"5","Angola","Republic of Angola","Independent State"
So what is the grep command to search for Angola in the Common Name column and print it like this:
"5","Angola","Republic of Angola","Independent State"
I know that we can use:
grep '"Algeria"' filename.csv
However, what if I want to be more specific? Let's say Algeria might also exist in another column, but we only need to print the line where it appears in the Common Name column.
I tried
grep '"Common Name | Algeria"' filename.csv
Seems not to work.
You could try the grep command below to print the lines that contain the string Angola in the second column.
grep -E '^"[^"]*","Angola"' file
This can also easily be done with awk:
awk -F, '$2=="\"Angola\""' file
try awk
awk -F"," '$2~/Algeria/' file
Use the cut command:
grep Algeria filename.csv | cut -f2 -d',' | awk -F'[""]' '{print $2}'
Output:
Algeria
Here is an explanation of each command I added to your grep (I also discarded some unnecessary quotes you used):
cut command:
-f extracts the 2nd column, based on the delimiter
-d chooses the delimiter, here ,
awk command:
It extracts the value between the double quotes and prints it.
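A quick check of the anchored grep pattern from the first answer against the sample rows, recreated inline (abbreviated):

```shell
# A few rows of the sample CSV from the question.
cat > filename.csv <<'EOF'
"Sort Order","Common Name","Formal Name","Type","Sub Type"
"1","Afghanistan","Islamic State of Afghanistan","Independent State"
"5","Angola","Republic of Angola","Independent State"
EOF

# Anchor the match to the second quoted field: one quoted field,
# a comma, then "Angola" exactly. Angola elsewhere on a line cannot match.
grep -E '^"[^"]*","Angola"' filename.csv
```

Only the Angola row is printed; the header and other rows are skipped.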
