What are the differences among grep, awk & sed? [duplicate] - linux

This question already has answers here:
What are the differences between Perl, Python, AWK and sed? [closed]
(5 answers)
What is the difference between sed and awk? [closed]
(3 answers)
Closed last month.
I am confused about the differences between grep, awk and sed in terms of their role in Unix/Linux system administration and text processing.

Short definition:
grep: search for specific terms in a file
#usage
$ grep This file.txt
Every line containing "This"
Every line containing "This"
Every line containing "This"
Every line containing "This"
$ cat file.txt
Every line containing "This"
Every line containing "This"
Every line containing "That"
Every line containing "This"
Every line containing "This"
Now awk and sed are completely different from grep.
awk and sed are text processors. Not only do they have the ability to find what you are looking for in text, they have the ability to remove, add and modify the text as well (and much more).
awk is mostly used for data extraction and reporting; sed is a stream editor.
Each one of them has its own functionality and specialties.
Example
Sed
$ sed -i 's/cat/dog/g' file.txt
# this will replace every occurrence of 'cat' with 'dog' in file.txt, in place (without the g flag, only the first occurrence on each line is replaced)
Awk
$ awk '{print $2}' file.txt
# this will print the second column of file.txt
Basic awk usage:
Compute sums, averages, maximums, minimums, or whatever else you may need.
$ cat file.txt
A 10
B 20
C 60
$ awk '{sum+=$2; count++} END {print "Average:", sum/count}' file.txt
Average: 30
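For example, a minimal variation on the same idea (using the same file.txt) that reports the maximum instead of the average:
$ awk 'NR==1 || $2>max {max=$2} END {print "Max:", max}' file.txt
Max: 60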
I recommend that you read this book: Sed & Awk: 2nd Ed.
It will help you become a proficient sed/awk user in any Unix-like environment.

Grep is useful if you want to quickly search for matching lines in a file. It can also return other simple information such as matching line numbers, match counts, and lists of matching file names.
Awk is an entire programming language built around reading delimited, CSV-style files, processing the records, and optionally printing out a result data set. It can do many things, but it is not the easiest tool to use for simple tasks.
Sed is useful when you want to make changes to a file based on regular expressions. It lets you easily match parts of lines, make modifications, and print out the results. It's less expressive than awk, which makes it somewhat easier to use for simple tasks. It has many more complicated operators (it's even Turing complete), but in general you won't use those features.
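For instance, a small illustrative sed sketch (the file name and pattern here are made up, just to show the match-modify-print style): print only the lines that begin with "ERROR: ", with that prefix stripped:
$ sed -n 's/^ERROR: //p' logfile.txt
The -n suppresses sed's default printing, and the p flag prints only the lines where the substitution succeeded.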

I just want to mention one thing: there are many tools that can do text processing, e.g.
sort, cut, split, join, paste, comm, uniq, column, rev, tac, tr, nl, pr, head, tail...
They are very handy, but you have to learn their options, etc.
A lazy way (not the best way) to learn text processing might be: learn only grep, sed and awk. With these three tools, you can solve almost 99% of text processing problems and don't need to memorize the different commands and options above. :)
And, if you've learned and used these three, you'll know the difference. Actually, the difference here means which tool is good at solving which kind of problem.
An even lazier way might be to learn a scripting language (Python, Perl or Ruby) and do all your text processing with it.

Related

Print previous line if condition is met

I would like to grep a word and then find the second column in the line and check if it is bigger than a value. If yes, I want to print the previous line.
Ex:
Input file
AAAAAAAAAAAAA
BB 2
CCCCCCCCCCCCC
BB 0.1
Output
AAAAAAAAAAAAA
Now, I want to search for BB and if the second column (2 or 0.1) in that line is bigger than 1, I want to print the previous line.
Can somebody help me with grep and awk? Any other suggestions are also welcome. Thanks.
This can be a way:
$ awk '$1=="BB" && $2>1 {print f} {f=$1}' file
AAAAAAAAAAAAA
Explanation
$1=="BB" && $2>1 {print f} if the 1st field is exactly BB and 2nd field is bigger than 1, then print f, a stored value.
{f=$1} store the current line in f, so that it is accessible when reading the next line.
Another option: reverse the file and print the next line if the condition matches:
tac file | awk '$1 == "BB" && $2 > 1 {getline; print}' | tac
Concerning generality
I think it needs to be mentioned that the most general solution to this class of problem involves two passes:
the first pass to add a decimal row number ($REC) to the front of each line, effectively grouping lines into records by $REC
the second pass to trigger on the first instance of each new value of $REC as a record boundary (resetting $CURREC), thereafter rolling along in the native AWK idiom concerning the records to follow matching $CURREC.
In the intermediate file, some sequence of decimal digits followed by a separator (for human reasons, typically an added tab or space) is parsed (aka conceptually snipped off) as out-of-band with respect to the baseline file.
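To make the two-pass idea concrete, here is a hypothetical sketch of what the downstream script (called yourscript.awk in the commands below) might look like; the per-record action is invented for illustration, and field 1 is assumed to be the record number prepended by the first pass:
$1 != CURREC {                 # first line of a new record
    if (NR > 1) print "record " CURREC " had " n " lines"
    CURREC = $1
    n = 0
}
{ n++ }                        # per-line work goes here; the original line's fields are now $2 onward
END { if (NR) print "record " CURREC " had " n " lines" }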
Command line paste monster
Even confined to the command line, it's an easy matter to ensure that the intermediate file never hits disk. You just need to use an advanced shell such as ZSH (my own favourite) which supports process substitution:
paste <( <input.txt awk "BEGIN { R=0; N=0; } /Header pattern/ { N=1; } { R=R+N; N=0; print R; }" ) input.txt | awk -f yourscript.awk
Let's render that one-liner more suitable for exposition:
P="/Header pattern/"
X="BEGIN { R=0; N=0; } $P { N=1; } { R=R+N; N=0; print R; }"
paste <( <input.txt awk "$X" ) input.txt | awk -f yourscript.awk
This starts three processes: the trivial inline AWK script, paste, and the AWK script you really wanted to run in the first place.
Behind the scenes, the <() command line construct creates a named pipe and passes the pipe name to paste as the name of its first input file. For paste's second input file, we give it the name of our original input file (this file is thus read sequentially, in parallel, by two different processes, which will consume between them at most one read from disk, if the input file is cold).
The magic named pipe in the middle is an in-memory FIFO that ancient Unix probably managed at about 16 kB of average size (intermittently pausing the paste process if the yourscript.awk process is sluggish in draining this FIFO back down).
Perhaps modern Unix throws a bigger buffer in there because it can, but it's certainly not a scarce resource you should be concerned about, until you write your first truly advanced command line with process redirection involving these by the hundreds or thousands :-)
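If you have not seen process substitution before, a trivial demonstration of the mechanics (unrelated to the problem above, using only the standard seq and paste tools):
$ paste <(seq 1 3) <(seq 4 6)
1	4
2	5
3	6
Each <(...) expands to the name of a pipe, which paste reads as though it were an ordinary file.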
Additional performance considerations
On modern CPUs, all three of these processes could easily find themselves running on separate cores.
The first two of these processes border on the truly trivial: an AWK script with a single pattern match and some minor bookkeeping, paste called with two arguments. yourscript.awk will be hard pressed to run faster than these.
What, your development machine has no lightly loaded cores to render this master shell-master solution pattern almost free in the execution domain?
Ring, ring.
Hello?
Hey, it's for you. 2018 just called, and wants its problem back.
2020 is officially the reprieve of MTV: That's the way we like it, magic pipes for nothing and cores for free. Not to name out loud any particular TLA chip vendor who is rocking the space these days.
As a final performance consideration, if you don't want the overhead of parsing actual record numbers:
X="BEGIN { N=0; } $P { N=1; } { print N; N=0; }"
Now your in-FIFO intermediate file is annotated with just two additional characters prepended to each line ('0' or '1' plus the default separator character added by paste), with '1' demarcating the first line of each record.
Named FIFOs
Under the hood, these are no different than the magic FIFOs instantiated by Unix when you write any normal pipe command:
cat file | proc1 | proc2 | proc3
Three unnamed pipes (and a whole process devoted to cat you didn't even need).
It's almost unfortunate that the truly exceptional convenience of the default stdin/stdout streams as premanaged by the shell obscures the reality that paste $magictemppipe1 $magictemppipe2 bears no additional performance considerations worth thinking about, in 99% of all cases.
"Use the <() Y-joint, Luke."
Your instinctive reflex toward natural semantic decomposition in the problem domain will herewith benefit immensely.
If anyone had had the wits to name the shell construct <() as the YODA operator in the first place, I suspect it would have been pressed into universal service at least a solid decade ago.
Combining sed & awk you get this:
sed 'N;s/\n/ /' < file |awk '$3>1{print $1}'
sed 'N;s/\n/ /': join the 1st and 2nd lines, replacing the newline character with a space
awk '$3>1{print $1}': print $1 (the 1st column) if $3 (the 3rd column's value) is > 1

sed regex with variables to replace numbers in a file

I'm trying to replace numbers in my text file by adding one to them, i.e.
sed 's/3/4/g' path.txt
sed 's/2/3/g' path.txt
sed 's/1/2/g' path.txt
Instead of this, can I automate it, i.e. find a \d and add one to it in the replacement?
Something like
sed 's/\([0-8]\)/\1+1/g' path.txt
I also wanted to capture more than one digit, i.e. ([0-9])\t([0-9]), and change each one, keeping the tab in between.
Thanks
Edit #2
Using the Perl example, I would also like it to work with more digits, i.e.
perl -pi~ -e 's/(\d+)\.(\d+)\.(\d+)\.(\d+)/ ($1+1)\.($2+1)\.($3+1)\.($4+1) /ge' output.txt
Any tips on making the above work?
There is no support for arithmetic in sed, but you can easily do this in Perl.
perl -pe 's/(\d+)/ $1+1 /ge'
With the /e option, the replacement expression needs to be valid Perl code. So to handle your final updated example, you need
perl -pi~ -e 's/(\d+)\.(\d+)\.(\d+)\.(\d+)/ ($1+1) . "." . ($2+1) . "." . ($3+1) . "." . ($4+1) /ge' output.txt
where the literal dots are quoted as strings, the additions are parenthesized, and the pieces are joined with the . Perl string concatenation operator. (The arithmetic results are coerced into strings when they are concatenated with a string.)
... Though of course, the first script already does that more elegantly, since with the /g flag it already increments every sequence of digits by one, anywhere in the string.
Triplee's perl solution is the more generic answer, but Michal's sed solution works well for this particular case. However, Michal's sed solution is more easily written:
sed y/12345678/23456789/ path.txt
and is better implemented as
tr 12345678 23456789 < path.txt
This utterly fails to handle 2 digit numbers (as in the edited question).
You can do it with sed but it's not easy, see this thread.
And it's hard with awk too, see this.
I'd rather use perl for this (something like this can be seen in action # ideone):
perl -pe 's/([0-8])/$1+1/e'
(The ideone.com example must have some looping, as ideone does not set -pe by default.)
You can't do addition directly in sed. You could do it in awk by matching numbers with a regex in each line and increasing their value, but it's quite complicated (a rough sketch follows the sed command below). If you do not need to handle arbitrary numbers but only a limited set, like single-digit numbers from 0 to 8, you can just put several replacement commands on a single sed command line by separating them with semicolons:
sed 's/8/9/g ; s/7/8/g; s/6/7/g; s/5/6/g; s/4/5/g; s/3/4/g; s/2/3/g; s/1/2/g; s/0/1/g' path.txt
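For reference, the awk approach mentioned above might look roughly like this (a sketch, not from the original answers; it relies only on POSIX awk's match() with RSTART/RLENGTH and writes to stdout instead of editing the file in place):
awk '{
    out = ""
    rest = $0
    while (match(rest, /[0-9]+/)) {
        # copy the text before the number, then append the number plus one
        out = out substr(rest, 1, RSTART - 1) (substr(rest, RSTART, RLENGTH) + 1)
        rest = substr(rest, RSTART + RLENGTH)
    }
    print out rest
}' path.txt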
This might work for you (GNU sed & Bash):
sed 's/[0-9]/$((&+1))/g;s/.*/echo "&"/e' file
This will add one to every individual digit. To increment whole numbers:
sed 's/[0-9]\+/$((&+1))/g;s/.*/echo "&"/e' file
N.B. This method is fraught with problems and may cause unexpected results.

Bash history without line numbers [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 1 year ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
The bash history command is very cool. I understand why it shows the line numbers, but is there a way I can invoke the history command and suppress the line numbers?
The point here is to use the history command, so please don't reply cat ~/.bash_history
Current Output:
529 man history
530 ls
531 ll
532 clear
533 cd ~
534 history
Desired Output:
man history
ls
ll
clear
cd ~
history
Thanks to everyone for your great solutions. Paul's is the simplest and will work for me because my bash history size is set at 2000.
I also wanted to share a cool article I found this morning. It has a couple good options that I am now using, like keeping duplicate entries out of the bash history and making sure multiple bash sessions don't overwrite the history file: http://blog.macromates.com/2008/working-with-history-in-bash/
Try this:
$ history | cut -c 8-
awk can help:
history|awk '{$1="";print substr($0,2)}'
This answer can fail if you have a long history.
If you were willing to switch to zsh instead of bash, then zsh supports this natively (as well as other options for history formatting):
zsh> fc -ln 0
(See https://serverfault.com/questions/114988/removing-history-or-line-numbers-from-zsh-history-file)
history -w /dev/stdout
From output of history --help:
-w write the current history to the history file
It writes the current history to the specified file, /dev/stdout in this case.
I'm late on this one, but the shorter method would be to add the following in your ~/.bashrc or ~/.profile file:
HISTTIMEFORMAT="$(echo -e '\r\e[K')"
From bash manpage:
HISTTIMEFORMAT
If this variable is set and not null, its value is used as a
format string for strftime(3) to print the time stamp associated
with each history entry displayed by the history builtin. If
this variable is set, time stamps are written to the history
file so they may be preserved across shell sessions. This uses
the history comment character to distinguish timestamps from
other history lines.
Using this capability, a smart hack consists of making the variable "print" a carriage return (\r) and clear the rest of the line (ANSI escape code K) instead of an actual timestamp.
Alternatively, you could use sed:
history | sed 's/^[ ]*[0-9]\+[ ]*//'
Using alias, you can set this as your standard (stick it in your bash_profile):
alias history="history | sed 's/^[ ]*[0-9]\+[ ]*//'"
Although cut with the -c option works for most practical purposes, I think that piping history to awk would be a better solution. For example:
history | awk '{ $1=""; print }'
OR
history | awk '{ $1=""; print $0 }'
Both of these solutions do the same thing. The output of history is fed to awk, and awk blanks out the first column, which corresponds to the numbers in the history command's output. Here awk is more convenient because you don't have to care how many characters wide the number part of the output is.
print $0 is equivalent to print, since the default is to print everything that appears on the line. Typing print $0 is more explicit, but which one you choose is up to you. The behavior of print $0 versus plain print is easier to see if you use awk to print a file (cat would be faster to type than awk, but this is just to illustrate the point).
[Ex] Using awk to display the contents of a file with $0
$ awk '{print $0}' /tmp/hello-world.txt
Hello World!
[Ex] Using awk to display the contents of a file without explicit $0
$ awk '{print}' /tmp/hello-world.txt
Hello World!
[Ex] Using awk when the history line spans multiple lines
$ history
11 clear
12 echo "In word processing and desktop publishing, a hard return or paragraph break indicates a new paragraph, to be distinguished from the soft return at the end of a line internal to a paragraph. This distinction allows word wrap to automatically re-flow text as it is edited, without losing paragraph breaks. The software may apply vertical whitespace or indenting at paragraph breaks, depending on the selected style."
$ history | awk '{ $1=""; print }'
clear
echo "In word processing and desktop publishing, a hard return or paragraph break indicates a new paragraph, to be distinguished from the soft return at the end of a line internal to a paragraph. This distinction allows word wrap to automatically re-flow text as it is edited, without losing paragraph breaks. The software may apply vertical whitespace or indenting at paragraph breaks, depending on the selected style."
The history command does not have an option to suppress line numbers, so you will have to combine commands as everyone is suggesting:
Example :
history | cut -d' ' -f4- | sed 's/^ \(.*$\)/\1/g'
You may want to try https://github.com/dvorka/hstr, which allows "suggest box style" filtering of Bash history with (optional) metrics-based ordering, i.e. it is much more efficient and faster in both forward and backward directions:
$ hh -n
It can be easily bound to Ctrl-r and/or Ctrl-s.
You can use the cut command to solve it. cut cuts out fields from STDIN or files. Some examples:
Cut out the first sixteen characters of each line of STDIN:
cut -c 1-16
Cut out the first sixteen characters of each line of the given files:
cut -c 1-16 file
Cut out everything from the 3rd character to the end of each line:
cut -c3-
Cut out the fifth field of each line, using a colon as a field delimiter (default delimiter is tab):
cut -d':' -f5
Cut out the 2nd and 10th fields of each line, using a semicolon as a delimiter:
cut -d';' -f2,10
Cut out the fields 3 through 7 of each line, using a space as a delimiter:
cut -d' ' -f3-7
I know I am late for the party but this is just so much easier to remember:
cat ~/.bash_history
If you are trying to send your history without line numbers to a file, and want to keep the file for later reference, read below:
history | sed 's/^[ ]*[0-9]\+[ ]*//' >>history.txt
The above command reads your history's content into a text file called history.txt, which lets you keep different versions as you progress through your project(s).
I like it, because it helps me simplify automation when executing a bash script (wink)

Simple Text Search Bash

I have a text file with 10 k lines. How do I extract all the lines where a certain keyword appears? It's fundamental that I am able to select the entire line where a certain text pattern shows up. How can I do this in bash?
Use grep to search for text and print matching lines:
grep yourKeyword yourFile.txt
If the pattern consists of several words, you must quote the pattern:
grep "your key string" yourFile.txt
Besides grep, you can also use awk. awk has the additional advantage of being able to process the lines as it searches them:
awk '/pattern/{ do stuff }' file
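For example, a concrete (made-up) illustration of that pattern: print the second field of every line containing ERROR, and count the matches while searching:
awk '/ERROR/ { print $2; n++ } END { print n " matching lines" }' logfile.txt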

What is the easiest way to join 12 columns?

I have 12 columns separated by a tab. How can I join them side-by-side?
[Added] You can also suggest methods other than AWK: the faster the better.
Since you asked specifically about awk (there are tools better suited to the job), the following is a first-cut solution:
awk '{print $1$2$3$4$5$6$7$8$9$10$11$12}'
A more complicated and configurable solution, where you could change the number of columns used for output, would be:
awk -v lim=12 '{for(x=1;x<=lim;x++){printf "%s",$x};print ""}'
Other possibilities, if you're not restricted to awk, are:
tr -d '\011' # to combine ALL columns on the line.
cut --output-delimiter='' -f1-12 # more general (1-12 or 3-7 or 1-6,9).
Based on your edit and comments, I suggest cut is the best tool for the job. Use "man cut", "info cut" or "cut --help" for more details (this depends on your platform).
If you are just using awk to concatenate the columns, I would use tr and delete the tabs:
cat file1 | tr -d '\011' > file2
Try this:
{
print $1$2$3$4$5$6$7$8$9$(10)$(11)$(12)
}
I'm not an awk genius so I don't know if there's some sort of looping construct you can use.
Well, it depends on your editor/command of choice. But generally, it boils down to replacing the tab character with nothing.
For example, in vim: ":%s/\t//g"
You did not mention which tool you would like to use, but any text editor can replace a tab with an empty string. I guess that would work; that's what I usually do.
