The Concept of 'Hold space' and 'Pattern space' in sed - linux

I'm confused by the two concepts in sed: hold space and pattern space. Can someone help explain them?
Here's a snippet of the manual:
h H Copy/append pattern space to hold space.
g G Copy/append hold space to pattern space.
n N Read/append the next line of input into the pattern space.
These six commands really confuse me.

When sed reads a file line by line, the line that has been currently read is inserted into the pattern buffer (pattern space). Pattern buffer is like the temporary buffer, the scratchpad where the current information is stored. When you tell sed to print, it prints the pattern buffer.
Hold buffer / hold space is like a long-term storage, such that you can catch something, store it and reuse it later when sed is processing another line. You do not directly process the hold space, instead, you need to copy it or append to the pattern space if you want to do something with it. For example, the print command p prints the pattern space only. Likewise, s operates on the pattern space.
Here is an example:
sed -n '1!G;h;$p'
(the -n option suppresses automatic printing of lines)
There are three commands here: 1!G, h and $p. 1!G has an address, 1 (first line), but the ! means that the command will be executed everywhere but on the first line. $p on the other hand will only be executed on the last line. So what happens is this:
1. The first line is read and inserted automatically into the pattern space.
2. On the first line, the first command is not executed; h copies the first line into the hold space.
3. Now the second line replaces whatever was in the pattern space.
4. On the second line, first we execute G, appending the contents of the hold buffer to the pattern buffer, separated by a newline. The pattern space now contains the second line, a newline, and the first line.
5. Then the h command copies the concatenated contents of the pattern buffer into the hold space, which now holds lines two and one in reverse order.
6. We proceed to line number three: go to point (3) above.
Finally, after the last line has been read and the hold space (containing all the previous lines in reverse order) has been appended to the pattern space, the pattern space is printed with p. As you have guessed, the above does exactly what the tac command does: it prints the file in reverse.
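To make the copy/append distinction behind the six commands concrete, here is a small sketch. The two-line input and the helper name show are made up for illustration; the sed commands themselves are the ones from the manual snippet.

```shell
# show is a hypothetical helper: it feeds the same two-line input to a sed script.
show() { printf 'first\nsecond\n' | sed -n "$1"; }

# h copies line 1 into the hold space; on line 2, g overwrites the pattern space with it.
show '1h;2{g;p;}'    # prints: first

# G appends the hold space (after a newline) instead of overwriting.
show '1h;2{G;p;}'    # prints: second, then first

# H appends to the hold space, which starts out empty, so the result begins with an empty line.
show 'H;${g;p;}'     # prints: an empty line, then first, then second

# n replaces the pattern space with the next line; N appends the next line to it.
show 'n;p'           # prints: second
show 'N;p'           # prints: first, then second
```

In each pair the lowercase command replaces the target buffer and the uppercase one appends to it with a newline in between.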

@Ed Morton: I disagree with you here. I found sed very useful and simple (once you grok the concept of the pattern and hold buffers) to come up with an elegant way to do multiline grepping.
For example, let's take a text file that has hostnames and some information about each host, with lots of junk in between that I don't care about.
Host: foo1
some junk, doesnt matter
some junk, doesnt matter
Info: about foo1 that I really care about!!
some junk, doesnt matter
some junk, doesnt matter
Info: a second line about foo1 that I really care about!!
some junk, doesnt matter
some junk, doesnt matter
Host: foo2
some junk, doesnt matter
Info: about foo2 that I really care about!!
some junk, doesnt matter
some junk, doesnt matter
To me, an awk script to just get the lines with the hostname and the corresponding info line would take a bit more than what I'm able to do with sed:
sed -n '/Host:/{h}; /Info/{x;p;x;p;}' myfile.txt
output looks like:
Host: foo1
Info: about foo1 that I really care about!!
Host: foo1
Info: a second line about foo1 that I really care about!!
Host: foo2
Info: about foo2 that I really care about!!
(Note that Host: foo1 appears twice in the output.)
Explanation:
-n disables output unless explicitly printed
first match, finds and puts the Host: line into hold buffer (h)
second match, finds the next Info: line, but first exchanges (x) current line in pattern buffer with hold buffer, and prints (p) the Host: line, then re-exchanges (x) and prints (p) the Info: line.
Yes, this is a simplistic example, but I suspect this is a common issue that was quickly dealt with by a simple sed one-liner. For much more complex tasks, such as ones in which you cannot rely on a given, predictable sequence, awk may be better suited.
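If the repeated Host: line is unwanted, one possible variation (a sketch, not part of the original one-liner) is to empty the hold space once the host has been printed. The function names sample and host_info, and the shortened sample data, are hypothetical.

```shell
# Hypothetical sample data in the same shape as the file above.
sample() {
cat <<'EOF'
Host: foo1
some junk
Info: about foo1
some junk
Info: a second line about foo1
Host: foo2
Info: about foo2
EOF
}

# Print each Host: line once, followed by all of its Info: lines.
# On an Info: line: swap in the saved host, print it only if it is non-empty,
# blank it out, swap back, and print the Info: line itself.
host_info() {
  sed -n '
    /^Host:/h
    /^Info:/{
      x
      /./p
      s/.*//
      x
      p
    }
  '
}

sample | host_info
# Host: foo1
# Info: about foo1
# Info: a second line about foo1
# Host: foo2
# Info: about foo2
```

The second x puts the now-empty buffer back into the hold space, so later Info: lines for the same host find nothing to print.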

Although @January's answer and the example are nice, the explanation was not enough for me. I had to search and learn a lot until I managed to understand how exactly sed -n '1!G;h;$p' works. So I'd like to elaborate on the command for someone like me.
First of all, let's see what the command does.
$ echo {a..d} | tr ' ' '\n' # Prints from 'a' to 'd' in each line
a
b
c
d
$ echo {a..d} | tr ' ' '\n' | sed -n '1!G;h;$p'
d
c
b
a
It reverses the input like tac command does.
sed reads line by line, so let's see what happens in the pattern space and the hold space at each line. Because the h command copies the contents of the pattern space to the hold space, both spaces hold the same text after each line.
Read line   Pattern Space / Hold Space   Command executed
---------------------------------------------------------
a           a$                           h
b           b\na$                        1!G;h
c           c\nb\na$                     1!G;h
d           d\nc\nb\na$                  1!G;h;$p
At the last line, $p prints d\nc\nb\na$ which is formatted to
d
c
b
a
If you want to see the pattern space for each line, you can add an l command.
$ echo {a..d} | tr ' ' '\n' | sed -n '1!G;h;l;$p'
a$
b\na$
c\nb\na$
d\nc\nb\na$
d
c
b
a
I found it very helpful to watch the video tutorial "Understanding how sed works", as the presenter shows step by step how each space is used. The hold space is covered in the 4th tutorial, but I recommend watching all the videos if you are not familiar with sed.
Also GNU sed document and Bruce Barnett's Sed tutorial are very good references.

Related

Replace same occurrence of word with different words using sed

I want to replace two occurrences of the same word in a file (app.properties) with two different words using the sed command.
Example:
mysql.host=<<CHANGE_ME>>
mysql.username=testuser
mysql.port=3306
mysql.db.password=<<CHANGE_ME>>
required output will be
mysql.host=localhost
mysql.username=testuser
mysql.port=3306
mysql.db.password=password123
I tried below command:
sed -e "s/<<CHANGE_ME>>/localhost/1" -e "s/<<CHANGE_ME>>/password123/2" app.properties > /home/centos/SCRIPT/io.properties_new
However I am getting localhost at both the places.
I'm sure it's not impossible, but also that you will not be able to figure out how it works once you find an answer. A better solution is to switch to a language which is more human-readable, so you can understand what it does.
awk 'BEGIN { split("localhost:password123", items, ":") }
/<<CHANGE_ME>>/ { sub(/<<CHANGE_ME>>/, items[++i]) } 1' input_file >output_file
The BEGIN block creates an array items of replacements. The main script then increments i every time we perform a replacement, indexing further into items for the replacement string.
This may be possible but I don't know if this is really readable for everyone.
Something like this might suit you:
sed -e '0,/<<CHANGE_ME>>/{s/<<CHANGE_ME>>/localhost/}' -e '1,/<<CHANGE_ME>>/{s/<<CHANGE_ME>>/password123/}' app.properties > /home/centos/SCRIPT/io.properties_new
If you have any idea to improve this, don't hesitate. I would really like to learn the best way to do this too :D
With sed, using a wonderfully confusing "if (first time) do x, else do y" logic:
sed '/<<CHANGE_ME>>/{bb;:a;s/<<CHANGE_ME>>/password123/;b;:b;x;s/E//;x;ta;s/<<CHANGE_ME>>/localhost/;x;s/^/E/;x}' input_file
Writing each command of the sed script on its own line makes it more understandable, or at least easier for me to explain:
/<<CHANGE_ME>>/{
bb
:a
s/<<CHANGE_ME>>/password123/
b
:b
x
s/E//
x
ta
s/<<CHANGE_ME>>/localhost/
x
s/^/E/
x
}
Here's the explanation. /<<CHANGE_ME>>/{…} means that the commands in {…} are only applied to lines matching <<CHANGE_ME>>. Inside the braces:
1. bb: "branch to (go to) :b", in this case used to skip the first substitution command;
2. :a: a target for another branch or test-and-branch command;
3. s/…/…/: you know what it does, but we skip this the first time the script is run;
4. b: branches to the end of the script, skipping everything (because we are giving no argument to b);
5. :b: the target of the command bb at 1.;
6. x: swaps the pattern space (the line you're dealing with at the moment) with the hold space (a kind of variable that you can put stuff into via the x, h, and H commands);
7. s/E//: tries to match and delete an E (just because that's the initial of my name), which fails the first time we run this, because the hold space that we swapped with the pattern space in step 6 was empty;
8. x: undoes what the previous x did, so we're back to working with the line matching <<CHANGE_ME>>;
9. ta: tests whether the last performed s/…/…/ command succeeded and, if so, goes to :a; otherwise it's a no-op. The first time we run the script this is a no-op, because the substitution in step 7 failed;
10. s/…/…/: you know what it does;
11. x: see above;
12. s/^/E/: inserts the E at the beginning of the (swapped-in) hold space contents, so that the next time we run the script the substitution of step 7 succeeds, step 9 successfully branches to :a, step 3 is performed for the first time, and step 4 exits the script;
13. x: see above.
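As a check of the flag-in-hold-space technique, here is a runnable sketch against the sample file from the question. It assumes the placeholder is spelled <<CHANGE_ME>> as in the question, and it includes the bare b after the first substitution that the explanation describes. The function names props and fill_placeholders are hypothetical, and the script relies on GNU sed's handling of labels.

```shell
# Sample input from the question.
props() {
cat <<'EOF'
mysql.host=<<CHANGE_ME>>
mysql.username=testuser
mysql.port=3306
mysql.db.password=<<CHANGE_ME>>
EOF
}

# The first matching line gets "localhost", the second gets "password123".
# The hold space carries a one-character flag (E) from one line to the next.
fill_placeholders() {
  sed '
    /<<CHANGE_ME>>/{
      bb
      :a
      s/<<CHANGE_ME>>/password123/
      b
      :b
      x
      s/E//
      x
      ta
      s/<<CHANGE_ME>>/localhost/
      x
      s/^/E/
      x
    }
  '
}

props | fill_placeholders
# mysql.host=localhost
# mysql.username=testuser
# mysql.port=3306
# mysql.db.password=password123
```

Note that the flag is consumed on the second match, so a third placeholder line would be treated as a "first" one again.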
Perhaps this might help:
sed -e '1s/<<CHANGE_ME>>/localhost/' \
-e '4s/<<CHANGE_ME>>/password123/' \
app.properties > /home/centos/SCRIPT/io.properties_new

'N' and 'D' not working as expected with sed

sed 'N; D' testfile
testfile contains:
this is the first line
this is the second line
this is the third line
this is the fourth line
I am using RHEL 6 and the output comes as:
this is the fourth line
As per my understanding, N just pulls in the next line into the pattern space and D deletes just the first line of the pattern space. Therefore, the output should have been:
this is the second line
this is the fourth line
Can someone please explain why the output is coming as mentioned above?
According to the documentation:
D
If pattern space contains no newline, start a normal new cycle as if the d command was issued. Otherwise, delete text in the pattern space up to the first newline, and restart cycle with the resultant pattern space, without reading a new line of input.
(Emphasis mine.)
It sounds like this would restart your sed program from the beginning, reading and deleting lines until it runs out of input, at which point only the last line is left in the buffer.
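The restart behaviour can be watched with the l command, which prints the pattern space with embedded newlines shown as \n. This is a sketch assuming GNU sed; four numbered lines stand in for testfile, and four_lines is a hypothetical helper.

```shell
# Four numbered lines stand in for testfile.
four_lines() { printf '1\n2\n3\n4\n'; }

# With -n, only l produces output: one snapshot of the pattern space per N.
four_lines | sed -n 'N;l;D'
# 1\n2$
# 2\n3$
# 3\n4$

# Without -n: when N finds no further input, GNU sed prints what is left and exits.
four_lines | sed 'N;D'
# 4
```

Each D removes the older line and restarts the script without reading input, so N always pairs the surviving line with the next one, like a sliding two-line window.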
As already shown, D restarts the script from the beginning. You can, however, use the following to print the even lines:
sed -n 'n;p'
and to print odds:
sed 'n;d'
In GNU sed you can also use:
sed '0~2!d' # Even
sed '1~2!d' # Odd
An alternative can be something like:
N;s/^[^\n]*\n//
which will read the next line into the pattern space and then delete the first line from it.
One might ask why this is the behavior. One reason is to make things like this possible, working with multiple lines in the pattern space:
$!N;/\npattern$/d;P;D
The above will delete lines matching pattern as well as the line before.
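A quick sketch of that last command on made-up input, where the line to match is literally the word pattern (drop_with_previous is a hypothetical name):

```shell
# Delete any line that is exactly "pattern", together with the line before it.
drop_with_previous() { sed '$!N;/\npattern$/d;P;D'; }

printf 'a\nb\npattern\nc\n' | drop_with_previous
# a
# c
```

P prints only the first line of the two-line window and D then drops it, so every line gets a chance to be the "line before" of the next one.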

extract first instance per line (maybe grep?)

I want to extract the first instance of a string per line in linux. I am currently trying grep but it yields all the instances per line. Below I want the strings (numbers and letters) after "tn="...but only the first set per line. The actual characters could be any combination of numbers or letters. And there is a space after them. There is also a space before the tn=
Given the following file:
hello my name is dog tn=12g3 fun 23k3 hello tn=1d3i9 cheese 234kd dks2 tn=6k4k ksk
1263 chairs are good tn=k38493kd cars run vroom it95958 tn=k22djd fair gold tn=293838 tounge
Desired output:
12g3
k38493
Here's one way you can do it if you have GNU grep, which (mostly) supports Perl Compatible Regular Expressions with -P. Also, the non-standard switch -o is used to only print the part matching the pattern, rather than the whole line:
grep -Po '^.*?tn=\K\S+' file
The pattern matches the start of the line ^, followed by any characters .*?, where the ? makes the match non-greedy. After the first match of tn=, \K "kills" the previous part so you're only left with the bit you're interested in: one or more non-space characters \S+.
As in Ed's answer, you may wish to add a space before tn to avoid accidentally matching something like footn=.... You might also prefer to use something like \w to match "word" characters (equivalent to [[:alnum:]_]).
Just split each line on the separator tn= and pick the second field. Then split that field again to get everything up to the first space:
$ awk -F"tn=" '{split($2,a, " "); print a[1]}' file
12g3
k38493kd
$ awk 'match($0,/ tn=[[:alnum:]]+/) {print substr($0,RSTART+4,RLENGTH-4)}' file
12g3
k38493kd
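For comparison, the same extraction can be sketched with plain sed, which has no non-greedy operator. The trick assumed here is that s without the g flag only acts on the leftmost match, so the first substitution cuts the line off right after the first tn= value, and the second strips everything before it. The function name first_tn is hypothetical.

```shell
# Keep only the value after the first " tn=" on each line; skip lines without one.
first_tn() {
  sed -n 's/ tn=\([^ ]*\).*/ tn=\1/; s/.* tn=//p'
}

printf '%s\n' \
  'hello my name is dog tn=12g3 fun 23k3 hello tn=1d3i9 cheese 234kd dks2 tn=6k4k ksk' \
  '1263 chairs are good tn=k38493kd cars run vroom it95958 tn=k22djd fair gold tn=293838 tounge' |
  first_tn
# 12g3
# k38493kd
```

Like the awk version with match, this requires a space before tn=, so something like footn=... is not picked up.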

Ignore spaces, tabs and new line in SED

I tried to replace a string in a file that contains tabs and line breaks.
The command in the shell file looked something like this:
FILE="/Somewhere"
STRING_OLD="line 1[ \t\r\n]*line 2"
sed -i 's/'"$STRING_OLD"'/'"$STRING_NEW"'/' $FILE
If I manually remove the line breaks and the tabs and leave only the spaces, then the replacement succeeds. But if I leave the line breaks, sed is unable to locate $STRING_OLD and does not replace it with the new string.
thanks in advance
Kobi
sed reads lines one at a time, and usually lines are also processed one at a time, as they are read. However, sed does have facilities for reading additional lines and operating on the combined result. There are several ways that could be applied to your problem, such as:
FILE="/Somewhere"
STRING_OLD="line 1[ \t\r\n]*line 2"
sed -n "1h;2,\$H;\${g;s/$STRING_OLD/$STRING_NEW/g;p}" "$FILE"
That does more or less what you describe doing manually: it concatenates all the lines of the file (but keeps the newlines), and then performs the substitution on the overall buffer, all at once. That does assume, however, either that the file is short (POSIX does not require it to work if the overall file length exceeds 8192 bytes) or that you are using a sed that does not have buffer-size limitations, such as GNU sed. Since you tagged Linux, I'm supposing that GNU sed can be assumed.
In detail:
the -n option turns off line echoing, because we save everything up and print the modified text in one chunk at the end.
there are multiple sed commands, separated by semicolons, and with literal $ characters escaped (for the shell):
1h: when processing the first line of input, replace the "hold space" with the contents of the pattern space (i.e. the first line, excluding newline)
2,\$H: when processing any line from the second through the last, append a newline to the hold space, then the contents of the pattern space
\${g;s/$STRING_OLD/$STRING_NEW/g;p}: when processing the last line, perform this group of commands: copy the hold space into the pattern space; perform the substitution, globally; print the resulting contents of the pattern space.
That's one of the simpler approaches, but if you need to accommodate seds that are not as capable as GNU's with regard to buffer capacity then there are other ways to go about it. Those start to get ugly, though.
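Here is a self-contained sketch of the collect-then-substitute approach on made-up input. It swaps in the POSIX class [[:space:]] for the [ \t\r\n] bracket list, and uses a fixed replacement text; both are choices made for this illustration, as is the function name join_and_replace.

```shell
# Gather every line in the hold space, then substitute once on the last line.
join_and_replace() {
  sed -n '1h;2,$H;${g;s/line 1[[:space:]]*line 2/REPLACED/g;p;}'
}

# "line 1" and "line 2" are separated by a newline, a tab and a space.
printf 'start\nline 1\n\t line 2\nend\n' | join_and_replace
# start
# REPLACED
# end
```

Because the whole input sits in one buffer when s runs, the newline between the two lines is just another character the pattern can match.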

sed lines in script - exercise

A simple question, please.
I have this code, and it adds the word echo to all lines, but I want it applied only to odd lines.
I know that sed -n '1~2p' shows me all odd lines, but I can't do the same in the script below:
sed 's/.*/echo &/' $startdirectory
thanks
For a literal answer to what you're asking (apply an 's' action to every other line), you want
sed -e '1~2s/.*/echo &/'
This is marginally better than the N way of doing things in that it doesn't interfere with other things you might want to do to other lines in the file.
Actually rather simple:
sed -e 'N;s/^/echo /'
The N reads a second line into the pattern space; the substitute puts 'echo' in front of the first, the implicit print prints both lines and empties the pattern space.
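Note that an odd number of lines is not handled correctly: on the last line N finds no next line, so sed stops before the substitution runs. POSIX sed then drops that line, while GNU sed prints it without the echo prefix. Fixing that is an exercise for the reader.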
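One possible way to handle an odd line count (a sketch, not necessarily what the answer's author had in mind) is to guard N with the $! address, so that it is skipped on the last line and the substitution still runs. The function name echo_odd is hypothetical.

```shell
# Prefix odd-numbered lines with "echo "; $!N skips N when already on the last line.
echo_odd() { sed '$!N;s/^/echo /'; }

printf 'a\nb\nc\n' | echo_odd
# echo a
# b
# echo c
```

With an even number of lines the behaviour is the same as the unguarded N version.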
