sed behavior: Pattern space and Address range [duplicate] - linux

I noticed something a bit odd while fooling around with sed. If you try to remove multiple line intervals (by number) from a file, but any interval specified later in the list is fully contained within an interval earlier in the list, then an additional single line is removed after the specified (larger) interval.
seq 10 > foo.txt
sed '2,7d;3,6d' foo.txt
1
9
10
This behaviour was behind an annoying bug for me, since in my script I generated the interval endpoints on the fly, and in some cases the intervals produced were redundant. I can clean this up, but I can't think of a good reason why sed would behave this way on purpose.

Since this question was highlighted as needing an answer in the Stack Overflow Weekly Newsletter email for 2015-02-24, I'm converting the comments above (which provide the answer) into a formal answer. Unattributed comments here were made by me in essentially equivalent form.
Thank you for a concise, complete question. The result is interesting. I can reproduce it with your script. Intriguingly, sed '3,6d;2,7d' foo.txt (with the delete operations in the reverse order) produces the expected answer with 8 included in the output. That makes it look like it might be a reportable bug in (GNU) sed, especially as BSD sed (on Mac OS X 10.10.2 Yosemite) works correctly with the operations in either order. I tested using 'sed (GNU sed) 4.2.2' from an Ubuntu 14.04 derivative.
More data points for you/them. Both of these include 8 in the output:
sed -e '/2/,/7/d' -e '/3/,/6/d' foo.txt
sed -e '2,7d' -e '/3/,/6/d' foo.txt
By contrast, this does not:
sed -e '/2/,/7/d' -e '3,6d' foo.txt
The latter surprised me (even accepting the basic bug).
Beats me. I thought given some of sed's arcane constructs that you might be missing the batman symbol or something from the middle of your command but sed -e '2,7d' -e '3,6d' foo.txt behaves the same way and swapping the order produces the expected results (GNU sed 4.2.2 on Cygwin). /bin/sed on Solaris always produces the expected result and interestingly so does GNU sed 3.02. Ed Morton
More data: it only seems to happen with sed 4.2.2 if the 2nd range is a subset of the first: sed '2,5d;2,5d' shows the bug, sed '2,5d;1,5d' and sed '2,5d;2,6d' do not. glenn jackman
The GNU sed home page says "Please send bug reports to bug-sed at gnu.org" (except it has an # in place of ' at '). You've got a good reproduction; be explicit about the output you expect vs the output you get (they'll get the point, but it's best to make sure they can't misunderstand). Point out that the reverse ordering of the commands works as expected, and give the various other commands as examples of working or not working. (You could even give this Q&A URL as a cross-reference, but make sure that the bug report is self-contained so that it can be understood even if no-one follows the URL.)
You can also point to BSD sed (and the Solaris version, and the older GNU 3.02 sed) as behaving as expected. With the old version GNU sed working, it means this is arguably a regression. […After a little experimentation…] The breakage occurred in the 4.1 release; the 4.0.9 release is OK. (I also checked 4.1.5 and 4.2.1; both are broken.) That will help the maintainers if they want to find the trouble by looking at what changed.
The OP noted:
Thanks everyone for comments and additional tests. I'll submit a bug report to GNU sed and post their response. santayana

Related

Is there a simple alternative to "who am i" and "logname"?

I noticed that RHEL 8 and Fedora 30 don't update the utmp file properly.
As a result, commands such as 'who am i', 'last', 'w' etc print incorrect results (who am i actually doesn't print anything)
After a bit of googling, I found 'logname' which worked in this case but I read that gnome is dropping support for utmp altogether so it's a matter of time until this stops working too.
I wrote the following script which finds the login name of the user (even if he is using sudo the moment he runs the command) but it's way too complicated so I'm looking for alternatives.
LOGIN_UID=$(cat /proc/self/loginuid)
LOGIN_NAME=$(awk -v val=LOGIN_UID -F ":" '$3==val{print $1}' /etc/passwd)
Is there a simple alternative which is not based on proper updating of /var/run/utmp ?
Edit 1: Solutions that don't work $HOME, $USER and id return incorrect values when used in a script that has been run with the sudo command. who am i and logname depend on utmp which isn't always updated by the terminal.
Working solution: After a bit of searching, a simpler way than the aforementioned was found in https://unix.stackexchange.com/users/5685/frederik-deweerdt 's comment to his own answer
Link to answer which contains the commment: https://unix.stackexchange.com/a/74312
Answer 1
stat -c "%U" $(tty)
Second answer found at https://stackoverflow.com/a/51765389/10630167
Answer 2
`pstree -lu -s $$ | grep --max-count=1 -o '([^)]*)' | head -n 1 | sed 's/[()]//g'`
Your question is not well-defined because if X and Y are not working, what are the chances that Z will work? This depends entirely on the precise failure mode you are attempting to handle, and there is nothing in your question to reveal the specific circumstances in which you need this.
With that out of the way, perhaps look at the POSIX id command, which has explicit options to print the real (login) or effective (after any setuid command) user id with -r or -u, respectively. Of course, the precise means by which it obtains this information are not specified, and will remain implementation-dependent, and thus might or might not work on your platform under your specific circumstances.
As an aside, here is a refactoring of your code to avoid polluting the variable name space with two separate variables.
LOGIN_NAME=$(awk 'NR==FNR { val=$0; next }
$3==val{print $1}' /proc/self/loginuid FS=":" /etc/passwd)

Executing error while executing sed command

Below given sed command is working fine on online BASH & KSH shell, but getting an error "Illegal operation --r" while trying to run it on linux server.
I'm trying to make a regex to parse MFBBMYKLAXXX from first line.
echo "{1:F01MFBBMYKLAXXX2474811384}{2:O3001434181108BKKBTHBKBXXX12203020241811081534N}{3:{108:241C182AFFD4403C}}{4:
:15A:
:20:10168957
:22A:NEWT
:94A:BILA
:22C:BKKBBK8308MFBBKL
:82A:BKKBTHBK
:87A:MFBBMYKL
:15B:
:30T:20181108
:30V:20181109
:36:32,8308
:32B:THB2500000,
:53A:/610165
BKKBTHBK
:57A:BKKBTHBK
:33B:USD76148,01
:53A:CHASUS33
:57A:/04058664
BKTRUS33
:58A:MFBBMYKL
:15C:
:24D:ELEC/REUTERS D-3000
-}{5:{CHK:4117CD0206B7}}{S:{COP:S}}
" | sed -rn 's/.*\{1:F01([A-Z]{12}).*/\1/p'
The use of sed -r (or in some dialects sed -E) is nonstandard and optional.
It selects a regex dialect called extended regular expressions, which allows you to express some things more succinctly.
POSIX basic regular expressions support pretty much the same facilities, but with an oddball syntax where you have to backslash some characters to obtain their special meaning (which in other words does exactly the opposite of what backslash escaping originally did).
So if you have an extended regular expression like a+(b{2})c then if your sed does not support either -r or -E, try a\+\(b\{2\}\}c without any special option, and hope that your sed is at least roughly on par with what POSIX specifies. (If you're serious about retrocomputing, this is unlikely, though.)
The original regular expression implementation by Ken Thompson only supported the regex metacharacters [...] and . and *, and for a long time, that's all sed supported, too.
Of course, you could always install a more modern sed. I know SunOS used to have some goodies in their xpg4 directory but I have no idea if this was is the case in Solaris; if so, maybe you just need to add /usr/xpg4/bin to your PATH. (According to this it was true at one point in time at least.)

Linux piping data containing backticks

I'm piping output from one command into a second:
mpc listall | mpc add
mpc listall returns the following data (can output 1 or more lines):
Dare - 16 - I´ll Be Your King.mp3
When piping it into the next command, it seems that my shell (Ash on BusyBox) converts the ´ into an asterisk, as I get the error
error adding Dare - 16 - I*ll Be Your King.mp3: No such directory
Manually adding double quotes works! like this:
mpc add "Dare - 16 - I´ll Be Your King.mp3"
So, I tried adding those by sed and awk, but in those cases the backtick gets converted to an asterisk again:
mpc listall | sed 's/^/"/;s/$/"/'
returns:
"Dare - 16 - I*ll Be Your King.mp3"
So, question is, is there a way to pass backticks, or actually any character, as is, into another command without conversions?
By the way, obviously, it's quite bad to have this character in the file name in the first place, but I want my code to be robust and able to handle anything thrown at it.
Not a direct solution the problem, but a work-around (that's actually better, because it uses the API) in python using the python-mpd2 library:
myMpdClient = MPDClient()
myMpdClient.connect("localhost", 6600) # connect to localhost:6600
myMpdClient.findadd('base','directory-name')
myMpdClient.close()
I'll accept a working solution to the actual proposed problem, instead of this one, if someone posts a working solution.

Understanding sed

I am trying to understand how
sed 's/\^\[/\o33/g;s/\[1G\[/\[27G\[/' /var/log/boot
worked and what the pieces mean. The man page I read just confused me more and I tried the info sai Id but had no idea how to work it! I'm pretty new to Linux. Debian is my first distro but seemed like a rather logical place to start as it is a root of many others and has been around a while so probably is doing stuff well and fairly standardized. I am running Wheezy 64 bit as fyi if needed.
The sed command is a stream editor, reading its file (or STDIN) for input, applying commands to the input, and presenting the results (if any) to the output (STDOUT).
The general syntax for sed is
sed [OPTIONS] COMMAND FILE
In the shell command you gave:
sed 's/\^\[/\o33/g;s/\[1G\[/\[27G\[/' /var/log/boot
the sed command is s/\^\[/\o33/g;s/\[1G\[/\[27G\[/' and /var/log/boot is the file.
The given sed command is actually two separate commands:
s/\^\[/\o33/g
s/\[1G\[/\[27G\[/
The intent of #1, the s (substitute) command, is to replace all occurrences of '^[' with an octal value of 033 (the ESC character). However, there is a mistake in this sed command. The proper bash syntax for an escaped octal code is \nnn, so the proper way for this sed command to have been written is:
s/\^\[/\033/g
Notice the trailing g after the replacement string? It means to perform a global replacement; without it, only the first occurrence would be changed.
The purpose of #2 is to replace all occurrences of the string \[1G\[ with \[27G\[. However, this command also has a mistake: a trailing g is needed to cause a global replacement. So, this second command needs to be written like this:
s/\[1G\[/\[27G\[/g
Finally, putting all this together, the two sed commands are applied across the contents of the /var/log/boot file, where the output has had all occurrences of ^[ converted into \033, and the strings \[1G\[ have been converted to \[27G\[.

Using -s command in bash script

I have a trivial error that I cant seem to get around. Im trying to return the various section numbers of lets say "man" since it resides in all the sections. I am using the -s command but am having problems. Every time I use it I keep getting "what manual page do you want". Any help?
In the case of getting the section number of a command, you want something like man -k "page_name" | awk -F'-' "/^page_name \(/ {print $1}", replacing any occurrence of page_name with whatever command you're needing.
This won't work for all systems necessarily as the format for the "man" output is "implementation-defined". In other words, the format on FreeBSD, OS X, various flavours of Linux, etc. may not be the same. For example, mine is:
page_name (1) - description
If you want the section number only, I'm sure there is something you can do such as saving the result of that line in a shell variable and use parameter expansion to remove the parentheses around the section number:
man -k "page_name" | awk -F'-' "/^page_name \(/ {print $1}" | while IFS= read sect ; do
sect="${sect##*[(]}"
sect="${sect%[)]*}"
printf '%s\n' "$sect"
done
To get the number of sections a command appears in, add | wc -l at the end on the same line as the done keyword. For the mount command, I have 3:
2
2freebsd
8
You've misinterpreted the nature of -s. From man man:
-S list, -s list, --sections=list
List is a colon- or comma-separated list of `order specific' manual sections to search. This option overrides the
$MANSECT environment variable. (The -s
spelling is for compatibility with System V.)
So when man sees man -s man it thinks you want to look for a page in section "man" (which most likely doesn't exist, since it is not a normal section), but you didn't say what page, so it asks:
What manual page do you want?
BTW, wrt "man is just the test case cuz i believe its in all the sections" -- nope, it is probably only in one, and AFAIK there isn't any word with a page in all sections. More than 2 or 3 would be very unusual.
The various standard sections are described in man man too.
The correct syntax requires an argument. Typically you're looking for either
man -s 1 man
to read the documentation for the man(1) command, or
man -s 7 man
to read about the man(7) macro package.
If you want a list of standard sections, the former contains that. You may have additional sections installed locally, though. A directory listing of /usr/local/share/man might reveal additional sections, for example.
(Incidentally, -s is not a "command" in this context, it's an option.)

Resources