Bloated echo command - GNU

Look at the following implementations of the "echo" command:
http://bxr.su/o/bin/echo/echo.c (OpenBSD)
http://bxr.su/d/bin/echo/echo.c (DragonFly)
http://bxr.su/n/bin/echo/echo.c (NetBSD)
http://bxr.su/f/bin/echo/echo.c (FreeBSD)
http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/echo.c (GNU)
As you go down the list, I'm sure you'll notice the increasing bloat in each implementation.
What is the point of a 272 line echo program?

In their article 'Program Design in the UNIX Environment', Pike & Kernighan discuss how the cat program accreted control arguments. Somewhere, though not in that article, there was a comment about 'cat came back from Berkeley waving flags'. This is similar to the problem of echo developing options. (I found a reference to the relevant article in the BSD (Mac OS X) man page for cat: Rob Pike, "UNIX Style, or cat -v Considered Harmful", USENIX Summer Conference Proceedings, 1983. See also http://quotes.cat-v.org/programming/)
In their book 'The UNIX Programming Environment', Kernighan & Pike (yes, those two again) quote Doug McIlroy on the subject of what 'echo' should do with no arguments (circa 1984):
Another question of philosophy is what echo should do if given no arguments - specifically, should it print a blank line or nothing at all. All the current echo implementations we know print a blank line, but past versions didn't, and there were great debates on the subject. Doug McIlroy imparted the right feelings of mysticism in his discussion on the topic:
The UNIX and the Echo
There dwelt in the land of New Jersey the UNIX, a fair maid whom savants travelled far to admire. Dazzled by her purity, all sought to espouse her, one for her virginal grace, another for her polished civility, yet another for her agility in performing exacting tasks seldom accomplished even in much richer lands. So large of heart and accommodating of nature was she that the UNIX adopted all but the most insufferably rich of her suitors. Soon, many offspring grew and prospered and spread to the ends of the earth.
Nature herself smiled and answered to the UNIX more eagerly than to other mortal beings. Humbler folk, who knew little of more courtly manners, delighted in her echo, so precise and crystal clear they scarce believed she could be answered by the same rocks and woods that so garbled their own shouts into the wilderness. And the compliant UNIX obliged with perfect echoes of whatever she was asked.
When one impatient swain asked the UNIX, 'Echo nothing', the UNIX obligingly opened her mouth, echoed nothing, and closed it again.
'Whatever do you mean,' the youth demanded, 'opening your mouth like that? Henceforth never open your mouth when you are supposed to echo nothing!' And the UNIX obliged.
'But I want a perfect performance, even when you echo nothing,' pleaded a sensitive youth, 'and no perfect echoes can come from a closed mouth.' Not wishing to offend either one, the UNIX agreed to say different nothings for the impatient youth and the sensitive youth. She called the sensitive nothing '\n'.
Yet now when she said '\n', she was not really saying nothing so she had to open her mouth twice, once to say '\n' and once to say nothing, and so she did not please the sensitive youth, who said forthwith, 'The \n sounds like a perfect nothing to me, but the second one ruins it. I want you to take back one of them.' So the UNIX, who could not abide offending, agreed to undo some echoes, and called that '\c'. Now the sensitive youth could hear a perfect echo of nothing by asking for '\n' and '\c' together. But they say that he died of a surfeit of notation before he ever heard one.
The Korn shell introduced (or, at least, included) a printf command that was based on the C language printf() function, and that uses a format string to control how the material should appear. It is a better tool for complicated formatting than echo. But, because of the history outlined in the quote, echo doesn't just echo any more; it interprets what it is given to echo.
And interpreting the command line arguments to echo indubitably requires more code than not interpreting them. A basic echo command is:
#include <stdio.h>
int main(int argc, char **argv)
{
    const char *pad = "";
    while (*++argv)
    {
        fputs(pad, stdout);
        fputs(*argv, stdout);
        pad = " ";
    }
    fputc('\n', stdout);
    return 0;
}
There are other ways to achieve that. But the more complex versions of echo have to scrutinize their arguments before printing anything - and that takes more code. And different systems have decided that they want to do different amounts of interpretation of their arguments, leading to different amounts of code.
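As an aside, the printf alternative mentioned above looks like this (a sketch; the format strings and values are made up for illustration):
echo one two                                      # echo just joins its arguments with spaces
printf '%s\n' one two                             # printf reuses its format: one argument per line
printf 'name: %-10s score: %5.2f\n' alice 97.5    # column formatting echo cannot do
Because the format string, rather than the data, decides how things are printed, printf does not need the kind of argument interpretation that echo grew.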

You will notice there is not really that much bloat.
Most of the lines of code are comments.
Most of the lines that are not comments are usage documentation, so that 'echo --help' does something useful.
Code outside of the above appears largely to be handling the arguments echo can take, as well as the "special" expansion of sequences such as \n and \t into their equivalent characters instead of echoing them literally.
Also, most of the time you are not even running that echo command: 'echo' usually invokes a shell built-in. At least on my machine, you have to type /bin/echo --help to get all the advanced functionality/documentation out of it, because echo --help merely echoes --help.
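You can see the built-in/external split for yourself; the following transcript assumes bash and GNU coreutils, with the help output abridged:
$ type echo
echo is a shell builtin
$ echo --help
--help
$ /bin/echo --help
Usage: /bin/echo [SHORT-OPTION]... [STRING]...
...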
For a good example, run this in your shell.
echo '\e[31mhello\e[0m'
Then run this:
echo -e '\e[31mhello\e[0m'
And you will note vastly different results.
The former will just emit the input as-is, but the latter will print hello coloured red.
Another example, using hex escape codes:
$ echo '\x64\x65\x66'
\x64\x65\x66
$ echo -e '\x64\x65\x66'
def
Vastly different behaviour. The OpenBSD implementation cannot do this =).

I'm not sure if I like the first implementation: it has one option too many!
If a -n option was required, then also adding a -- option to stop option processing would be helpful. That way if you are writing a shell script that reads and prints user input, you don't get inconsistent behavior if the user types -n.
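To see the hazard that a -- option (or printf) would guard against, consider this small sketch; it assumes bash's built-in echo, and the variable name is just for illustration:
IFS= read -r reply       # suppose the user types: -n
echo "$reply"            # bash's echo eats "-n" as an option and prints nothing
printf '%s\n' "$reply"   # printf with an explicit format prints "-n" as intended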

Related

Why append an extra character in `test`/`[` string comparison in POSIX sh? [duplicate]

While reading Debian's /usr/bin/startx, I found something peculiar:
mcookie=`/usr/bin/mcookie`
if test x"$mcookie" = x; then
    echo "Couldn't create cookie"
    exit 1
fi
I don't understand why the extra x is necessary - wouldn't it be equivalent to write the following?
mcookie=`/usr/bin/mcookie`
if test "$mcookie" = ""; then
    echo "Couldn't create cookie"
    exit 1
fi
I initially thought that very early versions of sh might have printed some error if the mcookie variable happened to be unset (this wasn't the only instance; these types of comparisons were scattered liberally throughout the script). But then it didn't make much sense after further thought, because the variable was being quoted and the shell would expand it to an empty string.
I perused the Dash man page and the --posix section of the Bash man page, and checked POSIX itself. None of them said anything about it under test beyond "s1 = s2 - True if the strings s1 and s2 are identical; otherwise, false." I assume the people who wrote the script knew what they were doing, so can someone shed some light on this?
Thankies,
Edwin
Here's an excerpt from the Wooledge Bash Pitfall #4:
You may have seen code like this:
[ x"$foo" = xbar ] # Ok, but usually unnecessary.
The x"$foo" hack is required for code that must run on very ancient shells which lack [[, and have a more primitive [, which gets confused if $foo begins with a -. On said older systems, [ still doesn't care whether the token on the right hand side of the = begins with a -. It just uses it literally. It's just the left-hand side that needs extra caution.
Note that shells that require this workaround are not POSIX-conforming. Even the Heirloom Bourne shell doesn't require this (probably the non-POSIX Bourne shell clone that's still most widely in use as a system shell). Such extreme portability is rarely a requirement and makes your code less readable (and uglier).
If you find it in code from this century, it's just cargo culting that keeps being reinforced by people who don't quote.
/usr/bin/startx in particular has used this idiom since at least XFree86 2.1 in Feb 1993, and any updates have likely just matched style since.
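For comparison, a modern POSIX-conforming way to write the same check would be something like this sketch (not taken from the actual script):
mcookie=$(/usr/bin/mcookie)
if [ -z "$mcookie" ]; then      # or: [ "$mcookie" = "" ]
    echo "Couldn't create cookie" >&2
    exit 1
fi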

What is the proper way of filtering type errors in Bash scripts?

Some years ago, while we were learning Bash scripting, our teacher taught us that the first thing we should do was check the "type" of the arguments given and throw an error if they were different from what was expected.
(e.g., if our script takes 2 arguments to sum them, we should first check that they are indeed numbers)
The method he told us to use was the sed command.
In the example above, we would extract all the characters in the arguments that are not digits, and if there are any, we would consider the argument a non-numerical value and throw an error.
Since this method looks pretty rudimentary to me, I would like to ask:
Is there any elegant way of achieving this?
Is this process actually done when building scripts?
sed is tedious overkill when the shell has this functionality built in.
case $1 in *[!0-9]*) echo "$0: not a number: $1" >&2; exit 127;; esac
(If you want to allow for negative numbers or non-integers, obviously adapt the glob pattern.)
The shell itself doesn't have any types at all -- everything is a string.
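For instance, here is a sketch of the two-argument sum exercise built on that check; the pattern here also allows a single leading minus sign (the script text is made up, not from the original lesson):
#!/bin/sh
# sum.sh - add two integers, rejecting anything that is not an integer
for arg in "$1" "$2"; do
    # reject: empty, a bare "-", any non-digit character, or a "-" that is not leading
    case $arg in
        '' | - | *[!0-9-]* | ?*-*) echo "$0: not an integer: $arg" >&2; exit 127;;
    esac
done
printf '%s\n' "$(( $1 + $2 ))"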

In a bash function, how do I get stdin into a variable

I want to call a function with a pipe, and read all of stdin into a variable.
I read that the correct way to do that is with read, or maybe read -r or read -a. However, I had a lot of problems in practice doing that (especially with multi-line strings).
In the end I settled on
function example () {
    local input=$(cat)
    ...
}
What is the idiomatic way to do this?
input=$(cat) is a perfectly fine way to capture standard input if you really need to. One caveat is that command substitutions strip all trailing newlines, so if you want to make sure to capture those as well, you need to ensure that something aside from the newline(s) is read last.
input=$(cat; echo x)
input=${input%x} # Strip the trailing x
Another option in bash 4 or later is to use the readarray command, which will populate an array with each line of standard input, one line per element, which you can then join back into a single variable if desired.
readarray foo
printf -v foo "%s" "${foo[@]}"
I've found that using cat is really slow in comparison to the following method, based on tests I've run:
local input="$(< /dev/stdin)"
In case anyone is wondering, < is just input redirection. From the bash-hackers wiki:
When the inner command is only an input redirection, and nothing else,
for example
$( <FILE )
# or
` <FILE `
then Bash attempts to read the given file and act just as if the given
command was cat FILE.
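In other words (a tiny sketch; /etc/hostname is just an example of a readable file):
msg=$(< /etc/hostname)    # same content as msg=$(cat /etc/hostname), but without running cat
echo "$msg"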
Remarks about portability
In terms of how portable this method is, you are likely to go your entire Linux career without ever using a system that doesn't have /dev/stdin, but in case you want to satisfy that itch, there is a question on Unix Stack Exchange about the portability of directly accessing /dev/{stdin,stdout,stderr} and friends.
One more thing I've come across when working with Linux containers, such as ones built with Docker or Buildah, is that there are situations where /dev/stdin or even /dev/stdout is not available inside the container. I've not been able to say conclusively what causes this.
There are a few overlapping / very similar questions floating around on SO. I answered this here, using the read built-in:
https://stackoverflow.com/a/58452863/3220983
In my answer there, however, I am ONLY concerned with a single line.
The arguable weakness of the cat approach is that it requires spawning a subshell. Otherwise, it's a good one. It's probably the easiest way to deal with multi-line processing, as specifically asked about here.
I think the read approach is faster / more resource efficient if you are trying to chain a lot of commands, or iterate through a list calling a function repeatedly.
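If you do want the read route for multi-line input, a read-based variant of the question's example function might look like this sketch (bash-specific, and, like $(cat), you still have to decide how to treat the final newline):
example() {
    local input= line
    while IFS= read -r line; do
        input+=$line$'\n'      # keep each full line plus its newline
    done
    input+=$line               # catch a final partial line with no trailing newline
    printf 'got %d characters\n' "${#input}"   # stand-in for whatever the function really does
}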

In bash, how do I execute the contents of a variable, VERBATIM, as though they were a command line?

I need to construct a complex command that includes quoted arguments. As it happens, they are arguments to grep, so I'll use that as my example and deeply simplify the command to just enough to demonstrate the error.
Let's start with a working example:
> COMMAND='/usr/bin/grep _'
> echo $COMMAND
/usr/bin/grep _
> $COMMAND
foo <- I type this, and grep filters it out.
foo_ <- I type this, and.....
foo_ <- ... it matches, so grep emits it.
"foo" is not echoed back because it lacks an underscore, "foo_" has one, so it's returned. Let's get to a demonstration of the problem:
> COMMAND='/usr/bin/grep "_ _"'
> echo -E $COMMAND
/usr/bin/grep "_ _"
> /usr/bin/grep "_ _" <- The exact same command line
foo <- fails to match
foo_ _ <- matches, so it gets echoed back
foo_ _
> $COMMAND <- But that command doesn't work from a variable
grep: _": No such file or directory
In other words, when this command is invoked through a variable name, bash is taking the space between underscores as an argument delimiter - despite the quotes.
Normally, I'd fix this with backslashes:
> COMMAND='/usr/bin/grep "_\ _"'
> $COMMAND
grep: trailing backslash (\)
Okay, maybe I need another layer of escaping the backslash:
> COMMAND='/usr/bin/grep "_\\ _"'
12:32 (master) /Users/ronbarry> $COMMAND
grep: _": No such file or directory
And now we're back to square one - the command line is still being broken up at the space. I can, of course, verify all of this with some debugging, which establishes that the backslashes are surviving, unescaped, and grep is being called with multiple arguments:
> set -x
> $COMMAND
+ /usr/bin/grep '"_\\' '_"' <- grep is being called with two args
I have a solution to the problem that takes advantage of arrays, but packing commands this way (in my full implementation, which I'll spare you) is unfamiliar to most people who'd read my code. To oversimplify the creation of an array-based command:
> declare -a COMMAND=('/usr/bin/grep' '-i' 'a b')
12:44 (master) /Users/ronbarry> ${COMMAND[*]}
foo <- Same old, same old
fooa B <- ...
fooa B <- Matches because of case-insensitive (-i) grep.
Finally we get to the question. Why does bash break up quoted arguments in strings when interpreting them as commands and why doesn't there seem to be a string-y way to get it to work? If I have a command packed in a string variable, it violates the Principle of Least Surprise to have that string interpreted differently than the string itself would be. If someone can point me at some docs that cover all of this, and will set me at peace with why I have to resort to the infinitely uglier mechanism of building up arrays with all of my commands, I'd very much appreciate it.
Disclaimer: After writing the following, I almost decided that the question should be closed for encouraging opinion-based responses. This is an opinion-based response. Proceed at your own risk.
Why does bash break up quoted arguments in strings when interpreting them as commands
Because that's what it does. A more interesting question might be "Why does bash break up strings at all?", to which the only possible answer would be "it seemed like a good idea at the time".
Or, to put it another way: In the beginning, nobody thought of putting spaces into filenames. When you only had a few letters for a filename, you didn't waste any of them on spaces. So it seemed reasonable to represent a list of words as just a space-separated list of words, and that was the basis on which shell languages were developed. So the default behaviour of bash, like that of all unix-y shells, is to consider a string with whitespace in it to be a whitespace-separated list of words.
But, of course, that leads to all sorts of headaches, because strings are not structured data. Sometimes a filename does have whitespace in its name. And not all utility arguments are filenames, either. Sometimes you want to give an argument to a utility which is, for example, a sentence. Without that complication, shells were able to avoid making you type quotes, unlike "real" programming languages where strings need to be quoted. But once you decide that sometimes a space in a string is just another character, you need to have some kind of quoting system. So then the syntax of shells added several quoting forms, each with slightly different semantics. The most common is double-quoting, which marks the contents as a single word but still allows variable expansion.
It remains the case that shell quotes, like quotes in any other language, are simply syntactic constructs. They are not part of the string, and the fact that a particular character in a string was marked with a quote (or, equivalently, a backslash) is not retained as part of the string -- again, just like any other programming language. Strings are not really lists of words; they are just treated that way by default.
All of that is not very satisfactory. The nature of shell programming is that you really want a data structure which is a list of "words" -- or, better, a list of strings. And, eventually, shells got around to doing that. Unfortunately, by then there wasn't much syntactic space left in shell languages; it was considered important that the new features not change the behaviour of existing shell scripts. As far as I know, the current shell syntax for arrays was created by David Korn in 1988 (or earlier); eventually, bash also implemented arrays with basically the same syntax.
One of the curiosities in the syntax is that there are three ways of specifying that an entire array should be substituted:
${array[*]} or ${array[@]}: concatenate all the array elements together separated with the first character in $IFS, and then consider the result to be a whitespace-separated list of words.
"${array[*]}": concatenate all the array elements together separated with the first character in $IFS, and then consider the result to be a single word.
"${array[@]}": each array element is inserted as a separate word.
Of these, the first one is essentially useless; the second one is occasionally useful, and the third -- and most difficult to type -- is the one you almost always want.
In the above brief discussion, I left out any consideration of glob characters and filename expansion, and a number of other shell idiosyncrasies. So don't take it as a complete tutorial, by any means.
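A quick demonstration of the three forms above (a sketch; printf's '<%s>' just makes the word boundaries visible):
arr=('one word' 'two more words')
printf '<%s> ' ${arr[*]};   echo    # <one> <word> <two> <more> <words>
printf '<%s> ' "${arr[*]}"; echo    # <one word two more words>
printf '<%s> ' "${arr[@]}"; echo    # <one word> <two more words>
In the question's terms, invoking the array as "${COMMAND[@]}" is the third form.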
why doesn't there seem to be a string-y way to get it to work?
You can always use eval. Unfortunately. If you really really want to get bash to interpret a string as though it were a bash program rather than a string, and if you are prepared to open your script up to all manner of injection attacks, then the shell will happily give you enough rope. Personally, I would never allow a script which used eval to pass code review so I'm not going to expand on its use here. But it's documented.
If I have a command packed in a string variable, it violates the Principle of Least Surprise to have that string interpreted differently than the string itself would be.
Surprise is really in the eye of the beholder. There are probably lots of programmers who think that a newline character really occupies two bytes, and are Surprised when it turns out that in C, '\n'[0] is not a backslash. But I think most of us would be Surprised if it were. (I've tried to answer SO questions based on this misunderstanding, and it is not easy.)
Bash strings, regardless of anything else, are strings. They are not bash programs. Having them suddenly interpreted as bash programs would, in my opinion, not only be surprising but dangerous. At least if you use eval, there is a big red flag for the code reviewer.

The difference between arguments and options pertaining to the Linux shell

I'm currently enrolled in an intro to Unix / Linux class and we came to a question that the instructor and I did not agree on.
cp -i file1 file2
Which is true about the preceding command?
A. There is only one utility
B. There is one option
C. There are three arguments
D. file1 will be copied as file2 and the user will be warned before an overwrite occurs
E. All of the above
I insisted that it was E. All of the above. The instructor has settled on D.
It seems clear that A, B, and D are all correct. The hang up was C and whether or not the -i flag was both an option and an argument.
My logic was that all options are arguments but not all arguments are options and since there are multiple true answers listed, then in multiple choice question tradition the answer is more than likely to be E all of the above.
I haven't been able to find the smoking gun on this issue and thought I would throw it to the masters.
I know this is an old thread, but I want to add the following for anyone else that may stumble into a similar disagreement.
$ ls -l junk
-rw-r--r-- 1 you 19 Sep 26 16:25 junk
"The strings that follow the program name on the command line, such as -l
and junk in the example above, are called the program's arguments. Arguments are usually options or names of files to be used by the command."
Brian W. Kernighan & Rob Pike, "The UNIX Programming Environment"
The manual page here states:
Mandatory arguments to long options are mandatory for short options too.
This seems to imply that in the context of this particular question, at least, you're supposed to not consider options to be arguments. Otherwise it becomes very recursive and kind of pointless.
I think the instructor should accept your explanation, though; this really is splitting hairs for most typical cases.
I think the term "arguments" is used in different ways in different contexts, which seems to be the root of the disagreement. In support of your stance, though, note that the C runtime, in which cp was most likely written, declares the program entry point as main(argc, argv) (types elided), which seems to indicate at least that those who designed the C architecture/library/etc. thought of options as a subset of arguments. Then of course options can have their own arguments (different context), etc.
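To make that concrete, here is a tiny sketch (the script name is made up) that simply reports what the shell hands to a program:
#!/bin/sh
# show_args.sh - print the number of arguments received, then each one
printf 'argument count: %s\n' "$#"
for arg in "$@"; do
    printf 'argument: %s\n' "$arg"
done
Running ./show_args.sh -i file1 file2 reports three arguments, with -i simply the first of them, which is the sense in which answer C is also defensible.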
This is how I was taught. In this case:
cp -i file1 file2
The right answer would be A, B and D, but not C.
-i is an option, and file1 and file2 are arguments. Normally options are considered to change the behaviour of an application or command, whereas arguments do not.
I suppose it comes down to semantics whether you consider -i an argument of the application, since it is a behaviour-changing option (or argument) of cp, but in English it is considered an option, not an argument.
That's how I still define and keep the distinction between the two parts of a command.
As another example, consider cron jobs. I often use PHP cron jobs, and I normally have both options and arguments associated with the command. Options are always used (in my opinion) to define extra behaviour, while arguments provide the app and its behaviours with the data required to complete the operation.
Edit
I agree with @unwind that this is splitting hairs and a lot of the time comes down to scenario and opinion. It was quite bad of him to even mark it, really; he should have known this is a subjective question. Tests are completely unfair when filled with subjective questions.
Hmmm... I personally like to distinguish between options and arguments; however, you could technically say that options are arguments. I would say that you are correct, but I think your instructor settled on D because he doesn't want you to get them confused. For example, the following is equivalent to the above command:
ARG1="-i" ; ARG2="file1" ; ARG3="file2" ; cp $ARG1 $ARG2 $ARG3
ARG1="-i" ; ARG2="file1" ; ARG3="file2" ; cp $ARG2 $ARG1 $ARG3
ARG1="-i" ; ARG2="file1" ; ARG3="file2" ; cp $ARG2 $ARG3 $ARG1
...whereas cp $ARG1 $ARG3 $ARG2 is not the same. I would say that options are a special type of argument.
