I want to call a function with a pipe, and read all of stdin into a variable.
I read that the correct way to do that is with read, or maybe read -r or read -a. However, I had a lot of problems in practice doing that (especially with multi-line strings).
In the end I settled on
function example () {
local input=$(cat)
...
}
What is the idiomatic way to do this?
input=$(cat) is a perfectly fine way to capture standard input if you really need to. One caveat is that command substitutions strip all trailing newlines, so if you want to make sure to capture those as well, you need to ensure that something aside from the newline(s) is read last.
input=$(cat; echo x)
input=${input%x} # Strip the trailing x
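To make the caveat concrete, here is a minimal sketch that wraps the sentinel trick in a function and compares it with a plain command substitution. The name slurp_stdin and the variables plain and kept are made up for illustration; printf -v and process substitution assume bash.

```bash
# Hypothetical helper (not from the answer): store all of stdin, including
# trailing newlines, in the variable whose name is passed as $1.
slurp_stdin() {
    local _buf
    _buf=$(cat; echo x)      # the sentinel x protects trailing newlines
    _buf=${_buf%x}           # drop the sentinel again
    printf -v "$1" '%s' "$_buf"
}

plain=$(printf 'a\nb\n\n')                # plain command substitution
slurp_stdin kept < <(printf 'a\nb\n\n')   # sentinel version
printf 'plain=%d characters, kept=%d characters\n' "${#plain}" "${#kept}"
```

The input is fed through a redirection rather than a pipe because, by default, bash runs each part of a pipeline in a subshell, so a variable set there would be lost when the pipeline ends.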
Another option in bash 4 or later is to use the readarray command, which will populate an array with each line of standard input, one line per element, which you can then join back into a single variable if desired.
readarray foo
printf -v foo "%s" "${foo[@]}"
I've found that using cat is really slow in comparison to the following method, based on tests I've run:
local input="$(< /dev/stdin)"
In case anyone is wondering, < is just input redirection. From the bash-hackers wiki:
When the inner command is only an input redirection, and nothing else,
for example
$( <FILE )
# or
` <FILE `
then Bash attempts to read the given file and act just if the given
command was cat FILE.
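A quick way to convince yourself that the two forms give the same result is to compare them on a scratch file. This is only a sketch; the variable names are made up, and the $(< file) form assumes bash.

```bash
tmp=$(mktemp)                       # scratch file for the comparison
printf 'line one\nline two\n' > "$tmp"

via_cat=$(cat "$tmp")               # runs the external cat program
via_redirect=$(< "$tmp")            # bash reads the file itself

if [ "$via_cat" = "$via_redirect" ]; then
    echo "identical contents"
fi
rm -f "$tmp"
```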
Remarks about portability
In terms of how portable this method is, you are likely to go your entire Linux career without ever using a Linux system that lacks /dev/stdin, but in case you want to satisfy that itch, here is a question on Unix Stack Exchange about the portability of directly accessing /dev/{stdin,stdout,stderr} and friends.
One more thing I've come across when working with Linux containers, such as ones built with Docker or buildah, is that there are situations where /dev/stdin or even /dev/stdout are not available inside the container. I've not been able to say conclusively what causes this.
There are a few overlapping / very similar questions floating around on SO. I answered this here, using the read built-in:
https://stackoverflow.com/a/58452863/3220983
In my answers there, however, I am ONLY concerned with a single line.
The arguable weakness of the cat approach is that it requires spawning a subshell. Otherwise, it's a good one. It's probably the easiest way to deal with multi-line processing, as specifically queried here.
I think the read approach is faster / more resource efficient if you are trying to chain a lot of commands, or iterate through a list calling a function repeatedly.
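For comparison, here is a rough sketch of the read approach applied to multi-line input, using only shell builtins inside the function. The function name number_lines and the sample input are made up for illustration.

```bash
# Hypothetical function: number each line of stdin using only builtins.
number_lines() {
    local line n=0
    # IFS= and -r keep leading whitespace and backslashes intact;
    # the [ -n "$line" ] test also handles a last line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
        printf '%d: %s\n' "$n" "$line"
    done
}

numbered=$(printf 'alpha\n  beta\ngamma' | number_lines)
printf '%s\n' "$numbered"
```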
I need to construct a complex command that includes quoted arguments. As it happens, they are arguments to grep, so I'll use that as my example and deeply simplify the command to just enough to demonstrate the error.
Let's start with a working example:
> COMMAND='/usr/bin/grep _'
> echo $COMMAND
/usr/bin/grep _
> $COMMAND
foo <- I type this, and grep filters it out.
foo_ <- I type this, and.....
foo_ <- ... it matches, so grep emits it.
"foo" is not echoed back because it lacks an underscore, "foo_" has one, so it's returned. Let's get to a demonstration of the problem:
> COMMAND='/usr/bin/grep "_ _"'
> echo -E $COMMAND
/usr/bin/grep "_ _"
> /usr/bin/grep "_ _" <- The exact same command line
foo <- fails to match
foo_ _ <- matches, so it gets echoed back
foo_ _
> $COMMAND <- But that command doesn't work from a variable
grep: _": No such file or directory
In other words, when this command is invoked through a variable name, bash is taking the space between underscores as an argument delimiter - despite the quotes.
Normally, I'd fix this with backslashes:
> COMMAND='/usr/bin/grep "_\ _"'
> $COMMAND
grep: trailing backslash (\)
Okay, maybe I need another layer of escaping the backslash:
> COMMAND='/usr/bin/grep "_\\ _"'
> $COMMAND
grep: _": No such file or directory
And now we're back to square one - the command line is still being broken up at the space. I can, of course, verify all of this with some debugging, which establishes that the backslashes are surviving, unescaped, and grep is being called with multiple arguments:
> set -x
> $COMMAND
+ /usr/bin/grep '"_\\' '_"' <- grep is being called with two args
I have a solution to the problem that takes advantage of arrays, but packing commands this way (in my full implementation, which I'll spare you) is unfamiliar to most people who'd read my code. To oversimplify the creation of an array-based command:
> declare -a COMMAND=('/usr/bin/grep' '-i' 'a b')
> "${COMMAND[@]}"
foo <- Same old, same old
fooa B <- ...
fooa B <- Matches because of case-insensitive (-i) grep.
Finally we get to the question. Why does bash break up quoted arguments in strings when interpreting them as commands and why doesn't there seem to be a string-y way to get it to work? If I have a command packed in a string variable, it violates the Principle of Least Surprise to have that string interpreted differently than the string itself would be. If someone can point me at some docs that cover all of this, and will set me at peace with why I have to resort to the infinitely uglier mechanism of building up arrays with all of my commands, I'd very much appreciate it.
Disclaimer: After writing the following, I almost decided that the question should be closed for encouraging opinion-based responses. This is an opinion-based response. Proceed at your own risk.
Why does bash break up quoted arguments in strings when interpreting them as commands
Because that's what it does. A more interesting question might be "Why does bash break up strings at all?", to which the only possible answer would be "it seemed like a good idea at the time".
Or, to put it another way: In the beginning, nobody thought of putting spaces into filenames. When you only had a few letters for a filename, you didn't waste any of them on spaces. So it seemed reasonable to represent a list of words as just a space-separated list of words, and that was the basis on which shell languages were developed. So the default behaviour of bash, like that of all unix-y shells, is to consider a string with whitespace in it to be a whitespace-separated list of words.
But, of course, that leads to all sorts of headaches, because strings are not structured data. Sometimes a filename does have whitespace in its name. And not all utility arguments are filenames, either. Sometimes you want to give an argument to a utility which is, for example, a sentence. Without that complication, shells were able to avoid making you type quotes, unlike "real" programming languages where strings need to be quoted. But once you decide that sometimes a space in a string is just another character, you need to have some kind of quoting system. So then the syntax of shells added several quoting forms, each with slightly different semantics. The most common is double-quoting, which marks the contents as a single word but still allows variable expansion.
It remains the case that shell quotes, like quotes in any other language, are simply syntactic constructs. They are not part of the string, and the fact that a particular character in a string was marked with a quote (or, equivalently, a backslash) is not retained as part of the string -- again, just like any other programming language. Strings are not really lists of words; they are just treated that way by default.
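A small sketch shows the difference between quotes that are syntax and quotes that are merely characters inside a string. The helper show_args is made up for illustration; it just prints each word it receives in angle brackets.

```bash
# Hypothetical helper: print the number of words received and each word.
show_args() {
    printf '%d words:' "$#"
    printf ' <%s>' "$@"
    printf '\n'
}

COMMAND='grep "_ _"'

# Deliberately unquoted: the expansion is split on whitespace, and the
# double quotes inside the string are ordinary characters.
from_variable=$(show_args $COMMAND)

# Typed directly: here the double quotes are syntax and are removed.
typed_directly=$(show_args grep "_ _")

printf '%s\n%s\n' "$from_variable" "$typed_directly"
```

The first call receives three words, two of which contain a literal double quote; the second receives two words and no quote characters at all.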
All of that is not very satisfactory. The nature of shell programming is that you really want a data structure which is a list of "words" -- or, better, a list of strings. And, eventually, shells got around to doing that. Unfortunately, by then there wasn't much syntactic space left in shell languages; it was considered important that the new features not change the behaviour of existing shell scripts. As far as I know, the current shell syntax for arrays was created by David Korn in 1988 (or earlier); eventually, bash also implemented arrays with basically the same syntax.
One of the curiosities in the syntax is that there are three ways of specifying that an entire array should be substituted:
${array[*]} or ${array[@]}: concatenate all the array elements together separated with the first character in $IFS, and then consider the result to be a whitespace-separated list of words.
"${array[*]}": concatenate all the array elements together separated with the first character in $IFS, and then consider the result to be a single word.
"${array[@]}": each array element is inserted as a separate word.
Of these, the first one is essentially useless; the second one is occasionally useful, and the third -- and most difficult to type -- is the one you almost always want.
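The three forms can be compared by counting how many words each one produces. This is a sketch assuming the default $IFS; count_words is a made-up helper.

```bash
# Hypothetical helper: report how many words it was called with.
count_words() { echo "$#"; }

array=('/usr/bin/grep' '-i' 'a b')

unquoted=$(count_words ${array[*]})       # split again on whitespace
quoted_star=$(count_words "${array[*]}")  # everything joined into one word
quoted_at=$(count_words "${array[@]}")    # one word per element

printf 'unquoted=%s quoted_star=%s quoted_at=%s\n' \
    "$unquoted" "$quoted_star" "$quoted_at"
```

Only the last form hands grep the three arguments that were stored in the array.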
In the above brief discussion, I left out any consideration of glob characters and filename expansion, and a number of other shell idiosyncrasies. So don't take it as a complete tutorial, by any means.
why doesn't there seem to be a string-y way to get it to work?
You can always use eval. Unfortunately. If you really really want to get bash to interpret a string as though it were a bash program rather than a string, and if you are prepared to open your script up to all manner of injection attacks, then the shell will happily give you enough rope. Personally, I would never allow a script which used eval to pass code review so I'm not going to expand on its use here. But it's documented.
If I have a command packed in a string variable, it violates the Principle of Least Surprise to have that string interpreted differently than the string itself would be.
Surprise is really in the eye of the beholder. There are probably lots of programmers who think that a newline character really occupies two bytes, and are Surprised when it turns out that in C, '\n'[0] is not a backslash. But I think most of us would be Surprised if it were. (I've tried to answer SO questions based on this misunderstanding, and it is not easy.)
Bash strings, regardless of anything else, are strings. They are not bash programs. Having them suddenly interpreted as bash programs would, in my opinion, not only be surprising but dangerous. At least if you use eval, there is a big red flag for the code reviewer.
With a bash release which has been patched for shellshock
$ bash --version
GNU bash, version 3.2.52(1)-release (x86_64-apple-darwin12)
Copyright (C) 2007 Free Software Foundation, Inc.
$ env x='() { :;}; echo vulnerable' bash -c "echo this is a test"
bash: warning: x: ignoring function definition attempt
bash: error importing function definition for `x'
this is a test
another similar exploit still works and has been assigned CVE-2014-7169
$ env X='() { (a)=>\' bash -c "echo date"; cat echo
bash: X: line 1: syntax error near unexpected token `='
bash: X: line 1: `'
bash: error importing function definition for `X'
Thu Sep 25 12:47:22 EDT 2014
$ ls echo
echo
Looking for a breakdown of this as well.
The bug
CVE-2014-7169 is a bug in bash's parser. Bash's parser uses a variable eol_ungetc_lookahead to ungetc characters across lines. That variable wasn't being properly reset from the reset_parser function, which is called e.g. on some syntax errors. Using that bug, it's possible to inject a character into the start of the next bash input line.
So the test code forces a syntax error, using either (a)= or function a a, adds the redirection character to prepend to the next line >, and adds a line continuation \, which leads to either version of the test code:
() { (a)=>\
() { function a a>\
When bash is executed, it processes variables from the environment, finds that variable X is an exported function, and evaluates it to import the function. But the evaluation fails with a parse error, leaving the > character in the eol_ungetc_lookahead variable. Then, when parsing the command argument echo date, it prepends the > character, leading to >echo date, which runs date redirected to a file named echo.
Its relation to the previous bug
The above bug is obviously very different to the original shellshock bug. There are actually several problems:
Bash evaluates completely a variable that looks like an exported function (starts with the four characters () {). CVE-2014-6271.
Under some conditions, it is possible to inject a character into an ungetc variable, that will be prepended to the next input line. CVE-2014-7169.
Bash allows every environment variable to be treated like an exported function, so long as it starts with the four characters () {. CVE-2014-6271, CVE-2014-7169, all the other CVEs where a bug is triggered in bash's parser.
There is a limited stack for here-doc redirection, and there is no check for overflow. CVE-2014-7186, which leads to memory corruption, and can probably be leveraged for arbitrary code execution.
There is a limited stack for nested control structures (select/for/while), with checks for overflow. That stack is still corrupted. CVE-2014-7187.
The fixes
The first patch restricts bash to evaluating a single function definition in each variable that looks like an exported function.
The second patch properly resets eol_ungetc_lookahead on reset_parser.
The third patch changes how functions are exported: now they are exported in variables named BASH_FUNC_functionname%%.
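To see how exported functions travel through the environment on your own system, something like the following sketch can be used. The function name greet is made up, and the exact form of the environment entry depends on the bash version and vendor patches, so the last command only lists whatever is there.

```bash
# Define a function and export it to child processes.
greet() { echo "hello from parent"; }
export -f greet

# A child bash imports the function from its environment and can call it.
child_output=$(bash -c 'greet')
printf '%s\n' "$child_output"

# Show how the function is encoded in the environment. On bash releases
# with the third patch, the variable name should carry the BASH_FUNC_
# prefix; other builds may differ.
env | grep 'greet' || true
```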
Attack surface
The big problem here has been that every environment variable could be used as a vector for attack. Typically, attackers cannot control arbitrary environment variables, otherwise there are already other known attacks (think of LD_PRELOAD, PATH, IFS, ...).
sudo is not affected because it strips exported bash functions from the environment, as mentioned by Gilles on security.SE.
ssh is affected. Typical sshd installations only allow a limited set of environment variables to be exported as configured in AcceptEnv in sshd_config, e.g: LANG and LC_*. Even with this aggressive whitelisting approach, in shellshock any variable could be an attack vector.
Not only was every environment variable a potential attack vector, each one also exposed a parser of more than 6000 lines.
Lessons relearned
system, popen, and others are potentially dangerous. Not only should you take care with their arguments: even when the arguments are fixed at compile-time, the environment is a potential attack vector. Use fork()/execve(), preferably with a clean environment (but at least restrict the environment to white-listed variables, preferably with their values sanity-checked). Remember that a good quality system does what it is supposed to do, while a secure system does what it is supposed to do and nothing more. Invoking a full-blown shell makes doing nothing more a little bit harder.
Complexity is the enemy of security. These days you can easily find people recommending simpler shells. Most shells are free from shellshock by not supporting exported functions at all. Conversely, bash has received lots of security features over the years (you need to invoke it with -p to avoid it dropping privileges on startup, it sanitizes IFS, ...), so don't assume I'm advocating switching shells, this is more of a general advice.
Some excerpts from David Wheeler's ancient "Secure Programming for Linux and UNIX HOWTO" chapter on environment variables are still worth rereading.
§5.2.3 ¶1:
For secure setuid/setgid programs, the short list of environment
variables needed as input (if any) should be carefully extracted. Then
the entire environment should be erased, followed by resetting a small
set of necessary environment variables to safe values. There really
isn't a better way if you make any calls to subordinate programs;
there's no practical method of listing ``all the dangerous values''.
§5.2.3 ¶6:
If you really need user-supplied values, check the values first (to
ensure that the values match a pattern for legal values and that they
are within some reasonable maximum length).
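In shell terms, the advice above (start from an empty environment, then pass only a few checked values) might look roughly like this. It is a sketch under assumptions: the names is_safe_locale and run_clean, the 32-character limit, the allowed character set, and the fixed PATH are all choices made for illustration, not something taken from Wheeler's text.

```bash
# Hypothetical validator: accept only short values made of characters
# that commonly appear in locale names.
is_safe_locale() {
    local value=$1
    [ "${#value}" -le 32 ] || return 1
    [[ $value =~ ^[A-Za-z0-9_.@-]+$ ]]
}

# Hypothetical wrapper: run a command with an otherwise empty environment.
# $1 is the requested LANG value; the remaining words are the command.
run_clean() {
    local lang=$1
    shift
    is_safe_locale "$lang" || lang=C     # fall back to a known-safe value
    env -i PATH=/usr/bin:/bin LANG="$lang" "$@"
}

# A hostile-looking value is neither passed through nor inherited.
export EVIL='() { :;}; echo vulnerable'
child_env=$(run_clean "$EVIL" env)
printf '%s\n' "$child_env"
```

The child sees only PATH and LANG here; anything else it needs has to be added to the wrapper explicitly, which is the point of a whitelist.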
Waving my hands a lot, I suspect the new exploit does the following:
The backslash helps bypass the original patch, so that the string is still evaluated.
The > combines with echo as an output redirection for the bash shell
With echo being consumed by the evaluation to define the function, the only part of the -c argument left to execute is date, whose output goes to a file named echo instead of standard output.
That's the best I can come up with short of reading the bash source, but I suspect the backslash facilitates some sort of buffer overflow that allows the environment string and the argument to -c to be merged.
I'm currently enrolled in an intro to Unix / Linux class and we came to a question that the instructor and I did not agree on.
cp -i file1 file2
Which is true about the preceding command?
A. There is only one utility
B. There is one option
C. There are three arguments
D. file1 will be copied as file2 and the user will be warned before
an overwrite occurs
E. All of the above
I insisted that it was E. All of the above. The instructor has settled on D.
It seems clear that A, B, and D are all correct. The hang up was C and whether or not the -i flag was both an option and an argument.
My logic was that all options are arguments but not all arguments are options and since there are multiple true answers listed, then in multiple choice question tradition the answer is more than likely to be E all of the above.
I haven't been able to find the smoking gun on this issue and thought I would throw it to the masters.
I know this is an old thread, but I want to add the following for anyone else that may stumble into a similar disagreement.
$ ls -l junk
-rw-r--r-- 1 you 19 Sep 26 16:25 junk
"The strings that follow the program name on the command line, such as -l
and junk in the example above, are called the program's arguments. Arguments are usually options or names of files to be used by the command."
Brian W. Kernighan & Rob Pike, "The UNIX Programming Environment"
The manual page here states:
Mandatory arguments to long options are mandatory for short options too.
This seems to imply that in the context of this particular question, at least, you're supposed to not consider options to be arguments. Otherwise it becomes very recursive and kind of pointless.
I think the instructor should accept your explanation though, this really is splitting hairs for most typical cases.
I think the term "arguments" is used in different ways in different contexts, which is the root of the disagreement, it seems. In support of your stance though, note that the C runtime, upon which cp was most likely written, declares the program entry point as main(argc, argv) (types elided), which seems to indicate at least that those who designed the C architecture/library/etc. thought of options as a subset of arguments. Then of course options can have their own arguments (different context), etc....
This is how I was taught, it is said in this case:
cp -i file1 file2
The right answer would be A B and D but not C.
Since -i is an option and file1 and file2 are arguments. Normally options are considered to change the behaviour of an application or command where as arguments do not.
I suppose it is down to semantics whether you consider -i an argument of the original application, since it is a behaviour-changing option (or argument) of cp, but in plain English it is considered an option, not an argument.
That's how I still define the difference and keep the difference between the two parts of a command.
As another command example, cronjobs. I often use PHP cronjobs and I normally have both options and arguments associated with the command. Options are always used (in my opinion) to define extra behaviour while arguments are designed to provide the app and its behaviours with the data it requires to complete the operation.
Edit
I agree with @unwind: this is splitting hairs and a lot of the time comes down to scenario and opinion. It was quite bad of him to even mark on it really; he should have known this is a subjective question. Tests are completely unfair when filled with subjective questions.
Hmmm... I personally like to distinguish between options and arguments however, you could technically say that options are arguments. I would say that you are correct but I think your instructor settled on D because he doesn't want you to get them confused. For example, the following is equivalent to the above command...:
ARG1="-i" ; ARG2="file1" ; ARG3="file2" ; cp $ARG1 $ARG2 $ARG3
ARG1="-i" ; ARG2="file1" ; ARG3="file2" ; cp $ARG2 $ARG1 $ARG3
ARG1="-i" ; ARG2="file1" ; ARG3="file2" ; cp $ARG2 $ARG3 $ARG1
...whereas cp $ARG1 $ARG3 $ARG2 is not the same. I would say that options are a special type of arguments.
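Both views can be shown side by side with a small sketch: the shell hands the program three arguments, and an option parser then sorts them into options and operands. The function classify is made up for illustration and uses the getopts builtin, which (unlike GNU cp) stops at the first operand, so it only recognises -i when it comes first.

```bash
# Hypothetical function: count the arguments it was given, then split them
# into options (only -i is known here) and remaining operands.
classify() {
    local total=$# options=0 opt
    local OPTIND=1
    while getopts ':i' opt; do
        case $opt in
            i) options=$((options + 1)) ;;
            *) ;;                         # ignore unknown options
        esac
    done
    shift $((OPTIND - 1))
    printf 'arguments=%d options=%d operands=%d\n' "$total" "$options" "$#"
}

summary=$(classify -i file1 file2)
printf '%s\n' "$summary"
```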
Does anyone know of a Linux command that reads a linear system of equations from its standard input and writes the solution (if exists) in its standard output?
I want to do something like this:
generate_system | solve_system
You can probably write your own such command using this package.
This is an old question, but showed up in my searches for this problem, so I'm adding an answer here.
I used maxima's solve function. Wrangling the input/output to/from maxima is a bit of a challenge, but can be done.
prepare the system of equations as a comma-separated list -- for example, EQs="C[1]+C[2]=1,C[1]-C[2]=2". I wanted a solution for an unknown number of variables, so I used C[n], but you can use variable names.
prepare a list of variables you wish to solve for -- EQ_VARS="C[1],C[2]"
Maxima will echo all inputs, use line wrap, and return a solution in the form [C[1]=...,C[2]=..]. We need to resolve all of these.
Taken together, this becomes
OUT_VALS=( \
$(maxima --very-quiet \
--batch-string="display2d:false\$linel:9999\$print(map(rhs,float(solve([$EQs],[$EQ_VARS]))[1]))\$" \
| tail -n 1 \
| tr -c '0-9-.e' ' ') )
which will place the solution values into the array $OUT_VALS.
Note that this only properly handles that Maxima output if your problem is correctly constrained -- if you have zero, or more than one solution, the output will not be parsed correctly.
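The parsing half of that pipeline can be tried without Maxima installed. In the sketch below, sample_line is an assumed stand-in for the last line of output (it was not captured from a real run, and actual Maxima output may be formatted differently); only the tr step and the array assignment are taken from the command above.

```bash
# Assumed sample of the final output line; real Maxima output may differ.
sample_line='[1.5,-0.5]'

# Replace everything that is not a digit, minus sign, dot or 'e' with a
# space, then let word splitting fill the array.
OUT_VALS=( $(printf '%s\n' "$sample_line" | tr -c '0-9-.e' ' ') )

printf 'C[1]=%s C[2]=%s\n' "${OUT_VALS[0]}" "${OUT_VALS[1]}"
```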
Look at the following implementations of the "echo" command:
http://bxr.su/o/bin/echo/echo.c (OpenBSD)
http://bxr.su/d/bin/echo/echo.c (DragonFly)
http://bxr.su/n/bin/echo/echo.c (NetBSD)
http://bxr.su/f/bin/echo/echo.c (FreeBSD)
http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/echo.c (GNU)
As you go down the list, I'm sure you'll notice the increasing bloat in each implementation.
What is the point of a 272 line echo program?
In their article 'Program Design in the UNIX Environment', Pike & Kernighan discuss how the cat program accreted control arguments. Somewhere, though not that article, there was a comment about 'cat came back from Berkeley waving flags'. This is a similar issue to the problem with echo developing options. (I found a reference to the relevant article in the BSD (Mac OS X) man page for cat: Rob Pike, "UNIX Style, or cat -v Considered Harmful", USENIX Summer Conference Proceedings, 1983. See also http://quotes.cat-v.org/programming/)
In their book 'The UNIX Programming Environment', Kernighan & Pike (yes, those two again) quote Doug McIlroy on the subject of what 'echo' should do with no arguments (circa 1984):
Another question of philosophy is what echo should do if given no arguments - specifically, should it print a blank line or nothing at all. All the current echo implementations we know print a blank line, but past versions didn't, and there were great debates on the subject. Doug McIlroy imparted the right feelings of mysticism in his discussion on the topic:
The UNIX and the Echo
There dwelt in the land of New Jersey the UNIX, a fair maid whom savants travelled far to admire. Dazzled by her purity, all sought to espouse her, one for her virginal grace, another for her polished civility, yet another for her agility in performing exacting tasks seldom accomplished even in much richer lands. So large of heart and accommodating of nature was she that the UNIX adopted all but the most insufferably rich of her suitors. Soon, many offspring grew and prospered and spread to the ends of the earth.
Nature herself smiled and answered to the UNIX more eagerly than to other mortal beings. Humbler folk, who knew little of more courtly manners, delighted in her echo, so precise and crystal clear they scarce believed she could be answered by the same rocks and woods that so garbled their own shouts into the wilderness. And the compliant UNIX obliged with perfect echoes of whatever she was asked.
When one impatient swain asked the UNIX, 'Echo nothing', the UNIX obligingly opened her mouth, echoed nothing, and closed it again.
'Whatever do you mean,' the youth demanded, 'opening your mouth like that? Henceforth never open your mouth when you are supposed to echo nothing!' And the UNIX obliged.
'But I want a perfect performance, even when you echo nothing,' pleaded a sensitive youth, 'and no perfect echoes can come from a closed mouth.' Not wishing to offend either one, the UNIX agreed to say different nothings for the impatient youth and the sensitive youth. She called the sensitive nothing '\n'.
Yet now when she said '\n', she was not really saying nothing so she had to open her mouth twice, once to say '\n' and once to say nothing, and so she did not please the sensitive youth, who said forthwith, 'The \n sounds like a perfect nothing to me, but the second one ruins it. I want you to take back one of them.' So the UNIX, who could not abide offending, agreed to undo some echoes, and called that '\c'. Now the sensitive youth could hear a perfect echo of nothing by asking for '\n' and '\c' together. But they say that he died of a surfeit of notation before he ever heard one.
The Korn shell introduced (or, at least, included) a printf command that was based on the C language printf() function, and that uses a format string to control how the material should appear. It is a better tool for complicated formatting than echo. But, because of the history outlined in the quote, echo doesn't just echo any more; it interprets what it is given to echo.
And interpreting the command line arguments to echo indubitably requires more code than not interpreting them. A basic echo command is:
#include <stdio.h>
int main(int argc, char **argv)
{
const char *pad = "";
while (*++argv)
{
fputs(pad, stdout);
fputs(*argv, stdout);
pad = " ";
}
fputc('\n', stdout);
return 0;
}
There are other ways to achieve that. But the more complex versions of echo have to scrutinize their arguments before printing anything - and that takes more code. And different systems have decided that they want to do different amounts of interpretation of their arguments, leading to different amounts of code.
You will notice there is not really that much bloat growth.
Most of the lines of code are comments.
Most of the lines of code that are not comments, are usage documentation, so when somebody goes 'echo --help' it will do something.
Code outside of the above appears largely to be handling the arguments echo can take, as well as the "special" expansion for symbols such as \n and \t to be their equivalent characters instead of echoing them literally.
Also, most of the time you're not even running the echo command; 'echo' usually invokes a shell built-in. At least on my machine, you have to type /bin/echo --help to get all the advanced functionality/documentation out of it, because echo --help merely echoes --help.
For a good example, run this in your shell.
echo '\e[31mhello\e[0m'
Then run this:
echo -e '\e[31mhello\e[0m'
And you will note vastly different results.
The former will just emit the input as-is, but the latter will print hello coloured red.
Another example, using the 'hex' code:
$ echo '\x64\x65\x66'
\x64\x65\x66
$ echo -e '\x64\x65\x66'
def
def
Vastly different behaviour. The OpenBSD implementation cannot do this =).
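If you want escape handling that does not depend on which echo you happen to get, printf is one option, since it interprets escapes in its format string. The sketch below uses octal escapes, which POSIX specifies for printf; \x escapes are an extension that not every printf supports. The variable name decoded is made up.

```bash
# \144 \145 \146 are the octal codes for d, e and f.
decoded=$(printf '\144\145\146')
printf '%s\n' "$decoded"

# The coloured example, with \033 (octal for ESC) instead of \e.
printf '\033[31mhello\033[0m\n'
```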
I'm not sure if I like the first implementation: it has one option too many!
If a -n option was required, then also adding a -- option to stop option processing would be helpful. That way if you are writing a shell script that reads and prints user input, you don't get inconsistent behavior if the user types -n.
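The inconsistency is easy to reproduce. In this sketch (variable names made up, behaviour as seen with bash's builtin echo in its default configuration), a user-supplied -n is swallowed by echo as an option, while printf with a fixed format prints it as data.

```bash
user_input='-n'

via_echo=$(echo "$user_input")              # -n is taken as an option
via_printf=$(printf '%s\n' "$user_input")   # -n is just data for %s

printf 'echo gave:   [%s]\nprintf gave: [%s]\n' "$via_echo" "$via_printf"
```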