With a bash release that has been patched for shellshock:
$ bash --version
GNU bash, version 3.2.52(1)-release (x86_64-apple-darwin12)
Copyright (C) 2007 Free Software Foundation, Inc.
$ env x='() { :;}; echo vulnerable' bash -c "echo this is a test"
bash: warning: x: ignoring function definition attempt
bash: error importing function definition for `x'
this is a test
Another similar exploit still works and has been assigned CVE-2014-7169:
$ env X='() { (a)=>\' bash -c "echo date"; cat echo
bash: X: line 1: syntax error near unexpected token `='
bash: X: line 1: `'
bash: error importing function definition for `X'
Thu Sep 25 12:47:22 EDT 2014
$ ls echo
echo
Looking for a breakdown of this as well.
The bug
CVE-2014-7169 is a bug in bash's parser. Bash's parser uses a variable, eol_ungetc_lookahead, to push characters back (ungetc) across line boundaries. That variable wasn't properly reset by the reset_parser function, which is called e.g. on some syntax errors. Using that bug, it's possible to inject a character into the start of the next bash input line.
So the test code forces a syntax error, using either (a)= or function a a, adds the redirection character > to be prepended to the next line, and adds a line continuation \, which leads to either version of the test code:
() { (a)=>\
() { function a a>\
When bash is executed, it processes variables from the environment, finds that variable X looks like an exported function, and evaluates it to import the function. But the evaluation fails with a parse error, leaving the > character in the eol_ungetc_lookahead variable. Then, when parsing the command argument echo date, it prepends the > character, leading to >echo date, which runs date redirected to a file named echo.
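The prepending effect is easy to reproduce directly, without the environment trick; this sketch just shows what >echo date does once the > has been injected (run it in a throwaway directory, since it creates a file literally named echo):

```shell
# In a scratch directory, run the command line the bug effectively produces.
cd "$(mktemp -d)"
# The shell accepts a redirection before the command word, so this
# runs `date` with stdout redirected to a file named "echo".
>echo date
cat echo   # the date output is in the file, not on the terminal
```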
Its relation to the previous bug
The above bug is obviously very different from the original shellshock bug. There are actually several problems:
Bash completely evaluates a variable that looks like an exported function (one whose value starts with the four characters () {). CVE-2014-6271.
Under some conditions, it is possible to inject a character into an ungetc variable, that will be prepended to the next input line. CVE-2014-7169.
Bash allows every environment variable to be treated like an exported function, so long as it starts with the four characters () {. CVE-2014-6271, CVE-2014-7169, all the other CVEs where a bug is triggered in bash's parser.
There is a limited stack for here-doc redirection, and there is no check for overflow. CVE-2014-7186, which leads to memory corruption, and can probably be leveraged for arbitrary code execution.
There is a limited stack for nested control structures (select/for/while), with checks for overflow, yet that stack still gets corrupted. CVE-2014-7187.
The fixes
The first patch restricts bash to evaluating a single function definition in each variable that looks like an exported function.
The second patch properly resets eol_ungetc_lookahead on reset_parser.
The third patch changes how functions are exported: now they are exported in variables named BASH_FUNC_functionname%%.
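With the third patch applied you can see the new naming scheme from any child process's environment; this is just a quick sanity check, and the exact formatting of the value may vary slightly between bash versions:

```shell
# Export a function, then look at how it appears in the environment.
# On a patched bash the variable is named BASH_FUNC_f%% rather than f.
bash -c 'f() { echo hi; }; export -f f; env | grep -a "^BASH_FUNC_f"'
```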
Attack surface
The big problem here has been that every environment variable could be used as a vector for attack. Typically, attackers cannot control arbitrary environment variables; otherwise there are already other known attacks (think of LD_PRELOAD, PATH, IFS, ...).
sudo is not affected because it strips exported bash functions from the environment, as mentioned by Gilles on security.SE.
ssh is affected. Typical sshd installations only allow a limited set of environment variables to be exported, as configured via AcceptEnv in sshd_config, e.g. LANG and LC_*. Even with this aggressive whitelisting approach, with shellshock any variable could be an attack vector.
Not only was every environment variable a potential attack vector, each one exposed a >6000-line parser.
Lessons relearned
system, popen, and others are potentially dangerous. Not only should you take care with their arguments: even when the arguments are fixed at compile-time, the environment is a potential attack vector. Use fork()/execve(), preferably with a clean environment (but at least restrict the environment to white-listed variables, preferably with their values sanity-checked). Remember that a good quality system does what it is supposed to do, while a secure system does what it is supposed to do and nothing more. Invoking a full-blown shell makes doing nothing more a little bit harder.
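From a shell, the same whitelisting idea can be sketched with env -i, which starts the child from an empty environment; the PATH and LANG values here are just illustrative choices:

```shell
# Run a program with only explicitly whitelisted environment variables.
# env -i clears the inherited environment before applying the assignments,
# so any attacker-controlled variable (like a shellshock payload) is gone.
env -i PATH=/usr/bin:/bin LANG=C /usr/bin/env
```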
Complexity is the enemy of security. These days you can easily find people recommending simpler shells. Most shells are free from shellshock simply by not supporting exported functions at all. Conversely, bash has gained lots of security features over the years (you need to invoke it with -p to avoid it dropping privileges on startup, it sanitizes IFS, ...), so don't assume I'm advocating switching shells; this is more general advice.
Some excerpts from David Wheeler's ancient "Secure Programming for Linux and UNIX HOWTO" chapter on environment variables are still worth rereading.
§5.2.3 ¶1:
For secure setuid/setgid programs, the short list of environment
variables needed as input (if any) should be carefully extracted. Then
the entire environment should be erased, followed by resetting a small
set of necessary environment variables to safe values. There really
isn't a better way if you make any calls to subordinate programs;
there's no practical method of listing ``all the dangerous values''.
§5.2.3 ¶6:
If you really need user-supplied values, check the values first (to
ensure that the values match a pattern for legal values and that they
are within some reasonable maximum length).
Waving my hands a lot, I suspect the new exploit does the following:
The backslash helps bypass the original patch, so that the string is still evaluated.
The > combines with echo as an output redirection for the bash shell
With echo being consumed by the evaluation that defines the function, the only part of the -c argument left to execute is date, whose output goes to a file named echo instead of standard output.
That's the best I can come up with short of reading the bash source, but I suspect the backslash facilitates some sort of buffer overflow that allows the environment string and the argument to -c to be merged.
Related
This question already has answers here:
Why do shell script comparisons often use x$VAR = xyes?
(7 answers)
Closed 2 years ago.
After reading Debian's /usr/bin/startx, there was something I found peculiar:
mcookie=`/usr/bin/mcookie`
if test x"$mcookie" = x; then
    echo "Couldn't create cookie"
    exit 1
fi
I don't understand why the extra x is necessary - wouldn't it be equivalent to write the following?
mcookie=`/usr/bin/mcookie`
if test "$mcookie" = ""; then
    echo "Couldn't create cookie"
    exit 1
fi
I initially thought that very early versions of sh might have printed some error if the mcookie variable happened to be unset (this wasn't the only instance; these types of comparisons were scattered liberally throughout the script). But on further thought that didn't make much sense, because the variable was being quoted and the shell would expand it to an empty string.
I perused the dash man page and the --posix section of the Bash man page, and checked POSIX itself. It says nothing about this under test: s1 = s2 True if the strings s1 and s2 are identical; otherwise, false. I assume the people who wrote the script knew what they were doing, so can someone shed some light on this?
Thankies,
Edwin
Here's an excerpt from the Wooledge Bash Pitfall #4:
You may have seen code like this:
[ x"$foo" = xbar ] # Ok, but usually unnecessary.
The x"$foo" hack is required for code that must run on very ancient shells which lack [[, and have a more primitive [, which gets confused if $foo begins with a -. On said older systems, [ still doesn't care whether the token on the right hand side of the = begins with a -. It just uses it literally. It's just the left-hand side that needs extra caution.
Note that shells that require this workaround are not POSIX-conforming. Even the Heirloom Bourne shell (probably the non-POSIX Bourne shell clone still most widely in use as a system shell) doesn't require this. Such extreme portability is rarely a requirement and makes your code less readable (and uglier).
If you find it in code from this century, it's just cargo culting that keeps being reinforced by people who don't quote.
/usr/bin/startx in particular has used this idiom since at least XFree86 2.1 in Feb 1993, and any updates have likely just matched style since.
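For what it's worth, the case the hack guarded against is easy to check on a modern system; a POSIX test sees three arguments and treats the middle = as a binary operator, so a leading dash in the value is harmless:

```shell
# A value beginning with '-' that ancient test binaries could misparse:
foo="-f"
[ "$foo" = "-f" ] && echo equal     # safe on any POSIX-conforming test
[ x"$foo" = x"-f" ] && echo equal   # same result, with the legacy padding
```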
Having an Octave script (in the sense of dynamic languages here) move.m defining function move(direction), it can be invoked from another script (alternatively from the command line) in different ways: move left, move('left') or move(left). While the first two will instantiate direction with the string 'left', the last one will consider left as a variable.
The question is about the formal principle in language design behind this. I understand that in the first mode, the script is invoked as a command, treating the rest of the command line as just data, not variables (much as at a Linux prompt); while in the last two it is called as a function, interpreting what follows (between parentheses) as either data or variables. If this is a general design criterion among scripting languages, what is the principle behind it?
To answer your question: yes, this is by design, and it's syntactic sugar offered by MATLAB (and hence Octave) for running certain functions that expect only string arguments. Here is the relevant section in the MATLAB manual: https://uk.mathworks.com/help/matlab/matlab_prog/command-vs-function-syntax.html
I should clarify some misconceptions though. First, it's not "data" vs "variables". Any argument supplied in command syntax is simply interpreted as a string. So these two are equivalent:
fprintf("1")
fprintf 1
I.e., in fprintf 1, the 1 is not numeric data. It's a string.
Secondly, not all m files are "scripts". You calling your m file a script caused me some confusion. Your particular file contains a function definition and nothing else, so it's a function, 100%.
The reason this is important here, is that all functions can be called either via functional syntax or command syntax (as long as it makes sense in terms of the expected arguments being strings), whereas scripts take no arguments, so there is no functional / command syntax at play, and if you were passing 'arguments' to a script you're doing something wrong.
I understand that in the first mode, the script is invoked as a command [...]
As far as Octave goes, you are better off forgetting about that distinction. I'm not sure whether a "command" ever existed as a separate thing, but it certainly does not exist now. The command syntax is just syntactic sugar in Octave. It makes interactive plot adjustment simpler, since plotting functions mainly take string arguments.
I want to call a function with a pipe, and read all of stdin into a variable.
I read that the correct way to do that is with read, or maybe read -r or read -a. However, I had a lot of problems in practice doing that (especially with multi-line strings).
In the end I settled on
function example () {
local input=$(cat)
...
}
What is the idiomatic way to do this?
input=$(cat) is a perfectly fine way to capture standard input if you really need to. One caveat is that command substitutions strip all trailing newlines, so if you want to make sure to capture those as well, you need to ensure that something aside from the newline(s) is read last.
input=$(cat; echo x)
input=${input%x} # Strip the trailing x
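A quick way to convince yourself the trick works; without the x, the command substitution would strip all three trailing newlines and the length would come out as 4:

```shell
# Capture input ending in several newlines, protecting them with an "x".
input=$(printf 'line\n\n\n'; echo x)
input=${input%x}          # strip the sentinel, keeping the newlines
echo "${#input}"          # prints 7: "line" plus three newlines
```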
Another option in bash 4 or later is to use the readarray command, which will populate an array with each line of standard input, one line per element, which you can then join back into a single variable if desired.
readarray foo
printf -v foo "%s" "${foo[@]}"
I've found that using cat is really slow in comparison to the following method, based on tests I've run:
local input="$(< /dev/stdin)"
In case anyone is wondering, < is just input redirection. From the bash-hackers wiki:
When the inner command is only an input redirection, and nothing else,
for example
$( <FILE )
# or
` <FILE `
then Bash attempts to read the given file and act just as if the given
command was cat FILE.
Remarks about portability
In terms of how portable this method is: you are likely to go your entire Linux career without ever using a system that doesn't have /dev/stdin, but in case you want to satisfy that itch, here is a question on Unix Stack Exchange about the portability of directly accessing /dev/{stdin,stdout,stderr} and friends.
One more thing I've come across when working with linux containers such as ones built with docker, or buildah, is that there are situations where /dev/stdin or even /dev/stdout are not available inside the container. I've not been able to conclusively say what causes this.
There are a few overlapping / very similar questions floating around on SO. I answered this here, using the read built-in:
https://stackoverflow.com/a/58452863/3220983
In my answers there, however, I am ONLY concerned with a single line.
The arguable weakness of the cat approach is that it requires spawning a subshell. Otherwise, it's a good one. It's probably the easiest way to deal with multi-line processing, as specifically asked about here.
I think the read approach is faster / more resource efficient if you are trying to chain a lot of commands, or iterate through a list calling a function repeatedly.
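For the line-by-line case, a minimal sketch of that read-based pattern (the function name and output format here are just illustrative):

```shell
# Stream stdin one line at a time instead of buffering it all in a variable.
example() {
    while IFS= read -r line; do      # -r keeps backslashes literal
        printf 'got: %s\n' "$line"
    done
}
printf 'a\nb\n' | example
```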
I need to construct a complex command that includes quoted arguments. As it happens, they are arguments to grep, so I'll use that as my example and deeply simplify the command to just enough to demonstrate the error.
Let's start with a working example:
> COMMAND='/usr/bin/grep _'
> echo $COMMAND
/usr/bin/grep _
> $COMMAND
foo <- I type this, and grep filters it out.
foo_ <- I type this, and.....
foo_ <- ... it matches, so grep emits it.
"foo" is not echoed back because it lacks an underscore, "foo_" has one, so it's returned. Let's get to a demonstration of the problem:
> COMMAND='/usr/bin/grep "_ _"'
> echo -E $COMMAND
/usr/bin/grep "_ _"
> /usr/bin/grep "_ _" <- The exact same command line
foo <- fails to match
foo_ _ <- matches, so it gets echoed back
foo_ _
> $COMMAND <- But that command doesn't work from a variable
grep: _": No such file or directory
In other words, when this command is invoked through a variable name, bash is taking the space between underscores as an argument delimiter - despite the quotes.
Normally, I'd fix this with backslashes:
> COMMAND='/usr/bin/grep "_\ _"'
> $COMMAND
grep: trailing backslash (\)
Okay, maybe I need another layer of escaping the backslash:
> COMMAND='/usr/bin/grep "_\\ _"'
> $COMMAND
grep: _": No such file or directory
And now we're back to square one - the command line is still being broken up at the space. I can, of course, verify all of this with some debugging, which establishes that the backslashes are surviving, unescaped, and grep is being called with multiple arguments:
> set -x
> $COMMAND
+ /usr/bin/grep '"_\\' '_"' <- grep is being called with two args
I have a solution to the problem that takes advantage of arrays, but packing commands this way (in my full implementation, which I'll spare you) is unfamiliar to most people who'd read my code. To oversimplify the creation of an array-based command:
> declare -a COMMAND=('/usr/bin/grep' '-i' 'a b')
> ${COMMAND[*]}
foo <- Same old, same old
fooa B <- ...
fooa B <- Matches because of case-insensitive (-i) grep.
Finally we get to the question. Why does bash break up quoted arguments in strings when interpreting them as commands and why doesn't there seem to be a string-y way to get it to work? If I have a command packed in a string variable, it violates the Principle of Least Surprise to have that string interpreted differently than the string itself would be. If someone can point me at some docs that cover all of this, and will set me at peace with why I have to resort to the infinitely uglier mechanism of building up arrays with all of my commands, I'd very much appreciate it.
Disclaimer: After writing the following, I almost decided that the question should be closed for encouraging opinion-based responses. This is an opinion-based response. Proceed at your own risk.
Why does bash break up quoted arguments in strings when interpreting them as commands
Because that's what it does. A more interesting question might be "Why does bash break up strings at all?", to which the only possible answer would be "it seemed like a good idea at the time".
Or, to put it another way: In the beginning, nobody thought of putting spaces into filenames. When you only had a few letters for a filename, you didn't waste any of them on spaces. So it seemed reasonable to represent a list of words as just a space-separated list of words, and that was the basis on which shell languages were developed. So the default behaviour of bash, like that of all unix-y shells, is to consider a string with whitespace in it to be a whitespace-separated list of words.
But, of course, that leads to all sorts of headaches, because strings are not structured data. Sometimes a filename does have whitespace in its name. And not all utility arguments are filenames, either. Sometimes you want to give an argument to a utility which is, for example, a sentence. Without that complication, shells were able to avoid making you type quotes, unlike "real" programming languages where strings need to be quoted. But once you decide that sometimes a space in a string is just another character, you need to have some kind of quoting system. So then the syntax of shells added several quoting forms, each with slightly different semantics. The most common is double-quoting, which marks the contents as a single word but still allows variable expansion.
It remains the case that shell quotes, like quotes in any other language, are simply syntactic constructs. They are not part of the string, and the fact that a particular character in a string was marked with a quote (or, equivalently, a backslash) is not retained as part of the string -- again, just like any other programming language. Strings are not really lists of words; they are just treated that way by default.
All of that is not very satisfactory. The nature of shell programming is that you really want a data structure which is a list of "words" -- or, better, a list of strings. And, eventually, shells got around to doing that. Unfortunately, by then there wasn't much syntactic space left in shell languages; it was considered important that the new features not change the behaviour of existing shell scripts. As far as I know, the current shell syntax for arrays was created by David Korn in 1988 (or earlier); eventually, bash also implemented arrays with basically the same syntax.
One of the curiosities in the syntax is that there are three ways of specifying that an entire array should be substituted:
${array[*]} or ${array[@]}: concatenate all the array elements together separated with the first character in $IFS, and then consider the result to be a whitespace-separated list of words.
"${array[*]}": concatenate all the array elements together separated with the first character in $IFS, and then consider the result to be a single word.
"${array[@]}": each array element is inserted as a separate word.
Of these, the first one is essentially useless; the second one is occasionally useful, and the third -- and most difficult to type -- is the one you almost always want.
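A small bash demonstration of the three forms, assuming the default IFS; each word prints between angle brackets so the boundaries are visible:

```shell
arr=('a b' 'c')
printf '<%s>' ${arr[@]};   echo    # <a><b><c>  - re-split into three words
printf '<%s>' "${arr[*]}"; echo    # <a b c>    - joined into a single word
printf '<%s>' "${arr[@]}"; echo    # <a b><c>   - one word per element
```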
In the above brief discussion, I left out any consideration of glob characters and filename expansion, and a number of other shell idiosyncrasies. So don't take it as a complete tutorial, by any means.
why doesn't there seem to be a string-y way to get it to work?
You can always use eval. Unfortunately. If you really really want to get bash to interpret a string as though it were a bash program rather than a string, and if you are prepared to open your script up to all manner of injection attacks, then the shell will happily give you enough rope. Personally, I would never allow a script which used eval to pass code review so I'm not going to expand on its use here. But it's documented.
If I have a command packed in a string variable, it violates the Principle of Least Surprise to have that string interpreted differently than the string itself would be.
Surprise is really in the eye of the beholder. There are probably lots of programmers who think that a newline character really occupies two bytes, and are Surprised when it turns out that in C, '\n'[0] is not a backslash. But I think most of us would be Surprised if it were. (I've tried to answer SO questions based on this misunderstanding, and it is not easy.)
Bash strings, regardless of anything else, are strings. They are not bash programs. Having them suddenly interpreted as bash programs would, in my opinion, not only be surprising but dangerous. At least if you use eval, there is a big red flag for the code reviewer.
This question already has answers here:
Why do shell script comparisons often use x$VAR = xyes?
(7 answers)
Closed 9 years ago.
I know I can test for an empty string in Bash with -z like so:
if [[ -z $myvar ]]; then do_stuff; fi
but I see a lot of code written like:
if [[ X"" = X"$myvar" ]]; then do_stuff; fi
Is that method more portable? Is it just historical cruft from before the days of -z? Is it for POSIX shells (even though I've seen it used in scripts targeting bash)? Ready for my history/portability lesson.
The same question was asked on Server Fault as How to determine if a bash variable is empty? but no one offered an explanation as to why you see code with the X"" stuff.
Fundamentally, because in times now long past, the behaviour of test was more complex and not uniformly defined across different systems (so portable code had to be written carefully to avoid non-portable constructs).
In particular, before test was a shell built-in, it was a separate executable (and note that MacOS X still has /bin/test and /bin/[ as executables). When that was the case, writing:
if [ -z $variable ]
when $variable was empty would invoke the test program via its alias [ with 3 arguments:
argv[0] = "["
argv[1] = "-z"
argv[2] = "]"
because the variable was empty so there was nothing to expand. So, the safe way of writing the code was:
if [ -z "$variable" ]
This works reliably, passing 4 arguments to the test executable. Granted, test has been built into most shells for decades, but old equipment dies hard, and so do good practices learned even longer ago.
The other problem resolved by the X prefix was what happens if variables include leading dashes, or contain equals signs or other comparators. Consider (a not desperately good example):
x="-z"
if [ $x -eq 0 ]
Is that an empty string test with a stray (erroneous) argument, or a numeric equality test with a non-numeric first argument? Different systems provided different answers before POSIX standardized the behaviour, circa 1990. So, the safe way of dealing with this was:
if [ "X$x" = "X0" ]
or (less usually, in my experience, but completely equivalently):
if [ X"$x" = X"0" ]
It was all the edge cases like this, tied up with the possibility that test was a separate executable, that mean that portable shell code still uses double quotes more copiously than modern shells actually require, and that the X-prefix notation was used to ensure that things could not get misinterpreted.
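On any POSIX shell the quoted forms all agree, which is why the prefix has become pure style; a minimal check:

```shell
x=""
[ -z "$x" ]     && echo empty   # the modern, direct spelling
[ "$x" = "" ]   && echo empty   # fine once the expansion is quoted
[ "X$x" = "X" ] && echo empty   # the legacy belt-and-braces spelling
```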
Ah, this is one of my favourite questions and answers, because I came up with the answer just by thinking about it. The X is there just in case the string starts with a -, which could be taken as a flag by test. Prepending an X removes that case, and the comparison still holds.
I also like this because this kind of trick is almost an inheritance from the oldest times of computing, and you encounter it when you try to read some of the most portable shell scripts out there (autoconf, configure, etc.).