shell quoting behavior - linux

I just learned that quoting makes a huge difference in some cases, so I ran a small test. Here is what I did:
$ xfs=$(find . -type f -perm -111) #find all files with x-perm
$ echo "$xfs"
./b.out
./a.out
$ echo $xfs
./b.out ./a.out #why all in one line, but the above takes two?
AFAIK, echo -e will expand a \n escape, but $xfs is not being echoed with -e here, so how can echo "$xfs" take two lines?

Any whitespace is normally considered by the shell to be an argument separator. Thus, your second example, echo $xfs, expands to two arguments. echo prints its arguments separated by single spaces, and that's the behaviour you see there.
However, when you use quotes, anything between them is one argument, and it is printed literally. The one argument in your first example already contains a newline, so it is printed with a newline.
The -e option of the bash echo builtin controls the expansion of escape sequences like \n; however, you don't have any escape sequences here. The variable contains a literal newline character.
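You can reproduce the difference with a throwaway variable (a minimal sketch; the value is arbitrary):
$ var=$'one\ntwo'      # the variable holds a literal newline
$ echo "$var"          # one argument, the newline is preserved
one
two
$ echo $var            # split into two arguments, joined by a single space
one two
$ echo -e 'one\ntwo'   # -e is only needed to expand the two-character sequence \n
one
two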

Related

How to take regex as parameter in shell script? [duplicate]

Here is a series of cases where echo $var can show a different value from what was just assigned. This happens regardless of whether the assigned value was "double quoted", 'single quoted' or unquoted.
How do I get the shell to set my variable correctly?
Asterisks
The expected output is /* Foobar is free software */, but instead I get a list of filenames:
$ var="/* Foobar is free software */"
$ echo $var
/bin /boot /dev /etc /home /initrd.img /lib /lib64 /media /mnt /opt /proc ...
Square brackets
The expected value is [a-z], but sometimes I get a single letter instead!
$ var=[a-z]
$ echo $var
c
Line feeds (newlines)
The expected value is a list of separate lines, but instead all the values are on one line!
$ cat file
foo
bar
baz
$ var=$(cat file)
$ echo $var
foo bar baz
Multiple spaces
I expected a carefully aligned table header, but instead multiple spaces either disappear or are collapsed into one!
$ var=" title | count"
$ echo $var
title | count
Tabs
I expected two tab separated values, but instead I get two space separated values!
$ var=$'key\tvalue'
$ echo $var
key value
In all of the cases above, the variable is correctly set, but not correctly read! The right way is to use double quotes when referencing:
echo "$var"
This gives the expected value in all the examples given. Always quote variable references!
Why?
When a variable is unquoted, it will:
Undergo field splitting where the value is split into multiple words on whitespace (by default):
Before: /* Foobar is free software */
After: /*, Foobar, is, free, software, */
Each of these words will undergo pathname expansion, where patterns are expanded into matching files:
Before: /*
After: /bin, /boot, /dev, /etc, /home, ...
Finally, all the arguments are passed to echo, which writes them out separated by single spaces, giving
/bin /boot /dev /etc /home Foobar is free software Desktop/ Downloads/
instead of the variable's value.
When the variable is quoted it will:
Be substituted for its value.
There is no step 2.
This is why you should always quote all variable references, unless you specifically require word splitting and pathname expansion. Tools like shellcheck are there to help, and will warn about missing quotes in all the cases above.
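As a minimal sketch of what that looks like in practice (SC2086 is the warning shellcheck typically emits for an unquoted expansion; the exact wording may vary between versions):
$ cat demo.sh
#!/bin/bash
var=$(cat file)
echo $var      # unquoted: shellcheck flags this line
echo "$var"    # quoted: no warning, the value is printed intact
$ shellcheck demo.sh   # reports SC2086 ("Double quote to prevent globbing and word splitting") for the unquoted line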
You may want to know why this is happening. To complement the great explanation by that other guy, here is a reference from Why does my shell script choke on whitespace or other special characters?, written by Gilles on Unix & Linux:
Why do I need to write "$foo"? What happens without the quotes?
$foo does not mean “take the value of the variable foo”. It means
something much more complex:
First, take the value of the variable.
Field splitting: treat that value as a whitespace-separated list of fields, and build the resulting list. For example, if the variable
contains foo * bar then the result of this step is the 3-element
list foo, *, bar.
Filename generation: treat each field as a glob, i.e. as a wildcard pattern, and replace it by the list of file names that match this
pattern. If the pattern doesn't match any files, it is left
unmodified. In our example, this results in the list containing foo,
following by the list of files in the current directory, and finally
bar. If the current directory is empty, the result is foo, *,
bar.
Note that the result is a list of strings. There are two contexts in
shell syntax: list context and string context. Field splitting and
filename generation only happen in list context, but that's most of
the time. Double quotes delimit a string context: the whole
double-quoted string is a single string, not to be split. (Exception:
"$@" to expand to the list of positional parameters, e.g. "$@" is
equivalent to "$1" "$2" "$3" if there are three positional
parameters. See What is the difference between $* and $@?)
The same happens to command substitution with $(foo) or with
`foo`. On a side note, don't use `foo`: its quoting rules are
weird and non-portable, and all modern shells support $(foo) which
is absolutely equivalent except for having intuitive quoting rules.
The output of arithmetic substitution also undergoes the same
expansions, but that isn't normally a concern as it only contains
non-expandable characters (assuming IFS doesn't contain digits or
-).
See When is double-quoting necessary? for more details about the
cases when you can leave out the quotes.
Unless you mean for all this rigmarole to happen, just remember to
always use double quotes around variable and command substitutions. Do
take care: leaving out the quotes can lead not just to errors but to
security
holes.
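Gilles' foo * bar example from above is easy to reproduce; a small sketch (directory and file names are made up):
$ mkdir demo && cd demo && touch a.txt b.txt
$ var='foo * bar'
$ printf '<%s> ' $var; echo       # unquoted: field splitting, then * is globbed
<foo> <a.txt> <b.txt> <bar>
$ printf '<%s> ' "$var"; echo     # quoted: a single string, left untouched
<foo * bar>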
In addition to other issues caused by failing to quote, -n and -e can be consumed by echo as arguments. (Only the former is legal per the POSIX spec for echo, but several common implementations violate the spec and consume -e as well).
To avoid this, use printf instead of echo when details matter.
Thus:
$ vars="-e -n -a"
$ echo $vars # breaks because -e and -n can be treated as arguments to echo
-a
$ echo "$vars"
-e -n -a
However, correct quoting won't always save you when using echo:
$ vars="-n"
$ echo "$vars"
$ ## not even an empty line was printed
...whereas it will save you with printf:
$ vars="-n"
$ printf '%s\n' "$vars"
-n
Use double quotes to get the exact value, like this:
echo "${var}"
and it will read your value correctly.
The output of echo $var depends heavily on the value of the IFS variable. By default IFS contains the space, tab, and newline characters:
[ks#localhost ~]$ echo -n "$IFS" | cat -vte
^I$
This means that when the shell performs field splitting (or word splitting), it uses all of these characters as word separators. That is what happens when a variable is referenced without double quotes (echo $var), and thus the expected output is altered.
One way to prevent word splitting (besides using double quotes) is to set IFS to null. See http://pubs.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html#tag_02_06_05 :
If the value of IFS is null, no field splitting shall be performed.
Setting IFS to null means setting it to an empty value:
IFS=
Test:
[ks#localhost ~]$ echo -n "$IFS" | cat -vte
^I$
[ks#localhost ~]$ var=$'key\nvalue'
[ks#localhost ~]$ echo $var
key value
[ks#localhost ~]$ IFS=
[ks#localhost ~]$ echo $var
key
value
[ks#localhost ~]$
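Keep in mind that IFS stays changed for the rest of the session, which also affects read and "$*" joining, so if you go this route it is safer to confine the change, for example to a subshell (a sketch, not from the original answer):
[ks#localhost ~]$ var=$'key\nvalue'
[ks#localhost ~]$ ( IFS=; echo $var )   # IFS is cleared only inside the subshell
key
value
[ks#localhost ~]$ echo $var             # the parent shell still splits on whitespace
key value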
The answer from ks1322 helped me identify the issue while using docker-compose exec:
If you omit the -T flag, docker-compose exec adds a special character that breaks the output; we see b instead of 1b:
$ test=$(/usr/local/bin/docker-compose exec db bash -c "echo 1")
$ echo "${test}b"
b
$ echo "${test}" | cat -vte
1^M$
With -T flag, docker-compose exec works as expected:
$ test=$(/usr/local/bin/docker-compose exec -T db bash -c "echo 1")
$ echo "${test}b"
1b
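The special character is a carriage return (the ^M shown by cat -vte above), added because docker-compose exec allocates a pseudo-TTY by default. If for some reason you cannot pass -T, stripping the carriage return afterwards should also work (a sketch based on the example above):
$ test=$(/usr/local/bin/docker-compose exec db bash -c "echo 1" | tr -d '\r')
$ echo "${test}b"
1b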
In addition to putting the variable in quotes, one could also translate the output of the variable using tr, converting spaces to newlines.
$ echo $var | tr " " "\n"
foo
bar
baz
Although this is a little more convoluted, it does add flexibility, as you can substitute any character as the separator between the values.

Basename: extra operand error when on linux command [duplicate]

Should or should I not wrap quotes around variables in a shell script?
For example, is the following correct:
xdg-open $URL
[ $? -eq 2 ]
or
xdg-open "$URL"
[ "$?" -eq "2" ]
And if so, why?
General rule: quote it if it can either be empty or contain spaces (or any whitespace really) or special characters (wildcards). Not quoting strings with spaces often leads to the shell breaking apart a single argument into many.
$? doesn't need quotes since it's a numeric value. Whether $URL needs it depends on what you allow in there and whether you still want an argument if it's empty.
I tend to always quote strings just out of habit since it's safer that way.
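Two quick sketches of what goes wrong without quotes (variable and file names are made up; error wording depends on your shell and coreutils versions):
$ URL=''
$ [ $URL = "http://example.com" ]    # expands to: [ = http://example.com ]
bash: [: =: unary operator expected
$ [ "$URL" = "http://example.com" ]  # quoted: a valid (false) comparison
$ printf 'hello\n' > 'my notes.txt'
$ file='my notes.txt'
$ cat $file                          # becomes: cat my notes.txt
cat: my: No such file or directory
cat: notes.txt: No such file or directory
$ cat "$file"                        # one argument, as intended
hello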
In short, quote everything where you do not require the shell to perform word splitting and wildcard expansion.
Single quotes protect the text between them verbatim. It is the proper tool when you need to ensure that the shell does not touch the string at all. Typically, it is the quoting mechanism of choice when you do not require variable interpolation.
$ echo 'Nothing \t in here $will change'
Nothing \t in here $will change
$ grep -F '#&$*!!' file /dev/null
file:I can't get this #&$*!! quoting right.
Double quotes are suitable when variable interpolation is required. With suitable adaptations, it is also a good workaround when you need single quotes in the string. (There is no straightforward way to escape a single quote between single quotes, because there is no escape mechanism inside single quotes -- if there was, they would not quote completely verbatim.)
$ echo "There is no place like '$HOME'"
There is no place like '/home/me'
No quotes are suitable when you specifically require the shell to perform word splitting and/or wildcard expansion.
Word splitting (aka token splitting):
$ words="foo bar baz"
$ for word in $words; do
> echo "$word"
> done
foo
bar
baz
By contrast:
$ for word in "$words"; do echo "$word"; done
foo bar baz
(The loop only runs once, over the single, quoted string.)
$ for word in '$words'; do echo "$word"; done
$words
(The loop only runs once, over the literal single-quoted string.)
Wildcard expansion:
$ pattern='file*.txt'
$ ls $pattern
file1.txt file_other.txt
By contrast:
$ ls "$pattern"
ls: cannot access file*.txt: No such file or directory
(There is no file named literally file*.txt.)
$ ls '$pattern'
ls: cannot access $pattern: No such file or directory
(There is no file named $pattern, either!)
In more concrete terms, anything containing a filename should usually be quoted (because filenames can contain whitespace and other shell metacharacters). Anything containing a URL should usually be quoted (because many URLs contain shell metacharacters like ? and &). Anything containing a regex should usually be quoted (ditto ditto). Anything containing significant whitespace other than single spaces between non-whitespace characters needs to be quoted (because otherwise, the shell will munge the whitespace into, effectively, single spaces, and trim any leading or trailing whitespace).
When you know that a variable can only contain a value which contains no shell metacharacters, quoting is optional. Thus, an unquoted $? is basically fine, because this variable can only ever contain a single number. However, "$?" is also correct, and recommended for general consistency and correctness (though this is my personal recommendation, not a widely recognized policy).
Values which are not variables basically follow the same rules, though you could then also escape any metacharacters instead of quoting them. For a common example, a URL with a & in it will be parsed by the shell as a background command unless the metacharacter is escaped or quoted:
$ wget http://example.com/q&uack
[1] wget http://example.com/q
-bash: uack: command not found
(Of course, this also happens if the URL is in an unquoted variable.) For a static string, single quotes make the most sense, although any form of quoting or escaping works here.
wget 'http://example.com/q&uack' # Single quotes preferred for a static string
wget "http://example.com/q&uack" # Double quotes work here, too (no $ or ` in the value)
wget http://example.com/q\&uack # Backslash escape
wget http://example.com/q'&'uack # Only the metacharacter really needs quoting
The last example also suggests another useful concept, which I like to call "seesaw quoting". If you need to mix single and double quotes, you can use them adjacent to each other. For example, the following quoted strings
'$HOME '
"isn't"
' where `<3'
"' is."
can be pasted together back to back, forming a single long string after tokenization and quote removal.
$ echo '$HOME '"isn't"' where `<3'"' is."
$HOME isn't where `<3' is.
This isn't awfully legible, but it's a common technique and thus good to know.
As an aside, scripts should usually not use ls for anything. To expand a wildcard, just ... use it.
$ printf '%s\n' $pattern # not ``ls -1 $pattern''
file1.txt
file_other.txt
$ for file in $pattern; do # definitely, definitely not ``for file in $(ls $pattern)''
> printf 'Found file: %s\n' "$file"
> done
Found file: file1.txt
Found file: file_other.txt
(The loop is completely superfluous in the latter example; printf specifically works fine with multiple arguments. stat too. But looping over a wildcard match is a common problem, and frequently done incorrectly.)
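One more wrinkle when looping over a wildcard: if the pattern matches nothing, the unexpanded pattern itself is passed to the loop body (as Gilles noted further above). A sketch of two common guards, assuming bash (nullglob is a bash option):
$ pattern='nothing-here-*.txt'       # suppose this matches no files
$ shopt -s nullglob                  # option 1: a non-matching glob expands to nothing
$ for file in $pattern; do printf 'Found file: %s\n' "$file"; done
$ shopt -u nullglob
$ for file in $pattern; do           # option 2: skip the literal, unexpanded pattern
>   [ -e "$file" ] || continue
>   printf 'Found file: %s\n' "$file"
> done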
A variable containing a list of tokens to loop over or a wildcard to expand is less frequently seen, so we sometimes abbreviate to "quote everything unless you know precisely what you are doing".
Here is a three-point formula for quotes in general:
Double quotes
In contexts where we want to suppress word splitting and globbing. Also in contexts where we want the literal to be treated as a string, not a regex.
Single quotes
In string literals where we want to suppress interpolation and special treatment of backslashes. In other words, situations where using double quotes would be inappropriate.
No quotes
In contexts where we are absolutely sure that there are no word splitting or globbing issues or we do want word splitting and globbing.
Examples
Double quotes
literal strings with whitespace ("StackOverflow rocks!", "Steve's Apple")
variable expansions ("$var", "${arr[@]}")
command substitutions ("$(ls)", "`ls`")
globs where directory path or file name part includes spaces ("/my dir/"*)
to protect single quotes ("single'quote'delimited'string")
Bash parameter expansion ("${filename##*/}")
Single quotes
command names and arguments that have whitespace in them
literal strings that need interpolation to be suppressed ( 'Really costs $$!', 'just a backslash followed by a t: \t')
to protect double quotes ('The "crux"')
regex literals that need interpolation to be suppressed
use shell quoting for literals involving special characters ($'\n\t')
use shell quoting where we need to protect several single and double quotes ($'{"table": "users", "where": "first_name"=\'Steve\'}')
No quotes
around standard numeric variables ($$, $?, $# etc.)
in arithmetic contexts like ((count++)), "${arr[idx]}", "${string:start:length}"
inside [[ ]] expression which is free from word splitting and globbing issues (this is a matter of style and opinions can vary widely)
where we want word splitting (for word in $words)
where we want globbing (for txtfile in *.txt; do ...)
where we want ~ to be interpreted as $HOME (~/"some dir" but not "~/some dir")
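A quick sketch of that last point (the directory name is made up; the error wording follows the style of the earlier ls examples):
$ mkdir ~/"some dir"
$ ls -d ~/"some dir"        # ~ is expanded, the space stays quoted
/home/me/some dir
$ ls -d "~/some dir"        # ~ inside quotes is literal
ls: cannot access ~/some dir: No such file or directory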
See also:
Difference between single and double quotes in Bash
What are the special dollar sign shell variables?
Quotes and escaping - Bash Hackers' Wiki
When is double quoting necessary?
I generally use quoting like "$var" to be safe, unless I am certain that $var does not contain spaces.
I do use $var as a simple way to join lines:
lines="`cat multi-lines-text-file.txt`"
echo "$lines" ## multiple lines
echo $lines ## all spaces (including newlines) are zapped
Whenever the https://www.shellcheck.net/ plugin for your editor tells you to.

bash echo environment variable containing escaped characters

I have a script that echoes the given input into a file as follows:
echo $@ > file.txt
When I pass a string like "\"" I want it to print exactly "\"" to the file; however, it prints ".
My question is: how can I print all the characters of a variable containing a string, without escapes being interpreted?
When I use echo in bash like echo "\"" it only prints ", while echo '"\""' prints it correctly. I thought the solution might be to use single quotes around the variable, but then I cannot get the value of the variable inside single quotes.
First, note that
echo $@ > file.txt
can fail in several ways. Shellcheck identifies one problem (missing quotes on $@). See the accepted, and excellent, answer to Why is printf better than echo? for others.
Second, as others have pointed out, there is no practical way for a Bash program to know exactly how parameters were specified on the command line. For instance, for all of these invocations
prog \"
prog "\""
prog '"'
the code in prog will see a $1 value that consists of one double-quote character. Any quoting characters that are used in the invocation of prog are removed by the quote removal part of the shell expansions done by the parent shell process.
Normally that doesn't matter. If variables or parameters contain values that would need to be quoted when entered as literals (e.g. "\"") they can be used safely, including passing them as parameters to other programs, by quoting uses of the variable or parameter (e.g. "$1", "$@", "$x").
There is a problem with variables or parameters that require quoting when entered literally if you need to write them in a way that they can be reused as shell input (e.g. by using eval or source/.). Bash supports the %q format specification to the printf builtin to handle this situation. It's not clear what the OP is trying to do, but one possible solution to the question is:
if (( $# > 0 )) ; then
printf -v quoted_params '%q ' "$@" # Add all parameters to 'quoted_params'
printf '%s\n' "${quoted_params% }" # Remove trailing space when printing
fi >file.txt
That creates an empty 'file.txt' when no positional parameters are provided. The code would need to be changed if that is not what is required.
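To get a feel for what %q produces, independent of the script above (a minimal sketch; the arguments are arbitrary):
$ printf '%q\n' '"' 'two words' '$HOME'
\"
two\ words
\$HOME
Each escaped form can be pasted back into the shell (or fed to eval) and yields the original value again.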
If you run echo \", the function of the backslash in bash is to escape the character after it. This actually enables you to use the double quote as an argument. You cannot use a backslash by itself; if you want a backslash as an argument you need to escape it with another backslash: echo \\
Now if you want to create a string where these things are not escaped, use single quotes: echo '\'
See for a better explanation this post: Difference between single and double quotes in Bash

Bash PS1: line wrap issue with non-printing characters from an external command

I am using an external command to populate my bash prompt, which is run each time PS1 is evaluated. However, I have a problem when this command outputs non-printable characters (like color escape codes).
Here is an example:
$ cat green_cheese.sh
#!/bin/bash
echo -e "\033[32mcheese\033[0m"
$ export PS1="\$(./green_cheese.sh) \$"
cheese $ # <- cheese is green!
cheese $ <now type really long command>
The canonical way of dealing with non-printing characters in the PS1 prompt is to enclose them in \[ and \] escape sequences. The problem is that if you do this from the external command those escapes are not parsed by the PS1 interpreter:
$ cat green_cheese.sh
#!/bin/bash
echo -e "\[\033[32m\]cheese\[\033[0m\]"
$ export PS1="\$(./green_cheese.sh) \$"
\[\]cheese\[\] $ # <- FAIL!
Is there a particular escape sequence I can use from the external command to achieve the desired result? Or is there a way I can manually tell the prompt how many characters to set the prompt width to?
Assume that I can print anything I like from the external command, and that this command can be quite intelligent (for example, counting characters in the output). I can also make the export PS1=... command as complicated as required. However, the escape codes for the colors must come from the external command.
Thanks in advance!
I couldn't tell you exactly why this works, but replace \[ and \] with the actual characters that bash generates from them in your prompt:
echo -e "\001\033[32m\002cheese\001\033[0m\002"
[I learned this from some Stack Overflow post that I cannot find now.]
If I had to guess, it's that bash replaces \[ and \] with the two ASCII characters before executing the command that's embedded in the prompt, so that by the time green_cheese.sh completes, it's too late for bash to process the wrappers correctly, and so they are treated literally. One way to avoid this is to use PROMPT_COMMAND to build your prompt dynamically, rather than embedding executable code in the value of PS1.
prompt_cmd () {
PS1="$(green_cheese.sh)"
PS1+=' \$ '
}
PROMPT_COMMAND=prompt_cmd
This way, the \[ and \] are added to PS1 when it is defined, not when it is evaluated, so you don't need to use \001 and \002 directly.
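Put together, a sketch of one way to wire this up (it assumes green_cheese.sh prints the \[ ... \] version of the string from the question, without -e, so the markers and the \033 escapes are only interpreted when bash expands PS1):
$ cat green_cheese.sh
#!/bin/bash
echo '\[\033[32m\]cheese\[\033[0m\]'
$ prompt_cmd () { PS1="$(./green_cheese.sh) \$ "; }
$ PROMPT_COMMAND=prompt_cmd
cheese $ # <- cheese is green, and long lines wrap correctly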
If you can't edit the code generating the string containing ANSI color / control codes, you can wrap them after the fact.
The following will enclose ANSI control sequences in ASCII SOH (^A) and STX (^B) which are equivalent to \[ and \] respectively:
function readline_ANSI_escape() {
if [[ $# -ge 1 ]]; then
echo "$*"
else
cat # Read string from STDIN
fi | \
perl -pe 's/(?:(?<!\x1)|(?<!\\\[))(\x1b\[[0-9;]*[mG])(?!\x2|\\\])/\x1\1\x2/g'
}
Use it like:
$ echo $'\e[0;1;31mRED' | readline_ANSI_escape
Or:
$ readline_ANSI_escape "$string"
As a bonus, running the function multiple times will not re-escape already escaped control codes.
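To tie this back to PS1, the function could be applied to the command's output inside the prompt itself (a sketch; it relies on the \001/\002 equivalence described above, and assumes the function is defined in your interactive shell):
$ export PS1="\$(./green_cheese.sh | readline_ANSI_escape) \$ "
This wraps whatever color codes green_cheese.sh emits at evaluation time, so the original script (the version that prints raw escape codes) does not need to be changed.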
I suspect that if you echo the value of $PS1 after your first example, you’ll find that its value is the word “cheese” in green. (At least, that’s what I see when I run your example.) At first glance, this is what you want — the word “cheese” in green! Except that what you really wanted was the word cheese preceded by the escape codes that produce green. What you did by using the -e flag for echo is produce a value with the escape codes already evaluated.
That happens to work for the specification of colors, but as you’ve found, it mangles the “non-printing sequence” markers into something the $PS1 interpreter doesn’t properly understand.
Fortunately, the solution is simple: drop the -e flag. echo will then leave the escape sequences untouched, and the $PS1 interpreter will Do The Right Thing™.

How does one ‘contract’ strings to escape special characters in Bash?

There are many ways to expand an escaped string, but how can a shell command be made to take a string as an argument and escape it?
Here are some examples of different ways of expansion:
$ echo -e '\x27\\012\b34\n56\\\aa7\t8\r 9\0\0134\047'
'\0134
9\'7 8
$ echo $'\x27\\012\b34\n56\\\aa7\t8\r 9\0\0134\047'
'\0134
9\a7 8
$ PS1='(5)$ ' # At least tab-width - 3 long; 5 columns given typical tab-width.
(5)$ printf %b '\x27\\012\b34\n56\\\aa7\t8\r 9\0\0134\047'
'\0134
9\'(5)$
Note: there's actually a tab character between the 7 and 8 above, but the markup rendering seems to break it.
Yes, all sorts of craziness in there. ;-)
Anyway, I'm looking for the reverse of such escape expansion commands. If the command was called escape, it would satisfy these properties:
$ echo -ne "$(escape "$originalString")"
Should output the verbatim value of originalString as would ‘echo -n "$originalString"’. I.e. it should be an identity.
Likewise:
$ escape "$(echo -ne "$escapedString")"
Should output the string escaped again, though not necessarily in the same way as before. E.g. \0134 may become \\ or vice versa.
Don't use echo -e -- it's very poorly specified in POSIX, and considered deprecated for all but the simplest uses. Bash has extensions to its printf that provide a better-supported approach:
printf -v escaped_string %q "$raw_string"
...gives you a shell-escaped string from a raw one (storing it in a variable named escaped_string), and
printf -v raw_string %b "$escaped_string"
...gives you a raw string from a backslash-escaped one, storing it in raw_string.
Note that the two escape syntaxes are not equivalent -- strings escaped with printf %q are ready for eval, rather than for printf %b.
That is, you can safely run:
eval "myvar=$escaped_string"
...when escaped_string has been created with printf %q as above.
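A quick round trip, as a sketch (the sample string is arbitrary):
$ raw_string=$'two words\tand a newline\n'
$ printf -v escaped_string %q "$raw_string"
$ eval "copy=$escaped_string"                 # safe, because escaped_string came from %q
$ [[ $copy == "$raw_string" ]] && echo 'round trip OK'
round trip OK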
That said: What's the use case? It's strongly preferred to handle raw strings as raw strings (using NUL terminators when delimiting is necessary), rather than converting them to and from an escaped form.
