Is space considered a metacharacter in Bash? - linux

I have searched for the list of metacharacters in Bash but space is not listed.
I wonder if I'm right by assuming that space is the "token separation character" in Bash, since it not only works as such with Shell programs or builtins but also when creating an array through compound assignment - quotes escape spaces, just like they do most other metacharacters.
They cannot be escaped by backslashes, though.
Parameters are passed to programs and functions separated by spaces, for example.
Can someone explain how (and when) bash interprets spaces? Thanks!
I've written an example:
$ a=(zero one two)
$ echo ${a[0]}
zero
$ a=("zero one two")
$ echo ${a[0]}
zero one two

From the man page:
metacharacter
A character that, when unquoted, separates words. One of the following:
| & ; ( ) < > space tab
^^^^^

According to the POSIX shell specification for Token Recognition, any shell that claims to be POSIX-compliant should interpret whitespace as separating tokens:
If the current character is an unquoted <newline>, the current token shall be delimited.
If the current character is an unquoted <blank>, any token containing the previous character is delimited and the current character shall be discarded.
Here <blank> refers to the character class blank as defined by LC_CTYPE at the time the shell starts. In almost all cases, that character class consists precisely of the space and tab characters.
It's important to distinguish between the shell mechanism for recognizing tokens, and the use of $IFS to perform word-splitting. Word splitting is performed (in most contexts) after brace, tilde, parameter and variable, arithmetic and command expansions. Consider, for example:
$ # Setting IFS does not affect token recognition
$ bash -c 'IFS=:; arr=(foo:bar); echo "${arr[0]}"'
foo:bar
$ # But it does affect word splitting after variable expansion
$ bash -c 'IFS=: foobar=foo:bar; arr=($foobar); echo "${arr[0]}"'
foo

Yes it is. From the Bash Reference Manual's Definitions section:
blank
A space or tab character.
…
metacharacter
A character that, when unquoted, separates words. A metacharacter is a blank or one of the following characters: ‘|’, ‘&’, ‘;’, ‘(’, ‘)’, ‘<’, or ‘>’.
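To make the "when unquoted" part concrete, here is a minimal sketch that counts how many words the shell produces in each case. count_args is a hypothetical helper introduced only for this illustration. It also suggests that a backslash quotes a space just as quotes do, which bears on the question's assumption that spaces cannot be escaped that way.

```bash
#!/usr/bin/env bash
# count_args is a hypothetical helper that prints how many words it received.
count_args() { echo "$#"; }

unquoted=$(count_args zero one two)    # unquoted spaces separate three words
quoted=$(count_args "zero one two")    # quotes keep the spaces inside one word
escaped=$(count_args zero\ one\ two)   # a backslash also makes each space literal

# The same rules apply inside a compound array assignment.
arr=(zero\ one two)

echo "$unquoted $quoted $escaped ${#arr[@]}"   # prints: 3 1 1 2
```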


Writing a BASH command to print a range [duplicate]

I want to run a command from a bash script which has single quotes and some other commands inside the single quotes and a variable.
e.g. repo forall -c '....$variable'
In this format, $ is escaped and the variable is not expanded.
I tried the following variations but they were rejected:
repo forall -c '...."$variable" '
repo forall -c " '....$variable' "
" repo forall -c '....$variable' "
repo forall -c "'" ....$variable "'"
If I substitute the value in place of the variable the command is executed just fine.
Please tell me where I am going wrong.
Inside single quotes everything is preserved literally, without exception.
That means you have to close the quotes, insert something, and then re-enter again.
'before'"$variable"'after'
'before'"'"'after'
'before'\''after'
Word concatenation is simply done by juxtaposition. As you can verify, each of the above lines is a single word to the shell. Quotes (single or double quotes, depending on the situation) don't isolate words. They are only used to disable interpretation of various special characters, like whitespace, $, ;... For a good tutorial on quoting see Mark Reed's answer. Also relevant: Which characters need to be escaped in bash?
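A quick way to check the three forms above is to assign each to a variable and print it between delimiters. This is only a sketch; the value of variable is made up for the illustration.

```bash
#!/usr/bin/env bash
variable='some value'   # hypothetical value, chosen to contain a space

w1='before'"$variable"'after'   # leave single quotes, expand, re-enter
w2='before'"'"'after'           # a single quote supplied inside double quotes
w3='before'\''after'            # a single quote supplied by a backslash escape

printf '<%s>\n' "$w1" "$w2" "$w3"
# <beforesome valueafter>
# <before'after>
# <before'after>
```

Each line prints as one bracketed item, so each is a single word despite the changes of quoting style in the middle.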
Do not concatenate strings interpreted by a shell
You should absolutely avoid building shell commands by concatenating variables. This is a bad idea similar to concatenation of SQL fragments (SQL injection!).
Usually it is possible to have placeholders in the command, and to supply the command together with variables so that the callee can receive them from the invocation arguments list.
For example, the following is very unsafe. DON'T DO THIS
script="echo \"Argument 1 is: $myvar\""
/bin/sh -c "$script"
If the contents of $myvar is untrusted, here is an exploit:
myvar='foo"; echo "you were hacked'
Instead of the above invocation, use positional arguments. The following invocation is better -- it's not exploitable:
script='echo "arg 1 is: $1"'
/bin/sh -c "$script" -- "$myvar"
Note the use of single ticks in the assignment to script, which means that it's taken literally, without variable expansion or any other form of interpretation.
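The difference can be observed side by side with a sketch like the following, which captures the output of both invocations for the exploit string shown above. The second argument sh is just a placeholder for $0 that I chose for the illustration, so that $myvar lands in $1. Treat the unsafe line as a demonstration only.

```bash
#!/usr/bin/env bash
myvar='foo"; echo "you were hacked'

# Unsafe: the value is pasted into the script text, so its quotes become syntax
# and the second echo runs as a separate command.
unsafe=$(/bin/sh -c "echo \"Argument 1 is: $myvar\"")

# Safer: the script text is fixed and the value arrives as positional parameter $1.
# "sh" fills $0 (a placeholder name, an assumption of this sketch).
safe=$(/bin/sh -c 'echo "arg 1 is: $1"' sh "$myvar")

printf 'unsafe:\n%s\n\nsafe:\n%s\n' "$unsafe" "$safe"
```

In the unsafe case the injected echo runs and prints its own line; in the safe case the whole value is printed as data.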
The repo command can't care what kind of quotes it gets. If you need parameter expansion, use double quotes. If that means you wind up having to backslash a lot of stuff, use single quotes for most of it, and then break out of them and go into doubles for the part where you need the expansion to happen.
repo forall -c 'literal stuff goes here; '"stuff with $parameters here"' more literal stuff'
Explanation follows, if you're interested.
When you run a command from the shell, what that command receives as arguments is an array of null-terminated strings. Those strings may contain absolutely any non-null character.
But when the shell is building that array of strings from a command line, it interprets some characters specially; this is designed to make commands easier (indeed, possible) to type. For instance, spaces normally indicate the boundary between strings in the array; for that reason, the individual arguments are sometimes called "words". But an argument may nonetheless have spaces in it; you just need some way to tell the shell that's what you want.
You can use a backslash in front of any character (including space, or another backslash) to tell the shell to treat that character literally. But while you can do something like this:
reply=\"That\'ll\ be\ \$4.96,\ please,\"\ said\ the\ cashier
...it can get tiresome. So the shell offers an alternative: quotation marks. These come in two main varieties.
Double-quotation marks are called "grouping quotes". They prevent wildcards and aliases from being expanded, but mostly they're for including spaces in a word. Other things like parameter and command expansion (the sorts of thing signaled by a $) still happen. And of course if you want a literal double-quote inside double-quotes, you have to backslash it:
reply="\"That'll be \$4.96, please,\" said the cashier"
Single-quotation marks are more draconian. Everything between them is taken completely literally, including backslashes. There is absolutely no way to get a literal single quote inside single quotes.
Fortunately, quotation marks in the shell are not word delimiters; by themselves, they don't terminate a word. You can go in and out of quotes, including between different types of quotes, within the same word to get the desired result:
reply='"That'\''ll be $4.96, please," said the cashier'
So that's easier - a lot fewer backslashes, although the close-single-quote, backslashed-literal-single-quote, open-single-quote sequence takes some getting used to.
Modern shells have added another quoting style not specified by the POSIX standard, in which the leading single quotation mark is prefixed with a dollar sign. Strings so quoted follow similar conventions to string literals in the ANSI standard version of the C programming language, and are therefore sometimes called "ANSI strings" and the $'...' pair "ANSI quotes". Within such strings, the above advice about backslashes being taken literally no longer applies. Instead, they become special again - not only can you include a literal single quotation mark or backslash by prepending a backslash to it, but the shell also expands the ANSI C character escapes (like \n for a newline, \t for tab, and \xHH for the character with hexadecimal code HH). Otherwise, however, they behave as single-quoted strings: no parameter or command substitution takes place:
reply=$'"That\'ll be $4.96, please," said the cashier'
The important thing to note is that the single string that gets stored in the reply variable is exactly the same in all of these examples. Similarly, after the shell is done parsing a command line, there is no way for the command being run to tell exactly how each argument string was actually typed – or even if it was typed, rather than being created programmatically somehow.
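If you want to convince yourself of that, a small check along these lines should do. It assumes bash, because of the $'...' form, and the variable names r1 to r4 are mine.

```bash
#!/usr/bin/env bash
# The same sentence written with the four quoting styles discussed above.
r1=\"That\'ll\ be\ \$4.96,\ please,\"\ said\ the\ cashier
r2="\"That'll be \$4.96, please,\" said the cashier"
r3='"That'\''ll be $4.96, please," said the cashier'
r4=$'"That\'ll be $4.96, please," said the cashier'

# Compare every variant against the first one.
for r in "$r2" "$r3" "$r4"; do
  [ "$r" = "$r1" ] && echo same || echo different
done
printf '%s\n' "$r1"
```

The loop should report "same" three times, and the final line shows the stored string with no quoting syntax left in it.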
Below is what worked for me -
QUOTE="'"
hive -e "alter table TBL_NAME set location $QUOTE$TBL_HDFS_DIR_PATH$QUOTE"
EDIT: (As per the comments in question:)
I've been looking into this since then. I was lucky enough to have repo lying around. Still, it's not clear to me whether you are forced to enclose your commands in single quotes. I looked into the repo syntax and I don't think you need to. You could use double quotes around your command, and then use whatever single and double quotes you need inside, provided you escape the double ones.
just use printf
instead of
repo forall -c '....$variable'
use printf to replace the variable token with the expanded variable.
For example:
template='.... %s'
repo forall -c "$(printf "${template}" "${variable}")"
Variables can contain single quotes.
myvar=\'....$variable\'
repo forall -c $myvar
I was wondering why I could never get my awk statement to print from an ssh session so I found this forum. Nothing here helped me directly but if anyone is having an issue similar to below, then give me an up vote. It seems any sort of single or double quotes were just not helping, but then I didn't try everything.
check_var="df -h / | awk 'FNR==2{print $3}'"
getckvar=$(ssh user@host "$check_var")
echo $getckvar
What do you get? A load of nothing.
Fix: escape \$3 in your print function.
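Here is a rough sketch of why the escape matters. It uses a local sh -c as a stand-in for the ssh call, which is my assumption to keep the example self-contained, and feeds awk a fixed line instead of df output.

```bash
#!/usr/bin/env bash
set --   # clear positional parameters so $3 is empty, as in a fresh interactive shell

bad="awk '{print $3}'"     # inside double quotes, the local shell expands $3 to nothing
good="awk '{print \$3}'"   # \$ keeps a literal $3 for awk to interpret

printf '%s\n' "$bad"    # awk '{print }'
printf '%s\n' "$good"   # awk '{print $3}'

# sh -c stands in for the remote shell that ssh would start.
bad_out=$(printf 'a b c\n' | sh -c "$bad")
good_out=$(printf 'a b c\n' | sh -c "$good")
printf '%s | %s\n' "$bad_out" "$good_out"   # a b c | c
```

The single quotes inside the double-quoted string are ordinary characters to the local shell, so they do not protect $3 there.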
Does this work for you?
eval repo forall -c '....$variable'

strange characters when redirecting to file in bash script [duplicate]

I have a bash script that contains lines:
remote_installer_svc_args="$local_cifs_mount/eset-remote-installer.args"
svc_arg_x86="%SYSTEMROOT%\\$(basename $remote_temp_dir)\\$(basename $INSTALLER_BAT)"
svc_arg_x64="%SYSTEMROOT%\\$(basename $remote_temp_dir)\\$(basename $INSTALLER_BAT)"
echo "$svc_arg_x86" > $remote_installer_svc_args
echo "$svc_arg_x64" >> $remote_installer_svc_args
It should produce a file containing the two plain paths, but instead the file shows strange characters when opened in Notepad++ on Windows or in vim. [Screenshots not preserved.]
What is wrong with the script? Because when I copy those lines into bash it works, only if I run the script it does produce those strange characters...
You've run into part of the mess of inconsistent behavior that plagues the echo command. Specifically, some versions of echo (in some modes) interpret escape (backslash) sequences in the string they're asked to print. Others don't. When you ask echo to print %SYSTEMROOT%\era_rd_6HbUKJTR\EraAgentInstaller.bat, it might see the \e part and think it's supposed to convert that to the ASCII escape character.
Note that there are two different characters being called "escape" here: The backslash is used by the shell as an escape character, meaning that it and the characters immediately following it have some special meaning. The ASCII escape, on the other hand, is treated as a special character by the terminal (and vim and some other things) in a somewhat similar manner. Since the ASCII escape is a nonprinting character, when notepad++ and vim have to display it, they show some sort of alternate representation ("ESC" or "^[").
Anyway, since echo is inconsistent about its treatment of the backslash character, it's best to avoid it for strings that might contain backslash. Use printf instead (see "Why is printf better than echo?" on unix.se). It's a little more complicated to use, but not too bad. The main things to realize are that the first argument to printf is a "format" string that's used to control how the rest of the arguments are printed, and that unlike echo it doesn't automatically add a newline to the end.
What you want to use is:
printf '%s\n' "$svc_arg_x86" > "$remote_installer_svc_args"
printf '%s\n' "$svc_arg_x64" >> "$remote_installer_svc_args"
Or you can simplify it to:
printf '%s\n' "$svc_arg_x86" "$svc_arg_x64" > "$remote_installer_svc_args"
That first argument, %s\n, says to print a plain string followed by a newline. Backslash escapes in the format string are always interpreted, but strings formatted with the %s format never have escapes interpreted. Note that in the single-command version, the format string gets applied to each of the other two arguments, so each gets a newline at the end, so each winds up on a separate line in the output file.
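As a small check of both points, the sketch below prints the path from the explanation above through %s and then reuses the format for two arguments. The variable names path, out and two are mine, chosen for the illustration.

```bash
#!/usr/bin/env bash
# The sample path from above; single quotes keep the backslashes as they are.
path='%SYSTEMROOT%\era_rd_6HbUKJTR\EraAgentInstaller.bat'

out=$(printf '%s\n' "$path")             # %s prints the argument untouched
two=$(printf '%s\n' "first" "second")    # the format is reused for each argument

printf '%s\n' "$out"
printf '%s\n' "$two"
# %SYSTEMROOT%\era_rd_6HbUKJTR\EraAgentInstaller.bat
# first
# second
```

The \e in the path survives because it reaches printf as data for %s rather than as part of the format string.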

How can I remove a newline (\n) at the end of a string?

The problem
I have multiple property lines in a single string separated by \n like this:
LINES2="Abc1.def=$SOME_VAR\nAbc2.def=SOMETHING_ELSE\n"$LINES
The LINES variable might contain an arbitrary set of characters and may be empty. If it is empty, I want to avoid the trailing \n.
I am open for any command line utility (sed, tr, awk, ... you name it).
Attempts
I tried this to no avail
sed -z 's/\\n$//g' <<< $LINES2
I also had no luck with tr, since it does not accept regex.
Idea
There might be an approach to convert the \n to something else. But since $LINES can contain arbitrary characters, this might be dangerous.
Sources
I skimmed through the following questions
How can I replace a newline (\n) using sed?
sed with literal string--not input file
Here's one solution:
LINES2="Abc1.def=$SOME_VAR"$'\n'"Abc2.def=SOMETHING_ELSE${LINES:+$'\n'$LINES}"
The syntax ${name:+value} means "insert value if the variable name exists and is not empty." So in this case, it inserts a newline followed by $LINES if $LINES is not empty, which seems to be precisely what you want.
I use $'\n' because "\n" is not a newline character. A more readable solution would be to define a shell variable whose value is a single newline.
It is not necessary to quote strings in shell assignment statements, since the right-hand side of an assignment does not undergo word-splitting nor glob expansion. Not quoting would make it easier to interpolate a $'\n'.
It is not usually advisable to use UPPER-CASE for shell variables because the shell and the OS use upper-case names for their own purposes. Your local variables should normally be lower case names.
So if I were not basing the answer on the command in the question, I would have written:
lines2=Abc1.def=$someVar$'\n'Abc2.def=SOMETHING_ELSE${lines:+$'\n'$lines}
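Following the suggestion to keep the newline in its own variable, a sketch of the same idea could look like this. build_props, nl and props are hypothetical names, and the property values are placeholders.

```bash
#!/usr/bin/env bash
nl=$'\n'   # a variable holding a single newline character

# build_props is a hypothetical helper: it stores the joined text in the
# global variable props, appending "$nl$1" only when $1 is non-empty.
build_props() {
  local lines=$1
  props="Abc1.def=x${nl}Abc2.def=y${lines:+$nl$lines}"
}

build_props ""
printf '[%s]\n' "$props"    # two lines, no trailing newline before the ]

build_props "Abc3.def=z"
printf '[%s]\n' "$props"    # three lines
```

The brackets make it easy to see that nothing is appended when the argument is empty.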

Can $ be used as a concatenation symbol in Bash?

I have read a BASH script, and found the following line:
lines="$lines"$'\n'
After testing, I know the meaning of this line is adding a "\n" after the string "$lines".
But after checking the bash manual, I can't find that "$" can be used as a concatenation symbol. Could anyone explain this usage of "$"? Thanks very much in advance!
A slightly closer read of the Bash Manual under Quoting would reveal where this gem is hidden. Specifically:
Words of the form $'string' are treated specially. The word expands to
string, with backslash-escaped characters replaced as specified by the
ANSI C standard.
Used specifically in the context of \n it provides a newline. You most often see this form of quoting used in regard to the Bash IFS (internal field separator), whose default is space, tab, newline, written:
IFS=$' \t\n'
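To see that the $ belongs to the quoting and is not an operator, one can compare string lengths. This is only a sketch; the names real and literal are mine.

```bash
#!/usr/bin/env bash
real=$'\n'     # ANSI-C quoting: a single newline character
literal="\n"   # double quotes: a backslash followed by the letter n

echo "${#real} ${#literal}"   # prints: 1 2

# The line from the question: two adjacent quoted pieces form one word.
lines="first"
lines="$lines"$'\n'
echo "${#lines}"              # prints: 6
```

The concatenation itself comes from writing the two quoted strings next to each other with no space between them.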

bash difference between raw string and string in variable

I wrote a little script in bash, but it only worked when I stored the string as a variable, and I'd like to know why. Here's the summary:
When I use the string itself, bash treats it as a single entity
for word in "this is a sentence"; do
echo $word
done
# => this is a sentence
If I save the exact same string into a variable, bash iterates over the words
sentence="this is a sentence"
for word in $sentence; do
echo $word
done
# => this
# is
# a
# sentence
Why are these being treated differently?
Is there a simple way to iterate through the words in the string without first saving the string as a variable?
The quotes tell bash to treat the thing in quotes as a single parameter in a parameter list at the time the expression is evaluated. The quotes (unless protected with \ or ') are removed.
echo "" # prints a newline, no quotes
echo '""' # Print ""
export X='""'
env | grep X # X contains ""
export X=""
env | grep X # X is empty
When you use a variable, bash unpacks it as is (i.e. as if you typed the variable's contents in the variable's place). For a for-loop bash determines the list-elements to iterate over by separating the for-loop's parameters by whitespace, but treating (as always) quote-protected items a single parameter/list-element. Your variable contained no quotes -- items are treated as separate parameters.
As comments suggested, quotes are important. A for loop will step through a list of values terminated by a semicolon, and that list is a set of strings. Unquoted strings are delimited usually by whitespace. Whitespace inside a quoted string does not separate the string from its brethren, it's simply part of the quoted string. There's some truly excellent documentation about quotes in bash at http://mywiki.wooledge.org/Quotes . Read it. Read it now. You'll find a part that says
The quotes are not actually passed along to the command. They are removed by the shell (this process is cleverly called "quote removal").
To step through the words in a sentence that's stored in a variable (if I've inferred your question correctly), you could perhaps use an array to separate the words by whitespace:
#!/bin/bash
sentence="this is a sentence"
IFS=" " read -r -a words <<< "$sentence"
for word in "${words[@]}"; do
echo "$word"
done
In bash, read -a will divide a string by $IFS and place the divided parts into elements of the array. See http://mywiki.wooledge.org/BashGuide/Arrays for more information about how bash arrays work.
If you want more details in pursuit of a specific problem, you might want to tell us what the problem is, or risk making this an XY problem.
In the assignment
sentence="this is a sentence"
there are no unquoted spaces, so everything to the right of the = is treated as a single word. (Something like sentence=this is a sentence would be parsed as a single assignment sentence=this followed by an attempt to run a program called is.) As a result, the value of sentence is a sequence of 18 characters. It is identical to
sentence=this\ is\ a\ sentence
because again, there are no unquoted spaces.
For the same reason
for word in "this is a sentence"; do
echo $word
done
has word being set to each word in the following sequence, which only contains a single word because there are no unquoted spaces.
The key difference with your other loop is that parameter expansions are subject to word-splitting after the fact. The loop
for word in $sentence; do
echo $word
done
after parameter expansion looks like
for word in this is a sentence; do
echo $word
done
so now word is set to each of the 4 words in the list following the in keyword.
It's not clear what you are actually asking at the end of your question, but the preceding is legal code. There is no requirement that a string be placed in quotes in bash; quotes do not define something as a string value, but simply escape every character that appears within the quotes. "foo" and \f\o\o are the same thing in shell.
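The difference between the two loops can also be measured directly. In this sketch, count_args is a hypothetical helper, the default IFS is assumed, and the read line shows one way to split a literal string without storing it in a variable first.

```bash
#!/usr/bin/env bash
# count_args is a hypothetical helper that prints how many words it received.
count_args() { echo "$#"; }

sentence="this is a sentence"
split=$(count_args $sentence)     # unquoted expansion undergoes word splitting
kept=$(count_args "$sentence")    # quoted expansion stays one word

# Split a literal string into an array without an intermediate variable.
read -r -a words <<< "this is a sentence"

echo "$split $kept ${#words[@]}"   # prints: 4 1 4
```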
Quoting turns any string into a single unit. If you lose the quotes, everything should be fine.
