Bash: difference between a raw string and a string in a variable

I wrote a little script in bash, but it only worked when I stored the string as a variable, and I'd like to know why. Here's the summary:
When I use the string itself, bash treats it as a single entity
for word in "this is a sentence"; do
echo $word
done
# => this is a sentence
If I save the exact same string into a variable, bash iterates over the words
sentence="this is a sentence"
for word in $sentence; do
echo $word
done
# => this
# is
# a
# sentence
Why are these being treated differently?
Is there a simple way to iterate through the words in the string without first saving the string as a variable?

The quotes tell bash to treat a thing in quotes as a single parameter in a parameter list at the time the expression is evaluated. The quotes (unless protected with \ or ') are removed.
echo "" # prints a newline, no quotes
echo '""' # Print ""
export X='""'
env | grep X # X contains ""
export X=""
env | grep X # X is empty
When you use an unquoted variable, bash expands it and then splits the result on whitespace (more precisely, on the characters in IFS), much as if you had typed the variable's contents in the variable's place. For a for loop, bash determines the list elements to iterate over by separating the loop's parameters at whitespace, but treating (as always) quote-protected items as a single parameter/list element. Your variable reference was not quoted, so its words are treated as separate parameters.

As comments suggested, quotes are important. A for loop will step through a list of values terminated by a semicolon (or a newline), and that list is a set of strings. Unquoted strings are usually delimited by whitespace. Whitespace inside a quoted string does not separate the string from its brethren; it's simply part of the quoted string. There's some truly excellent documentation about quotes in bash at http://mywiki.wooledge.org/Quotes . Read it. Read it now. You'll find a part that says
The quotes are not actually passed along to the command. They are removed by the shell (this process is cleverly called "quote removal").
To step through the words in a sentence that's stored in a variable (if I've inferred your question correctly), you could perhaps use an array to separate the words by whitespace:
#!/bin/bash
sentence="this is a sentence"
IFS=" " read -a words <<< "$sentence"
for word in "${words[@]}"; do
echo "$word"
done
In bash, read -a will divide a string by $IFS and place the divided parts into elements of the array. See http://mywiki.wooledge.org/BashGuide/Arrays for more information about how bash arrays work.
If you want more details in pursuit of a specific problem, you might want to tell us what the problem is, or risk making this an XY problem.

In the assignment
sentence="this is a sentence"
there are no unquoted spaces, so everything to the right of the = is treated as a single word. (Something like sentence=this is a sentence would be parsed as a single assignment sentence=this followed by an attempt to run a program called is.) As a result, the value of sentence is a sequence of 18 characters. It is identical to
sentence=this\ is\ a\ sentence
because again, there are no unquoted spaces.
For the same reason
for word in "this is a sentence"; do
echo $word
done
has word being set to each word in the following sequence, which only contains a single word because there are no unquoted spaces.
The key difference with your other loop is that parameter expansions are subject to word-splitting after the fact. The loop
for word in $sentence; do
echo $word
done
after parameter expansion looks like
for word in this is a sentence; do
echo $word
done
so now word is set to each of the 4 words in the list following the in keyword.
It's not clear what you are actually asking at the end of your question, but the preceding is legal code. There is no requirement that a string be placed in quotes in bash; quotes do not define something as a string value, but simply escape every character that appears within the quotes. "foo" and \f\o\o are the same thing in shell.
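A minimal sketch of the point above, counting how many words the shell produces in each case. The helper count_args is hypothetical (not from the question); it just prints the number of arguments it received, and the default IFS is assumed.

```shell
#!/bin/bash
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

sentence="this is a sentence"

literal_quoted=$(count_args "this is a sentence")   # quoted literal: one word
var_quoted=$(count_args "$sentence")                # quoted expansion: no word splitting
var_unquoted=$(count_args $sentence)                # unquoted expansion: split on IFS
literal_unquoted=$(count_args this is a sentence)   # unquoted literal: four tokens

echo "$literal_quoted $var_quoted $var_unquoted $literal_unquoted"   # prints: 1 1 4 4
```

The last case is also the simplest way to iterate over the words without a variable: for word in this is a sentence; do ...; done.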

Quoting turns any string into a single unit. If you drop the quotes (for word in this is a sentence), the loop iterates over the individual words.

Related

Parsing a string with quotes in GETOPTS

I am trying to accept a space delimited string in place of $OPTARG while parsing an option
For example
./script -k '1 2 ad'ias'
As seen the third string can contain any special character. Is there a way that I can overlook the quote in between as I want to parse the entire string and process some options
Tried inserting the \ character but that does not work for my case because I cannot insert any character in my string.
while getopts "a:k:" option
do
echo "${option}"
case ${option} in
a)
function_a ${OPTARG} # <-- no quotes
;;
k)
function_k "${OPTARG}" # <-- quotes
;;
esac
done
I'm not sure I fully understand what the difficulty is; handling strings with special characters is a bit tricky, but (except the NUL character) basically doable. The main things to watch out for are:
When you represent a string literal (in a script, or when passing arguments to a script), you must use a valid shell representation of that string, not just the raw string. For example, suppose you want to pass/use this string:
12 34 kla#42#!' 2 M$" rtqas;::#
There are a number of ways of representing this string for use in a shell script or command line. You can leave it unquoted but escape the individual special characters, like this:
12\ 34\ kla\#42#\!\'\ 2\ M\$\"\ rtqas\;::\#
Or you could wrap it in double-quotes, and escape just those characters that retain special meaning inside double-quotes (that is, double-quotes, backquotes, and dollar signs, and if it's a bash interactive shell exclamation marks):
"12 34 kla#42#!' 2 M\$\" rtqas;::#" # For a non-interactive shell
"12 34 kla#42#\!' 2 M\$\" rtqas;::#" # For an interactive shell
If it didn't contain single-quotes, you could single-quote it; since it does, you can't use that method. But you can mix methods, e.g. using single-quotes around the parts that don't contain single-quotes and escaping or double-quoting the single quote:
'12 34 kla#42#!'\'' 2 M$" rtqas;::#' # Single-quote is escaped
'12 34 kla#42#!'"'"' 2 M$" rtqas;::#' # Single-quote is double-quoted
In bash (but not some other shells), there's also ANSI-C-escaped strings, written with $' ... ':
$'12 34 kla#42#!\' 2 M$" rtqas;::#' # Single-quote is the only character that needs escaping
Note that all of the above are just different ways of representing the exact same string; once the shell parses it, it comes out the same from any of them. You can use whatever's convenient, but you must use a syntactically valid representation of the string.
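To check that these spellings really are the same string, here is a small sketch that assigns four of the representations above to variables (a, b, c and d are names I made up) and prints them. It assumes a non-interactive bash script, so ! needs no escaping inside double-quotes.

```shell
#!/bin/bash
# Four representations of one string; variable names are illustrative only.
a=12\ 34\ kla\#42#\!\'\ 2\ M\$\"\ rtqas\;::\#     # unquoted, specials escaped
b="12 34 kla#42#!' 2 M\$\" rtqas;::#"             # double-quoted (non-interactive shell)
c='12 34 kla#42#!'\'' 2 M$" rtqas;::#'            # single-quoted, single-quote escaped
d=$'12 34 kla#42#!\' 2 M$" rtqas;::#'             # ANSI-C quoting (bash)

# Every line printed should be identical.
for v in "$a" "$b" "$c" "$d"; do
  printf '<<%s>>\n' "$v"
done
```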
Once the string is stored in a parameter/variable, you must put double-quotes around references to that variable. In most shell contexts, when a variable is used without quotes, the shell will split it into words (based on spaces or whatever's in IFS), and try to expand anything that looks like a file wildcard; you don't want this. But if it's in double-quotes, the variable gets expanded and no further parsing is done, it's just passed through unmolested.
Actually, you should almost always double-quote variable references in shell scripts even if you don't expect them to contain special characters. We see so many shell questions here that have the answer "if you double-quoted your variable references, you wouldn't have this problem"...
Here's an example, based on your script:
#!/bin/bash
printopt() {
printf '%s value is: <<%s>>\n' "$1" "$2" # Double-quotes required here
}
while getopts "a:k:" option
do
case "${option}" in # This is one of the few places it's safe to leave off double-quotes. But they don't hurt.
a)
printopt "-a" "${OPTARG}" # Double-quotes required here
;;
k)
printopt "-k" "${OPTARG}" # Double-quotes required here
;;
esac
done
And running it with various representations of strings:
$ ./argtest.sh -a 12\ 34\ kla\#42#\!\'\ 2\ M\$\"\ rtqas\;::\# -k "1 2 ad'ias"
-a value is: <<12 34 kla#42#!' 2 M$" rtqas;::#>>
-k value is: <<1 2 ad'ias>>
$ ./argtest.sh -a '12 34 kla#42#!'"'"' 2 M$" rtqas;::#' -k $'1 2 ad\'ias'
-a value is: <<12 34 kla#42#!' 2 M$" rtqas;::#>>
-k value is: <<1 2 ad'ias>>
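For contrast, here is a rough sketch of what happens with the unquoted ${OPTARG} from the question's -a branch. The functions show_args and parse are hypothetical stand-ins for function_a/function_k and the option loop; wrapping getopts in a function with a local OPTIND is my own choice so the snippet is self-contained.

```shell
#!/bin/bash
# show_args (hypothetical) prints the argument count, then each argument in <>.
show_args() { printf '%d:' "$#"; printf '<%s>' "$@"; printf '\n'; }

# parse (hypothetical) runs the question's option loop over its own arguments.
parse() {
  local OPTIND=1 option
  while getopts "a:k:" option; do
    case "$option" in
      a) show_args ${OPTARG} ;;     # unquoted: the value is word-split
      k) show_args "${OPTARG}" ;;   # quoted: the value stays one argument
    esac
  done
}

parse -a "1 2 ad'ias" -k "1 2 ad'ias"
# prints:
# 3:<1><2><ad'ias>
# 1:<1 2 ad'ias>
```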
Ok, ok, there are a few situations where it's more complicated than that:
There are some situations where a string will be run through the shell parsing process multiple times, such as when it's being run over ssh (the command gets processed by the local shell, passed to the remote computer, then processed by that shell and executed), or used as a shell alias (the alias command gets parsed, result stored, then parsed again when you use it). In these cases, you essentially need two (or possibly more) layers of quoting/escaping: take the raw string, quote/escape by any of the above methods, then take that string and quote/escape that (probably by a different method).
Some versions of echo will parse escape (backslash) sequences in the string (using different rules than the shell itself does), which can cause confusion. I recommend using printf instead when this might be an issue; the only problem is that it's more complex than echo: it doesn't just print its arguments, it uses the first argument as a format string which controls how the rest of the arguments are printed. See my examples in the script above.
If you are passing the string to another script that doesn't use double-quotes around its parameter & variable references, you are doomed. In this case, the only thing that can be done is to fix that other script.
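For the first of these situations (a string that gets parsed twice), one possible approach in bash is to let printf '%q' produce the extra layer of quoting. The sketch below simulates the second parse with a local bash -c instead of ssh; that substitution is my assumption, and it presumes the shell doing the second parse is bash-compatible.

```shell
#!/bin/bash
raw="1 2 ad'ias"

# First layer: ask bash to re-quote the value so it survives one more parse.
quoted=$(printf '%q' "$raw")

# Second parse: the child shell sees the quoted form and turns it back
# into the original single argument.
result=$(bash -c "printf '%s' $quoted")

printf 'raw:    <<%s>>\nresult: <<%s>>\n' "$raw" "$result"
```

If result and raw differ, a layer of quoting has been lost somewhere along the way.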

How can I remove a newline (\n) at the end of a string?

The problem
I have multiple property lines in a single string separated by \n like this:
LINES2="Abc1.def=$SOME_VAR\nAbc2.def=SOMETHING_ELSE\n"$LINES
The LINES variable
might contain an undefined set of characters
may be empty. If it is empty, I want to avoid the trailing \n.
I am open for any command line utility (sed, tr, awk, ... you name it).
Attempts
I tried this to no avail
sed -z 's/\\n$//g' <<< $LINES2
I also had no luck with tr, since it does not accept regex.
Idea
There might be an approach to convert the \n to something else. But since $LINES can contain arbitrary characters, this might be dangerous.
Sources
I skim read through the following questions
How can I replace a newline (\n) using sed?
sed with literal string--not input file
Here's one solution:
LINES2="Abc1.def=$SOME_VAR"$'\n'"Abc2.def=SOMETHING_ELSE${LINES:+$'\n'$LINES}"
The syntax ${name:+value} means "insert value if the variable name exists and is not empty." So in this case, it inserts a newline followed by $LINES if $LINES is not empty, which seems to be precisely what you want.
I use $'\n' because "\n" is not a newline character. A more readable solution would be to define a shell variable whose value is a single newline.
It is not necessary to quote strings in shell assignment statements, since the right-hand side of an assignment does not undergo word-splitting nor glob expansion. Not quoting would make it easier to interpolate a $'\n'.
It is not usually advisable to use UPPER-CASE for shell variables because the shell and the OS use upper-case names for their own purposes. Your local variables should normally be lower case names.
So if I were not basing the answer on the command in the question, I would have written:
lines2=Abc1.def=$someVar$'\n'Abc2.def=SOMETHING_ELSE${lines:+$'\n'$lines}
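Here is a sketch of the "more readable" variant mentioned above, with the newline kept in a variable. The names nl and join_props and the sample property values are made up for illustration; the function's argument plays the role of $LINES.

```shell
#!/bin/bash
nl=$'\n'   # a variable holding a single newline

# join_props is a hypothetical helper; $1 stands in for $LINES.
join_props() {
  local lines=$1
  # ${lines:+...} contributes a newline plus $lines only when lines is non-empty.
  printf '%s' "Abc1.def=first${nl}Abc2.def=SOMETHING_ELSE${lines:+$nl$lines}"
}

join_props "";              echo "|"   # the | marks where the output ends
join_props "Abc3.def=more"; echo "|"
```

In the empty case the | lands directly after SOMETHING_ELSE, so no trailing newline was added.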

Is space considered a metacharacter in Bash?

I have searched for the list of metacharacters in Bash, but space is not listed.
I wonder if I'm right by assuming that space is the "token separation character" in Bash, since it not only works as such with Shell programs or builtins but also when creating an array through compound assignment - quotes escape spaces, just like they do most other metacharacters.
They cannot be escaped by backslashes, though.
Parameters are passed to programs and functions separated by spaces, for example.
Can someone explain how (and when) bash interprets spaces? Thanks!
I've written an example:
$ a=(zero one two)
$ echo ${a[0]}
zero
$ a=("zero one two")
$ echo ${a[0]}
zero one two
From the man page:
metacharacter
A character that, when unquoted, separates words. One of the following:
| & ; ( ) < > space tab
^^^^^
According to the Posix shell specification for Token Recognition, any shell (which pretends to be Posix-compliant) should interpret whitespace as separating tokens:
If the current character is an unquoted <newline>, the current token shall be delimited.
If the current character is an unquoted <blank>, any token containing the previous character is delimited and the current character shall be discarded.
Here <blank> refers to the character class blank as defined by LC_CTYPE at the time the shell starts. In almost all cases, that character class consists precisely of the space and tab characters.
It's important to distinguish between the shell mechanism for recognizing tokens, and the use of $IFS to perform word-splitting. Word splitting is performed (in most contexts) after brace, tilde, parameter and variable, arithmetic and command expansions. Consider, for example:
$ # Setting IFS does not affect token recognition
$ bash -c 'IFS=:; arr=(foo:bar); echo "${arr[0]}"'
foo:bar
$ # But it does affect word splitting after variable expansion
$ bash -c 'IFS=: foobar=foo:bar; arr=($foobar); echo "${arr[0]}"'
foo
Yes it is. From the Bash Reference Manual's Definitions section:
blank
A space or tab character.
…
metacharacter
A character that, when unquoted, separates words. A metacharacter is a blank or one of the following characters: ‘|’, ‘&’, ‘;’, ‘(’, ‘)’, ‘<’, or ‘>’.
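Since the definition says "when unquoted", and a backslash is one of the shell's quoting mechanisms, a backslash-escaped space should not separate words either, contrary to the question's remark. A quick sketch extending the question's own array example (the arrays b and c are my additions):

```shell
#!/bin/bash
a=(zero one two)       # three unquoted words
b=("zero one two")     # one double-quoted word
c=(zero\ one two)      # the backslash quotes the first space: two words

echo "${#a[@]} ${#b[@]} ${#c[@]}"   # prints: 3 1 2
echo "${c[0]}"                      # prints: zero one
```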

Bash array creation: ("$@") vs ($@)

I am running a script with: ./some_script arg1 arg2 "multiple words arg3" arg4. I want to explode the arguments ($@) into an array. This snippet works only for arguments without spaces:
arr=($@)
If I want to store the correct arguments into the array I must use:
arr=("$@")
Why should I enclose $@ in quotes?
I think this has something to do with parameter expansion and special parameters, but I don't think I got it well.
In the shell, whenever a variable (including special parameters like $@) is referenced without double-quotes, the value goes through word splitting and wildcard expansion after it's expanded. For example:
$ var="FOO * BAR"
$ printf "%s\n" "$var"
FOO * BAR
$ printf "%s\n" $var
FOO
Desktop
Documents
Downloads
Library
Movies
Music
Pictures
Public
BAR
In the second case, the variable value "FOO * BAR" got split into separate words ("FOO", "*", and "BAR"), and then the "*" was expanded into a list of matching files. This is why you almost always want to put variable references in double-quotes.
The same thing applies to $@ -- if it's not in double-quotes, it's expanded into the list of parameters and then each one of them is subjected to that same word splitting and wildcard expansion that $var went through above. If it's in double-quotes, the parameter values are left unmolested.
BTW, there is another way to get the parameters: $*. This differs from $@ in that it sticks all of the parameter values together with spaces between them (while $@ maintains each parameter as a separate word). In double-quotes, "$*" gives a single word consisting of all parameters. Without double-quotes, $* sticks all the parameters together, then re-splits them (maybe at the same places, maybe not), and does wildcard expansion. Probably not what you wanted.
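A small sketch that counts the words produced by each of the four forms, using the arguments from the question. The helper count is hypothetical, set -- is used only to simulate the script's arguments, and the default IFS is assumed.

```shell
#!/bin/bash
# count is a hypothetical helper: it prints how many arguments it received.
count() { echo "$#"; }

# Simulate: ./some_script arg1 arg2 "multiple words arg3" arg4
set -- arg1 arg2 "multiple words arg3" arg4

at_quoted=$(count "$@")     # each parameter stays one word
at_unquoted=$(count $@)     # parameters are split again on whitespace
star_quoted=$(count "$*")   # everything joined into a single word
star_unquoted=$(count $*)   # joined, then re-split

echo "$at_quoted $at_unquoted $star_quoted $star_unquoted"   # prints: 4 6 1 6
```

Only "$@" reproduces the original four arguments, which is why arr=("$@") is the form to use.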
If you didn't surround it with quotes, "multiple words arg3" would be further expanded to multiple words arg3 - ie. the quotes preserve special characters after the variable has been expanded.
In other words, when you don't surround a variable expansion with quotes, what the variable expands to will further be expanded, which would in this case eliminate the double quotes around the third argument.

Linux bash string split using IFS on single quote

There are many answered questions about IFS string splitting and single-quote escaping in Linux bash, but I found none that join the two topics. Having stumbled upon the problem, I got (to me) strange behavior from code like the following:
(bash script block)
theString="a string with some 'single' quotes in it"
old_IFS=$IFS
IFS=\'
read -a stringTokens <<< "$theString"
IFS=$old_IFS
for token in ${stringTokens[@]}
do
echo $token
done
# let's say $i holds the piece of string between quotes
echo ${stringTokens[$i]}
What happens is that the echoed element of the array actually contains the substring I need (which leads me to think that the \' IFS is correct), while the for loop returns the string split on spaces.
Can someone kindly help me understand why the same array (or what in my mind looks like the same array) behaves like this?
When you do:
for token in ${stringTokens[@]}
The loop effectively becomes:
for token in a string with some single quotes in it
Because the expansion is unquoted, the for loop does not see the array element by element; each element is word-split again on whitespace, and the loop iterates over the resulting words.
Instead try:
for token in "${stringTokens[@]}";
do
echo "$token"
done
This will be equivalent to:
for token in "a string with some " "single" " quotes in it"
Output on my PC:
a string with some
single
quotes in it
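Putting it together, a minimal sketch of the whole thing. Here IFS is set only for the read command (a prefix assignment) rather than saved and restored, which is my variation on the question's code; the variable count_unquoted is likewise just for illustration.

```shell
#!/bin/bash
theString="a string with some 'single' quotes in it"

# IFS=\' applies only to this read, so the global IFS is left untouched.
IFS=\' read -r -a stringTokens <<< "$theString"

echo "${#stringTokens[@]}"               # prints: 3

# Quoted expansion: one iteration per array element, spaces preserved.
for token in "${stringTokens[@]}"; do
  printf '<%s>\n' "$token"
done

# Unquoted expansion: every element is split again on whitespace.
count_unquoted=0
for token in ${stringTokens[@]}; do
  count_unquoted=$((count_unquoted + 1))
done
echo "$count_unquoted"                   # prints: 8
```

The angle brackets make the leading and trailing spaces of the first and last elements visible.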
Check this out for more Bash Pitfalls:
http://mywiki.wooledge.org/BashPitfalls

Resources