Linux bash string split using IFS on single quote - linux

There are many answered questions about IFS string splitting and about single-quote escaping in Linux bash, but I found none that combines the two topics. When I ran into the problem, I got behavior that seemed strange (to me) from code like the following:
theString="a string with some 'single' quotes in it"
old_IFS=$IFS
IFS=\'
read -a stringTokens <<< "$theString"
IFS=$old_IFS
for token in ${stringTokens[@]}
do
echo $token
done
# let's say $i holds the piece of string between quotes
echo ${stringTokens[$i]}
What happens is that the echoed element of the array actually contains the substring I need (leading me to think that the \' IFS is correct), while the for loop returns the string split on spaces.
Can someone kindly help me understand why the same array (or what in my mind looks like the same array) behaves like this?

When you do:
for token in ${stringTokens[@]}
The loop effectively becomes:
for token in a string with some single quotes in it
Because the expansion is unquoted, the for loop does not iterate over the array element by element; the shell word-splits the expanded elements on the current IFS (whitespace again, since IFS was restored).
Instead try:
for token in "${stringTokens[@]}";
do
echo "$token"
done
This will be equivalent to:
for token in "a string with some " "single" " quotes in it"
Output on my PC:
a string with some
single
quotes in it
Check this out for more Bash Pitfalls:
http://mywiki.wooledge.org/BashPitfalls
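As a small variation on the above, the IFS change can be limited to the read command itself, so there is nothing to save and restore. This is only a sketch based on the question's own string; the angle brackets in the output are added here just to make leading and trailing spaces visible.

```bash
#!/bin/bash
theString="a string with some 'single' quotes in it"

# Prefixing the assignment to read limits the IFS change to that one command.
IFS=\' read -r -a stringTokens <<< "$theString"

# The quoted [@] expansion yields exactly one word per array element.
for token in "${stringTokens[@]}"; do
  printf '<%s>\n' "$token"
done

printf 'elements: %d\n' "${#stringTokens[@]}"
```

With the quotes around the expansion, the loop sees three elements, and the spaces next to the single quotes are preserved inside the first and third.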

Related

Parsing a string with quotes in GETOPTS

I am trying to accept a space delimited string in place of $OPTARG while parsing an option
For example
./script -k '1 2 ad'ias'
As seen the third string can contain any special character. Is there a way that I can overlook the quote in between as I want to parse the entire string and process some options
Tried inserting the \ character but that does not work for my case because I cannot insert any character in my string.
while getopts "a:k:" option
do
echo "${option}"
case ${option} in
a)
function_a ${OPTARG} # <-- no quotes
;;
k)
function_k "${OPTARG}" # <-- quotes
;;
esac
done
I'm not sure I fully understand what the difficulty is; handling strings with special characters is a bit tricky, but (except the NUL character) basically doable. The main things to watch out for are:
When you represent a string literal (in a script, or when passing arguments to a script), you must use a valid shell representation of that string, not just the raw string. For example, suppose you want to pass/use this string:
12 34 kla#42#!' 2 M$" rtqas;::#
There are a number of ways of representing this string for use in a shell script or command line. You can leave it unquoted but escape the individual special characters, like this:
12\ 34\ kla\#42#\!\'\ 2\ M\$\"\ rtqas\;::\#
Or you could wrap it in double-quotes, and escape just those characters that retain special meaning inside double-quotes (that is, double-quotes, backquotes, and dollar signs, and if it's a bash interactive shell exclamation marks):
"12 34 kla#42#!' 2 M\$\" rtqas;::#" # For a non-interactive shell
"12 34 kla#42#\!' 2 M\$\" rtqas;::#" # For an interactive shell
If it didn't contain single-quotes, you could single-quote it; since it does, you can't use that method. But you can mix methods, e.g. using single-quotes around the parts that don't contain single-quotes and escaping or double-quoting the single quote:
'12 34 kla#42#!'\'' 2 M$" rtqas;::#' # Single-quote is escaped
'12 34 kla#42#!'"'"' 2 M$" rtqas;::#' # Single-quote is double-quoted
In bash (but not some other shells), there's also ANSI-C-escaped strings, written with $' ... ':
$'12 34 kla#42#!\' 2 M$" rtqas;::#' # Single-quote is the only character that needs escaping
Note that all of the above are just different ways of representing the exact same string; once the shell parses it, it comes out the same from any of them. You can use whatever's convenient, but you must use a syntactically valid representation of the string.
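To check that claim yourself, a short script along these lines assigns the same string using each representation and compares the results. The variable names are made up for the illustration, and it is meant to be run as a script (non-interactive), so the exclamation mark needs no escaping inside double-quotes.

```bash
#!/bin/bash
# Five representations of the same raw string (names a..e are illustrative).
a=12\ 34\ kla\#42#\!\'\ 2\ M\$\"\ rtqas\;::\#        # backslash escapes only
b="12 34 kla#42#!' 2 M\$\" rtqas;::#"               # double-quoted
c='12 34 kla#42#!'\'' 2 M$" rtqas;::#'              # single-quoted, escaped '
d='12 34 kla#42#!'"'"' 2 M$" rtqas;::#'             # single-quoted, "'" in the middle
e=$'12 34 kla#42#!\' 2 M$" rtqas;::#'               # ANSI-C quoting (bash)

# Compare every other form against the first one.
for candidate in "$b" "$c" "$d" "$e"; do
  if [[ $candidate == "$a" ]]; then
    echo "same"
  else
    echo "different"
  fi
done

printf '<<%s>>\n' "$a"
```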
Once the string is stored in a parameter/variable, you must put double-quotes around references to that variable. In most shell contexts, when a variable is used without quotes, the shell will split it into words (based on spaces or whatever's in IFS), and try to expand anything that looks like a file wildcard; you don't want this. But if it's in double-quotes, the variable gets expanded and no further parsing is done, it's just passed through unmolested.
Actually, you should almost always double-quote variable references in shell scripts even if you don't expect them to contain special characters. We see so many shell questions here that have the answer "if you double-quoted your variable references, you wouldn't have this problem"...
Here's an example, based on your script:
#!/bin/bash
printopt() {
printf '%s value is: <<%s>>\n' "$1" "$2" # Double-quotes required here
}
while getopts "a:k:" option
do
case "${option}" in # This is one of the few places it's safe to leave off double-quotes. But they don't hurt.
a)
printopt "-a" "${OPTARG}" # Double-quotes required here
;;
k)
printopt "-k" "${OPTARG}" # Double-quotes required here
;;
esac
done
And running it with various representations of strings:
$ ./argtest.sh -a 12\ 34\ kla\#42#\!\'\ 2\ M\$\"\ rtqas\;::\# -k "1 2 ad'ias"
-a value is: <<12 34 kla#42#!' 2 M$" rtqas;::#>>
-k value is: <<1 2 ad'ias>>
$ ./argtest.sh -a '12 34 kla#42#!'"'"' 2 M$" rtqas;::#' -k $'1 2 ad\'ias'
-a value is: <<12 34 kla#42#!' 2 M$" rtqas;::#>>
-k value is: <<1 2 ad'ias>>
Ok, ok, there are a few situations where it's more complicated than that:
There are some situations where a string will be run through the shell parsing process multiple times, such as when it's being run over ssh (the command gets processed by the local shell, passed to the remote computer, then processed by that shell and executed), or used as a shell alias (the alias command gets parsed, result stored, then parsed again when you use it). In these cases, you essentially need two (or possibly more) layers of quoting/escaping: take the raw string, quote/escape by any of the above methods, then take that string and quote/escape that (probably by a different method).
Some versions of echo will parse escape (backslash) sequences in the string (using different rules than the shell itself does), which can cause confusion. I recommend using printf instead when this might be an issue; the only problem is that it's more complex than echo: it doesn't just print its arguments, it uses the first argument as a format string which controls how the rest of the arguments are printed. See my examples in the script above.
If you are passing the string to another script that doesn't use double-quotes around its parameter & variable references, you are doomed. In this case, the only thing that can be done is to fix that other script.
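For the multiple-parsing case, one way to produce the extra layer of quoting in bash is printf's %q format, which emits a shell-quoted form of its argument. The sketch below uses a local bash -c as a stand-in for the second parser (ssh or an alias would play that role in practice); the variable names are only for illustration.

```bash
#!/bin/bash
raw="1 2 ad'ias"

# First layer: ask bash to produce a representation that survives one more parse.
requoted=$(printf '%q' "$raw")

# Second parse: the inner shell re-reads the command text and undoes that layer.
result=$(bash -c "printf '%s' $requoted")

printf 'raw:      <<%s>>\n' "$raw"
printf 'requoted: <<%s>>\n' "$requoted"
printf 'result:   <<%s>>\n' "$result"
```

If the round trip worked, result is identical to raw even though the text passed through the shell parser twice.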

How can I remove a newline (\n) at the end of a string?

The problem
I have multiple property lines in a single string separated by \n like this:
LINES2="Abc1.def=$SOME_VAR\nAbc2.def=SOMETHING_ELSE\n"$LINES
The LINES variable
might contain an undefined set of characters
may be empty. If it is empty, I want to avoid the trailing \n.
I am open for any command line utility (sed, tr, awk, ... you name it).
Attempts
I tried this to no avail
sed -z 's/\\n$//g' <<< $LINES2
I also had no luck with tr, since it does not accept regex.
Idea
There might be an approach to convert the \n to something else. But since $LINES can contain arbitrary characters, this might be dangerous.
Sources
I skim read through the following questions
How can I replace a newline (\n) using sed?
sed with literal string--not input file
Here's one solution:
LINES2="Abc1.def=$SOME_VAR"$'\n'"Abc2.def=SOMETHING_ELSE${LINES:+$'\n'$LINES}"
The syntax ${name:+value} means "insert value if the variable name exists and is not empty." So in this case, it inserts a newline followed by $LINES if $LINES is not empty, which seems to be precisely what you want.
I use $'\n' because "\n" is not a newline character. A more readable solution would be to define a shell variable whose value is a single newline.
It is not necessary to quote strings in shell assignment statements, since the right-hand side of an assignment does not undergo word-splitting nor glob expansion. Not quoting would make it easier to interpolate a $'\n'.
It is not usually advisable to use UPPER-CASE for shell variables because the shell and the OS use upper-case names for their own purposes. Your local variables should normally be lower case names.
So if I were not basing the answer on the command in the question, I would have written:
lines2=Abc1.def=$someVar$'\n'Abc2.def=SOMETHING_ELSE${lines:+$'\n'$lines}
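Here is a sketch of the more readable variant mentioned above, with the newline held in a variable (nl is a name chosen for this example) and both the empty and non-empty cases exercised. The property values are placeholders.

```bash
#!/bin/bash
nl=$'\n'            # a variable holding a single newline character
someVar=value1      # placeholder value for the illustration

# Case 1: lines is empty, so ${lines:+...} expands to nothing.
lines=""
empty_case="Abc1.def=$someVar${nl}Abc2.def=SOMETHING_ELSE${lines:+$nl$lines}"

# Case 2: lines is non-empty, so a newline plus its contents is appended.
lines="Abc3.def=more"
filled_case="Abc1.def=$someVar${nl}Abc2.def=SOMETHING_ELSE${lines:+$nl$lines}"

printf '[%s]\n' "$empty_case"
printf '[%s]\n' "$filled_case"
```

The square brackets make it easy to see that the first result does not end in a newline.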

Bash parsing and shell expansion

I'm confused in the way bash parses input and performs expansion.
For input say, \'"\"hello world\"" passed as argument in bash to a script that displays what its input is, I'm not exactly sure how Bash parses it.
Example,
var=\'"\"hello world\""
./displaywhatiget.sh "$var"
I got '"hello world"
I understand that the double quotes in "$var" tell bash to treat the value of var as a single unit. However, what I don't understand is when the backslash escaping and double-quote parsing of the value take place in bash's expansion process.
I have been reading the manual sections on shell operation and shell expansion.
All of the interesting things happen in the assignment, var=\'"\"hello world\"". Let's break it down:
\' - this is an escaped single-quote. Without the escape, it would start a single-quoted string, but escaped it's just a literal single-quote. Thus, the final string will start with '.
" - this starts a double-quoted string.
\" - an escaped double-quote; like the escaped single-quote, this gets treated as a literal double-quote, so " will be the second character of the final string.
hello world - since we're still in a double-quoted string, this just gets included literally in the final string. Note that if we weren't in double-quotes at this point, the space would've marked the end of the string.
\" - another escaped double-quote; again, included literally so the last character of the final string will be ".
" - this closes the double-quoted string.
Thus, var gets assigned the value '"hello world". In ./displaywhatiget.sh "$var", the double-quotes mean that $var gets replaced by var's value, but no further interpretation is done; that's just passed directly to the script.
UPDATE: When using set -vx, bash prints the assignment in a somewhat strange way. As I said in a comment, what it does is take the original command, parse it (as I described above) to figure out what it means, then back-translate that to get an equivalent command (i.e. one that'd have the same effect). The equivalent command it comes up with is var=''\''"hello world"'. Here's how that would be parsed:
'' - this is a zero-length single-quoted string; it has no effect whatsoever. I'm not sure why bash includes it. I'm tempted to call it a bug, but it's not actually wrong, just completely pointless. BTW, if you want an example of quote removal, here it is: in this command, these quotes would just be removed with no trace left.
\' - this is an escaped single-quote, just like in the original command. The final string will start with '.
' - this starts a single-quoted string. No interpretation at all is performed inside single-quotes, except for looking for the close-quote.
"hello world" - since we're in a single-quoted string, this just gets included literally in the final string, including the double-quotes and space.
' - this closes the single-quoted string.
so it gets the same value assigned to var, just written differently. Any of these would also have the same effect:
var=\''"hello world"'
var="'\"hello world\""
var=\'\"hello\ world\"
var="'"'"hello world"'
var=$'\'"hello world"'
...and many others. bash could technically have printed any of these under set -vx.
The parsing of the \-prefixed escape sequences happens on assignment:
var=\'"\"hello world\""
causes Bash to store the following literal in $var: '"hello world".
On later referencing $var inside a double-quoted string ("$var") the above literal becomes a literal part of that double-quoted string - no interpretation of the value of $var is performed at this point.
What double-quoted strings expand to is treated as a single word (argument) by the shell (after removing the enclosing double quotes, a process called quote removal).
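A quick way to see the single-word behavior is to count the arguments a command receives. In this sketch, count_args is a hypothetical helper written for the illustration; it is not part of the question's script.

```bash
#!/bin/bash
# Hypothetical helper: prints how many arguments it was given.
count_args() {
  echo "$#"
}

var=\'"\"hello world\""

quoted=$(count_args "$var")   # value passed through as one word
unquoted=$(count_args $var)   # value is word-split on the space

printf 'value:    %s\n' "$var"
printf 'quoted:   %s argument(s)\n' "$quoted"
printf 'unquoted: %s argument(s)\n' "$unquoted"
```

Note that the quote characters stored in the value are ordinary data by this point: without the double quotes the value is split at the space, and the embedded quotes do nothing to prevent that.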

bash difference between raw string and string in variable

I wrote a little script in bash, but it only worked when I stored the string as a variable, and I'd like to know why. Here's the summary:
When I use the string itself, bash treats it as a single entity
for word in "this is a sentence"; do
echo $word
done
# => this is a sentence
If I save the exact same string into a variable, bash iterates over the words
sentence="this is a sentence"
for word in $sentence; do
echo $word
done
# => this
# is
# a
# sentence
Why are these being treated differently?
Is there a simple way to iterate through the words in the string without first saving the string as a variable?
The quotes tell bash to treat a thing in quotes as a single parameter in a parameter list at the time the expression is evaluated. The quotes (unless protected with \ or ') are removed.
echo "" # prints a newline, no quotes
echo '""' # Print ""
export X='""'
env | grep X # X contains ""
export X=""
env | grep X # X is empty
When you use a variable, bash unpacks it as is (i.e. as if you typed the variable's contents in the variable's place). For a for-loop bash determines the list-elements to iterate over by separating the for-loop's parameters by whitespace, but treating (as always) quote-protected items as a single parameter/list-element. Your variable contained no quotes -- items are treated as separate parameters.
As comments suggested, quotes are important. A for loop will step through a list of values terminated by a semicolon, and that list is a set of strings. Unquoted strings are delimited usually by whitespace. Whitespace inside a quoted string does not separate the string from its brethren, it's simply part of the quoted string. There's some truly excellent documentation about quotes in bash at http://mywiki.wooledge.org/Quotes . Read it. Read it now. You'll find a part that says
The quotes are not actually passed along to the command. They are removed by the shell (this process is cleverly called "quote removal").
To step through the words in a sentence that's stored in a variable (if I've inferred your question correctly), you could perhaps use an array to separate the words by whitespace:
#!/bin/bash
sentence="this is a sentence"
IFS=" " read -a words <<< "$sentence"
for word in "${words[@]}"; do
echo "$word"
done
In bash, read -a will divide a string by $IFS and place the divided parts into elements of the array. See http://mywiki.wooledge.org/BashGuide/Arrays for more information about how bash arrays work.
If you want more details in pursuit of a specific problem, you might want to tell us what the problem is, or risk making this an XY problem.
In the assignment
sentence="this is a sentence"
there are no unquoted spaces, so everything to the right of the = is treated as a single word. (Something like sentence=this is a sentence would be parsed as a single assignment sentence=this followed by an attempt to run a program called is.) As a result, the value of sentence is a sequence of 18 characters. It is identical to
sentence=this\ is\ a\ sentence
because again, there are no unquoted spaces.
For the same reason
for word in "this is a sentence"; do
echo $word
done
has word being set to each word in the following sequence, which only contains a single word because there are no unquoted spaces.
The key difference with your other loop is that parameter expansions are subject to word-splitting after the fact. The loop
for word in $sentence; do
echo $word
done
after parameter expansion looks like
for word in this is a sentence; do
echo $word
done
so now word is set to each of the 4 words in the list following the in keyword.
It's not clear what you are actually asking at the end of your question, but the preceding is legal code. There is no requirement that a string be placed in quotes in bash; quotes do not define something as a string value, but simply escape every character that appears within the quotes. "foo" and \f\o\o are the same thing in shell.
Quoting turns any string into a single unit. If you drop the quotes, the shell splits the string into separate words and the loop iterates over each of them.
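For the second part of the question (iterating over the words without first saving the string in a variable), a minimal sketch might look like this. The array names words and parts exist only so the result can be inspected afterwards.

```bash
#!/bin/bash
# Option 1: write the words unquoted; each one becomes a separate list item.
words=()
for word in this is a sentence; do
  words+=("$word")
done

# Option 2: keep the literal quoted and let read split it on spaces.
IFS=' ' read -r -a parts <<< "this is a sentence"

printf 'loop saw %d words\n' "${#words[@]}"
printf 'read produced %d words\n' "${#parts[@]}"
```

The first form only works for plain words; if the text contains characters that are special to the shell, splitting a quoted literal with read is the safer choice.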

How to echo a string with any content in bash?

I'm having an extremely hard time figuring out how to echo this:
![alt text](https://github.com/adam-p/markdown-here/raw/master/src/common/images/icon48.png "Logo Title Text 1")
I keep getting this error:
bash: ![alt: event not found
Using double quotes around it does not work. Using single quotes around it does work; however, I also need to echo strings that have single quotes within them, and I wouldn't be able to wrap those strings in single quotes.
Is there a way to echo a string of ANY content?
Thanks.
EDIT: Here is some context. I am making a Markdown renderer that grabs the content of a code editor, then appends every line of the code individually into a text file. I am doing this by doing this:
echo TheLineOfMarkdown > textfile.txt
Unlike in many programming languages, '...' and "..." in Bash do not represent "strings" per se; they quote/escape whatever they contain, but they do not create boundaries that separate arguments. So, for example, these two commands are equivalent:
echo foobar
echo "fo"ob'ar'
So if you need to quote some of an argument with single-quotes, and a different part of the argument has to contain single-quotes — no problem.
For example:
echo '![alt text](https://... "What'"'"'s up, Doc?")'
Another option is to use \, which is similar to '...' except that it only quotes a single character. It can even be used inside double-quotes:
echo "\![alt text](https://... \"What's up, Doc?\")"
For more information, see §3.1.2 "Quoting" in the Bash Reference Manual.
! is annoying. My advice: Use \!.
! invokes history expansion, which is also performed inside double-quotes. So you need to single-quote the exclamation mark, but as you say that conflicts with the need to not single-quote other single-quotes.
Remember that you can mix quotes:
$ echo '!'"'"'"'
!'"
(That's just one argument.) But in this case, the backslash is easier to type and quite possibly more readable.
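Putting the mixed-quote approach together with the file-writing step from the question, a sketch could look like the following. It assumes a temporary file created with mktemp in place of textfile.txt, uses printf rather than echo so that backslashes in the line are not interpreted, and uses >> because > would overwrite the file on every line instead of appending.

```bash
#!/bin/bash
outfile=$(mktemp)   # stand-in for textfile.txt in this illustration

# Single-quoted pieces joined by a double-quoted apostrophe: one argument.
line='![alt text](https://... "What'"'"'s up, Doc?")'

# Append the line verbatim to the file.
printf '%s\n' "$line" >> "$outfile"

# Read it back to confirm nothing was altered on the way.
written=$(<"$outfile")
rm -f "$outfile"

printf '%s\n' "$written"
```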

Resources