Escaping rules when reading in bash - linux

Assume the file text is created as follows:
> cat > text
\abc
^D
The following two scripts generate different outputs and I don't understand why.
> s=$(cat text)
> echo $s
\abc
> cat text | while read line; do echo $line; done
abc
Why?

Without the -r option, read treats a backslash as an escape character: the backslash is removed and the character after it is kept literally, so \a becomes a. See the manual:
-r  If this option is given, backslash does not act as an escape
    character. The backslash is considered to be part of the line. In
    particular, a backslash-newline pair may not be used as a line
    continuation.
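A minimal sketch of the difference, assuming a throwaway temporary file in place of the question's text file (the variable names plain and raw are only illustrative):

```shell
tmp=$(mktemp)                      # stand-in for the file "text"
printf '%s\n' '\abc' > "$tmp"      # file contains a backslash followed by abc

# Without -r: the backslash escapes the next character and is dropped.
read plain < "$tmp"

# With -r: the backslash is kept as an ordinary character.
IFS= read -r raw < "$tmp"

printf 'without -r: %s\n' "$plain"   # without -r: abc
printf 'with -r:    %s\n' "$raw"     # with -r:    \abc
rm -f "$tmp"
```

In a loop, the usual form is while IFS= read -r line; do ...; done < file, which keeps both backslashes and leading whitespace intact.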

Related

Does the `-r` flag with `read` cause or NOT cause character escape?

I am very confused about the read -r flag, or the meaning of "escape" in this context. The manual says regarding this flag:
-r = do not allow backslashes to escape any characters
But this seems to me to be the OPPOSITE of what the flag does. For example, running:
read -d '' VAR <<EOF
This is the \t first line
This is the second line
EOF
echo $VAR
... gives:
This is the t first line
This is the second line
But that seems to me as though the 't' character has NOT been escaped by the backslash. Conversely, when I add the -r flag, I get the following:
This is the first line
This is the second line
... where it appears to me as though the 't' character HAS been escaped due to the -r flag. So am I misunderstanding the meaning of the word "escape", or misunderstanding something else going on here?
I strongly suspect your confusion is caused by the manner in which you are determining the final content of the string. When a backslash is treated as an escape character (e.g., when you do not use -r), \t is treated the same as a t. When it is not, \t is treated as the literal two characters \t. Consider:
$ cat a.sh
#!/bin/sh
read a << 'EOF'
a: Without -r: foo\tbar
EOF
read -r b << 'EOF'
b: With -r : foo\tbar
EOF
printf "a = %s\n" "$a"
printf "b = %s\n" "$b"
printf "printf interprets the string: $a\n"
printf "printf interprets the string: $b\n"
$ ./a.sh
a = a: Without -r: footbar
b = b: With -r : foo\tbar
printf interprets the string: a: Without -r: footbar
printf interprets the string: b: With -r : foo bar
Thanks to everyone for their input. OK, this is one of those pesky things in bash that is clearer to me now, but had me confused initially. Here's my summary understanding.
There are, in a sense, three strings at play here:
The string of characters input to the heredoc,
The string of characters output from the heredoc and input to read,
The string of characters output from read and input to VAR
The string of characters being fed into the heredoc is, of course, whatever you type between the delimiters. But the string of characters output by the heredoc will depend on its own rules (viz. on whether the delimiter is quoted or not).
Next, the string of characters output by the heredoc will go into read, but the string of characters output by read (and saved into VAR) will depend on the presence/absence of the -r flag. If the string of characters input to read contains backslashes, then read without -r will first process any such backslash-prefixed sequence, dropping the backslash and keeping the character after it, and then save the modified string into VAR.
But read -r will not attempt to interpret the backslashes, leaving the input text "as is" when outputting to VAR. Hence, the original \t is preserved with read -r and thus interpreted as a tab in the final echo $VAR.
My confusion primarily lay in my lack of discernment of the three separate strings of characters at play here (not echo vs printf).
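The three stages can be seen separately in a small sketch. The sample text a\\tb and the variable names are made up for illustration; the only things that vary are whether the heredoc delimiter is quoted and whether read gets -r:

```shell
# Unquoted delimiter: the heredoc itself turns \\ into \ before read sees it.
IFS= read -r unquoted <<EOF
a\\tb
EOF

# Quoted delimiter: the heredoc passes the text through unchanged.
IFS= read -r quoted <<'EOF'
a\\tb
EOF

# Unquoted delimiter and no -r: the heredoc leaves a\tb,
# then read drops the remaining backslash.
read both <<EOF
a\\tb
EOF

printf '%s\n' "$unquoted"   # a\tb
printf '%s\n' "$quoted"     # a\\tb
printf '%s\n' "$both"       # atb
```

Printing with printf '%s\n' shows exactly what each variable holds, with no further interpretation by the printing command.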
The escaping that a backslash does as input to read, is to prevent the next character from being treated as a separator:
$ read -r a b <<< 'foo\ bar'; printf "<%s> <%s>\n" "$a" "$b"
<foo\> <bar>
$ read a b <<< 'foo\ bar'; printf "<%s> <%s>\n" "$a" "$b"
<foo bar> <>
Without -r, backslashes are removed as part of the escape processing. With -r, they are kept as-is.
The \t turning into a hard tab is due to echo: some implementations do that by default, some don't.
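Because echo differs between shells, printf is a more predictable way to check what a variable really holds. A small sketch (the sample string is made up): %s prints the value untouched, while %b asks printf to expand backslash escapes in it.

```shell
s='foo\tbar'                      # single quotes: backslash and t are stored as two characters

literal=$(printf '%s' "$s")       # the value exactly as stored
expanded=$(printf '%b' "$s")      # printf expands \t into a tab character

printf '[%s]\n' "$literal"        # [foo\tbar]
printf '[%s]\n' "$expanded"       # foo and bar separated by a real tab
```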

How to extract string between quotes in Bash

I need to extract the string between quotation marks in a file.
For example: my file is called test.txt and it has the following content:
"Hello_World"
I am reading it as follows from bash:
string="$(head -1 test.txt)"
echo $string
This prints "Hello_World", but I need Hello_World.
Any help will be appreciated. Thanks.
You can do this in pure bash without having to spawn any external programs:
read -r line < test.txt; line=${line#\"}; line=${line%\"}; echo "$line"
The read reads in the entire first line, and the two assignments strip off one double quote at the start and one at the end of the line, if present.
I assumed you didn't want to strip out any quotes within the string itself so I've limited it to one at either end.
It also allows you to successfully read lines without a leading quote, trailing quote, or both.
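The behaviour on different inputs can be checked with a small sketch. The function name strip_outer_quotes and the sample strings are hypothetical, chosen only to exercise the two expansions:

```shell
strip_outer_quotes() {
    s=$1
    s=${s#\"}            # remove one leading double quote, if there is one
    s=${s%\"}            # remove one trailing double quote, if there is one
    printf '%s\n' "$s"
}

r1=$(strip_outer_quotes '"Hello_World"')
r2=$(strip_outer_quotes 'say "hi" now')
r3=$(strip_outer_quotes '"unbalanced')

printf '%s\n' "$r1"      # Hello_World
printf '%s\n' "$r2"      # say "hi" now
printf '%s\n' "$r3"      # unbalanced
```

Quotes inside the string survive because # and % only look at the very start and the very end of the value.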
You can use tr:
echo "$string" | tr -d '"'
From man tr:
DESCRIPTION
The tr utility copies the standard input to the standard output with substitution or deletion of selected characters.
The following options are available:
-C Complement the set of characters in string1, that is ``-C ab'' includes every character except for `a' and `b'.
-c Same as -C but complement the set of values in string1.
-d Delete characters in string1 from the input.
You can simply use sed to read the first line and filter out the " characters. Try the following command:
sed -n '1 s/"//gp' test.txt
Brief explanation,
-n: suppress automatic print
1: Match only the first line
s/"//gp: filter out ", and then print the line

Output variable in unix with new lines as \n

I need to output a variable value to a file in a unix script. My problem is that the variable contains multiple lines. I need those to be output as '\n' literals in the file (a java options file), but I'm using echo and they always get processed into real new lines.
echo "-dmyproperty=$MULTILINE_VAR" >> jvm.options
I've tried echo options like -e or -E but they don't seem to do anything. Can anyone help?
You can use bash parameter substitution with an ANSI-C quoted newline:
$ var="line1
line2
line3"
$ echo "${var//$'\n'/\\n}"
line1\nline2\nline3
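The ${var//pattern/replacement} form is not part of POSIX sh. Where the script may run under a plain sh, one possible alternative, sketched here with awk and not taken from the answer above, joins the lines with a literal backslash-n instead:

```shell
var='line1
line2
line3'

# Print each line without its newline; before every line except the first,
# emit the two characters \ and n.
escaped=$(printf '%s\n' "$var" |
    awk 'NR > 1 { printf "%s", "\\n" } { printf "%s", $0 }')

printf '%s\n' "-dmyproperty=$escaped"   # -dmyproperty=line1\nline2\nline3
```

Redirecting that last printf with >> jvm.options would append the line to the options file from the question; printf is used here so that no echo implementation gets a chance to turn \n back into a newline.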

Linux: Append variable to end of line using line number as variable

I am new to shell scripting. I am using ksh.
I have this particular line in my script which I use to append text in a variable q to the end of a particular line given by the variable a containing the line number.
sed -i ''$a's#$#'"$q"'#' test.txt
Now the variable q can contain a large amount of text, with all sorts of special characters, such as !##$%^&*()_+:"<>.,/;'[]= etc etc, no exceptions. For now, I use a couple of sed commands in my script to remove any ' and " in this text (sed "s/'/ /g" | sed 's/"/ /g'), but still when I execute the above command I get the following error
sed: -e expression #1, char 168: unterminated `s' command
Any sed, awk, perl, suggestions are very much appreciated
The difficulty here is to quote (escape) the substitution separator characters # in the sed command:
sed -i ''$a's#$#'"$q"'#' test.txt
For example, if q contains # it will not work. The # will terminate the replacement pattern prematurely. Example: q='a#b', a=2, and the command expands to
sed -i 2s#$#a#b# test.txt
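which will not append a#b to the end of line 2. The second # ends the replacement after a, and sed then tries to read b# as flags for the s command and reports an error.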
This can be solved by escaping the # characters in q:
sed -i 2s#$#a\#b# test.txt
However, this escaping could be cumbersome to do in shell.
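One way that escaping might be done is sketched below, assuming q contains no newline. Besides the # delimiter, a backslash and & are also special in a sed replacement, so all three get a backslash in front. The sample value of q and the temporary file standing in for test.txt are made up:

```shell
q='a#b & c\d'        # sample text with the delimiter, an ampersand and a backslash
a=2                  # line number to append to

# Put a backslash in front of every \, & and # in q.
q_escaped=$(printf '%s\n' "$q" | sed 's/[\\&#]/\\&/g')

tmp=$(mktemp)        # throwaway stand-in for test.txt
printf 'one\ntwo\nthree\n' > "$tmp"

# \$ keeps the shell from expanding $# inside the double quotes.
result=$(sed "${a}s#\$#${q_escaped}#" "$tmp")
line2=$(printf '%s\n' "$result" | sed -n '2p')

printf '%s\n' "$line2"   # twoa#b & c\d
rm -f "$tmp"
```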
Another approach is to use another level of indirection. Here is an example using a Perl one-liner. First, q is passed to the script as a quoted argument. Then, within the script, that argument is assigned to a new internal variable $q. With this approach there is no need to escape the substitution separator characters:
perl -pi -E 'BEGIN {$q = shift; $a = shift} s/$/$q/ if $. == $a' "$q" "$a" test.txt
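The same idea, handing the text over as data instead of splicing it into the program, can also be sketched in awk. Passing q through the environment (the name Q is arbitrary) avoids awk's -v option, which would process backslash escapes in the value. The piped sample input stands in for test.txt:

```shell
q='x#y & "z" \n'     # sample text; every character should arrive unchanged
a=2                  # line number to append to

out=$(printf 'one\ntwo\nthree\n' |
    Q="$q" awk -v n="$a" 'NR == n { $0 = $0 ENVIRON["Q"] } { print }')

line2=$(printf '%s\n' "$out" | sed -n '2p')
printf '%s\n' "$line2"   # twox#y & "z" \n
```

POSIX awk has no in-place editing, so to change a real file the output would have to go to a temporary file that is then moved over the original.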
Do not bother trying to sanitize the string. Just put it in a file, and use sed's r command to read it in:
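printf '%s\n' "$q" > tmpfile
sed -i -e "${a}r tmpfile" test.txt
Ah, but that creates an extra newline that you don't want, because r places the file's contents after line ${a} instead of at its end. You can remove it with:
sed -e "${a}r tmpfile" test.txt | awk -v n="$a" 'NR == n { printf "%s", $0; next } 1' > output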
Another approach is to use the patch utility if present in your system.
patch test.txt <<-EOF
${a}c
$(sed "${a}q;d" test.txt)$q
.
EOF
${a}c expands to the line number followed by c, which means the operation is a change to line ${a}.
The second line is the replacement text for the change. It is the original line concatenated with the added text.
The sole . marks the end of the replacement text, after which the change is applied.

What is the purpose of the -e flag in this script?

I got a script from - http://www.thegeekstuff.com/2010/06/bash-conditional-expression/
It is -
$ cat exist.sh
#! /bin/bash
file=$1
if [ -e $file ]
then
echo -e "File $file exists"
else
echo -e "File $file doesnt exists"
fi
$ ./exist.sh /usr/bin/boot.ini
File /usr/bin/boot.ini exists
I used the same code without -e on both echo commands and it works. So, what is the purpose of using -e there?
The -e flag enables interpretation of the following backslash-escaped
characters in each STRING:
\a alert (bell)
\b backspace
\c suppress trailing newline
\e escape
\f form feed
\n new line
\r carriage return
\t horizontal tab
\v vertical tab
\\ backslash
\NNN
the character whose ASCII code is NNN (octal); if NNN is not
a valid octal number, it is printed literally.
\xnnn
the character whose ASCII code is the hexadecimal value
nnn (one to three digits)
Source: http://ss64.com/bash/echo.html
-e enables interpretation of backslash escapes, but answering your question, about the purpose of it being there, it seems to be none at all. It can even be harmful. echo -e is useful if you want to include those backslashed characters in the string, but that is not the case in your example, unless $file has them, and then this can happen:
$ touch test\\test
$ ls
exist.sh test\test
$ ./exist.sh test\\test
File test est exists
Without the -e you get the correct file name. Of course, this is all academic because it's unlikely that files will contain backslashed entities, but then we can conclude those switches were put there with the express goal of confusing you.
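A sketch of a variant that avoids the problem altogether, assuming printf is acceptable in place of echo: the file name is passed as an argument to %s, so nothing in it is ever interpreted. The function name report and the scratch directory are hypothetical:

```shell
report() {   # hypothetical helper, not part of the original script
    if [ -e "$1" ]; then
        printf 'File %s exists\n' "$1"
    else
        printf 'File %s does not exist\n' "$1"
    fi
}

dir=$(mktemp -d)              # scratch directory for the demonstration
touch "$dir/test\\test"       # creates a file literally named test\test
msg=$(report "$dir/test\\test")

printf '%s\n' "$msg"          # the name is printed with its backslash intact
rm -rf "$dir"
```

Quoting "$1" inside the test also keeps names with spaces from being split, which the unquoted $file in the original script does not.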
