Easiest Shell for C++/C#/Java Programmers - linux

I am struggling a bit with bash programming, because I don't seem to understand the syntax rules. For example:
read confirm
if [ "$confirm" == "yes" ]; then
echo "Thank you for saying yes"
else
echo "Why did you say no?"
fi
In this code, you can use many forms to do the same thing:
"$confirm" == "yes"
$confirm == "yes"
"$confirm" == yes
$confirm == yes
So what is the rule? Besides that, the syntax is very strict: if you write the above if statement without the space between '[' and '"', you get an error. So my questions:
Can anybody give me an idea of the basic syntax rules of shell scripting? You can call it grammar if you want.
There are three different shell scripting languages as far as I know. Which one of them is the easiest to learn for programmers of languages like C++, C#, and Java?
Regards,
Rafid

The rules are simple but subtle. The examples you gave are not all equivalent; they have subtly different meanings. For a fairly good reference, read the Shell Command Language chapter of the POSIX standard, which covers POSIX shells. Most shells, certainly including bash, zsh and ksh, are POSIX shells and will implement at least what is listed there. Some shells may conform to earlier versions of the specification, or be similar but not conformant.
The primary rule that you will need to remember if you are learning Unix shell scripting is this: words are separated by spaces. Strictly speaking, the shell splits a command line on unquoted blanks (spaces and tabs), while the characters listed in the variable $IFS (by default space, tab and newline) control a later step, the splitting of unquoted expansion results; under normal circumstances both amount to whitespace.
If you write ["$a"="$b"] in bash, the shell reads the entire string as a single word, expands $a and $b in place, and tries to run the result as a command. Supposing the value of $a was a literal a and the value of $b was a literal b, the shell would attempt to execute a command called [a=b], which is a legal file name. The quotation marks were interpreted by the shell as special but the [ was not, because [ only names the test command when it is written as a separate word, and here it was not separated by spaces.
Almost everything you see and do in a shell is a command. The character [ isn't syntax; it's a command. Commands take arguments, separated by spaces, and what those arguments mean is up to the command, not the shell. In C, if ( a == b ) is handled entirely by the compiler, except for the runtime values of a and b. In bash, if [ "$a" == "$b" ] is first parsed by the shell, which expands the variables $a and $b, and then the command [ is executed. Sometimes this is a shell builtin command, sometimes it is literally a separate executable (look for /bin/[ or /usr/bin/[ on your system). This means that "$a" == "$b" ] is not interpreted by bash at all; it is a small domain-specific language interpreted by [, which is also known as test. In fact you can write if test "$a" == "$b" instead. In this form test does not require the closing ], but everything else is the same. (Note that == is a bash extension; the portable string-equality operator for test is =.) To see what test will do with these arguments, run help test or man test.
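A minimal bash sketch of that equivalence; same_string and same_string_test are hypothetical helper names. Both use =, the portable string-equality operator of test.

```shell
#!/usr/bin/env bash
# Hypothetical helper: `[` is a command, and `"$1" = "$2" ]` are its arguments.
same_string() {
  if [ "$1" = "$2" ]; then
    echo "equal"
  else
    echo "different"
  fi
}

# The same comparison written with the `test` command: no closing ] needed.
same_string_test() {
  if test "$1" = "$2"; then
    echo "equal"
  else
    echo "different"
  fi
}

same_string "yes" "yes"       # equal
same_string_test "yes" "no"   # different
```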
The other rule to remember when learning Unix shell scripting is this: variables are expanded first, commands are evaluated second. This means that if a variable contains a space, such as foo="a b", the shell sees the space only after the variable is expanded, so ls $foo by itself will complain that it cannot find the file a and that it cannot find the file b. To get the behavior you probably expect from other languages, you will almost always want to quote your variables: ls "$foo" tells the shell to treat the expanded value as a single word and not to split it again.
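A quick way to watch the expand-then-split order, assuming bash; count_args is a hypothetical helper that just reports how many arguments it received.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of arguments received.
count_args() {
  echo "$#"
}

foo="a b"
count_args $foo     # unquoted: expanded, then split on the space -> 2
count_args "$foo"   # quoted: the expanded value stays one word -> 1
```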
Shell scripting is full of oddities, but it is not irrational (at least not most of the time). Some historical warts do exist, but there are really not very many rules to remember once you get the hang of the basics. Just do not expect it to operate like a conventional C-like language and you won't be too surprised.

One problem is that if you write:
if [ $foo == "bar" ] ...
and $foo is empty, you get an error, because after expansion test only sees == and "bar". That can be solved with, for example:
# prepend both sides with an 'x' (or any other character)
if [ x$foo == "xbar" ] ...
# or enclose the variable in quotes (quoting works in any POSIX shell, though portable scripts should use = instead of ==)
if [ "$foo" == "bar" ] ...
Putting the variable inside quotes also makes sure that whitespace is preserved for the comparison.
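The difference shows up in the exit status. This sketch assumes bash's builtin [, which reports a syntax error with status 2, as opposed to 1 for a comparison that is merely false.

```shell
#!/usr/bin/env bash
foo=""

# Unquoted: the empty value disappears, so test only sees `== bar`
# and reports an error ("unary operator expected") with status 2.
unquoted_status=0
[ $foo == "bar" ] 2>/dev/null || unquoted_status=$?

# Quoted: test sees `"" == bar`, a valid comparison that is simply false (1).
quoted_status=0
[ "$foo" == "bar" ] || quoted_status=$?

echo "unquoted: $unquoted_status, quoted: $quoted_status"
```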
Today there are other, more advanced ways to write expressions in if statements, for example double brackets:
if [[ $foo == bar ]]
see this SO question for more details about that.
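A short bash sketch of how [[ ]] behaves differently: it does no word splitting on the variable, so an empty unquoted value is fine, and an unquoted right-hand side of == is a glob pattern while a quoted one is compared literally.

```shell
#!/usr/bin/env bash
foo=""
# No word splitting inside [[ ]], so the empty unquoted variable is fine.
if [[ $foo == bar ]]; then r1="match"; else r1="no match"; fi

foo="barbecue"
# Unquoted right-hand side: treated as a glob pattern.
if [[ $foo == bar* ]]; then r2="match"; else r2="no match"; fi
# Quoted right-hand side: compared literally.
if [[ $foo == "bar*" ]]; then r3="match"; else r3="no match"; fi

echo "$r1 / $r2 / $r3"   # no match / match / no match
```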
As for choosing which scripting dialect to learn, I'd suggest learning bash, as it is by far the most common one. If you want to write portable scripts, limit yourself to the older 'sh' dialect, which isn't as advanced but should be supported by almost all Unix shells.
The C shell might be easier syntax-wise for C-style programmers to learn (since its syntax matches C more closely), but it is less common in the wild.

The most basic grammar rule in bash is:
<command> <space>+ <arguments>
That is, the command and arguments must be separated by one or more spaces. [ is a command, so omitting the space after it is an error.
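To see the rule in action, here is a small bash sketch (the variable confirm is borrowed from the question): without the space, the shell looks for a command literally named [yes.

```shell
#!/usr/bin/env bash
confirm="yes"

# No space: the shell builds the single word `[yes` and tries to run it
# as a command; none exists, so the status is 127 (command not found).
status=0
["$confirm" == "yes" ] 2>/dev/null || status=$?
echo "no space: status $status"

# With the space, `[` is its own word and test runs normally.
if [ "$confirm" == "yes" ]; then
  echo "with space: test succeeded"
fi
```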
As for which is "better", I echo j_random_hacker's comment:
All the shell languages are horrible, though some are more horrible than others, and all are better than Windows's cmd.exe. My advice is to stick with bash, which is the most popular.

Here are some useful links for bash tips:
Bash scripting “common gotchas” for Python/Perl/Ruby programmers (SO question)
BASH Frequently Asked Questions
Bash Pitfalls
Official Bash FAQ

To be really safe, you should use:
"$confirm" == "yes"
because the quotes make bash treat each value as a single string. Otherwise the expanded value is subject to word splitting and filename expansion, which may give unexpected results and/or syntax errors.
As for the syntax error when you omit the space between the opening bracket and the quote: the opening bracket [ is just another name for the test shell builtin. So your condition can also be written as:
read confirm
if test "$confirm" == "yes"; then
echo "Thank you for saying yes"
else
echo "Why did you say no?"
fi
In fact, on some systems there is a file named [ in /usr/bin that is linked to the test command (for shells that don't provide [ as a builtin).

All the shell languages are horrible, though some are more horrible than others, and all are better than Windows's cmd.exe. My advice is to stick with bash, which is the most popular. man bash should give you the (enormous, convoluted) syntax rules.

I suggest you check out tcsh. Its syntax is more similar to C.

For more regarding [[, test and [ see these answers:
Test for non-zero length string in bash: [ -n "$var" ] or [ "$var" ]
bash: double or single bracket, parentheses, curly braces
What's the difference between [ and [[ in bash?

Related

When processing a string with Bash, how can I treat a comma differently depending on whether it's surrounded by specific characters?

I would like to transform a MySQL script into a JSON file and was asked to use Bash for it.
I wrote a simple shell script:
#!/bin/bash
# I know this script just outputs each entry with its value, because I haven't gone any further
for filename in $dir/home/*.sql
do
cat $filename | while read line
do
names=${line%values*}
names=${names#*(}
names=${names%)*}
values=${line#*values(}
values=${values%)*}
while [[ $names != $currentname ]]
do
currentname=${names%%,*}
currentvalue=${values%%,*}
echo $currentname
echo $currentvalue
names=${names#*,}
values=${values#*,}
done
done
done
I have basically been able to fulfill the requirement. However, there is one more problem.
Some of the string entries have commas among their characters.
My script treats these commas as value separators, so a string containing a comma is split into two different strings.
This would be easy to solve in a programming language like C++, but I have been asked to do it only with a bash shell script, and I am not familiar with bash, so now I am stuck. Maybe a regular expression would be the cure? If there are other approaches, please help with those too.
FYI, here is an example of the problem:
Input:
values(100, 'A100', 'A,100');
Expected output:
100
'A100'
'A,100'
Actual current output:
100
'A100'
'A
100'
Something like this may help:
data="values(100, 'A100', 'A,100');"
json=${data//values(}
json=${json//);}
json=${json//, /$'\n'}
echo "$json"
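Output:
100
'A100'
'A,100'
This only works because the values are separated by a comma followed by a space, while the comma inside 'A,100' is not followed by a space.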
Typically in shell you would match it with a regex (GNU sed syntax):
echo "values(100, 'A100', 'A,100');" | sed 's/values(//; s/\(, \|);\)/\n/g'
but this does not solve the problem in general: a quoted value that itself contains ", " would still be split.
The best and only solution is to write a real parser for the real MySQL language, one that handles '' ' ' 'all\tcorner\'cases' properly. Read the input char by char, store state (e.g. whether you are inside a quotation or not), and handle '\'' and other escape sequences such as \n as needed to extract each field. You might be interested in MySQL's internal lexer (it's big!) and in the lex and yacc programs.
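A minimal sketch of that character-by-character approach, kept in bash since the question requires it. It assumes values are separated by commas and that strings use single quotes, with an embedded quote written either as '' or \'. split_values is a hypothetical name; it does not handle double-quoted strings or comments, and a bash loop over characters will be slow on large dumps.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each comma-separated value on its own line,
# ignoring commas that appear inside '...' strings.
split_values() {
  local s=$1 out="" c i
  local in_quote=0 escaped=0
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    if [[ $escaped == 1 ]]; then
      out+=$c; escaped=0               # char after a backslash: keep as-is
    elif [[ $in_quote == 1 ]]; then
      case $c in
        \\) out+=$c; escaped=1 ;;      # \' must not close the string
        \') out+=$c; in_quote=0 ;;     # closing quote ('' reopens at once)
        *)  out+=$c ;;
      esac
    else
      case $c in
        \') out+=$c; in_quote=1 ;;
        ,)  printf '%s\n' "$out"; out="" ;;          # separator outside quotes
        ' ') if [[ -n $out ]]; then out+=$c; fi ;;   # drop leading blanks
        *)  out+=$c ;;
      esac
    fi
  done
  printf '%s\n' "$out"
}

line="values(100, 'A100', 'A,100');"
inner=${line#*values\(}   # strip up to and including "values("
inner=${inner%\)\;}       # strip the trailing ");"
split_values "$inner"
```

The state is just two flags (inside a string, after a backslash), which is the minimum needed to tell a separator comma from a comma that belongs to a value.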
Check your scripts with http://shellcheck.net. Read https://mywiki.wooledge.org/BashFAQ/001. Quote variable expansions. Don't get nominated for the Useless Use of Cat Award.
and was asked to use Bash for it.
Bash is a shell: its primary role is to run other programs and connect them with each other. It is not a full-blown programming language, so writing real programming logic in it is either very hard or ends up calling external programs, since that's what it's for. Write the parser in another language and use bash to run it. If you're comfortable in C++, write it in C++ inside a bash script, then compile and execute it from that script.
A common approach is to use a regex for this, yes; it is, for example, a requirement for parsing CSV files. But you can also parse the line piece by piece, as in your attempt.
However, you have a number of quoting errors that would prevent your code from working even if you figured out a way to parse the input the way you want to. (And of course, get rid of the useless use of cat.)
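while read -r line; do
  # Only handle lines of the form ...values(...);
  case $line in
    *values\(*\)\; ) ;;
    *) continue ;;
  esac
  line=${line#*values\(}    # drop everything up to and including "values("
  line=${line%\)\;}         # drop the trailing ");"
  while [ "$line" ]; do
    case $line in
      \'*)
        # Quoted value: keep everything up to the closing quote, commas included
        line=${line#\'}
        tail=${line#*\'}
        value=\'${line%"$tail"}
        line=${tail#,}
        line=${line# } ;;
      *,*)
        # Unquoted value followed by more values
        value=${line%%,*}
        line=${line#*,}
        line=${line# } ;;
      *)
        # Last unquoted value: without this branch the loop would never end
        value=$line
        line= ;;
    esac
    printf '%s\n' "$value"
  done
done <"$filename"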
This is probably not really the way to go, just a hint if you really want to try to tackle this in Bash. I would write a simple parser in Python if I wanted to cover all bases.

Understanding the dollar in shell with pwd and awk [duplicate]

This question already has answers here:
Backticks vs braces in Bash
(3 answers)
Brackets ${}, $(), $[] difference and usage in bash
(1 answer)
Closed 4 years ago.
I have two questions and could use some help understanding them.
What is the difference between ${} and $()? I understand that () means running a command in a separate shell and that placing $ in front means passing the value to a variable. Can someone help me understand this? Please correct me if I am wrong.
If we can use for ((i=0;i<10;i++)); do echo $i; done and it works fine then why can't I use it as while ((i=0;i<10;i++)); do echo $i; done? What is the difference in execution cycle for both?
The syntax is token-level, so the meaning of the dollar sign depends on the token it's in. The expression $(command) is a modern synonym for `command` (backticks); both perform command substitution, which means: run command and put its output here. So
echo "Today is $(date). A fine day."
will run the date command and include its output in the argument to echo. The parentheses are unrelated to the syntax for running a command in a subshell, although they have something in common (the command substitution also runs in a separate subshell).
By contrast, ${variable} is just a disambiguation mechanism, so you can say ${var}text when you mean the contents of the variable var, followed by text (as opposed to $vartext which means the contents of the variable vartext).
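A small bash sketch contrasting the two; the variable names are illustrative. The last part shows the subshell point made above: an assignment inside $( ) does not reach the calling shell.

```shell
#!/usr/bin/env bash
var=Hello
varWorld=Other

# Parameter expansion: the braces mark where the variable name ends.
with_braces="${var}World"    # HelloWorld
without_braces="$varWorld"   # Other: the name read is "varWorld"

# Command substitution: run a command and insert its standard output
# (trailing newlines are removed).
shouted=$(printf '%s\n' "$var" | tr '[:lower:]' '[:upper:]')   # HELLO

# The substitution runs in a subshell, so its assignments stay there.
count=0
inner=$(count=5; echo "$count")   # inner is 5, count is still 0

echo "$with_braces $without_braces $shouted $inner $count"
```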
The while loop expects a command whose exit status counts as true or false (or actually a list of commands, where the last one's status is examined; thanks to Jonathan Leffler for pointing this out); when it's false, the loop is no longer executed. The for loop iterates over a list of items and binds each to a loop variable in turn; the syntax you refer to is one (rather generalized) way to express a loop over a range of arithmetic values.
A for loop like that can be rephrased as a while loop. The expression
for ((init; check; step)); do
    body
done
is equivalent to
((init))
while ((check)); do
    body
    ((step))
done
It makes sense to keep all the loop control in one place for legibility; but as you can see when it's expressed like this, the for loop does quite a bit more than the while loop.
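Both forms can be checked against each other in bash; for_version and while_version are hypothetical function names.

```shell
#!/usr/bin/env bash
# Arithmetic for loop: init, check and step kept in one place.
for_version() {
  local i
  for (( i = 0; i < 3; i++ )); do
    echo "$i"
  done
}

# The same loop as a while: init and step are written out by hand.
while_version() {
  local i=0
  while (( i < 3 )); do
    echo "$i"
    i=$(( i + 1 ))
  done
}

for_version
while_version
```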
Of course, this syntax is Bash-specific; classic Bourne shell only has
for variable in token1 token2 ...; do
(Somewhat more elegantly, you could avoid the echo in the first example as long as you are sure that your argument string doesn't contain any % format codes:
date +'Today is %c. A fine day.'
Avoiding a process where you can is an important consideration, even though it doesn't make a lot of difference in this isolated example.)
$() means: "first run this command and substitute its output, then evaluate the rest of the line".
Ex :
echo $(pwd)/myFile.txt
will be interpreted as
echo /my/path/myFile.txt
On the other hand ${} expands a variable.
Ex:
MY_VAR=toto
echo ${MY_VAR}/myFile.txt
will be interpreted as
echo toto/myFile.txt
Why can't I use it as bash$ while ((i=0;i<10;i++)); do echo $i; done
I'm afraid the answer is just that the bash syntax for while just isn't the same as the syntax for for.
Your understanding is right. For detailed info on ${}, see the Bash reference manual section on parameter expansion.
'for' and 'while' have different syntax and offer different styles of programmer control over an iteration; most non-assembly languages make the same distinction.
With while, you would probably write i=0; while [ $i -lt 10 ]; do echo $i; i=$(( i + 1 )); done and, in essence, manage everything about the iteration yourself.

Shell injection - is this secure?

I am solving an issue I have with poorly performing Flash in Firefox under Linux.
I would like to know if the following code is secure. The input is untrusted, and I get the feeling that it could be dangerous if not sanitized.
#!/bin/bash
#in="vlc://www.youtube.com/watch?v=yVpbFMhOAwE"
in=$1;
out=`echo $in | sed -r 's/vlc:\/\/www\.youtube\.com\/watch\?v=([-_a-zA-Z0-9]*).*$/\1/g'`;
vlc "http://www.youtube.com/watch?v=$out";
Edit 1: based on Jan Hudec's comments, I have come up with this:
#!/bin/bash
#in="vlc://www.youtube.com/watch?v=yVpbFMhOAwE"
in=$1;
if [ `expr "$in" : '^vlc://www.youtube.com/watch?v=[-_a-zA-Z0-9]*$'` -gt 0 ]
then
vlc "http${in:3}";
fi
Edit 2 (likely final):
#!/bin/bash
#in="vlc://www.youtube.com/watch?v=yVpbFMhOAwE"
in=$1;
if expr "$in" : '^vlc://www.youtube.com/watch?v=[-_a-zA-Z0-9]*$' >/dev/null
then
vlc "http${in:3}";
fi
I don't think this particular script can actually be exploited to do anything evil, but only as long as vlc won't do anything evil with a malformed URL that starts with the correct YouTube host but contains funny stuff, because the sanitization is totally ineffective.
There are 3 mistakes that together mean almost anything can be passed to vlc and some information about the system may be found using it:
The first problem is the echo command. echo is the most irregular command of the Unix shell, behaving very differently in each shell. Use printf if you need to pass a parameter to the standard input of a command unmodified.
The second problem is that its argument is not quoted (you still have to quote variables in command substitution), so it will undergo word splitting and filename generation (globbing). The latter could be abused to get some information about the system. As long as the command only has local effects this is of little use to an attacker, but beware of similar mistakes in anything network-observable. Also, some shells (but not bash) might allow side effects from filename generation, in which case it would become dangerous.
Last, sed will just return the content of $in when it does not match, and you then pass it straight to vlc. It is properly quoted there, so vlc won't interpret it as separate URLs or options, only as one funny, invalid URL. So it is unlikely to be exploitable unless it triggers some serious bug in vlc.
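A sketch of the second problem, assuming bash, an available mktemp -d, and a temporary directory path without spaces. The input here is a harmless glob, but the unquoted echo still reveals which files exist.

```shell
#!/usr/bin/env bash
# Assumption: mktemp -d exists (GNU coreutils, BSD, macOS) and its path has no spaces.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.txt" "$demo_dir/b.txt"

in="$demo_dir/*"               # "untrusted" input that happens to be a glob
unquoted=$(echo $in)           # deliberately unquoted: split and globbed
quoted=$(printf '%s\n' "$in")  # passed through unchanged

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
rm -r "$demo_dir"
```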
I think you should:
Check whether the argument is in the appropriate format, using the expr command with properly quoted arguments.
Abort with an error if it is not.
Then simply remove the prefix with ${in#*=}, because you have already checked that the prefix is there and ends with =.
In bash, there's no need to use expr:
regex='^vlc://www\.youtube\.com/watch\?v=[-_a-zA-Z0-9]*$'
if [[ $in =~ $regex ]]; then
    vlc "http${in:3}"
else
    echo "Invalid URL: $in" >&2
    exit 1
fi
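If you also want the video id itself, bash stores the capture groups from =~ in the BASH_REMATCH array. A sketch with youtube_id as a hypothetical function name; unlike the pattern above it uses + so that an empty id is rejected.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the video id from a vlc:// URL, or fail.
youtube_id() {
  local regex='^vlc://www\.youtube\.com/watch\?v=([-_a-zA-Z0-9]+)$'
  if [[ $1 =~ $regex ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"   # first capture group
  else
    printf 'rejected: %s\n' "$1" >&2
    return 1
  fi
}

if id=$(youtube_id "vlc://www.youtube.com/watch?v=yVpbFMhOAwE"); then
  echo "would run: vlc \"http://www.youtube.com/watch?v=$id\""
fi
```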

How do you pass on filenames to other programs correctly in bash scripts?

What idiom should one use in Bash scripts (no Perl, Python, and such please) to build up a command line for another program out of the script's arguments while handling filenames correctly?
By correctly, I mean handling filenames with spaces or odd characters without inadvertently causing the other program to handle them as separate arguments (or, in the case of < or > — which are, after all, valid if unfortunate filename characters if properly escaped — doing something even worse).
Here's a made-up example of what I mean, in a form that doesn't handle filenames correctly: Let's assume this script (foo) builds up a command line for a command (bar, assumed to be in the path) by taking all of foo's input arguments and moving anything that looks like a flag to the front, and then invoking bar:
#!/bin/bash
# This is clearly wrong
FILES=
FLAGS=
for ARG in "$@"; do
echo "foo: Handling $ARG"
if [ x${ARG:0:1} = "x-" ]; then
# Looks like a flag, add it to the flags string
FLAGS="$FLAGS $ARG"
else
# Looks like a file, add it to the files string
FILES="$FILES $ARG"
fi
done
# Call bar with the flags and files (we don't care that they'll
# have an extra space or two)
CMD="bar $FLAGS $FILES"
echo "Issuing: $CMD"
$CMD
(Note that this is just an example; there are lots of other situations where one needs to do this and that to a bunch of args and then pass them on to other programs.)
In a naive scenario with simple filenames, that works great. But if we assume a directory containing the files
one
two
three and a half
four < five
then of course the command foo * fails miserably in its task:
foo: Handling four < five
foo: Handling one
foo: Handling three and a half
foo: Handling two
Issuing: bar four < five one three and a half two
If we actually allow foo to issue that command, well, the results won't be what we're expecting.
Previously I've tried to handle this through the simple expedient of ensuring that there are quotes around each filename, but I've (very) quickly learned that that is not the correct approach. :-)
So what is? Constraints:
1. I want to keep the idiom as simple as possible (not least so I can remember it).
2. I'm looking for a general-purpose idiom, hence my making up the bar program and the contrived example above instead of using a real scenario where people might easily (and reasonably) go down the route of trying to use features in the target program.
3. I want to stick to Bash script, I don't want to call out to Perl, Python, etc.
4. I'm fine with relying on (other) standard *nix utilities, like xargs, sed, or tr provided we don't get too obtuse (see #1 above). (Apologies to Perl, Python, etc. programmers who think #3 and #4 combine to draw an arbitrary distinction.)
5. If it matters, the target program might also be a Bash script, or might not. I wouldn't expect it to matter...
6. I don't just want to handle spaces, I want to handle weird characters correctly as well.
7. I'm not bothered if it doesn't handle filenames with embedded nul characters (literally character code 0). If someone's managed to create one in their filesystem, I'm not worried about handling it, they've tried really hard to mess things up.
Thanks in advance, folks.
Edit: Ignacio Vazquez-Abrams pointed me to Bash FAQ entry #50, which after some reading and experimentation seems to indicate that one way is to use Bash arrays:
#!/bin/bash
# This appears to work, using Bash arrays
# Start with blank arrays
FILES=()
FLAGS=()
for ARG in "$@"; do
echo "foo: Handling $ARG"
if [ x${ARG:0:1} = "x-" ]; then
# Looks like a flag, add it to the flags array
FLAGS+=("$ARG")
else
# Looks like a file, add it to the files array
FILES+=("$ARG")
fi
done
# Call bar with the flags and files
echo "Issuing (but properly delimited, not exactly as this appears): bar ${FLAGS[@]} ${FILES[@]}"
bar "${FLAGS[@]}" "${FILES[@]}"
Is that correct and reasonable? Or am I relying on something environmental above that will bite me later? It seems to work and it ticks all the other boxes for me (simple, easy to remember, etc.). It does appear to rely on a relatively recent Bash feature (FAQ entry #50 mentions v3.1, but I wasn't sure whether that referred to arrays in general or to some of the syntax they were using with them), but I think it's likely I'll only be dealing with versions that have it.
(If the above is correct and you want to un-delete your answer, Ignacio, I'll accept it provided I haven't accepted any others yet, although I stand by my statement about link-only answers.)
Why do you want to "build up" a command? Add the files and flags to arrays using proper quoting and issue the command directly using the quoted arrays as arguments.
Selected lines from your script (omitting unchanged ones):
if [[ ${ARG:0:1} == - ]]; then # using a Bash idiom
FLAGS+=("$ARG") # add an element to an array
FILES+=("$ARG")
echo "Issuing: bar \"${FLAGS[@]}\" \"${FILES[@]}\""
bar "${FLAGS[@]}" "${FILES[@]}"
For a quick demo of using arrays in this manner:
$ a=(aaa 'bbb ccc' ddd); for arg in "${a[@]}"; do echo "..${arg}.."; done
Output:
..aaa..
..bbb ccc..
..ddd..
Please see BashFAQ/050 regarding putting commands in variables. The reason that your script doesn't work is that there's no way to quote the arguments within a quoted string. If you were to put quotes there, they would be considered part of the string itself instead of delimiters. With the arguments left unquoted, word splitting is done and arguments that include spaces are seen as more than one argument. Arguments with "<", ">" or "|" are not a problem in any case, since the shell recognizes redirection and pipe operators while parsing, before variable expansion, so characters produced by an expansion are just characters in a string.
By putting the arguments (filenames) in an array, spaces, newlines, etc., are preserved. By quoting the array variable when it's passed as an argument, they are preserved on the way to the consuming program.
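A quick check of that claim in bash, using file names from the question; count_args is a hypothetical helper that reports how many arguments it received.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of arguments received.
count_args() { echo "$#"; }

files=(one "three and a half" "four < five")

as_string="${files[*]}"     # all elements joined into one string
count_args $as_string       # 8: split on every space
count_args "${files[@]}"    # 3: one argument per array element
```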
Some additional notes:
Use lowercase (or mixed case) variable names to reduce the chance that they will collide with the shell's builtin variables.
If you use single square brackets for conditionals in any modern shell, the archaic "x" idiom is no longer necessary if you quote the variables (see my answer here). However, in Bash, use double brackets. They provide additional features (see my answer here).
Use getopts as Let_Me_Be suggested. Your script, though I know it's only an example, will not be able to handle switches that take arguments.
This: for ARG in "$@" can be shortened to: for ARG (but I prefer the readability of the more explicit version).
See BashFAQ #50 (and also maybe #35 on option parsing). For the scenario you describe, where you're building a command dynamically, the best option is to use arrays rather than simple strings, as they won't lose track of where the word boundaries are. The general rules are: to create an array, instead of VAR="foo bar baz", use VAR=("foo" "bar" "baz"); to use the array, instead of $VAR, use "${VAR[@]}". Here's a working version of your example script using this method:
#!/bin/bash
# Working version, using arrays
FILES=()
FLAGS=()
for ARG in "$@"; do
echo "foo: Handling $ARG"
if [ "x${ARG:0:1}" = "x-" ]; then
# Looks like a flag, add it to the flags array
FLAGS=("${FLAGS[@]}" "$ARG") # FLAGS+=("$ARG") would also work in bash 3.1+, as Dennis pointed out
else
# Looks like a file, add it to the files array
FILES=("${FILES[@]}" "$ARG")
fi
done
# Build the command as an array and run it; each element stays one argument
CMD=("bar" "${FLAGS[@]}" "${FILES[@]}")
echo "Issuing: ${CMD[*]}"
"${CMD[@]}"
Note that in the echo command I used "${VAR[*]}" instead of the [@] form because there's no need/point to preserving word breaks here. If you wanted to print/record the command in unambiguous form, this would be a lot messier.
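One way to get the unambiguous form, assuming bash: printf %q quotes each element so that the shell would read it back as the same word. The sketch round-trips the logged text through eval only to show that nothing was lost; the exact quoting style %q produces may differ between bash versions.

```shell
#!/usr/bin/env bash
cmd=(bar "three and a half" "four < five")

# %q escapes each element so it re-reads as exactly one word.
logged=$(printf '%q ' "${cmd[@]}")
echo "Issuing: $logged"

# Round trip: re-parsing the logged text gives back the same three words.
eval "reparsed=($logged)"
echo "${#reparsed[@]} words, last: ${reparsed[2]}"
```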
Also, this gives you no way to build up redirections or other special shell options in the built command -- if you add >outfile to the FILES array, it'll be treated as just another command argument, not a shell redirection. If you need to programmatically build these, be prepared for headaches.
getopts should be able to handle spaces in arguments correctly ("file name.txt"). Weird characters should work as well, assuming they are correctly escaped (ls -b).
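A minimal getopts sketch; parse_args and its options -v and -o FILE are hypothetical. getopts stops at the first argument that is not an option, so unlike the question's script the flags must come before the file names.

```shell
#!/usr/bin/env bash
# Hypothetical parser: -v sets a flag, -o takes an argument.
parse_args() {
  local OPTIND=1 opt verbose=0 output=""
  while getopts 'vo:' opt; do
    case $opt in
      v) verbose=1 ;;
      o) output=$OPTARG ;;   # option argument, spaces and all
      *) return 2 ;;         # getopts has already printed an error
    esac
  done
  shift $(( OPTIND - 1 ))    # what remains are the operands (files)
  printf 'verbose=%s output=%s\n' "$verbose" "$output"
  local f
  for f in "$@"; do
    printf 'file: %s\n' "$f"
  done
}

parse_args -v -o "out file.txt" "three and a half" "four < five"
```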

Resources