How do you pass on filenames to other programs correctly in bash scripts? - linux

What idiom should one use in Bash scripts (no Perl, Python, and such please) to build up a command line for another program out of the script's arguments while handling filenames correctly?
By correctly, I mean handling filenames with spaces or odd characters without inadvertently causing the other program to handle them as separate arguments (or, in the case of < or > — which are, after all, valid (if unfortunate) filename characters when properly escaped — doing something even worse).
Here's a made-up example of what I mean, in a form that doesn't handle filenames correctly: Let's assume this script (foo) builds up a command line for a command (bar, assumed to be in the path) by taking all of foo's input arguments and moving anything that looks like a flag to the front, and then invoking bar:
#!/bin/bash
# This is clearly wrong
FILES=
FLAGS=
for ARG in "$#"; do
echo "foo: Handling $ARG"
if [ x${ARG:0:1} = "x-" ]; then
# Looks like a flag, add it to the flags string
FLAGS="$FLAGS $ARG"
else
# Looks like a file, add it to the files string
FILES="$FILES $ARG"
fi
done
# Call bar with the flags and files (we don't care that they'll
# have an extra space or two)
CMD="bar $FLAGS $FILES"
echo "Issuing: $CMD"
$CMD
(Note that this is just an example; there are lots of other times one needs to do this and that to a bunch of args and then pass them on to other programs.)
In a naive scenario with simple filenames, that works great. But if we assume a directory containing the files
one
two
three and a half
four < five
then of course the command foo * fails miserably in its task:
foo: Handling four < five
foo: Handling one
foo: Handling three and a half
foo: Handling two
Issuing: bar four < five one three and a half two
If we actually allow foo to issue that command, well, the results won't be what we're expecting.
Previously I've tried to handle this through the simple expedient of ensuring that there are quotes around each filename, but I've (very) quickly learned that that is not the correct approach. :-)
So what is? Constraints:
1. I want to keep the idiom as simple as possible (not least so I can remember it).
2. I'm looking for a general-purpose idiom, hence my making up the bar program and the contrived example above instead of using a real scenario where people might easily (and reasonably) go down the route of trying to use features in the target program.
3. I want to stick to Bash script, I don't want to call out to Perl, Python, etc.
4. I'm fine with relying on (other) standard *nix utilities, like xargs, sed, or tr provided we don't get too obtuse (see #1 above). (Apologies to Perl, Python, etc. programmers who think #3 and #4 combine to draw an arbitrary distinction.)
5. If it matters, the target program might also be a Bash script, or might not. I wouldn't expect it to matter...
6. I don't just want to handle spaces, I want to handle weird characters correctly as well.
7. I'm not bothered if it doesn't handle filenames with embedded nul characters (literally character code 0). If someone's managed to create one in their filesystem, I'm not worried about handling it, they've tried really hard to mess things up.
Thanks in advance, folks.
Edit: Ignacio Vazquez-Abrams pointed me to Bash FAQ entry #50, which after some reading and experimentation seems to indicate that one way is to use Bash arrays:
#!/bin/bash
# This appears to work, using Bash arrays
# Start with blank arrays
FILES=()
FLAGS=()
for ARG in "$#"; do
echo "foo: Handling $ARG"
if [ x${ARG:0:1} = "x-" ]; then
# Looks like a flag, add it to the flags array
FLAGS+=("$ARG")
else
# Looks like a file, add it to the files array
FILES+=("$ARG")
fi
done
# Call bar with the flags and files
echo "Issuing (but properly delimited, not exactly as this appears): bar ${FLAGS[#]} ${FILES[#]}"
bar "${FLAGS[#]}" "${FILES[#]}"
Is that correct and reasonable? Or am I relying on something environmental above that will bite me later? It seems to work and it ticks all the other boxes for me (simple, easy to remember, etc.). It does appear to rely on a relatively recent Bash feature (FAQ entry #50 mentions v3.1, but I wasn't sure whether that was arrays in general or some of the syntax they were using with it), but I think it's likely I'll only be dealing with versions that have it.
(If the above is correct and you want to un-delete your answer, Ignacio, I'll accept it provided I haven't accepted any others yet, although I stand by my statement about link-only answers.)

Why do you want to "build up" a command? Add the files and flags to arrays using proper
quoting and issue the command directly using the quoted arrays as arguments.
Selected lines from your script (omitting unchanged ones):
if [[ ${ARG:0:1} == - ]]; then # using a Bash idiom
FLAGS+=("$ARG")                # add an element to an array
FILES+=("$ARG")
echo "Issuing: bar \"${FLAGS[@]}\" \"${FILES[@]}\""
bar "${FLAGS[@]}" "${FILES[@]}"
For a quick demo of using arrays in this manner:
$ a=(aaa 'bbb ccc' ddd); for arg in "${a[@]}"; do echo "..${arg}.."; done
Output:
..aaa..
..bbb ccc..
..ddd..
Please see BashFAQ/050 regarding putting commands in variables. The reason that your script doesn't work is that there's no way to quote the arguments within a quoted string. If you were to put quotes there, they would be considered part of the string itself instead of as delimiters. With the arguments left unquoted, word splitting is done and arguments that include spaces are seen as more than one argument. Characters such as "<", ">" or "|" are not a problem in any case, since the shell parses redirection and pipe operators before variable expansion; when they come out of an expansion they are seen as ordinary characters in a string.
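A quick way to see this (a minimal demo of my own, using printf to show each argument a command would actually receive):
file='three and a half'
args="\"$file\""        # the quotes become literal characters inside the string
printf '<%s>\n' $args   # unquoted expansion: four words, with literal quote marks
# prints <"three>, <and>, <a>, <half">, one per line
printf '<%s>\n' "$file" # quoted expansion: a single argument
# prints <three and a half>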
By putting the arguments (filenames) in an array, spaces, newlines, etc., are preserved. By quoting the array variable when it's passed as an argument, they are preserved on the way to the consuming program.
Some additional notes:
Use lowercase (or mixed case) variable names to reduce the chance that they will collide with the shell's builtin variables.
If you use single square brackets for conditionals in any modern shell, the archaic "x" idiom is no longer necessary if you quote the variables (see my answer here). However, in Bash, use double brackets. They provide additional features (see my answer here).
Use getopts as Let_Me_Be suggested. Your script, though I know it's only an example, will not be able to handle switches that take arguments.
This for ARG in "$#" can be shortened to this for ARG (but I prefer the readability of the more explicit version).

See BashFAQ #50 (and also maybe #35 on option parsing). For the scenario you describe, where you're building a command dynamically, the best option is to use arrays rather than simple strings, as they won't lose track of where the word boundaries are. The general rules are: to create an array, instead of VAR="foo bar baz", use VAR=("foo" "bar" "baz"); to use the array, instead of $VAR, use "${VAR[@]}". Here's a working version of your example script using this method:
#!/bin/bash
# This is the corrected version, using arrays
FILES=()
FLAGS=()
for ARG in "$#"; do
echo "foo: Handling $ARG"
if [ x${ARG:0:1} = "x-" ]; then
# Looks like a flag, add it to the flags array
FLAGS=("${FLAGS[#]}" "$ARG") # FLAGS+=("$ARG") would also work in bash 3.1+, as Dennis pointed out
else
# Looks like a file, add it to the files string
FILES=("${FILES[#]}" "$ARG")
fi
done
# Call bar with the flags and files (we don't care that they'll
# have an extra space or two)
CMD=("bar" "${FLAGS[#]}" "${FILES[#]}")
echo "Issuing: ${CMD[*]}"
"${CMD[#]}"
Note that in the echo command I used "${VAR[*]}" instead of the [@] form because there's no need/point in preserving word breaks here. If you wanted to print/record the command in unambiguous form, this would be a lot messier.
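If you did want an unambiguous record, one option (a sketch relying on Bash's printf %q, not part of the script above) is to print each element shell-quoted so the word boundaries stay visible:
printf 'Issuing:'; printf ' %q' "${CMD[@]}"; printf '\n'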
Also, this gives you no way to build up redirections or other special shell options in the built command -- if you add >outfile to the FILES array, it'll be treated as just another command argument, not a shell redirection. If you need to programmatically build these, be prepared for headaches.
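If you do need a redirection, one workable pattern (a sketch, assuming the output filename is known to the script rather than smuggled in through FILES) is to keep it in its own variable and apply the redirection at the point of invocation:
outfile="my output.txt"                  # hypothetical destination, handled separately
CMD=(bar "${FLAGS[@]}" "${FILES[@]}")
"${CMD[@]}" > "$outfile"                 # this > is real shell syntax, not data stored in the array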

getopts should be able to handle spaces in arguments correctly ("file name.txt"). Weird characters should work as well, assuming they are correctly escaped (ls -b).
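For what it's worth, a minimal sketch along those lines (the -v and -o options are made up, and bar is the imaginary target program from the question):
#!/bin/bash
flags=()
while getopts "vo:" opt; do
    case $opt in
        v) flags+=("-v") ;;              # a plain switch
        o) flags+=("-o" "$OPTARG") ;;    # a switch that takes an argument
        *) exit 2 ;;
    esac
done
shift $((OPTIND - 1))                    # whatever remains is treated as filenames
bar "${flags[@]}" "$@"                   # filenames with spaces stay intact because "$@" is quoted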

Related

Shell script (bash) to match a string variable with multiple values

I am trying write a script to compare one string variable to a list of values, i.e. if the variable matches (exact) to one of the values, then some action needs to be done.
The script is trying to match Unix pathnames, i.e. if the user enters / , /usr, /var etc, then to give an error, so that we do not get accidental corruption using the script. The list of values may change in future due to the application requirements. So I cannot have huge "if" statement to check this.
What I intend to do is that in case if the user enters, any of the forbidden path to give an error but sub-paths which are not forbidden should be allowed, i.e. /var should be rejected but /var/opt/app should be accepted.
I cannot use regex, as a partial match will not work.
I am not sure about using a while loop and an if statement; is there any alternative?
thanks
I like to use associative arrays for this.
declare -A nonoList=(
    [/foo/bar]=1
    ["/some/other/path with spaces"]=1
    [/and/so/on]=1
    # as many as you need
)
This can be kept in a file and sourced, if you want to separate it out.
Then in your script, just do a lookup.
if [[ -n "${nonoList[$yourString]}" ]] # -n checks for nonzero length
This also saves you from creating a big file and grepping over it repeatedly, though that also works.
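Putting those pieces together, a minimal sketch (the sample paths and input are just for illustration):
#!/bin/bash
declare -A nonoList=(
    [/foo/bar]=1
    ["/some/other/path with spaces"]=1
    [/var]=1
)
yourString="/var/opt/app"    # example input; "/var" on its own would be rejected
if [[ -n "${nonoList[$yourString]}" ]]; then
    echo "error: $yourString is a forbidden path" >&2
    exit 1
fi
echo "ok to proceed with $yourString"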
As an alternative, if you KNOW there will not be embedded newlines in any of those filenames (it's a valid character, but messy for programming) then you can do this:
$: cat foo
/foo/bar
/some/other/path with spaces
/and/so/on
Just a normal file with one file-path per line. Now,
chkSet=$'\n'"$(<foo)"$'\n' # single var, newlines before & after each
Then in your processing, assuming f=/foo/bar or whatever file you're checking,
if [[ "$chkSet" =~ $'\n'"$f"$'\n' ]] # check for a hit
This won't give you accidental hits on /some/other/path when the actual filename is /some/other/path with spaces, because the pattern explicitly checks for a newline character before and after the filename. That's why we explicitly ensure they exist at the front and end of the file. We assume they are in between (one path per line), so make sure your file doesn't have any spaces (or any other characters, like quotes) that aren't part of the filenames.
If you KNOW there will also be no embedded whitespace in your filenames, it's a lot easier.
mapfile -t nopes < foo
if [[ " ${nopes[*]} " =~ " $yourString " ]]; then echo found; else echo no; fi
Note that " ${nopes[*]} " embeds spaces (technically it uses the first character of $IFS, but that's a space by default) into a single flattened string. Again, literal spaces before and behind key and list prevent start/end mismatches.
Paul,
Your alternative workaround worked like a charm. I don't have any directories which need embedded spaces in them. So as long as my script can recognize that there are certain directories to avoid, it does its job.
Thanks

When processing string with Bash, how to treat comma differently depending on whether it's surrounded by some specific characters?

I would like to transform a MySQL script into a JSON file and was asked to use Bash for it.
By writing a simple shell script:
#!/bin/bash
# I know this script just outputs each entry with its value, because I haven't gone any further
for filename in $dir/home/*.sql
do
    cat $filename | while read line
    do
        names=${line%values*}
        names=${names#*(}
        names=${names%)*}
        values=${line#*values(}
        values=${values%)*}
        while [[ $names != $currentname ]]
        do
            currentname=${names%%,*}
            currentvalue=${values%%,*}
            echo $currentname
            echo $currentvalue
            names=${names#*,}
            values=${values#*,}
        done
    done
done
I have been basically able to fulfill the requirement. However, there is one more problem.
Some of the string entries have commas among their characters.
This causes my script to mistake these commas for the ones that separate values, and thus a string containing a comma is treated as two different strings.
It would be an easy task to solve this with programming languages like C++, but I have been asked to do this only with bash shell script although I am not familiar with it. So now I am stuck with no clue. Maybe a regular expression would be the cure? If there are other approaches, please also help.
FYI, here is an example of the problem:
Input:
values(100, 'A100', 'A,100');
Expected output:
100
'A100'
'A,100'
Actual current output:
100
'A100'
'A
100'
Something like this may help:
data="values(100, 'A100', 'A,100');"
json=${data//values(}
json=${json//);}
json=${json//, /$'\n'}
echo "$json"
Expected output:
100
'A100'
'A,100'
Typically in shell you would match it with a regex:
echo "values(100, 'A100', 'A,100');" | sed 's/values(//; s/\(, \|);\)/\n/g'
but this does not solve the problem at all.
The best and only solution is to write a real parser for the real MySQL language to 'handle' '' ' ' 'all\tcorner\'cases' properly. Read the input char by char, store state (e.g. whether you are inside a quotation or not), and handle '\'' and other sequences like \n as needed for extracting the field. You might interest yourself in MySQL's internal lexer (it's big!) and in the lex and yacc programs.
Check your scripts with http://shellcheck.net . Read https://mywiki.wooledge.org/BashFAQ/001 . Quote variable expansions. Don't be nominated for the useless use of cat award.
and was asked to use Bash for it.
Bash is a shell: its primary role is to run and connect other programs with each other. It is not a full-blown programming language, and writing programming-heavy logic in it is going to be very hard, or it just ends up calling external programs, because that's what it's for. Write the parser in another language and use bash to run it. If you're comfortable in C++, write it in C++, embed it inside the bash script, then compile and execute it from the script.
A common arrangement is to use regex for this, yes; for example, this is a requirement for parsing CSV files. But you can parse the line piece by piece like in your attempt.
However, you have a number of quoting errors which would prevent your code from working even if you figured out a way to parse the input the way you want to. (And of course, get rid of the Useless use of cat?)
while read -r line; do
    case $line in
        *values\(*\)\; );;
        *) continue;;
    esac
    line=${line#values\(}
    line=${line%\)\;}
    while [ "$line" ]; do
        case $line in
            \'*)
                line=${line#\'}
                tail=${line#*\'}
                value=\'${line%"$tail"}
                line=${tail#,}
                line=${line# };;
            *)  value=${line%%,*}
                line=${line#*,}
                line=${line# };;
        esac
        echo "$value"
    done
done <"$filename"
This is probably not really the way to go, just a hint if you really want to try to tackle this in Bash. I would write a simple parser in Python if I wanted to cover all bases.

concatenate two strings and one variable using bash

I need to generate a filename from three parts: two strings and one variable.
for f in `cat files.csv`; do echo fastq/$f\_1.fastq.gze; done
files.csv has the following lines:
Sample_11
Sample_12
I need to generate the following:
fastq/Sample_11_1.fastq.gze
fastq/Sample_12_1.fastq.gze
My problem is that I got the below files:
_1.fastq.gze_11
_1.fastq.gze_12
the string after the variable deletes the string before it.
I appreciate any help
Regards
By the way, your idiom for f in `cat files.csv` should be avoided. Refer to: Dangerous Backticks
while read f
do
    echo "fastq/${f}_1.fastq.gze"
done < files.csv
You can make it a one-liner with xargs and printf.
xargs printf 'fastq/%s_1.fastq.gze\n' <files.csv
The function of printf is to apply the first argument (the format string) to each argument in turn.
xargs says to run this command on as many files as it can fit onto the command line (splitting it up into multiple invocations if the input file is too large to fit all the arguments onto a single command line, subject to the ARG_MAX constant in your kernel).
Your best bet, generally, is to wrap the variable name in braces. So, in this case:
echo fastq/${f}_1.fastq.gz
See this answer for some details about the general concept, as well.
Edit: An additional thought looking at the now-provided output makes me think that this isn't a coding problem at all, but rather a conflict between line-endings and the terminal/console program.
Specifically, if the CSV file ends its lines with just a carriage return (ASCII/Unicode 13), the end of Sample_11 might "rewind" the line to the start and overwrite.
In that case, based loosely on this article, I'd recommend replacing cat (if you understandably don't want to re-architect the actual script with something like while) with something that will strip the carriage returns, such as:
for f in $(tr -cd '\011\012\040-\176' < temp.csv)
do
    echo fastq/${f}_1.fastq.gze
done
As the cited article explains, Octal 11 is a tab, 12 a line feed, and 40-176 are typeable characters (Unicode will require more thinking). If there aren't any line feeds in the file, for some reason, you probably want to replace that with tr '\015' '\012', which will convert the carriage returns to line feeds.
Of course, at that point, better is to find whatever produces the file and ask them to put reasonable line-endings into their file...
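If you want to confirm that carriage returns really are the culprit before changing anything, a quick check and a simple fix (my suggestion, using standard tools):
cat -v files.csv                          # DOS-style line endings show up as ^M at the end of each line
tr -d '\r' < files.csv > files_unix.csv   # write a copy with the carriage returns stripped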

Reversal using functions in Bash

I am trying to create a reverse function in Bash where it will reverse either a directory or just an array of items as a parameter using a main function and a reverse function. I believe that my main function is messed up or that I cannot invoke the reverse function that I have created. I am not that familiar with Bash either.
#!/bin/bash
function reverse(){
    input=$1 #1st parameter
    copy=${input}
    len=${#copy}
    for((i=$len-1;i>=0;i--)); do
        if [ $i = "[" ]; then
            continue
        fi
        rev="$rev${copy:$i:1}"
    done
    echo "var: $var, rev: $rev"
}
function main(){
    arr=( tiger lion bear )
    mydir = $arr
    reverse $mydir
    echo $reverse
    # should print: bear lion tiger
}
main
There are two big problems with your code:
When assigning to mydir, you probably want to copy the whole array. But you assign $arr, which is the same as assigning only the first element, ${arr[0]}. Quoting Arrays section of man bash:
Referencing an array variable without a subscript is equivalent to referencing with a subscript of 0.
You should use mydir=( "${arr[@]}" ) to copy the array and then a similar construct to pass the whole array to the reverse function. But you don’t have to copy the array at all, this main is sufficient:
function main(){
    arr=( tiger lion bear )
    reverse "${arr[@]}"
}
Your implementation of reverse reverses letters in its first parameter, i.e. the first item of the arr array, as it is called now. You should reverse the array instead. It will be stored in the parameters of the function, the @ variable. You can get all the parameters of your function as a params array via params=( "$@" ).
The second point strongly suggests to me that you have no idea what you are doing and you just copy-pasted parts of the script from somewhere. I will not write the code for you. It is really easy to do if you know the basics of Bash. If you don’t, go learn them. You will learn nothing by copy-pasting ad-hoc snippets from Stack Overflow or other sites. Next time, you would come to ask virtually the same question again. We are not here to do your homework for you, we are here to teach you.
There are also several minor issues, but still pretty severe in terms of functionality.
You must not have spaces around the = in variable assignment in shell. You have them in main when assigning to mydir. This is why Bash probably prints an error saying that it cannot find any command named mydir. Each shell splits the command line into words and treats the first word as the command name.
Your reverse function copies input variable needlessly – unless you really want to print the original value at the end as you do now, of course. But then, I would rather copy the content to a variable named original and work on the input variable, because then the intent would be easier to understand.
Everywhere you expand a variable in shell, you should expand it inside double quotes, unless you have a good reason to do otherwise and you know what you are doing. If you expand it outside double quotes, you are going to get into trouble with escaping errors. Double quotes protect most special characters inside from being interpreted by the shell, only allowing variable, command, arithmetic and history expansion.
In the loop in reverse, i is supposed to contain a number. It is nonsense to test it for string equality with [. That condition is always false.
You never initialize the var variable, but you use its value in the echo statement at the end of the reverse body.
You also do not assign a value to the rev variable before you first use it. As all variables are empty by default and you use reverse only once, this is not an issue, but it still is not good practice.
Another variable without initialization is reverse at the last line of main. A function call in shell returns only a status, pretty much like any other command. You can modify a (global) variable inside the function, but you don’t do that. The echo command at the end of main is thus useless.
I already told you enough to figure out the correct implementation of reverse even if you are a shell beginner. If you read the Arrays section of Bash manual again, everything should be clear. If it is not, comment and I’ll try to give you more guidance.

Is this batch file injection?

C:\>batinjection OFF ^& DEL c.c
batinjection.bat has contents of ECHO %*
I've heard of SQL injection, though i've never actually done it, but is this injection? Are there different types of injection and this is one of them?
Or is there another technical term for this? or a more specific term?
Note: a prior edit had C:\>batinjection OFF & DEL c.c (i.e. without ^) and ECHO %1 (i.e. without %*), which wasn't quite right. I have corrected it. It doesn't affect the answers.
Your example presents three interesting issues that are easier to understand
when separated.
First, Windows allows multiple statements to be executed on one line by
separating with "&". This could potentially be used in an injection attack.
Second, ECHO parses and interprets messages passed to it. If the message is
"OFF" or "/?" or even blank, then ECHO will provide a different expected
behavior than just copying the message to stdout.
Third, you know that it's possible to inject code into a number of
scriptable languages, including batch files, and want to explore ways
to recognize it so you can better defend against it in your code.
It would be easier to recognize the order in which things are happening
in your script if you add an echo statement before and after the one
you're trying to inject. Call it foo.bat.
@echo off
echo before
echo %1
echo after
Now, you can more easily tell whether your injection attempt executed at
the command line (not injection) or was executed as a result of parameter
expansion that broke out of the echo statement and executed a new statement
(injection).
foo dir
Results in:
before
dir
after
Pretty normal so far. Try a parameter that echo interprets.
foo /?
Results in:
before
Displays messages, or turns command-echoing on or off.
ECHO [ON | OFF]
ECHO [message]
Type ECHO without parameters to display the current echo setting.
after
Hmm. Help for the echo command. It's probably not the desired use of
echo in that batch file, but it's not injection. The parameters were
not used to "escape out" of the limits of either the echo statement or
the syntax of the batch file.
foo dog & dir
Results in:
before
dog
after
[A spill of my current directory]
Okay, the dir happened outside of the script. Not injection.
foo ^&dir/w
Results in:
before
ECHO is off.
[A spill of my current directory in wide format]
after
Now, we've gotten somewhere. The dir is not a function of ECHO, and is
running between the before and after statements. Let's try something
more dramatic but still mostly harmless.
foo ^&dir\/s
Yikes! You can pass an arbitrary command that can potentially impact
your system's performance all inside an innocuous-looking "echo %1".
Yes, it's a type of injection, and it's one of the big problems with batch files; mostly it isn't a purposeful attack, most of the time you simply get into trouble with some characters or a word like OFF.
Therefore you should use techniques to avoid these problems/vulnerabilities.
In your case you could change your batch file to
set "param1=%*"
setlocal EnableDelayedExpansion
echo(!param1!
I use echo( here instead of echo. or something else, as it is the only known secure echo for all appended contents.
I use the delayed expansion ! instead of percent expansion, as delayed expansion is always safe against any special characters.
To use delayed expansion you need to transfer the parameter into a variable first, and a good way is to use quotes around the set command; it avoids many problems with special characters (but not all).
But building an absolutely secure way to access batch parameters is quite a bit harder.
Trying to make this safe is tricky:
myBatch.bat ^&"&"
You could read SO: How to receive even the strangest command line parameters?
The main idea is to use the output of a REM statement while ECHO ON.
This is safe in the sense that you can't inject code (or rather: only with really advanced knowledge), but the original content can be changed if your content is something like:
myBatch.bat myContent^&"&"%a
It will be changed to myContent&"&"4
AFAIK, this is known as command injection (which is one type of code injection attack).
The latter link lists various injection attacks. The site (www.owasp.org) is an excellent resource regarding web security.
There are multiple applications of injection one can generalize as "language injection". SQL Injection and Cross Site Scripting are the most popular, but others are possible.
In your example, the ECHO statement isn't actually performing the delete, so I wouldn't call that injection. Instead, the delete happens outside of the invocation of the batinjection script itself.
