Using a glob expression passed as a bash script argument - linux

TL;DR:
Why isn't invoking ./myscript foo* when myscript has var=$1 the same as invoking ./myscript with var=foo* hardcoded?
Longer form
I've come across a weird issue in a bash script I'm writing. I am sure there is a simple explanation, but I can't figure it out.
I am trying to pass a command line argument to be assigned as a variable in the script.
I want the script to allow 2 command line arguments as follows:
$ bash my_bash_script.bash args1 args2
In my script, I assigned variables like this:
ARGS1=$1
ARGS2=$2
Args 1 is a string descriptor to add to the output file.
Args 2 is a group of directories: "dir1, dir2, dir3", which I am passing as dir*
When I assign dir* to ARGS2 in the script it works fine, but when I pass dir* as the second command line argument, it only includes dir1 in the wildcard expansion of dir*.
I assume this has something to do with how the shell handles wildcards (even when passed as args), but I don't really understand it.
Any help would be appreciated.
Environment / Usage
I have a group of directories:
dir_1_y_map, dir_1_x_map, dir_2_y_map, dir_2_x_map,
... dir_10_y_map, dir_10_x_map...
Inside these directories I am trying to access a file with extension ".status" via *.status, and ".report.txt" via *report.txt.
I want to pass dir_*_map as the second argument to the script and store it in the variable ARGS2, then use it to search within each of the directories for the ".status" and ".report" files.
The issue is that passing dir_*_map from the command line doesn't give the list of directories, but rather just the first item in the list. If I assign the variable ARGS2=dir_*_map within the script, it works as I intend.
Workaround: Quoting
It turns out that passing the second argument in quotes, as "dir_*_map", allowed the wildcard expansion to work appropriately.
#!/usr/bin/env bash
ARGS1=$1
ARGS2=$2
touch $ARGS1".extension"
for i in /$ARGS2/*.status
do
grep -e "string" $i >> $ARGS1".extension"
done
Here is an example invocation of the script:
sh ~/path/to/script descriptor "dir_*_map"
I don't fully understand when/why some arguments must be passed in quotes, but I assume it has to do with the wildcard expansion in the for loop.

Addressing the "why"
Assignments, as in var=foo*, don't expand globs -- that is, when you run var=foo*, the literal string foo* is put into the variable var, not the list of files matching foo*.
By contrast, unquoted use of foo* on a command line expands the glob, replacing it with a list of individual names, each of which is passed as a separate argument.
Thus, running ./yourscript foo* doesn't pass foo* as $1 unless no files matching that glob expression exist; instead, it becomes something like ./yourscript foo01 foo02 foo03, with each argument in a different spot on the command line.
The reason running ./yourscript "foo*" functions as a workaround is that the unquoted expansion inside the script allows the glob to be expanded at that later time. However, this is bad practice: glob expansion happens concurrently with string-splitting (meaning that relying on this behavior removes your ability to pass filenames containing characters found in IFS, typically whitespace), and it also means that you can't pass literal filenames when they could also be interpreted as globs (if you have a file named [1] and a file named 1, passing [1] would always be replaced with 1).
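A quick way to see this is with a throwaway script that just prints its arguments (the script name showargs and the foo* files below are purely illustrative):

#!/usr/bin/env bash
# showargs: print each argument on its own line, bracketed, to make its boundaries visible
printf '[%s]\n' "$@"

An example session:

$ touch foo01 foo02 foo03
$ ./showargs foo*        # unquoted: the invoking shell expands the glob before the script runs
[foo01]
[foo02]
[foo03]
$ ./showargs "foo*"      # quoted: the script receives the literal pattern as $1
[foo*]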
Idiomatic Usage
The idiomatic way to build this would be to shift away the first argument, and then iterate over subsequent ones, like so:
#!/bin/bash
out_base=$1; shift
shopt -s nullglob # expand to nothing, rather than erroring, when a directory has no .status files
for dir; do # iterate over directories passed in $2, $3, etc
for file in "$dir"/*.status; do # iterate over files ending in .status within those
grep -e "string" "$file" # match a single file
done
done >"${out_base}.extension"
If you have many .status files in a single directory, all this can be made more efficient by using find to invoke grep with as many arguments as possible, rather than calling grep individually on a per-file basis:
#!/bin/bash
out_base=$1; shift
find "$@" -maxdepth 1 -type f -name '*.status' \
  -exec grep -h -e "string" -- /dev/null '{}' + \
  >"${out_base}.extension"
Both scripts above expect the globs to be passed unquoted by the invoking shell. Thus, usage is of the form:
# being unquoted, this expands the glob into a series of separate arguments
your_script descriptor dir_*_map
This is considerably better practice than passing globs to your script (which is then required to expand them to retrieve the actual files to use); it works correctly with filenames containing whitespace (which the other practice doesn't), and with files whose names are themselves glob expressions.
Some other points of note:
Always put double quotes around expansions! Failing to do so results in the additional steps of string-splitting and glob expansion (in that order) being applied. If you want globbing, as in the case of "$dir"/*.status, then end the quotes before the glob expression starts.
for dir; do is precisely equivalent to for dir in "$@"; do, which iterates over arguments. Don't make the mistake of using for dir in $*; do or for dir in $@; do instead! These latter invocations combine each element of the list with the first character of IFS (which, by default, contains the space, the tab and the newline in that order), then split the resulting string on any IFS characters found within, then expand each component of the resulting list as a glob. A short demonstration follows after these notes.
Passing /dev/null as an argument to grep is a safety measure: It ensures that you don't have different behavior between the single-argument and multi-argument cases (as an example, grep defaults to printing filenames within output only when passed multiple arguments), and ensures that you can't have grep hang trying to read from stdin if it's passed no additional filenames at all (which find won't do here, but xargs can).
Using lower-case names for your own variables (as opposed to system- and shell-provided variables, which have all-uppercase names) is in accordance with POSIX-specified convention; see the fourth paragraph of the POSIX specification regarding environment variables, keeping in mind that environment variables and shell variables share a namespace.
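As a short demonstration of the point about for dir; do (hypothetical values; the expected output is shown in the trailing comments):

set -- "dir one" "dir two"                       # simulate two positional parameters containing spaces
for dir; do printf '[%s]\n' "$dir"; done         # [dir one] and [dir two] -- arguments kept intact
for dir in $*; do printf '[%s]\n' "$dir"; done   # [dir] [one] [dir] [two] -- split on whitespace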

Related

Delete files in a variable - bash

I have a variable of filenames that end with a vowel. I need to delete all of these files at once. I have tried using
rm "$vowels"
but that only seems to return the files within the variable and state that there is "No such file or directory".
It's your use of quotes: they tell rm that your variable's contents are to be interpreted as a single argument (filename). Without quotes, the contents will be broken into multiple arguments using the shell rules in effect.
Be aware that this can be risky if your filenames contain spaces, as there's no way to tell the difference between spaces between filenames and spaces in filenames.
You can get around this by using an array instead and using quoted array expansion (which I can't remember the exact syntax of, but it looks something like rm "${array[@]}", where each element in the array is passed to rm as a separate, intact argument).
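For reference, a minimal sketch of that array-based approach (the variable name is illustrative; it matches names ending in a vowel in the current directory only, unlike the recursive find in the solution below, and it will also pick up directories whose names end in a vowel):

shopt -s nullglob                            # a non-matching pattern expands to nothing
vowel_files=( *[aeiou] )                     # one array element per matching name, spaces preserved
(( ${#vowel_files[@]} )) && rm -v -- "${vowel_files[@]}"   # each element reaches rm as its own argument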
SOLUTION
assigning the variable
vowel=$(find . -type f | grep "[aeiou]$")
removing all files within variable
echo $vowel | xargs rm -v

Why does echo command interpret variable for base directory?

I would like to find some file types in pictures folder and I have created the following bash-script in /home/user/pictures folder:
for i in *.pdf *.sh *.txt;
do
echo 'all file types with extension' $i;
find /home/user/pictures -type f -iname $i;
done
But when I execute the bash-script, it does not work as expected for files that are located on the base directory /home/user/pictures. Instead of echo 'All File types with Extension *.sh' the command interprets the variable for base directory:
all file types with extension file1.sh
/home/user/pictures/file1.sh
all file types with extension file2.sh
/home/user/pictures/file2.sh
all file types with extension file3.sh
/home/user/pictures/file3.sh
I would like to know why the echo command does not print "All File types with Extension *.sh".
Revised code:
for i in '*.pdf' '*.sh' '*.txt'
do
echo "all file types with extension $i"
find /home/user/pictures -type f -iname "$i"
done
Explanation:
In bash, a string containing *, or a variable which expands to such a string, may be expanded as a glob pattern unless that string is protected from glob expansion by putting it inside quotes (although if the glob pattern does not match any files, then the original glob pattern will remain after attempted expansion).
In this case, it is not wanted for the glob expansion to happen - the string containing the * needs to be passed as a literal to each of the echo and the find commands. So the $i should be enclosed in double quotes - these will allow the variable expansion from $i, but the subsequent wildcard expansion will not occur. (If single quotes, i.e. '$i' were used instead, then a literal $i would be passed to echo and to find, which is not wanted either.)
In addition to this, the initial for line needs to use quotes to protect against wildcard expansion in the event that any files matching any of the glob patterns exist in the current directory. Here, it does not matter whether single or double quotes are used.
Separately, the revised code here also removes some unnecessary semicolons. Semicolons in bash are a command separator and are not needed merely to terminate a statement (as in C etc).
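The effect of the different quoting styles on $i can be seen directly (assuming, as in the output above, that the current directory contains file1.sh, file2.sh and file3.sh):

i='*.sh'
echo "$i"    # prints: *.sh                        -- variable expanded, glob suppressed
echo '$i'    # prints: $i                          -- no expansion at all
echo $i      # prints: file1.sh file2.sh file3.sh  -- variable expanded, then glob expanded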
Observed behaviour with original code
What seems to be happening here is that one of the patterns used in the initial for statement is matching files in the current directory (specifically, the *.sh is matching file1.sh, file2.sh, and file3.sh). It is therefore being replaced by a list of these filenames (file1.sh file2.sh file3.sh) in the expression, and the for statement will iterate over these values. (Note that the current directory might not be the same as either where the script is located or the top level directory used for the find.)
It would also still be expected that the *.pdf and *.txt would be used in the expression -- either substituted or not, depending on whether any matches are found. Therefore the output shown in the question is probably not the whole output of the script.
Such expressions (*.blabla) change the value of $i in the loop. Here is the trick I would use:
for i in pdf sh txt;
do
echo 'all file types with extension *.'$i;
find /home/user/pictures -type f -iname '*.'$i;
done

Avoid using an array for wildcard expansion in bash

I wrote the following code:
join(){
IFS="$1"
shift
echo "$*"
}
FILES=(/tmp/*)
SEPARATED_FILES=$(join , ${FILES[*]})
echo $SEPARATED_FILES
And it prints the comma separated lists of files in /tmp just fine. But I would like to refactor it and eliminate the tmp global variable FILES which is an array. I tried the following:
SEPARATED_FILES=$(join , ${(/tmp/*)[*]})
echo $SEPARATED_FILES
But it prints the following error:
line 8: ${(/tmp/*)[*]}: bad substitution
Yes! You can avoid it by passing the glob directly as an argument to the function. Note that the glob is expanded by the shell before being passed to the function. So pass the first argument as the IFS you want to set and the second as the glob expression you want to use.
join , /tmp/*
The glob is expanded to file names before the function is being called.
join , /tmp/file1 /tmp/file2 /tmp/file3
A noteworthy addition to the above would be to set the nullglob option before calling the function, so that when the glob does not match anything it expands to nothing instead of the un-expanded pattern being passed to the function.
shopt -s nullglob
join , /tmp/*
and in command substitution syntax as
fileList=$(shopt -s nullglob; join , /tmp/*)
Couple of takeaways from your good effort.
Always apply shell quoting to variables/arrays unless you have a reason not to do so. Doing so preserves the literal value of the contents and prevents word-splitting from happening.
Always use lower-case names for user-defined variables, functions, and arrays.
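Putting those takeaways together with the glob-as-argument approach, a refactored version of the original snippet might look like this (a sketch; IFS is made local so the custom separator does not leak into the rest of the script):

#!/usr/bin/env bash

join() {
    local IFS="$1"   # keep the custom separator local to this function
    shift
    echo "$*"        # "$*" joins the remaining arguments with the first character of IFS
}

shopt -s nullglob                   # a non-matching glob expands to nothing instead of staying literal
separated_files=$(join , /tmp/*)    # the shell expands /tmp/* into separate arguments; no array needed
echo "$separated_files"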

find command works on prompt, not in bash script - pass multiple arguments by variable

I've searched around questions with similar issues but haven't found one that quite fits my situation.
Below is a very brief script that demonstrates the problem I'm facing:
#!/bin/bash
includeString="-wholename './public_html/*' -o -wholename './config/*'"
find . \( $includeString \) -type f -mtime -7 -print
Basically, we need to search inside a folder, but only in certain of its subfolders. In my longer script, includeString gets built from an array. For this demo, I kept things simple.
Basically, when I run the script, it doesn't find anything. No errors, but also no hits. If I manually run the find command, it works. If I remove ( $includeString ) it also works, though obviously it doesn't limit itself to the folders I want.
So why would the same command work from the command line but not from the bash script? What is it about passing in $includeString that way that causes it to fail?
You're running into an issue with how the shell handles variable expansion. In your script:
includeString="-wholename './public_html/*' -o -wholename './config/*'"
find . \( $includeString \) -type f -mtime -7 -print
This results in find looking for files where -wholename matches the literal string './public_html/*'. That is, a filename that contains single quotes. Since you don't have any whitespace in your paths, the easiest solution here would be to just drop the single quotes:
includeString="-wholename ./public_html/* -o -wholename ./config/*"
find . \( $includeString \) -type f -mtime -7 -print
Unfortunately, you'll probably get bitten by wildcard expansion here (the shell will attempt to expand the wildcards before find sees them).
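For instance (a hypothetical session), if the directory you run the script from contains something matching one of the patterns, the shell rewrites the command before find ever sees it; prefixing the command with echo makes the expansion visible:

$ mkdir -p public_html && touch public_html/index.html
$ includeString="-wholename ./public_html/* -o -wholename ./config/*"
$ echo find . \( $includeString \) -type f -mtime -7 -print
find . ( -wholename ./public_html/index.html -o -wholename ./config/* ) -type f -mtime -7 -print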
But as Etan pointed out in his comment, this appears to be needlessly complex; you can simply do:
find ./public_html ./config -type f -mtime -7 -print
If you want to store a list of arguments and expand it later, the correct form to do that with is an array, not a string:
includeArgs=( -wholename './public_html/*' -o -wholename './config/*' )
find . '(' "${includeArgs[@]}" ')' -type f -mtime -7 -print
This is covered in detail in BashFAQ #50.
Note: As Etan points out in a comment, the better solution in this case may be to reformulate the find command, but passing multiple arguments via variable(s) is a technique worth exploring in general.
tl;dr:
The problem is not specific to find, but to how the shell parses command lines.
Quote characters embedded in variable values are treated as literals: They are neither recognized as argument-boundary delimiters nor are they removed after parsing, so you cannot use a string variable with embedded quoting to pass multiple arguments simply by directly using it as part of a command.
To robustly pass multiple arguments stored in a variable,
use array variables in shells that support them (bash, ksh, zsh) - see below.
otherwise, for POSIX compliance, use xargs - see below.
Robust solutions:
Note: The solutions assume presence of the following script, let's call it echoArgs, which prints the arguments passed to it in diagnostic form:
#!/usr/bin/env bash
for arg; do # loop over all arguments
echo "[$arg]" # print each argument enclosed in [] so as to see its boundaries
done
Further, assume that the equivalent of the following command is to be executed:
echoArgs one 'two three' '*' last # note the *literal* '*' - no globbing
with all arguments but the last passed by variable.
Thus, the expected outcome is:
[one]
[two three]
[*]
[last]
Using an array variable (bash, ksh, zsh):
# Assign the arguments to *individual elements* of *array* args.
# The resulting array looks like this: [0]="one" [1]="two three" [2]="*"
args=( one 'two three' '*' )
# Safely pass these arguments - note the need to *double-quote* the array reference:
echoArgs "${args[@]}" last
Using xargs - a POSIX-compliant alternative:
POSIX utility xargs, unlike the shell itself, is capable of recognizing quoted strings embedded in a string:
# Store the arguments as *single string* with *embedded quoting*.
args="one 'two three' '*'"
# Let *xargs* parse the embedded quoted strings correctly.
# Note the need to double-quote $args.
echo "$args" | xargs -J {} echoArgs {} last
Note that {} is a freely chosen placeholder that allows you to control where in the resulting command line the arguments provided by xargs go.
If all xargs-provided arguments go last, there is no need to use -J at all.
For the sake of completeness: eval can also be used to parse quoted strings embedded in another string, but eval is a security risk: arbitrary commands could end up getting executed; given the safe solutions discussed above, there is no need to use eval.
Finally, Charles Duffy mentions another safe alternative in a comment, which, however, requires more coding: encapsulate the command to invoke in a shell function, pass the variable arguments as separate arguments to the function, then manipulate the all-arguments array $@ inside the function to supplement the fixed arguments (using set), and invoke the command with "$@".
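A rough sketch of that function-based approach, applied to the find example from the question (the function name is illustrative, not from the original comment):

find_recent() {
    # "$@" starts out holding only the variable include arguments;
    # rebuild it with the fixed parts of the command placed around them ...
    set -- . '(' "$@" ')' -type f -mtime -7 -print
    # ... then invoke the command with the complete, individually quoted argument list
    find "$@"
}

find_recent -wholename './public_html/*' -o -wholename './config/*'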
Explanation of the shell's string-handling issues involved:
When you assign a string to a variable, embedded quote characters become part of the string:
var='one "two three" *'
$var now literally contains one "two three" *, i.e., the following 4 - instead of the intended 3 - words, separated by a space each:
one
"two      -- the " is part of the word itself!
three"    -- the " is part of the word itself!
*
When you use $var unquoted as part of an argument list, the above breakdown into 4 words is exactly what the shell does initially - a process called word splitting. Note that if you were to double-quote the variable reference ("$var"), the entire string would always become a single argument.
Because $var is expanded to its value, one of the so-called parameter expansions, the shell does NOT attempt to recognize embedded quotes inside that value as marking argument boundaries - this only works with quote characters specified literally, as a direct part of the command line (assuming these quote characters aren't themselves quoted).
Similarly, only such directly specified quote characters are removed by the shell before passing the enclosed string to the command being invoked - a process called quote removal.
However, the shell additionally applies pathname expansion (globbing) to the resulting 4 words, so any of the words that happen to match filenames will expand to the matching filenames.
In short: the quote characters in $var's value are neither recognized as argument-boundary delimiters nor are they removed after parsing. Additionally, the words in $var's value are subject to pathname expansion.
This means that the only way to pass multiple arguments is to leave them unquoted inside the variable value (and also leave the reference to that variable unquoted), which:
won't work with values with embedded spaces or shell metacharacters
invariably subjects the values to pathname expansion
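A quick way to observe both effects is to print each resulting word separately (expected output shown as comments; the exact expansion of * depends on which files exist in the current directory):

var='one "two three" *'
printf '[%s]\n' $var      # unquoted: word splitting and pathname expansion are applied
# [one]
# ["two]
# [three"]
# [*]   (or one line per matching file, if * matches anything)
printf '[%s]\n' "$var"    # quoted: the entire value is passed as a single argument
# [one "two three" *]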

bash - run script based on substring of filename (perhaps using wildcard)

I've got the below simple script that calls an external script with a number of filenames and arguments of either a delimiter or a set of cut positions. My question: is there a way to make the filename 'dynamic using wildcards', in the sense that the directory will always contain those filenames but with extra text on either end, so that the script can do some sort of match-up to get the full filename based on a 'contains'?
current /release/ext/ directory contents:
2011storesblah.dat
hrlatest.dat
emp_new12.txt
i.e. the directory contains these files today, but next week the filenames in this directory could have a slightly different prefix, e.g.:
stores_newer.dat
finandhr.dat
emps.txt
Script:
#!/bin/bash
FILES='/release/ext/stores.dat "|"
/release/ext/emp.txt 1-3 4-11 15-40
/release/ext/hr.dat "|" 2'
for f in $FILES
do
echo `sh myexternalscript.sh $f`;
done
Note: there is no need to handle a scenario where the file in my script matches more than 2 files in the directory (it will always only match one).
Also it only can match the file types that are specified in the script.
Also, I don't need to search recursively, just needs to look in the /release/ext/ directory only.
I'm running SunOS 5.10.
FILES=`find /release/ext -name '*stores*.dat'`
for FILE in $FILES; do
  # need to test for empty, in case $FILES is empty
  test -n "$FILE" && /do/whatever/you/want
done
It is unclear what the pipe characters and numbers are for in your $FILES variable. However, here is something you might find useful:
#!/bin/bash
filespecs='*stores*.dat *hr*.dat *emp*.txt'
dir='/release/ext'
cd "$dir"
for file in $filespecs
do
sh myexternalscript.sh "$dir/$file"
done
Note that your question is tagged "bash" and you use "bash" in your shebang, but for some reason, you use "sh" when you call your other script. On some systems, sh is symlinked to Bash, but it will behave differently than Bash when called directly. On many systems, sh is completely separate from Bash.
In order to expand the globs and incorporate other arguments, you will need to violate the Bash rule of always quoting variables (this is an example of one of the exceptions).
filespecs='*stores*.dat | 3
*hr*.dat 4 5
*emp*.txt 6 7 8'
while read -r spec arg1 arg2 arg3 arg4
do
sh myexternalscript.sh "$dir"/$spec "$arg1" "$arg2" "$arg3" "$arg4"
done < <(echo "$filespecs")
Use as many "arg" arguments as you think you'll need. Extras will be passed as empty, but set arguments. If there are more arguments than variables to accept them, then the last variable will contain all the remainders in addition to the one that corresponds to it. This version doesn't need the cd since the glob isn't expanded until the directory has been prepended, while in the first version the glob is expanded before the directory is prepended.
If you quote the pipes in the manner shown in your question, then the double quotes will be included in the argument. In the way I show it, only the pipe character gets passed but it's protected since the variable is quoted at the time it's referenced.
