find command works on prompt, not in bash script - pass multiple arguments by variable - linux

I've searched around questions with similar issues but haven't found one that quite fits my situation.
Below is a very brief script that demonstrates the problem I'm facing:
#!/bin/bash
includeString="-wholename './public_html/*' -o -wholename './config/*'"
find . \( $includeString \) -type f -mtime -7 -print
Basically, we need to search inside a folder, but only in certain of its subfolders. In my longer script, includeString gets built from an array. For this demo, I kept things simple.
Basically, when I run the script, it doesn't find anything. No errors, but also no hits. If I manually run the find command, it works. If I remove ( $includeString ) it also works, though obviously it doesn't limit itself to the folders I want.
So why would the same command work from the command line but not from the bash script? What is it about passing in $includeString that way that causes it to fail?

You're running into an issue with how the shell handles variable expansion. In your script:
includeString="-wholename './public_html/*' -o -wholename './config/*'"
find . \( $includeString \) -type f -mtime -7 -print
This results in find looking for files where -wholename matches the literal string './public_html/*'. That is, a filename that contains single quotes. Since you don't have any whitespace in your paths, the easiest solution here would be to just drop the single quotes:
includeString="-wholename ./public_html/* -o -wholename ./config/*"
find . \( $includeString \) -type f -mtime -7 -print
Unfortunately, you'll probably get bitten by wildcard expansion here (the shell will attempt to expand the wildcards before find sees them).
But as Etan pointed out in his comment, this appears to be needlessly complex; you can simply do:
find ./public_html ./config -type f -mtime -7 -print

If you want to store a list of arguments and expand it later, the correct form to do that with is an array, not a string:
includeArgs=( -wholename './public_html/*' -o -wholename './config/*' )
find . '(' "${includeArgs[#]}" ')' -type f -mtime -7 -print
This is covered in detail in BashFAQ #50.

Note: As Etan points out in a comment, the better solution in this case may be to reformulate the find command, but passing multiple arguments via variable(s) is a technique worth exploring in general.
tl;dr:
The problem is not specific to find, but to how the shell parses command lines.
Quote characters embedded in variable values are treated as literals: They are neither recognized as argument-boundary delimiters nor are they removed after parsing, so you cannot use a string variable with embedded quoting to pass multiple arguments simply by directly using it as part of a command.
To robustly pass multiple arguments stored in a variable,
use array variables in shells that support them (bash, ksh, zsh) - see below.
otherwise, for POSIX compliance, use xargs - see below.
Robust solutions:
Note: The solutions assume presence of the following script, let's call it echoArgs, which prints the arguments passed to it in diagnostic form:
#!/usr/bin/env bash
for arg; do # loop over all arguments
echo "[$arg]" # print each argument enclosed in [] so as to see its boundaries
done
Further, assume that the equivalent of the following command is to be executed:
echoArgs one 'two three' '*' last # note the *literal* '*' - no globbing
with all arguments but the last passed by variable.
Thus, the expected outcome is:
[one]
[two three]
[*]
[last]
Using an array variable (bash, ksh, zsh):
# Assign the arguments to *individual elements* of *array* args.
# The resulting array looks like this: [0]="one" [1]="two three" [2]="*"
args=( one 'two three' '*' )
# Safely pass these arguments - note the need to *double-quote* the array reference:
echoArgs "${args[#]}" last
Using xargs - a POSIX-compliant alternative:
POSIX utility xargs, unlike the shell itself, is capable of recognized quoted strings embedded in a string:
# Store the arguments as *single string* with *embedded quoting*.
args="one 'two three' '*'"
# Let *xargs* parse the embedded quoted strings correctly.
# Note the need to double-quote $args.
echo "$args" | xargs -J {} echoArgs {} last
Note that {} is a freely chosen placeholder that allows you to control where in the resulting command line the arguments provided by xargs go.
If all xarg-provided arguments go last, there is no need to use -J at all.
For the sake of completeness: eval can also be used to parse quoted strings embedded in another string, but eval is a security risk: arbitrary commands could end up getting executed; given the safe solutions discussed above, there is no need to use eval.
Finally, Charles Duffy mentions another safe alternative in a comment, which, however, requires more coding: encapsulate the command to invoke in a shell function, pass the variable arguments as separate arguments to the function, then manipulate the all-arguments array $# inside the function to supplement the fixed arguments (using set), and invoke the command with "$#".
Explanation of the shell's string-handling issues involved:
When you assign a string to a variable, embedded quote characters become part of the string:
var='one "two three" *'
$var now literally contains one "two three" *, i.e., the following 4 - instead of the intended 3 - words, separated by a space each:
one
"two-- " is part of the word itself!
three"-- " is part of the word itself!
*
When you use $var unquoted as part of an argument list, the above breakdown into 4 words is exactly what the shell does initially - a process called word splitting. Note that if you were to double-quote the variable reference ("$var"), the entire string would always become a single argument.
Because $var is expanded to its value, one of the so-called parameter expansions, the shell does NOT attempt to recognize embedded quotes inside that value as marking argument boundaries - this only works with quote characters specified literally, as a direct part of the command line (assuming these quote characters aren't themselves quoted).
Similarly, only such directly specified quote characters are removed by the shell before passing the enclosed string to the command being invoked - a process called quote removal.
However, the shell additionally applies pathname expansion (globbing) to the resulting 4 words, so any of the words that happen to match filenames will expand to the matching filenames.
In short: the quote characters in $var's value are neither recognized as argument-boundary delimiters nor are they removed after parsing. Additionally, the words in $var's value are subject to pathname expansion.
This means that the only way to pass multiple arguments is to leave them unquoted inside the variable value (and also leave the reference to that variable unquoted), which:
won't work with values with embedded spaces or shell metacharacters
invariably subjects the values to pathname expansion

Related

Delete files in a variable - bash

i have a variable of filenames that end with a vowel. I need to delete all of these files at once. I have tried using
rm "$vowels"
but that only seems to return the files within the variable and state that there is "No such file or Directory"
Its your use of quotes: they tell rm that your variables contents are to be interpreted as a single argument (filename). Without quotes the contents will be broken into multiple arguments using the shell rules in effect.
Be aware that this can be risky if your filenames contain spaces - as theres no way to tell the difference between spaces between filenames, and spaces IN filenames.
You can get around this by using an array instead and using quoted array expansion (which I cant remember the syntax of, but might look something like rm "${array[#]}" - where each element in the array will be output as a quoted string).
SOLUTION
assigning the variable
vowel=$(find . -type f | grep "[aeiou]$")
removing all files within variable
echo $vowel | xargs rm -v

Is it possible to use program output as values for brace expansion?

I'm a linux newbie. I would like to know if you can use a program's output (which is comma separated values) as the values that will be used for brace expansions.
I was mainly trying to do this with touch but started using echo to test different approaches with no luck. the base approach is
echo b/{$(find sourceFolder -type f -printf "%f,")}
I also tried cat but got the same results. The output of find is used as a single value to the braces not multiple comma separated values. I verified this with
echo b/{$(find sourceFolder -type f -printf "%f,"),otherValue}
with this I get 2 outputs b/(with the comma separated values list from find) and b/otherValue
I was able to create the files with touch in the current working folder I wanted to see if its possible to do that in another folder.
It is frustrating but brace expansion is performed before variable expansion. Consequently, you need a different approach. Consider:
find sourceFolder -type f -printf "b/%f\n"
This puts b/ in front of each found file name and does not require brace expansion.
While you cannot do this directly with bash, you can use eval to obtain the same result:
$ eval "echo {$(echo 2,3),4}"
2 3 4

Using a glob expression passed as a bash script argument

TL;DR:
Why isn't invoking ./myscript foo* when myscript has var=$1 the same as invoking ./myscript with var=foo* hardcoded?
Longer form
I've come across a weird issue in a bash script I'm writing. I am sure there is a simple explanation, but I can't figure it out.
I am trying to pass a command line argument to be assigned as a variable in the script.
I want the script to allow 2 command line arguments as follows:
$ bash my_bash_script.bash args1 args2
In my script, I assigned variables like this:
ARGS1=$1
ARGS2=$2
Args 1 is a string descriptor to add to the output file.
Args 2 is a group of directories: "dir1, dir2, dir3", which I am passing as dir*
When I assign dir* to ARGS2 in the script it works fine, but when I pass dir* as the second command line argument, it only includes dir1 in the wildcard expansion of dir*.
I assume this has something to do with how the shell handles wildcards (even when passed as args), but I don't really understand it.
Any help would be appreciated.
Environment / Usage
I have a group of directories:
dir_1_y_map, dir_1_x_map, dir_2_y_map, dir_2_x_map,
... dir_10_y_map, dir_10_x_map...
Inside these directories I am trying to access a file with extension ".status" via *.status, and ".report.txt" via *report.txt.
I want to pass dir_*_map as the second argument to the script and store it in the variable ARGS2, then use it to search within each of the directories for the ".status" and ".report" files.
The issue is that passing dir_*_map from the command line doesn't give the list of directories, but rather just the first item in the list. If I assign the variable ARGS2=dir_*_map within the script, it works as I intend.
Workaround: Quoting
It turns out that passing the second argument in quotes allowed the wildcard expansion to work appropriately for "dir_*_map"
#!/usr/bin/env bash
ARGS1=$1
ARGS2=$2
touch $ARGS1".extension"
for i in /$ARGS2/*.status
do
grep -e "string" $i >> $ARGS1".extension"
done
Here is an example invocation of the script:
sh ~/path/to/script descriptor "dir_*_map"
I don't fully understand when/why some arguments must be passed in quotes, but I assume it has to do with the wildcard expansion in the for loop.
Addressing the "why"
Assignments, as in var=foo*, don't expand globs -- that is, when you run var=foo*, the literal string foo* is put into the variable foo, not the list of files matching foo*.
By contrast, unquoted use of foo* on a command line expands the glob, replacing it with a list of individual names, each of which is passed as a separate argument.
Thus, running ./yourscript foo* doesn't pass foo* as $1 unless no files matching that glob expression exist; instead, it becomes something like ./yourscript foo01 foo02 foo03, with each argument in a different spot on the command line.
The reason running ./yourscript "foo*" functions as a workaround is the unquoted expansion inside the script allowing the glob to be expanded at that later time. However, this is bad practice: glob expansion happens concurrent with string-splitting (meaning that relying on this behavior removes your ability to pass filenames containing characters found in IFS, typically whitespace), and also means that you can't pass literal filenames when they could also be interpreted as globs (if you have a file named [1] and a file named 1, passing [1] would always be replaced with 1).
Idiomatic Usage
The idiomatic way to build this would be to shift away the first argument, and then iterate over subsequent ones, like so:
#!/bin/bash
out_base=$1; shift
shopt -s nullglob # avoid generating an error if a directory has no .status
for dir; do # iterate over directories passed in $2, $3, etc
for file in "$dir"/*.status; do # iterate over files ending in .status within those
grep -e "string" "$file" # match a single file
done
done >"${out_base}.extension"
If you have many .status files in a single directory, all this can be made more efficient by using find to invoke grep with as many arguments as possible, rather than calling grep individually on a per-file basis:
#!/bin/bash
out_base=$1; shift
find "$#" -maxdepth 1 -type f -name '*.status' \
-exec grep -h -- /dev/null '{}' + \
>"${out_base}.extension"
Both scripts above expect the globs passed not to be quoted on the invoking shell. Thus, usage is of the form:
# being unquoted, this expands the glob into a series of separate arguments
your_script descriptor dir_*_map
This is considerably better practice than passing globs to your script (which then is required to expand them to retrieve the actual files to use); it works correctly with filenames containing whitespace (which the other practice doesn't), and files whose names are themselves glob expressions.
Some other points of note:
Always put double quotes around expansions! Failing to do so results in the additional steps of string-splitting and glob expansion (in that order) being applied. If you want globbing, as in the case of "$dir"/*.status, then end the quotes before the glob expression starts.
for dir; do is precisely equivalent to for dir in "$#"; do, which iterates over arguments. Don't make the mistake of using for dir in $*; do or for dir in $#; do instead! These latter invocations combine each element of the list with the first character of IFS (which, by default, contains the space, the tab and the newline in that order), then splits the resulting string on any IFS characters found within, then expands each component of the resulting list as a glob.
Passing /dev/null as an argument to grep is a safety measure: It ensures that you don't have different behavior between the single-argument and multi-argument cases (as an example, grep defaults to printing filenames within output only when passed multiple arguments), and ensures that you can't have grep hang trying to read from stdin if it's passed no additional filenames at all (which find won't do here, but xargs can).
Using lower-case names for your own variables (as opposed to system- and shell-provided variables, which have all-uppercase names) is in accordance with POSIX-specified convention; see fourth paragraph of the POSIX specification regarding environment variables, keeping in mind that environment variables and shell variables share a namespace.

Unix 'alias' fails with 'awk' command

I'm creating an alias in Unix and have found that the following command fails..
alias logspace='find /apps/ /opt/ -type f -size +100M -exec ls -lh {} \; | awk '{print $5, $9 }''
I get the following :
awk: cmd. line:1: {print
awk: cmd. line:1: ^ unexpected newline or end of string
Any ideas on why the piped awk command fails...
Thanks,
Shaun.
To complement's #Dropout's helpful answer:
tl;dr
The problem is the OP's attempt to use ' inside a '-enclosed (single-quoted) string.
The most robust solution in this case is to replace each interior ' with '\'' (sic):
alias logspace='find /apps/ /opt/ -type f -size +100M -exec ls -lh {} \; |
awk '\''{print $5, $9 }'\'''
Bourne-like (POSIX-compatible) shells do not support using ' chars inside single-quoted ('...'-enclosed) strings AT ALL - not even with escaping.
(By contrast, you CAN escape " inside a double-quoted string as \", and, as in #Droput's answer, you can directly, embed ' chars. there, but see below for pitfalls.)
The solution above effectively builds the string from multiple, single-quoted strings into which literal ' chars. - escaped outside the single-quoted strings as \' - are spliced in.
Another way of putting it, as #Etan Reisinger has done in a comment: '\'' means: "close string", "escape single quote", "start new string".
When defining an alias, you usually want single quotes around its definition so as to delay evaluation of the command until the alias is invoked.
Other solutions and their pitfalls:
The following discusses alternative solutions, based on the following alias:
alias foo='echo A '\''*'\'' is born at $(date)'
Note how the * is effectively enclosed in single quotes - using above technique - so as to prevent pathname expansion when the alias is invoked later.
When invoked, this alias prints literal A * star is born, followed by the then-current date and time, e.g.: A * is born at Mon Jun 16 11:33:19 EDT 2014.
Use a feature called ANSI C quoting with shells that support it: bash, ksh, zsh
ANSI C-quoted strings, which are enclosed in $'...', DO allow escaping embedded ' chars. as \':
alias foo=$'echo A \'*\' is born at $(date)'
Pitfalls:
This feature is not part of POSIX.
By design, escape sequences such as \n, \t, ... are interpreted, too (in fact, that's the purpose of the feature).
Use of alternating quoting styles, as in #Dropout's answer:
Pitfall:
'...' and "..." have different semantics, so substituting one for the other can have unintended side-effects:
alias foo="echo A '*' is born at $(date)" # DOES NOT WORK AS INTENDED
While syntactically correct, this will NOT work as intended, because the use of double quotes causes the shell to expand the command substitution $(date) right away, and thus hardwires the date and time at the time of the alias definition into the alias.
As stated: When defining an alias, you usually want single quotes around its definition so as to delay evaluation of the command until the alias is invoked.
Finally, a caveat:
The tricky thing in a Bourne-like shell environment is that embedding ' inside a single-quoted string sometimes - falsely - APPEARS to work (instead of generating a syntax error, as in the question), when it instead does something different:
alias foo='echo a '*' is born at $(date)' # DOES NOT WORK AS EXPECTED.
This definition is accepted (no syntax error), but won't work as expected - the right-hand side of the definition is effectively parsed as 3 strings - 'echo a ', *, and ' is born at $(date)', which, due to how the shell parses string (merging adjacent strings, quote removal), results in the following, single, literal string: a * is born at $(date). Since the * is unquoted in the resulting alias definition, it will expand to a list of all file/directory names in the current directory (pathname expansion) when the alias is invoked.
You chould use different quotes for surrounding the whole text and for inner strings.
Try changing it to
alias logspace="find /apps/ /opt/ -type f -size +100M -exec ls -lh {} \; | awk '{print $5, $9 }'"
In other words, your outer quotes should be different than the inner ones, so they don't mix.
Community wiki update:
The redeeming feature of this answer is recognizing that the OP's problem lies in unescaped use of the string delimiters (') inside a string.
However, this answer contains general string-handling truths, but does NOT apply to (Bourne-like, POSIX-compatible) shell programming specifically, and thus does not address the OP's problem directly - see the comments.
Note: Code snippets are meant to be pseudo code, not shell language.
Basic strings: You canNOT use the same quote within the string as the entire string is delimited with:
foo='hello, 'world!'';
^--start string
^-- end string
^^^^^^^^--- unknown "garbage" causing syntax error.
You have to escape the internal strings:
foo='hello, \'world!\'';
^--escape
This is true of pretty much EVERY programming language on the planet. If they don't provide escaping mechanisms, such as \, then you have to use alternate means, e.g.
quotechar=chr(39); // single quote is ascii #39
foo='hello ,' & quotechar & 'world!' & quotechar;
Escape the $ sign not the awk's single quotes and use double quotes for the alias.
Try this :
alias logspace="find /apps/ /opt/ -type f -size +100M -exec ls -lh {} \; | awk '{print \$5, \$9 }'"

Extract file basename without path and extension in bash [duplicate]

This question already has answers here:
Extract filename and extension in Bash
(38 answers)
Closed 6 years ago.
Given file names like these:
/the/path/foo.txt
bar.txt
I hope to get:
foo
bar
Why this doesn't work?
#!/bin/bash
fullfile=$1
fname=$(basename $fullfile)
fbname=${fname%.*}
echo $fbname
What's the right way to do it?
You don't have to call the external basename command. Instead, you could use the following commands:
$ s=/the/path/foo.txt
$ echo "${s##*/}"
foo.txt
$ s=${s##*/}
$ echo "${s%.txt}"
foo
$ echo "${s%.*}"
foo
Note that this solution should work in all recent (post 2004) POSIX compliant shells, (e.g. bash, dash, ksh, etc.).
Source: Shell Command Language 2.6.2 Parameter Expansion
More on bash String Manipulations: http://tldp.org/LDP/LG/issue18/bash.html
The basename command has two different invocations; in one, you specify just the path, in which case it gives you the last component, while in the other you also give a suffix that it will remove. So, you can simplify your example code by using the second invocation of basename. Also, be careful to correctly quote things:
fbname=$(basename "$1" .txt)
echo "$fbname"
A combination of basename and cut works fine, even in case of double ending like .tar.gz:
fbname=$(basename "$fullfile" | cut -d. -f1)
Would be interesting if this solution needs less arithmetic power than Bash Parameter Expansion.
Here are oneliners:
$(basename "${s%.*}")
$(basename "${s}" ".${s##*.}")
I needed this, the same as asked by bongbang and w4etwetewtwet.
Pure bash, no basename, no variable juggling. Set a string and echo:
p=/the/path/foo.txt
echo "${p//+(*\/|.*)}"
Output:
foo
Note: the bash extglob option must be "on", (Ubuntu sets extglob "on" by default), if it's not, do:
shopt -s extglob
Walking through the ${p//+(*\/|.*)}:
${p -- start with $p.
// substitute every instance of the pattern that follows.
+( match one or more of the pattern list in parenthesis, (i.e. until item #7 below).
1st pattern: *\/ matches anything before a literal "/" char.
pattern separator | which in this instance acts like a logical OR.
2nd pattern: .* matches anything after a literal "." -- that is, in bash the "." is just a period char, and not a regex dot.
) end pattern list.
} end parameter expansion. With a string substitution, there's usually another / there, followed by a replacement string. But since there's no / there, the matched patterns are substituted with nothing; this deletes the matches.
Relevant man bash background:
pattern substitution:
${parameter/pattern/string}
Pattern substitution. The pattern is expanded to produce a pat
tern just as in pathname expansion. Parameter is expanded and
the longest match of pattern against its value is replaced with
string. If pattern begins with /, all matches of pattern are
replaced with string. Normally only the first match is
replaced. If pattern begins with #, it must match at the beginā€
ning of the expanded value of parameter. If pattern begins with
%, it must match at the end of the expanded value of parameter.
If string is null, matches of pattern are deleted and the / fol
lowing pattern may be omitted. If parameter is # or *, the sub
stitution operation is applied to each positional parameter in
turn, and the expansion is the resultant list. If parameter is
an array variable subscripted with # or *, the substitution
operation is applied to each member of the array in turn, and
the expansion is the resultant list.
extended pattern matching:
If the extglob shell option is enabled using the shopt builtin, several
extended pattern matching operators are recognized. In the following
description, a pattern-list is a list of one or more patterns separated
by a |. Composite patterns may be formed using one or more of the fol
lowing sub-patterns:
?(pattern-list)
Matches zero or one occurrence of the given patterns
*(pattern-list)
Matches zero or more occurrences of the given patterns
+(pattern-list)
Matches one or more occurrences of the given patterns
#(pattern-list)
Matches one of the given patterns
!(pattern-list)
Matches anything except one of the given patterns
Here is another (more complex) way of getting either the filename or extension, first use the rev command to invert the file path, cut from the first . and then invert the file path again, like this:
filename=`rev <<< "$1" | cut -d"." -f2- | rev`
fileext=`rev <<< "$1" | cut -d"." -f1 | rev`
If you want to play nice with Windows file paths (under Cygwin) you can also try this:
fname=${fullfile##*[/|\\]}
This will account for backslash separators when using BaSH on Windows.
Just an alternative that I came up with to extract an extension, using the posts in this thread with my own small knowledge base that was more familiar to me.
ext="$(rev <<< "$(cut -f "1" -d "." <<< "$(rev <<< "file.docx")")")"
Note: Please advise on my use of quotes; it worked for me but I might be missing something on their proper use (I probably use too many).
Use the basename command. Its manpage is here: http://unixhelp.ed.ac.uk/CGI/man-cgi?basename

Resources