How to write a bash script to find files with complex conditions [duplicate] - linux

This question already has answers here:
Expanding a bash array only gives the first element
(1 answer)
Why does shell ignore quoting characters in arguments passed to it through variables? [duplicate]
(3 answers)
Closed 11 months ago.
Expected Function: write a bash script showing all the regular files
whose filename ends with ".xml" or ".yml" and whose path does not begin
with "./target/", i.e., exclude the "target" subdirectory.
Example File List: ./a/a1.xml; ./a/a2.txt; ./b/b1.yml; ./target/t.xml;
Example Output: ./a/a1.xml; ./b/b1.yml
I construct the find options in a bash script like this:
#!/bin/bash
find_opts=( -type f -a ( -not -path "./target/*" ) -a ( -false -o -name "*.xml" -o -name "*.yml" ) )
find . $find_opts
But it does not output the expected result. However, when I type the full command in a bash terminal as follows:
[root@localhost]# find . \( -type f -a \( -not -path "./target/*" \) -a \( -false -o -name "*.xml" -o -name "*.yml" \) \)
it works. What is wrong with the bash script above?
==============================================================
Someone gave this reference link: Expanding a bash array only gives the first element
It is about bash arrays, but my problem does not seem to be about bash arrays. Could anyone explain why it applies?
Please see How to exclude a directory in find . command to understand why I use parentheses, which look like an array. Anyway, here is my original attempt along with two others:
#the first one
find_opts=( -type f -a ( -not -path "./target/*" ) -a ( -false -o -name "*.xml" -o -name "*.yml" ) )
#the second one
find_opts=\\( -type f -a \\( -not -path "./target/*" \\) -a \\( -false -o -name "*.xml" -o -name "*.yml" \\) \\)
#the third one
find_opts=\( -type f -a \( -not -path "./target/*" \) -a \( -false -o -name "*.xml" -o -name "*.yml" \) \)
The first and third attempts give the same output, which is the unexpected result. The second one produces an error:
syntax error near unexpected token `('
The problem is still here.
==============================================================
Someone gave one more reference: Why does shell ignore quoting characters in arguments passed to it through variables?
It is about how to use a bash array to pass arguments, and the problem is solved by the following code:
#!/bin/bash
find_opts=(\( -type f -a \( -not -path './target/*' \) -a \( -false -o -name '*.xml' -o -name '*.yml' \) \))
# "${find_opts[@]}" expands each array element as one separate, unmodified argument
find . "${find_opts[@]}"
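To check the array-based answer against the example file list from the question, here is a minimal sketch that recreates that tree in a scratch directory and runs the same expression. The `mktemp -d` directory and the `LC_ALL=C sort` step are additions for reproducibility, not part of the original answer; also note that `-false` is not a POSIX primary, although GNU find supports it.

```bash
#!/bin/bash
# Recreate the example file list from the question in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p a b target
touch a/a1.xml a/a2.txt b/b1.yml target/t.xml

# One array element per find argument; the escaped parentheses and quoted
# patterns reach find unchanged when the array is expanded as "${find_opts[@]}".
find_opts=( \( -type f -a \( -not -path './target/*' \) -a \( -false -o -name '*.xml' -o -name '*.yml' \) \) )

# Sorting only makes the order deterministic; find's own order depends on the filesystem.
result=$(find . "${find_opts[@]}" | LC_ALL=C sort)
printf '%s\n' "$result"
```

The output should be exactly the two lines from the expected output, `./a/a1.xml` and `./b/b1.yml`.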

Related

Find command cannot be executed inside bash script

I wrote a bash script to retrieve information about files modified in the last few days, excluding files under some sub-folders. Here is the script (test.sh):
#!/bin/bash
date_range=$1
base_dir=$2
excluded_dir=$3
# Command initialization
cmd="find $base_dir"
for item in ${excluded_dir[@]}
do
cmd="$cmd -not \( -path '$base_dir/$item' -prune \)"
done
cmd="$cmd -type f -mtime -$date_range -ls"
echo $cmd
$cmd
I tried an example as below:
./test.sh 3 /root "excluded_folder1 excluded_folder2"
The command has been initialized as:
find /root -not \( -path '/root/excluded_folder1' -prune \) -not \( -path '/root/excluded_folder2' -prune \) -type f -mtime -3 -ls
If I run this find command in my terminal, it works fine and I get the results I want. But when it is executed in the bash script, I always get this error:
find: paths must precede expression: \(
Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression]
Does anybody know what the problem is and how to fix it?
Thanks for all the answers and suggestions I received here, but none of them solved my problem. The problem was finally solved by using 'eval' to execute the command. The final working bash script is below:
#!/bin/bash
date_range=$1
base_dir=$2
excluded_dir=$3
# Command initialization
cmd="find $base_dir"
for item in ${excluded_dir[@]}
do
cmd="$cmd -not \( -path '$base_dir/$item' -prune \)"
done
cmd="$cmd -type f -mtime -$date_range -ls"
eval $cmd
Although some posts say that using eval in a bash script is a bad and insecure choice, I still don't know how to solve this problem another way. If anyone has a better idea, please post it here.
Reference:
What is the eval command in bash?
Why and when should eval use be avoided in shell scripts?
Based on my guess above, I suggest the following code:
#!/bin/bash
date_range=$1
base_dir=$2
excluded_dir=$3
# Command initialization
cmd="find $base_dir"
for item in ${excluded_dir[@]}
do
# No inner quotes: after the unquoted expansion of $cmd they would reach find literally
cmd="$cmd -not ( -path $base_dir/$item -prune )"
done
cmd="$cmd -type f -mtime -$date_range -ls"
echo $cmd
$cmd
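As an alternative to both eval and the unquoted `$cmd` string, the command can be built in a bash array, which is the same technique used in the first question on this page. Below is a minimal sketch, wrapped in a hypothetical `find_recent` function so that it can be tested. It assumes the excluded folder names contain no whitespace, because they arrive as one space-separated argument.

```bash
#!/bin/bash
# Hypothetical helper: find_recent DAYS BASE_DIR "EXCLUDED1 EXCLUDED2 ..."
# Builds the same find command as the question's test.sh, but in an array.
find_recent() {
    local date_range=$1 base_dir=$2 item
    local -a excluded_dirs cmd
    # Assumption: excluded names contain no whitespace; read -a splits the
    # space-separated third argument into array elements.
    read -r -a excluded_dirs <<< "$3"

    cmd=(find "$base_dir")
    for item in "${excluded_dirs[@]}"; do
        # Each word is its own array element, so no inner quotes or
        # backslashes are needed and nothing is re-parsed by eval.
        cmd+=(-not \( -path "$base_dir/$item" -prune \))
    done
    cmd+=(-type f -mtime "-$date_range" -ls)

    "${cmd[@]}"
}
```

For example, `find_recent 3 /root "excluded_folder1 excluded_folder2"` should run the same search as the question's `./test.sh 3 /root "excluded_folder1 excluded_folder2"`, without eval.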

Can't realize alias/substitution function for my .bashrc [duplicate]

This question already has answers here:
Bash script to receive and repass quoted parameters
(2 answers)
Closed 3 years ago.
I'm trying to colourize my find command so I've added this alias function to my .bashrc.
# liberate your find
function find
{
command find $@ -exec ls --color=auto -d {} \;
}
But this code behaves unexpectedly: it drops my quotes.
GNU bash, version 4.4.23(1)-release (x86_64-pc-linux-gnu)
Using my function:
find ./ -name '*.pl' -or -name '*.pm'
Result:
./lib/cover.pm
./lib/db.pm
Calling the real find command directly, bypassing the function:
command find ./ -name '*.pl' -or -name '*.pm'
Result:
./auth.pl
./index.pl
./title.pl
./lib/cover.pm
./lib/db.pm
./fs2db.pl
So the second variant didn't eat my quotes and works as it should.
To reproduce the problem, I created all the files shown in the longer result in the question.
When I define the function(*) as
function find
{
command find $@ -exec ls --color=auto -d {} \;
}
and execute
find ./ -name '*.pl' -or -name '*.pm'
I get an error message
find: paths must precede expression: fs2db.pl
Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression]
because *.pl gets expanded to auth.pl fs2db.pl index.pl title.pl by the shell.
I had to change the function to
function find
{
command find "$@" -exec ls --color=auto -d {} \;
}
to reproduce your problem. (Maybe this depends on the shell. I tested with bash 4.4.19(3)-release)
After set -x you can see what the shell does when executing your function:
$ find ./ -name '*.pl' -or -name '*.pm'
+ find ./ -name '*.pl' -or -name '*.pm'
+ command find ./ -name '*.pl' -or -name '*.pm' -exec ls --color=auto -d '{}' ';'
+ find ./ -name '*.pl' -or -name '*.pm' -exec ls --color=auto -d '{}' ';'
./lib/cover.pm
./lib/db.pm
The difference between executing your function and executing the find command directly is that your function appends an -exec action with the implicit -a (AND) operator. Without an explicit action, find prints all matching results.
What you see is a result of operator precedence: -a (AND) binds more tightly than -o (-or, OR).
Compare the output of these three commands:
command find ./ -name '*.pl' -or -name '*.pm'
command find ./ -name '*.pl' -or -name '*.pm' -print
command find ./ \( -name '*.pl' -or -name '*.pm' \) -print
see http://man7.org/linux/man-pages/man1/find.1.html#NON-BUGS
You can call your function as
find ./ \( -name '*.pl' -or -name '*.pm' \)
to avoid the problem.
(*) This function definition is copied from the question.
You should use the portable POSIX style find() { ... } instead, unless there is a specific requirement for the Korn shell style function find { ... }.
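The precedence effect described in this answer can be reproduced in a throwaway directory. The following sketch mirrors the second and third of the three commands compared above, using hypothetical file names and the POSIX spelling -o for -or.

```bash
#!/bin/bash
# Scratch directory with one .pl and one .pm file (hypothetical names).
demo=$(mktemp -d)
touch "$demo/auth.pl" "$demo/cover.pm"
cd "$demo" || exit 1

# -a binds more tightly than -o, so this parses as
#   -name '*.pl' -o ( -name '*.pm' -a -print )
# and only the .pm file is printed.
ungrouped=$(find . -name '*.pl' -o -name '*.pm' -print | LC_ALL=C sort)

# With explicit grouping, -print applies to both alternatives.
grouped=$(find . \( -name '*.pl' -o -name '*.pm' \) -print | LC_ALL=C sort)

printf 'ungrouped:\n%s\ngrouped:\n%s\n' "$ungrouped" "$grouped"
```

The function in the question behaves like the ungrouped case, with `-exec ls ...` taking the place of `-print`.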
As written, the -exec primary only applies to the code on the right of the -or operator. You need to parenthesize your arguments so that -exec applies to everything that matches. You also need to extract the path from the other arguments (which gets messy if you want to specify multiple paths, as your function would have to decide where to put the parentheses; distinguishing between paths and other expressions would amount to reimplementing a good chunk of find's parsing. I'll assume you are only passing a single path here).
find ()
{
local path=$1
shift
command find "$path" \( "$@" \) -exec ls --color=auto -d {} \;
}
Alternatively, you can put the parentheses in the command line, with no change to your current definition.
find ./ \( -name '*.pl' -or -name '*.pm' \)
Your original function runs
find ./ -name '*.pl' -or -name '*.pm' -exec ls --color=auto -d {} \;
which is equivalent to
find ./ -name '*.pl' -or \( -name '*.pm' -exec ls --color=auto -d {} \; \)
with no implicit -print.
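The answer above assumes a single path because separating paths from the expression is messy. One rough way to accept several paths, offered as a sketch under stated assumptions and not as a complete solution, is to mimic GNU find's own rule: leading arguments are treated as paths until the first one that starts with `-` or is `(`, `)`, `,` or `!`. Global options placed before the paths (such as `-L`) are not handled, omitting the path relies on GNU find's default of `.`, and `--color=auto` is a GNU ls option.

```bash
#!/bin/bash
# Sketch: a find wrapper that accepts several starting points.
find () {
    local -a paths=()
    # Collect leading arguments as paths (assumption: GNU find's rule).
    while (( $# > 0 )); do
        case $1 in
            -* | '(' | ')' | ',' | '!') break ;;
            *) paths+=("$1"); shift ;;
        esac
    done
    if (( $# > 0 )); then
        # Group the expression so the appended -exec applies to every match.
        command find "${paths[@]}" \( "$@" \) -exec ls --color=auto -d {} \;
    else
        # Empty parentheses are an error in find, so skip the grouping here.
        command find "${paths[@]}" -exec ls --color=auto -d {} \;
    fi
}
```

With this definition, `find ./lib ./ -name '*.pl' -or -name '*.pm'` would run `command find ./lib ./ \( -name '*.pl' -or -name '*.pm' \) -exec ls --color=auto -d {} \;`.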

linux find command is not taking proper argument

find . -maxdepth 1 ! -path . -type f ! -name "*.gz" ${FILE_PATTERN} -mtime +${DAYS_AFTER_ARCHIVE}
I am trying to execute the above command inside a script, where ${FILE_PATTERN} and ${DAYS_AFTER_ARCHIVE} are variables provided when executing the script. The value of ${FILE_PATTERN} would be ! -name "*warnings*".
I want the script to execute the equivalent of the following command:
find . -maxdepth 1 ! -path . -type d ! -name "*warnings*" -mtime +7
I am providing file pattern argument as "! -name "warnings""
but receiving following error message
find: paths must precede expression
Usage: find [-H] [-L] [-P] [path...] [expression]
Any suggestions?
First of all
-name "*.gz" ${FILE_PATTERN}
has too many option values (this is what usually causes the message shown)
If you use bash or similar, escape the exclamation marks
find . -maxdepth 1 \! -path . -type f \! -name "*.gz" ${FILE_PATTERN} -mtime +${DAYS_AFTER_ARCHIVE}
I am providing file pattern argument as "! -name "warnings"" but receiving following error message
You can't combine flags and their values in one variable like that. Also, you can't nest double quotes like that. So it could be like:
! -name "$FILE_PATTERN"
If you're using BASH you can make use of BASH arrays:
# find options
FILE_PATTERN=(! -name "warnings*")
# build the find command
cmd=(find . -maxdepth 1 ! -path . -type f \( ! -name "*.gz" "${FILE_PATTERN[@]}" \) -mtime "+${DAYS_AFTER_ARCHIVE}")
# execute the command
"${cmd[@]}"
If not using BASH then you will have to use eval with caution:
FILE_PATTERN='! -name "warnings*"'
# Quote the whole string so that '*.gz' is still quoted when eval re-parses it
eval "find . -maxdepth 1 ! -path . -type f ! -name '*.gz' ${FILE_PATTERN} -mtime +${DAYS_AFTER_ARCHIVE}"
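A third option is sketched below. It assumes the extra conditions can be passed as separate command-line arguments instead of one string, so forwarding them with "$@" avoids both string splitting and eval. The function name `list_archive_candidates` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: list_archive_candidates DAYS [EXTRA_FIND_TESTS...]
# The extra tests arrive as separate arguments (e.g. ! -name '*warnings*'),
# so every word reaches find intact: no eval and no nested quotes.
list_archive_candidates() {
    local days=$1
    shift
    find . -maxdepth 1 ! -path . -type f ! -name '*.gz' "$@" -mtime "+$days"
}
```

For example, `list_archive_candidates 7 ! -name '*warnings*'` corresponds to the command the question wants. In a standalone script the same idea would be `days=$1; shift` followed by the find command with "$@", invoked as, say, `./archive.sh 7 '!' -name '*warnings*'` (the script name is hypothetical).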

What is wrong with my find command usage?

I'm trying to find all files whose name matches certain C++ file extensions but exclude certain directories matching a pattern with this:
find /home/palchan/code -name "*.[CcHh]" -o -name "*.cpp" -o -name "*.hpp" -a ! -name "*pattern*"
but this still outputs files like:
/home/palchan/code/libFox/pattern/hdr/fox/RedFox.H
even though their path contains the pattern. Why?
Here is an example:
> ls -R .
.:
libFox
./libFox:
RedFox.C RedFox.H pattern
./libFox/pattern:
RedFox.C RedFox.H
and then I run:
> find . \( -name "*.[HC]" -a ! -name "*pattern*" \)
./libFox/pattern/RedFox.C
./libFox/pattern/RedFox.H
./libFox/RedFox.C
./libFox/RedFox.H
The following should work:
find /home/palchan/code \( -name "*pattern*" \) -prune -o -type f \( -name "*.[CcHh]" -o -name "*.cpp" -o -name "*.hpp" \) -print
From man find:
-name pattern
Base of file name (the path with the leading directories removed) matches shell pattern pattern. The metacharacters (`*', `?', and `[]') match
a `.' at the start of the base name (this is a change in findutils-4.2.2; see section STANDARDS CONFORMANCE below). To ignore a directory and
the files under it, use -prune; see an example in the description of -path. Braces are not recognised as being special, despite the fact that
some shells including Bash imbue braces with a special meaning in shell patterns. The filename matching is performed with the use of the
fnmatch(3) library function. Don't forget to enclose the pattern in quotes in order to protect it from expansion by the shell.
So, basically, you should use -prune to exclude directories instead of ! -name something
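To see the -prune version from this answer in action, the following sketch recreates the libFox tree from the question in a scratch directory. The `mktemp -d` location is an assumption.

```bash
#!/bin/bash
# Recreate the example tree from the question in a scratch directory.
root=$(mktemp -d)
mkdir -p "$root/libFox/pattern"
touch "$root/libFox/RedFox.C" "$root/libFox/RedFox.H" \
      "$root/libFox/pattern/RedFox.C" "$root/libFox/pattern/RedFox.H"
cd "$root" || exit 1

# -prune stops find from descending into any directory whose name matches
# *pattern*; everything else falls through to the -o branch.
result=$(find . -name '*pattern*' -prune -o -type f \( -name '*.[CcHh]' -o -name '*.cpp' -o -name '*.hpp' \) -print | LC_ALL=C sort)
printf '%s\n' "$result"
```

Only `./libFox/RedFox.C` and `./libFox/RedFox.H` should be printed. The files under `./libFox/pattern` are never visited, because find does not descend into that directory.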
Try this:
find /home/palchan/code \( -name "*.[CcHh]" -o -name "*.cpp" -o -name "*.hpp" -a ! -name "*pattern*" \)

Use find command but exclude files in two directories

I want to find files that end with _peaks.bed, but exclude files in the tmp and scripts folders.
My command is like this:
find . -type f \( -name "*_peaks.bed" ! -name "*tmp*" ! -name "*scripts*" \)
But it didn't work: the files in the tmp and scripts folders are still displayed.
Does anyone have ideas about this?
Here's how you can specify that with find:
find . -type f -name "*_peaks.bed" ! -path "./tmp/*" ! -path "./scripts/*"
Explanation:
find . - Start find from current working directory (recursively by default)
-type f - Specify to find that you only want files in the results
-name "*_peaks.bed" - Look for files with the name ending in _peaks.bed
! -path "./tmp/*" - Exclude all results whose path starts with ./tmp/
! -path "./scripts/*" - Also exclude all results whose path starts with ./scripts/
Testing the Solution:
$ mkdir a b c d e
$ touch a/1 b/2 c/3 d/4 e/5 e/a e/b
$ find . -type f ! -path "./a/*" ! -path "./b/*"
./d/4
./c/3
./e/a
./e/b
./e/5
You were pretty close: the -name option only considers the basename, whereas -path considers the entire path. =)
Use
find \( -path "./tmp" -o -path "./scripts" \) -prune -o -name "*_peaks.bed" -print
or
find \( -path "./tmp" -o -path "./scripts" \) -prune -false -o -name "*_peaks.bed"
or
find \( -path "./tmp" -o -path "./scripts" \) ! -prune -o -name "*_peaks.bed"
The order is important. It evaluates from left to right.
Always begin with the path exclusion.
Explanation
Do not use -not (or !) to exclude whole directory. Use -prune.
As explained in the manual:
−prune The primary shall always evaluate as true; it
shall cause find not to descend the current
pathname if it is a directory. If the −depth
primary is specified, the −prune primary shall
have no effect.
and in the GNU find manual:
-path pattern
[...]
To ignore a whole
directory tree, use -prune rather than checking
every file in the tree.
Indeed, if you use -not -path "./pathname",
find will evaluate the expression for each node under "./pathname".
find expressions are just condition evaluation.
\( \) - groups the operation (you can use -path "./tmp" -prune -o -path "./scripts" -prune -o, but it is more verbose).
-path "./scripts" -prune - if -path returns true and the path is a directory, return true for that directory and do not descend into it.
-path "./scripts" ! -prune - evaluates as (-path "./scripts") AND (! -prune). It inverts the "always true" of -prune to always false, which avoids printing "./scripts" as a match.
-path "./scripts" -prune -false - since -prune always returns true, you can follow it with -false to get the same effect as !.
-o - OR operator. If no operator is specified between two expressions, it defaults to the AND operator.
Hence, \( -path "./tmp" -o -path "./scripts" \) -prune -o -name "*_peaks.bed" -print is expanded to:
[ (-path "./tmp" OR -path "./scripts") AND -prune ] OR ( -name "*_peaks.bed" AND -print )
The -print is important here because without it, the expression is expanded to:
{ [ (-path "./tmp" OR -path "./scripts" ) AND -prune ] OR (-name "*_peaks.bed" ) } AND -print
-print is added by find; that is why, most of the time, you do not need to add it to your expression. And since -prune returns true, it will print "./scripts" and "./tmp".
It is not necessary in the other variants because we switched -prune to always return false.
Hint: You can use find -D opt expr 2>&1 1>/dev/null to see how the expression is optimized and expanded,
and find -D search expr 2>&1 1>/dev/null to see which paths are checked.
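The three prune variants, and the pitfall of omitting -print, can be checked side by side. The following is a minimal sketch with hypothetical file names. It passes `.` explicitly as the starting point, whereas the answer omits it and relies on GNU find's default.

```bash
#!/bin/bash
# Scratch tree: one match inside each excluded folder and one outside.
t=$(mktemp -d)
mkdir -p "$t/tmp" "$t/scripts" "$t/data"
touch "$t/tmp/x_peaks.bed" "$t/scripts/y_peaks.bed" "$t/data/z_peaks.bed"
cd "$t" || exit 1

# The three variants from the answer; each should print only ./data/z_peaks.bed.
v1=$(find . \( -path ./tmp -o -path ./scripts \) -prune -o -name '*_peaks.bed' -print)
v2=$(find . \( -path ./tmp -o -path ./scripts \) -prune -false -o -name '*_peaks.bed')
v3=$(find . \( -path ./tmp -o -path ./scripts \) ! -prune -o -name '*_peaks.bed')

# Without the explicit -print, the implicit -print covers the whole expression,
# so the pruned directories themselves are printed too.
v0=$(find . \( -path ./tmp -o -path ./scripts \) -prune -o -name '*_peaks.bed' | LC_ALL=C sort)

printf '%s\n' "$v1" "$v2" "$v3" "--" "$v0"
```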
Here is one way you could do it...
find . -type f -name "*_peaks.bed" | grep -Ev '^\./(tmp|scripts)/'
For me, this solution didn't work with an -exec action in find (I don't really know why), so my solution is:
find . -type f -path "./a/*" -prune -o -path "./b/*" -prune -o -exec gzip -f -v {} \;
Explanation: same as sampson-chen's answer, with the addition of:
-prune - ignore the preceding path of ...
-o - then, if there is no match, print the results (prune the directories and print the remaining results)
18:12 $ mkdir a b c d e
18:13 $ touch a/1 b/2 c/3 d/4 e/5 e/a e/b
18:13 $ find . -type f -path "./a/*" -prune -o -path "./b/*" -prune -o -exec gzip -f -v {} \;
gzip: . is a directory -- ignored
gzip: ./a is a directory -- ignored
gzip: ./b is a directory -- ignored
gzip: ./c is a directory -- ignored
./c/3: 0.0% -- replaced with ./c/3.gz
gzip: ./d is a directory -- ignored
./d/4: 0.0% -- replaced with ./d/4.gz
gzip: ./e is a directory -- ignored
./e/5: 0.0% -- replaced with ./e/5.gz
./e/a: 0.0% -- replaced with ./e/a.gz
./e/b: 0.0% -- replaced with ./e/b.gz
You can try the following:
find ./ ! \( -path ./tmp -prune \) ! \( -path ./scripts -prune \) -type f -name '*_peaks.bed'
Try something like
find . \( -type f -name \*_peaks.bed -print \) -or \( -type d -and \( -name tmp -or -name scripts \) -and -prune \)
and don't be too surprised if I got it a bit wrong. If the goal is an exec (instead of print), just substitute it in place.
With these explanations you can meet your objective and many others; just combine the parts as you need.
MODEL
find ./ \
-iname "some_arg" -type f \
! -iname "some_arg" -type f \
! -path "./file_name" \
! -path "./folder_name/*" \
-exec grep -IiFl 'text_content' -- {} \;
Where each line means:
-iname "some_arg" -type f: file(s) that you want to find at any hierarchical level.
! -iname "some_arg" -type f: file(s) NOT to be found at any hierarchical level (exclude).
! -path "./file_name": file(s) NOT to be found at this hierarchical level (exclude).
! -path "./folder_name/*": folder(s) NOT to be found at this hierarchical level (exclude).
-exec grep -IiFl 'text_content' -- {} \;: text search in the content of the found file(s), case-insensitive ("-i") and excluding binaries ("-I").
EXAMPLE
find ./ \
-iname "*" -type f \
! -iname "*pyc" -type f \
! -path "./.gitignore" \
! -path "./build/*" \
! -path "./__pycache__/*" \
! -path "./.vscode/*" \
! -path "./.git/*" \
-exec grep -IiFl 'title="Brazil - Country of the Future",' -- {} \;
Thanks! 🤗🇧🇷
[Ref(s).: https://unix.stackexchange.com/q/73938/61742 ]
EXTRA:
You can use the commands above together with your favorite editor and analyze the contents of the files found, for example...
vim -p $(find ./ \
-iname "*" -type f \
! -iname "*pyc" -type f \
! -path "./.gitignore" \
! -path "./build/*" \
! -path "./__pycache__/*" \
! -path "./.vscode/*" \
! -path "./.git/*" \
-exec grep -IiFl 'title="Brazil - Country of the Future",' -- {} \;)
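Note that `vim -p $(find ...)` splits the result on whitespace, so a file name containing a space would be opened as two files. The sketch below keeps such names intact by collecting NUL-separated names into a bash array. The `collect_matches` helper is hypothetical. It assumes bash 4.4 or later for `mapfile -d ''` and GNU grep for `-Z`, and it shows only the `.git` exclusion for brevity.

```bash
#!/bin/bash
# Hypothetical helper: collect_matches TEXT
# Fills the global array "matches" with files under . whose content contains
# TEXT (fixed string, case-insensitive, binaries skipped), skipping ./.git.
# Assumption: GNU grep, where -Z ends each name printed by -l with a NUL byte.
collect_matches() {
    local text=$1
    mapfile -t -d '' matches < <(
        find . -type f ! -path './.git/*' -exec grep -IiFlZ -- "$text" {} +
    )
}
```

Usage would then be `collect_matches 'title="Brazil - Country of the Future",' && vim -p "${matches[@]}"`. The `{} +` form passes many files to each grep call instead of starting one grep per file.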
