Running a script based on hostname within a group variable - linux

I'm new to shell/bash and I'm trying to perform a function to clear logs on my Oracle files. The environment I'm working in has to have all logs fully open, but the issue we have is the volumes filling up and not allowing services to restart. I'm trying to create a script to run as a cron job to search directories depending on which group they're a part of (each group has slightly different paths and names).
I've got the script going through the "VMORDER" which cycles through the groups listed. I want it to pull the host name. Is there a way for me to say "If VM belongs to a group (e.g. GP1, GP2, etc.) then run GP1's script"?
Thanks for any help you can provide :).
#!/bin/bash
SCRIPTDIR=[SCRIPT DIR]
GP1="vm01 vm02"
GP2="vm03 vm04 vm05"
GP3="vm06 vm07 vm08"
VMORDER="GP1 GP2 GP3"
##DIRECTORY PATHS
VCIE_DIRECTORY=[DIRECTORY]
##FILE EXCLUSION LISTING
access_log='access.log'
admin_server='AdminServer.log'
admin_service='adminservice.log'
app_ms_1='app_ms*.log'
app_ms_2='app_ms*.out'
app_wm_1='app_ms*.log'
app_wm_2='app_ms*.out'
audit_recorder_log='DefaultAuditRecorder.log'
jms_log='jms*.log'
osb_log='osb_domain.log'
diagnostic_log='diagnostic.log'
HNAME=$( hostname | cut -d'.' -f1 | tr '[:lower:]' '[:upper:]' )
find_log_rotation(){
for i in $VMORDER
do
clear_logs "${i}"
done
}
clear_logs(){
##GP1
if [ "$HNAME" = GP1 ]; then
find -P "$VCIE_DIRECTORY"/app_ms{1..4}/logs/ -type f -not -name "$app_ms_1" -not -name "$app_ms_2" -not -name "$access_log" -not -name "$audit_recorder_log" -not -name "$jms_log" -mtime 1
fi
##GP2
if [ "$HNAME" = GP2 ]; then
find -P "$VCIE_DIRECTORY"/app_wm{1..4}/logs/ -type f -not -name "$app_wm_1" -not -name "$app_wm_2" -not -name "$access_log" -not -name "$audit_recorder_log" -not -name "$jms_log" -mtime 1
fi
##GP3
if [ "$HNAME" = GP3 ]; then
find -P "$VCIE_DIRECTORY"/AdminServer/logs/ -type f -not -name "$admin_server" -not -name "$access_log" -not -name "$admin_service" -not -name "$osb_log" -mtime 1
fi
}

Short answer:
When the hostname is something like "vm04" and you do not tr it to uppercase, you could use:
if [[ "${GP1}" = *${HNAME}* ]]; then
The double brackets ([[ ]]) make this special syntax: the right-hand side of = is treated as a pattern, not a plain string.
Do not put quotes around the wildcards; quoting the whole right-hand side would turn it back into a normal string comparison.
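One way to apply that test to every group at once is to loop over VMORDER and use bash indirect expansion (${!grp}) to read the list stored in each group variable. This is a minimal sketch, assuming bash and lowercase hostnames as in the question; the function name group_of is hypothetical. Padding both the list and the hostname with spaces avoids a substring match such as "vm0" matching "vm01".

```shell
#!/bin/bash
# Group membership lists from the question (hostnames in lowercase).
GP1="vm01 vm02"
GP2="vm03 vm04 vm05"
GP3="vm06 vm07 vm08"
VMORDER="GP1 GP2 GP3"

# Hypothetical helper: print the name of the group that contains host $1,
# or return 1 if no group does.
group_of() {
    local host=$1 grp
    for grp in $VMORDER; do
        # ${!grp} expands to the value of the variable named by $grp,
        # e.g. the contents of $GP2. The spaces make it a whole-word match.
        if [[ " ${!grp} " == *" ${host} "* ]]; then
            echo "$grp"
            return 0
        fi
    done
    return 1
}

# Short hostname: bash's $HOSTNAME up to the first dot, left in lowercase.
HNAME=${HOSTNAME%%.*}
group_of "$HNAME" || echo "$HNAME is not in any group" >&2
```

The printed group name can then drive a case statement that runs the matching cleanup.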
Long answer
Do you really have to look for different files on different servers? If you do not have app_ms_1 files under AdminServer, one combined skip list can serve every server, which makes selecting the files to skip easier.
SKIP="${app_ms_1}|${app_ms_2}"
SKIP="${SKIP}|${access_log}|${audit_recorder_log}|${jms_log}"
SKIP="${SKIP}|${app_wm_1}|${app_wm_2}"
SKIP="${SKIP}|${admin_server}|${osb_log}"
find "${VCIE_DIRECTORY}"/*/logs -type f -mtime 1 | grep -Ev "${SKIP}" | while IFS= read -r file; do
echo "Something with ${file}"
done
First make sure the code above returns the correct files (should SKIP also include ${app_ms_3}? Are the wildcards handled correctly?).
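The wildcard question matters: the patterns are shell globs, but grep -E reads them as regular expressions, where * repeats the previous character, . matches any character, and nothing is anchored (so access.log would also match a rotated access.log.1 and skip it). A minimal sketch of an alternative, assuming bash arrays: pass each glob to find's own -name test so find does the glob matching. The function name list_candidates is hypothetical, and the patterns are the values of the question's exclusion variables.

```shell
#!/bin/bash
# Glob patterns for log files that must be kept (values from the question).
keep_patterns=('access.log' 'AdminServer.log' 'adminservice.log'
               'app_ms*.log' 'app_ms*.out' 'DefaultAuditRecorder.log'
               'jms*.log' 'osb_domain.log')

# Hypothetical helper: print files under directory $1 that match none of the
# keep patterns and whose age matches the -mtime argument $2 (default 1,
# i.e. last modified between 24 and 48 hours ago, as in the question).
list_candidates() {
    local dir=$1 age=${2:-1} p
    local excludes=()
    for p in "${keep_patterns[@]}"; do
        # Each pattern stays quoted, so find (not the shell) matches the glob.
        excludes+=(-not -name "$p")
    done
    find "$dir" -type f "${excludes[@]}" -mtime "$age" -print
}
```

For example, list_candidates "$VCIE_DIRECTORY/AdminServer/logs" prints a dry-run list; only once that list looks right would you swap -print for -delete.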
Do you need to use HNAME?
Then you might want to rewrite your code:
if [[ "${GP1}" = *${HNAME}* ]]; then
SKIP="${app_ms_1}|${app_ms_2}"
SKIP="${SKIP}|${access_log}|${audit_recorder_log}|${jms_log}"
STARTDIR=( "${VCIE_DIRECTORY}"/app_ms{1..4}/logs/ )
fi
# Something like this also for GP2 and GP3
find "${STARTDIR[@]}" -type f -mtime 1 | grep -Ev "${SKIP}" | xargs -r -d '\n' rm --
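The per-group branches can also be written as a single case statement, which directly expresses "if the VM belongs to GP1, use GP1's directories". This is a minimal sketch, assuming bash 4 (for mapfile) and lowercase short hostnames; the function log_dirs_for and the default /path/to/vcie are hypothetical placeholders for the question's [DIRECTORY]. It only prints candidates, as a dry run.

```shell
#!/bin/bash
# Placeholder base directory, standing in for the question's [DIRECTORY].
VCIE_DIRECTORY=${VCIE_DIRECTORY:-/path/to/vcie}

# Hypothetical function: print, one per line, the log directories that host
# $2 should clean under base directory $1. Each case arm lists one group's
# members as alternatives separated by "|"; unknown hosts return 1.
log_dirs_for() {
    local base=$1 host=$2
    case "$host" in
        vm01|vm02)      printf '%s\n' "$base"/app_ms{1..4}/logs/ ;;   # GP1
        vm03|vm04|vm05) printf '%s\n' "$base"/app_wm{1..4}/logs/ ;;   # GP2
        vm06|vm07|vm08) printf '%s\n' "$base"/AdminServer/logs/ ;;    # GP3
        *) echo "no group for host: $host" >&2; return 1 ;;
    esac
}

# Short hostname (text before the first dot), assumed to be lowercase.
HNAME=${HOSTNAME%%.*}
if dirs_text=$(log_dirs_for "$VCIE_DIRECTORY" "$HNAME"); then
    mapfile -t dirs <<< "$dirs_text"
    # Dry run: print candidates; add the skip rules and -delete once verified.
    find "${dirs[@]}" -type f -mtime 1 -print
fi
```

Keeping the group lists inside the case arms means adding a VM is a one-line change, at the cost of no longer reusing the GP1/GP2/GP3 variables.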

Related

Exclude a directory from plocate's search [duplicate]

How do I exclude a specific directory when searching for *.js files using find?
find . -name '*.js'
If -prune doesn't work for you, this will:
find -name "*.js" -not -path "./directory/*"
Caveat: requires traversing all of the unwanted directories.
Use the -prune primary. For example, if you want to exclude ./misc:
find . -path ./misc -prune -o -name '*.txt' -print
To exclude multiple directories, OR them between parentheses.
find . -type d \( -path ./dir1 -o -path ./dir2 -o -path ./dir3 \) -prune -o -name '*.txt' -print
And, to exclude directories with a specific name at any level, use the -name primary instead of -path.
find . -type d -name node_modules -prune -o -name '*.json' -print
I find the following easier to reason about than other proposed solutions:
find build -not \( -path build/external -prune \) -name \*.js
# you can also exclude multiple paths
find build -not \( -path build/external -prune \) -not \( -path build/blog -prune \) -name \*.js
Important Note: the paths you type after -path must exactly match what find would print without the exclusion. If this sentence confuses you, just make sure to use full paths throughout the whole command like this: find /full/path/ -not \( -path /full/path/exclude/this -prune \) .... See note [1] if you'd like a better understanding.
Inside \( and \) is an expression that will match exactly build/external (see important note above), and will, on success, avoid traversing anything below it. This is then grouped as a single expression with the escaped parentheses, and prefixed with -not, which makes find skip anything that was matched by that expression.
One might ask whether adding -not will make all other files hidden by -prune reappear, and the answer is no. The way -prune works is that once a directory is reached and pruned, the files below it are permanently ignored.
This comes from an actual use case, where I needed to call yui-compressor on some files generated by wintersmith, but leave out other files that need to be sent as-is.
Note [1]: If you want to exclude /tmp/foo/bar and you run find like this "find /tmp \(..." then you must specify -path /tmp/foo/bar. If on the other hand you run find like this cd /tmp; find . \(... then you must specify -path ./foo/bar.
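Note [1] can be checked in a throwaway directory. A small sketch, assuming mktemp is available; the helper name pruned_listing is hypothetical. The same directory is pruned only when the -path pattern is spelled the way find prints it for the chosen starting point.

```shell
#!/bin/bash
# Hypothetical helper: list regular files under $1, pruning any directory
# whose path, as find would print it, matches the -path pattern $2.
pruned_listing() {
    find "$1" -path "$2" -prune -o -type f -print
}

# Throwaway tree: foo/bar/file should be hidden, keep/file should remain.
root=$(mktemp -d)
mkdir -p "$root/foo/bar" "$root/keep"
touch "$root/foo/bar/file" "$root/keep/file"

pruned_listing "$root" "$root/foo/bar"        # only $root/keep/file
(cd "$root" && pruned_listing . ./foo/bar)   # only ./keep/file
pruned_listing "$root" ./foo/bar             # pattern never matches: both files
rm -rf "$root"
```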
There is clearly some confusion here as to what the preferred syntax for skipping a directory should be.
GNU Opinion
To ignore a directory and the files under it, use -prune
From the GNU find man page
Reasoning
-prune stops find from descending into a directory. Just specifying -not -path will still descend into the skipped directory, but -not -path will be false whenever find tests each file.
Issues with -prune
-prune does what it's intended to, but there are still some things you have to take care of when using it.
find prints the pruned directory.
TRUE. That's intended behavior; it just doesn't descend into it. To avoid printing the directory altogether, use a syntax that logically omits it.
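One such syntax is an explicit -print on the right-hand side of -o. Without it, find applies an implicit -print to the whole expression, so the pruned directory (for which -prune returned true) is printed too. A minimal sketch in a throwaway directory; the helper names list_implicit and list_explicit are hypothetical.

```shell
#!/bin/bash
# Implicit -print covers the whole expression, so ./misc itself is printed.
list_implicit() { (cd "$1" && find . -path ./misc -prune -o -name '*.txt'); }
# Explicit -print only on the -name branch: ./misc is pruned and not printed.
list_explicit() { (cd "$1" && find . -path ./misc -prune -o -name '*.txt' -print); }

root=$(mktemp -d)
mkdir -p "$root/misc" "$root/src"
touch "$root/misc/a.txt" "$root/src/b.txt"
list_implicit "$root"   # ./misc and ./src/b.txt
list_explicit "$root"   # ./src/b.txt only
rm -rf "$root"
```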
-prune only works with -print and no other actions.
NOT TRUE. -prune works with any action except -delete. Why doesn't it work with -delete? For -delete to work, find needs to process each directory's contents before the directory itself (-delete implies -depth), since -delete will first delete the leaves, then the parents of the leaves, etc. But for -prune to make sense, find needs to hit a directory before its contents and stop descending into it, which clearly makes no sense with -depth or -delete on.
Performance
I set up a simple test of the three top upvoted answers on this question (replacing -print with -exec bash -c 'echo $0' {} \; to show another action example). Results are below:
----------------------------------------------
# of files/dirs in level one directories
.performance_test/prune_me 702702
.performance_test/other 2
----------------------------------------------
> find ".performance_test" -path ".performance_test/prune_me" -prune -o -exec bash -c 'echo "$0"' {} \;
.performance_test
.performance_test/other
.performance_test/other/foo
[# of files] 3 [Runtime(ns)] 23513814
> find ".performance_test" -not \( -path ".performance_test/prune_me" -prune \) -exec bash -c 'echo "$0"' {} \;
.performance_test
.performance_test/other
.performance_test/other/foo
[# of files] 3 [Runtime(ns)] 10670141
> find ".performance_test" -not -path ".performance_test/prune_me*" -exec bash -c 'echo "$0"' {} \;
.performance_test
.performance_test/other
.performance_test/other/foo
[# of files] 3 [Runtime(ns)] 864843145
Conclusion
Both f10bit's syntax and Daniel C. Sobral's syntax took 10-25ms to run on average. GetFree's syntax, which doesn't use -prune, took 865ms. So yes, this is a rather extreme example, but if you care about run time and are doing anything remotely intensive, you should use -prune.
Note Daniel C. Sobral's syntax performed the better of the two -prune syntaxes; but, I strongly suspect this is the result of some caching as switching the order in which the two ran resulted in the opposite result, while the non-prune version was always slowest.
Test Script
#!/bin/bash
dir='.performance_test'
setup() {
mkdir "$dir" || exit 1
mkdir -p "$dir/prune_me/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/w/x/y/z" \
"$dir/other"
find "$dir/prune_me" -depth -type d -exec mkdir '{}'/{A..Z} \;
find "$dir/prune_me" -type d -exec touch '{}'/{1..1000} \;
touch "$dir/other/foo"
}
cleanup() {
rm -rf "$dir"
}
stats() {
for file in "$dir"/*; do
if [[ -d "$file" ]]; then
count=$(find "$file" | wc -l)
printf "%-30s %-10s\n" "$file" "$count"
fi
done
}
name1() {
find "$dir" -path "$dir/prune_me" -prune -o -exec bash -c 'echo "$0"' {} \;
}
name2() {
find "$dir" -not \( -path "$dir/prune_me" -prune \) -exec bash -c 'echo "$0"' {} \;
}
name3() {
find "$dir" -not -path "$dir/prune_me*" -exec bash -c 'echo "$0"' {} \;
}
printf "Setting up test files...\n\n"
setup
echo "----------------------------------------------"
echo "# of files/dirs in level one directories"
stats | sort -k 2 -n -r
echo "----------------------------------------------"
printf "\nRunning performance test...\n\n"
echo \> find \""$dir"\" -path \""$dir/prune_me"\" -prune -o -exec bash -c \'echo \"\$0\"\' {} \\\;
name1
s=$(date +%s%N)
name1_num=$(name1 | wc -l)
e=$(date +%s%N)
name1_perf=$((e-s))
printf ' [# of files] %s [Runtime(ns)] %s\n\n' "$name1_num" "$name1_perf"
echo \> find \""$dir"\" -not \\\( -path \""$dir/prune_me"\" -prune \\\) -exec bash -c \'echo \"\$0\"\' {} \\\;
name2
s=$(date +%s%N)
name2_num=$(name2 | wc -l)
e=$(date +%s%N)
name2_perf=$((e-s))
printf ' [# of files] %s [Runtime(ns)] %s\n\n' "$name2_num" "$name2_perf"
echo \> find \""$dir"\" -not -path \""$dir/prune_me*"\" -exec bash -c \'echo \"\$0\"\' {} \\\;
name3
s=$(date +%s%N)
name3_num=$(name3 | wc -l)
e=$(date +%s%N)
name3_perf=$((e-s))
printf ' [# of files] %s [Runtime(ns)] %s\n\n' "$name3_num" "$name3_perf"
echo "Cleaning up test files..."
cleanup
This is the only one that worked for me.
find / -name MyFile ! -path '*/Directory/*'
Searching for "MyFile" excluding "Directory".
Note the stars (*) on both sides of /Directory/: they let the pattern match that directory at any depth.
One option would be to exclude all results that contain the directory name with grep. For example:
find . -name '*.js' | grep -v excludeddir
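Keep in mind that grep -v matches any substring of the whole path, so grep -v excludeddir would also drop a file like src/excludeddir_utils.js, and find still traverses the excluded directory. Including the slashes narrows the match to a directory component. A small sketch; the helper name js_outside_excluded and the file names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: list *.js files under $1, dropping only paths that
# contain the directory component "/excludeddir/".
js_outside_excluded() {
    (cd "$1" && find . -name '*.js' | grep -v '/excludeddir/')
}

root=$(mktemp -d)
mkdir -p "$root/excludeddir" "$root/src"
touch "$root/excludeddir/a.js" "$root/src/excludeddir_utils.js"
js_outside_excluded "$root"   # keeps ./src/excludeddir_utils.js
rm -rf "$root"
```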
Tested in Linux Ubuntu 18.04 and 20.04.
find is incredibly important and powerful, but it is so nuanced and confusing!
How do I exclude a specific directory when searching for *.js files using find?
Quick example: exclude all directories with a given prefix
This is a really useful example that doesn't answer the OP's question directly, but is even more useful in my opinion:
@Kamil Dziedzic asked in a comment below my answer (corrected for grammar and punctuation):
How can I ignore directories with a given prefix? For example, I would like to exclude directories starting with _.
Here is how:
# Ignore all directories (and their contents, via `-prune`) beginning with
# prefix "prefix" at the top level of the specified directory (`.`).
find . -not \( -path "./prefix*" -type d -prune \) | sort -V
# Ignore all directories (and their contents, via `-prune`) beginning with
# prefix "prefix" at any level recursively within the specified directory.
find . -not \( -path "*/prefix*" -type d -prune \) | sort -V
So, for a directory prefix of _, use whichever of these you want:
find . -not \( -path "./_*" -type d -prune \) | sort -V
find . -not \( -path "*/_*" -type d -prune \) | sort -V
Explanation:
. means "current directory"
* is a find wildcard, matching any number of any character (like the regular expression .*)
\( and \) are escaped parentheses. They must be escaped with the backslash so that they get passed to find as arguments rather than getting processed by your shell interpreter itself (such as bash or sh or whatever shell you use)
-not \( \) says to ignore files which match the conditions within those parentheses.
-path "./prefix*" says to match all paths which begin with ./prefix, meaning all paths which are at the top level of the . directory you specified in your find command. -path "*/prefix*" will match all paths which begin with anything, followed by /prefix, meaning any path beginning with prefix at any level within any dir in your search path.
-type d says to only match directories. This gets "and"ed with the -path just specified, making it match only files which begin with your specified prefix and are of type "directory".
-prune says to not traverse into matching directories. From man find: "if the file is a directory, do not descend into it." Therefore, without the -prune option, the directory ./prefixWhateverDir itself would be excluded, but files ./prefixWhateverDir/file1.c and ./prefixWhateverDir/file2.c within that directory would NOT be excluded (even ./prefixWhateverDir/prefixFile1.c and ./prefixWhateverDir/prefixFile2.c would not be excluded--also since they are not of -type d). Adding -prune avoids traversing into the excluded directory, thereby excluding files within that directory as well. This might seem weird, but keep in mind in Linux and Unix systems, directories are "files" too, just special types of files which can be a prefix in the path to other files is all. So, with that in mind, having to use -prune makes more sense.
Piping to sort -V with | sort -V just sorts the output in natural (version) order so it reads nicely.
If you think that -not or -prune is required, but not both, that is incorrect. See the new section I just added below called "Addressing other comments" to see a detailed example of running the above commands with both -not and -prune, only -not, and only -prune. They are not the same thing.
Quick summary and answer to the OP's question:
This answers the OP's question directly.
Follow these patterns. See also my comment here. These are the best and most effective patterns I have found, period. The escaped parentheses (\( and \)) and the -prune option are very important for speed. Read below to find out why.
Best patterns to use:
Remove the -name '*.js' part of each command below, of course, if you are looking for a generic answer and not trying to solve the OP's original question, which involved also finding only files with extension .js in their name.
# Exclude one path, and its contents, saving time by *not* recursing down the
# excluded path at all.
# The -prune group goes *before* `-name '*.js'` so that -prune is actually reached.
find . -not \( -path "./dir_to_exclude" -prune \) -name '*.js'
# Add the wildcard asterisk (`*`) to the end of the match pattern, as
# in "./dir_to_exclude*", to exclude all files & folders beginning with the
# name `./dir_to_exclude`. Prune to save time by *not* recursing down the
# excluded paths at all.
# - You can add the asterisk to the end of the pattern to apply this pattern to
# all examples below as well, if desired.
# - This example pattern would exclude "./dir_to_exclude", "./dir_to_exclude1",
# "./dir_to_exclude2", "./dir_to_exclude99", "./dir_to_exclude_some_long_name",
# "./dir_to_exclude_another_long_name", etc., as well as exclude all **files**
# beginning with this match pattern but not otherwise in an excluded dir.
find . -not \( -path "./dir_to_exclude*" -prune \) -name '*.js'
# Exclude multiple paths and their contents, saving time by *not* recursing down
# the excluded paths at all.
find . \
-not \( -path "./dir_to_exclude1" -prune \) \
-not \( -path "./dir_to_exclude2" -prune \) \
-not \( -path "./dir_to_exclude3" -prune \) \
-name '*.js'
# If you change your "starting point" path from `.` to something else, be sure
# to update the beginning of your `-path` with that as well, like this:
find "some_dir" -not \( -path "some_dir/dir_to_exclude" -prune \) -name '*.js'
find "some_dir" \
-not \( -path "some_dir/dir_to_exclude1" -prune \) \
-not \( -path "some_dir/dir_to_exclude2" -prune \) \
-not \( -path "some_dir/dir_to_exclude3" -prune \) \
-name '*.js'
The above patterns are the best because when the -prune option is on with escaped parentheses as shown above, and when you specify the folder name like that (nothing after the folder name in this case), it excludes both the folder and its contents.
If you remove the parentheses and the -prune option, -not -path "./dir_to_exclude" will undesirably exclude only the directory name, but not its contents. If you don't follow my recommended patterns above, you'd have to use -not -path "./dir_to_exclude" to exclude only the folder name, and -not -path "./dir_to_exclude/*" to exclude only the folder contents, and -not -path "./dir_to_exclude" -not -path "./dir_to_exclude/*" to exclude both.
Additionally, removing the parentheses and -prune option from my examples above takes 2x~100x longer. That's a HUGE speed difference! Using the parentheses and -prune option causes find to NOT recurse down the excluded directories, whereas find . -not -path "./dir_to_exclude" -not -path "./dir_to_exclude/*" would still waste vast amounts of time recursing down the excluded directory.
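One ordering detail matters for all of the -prune patterns: find evaluates its expression left to right, and the implied -and stops at the first false test. If -name '*.js' comes before the -prune group, it is false for the directory ./dir_to_exclude itself, so -prune is never reached and find descends into it anyway. A small sketch in a throwaway directory; the helper names name_first and prune_first are hypothetical.

```shell
#!/bin/bash
# -name first: the -prune group is never evaluated for the directory.
name_first()  { (cd "$1" && find . -name '*.js' -not \( -path ./dir_to_exclude -prune \)); }
# -prune group first: the directory is pruned before -name is tested.
prune_first() { (cd "$1" && find . -not \( -path ./dir_to_exclude -prune \) -name '*.js'); }

root=$(mktemp -d)
mkdir -p "$root/dir_to_exclude" "$root/src"
touch "$root/dir_to_exclude/skip.js" "$root/src/keep.js"
name_first "$root"    # also lists ./dir_to_exclude/skip.js
prune_first "$root"   # lists only ./src/keep.js
rm -rf "$root"
```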
Discussion of nuances and rules of thumb
When using find:
You must include either a wildcard (*) or the "starting point" path in the -path you are trying to match. Examples:
Match exact paths relative to the "starting point" path by prefixing your -path to match with the "starting point" path:
# 1. with the "starting point" being the current directory, `.`
find . -not -path "./dir_to_exclude/*"
# or (same thing)
find -not -path "./dir_to_exclude/*"
# 2. with the "starting point" being the root dir, `/`
find / -not -path "/dir_to_exclude/*"
# 3. with the "starting point" being "some_dir"
find "some_dir" -not -path "some_dir/dir_to_exclude/*"
Again, notice that in all -path matches above, you must explicitly prefix the path with the "starting point" path. Otherwise, you can use a wildcard:
Match wildcard paths to find your -path at any level or sub-directory within your search path, i.e. prefix your -path with *. Examples:
# match "./dir_to_exclude/file1" as well as
# "./another_dir/dir_to_exclude/file1"
find . -not -path "*/dir_to_exclude/*"
# match "/dir_to_exclude/file1" as well as
# "/another_dir/dir_to_exclude/file1"
find / -not -path "*/dir_to_exclude/*"
# match "some_dir/dir_to_exclude/file1" as well as
# "some_dir/another_dir/dir_to_exclude/file1"
find "some_dir" -not -path "*/dir_to_exclude/*"
Again, notice that in all -path matches above, I explicitly prefixed the path with the * wildcard char to match at any level.
Use -ipath to do case-insensitive path matches. From man find:
-ipath pattern
Like -path. but the match is case insensitive.
Examples:
# exclude "./dir_to_exclude/*", as well as "./DIR_TO_EXCLUDE/*", and
# "./DiR_To_eXcluDe/*", etc.
find . -not -ipath "./dir_to_exclude/*"
When not using the escaped parenthesis and the -prune option, find will still recurse down the excluded paths, making it as slow as mud. ☹️
When not using the escaped parentheses and the -prune option, find . -not -path "./dir_to_exclude/*" excludes only the contents of the excluded dir, but NOT the excluded dir itself, and find . -not -path "./dir_to_exclude" excludes only the directory name itself, but NOT the contents (files and folders) within that directory! Use both to exclude both. Examples:
# exclude the files and folders within the excluded dir, but
# leaving "./dir_to_exclude" itself
find . -not -path "./dir_to_exclude/*"
# exclude the dir name only, but leaving (NOT excluding) all files and
# folders within that dir!
find . -not -path "./dir_to_exclude"
# exclude both the folder itself, as well as its contents
find . \
-not -path "./dir_to_exclude/*" \
-not -path "./dir_to_exclude"
All of the above examples in this "rules of thumb" section are pure garbage 🧻 and trash 🗑 ☹️. I'm kidding and exaggerating, but the point is: I think they are not nearly as good, for the reasons explained. You should wrap every single one of them with the escaped parentheses and the -prune option, like this 😀:
find . -not \( -path "./dir_to_exclude/*" -prune \)
find -not \( -path "./dir_to_exclude/*" -prune \)
find / -not \( -path "/dir_to_exclude/*" -prune \)
find "some_dir" -not \( -path "some_dir/dir_to_exclude/*" -prune \)
find . -not \( -path "*/dir_to_exclude/*" -prune \)
find / -not \( -path "*/dir_to_exclude/*" -prune \)
find "some_dir" -not \( -path "*/dir_to_exclude/*" -prune \)
find . -not \( -ipath "./dir_to_exclude/*" -prune \)
find . -not \( -path "./dir_to_exclude/*" -prune \)
find . -not \( -path "./dir_to_exclude" -prune \)
find . \
-not \( -path "./dir_to_exclude/*" -prune \) \
-not \( -path "./dir_to_exclude" -prune \)
The -prune option is really important. Here is what it means, from man find (emphasis added):
-prune True; if the file is a directory, do not descend into it. If -depth is given, then
-prune has no effect. Because -delete implies -depth, you cannot usefully use -prune
and -delete together.
For example, to skip the directory src/emacs and all files and directories under
it, and print the names of the other files found, do something like this:
find . -path ./src/emacs -prune -o -print
The above content is my latest information as of 4 Sept. 2022. The below content is my older answer, which still has a ton of useful information, but doesn't cover the nuances as well as what I've presented above. Read it to gain more knowledge and see some more examples, applying what you learned above to what I present below.
Generic examples
Notice that the ./ (or */, see below) before and the /* (or *, but see the caveat below) after the folder name to exclude are required in order to exclude dir_to_exclude, and anything within it!
Also, for speed, and to not traverse excluded directories, notice the really important escaped grouping parentheses and the -prune option. Ex: find -not \( -path "*/dir_to_exclude/*" -prune \).
To see examples of these escaped grouping parentheses in the manual pages, run man find, and then press / to search. Search for the pattern \(, for instance, using the regular expression pattern \\\(. Press Enter to begin searching the man pages. Press n for the next match while searching (N goes back to the previous one).
Summary
These work:
# [my favorite #1] exclude contents of `dir_to_exclude` at the search root
find -not -path "./dir_to_exclude/*"
# exclude all files & folders beginning with the name `dir_to_exclude` at the
# search root
find -not -path "./dir_to_exclude*"
# [my favorite #2] exclude contents of `dir_to_exclude` at any level within your
# search path
find -not -path "*/dir_to_exclude/*"
# exclude all files & folders beginning with the name `dir_to_exclude` at any
# level within your search path
find -not -path "*/dir_to_exclude*"
# To exclude multiple matching patterns, use `-not -path "*/matching pattern/*"`
# multiple times, like this
find -not -path "*/dir_to_exclude1/*" -not -path "*/dir_to_exclude2/*"
[USE THESE] These work too, and are BETTER because they cause find to NOT unnecessarily traverse down excluded paths!:
(This makes a huge difference in speed: it is 2x~100x faster! See here and here. You can also search the man find pages locally for the strings \( and \) with the escaped search strings \\\( and \\\), respectively.)
find -not \( -path "./dir_to_exclude" -prune \) # works here to exclude *both*
# the directory *and* its
# contents, but does *not*
# exclude the contents when
# the directory name is
# written like this in the
# examples above
find -not \( -path "./dir_to_exclude*" -prune \)
find -not \( -path "./dir_to_exclude/*" -prune \)
find -not \( -path "*/dir_to_exclude" -prune \) # same note as just above
find -not \( -path "*/dir_to_exclude*" -prune \)
find -not \( -path "*/dir_to_exclude/*" -prune \)
# To exclude multiple matching patterns at once, use the `-not \( ... \)`
# pattern multiple times, like this
find -not \( -path "*/dir_to_exclude1/*" -prune \) \
-not \( -path "*/dir_to_exclude2/*" -prune \)
...but these do NOT work:
# These do NOT work!
find -not -path "dir_to_exclude"
find -not -path "dir_to_exclude/*"
find -not -path "./dir_to_exclude"
find -not -path "./dir_to_exclude/"
The key is that generally, to make it work, you must begin each matching pattern with either ./ or */, and end each matching pattern with either /* or *, depending on what you're trying to achieve. I say "generally", because there are two noted exceptions in the -not \( ... \)-style section above. You can identify these two exceptions by the comments to the right of them, which say they work here but not in the examples above.
Further Explanation:
[BEST, depending on what you want] This WORKS! Exclude all files and folders inside dir_to_exclude at the root of where you are searching.
Note that this excludes all subfiles and subfolders inside dir_to_exclude, but it does NOT exclude the dir_to_exclude dir itself.
find -not \( -path "./dir_to_exclude/*" -prune \)
Also exclude the dir_to_exclude dir itself (and any file or folder with a name which begins with these characters).
Caveat: this also excludes dir_to_exclude1, dir_to_exclude2, dir_to_exclude_anyTextHere, etc. It excludes ANY file or folder which merely begins with the text dir_to_exclude and is in the root directory of where you're searching.
find -not \( -path "./dir_to_exclude*" -prune \)
[BEST, depending on what you want] to recursively exclude a dir by this name at any level in your search path. Simply add a wildcard * to the front of the path too, rather than using the . to indicate the search root directory.
find -not \( -path "*/dir_to_exclude/*" -prune \)
Recursively exclude any file or folder with a name which begins with the characters dir_to_exclude at any level in your search path. (See also the caveat above).
find -not \( -path "*/dir_to_exclude*" -prune \)
Summary:
In ./, the . at the beginning means "start in the current directory" (or in */, the * is a wildcard to pick up any characters up to this point), and in /* at the end, the * is a wildcard to pick up any characters in the path string after the / character. That means the following:
"./dir_to_exclude/*" matches all subfiles and subfolders within dir_to_exclude in the root search directory (./), but does NOT match the directory itself.
"./dir_to_exclude*" matches dir_to_exclude in the root search directory (./), as well as all contents within it, but with the caveat that it also matches any file or folder name there beginning with the characters dir_to_exclude.
"*/dir_to_exclude/*" matches all subfiles and subfolders within dir_to_exclude in any directory at any level in your search path (*/), but does NOT match the directory itself.
"*/dir_to_exclude*" matches all files and folders at any level (*/) within your search path with a name which begins with dir_to_exclude.
Going further
From there, I like to pipe to grep to search for certain matching patterns in the paths of interest. Ex: search for any path that is NOT inside the dir_to_exclude directory, and which has desired_file_name.txt in it:
# Case-sensitive; notice I use `\.` instead of `.` when grepping, in order to
# search for the literal period (`.`) instead of the regular expression
# wildcard char, which is also a period (`.`).
find -not \( -path "./dir_to_exclude/*" -prune \) \
| grep "desired_file_name\.txt"
# Case-INsensitive (use `-i` with your `grep` search)
find -not \( -path "./dir_to_exclude/*" -prune \) \
| grep -i "desired_file_name\.txt"
# To make `dir_to_exclude` also case INsensitive, use the `find` `-ipath` option
# instead of `-path`:
find -not \( -ipath "./dir_to_exclude/*" -prune \) \
| grep -i "desired_file_name\.txt"
To exclude multiple matching patterns, simply use -not \( -path "*/matching pattern/*" -prune \) multiple times. Ex:
# Exclude all ".git" and "..git" dirs at any level in your search path
find -not \( -path "*/.git/*" -prune \) -not \( -path "*/..git/*" -prune \)
I use the above example as part of my sublf alias here (update: that alias is being expanded and moved into a sublf.sh script in this folder here instead). This alias allows me to use the fzf fuzzy finder to quickly search for and open multiple files in Sublime Text. See the links above for the latest version of it.
alias sublf='FILES_SELECTED="$(find -not \( -path "*/.git/*" -prune \) \
-not \( -path "*/..git/*" -prune \) \
| fzf -m)" \
&& echo "Opening these files in Sublime Text:" \
&& echo "$FILES_SELECTED" \
&& subl $(echo "$FILES_SELECTED")'
Addressing other comments
1. Both -prune and -not are required to get the desired effect
Comment from @Ritin (fixed for formatting/wording):
@Gabriel Staples, both -not and -prune are not required. Use either -prune or -not: find . \( -path '*frontend*' -o -path '*/\.*' -o -path "*node_modules*" \) -prune -o -type f |sort -V
My response:
@Ritin, that's incorrect. To get the effect I want, both -not and -prune are required. This is exactly what I'm talking about when I said at the beginning of my answer:
find is incredibly important and powerful, but it is so nuanced and confusing!
Run the following examples in my eRCaGuy_hello_world/cpp/ folder to see the difference:
both -not and -prune:
Command and output:
eRCaGuy_hello_world/cpp$ find . -not \( -path "./template*" -type d -prune \) | sort -V | grep -i '\./template'
./template_non_type_template_params_print_int_TODO.cpp
As you can see, this command leaves only the one file: ./template_non_type_template_params_print_int_TODO.cpp. It strips all directories which begin with ./template in their path, as well as all contents (files and folders) within them. That's the effect I want.
-not only:
Command and output:
eRCaGuy_hello_world/cpp$ find . -not \( -path "./template*" -type d \) | sort -V | grep -i '\./template'
./template_function_sized_array_param/print_array_calls_by_array_size.ods
./template_function_sized_array_param/readme.md
./template_function_sized_array_param/regular_func
./template_function_sized_array_param/regular_func.cpp
./template_function_sized_array_param/template_func
./template_function_sized_array_param/template_func.cpp
./template_non_type_template_params_print_int_TODO.cpp
./template_practice/explicit_template_specialization.cpp
./template_practice/research/Buckys C++ Programming Tutorials - 61 - Template Specializations - YouTube.desktop
./template_practice/research/Link to explicit (full) template specialization - cppreference.com%%%%%+.desktop
./template_practice/research/Link to template c++ - Google Search%%%%%.desktop
./template_practice/research/Link to template specialization - Google Search [videos]%%%%%.desktop
./template_practice/research/Link to template specialization - Google Search%%%%%.desktop
./template_practice/research/Template (C++) - Wikipedia.desktop
./template_practice/research/Template (C++) - Wikipedia.pdf
./template_practice/research/Template (C++) - Wikipedia_GS_edit.pdf
./template_practice/research/partial template specialization - cppreference.com.desktop
./template_practice/research/(7) Template Specialization In C++ - YouTube.desktop
./template_practice/run_explicit_template_specialization.sh
As you can see, this command strips out the two folders beginning with ./template, namely: ./template_function_sized_array_param and ./template_practice. It still recurses into those directories, however, leaving all of the contents (files and folders) within those directories. The file ./template_non_type_template_params_print_int_TODO.cpp is also present, as before.
-prune only:
Command and output:
eRCaGuy_hello_world/cpp$ find . \( -path "./template*" -type d -prune \) | sort -V | grep -i '\./template'
./template_function_sized_array_param
./template_practice
As you can see, this command only finds the ./template_function_sized_array_param and ./template_practice folders themselves, but the -prune option says to not recurse down into those directories, so it finds none of their contents (files and folders) within them. It also erroneously strips out the ./template_non_type_template_params_print_int_TODO.cpp file, which I don't want. Using -prune only appears to be the exact opposite of using -not only.
Using both -not and -prune together produces the effect I want.
References:
[the main answer to this question] How do I exclude a directory when using `find`?
https://unix.stackexchange.com/questions/350085/is-it-possible-to-exclude-a-directory-from-the-find-command/350172#350172
https://unix.stackexchange.com/questions/32155/find-command-how-to-ignore-case/32158#32158
See also:
My answer: Unix & Linux: All about finding, filtering, and sorting with find, based on file size
[I still need to study and read this] https://www.baeldung.com/linux/find-exclude-paths
[my answer] How to store the output of find (a multi-line string list of files) into a bash array
Keywords: exclude dir in find command; don't search for path with find; case-insensitive find and grep commands
I prefer the -not notation ... it's more readable:
find . -name '*.js' -and -not -path './directory/*'
Use the -prune option. So, something like:
find . -type d -name proc -prune -o -name '*.js' -print
The '-type d -name proc -prune' part only looks for directories named proc to exclude.
The '-o' is an 'OR' operator.
-prune definitely works and is the best answer because it prevents descending into the dir that you want to exclude. -not -path still searches the excluded dir; it just doesn't print the results, which could be an issue if the excluded dir is a mounted network volume or you don't have permissions.
The tricky part is that find is very particular about the order of the arguments, so if you don't get them just right, your command may not work. The order of arguments is generally as such:
find {path} {options} {action}
{path}: Put all the path related arguments first, like . -path './dir1' -prune -o
{options}: I have the most success when putting -name, -iname, etc as the last option in this group. E.g. -type f -iname '*.js'
{action}: You'll want to add -print when using -prune
Here's a working example:
# setup test
mkdir dir1 dir2 dir3
touch dir1/file.txt; touch dir1/file.js
touch dir2/file.txt; touch dir2/file.js
touch dir3/file.txt; touch dir3/file.js
# search for *.js, exclude dir1
find . -path './dir1' -prune -o -type f -iname '*.js' -print
# search for *.js, exclude dir1 and dir2
find . \( -path './dir1' -o -path './dir2' \) -prune -o -type f -iname '*.js' -print
There are plenty of good answers, it just took me some time to understand what each element of the command was for and the logic behind it.
find . -path ./misc -prune -o -name '*.txt' -print
find will start finding files and directories in the current directory, hence the find ..
The -o option stands for a logical OR and separates the two parts of the command:
[ -path ./misc -prune ] OR [ -name '*.txt' -print ]
Any directory or file that is not the ./misc directory will not pass the first test -path ./misc. But they will be tested against the second expression. If their name corresponds to the pattern *.txt they get printed, because of the -print option.
When find reaches the ./misc directory, this directory only satisfies the first expression. So the -prune option will be applied to it. It tells the find command to not explore that directory. So any file or directory in ./misc will not even be explored by find, will not be tested against the second part of the expression and will not be printed.
This is the format I used to exclude some paths:
$ find ./ -type f -name "pattern" ! -path "excluded path" ! -path "excluded path"
I used this to find all files not in ".*" paths:
$ find ./ -type f -name "*" ! -path "./.*" ! -path "./*/.*"
The -path -prune approach also works with wildcards in the path. Here is a find statement that will find the directories for a git server serving multiple git repositories, leaving out the git internal directories (the patterns are quoted so the shell doesn't expand them first):
find . -type d \
-not \( -path '*/objects' -prune \) \
-not \( -path '*/branches' -prune \) \
-not \( -path '*/refs' -prune \) \
-not \( -path '*/logs' -prune \) \
-not \( -path '*/.git' -prune \) \
-not \( -path '*/info' -prune \) \
-not \( -path '*/hooks' -prune \)
To exclude multiple directories:
find . -name '*.js' -not \( -path "./dir1/*" -o -path "./dir2/*" \)
To add directories, add -o -path "./dirname/*":
find . -name '*.js' -not \( -path "./dir1/*" -o -path "./dir2/*" -o -path "./dir3/*" \)
But maybe you should use a regular expression, if there are many directories to exclude.
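If there are many directories to exclude, one option is a single -regex test combined with -prune. The sketch below assumes GNU find (-regextype is a GNU findutils extension); dir1/dir2/dir3/keep and the function name find_js_skipping_dirs are made up for the demo. Because -regex has to match the whole path, the pattern only hits those directories directly under the search root.

```bash
#!/usr/bin/env bash
# Sketch: prune several top-level directories with one -regex test.
# Assumes GNU find (-regextype is a GNU findutils extension).
# dir1, dir2, dir3 and keep are example names only.
set -euo pipefail

find_js_skipping_dirs() {
    # $1: directory to search; paths are printed relative to it ("./...").
    # -regex must match the WHOLE path, so '\./(dir1|dir2|dir3)' only
    # matches those three directories directly under the search root.
    (
        cd "$1" &&
            find . -regextype posix-extended -regex '\./(dir1|dir2|dir3)' -prune \
                -o -name '*.js' -print | sort
    )
}

# Demo on a throwaway tree.
demo=$(mktemp -d)
mkdir "$demo"/dir1 "$demo"/dir2 "$demo"/dir3 "$demo"/keep
touch "$demo"/dir1/a.js "$demo"/dir2/b.js "$demo"/dir3/c.js \
    "$demo"/keep/d.js "$demo"/top.js
find_js_skipping_dirs "$demo"
# ./keep/d.js
# ./top.js
rm -rf "$demo"
```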
A good trick for avoiding printing the pruned directories is to put -print (this works for -exec as well) on the right side of the -or, after -prune. For example, ...
find . -path "*/.*" -prune -or -iname "*.j2"
will print the path of all files beneath the current directory with the .j2 extension, skipping all hidden directories. Neat. But it will also print the full path of each directory it is skipping, as noted above. However, the following does not, ...
find . -path "*/.*" -prune -or -iname "*.j2" -print
because logically there's a hidden -and after the -iname operator and before the -print. This binds it to the right part of the -or clause due to boolean order of operations and associativity. But the docs say there's a hidden -print if it (or any of its cousins ... -print0, etc.) is not specified. So why isn't the left part of the -or printing? Apparently (and I didn't understand this from my first reading of the man page), that is true only if there is no -print or -exec ANYWHERE, in which case the whole expression is treated as ( expression ) -print, so everything for which the expression is true gets printed, including the pruned directories, since -prune itself returns true. If even ONE print-style operation is expressed in any clause, the hidden -print goes away and you get only what you specify. Now frankly, I might have preferred it the other way around, but then a find with only descriptive operators would apparently do nothing, so I guess it makes sense as it is. As mentioned above, this all works with -exec as well, so the following gives a full ls -la listing for each file with the desired extension, but not listing the first level of each hidden directory, ...
find . -path "*/.*" -prune -or -iname "*.j2" -exec ls -la -- {} +
For me (and others on this thread), find syntax gets pretty baroque pretty quickly, so I always throw in parens to make SURE I know what binds to what, so I usually create a macro for type-ability and form all such statements as ...
find . \( \( ... description of stuff to avoid ... \) -prune \) -or \
\( ... description of stuff I want to find ... [ -exec or -print] \)
It's hard to go wrong by setting up the world into two parts this way. I hope this helps, though it seems unlikely for anyone to read down to the 30+th answer and vote it up, but one can hope. :-)
If anyone's researching how to ignore multiple paths at once:
You can use bash arrays (works perfectly on GNU bash, version 4.4.20(1)-release)
#!/usr/bin/env bash
# This script helps ignore unnecessary dir paths while using the find command
# Each "! -path pattern" triple is stored as separate, quoted array elements,
# so the patterns reach find unexpanded by the shell.
EXCLUDE_DIRS=(
! -path '/*.git/*'
! -path '/*go/*'
! -path '/*.bundle/*'
! -path '/*.cache/*'
! -path '/*.local/*'
! -path '/*.themes/*'
! -path '/*.config/*'
! -path '/*.codeintel/*'
! -path '/*python2.7/*'
! -path '/*python3.6/*'
! -path '/*__pycache__/*'
)
find "$HOME" -type f "${EXCLUDE_DIRS[@]}"
# if you like fzf
find "$HOME" -type f "${EXCLUDE_DIRS[@]}" | fzf --height 40% --reverse
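Also, for some reason, you won't be able to ignore /bin/ directory paths. This is most likely because, when the array is expanded unquoted, a pattern such as /*bin/* is expanded by the shell into real paths (/bin/..., /sbin/...) before find ever sees it; quoting each pattern avoids that.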
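An alternative sketch for the array idea above: instead of a list of ! -path tests (which still descend into every excluded directory), build one grouped -prune expression from a list of directory names. The function name find_files_pruning and the example names are hypothetical; each array element is passed quoted, so patterns reach find unexpanded.

```bash
#!/usr/bin/env bash
# Sketch: turn a list of directory names into one grouped -prune expression,
# so find never descends into them. Function and directory names are examples.
set -euo pipefail

find_files_pruning() {
    # $1: search root; remaining args: directory names to skip at any depth.
    local root=$1
    shift
    local expr=()
    local name
    for name in "$@"; do
        # Put "-o" between alternatives, but not before the first one.
        if ((${#expr[@]} > 0)); then expr+=(-o); fi
        expr+=(-name "$name")
    done
    if ((${#expr[@]} == 0)); then
        find "$root" -type f -print
    else
        # \( -name a -o -name b ... \) -prune -o -type f -print
        find "$root" -type d \( "${expr[@]}" \) -prune -o -type f -print
    fi
}

# Demo on a throwaway tree.
demo=$(mktemp -d)
mkdir -p "$demo"/.git/objects "$demo"/src/__pycache__
touch "$demo"/.git/objects/abc "$demo"/src/main.py "$demo"/src/__pycache__/main.pyc
find_files_pruning "$demo" .git __pycache__
# prints only: $demo/src/main.py
rm -rf "$demo"
```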
For a working solution (tested on Ubuntu 12.04 (Precise Pangolin))...
find . ! -path "./dir1/*" -iname "*.mp3"
will search for MP3 files in the current folder and its subfolders, except in the dir1 subfolder.
Use:
find . ! -path "./dir1/*" ! -path "./dir2/*" -iname "*.mp3"
...to exclude dir1 AND dir2
find . \( -path '.**/.git' -o -path '.**/.hg' \) -prune -o -name '*.js' -print
The example above finds all *.js files under the current directory, excluding the .git and .hg folders, no matter how deep those folders are.
Note: this also works:
find . \( -path '.*/.git' -o -path '.*/.hg' \) -prune -o -name '*.js' -print
but I prefer the ** notation for consistency with some other tools which would be off topic here.
find -name '*.js' -not -path './node_modules/*' -not -path './vendor/*'
seems to work the same as
find -name '*.js' -not \( -path './node_modules/*' -o -path './vendor/*' \)
and is easier to remember IMO.
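A quick way to convince yourself the two spellings are equivalent (De Morgan's law: "not A and not B" is the same as "not (A or B)") is to run both on a throwaway tree. The directory names are examples. Note that both versions still descend into node_modules and vendor; they only filter the output.

```bash
#!/usr/bin/env bash
# Sketch: compare the two -not spellings above on a throwaway tree.
# node_modules, vendor and src are example directory names.
set -euo pipefail

demo=$(mktemp -d)
cd "$demo"
mkdir node_modules vendor src
touch node_modules/a.js vendor/b.js src/c.js top.js

chained=$(find . -name '*.js' -not -path './node_modules/*' -not -path './vendor/*' | sort)
grouped=$(find . -name '*.js' -not \( -path './node_modules/*' -o -path './vendor/*' \) | sort)

printf '%s\n' "$chained"
# ./src/c.js
# ./top.js
if [ "$chained" = "$grouped" ]; then echo "same result"; fi

cd / && rm -rf "$demo"
```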
You can also use regular expressions to include/exclude some files/dirs in your search, using something like this:
find . -regextype posix-egrep -regex ".*\.(js|vue|s?css|php|html|json)$" -and -not -regex ".*/(node_modules|vendor)/.*"
This will give you only js, vue, css, etc. files, excluding all files in the node_modules and vendor folders.
None of the previous answers worked well for me on Ubuntu.
Try this:
find . ! -path "*/test/*" -type f -name "*.js" ! -name "*-min-*" ! -name "*console*"
I have found this here
You can use the prune option to achieve this. As in for example:
find . -path './beta' -prune -o -iname example.com -print
Or use grep's inverted-match option, grep -v:
find -iname example.com | grep -v beta
You can find detailed instructions and examples in Linux find command exclude directories from searching.
find . -name 'glob-for-excluded-dir' -prune -o -name '*.js' -print
TLDR: understand your root directories and tailor your search from there, using the -path <excluded_path> -prune -o option. Do not include a trailing / at the end of the excluded path.
Example:
find / -path /mnt -prune -o -name "*libname-server-2.a*" -print
To use find effectively, I believe it is imperative to have a good understanding of your file system directory structure. On my home computer I have multi-TB hard drives, with about half of that content backed up using rsnapshot (i.e., rsync). Although the backups go to a physically independent (duplicate) drive, that drive is mounted under my system root (/) directory: /mnt/Backups/rsnapshot_backups/:
/mnt/Backups/
└── rsnapshot_backups/
    ├── hourly.0/
    ├── hourly.1/
    ├── ...
    ├── daily.0/
    ├── daily.1/
    ├── ...
    ├── weekly.0/
    ├── weekly.1/
    ├── ...
    ├── monthly.0/
    ├── monthly.1/
    └── ...
The /mnt/Backups/rsnapshot_backups/ directory currently occupies ~2.9 TB, with ~60M files and folders; simply traversing those contents takes time:
## As sudo (#), to avoid numerous "Permission denied" warnings:
time find /mnt/Backups/rsnapshot_backups | wc -l
60314138 ## 60.3M files, folders
34:07.30 ## 34 min
time du /mnt/Backups/rsnapshot_backups -d 0
3112240160 /mnt/Backups/rsnapshot_backups ## 3.1 TB
33:51.88 ## 34 min
time rsnapshot du ## << more accurate re: rsnapshot footprint
2.9T /mnt/Backups/rsnapshot_backups/hourly.0/
4.1G /mnt/Backups/rsnapshot_backups/hourly.1/
...
4.7G /mnt/Backups/rsnapshot_backups/weekly.3/
2.9T total ## 2.9 TB, per sudo rsnapshot du (more accurate)
2:34:54 ## 2 hr 35 min
Thus, anytime I need to search for a file on my / (root) partition, I need to deal with (avoid if possible) traversing my backups partition.
EXAMPLES
Among the approaches variously suggested in this thread (How to exclude a directory in find . command), I find that searches using the accepted answer are much faster -- with caveats.
Solution 1
Let's say I want to find the system file libname-server-2.a, but I do not want to search through my rsnapshot backups. To quickly find a system file, use the exclude path /mnt (i.e., use /mnt, not /mnt/, or /mnt/Backups, or ...):
## As sudo (#), to avoid numerous "Permission denied" warnings:
time find / -path /mnt -prune -o -name "*libname-server-2.a*" -print
/usr/lib/libname-server-2.a
real 0m8.644s ## 8.6 sec <<< NOTE!
user 0m1.669s
sys 0m2.466s
## As regular user (victoria); I also use an alternate timing mechanism, as
## here I am using 2>/dev/null to suppress "Permission denied" warnings:
$ START="$(date +"%s")" && find 2>/dev/null / -path /mnt -prune -o \
-name "*libname-server-2.a*" -print; END="$(date +"%s")"; \
TIME="$((END - START))"; printf 'find command took %s sec\n' "$TIME"
/usr/lib/libname-server-2.a
find command took 3 sec ## ~3 sec <<< NOTE!
... finds that file in just a few seconds, while this takes much longer (appearing to recurse through all of the "excluded" directories):
## As sudo (#), to avoid numerous "Permission denied" warnings:
time find / -path /mnt/ -prune -o -name "*libname-server-2.a*" -print
find: warning: -path /mnt/ will not match anything because it ends with /.
/usr/lib/libname-server-2.a
real 33m10.658s ## 33 min 11 sec (~231-663x slower!)
user 1m43.142s
sys 2m22.666s
## As regular user (victoria); I also use an alternate timing mechanism, as
## here I am using 2>/dev/null to suppress "Permission denied" warnings:
$ START="$(date +"%s")" && find 2>/dev/null / -path /mnt/ -prune -o \
-name "*libname-server-2.a*" -print; END="$(date +"%s")"; \
TIME="$((END - START))"; printf 'find command took %s sec\n' "$TIME"
/usr/lib/libname-server-2.a
find command took 1775 sec ## 29.6 min
Solution 2
The other solution offered in this thread (SO#4210042) also performs poorly:
## As sudo (#), to avoid numerous "Permission denied" warnings:
time find / -name "*libname-server-2.a*" -not -path "/mnt"
/usr/lib/libname-server-2.a
real 33m37.911s ## 33 min 38 sec (~235x slower)
user 1m45.134s
sys 2m31.846s
time find / -name "*libname-server-2.a*" -not -path "/mnt/*"
/usr/lib/libname-server-2.a
real 33m11.208s ## 33 min 11 sec
user 1m22.185s
sys 2m29.962s
SUMMARY | CONCLUSIONS
Use the approach illustrated in "Solution 1"
find / -path /mnt -prune -o -name "*libname-server-2.a*" -print
i.e.
... -path <excluded_path> -prune -o ...
noting that whenever you add the trailing / to the excluded path, the find command then recursively enters (all those) /mnt/* directories -- which in my case, because of the /mnt/Backups/rsnapshot_backups/* subdirectories, additionally includes ~2.9 TB of files to search! By not appending a trailing / the search should complete almost immediately (within seconds).
"Solution 2" (... -not -path <exclude path> ...) likewise appears to recursively search through the excluded directories -- not returning excluded matches, but unnecessarily consuming that search time.
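The trailing-slash behaviour can be reproduced without a multi-TB backup drive. This is a small sketch on a throwaway tree (skip and keep are example names): with -path ./skip the directory is pruned; with -path ./skip/ nothing matches, so its contents are searched and printed. GNU find may also print the "will not match anything" warning on stderr.

```bash
#!/usr/bin/env bash
# Sketch: reproduce the trailing-slash pitfall on a small throwaway tree.
# find prints directories without a trailing slash, so -path ./skip/ never
# matches and the "excluded" directory is searched anyway.
set -eu

demo=$(mktemp -d)
cd "$demo"
mkdir skip keep
touch skip/libname-server-2.a keep/libname-server-2.a

good=$(find . -path ./skip -prune -o -name 'libname-server-2.a' -print)
bad=$(find . -path ./skip/ -prune -o -name 'libname-server-2.a' -print | sort)

printf 'without trailing slash:\n%s\n' "$good"
printf 'with trailing slash:\n%s\n' "$bad"

cd / && rm -rf "$demo"
```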
Searching within those rsnapshot backups:
To find a file in one of my hourly/daily/weekly/monthly rsnapshot backups:
$ START="$(date +"%s")" && find 2>/dev/null /mnt/Backups/rsnapshot_backups/daily.0 -name '*04t8ugijrlkj.jpg'; END="$(date +"%s")"; TIME="$((END - START))"; printf 'find command took %s sec\n' "$TIME"
/mnt/Backups/rsnapshot_backups/daily.0/snapshot_root/mnt/Vancouver/temp/04t8ugijrlkj.jpg
find command took 312 sec ## 5.2 minutes: despite apparent rsnapshot size
## (~4 GB), it is in fact searching through ~2.9 TB)
Excluding a nested directory:
Here, I want to exclude a nested directory, e.g. /mnt/Vancouver/projects/ie/claws/data/* when searching from /mnt/Vancouver/projects/:
$ time find . -iname '*test_file*'
./ie/claws/data/test_file
./ie/claws/test_file
0:01.97
$ time find . -path '*/data' -prune -o -iname '*test_file*' -print
./ie/claws/test_file
0:00.07
Aside: Adding -print at the end of the command suppresses the printout of the excluded directory:
$ find / -path /mnt -prune -o -name "*libname-server-2.a*"
/mnt
/usr/lib/libname-server-2.a
$ find / -path /mnt -prune -o -name "*libname-server-2.a*" -print
/usr/lib/libname-server-2.a
The following command works:
find . -path ./.git -prune -o -print
If you have a problem with find, use the -D tree option to view the expression analysis information.
find -D tree . -path ./.git -prune -o -print
Or the -D all, to see all the execution information.
find -D all . -path ./.git -prune -o -print
This is suitable for me on a Mac:
find . \( -path "./vendor" -o -path "./app/cache" \) -prune -o -name '*.php' -print
It excludes the vendor and app/cache directories while searching for files whose names end in .php.
If you are looking for a high-performance answer, then it is:
find . -type d -name node_modules -prune -false -o -type f
Use -false to exclude node_modules itself.
It is about 3x faster than the -not -path approach in a directory with 10000 files in node_modules:
find . -type f -not -path '*node_modules*'
And if node_modules has more files, the performance gain will be even larger.
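To see what -false contributes, here is a small sketch comparing the pattern with and without it on a throwaway tree (names are examples). Without -false, -prune evaluates to true, so the implicit -print lists the node_modules directory itself.

```bash
#!/usr/bin/env bash
# Sketch: compare "-prune -o -type f" with "-prune -false -o -type f".
# node_modules and src are example directory names.
set -euo pipefail

demo=$(mktemp -d)
cd "$demo"
mkdir -p node_modules/pkg src
touch node_modules/pkg/index.js src/app.js

# -prune returns true, so the implicit -print also prints ./node_modules.
without_false=$(find . -type d -name node_modules -prune -o -type f | sort)
# -false makes the left branch fail after pruning, so nothing is printed there.
with_false=$(find . -type d -name node_modules -prune -false -o -type f | sort)

printf 'without -false:\n%s\n\nwith -false:\n%s\n' "$without_false" "$with_false"
cd / && rm -rf "$demo"
```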
I was using find to provide a list of files for xgettext, and wanted to omit a specific directory and its contents. I tried many permutations of -path combined with -prune but was unable to fully exclude the directory which I wanted gone.
Although I was able to ignore the contents of the directory which I wanted ignored, find then returned the directory itself as one of the results, which caused xgettext to crash as a result (doesn't accept directories; only files).
My solution was to simply use grep -v to skip the directory that I didn't want in the results:
find /project/directory -iname '*.php' -or -iname '*.phtml' | grep -iv '/some/directory' | xargs xgettext
Whether or not there is an argument for find that will work 100%, I cannot say for certain. Using grep was a quick and easy solution after some headache.
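For the record, a -prune version can keep directories out of the list: putting -type f on the right-hand side of -o means the pruned directory is never printed. This is a sketch, not a tested xgettext setup: printf stands in for xgettext (you would put your own xgettext command in the -exec), some/directory is the example path from the answer, and the function name list_translatable is hypothetical. Using -exec ... {} + also avoids xargs splitting names that contain spaces.

```bash
#!/usr/bin/env bash
# Sketch: exclude a directory with -prune instead of grep -v.
# printf '%s\n' stands in for xgettext so the sketch runs anywhere.
set -euo pipefail

list_translatable() {
    # $1: project root, given without a trailing slash.
    # The pruned directory is not printed because the right-hand side
    # of -o only accepts regular files (-type f).
    find "$1" -path "$1/some/directory" -prune \
        -o -type f \( -iname '*.php' -o -iname '*.phtml' \) \
        -exec printf '%s\n' {} + | sort
}

# Demo on a throwaway tree.
demo=$(mktemp -d)
mkdir -p "$demo"/some/directory "$demo"/app
touch "$demo"/some/directory/skip.php "$demo"/app/index.php "$demo"/app/view.phtml
list_translatable "$demo"
# $demo/app/index.php
# $demo/app/view.phtml
rm -rf "$demo"
```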
For those of you on older versions of UNIX who cannot use -path or -not
Tested on SunOS 5.10 bash 3.2 and SunOS 5.11 bash 4.4
find . -type f -name "*" -o -type d -name "*excluded_directory*" -prune -type f
how-to-use-prune-option-of-find-in-sh is an excellent answer by Laurence Gonsalves on how -prune works.
And here is the generic solution:
find /path/to/search \
-type d \
\( -path /path/to/search/exclude_me \
-o \
-name exclude_me_too_anywhere \
\) \
-prune \
-o \
-type f -name '*\.js' -print
To avoid typing /path/to/search/ multiple times, wrap the find in a pushd ... popd pair.
pushd /path/to/search; \
find . \
-type d \
\( -path ./exclude_me \
-o \
-name exclude_me_too_anywhere \
\) \
-prune \
-o \
-type f -name '*\.js' -print; \
popd

Append output of Find command to Variable in Bash Script

Trying to append output of find command to a variable in a Bash script
Can append output of find command to a log file ok, but can't append it to a variable i.e.
This works ok:
find $DIR -type d -name "*" >> $DIRS_REMOVED_LOG
But this won't:
FILES_TO_EVAL=find $DIR -type f \( -name '*.sh' -or -name '*.txt' -or -name '*.xml' -or -name '*.log' \)
ENV=`basename $PS_CFG_HOME | tr "[:lower:]" "[:upper:]"`
FILE_TYPES=(*.log *.xml *.txt *.sh)
DIRS_TO_CLEAR="$PS_CFG_HOME/data/files $PS_CFG_HOME/appserv/prcs/$ENV/files $PS_CFG_HOME/appserv/prcs/$ENV/files/CQ"
FILES_REMOVED_LOG=$PS_CFG_HOME/files_removed.log
DIRS_REMOVED_LOG=$PS_CFG_HOME/dirs_removed.log
##Cycle through directories
##Below for files_removed_log works ok but can't get the find into a variable.
for DIR in `echo $DIRS_TO_CLEAR`
do
echo "Searching $DIR for files:"
FILES_TO_EVAL=find $DIR -type f \( -name '*.sh' -or -name '*.txt' -or -name '*.xml' -or -name '*.log' \)
find $DIR -type d -name "*" >> $DIRS_REMOVED_LOG
done
Expected FILES_TO_EVAL to be populated with results of find command but it is empty.
Run your scripts through ShellCheck. It finds lots of common mistakes, much like a compiler would.
FILES_TO_EVAL=find $DIR -type f \( -name '*.sh' -or -name '*.txt' -or -name '*.xml' -or -name '*.log' \)
SC2209: Use var=$(command) to assign output (or quote to assign string).
In addition to the problems that shellcheck.net will point out, there are a number of subtler problems.
For one thing, you're using all-caps variable names. This is dangerous, because there are a large number of all-caps variables that have special meanings to the shell and/or other tools, and if you accidentally use one of those, it can have weird effects. Lower- or mixed-case variables are much safer (except when you specifically want the special meaning).
Also, you should almost always put double-quotes around variable references (e.g. find "$dir" ... instead of find $dir ...). Without them, the variables will be subject to word splitting and wildcard expansion, which can have a variety of unintended consequences. In some cases, you need word splitting and/or wildcard expansion on a variable's value, but usually not quite the way the shell does it; in these cases, you should look for a better way to do the job.
In the line that's failing,
FILES_TO_EVAL=find $DIR -type f \( -name '*.sh' -or -name '*.txt' -or -name '*.xml' -or -name '*.log' \)
the immediate problem is that you need to use $(find ...) to capture the output from the find command. But this is still dangerous, because it's just storing a newline-delimited list of file paths, and the standard way to expand this (just using an unquoted variable reference) has all the problems I mentioned above. In this case, it will lead to trouble if any filenames contain spaces or wildcards (which are perfectly legal in filenames). If you're in a controlled environment where you can guarantee this won't happen, you'll get away with it... but it's really not the best idea.
Correctly handling a list of filepaths from find is a little complicated, but there are a number of ways to do it. There's a lot of good info in BashFAQ #20: "How can I find and safely handle file names containing newlines, spaces or both?" I'll summarize some common options below:
If you don't need to store the list, just run commands on individual files, you can use find -exec:
find "$dir" -type f \( -name '*.sh' -or -name '*.txt' -or -name '*.xml' -or -name '*.log' \) -exec somecommand {} \;
If you need to run something more complex, you can use find -print0 to output the list in an unambiguous form, and then use read -d '' to read them. There are a bunch of potential pitfalls here, so here's the version I use to avoid all the trouble spots:
while IFS= read -r -d '' filepath <&3; do
dosomethingwith "$filepath"
done 3< <(find "$dir" -type f \( -name '*.sh' -or -name '*.txt' -or -name '*.xml' -or -name '*.log' \) -print0)
Note that the <(command) syntax (known as process substitution) is a bash-only feature, so use an explicit bash shebang (#!/bin/bash or #!/usr/bin/env bash) on your script, and don't override it by running the script with sh.
If you really do need to store the list of paths for later, store it as an array:
files_to_eval=()
while IFS= read -r -d '' filepath; do
files_to_eval+=("$filepath")
done < <(find "$dir" -type f \( -name '*.sh' -or -name '*.txt' -or -name '*.xml' -or -name '*.log' \) -print0)
...or, if you have bash v4.4 or later, it's easier to use readarray (aka mapfile):
readarray -td '' files_to_eval < <(find "$dir" -type f \( -name '*.sh' -or -name '*.txt' -or -name '*.xml' -or -name '*.log' \) -print0)
In either case, you should then expand the array with "${files_to_eval[@]}" to get all the elements without subjecting them to word splitting and wildcard expansion.
On to some other problems. In this line:
FILE_TYPES=(*.log *.xml *.txt *.sh)
In this context, the wildcards will be expanded immediately to a list of matches in the current directory. You should quote them to prevent this:
file_types=("*.log" "*.xml" "*.txt" "*.sh")
In these lines:
DIRS_TO_CLEAR="$PS_CFG_HOME/data/files $PS_CFG_HOME/appserv/prcs/$ENV/files $PS_CFG_HOME/appserv/prcs/$ENV/files/CQ"
...
for DIR in `echo $DIRS_TO_CLEAR`
You're storing a list as a single string with entries separated by spaces, which has all the word-split and wildcard problems I've been harping on. Also, the echo here is a complication that doesn't do anything useful, and actually makes the wildcard problem worse. Use an array, and avoid all the mess:
dirs_to_clear=("$ps_cfg_home/data/files" "$ps_cfg_home/appserv/prcs/$env/files" "$ps_cfg_home/appserv/prcs/$env/files/CQ")
...
for dir in "${dirs_to_clear[@]}"
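Putting those suggestions together, the question's loop might look roughly like this. It is a sketch: ps_cfg_home and env stand in for the question's PS_CFG_HOME and ENV and point at a temporary demo tree, and a file name containing a space is included on purpose to show that it survives intact.

```bash
#!/usr/bin/env bash
# Sketch: the question's loop rewritten with arrays.
# ps_cfg_home and env stand in for PS_CFG_HOME and ENV (demo values only).
set -euo pipefail

ps_cfg_home=$(mktemp -d)
env=DEMO
dirs_to_clear=("$ps_cfg_home/data/files" "$ps_cfg_home/appserv/prcs/$env/files")
mkdir -p "${dirs_to_clear[@]}"
touch "$ps_cfg_home/data/files/run 1.log" "$ps_cfg_home/data/files/keep.bin" \
    "$ps_cfg_home/appserv/prcs/$env/files/job.xml"

files_to_eval=()
for dir in "${dirs_to_clear[@]}"; do
    echo "Searching $dir for files:"
    # -print0 and read -d '' keep names with spaces (like "run 1.log") intact.
    while IFS= read -r -d '' filepath; do
        files_to_eval+=("$filepath")
    done < <(find "$dir" -type f \( -name '*.sh' -o -name '*.txt' -o -name '*.xml' -o -name '*.log' \) -print0)
done

printf 'Found %d file(s):\n' "${#files_to_eval[@]}"
printf '  %s\n' "${files_to_eval[@]}"
rm -rf "$ps_cfg_home"
```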

Bash - How to properly list files in a folder and manage exclusion

I'm looking for a proper way to list
all filenames (without extension)
matching a specific extension list
recursively in a specific folder
with some exclusions patterns
and then export that to a file.
Currently I'm doing the following, which works properly:
ls -R --ignore={"Sample","Sample.*","sample.*","*_sample.*","*.sample.*","*-sample.*","*.sample-*","*-sample-*","*trailer]*"} "$filesSource" | grep -E '\.mkv$|\.mp4$|\.avi$' | sed -e 's/\(.*\)/\L\1/' | sort >> "$listFile"
ShellCheck flags this line, and I don't know how to do it properly!
Thanks for your help!
Why don't you try find command?
something like
find YOUR_PATH -type f \( -name "*.FIRST_EXTENSION" -o -name "*.SECOND_EXTENSION" \) | grep -v SOME_EXCLUSION | awk -F. '{print $(NF-1)}' | sort > SOME_FILE
Note: this will work only if the filenames contain a single "." (the one before the extension); otherwise you need to modify the awk part a little.
If you are searching just on filenames, then you can use:
I split the command into multiple lines:
$ find /path/to/folder -type f \
    \( \( -name '*.ext1' -or -name '*.ext2' -or -name '*.ext3' \) \
    -and -not \( -name '*excl1*' -or -name 'excl2*' \) \) \
    -print
This will do:
/path/to/folder: the folder you are searching
-type f : you are searching for files in the above folder which satisfy
\(: open the conditional test
\( -name '*.ext1' -or -name '*.ext2' -or -name '*.ext3' \): who have one of the three listed extensions (with a conditional or)
-and -not \( -name '*excl1*' -or -name 'excl2*' \): if the above condition matches, it also checks (-and) that neither *excl1* nor excl2* matches (-not)
\) close the main conditional test
-print: performs the action of printing the found paths.
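For the original question (names without extension, several extensions, exclusions, written to a file), a find-based sketch could look like this. The exclusion patterns are copied from the question's --ignore list; pruning directories named Sample only approximates what ls --ignore does, and list_media_names is a made-up name. Extension matching is case-sensitive, like the question's grep. Drop the ${name%.*} line if you want to keep the extension.

```bash
#!/usr/bin/env bash
# Sketch: a find-based version of the question's ls | grep | sed pipeline.
# Exclusion patterns come from the question; the function name is made up.
set -euo pipefail

list_media_names() {
    # $1: folder to scan recursively. Prints lowercase file names with the
    # directory and the last extension removed, one per line, sorted.
    find "$1" -type d -name 'Sample' -prune -o -type f \
        \( -name '*.mkv' -o -name '*.mp4' -o -name '*.avi' \) \
        -not \( -name 'Sample.*' -o -name 'sample.*' -o -name '*_sample.*' \
        -o -name '*.sample.*' -o -name '*-sample.*' -o -name '*.sample-*' \
        -o -name '*-sample-*' -o -name '*trailer]*' \) -print0 |
        while IFS= read -r -d '' filepath; do
            name=${filepath##*/} # drop the directory part
            name=${name%.*}      # drop the last extension
            printf '%s\n' "$name"
        done | tr '[:upper:]' '[:lower:]' | sort
}

# Demo: filesSource and listFile reuse the question's variable names.
filesSource=$(mktemp -d)
listFile=$(mktemp)
mkdir -p "$filesSource/Movie A/Sample"
touch "$filesSource/Movie A/Movie.A.2019.mkv" \
    "$filesSource/Movie A/Sample/clip.mkv" \
    "$filesSource/Movie A/movie-sample.mp4" \
    "$filesSource/Other.avi" "$filesSource/notes.txt"
list_media_names "$filesSource" >> "$listFile"
cat "$listFile"
# movie.a.2019
# other
rm -rf "$filesSource" "$listFile"
```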

Find specific directory and ignore others

I need to find all the iplanets on one server and I was thinking to use this command:
find / type d -name https-* | uniq
But at the same time I need to ignore some directories/files. I've been trying to use !, but it doesn't always work. I have a command like this:
find / type d -name https-* ! -name https-admserv* ! -name conf_bk* ! -name alias* ! -name *db* ! -name ClassCache* | uniq
I need to ignore all that. The directories admserv, conf_bk, alias and tmp and the files *.db*
Basically I need find this:
/opt/mw/iplanet/https-daniel.com
/opt/https-daniel1.com
/apps/https-daniel2.com
I only need to find the directory name. How can I ignore all the other stuff?
Use -prune to keep from recursing into directories:
find / -type d \( -name 'https-admserv*' -o -name 'conf_bk*' -o -name 'alias*' -o -name 'tmp' \) -prune -o -type d -name 'https-*' -print
There's no need to ignore any files. You're only selecting https-* directories, so everything else is ignored.
And there's no need to pipe to uniq, since find never produces duplicates.
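A quick way to check the command is to run it on a throwaway tree shaped like the question's servers. The directory names and the function name find_iplanet_instances are examples, and the search starts at a temp directory rather than /. The https-* directory inside https-admserv is never reached because its parent is pruned.

```bash
#!/usr/bin/env bash
# Sketch: run the -prune command above against a throwaway tree.
# All directory names are examples; the search root is a temp directory.
set -euo pipefail

find_iplanet_instances() {
    find "$1" -type d \( -name 'https-admserv*' -o -name 'conf_bk*' \
        -o -name 'alias*' -o -name 'tmp' \) -prune \
        -o -type d -name 'https-*' -print
}

root=$(mktemp -d)
mkdir -p "$root/opt/mw/iplanet/https-daniel.com/conf_bk" \
    "$root/opt/mw/iplanet/https-admserv/https-inside-admserv" \
    "$root/opt/https-daniel1.com" \
    "$root/apps/https-daniel2.com"
find_iplanet_instances "$root" | sort
# prints the three https-daniel* directories under $root, nothing else
rm -rf "$root"
```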

Need help writing compound linux query

Trying to write my first compound linux query and running into some gaps in knowledge.
The idea is to find all the files that may be either .doc or .txt, as well as search their contents for the text clown.
So I started off with searching from the root as such.
$find /
Then I added the wildcard for the filename.
$find / -name '*.doc'...uhh oh
First question. How do I specify or? Is it with pipe | or double pipe || or...? and do I need to repeat the -name parameter like this?
$find / -name '*.doc' || -name '*.txt'
Second ? do I add the grep for the string after / before...?
$find / -name '*.doc' || -name '*.txt' grep -H 'cat' {} \
Finally is there a place where I can validate syntax / run like SQLFiddle?
TIA
'Or' in find is -o
You have to specify the find type again though. So something like:
find / -name '*.doc' -o -name '*.txt'
You can simply put your grep command in front, so long as you encase your find command in backticks:
grep 'whatever' `find / -name '*.doc' -o -name '*.txt'`
There's a reasonably nice guide to find here
You want something like this:
find / \( -name \*.doc -o -name \*.txt \) -exec grep clown {} \; -print
you specify or with -o within \( \), you run grep in a -exec and you can validate the syntax in a bash shell.
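A variant of the same idea that lists only the matching file names: grep -l prints each file once, -type f skips directories, and -exec ... {} + passes many files to each grep call. It is a sketch on a demo tree (the function name find_clown_files is hypothetical). Also note that .doc files are often binary, so a plain text search may miss words stored in them.

```bash
#!/usr/bin/env bash
# Sketch: list .doc/.txt files whose contents contain "clown".
# Unlike the backtick version, -exec ... {} + copes with spaces in names.
set -u

find_clown_files() {
    # $1: search root (a demo tree here, not /).
    # grep -l prints each matching file name once.
    find "$1" -type f \( -name '*.doc' -o -name '*.txt' \) \
        -exec grep -l 'clown' {} + | sort
}

root=$(mktemp -d)
mkdir "$root/notes"
printf 'the clown arrives\n' > "$root/notes/circus day.txt"
printf 'nothing to see\n' > "$root/report.doc"
printf 'clown\n' > "$root/readme.md"
find_clown_files "$root"
# prints only: $root/notes/circus day.txt
rm -rf "$root"
```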
Try:
(find ./ -name "*.txt" -print0 2>/dev/null ; find ./ -name "*.doc" -print0 2>/dev/null) | xargs -0 grep clown
