Exclude list of files from find - linux

If I have a list of filenames in a text file that I want to exclude when I run find, how can I do that? For example, I want to do something like:
find /dir -name "*.gz" -exclude_from skip_files
and get all the .gz files in /dir except for the files listed in skip_files. But find has no -exclude_from flag. How can I skip all the files in skip_files?

I don't think find has an option like this. You could build the command using printf and your exclude list:
find /dir -name "*.gz" $(printf "! -name %s " $(cat skip_files))
Which is the same as doing:
find /dir -name "*.gz" ! -name first_skip ! -name second_skip .... etc
Alternatively you can pipe from find into grep:
find /dir -name "*.gz" | grep -vFf skip_files
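The printf approach above relies on word splitting, so a listed name containing a space would be split into several arguments. A minimal sketch of one way around that, assuming one file name per line in skip_files; the demo directory and file names are made up for illustration:

```shell
# Demo tree in a temporary directory (all names here are hypothetical).
demo=$(mktemp -d)
touch "$demo/keep.gz" "$demo/first skip.gz" "$demo/second_skip.gz"
printf '%s\n' 'first skip.gz' 'second_skip.gz' > "$demo/skip_files"

# Collect one "! -name NAME" triple per non-empty line of the skip list.
set --
while IFS= read -r name; do
  if [ -n "$name" ]; then
    set -- "$@" ! -name "$name"
  fi
done < "$demo/skip_files"

# "$@" expands to the collected tests, each as its own argument.
result=$(find "$demo" -name '*.gz' "$@")
echo "$result"
rm -rf "$demo"
```

Note that -name still treats each entry as a glob pattern, so names containing *, ? or [ would need escaping.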

This is what I usually do to remove some files from the result (in this case I looked for all text files but wasn't interested in a bunch of valgrind memcheck reports we have here and there):
find . -type f -name '*.txt' ! -name '*mem*.txt'
It seems to be working.

You could try something like:
find /dir \( -name "*.gz" ! -name skip_file1 ! -name skip_file2 ...so on \)

find /var/www/test/ -type f \( -iname "*.*" ! -iname "*.php" ! -iname "*.jpg" ! -iname "*.png" \)
The above command lists all files except those with a .php, .jpg or .png extension. It works for me in PuTTY.

Josh Jolly's grep solution works, but has O(N**2) complexity, making it too slow for long lists. If the lists are sorted first (O(N*log(N)) complexity), you can use comm, which has O(N) complexity:
find /dir -name '*.gz' |sort >everything_sorted
sort skip_files >skip_files_sorted
comm -23 everything_sorted skip_files_sorted | xargs . . . etc
See man comm on your system for details.
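A small end-to-end sketch of the comm approach, assuming the skip list holds full paths exactly as find prints them (comm compares whole lines). The temporary directory and file names are invented for the demo:

```shell
demo=$(mktemp -d)
mkdir "$demo/dir"
touch "$demo/dir/a.gz" "$demo/dir/b.gz" "$demo/dir/c.gz"
# The skip list is deliberately unsorted here.
printf '%s\n' "$demo/dir/c.gz" "$demo/dir/a.gz" > "$demo/skip_files"

# Sort both lists with the same locale so comm sees a consistent order.
find "$demo/dir" -name '*.gz' | LC_ALL=C sort > "$demo/everything_sorted"
LC_ALL=C sort "$demo/skip_files" > "$demo/skip_files_sorted"

# -2 drops lines found only in the skip list, -3 drops lines common to both,
# leaving the lines that appear only in the find output.
remaining=$(LC_ALL=C comm -23 "$demo/everything_sorted" "$demo/skip_files_sorted")
echo "$remaining"
rm -rf "$demo"
```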

This solution will go through all files (not exactly excluding from the find command), but will produce an output skipping files from a list of exclusions.
I found that useful while running a time-consuming command (find /dir -exec md5sum {} \;).
You can create a shell script to handle the skipping logic and run commands on the files found (make it executable with chmod, replace echo with other commands):
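$ cat skip_file.sh
#!/bin/bash
# grep -F treats "$1" as a fixed string, -x requires a whole-line match,
# -q only sets the exit status.
if ! grep -Fxq -- "$1" files_to_skip.txt; then
# run your command
echo "$1"
fi
Create a file named files_to_skip.txt with the list of files to skip (in the directory you are running from). Each entry must match the path exactly as find passes it, e.g. /dir/sub/file.gz.
Then call the script from find: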
find /dir -name "*.gz" -exec ./skip_file.sh {} \;
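The command above starts one script per file. As a hedged variation, the same check can run inside an inline sh script that receives many files at once through -exec ... {} +. The paths below are hypothetical and the skip list again contains full paths:

```shell
demo=$(mktemp -d)
touch "$demo/a.gz" "$demo/b.gz" "$demo/c.gz"
printf '%s\n' "$demo/a.gz" "$demo/c.gz" > "$demo/files_to_skip.txt"

# The first argument after "sh" is the skip list; find appends the file names.
processed=$(find "$demo" -name '*.gz' -exec sh -c '
  list=$1; shift
  for f; do
    # Succeeds only when "$f" is an exact line of the skip list.
    if ! grep -Fxq -- "$f" "$list"; then
      echo "processing $f"   # replace echo with the real command
    fi
  done' sh "$demo/files_to_skip.txt" {} +)
echo "$processed"
rm -rf "$demo"
```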

This should work:
find * -name "*.gz" $(printf "! -path %s " $(<skip_files.txt))
Working out
Assuming skip_files has a filename on each line, you can get the list of filenames via $(<skip_files.txt). E.g. echo $(<skip_files.txt) should print them all out.
For each filename you want to have a ! -path filename expression. To build this, use $(printf "! -path %s " $(<skip_files.txt))
Then, put it together with a filter on -name "*.gz"

Related

"find" command but it stops going deep if it finds a directory starting with "."

I have to make a script that goes through a whole folder (/home, in my case).
I have to save all the files except the ones that start with ., and also, if I find a directory that starts with ., I don't have to care what's inside, I don't have to read it.
For the first part we use the command
for path in $(find /home \! -name ".*");do
where path is a variable that contains the path. But we don't know how to do the directory part.
I thought I'd split the path on / and then check whether any component starts with a dot. In that case, an if would skip the file, but I don't know how to split a string, save the pieces in a variable and then go through them.
You can prune everything whose name starts with a dot (.).
From the man page of GNU find:
-prune True; if the file is a directory, do not descend into it. If -depth is given, false; no effect. Because -delete implies -depth, you cannot usefully use -prune and -delete together.
You should not loop over the result from find. You will get unexpected results if you have filenames with spaces or newlines.
Use xargs or -exec, e.g.
find /home -path "*/.*" -prune -o -print0 | xargs -0I{} sh -c 'echo "doing something with $1"' sh {}
or
find /home -path "*/.*" -prune -o -exec sh -c 'for i; do echo "doing something with $i"; done' sh {} +
The -prune part removes all filenames (files and directories) starting with a dot and does not descend into directories starting with a dot.
All other filenames are printed with a NUL character instead of a newline (-o -print0) and piped to xargs or a shell script is executed with your action (as few times as possible).
To save all filenames into a file:
find /home -path "*/.*" -prune -o -print > allfiles.txt
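To see the pruning in action without touching /home, here is a small sketch on a throwaway tree. The directory layout is invented, and -type f is added so that only regular files reach the inline script:

```shell
demo=$(mktemp -d)
mkdir -p "$demo/user/.cache" "$demo/user/docs"
touch "$demo/user/.bashrc" "$demo/user/.cache/tmp" "$demo/user/docs/my notes.txt"

# -path "*/.*" matches any entry whose name starts with a dot; -prune drops it
# and, for directories, everything below it. Everything else goes to the script.
visible=$(cd "$demo" && find . -path '*/.*' -prune -o -type f -exec sh -c '
  for f; do printf "file: %s\n" "${f##*/}"; done' sh {} +)
echo "$visible"
rm -rf "$demo"
```

The file name with a space arrives intact because find passes it as a single argument, which is the reason for preferring -exec over a for loop on command substitution.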
Try this
for path in $(find /home -type d -name ".*" -prune -o -type f \! -name ".*" -print);do echo $path; done
I think I would do something like that:
for path in $(find . -type f | egrep -v '/\.[^\/]+\/'); do
...
Note that you may have to take extra steps if some of your files have spaces in their names.

Find all files contained in a directory with a given name

I would like to recursively find all files contained in a directory named “name1” or “name2”
for instance:
structure/of/dir/name1/file1.a
structure/of/dir/name1/file2.b
structure/of/dir/name1/file3.c
structure/of/dir/name1/subfolder/file1s.a
structure/of/dir/name1/subfolder/file2s.b
structure/of/dir/name2/file1.a
structure/of/dir/name2/file2.b
structure/of/dir/name2/file3.c
structure/of/dir/name2/subfolder/file1s.a
structure/of/dir/name2/subfolder/file2s.b
structure/of/dir/name3/name1.a ←this should not show up in the result
structure/of/dir/name3/name2.a ←this should not show up in the result
so when I start my magic command the expected output should be this and only this:
structure/of/dir/name1/file1.a
structure/of/dir/name1/file2.b
structure/of/dir/name1/file3.c
structure/of/dir/name2/file1.a
structure/of/dir/name2/file2.b
structure/of/dir/name2/file3.c
I scripted something, but it does not work because grep matches against the file names as well, not only the directory names:
for entry in $(find $SEARCH_DIR -type f | grep 'name1\|name2');
do
echo "FileName: $(basename $entry)"
done
If you can use the -regex option, avoiding subfolders with [^/] (the parentheses make -type f apply to both patterns):
~$ find . -type f \( -regex ".*/name1/[^/]*" -o -regex ".*/name2/[^/]*" \)
./structure/of/dir/name2/file1.a
./structure/of/dir/name2/file3.c
./structure/of/dir/name2/file2.b
./structure/of/dir/name1/file1.a
./structure/of/dir/name1/file3.c
./structure/of/dir/name1/file2.b
I'd use -path and -prune for this, since both are standard (unlike -regex, which is not specified by POSIX).
find . \( -path "*/name1/*" -o -path "*/name2/*" \) -prune -type f -print
But more importantly, never do for file in $(find...). Use find's -exec or a while read loop instead, depending on what you really need to do with the matching files. See UsingFind and BashFAQ 20 for more on how to handle find safely.
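A sketch combining the -path/-prune expression with -exec, so the basename loop from the question needs no word splitting. The tree below is a cut-down, hypothetical version of the one in the question:

```shell
demo=$(mktemp -d)
mkdir -p "$demo/structure/of/dir/name1/subfolder" "$demo/structure/of/dir/name3"
touch "$demo/structure/of/dir/name1/file1.a" \
      "$demo/structure/of/dir/name1/subfolder/file1s.a" \
      "$demo/structure/of/dir/name3/name1.a"

# Entries directly inside name1/ or name2/ match -path. -prune stops find from
# descending into matching subdirectories, and -type f keeps regular files only.
names=$(cd "$demo" && find . \( -path '*/name1/*' -o -path '*/name2/*' \) -prune -type f \
  -exec sh -c 'for f; do echo "FileName: ${f##*/}"; done' sh {} +)
echo "$names"
rm -rf "$demo"
```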

Recursively find files with a specific extension

I'm trying to find files with specific extensions.
For example, I want to find all .pdf and .jpg files that are named Robert.
I know I can do this command
$ find . -name '*.h' -o -name '*.cpp'
but I need to specify the name of the file itself besides the extensions.
I just want to see if there's a way to avoid writing the file name over and over again.
Thank you!
My preference:
find . \( -name '*.jpg' -o -name '*.png' \) -print | grep Robert
Using find's -regex argument:
find . -regex '.*/Robert\.\(h\|cpp\)$'
Or just using -name:
find . -name 'Robert.*' -a \( -name '*.cpp' -o -name '*.h' \)
find -name "*Robert*" \( -name "*.pdf" -o -name "*.jpg" \)
The -o represents an OR condition and you can add as many as you wish within the parentheses. So this says to find all files containing the word "Robert" anywhere in their names and whose names end in either "pdf" or "jpg".
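The grouping matters because find binds the implicit AND between adjacent tests more tightly than -o. A short sketch with made-up file names that shows the difference:

```shell
demo=$(mktemp -d)
touch "$demo/Robert.pdf" "$demo/Robert.jpg" "$demo/Robert.txt" "$demo/Alice.pdf"

# Without parentheses this parses as: pdf OR (jpg AND Robert*).
ungrouped=$(cd "$demo" && find . -name '*.pdf' -o -name '*.jpg' -name 'Robert*' | LC_ALL=C sort)

# With parentheses: (pdf OR jpg) AND Robert*.
grouped=$(cd "$demo" && find . \( -name '*.pdf' -o -name '*.jpg' \) -name 'Robert*' | LC_ALL=C sort)

echo "ungrouped:"; echo "$ungrouped"
echo "grouped:";   echo "$grouped"
rm -rf "$demo"
```

The ungrouped form also lists Alice.pdf, while the grouped form returns only the two Robert files.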
As an alternative to using -regex option on find, since the question is labeled bash, you can use the brace expansion mechanism:
eval find . -false "-o -name Robert".{jpg,pdf}
This Q&A shows how to use find with regular expressions: How to use regex with find command?
Because -regex has to match the whole path, the pattern could be something like
'.*/Robert\.\(h\|cpp\)'
As a script you can use:
find "${2:-.}" -iregex ".*${1:-Robert}\.\(h\|cpp\)$" -print
save it as findcc
chmod 755 findcc
and use it as
findcc [name] [[search_directory]]
e.g.
findcc # default name 'Robert' and directory .
findcc Joe # default directory '.'
findcc Joe /somewhere # no defaults
Note that you can't use
findcc /some/where # i.e. without the name...
Alternatively, the script can be
dir="$1"; shift; find "$dir" -print | grep "$@"
used as
findcc directory grep_options
like
findcc . -P '/Robert\.(h|cpp)$'
Using bash globbing (if find is not a must)
ls Robert.{pdf,jpg}
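Recursively, using bash's globstar option (add shopt -s dotglob to include hidden files and folders):
ftype="jpg"
shopt -s globstar   # bash 4+: lets ** match across directory levels
ls -1 **/*."${ftype}" 2> /dev/null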
For finding the files in system using the files database:
locate -e --regex "\.(h|cpp)$"
Make sure a locate package is installed, e.g. mlocate.

Need help writing compound linux query

Trying to write my first compound linux query and running into some gaps in knowledge.
The idea is to find all the files that may be either .doc or .txt as well as search the contents for the text clown.
So I started off with searching from the root as such.
$find /
Then I added the wildcard for the filename.
$find / -name '*.doc'...uhh oh
First question. How do I specify or? Is it with pipe | or double pipe || or...? and do I need to repeat the -name parameter like this?
$find / -name '*.doc' || -name '*.txt'
Second ? do I add the grep for the string after / before...?
$find / -name '*.doc' || -name '*.txt' grep -H 'cat' {} \
Finally is there a place where I can validate syntax / run like SQLFiddle?
TIA
'Or' in find is -o.
You have to repeat the -name test, though, and quote the patterns so the shell does not expand them. So something like:
find / -name '*.doc' -o -name '*.txt'
You can simply put your grep command in front, so long as you encase your find command in backticks:
grep 'whatever' `find / -name '*.doc' -o -name '*.txt'`
There's a reasonably nice guide to find here
You want something like this:
find / \( -name \*.doc -o -name \*.txt \) -exec grep clown {} \; -print
You specify OR with -o inside \( \), you run grep via -exec, and you can validate the syntax by trying it in a bash shell.
Try:
(find ./ -name "*.txt" -print0 2>/dev/null ; find ./ -name "*.doc" -print0 2>/dev/null) | xargs -0 grep clown
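For a version that can be tried safely, here is a sketch on a scratch directory instead of /. It groups the two -name tests and uses grep -l to list only the names of matching files. The file names and contents are invented:

```shell
demo=$(mktemp -d)
echo 'a clown appears' > "$demo/one.txt"
echo 'nothing here'    > "$demo/two.doc"
echo 'clown again'     > "$demo/three.doc"
echo 'clown ignored'   > "$demo/four.log"

# \( ... -o ... \) selects by extension; {} + hands many files to one grep call.
matches=$(cd "$demo" && find . -type f \( -name '*.doc' -o -name '*.txt' \) \
  -exec grep -l clown {} + | LC_ALL=C sort)
echo "$matches"
rm -rf "$demo"
```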

excluding a directory from find and sed rename

I'm using this command to go through all files, directories and subdirectories to change any mentions of oldurl.com to newurl.org:
find . -type f -name '*.php' -exec sed -i 's|oldurl.com|newurl.org|g' {} +
It works fine, however, I need to exclude three sub-directories from ANY CHANGES: /cache, /archive and /etc as changing the urls with the above command in these paths breaks other scripts.
Haven't had much luck finding an answer... Is it even possible?
Many thanks for any suggestions/help.
Use find's -not operator:
find . -type f -name '*.php' -not \( -path './etc/*' -o -path './cache/*' -o -path './archive/*' \) -exec sed -i 's|oldurl.com|newurl.org|g' {} \;
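An alternative sketch uses -prune, which keeps find from entering the excluded directories at all rather than filtering their files afterwards. It assumes GNU sed for -i, as in the question, and runs on a throwaway tree with made-up files:

```shell
demo=$(mktemp -d)
mkdir -p "$demo/cache" "$demo/lib"
echo 'see oldurl.com' > "$demo/index.php"
echo 'see oldurl.com' > "$demo/lib/util.php"
echo 'see oldurl.com' > "$demo/cache/page.php"

# Prune the three directories, then edit every remaining .php file in place.
(cd "$demo" && find . \( -path ./cache -o -path ./archive -o -path ./etc \) -prune \
  -o -type f -name '*.php' -exec sed -i 's|oldurl\.com|newurl.org|g' {} +)

changed=$(cat "$demo/lib/util.php")
untouched=$(cat "$demo/cache/page.php")
echo "lib/util.php:   $changed"
echo "cache/page.php: $untouched"
rm -rf "$demo"
```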

Resources