find command: delete everything but one folder - linux

I have this command:
find ~/Desktop/testrm -mindepth 1 -path ~/Desktop/testrm/.snapshot -o -mtime +2 -prune -exec rm -rf {} +
I want it to work as is, but it must avoid removing a specific directory ($ROOT_DIR/$DATA_DIR).
It must remove the files inside that directory, but not the directory itself.
The "r" flag of rm is needed because the command also has to delete other directories.
-prune is not suitable, since it would skip the directory's contents and subdirectories as well.

You can exclude individual paths using the short circuiting behavior of -o (like you already did with ~/Desktop/testrm/.snapshot).
However, for each excluded path you also have to exclude all of its parent directories. Otherwise you would delete a/b/c by deleting a/b/ or a/ with rm -rf.
In the following script, the function orParents generates a part of the find command. Example:
find $(orParents a/b/c) ... would run
find -path a/b/c -o -path a/b -o -path a -o ....
#! /usr/bin/env bash
orParents() {
  p="$1"
  while
    printf -- '-path %q -o ' "$p"
    p=$(dirname "$p")
    [ "$p" != . ] && [ "$p" != / ]
  do :; done
}
find ~/Desktop/testrm -mindepth 1 \
$(orParents "$ROOT_DIR/$DATA_DIR") -path ~/Desktop/testrm/.snapshot -o \
-mtime +2 -prune -exec rm -rf {} +
Warning: You have to make sure that $ROOT_DIR/$DATA_DIR does not end with a / and does not contain glob characters like *, ?, and [].
Spaces are OK, as printf %q escapes them correctly. However, find -path interprets its argument as a glob pattern on its own, so we would need a second layer of escaping; maybe something like printf %q "$(sed 's/[][*?\]/\\&/g' <<< "$p")", but I'm not sure exactly how find -path interprets its argument.
Alternatively, you could write a script isParentOf and do ...
find ... -exec isParentOf "$ROOT_DIR/$DATA_DIR" {} \; -o ...
... to exclude $ROOT_DIR/$DATA_DIR and all of its parents. This is probably safer and more portable, but slower and a hassle to set up (find -exec bash -c ... and so on) if you don't want to add a script file to your path.
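A minimal sketch of such an isParentOf helper (the name comes from the suggestion above; the implementation here is an assumption): it succeeds when its second argument is the protected path itself or one of its ancestors.

```shell
#!/usr/bin/env bash
# Sketch: isParentOf KEEP CANDIDATE succeeds if CANDIDATE is KEEP itself
# or an ancestor directory of KEEP. Assumes normalized paths without
# trailing slashes.
isParentOf() {
  keep="$1" candidate="$2"
  case "$keep/" in
    "$candidate"/*) return 0 ;;  # CANDIDATE is KEEP or a prefix directory of it
  esac
  return 1
}
```

find would then call it as `-exec isParentOf "$ROOT_DIR/$DATA_DIR" {} \; -o ...`, or via `bash -c` if the function is not installed as a script on the PATH.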

Related

How to "rm -rf" with excluding files and folders with the "find -o" command

I'm trying to use the find command, but still can't figure out how to pipe the find ... to rm -rf
Here is the directory tree for testing:
/path/to/directory
/path/to/directory/file1_or_dir1_to_exclude
/path/to/directory/file2_or_dir2_to_exclude
/path/to/directory/.hidden_file1_or_dir1_to_exclude
/path/to/directory/.hidden_file2_or_dir2_to_exclude
/path/to/directory/many_other_files
/path/to/directory/many_other_directories
Here is the command for removing the whole directory:
rm -rf /path/to/directory
But how to rm -rf while excluding files and folders?
Here is the man help for reference:
man find
-prune True; if the file is a directory, do not descend into it. If -depth is given, then -prune has no effect. Because -delete implies -depth, you cannot usefully use -prune and -delete together.
For example, to skip the directory `src/emacs' and all files
and directories under it, and print the names of the other files
found, do something like this:
find . -path ./src/emacs -prune -o -print
What's the -o in this find command? Does it mean "or"? I can't find the meaning of -o in the man page.
mkdir -p /path/to/directory
mkdir -p /path/to/directory/file1_or_dir1_to_exclude
mkdir -p /path/to/directory/file2_or_dir2_to_exclude
mkdir -p /path/to/directory/.hidden_file1_or_dir1_to_exclude
mkdir -p /path/to/directory/.hidden_file2_or_dir2_to_exclude
mkdir -p /path/to/directory/many_other_files
mkdir -p /path/to/directory/many_other_directories
I have tried to use this find command to exclude the .hidden_file1_or_dir1_to_exclude and then pipe it to rm, but this command does not work as expected.
cd /path/to/directory
find . -path ./.hidden_file1_or_dir1_to_exclude -prune -o -print | xargs -0 -I {} rm -rf {}
The meaning of rm -rf is to recursively remove everything in a directory tree.
The way to avoid recursively removing everything inside a directory is to get find to enumerate exactly the files you want to remove, and nothing else (and then of course you don't need rm at all; find knows how to remove files, too).
find . -depth -path './.hidden_file1_or_dir1_to_exclude/*' -o -delete
Using -delete turns on the -depth option, which makes -prune unusable; so instead just say "delete if not in this tree". And indeed, as you seem to have discovered already, -o stands for "or".
The reason -delete enables -depth should be obvious; you can't traverse the files inside a directory after you have deleted it.
As an aside, you need to use -print0 if you use xargs -0. (This facility is a GNU extension, and generally not available on POSIX.)
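To illustrate the aside: a small sketch of the -print0/xargs -0 pairing. The pipeline in the question mixed -print (newline-separated output) with xargs -0 (NUL-separated input), so xargs saw the whole output as a single file name.

```shell
#!/usr/bin/env bash
# Sketch: pair -print0 with xargs -0 so file names with spaces survive.
tmp=$(mktemp -d)
touch "$tmp/plain" "$tmp/with space"

# Each file name arrives intact, including the one with a space.
find "$tmp" -type f -print0 | xargs -0 -I {} echo "got: {}"
```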
You need to separate files from directories to exclude:
find . -mindepth 1 \
    \( -path ./dir_to_exclude -o \
       -path ./.hidden_dir_to_exclude \) -type d -prune \
    -o \
    ! \( -path ./file_to_exclude -o \
         -path ./.hidden_file_to_exclude \) \
    -exec echo rm -rf {} \;
You can remove the echo once tested.
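A runnable sketch of that command against a throwaway copy of the sample tree (echo removed; find may complain when it tries to descend into a directory rm has already deleted, which is why its exit status is ignored here):

```shell
#!/usr/bin/env bash
# Sketch: rebuild the sample tree in a temp dir and apply the exclude logic.
tmp=$(mktemp -d)
cd "$tmp"
mkdir -p dir_to_exclude .hidden_dir_to_exclude many_other_directories
touch file_to_exclude .hidden_file_to_exclude many_other_files

# find warns when it descends into a directory rm just removed; the
# deletions themselves are fine, so warnings and exit status are ignored.
find . -mindepth 1 \
    \( -path ./dir_to_exclude -o \
       -path ./.hidden_dir_to_exclude \) -type d -prune \
    -o \
    ! \( -path ./file_to_exclude -o \
         -path ./.hidden_file_to_exclude \) \
    -exec rm -rf {} \; 2>/dev/null || true

ls -A   # only the four excluded entries remain
```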

"find" command but it stops going deep if it finds a directory starting with "."

I have to make a script that goes through a whole folder (/home, in my case).
I have to save all the files except the ones that start with ., and also, if I find a directory that starts with ., I don't care what's inside; I shouldn't read it at all.
For the first part we use the command
for path in $(find /home \! -name ".*");do
where path is a variable that contains the path. But we don't know how to do the directory part.
I thought I'd split the path on / and check whether any component starts with a ., and in that case skip the file with an if, but I don't know how to split a string, save the parts in a variable, and then iterate over them.
You can prune all files starting with a ..
From the man page of GNU find:
-prune True; if the file is a directory, do not descend into it. If -depth is given, false; no effect. Because -delete implies -depth, you cannot usefully use -prune and -delete together.
You should not loop over the result from find. You will get unexpected results if you have filenames with spaces or newlines.
Use xargs or -exec, e.g.
find /home -path "*/.*" -prune -o -print0 | xargs -0I{} sh -c 'echo "doing something with $1"' sh {}
or
find /home -path "*/.*" -prune -o -exec sh -c 'for i; do echo "doing something with $i"; done' sh {} +
The -prune part removes all filenames (files and directories) starting with a dot and does not descend into directories starting with a dot.
All other filenames are printed with a NUL character instead of a newline (-o -print0) and piped to xargs or a shell script is executed with your action (as few times as possible).
To save all filenames into a file:
find /home -path "*/.*" -prune -o -print > allfiles.txt
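A quick sketch in a throwaway directory, showing that the pattern keeps visible files and never descends into dot directories:

```shell
#!/usr/bin/env bash
# Sketch: -path "*/.*" -prune skips dot files and whole dot directories.
tmp=$(mktemp -d)
mkdir -p "$tmp/visible" "$tmp/.hiddendir"
touch "$tmp/visible/file" "$tmp/.hiddendir/secret" "$tmp/.hiddenfile"

# Prints only the start dir, visible/ and visible/file; nothing hidden.
find "$tmp" -path "*/.*" -prune -o -print
```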
Try this
for path in $(find /home -type d -name ".*" -prune -o -type f \! -name ".*" -print);do echo $path; done
I think I would do something like this:
for path in $(find . -type f | egrep -v '/\.[^\/]+\/'); do
...
Note that you may have to take extra steps if some of your files have spaces in their names.

Find command works in terminal but not in bash script

I wrote a find command which finds some files but excludes other files/directories. I echoed this code and copied it. If I paste it into the terminal, it works: some files are excluded. But if I execute it from the script, it does not work as expected.
I tried escaping my variables, quoting them, and wrapping them in $() or ${}, but nothing worked.
My find code looks like this:
find ${StartDirs[*]} $pat -print
In fact it will be executed like:
find ./bin -wholename './bin/backup2' -prune -o -wholename './bin/backup3' -prune -o -print
The second code above works in the terminal but not in the script.
What did I do wrong?
For more info, the necessary code is below.
I am trying to make a backup with find and cp. Most of my script is omitted; I think the code below is the minimal code needed to reproduce this problem.
StartDirs=()
ExcludedFiles=() # files/directories which need to be excluded

# Check and store excluded files.
CheckExcludedFile(){ # called over and over by the (omitted) getopts handler; each -x option names a file to exclude
  exclFile=`find $1 2>/dev/null | wc -l`
  if [ $exclFile -lt 1 ]; then
    echo $FILEMAPNOTEXIST | sed 's~-~'$1'~g' # FILEMAPNOTEXIST is an error-message variable from another script
    exit 0
  else
    ExcludedFiles+=($1) # add excluded file/dir path to the array
  fi
}

MakeBackup(){
  for i in ${ExcludedFiles[*]}
  do
    s=" -wholename $i -prune -o"
    pat=$pat$s
  done
  # The loop above turns the elements of ExcludedFiles[] into a pattern.
  # For example, calling the script with -x fileA -x fileB -x fileC yields:
  # -wholename 'fileA' -prune -o -wholename 'fileB' -prune -o -wholename 'fileC' -prune -o
  # The pat variable is used by the find command to ignore files/directories.
  mkdir -p ~/var
  echo "Start-time $(date '+%F %T')" >> ~/var/dq.log
  find ./bin -wholename './bin/backup2' -prune -o -wholename './bin/backup3' -prune -o -print
  # The line above should work as it does in the terminal. That is not the case.
  # find ${StartDirs[*]} $pat -print # this should work as well
  # cp -av ${StartDirs[@]} $Destination >> ~/var/dq.log # commented out because the find command is not working
  echo "end-time $(date '+%F %T')" >> ~/var/dq.log
}
The expected result should simply be some files/directories being excluded if given.
If a full script is necessary, let me know.
The command find ./bin -wholename './bin/backup2' -prune -o -wholename './bin/backup3' -prune -o -print should work as intended, provided the current directory is directly above bin/. This may be the cause of your problems: if in the real script you assemble path names which do not match the prefixes of the found paths, the prune will not work. Example: you have a directory /home/me containing bin/backup2/, bin/backup3/ and stuff-to-backup/. If you are in /home/me and execute find ., it finds e.g. ./bin/backup2, which is pruned.
But if you put this in a script and call the script with a path argument, e.g. /home/me, find will see the same files under different paths, e.g. /home/me/bin/backup2, and will not prune them, because they do not match the supplied exclude pattern, even though they are the same files. Likewise, no patterns supplied with -wholename will be found. Here is a question which addresses this problem.
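The mismatch can be reproduced in a scratch directory (a sketch; assumes GNU find, where -wholename is an alias for -path):

```shell
#!/usr/bin/env bash
# Sketch: the same -wholename pattern prunes with a relative starting
# point, but not with an absolute one.
tmp=$(mktemp -d)
mkdir -p "$tmp/bin/backup2" "$tmp/bin/keep"
cd "$tmp"

# Relative start: paths look like ./bin/backup2 -> pruned.
find ./bin -wholename './bin/backup2' -prune -o -print

# Absolute start: paths look like $tmp/bin/backup2 -> the pattern
# never matches, so nothing is pruned.
find "$tmp/bin" -wholename './bin/backup2' -prune -o -print
```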

How to delete all subdirectories with a specific name

I'm working on Linux, and there is a folder which contains lots of subdirectories. I need to delete all subdirectories that share the same name. For example,
dir
|---subdir1
|---subdir2
| |-----subdir1
|---file
I want to delete all of subdir1. Here is my script:
find dir -type d -name "subdir1" | while read directory ; do
rm -rf $directory
done
However, when I execute it, it seems that nothing happens.
I've also tried find dir -type d "subdir1" -delete, but still, nothing happens.
If find finds the correct directories at all, these should work:
find dir -type d -name "subdir1" -exec echo rm -rf {} \;
or
find dir -type d -name "subdir1" -exec echo rm -rf {} +
(the echo is there for verifying the command hits the files you wanted, remove it to actually run the rm and remove the directories.)
Both piping to xargs and to while read have the downside that unusual file names will cause issues. Also, find -delete will only try to remove the directories themselves, not their contents. It will fail on any non-empty directories (but you should at least get errors).
With xargs, spaces separate words by default, so even file names with spaces will not work. read can deal with spaces, but in your command it's the unquoted expansion of $directory that splits the variable on spaces.
If your filenames don't have newlines or trailing spaces, this should work, too:
find ... | while read -r x ; do rm -rf "$x" ; done
With the globstar option (enable with shopt -s globstar, requires Bash 4.0 or newer):
rm -rf **/subdir1/
The drawback of this solution as compared to using find -exec or find | xargs is that the argument list might become too long, but that would require quite a lot of directories named subdir1. On my system, ARG_MAX is 2097152.
Using xargs:
find dir -type d -name "subdir1" -print0 |xargs -0 rm -rf
Some information not directly related to the question/problem:
find|xargs or find -exec
https://www.everythingcli.org/find-exec-vs-find-xargs/
From the question, it seems you've tried to use while with find. The following substitution may help you:
while IFS= read -rd '' dir; do rm -rf "$dir"; done < <(find dir -type d -name "subdir1" -print0)
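Putting it together in a scratch tree, as a sketch; -prune is added here so find does not try to descend into a directory rm has just removed:

```shell
#!/usr/bin/env bash
# Sketch: delete every directory named subdir1, wherever it appears.
tmp=$(mktemp -d)
mkdir -p "$tmp/dir/subdir1" "$tmp/dir/subdir2/subdir1"
touch "$tmp/dir/file"

# -prune keeps find from descending into the directories being removed.
find "$tmp/dir" -type d -name 'subdir1' -prune -exec rm -rf {} +
```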

Getting all files from various folders and copying them with unique names

Currently using this command to get all my "fanart" from my TV folder, and dump it into a single folder.
find /volume1/tv/ -type f \( -name '*fanart.jpg'* -o -path '*/fanart/*.jpg' -o -path '*/extrafanart/*.jpg' \) -exec cp {} /volume1/tv/_FANART \;
Here's the issue: a lot of these files have the same name, and can't be dumped into the same folder. Example:
Folder A
fanart.jpg
Folder B
fanart.jpg
Is there a way to copy these files from their respective folders and give them a unique name in the destination folder? Name needn't be anything descriptive, random is just fine.
Thanks!
find /volume1/tv/ -type f \( -name '*fanart.jpg'* -o -path '*/fanart/*.jpg' -o -path '*/extrafanart/*.jpg' \) -exec cp --backup=numbered {} /volume1/tv/_FANART \;
cp --backup=numbered {}
If the target file already exists, this will not overwrite it but instead create a numbered backup (e.g. fanart.jpg.~1~).
Note that many file managers treat these backup files as hidden; press Ctrl+H to view hidden files.
You could copy the files while giving them names according to their locations in the original directory tree. For instance (":" is legal but unusual in filenames), your "find" command could call a shell script (rather than "cp" directly), which might look like this:
#!/bin/sh
case "x$1" in
  x/volume1/tv/_FANART/*)
    ;;
  *)
    target=`echo "$1" | sed -e 's,^/volume1/tv/,,' -e s,/,:,g`
    cp "$1" "$2/$target"
    ;;
esac
and the corresponding "-exec" would be
-exec myscript "{}" /volume1/tv/_FANART \;
By the way, the source/destination on the original example are in the same directory tree "/volume1/tv", which is why the sample script uses a case statement - to exclude files already copied to the _FANART folder.
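The path-to-name transformation on its own, as a sketch (the sample path is made up):

```shell
#!/usr/bin/env bash
# Sketch: flatten a source path into a unique destination name by
# stripping the tree prefix and turning "/" into ":".
src='/volume1/tv/Show Name/fanart/art.jpg'
target=$(echo "$src" | sed -e 's,^/volume1/tv/,,' -e 's,/,:,g')
echo "$target"   # Show Name:fanart:art.jpg
```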
If you want to use the md5sum as the new name:
find /volume1/tv/ -type d -path '/volume1/tv/_FANART' -prune -o -type f \( -name '*fanart.jpg'* -o -path '*/fanart/*.jpg' -o -path '*/extrafanart/*.jpg' \) -exec sh -c 'md5=$(md5sum < "$0") && md5=${md5%% *}.jpg && echo cp "$0" "/volume1/tv/_FANART/$md5"' {} \;
Everything happens in the sh command (the commands are separated by &&, which is omitted below for clarity):
md5=$(md5sum < "$0")
md5=${md5%% *}.jpg
cp "$0" "/volume1/tv/_FANART/$md5"
The $0 expands to the filename being processed. We first compute the md5sum of the file, then keep only the hash (md5sum puts a hyphen next to it) and append .jpg, and finally we copy the file into the target folder under the computed name.
Notes.
I added
-type d -path '/volume1/tv/_FANART' -prune -o
to your command to omit this folder, since you very likely don't want to process it; it would actually be weird to process it, as its content is changed throughout find's traversal.
I left an echo in the command, so that absolutely nothing is copied (as is, it's 100% safe, you can just copy and paste it in your terminal): it only shows what commands are going to be performed (and you'll also see how fast/slow it is).
The command is 100% safe regarding funny filenames with spaces, newlines, globs, etc.
I used md5sum < file and not md5sum file, because if the filename contains special characters (like backslashes or newlines), md5sum (at least my version) prepends the hash with a backslash. Weird. By reading from stdin instead of naming the file, this cannot happen.
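The same idea, runnable against a throwaway tree (a sketch: paths shortened, the echo from the answer removed, and the target directory passed as a positional parameter to keep the quoting simple):

```shell
#!/usr/bin/env bash
# Sketch: copy each fanart.jpg into _FANART under its md5 hash as the name.
tmp=$(mktemp -d)
mkdir -p "$tmp/tv/A" "$tmp/tv/B" "$tmp/tv/_FANART"
echo one > "$tmp/tv/A/fanart.jpg"
echo two > "$tmp/tv/B/fanart.jpg"

# Prune the target folder, hash each matching file, copy under the hash.
find "$tmp/tv" -type d -path "$tmp/tv/_FANART" -prune -o \
     -type f -name 'fanart.jpg' \
     -exec sh -c 'md5=$(md5sum < "$1") && md5=${md5%% *}.jpg &&
                  cp "$1" "$2/$md5"' sh {} "$tmp/tv/_FANART" \;
```

The two source files have different contents, so they end up as two distinct hash-named copies instead of colliding on fanart.jpg.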
