shell command to delete all directories with empty __init__.py file - linux

I'm looking for a Linux shell command that recursively deletes all directories containing only an empty __init__.py file and/or other empty directories. If any file in a directory contains at least one byte, that directory shouldn't be removed.
In other words, remove all empty Python modules recursively.
Please note that if a directory contains anything other than an empty __init__.py file, it shouldn't be deleted.
What I've found/tried so far:
find . -type d -empty -delete
And
find . -type d -size -5k -delete
And
find . -type d -size 0 -delete
The first one only deletes directories without files (in my example, they contain an empty __init__.py file).
The second one somehow captures all directories.
The third doesn't capture anything.

It may be possible to do this with one complicated find command, but it's more manageable if you break it up into stages:
Delete empty __init__.py files.
Delete empty directories.
If you do this bottom up using -depth then it'll naturally remove directories containing only empty init files and/or nested empty directories.
find -depth '(' -type d -o -name __init__.py ')' -print0 |
while IFS= read -d '' -r path; do
[[ -f $path && ! -s $path ]] && (($(ls -A1 "$(dirname "$path")" | wc -l) == 1)) && rm "$path"
rmdir "$path" 2> /dev/null || :
done
Steps:
Use -depth to process children before parents.
Find directories and __init__.pys.
Process each match in a loop. -print0 pairs up with read -d '' to make sure we handle paths with spaces and newlines properly.
The only files we matched were __init__.py, so [[ -f && ! -s ]] matches empty init files. (($(ls -A1 "$(dirname "$path")" | wc -l) == 1)) checks that the init file is the only one in its directory. If both conditions are met, the init file is removed.
Try to rmdir the path. If it's an empty directory it'll be removed. If it's a file or a non-empty directory it won't be. That's fine: errors are suppressed with 2> /dev/null. || : ignores the failed exit code, making it safe to run this script with set -e.
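One caveat, as far as I can tell: the loop above inspects each __init__.py at the moment find reaches it, and within a directory find may list that file before it has descended into a sibling subdirectory. In that case the file is not yet alone, it is kept, and the parent directory survives. The sketch below is a variant, not the answer's own code: it decides per directory, after that directory's children have already been handled. The function name prune_empty_packages is made up for illustration, and -mindepth, -maxdepth and -empty assume GNU find.

```shell
# Hypothetical helper: remove directories whose only content is an empty __init__.py.
prune_empty_packages() {
    # -depth hands children to the inner script before their parents;
    # -mindepth 1 keeps the starting directory itself out of the list.
    find "${1:-.}" -mindepth 1 -depth -type d -exec sh -c '
        for dir do
            # Anything besides an empty regular __init__.py disqualifies the directory.
            if find "$dir" -mindepth 1 -maxdepth 1 \
                    ! \( -name __init__.py -type f -empty \) -print | grep -q .; then
                continue
            fi
            rm -f -- "$dir/__init__.py"
            rmdir -- "$dir"
        done
    ' sh {} +
}
```

Call it as prune_empty_packages /path/to/project. The starting directory itself is never removed.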

It sounds like you've already covered how to delete empty directories with your first find command. To delete any directories that have empty files in them you can use:
find . -size 0 -exec dirname {} + | xargs rm -rf
Here find prints the parent directory of every empty file, and those directory names are piped to xargs, which removes them.
If you have files other than __init__.py that may be empty, you can restrict the match to 0-byte files named __init__.py with:
find . -size 0 -and -name "__init__.py" -exec dirname {} + | xargs rm -rf
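Since rm -rf removes everything inside the directories it is given, including any non-empty files that sit next to the empty __init__.py, it may be worth listing the candidates before deleting anything. A minimal sketch, assuming GNU find (-empty and -printf are extensions); the function name empty_init_dirs is hypothetical, and the newline-separated output is meant for reading, not for piping into rm when names may contain newlines.

```shell
# Hypothetical helper: print each directory that holds an empty __init__.py.
empty_init_dirs() {
    # %h is the directory part of the matched file; sort -u drops duplicates.
    find "${1:-.}" -type f -name __init__.py -empty -printf '%h\n' | sort -u
}
```

Running empty_init_dirs . and checking each listed directory by hand shows what the rm -rf pipeline would take with it.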

Related

Linux Command Line - list all directories containing .js files, and copy the directories and their contents to a new folder

Here is the code I already have that finds and lists all directories containing .js files (excluding the node_modules directory).
find . -name '*.js*' -printf "%h\n" | sort -u | grep -v node_modules
As you can see, listing those directories is no problem. However, rather than list the directories, I would like to copy them (and their contents) to a new folder, preferably all in one line without running any kind of script.
Any help would be much appreciated!
The safest way to do this is to process the list of directories using NUL as the delimiter so that directories with spaces (and other odd characters) are handled correctly.
Remove the echo if the output looks correct.
"1-liner"
find "/path/to/tld" -path "*node_modules*" -prune -o -name "*.js" -printf "%h\0" | \
sort -uz | xargs -0 -I _ echo cp -a _ "/path/to/new/dir"
Bash Script
This requires Bash 4 for the associative array which will filter out duplicates.
#!/bin/bash
tld="/path/to/top/level/dir"
newdir="/path/to/new/dir"
unset dirHash;
declare -A dirHash
while read -r -d $'\0' dir; do
(( ! dirHash["$dir"]++ )) && echo cp -a "$dir" "$newdir"
done < <(find "$tld" -path "*node_modules*" -prune -o -name "*.js" -printf "%h\0")
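For reference, here is roughly what the one-liner looks like once the echo is dropped and it is wrapped in a function. This is only a sketch: copy_js_dirs and its two parameters are made-up names, and -printf, sort -z and xargs -0 -r assume the GNU tools.

```shell
# Hypothetical helper: copy every directory under $1 that contains a .js file into $2.
copy_js_dirs() {
    src=$1 dest=$2
    mkdir -p -- "$dest" &&
    # Skip anything under node_modules, print the parent directory of each .js file,
    # drop duplicates, then run one cp per directory.
    find "$src" -path '*node_modules*' -prune -o -type f -name '*.js' -printf '%h\0' |
        sort -zu | xargs -0 -r -I{} cp -a -- {} "$dest"
}
```

As with the original one-liner, every matching directory lands directly under the destination, so a nested match is copied twice: once inside its parent and once at the top level.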

Moving files with a pattern in their name to a folder with the same pattern as its name

My directory contains mix of hundreds of files and directories similar to this:
508471/
ae_lstm__ts_508471_detected_anomalies.pdf
ae_lstm__508471_prediction_result.pdf
mlp_508471_prediction_result.pdf
mlp__ts_508471_detected_anomalies.pdf
vanilla_lstm_508471_prediction_result.pdf
vanilla_lstm_ts_508471_detected_anomalies.pdf
598690/
ae_lstm__ts_598690_detected_anomalies.pdf
ae_lstm__598690_prediction_result.pdf
mlp_598690_prediction_result.pdf
mlp__ts_598690_detected_anomalies.pdf
vanilla_lstm_598690_prediction_result.pdf
vanilla_lstm_ts_598690_detected_anomalies.pdf
There are folders with an ID number as their names, like 508471 and 598690.
In the same path as these folders, there are pdf files that have this ID number as part of their name. I need to move all the pdf files with the same ID in their name, to their related directories.
I tried the following shell script but it doesn't do anything. What am I doing wrong?
I'm trying to loop over all the directories, find the files that have id in their name, and move them to the same dir:
for f in ls -d */; do
id=${f%?} # f value is '598690/', I'm removing the last character, `\`, to get only the id part
find . -maxdepth 1 -type f -iname *.pdf -exec grep $id {} \; -exec mv -i {} $f \;
done
#!/bin/sh
find . -mindepth 1 -maxdepth 1 -type d -exec sh -c '
for d in "$@"; do
id=${d#./}
for file in *"$id"*.pdf; do
[ -f "$file" ] && mv -- "$file" "$d"
done
done
' findshell {} +
This finds every directory inside the current one (finding, for example, ./598690). Then, it removes ./ from the relative path and selects each file that contains the resulting id (598690), moving it to the corresponding directory.
If you are unsure of what this will do, put an echo between && and mv, it will list the mv actions the script would make.
And remember, do not parse ls.
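The same idea can also be written with plain globs and no find, which makes it easy to try on a scratch directory first. This is a sketch rather than a drop-in replacement: the function name move_pdfs_by_id is hypothetical, and it assumes no ID is a substring of another ID (otherwise a file could match more than one directory and would go to whichever is processed first).

```shell
# Hypothetical helper: move each *ID*.pdf in directory $1 into the subdirectory named ID.
move_pdfs_by_id() (
    # The parentheses run the body in a subshell, so the cd does not leak out.
    cd -- "${1:-.}" || exit
    for d in */; do
        [ -d "$d" ] || continue          # no subdirectories: the glob stayed literal
        id=${d%/}                        # strip the trailing slash to get the ID
        for f in *"$id"*.pdf; do
            if [ -f "$f" ]; then mv -- "$f" "$d"; fi
        done
    done
)
```

Called as move_pdfs_by_id /path/to/reports, it leaves files that match no directory name where they are.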
The below code should do the required job.
for dir in */; do find . -mindepth 1 -maxdepth 1 -type f -name "*${dir%*/}*.pdf" -exec mv {} "$dir" \;; done
Here */ matches only the directories in the current directory; find searches only the files in that directory whose names match *${dir%*/}*.pdf, i.e. file names containing the directory name as a substring; and finally mv moves the matching files into that directory.
On Unix, you can use the command below:
find . -name '*508471*' -exec bash -c 'echo mv $0 ${0/508471/598690}' {} \;
You may use this for loop from the parent directory of these pdf files and directories:
for d in */; do
compgen -G "*${d%/}*.pdf" >/dev/null && mv *"${d%/}"*.pdf "$d"
done
compgen -G is used to check if there is a match for given glob or not.

How to delete all subdirectories with a specific name

I'm working on Linux and have a folder that contains lots of subdirectories. I need to delete all subdirectories that share the same name. For example,
dir
|---subdir1
|---subdir2
| |-----subdir1
|---file
I want to delete all of subdir1. Here is my script:
find dir -type d -name "subdir1" | while read directory ; do
rm -rf $directory
done
However, when I execute it, nothing seems to happen.
I've also tried find dir -type d "subdir1" -delete, but still nothing happens.
If find finds the correct directories at all, these should work:
find dir -type d -name "subdir1" -exec echo rm -rf {} \;
or
find dir -type d -name "subdir1" -exec echo rm -rf {} +
(the echo is there for verifying the command hits the files you wanted, remove it to actually run the rm and remove the directories.)
Both piping to xargs and to while read have the downside that unusual file names will cause issues. Also, find -delete will only try to remove the directories themselves, not their contents. It will fail on any non-empty directories (but you should at least get errors).
With xargs, spaces separate words by default, so even file names with spaces will not work. read can deal with spaces, but in your command it's the unquoted expansion of $directory that splits the variable on spaces.
If your filenames don't have newlines or trailing spaces, this should work, too:
find ... | while read -r x ; do rm -rf "$x" ; done
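One detail the commands above leave out: when rm runs per match with \;, find will usually still try to descend into the directory that was just removed and print a "No such file or directory" warning. Adding -prune tells find not to descend into a match at all. A small sketch of that variant; the function name remove_named_dirs and its parameters are hypothetical.

```shell
# Hypothetical helper: delete every directory named $2 found under $1.
remove_named_dirs() {
    # -prune stops find from descending into a match, so nested matches
    # are not listed separately and nothing is visited after deletion.
    find "$1" -type d -name "$2" -prune -exec rm -rf -- {} +
}
```

For the layout in the question this would be remove_named_dirs dir subdir1. Swapping rm -rf for echo first shows which paths would be removed.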
With the globstar option (enable with shopt -s globstar, requires Bash 4.0 or newer):
rm -rf **/subdir1/
The drawback of this solution as compared to using find -exec or find | xargs is that the argument list might become too long, but that would require quite a lot of directories named subdir1. On my system, ARG_MAX is 2097152.
Using xargs:
find dir -type d -name "subdir1" -print0 |xargs -0 rm -rf
Some information not directly related to the question/problem:
find|xargs or find -exec
https://www.everythingcli.org/find-exec-vs-find-xargs/
From the question, it seems you tried to use while with find. Process substitution may help here:
while IFS= read -rd '' dir; do rm -rf "$dir"; done < <(find dir -type d -name "subdir1" -print0)

Find pattern of the file, create a folder with that pattern and copy the files to that folder - Bash script

I have a task: find the pattern in each file name, create a folder named after that pattern, and copy the file into that folder. I am able to create the folders.
folders=`find /Location -type f -name "*.pdf" -printf "%f\n" | cut -f 1 -d '_' | sort -u`
for i in $folders
do
mkdir -p /LocationToCreateTheFolder/$i
done
I can't work out how to go on and copy the files.
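Maybe try this (it assumes the PDFs sit directly in /Location and that their names start with the folder name followed by an underscore):
for i in $folders; do mkdir -p "/LocationToCreateTheFolder/$i" && cp /Location/"$i"_*.pdf "/LocationToCreateTheFolder/$i/"; done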
This will do the finding and the copying:
find Location -type f -name '*.pdf' -exec bash -c 'f=${1##*/}; d="LocationToCreateTheFolder/${f%%_*}"; mkdir -p "$d" && cp "$1" "$d"' None {} \;
This is safe for difficult file names, even ones that contain spaces, tabs, or newlines.
How it works
find Location -type f -name '*.pdf' -exec bash -c '...' None {} \;
This will find the pdf files under directory Location and, for each one found, the bash commands inside '...' will be executed with $1 set to the name of the file found. ($0 is set to None. We don't use $0.)
f=${1##*/}
This removes the directory names from the name of the file. This is an example of prefix removal: everything in $1 up to and including the last / is removed.
d="LocationToCreateTheFolder/${f%%_*}"
This creates the name of the directory to which we want to send the file.
${f%%_*} is an example of suffix removal. Everything in $f from the first _ onwards is removed.
mkdir -p "$d" && cp "$1" "$d"
This makes sure that the directory exists and then copies the file to it.
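The two parameter expansions can be tried on their own, without find. The path below is an invented example, not one from the question.

```shell
# Hypothetical input path, used only to illustrate the expansions.
path='Location/sub dir/invoice_2021_final.pdf'

f=${path##*/}    # prefix removal: drop everything up to and including the last /
d=${f%%_*}       # suffix removal: drop everything from the first _ onwards

printf '%s\n' "$f" "$d"
# prints:
# invoice_2021_final.pdf
# invoice
```

Both expansions are plain POSIX shell features, so the same script works unchanged if sh -c is used instead of bash -c.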

How to find/list the directories where a particular sub-directory is not present

I am writing a shell script that checks whether a bin directory is present under every user's directory in /home. The bin directory can be directly under the user's directory or under one of its child directories.
For example, if I have a user amit under /home, the bin directory can be at /home/amit/bin or at /home/amit/jash/bin.
I need a list of the user directories where no bin directory is present, either directly under the user directory or under one of its child directories. I tried this command:
find /home -type d ! -exec test -e '{}/bin' \; -print
but it is not working. However, when I replace the bin directory with a file, the command works fine. It looks like this command only works for files. Is there a similar command for directories? Any help would be greatly appreciated.
You're on the right track. The catch is that your test of "does the following directory NOT exist in this target" can't be expressed within find's conditions in such a way as to return only the top-level directory. So you need to nest, one way or another.
One strategy would be to use a for loop in bash:
$ mkdir foo bar baz one two
$ mkdir bar/bin baz/bin
$ for d in /home/*/; do find "$d" -type d -name bin | grep -q . || echo "$d"; done
/home/foo/
/home/one/
/home/two/
This uses pathname expansion (globbing) to generate the list of directories to test, and then checks for the existence of "bin". If that check fails (i.e. find outputs nothing), the directory is printed. Note the trailing slash on /home/*/, which ensures that you will only be searching within directories, rather than files that might accidentally exist in /home/.
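The loop can be wrapped in a function and told to stop at the first bin it finds in each directory, which avoids walking the rest of a large home directory. This is a sketch under a few assumptions: the name dirs_without and its parameters are hypothetical, and -mindepth and -quit are GNU find extensions.

```shell
# Hypothetical helper: list the subdirectories of $1 that contain no directory named $2.
dirs_without() {
    for d in "$1"/*/; do
        [ -d "$d" ] || continue          # no subdirectories: the glob stayed literal
        # -mindepth 1 ignores the starting directory itself; -quit stops at the first hit.
        if [ -z "$(find "$d" -mindepth 1 -type d -name "$2" -print -quit)" ]; then
            printf '%s\n' "${d%/}"
        fi
    done
}
```

For the question's case the call would be dirs_without /home bin.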
Another possibility might be to use nested finds, if you don't want to depend on bash:
$ find /home/ -mindepth 1 -maxdepth 1 -type d -not -exec sh -c 'find "$1" -type d -name bin -print | grep -q .' sh {} \; -print
/home/foo
/home/one
/home/two
This roughly duplicates the effect of the bash for loop above, but by nesting find within find -exec. It uses grep -q . to convert the output of find into an exit status that can be used as a condition for the outer find.
Note that since you're looking for a bin directory, any test-based check should use test -d rather than test -e (which would also succeed for a regular file named bin, though that probably does not matter to you).
Another option is to use bash process substitution. On multiple lines for easier reading:
cd /home/
comm -3 \
<(printf '%s\n' */ | sed 's|/.*||' | sort) \
<(find */ -type d -name bin | cut -d/ -f1 | uniq)
This unfortunately requires you to change to the /home directory before running, because of the way it strips off subdirectories. You can of course collapse this into a big long one-liner if you feel so inclined.
This comm solution also has the risk of failing on directories with special characters in their names, like newlines.
One last option is bash-only but more than a one-liner. It involves subtracting the directories containing "bin" from the full list. It uses an associative array and globstar, so it depends on bash version 4.
#!/usr/bin/env bash
shopt -s globstar
# Go to our root
cd /home
# Declare an associative array
declare -A dirs=()
# Populate the array with our "full" list of home directories
for d in */; do dirs[${d%/}]=""; done
# Remove directories that contain a "bin" somewhere inside 'em
for d in **/bin; do unset dirs[${d%%/*}]; done
# Print the result in reproducible form
declare -p dirs
# Or print the result just as a list of words.
printf '%s\n' "${!dirs[@]}"
Note that we're storing directories in the array index, which (1) makes it easy for us to find and delete items, and (2) ensures unique entries, even if one user has multiple "bin" directories under their home.
cd /home
find . -maxdepth 1 -type d ! -name . | sort > a
find . -type d -name bin | cut -d/ -f1,2 | sort > b
comm -23 a b
Here, I'm making two sorted lists. The first contains all the home directories, and the second contains the top parent of any bin subdirectory. Finally I output any items from the first list not present in the second.

Resources