Remove all but newest file from all sub directories - linux

I have found the following, which lists the files in all subdirectories, skips the newest 5, and deletes the rest:
find -type f -printf '%T@ %P\n' | sort -n | cut -d' ' -f2- | head -n -5 | xargs rm
Unfortunately, if I don't know how many subdirectories there are, it won't delete the correct number of files. Does anyone have a way to traverse each directory and delete all but the newest file in each subdirectory?
Directory structure would be the following:
-> Base Directory -> Parent Directory -> Child directory

I'd write a script.
It would be a recursive function:
call function: rm_files(base_dir)
    list all directories
    if there are directories, go through the list and call rm_files(act_dir) for each item
    else (if there are no directories):
        list all files
        delete all files but the newest
    return from function
If there are a lot of nested subdirectories, memory may become a problem because of the recursion.
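For what it's worth, here is a minimal sketch of the same idea done iteratively, so recursion depth is not a concern. It assumes bash and a reasonably recent GNU find and coreutils (the -z options), and keep_newest_per_dir is just a name I made up for illustration. Unlike the outline above, it prunes every directory under the base, not only those without subdirectories.

```shell
#!/usr/bin/env bash
# Sketch only: keep the newest regular file in each directory, delete the rest.
# Assumes GNU find/sort/tail/cut/xargs. keep_newest_per_dir is a hypothetical name.

keep_newest_per_dir() {
    local base=$1 dir
    # Visit the base directory and every directory below it (NUL-delimited,
    # so directory names with spaces or newlines are handled).
    find "$base" -type d -print0 |
    while IFS= read -r -d '' dir; do
        # Emit "<mtime> <path>" for the files directly inside this directory,
        # sort newest first, skip the first record, and remove the others.
        find "$dir" -maxdepth 1 -type f -printf '%T@ %p\0' |
            sort -z -rn |
            tail -z -n +2 |
            cut -z -d' ' -f2- |
            xargs -0 -r rm -f --
    done
}
```

Call it as keep_newest_per_dir /path/to/base. Swapping rm -f for echo gives a dry run, which is probably worth doing first.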

I found I was able to do what I needed with the following one-liner, which deletes files last modified more than 59 minutes ago:
find . -name '*.*' -mmin +59 -delete > /dev/null

Related

Counting number of files in a directory with an OSX terminal command

I'm looking for a specific directory file count that returns a number. I would type it into the terminal and it can give me the specified directory's file count.
I've already tried echo find "'directory' | wc -l" but that didn't work, any ideas?
You seem to have the right idea. I'd use -type f to find only files:
$ find some_directory -type f | wc -l
If you only want files directly under this directory and not to search recursively through subdirectories, you could add the -maxdepth flag:
$ find some_directory -maxdepth 1 -type f | wc -l
Open the terminal and switch to the location of the directory.
Type in:
find . -type f | wc -l
This searches inside the current directory (that's what the . stands for) for all files, and counts them.
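One caveat with counting lines: a file name that contains a newline is counted more than once. A small sketch that counts NUL terminators instead is below. count_files is a made-up helper name, and the optional second argument is my own addition for limiting the depth; -print0 and -maxdepth are available in both the GNU and the BSD (macOS) find.

```shell
#!/usr/bin/env bash
# Sketch only: count regular files by counting NUL terminators, which stays
# correct even when a file name contains a newline.
# count_files is a hypothetical helper name.

count_files() {
    # $1: directory to search; $2 (optional): maximum depth to descend
    local dir=$1 depth=${2:-}
    if [ -n "$depth" ]; then
        find "$dir" -maxdepth "$depth" -type f -print0 | tr -dc '\0' | wc -c
    else
        find "$dir" -type f -print0 | tr -dc '\0' | wc -c
    fi
}
```

For example, count_files some_directory counts recursively, and count_files some_directory 1 counts only the files directly inside it.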
The fastest way to obtain the number of files within a directory is by obtaining the value of that directory's kMDItemFSNodeCount metadata attribute.
mdls -name kMDItemFSNodeCount directory_name -raw|xargs
The above command has a major advantage over find . -type f | wc -l in that it returns the count almost instantly, even for directories which contain millions of files.
Please note that the command obtains the number of files, not just regular files.
I don't understand why folks are using 'find' because for me it's a lot easier to just pipe in 'ls' like so:
ls *.png | wc -l
to find the number of png images in the current directory.
I use tree, like this:
tree ph

How to recursively remove different files in two directories

I have 2 different directory trees. One contains 200 .txt files and the other contains 210 .txt files. I need a script to find the file names that differ and remove those files.
There are probably better ways, but I'm thinking of:
find directory1 directory2 -name \*.txt -printf '%f\n' |
sort | uniq -u |
xargs -I{} find directory1 directory2 -name {} -delete
find directory1 directory2 -name \*.txt -printf '%f\n':
print basename of each file matching the glob *.txt
sort | uniq -u:
only print unique lines (if you wanted to delete the duplicates instead, it would have been uniq -d)
xargs -I{} find directory1 directory2 -name {} -delete:
remove them (re-specify the path to narrow the search and avoid deleting files outside the initial search path)
Notes
Thanks to @KlausPrinoth for all the suggestions.
Obviously I'm assuming a GNU userland, I suppose people running with the tools providing bare minimum POSIX compatibility will be able to adapt it.
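Before deleting anything, it may help to just list the candidates. Here is a rough dry-run sketch along the same lines. It does a single find pass and lets awk count the basenames, so nothing is re-matched as a glob pattern. unique_txt_paths is a name I made up, and it assumes GNU find and file names without tabs or newlines.

```shell
#!/usr/bin/env bash
# Sketch only: print the full path of every .txt file whose basename occurs
# exactly once across the two trees. Nothing is deleted.
# Assumes GNU find (-printf) and no tabs/newlines in file names.
# unique_txt_paths is a hypothetical helper name.

unique_txt_paths() {
    local d1=$1 d2=$2
    # Emit "<basename><TAB><path>" for each file, count basenames in awk,
    # and print the path of every basename that was seen only once.
    find "$d1" "$d2" -type f -name '*.txt' -printf '%f\t%p\n' |
        awk -F'\t' '
            { count[$1]++; path[$1] = $2 }
            END { for (name in count) if (count[name] == 1) print path[name] }
        ' |
        sort
}
```

Once the list looks right, something like unique_txt_paths directory1 directory2 | xargs -d '\n' rm -- would remove the files (xargs -d is GNU-specific).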
Yet another way is to use diff, which is more than capable of finding the differences between the contents of two directories. For instance, if you have d1 and d2 that contain your 200 and 210 files respectively (with the first 200 files being the same), you could use diff and process substitution to feed the names to remove into a while loop:
( while read -r line; do printf "rm %s\n" ${line##*: }; done < <(diff -q d1 d2) )
Output (of d1 with 10 files, d2 with 12 files)
rm file11.txt
rm file12.txt
diff will not fit all circumstances, but it does a great job of finding directory differences and is quite flexible.

Create a bash script to delete folders which do not contain a certain filetype

I have recently run into a problem.
I used a utility to move all my music files into directories based on tags. This left a LOT of almost empty folders. The folders, in general, contain a thumbs.db file or some sort of image for album art. The mp3s have the correct album art in their new directories, so the old ones are okay to delete.
Basically, I need to find any directories within D:/Music/ that:
-Do not have any subdirectories
-Do not contain any mp3 files
And then delete them.
I figured this would be easier to do in a shell script or bash script or whatever else linux/unix world than in Windows 8.1 (HAHA).
Any suggestions? I'm not very experienced writing scripts like this.
This should get you started
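find /music -mindepth 1 -type d |
while IFS= read -r dt
do
# skip directories that contain at least one subdirectory
find "$dt" -mindepth 1 -type d | read && continue
# skip directories that contain at least one mp3 file
find "$dt" -iname '*.mp3' -type f | read && continue
echo "DELETE $dt"
done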
Here's the short story...
find . \( -name '*.mp3' -o -type d \) -printf '%h\n' | sort | uniq > non-empty-dirs.tmp
find . -type d -print | sort | uniq > all-dirs.tmp
comm -23 all-dirs.tmp non-empty-dirs.tmp > dirs-to-be-deleted.tmp
less dirs-to-be-deleted.tmp
cat dirs-to-be-deleted.tmp | xargs -d '\n' rm -rf
Note that you might have to run all the commands a few times (depending on your repository's directory depth) before you're done deleting all recursive empty directories...
And the long story goes...
You can approach this problem from two basic perspectives. The first is to find all directories, iterate over each of them, check whether it contains any mp3 file or any subdirectory, and if not, mark that directory for deletion. It will work, but on very large repositories you can expect a significant run time.
Another approach, which is in my view much more interesting, is to build a list of directories NOT to be deleted, and subtract that list from the list of all directories. Let's work through the second strategy, one step at a time...
First of all, to find the paths of all directories that contain mp3 files, you can simply do:
find . -name '*.mp3' -printf '%h\n' | sort | uniq
This means "find any file ending with .mp3, then print the path to its parent directory".
Now, I could certainly name at least ten different approaches to find directories that contain at least one subdirectory, but keeping the same strategy as above, we can easily get...
find . -type d -printf '%h\n' | sort | uniq
What this means is: "Find any directory, then print the path to its parent."
Both of these queries can be combined in a single invocation, producing a single list containing the paths of all directories NOT to be deleted. The parentheses are needed so that -printf applies to both tests and not only to -type d. Let's redirect that list to a temporary file.
find . \( -name '*.mp3' -o -type d \) -printf '%h\n' | sort | uniq > non-empty-dirs.tmp
Let's similarly produce a file containing the paths of all directories, no matter if they are empty or not.
find . -type d -print | sort | uniq > all-dirs.tmp
So there, we have, on one side, the complete list of all directories, and on the other, the list of directories not to be deleted. What now? There are tons of strategies, but here's a very simple one:
comm -23 all-dirs.tmp non-empty-dirs.tmp > dirs-to-be-deleted.tmp
Once you have that, well, review it, and if you are satisfied, then pipe it through xargs to rm to actually delete the directories. The -d '\n' option (GNU xargs) keeps directory names containing spaces intact.
cat dirs-to-be-deleted.tmp | xargs -d '\n' rm -rf
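If you'd rather not leave temporary files around, the same subtraction can be sketched as one function using process substitution. This is only an illustration: dirs_without_mp3 is a made-up name, it assumes bash with GNU find, and it assumes directory names contain no newlines. It only prints the candidates and deletes nothing.

```shell
#!/usr/bin/env bash
# Sketch only: list directories under a root that contain neither a
# subdirectory nor an mp3 file. Nothing is deleted.
# Assumes bash, GNU find (-printf), and no newlines in directory names.
# dirs_without_mp3 is a hypothetical helper name.

dirs_without_mp3() {
    local root=$1
    # First list: every directory below the root.
    # Second list: every directory that is the parent of a subdirectory or of
    # an mp3 file. comm -23 keeps the lines found only in the first list.
    # LC_ALL=C makes sort and comm agree on the ordering.
    LC_ALL=C comm -23 \
        <(find "$root" -mindepth 1 -type d -print | LC_ALL=C sort) \
        <(find "$root" -mindepth 1 \( -type d -o -iname '*.mp3' \) -printf '%h\n' |
            LC_ALL=C sort -u)
}
```

As with the step-by-step version, the output should be reviewed before it is fed to rm, and a second run may be needed after parents become empty.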

Recursive script Delete folders by name all but 2

I need to write a recursive script to delete all folders in a subfolder named 'date-2012-01-01_12_30' but leave the two latest.
/var/www/temp/updates/ then hundreds of folders by 'date' and by 'code'
e.g.
/var/www/temp/updates/2012-01-01/temp1/date-2012-01-_12_30
/var/www/temp/updates/2012-01-01/temp1/date-2012-02-_13_30
/var/www/temp/updates/2012-01-01/temp1/date-2013-11-_12_30
/var/www/temp/updates/2012-01-01/temp2/date-2012-01-_12_30
I was thinking about using find to get the folders, but I'm unsure how to know which folders I can delete, as the script will have to know how many date- folders are in each subfolder and which ones are the latest.
Hmm, any help would be great?
This should work:
find /var/www/temp/updates/ -type d -name "date-*" -printf '%T@ %p\n' | sort -n | head -n -2 | cut -d' ' -f2- | xargs rm -rf
find prints the directory paths along with their last modification times. The list is then sorted by time, and all but the two newest (across the whole tree, not per subfolder) are deleted.
If all folders are in subdirectories temp1, temp2, ..., you can just use ls -tr
ls -dtr /var/www/temp/updates/2012-01-01/temp*/* | head -n -2 | xargs rm -rf
This lists all folders sorted by time, oldest first (ls -dtr), takes all but the two latest (head -n -2), and removes the remaining folders (xargs rm -rf).
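If the two newest date- folders should be kept in each subfolder separately, a per-parent loop is one way to sketch it. This assumes bash and a recent GNU find and coreutils (the -z options). prune_date_dirs and its optional "how many to keep" argument are names I invented for the example.

```shell
#!/usr/bin/env bash
# Sketch only: in every directory that contains date-* folders, keep the
# newest N of them (default 2, by modification time) and remove the others.
# Assumes GNU find/sort/head/cut/xargs. prune_date_dirs is a hypothetical name.

prune_date_dirs() {
    local root=$1 keep=${2:-2} parent
    # Collect each distinct parent directory of a date-* folder. -prune stops
    # find from descending into the date-* folders themselves.
    find "$root" -type d -name 'date-*' -prune -printf '%h\0' |
    sort -zu |
    while IFS= read -r -d '' parent; do
        # Oldest first; drop the last $keep records (the newest) from the
        # list, then delete what is left.
        find "$parent" -mindepth 1 -maxdepth 1 -type d -name 'date-*' \
                -printf '%T@ %p\0' |
            sort -z -n |
            head -z -n "-$keep" |
            cut -z -d' ' -f2- |
            xargs -0 -r rm -rf --
    done
}
```

Usage would be prune_date_dirs /var/www/temp/updates, or prune_date_dirs /var/www/temp/updates 3 to keep three per subfolder.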

Get the date of the most recent entry (file or directory) within a directory

I have a bunch of directories, each containing a different revision of the same C++ project. I'd like to sort things out by moving each of these directories into a parent directory named with the pattern YYYY.MM.DD, where YYYY.MM.DD is the date of the most recent entry (file or directory) in that directory.
How can I recursively find the date of the most recent entry in a particular directory?
Update
Below is one of the ways to do it:
find . -not -type d -printf "%T+ %p\n" | sort -n | tail -1
Or even:
find . -not -type d -printf "%TY.%Tm.%Td\n" | sort -n | tail -1
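To use that date for the actual move, the second command can be wrapped in a small function. This is just a sketch assuming GNU find. newest_entry_date is a made-up name, and it sorts on the epoch timestamp so that the ordering does not depend on how the formatted date compares.

```shell
#!/usr/bin/env bash
# Sketch only: print the modification date (YYYY.MM.DD) of the most recently
# modified non-directory entry below a directory.
# Assumes GNU find (-printf). newest_entry_date is a hypothetical name.

newest_entry_date() {
    # Emit "<epoch seconds> <YYYY.MM.DD>" per entry, sort numerically on the
    # timestamp, keep the last (newest) line, and print only the date part.
    find "$1" -not -type d -printf '%T@ %TY.%Tm.%Td\n' |
        sort -n |
        tail -n 1 |
        cut -d' ' -f2
}
```

A directory could then be filed with something like d=$(newest_entry_date "$dir") && mkdir -p "$d" && mv "$dir" "$d"/, where dir is whatever revision directory you are processing.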
Try using ls -t | head -n 1 to list entries sorted by modification date and show only the newest. With -l, the date is shown in the format defined by your locale (e.g. YYYY-MM-DD).
For example,
ls -tl | awk 'NR > 1 {date=$6; file=$8; system("mkdir -p " date); system("mv " file " " date "/")}'
will go through all files, create a directory for every modification date, and move each file there. Beware: this assumes ls prints the date in column 6 and the name in column 8, and care must be taken with filenames containing whitespace. Now use find -type d in the root directory of the source tree to recursively list all the directories. Combined with the above you now have (sadly with some overhead):
for dir in $(find -type d) ; do
export dir
ls -tl "$dir" | awk 'NR > 1 {dir=ENVIRON["dir"]; date=$6; file=$8; system("mkdir -p " dir "/" date); system("mv " dir "/" file " " dir "/" date "/")}'
done
This does not recurse through the tree itself; it takes the list of all directories in the complete tree and then iterates over them. If you need the date directories outside of the source tree (I suppose you do), just edit the two system() calls in the awk script accordingly.
Another option, mixing your solution with the previous answer:
find -print0 | xargs --null ls -dtl
It shows directories as well.
