Shell script to delete all folders but the last 5 ones - linux

I'm looking for a variant on the following script to delete all folders except the last 5 created:
find ./ -type d -ctime +10 -exec rm -rf {} +
So this will delete everything older than 10 days.
But the time factor in my case does not always apply. I need a similar script to delete folders, but I always want to keep the last 5 created folders (by date).
So when there are 100 folders, it needs to delete 95 of them and keep the last 5 created.
When there are 5, it needs to keep them all.
When there are 6, it needs to delete only the first created and keep the other 5.

First list the directories with a leading timestamp in column 1, sort by date, omit the 5 newest items from the list (the ones to keep), remove column 1, and then rm whatever older directories are left. Test the code with echo first:
find . -type d -printf '%T@ %p\n' | sort -n |
head -n -5 | cut -f2- -d" " | xargs -d '\n' echo rm -rd
...and only if it looks OK, remove the echo to do the deed:
find . -type d -printf '%T@ %p\n' | sort -n |
head -n -5 | cut -f2- -d" " | xargs -d '\n' rm -rd
Some of the above code stolen from plundra's answer to "How to recursively find the latest modified file in a directory?"
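The same idea can be sketched as a small function. This is only a sketch under some assumptions: GNU find, sort, head, cut, and xargs (for NUL-separated records), modification time standing in for creation time as in the pipeline above, and only the direct children of the given directory being considered. The function name keep_newest_dirs and the DRY_RUN variable are made up for this illustration.

```shell
# Hypothetical helper: keep the N most recently modified directories
# directly under PARENT and remove the others.
# Usage: keep_newest_dirs PARENT N
keep_newest_dirs() {
    parent=$1
    keep=$2
    find "$parent" -mindepth 1 -maxdepth 1 -type d -printf '%T@ %p\0' |
        sort -zn |                  # oldest first
        head -z -n "-$keep" |       # drop the N newest entries from the list
        cut -z -d ' ' -f2- |        # strip the timestamp column
        xargs -0 -r ${DRY_RUN:+echo} rm -rf --
}
```

Setting DRY_RUN to any non-empty value makes xargs print the rm command instead of running it, which plays the same role as the echo test above.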

Pretty much untested, hence the 'ls -ld' at the end! :)
find ./ -type d -printf '%T@ %p\n' | sort -nr | cut -d' ' -f2- | grep "^....*" | tail --lines=+6 | xargs -I {} ls -ld {}

Related

Remove similar directories with conditions in bash

Let's say I have several directories, which are similar but slightly different:
XYZ_e6586_e5984
XYZ_e3282_e5984
XYZ_e9823_e5984
Now, in case there are two or more directories whose names are identical except for the number between e and _, only the directory with the highest number should be kept. In this case, XYZ_e6586_e5984 and XYZ_e3282_e5984 should be removed.
How do I do that?
Simple find regex case here:
find /directory -mindepth 1 -maxdepth 1 -type d -regextype sed -regex ".*/XYZ_e[0-9]\{4\}_e5984" -print0 | sort -znr | tail -z -n +2 | xargs -0 -I{} rm -rf "{}"
Yet this will only work on Linux with GNU find, sort, and tail. A somewhat more portable but less pretty version (it still relies on find's -regextype option) is
find /directory -mindepth 1 -maxdepth 1 -type d -regextype sed -regex ".*/XYZ_e[0-9][0-9][0-9][0-9]_e5984" | sort -nr | tail -n +2 | xargs -I{} rm -rf "{}"
Explanation:
Use -mindepth 1 and -maxdepth 1 to search only direct children of /directory.
-type d restricts the search to directories.
-regex must match the whole path, hence the leading .*/ in front of the directory name pattern.
-print0 helps to deal with special characters in names.
sort -nr sorts the output from highest to lowest.
tail -n +2 skips the first line (i.e. the highest-numbered folder, which is kept).
xargs -I{} rm -rf "{}" performs the actual deletion (-0 for xargs, and -z for sort and tail, are necessary because of -print0).
Just make sure the reverse sort gets done right: replace rm -rf with echo rm -rf in the xargs call to show the commands that would get executed.
If the order is not right, try export LANG=C before executing the command.
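When more than one family of similar names lives in the same parent directory, the grouping can be done explicitly. The following is a rough sketch, assuming GNU find, names of the form <prefix>_e<number>_e<number>, and no whitespace in directory names; the function name list_stale_variants is hypothetical. It only prints the names that would be removed, so the list can be reviewed first.

```shell
# Hypothetical helper: print every directory under DIR that is NOT the
# highest-numbered member of its <prefix>_eNNNN_<suffix> group.
# Usage: list_stale_variants DIR
list_stale_variants() {
    find "$1" -mindepth 1 -maxdepth 1 -type d -printf '%f\n' |
        # Emit "prefix suffix number fullname" for every matching name.
        sed -n 's/^\(.*\)_e\([0-9][0-9]*\)_\(e[0-9][0-9]*\)$/\1 \3 \2 &/p' |
        # Group by prefix and suffix, highest number first within a group.
        sort -k1,1 -k2,2 -k3,3nr |
        # Skip the first (highest) entry of each group, print the others.
        awk 'seen[$1 FS $2]++ { print $4 }'
}
```

Once the output looks right, the names could be fed to a loop such as list_stale_variants /directory | while IFS= read -r name; do rm -rf -- "/directory/$name"; done.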

shell script to delete all files except the last updated file in different folders

My application logs are created in the folders below on a Linux system.
Folder 1: 100001_1001
Folder 2: 200001_1002
Folder 3: 300061_1003
Folder 4: 300001_1004
Folder 5: 400011_1008
I want to delete all files except the latest one in each of the above folders, and I want to add this to a cron job.
I tried the lines below, but they do not work. I need help.
30 1 * * * ls -lt /abc/cde/etc/100* | awk '{if(NR!=1) print $9}' | xargs -i rm -rf {} \;
30 1 * * * ls -lt /abc/cde/etc/200* | awk '{if(NR!=1) print $9}' | xargs -i rm -rf {} \;
30 1 * * * ls -lt /abc/cde/etc/300* | awk '{if(NR!=1) print $9}' | xargs -i rm -rf {} \;
30 1 * * * ls -lt /abc/cde/etc/400* | awk '{if(NR!=1) print $9}' | xargs -i rm -rf {} \;
You can use this pipeline, which consists entirely of GNU utilities (so that it also handles file paths with special characters, whitespace, and glob characters):
find /parent/log/dir -type f -name '*.zip' -printf '%T@\t%p\0' |
sort -zk1,1rn |
cut -zf2 |
tail -z -n +2 |
xargs -0 rm -f
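Since the question asks for the newest file to survive in each folder separately, the pipeline can be run once per directory. Below is a minimal sketch, assuming GNU find, sort, tail, cut, and xargs; the function name keep_newest_file is invented for this example, and it looks only at regular files directly inside each directory.

```shell
# Hypothetical helper: in each given directory, delete every regular file
# except the most recently modified one.
# Usage: keep_newest_file DIR...
keep_newest_file() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        find "$dir" -maxdepth 1 -type f -printf '%T@\t%p\0' |
            sort -z -k1,1rn |       # newest first
            tail -z -n +2 |         # skip the newest entry
            cut -z -f2- |           # drop the timestamp column
            xargs -0 -r rm -f --
    done
}
```

A script that defines this function and then calls, for example, keep_newest_file /abc/cde/etc/100* /abc/cde/etc/200* could be what the cron entry runs.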
Using a slightly modified approach to your own:
find /abc/cde/etc/100* -type f -printf "%T+\t%p\n" | sort -k1,1r | awk -F'\t' 'NR!=1{print $2}' | xargs -I{} rm "{}"
Unlike ls, find prints full paths, so this MIGHT work (I don't know anything about the directory structure, or whether 100* points at a directory, a file, or a group of files).
You should use find instead. It has a -delete action that deletes the files it finds that match your specification. Warning: it is very easy to go wrong with -delete. Test your command first. For example, to find all files named *.zip under a/b/c (and only files):
find a/b/c -depth -name '*.zip' -type f -print
This is the test, it will print all files that the final command will delete (do not forget the -depth, it is important). And once you are sure, the command that does the deletion is:
find a/b/c -depth -name '*.zip' -type f -delete
find also has options to select files by last modification date, by size... You could, for instance, find all files that were modified at least 24 hours ago:
find a/b/c -depth -type f -mtime +0 -print
and, after careful check, delete them:
find a/b/c -depth -type f -mtime +0 -delete

Delete the first 10 largest regular files using shell script

I'm trying to delete the 10 largest regular files from a given directory, but it doesn't work for files whose names contain whitespace characters.
My code (it works if the file names don't contain whitespace characters):
find mydir -type f -exec du -ahb {} + | sort -n -r | cut -f2 | head -n 10 | xargs rm -i
I also tried this, but it gives an error message:
find mydir -type f -exec du -ahb {} + -print 0 | sort -n -r | cut -f2 | head -n 10 | xargs -0 rm -i
The following should work at least with GNU coreutils 8.25 and newer :
find mydir -type f -exec du -0b {} + | sort -znr | cut -zf2 | head -zn 10 | xargs -0pn 1 rm
I made sure every command handled and outputted NUL bytes (\0) separated records rather than linefeed separated records :
du outputs NUL-separated records with -0
sort, cut and head handle and output NUL-separated records with -z
xargs handles NUL-separated records with -0
Additionally, I removed the interactive mode of rm and asked xargs to handle that instead (-p), because xargs didn't provide a prompt to rm when invoking it. I had to limit the number of parameters given at once to rm to 1 for this to work (xargs' -n 1 parameter). There might be a way to preserve the -i and provide rm with an interface to your prompt, but I don't know how.
Last point: I removed du's --human-readable (-h) option because it would often have made the sort fail, and it served no purpose since the file sizes were never displayed to a human.
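Before deleting anything, it can help to preview which files would be selected. The sketch below swaps du for find's -printf '%s' (size in bytes), which is a different technique from the answer above, and assumes GNU find, sort, head, and cut; the function name largest_files is hypothetical. Records stay NUL-separated until the last step, where they are turned into lines for display only.

```shell
# Hypothetical helper: print the N largest regular files under DIR,
# one per line, largest first.
# Usage: largest_files DIR N
largest_files() {
    find "$1" -type f -printf '%s\t%p\0' |
        sort -z -k1,1nr |       # biggest first
        head -z -n "$2" |       # keep the first N records
        cut -z -f2- |           # drop the size column
        tr '\0' '\n'            # for display only
}
```

For the actual deletion, the final tr would be replaced by something like xargs -0 rm, so that names containing whitespace survive intact.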

How to count number of files in each directory?

I am able to list all the directories by
find ./ -type d
I attempted to list the contents of each directory and count the number of files in each directory by using the following command
find ./ -type d | xargs ls -l | wc -l
But this summed the total number of lines returned by
find ./ -type d | xargs ls -l
Is there a way I can count the number of files in each directory?
This prints the file count per directory for the current directory level:
du -a | cut -d/ -f2 | sort | uniq -c | sort -nr
Assuming you have GNU find, let it find the directories and let bash do the rest:
find . -type d -print0 | while read -d '' -r dir; do
files=("$dir"/*)
printf "%5d files in directory %s\n" "${#files[@]}" "$dir"
done
find . -type f | cut -d/ -f2 | sort | uniq -c
find . -type f to find all items of the type file, in current folder and subfolders
cut -d/ -f2 to cut out their specific folder
sort to sort the list of foldernames
uniq -c to return the number of times each foldername has been counted
You could arrange to find all the files, remove the file names, leaving you a line containing just the directory name for each file, and then count the number of times each directory appears:
find . -type f |
sed 's%/[^/]*$%%' |
sort |
uniq -c
The only gotcha in this is if you have any file names or directory names containing a newline character, which is fairly unlikely. If you really have to worry about newlines in file names or directory names, I suggest you find them, and fix them so they don't contain newlines (and quietly persuade the guilty party of the error of their ways).
If you're interested in the count of the files in each sub-directory of the current directory, counting any files in any sub-directories along with the files in the immediate sub-directory, then I'd adapt the sed command to print only the top-level directory:
find . -type f |
sed -e 's%^\(\./[^/]*/\).*$%\1%' -e 's%^\.\/[^/]*$%./%' |
sort |
uniq -c
The first pattern captures the start of the name, the dot, the slash, the name up to the next slash and the slash, and replaces the line with just the first part, so:
./dir1/dir2/file1
is replaced by
./dir1/
The second replacement captures the files directly in the current directory; they have no slash after the leading ./, and those are replaced by ./. The sort and count then work on just the remaining names.
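To see the two substitutions at work without touching a real tree, they can be fed a few sample paths. This is just an illustration; the wrapper name top_level_dir and the sample paths other than ./dir1/dir2/file1 are made up.

```shell
# Hypothetical wrapper around the two sed expressions from the pipeline.
top_level_dir() {
    sed -e 's%^\(\./[^/]*/\).*$%\1%' -e 's%^\.\/[^/]*$%./%'
}

# Two files somewhere below ./dir1 and one file in the current directory.
printf '%s\n' ./dir1/dir2/file1 ./dir1/file2 ./file3 | top_level_dir
# prints:
# ./dir1/
# ./dir1/
# ./
```

Piping that output through sort | uniq -c would then report 2 for ./dir1/ and 1 for ./.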
Here's one way to do it, but probably not the most efficient.
find -type d -print0 | xargs -0 -n1 bash -c 'echo -n "$1:"; ls -1 "$1" | wc -l' --
Gives output like this, with directory name followed by count of entries in that directory. Note that the output count will also include directory entries which may not be what you want.
./c/fa/l:0
./a:4
./a/c:0
./a/a:1
./a/a/b:0
Slightly modified version of Sebastian's answer using find instead of du (to exclude file-size-related overhead that du has to perform and that is never used):
find ./ -mindepth 2 -type f | cut -d/ -f2 | sort | uniq -c | sort -nr
-mindepth 2 parameter is used to exclude files in current directory. If you remove it, you'll see a bunch of lines like the following:
234 dir1
123 dir2
1 file1
1 file2
1 file3
...
1 fileN
(much like the du-based variant does)
If you do need to count the files in current directory as well, use this enhanced version:
{ find ./ -mindepth 2 -type f | cut -d/ -f2 | sort && find ./ -maxdepth 1 -type f | cut -d/ -f1; } | uniq -c | sort -nr
The output will be like the following:
234 dir1
123 dir2
42 .
Everyone else's solution has one drawback or another.
find -type d -readable -exec sh -c 'printf "%s " "$1"; ls -1UA "$1" | wc -l' sh {} ';'
Explanation:
-type d: we're interested in directories.
-readable: We only want them if it's possible to list the files in them. Note that find will still emit an error when it tries to search for more directories in them, but this prevents calling -exec for them.
-exec sh -c BLAH sh {} ';': for each directory, run this script fragment, with $0 set to sh and $1 set to the filename.
printf "%s " "$1": portably and minimally print the directory name, followed by only a space, not a newline.
ls -1UA: list the files, one per line, in directory order (to avoid stalling the pipe), excluding only the special directories . and ..
wc -l: count the lines
This can also be done with looping over ls instead of find
for f in */; do echo "$f -> $(ls "$f" | wc -l)"; done
Explanation:
for f in */; - loop over all directories
do echo "$f -> - print out each directory name
$(ls "$f" | wc -l) - call ls for this directory and count lines
This should return the directory name followed by the number of files in the directory.
findfiles() {
echo "$1" $(find "$1" -maxdepth 1 -type f | wc -l)
}
export -f findfiles
find ./ -type d -exec bash -c 'findfiles "$0"' {} \;
Example output:
./ 6
./foo 1
./foo/bar 2
./foo/bar/bazzz 0
./foo/bar/baz 4
./src 4
The export -f is required because the -exec argument of find does not allow executing a bash function unless you invoke bash explicitly, and you need to export the function defined in the current scope to the new shell explicitly.
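If exporting a function is not wanted, the same per-directory count can be done by passing the loop body to sh -c directly. The following is a sketch under the assumption that find supports -maxdepth (GNU and BSD find do); the function name count_direct_files is hypothetical.

```shell
# Hypothetical helper: for DIR and every directory below it, print the
# directory name and the number of regular files directly inside it.
# Usage: count_direct_files DIR
count_direct_files() {
    find "$1" -type d -exec sh -c '
        for d do
            printf "%s %s\n" "$d" "$(find "$d" -maxdepth 1 -type f | wc -l)"
        done' sh {} +
}
```

Because the directories are handed over in batches with {} +, far fewer shells are started than with one bash -c per directory.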
My answer is a little different, due to the options of find, you can actually be much more flexible. Just try:
find . -type f -printf "%h\n" | sort | uniq -c
With the "%h" option to "-printf", find prints only the directory of the files it found. Then sort and count with "uniq -c". This prints the number of search result entries with the same directory, per directory.
Using further options on find, you can be much more flexible. For example, to get an overview how many files in which directory have been modified at a certain date, use:
find . -newermt "2022-01-01 00:00:00" -type f -printf "%TY-%Tm-%Td %h\n" | sort | uniq -c
This finds all files that have been modified since 1. January 2022, prints (with "-printf") the modification date and the directory, then sorts and counts them. In this example, each line in the result has the number of files, the date of modification (without time), and the directory.
Note that "-printf" is a GNU find extension; it is not part of POSIX find and may be missing from other implementations.
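Where -printf is missing, a similar per-directory count can be approximated with dirname. This is a sketch only: it starts one dirname process per file, so it is slower, and like the other line-based pipelines it assumes no newlines in names. The function name count_files_by_dir is made up.

```shell
# Hypothetical fallback for find implementations without -printf:
# print "count directory" for every directory that contains regular files.
# Usage: count_files_by_dir DIR
count_files_by_dir() {
    find "$1" -type f -exec dirname {} \; | sort | uniq -c
}
```

Calling count_files_by_dir . should give the same kind of listing as the -printf "%h\n" version.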
I combined @glenn jackman's answer and @pcarvalho's answer (from the comment list; pcarvalho's version is broken there because the backtick character '`' is treated as a formatting control character in comments).
My script accepts a path as an argument and sorts the directory list the way ls -l does; it also handles spaces in file names.
#!/bin/bash
OLD_IFS="$IFS"
IFS=$'\n'
for dir in $(find "$1" -maxdepth 1 -type d | sort);
do
files=("$dir"/*)
printf "%5d,%s\n" "${#files[@]}" "$dir"
done
IFS="$OLD_IFS"
My first answer on Stack Overflow, and I hope it can help someone ^_^
This could be another way to walk the directory structure and report a count for every directory, at any depth.
find . -type d | awk '{print "echo -n \""$0" \";ls -l "$0" | grep -v total | wc -l" }' | sh
find . -type f -printf '%h\n' | sort | uniq -c
gives for example:
5 .
4 ./aln
5 ./aln/iq
4 ./bs
4 ./ft
6 ./hot
I tried with some of the others here but ended up with subfolders included in the file count when I only wanted the files. This prints ./folder/path<tab>nnn with the number of files, not including subfolders, for each subfolder in the current folder.
find . -type d -print | while IFS= read -r d
do
printf '%s\t%s\n' "$d" "$(find "$d" -maxdepth 1 -type f -print | wc -l)"
done
This will give the overall count.
for file in */; do echo "$file -> $(ls $file | wc -l)"; done | cut -d ' ' -f 3| py --ji -l 'numpy.sum(l)'
A super fast miracle command, which recursively traverses files to count the number of images in a directory and organize the output by image extension:
find . -type f | sed -e 's/.*\.//' | sort | uniq -c | sort -n | grep -Ei '(tiff|bmp|jpeg|jpg|png|gif)$'
Credits: https://unix.stackexchange.com/a/386135/354980
I edited the script in order to exclude all node_modules directories inside the analyzed one.
This can be used to check if the project number of files is exceeding the maximum number that the file watcher can handle.
find . -type d ! -path "*node_modules*" -print0 | while read -d '' -r dir; do
files=("$dir"/*)
printf "%5d files in directory %s\n" "${#files[@]}" "$dir"
done
To check the maximum files that your system can watch:
cat /proc/sys/fs/inotify/max_user_watches
The node_modules folder should be added to your IDE/editor's excluded paths on slow systems, and the count of the other files should ideally not exceed the maximum (which can be changed, though).
Easy Method:
find ./|grep "Search_file.txt" |cut -d"/" -f2|sort |uniq -c
In my case I needed the count at subfolder level, so I did:
du -a | cut -d/ -f3 | sort | uniq -c | sort -nr
Easy way to recursively find files of a given type. In this case, .jpg files for all folders in current directory:
find . -name '*.jpg' -print | wc -l
omg why the complex commands. just use something like
find whatever_folder | wc -l

Shell script to count files, then remove oldest files

I am new to shell scripting, so I need some help here. I have a directory that fills up with backups. If I have more than 10 backup files, I would like to remove the oldest files, so that the 10 newest backup files are the only ones that are left.
So far, I know how to count the files, which seems easy enough, but how do I then remove the oldest files, if the count is over 10?
if [ls /backups | wc -l > 10]
then
echo "More than 10"
fi
Try this:
ls -t | sed -e '1,10d' | xargs -d '\n' rm
This should handle all characters (except newlines) in a file name.
What's going on here?
ls -t lists all files in the current directory in decreasing order of modification time. Ie, the most recently modified files are first, one file name per line.
sed -e '1,10d' deletes the first 10 lines, ie, the 10 newest files. I use this instead of tail because I can never remember whether I need tail -n +10 or tail -n +11.
xargs -d '\n' rm collects each input line (without the terminating newline) and passes each line as an argument to rm.
As with anything of this sort, please experiment in a safe place.
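The off-by-one question with tail mentioned above is easy to check on harmless input. In this small illustration, twelve numbered lines stand in for twelve file names listed newest first; nothing is deleted.

```shell
# Twelve numbered lines stand in for twelve file names, newest first.
seq 1 12 | sed -e '1,10d'   # prints 11 and 12
seq 1 12 | tail -n +11      # prints 11 and 12 as well
seq 1 12 | tail -n +10      # prints 10, 11 and 12: one line too many
```

So tail -n +11 is the equivalent of sed -e '1,10d' when the 10 newest entries are to be kept.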
find is the common tool for this kind of task :
find ./my_dir -mtime +10 -type f -delete
EXPLANATIONS
./my_dir your directory (replace with your own)
-mtime +10 older than 10 days
-type f only files
-delete no surprise. Remove it to test your find filter before executing the whole command
And take care that ./my_dir exists to avoid bad surprises !
Make sure your pwd is the correct directory to delete the files then(assuming only regular characters in the filename):
ls -A1t | tail -n +11 | xargs rm
keeps the newest 10 files. I use this with the camera program 'motion' to keep the most recent frame grab files. Thanks to all preceding answers because you showed me how to do it.
The proper way to do this type of thing is with logrotate.
I like the answers from @Dennis Williamson and @Dale Hagglund. (+1 to each)
Here's another way to do it using find (with the -newer test) that is similar to what you started with.
This was done in bash on cygwin...
if [[ $(ls /backups | wc -l) -gt 10 ]]
then
find /backups -type f ! -newer "/backups/$(ls -t /backups | sed '11!d')" -exec rm {} \;
fi
Straightforward file counter:
max=12
n=0
ls -1t *.dat |
while read file; do
n=$((n+1))
if [[ $n -gt $max ]]; then
rm -f "$file"
fi
done
I just found this topic, and the solution from mikecolley helped me as a first step. As I needed a solution for a single-line homematic (raspberrymatic) script, I ran into the problem that this command only gave me the file names and not the whole path, which is needed for "rm". The CUxD Exec command I use cannot start in a selected folder.
So here is my solution:
ls -A1t $(find /media/usb0/backup/ -type f -name 'homematic-raspi*.sbk') | tail -n +11 | xargs rm
Explanation:
find /media/usb0/backup/ -type f -name 'homematic-raspi*.sbk' searches the folder /media/usb0/backup/ for regular files only (-type f) whose names match homematic-raspi*.sbk (-name is case sensitive; use -iname for a case-insensitive match)
ls -A1t $(...) lists the files given by find, sorted by mtime (-t) and in a single column (-1); -A only omits the "." and ".." entries
tail -n +11 returns everything from line 11 onwards, i.e. all but the 10 newest files, for the following rm
xargs rm finally removes the remaining files in the list
Maybe this saves others a longer search and makes the solution more flexible.
stat -c "%Y %n" * | sort -n | head -n -10 | \
cut -d ' ' -f 1 --complement | xargs -d '\n' rm
Breakdown: Get the last-modified time for each file (in the format "time filename"), sort them from oldest to newest, keep all but the last ten entries, and then keep all but the first field (keep only the filename portion).
Edit: Using cut instead of awk since the latter is not always available
Edit 2: Now handles filenames with spaces
On a very limited chroot environment, we had only a couple of programs available to achieve what was initially asked. We solved it that way:
MIN_FILES=5
FILE_COUNT=$(ls -l | grep -c ^d )
if [ $MIN_FILES -lt $FILE_COUNT ]; then
while [ $MIN_FILES -lt $FILE_COUNT ]; do
FILE_COUNT=$[$FILE_COUNT-1]
FILE_TO_DEL=$(ls -t | tail -n1)
# be careful with this one
rm -rf "$FILE_TO_DEL"
done
fi
Explanation:
FILE_COUNT=$(ls -l | grep -c ^d ) counts the directories in the current folder (the lines of ls -l output that start with d). Instead of grep's -c option we could also use wc -l, but wc was not installed on that host.
FILE_COUNT=$[$FILE_COUNT-1] update the current $FILE_COUNT
FILE_TO_DEL=$(ls -t | tail -n1) Save the oldest file name in the $FILE_TO_DEL variable. tail -n1 returns the last element in the list.
Based on others suggestions and some awk foo, I got this to work. I know this an old thread, but I didn't find a decent answer here and this sorted it for me. This just deletes the oldest file, but you can change the head -n 1 to 10 and get the oldest 10.
find "$DIR" -type f -printf '%T+ %p\n' | sort | head -n 1 | awk '{sub(/^[^ ]+ /, ""); print}' | xargs -d '\n' rm
Using inode numbers via the stat and find commands (to avoid pesky-chars-in-file-name issues); note that stat -f with %m and %i is the BSD/macOS syntax:
stat -f "%m %i" * | sort -rn -k 1,1 | tail -n +11 | cut -d " " -f 2 | \
xargs -n 1 -I '{}' find "$(pwd)" -type f -inum '{}' -print
#stat -f "%m %i" * | sort -rn -k 1,1 | tail -n +11 | cut -d " " -f 2 | \
# xargs -n 1 -I '{}' find "$(pwd)" -type f -inum '{}' -delete

Resources