Script to delete all folders barring the last two most modified? - linux

I need to write a recursive script to delete all folders in a subfolder named 'date-2012-01-01_12_30' but leave the two latest.
/var/www/temp/updates/ then hundreds of folders by 'date' and by 'code'
e.g.
/var/www/temp/updates/2012-01-01/temp1/date-2012-01-_12_30
/var/www/temp/updates/2012-01-01/temp1/date-2012-02-_13_30
/var/www/temp/updates/2012-01-01/temp1/date-2013-11-_12_30
/var/www/temp/updates/2012-01-01/temp2/date-2012-01-_12_30
I was thinking about using find to get the folders, but I'm unsure how to know which folders I can delete, as the script will have to know how many date- folders are in that subfolder and which ones are the latest.
Any help would be great.
Code:
DIR=/var/www/temp/updates/*/*
find $DIR -type d -name "date-*" -printf '%T@ %p\n' | sort -n | head -n -2 | cut -d' ' -f2- | xargs ls -ld
The script will need to go through thousands of different folders and keep the two most recent date- folders in each one. Someone on here helped me before, but I haven't adapted that script to search through thousands of folders.

Can you try this script?
PATH1=/var/www/temp/updates
ls -dtr "$PATH1"/*/*/date-* | head -n -2 | xargs -d '\n' rm -rf
Thanks

Actually, I think the script will work fine, as find will go through all the folders under /updates/:
DIR=/var/www/temp/updates/*/*
find $DIR -type d -name "date-*" -printf '%T@ %p\n' | sort -n | head -n -2 | cut -d' ' -f2- | xargs -d '\n' rm -rf
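The commands in this thread treat every date-* folder under /updates/ as one list, so at best they keep two folders in total rather than two per subfolder. Below is a minimal per-subfolder sketch, assuming GNU find, sort, tail and cut (for -printf and the -z options) and that the date-* folders sit directly inside their parent (e.g. temp1/). The function name prune_date_dirs and the DRY_RUN variable are hypothetical; it only prints what it would remove unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch: in every directory that directly contains date-* folders, keep the
# KEEP most recently modified ones (default 2) and remove the rest.
# Assumes GNU find, sort, tail and cut (for -printf and the -z options).
# Dry run by default; set DRY_RUN=0 to actually delete.

# prune_date_dirs ROOT [KEEP]   (hypothetical helper name)
prune_date_dirs() {
    local root=$1 keep=${2:-2} parent old
    # Each distinct parent of a date-* folder, NUL-separated; -prune stops
    # find from descending into the date-* folders themselves.
    find "$root" -type d -name 'date-*' -printf '%h\0' -prune | sort -zu |
    while IFS= read -r -d '' parent; do
        # "mtime<TAB>path" for this parent's date-* folders, newest first;
        # skip the first $keep entries and emit the remaining paths.
        find "$parent" -mindepth 1 -maxdepth 1 -type d -name 'date-*' \
             -printf '%T@\t%p\0' |
            sort -z -t $'\t' -k1,1nr |
            tail -z -n +"$((keep + 1))" |
            cut -z -f2- |
            while IFS= read -r -d '' old; do
                if [[ ${DRY_RUN:-1} == 1 ]]; then
                    printf 'would remove: %s\n' "$old"
                else
                    rm -rf -- "$old"
                fi
            done
    done
}
```

Usage: run prune_date_dirs /var/www/temp/updates first and check the "would remove" list, then run DRY_RUN=0 prune_date_dirs /var/www/temp/updates to delete. Because sort -zu has to read the whole list before passing it on, the outer find finishes its traversal before anything is removed.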

Related

shell script to delete all files except the last updated file in different folders

My application logs are created in the following folders on a Linux system:
Folder 1: 100001_1001
Folder 2: 200001_1002
Folder 3: 300061_1003
Folder 4: 300001_1004
Folder 5: 400011_1008
I want to delete all files except the latest one in each of the above folders, and I want to add this as a cron job.
I tried the following, but it isn't working. I need help:
30 1 * * * ls -lt /abc/cde/etc/100* | awk '{if(NR!=1) print $9}' | xargs -i rm -rf {} \;
30 1 * * * ls -lt /abc/cde/etc/200* | awk '{if(NR!=1) print $9}' | xargs -i rm -rf {} \;
30 1 * * * ls -lt /abc/cde/etc/300* | awk '{if(NR!=1) print $9}' | xargs -i rm -rf {} \;
30 1 * * * ls -lt /abc/cde/etc/400* | awk '{if(NR!=1) print $9}' | xargs -i rm -rf {} \;
You can use this pipeline, made up entirely of GNU utilities, which also handles file paths containing special characters, whitespace and glob characters:
find /parent/log/dir -type f -name '*.zip' -printf '%T@\t%p\0' |
sort -zk1,1rn |
cut -zf2 |
tail -z -n +2 |
xargs -0 rm -f
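The pipeline above ranks all matching files under /parent/log/dir together, so only one file survives in the whole tree rather than one per folder. A minimal per-folder sketch, assuming GNU findutils and coreutils and that only regular files directly inside each folder count; the function name keep_latest_file and the script path used below are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: for each directory given as an argument, delete every regular file
# directly inside it except the most recently modified one.
# Assumes GNU find, sort, tail, cut and xargs (-printf, -z, -0, -r).

keep_latest_file() {
    local dir
    for dir in "$@"; do
        [[ -d $dir ]] || continue      # skip arguments that are not directories
        find "$dir" -mindepth 1 -maxdepth 1 -type f -printf '%T@\t%p\0' |
            sort -z -t $'\t' -k1,1nr | # newest first
            tail -z -n +2 |            # drop the newest entry
            cut -z -f2- |              # keep only the path
            xargs -0 -r rm -f --
    done
}

keep_latest_file "$@"
```

Saved as a script, a crontab entry might look like 30 1 * * * /path/to/keep_latest.sh /abc/cde/etc/100* /abc/cde/etc/200* /abc/cde/etc/300* /abc/cde/etc/400* (the script path is a placeholder); cron's shell expands the globs before the script runs, so each matching folder is handled separately.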
Using a slightly modified version of your own approach:
find /abc/cde/etc/100* -type f -printf "%T+\t%p\n" | sort -k1,1r | awk -F'\t' 'NR!=1{print $2}' | xargs -I{} rm -- "{}"
Unlike ls, find prints full paths, so this might work (I don't know anything about the directory structure, or whether 100* matches a directory, a file or a group of files).
You should use find instead. It has a -delete action that deletes the files it finds that match your specification. Warning: it is very easy to go wrong with -delete, so test your command first. For example, to find all files named *.zip under a/b/c (and only files):
find a/b/c -depth -name '*.zip' -type f -print
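This is the test: it prints all the files that the final command will delete. Do not forget -depth; it is important, because -delete implies -depth, and the test should traverse the tree in the same order as the final command. Once you are sure, the command that does the deletion is: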
find a/b/c -depth -name '*.zip' -type f -delete
find also has options to select files by last modification date, by size... You could, for instance, find all files that were modified at least 24 hours ago:
find a/b/c -depth -type f -mtime +0 -print
and, after careful check, delete them:
find a/b/c -depth -type f -mtime +0 -delete

Find a bunch of randomly sorted images on disk and copy to target dir

For testing purposes I need a bunch of random images from disc, copied to a specific directory. So, in pseudo code:
find [] -iname "*.jpg"
and then sort -R
and then head -n [number wanted]
and then copy to destination
Is it possible to combine above commands in a single bash command? Like eg:
for i in `find ./images/ -iname "*.jpg" | sort -R | head -n243`; do cp "$i" ./target/; done;
But that doesn't quite work. I feel I'll need an xargs somewhere in there, but I'm afraid I don't understand xargs very well... would I need to pass -print0 (or an equivalent) to all the separate commands?
[edit]
I left out the final step: I'd like to copy the images to a certain directory under a new (sequential) name. So the first image becomes 1.jpg, the second 2.jpg etc. For this, the command I posted does not work as intended.
The command you specified will also work without any issues; it works well for me. Can you point out the exact error you are facing?
Meanwhile,
This will just do the trick for you:
find ./images/ -iname "*.jpg" | sort -R | head -n <no. of files> | xargs -I {} cp {} target/
Simply use shuf -n.
Example:
find ./images/ -iname "*.jpg" | shuf -n 10 | xargs cp -t ./target/
It would copy 10 random images to ./target/. If you need 243 just use shuf -n 243.
According to your edit, this should do it:
for i in `find ./images/ -iname "*.jpg" | sort -R | head -n2`; do cp "$i" ./target/$((1 + compt++)).jpg; done
Here, the counter compt keeps track of how many files have already been copied, so the copies are named 1.jpg, 2.jpg, and so on.
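The backtick loop above splits find's output on whitespace, so image names containing spaces break it. A sketch of the same idea (random pick, then sequential names) that passes NUL-separated names from find through shuf instead; it assumes GNU find and GNU shuf (for -z), and the function name copy_random_jpgs is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: copy COUNT randomly chosen .jpg files from SRC to DEST as 1.jpg,
# 2.jpg, ... NUL separators keep names with spaces or newlines intact.
# Assumes GNU find and GNU shuf (for -z).

copy_random_jpgs() {
    local src=$1 dest=$2 count=$3 n=0 f
    mkdir -p -- "$dest"
    # Process substitution keeps the loop (and the counter) in this shell.
    while IFS= read -r -d '' f; do
        n=$((n + 1))
        cp -- "$f" "$dest/$n.jpg"
    done < <(find "$src" -type f -iname '*.jpg' -print0 | shuf -z -n "$count")
}
```

For the case in the question this would be copy_random_jpgs ./images/ ./target/ 243. If fewer than 243 images exist, shuf simply outputs all of them in random order.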

Delete everything other than file + linked file across multiple servers (NET::SSH::MULTI)

I've got a couple of thousand images that are saved as logs that need to be deleted.
To avoid rm's argument-list limit and to do this across multiple servers, I used the following code:
Net::SSH::Multi.start(:on_error => :ignore) do |session|
# define servers in groups for more granular access
session.group :app do
session.use 'example@example', :password=> 'example'
end
# execute commands on a subset of servers
session.with(:app).exec "find /tmp/motion -maxdepth 1 -not -name 'lastsnap.jpg' -print0 | sudo xargs -0 rm"
end
An ls -l lastsnap.jpg shows that lastsnap.jpg is linked to another file, like so
30 Jun 3 08:18 lastsnap.jpg -> 81-20140603081840-snap.jpg
The target file changes constantly because of the logging scenario I mentioned above.
To restate the question: how do I delete every logged file other than lastsnap.jpg and the file it links to?
Thanks for the help :)
cd /tmp/motion
ls -1 | grep -v -x -F -e "$(readlink lastsnap.jpg)" -e lastsnap.jpg | while IFS= read -r n ; do rm -rvf -- "$n" ; done
EDIT as per the comment
cd /tmp/motion && rm -rvf $(ls -1 | grep -v -x -F -e "$(readlink lastsnap.jpg)" -e lastsnap.jpg)
Note: make sure your file names don't contain spaces. Otherwise this method will not work and will need modification to accommodate spaces in file names.
I wrote some logic using the find command. Check whether it's useful to you.
My directory contains the following files:
pyramid-stone.jpg
tallest_water_slide.jpg
SAOLA.JPG
testnap.jpg
silicon_valley_talent.jpg
The_Organic_Battery_From_Japan.jpg
Of these, testnap.jpg is a link:
testnap.jpg -> pyramid-stone.jpg
So I wrote a small awk script to get the link name and the file it points to:
IG1=`ls -l | grep ^l | awk '{printf $(NF-2);}'`
IG2=`ls -l | grep ^l | awk '{printf $(NF);}'`
Then I used find to print all the JPGs except the link and its target:
find . -type f \( -iname "*.jpg" ! -iname "$IG1" ! -iname "$IG2" \)
Output:
./SAOLA.JPG
./silicon_valley_talent.jpg
./tallest_water_slide.jpg
./The_Organic_Battery_From_Japan.jpg
Note: you have to add rm after the find command to actually remove the files.
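Both answers above rely on parsing ls output. A find-based sketch that reads the link target with readlink instead, assuming GNU find (-maxdepth, -delete) and that lastsnap.jpg points to a file in the same directory; the function name prune_snapshots is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: delete every regular file directly inside DIR except lastsnap.jpg
# and the file lastsnap.jpg points to. Does nothing if lastsnap.jpg is not
# a symlink. Assumes GNU find and a target in the same directory.

prune_snapshots() {
    local dir=$1 target
    target=$(readlink -- "$dir/lastsnap.jpg") || return 1
    target=${target##*/}            # keep only the file name of the target
    # -type f already skips the symlink itself; the ! -name tests make the
    # exclusions explicit.
    find "$dir" -mindepth 1 -maxdepth 1 -type f \
         ! -name 'lastsnap.jpg' ! -name "$target" -delete
}
```

For example, prune_snapshots /tmp/motion. Because -name takes a glob pattern, this assumes the target's name contains no *, ? or [ characters, which holds for names like 81-20140603081840-snap.jpg.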

How to count number of files in each directory?

I am able to list all the directories with
find ./ -type d
I attempted to list the contents of each directory and count the number of files in each directory by using the following command
find ./ -type d | xargs ls -l | wc -l
But this summed the total number of lines returned by
find ./ -type d | xargs ls -l
Is there a way I can count the number of files in each directory?
This prints the file count per directory for the current directory level:
du -a | cut -d/ -f2 | sort | uniq -c | sort -nr
Assuming you have GNU find, let it find the directories and let bash do the rest:
find . -type d -print0 | while read -d '' -r dir; do
files=("$dir"/*)
printf "%5d files in directory %s\n" "${#files[@]}" "$dir"
done
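With default shell options, the loop above counts every entry the glob matches: subdirectories count as files, hidden files are skipped, and an empty directory reports 1 because an unmatched glob stays as a literal word. A variant sketch that counts only regular files, assuming bash's nullglob and dotglob options are acceptable here; the function name count_files_per_dir is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the number of regular files directly inside each directory
# under ROOT. nullglob makes an empty directory count as 0 instead of 1;
# dotglob includes hidden files. Subdirectories are not counted.

count_files_per_dir() {
    local root=${1:-.} dir entry n
    shopt -s nullglob dotglob
    while IFS= read -r -d '' dir; do
        n=0
        for entry in "$dir"/*; do
            [[ -f $entry ]] && n=$((n + 1))   # -f also follows symlinks to files
        done
        printf '%5d files in directory %s\n' "$n" "$dir"
    done < <(find "$root" -type d -print0)
    shopt -u nullglob dotglob                 # turn both options back off
}
```

Note that the final shopt -u turns both options off even if the calling shell had them enabled beforehand.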
find . -type f | cut -d/ -f2 | sort | uniq -c
find . -type f to find all items of the type file, in current folder and subfolders
cut -d/ -f2 to cut out their specific folder
sort to sort the list of foldernames
uniq -c to return the number of times each foldername has been counted
You could arrange to find all the files, remove the file names, leaving you a line containing just the directory name for each file, and then count the number of times each directory appears:
find . -type f |
sed 's%/[^/]*$%%' |
sort |
uniq -c
The only gotcha in this is if you have any file names or directory names containing a newline character, which is fairly unlikely. If you really have to worry about newlines in file names or directory names, I suggest you find them, and fix them so they don't contain newlines (and quietly persuade the guilty party of the error of their ways).
If you're interested in the count of the files in each sub-directory of the current directory, counting any files in any sub-directories along with the files in the immediate sub-directory, then I'd adapt the sed command to print only the top-level directory:
find . -type f |
sed -e 's%^\(\./[^/]*/\).*$%\1%' -e 's%^\.\/[^/]*$%./%' |
sort |
uniq -c
The first pattern captures the start of the name, the dot, the slash, the name up to the next slash and the slash, and replaces the line with just the first part, so:
./dir1/dir2/file1
is replaced by
./dir1/
The second substitution handles the files directly in the current directory; they have no further slash after the leading ./, and they are replaced by ./. The sort and count then work on just the directory names.
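To see the two substitutions at work without a real directory tree, here is a small sketch that feeds them a few sample paths. The wrapper name top_level_dir is hypothetical; the sed expressions are the ones from this answer.

```shell
#!/usr/bin/env bash
# Sketch: apply the answer's two sed substitutions to sample paths.
# Paths under a subdirectory collapse to ./dir/; top-level files become ./.

top_level_dir() {
    sed -e 's%^\(\./[^/]*/\).*$%\1%' -e 's%^\.\/[^/]*$%./%'
}

printf '%s\n' ./dir1/dir2/file1 ./dir1/file2 ./file3 | top_level_dir
```

This prints ./dir1/ twice and then ./, so after sort | uniq -c the nested file is credited to dir1 along with the file directly inside it.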
Here's one way to do it, but probably not the most efficient.
find -type d -print0 | xargs -0 -n1 bash -c 'echo -n "$1:"; ls -1 "$1" | wc -l' --
Gives output like this, with directory name followed by count of entries in that directory. Note that the output count will also include directory entries which may not be what you want.
./c/fa/l:0
./a:4
./a/c:0
./a/a:1
./a/a/b:0
Slightly modified version of Sebastian's answer using find instead of du (to exclude file-size-related overhead that du has to perform and that is never used):
find ./ -mindepth 2 -type f | cut -d/ -f2 | sort | uniq -c | sort -nr
-mindepth 2 parameter is used to exclude files in current directory. If you remove it, you'll see a bunch of lines like the following:
234 dir1
123 dir2
1 file1
1 file2
1 file3
...
1 fileN
(much like the du-based variant does)
If you do need to count the files in current directory as well, use this enhanced version:
{ find ./ -mindepth 2 -type f | cut -d/ -f2 | sort && find ./ -maxdepth 1 -type f | cut -d/ -f1; } | uniq -c | sort -nr
The output will be like the following:
234 dir1
123 dir2
42 .
Everyone else's solution has one drawback or another.
find -type d -readable -exec sh -c 'printf "%s " "$1"; ls -1UA "$1" | wc -l' sh {} ';'
Explanation:
-type d: we're interested in directories.
-readable: We only want them if it's possible to list the files in them. Note that find will still emit an error when it tries to search for more directories in them, but this prevents calling -exec for them.
-exec sh -c BLAH sh {} ';': for each directory, run this script fragment, with $0 set to sh and $1 set to the filename.
printf "%s " "$1": portably and minimally print the directory name, followed by only a space, not a newline.
ls -1UA: list the files, one per line, in directory order (to avoid stalling the pipe), excluding only the special directories . and ..
wc -l: count the lines
This can also be done by looping over the directories with a shell glob and ls instead of find:
for f in */; do echo "$f -> $(ls "$f" | wc -l)"; done
Explanation:
for f in */; - loop over all directories
do echo "$f -> - print out each directory name
$(ls "$f" | wc -l) - call ls for this directory and count lines
This should return the directory name followed by the number of files in the directory.
findfiles() {
echo "$1" $(find "$1" -maxdepth 1 -type f | wc -l)
}
export -f findfiles
find ./ -type d -exec bash -c 'findfiles "$0"' {} \;
Example output:
./ 6
./foo 1
./foo/bar 2
./foo/bar/bazzz 0
./foo/bar/baz 4
./src 4
The export -f is required because the -exec argument of find does not allow executing a bash function unless you invoke bash explicitly, and you need to export the function defined in the current scope to the new shell explicitly.
My answer is a little different, due to the options of find, you can actually be much more flexible. Just try:
find . -type f -printf "%h\n" | sort | uniq -c
With the "%h" option to "-printf", find prints only the directory of the files it found. Then sort and count with "uniq -c". This prints the number of search result entries with the same directory, per directory.
Using further options on find, you can be much more flexible. For example, to get an overview how many files in which directory have been modified at a certain date, use:
find . -newermt "2022-01-01 00:00:00" -type f -printf "%TY-%Tm-%Td %h\n" | sort | uniq -c
This finds all files that have been modified since 1 January 2022, prints (with "-printf") the modification date and the directory, then sorts and counts them. In this example, each line in the result has the number of files, the date of modification (without time), and the directory.
Note that "-printf" is a GNU find extension and may not be available in other versions of find (for example, the BSD find on macOS).
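A quick way to check what %h does is to run the count on a tiny throwaway tree. This sketch assumes GNU find (for -printf) and mktemp; the function name demo_count_by_dir is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: build a small throwaway tree and count regular files per directory
# using find's %h directive (the directory part of each path).
# Assumes GNU find (for -printf) and mktemp.

demo_count_by_dir() {
    local tmp
    tmp=$(mktemp -d)
    mkdir -p "$tmp/a/b"
    touch "$tmp/top" "$tmp/a/one" "$tmp/a/two" "$tmp/a/b/three"
    # Run from inside the tree so the directories print as ., ./a and ./a/b
    (cd "$tmp" && find . -type f -printf '%h\n' | sort | uniq -c)
    rm -rf -- "$tmp"
}

demo_count_by_dir
```

It should report 1 file for ., 2 for ./a and 1 for ./a/b: each file is counted only in the directory that directly contains it, not in its ancestors.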
I combined @glenn jackman's answer and @pcarvalho's answer (the one in the comments; it is mangled there because the backtick character was interpreted as formatting).
My script accepts a path as an argument, sorts the directory list like ls -l does, and also handles the problem of spaces in file names.
#!/bin/bash
OLD_IFS="$IFS"
IFS=$'\n'
for dir in $(find "$1" -maxdepth 1 -type d | sort);
do
files=("$dir"/*)
printf "%5d,%s\n" "${#files[@]}" "$dir"
done
IFS="$OLD_IFS"
This is my first answer on Stack Overflow, and I hope it can help someone ^_^
This is another way to walk through the directory structure and report a count at every depth:
find . -type d | awk '{print "echo -n \""$0" \";ls -l "$0" | grep -v total | wc -l" }' | sh
find . -type f -printf '%h\n' | sort | uniq -c
gives for example:
5 .
4 ./aln
5 ./aln/iq
4 ./bs
4 ./ft
6 ./hot
I tried with some of the others here but ended up with subfolders included in the file count when I only wanted the files. This prints ./folder/path<tab>nnn with the number of files, not including subfolders, for each subfolder in the current folder.
for d in `find . -type d -print`
do
echo -e "$d\t$(find $d -maxdepth 1 -type f -print | wc -l)"
done
This will give the overall count.
for file in */; do echo "$file -> $(ls $file | wc -l)"; done | cut -d ' ' -f 3| py --ji -l 'numpy.sum(l)'
A super fast miracle command, which recursively traverses files to count the number of images in a directory and organize the output by image extension:
find . -type f | sed -e 's/.*\.//' | sort | uniq -c | sort -n | grep -Ei '(tiff|bmp|jpeg|jpg|png|gif)$'
Credits: https://unix.stackexchange.com/a/386135/354980
I edited the script in order to exclude all node_modules directories inside the analyzed one.
This can be used to check whether the number of files in a project exceeds the maximum number that the file watcher can handle.
find . -type d ! -path "*node_modules*" -print0 | while read -d '' -r dir; do
files=("$dir"/*)
printf "%5d files in directory %s\n" "${#files[@]}" "$dir"
done
To check the maximum files that your system can watch:
cat /proc/sys/fs/inotify/max_user_watches
On slow systems, the node_modules folder should be added to your IDE/editor's excluded paths, and the count of the remaining files should ideally not exceed the maximum (which can be raised).
Easy Method:
find ./|grep "Search_file.txt" |cut -d"/" -f2|sort |uniq -c
In my case I needed the count at subfolder level, so I did:
du -a | cut -d/ -f3 | sort | uniq -c | sort -nr
An easy way to recursively count files of a given type; in this case, all .jpg files in the current directory and its subdirectories:
find . -name '*.jpg' -print | wc -l
omg why the complex commands. just use something like
find whatever_folder | wc -l

Recursive script Delete folders by name all but 2

I need to write a recursive script to delete all folders in a subfolder named 'date-2012-01-01_12_30' but leave the two latest.
/var/www/temp/updates/ then hundreds of folders by 'date' and by 'code'
e.g.
/var/www/temp/updates/2012-01-01/temp1/date-2012-01-_12_30
/var/www/temp/updates/2012-01-01/temp1/date-2012-02-_13_30
/var/www/temp/updates/2012-01-01/temp1/date-2013-11-_12_30
/var/www/temp/updates/2012-01-01/temp2/date-2012-01-_12_30
I was thinking about using find to get the folders, but I'm unsure how to know which folders I can delete, as the script will have to know how many date- folders are in that subfolder and which ones are the latest.
Any help would be great.
This should work:
find /var/www/temp/updates/ -type d -name "date-*" -printf '%T@ %p\n' | sort -n | head -n -2 | cut -d' ' -f2- | xargs -d '\n' rm -rf
find prints the directory paths along with their last modification times. The list is then sorted by time, and all but the two most recent are deleted.
If all the folders are in subdirectories temp1, temp2, ..., you can just use ls -dtr:
ls -dtr /var/www/temp/updates/2012-01-01/temp*/* | head -n -2 | xargs rm -rf
This lists all the folders sorted by modification time, oldest first (ls -dtr), drops the two newest (head -n -2), and removes the rest (xargs rm -rf).
