Recursive script Delete folders by name all but 2 - linux

I need to write a recursive script to delete all folders in a subfolder named 'date-2012-01-01_12_30' but leave the two latest.
/var/www/temp/updates/ then hundreds of folders by 'date' and by 'code'
e.g.
/var/www/temp/updates/2012-01-01/temp1/date-2012-01-_12_30
/var/www/temp/updates/2012-01-01/temp1/date-2012-02-_13_30
/var/www/temp/updates/2012-01-01/temp1/date-2013-11-_12_30
/var/www/temp/updates/2012-01-01/temp2/date-2012-01-_12_30
I was thinking about using find to get the folders, but I'm unsure how to know which folders I can delete, as the script will have to know how many date- folders are in that subfolder and which ones are the latest.
Hmm, any help would be great?

This should work:
find /var/www/temp/updates/ -type d -name "date-*" -printf '%T@ %p\n' | sort -n | head -n -2 | cut -d' ' -f2- | xargs rm -rf
find prints out the directory paths along with their last modification times. This is then sorted and all but the last two are deleted.
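Note that the pipeline above keeps the two newest date-* folders across the whole tree. If two should be kept in each parent folder instead, one possible sketch follows. It assumes GNU find and a coreutils version whose sort, head and cut accept -z, and folder names without newlines; the function name prune_date_dirs is made up for illustration. It only prints the rm commands.

```shell
# Sketch: for every folder that contains date-* folders, print an rm command
# for all but the newest KEEP of them (by modification time).
# Dry run by design: remove the "echo" to really delete.
prune_date_dirs() (
    root=$1
    keep=${2:-2}
    # Unique list of parent folders (assumes no newlines in folder names).
    find "$root" -type d -name 'date-*' -printf '%h\n' | sort -u |
    while IFS= read -r parent; do
        find "$parent" -mindepth 1 -maxdepth 1 -type d -name 'date-*' \
                -printf '%T@ %p\0' |
            sort -z -n |            # oldest first
            head -z -n "-$keep" |   # drop the newest KEEP entries
            cut -z -d' ' -f2- |     # strip the timestamp
            xargs -0 -r echo rm -rf --
    done
)
```

Calling prune_date_dirs /var/www/temp/updates 2 would print one rm line per parent folder that has more than two date-* folders.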

If all folders are in subdirectories temp1, temp2, ..., you can just use ls -tr
ls -dtr /var/www/temp/updates/2012-01-01/temp*/* | head -n -2 | xargs rm -rf
This lists all folders sorted by time (ls -dtr), takes all but the two latest (head -n -2), and removes the remaining folders (xargs rm -rf).

Related

How to delete X number of files in a directory

To get X number of files in a directory, I can do:
$ ls -U | head -40000
How would I then delete these 40,000 files? For example, something like:
$ "rm -rf" (ls -U | head -40000)
The tool you need for this is xargs. It will convert standard input into arguments to a command that you specify. By default the input is split on whitespace (blanks and newlines), and each resulting word is passed as a single argument.
Thus, something like this would work (though parsing the output of ls like this is normally discouraged):
ls -U | head -40000 | xargs rm -rf
I would recommend before trying this to start with a small head size and use xargs echo to print out the filenames being passed so you understand what you'll be deleting.
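A dry run along those lines might look like the sketch below. The function name preview_delete is hypothetical, and it only prints the command that would run.

```shell
# Print, rather than execute, the rm command for the first N entries of DIR.
preview_delete() (
    cd "$1" || exit 1
    ls -U | head -n "$2" | xargs -r echo rm -rf
)
```

For example, preview_delete . 5 shows the rm command for five entries without removing anything.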
Be aware if you have files with weird characters that this can sometimes be a problem. If you are on a modern GNU system you may also wish to use the arguments to these commands that use null characters to separate each element. Since a filename cannot contain a null character that will safely parse all possible names. I am not aware of a simple way to take the top X items when they are zero separated.
So, for example you can use this to delete all files in a directory
find . -maxdepth 1 -print0 | xargs -0 rm -rf
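On taking the top X items from null-separated input: newer GNU coreutils releases (8.25 and later, to my knowledge) give head a -z option that treats NUL as the line terminator. A minimal sketch, assuming such a version is installed; first_n_entries is a name made up for this example.

```shell
# Emit the first N entries of DIR, NUL-terminated, in the order find returns them.
first_n_entries() {
    find "$1" -mindepth 1 -maxdepth 1 -print0 | head -z -n "$2"
}
```

Piping the result into xargs -0 rm -rf -- would then delete those entries, and swapping rm -rf for echo gives a preview first.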
Use a bash array and slice it. If the number and size of arguments is likely to get close to the system's limits, you can still use xargs to split up the remainder.
files=( * )
printf '%s\0' "${files[@]:0:40000}" | xargs -0 rm
What about using awk as the filter?
find "$FOLDER" -maxdepth 1 -mindepth 1 -print0 \
| awk -v limit=40000 'NR<=limit;NR>limit{exit}' RS="\0" ORS="\0" \
| xargs -0 rm -rf
It will reliably remove at most 40,000 files (or folders). "Reliably" means regardless of which characters the filenames may contain.
Btw, to get the number of files in a directory reliably you can do:
find FOLDER -mindepth 1 -maxdepth 1 -printf '.' | wc -c
I ended up doing this since my folders were named with sequential numbers. This should also work for alphabetical folders:
ls -r releases/ | sed '1,3d' | xargs -I {} rm -rf releases/{}
Details:
list all the items in the releases/ folder in reverse order
slice off the first 3 items (which would be the newest if numeric/alpha naming)
for each item, rm it
In your case, you can replace ls -r with ls -U and sed '1,3d' with sed -n '1,40000p' (or head -n 40000), since you want to delete the first 40,000 items rather than keep them.

Create a bash script to delete folders which do not contain a certain filetype

I have recently run into a problem.
I used a utility to move all my music files into directories based on tags. This left a LOT of almost empty folders. The folders, in general, contain a thumbs.db file or some sort of image for album art. The mp3s have the correct album art in their new directories, so the old ones are okay to delete.
Basically, I need to find any directories within D:/Music/ that:
-Do not have any subdirectories
-Do not contain any mp3 files
And then delete them.
I figured this would be easier to do in a shell script or bash script or whatever else linux/unix world than in Windows 8.1 (HAHA).
Any suggestions? I'm not very experienced writing scripts like this.
This should get you started
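find /music -mindepth 1 -type d |
while IFS= read -r dt
do
    # skip directories that contain a subdirectory
    find "$dt" -mindepth 1 -type d | read && continue
    # skip directories that contain an mp3 file
    find "$dt" -iname '*.mp3' -type f | read && continue
    echo DELETE "$dt"
done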
Here's the short story...
find . \( -name '*.mp3' -o -type d \) -printf '%h\n' | sort | uniq > non-empty-dirs.tmp
find . -type d -print | sort | uniq > all-dirs.tmp
comm -23 all-dirs.tmp non-empty-dirs.tmp > dirs-to-be-deleted.tmp
less dirs-to-be-deleted.tmp
cat dirs-to-be-deleted.tmp | xargs -d '\n' rm -rf
Note that you might have to run all the commands a few times (depending on your repository's directory depth) before you're done deleting all the nested empty directories...
And the long story goes...
You can approach this problem from two basic perspectives. The first is to find all directories, then iterate over each of them and check whether it contains any mp3 file or any subdirectory; if not, mark that directory for deletion. It will work, but on very large repositories you might expect a significant run time.
Another approach, which is in my opinion much more interesting, is to build a list of directories NOT to be deleted, and subtract that list from the list of all directories. Let's work through the second strategy, one step at a time...
First of all, to find the paths of all directories that contain mp3 files, you can simply do:
find . -name '*.mp3' -printf '%h\n' | sort | uniq
This means "find any file ending with .mp3, then print the path to its parent directory".
Now, I could certainly name at least ten different approaches to find directories that contain at least one subdirectory, but keeping the same strategy as above, we can easily get...
find . -type d -printf '%h\n' | sort | uniq
What this means is: "Find any directory, then print the path to its parent."
Both of these queries can be combined in a single invocation, producing a single list containing the paths of all directories NOT to be deleted. The two tests must be grouped with parentheses so that -printf applies to both. Let's redirect that list to a temporary file.
find . \( -name '*.mp3' -o -type d \) -printf '%h\n' | sort | uniq > non-empty-dirs.tmp
Let's similarly produce a file containing the paths of all directories, no matter if they are empty or not.
find . -type d -print | sort | uniq > all-dirs.tmp
So there, we have, on one side, the complete list of all directories, and on the other, the list of directories not to be deleted. What now? There are tons of strategies, but here's a very simple one:
comm -23 all-dirs.tmp non-empty-dirs.tmp > dirs-to-be-deleted.tmp
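Once you have that, well, review it, and if you are satisfied, then pipe it through xargs to rm to actually delete the directories. The -d '\n' option (GNU xargs) splits the input on newlines only, so names containing spaces survive.
cat dirs-to-be-deleted.tmp | xargs -d '\n' rm -rf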
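The steps above can also be wrapped into one function that only prints the candidates. This is a sketch under a few assumptions: GNU find, directory names without newlines, and a made-up function name dirs_to_delete. The parentheses around the two tests make -printf apply to both mp3 files and directories.

```shell
# Print directories under ROOT that contain neither an mp3 file nor a subdirectory.
dirs_to_delete() (
    root=${1:-.}
    all=$(mktemp) || exit 1
    keep=$(mktemp) || exit 1
    # Every directory.
    find "$root" -type d -print | LC_ALL=C sort -u > "$all"
    # Parents of mp3 files and parents of directories; \( \) groups the tests
    # so that -printf applies to both.
    find "$root" \( -name '*.mp3' -o -type d \) -printf '%h\n' | LC_ALL=C sort -u > "$keep"
    # Lines that appear only in the first list.
    LC_ALL=C comm -23 "$all" "$keep"
    rm -f "$all" "$keep"
)
```

Review the output of dirs_to_delete /music before feeding it to xargs -d '\n' rm -rf.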

Script to delete all folders barring the last two most modified?

I need to write a recursive script to delete all folders in a subfolder named 'date-2012-01-01_12_30' but leave the two latest.
/var/www/temp/updates/ then hundreds of folders by 'date' and by 'code'
e.g.
/var/www/temp/updates/2012-01-01/temp1/date-2012-01-_12_30
/var/www/temp/updates/2012-01-01/temp1/date-2012-02-_13_30
/var/www/temp/updates/2012-01-01/temp1/date-2013-11-_12_30
/var/www/temp/updates/2012-01-01/temp2/date-2012-01-_12_30
I was thinking about using find to get the folders, but I'm unsure how to know which folders I can delete, as the script will have to know how many date- folders are in that subfolder and which ones are the latest.
Hmm, any help would be great?
Code:
PATH1=/var/www/temp/updates/*/*
find $PATH1 -type d -name "date-*" -printf '%T@ %p\n' | sort -n | head -n -2 | cut -d' ' -f2- | xargs ls -l
The script will need to go through thousands of different folders and keep the two most recent folders - Someone on here helped before but I haven't changed it for the thousands of folders to search through
Can you try this script?
PATH1=/var/www/temp/updates
find "$PATH1" -type d -iname "date-*" -printf '%T@ %p\n' | sort -n | head -n -2 | cut -d' ' -f2- | xargs -I file rm -fr file
thanx
Actually, I think the script will work fine, as find will go through all the folders under /updates/:
PATH1=/var/www/temp/updates/*/*
find $PATH1 -type d -name "date-*" -printf '%T@ %p\n' | sort -n | head -n -2 | cut -d' ' -f2- | xargs rm -rf

Bash script to delete all but N files when sorted alphabetically

It's hard to explain in the title.
I have a bash script that runs daily to backup one folder into a zip file. The zip files are named worldYYYYMMDD.zip with YYYYMMDD being the date of backup. What I want to do is delete all but the 5 most recent backups. Sorting the files alphabetically will list the oldest ones first, so I basically need to delete all but the last 5 files when sorted in alphabetical order.
The following line should do the trick.
ls world*.zip | head -n -5 | xargs -r rm
ls: lists the files alphabetically (the default sort order)
head -n -5: outputs all lines except the last 5
xargs -r rm: remove each given file. -r: don't run rm if the input is empty
How about this:
find /your/directory -name 'world*.zip' -mtime +5 | xargs rm
Test it first. This should remove all world*.zip files last modified more than 5 days ago, which is different logic from what you asked for.
I can't test it right now because I don't have a Linux machine, but I think it should be:
rm `ls world*.zip | head -n -5`
ls | grep '\.zip$' | sort | head -n -5 | while read -r file; do rm "$file"; done
sort sorts the files
head -n -5 returns all but the 5 most recent
the while loop does the deleting
ls world*.zip | sort -r | tail -n +6 | xargs rm
sort -r will sort in reversed order, so the newest will be at the top
tail -n +6 will output lines starting with the 6th, skipping the 5 newest
xargs rm will remove the files. xargs is used to pass stdin as parameters to rm.
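Putting the idea into a small function that only lists the backups to remove makes it easy to check before deleting. This is a sketch assuming GNU head (for the negative line count) and the worldYYYYMMDD.zip naming from the question; the function name old_backups is made up.

```shell
# List all world*.zip files in DIR except the KEEP newest (by name), one per line.
old_backups() (
    cd "$1" || exit 1
    for f in world*.zip; do
        # Skip the literal pattern when nothing matches.
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done | sort | head -n "-${2:-5}"
)
```

The printed names are relative to DIR, so any rm that consumes them should run from inside that directory.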

Copy the three newest files under one directory (recursively) to another specified directory

I'm using bash.
Suppose I have a log file directory /var/myprogram/logs/.
Under this directory I have many sub-directories and sub-sub-directories that include different types of log files from my program.
I'd like to find the three newest files (modified most recently), whose name starts with 2010, under /var/myprogram/logs/, regardless of sub-directory and copy them to my home directory.
Here's what I would do manually
1. Go through each directory and do ls -lt 2010*
to see which files starting with 2010 are modified most recently.
2. Once I go through all directories, I'd know which three files are the newest. So I copy them manually to my home directory.
This is pretty tedious, so I wondered if maybe I could somehow pipe some commands together to do this in one step, preferably without using shell scripts?
I've been looking into find, ls, head, and awk that I might be able to use but haven't figured the right way to glue them together.
Let me know if I need to clarify. Thanks.
Here's how you can do it:
find -type f -name '2010*' -printf "%T@\t%P\n" |sort -rn -k1,1 |head -3 |cut -f 2-
This outputs a list of files prefixed by their last modification time, sorts them based on that value, takes the top 3 and removes the timestamp.
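To go on and copy the files, the same idea can be made safe for unusual file names by using NUL separators. This is a sketch, assuming GNU find and a coreutils version whose sort, head and cut accept -z, and assuming modification time (%T@) is the timestamp wanted; the function name newest_files is hypothetical.

```shell
# Print the COUNT most recently modified files under DIR whose names match
# PATTERN, newest first, NUL-terminated.
newest_files() (
    find "$1" -type f -name "$2" -printf '%T@\t%p\0' |
        sort -z -rn |             # newest first
        head -z -n "${3:-3}" |    # keep the first COUNT entries
        cut -z -f2-               # strip the timestamp column
)
```

For the question's case, newest_files /var/myprogram/logs '2010*' 3 | xargs -0 -r cp -t "$HOME" -- would copy the three files, with cp -t being a GNU option.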
Your answers feel very complicated, how about
for DIR in $(find . -type d); do ls -t -1 -p "$DIR" | grep -v "/" | grep "^2010" | head -n3 | xargs -I{} cp "$DIR"/{} ~; done
or laid out nicely
for DIR in $(find . -type d)
do
    # copies the three newest 2010* files of each directory (not three overall)
    ls -t -1 -p "$DIR" | grep -v "/" | grep "^2010" | head -n3 | xargs -I{} cp "$DIR"/{} ~
done
My "shortest" answer after quickly hacking it up.
for file in $(find . -iname '*.php' -mtime -1 | xargs ls -l | awk '{ print $6" "$7" "$8" "$9 }' | sort | sed -n '1,3p' | awk '{ print $4 }'); do cp "$file" ../; done
The main command stored in $() does the following:
Find all files recursively in current directory matching (case insensitive) the name *.php and having been modified in the last 24 hours.
Pipe to ls -l, required to be able to sort by modification date, so we can have the first three
Extract the modification date and file name/path with awk
Sort these files based on datetime
With sed print only the first 3 files
With awk print only their name/path
Used in a for loop and as action copy them to the desired location.
Or use @Hasturkun's variant, which popped up as a response while I was editing this post :)
