Bash script to delete all but N files when sorted alphabetically - linux

It's hard to explain in the title.
I have a bash script that runs daily to backup one folder into a zip file. The zip files are named worldYYYYMMDD.zip with YYYYMMDD being the date of backup. What I want to do is delete all but the 5 most recent backups. Sorting the files alphabetically will list the oldest ones first, so I basically need to delete all but the last 5 files when sorted in alphabetical order.

The following line should do the trick.
ls -F world*.zip | head -n -5 | xargs -r rm
ls -F: List the files alphabetically
head -n -5: Print every line except the last 5 (GNU head), i.e. everything except the 5 newest backups
xargs -r rm: remove each given file. -r: don't run rm if the input is empty
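If you would rather not parse ls at all, here is a minimal sketch of the same idea using a bash array. It assumes bash, and that the glob expands in alphabetical order, which for worldYYYYMMDD.zip is also date order. The function name prune_backups and its dir/keep parameters are made up for illustration.

```shell
# bash: uses arrays and array slicing
prune_backups() {
    local dir=$1 keep=$2
    # Glob expansion is sorted, so the oldest worldYYYYMMDD.zip comes first.
    local files=( "$dir"/world*.zip )
    # If nothing matched, the pattern stays literal and does not exist.
    [ -e "${files[0]}" ] || return 0
    local count=${#files[@]}
    # Nothing to delete unless there are more than $keep files.
    (( count > keep )) || return 0
    # Delete everything except the last $keep entries.
    rm -- "${files[@]:0:count-keep}"
}
```

Calling prune_backups /path/to/backups 5 would then leave the five newest archives in place.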

How about this:
find /your/directory -name 'world*.zip' -mtime +5 | xargs rm
Test it first (for example, replace xargs rm with xargs echo). This removes all world*.zip files older than 5 days, so it is a different logic from keeping the 5 most recent backups.
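A slightly more defensive sketch of the age-based variant, assuming GNU find and xargs. The function name delete_older_than and its parameters are hypothetical. NUL separators keep odd file names intact, and -r skips rm when nothing matches.

```shell
delete_older_than() {
    local dir=$1 days=$2
    # -mtime +N matches files whose age in whole days is greater than N.
    find "$dir" -maxdepth 1 -type f -name 'world*.zip' -mtime +"$days" -print0 |
        xargs -0 -r rm --
}
```

Note that if the daily backup stops running, this still deletes old archives until none are left, which the count-based approach does not do.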

I can't test it right now because I don't have a Linux machine, but I think it should be:
rm `ls -A | head -n -5`

ls | grep ".*[\.]zip" | sort | head -n -5 | while read -r file; do rm "$file"; done
sort sorts the files
head -n -5 returns all but the 5 most recent
the while loop does the deleting

ls world*.zip | sort -r | tail -n +6 | xargs rm
sort -r will sort in reversed order, so the newest will be at the top
tail -n +6 will output lines starting with the 6th, i.e. everything after the 5 newest
xargs rm will remove the files. xargs is used to pass stdin as parameters to rm.

Related

How to delete X number of files in a directory

To get X number of files in a directory, I can do:
$ ls -U | head -40000
How would I then delete these 40,000 files? For example, something like:
$ "rm -rf" (ls -U | head -40000)
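The tool you need for this is xargs. It will convert standard input into arguments to a command that you specify. By default the input is split on whitespace (blanks and newlines), so each line is treated as a single argument only if the file names contain no spaces or quotes.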
Thus, something like this would work (see the comment below, though, ls shouldn't be parsed this way normally):
ls -U | head -40000 | xargs rm -rf
I would recommend before trying this to start with a small head size and use xargs echo to print out the filenames being passed so you understand what you'll be deleting.
Be aware if you have files with weird characters that this can sometimes be a problem. If you are on a modern GNU system you may also wish to use the arguments to these commands that use null characters to separate each element. Since a filename cannot contain a null character that will safely parse all possible names. I am not aware of a simple way to take the top X items when they are zero separated.
So, for example you can use this to delete all files in a directory
find . -mindepth 1 -maxdepth 1 -print0 | xargs -0 rm -rf
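For taking the first X items from a NUL-separated list, newer GNU coreutils versions of head accept -z (--zero-terminated). A minimal sketch, assuming GNU find, head and xargs. The function name delete_first_n is made up for illustration.

```shell
delete_first_n() {
    local dir=$1 n=$2
    # Every stage uses NUL as the record separator, so any file name is safe.
    find "$dir" -mindepth 1 -maxdepth 1 -type f -print0 |
        head -z -n "$n" |
        xargs -0 -r rm -f --
}
```

As with ls -U, the order in which find emits entries is unspecified, so this removes some n files, not the n oldest.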
Use a bash array and slice it. If the number and size of arguments is likely to get close to the system's limits, you can still use xargs to split up the remainder.
files=( * )
printf '%s\0' "${files[@]:0:40000}" | xargs -0 rm
What about using awk as the filter?
find "$FOLDER" -maxdepth 1 -mindepth 1 -print0 \
| awk -v limit=40000 'NR<=limit;NR>limit{exit}' RS="\0" ORS="\0" \
| xargs -0 rm -rf
It will reliably remove at most 40,000 files (or folders). Reliably means regardless of which characters the filenames may contain.
Btw, to get the number of files in a directory reliably you can do:
find FOLDER -mindepth 1 -maxdepth 1 -printf '.' | wc -c
I ended up doing this since my folders were named with sequential numbers. This should also work for alphabetical folders:
ls -r releases/ | sed '1,3d' | xargs -I {} rm -rf releases/{}
Details:
list all the items in the releases/ folder in reverse order
slice off the first 3 items (which would be the newest if numeric/alpha naming)
for each item, rm it
In your case, you can replace ls -r with ls -U and 1,3d with 1,40000d. That should be the same, I believe.

Bash script to delete files in a directory if there are more than 5

This is a backup script that copies files from one directory to another. I use a for loop to check if there are more than five files. If there are, the loop should delete the oldest entries first.
I tried ls -tr | head -n -5 | xargs rm from the command line and it works successfully to delete older files if there are more than 5 in the directory.
However, when I put it into my for loop, I get an error rm: missing operand
Here is the full script. I don't think I am using the for loop correctly in the script, but I'm really not sure how to use the commands ls -tr | head -n -5 | xargs rm in a loop that iterates over the files in the directory.
timestamp=$(date +"%m-%d-%Y")
dest=${HOME}/mybackups
src=${HOME}/safe
fname='bu_'
ffname=${HOME}/mybackups/${fname}${timestamp}.tar.gz
# for loop for deletion of file
for f in ${HOME}/mybackups/*
do
ls -tr | head -n -5 | xargs rm
done
if [ -e $ffname ];
then
echo "The backup for ${timestamp} has failed." | tee ${HOME}/mybackups/Error_${timestamp}
else
tar -vczf ${dest}/${fname}${timestamp}.tar.gz ${src}
fi
Edit: I took out the for loop, so it's now just:
[...]
ffname=${HOME}/mybackups/${fname}${timestamp}.tar.gz
ls -tr | head -n -5 | xargs rm
if [ -e $ffname ];
[...]
The script WILL work if it is in the mybackups directory, however, I continue to get the same error if it is not in that directory. The script gets the file names but tries to remove them from the current directory, I think... I tried several modifications but nothing has worked so far.
I get an error rm: missing operand
The cause of that error is that there are no files left to be deleted. To avoid that error, use the --no-run-if-empty option:
ls -tr | head -n -5 | xargs --no-run-if-empty rm
In the comments, mklement0 notes that this issue is peculiar to GNU xargs. BSD xargs will not run with an empty argument. Consequently, it does not need and does not support the --no-run-if-empty option.
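A quick way to see the difference on a GNU system, using echo in place of rm so nothing is deleted. The variable names are only for illustration.

```shell
# Empty input with -r: the command is not run, so nothing is printed.
with_r=$(printf '' | xargs -r echo ran)
# Empty input without -r: GNU xargs still runs the command once.
without_r=$(printf '' | xargs echo ran)
printf 'with -r: [%s]\nwithout -r: [%s]\n' "$with_r" "$without_r"
```

The short form -r is equivalent to --no-run-if-empty in GNU xargs.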
More
Quoting from a section of code in the question:
for f in ${HOME}/mybackups/*
do
ls -tr | head -n -5 | xargs rm
done
Note that (1) f is never used for anything and (2) this runs the ls -tr | head -n -5 | xargs rm several times in a row when it needs to be run only once.
Obligatory Warning
Your approach parses the output of ls. This makes for a simple and easily understood command. It can work if all your files are sensibly named. It will not work in general. For more on this, see: Why you shouldn't parse the output of ls(1).
Safer Alternative
The following will work with all manner of file names, whether they contain spaces, tabs, newlines, or whatever:
find . -maxdepth 1 -type f -printf '%T@ %i\n' | sort -n | head -n -5 | while read tstamp inode
do
find . -inum "$inode" -delete
done
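Another option along the same lines is a NUL-delimited pipeline that carries the path instead of the inode. This is only a sketch, assuming GNU find and a coreutils version where sort, head and cut all accept -z. The function name prune_by_mtime and its parameters are hypothetical.

```shell
prune_by_mtime() {
    local dir=$1 keep=$2
    # Emit "<mtime> <path>" records separated by NUL, oldest first after sorting.
    find "$dir" -maxdepth 1 -type f -printf '%T@ %p\0' |
        sort -z -n |
        head -z -n -"$keep" |
        cut -z -d ' ' -f 2- |
        xargs -0 -r rm --
}
```

Because the directory is passed to find explicitly, this does not depend on the current working directory.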
SMH. I ended up with the simplest solution in the world: cd-ing into the directory before running ls -tr | head -n -5 | xargs rm. Thanks for everyone's help!
timestamp=$(date +"%m-%d-%Y")
dest=${HOME}/mybackups
src=${HOME}/safe
fname='bu_'
ffname=${HOME}/mybackups/${fname}${timestamp}.tar.gz
cd ${HOME}/mybackups
ls -tr | head -n -5 | xargs rm
if [ -e $ffname ];
then
echo "The backup for ${timestamp} has failed." | tee ${HOME}/mybackups/Error_${timestamp}
else
tar -vczf ${dest}/${fname}${timestamp}.tar.gz ${src}
fi
This line ls -tr | head -n -5 | xargs rm came from here
ls -tr displays all the files, oldest first (-t newest first, -r reverse).
head -n -5 displays all but the last 5 lines (i.e. the 5 newest files).
xargs rm calls rm for each selected file.
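One possible refinement of the cd approach is to do the cd inside a subshell, so the rest of the script keeps its working directory and rm never runs if the cd fails. A rough sketch, assuming GNU xargs (for -r and -d). The function name prune_dir is made up for illustration. It still parses ls, so names containing newlines are not handled.

```shell
prune_dir() {
    local dir=$1 keep=$2
    (
        # Abort the subshell if the directory is missing, so rm never
        # runs against whatever directory the script started in.
        cd "$dir" || exit 1
        # -d '\n' splits on newlines only, so names with spaces survive.
        ls -tr | head -n -"$keep" | xargs -r -d '\n' rm --
    )
}
```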

Remove older backup from directory using shell command

In my shell script, I create a backup of my folder. The script runs from a cron job whose schedule keeps changing.
It stores each backup with a timestamp in its name, for example:
cd /tmp/BACKUP_DIR
backup_06-05-2014.tar
backup_06-08-2014.tar
backup_06-10-2014.tar
What I want is that, whenever I run the script, it keeps only the latest backup and the one taken before it, and deletes the remaining backups.
Like if I run the script now, it should keep
backup_06-10-2014.tar
backup_06-18-2014.tar
and delete all the others. What rm command should I use?
Try as follows:
cd /tmp/BACKUP_DIR && rm $(ls -1t | tail -n +3)
This lists the file names sorted by date, newest first, and removes everything after the two newest.
You could try deleting files older than 7 days using a find command, for example:
find /tmp/BACKUP_DIR -maxdepth 1 -type f -name "backup_*.tar" -mtime +6 -exec rm -f {} \;
Use
rm -rf `ls -lth backup_*.tar | awk '{print $NF}' | tail -n +3`
ls -lth backup_*.tar gives the list of backup files sorted by modification time (newest at the top)
awk '{print $NF}' prints the file names and passes them to tail
tail -n +3 prints the list starting from the third file, skipping the two newest
Finally, tail's result is fed to rm, which deletes those files
Another simplified method
rm -rf `ls -1t backup_*.tar | tail -n +3`
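The offset given to tail -n +K is easy to get wrong by one, so here is a tiny dry run with made-up names and no real files. It assumes the list is already sorted newest first, as ls -t would produce.

```shell
# Four hypothetical backups, newest first.
newest_first=$(printf '%s\n' backup_4 backup_3 backup_2 backup_1)
# tail -n +3 starts output at line 3, so the two newest are skipped.
to_delete=$(printf '%s\n' "$newest_first" | tail -n +3)
printf 'would delete:\n%s\n' "$to_delete"
```

In general, keeping the K newest from a newest-first list means tail -n +(K+1).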

Recursive script Delete folders by name all but 2

I need to write a recursive script to delete all folders in a subfolder named 'date-2012-01-01_12_30' but leave the two latest.
/var/www/temp/updates/ then hundreds of folders by 'date' and by 'code'
e.g.
/var/www/temp/updates/2012-01-01/temp1/date-2012-01-_12_30
/var/www/temp/updates/2012-01-01/temp1/date-2012-02-_13_30
/var/www/temp/updates/2012-01-01/temp1/date-2013-11-_12_30
/var/www/temp/updates/2012-01-01/temp2/date-2012-01-_12_30
I was thinking about using find to get the folders, but I'm unsure how to know which folders I can delete, as the script will have to know how many date- folders are in that subfolder and which ones are the latest.
Hmm, any help would be great?
This should work:
find /var/www/temp/updates/ -type d -name "date-*" -printf '%T@ %p\n' | sort -n | head -n -2 | cut -d' ' -f2- | xargs rm -rf
find prints out the directory paths along with their last modification times. This is then sorted and all but the last two are deleted.
If all folders are in subdirectories temp1, temp2, ..., you can just use ls -tr
ls -dtr /var/www/temp/updates/2012-01-01/temp*/* | head -n -2 | xargs rm -rf
This lists all folders sorted by time ls -dtr, takes all but the two latest head and removes the remaining folders xargs rm -rf.
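Both commands above keep the two newest folders overall. If the goal is to keep two in each subfolder, a per-parent loop is one way to sketch it. This assumes bash, the layout from the question (root/date/code/date-*), and that the date- names sort chronologically. The function name prune_date_dirs is hypothetical.

```shell
# bash: uses arrays and array slicing
prune_date_dirs() {
    local root=$1 keep=$2 parent dirs count
    # Visit every <root>/<date>/<code>/ directory.
    for parent in "$root"/*/*/; do
        # The trailing slash makes the glob match directories only, in sorted order.
        dirs=( "$parent"date-*/ )
        [ -d "${dirs[0]}" ] || continue
        count=${#dirs[@]}
        (( count > keep )) || continue
        # Remove all but the last $keep date- folders of this parent.
        rm -rf -- "${dirs[@]:0:count-keep}"
    done
}
```

Replacing rm -rf with echo for a first run shows what would be removed without touching anything.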

Copy the three newest files under one directory (recursively) to another specified directory

I'm using bash.
Suppose I have a log file directory /var/myprogram/logs/.
Under this directory I have many sub-directories and sub-sub-directories that include different types of log files from my program.
I'd like to find the three newest files (modified most recently), whose name starts with 2010, under /var/myprogram/logs/, regardless of sub-directory and copy them to my home directory.
Here's what I would do manually
1. Go through each directory and do ls -lt 2010*
to see which files starting with 2010 are modified most recently.
2. Once I go through all directories, I'd know which three files are the newest. So I copy them manually to my home directory.
This is pretty tedious, so I wondered if maybe I could somehow pipe some commands together to do this in one step, preferably without using shell scripts?
I've been looking into find, ls, head, and awk that I might be able to use but haven't figured the right way to glue them together.
Let me know if I need to clarify. Thanks.
Here's how you can do it:
find -type f -name '2010*' -printf "%C@\t%P\n" |sort -r -k1,1 |head -3 |cut -f 2-
This outputs a list of files prefixed by their last change time, sorts them based on that value, takes the top 3 and removes the timestamp.
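To go from that list to the actual copy, something like the following sketch might work. It assumes GNU find, coreutils with -z support in sort, head and cut, and GNU cp for -t. It sorts on modification time (%T@), since that is what the question asks for. The function name copy_newest and its parameters are made up for illustration.

```shell
copy_newest() {
    local src=$1 dest=$2 n=$3
    # Records are "<mtime><TAB><path>" separated by NUL, sorted newest first.
    find "$src" -type f -name '2010*' -printf '%T@\t%p\0' |
        sort -z -rn |
        head -z -n "$n" |
        cut -z -f 2- |
        xargs -0 -r cp -t "$dest" --
}
```

For the question's case this would be called as copy_newest /var/myprogram/logs "$HOME" 3. Files with the same name in different sub-directories would overwrite each other in the destination.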
Your answers feel very complicated, how about
for FILE in `find . -type d`; do ls -t -1 -F $FILE | grep -v "/" | head -n3 | xargs -I{} mv {} ..; done;
or laid out nicely
for FILE in `find . -type d`;
do
ls -t -1 -F $FILE | grep -v "/" | grep "^2010" | head -n3 | xargs -I{} mv {} ~;
done;
My "shortest" answer after quickly hacking it up.
for file in $(find . -iname '*.php' -mtime -1 | xargs ls -l | awk '{ print $6" "$7" "$8" "$9 }' | sort | sed -n '1,3p' | awk '{ print $4 }'); do cp $file ../; done
The main command stored in $() does the following:
Find all files recursively in current directory matching (case insensitive) the name *.php and having been modified in the last 24 hours.
Pipe to ls -l, required to be able to sort by modification date, so we can have the first three
Extract the modification date and file name/path with awk
Sort these files based on datetime
With sed print only the first 3 files
With awk print only their name/path
Used in a for loop and as action copy them to the desired location.
Or use @Hasturkun's variant, which popped up as a response while I was editing this post :)

Resources