Retaining n most recent directories in a backup script - linux

I have a directory in /home/backup/ that stores yearly backups. Inside the backup folder, we have these directories:
/home/backup/2012
/home/backup/2013
/home/backup/2014
/home/backup/2015
/home/backup/2016
/home/backup/2017
and every year I have to clean up the data, keeping only the last three years of backup.
In the above case, I have to delete:
/home/backup/2012
/home/backup/2013
/home/backup/2014
What is the best way to find the directories to be deleted? I have this but it doesn't work:
find /home/ecentrix/recording/ -maxdepth 1 -mindepth 1 -type d -ctime +1095 -exec rm -rf {} \;
Do you guys have another idea to do that?

Since your directories have well-defined, integer names, I'd just use bash to calculate the appropriate targets:
mkdir -p backup/201{2..7} # just for testing
cd backup
rm -fr $(seq 2012 $(( $(date +"%Y") - 3)))
seq generates a list of numbers from 2012 through the current year minus 3, which are then passed to rm to blast them.
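To see what this would do before letting rm loose, here is the same idea in a throwaway sandbox, with the current year pinned to 2017 for illustration and echo standing in for rm:

```shell
# Sandbox dry run (assumed layout; nothing outside the temp dir is touched)
tmp=$(mktemp -d)
mkdir -p "$tmp"/backup/201{2..7}
cd "$tmp"/backup
targets=$(seq 2012 $((2017 - 3)))   # 2012 2013 2014
echo rm -fr $targets                # prints: rm -fr 2012 2013 2014
```

Once the echoed command looks right, drop the echo.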

A more generic solution
I think it is best to traverse the directories in descending order and delete the ones after the third. This way, there is no danger of losing a directory when the script is run repeatedly:
#!/bin/bash
backups_to_keep=3
count=0
cd /home/backup
while read -d '' -r dir; do
    [[ -d "$dir" ]] || continue                 # skip if not a directory
    ((++count <= backups_to_keep)) && continue  # still within the retention window
    echo "Removing old backup directory '$dir'" # it is good to log what was cleaned up
    echo rm -rf -- "$dir"
done < <(find . -maxdepth 1 -name '[2-9][0-9][0-9][0-9]' -type d -print0 | sort -nrz)
Remove the echo before rm -rf after testing. For your example, it gives this output:
rm -rf -- ./2014
rm -rf -- ./2013
rm -rf -- ./2012
cd /home/backup restricts rm -rf to just that directory for extra safety
find . -maxdepth 1 -name '[2-9][0-9][0-9][0-9]' -type d gives the top level directories that match the glob
sort -nrz makes sure newer directories come first; -z processes the null-terminated output of find ... -print0
This solution doesn't hardcode the years - it just assumes that the directories to be removed are named in a numerically sortable way
It is resilient to any other files or directories being present in the backup directory
There are no side effects if the script is run again and again
This can easily be extended to support different naming conventions for the backup directory - just change the glob expression

Solution
# Check if extended globbing is on
shopt extglob
# If extended globbing is off, run this line
shopt -s extglob
# Remove all files except 2015, 2016, and 2017
rm -r -i /home/backup/!(2015|2016|2017)
# Turn off extended globbing (optional)
shopt -u extglob
Explanation
shopt -s extglob allows you to match any files except the ones inside !(...). So that line means remove any file in /home/backup except 2015, 2016, or 2017.
The -i flag in rm -r -i ... allows you to interactively confirm the removal of each file. Remove -i if you want the files to be removed automatically.
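A quick, sandboxed way to preview what the !(...) pattern matches before running any rm (the temp layout here is made up; echo stands in for rm). One gotcha worth knowing: in a script, shopt -s extglob must appear on an earlier line than the pattern, because bash parses the pattern at read time:

```shell
#!/bin/bash
shopt -s extglob            # must be enabled before the pattern line is parsed
tmp=$(mktemp -d)
mkdir "$tmp"/2014 "$tmp"/2015 "$tmp"/2016 "$tmp"/2017
cd "$tmp"
echo !(2015|2016|2017)      # prints: 2014
```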
Dynamic Dates
This solution is valid for automation (e.g. cron jobs)
# Number of latest years to keep
LATEST_YEARS=3
# Get the current year
current_year=$(date '+%Y')
# Get the first/earliest year to keep
first_year=$(( current_year - LATEST_YEARS + 1 ))
# Turn on extended globbing
shopt -s extglob
# Store years to keep in an array
keep_years=( $(seq $first_year $current_year) )
# Specify files to keep
rm -r /home/backup/!(${keep_years[0]}|${keep_years[1]}|${keep_years[2]})
NOTE: ALL FILES IN BACKUP DIRECTORY WILL BE REMOVED EXCEPT LAST 3 YEARS
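The pattern above hardcodes three alternatives even though LATEST_YEARS is a variable. A sketch of a fully dynamic variant, which joins the array into the extglob pattern so changing LATEST_YEARS is enough (echo previews the command before any real rm):

```shell
#!/bin/bash
shopt -s extglob
LATEST_YEARS=3
current_year=$(date '+%Y')
first_year=$(( current_year - LATEST_YEARS + 1 ))
keep_years=( $(seq "$first_year" "$current_year") )
# Join array elements with "|" to form the extglob alternation
pattern=$(IFS='|'; echo "${keep_years[*]}")    # e.g. 2023|2024|2025
echo rm -r /home/backup/!($pattern)            # drop echo once verified
```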

Consider this:
find /home/backup/2* -maxdepth 1 | sort -r | awk "NR>3" | xargs rm -rf
How this works
Produce a list of filenames starting with "2", only under /home/backup/
Alphabetically sort the list, in reverse order.
Use AWK to filter the list by row number. NR is AWK's current record (line) number, so "NR>3" passes through every line after the first three. Change that 3 to however many rows you want to be left. So if you only wanted the latest two years, change the 3 to a 2. If you wanted the latest 10 to be kept, make it "NR>10".
Append the resultant list to the command "rm -rf".
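The whole pipeline can be dry-run in a throwaway directory, with echo guarding the rm:

```shell
# Sandbox dry run of the pipeline (temp directory; nothing real is deleted)
tmp=$(mktemp -d)
mkdir -p "$tmp"/backup/{2012,2013,2014,2015,2016,2017}
find "$tmp"/backup/2* -maxdepth 1 | sort -r | awk 'NR>3' | xargs echo rm -rf
# prints: rm -rf .../backup/2014 .../backup/2013 .../backup/2012
```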
Run as dedicated user, for safety
The danger here is that I'm suggesting rm -rf. This is risky. If something goes wrong, you could delete things you want to keep. I mitigate this risk by only invoking these commands by a dedicated user that ONLY has permissions to delete backup files (and not beyond).
Merit
The merit of this approach is that when you throw it in a cron job and time advances, it'll continue to retain only the latest few directories. So this, I consider to be a general solution to your problem.
Demonstration
To test this, I created a test directory with all the same directories you have. I altered it just to see what would be executed at the end, so I've tried:
find test01/2* -maxdepth 1 | sort -r | awk "NR>4" | xargs echo rm -rf
I used NR>4 rather than NR>3 (as you'd want) to make it clear that the number selects how many rows survive at the top of the list, and thus don't get deleted.
Here's what I get:
rm -rf test01/2013 test01/2012
Dropping the echo from the final stage makes the command actually perform the deletions rather than just print them.
I have a crude copy of a dump of this in a script as I use it on some servers of mine, you can view it here: https://github.com/docdawning/teenybackup
Required for success
This approach DEPENDS on the alphabetization of whatever the find command produces. In my case, I use ISO-8601 type dates, which lend themselves entirely to being inherently date-sorted when they're alphabetized. Your YYYY type dates totally qualify.
Additional Safety
I recommend that you change your backups to be stored as tar archives. Then you can change the rm -rf to a simple rm. This is a lot safer, though not fool-proof. Regardless, you really should run this as a dedicated otherwise unprivileged user (as you should do for any script calling a delete, in my opinion).
Be aware that if you start it with
find /home/backup
Then the call to xargs will include /home/backup itself, which would be a disaster, because it'd get deleted too. So you must search within that path. Instead, calling it with the below would work:
find /home/backup/*
The 2* I gave above is just a way of somewhat limiting the search operation.
Warranty
None; this is the Internet. Be careful. Test things heavily to convince yourself. Also, maybe get some offline backups too.
Finally - I previously posted this as an answer, but made the fatal mistake of representing the find command based out of /home/backup and not /home/backup/* or /home/backup/2*. This caused /home/backup to also be sent for deletion, which would be a disaster. It's a very small distinction that I've tried to be clear about above. I've deleted that previous answer and replaced it with this one.

Here is one way.
Updated answer.
[dev]$ find backup/* | grep -vE "$(date '+%Y')|$(date +%Y --date='1 year ago')|$(date +%Y --date='2 year ago')" | xargs rm -rfv
removed directory: ‘backup/2012’
removed directory: ‘backup/2013’
removed directory: ‘backup/2014’
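That grep alternation can be generated instead of written out by hand, so the same line works for any retention count. A sketch assuming GNU date (as used in the command above), with echo in place of rm until verified:

```shell
# Build the keep-years pattern for any number of years (here 3)
keep=3
pattern=$(for i in $(seq 0 $((keep - 1))); do
    date +%Y --date="$i year ago"
done | paste -sd'|' -)
echo "$pattern"     # e.g. 2025|2024|2023
# find backup/* | grep -vE "$pattern" | xargs rm -rfv   # once verified
```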

Related

How to delete files which have X days lifetime, not last modified. Is it even possible. Linux

I run some kind of server on my Linux machine and use a simple bash script to delete some files every 3 days and other files every 7 days, using the find command. But my files are saved periodically, so the last-modification day is always the current day, and the files never get deleted. It only worked for me the first time, because the conditions happened to be met. I can't find a way to delete those files using a creation date, not a modification date.
Here's my simple script:
#!/bin/sh
while true
do
    java -server file.jar nogui
    echo ">$(tput setaf 3)STARTING REBOOT$(tput sgr0) $(tput setaf 7)(Ctrl+C To Stop!)$(tput sgr0)"
    find /folder/* -mtime +7 -exec rm -rf {} \;
    find /folder/* -mtime +3 -exec rm -rf {} \;
    find /logs/* -mtime +1 -exec rm -rf {} \;
    echo ">Rebooting in:"
    for i in 5 4 3 2 1
    do
        echo ">$i..."
        sleep 1
    done
done
If someone could help me with this, I would be really thankful!
Just an idea-don't shoot... :-)
If the files are not system files automatically generated by some process but are, let's say, server log files, you could possibly echo the creation date inside the file (i.e. at the end or beginning) and grep that value later to decide if it must be removed or kept.
No, it is not possible. Standard Linux filesystems do not track creation time at all. (ctime, sometimes mistaken for creation time, is metadata change time -- as compared to mtime, which is data modification time).
That said, there are certainly ugly hacks available. For instance, if you have the following script invoked by incron (or, less efficiently, cron) to record file creation dates:
#!/bin/bash
mkdir -p .ctimes
for f in *; do
    if [[ -f $f ]] && [[ ! -e .ctimes/$f ]]; then
        touch ".ctimes/$f"
    fi
done
...then you can look for files in the .ctimes directory that are older than three days, and delete both the markers and the files they stand for:
#!/bin/bash
find .ctimes -mtime +3 -type f -print0 | while IFS= read -r -d '' filename; do
    realname=${filename#.ctimes/}
    rm -f -- "$filename" "$realname"
done
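The two scripts can be exercised end to end in a throwaway directory. This sketch backdates one marker with touch -d (GNU) to simulate an old creation time, then runs the cleanup pass:

```shell
#!/bin/bash
tmp=$(mktemp -d); cd "$tmp" || exit 1
touch old.log new.log
mkdir -p .ctimes
for f in *; do
    [[ -f $f && ! -e .ctimes/$f ]] && touch ".ctimes/$f"
done
touch -d '2020-01-01' .ctimes/old.log    # pretend old.log was created long ago
find .ctimes -mtime +3 -type f -print0 | while IFS= read -r -d '' marker; do
    rm -f -- "$marker" "${marker#.ctimes/}"
done
ls    # only new.log is left
```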
If you are on an ext4 filesystem, there is some hope: you can retrieve the creation time using the stat and debugfs utilities. ext4 stores it in the inode table entry i_crtime, which is 'File creation time, in seconds since the epoch' per the docs. Reference Link.

Script for deleting multiple directories based on directory name using shell or bash script

I am trying to write a shell script which removed the directories and its contents based on directory name instead of last modified time.
I have following directories in /tmp/ location
2015-05-25
2015-05-26
2015-05-27
2015-05-28
2015-05-29
2015-05-30
2015-05-31
Now I would like to delete all the directories till 2015-05-29. Last modified date is same for all the directories.
Can any one please suggest?
A straightforward but not flexible way (in bash) is:
rm -rf 2015-05-{25..29}
A more flexible way would involve some coding:
ls -d ./2015-* | sort | sed '/2015-06-02/,$d' | xargs rm -r
Lexically sort all the directories that follow the name pattern 2015-*
Use 'sed' to delete from the list every name from 2015-06-02 onward (inclusive) - those lines never reach rm, so those directories are kept
Use 'xargs' to pass the remaining names to rm -r for deletion
A simple solution is:
rm -r /tmp/2015-05-2?
If you want to keep 2 folders, try:
ls -d ./2015-* | sort | head -n -2 | xargs echo
Replace -2 with the negative number of folders to keep. Replace echo with rm -r when the output looks correct.
The intention of this question is to find directories whose names indicate a timestamp. Therefore I'd propose to calculate those timestamps to decide whether to delete or not:
ref=$(date --date 2015-05-29 +%s)
for d in ????-??-??; do [[ $(date --date "$d" +%s) -le $ref ]] && rm -rf "$d"; done
ref is the reference date, the names of the other directories are compared to this one.
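The comparison logic can be sanity-checked with fixed dates before pointing it at real directories (GNU date assumed, as in the answer above):

```shell
ref=$(date --date 2015-05-29 +%s)
older=$(date --date 2015-05-25 +%s)
newer=$(date --date 2015-05-30 +%s)
[ "$older" -le "$ref" ] && echo "2015-05-25 would be removed"
[ "$newer" -le "$ref" ] || echo "2015-05-30 would be kept"
```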

Unix command deleted every directory even though not specified

I am very new to the unix. I ran the following command.
ls -l | xargs rm -rf bark.*
and above command removed every directory in the folder.
Can any one explained me why ?
The -r argument means "delete recursively" (ie descend into subdirectories). The -f command means "force" (in other words, don't ask for confirmation). -rf means "descend recursively into subdirectories without asking for confirmation"
ls -l prints a long listing of the current directory. xargs splits whatever it reads on whitespace and appends the resulting words to the command you pass to it - not just the filenames, but also the permission strings, owners, sizes, and dates from the long listing.
The final command that got executed looked like this:
rm -rf bark.* <every word of the ls -l output, including all the filenames>
This essentially removed bark.* and all files in the current directory. Moral of the story: be very careful with rm -rf. (You can use rm -ri to ask before deleting files instead)
rm(1) deleted every file and directory in the current working directory because you asked it to.
To see roughly what happened, run this:
cd /etc ; ls -l | xargs echo
Pay careful attention to the output.
I strongly recommend using echo in place of rm -rf when constructing command lines. Only if the output looks fine should you then re-run the command with rm -rf. When in doubt, maybe just use rm -r so that you do not accidentally blow away too much. rm -ir if you are very skeptical of your command line. (I have been using Linux since 1994 and I still use this echo trick when constructing slightly complicated command lines to selectively delete a pile of files.)
Incidentally, I would avoid parsing ls(1) output in any fashion -- filenames can contain any character except ASCII NUL and / chars -- including newlines, tabs, and output that looks like ls -l output. Trying to parse this with tools such as xargs(1) can be dangerous.
Instead, use find(1) for these sorts of things. To delete all files in all directories named bark.*, I'd run a command like this:
find . -type d -name 'bark.*' -print0 | xargs -0 rm -r
Again, I'd use echo in place of rm -r for the first execution -- and if it looked fine, then I'd re-run with rm -r.
The ls -l command gave a list of all the subdirectories in your current present-working-directory (PWD).
The rm command can delete multiple files/directories if you pass them to it as a list.
eg: rm test1.txt test2.txt myApp will delete all three of the files with names:
test1.txt
test2.txt
myApp
Also, the flags for the rm command you used are common in many a folly.
rm -f - Force deletion of files without asking or confirming
rm -r - Recurse into all subdirectories and delete all their contents and subdirectories
So, let's say you are in /home/user, and the directory structure looks like so:
/home/user
|->dir1
|->dir2
`->file1.txt
the ls -l command will provide the list containing "dir1 dir2 file1.txt", and the result of the command ls -l | xargs rm -rf will look like this:
rm -rf dir1 dir2 file1.txt
If we expand your original question with the example above, the final command that gets passed to the system becomes:
rm -rf dir1 dir2 file1.txt bark.*
So, everything in the current directory gets wiped out, so the bark.* is redundant (you effectively told the machine to destroy everything in the current directory anyway).
I think what you meant to do was delete all files in the current directory and all subdirectories (recurse) that start with bark. To do that, you just have to do:
find . -iname 'bark.*' | xargs rm
The command above means "find all files in this directory and subdirectories, ignoring UPPERCASE/lowercase/mIxEdCaSe, that start with the characters "bark.", and delete them". This could still be a bad command if you have a typo, so to be sure, you should always test before you do a batch-deletion like this.
In the future, first do the following to get a list of all the files you will be deleting first to confirm they are the ones you want deleted.
find . -iname 'bark.*' | xargs echo
Then if you are sure, delete them via
find . -iname 'bark.*' | xargs rm
Hope this helps.
As a humorous note, one of the most famous instances of "rm -rf" can be found here:
https://github.com/MrMEEE/bumblebee-Old-and-abbandoned/commit/a047be85247755cdbe0acce6f1dafc8beb84f2ac
An automated script was meant to run something like rm -rf /usr/local/........., but due to an accidentally inserted space, the command became rm -rf /usr /local/......, which tells rm to delete the two separate paths /usr and /local/...... - wiping out all of /usr and effectively destroying the system of anyone who ran it. I feel bad for that developer.
You can avoid these kinds of bugs by quoting your strings, ie:
rm -rf "/usr/ local/...." would have provided an error message and avoided this bug, because the quotes mean that everything between them is the full path, NOT a list of separate paths/files (ie: you are telling rm that the file/folder has a SPACE character in its name).

Remove all files in a directory (do not touch any folders or anything within them)

I would like to know whether rm can remove all files within a directory (but not the subfolders or files within the subfolders)?
I know some people use:
rm -f /direcname/*.*
but this assumes the filename has an extension which not all do (I want all files - with or without an extension to be removed).
Although find allows you to delete files using -exec rm {} \; you can use
find /direcname -maxdepth 1 -type f -delete
and it is faster. Using -delete implies the -depth option, which means process directory contents before directory.
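A sandbox check of that command's behavior: only top-level files go; subdirectories and their contents survive:

```shell
tmp=$(mktemp -d)
touch "$tmp/a.txt" "$tmp/noext"          # files with and without a suffix
mkdir -p "$tmp/sub"; touch "$tmp/sub/keep.txt"
find "$tmp" -maxdepth 1 -type f -delete
ls -A "$tmp"        # prints: sub
ls "$tmp/sub"       # prints: keep.txt
```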
find /direcname -maxdepth 1 -type f -exec rm {} \;
Explanation:
find searches for files and directories within /direcname
-maxdepth restricts it to looking for files and directories that are direct children of /direcname
-type f restricts the search to files
-exec rm {} \; runs the command rm {} for each file (after substituting the file's path in place of {}).
I would like to know whether rm can remove all files within a directory (but not the subfolders or files within the subfolders)?
That's easy:
$ rm folder/*
Without the -r, the rm command won't touch sub-directories or the files they contain. This will only remove the files in folder and not the sub-directories or their files.
You will see errors telling you that folder/foo is a directory and cannot be removed, but that's actually okay with you. If you want to eliminate these messages, just redirect STDERR:
$ rm folder/* 2> /dev/null
By the way, the exit status of the rm command will be non-zero (because of the skipped directories), so you can't use it to check for real errors. If that's important, you'll have to loop:
$ for file in *
> do
>     if [[ -f $file ]]
>     then
>         rm "$file" || echo "Error in removing file '$file'"
>     fi
> done
This should work in BASH even if the file names have spaces in them.
You can use
find /direcname -maxdepth 1 -type f -exec rm -f {} \;
A shell solution (without the non-standard find -maxdepth) would be
for file in .* *; do
    test -f "$file" && rm "$file"
done
Some shells, notably zsh and perhaps bash version 4 (but not version 3), have a syntax to do that.
With zsh you might just type
rm /dir/path/*(.)
and if you would want to remove any file whose name starts with foo, recursively in subdirectories, you could do
rm /dir/path/**/foo*(.)
the double star feature is (with IMHO better interactive completion) in my opinion enough to switch to zsh for interactive shells. YMMV
The dot in parenthesis suffix indicates that you want only files (not symlinks or directories) to be expanded by the zsh shell.
Unix isn't DOS. There is no special "extension" field in a file name. Any characters after a dot are just part of the name and are called the suffix. There can be more than one suffix, for example .tar.gz. The shell glob character * matches across the . character; it is oblivious to suffixes. So the MS-DOS *.* is just * in Unix.
Almost. * does not match files which start with a .. Objects named with a leading dot are, by convention, "hidden". They do not show up in ls either unless you specify -a.
(This means that the . and .. directory entries for "self" and "parent" are considered hidden.)
To match hidden entries also, use .*
The rm command does not remove directories (when not operated recursively with -r).
Try rm <directory> and see. Even if the directory is empty, it will refuse.
So, the way you remove all (non-hidden) files, pipes, devices, sockets and symbolic links from a directory (but leave the subdirectories alone) is in fact:
rm /path/to/directory/*
to also remove the hidden ones which start with .:
rm /path/to/directory/{*,.*}
This syntax is brace expansion. Brace expansion is not pattern matching; it is just a short-hand for generating multiple arguments, in this case:
rm /path/to/directory/* /path/to/directory/.*
this expansion takes place first, and then globbing takes place to generate the names to be deleted. (Note that .* also matches the . and .. entries; rm refuses to remove those, so they just produce harmless error messages.)
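The two stages are easy to see with echo in a throwaway directory:

```shell
#!/bin/bash
tmp=$(mktemp -d)
touch "$tmp"/visible "$tmp"/.hidden
echo "$tmp"/{*,.*}
# prints something like: .../visible .../. .../.. .../.hidden
# note that . and .. are matched too; rm would refuse to remove them
```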
Note that various solutions posted here have various issues:
find /path/to/directory -type f -delete
# -delete is not Unix standard; GNU find extension
# without -maxdepth 1 this will recurse over all files
# -maxdepth is also a GNU extension
# -type f finds only files; so this neglects to delete symlinks, fifos, etc.
The GNU find solutions have the benefit that they work even if the number of directory entries to be deleted is huge: too large to pass in a single call to rm. Another benefit is that the built-in -delete does not have issues with passing funny path names to an external command.
The portable workaround for the problem of too many directory entries is to list the entries with ls and pipe to xargs:
( cd /path/to/directory ; ls -a | xargs rm -- )
The parentheses mean "do these commands in a sub-process"; this way the effect of the cd is forgotten, which is useful in scripting. ls -a includes the hidden files (it also lists the . and .. entries, but rm refuses to remove those). Filenames containing whitespace or newlines will still confuse xargs here, as noted elsewhere on this page.
We now include a -- after rm which means "this is the last option; everything else is a non-option argument". This guards us against directory entries whose names are indistinguishable from options. What if a file is called -rf and ends up the first argument? Then you have rm -rf ... which will blow off subdirectories.
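A safe demonstration of why the -- matters, using a throwaway directory containing a file literally named -rf:

```shell
tmp=$(mktemp -d); cd "$tmp"
touch -- '-rf' normal.txt
rm -- '-rf'       # without the --, rm would parse "-rf" as options
ls                # prints: normal.txt
```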
The easiest way to do this is to use:
rm *
In order to remove directories, you must specify the option -r
rm -r
so your directories and anything contained in them will not be removed by using
rm *
per the man page for rm, its purpose is to remove files, which is why this works

What is the safest way to empty a directory in *nix?

I'm scared that one day, I'm going to put a space or miss out something in the command I currently use:
rm -rf ./*
Is there a safer way of emptying the current directory's contents?
The safest way is to sit on your hands before pressing Enter.
That aside, you could create an alias like this one (for Bash)
alias rm="pwd;read;rm"
That will show you your directory, wait for an enter press and then remove what you specified with the proper flags. You can cancel by pressing ^C instead of Enter.
Here is a safer way: use ls first to list the files that will be affected, then use command-line history or history substitution to change the ls to rm and execute the command again after you are convinced the correct files will be operated on.
If you want to be really safe, you could create a simple alias or shell script like:
mv "$1" ~/.recycle/
This would just move your stuff to a .recycle folder (hello, Windows!).
Then set up a cron job to do rm -rf on stuff in that folder that is older than a week.
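A minimal sketch of that idea (the ~/.recycle location and the function name are made up here): move instead of delete, so mistakes stay recoverable until the cron sweep runs.

```shell
# Move arguments into a recycle folder instead of deleting them
recycle() {
    mkdir -p "$HOME/.recycle"
    mv -- "$@" "$HOME/.recycle/"
}
# Example cron sweep (e.g. daily), purging recycled items older than a week:
# find "$HOME/.recycle" -mindepth 1 -mtime +7 -exec rm -rf {} +
```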
I think this is a reasonable way:
find . -maxdepth 1 \! -name . -print0 | xargs -0 rm -rf
and it will also take care of hidden files and directories. The slash isn't required after the dot, and leaving it out also eliminates the possible accident of typing `. /` (with a space).
Now if you are worried what it will delete, just change it into
find . -maxdepth 1 \! -name . -print | less
And look at the list. Now you can put it into a function:
function enum_files { find . -maxdepth 1 \! -name . "$@"; }
And now your remove is safe:
enum_files | less # view the files
enum_files -print0 | xargs -0 rm -rf # remove the files
If you are not in the habit of having embedded newlines in filenames, you can omit the -print0 and -0 parameters. But I would use them, just in case :)
Go one level up and type in the directory name
rm -rf <dir>/*
I use one of:
rm -fr .
cd ..; rm -fr name-of-subdirectory
I'm seldom sufficiently attached to a directory that I want to get rid of the contents but must keep the directory itself.
When using rm -rf I almost always use the fully qualified path.
Use the trash command. In Debian/Ubuntu/etc., it can be installed from the package trash-cli. It works on both files and directories (since it's really moving the file, rather than immediately deleting it).
trash implements the freedesktop.org trash specification, compatible with the GNOME and KDE trash.
Files can be undeleted using restore-trash from the same package, or through the usual GUI.
You could always turn on -i which would prompt you on every file, but that would be really time consuming for large directories.
I always do a pwd first.
I'll even go as far as to create an alias so that it forces the prompt for my users. Red Hat does that by default, I think.
You could drop the `f' switch and it should prompt you for each file to make sure you really want to remove it.
If what you want to do is to blow away an entire directory there is always some level of danger associated with that operation. If you really want to be sure that you are doing the right thing you could always do a move operation to some place like /tmp, wait for some amount of time to make sure that everything is okay with the "deletion" in place. Then go into the /tmp directory and ONLY use relative paths for a forced and recursive remove operation. Additional, in the move do a rename to "delete-directoryname" to make it easier not to make a mistake.
For example I want to delete /opt/folder so I do:
mv /opt/folder /tmp/delete-folder
.... wait to be sure everything is okay - maybe a minute, maybe a week ....
cd /tmp
pwd
rm -rf delete-folder/
The most important tip for doing an rm -rf is to always use relative paths. This keeps you from ever having typed a / before having completed your typing.
There's a reason I have [tcsh]:
alias clean '\rm -i -- "#"* *~'
alias rmo 'rm -- *.o'
They were created the first time I accidentally put a space between the * and the .o. Suffice to say, what happened wasn't what I expected to happen...
But things could have been worse. Back in the early '90s, a friend of mine had a ~/etc directory. He wanted to delete it. Unfortunately he typed rm -rf /etc. Unfortunately, he was logged in as root. He had a bad day!
To be evil: touch -- '-rf *'
To be safe, use '--' and -i. Or get it right once and create an alias!
Here are the alias I am using on macOS. It would ask for every rm command before executing.
# ~/.profile
function saferm() {
    echo rm "$@"
    echo ""
    read -p "* execute rm (y/n)? : " yesorno
    if [ "$yesorno" == "y" ]; then
        /bin/rm "$@"
    fi
}
alias srm=saferm
alias rm=srm
