Delete the oldest five files with cron or launchctl

I'm currently backing up a project into a tarball. I'd only like to keep the five most recent backups. Currently, the script reads:
tar -cjf $HOME/projects/foo.$(date +%Y%m%d%H%M%S).tar.bz2 $HOME/projects/foo > /dev/null 2>&1
find $HOME/projects -maxdepth 1 -name "foo*.tar.bz2" | ghead -n -5 | xargs rm > /dev/null 2>&1
# CR and blank line
The tarballs are created, but the old ones are never removed. The odd thing is that when I copy and paste the second line into a shell, the files are deleted as expected, and the whole script works as expected when run manually from the command line. Is the script not reaching the second line, or are there rules about running commands via cron that I'm not aware of?
Mac OS X 10.8 with gnu-coreutils. Attempted under both cron and launchd, with the same results.

The culprit turned out to be cron's PATH: cron runs jobs with a minimal PATH, so ghead (installed by gnu-coreutils under /usr/local/bin) was never found. The following script now works when called from cron or launchd. Thanks to @ansh0l for the clue.
PATH=/usr/local/bin:/usr/local/sbin:/usr/bin:/bin:/usr/sbin:/sbin
cd $HOME/projects
tar -cjf foo.$(date +%Y%m%d%H%M%S).tar.bz2 foo
find ./ -maxdepth 1 -name "foo*.tar.bz2" | ghead -n -5 | xargs rm
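Alternatively, the PATH can be defined in the crontab itself rather than in the script. A sketch, with a made-up schedule and script path:
PATH=/usr/local/bin:/usr/local/sbin:/usr/bin:/bin:/usr/sbin:/sbin
30 2 * * * $HOME/bin/backup-foo.sh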

Related

How to use ls command output in rm for a particular directory

I want to delete the oldest files in a directory when the number of files is greater than 5. I'm using
(ls -1t | tail -n 3)
to get the oldest 3 files in the directory. This works exactly as I want. Now I want to delete them with a single rm command. As I'm running these commands on a Linux server, cd-ing into the directory and deleting is not working, so I need to use either find or ls together with rm to delete the oldest 3 files. Please help out.
Thanks :)
If you want to delete files from some arbitrary directory, pass the directory name to the ls command; the default is the current directory.
Then use $() command substitution to hand the result of tail to rm, like this:
rm $(ls -1t dirname | tail -n 3)
(Note that ls prints bare file names, so run this from inside dirname.)
rm $(ls -1t | tail -n 3) 2> /dev/null
ls may print a No such file or directory error message, and rm may then complain about an empty argument list or names it cannot resolve; the 2> /dev/null silences rm's errors.
With the help of the following answers: find - suppress "No such file or directory" errors and https://unix.stackexchange.com/a/140647/198423
find $dirname -type d -exec ls -1t {} + | tail -n 3 | xargs rm -rf
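Assuming reasonably recent GNU find and coreutils (needed for the -z flags), a sketch that avoids parsing ls entirely is to sort on modification time; $dirname and the count of 3 are carried over from above:
# delete the 3 oldest regular files directly under $dirname
find "$dirname" -maxdepth 1 -type f -printf '%T@ %p\0' | sort -z -n | head -z -n 3 | cut -z -d' ' -f2- | xargs -0 -r rm --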

Remove older backup from directory using shell command

In my shell script, I am creating a backup of my folder. This is scheduled via cronjob, and the schedule keeps changing.
Each backup is kept with a timestamp in its name, for example:
cd /tmp/BACKUP_DIR
backup_06-05-2014.tar
backup_06-08-2014.tar
backup_06-10-2014.tar
What I want is that whenever I run the script, it should keep only the latest backup and the one taken before it, and delete the remaining backups.
So if I run the script now, it should keep
backup_06-10-2014.tar
backup_06-18-2014.tar
And delete all the others. What rm command should I use?
Try as follows:
rm $(ls -1t /tmp/BACKUP_DIR | tail -n +3)
This lists the backups sorted newest first and removes everything except the two newest. Note that ls prints bare file names, so run it from inside /tmp/BACKUP_DIR (as your script already does with cd).
You could try deleting files older than 7 days using a find command, for example:
find /tmp/BACKUP_DIR -maxdepth 1 -type f -name "backup_*.tar" -mtime +6 -exec rm -f {} \;
Use
rm -rf `ls -lth backup_*.tar | awk '{print $NF}' | tail -n +3`
ls -lth backup_*.tar gives the list of backup files sorted newest first
awk '{print $NF}' prints just the file names (the last field) and passes them to tail
tail -n +3 prints from the third file onward, i.e. everything except the two newest
Finally, tail's output is fed to rm
Another, simplified method:
rm -rf `ls -1t backup_*.tar | tail -n +3`

Move files to directories based on extension

I am new to Linux. I am trying to write a shell script that will move files to certain folders based on their extension; for example, my downloads folder contains files of all sorts of mixed types. I have written the following script:
mv *.mp3 ../Music
mv *.ogg ../Music
mv *.wav ../Music
mv *.mp4 ../Videos
mv *.flv ../Videos
How can I make it run automatically when a file is added to this folder? Now I have to manually run the script each time.
One more question, is there any way of combining these 2 statements
mv *.mp3 ../../Music
mv *.ogg ../../Music
into a single statement? I tried using || (C programming 'or' operator) and comma but they don't seem to work.
There is no trigger for when a file is added to a directory. If the file is uploaded via a webpage, you might be able to make the webpage do it.
You can put a script in crontab to do this periodically on Unix machines (or Task Scheduler on Windows). Google crontab for a how-to.
As for combining your commands, use the following:
mv *.mp3 *.ogg ../../Music
You can include as many different "globs" (filenames with wildcards) as you like. The last thing should be the target directory.
Two ways:
find . -name '*mp3' -or -name '*ogg' -print | xargs -J% mv % ../../Music
find . -name '*mp3' -or -name '*ogg' -exec mv {} ../Music \;
The first uses a pipe and may run out of argument space, while the second may use too many forks and be slower (note that -J is a BSD xargs option, as found on macOS). But both will work.
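If GNU tools are available, a middle ground (a sketch, not taken from the answers above) is to let find batch the files itself with -exec ... + and use GNU mv's -t to name the target directory up front:
find . -maxdepth 1 \( -name '*.mp3' -o -name '*.ogg' \) -exec mv -t ../../Music {} +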
Another way is:
mv -v {*.mp3,*.ogg,*.wav} ../Music
mv -v {*.mp4,*.flv} ../Videos
PS: option -v shows what is going on (verbose).
I like this method:
#!/bin/bash
# Move every regular file in the current directory into a
# subdirectory named after its extension (e.g. song.mp3 -> mp3/).
for filename in *; do
    if [[ -f "$filename" ]]; then
        base=${filename%.*}           # name without the last extension
        ext=${filename#"$base".}      # everything after the last dot
        mkdir -p "${ext}"
        mv "$filename" "${ext}"
    fi
done
incron will watch the filesystem and run commands when certain events occur (for example, a file being created in a directory).
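A sketch of what an entry added with incrontab -e might look like (assumes incron is installed; the paths and the script name sort-downloads.sh are placeholders for your own):
/home/user/Downloads IN_CLOSE_WRITE,IN_MOVED_TO /home/user/bin/sort-downloads.sh
Here sort-downloads.sh would be the script from the question.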
You can combine multiple commands on a single line by using a command separator. The unconditional serialized command separator is ;.
command1 ; command2
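Applied to the two mv commands from the question, that gives:
mv *.mp3 ../../Music ; mv *.ogg ../../Music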
You can use for loop to traverse through folders and subfolders inside the source folder.
The following code will help you move files in pairs from "/source/folder/path/" to "/destination/folder/path/". It moves files that match on name but have different extensions.
for d in /source/folder/path/*; do
    ls -tr $d | grep txt | rev | cut -f 2 -d '.' | rev | uniq | head -n 4 | xargs -I % bash -c 'mv -v '$d'/%.{txt,csv} /destination/folder/path/'
    sleep 30
done

What is the best and fastest way to delete a large directory containing thousands of files (in Ubuntu)

As far as I know, commands like
find <dir> -type f -exec rm {} \;
are not the best choice for removing a huge number of files (total files, including those in subfolders). They work well if you have a small number of files, but if you have 10+ million files in subfolders, they can hang a server.
Does anyone know of any specific Linux commands to solve this problem?
It may seem strange but:
$ rm -rf <dir>
Here's an example bash script:
#!/bin/bash
LOCKFILE=/tmp/rmHugeNumberOfFiles.lock
# this process gets ultra-low priority
ionice -c2 -n7 -p $$ > /dev/null
if [ $? -ne 0 ]; then
    echo "Could not set disk IO priority. Exiting..."
    exit
fi
renice +19 -p $$ > /dev/null
if [ $? -ne 0 ]; then
    echo "Could not renice process. Exiting..."
    exit
fi
# check if there's an instance running already. If so -- exit
if [ -e ${LOCKFILE} ] && kill -0 `cat ${LOCKFILE}`; then
    echo "An instance of this script is already running."
    exit
fi
# make sure the lockfile is removed when we exit. Then: claim the lock
trap "command rm -f -- $LOCKFILE; exit" INT TERM EXIT
echo $$ > $LOCKFILE
# also create a tempfile, and make sure that's removed too upon exit
tmp=$(tempfile) || exit
trap "command rm -f -- '$tmp' $LOCKFILE; exit" INT TERM EXIT
# ----------------------------------------
# option 1
# ----------------------------------------
# find your specific files, then delete them in one xargs/rm pass
find "$1" -type f [INSERT SPECIFIC SEARCH PATTERN HERE] > "$tmp"
xargs rm < "$tmp"
# ----------------------------------------
# option 2
# ----------------------------------------
command rm -r "$1"
# remove the lockfile, tempfile
command rm -f -- "$tmp" $LOCKFILE
This script starts by setting its own process priority and diskIO priority to very low values, to ensure other running processes are as unaffected as possible.
Then it makes sure that it is the ONLY such process running.
The core of the script is really up to your preference. You can use rm -r if you are sure that the whole directory can be deleted indiscriminately (option 2), or you can use find for more specific file deletion (option 1, possibly using command line options "$2" and onward for convenience).
In the implementation above, option 1 (find) first writes everything to a tempfile, so that rm is invoked only a few times via xargs instead of once per file found by find. When the number of files is indeed huge, this can amount to a significant time saving. On the downside, the size of the tempfile may become an issue, but this is only likely if you're deleting literally billions of files. Also, because the disk IO has such low priority, using a tempfile followed by a single xargs/rm pass may in total be slower than the find (...) -exec rm {} \; option. As always, you should experiment a bit to see what best fits your needs.
EDIT: As suggested by user946850, you can also skip the tempfile entirely and use find (...) -print0 | xargs -0 rm. This has a larger memory footprint, since all full paths to all matching files will be held in RAM until the find command is completely finished. On the upside, there is no additional file IO due to writes to the tempfile. Which one to choose depends on your use case.
The -r (recursive) switch removes everything below a directory, too -- including subdirectories. (Your command does not remove the directories, only the files.)
You can also speed up the find approach:
find -type f -print0 | xargs -0 rm
I tried every one of these commands, but the problem I had was that the deletion process was locking the disk, and since no other processes could access it, there was a big pileup of processes trying to access the disk, making the problem worse. Run iotop to see how much disk IO your process is using.
Here's the Python script that solved my problem. It deletes 500 files at a time, then takes a 2-second break to let the other processes do their business, then continues.
import os, os.path
import time

for root, dirs, files in os.walk('/dir/to/delete/files'):
    i = 0
    file_num = 0
    for f in files:
        fullpath = os.path.join(root, f)
        i = i + 1
        file_num = file_num + 1
        os.remove(fullpath)
        if i % 500 == 1:
            time.sleep(2)
            print "Deleted %i files" % file_num
Hope this helps some people.
If you need to deal with a space-limit issue on a very large file tree (in my case many Perforce branches), where the find-and-delete process sometimes hangs:
Here's a script that I schedule daily to find all directories containing a specific file ("ChangesLog.txt"),
then sort the directories found that are older than 2 days, and remove the first matched directory (on each scheduled run there could be a new match):
bash -c "echo #echo Creating Cleanup_Branch.cmd on %COMPUTERNAME% - %~dp0 > Cleanup_Branch.cmd"
bash -c "echo -n 'bash -c \"find ' >> Cleanup_Branch.cmd"
rm -f dirToDelete.txt
rem cd. > dirToDelete.txt
bash -c "find .. -maxdepth 9 -regex ".+ChangesLog.txt" -exec echo {} >> dirToDelete.txt \; & pid=$!; sleep 100; kill $pid "
sed -e 's/\(.*\)\/.*/\1/' -e 's/^./"&/;s/.$/&" /' dirToDelete.txt | tr '\n' ' ' >> Cleanup_Branch.cmd
bash -c "echo -n '-maxdepth 0 -type d -mtime +2 | xargs -r ls -trd | head -n1 | xargs -t rm -Rf' >> Cleanup_Branch.cmd"
bash -c 'echo -n \" >> Cleanup_Branch.cmd'
call Cleanup_Branch.cmd
Note the requirements:
Deleting only those directories that contain "ChangesLog.txt", since other old directories should not be deleted.
Calling the OS commands from cygwin directly, since otherwise Windows' default commands would be used.
Collecting the directories to delete into an external text file, in order to save the find results, since the find process sometimes hangs.
Setting a timeout on the find process by running it as a background process (&) that is killed after 100 seconds (a simpler alternative is sketched after this list).
Sorting the directories oldest first, to set the delete priority.
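As that simpler alternative, a sketch assuming GNU coreutils' timeout is available in the cygwin installation, the background-and-kill construct could be replaced with:
timeout 100 find .. -maxdepth 9 -name "ChangesLog.txt" >> dirToDelete.txt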
If you have a reasonably modern version of find (4.2.3 or greater) you can use the -delete flag.
find <dir> -type f -delete
If you have version 4.2.12 or greater you can take advantage of xargs style command line stacking via the \+ -exec modifier. This way you don't run a separate copy of /bin/rm for every file.
find <dir> -type f -exec rm {} \+
The previous commands are good.
rm -rf directory/ also works fast for billions of files in one folder. I tried that.
If you would like to delete tons of files as soon as possible, try this:
find . -type f -print0 | xargs -P 0 -0 rm -f
Note that the -P 0 option makes xargs use as many processes as possible.
mv large_folder /tmp/.
sudo reboot
The call to mv is fast: it just renames the directory entry (as long as /tmp is on the same filesystem). The system reboot will then clear the /tmp folder (remounting it) in the fastest way possible.
You can create an empty directory and rsync it to the directory which you need to empty.
You will avoid timeout and out-of-memory issues.
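A minimal sketch of that trick (directory names are placeholders):
mkdir empty_dir
rsync -a --delete empty_dir/ directory_to_empty/
The trailing slashes matter: rsync syncs the (empty) contents of empty_dir into directory_to_empty and deletes everything that is not in the source.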

How can I use FIND to recursively backup multiple subversion repositories

At the moment our backup script explicitly runs svnadmin hotcopy on each of our repositories every night. Our repos are all stored under a parent directory (/usr/local/svn/repos)
Our backup script has a line for each of the repos under that directory along the lines of:
svnadmin hotcopy /usr/local/svn/repos/myrepo1 /usr/local/backup/myrepo1
Instead of having to manually add a new line for every new repo we bring online, I was hoping to use the find command to run svnadmin hotcopy for every directory it finds under /usr/local/svn/repos.
So far I've got:
find /usr/local/svn/repos/ -maxdepth 1 -mindepth 1 -type d -exec echo /usr/local/backup{} \;
where I'm replacing "svnadmin hotcopy" with "echo" for simplicity's sake.
The output of which is:
/usr/local/backup/usr/local/svn/repos/ure
/usr/local/backup/usr/local/svn/repos/cheetah
/usr/local/backup/usr/local/svn/repos/casemgt
/usr/local/backup/usr/local/svn/repos/royalliver
/usr/local/backup/usr/local/svn/repos/ure_andras
/usr/local/backup/usr/local/svn/repos/profserv
/usr/local/backup/usr/local/svn/repos/frontoffice
/usr/local/backup/usr/local/svn/repos/ure.orig
/usr/local/backup/usr/local/svn/repos/projectcommon
/usr/local/backup/usr/local/svn/repos/playground
/usr/local/backup/usr/local/svn/repos/casegen
The problem is that the full path is included in {}; I need only the last element of the path to be passed to -exec.
The output I want is:
/usr/local/backup/ure
/usr/local/backup/cheetah
/usr/local/backup/casemgt
/usr/local/backup/royalliver
/usr/local/backup/ure_andras
/usr/local/backup/profserv
/usr/local/backup/frontoffice
/usr/local/backup/ure.orig
/usr/local/backup/projectcommon
/usr/local/backup/playground
/usr/local/backup/casegen
I'm pretty much stuck at this point. Can anyone help me out here?
Thanks in advance,
Dave
You were on the right track. Try this:
find /usr/local/svn/repos/ -maxdepth 1 -mindepth 1 -type d -printf "%f\0" | xargs -0 -I{} echo svnadmin hotcopy /usr/local/svn/repos/\{\} /usr/local/backup/\{\}
The %f is like basename and the null plus the -0 on xargs ensures that names with spaces, etc., get passed through successfully.
Just remove the echo and make any adjustments you might need and it should do the trick.
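An alternative sketch in plain shell (assuming bash and the directory layout from the question), without xargs:
for repo in /usr/local/svn/repos/*/; do
    name=$(basename "$repo")
    svnadmin hotcopy "$repo" "/usr/local/backup/$name"
done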
Put a cut command at the end:
find /usr/local/svn/repos/ -maxdepth 1 -mindepth 1 -type d -exec echo /usr/local/backup{} \; | cut -f1-4,9 -d"/"
How about adding a sed filter cutting out the middle part?
sed 's/usr.local.svn.repos.//g'
Added like this
find /usr/local/svn/repos/ -maxdepth 1 -mindepth 1 -type d -exec echo /usr/local/backup{} ";" | sed 's/usr.local.svn.repos.//g'
ls -al /usr/local/svn/repos/ | grep '^d' | sed "s/^...............................................................//" | xargs -L 1 -I zzyggy echo /usr/local/svn/repos/zzyggy
It's a bit long but it does the trick. You don't have to do everything with find when there are lots of other shell commands, although if I had to write this kind of script, I would do it in Python and leave the shell for interactive work.
ls -al lists all the files in the named directory with attributes
grep '^d' selects the lines beginning with d which are directories
sed strips off all the characters to the left of the actual directory name. You may need to add or delete some dots
xargs takes the list of directory names and passes them to the executed command one at a time. I specified zzyggy as the placeholder to substitute in the executed command, but you can choose what you like. Of course, you would replace echo with your svnadmin command.
If it was in a shell script you should really do this
SVNDIRNAME="/usr/local/svn/repos"
ls -al $SVNDIRNAME | grep '^d' | sed "s/^...............................................................//" | xargs -L 1 -I zzyggy echo $SVNDIRNAME/zzyggy
but I decided to show the wrong and right way just to explain this point. I'm going to tag this with some shell tag, but I still think that a Python script is a superior way to solve this kind of problem in the 21st century.
