How do I keep the latest 8 backup files and delete the older ones - Linux

How do I keep the latest 8 backup files and delete the older ones? The files are named like:
backup-Y-M-D.zip
backup-Y-M-D.zip
backup-Y-M-D.zip
backup-Y-M-D.zip
.
.
backup-Y-M-D.zip
There are about 80 files with the .zip extension; all I want to do is keep the latest 8, based on the date they were created. I also tried logrotate, but it failed to rotate anything. Below is my logrotate config file:
/root/test/*.zip {
daily
missingok
extension .zip
rotate 4
nocompress
}

If the naming convention is guaranteed, you could just rely on the alphabetical ordering of the files when expanding a glob pattern to get the oldest or newest files. According to Filename Expansion:
After word splitting, unless the -f option has been set (see The Set Builtin), Bash scans each word for the characters ‘*’, ‘?’, and ‘[’. If one of these characters appears, and is not quoted, then the word is regarded as a pattern, and replaced with an alphabetically sorted list of filenames matching the pattern (see Pattern Matching).
Demo:
[user@hostname]$ touch backup-2022-06-14.zip backup-2022-06-13.zip backup-2021-07-04.zip
[user@hostname]$ echo *
backup-2021-07-04.zip backup-2022-06-13.zip backup-2022-06-14.zip
You can leverage this to get a list of files other than the last N elements:
[user@hostname]$ all_files=(*)
[user@hostname]$ old_files=( "${all_files[@]:0:${#all_files[@]}-1}" ) # change -1 to -8 if you want to keep the 8 newest
[user@hostname]$ echo "${old_files[@]}"
backup-2021-07-04.zip backup-2022-06-13.zip
And then do whatever you want with that list, such as remove it with rm "${old_files[@]}".
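Put together for the question's layout, here is a minimal sketch of that approach (assuming the backups live in /root/test, as in the logrotate config, and that the names really do sort chronologically):
#!/usr/bin/env bash
# Keep only the 8 newest backup-*.zip files; the date in the name makes
# alphabetical glob order the same as chronological order.
cd /root/test || exit 1
all_files=( backup-*.zip )
keep=8
if (( ${#all_files[@]} > keep )); then
    old_files=( "${all_files[@]:0:${#all_files[@]}-keep}" )
    rm -- "${old_files[@]}"
fi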

One way to do this is with the following one-liner, run from the directory where the backups are located:
ls -t | tail -n +9 | xargs --no-run-if-empty rm
Explanation:
ls -t - lists all the files in order from newest to oldest
tail -n +9 - outputs everything from the 9th line onward, i.e. skips the 8 newest files
xargs --no-run-if-empty rm - deletes the selected files if there are any, preventing errors when you have 8 or fewer files
If you want to set this up to run automatically every day, giving you peace of mind in case your server is offline one day and misses a run, run crontab -e and add the following to your jobs:
0 0 * * * cd yourDirNameHere && ls -t | tail -n +9 | xargs --no-run-if-empty rm
The cleanup will then run every night at midnight.
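If the backup names could ever contain spaces or newlines, a NUL-delimited variant of the same idea avoids parsing ls entirely. This is only a sketch, assuming GNU find and a reasonably recent GNU coreutils (for the -z/--zero-terminated options):
find /root/test -maxdepth 1 -name 'backup-*.zip' -printf '%T@\t%p\0' |
    sort -z -rn |     # newest first, by the numeric mtime field
    tail -z -n +9 |   # skip the 8 newest
    cut -z -f2- |     # drop the mtime, keep the path
    xargs -0 --no-run-if-empty rm --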

Related

How to delete folders except two with linux

I have many directories of backup starting with "backup_".
I want to keep only the two last created folders.
I did this command to show the last two created:
ls -1 -t -d */ | head -2
The problem is I don't know how to exclude the result of that command from the remove command (rm -rf).
I know grep -v only works with strings.
In general, xargs is the tool you want to use to pass a generated list of names as arguments to a command. In your case, you just need to invert the head -2 to a command that prints everything except the first 2 lines. eg:
cmd-to-generate-file-list | sed -e 1,2d | xargs rm
The sed will delete the first two lines, and xargs will call rm with each line of output as an argument. Note that it is not generally safe to use ls to generate the file list, but that is a different issue entirely.
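Applied to the listing from the question, that might look like the following (still with the caveat about parsing ls, so this sketch assumes the directory names contain no whitespace):
ls -1td -- backup_*/ | sed -e 1,2d | xargs rm -rf --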
A zsh specific approach:
setopt extended_glob # Turn on extended globbing if it's not already enabled
dirs=( backup_*(#q/om) ) # Match only directories, sorted by modification time - newest first
rm -rf "${dirs[#]:2}" # Delete all but the first two elements of that array of directory names
See the documentation for more on zsh glob qualifiers like the above uses. They can make things with filenames that are tedious or difficult to do in other shell dialects trivial.
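Before running the rm -rf for real, it can be worth previewing the selection, e.g.:
printf '%s\n' "${dirs[@]:2}"   # dry run: list what would be removed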

Quickly list random set of files in directory in Linux

Question:
I am looking for a performant, concise way to list N randomly selected files in a Linux directory using only Bash. The files must be randomly selected from different subdirectories.
Why I'm asking:
In Linux, I often want to test a random selection of files in a directory for some property. The directories contain 1000's of files, so I only want to test a small number of them, but I want to take them from different subdirectories in the directory of interest.
The following returns the paths of 50 "randomly"-selected files:
find /dir/of/interest/ -type f | sort -R | head -n 50
The directory contains many files, and resides on a mounted file system with slow read times (accessed through ssh), so the command can take many minutes. I believe the issue is that the first find command finds every file (slow), and only then prints a random selection.
If you are using locate and updatedb updates regularly (daily is probably the default), you could:
$ locate /home/james/test | sort -R | head -5
/home/james/test/10kfiles/out_708.txt
/home/james/test/10kfiles/out_9637.txt
/home/james/test/compr/bar
/home/james/test/10kfiles/out_3788.txt
/home/james/test/test
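If GNU shuf is available, it can also replace the sort -R | head pair and draw the sample directly, which avoids shuffling the entire list just to keep a few lines:
$ locate /home/james/test | shuf -n 5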
How often do you need it? Do the work periodically in advance to have it quickly available when you need it.
Create a refreshList script.
#!/usr/bin/env bash
find /dir/of/interest/ -type f | sort -R | head -n 50 >/tmp/rand.list
mv -f /tmp/rand.list ~
Put it in your crontab.
0 7-20 * * 1-5 nice -25 ~/refreshList
Then you will always have a ~/rand.list that's under an hour old.
If you don't want to use cron and aren't too picky about how old it is, just write a function that refreshes the file after you use it every time.
randFiles() {
    cat ~/rand.list            # serve the cached list immediately
    {                          # then refresh it in the background for next time
        find /dir/of/interest/ -type f |
            sort -R | head -n 50 >/tmp/rand.list
        mv -f /tmp/rand.list ~
    } &
}
If you can't run locate and the find command is too slow, is there any reason this has to be done in real time?
Would it be possible to use cron to dump the output of the find command into a file and then do the random pick out of there?

Keep newest x amount of files delete rest bash

I have this bash script as a crontab running every hour. I want to keep the latest 1,000 images in a folder, deleting the oldest files. I don't want to delete by mtime, because if no new files are being uploaded I want to keep what is there; it's fine whether an image is 1 day or 50 days old. I just want that when image 1,001 (the newest) is uploaded, image 1 (the oldest) is deleted, cycling through the folder to keep a static count of 1,000 images.
This works. However, by the time it executes each hour there could be 1,200 files, and running the crontab every minute seems like overkill. Can I make it execute automatically once the folder hits 1,001 images? Basically I want the folder to be self-scanning: keep the newest 1,000 images and delete the oldest ones.
#!/bin/sh
cd /folder/to/execute; ls -t | sed -e '1,1000d' | xargs -d '\n' rm
keep=10 # set this to how many files you want to keep
discard=$(expr $keep - $(ls|wc -l))
if [ $discard -lt 0 ]; then
ls -bt | tail $discard | tr '\n' '\0' | xargs -0 printf "%b\0" | xargs -0 rm --
fi
This first calculates the number of files to delete, then safely passes them to rm. It intentionally uses a negative number, since that conveniently works as the argument to tail: with 14 files and keep=10, for example, discard is -4, so tail $discard becomes tail -4, the last 4 entries of the newest-first listing, i.e. the 4 oldest files.
The use of tr and xargs -0 is to ensure that this works even if file names contain spaces. The printf "%b" bit re-expands the escapes produced by ls -b, to handle file names containing newlines.
EDIT: added -- to rm args to be safe if any of the files to be deleted start with a hyphen.
Try the following script. It first checks the file count in the current directory and then, if the count is greater than 1000, works out the difference and picks out that many of the oldest files.
#!/bin/bash
count=$(ls -1 | wc -l)
if [ $count -gt 1000 ]
then
    difference=$((count - 1000))
    dirnames=$(ls -t | tail -n $difference)
    arr=($dirnames)
    for i in "${arr[@]}"
    do
        echo $i
    done
fi

Linux/Perl Returning list of folders that have not been modified for over x minutes

I have a directory that has multiple folders. I want to get a list of the names of folders that have not been modified in the last 60 minutes.
The folders will have multiple files that will remain old so I can't use -mmin +60
I was thinking I could do something with inverse though. Get a list of files that have been modified in 60 minutes -mmin -60 and then output the inverse of this list.
Not sure how to go about doing that, or if there is a simpler way to do it?
Eventually I will take these list of folders in a perl script and will add them to a list or something.
This is what I have so far to get the list of folders
find /path/to/file -mmin -60 | sed 's/\/path\/to\/file\///' | cut -d "/" -f1 | uniq
Above will give me just the names of the folders that have been updated.
There is a neat trick to do set operations on text lines like this with sort and uniq. You already have the paths that have been updated; assume they are in a file called upd. A simple find -type d can give you all folders; let's assume you have them in a file called all. Then run
cat all upd | sort | uniq -c | grep '^ *1 '
All paths that appear in both files will be prefixed with a count of 2. All paths appearing only in file all will be prefixed with a 1 (note that uniq -c pads the count with leading spaces, hence the pattern). The lines prefixed with 1 represent the set difference between all and upd, i.e. the paths that were not touched. (I take it you are able to remove the prefix yourself.)
Surely this can be done with perl or any other scripting language, but this simple sort | uniq is just too nice. :-)
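Put together for the layout in the question, a sketch (assuming GNU find, that the parent directory is /path/to/file as in the question, that it contains only folders at the top level so upd is a subset of all, and that folder names contain no whitespace):
cd /path/to/file || exit 1
# every first-level folder name
find . -mindepth 1 -maxdepth 1 -type d -printf '%f\n' | sort > /tmp/all
# first path component of anything modified in the last 60 minutes
find . -mindepth 1 -mmin -60 -printf '%P\n' | cut -d/ -f1 | sort -u > /tmp/upd
# names seen only once appear only in "all": folders with no recent changes
cat /tmp/all /tmp/upd | sort | uniq -c | awk '$1 == 1 { print $2 }'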
The diff command is made for this.
Given two files, "all":
# cat all
/dir1/old2
/dir2/old4
/dir2/old5
/dir1/new1
/dir2/old2
/dir2/old3
/dir1/old1
/dir1/old3
/dir1/new4
/dir2/new1
/dir2/old1
/dir1/new2
/dir2/new2
and "updated":
# cat updated
/dir2/new1
/dir1/new4
/dir2/new2
/dir1/new2
/dir1/new1
We can sort the files and run diff. For this task, I prefer inline sorting:
# diff <(sort all) <(sort updated)
4,6d3
< /dir1/old1
< /dir1/old2
< /dir1/old3
9,13d5
< /dir2/old1
< /dir2/old2
< /dir2/old3
< /dir2/old4
< /dir2/old5
If there are any files in "updated" that aren't in "all", they'll be prefixed with '>'.
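If you only want the plain list of paths that are in "all" but not in "updated", without diff's markers, comm produces it directly (both inputs must be sorted, as above):
# comm -23 <(sort all) <(sort updated)
Here -2 suppresses lines unique to "updated" and -3 suppresses lines common to both, leaving only the paths that were not touched.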

Clearing archive files with linux bash script

Here is my problem,
I have a folder where is stored multiple files with a specific format:
Name_of_file.TypeMM-DD-YYYY-HH:MM
where MM-DD-YYYY-HH:MM is the time of its creation. There could be multiple files with the same name but not the same time of course.
What i want is a script that can keep the 3 newest version of each file.
So, I found one example there:
Deleting oldest files with shell
But I don't want to delete a fixed number of files; I want to keep a certain number of the newest ones. Is there a way to make that find command parse out the Name_of_file and keep the 3 newest?
Here is the code I've tried yet, but it's not exactly what I need.
find /the/folder -type f -name 'Name_of_file.Type*' -mtime +3 -delete
Thanks for help!
So I decided to add my final solution, in case anyone would like to have it. It's a combination of the two solutions given.
ls -r | grep -P "(.+)\d{4}-\d{2}-\d{2}-\d{2}:\d{2}" | awk 'NR > 3' | xargs rm
One line, super efficient. If anything changes in the date or name pattern, just change the grep -P pattern to match it. This way you are sure that only the files fitting this pattern will get deleted.
Can you be extra, extra sure that the timestamp on the file is the exact same timestamp on the file name? If they're off a bit, do you care?
The ls command can sort files by timestamp order. You could do something like this:
$ ls -t | awk 'NR > 3' | xargs rm
The ls -t lists the files by modification time, with the newest first.
The awk 'NR > 3' prints out the list of files except for the first three lines, which are the three newest.
The xargs rm will remove the files that are older than the first three.
Now, this isn't the exact solution. There are possible problems with xargs because file names might contain weird characters or whitespace. If you can guarantee that's not the case, this should be okay.
Also, you probably want to group the files by name, and keep the last three. Hmm...
ls | sed 's/MM-DD-YYYY-HH:MM*$//' | sort -u | while read file
do
ls -t $file* | awk 'NR > 3' | xargs rm
done
The ls will list all of the files in the directory. The sed 's/MM-DD-YYYY-HH:MM$//' will remove the date-time stamp from the file names. The sort -u will make sure you only have the unique file names. Thus
file1.txt-01-12-1950
file2.txt-02-12-1978
file2.txt-03-12-1991
Will be reduced to just:
file1.txt
file2.txt
These are fed through the loop, and the ls -t $file* lists all of the files that start with that file name, pipes the list to awk to strip out the newest three, and pipes the rest to xargs rm, which deletes all but the newest three.
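For the record, with the actual timestamp suffix from the question spelled out instead of the placeholder, the loop might look like this (a sketch, assuming sed -E is available and the suffix is exactly MM-DD-YYYY-HH:MM):
ls | sed -E 's/[0-9]{2}-[0-9]{2}-[0-9]{4}-[0-9]{2}:[0-9]{2}$//' | sort -u | while read -r file
do
    ls -t "$file"* | awk 'NR > 3' | xargs rm
done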
Assuming we're using the date in the filename to date the archive file, and that it is possible to change the date format to YYYY-MM-DD-HH:MM (as established in comments above), here's a quick and dirty shell script to keep the newest 3 versions of each file within the present working directory:
#!/bin/bash
KEEP=3 # number of versions to keep
while read FNAME; do
    NODATE=${FNAME:0:-16}                 # get filename without the date (remove last 16 chars)
    if [ "$NODATE" != "$LASTSEEN" ]; then # new file found
        FOUND=1; LASTSEEN="$NODATE"
    else                                  # same file, different date
        let FOUND="FOUND + 1"
        if [ $FOUND -gt $KEEP ]; then
            echo "- Deleting older file: $FNAME"
            rm "$FNAME"
        fi
    fi
done < <(\ls -r | grep -P "(.+)\d{4}-\d{2}-\d{2}-\d{2}:\d{2}")
Example run:
[me@home]$ ls
another_file.txt2011-02-11-08:05
another_file.txt2012-12-09-23:13
delete_old.sh
not_an_archive.jpg
some_file.exe2011-12-12-12:11
some_file.exe2012-01-11-23:11
some_file.exe2012-12-10-00:11
some_file.exe2013-03-01-23:11
some_file.exe2013-03-01-23:12
[me@home]$ ./delete_old.sh
- Deleting older file: some_file.exe2012-01-11-23:11
- Deleting older file: some_file.exe2011-12-12-12:11
[me@home]$ ls
another_file.txt2011-02-11-08:05
another_file.txt2012-12-09-23:13
delete_old.sh
not_an_archive.jpg
some_file.exe2012-12-10-00:11
some_file.exe2013-03-01-23:11
some_file.exe2013-03-01-23:12
Essentially, by changing the date portion of the file names to the form YYYY-MM-DD-HH:MM, a normal string sort (such as the one done by ls) automatically groups similar files together, sorted by date and time.
The ls -r on the last line simply lists all files within the current working directory and prints the results in reverse order, so newer archive files appear first.
We pass the output through grep to extract only files that are in the correct format.
The output of that command combination is then looped through (see the while loop) and we can simply start deleting after 3 occurrences of the same filename (minus the date portion).
This pipeline will get you the 3 newest files (by modification time) in the current dir
stat -c $'%Y\t%n' file* | sort -n | tail -3 | cut -f 2-
To get all but the 3 newest:
stat -c $'%Y\t%n' file* | sort -rn | tail -n +4 | cut -f 2-
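To feed that straight into rm (a sketch, assuming GNU xargs and that the names contain no tabs or newlines):
stat -c $'%Y\t%n' file* | sort -rn | tail -n +4 | cut -f 2- | xargs -d '\n' --no-run-if-empty rm --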
