Linux: Search for old files, copy the oldest ones to a location, verify the copy, then delete them

I need help with file handling on Raspbian Stretch Lite running on a Raspberry Pi Zero (fresh install, updated).
The following script is run periodically as a cron job:
partition=/dev/root
imagedir=/etc/opt/kerberosio/capture/
if [[ $(df -h | grep "$partition" | head -1 | awk '{ print $5 }' | tr -d '%') -gt 90 ]];
then
    echo "Cleaning disk"
    find "$imagedir" -type f | sort | head -n 100 | xargs -r rm -f
fi
Essentially, when the SD card is more than 90% full, the oldest 100 files in a directory are deleted.
I want to add some functionality:
1) Copy the 100 oldest files to a NAS drive mounted on the file system and
2) Verify successful copy and
3) Delete the files that were copied.
I have found the following command, which may be helpful in modifying the script above:
find /data/machinery/capture/ -type f -name '*.*' -mtime +1 -exec mv {} /data/machinery/nas/ \;
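One way to extend the script, sketched under a few assumptions: the NAS is already mounted at /data/machinery/nas (the path from the command above), the oldest files are selected by modification time, and a byte-for-byte cmp serves as the copy verification. Filenames containing newlines are not handled. This is a sketch, not a drop-in replacement:
#!/bin/bash
# Sketch: move the 100 oldest capture files to the NAS, verifying each copy first.
imagedir=/etc/opt/kerberosio/capture
nasdir=/data/machinery/nas     # assumed mount point of the NAS

find "$imagedir" -type f -printf '%T@ %p\n' | sort -n | head -n 100 | cut -d' ' -f2- |
while IFS= read -r file; do
    if cp -p "$file" "$nasdir/" && cmp -s "$file" "$nasdir/${file##*/}"; then
        rm -f "$file"        # copy verified byte-for-byte: safe to delete the original
    else
        echo "copy of $file failed verification; keeping original" >&2
    fi
done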

Related

Delete files older than 3 days if disk usage is over 85 percent in bash

I'm working on a bash script that runs df -h, loops through the disk-usage fields and prints the numbers using awk, and, if any disk's usage reaches over 85 percent, finds the files older than 3 days within the log path held in a variable and removes them.
However upon trying to run the script, it constantly complains that the command was not found on line 6.
This is the code that I'm working on
files =$(find /files/logs -type f -mtime +3 -name '*.log')
process =$(df-h | awk '{print $5+0}')
for i in process
do
if $i -ge 85
then
for k in $files
do
rm -rf $k
done
fi
done;
It's so irritating because I feel that I'm so close to the solution, and yet I still can't figure out what's wrong with the script that makes it refuse to work.
You are searching for files in /files/logs, so you are probably only interested in that partition (the root partition?)
Your if statement did not respect the proper syntax... it should be if [[ x -gt y ]]; then....
There was no need to loop through the files collected by find, since you can use -exec directly in find (see man find)
#!/bin/bash
# find the partition that contains the log files (this example uses the root partition)
PARTITION="$(mount | grep "on / type" | awk '{print $1}')"
# find the percentage used of the partition
PARTITION_USAGE_PERCENT="$(df -h | grep "$PARTITION" | awk '{print $5}' | sed 's/%//g')"
# if the partition is too full...
if [[ "$PARTITION_USAGE_PERCENT" -gt 85 ]]; then
    # run the command to delete the extra files
    printf "%s%s%s\n" "Disk usage is at " "$PARTITION_USAGE_PERCENT" "% deleting log files..."
    find /files/logs -type f -mtime +3 -name '*.log' -exec rm {} \;
fi
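As an aside, if GNU coreutils is available, df can report the usage percentage directly, which avoids the grep/sed pipeline entirely; a minimal sketch, assuming the logs live on the root filesystem:
# --output=pcent prints only the Use% column; tr strips the header's letters and the % sign
usage=$(df --output=pcent / | tail -n 1 | tr -dc '0-9')
if [ "$usage" -gt 85 ]; then
    find /files/logs -type f -mtime +3 -name '*.log' -exec rm {} \;
fi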

Linux: Copy the name of the newest folder and paste it into a command

I would like to find the newest sub directory in a directory and save the result to variable in bash.
Something like this:
ls -t /backups | head -1 > $BACKUPDIR
Can anyone help?
BACKUPDIR=$(ls -td /backups/*/ | head -1)
$(...) evaluates the statement in a subshell and returns the output.
There is a simple solution to this using only ls:
BACKUPDIR=$(ls -td /backups/*/ | head -1)
-t orders by time (latest first)
-d lists the directories themselves, not their contents
*/ only matches directories
head -1 returns the first item
I didn't know about */ until I found Listing only directories using ls in bash: An examination.
This is a pure Bash solution:
topdir=/backups
BACKUPDIR=
# Handle subdirectories beginning with '.', and empty $topdir
shopt -s dotglob nullglob
for file in "$topdir"/* ; do
    [[ -L $file || ! -d $file ]] && continue
    [[ -z $BACKUPDIR || $file -nt $BACKUPDIR ]] && BACKUPDIR=$file
done
printf 'BACKUPDIR=%q\n' "$BACKUPDIR"
It skips symlinks, including symlinks to directories, which may or may not be the right thing to do. It skips other non-directories. It handles directories whose names contain any characters, including newlines and leading dots.
Well, I think this solution is the most efficient:
path="/my/dir/structure/*"
backupdir=$(find $path -type d -prune | tail -n 1)
Explanation why this is a little better:
We do not need sub-shells (aside from the one for getting the result into the bash variable).
We do not need a useless -exec ls -d at the end of the find command, it already prints the directory listing.
We can easily alter this, e.g. to exclude certain patterns. For example, if you want the second newest directory, because backup files are first written to a tmp dir in the same path:
backupdir=$(find $path -type d -prune -not -name "*temp_dir" | tail -n 1)
The ls-based solution above doesn't take into account files being written to or removed from the directory, which can result in one of those files being returned instead of the newest subdirectory.
The other issue is that this solution assumes the directory contains only other directories and no files.
Let's say I create a file called "test.txt" and then run this command again:
echo "test" > test.txt
ls -t /backups | head -1
test.txt
The result is test.txt showing up instead of the last modified directory.
The proposed solution "works" but only in the best case scenario.
Assuming you have a maximum of 1 directory depth, a better solution is to use:
find /backups/* -type d -prune -exec ls -d {} \; | tail -1
Just swap the "/backups/" portion for your actual path.
If you want to avoid showing an absolute path in a bash script, you could always use something like this:
LOCALPATH=/backups
DIRECTORY=$(cd $LOCALPATH; find * -type d -prune -exec ls -d {} \; | tail -1)
With GNU find you can get a list of directories with their modification timestamps, sort that list and output the newest:
find . -mindepth 1 -maxdepth 1 -type d -printf "%T@\t%p\0" | sort -z -n | cut -z -f2- | tail -z -n1
or newline separated
find . -mindepth 1 -maxdepth 1 -type d -printf "%T@\t%p\n" | sort -n | cut -f2- | tail -n1
With POSIX find (that does not have -printf) you may, if you have it, run stat to get file modification timestamp:
find . -mindepth 1 -maxdepth 1 -type d -exec stat -c '%Y %n' {} \; | sort -n | cut -d' ' -f2- | tail -n1
Without stat, a pure shell solution may be used by replacing the [[ bash extension with [, as in this answer.
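For illustration, a minimal POSIX-sh sketch of that idea (note that the -nt test is a widely supported extension of the shell's test builtin rather than strict POSIX; /backups is the path from the question):
#!/bin/sh
# Find the newest subdirectory of /backups using only the shell.
BACKUPDIR=
for dir in /backups/*/; do
    [ -d "$dir" ] || continue
    if [ -z "$BACKUPDIR" ] || [ "$dir" -nt "$BACKUPDIR" ]; then
        BACKUPDIR=$dir
    fi
done
echo "$BACKUPDIR"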
Your "something like this" was almost a hit:
BACKUPDIR=$(ls -t ./backups | head -1)
Combining what you wrote with what I have learned solved my problem too. Thank you for raising this question.
Note: I ran the line above from Git Bash within a Windows environment, in a file called ./something.bash.

Delete all files except the latest 10 in subdirectories

I have some backup files from machines which store their backups in different folders. Additionally, the files are not created at the same time (machine 1: every Sunday, machine 2: every first Monday of the month, etc.).
I need to keep the latest 10 files in each folder and delete all the others. Because of the different backup intervals I can't just delete all files older than x days.
The folder-structure is like this:
./<SystemType>/<FQDN_Machine1>/backup_2015_09_08_02_00_00.zip
./<SystemType>/<FQDN_Machine2>/backup_2015_09_01_14_00_00.zip
IFS='
'
for i in dir/*; do
    ls -d1t "$i"/* | head -n -10
done | xargs rm
List all files in each subdirectory except the latest ten, and send them via xargs to rm.
This is my solution:
#!/bin/bash
# For each subdirectory, delete all but the 10 newest files.
find . -mindepth 1 -type d | while IFS= read -r dir
do
    find "$dir" -maxdepth 1 -type f -printf '%T@ %p\n' | sort -n | head -n -10 | cut -d' ' -f2- | while IFS= read -r file
    do
        rm -f "$file"
    done
done

Delete all files in a directory, except those listed matching specific criteria

I need to automate a clean-up of a Linux-based FTP server that only holds backup files.
In our "/var/DATA" directory is a collection of directories. Any directory here used for backup begins with "DEV". In each "DEVxxx*" directory are the actual backup files, plus any user files that may have been needed in the course of maintenance on these devices.
We only want to retain the following files - anything else found in these "DEVxxx*" directories is to be deleted:
The newest two backups: ls -t1 | grep -Em2 '^[[:digit:]]{6}_Config'
The newest backup done on the first of the month: ls -t1 | grep -Em1 '^[[:digit:]]{4}01_Config'
Any file that was modified less than 30 days ago: find -mtime -30
Our good configuration file: ls verification_cfg
Anything that doesn't match the above should be deleted.
How can we script this?
I'm guessing a BASH script can do this, and that we can create a cron job to run daily to perform the task.
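For the cron part, a hypothetical crontab entry that would run such a cleanup script daily at 02:00 (the script name and path /usr/local/bin/clean_dev_dirs.sh are assumptions, not from the question):
# m h dom mon dow command
0 2 * * * /usr/local/bin/clean_dev_dirs.sh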
Something like this perhaps?
{ ls -t1 | grep -Em2 '^[[:digit:]]{6}_Config' ;
  ls -t1 | grep -Em1 '^[[:digit:]]{4}01_Config' ;
  find -mtime -30 ;
  ls -1 verification_cfg ;
} | rsync -a --include-from=- --exclude='*' /var/DATA/ /var/DATA.bak/
rm -rf /var/DATA
mv /var/DATA.bak /var/DATA
For what it's worth, here is the bash script I created to accomplish my task. Comments are welcome.
#!/bin/bash
# This script follows these rules:
#
# - Only process directories beginning with "DEV"
# - Do not process directories within the device directory
# - Keep files that match the following criteria:
# - Keep the two newest automated backups
# - Keep the six newest automated backups generated on the first of the month
# - Keep any file that is less than 30 days old
# - Keep the file "verification_cfg"
#
# - An automated backup file is identified as eight digits, followed by "_Config"
# e.g. 20120329_Config
# Remember the current directory
CurDir=`pwd`

# FTP home directory
DatDir='/var/DATA/'
cd $DatDir

# Only process directories beginning with "DEV"
for i in `find . -maxdepth 1 -type d | egrep '\./DEV' | sort` ; do
    cd $DatDir
    echo Doing "$i"
    cd $i

    # Set the GROUP EXECUTE bit on all files
    find . -type f -exec chmod g+x {} \;

    # Find the two newest automated config backups
    for j in `ls -t1 | egrep -m2 ^[0-9]{8}_Config$` ; do
        chmod g-x $j
    done

    # Find the six newest automated config backups generated on the first of the month
    for j in `ls -t1 | egrep -m6 ^[0-9]{6}01_Config$` ; do
        chmod g-x $j
    done

    # Find all files that are less than 30 days old
    for j in `find -mtime -30 -type f` ; do
        chmod g-x $j
    done

    # Find the "verification_cfg" file
    for j in `find -name verification_cfg` ; do
        chmod g-x $j
    done

    # Remove any files that still have the GROUP EXECUTE bit set
    find . -type f -perm -g=x -exec rm -f {} \;
done

# Back to the user's current directory
cd $CurDir

Shell script to count files, then remove oldest files

I am new to shell scripting, so I need some help here. I have a directory that fills up with backups. If I have more than 10 backup files, I would like to remove the oldest files, so that the 10 newest backup files are the only ones that are left.
So far, I know how to count the files, which seems easy enough, but how do I then remove the oldest files, if the count is over 10?
if [ $(ls /backups | wc -l) -gt 10 ]
then
    echo "More than 10"
fi
Try this:
ls -t | sed -e '1,10d' | xargs -d '\n' rm
This should handle all characters (except newlines) in a file name.
What's going on here?
ls -t lists all files in the current directory in decreasing order of modification time, i.e. the most recently modified files come first, one file name per line.
sed -e '1,10d' deletes the first 10 lines, i.e. the 10 newest files. I use this instead of tail because I can never remember whether I need tail -n +10 or tail -n +11.
xargs -d '\n' rm collects each input line (without the terminating newline) and passes each line as an argument to rm.
As with anything of this sort, please experiment in a safe place.
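For example, one safe way to preview what would be removed is to substitute echo rm for rm, which prints the command instead of running it:
ls -t | sed -e '1,10d' | xargs -d '\n' echo rm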
find is the common tool for this kind of task:
find ./my_dir -mtime +10 -type f -delete
EXPLANATIONS
./my_dir your directory (replace with your own)
-mtime +10 older than 10 days
-type f only files
-delete no surprise. Remove it to test your find filter before executing the whole command
And take care that ./my_dir exists, to avoid bad surprises!
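Following that advice, the dry run is simply the same command without -delete, which lists the deletion candidates:
find ./my_dir -mtime +10 -type f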
Make sure your pwd is the correct directory to delete the files, then (assuming only regular characters in the filenames):
ls -A1t | tail -n +11 | xargs rm
keeps the newest 10 files. I use this with the camera program 'motion' to keep the most recent frame-grab files. Thanks to all preceding answers, because you showed me how to do it.
The proper way to do this type of thing is with logrotate.
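For reference, a minimal logrotate stanza in that spirit (the path and rotation count here are illustrative assumptions, not taken from the question):
/backups/backup.log {
    daily          # rotate once per day
    rotate 10      # keep the 10 most recent rotations, delete older ones
    missingok      # don't error if the file is absent
    notifempty     # skip rotation when the file is empty
    compress       # gzip rotated copies
}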
I like the answers from @Dennis Williamson and @Dale Hagglund. (+1 to each)
Here's another way to do it using find (with the -newer test) that is similar to what you started with.
This was done in bash on cygwin...
if [[ $(ls /backups | wc -l) -gt 10 ]]
then
    find /backups ! -newer "/backups/$(ls -t /backups | sed '11!d')" -type f -exec rm {} \;
fi
Straightforward file counter:
max=12
n=0
ls -1t *.dat |
while read file; do
    n=$((n+1))
    if [[ $n -gt $max ]]; then
        rm -f "$file"
    fi
done
I just found this topic, and the solution from mikecolley helped me as a first step. As I needed a solution for a single-line Homematic (RaspberryMatic) script, I ran into the problem that this command only gave me the filenames and not the whole path, which is needed for rm. My CUxD Exec command cannot start in a selected folder.
So here is my solution:
ls -A1t $(find /media/usb0/backup/ -type f -name 'homematic-raspi*.sbk') | tail -n +11 | xargs rm
Explaining:
find /media/usb0/backup/ -type f -name 'homematic-raspi*.sbk' searches only for files (-type f) named like homematic-raspi*.sbk in the folder /media/usb0/backup/; -name is case sensitive, use -iname for case insensitive
ls -A1t $(...) lists the files given by find, excluding entries starting with "." or ".." (-A), sorted by mtime (-t), one per line (-1)
tail -n +11 skips the first 10 lines (the 10 newest files), so only line 11 onward is passed to rm
xargs rm finally removes the remaining files in the list
Maybe this saves others some searching and makes the solution more flexible.
stat -c "%Y %n" * | sort -rn | head -n +10 | \
cut -d ' ' -f 1 --complement | xargs -d '\n' rm
Breakdown: Get last-modified times for each file (in the format "time filename"), sort them from oldest to newest, keep all but the last ten entries, and then keep all but the first field (keep only the filename portion).
Edit: Using cut instead of awk since the latter is not always available
Edit 2: Now handles filenames with spaces
On a very limited chroot environment, we had only a couple of programs available to achieve what was initially asked. We solved it this way:
MIN_FILES=5
FILE_COUNT=$(ls -l | grep -c ^d )
if [ $MIN_FILES -lt $FILE_COUNT ]; then
    while [ $MIN_FILES -lt $FILE_COUNT ]; do
        FILE_COUNT=$((FILE_COUNT-1))
        FILE_TO_DEL=$(ls -t | tail -n1)
        # be careful with this one
        rm -rf "$FILE_TO_DEL"
    done
fi
Explanation:
FILE_COUNT=$(ls -l | grep -c ^d ) counts all directories in the current folder (the lines of ls -l that start with d). Instead of grep -c we could also use wc -l, but wc was not installed on that host.
FILE_COUNT=$((FILE_COUNT-1)) decrements the current $FILE_COUNT
FILE_TO_DEL=$(ls -t | tail -n1) Save the oldest file name in the $FILE_TO_DEL variable. tail -n1 returns the last element in the list.
Based on others' suggestions and some awk foo, I got this to work. I know this is an old thread, but I didn't find a decent answer here, and this sorted it for me. This just deletes the oldest file, but you can change head -n 1 to head -n 10 to get the oldest 10.
find $DIR -type f -printf '%T+ %p\n' | sort | head -n 1 | awk '{ $1 = ""; sub(/^ /, ""); print }' | xargs -d '\n' rm
Using inode numbers via stat & find command (to avoid pesky-chars-in-file-name issues):
stat -f "%m %i" * | sort -rn -k 1,1 | tail -n +11 | cut -d " " -f 2 | \
xargs -n 1 -I '{}' find "$(pwd)" -type f -inum '{}' -print
#stat -f "%m %i" * | sort -rn -k 1,1 | tail -n +11 | cut -d " " -f 2 | \
# xargs -n 1 -I '{}' find "$(pwd)" -type f -inum '{}' -delete
