Remove all files with a common prefix except the N latest modified - linux

I am trying to create a bash command/script to remove all files in a directory that are older than X days and start with a certain substring.
For example, if our directory contains the files
-rw-r--r-- 1 root root 0 Jun 30 10:22 foo_5
-rw-r--r-- 1 root root 0 Jun 29 10:22 bar_4
-rw-r--r-- 1 root root 0 Jun 29 10:22 foo_4
-rw-r--r-- 1 root root 0 Jun 28 10:22 bar_3
-rw-r--r-- 1 root root 0 Jun 28 10:22 foo_3
-rw-r--r-- 1 root root 0 Jun 27 10:22 bar_2
-rw-r--r-- 1 root root 0 Jun 27 10:22 foo_2
-rw-r--r-- 1 root root 0 Jun 26 10:22 foo_1
we want to delete all foo* files except the 2 most recent ones. This will result in the directory
-rw-r--r-- 1 root root 0 Jun 30 10:22 foo_5
-rw-r--r-- 1 root root 0 Jun 29 10:22 bar_4
-rw-r--r-- 1 root root 0 Jun 29 10:22 foo_4
-rw-r--r-- 1 root root 0 Jun 28 10:22 bar_3
-rw-r--r-- 1 root root 0 Jun 27 10:22 bar_2
I am currently only able to delete all files except the 2 most recent, which will affect bar* files.
ls -t | tail -n +4 | xargs rm --
How can we also restrict our deletion to files that start with a certain string?
Code to create test files
(
touch -d "6 days ago" foo_5
touch -d "7 days ago" foo_4
touch -d "7 days ago" bar_4
touch -d "8 days ago" foo_3
touch -d "8 days ago" bar_3
touch -d "9 days ago" foo_2
touch -d "9 days ago" bar_2
touch -d "10 days ago" foo_1
)

Parsing the output of ls is not a good idea. Using tools from GNU coreutils and findutils packages, a fail-safe program to achieve this task can be written as below.
n=2 # except the last two
find -maxdepth 1 -type f -name 'foo*' \
-printf '%T@\t%p\0' \
| sort -z -k 1n,1 \
| head -z -n -$n \
| cut -z -f 2- \
| xargs -0 rm
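The same pipeline can be wrapped in a function that only lists its candidates unless told to delete, which makes it easier to check the selection first. This is a minimal sketch, assuming GNU find and a coreutils version recent enough to support the -z options; the function name prune_keep_newest and its --delete switch are made up for this example.

```shell
#!/usr/bin/env bash
# prune_keep_newest DIR PREFIX N [--delete]   (hypothetical helper)
# Selects regular files in DIR whose names start with PREFIX, skips the
# N most recently modified, and prints the rest (or removes them with --delete).
# Note: PREFIX ends up inside a find -name pattern, so glob characters in it
# are interpreted as wildcards.
prune_keep_newest() {
    local dir=$1 prefix=$2 keep=$3 mode=${4:-}
    # Emit "mtime<TAB>path" records, NUL-terminated, sorted newest first,
    # then drop the first N records and strip the timestamp column.
    find "$dir" -maxdepth 1 -type f -name "${prefix}*" -printf '%T@\t%p\0' |
        sort -z -k1,1nr |
        tail -z -n +"$((keep + 1))" |
        cut -z -f2- |
        if [ "$mode" = "--delete" ]; then
            xargs -0 -r rm --
        else
            xargs -0 -r printf '%s\n'
        fi
}
```

Calling prune_keep_newest . foo 2 prints the files that would be removed; adding --delete as a fourth argument removes them.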

This is a job for stat
stat -c '%Y %n' foo* | sort -n | head -n -2 | cut -d " " -f 2- | xargs echo rm
rm foo_1 foo_2 foo_3
Remove "echo" if it is selecting the right files to delete.

Using perl and glob() (this handles file names with newlines or spaces as well) in a single process:
perl -e '
my @files = sort { -M $a <=> -M $b } grep -f, <./foo*>;
unlink @files[2..$#files]
'

Related

Find regular expression matching condition

I have a set of files including a date in their name:
MERRA2_400.tavg1_2d_slv_Nx.20151229.SUB.nc
MERRA2_400.tavg1_2d_slv_Nx.20151230.SUB.nc
MERRA2_400.tavg1_2d_slv_Nx.20151231.SUB.nc
I want to select the files matching a condition on this date. In this example: date > 20151230
I tried things like:
find . -regex ".*.SUB.nc" | cut -d "." -f 4 | while read a; do if [ $a -ge 20151201 ]; then echo $a; fi; done
BUT:
1) This is returning only a part of the filename, whereas I would like to return the entire filename.
2) There may be a more elegant way than using while read/do
thanks in advance!
Rearranged, your code becomes:
#!/usr/bin/env bash
find . -regex ".*.SUB.nc" \
| rev | cut -d '.' -f 3 | rev \
| while read a; do
if [ $a -ge 20151201 ]; then
echo $a
fi
done
rev | cut -d '.' -f 3 | rev is used because
if you give an absolute path or
the subdirectories have a . in their names,
then the date won't be the 4th field, but it will always be the 3rd field from the end.
This will give the output:
20151231
20151229
20151230
To show the complete file names, replace echo $a with ls *$a*. Output:
MERRA2_400.tavg1_2d_slv_Nx.20151231.SUB.nc
MERRA2_400.tavg1_2d_slv_Nx.20151229.SUB.nc
MERRA2_400.tavg1_2d_slv_Nx.20151230.SUB.nc
I tested this script with file names whose dates are less than 20151201. For example MERRA2_400.tavg1_2d_slv_Nx.20151200.SUB.nc. The results are consistent.
Perhaps a more efficient way to accomplish your task is to use a grep regex like:
find . -regex ".*.SUB.nc" | grep -E "201512(0[1-9]|[1-9][0-9])|201[6-9][0-9]{4}"
This prints the full path of each file dated from 20151201 through 2019; later years need the pattern extended.
find . -regex ".*.SUB.nc" | rev | cut -d '.' -f 3 | rev | while read a; do if [ $a -ge 20151201 ]; then echo `ls -R | grep $a` ;fi ;done
rev | cut -d '.' -f 3 | rev is used because
if you give an absolute path or
the subdirectories have a . in their names,
then the date won't be the 4th field, but it will always be the 3rd field from the end.
ls -R | grep $a is used so that the file name is found recursively.
Assume the files and directory structure are:
[root@localhost temp]# ls -lrt -R
.:
total 8
-rw-r--r--. 1 root root 0 Apr 25 16:15 MERRA2_400.tavg1_2d_slv_Nx.20151231.SUB.nc
-rw-r--r--. 1 root root 0 Apr 25 16:15 MERRA2_400.tavg1_2d_slv_Nx.20151230.SUB.nc
-rw-r--r--. 1 root root 0 Apr 25 16:15 MERRA2_400.tavg1_2d_slv_Nx.20151229.SUB.nc
drwxr-xr-x. 2 root root 4096 Apr 25 16:32 temp.3
drwxr-xr-x. 3 root root 4096 Apr 25 17:13 temp2
./temp.3:
total 0
./temp2:
total 4
-rw-r--r--. 1 root root 0 Apr 25 16:27 MERRA2_400.tavg1_2d_slv_Nx.20151111.SUB.nc
-rw-r--r--. 1 root root 0 Apr 25 16:27 MERRA2_400.tavg1_2d_slv_Nx.20151222.SUB.nc
drwxr-xr-x. 2 root root 4096 Apr 25 17:13 temp21
./temp2/temp21:
total 0
-rw-r--r--. 1 root root 0 Apr 25 17:13 MERRA2_400.tavg1_2d_slv_Nx.20151333.SUB.nc
Running the above command gives:
MERRA2_400.tavg1_2d_slv_Nx.20151229.SUB.nc
MERRA2_400.tavg1_2d_slv_Nx.20151231.SUB.nc
MERRA2_400.tavg1_2d_slv_Nx.20151230.SUB.nc
MERRA2_400.tavg1_2d_slv_Nx.20151333.SUB.nc
MERRA2_400.tavg1_2d_slv_Nx.20151222.SUB.nc
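Both concerns in the question (getting the whole file name back, and avoiding the while read/cut chain over partial names) can be handled by comparing the date field numerically while keeping the full path. A minimal sketch, assuming bash, GNU or BSD find, and that every name ends in .YYYYMMDD.SUB.nc; the function name files_on_or_after is invented for this example.

```shell
#!/usr/bin/env bash
# files_on_or_after DIR MIN_DATE   (hypothetical helper)
# Prints the full path of every *.SUB.nc file under DIR whose date field
# (the dot-separated field just before ".SUB.nc") is >= MIN_DATE.
files_on_or_after() {
    local dir=$1 min=$2 path name date
    find "$dir" -type f -name '*.SUB.nc' -print0 |
        while IFS= read -r -d '' path; do
            name=${path##*/}        # strip leading directories
            date=${name%.SUB.nc}    # drop the fixed suffix
            date=${date##*.}        # keep what follows the last remaining dot
            # Skip names whose date field is not exactly eight digits.
            [[ $date =~ ^[0-9]{8}$ ]] || continue
            # 10# forces base 10 so a leading zero is not read as octal.
            if (( 10#$date >= 10#$min )); then
                printf '%s\n' "$path"
            fi
        done
}
```

For example, files_on_or_after . 20151201 lists the matching files with their paths, regardless of dots in directory names.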

grep - a simple issue with end of line

ls -alp $base/$currentDir | awk '{print $9}' | grep '/' | egrep -v '^t|^tz$|^html$|^\.'
I have this grep and I am trying to ignore matches with "t" "tz" or "html" full names of directories.
All is good except that ^html$ does not match, while ^html does match, same for ^tz$ not matching -- somehow the $ is not being recognized as end of line. ^ is fine as start of line.
I really want to know the answer to the above, and secondarily, is there a different way to get list of all subdirectories in a given directory?
I found ls -d but that does not take directory parameter?:
ls -d * /
/ arch index.html
that works fine
but unsuccessful tries:
abc> ls -d * /
/ arch index.html
abc> ls -d ../../arizona /
../../arizona /
abc> ls -d ../../arizona
../../arizona
abc> ls -d '../../arizona'
../../arizona
abc> ls -d '../../arizona' /
../../arizona /
while this is the layout
abc> ls -alp ../../arizona | grep '/'
drwxr-xr-x 7 roberto007 inetuser 4096 Jan 26 11:16 ./
drwxr-xr-x 205 roberto007 inetuser 28672 Mar 10 11:07 ../
drwxr-xr-x 3 roberto007 inetuser 4096 Jan 26 11:17 grand-canyon/
drwxr-xr-x 3 roberto007 inetuser 4096 Jan 26 11:16 havasu-falls/
drwxr-xr-x 2 roberto007 inetuser 28672 Feb 27 2014 html/
drwxr-xr-x 4 roberto007 inetuser 4096 Jan 26 11:17 sedona/
drwxr-xr-x 3 roberto007 inetuser 4096 Jan 26 11:16 superstitions/
This should work:
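cd "$base/$currentDir"
printf '%s\n' */ | egrep -v '^t|^tz/$|^html/$'
or, without changing the current directory of your shell (a subshell is used so that the printed names carry no path prefix and the ^ anchors still apply):
(cd "$base/$currentDir" && printf '%s\n' */) | egrep -v '^t|^tz/$|^html/$'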
*/ lists only directories
printf '%s\n' puts a newline after each directory
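egrep does what you want; there is no need to filter out ./ because hidden directories are not expanded by */. Note that every name ends with a slash, so the anchored patterns are written ^tz/$ and ^html/$. This is also why ^tz$ and ^html$ never matched in the original command: ls -p appends the same trailing slash, so the line ends in / rather than in the directory name.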
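For the secondary question (listing the subdirectories of an arbitrary directory while skipping a few exact names), the glob can be combined with plain string comparison, which sidesteps the anchoring problem entirely. A sketch, assuming bash; list_subdirs is a hypothetical name.

```shell
#!/usr/bin/env bash
# list_subdirs DIR [NAME...]   (hypothetical helper)
# Prints the name of each non-hidden subdirectory of DIR, one per line,
# skipping any whose full name equals one of the NAME arguments.
list_subdirs() {
    local dir=$1 path name ex skip
    shift
    for path in "$dir"/*/; do
        # If the glob matched nothing it stays literal; skip that case.
        if [ ! -d "$path" ]; then continue; fi
        name=${path%/}      # drop the trailing slash
        name=${name##*/}    # drop the leading directories
        skip=0
        for ex in "$@"; do
            if [ "$name" = "$ex" ]; then skip=1; fi
        done
        if [ "$skip" -eq 0 ]; then printf '%s\n' "$name"; fi
    done
}
```

With the layout from the question, list_subdirs ../../arizona t tz html would print every subdirectory except those named exactly t, tz or html.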

How to tar the n most recent files

I am trying to create a script that, for each directory in ~/folder, compresses only the n most recent files.
However, I am having trouble with file names that contain several words. I need a way to wrap them in quote marks so the tar command knows which is which.
Here is my script so far:
#!/bin/bash
if [ ! -d ~/backup ]; then
mkdir ~/backup
fi
cd ~/folder
for i in *; do
if [ -d "$i" ]; then
original=`pwd`
cd $i
echo tar zcf ~/backup/"$i".tar.gz "`ls -t | head -10`"
cd $original
fi
done
echo "Backup copied in $HOME/backup/"
exit 0
if [ ! -d ~/backup ]; then
mkdir ~/backup
fi
You can simplify this to:
[[ ! -d ~/backup ]] && mkdir ~/backup
Now, to answer your question:
$ ls -t|head -10
file with spaces
file
test.txt
test
test.sh
$ lstFiles=""; while read; do lstFiles="$lstFiles \"$REPLY\""; done <<< "$(ls -t|head -10)"
$ echo $lstFiles
"file with spaces" "file" "test.txt" "test" "test.sh"
See how to read a command output or file content with a loop in Bash for more details.
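There are several workarounds if you want to stick to one-liners. The simplest is probably to use 'tr' to replace each space with the ? wildcard; the command substitution is left unquoted so that the shell expands the wildcards:
echo tar zcf ~/backup/"$i".tar.gz $(ls -t | head -10 | tr ' ' '?')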
-rw-rw-r-- 1 dale dale 35 Apr 6 09:11 test 1_dummy.txt
-rw-rw-r-- 1 dale dale 35 Apr 6 09:11 test 2_dummy.txt
-rw-rw-r-- 1 dale dale 35 Apr 6 09:11 test 3_dummy.txt
-rw-rw-r-- 1 dale dale 35 Apr 6 09:11 test 4_dummy.txt
-rw-rw-r-- 1 dale dale 35 Apr 6 09:11 test 5_dummy.txt
-rw-rw-r-- 1 dale dale 35 Apr 6 09:11 test 6_dummy.txt
-rw-rw-r-- 1 dale dale 35 Apr 6 09:11 test 7_dummy.txt
-rw-rw-r-- 1 dale dale 35 Apr 6 09:11 test 8_dummy.txt
-rw-rw-r-- 1 dale dale 35 Apr 6 09:11 test 9_dummy.txt
-rw-rw-r-- 1 dale dale 35 Apr 6 09:11 test 10_dummy.txt
-rw-rw-r-- 1 dale dale 35 Apr 6 09:11 test 11_dummy.txt
$ tar cvf TEST.tar $(ls -t | head -5 | tr ' ' '?')
test 11_dummy.txt
test 10_dummy.txt
test 9_dummy.txt
test 8_dummy.txt
test 7_dummy.txt
Another option might be to redirect to a file and then use '-T':
ls -t | head > /tmp/10tarfiles.txt
echo tar zcf ~/backup/"$i".tar.gz -T /tmp/10tarfiles.txt
rm /tmp/10tarfiles.txt
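If file names may contain spaces or even newlines, a NUL-delimited list avoids the quoting problem altogether. A sketch, assuming GNU find, coreutils and tar; backup_newest is a made-up name, and the archive path should be absolute because the function changes directory in a subshell.

```shell
#!/usr/bin/env bash
# backup_newest SRC_DIR ARCHIVE [N]   (hypothetical helper)
# Archives the N (default 10) most recently modified regular files in SRC_DIR.
backup_newest() {
    local src=$1 archive=$2 n=${3:-10}
    (
        cd "$src" || exit 1
        # "mtime<TAB>name" records, NUL-terminated, newest first; keep the
        # first N, strip the timestamp, and let tar read the names from stdin.
        find . -maxdepth 1 -type f -printf '%T@\t%P\0' |
            sort -z -k1,1nr |
            head -z -n "$n" |
            cut -z -f2- |
            tar -czf "$archive" --null -T -
    )
}
```

Inside the loop from the question, the echo tar line could then become backup_newest "$HOME/folder/$i" "$HOME/backup/$i.tar.gz" 10.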

backup script in shell

I am new to shell scripting. Could you please suggest how to write a backup shell script? I have data in the following format in the target directory.
StoreID_date_time.zip
Like:
-rw------- 1 rupesh ldapusers 8267310 Mar 22 12:00 44_22032014_115629.zip
-rw------- 1 rupesh ldapusers 8269938 Mar 22 12:07 44_22032014_120013.zip
-rw------- 1 rupesh ldapusers 8267110 Mar 22 12:14 44_22032014_120704.zip
-rw------- 1 rupesh ldapusers 8254223 Mar 22 14:25 45_22032014_142155.zip
-rw------- 1 rupesh ldapusers 7871060 Mar 22 12:11 48_22032014_120813.zip
-rw------- 1 rupesh ldapusers 8314418 Mar 22 12:22 48_22032014_121038.zip
-rw------- 1 rupesh ldapusers 8254699 Mar 24 12:13 49_22032014_145338.zip
Now I want to back up the files in the following way:
Backup directory : /backup/date/storeid/zip files of that store
like:
/backup/22032014/44/44_22032014_115629.zip,44_22032014_120013.zip...so on
/backup/22032014/45/45_22032014_142155.zip
/backup/22032014/48/48_22032014_120813.zip,48_22032014_121038.zip
/backup/22032014/49/49_22032014_145338.zip
for the next day: /backup/23032014/respective_storeIDfolder&files
Please give some hints or a code example so I can move forward.
I have coded this in the bare minimum of steps, without real error checking, but I verified that it works with some dummy files I created on my box :)
#!/bin/bash
for i in $(find * -type f -iname '*.zip')
do
    echo "Zip file : $i"
    store_id=$(echo "$i" | cut -d "_" -f 1)
    timestamp=$(echo "$i" | cut -d "_" -f 2)
    echo "Store id = ${store_id}"
    # I am assuming all these directories will follow the same naming pattern. Otherwise, add a numeric check below.
    mkdir -p "/backup/${timestamp}/${store_id}"
    cp -f "$i" "/backup/${timestamp}/${store_id}/"
done
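The numeric check mentioned in the comment, and safe handling of unusual file names, could look like the following. This is only a sketch, assuming bash and the StoreID_date_time.zip naming from the question; the function name sort_into_backup and its two arguments are invented here so that the backup root is not hard-coded.

```shell
#!/usr/bin/env bash
# sort_into_backup SRC_DIR BACKUP_ROOT   (hypothetical helper)
# Copies each StoreID_date_time.zip in SRC_DIR to BACKUP_ROOT/date/StoreID/.
sort_into_backup() {
    local src=$1 root=$2 f name store date
    for f in "$src"/*_*_*.zip; do
        # If the glob matched nothing it stays literal; skip that case.
        if [ ! -f "$f" ]; then continue; fi
        name=${f##*/}       # file name without directories
        store=${name%%_*}   # text before the first underscore
        date=${name#*_}     # drop "StoreID_"
        date=${date%%_*}    # keep text up to the next underscore
        # Only copy when the store ID is numeric and the date has 8 digits.
        if [[ $store =~ ^[0-9]+$ && $date =~ ^[0-9]{8}$ ]]; then
            mkdir -p "$root/$date/$store"
            cp -p -- "$f" "$root/$date/$store/"
        fi
    done
}
```

A call such as sort_into_backup . /backup would reproduce the layout asked for, skipping any zip file that does not follow the naming scheme.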

Tail latest file that matches a selected rule

I have a directory that multiple processes log to, and I want to tail the latest file of a selected process.
In ~/.bashrc I have added the following
function __taillog {
tail -f $(find $1 -maxdepth 1 -type f -printf "%T@ %p\n" | sort -n | tail -n 1 | cut -d' ' -f 2-)
}
alias taillog='__taillog'
Taken from: https://superuser.com/questions/117596/how-to-tail-the-latest-file-in-a-directory
An example of the log file directory
-rw-r--r-- 1 genesys genesys 2284 Mar 19 16:34 gdalog.20130319_163436_906.log
-rw-r--r-- 1 genesys genesys 131072 Mar 19 16:34 gdalog.20130319_163436_906.snapshot.log
-rw-r--r-- 1 genesys genesys 10517 Mar 19 16:54 lcalog.20130319_163332_719.log
-rw-r--r-- 1 genesys genesys 131072 Mar 19 16:54 lcalog.20130319_163332_719.snapshot.log
-rw-r--r-- 1 genesys genesys 3792 Mar 19 16:37 StatServer_TLSTest.20130319_163700_703.log
-rw-r--r-- 1 genesys genesys 160562 Mar 19 16:52 StatServer_TLSTest.20130319_163712_045.log
-rw-r--r-- 1 genesys genesys 49730 Mar 19 16:54 StatServer_TLSTest.20130319_165217_402.log
-rw-r--r-- 1 genesys genesys 53960 Mar 20 09:55 StatServer_TLSTest.20130319_165423_702.log
-rw-r--r-- 1 genesys genesys 131072 Mar 20 09:56 StatServer_TLSTest.20130319_165423_702.snapshot.log
So to tail the latest StatServer log the command would be
taillog /home/user/logs/StatServer*
and it would tail the latest file for that application in the given path
The issue is the tail displays some of the file output but does not show any updates when the log file is appended. If the following command is run the log is tailed correctly
tail -f $(find /home/user/logs/StatServer* -maxdepth 1 -type f -printf "%T@ %p\n" | sort -n | tail -n 1 | cut -d' ' -f 2-)
Somehow, adding this command as a bash function and then calling it through an alias causes it not to operate as desired.
Any suggestions on a better way are welcome.
I believe you should be running this command:
taillog /home/user/logs
When you say /home/user/logs/this_app*, the shell expands the pattern before taillog runs, so you're passing all the matching files as arguments. The function uses only the first one, i.e. $1, so the command eventually translates to tail -f $1.
Instead, $1 should be the directory in which find should look for files at that level (i.e. /home/user/logs in your case); the results are then piped to sort, tail and cut.
I didn't have any problems running your taillog function on linux/bash. Perhaps the log output is being buffered, so changes aren't being written right away? You might try turning off the [log]buffering option for this StatServer.
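One way to keep the per-application selection while avoiding the glob expansion problem is to pass the directory and the name pattern as two separate arguments, quoting the pattern so that the shell leaves it for find. A sketch, assuming GNU find and GNU tail; latest_log is a hypothetical helper, and paths containing newlines are not handled.

```shell
#!/usr/bin/env bash
# latest_log DIR PATTERN   (hypothetical helper)
# Prints the most recently modified regular file in DIR whose name
# matches the find -name PATTERN.
latest_log() {
    local dir=$1 pattern=$2
    find "$dir" -maxdepth 1 -type f -name "$pattern" -printf '%T@ %p\n' |
        sort -n | tail -n 1 | cut -d' ' -f2-
}

# taillog DIR PATTERN: follow the newest matching file.
taillog() {
    local file
    file=$(latest_log "$1" "$2")
    if [ -z "$file" ]; then
        echo "taillog: no file matches $2 in $1" >&2
        return 1
    fi
    tail -F -- "$file"
}
```

It would be called as taillog /home/user/logs 'StatServer*', with the pattern in single quotes. Because taillog is defined as a function, no alias is needed.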

Resources