Remove similar directories with conditions in bash - linux

Let's say I have several directories, which are similar but are slightly different at the end:
XYZ_e6586_e5984
XYZ_e3282_e5984
XYZ_e9823_e5984
Now, in case there are two or more directories whose name is identical except the number between e and _ , only the directories with the highest number should be kept. In this case, XYZ_e6586_e5984 and XYZ_e3282_e5984 should be removed.
How do I do that?

Simple find regex case here:
find /directory -mindepth 1 -maxdepth 1 -type d -regextype sed -regex ".*/XYZ_e[0-9]\{4\}_e5984" -print0 | sort -zr | tail -z -n +2 | xargs -0 -I{} rm -rf "{}"
Yet this will only work on Linux with GNU find, sort and tail. A more portable but less pretty version is
find /directory -mindepth 1 -maxdepth 1 -type d -regex ".*/XYZ_e[0-9][0-9][0-9][0-9]_e5984" | sort -r | tail -n +2 | xargs -I{} rm -rf "{}"
Explanation:
Use -mindepth 1 and -maxdepth 1 to search only direct children of /directory.
-type d restricts the search to directories.
-regex has to match the whole path, hence the leading .*/ in the pattern.
-print0, together with sort -z, tail -z and xargs -0, helps to deal with special characters in names.
sort -r sorts the output from highest to lowest (a plain reverse sort is enough here because the numbers all have four digits).
tail -n +2 skips the first line (i.e. the highest-numbered folder, which is kept).
xargs -I{} rm -rf "{}" performs the actual deletion (-0 is necessary because of -print0).
Just make sure the reverse sort gets done right: replace rm -rf "{}" with echo rm -rf "{}" to show the actual commands that would get executed.
If not sorted right, try export LANG=C before executing the command.
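For the general case the question describes (several groups of names, keeping the highest number in each), a minimal sketch could look like the following. It assumes names of the form prefix_eNNNN_eNNNN without whitespace, and it uses a temporary directory as a stand-in for /directory; the ABC_e0001_e5984 directory is made up to show that a group with a single member is left alone.

```shell
base=$(mktemp -d)   # stand-in for /directory (assumption)
mkdir "$base/XYZ_e6586_e5984" "$base/XYZ_e3282_e5984" \
      "$base/XYZ_e9823_e5984" "$base/ABC_e0001_e5984"

for dir in "$base"/*_e*_e*/; do
    name=${dir%/}          # drop the trailing slash
    name=${name##*/}       # keep only the directory name
    printf '%s\n' "$name"
done |
# Rewrite each name as: prefix suffix number fullname
sed 's/^\(.*\)_e\([0-9]*\)_\(e[0-9]*\)$/\1 \3 \2 &/' |
# Group by prefix and suffix, highest number first within each group
sort -k1,1 -k2,2 -k3,3nr |
# Print every name except the first (highest) one of its group
awk 'seen[$1 FS $2]++ { print $4 }' |
while IFS= read -r victim; do
    echo "removing: $base/$victim"
    rm -rf -- "$base/$victim"
done
```

Replacing the rm line with a plain echo turns this into a dry run.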

Related

Remove all files contain specific string - Bash

I have these bad data
AWS-Console.pngcrop-AWS-Console.png
Alimofire.pngcrop-Alimofire.png
Amazon-ECR-.pngcrop-Amazon-ECR-.png
Amazon-ECS.pngcrop-Amazon-ECS.png
Amazon-RDS.pngcrop-Amazon-RDS.png
Angular.pngcrop-Angular.png
AngularJS.pngcrop-AngularJS.png
.... 1000 more
I'm trying to delete them
I've tried
ls public/assets/fe/img/skill/ | grep crop | rm -rf *crop*
ls public/assets/fe/img/skill/ | grep crop | rm -rf
rm -rf $(ls public/assets/fe/img/skill/ | grep crop)
None of them work ...
rm can handle the glob expressions that ls handles:
rm public/assets/fe/img/skill/*crop*
Use the find command instead
find . -name "*crop*" -type f -exec rm -i {} \;
-type f restricts the search to regular files and skips directories
-exec requires the command to end with \;, and {} is replaced by each path that find matches
-i makes rm ask for confirmation; remove it once you are sure of what you are doing.
Advice: display the result beforehand with -print in place of -exec ...
find . -name "*crop*" -type f -print
The main problem in your commands is the missing path in the output of the ls command.
ls public/assets/fe/img/skill/ | grep crop will return e.g. AWS-Console.pngcrop-AWS-Console.png, which is what rm would be given. But rm AWS-Console.pngcrop-AWS-Console.png fails because there is no such file in the current directory. It should be rm public/assets/fe/img/skill/AWS-Console.pngcrop-AWS-Console.png instead.
Listing with ls -d and a glob keeps the path in the output. Note also that rm does not read file names from standard input, so the piped variant needs xargs:
ls -d public/assets/fe/img/skill/* | grep crop | xargs rm -rf
rm -rf $(ls -d public/assets/fe/img/skill/* | grep crop)
As pointed out in other answers, other solutions exist, including:
rm public/assets/fe/img/skill/*crop*
find public/assets/fe/img/skill/ -name "*crop*" -type f -exec rm -i {} \;
If it's a really large number of files (apparently wasn't in your case), xargs can speed up the process. This applies for a lot of things you might want to read from a pipe.
find . -name "*crop*" -type f | xargs rm
The main advantage of using find here is that it's an easy way to ignore directories. If that's not an issue, let the OS handle all that.
printf "%s\n" public/assets/fe/img/skill/*crop* | xargs rm
If you need to be able to pick up files in subdirectories -
shopt -s globstar # a ** path component now matches any number of intermediate directories
printf "%s\n" public/assets/fe/img/skill/**/*crop* | xargs rm
You might want to look over the list first, though.
printf "%s\n" public/assets/fe/img/skill/*crop* >crop.lst
# check the list - vi, grep, whatever satisfies you.
xargs rm < crop.lst # fast-delete them in bulk
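The newline-separated pipelines above break on file names that contain whitespace. A minimal sketch of the same find and xargs approach with NUL-separated names follows; the temporary directory and the sample file names are stand-ins chosen for illustration, not the asker's real data.

```shell
dir=$(mktemp -d)    # stand-in for public/assets/fe/img/skill (assumption)
touch "$dir/Angular.png" \
      "$dir/Angular.pngcrop-Angular.png" \
      "$dir/AWS Console.pngcrop-AWS Console.png"

# Dry run: show what would be deleted.
find "$dir" -type f -name '*crop*' -print

# Delete. -print0 and -0 pass each name NUL-terminated, so spaces survive.
find "$dir" -type f -name '*crop*' -print0 | xargs -0 rm -f --

remaining=$(ls "$dir")
printf '%s\n' "$remaining"
```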

shell script to delete all files except the last updated file in different folders

My application logs will be created in below folders in linux system.
Folder 1: 100001_1001
folder 2 : 200001_1002
folder 3 :300061_1003
folder 4: 300001_1004
folder 5 :400011_1008
want to delete all files except the latest file in above folders and want to add this to cron job.
i tried below one not working need help
30 1 * * * ls -lt /abc/cde/etc/100* | awk '{if(NR!=1) print $9}' | xargs -i rm -rf {} \;
30 1 * * * ls -lt /abc/cde/etc/200* | awk '{if(NR!=1) print $9}' | xargs -i rm -rf {} \;
30 1 * * * ls -lt /abc/cde/etc/300* | awk '{if(NR!=1) print $9}' | xargs -i rm -rf {} \;
30 1 * * * ls -lt /abc/cde/etc/400* | awk '{if(NR!=1) print $9}' | xargs -i rm -rf {} \;
You can use this pipeline, which consists entirely of GNU utilities (so that it also handles file paths with special characters, whitespace and glob characters):
find /parent/log/dir -type f -name '*.zip' -printf '%T@\t%p\0' |
sort -zk1,1rn |
cut -zf2 |
tail -z -n +2 |
xargs -0 rm -f
Using a slightly modified approach to your own:
find /abc/cde/etc/100* -type f -printf "%T+\t%p\n" | sort -k1,1r | awk -F'\t' 'NR!=1{print $2}' | xargs -i rm "{}"
The find version doesn't suffer from the lack of paths, so this MIGHT work (I don't know anything about the directory structure, or whether 100* points at a directory, a file or a group of files).
You should use find instead. It has a -delete action that deletes the files it found that match your specification. Warning: it is very easy to go wrong with -delete. Test your command first. Example, to find all files named *.zip under a/b/c (and only files):
find a/b/c -depth -name '*.zip' -type f -print
This is the test, it will print all files that the final command will delete (do not forget the -depth, it is important). And once you are sure, the command that does the deletion is:
find a/b/c -depth -name '*.zip' -type f -delete
find also has options to select files by last modification date, by size... You could, for instance, find all files that were modified at least 24 hours ago:
find a/b/c -depth -type f -mtime +0 -print
and, after careful check, delete them:
find a/b/c -depth -type f -mtime +0 -delete
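Since the question asks for the newest file to be kept in each of several folders, the GNU pipeline shown earlier can be wrapped in a loop over the folders. This is only a sketch: it assumes GNU find, sort, tail, cut, xargs and touch, and the temporary directory, folder names and timestamps are made up for the demonstration.

```shell
root=$(mktemp -d)            # stand-in for /abc/cde/etc (assumption)
mkdir "$root/100001_1001" "$root/200001_1002"
touch -d '2020-01-01' "$root/100001_1001/old.log" "$root/200001_1002/old.log"
touch -d '2020-01-02' "$root/100001_1001/mid.log"
touch -d '2020-01-03' "$root/100001_1001/new.log" "$root/200001_1002/new.log"

for dir in "$root"/*/; do
    # Emit "mtime<TAB>path" records, NUL-terminated, for this folder only.
    find "$dir" -maxdepth 1 -type f -printf '%T@\t%p\0' |
        LC_ALL=C sort -z -k1,1nr |   # newest first
        tail -z -n +2 |              # skip the newest record
        cut -z -f2- |                # drop the timestamp column
        xargs -0 -r rm -f --         # -r: do nothing if the list is empty
done
```

Saved as a script, this could be called from a single crontab line instead of one line per folder.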

How to only get file name with Linux 'find'?

I'm using find to list all files in a directory, so I get a list of paths. However, I need only the file names, i.e. I get ./dir1/dir2/file.txt and I want to get file.txt
In GNU find you can use -printf parameter for that, e.g.:
find /dir1 -type f -printf "%f\n"
If your find doesn't have a -printf option you can also use basename:
find ./dir1 -type f -exec basename {} \;
Use -execdir which automatically holds the current file in {}, for example:
find . -type f -execdir echo '{}' ';'
You can also use $PWD instead of . (on some systems it won't produce an extra dot in the front).
If you still got an extra dot, alternatively you can run:
find . -type f -execdir basename '{}' ';'
-execdir utility [argument ...] ;
The -execdir primary is identical to the -exec primary with the exception that utility will be executed from the directory that holds the current file.
When + is used instead of ;, {} is replaced with as many pathnames as possible for each invocation of utility. In other words, it'll print all filenames in one line.
If you are using GNU find
find . -type f -printf "%f\n"
Or you can use a programming language such as Ruby(1.9+)
$ ruby -e 'Dir["**/*"].each{|x| puts File.basename(x)}'
If you fancy a bash (at least 4) solution
shopt -s globstar
for file in **; do echo ${file##*/}; done
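The ${file##*/} expansion used in that loop removes the longest prefix that ends in a slash, which leaves only the file name. A small sketch on the path from the question (quoting the expansion keeps names with spaces intact):

```shell
path='./dir1/dir2/file.txt'
name=${path##*/}      # strip everything up to and including the last "/"
parent=${path%/*}     # the counterpart: strip the last "/" and what follows
printf '%s\n' "$name"     # prints: file.txt
printf '%s\n' "$parent"   # prints: ./dir1/dir2
```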
If you want to run some action against the filename only, using basename can be tough.
For example this:
find ~/clang+llvm-3.3/bin/ -type f -exec echo basename {} \;
will just echo basename /my/found/path. Not what we want if we want to execute on the filename.
But you can then xargs the output. for example to kill the files in a dir based on names in another dir:
cd dirIwantToRMin;
find ~/clang+llvm-3.3/bin/ -type f -exec basename {} \; | xargs rm
On mac (BSD find) use:
find /dir1 -type f -exec basename {} \;
As others have pointed out, you can combine find and basename, but by default the basename program will only operate on one path at a time, so the executable will have to be launched once for each path (using either find ... -exec or find ... | xargs -n 1), which may potentially be slow.
If you use the -a option on basename, then it can accept multiple filenames in a single invocation, which means that you can then use xargs without the -n 1, to group the paths together into a far smaller number of invocations of basename, which should be more efficient.
Example:
find /dir1 -type f -print0 | xargs -0 basename -a
Here I've included the -print0 and -0 (which should be used together), in order to cope with any whitespace inside the names of files and directories.
Here is a timing comparison, between the xargs basename -a and xargs -n1 basename versions. (For sake of a like-with-like comparison, the timings reported here are after an initial dummy run, so that they are both done after the file metadata has already been copied to I/O cache.) I have piped the output to cksum in both cases, just to demonstrate that the output is independent of the method used.
$ time sh -c 'find /usr/lib -type f -print0 | xargs -0 basename -a | cksum'
2532163462 546663
real 0m0.063s
user 0m0.058s
sys 0m0.040s
$ time sh -c 'find /usr/lib -type f -print0 | xargs -0 -n 1 basename | cksum'
2532163462 546663
real 0m14.504s
user 0m12.474s
sys 0m3.109s
As you can see, it really is substantially faster to avoid launching basename every time.
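The batching can be tried without touching the file system by feeding a couple of NUL-terminated paths to xargs directly. The two paths below are made up for illustration.

```shell
# One basename process handles both paths thanks to -a.
names=$(printf '%s\0' '/dir1/a b.txt' '/dir1/sub/c.txt' | xargs -0 basename -a)
printf '%s\n' "$names"
```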
Honestly basename and dirname solutions are easier, but you can also check this out :
find . -type f | grep -oP "[^/]*$"
or
find . -type f | rev | cut -d '/' -f1 | rev
or
find . -type f | sed "s/.*\///"
-exec and -execdir are slow, xargs is king.
$ alias f='time find /Applications -name "*.app" -type d -maxdepth 5'; \
f -exec basename {} \; | wc -l; \
f -execdir echo {} \; | wc -l; \
f -print0 | xargs -0 -n1 basename | wc -l; \
f -print0 | xargs -0 -n1 -P 8 basename | wc -l; \
f -print0 | xargs -0 basename | wc -l
139
0m01.17s real 0m00.20s user 0m00.93s system
139
0m01.16s real 0m00.20s user 0m00.92s system
139
0m01.05s real 0m00.17s user 0m00.85s system
139
0m00.93s real 0m00.17s user 0m00.85s system
139
0m00.88s real 0m00.12s user 0m00.75s system
xargs's parallelism also helps.
Funnily enough, I cannot explain the last case, xargs without -n1.
It gives the correct result and it's the fastest ¯\_(ツ)_/¯
(basename takes only one path argument, but without -n1 xargs sends them all at once, up to 5000. This does not work on Linux and OpenBSD, only on macOS...)
Some bigger numbers from a linux system to see how -execdir helps, but still much slower than a parallel xargs:
$ alias f='time find /usr/ -maxdepth 5 -type d'
$ f -exec basename {} \; | wc -l; \
f -execdir echo {} \; | wc -l; \
f -print0 | xargs -0 -n1 basename | wc -l; \
f -print0 | xargs -0 -n1 -P 8 basename | wc -l
2358
3.63s real 0.10s user 0.41s system
2358
1.53s real 0.05s user 0.31s system
2358
1.30s real 0.03s user 0.21s system
2358
0.41s real 0.03s user 0.25s system
I've found a solution (on makandracards page), that gives just the newest file name:
ls -1tr * | tail -1
(thanks goes to Arne Hartherz)
I used it for cp:
cp $(ls -1tr * | tail -1) /tmp/

Shell script to count files, then remove oldest files

I am new to shell scripting, so I need some help here. I have a directory that fills up with backups. If I have more than 10 backup files, I would like to remove the oldest files, so that the 10 newest backup files are the only ones that are left.
So far, I know how to count the files, which seems easy enough, but how do I then remove the oldest files, if the count is over 10?
if [ls /backups | wc -l > 10]
then
echo "More than 10"
fi
Try this:
ls -t | sed -e '1,10d' | xargs -d '\n' rm
This should handle all characters (except newlines) in a file name.
What's going on here?
ls -t lists all files in the current directory in decreasing order of modification time. Ie, the most recently modified files are first, one file name per line.
sed -e '1,10d' deletes the first 10 lines, ie, the 10 newest files. I use this instead of tail because I can never remember whether I need tail -n +10 or tail -n +11.
xargs -d '\n' rm collects each input line (without the terminating newline) and passes each line as an argument to rm.
As with anything of this sort, please experiment in a safe place.
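The tail offset mentioned above can be checked on a numbered list: to drop the first 10 lines, tail needs -n +11 (start output at line 11), which matches sed '1,10d'. A quick sketch using seq as a stand-in for the ls -t output:

```shell
via_sed=$(seq 1 15 | sed '1,10d')      # delete lines 1 to 10
via_tail=$(seq 1 15 | tail -n +11)     # start output at line 11
printf '%s\n' "$via_tail"              # prints 11 through 15, one per line
```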
find is the common tool for this kind of task :
find ./my_dir -mtime +10 -type f -delete
EXPLANATIONS
./my_dir your directory (replace with your own)
-mtime +10 older than 10 days
-type f only files
-delete no surprise. Remove it to test your find filter before executing the whole command
And take care that ./my_dir exists to avoid bad surprises !
Make sure your pwd is the correct directory to delete the files from, then (assuming only regular characters in the file names):
ls -A1t | tail -n +11 | xargs rm
keeps the newest 10 files. I use this with the camera program 'motion' to keep the most recent frame grab files. Thanks to all preceding answers because you showed me how to do it.
The proper way to do this type of thing is with logrotate.
I like the answers from @Dennis Williamson and @Dale Hagglund. (+1 to each)
Here's another way to do it using find (with the -newer test) that is similar to what you started with.
This was done in bash on cygwin...
if [[ $(ls /backups | wc -l) -gt 10 ]]
then
find /backups -type f ! -newer "/backups/$(ls -t /backups | sed '11!d')" -exec rm {} \;
fi
Straightforward file counter:
max=12
n=0
ls -1t *.dat |
while read file; do
n=$((n+1))
if [[ $n -gt $max ]]; then
rm -f "$file"
fi
done
I just found this topic and the solution from mikecolley helped me as a first step. As I needed a solution for a single-line homematic (raspberrymatic) script, I ran into the problem that this command only gave me the file names and not the whole path, which is needed for "rm". The CUxD Exec command I use cannot start in a selected folder.
So here is my solution:
ls -A1t $(find /media/usb0/backup/ -type f -name 'homematic-raspi*.sbk') | tail -n +11 | xargs rm
Explaining:
find /media/usb0/backup/ -type f -name 'homematic-raspi*.sbk' searches the folder /media/usb0/backup/ for regular files only (-type f) whose names match -name 'homematic-raspi*.sbk' (case sensitive; use -iname for a case-insensitive match)
ls -A1t $(...) lists the files given by find, including names that start with "." but not "." and ".." themselves (-A), sorted by mtime (-t), one name per line (-1)
tail -n +11 skips the first 10 lines (the 10 newest files) and passes everything from line 11 on to rm
xargs rm finally removes the remaining files in the list
Maybe this helps others from longer searching and makes the solution more flexible.
stat -c "%Y %n" * | sort -n | head -n -10 | \
cut -d ' ' -f 1 --complement | xargs -d '\n' rm
Breakdown: Get last-modified times for each file (in the format "time filename"), sort them from oldest to newest, keep all but the last ten entries, and then keep all but the first field (keep only the filename portion).
Edit: Using cut instead of awk since the latter is not always available
Edit 2: Now handles filenames with spaces
On a very limited chroot environment, we had only a couple of programs available to achieve what was initially asked. We solved it that way:
MIN_FILES=5
FILE_COUNT=$(ls -l | grep -c ^d )
if [ $MIN_FILES -lt $FILE_COUNT ]; then
while [ $MIN_FILES -lt $FILE_COUNT ]; do
FILE_COUNT=$[$FILE_COUNT-1]
FILE_TO_DEL=$(ls -t | tail -n1)
# be careful with this one
rm -rf "$FILE_TO_DEL"
done
fi
Explanation:
FILE_COUNT=$(ls -l | grep -c ^d ) counts the directories in the current folder (the lines of ls -l that start with d). Instead of grep we could also use wc -l, but wc was not installed on that host.
FILE_COUNT=$[$FILE_COUNT-1] decrements the current $FILE_COUNT
FILE_TO_DEL=$(ls -t | tail -n1) Save the oldest file name in the $FILE_TO_DEL variable. tail -n1 returns the last element in the list.
Based on others' suggestions and some awk foo, I got this to work. I know this is an old thread, but I didn't find a decent answer here and this sorted it for me. This just deletes the oldest file, but you can change the head -n 1 to head -n 10 and get the oldest 10.
find "$DIR" -type f -printf '%T+ %p\n' | sort | head -n 1 | awk '{sub(/^[^ ]+ /, ""); print}' | xargs -d '\n' rm
Using inode numbers via stat & find command (to avoid pesky-chars-in-file-name issues):
stat -f "%m %i" * | sort -rn -k 1,1 | tail -n +11 | cut -d " " -f 2 | \
xargs -n 1 -I '{}' find "$(pwd)" -type f -inum '{}' -print
#stat -f "%m %i" * | sort -rn -k 1,1 | tail -n +11 | cut -d " " -f 2 | \
# xargs -n 1 -I '{}' find "$(pwd)" -type f -inum '{}' -delete

Move all files except one

How can I move all files except one? I am looking for something like:
'mv ~/Linux/Old/!Tux.png ~/Linux/New/'
where I move old stuff to new stuff -folder except Tux.png. !-sign represents a negation. Is there some tool for the job?
If you use bash and have the extglob shell option set (which is usually the case):
mv ~/Linux/Old/!(Tux.png) ~/Linux/New/
Put the following in your .bashrc
shopt -s extglob
It extends the shell's glob patterns (these are not regexes).
You can then move all files except one by
mv !(fileOne) ~/path/newFolder
Exceptions in relation to other commands
Note that, when copying directories, a trailing forward slash cannot be used in the name, as noticed in the thread Why extglob except breaking except condition?:
cp -r !(Backups.backupdb) /home/masi/Documents/
so Backups.backupdb/ would be wrong inside the negation here. I would not use the trailing slash when moving directories either, because of the risk of globs behaving unexpectedly with other commands and of other possible exceptions.
I would go with the traditional find & xargs way:
find ~/Linux/Old -maxdepth 1 -mindepth 1 -not -name Tux.png -print0 |
xargs -0 mv -t ~/Linux/New
-maxdepth 1 makes it not search recursively. If you only care about files, you can say -type f. -mindepth 1 makes it not include the ~/Linux/Old path itself into the result. Works with any filenames, including with those that contain embedded newlines.
One comment notes that the mv -t option is probably a GNU extension. For systems that don't have it:
find ~/Linux/Old -maxdepth 1 -mindepth 1 -not -name Tux.png \
-exec mv '{}' ~/Linux/New \;
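Where neither extglob nor GNU mv -t is available, a plain loop is another option. This is a sketch rather than one of the approaches proposed above; the temporary directories stand in for ~/Linux/Old and ~/Linux/New, and the sample file names other than Tux.png are made up. Like mv *, it skips hidden files.

```shell
src=$(mktemp -d)    # stand-in for ~/Linux/Old (assumption)
dst=$(mktemp -d)    # stand-in for ~/Linux/New (assumption)
touch "$src/Tux.png" "$src/a.txt" "$src/b c.txt"

for f in "$src"/*; do
    [ -e "$f" ] || continue                  # the glob matched nothing
    [ "${f##*/}" = "Tux.png" ] && continue   # the one file to leave behind
    mv -- "$f" "$dst"/
done
```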
A quick way would be to modify the tux filename so that your move command will not match.
For example:
mv Tux.png .Tux.png
mv * ~/somefolder
mv .Tux.png Tux.png
I think the easiest way to do is with backticks
mv `ls -1 ~/Linux/Old/ | grep -v Tux.png` ~/Linux/New/
Edit:
Use backslash with ls instead to prevent using it with alias, i.e. mostly ls is aliased as ls --color.
mv `\ls -1 ~/Linux/Old/ | grep -v Tux.png` ~/Linux/New/
Thanks @Arnold Roa
For bash, sth's answer is correct. Here is the zsh (my shell of choice) syntax:
mv ~/Linux/Old/^Tux.png ~/Linux/New/
Requires EXTENDED_GLOB shell option to be set.
I find this to be a bit safer and easier to rely on for simple moves that exclude certain files or directories.
ls -1 | grep -v ^$EXCLUDE | xargs -I{} mv {} $TARGET
This could be simpler and easy to remember and it works for me.
mv $(ls -d ~/folder/* | grep -v ~/folder/exclude.png) ~/destination
The following is not a 100% guaranteed method, and should not be attempted at all in scripts. But sometimes it is good enough for quick interactive shell usage. A file glob like
[abc]*
(which will match all files with names starting with a, b or c) can be negated by inserting a "^" character first, i.e.
[^abc]*
I sometimes use this for not matching the "lost+found" directory, like for instance:
mv /mnt/usbdisk/[^l]* /home/user/stuff/.
Of course if there are other files starting with l I have to process those afterwards.
How about:
mv $(echo * | sed s:Tux.png::g) ~/Linux/New/
You have to be in the folder though.
This can be done without grep like this:
ls ~/Linux/Old/ -QI Tux.png | xargs -I{} mv ~/Linux/Old/{} ~/Linux/New/
Note: -I is a capital i and makes the ls command ignore the Tux.png file, which is given right after it.
The output of ls is then piped into mv via xargs, which allows the output of ls to be used as the source argument for mv.
ls -Q just quotes the filenames listed by ls.
mv `find Linux/Old '!' -type d | fgrep -v Tux.png` Linux/New
The find command lists everything that is not a directory, and the fgrep command filters out any Tux.png. The backticks tell mv to move the resulting file list.
ls ~/Linux/Old/ | grep -v Tux.png | xargs -I{} mv ~/Linux/Old/{} ~/Linux/New/
Move everything in the current directory (except except_file itself) into the directory except_file:
find -maxdepth 1 -mindepth 1 -not -name except_file -print0 |xargs -0 mv -t ./except_file
for example(cache is current except file)
find -maxdepth 1 -mindepth 1 -not -name cache -print0 |xargs -0 mv -t ./cache

Resources