shell script to delete all files except the last updated file in different folders


My application logs are created in the following folders on a Linux system:
Folder 1: 100001_1001
Folder 2: 200001_1002
Folder 3: 300061_1003
Folder 4: 300001_1004
Folder 5: 400011_1008
I want to delete all files except the latest one in each of the above folders, and add this to a cron job.
I tried the following, but it is not working. I need help.
30 1 * * * ls -lt /abc/cde/etc/100* | awk '{if(NR!=1) print $9}' | xargs -i rm -rf {} \;
30 1 * * * ls -lt /abc/cde/etc/200* | awk '{if(NR!=1) print $9}' | xargs -i rm -rf {} \;
30 1 * * * ls -lt /abc/cde/etc/300* | awk '{if(NR!=1) print $9}' | xargs -i rm -rf {} \;
30 1 * * * ls -lt /abc/cde/etc/400* | awk '{if(NR!=1) print $9}' | xargs -i rm -rf {} \;

You can use this pipeline, which consists entirely of GNU utilities (so it also handles file paths with special characters, whitespace and glob characters):
find /parent/log/dir -type f -name '*.zip' -printf '%T@\t%p\0' |
sort -zk1,1rn |
cut -zf2 |
tail -z -n +2 |
xargs -0 rm -f
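Note that the pipeline above keeps a single newest file for the whole tree under /parent/log/dir. The question asks for the newest file in each folder, so here is a minimal sketch of a per-folder variant. It assumes GNU find, sort, tail, cut and xargs; the function name keep_newest and the script path in the cron comment are hypothetical.

```shell
#!/usr/bin/env bash
# keep_newest DIR: delete every regular file directly inside DIR except the
# most recently modified one. Assumes GNU find/sort/tail/cut/xargs.
keep_newest() {
    find "$1" -maxdepth 1 -type f -printf '%T@\t%p\0' |  # "mtime<TAB>path", NUL-terminated
        sort -z -k1,1rn |      # newest first
        tail -z -n +2 |        # drop the first record (the file to keep)
        cut -z -f2- |          # strip the timestamp, keep the path
        xargs -0 -r rm -f --   # -r: do nothing when the list is empty
}

# Usage sketch with the folder layout from the question:
#   for d in /abc/cde/etc/*_*/; do keep_newest "$d"; done
# Saved as a script (hypothetical path), one cron line is enough:
#   30 1 * * * /usr/local/bin/keep_newest_logs.sh
```

Running it against a scratch copy of the folders first is a cheap way to confirm that the right file survives.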

Using a slightly modified approach to your own:
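find /abc/cde/etc/100* -type f -printf "%T+\t%p\n" | sort -k1,1r | awk 'NR!=1{print $2}' | xargs -i rm "{}"
The find version doesn't suffer from missing paths (ls prints bare file names when it lists a directory), so this MIGHT work. I don't know anything about the directory structure, or whether 100* points at a directory, a file or a group of files. %T+ sorts by modification time rather than access time, and file names containing whitespace will still break the awk step.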

You should use find instead. It has a -delete action that deletes the files it found that match your specification. Warning: it is very easy to go wrong with -delete. Test your command first. For example, to find all files named *.zip under a/b/c (and only files):
find a/b/c -depth -name '*.zip' -type f -print
This is the test, it will print all files that the final command will delete (do not forget the -depth, it is important). And once you are sure, the command that does the deletion is:
find a/b/c -depth -name '*.zip' -type f -delete
find also has options to select files by last modification date, by size... You could, for instance, find all files that were modified at least 24 hours ago:
find a/b/c -depth -type f -mtime +0 -print
and, after careful check, delete them:
find a/b/c -depth -type f -mtime +0 -delete

Related

Remove similar directories with conditions in bash

Let's say I have several directories, which are similar but slightly different at the end:
XYZ_e6586_e5984
XYZ_e3282_e5984
XYZ_e9823_e5984
Now, in case there are two or more directories whose names are identical except for the number between e and _, only the directory with the highest number should be kept. In this case, XYZ_e6586_e5984 and XYZ_e3282_e5984 should be removed.
How do I do that?

Simple find regex case here:
find /directory -mindepth 1 -maxdepth 1 -type d -regextype sed -regex ".*/XYZ_e[0-9]\{4\}_e5984" -print0 | sort -z -nr | tail -z -n +2 | xargs -0 -i rm -rf "{}"
Yet this will only work on Linux with GNU tools. A less pretty version that avoids the interval expression and the NUL-separated pipeline is
find /directory -mindepth 1 -maxdepth 1 -type d -regextype sed -regex ".*/XYZ_e[0-9][0-9][0-9][0-9]_e5984" | sort -nr | tail -n +2 | xargs -i rm -rf "{}"
Explanation:
Use -mindepth 1 and -maxdepth 1 to search only direct children of /directory.
-type d restricts the search to directories.
The regexes are pretty self-explanatory in this case.
-print0 helps to deal with special characters.
sort -nr sorts the output numerically from highest to lowest.
tail -n +2 skips the first line (i.e. the highest-numbered folder, which is kept).
xargs -i rm -rf "{}" performs the actual deletion (-0 is necessary because of -print0).
Just make sure the reverse sort is done right (replace xargs -i rm -rf "{}" with xargs -i echo rm -rf "{}" to show the commands that would be executed).
If the sort order is wrong, try export LANG=C before executing the command.
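The commands above hard-code one prefix and one suffix. A rough bash sketch that groups directories by everything except the number, and only prints the ones that would be removed, could look like this. The function name list_superseded is hypothetical, and the sketch assumes names of the form PREFIX_e<digits>_SUFFIX and bash 4 or later for associative arrays.

```shell
#!/usr/bin/env bash
# list_superseded PARENT: print every direct subdirectory of PARENT named
# PREFIX_e<digits>_SUFFIX that is not the highest-numbered one in its
# (PREFIX, SUFFIX) group. Nothing is deleted here.
list_superseded() {
    local parent=$1 path name key num
    local re='^(.*)_e([0-9]+)_(.*)$'
    local -A max=()

    # Pass 1: remember the highest number seen for each group.
    for path in "$parent"/*/; do
        name=${path%/}; name=${name##*/}
        [[ $name =~ $re ]] || continue
        key="${BASH_REMATCH[1]}/${BASH_REMATCH[3]}"   # "/" cannot occur in a name
        num=$((10#${BASH_REMATCH[2]}))                # force base 10 (leading zeros)
        if [[ -z ${max[$key]+set} ]] || (( num > ${max[$key]} )); then
            max[$key]=$num
        fi
    done

    # Pass 2: print every directory whose number is below its group's maximum.
    for path in "$parent"/*/; do
        name=${path%/}; name=${name##*/}
        [[ $name =~ $re ]] || continue
        key="${BASH_REMATCH[1]}/${BASH_REMATCH[3]}"
        num=$((10#${BASH_REMATCH[2]}))
        if (( num < ${max[$key]} )); then
            printf '%s\n' "${path%/}"
        fi
    done
}
```

Once the printed list looks right, it can be fed to a loop such as while IFS= read -r d; do rm -rf -- "$d"; done. Directory names containing newlines are not handled by that loop.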

Moving files with a specific modification date; "find | xargs ls | grep | -exec" fails w/ "-exec: command not found"

I am using CentOS 7.
If I want to find files that have a specific name and a specific date and then move these files to another folder, I issue the command
find -name 'fsimage*' | xargs ls -ali | grep 'Oct 20' | -exec mv {} /hdd/fordelete/ \;
with the following error
-bash: -exec: command not found
xargs: ls: terminated by signal 13

As another answer already explains, -exec is an action for find; you can't use it as a shell command. On the contrary, xargs and grep are commands, and you can't use them as find actions, just like you can't use a pipe | inside find.
But more importantly, even though you could use ls and grep on find's result just to move files older than some amount of time, you shouldn't. Such a pipeline is fragile and fails on many corner cases, like symlinks, files with newlines in their names, etc.
Instead, use find. You'll find it quite powerful.
For example, to mv files modified more than 7 days ago, use the -mtime test:
find -name 'fsimage*' -mtime +7 -exec mv '{}' /some/dir/ \;
To mv files modified on a specific/reference date, e.g. 2017-10-20, you can use the -newerXY test:
find -name 'fsimage*' -newermt 2017-10-20 ! -newermt 2017-10-21 -exec mv '{}' /some/dir/ \;
Also, if your mv supports the -t option (to give the target dir first, multiple files after), you can use the {} + placeholder in find for multiple files, reducing the total number of mv command invocations (thanks @CharlesDuffy):
find -name 'fsimage*' -mtime +7 -exec mv -t /some/dir/ '{}' +
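The two -newermt tests can be wrapped so that the end of the window is computed instead of typed by hand. This is only a sketch: the function name move_modified_on is made up, and it assumes GNU date (for -d), GNU find (for -newermt) and an mv that supports -t.

```shell
#!/usr/bin/env bash
# move_modified_on SRC PATTERN DAY DEST
# Move regular files under SRC whose name matches PATTERN and whose
# modification time falls on DAY (YYYY-MM-DD, local time) into DEST.
move_modified_on() {
    local src=$1 pattern=$2 day=$3 dest=$4 next
    # GNU date computes the following day, e.g. 2017-10-20 -> 2017-10-21.
    next=$(date -d "$day +1 day" +%F) || return
    find "$src" -type f -name "$pattern" \
        -newermt "$day" ! -newermt "$next" \
        -exec mv -t "$dest" -- {} +
}

# Usage sketch with the names from the question:
#   move_modified_on . 'fsimage*' 2017-10-20 /hdd/fordelete/
```

Replacing mv -t "$dest" -- with echo in a first run shows which files would be moved.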

The -exec as you wrote it is meaningless; it seems you are mixing find syntax with shell syntax (-exec as you wrote it should be passed to find).
There are probably more concise ways of doing it, but this should do what you expect:
find -name 'fsimage*' -type f | xargs ls -ali | grep 'Oct 20' | awk '{ print $NF }' | while read file; do mv "$file" /hdd/fordelete/ ; done
Nevertheless, take care not to just copy/paste things from the web that you do not really understand; you may wreck your system.

In Linux terminal, how to delete all files in a directory except one or two

In a Linux terminal, how to delete all files from a folder except one or two?
For example.
I have 100 image files in a directory and one .txt file.
I want to delete all files except that .txt file.

From within the directory, list the files, filter out all not containing 'file-to-keep', and remove all files left on the list.
ls | grep -v 'file-to-keep' | xargs rm
To avoid issues with spaces in filenames (remember to never use spaces in filenames), use find and -0 option.
find 'path' -maxdepth 1 -not -name 'file-to-keep' -print0 | xargs -0 rm
Or mixing both, use grep option -z to manage the -print0 names from find
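The combined form is not spelled out above. One possible reading, offered as a sketch rather than as the answerer's exact command, is to let find emit NUL-terminated names and filter them with grep -z. It assumes GNU grep and xargs, and delete_except is a hypothetical name.

```shell
#!/usr/bin/env bash
# delete_except DIR KEEP: delete regular files directly inside DIR whose path
# does not contain the fixed string KEEP.
# Caveat: grep sees the whole path, so KEEP must not also occur in DIR itself.
delete_except() {
    find "$1" -maxdepth 1 -type f -print0 |
        grep -z -v -F -- "$2" |   # -z: NUL-separated records, -v: invert, -F: literal match
        xargs -0 -r rm -f --      # -r: skip rm when nothing is left to delete
}

# Usage sketch: delete_except . 'file-to-keep'
```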

In general, using an inverted pattern search with grep should do the job. As you didn't define any pattern, I'd just give you a general code example:
ls -1 | grep -v 'name_of_file_to_keep.txt' | xargs rm -f
The ls -1 lists one file per line, so that grep can search line by line. grep -v is the inverted flag. So any pattern matched will NOT be deleted.
For multiple files, you may use grep -E (the equivalent of egrep):
ls -1 | grep -E -v 'not_file1.txt|not_file2.txt' | xargs rm -f
Update after question was updated:
I assume you want to delete all files in the current folder that do not end with .txt. So this should work too:
find . -maxdepth 1 -type f -not -name "*.txt" -exec rm -f {} \;

find supports a -delete option so you do not need to -exec. You can also pass multiple sets of -not -name somefile -not -name otherfile
user@host$ ls
1.txt 2.txt 3.txt 4.txt 5.txt 6.txt 7.txt 8.txt josh.pdf keepme
user@host$ find . -maxdepth 1 -type f -not -name keepme -not -name 8.txt -delete
user@host$ ls
8.txt keepme

Use the -not modifier to exclude the file(s) or pattern(s) you don't want to delete. You can change the 1 passed to -maxdepth to specify how many subdirectory levels deep you want to delete files from.
find . -maxdepth 1 -not -name "*.txt" -exec rm -f {} \;
You can also do:
find -maxdepth 1 \! -name "*.txt" -exec rm -f {} \;

In bash, you can use:
$ shopt -s extglob # Enable extended pattern matching features
$ rm !(*.txt) # Delete all files except .txt files

How to count number of files in each directory?

I am able to list all the directories by
find ./ -type d
I attempted to list the contents of each directory and count the number of files in each directory by using the following command
find ./ -type d | xargs ls -l | wc -l
But this summed the total number of lines returned by
find ./ -type d | xargs ls -l
Is there a way I can count the number of files in each directory?

This prints the file count per directory for the current directory level:
du -a | cut -d/ -f2 | sort | uniq -c | sort -nr

Assuming you have GNU find, let it find the directories and let bash do the rest:
find . -type d -print0 | while IFS= read -r -d '' dir; do
    files=("$dir"/*)
    printf "%5d files in directory %s\n" "${#files[@]}" "$dir"
done
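One caveat with a glob-based count: in an empty directory the pattern "$dir"/* stays unexpanded and is counted as one entry, and hidden files are skipped. A sketch that addresses both, assuming bash (for nullglob, dotglob and process substitution) and GNU find; count_entries is a hypothetical name.

```shell
#!/usr/bin/env bash
# count_entries ROOT: print "<count> <dir>" for ROOT and every directory below
# it, counting all entries (files, subdirectories, hidden names).
count_entries() {
    # Subshell, so the shopt changes do not leak into the calling shell.
    (
        shopt -s nullglob dotglob   # empty dir -> 0 matches; include dotfiles
        while IFS= read -r -d '' dir; do
            files=("$dir"/*)
            printf '%d %s\n' "${#files[@]}" "$dir"
        done < <(find "$1" -type d -print0)
    )
}

# Usage sketch: count_entries .
```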

find . -type f | cut -d/ -f2 | sort | uniq -c
find . -type f to find all items of the type file, in current folder and subfolders
cut -d/ -f2 to cut out their specific folder
sort to sort the list of foldernames
uniq -c to return the number of times each foldername has been counted

You could arrange to find all the files, remove the file names, leaving you a line containing just the directory name for each file, and then count the number of times each directory appears:
find . -type f |
sed 's%/[^/]*$%%' |
sort |
uniq -c
The only gotcha in this is if you have any file names or directory names containing a newline character, which is fairly unlikely. If you really have to worry about newlines in file names or directory names, I suggest you find them, and fix them so they don't contain newlines (and quietly persuade the guilty party of the error of their ways).
If you're interested in the count of the files in each sub-directory of the current directory, counting any files in any sub-directories along with the files in the immediate sub-directory, then I'd adapt the sed command to print only the top-level directory:
find . -type f |
sed -e 's%^\(\./[^/]*/\).*$%\1%' -e 's%^\.\/[^/]*$%./%' |
sort |
uniq -c
The first pattern captures the start of the name, the dot, the slash, the name up to the next slash and the slash, and replaces the line with just the first part, so:
./dir1/dir2/file1
is replaced by
./dir1/
The second substitution captures the files directly in the current directory; they don't have a slash after the leading ./, and those are replaced by ./. The sort and count then work on just the directory names.
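To see what the two substitutions do without touching the file system, the same sed expressions can be fed a few sample paths. The function name top_level_dir and the sample paths are illustrative only.

```shell
#!/usr/bin/env bash
# top_level_dir: reduce each "./a/b/c" style path on stdin to "./a/",
# and each "./file" in the current directory to "./".
top_level_dir() {
    sed -e 's%^\(\./[^/]*/\).*$%\1%' -e 's%^\.\/[^/]*$%./%'
}

# Three sample paths: two below ./dir1 and one directly in the current directory.
printf '%s\n' ./dir1/dir2/file1 ./dir1/file2 ./file3 |
    top_level_dir | sort | uniq -c
```

The result is one count line for ./dir1/ (two files) and one for ./ (one file).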

Here's one way to do it, but probably not the most efficient.
find -type d -print0 | xargs -0 -n1 bash -c 'echo -n "$1:"; ls -1 "$1" | wc -l' --
Gives output like this, with directory name followed by count of entries in that directory. Note that the output count will also include directory entries which may not be what you want.
./c/fa/l:0
./a:4
./a/c:0
./a/a:1
./a/a/b:0

Slightly modified version of Sebastian's answer using find instead of du (to exclude file-size-related overhead that du has to perform and that is never used):
find ./ -mindepth 2 -type f | cut -d/ -f2 | sort | uniq -c | sort -nr
-mindepth 2 parameter is used to exclude files in current directory. If you remove it, you'll see a bunch of lines like the following:
234 dir1
123 dir2
1 file1
1 file2
1 file3
...
1 fileN
(much like the du-based variant does)
If you do need to count the files in current directory as well, use this enhanced version:
{ find ./ -mindepth 2 -type f | cut -d/ -f2 | sort && find ./ -maxdepth 1 -type f | cut -d/ -f1; } | uniq -c | sort -nr
The output will be like the following:
234 dir1
123 dir2
42 .

Everyone else's solution has one drawback or another.
find -type d -readable -exec sh -c 'printf "%s " "$1"; ls -1UA "$1" | wc -l' sh {} ';'
Explanation:
-type d: we're interested in directories.
-readable: We only want them if it's possible to list the files in them. Note that find will still emit an error when it tries to search for more directories in them, but this prevents calling -exec for them.
-exec sh -c BLAH sh {} ';': for each directory, run this script fragment, with $0 set to sh and $1 set to the filename.
printf "%s " "$1": portably and minimally print the directory name, followed by only a space, not a newline.
ls -1UA: list the files, one per line, in directory order (to avoid stalling the pipe), excluding only the special directories . and ..
wc -l: count the lines

This can also be done with looping over ls instead of find
for f in */; do echo "$f -> $(ls "$f" | wc -l)"; done
Explanation:
for f in */; - loop over all directories
do echo "$f -> - print out each directory name
$(ls "$f" | wc -l) - call ls for this directory and count lines

This should return the directory name followed by the number of files in the directory.
findfiles() {
echo "$1" $(find "$1" -maxdepth 1 -type f | wc -l)
}
export -f findfiles
find ./ -type d -exec bash -c 'findfiles "$0"' {} \;
Example output:
./ 6
./foo 1
./foo/bar 2
./foo/bar/bazzz 0
./foo/bar/baz 4
./src 4
The export -f is required because the -exec argument of find does not allow executing a bash function unless you invoke bash explicitly, and you need to export the function defined in the current scope to the new shell explicitly.

My answer is a little different, due to the options of find, you can actually be much more flexible. Just try:
find . -type f -printf "%h\n" | sort | uniq -c
With the "%h" option to "-printf", find prints only the directory of the files it found. Then sort and count with "uniq -c". This prints the number of search result entries with the same directory, per directory.
Using further options on find, you can be much more flexible. For example, to get an overview how many files in which directory have been modified at a certain date, use:
find . -newermt "2022-01-01 00:00:00" -type f -printf "%TY-%Tm-%Td %h\n" | sort | uniq -c
This finds all files that have been modified since 1 January 2022, prints (with "-printf") the modification date and the directory, then sorts and counts them. In this example, each line in the result has the number of files, the date of modification (without time), and the directory.
Note that "-printf" is a GNU extension, so it is not available in all versions of find (BSD and macOS find lack it).
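Where -printf is missing, the directory part can be produced by a small inline sh script instead. This is a sketch, not a drop-in replacement for every -printf format; count_by_dir is a hypothetical name, and directory names containing newlines will be miscounted.

```shell
#!/usr/bin/env bash
# count_by_dir ROOT: print "<count> <directory>" for every directory under
# ROOT that directly contains at least one regular file.
count_by_dir() {
    # ${f%/*} strips the last path component, like %h does in GNU find.
    find "$1" -type f -exec sh -c 'for f; do printf "%s\n" "${f%/*}"; done' sh {} + |
        sort | uniq -c
}

# Usage sketch: count_by_dir .
```

Because of {} +, the inline script is started once per batch of files rather than once per file.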

I combined @glenn jackman's answer and @pcarvalho's answer (in the comment list; pcarvalho's version is garbled there because the backtick character '`' is treated as a formatting character).
My script accepts a path as an argument and sorts the directory list like ls -l; it also handles spaces in file names.
#!/bin/bash
OLD_IFS="$IFS"
IFS=$'\n'
for dir in $(find "${1:-.}" -maxdepth 1 -type d | sort);
do
files=("$dir"/*)
printf "%5d,%s\n" "${#files[@]}" "$dir"
done
IFS="$OLD_IFS"
My first answer in stackoverflow, and I hope it can help someone ^_^

This could be another way to browse through the directory structures and provide depth results.
find . -type d | awk '{print "echo -n \""$0" \";ls -l "$0" | grep -v total | wc -l" }' | sh

find . -type f -printf '%h\n' | sort | uniq -c
gives for example:
5 .
4 ./aln
5 ./aln/iq
4 ./bs
4 ./ft
6 ./hot

I tried with some of the others here but ended up with subfolders included in the file count when I only wanted the files. This prints ./folder/path<tab>nnn with the number of files, not including subfolders, for each subfolder in the current folder.
for d in `find . -type d -print`
do
echo -e "$d\t$(find $d -maxdepth 1 -type f -print | wc -l)"
done

This will give the overall count.
for file in */; do echo "$file -> $(ls $file | wc -l)"; done | cut -d ' ' -f 3| py --ji -l 'numpy.sum(l)'

A super fast miracle command, which recursively traverses files to count the number of images in a directory and organize the output by image extension:
find . -type f | sed -e 's/.*\.//' | sort | uniq -c | sort -n | grep -Ei '(tiff|bmp|jpeg|jpg|png|gif)$'
Credits: https://unix.stackexchange.com/a/386135/354980

I edited the script in order to exclude all node_modules directories inside the analyzed one.
This can be used to check if the project number of files is exceeding the maximum number that the file watcher can handle.
find . -type d ! -path "*node_modules*" -print0 | while IFS= read -r -d '' dir; do
    files=("$dir"/*)
    printf "%5d files in directory %s\n" "${#files[@]}" "$dir"
done
To check the maximum files that your system can watch:
cat /proc/sys/fs/inotify/max_user_watches
The node_modules folder should be added to your IDE/editor's excluded paths on slow systems, and the count of the remaining files ideally shouldn't exceed the maximum (which can be changed, though).

Easy Method:
find ./|grep "Search_file.txt" |cut -d"/" -f2|sort |uniq -c

In my case I needed the count at subfolder level, so I did:
du -a | cut -d/ -f3 | sort | uniq -c | sort -nr

Easy way to recursively count files of a given type. In this case, .jpg files in all folders under the current directory:
find . -name '*.jpg' -print | wc -l

omg why the complex commands. just use something like
find whatever_folder | wc -l

How to only get file name with Linux 'find'?

I'm using find to list all files in a directory, so I get a list of paths. However, I need only the file names, i.e. I get ./dir1/dir2/file.txt and I want to get file.txt

In GNU find you can use -printf parameter for that, e.g.:
find /dir1 -type f -printf "%f\n"

If your find doesn't have a -printf option you can also use basename:
find ./dir1 -type f -exec basename {} \;

Use -execdir which automatically holds the current file in {}, for example:
find . -type f -execdir echo '{}' ';'
You can also use $PWD instead of . (on some systems it won't produce an extra dot in the front).
If you still got an extra dot, alternatively you can run:
find . -type f -execdir basename '{}' ';'
-execdir utility [argument ...] ;
The -execdir primary is identical to the -exec primary with the exception that utility will be executed from the directory that holds the current file.
When + is used instead of ;, {} is replaced with as many pathnames as possible for each invocation of utility. In other words, it'll print all filenames in one line.

If you are using GNU find
find . -type f -printf "%f\n"
Or you can use a programming language such as Ruby(1.9+)
$ ruby -e 'Dir["**/*"].each{|x| puts File.basename(x)}'
If you fancy a bash (at least 4) solution
shopt -s globstar
for file in **; do echo "${file##*/}"; done

If you want to run some action against the filename only, using basename can be tough.
For example this:
find ~/clang+llvm-3.3/bin/ -type f -exec echo basename {} \;
will just echo basename /my/found/path. Not what we want if we want to execute on the filename.
But you can then xargs the output. for example to kill the files in a dir based on names in another dir:
cd dirIwantToRMin;
find ~/clang+llvm-3.3/bin/ -type f -exec basename {} \; | xargs rm

On mac (BSD find) use:
find /dir1 -type f -exec basename {} \;

As others have pointed out, you can combine find and basename, but by default the basename program will only operate on one path at a time, so the executable will have to be launched once for each path (using either find ... -exec or find ... | xargs -n 1), which may potentially be slow.
If you use the -a option on basename, then it can accept multiple filenames in a single invocation, which means that you can then use xargs without the -n 1, to group the paths together into a far smaller number of invocations of basename, which should be more efficient.
Example:
find /dir1 -type f -print0 | xargs -0 basename -a
Here I've included the -print0 and -0 (which should be used together), in order to cope with any whitespace inside the names of files and directories.
Here is a timing comparison, between the xargs basename -a and xargs -n1 basename versions. (For sake of a like-with-like comparison, the timings reported here are after an initial dummy run, so that they are both done after the file metadata has already been copied to I/O cache.) I have piped the output to cksum in both cases, just to demonstrate that the output is independent of the method used.
$ time sh -c 'find /usr/lib -type f -print0 | xargs -0 basename -a | cksum'
2532163462 546663
real 0m0.063s
user 0m0.058s
sys 0m0.040s
$ time sh -c 'find /usr/lib -type f -print0 | xargs -0 -n 1 basename | cksum'
2532163462 546663
real 0m14.504s
user 0m12.474s
sys 0m3.109s
As you can see, it really is substantially faster to avoid launching basename every time.
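Where basename -a is not available, the same batching effect can be had by stripping the directory part inside a single sh process per batch. A minimal sketch, assuming only a POSIX find and sh; list_basenames is a hypothetical name.

```shell
#!/usr/bin/env bash
# list_basenames ROOT: print the file name (without directories) of every
# regular file under ROOT, one per line.
list_basenames() {
    # {} + hands many paths to one sh; ${f##*/} removes everything up to the last "/".
    find "$1" -type f -exec sh -c 'for f; do printf "%s\n" "${f##*/}"; done' sh {} +
}

# Usage sketch: list_basenames /dir1
```

The parameter expansion runs inside the shell, so no external basename process is launched at all.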

Honestly basename and dirname solutions are easier, but you can also check this out :
find . -type f | grep -oP "[^/]*$"
or
find . -type f | rev | cut -d '/' -f1 | rev
or
find . -type f | sed "s/.*\///"

-exec and -execdir are slow, xargs is king.
$ alias f='time find /Applications -name "*.app" -type d -maxdepth 5'; \
f -exec basename {} \; | wc -l; \
f -execdir echo {} \; | wc -l; \
f -print0 | xargs -0 -n1 basename | wc -l; \
f -print0 | xargs -0 -n1 -P 8 basename | wc -l; \
f -print0 | xargs -0 basename | wc -l
139
0m01.17s real 0m00.20s user 0m00.93s system
139
0m01.16s real 0m00.20s user 0m00.92s system
139
0m01.05s real 0m00.17s user 0m00.85s system
139
0m00.93s real 0m00.17s user 0m00.85s system
139
0m00.88s real 0m00.12s user 0m00.75s system
xargs's parallelism also helps.
Funnily enough, I cannot explain the last case of xargs without -n1.
It gives the correct result and it's the fastest ¯\_(ツ)_/¯
(basename takes only one path argument, but without -n1 xargs will send them all (actually 5000). This does not work on Linux and OpenBSD, only macOS...)
Some bigger numbers from a linux system to see how -execdir helps, but still much slower than a parallel xargs:
$ alias f='time find /usr/ -maxdepth 5 -type d'
$ f -exec basename {} \; | wc -l; \
f -execdir echo {} \; | wc -l; \
f -print0 | xargs -0 -n1 basename | wc -l; \
f -print0 | xargs -0 -n1 -P 8 basename | wc -l
2358
3.63s real 0.10s user 0.41s system
2358
1.53s real 0.05s user 0.31s system
2358
1.30s real 0.03s user 0.21s system
2358
0.41s real 0.03s user 0.25s system

I've found a solution (on makandracards page), that gives just the newest file name:
ls -1tr * | tail -1
(thanks goes to Arne Hartherz)
I used it for cp:
cp $(ls -1tr * | tail -1) /tmp/
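Parsing ls output breaks on file names with spaces or newlines, and ls * also lists the contents of any subdirectories. A sketch of the same idea built on find, assuming GNU find, sort, head and cut; newest_file is a hypothetical name.

```shell
#!/usr/bin/env bash
# newest_file DIR: print the path of the most recently modified regular file
# directly inside DIR (no trailing newline; prints nothing if DIR has no files).
newest_file() {
    find "$1" -maxdepth 1 -type f -printf '%T@\t%p\0' |
        sort -z -k1,1rn |   # newest first
        head -z -n 1 |      # keep only the first NUL-terminated record
        cut -z -f2- |       # drop the timestamp
        tr -d '\0'          # remove the terminating NUL for use in $(...)
}

# Usage sketch: cp -- "$(newest_file .)" /tmp/
```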
