Delete all files except the newest 3 in bash script - linux

Question: How do you delete all files in a directory except the newest 3?
Finding the newest 3 files is simple:
ls -t | head -3
But I need to find all files except the newest 3 files. How do I do that, and how do I delete these files in the same line without making an unnecessary for loop for that?
I'm using Debian Wheezy and bash scripts for this.

This will list all files except the newest three:
ls -t | tail -n +4
This will delete those files:
ls -t | tail -n +4 | xargs rm --
This will also list dotfiles:
ls -At | tail -n +4
and delete with dotfiles:
ls -At | tail -n +4 | xargs rm --
But beware: parsing ls can be dangerous when the filenames contain funny characters like newlines or spaces. If you are certain that your filenames do not contain funny characters then parsing ls is quite safe, even more so if it is a one time only script.
If you are developing a script for repeated use then you should most certainly not parse the output of ls and use the methods described here: http://mywiki.wooledge.org/ParsingLs

Solution without problems with "ls" (strange named files)
This is a combination of ceving's and anubhava's answer.
Both solutions are not working for me. Because I was looking for a script that should run every day for backing up files in an archive, I wanted to avoid problems with ls (someone could have saved some funny named file in my backup folder). So I modified the mentioned solutions to fit my needs.
My solution deletes all files, except the three newest files.
find . -type f -printf '%T#\t%p\n' |
sort -t $'\t' -g |
head -n -3 |
cut -d $'\t' -f 2- |
xargs rm
Some explanation:
find lists all files (not directories) in current folder. They are printed out with timestamps.
sort sorts the lines based on timestamp (oldest on top).
head prints out the top lines, up to the last 3 lines.
cut removes the timestamps.
xargs runs rm for every selected file.
For you to verify my solution:
(
touch -d "6 days ago" test_6_days_old
touch -d "7 days ago" test_7_days_old
touch -d "8 days ago" test_8_days_old
touch -d "9 days ago" test_9_days_old
touch -d "10 days ago" test_10_days_old
)
This creates 5 files with different timestamps in the current folder. Run this script first and then the code for deleting old files.

The following looks a bit complicated, but is very cautious to be correct, even with unusual or intentionally malicious filenames. Unfortunately, it requires GNU tools:
count=0
while IFS= read -r -d ' ' && IFS= read -r -d '' filename; do
(( ++count > 3 )) && printf '%s\0' "$filename"
done < <(find . -maxdepth 1 -type f -printf '%T# %P\0' | sort -g -z) \
| xargs -0 rm -f --
Explaining how this works:
Find emits <mtime> <filename><NUL> for each file in the current directory.
sort -g -z does a general (floating-point, as opposed to integer) numeric sort based on the first column (times) with the lines separated by NULs.
The first read in the while loop strips off the mtime (no longer needed after sort is done).
The second read in the while loop reads the filename (running until the NUL).
The loop increments, and then checks, a counter; if the counter's state indicates that we're past the initial skipping, then we print the filename, delimited by a NUL.
xargs -0 then appends that filename into the argv list it's collecting to invoke rm with.

ls -t | tail -n +4 | xargs -I {} rm {}
If you want a 1 liner

In zsh:
rm /files/to/delete/*(Om[1,-4])
If you want to include dotfiles, replace the parenthesized part with (Om[1,-4]D).
I think this works correctly with arbitrary chars in the filenames (just checked with newline).
Explanation: The parentheses contain Glob Qualifiers. O means "order by, descending", m means mtime (See man zshexpn for other sorting keys - large manpage; search for "be sorted"). [1,-4] returns only the matches at one-based index 1 to (last + 1 - 4) (note the -4 for deleting all but 3).

Don't use ls -t as it is unsafe for filenames that may contain whitespaces or special glob characters.
You can do this using all gnu based utilities to delete all but 3 newest files in the current directory:
find . -maxdepth 1 -type f -printf '%T#\t%p\0' |
sort -z -nrk1 |
tail -z -n +4 |
cut -z -f2- |
xargs -0 rm -f --

ls -t | tail -n +4 | xargs -I {} rm {}
Michael Ballent's answer works best as
ls -t | tail -n +4 | xargs rm --
throw me error if I have less than 3 file

Recursive script with arbitrary num of files to keep per-directory
Also handles files/dirs with spaces, newlines and other odd characters
#!/bin/bash
if (( $# != 2 )); then
echo "Usage: $0 </path/to/top-level/dir> <num files to keep per dir>"
exit
fi
while IFS= read -r -d $'\0' dir; do
# Find the nth oldest file
nthOldest=$(find "$dir" -maxdepth 1 -type f -printf '%T#\0%p\n' | sort -t '\0' -rg \
| awk -F '\0' -v num="$2" 'NR==num+1{print $2}')
if [[ -f "$nthOldest" ]]; then
find "$dir" -maxdepth 1 -type f ! -newer "$nthOldest" -exec rm {} +
fi
done < <(find "$1" -type d -print0)
Proof of concept
$ tree test/
test/
├── sub1
│   ├── sub1_0_days_old.txt
│   ├── sub1_1_days_old.txt
│   ├── sub1_2_days_old.txt
│   ├── sub1_3_days_old.txt
│   └── sub1\ 4\ days\ old\ with\ spaces.txt
├── sub2\ with\ spaces
│   ├── sub2_0_days_old.txt
│   ├── sub2_1_days_old.txt
│   ├── sub2_2_days_old.txt
│   └── sub2\ 3\ days\ old\ with\ spaces.txt
└── tld_0_days_old.txt
2 directories, 10 files
$ ./keepNewest.sh test/ 2
$ tree test/
test/
├── sub1
│   ├── sub1_0_days_old.txt
│   └── sub1_1_days_old.txt
├── sub2\ with\ spaces
│   ├── sub2_0_days_old.txt
│   └── sub2_1_days_old.txt
└── tld_0_days_old.txt
2 directories, 5 files

As an extension to the answer by flohall. If you want to remove all folders except the newest three folders use the following:
find . -maxdepth 1 -mindepth 1 -type d -printf '%T#\t%p\n' |
sort -t $'\t' -g |
head -n -3 |
cut -d $'\t' -f 2- |
xargs rm -rf
The -mindepth 1 will ignore the parent folder and -maxdepth 1 subfolders.

This uses find instead of ls with a Schwartzian transform.
find . -type f -printf '%T#\t%p\n' |
sort -t $'\t' -g |
tail -3 |
cut -d $'\t' -f 2-
find searches the files and decorates them with a time stamp and uses the tabulator to separate the two values. sort splits the input by the tabulator and performs a general numeric sort, which sorts floating point numbers correctly. tail should be obvious and cut undecorates.
The problem with decorations in general is to find a suitable delimiter, which is not part of the input, the file names. This answer uses the NULL character.

Below worked for me:
rm -rf $(ll -t | tail -n +5 | awk '{ print $9}')

Related

Find directories where a text is found in a specific file

How can I find the directories where a text is found in a specific file? E.g. I want to get all the directories in "/var/www/" that contain the text "foo-bundle" in the composer.json file. I have a command that already does it:
find ./ -maxdepth 2 -type f -print | grep -i 'composer.json' | xargs grep -i '"foo-bundle"'
However I want to make an sh script that gets all those directories and do things with them. Any idea?
find
Your current command is almost there, instead off using xargs with grep, lets:
Move the grep to an -exec
Use xargs to pass the result to dirname to show only the parent folder
find ./ -maxdepth 2 -type f -exec grep -l "foo-bundle" {} /dev/null \; | xargs dirname
If you only want to search for composer.json files, we can include the -iname option like so:
find ./ -maxdepth 2 -type f -iname '*composer.json' -exec grep -l "foo-bundle" {} /dev/null \; | xargs dirname
If the | xargs dirname doesn't give enough data, we can extend it so we can loop over the results of find using a while read like so:
find ./ -maxdepth 2 -type f -iname '*composer.json' -exec grep -l "foo-bundle" {} /dev/null \; | while read -r line ; do
parent="$(dirname ${line%%:*})"
echo "$parent"
done
grep
We can use grep to search for all files containing a specific text.
After looping over each line, we can
Remove behind the : to get the filepath
Use dirname to get the parent folder path
Consider this file setup, were /test/b/composer.json contains foo-bundle
➜ /tmp tree
.
├── test
│   ├── a
│   │   └── composer.json
│   └── b
│   └── composer.json
└── test.sh
When running the following test.sh:
#!/bin/bash
grep -rw '/tmp/test' --include '*composer.json' -e 'foo-bundle' | while read -r line ; do
parent="$(dirname ${line%:*})"
echo "$parent"
done
The result is as expected, the path to folder b:
/tmp/test/b
In order to find all files, containing a particular piece of text, you can use:
find ./ -maxdepth 2 -type f -exec grep -l "composer.json" {} /dev/null \;
The result is a list of filenames. Now all you need to do is to get a way to launch the command dirname on all of them. (I tried using a simple pipe, but that would have been too easy :-) )
Thanks to #0stone0 for leading the way. I finally got it with:
#!/bin/sh
find /var/www -maxdepth 2 -type f -print | grep -i 'composer.json' | xargs grep -i 'foo-bundle' | while read -r line ; do
parent="$(dirname ${line%%:*})"
echo "$parent"
done

How to get occurrences of word in all files? But with count of the words per directory instead of single number

I would like to get given word count in all the files but per directory instead of a single count. I do get the word count with simple grep foo error*.log | wc -l by going to a specific directory. I would like to get the word count per directory when the directory structure is like below.
Directory tree
.
├── dir1
│   └── error2.log
└── error1.log
└── dir2
└── error_123.log
└── error_234.log
── dir3
└── error_12345.log
└── error_23554.log
Update: The following command can be used on AIX:
#!/bin/bash
for name in /path/to/folder/* ; do
if [ ! -d "${name}" ] ; then
continue
fi
# See: https://unix.stackexchange.com/a/398414/45365
count="$(cat "${name}"/error*.log | tr '[:space:]' '[\n*]' | grep -c 'SEARCH')"
printf "%s %s\n" "${name}" "${count}"
done
On GNU/Linux, with GNU findutils and GNU grep:
find /path/to/folder -maxdepth 1 -type d \
-printf "%p " -exec bash -c 'grep -ro 'SEARCH' {} | wc -l' \;
Replace SEARCH by the actual search term.

How to count number of files in each directory?

I am able to list all the directories by
find ./ -type d
I attempted to list the contents of each directory and count the number of files in each directory by using the following command
find ./ -type d | xargs ls -l | wc -l
But this summed the total number of lines returned by
find ./ -type d | xargs ls -l
Is there a way I can count the number of files in each directory?
This prints the file count per directory for the current directory level:
du -a | cut -d/ -f2 | sort | uniq -c | sort -nr
Assuming you have GNU find, let it find the directories and let bash do the rest:
find . -type d -print0 | while read -d '' -r dir; do
files=("$dir"/*)
printf "%5d files in directory %s\n" "${#files[#]}" "$dir"
done
find . -type f | cut -d/ -f2 | sort | uniq -c
find . -type f to find all items of the type file, in current folder and subfolders
cut -d/ -f2 to cut out their specific folder
sort to sort the list of foldernames
uniq -c to return the number of times each foldername has been counted
You could arrange to find all the files, remove the file names, leaving you a line containing just the directory name for each file, and then count the number of times each directory appears:
find . -type f |
sed 's%/[^/]*$%%' |
sort |
uniq -c
The only gotcha in this is if you have any file names or directory names containing a newline character, which is fairly unlikely. If you really have to worry about newlines in file names or directory names, I suggest you find them, and fix them so they don't contain newlines (and quietly persuade the guilty party of the error of their ways).
If you're interested in the count of the files in each sub-directory of the current directory, counting any files in any sub-directories along with the files in the immediate sub-directory, then I'd adapt the sed command to print only the top-level directory:
find . -type f |
sed -e 's%^\(\./[^/]*/\).*$%\1%' -e 's%^\.\/[^/]*$%./%' |
sort |
uniq -c
The first pattern captures the start of the name, the dot, the slash, the name up to the next slash and the slash, and replaces the line with just the first part, so:
./dir1/dir2/file1
is replaced by
./dir1/
The second replace captures the files directly in the current directory; they don't have a slash at the end, and those are replace by ./. The sort and count then works on just the number of names.
Here's one way to do it, but probably not the most efficient.
find -type d -print0 | xargs -0 -n1 bash -c 'echo -n "$1:"; ls -1 "$1" | wc -l' --
Gives output like this, with directory name followed by count of entries in that directory. Note that the output count will also include directory entries which may not be what you want.
./c/fa/l:0
./a:4
./a/c:0
./a/a:1
./a/a/b:0
Slightly modified version of Sebastian's answer using find instead of du (to exclude file-size-related overhead that du has to perform and that is never used):
find ./ -mindepth 2 -type f | cut -d/ -f2 | sort | uniq -c | sort -nr
-mindepth 2 parameter is used to exclude files in current directory. If you remove it, you'll see a bunch of lines like the following:
234 dir1
123 dir2
1 file1
1 file2
1 file3
...
1 fileN
(much like the du-based variant does)
If you do need to count the files in current directory as well, use this enhanced version:
{ find ./ -mindepth 2 -type f | cut -d/ -f2 | sort && find ./ -maxdepth 1 -type f | cut -d/ -f1; } | uniq -c | sort -nr
The output will be like the following:
234 dir1
123 dir2
42 .
Everyone else's solution has one drawback or another.
find -type d -readable -exec sh -c 'printf "%s " "$1"; ls -1UA "$1" | wc -l' sh {} ';'
Explanation:
-type d: we're interested in directories.
-readable: We only want them if it's possible to list the files in them. Note that find will still emit an error when it tries to search for more directories in them, but this prevents calling -exec for them.
-exec sh -c BLAH sh {} ';': for each directory, run this script fragment, with $0 set to sh and $1 set to the filename.
printf "%s " "$1": portably and minimally print the directory name, followed by only a space, not a newline.
ls -1UA: list the files, one per line, in directory order (to avoid stalling the pipe), excluding only the special directories . and ..
wc -l: count the lines
This can also be done with looping over ls instead of find
for f in */; do echo "$f -> $(ls $f | wc -l)"; done
Explanation:
for f in */; - loop over all directories
do echo "$f -> - print out each directory name
$(ls $f | wc -l) - call ls for this directory and count lines
This should return the directory name followed by the number of files in the directory.
findfiles() {
echo "$1" $(find "$1" -maxdepth 1 -type f | wc -l)
}
export -f findfiles
find ./ -type d -exec bash -c 'findfiles "$0"' {} \;
Example output:
./ 6
./foo 1
./foo/bar 2
./foo/bar/bazzz 0
./foo/bar/baz 4
./src 4
The export -f is required because the -exec argument of find does not allow executing a bash function unless you invoke bash explicitly, and you need to export the function defined in the current scope to the new shell explicitly.
My answer is a little different, due to the options of find, you can actually be much more flexible. Just try:
find . -type f -printf "%h\n" | sort | uniq -c
With the "%h" option to "-printf", find prints only the directory of the files it found. Then sort and count with "uniq -c". This prints the number of search result entries with the same directory, per directory.
Using further options on find, you can be much more flexible. For example, to get an overview how many files in which directory have been modified at a certain date, use:
find . -newermt "2022-01-01 00:00:00" -type f -printf "%TY-%Tm-%Td %h\n" | sort | uniq -c
This finds all files that have been modified since 1. January 2022, prints (with "-printf") the modification date and the directory, then sorts and counts them. In this example, each line in the result has the number of files, the date of modification (without time), and the directory.
Note that "-printf" may not be available in all versions of find I think.
I combined #glenn jackman's answer and #pcarvalho's answer(in comment list, there is something wrong with pcarvalho's answer because the extra style control function of character '`'(backtick)).
My script can accept path as an augument and sort the directory list as ls -l, also it can handles the problem of "space in file name".
#!/bin/bash
OLD_IFS="$IFS"
IFS=$'\n'
for dir in $(find $1 -maxdepth 1 -type d | sort);
do
files=("$dir"/*)
printf "%5d,%s\n" "${#files[#]}" "$dir"
done
FS="$OLD_IFS"
My first answer in stackoverflow, and I hope it can help someone ^_^
THis could be another way to browse through the directory structures and provide depth results.
find . -type d | awk '{print "echo -n \""$0" \";ls -l "$0" | grep -v total | wc -l" }' | sh
find . -type f -printf '%h\n' | sort | uniq -c
gives for example:
5 .
4 ./aln
5 ./aln/iq
4 ./bs
4 ./ft
6 ./hot
I tried with some of the others here but ended up with subfolders included in the file count when I only wanted the files. This prints ./folder/path<tab>nnn with the number of files, not including subfolders, for each subfolder in the current folder.
for d in `find . -type d -print`
do
echo -e "$d\t$(find $d -maxdepth 1 -type f -print | wc -l)"
done
This will give the overall count.
for file in */; do echo "$file -> $(ls $file | wc -l)"; done | cut -d ' ' -f 3| py --ji -l 'numpy.sum(l)'
A super fast miracle command, which recursively traverses files to count the number of images in a directory and organize the output by image extension:
find . -type f | sed -e 's/.*\.//' | sort | uniq -c | sort -n | grep -Ei '(tiff|bmp|jpeg|jpg|png|gif)$'
Credits: https://unix.stackexchange.com/a/386135/354980
I edited the script in order to exclude all node_modules directories inside the analyzed one.
This can be used to check if the project number of files is exceeding the maximum number that the file watcher can handle.
find . -type d ! -path "*node_modules*" -print0 | while read -d '' -r dir; do
files=("$dir"/*)
printf "%5d files in directory %s\n" "${#files[#]}" "$dir"
done
To check the maximum files that your system can watch:
cat /proc/sys/fs/inotify/max_user_watches
node_modules folder should be added to your IDE/editor excluded paths in slow systems, and the other files count shouldn't ideally exceed the maximum (which can be changed though).
Easy Method:
find ./|grep "Search_file.txt" |cut -d"/" -f2|sort |uniq -c
In my case I needed the count at subfolder level, so I did:
du -a | cut -d/ -f3 | sort | uniq -c | sort -nr
Easy way to recursively find files of a given type. In this case, .jpg files for all folders in current directory:
find . -name *.jpg -print | wc -l
omg why the complex commands. just use something like
find whatever_folder | wc -l

Recursively counting files in a Linux directory

How can I recursively count files in a Linux directory?
I found this:
find DIR_NAME -type f ¦ wc -l
But when I run this it returns the following error.
find: paths must precede expression: ¦
This should work:
find DIR_NAME -type f | wc -l
Explanation:
-type f to include only files.
| (and not ¦) redirects find command's standard output to wc command's standard input.
wc (short for word count) counts newlines, words and bytes on its input (docs).
-l to count just newlines.
Notes:
Replace DIR_NAME with . to execute the command in the current folder.
You can also remove the -type f to include directories (and symlinks) in the count.
It's possible this command will overcount if filenames can contain newline characters.
Explanation of why your example does not work:
In the command you showed, you do not use the "Pipe" (|) to kind-of connect two commands, but the broken bar (¦) which the shell does not recognize as a command or something similar. That's why you get that error message.
For the current directory:
find -type f | wc -l
If you want a breakdown of how many files are in each dir under your current dir:
for i in */ .*/ ; do
echo -n $i": " ;
(find "$i" -type f | wc -l) ;
done
That can go all on one line, of course. The parenthesis clarify whose output wc -l is supposed to be watching (find $i -type f in this case).
On my computer, rsync is a little bit faster than find | wc -l in the accepted answer:
$ rsync --stats --dry-run -ax /path/to/dir /tmp
Number of files: 173076
Number of files transferred: 150481
Total file size: 8414946241 bytes
Total transferred file size: 8414932602 bytes
The second line has the number of files, 150,481 in the above example. As a bonus you get the total size as well (in bytes).
Remarks:
the first line is a count of files, directories, symlinks, etc all together, that's why it is bigger than the second line.
the --dry-run (or -n for short) option is important to not actually transfer the files!
I used the -x option to "don't cross filesystem boundaries", which means if you execute it for / and you have external hard disks attached, it will only count the files on the root partition.
You can use
$ tree
after installing the tree package with
$ sudo apt-get install tree
(on a Debian / Mint / Ubuntu Linux machine).
The command shows not only the count of the files, but also the count of the directories, separately. The option -L can be used to specify the maximum display level (which, by default, is the maximum depth of the directory tree).
Hidden files can be included too by supplying the -a option .
Since filenames in UNIX may contain newlines (yes, newlines), wc -l might count too many files. I would print a dot for every file and then count the dots:
find DIR_NAME -type f -printf "." | wc -c
Note: The -printf option does only work with find from GNU findutils. You may need to install it, on a Mac for example.
Combining several of the answers here together, the most useful solution seems to be:
find . -maxdepth 1 -type d -print0 |
xargs -0 -I {} sh -c 'echo -e $(find "{}" -printf "\n" | wc -l) "{}"' |
sort -n
It can handle odd things like file names that include spaces parenthesis and even new lines. It also sorts the output by the number of files.
You can increase the number after -maxdepth to get sub directories counted too. Keep in mind that this can potentially take a long time, particularly if you have a highly nested directory structure in combination with a high -maxdepth number.
If you want to know how many files and sub-directories exist from the present working directory you can use this one-liner
find . -maxdepth 1 -type d -print0 | xargs -0 -I {} sh -c 'echo -e $(find {} | wc -l) {}' | sort -n
This will work in GNU flavour, and just omit the -e from the echo command for BSD linux (e.g. OSX).
You can use the command ncdu. It will recursively count how many files a Linux directory contains. Here is an example of output:
It has a progress bar, which is convenient if you have many files:
To install it on Ubuntu:
sudo apt-get install -y ncdu
Benchmark: I used https://archive.org/details/cv_corpus_v1.tar (380390 files, 11 GB) as the folder where one has to count the number of files.
find . -type f | wc -l: around 1m20s to complete
ncdu: around 1m20s to complete
If what you need is to count a specific file type recursively, you can do:
find YOUR_PATH -name '*.html' -type f | wc -l
-l is just to display the number of lines in the output.
If you need to exclude certain folders, use -not -path
find . -not -path './node_modules/*' -name '*.js' -type f | wc -l
tree $DIR_PATH | tail -1
Sample Output:
5309 directories, 2122 files
If you want to avoid error cases, don't allow wc -l to see files with newlines (which it will count as 2+ files)
e.g. Consider a case where we have a single file with a single EOL character in it
> mkdir emptydir && cd emptydir
> touch $'file with EOL(\n) character in it'
> find -type f
./file with EOL(?) character in it
> find -type f | wc -l
2
Since at least gnu wc does not appear to have an option to read/count a null terminated list (except from a file), the easiest solution would just be to not pass it filenames, but a static output each time a file is found, e.g. in the same directory as above
> find -type f -exec printf '\n' \; | wc -l
1
Or if your find supports it
> find -type f -printf '\n' | wc -l
1
To determine how many files there are in the current directory, put in ls -1 | wc -l. This uses wc to do a count of the number of lines (-l) in the output of ls -1. It doesn't count dotfiles. Please note that ls -l (that's an "L" rather than a "1" as in the previous examples) which I used in previous versions of this HOWTO will actually give you a file count one greater than the actual count. Thanks to Kam Nejad for this point.
If you want to count only files and NOT include symbolic links (just an example of what else you could do), you could use ls -l | grep -v ^l | wc -l (that's an "L" not a "1" this time, we want a "long" listing here). grep checks for any line beginning with "l" (indicating a link), and discards that line (-v).
Relative speed: "ls -1 /usr/bin/ | wc -l" takes about 1.03 seconds on an unloaded 486SX25 (/usr/bin/ on this machine has 355 files). "ls -l /usr/bin/ | grep -v ^l | wc -l" takes about 1.19 seconds.
Source: http://www.tldp.org/HOWTO/Bash-Prompt-HOWTO/x700.html
With bash:
Create an array of entries with ( ) and get the count with #.
FILES=(./*); echo ${#FILES[#]}
Ok that doesn't recursively count files but I wanted to show the simple option first. A common use case might be for creating rollover backups of a file. This will create logfile.1, logfile.2, logfile.3 etc.
CNT=(./logfile*); mv logfile logfile.${#CNT[#]}
Recursive count with bash 4+ globstar enabled (as mentioned by #tripleee)
FILES=(**/*); echo ${#FILES[#]}
To get the count of files recursively we can still use find in the same way.
FILES=(`find . -type f`); echo ${#FILES[#]}
For directories with spaces in the name ... (based on various answers above) -- recursively print directory name with number of files within:
find . -mindepth 1 -type d -print0 | while IFS= read -r -d '' i ; do echo -n $i": " ; ls -p "$i" | grep -v / | wc -l ; done
Example (formatted for readability):
pwd
/mnt/Vancouver/Programming/scripts/claws/corpus
ls -l
total 8
drwxr-xr-x 2 victoria victoria 4096 Mar 28 15:02 'Catabolism - Autophagy; Phagosomes; Mitophagy'
drwxr-xr-x 3 victoria victoria 4096 Mar 29 16:04 'Catabolism - Lysosomes'
ls 'Catabolism - Autophagy; Phagosomes; Mitophagy'/ | wc -l
138
## 2 dir (one with 28 files; other with 1 file):
ls 'Catabolism - Lysosomes'/ | wc -l
29
The directory structure is better visualized using tree:
tree -L 3 -F .
.
├── Catabolism - Autophagy; Phagosomes; Mitophagy/
│   ├── 1
│   ├── 10
│   ├── [ ... SNIP! (138 files, total) ... ]
│   ├── 98
│   └── 99
└── Catabolism - Lysosomes/
├── 1
├── 10
├── [ ... SNIP! (28 files, total) ... ]
├── 8
├── 9
└── aaa/
└── bbb
3 directories, 167 files
man find | grep mindep
-mindepth levels
Do not apply any tests or actions at levels less than levels
(a non-negative integer). -mindepth 1 means process all files
except the starting-points.
ls -p | grep -v / (used below) is from answer 2 at https://unix.stackexchange.com/questions/48492/list-only-regular-files-but-not-directories-in-current-directory
find . -mindepth 1 -type d -print0 | while IFS= read -r -d '' i ; do echo -n $i": " ; ls -p "$i" | grep -v / | wc -l ; done
./Catabolism - Autophagy; Phagosomes; Mitophagy: 138
./Catabolism - Lysosomes: 28
./Catabolism - Lysosomes/aaa: 1
Applcation: I want to find the max number of files among several hundred directories (all depth = 1) [output below again formatted for readability]:
date; pwd
Fri Mar 29 20:08:08 PDT 2019
/home/victoria/Mail/2_RESEARCH - NEWS
time find . -mindepth 1 -type d -print0 | while IFS= read -r -d '' i ; do echo -n $i": " ; ls -p "$i" | grep -v / | wc -l ; done > ../../aaa
0:00.03
[victoria#victoria 2_RESEARCH - NEWS]$ head -n5 ../../aaa
./RNA - Exosomes: 26
./Cellular Signaling - Receptors: 213
./Catabolism - Autophagy; Phagosomes; Mitophagy: 138
./Stress - Physiological, Cellular - General: 261
./Ancient DNA; Ancient Protein: 34
[victoria#victoria 2_RESEARCH - NEWS]$ sed -r 's/(^.*): ([0-9]{1,8}$)/\2: \1/g' ../../aaa | sort -V | (head; echo ''; tail)
0: ./Genomics - Gene Drive
1: ./Causality; Causal Relationships
1: ./Cloning
1: ./GenMAPP 2
1: ./Pathway Interaction Database
1: ./Wasps
2: ./Cellular Signaling - Ras-MAPK Pathway
2: ./Cell Death - Ferroptosis
2: ./Diet - Apples
2: ./Environment - Waste Management
988: ./Genomics - PPM (Personalized & Precision Medicine)
1113: ./Microbes - Pathogens, Parasites
1418: ./Health - Female
1420: ./Immunity, Inflammation - General
1522: ./Science, Research - Miscellaneous
1797: ./Genomics
1910: ./Neuroscience, Neurobiology
2740: ./Genomics - Functional
3943: ./Cancer
4375: ./Health - Disease
sort -V is a natural sort. ... So, my max number of files in any of those (Claws Mail) directories is 4375 files. If I left-pad (https://stackoverflow.com/a/55409116/1904943) those filenames -- they are all named numerically, starting with 1, in each directory -- and pad to 5 total digits, I should be ok.
Addendum
Find the total number of files, subdirectories in a directory.
$ date; pwd
Tue 14 May 2019 04:08:31 PM PDT
/home/victoria/Mail/2_RESEARCH - NEWS
$ ls | head; echo; ls | tail
Acoustics
Ageing
Ageing - Calorie (Dietary) Restriction
Ageing - Senescence
Agriculture, Aquaculture, Fisheries
Ancient DNA; Ancient Protein
Anthropology, Archaeology
Ants
Archaeology
ARO-Relevant Literature, News
Transcriptome - CAGE
Transcriptome - FISSEQ
Transcriptome - RNA-seq
Translational Science, Medicine
Transposons
USACEHR-Relevant Literature
Vaccines
Vision, Eyes, Sight
Wasps
Women in Science, Medicine
$ find . -type f | wc -l
70214 ## files
$ find . -type d | wc -l
417 ## subdirectories
There are many correct answers here. Here's another!
find . -type f | sort | uniq -w 10 -c
where . is the folder to look in and 10 is the number of characters by which to group the directory.
I have written ffcnt to speed up recursive file counting under specific circumstances: rotational disks and filesystems that support extent mapping.
It can be an order of magnitude faster than ls or find based approaches, but YMMV.
suppose you want a per directory total files, try:
for d in `find YOUR_SUBDIR_HERE -type d`; do
printf "$d - files > "
find $d -type f | wc -l
done
for current dir try this:
for d in `find . -type d`; do printf "$d - files > "; find $d -type f | wc -l; done;
if you have long space names you need change IFS, like this:
OIFS=$IFS; IFS=$'\n'
for d in `find . -type d`; do printf "$d - files > "; find $d -type f | wc -l; done
IFS=$OIFS
We can use tree command it displays all the files and folders recursively. As well as it displays count of folders and files in last line of output.
$ tree path/to/folder/
path/to/folder/
├── a-first.html
├── b-second.html
├── subfolder
│ ├── readme.html
│ ├── code.cpp
│ └── code.h
└── z-last-file.html
1 directories, 6 files
For only last line of output in tree command we can use tail command on it's output
$ tree path/to/folder/ | tail -1
1 directories, 6 files
for installing tree we can use below command
$ sudo apt-get install tree
This alternate approach with filtering for format counts all available grub kernel modules:
ls -l /boot/grub/*.mod | wc -l
Based on the responses given above and comments, I've came up with the following file count listing. Especially it's a combination of the solution provided by #Greg Bell, with comments from #Arch Stanton
& #Schneems
Count all files in the current directory & subdirectories
function countit { find . -maxdepth 1000000 -type d -print0 | while IFS= read -r -d '' i ; do file_count=$(find "$i" -type f | wc -l) ; echo "$file_count: $i" ; done }; countit | sort -n -r >file-count.txt
Count all files of given name in the current directory & subdirectories
function countit { find . -maxdepth 1000000 -type d -print0 | while IFS= read -r -d '' i ; do file_count=$(find "$i" -type f | grep <enter_filename_here> | wc -l) ; echo "$file_count: $i" ; done }; countit | sort -n -r >file-with-name-count.txt
find -type f | wc -l
OR (If directory is current directory)
find . -type f | wc -l
This will work completely fine. Simple short. If you want to count the number of files present in a folder.
ls | wc -l
ls -l | grep -e -x -e -dr | wc -l
long list
filter files and dirs
count the filtered line no

Shell script to count files, then remove oldest files

I am new to shell scripting, so I need some help here. I have a directory that fills up with backups. If I have more than 10 backup files, I would like to remove the oldest files, so that the 10 newest backup files are the only ones that are left.
So far, I know how to count the files, which seems easy enough, but how do I then remove the oldest files, if the count is over 10?
if [ls /backups | wc -l > 10]
then
echo "More than 10"
fi
Try this:
ls -t | sed -e '1,10d' | xargs -d '\n' rm
This should handle all characters (except newlines) in a file name.
What's going on here?
ls -t lists all files in the current directory in decreasing order of modification time. Ie, the most recently modified files are first, one file name per line.
sed -e '1,10d' deletes the first 10 lines, ie, the 10 newest files. I use this instead of tail because I can never remember whether I need tail -n +10 or tail -n +11.
xargs -d '\n' rm collects each input line (without the terminating newline) and passes each line as an argument to rm.
As with anything of this sort, please experiment in a safe place.
find is the common tool for this kind of task :
find ./my_dir -mtime +10 -type f -delete
EXPLANATIONS
./my_dir your directory (replace with your own)
-mtime +10 older than 10 days
-type f only files
-delete no surprise. Remove it to test your find filter before executing the whole command
And take care that ./my_dir exists to avoid bad surprises !
Make sure your pwd is the correct directory to delete the files then(assuming only regular characters in the filename):
ls -A1t | tail -n +11 | xargs rm
keeps the newest 10 files. I use this with camera program 'motion' to keep the most recent frame grab files. Thanks to all proceeding answers because you showed me how to do it.
The proper way to do this type of thing is with logrotate.
I like the answers from #Dennis Williamson and #Dale Hagglund. (+1 to each)
Here's another way to do it using find (with the -newer test) that is similar to what you started with.
This was done in bash on cygwin...
if [[ $(ls /backups | wc -l) > 10 ]]
then
find /backups ! -newer $(ls -t | sed '11!d') -exec rm {} \;
fi
Straightforward file counter:
max=12
n=0
ls -1t *.dat |
while read file; do
n=$((n+1))
if [[ $n -gt $max ]]; then
rm -f "$file"
fi
done
I just found this topic and the solution from mikecolley helped me in a first step. As I needed a solution for a single line homematic (raspberrymatic) script, I ran into a problem that this command only gave me the fileames and not the whole path which is needed for "rm". My used CUxD Exec command can not start in a selected folder.
So here is my solution:
ls -A1t $(find /media/usb0/backup/ -type f -name homematic-raspi*.sbk) | tail -n +11 | xargs rm
Explaining:
find /media/usb0/backup/ -type f -name homematic-raspi*.sbk searching only files -type f whiche are named like -name homematic-raspi*.sbk (case sensitive) or use -iname (case insensitive) in folder /media/usb0/backup/
ls -A1t $(...) list the files given by find without files starting with "." or ".." -A sorted by mtime -t and with a return of only one column -1
tail -n +11 return of only the last 10 -n +11 lines for following rm
xargs rm and finally remove the raiming files in the list
Maybe this helps others from longer searching and makes the solution more flexible.
stat -c "%Y %n" * | sort -rn | head -n +10 | \
cut -d ' ' -f 1 --complement | xargs -d '\n' rm
Breakdown: Get last-modified times for each file (in the format "time filename"), sort them from oldest to newest, keep all but the last ten entries, and then keep all but the first field (keep only the filename portion).
Edit: Using cut instead of awk since the latter is not always available
Edit 2: Now handles filenames with spaces
On a very limited chroot environment, we had only a couple of programs available to achieve what was initially asked. We solved it that way:
MIN_FILES=5
FILE_COUNT=$(ls -l | grep -c ^d )
if [ $MIN_FILES -lt $FILE_COUNT ]; then
while [ $MIN_FILES -lt $FILE_COUNT ]; do
FILE_COUNT=$[$FILE_COUNT-1]
FILE_TO_DEL=$(ls -t | tail -n1)
# be careful with this one
rm -rf "$FILE_TO_DEL"
done
fi
Explanation:
FILE_COUNT=$(ls -l | grep -c ^d ) counts all files in the current folder. Instead of grep we could use also wc -l but wc was not installed on that host.
FILE_COUNT=$[$FILE_COUNT-1] update the current $FILE_COUNT
FILE_TO_DEL=$(ls -t | tail -n1) Save the oldest file name in the $FILE_TO_DEL variable. tail -n1 returns the last element in the list.
Based on others suggestions and some awk foo, I got this to work. I know this an old thread, but I didn't find a decent answer here and this sorted it for me. This just deletes the oldest file, but you can change the head -n 1 to 10 and get the oldest 10.
find $DIR -type f -printf '%T+ %p\n' | sort | head -n 1 | awk '{first =$1; $1 =""; print $0}' | xargs -d '\n' rm
Using inode numbers via stat & find command (to avoid pesky-chars-in-file-name issues):
stat -f "%m %i" * | sort -rn -k 1,1 | tail -n +11 | cut -d " " -f 2 | \
xargs -n 1 -I '{}' find "$(pwd)" -type f -inum '{}' -print
#stat -f "%m %i" * | sort -rn -k 1,1 | tail -n +11 | cut -d " " -f 2 | \
# xargs -n 1 -I '{}' find "$(pwd)" -type f -inum '{}' -delete

Resources