piping empty find result to du through xargs results in unexpected behavior - linux

I came up with a command to find files and print their sizes using find, xargs, and du. The problem appears when I search for something that does not exist: with the xargs method, du reports all the folders, but I expect it to report nothing because nothing was found. The -exec method behaves correctly, but from what I have read and observed on larger searches it is less efficient, because it runs du once per file found instead of once on the whole group of files. See the section that mentions -delete: http://content.hccfl.edu/pollock/unix/findcmd.htm
Here is an example. First, this is what is in the directories:
ls
bar_dir/ test1.foo test2.foo test3.foo
ls bar_dir
test1.bar test2.bar test3.bar
Here are two searches where I expect to find results:
find . -name '*.foo' -type f -print0 | xargs -0 du -h
4.0K ./test2.foo
4.0K ./test1.foo
4.0K ./test3.foo
find . -name '*.bar' -type f -print0 | xargs -0 du -h
4.0K ./bar_dir/test1.bar
4.0K ./bar_dir/test2.bar
4.0K ./bar_dir/test3.bar
Here is a search where I expect no results, but instead I get a listing of directories:
find . -name '*.qux' -type f -print0 | xargs -0 du -h
16K ./bar_dir
32K .
If I just use find, it returns nothing (as expected)
find . -name '*.qux' -print0
And if I use the -exec method for du, it also returns nothing (as expected)
find . -name '*.qux' -type f -exec du -h '{}' \;
So what is the matter with the xargs du method when find doesn't find anything? Thanks for your time.

Did you look at du --files0-from - ?
From man du
--files0-from=F
summarize disk usage of the NUL-terminated file names specified in file F; If F is - then read names from standard input
Try like this:
find . -name '*.qux' -type f -print0 | du -h --files0-from -
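The root cause is that GNU xargs runs its command once even when the input is empty, and du with no operands reports the current directory. GNU xargs also has a -r (--no-run-if-empty) flag for exactly this case; on BSD/macOS, skipping empty input is already the default. A minimal sketch in a throwaway directory:

```shell
# Sandbox: no *.qux files exist, so the pipeline should print nothing.
tmp=$(mktemp -d)
touch "$tmp/test1.foo" "$tmp/test2.foo"

# Without -r, GNU xargs still runs `du -h` once with no arguments, and
# du with no operands reports the current directory. With -r
# (--no-run-if-empty), du is never invoked on empty input.
find "$tmp" -name '*.qux' -type f -print0 | xargs -0 -r du -h

rm -rf "$tmp"
```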


Find all files pattern with total size

To find all log files matching a pattern in all subdirectories, I used the command:
du -csh *log.2017*
But this command does not search in subdirectories. Is there any way to get the total size of all files with a pattern from all subdirectories?
This will do the trick:
find . -name '*log.2017*' | xargs du -csh
find . -name '*log.2017*' -type f -exec stat -c "%s" {} \; | paste -sd+ | bc
You can use the find command:
find /path -type f -name "*log.2017*" -exec stat -c "%s" {} \; | paste -sd+ | bc
It does the search recursively.
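Note that an unquoted *log.2017* can be expanded by the shell before find ever sees it, which is why the pattern should be quoted. A GNU-find-based alternative that sums sizes in bytes without running stat once per file (the file names and sizes below are just a sandbox for illustration):

```shell
# Sandbox with two files matching the pattern (100 and 200 bytes).
tmp=$(mktemp -d)
head -c 100 /dev/zero > "$tmp/app.log.2017-01-01"
head -c 200 /dev/zero > "$tmp/app.log.2017-01-02"

# Quote the pattern so the shell doesn't expand it; -printf '%s\n'
# (GNU find) emits each file's size in bytes, and awk sums them.
find "$tmp" -type f -name '*log.2017*' -printf '%s\n' | awk '{t+=$1} END{print t}'

rm -rf "$tmp"
```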

Adjusting xargs to accept ls -lh

I want to find files larger than X MB, so I run
find data/ -size +2M
but I need MB next to each file, so I tried this:
find data/ -size +2M | xargs -I '{}' ls -lh '{}'
The above seems to list all files regardless of size. Is the xargs part incorrect, and is it running ls on data/ itself rather than on the matching files?
How should this be written?
It works OK if I specify -type f, but I am not sure that is the right solution.
find data/ -size +2M -type f | xargs -I '{}' ls -lh '{}'
This might help you:
sudo find / -size +2M -exec ls -s1h {} \;
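The reason -type f matters is that find matches directories too, and running ls on a directory lists its contents. Since -exec ... {} + batches arguments much like xargs does, the pipeline can be avoided entirely; a sketch in a throwaway directory (file names and sizes are illustrative):

```shell
# Sandbox: one file over 2 MB, one well under.
tmp=$(mktemp -d)
head -c 3145728 /dev/zero > "$tmp/big.bin"   # 3 MiB
head -c 1024 /dev/zero > "$tmp/small.bin"    # 1 KiB

# -type f skips directories (whose own size can exceed 2M, and ls on a
# directory would list its contents); {} + batches all matches into one
# ls invocation, like xargs.
find "$tmp" -type f -size +2M -exec ls -lh {} +

rm -rf "$tmp"
```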

calculate total used disk space by files older than 180 days using find

I am trying to find the total disk space used by files older than 180 days in a particular directory. This is what I'm using:
find . -mtime +180 -exec du -sh {} \;
but the above is quite evidently giving me the disk space used by every file that is found. I want only the total disk space used by the files, added together. Can this be done using the find and -exec commands?
Please note I simply don't want to use a script for this; it would be great if there were a one-liner for it. Any help is highly appreciated.
Why not this?
find /path/to/search/in -type f -mtime +180 -print0 | du -hc --files0-from - | tail -n 1
@PeterT is right. Almost all of these answers invoke a command (du) once per file, which is resource-intensive, slow, and unnecessary. The simplest and fastest way is this:
find . -type f -mtime +365 -printf '%s\n' | awk '{total=total+$1}END{print total/1024}'
du wouldn't summarize if you pass a list of files to it.
Instead, pipe the output to cut and let awk sum it up. So you can say:
find . -mtime +180 -exec du -ks {} \; | cut -f1 | awk '{total=total+$1}END{print total/1024}'
Note that the option -h, which displays the result in human-readable format, has been replaced by -k, which is equivalent to a block size of 1K. The result is presented in MB (see the total/1024 above).
Be careful not to take into account the disk usage by the directories. For example, I have a lot of files in my ~/tmp directory:
$ du -sh ~/tmp
3,7G /home/rpet/tmp
Running the first part of the example posted by devnull to find the files modified in the last 24 hours, we can see that awk will sum the whole disk usage of the ~/tmp directory:
$ find ~/tmp -mtime 0 -exec du -ks {} \; | cut -f1
3849848
84
80
But there is only one file modified in that period of time, with very little disk usage:
$ find ~/tmp -mtime 0
/home/rpet/tmp
/home/rpet/tmp/kk
/home/rpet/tmp/kk/test.png
$ du -sh ~/tmp/kk
84K /home/rpet/tmp/kk
So we need to take into account only the files and exclude the directories:
$ find ~/tmp -type f -mtime 0 -exec du -ks {} \; | cut -f1 | awk '{total=total+$1}END{print total/1024}'
0.078125
You can also specify date ranges using the -newermt parameter. For example:
$ find . -type f -newermt "2014-01-01" ! -newermt "2014-06-01"
See http://www.commandlinefu.com/commands/view/8721/find-files-in-a-date-range
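Combining the -newermt range with the byte-summing approach gives a one-liner for "total size of files modified in a date range". A sketch assuming GNU find and GNU touch; the dates and sizes are illustrative:

```shell
# Sandbox: one file inside the date range, one outside.
tmp=$(mktemp -d)
head -c 500 /dev/zero > "$tmp/old.dat"
head -c 700 /dev/zero > "$tmp/new.dat"
touch -d "2014-03-15" "$tmp/old.dat"   # inside the range
touch -d "2015-01-01" "$tmp/new.dat"   # outside the range

# Total bytes of regular files modified in the first half of 2014;
# -printf '%s\n' is GNU find, and `t+0` prints 0 when nothing matched.
find "$tmp" -type f -newermt "2014-01-01" ! -newermt "2014-06-01" \
    -printf '%s\n' | awk '{t+=$1} END{print t+0}'

rm -rf "$tmp"
```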
You can print file size with find using the -printf option, but you still need awk to sum.
For example, total size of all files older than 365 days:
find . -type f -mtime +365 -printf '%s\n' \
| awk '{a+=$1;} END {printf "%.1f GB\n", a/2**30;}'

How to find total size of all files under the ownership of a user?

I'm trying to find out the total size of all files owned by a given user.
I've tried this:
find $myfolder -user $myuser -type f -exec du -ch {} +
But this gives me an error:
missing argument to exec
and I don't know how to fix it. Can somebody help me with this?
You just need to terminate the -exec. If you want totals per directory, -type d may be what you need:
find $myfolder -user $myuser -type d -exec du -ch {} \;
Use:
find $myfolder -user gisi -type f -print0 | xargs -0 du -sh
where user gisi is my cat ;)
Note the option -s to summarize.
Further note that I'm using find ... -print0, which separates filenames with NUL bytes (one of the few characters not allowed in filenames), together with xargs -0, which uses the NUL byte as its delimiter. This makes sure that even exotic filenames won't be a problem.
Some versions of find do not accept "+" as the terminator of -exec.
Use "\;" instead of "+".
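If you don't need du's block-based accounting, GNU find's -printf can emit the owner and size of each file, letting awk total apparent sizes for every user in one pass instead of one find per user. A sketch in a throwaway directory (file names and sizes are illustrative):

```shell
# Sandbox: two files owned by the current user (100 and 150 bytes).
tmp=$(mktemp -d)
head -c 100 /dev/zero > "$tmp/a.dat"
head -c 150 /dev/zero > "$tmp/b.dat"

# %u is the owner, %s the size in bytes (GNU find); awk totals per
# user, so a single pass covers every owner in the tree.
find "$tmp" -type f -printf '%u %s\n' |
    awk '{sum[$1] += $2} END {for (u in sum) print u, sum[u]}'

rm -rf "$tmp"
```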

List files over a specific size in current directory and all subdirectories

How can I display all files greater than 10k bytes in my current directory and its subdirectories?
I tried ls -size +10k but that didn't work.
find . -size +10k -exec ls -lh {} \+
The first part of this is identical to @sputnick's answer, and successfully finds all files in the directory over 10k (don't confuse k with K). My addition, the second part, then executes ls -lh, which lists (-l) the files with human-readable sizes (-h); drop the h if you prefer. Of course the {} is the file itself, and \+ is simply an alternative to \;.
In practice, \; repeats the command:
ls -l found.file; ls -l found.file.2; ls -l found.file.3
whereas \+ runs it as one statement:
ls -l found.file found.file.2 found.file.3
More on \; vs. + with find.
Additionally, you may want the listing ordered by size, which is relatively easy to accomplish: add the -s option to ls, so ls -ls, then pipe it to sort -n to sort numerically,
which would become:
find . -size +10k -exec ls -ls {} \+ | sort -n
or in reverse order add an -r :
find . -size +10k -exec ls -ls {} \+ | sort -nr
Finally, your title says to find the biggest file in the directory. You can do that by piping the command to tail:
find . -size +10k -exec ls -ls {} \+ | sort -n | tail -1
which would find you the largest file in the directory and its subdirectories.
Note that you could also sort the files by size using ls -S, removing the need for sort; but to find the largest file you would then need head:
find . -size +10k -exec ls -lS {} \+ | head -1
The benefit of doing it with -S rather than sort is, one, you don't have to type sort -n, and two, you can also use -h, the human-readable size option, which is one of my favorites to use but is not available in older versions of ls; for example, we have an old CentOS 4 server at work that doesn't have -h.
Try doing this:
find . -size +10k -ls
And if you want to use the ls binary:
find . -size +10k -exec ls -l {} \;
I realize the assignment is likely long over. For anyone else:
You are overcomplicating.
find . -size +10k
I'll add to @matchew's answer (not enough karma points to comment):
find . -size +10k -type f -maxdepth 1 -exec ls -lh {} \; > myLogFile.txt
-type f : specify regular file type
-maxdepth 1 : make sure it only finds files in the current directory
You may use ls like this:
ls -lR | egrep -v '^d' | awk '$5>10240{print}'
Explanation:
ls -lR # list recursively
egrep -v '^d' # only print lines which do not start with a 'd' (i.e. files)
Only print lines where the fifth column (size) is greater than 10240 bytes:
awk '$5>10240{print}'
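Parsing ls -lR output is fragile with unusual file names, and the size column position can vary. With GNU find, the same filter can be expressed directly; a sketch in a throwaway directory (file names and sizes are illustrative):

```shell
# Sandbox: one file over 10k, one under.
tmp=$(mktemp -d)
head -c 20480 /dev/zero > "$tmp/big.txt"    # 20 KiB
head -c 512 /dev/zero > "$tmp/small.txt"    # 512 B

# Print size (bytes) and path for regular files over 10 KiB, largest
# first; -printf is GNU find, sort -nr orders by the leading number.
find "$tmp" -type f -size +10k -printf '%s %p\n' | sort -nr

rm -rf "$tmp"
```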
