Find all zero-byte files in directory and subdirectories - linux

How can I find all zero-byte files in a directory and its subdirectories?
I have done this:
#!/bin/bash
lns=`vdir -R *.* $dir | awk '{print $8"\t"$5}'`
temp=""
for file in $lns; do
    if test $file = "0"; then
        printf $temp"\t"$file"\n"
    fi
    temp=$file
done
But I only get results in the current directory, not subdirectories, and if a file name contains a space I get only the first word, followed by a tab.

To print the names of all files in and below $dir of size 0:
find "$dir" -size 0
Note that not all implementations of find will produce output by default, so you may need to do:
find "$dir" -size 0 -print
Two comments on the final loop in the question:
Rather than iterating over every other word in a string and checking whether the alternate values are zero, you can partially eliminate the whitespace issue by iterating over lines, e.g.:
printf '1 f1\n0 f 2\n10 f3\n' | while read size path; do
test "$size" -eq 0 && echo "$path"; done
Note that this will still fail in your case if any of the paths output by ls contain newlines, which reinforces two points: don't parse ls, and have a sane naming policy that doesn't allow whitespace in paths.
Secondly, to output the data from the loop, there is no need to store the output in a variable just to echo it. If you simply let the loop write its output to stdout, you accomplish the same thing but avoid storing it.
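Putting both points together, a minimal sketch of a robust version (assuming a find that supports -print0, which GNU and BSD find do):
find "$dir" -type f -size 0 -print0 |
while IFS= read -r -d '' path; do
    printf '%s\n' "$path"    # per-file processing goes here
done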

As an addition to the answers above: if you would like to delete those files:
find "$dir" -size 0 -type f -delete
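If your find lacks -delete (it is not required by POSIX), a sketch of a portable equivalent using -exec:
find "$dir" -size 0 -type f -exec rm -- {} +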

No, you don't need to bother with grep:
find "$dir" -size 0 ! -name "*.xml"

Tested on Bash 4+.
This is the correct way to search for size 0:
find /path/to/dir -size 0 -type f -name "*.xml"
Search for multiple file extensions of size 0:
find /path/to/dir -size 0 -type f \( -iname \*.css -o -iname \*.js \)
Note: if you removed the \( ... \), the implicit -and would bind the -size 0 and -type f tests to the first -iname only, so every *.js file would match regardless of size, as the sketch below shows.
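A sketch illustrating the difference (paths are made up):
# Without grouping: parsed as ( -size 0 -type f -iname '*.css' ) -o ( -iname '*.js' ),
# so every .js file matches regardless of size.
find /path/to/dir -size 0 -type f -iname \*.css -o -iname \*.js
# With grouping: -size 0 and -type f apply to both extensions.
find /path/to/dir -size 0 -type f \( -iname \*.css -o -iname \*.js \)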

Related

BASH: Filter list of files by return value of another command

I have a series of directories with (mostly) video files in them, say
test1
    1.mpg
    2.avi
    3.mpeg
    junk.sh
test2
    123.avi
    432.avi
    432.srt
test3
    asdf.mpg
    qwerty.mpeg
I create a variable (video_dir) with the directory names (based on other parameters) and use that with find to generate the basic list. I then filter for file types based on another variable (video_types), because there are sometimes non-video files in the dirs, by piping through egrep. Then I shuffle the list around and save it out to a file. That file is later used by mplayer to slideshow through the list.
I currently use the following command to accomplish that. I'm sure it's a horrible way to do it, but it works for me and it's quite fast even on big directories.
video_dir="/test1 /test2"
video_types=".mpg$|.avi$|.mpeg$"
find ${video_dir} -type f |
egrep -i "${video_types}" |
shuf > "$TEMP_OUT"
I now would like to add the ability to filter out files based on the resolution height of the video file. I can get that from:
mediainfo --Output='Video;%Height%' filename
Which just returns a number. I have tried using the -exec functionality of find to run that command on each file.
find ${video_dir} -type f -exec mediainfo --Output='Video;%Height%' {} \;
but that just returns the list of heights, not the file names, and I can't figure out how to reject files based on a comparison, like <480.
I could do a for next loop but that seems like a bad (slow) idea.
Using info from @mark-setchell I modified it to:
video_dir="test1"
find ${video_dir} -type f \
-exec bash -c 'h=$(mediainfo --Output="Video;%Height%" "$1"); [[ $h -gt 480 ]]' _ {} \; -print
Which works.
You can replace your egrep with the following so you are still inside the find command (-iname is case insensitive and -o represents a logical OR):
find test1 test2 -type f \
\( -iname "*.mpg" -o -iname "*.avi" -o -iname "*.mpeg" \) \
NEXT_BIT
The NEXT_BIT can then -exec bash and exit with status 0 or 1 depending on whether you want the current file included or excluded. So it will look like this:
-exec bash -c 'H=$(mediainfo -output ... "$1"); [ $H -lt 480 ] && exit 1; exit 0' _ {} \;
So, taking note of @tripleee's advice in comments about superfluous exit statements, I get this:
find test1 test2 -type f \
\( -iname "*.mpg" -o -iname "*.avi" -o -iname "*.mpeg" \) \
-exec bash -c 'h=$(mediainfo ...options... "$1"); [ $h -lt 480 ]' _ {} \; -print
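Filled in with the mediainfo invocation shown in the question, that might look like this (a sketch; [ "$h" -lt 480 ] prints files shorter than 480 pixels, so flip the comparison if you want the taller ones):
find test1 test2 -type f \
    \( -iname "*.mpg" -o -iname "*.avi" -o -iname "*.mpeg" \) \
    -exec bash -c 'h=$(mediainfo --Output="Video;%Height%" "$1"); [ "$h" -lt 480 ]' _ {} \; -print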
This Q&A was focused on one particular case, so the accepted answer is not as general as it could be.
find
If the list of files comes from find, one can use its filtering facilities, e.g. -exec:
find ${video_dir} -type f \
-exec COMMAND \; \
-print
Here:
- COMMAND is not enclosed in quotes -- find reads everything after -exec and up to a \;
- find will expand {} to the current file name (including path -- you might find -execdir helpful, which will cd to the file's directory and replace {} with the leaf file name)
- the exit code of COMMAND is treated as follows: 0 means true, non-zero means false
Note that you can build more complex expressions (e.g. -not -exec ...), which will be evaluated "from left to right, according to the rules of precedence ... -and is assumed where the operator is omitted." (per man find)
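For instance, a sketch that inverts a test (the grep here is a stand-in for any command whose exit status you want to filter on):
# Print regular files that do NOT contain the string TODO
find . -type f -not -exec grep -q TODO {} \; -print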
xargs
If the list of files comes from elsewhere (and is available on stdin), you can use xargs as follows (from "If xargs is map, what is filter?"):
ls | xargs -I{} bash -c "COMMAND '{}' && echo '{}'"
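As a concrete sketch of that pattern, here filtering for empty files (it inherits the usual caveats of parsing ls output; passing the name as an argument rather than splicing it into the command string is somewhat safer):
# Print the names of zero-byte files in the current directory
ls | xargs -I{} bash -c '[ ! -s "$1" ] && printf "%s\n" "$1"' _ {}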
Here is my solution.
#!/bin/bash
shopt -s extglob   # enables the @( ... ) pattern below
video_dir=(/test1 /test2)
while IFS= read -rd '' file; do
    if [[ $file = *.@(mpg|avi|mpeg|mp4) ]]; then
        h=$(mediainfo --Output="Video;%Height%" "$file")
        (( h >= 480 )) && echo "$file"
    fi
done < <(find "${video_dir[@]}" -type f -print0)
With this solution you can process everything inside the while read loop.

Bash - how to exclude directory with find command and how to get full path with find?

So I have the code below, and I'm running into a few problems with it.
I'm having trouble excluding the directories output by
find ${1-.}
It is giving me the directories too, instead of only names; I've tried different methods such as -prune, etc.
I'm having trouble with deleting the empty files
The data given to me by
EMPTY_FILE=$(find ${1-.} -size 0)
does not give me the correct path.
Here is the output for that
TestFolder/TestFile
In this case I can't just do:
rm TestFolder/TestFile
As it is an invalid path; it needs ./TestFolder/TestFile.
How would I add the ./, or is there a way to get the full path?
#!/bin/bash
echo "Here are all the files in the directory specified\n"
find ${1-.}
EMPTY_FILE=$(find ${1-.} -size 0)
echo "Here are the list of empty files\n"
echo "$EMPTY_FILE \n"
echo "Do you want to delete those empty files?(yes/no)"
read text
if [ "$text" == "yes" ]; then $(rm -- $EMPTY_FILE); fi
Any help is appreciated!
You want this:
#!/bin/bash
echo -e "Here are all the files in the directory specified\n"
# Use -printf "%f\n" to print the filename without leading directories
# Use -type f to restrict find to files
find "${1-.}" -type f -printf " %f\n"
echo -e "Here are the list of empty files\n"
# Again, use -printf "%f\n"
find "${1-.}" -type f -size 0 -printf " %f\n"
echo -e "Do you want to delete those empty files?(yes/no)"
read answer
# Delete files using the `-delete` option
[ "$answer" = "yes" ] && find "${1-.}" -type f -size 0 -delete
Also note that I've quoted "${1-.}" at every occurrence. Since it is user input, you can't rely on it; even if it is a path, it might still contain problematic characters, like spaces.
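Note that -printf is a GNU find extension. On a find without it (e.g. BSD or Solaris), a rough substitute for printing the bare file name is:
find "${1-.}" -type f -exec basename {} \;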
I'm having trouble excluding the directories being outputted by
find ${1-.}
It is giving me the directories too instead of only names
You are looking for the -type test. To instruct find to report only regular files, you could say
find ${1-.} -type f
That's probably what you really want, but what you actually asked (to exclude only directories) would be
find ${1-.} -not -type d
Excluding only directories will list symbolic links and special files, too.
in this case I can't just do:
rm TestFolder/TestFile
As it is invalid path; since it needs ./TestFolder/TestFile
Nonsense. ./TestFolder/TestFile means exactly the same thing as TestFolder/TestFile.
In any event, find does print paths starting at the specified starting path(s).
I have a feeling that I'm missing something from your question, but if all you need to do is exclude directories, just tell find to only look for files:
find . -type f -size 0 -delete
And then adjust that to suit your script. Hope this helps.
-size 0 -type f
rm with no options will not delete directories. Your claim that rm needs ./ is wrong anyway.

How can I search for files in directories that contain spaces in names, using "find"?

How can I search for files in directories that contain spaces in names, using find?
I use this script:
#!/bin/bash
for i in `find "/tmp/1/" -iname "*.txt" | sed 's/[0-9A-Za-z]*\.txt//g'`
do
for j in `ls "$i" | grep sh | sed 's/\.txt//g'`
do
find "/tmp/2/" -iname "$j.sh" -exec cp {} "$i" \;
done
done
but files and directories that contain spaces in their names are not processed.
This will grab all the files that have spaces in them:
$ ls
more space  nospace  stillnospace  this is space
$ find -type f -name "* *"
./this is space
./more space
I don't know how to achieve your goal. But given your actual solution, the problem is not really with find but with the for loops, since spaces are taken as delimiters between items.
find has a useful option for those cases:
From man find:
-print0
    True; print the full file name on the standard output, followed by a null character (instead of the newline character that -print uses). This allows file names that contain newlines or other types of white space to be correctly interpreted by programs that process the find output. This option corresponds to the -0 option of xargs.
As the man page says, this matches the -0 option of xargs. Several other standard tools have an equivalent option. You will probably have to rewrite your pipeline around those tools in order to cleanly process file names containing spaces.
In addition, see bash "for in" looping on null delimited string variable to learn how to use for loop with 0-terminated arguments.
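As a small sketch of that find/xargs pairing, using the question's directories (the cp is just an example action):
# Copy every .txt under /tmp/1 to /tmp/2, spaces and all
find /tmp/1 -iname '*.txt' -print0 | xargs -0 -I{} cp {} /tmp/2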
Do it like this:
find . -type f -name "* *"
Instead of . you can specify the path where you want to find files matching your criteria.
Your first for loop is:
for i in `find "/tmp/1" -iname "*.txt" | sed 's/[0-9A-Za-z]*\.txt//g'`
If I understand it correctly, it is looking for all text files in the /tmp/1 directory, and then attempting to remove the file name with the sed command, right? This would cause a single directory with multiple .txt files to be processed by the inner for loop more than once. Is that what you want?
Instead of using sed to get rid of the filename, you can use dirname. Also, later on, you use sed to get rid of the extension; you can use basename for that.
for i in `find "/tmp/1" -iname "*.txt"` ; do
path=$(dirname "$i")
for j in `ls $path | grep POD` ; do
file=$(basename "$j" .txt)
# Do what ever you want with the file
This doesn't solve the problem of having a single directory processed multiple times, but if it is an issue for you, you can use the for loop above to store the names in an array instead and then remove duplicates with sort and uniq; see the sketch below.
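A sketch of that deduplication without a loop, assuming GNU find and sort (%h expands to the directory part of each match):
# Each directory containing .txt files, listed once
find /tmp/1 -iname '*.txt' -printf '%h\0' | sort -zu | tr '\0' '\n'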
Use while read loop with null-delimited pathname output from find:
#!/bin/bash
while IFS= read -rd '' i; do
    while IFS= read -rd '' j; do
        find "/tmp/2/" -iname "$j.sh" -exec echo cp '{}' "$i" \;
    done < <(exec find "$i" -maxdepth 1 -mindepth 1 -name '*POD*' -not -name '*.txt' -printf '%f\0')
done < <(exec find /tmp/1 -iname '*.txt' -not -iname '[0-9A-Za-z]*.txt' -print0)
Never use for i in $(find ...) or similar, as it'll fail for file names containing white space, as you saw.
Use find ... | while IFS= read -r i instead.
It's hard to say without sample input and expected output but something like this might be what you need:
find "/tmp/1/" -iname "*.txt" |
while IFS= read -r i
do
i="${i%%[0-9A-Za-z]*\.txt}"
for j in "$i"/*sh*
do
j="${j%%\.txt}"
find "/tmp/2/" -iname "$j.sh" -exec cp {} "$i" \;
done
done
The above will still fail for file names that contains newlines. If you have that situation and can't fix the file names then look into the -print0 option for find, and piping it to xargs -0.

bash script collecting filenames seems to get confused by spaces

I'm trying to build a script that lists all the zip files in a set of directories, with some filters, and get it to spit them out to a file, but when a filename has a space in it, it seems to appear on a new line.
This list will eventually be used as an input to tar to gzip all the zip files; the script is below:
#!/bin/bash
rm -f set1.txt
rm -f set2.txt
for line in $(find /home -type d -name assets); do
    echo $line >> set1.txt
    for line in $(find $line -type f -name \*.zip -mtime +2); do
        echo \"$line\" >> set2.txt
    done
done
This works as expected until you get a space in a filename; then set2.txt contains entries like this:
"/home/xxxxxx/oldwebroot/htdocs/upload/assets/jobbags/rbjbCost"
"in"
"use"
"sept"
"2010.zip"
Does anyone know how I can get it to keep these filenames with spaces in them on a single line, with the whole lot wrapped in one set of quotes?
Thanks!
The correct way to loop over a set of files located via find is with a while read construct, thus:
while IFS= read -r -d '' line ; do
    echo "$line" >> set1.txt
    while IFS= read -r -d '' file ; do
        printf '"%s"\n' "$file" >> set2.txt
    done < <(find "$line" -type f -name \*.zip -mtime +2 -print0)
done < <(find /home -type d -name assets -print0)
For clarity I have given the inner loop variable a different name.
If you didn't have bash you'd have to issue the find command separately and redirect the output to a file, then read the file with while read ; do .. done < filename.
Note that each expansion of each variable is double-quoted. This is necessary.
Note also, however, that for what you want you can simply use the -printf switch to find, if you have GNU find.
find /home -type f -path '*/assets/*.zip' -mtime +2 -printf '"%p"\n' > set2.txt
Although, as @sarnold notes, this is not safe.
You should probably be executing your tar(1) command through some other mechanism; the find(1) program supports a -print0 option to request ASCII NUL-separated filename output, and the xargs(1) program supports a -0 option to tell it that the input is separated by ASCII NUL characters. (Since NUL is the only character that is not allowed in filenames, this is the only way to get reliable filename handling.)
Simply using the -print0 and -0 options will help but this still leaves the script open to another problem -- xargs(1) might decide to execute the tar(1) command two, three, or more times, depending upon its input. The last execution is the one that will "win", and the data from earlier invocations will be lost for ever. (This is useless as a backup.)
So you should also look into adding the --concatenate command line option to tar(1), too, so that it will add to the archive. It might make sense to perform the compression after all the files have been added, via gzip(1) or bzip2(1). (This does mean you need to remove the archive before a "fresh run" of this script.)
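A sketch of that workflow, assuming GNU tar and findutils (archive.tar is a hypothetical name). Note that --concatenate (-A) joins existing tar archives; for adding plain files across repeated xargs invocations the append mode -r/--append is the one that applies, with compression done once at the end:
rm -f archive.tar
find /home -type f -path '*/assets/*.zip' -mtime +2 -print0 |
    xargs -0 --no-run-if-empty tar -rf archive.tar
gzip archive.tar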

Find files older than X days excluding some other files

I'm trying to write a shell script, for Linux and Solaris, that finds some specific files older than X days and then deletes them. The trick is that during this process there are a couple of files that must not be deleted.
For example, from the following list of files I need to delete *.zip and keep *.log and *.something.*:
1.zip
2.zip
3.log
prefix.something.suffix
Finding the files and feeding them to rm was easy, but I'm having difficulty excluding files from the deletion list.
Experimenting around, I discovered one can benefit from multiple complex expressions grouped with logical operators, like this:
find -L path -type f \( -name '*.zip' \) -a ! \( -name '*.log' -o -name '*something*' \) -mtime +3
cheers,
G
Or you could do this:
find /appl/ftp -type f -mtime +30 | grep -vf [exclude_file] | xargs rm -rf;
I needed to find a way to provide a hard-coded list of exclude files not to remove, but remove everything else that was older than 30 days. Here is a little script that removes all files older than 30 days, except files listed in [exclude_file].
EXCL_FILES=`/bin/cat [exclude_file]`;
RM_FILES=`/usr/bin/find [path] -type f -mtime +30`;
for I in $RM_FILES;
do
    SKIP=0;
    for J in $EXCL_FILES;
    do
        # Keep the file if its path matches an entry in the exclude list
        if echo "$I" | grep -q "$J"; then SKIP=1; fi;
    done;
    if [[ $SKIP == 0 ]]; then
        /bin/rm "$I";
        if [[ $? != 0 ]]; then echo "PROBLEM: Could not remove $I"; exit 1; fi;
    fi;
done;
