GNU find: when does the default action apply? - linux

The man page of Debian 8's find command says:
If the whole expression contains no actions other than -prune or -print,
-print is performed on all files for which the whole expression is true.
So why do these outputs differ:
$ mkdir -p test/foo test/bar && cd test && touch foo/bar bar/foo
$ # Test 1
$ find . -name foo -type d -prune -o -name foo
./foo
./bar/foo
$ # Test 2
$ find . -name foo -type d -prune -o -name foo -print
./bar/foo
So test 1: does the expression contain "no actions other than -prune or -print?" Well, excluding the prune, yes that statement is true, there are no actions. So these results are expected since for ./foo the expression before the -o option returns True, so it's printed.
But test 2: does the expression contain "no actions other than -prune or -print?" Well, excluding the prune and the print, yes that statement is true again, there are no other actions. So I would expect the same results.
But I don't get ./foo. Why?
It's as if the man page should read: "If the whole expression contains no actions other than -prune, -print is performed on all files for which the whole expression is true."

I'm going with the simpler explanation: the man page is wrong. It should instead say
If the whole expression contains no actions other than -prune, -print is performed on all files for which the whole expression is true.
It should maybe also contain a caveat for -quit, which is an action, but one that causes find to exit immediately. So even though an implicit -print is added for the whole expression, it is never actually executed.
The posix find man page contains a clearer explanation, though it doesn't have quite as many actions as the expanded gnu version.
If no expression is present, -print shall be used as the expression. Otherwise, if the given expression does not contain any of the primaries -exec, -ok, or -print, the given expression shall be effectively replaced by:
( given_expression ) -print
Out of what gnu calls actions, posix only defines -exec, -ok, -print, and -prune. It does not have any of the expanded actions -delete, -ls, etc... So the definition matches the corrected gnu one by only omitting -prune.
Here are some examples using all the gnu find actions which prove the point. For all consider the following file structure
$ tree
.
└── file
-delete
$ find -name file -delete
$
-exec command ;
$ find -name file -exec echo '-exec is an action so an implicit -print is not applied' \;
-exec is an action so an implicit -print is not applied
$
-exec command {} +
$ find -name file -exec echo 'This should print the filename twice if an implicit -print is applied: ' {} +
This should print the filename twice if an implicit -print is applied: ./file
$
-fls
$ find -name file -fls file
$
-fprint
$ find -name file -fprint file
$
-ls
$ find -name file -ls
1127767338 0 -rw-rw-r-- 1 user user 0 May 6 07:15 ./file
$
-ok command ;
$ find -name file -ok echo '-ok is an action so an implicit -print is not applied' \;
< echo ... ./file > ? y
-ok is an action so an implicit -print is not applied
$
-okdir command ;
$ find -name file -okdir echo '-okdir is an action so an implicit -print is not applied' \;
< echo ... ./file > ? y
-okdir is an action so an implicit -print is not applied
$
-print
#./file would be printed twice if an implicit `-print` was applied
$ find -name file -print
./file
$
-print0
#./file would be printed twice if an implicit `-print` was applied
$ find -name file -print0
./file$
-printf
$ find -name file -printf 'Since -printf is an action the implicit -print is not applied\n'
Since -printf is an action the implicit -print is not applied
$
-prune
$ find -name file -prune
./file
$
-quit
$ find -name file -quit
$ find -D opt -name file -quit
...
Optimized command line:
( -name file [0.1] -a [0.1] -quit [1] ) -a [0.1] -print [1]
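The per-action checks above can also be condensed into a small script. This is only a sketch, assuming a scratch directory created with mktemp and a single empty file named file; the variable names are made up for illustration and are not part of the original answer.

```shell
# Scratch directory with one file, mirroring the tree used in the examples
tmp=$(mktemp -d)
touch "$tmp/file"
cd "$tmp" || exit 1

# -prune alone does not suppress the default action: the path is still printed
with_prune=$(find . -name file -prune)

# -exec is an action, so find itself prints nothing ("true" produces no output)
with_exec=$(find . -name file -exec true \;)

# An explicit -print is an action too: the path appears once, not twice
print_count=$(find . -name file -print | wc -l)

printf 'prune: [%s]\nexec: [%s]\nprint lines: %s\n' "$with_prune" "$with_exec" "$print_count"
```

If an implicit -print were added on top of the explicit one, the last count would be 2 instead of 1.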

Let's look at this command:
find . -name foo -type d -prune -o -name foo
Since -print is the default action, it is applied to the whole expression, i.e. -name foo -type d -prune -o -name foo. So the command is the same as the following:
find . \( -name foo -type d -prune -o -name foo \) -print
Now let's look at this command:
find . -name foo -type d -prune -o -name foo -print
According to man find, expr1 expr2 has higher precedence than expr1 -o expr2. So in the command above, two expressions are combined with the OR operator:
-name foo -type d -prune
-name foo -print
So if you want to apply -print to both, use parentheses:
find . \( -name foo -type d -prune -o -name foo \) -print
But -prune -o RHS implies that RHS is evaluated only for those items which didn't get pruned.
We can check if we are right by running find with -D tree or -D opt:
find -D opt -O0 . -name foo -type d -prune -o -name foo -print
...
( ( -name foo [0.1] -a [0.04] [need type] -type d [0.4] ) -a [0.04] [call stat] [need type] -prune [1] ) -o [0.14] ( -name foo [0.1] -a [0.1] -print [1] )
./bar/foo
find -D opt -O0 . -name foo -type d -prune -o -name foo
( ( ( -name foo [0.1] -a [0.04] [need type] -type d [0.4] ) -a [0.04] [call stat] [need type] -prune [1] ) -o [1] -name foo [0.1] ) -a [0.14] -print [1]
./foo
./bar/foo
As we can see, find makes (... -prune) -o (... -print) from the first expression where we put -print explicitly. It makes (...) -a -print from the second expression where we omit -print.
So I think that by "the whole expression" the man page means one of the expression parts described in the OPERATORS section.
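To check this without reading the -D output, the two tests from the question can be rerun next to the explicitly parenthesized form. This is a rough sketch, assuming a scratch directory from mktemp that recreates the foo/bar and bar/foo layout; the variable names are mine.

```shell
# Recreate the layout from the question: foo/bar and bar/foo
tmp=$(mktemp -d)
mkdir -p "$tmp/foo" "$tmp/bar"
touch "$tmp/foo/bar" "$tmp/bar/foo"
cd "$tmp" || exit 1

# Test 1: no action besides -prune, so find wraps everything in ( ... ) -print
implicit=$(find . -name foo -type d -prune -o -name foo | sort)
explicit=$(find . \( -name foo -type d -prune -o -name foo \) -print | sort)

# Test 2: the explicit -print binds only to the right-hand side of -o
rhs_only=$(find . -name foo -type d -prune -o -name foo -print | sort)

printf 'implicit:\n%s\nexplicit:\n%s\nrhs only:\n%s\n' "$implicit" "$explicit" "$rhs_only"
```

The first two listings should be identical (./bar/foo and ./foo), while the third contains only ./bar/foo, because the pruned directory ./foo never reaches a -print.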

Check the GNU Findutils manual, it says
If the expression contains no actions other than ‘-prune’, ‘-print’ is
performed on all files for which the entire expression is true.
Apparently Debian's man page is wrong, since its find is just GNU find. I have no idea how this happened, because the man page looks like a straight copy to me.

Related

find: name pattern as a variable

A trivial situation where I want to manage the file name patterns for the find command in the variable:
#!/bin/bash
EXCLUDE="! \( -name "\"run*"\" -o -name "\"doc*"\" \)"
find . -maxdepth 1 -type f "$EXCLUDE"
The expectation is to find all the files not matching the $EXCLUDE pattern.
This approach doesn't work, even though the same pattern works as expected when typed directly on the command line.
In shell tracing mode I noticed something that I suspect is the root cause: the $EXCLUDE variable is passed between single quotes:
set -x
find . -maxdepth 1 -type f "$EXCLUDE"
+ find . -maxdepth 1 -type f '! \( -name "run*" -o -name "doc*" \)'
find: paths must precede expression: ! \( -name "run*" -o -name "doc*" \)
Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression]
The substituted $EXCLUDE variable appears in the line starting with +, and my find command won't return anything with those characters: '! \( -name "run*" -o -name "doc*" \)'.
Is there a way to remove those quotes, which are not part of the variable, or am I missing something bigger?
This is BashFAQ/050 -- use an array instead:
EXCLUDE=( '!' '(' -name 'run*' -o -name 'doc*' ')' )
Then use the array expansion syntax
find . -maxdepth 1 -type f "${EXCLUDE[@]}"
Store the arguments in an array:
#! /bin/bash
EXCLUDE=( '!' '(' '-name' 'run*' '-o' '-name' 'doc*' ')' )
find . -maxdepth 1 -type f "${EXCLUDE[@]}"
This avoids quoting errors.
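The difference between the two approaches is easiest to see by counting the arguments that would reach find. The snippet below is only an illustration and needs bash for the arrays; count_args is a hypothetical helper, not something find or bash provides.

```shell
#!/bin/bash
# count_args is a made-up helper: it reports how many arguments it received
count_args() { echo "$#"; }

EXCLUDE_STRING='! ( -name run* -o -name doc* )'
EXCLUDE_ARRAY=( '!' '(' -name 'run*' -o -name 'doc*' ')' )

# A quoted string expands to a single argument, which find cannot parse
as_string=$(count_args "$EXCLUDE_STRING")

# A quoted array expands to one argument per element, globs left intact
as_array=$(count_args "${EXCLUDE_ARRAY[@]}")

echo "string: $as_string argument(s), array: $as_array argument(s)"
```

The string arrives as one word, which is why find complains that paths must precede the expression, while the array arrives as eight separate words.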

bash scripting: looping and file manipulation [duplicate]

I have a list of images, collected using the following line:
# find . -mindepth 1 -type f -name "*.JPG" | grep "MG_[0-9][0-9][0-9][0-9].JPG"
output:
./DCIM/103canon/IMG_0039.JPG
./DCIM/103canon/IMG_0097.JPG
./DCIM/103canon/IMG_1600.JPG
./DCIM/103canon/IMG_2317.JPG
./DCIM/IMG_0042.JPG
./DCIM/IMG_1152.JPG
./DCIM/IMG_1810.JPG
./DCIM/IMG_2564.JPG
./images/IMG_0058.JPG
./images/IMG_0079.JPG
./images/IMG_1233.JPG
./images/IMG_1959.JPG
./images/IMG_2012/favs/IMG_0039.JPG
./images/IMG_2012/favs/IMG_1060.JPG
./images/IMG_2012/favs/IMG_1729.JPG
./images/IMG_2012/favs/IMG_2013.JPG
./images/IMG_2012/favs/IMG_2317.JPG
./images/IMG_2012/IMG_0079.JPG
./images/IMG_2012/IMG_1403.JPG
./images/IMG_2012/IMG_2102.JPG
./images/IMG_2013/IMG_0060.JPG
./images/IMG_2013/IMG_1311.JPG
./images/IMG_2013/IMG_1729.JPG
./images/IMG_2013/IMG_2013.JPG
./IMG_0085.JPG
./IMG_1597.JPG
./IMG_2288.JPG
However, I only want the very last portion, the IMG_\d\d\d\d.JPG. I have tried hundreds of regular expressions and this is the one that gives me the best result. Is there a way to print only the filename without the directory tree before it, or is it solely down to the regex?
Thanks
It should be
find . -mindepth 1 -type f -name "*MG_[0-9][0-9][0-9][0-9].JPG" -printf "%f\n"
If the -printf option is not available with your implementation of find (as in current versions of Mac OS X),
then you can use -execdir echo {} \; instead (if that's available):
find . -mindepth 1 -type f -name "*MG_[0-9][0-9][0-9][0-9].JPG" -execdir echo {} \;
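If neither -printf nor -execdir is available, running basename through plain -exec should also work, since both are specified by POSIX. A minimal sketch, assuming a scratch directory with a few made-up file names and the stricter pattern IMG_[0-9][0-9][0-9][0-9].JPG rather than the *MG_ pattern used above:

```shell
# Scratch tree with three matching images and one file that should be skipped
tmp=$(mktemp -d)
mkdir -p "$tmp/DCIM/103canon"
touch "$tmp/DCIM/103canon/IMG_0039.JPG" "$tmp/DCIM/IMG_0042.JPG" \
      "$tmp/IMG_0085.JPG" "$tmp/notes.JPG"

# basename strips the directory part of each path that find hands over
names=$(find "$tmp" -type f -name 'IMG_[0-9][0-9][0-9][0-9].JPG' \
             -exec basename {} \; | sort)

echo "$names"
```

This forks one basename process per file, so it is slower than -printf on large trees.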

Linux Bash find files on user input with multiple -name clauses

I am trying to create a small utility that collects log files from a remote host by creating a tarball. For simplicity, assume for now that it only has to display the list of files matching the user input.
This command works fine
find $LOGS_DIR -maxdepth 1 -type f \( -name 'process1.log*' -o -name 'process2.log*' \) -exec echo 'FOUND_FILES:{}' ';'
If I want to build the -name clauses programmatically from user input, say process3.log*, process4.log*, process5*.log, then my bash script should generate the find command as
find $LOGS_DIR -maxdepth 1 -type f \( -name 'process3.log*' -o -name 'process4.log*' -o -name 'process5.log*' \) -exec echo 'FOUND_FILES:{}' ';'
Here is my snippet
...
for pattern in "${file_pattern_to_match[@]}"
do
if [ -z $final_pattern ];then
final_pattern="-name $pattern"
continue;
fi
final_pattern="$final_pattern -o -name $pattern"
done
#This will print final_pattern: -name process3.log* -o -name process4.log* -o -name process5.log*
echo "final_pattern:$final_pattern"
find $LOGS_DIR -maxdepth 1 -type f \( $final_pattern \) -exec echo "FOUND_FILES:{}" \;
But the issue is while executing the script find is evaluated as
find /x/path/logs -maxdepth 1 -type f \( -name process3.log.1 process3.log.2 -o -name process4.log.1 process4.log.2 \) -exec echo "FOUND_FILES:{}" \;
But the expected is
find /x/path/logs -maxdepth 1 -type f \( -name "process3.log.*" -o -name process4.log.* -o -name process5.log.* \) -exec echo "FOUND_FILES:{}" \;
Because the unquoted variable gets glob-expanded by the shell, find exits with an error.
Can someone please help me get the expected result above?
Use an array to keep each argument properly quoted.
first=
for pattern in "${file_pattern_to_match[@]}"
do
if [ -z "$first" ]; then
final_pattern=(-name "$pattern")
first=1
else
final_pattern+=(-o -name "$pattern")
fi
done
# Hacky
# first=
# for pattern in "${file_pattern_to_match[@]}"
# do
# final_pattern+=($first -name "$pattern")
# first=-o
# done
find "$LOGS_DIR" -maxdepth 1 -type f \( "${final_pattern[@]}" \) -exec echo "FOUND_FILES:{}" \;
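For a quick end-to-end check of the array approach, something like the following could be used. It is a sketch under assumptions: LOGS_DIR points at a throwaway mktemp directory, the log file names are invented, and the loop uses the array length instead of the first flag to decide when to add -o.

```shell
#!/bin/bash
# Stand-in for the real log directory, filled with made-up file names
LOGS_DIR=$(mktemp -d)
touch "$LOGS_DIR/process3.log.1" "$LOGS_DIR/process3.log.2" \
      "$LOGS_DIR/process4.log.1" "$LOGS_DIR/other.log"

# Patterns as they might arrive from user input (assumed already split)
file_pattern_to_match=( 'process3.log*' 'process4.log*' )

final_pattern=()
for pattern in "${file_pattern_to_match[@]}"; do
    # Put -o in front of every clause except the first one
    [ "${#final_pattern[@]}" -gt 0 ] && final_pattern+=( -o )
    final_pattern+=( -name "$pattern" )
done

# Quoted array expansion keeps each pattern as one unexpanded argument
found=$(find "$LOGS_DIR" -maxdepth 1 -type f \( "${final_pattern[@]}" \) \
             -exec basename {} \; | sort)
echo "$found"
```

Only the process3 and process4 files are listed; other.log is left out, and the shell never expands the patterns itself.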

Retrieve specific files under a directory in Linux

I want to list specific files under a directory on Linux.
For example, I have the following sub-directories in my current directory:
Feb 16 00:37 a1
Feb 16 00:38 a2
Feb 16 00:36 a3
Now if I run ls a*, I see:
bash-4.1$ ls a*
a:
a1:
123.sh 123.txt
a2:
a234.sh a234.txt
a3:
a345.sh a345.txt
I want to filter out only .sh files from the directory so that output should be:-
a1:
123.sh
a2:
a234.sh
a3:
a345.sh
Is it Possible?
Moreover is it possible to print the 1st line of sh file also?
The following find command should work for you:
find . -maxdepth 2 -mindepth 2 -path '*/a*/*.sh' -print -exec head -n1 {} \;
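If the grouped layout from the question matters, a plain shell loop gets closer to it than find does. The following is just a sketch, assuming sub-directories that start with a and a scratch tree whose file contents are invented for the demonstration.

```shell
# Scratch tree: two directories, each with one script and one text file
tmp=$(mktemp -d)
mkdir -p "$tmp/a1" "$tmp/a2"
printf '#!/bin/sh\necho one\n' > "$tmp/a1/123.sh"
printf '#!/bin/bash\necho two\n' > "$tmp/a2/a234.sh"
touch "$tmp/a1/123.txt" "$tmp/a2/a234.txt"
cd "$tmp" || exit 1

listing=$(
  for dir in a*/; do
    printf '%s:\n' "${dir%/}"            # directory header, like ls prints it
    for script in "$dir"*.sh; do
      [ -f "$script" ] || continue       # skip when the glob matched nothing
      printf '%s  (first line: %s)\n' "${script##*/}" "$(head -n 1 "$script")"
    done
  done
)
echo "$listing"
```

Each directory name is printed once, followed by its .sh files and the first line of each script.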
Just take a look at these options. I hope you will find what you are looking for.
basic 'find file' commands
find / -name foo.txt -type f -print # full command
find / -name foo.txt -type f # -print isn't necessary
find / -name foo.txt # don't have to specify "type==file"
find . -name foo.txt # search under the current dir
find . -name "foo.*" # wildcard
find . -name "*.txt" # wildcard
find /users/al -name Cookbook -type d # search '/users/al'
search multiple dirs
find /opt /usr /var -name foo.scala -type f # search multiple dirs
case-insensitive searching
find . -iname foo # find foo, Foo, FOo, FOO, etc.
find . -iname foo -type d # same thing, but only dirs
find . -iname foo -type f # same thing, but only files
find files with different extensions
find . -type f \( -name "*.c" -o -name "*.sh" \) # *.c and *.sh files
find . -type f \( -name "*cache" -o -name "*xml" -o -name "*html" \) # three patterns
find files that don't match a pattern (-not)
find . -type f -not -name "*.html" # find allfiles not ending in ".html"
find files by text in the file (find + grep)
find . -type f -name "*.java" -exec grep -l StringBuffer {} \; # find StringBuffer in all *.java files
find . -type f -name "*.java" -exec grep -il string {} \; # ignore case with -i option
find . -type f -name "*.gz" -exec zgrep 'GET /foo' {} \; # search for a string in gzip'd files
Only using ls, you can get the .sh files and their parent directory with:
ls -1 * | grep ":\|.sh" | grep -B1 .sh
Which will provide the output:
a1:
123.sh
a2:
a234.sh
a3:
a345.sh
However, note that this won't behave correctly if you have a file called, for example, 123.sh.txt.
To print the first line of each .sh file in every folder:
head -n1 $(ls -1 */*.sh)
Yes, and it is easy with ls itself:
ls -d */*.sh
If you would like to print it with newline:
t $ ls -d */*.sh | tr ' ' '\n'
d1/file.sh
d2/file.sh
d3/file.sh
Or
ls -d */*.sh | tr '/' '\n'
the output:
d1
file.sh
d2
file.sh
d3
file.sh
Also for the first line if you want:
t $ ls -d */*.sh | tr ' ' '\n' | head -n 1
d1/file.sh

List all directories that contain a file with a particular extension

I need to list all the directories that contain a file with the .info extension in the first level.
--contrib
--abc
--ab.info
--def
--de.info
--xyz
--ab.gh
--ab.ij
The command should list
abc, def
This should work if you run it from your contrib directory:
find . -maxdepth 2 -name "*.info" -exec dirname {} \;
It will need more tweaking if you actually want to run it from the parent of contrib.
The above will give you:
./abc
./def
Which is not exactly what you wanted. So maybe something more like this will help:
find . -maxdepth 2 -name "*.info" -exec sh -c 'basename "$(dirname "$1")"' sh {} \;
It is more convoluted but the result is:
abc
def
Or without basename and dirname:
find . -maxdepth 2 -name "*.info" -exec bash -c '[[ {} =~ .*/(.*)/.* ]] && echo ${BASH_REMATCH[1]}' \;
Or with sed:
find . -maxdepth 2 -name "*.info" -exec echo {} + | sed 's|./\(\S*\)/\S*|\1,|g'
Result:
abc, def,
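To get rid of the trailing comma and of duplicates when a directory holds several .info files, a glob plus sort -u and paste is one more option. This is a sketch, assuming it runs from inside contrib and that directory names contain no newlines; the scratch tree below just mirrors the layout from the question.

```shell
# Scratch copy of the layout from the question
tmp=$(mktemp -d)
mkdir -p "$tmp/contrib/abc" "$tmp/contrib/def" "$tmp/contrib/xyz"
touch "$tmp/contrib/abc/ab.info" "$tmp/contrib/def/de.info" \
      "$tmp/contrib/xyz/ab.gh" "$tmp/contrib/xyz/ab.ij"
cd "$tmp/contrib" || exit 1

dirs=$(
  for f in */*.info; do
    [ -e "$f" ] || continue          # glob matched nothing
    printf '%s\n' "${f%%/*}"         # keep only the directory component
  done | sort -u | paste -s -d ',' - | sed 's/,/, /g'
)
echo "$dirs"
```

The result is a single line, abc, def, with no trailing separator.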

Resources