find: name pattern as a variable - linux

A simple situation: I want to keep the file name patterns for the find command in a variable:
#!/bin/bash
EXCLUDE="! \( -name "\"run*"\" -o -name "\"doc*"\" \)"
find . -maxdepth 1 -type f "$EXCLUDE"
The expectation is to find all files that do not match the $EXCLUDE pattern.
This approach doesn't work, although the same pattern works as expected when typed directly on the command line.
In shell tracing mode I noticed something that I suspect is the root cause: the $EXCLUDE variable is expanded inside single quotes:
set -x
find . -maxdepth 1 -type f "$EXCLUDE"
+ find . -maxdepth 1 -type f '! \( -name "run*" -o -name "doc*" \)'
find: paths must precede expression: ! \( -name "run*" -o -name "doc*" \)
Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression]
The expanded $EXCLUDE variable appears in the line starting with +, wrapped in single quotes, and find returns nothing when given '! \( -name "run*" -o -name "doc*" \)' in that form.
Is there a way to remove those quotes, which are not part of the variable, or am I missing something bigger?

This is BashFAQ/050 -- use an array instead:
EXCLUDE=( '!' '(' -name 'run*' -o -name 'doc*' ')' )
Then use the array expansion syntax:
find . -maxdepth 1 -type f "${EXCLUDE[@]}"

Store the arguments in an array:
#! /bin/bash
EXCLUDE=( '!' '(' '-name' 'run*' '-o' '-name' 'doc*' ')' )
find . -maxdepth 1 -type f "${EXCLUDE[@]}"
This avoids quoting errors, because each array element reaches find as exactly one argument.
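To see the difference between the two forms, here is a minimal sketch that runs both in a scratch directory. The file names (run1.sh, doc.txt, keep.txt) and the lowercase variable names are made up for illustration, and the script assumes bash plus a find that supports -maxdepth.

```shell
#!/usr/bin/env bash
# Scratch directory with three hypothetical files.
tmpdir=$(mktemp -d)
touch "$tmpdir/run1.sh" "$tmpdir/doc.txt" "$tmpdir/keep.txt"

# Each find argument is its own array element; no backslashes or
# nested quotes are needed inside the array.
exclude=( '!' '(' -name 'run*' -o -name 'doc*' ')' )

# "${exclude[@]}" expands to one word per element.
array_result=$(cd "$tmpdir" && find . -maxdepth 1 -type f "${exclude[@]}")
echo "array:  $array_result"

# A quoted string expands to a single word, which find cannot parse.
exclude_str='! ( -name run* -o -name doc* )'
if ! (cd "$tmpdir" && find . -maxdepth 1 -type f "$exclude_str") >/dev/null 2>&1; then
    string_status=failed
else
    string_status=ok
fi
echo "string: $string_status"

rm -rf "$tmpdir"
```

Only keep.txt survives the array-based filter, while the string form makes find exit with an error, which matches the "paths must precede expression" message from the question.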

Related

Easiest way to parse command line string to subprocess list?

I'm trying to figure out how to run this command using subprocess.run():
cmd = 'find / \( -path /mnt -prune -o -path /dev -prune -o -path /proc -prune -o -path /sys -prune \) -o ! -type l -type f -or -type d -printf "depth="%d/"perm="%m/"size="%s/"atime="%A@/"mtime"=%T@/"ctime"=%C@/"hardlinks"=%n/"selinux_context"=%Z/"user="%u/"group="%g/"name="%p/"type="%Y\\n'
I've put the command into a list, even removing items, etc:
cmd = [
'find',
'/',
'\( -path /mnt -prune -o -path /dev -prune -o -path /proc -prune -o -path /sys -prune \)',
'-o',
'! -type l',
'-type f',
'-or',
'-type d'
]
I've tried running the command using /bin/bash:
cmd = '/bin/bash -c find / \( -path /mnt -prune -o -path /dev -prune -o -path /proc -prune -o -path /sys -prune \) -o ! -type l -type f -or -type d -printf "depth="%d/"perm="%m/"size="%s/"atime="%A@/"mtime"=%T@/"ctime"=%C@/"hardlinks"=%n/"selinux_context"=%Z/"user="%u/"group="%g/"name="%p/"type="%Y\\n'
Doesn't matter. Everything I've tried does not work. Either I get no output at all, or it lists the files in my home directory, or I get an error, e.g.: b'find: paths must precede expression: ! -type l\nUsage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression]\n'
Is there any easy way to take a command that works at the command line and just parse the string into whatever list elements subprocess.run() wants?
Parsing With shlex.split()
After fixing the incorrect quotes in your printf string, we get:
import shlex

# Raw string: backslashes reach shlex exactly as typed, as they would in the shell.
cmd = r'''
find / \( -path /mnt -prune -o -path /dev -prune -o -path /proc -prune -o -path /sys -prune \) -o ! -type l -type f -or -type d -printf 'depth=%d/perm=%m/size=%s/atime=%A@/mtime=%T@/ctime=%C@/hardlinks=%n/selinux_context=%Z/user=%u/group=%g/name=%p/type=%Y\n'
'''
print(shlex.split(cmd))
...which emits an entirely correct result, and subprocess.call() works with it properly.
Building A Correct Command Line By Hand
In terms of what it looks like to do this by hand:
cmd = [
'find', '/',
'(',
'-path', '/mnt', '-prune',
'-o', '-path', '/dev', '-prune',
'-o', '-path', '/proc', '-prune',
'-o', '-path', '/sys', '-prune',
')',
'-o', '!', '-type', 'l',
'-type', 'f',
'-or',
'-type', 'd',
'-printf', 'depth=%d/perm=%m/size=%s/atime=%A@/mtime=%T@/ctime=%C@/hardlinks=%n/selinux_context=%Z/user=%u/group=%g/name=%p/type=%Y\n'
]
Note:
Syntactic quotes change the shell's parsing mode, they don't become part of the data. "foo" just becomes foo; "foo"bar"baz" becomes foobarbaz. So you can't/shouldn't/don't try to put those quotes into the data that Python is passing in.
This is true also for \(: the backslash is shell syntax. It doesn't actually become one of find's arguments, so you leave it out.
Any space that isn't quoted or escaped separates words; so -type f in shell is '-type', 'f', two separate words.
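
The three notes above can be checked from the shell itself. This is a small sketch, not part of the original answer: show_args is a hypothetical helper that prints each argument it receives on its own line inside angle brackets, so the words the shell actually hands to a command become visible. Those words are what belongs in the Python list.

```shell
#!/usr/bin/env bash
# show_args is a hypothetical helper: it prints every argument it
# receives, one per line, wrapped in <> so word boundaries are visible.
show_args() { printf '<%s>\n' "$@"; }

# Syntactic quotes are removed: this is the single word foobarbaz.
quotes=$(show_args "foo"bar"baz")

# The backslash is shell syntax; the command receives bare parentheses.
paren=$(show_args \( \))

# Unquoted whitespace separates words: two arguments, -type and f.
words=$(show_args -type f)

printf '%s\n' "$quotes" "$paren" "$words"
```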

I want to find files with specific extensions (.pdf, .xls, .ser and .csv) which are older than 30 days

I am struggling to list all the files in the current directory with .pdf, .xls, .ser and .csv extensions that are more than 30 days old.
I am using the command
find $Path -maxdepth 1 -mtime +33 -type f \(-iname "*pdf" -o -iname "*xls" -o -iname "*ser" -o -iname "*csv"\) | xargs ls -ltr >> ${LOG_OUT};
but I am receiving an error:
find: paths must precede expression: (-iname
Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression]
Try this:
find $Path -maxdepth 1 -mtime +33 -type f \( -iname "*pdf" -o -iname "*xls" -o -iname "*ser" -o -iname "*csv" \) | xargs ls -ltr >> ${LOG_OUT};
You need a space after \( and before \), so that each parenthesis reaches find as a separate argument.
You also do not need | xargs; -exec can run ls directly:
find $Path -maxdepth 1 -mtime +33 -type f \( -iname "*pdf" -o -iname "*xls" -o -iname "*ser" -o -iname "*csv" \) -exec ls -ltr {} \; >> ${LOG_OUT}
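
The effect of the missing space can be reproduced in a scratch directory. This is a sketch under assumptions: the file names and the year-2000 timestamp are invented, the patterns are written with a leading dot (a small tightening that is not in the original command), and basename replaces ls -ltr so the result is short and stable.

```shell
#!/usr/bin/env bash
# Scratch directory; file names and the old timestamp are made up.
dir=$(mktemp -d)
touch -t 200001010000 "$dir/old.pdf" "$dir/old.CSV" "$dir/old.txt"
touch "$dir/new.pdf"

# Without the space the shell passes the single word "(-iname",
# which find rejects.
if find "$dir" -maxdepth 1 -type f \(-iname "*.pdf" \) >/dev/null 2>&1; then
    glued=accepted
else
    glued=rejected
fi

# With spaces, "(" and ")" are separate arguments and the group parses.
matches=$(find "$dir" -maxdepth 1 -mtime +33 -type f \
    \( -iname "*.pdf" -o -iname "*.xls" -o -iname "*.ser" -o -iname "*.csv" \) \
    -exec basename {} \; | LC_ALL=C sort)

echo "glued parenthesis: $glued"
echo "$matches"
rm -rf "$dir"
```

Only old.CSV and old.pdf are listed: old.txt has the wrong extension and new.pdf is too recent for -mtime +33.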

bash scripting: looping and file manipulation [duplicate]

I have a list of images, collected using the following line:
# find . -mindepth 1 -type f -name "*.JPG" | grep "MG_[0-9][0-9][0-9][0-9].JPG"
output:
./DCIM/103canon/IMG_0039.JPG
./DCIM/103canon/IMG_0097.JPG
./DCIM/103canon/IMG_1600.JPG
./DCIM/103canon/IMG_2317.JPG
./DCIM/IMG_0042.JPG
./DCIM/IMG_1152.JPG
./DCIM/IMG_1810.JPG
./DCIM/IMG_2564.JPG
./images/IMG_0058.JPG
./images/IMG_0079.JPG
./images/IMG_1233.JPG
./images/IMG_1959.JPG
./images/IMG_2012/favs/IMG_0039.JPG
./images/IMG_2012/favs/IMG_1060.JPG
./images/IMG_2012/favs/IMG_1729.JPG
./images/IMG_2012/favs/IMG_2013.JPG
./images/IMG_2012/favs/IMG_2317.JPG
./images/IMG_2012/IMG_0079.JPG
./images/IMG_2012/IMG_1403.JPG
./images/IMG_2012/IMG_2102.JPG
./images/IMG_2013/IMG_0060.JPG
./images/IMG_2013/IMG_1311.JPG
./images/IMG_2013/IMG_1729.JPG
./images/IMG_2013/IMG_2013.JPG
./IMG_0085.JPG
./IMG_1597.JPG
./IMG_2288.JPG
However, I only want the very last portion, the IMG_\d\d\d\d.JPG. I have tried hundreds of regular expressions and this is the one that gives me the best result. Is there a way to print only the filename without the directory tree before it, or is it solely down to the regex?
Thanks
It should be
find . -mindepth 1 -type f -name "*MG_[0-9][0-9][0-9][0-9].JPG" -printf "%f\n"
If the -printf option is not available with your implementation of find (as in current versions of Mac OS X),
then you can use -execdir echo {} \; instead (if that's available):
find . -mindepth 1 -type f -name "*MG_[0-9][0-9][0-9][0-9].JPG" -execdir echo {} \;
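
If neither -printf nor -execdir is an option, a third variant that is not part of the answer above is to let basename strip the directory part. The sketch below assumes a scratch tree with invented file names that loosely mirror the question's layout. It drops -mindepth, since -type f already excludes the starting directory.

```shell
#!/usr/bin/env bash
# Scratch tree with hypothetical file names.
root=$(mktemp -d)
mkdir -p "$root/DCIM/103canon" "$root/images"
touch "$root/DCIM/103canon/IMG_0039.JPG" "$root/images/IMG_1233.JPG" \
      "$root/images/notes.JPG" "$root/IMG_0085.JPG"

# -name matches against the last path component only, so the glob
# alone selects the files; basename then strips the directory part.
names=$(find "$root" -type f -name 'IMG_[0-9][0-9][0-9][0-9].JPG' \
    -exec basename {} \; | LC_ALL=C sort)

echo "$names"
rm -rf "$root"
```

Because -name already does the filtering, the extra grep from the question is not needed.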

Linux Bash find files on user input with multiple -name clauses

I am trying to create a small utility that collects log files from a remote host into a tarball. For simplicity, assume for now that it only needs to display the list of files that match the user's input.
This command works fine:
find $LOGS_DIR -maxdepth 1 -type f \( -name 'process1.log*' -o -name 'process2.log*' \) -exec echo 'FOUND_FILES:{}' ';'
If I want to build the -name clauses programmatically from the user input, say process3.log*, process4.log*, process5.log*, then my bash script should generate the find command as:
find $LOGS_DIR -maxdepth 1 -type f \( -name 'process3.log*' -o -name 'process4.log*' -o -name 'process5.log*' \) -exec echo 'FOUND_FILES:{}' ';'
Here is my snippet
...
for pattern in "${file_pattern_to_match[@]}"
do
    if [ -z $final_pattern ];then
        final_pattern="-name $pattern"
        continue;
    fi
    final_pattern="$final_pattern -o -name $pattern"
done
#This will print final_pattern: -name process3.log* -o -name process4.log* -o -name process5.log*
echo "final_pattern:$final_pattern"
find $LOGS_DIR -maxdepth 1 -type f \( $final_pattern \) -exec echo "FOUND_FILES:{}" \;
The issue is that, when the script runs, find is evaluated as
find /x/path/logs -maxdepth 1 -type f \( -name process3.log.1 process3.log.2 -o -name process4.log.1 process4.log.2 \) -exec echo "FOUND_FILES:{}" \;
whereas the expected command is
find /x/path/logs -maxdepth 1 -type f \( -name "process3.log.*" -o -name process4.log.* -o -name process5.log.* \) -exec echo "FOUND_FILES:{}" \;
Because the unquoted variable is glob-expanded by the shell, find exits with an error.
Can someone please help me get the expected result above?
Use an array to keep each argument properly quoted.
first=
for pattern in "${file_pattern_to_match[@]}"
do
    if [ -z "$first" ]; then
        # First pattern: start the array without a leading -o.
        final_pattern=(-name "$pattern")
        first=1
    else
        final_pattern+=(-o -name "$pattern")
    fi
done
# Hacky
# first=
# for pattern in "${file_pattern_to_match[@]}"
# do
#     final_pattern+=($first -name "$pattern")
#     first=-o
# done
find "$LOGS_DIR" -maxdepth 1 -type f \( "${final_pattern[@]}" \) -exec echo "FOUND_FILES:{}" \;
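
For an end-to-end check, the sketch below feeds a comma-separated input string through the same array-building idea. The input format, the scratch log directory, and the file names are assumptions made for the demo. It tests the array length instead of a separate flag variable, which is only a stylistic variation on the loop above.

```shell
#!/usr/bin/env bash
# Hypothetical log directory and user input, for illustration only.
logs_dir=$(mktemp -d)
touch "$logs_dir/process3.log.1" "$logs_dir/process3.log.2" \
      "$logs_dir/process4.log.1" "$logs_dir/other.log"
user_input='process3.log*,process4.log*'

# Split the input on commas; read does not glob-expand the pieces.
IFS=',' read -r -a file_pattern_to_match <<< "$user_input"

# Build: -name P1 -o -name P2 ... with every pattern kept as one word.
final_pattern=()
for pattern in "${file_pattern_to_match[@]}"; do
    if [ "${#final_pattern[@]}" -gt 0 ]; then
        final_pattern+=( -o )
    fi
    final_pattern+=( -name "$pattern" )
done

found=$(find "$logs_dir" -maxdepth 1 -type f \( "${final_pattern[@]}" \) \
    -exec basename {} \; | LC_ALL=C sort)
echo "$found"
rm -rf "$logs_dir"
```

The patterns stay intact until find sees them, so other.log is skipped and the three process logs are listed.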

GNU find: when does the default action apply?

The man page of Debian 8's find command says:
If the whole expression contains no actions other than -prune or -print,
-print is performed on all files for which the whole expression is true.
So why do these outputs differ:
$ mkdir -p test/foo test/bar && cd test && touch foo/bar bar/foo
$ # Test 1
$ find . -name foo -type d -prune -o -name foo
./foo
./bar/foo
$ # Test 2
$ find . -name foo -type d -prune -o -name foo -print
./bar/foo
So test 1: does the expression contain "no actions other than -prune or -print?" Well, excluding the prune, yes that statement is true, there are no actions. So these results are expected since for ./foo the expression before the -o option returns True, so it's printed.
But test 2: does the expression contain "no actions other than -prune or -print?" Well, excluding the prune and the print, yes that statement is true again, there are no other actions. So I would expect the same results.
But I don't get ./foo. Why?
It's as if the man page should read: "If the whole expression contains no actions other than -prune, -print is performed on all files for which the whole expression is true."
I'm going with the simpler explanation: the man page is wrong. It should instead say
If the whole expression contains no actions other than -prune, -print is performed on all files for which the whole expression is true.
It should maybe also contain a caveat for -quit, which is an action but causes find to exit immediately. So even though an implicit -print is added for the whole expression, it is never actually executed.
The POSIX find specification contains a clearer explanation, though it defines fewer actions than the expanded GNU version.
If no expression is present, -print shall be used as the expression. Otherwise, if the given expression does not contain any of the primaries -exec, -ok, or -print, the given expression shall be effectively replaced by:
( given_expression ) -print
Out of what GNU calls actions, POSIX only defines -exec, -ok, -print, and -prune. It has none of the extended actions such as -delete or -ls. So the POSIX definition matches the corrected GNU one: of those four, only -prune is left out of the list that suppresses the implicit -print.
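The rule can be observed with two files in a scratch directory. This is a minimal sketch with invented file names (alpha, beta); sed and basename are used only to strip the temporary directory from the output.

```shell
#!/usr/bin/env bash
# Scratch directory with two hypothetical files.
dir=$(mktemp -d)
touch "$dir/alpha" "$dir/beta"

# No action anywhere: find behaves as if it ran ( expr ) -print,
# so both matching files are listed.
implicit=$(find "$dir" -name alpha -o -name beta | sed 's|.*/||' | LC_ALL=C sort)

# -exec is an action, so no -print is added. Because the implicit AND
# binds tighter than -o, -exec belongs to the beta branch only.
explicit=$(find "$dir" -name alpha -o -name beta -exec basename {} \;)

printf 'implicit -print:\n%s\n' "$implicit"
printf 'with -exec:\n%s\n' "$explicit"
rm -rf "$dir"
```

The first command lists alpha and beta, while the second lists only beta: once any action other than -prune appears, nothing is printed for branches that lack their own action.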
Here are some examples using all the GNU find actions, which prove the point. For each of them, consider the following file structure:
$ tree
.
└── file
-delete
$ find -name file -delete
$
-exec command ;
$ find -name file -exec echo '-exec is an action so an implicit -print is not applied' \;
-exec is an action so an implicit -print is not applied
$
-exec command {} +
$ find -name file -exec echo 'This should print the filename twice if an implicit -print is applied: ' {} +
This should print the filename twice if an implicit -print is applied: ./file
$
-fls
$ find -name file -fls file
$
-fprint
$ find -name file -fprint file
$
-ls
$ find -name file -ls
1127767338 0 -rw-rw-r-- 1 user user 0 May 6 07:15 ./file
$
-ok command ;
$ find -name file -ok echo '-ok is an action so an implicit -print is not applied' \;
< echo ... ./file > ? y
-ok is an action so an implicit -print is not applied
$
-okdir command ;
$ find -name file -okdir echo '-okdir is an action so an implicit -print is not applied' \;
< echo ... ./file > ? y
-okdir is an action so an implicit -print is not applied
$
-print
# ./file would be printed twice if an implicit -print was applied
$ find -name file -print
./file
$
-print0
# ./file would be printed twice if an implicit -print was applied
$ find -name file -print0
./file$
-printf
$ find -name file -printf 'Since -printf is an action the implicit -print is not applied\n'
Since -printf is an action the implicit -print is not applied
$
-prune
$ find -name file -prune
./file
$
-quit
$ find -name file -quit
$ find -D opt -name file -quit
...
Optimized command line:
( -name file [0.1] -a [0.1] -quit [1] ) -a [0.1] -print [1]
Let's look at this command:
find . -name foo -type d -prune -o -name foo
Since the command contains no action other than -prune, the default -print is applied to the whole expression, i.e. -name foo -type d -prune -o -name foo. So it's the same as the following:
find . \( -name foo -type d -prune -o -name foo \) -print
Now let's look at this command:
find . -name foo -type d -prune -o -name foo -print
According to man find, expr1 expr2 (implicit AND) has higher precedence than expr1 -o expr2. So in the command above, two expressions are combined with the OR operator:
-name foo -type d -prune
-name foo -print
So if you want to apply -print to both, use parentheses:
find . \( -name foo -type d -prune -o -name foo \) -print
But -prune -o RHS implies that RHS is evaluated only for those items which didn't get pruned.
We can check if we are right by running find with -D tree or -D opt:
find -D opt -O0 . -name foo -type d -prune -o -name foo -print
...
( ( -name foo [0.1] -a [0.04] [need type] -type d [0.4] ) -a [0.04] [call stat] [need type] -prune [1] ) -o [0.14] ( -name foo [0.1] -a [0.1] -print [1] )
./bar/foo
find -D opt -O0 . -name foo -type d -prune -o -name foo
( ( ( -name foo [0.1] -a [0.04] [need type] -type d [0.4] ) -a [0.04] [call stat] [need type] -prune [1] ) -o [1] -name foo [0.1] ) -a [0.14] -print [1]
./foo
./bar/foo
As we can see, find builds (... -prune) -o (... -print) from the first command, where we put -print explicitly, and (...) -a -print from the second command, where we omit -print.
So I think that by "the whole expression" the man page means one of the expression parts described in the OPERATORS section.
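The same grouping can be confirmed without the -D debug output by comparing plain results. The sketch below recreates the question's layout in a scratch directory. run_find is a hypothetical helper that runs find from inside that directory and sorts the output so the comparison does not depend on directory order.

```shell
#!/usr/bin/env bash
# Recreate the question's layout in a scratch directory.
dir=$(mktemp -d)
mkdir -p "$dir/foo" "$dir/bar"
touch "$dir/foo/bar" "$dir/bar/foo"

# run_find is a hypothetical helper: run find inside $dir, sort output.
run_find() { (cd "$dir" && find . "$@") | LC_ALL=C sort; }

# Test 1: no explicit action, so -print wraps the whole expression.
test1=$(run_find -name foo -type d -prune -o -name foo)

# Test 2: -print binds to the right-hand side of -o only.
test2=$(run_find -name foo -type d -prune -o -name foo -print)

# Explicit parentheses reproduce what Test 1 does implicitly.
grouped=$(run_find \( -name foo -type d -prune -o -name foo \) -print)

printf 'test1:\n%s\ntest2:\n%s\ngrouped:\n%s\n' "$test1" "$test2" "$grouped"
rm -rf "$dir"
```

Test 1 and the parenthesized form both list ./bar/foo and ./foo, while Test 2 lists only ./bar/foo.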
Check the GNU Findutils manual; it says:
If the expression contains no actions other than ‘-prune’, ‘-print’ is
performed on all files for which the entire expression is true.
Apparently Debian's man page is wrong, even though the program is just GNU find. I have no idea how this happened, since the text looks like a straight copy to me.
