Execute function on results of `find` - sh - linux

I'm writing a shell script to run in a Docker image based on Alpine. Its shell is /bin/sh.
What I'm trying to do is execute a function for the results of a find command. The following works in my local bash and sh shells.
myscript.sh:
#!/bin/sh
function get_tags {
# do stuff
}
export -f get_tags
# get all YAML files in ./assets/config that have 'FIND' somewhere in the filename
# pass each to the get_tags function
find ./assets/config -type f \( -iname "Find*.yaml" -or -iname "Find*.yml" \) -exec sh -c 'get_tags "$0"' {} \;
When I run it on the alpine image, however, I get the following error:
./myscript.sh: export: line 31: illegal option -f
Is there another way I can do this?
My question is NOT "what is the difference between sh and bash". My question is: how do I accomplish the task of running a function on the output of the find command.

You need to use bash, like this:
#!/bin/bash
fun() { echo "fun ${1}" ; }
export -f fun
find . -name 'foo' -exec bash -c 'fun "${1}"' -- {} \;
The key here is to run bash -c 'fun "${1}"' -- {} \;. You can't call the function directly (and pass arguments to it); you need to wrap it in a minimal script, where that minimal script receives the argument passed by find and passes it through to the function.
Note: I'm passing two arguments to bash -c: the string -- and the actual filename {}. I'm doing this by convention, because argument counting starts at $0 when a script is executed by bash -c, as opposed to $1 when running a script the normal way (from a file, not via bash -c).
bash -c 'fun "${0}"' {} \; would work, but people might think $0 is the script name, as they know it from normal scripts.
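To see the convention in action, you can invoke the wrapper by hand (the filename is a made-up example):
bash -c 'echo "0=$0 1=$1"' -- ./some/file.yaml
This prints 0=-- 1=./some/file.yaml, confirming that the dummy -- lands in $0 and the filename in $1.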

Exporting functions is a Bash feature. Alpine Linux does not come with Bash.
You can instead use a while read loop to process the results, as this is POSIX and will work on all shells:
get_tags() {
    echo "Getting tags for $1"
}

find ./assets/config -type f \( -iname "Find*.yaml" -o -iname "Find*.yml" \) |
while IFS="" read -r file
do
    get_tags "$file"
done
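If you prefer -exec over a pipeline (for example, to cope with filenames that contain newlines), a POSIX-compatible sketch is to define the function inside the sh -c string itself, so no export -f is needed; the echo body here is a stand-in for the real get_tags logic:
find ./assets/config -type f \( -iname "Find*.yaml" -o -iname "Find*.yml" \) -exec sh -c '
get_tags() {
    # stand-in body; replace with the real logic
    echo "Getting tags for $1"
}
for f
do
    get_tags "$f"
done
' find-sh {} +
Here find-sh is a dummy that fills $0, and {} + hands the found files to the inner shell as $1, $2, and so on.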

Related

I've found a way to use find -exec to cp multiple files all in one line like xargs, but I'm not sure exactly how it works

I've been working with find -exec and find | xargs for the past few hours exploring and experimenting, and now I've found a variation of the command that I haven't seen anywhere else.
For example, this find command to get all files in the child subdirectories and copy them to the current directory
find . -type f -regex './[^\.].*/[^.*].*' -exec sh -c 'cp "$@" .' thiscanbeanything {} +
will all execute on one line like so:
cp ./testy/bar ./testy/baz ./testy/foo .
Instead of the usual:
find . -type f -regex './[^\.].*/[^.*].*' -exec sh -c 'cp {} .' \;
which executes on multiple lines
cp ./testy/bar .
cp ./testy/baz .
cp ./testy/foo .
Moreover in the first command the output will be only:
cp ./testy/baz ./testy/foo .
Unless the sh -c 'cmd' is followed by something else, which in my example was thiscanbeanything.
Could someone elucidate what's going on, or if this is even viable?
To understand what is going on, have a look at this:
$ sh -c 'echo 0=$0 1=$1 2=$2' thiscanbeanything one two
0=thiscanbeanything 1=one 2=two
This executes sh with the option -c 'echo 0=$0 1=$1 2=$2' and three arguments thiscanbeanything one two.
Normally, $0 is the name of the script being executed. When running sh -c, there is no script file, so the name is taken from the first argument that you provide, which in this case is thiscanbeanything.
Documentation
This behavior is documented in man bash under the -c option:
-c string    If the -c option is present, then commands are read from string. If there are arguments after the string, they are assigned to the positional parameters, starting with $0.
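The same mechanics explain the {} + form: find appends all the found filenames as arguments after thiscanbeanything, so the dummy lands in $0 and "$@" expands to every file. A quick demonstration (the file paths are made up):
sh -c 'echo "got $# args:" "$@"' thiscanbeanything ./testy/bar ./testy/baz ./testy/foo
got 3 args: ./testy/bar ./testy/baz ./testy/foo
That is why cp "$@" . copies everything in one invocation, and why leaving out the dummy makes $0 silently swallow the first file.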

What is Flow Control in Bash?

In this question the answer was
find . -type f -name \*.mp4 -exec sh -c 'ffprobe "$0" 2>&1 |
grep -q 1920x1080 && echo "$0"' {} \;
which will output all mp4 files that are 1920x1080.
I don't understand why sh -c is there. If I remove it, then it doesn't find anything.
The author says
The new shell is required to handle the flow control inside the exec'd command.
but I guess I am missing some fundamental knowledge to understand the answer.
Question
Can anyone explain why sh -c has to be there, and why it only works when ffprobe is opened in a new shell?
The -exec option takes a sequence of arguments:
find . -exec arg0 arg1 arg2 ... \;
If you put the arguments in quotes
find . -exec "arg0 arg1 arg2" \;
then "arg0 arg1 arg2" are treated as a single argument. It would expect a command called arg0 arg1 arg2, with the spaces, to exist on your system, instead of a command called arg0 with parameters of arg1 and arg2.
If you were to use find without the sh -c, then you would have this:
find . -type f -name \*.mp4 -exec 'ffprobe "{}" 2>&1 |
grep -q 1920x1080 && echo "{}"' \;
This would mean that find would look for a command whose name is the entire quoted string, ffprobe "{}" ...., with no arguments -- and there is no such command. There is a command called ffprobe, which takes arguments, and that is what you need. One possibility would be to do something like this:
find . -type f -name \*.mp4 -exec ffprobe '{}' 2>&1 |
grep -q 1920x1080 && echo '{}' \;
However, that doesn't work, since the output redirection 2>&1 and the pipe | and the command sequence operator && would all be treated differently than what you want.
To get around this, they use another shell. This is similar to creating a script to do the work:
find . -type f -name \*.mp4 -exec myscript {} \;
But instead of a separate script, everything is all on one line.
find executes the arguments given to -exec as a command directly: it invokes the first argument as a program, with the following arguments as arguments to that program, without doing any processing on them other than replacing {} with the file name. That is, it does not implement any shell features like piping or redirection. In your case, the command contains pipes and redirections, so you need to run it through sh so that sh handles those.
The difference lies in the implementation of find. find uses the fork() and exec() family of Linux/Unix (POSIX) calls to spawn a new process and pass the command with arguments to it. The exec() call runs an executable (a binary or a shebang-interpreted program), passing the arguments directly to it as argument strings. No processing of those arguments is done.
Therefore shell conventions are not interpreted: $0, 2>&1, '|' etc. are passed to ffprobe as literal arguments, which is not what you want. By adding sh -c, you are telling find to exec a shell, passing the rest of the line to it for interpretation.
Note that you could get the same effect by putting your command in a shebang script and calling this script from find. Here the exec'ed script would load the shell.
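For example, a sketch of that shebang-script equivalent (check_1080p.sh is a made-up name, made executable with chmod +x):
#!/bin/sh
# check_1080p.sh: print the filename if ffprobe reports 1920x1080
ffprobe "$1" 2>&1 | grep -q 1920x1080 && echo "$1"
and then:
find . -type f -name \*.mp4 -exec ./check_1080p.sh {} \;
The shell named on the shebang line performs the same interpretation of the pipe and the redirection that sh -c performs in the one-liner.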

Find and basename not playing nicely

I want to echo out the filename portion of a find on the Linux command line. I've tried to use the following:
find www/*.html -type f -exec sh -c "echo $(basename {})" \;
and
find www/*.html -type f -exec sh -c "echo `basename {}`" \;
and a whole host of other combinations of escaping and quoting various parts of the text. The result is that the path isn't stripped:
www/channel.html
www/definition.html
www/empty.html
www/index.html
www/privacypolicy.html
Why not?
Update: While I have a working solution below, I'm still interested in why "basename" doesn't do what it should do.
The trouble with your original attempt:
find www/*.html -type f -exec sh -c "echo $(basename {})" \;
is that the $(basename {}) code is executed once, before the find command is executed. The output of the single basename is {} since that is the basename of {} as a filename. So, the command that is executed by find is:
sh -c "echo {}"
for each file found, but find actually substitutes the original (unmodified) file name each time because the {} characters appear in the string to be executed.
If you wanted it to work, you could use single quotes instead of double quotes:
find www/*.html -type f -exec sh -c 'echo $(basename {})' \;
However, making echo repeat to standard output what basename would have written to standard output anyway is a little pointless:
find www/*.html -type f -exec sh -c 'basename {}' \;
and we can reduce that still further, of course, to:
find www/*.html -type f -exec basename {} \;
Could you also explain the difference between single quotes and double quotes here?
This is routine shell behaviour. Let's take a slightly different command (but only slightly — the names of the files could be anywhere under the www directory, not just one level down), and look at the single-quote (SQ) and double-quote (DQ) versions of the command:
find www -name '*.html' -type f -exec sh -c "echo $(basename {})" \; # DQ
find www -name '*.html' -type f -exec sh -c 'echo $(basename {})' \; # SQ
The single quotes pass the material enclosed direct to the command. Thus, in the SQ command line, the shell that launches find removes the enclosing quotes and the find command sees its $9 argument as:
echo $(basename {})
because the shell removes the quotes. By comparison, the material in the double quotes is processed by the shell. Thus, in the DQ command line, the shell (that launches find — not the one launched by find) sees the $(basename {}) part of the string and executes it, getting back {}, so the string it passes to find as its $9 argument is:
echo {}
Now, when find does its -exec action, in both cases it replaces the {} by the filename that it just found (for sake of argument, www/pics/index.html). Thus, you get two different commands being executed:
sh -c 'echo $(basename www/pics/index.html)' # SQ
sh -c "echo www/pics/index.html" # DQ
There's a (slight) notational cheat going on there — those are the equivalent commands that you'd type at the shell. The $2 of the shell that is launched actually has no quotes in it in either case — the launched shell does not see any quotes.
As you can see, the DQ command simply echoes the file name; the SQ command runs the basename command and captures its output, and then echoes the captured output. A little bit of reductionist thinking shows that the DQ command could be written as -print instead of using -exec, and the SQ command could be written as -exec basename {} \;.
If you're using GNU find, it supports the -printf action which can be followed by Format Directives such that running basename is unnecessary. However, that is only available in GNU find; the rest of the discussion here applies to any version of find you're likely to encounter.
Try this instead:
find www/*.html -type f -printf '%f\n'
If you want to do it with a pipe (more resources needed):
find www/*.html -type f -print0 | xargs -0 -n1 basename
That's how I batch-resize files with ImageMagick, deriving the output filename from the source:
find . -name header.png -exec sh -c 'convert -geometry 600 {} $(dirname {})/$(basename {} ".png")_mail.png' \;
I had to accomplish something similar, and found that following the recommended practice of passing the filename to sh as an argument (instead of looping over find's output) sidesteps these problems with {} and -printf entirely.
You can try it like this:
find www/*.html -type f -exec sh -c 'echo $(basename $1)' find-sh {} \;
The summary is: don't reference {} directly inside of sh -c; instead, pass it to sh -c as an argument, and then reference it with a positional parameter inside of sh -c. The find-sh is just a dummy to take up $0, so that {} lands in $1.
I'm assuming the use of echo here is just to demonstrate the concept. There are easier ways to simply echo, as others have mentioned, but an ideal use case for this pattern is a command like cp or mv, or anything more complex where you want to reference the found file name more than once and you need to strip the path, e.g. when you have to specify the filename in both source and destination, or when renaming things.
So for instance, if you wanted to copy only the html documents to your public_html directory (Why? because Example!) then you could:
find www/*.html -type f -exec sh -c 'cp /var/www/$(basename $1) /home/me/public_html/$(basename $1)' find-sh {} \;
Over on Unix Stack Exchange, user Wildcard's answer on looping with find goes into some great gems on the usage of -exec and sh -c. (You can find it here: https://unix.stackexchange.com/questions/321697/why-is-looping-over-finds-output-bad-practice)

Bash Script to find files

Good day,
I've found an easy way to find files that have certain content, but I would like to create a bash script to do it quicker.
The script is:
#!/bin/bash
DIRECTORY=$(cd `dirname .` && pwd)
ARGUMENTS="'$@'"
echo find: $ARGUMENTS on $DIRECTORY
find $DIRECTORY -iname '*' | xargs grep $ARGUMENTS -sl
So if I write:
$ script.sh text
It should find in that directory files that contains 'text'
But when I execute this script it always fails, even though the echo command shows exactly what I need. What's wrong with this script?
Thank you!
Luis
References: http://www.liamdelahunty.com/tips/linux_find_string_files.php
There are quoting problems in this script that will break it if either the current directory or the search pattern contains a space. The following is simpler, and fixes both issues:
find . -maxdepth 1 -type f -exec grep "$@" {} +
With the proper quoting of $@, you can even pass options to grep, such as -i.
./script -i "some text"
Try this version, with the following changes:
1. Use "$1" instead of $@ unless you intend to run multiple find/grep passes to search for multiple patterns.
2. Use find $DIR -type f to find all files instead of find $DIR -iname '*'.
3. Avoid piping by using the -exec option of find.
4. Do not single-quote the command line arguments to your script; this was the main problem with the version you had. Your grep string had embedded single quotes: 'search_string'.
#!/bin/bash
DIRECTORY=$(cd `dirname .` && pwd)
ARGUMENTS="$1"
echo find: $ARGUMENTS on $DIRECTORY
find $DIRECTORY -type f -exec grep -sl "$ARGUMENTS" {} \;
There is no point extracting all the command line arguments and passing them to grep. If you want to search for a string with spaces, pass the string within single quotes on the command line, as follows:
/home/user/bin/test-find.sh 'i need to search this'
Why not just run the following?
grep -R text .

Redirecting stdout with find -exec and without creating new shell

I have one script that only writes data to stdout. I need to run it for multiple files and generate a different output file for each input file and I was wondering how to use find -exec for that. So I basically tried several variants of this (I replaced the script by cat just for testability purposes):
find * -type f -exec cat "{}" > "{}.stdout" \;
but could not make it work since all the data was being written to a file literally named {}.stdout.
Eventually, I could make it work with :
find * -type f -exec sh -c "cat {} > {}.stdout" \;
But while this latest form works well with cat, my script requires environment variables loaded through several initialization scripts, thus I end up with:
find * -type f -exec sh -c "initscript1; initscript2; ...; myscript {} > {}.stdout" \;
Which seems a waste because I have everything already initialized in my current shell.
Is there a better way of doing this with find? Other one-liners are welcome.
You can do it with eval. It may be ugly, but so is having to make a shell script for this. Plus, it's all on one line.
For example
find -type f -exec bash -c "eval md5sum {} > {}.sum " \;
A simple solution would be to put a wrapper around your script:
#!/bin/sh
myscript "$1" > "$1.stdout"
Call it myscript2 and invoke it with find:
find . -type f -exec myscript2 {} \;
Note that although most implementations of find allow you to do what you have done, technically the behavior of find is unspecified if you use {} more than once in the argument list of -exec.
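A portable way around that restriction is to pass the filename once and reuse it through a positional parameter inside the child shell (myscript stands in for your actual script):
find . -type f -exec sh -c 'myscript "$1" > "$1.stdout"' _ {} \;
Here _ is a dummy filling $0, and the single {} becomes $1, which can be referenced as many times as needed.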
If you export your environment variables, they'll already be present in the child shell (if you use bash -c instead of sh -c, and your parent shell is itself bash, then you can also export functions in the parent shell and have them usable in the child; see export -f).
Moreover, by using -exec ... {} +, you can limit the number of shells to the smallest possible number needed to pass all arguments on the command line:
set -a # turn on automatic export of all variables
source initscript1
source initscript2
# pass as many filenames as possible to each sh -c, iterating over them directly
find * -name '*.stdout' -prune -o -type f \
-exec sh -c 'for arg; do myscript "$arg" > "${arg}.stdout"; done' _ {} +
Alternately, you can just perform the execution in your current shell directly:
while IFS= read -r -d '' filename; do
    myscript "$filename" > "${filename}.stdout"
done < <(find * -name '*.stdout' -prune -o -type f -print0)
See UsingFind discussing safely and correctly performing bulk actions through find; and BashFAQ #24 discussing the use of process substitution (the <(...) syntax) to ensure that operations are performed in the parent shell.
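The process-substitution form matters whenever the loop needs to update variables in the current shell; piping find into while would run the loop body in a subshell and discard them. A small illustration (count is just an example variable):
count=0
while IFS= read -r -d '' filename; do
    count=$((count + 1))
done < <(find * -name '*.stdout' -prune -o -type f -print0)
echo "processed $count files"    # still set: the loop ran in the parent shell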
