Find and basename not playing nicely - linux

I want to echo out the filename portion of a find on the linux commandline. I've tried to use the following:
find www/*.html -type f -exec sh -c "echo $(basename {})" \;
and
find www/*.html -type f -exec sh -c "echo `basename {}`" \;
and a whole host of other combinations of escaping and quoting various parts of the text. The result is that the path isn't stripped:
www/channel.html
www/definition.html
www/empty.html
www/index.html
www/privacypolicy.html
Why not?
Update: While I have a working solution below, I'm still interested in why "basename" doesn't do what it should do.

The trouble with your original attempt:
find www/*.html -type f -exec sh -c "echo $(basename {})" \;
is that the $(basename {}) code is executed once, before the find command is executed. The output of the single basename is {} since that is the basename of {} as a filename. So, the command that is executed by find is:
sh -c "echo {}"
for each file found, but find actually substitutes the original (unmodified) file name each time because the {} characters appear in the string to be executed.
If you wanted it to work, you could use single quotes instead of double quotes:
find www/*.html -type f -exec sh -c 'echo $(basename {})' \;
However, making echo repeat to standard output what basename would have written to standard output anyway is a little pointless:
find www/*.html -type f -exec sh -c 'basename {}' \;
and we can reduce that still further, of course, to:
find www/*.html -type f -exec basename {} \;
Could you also explain the difference between single quotes and double quotes here?
This is routine shell behaviour. Let's take a slightly different command (but only slightly — the names of the files could be anywhere under the www directory, not just one level down), and look at the single-quote (SQ) and double-quote (DQ) versions of the command:
find www -name '*.html' -type f -exec sh -c "echo $(basename {})" \; # DQ
find www -name '*.html' -type f -exec sh -c 'echo $(basename {})' \; # SQ
The single quotes pass the material enclosed direct to the command. Thus, in the SQ command line, the shell that launches find removes the enclosing quotes and the find command sees its $9 argument as:
echo $(basename {})
because the shell removes the quotes. By comparison, the material in the double quotes is processed by the shell. Thus, in the DQ command line, the shell (that launches find — not the one launched by find) sees the $(basename {}) part of the string and executes it, getting back {}, so the string it passes to find as its $9 argument is:
echo {}
Now, when find does its -exec action, in both cases it replaces the {} by the filename that it just found (for sake of argument, www/pics/index.html). Thus, you get two different commands being executed:
sh -c 'echo $(basename www/pics/index.html)' # SQ
sh -c "echo www/pics/index.html" # DQ
There's a (slight) notational cheat going on there — those are the equivalent commands that you'd type at the shell. The $2 of the shell that is launched actually has no quotes in it in either case — the launched shell does not see any quotes.
As you can see, the DQ command simply echoes the file name; the SQ command runs the basename command and captures its output, and then echoes the captured output. A little bit of reductionist thinking shows that the DQ command could be written as -print instead of using -exec, and the SQ command could be written as -exec basename {} \;.
If you're using GNU find, it supports the -printf action which can be followed by Format Directives such that running basename is unnecessary. However, that is only available in GNU find; the rest of the discussion here applies to any version of find you're likely to encounter.

Try this instead :
find www/*.html -type f -printf '%f\n'
If you want to do it with a pipe (more resources needed) :
find www/*.html -type f -print0 | xargs -0 -n1 basename

Thats how I batch resize files with imagick, rediving output filename from source
find . -name header.png -exec sh -c 'convert -geometry 600 {} $(dirname {})/$(basename {} ".png")_mail.png' \;

I had to accomplish something similar, and found following the practices mentioned for avoiding looping over find's output and using find with sh sidestepped these problems with {} and -printfentirely.
You can try it like this:
find www/*.html -type f -exec sh -c 'echo $(basename $1)' find-sh {} \;
The summary is "Don't reference {} directly inside of a sh -c but instead pass it to sh -c as an argument, then you can reference it with a number variable inside of sh -c" the find-sh is just there as a dummy to take up the $0, there is more utility in doing it that way and using {} for $1.
I'm assuming the use of echo is really to simplify the concept and test function. There are easier ways to simply echo as others have mentioned, But an ideal use case for this scenario might be using cp, mv, or any more complex commands where you want to reference the found file names more than once in the command and you need to get rid of the path, eg. when you have to specify filename in both source and destination or if you are renaming things.
So for instance, if you wanted to copy only the html documents to your public_html directory (Why? because Example!) then you could:
find www/*.html -type f -exec sh -c 'cp /var/www/$(basename $1) /home/me/public_html/$(basename $1)' find-sh {} \;
Over on unix stackexchange, user wildcard's answer on looping with find goes into some great gems on usage of -exec and sh -c. (You can find it here: https://unix.stackexchange.com/questions/321697/why-is-looping-over-finds-output-bad-practice)

Related

Variable causing issue while doing the test command in unix:: find $PWD -type d -exec sh -c 'test "{}" ">" "$PWD/$VersionFolders"' ; -print|wc -l`

Variable causing issue while doing the test command in unix Command is::
find $PWD -type d -exec sh -c 'test "{}" ">" "$PWD/$VersionFolders"' \; -print|wc -l`
Input Values-
Here $PWD- Current Directory
b1_v.1.0
b1_v.1.2
b1_v.1.3
b1_v.1.4
Given Version folder as $VersionFolders
b1_v.1.2
The Command should check if any folders exist in current directory which is greater than the give version folder and it should count or display.
This approach has to be consider with out date or time created of folders.
Expected Output-
b1_v.1.3
b1_v.1.4
If I give hard code Directories its working fine. But when I pass it like as variable.it give all folders.
working fine this commend-
find $PWD -type d -exec sh -c 'test "{}" ">" "$PWD/b1_v.1.2"' \; -print|wc -l`
Not working this command with variable-
find $PWD -type d -exec sh -c 'test "{}" ">" "$PWD/$VersionFolders"' ; -print|wc -l`
The variable $VersionFolders won't get expanded inside the single quotes, and apparently you did not export it to make it visible to subprocesses.
An obscure but common hack is to put it in $0 (which nominally should contain the name of the shell itself, and is the first argument after sh -c '...') because that keeps the code simple.
find . -type d \
-exec sh -c 'test "$1" ">" "$0"' \
"$VersionFolders" {} \; \
-print | wc -l
But as #chepner remarks, you can run -exec test {} ">" "$VersionFolders" \; directly.
The shell already does everything in the current directory, so you don't need to spell out $PWD. Perhaps see also What exactly is current working directory?

How to use grep to reverse search files in a folder

I'm trying to create a script which will find missing topics from multiple log files. These logfiles are filled top down, so the newest logs are at the bottom of the file. I would like to grep only the last line from this file which includes UNKNOWN_TOPIC_OR_PARTITION. This should be done in multiple files with completely different names. Is grep the best solution or is there another solution that suits my needs. I already tried adding tail, but that doesn't seem to work.
missingTopics=$(grep -Ri -m1 --exclude=*.{1,2,3,4,5} UNKNOWN_TOPIC_OR_PARTITION /app/tibco/log/tra/domain/)
You could try a combination of find, tac and grep:
find /app/tibco/log/tra/domain -type f ! -name '*.[1-5]' -exec sh -c \
'tac "$1" | grep -im1 UNKNOWN_TOPIC_OR_PARTITION' "sh" '{}' \;
tac prints files in reverse, the -exec sh -c SCRIPT "sh" '{}' \; action of find executes the shell SCRIPT each time a file matching the previous tests is found. The SCRIPT is executed with "sh" as parameter $0 and the path of the found file as parameter $1.
If performance is an issue you can probably improve it with:
find . -type f ! -name '*.[1-5]' -exec sh -c 'for f in "$#"; do \
tac "$f" | grep -im1 UNKNOWN_TOPIC_OR_PARTITION; done' "sh" '{}' +
which will spawn less shells. If security is also an issue you can also replace -exec by -execdir (even if with this SCRIPT I do not immediately see any exploit).

How can I change the extension of files of a type using "find" with Bash? [duplicate]

This question already has answers here:
Recursively change file extensions in Bash
(6 answers)
Closed 6 years ago.
The user inputs a file type they are looking for; it is stored in $arg1; the file type they would like to change them is stored as $arg2. I'm able to find what I'm looking for, but I'm unsure of how to keep the filename the same but just change the type... ie., file1.txt into file1.log.
find . -type f -iname "*.$arg1" -exec mv {} \;
To enable the full power of shell parameter expansions, you can call bash -c in your exec action:
find . -type f -iname "*.$arg1" \
-exec bash -c 'echo mv "$1" "${1/%.*/$1}"' _ {} "$arg2" \;
We add {} and "$arg2" as a parameters to bash -c, so they become accessible within the command as $0 and $1. ${0%.*} removes the extension, to be replaced by whatever $arg2 expands to.
As it is, the command just prints the mv commands it would execute; to actually rename the files, the echo has to be removed.
The quoting is relevant: the argument to bash -c is in single quotes to prevent $0 and $1 from being expanded prematurely, and the two arguments to mv, and arg2 are also quoted to deal with file names with spaces in them.
Combining the find -exec bash idea with the bash loop idea, you can use the + terminator on the -exec to tell find to pass multiple filenames to a single invocation of the bash command. Pass the new type as the first argument - which shows up in $0 and so is conveniently skipped by a for loop over the rest of the command-line arguments - and you have a pretty efficient solution:
find . -type f -iname "*.$arg1" -exec bash -c \
'for arg; do mv "$arg" "${arg%.*}.$0"; done' "$arg2" {} +
Alternatively, if you have either version of the Linux rename command, you can use that. The Perl one (a.k.a. prename, installed by default on Ubuntu and other Debian-based distributions; also available for OS X from Homebrew via brew install rename) can be used like this:
find . -type f -iname "*.$arg1" -exec rename 's/\Q'"$arg1"'\E$/'"$arg2"'/' {} +
That looks a bit ugly, but it's really just the s/old/new/ substitution command familiar from many UNIX tools. The \Q and \E around $arg1 keep any weird characters inside the suffix from being interpreted as regular expression metacharacters that might match something unexpected; the $ after the \E makes sure the pattern only matches at the end of the filename.
The pattern-based version installed by default on Red Hat-based Linux distros (Fedora, CentOS, etc) is simpler:
find . -type f -iname "*.$arg1" -exec rename ".$arg1" ".$arg2" {} +
but it's also dumber: if you rename .com .exe stackoverflow.com_scanner.com, you'll get a file named stackoverflow.exe_scanner.exe.
I would do it like so:
find . -type f -iname "*.$arg1" -print0 |\
while IFS= read -r -d '' file; do
mv -- "$file" "${file%$arg1}$arg2"
done
I took your find command and fed its output to a while loop. Within that loop, I am doing the actual renaming. This way I have the name of the file as a variable that I can manipulate using bash's string manipulation operations.
If you have perl based rename command
Sample directory:
$ find
.
./a"bn.txt
./t2.abc
./abc
./abc/t1.txt
./abc/list.txt
./a bc.txt
Sample args:
$ arg1='txt'
$ arg2='log'
Dry run:
$ find -type f -iname "*.$arg1" -exec rename -n "s/$arg1$/$arg2/" {} +
rename(./a"bn.txt, ./a"bn.log)
rename(./abc/t1.txt, ./abc/t1.log)
rename(./abc/list.txt, ./abc/list.log)
rename(./a bc.txt, ./a bc.log)
Remove -n option once it is okay:
$ find -type f -iname "*.$arg1" -exec rename "s/$arg1$/$arg2/" {} +
$ find
.
./a bc.log
./t2.abc
./abc
./abc/list.log
./abc/t1.log
./a"bn.log

Redirecting stdout with find -exec and without creating new shell

I have one script that only writes data to stdout. I need to run it for multiple files and generate a different output file for each input file and I was wondering how to use find -exec for that. So I basically tried several variants of this (I replaced the script by cat just for testability purposes):
find * -type f -exec cat "{}" > "{}.stdout" \;
but could not make it work since all the data was being written to a file literally named{}.stdout.
Eventually, I could make it work with :
find * -type f -exec sh -c "cat {} > {}.stdout" \;
But while this latest form works well with cat, my script requires environment variables loaded through several initialization scripts, thus I end up with:
find * -type f -exec sh -c "initscript1; initscript2; ...; myscript {} > {}.stdout" \;
Which seems a waste because I have everything already initialized in my current shell.
Is there a better way of doing this with find? Other one-liners are welcome.
You can do it with eval. It may be ugly, but so is having to make a shell script for this. Plus, it's all on one line.
For example
find -type f -exec bash -c "eval md5sum {} > {}.sum " \;
A simple solution would be to put a wrapper around your script:
#!/bin/sh
myscript "$1" > "$1.stdout"
Call it myscript2 and invoke it with find:
find . -type f -exec myscript2 {} \;
Note that although most implementations of find allow you to do what you have done, technically the behavior of find is unspecified if you use {} more than once in the argument list of -exec.
If you export your environment variables, they'll already be present in the child shell (If you use bash -c instead of sh -c, and your parent shell is itself bash, then you can also export functions in the parent shell and have them usable in the child; see export -f).
Moreover, by using -exec ... {} +, you can limit the number of shells to the smallest possible number needed to pass all arguments on the command line:
set -a # turn on automatic export of all variables
source initscript1
source initscript2
# pass as many filenames as possible to each sh -c, iterating over them directly
find * -name '*.stdout' -prune -o -type f \
-exec sh -c 'for arg; do myscript "$arg" > "${arg}.stdout"' _ {} +
Alternately, you can just perform the execution in your current shell directly:
while IFS= read -r -d '' filename; do
myscript "$filename" >"${filename}.out"
done < <(find * -name '*.stdout' -prune -o -type f -print0)
See UsingFind discussing safely and correctly performing bulk actions through find; and BashFAQ #24 discussing the use of process substitution (the <(...) syntax) to ensure that operations are performed in the parent shell.

Why does find -exec mv {} ./target/ + not work?

I want to know exactly what {} \; and {} \+ and | xargs ... do. Please clarify these with explanations.
Below 3 commands run and output same result but the first command takes a little time and the format is also little different.
find . -type f -exec file {} \;
find . -type f -exec file {} \+
find . -type f | xargs file
It's because 1st one runs the file command for every file coming from the find command. So, basically it runs as:
file file1.txt
file file2.txt
But latter 2 find with -exec commands run file command once for all files like below:
file file1.txt file2.txt
Then I run the following commands on which first one runs without problem but second one gives error message.
find . -type f -iname '*.cpp' -exec mv {} ./test/ \;
find . -type f -iname '*.cpp' -exec mv {} ./test/ \+ #gives error:find: missing argument to `-exec'
For command with {} \+, it gives me the error message
find: missing argument to `-exec'
why is that? can anyone please explain what am I doing wrong?
The manual page (or the online GNU manual) pretty much explains everything.
find -exec command {} \;
For each result, command {} is executed. All occurences of {} are replaced by the filename. ; is prefixed with a slash to prevent the shell from interpreting it.
find -exec command {} +
Each result is appended to command and executed afterwards. Taking the command length limitations into account, I guess that this command may be executed more times, with the manual page supporting me:
the total number of invocations of the command will be much less than the number of matched files.
Note this quote from the manual page:
The command line is built in much the same way that xargs builds its command lines
That's why no characters are allowed between {} and + except for whitespace. + makes find detect that the arguments should be appended to the command just like xargs.
The solution
Luckily, the GNU implementation of mv can accept the target directory as an argument, with either -t or the longer parameter --target. It's usage will be:
mv -t target file1 file2 ...
Your find command becomes:
find . -type f -iname '*.cpp' -exec mv -t ./test/ {} \+
From the manual page:
-exec command ;
Execute command; true if 0 status is returned. All following arguments to find are taken to be arguments to the command until an argument consisting of `;' is encountered. The string `{}' is replaced by the current file name being processed everywhere it occurs in the arguments to the command, not just in arguments where it is alone, as in some versions of find. Both of these constructions might need to be escaped (with a `\') or quoted to protect them from expansion by the shell. See the EXAMPLES section for examples of the use of the -exec option. The specified command is run once for each matched file. The command is executed in the starting directory. There are unavoidable security problems surrounding use of the -exec action; you should use the -execdir option instead.
-exec command {} +
This variant of the -exec action runs the specified command on the selected files, but the command line is built by appending each selected file name at the end; the total number of invocations of the command will be much less than the number of matched files. The command line is built in much the same way that xargs builds its command lines. Only one instance of `{}' is allowed within the command. The command is executed in the starting directory.
I encountered the same issue on Mac OSX, using a ZSH shell: in this case there is no -t option for mv, so I had to find another solution.
However the following command succeeded:
find .* * -maxdepth 0 -not -path '.git' -not -path '.backup' -exec mv '{}' .backup \;
The secret was to quote the braces. No need for the braces to be at the end of the exec command.
I tested under Ubuntu 14.04 (with BASH and ZSH shells), it works the same.
However, when using the + sign, it seems indeed that it has to be at the end of the exec command.
The standard equivalent of find -iname ... -exec mv -t dest {} + for find implementations that don't support -iname or mv implementations that don't support -t is to use a shell to re-order the arguments:
find . -name '*.[cC][pP][pP]' -type f -exec sh -c '
exec mv "$#" /dest/dir/' sh {} +
By using -name '*.[cC][pP][pP]', we also avoid the reliance on the current locale to decide what's the uppercase version of c or p.
Note that +, contrary to ; is not special in any shell so doesn't need to be quoted (though quoting won't harm, except of course with shells like rc that don't support \ as a quoting operator).
The trailing / in /dest/dir/ is so that mv fails with an error instead of renaming foo.cpp to /dest/dir in the case where only one cpp file was found and /dest/dir didn't exist or wasn't a directory (or symlink to directory).
find . -name "*.mp3" -exec mv --target-directory=/home/d0k/Музика/ {} \+
no, the difference between + and \; should be reversed. + appends the files to the end of the exec command then runs the exec command and \; runs the command for each file.
The problem is find . -type f -iname '*.cpp' -exec mv {} ./test/ \+ should be find . -type f -iname '*.cpp' -exec mv {} ./test/ + no need to escape it or terminate the +
xargs I haven't used in a long time but I think works like +.

Resources