Find files matching a pattern, replace strings and then diff the output with original, command fails - linux

I am trying to find files named names.* and run sed on the ones that match, then pipe to diff to see what was changed.
However, the command fails. If I remove the pipe and diff, it is happy to output results. Why is it failing with diff? Is there a better way to do this?
> find -type f -name "names.*" -printf '%p' -exec sed 's/Cow/Kitten' {} | diff {} - \;
diff: extra operand ';'
diff: Try 'diff --help' for more information.
find: missing argument to `-exec'

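The | is parsed by your interactive shell before find ever runs, so find receives an -exec without its terminating \; and diff receives {} - ; as its operands. To run a pipeline per file, -exec has to start a shell, like so.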
find -type f -name "names.*" -exec sh -c '
for f; do
sed "s/Cow/Kitten/" "$f" | diff "$f" -
done' _ {} \;
As a one-liner:
find -type f -name "names.*" -exec sh -c 'for f; do sed "s/Cow/Kitten/" "$f" | diff "$f" -; done' _ {} \;
See understanding-the-exec-option-of-find
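Because the inline script already loops over its arguments, it can also be handed many files per shell by ending -exec with + instead of \;. A minimal sketch, assuming a throwaway directory from mktemp and a sample file names.txt (both made up here only to keep the example runnable):

```shell
#!/bin/sh
# Throwaway sandbox with one sample file (hypothetical data).
demo_dir=$(mktemp -d)
printf 'Cow\nDog\n' > "$demo_dir/names.txt"

# "{} +" passes many files to a single sh; the for loop walks them.
# diff exits 1 when the inputs differ, so find reports a non-zero status
# as well; "|| true" keeps that from being treated as a failure here.
diff_output=$(
  find "$demo_dir" -type f -name "names.*" -exec sh -c '
    for f; do
      sed "s/Cow/Kitten/" "$f" | diff "$f" -
    done' sh {} +
) || true

printf '%s\n' "$diff_output"
rm -rf "$demo_dir"
```

The word sh after the script text only fills $0; the file names start at $1.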
Or using a while + read loop and Process Substitution.
#!/usr/bin/env bash
while IFS= read -rd '' files; do
sed 's/Cow/Kitten/' "$files" | diff "$files" -
done < <(find -type f -name "names.*" -print0)
The latter script is safe for file names containing spaces, tabs, or newlines, but it is strictly bash, as opposed to the former script, which is POSIX sh and should work with any POSIX-compliant shell.
See How can I find and safely handle file names containing newlines, spaces or both?
See How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?

Related

I want to get an output of the find command in shell script

I am trying to write a script that finds the files older than 10 hours in the sub-directories listed in "HS_client_list" and sends the output to a file, "find.log".
#!/bin/bash
while IFS= read -r line; do
echo Executing cd /moveit/$line
cd /moveit/$line
#Find files more than 600 minutes old.
find $PWD -type f -iname "*.enc" -mmin +600 -execdir basename '{}' ';' | xargs ls > /home/infa91punv/find.log
done < HS_client_list
The script is able to cd to the folders from HS_client_list (this file contains the names of the subdirectories), but the find command (find $PWD -type f -iname "*.enc" -mmin +600 -execdir basename '{}' ';' | xargs ls > /home/infa91punv/find.log) is not working: the output file is empty. When I run the same find command directly it works, but from the script it doesn't.
You are overwriting the file in each iteration.
You can use xargs to perform find on multiple directories; but you have to use an alternate delimiter to avoid having xargs populate the {} in the -execdir command.
sed 's%^%/moveit/%' HS_client_list |
xargs -I '<>' find '<>' -type f -iname "*.enc" -mmin +600 -execdir basename {} \; > /home/infa91punv/find.log
The xargs ls did not seem to perform any useful functionality, so I took it out. Generally, don't use ls in scripts.
With GNU find, you could avoid the call to an external utility, and use the -printf predicate to print just the part of the path name that you care about.
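A minimal sketch of that -printf variant, assuming GNU find and GNU touch (for its -d option); the sandbox directory and the file names are invented for the demonstration:

```shell
#!/bin/sh
# Sandbox with one old and one fresh file (hypothetical names).
demo_dir=$(mktemp -d)
touch -d '2 days ago' "$demo_dir/old.enc"   # GNU touch: backdate the mtime
touch "$demo_dir/new.enc"

# %f prints the file name with leading directories removed,
# so no basename process is started per file.
old_files=$(find "$demo_dir" -type f -iname "*.enc" -mmin +600 -printf '%f\n')

printf '%s\n' "$old_files"
rm -rf "$demo_dir"
```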
For added efficiency, you could invoke a shell to collect the arguments:
sed 's%^%/moveit/%' HS_client_list |
xargs sh -c 'find "$@" -type f -iname "*.enc" -mmin +600 -execdir basename {} \;' _ >/home/infa91punv/find.log
This will run as many directories as possible in a single find invocation.
If you want to keep your loop, the solution is to put the redirection after done. I would still factor out the cd, and take care to quote the variable interpolation.
while IFS= read -r line; do
find /moveit/"$line" -type f -iname "*.enc" -mmin +600 -execdir basename '{}' ';'
done < HS_client_list >/home/infa91punv/find.log

BASH: Filter list of files by return value of another command

I have series of directories with (mostly) video files in them, say
test1
1.mpg
2.avi
3.mpeg
junk.sh
test2
123.avi
432.avi
432.srt
test3
asdf.mpg
qwerty.mpeg
I create a variable (video_dir) with the directory names (based on other parameters) and use that with find to generate the basic list. I then filter by file type using another variable (video_types), piping the list through egrep, because there are sometimes non-video files in the dirs. Then I shuffle the list and save it to a file. That file is later used by mplayer to slideshow through the list.
I currently use the following command to accomplish that. I'm sure it's a horrible way to do it, but it works for me and it's quite fast even on big directories.
video_dir="/test1 /test2"
video_types=".mpg$|.avi$|.mpeg$"
find ${video_dir} -type f |
egrep -i "${video_types}" |
shuf > "$TEMP_OUT"
I would now like to add the ability to filter out files based on the resolution height of the video file. I can get that from:
mediainfo --Output='Video;%Height%' filename
Which just returns a number. I have tried using the -exec functionality of find to run that command on each file.
find ${video_dir} -type f -exec mediainfo --Output='Video;%Height%' {} \;
but that just returns the list of heights, not the filenames and I can't figure out how to reject ones based on a comparison, like <480.
I could do a for next loop but that seems like a bad (slow) idea.
Using info from @mark-setchell I modified it to:
video_dir="test1"
find ${video_dir} -type f \
-exec bash -c 'h=$(mediainfo --Output="Video;%Height%" "$1"); [[ $h -gt 480 ]]' _ {} \; -print
Which works.
You can replace your egrep with the following so you are still inside the find command (-iname is case insensitive and -o represents a logical OR):
find test1 test2 -type f \
\( -iname "*.mpg" -o -iname "*.avi" -o -iname "*.mpeg" \) \
NEXT_BIT
The NEXT_BIT can then -exec bash and exit with status 0 or 1 depending on whether you want the current file included or excluded. So it will look like this:
-exec bash -c 'H=$(mediainfo -output ... "$1"); [ $H -lt 480 ] && exit 1; exit 0' _ {} \;
So, taking note of @tripleee's advice in comments about superfluous exit statements, I get this:
find test1 test2 -type f \
\( -iname "*.mpg" -o -iname "*.avi" -o -iname "*.mpeg" \) \
-exec bash -c 'h=$(mediainfo ...options... "$1"); [ "$h" -ge 480 ]' _ {} \; -print
This Q&A was focused on one particular case, so the accepted answer is not as general as it could be.
find
If the list of files comes from find, one can use its filtering facilities, e.g. -exec:
find ${video_dir} -type f \
-exec COMMAND \; \
-print
Here
COMMAND is not enclosed in quotes -- find reads everything after -exec and up to a \;
find will expand {} to the current file name (including path -- you might find -execdir helpful, which will cd to the file's directory and replace {} with the leaf file name)
The exit code of COMMAND is treated as follows:
0 -> true
non-0 -> false
Note that you can build more complex expressions (e.g. -not -exec ...), which will be evaluated "from left to right, according to the rules of precedence ... -and is assumed where the operator is omitted." (per man find)
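As a minimal sketch of -exec used as a filter, with grep -q standing in for COMMAND (the sandbox and its two files are made up for the demonstration):

```shell
#!/bin/sh
# Sandbox: two files, only one contains the word "keep" (hypothetical data).
demo_dir=$(mktemp -d)
echo 'keep me' > "$demo_dir/a.txt"
echo 'drop me' > "$demo_dir/b.txt"

# -exec acts as a test: -print is only reached when grep -q exits 0.
matching=$(find "$demo_dir" -type f -exec grep -q keep {} \; -print)

# -not inverts the test (POSIX spells the same operator "!").
non_matching=$(find "$demo_dir" -type f -not -exec grep -q keep {} \; -print)

# Print only the leaf names to keep the output short.
printf 'matching: %s\nnon-matching: %s\n' "${matching##*/}" "${non_matching##*/}"
rm -rf "$demo_dir"
```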
xargs
If the list of files comes from elsewhere (and is available on stdin), you can use xargs as follows (from "If xargs is map, what is filter?"):
ls | xargs -I{} bash -c "COMMAND '{}' && echo '{}'"
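Embedding '{}' in the script text breaks on names that contain quotes. A hedged variant, assuming an xargs that supports -0 (GNU and BSD do; it is not POSIX), passes each name as a positional parameter instead. The sandbox files and the [ -s FILE ] test are placeholders for a real list and a real COMMAND:

```shell
#!/bin/sh
# Sandbox: two non-empty files and one empty file (hypothetical names).
demo_dir=$(mktemp -d)
echo data > "$demo_dir/full.txt"
echo data > "$demo_dir/it's full.txt"
: > "$demo_dir/empty.txt"

# Each name arrives as $1 and is never pasted into the script text, so
# quotes and spaces in names are harmless. [ -s FILE ] stands in for COMMAND.
kept=$(
  printf '%s\0' "$demo_dir"/* |
  xargs -0 -I{} sh -c 'if [ -s "$1" ]; then basename "$1"; fi' sh {}
)

printf '%s\n' "$kept"
rm -rf "$demo_dir"
```

The if form is used instead of COMMAND && echo so that rejected names do not make xargs report a failure status.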
Here is my solution.
#!/bin/bash
shopt -s nullglob
video_dir=(/test1 /test2)
while IFS= read -rd '' file; do
if [[ $file = *.@(mpg|avi|mpeg|mp4) ]]; then
h=$(mediainfo --Output="Video;%Height%" "$file")
(( h >= 480 )) && echo "$file"
fi
done < <(find "${video_dir[@]}" -type f -print0)
With this solution you can process everything inside the while read loop.

How to exclude new lines (carriage returns) using find regex on GNU/Linux cli?

I'm trying to rename all files in a directory with the JPG extension to lowercase jpg.
I've made this bash code with the help of this post:
find . -regex ".*\.JPG" -exec sh -c 'echo "$0" | sed -r "s/\.JPG/\.jpg/" && mv "$0" "$1"' {} \;
But I get the following error:
./IMG_1352.jpg
mv: cannot move './IMG_1352.JPG' to '': No such file or directory
(and so on...)
I think I need to change names "places" but I don't know how.
TL;DR: Try this code:
find . -name '*.JPG' -type f -exec bash -c 'mv "$0" "${0/.JPG/.jpg}"' {} \;
More commentary:
Your code is pretty close to where it needs to be. There is an issue, however, in as much as $1 is not defined.
I ran the following code to figure out what value was going to be in $1.
$ ls
bye.jpg hi.JPG
$ find . -regex ".*\.JPG" -exec sh -c 'echo "$0" | sed "s/\.JPG/\.jpg/" && echo "0:$0 1:$1" ' {} \;
./hi.jpg
0:./hi.JPG 1:
According to my results above, there is no value in $1. You should be aware that $1 actually refers to the second parameter passed to the sh. That is, sh -c 'code' $0 $1 ...
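The numbering of the parameters can be checked in isolation. A small sketch (the words first and second are arbitrary sample arguments):

```shell
#!/bin/sh
# The first argument after the script text becomes $0, the next one $1.
with_two=$(sh -c 'echo "0:$0 1:$1"' first second)

# With a single argument, $1 stays empty, which is what broke the mv call.
with_one=$(sh -c 'echo "0:$0 1:$1"' first)

# Common idiom: pass a placeholder for $0 so real arguments start at $1.
with_placeholder=$(sh -c 'echo "0:$0 1:$1"' _ first)

printf '%s\n' "$with_two" "$with_one" "$with_placeholder"
```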
Anyway, you'll need to capture the result of the sed command in a variable to pass it to the move command as follows:
find . -regex ".*\.JPG" -exec sh -c '
lower=$( echo "$0" | sed -r "s/\.JPG/\.jpg/" )
mv "$0" "$lower"
' {} \;
All that said, you could make this more concise. As PS suggests, the for loop is a good choice. Also, the JPG -> jpg replacement is more readable with PS's suggestion. If you are sold on using find, you can incorporate PS's suggestion as follows (the ${var/pattern/replacement} expansion is a bash feature, hence bash -c):
find . -regex ".*\.JPG" -exec bash -c '
filename=$0
mv "$filename" "${filename/.JPG/.jpg}"
' {} \;
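If plain POSIX sh is preferred, the suffix-stripping expansion ${f%.JPG} can do the same job. A minimal sketch under that assumption, run against a throwaway directory with invented file names:

```shell
#!/bin/sh
# Sandbox with two upper-case and one lower-case extension (hypothetical names).
demo_dir=$(mktemp -d)
touch "$demo_dir/hi.JPG" "$demo_dir/two words.JPG" "$demo_dir/bye.jpg"

# ${f%.JPG} removes the trailing .JPG; appending .jpg gives the new name.
# "{} +" hands many files to one sh, and the for loop renames each of them.
find "$demo_dir" -type f -name '*.JPG' -exec sh -c '
  for f; do
    mv "$f" "${f%.JPG}.jpg"
  done' sh {} +

upper_left=$(find "$demo_dir" -type f -name '*.JPG' | wc -l)
lower_now=$(find "$demo_dir" -type f -name '*.jpg' | wc -l)
printf 'upper-case left: %d, lower-case now: %d\n' "$upper_left" "$lower_now"
rm -rf "$demo_dir"
```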

Using 'find' to return filenames without extension

I have a directory (with subdirectories), of which I want to find all files that have a ".ipynb" extension. But I want the 'find' command to just return me these filenames without the extension.
I know the first part:
find . -type f -iname "*.ipynb" -print
But how do I then get the names without the "ipynb" extension?
Any replies greatly appreciated...
To return only filenames without the extension, try:
find . -type f -iname "*.ipynb" -execdir sh -c 'printf "%s\n" "${0%.*}"' {} ';'
or (omitting -type f from now on):
find "$PWD" -iname "*.ipynb" -execdir basename {} .ipynb ';'
or:
find . -iname "*.ipynb" -exec basename {} .ipynb ';'
or:
find . -iname "*.ipynb" | sed "s/.*\///; s/\.ipynb//"
However, invoking basename on each file can be inefficient, so @CharlesDuffy's suggestion is:
find . -iname '*.ipynb' -exec bash -c 'printf "%s\n" "${@%.*}"' _ {} +
or:
find . -iname '*.ipynb' -execdir basename -s '.ipynb' {} +
Using + means that we're passing multiple files to each bash instance, so if the whole list fits into a single command line, we call bash only once.
To print full path and filename (without extension) in the same line, try:
find . -iname "*.ipynb" -exec sh -c 'printf "%s\n" "${0%.*}"' {} ';'
or:
find "$PWD" -iname "*.ipynb" -print | sed 's/\.[^./]*$//'
To print full path and filename on separate lines:
find "$PWD" -iname "*.ipynb" -exec dirname "{}" ';' -exec basename "{}" .ipynb ';'
Here's a simple solution:
find . -type f -iname "*.ipynb" | sed 's/\.ipynb$//1'
I found this in a bash oneliner that simplifies the process without using find
for n in *.ipynb; do echo "${n%.ipynb}"; done
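The expansions used in these answers differ in how much they strip. A short sketch on one sample path (the path itself is hypothetical):

```shell
#!/bin/sh
path='./notebooks/analysis.v2.ipynb'   # hypothetical sample path

no_ext=${path%.ipynb}        # strip the literal suffix .ipynb
leaf=${path##*/}             # strip everything up to the last slash
leaf_no_ext=${leaf%.*}       # shortest match: drop only the last extension
leaf_first_dot=${leaf%%.*}   # longest match: drop from the first dot on

printf '%s\n' "$no_ext" "$leaf" "$leaf_no_ext" "$leaf_first_dot"
```

All four are POSIX parameter expansions, so they work in sh as well as bash.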
If you need to have the name with directory but without the extension :
find . -type f -iname "*.ipynb" -exec sh -c 'f=$(basename "$1" .ipynb); d=$(dirname "$1"); echo "$d/$f"' sh {} \;
find . -type f -iname "*.ipynb" | grep -oP '.*(?=[.])'
The -o flag outputs only the matched part. The -P flag matches according to Perl regular expressions. This is necessary to make the lookahead (?=[.]) work.
Perl One Liner
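To print what you want:
find . | perl -a -F/ -lne 'print $F[-1] if /.*.ipynb/g'
To print what you do not want, negate the match:
find . | perl -a -F/ -lne 'print $F[-1] if !/.*.ipynb/g'
NOTE
In a Perl regular expression, "any run of characters" is written .* rather than *, so the shell pattern *.ipynb becomes .*.ipynb. Strictly, the literal dot should also be escaped (.*\.ipynb), since an unescaped . matches any character.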
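If none of the characters ., i, p, y, n, or b occurs anywhere in the paths other than in the ".ipynb" suffix, you can try this simpler way using tr. Note that tr -d deletes every occurrence of each listed character rather than the string as a whole, so it also removes the leading . of ./: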
find . -type f -iname "*.ipynb" -print | tr -d ".ipbyn"
If you don't know what the extension is, or there are multiple extensions, you could use this:
find . -type f -exec basename {} \;|perl -pe 's/(.*)\..*$/$1/;s{^.*/}{}'
and for a list of files with no duplicates (originally differing in path or extension)
find . -type f -exec basename {} \;|perl -pe 's/(.*)\..*$/$1/;s{^.*/}{}'|sort|uniq
Another easy way which uses basename is:
find . -type f -iname '*.ipynb' -exec basename -s '.ipynb' {} +
Using + will reduce the number of invocations of the command (manpage):
-exec command {} +
This variant of the -exec action runs the specified command on
the selected files, but the command line is built by appending
each selected file name at the end; the total number of
invocations of the command will be much less than the number
of matched files. The command line is built in much the same
way that xargs builds its command lines. Only one instance of
'{}' is allowed within the command, and (when find is being
invoked from a shell) it should be quoted (for example, '{}')
to protect it from interpretation by shells. The command is
executed in the starting directory. If any invocation with
the `+' form returns a non-zero value as exit status, then
find returns a non-zero exit status. If find encounters an
error, this can sometimes cause an immediate exit, so some
pending commands may not be run at all. For this reason -exec
my-command ... {} + -quit may not result in my-command
actually being run. This variant of -exec always returns
true.
Using -s, basename accepts multiple filenames and removes the specified suffix from each (manpage):
-a, --multiple
support multiple arguments and treat each as a NAME
-s, --suffix=SUFFIX
remove a trailing SUFFIX; implies -a
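A quick check of that behaviour, assuming a basename that supports -s (GNU coreutils does); the two paths are sample strings and do not need to exist:

```shell
#!/bin/sh
# -s strips the suffix and implies -a, so several operands are accepted.
names=$(basename -s .ipynb ./a/one.ipynb ./b/two.ipynb)
printf '%s\n' "$names"
```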

bash: complex test in find command

I would like to do something like:
find . -type f -exec test $(file --brief --mime-type '{}' ) == 'text/html' \; -print
but I can't figure out the correct way to quote or escape the args to test, especially the '$(' ... ')' .
You cannot simply escape the arguments for passing them to find.
Any shell expansion will happen before find is run. find will not pass its arguments through a shell, so even if you escape the shell expansion, everything will simply be treated as literal arguments to the test command, not expanded by the shell as you are expecting.
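Both halves of that statement can be seen without find. In this sketch, show_args is a hypothetical stand-in for find that only reports the arguments it was given:

```shell
#!/bin/sh
# Stand-in for find: report each received argument in brackets.
show_args() { printf '[%s] ' "$@"; echo; }

# Unquoted: the shell runs $(...) first, so the callee sees only the result.
expanded=$(show_args -exec test $(echo RESULT) = x)

# Single-quoted: the text arrives literally, and nothing later evaluates it.
literal=$(show_args -exec test '$(echo RESULT)' = x)

printf '%s\n' "$expanded" "$literal"
```

With the real find, the unquoted form would also run the substitution only once, with the literal string {} as the file name, instead of once per file.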
The best way to achieve what you want would be to write a short shell script, which takes the filename as an argument, and use -exec on that:
find . -type f -exec is_html.sh {} \; -print
with is_html.sh:
#!/bin/sh
test "$(file --brief --mime-type "$1")" = 'text/html'
If you really want it all on one line, without using a separate script, you can invoke sh directly from find:
find . -type f -exec sh -c 'test "$(file --brief --mime-type "$0")" = "text/html"' {} \; -print
Although it may be possible to turn it into one wildly quoted statement, it is often easier - and more clear - to be a little more verbose:
$ find . -type f -print0 | xargs -0 file --mime-type |
grep ':[^:]*text/html$'| sed 's,:[^:]*text/html,,'
Use "{}" instead; for example, this simply lists file types:
find * -maxdepth 0 -exec file "{}" \;

Resources