BASH: Filter list of files by return value of another command - linux

I have a series of directories with (mostly) video files in them, say
test1
    1.mpg
    2.avi
    3.mpeg
    junk.sh
test2
    123.avi
    432.avi
    432.srt
test3
    asdf.mpg
    qwerty.mpeg
I create a variable (video_dir) with the directory names (based on other parameters) and use that with find to generate the basic list. I then filter it for file types based on another variable (video_types) by piping it through egrep, because there are sometimes non-video files in the dirs. Then I shuffle the list around and save it out to a file. That file is later used by mplayer to slideshow through the list.
I currently use the following command to accomplish that. I'm sure it's a horrible way to do it, but it works for me and it's quite fast even on big directories.
video_dir="/test1 /test2"
video_types="\.mpg$|\.avi$|\.mpeg$"
find ${video_dir} -type f |
egrep -i "${video_types}" |
shuf > "$TEMP_OUT"
I would now like to add the ability to filter out files based on the resolution height of the video file. I can get that from:
mediainfo --Output='Video;%Height%' filename
which just returns a number. I have tried using the -exec functionality of find to run that command on each file:
find ${video_dir} -type f -exec mediainfo --Output='Video;%Height%' {} \;
but that just returns the list of heights, not the filenames, and I can't figure out how to reject files based on a comparison, like <480.
I could do a for loop, but that seems like a bad (slow) idea.
Using info from @Mark Setchell's answer, I modified it to:
video_dir="test1"
find ${video_dir} -type f \
-exec bash -c 'h=$(mediainfo --Output="Video;%Height%" "$1"); [[ $h -gt 480 ]]' _ {} \; -print
Which works.
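For reference, here is the whole pipeline put together as a sketch, combining the type filter, the height filter, and the original shuffle step (same variables and $TEMP_OUT as above):
video_dir="/test1 /test2"
find ${video_dir} -type f \
    \( -iname "*.mpg" -o -iname "*.avi" -o -iname "*.mpeg" \) \
    -exec bash -c 'h=$(mediainfo --Output="Video;%Height%" "$1"); [[ $h -gt 480 ]]' _ {} \; \
    -print |
shuf > "$TEMP_OUT"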

You can replace your egrep with the following so you stay inside the find command (-iname is a case-insensitive match and -o represents a logical OR):
find test1 test2 -type f \
\( -iname "*.mpg" -o -iname "*.avi" -o -iname "*.mpeg" \) \
NEXT_BIT
The NEXT_BIT can then -exec bash and exit with status 0 or 1 depending on whether you want the current file included or excluded. So it will look like this:
-exec bash -c 'H=$(mediainfo --Output=... "$1"); [ $H -lt 480 ] && exit 1; exit 0' _ {} \;
So, taking note of @tripleee's advice in the comments about superfluous exit statements, I get this:
find test1 test2 -type f \
\( -iname "*.mpg" -o -iname "*.avi" -o -iname "*.mpeg" \) \
-exec bash -c 'h=$(mediainfo ...options... "$1"); [ $h -ge 480 ]' _ {} \; -print

This Q&A was focused on one particular case, so the accepted answer is not as general as it could be.
find
If the list of files comes from find, one can use its filtering facilities, e.g. -exec:
find ${video_dir} -type f \
-exec COMMAND \; \
-print
Here
COMMAND is not enclosed in quotes -- find reads everything after -exec and up to a \;
find will expand {} to the current file name (including path -- you might find -execdir helpful, which will cd to the file's directory and replace {} with the leaf file name)
The exit code of COMMAND is treated as follows:
0 -> true
non-0 -> false
Note that you can build more complex expressions (e.g. -not -exec ...), which will be evaluated "from left to right, according to the rules of precedence ... -and is assumed where the operator is omitted." (per man find)
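For example, here is a sketch that uses -not to invert the height test from earlier, printing only files whose height is not above 480:
find ${video_dir} -type f \
    -not -exec bash -c 'h=$(mediainfo --Output="Video;%Height%" "$1"); [[ $h -gt 480 ]]' _ {} \; \
    -print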
xargs
If the list of files comes from elsewhere (and is available on stdin), you can use xargs as follows (from "If xargs is map, what is filter?"):
ls | xargs -I{} bash -c "COMMAND '{}' && echo '{}'"
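Applied to the height filter from this question, that pattern might look like the following sketch (a variation: the _ placeholder passes the file name as a positional argument, which is safer than substituting {} into the shell string; ${h:-0} guards against files for which mediainfo reports no height):
find ${video_dir} -type f |
xargs -I{} bash -c 'h=$(mediainfo --Output="Video;%Height%" "$1"); [ "${h:-0}" -ge 480 ] && printf "%s\n" "$1"' _ {}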

Here is my solution.
#!/bin/bash
shopt -s extglob
video_dir=(/test1 /test2)
while IFS= read -rd '' file; do
    if [[ $file = *.@(mpg|avi|mpeg|mp4) ]]; then
        h=$(mediainfo --Output="Video;%Height%" "$file")
        (( h >= 480 )) && echo "$file"
    fi
done < <(find "${video_dir[@]}" -type f -print0)
With this solution you can process everything inside the while-read loop.

Related


Variable causing issue while doing the test command in unix. The command is:
find $PWD -type d -exec sh -c 'test "{}" ">" "$PWD/$VersionFolders"' \; -print | wc -l
Input values. Here $PWD is the current directory, containing:
b1_v.1.0
b1_v.1.2
b1_v.1.3
b1_v.1.4
The given version folder, $VersionFolders, is:
b1_v.1.2
The command should check whether any folders exist in the current directory that are greater than the given version folder, and count or display them. The comparison should not take into account the date or time the folders were created.
Expected output:
b1_v.1.3
b1_v.1.4
If I hard-code the directory it works fine, but when I pass it as a variable it gives all folders.
This command works fine:
find $PWD -type d -exec sh -c 'test "{}" ">" "$PWD/b1_v.1.2"' \; -print | wc -l
This command with the variable does not work:
find $PWD -type d -exec sh -c 'test "{}" ">" "$PWD/$VersionFolders"' \; -print | wc -l
The variable $VersionFolders won't get expanded inside the single quotes, and apparently you did not export it to make it visible to subprocesses.
An obscure but common hack is to put it in $0 (which nominally should contain the name of the shell itself, and is the first argument after sh -c '...') because that keeps the code simple.
find . -type d \
    -exec sh -c 'test "$1" ">" "./$0"' \
        "$VersionFolders" {} \; \
    -print | wc -l
But as @chepner remarks, you can run -exec test {} ">" "./$VersionFolders" \; directly.
The shell already does everything in the current directory, so you don't need to spell out $PWD. Perhaps see also What exactly is current working directory?
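A minimal sketch of @chepner's direct approach (assuming $VersionFolders is set as above; the ./ prefix keeps both sides of the comparison in the same form, since find . prints paths starting with ./):
find . -type d -exec test {} ">" "./$VersionFolders" \; -print | wc -l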

Find files matching a pattern, replace strings and then diff the output with original, command fails

I am trying to find files named names.*, run sed on the ones that match, then pipe to diff to see what was changed.
However, the command fails. If I remove the pipe and the diff, it is happy to output results. Why is it failing with the diff? Is there a better way to do this?
> find -type f -name "names.*" -printf '%p' -exec sed 's/Cow/Kitten/' {} | diff {} - \;
diff: extra operand ';'
diff: Try 'diff --help' for more information.
find: missing argument to '-exec'
A shell is needed to do what you want, like so:
find -type f -name "names.*" -exec sh -c '
    for f; do
        sed "s/Cow/Kitten/" "$f" | diff "$f" -
    done' _ {} \;
In one line:
find -type f -name "names.*" -exec sh -c 'for f; do sed "s/Cow/Kitten/" "$f" | diff "$f" -; done' _ {} \;
See understanding-the-exec-option-of-find
Or using a while + read loop and Process Substitution.
#!/usr/bin/env bash
while IFS= read -rd '' files; do
sed 's/Cow/Kitten/' "$files" | diff "$files" -
done < <(find -type f -name "names.*" -print0)
The latter script is whitespace/tab/newline safe but is strictly bash, as opposed to the former script, which is POSIX sh and should work with any POSIX-compliant shell.
See How can I find and safely handle file names containing newlines, spaces or both?
See How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?

Circumvent Argument list too long in script (for loop)

I've seen a few answers regarding this, but as a newbie, I don't really understand how to implement them in my script.
It should be pretty easy (for those who know how to do stuff like this).
I'm using a simple
for f in "/drive1/"images*.{jpg,png}; do
but this is simply overloading and giving me
Argument list too long
How is this easiest solved?
Argument list too long workaround
The argument list length is limited by your system configuration:
getconf ARG_MAX
2097152
But after some discussion of the differences between bash specifics and system (OS) limitations (see the comments from @that other guy), this question seems to rest on a misunderstanding:
As discussed in the comments, the OP tried something like:
ls "/simple path"/image*.{jpg,png} | wc -l
bash: /bin/ls: Argument list too long
This happens because of the OS limitation, not bash!
But tested with the OP's code, this works fine:
for file in ./"simple path"/image*.{jpg,png} ;do echo -n a;done | wc -c
70980
Like:
printf "%c" ./"simple path"/image*.{jpg,png} | wc -c
Reduce line length by reducing fixed part:
First step: you could reduce argument length by:
cd "/drive1/"
ls images*.{jpg,png} | wc -l
But when the number of files grows, you'll hit the limit again...
More general workaround:
find "/drive1/" -type f \( -name '*.jpg' -o -name '*.png' \) -exec myscript {} +
If you want this to NOT be recursive, you may add -maxdepth as 1st option:
find "/drive1/" -maxdepth 1 -type f \( -name '*.jpg' -o -name '*.png' \) \
-exec myscript {} +
There, myscript will be run with the filenames as arguments. The command line for myscript is built up until it reaches a system-defined limit.
myscript /drive1/file1.jpg '/drive1/File Name2.png' /drive1/...
From man find:
-exec command {} +
    This variant of the -exec action runs the specified command on
    the selected files, but the command line is built by appending
    each selected file name at the end; the total number of
    invocations of the command will be much less than the number of
    matched files. The command line is built in much the same way
    that xargs builds its command lines. Only one instance of `{}'
    is allowed within the command.
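To see the batching difference, compare these two hypothetical runs, with echo standing in for a real command:
find . -type f -exec echo {} \;   # one invocation per file: one name per output line
find . -type f -exec echo {} +    # batched: few invocations, many names per line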
In-script sample
You could create your script like this:
#!/bin/bash
target=( "/drive1" "/Drive 2/Pictures" )
[ "$1" = "--run" ] && exec find "${target[#]}" -type f \( -name '*.jpg' -o \
-name '*.png' \) -exec $0 {} +
for file ;do
echo Process "$file"
done
Then you have to run this with --run as argument.
works with any number of files! (recursively; see the -maxdepth option)
permits many targets
permits spaces and special characters in file and directory names
you can run the same script directly on files, without --run:
./myscript hello world 'hello world'
Process hello
Process world
Process hello world
Using pure bash
Using arrays, you could do things like:
allfiles=( "/drive 1"/images*.{jpg,png} )
[ -f "$allfiles" ] || { echo No file found.; exit ;}
echo Number of files: ${#allfiles[@]}
for file in "${allfiles[@]}"; do
    echo Process "$file"
done
There's also a while read loop:
find "/drive1/" -maxdepth 1 -mindepth 1 -type f \( -name '*.jpg' -o -name '*.png' \) |
while IFS= read -r file; do
or with zero terminated files:
find "/drive1/" -maxdepth 1 -mindepth 1 -type f \( -name '*.jpg' -o -name '*.png' \) -print0 |
while IFS= read -r -d '' file; do

Linux Move files to their child directory in a loop

Can you please suggest an efficient way to move files from one location to their subdirectory in a loop?
Ex:
/MY_PATH/User1/1234/Daily/abc.txt to /MY_PATH/User1/1234/Daily/Archive/abc.txt
/MY_PATH/User2/3456/Daily/def.txt to /MY_PATH/User2/3456/Daily/Archive/def.txt
/MY_PATH/User1/1111/Daily/hij.txt to /MY_PATH/User1/1111/Daily/Archive/hij.txt
/MY_PATH/User2/2222/Daily/def.txt to /MY_PATH/User2/2222/Daily/Archive/def.txt
I started this way, but need your suggestions on the best way to write it:
#!/bin/bash
dir1="/MyPath/"
subs=`ls $dir1`
for i in $subs; do
mv $dir1/$i/*/Daily $dir1/$i/*/Daily/Archive
done
My one-line bash:
for dir in $(
    find MY_PATH -mindepth 3 -maxdepth 3 -type d -name Daily
); do
    mkdir -p $dir/Archives
    find $dir -maxdepth 1 -mindepth 1 ! -name Archives \
        -exec mv -t $dir/Archives {} +
done
To quickly test:
mkdir -p MY_PATH/User{1,2,3,4}/{1234,2346,3333,2323}/Daily
touch MY_PATH/User{1,2,3,4}/{1234,2346,3333,2323}/Daily/{abc,bcd,def,feg,fds}.txt
for dir in $( find MY_PATH -mindepth 3 -maxdepth 3 -type d -name Daily );do
mkdir -p $dir/Archives; find $dir -maxdepth 1 -mindepth 1 ! -name Archives \
-exec mv -t $dir/Archives {} + ; done
ls -lR MY_PATH
This seems to match the OP's request.
For a more robust solution
Here is a solution which works with spaces somewhere in the path...
Edited to include @mklement0's well-pointed suggestion.
while IFS= read -r dir; do
    mkdir -p "$dir"/Archives
    find "$dir" -maxdepth 1 -mindepth 1 ! -name Archives \
        -exec mv -t "$dir/Archives" {} +
done < <(
    find MY_PATH -mindepth 3 -maxdepth 3 -type d -name Daily
)
Same demo:
mkdir -p MY_PATH/User{1,2,3,"4 3"}/{1234,"23 6",3333,2323}/Daily
touch MY_PATH/User{1,2,3,"4 3"}/{1234,"23 6",3333,2323}/Daily/{abc,"b c",def,hgz0}.txt
while IFS= read -r dir; do mkdir -p "$dir"/Archives; find "$dir" -maxdepth 1 -mindepth 1 \
    ! -name Archives -exec mv -t "$dir/Archives" {} +; done < <(
    find MY_PATH -mindepth 3 -maxdepth 3 -type d -name Daily )
ls -lR MY_PATH
Assuming the directory structure is as you have shown in your examples, i.e.
MY_PATH/
    subdir-level-1/
        subdir-level-2/
            Daily/
                files
                Archive/
Here's what you can do:
shopt -s nullglob # defend against globbing failure -- inspired by mklement0's answer
root="/MyPath"
for dir in "${root}"/*/*/Daily/; do
    mkdir -p "${dir}/Archive" # if Archive might not exist; to be pedantic you should look at David C. Rankin's answer for error handling, but usually we know what we're doing so that's not necessary
    find "${dir}" -maxdepth 1 -type f -print0 | xargs -0 mv -t "${dir}/Archive"
done
The reason I use find and xargs is to save a few processes; you could just as well move the files in each ${dir} one by one.
Update: @mklement0 suggested that find "${dir}" -maxdepth 1 -type f -print0 | xargs -0 mv -t "${dir}/Archive" can be further improved to
find "${dir}" -maxdepth 1 -type f -exec mv -t "${dir}/Archive" {} +
which is a very good point.
Try the following:
dir1="/MyPath"
for d in "$dir1"/*/*/Daily/; do
[[ -d $d ]] || break # break, if no subdirectories match
for f in "$d"/*; do # loop over files in */*/Daily/
[[ -f "$f" ]] || continue # skip non-files or if nothing matches
mv "$f" "$d"/Archive/
done
done
"$dir1"*/*/Daily/ matches all grandchild subdirectories of $dir1; thanks to the terminating /, only directories match; note that, as a result, $d ends in /.
Note that $d therefore ends in /, and, strictly speaking, needs no / later on when synthesizing paths with it (e.g., "$d"/*), but doing so does no harm and helps readability, as #4ae1e1 points out in a comment.
[[ -d $d ]] || break ensures that the loop is exited if no grandchild directories match (by default, a glob (pattern) that has no matches is passed as is to the loop).
for f in "$d"* loops over all entries (files and/or subdirs.) in $d:
[[ -f "$f" ]] || continue ensures that only files are processed or, in the event that nothing matches, the loop is exited.
mv "$f" "$d"/Archive/ then moves each file to subdir. Archive.
You need to check for, and if not present, create the destination directory before moving the file to Archive. If you cannot create the directory (due to permissions or otherwise), you skip the move. The following does not assume any limitation on depth, but will omit any directory containing Archive as an intermediate subdirectory:
oldifs="$IFS"
IFS=$'\n'
for i in $(find /MY_PATH -type f); do
    [[ "$i" =~ Archive ]] && continue
    [ -d "${i%/*}/Archive" ] || mkdir -p "${i%/*}/Archive"
    [ -d "${i%/*}/Archive" ] || {
        printf "error: unable to create '%s'\n" "${i%/*}/Archive"
        continue
    }
    mv -fv "$i" "${i/Daily/Daily\/Archive}"
done
IFS="$oldifs"
Output when run
$ bash archive_daily.sh
mv -fv /MY_PATH/User1/1111/Daily/hij.txt /MY_PATH/User1/1111/Daily/Archive/hij.txt
mv -fv /MY_PATH/User1/1234/Daily/abc.txt /MY_PATH/User1/1234/Daily/Archive/abc.txt
mv -fv /MY_PATH/User2/3456/Daily/def.txt /MY_PATH/User2/3456/Daily/Archive/def.txt
mv -fv /MY_PATH/User2/2222/Daily/def.txt /MY_PATH/User2/2222/Daily/Archive/def.txt
Note: you can limit/tighten the file selection by adjusting the call to find that populates the for loop (e.g. with -name or -iname). The script above simply checks/moves every file to its Archive folder. To limit it to files with the .txt extension, you can specify find /MY_PATH -type f -name "*.txt". To limit it to .txt files in the /MY_PATH/User1 and /MY_PATH/User2 directories only, use find /MY_PATH/User[12] -type f -name "*.txt".
Note 2: when looping on filenames, the paths and filenames should not contain characters that are non-standard for the current locale; in particular, you should not have '\n' as a character in a filename. Setting IFS is required to protect against word splitting on spaces in either the path or the filename.
Since you said efficient, anything with a subshell will fail in funny ways with lots of entries. You're better off using xargs:
#!/bin/bash
dir1="/MyPath/"
find "$dir1" -mindepth 3 -maxdepth 3 -type d -name Daily | while read -r i
do
    pushd .
    cd "$i"
    mkdir Archive
    find . -mindepth 1 -maxdepth 1 -type f | xargs mv -t Archive
    popd
done
The outer find looks for your Daily directories. It's very specific in that they have to be at an exact depth and be directories, not regular files. The results get piped into read, each directory is entered, Archive is created, and the files are batch-moved with xargs mv. Complete file and directory lists are never stored in memory, so it scales very well.

How to pipe the results of 'find' to mv in Linux

How do I pipe the results of a 'find' (in Linux) to be moved to a different directory? This is what I have so far.
find ./ -name '*article*' | mv ../backup
but it's not yet right (I get a "missing file argument" error, because I didn't specify a file; I was trying to get it from the pipe).
find ./ -name '*article*' -exec mv {} ../backup \;
OR
find ./ -name '*article*' | xargs -I '{}' mv {} ../backup
xargs is commonly used for this, and mv on Linux has a -t option to facilitate that.
find ./ -name '*article*' | xargs mv -t ../backup
If your find supports -exec ... \+ you could equivalently do
find ./ -name '*article*' -exec mv -t ../backup {} \+
The -t option is a GNU extension, so it is not portable to systems which do not have GNU coreutils (though every proper Linux I have seen has that, with the possible exception of Busybox). For complete POSIX portability, it's of course possible to roll your own replacement, maybe something like
find ./ -name '*article*' -exec sh -c 'mv "$@" "$0"' ../backup {} \+
where we shamelessly abuse the convenient fact that the first argument after sh -c 'commands' ends up as the "script name" parameter in $0 so that we don't even need to shift it.
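A tiny sketch of where the arguments land in that construct (with echo standing in for mv):
sh -c 'echo "dest=$0 files: $@"' ../backup a.txt b.txt
# prints: dest=../backup files: a.txt b.txt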
Probably see also https://mywiki.wooledge.org/BashFAQ/020
I found this really useful having thousands of files in one folder:
ls -U | head -10000 | egrep '\.png$' | xargs -I '{}' mv {} ./png
This moves all PNGs among the first 10000 directory entries to the png subfolder.
mv $(find . -name '*article*') ../backup
Here are a few solutions.
find . -type f -newermt "2019-01-01" ! -newermt "2019-05-01" \
    -exec mv {} path \;
or
find path -type f -newermt "2019-01-01" ! -newermt "2019-05-01" \
-exec mv {} path \;
or
find /Directory/filebox/ -type f -newermt "2019-01-01" \
! -newermt "2019-05-01" -exec mv {} ../filemove/ \;
The backslash + newline is just for legibility; you can equivalently use a single long line.
xargs is your buddy here (when you have multiple actions to take)!
And using it the way shown here will give you great control as well.
find ./ -name '*article*' | xargs -I{} sh -c 'mv "{}" <path/to/target/directory>'
Explanation:
-I{}
Substitutes each input line for {} in the command, running one invocation per line
sh -c
The shell command to execute, with the line substituted into it
'mv "{}" <path/to/target/directory>'
The move command takes two arguments:
1) the line from the find output, i.e. {}, whose value is substituted automatically
2) the target path for the move command, as specified
Note: the double quotes around {} allow any number of spaces in the file names that the shell command receives from xargs
