Bash Globbing Pattern Matching for ImageMagick recursive convert to PDF - Linux

I have the following 2 scripts, which recursively convert folders of images to PDFs for my wife's Japanese manga Kindle, using find and ImageMagick convert:
#!/bin/bash
_d="$(pwd)"
echo "$_d"
find . -type d -exec echo "Will convert in the following order: {}" \;
find . -type d -exec echo "Converting: '{}'" \; -exec convert '{}/*.jpg' "$_d/{}.pdf" \;
and the same for PNG
#!/bin/bash
_d="$(pwd)"
echo "$_d"
find . -type d -exec echo "Will convert in the following order: {}" \;
find . -type d -exec echo "Converting: '{}'" \; -exec convert '{}/*.png' "$_d/{}.pdf" \;
Unfortunately I am not able to make one universal script that works for all image formats.
How do I make one script that works for both?
I would also need JPG and PNG, as well as jpeg and JPEG.
Thx in advance

I wouldn't use find at all, just a loop:
#!/usr/bin/env bash
# enable recursive globs
shopt -s globstar
for dir in **/*/; do
printf "Converting jpgs in %s\n" "$dir"
convert "$dir"/*.jpg "$dir/out.pdf"
done
If you want to combine .jpg and .JPG in the same pdf, add nocaseglob to the shopt line. Add .jpeg to the mix? Add extglob and change "$dir"/*.jpg to "$dir"/*.@(jpg|jpeg)
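Putting the three options together, the loop might look like this (a sketch):
#!/usr/bin/env bash
# enable recursive globs, case-insensitive globs, and extended glob patterns
shopt -s globstar nocaseglob extglob
for dir in **/*/; do
printf "Converting images in %s\n" "$dir"
convert "$dir"/*.@(jpg|jpeg) "$dir/out.pdf"
done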

You can do more complicated actions if you turn the find exec into a bash function (or even a standalone script).
#!/bin/bash
do_convert()(
shopt -s nullglob
for dir in "$@"; do
files=("$dir"/*.{jpg,JPG,PNG,jpeg,JPEG})
if [[ -z $files ]]; then
echo 1>&2 "no suitable files in $dir"
continue
fi
echo "Converting $dir"
convert "${files[@]}" "$dir.pdf"
done
)
export -f do_convert
pwd
echo "Will convert in the following order:"
find . -type d
# find . -type d -exec bash -c 'do_convert {}' \;
find . -type d -exec bash -c 'do_convert "$@"' -- {} \+
nullglob makes *.xyz return nothing if there is no match, instead of returning the original string unchanged
p/*.{a,b,c} expands into p/*.a p/*.b p/*.c before the * are expanded
x()(...) instead of the more normal x(){...} uses a subshell so we don't have to remember to unset nullglob again or clean up any variable definitions
export -f x makes function x available in subshells
we skip conversion if there are no suitable files
with the slightly more complicated find command, we can reduce the number of invocations of bash (probably doesn't save a great deal in this particular case)
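A quick interactive illustration of the nullglob and brace-expansion points (a sketch; run it in an empty scratch directory):
$ mkdir demo && cd demo && touch a.jpg b.JPG
$ echo *.png             # without nullglob, an unmatched pattern is returned unchanged
*.png
$ shopt -s nullglob
$ echo *.png             # with nullglob, an unmatched pattern expands to nothing

$ echo *.{jpg,JPG,png}   # braces expand first, then each glob is matched
a.jpg b.JPG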

how about a one-liner
dry-run
find -name \*.jpg -or -name \*.png | xargs -I xxx echo "xxx =>" xxx.pdf
run
find -name \*.jpg -or -name \*.png | xargs -I xxx convert xxx xxx.pdf
help
-name matches the file name
-or logical or => both jpg and png
xargs maps each input line into a name to execute a command on
-I selects a placeholder name; it is like {} in find
NOTE
instead of $(pwd), which is a command substitution, you can use the variable $PWD
xxx maps into a name, and xxx.pdf still has the matched extension found by find, which means filename.png becomes filename.png.pdf. If this is not desired, you can sed it
to run the convert commands in parallel you can use -P 0 with xargs -- see xargs --help
With sed to remove extensions
dry-run
find -name \*.jpg -or -name \*.png | sed 's/\.\(png\|jpg\)$//' | xargs -I xxx echo "xxx =>" xxx.pdf
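For example, a parallel run might look like this (a sketch; -P 0 is a GNU xargs option that starts as many convert processes as possible):
find -name \*.jpg -or -name \*.png | xargs -P 0 -I xxx convert xxx xxx.pdf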

@shawn Your solution works; as I stated in the comments, I am too stupid to name the resulting pdf properly (folder name) and save it in the script caller's directory. Nevertheless, it solves my case-insensitive jpg, jpeg, png problems just fine.
Here is shawn's solution:
#!/bin/bash
# enable recursive globs
shopt -s globstar nocaseglob extglob
for dir in **/*/; do
printf "Converting (jpg|jpeg|png) in %s\n" "$dir"
convert "$dir"/*.@(jpg|jpeg|png) "$dir/out.pdf"
done
@jhnc Your solution works out of the box; it does exactly what I intended, and I really like calling functions, or even standalone scripts, to increase complexity. One drawback is that I cannot Ctrl-C the process; is that because it is threaded, or because it runs in a subshell? I think you were missing an exit statement at the end of the function; it never stopped.
#!/bin/bash
do_convert()(
shopt -s nullglob
for dir in "$@"; do
files=("$dir"/*.{jpg,JPG,png,PNG,jpeg,JPEG})
if [[ -z $files ]]; then
echo 1>&2 "no suitable files in $dir"
continue
fi
echo "Converting $dir"
convert "${files[@]}" "$dir.pdf"
done
exit
)
export -f do_convert
pwd
echo "Will convert in the following order:"
find . -type d
# find . -type d -exec bash -c 'do_convert {}' \;
find . -type d -exec bash -c 'do_convert "$@"' -- {} \+
@ everyone else: it's already after midnight again. I guess this is a trivial question for you guys, and I am very grateful for ALL your answers; I didn't have the time to try everything.
I find Linux bash very challenging.

A lot of ways to skin this cat. My thought is:
while IFS= read -r -d '' F
do
    TYPE=$(file -b --mime-type "$F")
    if [ "$TYPE" = image/png ]
    then
        : ## do png conversion here
    elif [ "$TYPE" = image/jpeg ]
    then
        : ## do jpg conversion here
    fi
done < <(find . -type f -print0)

Related

Use find to copy files to a new folder

I'm searching for a find command to copy all wallpaper files that look like this:
3245x2324.png (All Numbers are just a placeholder)
3242x3242.jpg
I'm in my /usr/share/wallpapers folder and there are many sub folders with the files I want to copy.
There are many like "screenshot.png" and these files I don't want to copy.
My find command is like this:
find . -type f -name "*????x????.???"
If I search with this I get the files I wanted to see, but if I combine this with -exec cp:
find . -type f -name "*????x????.???" -exec cp "{}" /home/mine/Pictures/WP \;
the find command only copies 10 files and there are 77 (I counted with wc).
Does anyone know what I'm doing wrong?
renaming with find
You can use -exec to do this, but I'm not sure you can do rename and copy in one take. Maybe with a script that gets executed after every find result.
But that's only a suggestion.
One idea/approach is to copy the absolute path of the file in question to the destination, but replace the / with an underscore _, since / is not allowed in file names, at least in a Unix-like environment.
With find and bash, something like:
find /usr/share/wallpapers -type f -name "????x????.???" -exec bash -c '
    destination=/home/mine/Pictures/WP/
    for f; do
        path_name=${f%/*}
        file_name=${f##*/}
        echo cp -vi -- "$f" "$destination${path_name//\//_}_$file_name"
    done' _ {} +
See understanding-the-exec-option-of-find
With the globstar and nullglob shell options and an associative array from the bash shell to avoid duplicate filenames:
#!/usr/bin/env bash
shopt -s globstar nullglob
pics=(/usr/share/wallpapers/**/????x????.???)
shopt -u globstar nullglob
declare -A dups
destination=/home/mine/Pictures/WP/
for i in "${pics[@]}"; do
((!dups["${i##*/}"]++)) &&
echo cp -vi -- "$i" "$destination"
done
GNU cp(1) has the -u flag/option which might come in handy along the way.
Remove the echo if you're satisfied with the result.
Another option is to add a trailing ( ) with a number inside it and increment it, e.g. ????x????.???(N) where N is an integer, pretty much like how some GUI file managers deal with duplicate file/directory names.
Something like:
#!/usr/bin/env bash
source=/usr/share/wallpapers/
destination=/home/mine/Pictures/WP/
while IFS= read -rd '' file; do
counter=1
file_name=${file##*/}
if [[ ! -e "$destination$file_name" && ! -e "$destination$file_name($counter)" ]]; then
cp -v -- "$file" "$destination$file_name"
elif [[ -e "$destination$file_name" && ! -e "$destination$file_name($counter)" ]]; then
cp -v -- "$file" "$destination$file_name($counter)"
elif [[ -e "$destination$file_name" && -e "$destination$file_name($counter)" ]]; then
while [[ -e "$destination$file_name($counter)" ]]; do
((counter++))
done
cp -v -- "$file" "$destination$file_name($counter)"
fi
done < <(find "$source" -type f -name '????x????.???' -print0)
Note that the -print0 primary is a GNU/BSD find(1) feature.

How to find and delete resized Wordpress images if the original image was already deleted?

This question pertains to the situation where
An image was uploaded, say mypicture.jpg
Wordpress created multiple copies of it with different resolutions like mypicture-300x500.jpg and mypicture-600x1000.jpg
You delete the original image only
In this scenario, the remaining photos on the filesystem are mypicture-300x500.jpg and mypicture-600x1000.jpg.
How can you script finding these "dangling" images whose original is missing, and then delete them?
You could use find to find all lower resolution pictures with the -regex test:
find . -type f -regex '.*-[0-9]+x[0-9]+\.jpg'
And this would be much better than trying to parse the ls output which is for humans only, not for automation. A safer (and simpler) bash script could thus be:
#!/usr/bin/env bash
while IFS= read -r -d '' f; do
[[ "$f" =~ (.*)-[0-9]+x[0-9]+\.jpg ]] &&
! [ -f "${BASH_REMATCH[1]}".jpg ] &&
echo rm -f "$f"
done < <(find . -type f -regex '.*-[0-9]+x[0-9]+\.jpg' -print0)
(delete the echo once you are convinced that it works as expected).
Note: we use the -print0 action and the empty read delimiter (-d '') to separate the file names with the NUL character instead of the newline character. This is preferable because it works as expected even if you have unusual file names (e.g., with spaces).
Note: as we test the file name inside the loop we could simply search for files (find . -type f -print0). But I suspect that if you have a large number of files the performance would be negatively impacted. So keeping the -regex test is probably better.
Bash loops are OK, but they tend to become really slow as the number of iterations increases. So, let's incorporate our simple bash script into a single find command with the -exec action:
find . -type f -exec bash -c '[[ "$1" =~ (.*)-[0-9]+x[0-9]+\.jpg ]] &&
! [ -f "${BASH_REMATCH[1]}".jpg ]' _ {} \; -print
Note: bash -c takes a script to execute as first argument, then the positional parameters to pass to the script, starting with $0. This is why we pass _ (my favourite for don't care), followed by {} (the current file path).
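A minimal illustration of how bash -c assigns the positional parameters (the file name is purely illustrative):
$ bash -c 'echo "\$0=$0 \$1=$1"' _ ./mypicture-300x500.jpg
$0=_ $1=./mypicture-300x500.jpg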
Note: -print is normally the default find action but here it is needed because -exec is one of the find actions that inhibit the default behaviour.
This will print a list of files. Check that it is correct and, once you are satisfied, add the -delete action:
find . -type f -exec bash -c '[[ "$1" =~ (.*)-[0-9]+x[0-9]+\.jpg ]] &&
! [ -f "${BASH_REMATCH[1]}".jpg ]' _ {} \; -delete -print
See man find and man bash for more explanations.
Demo:
$ touch mypicture.jpg mypicture-300x500.jpg mypicture-600x1000.jpg
$ find . -type f -exec bash -c '[[ "$1" =~ (.*)-[0-9]+x[0-9]+\.jpg ]] &&
! [ -f "${BASH_REMATCH[1]}".jpg ]' _ {} \; -print
$ rm -f mypicture.jpg
$ find . -type f -exec bash -c '[[ "$1" =~ (.*)-[0-9]+x[0-9]+\.jpg ]] &&
! [ -f "${BASH_REMATCH[1]}".jpg ]' _ {} \; -print
./mypicture-300x500.jpg
./mypicture-600x1000.jpg
$ find . -type f -exec bash -c '[[ "$1" =~ (.*)-[0-9]+x[0-9]+\.jpg ]] &&
! [ -f "${BASH_REMATCH[1]}".jpg ]' _ {} \; -delete -print
./mypicture-300x500.jpg
./mypicture-600x1000.jpg
$ ls *.jpg
ls: cannot access '*.jpg': No such file or directory
One last note: if, by accident, one of your full resolution pictures matches the regular expression for lower resolution pictures (e.g., if you have a balloon-1x1.jpg full resolution picture), it will be deleted. This is unfortunate, but according to your specifications there is no easy way to distinguish it from an orphan lower resolution picture. Be careful...
I've written a Bash script that will attempt to find the original filename (i.e. mypicture.jpg) based on scraping away the WordPress resolution (i.e. mypicture-300x500.jpg), and if it's not found, delete the "dangling image" (i.e. rm -f mypicture-300x500.jpg)
#!/bin/bash
for directory in $(find . -type d)
do
    for image in $(ls "$directory")
    do
        echo "The current filename is $image"
        resolution=$(echo "$image" | rev | cut -f 1 -d "-" | rev | xargs)
        echo "The resolution is $resolution"
        extension=$(echo "$resolution" | rev | cut -f 1 -d "." | rev | xargs)
        echo "The extension is $extension"
        resolutiononly=$(echo "$resolution" | sed "s#\.$extension\$##")
        echo "The resolution only is $resolutiononly"
        pattern="^[0-9]+x[0-9]+$"
        if [[ $resolutiononly =~ $pattern ]]; then
            echo "The pattern matches"
            originalfilename=$(echo "$image" | sed "s#-$resolution\$#.$extension#")
            echo "The current filename is $image"
            echo "The original filename is $originalfilename"
            if [[ -f "$directory/$originalfilename" ]]; then
                echo "The file exists $originalfilename"
            else
                rm -f "$directory/$image"
            fi
        else
            continue
        fi
    done
done

Need guidance with a bash script to check log files in a certain directory for a certain string

I would like to preface this by saying I am a complete noob with scripting. I have a situation where I need to manually look for a phone number that could live in one of hundreds of files.
The logs live in the following directory.
/actlogs/sbclogger_archive
The log files are in directories numbered 01-31 inside of that directory, and all the files are gzipped.
Inside of those numbered directories are tons of files but the only ones I want to search are "sipd.logthenthedate.gz" and "sipmsg.logthenthedate.gz".
So I need to look in all the files in the following directory.
"/actlogs/sbclogger_archive"
Which has 31 directories labeled "01-31"
Then in each of 01-31 there are hundreds of files; the only ones I want to look at are "sipd.logthenthedate.gz" and "sipmsg.logthenthedate.gz".
The script I am using is below, please let me know what I could do to make this work.
#!/bin/bash
read -p "Enter a phone number: " text
read -p "Enter directory of log file's, Hint it should be /actlogs/sbclogger_archive: " directory
#arr=( $(find $directory -type f -exec grep -l "$text" {} \; | sort -r) )
#find $directory -type f -exec grep -qe "$text" {} \; -exec bash -c '
file=$(find $directory -type f -name 'sipd.log*' -exec grep -qe "$text" {} \; -exec bash -c 'select f; do echo $f; break; done' find-sh {} +;)
if [ -z "$file" ]; then
echo "No matches found."
else
echo "select tool:"
tools=("nano" "less" "vim" "quit")
select tool in "${tools[@]}"
do
case $tool in
"quit")
break
;;
*)
$tool $file
break
;;
esac
done
fi
This would give you the list of files matching:
find \( -name 'sipd.log[0-9]*.gz' -o -name 'sipmsg.log[0-9]*.gz' \) \
-exec sh -c 'gunzip -c {}| grep -m1 -q 888333' \; -print
./18/sipd.log20200118.gz
./7/sipd.log20200107.gz
Note: -m1 tells grep to stop after the first match; since you need only the file name in this case, that's enough.
If you have zgrep, you can shorten it to:
find \( -name 'sipd.log[0-9]*.gz' -o -name 'sipmsg.log[0-9]*.gz' \) \
-exec zgrep -l '888333' {} \;
./18/sipd.log20200118.gz
./7/sipd.log20200107.gz
Also, some of the tools you are suggesting do not support gzip files (nano and some variants of less, for example), in which case you might need to decompress the file and compress it again when done.
And, you might want to consider a loop if you want to "quit". Feeding the file list to the tool doesn't make sense.
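Something along these lines might work (a sketch; the patterns and the search string 888333 come from the examples above, and it assumes file names without newlines):
#!/usr/bin/env bash
# collect the matching file names into an array
mapfile -t files < <(find . \( -name 'sipd.log[0-9]*.gz' -o -name 'sipmsg.log[0-9]*.gz' \) \
    -exec zgrep -l '888333' {} \;)
# offer each match until the user picks quit
select f in "${files[@]}" quit; do
    [[ -z $f || $f == quit ]] && break
    less <(gunzip -c "$f")  # view a decompressed copy; the original stays compressed
done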
Note: AFAIK zgrep doesn't do recursive:
DESCRIPTION
Zgrep invokes grep on compressed or gzipped files. These grep options will cause zgrep to terminate with an error code: (-[drRzZ]|--di*|--exc*|--inc*|--rec*|--nu*). All other options specified are passed directly to grep. If no file is specified, then the standard input is decompressed if necessary and fed to grep. Otherwise the given files are uncompressed if necessary and fed to grep.
so zgrep -rl "$text" "$directory" or zgrep -rl --include 'sipd.log*.gz' "$text" {01..31} won't work unless you have a special zgrep
As you must unzip before using your tool, I would divide the problem into two blocks.
Firstly, I would expand the paths you need (looking under <directory> for the phone <text>), and then iterate to apply the tool (because some tools like vim or nano cannot be piped).
Try something like this:
#!/bin/bash
#...
# text/directory input stuff
#...
tmpdir=$(mktemp -d)
trap 'rm -rf ${tmpdir}' EXIT
while IFS= read -r file; do
unzipped=${tmpdir}/$(basename "${file}" .gz)
gunzip -c "${file}" > "${unzipped}"
${tool} "${unzipped}"
done < <(zgrep -lw "${text}" "${directory}"/{01..31}/{sipd.logthenthedate.gz,sipmsg.logthenthedate.gz} 2>/dev/null)
Above is the inverted form proposed by Charles Duffy, following this Bash FAQ.
If you prefer to iterate an array, you could build in this way:
# shellcheck disable=SC2207
files=( $(zgrep -lw "${text}" "${directory}"/{01..31}/{sipd.logthenthedate.gz,sipmsg.logthenthedate.gz} 2>/dev/null) )
for file in "${files[@]}"; do
# etc.
since in our particular case the files to match have no spaces in their names, the shellcheck warning is not so important (hidden above).
BRs

Save output command in a variable and write for loop

I want to write a shell script. I list my jpg files inside nested subdirectories with the following command line:
find . -type f -name "*.jpg"
How can I save the output of this command inside a variable and write a for loop for that? (I want to do some processing steps for each jpg file)
You don't want to store output containing multiple files in a variable/array and then post-process it later. You can just act on the files on the fly.
Assuming you have bash shell available, you could write a small script as
#!/usr/bin/env bash
# ^^^^ bash shell needed over any POSIX shell because
# of the need to use process-substitution <()
while IFS= read -r -d '' image; do
printf '%s\n' "$image"
# Your other actions can be done here
done < <(find . -type f -name "*.jpg" -print0)
The -print0 option writes filenames with a NUL byte terminator, which is then read by the read command with an empty delimiter (-d ''). This ensures that file names containing special characters are handled without choking on them.
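A small demo of why this matters (hypothetical file name containing a space):
$ touch 'my photo.jpg'
$ while IFS= read -r -d '' f; do printf '[%s]\n' "$f"; done < <(find . -name '*.jpg' -print0)
[./my photo.jpg]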
Better than storing in a variable, use this:
find . -type f -name "*.jpg" -exec command {} \;
If you want, command can even be a full-blown shell script.
A demo is better than an explanation, no? Copy-paste the following lines in a terminal:
cat<<'EOF' >/tmp/test
#!/bin/bash
echo "I play with $1 and I can replay with $1, even 3 times: $1"
EOF
chmod +x /tmp/test
find . -type f -name "*.jpg" -exec /tmp/test {} \;
Edit: new demo (from new questions from comments)
find . -type f -name "*.jpg" | head -n 10 | xargs -n1 command
(this alternative solution doesn't take care of filenames with newlines or spaces)
This one takes care:
#!/bin/bash
shopt -s globstar
count=0
for file in **/*.jpg; do
if ((++count < 10)); then
echo "process file $file number $count"
else
break
fi
done

unix bash find file directories with 2 explicit file extensions

I am trying to create a small bash script that essentially looks through a directory that includes hundreds of subdirectories. SOME of these subdirectories include a textfile.txt and an htmlfile.html, where the names textfile and htmlfile are variable.
I only really care about subdirectories that have both the .txt and the .html; all other subdirectories can be ignored.
I then want to list all the .html files and .txt files that are in the same subdirectory.
This seems like a pretty simple issue to solve, but I am at a loss. All I can really get working is a line of code that outputs the .html and .txt files with no association with the actual subdirectory they are in, and I am pretty new at bash scripting, so I can't get any further.
#!/bin/bash
files="$(find ~/file/ -type f -name '*.txt' -or -name '*.html')"
for file in $files
do
echo $file
done
The following find command looks checks every subdirectory and, if it has both html and txt files, it lists all of them:
find . -type d -exec env d={} bash -c 'ls "$d"/*.html &>/dev/null && ls "$d"/*.txt &>/dev/null && ls "$d/"*.{html,txt}' \;
Explanation:
find . -type d
This looks for all subdirectories of the current directory.
-exec env d={} bash -c '...' \;
This sets the environment variable d to the value of the found subdirectory and then executes the bash command that is contained within the single quotes (see below).
ls "$d"/*.html &>/dev/null && ls "$d"/*.txt &>/dev/null && ls "$d/"*.{html,txt}
This is the bash command that is executed. It consists of three statements and-ed together. The first checks to see if directory d has any html files. If so, the second statement runs and it checks to see if there are any txt files. If so, the last statement is executed and it lists all html and txt files in the directory d.
This command is safe for all file and directory names containing spaces, tabs, or other difficult characters.
You could do it by searching recursively with the globstar option:
shopt -s globstar
for file in **; do
if [[ -d $file ]]; then
for sub_file in "$file"/*; do
case "$sub_file" in
*.html)
html=1;;
*.txt)
txt=1;;
esac
done
[[ $html && $txt ]] && echo "$file"
html=""
txt=""
fi
done
You can make use of -o:
#!/bin/bash
files=$(find ~/file/ -type f -name '*.txt' -o -name '*.html')
for file in $files
do
echo $file
done
#!/bin/bash
#A quick peek into a dir to see if there's at least one file that matches pattern
dir_has_file() { dir="$1"; pattern="$2";
[ -n "$(find "$dir" -maxdepth 1 -type f -name "$pattern" -print -quit)" ]
}
#Assumes there are no newline characters in the filenames, but will behave correctly with subdirectories that match *.html or *.txt
find "$1" -type d |
while read -r d
do
dir_has_file "$d" '*.txt' &&
dir_has_file "$d" '*.html' &&
#Now print all the matching files
find "$d" -maxdepth 1 -type f \( -name '*.txt' -o -name '*.html' \)
done
This script takes the root directory to look into as the first argument ($1).
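For example, if the script were saved as find_pairs.sh (a hypothetical name):
bash find_pairs.sh ~/file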
The test command is what you need to check for the existence of each file in each of the subdirs:
find . -type d -exec bash -c "if test -f {}/$file1 -a -f {}/$file2 ; then ls {}/*.{txt,html} ; fi" \;
where $file1 and $file2 are the two .txt and .html files you are looking for.
