How to combine Linux find, convert and copy commands into one command

I have the following cmd that fetches all .pdf files with an STP pattern in the filename and places them into a folder:
find /home/OurFiles/Images/ -name '*.pdf' |grep "STP*" | xargs cp -t /home/OurFiles/ImageConvert/STP/
I have another cmd that converts pdf to jpg.
find /home/OurFiles/ImageConvert/STP/ -type f -name '*.pdf' -print0 |
while IFS= read -r -d '' file
do convert -verbose -density 500 -resize 800 "${file}" "${file%.*}.jpg"
done
Is it possible to combine these commands into one? Also, I would like to prepend a prefix onto the converted image file name in the single command, if possible. Example: STP_OCTOBER.jpg to MSP-STP_OCTOBER.jpg. Any feedback is much appreciated.

find /home/OurFiles/Images/ -type f -name '*STP*.pdf' -exec sh -c '
  destination=$1; shift      # get the first argument
  for file do                # loop over the remaining arguments
    fname=${file##*/}        # get the filename part
    cp "$file" "$destination" &&
      convert -verbose -density 500 -resize 800 "$destination/$fname" "$destination/MSP-${fname%pdf}jpg"
  done
' sh /home/OurFiles/ImageConvert/STP {} +
You could pass the destination directory and all PDFs found to find's -exec option to execute a small script.
The script saves the first argument in the variable destination, removes it from the argument list, and then loops over the remaining PDF paths. For each path it extracts the filename, copies the file to the destination directory and, if the copy succeeded, runs the convert command.
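The renaming itself is just two parameter expansions; a quick illustration with the example name from the question:
file=/home/OurFiles/Images/STP_OCTOBER.pdf   # example path based on the question
fname=${file##*/}                            # -> STP_OCTOBER.pdf (strip the directory part)
echo "MSP-${fname%pdf}jpg"                   # -> MSP-STP_OCTOBER.jpg (swap the pdf suffix for jpg)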

Maybe something like:
find /home/OurFiles/Images -type f -name 'STP*.pdf' -print0 |
while IFS= read -r -d '' file; do
  destfile="/home/OurFiles/ImageConvert/STP/MSP-$(basename "$file" .pdf).jpg"
  convert -verbose -density 500 -resize 800 "$file" "$destfile"
done
The only really new thing in this merged version, compared to your two separate commands, is using basename(1) to strip the directory and extension from the filename in order to build the output filename.
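A quick check of that basename step with the example name from the question:
basename "/home/OurFiles/Images/STP_OCTOBER.pdf" .pdf   # -> STP_OCTOBER
# so destfile becomes /home/OurFiles/ImageConvert/STP/MSP-STP_OCTOBER.jpg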

Related

Bash script to find .jpgs created within a certain time frame and then rename them

I have the following find command working pretty well; it walks through a directory tree looking for any .jpg with a modification time within the last 600 minutes:
find /some/directory/ -depth -mmin -600 -name *.jpg
What I need to do now is rename each .jpg it finds to the actual date it was created, plus some random numbers at the end of the name, before appending .jpg back onto it. I've used this in the past: (date -r "$f" +%Y-%m-%d_%H-%M-%S-%N).jpg, but I can't seem to figure out how to tie the find to the mv.
Am I missing a simple way to do this with -exec?
This should achieve what you want:
find /some/directory/ -depth -mmin -600 -name "*.jpg" \
-exec bash -c 'echo mv "$1" "$(dirname "$1")/$(date -r "$1" +%Y-%m-%d_%H-%M-%S-)$(date +%N).jpg"' bash {} \;
Remove echo to actually perform the renaming once you are satisfied with the dry-run output.
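That is, the final command becomes:
find /some/directory/ -depth -mmin -600 -name "*.jpg" \
-exec bash -c 'mv "$1" "$(dirname "$1")/$(date -r "$1" +%Y-%m-%d_%H-%M-%S-)$(date +%N).jpg"' bash {} \;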

Copying a type of file, in specific directories, to another directory

I have a .txt file that contains a list of directories. I want to make a script that goes through this .txt file and copies any files of a certain type from each listed directory to another directory.
I've never done this with directories, only files.
How can I edit this simple script so that it reads the directory list, looks for .csv files, and copies them to another directory?
cat filenames.list | \
while read FILENAME
do
  find . -name "$FILENAME" -exec cp '{}' new_dir \;
done
for DIRNAME in $(<dirname.list); do find "$DIRNAME" -type f -name "*.csv" -exec cp {} dest \; ; done
Sorry, in my first answer I didn't understand what you were asking for.
The line of code above simply takes each directory entry in your list as a path, searches it for every file ending with the ".csv" extension, and copies those files to the destination you want.
But you could do it with less code:
for DIRNAME in $(<dirname.list); do cp "$DIRNAME"/*.csv dest; done
Despite the filename of the list filenames.list, let me assume the file contains the list of directory names, not filenames. Then would you please try:
while IFS= read -r dir; do
  find "$dir" -type f -name "*.csv" -exec cp -p -- {} new_dir \;
done < filenames.list
The find command searches "$dir" for files which have the extension .csv, then copies them to new_dir.
The script above does not handle duplicate filenames. If you want to keep the original directory tree and/or need a countermeasure against duplicate filenames, please let me know.
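As a minimal sketch of that directory-tree-preserving variant (assuming GNU cp, since --parents is a GNU extension):
while IFS= read -r dir; do
  # --parents recreates each file's source path (including "$dir") under new_dir,
  # which also avoids most filename collisions
  find "$dir" -type f -name "*.csv" -exec cp -p --parents -- {} new_dir \;
done < filenames.list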
Using find inside a while loop works, but find will run once for each line of the file. An alternative is to save the list in an array; that way find can search all the directories in the list in a single invocation.
If you have bash 4+, you can use mapfile.
mapfile -t directories < filenames.list
If you're stuck on bash 3:
directories=()
while IFS= read -r line; do
  directories+=("$line")
done < filenames.list
Now, if you're just after one file type, like files ending in *.csv:
find "${directories[@]}" -type f -name '*.csv' -exec sh -c 'cp -v -- "$@" /newdirectory' _ {} +
If you have multiple file types to match and multiple directories to copy the files to:
while IFS= read -r -d '' file; do
  case $file in
    *.csv) cp -v -- "$file" /foodirectory;; ##: csv file copy to foodirectory
    *.mp3) cp -v -- "$file" /bardirectory;; ##: mp3 file copy to bardirectory
    *.avi) cp -v -- "$file" /bazdirectory;; ##: avi file copy to bazdirectory
  esac
done < <(find "${directories[@]}" -type f -print0)
find's -print0 works with read's -d '' when dealing with file names containing white space and newlines; see How can I find and deal with file names containing newlines, spaces or both?
The -- is there so that if you have a problematic filename that starts with a dash, cp will not interpret it as an option.
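A quick illustration with a hypothetical file whose name starts with a dash:
touch -- '-report.csv'                # hypothetical awkward filename
cp -v '-report.csv' /newdirectory     # fails: cp tries to parse the leading -r, -e, ... as options
cp -v -- '-report.csv' /newdirectory  # works: -- marks the end of options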
Given find's ability to process multiple folders, and assuming the goal is to 'flatten' all csv files into a single destination, consider the following.
Note that it assumes folder names do not contain special characters (including spaces, tabs, newlines, etc).
As a side benefit, it will minimize the number of cp calls, making the process efficient across a large number of files/folders.
find $(<filename.list) -name '*.csv' | xargs cp -t DESTINATION/
For the more complex case, where folder/file names can be anything (including spaces, '*', etc.), consider using a NUL separator (-print0 and -0).
xargs -I{} -t find '{}' -name '*.csv' -print0 < filename.list | xargs -0 -I{} -t cp -t new/ '{}'
This will fork multiple find and cp processes.

Save command output in a variable and write a for loop

I want to write a shell script. I list my jpg files inside nested subdirectories with the following command line:
find . -type f -name "*.jpg"
How can I save the output of this command inside a variable and write a for loop for that? (I want to do some processing steps for each jpg file)
You don't want to store output containing multiple files in a variable/array and then post-process it later. You can just act on the files as they are found.
Assuming you have bash shell available, you could write a small script as
#!/usr/bin/env bash
# ^^^^ bash shell needed over any POSIX shell because
# of the need to use process-substitution <()
while IFS= read -r -d '' image; do
  printf '%s\n' "$image"
  # Your other actions can be done here
done < <(find . -type f -name "*.jpg" -print0)
The -print0 option writes filenames with a null byte terminator, which is then read back by the read command using -d ''. This ensures that file names containing special characters are handled without choking on them.
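For example, a hypothetical file name containing spaces goes through the loop as a single item:
mkdir -p demo && touch 'demo/my holiday photo.jpg'   # hypothetical test file
while IFS= read -r -d '' image; do
  printf 'found: %s\n' "$image"                      # prints: found: demo/my holiday photo.jpg
done < <(find demo -type f -name "*.jpg" -print0)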
Better than storing it in a variable, use this:
find . -type f -name "*.jpg" -exec command {} \;
If you want, command can even be a full-blown shell script.
A demo is better than an explanation, no? Copy and paste the following lines into a terminal:
cat<<'EOF' >/tmp/test
#!/bin/bash
echo "I play with $1 and I can replay with $1, even 3 times: $1"
EOF
chmod +x /tmp/test
find . -type f -name "*.jpg" -exec /tmp/test {} \;
Edit: a new demo (addressing follow-up questions from the comments)
find . -type f -name "*.jpg" | head -n 10 | xargs -n1 command
(this alternative doesn't handle filenames with newlines or spaces)
This one does:
#!/bin/bash
shopt -s globstar
count=0
for file in **/*.jpg; do
  if ((++count < 10)); then
    echo "process file $file number $count"
  else
    break
  fi
done

Split files greater than 500kb in a particular directory

I want to split files which are >500kb. First I use find to list all such files: find . -maxdepth 1 -name '*.log' -size +500k, which returns "./filename". Then I use another command to split each file according to my requirement: split -b 500k -d -a 4 filename filename, where filename is the output of the first command. Can someone help me combine both of them so that the output of the first command is the input of the second?
How about a one-liner?
find . -maxdepth 1 -name '*' -size +500k -exec 'split' '-b' '500k' '-d' '-a' '4' '{}' '{}' ';'
You can use a process substitution for this:
while IFS= read -r file
do
  split -b 500k -d -a 4 "$file" "$file"
done < <(find . -maxdepth 1 -name '*.log' -size +500k)
That is: the while loop gets fed by the find output.

Merge multiple JPGs into single PDF in Linux

I used the following command to convert and merge all the JPG files in a directory to a single PDF file:
convert *.jpg file.pdf
The files in the directory are numbered from 1.jpg to 123.jpg. The conversion went fine but after converting, the pages were all mixed up. I wanted the PDF to have pages from 1.jpg to 123.jpg in the same order as they are named. I tried it with the following command as well:
cd 1
FILES=$( find . -type f -name "*jpg" | cut -d/ -f 2)
mkdir temp && cd temp
for file in $FILES; do
BASE=$(echo $file | sed 's/.jpg//g');
convert ../$BASE.jpg $BASE.pdf;
done &&
pdftk *pdf cat output ../1.pdf &&
cd ..
rm -rf temp
But still no luck. Operating system is Linux.
From the manual of ls:
-v natural sort of (version) numbers within text
So, doing what we need in a single command:
convert $(ls -v *.jpg) foobar.pdf
Mind that convert is part of ImageMagick.
The problem is that your shell expands the wildcard in purely alphabetical order, and because the lengths of the numbers differ, the order comes out wrong:
$ echo *.jpg
1.jpg 10.jpg 100.jpg 101.jpg 102.jpg ...
The solution is to pad the filenames with zeros as required so they're the same length before running your convert command:
$ for i in *.jpg; do num=`expr match "$i" '\([0-9]\+\).*'`;
> padded=`printf "%03d" $num`; mv -v "$i" "${i/$num/$padded}"; done
Now the files will be matched by the wildcard in the correct order, ready for the convert command:
$ echo *.jpg
001.jpg 002.jpg 003.jpg 004.jpg 005.jpg 006.jpg 007.jpg 008.jpg ...
You could use
convert '%d.jpg[1-123]' file.pdf
via https://www.imagemagick.org/script/command-line-processing.php:
Another method of referring to other image files is by embedding a formatting character in the filename with a scene range. Consider the filename image-%d.jpg[1-5]. The command magick image-%d.jpg[1-5] causes ImageMagick to attempt to read images with these filenames:
image-1.jpg image-2.jpg image-3.jpg image-4.jpg image-5.jpg
See also https://www.imagemagick.org/script/convert.php
All of the above answers failed for me when I wanted to merge many high-resolution JPEG images (from a scanned book).
ImageMagick tried to load all the files into RAM, so I used the following two-step approach instead:
find -iname "*.JPG" | xargs -I'{}' convert {} {}.pdf
pdfunite *.pdf merged_file.pdf
Note that with this approach, you can also use GNU parallel to speed up the conversion:
find -iname "*.JPG" | parallel -I'{}' convert {} {}.pdf
This is how I do it:
The first line converts all jpg files to pdf using the convert command.
The second line merges all the pdf files into a single PDF (one page per input file) using gs (the PostScript and PDF language interpreter and previewer).
for i in $(find . -maxdepth 1 -name "*.jpg" -print); do convert $i ${i//jpg/pdf}; done
gs -dNOPAUSE -sDEVICE=pdfwrite -sOUTPUTFILE=merged_file.pdf -dBATCH `find . -maxdepth 1 -name "*.pdf" -print`
https://gitlab.mister-muffin.de/josch/img2pdf
In all of the proposed solutions involving ImageMagick, the JPEG data gets fully decoded and re-encoded. This results in generation loss, as well as performance "ten to hundred" times worse than img2pdf.
img2pdf is also available from many Linux distros, as well as via pip3.
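A minimal invocation for this question's numbered files might look like this (img2pdf's -o option names the output file; ls -v keeps the natural numeric order, as in the earlier answer):
pip3 install img2pdf                  # or install it from your distro's repositories
img2pdf $(ls -v *.jpg) -o file.pdf    # embeds the JPEGs without re-encoding them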
Mixing the first idea with their reply, I think this code may be satisfactory:
jpgs2pdf.sh
#!/bin/bash
cd $1
FILES=$( find . -type f -name "*jpg" | cut -d/ -f 2)
mkdir temp > /dev/null
cd temp
for file in $FILES; do
BASE=$(echo $file | sed 's/.jpg//g');
convert ../$BASE.jpg $BASE.pdf;
done &&
pdftk `ls -v *pdf` cat output ../`basename $1`.pdf
cd ..
rm -rf temp
How to create a PDF document from a list of images
Step 1: Install parallel from your repository. This will speed up the process.
Step 2: Convert each jpg to pdf file
find -iname "*.JPG" | sort -V | parallel -I'{}' convert -compress jpeg -quality 25 {} {}.pdf
The sort -V will sort the file names in natural order.
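A quick way to see what sort -V does with numbered names:
printf '%s\n' 1.jpg 10.jpg 2.jpg | sort -V   # -> 1.jpg 2.jpg 10.jpg (not 1, 10, 2)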
Step 3: Merge all PDFs into one
pdfunite $(find -iname '*.pdf' | sort -V) output_document.pdf
Credit Gregor Sturm
Combining Felix Defrance's and Delan Azabani's answers (from above):
convert `for file in $FILES; do echo $file; done` test_2.pdf
