Problem:
/a.jpg
/a-150x150.jpg
/a-300x300.jpg
/b.jpg
/b-150x150.jpg
/b-300x300.jpg
How do I write a bash script that watermarks only the main images, without the 150x150 and 300x300 thumbnails being watermarked?
I tried some code, but it isn't working; the 300x300 files still end up watermarked:
file -i *.jpg | grep image | awk -F':' '{ print $1 }' | while read IMAGE
I just want
/a.jpg
/b.jpg
/c.jpg
You can use find to select exactly the files you want and then feed them to a while loop using process substitution:
#!/bin/bash
while IFS= read -r filename ; do
echo "$filename" # your code here
done < <( find . -name '*.jpg' -and -not -regex '.*-[0-9]+x[0-9]+\.jpg' )
This code uses -name to select all .jpg images and -regex to exclude the ones with a -<number>x<number> pattern (regex has to be written to match the whole filename).
If your code needs to safeguard against exotic filenames containing even newlines, you should use the -print0 switch with find and the -d '' switch with read.
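For example, a NUL-safe version of the same loop (a sketch; the composite command and watermark.png are stand-ins for whatever watermarking step you actually use):
#!/bin/bash
# NUL-safe: survives spaces and even newlines in filenames.
while IFS= read -r -d '' filename ; do
    # Hypothetical watermark step; substitute your own command.
    composite -dissolve 50 -gravity southeast watermark.png "$filename" "$filename"
done < <( find . -name '*.jpg' -and -not -regex '.*-[0-9]+x[0-9]+\.jpg' -print0 )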
I'm trying to loop over every file, do some cutting, and extract the first 4 characters of the MD5 of each name.
Here's what I got so far:
find . -name '*.jpg' | cut -f4 -d/ | cut -f1 -d. | md5sum | head -c 4
Problem is, I don't see any more output at this point. How can I send output to md5sum and continue sending the result?
md5sum reads everything from stdin until end of file (EOF) and outputs the MD5 sum of the entire input. You should split the input into lines and run md5sum per line, for example with a while read loop:
find . -name '*.jpg' | cut -f4 -d/ | cut -f1 -d. |
while read -r a; do
    echo -n "$a" | md5sum | head -c 4
done
The read builtin will read one line of input into the shell variable $a; the while loop runs the loop body (the commands between do and done) once for every successful read, with $a holding the current line. The -r option of read prevents backslash interpretation; the -n option of echo suppresses the trailing newline (remove -n if you want one newline per hash).
This will be slow for thousands of files or more, as the loop forks and execs several processes per file. Some scripting with perl, python, node.js, or any other language with built-in MD5 hashing (or a library for it) will be faster.
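For example, a single perl process can hash every name (a sketch, assuming perl's bundled Digest::MD5 module; it prints one 4-character prefix per line):
# One fork total instead of several per file.
find . -name '*.jpg' | cut -f4 -d/ | cut -f1 -d. |
perl -MDigest::MD5=md5_hex -lne 'print substr(md5_hex($_), 0, 4)'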
You can do what you are attempting with a short "helper" script that you call from find. For example, you could create a short script that takes the basename of each file passed as an argument, removes the '.jpg' extension, and then feeds the remaining name to md5sum on stdin to get the md5sum of the name itself. Call the script anything you like, say namemd5.sh. Example:
#!/bin/bash
[ -z "$1" ] && exit 1 ## validate single argument
fname=$(basename "$1") ## get the filename alone
fname="${fname%.jpg}" ## remove .jpg extension
fnsum=$(md5sum - <<<"$fname") ## get md5sum of name w/o .jpg
fnsum=${fnsum%% *} ## remove trailing ' -'
echo "$fnsum - $fname" ## output md5sum - name
## (remove ' - $fname' for md5sum alone)
(note: the name is provided as part of the output for example purposes, remove if you want the md5sum alone as shown in the comment above)
Example Files
$ find /home/david/img/wp/ -type f -name "*.jpg"
/home/david/img/wp/hacker_manifesto_1200x900.jpg
/home/david/img/wp/hacker_manifesto_by_otalicus.jpg
/home/david/img/wp/reflections-triple-1920x1200.jpg
/home/david/img/wp/hacker_wallpaper_1600x900.jpg
/home/david/img/wp/Zen.jpg
/home/david/img/wp/hacker_wallpaper_by_vanilla23-dot254.jpg
/home/david/img/wp/hacker_manifesto_1600x900.jpg
Example Use/Output
$ find /home/david/img/wp/ -type f -name "*.jpg" -exec ./namemd5.sh '{}' \;
0f7d2aac158eb9f7842215e14ff6573c - hacker_manifesto_1200x900
604bc695a0bb70b8db0352267caf226f - hacker_manifesto_by_otalicus
5decea0e306f185bf988ac9934ec0e2c - reflections-triple-1920x1200
82bd8e1ad3df588eb0e0848c5f764812 - hacker_wallpaper_1600x900
0f4daba431a22c03f28977f087e4c695 - Zen
0c55cd3ebd2a847e10c20d86e80e6ceb - hacker_wallpaper_by_vanilla23-dot254
e5c1da0c2db3827d2bf81c306633cc56 - hacker_manifesto_1600x900
You can also call the script through find's -execdir action, e.g.
$ find /home/david/img/wp/ -type f -name "*.jpg" -execdir \
/full/path/to/namemd5.sh '{}' \;
(note: the use of the /full/path to your helper script above)
To find all .jpg files, execute md5sum on each, and cut the first 4 characters:
find . -name '*.jpg' -exec md5sum {} \; | cut -b 1-4
I am faced with a challenge that requires multiple aspects of bash. I work on Linux (precisely Debian Stretch). Here is the situation (for each point I also give the solution I have considered so far, but I'm open to other ideas):
I have videos of various types (and various upper/lower case extensions), such as .mp4, .mov, .MOV, .MP4, .avi, ... located in one directory and spread across an almost unstructured tree of directories. To find them all I tried to use the find command.
For each video, I need to extract some metadata (i.e. the name of the file, duration of video, size of file and date of creation/last modification). The package mediainfo yields (among a lot of other things) the required fields.
The output of mediainfo is a long list of fields with format : <Tag>\t : <value>. I need to extract values for fields Complete name, Duration, File size and Encoded date.
So with all this information, I must filter the required fields value and put them in a CSV file. I considered using sed.
My goal is to achieve all these tasks either in a script or a small amount of separate commands.
The idea, as code (this code is hideously wrong, but it gives the idea):
find . -type f -name "*.[mp4|MP4|mov|MOV|avi|AVI]" -exec mediainfo {} | sed '/Complete name|Duration|File size|Encoded date/p' > myfile.csv \;
Would you have any idea how to perform this task? I feel terribly lost combining find, exec and sed, and outputting to a CSV...
Thanks in advance for your help!
So I finally managed to write a script that does it. Probably not the best way to do it, but here it is:
resFile="myresult.csv"
dstDir="./destination/"
srcDir="./source/"
#first copy all files at same level in dstDir (with preserve and update)
#this is somehow necessary, relative name for MOV files and mediainfo
#do not seem to work together.
find $srcDir -type f \( -name "*.mp4" -o -name "*.mov" -o -name "*.MOV" -o -name "*.avi" \) -exec cp -up {} $dstDir \;
#then for each file, output mediainfo of file and keep only interesting tags. add ### between each file.
find $dstDir -type f \( -name "*.mp4" -o -name "*.mov" -o -name "*.MOV" -o -name "*.avi" \) \
-exec sh -c "mediainfo --Output=XML {} | sed '1,15!d;/Duration\|Complete\|File_size\|Encoded_date/!d' >> $resFile && echo '########' >> $resFile" \;
#removes tags : <Duration>42s 15ms</Duration> -> 42s 15ms
sed -i 's/^<.*>\(.*\)<.*>/\1/I' $resFile
#Extract exact filename (and not relative)
sed -i 's/^\.\/.*\/\(.*\)\.\(mp4\|MP4\|mov\|MOV\|avi\)$/\1/' $resFile
#Puts fields for a file on a unique line separated with commas
sed -i 'N;s/\n/,/;N;s/\n/,/;N;s/\n/,/;N;s/\n/,/' $resFile
#remove all trailing ###
sed -i 's/,#*$//' $resFile
I would still be interested if anyone has ideas to improve the code.
I "minimized" it a little bit; my actual code is a bit more modular and performs a few checks.
Try this. Due to lack of time I was not able to complete it; you just have to send the output to CSV.
for c in $(locate --basename .mp4 .mkv .wmv .flv .webm .mov .avi)
do
Complete_name=$(mediainfo --Output=XML "$c" | xml_grep 'Complete_name' --text_only | awk 'BEGIN{FS="/"}{print $NF}')
echo "$Complete_name"
Duration=$(mediainfo --Output=XML "$c" | xml_grep 'Duration' --text_only --nb_result 1)
echo "$Duration"
File_size=$(mediainfo --Output=XML "$c" | xml_grep 'File_size' --text_only)
echo "$File_size"
Encoded_date=$(mediainfo --Output=XML "$c" | xml_grep 'Encoded_date' --text_only --nb_result 1 | awk '{print $2}')
echo "$Encoded_date"
done
I used the following command to convert and merge all the JPG files in a directory to a single PDF file:
convert *.jpg file.pdf
The files in the directory are numbered from 1.jpg to 123.jpg. The conversion went fine but after converting, the pages were all mixed up. I wanted the PDF to have pages from 1.jpg to 123.jpg in the same order as they are named. I tried it with the following command as well:
cd 1
FILES=$( find . -type f -name "*jpg" | cut -d/ -f 2)
mkdir temp && cd temp
for file in $FILES; do
BASE=$(echo $file | sed 's/.jpg//g');
convert ../$BASE.jpg $BASE.pdf;
done &&
pdftk *pdf cat output ../1.pdf &&
cd ..
rm -rf temp
But still no luck. Operating system is Linux.
From the manual of ls:
-v natural sort of (version) numbers within text
So, doing what we need in a single command:
convert $(ls -v *.jpg) foobar.pdf
Mind that convert is part of ImageMagick.
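Note that the $(ls -v ...) substitution word-splits, so it breaks on filenames containing whitespace. If that matters, a NUL-safe sketch assuming GNU sort and xargs (and few enough files to fit in a single xargs batch):
# Natural sort over NUL-separated names; sh receives them as "$@".
printf '%s\0' *.jpg | sort -zV | xargs -0 sh -c 'convert "$@" foobar.pdf' sh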
The problem is because your shell is expanding the wildcard in a purely alphabetical order, and because the lengths of the numbers are different, the order will be incorrect:
$ echo *.jpg
1.jpg 10.jpg 100.jpg 101.jpg 102.jpg ...
The solution is to pad the filenames with zeros as required so they're the same length before running your convert command:
$ for i in *.jpg; do num=`expr match "$i" '\([0-9]\+\).*'`;
> padded=`printf "%03d" $num`; mv -v "$i" "${i/$num/$padded}"; done
Now the files will be matched by the wildcard in the correct order, ready for the convert command:
$ echo *.jpg
001.jpg 002.jpg 003.jpg 004.jpg 005.jpg 006.jpg 007.jpg 008.jpg ...
You could use
convert '%d.jpg[1-123]' file.pdf
via https://www.imagemagick.org/script/command-line-processing.php:
Another method of referring to other image files is by embedding a
formatting character in the filename with a scene range. Consider the
filename image-%d.jpg[1-5]. The command
magick image-%d.jpg[1-5] causes ImageMagick to attempt to read images
with these filenames:
image-1.jpg image-2.jpg image-3.jpg image-4.jpg image-5.jpg
See also https://www.imagemagick.org/script/convert.php
All of the above answers failed for me when I wanted to merge many high-resolution JPEG images (from a scanned book).
ImageMagick tried to load all the files into RAM, so I used the following two-step approach:
find -iname "*.JPG" | xargs -I'{}' convert {} {}.pdf
pdfunite *.pdf merged_file.pdf
Note that with this approach, you can also use GNU parallel to speed up the conversion:
find -iname "*.JPG" | parallel -I'{}' convert {} {}.pdf
This is how I do it:
The first line converts all jpg files to pdf, using the convert command.
The second line merges all the pdf files into a single one, with one pdf per page, using gs (the PostScript and PDF language interpreter and previewer).
for i in $(find . -maxdepth 1 -name "*.jpg" -print); do convert "$i" "${i//jpg/pdf}"; done
gs -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=merged_file.pdf -dBATCH `find . -maxdepth 1 -name "*.pdf" -print`
https://gitlab.mister-muffin.de/josch/img2pdf
In all of the proposed solutions involving ImageMagick, the JPEG data gets fully decoded and re-encoded. This results in generation loss, as well as performance ten to a hundred times worse than img2pdf.
img2pdf is also available from many Linux distros, as well as via pip3.
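Basic usage is a single command; a sketch, reusing the ls -v trick from above for natural ordering:
# JPEG data is embedded as-is, with no re-encoding.
img2pdf $(ls -v *.jpg) -o file.pdf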
Mixing the first idea with their reply, I think this code may be satisfactory:
jpgs2pdf.sh
#!/bin/bash
cd "$1"
FILES=$( find . -type f -name "*jpg" | cut -d/ -f 2)
mkdir temp 2> /dev/null
cd temp
for file in $FILES; do
BASE=$(echo "$file" | sed 's/\.jpg$//');
convert ../"$BASE".jpg "$BASE".pdf;
done &&
pdftk `ls -v *pdf` cat output ../`basename $1`.pdf
cd ..
rm -rf temp
How to create a PDF document from a list of images
Step 1: Install parallel from Repository. This will speed up the process
Step 2: Convert each jpg to pdf file
find -iname "*.JPG" | sort -V | parallel -I'{}' convert -compress jpeg -quality 25 {} {}.pdf
The sort -V will sort the file names in natural order.
Step 3: Merge all PDFs into one
pdfunite $(find -iname '*.pdf' | sort -V) output_document.pdf
Credit Gregor Sturm
Combining Felix Defrance's and Delan Azabani's answers (from above):
convert `for file in $FILES; do echo $file; done` test_2.pdf
I'm trying to rename a load of files (I count over 200) that have the company name either in the filename or in the text contents. I basically need to change any references to "company" to "newcompany", maintaining capitalisation where applicable (i.e. "Company" becomes "Newcompany", "company" becomes "newcompany"). I need to do this recursively.
Because the name could occur pretty much anywhere I've not been able to find example code anywhere that meets my requirements. It could be any of these examples, or more:
company.jpg
company.php
company.Class.php
company.Company.php
companysomething.jpg
Hopefully you get the idea. I not only need to do this with filenames, but also the contents of text files, such as HTML and PHP scripts. I'm presuming this would be a second command, but I'm not entirely sure what.
I've searched the codebase and found nearly 2000 mentions of the company name in nearly 300 files, so I don't fancy doing it manually.
Please help! :)
bash has powerful looping and substitution capabilities:
for filename in `find /root/of/where/files/are -name '*company*'`; do
mv "$filename" "${filename/company/newcompany}"
done
for filename in `find /root/of/where/files/are -name '*Company*'`; do
mv "$filename" "${filename/Company/Newcompany}"
done
For the file and directory names, use for, find, mv and sed.
For each path (f) that has company in the name, rename it (mv) from f to the new name where company is replaced by newcompany.
for f in `find -name '*company*'` ; do mv "$f" "`echo "$f" | sed s/company/newcompany/`" ; done
For the file contents, use find, xargs and sed.
For every file, change company by newcompany in its content, keeping original file with extension .backup.
find -type f -print0 | xargs -0 sed -i.backup 's/company/newcompany/g'
I'd suggest you take a look at man rename, an extremely powerful perl utility for, well, renaming files.
Standard syntax is
rename 's/\.htm$/\.html/' *.htm
The clever part is that the tool accepts any perl regexp as a pattern for the filenames to be changed.
You might want to run it with the -n switch, which makes the tool only report what it would have changed.
I can't figure out a nice way to keep the capitalization right now, but since you can already search through the file structure, you could issue several renames with different capitalizations until all files are changed; alternatively, both cases can be handled in one expression, as sketched below.
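A sketch of that one-pass variant, assuming the perl rename and GNU find (the pattern is ordinary perl code, so several substitutions can be chained):
# Preview with -n first; drop it to actually rename.
# -execdir hands rename just ./name, so only the basename is rewritten;
# -depth renames directory contents before the directories themselves.
find /root/of/where/files/are -depth -iname '*company*' \
    -execdir rename -n 's/Company/Newcompany/g; s/company/newcompany/g' '{}' +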
To loop through all files below current folder and to search for a particular string, you can use
find . -type f -exec grep -n -i STRING_TO_SEARCH_FOR /dev/null {} \;
The output from that command can be directed to a file (after some filtering to just extract the file names of the files that need to be changed).
find . -type f ... > files_to_operate_on
Then wrap that in a while read loop and do some perl-magic for inplace-replacement
while read -r file
do
perl -pi -e 's/stringtoreplace/replacementstring/g' "$file"
done < files_to_operate_on
There are few right ways to recursively process files. Here's one:
while IFS= read -d $'\0' -r file ; do
newfile="${file//Company/Newcompany}"
newfile="${newfile//company/newcompany}"
mv -f "$file" "$newfile"
done < <(find /basedir/ -iname '*company*' -print0)
This will work with all possible file names, not just ones without whitespace in them.
Presumes bash.
For changing the contents of files I would advise caution because a blind replacement within a file could break things if the file is not plain text. That said, sed was made for this sort of thing.
while IFS= read -d $'\0' -r file ; do
sed -i -e 's/Company/Newcompany/g;s/company/newcompany/g' "$file"
done < <(find /basedir/ -iname '*company*' -print0)
For this run I recommend adding some additional switches to find to limit the files it will process, perhaps
find /basedir/ \( -iname '*company*' -and \( -iname '*.txt' -or -iname '*.html' \) \) -print0
I'm trying to build a script that lists all the zip files in a set of directories, with some filters, and writes them out to a file, but when a filename has a space in it, it appears on a new line.
This list will eventually be used as an input to tar to gzip all the zip files, script is below:
#!/bin/bash
rm -f set1.txt
rm -f set2.txt
for line in $(find /home -type d -name assets ;);
do
echo $line >> set1.txt
for line in $(find $line -type f -name \*.zip -mtime +2 ;);
do
echo \"$line\" >> set2.txt
done;
done
This works as expected until you get a space in a filename then set2.txt contains entries like this:
"/home/xxxxxx/oldwebroot/htdocs/upload/assets/jobbags/rbjbCost"
"in"
"use"
"sept"
"2010.zip"
Does anyone know how I can get it to keep these filenames with spaces in in a single line with the whole lot wrapped in one set of quotes?
Thanks!
The correct way to loop over a set of files located via find is with a while read construct, thus:
while IFS= read -r -d '' line ; do
echo "$line" >> set1.txt
while IFS= read -r -d '' file ; do
printf '"%s"\n' "$file" >> set2.txt
done < <(find "$line" -type f -name \*.zip -mtime +2 -print0)
done < <(find /home -type d -name assets -print0)
For clarity I have given the inner loop variable a different name.
If you didn't have bash you'd have to issue the find command separately and redirect the output to a file, then read the file with while read ; do .. done < filename.
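A sketch of that portable fallback (plain sh; note it cannot use NUL separators, so it tolerates spaces in names but not newlines):
# Two passes via a temporary file instead of < <(...).
find /home -type d -name assets > dirs.txt
while IFS= read -r line
do
    echo "$line" >> set1.txt
done < dirs.txt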
Note that each expansion of each variable is double-quoted. This is necessary.
Note also, however, that for what you want you can simply use the -printf switch to find, if you have GNU find.
find /home -type f -path '*/assets/*.zip' -mtime +2 -printf '"%p"\n' > set2.txt
Although, as @sarnold notes, this is not safe.
You should probably be executing your tar(1) command through some other mechanism; the find(1) program supports a -print0 option to request ASCII NUL-separated filename output, and the xargs(1) program supports a -0 option to tell it that the input is separated by ASCII NUL characters. (Since NUL is the only character that is not allowed in filenames, this is the only way to get reliable filename handling.)
Simply using the -print0 and -0 options will help but this still leaves the script open to another problem -- xargs(1) might decide to execute the tar(1) command two, three, or more times, depending upon its input. The last execution is the one that will "win", and the data from earlier invocations will be lost for ever. (This is useless as a backup.)
So you should also look into adding the --concatenate command line option to tar(1), too, so that it will add to the archive. It might make sense to perform the compression after all the files have been added, via gzip(1) or bzip2(1). (This does mean you need to remove the archive before a "fresh run" of this script.)
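Alternatively, GNU tar can read a NUL-separated file list from stdin itself, which avoids xargs entirely and guarantees a single tar invocation. A sketch, assuming GNU find and tar:
# --null -T - : read NUL-terminated filenames from stdin, archive once.
find /home -type f -path '*/assets/*.zip' -mtime +2 -print0 |
    tar --null -T - -czf zips.tar.gz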