unix bash find file directories with 2 explicit file extensions - linux

I am trying to write a small bash script that looks through a directory containing hundreds of subdirectories. SOME of these subdirectories contain a textfile.txt and an htmlfile.html, where the names textfile and htmlfile vary.
I only care about subdirectories that have both a .txt and a .html file; all other subdirectories can be ignored.
I then want to list all the .html and .txt files that are in the same subdirectory.
This seems like a pretty simple problem, but I am at a loss. All I can get working is a line of code that outputs files that are either .html or .txt, with no association to the subdirectory they are in. I am pretty new to bash scripting, so I can't get any further.
#!/bin/bash
files="$(find ~/file/ -type f -name '*.txt' -or -name '*.html')"
for file in $files
do
echo $file
done

The following find command checks every subdirectory and, if it has both html and txt files, lists all of them:
find . -type d -exec env d={} bash -c 'ls "$d"/*.html &>/dev/null && ls "$d"/*.txt &>/dev/null && ls "$d/"*.{html,txt}' \;
Explanation:
find . -type d
This looks for all subdirectories of the current directory.
-exec env d={} bash -c '...' \;
This sets the environment variable d to the value of the found subdirectory and then executes the bash command that is contained within the single quotes (see below).
ls "$d"/*.html &>/dev/null && ls "$d"/*.txt &>/dev/null && ls "$d/"*.{html,txt}
This is the bash command that is executed. It consists of three statements and-ed together. The first checks to see if directory d has any html files. If so, the second statement runs and it checks to see if there are any txt files. If so, the last statement is executed and it lists all html and txt files in the directory d.
This command is safe for all file and directory names containing spaces, tabs, or other difficult characters.
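The same check can also be sketched in plain bash, without starting one shell per directory. This is an alternative to the find command above, not a restatement of it; the function name list_pairs is made up for illustration, and the sketch assumes bash with process substitution and a find that supports -print0.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for every directory under "$1" that holds at least one
# .html file and at least one .txt file, print all of those files.
list_pairs() (
    shopt -s nullglob               # unmatched globs expand to nothing
    # -print0 with read -d '' copes with any character in directory names
    while IFS= read -r -d '' d; do
        html=("$d"/*.html)
        txt=("$d"/*.txt)
        if (( ${#html[@]} > 0 && ${#txt[@]} > 0 )); then
            printf '%s\n' "${html[@]}" "${txt[@]}"
        fi
    done < <(find "$1" -type d -print0)
)
```

Calling list_pairs ~/file would print the matching files one per line. The ( ) function body runs in a subshell, so the nullglob setting does not affect the calling shell.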

You could do it by searching recursively with the globstar option:
shopt -s globstar
for file in **; do
if [[ -d $file ]]; then
for sub_file in "$file"/*; do
case "$sub_file" in
*.html)
html=1;;
*.txt)
txt=1;;
esac
done
[[ $html && $txt ]] && echo "$file"
html=""
txt=""
fi
done

You can make use of -o, grouped with escaped parentheses so that -type f applies to both names:
#!/bin/bash
find ~/file/ -type f \( -name '*.txt' -o -name '*.html' \) |
while IFS= read -r file
do
    echo "$file"
done

#!/bin/bash
#A quick peek into a dir to see if there's at least one file that matches pattern
dir_has_file() { dir="$1"; pattern="$2";
[ -n "$(find "$dir" -maxdepth 1 -type f -name "$pattern" -print -quit)" ]
}
#Assumes there are no newline characters in the filenames, but will behave correctly with subdirectories that match *.html or *.txt
find "$1" -type d |
while IFS= read -r d
do
dir_has_file "$d" '*.txt' &&
dir_has_file "$d" '*.html' &&
#Now print all the matching files
find "$d" -maxdepth 1 -type f \( -name '*.txt' -o -name '*.html' \)
done
This script takes the root directory to look into as the first argument ($1).

The test command is what you need to check for the existence of each file in each of the subdirs:
find . -type d -exec sh -c "if test -f {}/$file1 -a -f {}/$file2 ; then ls {}/*.{txt,html} ; fi" \;
where $file1 and $file2 are the two .txt and .html files you are looking for.

Related

Bash Globbing Pattern Matching for Imagemagick recursive convert to pdf

I have the following 2 scripts that recursively convert folders of images to PDFs for my wife's Japanese manga Kindle, using find and ImageMagick convert:
#!/bin/bash
_d="$(pwd)"
echo "$_d"
find . -type d -exec echo "Will convert in the following order: {}" \;
find . -type d -exec echo "Converting: '{}'" \; -exec convert '{}/*.jpg' "$_d/{}.pdf" \;
and the same for PNG
#!/bin/bash
_d="$(pwd)"
echo "$_d"
find . -type d -exec echo "Will convert in the following order: {}" \;
find . -type d -exec echo "Converting: '{}'" \; -exec convert '{}/*.png' "$_d/{}.pdf" \;
Unfortunately I am not able to make one universal script that works for all image formats.
How do I make one script that works for both?
I would also need it to handle JPG and PNG as well as jpeg and JPEG.
Thanks in advance.
I wouldn't use find at all, just a loop:
#!/usr/bin/env bash
# enable recursive globs
shopt -s globstar
for dir in **/*/; do
printf "Converting jpgs in %s\n" "$dir"
convert "$dir"/*.jpg "$dir/out.pdf"
done
If you want to combine .jpg and .JPG in the same pdf, add nocaseglob to the shopt line. To add .jpeg to the mix, also add extglob and change "$dir"/*.jpg to "$dir"/*.@(jpg|jpeg)
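To see how nocaseglob and extglob combine, here is a small sketch that only lists the files the pattern would pick up, without calling convert. The helper name list_images is hypothetical, and nullglob is added here as an assumption so that a directory with no images yields an empty list.

```shell
#!/usr/bin/env bash
# extglob must be enabled before bash parses any line that uses @( ... )
shopt -s nullglob nocaseglob extglob

# Hypothetical helper: print the image files directly inside directory "$1".
list_images() {
    local dir=${1%/}                # drop a trailing slash, if any
    local -a imgs
    # @(a|b|c) matches exactly one alternative; nocaseglob ignores case
    imgs=("$dir"/*.@(jpg|jpeg|png))
    if (( ${#imgs[@]} > 0 )); then
        printf '%s\n' "${imgs[@]}"
    fi
}
```

Running list_images on a folder is a cheap way to check which files a later convert call would receive.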
You can do more complicated actions if you turn the find exec into a bash function (or even a standalone script).
#!/bin/bash
do_convert()(
    shopt -s nullglob
    for dir in "$@"; do
        files=("$dir"/*.{jpg,JPG,png,PNG,jpeg,JPEG})
        if [[ -z $files ]]; then
            echo 1>&2 "no suitable files in $dir"
            continue
        fi
        echo "Converting $dir"
        convert "${files[@]}" "$dir.pdf"
    done
)
export -f do_convert
pwd
echo "Will convert in the following order:"
find . -type d
# find . -type d -exec bash -c 'do_convert {}' \;
find . -type d -exec bash -c 'do_convert "$@"' -- {} \+
nullglob makes *.xyz return nothing if there is no match, instead of returning the original string unchanged
p/*.{a,b,c} expands into p/*.a p/*.b p/*.c before the * are expanded
x()(...) instead of the more normal x(){...} uses a subshell so we don't have to remember to unset nullglob again or clean up any variable definitions
export -f x makes function x available in subshells
we skip conversion if there are no suitable files
with the slightly more complicated find command, we can reduce the number of invocations of bash (probably doesn't save a great deal in this particular case)
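The first three points can be checked with a short experiment. This is only a sketch: the scratch directory, the file names, and the function name count_matches are invented for the demonstration.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)                    # scratch directory for the demonstration
touch "$tmp/a.jpg" "$tmp/b.png"

# Braces expand first, giving three patterns: *.jpg, *.gif and *.png.
# Without nullglob, the unmatched *.gif pattern is kept as literal text.
shopt -u nullglob
without=("$tmp"/*.{jpg,gif,png})
echo "without nullglob: ${#without[@]} words"

# A function with a ( ) body runs in a subshell, so its shopt does not leak.
count_matches() (
    shopt -s nullglob
    files=("$1"/*.{jpg,gif,png})
    echo "${#files[@]}"
)
echo "with nullglob: $(count_matches "$tmp") words"

# Call it directly (not inside $(...)) to show that the setting stays local.
count_matches "$tmp" >/dev/null
shopt -q nullglob && leaked=yes || leaked=no
echo "nullglob still set afterwards: $leaked"
rm -rf "$tmp"
```

The literal "*.gif" word in the first array is exactly what would be handed to convert as a bogus file name if nullglob were left off.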
how about a one-liner
dry-run
find -name \*.jpg -or -name \*.png | xargs -I xxx echo "xxx =>" xxx.pdf
run
find -name \*.jpg -or -name \*.png | xargs -I xxx convert xxx xxx.pdf
help
-name matches the file name
-or is a logical OR, so both jpg and png files match
xargs maps each input line to a name and runs a command on it
-I selects that placeholder name; it works like {} in find
NOTE
instead of $(pwd) which is a command substitution you can use variable $PWD
xxx maps into a name and xxx.pdf still has the matched extension found by find. which means filename.png becomes filename.png.pdf. If this is not desired, you can sed it
to run convert command in parallel you can use -P 0 with xargs -- see xargs --help
With sed to remove extensions
dry-run
find -name \*.jpg -or -name \*.png | sed 's/\.\(png\|jpg\)$//g' | xargs -I xxx echo "xxx =>" xxx.pdf
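The extension can also be removed inside bash itself with parameter expansion, which avoids the sed step. A minimal sketch, assuming every input name really has an extension (as it does when it comes from the find patterns above); the function name pdf_name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map an image path to a PDF path with the same stem.
pdf_name() {
    local src=$1
    # ${src%.*} removes the shortest suffix that starts with a dot
    printf '%s.pdf\n' "${src%.*}"
}

pdf_name "./vol1/page01.png"        # prints ./vol1/page01.pdf
```

Inside a loop over find results this could be used as convert "$f" "$(pdf_name "$f")", which, like the one-liner, produces one PDF per image rather than one per folder.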
@shawn Your solution works; as I stated in the comments, I am just too stupid to name the resulting pdf properly (after the folder name) and save it in the script caller's directory. Nevertheless, it solves my case-insensitive jpg, jpeg, png problem just fine.
Here is shawn's solution:
#!/bin/bash
# enable recursive globs
shopt -s globstar nocaseglob extglob
for dir in **/*/; do
    printf "Converting (jpg|jpeg|png) in %s\n" "$dir"
    convert "$dir"/*.@(jpg|jpeg|png) "$dir/out.pdf"
done
@jhnc Your solution works out of the box. It does exactly what I intended, and I really like calling functions, or even standalone scripts, to handle more complexity. One drawback is that I cannot Ctrl-C the process; is that because it is threaded, or because it runs in a subshell? I think you were missing an exit statement at the end of the function, as it never stopped.
#!/bin/bash
do_convert()(
    shopt -s nullglob
    for dir in "$@"; do
        files=("$dir"/*.{jpg,JPG,png,PNG,jpeg,JPEG})
        if [[ -z $files ]]; then
            echo 1>&2 "no suitable files in $dir"
            continue
        fi
        echo "Converting $dir"
        convert "${files[@]}" "$dir.pdf"
    done
    exit
)
export -f do_convert
pwd
echo "Will convert in the following order:"
find . -type d
# find . -type d -exec bash -c 'do_convert {}' \;
find . -type d -exec bash -c 'do_convert "$@"' -- {} \+
@everyone else: it's already after midnight again. I guess this is a trivial question for you guys, and I am very grateful for ALL your answers; I didn't have the time to try everything.
I find Linux bash very challenging.
A lot of ways to skin this cat. My thought is:
find . -type f -print | while IFS= read -r F
do
    # -b prints only the MIME type, without the "filename:" prefix
    TYPE=$(file -b --mime-type "$F")
    if [ "$TYPE" = image/png ]
    then
        : # do png conversion here
    elif [ "$TYPE" = image/jpeg ]
    then
        : # do jpeg conversion here
    fi
done

Create duplicate file and rename it

I want duplicates of the files with different name.
I am currently trying out these commands before putting them into my bash script.
$ set dir = /somewhere/states
$ find $dir -name "total.txt" -type f | xargs ls -1
/somewhere/states/florida/fixed.fl_Asite_ttl/somewhere/total.txt
/somewhere/states/hawaii/fixed.hi_Bsite_ttl/somewhere/total.txt
/somewhere/states/kentucky/fixed.ky_Asite_ttl/somewhere/total.txt
/somewhere/states/michigan/fixed.mi_Csite_ttl/somewhere/total.txt
/somewhere/states/texas/fixed.tx_Vsite_ttl/somewhere/total.txt
I know I can rename file using something like this, but it isn't exactly what I want:
$ find $dir -name "total.txt" -exec sh -c 'cp {} `dirname {}`/`basename {} `why.xls' \;
/somewhere/states/florida/fixed.fl_Asite_ttl/somewhere/total.txtwhy.xls
/somewhere/states/hawaii/fixed.hi_Bsite_ttl/somewhere/total.txtwhy.xls
/somewhere/states/kentucky/fixed.ky_Asite_ttl/somewhere/total.txtwhy.xls
/somewhere/states/michigan/fixed.mi_Csite_ttl/somewhere/total.txtwhy.xls
/somewhere/states/texas/fixed.tx_Vsite_ttl/somewhere/total.txtwhy.xls
May I know how to copy the files and have the new files in the same dir?
below are the examples.
I want to name the new files as everything behind "fixed." and before "/somewhere" and changing the file extension as well
/somewhere/states/florida/fixed.fl_Asite_ttl/somewhere/fl_Asite_ttl.xls
/somewhere/states/hawaii/fixed.hi_Bsite_ttl/somewhere/hi_Bsite_ttl.xls
/somewhere/states/kentucky/fixed.ky_Asite_ttl/somewhere/ky_Asite_ttl.xls
/somewhere/states/michigan/fixed.mi_Csite_ttl/somewhere/mi_Csite_ttl.xls
/somewhere/states/texas/fixed.tx_Vsite_ttl/somewhere/tx_Vsite_ttl.xls
Update:
/somewhere/states/florida_fixed_ttl/fixed.fl_Asite_ttl/somewhere/total.txt
Probably not the most elegant but this should work:
find . -name total.txt | while IFS= read -r F ; do [[ $F =~ /fixed\.([^/]+)/ ]] || continue ; N=${BASH_REMATCH[1]} ; echo "cp $F $(dirname $F)/$N.xls" ; done
If you are happy with the output just remove the last echo, i.e. change this:
echo "cp $F $(dirname $F)/$N.xls"
to this:
cp "$F" "$(dirname "$F")/$N.xls"
Note, if the .txt and .xls contents will always remain the same you can use ln instead of cp -- one file, two names.
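The name extraction can be isolated in a small function, which makes it easy to try on sample paths before copying anything. This is a sketch under the assumption that the wanted name is the directory component that starts with "fixed."; the function name xls_target is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the .xls path for one total.txt path, or
# return 1 if the path has no "fixed.<name>" directory component.
xls_target() {
    local src=$1
    # The slashes anchor the match to a whole path component, so a parent
    # directory such as florida_fixed_ttl is not picked up by mistake.
    local re='/fixed\.([^/]+)/'
    if [[ $src =~ $re ]]; then
        printf '%s/%s.xls\n' "$(dirname "$src")" "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

xls_target "/somewhere/states/florida_fixed_ttl/fixed.fl_Asite_ttl/somewhere/total.txt"
```

For the path from the update, this prints a target ending in somewhere/fl_Asite_ttl.xls, in the same directory as total.txt.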

How to rename directory and subdirectories recursively in linux?

Let's say I have 200 directories, each with a subdirectory hierarchy of varying depth. How can I rename each directory and its subdirectories using the mv command with find, or some other combination?
for dir in ./*/; do (i=1; cd "$dir" && for dir in ./*; do printf -v dest %s_%02d "$dir" "$((i++))"; echo mv "$dir" "$dest"; done); done
This handles 2 levels of subdirectories. Is there a cleaner way to do it for a deeper hierarchy? Any other one-line command suggestions or solutions are welcome.
I had a specific task - to replace non-ASCII symbols and square brackets, in directories and in files as well. It works fine.
First, exactly my case, as a working example:
find . -depth -execdir rename -v 's/([^\x00-\x7F]+)|([\[\]]+)/\_/g' {} \;
or separately non-ascii and brackets:
find . -depth -execdir rename -v 's/[^\x00-\x7F]+/\_/g' {} \;
find . -depth -execdir rename -v 's/[\[\]]+/\_/g' {} \;
If we'd like to work only with directories, add -type d (after the -depth option)
Now, in more generalized view:
find . -depth [-type d] [-type f] -execdir rename [-v] 's/.../.../g' '{}' \;
Here we can control dirs/files and verbosity. Quotes around {} may be needed or not on your machine (backslash before ; serves the same, may be replaced with quotes)
You have two options when you want to do recursive operations in files/directories:
Option 1 : Find
while IFS= read -r -d '' subd;do
#do your stuff here with var $subd
done < <(find . -type d -print0)
In this case we use find to return only dirs using -type d
We can ask find to return only files using -type f or not to specify any type and both directories and files will be returned.
We also use find option -print0 to force null separation of the find results and thus to ensure correct names handling in case names include special chars like spaces, etc.
Testing:
$ while IFS= read -r -d '' s;do echo "$s";done < <(find . -type d -print0)
.
./dir1
./dir1/sub1
./dir1/sub1/subsub1
./dir1/sub1/subsub1/subsubsub1
./dir2
./dir2/sub2
Option 2 : Using Bash globstar option
shopt -s globstar
for subd in **/ ; do
#Do your stuff here with $subd directories
done
In this case, the for loop matches all subdirectories under the current working directory (the **/ pattern).
You can also ask bash to return both files and folders using
for sub in ** ;do
    if [[ -d "$sub" ]];then
        #actions for folders
        :
    elif [[ -e "$sub" ]];then
        #actions for files
        :
    else
        #do something else
        :
    fi
done
Folders Test:
$ shopt -s globstar
$ for i in **/ ;do echo "$i";done
dir1/
dir1/sub1/
dir1/sub1/subsub1/
dir1/sub1/subsub1/subsubsub1/
dir2/
dir2/sub2/
In your small script, just enabling shopt -s globstar and changing your for loop to for dir in **/;do seems to work as you expect.

Recursively prepend text to file names

I want to prepend text to the name of every file of a certain type - in this case .txt files - located in the current directory or a sub-directory.
I have tried:
find -L . -type f -name "*.txt" -exec mv "{}" "PrependedTextHere{}" \;
The problem with this is dealing with the ./ part of the path that comes with the {} reference.
Any help or alternative approaches appreciated.
You can do something like this
find -L . -type f -name "*.txt" -exec bash -c 'echo "$0" "${0%/*}/PrependedTextHere${0##*/}"' {} \;
Where
bash -c '...' executes the command
$0 is the first argument passed in, in this case {} -- the full filename
${0%/*} removes everything including and after the last / in the filename
${0##*/} removes everything before and including the last / in the filename
Replace the echo with a mv once you're satisfied it's working.
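The two expansions can be tried on their own, without find. A minimal sketch with a made-up function name, prepend_name, assuming the path contains at least one / (which is always true for find output that starts at .):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "$2" with the prefix "$1" inserted before the
# file name. Assumes the path contains at least one slash.
prepend_name() {
    local prefix=$1 path=$2
    # ${path%/*}  keeps the directory part (drops the last / and what follows)
    # ${path##*/} keeps the file name (drops everything up to the last /)
    printf '%s/%s%s\n' "${path%/*}" "$prefix" "${path##*/}"
}

prepend_name "PrependedTextHere" "./docs/sub dir/notes.txt"
# prints ./docs/sub dir/PrependedTextHerenotes.txt
```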
Are you just trying to move the files to a new file name that has Prepend before it?
for F in *.txt; do mv "$F" Prepend"$F"; done
Or do you want it to handle subdirectories and prepend between the directory and file name:
dir1/PrependA.txt
dir2/PrependB.txt
Here's a quick shot at it. Let me know if it helps.
for file in $(find -L . -type f -name "*.txt")
do
parent=$(echo $file | sed "s=\(.*/\).*=\1=")
name=$(echo $file | sed "s=.*/\(.*\)=\1=")
mv "$file" "${parent}PrependedTextHere${name}"
done
This ought to work, as long as file names do not contain newline characters. If they might, make find use -print0 and read the results with a null delimiter.
#!/bin/sh
IFS='
'
for I in $(find -L . -name '*.txt' -print); do
echo mv "$I" "${I%/*}/prepend-${I##*/}"
done
p.s. Remove the echo to make the script effective, it's there to avoid accidental breakage for people who randomly copy paste stuff from here to their shell.
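For names that may contain newlines, a null-delimited variant could look like the sketch below. It is bash rather than /bin/sh, the function name prepend_all and its optional dry-run argument are invented for illustration, and the file list is collected before any rename so that find never sees an already renamed file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prepend "$2" to the name of every .txt file under "$1".
# Pass "echo" as a third argument for a dry run that only prints the commands.
prepend_all() {
    local root=$1 prefix=$2 run=${3:-} f
    local -a files=()
    # Collect all names first; -print0 and read -d '' keep odd names intact.
    while IFS= read -r -d '' f; do
        files+=("$f")
    done < <(find -L "$root" -type f -name '*.txt' -print0)
    for f in "${files[@]}"; do
        # $run is deliberately unquoted: empty means "really run mv"
        $run mv -- "$f" "${f%/*}/${prefix}${f##*/}"
    done
}
```

A dry run would be prepend_all . prepend- echo, and dropping the last argument performs the renames.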

A bash script to run a program for directories that do not have a certain file

I need a bash script that runs a program for every directory that does not contain a specific file, and creates the output file in that same directory. The program needs an input file, which exists in every directory with a name of the form *.DNA.fasta. Suppose I have the following directories, which may also contain subdirectories:
dir1/a.protein.fasta
dir2/b.protein.fasta
dir3/anyfile
dir4/x.orf.fasta
I started by finding the directories that don't have that specific file, whose name is *.protein.fasta.
In this case I want dir3 and dir4 to be listed (since they do not contain *.protein.fasta).
I have tried this code:
find . -maxdepth 1 -type d \! -exec test -e '{}/*protein.fasta' \; -print
but it seems I missed something, because it does not work.
I also do not know how to proceed with the rest of the task.
This is a tricky one.
I can't think of a good solution. But here's a solution, nevertheless. Note that this is guaranteed not to work if your directory or file names contain newlines, and it's not guaranteed to work if they contain other special characters. (I've only tested with the samples in your question.)
Also, I haven't included a -maxdepth because you said you need to search subdirectories too.
#!/bin/bash
# Create an associative array
declare -A excludes
# Build an associative array of directories containing the file
while read line; do
excludes[$(dirname "$line")]=1
echo "excluded: $(dirname "$line")" >&2
done <<EOT
$(find . -name "*protein.fasta" -print)
EOT
# Walk through all directories, print only those not in array
find . -type d \
| while read line ; do
if [[ ! ${excludes[$line]} ]]; then
echo "$line"
fi
done
For me, this returns:
.
./dir3
./dir4
All of which are directories that do not contain a file matching *.protein.fasta. Of course, you can replace the last echo "$line" with whatever you need to do with these directories.
Alternately:
If what you're really looking for is just the list of top-level directories that do not contain the matching file in any subdirectory, the following bash one-liner may be sufficient:
for i in *; do test -d "$i" && ( find "$i" -name '*protein.fasta' | grep -q . || echo "$i" ); done
#!/bin/bash
for dir in *; do
    test -d "$dir" && ( find "$dir" -name '*protein.fasta' | grep -q . || Programfoo "$dir/$dir.DNA.fasta" );
done
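The attempt in the question fails because test -e receives the literal string '{}/*protein.fasta'; neither find nor test expands the glob. One way to package the working idea is a function that prints the top-level directories with no match anywhere beneath them. This is a sketch: the name dirs_without is hypothetical, and -quit is supported by GNU and BSD find but is not part of POSIX.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each immediate subdirectory of "$1" that has no
# file matching the pattern "$2" anywhere beneath it.
dirs_without() {
    local root=$1 pattern=$2 dir
    for dir in "$root"/*/; do
        [[ -d $dir ]] || continue   # an unmatched glob stays literal; skip it
        dir=${dir%/}
        # -print -quit stops at the first hit, so large trees are not
        # scanned in full once a match is known to exist
        if [[ -z $(find "$dir" -type f -name "$pattern" -print -quit) ]]; then
            printf '%s\n' "$dir"
        fi
    done
}
```

With the sample layout from the question, dirs_without . '*.protein.fasta' would list ./dir3 and ./dir4; each printed directory can then be passed to the program, shown above as the placeholder Programfoo.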

Resources