A bash script to run a program for directories that do not have a certain file - linux

I need a Bash script that runs a program for every directory that does not contain a specific file, and creates the output file in that same directory. The program needs an input file, which exists in every directory under a name matching *.DNA.fasta. Suppose I have the following directories, which may also contain subdirectories:
dir1/a.protein.fasta
dir2/b.protein.fasta
dir3/anyfile
dir4/x.orf.fasta
I started by finding the directories that don't have that specific file, whose name matches *.protein.fasta.
In this case I want dir3 and dir4 to be listed (since they do not contain *.protein.fasta).
I have tried this code:
find . -maxdepth 1 -type d \! -exec test -e '{}/*protein.fasta' \; -print
but it does not work; I seem to have missed something.
I also do not know how to proceed with the rest of the task.

This is a tricky one.
I can't think of a good solution. But here's a solution, nevertheless. Note that this is guaranteed not to work if your directory or file names contain newlines, and it's not guaranteed to work if they contain other special characters. (I've only tested with the samples in your question.)
Also, I haven't included a -maxdepth because you said you need to search subdirectories too.
#!/bin/bash
# Create an associative array
declare -A excludes
# Build an associative array of directories containing the file
while IFS= read -r line; do
excludes[$(dirname "$line")]=1
echo "excluded: $(dirname "$line")" >&2
done < <(find . -name "*protein.fasta" -print)
# Walk through all directories, print only those not in array
find . -type d \
| while read line ; do
if [[ ! ${excludes[$line]} ]]; then
echo "$line"
fi
done
For me, this returns:
.
./dir3
./dir4
All of which are directories that do not contain a file matching *.protein.fasta. Of course, you can replace the last echo "$line" with whatever you need to do with these directories.
Alternately:
If what you're really looking for is just the list of top-level directories that do not contain the matching file in any subdirectory, the following bash one-liner may be sufficient:
for i in *; do test -d "$i" && ( find "$i" -name '*protein.fasta' | grep -q . || echo "$i" ); done

#!/bin/bash
for dir in *; do
test -d "$dir" && ( find "$dir" -name '*protein.fasta' | grep -q . || Programfoo "$dir/$dir.DNA.fasta" )
done
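Putting the pieces together, here is a minimal sketch of the whole task. It follows the last approach above: it looks at top-level directories only and searches each one recursively for *.protein.fasta. The function names has_match, list_dirs_without and run_for_missing are made up for this example. The question does not say how the program writes its output, so the sketch assumes it prints to standard output, and it assumes the output should be named after the input with .DNA.fasta replaced by .protein.fasta.

```bash
#!/usr/bin/env bash

# has_match DIR PATTERN: succeed if any regular file below DIR matches PATTERN.
has_match() {
    find "$1" -type f -name "$2" | grep -q .
}

# list_dirs_without PATTERN [ROOT]: print each immediate subdirectory of ROOT
# that contains no file matching PATTERN at any depth.
list_dirs_without() {
    local pattern="$1" root="${2:-.}" dir
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue      # the glob matched nothing
        dir="${dir%/}"                 # drop the trailing slash
        has_match "$dir" "$pattern" || printf '%s\n' "$dir"
    done
}

# run_for_missing PROGRAM [ROOT]: for every directory without *.protein.fasta,
# run PROGRAM on each *.DNA.fasta file directly inside it.
# ASSUMPTION: PROGRAM writes its result to stdout; the output name is a guess.
run_for_missing() {
    local prog="$1" root="${2:-.}" dir input
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue
        dir="${dir%/}"
        has_match "$dir" '*.protein.fasta' && continue
        for input in "$dir"/*.DNA.fasta; do
            [ -f "$input" ] || continue
            "$prog" "$input" > "${input%.DNA.fasta}.protein.fasta"
        done
    done
}
```

With the sample layout from the question, list_dirs_without '*.protein.fasta' . should print ./dir3 and ./dir4, and run_for_missing Programfoo . would then run Programfoo in those two directories. Because it uses shell globs rather than parsing find output line by line, it should cope with spaces in directory names.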


Rename all files in multiple folders with some condition in single linux command or script.

I have multiple folders, each containing multiple files. I need to rename each file to the name of the folder it is stored in, followed by a "_partN" suffix.
For example,
I have a folder named "new_folder_for_upload" which has 2 files. I need to rename these 2 files to
new_folder_for_upload_part1
new_folder_for_upload_part2
I have many folders like this one, each with multiple files, and I need to rename all the files as described above.
Can anybody help me find a single linux command or script that does this automatically?
Assuming the bash shell, and assuming you want the file numbering to restart for each subdirectory. Note that this moves all files to the top directory (leaving empty subdirectories behind). Formatted as a script for easier reading:
find . -type f -print0 | while IFS= read -r -d '' file
do
myfile=${file#./}
mydir=$(dirname "$myfile")
if [[ $mydir != "$lastdir" ]]
then
NR=1
fi
lastdir=${mydir}
mv "$myfile" "$(dirname "$myfile")_part${NR}"
((NR++))
done
Or as one-line command:
find . -type f -print0 | while IFS= read -r -d '' file; do myfile=${file#./}; mydir=$(dirname "$myfile"); if [[ $mydir != "$lastdir" ]]; then NR=1; fi; lastdir=${mydir}; mv "$myfile" "$(dirname "$myfile")_part${NR}"; ((NR++)); done
Beware. This is armed, and will do a bulk renaming / moving of every file in or below your current work directory. Use at your own risk.
To delete the empty subdirs:
find . -depth -empty -type d -delete
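Since the command above is armed, a dry run is a useful safety net. The following sketch is a variant, not the same behaviour: it assumes you want the files renamed in place inside their folders (one level deep) rather than moved up, and it only prints the planned renames unless you pass --apply. The function name rename_parts and the --apply flag are invented for this example.

```bash
#!/usr/bin/env bash

# rename_parts ROOT [--apply]
# For each immediate subdirectory of ROOT, number its regular files as
# <dirname>_part1, <dirname>_part2, ... in glob (sorted) order.
# Without --apply the planned renames are only printed.
rename_parts() {
    local root="${1%/}" mode="${2:-}" dir name n file target
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue          # the glob matched nothing
        dir="${dir%/}"
        name="${dir##*/}"                  # folder name without its path
        n=1                                # numbering restarts per folder
        for file in "$dir"/*; do
            [ -f "$file" ] || continue     # skip subdirectories etc.
            target="$dir/${name}_part${n}"
            if [ "$mode" = "--apply" ]; then
                mv -n -- "$file" "$target" # -n: do not overwrite an existing file
            else
                printf '%s -> %s\n' "$file" "$target"
            fi
            n=$((n + 1))
        done
    done
}
```

Run rename_parts . first and check the printed list, then rename_parts . --apply once it looks right. As in the question's example, file extensions are dropped.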

How to remove all but a few selected files in a directory?

I want to remove all files in a directory except some through a shell script. The name of files will be passed as command line argument and number of arguments may vary.
Suppose the directory has these 5 files:
1.txt, 2.txt, 3.txt, 4.txt, 5.txt
I want to remove two files from it through a shell script using file name. Also, the number of files may vary.
There are several ways this could be done, but the one that's most robust and highest performance with large directories is probably to construct a find command.
#!/usr/bin/env bash
# first argument is the directory name to search in
dir=$1; shift
# subsequent arguments are filenames to absolve from deletion
find_args=( )
for name; do
find_args+=( -name "$name" -prune -o )
done
if [[ $dry_run ]]; then
exec find "$dir" -mindepth 1 -maxdepth 1 "${find_args[@]}" -print
else
exec find "$dir" -mindepth 1 -maxdepth 1 "${find_args[@]}" -exec rm -f -- '{}' +
fi
Thereafter, to list files which would be deleted (if the above is in a script named delete-except):
dry_run=1 delete-except /path/to/dir 1.txt 2.txt
or, to actually delete those files:
delete-except /path/to/dir 1.txt 2.txt
A simple, straightforward way could be using the GLOBIGNORE variable.
GLOBIGNORE is a colon-separated list of patterns defining the set of filenames to be ignored by pathname expansion. If a filename matched by a pathname expansion pattern also matches one of the patterns in GLOBIGNORE, it is removed from the list of matches.
Thus, the solution is to iterate through the command line args, appending file names to the list. Then call rm *. Don't forget to unset GLOBIGNORE var at the end.
#!/bin/bash
for arg in "$@"
do
if [ "$arg" = "$1" ]
then
GLOBIGNORE=$arg
else
GLOBIGNORE=${GLOBIGNORE}:$arg
fi
done
rm *
unset GLOBIGNORE
*In case you had set GLOBIGNORE before, you can just store the val in a tmp var then reset it at the end.
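An alternative to saving and restoring GLOBIGNORE by hand is to do the work in a subshell, so the variable never leaks into the calling shell. Below is a minimal sketch of that idea; the function name remove_except is made up for this example. Two things to keep in mind: the names are treated as patterns and joined with colons, so this is not suitable for names that contain a colon, and setting GLOBIGNORE makes * match dotfiles too, which is why the sketch adds .* to the ignore list.

```bash
#!/usr/bin/env bash

# remove_except DIR KEEP...
# Delete every regular, non-hidden file directly inside DIR except those
# named in KEEP. The body is wrapped in ( ) so it runs in a subshell:
# cd, IFS and GLOBIGNORE are all discarded when the function returns.
remove_except() (
    cd -- "$1" || exit 1
    shift
    if [ "$#" -eq 0 ]; then
        echo "remove_except: no names to keep; refusing to delete everything" >&2
        exit 1
    fi
    IFS=:
    # "$*" joins the arguments with the first character of IFS (a colon).
    # ".*" is appended because a non-empty GLOBIGNORE enables dotglob-like
    # matching; ignoring ".*" keeps hidden files out of the glob.
    GLOBIGNORE="$*:.*"
    for f in *; do
        if [ -f "$f" ]; then
            rm -f -- "$f"
        fi
    done
)
```

For the example directory, remove_except /path/to/dir 1.txt 2.txt should leave only 1.txt and 2.txt (plus any hidden files) behind.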
We can accomplish this in pure Bash, without the need for any external tools:
#!/usr/bin/env bash
# build an associative array that contains all the filenames to be preserved
declare -A skip_list
for f in "$@"; do
skip_list[$f]=1
done
# walk through all files and build an array of files to be deleted
declare -a rm_list
for f in *; do # loop through all files
[[ -f "$f" ]] || continue # not a regular file
[[ "${skip_list[$f]}" ]] && continue # skip this file
rm_list+=("$f") # now it qualifies for rm
done
# remove the files
printf '%s\0' "${rm_list[@]}" | xargs -0 rm -- # Thanks to Charles' suggestion
This solution will also work for files that have whitespaces or glob characters in them.
Thanks, all, for your answers. I have figured out my own solution; this is what worked for me:
find /home/mydir -type f | grep -vw "goo" | xargs rm

using IF to see a directory exists if not do something

I am trying to move the directories from $DIR1 to $DIR2 if $DIR2 does not already have a directory with the same name. This is what I currently have:
if [[ ! $(ls -d /$DIR2/* | grep test) ]]
then
mv $DIR1/test* /$DIR2
fi
First, when $DIR2 is empty it prints
ls: cannot access //data/lims/PROCESSING/*: No such file or directory
although it still works.
Second, when I run the shell script twice, it does not let me move directories with similar names.
For example, in $DIR1 I have test-1, test-2 and test-3. When the script runs for the first time, all three directories are moved to $DIR2.
After that I do mkdir test-4 in $DIR1 and run the script again.
It does not move test-4, because my check thinks test-4 is already there, since I am grepping for anything containing test.
How can I work around this and move test-4?
Firstly, you can check whether or not a directory exists using the -d test, which is true if the path exists and is a directory:
test="/some/path/maybe"
if [ -d "$test" ]; then
echo "$test is a directory"
fi
However, you want to test if something is not a directory. You've shown in your code that you already know how to negate the expression:
test="/some/path/maybe"
if [ ! -d "$test" ]; then
echo "$test is NOT a directory"
fi
You also seem to be using ls to get a list of files. Perhaps you want to loop over them and do something if the files are not a directory?
dir="/some/path/maybe"
for test in "$dir"/*;
do
if [ ! -d "$test" ]; then
echo "$test is NOT a directory."
fi
done
A good place to look for bash stuff like this is Machtelt Garrels' guide. His page on the various expressions you can use in if statements helped me a lot.
Moving directories from a source to a destination if they don't already exist in the destination:
For the sake of readability I'm going to refer to your DIR1 and DIR2 as src and dest. First, let's declare them:
src="/place/dir1/"
dest="/place/dir2/"
Note the trailing slashes. We'll append the names of folders to these paths so the trailing slashes make that simpler. You also seem to be limiting the directories you want to move by whether or not they have the word test in their name:
filter="test"
So, let's first loop through the directories in source that pass the filter; if they don't exist in dest let's move them there:
for path in "$src"*"$filter"*/; do
[ -d "$path" ] || continue
dir=$(basename "$path")
if [ ! -d "$dest$dir" ]; then
mv "$src$dir" "$dest"
fi
done
I hope that solves your issue. But be warned, @gniourf_gniourf posted a link in the comments that should be heeded!
If you need to mv some directories to another location according to some pattern, then you can use find:
find . -type d -name "test*" -exec mv -t /tmp/target {} +
Details:
-type d - will search only for directories
-name "" - set search pattern
-exec - do something with find results
-t, --target-directory=DIRECTORY move all SOURCE arguments into DIRECTORY
There are many examples of exec or xargs usage.
And if you do not want to overwrite files, then add the -n option to the mv command:
find . -type d -name "test*" -exec mv -n -t /tmp/target {} +
-n, --no-clobber do not overwrite an existing file
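For the original problem (move test-4 even though test-1 to test-3 are already in the destination), the check has to be made per directory rather than once for the whole batch. Here is a minimal sketch of that, using only shell globs; the function name move_missing_dirs and its argument order are invented for this example, and it assumes the destination directory already exists.

```bash
#!/usr/bin/env bash

# move_missing_dirs SRC DEST PREFIX
# Move each directory SRC/PREFIX* into DEST, but only if DEST does not
# already contain an entry with the same name.
move_missing_dirs() {
    local src="${1%/}" dest="${2%/}" prefix="$3" dir name
    for dir in "$src"/"$prefix"*/; do
        [ -d "$dir" ] || continue          # the glob matched nothing
        dir="${dir%/}"
        name="${dir##*/}"
        if [ -e "$dest/$name" ]; then
            printf 'skipping %s: already present in %s\n' "$name" "$dest" >&2
        else
            mv -- "$dir" "$dest/"
        fi
    done
}
```

Called as move_missing_dirs "$DIR1" "$DIR2" test, a second run after mkdir test-4 should move only test-4, because each name is checked on its own.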

unix bash find file directories with 2 explicit file extensions

I am trying to create a small bash script that looks through a directory containing hundreds of subdirectories. SOME of these subdirectories include a textfile.txt and an htmlfile.html, where the names textfile and htmlfile are variable.
I only care about subdirectories that have both a .txt and a .html file; all other subdirectories can be ignored.
I then want to list all the .html files and .txt files that are in the same subdirectory.
This seems like a pretty simple issue to solve, but I am at a loss. All I can get working is a line of code that outputs every .html or .txt file, with no association with the subdirectory it is in, and I am pretty new to bash scripting so I can't get any further:
#!/bin/bash
files="$(find ~/file/ -type f -name '*.txt' -or -name '*.html')"
for file in $files
do
echo $file
done
The following find command checks every subdirectory and, if it has both html and txt files, lists all of them:
find . -type d -exec env d={} bash -c 'ls "$d"/*.html &>/dev/null && ls "$d"/*.txt &>/dev/null && ls "$d/"*.{html,txt}' \;
Explanation:
find . -type d
This looks for all subdirectories of the current directory.
-exec env d={} bash -c '...' \;
This sets the environment variable d to the value of the found subdirectory and then executes the bash command that is contained within the single quotes (see below).
ls "$d"/*.html &>/dev/null && ls "$d"/*.txt &>/dev/null && ls "$d/"*.{html,txt}
This is the bash command that is executed. It consists of three statements and-ed together. The first checks to see if directory d has any html files. If so, the second statement runs and it checks to see if there are any txt files. If so, the last statement is executed and it lists all html and txt files in the directory d.
This command is safe for all file and directory names containing spaces, tabs, or other difficult characters.
You could do it by searching recursively with the globstar option:
shopt -s globstar
for file in **; do
if [[ -d $file ]]; then
for sub_file in "$file"/*; do
case "$sub_file" in
*.html)
html=1;;
*.txt)
txt=1;;
esac
done
[[ $html && $txt ]] && echo "$file"
html=""
txt=""
fi
done
You can make use of -o
#!/bin/bash
files=$(find ~/file/ -type f \( -name '*.txt' -o -name '*.html' \))
for file in $files
do
echo $file
done
#!/bin/bash
#A quick peek into a dir to see if there's at least one file that matches pattern
dir_has_file() { dir="$1"; pattern="$2";
[ -n "$(find "$dir" -maxdepth 1 -type f -name "$pattern" -print -quit)" ]
}
#Assumes there are no newline characters in the filenames, but will behave correctly with subdirectories that match *.html or *.txt
find "$1" -type d|\
while IFS= read -r d
do
dir_has_file "$d" '*.txt' &&
dir_has_file "$d" '*.html' &&
#Now print all the matching files
find "$d" -maxdepth 1 -type f \( -name '*.txt' -o -name '*.html' \)
done
This script takes the root directory to look into as the first argument ($1).
The test command is what you need to check for the existence of each file in each of the subdirs:
find . -type d -exec sh -c "if test -f {}/$file1 -a -f {}/$file2 ; then ls {}/*.{txt,html} ; fi" \;
where $file1 and $file2 are the two .txt and .html files you are looking for.
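If the file names are not known in advance, which is what the question describes, you can check for any *.txt and any *.html per directory instead. A minimal sketch, assuming bash and a find that supports -print0; the function name list_txt_html_pairs is made up for this example:

```bash
#!/usr/bin/env bash

# list_txt_html_pairs [ROOT]
# For every directory at or below ROOT that directly contains at least one
# regular *.txt file AND at least one regular *.html file, print all of them.
list_txt_html_pairs() {
    local root="${1:-.}" d f
    local -a txt html
    # NUL-delimited find output keeps odd directory names intact.
    while IFS= read -r -d '' d; do
        txt=() html=()
        for f in "$d"/*.txt; do
            [ -f "$f" ] || continue    # unmatched glob, or not a regular file
            txt+=("$f")
        done
        for f in "$d"/*.html; do
            [ -f "$f" ] || continue
            html+=("$f")
        done
        if [ "${#txt[@]}" -gt 0 ] && [ "${#html[@]}" -gt 0 ]; then
            printf '%s\n' "${txt[@]}" "${html[@]}"
        fi
    done < <(find "$root" -type d -print0)
}
```

Directories that have only one of the two kinds print nothing. The order in which directories are visited is whatever find produces, so sort the output if you need it stable.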

How to loop over directories in Linux?

I am writing a script in bash on Linux and need to go through all subdirectory names in a given directory. How can I loop through these directories (and skip regular files)?
For example:
the given directory is /tmp/
it has the following subdirectories: /tmp/A, /tmp/B, /tmp/C
I want to retrieve A, B, C.
All answers so far use find, so here's one with just the shell. No need for external tools in your case:
for dir in /tmp/*/ # list directories in the form "/tmp/dirname/"
do
dir=${dir%*/} # remove the trailing "/"
echo "${dir##*/}" # print everything after the final "/"
done
cd /tmp
find . -maxdepth 1 -mindepth 1 -type d -printf '%f\n'
A short explanation:
find finds files (quite obviously)
. is the current directory, which after the cd is /tmp (IMHO this is more flexible than having /tmp directly in the find command. You have only one place, the cd, to change, if you want more actions to take place in this folder)
-maxdepth 1 and -mindepth 1 make sure that find only looks in the current directory and doesn't include . itself in the result
-type d looks only for directories
-printf '%f\n' prints only the found folder's name (plus a newline) for each hit.
Et voilà!
You can loop through all directories including hidden directories (beginning with a dot) with:
for file in */ .*/ ; do echo "$file is a directory"; done
note: using the list */ .*/ works in zsh only if there exist at least one hidden directory in the folder. In bash it will show also . and ..
Another possibility for bash to include hidden directories would be to use:
shopt -s dotglob;
for file in */ ; do echo "$file is a directory"; done
If you want to exclude symlinks:
for file in */ ; do
if [[ -d "$file" && ! -L "$file" ]]; then
echo "$file is a directory";
fi;
done
To output only the trailing directory name (A,B,C as questioned) in each solution use this within the loops:
file="${file%/}" # strip trailing slash
file="${file##*/}" # strip path and leading slash
echo "$file is the directoryname without slashes"
Example (this also works with directories whose names contain spaces):
mkdir /tmp/A /tmp/B /tmp/C "/tmp/ dir with spaces"
for file in /tmp/*/ ; do file="${file%/}"; echo "${file##*/}"; done
Works with directories whose names contain spaces
Inspired by Sorpigal
while IFS= read -d $'\0' -r file ; do
echo "$file"; ls "$file" ;
done < <(find /path/to/dir/ -mindepth 1 -maxdepth 1 -type d -print0)
Original post (Does not work with spaces)
Inspired by Boldewyn: Example of loop with find command.
for D in $(find /path/to/dir/ -mindepth 1 -maxdepth 1 -type d) ; do
echo $D ;
done
find . -mindepth 1 -maxdepth 1 -type d -printf "%P\n"
The technique I use most often is find | xargs. For example, if you want to make every file in this directory and all of its subdirectories world-readable, you can do:
find . -type f -print0 | xargs -0 chmod go+r
find . -type d -print0 | xargs -0 chmod go+rx
The -print0 option terminates each file name with a NUL character instead of a newline. The -0 option makes xargs split its input the same way. So this is the combination to use on file names containing spaces or other whitespace.
You can picture this chain of commands as taking every line output by find and sticking it on the end of a chmod command.
If the command you want to run takes its argument in the middle instead of at the end, you have to be a bit creative. For instance, I needed to change into every subdirectory and run the command latexmk -c. So I used (from Wikipedia):
find . -type d -depth 1 -print0 | \
xargs -0 sh -c 'for dir; do pushd "$dir" && latexmk -c && popd; done' fnord
This has the effect of for dir in $(subdirs); do stuff; done, but is safe for directories with spaces in their names. Also, the separate calls to stuff are made in the same shell, which is why in my command we have to return to the original directory with popd.
a minimal bash loop you can build off of (based off ghostdog74 answer)
for dir in directory/*
do
echo ${dir}
done
to zip a whole bunch of files by directory
for dir in directory/*
do
zip -r ${dir##*/} ${dir}
done
If you want to execute multiple commands in a for loop, you can save the result of find with mapfile (bash >= 4) as a variable and go through the array with "${dirlist[@]}". It also works with directories containing spaces.
The find command is based on the answer by Boldewyn. Further information about the find command can be found there.
mapfile -t dirlist < <( find . -maxdepth 1 -mindepth 1 -type d -printf '%f\n' )
for dir in "${dirlist[@]}"; do
echo ">${dir}<"
# more commands can go here ...
done
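A line-based mapfile still breaks on directory names that contain a newline. If that matters, a NUL-delimited variant is possible; this is a sketch that assumes bash 4.4 or later (for mapfile -d) and a find with -print0. The function name collect_subdirs is invented here, and dirlist is a global array as in the snippet above.

```bash
#!/usr/bin/env bash

# collect_subdirs DIR
# Fill the global array "dirlist" with the paths of DIR's immediate
# subdirectories. Entries are read NUL-delimited, so names containing
# spaces or even newlines stay in one piece.
collect_subdirs() {
    dirlist=()
    # -d '' makes mapfile split on NUL bytes; -t strips the delimiter.
    mapfile -d '' -t dirlist < <(find "$1" -mindepth 1 -maxdepth 1 -type d -print0)
}
```

After collect_subdirs /tmp you can loop with for dir in "${dirlist[@]}"; do ...; done. The entries are full paths in whatever order find returns them; use "${dir##*/}" inside the loop if you only want the name.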
TL;DR:
(cd /tmp; for d in */; do echo "${d%/}"; done)
Explanation.
There's no need to use external programs. What you need is a shell globbing pattern. To avoid the need of removing /tmp afterward, I'm running it in a subshell, which may or may not be suitable for your purposes.
Shell globbing patterns in a nutshell:
* Matches any string, including the empty string.
? Matches exactly one character.
[...] Matches one character from between the brackets. You can also specify ranges ([a-z], [A-F0-9], etc.) or classes ([[:digit:]], [[:alpha:]], etc.).
[!...] or [^...] Matches one character that is not between the brackets.
By default, if no file names match the pattern, the shell returns the pattern unchanged. Any character or string that is not one of the above represents itself.
Consequently, the pattern */ will match any file name that ends with a /. A trailing / in a file name unambiguously identifies a directory.
The last bit is removing the trailing slash, which is achieved with the variable substitution ${var%PATTERN}, which removes the shortest matching pattern from the end of the string contained in var, and where PATTERN is any valid globbing pattern. So we write ${d%/}, meaning we want to remove the trailing slash from the string represented by d.
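The same two ingredients, the */ glob and the ${var%PATTERN} / ${var##PATTERN} expansions, can be wrapped in a small function that does not need the cd. A minimal sketch; the function name subdir_names is made up for this example:

```bash
#!/usr/bin/env bash

# subdir_names PARENT
# Print the bare name of each non-hidden subdirectory of PARENT, one per line.
subdir_names() {
    local parent="${1%/}" d
    for d in "$parent"/*/; do
        [ -d "$d" ] || continue    # no match: the pattern is left unchanged
        d="${d%/}"                 # remove the shortest trailing match of "/"
        printf '%s\n' "${d##*/}"   # remove the longest leading match of "*/"
    done
}
```

For the directory layout in the question, subdir_names /tmp should print A, B and C on separate lines, skipping regular files.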
find . -type d -maxdepth 1
In short, put the results of find into an array and iterate the array and do what you want. Not the quickest but more organized thinking.
#!/bin/bash
cd /tmp
declare -a results=(`find -type d`)
#Iterate the results
for path in "${results[@]}"
do
echo "Your path is $path"
#Do something with the path..
if [[ $path =~ "/A" ]]; then
echo $path | awk -F / '{print $NF}'
#prints A
elif [[ $path =~ "/B" ]]; then
echo $path | awk -F / '{print $NF}'
#Prints B
elif [[ $path =~ "/C" ]]; then
echo $path | awk -F / '{print $NF}'
#Prints C
fi
done
This can be reduced to:
find -type d | grep "/A" | awk -F / '{print $NF}' # prints A
find -type d | grep "/B" | awk -F / '{print $NF}' # prints B
find -type d | grep "/C" | awk -F / '{print $NF}' # prints C
