Using shell to find directories that do not have a specified pattern of files in the folder - linux

The directory tree is like this:
.
├── A_123
│   └── 123.txt
├── A_456
│   ├── tmp
│   └── tmp.log
└── A_789
    └── 789.txt
There are 3 directories (A_123, A_456, A_789).
The pattern of a directory name is: A_{numbers} and the file I'm interested in is {numbers}.txt.
I was wondering whether there's a way to get the directories A_{numbers} that have no {numbers}.txt file in them. For the example above, the script should return:
./A_456
as A_456 doesn't have 456.txt in its folder but A_123 and A_789 have their {numbers}.txt files in the relevant folder.
Does anyone have ideas about this? Thanks!

Here's one approach:
for dir in */; do
    if [ "$(find "$dir" -maxdepth 1 -regex '.*/[0-9][0-9]*\.txt' | wc -l)" -eq 0 ]; then
        echo "$dir"
    fi
done

Since the A_[0-9]* directories are not nested, you can easily do this with a glob in a loop. This implementation is pure bash and does not spawn any external utilities:
shopt -s nullglob               # make a non-matching glob expand to nothing
for d in A_[0-9]*/; do          # the trailing / causes only directories to be matched
    files=("$d"[0-9]*.txt)      # populate an array with matching text files
    ((${#files[@]} == 0)) && echo "$d"   # echo $d if the array is empty
done
There are some problems with this implementation. It will match a file such as "12ab.txt" and requires loading all the filenames for a directory into the array.
Here is another method in bash that does a more accurate filename matching:
re='^[0-9]+[.]txt$'
for d in A_[0-9]*/; do
    found=
    for f in "$d"*; do
        if [[ -f $f && ${f#"$d"} =~ $re ]]; then
            found=1
            break
        fi
    done
    [[ $found ]] || echo "$d"
done

A slight variation on a couple of other answers: use bash extended pattern matching:
shopt -s extglob nullglob
for dir in A_+([0-9]); do
    files=("$dir"/+([0-9]).txt)
    (( ${#files[@]} == 0 )) && echo "$dir"
done
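A quick scratch-directory check of this approach, recreating the question's layout in a temporary directory (a sketch, not part of the original answer):

```shell
#!/usr/bin/env bash
# Recreate the question's tree in a throwaway directory,
# then confirm the loop prints only A_456
shopt -s extglob nullglob
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir A_123 A_456 A_789
touch A_123/123.txt A_456/tmp A_456/tmp.log A_789/789.txt

for dir in A_+([0-9]); do
    files=("$dir"/+([0-9]).txt)
    (( ${#files[@]} == 0 )) && echo "$dir"
done
```

With nullglob set, the non-matching glob in A_456 yields an empty array, so only A_456 is printed.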

for file in *; do
    if [[ "$file" =~ ^A_([0-9]+)$ && ! -f "$file/${BASH_REMATCH[1]}.txt" ]]; then
        echo "$file"
    fi
done
How it works:
Checks using a regexp (note the =~) that the file/folder's name is "A_" followed by numbers.
At the same time, it captures the numbers (note the parentheses) and stores them in ${BASH_REMATCH[1]}
Next, check if the folder contains {number}.txt.
If it does not, echo the folder's name.
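The capture mechanism can be seen in isolation (the value A_456 is just an illustration):

```shell
#!/usr/bin/env bash
# Demonstrate the =~ operator and BASH_REMATCH capture groups
name="A_456"
if [[ $name =~ ^A_([0-9]+)$ ]]; then
    # BASH_REMATCH[0] is the whole match, BASH_REMATCH[1] the first group
    echo "captured: ${BASH_REMATCH[1]}"
fi
```

This prints `captured: 456`, which is exactly the number the answer then uses to build the {numbers}.txt filename.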

Related

How to list directories and files in Bash by script?

I would like to list a directory tree, but I have to write a script for it, and as a parameter the script should take the path to a base directory. Listing should start from this base directory.
The output should look like this:
Directory: ./a
File: ./a/A
Directory: ./a/aa
File: ./a/aa/AA
Directory: ./a/ab
File: ./a/ab/AB
So I need to print path from the base directory for every directory and file in this base directory.
UPDATED
Running the script, I should type this in the terminal: "./test.sh /home/usr/Desktop/myDirectory" or "./test.sh myDirectory" - since I run test.sh from the Desktop level.
And right now the script should be run from the level of /home/usr/Desktop/myDirectory.
I have the following command in my test.sh file:
find . | sed -e "s/[^-][^\/]*\// |/g"
But it is a single command, not a script, and it prints output like this:
DIR: dir1
DIR: dir2
fileA
DIR: dir3
fileC
fileB
How to print the path from base directory for every dir or file from the base dir? Could someone help me to work it out?
It's not entirely clear what you want; maybe:
find . -type d -printf 'Directory: %p\n' -o -type f -printf 'File: %p\n'
However, to see the subtree of a directory, I find this more useful:
find "$dirname" -type f
To answer the comment: it can also be done in pure bash (builtins, without external commands), using a recursive function.
rec_find() {
    local f
    for f in "$1"/*; do
        [[ -d $f ]] && echo "Directory: $f" && rec_find "$f"
        [[ -f $f ]] && echo "File: $f"
    done
}
rec_find "$1"
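For example, run against a scratch tree matching the question's layout (built here with mktemp, purely for illustration), the function produces the requested format:

```shell
#!/usr/bin/env bash
# Build a small a/A, a/aa/AA tree and list it with the recursive function
rec_find() {
    local f
    for f in "$1"/*; do
        [[ -d $f ]] && echo "Directory: $f" && rec_find "$f"
        [[ -f $f ]] && echo "File: $f"
    done
}
tmp=$(mktemp -d)
mkdir -p "$tmp/a/aa"
touch "$tmp/a/A" "$tmp/a/aa/AA"
rec_find "$tmp"
# Directory: <tmp>/a
# File: <tmp>/a/A
# Directory: <tmp>/a/aa
# File: <tmp>/a/aa/AA
```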
You can use the tree command. The -L option sets the max depth. Examples:
tree
.
├── 1
│   └── test
├── 2
│   └── test
└── 3
    └── test
3 directories, 3 files
Or
tree -L 1
.
├── 1
├── 2
└── 3
3 directories, 0 files
Create your test.sh with the code below. Here you read the command line parameter from $1 and pass it to the find command.
#!/bin/bash
# the shebang above selects the shell that executes this script
find "$1" | sed -e "s/[^-][^\/]*\// |/g"
Now, how it works:
./test.sh /home/usr/Desktop/myDirectory #you execute this command
Here the command line parameter will be assigned to $1. For more than one parameter you can use $1 through $9; after that you have to use the shift command. (You can find more detailed information online.)
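A small sketch of shift in action (the function name print_args is made up for the demo):

```shell
#!/bin/bash
# Walk through all positional parameters, however many, using shift
print_args() {
    while [ $# -gt 0 ]; do
        echo "arg: $1"
        shift            # discard $1; $2 becomes $1 and $# drops by one
    done
}
print_args one two three
# arg: one
# arg: two
# arg: three
```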
So your command will effectively be:
#!/bin/bash
# the shebang above selects the shell that executes this script
find /home/usr/Desktop/myDirectory | sed -e "s/[^-][^\/]*\// |/g"
Hope this will help you.

How to remove all but a few selected files in a directory?

I want to remove all files in a directory except some through a shell script. The name of files will be passed as command line argument and number of arguments may vary.
Suppose the directory has these 5 files:
1.txt, 2.txt, 3.txt, 4.txt, 5.txt
I want to remove two files from it through a shell script using file name. Also, the number of files may vary.
There are several ways this could be done, but the one that's most robust and highest performance with large directories is probably to construct a find command.
#!/usr/bin/env bash
# first argument is the directory name to search in
dir=$1; shift
# subsequent arguments are filenames to exclude from deletion
find_args=( )
for name; do
    find_args+=( -name "$name" -prune -o )
done
if [[ $dry_run ]]; then
    exec find "$dir" -mindepth 1 -maxdepth 1 "${find_args[@]}" -print
else
    exec find "$dir" -mindepth 1 -maxdepth 1 "${find_args[@]}" -exec rm -f -- '{}' +
fi
Thereafter, to list files which would be deleted (if the above is in a script named delete-except):
dry_run=1 delete-except /path/to/dir 1.txt 2.txt
or, to actually delete those files:
delete-except /path/to/dir 1.txt 2.txt
A simple, straightforward way could be using the GLOBIGNORE variable.
GLOBIGNORE is a colon-separated list of patterns defining the set of filenames to be ignored by pathname expansion. If a filename matched by a pathname expansion pattern also matches one of the patterns in GLOBIGNORE, it is removed from the list of matches.
Thus, the solution is to iterate through the command line args, appending file names to the list. Then call rm *. Don't forget to unset GLOBIGNORE var at the end.
#!/bin/bash
for arg in "$@"
do
    if [ "$arg" = "$1" ]
    then
        GLOBIGNORE=$arg
    else
        GLOBIGNORE=${GLOBIGNORE}:$arg
    fi
done
rm *
unset GLOBIGNORE
*In case you had set GLOBIGNORE before, you can store the value in a temporary variable and restore it at the end.
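A sketch of that save-and-restore idea (the pattern values are hypothetical):

```shell
#!/bin/bash
# Save any pre-existing GLOBIGNORE, set our own patterns, then restore it
saved=$GLOBIGNORE                  # empty if it was unset
GLOBIGNORE="keep1.txt:keep2.txt"   # hypothetical patterns to protect from rm *
# ... the rm * step from the answer would run here ...
if [ -n "$saved" ]; then
    GLOBIGNORE=$saved              # restore the caller's value
else
    unset GLOBIGNORE               # it was unset before, so unset it again
fi
```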
We can accomplish this in pure Bash, without the need for any external tools:
#!/usr/bin/env bash
# build an associative array that contains all the filenames to be preserved
declare -A skip_list
for f in "$@"; do
    skip_list[$f]=1
done
# walk through all files and build an array of files to be deleted
declare -a rm_list
for f in *; do                            # loop through all files
    [[ -f "$f" ]] || continue             # not a regular file
    [[ "${skip_list[$f]}" ]] && continue  # skip this file
    rm_list+=("$f")                       # now it qualifies for rm
done
# remove the files
printf '%s\0' "${rm_list[@]}" | xargs -0 rm -- # Thanks to Charles' suggestion
This solution will also work for files that have whitespaces or glob characters in them.
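As a self-contained sanity check, the same logic can be exercised in a scratch directory (filenames match the question's example; the temp directory is created just for the demo):

```shell
#!/usr/bin/env bash
# Build a throwaway directory, keep 1.txt and 2.txt, delete the rest
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch 1.txt 2.txt 3.txt 4.txt 5.txt

declare -A skip_list
for f in 1.txt 2.txt; do
    skip_list[$f]=1
done

declare -a rm_list
for f in *; do
    [[ -f $f ]] || continue
    [[ ${skip_list[$f]} ]] && continue
    rm_list+=("$f")
done
rm -- "${rm_list[@]}"

ls    # only 1.txt and 2.txt are left
```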
Thanks all for your answers; I have figured out my solution. Below is the solution that worked for me:
find /home/mydir -type f | grep -vw "goo" | xargs rm

Bash - finding files with spaces and rename with sed [duplicate]

This question already has answers here:
Recursively rename files using find and sed
(20 answers)
Closed 9 years ago.
I have been trying to write a script to rename all files that contain a space and replace the space with a dash.
Example: "Hey Bob.txt" to "Hey-Bob.txt"
When I used a for-loop, it just split up the file name at the space, so "Hey Bob.txt" gave separate arguments like "Hey" and "Bob.txt".
I tried the following script but it keeps hanging on me.
#!/bin/bash
find / -name '* *' -exec mv {} $(echo {} | sed 's/ /-g')\;
Building off OP's idea:
find ${PATH_TO_FILES} -name '* *' -exec bash -c 'eval $(echo mv -v \"{}\" $(echo {} | sed "s/ /-/g"))' \;
NOTE: need to specify the PATH_TO_FILES variable
EDIT: BroSlow pointed out the need to consider the directory structure:
find ${PATH_TO_FILES} -name '* *' -exec bash -c 'DIR=$(dirname "{}" | sed "s/ /-/g" ); BASE=$(basename "{}"); echo mv -v \"$DIR/$BASE\" \"$DIR/$(echo $BASE | sed "s/ /-/g")\"' \; > rename-script.sh ; sh rename-script.sh
Another way:
find . -name "* *" -type f | while read -r file
do
    new=${file// /-}
    mv "${file}" "$new"
done
Not one line, but avoids sed and should work just as well if you're going to be using it for a script anyway. (replace the mv with an echo if you want to test)
In bash 4+
#!/bin/bash
shopt -s globstar
for file in **/*; do
    filename="${file##*/}"
    if [[ -f $file && $filename == *" "* ]]; then
        onespace=$(echo $filename)
        dir="${file%/*}"
        [[ ! -f "$dir/${onespace// /-}" ]] && mv "$file" "$dir/${onespace// /-}" || echo "$dir/${onespace// /-} already exists, so not moving $file" 1>&2
    fi
done
Older bash
#!/bin/bash
find . -type f -print0 | while read -r -d '' file; do
    filename="${file##*/}"
    if [[ -f $file && $filename == *" "* ]]; then
        onespace=$(echo $filename)
        dir="${file%/*}"
        [[ ! -f "$dir/${onespace// /-}" ]] && mv "$file" "$dir/${onespace// /-}" || echo "$dir/${onespace// /-} already exists, so not moving $file" 1>&2
    fi
done
Explanation of algorithm
**/* This recursively lists all files under the current directory (** does the recursion; the trailing /* keeps the directory itself out of the listing)
${file##*/} Removes the longest prefix matching */ from file, e.g. /foo/bar/test.txt becomes test.txt
$(echo $filename) Without quoting, echo collapses any run of spaces to a single space, so any number of consecutive spaces ends up replaced by a single -
${file%/*} Removes everything after and including the last /, e.g. /foo/bar/test.txt becomes /foo/bar
mv "$file" "$dir/${onespace// /-}" Replaces every space in the filename with - (we check whether the hyphenated version exists beforehand; if it does, we echo to stderr that the file is not moved; note that && is processed before || in bash)
find . -type f -print0 | while read -r -d '' file Avoids breaking up names with spaces by using NUL as the delimiter, and -r stops read from treating backslashes specially
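The expansions listed above can be checked in isolation (the sample path is made up):

```shell
#!/usr/bin/env bash
# Each parameter expansion from the script, applied to a sample path
file="/foo/bar/test file.txt"
filename="${file##*/}"      # strip longest */ prefix -> basename
dir="${file%/*}"            # strip shortest /* suffix -> dirname
echo "$filename"            # test file.txt
echo "$dir"                 # /foo/bar
echo "${filename// /-}"     # test-file.txt
```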
Sample Output
$ tree
.
├── bar
│   ├── some dir
│   │   ├── some-name-without-space1.pdf
│   │   └── some name with space1.pdf
│   ├── some-name-without-space1.pdf
│   ├── some name with space1.pdf
│   └── some-name-with-space1.pdf
└── space.sh
$ ./space.sh
bar/some-name-with-space1.pdf already exists, so not moving bar/some name with space1.pdf
$ tree
.
├── bar
│   ├── some dir
│   │   ├── some-name-without-space1.pdf
│   │   └── some-name-with-space1.pdf
│   ├── some-name-without-space1.pdf
│   ├── some name with space1.pdf
│   └── some-name-with-space1.pdf
└── space.sh

A bash script to run a program for directories that do not have a certain file

I need a Bash script that executes a program for all directories that do not have a specific file, and creates the output file in the same directory. The program needs an input file which exists in every directory under the name *.DNA.fasta. Suppose I have the following directories, which may also contain subdirectories:
dir1/a.protein.fasta
dir2/b.protein.fasta
dir3/anyfile
dir4/x.orf.fasta
I have started by finding the directories that don't have that specific file, whose name is *.protein.fasta
in this case I want the dir3 and dir4 to be listed (since they do not contain *.protein.fasta)
I have tried this code:
find . -maxdepth 1 -type d \! -exec test -e '{}/*protein.fasta' \; -print
but it seems I missed something; it does not work.
also I do not know how to proceed for the whole story.
This is a tricky one.
I can't think of a good solution. But here's a solution, nevertheless. Note that this is guaranteed not to work if your directory or file names contain newlines, and it's not guaranteed to work if they contain other special characters. (I've only tested with the samples in your question.)
Also, I haven't included a -maxdepth because you said you need to search subdirectories too.
#!/bin/bash
# Create an associative array
declare -A excludes
# Build an associative array of directories containing the file
while read -r line; do
    excludes[$(dirname "$line")]=1
    echo "excluded: $(dirname "$line")" >&2
done <<EOT
$(find . -name "*protein.fasta" -print)
EOT
# Walk through all directories, print only those not in array
find . -type d \
| while read -r line; do
    if [[ ! ${excludes[$line]} ]]; then
        echo "$line"
    fi
done
For me, this returns:
.
./dir3
./dir4
All of which are directories that do not contain a file matching *.protein.fasta. Of course, you can replace the last echo "$line" with whatever you need to do with these directories.
Alternately:
If what you're really looking for is just the list of top-level directories that do not contain the matching file in any subdirectory, the following bash one-liner may be sufficient:
for i in *; do test -d "$i" && ( find "$i" -name '*protein.fasta' | grep -q . || echo "$i" ); done
#!/bin/bash
for dir in *; do
    test -d "$dir" && ( find "$dir" -name '*protein.fasta' | grep -q . || Programfoo "$dir/$dir.DNA.fasta" )
done

Appending rather than overwriting files when moving

I have the following directory structure:
+-archive
|   +-a
|   |   +-data.txt
|   +-b
|       +-data.txt
+-incoming
    +-a
    |   +-data.txt
    +-c
        +-data.txt
How do I do the equivalent of mv incoming/* archive/ but have the contents of the files in incoming appended to those in archive rather than overwrite them?
# move to incoming/ so that we don't
# need to strip a path prefix
cd incoming
# create directories that are missing in archive
for d in `find . -type d`; do
    if [ ! -d "../archive/$d" ]; then
        mkdir -p "../archive/$d"
    fi
done
# concatenate all files to already existing
# ones (or automatically create them)
for f in `find . -type f`; do
    cat "$f" >> "../archive/$f"
done
This should find any file in incoming and concatenate it to an existing file in archive.
The important part is to be inside incoming, because otherwise we'd have to strip the path prefix (which is possible, but in the above case unnecessary). In the above case, a value of $f typically looks like ./a/data.txt, and hence the redirection goes to ../archive/./a/data.txt.
Run it from the current directory:
find ./incoming -type f | while read -r FILE
do
    dest=${FILE/incoming/archive}
    cat "$FILE" >> "$dest"
done
Files under incoming/c would not be appended, though, since archive/c does not exist yet.
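The ${FILE/incoming/archive} expansion used above swaps the first occurrence of "incoming" in the path for "archive" (the sample path is illustrative):

```shell
#!/bin/bash
# Pattern substitution: replace the first "incoming" with "archive"
FILE="./incoming/a/data.txt"
dest=${FILE/incoming/archive}
echo "$dest"    # ./archive/a/data.txt
```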
Here's a version with proper quoting:
#!/bin/bash
# bash, not sh: the ${var/pattern/replacement} expansion below is a bashism
if [ -z "$1" ]; then
    # acting as parent script
    find incoming -type f -exec "$0" {} \;
else
    # acting as child script
    for in_file; do
        if [ -f "$in_file" ]; then
            destfile="${in_file/incoming/archive}"
            test -d "$(dirname "$destfile")" || mkdir -p "$(dirname "$destfile")"
            cat "$in_file" >> "$destfile" &&
            rm -f "$in_file"
        fi
    done
fi
