How to find a list of files that have a specific extension but do not contain certain characters in their file name? - linux

I have a folder with files that have extensions, such as .txt, .sh and .out.
However, I want a list of files that have only the .txt extension, with file names not containing certain characters.
For example, the .txt files are named L-003_45.txt and so on, up to L-003_70.txt. Some files have the L-003 part changed to, say, L-004, creating duplicates of a given number, so both L-003_45.txt and L-004_45.txt exist. I want a list of the text files that don't have 45 in their name.
How would I do that?
I tried with find and ls and succeeded, but I would like to know how to do it with a for loop instead.
I tried:
for FILE in *.txt; do ls -I '*45.txt'; done but it failed.
Would be grateful for the help!

Or you can use Bash's extended globbing:
#!/usr/bin/env bash
# Enable extended globbing
shopt -s extglob
# Make non-matching globs expand to nothing instead of themselves
shopt -s nullglob
# Iterate over .txt files whose names do not end in 45 or 57
for file in !(*@(45|57)).txt; do
    printf '%s\n' "$file"
done
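A quick demonstration with the sample names from the question (plus a hypothetical L-003_57.txt), assuming the two shopt settings above are in effect:
$ shopt -s extglob nullglob
$ touch L-003_45.txt L-003_57.txt L-003_70.txt
$ printf '%s\n' !(*@(45|57)).txt
L-003_70.txt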

I would advise you to use the find command to find all files with the required extensions, and then filter out the ones with the "strange" characters, e.g. for finding files by extension:
find ./ -name "*.txt" -o -name "*.sh" -o -name "*.out"
... and now, for not showing the ones with "45" in the name, you can do:
find ./ -name "*.txt" -o -name "*.sh" -o -name "*.out" | grep -v "45"
... and if you want neither "45" nor "56", you can do:
find ./ -name "*.txt" -o -name "*.sh" -o -name "*.out" | grep -v "45" | grep -v "56"
Explanation:
-o stands for OR
grep -v stands for "--invert-match" (not showing those results)
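Note that grep -v matches against the whole path, so a directory named, say, data45 would also be filtered out along with everything in it. If that matters, you can let find do the exclusion itself with ! -name, which tests only the file name; a minimal sketch:
find ./ \( -name "*.txt" -o -name "*.sh" -o -name "*.out" \) ! -name "*45*" ! -name "*56*"
The parentheses group the -o alternatives so the ! -name tests apply to all of them.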

Setup:
$ touch L-004_23.txt L-003_45.txt L-004_45.txt L-003_70.txt
$ ls -1 L*txt
L-003_45.txt
L-003_70.txt
L-004_23.txt
L-004_45.txt
One idea using ! to negate a criterion:
$ find . -name "*.txt" ! -name "*_45.txt"
./L-003_70.txt
./L-004_23.txt
Feeding the find results to a while loop, e.g.:
while read -r file
do
    echo "file: ${file}"
done < <(find . -name "*.txt" ! -name "*_45.txt")
This generates:
file: ./L-003_70.txt
file: ./L-004_23.txt
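If any of the file names could contain newlines, a null-delimited variant of the same loop is safer; a sketch:
while IFS= read -r -d '' file
do
    echo "file: ${file}"
done < <(find . -name "*.txt" ! -name "*_45.txt" -print0)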

The proposed solution with extglob is a very good one. In case you need to exclude more than one pattern, you can also test and continue. An example excluding all *45.txt and *57.txt files:
declare -a excludes=("45" "57")
for f in *.txt; do
    for e in "${excludes[@]}"; do
        [[ "$f" == *"$e.txt" ]] && continue 2
    done
    printf '%s\n' "$f"
done
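Alternatively, the same array can be turned into a set of find tests, assuming the excluded strings contain no glob metacharacters; a sketch:
declare -a excludes=("45" "57")
args=()
for e in "${excludes[@]}"; do
    args+=( ! -name "*${e}.txt" )
done
find . -maxdepth 1 -name '*.txt' "${args[@]}"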

Related

Need guidance with a bash script to check log files in a certain directory for a certain string

I would like to preface this with: I am a complete noob with scripting. I have a situation where I need to manually look for a phone number that could live in one of hundreds of files.
The logs live under /actlogs/sbclogger_archive, which contains 31 directories labeled 01-31.
Each of those directories holds hundreds of zipped files, but the only ones I want to search are "sipd.logthenthedate.gz" and "sipmsg.logthenthedate.gz".
The script I am using is below; please let me know what I could do to make this work.
#!/bin/bash
read -p "Enter a phone number: " text
read -p "Enter directory of log file's, Hint it should be /actlogs/sbclogger_archive: " directory
#arr=( $(find $directory -type f -exec grep -l "$text" {} \; | sort -r) )
#find $directory -type f -exec grep -qe "$text" {} \; -exec bash -c '
file=$(find $directory -type f -name 'sipd.log*' -exec grep -qe "$text" {} \; -exec bash -c 'select f; do echo $f; break; done' find-sh {} +;)
if [ -z "$file" ]; then
    echo "No matches found."
else
    echo "select tool:"
    tools=("nano" "less" "vim" "quit")
    select tool in "${tools[@]}"
    do
        case $tool in
            "quit")
                break
                ;;
            *)
                $tool $file
                break
                ;;
        esac
    done
fi
This would give you the list of files matching:
find \( -name 'sipd.log[0-9]*.gz' -o -name 'sipmsg.log[0-9]*.gz' \) \
    -exec sh -c 'gunzip -c "$1" | grep -m1 -q 888333' sh {} \; -print
./18/sipd.log20200118.gz
./7/sipd.log20200107.gz
Note: -m1 tells grep to stop after the first match; since you only need the file name in this case, that's enough.
If you have zgrep, you can shorten it to:
find \( -name 'sipd.log[0-9]*.gz' -o -name 'sipmsg.log[0-9]*.gz' \) \
-exec zgrep -l '888333' {} \;
./18/sipd.log20200118.gz
./7/sipd.log20200107.gz
Also, some of the tools you are suggesting do not support gzip files (nano and some variants of less, for example), in which case you might need to decompress the file and compress it again when done.
And, you might want to consider a loop if you want to "quit". Feeding the file list to the tool doesn't make sense.
Note: AFAIK zgrep doesn't do recursion:
DESCRIPTION
    Zgrep invokes grep on compressed or gzipped files. These grep options will
    cause zgrep to terminate with an error code:
    (-[drRzZ]|--di*|--exc*|--inc*|--rec*|--nu*). All other options specified
    are passed directly to grep. If no file is specified, then the standard
    input is decompressed if necessary and fed to grep. Otherwise the given
    files are uncompressed if necessary and fed to grep.
so zgrep -rl "$text" "$directory" or zgrep -rl --include 'sipd.log*.gz' "$text" {01..31} won't work unless you have a special zgrep.
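A common workaround is to let find do the recursion and hand the files it finds to zgrep; a sketch along those lines:
find "$directory" \( -name 'sipd.log[0-9]*.gz' -o -name 'sipmsg.log[0-9]*.gz' \) \
    -exec zgrep -l "$text" {} +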
As you must unzip before using your tool, I would divide the problem into two parts.
First, I would expand the paths you need (looking under <directory> for the phone <text>), and then iterate to apply the tool (because some tools like vim or nano cannot be piped).
Try something like this:
#!/bin/bash
#...
# text/directory input stuff
#...
tmpdir=$(mktemp -d)
trap 'rm -rf "${tmpdir}"' EXIT
while IFS= read -r file; do
    unzipped=${tmpdir}/$(basename "${file}" .gz)
    gunzip -c "${file}" > "${unzipped}"
    ${tool} "${unzipped}"
done < <(zgrep -lw "${text}" "${directory}"/{01..31}/{sipd.logthenthedate.gz,sipmsg.logthenthedate.gz} 2>/dev/null)
The above uses the process-substitution form recommended by Charles Duffy, following this Bash FAQ.
If you prefer to iterate an array, you could build in this way:
# shellcheck disable=SC2207
files=( $(zgrep -lw "${text}" "${directory}"/{01..31}/{sipd.logthenthedate.gz,sipmsg.logthenthedate.gz} 2>/dev/null) )
for file in "${files[@]}"; do
    # etc.
done
since in our particular case the files to match have no spaces in their names, the shellcheck warning is not so important (hence it is suppressed above).
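If you are on bash 4+, mapfile avoids the word-splitting caveat (and the shellcheck warning) altogether; a sketch:
mapfile -t files < <(zgrep -lw "${text}" "${directory}"/{01..31}/{sipd.logthenthedate.gz,sipmsg.logthenthedate.gz} 2>/dev/null)
for file in "${files[@]}"; do
    printf '%s\n' "${file}"
done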

Remove all additional extensions from files

I have files that have any number of extensions
a.jpg
b.jpg.jpg
c.jpg.jpg.png
...
I need to remove all extensions beyond the first one
a.jpg
b.jpg
c.jpg
...
I currently have the following to find files with additional extensions
find . -type f -name "*.*.*"
find . -type f -name "*.*.*.*"
find . -type f -name "*.*.*.*.*"
...
I'm not sure how to make a cleaner version of this. The only periods in the file names are right before the extensions, so I could pick up everything up to and including the first extension with a regex and use -exec mv in the above find command, but I'm not sure how to do that.
If you have the rename utility, then you can use:
rename -n 's/([^.]*\.[^.]+).+/$1/' *.*.*
Remove -n (dry run) once you are satisfied with the output.
If you don't have rename then you may use sed like this:
for i in *.*.*; do
    echo mv "$i" "$(sed -E 's/([^.]*\.[^.]+).+/\1/' <<< "$i")"
done
Remove echo after you're satisfied with the output.
Using only parameter expansion with extglob:
#!/usr/bin/env bash
shopt -s extglob
for file in *.*.*; do
    del=${file##+([^.]).+([^.])}
    mv "$file" "${file%"$del"}"
done

unix bash find file directories with 2 explicit file extensions

I am trying to create a small bash script that essentially looks through a directory that includes hundreds of subdirectories. SOME of these subdirectories include a textfile.txt and an htmlfile.html, where the names textfile and htmlfile are variable.
I only really care about subdirectories that have both the .txt and the .html; all other subdirectories can be ignored.
I then want to list all the .html files and .txt files that are in the same subdirectory.
This seems like a pretty simple issue to solve, but I am at a loss. All I can really get working is a line of code that outputs subdirectories that have either an .html file or a .txt file, with no association with the actual subdirectory they are in, and I am pretty new at bash scripting so I can't get any further:
#!/bin/bash
files="$(find ~/file/ -type f -name '*.txt' -or -name '*.html')"
for file in $files
do
    echo $file
done
The following find command checks every subdirectory and, if it has both html and txt files, lists all of them:
find . -type d -exec env d={} bash -c 'ls "$d"/*.html &>/dev/null && ls "$d"/*.txt &>/dev/null && ls "$d/"*.{html,txt}' \;
Explanation:
find . -type d
This looks for all subdirectories of the current directory.
-exec env d={} bash -c '...' \;
This sets the environment variable d to the value of the found subdirectory and then executes the bash command that is contained within the single quotes (see below).
ls "$d"/*.html &>/dev/null && ls "$d"/*.txt &>/dev/null && ls "$d/"*.{html,txt}
This is the bash command that is executed. It consists of three statements and-ed together. The first checks to see if directory d has any html files. If so, the second statement runs and it checks to see if there are any txt files. If so, the last statement is executed and it lists all html and txt files in the directory d.
This command is safe for all file and directory names containing spaces, tabs, or other difficult characters.
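A variant of the same idea passes each directory as a positional argument and tests the globs with compgen instead of ls; a sketch, equivalent in spirit:
find . -type d -exec bash -c '
    for d in "$@"; do
        compgen -G "$d/*.html" > /dev/null &&
        compgen -G "$d/*.txt" > /dev/null &&
        ls "$d"/*.html "$d"/*.txt
    done
' bash {} +
Batching the directories with {} + starts far fewer bash processes than \; does.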
You could do it by searching recursively with the globstar option:
shopt -s globstar
for file in **; do
    if [[ -d $file ]]; then
        for sub_file in "$file"/*; do
            case "$sub_file" in
                *.html)
                    html=1;;
                *.txt)
                    txt=1;;
            esac
        done
        [[ $html && $txt ]] && echo "$file"
        html=""
        txt=""
    fi
done
You can make use of -o (the parentheses are needed because -o binds less tightly than the implicit AND between -type f and -name):
#!/bin/bash
files=$(find ~/file/ -type f \( -name '*.txt' -o -name '*.html' \))
for file in $files
do
    echo $file
done
#!/bin/bash
# A quick peek into a dir to see if there's at least one file that matches pattern
dir_has_file() { dir="$1"; pattern="$2";
    [ -n "$(find "$dir" -maxdepth 1 -type f -name "$pattern" -print -quit)" ]
}
# Assumes there are no newline characters in the filenames, but will behave
# correctly with subdirectories that match *.html or *.txt
find "$1" -type d |
while read d
do
    dir_has_file "$d" '*.txt' &&
    dir_has_file "$d" '*.html' &&
    # Now print all the matching files
    find "$d" -maxdepth 1 -type f \( -name '*.txt' -o -name '*.html' \)
done
This script takes the root directory to look into as the first argument ($1).
The test command is what you need to check for the existence of each file in each of the subdirs:
find . -type d -exec sh -c "if test -f {}/$file1 -a -f {}/$file2 ; then ls {}/*.{txt,html} ; fi" \;
where $file1 and $file2 are the two .txt and .html files you are looking for.
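Note that this expands {} inside a quoted script, which breaks on directory names containing spaces or quotes. Passing the names as arguments is safer; a sketch of the same check:
find . -type d -exec sh -c '
    if [ -f "$1/$2" ] && [ -f "$1/$3" ]; then
        ls "$1"/*.txt "$1"/*.html
    fi
' sh {} "$file1" "$file2" \;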

Recursively prepend text to file names

I want to prepend text to the name of every file of a certain type - in this case .txt files - located in the current directory or a sub-directory.
I have tried:
find -L . -type f -name "*.txt" -exec mv "{}" "PrependedTextHere{}" \;
The problem with this is dealing with the ./ part of the path that comes with the {} reference.
Any help or alternative approaches appreciated.
You can do something like this
find -L . -type f -name "*.txt" -exec bash -c 'echo "$0" "${0%/*}/PrependedTextHere${0##*/}"' {} \;
Where
bash -c '...' executes the command
$0 is the first argument passed in, in this case {} -- the full filename
${0%/*} removes the last / and everything after it, leaving the directory part
${0##*/} removes everything before and including the last / in the filename
Replace the echo with a mv once you're satisfied it's working.
Are you just trying to move the files to a new file name that has Prepend before it?
for F in *.txt; do mv "$F" Prepend"$F"; done
Or do you want it to handle subdirectories and prepend between the directory and file name:
dir1/PrependA.txt
dir2/PrependB.txt
Here's a quick shot at it. Let me know if it helps.
for file in $(find -L . -type f -name "*.txt")
do
    parent=$(echo $file | sed "s=\(.*/\).*=\1=")
    name=$(echo $file | sed "s=.*/\(.*\)=\1=")
    mv "$file" "${parent}PrependedTextHere${name}"
done
This ought to work as long as file names do not contain newline characters. If they might, make find use -print0 and read with a null delimiter (see the sketch below).
#!/bin/sh
IFS='
'
for I in $(find -L . -name '*.txt' -print); do
    echo mv "$I" "${I%/*}/prepend-${I##*/}"
done
p.s. Remove the echo to make the script effective; it's there to avoid accidental breakage for people who randomly copy-paste stuff from here into their shell.
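For reference, the -print0 variant mentioned above could look like this (a sketch; it assumes bash rather than plain sh, since read -d is a bashism, and note the loop body runs in a subshell when fed from a pipe):
find -L . -name '*.txt' -print0 |
while IFS= read -r -d '' I; do
    echo mv "$I" "${I%/*}/prepend-${I##*/}"
done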

A bash script to run a program for directories that do not have a certain file

I need a Bash script to execute a program for all directories that do not have a specific file, and to create the output file in the same directory. The program needs an input file, which exists in every directory under the name *.DNA.fasta. Suppose I have the following directories, which may also contain subdirectories:
dir1/a.protein.fasta
dir2/b.protein.fasta
dir3/anyfile
dir4/x.orf.fasta
I have started by finding the directories that don't have that specific file, whose name matches *.protein.fasta.
In this case I want dir3 and dir4 to be listed (since they do not contain *.protein.fasta).
I have tried this code:
find . -maxdepth 1 -type d \! -exec test -e '{}/*protein.fasta' \; -print
but it seems I missed something; it does not work.
Also, I do not know how to proceed with the rest of the task.
This is a tricky one.
I can't think of a good solution. But here's a solution, nevertheless. Note that this is guaranteed not to work if your directory or file names contain newlines, and it's not guaranteed to work if they contain other special characters. (I've only tested with the samples in your question.)
Also, I haven't included a -maxdepth because you said you need to search subdirectories too.
#!/bin/bash
# Create an associative array
declare -A excludes
# Build an associative array of directories containing the file
while read line; do
    excludes[$(dirname "$line")]=1
    echo "excluded: $(dirname "$line")" >&2
done <<EOT
$(find . -name "*protein.fasta" -print)
EOT
# Walk through all directories, print only those not in array
find . -type d \
    | while read line ; do
        if [[ ! ${excludes[$line]} ]]; then
            echo "$line"
        fi
    done
For me, this returns:
.
./dir3
./dir4
All of which are directories that do not contain a file matching *.protein.fasta. Of course, you can replace the last echo "$line" with whatever you need to do with these directories.
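For example, to actually run the program on the *.DNA.fasta input in each of those directories (the program name and output suffix here are hypothetical placeholders; substitute your actual tool), the last loop could become:
find . -type d \
    | while read line ; do
        if [[ ! ${excludes[$line]} ]]; then
            for fasta in "$line"/*.DNA.fasta; do
                [[ -e $fasta ]] || continue
                # yourprogram is a placeholder for the real tool
                yourprogram "$fasta" > "${fasta%.DNA.fasta}.out"
            done
        fi
    done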
Alternately:
If what you're really looking for is just the list of top-level directories that do not contain the matching file in any subdirectory, the following bash one-liner may be sufficient:
for i in *; do test -d "$i" && ( find "$i" -name '*protein.fasta' | grep -q . || echo "$i" ); done
#!/bin/bash
for dir in *; do
    test -d "$dir" && ( find "$dir" -name '*protein.fasta' | grep -q . || Programfoo "$dir/$dir.DNA.fasta" );
done
