Bash - How to properly list files in a folder and manage exclusion - linux

I'm looking for a proper way to list
all filenames (without extension)
matching a specific extension list
recursively in a specific folder
with some exclusions patterns
and then export that to a file.
Currently I'm doing the following, which works properly:
ls -R --ignore={"Sample","Sample.*","sample.*","*_sample.*","*.sample.*","*-sample.*","*.sample-*","*-sample-*","*trailer]*"} "$filesSource" | grep -E '\.mkv$|\.mp4$|\.avi$' | sed -e 's/\(.*\)/\L\1/' | sort >> "$listFile"
Thanks to ShellCheck, I get a warning on this line and I don't know how to do it properly!
Thanks for your help!

Why don't you try the find command?
Something like:
find YOUR_PATH -type f -name "*.FIRST_EXTENSION" -o -name "*.SECOND_EXTENSION"| grep -v SOME_EXCLUSION | awk -F. '{print $(NF-1)}' | sort > SOME_FILE
Note: this will only work if the filenames contain a single "." character (the one before the extension); otherwise you need to modify the awk part a little, as sketched below.
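If the names can contain more than one dot, one option (a sketch using the same placeholders as the command above) is to strip only the final extension with sed instead of counting dot-separated fields with awk:
# strip only the final extension, keeping any earlier dots intact
find YOUR_PATH -type f \( -name "*.FIRST_EXTENSION" -o -name "*.SECOND_EXTENSION" \) \
    | grep -v SOME_EXCLUSION \
    | sed 's/\.[^.]*$//' \
    | sort > SOME_FILE
The sed expression removes the last dot and everything after it, so intermediate dots in a filename are left alone.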

If you are searching just on filenames, then you can use:
Here is the command, split over multiple lines for readability:
$ find /path/to/folder -type f \
    \( \( -name '*.ext1' -or -name '*.ext2' -or -name '*.ext3' \) \
       -and -not \( -name '*excl1*' -or -name 'excl2*' \) \) \
    -print
This will do:
/path/to/folder: the folder you are searching
-type f : you are searching for files in the above folder which satisfy
\(: opens the conditional test
\( -name '*.ext1' -or -name '*.ext2' -or -name '*.ext3' \): which match one of the three listed extensions (joined with a conditional or)
-and -not \( -name '*excl1*' -or -name 'excl2*' \): if the above condition matches, it additionally checks (-and) that neither of the patterns *excl1* or excl2* matches (-not)
\): closes the main conditional test
-print: performs the action of printing the found paths.
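Applied to the original question, a possible sketch (assuming GNU find for -printf, and with the sample/trailer ignore list collapsed into two simplified patterns that you may need to tune) would be:
# filenames only, extension stripped, lowercased, appended to the list file
find "$filesSource" -type f \
    \( -name '*.mkv' -o -name '*.mp4' -o -name '*.avi' \) \
    -not \( -iname '*sample*' -o -name '*trailer]*' \) \
    -printf '%f\n' \
    | sed 's/\.[^.]*$//' \
    | tr '[:upper:]' '[:lower:]' \
    | sort >> "$listFile"
-printf '%f\n' prints only the basename, the sed strips the final extension, and tr lowercases the result before sorting into $listFile.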

Related

Find workspace and delete everything with the name, except for filename and everything in a directory pattern

I'm trying to create a cronjob that will delete everything with a pattern *.jar, except for master.jar and anything in a directory pattern */jarkeeper/*/staging/*
I'm close but no luck in finding the correct command. Here's what I have so far:
find /var/lib/jenkins/workspace/ ! -path "*/jarkeeper/*/staging/*" -or -type f ! -name master.jar -name \*.jar
and
find /var/lib/jenkins/workspace/ \( ! -path "*/jarkeeper/*/staging/*" \) -or \( -type f ! -name master.jar \) -name \*.jar
What should the correct format be?
The issue looks like you are using -or as opposed to -and. I would also suggest using -path as opposed to -name throughout to keep everything consistent, and so:
find /var/lib/jenkins/workspace/ -type f ! -path "*master.jar" -and ! -path "*/jarkeeper/*/staging/*" -and -path "*.jar"
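Since the cron job is meant to delete the matches, one possible follow-up (a sketch, not part of the answer above; -delete assumes GNU find) is:
# same conditions as above, but actually removing the matching jars
find /var/lib/jenkins/workspace/ -type f -path "*.jar" ! -path "*master.jar" ! -path "*/jarkeeper/*/staging/*" -delete
Run it with -print instead of -delete first to confirm it selects exactly the files you expect.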
As an idea, I've always felt more comfortable combining simpler tools than using find's complex syntax, like:
find $somewhere -name \*.jar | grep -v master.jar | \
grep -vE "jarkeeper/.*/staging/" | xargs rm -rf
This also has the advantage that you can test/check/debug your scripts part by part.
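A whitespace-safe variant of the same idea (a sketch, assuming GNU grep for -z and GNU xargs for -0) keeps the list NUL-delimited all the way through the pipeline:
# NUL-delimited so filenames with spaces or newlines survive intact
find "$somewhere" -name '*.jar' -print0 | grep -zv 'master\.jar' | \
grep -zvE 'jarkeeper/.*/staging/' | xargs -0 rm -rf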

For loop won't repeat itself

I have this block and the for loop doesn't repeat even if the path has more than 2 files. It executes only once and that's all. What's the problem? How can I make it run for all files in the list?
list=$(find $path -type f \( -name "*.c" -or -name "*.cpp" -or -name "*.cxx" -or -name "*.cc" \))
for file in "$list";do
#commands
done
You can avoid the use of find entirely here (assuming the only files with those extensions are regular files, no directories etc.), via bash's extended globbing:
shopt -s extglob globstar
for file in "$path"/**/*.@(c|cpp|cxx|cc); do
# commands
done
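One extra detail (an addition, not from the original answer): if nothing matches, the unexpanded pattern itself goes through the loop once, so enabling nullglob as well makes the loop simply do nothing in that case:
shopt -s extglob globstar nullglob
for file in "$path"/**/*.@(c|cpp|cxx|cc); do
    # with nullglob, this body is skipped entirely when there are no matches
    printf '%s\n' "$file"
done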
Putting $list in quotes makes it just one word, so it doesn't loop.
But if you take out the quotes, it won't work properly if any of the filenames contain whitespace, since they'll be split into multiple words.
Instead of assigning to a variable, pipe the output to a while read loop.
find $path -type f \( -name "*.c" -or -name "*.cpp" -or -name "*.cxx" -or -name "*.cc" \) | while read -r file
do
# commands
done
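One caveat worth adding (not part of the original answer): because of the pipe, the while loop runs in a subshell in bash, so variables set inside it are lost when the loop ends. Feeding the loop through process substitution instead avoids that:
count=0
while read -r file
do
    # commands
    count=$((count + 1))
done < <(find "$path" -type f \( -name "*.c" -or -name "*.cpp" -or -name "*.cxx" -or -name "*.cc" \))
echo "$count files found"   # count keeps its value here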

Append output of Find command to Variable in Bash Script

Trying to append output of find command to a variable in a Bash script
Can append output of find command to a log file ok, but can't append it to a variable i.e.
This works ok:
find $DIR -type d -name "*" >> $DIRS_REMOVED_LOG
But this won't:
FILES_TO_EVAL=find $DIR -type f \( -name '*.sh' -or -name '*.txt' -or -name '*.xml' -or -name '*.log' \)
ENV=`basename $PS_CFG_HOME | tr "[:lower:]" "[:upper:]"`
FILE_TYPES=(*.log *.xml *.txt *.sh)
DIRS_TO_CLEAR="$PS_CFG_HOME/data/files $PS_CFG_HOME/appserv/prcs/$ENV/files $PS_CFG_HOME/appserv/prcs/$ENV/files/CQ"
FILES_REMOVED_LOG=$PS_CFG_HOME/files_removed.log
DIRS_REMOVED_LOG=$PS_CFG_HOME/dirs_removed.log
##Cycle through directories
##Below for files_removed_log works ok but can't get the find into a variable.
for DIR in `echo $DIRS_TO_CLEAR`
do
echo "Searching $DIR for files:"
FILES_TO_EVAL=find $DIR -type f \( -name '*.sh' -or -name '*.txt' -or -name '*.xml' -or -name '*.log' \)
find $DIR -type d -name "*" >> $DIRS_REMOVED_LOG
done
Expected FILES_TO_EVAL to be populated with results of find command but it is empty.
Run your scripts through ShellCheck. It finds lots of common mistakes, much like a compiler would.
FILES_TO_EVAL=find $DIR -type f \( -name '*.sh' -or -name '*.txt' -or -name '*.xml' -or -name '*.log' \)
SC2209: Use var=$(command) to assign output (or quote to assign string).
In addition to the problems that shellcheck.net will point out, there are a number of subtler problems.
For one thing, you're using all-caps variable names. This is dangerous, because there are a large number of all-caps variables that have special meanings to the shell and/or other tools, and if you accidentally use one of those, it can have weird effects. Lower- or mixed-case variables are much safer (except when you specifically want the special meaning).
Also, you should almost always put double-quotes around variable references (e.g. find "$dir" ... instead of find $dir ...). Without them, the variables will be subject to word splitting and wildcard expansion, which can have a variety of unintended consequences. In some cases, you need word splitting and/or wildcard expansion on a variable's value, but usually not quite the way the shell does it; in these cases, you should look for a better way to do the job.
In the line that's failing,
FILES_TO_EVAL=find $DIR -type f \( -name '*.sh' -or -name '*.txt' -or -name '*.xml' -or -name '*.log' \)
the immediate problem is that you need to use $(find ...) to capture the output from the find command. But this is still dangerous, because it's just storing a newline-delimited list of file paths, and the standard way to expand this (just using an unquoted variable reference) has all the problems I mentioned above. In this case, it will lead to trouble if any filenames contain spaces or wildcards (which are perfectly legal in filenames). If you're in a controlled environment where you can guarantee this won't happen, you'll get away with it... but it's really not the best idea.
Correctly handling a list of filepaths from find is a little complicated, but there are a number of ways to do it. There's a lot of good info in BashFAQ #20: "How can I find and safely handle file names containing newlines, spaces or both?" I'll summarize some common options below:
If you don't need to store the list, just run commands on individual files, you can use find -exec:
find "$dir" -type f \( -name '*.sh' -or -name '*.txt' -or -name '*.xml' -or -name '*.log' \) -exec somecommand {} \;
If you need to run something more complex, you can use find -print0 to output the list in an unambiguous form, and then use read -d '' to read them. There are a bunch of potential pitfalls here, so here's the version I use to avoid all the trouble spots:
while IFS= read -r -d '' filepath <&3; do
dosomethingwith "$filepath"
done 3< <(find "$dir" -type f \( -name '*.sh' -or -name '*.txt' -or -name '*.xml' -or -name '*.log' \) -print0)
Note that the <(command) syntax (known as process substitution) is a bash-only feature, so use an explicit bash shebang (#!/bin/bash or #!/usr/bin/env bash) on your script, and don't override it by running the script with sh.
If you really do need to store the list of paths for later, store it as an array:
files_to_eval=()
while IFS= read -r -d '' filepath; do
files_to_eval+=("$filepath")
done < <(find "$dir" -type f \( -name '*.sh' -or -name '*.txt' -or -name '*.xml' -or -name '*.log' \) -print0)
..or, if you have bash v4.4 or later, it's easier to use readarray (aka mapfile):
readarray -td '' files_to_eval < <(find "$dir" -type f \( -name '*.sh' -or -name '*.txt' -or -name '*.xml' -or -name '*.log' \) -print0)
In either case, you should then expand the array with "${files_to_eval[@]}" to get all the elements without subjecting them to word splitting and wildcard expansion.
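For instance (wc -l here is just a stand-in for whatever you actually want to run over the collected files):
# each stored path arrives as exactly one argument, even with spaces in it
wc -l "${files_to_eval[@]}"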
On to some other problems. In this line:
FILE_TYPES=(*.log *.xml *.txt *.sh)
In this context, the wildcards will be expanded immediately to a list of matches in the current directory. You should quote them to prevent this:
file_types=("*.log" "*.xml" "*.txt" "*.sh")
In these lines:
DIRS_TO_CLEAR="$PS_CFG_HOME/data/files $PS_CFG_HOME/appserv/prcs/$ENV/files $PS_CFG_HOME/appserv/prcs/$ENV/files/CQ"
...
for DIR in `echo $DIRS_TO_CLEAR`
You're storing a list as a single string with entries separated by spaces, which has all the word-split and wildcard problems I've been harping on. Also, the echo here is a complication that doesn't do anything useful, and actually makes the wildcard problem worse. Use an array, and avoid all the mess:
dirs_to_clear=("$ps_cfg_home/data/files" "$ps_cfg_home/appserv/prcs/$env/files" "$ps_cfg_home/appserv/prcs/$env/files/CQ")
...
for dir in "${dirs_to_clear[@]}"

Find specific directory and ignore others

I need to find all the iplanets on one server and I was thinking to use this command:
find / -type d -name https-* | uniq
But at the same time I need to ignore some directories/files. I've been trying to use !, but it does not always work. I have a command like this:
find / -type d -name https-* ! -name https-admserv* ! -name conf_bk* ! -name alias* ! -name *db* ! -name ClassCache* | uniq
I need to ignore all that. The directories admserv, conf_bk, alias and tmp and the files *.db*
Basically I need find this:
/opt/mw/iplanet/https-daniel.com
/opt/https-daniel1.com
/apps/https-daniel2.com
I only need to find the directory name. How can I ignore all the other stuff?
Use -prune to keep from recursing into directories:
find / -type d \( -name 'https-admserv*' -o -name 'conf_bk*' -o -name 'alias*' -o -name 'tmp' \) -prune -o -type d -name 'https-*' -print
There's no need to ignore any files. You're only selecting https-* directories, so everything else is ignored.
And there's no need to pipe to uniq, since find never produces duplicates.

Running a script based on hostname within a group variable

I'm new to shell/bash and I'm trying to perform a function to clear logs on my Oracle files. The environment I'm working in has to have all logs fully open, but the issue we have is the volumes filling up and not allowing services to restart. I'm trying to create a script to run as a cron job to search directories depending on which group they're a part of (each group has slightly different paths and names).
I've got the script going through the "VMORDER" which cycles through the groups listed. I want it to pull the host name. Is there a way for me to say "If VM belongs to a group (i.e. GP1, GP2, etc) then run "GP1s" script"?
Thanks for any help you can provide :).
#!/bin/bash
SCRIPTDIR=[SCRIPT DIR]
GP1="vm01 vm02"
GP2="vm03 vm04 vm05"
GP3="vm06 vm07 vm08"
VMORDER="GP1 GP2 GP3"
##DIRECTORY PATHS
VCIE_DIRECTORY=[DIRECTORY]
##FILE EXCLUSION LISTING
access_log='access.log'
admin_server='AdminServer.log'
admin_service='adminservice.log'
app_ms_1='app_ms*.log'
app_ms_2='app_ms*.out'
app_wm_1='app_ms*.log'
app_wm_2='app_ms*.out'
audit_recorder_log='DefaultAuditRecorder.log'
jms_log='jms*.log'
osb_log='osb_domain.log'
diagnostic_log='diagnostic.log'
HNAME=$( hostname | cut -d'.' -f1 | tr [:lower:] [:upper:] )
find_log_rotation(){
for i in $(VMORDER)
do
clear_logs ${i}
done
}
clear_logs(){
##GP1
if [ $HNAME = GP1];
find -P $VCIE_DIRECTORY/app_ms{1..4}/logs/ -type f -not -name "$app_ms_1" -not -name "$app_ms_2" -not -name "$access_log" -not -name "$audit_recorder_log" -not -name "$jms_log" -mtime 1
fi
##GP2
if [ $HNAME = GP2];
find -P $VCIE_DIRECTORY/app_wm{1..4}/logs/ -type f -not -name "$app_wm_1" -not -name "$app_wm_2" -not -name "$access_log" -not -name "$audit_recorder_log" -not -name "$jms_log" -mtime 1
fi
##GP3
if [ $HNAME = GP3];
find -P $VCIE_DIRECTORY/AdminServer/logs/ -type f -not -name "$admin_server" -not -name "$access_log" -not -name "$access_log" -not -name "$admin_service" -not -name "$osb_domain" -mtime 1
fi
Short answer:
If the hostname is something like "vm04" and you do not tr it to uppercase, you could use:
if [[ "${GP1}" = *${HNAME}* ]]; then
The double brackets make it special syntax; what is to the right of the = sign is treated as a pattern.
Do not put quotes around it, as that would make it match as a plain string.
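A minimal, self-contained illustration of that test (made-up values, purely to show the pattern match):
GP1="vm01 vm02"
HNAME="vm02"
if [[ "${GP1}" = *${HNAME}* ]]; then
    echo "${HNAME} belongs to GP1"   # prints, because the pattern *vm02* matches the GP1 string
fi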
Long answer
Do you really have to look for different files on different servers? If, for example, there are no app_ms_1 files under AdminServer anyway, you can build one combined list of files to skip:
SKIP="${app_ms_1}|${app_ms_2}"
SKIP="${SKIP}|${access_log}|${audit_recorder_log}|${jms_log}"
SKIP="${SKIP}|${app_wm_1}|${app_wm_2}"
SKIP="${SKIP}|${admin_server}|${osb_domain}"
find ${VCIE_DIRECTORY}/*/logs -type f -mtime 1 | egrep -v "${SKIP}" | while read file; do
echo Something with ${file}
done
First make sure the code above returns the correct files (should SKIP also contain ${app_ms_3}? Are the wildcards handled correctly?).
Do you need to use the HNAME?
Then you might want to rewrite your code:
if [[ "${GP1}" = *${HNAME}* ]]; then
SKIP="${app_ms_1}|${app_ms_2}"
SKIP="${SKIP}|${access_log}|${audit_recorder_log}|${jms_log}"
# use an array so the brace expansion actually produces the four paths
STARTDIR=( "${VCIE_DIRECTORY}"/app_ms{1..4}/logs/ )
fi
# Something like this also for GP2 and GP3
find "${STARTDIR[@]}" -type f -mtime 1 | egrep -v "${SKIP}" | xargs rm
