How to store --exclude arguments for grep in an environment variable in a bash script

My goal is to search a file-hierarchy for certain text patterns, excluding certain file-name patterns, and recursively copy just the matching files to a local directory named confs. The following script does the job:
#!/bin/bash
export FEXCLUDE="{*edit,*debug,*orig,*BAK,*bak,*fcs,*NOPE,*tomcat,*full.xml,*-ha.xml}";
export SRCDIR=/opt/jboss-as-7.1.1.Final/standalone;
confshow() {
for ii in `grep -rlZ \
--exclude={*edit,*debug,*orig,*BAK,*bak,*fcs,*NOPE,*tomcat,*full.xml,*-ha.xml} \
--exclude-dir={log,tmp,i2b2.war,*.log,*_history,*.old} "<datasource\|username\|password\|user-name" \
$SRCDIR/* | xargs -0 ls {}` ;
do cp --parents $ii confs;
done;
}
However, the exclusion patterns are likely to need frequent updates and may need to be shared with other functions, so I would prefer to have them all in variables declared at the beginning of the script. When I do the following, files that should be excluded get copied to the confs directory:
#!/bin/bash
export FEXCLUDE="{*edit,*debug,*orig,*BAK,*bak,*fcs,*NOPE,*tomcat,*full.xml,*-ha.xml}";
export SRCDIR=/opt/jboss-as-7.1.1.Final/standalone;
confshow() {
for ii in `grep -rlZ \
--exclude=$FEXCLUDE \
--exclude-dir={log,tmp,i2b2.war,*.log,*_history,*.old} "<datasource\|username\|password\|user-name" \
$SRCDIR/* | xargs -0 ls {}` ;
do cp --parents $ii confs;
done;
}
Any idea how to obtain the desired behavior? Or how to see what grep sees when it gets passed the $FEXCLUDE argument (echo doesn't show anything wrong)?
Thanks.

Brace expansion is nice for interactive use, but if you are writing a script, just use your editor to quickly copy the necessary --exclude options and store them in an array. Parameter expansions do not undergo brace expansion, as you may have noticed.
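A quick demonstration of the ordering problem (the pattern string here is illustrative):

```shell
# Brace expansion happens BEFORE parameter expansion, so a brace
# pattern stored in a variable reaches the command as one literal word.
pat='{a,b}'
echo --exclude=$pat     # prints: --exclude={a,b}   (one literal word)
echo --exclude={a,b}    # prints: --exclude=a --exclude=b   (expanded)
```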
#!/bin/bash
# You didn't need to export these anyway, since only your script uses them
FEXCLUDE=( --exclude '*edit'
--exclude '*debug'
# etc
)
DEXCLUDE=( --exclude-dir log
--exclude-dir tmp
# etc
)
SRCDIR=/opt/jboss-as-7.1.1.Final/standalone
confshow() {
while IFS= read -r -d '' ii; do
cp --parents "$ii" confs
done < <( grep -rlZ "${FEXCLUDE[@]}" "${DEXCLUDE[@]}" "<datasource\|username\|password\|user-name" $SRCDIR/* )
}
Also, using ls defeats the purpose of using null-delimited output from grep in the first place.
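For illustration, this is how a NUL-delimited stream (like the output of grep -lZ) is consumed safely; printf fakes the stream here, and the file names are made up:

```shell
# Fake a NUL-separated list of names, one containing a space,
# and read it back with read -r -d '' (empty delimiter = NUL).
printf '%s\0' 'a b.txt' 'c.txt' |
while IFS= read -r -d '' f; do
    echo "got: [$f]"
done
# got: [a b.txt]
# got: [c.txt]
```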

I know this will raise frowns, but it can be solved with eval, and here it may not carry the usual risks, since we are only expanding a glob pattern inside the --exclude= argument.
#!/bin/bash
fexclude='{*edit,*debug,*orig,*BAK,*bak,*fcs,*NOPE,*tomcat,*full.xml,*-ha.xml}'
dexclude='{log,tmp,i2b2.war,*.log,*_history,*.old}'
srcdir=/opt/jboss-as-7.1.1.Final/standalone
confshow() {
eval grep -rlZ \
--exclude="$fexclude" \
--exclude-dir="$dexclude" \
"'<datasource\|username\|password\|user-name'" \
"$srcdir"/* | xargs -0 -I {} cp --parents '{}' confs
}
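To see why eval makes this work: the brace string only undergoes brace expansion when the command line is re-parsed (the variable contents here are illustrative):

```shell
pat='--exclude={a,b}'
echo $pat          # --exclude={a,b}         : still one literal word
eval echo "$pat"   # --exclude=a --exclude=b : eval re-parses, braces expand
```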

Related

Concatenate (using bash) all file names in subdirectories with option

I have directory work_dir, and there are some subdirectories inside. And inside subdirectories there are zip archives. I can see all zip archives in terminal:
find . -name *.zip
The output:
./folder2/sub/dir/test2.zip
./folder3/test3.zip
./folder1/sub/dir/new/test1.zip
Now I want to concatenate all these file names into a single row with some option. For example, I want this single row:
my_command -f ./folder2/sub/dir/test2.zip -f ./folder3/test3.zip -f ./folder1/sub/dir/new/test1.zip -u user1 -p pswd1
In this example:
my_command is some command
-f the option
-u user1 another option with value
-p pswd1 another option with value
Can you help me, please? How can I do this in Linux bash?
One way is: (updated per @M. Nejat Aydin's comments)
find . -name "*.zip" -print0 | xargs -0 -n1 printf -- '-f\0%s\0' | xargs -0 -n100000 my_command -u user1 -p pswd1
Note that the -n100000 parameter forces all output of the previous xargs onto the same command line, on the assumption that the number of findings will be fewer than 100000.
I used null terminated versions (notice: -0 flag, -print0) because file names can contain spaces.
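To trace what the pipeline builds, substitute echo for my_command and feed it two made-up names (one with a space):

```shell
# Each name becomes a NUL-separated '-f', '<name>' pair, then all
# pairs land on a single command line.
printf '%s\0' './a.zip' './b c.zip' |
xargs -0 -n1 printf -- '-f\0%s\0' |
xargs -0 echo my_command -u user1 -p pswd1
# my_command -u user1 -p pswd1 -f ./a.zip -f ./b c.zip
```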
This is a bash script that should do what you wanted.
#!/usr/bin/env bash
user=user1
passwd=pswd1
while IFS= read -rd '' files; do
args+=(-f "$files")
done < <(find . -name '*.zip' -print0)
args=("${args[@]}" -u "$user" -p "$passwd")
##: Just for the human eye to see the output,
##: change this line of code according to the comment below.
printf 'mycommand %s\n' "${args[*]}"
The output should be in one-line, like what you wanted, but do change the last line from
printf 'mycommand %s\n' "${args[*]}"
into
mycommand "${args[@]}"
If you actually want to execute mycommand with the arguments.
Change the value of user and passwd too.
A while + read loop was used with IFS.
See How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?
Why the last line should be changed:
See Arguments
Shell quoting is a basic but common mistake when dealing with spaces in file/path name.
See How can I find and safely handle file names containing
Also see the find command/utility.
The construct "${args[@]}" expands to all elements of the array.
See Array1 Array2 Array3
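The difference between [*] and [@] that makes that last line matter can be seen directly (the array contents here are illustrative):

```shell
args=(-f 'a b.zip' -u user1)
printf '<%s>' "${args[@]}"; echo   # <-f><a b.zip><-u><user1> : one word per element
printf '<%s>' "${args[*]}"; echo   # <-f a b.zip -u user1>    : a single joined word
```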
You can do this by making a bash script.
Make a new file called whatever.sh
Type chmod +x ./whatever.sh so it becomes executable on the terminal
Add the BASH scripting as shown below..
#!/bin/bash
# Get all the zip files from your FolderName
files="`find ./FolderName -name '*.zip'`"
# Loop through the files and build your args
arg=""
for file in $files; do
arg="$arg -f $file"
done
# Run your command
mycommand $arg -u user1 -p pswd1
Note that this relies on word splitting, so it will break on file names containing spaces.

I want to delete all files via shell except a few, but it results in Syntax error: "(" unexpected

rm -rf * ! ( "update.sh" | "new_update" ) #or
rm -rf ! ( "update.sh" | "new_update" ) #or
rm -rf ! ( update.sh | new_update ) #or
I want to delete all files except update.sh and new_update.
I have tried all of the above lines one by one in a shell script, but each returns the error
unexpected token (
When run directly in the terminal, it sometimes executes and sometimes gives the same error as above.
First, you have to enable extended globbing with
shopt -s extglob
Second, you can't have spaces between the parts of the wildcard:
rm -rf !(update.sh|new_update)
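A minimal dry run in a scratch directory, with echo in place of rm (file names are illustrative):

```shell
cd "$(mktemp -d)"
touch update.sh new_update a.txt b.txt
shopt -s extglob
# !(pattern) matches everything EXCEPT the pattern
echo !(update.sh|new_update)   # a.txt b.txt
```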
You can use grep -v to invert the match, -w for whole words, and -E for extended regex so you can add more files.
rm $( printf '%s\n' * | grep -Ewv "update.sh|new_update" )
I don't guarantee it to work with filenames containing spaces and other crazy characters.
Safer option that requires gnu extensions is:
(read here more details)
printf '%s\0' * | grep -zEv '^(update.sh|new_update)$' | xargs -0 rm --
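Dry run of this null-safe variant in a scratch directory, with echo standing in for rm (note the dots are escaped here so they match literally):

```shell
cd "$(mktemp -d)"
touch update.sh new_update 'a file.txt'
# grep -z treats NUL as the line delimiter, so names with spaces survive
printf '%s\0' * | grep -zEv '^(update\.sh|new_update)$' | xargs -0 echo rm --
# rm -- a file.txt
```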
Edit: as commenters accused ls and xargs of being 'dangerous' (here is why), I keep this only for the record.
ls * | grep -Ev "update.sh|new_update" | xargs rm

Set file modification time from the date string present in the filename

I'm restoring a number of archives with dates within their names, something along the lines of:
user-2018.12.20.tar.xz
user-2019.01.10.tar.xz
user-2019.02.25.tar.xz
user-2019.04.19.tar.xz
...
I want to set each file's modification date to match the date in their filename by piping the filenames to touch via xargs and using replace-str to set the dates.
touch -m -t will take a datetime in the format [CCYYMMDDhhmm], but I'm having trouble substituting inline:
find . -name "*.xz" | xargs -I {} touch -m -t $(sed -e 's/\.tar\.xz//g; s/user-//g; s/\.//g; s/\///g; s/$/0000/g' {}) {}
Returns touch: invalid date format ‘./user-2018.03.22.tar.xz’, even though this:
find . -name "*.xz" | sed -e 's/\.tar\.xz//g; s/user-//g; s/\.//g; s/\///g; s/$/0000/g'
Returns properly-formatted dates, for example 201812200000. Am I misusing command substitution in my replace string somehow?
EDIT : Yes, a simple script could do this no problem. But the question remains...
You don't need find, sed, xargs or any third-party tools; just use the shell's built-in regex capabilities to get the timestamp from the filename:
for file in *.tar.xz; do
[ -f "$file" ] || continue
if [[ $file =~ ^user-([[:digit:]]+)\.([[:digit:]]+)\.([[:digit:]]+)\.tar\.xz$ ]]; then
dateStr="${BASH_REMATCH[1]}${BASH_REMATCH[2]}${BASH_REMATCH[3]}0000"
touch -m -t "$dateStr" "$file"
fi
done
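The regex capture can be checked on a single name (with the dots escaped so they only match literal dots):

```shell
file=user-2018.12.20.tar.xz
if [[ $file =~ ^user-([0-9]+)\.([0-9]+)\.([0-9]+)\.tar\.xz$ ]]; then
    # Concatenate the captured year, month, and day, then append hhmm
    echo "${BASH_REMATCH[1]}${BASH_REMATCH[2]}${BASH_REMATCH[3]}0000"   # 201812200000
fi
```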
The problem is that the command substitution will be evaluated once when you call xargs, not for each argument. You would need to spawn a shell for that:
find . -name "*.xz" \
| xargs -I {} bash -c 'touch -m --date "$(sed -e "s/\.tar\.xz//;s/user-//g; s/\.//g; s/\///g;" <<< "$1")" "$1"' -- {}
Note: xargs is not needed because you can use the -exec option of find:
find . -name "*.xz" -exec bash -c 'touch -m --date "$(sed -e "s/\.tar\.xz//;s/user-//g; s/\.//g; s/\///g;" <<< "$1")" "$1"' -- {} \;
PS: A small for loop would be more readable:
for file in user-*.tar.xz ; do
# remove prefix and suffix
date=${file#user-}
date=${date%.tar.xz}
# replace dots by /
date=${date//./\/}
touch -m --date "${date}" "${file}"
done
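Tracing the three parameter expansions on one name:

```shell
file=user-2019.01.10.tar.xz
date=${file#user-}     # 2019.01.10.tar.xz : strip shortest prefix 'user-'
date=${date%.tar.xz}   # 2019.01.10        : strip suffix '.tar.xz'
date=${date//./\/}     # 2019/01/10        : replace every dot with a slash
echo "$date"
```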
This might work for you (GNU parallel):
parallel --dryrun touch -m --date '{= s/[^0-9]//g =}' {} ::: *.xz
When happy that the commands are correct, then remove the --dryrun option.
Alternative:
parallel touch -m --date '{= s/user-//;s/\.tar\.xz//;s/\.//g =}' {} ::: *.xz

Deleting all files except ones mentioned in config file

Situation:
I need a bash script that deletes all files in the current folder, except all the files mentioned in a file called ".rmignore". This file may contain addresses relative to the current folder, that might also contain asterisks(*). For example:
1.php
2/1.php
1/*.php
What I've tried:
I tried to use GLOBIGNORE but that didn't work well.
I also tried to use find with grep, like follows:
find . | grep -Fxv $(echo $(cat .rmignore) | tr ' ' "\n")
It is considered bad practice to pipe the output of find to another command. You can use -exec or -execdir followed by the command, with '{}' as a placeholder for the file and ';' to indicate the end of your command. You can also use '+' instead of ';' to batch many files into a single command invocation.
In your case, you want to list all the content of a directory and remove files one by one.
#!/usr/bin/env bash
set -o nounset
set -o errexit
shopt -s nullglob # allows glob to expand to nothing if no match
shopt -s globstar # process recursively current directory
my:rm_all() {
local ignore_file=".rmignore"
local ignore_array=()
while read -r glob; # Generate files list
do
ignore_array+=(${glob});
done < "${ignore_file}"
echo "${ignore_array[@]}"
for file in **; # iterate over all the content of the current directory
do
if [ -f "${file}" ]; # file exist and is file
then
local do_rmfile=true;
# Remove only if matches regex
for ignore in "${ignore_array[@]}"; # Iterate over files to keep
do
[[ "${file}" == "${ignore}" ]] && do_rmfile=false; #rm ${file};
done
${do_rmfile} && echo "Removing ${file}"
fi
done
}
my:rm_all;
If we assume that none of the files in .rmignore contain newlines in their name, the following might suffice:
# Gather our exclusions...
mapfile -t excl < .rmignore
# Reverse the array (put data in indexes)
declare -A arr=()
for file in "${excl[@]}"; do arr[$file]=1; done
# Walk through files, deleting anything that's not in the associative array.
shopt -s globstar
for file in **; do
[ -n "${arr[$file]}" ] && continue
echo rm -fv "$file"
done
Note: untested. :-) Also, associative arrays were introduced with Bash 4.
An alternate method might be to populate an array with the whole file list, then remove the exclusions. This might be impractical if you're dealing with hundreds of thousands of files.
shopt -s globstar
declare -A filelist=()
# Build a list of all files...
for file in **; do filelist[$file]=1; done
# Remove files to be ignored.
while read -r file; do unset filelist[$file]; done < .rmignore
# And .. delete.
echo rm -v "${!filelist[@]}"
Also untested.
Warning: rm at your own risk. May contain nuts. Keep backups.
I note that neither of these solutions will handle wildcards in your .rmignore file. For that, you might need some extra processing...
shopt -s globstar
declare -A filelist=()
# Build a list...
for file in **; do filelist[$file]=1; done
# Remove PATTERNS...
while read -r glob; do
for file in $glob; do
unset filelist[$file]
done
done < .rmignore
# And remove whatever's left.
echo rm -v "${!filelist[@]}"
And .. you guessed it. Untested. This depends on $glob expanding as a glob.
Lastly, if you want a heavier-weight solution, you can use find and grep:
find . -type f -not -exec grep -q -f '{}' .rmignore \; -delete
This runs a grep for EACH file being considered. And it's not a bash solution, it only relies on find which is pretty universal.
Note that ALL of these solutions are at risk of errors if you have files that contain newlines.
This line does the listing part of the job (note that -F matches the patterns literally, so wildcards in .rmignore are not honored):
find . -type f | grep -vFf .rmignore
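A dry run in a scratch directory (file names are illustrative); note that .rmignore itself also survives the filter, and the pipeline only lists, it does not delete:

```shell
cd "$(mktemp -d)"
printf '%s\n' keep.php > .rmignore
touch keep.php delete.php
# -F: patterns from .rmignore are matched as literal substrings
find . -type f | grep -vFf .rmignore | LC_ALL=C sort
```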
If you have rsync, you might be able to copy an empty directory to the target one, with suitable rsync ignore files. Try it first with -n, to see what it will attempt, before running it for real!
This is another bash solution that seems to work ok in my tests:
while read -r line;do
exclude+=$(find . -type f -path "./$line")$'\n'
done <.rmignore
echo "ignored files:"
printf '%s\n' "$exclude"
echo "files to be deleted"
echo rm $(LC_ALL=C sort <(find . -type f) <(printf '%s\n' "$exclude") |uniq -u ) #intentionally non quoted to remove new lines
Alternatively, you may want to look at the simplest form (beware: this only spares the .rmignore file itself, not the files listed in it, and breaks on names with spaces):
rm $(ls -1 | grep -v .rmignore)

How do I search for a file based on what is output by a command running on that file

I am working on a project for one of my professors, and he asked me to sort a couple hundred .fits images based on their header files (specifically, what star they are images of). I think that grep would be the best way to do this; however, I can't seem to figure out how to use grep on the header.
I am entering:
ls | imhead *.fits | grep -E -r "PG\ 1104+243" *
to just list them out for now, once they are listed I know how to copy them into a directory.
I am new to using grep, so I am unsure where my error lies. Any help would be greatly appreciated! Thanks!
Assuming that imhead will extract the headers of the .fits files as text, you can use a simple shell script to do it:
script.sh
#!/bin/bash
grep "$1" "$2" > /dev/null 2>&1 && echo "$2"
Note that the + is a special character if you use extended regular expression, meaning if you pass the -E as in the question. A simple grep without any options should do the trick here.
Use find to exec the script on every *.fits file in the current folder:
find -maxdepth 1 -name '*.fits' -exec ./script.sh 'PG 1104+243' {} \;
If you are going to copy/move/alter or do something with the files you find, you might be better off, in terms of complexity and ease of quoting, using a loop like this:
#!/bin/bash
find . -name \*.fits -print0 | while read -d '' -r file; do
echo "Checking file: $file"
imhead "$file" | grep -q 'PG 1104+243'
if [ $? -eq 0 ]; then
echo "Object matches: $file"
fi
done