BASH: grep doesn't work in shell script but echo shows correct command and it works on command line - linux

I need to write a script that checks some >20k files for some >2k search text and it needs to be flexible, so I came up with this script:
#!/bin/bash
# This script checks all files in a given directory against a list of criteria
shopt -s expand_aliases
source ~/.bashrc
TIMESTAMP=$(date "+%Y-%m-%d-%T")
ROOT_DIR=/data
PROJECT_NAME=$1
FILE_DIR=$ROOT_DIR/projects/$1/$2
RESULT_DIR=$ROOT_DIR/projects/$1/check_result
SEARCHTEXT_FILE=$ROOT_DIR/scripts/$3
OIFS="$IFS"
IFS=$'\n'
files=$(find $FILE_DIR -type f -name '*.json')
for file in $files; do
while read line; do
grep -H -o $line "$file" >> $RESULT_DIR/check_result_$TIMESTAMP.log
done < $SEARCHTEXT_FILE
done
IFS="$OIFS"
This script only produces the empty $RESULT_DIR/check_result_$TIMESTAMP.log log file with correct name.
Because the file names sometimes contain spaces I added the IFS... statements and I enclosed $file in " quotes (copied from another post).
The content of the $SEARCHTEXT_FILE is for example:
'Tel alt........'
'City ..........'
If I place an echo before the grep like this
echo grep -H -o $line "$file"
then output I get is
grep -H -o 'Tel alt........' /data/projects/DNAR/input/report-157538.json
and I can execute this line as is and get the correct result.
I tried to put various combinations of " or ' or ` or () or {} around any part of this grep command but nothing changed.
Somewhere I did read about alias and the alias set for grep is
alias grep='grep --color=auto'
After many hours of searching on the internet I couldn't find any post that helped me as most of them are covering issues around wrong quotes or inline bash issues.
What am I missing here?

The simple and obvious workaround is to remove all that complexity and simply use the features of the commands you are running anyway.
find "$FILE_DIR" -type f -name '*.json' \
-exec grep -H -o -f "$SEARCHTEXT_FILE" {} + > "$RESULT_DIR/check_result_$TIMESTAMP.log"
Notice also the quoting fixes; see When to wrap quotes around a shell variable; to avoid mishaps, you should switch to lower case for your private variables (see Correct Bash and shell script variable capitalization).
The lines shopt -s expand_aliases and source ~/.bashrc merely look superfluous, but could contribute to whatever problem you are trying to troubleshoot; they should basically never be part of a script you plan to use in production.
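A likely explanation for the empty log, assuming the search text file really contains the literal single quotes shown in the question: quotes read from a file are data, not shell syntax, so grep searches for text that includes the apostrophes. On the command line the shell removes them before grep runs. The sketch below uses made-up file names and sample data in a temporary directory to show the difference.

```shell
#!/bin/bash
# Sketch: literal quotes stored in a pattern file are part of the pattern.
tmp=$(mktemp -d)
printf '%s\n' 'Tel alt: 555-0100' > "$tmp/report.json"            # made-up sample data
printf '%s\n' "'Tel alt........'" > "$tmp/patterns_quoted.txt"    # as in the question
printf '%s\n' 'Tel alt........' > "$tmp/patterns_plain.txt"       # quotes removed

# The quoted pattern needs an apostrophe in the data, so nothing matches.
quoted_hits=$(grep -c -f "$tmp/patterns_quoted.txt" "$tmp/report.json")
# The plain pattern matches as a regular expression (each dot is any character).
plain_hits=$(grep -c -f "$tmp/patterns_plain.txt" "$tmp/report.json")

echo "quoted: $quoted_hits, plain: $plain_hits"
rm -rf "$tmp"
```

If that is the cause, removing the quotes from the search text file should be enough for grep -f to find the matches.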

Related

Concatenate (using bash) all file names in subdirectories with option

I have directory work_dir, and there are some subdirectories inside. And inside subdirectories there are zip archives. I can see all zip archives in terminal:
find . -name *.zip
The output:
./folder2/sub/dir/test2.zip
./folder3/test3.zip
./folder1/sub/dir/new/test1.zip
Now I want to concatenate all these file names into a single line, with an option before each one. For example, I want this single line:
my_command -f ./folder2/sub/dir/test2.zip -f ./folder3/test3.zip -f ./folder1/sub/dir/new/test1.zip -u user1 -p pswd1
In this example:
my_command is some command
-f the option
-u user1 another option with value
-p pswd1 another option with value
Can you help me please, how can I do this in Linux BASH ?
One way is (updated per @M. Nejat Aydin's comments):
find . -name "*.zip" -print0 | xargs -0 -n1 printf -- '-f\0%s\0' | xargs -0 -n100000 my_command -u user1 -p pswd1
Note that the -n100000 parameter makes the second xargs pass all output of the first xargs to a single my_command invocation, on the assumption that there are fewer than 100000 arguments (each file found contributes two: -f and the name).
I used null terminated versions (notice: -0 flag, -print0) because file names can contain spaces.
This is a bash script that should do what you wanted.
#!/usr/bin/env bash
user=user1
passwd=pswd1
args=()
while IFS= read -rd '' file; do
args+=(-f "$file")
done < <(find . -name '*.zip' -print0)
args+=(-u "$user" -p "$passwd")
##: Just for the human eye to see the output,
##: change this line of code according to the comment below.
printf 'mycommand %s\n' "${args[*]}"
The output should be on one line, as you wanted. To actually execute mycommand with the arguments, change the last line from
printf 'mycommand %s\n' "${args[*]}"
into
mycommand "${args[@]}"
Change the values of user and passwd too.
A while + read loop with IFS= was used so that file names are read unmodified.
See How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?
For why the last line should be changed, see Arguments.
Incorrect shell quoting is a basic but common mistake when dealing with spaces in file and path names.
See How can I find and safely handle file names containing
Also see the find command/utility.
The construct "${args[@]}" expands an array to one word per element.
See Array1 Array2 Array3
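The difference between the two array expansions can be checked with a small sketch. The helper count_args and the sample file names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: count how many arguments each array expansion produces.
count_args() { echo "$#"; }

args=(-f "./my dir/a.zip" -f "./b.zip")

at_count=$(count_args "${args[@]}")     # one word per element
star_count=$(count_args "${args[*]}")   # all elements joined into one word

echo "@ gives $at_count arguments, * gives $star_count"
```

This is why "${args[*]}" is fine for printing a preview but "${args[@]}" is the form to use when actually running the command.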
You can do this by making a bash script.
Make a new file called whatever.sh
Type chmod +x ./whatever.sh so it becomes executable on the terminal
Add the BASH scripting as shown below..
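#!/bin/bash
# Get all the zip files from your FolderName
# (quote the pattern so the shell does not expand it before find sees it)
files=$(find ./FolderName -name '*.zip')
# Loop through the files and build your args
# Note: this splits on whitespace, so it only works for file names without spaces
arg=""
for file in $files; do
arg="$arg -f $file"
done
# Run your command
mycommand $arg -u user1 -p pswd1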

Deleting all files except ones mentioned in config file

Situation:
I need a bash script that deletes all files in the current folder, except all the files mentioned in a file called ".rmignore". This file may contain addresses relative to the current folder, that might also contain asterisks(*). For example:
1.php
2/1.php
1/*.php
What I've tried:
I tried to use GLOBIGNORE but that didn't work well.
I also tried to use find with grep, like follows:
find . | grep -Fxv $(echo $(cat .rmignore) | tr ' ' "\n")
It is often considered bad practice to pipe the output of find to another command, because file names with unusual characters can be mangled along the way. You can use -exec or -execdir followed by the command, with '{}' as a placeholder for the file and ';' to indicate the end of your command. Ending the command with '+' instead of ';' passes many file names to a single invocation of the command.
In your case, you want to list all the content of a directory and remove files one by one.
#!/usr/bin/env bash
set -o nounset
set -o errexit
shopt -s nullglob # allows glob to expand to nothing if no match
shopt -s globstar # process recursively current directory
my:rm_all() {
local ignore_file=".rmignore"
local ignore_array=()
while read -r glob; # Generate files list
do
ignore_array+=(${glob});
done < "${ignore_file}"
echo "${ignore_array[@]}"
for file in **; # iterate over all the content of the current directory
do
if [ -f "${file}" ]; # file exist and is file
then
local do_rmfile=true;
# Keep the file if it equals an entry in the ignore list
for ignore in "${ignore_array[@]}"; # Iterate over files to keep
do
[[ "${file}" == "${ignore}" ]] && do_rmfile=false; #rm ${file};
done
${do_rmfile} && echo "Removing ${file}"
fi
done
}
my:rm_all;
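The two ways of terminating -exec mentioned in this answer behave differently, which a short sketch can show. It works in a temporary directory with made-up file names and only echoes, so nothing is deleted.

```shell
#!/usr/bin/env bash
# Sketch: compare -exec ... \; with -exec ... + by counting output lines.
tmp=$(mktemp -d)
touch "$tmp/a.txt" "$tmp/b.txt" "$tmp/c.txt"

# ';' runs the command once per file: three invocations, three lines.
per_file=$(find "$tmp" -type f -exec echo found {} \; | wc -l)
# '+' appends many file names to one invocation: a single line here.
batched=$(find "$tmp" -type f -exec echo found {} + | wc -l)

echo "per file: $per_file, batched: $batched"
rm -rf "$tmp"
```

The '+' form is usually faster for commands that accept many file arguments, while ';' is needed when the command must see exactly one file at a time.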
If we assume that none of the files in .rmignore contain newlines in their name, the following might suffice:
# Gather our exclusions...
mapfile -t excl < .rmignore
# Reverse the array (put data in indexes)
declare -A arr=()
for file in "${excl[@]}"; do arr[$file]=1; done
# Walk through files, deleting anything that's not in the associative array.
shopt -s globstar
for file in **; do
[ -n "${arr[$file]}" ] && continue
echo rm -fv "$file"
done
Note: untested. :-) Also, associative arrays were introduced with Bash 4.
An alternate method might be to populate an array with the whole file list, then remove the exclusions. This might be impractical if you're dealing with hundreds of thousands of files.
shopt -s globstar
declare -A filelist=()
# Build a list of all files...
for file in **; do filelist[$file]=1; done
# Remove files to be ignored.
while read -r file; do unset filelist[$file]; done < .rmignore
# Annd .. delete.
echo rm -v "${!filelist[@]}"
Also untested.
Warning: rm at your own risk. May contain nuts. Keep backups.
I note that neither of these solutions will handle wildcards in your .rmignore file. For that, you might need some extra processing...
shopt -s globstar
declare -A filelist=()
# Build a list...
for file in **; do filelist[$file]=1; done
# Remove PATTERNS...
while read -r glob; do
for file in $glob; do
unset filelist[$file]
done
done < .rmignore
# And remove whatever's left.
echo rm -v "${!filelist[@]}"
And, you guessed it, untested. This depends on $glob expanding as a glob in the for loop.
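As a variation on the wildcard handling, the patterns can be compared against each path with bash pattern matching instead of being expanded on the file system. This is a different technique from the loop in this answer, sketched here under the assumption that the patterns are plain shell globs; is_ignored is a hypothetical helper name, and the script only prints what it would do.

```shell
#!/usr/bin/env bash
# Sketch: decide whether a path is protected by any pattern in an ignore list.
# is_ignored is a hypothetical helper, not taken from the answers.
is_ignored() {
  local path=$1 pattern
  shift
  for pattern in "$@"; do
    # Unquoted right-hand side: bash treats it as a glob pattern.
    # Unlike pathname expansion, '*' here also matches '/'.
    [[ $path == $pattern ]] && return 0
  done
  return 1
}

patterns=('1.php' '2/1.php' '1/*.php')   # sample .rmignore content from the question

for candidate in 1.php 2/2.php 1/index.php; do
  if is_ignored "$candidate" "${patterns[@]}"; then
    echo "keep   $candidate"
  else
    echo "delete $candidate"
  fi
done
```

In a real script the patterns array could be filled with mapfile -t patterns < .rmignore, and the candidates would come from a globstar loop such as for file in **.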
Lastly, if you want a heavier-weight solution, you can use find and grep:
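find . -type f -not -exec sh -c 'grep -qxF -- "${1#./}" .rmignore' sh {} \; -delete
This runs a grep for EACH file being considered, checking whether its path (without the leading ./) appears as a whole line in .rmignore. It is not a bash solution; it relies only on find and grep, which are pretty universal.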
Note that ALL of these solutions are at risk of errors if you have files that contain newlines.
This line does the job of listing the files to delete (every file whose path does not contain an entry of .rmignore):
find . -type f | grep -vFf .rmignore
If you have rsync, you might be able to copy an empty directory to the target one, with suitable rsync ignore files. Try it first with -n, to see what it will attempt, before running it for real!
This is another bash solution that seems to work ok in my tests:
while read -r line;do
exclude+=$(find . -type f -path "./$line")$'\n'
done <.rmignore
echo "ignored files:"
printf '%s\n' "$exclude"
echo "files to be deleted"
echo rm $(LC_ALL=C sort <(find . -type f) <(printf '%s\n' "$exclude") |uniq -u ) #intentionally non quoted to remove new lines
Alternatively, for literal names in the current directory only (no paths, no wildcards, no names containing whitespace), the simplest form is:
rm $(ls -1 | grep -vxFf .rmignore)

How do I search for a file based on what is output by a command running on that file

I am working on a project for one of my professors, and he asked me to sort a couple hundred .fits images based on their header files (specifically, what star they are images of). I think that grep would be the best way to do this; however, I can't seem to figure out how to use grep based on the header.
I am entering:
ls | imhead *.fits | grep -E -r "PG\ 1104+243" *
to just list them out for now; once they are listed I know how to copy them into a directory.
I am new to using grep, so I am unsure where my error lies. Any help would be greatly appreciated! Thanks!
Assuming that imhead will extract the headers of the .fits files as text, you can use a simple shell script to do it:
script.sh
#!/bin/bash
grep "$1" "$2" > /dev/null 2>&1 && echo "$2"
Note that the + is a special character if you use extended regular expression, meaning if you pass the -E as in the question. A simple grep without any options should do the trick here.
Use find to exec the script on every *.fits file in the current folder:
find -maxdepth 1 -name '*.fits' -exec ./script.sh 'PG 1104+243' {} \;
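The remark about + in this answer can be checked without any FITS files. The sketch below uses a made-up header line and compares basic, extended, and fixed-string matching.

```shell
#!/bin/bash
# Sketch: how '+' in the search text behaves under different grep modes.
header='OBJECT  = PG 1104+243'   # made-up header line for illustration

# Basic regular expression: '+' is an ordinary character, so this matches.
printf '%s\n' "$header" | grep -q 'PG 1104+243' && basic=match || basic=nomatch
# Extended regular expression: '4+' means "one or more 4", so the literal '+' is not matched.
printf '%s\n' "$header" | grep -qE 'PG 1104+243' && extended=match || extended=nomatch
# Fixed string: no character is special.
printf '%s\n' "$header" | grep -qF 'PG 1104+243' && fixed=match || fixed=nomatch

echo "basic=$basic extended=$extended fixed=$fixed"
```

For a literal object name like this one, grep -F is arguably the safest choice, since it stays correct even if the name contains dots or other regex characters.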
If you are going to copy/move/alter or do something with the files you find, you might be better off, in terms of complexity and ease of quoting, using a loop like this:
#!/bin/bash
find . -name '*.fits' -print0 | while IFS= read -d '' -r file; do
echo "Checking file: $file"
# grep -q prints nothing; its exit status tells us whether the header matched
if imhead "$file" | grep -q 'PG 1104+243'; then
echo "Object matches: $file"
fi
done

Bash Script Variable

#!/bin/bash
RESULT=$(grep -i -e "\.[a-zA-z]\{3\}$" ./test.txt)
for i in $(RESULT);
do
echo "$i"
FILENAME="$(dirname $RESULT)"
done
I have a problem with the line FILENAME="$(dirname $RESULT)". Running the script in debugging mode (bash -x script-name), the output is:
test.sh: line 9: RESULT: command not found
For some reason, it can't take the value of the variable RESULT and save the output of the dirname command to the new variable FILENAME. I can't understand why this happens.
After lots of tries, I found a way to save the full path of the file name and the file name itself to two different variables.
Now, for each file name, I want to find the file regardless of case. For example, when looking for the file image.png, it doesn't matter if the file is image.PNG.
I am running the script
while read -r name; do
echo "$name"
FILENAME="$(dirname $name)"
BASENAME="$(basename $name)"
done < <(grep -i -e "\.[a-zA-z]\{3\}$" ./test.txt)
and then enter the command:
find . $FILENAME -iname $BASENAME
but it says command FILENAME and BASENAME not found.
The syntax:
$(RESULT)
denotes command substitution. Saying so would attempt to run the command RESULT.
In order to substitute the result of the variable RESULT, say:
${RESULT}
instead.
Moreover, if the command returns more than one line of output this approach wouldn't work.
Instead say:
while read -r name; do
echo "$name"
FILENAME="$(dirname "$name")"
done < <(grep -i -e "\.[a-zA-Z]\{3\}$" ./test.txt)
The <(command) syntax is referred to as Process Substitution.
for i in $(RESULT) isn't right. You can use $RESULT or ${RESULT} instead.
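For the follow-up about finding each file regardless of case, one possible sketch is to run find inside the loop, while the directory and base name of the current line are still available. The temporary directory, the sample file, and the lower-case variable names are assumptions for illustration.

```shell
#!/bin/bash
# Sketch: look up each listed file case-insensitively with find -iname.
tmp=$(mktemp -d)
mkdir -p "$tmp/pics"
touch "$tmp/pics/image.PNG"                             # file on disk, upper-case extension
printf '%s\n' "$tmp/pics/image.png" > "$tmp/test.txt"   # list names it in lower case

found=()
while IFS= read -r name; do
  dir=$(dirname "$name")     # directory part of the listed path
  base=$(basename "$name")   # file name part
  # -iname compares the file name case-insensitively.
  while IFS= read -r -d '' hit; do
    found+=("$hit")
  done < <(find "$dir" -maxdepth 1 -iname "$base" -print0)
done < <(grep -i -e '\.[a-zA-Z]\{3\}$' "$tmp/test.txt")

printf '%s\n' "${found[@]}"
rm -rf "$tmp"
```

Running find after the loop would only see the values from the last line read, which is why the lookup sits inside the loop body here.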

Escaping space in bash script

I am trying to make a script that passes all files ending with .hash to md5deep for verification. Files with spaces in their names seem to break this script.
#!/bin/bash
XVAR=""
for f in *.hash
do
XVAR="$XVAR -x $f "
done
md5deep -e $XVAR -r *
Whenever I run the script with a file called "O S.hash" I get
O: No such file or directory
If I change XVAR="$XVAR -x $f " to XVAR="$XVAR -x \'$f\' " or XVAR="$XVAR -x \"$f\" ",
md5deep interprets the input as "O instead:
"O: No such file or directory
An echo of the variable in the script shows XVAR as -x 'O S.hash' or -x "O S.hash".
Entering the command manually in the shell, such as md5deep -e -x "O S.hash" -r *, works, but in the script the command seems to break.
This is not the nicest solution, but it seems it will work:
find . -name '*.hash' -printf "-x\0%p\0" | xargs -0 md5deep -r * -e
This actually doesn't do exactly the same as the OP wanted, so here's a modification as suggested by Tim Pote and Jonathan Leffler:
find . -maxdepth 1 -name '*.hash' -printf "-x\0%p\0" | xargs -0 md5deep -r * -e
Now you know why people on Unix systems traditionally avoided file names with spaces in them (and directory names likewise) — it is a nuisance (to be polite about it) to have to program the shell to handle such names. The shell was designed for use in systems without such names. Newlines also cause much grief.
With bash, your best solution by far is to use an array to hold the elements, and then "${array[@]}" to list them; it is almost trivial:
declare -a XVAR
for file in *.hash
do
XVAR+=("-x" "$file")
done
md5deep -e "${XVAR[@]}" -r *
(Exploiting the array extension notation mentioned by Gordon Davisson. See section §6.7 'Arrays' of the bash reference manual (for Bash 4.1) for a lot of array information; see section §3.4 'Shell Parameters' for the += operator.)
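To see why the array works where the string does not, the sketch below prints each argument in angle brackets. The helper show_args is made up for illustration; the file name is the one from the question.

```shell
#!/bin/bash
# Sketch: compare how a string and an array are split into arguments.
show_args() { printf '<%s>' "$@"; echo; }

file='O S.hash'

as_string="-x $file"
as_array=(-x "$file")

string_out=$(show_args $as_string)        # unquoted: split at every space
array_out=$(show_args "${as_array[@]}")   # one argument per array element

echo "string: $string_out"
echo "array:  $array_out"
```

Quotes embedded in the string do not help, because word splitting happens after quote processing; the embedded quote characters are passed along literally, which matches the "O error in the question.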
If you can't use arrays for some reason, then you need a program that escapes its arguments so that the shell won't distort things. I have such a program, called escape:
XVAR=
for file in *.hash
do
name=$(escape "$file")
XVAR="$XVAR -x $name"
done
eval md5deep -e $XVAR -r *
With the eval, it is tricky to use; it works, but use arrays.

Resources