Search for multiple patterns in multiple files

This is related to Function to search of multiple patterns using grep.
I want to search multiple files for multiple patterns using a command such as the following:
myscript *.txt pattern1 pattern2 pattern3
I tried implementing the code from the previous question, but it does not work with wildcards. For example, the following does not work:
#!/bin/bash
ARGS=$#
if [ $ARGS -lt 2 ]
then
echo "You entered only $ARGS arguments- at least 2 are needed."
exit
fi
search() {
if [ $# -gt 0 ]
then
local pat=$1
shift
grep -i "$pat" | search "$@"
else
cat
fi
}
for VAR in $1
do
file=$VAR
shift
cat "$file" | search "$@"
done
How can I create a script that searches multiple files (taken from the first argument) for multiple patterns (taken from the remaining arguments)?

Did you try to use find and sed?
find . -name '*.txt' -exec sed -n -e '/pattern1/p' -e '/pattern2/p' '{}' ';'
The -n option makes sure sed does not print the whole file, and the p command prints the matching lines. Finally, find locates all the files you need.
EDIT:
If you want to put that in a script to generate the sed command, you can use this trick.
EDIT 2:
As @shellter said, it is usually better to use options. As your script is written, *.txt will be expanded by bash before the script runs; to avoid that, you'll need to quote the first argument.
As usual, there are several solutions to your problem:
Solution 1 (Using bash built-in):
#! /usr/bin/env bash
set -o nounset # Throw error if variable not set
set -o errexit # Exit if error is thrown
work_dir=$PWD # directory to search from
# Reading the command line
files_pattern=${1:-}; # Save first argument as files pattern.
shift 1; # Move $1 to next argument (and propagate such as $n gets $n+1)
echo "==> Files to search follow pattern: ${files_pattern}"
_len=$#; #save the number of arguments.
for (( i=0; i<$_len; i=$i+1 )); # Go through the search patterns.
do
search_patterns[$i]=$1; # store the next search pattern
shift 1; # move $1 to next pattern.
echo "==> New search pattern #$i: ${search_patterns[$i]}"
done
while read -r file; # Go through all the matching files
do
echo "==> In file: ${file}"
while read -r line; # Go through all the lines in the file
do
for regex in "${search_patterns[@]}"; # iterate through patterns
do
[[ "${line}" =~ $regex ]] && echo "${line}";
done
done < "${file}"
done < <(find "$work_dir" -iname "$files_pattern" -print) # find all the files matching files_pattern
Solution 2 (using grep):
#! /usr/bin/env bash
set -o nounset # Throw error if variable not set
set -o errexit # Exit if error is thrown
work_dir=$PWD # directory to search from
# Reading the command line
files_pattern=${1:-}; # Save first argument as files pattern.
shift 1; # Move $1 to next argument (and propagate such as $n gets $n+1)
echo "==> Files to search follow pattern: ${files_pattern}"
while [ $# -gt 0 ]; # Go through the search patterns.
do
search_patterns+="$1"; # store the next search pattern
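shift 1; # move $1 to next pattern.
[ $# -gt 0 ] && search_patterns+="|" # Add the OR operator
done
echo "==> Search patterns: ${search_patterns}"
# --include (GNU grep) restricts the recursive search to files matching files_pattern
grep -EiR --include="${files_pattern}" -- "(${search_patterns})" "${work_dir}"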
Solution 3 (Using sed):
#! /usr/bin/env bash
set -o nounset # Throw error if variable not set
set -o errexit # Exit if error is thrown
work_dir=$PWD # directory to search from
# Reading the command line
files_pattern=${1:-}; # Save first argument as files pattern.
shift 1; # Move $1 to next argument (and propagate such as $n gets $n+1)
echo "==> Files to search follow pattern: ${files_pattern}"
while [ $# -gt 0 ]; # Go through the search patterns.
do
search_patterns+="/$1/p;"; # store the next search pattern
shift 1; # move $1 to next pattern.
[ $# -gt 0 ] && search_patterns+=" " # Separate the sed commands with a space
done
echo "==> Search patterns: ${search_patterns}"
# Will print file names, and then matching lines
find "$work_dir" -iname "$files_pattern" -print -exec sed -n "${search_patterns}" '{}' ';'
I am sure there are plenty of other ways to tweak or solve this problem, but this should get you started.
Good Luck!
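The solutions above print a line when it matches any of the patterns, whereas the pipeline of greps in the question keeps only lines that match all of them. Here is a minimal sketch of that all-patterns variant, assuming the file glob is passed quoted (for example '*.txt') so that the function, not the calling shell, expands it. The names filter_all and search_files are hypothetical and do not come from the posts above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: copy stdin to stdout, keeping only lines that match
# ALL of the given patterns (case-insensitively), one grep per pattern.
filter_all() {
    if [ "$#" -eq 0 ]; then
        cat
    else
        local pat=$1
        shift
        grep -i -- "$pat" | filter_all "$@"
    fi
}

# Hypothetical wrapper: $1 is a quoted file glob, the rest are patterns.
search_files() {
    local glob=$1 file line
    shift
    # $glob is deliberately unquoted so that the glob is expanded here.
    # Assumption: the glob itself contains no whitespace.
    for file in $glob; do
        [ -f "$file" ] || continue
        filter_all "$@" < "$file" | while IFS= read -r line; do
            printf '%s:%s\n' "$file" "$line"
        done
    done
}
```

It would be called as search_files '*.txt' pattern1 pattern2 pattern3. Quoting the glob keeps the argument positions stable, which is why the first argument can be told apart from the patterns.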

Related

Bash Loop with counter gives a count of 1 when no item found. Why?

In the function below my counter works fine as long as an item is found in $DT_FILES. If the find is empty for that folder, the counter gives me a count of 1 instead of 0. I am not sure what I am missing.
What the script does here is: 1) makes a variable containing all the parent folders; 2) loops through each folder, cds into each one and makes a list of all files that contain the string "-DT-"; 3) if it finds a file that does not end with ".tif", it copies the DT file and adds a .tif extension to the copy. Very simple.
I count the number of times the loop created a new file with the ".tif" extension.
So I am not sure why I am getting a count of 1 at times.
function create_tifs()
{
IFS=$'\n'
# create list of main folders
LIST=$( find . -maxdepth 1 -mindepth 1 -type d )
for f in $LIST
do
echo -e "\n${OG}>>> Folder processed: ${f} ${NONE}"
cd ${f}
DT_FILES=$(find . -type f -name '*-DT-*' | grep -v '.jpg')
if (( ${#DT_FILES} ))
then
count=0
for b in ${DT_FILES}
do
if [[ "${b}" != *".tif" ]]
then
# cp -n "${b}" "${b}.tif"
echo -e "TIF created ${b} as ${b}.tif"
echo
((count++))
else
echo -e "TIF already done ${b}"
fi
done
fi
echo -e "\nCount = ${count}"
}

I can't repro your problem, but your code contains several dubious constructs. Here is a refactoring which might coincidentally also remove whatever problem you were experiencing.
#!/bin/bash
# Don't use non-portable function definition syntax
create_tifs() {
# Don't pollute global namespace; don't attempt to parse find output
# See also https://mywiki.wooledge.org/BashFAQ/020
local f
for f in ./*/; do
# prefer printf over echo -e
# print diagnostic messages to standard error >&2
# XXX What are these undeclared global variables?
printf "\n%s>>> Folder processed: %s %s" "$OG" "$f" "$NONE" >&2
# Again, avoid parsing find output
find "$f" -name '*-DT-*' -not -name '*.jpg' -exec bash -c '
for b; do
if [[ "${b}" != *".tif" ]]
then
# cp -n "${b}" "${b}.tif"
printf "TIF created %s as %s.tif\n" "$b" "$b" >&2
# print one line for wc
printf ".\n"
else
# XXX No newline, really??
printf "TIF already done %s" "$b" >&2
fi
done' _ {} +
# Missing done!
done |
# Count lines produced by printf inside tif creation
wc -l |
sed 's/.*/Count = &/'
}
This could be further simplified by using find ./*/ instead of looping over f but then you don't (easily) get to emit a diagnostic message for each folder separately. Similarly, you could add -not -name '*.tif' but then you don't get to print "tif already done" for those.
Tangentially perhaps see also Correct Bash and shell script variable capitalization; use lower case for your private variables.
Printing a newline before your actual message (like in the first printf) is a weird antipattern, especially when you don't do that consistently. The usual arrangement would be to put a newline at the end of each emitted message.

If you've got Bash 4.0 or later you can use globstar instead of (the error-prone) find. Try this Shellcheck-clean code:
#! /bin/bash -p
shopt -s dotglob extglob nullglob globstar
function create_tifs
{
local dir dtfile
local -i count
for dir in */; do
printf '\nFolder processed: %s\n' "$dir" >&2
count=0
for dtfile in "$dir"**/*-DT-!(*.jpg); do
if [[ $dtfile == *.tif ]]; then
printf 'TIF already done %s\n' "$dtfile" >&2
else
cp -v -n -- "$dtfile" "$dtfile".tif
count+=1
fi
done
printf 'Count = %d\n' "$count" >&2
done
return 0
}
shopt -s ... enables some Bash settings that are required by the code:
dotglob enables globs to match files and directories whose names begin with a dot (.). find shows such files by default.
extglob enables "extended globbing" (including patterns like !(*.jpg)). See the extglob section in glob - Greg's Wiki.
nullglob makes globs expand to nothing when nothing matches (otherwise they expand to the glob pattern itself, which is almost never useful in programs).
globstar enables the use of ** to match paths recursively through directory trees.
Note that globstar is potentially dangerous in versions of Bash prior to 4.3 because it follows symlinks, possibly leading to processing the same file or directory multiple times, or getting stuck in a cycle.
The -v option with cp causes it to print details of what it does. You might prefer to drop the option and print a different format of message instead.
See the accepted, and excellent, answer to Why is printf better than echo? for an explanation of why I used printf instead of echo.
I didn't use cd because it often leads to problems in programs.
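To illustrate the nullglob point in isolation: without nullglob, a glob that matches nothing stays in the list as the literal pattern, so counting the results gives 1 rather than 0. The sketch below uses a hypothetical helper name, count_dt_files, and only looks at the top level of one directory; it is an illustration, not a diagnosis of the code in the question.

```shell
#!/usr/bin/env bash

# Hypothetical helper: count the *-DT-* entries directly inside directory $1.
# The function body is a subshell ( ... ), so enabling nullglob here does not
# change the caller's shell options.
count_dt_files() (
    shopt -s nullglob
    files=("$1"/*-DT-*)
    printf '%d\n' "${#files[@]}"
)
```

Running the body in a subshell is one way to avoid having to save and restore the option afterwards.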

shell script to find a word in a list of files, all of them given as parameters

I need a simple shell program which has to do something like this:
script.sh word_to_find file1 file2 file3 .... fileN
which will display
word_to_find 3 - if word_to_find appears in 3 files
or
word_to_find 5 - if word_to_find appears in 5 files
This is what I've tried
#!/bin/bash
count=0
for i in $@; do
if [ grep '$1' $i ];then
((count++))
fi
done
echo "$1 $count"
But this message appears:
syntax error: "then" unexpected (expecting "done").
Before this the error was
[: grep: unexpected operator.

Try this:
#!/bin/sh
printf '%s %d\n' "$1" $(grep -hm1 "$@" | wc -l)
Notice how all the script's arguments are passed verbatim to grep -- the first is the search expression, the rest are filenames.
The output from grep -hm1 is a list of matches, one per file with a match, and wc -l counts them.
I originally posted this answer with grep -l but that would require filenames to never contain a newline, which is a rather pesky limitation.
Maybe add an -F option if regular expression search is not desired (i.e. only search literal text).

The code you showed is:
#!/bin/bash
count=0
for i in $@; do
if [ grep '$1' $i ];then
((count++))
fi
done
echo "$1 $count"
When I run it, I get the error:
script.sh: line 5: [: $1: binary operator expected
This is reasonable, but it is not the same as either of the errors reported in the question. There are multiple problems in the code.
The for i in $@; do should be for i in "$@"; do. Always use "$@" so that any spaces in the arguments are preserved. If none of your file names contain spaces or tabs, it is not critical, but it is a good habit to get into. (See How to iterate over arguments in bash script for more information.)
The if statement runs the [ (aka test) command, which is actually a shell built-in as well as a binary in /bin or /usr/bin. The use of single quotes around '$1' means that the value is not expanded, and the command sees its arguments as:
[
grep
$1
current-file-name
]
where the first is the command name, or argv[0] in C, or $0 in shell. The error I got is because the test command expects an operator such as = or -lt at the point where $1 appears (that is, it expects a binary operator, not $1, hence the message).
You actually want to test whether grep found the word in $1 in each file (the names listed after $1). You probably want to code it like this, then:
#!/bin/bash
word="$1"
shift
count=0
for file in "$@"
do
if grep -l "$word" "$file" >/dev/null 2>&1
then ((count++))
fi
done
echo "$word $count"
We can negotiate on the options and I/O redirections used with grep. The POSIX grep options -q and/or -s provide varying degrees of silence, and -q could be used in place of -l. The -l option simply lists the file name if the word is found, and stops scanning on the first occurrence. The I/O redirection ensures that errors are thrown away, but the test ensures that successful matches are counted.
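A minimal sketch of the -q variant mentioned above, wrapped in a function so it can be tried interactively. The function name count_files_with is hypothetical; -q suppresses normal output and -s suppresses messages about missing or unreadable files, so no redirection is needed.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print "<word> <number of files containing word>".
count_files_with() {
    local word=$1 file count=0
    shift
    for file in "$@"; do
        # grep exits with status 0 as soon as it finds a match in the file
        if grep -qs -- "$word" "$file"; then
            count=$((count + 1))
        fi
    done
    printf '%s %d\n' "$word" "$count"
}
```

It is called the same way as the script: count_files_with word_to_find file1 file2 file3.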
Incorrect output claimed
It has been claimed that the code above does not produce the correct answer. Here's the test I performed:
$ echo "This country is young" > young.iii
$ echo "This country is little" > little.iii
$ echo "This fruit is fresh" > fresh.txt
$ bash findit.sh country young.iii fresh.txt little.iii
country 2
$ bash -x findit.sh country young.iii fresh.txt little.iii
+ '[' -f /etc/bashrc ']'
+ . /etc/bashrc
++ '[' -z '' ']'
++ return
+ alias 'r=fc -e -'
+ word=country
+ shift
+ count=0
+ for file in '"$@"'
+ grep -l country young.iii
+ (( count++ ))
+ for file in '"$@"'
+ grep -l country fresh.txt
+ for file in '"$@"'
+ grep -l country little.iii
+ (( count++ ))
+ echo 'country 2'
country 2
$
This shows that for the given files, the output is correct on my machine (Mac OS X 10.10.2; GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin14)). If the equivalent test works differently on your machine, then (a) please identify the machine and the version of Bash (bash --version), and (b) please update the question with the output you see from bash -x findit.sh country young.iii fresh.txt little.iii. You may want to create a sub-directory (such as junk), and copy findit.sh into that directory before creating the files as shown, etc.
You could also bolster your case by showing the output of:
$ grep country young.iii fresh.txt little.iii
young.iii:This country is young
little.iii:This country is little
$

#!/usr/bin/perl
use strict;
use warnings;
my $wordtofind = shift(@ARGV);
my $regex = qr/\Q$wordtofind/s;
my @file = ();
my $count = 0;
my $filescount = scalar(@ARGV);
for my $file(@ARGV)
{
if(-e $file)
{
eval { open(FH,'<' . $file) or die "can't open file $file "; };
unless($@)
{
for(<FH>)
{
if(/$regex/)
{
$count++;
last;
}
}
close(FH);
}
}
}
print "$wordtofind $count\n";

You could use an Awk script:
#!/usr/bin/awk -f
BEGIN {
n=0
} $0 ~ w && !(FILENAME in seen) {
# count each file once, on its first matching line
seen[FILENAME]=1
n++
} END {
print w,n
}
and run it like this:
./script.awk w=word_to_find file1 file2 file3 ... fileN
or if you don't want to worry about assigning a variable (w) on the command line:
BEGIN {
n=0
w=ARGV[1]
delete ARGV[1]
} $0 ~ w && !(FILENAME in seen) {
# count each file once, on its first matching line
seen[FILENAME]=1
n++
} END {
print w,n
}

Recursive directory listing in shell without using ls

I am looking for a script that recursively lists all files using export and read link and by not using ls options. I have tried the following code, but it does not fulfill the purpose. Please can you help.
My Code-
#!/bin/bash
for i in `find . -print|cut -d"/" -f2`
do
if [ -d $i ]
then
echo "Hello"
else
cd $i
echo *
fi
done

Here's a simple recursive function which does a directory listing:
list_dir() {
local i # do not use a global variable in our for loop
# ...note that 'local' is not POSIX sh, but even ash
# and dash support it.
[ -n "$1" ] || set -- . # if no parameter is passed, default to '.'
for i in "$1"/*; do # look at directory contents
if [ -d "$i" ]; then # if our content is a directory...
list_dir "$i" # ...then recurse.
else # if our content is not a directory...
echo "Found a file: $i" # ...then list it.
fi
done
}
Alternately, if by "recurse", you just mean that you want the listing to be recursive, and can accept your code not doing any recursion itself:
#!/bin/bash
# ^-- we use non-POSIX features here, so shebang must not be #!/bin/sh
while IFS='' read -r -d '' filename; do
if [ -f "$filename" ]; then
echo "Found a file: $filename"
fi
done < <(find . -print0)
Doing this safely calls for using -print0, so that names are separated by NULs (the only character which cannot exist in a filename; newlines within names are valid).

List only common parent directories for files

I am searching for one file, say "file1.txt", and output of find command is like below.
/home/nicool/Desktop/file1.txt
/home/nicool/Desktop/dir1/file1.txt
/home/nicool/Desktop/dir1/dir2/file1.txt
In the above case I want only the common parent directory, which is "/home/nicool/Desktop". How can this be achieved using bash? Please help me find a general solution for this kind of problem.

This script reads lines and stores the common prefix in each iteration:
# read a line into the variable "prefix", split at slashes
IFS=/ read -a prefix
# while there are more lines, one after another read them into "next",
# also split at slashes
while IFS=/ read -a next; do
new_prefix=()
# for all indexes in prefix
for ((i=0; i < ${#prefix[@]}; ++i)); do
# if the word in the new line matches the old one
if [[ "${prefix[i]}" == "${next[i]}" ]]; then
# then append to the new prefix
new_prefix+=("${prefix[i]}")
else
# otherwise break out of the loop
break
fi
done
prefix=("${new_prefix[@]}")
done
# join an array
function join {
# copied from: http://stackoverflow.com/a/17841619/416224
local IFS="$1"
shift
echo "$*"
}
# join the common prefix array using slashes
join / "${prefix[@]}"
Example:
$ ./x.sh <<eof
/home/nicool/Desktop1/file1.txt
/home/nicool/Desktop2/dir1/file1.txt
/home/nicool/Desktop3/dir1/dir2/file1.txt
eof
/home/nicool

I don't think there's a bash builtin for this, but you can use this script, and pipe your find into it.
read -r FIRSTLINE
DIR=$(dirname "$FIRSTLINE")
while read -r NEXTLINE; do
until [[ "$NEXTLINE" == "${DIR%/}"/* || "$DIR" = "/" ]]; do # match whole path components only
DIR=$(dirname "$DIR")
done
done
echo "$DIR"
For added safety, use -print0 on your find, and adjust your read statements to have -d '' (an empty delimiter, which bash treats as NUL). This will work with filenames that have newlines.
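The NUL-delimited variant suggested above might look like the following sketch. The function name common_parent is hypothetical, and the input is assumed to come from find ... -print0. The comparison appends a slash to the candidate directory so that /home/nicool/Desktop is not treated as a parent of /home/nicool/Desktop2.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read NUL-delimited paths on stdin and print the
# deepest directory that contains all of them.
common_parent() {
    local first next dir
    IFS= read -r -d '' first || return 1      # no input at all
    dir=$(dirname "$first")
    while IFS= read -r -d '' next; do
        # Walk up until "$dir/" is a prefix of the path; stop at / or .
        until [[ $next == "${dir%/}"/* || $dir == / || $dir == . ]]; do
            dir=$(dirname "$dir")
        done
    done
    printf '%s\n' "$dir"
}
```

A typical call would be find / -name file1.txt -print0 | common_parent.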

lcp() {
local prefix path
read prefix
while read path; do
while ! [[ $path =~ ^"$prefix" ]]; do
[[ $prefix == $(dirname "$prefix") ]] && return 1
prefix=$(dirname "$prefix")
done
done
printf '%s\n' "$prefix"
return 0
}
This finds the longest common prefix of all of the lines of standard input.
$ find / -name file1.txt | lcp
/home/nicool/Desktop

Running diff and have it stop on a difference

I have a script running that checks multiple directories and compares them to expanded tarballs of the same directories elsewhere.
I am using diff -r -q and what I would like is that when diff finds any difference in the recursive run it will stop running instead of going through more directories in the same run.
All help appreciated!
Thank you
@bazzargh I did try it like you suggested or like this.
for file in $(find $dir1 -type f);
do if [[ $(diff -q $file ${file/#$dir1/$dir2}) ]];
then echo differs: $file > /tmp/$runid.tmp 2>&1; break;
else echo same: $file > /dev/null; fi; done
But this only works with files that exist in both directories. If one file is missing I won't get information about that. Also, the directories I am working with have over 300,000 files, so it seems to be a bit of overhead to do a find for each file and then diff.
I would like something like this to work, with an elif statement that checks if $runid.tmp contains data and breaks if it does. I added 2> after the first if statement so stderr is sent to the $runid.tmp file.
for file in $(find $dir1 -type f);
do if [[ $(diff -q $file ${file/#$dir1/$dir2}) ]] 2> /tmp/$runid.tmp;
then echo differs: $file > /tmp/$runid.tmp 2>&1; break;
elif [[ -s /tmp/$runid.tmp ]];
then echo differs: $file >> /tmp/$runid.tmp 2>&1; break;
else echo same: $file > /dev/null; fi; done
Would this work?

You can do the loop over files with 'find' and break when they differ. eg for dirs foo, bar:
for file in $(find foo -type f); do if [[ $(diff -q $file ${file/#foo/bar}) ]]; then echo differs: $file; break; else echo same: $file; fi; done
NB this will not detect if 'bar' has directories that do not exist in 'foo'.
Edited to add: I just realised I overlooked the really obvious solution:
diff -rq foo bar | head -n1
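The one-liner above can be wrapped so that a script can branch on the result. This is a sketch with a hypothetical function name, first_difference. It relies on head exiting after the first line; diff is then terminated when it next tries to write to the closed pipe, so it may still do a little work after the first difference is found. Because diff -r also reports files that exist on only one side, missing files are covered too.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the first difference between two directory
# trees and return 1, or print nothing and return 0 if they are identical.
first_difference() {
    local line
    line=$(diff -rq "$1" "$2" | head -n 1)
    if [ -n "$line" ]; then
        printf '%s\n' "$line"
        return 1
    fi
    return 0
}
```

For example: first_difference foo bar || echo "stopping at first difference" >&2.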

It's not 'diff', but with 'awk' you can compare two files (or more) and then exit when they have a different line.
Try something like this (sorry, it's a little rough)
awk '{ h[$0] = ! h[$0] } END { for (k in h) if (h[k]) exit 1 }' file1 file2
Sources are here and here.
edit: to break out of the loop when two files have the same line, you may have to do the loop in awk. See here.

You can try the following:
#!/usr/bin/env bash
# Determine directories to compare
d1='./someDir1'
d2='./someDir2'
# Loop over the file lists and diff corresponding files
while IFS= read -r line; do
# Split the 3-column `comm` output into indiv. variables.
lineNoTabs=${line//$'\t'}
numTabs=$(( ${#line} - ${#lineNoTabs} ))
d1Only='' d2Only='' common=''
case $numTabs in
0)
d1Only=$lineNoTabs
;;
1)
d2Only=$lineNoTabs
;;
*)
common=$lineNoTabs
;;
esac
# If a file exists in both directories, compare them,
# and exit if they differ, continue otherwise
if [[ -n $common ]]; then
diff -q "$d1/$common" "$d2/$common" || {
echo "EXITING: Diff found: '$common'" 1>&2;
exit 1; }
# Deal with files unique to either directory.
elif [[ -n $d1Only ]]; then
echo "File '$d1Only' only in '$d1'."
else # implies: if [[ -n $d2Only ]]; then
echo "File '$d2Only' only in '$d2'."
fi
# Note: The `comm` command below is CASE-SENSITIVE, which means:
# - The input directories must be specified case-exact.
# To change that, add `I` after the last `|` in _both_ `sed` commands.
# - The paths and names of the files diffed must match in case too.
# To change that, insert `| tr '[:upper:]' '[:lower:]'` before _both_
# `sort` commands.
done < <(comm \
<(find "$d1" -type f | sed 's|'"$d1/"'||' | sort) \
<(find "$d2" -type f | sed 's|'"$d2/"'||' | sort))
The approach is based on building a list of files (using find) containing relative paths (using sed to remove the root path) for each input directory, sorting the lists, and comparing them with comm, which produces 3-column, tab-separated output to indicate which lines (and therefore files) are unique to the first list, which are unique to the second list, and which lines they have in common.
Thus, the values in the 3rd column can be diffed and action taken if they're not identical.
Also, the 1st and 2nd-column values can be used to take action based on unique files.
The somewhat complicated splitting of the 3 column values output by comm into individual variables is necessary, because:
read will treat multiple tabs in sequence as a single separator
comm outputs a variable number of tabs; e.g., if there's only a 1st-column value, no tab is output at all.
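The tab-counting step can be tried on its own with a small sketch like the one below. The function name classify_comm_line and the output labels are hypothetical. Like the script above, it assumes that file names themselves contain no tab characters.

```shell
#!/usr/bin/env bash

# Hypothetical helper: classify one line of default 3-column comm output
# by the number of tabs it contains (0, 1, or 2).
classify_comm_line() {
    local line=$1 stripped
    stripped=${line//$'\t'/}                 # the line with all tabs removed
    case $(( ${#line} - ${#stripped} )) in
        0) printf 'first-only:%s\n' "$stripped" ;;
        1) printf 'second-only:%s\n' "$stripped" ;;
        *) printf 'common:%s\n' "$stripped" ;;
    esac
}
```

Feeding it each line of comm <(sort list1) <(sort list2) from a while IFS= read -r loop shows which column every entry came from.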

I got a solution to this thanks to @bazzargh.
I use this code in my script and now it works perfectly.
for file in $(find ${intfolder} -type f);
do if [[ $(diff -q $file ${file/#${intfolder}/${EXPANDEDROOT}/${runid}/$(basename ${intfolder})}) ]] 2> ${resultfile}.tmp;
then echo differs: $file > ${resultfile}.tmp 2>&1; break;
elif [[ -s ${resultfile}.tmp ]];
then echo differs: $file >> ${resultfile}.tmp 2>&1; break;
else echo same: $file > /dev/null;
fi; done
thanks!
