how to use && with grep in bash - linux

I want to check if multiple lines in a file exist in bash.
so for that I use grep -q which works with only one line:
if grep -q string1 "/path/to/file";then
echo 'exists'
else
echo 'does not exist'
fi
I tried many things in various combinations, for example:
if grep -q [ string1 ] && grep -q [ string2 ] "path/to/file";then
I also tried it with -E:
grep -E 'pattern1' filename | grep -E 'pattern2'
but nothing seems to work. Any ideas?

Rather than running multiple grep commands you can use this gnu-awk command to assert presence of multiple strings in a file:
awk -v RS='\\Z' '/string1/ && /string2/ && /string3/{e=1} END{exit !e}' file &&
echo 'exists' || echo 'does not exist'
RS=\Z will make awk read all the input in a single record separator
Using && between multiple search terms will make sure all the search words exist in input file
This will print exists only if all 3 search terms exists in the input file.

since #iruvar hasn't posted his comment as answer, i'll put it here:
grep -q string_1 file && grep -q string_2 file
now, here is my contribution. is #anubhava's more computationally complex awk answer, which reads the file only once, any faster than #iruvar's simpler answer, which reads the file three times?
awk 11.730 s
grep && grep 0.258 s
no.
this surely will depend on the speed of the filesystem vs the cpu, and on how much caching goes on, but on my system, which is probably a typical B+/A- workstation, grep kw1 file && grep kw2 file && grep kw3 file is ~50x as fast as #anubhava's awk solution. this held true both on ssd and spindle raid. (details: test file was 5,000,000 lines, 160M, and had kw1 on the first line, kw2 on the 2.5 millionth, and kw3 on the 5 millionth.)
some easy optimization is possible, for example, if you can solve your problem by matching whole lines, do so (with grep -x); it's twice as fast in this case.
for many (e.g., >1,000) files, it is faster to use grep -l and xargs:
grep -l kw1 *.txt | xargs grep -l kw2 | xargs grep -q kw3
as opposed to a loop:
for f in *.txt; do
grep -q kw1 $f && grep -q kw2 $f && grep -q kw3 $f
done
with the same test file, grep -l | xargs grep took 0.258 s, just like grep && grep. with two test files, it was still no faster than grep && grep. with 2000 test files of 5,000 lines each, none of which contained any matches, grep -l | xargs grep was ~10x as faster as grep && grep.

There are a couple ambiguities in your question, but assuming you want pattern_1 and pattern_2 to exist in a file (not on the same line) then you can do this.
for file in *; do
egrep -q pattern_1 $file && egrep -q pattern_2 $file && echo $file
done

With grep -p you can match multiply patterns in the same line:
grep -P '(?=.*string1)(?=.*string2)' file
The above will print lines that matches string1 and string2.
(?=...) is a positive lookaheads which matches a pattern without making it a part of the match.
And -z will slurp the whole file:
% seq 1 100 | grep -qzP '(?=.*1)(?=.*5)'; echo $?
0
% seq 1 100 | grep -qzP '(?=.*1)(?=.*a)'; echo $?
1

You can do it like this:
if grep -q 'string1' /path/to/file; then
if grep -q 'string2' /path/to/file; then
echo exists
else
echo 'does not exist'
else
echo 'does not exist'
fi
Or:
grep -q 'string1' /path/to/file &&
grep -q 'string2' /path/to/file &&
echo exists ||
echo 'does not exist'

you can use "-q" to search using grep
if grep -q string1 "/path/to/file" && grep -q string2 "/path/to/file";then
echo 'exists'
else
echo 'does not exist'
fi

Related

find and grep: get filenames

I need to find the reports (.docx files), read them with docx2txt, find the second match of "passed" (excluding "not passed") and save these filenames to text file. Here is what I tried:
OIFS="$IFS"
IFS=$'\n'
for f in $(find . -wholename '*_done/(*Report*.docx' |grep -v appendix)
do
docx2txt "$f" - | (grep -q -m2 passed || grep -q -v "not passed") || echo $f >> failed
done
IFS="$OIFS"
But this script gives me an empty file. If I replace || to && before echo, all filenames are stored into the file. grep works fine if it is not in the script, as well as docx2txt. What am I doing wrong here?
There are quite a lot problems with the grep commands.
grep -q always exits successfully on the first match.
With -q the -m2 has no effect. If there is one match grep exits successfully. It does not check if there is a second match.
To check that there are (at least) two matches, count the matches and then use test/[ ] to check the number of found matches. If there is at most one passed per line, grep -c is sufficient. If there can be multiple matches per line, you need grep -o ... | wc -l.
-q and -v together means: Is there at least one line that does not contain the pattern? When grep finds such a line it exits successfully. The only way for this command to fail is an input in which every line contains not passed (this includes the empty file).
Matching passed but not not passed is trickier than one might suspect. If there can be at most one passed/not passed per line, you can use grep -v 'not passed' | grep passed. Otherwise you need a need negative lookbehind, which is only available in perl compatible regular expressions (PCRE).
In addition to that command | (grep ... || grep ...) might not do what you expect. command produces output only once. After the first grep read some of this output, that read part is gone. The second grep will then continue reading where the first grep stopped.
BTW: for … in $(find … | grep -v …) can be turned into a single, safe find command using -not and -exec.
Solution
If each line contains at most one passed/not passed, use
find . -wholename '*_done/(*Report*.docx' -not -wholename '*appendix*' \
-exec sh -c '[ $(docx2txt "$0" - | grep -v "not passed" | grep -cm2 passed) = 2 ]' {} \; -print
If there can be multiple passed/not passed per line, you need GNU grep or pcregrep:
find . -wholename '*_done/(*Report*.docx' -not -wholename '*appendix*' \
-exec sh -c '[ $(docx2txt "$0" - | grep -Pom2 "(?<!not )passed" | wc -l) = 2 ]' {} \; -print
When you run into a problem like this, it's a good idea to remove as much code as possible. If we just take that one line with the multiple grep statements, we can first verify that the current expression doesn't work:
$ echo passed | ((grep -q -m2 passed || grep -q -v "not passed") || echo failed
$ echo not passed | ((grep -q -m2 passed || grep -q -v "not passed") || echo failed
We can see that neither of these commands produces at any output.
Let's think carefully about the logic:
The || operator means "if the first command doesn't succeed, run the second command". So in both cases, the first grep succeeds (because both passed and not passed contain the phrase passed). This means the second grep will never run, and it means that since the first command was successful, the entire grep ... || grep ... command will be successful, and that means the final echo $f will never run.
I was trying to think of a clever way to solve this, but it seems simplest if we make use of a temporary file:
OIFS="$IFS"
IFS=$'\n'
tmpfile=$(mktemp docXXXXXX)
trap "rm -f $tmpfile" EXIT
for f in $(find . -wholename '*_done/(*Report*.docx' |grep -v appendix)
do
docx2txt "$f" - | head -2 > $tmpfile
if grep -q passed $tmpfile && ! grep -q 'not passed' $tmpfile; then
echo $f >> failed
fi
done
IFS="$OIFS"

Running variable string match against grep search?

I've defined the variables here to shorten the logic a little. The wget works fine (downloads the correct file) and grepping for tar.gz works in the wget.log
The issue is the match to another file!
Basically, if it's on a blacklist I want it to skip!
var1=https://somewebsite.com/directory
line1=directory
sudo wget -O wget.log https://somewebsite.com/$line1/releases
if grep -q "tar.gz" wget.log | "$var1" -ne grep -q
"https://somewebsite.com/$line1" banned; then
echo "Good Job!"
else
echo "Skip!"
fi
Use && to test if both of the grep commands succeed
if grep -q -F 'tar.gz' wget.log && grep -q -F -x "$variable" banned
then
echo "Skip!"
else
echo "Good Job!"
fi
I've used the -F option to grep because none of the strings we're searching for are regular expressions, they're fixed strings. And I used -x in the second grep to match the whole line in the blacklist.

Concatenating xargs with the use of if-else in bash

I've got two test files, namely, ttt.txt and ttt2.txt, the Content of which is shown as below:
#ttt.txt
(132) 123-2131
543-732-3123
238-3102-312
#ttt2.txt
1
2
3
I've already tried the following commands in bash and it works fine:
if grep -oE "(\(\d{3}\)[ ]?\d{3}-\d{4})|(\d{3}-\d{3}-\d{4})" ttt1.txt ; then echo "found"; fi
# with output 'found'
if grep -oE "(\(\d{3}\)[ ]?\d{3}-\d{4})|(\d{3}-\d{3}-\d{4})" ttt2.txt ; then echo "found"; fi
But when I combine the above command with xargs, it complains error '-bash: syntax error near unexpected token `then''. Could anyone give me some explanation? Thanks in advance!
ll | awk '{print $9}' | grep ttt | xargs -I $ if grep --quiet -oE "(\(\d{3}\)[ ]?\d{3}-\d{4})|(\d{3}-\d{3}-\d{4})" $; then echo "found"; fi
$ is a special character in bash (it marks variables) so don't use it as your xargs marker, you'll only get confused.
The real problem here though is that you are passing if grep --quiet -oE "(\(\d{3}\)[ ]?\d{3}-\d{4})|(\d{3}-\d{3}-\d{4})" $ as the argument to xargs, and then the remainder of the line is being treated as a new command, because it breaks at the ;.
You can wrap the whole thing in a sub-invocation of bash, so that xargs sees the whole command:
$ ll | awk '{print $9}' | grep ttt | xargs -I xx bash -c 'if grep --quiet -oE "(\(\d{3}\)[ ]?\d{3}-\d{4})|(\d{3}-\d{3}-\d{4})" xx; then echo "found"; fi'
found
Finally, ll | awk '{print $9}' | grep ttt is a needlessly complicated way of listing the files that you're looking for. You actually you don't need any of the code above, just do this:
$ if grep --quiet -oE "(\(\d{3}\)[ ]?\d{3}-\d{4})|(\d{3}-\d{3}-\d{4})" ttt*; then echo "found"; fi
found
Alternatively, if you want to process each file in turn (which you don't need here, but you might want when this gets more complicated):
for file in ttt*
do
if grep --quiet -oE "(\(\d{3}\)[ ]?\d{3}-\d{4})|(\d{3}-\d{3}-\d{4})" "$file"
then
echo "found"
fi
done

How to clean csv by another csv while in a 'for' loop?

I'm not a linux expert, and usually in this situation PHP would be much more suitable... But due to the circumstances it occurred that I wrote it in Bash :)
I have the following .sh which runs over all .csv files in the current folder and execute a bunch of commands.
The goal: Cleaning email lists in .csv files (not actually .csv but just a .txt file in practice).
for file in $(find . -name "*.csv" ); do
echo "====================================================" >> db_purge_log.txt
echo "$file" >> db_purge_log.txt
echo "----------------------------------------------------" >> db_purge_log.txt
echo "Contacts BEFORE purge:" >> db_purge_log.txt
wc -l $file | cut -d " " -f1 >> db_purge_log.txt
echo " " >> db_purge_log.txt
cat $file | egrep -v "xxx|yyy|zzz" | grep -v -E -i '([0-z])\1{2,}' | uniq | sort -u > tmp_file
mv tmp_file $file ;
echo "Contacts AFTER purge:" >> db_purge_log.txt
wc -l $file | cut -d " " -f1 >> db_purge_log.txt
done
Now the trouble is:
I want to add a command, somewhere in the middle of this loop, to use another .csv file as suppression list, meaning - every line found as perfect match in that suppression list - delete from $file.
At this point my brain is stuck and I can't think of a solution. To be honest, I didn't manage using sort or grep on 2 different files and export to a 3rd file without completely eliminating the duplicated lines cross both files, so I end up with much less data.
Any help would be much appreciated!
Clean up
Before adding functionality to the script, the existing script needs to be cleaned up — a lot.
I/O Redirection — Don't Repeat Yourself
When I see wall-to-wall I/O redirections like that, I want to cry — that isn't how you do it! You have three options to avoid all that:
for file in $(find . -name "*.csv" )
do
echo "===================================================="
echo "$file"
echo "----------------------------------------------------"
echo "Contacts BEFORE purge:"
wc -l $file | cut -d " " -f1
echo " "
cat $file | egrep -v "xxx|yyy|zzz" | grep -v -E -i '([0-z])\1{2,}' | uniq | sort -u > tmp_file
mv tmp_file $file ;
echo "Contacts AFTER purge:"
wc -l $file | cut -d " " -f1
done >> db_purge_log.txt
Or:
{
for file in $(find . -name "*.csv" )
do
echo "===================================================="
echo "$file"
echo "----------------------------------------------------"
echo "Contacts BEFORE purge:"
wc -l $file | cut -d " " -f1
echo " "
cat $file | egrep -v "xxx|yyy|zzz" | grep -v -E -i '([0-z])\1{2,}' | uniq | sort -u > tmp_file
mv tmp_file $file ;
echo "Contacts AFTER purge:"
wc -l $file | cut -d " " -f1
done
} >> db_purge_log.txt
Or even:
exec >>db_purge_log.txt # By default, standard output will go to db_purge_log.txt
for file in $(find . -name "*.csv" )
do
echo "===================================================="
echo "$file"
echo "----------------------------------------------------"
echo "Contacts BEFORE purge:"
wc -l $file | cut -d " " -f1
echo " "
cat $file | egrep -v "xxx|yyy|zzz" | grep -v -E -i '([0-z])\1{2,}' | uniq | sort -u > tmp_file
mv tmp_file $file ;
echo "Contacts AFTER purge:"
wc -l $file | cut -d " " -f1
done
The first form is adequate for this script which has a single loop in it to provide I/O redirection to. The second form, using { and } would handle more general sequences of commands. The third form, using exec, is 'permanent'; you can't recover the original standard output, whereas with the { ... } form you can have different sections of the script writing to different places.
One other advantage of all these variations is that you can trivially send errors to the same place that you're sending standard output if that's what you desire. For example:
exec >>db_purge_log.txt 2>&1
Other issues
Suppressing file name from wc — instead of:
wc -l $file | cut -d " " -f1
use:
wc -l < $file
UUOC — Useless use of cat — instead of:
cat $file | egrep -v "xxx|yyy|zzz" | grep -v -E -i '([0-z])\1{2,}' | uniq | sort -u > tmp_file
use:
egrep -v "xxx|yyy|zzz" $file | grep -v -E -i '([0-z])\1{2,}' | uniq | sort -u > tmp_file
UUOU — Useless use of uniq
It is not at all clear why you need uniq and sort -u; in context, sort -u is sufficient, so:
egrep -v "xxx|yyy|zzz" $file | grep -v -E -i '([0-z])\1{2,}' | sort -u > tmp_file
UUOG — Useless use of grep
egrep is equivalent to grep -E and both are capable of handling multiple regular expressions, and the second will match what is matched by the expression in the parentheses 3 or more times (we really only need to match three times), so in fact the second expression will do the job of the first. And the [0-z] match is dubious. It probably matches sundry punctuation characters as well as the upper and lower case digits, but you're already doing a case-insensitive search because of the -i, so we can regularize all that to:
grep -Eiv '([0-9a-z]){3}' $file | sort -u > tmp_file
File names with spaces
The code is not going to handle file names with spaces, tabs or newlines because of the for file in $(find ...) notation. It probably isn't necessary to deal with that now — be aware of the issue.
Final clean up
for file in $(find . -name "*.csv" )
do
echo "===================================================="
echo "$file"
echo "----------------------------------------------------"
echo "Contacts BEFORE purge:"
wc -l < $file
echo " "
grep -Evi '([0-9a-z]){3}' | sort -u > tmp_file
mv tmp_file $file
echo "Contacts AFTER purge:"
wc -l <$file
done >> db_purge_log.txt
Add the extra functionality
I want to add a command, somewhere in the middle of this loop, to use another .csv file as suppression list — meaning that every line found as perfect match in that suppression list should be deleted from $file.
Since we're already sorting the input files ($file), we can sort the suppression file (call it suppfile='suppressions.txt'too if it is not already sorted. Given that, we then use comm to eliminate the lines that appear in both $file and $suppfile. We're interested in the lines that only appear in $file (or, as will be the case here, in the edited and sorted version of the file), so we want to suppress the common entries and the entries from $suppfile that do not appear in $file. The comm -23 - "$suppfile" command reads the edited, sorted file from standard input - and leaves out the entries from "$suppfile"
suppfile='suppressions.txt' # Must be in sorted order
for file in $(find . -name "*.csv" )
do
echo "===================================================="
echo "$file"
echo "----------------------------------------------------"
echo "Contacts BEFORE purge:"
wc -l < "$file"
echo " "
grep -Evi '([0-9a-z]){3}' | sort -u | comm -23 - "$suppfile" > tmp_file
mv tmp_file "$file"
echo "Contacts AFTER purge:"
wc -l < "$file"
done >> db_purge_log.txt
If the suppression file is not in sorted order, simply sort it into a temporary file. Beware of using the .csv suffix on the suppression file in the current directory; it will catch the file and empty it because every line in the suppression file matches a line in the suppression file, which is not helpful for any files processed after the suppression file.
Oops — I over-simplified the grep regex. It should (probably) be:
grep -Evi '([0-9a-z])\1{2}' $file
The difference is considerable. My original rewrite will look for any three adjacent digits or letters (e.g. 123 or abz); the revision (actually very similar to one of the original commands) looks for a character from [0-9A-Za-z] followed by two occurrences of the same character (e.g. 111 or aaa, but not 123 or abz).
If perchance the alternatives xxx|yyy|zzz were really not 3 repeated characters, you might need two invocations of grep in sequence.
If I understand you correctly, assuming a recent 'nix, grep should do most of the trick for you. The command, grep -vf filterfile input.csv will output the lines in input.csv that do NOT match any regular expression found in filterfile.
A couple of other comments ... uniq needs the input sorted in order to remove dups, so you might want the sort before it in the pipe (unless your input data is sorted).
Or if the input is sorted to start with, grep -u will omit duplicates.
Small suggestion -- you might add a #!/bin/bash as the first line in order to ensure that the script is run by bash rather than the user's login shell (it might not be bash).
HTH.
b

Find and highlight text in linux command line

I am looking for a linux command that searches a string in a text file,
and highlights (colors) it on every occurence in the file, WITHOUT omitting text lines (like grep does).
I wrote this handy little script. It could probably be expanded to handle args better
#!/bin/bash
if [ "$1" == "" ]; then
echo "Usage: hl PATTERN [FILE]..."
elif [ "$2" == "" ]; then
grep -E --color "$1|$" /dev/stdin
else
grep -E --color "$1|$" $2
fi
it's useful for stuff like highlighting users running processes:
ps -ef | hl "alice|bob"
Try
tail -f yourfile.log | egrep --color 'DEBUG|'
where DEBUG is the text you want to highlight.
command | grep -iz -e "keyword1" -e "keyword2" (ignore -e switch if just searching for a single word, -i for ignore case, -z for treating as a single file)
Alternatively,while reading files
grep -iz -e "keyword1" -e "keyword2" 'filename'
OR
command | grep -A 99999 -B 99999 -i -e "keyword1" "keyword2" (ignore -e switch if just searching for a single word, -i for ignore case,-A and -B for no of lines before/after the keyword to be displayed)
Alternatively,while reading files
grep -A 99999 -B 99999 -i -e "keyword1" "keyword2" 'filename'
command ack with --passthru switch:
ack --passthru pattern path/to/file
I take it you meant "without omitting text lines" (instead of emitting)...
I know of no such command, but you can use a script such as this (this one is a simple solution that takes the filename (without spaces) as the first argument and the search string (also without spaces) as the second):
#!/usr/bin/env bash
ifs_store=$IFS;
IFS=$'\n';
for line in $(cat $1);
do if [ $(echo $line | grep -c $2) -eq 0 ]; then
echo $line;
else
echo $line | grep --color=always $2;
fi
done
IFS=$ifs_store
save as, for instance colorcat.sh, set permissions appropriately (to be able to execute it) and call it as
colorcat.sh filename searchstring
I had a requirement like this recently and hacked up a small program to do exactly this. Link
Usage: ./highlight test.txt '^foo' 'bar$'
Note that this is very rough, but could be made into a general tool with some polishing.
Using dwdiff, output differences with colors and line numbers.
echo "Hello world # $(date)" > file1.txt
echo "Hello world # $(date)" > file2.txt
dwdiff -c -C 0 -L file1.txt file2.txt

Resources