Grep multiple strings - linux

I want to find a list of files that have A but do not have B and C.
grep -r -L 'B\|C' finds the ones without B and C, but how do I add the condition of having A as well?

If I understand your question correctly:
grep -l "A" $(grep -r -E -L "B|C" *)
i.e. search for files containing "A" in the list of files that your original command generates.

You can use negative lookahead in grep using options -P or --perl-regexp
grep -r -P -L '^(?!.*A).*$|B|C'

If I understood your question correctly, you can do it like this:
grep "A" file.txt | grep -v -e "B" -e "C"
The first grep finds lines containing A, the second greptakes the result and removes lines containing either "B" or "C". This works by the -v flag which inverses matches.

Related

grep 2 words at if statements in Bash

I am trying to see if my nohup file contains the words that I am looking for. If it does, then I need to put that into tmp file.
So I am currently using:
if grep -q "Started|missing" $DIR3/$dirName/nohup.out
then
grep -E "Started|missing" "$DIR3/$dirName/nohup.out" > tmp
fi
But it never goes into the if statement even if there are words that I am looking for.
How can I fix this?
Since basic sed uses BRE, regex alternation operator is represented by \| . | matches a literal | symbol. And you don't need to touch | symbol in the grep which uses ERE.
if grep -q "Started\|missing" $DIR3/$dirName/nohup.out
You should use egrep instead of grep (Avinash Raj has explained that in other words already in his answer).
I would generally recommend using egrep as a default for everyday use (even though many expressions only contain the basic regular expression syntax). From a practical point the standard grep is only interesting for performance reasons.
Details about the advantages of grep vs. egrep can be found in that superuser question.
When you only put the grep results into the tmp-file, you do not want to grep the file twice.
You can not use
egrep "Started|missing" $DIR3/$dirName/nohup.out > tmp
since that would create an empty tmp file when nothing is found.
You can remove empty files with if [ ! -s tmp ] or use another solution:
Redirectong the grep results without grepping again can be done with
rm -f tmp 2>/dev/null
egrep "Started|missing" $DIR3/$dirName/nohup.out | while read -r strange_line; do
echo "${strange_line}" >> tmp
done

grep -o and display part of filenames using ls

I have a directory which has many directories inside it with the pattern of their name as :
YYYYDDMM_HHMISS
Example: 20140102_120202
I want to extract only the YYYYDDMM part.
I tried ls -l|awk '{print $9}'|grep -o ^[0-9]* and got the answer.
However i have following questions:
Why doesnt this return any results: ls -l|awk '{print $9}'|grep -o [0-9]* . Infact it should have returned all the directories.
Strangely just including '^' before [0-9] works fine :
ls -l|awk '{print $9}'|grep -o ^[0-9]*
Any other(simpler) way to achieve the result?
Why doesnt this return any results: ls -l|awk '{print $9}'|grep -o [0-9]*
If there are files in your current directory that start with [0-9], then the shell will expand them before calling grep. For example, if I have two files a1, a2 and a3 and run this:
ls | grep a*
After the filenames are expanded, the shell will run this:
ls | grep a1 a2 a3
The result of which is that it will print the lines in a2 and a3 that match the text "a1". It will also ignore whatever is coming from stdin, because when you specify filenames for grep (2nd argument and beyond), it will ignore stdin.
Next, consider this:
ls | grep ^a*
Here, ^ has no special meaning to the shell, so it uses it verbatim. Since I don't have filenames starting with ^a, it will use ^a* as the pattern. If I did have filenames like ^asomething or ^another, then again, ^a* would be expanded to those filenames and grep would do something I didn't really intend.
This is why you have to quote search patterns, to prevent the shell from expanding them. The same goes for patterns in find /path -name 'pattern'.
As for a simpler way for what you want, I think this should do it:
ls | sed -ne 's/_.*//p'
To show only the YYDDMM part of the directory names:
for i in ./*; do echo $(basename "${i%%_*}"); done
Not sure what you want to do with it once you've got it though...
You must avoid parsing ls output.
Simple is to use this printf:
printf "%s\n" [0-9]*_[0-9]*|egrep -o '^[0-9]+'

Find specific string in subdirectories and order top directories by modification date

I have a directory structure containing some files. I'm trying to find the names of top directories that do contain a file with specific string in it.
I've got this:
grep -r abcdefg . | grep commit_id | sed -r 's/\.\/(.+)\/.*/\1/';
Which returns something like:
topDir1
topDir2
topDir3
I would like to be able to take this output and somehow feed it into this command:
ls -t | grep -e topDir1 -e topDir2 -e topDir3
which would returned the output filtered by the first command and ordered by modification date.
I'm hoping for a one liner. Or maybe there is a better way of doing it?
This should work as long as none of the directory names contain whitespace or wildcard characters:
ls -td $(grep -r abcdefg . | grep commit_id | dirname)

Grep Syntax with Capitals

I'm trying to write a script with a file as an argument that greps the text file to find any word that starts with a capital and has 8 letters following it. I'm bad with syntax so I'll show you my code, I'm sure it's an easy fix.
grep -o '[A-Z][^ ]*' $1
I'm not sure how to specify that:
a) it starts with a capital letter, and
b)that it's a 9 letter word.
Cheers
EDIT:
As an edit I'd like to add my new code:
while read p
do
echo $p | grep -Eo '^[A-Z][[:alpha:]]{8}'
done < $1
I still can't get it to work, any help on my new code?
'[A-Z][^ ]*' will match one character between A and Z, followed by zero or more non-space characters. So it would match any A-Z character on its own.
Use \b to indicate a word boundary, and a quantifier inside braces, for example:
grep '\b[A-Z][a-z]\{8\}\b'
If you just did grep '[A-Z][a-z]\{8\}' that would match (for example) "aaaaHellosailor".
I use \{8\}, the braces need to be escaped unless you use grep -E, also known as egrep, which uses Extended Regular Expressions. Vanilla grep, that you are using, uses Basic Regular Expressions. Also note that \b is not part of the standard, but commonly supported.
If you use ^ at the beginning and $ at the end then it will not find "Wiltshire" in "A Wiltshire pig makes great sausages", it will only find lines which just consist of a 9 character pronoun and nothing else.
This works for me:
$ echo "one-Abcdefgh.foo" | grep -o -E '[A-Z][[:alpha:]]{8}'
$ echo "one-Abcdefghi.foo" | grep -o -E '[A-Z][[:alpha:]]{8}'
Abcdefghi
$
Note that this doesn't handle extensions or prefixes. If you want to FORCE the input to be a 9-letter capitalized word, we need to be more explicit:
$ echo "one-Abcdefghij.foo" | grep -o -E '\b[A-Z][[:alpha:]]{8}\b'
$ echo "Abcdefghij" | grep -o -E '\b[A-Z][[:alpha:]]{8}\b'
$ echo "Abcdefghi" | grep -o -E '\b[A-Z][[:alpha:]]{8}\b'
Abcdefghi
$
I have a test file named 'testfile' with the following content:
Aabcdefgh
Babcdefgh
cabcdefgh
eabcd
Now you can use the following command to grep in this file:
grep -Eo '^[A-Z][[:alpha:]]{8}' testfile
The code above is equal to:
cat testfile | grep -Eo '^[A-Z][[:alpha:]]{8}'
This matches
Aabcdefgh
Babcdefgh

Grep Search all files in directory for string1 AND string2

How can I make use of grep in cygwin to find all files that contain BOTH words.
This is what I use to search all files in a directory recursively for one word:
grep -r "db-connect.php" .
How can I extend the above to look for files that contain both "db-connect.php" AND "version".
I tried this: grep -r "db-connect.php\|version" . but this is an OR i.e. it gets file that contain one or the other.
Thanks all for any help
grep -r db-connect.php . | grep version
If you want to grep for several strings in a file which have different lines, use the following command:
grep -rl expr1 | xargs grep -l expr2 | xargs grep -l expr3
This will give you a list of files that contain expr1, expr2, and expr3.
Note that if any of the file names in the directory contains spaces, these files will produce errors. This can be fixed by adding -0 I think to grep and xargs.
grep "db-connect.php" * | cut -d: -f1 | xargs grep "version"
I didn't try it in recursive mode but it should be the same.
To and together multiple searches, use multiple lookahead assertions, one per thing looked for apart from the last one:
instead of writing
grep -P A * | grep B
you write
grep -P '(?=.*A)B' *
grep -Pr '(?=.*db-connect\.php)version' .
Don’t write
grep -P 'A.*B|B.*A' *
because that fails on overlaps, whereas the (?=…)(?=…) technique does not.
You can also add in NOT operators as well. To search for lines that don’t match X, you normally of course use -v on the command line. But you can’t do that if it is part of a larger pattern. When it is, you add (?=(?!X).)*$) to the pattern to exclude anything with X in it.
So imagine you want to match lines with all three of A, B, and then either of C or D, but which don’t have X or Y in them. All you need is this:
grep -P '(?=^.*A)(?=^.*B)(?=^(?:(?!X).)*$)(?=^(?:(?!Y).)*$)C|D' *
In some shells and in some settings. you’ll have to escape the ! if it’s your history-substitution character.
There, isn’t that pretty cool?
In my cygwin the given answers didn't work, but the following did:
grep -l firststring `grep -r -l secondstring . `
Do you mean "string1" and "string2" on the same line?
grep 'string1.*string2'
On the same line but in indeterminate order?
grep '(string1.*string2)|(string2.*string1)'
Or both strings must appear in the file anywhere?
grep -e string1 -e string2
The uses PCRE (Perl-Compatible Regular Expressions) with multiline matching and returns the filenames of files that contain both strings (AND rather than OR).
grep -Plr '(?m)db-connect\.php(.*\n)*version|version(.*\n)*db-connect\.php' .
Why to stick to only grep:
perl -lne 'print if(/db-connect.php/&/version/)' *

Resources