grep the files that contain specific word and don't contain those words - linux

I need to use the grep command to get all the files that contain the word [form]
but do not contain the words [head] or [cfGenerate].

GNU grep with PCRE:
grep -P '^(?!.*head)(?!.*cfGenerate).*form'
Negative lookaheads can be combined so that the match fails if any unwanted pattern occurs on the same line. Note that this tests each line separately, not the file as a whole.
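Since the question asks about whole files rather than single lines, a two-step filter may be closer to the goal. The following is only a sketch, assuming GNU grep and GNU xargs; the sandbox directory and the three file names are hypothetical and exist only to make the example self-contained.

```shell
# Build a throwaway sandbox with three hypothetical files.
dir=$(mktemp -d)
printf '<form>\n</form>\n'    > "$dir/a.html"   # form only
printf '<head>\n<form>\n'     > "$dir/b.html"   # form and head
printf '<form>\ncfGenerate\n' > "$dir/c.html"   # form and cfGenerate

# Step 1: -l lists the files that contain "form" (-Z separates names with NUL).
# Step 2: -L keeps only the files in which neither unwanted word occurs.
result=$(cd "$dir" && grep -lZ 'form' -- * | xargs -0 -r grep -L -e 'head' -e 'cfGenerate')
echo "$result"
```

Here only a.html is listed, because the other two files each contain one of the unwanted words somewhere, even though not on the same line as form.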

Related

Linux counting words in random characters

I have generated a file of random characters from A-Z and a-z. The file comes in different sizes, for example 10000 or 1000000 characters. I would like to count how many times the word 'cat' or 'dog' appears in it. Could someone provide a command (grep ... | wc ... or anything else) that can handle this task?
grep has a -c option that counts the number of matching lines (not the number of individual matches, so several hits on one line count once).
So
grep -c "cat\|dog" <file name>
Add -i if you want a case-insensitive count.
You can use grep with the flag -o. For example:
grep -o "dog\|cat" <filename> | wc -l
About the flag -o, according to man grep: «Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.»
This solution will work in several situations: multiple lines, a single line, the word surrounded with whitespaces or other characters, etc.
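The difference between counting lines and counting occurrences can be seen on a small sample. This is a sketch only: the sample file and its contents are made up for illustration, and -E is used instead of the \| form so that the alternation does not depend on a GNU extension.

```shell
# Hypothetical sample: three lines, the first with several hits.
sample=$(mktemp)
printf 'xxcatxxdogxxcat\nnothing\nDOGcat\n' > "$sample"

# -c counts matching lines.
lines=$(grep -cE 'cat|dog' "$sample")

# -o prints every match on its own line, so wc -l counts occurrences.
hits=$(grep -oE 'cat|dog' "$sample" | wc -l)

# Adding -i also counts the upper-case DOG.
hits_ci=$(grep -oiE 'cat|dog' "$sample" | wc -l)

echo "lines=$((lines)) hits=$((hits)) hits_ci=$((hits_ci))"
# prints: lines=2 hits=4 hits_ci=5
```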

Filtering file-list using grep

I am trying to list files in a specific directory whose names do not match a certain pattern.
For example, list all files not ending with abc.yml.
For this I am using the command:
ls | grep -v "*abc.yml"
However I still see the files ending with abc.yml, what am I doing wrong here?
The asterisk has a different meaning in regular expressions than in shell globs. In fact, putting it at the front of the expression makes it match a literal asterisk. You can simply remove it: grep tries to match the expression anywhere on the line, not against the whole line. To add the "end of line" anchor, add $. Also, . matches any character, so use \. to match a dot literally:
ls | grep -v 'abc\.yml$'
In some shells, you can use extended globbing to list the files without the need to pipe to grep. For example, in bash:
shopt -s extglob
ls !(*abc.yml)
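Another option, which is not from the answer above, is to let find do the name test so that the output of ls does not need to be parsed at all. A minimal sketch, assuming a find that supports -maxdepth (GNU and BSD find do); the directory and file names are hypothetical.

```shell
# Hypothetical sandbox directory with three files.
dir=$(mktemp -d)
touch "$dir/one-abc.yml" "$dir/two.yml" "$dir/notes.txt"

# -name takes a shell glob (not a regex), and ! negates the test.
kept=$(find "$dir" -maxdepth 1 -type f ! -name '*abc.yml' -exec basename {} \; | sort)
echo "$kept"
```

Because -name uses glob syntax, the leading asterisk is correct here, unlike in the grep pattern.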

Find the words do not appear in the dictionary in linux platform

I have a text file that contains some words, one word per line, and I pass it as a command line argument. For example, the text file a.txt looks like this:
about
catb
west
eastren
What I want to do is find the words that are not in the dictionary. If a word is a dictionary word, it should be deleted from the text file.
I use the following commands:
word=$1
grep "$1$" /usr/share/dict/linux.words -q
for word in $(<a.txt)
do
if [ $word -eq 0 ]
then
sed '/$word/d'
fi
done
Nothing happened.
As far as I understand, grep alone is enough:
$ grep -xvFf /usr/share/dict/linux.words a.txt
catb
eastren
catb and eastren are words not found in /usr/share/dict/linux.words. The options used are
-x, --line-regexp
Select only those matches that exactly match the whole line. For a regular expression pattern, this is
like parenthesizing the pattern and then surrounding it with ^ and $.
-v, --invert-match
Invert the sense of matching, to select non-matching lines.
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings (instead of regular expressions), separated by newlines,
any of which is to be matched.
-f FILE, --file=FILE
Obtain patterns from FILE, one per line. If this option is used multiple times or is combined with the
-e (--regexp) option, search for all patterns given. The empty file contains zero patterns, and
therefore matches nothing.
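grep only prints the remaining words; it does not change a.txt. To actually delete the dictionary words from the file, the output can be written to a temporary file that then replaces the original. The sketch below uses a tiny made-up dictionary instead of /usr/share/dict/linux.words so that it runs anywhere; both file names are hypothetical.

```shell
# Hypothetical stand-ins for the dictionary and the word list.
dict=$(mktemp)
words=$(mktemp)
printf 'about\nwest\neast\n' > "$dict"
printf 'about\ncatb\nwest\neastren\n' > "$words"

# Keep only the lines that match no dictionary entry exactly,
# then replace the original file with the filtered copy.
# grep exits with status 1 when nothing is left, so the two
# commands are deliberately not chained with &&.
grep -xvFf "$dict" "$words" > "$words.tmp"
mv "$words.tmp" "$words"

cat "$words"
```

Without -x, the fixed string east would also match inside eastren and wrongly remove it.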
An alternative would be to use a spell checker like hunspell, so you can apply it to any text, not only a pre-formatted file with only one word per line. You can also specify several dictionaries, and it shows only words which are in none of them.
For example, after copy/pasting the content of this page into test.txt
lang1=en_US
lang2=fr_FR
hunspell -d $lang1,$lang2 -l test.txt | sort -u
produces a list of 46 words:
'
Apps
Arqade
catb
'cba'
Cersei
ceving
Drupal
eastren
...
WordPress
Worldbuilding
xvFf
xz
Yodeya

How to print the longest word in a file by using combination of grep and wc

I am trying to find the longest word in a text file.
I found the number of characters in the longest word in the file
by using the command
wc -L
I need to print the longest word by using this number and the grep command.
If you must use the two commands given, I'd suggest:
grep -E ".{$(wc -L < test.txt)}" test.txt
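The command substitution builds the brace expression that matches the line(s) with the given number of characters; because wc -L (not part of POSIX) reports the length of the longest line, only lines of exactly that length can match. -E is needed to enable extended regular expression support; otherwise, the braces need to be escaped: grep ".\{...\}" test.txt. Note that this prints the longest line, which is the longest word only if the file holds one word per line.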
Using an awk command that makes a single pass through the file may be faster.
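The single-pass awk idea might look like the following. It is a sketch rather than a tuned solution: it treats whitespace-separated fields as words, keeps the first word in case of a tie, and uses a made-up sample file.

```shell
# Hypothetical sample with several words per line.
sample=$(mktemp)
printf 'the quick brown\nfoxes jumped over\n' > "$sample"

# Visit every field once and remember the longest one seen so far.
longest=$(awk '{
    for (i = 1; i <= NF; i++)
        if (length($i) > max) { max = length($i); word = $i }
}
END { print word }' "$sample")

echo "$longest"
# prints: jumped
```

Unlike the wc -L approach, this works on words rather than whole lines, so the file does not need to hold one word per line.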

line return in grep search?

I have some files with the text:
xxxxx
xxxxx
<cert>
</cert>
some other stuff
How can I search with grep and ignore the line returns?
I have many files in the same folder.
I have tried this but it does not seem to stop running:
tr '\n' ' ' | grep '<cert></cert>' *
That is searching for a multi-line pattern, which the usual grep does not appear to support. There are alternative tools, e.g.,
How can I search for a multiline pattern in a file?, which suggests pcregrep, or custom awk, perl scripts.
How can I “grep” patterns across multiple lines?, again suggesting pcregrep (as well as sed scripts).
However, GNU grep is said to support this as well:
How do I grep for multiple patterns on multiple lines? gives as an example
grep -Pzo "^begin\$(.|\n)*^end$" file
to use a newline in a pattern. The options used however include the "experimental" -P which may make it less suitable than pcregrep:
-P, --perl-regexp
Interpret PATTERN as a Perl regular expression. This is highly
experimental and grep -P may warn of unimplemented features.
-z, --null-data
Treat the input as a set of lines, each terminated by a zero
byte (the ASCII NUL character) instead of a newline. Like the
-Z or --null option, this option can be used with commands like
sort -z to process arbitrary file names.
-o, --only-matching
Print only the matched (non-empty) parts of a matching line,
with each such part on a separate output line.
Some experimental options are useful, others less so. This one was noted as the source of problems in Searching for non-ascii characters.
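Applied to the question, the -P and -z combination could be used roughly as follows. This is a sketch that assumes GNU grep built with PCRE support; the two file names are hypothetical. With -z each file is read as a single record (as long as it contains no NUL bytes), so \s* can span the line break between the two tags, and -l prints only the names of matching files.

```shell
# Hypothetical sandbox: one file with an empty cert element, one with content.
dir=$(mktemp -d)
printf 'xxxxx\n<cert>\n</cert>\nsome other stuff\n' > "$dir/empty-cert.txt"
printf 'xxxxx\n<cert>\nABC\n</cert>\n'              > "$dir/full-cert.txt"

# \s* matches any whitespace between the tags, including newlines.
matches=$(cd "$dir" && grep -lPz '<cert>\s*</cert>' -- *)
echo "$matches"
```

The original attempt hangs because tr is given no input file and therefore waits on standard input, while grep, having been given file names, ignores the pipe.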
