Linux counting words in random characters - linux

I have generated files of random characters from A-Z and a-z in different sizes, for example 10000 or 1000000 characters. I would like to count how many times the word 'cat' or 'dog' appears in them. Could someone provide a Linux command (grep ... | wc ... or anything else) that can handle this task?

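grep has a -c option that counts the number of matching lines, not the number of individual matches.
So
grep -c "cat\|dog" <file name>
reports how many lines contain 'cat' or 'dog'; if the whole file is a single line, the result is at most 1.
Add -i if you want a case-insensitive count.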

You can use grep with the flag -o. For example:
grep -o "dog\|cat" <filename> | wc -l
About the flag -o, according to man grep: «Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.»
This solution works in several situations: multiple lines, a single line, the word surrounded by whitespace or by other characters, etc. Matches are counted without overlap.
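The difference between counting matching lines and counting individual matches can be checked on a small sample. The following is a minimal sketch: the sample text and the variable names are made up for illustration, and -o is assumed to be available (it is in GNU and BSD grep, but POSIX does not require it).

```shell
# Hypothetical sample: three lines, the first one contains three matches.
sample=$(mktemp)
printf 'xxcatxxdogxxcat\nnothinghere\nDOGcat\n' > "$sample"

# -c counts matching lines.
line_count=$(grep -cE 'cat|dog' "$sample")

# -o prints every match on its own line, so wc -l counts matches.
match_count=$(grep -oE 'cat|dog' "$sample" | wc -l)

# -i additionally matches DOG, Cat, and so on.
match_count_nocase=$(grep -oiE 'cat|dog' "$sample" | wc -l)

echo "lines: $line_count, matches: $match_count, case-insensitive: $match_count_nocase"
rm -f "$sample"
```

On this sample, -c sees 2 matching lines while -o with wc -l sees 4 matches (5 with -i), which is why the -o form is the one to use for a file that is one long line of random characters.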

Related

How to count exact match of certain patterns in a text file using linux shell command?

I want to count the occurrences of one particular pattern in a text file that also contains many similar patterns, using a Linux shell command.
I have a text file which contains the patterns below,
[--------------]
[+--------------+]
[+----------+------------+--------------------+]
[+---------------------+---------------------+]
How to find exact count of only first pattern [--------------]?
Note: Don't include square bracket as a pattern. Only special character inside square bracket is a pattern.
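sed -e 's/\]/]\n/g' ./file | grep -c "\[--------------\]"
sed reads the file and inserts a newline after every ] (the g flag handles all occurrences on a line, not just the first; \n in the replacement is a GNU sed feature)
grep searches every resulting line for your expression and, with -c, prints the number of matching lines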
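An alternative that avoids splitting lines with sed is to let grep print each occurrence on its own line and count those. This is a sketch under assumptions: the sample file is invented, and -o is assumed to be supported by the installed grep.

```shell
# The pattern is kept in a variable so the sample data and the search agree.
pat='[--------------]'
sample=$(mktemp)
{
  printf '%s%s\n' "$pat" "$pat"        # two occurrences on one line
  printf '%s\n' '[+--------------+]'   # similar, but starts with [+
  printf '%s\n' "$pat"                 # one more occurrence
} > "$sample"

# -F treats the pattern as a fixed string, so [ ] and - need no escaping;
# -o emits one line per occurrence, which wc -l then counts.
count=$(grep -oF -e "$pat" "$sample" | wc -l)
echo "$count"
rm -f "$sample"
```

The line with [+--------------+] is not counted because the fixed string requires a dash directly after the opening bracket.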

grep the files that contain specific word and don't contain those words

I need to use the grep command to get all the files that contain the word [form]
and, at the same time, contain neither [head] nor [cfGenerate].
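With GNU grep and PCRE (-P):
grep -P '^(?!.*head)(?!.*cfGenerate).*form'
Negative lookaheads can be combined so that the match fails if any unwanted pattern occurs. Note that grep applies the pattern line by line, so this selects lines that contain form but neither head nor cfGenerate; a file with form on one line and head on another still produces a match.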
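If the condition is meant to hold for each file as a whole rather than for each line, one possible approach is to test every file with two quiet grep calls. This is only a sketch: the directory, the file names, and their contents are hypothetical.

```shell
# Hypothetical sample files in a scratch directory.
dir=$(mktemp -d)
printf '<form>\n</form>\n' > "$dir/a.txt"        # form only
printf '<head>\n<form>\n' > "$dir/b.txt"         # form and head on different lines
printf '<cfGenerate>\n<form>\n' > "$dir/c.txt"   # form and cfGenerate
printf '<body>\n' > "$dir/d.txt"                 # no form at all

# Keep a file if some line contains form and no line contains head or cfGenerate.
wanted=$(
  for f in "$dir"/*; do
    if grep -q 'form' "$f" && ! grep -q -e 'head' -e 'cfGenerate' "$f"; then
      basename "$f"
    fi
  done
)
echo "$wanted"
rm -rf "$dir"
```

Only a.txt is reported here. The patterns match substrings, so header would count as head; adding -w to both grep calls restricts them to whole words.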

Using grep to get 12 letter alphabet only lines

Using grep
How many 12 letter - alphabet only lines are in testing.txt?
excerpt of testing.txt
tyler1
Tanktop_Paedo
xyz2#geocities.com
milt#uole.com
justincrump
cranges10
namer#uole.com
soulfunkbrotha
timetolearnz
hotbooby#geocities.com
Fire_Crazy
helloworldad
dingbat#geocities.com
from this excerpt, I want to get a result of 2. (helloworldad, and timetolearnz)
I want to check every line and grep only those that have 12 characters in each line. I can't think of a way to do this with grep though.
For the alphabet only, I think I can use
grep [A-Za-z] testing.txt
However, how do I make it so only the characters [A-Za-z] show up in those 12 characters?
You can do it with extended regex -E and by specifying that the match is exactly {12} characters from start ^ to finish $
$ grep -E "^[A-Za-z]{12}$" testing.txt
timetolearnz
helloworldad
Or if you want to get the count -c of the lines you can use
$ grep -cE "^[A-Za-z]{12}$" testing.txt
2
grep supports whole-line match and counting, e.g.:
grep -xc '[[:alpha:]]\{12\}' testing.txt
Output:
2
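The [:alpha:] character class matches alphabetic characters; in the C locale that is the same as A-Z plus a-z, while other locales may include more letters. It has to appear inside a bracket expression, hence [[:alpha:]]. With -x the whole line must match, so ^ and $ are not needed. See section 3.2 of the info pages: info grep 'Regular Expressions' 'Character Classes and Bracket Expressions' for more on this subject, or look it up in the PDF manual online.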
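To see why the anchors (or -x) matter, the commands above can be compared with an unanchored pattern. This sketch uses a few lines from the excerpt in the question; LC_ALL=C is set only to make the bracket ranges behave predictably, which is an assumption about the environment.

```shell
# Sample lines taken from the excerpt of testing.txt in the question.
sample=$(mktemp)
printf '%s\n' tyler1 Tanktop_Paedo justincrump soulfunkbrotha \
  timetolearnz Fire_Crazy helloworldad > "$sample"

# Unanchored: any line containing a run of 12 letters matches,
# including the 14-letter soulfunkbrotha.
unanchored=$(LC_ALL=C grep -cE '[A-Za-z]{12}' "$sample")

# Anchored with ^ and $: the whole line must be exactly 12 letters.
anchored=$(LC_ALL=C grep -cE '^[A-Za-z]{12}$' "$sample")

# -x anchors the whole line implicitly.
whole_line=$(LC_ALL=C grep -cxE '[[:alpha:]]{12}' "$sample")

echo "unanchored: $unanchored, anchored: $anchored, -x: $whole_line"
rm -f "$sample"
```

The unanchored count is 3, while both anchored forms give the expected 2.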

How to print the longest word in a file by using combination of grep and wc

I am trying to find the longest word in a text file.
I found the number of characters in the longest word in the file
by using the command
wc -L
I need to print the longest word by using this number and the grep command.
If you must use the two commands given, I'd suggest:
grep -E ".{$(wc -L < test.txt)}" test.txt
The command substitution is used to build the correct brace expression to match the line(s) with exactly the given number of characters. -E is needed to enable extended regular expression support; otherwise, the braces need to be escaped: grep ".\{...\}" test.txt. Note that wc -L reports the length of the longest line, so this prints the longest word only if the file has one word per line.
Using an awk command that makes a single pass through the file may be faster.
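A single-pass awk version might look like the following. It is a sketch, not part of the original answer: it treats every whitespace-separated field as a word, and the sample text is invented.

```shell
# Hypothetical sample with several words per line.
sample=$(mktemp)
printf 'a quick brown\nextraordinary fox\njumps\n' > "$sample"

# Visit every whitespace-separated field once and remember the longest;
# on a tie the first word seen is kept.
longest_word=$(awk '{
  for (i = 1; i <= NF; i++)
    if (length($i) > max) { max = length($i); word = $i }
} END { print word }' "$sample")

echo "$longest_word"
rm -f "$sample"
```

Unlike the grep and wc -L combination, this works when lines hold more than one word, and it reads the file only once.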

Find the number of occurrences of certain string sequences

I want to count the number of occurrences of the IP address 192.168.1.10 in a text file using grep | wc.
The command I use is:
cat ./capture.txt|grep "192.168.1.10"|wc -w
which returns 0, and I don't know why.
Here is the content of my .txt file:
Give this a try:
grep -Fwo '192.168.1.10' file|wc -l
-F makes grep treat your pattern as a literal string instead of a regex
-w excludes 192.168.1.101 or 192.168.1.100
-o prints each match on its own line. grep matches line by line, so without -o a line that contains the pattern twice would be counted only once and the occurrence count would be wrong.
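grep -c "\b192\.168\.1\.10\b" ./capture.txt
\. matches a literal dot, not any character
\b matches at the beginning or end of a word (a GNU grep extension)
-c returns the number of matching lines, which equals the number of occurrences only if the address appears at most once per line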
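The effect of the individual flags can be checked on a small file. The capture contents below are hypothetical, since the original file is not shown; the sketch assumes a grep that supports -o and -w.

```shell
# Hypothetical capture file.
sample=$(mktemp)
{
  printf 'src 192.168.1.10 dst 192.168.1.10\n'   # address twice on one line
  printf 'src 192.168.1.101 dst 10.0.0.1\n'      # longer address, must not count
  printf 'src 192.168.1.10 dst 10.0.0.2\n'       # address once
  printf 'id 192x168y1z10\n'                     # matches only if . means any character
} > "$sample"

# Literal, whole-word matches, one per output line: true occurrence count.
occurrences=$(grep -Fwo '192.168.1.10' "$sample" | wc -l)

# Same pattern with -c: counts lines, so the doubled line counts once.
matching_lines=$(grep -Fwc '192.168.1.10' "$sample")

# Plain regex without -F or -w: also hits 192.168.1.101 and 192x168y1z10.
loose_lines=$(grep -c '192.168.1.10' "$sample")

echo "occurrences: $occurrences, lines: $matching_lines, loose: $loose_lines"
rm -f "$sample"
```

Here the occurrence count is 3, the line count is 2, and the unescaped, unanchored regex matches 4 lines.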
