Linux grep command is not returning accurate results

I have a text file that contains some HTML content, and I want to count the occurrences of a specific word using the grep command, but grep is not returning accurate results. OS: Red Hat Enterprise Linux Server release 6.6 (Santiago).
Below is the content of the input file, test.txt.
This file has two occurrences of the word "Tomcat":
<html><title>Tomcat Server</title><body><font face="Verdana, Arial" size="-1"><p>Tomcat Server</p></body></html>
grep command
cat test.txt|grep -c Tomcat
cat test.txt|grep -c "Tomcat"
Note: It's the same result with or without quotes
Expected Result: count - 2
Actual Result: count - 1

Note the difference between "accurate" and "desired." The grep man page says of the -c flag:
Suppress normal output; instead print a count of matching lines for each input file. With the -v, --invert-match option (see below), count non-matching lines.
So it's counting that one line had a match, and that's what it tells you.
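To count every occurrence instead of every matching line, one option is to combine -o (print each match on its own line) with wc -l. A minimal sketch that recreates the question's test.txt in a scratch directory; the temporary path and the variable names are only illustrative:

```shell
# Recreate the sample file from the question in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' '<html><title>Tomcat Server</title><body><font face="Verdana, Arial" size="-1"><p>Tomcat Server</p></body></html>' > "$workdir/test.txt"

# -c counts matching LINES: the file has a single line, so the count is 1.
line_count=$(grep -c 'Tomcat' "$workdir/test.txt")

# -o prints each match on its own line, so wc -l counts occurrences.
match_count=$(grep -o 'Tomcat' "$workdir/test.txt" | wc -l)

echo "lines: $line_count, occurrences: $match_count"
rm -r "$workdir"
```

The cat in the original command is not needed, since grep can read the file directly.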

Related

Linux counting words in random characters

I have generated files of random characters (A-Z and a-z) in different sizes, for example 10,000 or 1,000,000 characters. I would like to find out how many times the word 'cat' or 'dog' appears in them. Could someone provide a Linux command (grep ... | wc ..., or anything else) that can handle this task?
grep has a -c option that counts the number of lines containing a match (each line is counted once, however many matches it holds).
So
grep -c "cat\|dog" <file name>
Add -i if you want a case-insensitive count.
You can use grep with the flag -o. For example:
grep -o "dog\|cat" <filename> | wc -l
About the flag -o, according to man grep: «Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.»
This solution works in several situations: multiple lines, a single line, the word surrounded by whitespace or other characters, etc.
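The difference between the two approaches is easiest to see on a small sample. A rough sketch with made-up file contents; it uses -E with 'cat|dog', which should behave like the "cat\|dog" pattern above under GNU grep:

```shell
# Hypothetical sample: three matches on line 1, one on line 2, none on line 3.
workdir=$(mktemp -d)
printf 'abcatxdogqcat\nzzdogzz\nnomatch\n' > "$workdir/random.txt"

# -c counts lines that contain at least one match: lines 1 and 2.
matching_lines=$(grep -c -E 'cat|dog' "$workdir/random.txt")

# -o emits one output line per match, so wc -l gives the total occurrences.
total_matches=$(grep -o -E 'cat|dog' "$workdir/random.txt" | wc -l)

# Counting each word separately works the same way.
cat_count=$(grep -o 'cat' "$workdir/random.txt" | wc -l)
dog_count=$(grep -o 'dog' "$workdir/random.txt" | wc -l)

echo "lines: $matching_lines, matches: $total_matches (cat: $cat_count, dog: $dog_count)"
rm -r "$workdir"
```

If the random file is a single long line, -c can only ever report 0 or 1, so the -o form is the one to use there.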

How to grep for a matching word, not the surrounding line, with a wildcard?

Maybe an odd question, but I'm attempting to grep the output of a command to select just the matching word and not the line. This word also has a wildcard in it.
git log --format=%aD <file> | tail -1 | grep -oh 201
The first and second sections of the command check the git log for a file and grab the line containing the date and time of creation. I'm attempting to write a bash script that does something with the year the file was created, so I need to grab just that one word (the year).
Looking at the grep documentation, -o specifically prints the matching word (and -h suppresses filenames). I can't find anything that allows for matching the rest of the word that it's matching, though (I could just be spacing).
So the output of that previous command is:
201
And I need it to be (as an example):
2017
Help would be much appreciated!
You can use . as a wildcard character:
$ echo 'before2017after' | grep -o '201.'
2017
Or, better yet, specify that the fourth character be a digit:
$ echo 'before2017after' | grep -o '201[[:digit:]]'
2017
Notes:
Since you are getting input from stdin, there are no filenames. Consequently, in this case, -h changes nothing.
[[:digit:]] is a Unicode-safe way of specifying a digit.
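Since the goal is to use the year in a bash script, the match can be captured in a variable. A small sketch; the sample date string is only a hypothetical stand-in for what git log --format=%aD might print, and the variable names are made up:

```shell
# Hypothetical stand-in for the output of: git log --format=%aD <file> | tail -1
sample_date='Tue, 14 Mar 2017 10:22:33 +0100'

# -o keeps only the matched text; [[:digit:]] completes the partial year "201".
year=$(printf '%s\n' "$sample_date" | grep -o '201[[:digit:]]')

echo "created in: $year"
```

In a real script the printf would be replaced by the git log pipeline from the question.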

grep an empty value in a binary file in linux

I have a binary file on a Linux machine with the values AB=^] (^] is an empty value), AB=N and AB=Y. I want to get the count of occurrences of AB=^] in the file.
I am using the following command :
zcat Logfile|grep 'AB=^]' |wc -l
but it gives a count of 0. The above command works fine for AB=N and AB=Y, so I guess I am searching for the wrong pattern. What should I search for, if not AB=^]?
Output for the above command:
gzip: Logfile: unexpected end of file
0
Here 0 indicates the number of occurrences of the tag AB=^].
The deleted answers should basically work. Besides escaping the ^ and ] in your regex, you can also use their hexadecimal notation:
grep -o 'AB='$'\x5E'$'\x5D' file | wc -l
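One possibility the question does not confirm: many tools (cat -v, vi, less) display the control character 0x1D as ^]. If that is what the log actually contains, the pattern has to contain that single byte rather than a literal ^ followed by ]. A sketch on made-up sample data, not on the real log, covering both readings:

```shell
# Hypothetical sample with both variants: the 0x1D byte (twice) and a literal "^]" (once).
workdir=$(mktemp -d)
printf 'AB=\035\nAB=N\nAB=^]\nAB=\035\n' > "$workdir/sample.log"

# The single byte 0x1D (octal 035), which many tools display as ^].
gs=$(printf '\035')

# -a treats the input as text even if grep would otherwise classify it as binary.
control_count=$(grep -a -o "AB=$gs" "$workdir/sample.log" | wc -l)

# Literal caret and bracket; -F turns off regex interpretation entirely.
literal_count=$(grep -a -o -F 'AB=^]' "$workdir/sample.log" | wc -l)

echo "control char: $control_count, literal: $literal_count"
rm -r "$workdir"
```

Separately, the "unexpected end of file" message from gzip suggests the archive is truncated, so any count would cover only the part that could be decompressed.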

LINUX Shell commands cat and grep

I am a Windows user with a basic understanding of Linux, and I encountered this command:
cat countryInfo.txt | grep -v "^#" >countryInfo-n.txt
After some research I found that cat is for concatenation and grep is for regular expression search (I don't know if I am right), but what will the above command result in, since the two are combined?
Thanks in advance.
EDIT: I am asking this because I don't have Linux installed; otherwise I could test it.
Short answer: it removes all lines starting with a # and stores the result in countryInfo-n.txt.
Long explanation:
cat countryInfo.txt reads the file countryInfo.txt and streams its content to standard output.
| connects the output of the left command with the input of the right command (so the right command can read what the left command prints).
grep -v "^#" returns all lines that do not (-v) match the regex ^# (which means: line starts with #).
Finally, >countryInfo-n.txt stores the output of grep into the specified file.
It will remove all lines starting with # and put the output in countryInfo-n.txt
This command would result in removing lines starting with # from the file countryInfo.txt and place the output in the file countryInfo-n.txt.
This command could also have been written as
grep -v "^#" countryInfo.txt > countryInfo-n.txt
See Useless Use of Cat.
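For anyone who wants to see the effect without the real data, here is a minimal sketch. The file contents are invented for illustration; only the file names come from the question:

```shell
# Hypothetical sample input: two comment lines, two data lines,
# and one line where # appears but not at the start.
workdir=$(mktemp -d)
printf '%s\n' '# comment header' 'AD Andorra' '# another comment' 'AE United Arab Emirates' 'note with # inside' > "$workdir/countryInfo.txt"

# -v inverts the match: keep only the lines that do NOT start with #.
grep -v '^#' "$workdir/countryInfo.txt" > "$workdir/countryInfo-n.txt"

kept_lines=$(wc -l < "$workdir/countryInfo-n.txt")
cat "$workdir/countryInfo-n.txt"
rm -r "$workdir"
```

The line containing # in the middle is kept, because ^ anchors the match to the start of the line. The original countryInfo.txt is not modified; only the new file lacks the comment lines.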

Output grep results to text file, need cleaner output

When using the grep command to find a search string in a set of files, how do I dump the results to a text file?
Also, is there a switch for the grep command that provides cleaner results for better readability, such as a line feed between each entry or a way to justify file names and search results?
For instance, a way to change...
./file/path: first result
./another/file/path: second result
./a/third/file/path/here: third result
to
./file/path: first result
./another/file/path: second result
./a/third/file/path/here: third result
grep -n "YOUR SEARCH STRING" * > output-file
The -n will print the line number and the > will redirect grep-results to the output-file.
If you want to "clean" the results you can filter them using pipe | for example:
grep -n "test" * | grep -v "mytest" > output-file
will match all the lines that have the string "test" except the lines that match the string "mytest" (that's the switch -v) - and will redirect the result to an output file.
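A small end-to-end sketch of that filter, using two invented input files. It searches *.txt rather than * and writes to a differently named result file, so the output file cannot end up among the inputs; that choice is an assumption of this example, not part of the answer above:

```shell
# Scratch directory with two hypothetical input files.
workdir=$(mktemp -d)
printf 'test one\nmytest two\nother\n' > "$workdir/a.txt"
printf 'nothing here\nrun the test\n' > "$workdir/b.txt"

# Search only *.txt so the result file is never one of the inputs.
(
  cd "$workdir" || exit 1
  grep -n 'test' *.txt | grep -v 'mytest' > results.out
)

kept=$(wc -l < "$workdir/results.out")
cat "$workdir/results.out"
rm -r "$workdir"
```

With more than one input file, grep prefixes each result with the file name, followed by the line number from -n.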
A few good grep-tips can be found in this post
Redirection of program output is performed by the shell.
grep ... > output.txt
grep has no mechanism for adding blank lines between each match, but does provide options such as context around the matched line and colorization of the match itself. See the grep(1) man page for details, specifically the -C and --color options.
To add a blank line between lines of text in grep output to make it easier to read, pipe (|) it through sed:
grep text-to-search-for file-to-grep | sed G
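The sed G step can be combined with the redirection from the earlier answers. A minimal sketch with an invented input file; G appends the (empty) hold space to every line, which produces a blank line after each match:

```shell
# Hypothetical input file with two matching lines.
workdir=$(mktemp -d)
printf 'first test line\nunrelated\nsecond test line\n' > "$workdir/notes.txt"

# grep finds the matches, sed G adds a blank line after each, > saves the result.
grep -n 'test' "$workdir/notes.txt" | sed G > "$workdir/output-file"

total_lines=$(wc -l < "$workdir/output-file")
cat "$workdir/output-file"
rm -r "$workdir"
```

Two matches become four lines in the output file: each result followed by an empty line.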
