linux command grep -is "abc" filename|wc -l - linux

what does the s mean there and also when pipe into wc what is that for? I know it eventually count the number of abc appeared in file filename, but not sure about the option s for and also pipe to wc mean
linux command grep -is "abc" filename|wc -l
output
47

-s means "suppress error messages about unreadable files" and the pipe to wc means "take the output and send it to the wc -l command" which effectively counts the number of lines matched. You can accomplish the same with the -c option to grep: grep -isc "abc" filename

Consider,
command_1 | command_2
Role of the pipe is that- it takes output of command written before it (command_1 here) and supplies that output to the command written after it (command_2 here).

The man page has everything you would want to know about the options for grep:
-s, --no-messages
Suppress error messages about nonexistent or unreadable files.
Portability note: unlike GNU grep, traditional grep did not con-
form to POSIX.2, because traditional grep lacked a -q option and
its -s option behaved like GNU grep's -q option. Shell scripts
intended to be portable to traditional grep should avoid both -q
and -s and should redirect output to /dev/null instead.
The pipe to wc -l is what gives you the count of how many lines the string "abc" appeared on. It isn't necessarily the number of times the string appeared in the file since one line with multiple occurrences is going to be counted as only 1.

grep man page says:
-s, --no-messages suppress error messages
grep returns the lines that have abc (case insensitive) in them. You pipe them to wc to get a count of the number of lines.

From man grep:
-s, --no-messages
Suppress error messages about nonexistent or unreadable files.
The wc command counts line, words and characters. With -l it returns the number of lines.

Related

GREP to show files WITH text and WITHOUT text

I am trying to search for files with specific text but excluding a certain text and showing only the files.
Here is my code:
grep -v "TEXT1" *.* | grep -ils "ABC2"
However, it returns:
(standard input)
Please suggest. Thanks a lot.
The output should only show the filenames.
Here's one way to do it, assuming you want to match these terms anywhere in the file.
grep -LZ 'TEXT1' *.* | xargs -0 grep -li 'ABC2'
-L will match files not containing the given search term
use -LiZ if you want to match TEXT1 irrespective of case
The -Z option is needed to separate filenames with NUL character and xargs -0 will then separate out filenames based on NUL character
If you want to check these two conditions on same line instead of anywhere in the file:
grep -lP '^(?!.*TEXT1).*(?i:ABC2)' *.*
-P enables PCRE, which I assume you have since linux is tagged
(?!regexp) is a negative lookahead construct, so ^(?!.*TEXT1) will ensure the line doesn't have TEXT1
(?i:ABC2) will match ABC2 case insensitively
Use grep -liP '^(?!.*TEXT1).*ABC2' if you want to match both terms irrespective of case
(standard input)
This error is due to use of grep -l in a pipeline as your second grep command is reading input from stdin not from a file and -l option is printing (standard input) instead of the filename.
You can use this alternate solution in a single awk command:
awk '/ABC2/ && !/TEXT1/ {print FILENAME; nextfile}' *.* 2>/dev/null

what does the option Inone mean for linux command xargs?

I have seen the Inone option for linux command xargs ,but I googled and did not find out what the option means.
seq 2 | xargs -Inone cat file
Using seq with xargs:
This is a clever use of xargs and seq to avoid writing a loop. It is basically the equivalent of:
for i in {1..2}; do
cat file
done
That is, it will run cat file once for each line of output from the seq command. The -Inone simply prevents xargs from appending the value read from seq to the command; see the xargs man page for details on the -I option:
-I replace-str
Replace occurrences of replace-str in the initial-ar‐
guments with names read from standard input. Also,
unquoted blanks do not terminate input items; instead
the separator is the newline character. Implies -x
and -L 1.

Optimizing search in linux

I have a huge log file close to 3GB in size.
My task is to generate some reporting based on # of times something is being logged.
I need to find the number of time StringA , StringB , StringC is being called separately.
What I am doing right now is:
grep "StringA" server.log | wc -l
grep "StringB" server.log | wc -l
grep "StringC" server.log | wc -l
This is a long process and my script takes close to 10 minutes to complete. What I want to know is that whether this can be optimized or not ? Is is possible to run one grep command and find out the number of time StringA, StringB and StringC has been called individually ?
You can use grep -c instead of wc -l:
grep -c "StringA" server.log
grep can't report count of individual strings. You can use awk:
out=$(awk '/StringA/{a++;} /StringB/{b++;} /StringC/{c++;} END{print a, b, c}' server.log)
Then you can extract each count with a simple bash array:
arr=($out)
echo "StringA="${arr[0]}
echo "StringA="${arr[1]}
echo "StringA="${arr[2]}
This (grep without wc) is certainly going to be faster and possibly awk solution is also faster. But I haven't measured any.
Certainly this approach could be optimized since grep doesn't perform any text indexing. I would use a text indexing engine like one of those from this review or this stackexchange QA . Also you may consider using journald from systemd which stores logs in a structured and indexed format so lookups are more effective.
So many greps so little time... :-)
According to David Lyness, a straight grep search is about 7 times as fast as an awk in large file searches.
If that is the case, the current approach could be optimized by changing grep to fgrep, but only if the patterns being searched for are not regular expressions. fgrep is optimized for fixed patterns.
If the number of instances is relatively small compared to the original log file entries, it may be an improvement to use the egrep version of grep to create a temporary file filled with all three instances:
egrep "StringA|StringB|StringC" server.log > tmp.log
grep "StringA" tmp.log | wc -c
grep "StringB" tmp.log | wc -c
grep "StringC" tmp.log | wc -c
The egrep variant of grep allows for a | (vertical bar/pipe) character to be used between two or more separate search strings so that you can find multiple strings in statement. You can use grep -E to do the same thing.
Full documentation is in the man grep page and information about the Extended Regular Expressions that egrep uses from the man 7 re_format command.

grep using output from another command

Say I have command1 which outputs this:
b05808aa-c6ad-4d30-a334-198ff5726f7c
59996d37-9008-4b3b-ab22-340955cb6019
2b41f358-ff6d-418c-a0d3-ac7151c03b78
7ac4995c-ff2c-4717-a2ac-e6870a5670f0
I also have command2 which outputs this:
b05808aa-c6ad-4d30-a334-198ff5726f7c
59996d37-9008-4b3b-ab22-340955cb6019
Is there a way to grep the output from command1 to not include any lines matched from command2, so that the final output would look like this?
2b41f358-ff6d-418c-a0d3-ac7151c03b78
7ac4995c-ff2c-4717-a2ac-e6870a5670f0
Issue this grep
command1 | grep -vF -f <(command2)
Here,
-F means Fixed string match*
-v means invert match
-f means the file with patterns
<(command) actually creates a FIFO with that command and use it on redirection.
To get all the lines from the output of command1 that do not appear in the output of command2:
grep -vFf <(command2) <(command1)
-f tells grep to use patterns that come from a file. In this case, that file is the output of command2. -F tells grep that those patterns are to be treated as fixed strings, not regex. -v tells grep to invert its normal behavior and just show lines the lines that do not match.

Pipe output to use as the search specification for grep on Linux

How do I pipe the output of grep as the search pattern for another grep?
As an example:
grep <Search_term> <file1> | xargs grep <file2>
I want the output of the first grep as the search term for the second grep. The above command is treating the output of the first grep as the file name for the second grep. I tried using the -e option for the second grep, but it does not work either.
You need to use xargs's -i switch:
grep ... | xargs -ifoo grep foo file_in_which_to_search
This takes the option after -i (foo in this case) and replaces every occurrence of it in the command with the output of the first grep.
This is the same as:
grep `grep ...` file_in_which_to_search
Try
grep ... | fgrep -f - file1 file2 ...
If using Bash then you can use backticks:
> grep -e "`grep ... ...`" files
the -e flag and the double quotes are there to ensure that any output from the initial grep that starts with a hyphen isn't then interpreted as an option to the second grep.
Note that the double quoting trick (which also ensures that the output from grep is treated as a single parameter) only works with Bash. It doesn't appear to work with (t)csh.
Note also that backticks are the standard way to get the output from one program into the parameter list of another. Not all programs have a convenient way to read parameters from stdin the way that (f)grep does.
I wanted to search for text in files (using grep) that had a certain pattern in their file names (found using find) in the current directory. I used the following command:
grep -i "pattern1" $(find . -name "pattern2")
Here pattern2 is the pattern in the file names and pattern1 is the pattern searched for
within files matching pattern2.
edit: Not strictly piping but still related and quite useful...
This is what I use to search for a file from a listing:
ls -la | grep 'file-in-which-to-search'
Okay breaking the rules as this isn't an answer, just a note that I can't get any of these solutions to work.
% fgrep -f test file
works fine.
% cat test | fgrep -f - file
fgrep: -: No such file or directory
fails.
% cat test | xargs -ifoo grep foo file
xargs: illegal option -- i
usage: xargs [-0opt] [-E eofstr] [-I replstr [-R replacements]] [-J replstr]
[-L number] [-n number [-x]] [-P maxprocs] [-s size]
[utility [argument ...]]
fails. Note that a capital I is necessary. If i use that all is good.
% grep "`cat test`" file
kinda works in that it returns a line for the terms that match but it also returns a line grep: line 3 in test: No such file or directory for each file that doesn't find a match.
Am I missing something or is this just differences in my Darwin distribution or bash shell?
I tried this way , and it works great.
[opuser#vjmachine abc]$ cat a
not problem
all
problem
first
not to get
read problem
read not problem
[opuser#vjmachine abc]$ cat b
not problem xxy
problem abcd
read problem werwer
read not problem 98989
123 not problem 345
345 problem tyu
[opuser#vjmachine abc]$ grep -e "`grep problem a`" b --col
not problem xxy
problem abcd
read problem werwer
read not problem 98989
123 not problem 345
345 problem tyu
[opuser#vjmachine abc]$
You should grep in such a way, to extract filenames only, see the parameter -l (the lowercase L):
grep -l someSearch * | xargs grep otherSearch
Because on the simple grep, the output is much more info than file names only. For instance when you do
grep someSearch *
You will pipe to xargs info like this
filename1: blablabla someSearch blablabla something else
filename2: bla someSearch bla otherSearch
...
Piping any of above line makes nonsense to pass to xargs.
But when you do grep -l someSearch *, your output will look like this:
filename1
filename2
Such an output can be passed now to xargs
I have found the following command to work using $() with my first command inside the parenthesis to have the shell execute it first.
grep $(dig +short) file
I use this to look through files for an IP address when I am given a host name.

Resources