Counting words in a apropos command - linux

I have this command apropos -l apple. I want to count the word "apple" from the output of the command. I am a beginner in UNIX commands, and have an idea that I have to use grep or wc, but I'm not sure how. Any help would be appreciated.

apropos -l apple | grep -io apple
There are many options within grep that can help you meet your objective and the above is just an example.
Take the output of apropos -l and then pipe through to grep. With grep, we search for all "apple" entries with -i to show any case combinations. We then finally output the generated list through to wc -l to count the lines and therefore the entries,

Related

Optimizing search in linux

I have a huge log file close to 3GB in size.
My task is to generate some reporting based on # of times something is being logged.
I need to find the number of time StringA , StringB , StringC is being called separately.
What I am doing right now is:
grep "StringA" server.log | wc -l
grep "StringB" server.log | wc -l
grep "StringC" server.log | wc -l
This is a long process and my script takes close to 10 minutes to complete. What I want to know is that whether this can be optimized or not ? Is is possible to run one grep command and find out the number of time StringA, StringB and StringC has been called individually ?
You can use grep -c instead of wc -l:
grep -c "StringA" server.log
grep can't report count of individual strings. You can use awk:
out=$(awk '/StringA/{a++;} /StringB/{b++;} /StringC/{c++;} END{print a, b, c}' server.log)
Then you can extract each count with a simple bash array:
arr=($out)
echo "StringA="${arr[0]}
echo "StringA="${arr[1]}
echo "StringA="${arr[2]}
This (grep without wc) is certainly going to be faster and possibly awk solution is also faster. But I haven't measured any.
Certainly this approach could be optimized since grep doesn't perform any text indexing. I would use a text indexing engine like one of those from this review or this stackexchange QA . Also you may consider using journald from systemd which stores logs in a structured and indexed format so lookups are more effective.
So many greps so little time... :-)
According to David Lyness, a straight grep search is about 7 times as fast as an awk in large file searches.
If that is the case, the current approach could be optimized by changing grep to fgrep, but only if the patterns being searched for are not regular expressions. fgrep is optimized for fixed patterns.
If the number of instances is relatively small compared to the original log file entries, it may be an improvement to use the egrep version of grep to create a temporary file filled with all three instances:
egrep "StringA|StringB|StringC" server.log > tmp.log
grep "StringA" tmp.log | wc -c
grep "StringB" tmp.log | wc -c
grep "StringC" tmp.log | wc -c
The egrep variant of grep allows for a | (vertical bar/pipe) character to be used between two or more separate search strings so that you can find multiple strings in statement. You can use grep -E to do the same thing.
Full documentation is in the man grep page and information about the Extended Regular Expressions that egrep uses from the man 7 re_format command.

diff command to get number of different lines only

Can I use the diff command to find out how many lines do two files differ in?
I don't want the contextual difference, just the total number of lines that are different between two files. Best if the result is just a single integer.
diff can do all the first part of the job but no counting; wc -l does the rest:
diff -y --suppress-common-lines file1 file2 | wc -l
Yes you can, and in true Linux fashion you can use a number of commands piped together to perform the task.
First you need to use the diff command, to get the differences in the files.
diff file1 file2
This will give you an output of a list of changes. The ones your interested in are the lines prefixed with a '>' symbol
You use the grep tool to filter these out as follows
diff file1 file2 | grep "^>"
finally, once you have a list of the changes your interested in, you simply use the wc command in line mode to count the number of changes.
diff file1 file2 | grep "^>" | wc -l
and you have a perfect example of the philosophy that Linux is all about.

How to make grep to stop searching in each file after N lines?

It's best to describe the use by a hypothetical example:
Searching for some useful header info in a big collection of email storage (each email in a separate file). e.g. doing stats of top mail client apps used.
Normally if you do grep you can specify -m to stop at first match but let's say an email does not contact X-Mailer or whatever it is we are looking for in a header? It will scan through the whole email. Since most headers are <50 lines performance could be increased by telling grep to search only 50 lines on any file. I could not find a way to do that.
I don't know if it would be faster but you could do this with awk:
awk '/match me/{print;exit}FNR>50{exit}' *.mail
will print the first line matching match me if it appears in the first 50 lines. (If you wanted to print the filename as well, grep style, change print; to print FILENAME ":" $0;)
awk doesn't have any equivalent to grep's -r flag, but if you need to recursively scan directories, you can use find with -exec:
find /base/dir -iname '*.mail' \
-exec awk '/match me/{print FILENAME ":" $0;exit}FNR>50{exit}' {} +
You could solve this problem by piping head -n50 through grep but that would undoubtedly be slower since you'd have to start two new processes (one head and one grep) for each file. You could do it with just one head and one grep but then you'd lose the ability to stop matching a file as soon as you find the magic line, and it would be awkward to label the lines with the filename.
you can do something like this
head -50 <mailfile>| grep <your keyword>
Try this command:
for i in *
do
head -n 50 $i | grep -H --label=$i pattern
done
output:
1.txt: aaaaaaaa pattern aaaaaaaa
2.txt: bbbb pattern bbbbb
ls *.txt | xargs head -<N lines>| grep 'your_string'

Bash Sorting STDIN

I want to write a bash script that sorts the input by rules in different files. The first rule is to write all chars or strings in file1. The second rule is to write all numbers in file2. The third rule is to write all alphanumerical strings in file3. All specials chars must be ignored. Because I am not familiar with bash I don t know how to realize this.
Could someone help me?
Thanks,
Haniball
Thanks for the answers,
I wrote this script,
#!/bin/bash
inp=0 echo "Which filename for strings?"
read strg
touch $strg
echo "Which filename for nums?"
read nums
touch $nums
echo "Which filename for alphanumerics?"
read alphanums
touch $alphanums
while [ "$inp" != "quit" ]
do
echo "Input: "
read inp
echo $inp | grep -o '\<[a-zA-Z]+>' > $strg
echo $inp | grep -o '\<[0-9]>' > $nums
echo $inp | grep -o -E '\<[0-9]{2,}>' > $nums
done
After I ran it, it only writes string in the stringfile.
Greetings, Haniball
Sure can help. See here:
How To Ask Questions The Smart Way
Help Vampires: A Spotter’s Guide
cool site about the bash is here: http://wiki.bash-hackers.org/doku.php
for sorting try man sort
for pattern matching try man grep
other useful tools: man sed man awk man strings man tee
And it is always correct tag your homework as "homework" ;)
You can try something like:
<input_file strings -1 -a | tee chars_and_strings.txt |\
grep "^[A-Za-z0-9][A-Za-z0-9]*$" | tee alphanum.txt |\
grep "^[0-9][0-9]*$" > numonly.txt
The above is only for USA - no international (read unicode) chars, where things coming a little bit more complicated.
grep is sufficient (your question is a bit vague. If I got something wrong, let me know...)
Using the following input file:
this is a string containing words,
single digits as in 1 and 2 as well
as whole numbers 42 1066
all chars or strings
$ grep -o '\<[a-zA-Z]\+\>' sorting_input
this
is
a
string
containing
words
single
digits
as
in
and
as
well
all single digit numbers
$ grep -o '\<[0-9]\>' sorting_input
1
2
all multiple digit numbers
$ grep -o -E '\<[0-9]{2,}\>' sorting_input
42
1066
Redirect the output to a file, i.e. grep ... > file1
Bash really isn't the best language for this kind of task. While possible, ild highly recommend the use of perl, python, or tcl for this.
That said, you can write all of stdin from input to a temporary file with shell redirection. Then, use a command like grep to output matches to another file. It might look something like this.
#!/bin/bash
cat > temp
grep pattern1 > file1
grep pattern2 > file2
grep pattern3 > file3
rm -f temp
Then run it like this:
cat file_to_process | ./script.sh
I'll leave the specifics of the pattern matching to you.

linux command grep -is "abc" filename|wc -l

what does the s mean there and also when pipe into wc what is that for? I know it eventually count the number of abc appeared in file filename, but not sure about the option s for and also pipe to wc mean
linux command grep -is "abc" filename|wc -l
output
47
-s means "suppress error messages about unreadable files" and the pipe to wc means "take the output and send it to the wc -l command" which effectively counts the number of lines matched. You can accomplish the same with the -c option to grep: grep -isc "abc" filename
Consider,
command_1 | command_2
Role of the pipe is that- it takes output of command written before it (command_1 here) and supplies that output to the command written after it (command_2 here).
The man page has everything you would want to know about the options for grep:
-s, --no-messages
Suppress error messages about nonexistent or unreadable files.
Portability note: unlike GNU grep, traditional grep did not con-
form to POSIX.2, because traditional grep lacked a -q option and
its -s option behaved like GNU grep's -q option. Shell scripts
intended to be portable to traditional grep should avoid both -q
and -s and should redirect output to /dev/null instead.
The pipe to wc -l is what gives you the count of how many lines the string "abc" appeared on. It isn't necessarily the number of times the string appeared in the file since one line with multiple occurrences is going to be counted as only 1.
grep man page says:
-s, --no-messages suppress error messages
grep returns the lines that have abc (case insensitive) in them. You pipe them to wc to get a count of the number of lines.
From man grep:
-s, --no-messages
Suppress error messages about nonexistent or unreadable files.
The wc command counts line, words and characters. With -l it returns the number of lines.

Resources