Word count after zgrep - linux

I'm a little confused as to what this counts:
zgrep -B 4 "xyz" <filename> | grep -i abcd | wc -l
Does it count the occurences of abcd in the 4 lines xyz? "wc -l" returns a different count than the number of lines present inside the file.

Prints every 'xyz' and 4 lines before from the file.
zgrep -B 4 "xyz" <filename>
Prints only lines, containing (case-insensitive) 'abcd'.
grep -i abcd
Counts the lines.
wc -l
All in all, it shows how many case-insensitive 'abcd' line can be find in file being withing 4 lines before 'xyz'.

It counts the number of abcd strings in the result of your first zgrep -B 4 "xyz" <filename> filter operation.
zgrep -B 4 "xyz" <filename>
will return all lines with the "xyz" string and the four lines that precede them if they exist.

Related

Linux: Number of characters in a text file on lines 'x' through 'y'

How do I print the number of characters on lines x - y of a text file?
I tried using wc -m filename.txt
but I couldn't figure out how to limit the search.
You could use
head -y filename | tail -(y-x+1) | wc -m
You can use the sed command to select the lines you want and then pipe the output into wc. Something like this would select lines 6-10 and print the number of characters:
sed -n '6,10p' filename.txt | wc -m
Try this:
awk '{ print NR, "-", length($0)}' filename.txt
It will print the line number NR and the characters per line length($0) of filename.txt so output will be something like:
1 - 3 # line 1 with 3 characters
2 - 0 # line 2 with no characters
...
In case you just want to print the number of characters for a specific range, let's say from line 1 to 3, this could be used:
awk 'NR>=1 && NR<=3 { print length($0)}' filename.txt

Finding the Number of strings in a File

I'm trying to write a very small program that will check the number of sub strings in a large text file. All it will do is count the first 2000 lines of the text file, find any "TTT" sub-strings, count them, and set a variable to that total. I'm a bit new to shell, so any help would be amazingly appreciated!
#!/bin/bash
$counter=(head -2000 [file name] | grep TTT | grep -o TTT | wc -l)
echo $counter
For what it's worth you might awk better suited for this task:
awk -F"ttt" '{j=(NF-1)+j}END{print j}' filename
This will split each record in your file by delimiter "ttt". Then it counts the number of fields, subtracts one, and adds that to the total.
A file like:
ttt tttttt something
1 5 ttt
tt
one more ttt record
Would be split (visualizing with pipe delim) like:
| || something
1 5 |
tt
one more | record
Counting the number of fields per record:
4
2
1
2
Subtracting one from that:
3
1
0
1
Which totals to 5, which is how many "ttt" substrings are present.
To incorporate this into your script (and fixing your other issue):
#!/bin/bash
counter=$(awk -F"ttt" '{j=(NF-1)+j}END{print j}' filename)
echo $counter
The change here is that when we set a variable in Bash we don't include the $ sign at the front. Only in referencing the variable do we include the $.
You have some minor syntax errors there, probably you meant this:
counter=$(head -2000 [file name] | grep TTT | grep -o TTT | wc -l)
echo $counter
Notice the tiny changes I made there to make it work.
Btw the grep TTT in the middle is redundant, you can simply drop it, that is:
counter=$(head -2000 [file name] | grep -o TTT | wc -l)
grep can already do what you want: counter=$(grep -c TTT $infile). You can limit the number of hits (not lines) with -m NUM, --max-count=NUM, which makes grep stop at the end of the file OR when NUM occurrences are found.

Obtaining the total of coincidences with multiple pattern using grep command

I have a file in Linux contains strings:
CALLTMA
Starting
Starting
Ending
Starting
Ending
Ending
CALLTMA
Ending
I need the quantity of any string (FE. #Ending, # Starting, #CALLTMA). In my example I need obtaining:
CALLTMA : 2
Starting: 3
Ending : 4
I can obtaining this output when I execute 3 commands:
grep -i "Starting" "/myfile.txt" | wc -l
grep -i "Ending" "/myfile.txt" | wc -l
grep -i "CALLTMA" "/myfile.txt" | wc -l
I want to know if it is possible to obtain the same output using only one command.
I try running this command
grep -iE "CALLTMA|Starting|Ending" "/myfile.txt" | wc -l
But this returned the total of coincidences. I appreciate your help .
Use sort and uniq:
sort myfile.txt | uniq -c
The -c adds the counts to the unique lines. If you want to sort the output by frequency, add
| sort -n
to the end (and change to -nr if you want the descending order).
A simple awk way to handle this:
awk '{counts[$1]++} END{for (c in counts) print c, counts[c]}' file
Starting 3
Ending 4
CALLTMA 2
grep -c will work. You can put it all together in a short script:
for i in Starting CALLTMA Ending; do
printf "%-8s : %d\n" "$i" $(grep -c "$i" file.txt)
done
(to enter the search terms as arguments, just use the arguments array for the loop list, e.g. for i in "$#"; do)
Output
Starting : 3
CALLTMA : 2
Ending : 4

bash remove the same in file

I have one issue with getting number different strings.
I have two files, for example :
file1 :
aaa1
aaa4
bbb3
ccc2
and
file2:
bbb3
ccc2
aaa4
How from this get value 1 (in this case aaa1 string reason)?
I have one query but it calculates not only different strings, them also takes into account the order of the rows.
diff file1 file2 | grep "<" | wc -l
Thanks.
You can use grep -v -c with other options as this:
grep -cvwFf file2 file1
1
Options used are:
-c - get the count of matches
-v - invert matches
-w - full word match (to avoid partial matches)
-F - fixed string match
-f - Use a file for matching patterns
As far as I understand your requirements, sorting the files prior to the diff is a quick solution:
sort file1 > file1.sorted
sort file2 > file2.sorted
diff file1.sorted file2.sorted | egrep "[<>]" | wc -l

wc -l is NOT counting last of the file if it does not have end of line character

I need to count all lines of an unix file. The file has 3 lines but wc -l gives only 2 count.
I understand that it is not counting last line because it does not have end of line character
Could any one please tell me how to count that line as well ?
grep -c returns the number of matching lines. Just use an empty string "" as your matching expression:
$ echo -n $'a\nb\nc' > 2or3.txt
$ cat 2or3.txt | wc -l
2
$ grep -c "" 2or3.txt
3
It is better to have all lines ending with EOL \n in Unix files. You can do:
{ cat file; echo ''; } | wc -l
Or this awk:
awk 'END{print NR}' file
This approach will give the correct line count regardless of whether the last line in the file ends with a newline or not.
awk will make sure that, in its output, each line it prints ends with a new line character. Thus, to be sure each line ends in a newline before sending the line to wc, use:
awk '1' file | wc -l
Here, we use the trivial awk program that consists solely of the number 1. awk interprets this cryptic statement to mean "print the line" which it does, being assured that a trailing newline is present.
Examples
Let us create a file with three lines, each ending with a newline, and count the lines:
$ echo -n $'a\nb\nc\n' >file
$ awk '1' f | wc -l
3
The correct number is found.
Now, let's try again with the last new line missing:
$ echo -n $'a\nb\nc' >file
$ awk '1' f | wc -l
3
This still provides the right number. awk automatically corrects for a missing newline but leaves the file alone if the last newline is present.
Respect
I respect the answer from John1024 and would like to expand upon it.
Line Count function
I find myself comparing line counts A LOT especially from the clipboard, so I have defined a bash function. I'd like to modify it to show the filenames and when passed more than 1 file a total. However, it hasn't been important enough for me to do so far.
# semicolons used because this is a condensed to 1 line in my ~/.bash_profile
function wcl(){
if [[ -z "${1:-}" ]]; then
set -- /dev/stdin "$#";
fi;
for f in "$#"; do
awk 1 "$f" | wc -l;
done;
}
Counting lines without the function
# Line count of the file
$ cat file_with_newline | wc -l
3
# Line count of the file
$ cat file_without_newline | wc -l
2
# Line count of the file unchanged by cat
$ cat file_without_newline | cat | wc -l
2
# Line count of the file changed by awk
$ cat file_without_newline | awk 1 | wc -l
3
# Line count of the file changed by only the first call to awk
$ cat file_without_newline | awk 1 | awk 1 | awk 1 | wc -l
3
# Line count of the file unchanged by awk because it ends with a newline character
$ cat file_with_newline | awk 1 | awk 1 | awk 1 | wc -l
3
Counting characters (why you don't want to put a wrapper around wc)
# Character count of the file
$ cat file_with_newline | wc -c
6
# Character count of the file unchanged by awk because it ends with a newline character
$ cat file_with_newline | awk 1 | awk 1 | awk 1 | wc -c
6
# Character count of the file
$ cat file_without_newline | wc -c
5
# Character count of the file changed by awk
$ cat file_without_newline | awk 1 | wc -c
6
Counting lines with the function
# Line count function used on stdin
$ cat file_with_newline | wcl
3
# Line count function used on stdin
$ cat file_without_newline | wcl
3
# Line count function used on filenames passed as arguments
$ wcl file_without_newline file_with_newline
3
3

Resources