This question already has answers here:
Count all occurrences of a string in lots of files with grep
(16 answers)
Closed 25 days ago.
This post was edited and submitted for review 25 days ago and failed to reopen the post:
Original close reason(s) were not resolved
I have a single column CSV file with no header and I want to iteratively find the value of each row and count the number of times it appears in several files.
Something like this:
for i in file.csv:
zcat *json.gz | grep i | wc -l
However, I don't know how to iterate through the csv and pass the values forward
Imagine that file.csv is:
foo,
bar
If foo exists 20 times in *json.gz and bar exists 30 times in *json.gz, I would expect the output of my command to be:
20
30
Here is the solution I found:
while IFS=',' read -r column; do
count=$(zgrep -o "$column" *json.gz | wc -l)
echo "$column,$count"; done < file.csv
You can achieve that with a single grep operation treating file.csv as a patterns file (obtaining patterns one per line):
grep -f file.csv -oh *.json | wc -l
-o - to print only matched parts
-h - to suppress file names from the output
You can iterate through output of cat run through subprocess:
for i in `cat file.csv`: # iterates through all the rows in file.csv
do echo "My value is $i"; done;
using chatgpt :), try this:
#!/bin/bash
# Define the name of the CSV file
csv_file="path/to/file.csv"
# Extract the values from each row of the CSV file
values=$(cut -f1 "$csv_file" | uniq -c)
# Loop through each file
for file in path/to/file1 path/to/file2 path/to/file3
do
# Extract the values from each row of the file
file_values=$(cut -f1 "$file" | uniq -c)
# Compare the values and print the results
for value in $values
do
count=$(echo $value | cut -f1 -d' ')
val=$(echo $value | cut -f2 -d' ')
file_count=$(echo $file_values | grep -o "$val" | wc -l)
echo "$val appears $count times in $csv_file and $file_count times in $file"
done
done
Related
I have certain tags stored in a series of .log files and i would like for the grep to show me Only the values > 31, meaning different to 0 and higher than 31
I have this code:
#! /bin/bash
-exec grep -cFH "tag" {} ; | grep -v ':[0-31]$' >> file.txt
echo < file.txt
Output:
I have this result from the grep:
/opt/logs/folder2.log:31
i was expecting not to bring nothing back if the result is 31 or less but still shows the result 31
i have also tried to add:
|tail -n+31
but didn't work
[0-31] means "0 or 1 or 2 or 3 or 1".
To drop all lines with 0-9, 10-19, 20-29, 30, and 31, you could use the following:
... | grep -ve ':[0-9]$' -e ':[12][0-9]$' -e ':3[01]$'
or as single regex:
... | grep -v ':\([12]\?[0-9]\|3[01]\)$'
With extended grep:
... | grep -vE ':([12]?[0-9]|3[01])$'
This question already has answers here:
Getting the count of unique values in a column in bash
(7 answers)
Closed 2 years ago.
I have a directory with n files in it, all starting with a date in format yyyymmdd.
Example:
20210208_bla.txt
20210208_bla2.txt
20210209_bla.txt
I want know how many files of a certain date I have, so output should be like:
20210208 112
20210209 96
20210210 213
...
Or at least find the different beginnings of the actual files (=the different dates) in my folder.
Thanks
A very simple solution would be to do something like:
ls | cut -f 1 -d _ | sort -n | uniq -c
With your example this gives:
2 20210208
1 20210209
Update: If you need to swap the two columns you can follow: https://stackoverflow.com/a/11967849/2001017
ls | cut -f 1 -d _ | sort -n | uniq -c | awk '{ t = $1; $1 = $2; $2 = t; print; }'
which prints:
20210208 2
20210209 1
I'm trying to write a very small program that will check the number of sub strings in a large text file. All it will do is count the first 2000 lines of the text file, find any "TTT" sub-strings, count them, and set a variable to that total. I'm a bit new to shell, so any help would be amazingly appreciated!
#!/bin/bash
$counter=(head -2000 [file name] | grep TTT | grep -o TTT | wc -l)
echo $counter
For what it's worth you might awk better suited for this task:
awk -F"ttt" '{j=(NF-1)+j}END{print j}' filename
This will split each record in your file by delimiter "ttt". Then it counts the number of fields, subtracts one, and adds that to the total.
A file like:
ttt tttttt something
1 5 ttt
tt
one more ttt record
Would be split (visualizing with pipe delim) like:
| || something
1 5 |
tt
one more | record
Counting the number of fields per record:
4
2
1
2
Subtracting one from that:
3
1
0
1
Which totals to 5, which is how many "ttt" substrings are present.
To incorporate this into your script (and fixing your other issue):
#!/bin/bash
counter=$(awk -F"ttt" '{j=(NF-1)+j}END{print j}' filename)
echo $counter
The change here is that when we set a variable in Bash we don't include the $ sign at the front. Only in referencing the variable do we include the $.
You have some minor syntax errors there, probably you meant this:
counter=$(head -2000 [file name] | grep TTT | grep -o TTT | wc -l)
echo $counter
Notice the tiny changes I made there to make it work.
Btw the grep TTT in the middle is redundant, you can simply drop it, that is:
counter=$(head -2000 [file name] | grep -o TTT | wc -l)
grep can already do what you want: counter=$(grep -c TTT $infile). You can limit the number of hits (not lines) with -m NUM, --max-count=NUM, which makes grep stop at the end of the file OR when NUM occurrences are found.
I have a file in Linux contains strings:
CALLTMA
Starting
Starting
Ending
Starting
Ending
Ending
CALLTMA
Ending
I need the quantity of any string (FE. #Ending, # Starting, #CALLTMA). In my example I need obtaining:
CALLTMA : 2
Starting: 3
Ending : 4
I can obtaining this output when I execute 3 commands:
grep -i "Starting" "/myfile.txt" | wc -l
grep -i "Ending" "/myfile.txt" | wc -l
grep -i "CALLTMA" "/myfile.txt" | wc -l
I want to know if it is possible to obtain the same output using only one command.
I try running this command
grep -iE "CALLTMA|Starting|Ending" "/myfile.txt" | wc -l
But this returned the total of coincidences. I appreciate your help .
Use sort and uniq:
sort myfile.txt | uniq -c
The -c adds the counts to the unique lines. If you want to sort the output by frequency, add
| sort -n
to the end (and change to -nr if you want the descending order).
A simple awk way to handle this:
awk '{counts[$1]++} END{for (c in counts) print c, counts[c]}' file
Starting 3
Ending 4
CALLTMA 2
grep -c will work. You can put it all together in a short script:
for i in Starting CALLTMA Ending; do
printf "%-8s : %d\n" "$i" $(grep -c "$i" file.txt)
done
(to enter the search terms as arguments, just use the arguments array for the loop list, e.g. for i in "$#"; do)
Output
Starting : 3
CALLTMA : 2
Ending : 4
I need to count all lines of an unix file. The file has 3 lines but wc -l gives only 2 count.
I understand that it is not counting last line because it does not have end of line character
Could any one please tell me how to count that line as well ?
grep -c returns the number of matching lines. Just use an empty string "" as your matching expression:
$ echo -n $'a\nb\nc' > 2or3.txt
$ cat 2or3.txt | wc -l
2
$ grep -c "" 2or3.txt
3
It is better to have all lines ending with EOL \n in Unix files. You can do:
{ cat file; echo ''; } | wc -l
Or this awk:
awk 'END{print NR}' file
This approach will give the correct line count regardless of whether the last line in the file ends with a newline or not.
awk will make sure that, in its output, each line it prints ends with a new line character. Thus, to be sure each line ends in a newline before sending the line to wc, use:
awk '1' file | wc -l
Here, we use the trivial awk program that consists solely of the number 1. awk interprets this cryptic statement to mean "print the line" which it does, being assured that a trailing newline is present.
Examples
Let us create a file with three lines, each ending with a newline, and count the lines:
$ echo -n $'a\nb\nc\n' >file
$ awk '1' f | wc -l
3
The correct number is found.
Now, let's try again with the last new line missing:
$ echo -n $'a\nb\nc' >file
$ awk '1' f | wc -l
3
This still provides the right number. awk automatically corrects for a missing newline but leaves the file alone if the last newline is present.
Respect
I respect the answer from John1024 and would like to expand upon it.
Line Count function
I find myself comparing line counts A LOT especially from the clipboard, so I have defined a bash function. I'd like to modify it to show the filenames and when passed more than 1 file a total. However, it hasn't been important enough for me to do so far.
# semicolons used because this is a condensed to 1 line in my ~/.bash_profile
function wcl(){
if [[ -z "${1:-}" ]]; then
set -- /dev/stdin "$#";
fi;
for f in "$#"; do
awk 1 "$f" | wc -l;
done;
}
Counting lines without the function
# Line count of the file
$ cat file_with_newline | wc -l
3
# Line count of the file
$ cat file_without_newline | wc -l
2
# Line count of the file unchanged by cat
$ cat file_without_newline | cat | wc -l
2
# Line count of the file changed by awk
$ cat file_without_newline | awk 1 | wc -l
3
# Line count of the file changed by only the first call to awk
$ cat file_without_newline | awk 1 | awk 1 | awk 1 | wc -l
3
# Line count of the file unchanged by awk because it ends with a newline character
$ cat file_with_newline | awk 1 | awk 1 | awk 1 | wc -l
3
Counting characters (why you don't want to put a wrapper around wc)
# Character count of the file
$ cat file_with_newline | wc -c
6
# Character count of the file unchanged by awk because it ends with a newline character
$ cat file_with_newline | awk 1 | awk 1 | awk 1 | wc -c
6
# Character count of the file
$ cat file_without_newline | wc -c
5
# Character count of the file changed by awk
$ cat file_without_newline | awk 1 | wc -c
6
Counting lines with the function
# Line count function used on stdin
$ cat file_with_newline | wcl
3
# Line count function used on stdin
$ cat file_without_newline | wcl
3
# Line count function used on filenames passed as arguments
$ wcl file_without_newline file_with_newline
3
3