grep matches counted per keyword? - linux

I am using
$ grep -cf keyword_file Folder1/*
to generate a count of how many lines, in files housed within Folder1, match a keyword from the keyword_file.
This generates a per-file total aggregated over all keywords, such as
file1: 7
file2: 4
file3: 9
My problem:
I would like the output to be in the form below:
first_keyword
file1: 5
file2: 0
file3: 5
second_keyword
file1: 0
file2: 3
file3: 1
third_keyword
file1: 2
file2: 1
file3: 3
so that I can see how many lines of each file contain each individual keyword.
How do I achieve this?
===== added detail =====
keyword_file is at Documents/script_pad/keyword_file
Folder1 is at Documents/script_pad/Folder1

What worked for me was
creating a file "Documents/script_pad/loop2" containing
#!/bin/bash
while read -r line
do
    echo "$line"; grep -c "$line" Documents/script_pad/Folder1/*
done < Documents/script_pad/keyword_file
which when run resulted in
$ bash Documents/script_pad/loop2
first_keyword
Documents/script_pad/Folder1/file1:5
Documents/script_pad/Folder1/file2:0
Documents/script_pad/Folder1/file3:5
second_keyword
Documents/script_pad/Folder1/file1:0
Documents/script_pad/Folder1/file2:3
Documents/script_pad/Folder1/file3:1
third_keyword
Documents/script_pad/Folder1/file1:2
Documents/script_pad/Folder1/file2:1
Documents/script_pad/Folder1/file3:3
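As a possible refinement (my own suggestion, not from the thread): a single-pass awk sketch that reads the keyword file once and counts matching lines in every data file in one scan, instead of re-running grep per keyword. Keywords are matched as fixed substrings, like grep -cF would; the file and keyword names below are illustrative.

```shell
# Single-pass sketch: per-keyword, per-file counts of matching lines.
cd "$(mktemp -d)"
mkdir Folder1
printf 'first_keyword\nsecond_keyword\n' > keyword_file
printf 'first_keyword here\nplain line\nfirst_keyword again\n' > Folder1/file1
printf 'second_keyword\n' > Folder1/file2

out=$(awk '
    NR == FNR { order[++n] = $0; next }   # pass 1: remember keywords in order
    FNR == 1  { files[++m] = FILENAME }   # record each data file once
    { for (i = 1; i <= n; i++)            # count lines containing each keyword
        if (index($0, order[i])) c[order[i], FILENAME]++ }
    END { for (i = 1; i <= n; i++) {
            print order[i]
            for (j = 1; j <= m; j++)
                printf "%s:%d\n", files[j], c[order[i], files[j]] + 0
          } }' keyword_file Folder1/*)
printf '%s\n' "$out"
```

Like grep -c, index() here counts each matching line once, even if a keyword occurs twice on the same line.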


How can I combine one file's tail with another's head?

I know how to take, e.g., the first 2 lines of one .txt file and append them to the end of another. But how do I add the last 2 lines of one .txt file before the first line of another?
I've tried :
tail -n 2 test1.txt >> head test1.txt  # take the last 2 lines and append them to the head
This looks awfully wrong, but I can't find the answer anywhere for doing it with tail and head.
cat test1.txt
Someone please correct my code so I get my expected result.
Just run the two commands one after the other; the resulting stdout is exactly what you would get by concatenating their output, so no explicit concatenation step is needed:
tail -n 2 test1.txt
head -n 1 test1.txt
If you want to redirect their output together, put them in a brace group:
{
tail -n 2 test1.txt
head -n 1 test1.txt
} >out.txt
What about:
$ cat file1.txt
file 1 line 1
file 1 line 2
file 1 line 3
file 1 line 4
$ cat file2.txt
file 2 line 1
file 2 line 2
file 2 line 3
file 2 line 4
$ tail -n 2 file1.txt > output.txt
$ head -n 1 file2.txt >> output.txt
$ cat output.txt
file 1 line 3
file 1 line 4
file 2 line 1
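If the real goal is to modify a file on disk (an assumed reading of the question), note that a plain redirect truncates its target before it is read, so you cannot redirect onto one of the input files directly; a temp file avoids that. A sketch prepending the last 2 lines of test1.txt to test2.txt:

```shell
cd "$(mktemp -d)"
printf 'a\nb\nc\nd\n' > test1.txt
printf 'x\ny\n'       > test2.txt

# Build the result in a temp file, then move it over the target.
{ tail -n 2 test1.txt; cat test2.txt; } > tmp.$$ && mv tmp.$$ test2.txt
cat test2.txt
```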

Extracting difference values between two files [duplicate]

This question already has answers here:
Fast way of finding lines in one file that are not in another?
(11 answers)
Closed 7 years ago.
Working in a Linux/shell environment, how can I accomplish the following:
text file 1 contains:
1
2
3
4
5
text file 2 contains:
6
7
1
2
3
4
I need to extract the entries in file 2 which are not in file 1 ('6' and '7' in this example), together with the name of the file they were found in (e.g. '6 file2', '7 file2').
I already use this awk command
awk 'FNR==NR{a[$0]++;next}!a[$0]' file1 file2
but it only shows the difference (6 and 7), not which file each entry came from.
How can I do this from the command line?
many thanks!
Using awk you can do this:
awk 'FNR==NR { seen[$0]=FILENAME; next }
{if ($1 in seen) delete seen[$1]; else print $1, FILENAME}
END { for (i in seen) print i, seen[i] }' file{1,2}
6 file2
7 file2
5 file1
While traversing file1 we store each line in the array seen, with FILENAME as the value. Then, while iterating over file2, we print each entry that is missing from seen and delete the entries we do find (the common ones). Finally, the END block prints whatever remains in seen, i.e. the lines unique to file1.
The comm program will tell you which lines the files have in common (or which are unique to one file). comm requires its inputs to be sorted lexically, hence the sort calls below.
$ echo "only in file1"; comm -2 -3 <(sort file1) <(sort file2)
only in file1
5
$ echo "only in file2"; comm -1 -3 <(sort file1) <(sort file2)
only in file2
6
7
$ echo "common to file1 and file2"; comm -1 -2 <(sort file1) <(sort file2)
common to file1 and file2
1
2
3
4
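For just the one-directional case ("in file2 but not in file1") there is also a grep sketch using only standard flags: -F fixed strings, -x whole-line match, -v invert the match, -f read patterns from a file. Unlike comm it does not require sorted input:

```shell
cd "$(mktemp -d)"
printf '1\n2\n3\n4\n5\n' > file1
printf '6\n7\n1\n2\n3\n4\n' > file2

# Lines of file2 that match no whole line of file1.
out=$(grep -Fxvf file1 file2)
printf '%s\n' "$out"
```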

how to sort a file according to another file?

Is there a Unix one-liner, or some other quick way on Linux, to sort one file according to the permutation produced by sorting another file?
i.e.:
file1: (separated by CRLFs, not spaces)
2
3
7
4
file2:
a
b
c
d
sorted file1:
2
3
4
7
so the result of this one liner should be
sorted file2:
a
b
d
c
paste file1 file2 | sort | cut -f2
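Run on the example data, the pipeline looks like this (note: plain sort is lexical; if file1 holds multi-digit numbers, use sort -n so 10 does not sort before 2):

```shell
cd "$(mktemp -d)"
printf '2\n3\n7\n4\n' > file1
printf 'a\nb\nc\nd\n' > file2

# paste glues line i of each file with a tab; sort orders the pairs by
# file1's values; cut keeps only file2's column.
out=$(paste file1 file2 | sort | cut -f2)
printf '%s\n' "$out"
```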
Below is a perl one-liner that will print the contents of file2 based on the sorted input of file1.
perl -n -e 'BEGIN{our($x,$t,@a)=(0,1,)}if($t){$a[$.-1]=$_}else{$a[$.-1].=$_ unless($.>$x)};if(eof){$t=0;$x=$.;close ARGV};END{foreach(sort @a){($j,$l)=split(/\n/,$_,2);print qq($l)}}' file1 file2
Note: If the files are different lengths, the output will only print up to the shortest file length.
For example, if file-A has 5 lines and file-B has 8 lines then the output will only be 5 lines.

Combine file names and content

I got several files like this:
First file is named XXX
1
2
3
Second file is named YYY
4
5
6
I would like to write content and the file names to a separate file that would look like this:
1 XXX
2 XXX
3 XXX
4 YYY
5 YYY
6 YYY
Can someone suggest a way to do this?
awk '{print $0,FILENAME}' file1 file2
Or Ruby (1.9+):
$ ruby -ne 'puts "#{$_.chomp} #{ARGF.filename}"' file1 file2
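Run on the example files, the awk one-liner gives exactly the requested layout:

```shell
cd "$(mktemp -d)"
printf '1\n2\n3\n' > XXX
printf '4\n5\n6\n' > YYY

# FILENAME is the file awk is currently reading.
out=$(awk '{ print $0, FILENAME }' XXX YYY)
printf '%s\n' "$out"
```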
Without further explanation of what you actually need, this should work:
for file in *
do
    printf '%s ' "$file" >> outfile
    cat "$file" >> outfile
done

linux file compare string unincluded

How can I compare two files in Linux containing, for example:
file1
1
2
3
4
5
file2
1
2
3
and to get the result
file3
4
5
How about using comm: Select or reject lines common to two files?
comm -3 file1 file2 > file3
would work for your simple example.
If you want to list all the lines that are in file1, but not in file2, you can do this:
diff file1 file2 | grep "^<" | sed "s/^< //" > file3
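A quick check of that pipeline on the example files (diff marks lines present only in file1 with "<"; grep keeps those lines; sed strips the marker):

```shell
cd "$(mktemp -d)"
printf '1\n2\n3\n4\n5\n' > file1
printf '1\n2\n3\n' > file2

diff file1 file2 | grep '^<' | sed 's/^< //' > file3
cat file3
```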
