Combine Two Files With Common Column - linux

I have two files that look like the following
First File:
FileA
FileB
FileC
Second File:
FileA 2
FileC 2
I want the third file to look like the following:
FileA FileA 2
FileB
FileC FileC 2
Basically I'm doing a selective paste. I'm open to any awk or sed solution in order to achieve the desired results.

It's a job for join:
join -a1 -o 1.1 2.1 2.2 file1 file2

Using awk you can do:
awk 'FNR == NR{a[$1]=$0; next} {print $0, a[$1]}' file2 file1
FileA FileA 2
FileB
FileC FileC 2

Related

How to print only words that doesn't match between two files? [duplicate]

This question already has answers here:
Compare two files line by line and generate the difference in another file
(14 answers)
Closed 2 years ago.
FILE1:
cat
dog
house
tree
FILE2:
dog
cat
tree
I need to be printed only:
house
$ cat file1
cat
dog
house
tree
$ cat file2
dog
cat
tree
$ grep -vF -f file2 file1
house
The -v flag only shows non-matches, -f is for a filename to use as a filter, and -F is for exact matches (doesn't slow it down with any pattern matching).
Using awk
awk 'FNR==NR{arr[$0]=1; next} !($0 in arr)' FILE2 FILE1
First build an associative array with words from FILE2 and than loop over FILE1 and only print those.
Using comm
comm -2 -3 <(sort FILE1) <(sort FILE2)
-2 suppresses lines unique to FILE2 and -3 suppresses lines found in both.
If you want just the words, you can sort the files, diff them, then use sed to filter out diff's symbols:
diff <(sort file1) <(sort file2) | sed -n '/^</s/^< //p'
Awk is an option here:
awk 'NR==FNR { arr[$1]="1" } NR != FNR { if (arr[$1] == "") { print $0 } } ' file2 file1
Create an array called arr, using the contents of file2 as indexes. Then with file1, look at each entry and check to see if an entry in the array arr exists. If it doesn't, print.

AWK to filter to files if their columns match

I basically am working with two files (file1 and file2). The goal is to write a script that pulls rows from file1, if columns 1,2,3 match between files1 and files2. Here's the code I have been playing with:
awk -F'|' 'NR==FNR{c[$1$2$3]++;next};c[$1$2$3] > 0' file1 file2 > filtered.txt
ile1 and file2 both look like this (but has many more columns):
name1 0 c
name1 1 c
name1 2 x
name2 3 x
name2 4 c
name2 5 c
The awk code I provided isn't producing any output. Any help would be appreciated!
your delimiter isn't pipe, try this
$ awk 'NR==FNR {c[$1,$2,$3]++; next} c[$1,$2,$3]' file1 file2 > filtered.txt
or
$ awk 'NR==FNR {c[$0]++; next} c[$0]' file1 file2 > filtered.txt
however, if you're matching the whole line perhaps easier with grep
$ grep -xFf file1 file2 > filtered.txt
awk '{key=$1 FS $2 FS $3} NR==FNR{file2[key];next} key in file2' file2 file1

How to extract some missing rows by comparing two different files in linux?

I have two diferrent files which some rows are missing in one of the files. I want to make a new file including those non-common rows between two files. as and example, I have following files:
file1:
id1
id22
id3
id4
id43
id100
id433
file2:
id1
id2
id22
id3
id4
id8
id43
id100
id433
id21
I want to extract those rows which exist in file2 but do not in file1:
new file:
id2
id8
id21
any suggestion please?
Use the comm utility (assumes bash as the shell):
comm -13 <(sort file1) <(sort file2)
Note how the input must be sorted for this to work, so your delta will be sorted, too.
comm uses an (interleaved) 3-column layout:
column 1: lines only in file1
column 2: lines only in file2
column 3: lines in both files
-13 suppresses columns 1 and 2, which prints only the values exclusive to file2.
Caveat: For lines to be recognized as common to both files they must match exactly - seemingly identical lines that differ in terms of whitespace (as is the case in the sample data in the question as of this writing, where file1 lines have a trailing space) will not match.
cat -et is a command that visualizes line endings and control characters, which is helpful in diagnosing such problems.
For instance, cat -et file1 would output lines such as id1 $, making it obvious that there's a trailing space at the end of the line (represented as $).
If instead of cleaning up file1 you want to compare the files as-is, try:
comm -13 <(sed -E 's/ +$//' file1 | sort) <(sort file2)
A generalized solution that trims leading and trailing whitespace from the lines of both files:
comm -13 <(sed -E 's/^[[:blank:]]+|[[:blank:]]+$//g' file1 | sort) \
<(sed -E 's/^[[:blank:]]+|[[:blank:]]+$//g' file2 | sort)
Note: The above sed commands require either GNU or BSD sed.
Edit: I only wanted to change 1 character but 6 is the minimum... Delete this...
You can try to sort both files then count the duplicate rows and select only those row where the count is 1
sort file1 file2 | uniq -c | awk '$1 == 1 {print $2}'

How to store missmatching rows from two files to a new file

I have two input files as follows And I need to write the mismatching rows from second file to a new file.Each column in the file is separated by a tab space
Input 1
1 94564350 . C A
1 94564350 . C T
Input 2
1 94564351 . A T
1 94564351 . A C
1 94564350 . C A
and the Output is
1 94564351 . A T
1 94564351 . A C
I have tried this command
awk -F"\t" 'NR==FNR{a[$0];next}($2 in a)&& $1>=3' fileB fileA >fileC
but not working.
awk 'NR == FNR{a[$0];next} !($0 in a)' fileA fileB
above command also taking too much time for big files is there any other options to do the same
Try this taken from Idiomatic awk:
awk 'NR == FNR{a[$0];next} !($0 in a)' fileA fileB
You don't need to assign -F="\t", awk interprets it properly on files like these.
Test
$ awk 'NR == FNR{a[$0];next} !($0 in a)' fileA fileB
1 94564351 . A T
1 94564351 . A C

script to join 2 separate text files and also add specified text

hi there i want to creat a bash script that does the following:
i have 2 texts files one is adf.ly links and the other Recipie names
i want to creat a batch scrript that takes each line from each text file and do the following
<li>**Recipie name line 1 of txt file**</li>
<li>**Recipie name line 2 of txt file**</li>
ect ect and save all the results to another text file called LINKS.txt
someone please help or point me in direction of linux bash script
this awk one-liner will do the job:
awk 'BEGIN{l="<li>%s</li>\n"}NR==FNR{a[NR]=$0;next}{printf l, a[FNR],$0}' file1 file2
more clear version (same script):
awk 'BEGIN{l="<li>%s</li>\n"}
NR==FNR{a[NR]=$0;next}
{printf l, a[FNR],$0}' file1 file2
example:
kent$ seq -f"%g from file1" 7 >file1
kent$ seq -f"%g from file2" 7 >file2
kent$ head file1 file2
==> file1 <==
1 from file1
2 from file1
3 from file1
4 from file1
5 from file1
6 from file1
7 from file1
==> file2 <==
1 from file2
2 from file2
3 from file2
4 from file2
5 from file2
6 from file2
7 from file2
kent$ awk 'BEGIN{l="<li>%s</li>\n"};NR==FNR{a[NR]=$0;next}{printf l, a[FNR],$0}' file1 file2
<li>1 from file2</li>
<li>2 from file2</li>
<li>3 from file2</li>
<li>4 from file2</li>
<li>5 from file2</li>
<li>6 from file2</li>
<li>7 from file2</li>
EDIT for the comment of OP:
if you have only one file: (the foo here is just dummy text)
awk 'BEGIN{l="<li>foo</li>\n"}{printf l,$0}' file1
output from same file1 example:
<li>foo</li>
<li>foo</li>
<li>foo</li>
<li>foo</li>
<li>foo</li>
<li>foo</li>
<li>foo</li>
if you want to save the output to a file:
awk 'BEGIN{l="<li>foo</li>\n"}{printf l,$0}' file1 > newfile
Try doing this :
$ cat file1
aaaa
bbb
ccc
$ cat file2
111
222
333
$ paste file1 file2 | while read a b; do
printf '<li>%s</li>\n' "$a" "$b"
done | tee newfile
Output
<li>111</li>
<li>222</li>
<li>333</li>

Resources