How to find uncommon lines between two text files using shell script? - linux

I have two text files, file1.txt and file2.txt.
file1.txt contains:
a
b
c
file2.txt contains:
a
b
c
d
e
f
The output should be:
d
e
f
The command I'm trying to use is 'diff file2.txt file1.txt', but it doesn't give me just the lines unique to file2.txt.

Assuming that the input files are sorted:
join -v 2 file1.txt file2.txt
Check man join for details on all the other things join can do for you.
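A minimal run of the join answer on the question's sample data (join requires both inputs sorted, which they already are here):

```shell
# Recreate the sample files from the question.
printf '%s\n' a b c > file1.txt
printf '%s\n' a b c d e f > file2.txt

# -v 2 prints the lines of the second file that have no match in the first.
join -v 2 file1.txt file2.txt
```

This prints d, e, and f, one per line.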

Try one of the following:
grep -vf file1.txt file2.txt
comm -13 file1.txt file2.txt
With diff you have to do some extra filtering, since its output carries markers:
diff file1.txt file2.txt | grep '^>' | cut -c3-
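A sketch of the diff-based pipeline on the sample files; the '>' marker is anchored at the start of the line to avoid matching a '>' that happens to appear inside a data line:

```shell
printf '%s\n' a b c > file1.txt
printf '%s\n' a b c d e f > file2.txt

# Lines prefixed with "> " exist only in the second file; strip the prefix.
diff file1.txt file2.txt | grep '^>' | cut -c3-
```

As with join -v, this prints d, e, and f.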

Related

How to print only words that don't match between two files? [duplicate]

FILE1:
cat
dog
house
tree
FILE2:
dog
cat
tree
I need only this to be printed:
house
$ cat file1
cat
dog
house
tree
$ cat file2
dog
cat
tree
$ grep -vF -f file2 file1
house
The -v flag only shows non-matches, -f is for a filename to use as a filter, and -F is for exact matches (doesn't slow it down with any pattern matching).
Using awk
awk 'FNR==NR{arr[$0]=1; next} !($0 in arr)' FILE2 FILE1
First build an associative array keyed by the lines of FILE2, then loop over FILE1 and print only the lines that are not in the array.
Using comm
comm -2 -3 <(sort FILE1) <(sort FILE2)
-2 suppresses lines unique to FILE2 and -3 suppresses lines found in both.
If you want just the words, you can sort the files, diff them, then use sed to filter out diff's symbols:
diff <(sort file1) <(sort file2) | sed -n '/^</s/^< //p'
Awk is an option here:
awk 'NR==FNR { arr[$1]="1" } NR != FNR { if (arr[$1] == "") { print $0 } } ' file2 file1
Create an array called arr, using the contents of file2 as indexes. Then with file1, look at each entry and check to see if an entry in the array arr exists. If it doesn't, print.
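A minimal run of the associative-array awk approach on the sample words (file names assumed from the question):

```shell
# Recreate the sample files.
printf '%s\n' cat dog house tree > file1
printf '%s\n' dog cat tree > file2

# Keep lines of file1 that were never seen while reading file2.
awk 'FNR==NR{arr[$0]=1; next} !($0 in arr)' file2 file1
```

This prints house, the only line exclusive to file1.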

How do I print a line in one file based on a corresponding value in a second file?

I have two files:
File1:
A
B
C
File2:
2
4
3
I would like to print each line in file1 the number of times found on the corresponding line of file2, and then append each line to a separate file.
Desired output:
A
A
B
B
B
B
C
C
C
Here is one of the approaches I have tried:
touch output.list
paste file1 file2 > test.dict
cat test.dict
A 2
B 4
C 3
while IFS="\t" read -r f1 f2
do
yes "$f1" | head -n "$f2" >> output.list
done < test.dict
For my output I get a bunch of lines that read:
head: : invalid number of lines
Any guidance would be greatly appreciated. Thanks!
Change IFS to an ANSI-C quoted string ($'\t'), or simply drop the IFS assignment, since the default value already splits on tabs. (IFS="\t" sets IFS to the two literal characters backslash and t, so the tab from paste is never used as a separator and f2 ends up empty, hence the "invalid number of lines" error.)
You can also use process substitution and avoid the temporary file.
while IFS=$'\t' read -r f1 f2; do
yes "$f1" | head -n "$f2"
done < <(paste file1 file2) > output.list
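If process substitution is not available (plain POSIX sh), piping paste into the loop works as well; a minimal sketch with the sample data, relying on the default IFS to split on the tab that paste inserts:

```shell
# Recreate the sample files (names assumed from the question).
printf '%s\n' A B C > file1
printf '%s\n' 2 4 3 > file2

# Default IFS already splits on tabs, so a plain read is enough.
paste file1 file2 | while read -r f1 f2; do
  yes "$f1" | head -n "$f2"
done > output.list

cat output.list
```

Note that the while loop runs in a subshell here, which is harmless since the only effect is writing output.list.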
You could loop through the output of paste and use a C-style for loop in place of yes and head.
#!/usr/bin/env bash
while read -r first second; do
for ((i=1;i<=second;i++)); do
printf '%s\n' "$first"
done
done < <(paste file1.txt file2.txt)
If the output looks correct, pipe the loop through tee to send it both to stdout and to the output file file3.txt:
done < <(paste file1.txt file2.txt) | tee file3.txt
You can do this with an awk one-liner:
$ awk 'NR==FNR{a[++i]=$0;next}{for(j=0;j<$0;j++)print a[FNR]}' File1 File2
A
A
B
B
B
B
C
C
C

Print text after last slash

I have a Data => File1.txt
Data
#demo/file/wk/Fil0.fk
#demo/file/wk/Fil1.fk
#demo/file/wk/Fil2.fk
#demo/file/wk/Fil3.fk
#demo/file/wk/Fil4.fk
I want to print the data to another file, File2.txt, in the format below:
Fil0.fk
Fil1.fk
Fil2.fk
Fil3.fk
Fil4.fk
Try the command below (cat is unnecessary; cut can read the file directly):
cut -d'/' -f4 File1.txt > File2.txt
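Note that cut -f4 assumes every path has exactly four slash-separated fields. If the depth can vary, printing the last field with awk is a common alternative; a sketch, assuming the file contains just the path lines:

```shell
# Sample paths (assumed; depth does not matter for this approach).
printf '%s\n' '#demo/file/wk/Fil0.fk' '#demo/other/deeper/path/Fil1.fk' > File1.txt

# $NF is the last '/'-separated field, i.e. the text after the last slash.
awk -F'/' '{print $NF}' File1.txt > File2.txt

cat File2.txt
```

This prints Fil0.fk and Fil1.fk even though the two paths have different depths.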

How to extract some missing rows by comparing two different files in linux?

I have two different files, and some rows are missing from one of them. I want to make a new file containing the rows that are not common to the two files. As an example, I have the following files:
file1:
id1
id22
id3
id4
id43
id100
id433
file2:
id1
id2
id22
id3
id4
id8
id43
id100
id433
id21
I want to extract those rows which exist in file2 but do not in file1:
new file:
id2
id8
id21
Any suggestions?
Use the comm utility (assumes bash as the shell):
comm -13 <(sort file1) <(sort file2)
Note how the input must be sorted for this to work, so your delta will be sorted, too.
comm uses an (interleaved) 3-column layout:
column 1: lines only in file1
column 2: lines only in file2
column 3: lines in both files
-13 suppresses columns 1 and 2, which prints only the values exclusive to file2.
Caveat: For lines to be recognized as common to both files they must match exactly - seemingly identical lines that differ in terms of whitespace (as is the case in the sample data in the question as of this writing, where file1 lines have a trailing space) will not match.
cat -et is a command that visualizes line endings and control characters, which is helpful in diagnosing such problems.
For instance, cat -et file1 would output lines such as id1 $, making it obvious that there's a trailing space at the end of the line (represented as $).
If instead of cleaning up file1 you want to compare the files as-is, try:
comm -13 <(sed -E 's/ +$//' file1 | sort) <(sort file2)
A generalized solution that trims leading and trailing whitespace from the lines of both files:
comm -13 <(sed -E 's/^[[:blank:]]+|[[:blank:]]+$//g' file1 | sort) \
<(sed -E 's/^[[:blank:]]+|[[:blank:]]+$//g' file2 | sort)
Note: The above sed commands require either GNU or BSD sed.
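A sketch of the trailing-space pitfall and the sed fix (sample lines assumed; the <(...) process substitutions require bash):

```shell
# file1's lines end in a trailing space, as in the question's data.
printf '%s \n' id1 id3 > file1
printf '%s\n' id1 id2 id3 > file2

# Naive comm treats "id1 " and "id1" as different lines, so everything
# in file2 looks unique:
comm -13 <(sort file1) <(sort file2)

# Stripping trailing spaces first yields only the true delta, id2:
comm -13 <(sed -E 's/ +$//' file1 | sort) <(sort file2)
```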
You can sort both files together, count the duplicate rows, and select only the rows whose count is 1. Note that this prints lines unique to either file; it matches the desired output here only because every row of file1 also appears in file2.
sort file1 file2 | uniq -c | awk '$1 == 1 {print $2}'
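A quick run of the sort/uniq approach on the sample ids (the output comes out in sorted order):

```shell
printf '%s\n' id1 id22 id3 id4 id43 id100 id433 > file1
printf '%s\n' id1 id2 id22 id3 id4 id8 id43 id100 id433 id21 > file2

# Count occurrences across both files; keep the rows seen exactly once.
sort file1 file2 | uniq -c | awk '$1 == 1 {print $2}'
```

This prints id2, id21, and id8 (lexicographically sorted), the rows present only in file2.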

Find lines containing ' \N abcd '

How can I find lines that contain a double tab and then \N
It should match, for example, \N abcd
I've tried
grep $'\t'$'\t''\N' file1.txt
grep $'\t\t''\N' file1.txt
grep $'\t\t\N' file1.txt
The following works for me:
RHEL:
$ grep $'\t\t''\\N' file1.txt
OSX:
$ grep '\t\t\\N' file1.txt
