Find different lines in 2 unsorted files of different sizes - Linux

I have two files, f_hold and f_new. f_new is twice as big as f_hold. Both files are unsorted.
How can I discard the lines in f_new that also appear in f_hold? Example:
f_hold:
aaa
bbb
ccc
ddd
eee
f_new:
ppp
ddd
aaa
ccc
bbb
fff
jjj
nnn
what I want:
ppp
fff
jjj
nnn
So it is not a simple line-by-line comparison.
I tried several suggestions like 'grep -Fxv -f', 'comm', etc., but they compare line by line. Is there a Linux command to do this?

For the example you provided, using grep will work:
grep -v -f f_hold f_new
The -f flag means "read the patterns from a file".
The -v flag inverts the match. (Adding -F and -x makes the patterns fixed strings that must match whole lines, which avoids accidental substring matches.)
UPDATE:
I guess awk can be much faster:
awk 'NR==FNR{a[$0];next} !($0 in a)' f_hold f_new
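For the sample f_hold and f_new above, both commands should give the same result; the awk one-liner loads f_hold into an array on the first pass (NR==FNR) and then prints only the f_new lines that are not in that array, so each file is read just once:
$ grep -v -f f_hold f_new
ppp
fff
jjj
nnn
$ awk 'NR==FNR{a[$0];next} !($0 in a)' f_hold f_new
ppp
fff
jjj
nnn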

Related

Intersection between two files in Linux

I need to compare two files in Linux; specifically, I need their intersection. For example, in the first file test.txt I have these lines:
aaaa
bbbb
cccc
dddd
and in the second file test2.txt these lines:
eeee
ffff
aaaa
gggg
dddd
I need the result to be:
aaaa
dddd
I used this command:
comm -23 <(sort -i /var/test.txt) <(sort -i /var/test2.txt) > g.txt
and this is the result
bbbb
cccc
I need the intersection between test.txt and test2.txt; any help?
grep takes a lot of memory.
man comm:
EXAMPLES
comm -12 file1 file2
Print only lines present in both file1 and file2.
So:
$ comm -12 <(sort -i test.txt) <(sort -i test2.txt)
aaaa
dddd
It is unclear whether you are attempting to pick off certain numeric columns (e.g. 2, 3, etc.) or if you are attempting to find common words in a line in two separate files (I take the latter to be your goal; let me know if I'm wrong).
In that case, you cannot suppress any column from either file because you don't know which column the common words will reside in after sort. One column-agnostic way to find and output the common words in sort order is to simply sort one file (take your pick), then loop over the sorted words, calling grep -q to test whether each word is present in the second file, and output it if so (you can control the line format as you desire).
One, not especially pretty, way to accomplish this is:
for i in $(sort -i test1.txt)                   ## loop over sorted test1.txt
do
    grep -q "$i" test2.txt && echo -n "$i "     ## output the value if it is also in test2.txt
done
echo ""
You can wrap it in a subshell (e.g. ( ... )) and just copy and paste it into your terminal (in the directory with the files test1.txt and test2.txt) to see if this meets your needs, e.g.
Example Use/Output
$ (
> for i in $(sort -i test1.txt)
> do
> grep -q "$i" test2.txt && echo -n "$i "
> done
> echo ""
> )
aaaa dddd
Look things over and let me know if you have further questions.

How to split a text file in Linux from the bottom of the file to the top based on a given pattern

How can I split a text file (it doesn't matter which command) in Linux from the bottom of the file to the top, based on a given pattern?
If I have the file:
111
aaa
222
aaa
333
aaa
The output should be
1st file
333
aaa
2nd file
222
aaa
3rd file
111
aaa
Thank you.
Reverse the file with tac and then run it through csplit. The -k option keeps the output files even if the repeat count runs past the last match, so you don't need to know the number of splits in advance.
tac file | csplit -s -k - "/aaa/+1" "{99}"
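For the sample file above, tac alone produces the reversed stream that csplit then cuts right after each aaa line; by default csplit names the resulting pieces xx00, xx01, and so on:
$ tac file
aaa
333
aaa
222
aaa
111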

How to select uncommon rows from two large files using Linux (in the terminal)?

Both have two columns: names and IDs. (The files are in xls or txt format.)
File 1:
AAA K0125
ccc K0234
BMN_a K0567
BMN_c K0567
File 2:
AKP K0897
BMN_a K0567
ccc K0234
I want to print the uncommon rows from these two files.
How can this be done in the Linux terminal?
Try something like this:
join -t ' ' -j 1 -v 1 -v 2 file1 file2
assuming the two files are sorted on the join field (the first column); -v 1 -v 2 prints the lines that could not be paired from either file.
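A quick sketch against the sample files, sorting them on the fly since join needs its input sorted on the join field (file1 and file2 stand for the two files shown in the question):
$ join -t ' ' -j 1 -v 1 -v 2 <(sort file1) <(sort file2)
AAA K0125
AKP K0897
BMN_c K0567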
First sort both files and then use the comm utility with the -3 option:
sort file1 > file1_sorted
sort file2 > file2_sorted
comm -3 file1_sorted file2_sorted
A portion from man comm
-3 suppress column 3 (lines that appear in both files)
Output:
AAA K0125
AKP K0897
BMN_c K0567
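The same result can be had without the intermediate sorted files by using process substitution; note that comm puts the lines unique to the second file in a tab-indented second column, which tr can strip if you want a single flat column:
$ comm -3 <(sort file1) <(sort file2) | tr -d '\t'
AAA K0125
AKP K0897
BMN_c K0567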

Merge two files on Linux keeping only lines that appear in both files

In Linux, how can I merge two files and only keep lines that have a match in both files?
Each line is separated by a newline (\n).
So far, I have found that I can sort the files and then use comm -12. Is this the best approach (assuming it's correct)?
fileA contains
aaa
bbb
ccc
ddd
fileB contains
aaa
ddd
eee
and I'd like a new file to contain
aaa
ddd
Provided both of your input files are lexicographically sorted, you can indeed use comm:
$ comm -12 fileA fileB > fileC
If that's not the case, you should sort your input files first:
$ comm -12 <(sort fileA) <(sort fileB) > fileC
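With the sample fileA and fileB above, which happen to be sorted already, either form gives:
$ comm -12 fileA fileB
aaa
ddd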

Linux: How to sort the lines of a file

I have a file called abc. The content of abc is:
ccc
abc
ccc
ccc
a
b
dd
ccc
I want to sort the lines of the file and delete all duplicates (in this case, the ccc lines are the duplicates).
In the shell script I use this:
sort -u < $1
But the sorted result is written to standard output instead of being saved into the abc file. How do I do this?
You can redirect the output to a file:
sort -u < "$1" > abc
(provided abc is not the same file you are passing as $1; redirecting to the file being read would truncate it before sort sees it).
try
sort -u abc -o abc_sorted
or if you want to replace the file
sort -u abc -o abc
you could also do
sort abc | uniq > abc_sorted
You can also do it using the commands sort, uniq, | (pipe) and > (redirection). If your file is named file, you can do it with the following command:
sort file | uniq > file_sorted
Do not redirect into the same file you are reading from: sort file | uniq > file empties file before sort ever reads it. Use sort -u file -o file if you want to rewrite the file in place.
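For the abc file above, any of these variants boils down to the same sorted, de-duplicated list:
$ sort -u abc
a
abc
b
ccc
dd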
