Compare two files and write the unmatched numbers to a new file - Linux

I have two files where ifile1.txt is a subset of ifile2.txt.
ifile1.txt    ifile2.txt
2             2
23            23
43            33
51            43
76            50
81            51
100           72
              76
              81
              89
              100
Desired output:
ofile.txt
33
50
72
89
I was trying
diff ifile1.txt ifile2.txt > ofile.txt
but it produces a different output format.

Since your files are sorted, you can use the comm command for this:
comm -1 -3 ifile1.txt ifile2.txt > ofile.txt
-1 means omit the lines unique to the first file, and -3 means omit the lines that are in both files, so this shows just the lines that are unique to the second file.
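One caveat: comm expects its input in sort's collation order, and these files are sorted numerically (lexically, 100 sorts before 89), so GNU comm may complain about unsorted input and mismatch near the end of the files. Sorting both files on the fly is safer; a minimal sketch with the sample data:
comm -13 <(sort ifile1.txt) <(sort ifile2.txt) > ofile.txt
ofile.txt then contains exactly the four missing numbers: 33, 50, 72, 89.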

This will do your job:
diff file1 file2 | awk '{print $2}'

You could try:
diff file1 file2 | awk '{print $2}' | grep -v '^$' > output.file
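Both diff-based variants would also pick up "<" lines if the first file ever had lines of its own; matching only the "> " marker that diff puts before lines unique to the second file is a little tighter (a sketch with the question's file names):
diff ifile1.txt ifile2.txt | awk '/^>/ {print $2}' > ofile.txt
which writes exactly the four missing numbers and nothing else.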

Related

Replace first few lines with first few lines from other file

I am working on Linux. I have 2 files - file1.dat and file2.dat.
cat file1.dat
1
2
3
4
5
6
7
8
9
10
and for file2:
cat file2.dat
1a
2a
3a
4a
5a
6a
7a
8a
9a
10a
I want to replace the first 4 lines of file1.dat with the first 3 lines of file2.dat, so my output would be the following:
cat file1.dat
1a
2a
3a
5
6
7
8
9
10
I tried the following command:
sed -i.bak '1,4d;3r file2.dat' file1.dat
But with this command I get the following output:
5
6
7
8
9
10
How should I modify the command? I have tried various combinations.
The following awk solutions may also help; all were tested with GNU awk.
Solution 1:
awk 'FNR==NR && FNR<4{print;next} FNR>4 && FNR!=NR' file2.dat file1.dat
Solution 2:
awk 'FNR==NR && FNR==4{nextfile} FNR==NR{print;next} FNR>4 && FNR!=NR' file2.dat file1.dat
OR
awk 'FNR==NR{if(FNR==4){nextfile};print;next} FNR>4 && FNR!=NR' file2.dat file1.dat
Solution 3: using awk combined with the head and tail commands.
awk 'FNR==1{system("head -n3 file2.dat");next} 1' <(tail -n +4 file1.dat)
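All three variants print the desired result to stdout; for example, solution 1 run against the sample files:
$ awk 'FNR==NR && FNR<4{print;next} FNR>4 && FNR!=NR' file2.dat file1.dat
1a
2a
3a
5
6
7
8
9
10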
Assuming GNU sed
$ sed '3q' f2 | sed -e '3r /dev/stdin' -e '1,4d' f1
1a
2a
3a
5
6
7
8
9
10
sed '3q' f2 prints the first three lines of the second file
-e '3r /dev/stdin' reads that piped data in after line 3
-e '1,4d' deletes lines 1 to 4
The order is important: the r expression must come before the d expression.
For a small number of lines, you can also use
sed -e '3R f2' -e '3R f2' -e '3R f2' -e '1,4d' f1
The R command reads one line from f2 each time it executes, so repeating it three times inserts three lines.
With GNU coreutils, this is probably better for most scenarios:
head -n3 f2; tail -n +5 f1
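head and tail only print to stdout; to update f1 itself, group the commands and write to a temporary file first, since you cannot redirect onto a file while you are still reading it (a small sketch with the same file names):
{ head -n3 f2; tail -n +5 f1; } > tmp && mv tmp f1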
awk is your friend
Script
# awk 'NR==FNR && FNR<=3 || NR>FNR && FNR>4' file2 file1
Output
1a
2a
3a
5
6
7
8
9
10
Tips
NR - the total number of records processed so far, across all input files
FNR - the record number within the current file; it resets to 1 each time awk starts a new file
When a condition evaluates to true and no action is given, awk simply prints the line.
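A quick way to see the difference between NR and FNR (a throwaway sketch with two tiny files):
$ printf 'a\nb\n' > x; printf 'c\n' > y
$ awk '{print FILENAME, NR, FNR}' x y
x 1 1
x 2 2
y 3 1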
All good :-)

Compare two files having different numbers of columns and print matching rows to a new file

I have two files with more than 10000 rows:
File1 (1 column)    File2 (4 columns)
23                  23 88 90 0
34                  43 74 58 5
43                  54 87 52 3
54                  73 52 35 4
.                   .
.                   .
I want to compare each value in File1 against the first column of File2. If the value exists there, print that File2 row (the value along with the other three columns). For this example the output would be:
23 88 90 0
43 74 58 5
54 87 52 3
.
.
I have written the following script, but it takes too much time to execute.
> ofile.txt                      # start with an empty output file
s1=1; s2=$(wc -l < File1.txt)
while [ "$s1" -le "$s2" ]
do
    # pick the s1-th value from File1 (shell variables must be passed to awk with -v)
    n=$(awk -v s1="$s1" 'NR==s1 {print $1}' File1.txt)
    # scan all of File2 for that value
    awk -v n="$n" '$1==n {printf "%s %s %s %s\n", $1, $2, $3, $4}' File2.txt >> ofile.txt
    (( s1++ ))
done
Is there any shorter/easier way to do it?
You can do this very concisely using awk:
awk 'FNR==NR{found[$1]++; next} $1 in found'
Test
>>> cat file1
23
34
43
54
>>> cat file2
23 88 90 0
43 74 58 5
54 87 52 3
73 52 35 4
>>> awk 'FNR==NR{found[$1]++; next} $1 in found' file1 file2
23 88 90 0
43 74 58 5
54 87 52 3
How it works:
FNR==NR checks whether FNR, the record number within the current file, equals NR, the total number of records read so far. The two are equal only while awk reads the first file, file1, because FNR resets to 1 when awk starts a new file.
{found[$1]++; next} if that check is true, record $1 (the single column of file1) as a key in the associative array found, then skip to the next record.
$1 in found this check only runs for the second file, file2. If the column 1 value $1 is a key in the array found, the entire line is printed (no action is written because printing is the default action).
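With the question's own file names the whole job becomes a one-liner (a sketch, writing to the output file named in the question):
awk 'FNR==NR{found[$1]++; next} $1 in found' File1.txt File2.txt > ofile.txt
Listing the one-column file first also keeps the found array small, which helps when the files have many thousands of rows.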

Reformatting a report file using Linux shell commands: combining multiple lines of output into one

I have a file that contains the following input:
name:ted
position:11.11.11.11
applicationKey:88
channel:45
protocol:4
total:350

name:janet
position:170.198.80.209
applicationKey:256
channel:44
protocol:4
total:1
I'd like the output to look like this:
ted 11.11.11.11 88 45 4 350
janet 170.198.80.209 256 44 4 1
Can someone help with this, please?
This should work; it relies on the blank line between records: there NF is 0, so the ternary yields "\n" and the printf ends the output line:
awk -F':' '{printf "%s %s",$2,ORS=NF?"":"\n"}END{print "\n"}' file
$ cat file
name:ted
position:11.11.11.11
applicationKey:88
channel:45
protocol:4
total:350

name:janet
position:170.198.80.209
applicationKey:256
channel:44
protocol:4
total:1
$ awk -F':' '{printf "%s %s",$2,ORS=NF?"":"\n"}END{print "\n"}' file
ted 11.11.11.11 88 45 4 350
janet 170.198.80.209 256 44 4 1
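If the blank separator lines ever go missing, you can key on the fields themselves instead; a sketch assuming each record ends with a total: line (the NF guard skips any blank lines, so it works on both layouts):
$ awk -F':' 'NF{printf "%s%s", $2, ($1=="total") ? "\n" : " "}' file
ted 11.11.11.11 88 45 4 350
janet 170.198.80.209 256 44 4 1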

Linux - Change line between two awk scripts

I have two awk scripts to run in Linux. The output of each one is a single line.
How can I separate the two outputs onto two lines?
For example:
awk '{printf $1}' f.txt >> a.txt
awk '{printf $3}' f.txt >> a.txt
The output of the first script is:
35 56 40 28 57
And the second output is:
29 48 73 26
If I run them one after another, the output will become:
35 56 40 28 57 29 48 73 26
Is there any way to get the result to:
35 56 40 28 57
29 48 73 26
Thank you!~
Although I don't understand how you manage to get the spaces between fields the way you do it, you can add an END statement to the first script:
awk '{printf $1} END{print ""}'
You can also do this with a single awk command:
awk -v ORS=" " 'BEGIN{ARGV[ARGC++] = ARGV[1]; i = 1 }
NR!=FNR && FNR==1 { printf "\n"; i=3 }
{ print $i }
END { printf "\n" }' f.txt
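For comparison, the same two lines without the ARGV trick, just running awk twice; an explicit "%s " format also produces the space-separated fields shown in the question and avoids using $1 itself as a printf format string (a sketch):
awk '{printf "%s ", $1} END{print ""}' f.txt > a.txt
awk '{printf "%s ", $3} END{print ""}' f.txt >> a.txt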

how to subset a file - select a number of rows or columns

I would like your advice/help on how to subset a big file (millions of rows or columns).
For example,
(1)
I have a big file (millions of rows, tab-delimited). I want a subset of this file containing only rows 10000 to 100000.
(2)
I have a big file (millions of columns, tab-delimited). I want a subset of this file containing only columns 10000 to 100000.
I know there are tools like head, tail, cut, split, awk, and sed, and I can use them for simple subsetting. But I do not know how to do this particular job.
Could you please give any advice? Thanks in advance.
Filtering rows is easy, for example with AWK:
cat largefile | awk 'NR >= 10000 && NR <= 100000 { print }'
Filtering columns is easier with cut. Tab is already cut's default delimiter, so no -d option is needed (note that -d '\t' would not work: cut wants a single literal character, and in the shell '\t' is two):
cat largefile | cut -f 10000-100000
As Rahul Dravid mentioned, cat is not a must here, and as Zsolt Botykai added you can improve performance using:
awk 'NR > 100000 { exit } NR >= 10000 && NR <= 100000' largefile
cut -f 10000-100000 largefile
Some different solutions:
For row ranges:
In sed:
sed -n 10000,100000p somefile.txt
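sed keeps reading to the end of the file after the wanted range; quitting at the last wanted line (the same trick as the awk exit above) saves time on a file with millions of rows:
sed -n '10000,100000p;100000q' somefile.txt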
For column ranges in awk:
awk -v f=10000 -v t=100000 '{ for (i=f; i<=t;i++) printf("%s%s", $i,(i==t) ? "\n" : OFS) }' details.txt
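Scaled down so the effect is easy to see (a sketch; f and t select the first and last wanted column):
$ seq 10 | paste - - - - - | awk -v f=2 -v t=4 '{ for (i=f; i<=t;i++) printf("%s%s", $i,(i==t) ? "\n" : OFS) }'
2 3 4
7 8 9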
For the first problem, selecting a range of rows from a large file, piping tail into head is very simple. You want rows 10000 through 100000 of largefile, which is 90001 rows: tail grabs everything from row 10000 onward, and head then keeps only the first 90001 rows of that.
tail -n +10000 largefile | head -n 90001
Was beaten to it for the sed solution, so I'll post a perl ditto instead.
To print selected lines.
$ seq 100 | perl -ne 'print if $. >= 10 && $. <= 20'
10
11
12
13
14
15
16
17
18
19
20
To print selected columns, use an array slice:
perl -lane 'print join " ", @F[1..3]'
-a autosplits each input line into the array @F; -F can be used in conjunction with it to choose the delimiter on which to split.
To test, use seq and paste to generate some columns:
$ seq 50 | paste - - - - -
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
21 22 23 24 25
26 27 28 29 30
31 32 33 34 35
36 37 38 39 40
41 42 43 44 45
46 47 48 49 50
Let's print everything except the first and the last column:
$ seq 50 | paste - - - - - | perl -lane 'print join " ", @F[1..3]'
2 3 4
7 8 9
12 13 14
17 18 19
22 23 24
27 28 29
32 33 34
37 38 39
42 43 44
47 48 49
If you want tab-separated output to match the input, use a literal tab as the join string instead of the space above; you can type one in the shell with Ctrl-V Tab.
