How to compare two columns in same file and store the difference in new file with the unchanged column according to it? - linux

Row Actual Expected
1 AAA BBB
2 CCC CCC
3 DDD EEE
4 FFF GGG
5 HHH HHH
I want to compare the Actual and Expected columns and store the rows that differ in a new file, like this:
Row Actual Expected
1 AAA BBB
3 DDD EEE
4 FFF GGG
I have tried awk -F, '{if ($2!=$3) {print $1,$2,$3}}' Sample.csv, but it seems to compare only integer values, not string values.

You can use AWK to do this
awk '{if($2!=$3) print $0}' oldfile > newfile
where
$2 and $3 are second and third columns
!= means the second and third columns do not match
$0 means whole line
> newfile redirects to new file
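
As a minimal sketch of the command above, the following recreates the sample table from the question and filters it. The temporary directory and the variable names (tmp, diff_rows, numeric_cmp, string_cmp) are assumptions made for illustration. The second half shows one caveat: awk compares fields numerically when both look like numbers, and appending "" to each side is a common way to force a string comparison.

```shell
# Recreate the sample table from the question (whitespace-separated).
tmp=$(mktemp -d)
cat > "$tmp/oldfile" <<'EOF'
Row Actual Expected
1 AAA BBB
2 CCC CCC
3 DDD EEE
4 FFF GGG
5 HHH HHH
EOF

# Keep rows whose 2nd and 3rd fields differ. The header line is kept too,
# because "Actual" and "Expected" are different strings.
awk '$2 != $3' "$tmp/oldfile" > "$tmp/newfile"
diff_rows=$(cat "$tmp/newfile")
printf '%s\n' "$diff_rows"

# Fields that look like numbers are compared numerically, so 1.0 equals 1.
# Appending "" to each side forces a string comparison instead.
numeric_cmp=$(printf '1 1.0 1\n' | awk '$2 != $3')
string_cmp=$(printf '1 1.0 1\n' | awk '$2"" != $3""')
printf 'numeric: [%s] string: [%s]\n' "$numeric_cmp" "$string_cmp"

rm -rf "$tmp"
```

For a comma-separated file such as Sample.csv, the same pattern should work with -F, added to the awk call.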

I prefer an awk solution (can handle more fields and easier to understand), but you could use
sed -r '/\t([^ ]*)\t\1$/d' Sample.csv

Assuming the file uses tab or some other delimiter to separate the columns, then tsv-filter from eBay's TSV Utilities supports this type of field comparison directly. For the file above:
$ tsv-filter --header --ff-str-ne 2:3 file.tsv
Row Actual Expected
1 AAA BBB
3 DDD EEE
4 FFF GGG
The --ff-str-ne option compares two fields in a row for non-equal strings.
Disclaimer: I'm the author.

Related

shell duplicate spaces in file

Is it possible to remove multiple spaces from a text file and save the changes in the same file using awk or grep?
Input example:
aaa bbb ccc
ddd yyyy
Output I want:
aaa bbb ccc
ddd yyyy
Simply assign $1 to itself. This forces awk to rebuild the record using OFS (a single space by default), which collapses every run of whitespace between fields into one space.
awk '{$1=$1} 1' Input_file
EDIT: Since the OP asked how to keep only the leading spaces, try the following.
awk '
match($0,/^ +/){
spaces=substr($0,RSTART,RLENGTH)
}
{
$1=$1
$1=spaces $1
spaces=""
}
1
' Input_file
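
The question also asks to save the result back to the same file. A minimal sketch of one way to do that with the first awk command is to write to a temporary file and move it over the original. The sample spacing, the file name input.txt, and the variable names are assumptions (the exact spacing in the question is not recoverable).

```shell
tmp=$(mktemp -d)
# Sample input with repeated spaces (spacing assumed for illustration).
printf 'aaa   bbb    ccc\n  ddd     yyyy\n' > "$tmp/input.txt"

# Reassigning $1 makes awk rebuild the record with single-space OFS,
# which also drops leading and trailing blanks.
awk '{$1=$1} 1' "$tmp/input.txt" > "$tmp/input.txt.new" &&
  mv "$tmp/input.txt.new" "$tmp/input.txt"

squeezed=$(cat "$tmp/input.txt")
printf '%s\n' "$squeezed"
rm -rf "$tmp"
```

The && ensures the original file is only replaced if awk exited successfully.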
Using sed
sed -i -E 's#[[:space:]]+# #g' input_file
To also remove the space left at the start of a line:
sed -i -E 's#[[:space:]]+# #g; s#^ ##' input_file
Demo:
$ cat test.txt
aaa bbb ccc
ddd yyyy
Output I want:
aaa bbb ccc
ddd yyyy
$ sed -i -E 's#[[:space:]]+# #g' test.txt
$ cat test.txt
aaa bbb ccc
ddd yyyy
Output I want:
aaa bbb ccc
ddd yyyy
$

compare columns from different files and print those that DO NOT match

I have two files, file1 and file2. I want to compare several columns - $1,$2 ,$3 and $4 of file1 with several columns $1,$2, $3 and $4 of file2 and print those rows of file2 that do not match any row in file1.
E.g.
file1
aaa bbb ccc 1 2 3
aaa ccc eee 4 5 6
fff sss sss 7 8 9
file2
aaa bbb ccc 1 f a
mmm nnn ooo 1 d e
aaa ccc eee 4 a b
ppp qqq rrr 4 e a
sss ttt uuu 7 m n
fff sss sss 7 5 6
I want to have as output:
mmm nnn ooo 1 d e
ppp qqq rrr 4 e a
sss ttt uuu 7 m n
I have seen questions asked here about finding the rows that do match and printing them, but not vice versa: those that DO NOT match.
Thank you!
Use the following script:
awk '{k=$1 FS $2 FS $3 FS $4} NR==FNR{a[k]; next} !(k in a)' file1 file2
k is the concatenated value of the columns 1, 2, 3 and 4, delimited by FS (see comments), and will be used as a key in a search array a later. NR==FNR is true while reading file1. I'm creating the array a indexed by k while reading file1.
For the remaining lines of input (those from file2), I check with !(k in a) whether the index is absent from a. If that evaluates to true, awk prints the line.
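
The script above can be tried on the sample data from the question. This is a minimal sketch: the temporary directory and the names seen and unmatched are assumptions made for illustration, and the array is called seen here instead of a.

```shell
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
aaa bbb ccc 1 2 3
aaa ccc eee 4 5 6
fff sss sss 7 8 9
EOF
cat > "$tmp/file2" <<'EOF'
aaa bbb ccc 1 f a
mmm nnn ooo 1 d e
aaa ccc eee 4 a b
ppp qqq rrr 4 e a
sss ttt uuu 7 m n
fff sss sss 7 5 6
EOF

unmatched=$(awk '
  { k = $1 FS $2 FS $3 FS $4 }   # key built from the first four fields
  NR == FNR { seen[k]; next }    # first file: remember every key
  !(k in seen)                   # second file: print rows with an unseen key
' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$unmatched"
rm -rf "$tmp"
```

Because the lookup uses an array, neither file needs to be sorted, and the rows of file2 come out in their original order.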
Here is another approach, for when both files are sorted and you know a character that does not occur in the data.
$ function f(){ sed 's/ /~/g;s/~/ /4g' $1; }; join -v2 <(f file1) <(f file2) |
sed 's/~/ /g'
mmm nnn ooo 1 d e
aaa ccc eee 4 a b
ppp qqq rrr 4 e a
sss ttt uuu 7 m n
fff sss sss 7 5 6
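This creates a key field by joining the first four fields with a ~ character (any character that does not occur in the data will do), uses join -v2 to list the file2 entries that have no partner in file1, and then splits the synthetic key back into its fields. Note that join expects both inputs to be sorted on the key; the sample file2 is not, which is why the output above still contains the two matching rows (aaa ccc eee 4 a b and fff sss sss 7 5 6).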
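
A minimal sketch of the same join pipeline with both inputs sorted on the synthetic key first. It uses temporary files instead of process substitution so that it also runs in a plain POSIX shell. The helper name mkkey and the file names k1 and k2 are assumptions, and the 4g flag in the sed substitution is a GNU sed extension.

```shell
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
aaa bbb ccc 1 2 3
aaa ccc eee 4 5 6
fff sss sss 7 8 9
EOF
cat > "$tmp/file2" <<'EOF'
aaa bbb ccc 1 f a
mmm nnn ooo 1 d e
aaa ccc eee 4 a b
ppp qqq rrr 4 e a
sss ttt uuu 7 m n
fff sss sss 7 5 6
EOF

# Glue the first four fields into one key with "~" (assumed absent from
# the data), then sort on that key so join sees ordered input.
mkkey() { sed 's/ /~/g; s/~/ /4g' "$1" | LC_ALL=C sort -k1,1; }
mkkey "$tmp/file1" > "$tmp/k1"
mkkey "$tmp/file2" > "$tmp/k2"

# -v2 prints only the lines of the second input that have no partner.
joined=$(LC_ALL=C join -v2 "$tmp/k1" "$tmp/k2" | sed 's/~/ /g')
printf '%s\n' "$joined"
rm -rf "$tmp"
```

LC_ALL=C is set for both sort and join so that they agree on the collation order.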
However, the best way is to use the awk solution with a slight fix, using a multi-dimensional subscript (which awk joins with SUBSEP) as the key:
$ awk 'NR==FNR{a[$1,$2,$3,$4]; next} !(($1,$2,$3,$4) in a)' file1 file2
No doubt the awk solution from @hek2mgl is better than this one, but for information, this is also possible using uniq, sort, and rev:
rev file1 file2 | sort -k3 | uniq -u -f2 | rev
rev reverses each line of both files character by character.
sort -k3 sorts the lines starting at the third field, skipping the first two columns.
uniq -u -f2 prints only the lines that are unique (skipping the first two fields when comparing).
Finally, rev reverses the lines back.
This solution sorts the lines of both files. That might be desired or not.

In AWK, how to split consecutive rows that have the same string as a "record"?

Let's say I have below text.
aaaaaaa
aaaaaaa
bbb
bbb
bbb
ccccccccccccc
ddddd
ddddd
Is there a way to modify the text as the following.
1 aaaaaaa
1 aaaaaaa
2 bbb
2 bbb
2 bbb
3 ccccccccccccc
4 ddddd
4 ddddd
You could use something like this in awk:
$ awk '{print ($0!=p?++i:i),$0;p=$0}' file
1 aaaaaaa
1 aaaaaaa
2 bbb
2 bbb
2 bbb
3 ccccccccccccc
4 ddddd
4 ddddd
i is incremented whenever the current line differs from the previous line. p holds the value of the previous line, $0.
Alternatively, as suggested by JID:
awk '$0!=p{p=$0;i++}{print i,$0}' file
When the current line differs from p, replace p and increment i. See the comments for discussion of the pros and cons of either approach :)
A further contribution (and even shorter!) by NeronLeVelu
$ awk '{print i+=($0!=p),p=$0}' file
This version performs the addition assignment and basic assignment within the print statement. This works because the return value of each assignment is the value that has been assigned.
As pointed out in the comments, if the first line of the file is empty, the behaviour changes slightly. Assuming that the first line should always begin with a 1, the following block can be added to the start of any of the one-liners:
NR==1{p=$0;i=1}
i.e. on the first line, initialise p to the contents of the line (empty or not) and i to 1. Thanks to Wintermute for this suggestion.
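
A minimal sketch combining the NR==1 block with the second one-liner, run on the sample text and on an input whose first line is empty. The variable names numbered and with_blank are assumptions made for illustration.

```shell
# Number each run of identical consecutive lines.
numbered=$(printf '%s\n' aaaaaaa aaaaaaa bbb bbb bbb ccccccccccccc ddddd ddddd |
  awk 'NR == 1 { p = $0; i = 1 }   # seed from the first line, even if empty
       $0 != p { p = $0; i++ }     # a new run starts: bump the counter
       { print i, $0 }')
printf '%s\n' "$numbered"

# Same program on input that starts with an empty line: the empty
# line still gets the number 1.
with_blank=$(printf '\nx\nx\n' |
  awk 'NR == 1 { p = $0; i = 1 }
       $0 != p { p = $0; i++ }
       { print i, $0 }')
printf '%s\n' "$with_blank"
```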

merge specific line using awk and sed

I want to merge specific line
Input :
AAA
BBB
CCC
DDD
EEE
AAA
BBB
DDD
CCC
EEE
Output Should be
AAA
BBB
CCC DDD
EEE
AAA
BBB
DDD
CCC EEE
I want to search for CCC and merge the next line with it.
I have tried with awk, but without success.
Use awk patterns: if the line matches /CCC/, print it with a trailing space instead of a newline and move on to the next line. Otherwise, the pattern 1 prints the line as usual.
awk '/CCC/ { printf("%s ", $0); next } 1' file
Using sed:
sed '/CCC/ { N; s/\n/ / }' file
Using awk:
awk '{ ORS=(/CCC/ ? FS : RS) }1' file
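
A minimal sketch that runs both awk variants on the sample input and checks that they agree. The variable names merged and merged_ors are assumptions made for illustration.

```shell
input=$(printf '%s\n' AAA BBB CCC DDD EEE AAA BBB DDD CCC EEE)

# Variant 1: print a matching line followed by a space, then skip to the
# next record so that it lands on the same output line.
merged=$(printf '%s\n' "$input" |
  awk '/CCC/ { printf "%s ", $0; next } 1')

# Variant 2: switch the output record separator per line (FS is a space,
# RS is a newline by default).
merged_ors=$(printf '%s\n' "$input" |
  awk '{ ORS = (/CCC/ ? FS : RS) } 1')

printf '%s\n' "$merged"
```

If CCC happens to be the last line of the input, both variants end the output with a space rather than a newline.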

Delete whole line NOT containing given string

Is there a way to delete a whole line if it contains a specific word, using sed? For example,
I have the following:
aaa bbb ccc
qqq fff yyy
ooo rrr ttt
kkk ccc www
I want to delete lines that contain 'ccc' and leave other lines intact. In this example the output would be:
qqq fff yyy
ooo rrr ttt
All this using sed. Any hints?
sed -n '/ccc/!p'
or
sed '/ccc/d'
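
A minimal sketch applying both commands to the sample lines from the question. The variable names kept_d and kept_p are assumptions made for illustration.

```shell
sample=$(printf '%s\n' 'aaa bbb ccc' 'qqq fff yyy' 'ooo rrr ttt' 'kkk ccc www')

# Delete every line that matches the pattern.
kept_d=$(printf '%s\n' "$sample" | sed '/ccc/d')

# Equivalent: suppress default output and print only non-matching lines.
kept_p=$(printf '%s\n' "$sample" | sed -n '/ccc/!p')

printf '%s\n' "$kept_d"
```

Note that /ccc/ matches the text anywhere in the line, including inside a longer word.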
