How can I use awk to compute a result from the second-column values of rows with the same first-column value in 2 csv files in Ubuntu? - linux

I am using Ubuntu and have a CSV file file1.csv with 2 columns that looks like
a,1
b,2
c,3
...
and another file file2.csv with 2 columns that looks like
a,4
b,3
d,2
...
Some of the column-1 values appear in file1.csv but not in file2.csv and vice versa, and those values should not be in result.csv. Say the value of the second column in file1.csv is x and the value of the second column in file2.csv with the same first-column value is y. How can I use awk to compute (x-y)/(x+y) for each such pair in the 2 csv files in Ubuntu to get a result.csv like this:
a,-0.6
b,-0.2
-0.6 is computed by (1-4)/(1+4)
-0.2 is computed by (2-3)/(2+3)

What about this?
$ awk 'BEGIN{FS=OFS=","} FNR==NR {a[$1]=$2; next} {if ($1 in a) print $1,(a[$1]-$2)/(a[$1]+$2)}' f1 f2
a,-0.6
b,-0.2
Explanation
BEGIN{FS=OFS=","} sets the input and output field separators to a comma.
FNR==NR {a[$1]=$2; next} while processing the first file, stores the values in the array a[] as a[first col]=second col.
{if ($1 in a) print $1,(a[$1]-$2)/(a[$1]+$2)} while looping through the second file, on each line: check whether the first column is stored in the a[] array; if so, print (x-y)/(x+y), where x is the stored value and y is the current second column.
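For a quick end-to-end check, the question's sample data can be written to two scratch files (f1 and f2 here stand in for file1.csv and file2.csv):

```shell
# Recreate the question's sample inputs.
printf 'a,1\nb,2\nc,3\n' > f1
printf 'a,4\nb,3\nd,2\n' > f2

# Keys present in only one file (c, d) are silently skipped.
awk 'BEGIN{FS=OFS=","} FNR==NR {a[$1]=$2; next} {if ($1 in a) print $1,(a[$1]-$2)/(a[$1]+$2)}' f1 f2
# → a,-0.6
#   b,-0.2
```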

Related

Grep csv with arguments from another csv

If I have the following in file1.csv (space delimited):
RED 4 VWX
BLUE 2 MNO
BLUE 7 DEF
PURPLE 6 JKL
BLACK 8 VWX
BROWN 1 MNO
RED 1 GHI
RED 7 ABC
And the following in file2.csv (comma delimited):
BROWN,2
RED,5
YELLOW,8
Is there a way to use file2.csv to search file1.csv for matching lines? Currently, if I want to use line 1 terms from file2.csv to search file1.csv, I have to manually enter the following:
grep "BROWN" file1.csv | grep "2"
I would like to automate this search to find lines in file1.csv that match BOTH items in a given line in file2.csv. I have tried some awk commands, but am having a hard time using awk output as an argument in grep. I am running all this through a standard Mac terminal (so I guess I'm using bash?) Any help is greatly appreciated. Thank you!
awk one-liner
awk 'FNR==NR{a[$1]=$2; next} ($1 in a) && a[$1]==$2' FS=, file2.csv FS=" " file1.csv
FNR==NR{a[$1]=$2; next}: reads the first input file (file2.csv here) and creates an associative array a whose keys are column 1 of file2.csv and whose values are the item numbers.
($1 in a) && a[$1]==$2: while iterating over the second input file (file1.csv here), check whether the column-1 value exists as a key in array a. If it does, check whether the item number matches. If it matches, the expression evaluates to 1 and the line is printed.
Or
simply using grep
grep -wf <(tr "," " " <file2) file1
Here we replace , with a space in file2 using tr, then use each line of file2 as a pattern to search file1 with the -f option provided by our lovely grep.
-w is to match with word boundaries so that ABC 1 won't match with ABC 123
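To see what grep actually receives, you can print the converted pattern list first. The file1 rows below are adjusted from the question's sample so that at least one line matches:

```shell
printf 'BROWN,2\nRED,5\n' > file2
tr "," " " < file2        # patterns grep will use: "BROWN 2" and "RED 5"

printf 'RED 5 ABC\nBROWN 1 MNO\n' > file1
grep -wf <(tr "," " " <file2) file1
# → RED 5 ABC
```

Note that <(…) is bash process substitution; in a plain POSIX shell, write the patterns to a temporary file first and pass that to -f.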
You can use awk to do the matching.
awk 'BEGIN { FS=","; }
NR == FNR { a[$1,$2]++; next; }
FNR == 1 { FS=" "; $0 = $0; }
($1,$2) in a' file2.csv file1.csv
The second line creates an array whose keys are the (column 1, column 2) pairs from file2.csv. The third line changes the field separator to space when we reach file1.csv; the $0 = $0 forces awk to re-split the current record with the new separator, since the first line of file1.csv has already been split with the comma. The last line matches any line whose first two fields form a key in the array.
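The earlier one-liner can be tried the same way; the BROWN row in file1.csv below is changed from the question's sample so that a match actually exists:

```shell
printf 'RED 4 VWX\nBLUE 2 MNO\nBROWN 2 MNO\n' > file1.csv
printf 'BROWN,2\nRED,5\nYELLOW,8\n' > file2.csv

# FS is reassigned between the file arguments: "," while reading file2.csv, " " while reading file1.csv.
awk 'FNR==NR{a[$1]=$2; next} ($1 in a) && a[$1]==$2' FS=, file2.csv FS=" " file1.csv
# → BROWN 2 MNO
```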

Subtract a constant number from a column

I have two large files (~10GB) as follows:
file1.csv
name,id,dob,year,age,score
Mike,1,2014-01-01,2016,2,20
Ellen,2, 2012-01-01,2016,4,35
.
.
file2.csv
id,course_name,course_id
1,math,101
1,physics,102
1,chemistry,103
2,math,101
2,physics,102
2,chemistry,103
.
.
I want to subtract 1 from the "id" columns of these files:
file1_updated.csv
name,id,dob,year,age,score
Mike,0,2014-01-01,2016,2,20
Ellen,1, 2012-01-01,2016,4,35
file2_updated.csv
id,course_name,course_id
0,math,101
0,physics,102
0,chemistry,103
1,math,101
1,physics,102
1,chemistry,103
I have tried awk '{print ($1 - 1) "," $0}' file2.csv, but did not get the correct result:
-1,id,course_name,course_id
0,1,math,101
0,1,physics,102
0,1,chemistry,103
1,2,math,101
1,2,physics,102
1,2,chemistry,103
You've added an extra column in your attempt. Instead set your first field $1 to $1-1:
awk -F"," 'BEGIN{OFS=","} {$1=$1-1;print $0}' file2.csv
The semicolon separates the commands. We set the input delimiter to comma (-F",") and the Output Field Separator to comma (BEGIN{OFS=","}). The command to subtract 1 from the first field executes first, then the print command executes second, so the entire record, $0, will contain the new $1 value when it's printed.
It might be helpful to only subtract 1 from records that are not your header. So you can add a condition to the first command:
awk -F"," 'BEGIN{OFS=","} NR>1{$1=$1-1} {print $0}' file2.csv
Now we only subtract when the record number (NR) is greater than 1. Then we just print the entire record.
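Putting it together on a small sample, and writing to a new file (the inputs in the question are ~10GB, and awk cannot edit a file in place):

```shell
printf 'id,course_name,course_id\n1,math,101\n2,math,101\n' > file2.csv

# Subtract 1 from the id column for every record except the header.
awk -F"," 'BEGIN{OFS=","} NR>1{$1=$1-1} {print $0}' file2.csv > file2_updated.csv
cat file2_updated.csv
# → id,course_name,course_id
#   0,math,101
#   1,math,101
```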

AWK write to new column base on if else of other column

I have a CSV file with columns A,B,C,D. Column D contains values on a scale of 0 to 1. I want to use AWK to write to a new column E based on the values in column D.
For example:
If the value in column D is < 0.7, the value in column E = 0.
If the value in column D is >= 0.7, the value in column E = 1.
I am able to print the output of column E but am not sure how to write it to a new column. It's possible to write the output of my code to a new file and then paste it back into the old file, but I was wondering if there was a more efficient way. Here is my code:
awk -F"," 'NR>1 {if ($3>=0.7) $4= "1"; else if ($3<0.7) $4= "0"; print $4;}' test_file.csv
The awk command below should give you the intended output:
awk -F "," '{if($4>=0.7)print $0",1";else if($4<0.7)print $0",0"}' yourfile.csv > test_file.csv
You can use:
awk -F, 'NR>1 {$0 = $0 FS (($4 >= 0.7) ? 1 : 0)} 1' test_file.csv
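On a small sample, the ternary version appends the new column to every data row (the NR>1 guard leaves the header line without an E value; add a header rule if you need one):

```shell
printf 'A,B,C,D\n1,2,3,0.9\n4,5,6,0.3\n' > test_file.csv

# Append ",1" when column D >= 0.7, otherwise ",0"; the bare 1 prints every line.
awk -F, 'NR>1 {$0 = $0 FS (($4 >= 0.7) ? 1 : 0)} 1' test_file.csv
# → A,B,C,D
#   1,2,3,0.9,1
#   4,5,6,0.3,0
```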

insert values of a column into other column

I have a tab-delimited .txt file with two columns and a long list of values in both columns:
col1 col2
1 a
2 b
3 c
... ...
I want to convert this now to
col1
1
a
2
b
3
c
So the values from column 2 are inserted into column 1 at the correct position.
Is there any way to do this, maybe using awk, or something else through the command line?
You can ask awk to print the first column and then the second column. Using a separate print for each ensures there is a newline between them:
awk -F"\t" '{print $1; print $2}' file
Or the following if you just want to print the 1st column of the first (header) line:
awk -F"\t" 'NR==1 {print $1; next} {print $1; print $2}' file
The second command returns the following for your given input:
col1
1
a
2
b
3
c
this should do:
awk -F"\t" -v OFS="\n" '{$1=$1}7' file
($1=$1 forces awk to rebuild the record with the newline output separator, and the always-true 7 prints it.)
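With the question's sample saved as a tab-delimited file, the header-aware variant produces exactly the requested layout:

```shell
printf 'col1\tcol2\n1\ta\n2\tb\n3\tc\n' > cols.txt

# First line: print only the header of column 1; other lines: print both fields on separate lines.
awk -F"\t" 'NR==1 {print $1; next} {print $1; print $2}' cols.txt
# → col1
#   1
#   a
#   2
#   b
#   3
#   c
```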

comparing two files with different columns

I have two files (count.txt, count1.txt). I need to do the following:
1. Get the values from count.txt and count1.txt where the 1st column is equal.
2. If it is equal, compare the 2nd columns like ((1st column value + 5) >= 2nd column value).
count.txt
order1,150
order2,165
order3,125
count1.txt
order1,155
order2,170
order3,125
order4,123
and I want the output like below:
Output.txt
order1,155
order2,170
I have used the nawk command below for the 1st point, but am not able to complete the 2nd point. Please suggest how to achieve this.
nawk -F"," 'NR==FNR {a[$1];next} ($1 in a)' count.txt count1.txt
nawk -F"," 'NR==FNR {a[$1]=$2;next} ($1 in a) && (a[$1]+5)<=$2' count.txt count1.txt
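A quick check of that command against the sample files (plain awk behaves the same as nawk here):

```shell
printf 'order1,150\norder2,165\norder3,125\n' > count.txt
printf 'order1,155\norder2,170\norder3,125\norder4,123\n' > count1.txt

# Keep count1.txt lines whose value is at least 5 above the matching count.txt value.
awk -F"," 'NR==FNR {a[$1]=$2;next} ($1 in a) && (a[$1]+5)<=$2' count.txt count1.txt
# → order1,155
#   order2,170
```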
