comparing two files with different columns - linux

I have two files (count.txt and count1.txt). I need to do the following:
1. Get the lines from count.txt and count1.txt where the 1st column is equal.
2. If it is equal, compare the 2nd columns: keep the line when (2nd column of count.txt + 5) <= 2nd column of count1.txt.
count.txt
order1,150
order2,165
order3,125
count1.txt
order1,155
order2,170
order3,125
order4,123
I want the output to look like this:
Output.txt
order1,155
order2,170
I have used the nawk command below for the 1st point, but I am not able to complete the 2nd point. Please suggest how to achieve it.
nawk -F"," 'NR==FNR {a[$1];next} ($1 in a)' count.txt count1.txt

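Store the 2nd column of count.txt in the array instead of just the key, then add the comparison to the pattern:
nawk -F"," 'NR==FNR {a[$1]=$2;next} ($1 in a) && (a[$1]+5)<=$2' count.txt count1.txt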
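To see the two-pass idea end to end, here is a minimal sketch that recreates the sample files from the question and runs the same logic with plain awk. The scratch directory and the margin variable are assumptions added for illustration; they are not part of the original command.

```shell
#!/bin/sh
# Recreate the sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'order1,150' 'order2,165' 'order3,125' > "$tmpdir/count.txt"
printf '%s\n' 'order1,155' 'order2,170' 'order3,125' 'order4,123' > "$tmpdir/count1.txt"

# Pass 1 (count.txt): remember column 2 for each key.
# Pass 2 (count1.txt): print lines whose key was seen in pass 1 and whose
# value is at least "margin" more than the stored one.
# "margin" is a hypothetical variable name standing in for the literal 5.
result=$(awk -F, -v margin=5 '
    NR == FNR { a[$1] = $2; next }
    ($1 in a) && (a[$1] + margin <= $2)
' "$tmpdir/count.txt" "$tmpdir/count1.txt")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

With the sample data this prints order1,155 and order2,170. order3 is dropped because 125+5 is greater than 125, and order4 is dropped because it has no match in count.txt.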

Related

awk print number of row only in uniq column

I have data set like this:
1 A
1 B
1 C
2 A
2 B
2 C
3 B
3 C
And I have a script which calculates:
Number of occurrences in searching string
Number of rows
awk -v search="A" \
'BEGIN{count=0} $2 == search {count++} END{print count "\n" NR}' input
That works perfectly fine.
I would like to add the number of unique values in the first column to my awk one-liner.
So the output should be separated by \n:
2
8
3
I can do this in separate awk code, but I am not able to integrate it to my original awk code.
awk '{a[$1]++}END{for(i in a){print i}}' input | wc -l
Any idea how to integrate it into one awk solution without piping?
Looks like you want this:
awk -v search="A" '{a[$1]++}
$2 == search {count++}
END{OFS="\n";print count+0, NR, length(a)}' file
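Some older awk implementations may not accept length() on an array. As a hedged alternative, the sketch below counts unique first-column values with a counter that is bumped the first time each value is seen. The names seen and uniq are illustrative, and the input file is rebuilt from the sample data in the question.

```shell
#!/bin/sh
# Rebuild the sample data set from the question.
tmpdir=$(mktemp -d)
printf '%s\n' '1 A' '1 B' '1 C' '2 A' '2 B' '2 C' '3 B' '3 C' > "$tmpdir/input"

result=$(awk -v search="A" '
    !seen[$1]++  { uniq++ }     # true only the first time a column-1 value appears
    $2 == search { count++ }    # occurrences of the search string in column 2
    END { print count + 0; print NR; print uniq + 0 }
' "$tmpdir/input")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

For the sample input this prints 2, 8 and 3 on separate lines, matching the output asked for in the question.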

Subtract a constant number from a column

I have two large files (~10GB) as follows:
file1.csv
name,id,dob,year,age,score
Mike,1,2014-01-01,2016,2,20
Ellen,2, 2012-01-01,2016,4,35
.
.
file2.csv
id,course_name,course_id
1,math,101
1,physics,102
1,chemistry,103
2,math,101
2,physics,102
2,chemistry,103
.
.
I want to subtract 1 from the "id" columns of these files:
file1_updated.csv
name,id,dob,year,age,score
Mike,0,2014-01-01,2016,2,20
Ellen,1, 2012-01-01,2016,4,35
file2_updated.csv
id,course_name,course_id
0,math,101
0,physics,102
0,chemistry,103
1,math,101
1,physics,102
1,chemistry,103
I have tried awk '{print ($1 - 1) "," $0}' file2.csv, but did not get the correct result:
-1,id,course_name,course_id
0,1,math,101
0,1,physics,102
0,1,chemistry,103
1,2,math,101
1,2,physics,102
1,2,chemistry,103
You've added an extra column in your attempt. Instead set your first field $1 to $1-1:
awk -F"," 'BEGIN{OFS=","} {$1=$1-1;print $0}' file2.csv
The semicolon separates the two commands. We set the input delimiter to a comma (-F",") and the output field separator to a comma (BEGIN{OFS=","}). The assignment that subtracts 1 from the first field executes first and the print command executes second, so the entire record, $0, contains the new $1 value when it is printed. Note that this also applies to the header line, where id becomes -1.
It might be helpful to only subtract 1 from records that are not your header. So you can add a condition to the first command:
awk -F"," 'BEGIN{OFS=","} NR>1{$1=$1-1} {print $0}' file2.csv
Now we only subtract when the record number (NR) is greater than 1. Then we just print the entire record.
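The answer only covers file2.csv, where id is the first field. For file1.csv the id sits in the second field, so the same approach should work with $2 instead of $1. A minimal sketch, assuming the sample rows from the question; the col variable is a hypothetical name added here so the column number is set in one place:

```shell
#!/bin/sh
# Rebuild the first lines of file1.csv from the question.
tmpdir=$(mktemp -d)
printf '%s\n' \
    'name,id,dob,year,age,score' \
    'Mike,1,2014-01-01,2016,2,20' \
    'Ellen,2, 2012-01-01,2016,4,35' > "$tmpdir/file1.csv"

# col=2 selects the id column of file1.csv (it would be 1 for file2.csv).
# NR > 1 skips the header; every record is printed afterwards.
result=$(awk -v col=2 '
    BEGIN  { FS = OFS = "," }
    NR > 1 { $col = $col - 1 }
    { print }
' "$tmpdir/file1.csv")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Because only the id field is reassigned, the other fields are left exactly as they were, including the leading space in the dob value of the Ellen row.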

insert values of a column into other column

I have a tab-delimited .txt file with two columns and long list of values in both columns
col1 col2
1 a
2 b
3 c
... ...
I want to convert this now to
col1
1
a
2
b
3
c
So that the values from column 2 are inserted into column 1 at the correct location.
Is there any way to do this, maybe using awk, or something else through the command line?
You can ask awk to print first column and then second column. By using print for each case, you ensure you have a new line in between them:
awk -F"\t" '{print $1; print $2}' file
Or the following if you just want to print the 1st column on the first line:
awk -F"\t" 'NR==1 {print $1; next} {print $1; print $2}' file
The second command returns the following for your given input:
col1
1
a
2
b
3
c
This should also do it:
awk -F"\t" -v OFS="\n" '{$1=$1}7' file
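The last one-liner is terse: assigning $1=$1 makes awk rebuild the record using OFS (here a newline), and the bare 7 is simply a true pattern that triggers the default print. The sketch below spells that out and, as an assumption about the desired output, prints only the first field of the header line. The sample file is rebuilt from the question.

```shell
#!/bin/sh
# Rebuild the tab-delimited sample from the question.
tmpdir=$(mktemp -d)
printf 'col1\tcol2\n1\ta\n2\tb\n3\tc\n' > "$tmpdir/file"

result=$(awk -F'\t' -v OFS='\n' '
    NR == 1 { print $1; next }   # header: keep only the first column name
    { $1 = $1 }                  # touching a field rebuilds $0 with OFS between fields
    1                            # any true pattern prints the (rebuilt) record
' "$tmpdir/file")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

This produces col1, 1, a, 2, b, 3, c on separate lines, the same as the second command above.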

How to use awk to get the result of computation of column1 value of the same column2 value in 2 csv files in Ubuntu?

I am using Ubuntu and have a csv file, file1.csv, with 2 columns that looks like
a,1
b,2
c,3
...
and another file, file2.csv, with 2 columns that looks like
a,4
b,3
d,2
...
Some column 1 values appear in file1.csv but not in file2.csv and vice versa, and these values should not be in result.csv. Say the value of the second column in file1.csv is x and the value of the second column in file2.csv with the same column 1 value is y. How can I use awk to compute (x-y)/(x+y) from the second columns of the 2 csv files in Ubuntu to get a result.csv like this:
a,-0.6
b,-0.2
-0.6 is computed by (1-4)/(1+4)
-0.2 is computed by (2-3)/(2+3)
What about this?
$ awk 'BEGIN{FS=OFS=","} FNR==NR {a[$1]=$2; next} {if ($1 in a) print $1,(a[$1]-$2)/(a[$1]+$2)}' f1 f2
a,-0.6
b,-0.2
Explanation
BEGIN{FS=OFS=","} set input and output field separators as comma.
FNR==NR {a[$1]=$2; next} when processing the first file, store the values in the array a[] as a[first col]=second col.
{if ($1 in a) print $1,(a[$1]-$2)/(a[$1]+$2)} when looping through second file, on each line do: check if the first col is stored in the a[] array; if so, print (x-y)/(x+y), being x=stored value and y=current second column.
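One case the command above does not cover is x+y being zero, which would make awk abort with a division-by-zero error. The sketch below adds a guard for that. The e rows are hypothetical data added only to exercise the guard, and skipping such rows silently is an assumption about the desired behavior.

```shell
#!/bin/sh
# Sample data from the question, plus one hypothetical key "e" whose
# values sum to zero (not in the original data).
tmpdir=$(mktemp -d)
printf '%s\n' 'a,1' 'b,2' 'c,3' 'e,2'  > "$tmpdir/file1.csv"
printf '%s\n' 'a,4' 'b,3' 'd,2' 'e,-2' > "$tmpdir/file2.csv"

result=$(awk '
    BEGIN { FS = OFS = "," }
    FNR == NR { a[$1] = $2; next }          # first file: remember x per key
    ($1 in a) && (a[$1] + $2 != 0) {        # second file: key matched, x+y nonzero
        print $1, (a[$1] - $2) / (a[$1] + $2)
    }
' "$tmpdir/file1.csv" "$tmpdir/file2.csv")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Keys c and d are dropped because they appear in only one file, and e is dropped by the guard, leaving a,-0.6 and b,-0.2.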

Removing last column from rows that have three columns using bash

I have a file that contains several lines of data. Some lines contain three columns, but most contain only two. All lines are single-tab separated. For those that contain three columns, the third column is typically redundant and contains the same data as the second so I'd like to remove it.
I imagine awk or cut would be appropriate, but I'm drawing a blank on how to test the row for three columns so my script will only work on those rows. I know awk is a very powerful language with logic and whatnot built into it, I'm just not that strong with it.
I looked at a similar question, but I'm not sure what is going on with the awk answer. Should the -4 be -1 since I only want to remove one column? What about if the row has two columns; will it remove the second even though I don't want to do anything?
I modified it to what I think it would be:
awk -F"\t" -v OFS="\t" '{ for (i=1;i<=NF-4;i++){ print $i }}'
But when I run it (with the file) nothing happens. If I change it to NF-1 or NF-2 I get some output, but it is only a handful of lines and only the first column.
Can anyone clue me into what I should be doing?
If you just want to remove the third column, you could just print the first and the second:
awk -F '\t' '{print $1 "\t" $2}'
And it's similar to cut:
cut -f 1,2
The awk variable NF gives you the number of fields. So an expression like this should work for you:
awk -F, 'NF == 3 {print $1 "," $2} NF != 3 {print $0}'
Running it on an input file like so
a,b,c
x,y
u,v,w
l,m
gives me
$ cat test | awk -F, 'NF == 3 {print $1 "," $2} NF != 3 {print $0}'
a,b
x,y
u,v
l,m
This might work for you (GNU sed):
sed 's/\t[^\t]*//2g' file
Restricts the file to two columns.
awk 'NF==3{print $1"\t"$2}NF==2{print}' your_file
Tested below:
> cat temp
1 2
3 4 5
6 7
8 9 10
>
> awk 'NF==3{print $1"\t"$2}NF==2{print}' temp
1 2
3 4
6 7
8 9
>
Or in a simpler way in awk:
awk 'NF==3{print $1"\t"$2}NF==2' your_file
Or you can also go with perl:
perl -lane 'print "$F[0]\t$F[1]"' your_file
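Several of the answers above split on commas or on any whitespace, while the question describes single-tab-separated lines. A minimal sketch that sets both separators to a tab, so a field containing spaces stays intact, could look like this; the sample rows mirror the temp file used in the test above, written with explicit tabs:

```shell
#!/bin/sh
# Tab-separated sample: two rows with two columns, two rows with three.
tmpdir=$(mktemp -d)
printf '1\t2\n3\t4\t5\n6\t7\n8\t9\t10\n' > "$tmpdir/temp"

result=$(awk -F'\t' -v OFS='\t' '
    NF == 3 { print $1, $2; next }   # three columns: drop the last one
    1                                # anything else: print unchanged
' "$tmpdir/temp")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Rows with two columns pass through untouched, and rows with any other column count are also left alone, which matches the requirement to act only on three-column rows.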
