Awk matching values of first two columns and printing in blank field - linux

I have a csv file which looks like below:
2212,A1,
2212,A1,128
2307,B1,
2307,B1,107
How can I copy the value of the 3rd column into rows where it is missing, when the first two columns are the same? E.g. the first two columns of the first two rows are the same, so the 3rd-column value of the second row should be printed in the empty 3rd column of the first row.
expected output:
2212,A1,128
2212,A1,128
2307,B1,107
2307,B1,107
Please help, as I couldn't even think of a solution and there are millions of values like this in my file.

If you first sort the file in reverse order, the rows with data precede the empty rows:
$ sort -r file
2307,B1,107
2307,B1,
2212,A1,128
2212,A1,
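Then use the following awk command to process the output of sort (index tests that the current line is a prefix of the previous one, without treating it as a regular expression):
$ sort -r file | awk 'NR>1 && index(prev,$0)==1 {$0=prev} {prev=$0} 1'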
2307,B1,107
2307,B1,107
2212,A1,128
2212,A1,128

Alternatively, without sorting (the output order of for (k in a) is unspecified):
awk -F, '{k=$1 FS $2; a[k]++; if ($NF!="") b[k]=$NF} END{for (k in a) for (j=1;j<=a[k];j++) print k FS b[k]}' file
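If the original row order has to be kept, one possible sketch is to read the file twice in a single awk call: the first pass remembers the non-empty 3rd column for each pair of first two columns, and the second pass fills the gaps. The name fill_missing and the temporary sample file are only for illustration. This assumes each key has at most one distinct non-empty value and that the set of distinct keys fits in memory.

```shell
# usage: fill_missing FILE   (hypothetical helper name, used only for this sketch)
fill_missing() {
    awk -F, -v OFS=, '
        # Pass 1 (NR == FNR): remember the non-empty 3rd column per key.
        NR == FNR { if ($3 != "") val[$1 FS $2] = $3; next }
        # Pass 2: fill an empty 3rd column when a value is known for the key.
        $3 == "" && (($1 FS $2) in val) { $3 = val[$1 FS $2] }
        { print }
    ' "$1" "$1"
}

# Sample data from the question, written to a temporary file.
sample=$(mktemp)
printf '%s\n' '2212,A1,' '2212,A1,128' '2307,B1,' '2307,B1,107' > "$sample"
fill_missing "$sample"
```

Because the file is named twice on the awk command line, this needs a regular file rather than a pipe.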

Related

How to add a header with a value after a particular column in Linux

I want to add a column with the header Gender after the Age column, with a value on each row.
cat Person.csv
First_Name|Last_Name||Age|Address
Ram|Singh|18|Punjab
Sanjeev|Kumar|32|Mumbai
I am using this:
cat Person.csv | sed '1s/$/|Gender/; 2,$s/$/|Male/'
output:
First_Name|Last_Name||Age|Address|Gender
Ram|Singh|18|Punjab|Male
Sanjeev|Kumar|32|Mumbai|Male
I want output like this:
First_Name|Last_Name|Age|Gender|Address
Ram|Singh|18|Male|Punjab
Sanjeev|Kumar|32|Male|Mumbai
I took the second pipe out (for consistency's sake) ... the sed should look like this:
$ sed -E '1s/^([^|]+\|[^|]+\|[^|]+\|)/\1Gender|/;2,$s/^([^|]+\|[^|]+\|[^|]+\|)/\1Male|/' Person.csv
First_Name|Last_Name|Age|Gender|Address
Ram|Singh|18|Male|Punjab
Sanjeev|Kumar|32|Male|Mumbai
We match and remember the first three fields and replace them with themselves, followed by Gender (on the header line) or Male (on every other line).
Using awk:
$ awk -F"|" 'BEGIN{ OFS="|"}{ last=$NF; $NF=""; print (NR==1) ? $0"Gender|"last : $0"Male|"last }' Person.csv
First_Name|Last_Name||Age|Gender|Address
Ram|Singh|18|Male|Punjab
Sanjeev|Kumar|32|Male|Mumbai
Use '|' as both the input and the output field separator. Store the last column's value in a variable named last, then empty the last column with $NF="", which leaves a trailing separator. Finally, print the appropriate output depending on whether this is the first row or a later row.
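If the new column should go after a column chosen by name rather than always before the last one, a sketch along these lines might help. It assumes the header has already been cleaned up so that header and data rows have the same number of fields (no doubled pipe); add_column_after and its arguments are hypothetical names for illustration.

```shell
# usage: add_column_after AFTER NAME VALUE [FILE...]   (hypothetical helper)
add_column_after() {
    after=$1 name=$2 value=$3
    shift 3
    awk -F'|' -v OFS='|' -v after="$after" -v name="$name" -v value="$value" '
        # Header line: find the position of the column to insert after.
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == after) col = i }
        # Append the new header or value to that field, joined by the separator.
        col { $col = $col OFS (NR == 1 ? name : value) }
        { print }
    ' "$@"
}

printf '%s\n' 'First_Name|Last_Name|Age|Address' 'Ram|Singh|18|Punjab' 'Sanjeev|Kumar|32|Mumbai' |
    add_column_after Age Gender Male
```

If the named column is not found in the header, the lines are printed unchanged.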

How to get a column value based on adjacent column value in linux?

I want to fetch a column value based on the value of the next column. I'm grepping for a column with the value host and want to print the previous column's value.
I tried grep -Po ".* (?=host)" but didn't get the proper output.
The file test.log contains the sample data below (all on a single row):
test Plus 193310 68FAD575EC59C2C6 exa4dbadm03 host
cat test.log|grep -i 193310|grep -i host|grep -Po ".* (?=host)"
I'm trying to find the column whose value is host and print the previous column's value. In this case I want to get exa4dbadm03 as output.
expected result:
exa4dbadm03
Why don't you use awk for this? E.g:
awk '{for(i=2;i<=NF;++i){if($i=="host"){print $(i-1);break}}}' file
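The two grep -i filters from the question can also be folded into the same awk call. A minimal sketch, assuming whitespace-separated fields; the function name value_before and its arguments are made up for illustration:

```shell
# usage: value_before ROW_PATTERN WORD [FILE...]   (hypothetical helper)
value_before() {
    pat=$1 word=$2
    shift 2
    awk -v pat="$pat" -v word="$word" '
        # Case-insensitive row filter, similar to grep -i.
        tolower($0) ~ tolower(pat) {
            # Print the field just before the first field equal to word.
            for (i = 2; i <= NF; i++)
                if (tolower($i) == tolower(word)) { print $(i - 1); break }
        }
    ' "$@"
}

echo 'test Plus 193310 68FAD575EC59C2C6 exa4dbadm03 host' | value_before 193310 host
```

Unlike the lookahead in the original grep attempt, this compares whole fields, so a field such as hostname would not count as host.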

Print whole line with highest value from one column

I have a little issue right now.
I have a file with 4 columns
test0000002,10030010330,c_,218
test0000002,10030010330,d_,202
test0000002,10030010330,b_,193
test0000002,10030010020,c_,178
test0000002,10030010020,b_,170
test0000002,10030010330,a_,166
test0000002,10030010020,a_,151
test0000002,10030010020,d_,150
test0000002,10030070050,c_,119
test0000002,10030070050,b_,99
test0000002,10030070050,d_,79
test0000002,10030070050,a_,56
test0000002,10030010390,c_,55
test0000002,10030010390,b_,44
test0000002,10030010380,d_,41
test0000002,10030010380,a_,37
test0000002,10030010390,d_,35
test0000002,10030010380,c_,33
test0000002,10030010390,a_,31
test0000002,10030010320,c_,30
test0000002,10030010320,b_,27
test0000002,10030010380,b_,26
test0000002,10030010320,a_,23
test0000002,10030010320,d_,22
test0000002,10030010010,a_,6
and I want the highest value from the 4th column for each value of the 2nd column.
test0000002,10030010330,c_,218
test0000002,10030010020,c_,178
test0000002,10030010330,a_,166
test0000002,10030010020,a_,151
test0000002,10030070050,c_,119
test0000002,10030010390,c_,55
test0000002,10030010380,d_,41
test0000002,10030010320,c_,30
test0000002,10030010390,a_,31
test0000002,10030010380,c_,33
test0000002,10030010390,d_,35
test0000002,10030010320,a_,23
test0000002,10030010380,b_,26
test0000002,10030010010,a_,6
It appears that your file is already sorted in descending order on the 4th column, so you just need to print lines where the 2nd column appears for the first time:
awk -F, '!seen[$2]++' file
test0000002,10030010330,c_,218
test0000002,10030010020,c_,178
test0000002,10030070050,c_,119
test0000002,10030010390,c_,55
test0000002,10030010380,d_,41
test0000002,10030010320,c_,30
test0000002,10030010010,a_,6
If your input file is not sorted on column 4, sort it first:
sort -t, -k4nr file | awk -F, '!seen[$2]++'
You can use two sorts:
sort -u -t, -k2,2 file | sort -t, -rnk4
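The first sort keeps one line per distinct value of the second column; the second sorts that result numerically on the 4th column, in descending order. Which line sort -u keeps for each key is implementation-dependent (GNU sort outputs the first of a run of equal lines), so this relies on the input already being in descending order of the 4th column.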
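When the input order cannot be relied on, another option is to track the maximum per key inside awk and sort only the winners afterwards. This is a sketch rather than a drop-in answer; max_per_key is a hypothetical name, and it assumes the 4th column is numeric.

```shell
# usage: max_per_key [FILE...]   (hypothetical helper)
max_per_key() {
    awk -F, '
        # Keep the line with the largest numeric 4th column for each 2nd-column key.
        !($2 in max) || $4 + 0 > max[$2] { max[$2] = $4 + 0; line[$2] = $0 }
        END { for (k in line) print line[k] }
    ' "$@" | sort -t, -k4,4nr
}

# A few rows from the question, deliberately out of order.
printf '%s\n' \
    'test0000002,10030010020,b_,170' \
    'test0000002,10030010330,d_,202' \
    'test0000002,10030010330,c_,218' \
    'test0000002,10030010020,c_,178' \
    'test0000002,10030010010,a_,6' | max_per_key
```

Only one line per key reaches sort, which can matter on large files.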

Linux - putting lines that contain a string at a specific column in a new file

I want to pull all rows from a text file in linux which contain a specific number (in this case 9913) in a specific column (column 4). This is a tab-delimited file, so I am calling this a column, though I am not sure it is.
In some cases, there is only one number in column 4, but in other lines there are multiple numbers in this column (e.g. 9913; 4444; 5555). I would like to get any rows in which the number 9913 appears in the 4th column (whether it is the only number or part of a list). How do I write all lines that contain the number 9913 in column 4 to their own file?
Here is an example of what I have tried:
cat file.txt | grep 9913 > newFile.txt
result is a mixture of the following:
CDR1as CDR1as ENST00000003100.8 9913 AAA-GGCAGCAAGGGACUAAAA (files that I want)
CDR1as CDR1as ENST00000399139.1 9606 GUCCCCA................(file ex. I don't want)
I do not get any results when I test a specific column. As the command below shows, the columns do not seem to be recognized, and I get empty files when using awk.
awk '$4 == "9913"' file.txt > newfile.txt
writes no data to the new file.
Thanks
This is one way of doing it
awk '$4 == "9913" {print $0}' file.txt > newfile.txt
or just
awk '$4 == "9913"' file.txt > newfile.txt
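Since the question says the file is tab-delimited and that column 4 can hold a list such as 9913; 4444; 5555, an exact comparison on a whitespace-split $4 may not be enough. A hedged sketch, assuming real tab characters between columns and semicolon-separated lists inside column 4; rows_with_id is a made-up name, and the third sample row is invented to show the list case:

```shell
# usage: rows_with_id ID [FILE...]   (hypothetical helper)
rows_with_id() {
    id=$1
    shift
    awk -F'\t' -v id="$id" '
        {
            # Split column 4 on semicolons and spaces, then compare each item.
            n = split($4, items, /[; ]+/)
            for (i = 1; i <= n; i++)
                # Appending "" forces a string comparison, so 09913 is not 9913.
                if ((items[i] "") == (id "")) { print; break }
        }
    ' "$@"
}

{
    printf 'CDR1as\tCDR1as\tENST00000003100.8\t9913\tAAA-GGCAGCAAGGGACUAAAA\n'
    printf 'CDR1as\tCDR1as\tENST00000399139.1\t9606\tGUCCCCA\n'
    # Invented row: a list in column 4 that contains 9913.
    printf 'CDR1as\tCDR1as\tEXAMPLE_ID\t9606; 9913; 5555\tGGG\n'
} | rows_with_id 9913
```

Redirecting the output, for example rows_with_id 9913 file.txt > newfile.txt, would then write the matching rows to their own file.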

Match data in Any Column, and change data over all Other Columns in Same Row

I'm trying to figure out how to search a csv file for a value, in this case "---", and change all proceeding columns in the same row to "---".
I have been looking to do this with awk, but I can only figure out how to check known fields,
e.g.:
awk '{if ($(NF-1)=="---")$NF="---"}{print $0}' file
I think I need to use a for loop (that's why I'm asking) to:
1) Search all the fields for a value
2) When the value is found in a field, change all proceeding fields of the same record to a specific value (e.g. "---")
Any ideas will be highly appreciated. And I apologize if my wording doesn't convey all the different trial and error attempts I have made at this, I would like to know what does work instead of showing everybody what does not.
Let's consider this test file:
$ cat file.csv
a,b,---,d,e
1,---,3,4,5
To look for --- and change all preceding columns in the same row to ---:
$ awk -F, '{f=0; for (i=NF;i>=1;i--) {$i=(f?"---":$i); f=($i=="---")}} 1' OFS=, file.csv
---,---,---,d,e
---,---,3,4,5
Alternatively, to look for --- and change all subsequent (succeeding) columns in the same row to ---:
$ awk -F, '{f=0; for (i=1;i<=NF;i++) {$i=(f?"---":$i); f=($i=="---")}} 1' OFS=, file.csv
a,b,---,---,---
1,---,---,---,---
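The forward loop can also be written with an explicit found flag and the marker passed in as a variable, which may be easier to adapt. This is only a sketch; fill_after_marker is a hypothetical name, and comma-separated input without quoted fields is assumed.

```shell
# usage: fill_after_marker MARKER [FILE...]   (hypothetical helper)
fill_after_marker() {
    mark=$1
    shift
    awk -F, -v OFS=, -v mark="$mark" '
        {
            hit = 0
            for (i = 1; i <= NF; i++) {
                if (hit) $i = mark              # after the first marker: overwrite
                else if ($i == mark) hit = 1    # first marker found in this record
            }
            print
        }
    ' "$@"
}

printf '%s\n' 'a,b,---,d,e' '1,---,3,4,5' | fill_after_marker '---'
```

Running the loop from NF down to 1 instead would fill the fields before the marker, as in the first command above.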
