I want to add a new column with the header Gender after the Age column, with a value filled in on every row.
cat Person.csv
First_Name|Last_Name||Age|Address
Ram|Singh|18|Punjab
Sanjeev|Kumar|32|Mumbai
I am using this:
cat Person.csv | sed '1s/$/|Gender/; 2,$s/$/|Male/'
output:
First_Name|Last_Name||Age|Address|Gender
Ram|Singh|18|Punjab|Male
Sanjeev|Kumar|32|Mumbai|Male
I want output like this:
First_Name|Last_Name|Age|Gender|Address
Ram|Singh|18|Male|Punjab
Sanjeev|Kumar|32|Male|Mumbai
I took the second pipe out of the header (for consistency's sake); the sed should then look like this:
$ sed -E '1s/^([^|]+\|[^|]+\|[^|]+\|)/\1Gender|/;2,$s/^([^|]+\|[^|]+\|[^|]+\|)/\1male|/' Person.csv
First_Name|Last_Name|Age|Gender|Address
Ram|Singh|18|male|Punjab
Sanjeev|Kumar|32|male|Mumbai
We match and remember the first three fields and replace them with themselves, followed by Gender on the header line and male on the data lines.
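A shorter variant of the same idea (my sketch, not part of the original answer) uses sed's numeric occurrence flag to replace only the third pipe on each line, again assuming the doubled pipe in the header has been removed:

$ sed '1s/|/|Gender|/3; 2,$s/|/|Male|/3' Person.csv

This inserts the new field after the third field on every line, which yields the output the question asked for.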
Using awk:
$ awk -F"|" 'BEGIN{ OFS="|"}{ last=$NF; $NF=""; print (NR==1) ? $0"Gender|"last : $0"Male|"last }' Person.csv
First_Name|Last_Name||Age|Gender|Address
Ram|Singh|18|Male|Punjab
Sanjeev|Kumar|32|Male|Mumbai
Use '|' as the input field separator and set the output field separator to '|'. Store the last column's value in a variable named last, then remove the last column with $NF="". Finally, print the appropriate output depending on whether it is the first (header) row or a succeeding row.
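The same insertion can be written without the separate last variable by prefixing the new value onto $NF; a minimal equivalent sketch (assigning to $NF makes awk rebuild the record with OFS, and like the answer above it leaves the doubled pipe in the header untouched):

awk 'BEGIN{FS=OFS="|"} {$NF = (NR==1 ? "Gender" : "Male") OFS $NF} 1' Person.csv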
I have a little issue right now.
I have a file with 4 columns:
test0000002,10030010330,c_,218
test0000002,10030010330,d_,202
test0000002,10030010330,b_,193
test0000002,10030010020,c_,178
test0000002,10030010020,b_,170
test0000002,10030010330,a_,166
test0000002,10030010020,a_,151
test0000002,10030010020,d_,150
test0000002,10030070050,c_,119
test0000002,10030070050,b_,99
test0000002,10030070050,d_,79
test0000002,10030070050,a_,56
test0000002,10030010390,c_,55
test0000002,10030010390,b_,44
test0000002,10030010380,d_,41
test0000002,10030010380,a_,37
test0000002,10030010390,d_,35
test0000002,10030010380,c_,33
test0000002,10030010390,a_,31
test0000002,10030010320,c_,30
test0000002,10030010320,b_,27
test0000002,10030010380,b_,26
test0000002,10030010320,a_,23
test0000002,10030010320,d_,22
test0000002,10030010010,a_,6
and I want the highest values from the 4th column, grouped by the 2nd column:
test0000002,10030010330,c_,218
test0000002,10030010020,c_,178
test0000002,10030010330,a_,166
test0000002,10030010020,a_,151
test0000002,10030070050,c_,119
test0000002,10030010390,c_,55
test0000002,10030010380,d_,41
test0000002,10030010320,c_,30
test0000002,10030010390,a_,31
test0000002,10030010380,c_,33
test0000002,10030010390,d_,35
test0000002,10030010320,a_,23
test0000002,10030010380,b_,26
test0000002,10030010010,a_,6
It appears that your file is already sorted in descending order on the 4th column, so you just need to print lines where the 2nd column appears for the first time:
awk -F, '!seen[$2]++' file
test0000002,10030010330,c_,218
test0000002,10030010020,c_,178
test0000002,10030070050,c_,119
test0000002,10030010390,c_,55
test0000002,10030010380,d_,41
test0000002,10030010320,c_,30
test0000002,10030010010,a_,6
If your input file is not sorted on column 4, then
sort -t, -k4nr file | awk -F, '!seen[$2]++'
You can use two sorts:
sort -u -t, -k2,2 file | sort -t, -rnk4
The first sort removes duplicates in the second column; the second sorts that output numerically, in descending order, on the 4th column.
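As a further alternative (my sketch, not from the original answers), a single awk pass can keep the row with the maximum 4th-column value per 2nd-column key, with no sorting at all; the order of the END loop's output is unspecified, so pipe it through a sort of your choice if order matters:

awk -F, '!($2 in max) || $4+0 > max[$2] {max[$2] = $4+0; row[$2] = $0}
         END {for (k in row) print row[k]}' file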
I want to pull all rows from a text file in Linux which contain a specific number (in this case 9913) in a specific column (column 4). This is a tab-delimited file, so I am calling this a column, though I am not sure that is the right term.
In some cases there is only one number in column 4, but in other lines there are multiple numbers in this column (e.g. 9913; 4444; 5555). I would like to get any row for which the number 9913 appears in the 4th column, whether it is the only number or part of a list. How do I find all lines which contain the number 9913 in column 4 and put them in their own file?
Here is an example of what I have tried:
cat file.txt | grep 9913 > newFile.txt
The result is a mixture of the following:
CDR1as CDR1as ENST00000003100.8 9913 AAA-GGCAGCAAGGGACUAAAA (line I want)
CDR1as CDR1as ENST00000399139.1 9606 GUCCCCA................(line I don't want)
I do not get any results when targeting the specific column. I think the code is not recognizing the columns, and I get blank files when using awk:
awk '$4 == "9913"' file.txt > newfile.txt
gives me no data in the new file.
Thanks
This is one way of doing it:
awk '$4 == "9913" {print $0}' file.txt > newfile.txt
or just
awk '$4 == "9913"' file.txt > newfile.txt
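Note that if column 4 can hold a semicolon-separated list such as 9913; 4444; 5555 (as the question mentions), the exact comparison $4 == "9913" will not match those lines, and awk's default splitting on any whitespace would also break such a column into several fields. A sketch for that case, splitting on tabs only and matching 9913 as a whole number (the token-boundary regex is an assumption about the list format):

awk -F'\t' '$4 ~ /(^|[^0-9])9913([^0-9]|$)/' file.txt > newfile.txt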
I'm trying to figure out how to search a CSV file for a value, in this case "---", and change all preceding columns in the same row to "---".
I have been looking to do this with awk, but I can only figure out how to check a known field position, for example:
awk '{if ($(NF-1)=="---")$NF="---"}{print $0}' file
I need to find a way, I think with a for loop (that's why I'm asking), to:
1) Search all the fields for a value
2) On finding the value in a field, change all preceding fields of the same record to a specific value (i.e. "---")
Any ideas would be highly appreciated. I apologize if my wording doesn't convey all the trial-and-error attempts I have made at this; I would like to know what does work instead of showing everybody what does not.
Let's consider this test file:
$ cat file.csv
a,b,---,d,e
1,---,3,4,5
To look for --- and change all preceding columns in the same row to ---:
$ awk -F, '{f=0; for (i=NF;i>=1;i--) {$i=(f?"---":$i); f=($i=="---")}} 1' OFS=, file.csv
---,---,---,d,e
---,---,3,4,5
Alternatively, to look for --- and change all subsequent (succeeding) columns in the same row to ---:
$ awk -F, '{f=0; for (i=1;i<=NF;i++) {$i=(f?"---":$i); f=($i=="---")}} 1' OFS=, file.csv
a,b,---,---,---
1,---,---,---,---
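For readability, here is the second one-liner expanded with comments (purely an illustrative rewrite, behaviorally identical): the flag f records whether --- has been seen on the current line, and once it is set, every remaining field the loop visits is overwritten.

awk -F, '{
    f = 0                         # has "---" been seen yet on this line?
    for (i = 1; i <= NF; i++) {
        if (f) $i = "---"         # overwrite every field after the match
        if ($i == "---") f = 1    # raise the flag at the first match
    }
    print
}' OFS=, file.csv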