Match data in Any Column, and change data over all Other Columns in Same Row - linux

I'm trying to figure out how to search a CSV file for a value, in this case "---", and change all subsequent columns in the same row to "---".
I have been trying to do this with awk, but I can only figure out how to check known fields, e.g.:
awk '{if ($(NF-1)=="---")$NF="---"}{print $0}' file
I think I need to use a for loop (that's why I'm asking) to:
1) Search all the fields for a value
2) When the value is found in a field, change all subsequent fields of the same record to a specific value (e.g. "---")
Any ideas will be highly appreciated. I apologize if my wording doesn't convey all the trial-and-error attempts I have made; I would rather learn what does work than show everybody what does not.

Let's consider this test file:
$ cat file.csv
a,b,---,d,e
1,---,3,4,5
To look for --- and change all preceding columns in the same row to ---:
$ awk -F, '{f=0; for (i=NF;i>=1;i--) {$i=(f?"---":$i); f=($i=="---")}} 1' OFS=, file.csv
---,---,---,d,e
---,---,3,4,5
Alternatively, to look for --- and change all subsequent (succeeding) columns in the same row to ---:
$ awk -F, '{f=0; for (i=1;i<=NF;i++) {$i=(f?"---":$i); f=($i=="---")}} 1' OFS=, file.csv
a,b,---,---,---
1,---,---,---,---
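The same idea can be wrapped in a small shell function so that both the value to search for and the replacement are parameters, which is what the question asks for. This is a sketch: fill_after is a hypothetical name, and it assumes plain comma-separated data with no quoted fields that contain commas.

```shell
# fill_after MARK FILL (hypothetical helper name for this sketch):
# for each comma-separated line on stdin, once a field equals MARK,
# replace every later field on that line with FILL.
fill_after() {
  awk -F, -v OFS=, -v mark="$1" -v fill="$2" '{
    seen = 0                      # reset the flag for every record
    for (i = 1; i <= NF; i++) {
      if (seen) $i = fill         # MARK appeared in an earlier field
      else if ($i == mark) seen = 1
    }
  } 1'
}

printf 'a,b,---,d,e\n1,---,3,4,5\n' | fill_after '---' 'XX'
```

Keeping the seen flag separate from the field contents is what allows FILL to differ from MARK. The one-liners above recompute their flag from the field they just rewrote, which works only because the replacement is the same "---" string they search for.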

Related

Bash: write to specific column in CSV

I am trying to write the contents of a .txt file to the "B" or second column in a CSV file.
awk '{$2 = $2"i"; print}' x.txt >> y.csv
I thought this would write the contents of x.txt to y.csv followed by the letter "i" in the second column. However, this code still writes to the 1st column.
Sample of x.txt:
hello
hellox
hello1
Sample output to y.csv:
A Column
hello i
hellox i
hello1 i
I want to have this content written to the B column. Preferably without the "i".
Any solution to this would be appreciated.
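Your attempt prints hello i because awk splits input on whitespace by default and joins the output fields with a space, so no comma is ever written. You may use this awk, which sets both separators to a comma: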
awk 'BEGIN{FS=OFS=","} {$2 = $1} 1' file.csv
hello,hello
hellox,hellox
hello1,hello1
If you want a literal i in the 2nd column of the output:
awk 'BEGIN{FS=OFS=","} {$2 = "i"} 1' file.csv
hello,i
hellox,i
hello1,i
Unless I'm misunderstanding what you're doing, the paste command would be simpler.
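A minimal sketch of the paste suggestion, assuming the existing column A is available as a separate one-value-per-line file (colA.txt and its values are hypothetical):

```shell
cd "$(mktemp -d)" || exit 1     # scratch directory for the sample files

# Hypothetical inputs: colA.txt stands in for the existing column A of y.csv.
printf 'A1\nA2\nA3\n' > colA.txt
printf 'hello\nhellox\nhello1\n' > x.txt

# paste joins the two files line by line; -d, makes the separator a comma.
paste -d, colA.txt x.txt > y.csv
cat y.csv

# If column A should stay empty, prefixing each line with a comma is enough.
sed 's/^/,/' x.txt
```

The first command writes A1,hello, A2,hellox and A3,hello1 to y.csv; the sed variant prints ,hello and so on, which a spreadsheet shows in column B.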

How to cut column data from flat file

I have data in the format below:
111,Ja,M,Oes,2012-08-03 16:42:00,x,xz
112,Ln,d,D,Gn,2012-08-03 16:51:00,y,yx
I need to create a file with the data in this form:
111,x,xz
112,y,yx
The output consists of the first field followed by the last two fields; there can be any number of comma-separated fields in between.
How can I generate the required output file from the input file on a Linux machine?
The awk statement for this is pretty straightforward. Set the input and output field separators to a comma and print $1, $(NF-1) and $NF, where $NF is the last field and $(NF-1) the second-to-last:
awk 'BEGIN{FS=OFS=","}{print $1,$(NF-1),$NF}' input.csv > newfile.csv
Not much to this one in awk:
awk -F"," 'BEGIN{OFS=","}{print $1,$(NF-1), $NF}' inFile > outFile
We split the lines in awk on commas with -F"," and then print the first field $1, the second-to-last field $(NF-1), and the last field $NF.
NF is the "number of fields", so subtracting 1 from it gives the index of the second-to-last field.
With sed:
$ sed -r 's/([^,]+).*(,[^,]+,[^,]+)/\1\2/' file
111,x,xz
112,y,yx
or
$ sed -r 's/([^,]+).*((,[^,]+){2})/\1\2/' file
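Or, with awk's default whitespace splitting and fixed character positions (this only works while every line keeps exactly this layout):
awk '{print substr($1,1,4) substr($2,10,4)}' file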
111,x,xz
112,y,yx
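If awk or sed is not an option, the same "first field plus last two fields" split can be done with POSIX shell parameter expansion alone. This is a sketch (first_and_last2 is a hypothetical helper name); it is much slower than awk on large files because every line goes through the shell loop, and it assumes fields contain no embedded commas.

```shell
# first_and_last2 (hypothetical name): print field 1 plus the last two
# comma-separated fields of each line on stdin, using only POSIX expansions.
first_and_last2() {
  while IFS= read -r line; do
    first=${line%%,*}              # everything before the first comma
    rest=${line%,*}                # the line without its last field
    last2=${line#"${rest%,*},"}    # strip through the second-to-last comma
    printf '%s,%s\n' "$first" "$last2"
  done
}

printf '%s\n' '111,Ja,M,Oes,2012-08-03 16:42:00,x,xz' \
              '112,Ln,d,D,Gn,2012-08-03 16:51:00,y,yx' | first_and_last2
```

The quotes inside ${line#"..."} make the computed prefix match literally rather than as a glob pattern.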

Linux - putting lines that contain a string at a specific column in a new file

I want to pull all rows from a text file in Linux that contain a specific number (in this case 9913) in a specific column (column 4). This is a tab-delimited file, so I am calling this a column, though I am not sure it is.
In some cases there is only one number in column 4, but in other lines there are multiple numbers in this column (e.g. 9913; 4444; 5555). I would like to get every row in which the number 9913 appears in the 4th column, whether it is the only number or part of a list. How do I extract all lines that contain 9913 in column 4 into their own file?
Here is an example of what I have tried:
cat file.txt | grep 9913 > newFile.txt
The result is a mixture of lines like these:
CDR1as CDR1as ENST00000003100.8 9913 AAA-GGCAGCAAGGGACUAAAA (lines that I want)
CDR1as CDR1as ENST00000399139.1 9606 GUCCCCA................ (example of a line I don't want)
I get no results when selecting a specific column. I think awk is not recognizing the columns, because I get empty files when using awk. For example,
awk '$4 == "9913"' file.txt > newfile.txt
writes nothing to the new file.
Thanks
This is one way of doing it
awk '$4 == "9913" {print $0}' file.txt > newfile.txt
or just
awk '$4 == "9913"' file.txt > newfile.txt
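The plain grep picked up the unwanted line because 9913 also occurs inside the ID ENST00000399139.1. A sketch that also covers the list case, assuming the file is strictly tab-separated and that the numbers in column 4 are separated by non-digits such as "; " (pick_9913 and the sample rows are hypothetical):

```shell
# pick_9913 (hypothetical helper): print rows whose 4th tab-separated column
# contains 9913 as a whole number, alone or inside a list like "9913; 4444".
pick_9913() {
  # -F'\t' splits on tabs only, so spaces inside column 4 do not shift fields;
  # the [^0-9] guards stop 9913 matching inside longer numbers such as 99130.
  awk -F'\t' '$4 ~ /(^|[^0-9])9913([^0-9]|$)/'
}

{
  printf 'CDR1as\tCDR1as\tENST00000003100.8\t9913\tAAA\n'
  printf 'CDR1as\tCDR1as\tENST00000399139.1\t9606\tGUC\n'
  printf 'g1\tg2\tENST0\t4444; 9913; 5555\tCCC\n'
} | pick_9913
```

Used as pick_9913 < file.txt > newfile.txt, this would write the first and third kind of row to their own file and skip the second.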

Awk matching values of first two columns and printing in blank field

I have a CSV file that looks like this:
2212,A1,
2212,A1,128
2307,B1,
2307,B1,107
How can I fill a missing value in the 3rd column with the 3rd-column value from another row that has the same first two columns? For example, the first two rows have the same first two columns, so the missing third column of the first row should be filled with the third column of the second row.
expected output:
2212,A1,128
2212,A1,128
2307,B1,107
2307,B1,107
Please help: I couldn't even think of a solution, and there are millions of rows like this in my file.
If you first sort the file in reverse order, the rows with data precede the empty rows that have the same first two columns:
$ sort -r file
2307,B1,107
2307,B1,
2212,A1,128
2212,A1,
Then use the following awk to process the output of sort:
$ sort -r file | awk 'NR>1 && match(prev,$0) {$0=prev} {prev=$0} 1'
2307,B1,107
2307,B1,107
2212,A1,128
2212,A1,128
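The match(prev,$0) test above treats the whole current line as an unanchored regular expression, so a key such as 212,A1 can also match inside 2212,A1,128. A stricter sketch of the same sort-then-fill approach compares the first two fields exactly and only fills rows whose third field is empty (fill_blank_col3 is a hypothetical name):

```shell
# fill_blank_col3 (hypothetical helper): expects input in which every filled
# row comes before the blank rows sharing its key, e.g. the output of sort -r.
fill_blank_col3() {
  awk 'BEGIN { FS = OFS = "," }
       $3 == "" && ($1 FS $2) == key { $3 = val }   # exact match on fields 1-2
       { key = $1 FS $2; val = $3 }                 # remember the latest row
       1'
}

printf '2212,A1,\n2212,A1,128\n2307,B1,\n2307,B1,107\n' | sort -r | fill_blank_col3
```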
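Alternatively, without sorting. This version only records non-empty values, so a blank row appearing after a filled one does not overwrite the stored value; note that awk's for-in order is unspecified, so the groups may come out in any order:
awk -F, '{a[$1FS$2]++} $NF!=""{b[$1FS$2]=$NF} END{for (i in a) {for(j=1;j<=a[i];j++) print i FS b[i]}}' file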

How to use grep or awk to process a specific column ( with keywords from text file )

I've tried many combinations of grep and awk commands to process text from a file.
I have a list of customer records of this type:
John,Mills,81,Crescent,New York,NY,john#mills.com,19/02/1954
I am trying to separate these records into two categories, MALES and FEMALES.
I have a list of some 5000 female names, all in plain text, in one file.
How can I "grep" the first column (since I am only matching first names) but still print the entire customer record?
I found it easy to "cut" the first column and grep --file=female.names.txt, but then the entire record is no longer printed.
I am aware of the awk option, but in that case I don't know how to read the female names from a file.
awk -F ',' ' { if($1==" ???Filename??? ") print $0} '
Many thanks!
You can do this with Awk:
awk -F, 'NR==FNR{a[$0]; next} ($1 in a)' female.names.txt file.csv
This prints the lines of your CSV file whose first field is one of the names in female.names.txt.
awk -F, 'NR==FNR{a[$0]; next} !($1 in a)' female.names.txt file.csv
This outputs the lines whose first field is not in female.names.txt.
This assumes the format of your female.names.txt file is something like:
Heather
Irene
Jane
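Since the goal is to split the records into two groups, the same NR==FNR lookup can write both groups in one pass. A sketch with hypothetical sample data and output file names; records whose first name is not in the list go to others.csv, because the list only identifies female names:

```shell
cd "$(mktemp -d)" || exit 1     # scratch directory for the sample files

# Hypothetical sample inputs; only the first field of file.csv is used.
printf 'Heather\nIrene\nJane\n' > female.names.txt
printf 'John,Mills,81\nJane,Roe,42\n' > file.csv

# First file (NR == FNR): store each name as an array key.
# Second file: write each record to one of two outputs based on column 1.
awk -F, 'NR == FNR { names[$0]; next }
         { out = ($1 in names) ? "females.csv" : "others.csv"; print > out }' \
    female.names.txt file.csv
```

The NR==FNR idiom assumes female.names.txt is not empty; if it were, every line of file.csv would be treated as a name.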
Try this:
grep --file=<(sed 's/.*/^&,/' female.names.txt) datafile.csv
This turns each name in the list of female names into the regular expression ^name, so it only matches at the beginning of a line and only when followed by a comma. It then uses process substitution (available in bash, not in plain POSIX sh) to pass that as grep's pattern file when searching the data file.
Another alternative is Perl, which can be useful if you're not super-familiar with awk.
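#!/usr/bin/perl -anF,
use strict;
our %names;
# BEGIN runs before the -n loop: read the names file(s) given as arguments.
BEGIN {
    while (<ARGV>) {
        chomp;
        $names{$_} = 1;
    }
}
# Once the arguments are used up, the -n loop reads STDIN; -a with -F, splits
# each record on commas into @F, so $F[0] is the first name.
print if $names{$F[0]};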
To run it (assuming you named this file filter.pl):
perl filter.pl female.names.txt < records.txt
So, I've come up with the following:
Suppose you have a file named test.txt with the following lines:
abe 123 bdb 532
xyz 593 iau 591
Now suppose you want the lines whose first field starts and ends with a vowel. A simple grep would return both lines, but the following returns only the first line, which is the desired output:
egrep "^([0-z]{1,} ){0}[aeiou][0-z]+[aeiou]" test.txt
Next, suppose you want the lines whose third field starts and ends with a vowel. Similarly, a simple grep would return both lines, but the following returns only the second line, which is the desired output:
egrep "^([0-z]{1,} ){2}[aeiou][0-z]+[aeiou]" test.txt
The {1,} after the bracket expression [0-z] means that a character in the ASCII range 0 to z occurs one or more times (note that this range also includes some punctuation, such as : ; @ [ and _). That is followed by the field separator, a space in this case. Change the value in the second pair of curly braces ({0} or {2}) to the desired field number minus 1, then use a regular expression to express your criteria.
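awk already splits each line into fields, so the "skip N-1 fields" prefix of the regular expression can be replaced by selecting field N directly. A sketch (field_vowels is a hypothetical helper name). The middle of the pattern is relaxed to .+ here, and unlike the egrep pattern, the ^...$ anchors apply to the field itself, so a field such as abex is rejected rather than matched on its abe prefix:

```shell
# field_vowels N (hypothetical helper): print lines from stdin whose Nth
# whitespace-separated field starts and ends with a vowel and is at least
# three characters long.
field_vowels() {
  awk -v n="$1" '$n ~ /^[aeiou].+[aeiou]$/'
}

printf 'abe 123 bdb 532\nxyz 593 iau 591\n' | field_vowels 1   # first line
printf 'abe 123 bdb 532\nxyz 593 iau 591\n' | field_vowels 3   # second line
```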
