Bash: write to specific column in CSV - linux

I am trying to write the contents of a .txt file to the "B" or second column in a CSV file.
awk '{$2 = $2"i"; print}' x.txt >> y.csv
I thought this would write the contents of x.txt to y.csv with the letter "i" in the second column. However, this code still writes everything to the first column.
Sample of x.txt:
hello
hellox
hello1
Sample output to y.csv:
A Column
hello i
hellox i
hello1 i
I want to have this content written to the B column, preferably without the "i".
Any solution to this would be appreciated.

You may use this awk:
awk 'BEGIN{FS=OFS=","} {$2 = $1} 1' file.csv
hello,hello
hellox,hellox
hello1,hello1
If you want a literal i in the 2nd column of the output:
awk 'BEGIN{FS=OFS=","} {$2 = "i"} 1' file.csv
hello,i
hellox,i
hello1,i
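If the goal is instead to take the lines of x.txt and write them into the B column of y.csv with the A column left empty, a minimal sketch along the same lines:
awk 'BEGIN{OFS=","} {print "", $0}' x.txt >> y.csv
,hello
,hellox
,hello1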

Unless I'm misunderstanding what you're doing, the paste command would be easier and simpler.
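For example, a sketch, assuming y.csv already holds the existing column(s) and x.txt has one value per line (merged.csv is an illustrative name):
paste -d, y.csv x.txt > merged.csv
paste joins line N of one file with line N of the other using the given delimiter, so each x.txt value lands in a new last column.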

Related

How to cut column data from flat file

I've data in the format below:
111,Ja,M,Oes,2012-08-03 16:42:00,x,xz
112,Ln,d,D,Gn,2012-08-03 16:51:00,y,yx
I need to create files with data in the sequence below:
111,x,xz
112,y,yx
In the output format, we want the first comma-separated value and the last two comma-separated values. There can be any number of commas in between.
Kindly advise how I can generate the required output file from the input file on a Linux machine.
The Awk statement for this is pretty straight-forward. Set the input and output field separators and print the fields using $1..$NF, where $NF is the last column.
awk 'BEGIN{FS=OFS=","}{print $1,$(NF-1),$NF}' input.csv > newfile.csv
Not much to this one in awk:
awk -F"," 'BEGIN{OFS=","}{print $1,$(NF-1), $NF}' inFile > outFile
We split the lines in awk with a comma -F"," and then print the first field $1, the second to last field $(NF-1), and the last field $NF.
NF is the "Number of fields" so subtracting 1 from it will give you the second to last item.
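For instance, the same command works no matter how many fields sit in between (a quick check on an illustrative line):
$ echo 'a,b,c,d,e,f' | awk 'BEGIN{FS=OFS=","}{print $1,$(NF-1),$NF}'
a,e,f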
With sed:
$ sed -r 's/([^,]+).*(,[^,]+,[^,]+)/\1\2/' file
111,x,xz
112,y,yx
or
$ sed -r 's/([^,]+).*((,[^,]+){2})/\1\2/' file
Another take relies on the fixed widths of the sample (the space inside the timestamp splits each line into two default whitespace-separated fields, so substr can slice out the pieces):
awk '{print substr($1,1,4) substr($2,10,4)}' file
111,x,xz
112,y,yx

Match data in Any Column, and change data over all Other Columns in Same Row

I'm trying to figure out how to search a csv file for a value, in this case "---", and change all preceding columns in the same row to "---".
I have been looking to do this with awk, but I can only figure out how to check known fields, i.e.:
awk '{if ($(NF-1)=="---")$NF="---"}{print $0}' file
I need to find a way to use a for loop, I think (that's why I'm asking), to:
1) Search all the fields for a value
2) Find the value in a field, and change all preceding fields of the same record to a specific value (i.e. "---")
Any ideas will be highly appreciated. And I apologize if my wording doesn't convey all the different trial and error attempts I have made at this; I would like to know what does work instead of showing everybody what does not.
Let's consider this test file:
$ cat file.csv
a,b,---,d,e
1,---,3,4,5
To look for --- and change all preceding columns in the same row to ---:
$ awk -F, '{f=0; for (i=NF;i>=1;i--) {$i=(f?"---":$i); f=($i=="---")}} 1' OFS=, file.csv
---,---,---,d,e
---,---,3,4,5
Alternatively, to look for --- and change all subsequent (succeeding) columns in the same row to ---:
$ awk -F, '{f=0; for (i=1;i<=NF;i++) {$i=(f?"---":$i); f=($i=="---")}} 1' OFS=, file.csv
a,b,---,---,---
1,---,---,---,---
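In both loops, f is a flag: it starts at 0 for each line and flips to 1 once a field equal to --- is seen, forcing every field visited after that point to ---. The marker can also be passed in as a variable instead of being hard-coded; a sketch (m is an illustrative variable name):
$ awk -F, -v m='---' '{f=0; for (i=NF;i>=1;i--) {$i=(f?m:$i); f=($i==m)}} 1' OFS=, file.csv
---,---,---,d,e
---,---,3,4,5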

Adding a specific pattern, if needed, to every line of a text file - linux

I have a file in the following format (each space shown is one tab):
sentenceA1 sentencek1
sentencek2
sentencek3
sentenceA2 sentencel1
sentencel2
and I want the output to be:
sentenceA1 sentencek1
sentenceA1 sentencek2
sentenceA1 sentencek3
sentenceA2 sentencel1
sentenceA2 sentencel2
I tried separating the values by creating two files (using sed), one with the first values and one with the second, but I don't know how to merge them back together successfully.
Is this possible by only using sed or awk?
This awk should work:
awk 'NF==2{p=$1; print; next} {print p, $1}' file
sentenceA1 sentencek1
sentenceA1 sentencek2
sentenceA1 sentencek3
sentenceA2 sentencel1
sentenceA2 sentencel2
awk '!/^\t/{p=substr($0,1,index($0,"\t")-1)} /^\t/{$0=p$0}1' input
This is just a rewritten awk based on anubhava's post, to get the data correct.
awk -F"\t" '$1{p=$1;print;next} {print p$0}' file
sentenceA1 sentencek1
sentenceA1 sentencek2
sentenceA1 sentencek3
sentenceA2 sentencel1
sentenceA2 sentencel2
Since there are tabs in all lines, all lines have the same number of fields.
If a line starts with a tab, the first field will be empty, so $1 tests for this.
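Since the question also asks whether sed alone can do it: with GNU sed (which understands \t) you can stash the first field in the hold space and prepend it to the tab-led continuation lines. A sketch:
sed -n -e '/^\t/!{p;s/\t.*//;h}' -e '/^\t/{G;s/\(.*\)\n\(.*\)/\2\1/p}' file
Lines that carry a first field are printed as-is and that field is saved with h; lines starting with a tab append the saved field with G and swap the two parts around the embedded newline.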

How to use grep or awk to process a specific column (with keywords from a text file)

I've tried many combinations of grep and awk commands to process text from a file.
This is a list of customers of this type:
John,Mills,81,Crescent,New York,NY,john@mills.com,19/02/1954
I am trying to separate these records into two categories, MALES and FEMALES.
I have a list of some 5000 female names, all in plain text, all in one file.
How can I "grep" the first column ( since I am only matching first names) but still printing the entire customer record ?
I found it easy to "cut" the first column and grep --file=female.names.txt, but this way it's not going to print the entire record any longer.
I am aware of the awk option but in that case I don't know how to read the female names from file.
awk -F ',' ' { if($1==" ???Filename??? ") print $0} '
Many thanks!
You can do this with Awk:
awk -F, 'NR==FNR{a[$0]; next} ($1 in a)' female.names.txt file.csv
This prints the lines of your csv file whose first name appears in female.names.txt.
awk -F, 'NR==FNR{a[$0]; next} !($1 in a)' female.names.txt file.csv
This outputs the lines whose first name is not found in female.names.txt.
This assumes the format of your female.names.txt file is something like:
Heather
Irene
Jane
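To actually split the records into the two categories in one pass, the same NR==FNR lookup can route each line to one of two output files (females.csv and males.csv are illustrative names):
awk -F, 'NR==FNR{a[$0]; next} {print > (($1 in a) ? "females.csv" : "males.csv")}' female.names.txt file.csv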
Try this:
grep --file=<(sed 's/.*/^&,/' female.names.txt) datafile.csv
This turns each name in the list of female names into the regular expression ^name, so it only matches at the beginning of the line and followed by a comma. It then uses process substitution to pass that as the pattern file to match against the data file.
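To see what the process substitution feeds to grep, run the sed part on its own; with the sample names above it yields anchored patterns like:
$ sed 's/.*/^&,/' female.names.txt
^Heather,
^Irene,
^Jane,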
Another alternative is Perl, which can be useful if you're not super-familiar with awk.
#!/usr/bin/perl -anF,
use strict;
our %names;
BEGIN {
    while (<ARGV>) {
        chomp;
        $names{$_} = 1;
    }
}
print if $names{$F[0]};
To run (assume you named this file filter.pl):
perl filter.pl female.names.txt < records.txt
So, I've come up with the following:
Suppose you have the following lines in a file named test.txt:
abe 123 bdb 532
xyz 593 iau 591
Now you want to find the lines whose first field starts and ends with a vowel. A simple grep would return both lines, but the following gives you only the first line, which is the desired output:
egrep "^([0-z]{1,} ){0}[aeiou][0-z]+[aeiou]" test.txt
Then, to find the lines whose third field starts and ends with a vowel: again, a simple grep would return both lines, but the following gives you only the second line, which is the desired output:
egrep "^([0-z]{1,} ){2}[aeiou][0-z]+[aeiou]" test.txt
The {1,} in the first pair of curly braces says the preceding character class, which covers 0 through z in the ASCII table, may occur one or more times; after that comes the field separator, a space in this case. Change the value in the second pair of curly braces, {0} or {2}, to the desired field number minus 1. Then use a regular expression to state your criteria.
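As a generic form, with the 1-based field number in an illustrative shell variable n (n=3 reproduces the second command above):
n=3
egrep "^([0-z]{1,} ){$((n-1))}[aeiou][0-z]+[aeiou]" test.txt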

CSV grep but keep the header

I have a CSV file that looks like this:
A,B,C
1,2,3
4,4,4
1,2,6
3,6,9
Is there an easy way to grep all the rows in which the B column is 2, and keep the header? For example, I want the output to be like:
A,B,C
1,2,3
1,2,6
I am working under Linux.
Using awk:
awk -F, 'NR==1 || $2==2' file
NR==1 -> true for the first line,
$2==2 -> true when the second column equals 2. Lines are printed if either of the above is true.
To choose the column using the header column name:
awk -F, -v col="B" 'NR==1{for(i=1;i<=NF;i++)if($i==col)break;print;next}$i==2' file
Replace B with the appropriate name of the column which you want to check against.
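If you would rather stay with grep, you can also print the header separately and filter the rest; a sketch for the sample file (the trailing comma in the pattern assumes B is not the last column):
{ head -n 1 file; tail -n +2 file | grep '^[^,]*,2,'; }
A,B,C
1,2,3
1,2,6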
You can use addresses in sed:
sed -n '1p;/^[^,]*,2/p'
It means:
1p Print the first line.
/ Start a match.
^ Match the beginning of a line.
[^,] Match anything but a comma
* zero or more times.
, Match a comma.
2 Match a 2.
/p End of match, if it matches, print.
If the header can contain the value you are looking for, you should be more careful:
sed -n '1p;1!{/^[^,]*,2/p}'
1!{ ... } just means "Do the following for lines other than the first one".
For column number n>2, you can add a quantifier:
sed -n '1p;1!{/^\([^,]*,\)\{M\}2/p}'
where M=n-1. The quantifier just means repetition, so the non-comma-0-or-more-times-comma thing is repeated M times.
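For example, for the third column (n=3, so M=2):
sed -n '1p;1!{/^\([^,]*,\)\{2\}2/p}'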
For true CSV files where a value can contain a comma, switch to Perl and Text::CSV.
$ awk -F, 'NR==1 { for (i=1;i<=NF;i++) h[$i] = i; print; next } $h["B"] == 2' file
A,B,C
1,2,3
1,2,6
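The same header-lookup idea can be parameterized (col and val are illustrative variable names; a sketch):
$ awk -F, -v col='B' -v val='2' 'NR==1{for(i=1;i<=NF;i++)h[$i]=i;print;next} $h[col]==val' file
A,B,C
1,2,3
1,2,6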
By the way, sed is an excellent tool for simple substitutions on a single line; for anything else, just use awk - the code will be clearer and MUCH easier to enhance in the future if necessary.
