Awk split file gives incomplete lines - Linux

My file is a CSV file with comma-delimited fields.
I tried to split the file into multiple files by the first field. I did the following:
cat myfile.csv | awk -F',' '{print $0 > "Mydata"$1".csv"}'
It does split the file, but the output files are corrupted: the last line of each file is incomplete, and the breaking position seems random. Has anyone had the same problem?

These types of problems are invariably caused by creating the input file on Windows, so it has spurious control-Ms (carriage returns) at the ends of its lines. Run dos2unix on your input file to clean it up, then re-run your awk command, rewritten as:
awk -F',' '{print > ("Mydata" $1 ".csv") }' myfile.csv
This rewrite also fixes a couple of unrelated problems: the cat is unnecessary (awk can read the file itself), and the expression on the right of > should be parenthesized, because an unparenthesized concatenation there is ambiguous and is parsed differently by different awks.
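If you prefer not to modify the input file, you can also strip the carriage returns inside the script itself; a minimal sketch of that approach:
awk -F',' '{ sub(/\r$/, ""); print > ("Mydata" $1 ".csv") }' myfile.csv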

Alternatively, use this awk command to ignore the \r before each \n (a multi-character RS like this requires an awk that treats RS as a regular expression, such as GNU awk):
awk -F ',' -v RS='\r\n' '{print > ("Mydata" $1 ".csv") }' myfile.csv
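Before reaching for either fix, you can confirm the carriage returns are actually there; a quick check with standard tools:
head -n 1 myfile.csv | od -c | tail -n 3
A \r showing up just before the \n at the end of the line confirms the Windows line endings.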

Whatever approach you take, don't forget to close your files, or you can run out of file descriptors when there are many distinct values of $1. Note the >> here: after close(f), a later print > f would truncate the file, so append instead, and make sure no stale Mydata*.csv files are left over from a previous run (e.g. rm -f Mydata*.csv first):
awk -F ',' '{ f = "Mydata" $1 ".csv"; print >> f; close(f) }' myfile.csv
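If you can sort the input by the first field first, only one output file needs to be open at a time, and you avoid reopening a file on every line; a sketch of that variant:
sort -t',' -k1,1 myfile.csv | awk -F',' '{ f = "Mydata" $1 ".csv"; if (f != prev) { if (prev) close(prev); prev = f } print >> f }'
The same caveat about leftover output files from earlier runs applies.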

Use a real CSV parser/generator instead. It is safe for unusual inputs, including those with multi-line quoted values. Here's a one-liner in Ruby (note the append mode "a" and the block form, so each file is closed and not truncated on every row):
ruby -e 'require "csv"; CSV.foreach(ARGV.shift){|r| File.open("Mydata#{r[0]}.csv","a"){|f| f.puts(CSV.generate_line(r))}}' file.csv

Related

How Can I Perform Awk Commands Only On Certain Fields

I have CSV columns that I'm working with:
info,example-string,super-example-string,otherinfo
I would like to get:
example-string super example string
Right now, I'm running the following command:
awk -F ',' '{print $3}' | sed "s/-//g"
But, then I have to paste the lines together to combine $2 and $3.
Is there any way to do something like this?
awk -F ',' '{print $2" "$3}' | sed "s/-//g"
except where the sed command is only performed on $3 and $2 stays in place? I'm just concerned that if the lines don't match up later on, the data could become misaligned.
Please note: I need to keep the pipe to the sed command. I used a simple example here, but I end up running a lot of commands after that as well.
Try:
$ awk -F, '{gsub(/-/," ",$3); print $2,$3}' file
example-string super example string
How it works
-F,
This tells awk to use a comma as the field separator.
gsub(/-/," ",$3)
This replaces all - in field 3 with spaces.
print $2,$3
This prints fields 2 and 3.
Examples using pipelines
$ echo 'info,example-string,super-example-string,otherinfo' | awk -F, '{gsub(/-/," ",$3); print $2,$3}'
example-string super example string
In a pipeline with sed:
$ echo 'info,example-string,super-example-string,otherinfo' | awk -F, '{gsub(/-/," ",$3); print $2,$3}' | sed 's/string/String/g'
example-String super example String
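Since the concern was misalignment, also note that gsub() can edit $3 in place while the rest of the record passes through untouched; a small variant that keeps the whole line:
$ awk -F, -v OFS=, '{gsub(/-/," ",$3)}1' file
info,example-string,super example string,otherinfo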
That said, the best solution would be a single sed or a single awk. Since you asked for an awk-plus-sed solution, here is one, assuming your actual data looks like the sample Input_file shown:
awk -F, '{print $2,$3}' Input_file | sed 's/\([^ ]*\)\([^-]*\)-\([^-]*\)-\([^-]*\)/\1\2 \3 \4/'
Output will be as follows.
example-string super example string

How To Substitute Piped Output of Awk Command With Variable

I'm trying to take a column and pipe it through an echo command. If possible, I would like to keep it to one line, or do this as efficiently as possible. While researching, I found that I have to break out of the single quotes to expand the variable, and to escape the double quotes.
Here's what I was trying:
awk -F ',' '{print $2}' file1.txt | while read line; do echo "<href=\"'${i}'\">'${i}'</a>"; done
But I keep getting the right number of lines without each line's value substituted in. If you know how to capture each line's URL field, that would be so helpful.
File1.txt:
Hello,http://example1.com
Hello,http://example2.com
Hello,http://example3.com
Desired output:
<href="http://example1.com">http://example1.com</a>
<href="http://example2.com">http://example2.com</a>
<href="http://example3.com">http://example3.com</a>
$ awk -F, '{printf "<href=\"%s\">%s</a>\n", $2, $2}' file
<href="http://example1.com">http://example1.com</a>
<href="http://example2.com">http://example2.com</a>
<href="http://example3.com">http://example3.com</a>
Or slightly briefer but less robustly:
$ sed 's/.*,\(.*\)/<href="\1">\1<\/a>/' file
<href="http://example1.com">http://example1.com</a>
<href="http://example2.com">http://example2.com</a>
<href="http://example3.com">http://example3.com</a>

Add/Sub/Mul/Div a constant to a column in a CSV file in Linux shell scripting

I am trying to modify the contents of a particular column in a CSV file by dividing by a constant.
For Ex: If the contents are
1000,abc,0,1
2000,cde,2,3
and so on...
I would like to change it to
1,abc,0,1
2,cde,2,3
I went through all the previous solutions here, and I tried this:
awk -F\; '{$1=($1/1000)}1' file.csv > tmp.csv && mv tmp.csv file.csv
The above command opens file.csv, performs $1/1000, saves the result to a temporary file, and then overwrites the original file.
The problem I see is that the final file.csv contains the following:
1
2
3
4
and so on. It keeps only column 1 and drops all the other columns.
How can I fix this?
Because your file is comma-separated, you need to specify a comma as the field separator on both input and output:
$ awk -F, '{$1=($1/1000)}1' OFS=, file.csv
1,abc,0,1
2,cde,2,3
-F, tells awk to use a comma as the field separator on input.
OFS=, tells awk to use a comma as the field separator on output.
Changing the file in-place
With a modern GNU awk:
awk -i inplace -F, '{$1=($1/1000)}1' OFS=, file.csv
With BSD/OSX or other non-GNU awk:
awk -F, '{$1=($1/1000)}1' OFS=, file.csv >tmp && mv tmp file.csv
Alternate style
Some stylists prefer OFS to be set before the code:
awk -F, -v OFS=, '{$1=($1/1000)}1' file.csv
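The same pattern covers the other operations in the title; just change the expression on $1 (sketches, assuming the same file layout):
awk -F, -v OFS=, '{$1=$1+5}1' file.csv   # add a constant
awk -F, -v OFS=, '{$1=$1-5}1' file.csv   # subtract
awk -F, -v OFS=, '{$1=$1*2}1' file.csv   # multiply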

awk to print some parameters of a line

I have lines in a file on Linux, and I am trying to print each line without the | characters and without some of the fields.
$ cat file
2013-07-15,Provider 1.99,3|30000055|2347|0,12222,1,3,0,0,0,19,aaa,bbb
2013-07-15,Provider 1.99,3|30000055|2347|0,12222,44,12,0,0,0,33,aaa,bbb
and I need the output to look like:
2013-07-15,Provider,2347,12222,1,3,0,0,0,19,aaa,bbb
2013-07-15,Provider,2347,12222,44,12,0,0,0,33,aaa,bbb
I am trying to do this with awk, but I am having some problems.
If all your lines share the pattern shown, so the values you want to retain are fixed, then you can do:
awk 'BEGIN{FS=OFS=","}{$2="Provider";$3=2347}1' file
If you don't know what the patterns are, then here is a more generic one:
awk 'BEGIN{FS=OFS=","}{split($2,a,/ /);split($3,b,/\|/);$2=a[1];$3=b[3]}1' file
If it doesn't solve your problem, I am pretty sure it will at least guide you toward a solution.
Using sed:
sed 's/ [^|]*|[^|]*|\([^|]*\)|[^,]/,\1/' input
and a shorter version:
sed 's/ .*|\([^|]*\)|[^,]*/,\1/' input
and even shorter:
sed 's/ .*|\(.*\)|[^,]*/,\1/' input
Use awk, and let blank, comma, or pipe be the field separator:
awk -F '[[:blank:],|]' -v OFS=, '{
print $1,$2,$6,$8,$9,$10,$11,$12,$13,$14,$15,$16
}' file
2013-07-15,Provider,2347,12222,1,3,0,0,0,19,aaa,bbb
2013-07-15,Provider,2347,12222,44,12,0,0,0,33,aaa,bbb
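To see why $6 and $8 are the fields to keep, it can help to dump the numbered fields of the first line with the same separators; a quick diagnostic:
awk -F '[[:blank:],|]' '{ for (i = 1; i <= NF; i++) print i, $i; exit }' file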

How to reverse order of fields using AWK?

I have a file with the following layout:
123,01-08-2006
124,01-09-2007
125,01-10-2009
126,01-12-2010
How can I convert it into the following by using AWK?
123,2006-08-01
124,2007-09-01
125,2009-10-01
126,2010-12-01
Didn't read the question properly the first time. You need a field separator that can be either a dash or a comma. Once you have that, you can use the dash as the output field separator (as it's the most common) and fake the comma using concatenation:
awk -F',|-' 'OFS="-" {print $1 "," $4,$3,$2}' file
Pure awk
awk -F"," '{ n=split($2,b,"-");$2=b[3]"-"b[2]"-"b[1];$i=$1","$2 } 1' file
sed
sed -r 's/(^.[^,]*,)([0-9]{2})-([0-9]{2})-([0-9]{4})/\1\4-\3-\2/' file
sed 's/\(^.[^,]*,\)\([0-9][0-9]\)-\([0-9][0-9]\)-\([0-9]\+\)/\1\4-\3-\2/' file
Bash
#!/bin/bash
while IFS="," read -r a b
do
    IFS="-"
    set -- $b
    echo "$a,$3-$2-$1"
done < "file"
Unfortunately, some awk implementations only allow a single field-separator character, so you may have to pre-process the data. You can do this with tr, but if you really want an awk-only solution, use:
pax> echo '123,01-08-2006
124,01-09-2007
125,01-10-2009
126,01-12-2010' | awk -F, '{print $1"-"$2}' | awk -F- '{print $1","$4"-"$3"-"$2}'
This outputs:
123,2006-08-01
124,2007-09-01
125,2009-10-01
126,2010-12-01
as desired.
The first awk changes the , characters to - so that you have four fields separated with the same character (this is the bit I'd usually use tr ',' '-' for).
The second awk prints them out in the order you specified, correcting the field separators at the same time.
If you're using an awk implementation that allows multiple FS characters, you can use something like:
gawk -F ',|-' '{print $1","$4"-"$3"-"$2}'
If it doesn't need to be awk, you could use Perl too:
$ perl -nle 'print "$1,$4-$3-$2" while (/(\d{3}),(\d{2})-(\d{2})-(\d{4})\s*/g)' < file.txt
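If the dates are guaranteed to be fixed-width dd-mm-yyyy, another single-awk option is plain substr(); a sketch under that assumption:
awk -F, -v OFS=, '{ $2 = substr($2,7,4) "-" substr($2,4,2) "-" substr($2,1,2) } 1' file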
