Replace characters in specific columns only (CSV) - linux

I have data like this:
1;2015-04-10;23:10:00;10.4.2015 23:10;8.9;1007.5;0.3;0.0;0;55
2;2015-04-10;23:20:00;10.4.2015 23:20;8.6;1007.8;0.4;0.0;0;56
3;2015-04-10;23:30:00;10.4.2015 23:30;8.5;1008.1;0.4;0.0;0;57
It uses a dot (.) as the decimal separator, but I need to use a comma (,) instead.
Desired data:
1;2015-04-10;23:10:00;10.4.2015 23:10;8,9;1007,5;0,3;0,0;0;55
I tried using sed. With sed -i 's/\./,/g' myfile.csv I could replace all dots with commas, but that would also destroy the dates in the fourth column. How can I change dots to commas everywhere else but leave the fourth column as it is? If some other Linux tool is better suited for this task than sed, I could use it as well.

sed is for simple substitutions, for anything else just use awk:
$ awk 'BEGIN{FS=OFS=";"} {for (i=5;i<=NF;i++) sub(/\./,",",$i)} 1' file
1;2015-04-10;23:10:00;10.4.2015 23:10;8,9;1007,5;0,3;0,0;0;55
2;2015-04-10;23:20:00;10.4.2015 23:20;8,6;1007,8;0,4;0,0;0;56
3;2015-04-10;23:30:00;10.4.2015 23:30;8,5;1008,1;0,4;0,0;0;57
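If, like the sed -i in the question, you want to edit the file in place, GNU awk 4.1+ can do that with -i inplace; with any other awk, write to a temporary file and move it back. A sketch of both (the inplace variant assumes GNU awk):
gawk -i inplace 'BEGIN{FS=OFS=";"} {for (i=5;i<=NF;i++) sub(/\./,",",$i)} 1' myfile.csv
awk 'BEGIN{FS=OFS=";"} {for (i=5;i<=NF;i++) sub(/\./,",",$i)} 1' myfile.csv > tmp && mv tmp myfile.csv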

Perl and Text::CSV:
#! /usr/bin/perl
use warnings;
use strict;
use Text::CSV;

my $csv = 'Text::CSV'->new({ binary      => 1,
                             sep_char    => ';',
                             quote_space => 0,
                           }) or die 'Text::CSV'->error_diag;
open my $FH, '<:encoding(utf8)', 'input.csv' or die $!;
$csv->eol("\n");
while (my $row = $csv->getline($FH)) {
    s/\./,/g for @$row[ 0 .. 2, 4 .. $#$row ];   # skip index 3, the date column
    $csv->print(*STDOUT, $row);
}

You could go with:
awk 'BEGIN {FS=OFS=";"} {for (i=5; i<=NF; i++) gsub(/\./, ",", $i)} 1' filename
Here I have used gsub instead of sub; the difference is that sub replaces only the first occurrence in each field, whereas gsub replaces all occurrences.
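A quick way to see the difference:
echo "1.2.3" | awk '{sub(/\./, ",")} 1'     # prints 1,2.3 (first dot only)
echo "1.2.3" | awk '{gsub(/\./, ",")} 1'    # prints 1,2,3 (every dot)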

This changes dots to commas in the second whitespace-separated field. The only space on each line sits inside the fourth (date) column, so the date part stays untouched in $1 while all the decimal values after the space land in $2:
awk '{gsub(/\./,",",$2)}1' file
1;2015-04-10;23:10:00;10.4.2015 23:10;8,9;1007,5;0,3;0,0;0;55
2;2015-04-10;23:20:00;10.4.2015 23:20;8,6;1007,8;0,4;0,0;0;56
3;2015-04-10;23:30:00;10.4.2015 23:30;8,5;1008,1;0,4;0,0;0;57

Related

How to pad CSV file missing columns

I have a problem with some CSV files coming from a piece of software that I want to use for a PostgreSQL import (COPY FROM CSV). The problem is that some trailing columns are missing, like this (letters for headers, numbers for values, _ for the TAB delimiter):
a_b_c_d
1_2_3_4
5_6_7 <- last column missing
8_9_0_1
2_6_7 <- last column missing
The result of COPY in_my_table FROM file.csv is:
ERROR: missing data for column "d"
Sample of a correct file for import :
a_b_c_d
1_2_3_4
5_6_7_ <- null column but not missing
8_9_0_1
2_6_7_ <- null column but not missing
My question: are there commands in bash / the Linux shell to add the TAB delimiters and produce a correct / complete / padded CSV file with all columns?
Thanks for the help.
Ok, so in fact I found this:
awk -F'\t' -v OFS='\t' 'NF=50' input.csv > output.csv
where 50 is the number of tabs + 1, i.e. the number of columns. Assigning to NF forces awk to rebuild the record, padding any missing trailing fields with empty strings.
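For the four-column sample above, the same idea looks like this (a sketch, assuming GNU awk, where assigning NF pads the missing fields, and a tab-delimited input file named broken.csv):
awk -F'\t' -v OFS='\t' 'NF=4' broken.csv > fixed.csv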
I don't know much about Linux, but this could easily be done in PostgreSQL with a simple command like
copy tableName from '/filepath/name.csv' delimiter '_' csv WITH NULL AS 'null';
(here '_' stands for the actual delimiter, as in the question's notation).
You can use a combination of sed and regular expressions:
sed -r 's/^[0-9](_[0-9]){2}$/&_/' file.csv
(& in the replacement refers to the whole match.)
You only need to replace _ by your delimiter (\t).
Awk is good for this.
awk -F"\t" '{ # Tell awk we are working with tabs
if ($4 =="") # If the last field is empty
print $0"\t" # print the whole line with a tab
else
print $0 # Otherwise just print the line
}' your.csv > your.fixed.csv
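A more compact way to express the same padding (a sketch, assuming four tab-separated columns as in the sample; assigning $4 forces awk to rebuild the record with OFS tabs):
awk -F'\t' -v OFS='\t' 'NF < 4 { $4 = "" } 1' your.csv > your.fixed.csv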
Perl has a CSV module, which might be handy to fix even more complicated CSV errors. On my Ubuntu test system it is part of the package libtext-csv-perl.
This fixes your problem:
#! /usr/bin/perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new({ binary => 1, eol => $/, sep_char => '_' });
open my $broken, '<', 'broken.csv' or die "broken.csv: $!";
open my $fixed,  '>', 'fixed.csv'  or die "fixed.csv: $!";
while (my $row = $csv->getline($broken)) {
    $#{$row} = 3;              # force exactly 4 fields; missing ones become empty
    $csv->print($fixed, $row);
}
Change sep_char to "\t" if you have a tab-delimited file, and keep in mind that Perl treats "\t" and '\t' differently: in double quotes "\t" is a tab character, while in single quotes '\t' is a literal backslash followed by a t.

bash Changing every other comma to point

I am working with a set of data written in the Swedish format: a comma is used instead of a point for decimal numbers in Sweden.
My data set is like this:
1,188,1,250,0,757,0,946,8,960
1,257,1,300,0,802,1,002,9,485
1,328,1,350,0,846,1,058,10,021
1,381,1,400,0,880,1,100,10,418
I want to change every other comma to a point and get output like this:
1.188,1.250,0.757,0.946,8.960
1.257,1.300,0.802,1.002,9.485
1.328,1.350,0.846,1.058,10.021
1.381,1.400,0.880,1.100,10.418
Any idea how to do that with simple shell scripting? It is fine if I do it in multiple steps, i.e. first change the first instance of the comma, then the third instance, and so on.
Thank you very much for your help.
Using sed:
sed 's/,\([^,]*\(,\|$\)\)/.\1/g' file
The pattern matches a comma together with the field after it and the next comma (or end of line); because that second comma is consumed by the match, the g flag ends up replacing only every other comma, starting with the first:
1.188,1.250,0.757,0.946,8.960
1.257,1.300,0.802,1.002,9.485
1.328,1.350,0.846,1.058,10.021
1.381,1.400,0.880,1.100,10.418
For reference, here is a possible way to achieve the conversion using awk:
awk -F, '{for(i=1;i<=NF;i=i+2) {printf $i "." $(i+1); if(i<NF-2) printf FS }; printf "\n" }' file
The for loop steps through the fields two at a time (the separator is set by the -F, option) and prints each element and the next one joined by a dot.
The comma separator, represented by FS, is printed after each pair except the last one on the line.
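One caveat: printf treats its first argument as a format string, so a field containing % would be misinterpreted. A slightly safer variant of the same loop builds the line in a variable first (a sketch):
awk -F, '{ line = ""; for (i = 1; i < NF; i += 2) line = line (i > 1 ? "," : "") $i "." $(i+1); print line }' file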
As a Perl one-liner, using split and array manipulation:
perl -F, -ane '@a = @b = (); while (@b = splice @F, 0, 2) {
    push @a, join ".", @b } print join ",", @a' file
Output:
1.188,1.250,0.757,0.946,8.960
1.257,1.300,0.802,1.002,9.485
1.328,1.350,0.846,1.058,10.021
1.381,1.400,0.880,1.100,10.418
Many sed dialects allow you to specify which occurrence of a pattern to replace by giving a numeric flag to the s/// command:
sed -e 's/,/./9' -e 's/,/./7' -e 's/,/./5' -e 's/,/./3' -e 's/,/./'
Working from the 9th comma down to the 1st means each substitution leaves the positions of the commas before it untouched.
ISTR some sed dialects would allow you to simplify this to
sed 's/,/./1,2'
but this is not supported on my Debian.
Demo: http://ideone.com/6s2lAl

remove lines from text file that contain specific text

I'm trying to remove lines that contain 0/0 or ./. in column 71 "FORMAT.1.GT" of a tab-delimited text file.
I've tried the following code but it doesn't work. What is the correct way of accomplishing this? Thank you.
my $cmd6 = `fgrep -v "0/0" | fgrep -v "./." $Variantlinestsvfile > $MDLtsvfile`; print "$cmd6";
You can either call a one-liner, as Borodin and zdim said. Which one is right for you is still not clear, because you don't say whether "column 71" means the 71st tab-separated field of a line or the 71st character of that line. Consider
12345\t6789
Now what is the 2nd column? Is it the character 2 or the field 6789? Borodin's answer assumes it is 6789 while zdim's assumes it is 2. Both showed a solution for their reading, but these are stand-alone programs to be run from the command line.
If you want to integrate that into your Perl script you could do it like this:
Replace this line:
my $cmd6 = `fgrep -v "0/0" | fgrep -v "./." $Variantlinestsvfile > $MDLtsvfile`; print "$cmd6";
with this snippet:
open( my $fh_in,  '<', $Variantlinestsvfile ) or die "cannot open $Variantlinestsvfile: $!\n";
open( my $fh_out, '>', $MDLtsvfile )          or die "cannot open $MDLtsvfile: $!\n";
while( my $line = <$fh_in> ) {
    # character-based:
    print $fh_out $line unless (substr($line, 70, 3) =~ m{(?:0/0|\./\.)});
    # tab/field-based:
    my @fields = split(/\s+/, $line);
    print $fh_out $line unless ($fields[70] =~ m|([0.])/\1|);
}
close($fh_in);
close($fh_out);
Use either the character-based line or the tab/field-based lines. Not both!
Borodin and zdim condensed this snippet to a one-liner, but you should not shell out to such a one-liner from within a Perl script.
Since you need the exact position and know the string lengths, substr can find it:
perl -ne 'print if not substr($_, 70, 3) =~ m{(?:0/0|\./\.)}' filename
This prints a line only when the three-character string starting at the 71st column matches neither 0/0 nor ./.
The {} delimiters around the regex allow us to use / and | inside without escaping. The ?: is there so that the parentheses only group and don't capture; it would work fine without the ?: as well, which is there only for efficiency's sake.
perl -ane 'print unless $F[70] =~ m|([0.])/\1|' myfile > newfile
The problem with your command is that you are attempting to capture the output of a command which produces no output - all the matches are redirected to a file, so that's where all the output is going.
Anyway, calling grep from Perl is just wacky. Reading the file in Perl itself is the way to go.
If you do want a single shell command,
grep -Ev $'^([^\t]*\t){70}(\./\.|0/0)\t' file
would do what you are asking more precisely and elegantly. But you can use that regex straight off in your Perl program just as well.
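For example, the same regex dropped into a Perl one-liner (a sketch, assuming the tab-delimited layout above and that column 71 is not the last column):
perl -ne 'print unless m{^(?:[^\t]*\t){70}(?:\./\.|0/0)\t}' file > newfile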
Try it!
awk '{ if ($71 != "./." && $71 != "0/0") print }' old_file.txt > new_file.txt

Replace word in line only if the line starts with a specific number (CSV file)

I use the following sed command in order to replace a string in a CSV line
(the condition for replacing the string is that the number at the beginning of the line matches):
SERIAL_NUM=1
sed "/$SERIAL_NUM/ s//OK/g" file.csv
The problem is that I want to match only the number at the beginning of the line, but sed matches other lines that contain this number as well.
example:
In this example I want to replace the word STATUS with OK, but only in the line that starts with 1 (before the "," separator), so I do this:
SERIAL_NUM=1
more file.csv
1,14556,43634,266,242,def,45,STATUS
2,4345,1,43,57,24,657,SD,STATUS
3,1,WQ,435,676,90,3,44f,STATUS
sed -i "/$SERIAL_NUM/ s/STATUS/OK/g" file.csv
more file.csv
1,14556,43634,266,242,def,45,OK
2,4345,1,43,57,24,657,SD,OK
3,1,WQ,435,676,90,3,44f,OK
but sed also replaces STATUS with OK in lines 2 and 3 (because those lines also contain the number 1).
Please advise how to change the sed syntax so that it matches only the number that starts the line, before the "," separator.
Remark: a solution with a Perl one-liner or awk would also be fine.
You can use the anchor ^ to make sure $SERIAL_NUM matches only at the start of the line, and put a , after it to make sure the number is followed by the comma separator:
sed "/^$SERIAL_NUM,/s/STATUS/OK/g" file.csv
Since this answer was ranked fifth in the Stack Overflow perl report but had no Perl content, I thought it would be useful to add the following, instead of removing the perl tag :-)
#!/usr/bin/env perl
use strict;
use warnings;
while (<DATA>) {
    s/STATUS/OK/g if /^1,/;
    print;
}
__DATA__
1,14556,43634,266,242,def,45,STATUS
2,4345,1,43,57,24,657,SD,STATUS
3,1,WQ,435,676,90,3,44f,STATUS
or as a one-liner:
perl -pe 's/STATUS/OK/g if /^1,/' file.csv
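To pick up the shell variable instead of the hard-coded 1, let the shell interpolate it into the pattern (a sketch; assumes $SERIAL_NUM contains only digits):
perl -pe "s/STATUS/OK/g if /^$SERIAL_NUM,/" file.csv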

csv file: replace two-character string with three-character string

I would like to replace a handful of strings with others (e.g. "GG" with "GGX", "GG " with "GGX", "FG" with "FGX", etc.) in the first column of a big CSV file using a shell command.
I know I need something like
big.csv shell_commands big.csv
but I don't know awk or sed
Using sed, replacing "GG" at the start of a line in big.csv with "GGX" would look like:
sed 's/^GG/GGX/g' big.csv >big_translated.csv
If you need to replace multiple patterns, you can use multiple replace commands in sed separated by semicolons.
sed 's/^GG/GGX/g; s/^FG/FGX/g' big.csv >big_translated.csv
The ^ character means beginning of line and ensures that we only edit the first field of the csv.
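If some first-column values carry a trailing space, as in the "GG " example from the question, anchoring on the comma keeps the X right after the letters (a sketch, assuming a comma-delimited file):
sed 's/^GG *,/GGX,/; s/^FG *,/FGX,/' big.csv > big_translated.csv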
awk 'BEGIN{ FS = OFS = ","; r["GG"] = "GGX"; r["FG"] = "FGX" }
{ for ( k in r ) if ( gsub( k, r[k], $1 ) ) break } 1' input-file
The break is there to prevent multiple substitutions.
Try this (provided you have a single occurrence of each string):
awk '{sub("GG","GGX",$0); sub("FG","FGX",$0); print}' temp.txt
How about this?
sed -i "s/^\(..\),/\1X,/" big.csv
Or, if there are some trailing spaces, this:
sed -i "s/^\([^ ][^ ][ ]*\),/\1X,/" big.csv
