Replace characters in specific columns only (CSV) - linux

I have data like this:
1;2015-04-10;23:10:00;10.4.2015 23:10;8.9;1007.5;0.3;0.0;0;55
2;2015-04-10;23:20:00;10.4.2015 23:20;8.6;1007.8;0.4;0.0;0;56
3;2015-04-10;23:30:00;10.4.2015 23:30;8.5;1008.1;0.4;0.0;0;57
It uses a dot (.) as the decimal separator, but I need to use a comma (,) instead.
Desired data:
1;2015-04-10;23:10:00;10.4.2015 23:10;8,9;1007,5;0,3;0,0;0;55
I tried using sed. With sed -i 's/\./,/g' myfile.csv I could replace all dots with commas, but that would also destroy the dates in the fourth column. How can I change dots to commas everywhere else but leave the fourth column as it is? If some other Linux tool is better suited for this task than sed, I could use it as well.

sed is for simple substitutions, for anything else just use awk:
$ awk 'BEGIN{FS=OFS=";"} {for (i=5;i<=NF;i++) sub(/\./,",",$i)} 1' file
1;2015-04-10;23:10:00;10.4.2015 23:10;8,9;1007,5;0,3;0,0;0;55
2;2015-04-10;23:20:00;10.4.2015 23:20;8,6;1007,8;0,4;0,0;0;56
3;2015-04-10;23:30:00;10.4.2015 23:30;8,5;1008,1;0,4;0,0;0;57
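If, like the sed -i in the question, you want to edit the file in place, GNU awk 4.1+ can do that with -i inplace; with any other awk, write to a temporary file and move it back. A sketch of both (the inplace variant assumes GNU awk):
gawk -i inplace 'BEGIN{FS=OFS=";"} {for (i=5;i<=NF;i++) sub(/\./,",",$i)} 1' myfile.csv
awk 'BEGIN{FS=OFS=";"} {for (i=5;i<=NF;i++) sub(/\./,",",$i)} 1' myfile.csv > tmp && mv tmp myfile.csv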

Perl and Text::CSV:
#! /usr/bin/perl
use warnings;
use strict;
use Text::CSV;

my $csv = 'Text::CSV'->new({ binary      => 1,
                             sep_char    => ';',
                             quote_space => 0,
                           }) or die 'Text::CSV'->error_diag;
open my $FH, '<:encoding(utf8)', 'input.csv' or die $!;
$csv->eol("\n");
while (my $row = $csv->getline($FH)) {
    s/\./,/g for @$row[ 0 .. 2, 4 .. $#$row ];   # skip index 3, the date column
    $csv->print(*STDOUT, $row);
}

You could go with:
awk 'BEGIN {FS=OFS=";"} {for (i=5; i<=NF; i++) gsub(/\./, ",", $i)} 1' filename
Here I have used gsub instead of sub; the difference is that sub replaces only the first occurrence in each field, whereas gsub replaces all occurrences.
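A quick way to see the difference:
echo "1.2.3" | awk '{sub(/\./, ",")} 1'     # prints 1,2.3 (first dot only)
echo "1.2.3" | awk '{gsub(/\./, ",")} 1'    # prints 1,2,3 (every dot)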

This changes dots to commas in the second whitespace-separated field. The only space on each line sits inside the fourth (date) column, so the date part stays untouched in $1 while all the decimal values after the space land in $2:
awk '{gsub(/\./,",",$2)}1' file
1;2015-04-10;23:10:00;10.4.2015 23:10;8,9;1007,5;0,3;0,0;0;55
2;2015-04-10;23:20:00;10.4.2015 23:20;8,6;1007,8;0,4;0,0;0;56
3;2015-04-10;23:30:00;10.4.2015 23:30;8,5;1008,1;0,4;0,0;0;57

Related

How to pad CSV file missing columns

I have a problem with some CSV files coming from a piece of software that I want to use for a PostgreSQL import (COPY FROM CSV). The problem is that some trailing columns are missing, like this (letters for headers, numbers for values, _ for the TAB delimiter):
a_b_c_d
1_2_3_4
5_6_7 <- last column missing
8_9_0_1
2_6_7 <- last column missing
The result of COPY in_my_table FROM file.csv is:
ERROR: missing data for column "d"
Sample of a correct file for import :
a_b_c_d
1_2_3_4
5_6_7_ <- null column but not missing
8_9_0_1
2_6_7_ <- null column but not missing
My question: are there commands in bash / the Linux shell to add the TAB delimiters and produce a correct / complete / padded CSV file with all columns?
Thanks for the help.
Ok, so in fact I found this:
awk -F'\t' -v OFS='\t' 'NF=50' input.csv > output.csv
where 50 is the number of tabs + 1, i.e. the number of columns. Assigning to NF forces awk to rebuild the record, padding any missing trailing fields with empty strings.
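For the four-column sample above, the same idea looks like this (a sketch, assuming GNU awk, where assigning NF pads the missing fields, and a tab-delimited input file named broken.csv):
awk -F'\t' -v OFS='\t' 'NF=4' broken.csv > fixed.csv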
I don't know much about Linux, but this could easily be done in PostgreSQL with a simple command like
copy tableName from '/filepath/name.csv' delimiter '_' csv WITH NULL AS 'null';
(here '_' stands for the actual delimiter, as in the question's notation).
You can use a combination of sed and regular expressions:
sed -r 's/^[0-9](_[0-9]){2}$/&_/' file.csv
(& in the replacement refers to the whole match.)
You only need to replace _ by your delimiter (\t).
Awk is good for this.
awk -F"\t" '{ # Tell awk we are working with tabs
if ($4 =="") # If the last field is empty
print $0"\t" # print the whole line with a tab
else
print $0 # Otherwise just print the line
}' your.csv > your.fixed.csv
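A more compact way to express the same padding (a sketch, assuming four tab-separated columns as in the sample; assigning $4 forces awk to rebuild the record with OFS tabs):
awk -F'\t' -v OFS='\t' 'NF < 4 { $4 = "" } 1' your.csv > your.fixed.csv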
Perl has a CSV module, which might be handy to fix even more complicated CSV errors. On my Ubuntu test system it is part of the package libtext-csv-perl.
This fixes your problem:
#! /usr/bin/perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new({ binary => 1, eol => $/, sep_char => '_' });
open my $broken, '<', 'broken.csv' or die "broken.csv: $!";
open my $fixed,  '>', 'fixed.csv'  or die "fixed.csv: $!";
while (my $row = $csv->getline($broken)) {
    $#{$row} = 3;              # force exactly 4 fields; missing ones become empty
    $csv->print($fixed, $row);
}
Change sep_char to "\t" if you have a tab-delimited file, and keep in mind that Perl treats "\t" and '\t' differently: in double quotes "\t" is a tab character, while in single quotes '\t' is a literal backslash followed by a t.

bash Changing every other comma to point

I am working with a set of data written in the Swedish format: a comma is used instead of a point for decimal numbers in Sweden.
My data set is like this:
1,188,1,250,0,757,0,946,8,960
1,257,1,300,0,802,1,002,9,485
1,328,1,350,0,846,1,058,10,021
1,381,1,400,0,880,1,100,10,418
I want to change every other comma to a point and get output like this:
1.188,1.250,0.757,0.946,8.960
1.257,1.300,0.802,1.002,9.485
1.328,1.350,0.846,1.058,10.021
1.381,1.400,0.880,1.100,10.418
Any idea how to do that with simple shell scripting? It is fine if I do it in multiple steps, i.e. first change the first instance of the comma, then the third instance, and so on.
Thank you very much for your help.
Using sed:
sed 's/,\([^,]*\(,\|$\)\)/.\1/g' file
The pattern matches a comma together with the field after it and the next comma (or end of line); because that second comma is consumed by the match, the g flag ends up replacing only every other comma, starting with the first:
1.188,1.250,0.757,0.946,8.960
1.257,1.300,0.802,1.002,9.485
1.328,1.350,0.846,1.058,10.021
1.381,1.400,0.880,1.100,10.418
For reference, here is a possible way to achieve the conversion using awk:
awk -F, '{for(i=1;i<=NF;i=i+2) {printf $i "." $(i+1); if(i<NF-2) printf FS }; printf "\n" }' file
The for loop steps through the fields two at a time (the separator is set by the -F, option) and prints each element and the next one joined by a dot.
The comma separator, represented by FS, is printed after each pair except the last one on the line.
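One caveat: printf treats its first argument as a format string, so a field containing % would be misinterpreted. A slightly safer variant of the same loop builds the line in a variable first (a sketch):
awk -F, '{ line = ""; for (i = 1; i < NF; i += 2) line = line (i > 1 ? "," : "") $i "." $(i+1); print line }' file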
As a Perl one-liner, using split and array manipulation:
perl -F, -ane '@a = @b = (); while (@b = splice @F, 0, 2) {
    push @a, join ".", @b } print join ",", @a' file
Output:
1.188,1.250,0.757,0.946,8.960
1.257,1.300,0.802,1.002,9.485
1.328,1.350,0.846,1.058,10.021
1.381,1.400,0.880,1.100,10.418
Many sed dialects allow you to specify which occurrence of a pattern to replace by giving a numeric flag to the s/// command:
sed -e 's/,/./9' -e 's/,/./7' -e 's/,/./5' -e 's/,/./3' -e 's/,/./'
Working from the 9th comma down to the 1st means each substitution leaves the positions of the commas before it untouched.
ISTR some sed dialects would allow you to simplify this to
sed 's/,/./1,2'
but this is not supported on my Debian.
Demo: http://ideone.com/6s2lAl

remove lines from text file that contain specific text

I'm trying to remove lines that contain 0/0 or ./. in column 71 "FORMAT.1.GT" of a tab-delimited text file.
I've tried the following code but it doesn't work. What is the correct way of accomplishing this? Thank you.
my $cmd6 = `fgrep -v "0/0" | fgrep -v "./." $Variantlinestsvfile > $MDLtsvfile`; print "$cmd6";
You can either call a one-liner, as Borodin and zdim said. Which one is right for you is still not clear, because you don't say whether "column 71" means the 71st tab-separated field of a line or the 71st character of that line. Consider
12345\t6789
Now what is the 2nd column? Is it the character 2 or the field 6789? Borodin's answer assumes it is 6789 while zdim's assumes it is 2. Both showed a solution for their reading, but these are stand-alone programs to be run from the command line.
If you want to integrate that into your Perl script you could do it like this:
Replace this line:
my $cmd6 = `fgrep -v "0/0" | fgrep -v "./." $Variantlinestsvfile > $MDLtsvfile`; print "$cmd6";
with this snippet:
open( my $fh_in,  '<', $Variantlinestsvfile ) or die "cannot open $Variantlinestsvfile: $!\n";
open( my $fh_out, '>', $MDLtsvfile )          or die "cannot open $MDLtsvfile: $!\n";
while( my $line = <$fh_in> ) {
    # character-based:
    print $fh_out $line unless (substr($line, 70, 3) =~ m{(?:0/0|\./\.)});
    # tab/field-based:
    my @fields = split(/\s+/, $line);
    print $fh_out $line unless ($fields[70] =~ m|([0.])/\1|);
}
close($fh_in);
close($fh_out);
Use either the character-based line or the tab/field-based lines. Not both!
Borodin and zdim condensed this snippet to a one-liner, but you should not shell out to such a one-liner from within a Perl script.
Since you need the exact position and know the string lengths, substr can find it:
perl -ne 'print if not substr($_, 70, 3) =~ m{(?:0/0|\./\.)}' filename
This prints a line only when the three-character string starting at the 71st column matches neither 0/0 nor ./.
The {} delimiters around the regex allow us to use / and | inside without escaping. The ?: is there so that the parentheses only group and don't capture; it would work fine without the ?: as well, which is there only for efficiency's sake.
perl -ane 'print unless $F[70] =~ m|([0.])/\1|' myfile > newfile
The problem with your command is that you are attempting to capture the output of a command which produces no output - all the matches are redirected to a file, so that's where all the output is going.
Anyway, calling grep from Perl is just wacky. Reading the file in Perl itself is the way to go.
If you do want a single shell command,
grep -Ev $'^([^\t]*\t){70}(\./\.|0/0)\t' file
would do what you are asking more precisely and elegantly. But you can use that regex straight off in your Perl program just as well.
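For example, the same regex dropped into a Perl one-liner (a sketch, assuming the tab-delimited layout above and that column 71 is not the last column):
perl -ne 'print unless m{^(?:[^\t]*\t){70}(?:\./\.|0/0)\t}' file > newfile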
Try it!
awk '{ if ($71 != "./." && $71 != "0/0") print }' old_file.txt > new_file.txt

Replace word in line only if the line starts with a specific number (CSV file)

I use the following sed command in order to replace a string in a CSV line
(the condition for replacing the string is that the number at the beginning of the line matches):
SERIAL_NUM=1
sed "/$SERIAL_NUM/ s//OK/g" file.csv
The problem is that I want to match only the number at the beginning of the line, but sed matches other lines that contain this number as well.
example:
In this example I want to replace the word STATUS with OK, but only in the line that starts with 1 (before the "," separator), so I do this:
SERIAL_NUM=1
more file.csv
1,14556,43634,266,242,def,45,STATUS
2,4345,1,43,57,24,657,SD,STATUS
3,1,WQ,435,676,90,3,44f,STATUS
sed -i "/$SERIAL_NUM/ s/STATUS/OK/g" file.csv
more file.csv
1,14556,43634,266,242,def,45,OK
2,4345,1,43,57,24,657,SD,OK
3,1,WQ,435,676,90,3,44f,OK
but sed also replaces STATUS with OK in lines 2 and 3 (because those lines also contain the number 1).
Please advise how to change the sed syntax so that it matches only the number that starts the line, before the "," separator.
Remark: a solution with a Perl one-liner or awk would also be fine.
You can use the anchor ^ to make sure $SERIAL_NUM matches only at the start of the line, and put a , after it to make sure the number is followed by the comma separator:
sed "/^$SERIAL_NUM,/s/STATUS/OK/g" file.csv
Since this answer was ranked fifth in the Stack Overflow perl report but had no Perl content, I thought it would be useful to add the following, instead of removing the perl tag :-)
#!/usr/bin/env perl
use strict;
use warnings;
while (<DATA>) {
    s/STATUS/OK/g if /^1,/;
    print;
}
__DATA__
1,14556,43634,266,242,def,45,STATUS
2,4345,1,43,57,24,657,SD,STATUS
3,1,WQ,435,676,90,3,44f,STATUS
or as a one-liner:
perl -pe 's/STATUS/OK/g if /^1,/' file.csv
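To pick up the shell variable instead of the hard-coded 1, let the shell interpolate it into the pattern (a sketch; assumes $SERIAL_NUM contains only digits):
perl -pe "s/STATUS/OK/g if /^$SERIAL_NUM,/" file.csv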

csv file: replace two-character string with three-character string

I would like to replace a handful of strings with others (e.g. "GG" with "GGX", "GG " with "GGX", "FG" with "FGX", etc.) in the first column of a big CSV file using a shell command.
I know I need something like
big.csv shell_commands big.csv
but I don't know awk or sed
Using sed, replacing "GG" at the start of a line in big.csv with "GGX" would look like:
sed 's/^GG/GGX/g' big.csv >big_translated.csv
If you need to replace multiple patterns, you can use multiple replace commands in sed separated by semicolons.
sed 's/^GG/GGX/g; s/^FG/FGX/g' big.csv >big_translated.csv
The ^ character means beginning of line and ensures that we only edit the first field of the csv.
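If some first-column values carry a trailing space, as in the "GG " example from the question, anchoring on the comma keeps the X right after the letters (a sketch, assuming a comma-delimited file):
sed 's/^GG *,/GGX,/; s/^FG *,/FGX,/' big.csv > big_translated.csv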
awk 'BEGIN{ FS = OFS = ","; r["GG"] = "GGX"; r["FG"] = "FGX" }
{ for ( k in r ) if ( gsub( k, r[k], $1 ) ) break } 1' input-file
The break is there to prevent multiple substitutions.
Try this (provided you have a single occurrence of each string):
awk '{sub("GG","GGX",$0); sub("FG","FGX",$0); print}' temp.txt
How about this?
sed -i "s/^\(..\),/\1X,/" big.csv
Or, if there are some trailing spaces, this:
sed -i "s/^\([^ ][^ ][ ]*\),/\1X,/" big.csv
