I would like to replace a handful of strings with others (e.g. "GG" with "GGX", "GG " with "GGX", "FG" with "FGX", etc) in the first column of a big csv file using a shell command.
I know I need something like
big.csv shell_commands big.csv
but I don't know awk or sed
Using sed, replacing "GG" at the start of each line of big.csv with "GGX" would look like:
sed 's/^GG/GGX/g' big.csv >big_translated.csv
If you need to replace multiple patterns, you can use multiple replace commands in sed separated by semicolons.
sed 's/^GG/GGX/g; s/^FG/FGX/g' big.csv >big_translated.csv
The ^ character means beginning of line and ensures that we only edit the first field of the csv.
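If the replacement should only happen when the first field is exactly GG (rather than merely starting with GG), you can include the field-terminating comma in the pattern; a minimal sketch:
sed 's/^GG,/GGX,/; s/^FG,/FGX,/' big.csv >big_translated.csv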
awk 'BEGIN{ r["GG"] = "GGX"; r["FG"] = "FGX" }
{ for( k in r ) if( gsub( k, r[k], $1 ) break } 1' input-file
The break is there to prevent multiple substitutions, and the -F, and -v OFS=, options make awk split on commas so that $1 really is the first CSV column.
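For example, with a hypothetical input-file:
$ cat input-file
GG,1,2
FG,3,4
AB,5,6
$ awk -F, -v OFS=, 'BEGIN{ r["GG"] = "GGX"; r["FG"] = "FGX" }
{ for (k in r) if (gsub(k, r[k], $1)) break } 1' input-file
GGX,1,2
FGX,3,4
AB,5,6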
Try this (provided you have a single occurrence of each string):
awk '{sub("GG","GGX",$0); sub("FG","FGX",$0); print}' temp.txt
How about this? It captures the first two characters of the line and appends X before the comma:
sed -i "s/^\(..\),/\1X,/" big.csv
Or if there are some spaces in there, this:
sed -i "s/^\([^ ][^ ][ ]*\),/\1X,/" big.csv
I have a file that's basically an INI/CFG file that looks like this:
[thing-a]
attribute1=foo
attribute2=bar
attribute3=foobar
attribute4=barfoo
[thing-b]
attribute1=dog
attribute3=foofoo
attribute4=castles
[thing-c]
attribute1=foo
attribute4=barfoo
[thing-d]
attribute1=123455
attribute2=dogs
attribute3=biscuits
attribute4=1234
Each 'thing' has a set of attributes that could include all the same ones or a subset thereof.
I am trying to write a small bash script that will replace the attributes for 'thing-c' with a predefined block ($a1, $a2 & $a3 are generated elsewhere in the wider script):
NEW_BLOCK="[thing-c]
attribute1=${a1}
attribute2=${a2}
attribute3=${a3}"
I can find the right block with sed like this:
THING_BLOCK=$(sed -nr "/^\[thing-c\]/ { :l /^\s*[^#].*/ p; n; /^\[/ q; b l; }" ./myThingFile)
I'm not sure if I've gone down a rabbit hole or what with this, and I'm pretty sure there is a better way of doing it.
What I'm wanting to do is:
sed "s/${THING_BLOCK}/${NEW_BLOCK}/"
But I can't quite figure out the multiline aspect to this and I'm not sure what the best route to take is.
Is there a way to do this sort of multiline find and replace with sed (or a better way with bash)?
Is there a way to do this sort of multiline find and replace ...
Yes there is indeed a better way, albeit using awk:
awk -v blk="$NEW_BLOCK" -v RS= '{ORS = RT} $1 == "[thing-c]" {$0 = blk} 1' file
Using -v RS= we put awk into paragraph mode: the empty record separator splits the input file into records on blank lines, so each [thing-...] block is one record. Setting ORS = RT (RT is a GNU awk feature) preserves whatever separator text followed each record.
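As a quick illustration of paragraph mode (hypothetical two-block input):
$ printf '[a]\nx=1\n\n[b]\ny=2\n' | awk -v RS= '{print "record " NR ": " $1}'
record 1: [a]
record 2: [b]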
Another awk. Store the replacement block in file2 and:
$ awk -v RS="" '
NR==FNR {                      # while reading the first file (file2)...
b=$0                           # ...store the replacement block
next
}
$1~/thing-c/ {                 # a record whose first field matches thing-c...
$0=b                           # ...is replaced wholesale
}
{
print (++c==1?"":ORS) $0       # separate output records with blank lines
}' file2 file1
Output:
[thing-a]
attribute1=foo
attribute2=bar
attribute3=foobar
attribute4=barfoo
[thing-b]
attribute1=dog
attribute3=foofoo
attribute4=castles
[thing-c]
attribute1=${a1}
attribute2=${a2}
attribute3=${a3}
[thing-d]
attribute1=123455
attribute2=dogs
attribute3=biscuits
attribute4=1234
When you want to use sed (IMHO awk is better here), you must have "nice" data: no special characters that sed will try to interpret, and no [ inside the thing-c block.
I tested with
read -d '' -r NEW_BLOCK <<END
[thing-c]
attribute1=${a1}
attribute2=${a2}
attribute3=${a3}
END
For my solution I first need to replace newlines in $NEW_BLOCK with the two characters \n.
echo "This is the replacement string: ${NEW_BLOCK//$'\n'/\\n}"
With the "multi-line" option "-z" you can do
sed -rz "s/\[thing-c\][^[]*/${NEW_BLOCK//$'\n'/\\n}\n\n/" myThingFile
I am working with a set of data written in Swedish format; a comma is used instead of a point for decimal numbers in Sweden.
My data set is like this:
1,188,1,250,0,757,0,946,8,960
1,257,1,300,0,802,1,002,9,485
1,328,1,350,0,846,1,058,10,021
1,381,1,400,0,880,1,100,10,418
I want to change every other comma to a point and get output like this:
1.188,1.250,0.757,0.946,8.960
1.257,1.300,0.802,1.002,9.485
1.328,1.350,0.846,1.058,10.021
1.381,1.400,0.880,1.100,10.418
Any idea how to do that with simple shell scripting? It is fine if I do it in multiple steps, i.e. if I first change the first instance of the comma, then the third instance, and so on.
Thank you very much for your help.
Using sed
sed 's/,\([^,]*\(,\|$\)\)/.\1/g' file
1.188,1.250,0.757,0.946,8.960
1.257,1.300,0.802,1.002,9.485
1.328,1.350,0.846,1.058,10.021
1.381,1.400,0.880,1.100,10.418
For reference, here is a possible way to achieve the conversion using awk:
awk -F, '{for(i=1;i<=NF;i+=2) {printf "%s.%s", $i, $(i+1); if(i<NF-2) printf "%s", FS }; printf "\n" }' file
The for loop iterates every 2 fields separated by a comma (set by the option -F,) and prints the current element and the next one separated by a dot.
The comma separator, represented by FS, is printed after each pair except the last one on the line.
As a Perl one-liner, using split and array manipulation:
perl -F, -ane '@a = @b = (); while (@b = splice @F, 0, 2) {
push @a, join ".", @b} print join ",", @a' file
Output:
1.188,1.250,0.757,0.946,8.960
1.257,1.300,0.802,1.002,9.485
1.328,1.350,0.846,1.058,10.021
1.381,1.400,0.880,1.100,10.418
Many sed dialects allow you to specify which instance of a pattern to replace by giving a numeric flag to s///.
sed -e 's/,/./9' -e 's/,/./7' -e 's/,/./5' -e 's/,/./3' -e 's/,/./'
The substitutions run from right to left so that each replacement doesn't shift the positions of the commas still to be replaced.
I seem to recall some sed dialects would allow you to simplify this to
sed 's/,/./1,2'
but this is not supported on my Debian.
Demo: http://ideone.com/6s2lAl
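If the number of commas varies per line, you could also generate the command list in a loop instead of hard-coding it; a minimal sketch, assuming GNU sed and at most 9 commas per line:
cmds=''
for i in 9 7 5 3 1; do cmds+="s/,/./$i;"; done
sed "$cmds" file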
I have a file like this (tens of variables):
PLAY="play"
APPS="/opt/play/apps"
LD_FILER="/data/mysql"
DATA_LOG="/data/log"
I need a script that will output the variables into another file like this (with space between them):
PLAY=${PLAY} APPS=${APPS} LD_FILER=${LD_FILER}
Is it possible ?
I would say:
$ awk -F= '{printf "%s=${%s} ", $1,$1} END {print ""}' file
PLAY=${PLAY} APPS=${APPS} LD_FILER=${LD_FILER} DATA_LOG=${DATA_LOG}
This loops through the file and prints the content before = in a format var=${var} together with a space. At the end, it prints a new line.
Note this leaves a trailing space at the end of the line. If this matters, we can check how to improve it.
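One way to avoid the trailing space is to print the separator before every item except the first; a sketch of the same approach:
$ awk -F= '{printf "%s%s=${%s}", (NR==1 ? "" : " "), $1, $1} END {print ""}' file
PLAY=${PLAY} APPS=${APPS} LD_FILER=${LD_FILER} DATA_LOG=${DATA_LOG}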
< input sed -e 's/\(.*\)=.*/\1=${\1}/' | tr '\n' ' '; echo
sed 's/"\([^"]*"\)"/={\1}/;H;$!d
x;y/\n/ /;s/.//' YourFile
your sample exclude last line so if this is important
sed '/DATA_LOG=/ d
s/"\([^"]*"\)"/={\1}/;H;$!d
x;y/\n/ /;s/.//' YourFile
I'm trying to find and remove strings like:
[1126604244001,85.00], [1122204245002,85.00], [1221104246003,85.00],
[1222204247004,85.00], [1823304248005,85.00], [1424404249006,85.00],
85.00 is constant. I mean [xxxxxxxxxxxxx,85.00],
In notepad++ is simple:
find: "[^........].............,85.00]" and replace:""
I would like to use awk or sed to remove the strings automatically without importing the file into Notepad++.
OK, I have a file
temp.txt
[1126604244001,17.00], [1126604244001,17.00], [1126604244001,17.00],
[1126604244001,85.00], [1122204245002,85.00], [1221104246003,85.00],
[1222204247004,85.00], [1823304248005,85.00], [1424404249006,85.00], [1126604244001,17.00], [1126604244001,17.00],
My desire output
temp.txt
[1126604244001,17.00],[1126604244001,17.00],[1126604244001,17.00],[1126604244001,17.00],[1126604244001,17.00],
Thx in advance!
With sed, simply:
sed 's/\[[^]]*,85.00\],[[:space:]]*//g' filename
With this, everything that matches the regex \[[^]]*,85.00\],[[:space:]]* is removed. The regex matches [ followed by an arbitrary number of characters that are not ], followed by ,85.00], and optionally spaces; the only syntactically tricky bit is the [^]] character set which matches all characters other than ].
Alternatively with awk:
awk -v RS='],' -v ORS='],' '!/,85.00$/' filename
This splits the input into records delimited by ], and prints only those that don't end with ,85.00.
egrep -v '[^0-9]85\.00]' YourFile
This removes any (non-empty) line containing your pattern.
I need to grep only the word between the 2nd and 3rd to last /.
This is shown in the extract below; note that the location in the filename is not always the same counting from the front. Any ideas would be helpful.
/home/user/Drive-backup/2010 Backup/2010 Account/Jan/usernameneedtogrep/user.dir/4.txt
Here is a Perl script that does the job:
my $str = q!/home/user/Drive-backup/2010 Backup/2010 Account/Jan/usernameneedtogrep/user.dir/4.txt!;
my $res = (split('/',$str))[-3];
print $res;
output:
usernameneedtogrep
I'd use awk:
awk -F/ '{print $(NF-2)}'
splits on /
NF is the index of the last column, $NF the last column itself and $(NF-2) the 3rd-to-last column.
You might of course first need to filter out lines in your input that are not paths (e.g. using grep and then piping to awk)
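For example, feeding the path from the question to awk on stdin:
$ echo '/home/user/Drive-backup/2010 Backup/2010 Account/Jan/usernameneedtogrep/user.dir/4.txt' | awk -F/ '{print $(NF-2)}'
usernameneedtogrep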
A regular expression like this should do the trick:
/\/([^\/]+)\/[^\/]+\/[^\/]+$/
(the [^\/] character classes stop each component from spanning a slash, and the $ anchors the match at the end of the path, so the capture group is always the 3rd-to-last component)
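If GNU grep is available, the same idea works directly on the command line using a Perl-compatible lookahead; a sketch, where paths.txt is a hypothetical file containing the example path:
$ grep -oP '[^/]+(?=/[^/]+/[^/]+$)' paths.txt
usernameneedtogrep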