Replace comma between two characters - linux

I have a txt file and I need to replace commas with spaces, but only between quotation marks.
For example:
This,is,example,"need,delete comma",xxxx
And the result should be:
This,is,example,"need delete comma",xxxx
I have this command, but it's wrong:
sed -i '/^"/,/^"/s/.,/ /' output.txt

Try this:
awk 'NR%2-1{gsub(/,/," ")}1' RS=\" ORS=\" input.txt > output.txt
With RS set to the double quote, the even-numbered records are the parts inside the quotes, so the commas are replaced only there.
Input:
This,is,example,"need,delete comma",xxxx
Output:
This,is,example,"need delete comma",xxxx
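With GNU awk (gawk 4.0 or later) you can also split the line into fields directly with FPAT, which treats a quoted string as a single field. A minimal sketch of that approach, assuming no empty fields and no escaped quotes inside the quoted parts:
awk -v FPAT='([^,]+)|("[^"]+")' -v OFS=',' '{for(i=1;i<=NF;i++) if($i ~ /^"/) gsub(/,/," ",$i); print}' input.txt > output.txt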

Complex awk solution:
Sample testfile:
This,is,example,"need,delete comma",xxxx
asda
asd.dasd,asd"sdf,dd","sdf,sdfsdf"
"some,text,here" another text there""
The job:
awk -F'"' '$0~/"/ && NF>1{ for(i=1;i<=NF;i++) { if(!(i%2)) gsub(/,/," ",$i) }}1' OFS='"' testfile
Splitting on the double quote makes the even-numbered fields the quoted ones, so commas are replaced only in those fields.
The output:
This,is,example,"need delete comma",xxxx
asda
asd.dasd,asd"sdf dd","sdf sdfsdf"
"some text here" another text there""

How about the following awk: it matches the text between "...", replaces all commas inside that match with spaces, and then substitutes the modified value back into the line.
awk '{match($0,/\".*\"/);val=substr($0,RSTART,RLENGTH);gsub(/,/," ",val);gsub(/\".*\"/,val,$0)} 1' Input_file
EDIT1: After seeing RomanPerekhrest's Input_file, a small change to the above code (switching to GNU awk's gensub so the character before each comma is kept) leaves the "," that separates two quoted fields alone:
awk '{match($0,/\".*\"/);val=substr($0,RSTART,RLENGTH);val=gensub(/([^",]),/,"\\1 ","g",val);gsub(/\".*\"/,val,$0)} 1' Input_file

echo 'This,is,example,"need,delete comma",xxxx' |awk -F\" '{sub(/,/," ",$2); print}' OFS=\"
This,is,example,"need delete comma",xxxx
(Use gsub instead of sub if the quoted field can contain more than one comma.)

Related

Insert filename as column, separated by a comma

I have 100 files that look like this
>file.csv
gene1,55
gene2,23
gene3,33
I want to insert the filename and make it look like this:
file.csv
gene1,55,file.csv
gene2,23,file.csv
gene3,33,file.csv
Now, I can almost get there using awk
awk '{print $0,FILENAME}' *.csv > concatenated_files.csv
But this prints the filenames with a space, instead of a comma. Is there a way to replace the space with a comma?
Is there a way to replace the space with a comma?
Yes, change the OFS
$ awk -v OFS="," '{print $0,FILENAME}' file.csv
gene1,55,file.csv
gene2,23,file.csv
gene3,33,file.csv
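The same OFS fix works for the original multi-file command:
awk -v OFS="," '{print $0,FILENAME}' *.csv > concatenated_files.csv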
Figured it out, turns out:
for d in *.csv; do awk '{print FILENAME (NF?",":"") $0}' "$d" > "${d}.all_files.csv"; done
Works just fine.
You can also create a new field: incrementing NF and assigning to it appends the filename as a new last column.
awk -vOFS=, '{$++NF=FILENAME}1' file.csv
gene1,55,file.csv
gene2,23,file.csv
gene3,33,file.csv

How to get 1st field of a file only when 2nd field matches a string?

How to get 1st field of a file only when 2nd field matches a given string?
#cat temp.txt
Ankit pass
amit pass
aman fail
abhay pass
asha fail
ashu fail
cat temp.txt | awk -F"\t" '$2 == "fail" { print $1 }'*
gives no output
Another syntax with awk:
awk '$2 ~ /^fail$/{print $1}' input_file
This also drops the unnecessary cat command.
^ anchors the start of the field and $ anchors the end, so the pattern matches the whole field exactly rather than any substring.
Either:
Your fields are not tab-separated, or
You have blanks at the end of the relevant lines, or
You have DOS line-endings and so there are CRs at the end of every line, and therefore also at the end of every $2 (see Why does my tool output overwrite itself and how do I fix it?).
With GNU cat you can run cat -Tev temp.txt to see tabs (^I), CRs (^M) and line endings ($).
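If DOS line endings turn out to be the problem, a minimal sketch (any POSIX awk) that strips the trailing CR before comparing:
awk -F'\t' '{sub(/\r$/,"")} $2=="fail"{print $1}' temp.txt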
Your code seems to work fine when I remove the * at the end
cat temp.txt | awk -F"\t" '$2 == "fail" { print $1 }'
The other thing to check is if your file is using tab or spaces. My copy/paste of your data file copied spaces, so I needed this line:
cat temp.txt | awk '$2 == "fail" { print $1 }'
The other way of doing this is with grep:
cat temp.txt | grep fail$ | awk '{ print $1 }'

removing bracket value from first and adding some data to second column

I want to remove the bracketed value from the first column and append a field as the last column.
Sample Input
EXAMPLE(abc#gmail.com),60,6
EXAMPLE(bcd#gmail.com),30,6
EXAMPLE1(sample#gmail.com),60,3
Sample Output
EXAMPLE,60,6.ABC
EXAMPLE,30,6,ABC
EXAMPLE1,60,3,ABC
Below is the code I tried, but with no luck:
for file_name in tmp/*.csv
do
sed -i 's/$/,"AB"/' "$file_name"| awk '{sub(/[(].*[)]/,"")}1' $file_name > tmp.csv && mv tmp.csv $file_name
done
When I try it with a single file it works, but inside the loop it does not:
sed -i 's/$/,"AB"/' abc.csv| awk '{sub(/[(].*[)]/,"")}1' abc.csv > tmp.csv
Perhaps this will do:
awk '{sub(/[(].*[)]/,""); print $0",ABC"}' file | awk '{sub(/60,6,/,"60,6.")}1'
EXAMPLE,60,6.ABC
EXAMPLE,30,6,ABC
EXAMPLE1,60,3,ABC
Try this
/tmp> cat jyoti.txt
EXAMPLE(abc#gmail.com),60,6
EXAMPLE(bcd#gmail.com),30,6
EXAMPLE1(sample#gmail.com),60,3
/tmp> perl -lne 's/\(.*\)//; print "$_,ABC" ' jyoti.txt
EXAMPLE,60,6,ABC
EXAMPLE,30,6,ABC
EXAMPLE1,60,3,ABC
/tmp>
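A single sed substitution can also do both steps in one pass, and with -i it works inside the loop from the question (a sketch, assuming GNU sed for -i and that the text inside the parentheses never contains a closing parenthesis):
sed 's/([^)]*)//; s/$/,ABC/' abc.csv
for file_name in tmp/*.csv; do sed -i 's/([^)]*)//; s/$/,ABC/' "$file_name"; done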

How can I show only some words in a line using sed?

I'm trying to use sed to show only the 1st, 2nd, and 8th word in a line.
The problem I have is that the words are random, and the number of spaces between the words is also random... For example:
QST334 FFR67 HHYT 87UYU HYHL 9876S NJI QD112 989OPI
Is there a way to get this to output as just the 1st, 2nd, and 8th words:
QST334 FFR67 QD112
Thanks for any advice or hints for the right direction!
Use awk
awk '{print $1,$2,$8}' file
In action:
$ echo "QST334 FFR67 HHYT 87UYU HYHL 9876S NJI QD112 989OPI" | awk '{print $1,$2,$8}'
QST334 FFR67 QD112
You do not really need to concatenate the columns with a literal " " string; by default awk uses a single white space as the output field separator (OFS), so you just need commas between the desired columns.
So the following is enough:
awk '{print $1,$2,$8}' file
For Example:
echo "QST334 FFR67 HHYT 87UYU HYHL 9876S NJI QD112 989OPI" |awk '{print $1,$2,$8}'
QST334 FFR67 QD112
However, if you wish to have some other OFS then you can do as follow:
echo "QST334 FFR67 HHYT 87UYU HYHL 9876S NJI QD112 989OPI" |awk -v OFS="," '{print $1,$2,$8}'
QST334,FFR67,QD112
Note that this will put a comma between the output columns.
Another solution is to use the cut command:
cut --delimiter '<delimiter-character>' --fields <fields> <file>
Where:
'<delimiter-character>': the delimiter on which the line is split.
<fields>: which columns to output; this can be a single column (1), several columns (1,2,8) or a range (1-3).
In action:
cut -d ' ' -f 1,2,8 /path/to/file
Note that cut treats every single space as a separator, so this only works when the words are separated by exactly one space; with a random number of spaces, awk is the safer choice.
This might work for you (GNU sed):
sed 's/\s\+/\n/g;s/.*/echo "&"|sed -n "1p;2p;8p"/e;y/\n/ /' file
Convert spaces to newlines. Evaluate each line as a separate file and print only the required lines i.e. fields. Replace remaining newlines with spaces.
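If it has to be sed without the e flag, here is a sketch using extended regular expressions (assuming GNU or BSD sed with -E, words separated by spaces, and at least 8 words per line): squeeze runs of spaces to one, keep the first two words, skip the next five, keep the eighth and drop the rest.
sed -E 's/ +/ /g; s/^(([^ ]+ ){2})([^ ]+ ){5}([^ ]+).*/\1\4/' file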

awk or sed to change column value in a file

I have a csv file with data as follows
16:47:07,3,r-4-VM,230000000.,0.466028518635,131072,0,0,0,60,0
16:47:11,3,r-4-VM,250000000.,0.50822578824,131072,0,0,0,0,0
16:47:14,3,r-4-VM,240000000.,0.488406067907,131072,0,0,32768,0,0
16:47:17,3,r-4-VM,230000000.,0.467893525702,131072,0,0,0,0,0
I would like to shorten the value in the 5th column to two decimal places.
Desired output
16:47:07,3,r-4-VM,230000000.,0.46,131072,0,0,0,60,0
16:47:11,3,r-4-VM,250000000.,0.50,131072,0,0,0,0,0
16:47:14,3,r-4-VM,240000000.,0.48,131072,0,0,32768,0,0
16:47:17,3,r-4-VM,230000000.,0.46,131072,0,0,0,0,0
Your help is highly appreciated
awk '{$5=sprintf( "%.2g", $5)} 1' OFS=, FS=, input
This will round and print .47 instead of .46 on the first line, but perhaps that is desirable.
Try with this:
cat filename | sed 's/\(^.*\)\(0\.[0-9][0-9]\)[0-9]*\(,.*\)/\1\2\3/g'
The output goes to standard output, so
cat filename | sed 's/\(^.*\)\(0\.[0-9][0-9]\)[0-9]*\(,.*\)/\1\2\3/g' > out_filename
will send the desired result to out_filename.
If rounding is not desired, i.e. 0.466028518635 needs to be printed as 0.46, use:
cat <input> | awk -F, '{$5=sprintf( "%.4s", $5)} 1' OFS=,
(This is another example of a useless use of cat; awk can read the file directly.)
If you want it in Perl, this is it:
perl -F, -lane '$F[4]=~s/^(\d+\...).*/$1/g;print join ",",@F' your_file
tested below:
> cat temp
16:47:07,3,r-4-VM,230000000.,0.466028518635,131072,0,0,0,60,0
16:47:11,3,r-4-VM,250000000.,10.50822578824,131072,0,0,0,0,0
16:47:14,3,r-4-VM,240000000.,0.488406067907,131072,0,0,32768,0,0
16:47:17,3,r-4-VM,230000000.,0.467893525702,131072,0,0,0,0,0
> perl -F, -lane '$F[4]=~s/^(\d+\...).*/$1/g;print join ",",@F' temp
16:47:07,3,r-4-VM,230000000.,0.46,131072,0,0,0,60,0
16:47:11,3,r-4-VM,250000000.,10.50,131072,0,0,0,0,0
16:47:14,3,r-4-VM,240000000.,0.48,131072,0,0,32768,0,0
16:47:17,3,r-4-VM,230000000.,0.46,131072,0,0,0,0,0
sed -r 's/^(([^,]+,){4}[^,]{4})[^,]*/\1/' file.csv
This might work for you (GNU sed):
sed -r 's/([^,]{,4})[^,]*/\1/5' file
This truncates the 5th run of non-comma characters, i.e. the 5th field, to at most 4 characters.
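Yet another non-rounding variant is substr in awk; like the %.4s answer above it simply keeps the first four characters of the 5th field, so it assumes the values have the form 0.xx...:
awk 'BEGIN{FS=OFS=","} {$5=substr($5,1,4)} 1' file.csv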
