Regex: Replace every comma not within quotes with a tab (Excel)

I have a huge data set of entries like these:
(21, 2, '23.5R25 ETADT', 'description, with a comma'),
(22, 1, '26.5R25 ETADT', 'Description without a comma'),
(23, 5, '20.5R20.5', 'Another description with ; semicolumn'),
I'm trying to replace every comma in the list with a tab, excluding the commas within the single quotes and the ending comma of each line.
So the example entries should become:
(21[TAB]2[TAB]'23.5R25 ETADT'[TAB]'description, with a comma'),
(22[TAB]1[TAB]'26.5R25 ETADT'[TAB]'Description without a comma'),
(23[TAB]5[TAB]'20.5R20.5'[TAB]'Another description with ; semicolumn'),
I have about 6,000 rows of data like this. The tabs let me tell Excel to import the elements of each entry into separate columns.
The regex I tried was this: [ ]*,[ ]*
But this regex selects all the commas, even the ones within the single quotes.

It looks as though each of your lines has 4 elements within parentheses, and only the last 2 elements use single quotes. If those assumptions hold, I've tested the following in Notepad++:
"Find what :" ^\(([^,]*),\s*([^,]*),\s*'([^']*)'\s*,\s*
"Replace with :" \(\1\t\2\t'\3'\t
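To check the pattern outside Notepad++, here is a minimal Python sketch that applies the same find/replace pair with the standard re module. It assumes each entry is processed as a separate line; the helper name tabify_line is hypothetical. In Python's replacement syntax a plain ( is already literal, so it is written without the backslash, and \t in the template is turned into a tab.

```python
import re

# Same pattern as the Notepad++ "Find what" field: captures elements 1-3
# and consumes the comma after element 3; element 4 and the ending comma
# are never reached, so they are left alone.
FIND = re.compile(r"^\(([^,]*),\s*([^,]*),\s*'([^']*)'\s*,\s*")
# Python version of the "Replace with" field.
REPLACE = r"(\1\t\2\t'\3'\t"

def tabify_line(line: str) -> str:
    """Hypothetical helper: rewrite one entry, assuming the 4-element layout."""
    return FIND.sub(REPLACE, line, count=1)

rows = [
    "(21, 2, '23.5R25 ETADT', 'description, with a comma'),",
    "(22, 1, '26.5R25 ETADT', 'Description without a comma'),",
    "(23, 5, '20.5R20.5', 'Another description with ; semicolumn'),",
]
for row in rows:
    print(tabify_line(row))
```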
EDIT:
The search regex depends on the 4-column model, with only the last two elements in single quotes. Piece by piece, this is how it works:
^\(: Finds an opening parenthesis
([^,]*): Captures non-comma characters which will be all of element 1
,\s*: Matches a comma and any trailing spaces
([^,]*): Captures non-comma characters which will be all of element 2
,\s*: Matches a comma and any trailing spaces
'([^']*)': Captures the string in single quotes which will be all of element 3
\s*,\s*: Matches a comma and all surrounding spaces
The rest of the line is not matched and stays as it is: the only commas left are the ones inside element 4's quotes and the ending comma, and neither should be replaced.
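If the 4-column model does not hold, a different technique (not the one in the answer above) is to replace a comma only when an even number of single quotes follows it on the line, which means it sits outside any quoted string. A minimal Python sketch, assuming quotes are never escaped inside a value and each line is processed on its own; the names are hypothetical:

```python
import re

# A comma with optional spaces around it (as in the question's [ ]*,[ ]*),
# excluding the ending comma, and followed by an even number of single
# quotes up to the end of the line, i.e. not inside a quoted value.
COMMA_OUTSIDE_QUOTES = re.compile(
    r"[ ]*,(?![ ]*$)[ ]*(?=(?:[^']*'[^']*')*[^']*$)"
)

def tabify_any_width(line: str) -> str:
    """Hypothetical helper: works for any number of elements per entry."""
    return COMMA_OUTSIDE_QUOTES.sub("\t", line)

print(tabify_any_width("(21, 2, '23.5R25 ETADT', 'description, with a comma'),"))
```

The lookahead rescans the rest of the line for each comma, which is cheap for a few thousand short rows.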

Related

Append characters based on the count of a match in Vim

I would like to append - characters at the end of each word match. The number of - characters appended should depend on the length of the match, so that the total number of characters on each line remains constant.
As shown in the example below, the total number of characters should be 6.
e.g.
ab
xyz
abcde
The above text should be replaced to:
ab----
xyz---
abcde-
You can use \= to substitute with an expression; see :h sub-replace-expression.
When the substitute string starts with \=, the remainder is interpreted as an expression.
The submatch() function can be used to obtain matched text. The whole matched text can be accessed with submatch(0). The text matched with the first pair of () with submatch(1). Likewise for further sub-matches in ().
So you can achieve it like this (an empty pattern in :s// reuses the last search pattern, i.e. your word match):
:[range]s//\=submatch(0) . repeat('-', 6-strlen(submatch(0)))/
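For comparison, the same idea (pad each match with - up to a fixed width) in Python, where re.sub accepts a function in place of Vim's \= expression. The :s// command reuses a search pattern the question does not show, so the \w+ pattern and the name pad_matches below are assumptions.

```python
import re

def pad_matches(text: str, width: int = 6, pattern: str = r"\w+") -> str:
    """Append '-' to every match of `pattern` so it is `width` chars long.

    Mirrors submatch(0) . repeat('-', width - strlen(submatch(0))).
    Matches already `width` or longer get no dashes, because multiplying
    a Python string by zero or a negative number gives ''.
    Note: Vim's strlen() counts bytes, while len() here counts characters.
    """
    return re.sub(pattern, lambda m: m.group(0) + "-" * (width - len(m.group(0))), text)

print(pad_matches("ab\nxyz\nabcde"))
```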

How do I remove text using sed?

For instance, let's say I have a text file:
worker1, 0001, company1
worker2, 0002, company2
worker3, 0003, company3
How would I use sed to keep only the first 2 characters of the first column ("wo"), remove the rest of that column, and attach those characters to the second column, so that the output looks like this:
wo0001,company1
wo0002,company2
wo0003,company3
$ sed -E 's/^(..)[^,]*, ([^,]*,) /\1\2/' file
wo0001,company1
wo0002,company2
wo0003,company3
s/ begin substitution
^(..) match the first two characters at the beginning of the line, captured in a group
[^,]* match any number of non-comma characters of the first column
", " match a comma and a space character
([^,]*,) match the second field and its comma, captured in a group (any number of non-comma characters followed by a comma)
" " match the next space character
/\1\2/ replace with the first and second capturing groups
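To try the same expression outside sed, here is a Python sketch; for this particular pattern Python's re syntax is the same as sed -E, and \1\2 refer to the groups in the same way. The function name is hypothetical.

```python
import re

# ^(..) keeps "wo"; [^,]*, skips the rest of column 1 and ", ";
# ([^,]*,) keeps "0001,"; the final space is dropped.
PATTERN = re.compile(r"^(..)[^,]*, ([^,]*,) ")

def merge_prefix(line: str) -> str:
    return PATTERN.sub(r"\1\2", line)

for line in ["worker1, 0001, company1", "worker2, 0002, company2", "worker3, 0003, company3"]:
    print(merge_prefix(line))
```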

Remove extra commas from only 2nd and 3rd row of CSV file

I have a comma delimited file (CSV file) test.csv as shown below.
FHEAD,1,2,3,,,,,,
FDEP,2,3,,,,,,,,
FCLS,3,,,4-5,,,,,,,
FDETL,4,5,6,7,8,
FTAIL,5,67,,,,,,
I want to remove the empty columns only from the 2nd and 3rd rows of the file, i.e. wherever a record starts with FDEP or FCLS. Only in those rows should the empty columns (,,) be removed.
After removing the empty columns, test.csv should look like this:
FHEAD,1,2,3,,,,,,
FDEP,2,3
FCLS,3,4-5
FDETL,4,5,6,7,8,
FTAIL,5,67,,,,,,
How can I do this in Unix?
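Here's one way to do it, using GNU sed (\| alternation in a basic regex is a GNU extension). Add -i to edit test.csv in place instead of printing the result:
sed '/^F\(DEP\|CLS\),/ { s/,\{2,\}/,/g; s/,$// }' test.csv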
We use an address of /^F\(DEP\|CLS\),/, i.e. the following command block will only process lines matching ^F\(DEP\|CLS\),. This regex matches beginning-of-line, followed by F, followed by either DEP or CLS, followed by ,. In other words, we look for lines starting with FDEP, or FCLS,.
Having found such a line, we first substitute (s command) all runs (g flag, match as many times as possible) of 2 or more (\{2,\}) commas (,) in a row by a single ,. This squeezes ,,, down to a single ,.
Second, we substitute , at end-of-string by nothing. This gets rid of any trailing comma.
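The sed script translates directly to Python, which can be handy for testing it on a sample before running it on the whole file. A sketch with a hypothetical function name; re.match anchors at the start of the line just like the ^ in the sed address.

```python
import re

def squeeze_empty_columns(line: str) -> str:
    # Address: only lines starting with FDEP, or FCLS, are changed.
    if re.match(r"F(?:DEP|CLS),", line):
        line = re.sub(r",{2,}", ",", line)  # runs of 2+ commas -> one comma
        line = re.sub(r",$", "", line)      # drop a trailing comma
    return line

rows = ["FHEAD,1,2,3,,,,,,", "FDEP,2,3,,,,,,,,", "FCLS,3,,,4-5,,,,,,,",
        "FDETL,4,5,6,7,8,", "FTAIL,5,67,,,,,,"]
print("\n".join(squeeze_empty_columns(r) for r in rows))
```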

How to Escape Comma within Formula in CSV file

I'm trying to create a CSV file (for Excel or LibreOffice) that contains a formula with multiple arguments separated by commas. The commas are interpreted as field separators. Is it possible to escape them somehow so that formulas requiring commas can be used?
This works as expected:
=Sum(A1:A10)
This doesn't read correctly due to use of comma in formula:
=Confidence.Norm(.01, Stdev.p(A1:A10), 10)
Imagined solutions that don't work:
=Confidence.Norm(.01 \, Stdev.p(A1:A10) \, 10)
=Confidence.Norm(.01 ',' Stdev.p(A1:A10) ',' 10)
=+"Confidence.Norm(.01 \, Stdev.p(A1:A10) \, 10)"
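Enclose the whole field in double quotes; in CSV, commas inside a double-quoted field are not treated as separators: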
"=Confidence.Norm(.01, Stdev.p(A1:A10), 10)"
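When the CSV file is generated by a program, a CSV writer adds those double quotes for you: Python's csv module, with its default QUOTE_MINIMAL setting, quotes any field that contains the delimiter. A minimal sketch; whether the spreadsheet then evaluates the field as a formula is up to the importing application.

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)  # defaults: delimiter ",", quoting=csv.QUOTE_MINIMAL
writer.writerow(["=Sum(A1:A10)", "=Confidence.Norm(.01, Stdev.p(A1:A10), 10)"])
line = buf.getvalue()
print(line)

# Reading the line back yields two fields: the quoted commas did not split it.
fields = next(csv.reader(io.StringIO(line)))
```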

How to empty out a column

I'm trying to empty out multiple columns in Vim (not delete them, but fill them with spaces).
This is my search command:
/\%2c\|\%4c\|\%>5c\%<9c
(columns 2, 4, and 6-8)
How can I empty out these columns in vim?
:%s/\%2c\|\%4c\|\%>5c\%<9c/ /g doesn't work
\%c is a zero-width match: it matches a position, not a character, so the substitution has no character to replace there.
You'll need to match something like:
/\v^(.).(.).(.)...
This captures the values of columns 1, 3, and 5 in groups.
Then you can substitute:
:%s!\v^(.).(.).(.)...!\1 \2 \3   !
...keeping columns 1, 3, and 5 but replacing the rest of the first eight columns with spaces.
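The same capture-and-rebuild approach in Python, as a sketch for checking the replacement width: every column the regex consumes must be written back as either its captured group or one space, so in the replacement below \3 is followed by three spaces for columns 6-8. Lines shorter than eight characters do not match and are left unchanged, as with the Vim command. The function name is hypothetical.

```python
import re

# Columns 1, 3, 5 are captured; columns 2, 4 and 6-8 are matched by bare dots.
PATTERN = re.compile(r"^(.).(.).(.)...")

def blank_columns(line: str) -> str:
    # One space per blanked column keeps every later column in place.
    return PATTERN.sub(r"\1 \2 \3   ", line)

print(repr(blank_columns("abcdefghij")))
```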
