Escaping quotes and delimiters in CSV files with Excel - excel

I try to import a CSV file in Excel, using ; as delimiters, but some columns contains
; and/or quotes.
My problem is : I can use double quotes to ignore the delimiters for a specific string, but if there is a double quote inside the string, it ignores delimiters until the first double quote, but not after.
I don't know if it's clear, it's not that easy to explain.
I will try to explain with a example :
Suppose I have this string this is a;test : I use double quotes around the string, to ignore the delimiter => It works.
Now if this string contains delimiters AND double quotes : my trick doesn't work anymore. For example if I have the string this; is" a;test : My added double quotes around the string ignore delimiters for the first part (the delimiter in the part this; is is correctly ignored, but since there is a double quote after, Excel doesn't ignore the next delimiter in the a;test part.
I tried my best to be as clear as possible, I hope you'll understand what is the problem.

When reading in a quoted string in a csv file, Excel will interpret all pairs of double-quotes ("") with single double-quotes(").
so "this; is"" a;test" will be converted to one cell containing this; is" a;test
So replace all double-quotes in your strings with pairs of double quotes.
Excel will reverse this process when exporting as CSV.
Here is some CSV
a,b,c,d,e
"""test1""",""",te"st2,"test,3",test"4,test5
And this is how it looks after importing into Excel:

Import your Excel file in openOffice and export as CSV (column escaped with " unlike Excel csv, utf8, comma against ";").

Related

Groovy replace using Regex

I have varibale which contains raw data as shown below.
I want to replace the comma inside double quotes to nothing/blank.
I used replaceAll(',',''), but all other commas are also getting replaced.
So need regex function to identify pattern like "123,456,775" and then replace here comma into blank.
var = '
2/5/2023,25,"717,990","18,132,406"
2/4/2023,27,"725,674","19,403,116"
2/3/2023,35,"728,501","25,578,008"
1/31/2023,37,"716,580","26,358,186"
2/1/2023,37,"720,466","26,494,010"
1/30/2023,37,"715,685","26,517,878"
2/2/2023,37,"723,545","26,603,765" '
Tried replaceAll, but did not work
If you just want to replace "," with "", you have to escape the quotes this will do:
var.replaceAll(/\",\"/, /\"\"/)
If you want to replace commas inside the number strings, "725,674" with "725674" you will have to use a regex and capture groups, like this:
var.replaceAll(/(\"\d+),(\d+\")/, /$1$2/)
It will change for three groupings, like "18,132,406", you will have to use three capture groups.

Regular expression for splitting a comma with the string

I had to split string data based on Comma.
This is the excel data:-
Please find the excel data
string strCurrentLine="\"Himalayan Salt Body Scrub with Lychee Essential Oil from Majestic Pure, All Natural Scrub to Exfoliate & Moisturize Skin, 12 oz\",SKU_27,\"Tombow Dual Brush Pen Art Markers, Portrait, 6-Pack\",SKU_27,My Shopify Store 1,Valid,NonInventory".
Regex CSVParser = new Regex(",(?=(?:[^\"]\"[^\"]\")(?![^\"]\"))");
string[] lstColumnValues = CSVParser.Split(strCurrentLine);
I have attached the image.The problem is I used the Regex to split the string with comma but i need the ouptut just like SKU_27 because string[0] and string2 contains the forward and backward slash.I need the output string1 and remove the forward and backward slash.
The file seems to be a CVA file. For CVA to be properly formatted, it will use quotes "" to wrap strings that contains comma, such as
id, name, date
1,"Some text, that includes comma", 2020/01/01
Simply split the string by comma, you will get the 2nd column with double quote.
I'm not sure whether you are asking how to remove the double-quotes from lstColumnValues[0] and lstColumnValues[2], or add them to lstColumnValues[1].
To remove the double-quotes, just use Replace:
string myString = lstColumnValues[0].Replace("\"", "");
If you need to add them:
string myString = $"\"{lstColumnValues[1]}\"";

excel trim function is removing spaces in middle of text - this was unexpected (?)

The excel trim function is removing spaces in middle of text - this was unexpected (?)
i.e. I thought that the excel trim was for trimming leading and trailing spaces.
e.g. a cell value of =Trim("Last Obs Resp") becomes a value of "Last Obs Resp"
Sure enough Microsoft documents it this way:
https://support.office.com/en-gb/article/trim-function-410388fa-c5df-49c6-b16c-9e5630b479f9
I am used to the Oracle database trim function which only removes leading and trailing spaces.
https://www.techonthenet.com/oracle/functions/trim.php
Was excel Trim function always this way?
Excel does not have ltrim and rtrim functions..
i.e. I can't do:
=RTRIM(Ltrim("Last Obs Resp"))
I wonder how I achieve the equivalent in Excel when I don't want to remove doubled up spaces in the middle of the string?
This page documents VBA trim function:
https://www.techonthenet.com/excel/formulas/trim.php
Create a UDF that uses VBA's version of Trim which does not touch the inner spaces. Only removing the leading and trailing spaces
Function MyTrim(str As String) As String
MyTrim = Trim(str)
End Function
Then you can call it from the worksheet:
=MyTrim(A1)
If you want a formula to do it:
=MID(LEFT(A1,AGGREGATE(14,6,ROW($XFD$1:INDEX(XFD:XFD,LEN(A1)))/(MID(A1,ROW($XFD$1:INDEX(XFD:XFD,LEN(A1))),1)<>" "),1)),AGGREGATE(15,6,ROW($XFD$1:INDEX(XFD:XFD,LEN(A1)))/(MID(A1,ROW($XFD$1:INDEX(XFD:XFD,LEN(A1))),1)<>" "),1),999)

Nested strings for text(,format) conversion

How do you use Text() with a format that has a string inside it ?
=TEXT(A1,"Comfi+"#0"(JO)";"Comfi-"#0"(JO)")
Tried """ both the inner string :
=TEXT(A1," """Comfi+"""#0"""(JO)""";"""Comfi-"""#0"(JO)""" ")
Same result with &char(34)&
Similar issue here, but I couldn't transpose the solution to my problem : How to create strings containing double quotes in Excel formulas?
Post Solution edit :
Building an almanac/calendar with the following (now fixed)formula :
=CONCATENATE(
TEXT(Format!K25,"d"),
" J+",
Format!S25,
" ",
TEXT(Format!AA25,"""Comfi+""#0""(JO)"";""Comfi-""#0""(JO)"""),
" ",
Format!AI25
)
Giving the following output in each cell :
9
J+70
Comfi+21(JO)
CRG
You've got too many quotation marks inside:
=TEXT(A1,"""Comfi+""#0""(JO)"";""Comfi-""#0""(JO)""")
You were tripling many of the inside quotation marks.
Personally, doubling up double-quotes within a quoted string is something I try to avoid at all costs. You can 'escape' the text into literals with a backslash.
=TEXT(A1,"\C\o\m\f\i+#0\(\J\O\);\C\o\m\f\i-#0\(\J\O\)")
'alternately
="Comfi"&text(a1, "+#0;-#0")&"(JO)"
Not all of those actually need to be escaped; only reserved characters. However, I usually escape them all and let Excel sort them out.

Remove quotes from csv file using opencsv

I am trying to add changes data in a csv file:
This is the sample data:
DATE status code value value2
"2016-01-26","Subscription All","119432660","1315529431362550","0.0080099833517888"
"2016-01-26","Subscription All","119432664","5836995058433524","0.033825584764444"
"2016-01-26","Subscription All","119432664","8287300074499777","0.076913377834744"
"2016-01-26","Subscription All","119432664","14870697739968326","0.0074188355187426"
My code used to format the data:
CSVReader reader = new CSVReader(new FileReader(new File(fileToChange)), CSVParser.DEFAULT_SEPARATOR, CSVParser.NULL_CHARACTER, CSVParser.NULL_CHARACTER, 1)
info "Read all rows at once"
List<String[]> allRows = reader.readAll();
CSVWriter writer = new CSVWriter(new FileWriter(fileToChange), CSVWriter.DEFAULT_SEPARATOR, CSVWriter.NO_QUOTE_CHARACTER)
writer.writeAll(allRows)
writer.close()
The output i get is this, with extra quote added instead of removing it.
""2016-01-26"",""Subscription All"",""119432660"",""1315529431362550"",""0.0080099833517888""
""2016-01-26"",""Subscription All"",""119432664"",""5836995058433524"",""0.033825584764444""
""2016-01-26"",""Subscription All"",""119432664"",""8287300074499777"",""0.076913377834744""
""2016-01-26"",""Subscription All"",""119432664"",""14870697739968326"",""0.0074188355187426""
I want to remove the quotes.
Please can someone help.
Also, is it possible to change the date format to yyyymmdd instead of yyyy-mm-dd?
allRows.each { String[] theLine ->
String newDate = theLine[0].replaceAll('-', '')
String newline = theLine.eachWithIndex { String s, int i -> return i > 0 ? s : newDate}
writer.writeLine(newline)
}
Thanks
When you instantiated your CSVReader you told it to treat no characters as quotes, therefore it read the existing quotes as data and did not remove them.
When you told CSVWriter not to add any quotes it honored your request. However, the input data contained quote characters, and the convention for including quotes inside a string in CSV is to double the quotes. Thus the
string value
ABC"DEF
gets coded in CSV as
"ABC""DEF"
So the result you see is the combination of not removing the quotes on input (you told it not to) and then doubling the quotes on output.
To solve this change the input option from NULL_CHARACTER to DEFAULT_QUOTE_CHARACTER. However be aware that if any of your data actually contains embedded quotes or commas the resulting output will not be valid CSV.
Also I think this might be a valid bug report against OpenCSV. I believe that OpenCSV needs to inform you if it is about to generate invalid CSV when you told it to omit quotes, probably via a runtime exception. Although I suppose they might argue that you chose to work without a net and should accept whatever you get. Personally I go for the "principle of least surprise", which IMHO would be not to double quotes when the output is unquoted.
Because quotation in your CSVReader is set to CSVParser.NULL_CHARACTER " is treated as normal character which is part of read token. This causes your array to contain data in form:
["2016-01-26", "Subscription All", "119432660", "1315529431362550", "0.0080099833517888"]
rather than:
[2016-01-26, Subscription All, 119432660, 1315529431362550, 0.0080099833517888]
So try changing option from CSVParser.NULL_CHARACTER to either
'"'
CSVParser.DEFAULT_QUOTE_CHARACTER (it also stores '"').
CsvToBean csvToBean = new CsvToBeanBuilder(new StringReader(csv))
.withMappingStrategy(strategy)
.withIgnoreLeadingWhiteSpace(true)
.withSeparator(',')
.withIgnoreEmptyLine(true)
.withQuoteChar('\'')
.withQuoteChar('"')
.build();

Resources