CSV File with values having single quote within quote text qualifier - excel

I am trying to parse a CSV file that uses a single quote as the text qualifier. The problem is that some of the qualified values themselves contain a single quote,
e.g.:
'Fri, 24 Feb 2017 17:44:57 +0700','th01ham000tthxs','/','','Writer's Tools Data','7.1.0.0',
I am struggling to parse the file because, after this row, all of the remaining rows are displaced.
I tried OpenCSV and univocity-parsers but had no luck.
If I open the above row in Excel and set the text qualifier to a single quote, it gives the correct result without any displacement of rows.

If you are using Java, the JRecord library should handle the file.
How it works: if a field starts with a quote (e.g. ,'), it looks for the closing quote as an odd number of quotes followed by either a comma or an end-of-line marker (i.e. ', or ''', or ''''', or a quote at the end of the line). This approach breaks down if:
The embedded quote is the last character in a field, i.e. 'Field with quote '',
There is white space between the quote and the comma, i.e. 'Field' , or , '
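The heuristic described above can be sketched in a few lines. This is only an illustration of the idea, not JRecord's actual code; the function name split_quoted_line and the rule that a doubled quote inside a field collapses to one quote are assumptions made for the sketch.

```python
def split_quoted_line(line, quote="'", sep=","):
    """Split one line, treating an odd run of quotes followed by a
    separator or end of line as the closing qualifier (hypothetical helper)."""
    fields = []
    i, n = 0, len(line)
    while True:
        if i < n and line[i] == quote:
            buf = []
            j = i + 1
            while j < n:
                if line[j] != quote:
                    buf.append(line[j])
                    j += 1
                    continue
                # Measure the run of consecutive quote characters.
                k = j
                while k < n and line[k] == quote:
                    k += 1
                run = k - j
                at_boundary = k == n or line[k] == sep
                if at_boundary and run % 2 == 1:
                    # Odd run at a boundary: the last quote closes the field,
                    # the remaining pairs are escaped quotes.
                    buf.append(quote * (run // 2))
                    j = k
                    break
                # Otherwise the quotes are data: pairs collapse, a lone quote stays.
                buf.append(quote * ((run + 1) // 2))
                j = k
            fields.append("".join(buf))
            i = j
        else:
            # Unquoted field: read up to the next separator.
            j = line.find(sep, i)
            if j == -1:
                j = n
            fields.append(line[i:j])
            i = j
        if i >= n:
            break
        i += 1  # skip the separator
    return fields


row = "'Fri, 24 Feb 2017 17:44:57 +0700','th01ham000tthxs','/','','Writer's Tools Data','7.1.0.0',"
print(split_quoted_line(row))
```

The trailing comma in the sample row produces an empty last field. The two failure cases listed above still apply: a field ending in an embedded quote ('Field with quote '',) is read as an escaped quote and runs on into the next field.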
Also, when editing the file in ReCsvEditor, if you select Generate >>> Java Code >>> ... it will generate Java/JRecord code to read the file.
Disclaimer: I am the author of JRecord / ReCsvEditor. Also, the ReCsvEditor Generate function is new and needs more work.

Try configuring univocity-parsers to handle the unescaped quote according to your scenario. 'Writer's Tools Data' has an unescaped quote. From your input, I can see you want to use STOP_AT_CLOSING_QUOTE as the strategy to work around these values.
Add this line to your code and it should work fine:
parserSettings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_CLOSING_QUOTE);
Hope this helps.

Related

Remove Leading Spaces in Excel Address List

I have numerous files where the address field is a single line of text, for the most part separated by commas. My first step is to use the 'Replace' function in Excel to replace each comma with a carriage return. This turns an address from a single line into multiple lines.
The issue I'm looking for help with is that, after completing the steps above, a leading space often remains in every row from the second row onwards. I would like to know the best way to remove the leading spaces in these rows while keeping the multi-line address format.
I have tried using TRIM, but this returns the address to a single line.
To show the pre- and post-transformed data I've added an image below, as I can't seem to get the format to show correctly in this post. Because my profile is new I also can't embed the image, so there is a link below showing the pre- and post-transformed data and the leading-space issue I'm seeking help with.
Pre and Post Example
Thanks,
Steve
As @Anonymous mentioned in a comment, replace the comma and the space after it in one step with the SUBSTITUTE() formula, then apply the Wrap Text format to the resulting cell.
=SUBSTITUTE(A2,", ",CHAR(10))
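For anyone doing the same clean-up outside Excel, here is a rough Python equivalent of the formula above. It is only a sketch: the function name address_to_lines and the sample address are made up, and unlike SUBSTITUTE(A2,", ",CHAR(10)) it also handles commas followed by no space or by several spaces.

```python
import re


def address_to_lines(address):
    """Turn a comma-separated address into one line per part (hypothetical helper)."""
    # Replace each comma plus any whitespace after it with a line feed,
    # which is what CHAR(10) produces in Excel.
    return re.sub(r",\s*", "\n", address.strip())


# Made-up sample address, for illustration only.
print(address_to_lines("12 High Street, Springfield,AB1 2CD"))
```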

how to deal with trailing blanks after importing from excel to sas

I can find several topics and solutions on importing from excel to SAS and how to deal with variable names containing blanks or spaces.
However, in my situation, some of the variable values contain spaces at the end, and after importing I can see the trailing blanks, but compress does not remove them.
I'm thinking they're some other type of character. I've tried some modifiers on the compress function, but cannot seem to make it recognize these spaces.
Because I'm often creating different excel files, I would prefer not having to remove the blanks manually. Is there an option to the proc import step I should add, or is there a modifier I can provide to the compress function to solve this?
I'm using the following basic code to import:
proc import out = METADATA
datafile = "&mdata\meta_data.xlsx"
DBMS = Excel replace;
SHEET = "Sheet1";
GETNAMES = YES;
run;
EDIT (after implementing instructions from comments):
I don't really know what my component of SAS is called; I started working with SAS recently.
I'm using some kind of editor, with a VIEWTABLE window. When looking at my dataset this way, I can select (as in highlight) the variable values. One of my values has a trailing whitespace - I can highlight a finite space beyond the string, which I can't for the other variables. And I know the space is there because I have put it there in excel as well.
The length of my variable is 8, and setting the format to $HEX128 shows:
DOSE 444F534520202020
DOSE2 444F534532A02020
DOSE2 contains the blank space so it's actually 'DOSE2 ' in excel and in the VIEWTABLE.
When converting from string to hex I believe '2' is converted to 32.
That means the whitespace is converted to 'A0' instead of '20'.
Just as a reference for other people searching on these keywords or this topic:
After importing from Excel where your values contain spaces, you might end up with a special kind of whitespace: non-breaking spaces.
You can check by setting the format to $HEX128.: a non-breaking space shows up as A0 instead of 20, which is used for a regular space.
If you want to remove these, you can use var = compress(var, 'A0'x);
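The same check can be reproduced outside SAS. The sketch below decodes the hex dump shown in the question and strips the non-breaking space; it assumes a single-byte encoding such as Latin-1, where byte A0 is the non-breaking space, and the helper name remove_nbsp is made up for the example.

```python
NBSP = "\u00a0"  # non-breaking space, byte A0 in Latin-1


def remove_nbsp(value):
    """Drop non-breaking spaces, then trim the regular padding spaces."""
    return value.replace(NBSP, "").rstrip(" ")


# Hex dump of the DOSE2 value from the question (length 8, padded with 20).
raw = bytes.fromhex("444F534532A02020")
text = raw.decode("latin-1")  # assumption: single-byte Latin-1 data

print(repr(text))               # shows the \xa0 before the padding
print(repr(remove_nbsp(text)))
```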

How do you escape commas in uploaded CSV's for Dialogflow?

I have what I thought was a simple problem: I'm not able to upload a CSV into Dialogflow's Knowledge because of the following error:
CSV documents must have exactly two columns. The provided document has 3 columns.
I realised quickly that for whatever reason Dialogflow didn't like the way I was escaping commas in each column. Consider the following example:
This is column 1\,line 1,This is column 2 line 1
It validates via CSV Lint, so it should work, but it doesn't. I've also tried escaping commas with double quotes, but I still get the error.
Any ideas appreciated?!
To escape the commas in a row of CSV, put the string inside double quotes.
So the correct way is:
"This is column 1,line 1","This is column 2 line 1"
Thanks to @sid8491 for answering. The solution for Dialogflow is to wrap every field of every row in your CSV in double quotes, even if only one column contains a comma. So the example above is correct:
"This is column 1,line 1","This is column 2 line 1"
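If the file is generated by a script, quoting every field can be left to the CSV writer. A minimal sketch using Python's standard csv module follows; the helper name to_quoted_csv is made up, and whether Dialogflow has further requirements beyond the quoting is not covered here.

```python
import csv
import io


def to_quoted_csv(rows):
    """Serialise rows with every field wrapped in double quotes."""
    buf = io.StringIO()
    # QUOTE_ALL quotes every field, not only those containing commas.
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


print(to_quoted_csv([["This is column 1,line 1", "This is column 2 line 1"]]))
```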

#Name error in excel sheet

I have a CSV file which is filled automatically by a Java program. One line contains the following text when I open the file in Notepad++:
-LRB- from the PMI Practice Standard for Work Breakdown Structures , Oct 2000 -RRB- '',"no","f1_FRAG:1.0","f2_specialChar:1.0","f3:15.0","f4:7.0","f5:0.0","f6:2.0","f7:0.0","f8:3.7612001156935624","f9:7.0","f10:1.0","f11:1.0","f12:0.0","f13:0.0","f14:0.0,"f15_ROOT:1.0","f16_specialChar:1.0","f17_NOTHING:1.0","f18_IN:1.0""
But when I open it in excel sheet, there are two problems:
1) When I click on the cell, I see a #NAME error, and any click on the page causes an error. I can't even close the Excel window normally. I also sometimes see something like =A228 or =B223 when I click on the cell. It seems to be read as a formula, but it actually isn't one.
2) The row is not shown completely. I can't see this part when I open the file in Excel:
",f15_ROOT:1.0","f16_specialChar:1.0","f17_NOTHING:1.0","f18_IN:1.0"".
Any help is appreciated.
Since the row starts with a - (minus sign), Excel is expecting a formula.
Manually, you could either:
add an ' (apostrophe) at the beginning of the line (which tells Excel that the cell contains text), or
format the cell as text: Right-click the cell → Format Cells → Number tab → Text
Ideally, to prevent this issue in the future, the Java program which generates the .CSV file should be changed to enclose text fields in " double quotation marks.
Oddly, that is the only field in your example that isn't surrounded by double quotes.
"-LRB- from the PMI Practice Standard for Work Breakdown Structures , Oct 2000 -RRB- ''","no","f1_FRAG:1.0","f2_specialChar:1.0","f3:15.0","f4:7.0","f5:0.0","f6:2.0","f7:0.0","f8:3.7612001156935624","f9:7.0","f10:1.0","f11:1.0","f12:0.0","f13:0.0","f14:0.0,"f15_ROOT:1.0","f16_specialChar:1.0","f17_NOTHING:1.0","f18_IN:1.0""
At a minimum, double quotes should be used around any fields that begin with a symbol or contain a comma (like above).
1997,Ford,E350,"Super, luxurious truck"
The double quotes will be recognized and removed by most apps that open CSVs.
Any field may be quoted (that is, enclosed within double-quote characters). Some fields must be quoted, as specified in following rules.
"1997","Ford","E350"
Fields with embedded commas or double-quote characters must be quoted.
1997,Ford,E350,"Super, luxurious truck"
Each of the embedded double-quote characters must be represented by a pair of double-quote characters.
1997,Ford,E350,"Super, ""luxurious"" truck"
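These quoting rules are what most CSV libraries apply by default. The question's generator is written in Java, so the following Python sketch only illustrates the rules themselves using the standard csv module; the helper names write_row and read_row are made up.

```python
import csv
import io


def write_row(fields):
    """Write one row using the default (minimal) quoting rules."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(fields)
    return buf.getvalue().rstrip("\n")


def read_row(line):
    """Parse one CSV line back into its fields."""
    return next(csv.reader([line]))


# Only the field with a comma and embedded double quotes gets quoted,
# and each embedded double quote is doubled.
line = write_row(["1997", "Ford", "E350", 'Super, "luxurious" truck'])
print(line)
print(read_row(line))
```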
More about Comma Separated Value files:
Wikipedia: CSV Files - Basic Rules
RFC 2046 Standard
RFC4180 Standard
Surprisingly, I can't find any reference document from Microsoft that mentions starting text cells with an apostrophe. (I guess it's a secret, so if anyone asks, you didn't hear it from me.) :-)
The reason you are getting the #NAME error specifically is that Excel assumes you're trying to enter a formula (because of the minus sign) but doesn't recognize the name of the function ("LRB").

Pass new line to description field in ics export from sharepoint

I am exporting an ics file from a sharepoint list item using the following format:
http://sharepoint/site/_vti_bin/owssvr.dll?CS=109&Cmd=Display&List=[List GUID]&CacheControl=1&ID=[Item ID]&Using=event.ics
I have a column in my list called description. It gets passed to the generated ics file in the correct place, but any \n sequences seem to get escaped to \\n, which displays as literal text in the calendar appointment.
I have tried many different options but cannot seem to get this working.
\n gets replaced with \\n
\134n gets replaced with \\n
\\n generates correctly but does not work
\012 seems to break the ics file unless it is followed by a whitespace character, but then it gets unfolded and ignored.
I refuse to believe that this is impossible. Any help will be appreciated and any solution will save me days of frustration.
I do not know how the original data is stored, but the iCalendar spec (RFC 5545) says:
The "TEXT" property values may also contain special characters
that are used to signify delimiters, such as a COMMA character for
lists of values or a SEMICOLON character for structured values.
In order to support the inclusion of these special characters in
"TEXT" property values, they MUST be escaped with a BACKSLASH
character. A BACKSLASH character in a "TEXT" property value MUST
be escaped with another BACKSLASH character. A COMMA character in
a "TEXT" property value MUST be escaped with a BACKSLASH
character. A SEMICOLON character in a "TEXT" property value MUST
be escaped with a BACKSLASH character. However, a COLON character
in a "TEXT" property value SHALL NOT be escaped with a BACKSLASH
character.
Example: A multiple line value of:
Project XYZ Final Review
Conference Room - 3B
Come Prepared.
would be represented as:
Project XYZ Final Review\nConference Room - 3B\nCome Prepared.
So for your export to work you should leave \n as is, replace CRLF with \n, and also hope your user has a standards-compliant calendar tool.
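As an illustration of the escaping rules quoted above, here is a small Python sketch for building a TEXT value from plain text. The function name escape_ics_text is made up, it does not address how SharePoint itself generates the file, and it leaves out the line folding that RFC 5545 also requires for long lines.

```python
def escape_ics_text(value):
    """Escape plain text for an iCalendar TEXT property value (sketch)."""
    # Backslashes first, so the escapes added below are not doubled.
    value = value.replace("\\", "\\\\")
    value = value.replace(";", "\\;").replace(",", "\\,")
    # Real line breaks become the two-character sequence backslash + n.
    value = value.replace("\r\n", "\\n").replace("\n", "\\n").replace("\r", "\\n")
    return value


description = "Project XYZ Final Review\r\nConference Room - 3B\r\nCome Prepared."
print("DESCRIPTION:" + escape_ics_text(description))
```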

Resources