Want to ignore comma (,) as a delimiter in ADF

I have a tab-delimited text file that I'm converting to CSV in ADF.
Some fields contain comma-separated values. I don't want these split into columns; each should be treated as a single column.
However, ADF splits those values into multiple columns.
Although I have set the delimiter to Tab (\t), ADF still appears to treat both comma and tab as delimiters. Why?
In my example, I want to split on the '\t' delimiter only, but ADF treats both tab and comma as delimiters. I want [pp,aa] treated as a single value/column.
Is there a solution? Please let me know.
Thank you.
I'm attaching the ADF configuration as well.

Set the quote character to a single character instead of "No quote character" in source and sink datasets.
I tried this with sample data and got the expected output.
[Screenshots: source data, source dataset settings, sink dataset settings, and output]
Reference - Dataset Properties
Property | Description
quoteChar | The single character used to quote column values that contain the column delimiter. The default value is double quotes (").
escapeChar | The single character used to escape quotes inside a quoted value. The default value is backslash (\).
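As a rough illustration of why the sink needs a quote character, the sketch below uses Python's standard csv module as a stand-in for ADF's reader and writer. This is an assumption: ADF's own parser may differ in details. The sample data and the tsv_to_csv helper are hypothetical.

```python
import csv
import io


def tsv_to_csv(tsv_text, quotechar='"'):
    """Read tab-delimited text and write comma-delimited text.

    Values that contain a comma are wrapped in `quotechar` on output,
    which is roughly what a quote character on the sink is meant to do.
    """
    # Source side: tab is the only delimiter, quotes are plain characters.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    out = io.StringIO()
    # Sink side: comma-delimited, quoting only where it is needed.
    writer = csv.writer(out, delimiter=",", quotechar=quotechar,
                        quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Hypothetical sample: the second field holds "pp,aa" as one value.
sample = "id\tcodes\tname\n1\tpp,aa\tfoo\n"
converted = tsv_to_csv(sample)

# Reading the result back with the same quote character keeps three columns.
rows = list(csv.reader(io.StringIO(converted)))
print(converted)
print(rows)
```

The comma inside pp,aa was never a delimiter on the source side; it only looks like one in the comma-delimited output when nothing quotes the value.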

Related

Handle comma data in csv file in Azure Data Factory

I have a CSV file containing a value like foo,bar. This must go into a single column, but it is split into separate columns because the file is comma-delimited.
How do we handle this? I also wrapped the value in double quotes ("foo,bar") and set the escape character to double quote (") in the source, but the final data still contains the double quotes.
Please suggest how to handle this situation in ADF.
If your source data contains commas (,) and the column delimiter is also set to comma (,), ADF starts a new column wherever it finds a comma in the data when reading the file.
There are two ways to solve this:
1. Change the file's column delimiter to a character such as semicolon (;), pipe (|), or tab (\t), so that commas in the data are not treated as column boundaries.
2. Wrap values that contain a comma in quotes, e.g. "foo,bar", and set the quote character to double quote (") so that each is read as a single value.
Regarding "I also added double quotes "foo,bar" and put the escape character as double quotes(") in source but I get the data with the double quotes in the final data":
To handle this, set Escape character to "No escape character" and Quote character to "No quote character" in the sink dataset.
[Screenshots: sink dataset settings, sample data, and output]
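The two methods above can be sketched with Python's csv module, used here only as an analogy for how a delimited-text reader behaves (ADF's parser is not guaranteed to match it exactly). The parse helper and the sample lines are hypothetical.

```python
import csv
import io


def parse(text, delimiter, quotechar=None):
    """Parse delimited text; with quotechar=None, quotes are plain characters."""
    if quotechar is None:
        reader = csv.reader(io.StringIO(text), delimiter=delimiter,
                            quoting=csv.QUOTE_NONE)
    else:
        reader = csv.reader(io.StringIO(text), delimiter=delimiter,
                            quotechar=quotechar)
    return list(reader)


# The problem: comma in the data and comma as the delimiter.
split_value = parse("1,foo,bar\n", ",")

# Method 1: a different column delimiter keeps the comma inside the value.
pipe_delimited = parse("1|foo,bar\n", "|")

# Method 2: quote the value and tell the reader about the quote character.
quoted = parse('1,"foo,bar"\n', ",", quotechar='"')

# Quotes in the data but no quote character configured on the reader:
# the quotes stay in the values and the comma still splits them.
quotes_ignored = parse('1,"foo,bar"\n', ",")

print(split_value, pipe_delimited, quoted, quotes_ignored)
```

In this sketch, the reader strips the quotes only when it has been told which character is the quote character.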

Copy Activity missing column in the output

I have a copy activity that takes the output of a procedure and writes it to a temporary CSV file. I needed the headers in double quotation marks, so a Data Flow task then takes the temp file and applies "quote all" in the sink settings. Yet the output is not what I expected: the last column is missing in some records because of commas in the data.
Is there a way to use only a copy activity but still have the column names in double quotes?
When you set the column delimiter, Data Factory derives the schema from the first row, based on the number of delimiters. If a data value contains the same character as the column delimiter, some columns will go missing.
For now, this can't be solved in Data Factory. The only workaround is to set a different column delimiter, for example '|'.
[Screenshot: output example]
Also, wrapping the header in double quotes in the output .csv file is not supported in Data Factory.
HTH.
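To see why a delimiter inside a value throws off the columns, a small check like the one below can flag rows whose field count differs from the header. It is a sketch using Python's csv module rather than anything ADF-specific, and the misaligned_rows helper and the sample data are hypothetical.

```python
import csv
import io


def misaligned_rows(text, delimiter=","):
    """Return (line_number, fields) for rows whose width differs from the header."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter,
                        quoting=csv.QUOTE_NONE)
    header = next(reader)
    # The header is line 1, so data rows start at line 2.
    return [(line_no, row)
            for line_no, row in enumerate(reader, start=2)
            if len(row) != len(header)]


# The name "Smith, Bob" contains the column delimiter.
comma_file = "id,name,city\n1,Ann,Oslo\n2,Smith, Bob,Rome\n"
# The same data with '|' as the column delimiter.
pipe_file = "id|name|city\n1|Ann|Oslo\n2|Smith, Bob|Rome\n"

bad_with_comma = misaligned_rows(comma_file, ",")
bad_with_pipe = misaligned_rows(pipe_file, "|")
print(bad_with_comma)
print(bad_with_pipe)
```

With the comma delimiter, the third line has four fields against a three-column header; with the pipe delimiter, every row lines up.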

Azure Data Factory Escape Characters In String Values

I've got a pipeline built in Azure Data Factory that pulls data from SQL and stores it as CSV. My dataset for the CSV has Escape Character set to a backslash (\) and Quote Character set to double quote ("). This causes a problem when a string value consists of just a backslash, which unfortunately occurs quite often in my data, across dozens of tables with many columns.
When this happens, ADF outputs the backslash as "\", which other systems then can't interpret when loading the file. The closing quote is ignored because it is escaped, so it becomes a literal quote character and throws off the columns. I have tried other escape characters such as "^", but "^" also appears in many values across many table columns, so the same issue can occur.
It seems that whenever the escape character appears at the end of a value (or is the only character in it), it causes an error. How can I handle this without having to address it on the source side? Why doesn't Data Factory double the escape character when it appears in data? I would expect a string that's "\" to be written as "\\", but that isn't happening. Am I missing something?
Thanks for the help!
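The failure described in this question can be reproduced on the consuming side with a short sketch. It assumes the downstream loader treats backslash as the escape character and double quote as the quote character, and it uses Python's csv module as that loader, which is an assumption and not the asker's actual system. The sample lines are hypothetical.

```python
import csv


def parse_line(line):
    # Assumed consumer settings: quote character ", escape character \.
    return next(csv.reader([line], quotechar='"', escapechar="\\"))


# What the question says ADF writes for a value that is a single backslash.
# The backslash escapes the closing quote, so the field never closes and
# swallows the rest of the line.
as_written = parse_line(r'1,"\",2')

# What the asker expected: the backslash itself escaped as \\.
as_expected = parse_line(r'1,"\\",2')

print(len(as_written), as_written)
print(len(as_expected), as_expected)
```

The first line yields fewer fields than the three that were intended, while the doubled backslash parses back into three fields with a single backslash in the middle.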

CSV File with values having single quote within quote text qualifier

I am trying to parse a CSV file that uses a single quote as the text qualifier. The problem is that some values enclosed in single quotes themselves contain a single quote,
e.g.:
'Fri, 24 Feb 2017 17:44:57 +0700','th01ham000tthxs','/','','Writer's Tools Data','7.1.0.0',
I am struggling to parse the file because, after this row, all of the remaining rows are displaced.
I tried OpenCSV and univocity-parsers but had no luck.
If I open the above row in Excel and set the text qualifier to single quote, it gives the correct result without displacing any rows.
If you are using Java, the JRecord library should handle the file.
How it works: if a field starts with a quote (e.g. ,'), it specifically looks for ', or ''', or ''''', or a quote at end-of-line, etc. (an odd number of quotes followed by either a comma or an end-of-line marker). This approach breaks down if:
The embedded quote is the last character in a field, i.e. 'Field with quote '',
There is white space between the quote and the comma, i.e. 'Field' , or , '
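The heuristic described above can be sketched in a few lines of Python. This is not JRecord's code, only a minimal reading of the rule "an odd run of quotes followed by a comma or end-of-line closes the field"; the split_line function is hypothetical.

```python
def split_line(line, delimiter=",", quote="'"):
    """Split one line; a quoted field ends at an odd run of quotes
    that is followed by the delimiter or the end of the line."""
    fields = []
    pos, end = 0, len(line)
    while pos <= end:
        if pos < end and line[pos] == quote:
            scan, close = pos + 1, None
            while scan < end:
                if line[scan] != quote:
                    scan += 1
                    continue
                # Measure the run of consecutive quotes starting here.
                run_end = scan
                while run_end < end and line[run_end] == quote:
                    run_end += 1
                odd_run = (run_end - scan) % 2 == 1
                at_boundary = run_end == end or line[run_end] == delimiter
                if odd_run and at_boundary:
                    close = run_end - 1  # index of the closing quote
                    break
                scan = run_end
            if close is None:
                # No closing quote found: keep the rest as one field.
                fields.append(line[pos + 1:])
                break
            # Collapse doubled quotes inside the field to a single quote.
            fields.append(line[pos + 1:close].replace(quote * 2, quote))
            pos = close + 2  # skip the closing quote and the delimiter
        else:
            nxt = line.find(delimiter, pos)
            if nxt == -1:
                fields.append(line[pos:])
                break
            fields.append(line[pos:nxt])
            pos = nxt + 1
    return fields


row = ("'Fri, 24 Feb 2017 17:44:57 +0700','th01ham000tthxs','/','',"
       "'Writer's Tools Data','7.1.0.0',")
parsed = split_line(row)
print(parsed)
```

On the row from the question, the quote in Writer's is not followed by a comma, so it is kept as data; the trailing comma produces a final empty field.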
[Screenshot: the line in ReCsvEditor]
Also, when editing the file in ReCsvEditor, if you select Generate >>> Java Code >>> ... it will generate Java/JRecord code to read the file.
Disclaimer: I am the author of JRecord / ReCsvEditor. Also, the ReCsvEditor Generate function is new and needs more work.
Try configuring univocity-parsers to handle the unescaped quote according to your scenario. 'Writer's Tools Data' has an unescaped quote. From your input, I can see you want to use STOP_AT_CLOSING_QUOTE as the strategy to work around these values.
Add this line to your code and it should work fine:
parserSettings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_CLOSING_QUOTE);
Hope this helps.

Azure Data Factory v2 pipeline double quotes

My source file has nvarchar and numeric columns.
The numeric column has a thousands separator, so its values come wrapped in double quotes. When I set quoteChar ("\"") in the file format, the numeric values work fine.
At the same time, the nvarchar column (Name) has multiple double quotes within the data; if I use quoteChar, its values are split into additional columns based on the number of double quotes.
Is there any fix or solution for this?
According to the Text format properties, only one character is allowed, so you can't handle the different data types with different delimiters.
You could try using | as the column delimiter if your nvarchar data doesn't contain the | character. Or you may have to preprocess your source file in an Azure Function activity to remove the thousands separator before the transfer, so that the copy activity in ADF can read it.
The ADF parser fails when reading text that is enclosed in double quotes and also contains a comma, like "Hello, World". To make it work, set both the Escape Character and Quote Character properties to double quote. This saves the whole text, including the double quotes, to the destination.
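As a sketch of what "escape character equals quote character" means for parsing, Python's csv module has a comparable mode (doublequote=True), in which a doubled quote inside a quoted value stands for one literal quote. The sample line is hypothetical and assumes the embedded quotes in the Name column are doubled in the file; stripping the thousands separator is shown as a separate post-processing step.

```python
import csv

# Hypothetical line: a quoted amount with a thousands separator and a
# quoted name whose embedded double quotes are doubled.
line = '"1,234","John ""JJ"" Smith","Hello, World"'

# Quote character and escape character are both the double quote.
row = next(csv.reader([line], delimiter=",", quotechar='"',
                      doublequote=True))

# Remove the thousands separator after parsing, then convert to a number.
amount = int(row[0].replace(",", ""))

print(row)
print(amount)
```

If the embedded quotes in the source are not doubled, this mode will not help, and a different column delimiter or preprocessing is still needed.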

Resources