Handle comma data in a CSV file in Azure Data Factory

I have a CSV with a value like foo,bar. This must go into a single column, but because the file is comma-delimited the value is split across separate columns.
How do we handle this? I also added double quotes ("foo,bar") and set the escape character to a double quote (") in the source, but the double quotes end up in the final data.
Please suggest how to handle this situation in ADF.

If your source data contains a comma (,) and you also set the column delimiter to a comma (,), then while reading the file ADF will start a new column wherever it finds a comma in the data.
There are two ways to solve this:
You can change the column delimiter for the file to a semicolon (;), pipe (|), tab (\t), etc.; then the comma in the data will no longer be treated as a column separator.
Alternatively, add quotes to the data that contains commas, e.g. "foo,bar", and set the Quote character to double quote (") so that the whole value is read as a single field.
Regarding "I also added double quotes "foo,bar" and put the escape character as double quotes (") in source but I get the data with the double quotes in the final data":
To handle this, set the Escape character to "No escape character" and the Quote character to "No quote character" in the sink dataset.
(Screenshots of the sink dataset settings, sample data, and output are omitted here.)
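ADF's delimited-text parsing follows standard CSV quoting rules, so both fixes above can be sketched with Python's csv module (an illustration only, not ADF itself; the column names are made up):

```python
import csv
import io

# Source file where one value contains a comma: quoting it keeps it in one column.
raw = 'id,value\n1,"foo,bar"\n'
rows = list(csv.reader(io.StringIO(raw), delimiter=',', quotechar='"'))
print(rows[1])  # ['1', 'foo,bar'] -- the comma survives inside a single field

# Alternative: switch the delimiter (e.g. to a pipe) so the comma is never special.
raw_pipe = 'id|value\n1|foo,bar\n'
rows_pipe = list(csv.reader(io.StringIO(raw_pipe), delimiter='|'))
print(rows_pipe[1])  # ['1', 'foo,bar']
```

Either way, foo,bar is read back as one value rather than two columns.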

Related

Parse a CSV file into rows but my cell contains newline character and the fields are not enclosed in double quotes in nodejs

I have a CSV file whose fields are not enclosed in double quotes, but one of my cells has a newline character in it, so how can I find where the next row starts? Earlier I was splitting on \n. I don't want to use any additional packages in Node.js.

How to remove or replace "\" in the output file generated in Synapse (using Copy stage)

I tried converting data in a table to a file and was able to achieve it. But in the generated output file there is a "\" coming in. Can you please help with how to eliminate or remove the "\" from the file?
Note: I have used a Copy stage to move data from an Oracle SQL table to a storage account.
Output:
"\"we16\",\"ACTIVE\",\"we16\",01-JAN-22,\"Tester\""
"\"sb64\",\"ACTIVE\",\"sb64\",01-JAN-22,\"\"Operations"
"\"sb47\",\"ACTIVE\",\"sb47\",01-JAN-22,\"\"Developer"
"\"ud53\",\"ACTIVE\",\"ud53\",01-JAN-22,\"\"Manager"
"\"hk72\",\"ACTIVE\",\"hk72\",01-JAN-22,\"\"Tester"
"\"sk99\",\"ACTIVE\",\"sk99\",01-JAN-22,\"\"Tester"
I have reproduced the above with SQL as the source and Blob storage as the sink in a copy activity.
This is my sample SQL data, which has double quotes in a column.
Sink dataset:
When I set the Escape character to backslash (\) and the Quote character to double quote ("), I got the same result as above.
In the above, the data itself contains double quotes and our Quote character is also a double quote, so the extra double quotes are escaped with the escape character.
Since our table has values with double quotes, we need to set the Quote character to something other than a double quote.
So when I set it to single quote ('), I got the proper result.
Result data with double quotes in the values and single quote as the Quote character:
When you use this dataset in Synapse or ADF, set the Quote character to single quote so that you read the correct data.
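The behavior described above can be reproduced with Python's csv module (a sketch, not ADF itself; the sample row mirrors the Oracle data shown in the question):

```python
import csv
import io

# A row whose values already contain double quotes, like the Oracle data above.
row = ['"we16"', 'ACTIVE', '"we16"', '01-JAN-22', '"Tester"']

# Quote character " with escape character \ : every embedded quote is
# written as \" -- reproducing the unwanted output shown in the question.
out = io.StringIO()
csv.writer(out, quotechar='"', escapechar='\\', doublequote=False,
           quoting=csv.QUOTE_ALL).writerow(row)
print(out.getvalue())

# Quote character ' instead: the double quotes in the data are ordinary
# characters and pass through untouched.
out2 = io.StringIO()
csv.writer(out2, quotechar="'", quoting=csv.QUOTE_MINIMAL).writerow(row)
print(out2.getvalue())
```

The second write emits "we16",ACTIVE,"we16",01-JAN-22,"Tester" with no backslashes, which is the clean result the answer arrives at.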

Want to Ignore Comma(,) in ADF as delimiter

I have a tab-delimited text file which I'm converting into CSV in ADF.
In it, some fields have comma-separated values (which I don't want to split into columns); I want to treat each as a single column.
But ADF is trying to separate those values into multiple columns.
Although I have set the delimiter to tab (\t), ADF is treating both comma and tab as delimiters.
Why?
As you can see in the example above, I want to split on the \t delimiter, but ADF is treating both tab and comma as delimiters (I want to treat [pp,aa] as a single value/column).
Any solution? Please let me know.
Thank you.
Attaching the ADF configuration as well.
Set the Quote character to a single character instead of "No quote character" in the source and sink datasets.
I tried this with a sample and was able to get the data as expected.
(Screenshots of the source data, source dataset, sink dataset, and output are omitted here.)
Reference - Dataset Properties:
quoteChar: The single character used to quote column values if they contain the column delimiter. The default value is double quote (").
escapeChar: The single character used to escape quotes inside a quoted value. The default value is backslash (\).
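The effect of setting a quote character on a tab-delimited file can be sketched with Python's csv module (an illustration only; the sample columns c1/c2 are made up):

```python
import csv
import io

# Tab-delimited source where one field contains commas; with a quote
# character set, [pp,aa] stays in a single column.
raw = 'c1\tc2\n"pp,aa"\tzz\n'
rows = list(csv.reader(io.StringIO(raw), delimiter='\t', quotechar='"'))
print(rows[1])  # ['pp,aa', 'zz']
```

With no quote character at all, a downstream CSV consumer has no way to tell the comma inside pp,aa from a real delimiter, which is why ADF needs one set.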

Azure Data Factory Escape Characters In String Values

I've got a pipeline built in Azure Data Factory that's pulling data from SQL and storing it as CSV. My dataset for the CSV has the Escape character set to backslash (\) and the Quote character set to double quote ("). The problem this causes is when I have a string that is just a backslash, which unfortunately occurs in my data quite a bit, across dozens of tables with many columns.
When this happens, ADF outputs the value as "\", which other systems then can't interpret when loading: the second quote is ignored because it is being escaped, so it becomes just a single quote and throws off the columns. I have tried other escape characters such as ^, but I also have ^ in many values across many table columns, so the same issue can happen.
It seems that whenever an escape character in the data sits at the end of a value (or is the only value), it causes an error. How can I handle this without having to address it on the source side? Why doesn't Data Factory convert escape characters to a double backslash when they appear in data? I would expect a string that's "\" to be written as "\\", but that isn't happening. Am I missing something?
Thanks for the help!
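The failure mode described here can be sketched on the reading side with Python's csv module (an illustration of the parsing problem, not of ADF's writer):

```python
import csv
import io

# What ADF writes for a value that is just a backslash:  "\"
# On read, the backslash escapes the closing quote, the field never
# terminates, and the following column is swallowed into it.
broken = '"\\",next\n'
row = next(csv.reader(io.StringIO(broken), quotechar='"',
                      escapechar='\\', doublequote=False))
print(len(row))  # 1 -- two columns collapsed into one mangled field

# What the question expects instead: the backslash itself doubled ("\\"),
# which round-trips cleanly back to a single backslash.
fixed = '"\\\\",next\n'
row2 = next(csv.reader(io.StringIO(fixed), quotechar='"',
                       escapechar='\\', doublequote=False))
print(row2)  # ['\\', 'next']
```

This is why an escape character that is never doubled on output breaks any value ending in that character.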

Azure Data Factory v2 pipeline double quotes

My source file has nvarchar and numeric columns.
The numeric column has a thousands separator, so the value comes wrapped in double quotes to identify it. When I use quoteChar ("\"") in the file format, the numeric values work fine.
At the same time, the nvarchar column (Name) has multiple double quotes inside the data; if I use quoteChar, those values are split into further columns based on the number of double quotes.
Is there any fix/solution for this?
Based on the properties of the Text format, only one character is allowed, so you can't treat the different data types with different quoting.
You could try a pipe (|) column delimiter if your nvarchar data doesn't contain the | character. Or you may have to parse your source file in an Azure Function activity to remove the thousands separator before the transfer; then the copy activity in ADF could identify the value.
The ADF parser fails while reading text that is enclosed in double quotes and also has a comma within it, like "Hello, World". To make it work, set both the Escape character and Quote character properties to double quote. This will save the whole text, with its double quotes, into the destination.
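Setting the escape character equal to the quote character means embedded quotes are written doubled (""), which is the standard CSV convention. A sketch with Python's csv module (an illustration only; the sample columns are made up):

```python
import csv
import io

# Quote character and escape character both " : an embedded quote appears
# doubled (""), and the comma inside quotes stays in one field.
raw = 'col1,col2\n"Hello, World","She said ""hi"""\n'
rows = list(csv.reader(io.StringIO(raw), quotechar='"', doublequote=True))
print(rows[1])  # ['Hello, World', 'She said "hi"']
```

Both the comma in "Hello, World" and the quotes around "hi" survive the round trip as data.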
