Azure Data Factory Escape Characters In String Values

I've got a pipeline built in Azure Data Factory that pulls data from SQL and stores it as CSV. My dataset for the CSV has the Escape Character set to a backslash (\) and the Quote Character set to a double quote ("). The problem arises when a value is a string containing just a backslash, which unfortunately occurs in my data quite a bit, across dozens of tables with many columns.
When this happens, ADF outputs the value as "\", which other systems then can't interpret when loading: the closing quote is treated as escaped, so it becomes a literal quote inside the field and throws off the columns. I have tried other escape characters such as "^", but "^" also appears in many values across many table columns, so the same issue can occur.
It seems that whenever an escape character appears at the end of a value (or is the only character in the value), it causes an error. How can I handle this without having to fix it on the source side? Why doesn't Data Factory escape the escape character itself when it appears in data? I would expect a value of "\" to be written as "\\", but that isn't happening. Am I missing something?
Thanks for the help!
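To illustrate the failure mode outside ADF, here is a sketch using Python's csv module (its parser is analogous to, not identical to, ADF's delimited-text reader):

```python
import csv
import io

# ADF writes a field whose value is a single backslash as "\" --
# on read, the backslash escapes the closing quote.
bad_line = '"\\",next_col\n'          # raw text on disk: "\",next_col
rows = list(csv.reader(io.StringIO(bad_line),
                       escapechar='\\', doublequote=False))
# The closing quote was consumed as an escaped character, so the
# two columns collapse into one mangled field.
print(len(rows[0]))                   # 1, not 2

# Doubling the backslash ("\\") round-trips correctly.
good_line = '"\\\\",next_col\n'       # raw text on disk: "\\",next_col
rows = list(csv.reader(io.StringIO(good_line),
                       escapechar='\\', doublequote=False))
print(rows[0])                        # ['\\', 'next_col']
```

This is exactly the doubling the asker expected ADF to perform on write.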

Related

Handle comma data in csv file in Azure Data Factory

I have a csv with a value like foo,bar. This must go into a single column, but because the file is comma delimited it gets split into separate columns.
How do we handle this? I also wrapped the value in double quotes ("foo,bar") and set the escape character to a double quote (") in the source, but the double quotes end up in the final data.
Please suggest how to handle this situation in ADF.
If your source file contains data with a comma (,) in it and you also set the Column delimiter to a comma (,), then while reading the file ADF will start a new column wherever it finds a comma in the data.
To solve this there are two methods:
Change the Column delimiter for the file to something like Semicolon (;), Pipe (|), or Tab (\t); then a comma in the data will not be treated as a column separator.
Add quotes ("") around data that contains a comma,
e.g. "foo,bar", and set the Quote character to a double quote (") so ADF treats it as a single value.
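The second method can be sketched with Python's csv module (an illustration, not ADF's actual parser):

```python
import csv
import io

# Quoting a field that contains the delimiter keeps it as one column.
buf = io.StringIO()
csv.writer(buf, quotechar='"', quoting=csv.QUOTE_MINIMAL).writerow(['foo,bar', 'baz'])
print(buf.getvalue())        # "foo,bar",baz

# Reading it back with the same quote character recovers two columns.
rows = list(csv.reader(io.StringIO(buf.getvalue()), quotechar='"'))
print(rows)                  # [['foo,bar', 'baz']]
```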
I also added double quotes "foo,bar" and put the escape character as double quotes(") in source but I get the data with the double quotes in the final data.
To handle this, set the Escape character to "No escape character" and the Quote character to "No quote character" in the Sink dataset.
(Screenshots of the sink dataset settings, sample data, and output omitted.)

How to remove or replace "\" from the output file generated in Synapse (using copy stage)

I tried converting data in a table to a file and was able to do so, but the generated output file has "\" appearing in it. Can you please help with how to eliminate or remove "\" from the file?
Note: I used a copy stage to move data from an Oracle SQL table to a storage account.
Output:
"\"we16\",\"ACTIVE\",\"we16\",01-JAN-22,\"Tester\""
"\"sb64\",\"ACTIVE\",\"sb64\",01-JAN-22,\"\"Operations"
"\"sb47\",\"ACTIVE\",\"sb47\",01-JAN-22,\"\"Developer"
"\"ud53\",\"ACTIVE\",\"ud53\",01-JAN-22,\"\"Manager"
"\"hk72\",\"ACTIVE\",\"hk72\",01-JAN-22,\"\"Tester"
"\"sk99\",\"ACTIVE\",\"sk99\",01-JAN-22,\"\"Tester"
I have reproduced the above with SQL as source and Blob as sink in a copy activity.
This is my sample SQL data, with double quotes in one column. (Screenshots of the sample data and sink dataset omitted.)
When I set the Escape character to Backslash (\) and the Quote character to Double quote ("), I got the same result as above.
Here the data contains double quotes and our Quote character is also a double quote, which is why ADF escapes the extra double quotes with the escape character.
Since the table has values containing double quotes, we need to set the Quote character to something other than a double quote.
When I set it to a single quote ('), I got the proper result. (Result screenshot omitted.)
When you use this dataset in Synapse or ADF, set the Quote character to a single quote so that you read the correct data.
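The effect of switching the quote character can be sketched with Python's csv module (an analogy, not ADF's implementation): with a single quote as the quote character, embedded double quotes pass through as literal data.

```python
import csv
import io

# Field values keep their embedded double quotes when the quote
# character is a single quote.
line = "'\"we16\"','ACTIVE','Tester'\n"
rows = list(csv.reader(io.StringIO(line), quotechar="'"))
print(rows)     # [['"we16"', 'ACTIVE', 'Tester']]
```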

Want to Ignore Comma(,) in ADF as delimiter

I have a tab-delimited text file which I'm converting into csv in ADF.
Some fields contain comma-separated values that I don't want split into columns; each should be treated as a single column.
But ADF is trying to separate those values into multiple columns.
Although I have set the delimiter to Tab (\t), ADF is treating both the comma and the tab as delimiters.
Why?
As you can see in the example above, I want to split on the '\t' delimiter, but ADF considers both tab and comma as delimiters (I want to treat [pp,aa] as a single value/column).
Any solution? Please let me know.
Thank you.
I'm attaching my ADF configuration as well. (Screenshot omitted.)
Set the Quote character to a single character instead of "No quote character" in the source and sink datasets.
I tried this with a sample and was able to get the data as expected.
(Screenshots of the source data, source dataset, sink dataset, and output omitted.)
Reference - Dataset Properties
| Property | Description |
| --- | --- |
| quoteChar | The single character to quote column values if it contains the column delimiter. The default value is double quote ". |
| escapeChar | The single character to escape quotes inside a quoted value. The default value is backslash \. |
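As a quick illustration of these two properties, using Python's csv module as a stand-in for ADF's delimited-text parser:

```python
import csv
import io

# Tab-delimited line where one field contains a comma; with a quote
# character set, "pp,aa" stays a single column.
line = '"pp,aa"\t"bb"\t"cc"\n'
rows = list(csv.reader(io.StringIO(line), delimiter='\t', quotechar='"'))
print(rows)     # [['pp,aa', 'bb', 'cc']]
```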

Error with No Escape Character in Mapping Data Flow in Data Factory

TLDR: Why does Azure Data Factory Data Flow not allow you to have no escape character?
We have bad source data from a source that is unlikely to fix it on their end (this is the nicest way I could phrase this). They have multiple columns where the value is something like 01F\ or 8239\, and their spec states that the backslash is part of the value, not an escape character as it is standardized to be everywhere else.
Overall, the files are comma delimited, each column's contents are wrapped in " ", and all of the newline characters are normal. It's just the backslash that doesn't comply with standards. E.g.
"Column 1","Column 2","Column 3","Column 4"
"John","01F\","34","NY"
"Jane","3K","8239\","CA"
| Column 1 | Column 2 | Column 3 | Column 4 |
| --- | --- | --- | --- |
| "John" | "01F\" | "34" | "NY" |
| "Jane" | "3K" | "8239\" | "CA" |
In Azure Data Factory we are trying to see if we can make it ignore the \ as an escape character. (FYI, when we leave it treated as an escape character, the column right after the one containing the backslash gets pulled into the same column.) We can see where to set "No escape character" in the dataset.
However, when we add that dataset to our data flow and attempt to preview the data, we get an error that we can't have no escape character in the data flow, and that the quote character should be "No quote character" when there is no escape character.
If we then also set "No quote character" (which we don't actually want to do; we just tested whether that would let it work), we get an error that the data flow can't have no escape and no quote characters.
What is the purpose of having those two options available if they do not work in Azure Data Factory? Or is there somewhere else we need to update additional settings to make this work?
Thank you!!
Edit: I forgot to mention that we also attempted to replace the backslashes in the data flow column mapping section. We tried using the replace() function but could not get it to work (it kept giving syntax errors).
ORIGINAL working code for column: trim(toString($$))
ATTEMPTED WORKAROUNDS:
replace(trim(toString($$)),'\','-')
trim(replace(toString($$),'\','-')
trim(toString(replace($$),'\','-'))
I just wanted to share that a user on the Microsoft forums provided an answer that ended up working.
We changed the escape character from \ to ^ in the data set settings (only did this after confirming that the ^ character is not used in any fashion anywhere). We did not apply the replace function in the mapping of the column because unfortunately we needed to keep the \ character in those columns. But it worked and our data is now flowing through the way we need it to (even though these are not best practices for data management).
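The workaround can be sketched with Python's csv module (an analogy to ADF's behavior, not its implementation): with '^' as the escape character, a trailing backslash is plain data.

```python
import csv
import io

# With '^' as the escape character (and '^' absent from the data),
# the backslash is no longer special, so the columns parse correctly.
line = '"John","01F\\","34","NY"\n'
rows = list(csv.reader(io.StringIO(line), quotechar='"', escapechar='^'))
print(rows)     # [['John', '01F\\', '34', 'NY']]
```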
https://learn.microsoft.com/en-us/answers/questions/48595/error-with-no-escape-character-in-mapping-data-flo.html
For the 'Delimited Text' source in mapping data flows, you cannot select a Quote character together with 'No escape character'. You can try these workaround options:
If your destination is Azure Synapse, try a PolyBase load. (It will load the data with the escape sequence and quote character; after the load you can do a cleanup.)
If possible, convert the source data format from 'Delimited Text' to 'Parquet' or 'JSON'.

Azure Data Factory v2 pipeline double quotes

My source file has nvarchar and numeric columns.
The numeric column has a thousand separator, so the value comes wrapped in double quotes. When I use quoteChar ("\"") in the file format, the numeric values work fine.
At the same time, the nvarchar column (Name) has multiple double quotes within the data; if I use quoteChar, those values are split into additional columns based on the number of double quotes.
Is there any fix/solution for this?
Based on the properties in the Text format, only one quote character is allowed, so you can't handle the different data types with different delimiters.
You could try a | column delimiter if your nvarchar data doesn't contain the | character. Otherwise you may have to parse your source file to remove the thousand separator in an Azure Function activity before the transfer; then the copy activity in ADF can identify the values.
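A sketch of that pre-parse cleanup in Python (the column names and values here are made up for illustration):

```python
import csv
import io

# Quoted numeric with thousand separators, next to a name field that
# itself contains doubled quotes (standard CSV quote-escaping).
src = 'Name,Amount\n"Smith ""Jr""","1,234,567"\n'
rows = list(csv.reader(io.StringIO(src)))
name, amount = rows[1]
print(name)                          # Smith "Jr"
# Strip the thousand separators so the value parses as a number.
print(int(amount.replace(',', '')))  # 1234567
```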
The ADF parser fails while reading text that is encapsulated in double quotes and also has a comma within the text, like "Hello, World". To make it work, set both the Escape character and the Quote character properties to a double quote. This saves the whole text, with double quotes, to the destination.
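Setting the escape character equal to the quote character amounts to standard RFC 4180 quote doubling, which Python's csv module does by default:

```python
import csv
import io

# With doublequote on (the default), an embedded quote is written as ""
# and a field containing a comma is wrapped in quotes.
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_ALL).writerow(['Hello, World', 'He said "hi"'])
print(buf.getvalue())   # "Hello, World","He said ""hi"""

# Reading it back recovers the original values intact.
rows = list(csv.reader(io.StringIO(buf.getvalue())))
print(rows)             # [['Hello, World', 'He said "hi"']]
```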
