Tab Delimiter in Data Factory

I am running into an issue when trying to parse data from a config file in Data Factory.
I am using a configuration file whose items are referenced in the copy activity. The 'Column Delimiter' field of a dataset can be parameterized, so I am using the value from the file (because in some cases it is ';' and in others '\t').
When the delimiter is a semicolon it works perfectly, but when it's \t, I get the following error:
Copy activity doesn't support multi-char or none column delimiter.
When I check the value that goes into the field, I see that it is not the one from the file (\t), but \\t.
Do you have any idea why this happens, or whether there is an escape character for this? I also tried the ASCII code (\0009) and got the same error; it doesn't know how to transform it. Thanks a lot!

Can you try passing a real tab copied from a text editor, i.e. a pair of quotes with an actual tab character between them?
This has been seen to work.
Had the delimiter not been parameterized, you could have set it through the GUI or even the code.

The short answer is: when entering a tab value in the UI, do not use \t; instead use " ".
Between the quotes, I pasted an actual tab character.
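To see why the literal text \t trips the "multi-char" check while a pasted tab does not, here is a small Python illustration. It is not Data Factory code; the variable names are hypothetical, and it assumes the field value is JSON-encoded when displayed, which would explain the \\t the asker saw.

```python
import json

# Hypothetical values: what a config file containing the text \t yields,
# versus a genuine tab character pasted from a text editor.
from_config = "\\t"  # two characters: a backslash followed by the letter t
real_tab = "\t"      # one character: U+0009

print(len(from_config), len(real_tab))  # 2 1

# JSON-encoding escapes the backslash, so the two-character value shows up as \\t.
print(json.dumps(from_config))  # "\\t"

# A real tab is a single character; JSON merely displays it as \t.
print(json.dumps(real_tab))     # "\t"
```

The two-character value is what a delimiter check would reject as multi-char; the single real tab passes.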

According to the official documentation, multi-char delimiters are currently supported only in mapping data flows, not in the Copy activity.
You could try mapping data flows, which are also designed for data transformations in ADF. Please see more details here: https://learn.microsoft.com/en-us/azure/data-factory/concepts-data-flow-overview
If you have any concerns, please let me know.

You should use t instead of \t. Data Factory replaces t with \t itself. That is why \t ends up as \\t.

Related

Adding comma and quotes for two columns in sublime on Mac

I have data in Sublime arranged as follows:
353154 0.001039699782
506472 0.02085443362
482346 0.08343533852
439791 0.001349253676
486087 0.9999999476
I am trying to add quotes and commas so as to get the following output:
('353154', '0.001039699782'),
('506472', '0.02085443362'),
('482346', '0.08343533852'),
('439791', '0.001349253676'),
('486087', '0.9999999476')
I am aware of using CMD+Shift+L in order to move cursors right and left. But I need help on how to get the commas and quotes between the two columns. Kindly advise.

You can do this with regex search and replace. Select Find → Replace… and make sure the Regular Expression button (usually shown as .*) is selected. In the Find: field, enter
([\d\.]+)\s+([\d\.]+)
and in the Replace: field, enter
('\1', '\2'),
Hit Replace All and every line will be wrapped in parentheses and quotes as requested.
The demo at regex101.com also explains what each of the capture and replacement groups does.
Please keep in mind that you'll have to delete the final comma if a trailing comma is not valid syntax in your language of choice.
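If you would rather script the same substitution than run it inside the editor, the same pattern works with Python's re module. This is only a sketch: the variable names are made up, and the sample rows are the ones from the question.

```python
import re

rows = """353154 0.001039699782
506472 0.02085443362
482346 0.08343533852
439791 0.001349253676
486087 0.9999999476"""

# Same pattern as in the Find: field, with two capture groups split by whitespace.
pattern = re.compile(r"([\d\.]+)\s+([\d\.]+)")

# \1 and \2 refer to the captured columns, exactly as in the Replace: field.
converted = pattern.sub(r"('\1', '\2'),", rows)

# Drop the comma after the last tuple.
converted = converted.rstrip(",")

print(converted)
```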

In Azure Mapping Dataflows has anyone been able to successfully change the escape character of a Source Dataset?

Has anyone tried this using the Mapping Dataflows?
Example input field is:
"This is a sentence, it contains ""double quotes"" and a comma"
The escape character is a " and the quote character is a ".
When I use a regular Copy activity this works without a problem, however
when using the same Dataset in a Mapping Dataflow it gets parsed into 2 fields instead of one. In fact changing the escape character makes no difference.

Closing this issue. I've realised that output still has the \ as the escape character so when opening the output file in Excel it appears corrupted

How to remove \r\n line breaks in a text file that are within quotes and not the end of the row

I have a large set of files that contain line breaks within a column that are all wrapped in quotes, but U-SQL cannot process the files because it is seeing the \r\n as the end of the row despite being wrapped in quotes.
Is there an easy way to fix these files other than opening each file up individually in something like notepad++? It seems there should be a way to ignore line breaks if they are contained within quotes.
Example is something like this:
1,200,400,"123 street","123 street,\r\nNew York, NY\r\nUnited States",\N,\N,200\r\n
Notepad++ works fine for finding and replacing values manually, but I'm trying to find a batch way to do this because I have multiple files (50+ per source table) and hundreds of thousands of records in each that I need to fix.

According to U-SQL GitHub issue 84: USQL and embedded newline characters you can either build a custom extractor, or try to use the escapeCharacter parameter of the built-in extractor:
USING Extractors.Csv(quoting : true, escapeCharacter : '\\') // quoting is true by default, but it does not hurt to repeat.
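If preprocessing the files is an option, a batch fix can also be done outside U-SQL. The sketch below is an assumption, not part of the GitHub issue: it uses Python's csv module, which already treats line breaks inside quoted fields as data, and rewrites each row with those breaks replaced by a space. The function name and the replacement character are hypothetical choices.

```python
import csv
import io

def flatten_quoted_newlines(text: str, replacement: str = " ") -> str:
    """Replace line breaks inside quoted CSV fields; keep one CRLF per row."""
    # newline="" keeps \r\n untranslated so the csv reader sees the raw line endings.
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\r\n")
    for row in reader:
        writer.writerow(
            [
                field.replace("\r\n", replacement)
                .replace("\n", replacement)
                .replace("\r", replacement)
                for field in row
            ]
        )
    return out.getvalue()

sample = '1,200,400,"123 street","123 street,\r\nNew York, NY\r\nUnited States",\\N,\\N,200\r\n'
print(flatten_quoted_newlines(sample))
```

Note that the writer re-quotes only the fields that need it, so quoting in the output may differ from the input. To process many files, read and write each one with open(path, newline="") so the line endings pass through unchanged.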

SQL Loader unable to load CSV file data into linux environment correctly

I was able to load the same comma-delimited CSV file into an Oracle database on Windows correctly, but in the Linux environment the inserted records behave strangely. For example, the inserted data seems to contain something like \n. When I select a record and paste it out, it looks like this:
"data
"
The control file I used is as follows:
Load DATA
REPLACE INTO TABLE TABLE_NM
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
Please advise what I can do to fix this. Thank you in advance.

It's the classic issue where on *nix systems lines end with a linefeed, but on Windows lines end with a carriage return/linefeed. Since your data ends with carriage return/linefeed it is read fine on Windows, but on Linux the carriage return is loaded as part of the data.
You can either preprocess the data file and replace the line (record) termination character with a utility like dos2unix, or change the control file by adding the STR clause to the INFILE option to set the record terminator to the carriage return/linefeed pair:
INFILE "test.dat" "STR x'0D0A'"
I would opt for running the data through dos2unix to keep the control file more generic and not data filename specific.
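If dos2unix is not installed on the machine, the same conversion is easy to sketch in Python. This is not the dos2unix implementation, only a minimal stand-in; the function names are hypothetical, and it assumes the file fits in memory.

```python
from pathlib import Path

def crlf_to_lf(data: bytes) -> bytes:
    """Convert Windows line endings (CR LF) to Unix line endings (LF)."""
    return data.replace(b"\r\n", b"\n")

def convert_file(src: str, dst: str) -> None:
    """Read src as raw bytes and write the converted content to dst."""
    Path(dst).write_bytes(crlf_to_lf(Path(src).read_bytes()))
```

Working on bytes avoids any guess about the file's character encoding, since only the CR LF byte pair is touched.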

After investigating, I found that the root cause was that the feed file was not generated in a Linux environment. After I manually converted the file to Linux line endings, it loaded into the DB without any issues.

Embedding text in AS2, like HEREDOC or CDATA

I'm loading a text file into a string variable using LoadVars(). For the final version of the code I want to be able to put that text as part of the actionscript code and assign it to the string, instead of loading it from an external file.
Something along the lines of HEREDOC syntax in PHP, or CDATA in AS3 ( http://dougmccune.com/blog/2007/05/15/multi-line-strings-in-actionscript-3/ )
The quick and dirty solution I've found is to put the text into a text object in a movieclip and then read its value, but I don't like it.
Btw: the text is multiline, and can include single quotes and double quotes.
Thanks!

I think in AS2 the only way is the dirty one. In AS3 you can embed resources with the Embed tag, but as far as I know not in AS2.
If it's a final version, meaning you don't want to edit the text anymore, you could escape the characters and use \n as a line break.
var str = "\'one\' \"two\"\nthree";
trace(str);
outputs:
'one' "two"
three
Now just copy the text into your favourite text editor and change every ' and " to \' and \", also the line breaks to \n.

Cristian, anemgyenge's solution works when you realize it's a single line. It can be selected and replaced in a simple operation.
Don't edit the doc in the code editor. Edit the doc in a doc editor and create a process that converts it to a long string (say running it through a quick PHP script). Take the converted string and paste it in over the old string. Repeat as necessary.
It's way less than ideal from a long-term management perspective, especially if code maintenance gets handed off without you handing off the parser, but it gets around some of your maintenance issues.
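The conversion step described above can be sketched in a few lines. The answer mentions PHP; this version uses Python instead, and the function name is hypothetical. It escapes backslashes first, then line breaks and both kinds of quotes, and wraps the result in double quotes so it can be pasted into the AS2 source.

```python
def to_as2_string_literal(text: str) -> str:
    """Turn multi-line text into a single-line, double-quoted string literal."""
    escaped = (
        text.replace("\\", "\\\\")   # escape existing backslashes first
        .replace("\r\n", "\n")       # normalise Windows line endings
        .replace("\r", "\n")         # normalise old Mac line endings
        .replace("\n", "\\n")        # real line breaks become the two characters \n
        .replace('"', '\\"')         # escape double quotes
        .replace("'", "\\'")         # escape single quotes
    )
    return '"' + escaped + '"'

doc = "'one' \"two\"\nthree"
print("var str = " + to_as2_string_literal(doc) + ";")
```

For the sample text this prints the same literal as in the answer above: var str = "\'one\' \"two\"\nthree";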

Use a new pair of quotes on each line and add a space as the word delimiter:
var foo = "Example of string " +
"spanning multiple lines " +
"using heredoc syntax."
There is a project which may help that adds partial E4X support to ActionScript 2:
as24x
As well as a project which adds E4X support to Haxe, which can compile to a JavaScript target:
E4X Macro for Haxe
