Encoding for Data Factory DelimitedText source - Azure

I have a Data Factory pipeline with a DelimitedText source on SFTP using encoding ISO-8859-13. It was working without any problems with special characters, but yesterday it started failing with many errors on those special characters.
Do you have any solution for this problem? Which encoding do you use in ADF to read special characters (like this: commerçant)?
Best regards,
I am trying to copy data with a Data Factory pipeline.

ADF's DelimitedText format supports a fixed set of encoding types for reading and writing files.
Ref doc: ADF Delimited format dataset properties
I just tried the default encoding type UTF-8 with the special character you provided (commerçant) and it works fine without any issue.
I have also tried encoding type ISO-8859-13 and didn't notice any issue with that character either.
Hence I assume some other special characters might be causing the problem. I would recommend trying the default UTF-8 and seeing if that helps. If the issue still persists, try enabling Skip incompatible rows, which filters out the bad records and logs them to a log file; from that log you can identify the exact rows that make the pipeline fail and choose the encoding type in your dataset settings accordingly.
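The encoding is controlled by the encodingName property of the DelimitedText dataset. A minimal sketch of such a dataset is shown below; the dataset, linked service, folder, and file names are placeholders rather than values from the question.
{
    "name": "SftpDelimitedSource",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": "SftpLinkedService",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "location": {
                "type": "SftpLocation",
                "folderPath": "incoming",
                "fileName": "export.csv"
            },
            "columnDelimiter": ";",
            "encodingName": "ISO-8859-13",
            "firstRowAsHeader": true
        },
        "schema": []
    }
}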

Related

How to use backslash in Azure Data Factory?

I have an 'Execute SSIS Package' activity in my ADF that requires me to pass a parameter for the Environment. In my case the parameter value is .\Dev. Unfortunately, the dot-backslash sequence is required by SSIS.
I am trying to provide the parameter dynamically, but every time my value .\Dev is changed to .\\Dev (each backslash is doubled inside Azure Data Factory). I have tried different solutions, like looking up the parameter in the database, setting a variable, and even replacing the double backslash with a single one, but it didn't work: every time I should get .\Dev I was getting .\\Dev. It also looks like this double backslash is treated as one character by ADF (in the case of the replace or substring functions).
Can anyone help with that?

U-SQL Error - Change the identifier to use at least one lower case letter

I am fairly new to U-SQL and am trying to run a U-SQL script in Azure Data Lake Analytics to process a Parquet file using the Parquet extractor functionality. I am getting the error below and can't find a way around it.
Error - Change the identifier to use at least one lower case letter. If that is not possible, then escape that identifier (for example: '[ACTIVITY]'), or embed it in a CSHARP() block (e.g CSHARP(ACTIVITY)).
Unfortunately, all the fields generated in the Parquet file are capitalized and I don't want to escape these identifiers. I have tried wrapping the identifier in a CSHARP block and that fails as well (E_CSC_USER_RESERVEDKEYWORDASIDENTIFIER: Reserved keyword CSHARP is used as an identifier.) Is there any way I can extract the Parquet file? Thanks for your help!
Code Snippet:
SET @@FeaturePreviews = "EnableParquetUdos:on";
@var1 =
    EXTRACT ACTIVITY string,
            AUTHOR_NAME string,
            AFFLIATION string
    FROM "adl://xxx.azuredatalakestore.net/Abstracts/FY2018_028"
    USING Extractors.Parquet();
@var2 =
    SELECT *
    FROM @var1
    ORDER BY ACTIVITY ASC
    FETCH 5 ROWS;
OUTPUT @var2
TO "adl://xxx.azuredatalakestore.net/Results/AbstractsResults.csv"
USING Outputters.Csv();
Based on your description, you are trying to say
EXTRACT ALLCAPSNAME int FROM "/data.parquet" USING Extractors.Parquet();
In U-SQL, we reserve all caps identifiers so we can add new keywords in the future without invalidating old scripts.
To work around this, you just have to quote (escape) the name, as in any other SQL dialect:
EXTRACT [ALLCAPSNAME] int FROM "/data.parquet" USING Extractors.Parquet();
Note that this is not changing the name of the field. It is just the syntactic way to address the field.
Also note, that in most SQL communities, it is considered a best practice to always quote identifiers to avoid reserved keyword clashes.
If all fields in the Parquet file are all caps, you will have to quote them all, as shown in the sketch below. In a future update you will be able to say EXTRACT * FROM … for Parquet (and ORC) files, but you will still need to quote the columns when you refer to them explicitly.
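Applied to the script in the question, the EXTRACT and the ORDER BY would look roughly like this (only the quoting changes; the paths and column names are kept from the question):
@var1 =
    EXTRACT [ACTIVITY] string,
            [AUTHOR_NAME] string,
            [AFFLIATION] string
    FROM "adl://xxx.azuredatalakestore.net/Abstracts/FY2018_028"
    USING Extractors.Parquet();
@var2 =
    SELECT *
    FROM @var1
    ORDER BY [ACTIVITY] ASC
    FETCH 5 ROWS;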

Using tab as delimiter in tFileInputDelimited component in Talend Open Studio

I have written an ETL job in Talend Open Studio that loads a CSV/TSV file into a database. To do so, I want to provide the delimiter to the tFileInputDelimited component via a dynamic context loaded from a text file. I have specified it in the context file as fieldDelimiter="\t" and in the tFileInputDelimited component as shown in the screenshot, but it doesn't work as a delimiter. I have also tried fieldDelimiter="\\t" and fieldDelimiter="\u0009" (the Unicode code point for tab).
What should I provide in the context file so that the delimiter is a tab character and not the literal string "\t", as is happening in this case?
I notice a difference in the context variable names: in the screenshot you have (String)context.get("fileDelimiter"), but in the text you say you specified fieldDelimiter="\t".
Just keeping the following in the .properties context file should work:
fieldDelimiter=\t
Also, use context.fieldDelimiter instead of (String)context.get("fileDelimiter").
In your context file, just put fileDelimiter = \t (without quotes), then access the variable in the Field Separator. Talend will automatically handle it as a string.
Hope this works.
There is no function (String)context.get("key") that I know of. If you have set the separator as a String element in the context, just access it directly; right now an empty String is probably being set as the field separator.
So if your field is called fileDelimiter, simply put context.fileDelimiter into the Field Separator.
As pointed out by others, you should use context.ParamName syntax, the benefit of this method is syntax checking at compile time which eliminates the risk of typos in your variable names.
This parameter must be declared in your job (contexts tab) in order for Talend to recognize it. You can either create it as a built-in or import it if it's in the repository.

PexObserve only records 255 characters

I am using Pex from the command line to find input values for test case generation.
I use PexObserve to record certain values during execution.
One of the values that I want to record is an XML-String.
However, when parsing the XML I receive "malformed XML" exceptions, since Pex only writes the first 255 characters into the log.
Is there a way to record the full XML string? Or does PexObserve have a different type that will let me record longer texts?
Leaving this here in case somebody has the same issue at some point.
I've found a solution that helped me.
Unfortunately the 255-character limit is set internally in static readonly fields, so I needed to use reflection.
My solution works by including the following line in the PUT:
typeof(Microsoft.Pex.Framework.PexObserve.ValueWriterManager).GetField("MaxWrittenElements").SetValue(null, 1000);
Replace the 1000 with any value you like.
BUT: remember that this is a quick-fix solution that might not work for you.
It may have unwanted side effects: you're also changing the number of list elements that are written, and perhaps other things.
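For illustration, a parameterized unit test (PUT) using that line might look like the sketch below. XmlProcessor and its Process method are hypothetical stand-ins for the code under test, and the reflection call relies on Pex internals (the MaxWrittenElements field named above), so it may not work in every Pex version.
using Microsoft.Pex.Framework;

[PexClass]
public partial class XmlProcessorTests
{
    [PexMethod]
    public void ProcessProducesXml([PexAssumeNotNull] string input)
    {
        // Raise the internal 255-character cap before any values are observed
        // (quick fix via reflection, as described in the answer above).
        typeof(Microsoft.Pex.Framework.PexObserve.ValueWriterManager)
            .GetField("MaxWrittenElements")
            .SetValue(null, 1000);

        // Hypothetical code under test that produces an XML string.
        string xml = XmlProcessor.Process(input);

        // Record the full XML string so it shows up in the Pex report.
        PexObserve.ValueForViewing("resultXml", xml);
    }
}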

ABAP startRFC.exe UTF-8 diacritics text transfer

I have a function module (FM) in SAP and I call it externally using startRFC. The only output of the FM is one internal table. This table has only one column of type char(100), and I need to get it into a text file. startRFC works well, but if there are diacritics (for example Czech: ěščřžýáíé), only hashes (#) appear instead of those characters.
Has anyone ever solved a similar issue?
If I call the same algorithm manually and write the strings to the screen in SAP, everything is fine, but startRFC somehow destroys it. The problem may be in the data transfer between SAP and startRFC, but I don't know how this transfer works.
I found a solution, but it is terribly slow: it converts the string to a hexadecimal string using gcl_conv_to_x->write and gcl_conv_to_x->get_buffer, then calls SCMS_XSTRING_TO_BINARY, and you need a binary table. It takes 5 minutes to do all this; without the conversion my algorithm takes 15 seconds.
So, finally, a solution...
You need to create an XSTRING variable and fill it with your text. To convert a STRING to an XSTRING, use FM SCMS_STRING_TO_XSTRING.
Then you will need an internal table with row type BAPICONTEN. It already contains a component (column) of type SDOK_SDATX (RAW 1022).
Then you just append a new line to this table like this:
DATA: my_table_row LIKE LINE OF my_table.  " work area for the BAPICONTEN table
my_table_row-line = my_xstring.            " LINE is the RAW 1022 component
APPEND my_table_row TO my_table.
This table (my_table) can be returned via RFC and will contain Cyrillic, German characters, etc.
I am just a beginner, so do not ask me how to create the table, please :)
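Putting the steps together, a minimal sketch could look like the following. The variable names are made up, and it assumes the RAW 1022 component of BAPICONTEN is called LINE, as in the snippet above:
DATA: lv_text    TYPE string,
      lv_xstring TYPE xstring,
      lt_content TYPE STANDARD TABLE OF bapiconten,
      ls_content TYPE bapiconten.

lv_text = 'ěščřžýáíé'.

" Convert the character string into a byte string so the RFC layer
" does not reinterpret the code page.
CALL FUNCTION 'SCMS_STRING_TO_XSTRING'
  EXPORTING
    text   = lv_text
  IMPORTING
    buffer = lv_xstring.

" Put the bytes into the RAW 1022 component; lt_content is what the FM
" returns via RFC.
ls_content-line = lv_xstring.
APPEND ls_content TO lt_content.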
