Using tab as delimiter in tFileInputDelimited component in Talend Open Studio - string

I have written an ETL job in Talend Open Studio that loads a CSV/TSV file into a database. I want to supply the delimiter for the tFileInputDelimited component through a dynamic context load from a text file. I have specified it in the context file as fieldDelimiter="\t" and in the tFileInputDelimited component as shown in the screenshot, but it is not treated as a delimiter. I have also tried fieldDelimiter="\\t" and fieldDelimiter="\u0009" (the Unicode character for tab).
What should I put in the context file so that the delimiter is a tab character and not the literal string "\t", as is happening in this case?

I notice a difference in the context variable names. The screenshot shows (String)context.get("fileDelimiter"), but in the text you say you specified fieldDelimiter="\t" in the context file.
Keeping the context entry as follows in the .properties file should work:
fieldDelimiter=\t
Also use context.fieldDelimiter instead of (String)context.get("fileDelimiter").
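
To see why the quotes matter, here is a minimal sketch that assumes the context file is parsed with java.util.Properties rules. The thread does not confirm how Talend reads the file, so treat this as an illustration of the .properties format only. The class name DelimiterEscapes and the load helper are made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class DelimiterEscapes {
    // Parses one line of .properties text and returns the value of fieldDelimiter.
    // Assumption: java.util.Properties escape rules apply to the context file.
    static String load(String line) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(line));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return props.getProperty("fieldDelimiter");
    }

    public static void main(String[] args) {
        // Each Java literal spells out the raw text as it would appear in the file.
        String[] rawLines = {
            "fieldDelimiter=\\t",      // unquoted backslash-t
            "fieldDelimiter=\"\\t\"",  // backslash-t wrapped in double quotes
            "fieldDelimiter=\\\\t",    // doubled backslash followed by t
            "fieldDelimiter=\\u0009"   // unicode escape for tab
        };
        for (String raw : rawLines) {
            String value = load(raw);
            System.out.println(raw + " -> " + value.length()
                    + " char(s), is a lone tab: " + value.equals("\t"));
        }
    }
}
```

Under these rules only the unquoted forms give a single tab character. The quoted form keeps the quote characters as part of the value, and the doubled backslash gives a literal backslash followed by t.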

In your context file, just put fileDelimiter = \t
(without quotes)
and then reference the variable in the Field Separator setting. Talend will automatically handle it as a string.
Hope this works.

There is no function (String)context.get("key") that I know of. If you have set the separator as a String element in the context, just access it directly. As it stands, I suppose an empty String is being set as the field separator.
So if your variable is called fileDelimiter, simply put context.fileDelimiter into the Field Separator.

As pointed out by others, you should use the context.ParamName syntax. The benefit of this method is syntax checking at compile time, which eliminates the risk of typos in your variable names.
This parameter must be declared in your job (contexts tab) in order for Talend to recognize it. You can either create it as a built-in or import it if it's in the repository.
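
A rough sketch of the difference, in plain Java. JobContext is a hypothetical stand-in for the context class of a job, not Talend's actual generated code, and the map lookup only mimics the string-keyed style used in the question.

```java
import java.util.HashMap;
import java.util.Map;

public class ContextAccessSketch {
    // Hypothetical stand-in for a job's context class with one declared variable.
    static class JobContext {
        String fieldDelimiter = "\t";
    }

    // String-keyed lookup: a misspelled key still compiles and returns null at run time.
    static String lookup(Map<String, String> values, String key) {
        return values.get(key);
    }

    public static void main(String[] args) {
        JobContext context = new JobContext();
        Map<String, String> values = new HashMap<>();
        values.put("fieldDelimiter", context.fieldDelimiter);

        // The typo "fileDelimiter" is only noticed when the job runs.
        System.out.println("lookup with typo is null: " + (lookup(values, "fileDelimiter") == null));

        // Field access is checked by the compiler; context.fileDelimiter would not compile.
        System.out.println("field access is a tab: " + "\t".equals(context.fieldDelimiter));
    }
}
```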

Related

Encoding for a Data Factory DelimitedText source

I have a Data Factory pipeline with a DelimitedText source on SFTP using ISO-8859-13 encoding. It was working without any problems with special characters, but since yesterday it has been failing with many errors related to special characters.
Do you have a solution for this problem? Which encoding do you use in ADF to read special characters (like this: commerçant)?
Best regards,
I am trying to copy data with a Data Factory pipeline.
ADF delimited text format supports below encoding types to read/write files.
Ref doc: ADF Delimited format dataset properties
I just tried with default encoding type UTF-8 for the special character you have provided (like this: commerçant) and it works fine without any issue.
I have also tried with encoding type ISO-8859-13 and didn't notice any issue for the character type you have provided.
So I assume some other special characters might be causing the problem. I would still recommend trying the default UTF-8 to see if that helps. If the issue persists, try the Skip incompatible rows option, which filters out the bad records and logs them to a log file. From there you can find the exact rows that cause the pipeline to fail and, based on those records, choose the encoding type in your dataset settings.
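
Outside ADF, a quick way to check which encoding a problem record was really written in is to decode its bytes with each candidate charset and compare against the expected text. The sketch below is plain Java and does not use any ADF API. The class and method names are invented for the example, and the sample bytes are produced in UTF-8 purely as an assumption for the demonstration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {
    // Decodes bytes with the named charset and reports whether the expected text comes back.
    static boolean decodesTo(byte[] bytes, String charsetName, String expected) {
        if (!Charset.isSupported(charsetName)) {
            return false; // charset not available on this JVM
        }
        return new String(bytes, Charset.forName(charsetName)).equals(expected);
    }

    public static void main(String[] args) {
        String sample = "commer\u00E7ant"; // the word from the question, c-cedilla written as an escape
        byte[] utf8Bytes = sample.getBytes(StandardCharsets.UTF_8); // assumed file encoding for the demo
        for (String name : new String[] {"UTF-8", "ISO-8859-1", "ISO-8859-13"}) {
            System.out.println(name + " reproduces the sample: " + decodesTo(utf8Bytes, name, sample));
        }
    }
}
```

Running the same comparison on the bytes of a row logged by Skip incompatible rows would show which dataset encoding reproduces the text you expect.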

How to use backslash in Azure Data Factory?

I have an 'Execute SSIS Package' activity in my ADF pipeline that requires me to pass a parameter for Environment. In my case the parameter value is .\Dev. Unfortunately, the dot-backslash sequence is required by SSIS.
I am trying to provide the parameter dynamically, but every time my value .\Dev is changed to .\\Dev (each backslash is doubled inside Azure Data Factory). I have tried different solutions, like looking up the parameter in the database, setting a variable, and even replacing the double backslash with a single one, but none of them worked: every time I should get .\Dev, I was getting .\\Dev. It also looks like this double backslash is treated as one character by ADF (for example by the replace or substring functions).
Can anyone help with that?

U-SQL Error - Change the identifier to use at least one lower case letter

I am fairly new to U-SQL and trying to run a U-SQL script in Azure Data Lake Analytics to process a parquet file using the Parquet extractor functionality. I am getting the below error and I don't find a way to get around it.
Error - Change the identifier to use at least one lower case letter. If that is not possible, then escape that identifier (for example: '[ACTIVITY]'), or embed it in a CSHARP() block (e.g CSHARP(ACTIVITY)).
Unfortunately, all the fields generated in the Parquet file are capitalized and I don't want to escape these identifiers. I have tried wrapping the identifier in a CSHARP block and it fails as well (E_CSC_USER_RESERVEDKEYWORDASIDENTIFIER: Reserved keyword CSHARP is used as an identifier.) Is there any way I could extract the parquet file? Thanks for your help!
Code Snippet:
SET @@FeaturePreviews = "EnableParquetUdos:on";
@var1 =
EXTRACT ACTIVITY string,
AUTHOR_NAME string,
AFFLIATION string
FROM "adl://xxx.azuredatalakestore.net/Abstracts/FY2018_028"
USING Extractors.Parquet();
@var2 =
SELECT *
FROM @var1
ORDER BY ACTIVITY ASC
FETCH 5 ROWS;
OUTPUT @var2
TO "adl://xxx.azuredatalakestore.net/Results/AbstractsResults.csv"
USING Outputters.Csv();
Based on your description, you are trying to write something like
EXTRACT ALLCAPSNAME int FROM "/data.parquet" USING Extractors.Parquet();
In U-SQL, we reserve all caps identifiers so we can add new keywords in the future without invalidating old scripts.
To work around, you just have to quote the name (escape it) like in any other SQL dialect:
EXTRACT [ALLCAPSNAME] int FROM "/data.parquet" USING Extractors.Parquet();
Note that this is not changing the name of the field. It is just the syntactic way to address the field.
Also note, that in most SQL communities, it is considered a best practice to always quote identifiers to avoid reserved keyword clashes.
If all fields in the Parquet file are all caps, you will have to quote them all... In a future update you will be able to say EXTRACT * FROM … for Parquet (and Orc) files, but you still will need to quote the columns when you refer to them explicitly.
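
If there are many all-caps columns, the quoted column list can be generated instead of typed by hand. The following is only a sketch in Java: the class ExtractClauseBuilder and its methods are made up for illustration, every column is typed as string as in the question, and column names that themselves contain a closing bracket are not handled.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ExtractClauseBuilder {
    // Wraps a column name in square brackets, the quoting form shown in the answer.
    static String quote(String column) {
        return "[" + column + "]";
    }

    // Builds the column list of an EXTRACT statement, typing every column as string.
    static String extractColumns(List<String> columns) {
        return columns.stream()
                .map(c -> quote(c) + " string")
                .collect(Collectors.joining(",\n    ", "EXTRACT ", ""));
    }

    public static void main(String[] args) {
        // Column names taken from the script in the question.
        System.out.println(extractColumns(Arrays.asList("ACTIVITY", "AUTHOR_NAME", "AFFLIATION")));
    }
}
```

The printed text can be pasted in front of the FROM ... USING Extractors.Parquet(); part of the script.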

Escaped ampersand in JSF i18n Resource Bundle

I have something like
<s:link view="/member/index.xhtml" value="My&#160;News" propagation="none"/>
<s:link view="/member/index.xhtml" value="#{msg.myText}" propagation="none"/>
where the value of myText in messages.properties is
myText=My&#160;News
The first line of the example works fine and renders the text as "My News" with a non-breaking space, but the second, which uses a value from the resource bundle, also escapes the ampersand, so the page shows the literal text "My&#160;News".
I also tried Unicode escape sequences for the ampersand and/or hash, with My\u0026\u0023160;News, My\u0026#160;News and My\u0026nbsp;News in the properties file, without success.
(I used CSS no-wrap instead of the XML encoding I used before, but would be interested anyway.)
EDIT - Answer to clarified question
The first is obviously inline, so the interpreter knows that it is safe.
The second one comes from an external source (you are using Expression Language), so it is not considered safe and needs to be escaped. The result of the escaping is what you described: the page shows the exact text of the HTML entity.
This is related to security (XSS, for example) and not necessarily to i18n.
Previous attempt
I don't quite know what you are asking, but I believe it is "how do I display it?".
Most of the standard JSF controls have an escape attribute that, if set to false, stops the text from being escaped. Unfortunately, it seems that you are using something like SeamTools, which does not have this attribute.
In this case there is not much to be done. Unless you can use a standard control, you could try saving your properties file as Unicode (UTF-16 big-endian, in fact) and simply put in a valid Unicode non-breaking space character. Theoretically that should work; Unicode-encoded properties files are supported in the latest versions of Java (although I cannot recall whether that started with Java SE 5 or Java SE 6)...
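
A related option that avoids re-encoding the file is the \uXXXX escape of the .properties format: writing the non-breaking space as \u00A0 puts the character itself into the bundle value, so there is no ampersand left to escape. The sketch below only shows what the bundle returns; whether s:link then renders it as hoped is not verified here, and the class name is invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.PropertyResourceBundle;

public class NbspBundleSketch {
    // Loads a one-line bundle whose value uses the unicode escape for a non-breaking space.
    static String myText() {
        try {
            // The Java literal below is the raw file text: myText=My, then backslash-u00A0, then News
            PropertyResourceBundle bundle =
                    new PropertyResourceBundle(new StringReader("myText=My\\u00A0News"));
            return bundle.getString("myText");
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String text = myText();
        System.out.println("contains an ampersand: " + text.contains("&"));
        System.out.println("third character is a non-breaking space: " + (text.charAt(2) == '\u00A0'));
    }
}
```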

Invalid Parameter Name

I am having a problem with saving of data because of an incorrectly generated parameter name.
The table has a field "E-mail", and when the class wrapper is generated, the InsertCmd uses "@E-mail" as one of the parameters. In SQL Server, this is illegal and generates an exception.
I have hunted all over SubSonic for a way to modify the parameter name to simply "@Email", but the ParameterName property is read-only.
I am using SubSonic 2.2 and don't have the source for it to make an internal modification.
Any ideas?
TIA
I got a mate of mine that uses SVN to pull the source code, and as expected, found a bug in the SS source.
When the column name is set in the class wrapper, the setter for the ColumnName property sets the ParameterName property for you correctly using
"parameterName = Utility.PrefixParameter(Utility.StripNonAlphaNumeric(columnName), Table.Provider);". This removes any illegal characters like the hyphen in my E-mail column.
BUT... The property ParameterName is NOT used when the SQL commands are created. Here is the code in SQLDataProvider.GetInsertSQL, line 1462.
pars.Append(Utility.PrefixParameter( colName, this));
I changed this to
pars.Append(col.ParameterName);
and the problem is now sorted.
Thanks to those of you who came up with possible solutions.
You can modify the templates if you can't change the column name. See this blog post for details of how:
http://johnnycoder.com/blog/2008/06/09/custom-templates-with-subsonic/
