I need to implement delta loading in Informatica Developer, but I receive this error: 'System error: Evaluation failed and was not completed. Check the Developer tool logs for details.'
There are two basic CSV files in ADLS Gen2 (let's call them csv1 and csv2).
I compare csv1 with csv2.
If there is new data in csv2 that does not exist in csv1, I want to insert it into csv1.
If any existing data has been updated, I want to update the related data in csv1.
I brought in csv2 with the 'read' option (physical data object - read).
I brought in csv1 with the 'lookup' option (physical data object - lookup).
I dragged and dropped the columns from csv2 onto the csv1 lookup.
In an Expression transformation I derived a flag with the values 'I', 'U' and 'NA': 'I' means the row should be inserted, 'U' means it should be updated, and 'NA' means there is no change, so no action is needed.
After deriving this flag, I added an Update Strategy transformation with 'IIF(FLAG='I',DD_INSERT,IIF(FLAG='U',DD_UPDATE))'.
When I run the Data Viewer on the Update Strategy, the output looks correct.
After the Update Strategy I added csv1 as a physical data object with the 'write' option.
But I receive this error: 'System error: Evaluation failed and was not completed. Check the Developer tool logs for details.'
(Note: Because Informatica Developer told me the Lookup cannot run in the native run-time environment, I changed the run-time environment to Databricks.)
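For reference, here is roughly the delta-flag logic described above as a PySpark sketch (since the mapping now runs on Databricks); key_col and val_col are assumed column names and the paths are placeholders, not taken from the actual mapping:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# csv2 is the incoming data, csv1 is the existing target (paths are placeholders)
csv2 = spark.read.option("header", True).csv("<path-to-csv2>")
csv1 = spark.read.option("header", True).csv("<path-to-csv1>")

# Left join the incoming rows to the existing rows on the assumed key column
joined = csv2.alias("new").join(
    csv1.alias("old"), F.col("new.key_col") == F.col("old.key_col"), "left"
)

# Derive the I/U/NA flag: no match -> insert, changed value -> update, otherwise no action
flagged = joined.withColumn(
    "FLAG",
    F.when(F.col("old.key_col").isNull(), F.lit("I"))
     .when(F.col("new.val_col") != F.col("old.val_col"), F.lit("U"))
     .otherwise(F.lit("NA")),
)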
Do you have any suggestions?
Thank you
I am trying to extract data from SAP using the SAP CDC connector in ADF. The source data looks something like this:
START_DT|PROD_NAME|END_DT
20201230165830.0|BBEESABX|20180710143703.0
When we preview the data on the source, we get data just like the above. But when performing the copy via a Copy activity, the failure below is observed:
Failure happened on 'Source' side. ErrorCode=SapParsingDataFailure,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Failed when parsing data, parsing value: 'ESABX 201807', expected data type 'Microsoft.DataTransfer.Common.Shared.ClrTypeCode'.Please check your origin data in SAP side,Source=Microsoft.DataTransfer.Runtime.SapRfcHelper,''Type=System.FormatException,Message=Input string was not in a correct format.,Source=mscorlib,'
I have tried several combinations and changes on the sink side, such as changing Parquet to CSV and changing the copy behavior to all of the available options, but nothing seems to work.
Do you perhaps have hidden fields in the SAP extractor (RSA6)? Hidden fields can shift the field boundaries so that values end up being parsed against the wrong columns, which would explain the parsing failure. Try this workaround: select all fields in the SAP CDC connector and run it again.
I'm trying to load data from a Salesforce table to an ADLS path. To do this I'm using a SOQL-formatted query in the source dataset (Salesforce) of an ADF pipeline Copy activity. Sample below.
Select distinct `col1`, `col2`, `col3`....... from table
This pipeline works for all tables except two, where it fails with a HybridDeliveryException (exact error below).
I also tried pulling only 10 rows, still no luck. But the same table works without any issues when selecting all columns: select * from table.
Any suggestions are greatly appreciated.
Error:
Failure happened on 'Source' side. ErrorCode=UserErrorOdbcOperationFailed,'Type=Microsoft.DataTransfer.Common.Shared. ,Message=ERROR [HY000] [Microsoft][DSI] (20051) Internal error using swap file "D:\Users_azbatchtask_410\AppData\Local\Temp\a60f5b9a-da9c-47b3-9d03-14d64bf44dce.tmp" in "Simba::DSI::DiskSwapDevice::DoFlushBlock": "[Microsoft][Support] (40635) Simba::Support::BinaryFile: Write of 57168 bytes on file "D:\Users_azbatchtask_410\AppData\Local\Temp\a60f5b9a-da9c-47b3-9d03-14d64bf44dce.tmp" failed: No space left on device".,Source=Microsoft.DataTransfer.ClientLibrary.Odbc.OdbcConnector,''Type=System.Data.Odbc.OdbcException,Message=ERROR [HY000] [Microsoft][DSI] (20051) Internal error using swap file "D:\Users_azbatchtask_410\AppData\Local\Temp\a60f5b9a-da9c-47b3-9d03-14d64bf44dce.tmp" in "Simba::DSI::DiskSwapDevice::DoFlushBlock": "[Microsoft][Support] (40635) Simba::Support::BinaryFile: Write of 57168 bytes on file "D:\Users_azbatchtask_410\AppData\Local\Temp\a60f5b9a-da9c-47b3-9d03-14d64bf44dce.tmp" failed: No space left on device".,Source=Microsoft Salesforce ODBC Driver,'
This might not be a complete answer, but it may be helpful for someone as a workaround.
I ran some more tests today, and when I remove the keyword "distinct" from the SOQL statement, the query works fine with no exceptions.
It seems the issue occurs only with specific large tables, which fits the error above: with distinct, the Salesforce ODBC driver spills results to a swap file on the integration runtime's temp disk and runs out of space.
But the SOQL with distinct (Select distinct col1, col2, col3.......) works fine for other, smaller tables.
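If de-duplication is still required, one workaround (not part of the original answer, just a sketch with placeholder paths and column names) is to drop distinct from the source query and de-duplicate after the data has landed in ADLS, for example in a Spark notebook:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read the copied data from the landing path (placeholder); use spark.read.csv(...)
# instead if the copy activity wrote CSV rather than Parquet
df = spark.read.parquet("abfss://<container>@<account>.dfs.core.windows.net/<landing-path>/")

# Drop duplicate rows on the assumed key columns and write the de-duplicated output
df.dropDuplicates(["col1", "col2", "col3"]).write.mode("overwrite").parquet(
    "abfss://<container>@<account>.dfs.core.windows.net/<deduped-path>/"
)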
ADF Copy task:
Importing flat files with the wildcard *.txt; some files have 18 columns, some have 24.
The SQL table sink has 24 columns.
The copy fails because it does not find a mapping for columns 19-24.
Can I default the mapping of the last 6 columns to NULL when no value is found?
EDIT:
I copied my source to Blob storage and used a data flow with schema drift instead. I can connect to my source and can see that it writes Parquet files to the staging folder, but after counting the rows the workflow fails with this error:
Operation on target Dataflow1 failed: {"StatusCode":"DFExecutorUserError","Message":"Job failed due to reason: at Sink 'nsodevsynapse': Unable to stage data before write. Check configuration/credentials of storage","Details":"org.apache.hadoop.fs.azure.AzureException: com.microsoft.azure.storage.StorageException: This operation is not permitted on a non-empty directory.\n\tat org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.delete(AzureNativeFileSystemStore.java:2607)\n\tat org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.delete(AzureNativeFileSystemStore.java:2617)\n\tat org.apache.hadoop.fs.azure.NativeAzureFileSystem.deleteFile(NativeAzureFileSystem.java:2657)\n\tat org.apache.hadoop.fs.azure.NativeAzureFileSystem$2.execute(NativeAzureFileSystem.java:2391)\n\tat org.apache.hadoop.fs.azure.AzureFileSystemThreadPoolExecutor.executeParallel(AzureFileSystemThreadPoolExecutor.java:223)\n\tat org.apache.hadoop.fs.azure.NativeAzureFileSystem.deleteWithoutAuth(NativeAzureFileSystem.java:2403)\n\tat org.apache.hadoop.fs.azure.NativeAzureFileSystem.delete(NativeAzureFileSystem.java:2453)\n\tat org.apache.hadoop.fs.azure.NativeAzureFileSystem.delete(NativeAzureFileSystem.java:1936)\n\tat org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter."}
My sink uses a SQL account to connect; I can connect to SQL with that account and can write to and edit SQL tables with it.
The managed instance has Owner permissions on the storage account.
You can load the data using the Data Flow activity by enabling "Allow schema drift" in the source and sink transformations; it will automatically default the values to NULL when they are not passed (see the sketch below).
Source files:
Dataflow:
• In the source and sink, enable "Allow schema drift" if the source schema changes often.
• Add a mapping to map all source columns to the destination.
Destination SQL table:
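Conceptually, the NULL-defaulting that schema drift gives you is similar to this PySpark sketch, which pads whichever of the 24 target columns are missing from a given file with NULLs before loading (the column names and path are illustrative, not from the actual pipeline):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Assume the sink expects 24 columns named col1..col24 (illustrative names)
target_cols = [f"col{i}" for i in range(1, 25)]

# Read one of the 18-column source files (placeholder path)
df = spark.read.option("header", True).csv("<path-to-an-18-column-file>.txt")

# Add any target column the file does not have as NULL, then match the sink's column order
for c in target_cols:
    if c not in df.columns:
        df = df.withColumn(c, F.lit(None).cast("string"))
df = df.select(target_cols)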
I am trying to load a table sitting in a database in Azure Synapse into the default (Spark) database so that I can call the table to run the respective pandas code. However, I am not able to do it.
%%spark
// Read the table from the dedicated SQL pool via the sqlanalytics connector
val df = spark.read.sqlanalytics("emea***********.rpt.Vw_APInvoices")
// Save it as a table in the Spark default database
df.write.mode("overwrite").saveAsTable("default.t1")
Error:
Error : com.microsoft.spark.sqlanalytics.exception.SQLAnalyticsConnectorException: The specified table does not exist. Please provide a valid table.
at com.microsoft.spark.sqlanalytics.read.SQLAnalyticsReader.readSchema(SQLAnalyticsReader.scala:103)
at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation$.create(DataSourceV2Relation.scala:175)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:204)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
at org.apache.spark.sql.SqlAnalyticsConnector$SQLAnalyticsFormatReader.sqlanalytics(SqlAnalyticsConnector.scala:42)
... 52 elided
The error message clearly says what the problem is:
Error : com.microsoft.spark.sqlanalytics.exception.SQLAnalyticsConnectorException: The specified table does not exist. Please provide a valid table.
Make sure the specified table exists before running the above code.
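If you want to confirm from the notebook that the object is actually visible in the dedicated SQL pool, one option (a PySpark sketch using a plain JDBC read; the server, database and credentials are placeholders) is:

# 'spark' is the session provided by the Synapse notebook
check = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://<server>.sql.azuresynapse.net:1433;database=<database>")
    .option("query", "SELECT name, type_desc FROM sys.objects WHERE name = 'Vw_APInvoices'")
    .option("user", "<user>")
    .option("password", "<password>")
    .load()
)
check.show()  # an empty result means the object is not visible under that name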
Reference: Azure Synapse Analytics - Load the NYC Taxi data into the Spark nyctaxi database.
I have PySpark code that writes into a SQL Server database like this:
df.write.jdbc(url=url, table="AdventureWorks2012.dbo.people", properties=properties)
However, the problem is that I want to keep writing into the table people even if the table already exists. I see in the Spark documentation that the possible options for mode are error, append, overwrite and ignore, yet all of them throw an error that the object already exists if the table is already in the database.
Spark throws the following error:
py4j.protocol.Py4JJavaError: An error occurred while calling o43.jdbc.
com.microsoft.sqlserver.jdbc.SQLServerException: There is already an object named 'people' in the database
Is there a way to write data into the table even if the table already exists?
Please let me know if you need more explanation.
For me the issue was with Spark 1.5.2. The way it checks whether the table exists (here) is by running SELECT 1 FROM $table LIMIT 1. If that query fails, Spark assumes the table doesn't exist. Against SQL Server the query fails even when the table is there (LIMIT is not valid T-SQL), so Spark tries to create the table again and hits the 'object already exists' error.
This was changed to SELECT * FROM $table WHERE 1=0 in 1.6.0 (here).
So the append and overwrite modes will not throw an error when the table already exists. From the Spark documentation (http://spark.apache.org/docs/latest/sql-programming-guide.html#save-modes): SaveMode.Append means that "when saving a DataFrame to a data source, if data/table already exists, contents of the DataFrame are expected to be appended to existing data", and SaveMode.Overwrite means that "if data/table already exists, existing data is expected to be overwritten by the contents of the DataFrame". Depending on how you want to handle the existing table, one of these two should likely meet your needs.
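For example, a minimal sketch of the append case (reusing the df, url and properties from the question):

# Append to the existing table instead of trying to create it;
# use mode="overwrite" to replace the existing data instead
df.write.jdbc(
    url=url,
    table="AdventureWorks2012.dbo.people",
    mode="append",
    properties=properties,
)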