Failed to parse data from SAP table via Azure Data Factory

I am trying to extract data from SAP using the SAP CDC connector in ADF. The source data looks something like this:
START_DT|PROD_NAME|END_DT
20201230165830.0|BBEESABX|20180710143703.0
When we preview the data on the source, it looks just like the above. However, when we copy it with a Copy activity, the following failure occurs:
Failure happened on 'Source' side. ErrorCode=SapParsingDataFailure,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Failed when parsing data, parsing value: 'ESABX 201807', expected data type 'Microsoft.DataTransfer.Common.Shared.ClrTypeCode'.Please check your origin data in SAP side,Source=Microsoft.DataTransfer.Runtime.SapRfcHelper,''Type=System.FormatException,Message=Input string was not in a correct format.,Source=mscorlib,'
I have tried several combinations of changes on the sink side, such as switching from Parquet to CSV and trying every available Copy behavior option, but nothing seems to work.

You probably have hidden fields in the SAP extractor (check transaction RSA6). As a workaround, select all fields in the SAP CDC connector and run the copy again.
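The value in the error message, 'ESABX 201807', looks like the tail of PROD_NAME run together with the start of END_DT, which is what you would expect if the row were being cut at field offsets that don't match the data. The sketch below only illustrates that effect in plain Python. The fixed-width layout, the padding, and the offsets are assumptions made up for the illustration; they are not taken from the connector or the extractor.

```python
# Hypothetical fixed-width row built from the sample values in the question.
# The widths (16, 9, 16) are assumptions chosen for illustration only.
row = "20201230165830.0" + "BBEESABX " + "20180710143703.0"

# Slicing with offsets that match the layout gives clean field values.
aligned = {
    "START_DT": row[0:16],
    "PROD_NAME": row[16:25].strip(),
    "END_DT": row[25:41],
}

# Slicing with a shifted offset (assumed here) mixes two fields together.
misaligned_value = row[19:31]


def parses_as_number(text: str) -> bool:
    """Return True if text can be converted to a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


print(aligned)
print(repr(misaligned_value), parses_as_number(misaligned_value))
```

If something like this is happening, making every field visible to the connector would bring the offsets back in line, which fits the workaround above.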

Related

How to log errors from parallel sources in an ADF data flow

I have to do some data engineering: read manifest.cdm.json files from the data lake, add a pipeline run ID column, and push the result to a SQL database.
I have a JSON list file that holds the parameters required to read each CDM JSON file in the data flow source.
Previous approach: I used a ForEach that passed parameters to a data flow with a single activity, followed by error capturing. But running a data flow inside a ForEach costs too much.
Current approach: I manually created one data flow containing all the CDM files as sources. But here I'm not able to capture errors. If any source hits an error, the whole data flow activity fails, and if I select skip error in the data flow activity, I don't get any error at all.
So how can I capture the errors with the current approach?
You can capture the error with a Set Variable activity in Azure Data Factory.
Use the expression below in the Set Variable activity to capture the error message:
@activity('Data Flow1').Error.message
You can then store the error message in blob storage for future reference using a Copy activity, for example by writing it to a .csv file through a DelimitedText dataset.
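As a rough sketch of how this could be wired up, the Set Variable activity can depend on the data flow activity with a Failed condition, so it only runs when the data flow fails. The Python below just builds and prints that activity definition as JSON. The activity name "Capture error" and the variable name "errorMessage" are hypothetical, and the JSON shape reflects my understanding of the pipeline format, so compare it with what the ADF authoring UI generates before relying on it.

```python
import json

# Hypothetical names: "Capture error" and "errorMessage" are not from the original post.
set_variable_activity = {
    "name": "Capture error",
    "type": "SetVariable",
    # Run only when the data flow activity fails.
    "dependsOn": [
        {"activity": "Data Flow1", "dependencyConditions": ["Failed"]}
    ],
    "typeProperties": {
        "variableName": "errorMessage",
        "value": {
            "value": "@activity('Data Flow1').Error.message",
            "type": "Expression",
        },
    },
}

print(json.dumps(set_variable_activity, indent=2))
```

The pipeline would also need a String variable with the same name declared before the activity can set it.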

Azure ADF Salesforce connector Copy Activity failing with HybridDeliveryException

I'm trying to load data from a Salesforce table to an ADLS path. To do this, I'm using a SOQL-formatted query in the Salesforce source dataset of an ADF pipeline Copy activity. Sample below.
Select distinct `col1`, `col2`, `col3`....... from table
This pipeline works for all tables except two, where it fails with HybridDeliveryException (exact error below).
I also tried pulling only 10 rows, still with no luck. But the same table works without any issues when I select all columns (select * from table).
Any suggestions are greatly appreciated.
Error:
Failure happened on 'Source' side. ErrorCode=UserErrorOdbcOperationFailed,'Type=Microsoft.DataTransfer.Common.Shared. ,Message=ERROR [HY000] [Microsoft][DSI] (20051) Internal error using swap file "D:\Users_azbatchtask_410\AppData\Local\Temp\a60f5b9a-da9c-47b3-9d03-14d64bf44dce.tmp" in "Simba::DSI::DiskSwapDevice::DoFlushBlock": "[Microsoft][Support] (40635) Simba::Support::BinaryFile: Write of 57168 bytes on file "D:\Users_azbatchtask_410\AppData\Local\Temp\a60f5b9a-da9c-47b3-9d03-14d64bf44dce.tmp" failed: No space left on device".,Source=Microsoft.DataTransfer.ClientLibrary.Odbc.OdbcConnector,''Type=System.Data.Odbc.OdbcException,Message=ERROR [HY000] [Microsoft][DSI] (20051) Internal error using swap file "D:\Users_azbatchtask_410\AppData\Local\Temp\a60f5b9a-da9c-47b3-9d03-14d64bf44dce.tmp" in "Simba::DSI::DiskSwapDevice::DoFlushBlock": "[Microsoft][Support] (40635) Simba::Support::BinaryFile: Write of 57168 bytes on file "D:\Users_azbatchtask_410\AppData\Local\Temp\a60f5b9a-da9c-47b3-9d03-14d64bf44dce.tmp" failed: No space left on device".,Source=Microsoft Salesforce ODBC Driver,'
This might not be a complete answer, but it may be helpful to someone as a workaround.
I ran some more tests today, and when I remove the keyword "distinct" from the SOQL statement, the query works fine with no exceptions.
The issue seems to occur only with specific large tables.
SOQL with distinct (Select distinct col1, col2, col3.......) works fine for other, smaller tables.
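If unique rows are still needed once distinct is removed from the query, one option is to deduplicate after the copy rather than in the driver. This is only a sketch of the idea in plain Python with made-up rows; in a real pipeline the same step would more likely run in a data flow or a notebook.

```python
from typing import Iterable, Iterator, Tuple

Row = Tuple[str, str, str]


def distinct_rows(rows: Iterable[Row]) -> Iterator[Row]:
    """Yield each row once, keeping the order of first appearance."""
    seen = set()
    for row in rows:
        if row not in seen:
            seen.add(row)
            yield row


# Made-up sample data standing in for (col1, col2, col3).
sample = [("a", "1", "x"), ("b", "2", "y"), ("a", "1", "x")]
unique = list(distinct_rows(sample))
print(unique)
```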

Azure stream analytics split at comma

I have an input to my stream analytics job in the form of a CSV string such as the following:
jon,41,111 treadmill lane,07831231123,aa,bb,123...etc.
I'd like to sort this data into columns of an SQL table with column headings:
name,age,address,phone,result1,result2,result3...etc.
I've tried using SQL split functions but none I've tried seem to be compatible with Azure stream analytics job query. Could anyone provide any assistance as to how I can split my string into the appropriate tables? Many thanks.
If your events arrive in CSV format, you don't have to do anything in your query to work with them. The trick is to set the correct serialisation for your input: when you create your IoT Hub input, set the serialisation to CSV.
This works if your CSV message includes the headers:
name,age,address,phone,result1,result2,result3
jon,41,111 treadmill lane,07831231123,aa,bb,123
The input preview then shows the data split into columns.
When the headers are present, you can use them in your queries:
SELECT
name,
age
INTO
target
FROM
[csv-input]
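To picture what the CSV serialisation does with a header row, here is a small plain-Python sketch using the sample message above. It is not Stream Analytics code; it only shows the header names becoming column names, which is why the query can refer to name and age directly.

```python
import csv
import io

# The sample message from above, header row included.
message = (
    "name,age,address,phone,result1,result2,result3\n"
    "jon,41,111 treadmill lane,07831231123,aa,bb,123\n"
)

# DictReader uses the first row as the column names.
events = list(csv.DictReader(io.StringIO(message)))

# Rough equivalent of: SELECT name, age FROM [csv-input]
selected = [{"name": e["name"], "age": e["age"]} for e in events]
print(selected)
```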

Azure Data Factory - Google BigQuery Copy Data activity not returning nested column names

I have a copy activity in Azure Data Factory with a Google BigQuery source.
I need to import the whole table (which contains nested fields - Records in BigQuery).
Nested fields get imported as follows (a string containing only data values):
"{\"v\":{\"f\":[{\"v\":\"1\"},{\"v\":\"1\"},{\"v\":\"1\"},{\"v\":null},{\"v\":\"1\"},{\"v\":null},{\"v\":null},{\"v\":\"1\"},{\"v\":null},{\"v\":null},{\"v\":null},{\"v\":null},{\"v\":\"0\"}]}}"
Expected output would be something like:
{"nestedColName" : [{"subNestedColName": 1}, {"subNestedColName": 1}, {"subNestedColName": 1}, {"subNestedColName": null}, ...] }
I think this is a connector issue on Data Factory's side, but I am not sure how to proceed.
I have considered using Databricks to import the data from GBQ directly and then saving the DataFrame to the sink.
I have also considered querying for a subset of columns and using UNNEST where required, but would rather not do this, as Parquet handles both Array and Map types.
Has anyone encountered this before, and what did you do?
Solution used:
Databricks (Spark) connector for Google BigQuery:
https://docs.databricks.com/data/data-sources/google/bigquery.html
This preserves schemas and nested field names.
Preferring the simpler setup of ADF BigQuery connector to Databricks's BigQuery support, I opted for a solution where I extract the data in JSON and 'massage' it into Parquet using Databricks:
Use a Copy activity to get data from BigQuery with all the data packed into a single JSON string field. Output format can be Parquet or JSON (I'm using Parquet). Use a BigQuery query like this:
select TO_JSON_STRING(t) as value from `<your BigQuery table>` as t
NOTE: The name of the field must be value. The df.write.text() text file writer writes the contents of value column into each row of the text file, which is a JSON string in this case.
Run a Databricks notebook activity with code like this:
# Read data and write it out as text file to get the JSON. (Compression is optional).
dfInput=spark.read.parquet(inputpath)
dfInput.write.mode("overwrite").option("compression","gzip").text(tmppath)
# Read back as JSON to extract the correct schema.
dfTemp=spark.read.json(tmppath)
dfTemp.write.mode("overwrite").parquet(outputpath)
Use the output as is, or use a Copy activity to copy it to where you like.
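The round trip works because TO_JSON_STRING keeps the field names inside each string, so a JSON reader can rebuild the nested structure. A minimal plain-Python sketch of that idea follows; the row content and the field names (id, nestedColName, subNestedColName) are made up for illustration.

```python
import json

# Made-up example of one row as TO_JSON_STRING might return it.
value = '{"id": 7, "nestedColName": [{"subNestedColName": 1}, {"subNestedColName": null}]}'

record = json.loads(value)

# Nested field names survive, and JSON null becomes None.
nested_names = sorted({key for item in record["nestedColName"] for key in item})
print(nested_names)
print(record["nestedColName"][1]["subNestedColName"])
```

spark.read.json does the same kind of inference across all rows, which is how the notebook above ends up with a proper nested schema.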

Azure Storage Explorer: Properties of type '' are not supported error

I inherited a project that uses an Azure table storage database. I'm using Microsoft Azure Storage Explorer as a tool to query and manage data. I'm attempting to migrate data from my Dev database to my QA database. To do this, I'm exporting a CSV from a Dev database table and then trying to import into the QA database table. For a small number of tables, I get the following error when I try to import the CSV:
Failed: Properties of type '' are not supported.
When I ran into this before, since I exported a "typed" CSV from Dev, I checked to make sure all "#type" columns had values. They did. Then I split the CSV (with thousands of records) up into smaller files to try to determine which record was the issue. When I did this and started importing them, I was ultimately able to import all of the records successfully by individual files which is peculiar. Almost like a constraint violation issue.
I'm also seeing errors with different types. Eg:
Properties of type 'Double' are not supported.
In this case, there is already a column in the particular table of type "Double".
Anyway, now that I'm seeing it again, I'm having trouble resolving it. Any thoughts?
UPDATE
I was able to track a few of these errors to "bad" data in the CSV. It was a JSON string in an Edm.String field that, for some reason, the import didn't like. I minified the JSON using an online tool and it imported fine. There is one data set, though, with over 7,000 records that I'm trying to import (the one I mentioned breaking up earlier in this post). I ended up breaking it up into different files and was able to import them individually. When I try to import the entire file after loading all the data through individual files, though, I again get an error.
I split the CSV (with thousands of records) up into smaller files to try to determine which record was the issue. When I did this and started importing them, I was ultimately able to import all of the records successfully by individual files which is peculiar.
Based on your test, the format and data of the source CSV file seem OK. It will be difficult to find out why Azure Storage Explorer returns those unexpected errors while importing a large CSV file. You can try upgrading Azure Storage Explorer and check whether you can export and import the data successfully with the latest version.
Besides, you can try to use AzCopy (designed for copying data to and from Microsoft Azure Blob, File, and Table storage using simple commands with optimal performance) to export/import table.
Export table:
AzCopy /Source:https://myaccount.table.core.windows.net/myTable/ /Dest:C:\myfolder\ /SourceKey:key /Manifest:abc.manifest
Import table:
AzCopy /Source:C:\myfolder\ /Dest:https://myaccount.table.core.windows.net/mytable1/ /DestKey:key /Manifest:"abc.manifest" /EntityOperation:InsertOrReplace
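The minification step from the question's update can also be scripted instead of done with an online tool. The sketch below rewrites a JSON string without insignificant whitespace using Python's json module. The sample cell content is made up, and whether whitespace was really what the import objected to is not established here.

```python
import json


def minify_json(text: str) -> str:
    """Re-serialise a JSON string without insignificant whitespace."""
    return json.dumps(json.loads(text), separators=(",", ":"))


# Made-up example of a pretty-printed JSON value from an Edm.String column.
cell = '{\n  "a": 1,\n  "b": [1, 2]\n}'
print(minify_json(cell))
```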
