Facts:
- I am running an Azure Data Factory pipeline from AWS Redshift to Azure SQL Data Warehouse (since the Power BI Online Service doesn't support Redshift as of this post's date).
- I am using PolyBase for the copy since I need to skip a few problematic rows, so I use the "rejectValue" key and give it an integer (see the sketch after this list).
- I made two activity runs and got a different error on each run.
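For reference, ADF's "rejectValue" setting corresponds to PolyBase's reject options on the external table it creates behind the scenes. A hand-written equivalent in T-SQL would look roughly like the sketch below (the table, data source, and file format names are hypothetical and would need to be created beforehand):

-- Minimal sketch of the PolyBase reject options that ADF's "rejectValue" maps to.
CREATE EXTERNAL TABLE dbo.ext_SourceRows
(
    id       INT,
    flag_col VARCHAR(10)
)
WITH
(
    LOCATION     = '/staging/source_rows/',
    DATA_SOURCE  = MyBlobDataSource,      -- hypothetical external data source
    FILE_FORMAT  = MyTextFileFormat,      -- hypothetical external file format
    REJECT_TYPE  = VALUE,
    REJECT_VALUE = 10                     -- skip up to 10 bad rows before failing the load
);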
Issue:
Run no. 1 error:
Database operation failed. Error message from database execution : ErrorCode=FailedDbOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error happened when loading data into SQL Data Warehouse.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Data.SqlClient.SqlException,Message=org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.BooleanWritable,Source=.Net SqlClient Data Provider,SqlErrorNumber=106000,Class=16,ErrorCode=-2146232060,State=1,Errors=[{Class=16,Number=106000,State=1,Message=org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.BooleanWritable,},],'.
Run no. 2 error:
Database operation failed. Error message from database execution : ErrorCode=FailedDbOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error happened when loading data into SQL Data Warehouse.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Data.SqlClient.SqlException,Message= ,Source=.Net SqlClient Data Provider,SqlErrorNumber=106000,Class=16,ErrorCode=-2146232060,State=1,Errors=[{Class=16,Number=106000,State=1,Message= ,},],'.
Below is the reply from the Azure Data Factory product team:
As Alexandre mentioned, error #1 means you have a text-valued column in the source Redshift table where the corresponding column in SQL DW has type bit. You should be able to resolve the error by making the two column types compatible with each other.
Error #2 is another error from PolyBase deserialization. Unfortunately, the error message is not clear enough to identify the root cause. However, the product team has recently changed the staging format for PolyBase loads, so you should no longer see this error. Do you have the Azure Data Factory run ID for the failed job? The product team could take a look.
Power BI Online Service does support Redshift, through ODBC and an On-Premises Data Gateway (https://powerbi.microsoft.com/en-us/blog/on-premises-data-gateway-august-update/). You can install the latter on a Windows VM in Azure or AWS.
Redshift ODBC Drivers are here: http://docs.aws.amazon.com/redshift/latest/mgmt/install-odbc-driver-windows.html
Otherwise, your error indicates that one column of your SQL DW table does not have the expected data type (you probably have a BIT where a CHAR or VARCHAR should be).
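As a rough sketch of the fix (the table and column names here are hypothetical), you can locate the offending bit column in SQL DW and rebuild it with a character type via CTAS, since ALTER COLUMN support in SQL DW is limited:

-- Find bit columns in the target DW table
SELECT c.name AS column_name, t.name AS type_name
FROM sys.columns AS c
JOIN sys.types   AS t ON t.user_type_id = c.user_type_id
WHERE c.object_id = OBJECT_ID('dbo.StagingRedshift')
  AND t.name = 'bit';

-- Rebuild the table with the column widened to VARCHAR, then swap the names
CREATE TABLE dbo.StagingRedshift_fixed
WITH (DISTRIBUTION = ROUND_ROBIN)
AS
SELECT CAST(is_active AS VARCHAR(10)) AS is_active,   -- formerly BIT
       other_column
FROM dbo.StagingRedshift;

RENAME OBJECT dbo.StagingRedshift TO StagingRedshift_old;
RENAME OBJECT dbo.StagingRedshift_fixed TO StagingRedshift;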
Related
We've twice had an intermittent issue of the copy activities running into:
A transport-level error has occurred when receiving results from the
server. (provider: TCP Provider, error: 0 - An existing connection was
forcibly closed by the remote host.)
And on the next run, the issue is not there anymore.
For SQL, say 100k records are written in batches of 10k records: will we end up with duplicate records if something fails in the middle of the copy activity? I believe the copy activity is not treated as a single DB transaction.
For UPSERT (copy activities) in SQL, we do have retry enabled, as the key columns ensure that no duplicates are created. We're wondering if we can also enable retry for INSERT (copy activities).
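For context, a key-based upsert is conceptually equivalent to a MERGE on the key columns, which is why re-running it after a transient failure does not produce duplicates. A minimal sketch, with hypothetical table and column names:

-- Re-runnable upsert keyed on BusinessKey: rows already written are updated,
-- not inserted again, so a retry cannot create duplicates.
MERGE dbo.TargetTable AS t
USING dbo.IncomingBatch AS s
    ON t.BusinessKey = s.BusinessKey
WHEN MATCHED THEN
    UPDATE SET t.Col1 = s.Col1, t.Col2 = s.Col2
WHEN NOT MATCHED THEN
    INSERT (BusinessKey, Col1, Col2)
    VALUES (s.BusinessKey, s.Col1, s.Col2);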
In our other projects, we do have retry enabled for the copy activities that involve files (since, per the link below, the copy just resumes from the file that failed).
https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-overview#resume-from-last-failed-run
Resume happens at file level. If copy activity fails when copying a
file, in next run, this specific file will be re-copied.
The question is: will it be safe to enable retry for copy activities doing SQL inserts (Azure SQL to another Azure SQL table)? Will it cause us to run into duplicate records when a transient error happens in the middle of the operation?
Unfortunately, copy activities in ADF are not transaction-bound, and unless a pre-copy script is involved, the copy activity will only append data, thereby creating duplicates. So ideally the best approach would be to copy into a staging table and then leverage a Stored Procedure activity to move the data into the final table, which would be bound within a transaction.
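A minimal sketch of that pattern, with hypothetical table and procedure names: the copy activity loads dbo.StagingTable (with a pre-copy script that truncates it), and a Stored Procedure activity then calls something like:

CREATE PROCEDURE dbo.usp_LoadTargetFromStaging
AS
BEGIN
    SET XACT_ABORT ON;           -- roll the whole move back on any error
    BEGIN TRANSACTION;

    -- Remove rows that already exist for the incoming keys, then insert fresh copies
    DELETE t
    FROM dbo.TargetTable AS t
    INNER JOIN dbo.StagingTable AS s
        ON s.BusinessKey = t.BusinessKey;

    INSERT INTO dbo.TargetTable (BusinessKey, Col1, Col2)
    SELECT BusinessKey, Col1, Col2
    FROM dbo.StagingTable;

    COMMIT TRANSACTION;
END;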
So I am getting an error in Azure Data Factory that I haven't been able to find any information about. I am running a data flow and eventually (after an hour or so) get this error:
{"StatusCode":"DFExecutorUserError","Message":"Job failed due to
reason: The service has encountered an error processing your request.
Please try again. Error code 1204.","Details":"The service has
encountered an error processing your request. Please try again. Error
code 1204."}
Troubleshooting I have already done:
I have successfully run the data flow using the sample option. I did this with 1 million rows.
I am processing 3 years of data, and I have successfully processed all the data by filtering it by year and running the data flow once for each year.
So I think I have shown the data isn't the problem, because I have processed all of it by breaking it down into 3 runs.
I haven't found a pattern in the time the pipeline runs for before the error occurs that would indicate I am hitting any timeout value.
The source and sink for this data flow are both Azure SQL databases.
Does anyone have any thoughts? Any suggestions for getting a more verbose error out of Data Factory? (I already have the pipeline set to verbose logging.)
We are glad to hear that you have found the cause:
"I opened a Microsoft support ticket and they are saying it is a
database transient caused failure."
I think the error will be resolved automatically. I am posting this as an answer so it can be beneficial to other community members. Thank you.
Update:
The most important thing is that you resolved it in the end by increasing the vCores.
"The only thing they gave me was their BS article on handling
transient errors. Maybe I'm just old, but a database that cannot
maintain connections to it is not very useful. What I've done to
work around this is increase my vCores. This SQL database was a
serverless one. While performance didn't look bad, my guess is the
database must be doing some sort of resize in the background to
handle the hour-long data builds I need it to do. I had already tried
setting the min/max vCores to be the same. The connection errors
disappeared when I increased the vCore count to 6."
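For reference, the compute size of a serverless Azure SQL database can be inspected and changed with plain T-SQL; a small sketch, with a hypothetical database name and service objective:

-- Check the current service objective (e.g. GP_S_Gen5_1 = serverless, Gen5, 1 max vCore)
SELECT DATABASEPROPERTYEX(DB_NAME(), 'ServiceObjective') AS current_service_objective;

-- Scale the serverless database up to 6 max vCores
ALTER DATABASE [MyServerlessDb] MODIFY (SERVICE_OBJECTIVE = 'GP_S_Gen5_6');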
I'm talking to Cosmos DB via the (SQL) REST API, so existing questions that refer to various SDKs are of limited use.
When I run a simple query on a partitioned container, like
select value count(1) from foo
I run into an HTTP 400 error:
The provided cross partition query can not be directly served by the gateway. This is a first chance (internal) exception that all newer clients will know how to handle gracefully. This exception is traced, but unless you see it bubble up as an exception (which only
happens on older SDK clients), then you can safely ignore this message.
How can I get rid of this error? Is it a matter of running separate queries by partition key? If so, would I have to keep track of what the existing key values are?
I'm new to Azure Stream Analytics jobs, and I want to use reference data from Azure SQL DB, loading it into Power BI to have streaming data.
I set up the storage account when setting up the SQL input table. I tested the output table (Power BI), which is also fine, with no errors.
I tested both the input table and the output table connections; both connect successfully, and I can see the input data in the input preview.
But when I try to compose the query to test it out, the query cannot detect either the input table or the output table.
The output table icon is also greyed out.
Error message: Query must refer to at least one data stream input.
Could you help me?
Thank you!!
The test query portal will not allow you to test the query if there are syntax errors. You will need to correct the syntax (as indicated by the yellow squiggles) before testing.
Here is a sample test query without any syntax error messages:
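A minimal example, assuming an input alias of [streaminput] and an output alias of [powerbioutput]:

SELECT
    *
INTO
    [powerbioutput]
FROM
    [streaminput]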
Stream Analytics does require at least one source coming from one of these 3 streaming sources: Event Hubs, IoT Hub, or Blob Storage/ADLS. We don't support SQL as a streaming source at this time.
Using reference data is meant to augment the stream of data.
From your scenario, I see you want to get data from SQL to Power BI directly. For this, you can actually directly connect Power BI to your SQL source.
JS (Azure Stream Analytics)
I'm trying to grab Firebase analytics data from Google BigQuery with Azure Data Factory.
The connection to BigQuery works, but I quite often have timeout issues when running a (simple) query. Three out of five times I run into a timeout; if no timeout occurs, I receive the data as expected.
Can any of you confirm this issue, or does anyone have an idea what the reason for it might be?
Thanks & best,
Michael
Timeout issues can sometimes happen in Azure Data Factory. They are affected by the source dataset, sink dataset, network, query performance, and other factors. After all, your connectors are not Azure services.
You could try to set the timeout parameter following this JSON chart, or you could set the retry count to deal with timeout issues.
If your sample data is so simple that it shouldn't time out, maybe you could submit feedback here to ask the ADF team about your concern.