Azure Data Factory Connection to Google BigQuery Timeout Issues - azure

I'm trying to pull Firebase analytics data from Google BigQuery with Azure Data Factory.
The connection to BigQuery works, but I quite often hit timeouts when running a (simple) query: about 3 out of 5 runs time out. When no timeout occurs, I receive the data as expected.
Can anyone confirm this issue? Or does anyone have an idea what the reason is?
Thanks & best,
Michael

Timeouts can happen in Azure Data Factory from time to time. They are affected by the source dataset, the sink dataset, the network, query performance and other factors. After all, your connector targets a service outside Azure.
You could try setting the timeout parameter in the activity's policy JSON, or you could set a retry count to deal with the timeouts.
If your query is so simple that it should not time out, you could submit feedback to the ADF team about your concern.
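As a rough illustration of the timeout and retry settings mentioned above, the sketch below builds the policy fragment of a Copy activity as a Python dict and prints it as JSON. The property names (timeout, retry, retryIntervalInSeconds) follow the ADF activity policy schema as I understand it. The activity name and all values are placeholders, not recommendations, so check them against your own pipeline JSON.

```python
import json


def build_copy_activity(name, timeout="0.01:00:00", retry=3, retry_interval_s=60):
    """Return a minimal Copy activity skeleton with a timeout/retry policy.

    timeout uses the d.hh:mm:ss format; every value here is illustrative.
    """
    return {
        "name": name,
        "type": "Copy",
        "policy": {
            "timeout": timeout,                      # give up after this long
            "retry": retry,                          # number of retry attempts
            "retryIntervalInSeconds": retry_interval_s,
        },
    }


# "CopyFromBigQuery" is a hypothetical activity name.
activity = build_copy_activity("CopyFromBigQuery")
print(json.dumps(activity, indent=2))
```

The printed fragment would sit alongside the activity's typeProperties in the pipeline definition.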

Related

Azure Data Factory error DFExecutorUserError error code 1204

So I am getting an error in Azure Data Factory that I haven't been able to find any information about. I am running a data flow and eventually (after an hour or so) get this error
{"StatusCode":"DFExecutorUserError","Message":"Job failed due to
reason: The service has encountered an error processing your request.
Please try again. Error code 1204.","Details":"The service has
encountered an error processing your request. Please try again. Error
code 1204."}
Troubleshooting I have already done:
I have successfully run the data flow using the sample option. I did this with 1 million rows.
I am processing 3 years of data, and I have successfully processed all of it by filtering the data by year and running the data flow once for each year.
So I think I have shown the data isn't the problem, because I have processed all of it by breaking it down into 3 runs.
I haven't found a pattern in how long the pipeline runs before the error occurs that would indicate I am hitting a timeout value.
The source and sink for this data flow are both an Azure SQL Server database.
Does anyone have any thoughts? Any suggestions for getting a more verbose error out of Data Factory? (I already have the pipeline set to verbose logging.)
We are glad to hear that you have found the cause:
"I opened a Microsoft support ticket and they are saying it is a
database transient caused failure."
I think the error will resolve itself. I am posting this as an answer so that it can benefit other community members. Thank you.
Update:
The most important thing is that you resolved it in the end by increasing the vCores.
"The only thing they gave me was their BS article on handling
transient errors. Maybe I’m just old but a database that cannot
maintain connections to it is not very useful. What I’ve done to
workaround this is increase my vCores. This sql database was a
serverless one. While performance didn’t look bad my guess is the
database must be doing some sort of resize in the background to
handle the hour long data builds I need it to do. I had already tried
setting the min/max vCores to be the same. The connection errors
disappeared when I increased the vCores count to 6."

How Monitor - Cosmos DB (preview) Requests is calculated?

Azure provides a monitor for incoming requests to Cosmos DB. While I was the only one working on my Cosmos DB, I ran a simple select-vertex statement (e.g., g.V('id')) and then checked the incoming requests metric. It showed around 10, even though I know for sure that I was the only person accessing the database. I also tried traversing the graph in a single select query, and the request count was huge (around 100).
Has anybody else noticed this in the metrics? We suspect that a huge request count over an hour in production is causing the slow performance. Is the metric trustworthy, or how else can we find the number of incoming requests to Cosmos DB?

Azure Data Factory - CRM (OData) Connector

I have an Azure Data Factory for data extraction from an on-premises CRM. I am running into an issue with one of the data entities, where the pipeline runs for close to 8 hours and then throws the exception below. I know it's not an authentication issue, as I am able to get the other entities without any problems. I tried changing parallelCopies to 18 and adjusting the DIUs, but when I trigger the pipeline it sticks to parallel copies of 1 and DIUs of 4, and eventually fails. I'd appreciate any input.
Operation on target XXXX failed: Failure happened on 'Source' side. ErrorCode=UserErrorFailedFileOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Upload file failed at path XXXXXXX,Source=Microsoft.DataTransfer.Common,''Type=System.NotSupportedException,Message=The authentication endpoint Kerberos was not found on the configured Secure Token Service!,Source=Microsoft.Xrm.Sdk,'
I ran into something similar when using CRM as a sink; any upsert activities would fail very near exactly 60 minutes. The error I observed in the Azure Data Factory activity was:
'Type=System.NotSupportedException,Message=The authentication endpoint Kerberos was not found on the configured Secure Token Service!,Source=Microsoft.Xrm.Sdk,'
This post helped me find what to change in ADFS. I ran Get-ADFSRelyingPartyTrust and reviewed the TokenLifetime property, which happened to be 0. Apparently tokens last 60 minutes when the configuration is 0.
The following PowerShell increased the timeout, and I confirmed upsert activities no longer fail when exceeding 60 minutes.
Set-ADFSRelyingPartyTrust -TargetName "<RelyingPartyTrust>" -TokenLifetime <timeout in minutes>
It turned out to be a timeout setting on ADFS; once the timeout was increased, the job ran successfully.

How to find/cure source of function app throughput issues

I have an Azure function app triggered by an HttpRequest. The function app reads the request, tosses one copy of it into a storage table for safekeeping and sends another copy to a queue for further processing by another element of the system. I have a client running an ApacheBench test that reports approximately 148 requests per second processed. That rate of processing will not be enough for our expected load.
My understanding of function apps is that it should spawn as many instances as is needed to handle the load sent to it. But this function app might not be scaling out quickly enough as it’s only handling that 148 requests per second. I need it to handle at least 200 requests per second.
I’m not 100% sure the problem is on my end, though. In analyzing the performance of my function app I found a LOT of 429 errors. What I found online, particularly https://learn.microsoft.com/en-us/azure/azure-resource-manager/resource-manager-request-limits, suggests that these errors could be due to too many requests being sent from a single IP. Would several ApacheBench 10K and 20K request load tests within a given day cause the 429 error?
However, if that’s not it, if the problem is with my function app, how can I force my function app to spawn more instances more quickly? I assume this is the way to get more throughput per second. But I’m still very new at working with function apps so if there is a different way, I would more than welcome your input.
Maybe the Premium app service plan that’s in public preview would handle more throughput? I’ve thought about switching over to that and running a quick test but am unsure if I’d be able to switch back?
Maybe EventHub is something I need to investigate? Is that something that might increase my apparent throughput by catching more requests and holding on to them until the function app could accept and process them?
Thanks in advance for any assistance you can give.
You don't provide much context about your app, but here are a few steps you can take to improve it:
If you want more control, use an App Service plan with Always On to avoid cold starts. You will also need to configure autoscaling yourself, since in this plan you are responsible for it and it is not enabled by default.
Your Azure function must be fully async, since it has external dependencies and you don't want to block a thread while calling them.
Look at the limits. You can tweak them in host.json.
A 429 error means the function is too busy to process your request, so you are probably not using async when writing to the table and are blocking a thread.
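To make the "fully async" advice above concrete, here is a minimal sketch using Python's asyncio. It is not Azure Functions code: save_to_table and send_to_queue are hypothetical stand-ins for the storage table write and the queue send, simulated with asyncio.sleep. The point it illustrates is that awaiting both calls lets one thread serve many requests while the I/O is pending.

```python
import asyncio
import time


async def save_to_table(payload):
    # Hypothetical stand-in for a non-blocking storage table write.
    await asyncio.sleep(0.1)
    return "table:" + payload


async def send_to_queue(payload):
    # Hypothetical stand-in for a non-blocking queue send.
    await asyncio.sleep(0.1)
    return "queue:" + payload


async def handle_request(payload):
    # Start both I/O calls together; the thread is free while they wait.
    return await asyncio.gather(save_to_table(payload), send_to_queue(payload))


async def main(n):
    start = time.perf_counter()
    # Handle n requests concurrently on a single thread.
    results = await asyncio.gather(*(handle_request(f"req{i}") for i in range(n)))
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(main(50))
print(len(results), "requests handled in", round(elapsed, 2), "s")
```

If the two helpers used a blocking sleep instead, the 50 requests would run one after another, which is the thread-blocking behaviour the answer warns about.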
Function apps work very well and scale as advertised. The problem could be that the requests are coming from a single IP and Azure is treating them as a DDoS attack. You can try the following:
Azure DevOps Load Test
You can load test using one of the Azure services. I am very sure they have better criteria for handling IPs.
Provision a VM in Azure
What I normally do is provision a VM (Windows 10 Pro) in Azure and use JMeter to load test. I have used this method and it works fine. You can provision a couple of them and split the load between them.
Use professional load testing services
If possible, you may use services like Loader.io. They use sophisticated algorithms to run the load test and provision a bunch of VMs to run the same test.
Use Application Insights
If you are not already doing so, you should use Application Insights to get a better view from the server's perspective. Go to the live stream and see how many instances it provisions to handle the load test. You can easily look into any events and error logs that arise, and you can dig into each associated dependency to investigate the problem.

Is there a limit on the number of sessions for Azure Web SQL Database?

We are using Azure SQL Database (Web Edition) for an MVC3 ASP.NET/EF5 application.
Is there a limit to the number of sessions that this SQL Database setup supports? I am just wondering whether the delays we are seeing are due to some form of queuing or pooling. Currently we have about 5 concurrent users.
Thanks.
The SQL Azure Web edition database should support a high number of concurrent users - we've had applications running that issue thousands of queries per minute against Web databases.
Throttling
SQL Azure does implement database throttling to maintain performance for all users of the platform. If throttling has been applied to the current operation you'll receive error 40501. The link I've provided also shows you how to determine why throttling is being applied. If you receive this error you can treat it as a transient error and wait before retrying.
It doesn't sound like your connections are being throttled, because you mention only 5 concurrent users and talk about delays, whereas the throttling error would occur pretty quickly.
Transient error handling
If you're getting connection timeouts etc you need to handle them as transient errors. Transient errors are timeouts or dropped connections, as well as error codes 10054, 10053, 40501 (throttling as described above) and 40197 (usually because an upgrade or failover operation is in progress).
You should ensure you implement retry logic to handle transient errors.
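A minimal sketch of such retry logic, assuming a driver that raises an exception carrying the SQL error number. SqlError and run_with_retry are hypothetical names introduced here for illustration, not part of any real driver. The set of transient codes is taken from the list above, and the backoff values are arbitrary.

```python
import random
import time

# Error numbers listed above as transient; treat this set as a starting point.
TRANSIENT_ERRORS = {10053, 10054, 40197, 40501}


class SqlError(Exception):
    """Hypothetical stand-in for a driver exception carrying an error number."""

    def __init__(self, number):
        super().__init__(f"SQL error {number}")
        self.number = number


def run_with_retry(operation, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call operation(), retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except (SqlError, TimeoutError) as exc:
            transient = isinstance(exc, TimeoutError) or exc.number in TRANSIENT_ERRORS
            if not transient or attempt == max_attempts:
                raise  # not retryable, or out of attempts
            # Exponential backoff plus a little jitter to avoid retry bursts.
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1)
            sleep(delay)


# Usage: a query that is throttled twice (40501) and then succeeds.
calls = {"n": 0}


def flaky_query():
    calls["n"] += 1
    if calls["n"] < 3:
        raise SqlError(40501)
    return "ok"


result = run_with_retry(flaky_query, sleep=lambda seconds: None)
print(result, "after", calls["n"], "attempts")
```

Non-transient errors are re-raised immediately, so genuine bugs such as a bad login or a syntax error are not hidden behind retries.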
Query performance
If you're executing long running queries you can check which ones are slow by logging into the database management URL:
https://<database-id>.database.windows.net/#$database=<database-name>
Log in and click "Query Performance" - take a look at the longest running queries at the top.

Resources