We are getting an "Operation was cancelled" exception while the Azure Indexer is running for larger record sets (around 2M+). Here are the log details:
"The operation was canceled. Unable to read data from the transport connection: The I/O operation has been aborted because of either a thread exit or an application request. The I/O operation has been aborted because of either a thread exit or an application request "
We are running the indexer on a thread. It works for smaller record sets, but for larger ones (1M+) it throws a SocketException.
Has anyone seen this error while running the Azure Indexer for larger record sets (long-running operations)?
(We have already increased the HttpClient timeout to its maximum value on the serviceClient object.)
This could happen because of excess HTTP connections. Try making your **HttpClient** static and see if anything improves. Setting the **HttpClient** timeout to its maximum value is required to execute with the maximum number of records.
You may also want to consider reducing your SQL query time for the best indexer performance. Also, please share your code if possible.
Hope it helps.
Try setting SearchServiceClient.HttpClient.Timeout to Timeout.InfiniteTimeSpan. You have to set the timeout before you send any request to Azure Cognitive Search.
client.HttpClient.Timeout = Timeout.InfiniteTimeSpan;
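As a minimal sketch (assuming the Microsoft.Azure.Search SDK; the service name and key below are placeholders), combining the two suggestions above: a single static client whose timeout is set before the first request is sent:

```csharp
using System;
using System.Threading;
using Microsoft.Azure.Search;

public static class SearchClientHolder
{
    // One shared client avoids opening new HTTP connections for every indexer run.
    public static readonly SearchServiceClient Client = Create();

    private static SearchServiceClient Create()
    {
        var client = new SearchServiceClient(
            "my-search-service",                        // placeholder service name
            new SearchCredentials("<admin-api-key>"));  // placeholder admin key

        // Must be applied before the first request goes out.
        client.HttpClient.Timeout = Timeout.InfiniteTimeSpan;
        return client;
    }
}
```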
I've implemented bulk deletion as recommended with the newer SDK: I created a list of tasks to delete each item and then awaited them all, and my CosmosClient was configured with BulkOperations = true. As I understand it, the implication is that under the hood the new SDK does its magic and performs a bulk operation.
Unfortunately, I've encountered a 429 response status, meaning my multiple requests hit the request rate limit (it is low, a development-only tier, but nonetheless). I wonder how a single bulk operation can cause a 429 error, and how to implement bulk deletion in something other than a "per item" fashion.
UPDATE: I use the Azure Cosmos DB .NET SDK v3 for the SQL API with bulk operations support, as described in this article: https://devblogs.microsoft.com/cosmosdb/introducing-bulk-support-in-the-net-sdk/
You need to handle 429s for deletes the way you'd handle them for any operation: wrap the call in an exception block, trap the status code, check the retry-after value in the response, then sleep and retry after that amount of time.
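A minimal sketch of that pattern with the v3 SDK (the container, id, and partition key here are placeholders); CosmosException exposes both the status code and a RetryAfter hint:

```csharp
using System;
using System.Net;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

static async Task DeleteWithRetryAsync(Container container, string id, string partitionKey)
{
    for (int attempt = 0; attempt < 10; attempt++)
    {
        try
        {
            await container.DeleteItemAsync<object>(id, new PartitionKey(partitionKey));
            return;
        }
        catch (CosmosException ex) when (ex.StatusCode == (HttpStatusCode)429)
        {
            // The service tells you how long to back off before retrying.
            await Task.Delay(ex.RetryAfter ?? TimeSpan.FromSeconds(1));
        }
    }

    throw new InvalidOperationException("Delete still throttled after 10 attempts.");
}
```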
P.S. If you're trying to delete all the data in the container, it can be more efficient to delete and then recreate the container.
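Roughly like this (database and container names are placeholders), preserving the existing container definition:

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

static async Task ResetContainerAsync(CosmosClient cosmosClient)
{
    Container container = cosmosClient.GetContainer("MyDatabase", "MyContainer");

    // Keep the current definition (partition key path, indexing policy, ...).
    ContainerProperties properties = (await container.ReadContainerAsync()).Resource;

    await container.DeleteContainerAsync();
    await cosmosClient.GetDatabase("MyDatabase").CreateContainerAsync(properties);
}
```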
I have the following code where I start getting an error during long-running tests on the same Service Bus Client.
ServiceBusMessageBatch batch = this._serviceBusSender.CreateMessageBatchAsync().GetAwaiter().GetResult();
The error is,
Azure.Messaging.ServiceBus.ServiceBusException: 'The operation did not complete within the allocated time 00:01:00 for object request42. (ServiceTimeout)'
Why is this statement throwing this error? Is the creation of a batch object such a heavy operation that it can even time out? If that is the case, should I switch to the overload that takes a List of ServiceBusMessage instead of this batch mode?
My understanding is that this way of creating a batch protects me from building a batch that the queue may not allow. I am finding it difficult to understand why it times out after 1 minute.
In order for a batch to be able to enforce limits on the size, it has to establish an AMQP link to the entity that you'll be sending to and read the maximum allowable message size from the service. This results in a network operation that, in this case, timed out. This overhead is performed only in the case that there is not an existing AMQP link already established - typically on the first call that requires a network operation.
What jumps out at me from your code is the use of GetAwaiter().GetResult() to perform sync-over-async. This is really not a good idea and is very likely to cause contention in the thread pool that prevents continuations from being scheduled in a timely manner. Because network operations in Service Bus are asynchronous - including establishing the AMQP link - delays in scheduling continuations would certainly increase the chance of timeouts.
I'd strongly advise refactoring your sync-over-async code paths and shifting to an asynchronous approach. In those scenarios where it's not possible to go full async, limiting sync-over-async to the outermost layer of your code would be the next best thing.
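As a rough sketch of the asynchronous shape (the sender and messages here are placeholders), awaiting the batch creation rather than blocking on it:

```csharp
using System;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;

static async Task SendBatchAsync(ServiceBusSender sender, ServiceBusMessage[] messages)
{
    ServiceBusMessageBatch batch = await sender.CreateMessageBatchAsync();

    foreach (ServiceBusMessage message in messages)
    {
        // TryAddMessage returns false when the batch would exceed the
        // maximum size allowed by the entity.
        if (!batch.TryAddMessage(message))
        {
            await sender.SendMessagesAsync(batch);
            batch.Dispose();

            batch = await sender.CreateMessageBatchAsync();
            if (!batch.TryAddMessage(message))
            {
                throw new InvalidOperationException("Message is too large for an empty batch.");
            }
        }
    }

    await sender.SendMessagesAsync(batch);
    batch.Dispose();
}
```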
So I am getting an error in Azure Data Factory that I haven't been able to find any information about. I am running a data flow and eventually (after an hour or so) I get this error:
{"StatusCode":"DFExecutorUserError","Message":"Job failed due to reason: The service has encountered an error processing your request. Please try again. Error code 1204.","Details":"The service has encountered an error processing your request. Please try again. Error code 1204."}
Troubleshooting I have already done:
I have successfully run the data flow using the sample option. I did this with 1 million rows.
I am processing 3 years of data, and I have successfully processed all of it by filtering the data by year and running the data flow once for each year.
So I think I have shown that the data isn't the problem, because I have processed all of it by breaking it into 3 runs.
I haven't found a pattern in the time the pipeline runs for before the error occurs that would indicate I am hitting any timeout value.
The source and sink for this data flow are both an Azure SQL Server database.
Does anyone have any thoughts? Any suggestions for getting a more verbose error out of Data Factory (I already have the pipeline set to verbose logging)?
We are glad to hear that you have found the cause:
"I opened a Microsoft support ticket and they are saying it is a database transient caused failure."
I think the error will be resolved automatically. I'm posting this as an answer so it can benefit other community members. Thank you.
Update:
The most important thing is that you resolved it in the end by increasing the vCores:
"The only thing they gave me was their BS article on handling transient errors. Maybe I'm just old, but a database that cannot maintain connections to it is not very useful. What I've done to work around this is increase my vCores. This SQL database was a serverless one. While performance didn't look bad, my guess is the database must be doing some sort of resize in the background to handle the hour-long data builds I need it to do. I had already tried setting the min/max vCores to be the same. The connection errors disappeared when I increased the vCores count to 6."
CREATE DATABASE {0}
AS COPY OF {1} ( SERVICE_OBJECTIVE = 'S2' )
Execution timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
CREATE DATABASE AS Copy of operation failed. Internal service error.
If setting a higher connection timeout via the connection string doesn't work, you might want to check out the CommandTimeout setting on the SqlCommand.
You can also set this with any of the ORM frameworks available, though the property is probably named something different.
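A minimal sketch, assuming System.Data.SqlClient and placeholder server/database names, showing where the command timeout is raised for the copy statement:

```csharp
using System.Data.SqlClient;

static void CopyDatabase(string connectionString)
{
    // The connection string should point at the logical server's master database (placeholder).
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand(
        "CREATE DATABASE [TargetDb] AS COPY OF [SourceDb] ( SERVICE_OBJECTIVE = 'S2' )",
        connection))
    {
        command.CommandTimeout = 0;   // 0 = no limit; otherwise use a large value in seconds

        connection.Open();
        command.ExecuteNonQuery();    // the copy itself continues asynchronously on the server
    }
}
```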
You have a timeout exception, which indicates that the time to complete the command is longer than your timeout. Have a look at the connection string to see the connection timeout, and change it to a larger value.
Depending on what takes the time, you can create the database at a larger size (S3) and then scale it down afterwards. Check whether the DTU usage is at 100% while creating the database.
We are using Azure SQL Database (Web Edition) for an ASP.NET MVC 3/EF5 application.
Is there a limit to the number of sessions that this SQL Database setup supports? I am just wondering whether the delays we are getting are due to some form of queuing or pooling. Currently we have about 5 concurrent users.
Thanks.
The SQL Azure Web edition database should support a high number of concurrent users - we've had applications that issue thousands of queries per minute running against Web databases.
Throttling
SQL Azure does implement database throttling to maintain performance for all users of the platform. If throttling has been applied to the current operation you'll receive error 40501. The link I've provided also shows you how to determine why throttling is being applied. If you receive this error you can treat it as a transient error and wait before retrying.
It doesn't sound like your connections are being throttled, because you mention only 5 concurrent users and talk about delays, whereas the throttling error would occur pretty quickly.
Transient error handling
If you're getting connection timeouts etc., you need to handle them as transient errors. Transient errors are timeouts or dropped connections, as well as error codes 10054, 10053, 40501 (throttling, as described above) and 40197 (usually because an upgrade or failover operation is in progress).
You should ensure you implement retry logic to handle transient errors.
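A minimal retry sketch (the connection string and query are placeholders), trapping the error numbers mentioned above:

```csharp
using System;
using System.Data.SqlClient;
using System.Threading;

static void ExecuteWithRetry(string connectionString, string query, int maxAttempts = 5)
{
    // Error numbers treated as transient: dropped connections, failover, throttling, timeout.
    int[] transientErrors = { 10053, 10054, 40197, 40501, -2 };

    for (int attempt = 1; ; attempt++)
    {
        try
        {
            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand(query, connection))
            {
                connection.Open();
                command.ExecuteNonQuery();
                return;
            }
        }
        catch (SqlException ex) when (
            Array.IndexOf(transientErrors, ex.Number) >= 0 && attempt < maxAttempts)
        {
            // Simple linear back-off; a production version would use an exponential delay.
            Thread.Sleep(TimeSpan.FromSeconds(attempt * 2));
        }
    }
}
```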
Query performance
If you're executing long running queries you can check which ones are slow by logging into the database management URL:
https://<database-id>.database.windows.net/#$database=<database-name>
Log in and click "Query Performance" - take a look at the longest running queries at the top.