Is it possible to cancel an Azure table query?
We have cases where we are making a long running query (can take 30-60 seconds), but the object gets disposed and needs to abort the query before it completes.
We are using TableServicesContext, and ExecuteQuery (synchronously). We can consider async as well if the solution requires it.
First of all, I doubt a table service query may last longer than 30 seconds. Check out this documentation on Query Timeouts and Pagination.
Also, the Windows Azure Storage Services SLA guarantees that the maximum response time for a Table Service (which is for batch operation) is 30 seconds. And operations on single entities shall complete within 2 seconds.
If yet, you still having issues, your solution is to use BeginExecute method which will give you back an IAsyncResult object. You can have your own timer and call CancelRequest with the given IAsyncResult upon your own logic.
By now, if you followed all my links, you might have noticed that BeginExecute and CancelRequest are methods of DataServiceContext calss. That's why they are not complete in the documentation for TableSeriveContext. But since TableServiceContext inherits directly DataServiceContext, these methods are availabe in your TableServiceContext also.
You may also want to take a look at How to: Execute Asynchronous Data Service Queries
Hope this helps!
Related
I am facing timeout issues on a firebase https function so I decided to optimize each line of code and realized that a single query is taking about 10 seconds to complete.
let querySnapshot = await admin.firestore()
.collection("enrollment")
.get()
The enrollment collection has about 23k documents, totaling approximately 6MB.
To my understanding, since the https function is running on a cloud function stateless server, it should not suffer from the query result size. Both Firestore and Cloud Functions are running on the same region (us-central). Yet 10 seconds is indeed a high interval of time for executing such a simple query that results in a small snapshot size.
An interesting fact is that later in the code I update those 23k documents back with a new field using Bulk Writter and it takes less than 3 seconds to run bulkWriter.commit().
Another fact is that the https function is not returning any of the query results to the client, so there shouldn't be any "downloading" time affecting the function performance.
Why on earth does it take 3x longer to read values from a collection than writing to it? I always thought Firestore architecture was meant for apps with high reading rates rather than writing.
Is there anything you would propose to optimize this?
When we perform the get(), a query is created to all document snapshots and the results are returned. These results are fetched sequentially within a single execution, i.e. the list is returned and parsed sequentially until all documents have been listed.
While the data may be small, are there any subcollections? This may add some additional latency as the API fetches and parses subcollections.
Updating the fields with a bulk writer update is over 3x the speed because the bulkwriter operation is performed in parallel and is queued based upon Promises. This allows many more operations per second.
The best way to optimize listing all documents is summarised in this link, and Google’s recommendation follows the same guideline being to use an index for faster queries and to use multiple readers that fetch the documents in parallel.
Can documentDb stored procedures run in parallel and update the same object? Will documentDb process them sequentially?
Consider the following scenario.
I have an app and I have 10000 coins to give away to my users when they complete a task. And I have the following object
{
remainingPoints: 10000
}
I have a stored procedure that subtracts 10 points from this object and adds them to the users' points.
Now lets say 10 users complete the task at the same time and I call the stored procedure 10 times at the same time, will DocDb execute them sequentially? Or will I have to execute the stored procedures sequentially?
I had similar questions when I first started using DocumentDB and got good answers here and in email from the DocumentDB product managers. Quoting:
Stored procedures ... get an isolated snapshot of the database for transactional support. The snapshot reflects the current state of the world (no stale data) at the time the sproc begins execution (strongly consistent).
Caveat – since stored procedures are operating on a snapshot, you can still get a stale read in a sproc if a new write come in from the outside world during execution.
Also, stored procedures will ALWAYS read their owns writes.
Sprocs are DocumentDB’s mechanism for multi-document transactions. Sproc writes are committed when a sproc successfully complete execution. If an exception is thrown, all work done in a sproc gets rolled back.
So if two are sprocs are running concurrently, they won’t see eachother’s writes.
If both sprocs happen to write to the same document (replace) – then the 2nd one will fail due to an etag mismatch when it attempts to commit writes.
From that, I went forward with my design making sure to use ETags in my writes as #Julian suggests. I also automatically retry up to 3 times each sproc execution to handle the case where they fail due to parallel operations among other reasons. In practice, I've never exceed the 3 retries (except in cases where my sproc had a bug) and I rarely even get a single retry.
I assume from the behavior that I observe that it sends each new sproc execution to a different replica until it runs out of replicas and then it queues them for sequential execution, so it's a hybrid of parallel and serial execution.
One other tip that I learned through experimentation is that you are better off doing pure read operations (no writes and no significant aggregation) client-side rather than in a sproc when you are on a heavily loaded system. I assume the advantage is because DocumentDB can satisfy different reads from different replicas in parallel. I have modularized my sproc code using the expandScript functionality of documentdb-utils to make sure that I use the exact same code for write validation, intra-document consistency, and derived fields both client-side and server-side, which is possible using node.js. Even if you are mostly .NET, you may want to use expandScripts to build your sprocs in a modular DRY way. You'll still need to run node.js in your build process to pre-process your sprocs or use Edge.NET (node running inside of .NET) to do so on the fly.
It will depend on the consistency you have choose for your collection. But the idea is that DocumentDb handle concurrency using etag and executes stored procedure on a snapshot of a document version, and commit the result only if the execution succeed.
See: https://azure.microsoft.com/en-us/documentation/articles/documentdb-faq/#develop
This thread may help too: Atomically increment an integer in a document in Azure DocumentDB
I have the following setup:
Azure function in Python 3.6 is processing some entities utilizing
the TableService class from the Python Table API (new one for
CosmosDB) to check whether the currently processed entity is already
in a storage table. This is done by invoking
TableService.get_entity.
get_entity method throws exception each
time it does not find an entity with the given rowid (same as the
entity id) and partition key.
If no entity with this id is found then i call insert_entity to add it to the table.
With this i am trying to implement processing only of entities that haven't been processed before (not logged in the table).
I observe however consistent behaviour of the function simply freezing after exactly 10 exceptions, where it halts execution on the 10th invocation and does not continue processing for another minute or 2!
I even changed the implementation of instead doing a lookup first to simply call insert_entity and let it fail when a duplicate row key is added. Surprisingly enough the behaviour there is exactly the same - on the 10th duplicate invocation the execution freezes and continues after a while.
What is the cause of such behaviour? Is this some sort of protection mechanism on the storage account that kicks in? Or is it something to do with the Python client? To me it looks very much something by design.
I wasnt able to find any documentation or settings page on the portal for influencing such behaviour.
I am wondering of it is possible to implement such logic with using table storage? I don't seem to find it justifiable to spin up a SQL Azure db or Cosmos DB instance for such trivial functionality of checking whether an entity does not exist in a table.
Thanks
I have 100 rows in an Azure table storage. But later I can add more rows or set "disabled" property on any row in the table.
I have an Azure function - "XProcessor". I would like to have an Azure function "HostFunction" which would start a new instance of the "XProcessor" for each row from the Azure table storage.
The "HostFunction" should be able to pass details of a table row to the instance of the "XProcessor" and the "HostFunction" needs to be executed every minute.
How do I achieve this? I am looking into the Azure Logic app but not sure yet how to orchestrate "XProcessor" with the details.
I would look in to using a combination of "Durable Functions" techniques.
Eternal Orchestration - will allow you to enable your process to run, wait a set period of time after completion, and then run again.
From the docs: Eternal orchestrations are orchestrator functions that never end. They are useful when you want to use Durable Functions for aggregators and any scenario that requires an infinite loop.
Fan-in-fan-out - will allow you to call a separate function per row.
From the docs: Fan-out/fan-in refers to the pattern of executing multiple functions in parallel, and then waiting for all to finish. Often some aggregation work is done on results returned from the functions.
There is a bit of additional overhead getting going with durable functions but it gives you fine grained control over your execution. Keep in mind that state of objects is serialised in durable functions at every await call, so thousands of rows would potentially be an issue, but for the scenario you describe it will work well and I have had a lot of success with it.
Good luck!
Durable functions is your go to option here. what you can do is
1) Have a controller function called the orchestration function
2) Another child function which can be invoked multiple times by the orchestration
And in your orchestrate function, you wait till all the child instances give you the response back, hence you can get a fan-out- fan in scenario.
Have a look at following links
https://blog.mexia.com.au/tag/azure-durable-functions and https://learn.microsoft.com/en-us/azure/azure-functions/durable-functions-overview
I am new to the documentDb. I wrote a stored procedure that checks all records and update them under certain circumstances.
Current scenario:
It would run 100 records at a time, updates them and after running few times( taking 100 records at a time and updating) it is timing out.
Expectation
Run the script on all the records without timing out.
The document has close to a million records. So, running the same script multiple times manually is not a the way I am looking for.
Can anyone please advise me how I can achieve that?
tl;dr; Keep calling the sproc with the query continuation token being passed back and forth.
A few thoughts:
There is no capacity of RUs for collections that will allow you to do all million in one call to the sproc.
Sprocs run in isolation on a single replica. This means that they can be transactional but their use will have lower throughput than a regular query that can use all replicas to satisfy the request, so unless you need it to be in a sproc, I recommend using direct queries for reads that don't need to be transactional with writes. Even then, with a million documents, your queries will max out and you'll have to run the query again with a continuation token.
If you must use a sproc... As you are probably aware since you have done the 100 at a time thing, each query returns a continuation token. You can actually add that to the package that you send back from your sproc when it times out. Then you can pass that back into another call to the same sproc and write your sproc to pick up where you left off. The documentdb-utils library for node.js automatically re-calls the sproc until done as long as you follow this pattern for writing your sprocs. If you are using node.js, you could use that (but it has not yet been upgraded to support partitioned collections) or you could write the equivalent in whatever platform you are using.