Dealing with Azure Cosmos DB cross-partition queries in REST API

I'm talking to Cosmos DB via the (SQL) REST API, so existing questions that refer to various SDKs are of limited use.
When I run a simple query on a partitioned container, like
select value count(1) from foo
I run into an HTTP 400 error:
The provided cross partition query can not be directly served by the gateway. This is a first chance (internal) exception that all newer clients will know how to handle gracefully. This exception is traced, but unless you see it bubble up as an exception (which only happens on older SDK clients), then you can safely ignore this message.
How can I get rid of this error? Is it a matter of running separate queries by partition key? If so, would I have to keep track of what the existing key values are?


Maintain a distributed incremental counter in Azure cosmos DB

I am fairly new to Cosmos DB and am trying to understand the increment operation that the Azure Cosmos DB SDK for Java provides for patching a document.
I have a requirement to maintain an incremental counter in one of the Documents in the container. The document looks like this-
{"counter": 1}
Now, from my application, I want to increment this counter by 1 every time an action happens. For this I am using CosmosPatchOperations. I add an increment like this: cosmosPatch.increment("/counter", 1), which works fine.
This application can have multiple instances running, all of them talking to the same document in the Cosmos container. So App1 and App2 could both trigger an increment at the same time. The SDK method returns the updated document, and I need to use that updated value.
My question is: does Cosmos DB employ some locking mechanism to make sure the two patches happen one after another? And in that case, what updated value would App1 and App2 each get back (the SDK method returns the updated document)? Will it be 2 in one of them and 3 in the other?
Couchbase supports such a counter at cluster level as explained here, and it has been working perfectly for me without any concurrency issues. I am now migrating to Cosmos DB and have been struggling to find out how this can be achieved.
Update 1:
I decided to test this. I set up the Cosmos emulator on my local Mac and created a DB and a container with automatically increasing RUs starting from 1 to 10K. Then I added a document like this to the container:
{"id": "randomId", "counter": 0}
After this I created a simple API whose only responsibility is to increment the counter by 1 every time it is invoked. Then I used Locust to invoke this API repeatedly to mimic a small load scenario.
Initially the test ran fine with each invocation receiving a counter like it is supposed to (in an incremental manner). On increasing the load I saw some errors namely RequestTimeOutException with status code 408. Other requests were still working fine with them getting the correct counter value. I do not understand what caused RequestTimeOut exceptions here. The stack trace hints something to do with concurrency but I am not able to get my head around it. Here's the stack trace-
Update 2:
The test run in Update 1 was done on my local machine, and I realised that resource constraints there might have led to those errors. I decided to repeat the test in a pre-prod environment with an actual Cosmos DB account rather than the emulator.
Test configuration-
Cosmos DB container with RUs to automatically scale from 400 to 4000
2 instances of application sharing the load.
Locust script to ingest load on the application
Up until ~170 TPS, everything was running smoothly. Beyond that I noticed errors belonging to 2 different buckets-
"exception": "["Request rate is large. More Request Units may be needed, so no changes were made. Please retry this request later. Learn more:"]".
I am not sure how 170 odd patch operations would have exhausted 4000 RUs but that's a different discussion altogether.
"exception": "["Conflicting request to resource has been attempted. Retry to avoid conflicts."]", with status code 449.
This error clearly indicates that cosmos DB doesn't handle concurrent requests. I want to understand if they maintain a queue internally to handle some requests or they don't handle any concurrent writes at all.
PATCH is no different from other operations. Fundamentally, Cosmos DB implements Optimistic Concurrency Control (OCC), unlike relational databases, which rely on locking mechanisms. OCC allows you to prevent lost updates and to keep your data correct. It can be implemented by using the ETag of a document: each document within Azure Cosmos DB has an _etag property.
In your scenario, yes, it will return 2 in one of them and 3 in the other, provided both succeed, because the SDK has a retry mechanism, which is explained here. Also have a look at this sample.
If your Azure Cosmos DB account is configured with multiple write regions, conflicts and conflict resolution policies are applicable at the document level, with Last Write Wins (LWW) being the default conflict resolution policy.
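To make the ETag-based pattern concrete, here is a minimal in-memory sketch. It does not use the Cosmos DB SDK: InMemoryStore, PreconditionFailed and increment_with_retry are hypothetical names invented for illustration, and the store only imitates the "replace if the ETag still matches" check. It models the read, modify, conditional-write loop of OCC, not the server-side patch increment itself.

```python
import threading
import uuid


class PreconditionFailed(Exception):
    """Raised when the caller's ETag is stale (analogous to an HTTP 412)."""


class InMemoryStore:
    """Toy single-document store that versions its document with an ETag."""

    def __init__(self, doc):
        self._lock = threading.Lock()  # guards only the compare-and-swap step
        self._doc = dict(doc)
        self._etag = uuid.uuid4().hex

    def read(self):
        with self._lock:
            return dict(self._doc), self._etag

    def replace(self, new_doc, if_match):
        with self._lock:
            if if_match != self._etag:
                raise PreconditionFailed("document changed since it was read")
            self._doc = dict(new_doc)
            self._etag = uuid.uuid4().hex  # every successful write gets a new ETag
            return dict(self._doc)


def increment_with_retry(store, field="counter", max_attempts=1000):
    """Read, increment, and write back; retry if another writer got in first."""
    for _ in range(max_attempts):
        doc, etag = store.read()
        doc[field] += 1
        try:
            return store.replace(doc, if_match=etag)[field]
        except PreconditionFailed:
            continue  # stale ETag: re-read and try again
    raise RuntimeError("gave up after too many conflicting writes")


def run_demo(workers=4, increments_per_worker=25):
    """Run concurrent increments; return the final counter and all values seen."""
    store = InMemoryStore({"counter": 0})
    seen = []
    seen_lock = threading.Lock()

    def work():
        for _ in range(increments_per_worker):
            value = increment_with_retry(store)
            with seen_lock:
                seen.append(value)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return store.read()[0]["counter"], sorted(seen)
```

In this sketch no increment is lost and every caller gets a distinct value back, which mirrors the "2 in one, 3 in the other" outcome described above. The price is that a losing writer has to retry.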

Bulk delete in Azure CosmosDB leads to 429 error

I've implemented bulk deletion as recommended with the newer SDK: I created a list of tasks to delete each item and then awaited them all, and my CosmosClient was configured with BulkOperations = true. As I understand it, the new SDK is supposed to do its magic under the hood and perform a bulk operation.
Unfortunately, I've encountered a 429 response status, meaning my requests hit the request rate limit (it is low, a development-only tier, but nonetheless). I wonder how a single bulk operation can cause a 429 error, and how to implement bulk deletion in a way that is not "per item".
UPDATE: I use Azure Cosmos DB .NET SDK v3 for SQL API with bulk operations support as described in this article
You need to handle 429s for deletes the way you'd handle them for any other operation: create an exception block, trap the status code, check the retry-after value in the header, then sleep for that amount of time and retry.
PS if you're trying to delete all the data in the container, it can be more efficient to delete then recreate the container.
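The retry loop described above could look roughly like the following sketch. It is SDK-independent: ThrottledError is a hypothetical stand-in for whatever exception your client raises on HTTP 429, and retry_after_seconds stands in for the retry-after value from the response header. The sleep function is injectable only so that the loop can be exercised without real delays.

```python
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for an SDK exception that carries HTTP 429."""

    def __init__(self, retry_after_seconds):
        super().__init__("429 Too Many Requests")
        self.status_code = 429
        self.retry_after_seconds = retry_after_seconds


def call_with_retry(operation, max_retries=5, sleep=time.sleep):
    """Run operation(); on a 429, wait for the advertised delay and retry."""
    attempt = 0
    while True:
        try:
            return operation()
        except ThrottledError as err:
            if attempt >= max_retries:
                raise  # give up and surface the throttling error to the caller
            attempt += 1
            sleep(err.retry_after_seconds)
```

Each delete would then be wrapped as call_with_retry(lambda: delete_item(item_id)), where delete_item is whatever function issues the delete in your code.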

CreateContainerIfNotExistsAsync is slower than GetContainer?

I am using Azure Cosmos DB SDK v3. As you know, the SDK supports CreateContainerIfNotExistsAsync, which creates a container if no container matches the provided container id. This is convenient.
But it pings Cosmos DB to find out whether the container exists, whereas GetContainer doesn't, since GetContainer assumes the container exists. So, if my understanding is correct, CreateContainerIfNotExistsAsync needs one more round trip to Cosmos DB for most operations.
So my question is: would it be better to avoid CreateContainerIfNotExistsAsync as much as possible, from an API perspective? The API could then have better latency and save bandwidth.
The difference is explained in the IntelliSense: GetContainer just returns a proxy object, one that simply gives you the ability to execute operations within that container; it performs no network requests. If, for example, you try to read an item (ReadItemAsync) on that proxy and the container does not exist (which also makes the item non-existent), you will get a 404 response.
CreateContainerIfNotExists is also not recommended for hot path operations as it involves a metadata or management plane operation:
Retrieve the names of your databases and containers from configuration or cache them on start. Calls like ReadDatabaseAsync or ReadDocumentCollectionAsync and CreateDatabaseQuery or CreateDocumentCollectionQuery will result in metadata calls to the service, which consume from the system-reserved RU limit. CreateIfNotExist should also only be used once for setting up the database. Overall, these operations should be performed infrequently.
See for more details
Bottom line: unless you expect the container to be deleted through some logical pathway in your application, GetContainer is the right way. It gives you a proxy object that you can use to execute item operations without any network requests.
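One way to apply this advice is to run the create-if-not-exists call once at startup and hand out the cheap proxy everywhere else. The sketch below is not the Cosmos DB SDK: FakeDatabase and ContainerProvider are hypothetical names, and FakeDatabase merely counts how often the "metadata" call would have gone over the network.

```python
class FakeDatabase:
    """Hypothetical stand-in for a database client; counts metadata round trips."""

    def __init__(self):
        self.metadata_calls = 0

    def create_container_if_not_exists(self, container_id):
        self.metadata_calls += 1  # in a real client this is a network request
        return self.get_container(container_id)

    def get_container(self, container_id):
        return ("proxy", container_id)  # local object only, no network request


class ContainerProvider:
    """Ensures the container exists once, then serves the proxy on the hot path."""

    def __init__(self, database, container_id):
        self._database = database
        self._container_id = container_id
        self._initialized = False

    def initialize(self):
        # Call this during application startup, not per request.
        if not self._initialized:
            self._database.create_container_if_not_exists(self._container_id)
            self._initialized = True

    def container(self):
        # Hot path: returns the proxy without touching the metadata plane.
        return self._database.get_container(self._container_id)
```

With this split, request handlers only ever call container(), so the metadata operation happens once per process no matter how many requests are served.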

How to avoid database from being hit hard when API is getting bursted?

I have an API which allows other microservices to call on to check whether a particular product exists in the inventory. The API takes in only one parameter which is the ID of the product.
The API is served through API Gateway in Lambda and it simply queries against a Postgres RDS to check for the product ID. If it finds the product, it returns the information about the product in the response. If it doesn't, it just returns an empty response. The SQL is basically this:
SELECT * FROM inventory where expired = false and product_id = request.productId;
However, the problem is that many services are calling this particular API very heavily to check the existence of products. Not only that, the calls often come in bursts. I assume those services loop through a list of product IDs and check for their existence individually, hence the burst.
The number of concurrent calls on the API has resulted in it making many queries to the database. The rate can burst beyond 30 queries per second, and there can be a few hundred thousand requests to fulfil. The queries are mostly the same, except for the product ID in the where clause. The column has been indexed and a query takes an average of only 5-8ms to complete. Still, the connection to the database occasionally times out when the rate gets too high.
I'm using Sequelize as my ORM, and the error I get when it times out is SequelizeConnectionAcquireTimeoutError. There is a good chance that the burst rate was too high and it maxed out the pool too.
Some options I have considered:
Using a cache layer. But I have noticed that, most of the time, 90% of the product IDs in the requests are not repeated. This would mean that 90% of the time, it would be a cache miss and it will still query against the database.
Auto scale up the database. But because the calls are bursty and I don't know when they may come, the autoscaling won't complete in time to avoid the time out. Moreover, the query is a very simple select statement and the CPU of the RDS instance hardly crosses 80% during the bursts. So I doubt scaling it would do much too.
What other techniques can I use to keep the database from being hit hard when the API gets burst calls that are mostly unique and difficult to cache?
Load a cache at boot time
You can load all necessary columns into an in-memory data store (Redis). Every update in the database (cron job) will then have to be reflected in the cached data.
Problems: memory overhead of updating cache
Limit db calls
Create a buffer for ids. Store n ids and then make one query for all of them. Or empty the buffer every m seconds!
Problems: longer client response time; extra processing to split up the query result
Change your database
Use a NoSQL database for this data. According to this article and this one, I think choosing a NoSQL database is a better idea.
Problems: multiple data stores
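The "Limit db calls" option above can be sketched as a small buffer that collects IDs and issues one query per batch. This is an illustrative sketch only: IdBatcher is a hypothetical name, and fetch_many is an assumed callback that would run a single SELECT ... WHERE product_id IN (...) query and return a dict of the rows it found, keyed by product ID.

```python
class IdBatcher:
    """Buffers product IDs and resolves them with one query per batch."""

    def __init__(self, fetch_many, max_batch=50):
        self._fetch_many = fetch_many  # assumed: list of ids -> {id: row}
        self._max_batch = max_batch
        self._pending = []
        self.results = {}  # id -> row, or None if the product was not found

    def add(self, product_id):
        """Queue an ID; flush automatically once n IDs are buffered."""
        self._pending.append(product_id)
        if len(self._pending) >= self._max_batch:
            self.flush()

    def flush(self):
        """Send one query for everything buffered so far."""
        if not self._pending:
            return
        batch, self._pending = self._pending, []
        found = self._fetch_many(batch)
        for product_id in batch:
            self.results[product_id] = found.get(product_id)
```

The "every m seconds" variant would call flush() from a timer in addition to the size check. Either way, callers have to wait for the flush that contains their ID, which is the response-time cost listed under "Problems".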
Start with a covering index to handle your query. You might create an index like this for your table:
CREATE INDEX inv_lkup ON inventory (product_id, expired) INCLUDE (col, col, col);
Mention all the columns in your SELECT in the index, either in the main list of indexed columns or in the INCLUDE clause. Then the DBMS can satisfy your query completely from the index. It's faster.
You could start using AWS lambda throttling to handle this problem. But, for that to work the consumers of your API will need to retry when they get 429 responses. That might be super-inconvenient.
Sorry to say, you may need to stop using lambda. Ordinary web servers have good stuff in them to manage burst workload.
They have an incoming connection (TCP/IP listen) queue. Each new request coming in lands in that queue, where it waits until the server software accepts the connection. When the server is busy, requests wait in that queue. When there's a high load, the requests wait a bit longer in that queue. In nodejs's case, if you use clustering there's just one of these incoming connection queues, and all the processes in the cluster use it.
The server software you run (to handle your API) has a pool of connections to your DBMS. That pool has a maximum number of connections in it. As your server software handles each request, it awaits a connection from the pool. If no connection is immediately available, request handling pauses until one is available, then proceeds. This too smooths out the requests to the DBMS. (Be aware that each process in a nodejs cluster has its own pool.)
Paradoxically, a smaller DBMS connection pool can improve overall performance, by avoiding too many concurrent SELECTs (or other queries) on the DBMS.
This kind of server configuration can be scaled out: a load balancer will do. So will a server with more cores and more nodejs cluster processes. An elastic load balancer can also add new server VMs when necessary.
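The smoothing effect of a bounded connection pool can be illustrated with a small simulation. This is a toy model, not a real database pool: BoundedPool and simulate_burst are hypothetical names, a semaphore stands in for the pool's connection limit, and a short sleep stands in for the SELECT.

```python
import asyncio


class BoundedPool:
    """Toy connection pool: at most `size` queries run at once, the rest wait."""

    def __init__(self, size):
        self._slots = asyncio.Semaphore(size)
        self.in_use = 0
        self.peak_in_use = 0

    async def run(self, query):
        async with self._slots:  # waits here when all connections are busy
            self.in_use += 1
            self.peak_in_use = max(self.peak_in_use, self.in_use)
            try:
                return await query()
            finally:
                self.in_use -= 1


async def simulate_burst(requests=30, pool_size=5):
    """Fire a burst of requests and report how many ran concurrently at most."""
    pool = BoundedPool(pool_size)

    async def fake_query():
        await asyncio.sleep(0.005)  # stands in for a few-millisecond SELECT
        return True

    results = await asyncio.gather(
        *(pool.run(fake_query) for _ in range(requests))
    )
    return len(results), pool.peak_in_use
```

Calling asyncio.run(simulate_burst(30, 5)) completes all 30 requests while the simulated DBMS never sees more than 5 at a time. The burst is absorbed as waiting time in the application instead of as concurrent load on the database.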

Load balancing Or Read replica in AWS RDS

The MVC application runs on EC2 and interacts with RDS (SQL Server). The application sends bulk GET requests (API calls) to RDS via NHibernate to get items. Performance is very slow, as it sometimes makes around 500 GET API calls to get 500 items from the DB (note: getting an item from the DB goes through its own stored procedure/logic).
I was referring to this to understand scaling RDS.
However, I didn’t find much that fits my business scenario.
My questions are(considering above scenario):
Is there any way to distribute my GET request to RDS (SQL Server) so that it can return the 500 items from SQL server quickly?
Is it possible to achieve this without any code or existing architecture change (both on the .NET and the SQL end)?
What are the different ways I should try out to make this performance better?
What are the pricing details for Read replica?
Note: The application does both reads and writes, and I’m more concerned about these particular GET API calls.
Is there any way to distribute my GET request to RDS (SQL Server) so that it can return the 500 items from SQL server quickly?
You will need a router in your application that routes read requests to the read replicas (there can be several).
You can provision a read replica with a different instance type that has more capacity for that use case.
You can also try an in-memory cache to reduce response time, and offload the read workload to the read replicas.
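A minimal sketch of such a router, assuming reads can be told apart from writes at the call site. ReadWriteRouter and the endpoint strings are hypothetical. A real application would hold one connection or session factory per endpoint rather than plain strings.

```python
import itertools


class ReadWriteRouter:
    """Sends writes to the primary and spreads reads across the replicas."""

    def __init__(self, primary, replicas):
        self._primary = primary
        # Round-robin over the replicas; fall back to the primary if none exist.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, is_read):
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self._primary
```

For example, with a primary and two replicas, successive reads alternate between the two replica endpoints while every write goes to the primary. Note that replicas can lag behind the primary, so reads that must see a write that was just made should still go to the primary.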
Is it possible to achieve this without any code or existing architecture change (both on the .NET and the SQL end)?
No. According to the documentation, "applications can connect to a read replica just as they would to any DB instance." The replica is a separate instance that the application has to connect to explicitly, which means your application requires some modification to support this use case.
What are the different ways I should try out to make this performance better?
An in-memory cache and an instance type with more capacity for reads (the same suggestions as above).
What are the pricing details for Read replica?
It depends on the instance type that you provision.
