Request Timeout in Azure Cosmos DB with SDK v3

I am inserting data into Azure Cosmos DB. After some time it throws an error (Request Timeout: 408). I have increased the request timeout to 10 minutes.
Also, I iterate over each item from the API and call CreateItemAsync() for each one instead of using the bulk executor.
Data to insert = 430K items
Microsoft.Azure.Cosmos SDK used = v3
Container throughput = 400 RU/s
Can anyone help me fix this issue?

Just increase your throughput. But it's going to cost you a lot of money if you leave it increased. 400 RU/s isn't going to cut it unless you batch your operation to the point where it's going to take a long time to insert 400k items.
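To get a feel for why 400 RU/s is slow for this load, here is a rough back-of-the-envelope estimate in Python. It is only a sketch: the 5 RU per write is an assumed figure (the real cost depends on document size and indexing, and you would read it from the request charge of an actual write), and `estimated_load_seconds` is a hypothetical helper name.

```python
def estimated_load_seconds(item_count: int, ru_per_write: float,
                           provisioned_ru_per_second: float) -> float:
    """Lower bound on load time if writes use the full RU/s budget."""
    return item_count * ru_per_write / provisioned_ru_per_second


# Assumption: each write costs about 5 RU (not stated in the question).
at_400 = estimated_load_seconds(430_000, 5, 400)
at_2000 = estimated_load_seconds(430_000, 5, 2000)
print(f"400 RU/s:  ~{at_400 / 60:.0f} minutes")
print(f"2000 RU/s: ~{at_2000 / 60:.0f} minutes")
```

Under that assumption the load takes at least about an hour and a half at 400 RU/s, and about a fifth of that at 2000 RU/s; throttling and retries only add to it.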
If this is a one-time deal, increase your RU/s to 2000+, then start slowly inserting items. I would say, depending on the size of your documents, maybe do 50 at a time, then wait 250 milliseconds, then do 50 more until you are done. You will have to play with this though.
Once you are done, move your RU/s back down to 400.
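The pacing described above (insert about 50 items, pause 250 milliseconds, repeat) might be sketched like this in Python. It is not the .NET SDK: `insert_item` is a hypothetical callable standing in for whatever writes one document (for example a wrapper around CreateItemAsync()), and the batch size and pause are just the starting values suggested above.

```python
import time
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items."""
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def insert_paced(
    items: Iterable[T],
    insert_item: Callable[[T], None],  # hypothetical: writes one document
    batch_size: int = 50,              # starting value from the answer; tune it
    pause_seconds: float = 0.25,       # 250 ms pause after each batch
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Insert items in small batches, pausing after each batch.

    Returns the number of items written.
    """
    written = 0
    for batch in chunked(items, batch_size):
        for item in batch:
            insert_item(item)
            written += 1
        sleep(pause_seconds)
    return written
```

Passing `sleep` as a parameter keeps the function easy to test without real delays; in normal use the default `time.sleep` applies.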
Cosmos DB can be ridiculously expensive, so be careful.
ETA:
This is from some documentation:
Increase throughput: The duration of your data migration depends on the amount of throughput you set up for an individual collection or a set of collections. Be sure to increase the throughput for larger data migrations. After you've completed the migration, decrease the throughput to save costs. For more information about increasing throughput in the Azure portal, see performance levels and pricing tiers in Azure Cosmos DB.

The documentation page for 408 timeouts lists a number of possible causes to investigate.
Aside from addressing the root cause with the SDK client app or increasing throughput, you might also consider leveraging Azure Data Factory to ingest the data as in this example. This assumes your data load is an initialization process and your data can be made available as a blob file.

Cosmos DB metrics report 100x more requests than expected

I'm comparing the service side metrics of my app with the metrics emitted by Cosmos DB and I can see a 100x difference in request counts.
Is my container misconfigured? Am I querying the wrong way? Is Cosmos performing multiple requests internally for each query I'm running against it?
The metric I'm looking at in Cosmos is TotalRequests/Count/5min.
The container has indexes on all attributes + a few composite indexes.
The query I'm running is:
SELECT *
FROM x
WHERE x.partitionKey = 0
and x.index1 = 1
and x.index2 = 2
The container is suffering from a VERY hot partition.
Each request consumes about 5 RUs.
The consistency level is BOUNDED_STALENESS.
I tried changing the consistency level to EVENTUAL which brought the consumed RUs down, but I'm still seeing a huge amount of requests that aren't accounted for.
The Total Requests metric includes every request between the SDK and the service. The SDK makes frequent calls to the service when an SDK instance is first created, then makes regular calls for metadata and other information. If you want to see only the requests made by the user, apply a filter on OperationType and select the operations you want to monitor.
It's not clear why you were using Bounded Staleness. Reads using Strong and Bounded Staleness consume twice the RUs because they read from 2 replicas rather than the 1 replica used by the weaker consistency models. In addition to differences in cost, there are of course differences in whether you may read stale data or not. They also play a big role in your RTO and RPO in multi-region scenarios.
A hot partition does not affect throughput consumption. 5 RUs for a query is actually very good.

Azure Cache Redis Slow Performance rather than DB query

We have implemented Azure Cache for Redis for our project.
But the problem is that Azure Cache query performance is slower than the DB query.
For one query, the Redis response averages 115 ms while the DB query averages 60 ms.
For another query, the Redis response averages 200 ms while the DB query averages 210 ms.
I expected Redis queries to return in around 50 ms for all requests.
Is this normal, or are we missing something?
Maybe speed is not the point all the time?
The performance of Azure Redis cache queries depends on several factors:
First, check the source from which you are querying the Redis cache. If the source and the Redis cache resource are in different regions, there may be significant network latency.
The pricing tier of the Redis cache also plays a crucial role in performance.
Use the redis-benchmark.exe utility to check the throughput and characteristics of your cache.
You can also consider scaling to improve the performance of the Redis cache.
Possible reasons why query run time varies:
Different data volumes
Different system load
Different query plans
This also can depend on the type of query. In this case SQL database vs Redis.
A website tested this and reached the following conclusion:
Especially if your request number is large, it's always better to use db and Redis together. Otherwise, both are nearly equal and change in their timings per query.
Related:
Why does the same query takes different amount of time to run?
Redis vs mySQL

Cosmos Write Returning 429 Error With Bulk Execution

We have a solution utilizing a microservice approach. One of our microservices is responsible for pushing data to Cosmos. Our Cosmos database uses serverless provisioning with a 5,000 RU/s limit.
The data we are inserting into Cosmos looks like the below. There are 10 columns and we are pushing a batch containing 5,807 rows of this data.
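| Id | CompKey | Primary Id | Secondary Id | Type | DateTime | Item | Volume | Price | Fee |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Veg_Buy | csd2354csd | dfg564dsfg55 | Buy | 30/08/21 | Leek | 10 | 0.75 | 5.00 |
| 2 | Veg_Buy | sdf15s1dfd | sdf31sdf654v | Buy | 30/08/21 | Corn | 5 | 0.48 | 3.00 |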
We are retrieving data from multiple sources, normalizing it, and sending out the data as one bulk execution to Cosmos. The retrieval process happens every hour. We understand that we are spiking the Cosmos database once per hour with the data that has been retrieved and then stop sending data until the next retrieval cycle. So if this high peak is the problem, what remedies exist for such a scenario?
Can anyone shed some light on what we should/need to do to overcome this issue? Perhaps we are missing a setting when creating the Cosmos database or possibly this has something to do with partitioning?
You can mostly determine these things by looking at the metrics published in the Azure Portal. This doc is a good place to start, Monitor and debug with insights in Azure Cosmos DB.
In particular I would look at the section titled "Determine the throughput consumption by a partition key range".
If you are not dealing with a hot partition key, you may want to look at options to throttle your writes. This may include modifying your batch size and putting the write operations in a while loop with a one-second timer, so that the RU/s consumed stays at or below 5000 RU/s. You could also look at queue-based load leveling: put writes on a queue in front of Cosmos and stream them in.
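One way to picture the throttling idea is to split the hourly batch into per-second slices whose estimated cost fits inside the RU/s budget. The Python sketch below does only the planning step. The 10 RU per write is an assumed figure (measure the request charge of a real write instead), and `plan_slices` is a hypothetical helper, not part of any Cosmos SDK.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def plan_slices(rows: Sequence[T], ru_budget_per_second: float,
                ru_per_write: float) -> List[Sequence[T]]:
    """Split rows into slices that each fit within one second's RU budget."""
    if ru_per_write <= 0 or ru_budget_per_second < ru_per_write:
        raise ValueError("budget must cover at least one write")
    per_second = int(ru_budget_per_second // ru_per_write)
    return [rows[i:i + per_second] for i in range(0, len(rows), per_second)]


# 5,807 rows and a 5,000 RU/s limit come from the question;
# 10 RU per write is an assumption for illustration.
slices = plan_slices(list(range(5807)), ru_budget_per_second=5000, ru_per_write=10)
print(len(slices), "slices, last one has", len(slices[-1]), "rows")
```

Each slice would then be written about one second apart, or pushed onto a queue that a consumer drains at that rate.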

What happens if an Azure Cosmos container runs out of RUs?

Example: with 1000 RUs/hr, if all 1000 RUs are used up within the hour, what happens to the 1001st request?
Cosmos DB request units are per second not per hour.
So with 1000 RU/s you can run 1000 queries per second that each take 1 RU or you can run 100 queries per second that each take 10 RUs.
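The arithmetic above, written out as a small Python sketch (`max_ops_per_second` is a hypothetical helper name, and it assumes every operation costs the same number of RUs):

```python
def max_ops_per_second(provisioned_ru_per_second: float,
                       ru_per_operation: float) -> float:
    """Operations per second that fit in the provisioned RU/s budget."""
    return provisioned_ru_per_second / ru_per_operation


print(max_ops_per_second(1000, 1))   # 1000.0 operations of 1 RU each
print(max_ops_per_second(1000, 10))  # 100.0 operations of 10 RUs each
```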
You might be confused because the billing is per hour based on the maximum amount of RUs you defined in that hour.
So if you set 1000 RUs on a collection and then change to 400 RUs, you still get billed for 1000 RUs for the current hour.
If you exceed that number, you get a 429 error back from Cosmos DB like the other answer states.
If you use the Cosmos DB SDK, you usually don't need to worry about this, as it will automatically retry the request after some time if it gets a 429.
This retry policy is configurable, so you can decide how many times retry should be attempted.
Getting some 429s is usually expected, otherwise you might be over-provisioning throughput in Cosmos DB.
That is a really interesting question. Some information suggests that if you hit the provisioned RU/s rate limit for any operation or query, the Cosmos DB service won't execute the operation and the API will throw a DocumentClientException with the HttpStatusCode property set to 429. This HTTP status code means that the request made to Azure Cosmos DB has exceeded the provisioned throughput and couldn't be executed.
429 Too many requests
The collection has exceeded the provisioned throughput limit. Retry the request after the server specified retry after duration
Cosmos DB throttles your requests intelligently
When you’re exceeding your RU quota, Cosmos DB doesn’t reject your additional requests by just screaming ERROR! Not only does it explicitly flag these throttled requests with the HTTP status code 429, but the response also provides a very useful header: x-ms-retry-after-ms. As its name implies, this header tells you how much time you should wait before re-trying.
Although this hint has its own limits (it may not be very reliable if multiple clients overload your RU quota at the same time), it's still very useful information to have in order to define the cool-off period one should wait and to avoid a retry policy that would be too aggressive.
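A client-side retry loop that honors the server's hint could look roughly like this. It is a Python sketch, not the Cosmos DB SDK's own retry policy: `ThrottledError` and `operation` are hypothetical stand-ins for a 429 response carrying the x-ms-retry-after-ms value and for the request being retried, and the default of 5 retries is arbitrary.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for a 429 (Too many requests) response."""

    def __init__(self, retry_after_ms: int) -> None:
        super().__init__(f"throttled, retry after {retry_after_ms} ms")
        self.retry_after_ms = retry_after_ms  # value of x-ms-retry-after-ms


def call_with_retry(
    operation: Callable[[], T],
    max_retries: int = 5,  # arbitrary default for this sketch
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, waiting the server-suggested time after each 429."""
    attempt = 0
    while True:
        try:
            return operation()
        except ThrottledError as err:
            if attempt >= max_retries:
                raise  # give up and surface the throttling error
            attempt += 1
            sleep(err.retry_after_ms / 1000.0)
```

As the quoted text notes, the hint is less reliable when several clients overload the quota at once, so a real policy might add some random jitter on top of the suggested wait.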
Ref
https://learn.microsoft.com/en-us/rest/api/cosmos-db/http-status-codes-for-cosmosdb#:~:text=429%20Too%20many%20requests,more%20information%2C%20see%20request%20units.
https://medium.com/@thomasweiss_io/how-i-learned-to-stop-worrying-and-love-cosmos-dbs-request-units-92c68c62c938

Can a partitioned CosmosDB / DocumentDB collection have fewer than 400 RU/s of throughput configured?

Update: This question is now invalid as the events I'd thought happened didn't happen quite as I'd thought (see below for details). I'm leaving the question as-is though as the answers and comments may be useful to others.
I've created a collection via the Azure Portal, configured initially with:
Storage Capacity: Unlimited
Initial Throughput Capacity (RU/s): 2500
Partition Key: /PartitionKey
Then through the .NET SDK I've changed the Initial Throughput Capacity (RU/s) to 400.
According to the Scale & Settings tab for the collection in the Azure Portal the value of Throughput (400 - 10,000 RU/s)* is 400.
Is this a supported configuration? I'm assuming this is a bug somewhere but perhaps it isn't? What would I be charged for this collection?
As an aside...
The Add Collection screen doesn't allow me to set the Throughput to 400 on initial creation but it seems I can change it afterwards.
Update: I think I've worked out what happened. I manually created a partitioned collection, then I forgot that my code (an importer/migration tool I'm working on) deletes the database and recreates the database and collection on startup. When it does this, it's created as a non-partitioned collection. Now that I've corrected this, I get the error "The offer should have valid throughput values between 2500 and 100000 inclusive in increments of 100." if I try to reproduce what I thought I'd managed to do before.
You're not seeing a bug. You're attempting to set an RU range on a partitioned collection.
Single-partition collections (10GB) allow for 400-10000 RU.
What you're showing in your question is a partitioned collection, with scale starting at 2500 RU.
And you cannot configure a partitioned collection for 400 RU, whether through the portal or through API/SDK.
