We are planning on writing 10,000 JSON documents to Azure Cosmos DB (MongoDB API). Do the throughput units (RUs) matter? If so, can we increase them for the batch load and then set them back to a lower number?
Yes, you can do that. The lowest the RUs can go is 400. Scale up just before you do your inserts, then turn it back down afterwards. As always, that part can be automated if you know when the documents are going to be inserted.
Check out the DocumentClient documentation and more specifically ReplaceOfferAsync.
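For illustration, a minimal sketch of that scale-up/scale-down pattern with the v2 .NET SDK's DocumentClient (the database/collection ids and the new RU value are placeholders):

```csharp
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

static async Task ScaleCollectionAsync(DocumentClient client, string dbId, string collId, int newRUs)
{
    // Read the collection so we can locate its Offer (the resource that holds throughput).
    DocumentCollection collection = await client.ReadDocumentCollectionAsync(
        UriFactory.CreateDocumentCollectionUri(dbId, collId));

    // Find the Offer attached to this collection.
    Offer offer = client.CreateOfferQuery()
        .Where(o => o.ResourceLink == collection.SelfLink)
        .AsEnumerable()
        .Single();

    // Replace the Offer with the new throughput value (minimum 400 RU/s).
    await client.ReplaceOfferAsync(new OfferV2(offer, newRUs));
}
```

Call it with a high value right before the bulk insert and again with a low value afterwards.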
You can scale the RU/sec allocation up or down at any time. You'll want to look at the insertion cost for a typical document (the RU cost is returned in a response header) to get an idea of how many documents you can write per second before getting throttled.
Also keep in mind: if you scale your RUs beyond what a single underlying physical partition can provide, Cosmos DB will scale out your collection across additional physical partitions. This means you might not be able to scale your RUs back down to the bare minimum later (though you will still be able to scale down).
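A quick sketch of checking that per-document cost with the v2 SDK ("db"/"coll" and the document shape are placeholders; RequestCharge surfaces the x-ms-request-charge response header):

```csharp
// Insert a representative document and inspect what it cost.
ResourceResponse<Document> response = await client.CreateDocumentAsync(
    UriFactory.CreateDocumentCollectionUri("db", "coll"),
    new { id = "1", name = "example" });

double ruCost = response.RequestCharge; // e.g. ~5 RU for a small document
Console.WriteLine($"Insert cost: {ruCost} RU");
```

Dividing your provisioned RU/s by that per-insert charge gives a rough documents-per-second ceiling.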
Related
I'm trying to figure out why my CreateItem<T> call returns a request charge of 7.05 RU for inserting a simple object with 5 properties, less than 1 KB including the id and partition key. It's just a default container, with default indexing etc.
The documentation says it should be 5 RU. I've tried using ItemRequestOptions to disable returning the object, disable indexing for that item, etc., but it stays at 7.05 RU. I've also tried changing the consistency level of the account, but nothing changes.
A ReadItem on id and partition key returns an RU charge of 1.0 as expected, and a charge of 2.9 if run as a query.
I'm a little annoyed at the 7.05 RU cost for the CreateItem, since it's a 41% increase in RU cost over the 5 RUs. Enabling TTL jumped the RU cost to 7.65.
What is the best way to diagnose these kinds of issues? I've tried looking at the response diagnostics, but it's not really helpful.
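For reference, a minimal sketch of the v3 SDK calls described above (the container, the MyDoc type, and its properties are placeholders):

```csharp
// Options the question mentions trying: no response body, no indexing for this item.
var options = new ItemRequestOptions
{
    EnableContentResponseOnWrite = false,
    IndexingDirective = IndexingDirective.Exclude
};

ItemResponse<MyDoc> created = await container.CreateItemAsync(
    doc, new PartitionKey(doc.Pk), options);
Console.WriteLine($"Create: {created.RequestCharge} RU");

// Point read on id + partition key (the 1.0 RU case mentioned above).
ItemResponse<MyDoc> read = await container.ReadItemAsync<MyDoc>(
    doc.Id, new PartitionKey(doc.Pk));
Console.WriteLine($"Read: {read.RequestCharge} RU");
```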
RU consumption on write operations will be heavily impacted by your indexing policy.
By default, a Cosmos DB container has indexing enabled for the whole document (every property). If you replace that with only the specific paths that are used heavily in queries, it can significantly improve the performance of write operations as well as reduce RU consumption.
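For instance, a sketch of a container definition that excludes everything by default and indexes only a couple of heavily queried paths (the ids and paths are placeholders):

```csharp
var props = new ContainerProperties("myContainer", "/pk");

// Exclude all paths by default...
props.IndexingPolicy.ExcludedPaths.Add(new ExcludedPath { Path = "/*" });

// ...then include only the paths your queries actually filter or sort on.
props.IndexingPolicy.IncludedPaths.Add(new IncludedPath { Path = "/name/?" });
props.IndexingPolicy.IncludedPaths.Add(new IncludedPath { Path = "/category/?" });

Container container = await database.CreateContainerIfNotExistsAsync(props);
```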
Also, please note that there is no guaranteed RU consumption per document for queries (see the documentation). It will vary based on the index, partition keys, partition cardinality, geo-replication, "fan-out" during execution, etc.
I'm a beginner to Azure. I'm using log monitoring to view the logs for a Cosmos DB resource, and I can see one log entry for a Replace operation that is consuming a high average RU count.
Generally, operation names are CREATE/DELETE/UPDATE/READ. Why has a REPLACE operation shown up here, and why is it consuming so many RUs?
What can I try next?
Updates in Cosmos DB are full replacement operations rather than in-place updates; as such, they consume more RUs than inserts. Also, the larger the document, the more throughput the update requires.
Strategies to optimize throughput consumption on update operations typically center around splitting a document in two: the properties that don't change go into one document, which is typically larger, and the properties that change frequently go into another, smaller document. Updates are then made against the smaller document, which reduces the RUs consumed by the operation (see the sketch below).
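A sketch of that split (the type and property names are invented for illustration):

```csharp
// Rarely-changing data lives in one (typically larger) document...
public class CustomerProfile
{
    public string Id { get; set; }      // e.g. "profile-123"
    public string Pk { get; set; }      // partition key
    public string Name { get; set; }
    public string Address { get; set; }
    // ...many more properties that seldom change
}

// ...while frequently-updated state lives in a second, much smaller document.
public class CustomerActivity
{
    public string Id { get; set; }      // e.g. "activity-123"
    public string Pk { get; set; }      // same partition key, so both can be read together cheaply
    public DateTime LastSeen { get; set; }
    public int LoginCount { get; set; }
}
```

Frequent updates then replace only the small document, so each replace moves far fewer bytes.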
All that said, 12 RUs is not an inordinate charge for a replace operation. I don't think you will get much, if any, throughput reduction by doing this. But you can certainly try.
I've been thinking about scaling my collection up before initiating a large write or bulk import.
However, I'm stuck because I don't know how many RUs it costs to perform the scaling operation itself. It's possible that it could cost more to scale up, execute the writes, and scale back down than it would to just leave the throughput at a constant level.
Naturally, there are concerns around the time between writes, how long the process takes, etc., but I can't really approach the question without knowing the cost of scaling. I'm curious whether anyone has a policy or rule of thumb they use to control this.
The cost of the scaling operation itself is the same as the cost of updating any other resource in Cosmos DB.
What you need to know is that, from the Database down to the Document, everything inherits from a single type: the Resource.
What you are talking about is updating an Offer, which is the Resource that holds the collection's offer data, such as its throughput. Updating the Offer costs the same as updating any other document of that size (somewhere around 5-10 RUs).
Keep in mind, however, that Cosmos DB charges you on an hourly basis, based on the maximum provisioned throughput of the collection for that hour. This means that even if you upscale and then immediately downscale the throughput, you will still be charged for one full hour at that maximum throughput.
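As a back-of-the-envelope example (assuming the ~$6 per 100 RU/s per month rate quoted elsewhere in this thread, i.e. roughly $0.008 per 100 RU/s per hour): bursting from 400 to 10,000 RU/s for even ten minutes bills the entire hour at 10,000 RU/s, or about 100 × $0.008 ≈ $0.80 for that hour, versus roughly $0.03 for an hour at 400 RU/s.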
Is there a way to calculate how many RUs I would need if a DocumentDB database is expected to have roughly 800 writes a second and 1,500 reads a second?
Each read is a simple retrieve based on the index, and each item will have about 15 small data fields (a few bools, short strings, and short doubles).
Each write will be an update of most of the data values for the record.
The documentation states that 1 RU = a 1 KB GET. Each GET in this instance should be less than 1 KB, I suspect, so the reads would be about 1,500 RU/s, but I have no idea how to calculate the writes; any help would be greatly appreciated.
There's a simple-to-use capacity planning tool available online. You simply upload a sample JSON document, specify how many reads and writes per second you expect, and it will estimate your required RU/s throughput.
As David so eloquently pointed out, this should only be used as a starting point to give you a ballpark of what your minimum RU cost might be. If your primary read pattern were simply retrieving documents directly by their id, it might be relatively accurate. In reality, RU cost is calculated based on the complexity of your queries. So once you have your baseline, it's important to do a proper analysis of your query patterns and get a feel for their RU cost.
Luckily, the ease and speed with which you can scale Cosmos in response to load is, in my opinion, one of its most compelling features. In my experience, adding or removing RU throughput takes a matter of seconds, so you can definitely add a layer of intelligent database tuning within your application to optimize your cost and usage.
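With the newer v3 .NET SDK, that tuning layer can be as small as a couple of calls (a sketch; the 10,000 figure and the import routine are placeholders):

```csharp
int? currentRUs = await container.ReadThroughputAsync();   // current provisioned RU/s

await container.ReplaceThroughputAsync(10000);             // scale up before the heavy load

await RunBulkImportAsync(container);                       // hypothetical bulk-import routine

await container.ReplaceThroughputAsync(currentRUs ?? 400); // restore (400 RU/s minimum)
```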
We migrated our mobile app (still under development) from Parse to Azure. Everything is running, but the price of DocumentDB is so high that we can't continue with Azure without fixing that. We're probably doing something wrong.
1) The price seems to be bottlenecked by the DocumentDB requests.
Running a process to load the data (about 0.5 million documents), memory and CPU were fine, but the DocumentDB request limit was a bottleneck, and the price charged was very high.
2) Even after the end of this data migration (a few days of processing), Azure continues to charge us every day.
We can't understand what is going on here. The usage graphs are flat, but the price keeps climbing, as you can see in the images.
Any ideas?
Thanks!
From your screenshots, you have 15 collections under the Parse database. With Parse: Aside from the system classes, each of your user-defined classes gets stored in its own collection. And given that each (non-partitioned) collection has a starting run-rate of ~$24/month (for an S1 collection), you can see where the baseline cost would be for 15 collections (around $360).
You're paying for reserved storage and RU capacity. Regardless of RU utilization, you pay whatever the cost is for that capacity (e.g. S2 runs around $50/month / collection, even if you don't execute a single query). Similar to spinning up a VM of a certain CPU capacity and then running nothing on it.
The default throughput setting for the Parse collections is 1,000 RU/s. This costs $60 per collection per month (at the rate of $6 per 100 RU/s). Once you finish the Parse migration, the throughput can be lowered if you believe the workload has decreased. This will reduce the charge.
To learn how to do this, take a look at https://azure.microsoft.com/en-us/documentation/articles/documentdb-performance-levels/ (Changing the throughput of a Collection).
The key thing to note is that DocumentDB delivers predictable performance by reserving resources to satisfy your application's throughput needs. Because application load and access patterns change over time, DocumentDB allows you to easily increase or decrease the amount of reserved throughput available to your application.
Azure is a "pay-for-what-you-use" model, especially around resources like DocumentDB and SQL Database where you pay for the level of performance required along with required storage space. So if your requirements are that all queries/transactions have sub-second response times, you may pay more to get that performance guarantee (ignoring optimizations, etc.)
One thing I would seriously look into is the DocumentDB cost estimation tool; this allows you to estimate throughput costs for various transaction types, based on sample JSON documents you provide:
So in this example, I have an 8KB JSON document, where I expect to store 500K of them (to get an approx. storage cost) and specifying I need throughput to create 100 documents/sec, read 10/sec, and update 100/sec (I used the same document as an example of what the update will look like).
NOTE: this needs to be done PER DOCUMENT TYPE -- if you're storing documents that don't all conform to a given "schema" or structure in the same collection, then you'll need to repeat this process for EVERY type of document.
Based on this information, I can use those values as inputs to the pricing calculator. This tells me I can estimate about $450/month for DocumentDB services alone (if this were my anticipated usage pattern).
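As a rough sanity check against the rules of thumb quoted elsewhere in this thread (~1 RU per 1 KB read; a small ~1 KB write costing around 5 RU and scaling with document size): an 8 KB read is roughly 8 RU and an 8 KB write might land around 40 RU, so (100 creates × 40) + (100 updates × 40) + (10 reads × 8) ≈ 8,080 RU/s, which at ~$6 per 100 RU/s per month comes out in the same ballpark as the calculator's $450/month figure.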
There are additional ways to optimize Request Units (RUs -- the metric used to measure the cost of a given request/transaction, and what you're billed for): optimizing indexing strategies, optimizing queries, etc. Review the documentation on Request Units for more details.