I am trying to pass more than 3 MB of JSON data as an input parameter to a Cosmos DB stored procedure, but I get an error:
RequestEntityTooLarge
Is there a limitation, or is there another way to do this?
Below is a screenshot where I simply return a constant, to check whether there is any RU consumption tied to the data parameter or not.
Note: the container's throughput is set to 10K RU/s.
Hi, the maximum size for an item is 2 MB.
Per-item limits
Depending on which API you use, an Azure Cosmos item can represent either a document in a collection, a row in a table, or a node or edge in a graph. The following table shows the limits per item in Cosmos DB.
Maximum size of an item: 2 MB (UTF-8 length of JSON representation)
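If the payload is really a batch of documents, one workaround is to split it into chunks that stay under the request size limit and call the stored procedure once per chunk. Here is a rough sketch with the .NET SDK v3; the MyDoc type, the stored procedure name bulkImport and the partition key value "myPk" are placeholders, not names from the question.

// Sketch only: MyDoc, bulkImport and "myPk" are illustrative placeholders.
using System.Collections.Generic;
using System.Text.Json;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public record MyDoc(string id, string partitionKey, string payload);

public static async Task ImportInChunksAsync(Container container, IReadOnlyList<MyDoc> docs)
{
    const int maxBytesPerCall = 1_500_000; // stay safely below the 2 MB request cap
    var chunk = new List<MyDoc>();
    long chunkBytes = 0;

    foreach (var doc in docs)
    {
        long docBytes = JsonSerializer.SerializeToUtf8Bytes(doc).Length;
        if (chunk.Count > 0 && chunkBytes + docBytes > maxBytesPerCall)
        {
            // One stored procedure call per chunk, each well under the request size limit.
            await container.Scripts.ExecuteStoredProcedureAsync<int>(
                "bulkImport", new PartitionKey("myPk"), new dynamic[] { chunk });
            chunk.Clear();
            chunkBytes = 0;
        }
        chunk.Add(doc);
        chunkBytes += docBytes;
    }

    if (chunk.Count > 0)
    {
        await container.Scripts.ExecuteStoredProcedureAsync<int>(
            "bulkImport", new PartitionKey("myPk"), new dynamic[] { chunk });
    }
}

Note that a stored procedure executes against a single partition key, so every call in this sketch targets the same placeholder key.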
I have a container that stores ~5000 documents. Each document is not very large. The most frequent query is simply to select everything in the container (so that the frontend can display it in a nice table client-side). Each document has a unique ID, and I was using this as the partition key (/id) for the container, but I have read that querying data like this is more efficient in terms of time and RU/s when all the data comes from the same partition, as that avoids cross-partition queries.
Can I create a container without a partition key? Or a container that only has one partition? Will I have to add a property with the same value to every document to force this, or is there an easier way?
The number of partitions of a container is defined by the provisioned RU and data size: https://learn.microsoft.com/azure/cosmos-db/partitioning-overview#physical-partitions
So, if you create a container with less than 10K RU/s and keep the data size small (< 50 GB), it should be a single physical partition.
If you use a single value for your partition key, you will hit the data cap: https://learn.microsoft.com/azure/cosmos-db/sql/troubleshoot-forbidden#partition-key-exceeding-storage because your database simply won't be able to scale.
Looking at the number of documents in your container (~5000), it would land in a single physical partition unless you have a huge RU requirement above 10,000 RU/s. Assuming your RU configuration is less than 10,000 RU/s, this will be a single physical partition. You can confirm this by looking at the Metrics (classic) option in the left-hand blade of the portal.
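To make this concrete, here is a minimal sketch with the .NET SDK v3 (the database, container and item type names are placeholders): a partition key path is required when creating a container, but with ~5000 small documents and well under 10K RU/s everything stays in one physical partition, so even the "select everything" query is served from a single physical partition.

// Sketch only; "mydb", "items" and MyItem are illustrative names.
using System.Collections.Generic;
using Microsoft.Azure.Cosmos;

var client = new CosmosClient("<connection-string>");
var database = client.GetDatabase("mydb");

// A partition key path is mandatory; /id is fine for this workload.
Container container = await database.CreateContainerIfNotExistsAsync(
    new ContainerProperties(id: "items", partitionKeyPath: "/id"),
    throughput: 1000);

// "Select everything" spans logical partitions, but with < 10K RU/s and < 50 GB
// there is only one physical partition to fan out to.
var all = new List<MyItem>();
using var iterator = container.GetItemQueryIterator<MyItem>("SELECT * FROM c");
while (iterator.HasMoreResults)
{
    all.AddRange(await iterator.ReadNextAsync());
}

public record MyItem(string id);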
We are using Azure Data Factory to copy data from an on-premises SQL table to a REST endpoint, for example Google Cloud Storage. Our source table has more than 3 million rows. Based on the documentation https://learn.microsoft.com/en-us/azure/data-factory/connector-rest#copy-activity-properties, the default value for writeBatchSize (the number of records written to the REST sink per batch) is 10,000. I tried increasing the value to 5,000,000 and 1,000,000 and noticed that the final file size is the same, which shows that not all 3M records were written to GCS. Does anyone know what the maximum value for writeBatchSize is? Pagination seems to apply only when REST is used as the source. I wonder if there is any workaround for my case?
I'm investigating why we're exhausting so many RUs in Cosmos. Our writes consume the expected amount of RUs, but our reads are through the roof: an order of magnitude more than our writes. I tried to strip it down to the simplest scenario: a single request querying a partition with no results uses up 2,000 RUs. Why is this so expensive?
var query = new QueryDefinition(
        "SELECT * FROM c WHERE c.partitionKey = @partitionKey ORDER BY c._ts ASC, c.id ASC")
    .WithParameter("@partitionKey", id.Value);   // parameter names must start with @

using var queryResultSetIterator = container.GetItemQueryIterator<MyType>(query,
    requestOptions: new QueryRequestOptions
    {
        PartitionKey = new PartitionKey(id.Value.ToString()),
    });

while (queryResultSetIterator.HasMoreResults)
{
    foreach (var response in await queryResultSetIterator.ReadNextAsync())
    {
        yield return response.Data;
    }
}
The partition key of the collection is /partitionKey. The RU capacity is set directly on the container, not shared. We have a composite index matching the ORDER BY clause: _ts asc, id asc. Although I'm not sure how that would make any difference when no records are returned.
Unfortunately the SDK doesn't appear to give you the RUs spent when querying this way, so I've been using Azure Monitor to observe RU usage.
Is anyone able to shed any light on why this query, returning zero records and limited to a single partition, would take 2K RUs?
Update:
I just ran this query on another instance of the database in the same account. Both are configured identically. DB1 has 0 MB in it, DB2 has 44 MB in it. For the exact same operation, returning no records, DB1 used 111 RUs and DB2 used 4,730 RUs: over 40 times more for the same no-result query.
Adding some more detail: the consistency level is set to consistent prefix, and it's a single region.
Another Update:
I've replicated the issue just by querying via the Azure portal, and it's related to the number of records in the container. Looking at the query stats, it's as though it's loading every single document in the container in order to filter on the partition key. Is the partition key not the most performant way to search? Doesn't Cosmos know exactly where to find documents belonging to a partition key by design?
2445.38 RUs (showing results 0 - 0)
Retrieved document count: 65671
Retrieved document size: 294343656 bytes
Output document count: 0
Output document size: 147 bytes
Index hit document count: 0
Index lookup time: 0 ms
Document load time: 8804.06 ms
Query engine execution time: 133.11 ms
System function execution time: 0 ms
User defined function execution time: 0 ms
Document write time: 0 ms
I eventually got to the bottom of the issue. In order to filter on the partition key, it needs to be indexed, which strikes me as odd considering the partition key is used to decide where a document is stored, so you'd think Cosmos would inherently know the location of every partition key value.
Including the partition key in the list of indexed paths solved my problem. It also explains why performance degraded over time as the database grew in size: it was scanning through every single document.
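For reference, the fix amounts to an indexing policy that includes the partition key path. A minimal sketch with the .NET SDK v3 follows; the paths match the question, but the exact policy shown here is an assumption, not the original configuration.

// Sketch: include /partitionKey in the indexing policy and keep the composite
// index used by ORDER BY c._ts ASC, c.id ASC.
using System.Collections.ObjectModel;
using Microsoft.Azure.Cosmos;

ContainerResponse read = await container.ReadContainerAsync();
ContainerProperties props = read.Resource;

props.IndexingPolicy.IncludedPaths.Add(new IncludedPath { Path = "/partitionKey/?" });
props.IndexingPolicy.CompositeIndexes.Add(new Collection<CompositePath>
{
    new CompositePath { Path = "/_ts", Order = CompositePathSortOrder.Ascending },
    new CompositePath { Path = "/id",  Order = CompositePathSortOrder.Ascending },
});

// Replacing the container properties triggers a background re-index.
await container.ReplaceContainerAsync(props);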
I have a requirement to read multiple files (105 files) from ADLS (Azure Data Lake Storage), parse them, and then add the parsed data directly to multiple collections in Azure Cosmos DB for MongoDB API. All of this needs to be done in one request. The average file size is 120 KB.
The issue is that after multiple documents are added, an error is raised: "Request size limit too large".
Please let me know if anyone has any input on this.
It's unclear how you're performing the multi-document inserts, but... you can't increase the maximum request size. You'll need to perform individual inserts, or insert in smaller batches.
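A rough sketch of the batched approach with the MongoDB .NET driver (the collection, document type and batch size are placeholders, not taken from the question):

// Sketch only: insert in small batches so no single request exceeds the size limit.
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

public static async Task InsertInBatchesAsync(
    IMongoCollection<BsonDocument> collection,
    IReadOnlyList<BsonDocument> docs,
    int batchSize = 100)
{
    for (int i = 0; i < docs.Count; i += batchSize)
    {
        var batch = docs.Skip(i).Take(batchSize).ToList();
        await collection.InsertManyAsync(batch); // one bounded request per batch
    }
}

Tune batchSize downward if individual documents are large.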
I have inserted exactly 1 million documents into an Azure Cosmos DB SQL container using the Bulk Executor. No errors were logged. All documents share the same partition key. The container is provisioned with 3,200 RU/s, unlimited storage capacity and single-region writes.
When performing a simple count query:
select value count(1) from c where c.partitionKey = @partitionKey
I get results varying from 303,000 to 307,000.
This count query works fine for smaller partitions (from 10k up to 250k documents).
What could cause this strange behavior?
This is expected behaviour in Cosmos DB. Firstly, what you need to know is that Document DB imposes limits on the response page size. This link summarizes some of those limits: Azure DocumentDb Storage Limits - what exactly do they mean?
Secondly, if you want to query large amounts of data from Document DB, you have to consider query performance; please refer to this article: Tuning query performance with Azure Cosmos DB.
By looking at the Document DB REST API, you can observe several important parameters which have a significant impact on query operations: x-ms-max-item-count and x-ms-continuation.
So, your result is caused by the RU setting on your collection acting as a bottleneck. The count query is limited by the number of RUs allocated to your collection, and the result you received will have come with a continuation token.
You have two options:
1. You could simply raise the RU setting.
2. To keep cost down, you could keep fetching the next set of results via the continuation token and keep adding them up, so that you get the total count (probably in the SDK).
You could set the Max Item Count value and paginate your data using continuation tokens. The Document DB SDK supports reading paginated data seamlessly. You can refer to the Python snippet below:
# Legacy pydocumentdb SDK: fetch one page at a time with maxItemCount
q = client.QueryDocuments(collection_link, query, {'maxItemCount': 10})
results_1 = q._fetch_function({'maxItemCount': 10})
# The continuation token is a string representing a JSON object
token = results_1[1]['x-ms-continuation']
# Pass the token back to fetch the next page of results
results_2 = q._fetch_function({'maxItemCount': 10, 'continuation': token})
I imported exactly 30K documents into my database. Then I tried to run the query
select value count(1) from c in Query Explorer. It turns out only a partial count of the total documents is returned on each page, so I had to add them all up by clicking the Next Page button.
Surely, you could do this query in the SDK code via the continuation token.
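For completeness, here is a minimal sketch of the same idea in the .NET SDK v3 (container and partitionKeyValue are placeholders): the iterator handles the continuation token for you, and summing the returned values covers the case where the count comes back as partial pages.

// Sketch: drain all pages of the VALUE COUNT query and sum the partial counts.
using Microsoft.Azure.Cosmos;

var countQuery = new QueryDefinition(
        "SELECT VALUE COUNT(1) FROM c WHERE c.partitionKey = @partitionKey")
    .WithParameter("@partitionKey", partitionKeyValue);

long total = 0;
using FeedIterator<long> iterator = container.GetItemQueryIterator<long>(
    countQuery,
    requestOptions: new QueryRequestOptions
    {
        PartitionKey = new PartitionKey(partitionKeyValue),
    });

while (iterator.HasMoreResults)
{
    FeedResponse<long> page = await iterator.ReadNextAsync();
    foreach (long partialCount in page)
    {
        total += partialCount; // usually a single value, but summing is safe either way
    }
}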