What is the maximum number of collections per SQL API database in Azure Cosmos DB?

We are considering partitioning our data across multiple collections, or creating a separate collection per key.
The key range is in the tens of thousands, less than 100k. My concern here is the limit on collections per SQL (DocumentDB) database.

According to Azure Cosmos DB limits, there appears to be no documented limit on the number of collections.
If you have any questions about the scale Azure Cosmos DB provides, you can email askcosmosdb@microsoft.com.
Azure Cosmos DB is a global scale database in which throughput and storage can be scaled to handle whatever your application requires.
However, I don't recommend creating such a huge number of collections, because collections are billing entities. For more information, you can refer to the Azure Cosmos DB FAQ.
Collections are also the billing entities for Azure Cosmos DB. Each collection is billed hourly, based on the provisioned throughput and used storage space.
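Given that advice, a common alternative to one collection per key is a single container partitioned by that key, so the tens of thousands of key values become logical partitions rather than separately billed collections. Below is a minimal sketch using the azure-cosmos Python SDK; the account URL, database and container names, and the /tenantKey path are placeholders, not anything from the original question.

```python
from azure.cosmos import CosmosClient, PartitionKey

# Placeholder endpoint and key.
client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
database = client.create_database_if_not_exists(id="mydb")

# One container, partitioned by the application key: each distinct key value
# becomes a logical partition instead of a separately billed collection.
container = database.create_container_if_not_exists(
    id="items",
    partition_key=PartitionKey(path="/tenantKey"),
    offer_throughput=1000,  # a single provisioned-throughput billing entity
)

container.upsert_item({"id": "doc-1", "tenantKey": "key-00042", "payload": "..."})
```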

Related

Reduce storage in cosmos db

I just realized that some of the tables I moved from Parquet to Cosmos DB are pretty large, as there obviously isn't the same level of compression as in Parquet. That is resulting in a big cost. The RUs don't cost me much, but storage is a bit high. Are there any good recommendations for reducing the size of collections in Cosmos DB, apart from excluding unneeded fields and indexes?
Cosmos DB is not designed to be a cold store for massive amounts of data that isn't actively queried. If you have huge amounts of data that is infrequently queried, one suggestion would be to enable Synapse Link and let it write that data from Cosmos DB into analytical storage on a remote blob store in parquet format. With your data in analytical store, you can then TTL the data from Cosmos DB that you are not actively using and querying for OLTP operations.
If you need to query the older data, you can provision a Synapse workspace and notebooks and use SQL or Spark to query it. If you don't need to query it, you can just let the data remain there. Best of all, the storage costs are the same as for regular blob storage, definitely less expensive than storage in Cosmos DB, which is $0.25/GB because it sits on cluster SSD storage.
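To make the TTL part of this concrete, here is a minimal sketch with the azure-cosmos Python SDK; the database/container names and the /deviceId partition key are placeholders, and the commented analytical-store TTL assumes Synapse Link has already been enabled on the account.

```python
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
database = client.create_database_if_not_exists(id="telemetry")

container = database.create_container_if_not_exists(
    id="hot",
    partition_key=PartitionKey(path="/deviceId"),
    default_ttl=60 * 60 * 24 * 30,  # expire items from transactional storage after ~30 days
    # analytical_storage_ttl=-1,    # keep data indefinitely in analytical store
                                    # (requires Synapse Link on the account)
)

# Per-item override: keep this document for a year instead of the container default.
container.upsert_item({"id": "evt-1", "deviceId": "d-17", "ttl": 60 * 60 * 24 * 365})
```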
Maybe someone will find it useful: I resolved this problem by applying for the "high storage / low throughput" program: https://learn.microsoft.com/en-us/azure/cosmos-db/set-throughput#high-storage-low-throughput-program

Is it possible to use Scalar DB on Azure Cosmos DB in a single region zone redundancy configuration?

When using Scalar DB on Azure Cosmos DB, I'm considering the use of zone redundancy configuration to increase availability.
Is it possible to use Scalar DB on Azure Cosmos DB in a single region zone redundancy configuration? The consistency level of Cosmos DB is Strong.
Scalar DB can work with multiple zones as long as zone redundancy supports Strong consistency.
However, since the implementation of Cosmos DB is not disclosed, please check with Azure technical support to see whether Strong consistency works properly with multiple zones.

Is Azure Table Storage data retrieval faster than SQL Azure?

There is a requirement to store XML data in some storage. As mentioned, it stores large XML data, with each record (row) nearly 1 MB in size. The question is which storage to use for the data: Azure Table Storage (a storage account) or SQL Azure.
So which storage will store and retrieve the data faster?
When looking at sheer volume, Table Storage is today far more scalable than SQL Azure. Given a storage account (storage accounts hold blobs, queues and tables) is allowed to be 100TB in size, in theory your table could consume all 100TB. At first glance, a 100TB chunk of data may seem overwhelming. However, Table Storage can be partitioned. Each partition of Table Storage can be moved to a separate server by the Azure controller thereby reducing the load on any single server. As demand lessens, the partitions can be reconsolidated. Reads of Azure Table Storage are load balanced across three replicas to help performance.
Entities in Table Storage are limited to 1 MB each, with no more than 255 properties (three of which are the required PartitionKey, RowKey, and Timestamp).
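As a rough illustration of those limits, here is a hedged sketch with the azure-data-tables Python SDK; the connection string, table name, and keys are placeholders. Note that while the entity as a whole may be up to 1 MB, a single string property is capped at 64 KB, so a ~1 MB XML document has to be spread across several properties.

```python
from azure.data.tables import TableServiceClient

service = TableServiceClient.from_connection_string("<storage-connection-string>")
table = service.create_table_if_not_exists("XmlDocuments")

xml_payload = "<root>...</root>"  # your ~1 MB XML document

entity = {
    "PartitionKey": "customer-001",  # spreads load across storage partitions
    "RowKey": "doc-0001",            # unique within the partition
}

# Split the XML into properties that stay safely under the 64 KB string limit.
chunk_chars = 30_000
for n, start in enumerate(range(0, len(xml_payload), chunk_chars)):
    entity[f"Xml{n}"] = xml_payload[start:start + chunk_chars]

table.create_entity(entity)
```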
Today, SQL Azure databases are limited to 1GB or 10GB. However, sometime this month (June 2010), a 50GB limit is supposed to be available. What happens if your database is larger than 10GB today (or 50GB tomorrow)? Options include repartitioning your database into multiple smaller databases or sharding (Microsoft’s generally recommended approach). Without getting into the database details of both of these database design patterns, both of these approaches are not without issue and complexity, some of which must be resolved at the application level.
It's hard to say that Azure Table Storage data retrieval must be faster than SQL Azure. It depends on your data structure and size.
As you said, each record (row) of your XML data is nearly 1 MB; as long as it does not exceed the 1 MB entity limit, you can consider Table Storage first.
You can refer to this document for more comparisons of Azure Table Storage and SQL Azure: Azure Table Storage vs. Windows SQL Azure.
Hope this helps.

Provision throughput on Database level using Table API in cosmos db

I have come across a requirement where I have to choose an API for Cosmos DB.
I have gone through all the APIs: SQL, Graph, Mongo, and Table. My current project structure is based on Table storage, where I am storing IoT device data.
In the current structure (Table storage):
I have a separate table for each device, with a payload like the one below.
{
Timestamp,
Parameter name,
value
}
Now, if I plan to use Cosmos DB, I can see that I have to provision RU/throughput for each table, which I think is going to be a big cost. I have not found any way to assign RUs at the database level so that my allocated RUs can be shared across all tables.
Please let me know if there is a way to do this, or is this a limitation I have to accept for Cosmos DB with the Table API?
As far as I can see with the SQL API, considering my use case, I can create a single database and then multiple collections (named after the tables), and then I have both options for RU provisioning, at the database level as well as at the device level, which gives me more control over cost.
You can set the throughput on the account level.
You can optionally provision throughput at the account level to be shared by all tables in this account, to reduce your bill. These settings can be changed ONLY when you don't have any tables in the account. Note, throughput provisioned at the account level is billed for, whether you have tables created or not. The estimate below is approximate and does not include any discounts you may be entitled to.
Azure Cosmos DB pricing
The throughput configured on the database is shared across all the containers of the database. You can choose to explicitly exclude certain containers from database provisioning and instead provision throughput for those containers at container level.
A Cosmos DB database maps to the following: a database while using SQL or MongoDB APIs, a keyspace while using Cassandra API or a database account while using Gremlin or Table storage APIs.
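For the SQL (Core) API option mentioned in the question, a minimal sketch of database-level shared throughput with the azure-cosmos Python SDK could look like the following; the account URL, database and container names, and the /parameterName partition key are placeholders.

```python
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")

# Shared throughput: containers created without their own offer draw from this pool.
database = client.create_database_if_not_exists(id="iot", offer_throughput=1000)

# This container shares the database's 1000 RU/s.
shared = database.create_container_if_not_exists(
    id="device-001",
    partition_key=PartitionKey(path="/parameterName"),
)

# This container is excluded from the shared pool and gets dedicated throughput.
dedicated = database.create_container_if_not_exists(
    id="device-hot",
    partition_key=PartitionKey(path="/parameterName"),
    offer_throughput=400,
)
```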
You can also bring Cerebrata into the picture; the tool lets you assign any throughput value after choosing the throughput type (fixed, autoscale, or no throughput).
Disclaimer: this is purely based on my experience.

Azure Stream Analytics Job degrading while pushing data to cosmos DB

I have data getting pushed from Azure IoT Hub -> Stream Analytics -> Cosmos DB.
With 1 simulated device and my Cosmos DB collection at 1000 RU/s, everything was working fine.
Now I have made it 10 simulated devices and scaled my Cosmos DB collection to 15000 RU/s, but my Stream Analytics job is still getting degraded.
Do I need to increase the number of parallel connections to the collection?
Can we make it more optimal, since Azure pricing for Cosmos DB depends on throughput and RUs?
Can we make it more optimal, since Azure pricing for Cosmos DB depends on throughput and RUs?
I just want to share some thoughts with you about improving the write performance of Cosmos DB here; a combined code sketch of all three points follows below.
1. Consistency level
Based on the document:
Depending on what levels of read consistency your scenario needs against read and write latency, you can choose a consistency level on your database account.
You could try setting the consistency level to Eventual. For details, please refer to the documentation here.
2. Indexing
Based on the document:
By default, Azure Cosmos DB enables synchronous indexing on each CRUD operation to your collection. This is another useful option to control the write/read performance in Azure Cosmos DB.
Please try setting the indexing mode to lazy. Also, remove indexes you don't need.
3. Partitioning
Based on the document:
Azure Cosmos DB unlimited containers are the recommended approach for partitioning your data, as Azure Cosmos DB automatically scales partitions based on your workload. When writing to unlimited containers, Stream Analytics uses as many parallel writers as the previous query step or input partitioning scheme.
Please partition your collection and set the partition key in the output configuration to improve write performance.
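As mentioned above, here is a combined sketch of the three points using the azure-cosmos Python SDK; the account URL, names, and the /deviceId partition key are placeholders, and the indexing policy here narrows the indexed paths (rather than switching to lazy mode) as one way to reduce per-write indexing cost.

```python
from azure.cosmos import CosmosClient, PartitionKey

# 1. Weaker read consistency in exchange for lower write latency.
client = CosmosClient(
    "https://<account>.documents.azure.com:443/",
    credential="<key>",
    consistency_level="Eventual",
)

database = client.create_database_if_not_exists(id="iot")

# 2. Trim the indexing policy: index only the paths you actually query on.
indexing_policy = {
    "indexingMode": "consistent",
    "includedPaths": [{"path": "/deviceId/?"}, {"path": "/timestamp/?"}],
    "excludedPaths": [{"path": "/*"}],
}

# 3. A partitioned (unlimited) container, so Stream Analytics can use many
#    parallel writers; set the same partition key in the ASA output settings.
container = database.create_container_if_not_exists(
    id="telemetry",
    partition_key=PartitionKey(path="/deviceId"),
    indexing_policy=indexing_policy,
    offer_throughput=15000,
)
```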
