I wonder if it's a good idea to use Azure Cosmos DB containers to manage an entity's status? For example, an employee's reimbursement can have different statuses like submitted, approved, paid, etc. Do you see any problem with creating a separate container for "Submitted", "Approved", etc.? They would contain similar reimbursement objects, but with slightly different data points depending on their status. For example, the Submitted container could have the manager's name as the approver, while the Paid container could have the payment method.
In other words, it's like a persistent queue: a reimbursement would be moved out of one container and into the next as it progresses through the workflow.
Are there any concerns with this approach? Does the Azure "provisioned throughput" pricing model charge per container, meaning the more containers you have, the more expensive it gets? Or is it at the database level, so that I can have as many containers as I want and only the queries are charged?
Sorry for the newbie questions, learning about Cosmos. Thanks for any advice!
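For concreteness, here is a rough sketch of what moving a reimbursement between per-status containers would look like with the Python SDK; the endpoint, key, container names, IDs, and fields are all made up for illustration:

```python
from azure.cosmos import CosmosClient

client = CosmosClient(COSMOS_URL, credential=COSMOS_KEY)  # hypothetical endpoint/key
db = client.get_database_client("reimbursements")
submitted = db.get_container_client("Submitted")
approved = db.get_container_client("Approved")

# "Move" a reimbursement to the next status: copy it into the next container,
# add the status-specific data point, then delete it from the current one.
doc = submitted.read_item(item="R-1001", partition_key="R-1001")
doc["approver"] = "Jane Manager"  # field specific to the Approved stage (illustrative)
approved.create_item(doc)
submitted.delete_item(item="R-1001", partition_key="R-1001")
```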
It's a two part question :).
First part (single container vs. multiple containers) is basically an "opinion-based" question. I would have opted for a single container approach, as it would give me just one place to look for the current status of an item. But that's just my opinion :).
Regarding your question about the pricing model, Azure Cosmos DB offers you both.
You can provision throughput at the container level as well as on the database level. When you provision throughput at the database level, all containers (up to 25) in that database will share the throughput of the database.
You can even mix and match the approaches, i.e. provision throughput at the database level and then have some containers share the throughput of the database while other containers have dedicated throughput.
Please note that once the throughput type (fixed, shared, or auto-scale) is configured at the database/container level, it can't be changed. You will have to delete the resource and create a new one with the changed throughput type.
You can learn more about throughput here: https://learn.microsoft.com/en-us/azure/cosmos-db/set-throughput.
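To make the shared vs. dedicated distinction concrete, here is a minimal sketch with the Python SDK (azure-cosmos); the endpoint, key, database, container names, and partition key are assumptions for illustration only:

```python
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient(COSMOS_URL, credential=COSMOS_KEY)  # hypothetical endpoint/key

# Database-level (shared) throughput: containers created without their own
# throughput setting share these 400 RU/s.
db = client.create_database_if_not_exists("reimbursements", offer_throughput=400)

# This container shares the database-level throughput.
workflow = db.create_container_if_not_exists(
    id="workflow", partition_key=PartitionKey(path="/reimbursementId"))

# This container gets its own dedicated 400 RU/s on top of the shared throughput.
audit = db.create_container_if_not_exists(
    id="audit", partition_key=PartitionKey(path="/reimbursementId"), offer_throughput=400)
```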
I need one container per customer (about 5,000 customers). For each customer that signs up on my website, one container goes up (with a different port) and the customer sends his/her logs to it; the logs are then processed and ingested into the data storage container (one for all customers). In Kubernetes, in terms of security and other conditions, is it better to separate each customer's container into a specific namespace or not?
It doesn't matter; you can create as many namespaces as you want in Kubernetes. However, if you have any requirement to give customers access to K8s, it would be better to separate them out by namespaces.
Keeping different containers already creates one layer of separation; however, if you still want to create a virtual environment on top of that, you can do it and use namespaces.
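If you do go the namespace-per-customer route, a minimal sketch with the official Kubernetes Python client could look like this (the naming scheme is just an example):

```python
from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() when running inside the cluster
core = client.CoreV1Api()

def create_customer_namespace(customer_id: str) -> None:
    """Create an isolated namespace for a single customer."""
    ns = client.V1Namespace(metadata=client.V1ObjectMeta(name=f"customer-{customer_id}"))
    core.create_namespace(ns)

create_customer_namespace("1042")
```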
I am getting ready to create a brand new mobile application that communicates with Cosmos DB, and I will probably go the serverless way. The serverless option has a few disadvantages compared to provisioned throughput (e.g. only 50 GB per container, no geo-redundancy, no multi-region writes, etc.).
If I later need to convert my DB to a provisioned throughput one, can I do it somehow?
I know that I can probably use the change feed and (I guess) recreate a new DB from it (a provisioned throughput one), but this might open Pandora's box, especially while a mobile app is connected to a specific DB.
As Gaurav mentioned, there is no way to change from the Serverless plan to Provisioned once you create an account.
You will need to recreate the account with provisioned throughput as the type and use one of the ways below to migrate the data:
(i) Data Migration Tool - You can easily migrate from one account to another
(ii) Change Feed and Restore - push the changes to the new instance of Azure Cosmos DB
Once you are synced, switch over to the new one.
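A minimal one-off copy with the Python SDK could look like the sketch below; the account URLs, keys, and database/container names are placeholders, and for large containers the Data Migration Tool or a continuously running change feed processor is the better option:

```python
from azure.cosmos import CosmosClient

src = CosmosClient(SERVERLESS_URL, credential=SERVERLESS_KEY)    # existing serverless account
dst = CosmosClient(PROVISIONED_URL, credential=PROVISIONED_KEY)  # new provisioned account

src_container = src.get_database_client("appdb").get_container_client("items")
dst_container = dst.get_database_client("appdb").get_container_client("items")

# Copy every document across; upsert makes the copy safe to re-run before cut-over.
for item in src_container.read_all_items():
    dst_container.upsert_item(item)
```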
Based on the documentation available here: https://learn.microsoft.com/en-us/azure/cosmos-db/serverless#using-serverless-resources, it is currently not possible to change a Cosmos DB serverless account to provisioned throughput.
We are new to Cosmos DB and are migrating multiple applications to the cloud. What would be the pros and cons of having one database per Cosmos DB instance versus all applications' databases in a single Cosmos DB instance, and would that be cost effective? Because if Microsoft bills on RU/s usage and storage, and not on how many instances of Cosmos DB are running, what difference does it make to have a single database in each Cosmos DB instance?
Example:
Approach A :
COSMOS Resource1 > Database1 > Container1
COSMOS Resource1 > Database2 > Container2
Approach B:
COSMOS Resource1 > Database1 > Container1
COSMOS Resource2 > Database2 > Container2
Which approach is better?
Pros of Approach A (single Cosmos DB instance)
The database can have multiple containers. Each container can have its own RU quota, or you can have them share RUs by placing the quota at the database level. This could save you money by sharing RUs across your whole suite of containers without the hassle of managing each container's cost.
You get the ease of connection information, as your endpoint and key are the same for all of your containers since they live under the same account.
Adding more RUs benefits all containers, not just one.
Cons of Approach A (single Cosmos DB instance)
If you have a really read/write-intensive application that takes up a lot of RUs, combining containers under one provisioned quota could leave your other applications receiving errors because there are no RUs left for them to perform their operations.
If someone obtains your key and endpoint, all of your containers are exposed since they are under the same account. This could expose your company's full data inventory to an attacker.
You can't control cost to a fine point. Meaning, if you have a container that doesn't need many RUs, this container could have a 400 RU provision and only cost you $20 or so, while you place the bulk of your budget toward the RU-hungry app. Separation gives you pinpoint control over RU distribution and cost.
https://learn.microsoft.com/en-us/azure/cosmos-db/set-throughput
Additional tidbits.
Change feed allows you to connect a function, etc. to events within Cosmos DB, and it would allow you to sync data to an outside database like SQL Server, Elasticsearch, and/or Redis. I've seen a lot of people/companies use that serverless power to sync Elasticsearch with very little code.
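As a rough illustration, the Python SDK exposes the change feed directly; in practice an Azure Function with a Cosmos DB trigger usually does the reading for you, and the endpoint, key, names, and sync_to_elasticsearch helper below are made up:

```python
from azure.cosmos import CosmosClient

client = CosmosClient(COSMOS_URL, credential=COSMOS_KEY)  # hypothetical endpoint/key
container = client.get_database_client("appdb").get_container_client("orders")

# Read the change feed from the beginning and mirror each changed document
# to an outside store (Elasticsearch, SQL Server, Redis, ...).
for doc in container.query_items_change_feed(is_start_from_beginning=True):
    sync_to_elasticsearch(doc)  # placeholder for your own sync logic
```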
Make sure you choose your partition key carefully and never do an operation without it. Sometimes the difference can be over 100 RUs on a single query.
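To illustrate the partition key point, here is a sketch with the Python SDK; the query, container, and key values are placeholders, and the exact RU difference depends on your data:

```python
from azure.cosmos import CosmosClient

client = CosmosClient(COSMOS_URL, credential=COSMOS_KEY)  # hypothetical endpoint/key
container = client.get_database_client("appdb").get_container_client("orders")

# Scoped to a single partition: the cheap and fast way.
cheap = container.query_items(
    query="SELECT * FROM c WHERE c.customerId = @id",
    parameters=[{"name": "@id", "value": "C-42"}],
    partition_key="C-42",
)

# The same query without a partition key fans out to every partition and can
# cost many times more RUs.
expensive = container.query_items(
    query="SELECT * FROM c WHERE c.customerId = @id",
    parameters=[{"name": "@id", "value": "C-42"}],
    enable_cross_partition_query=True,
)
```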
If I write an Azure add-on, can it access the WADPerformanceCountersTable table (of the business application that provisioned this add-on)? Especially in terms of security/permissions.
E.g. say I wanted my add-on to monitor some performance counters and send an email alert if they pass some thresholds (regardless of whether there are already such commercial products, I'm just interested in the technical capability). What will I have to do? I'm guessing WADPerformanceCountersTable isn't publicly exposed to the entire world, so how can I make it accessible to my add-on?
thanks very much
WADPerformanceCountersTable is no different from other Azure tables, and it's stored in the storage account defined by Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString in the configuration file. You will need the storage account name/key pair to read from this table.
FYI, here is an article about how to effectively fetch performance counter data from this table: http://gauravmantri.com/2012/02/17/effective-way-of-fetching-diagnostics-data-from-windows-azure-diagnostics-table-hint-use-partitionkey/
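For illustration, a small read of that table with the current azure-data-tables Python package might look like this; the connection string is a placeholder, and the "0"-plus-ticks PartitionKey format and the CounterName/CounterValue columns are assumptions taken from the linked article:

```python
from datetime import datetime, timedelta, timezone
from azure.data.tables import TableServiceClient

# Connection string of the storage account configured by
# Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString
service = TableServiceClient.from_connection_string(DIAGNOSTICS_CONNECTION_STRING)
table = service.get_table_client("WADPerformanceCountersTable")

# WAD prefixes the PartitionKey with "0" followed by the event time in ticks,
# so filtering on PartitionKey avoids a full table scan.
since = datetime.now(timezone.utc) - timedelta(minutes=15)
ticks = (since - datetime(1, 1, 1, tzinfo=timezone.utc)) // timedelta(microseconds=1) * 10
for entity in table.query_entities(f"PartitionKey ge '0{ticks}'"):
    print(entity["CounterName"], entity["CounterValue"])
```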
I have an ASP.NET MVC 2 Azure application that I am trying to switch from being single tenant to multi-tenant. I have been reviewing many blogs and posts and questions here on Stack Overflow, but am still trying to wrap my head around the specifics of what's right for this particular app.
Currently the application stores some information in a SQL Azure database, as well as some other info in an Azure storage account. I'm considering writing the tenant provisioning code to simply create a new database for a new tenant, along with a new Azure storage account. This brings me to the following question:
How will I go about testing this approach locally? As far as I can tell, the local Azure Storage Emulator only has one storage account, and I'm not sure if I'm able to create others locally. Will it even be possible to test this locally?
There are many aspects to consider with multitenancy, one of which is data architecture. You also have billing, performance, security and so forth.
Regarding data architecture, let's first explore SQL storage. You have the following options available to you: add a CustomerID (or other identifier) that your code will use to filter records; use different schema containers for different customers (each customer has its own copy of all the database objects, owned by a dedicated schema in a database); linear sharding (in which each customer has its own database); and Federation (a feature of SQL Azure that offers progressive sharding based on performance and scalability needs). All these options are valid, but they have different implications for performance, scalability, security, maintenance (such as backups), cost and, of course, database design. I couldn't tell you which one to choose based on the information you provided; some models are easier to implement than others if you already have a code base. Generally speaking, a linear shard is the simplest model and provides strong customer isolation, but it is perhaps the most expensive of all. A schema-based separation is not too hard, but it requires a good handle on security requirements and can introduce cross-customer performance issues because this approach is not shared-nothing (for customers on the same database). Finally, Federation requires the use of a customer identifier and has a few limitations; however, this technology gives you more control over performance distribution and long-term scalability (because, like a linear shard, Federation uses a shared-nothing architecture).
Regarding storage accounts, using different storage accounts per customer is definitely the way to go. The primary issue you will face if you don't use separate storage accounts is performance limitations, such as the maximum number of transactions per second that can be executed using a single storage account. As you point out, however, testing locally may be a problem; consider this: the local emulator does not offer 100% parity with an Azure storage account (some functions are not supported in the emulator). So I would only use the local emulator for initial development and troubleshooting. Any serious testing, including multitenant testing, should be done using real storage accounts. This is the only way you can fully test an application.
You should consider not creating separate databases, but instead creating different object namespaces within a single SQL database. Each tenant can have their own set of tables.
Depending on how you are using storage, you can create separate storage containers or message queues per client.
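As a sketch of that per-client separation (using the current Python storage SDKs purely for illustration; the naming convention and connection string are assumptions):

```python
from azure.storage.blob import BlobServiceClient
from azure.storage.queue import QueueServiceClient

blob_service = BlobServiceClient.from_connection_string(STORAGE_CONNECTION_STRING)
queue_service = QueueServiceClient.from_connection_string(STORAGE_CONNECTION_STRING)

def provision_tenant(tenant_id: str) -> None:
    """Give each tenant its own blob container and message queue in one storage account."""
    blob_service.create_container(f"tenant-{tenant_id}")
    queue_service.create_queue(f"tenant-{tenant_id}")

provision_tenant("contoso")
```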
Given these constraints, you should be able to test locally with the storage emulator and a local SQL instance.
Please let me know if you need further explanation.