Microsoft Azure DocumentDB vs Azure Table Storage - azure

For several recent years, Microsoft offers a "NoSQL" key/value storage, called "Table Storage" (http://azure.microsoft.com/en-us/documentation/articles/storage-dotnet-how-to-use-tables/)
Table Storage offers a high performance, scalability (via partitioning) and relatively low cost. A primary drawback of Tables that only Partition and Row keys can be indexed - so making queries on values is very inefficient.
Recently Microsoft announced a new "NoSQL" service, called "DocumentDB" (http://azure.microsoft.com/en-us/documentation/services/documentdb/)
Instead of storing a list of properties (like Tables do), DocumentDB stores JSON objects. The whole object being indexed - so efficient queries may be created based on every property and any nested property of stored objects.
Microsoft says that DocumentDB provides high performance and scalability as well.
If that's so - why anyone would use Table Storage over DocumentDB? It sounds like DocumentDB provides the same functionality as Tables, but with additional capabilities such as the ability to index anything.
I will glad if someone could make a comparison between DocumentDB and Table Storage, highlighting cons and pros of each one.

Both are NoSQL technologies, but they are massively different. Azure Tables is a simple Key/Value store and does not support complex functionality like complex queries (most of them will require a full partition/table scan anyway, which will kill your performance and your cost savings), custom indexing (indexing is based on PartitionKey and RowKey only, you currently can't index on any other entity property and searching for anything other than PartitionKey/RowKey combination will require a partition/table scan), or stored procedures. You also can't batch read requests for multiple entities (through batch write requests are supported if all the entities belong to the same partition). For a real-life application of Azure Tables, see HERE.
If your data needs (particularly around querying them) are simple (like in the example above), then Azure Tables provide what you need, you might end up using that in favor of DocDB due to pricing, performance and storage capacity. For example, Azure Tables performance target is 20.000 operations per second. Trying to get that same level of performance on DocDB will have a significantly higher service cost for you. Also, Azure tables are limited by the capacity of your Azure storage account (500TB), whereas DocDB storage is limited by the capacity units you buy.

Table Services is mainly a key-value type NOSQL and DocumentDB is (as the name suggests) a Document Type NoSQL store. What you are asking is essentially the difference between these two types of NOSQL approaches. If you shape your research according to this you should be able to get a better understanding for sure.
Just to keep things simple I suggest you consider the differences between how DocumentDB and Table Services are priced. Not only the cost of these services vary a lot from each other but the fact that DocumentDB works on a "provision first" model and Table Services are offered on a pure consumption based pricing might give you some clues on your compare/contrast.
Let me ask you this; why would I use DocumentDB if the features in Table Services well serve my needs? ;) I suggest you to take a look at how the current Azure Diagnostics tooling use Azure Storage Services, how Storage Metrics use Azure Storage on itself to get a sense of how useful Table Services would be and how overkill DocumentDB might be in some situations.
Hope this helps.

I think that the comparison is all about trading price for performance. Table Services are just Storage Services, which seem to cap out at 20,000 ops/second, but paying for that kind of throughput all the time (because Storage gives it to us all the time) is $1,200/month. Crazy money.
Table services have simple indexes, so queries are very limited. Good for anything that is written and read via IDs. DocumentDB indexes the entire document, so a query can be done on any property.
And lastly, Table services are bound by the storage constraint of the Storage account it's on (which could get crazy high given negotiation with Microsoft directly), where DocumentDB storage seems unlimited.
So it's a balance. Do you have a LOT of data (hundreds of gigs, or terabytes) that you need in one place? DocumentDB. Do you need to support complex queries? DocumentDB. Do you have data that needs to come and go fast, but based on a 1-to-2 property lookup? Table services. Would you trade having to code around a simple index in order to avoid paying through the nose for throughput? Table services.
And Redis, someone mentioned that... man, I dunno. Even the existence of persistence in a caching framework (which Redis offers) doesn't turn it into a tech of choice... There is a huge difference between a persistent store that holds data that is "often used, but may be missing or time-retired", like a cache would, and a persistent store that guarantees your data to be there.

A real life example:
I have to store some tokens, retrieve them, delete them. Only query ever done will be based on User ID.
So I use Table Storage, as it fulfill my requirement perfectly. I save the token against User ID.
Document DB seemed to be overkill for this.

Here is the answer from microsoft's official docs
Common attributes of Cosmos DB, Azure Table Storage, and Azure SQL Database:
99.99 availability SLA
Fully managed database services
ISO 27001, HIPAA and EU Model Clauses Compliant
The following table shows the uncommon attributes of Azure Cosmos DB,
Azure Table Storage

Related

Querying Azure Diagnostic table storages

We are storing our Windows/Linux VM metrics and logs into Azure diagnostics storage account for long term retention. We keep this data in Log Analytics as well but being cost conscious we keep only the minimal essential set and for 1 month. However it seems there is no way to efficiently query the Table storage data when we need it - e.g. checking historical cpu usage for a particular machine over a specific period in the past, or checking the logs captured during that period. The partition key and row key is highly convoluted with some very basic help available for the WAD tables schema while none exist for LinuxsyslogVer2v0 table schema. I was curious if anyone else using the diagnostic logs table storage for any querying/reporting? If so how do you query these for a specific host and time period? I can do a querying using non primary/row key but besides being time consuming it will cost a hell eventually considering that will be a table scan. Really appreciate any advice.
You should consider using Azure Data Explorer (ADX) for your long-term storage solution. It allows for KQL queries on your long-term data and is the preferred method for keeping log/security data past the default for services like LogA and Sentinel.
The pricing page for ADX can be a bit confusing and there is a website to help you estimate costs here: https://dataexplorer.azure.com/AzureDataExplorerCostEstimator.html
By default, logs ingested into Azure Sentinel are stored in Azure Monitor Log Analytics. This article explains how to reduce retention costs in Azure Sentinel by sending them to Azure Data Explorer for long-term retention.
Storing logs in Azure Data Explorer reduces costs while retains your ability to query your data, and is especially useful as your data grows. For example, while security data may lose value over time, you may be required to retain logs for regulatory requirements or to run periodic investigations on older data.
https://learn.microsoft.com/en-us/azure/sentinel/store-logs-in-azure-data-explorer?tabs=adx-event-hub

Azure Storage Account for Tables

So first of all I'd like to say I'm no DBA nor coder, I'm just a regular IT person that works as support for network and infrastructure, however, I like to get familiar with technologies in general and understand the basics of it, let's say how they work, implemented with no additional specific details.
I've been reading about Azure Storage Accounts in regards to tables. As IT, I had to implement simple file shares via SMB 3.0 in order to have them mapped on our network, I've come across other options such as blobs, tables and queues. I've read about them however I'm trying to get the main functionality of tables for a coder.
Correct me if I am wrong, when you code an app with a database, you can put the database on same/different server, and that can be on premise or on the cloud and you kind of link both together.
And as far as Im concerned and what I was able to find out investigating on the web, these tables are NoSQL and no constraints, you create the tables and data through Visual Studio thanks to an API, then that information is reflect on your storage.
How is this is useful when using it for the app you're developing?
I've been reading about Azure Storage Accounts in regards to tables. As IT, I had to implement simple file shares via SMB 3.0 in order to have them mapped on our network, I've come across other options such as blobs, tables and queues. I've read about them however I'm trying to get the main functionality of tables for a coder.
And as far as Im concerned and what I was able to find out investigating on the web, these tables are NoSQL and no constraints, you create the tables and data through Visual Studio thanks to an API, then that information is reflect on your storage.
Azure Storage Accounts is a "box" to keep your Blobs, Tables, Queues, Files organised from the management point of view and for the access control. Each storage type is good for it's specific tasks.
If the world would have just one super storage which will solve all our possible cases for storing, querying and managing the data then there would not be such variety of different databases, storage types etc. available.
If you need to share the files as a "network folder" - try Azure Files.
If your coders need a database storage, then the first question would be what are the requirements to the database do they have? What is the purpose of that database would be, etc. Azure, particularly, has a lot of different database solutions, and again, each of them good for some specific task, and can be not a good choice for other tasks.
As to Azure Tables, from the official docs:
Azure Table storage is a service that stores structured NoSQL data in the cloud, providing a key/attribute store with a schemaless design.
So, if your coders do need to store such data, then yes, that would be one of the possible choices.
Correct me if I am wrong, when you code an app with a database, you can put the database on same/different server, and that can be on premise or on the cloud and you kind of link both together.
Correct. But also you can have your own server with the database which you need to manage yourself, or you can choose some cloud service which will provide the database for you but will keep the underlying server and other maintenance activity managed for you, so you no need to worry/spend your time on that.
How is this is useful when using it for the app you're developing?
It is important to understand what your requirements are for data storage in order to pick a proper one. This question perhaps should be addressed not to you, but to your coders, who are building the app and can consolidate their requirements to the database store. Usually, they will tell you exactly what they need, and you may give them some ideas or advice of the alternatives, if any (That may be a similar solution with extra functionality or the way how the data is stored or processed, or have more built in integrations that may be important for you, or a decision whether keep own installation or use cloud managed service)
For your further possible question about When should I use a NoSQL database instead of a relational database? Is it okay to use both on the same site? see this thread
Update based on further questions:
If I develop an application with a database whose tables are on Azure, can I call let's say functions or data from it to my main application that is hosted on premise? What's the benefit of doing that versus hosting the tables on premise other than it's largely scalable and highly available?
Perhaps you need to better understand the relationship between App (Application) and DB (Database). The Database is a standalone system, which store the data, reply to the incoming queries (receive request, process it, return the result). In overall to the DB is not important who is requesting the data. It is a "passive" system. (There are some cases when DB can trigger further processes in data processing pipelines, but that is beyond this scope).
The App in opposite is an active system in App<->DB relationship. (Also leave behind more advanced designs where App is not just a 1 system). App receive requests, process them (may do external requests to other "services" if that is necessary), give a response (with or without data) to the requester. In App<->DB relationship the external requests is what happening. At some point App need some data from the DB, so App make a request to the DB, obtain the response and continue its own logic.
Where App server and DB server are placed is not that important (for simplicity). The important part is whether DB server is accessable for the requests. DB can be on-prem with public static IP address, it can be in cloud on your own server which has public static IP address (sometimes that is archived in different ways but we skip that for simplicity), that can be a Database as a Service cloud solution, where you do not need to have a server and configure the database, but have a url endpoint which you need to use to query the DB.
I appreciate the answer, and I pretty much agree with what you're saying.
But my questions goes beyond what the requirements are for the developers.
I'll modify the question. If I develop an application with a database whose tables are on Azure, can I call let's say functions or data from it to my main application that is hosted on premise? What's the benefit of doing that versus hosting the tables on premise other than it's largely scalable and highly available?
Azure Storage Tables are the "Notepad" of NoSQL Databases. If you want quick and easy key/value pairs, tables is the way to go. If you are looking for the "Word" of NoSQL in Azure then Cosmos DB is where it's at. Cosmos DB offers global distrobution, better features and better SLA (see comparison). Tables are cheaper too.
Azure also supports MySQL, PostGreSQL, MariaDB and MSSQL as PaaS offerings if you wish to use a traditional database.

Maintenance required for Azure SQL DB in the long term

What is the maintenance required from an organization when deploying an Azure SQL Database in the long term?
My current organization is hoping to do as little database management as possible, and have looked for products that fully manage our databases without much intervention needed from our end. Some products that are being considered includes Snowflake (for their automated partitioning of tables) and Domo (for their data warehousing, connectors, and BI tool offerings).
I'm leaning towards using Azure SQL DB for multiple reasons (products offered, transparent pricing, integration ease, available documentation, SSO, etc.), but want to first understand the skills needed and ease in maintaining it in the long run.
Will we have to manually rebuild indexes and partition out tables as we scale up? Or is Azure intelligent enough that it'll do most of the heavy lifting of performance optimization itself?
Does Azure or other vendors provide services to optimize a DB?
Sorry for the vague prompts, but any additional considerations in choosing DB vendors would be great. Thanks!
Actually for your questions, you should know what is Azure SQL database and it's capabilities.
I'm leaning towards using Azure SQL DB for multiple reasons (products offered, transparent pricing, integration ease, available documentation, SSO, etc.), but want to first understand the skills needed and ease in maintaining it in the long run.
This document What is Azure SQL Database service introduced almost all message you want to know. SQL Database is a general-purpose relational database managed service in Microsoft Azure that supports structures such as relational data, JSON, spatial, and XML. SQL Database delivers dynamically scalable performance within two different purchasing models: a vCore-based purchasing model and a DTU-based purchasing model. SQL Database also provides options such as columnstore indexes for extreme analytic analysis and reporting, and in-memory OLTP for extreme transactional processing. Microsoft handles all patching and updating of the SQL code base seamlessly and abstracts away all management of the underlying infrastructure.
Will we have to manually rebuild indexes and partition out tables as we scale up? Or is Azure intelligent enough that it'll do most of the heavy lifting of performance optimization itself?
No, you don't. Scalability is one of the most important characteristics of PaaS that enables you to dynamically add more resources to your service when needed. Azure SQL Database enables you to easily change resources (CPU power, memory, IO throughput, and storage) allocated to your databases.
You can mitigate performance issues due to increased usage of your application that cannot be fixed using indexing or query rewrite methods. Adding more resources enables you to quickly react when your database hits the current resource limits and needs more power to handle the incoming workload. Azure SQL Database also enables you to scale-down the resources when they are not needed to lower the cost.
For more details, please reference: Scale Up/Down.
Does Azure or other vendors provide services to optimize a DB?
As Woblli said, Azure SQL database provides the Azure SQL database Monitoring and tuning for you.
As a complement, you also can use Azure SQL Database Automatic tuning to help you optimize the database automatically.
Hope this helps.
Azure SQL DB offers the services you're asking.
You can enable automatic tuning, which will create and drop indexes based on performance gains. Force good query plans again based on performance. It will roll back changes if the specific change has decreased the overall database performance level.
It will not partition or shard your database for you however.
Official documentation:
https://learn.microsoft.com/en-us/azure/sql-database/sql-database-automatic-tuning

Azure Search scalability

We are developing a mobile app that should scale for thousands of users and we are using Azure Search as our main storage. According to Azure pricing model the query limits are set to 15 queries per second/per unit for the standard plan. With these limits and with a system that should scale with thousands of concurent users we would hit the limits pretty quickly.
In our situation is Azure Search not the right option when scaling for thousands of concurrent users?
Would DocumentDB be a better option?
Thanks!
Interesting that you're using Azure Search as your primary storage, as it's not built to be a database engine. The storage is specifically for search content (type typical pattern is to use Azure Search in conjunction with a database engine, such as SQL Database or DocumentDB, for example), using results to point back to the "system of record" content in your database.
The scale for Search is specifically for full-text-search queries your users will generate. And Azure Search scales per unit, with each unit offering 15 searches / second. So, you can scale far beyond 15/sec if you buy more search units.
However: Don't confuse this with database engine queries. You asked about DocumentDB, so using that as an example: You can query far beyond 15/second with that database engine, and that scales independently. Same goes for any VM-based database solution, SQL Database, etc - they all can scale.
This really comes down to whether you need full-text-search at high volume. If so, great - just scale Azure Search to the number of units you need, to handle your request traffic. If you can do more database-specific searches, without driving your request through Azure Search, then you don't need to scale out as much, and can take advantage of the native database query capabilities.
One thing to add to David's excellent answer - if your scenario is primarily search driven and you don't need to store data for purposes other than search and are OK with eventual consistency, then using Azure Search as the primary store may be fine.
Also, 15 requests per second query throughput of Azure Search is just a ballpark - it's neither a hard limit nor a promise. Depending on your data and query complexity, the actual throughput can be significantly (many times) higher or lower.

Windows Azure and multiple storage accounts

I have an ASP.NET MVC 2 Azure application that I am trying to switch from being single tenant to multi-tenant. I have been reviewing many blogs and posts and questions here on Stack Overflow, but am still trying to wrap my head around the specifics of what's right for this particular app.
Currently the application stores some information in a SQL Azure database, as well as some other info in an Azure Storage Account. I'm considering writing the tenant provisioning code to simply create a new database for a new tenant, along with a new azure storage account. This brings me to the following question:
How will I go about testing this approach locally? As far as I can tell, the local Azure Storage Emulator only has 1 storage account. I'm not sure if I'm able to create others locally. How will I be able to test this locally? Or will it be possible?
There are many aspects to consider with multitenancy, one of which is data architecture. You also have billing, performance, security and so forth.
Regarding data architecture, let's first explore SQL storage. You have the following options available to you: add a CustomerID (or other identifyer) that your code will use to filter records, use different schema containers for different customers (each customer has its own copy of all the database objects owned by a dedicated schema in a database), linear sharding (in which each customer has its own database) and Federation (a feature of SQL Azure that offers progressive sharding based on performance and scalability needs). All these options are valid, but have different implications on performance, scalability, security, maintenance (such as backups), cost and of course database design. I couldn't tell you which one to choose based on the information you provided; some models are easier to implement than others if you already have a code base. Generally speaking a linear shard is the simplest model and provides strong customer isolation, but perhaps the most expensive of all. A schema-based separation is not too hard, but requires a good handle on security requirements and can introduce cross-customer performance issues because this approach is not shared-nothing (for customers on the same database). Finally Federations requires the use of a customer identifyer and has a few limitations; however this technology gives you more control over performance distribution and long-term scalability (because like a linear shard, Federation uses a shared-nothing architecture).
Regarding storage accounts, using different storage accounts per customer is definitively the way to go. The primary issue you will face if you don't use separate storage accounts is performance limitations, such as the maximum number of transactions per second that can be executed using a single storage account. As you are pointing out however, testing locally may be a problem; however consider this: the local emulator does not offer 100% parity with an Azure Storage Account (some functions are not supported in the emulator). So I would only use the local emulator for initial development and troubleshooting. Any serious testing, including multitenant testing, should be done using real storage accounts. This is the only way you can fully test an application.
You should consider not creating separate databases, but instead creating different object namespaces within a single SQL database. Each tenant can have their own set of tables.
Depending on how you are using storage, you can create separate storage containers or message queues per client.
Given these constraints you should be able to test locally with the storage emulator and local SQL instance.
Please let me know if you need further explanation.

Resources