Microsoft Cosmos DB includes the DocumentDB API, the Table API, and others. I have about 10 TB of data and would like fast key-value lookups (very little updating and writing; mostly reads). Here is a link for Microsoft Cosmos DB:
https://learn.microsoft.com/en-us/azure/cosmos-db/
So how should I choose between DocumentDB API and Table API?
Or when should I choose DocumentDB API? When should I choose Table API?
Is it a good practice to use DocumentDB API to store 10 TB of data?
The Azure Cosmos DB Table API was introduced to make Cosmos DB and its advanced features (indexing, geo-distribution, etc.) available to the Azure Table storage community. The idea is that someone using Azure Table storage who needs more advanced features only offered by Cosmos DB can literally just change their connection string, and their existing code will work with Cosmos DB.
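For illustration, a minimal sketch of that connection-string swap, assuming the classic Microsoft.WindowsAzure.Storage table client (the table name below is a placeholder):

```csharp
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

class TableClientSketch
{
    // The same table code runs against either backend; only the
    // connection string (endpoint + key) differs.
    static CloudTable GetTable(string connectionString)
    {
        CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
        CloudTableClient client = account.CreateCloudTableClient();
        return client.GetTableReference("devices"); // placeholder table name
    }
}
```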
But if you are a greenfield customer, then I would recommend using SQL API (formerly called DocumentDB API), which is a superset of Table API. We are constantly investing in providing more advanced features and capabilities to SQL API, whereas for Table API we are just looking to maintain compatibility with Azure Table storage's API, which hasn't changed in many years.
How much data you have doesn't have any effect on which API you choose. They both share the same multi-model infrastructure and can handle the same data sizes, query loads, distribution, etc.
So how should I choose between DocumentDB API and Table API?
Choosing between the DocumentDB API and the Table API primarily depends on the kind of data you're going to store. The DocumentDB API provides a schema-less JSON database engine with SQL querying capabilities, whereas the Table API provides a key-value storage service. Since you mentioned that your data is key-value based, the Table API is the recommended choice.
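To make that distinction concrete, here is a minimal sketch of what each access pattern looks like (all database, collection, and key names are placeholders, assuming the DocumentDB SDK and the Azure Storage table client):

```csharp
using System;
using System.Linq;
using Microsoft.Azure.Documents.Client;
using Microsoft.WindowsAzure.Storage.Table;

class ApiComparisonSketch
{
    // DocumentDB API: arbitrary JSON shapes, queried with SQL.
    static IQueryable<dynamic> QueryByCity(DocumentClient client, Uri collectionUri) =>
        client.CreateDocumentQuery<dynamic>(collectionUri,
            "SELECT * FROM c WHERE c.address.city = 'Seattle'");

    // Table API: flat entities fetched by PartitionKey + RowKey.
    static TableOperation PointRead() =>
        TableOperation.Retrieve<DynamicTableEntity>("device", "12345");
}
```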
Or when should I choose DocumentDB API? When should I choose Table API?
Same as above.
Is it a good practice to use DocumentDB API to store 10 TB of data?
Both the DocumentDB API and the Table API are designed to store huge amounts of data.
However, you may want to look into Azure Table Storage as well. Cosmos DB lets you fine-tune the throughput you need and offers robust indexing/querying support, and that comes at a price. Azure Tables, on the other hand, comes with fixed throughput and limited indexing/querying support, and is extremely cheap compared to Cosmos DB.
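As a rough sketch of the throughput fine-tuning point (database and collection names are placeholders, assuming the DocumentDB SDK):

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

class ThroughputSketch
{
    // Cosmos DB lets you dial provisioned throughput (request units/sec)
    // per collection; classic Azure Table Storage has no equivalent knob.
    static Task CreateTunedCollectionAsync(DocumentClient client) =>
        client.CreateDocumentCollectionAsync(
            UriFactory.CreateDatabaseUri("mydb"),            // placeholder names
            new DocumentCollection { Id = "lookups" },
            new RequestOptions { OfferThroughput = 10000 });
}
```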
You may find this link helpful to explore more about Cosmos DB: https://learn.microsoft.com/en-us/azure/cosmos-db/introduction.
It might help to know in advance: if you are considering the document interface, there is a case insensitivity that can affect how DataContract classes (and, I believe, all others) are transformed to and from Cosmos DB.
In the linked discussion below, you will see that the case insensitivity lives in Newtonsoft.Json and can affect your handling of objects that you pass to, or get directly from, the API. Not that Cosmos DB is flawed; it is excellent. But with a document API you might (like me) simply start passing DataContract objects into Cosmos DB (which is not wrong, and is in fact very much expected from the object API), and there are some serializer and naming-strategy handler options that you are better off at least being aware of up front.
So this is just a note to make you aware of this behavior with an object interface. The discussion is here on GitHub:
https://github.com/JamesNK/Newtonsoft.Json/issues/815
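As a hedged sketch of the kind of option worth knowing about (the Device class and its members are hypothetical; this assumes Newtonsoft.Json is the serializer in play, as it is for the DocumentDB SDK):

```csharp
using System.Runtime.Serialization;
using Newtonsoft.Json;
using Newtonsoft.Json.Serialization;

[DataContract]
public class Device // hypothetical DataContract class
{
    // Cosmos DB expects a lowercase "id"; mapping it explicitly avoids
    // depending on Newtonsoft.Json's case-insensitive property matching.
    [DataMember(Name = "id")]
    public string Id { get; set; }

    [DataMember]
    public string SerialNumber { get; set; }
}

public static class SerializerDefaults
{
    // Pinning the naming strategy makes round-trips predictable instead
    // of relying on case-insensitive matching during deserialization.
    public static readonly JsonSerializerSettings Settings = new JsonSerializerSettings
    {
        ContractResolver = new DefaultContractResolver
        {
            NamingStrategy = new CamelCaseNamingStrategy()
        }
    };
}
```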
I am trying to figure out how to create a meta model for a graph database on Azure Cosmos DB using the Gremlin API, such as the meta graph in neo4j, but I haven't been able to find a way so far.
I want to be able to see the entities of my database as nodes, and the relationships among them as edges, without having to load any data yet (so that I can map these nodes and edges programmatically to the data sources, and the sources are only called, and the data loaded, when there is a matching query).
The only information that's relatively close to this that I've managed to find is about visualizing the whole graph, not its meta structure (although even this seems not to be possible yet, or only through external visualization platforms).
Is it actually possible to do this? Or does Cosmos DB being a schema-free database mean that it indeed isn't?
There isn't a way to specify a meta-graph in Azure Cosmos DB's Gremlin API; usually Azure Data Factory or other application-level solutions are recommended instead.
How can we create an Azure Data Factory pipeline with Cosmos DB (Graph API) as the data sink, with the data source also being Cosmos DB (DocumentDB API)?
One option available to you is to simply continue using the Document API for the graph-enabled Cosmos DB sink. If you transform your documents into GraphSON format and write them to the destination as regular documents, they will automatically be usable as vertices and edges in future graph traversals.
The ability to use both the DocumentDB SQL and Gremlin APIs against the same collection is one of the most exciting and powerful features of Cosmos DB, in my opinion (and the team plans to support more APIs interacting with the same dataset in the future).
Not only is this possible, but I've personally observed significant improvements in throughput when importing large datasets into a graph-enabled Cosmos collection using the Document APIs instead of Gremlin. I plan to release a blog post describing this process in more detail in the near future.
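As a rough sketch of the idea (the exact property layout shown is an assumption based on how graph-enabled collections have been observed to store vertices, and all names are placeholders):

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Documents.Client;

class GraphSonSketch
{
    static Task WriteVertexAsync(DocumentClient client)
    {
        Uri collectionUri = UriFactory.CreateDocumentCollectionUri("graphdb", "people");

        // A vertex written as a plain document: "label" names the vertex type,
        // and each property is an array of { id, _value } entries.
        var vertex = new
        {
            id = "person-1",
            label = "person",
            name = new[] { new { id = Guid.NewGuid().ToString(), _value = "Alice" } }
        };

        return client.CreateDocumentAsync(collectionUri, vertex);
    }
}
```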
Cosmos DB Graph API is not supported as a Data Factory sink yet; we will add it to our product backlog.
I am very excited to use Cosmos DB in my current application instead of an Azure SQL database.
Before using Cosmos DB as the backend in my current application, I have a few questions in mind:
In my current application I use Entity Framework.
I also use the column encryption and dynamic data masking features.
So, if I move to Cosmos DB instead of Azure SQL Database, how can I achieve those features with Cosmos DB?
The documentation doesn't specify details about encryption, masking, or Entity Framework.
Can you please tell me: is it possible to use Cosmos DB with the above requirements instead of Azure SQL Database?
Entity Framework is specific to relational databases, so it doesn't fit with Cosmos DB's document store (or graph, or tables).
Regarding encryption: Cosmos DB provides encryption-at-rest, built-in. There is no per-property data-masking feature built-in; you'd have to do your own data masking.
Whether you migrate to a document (or graph, or table) store is really up to you, and depends on whether you want to re-shape your data to fit such a storage model instead of a relational model. There's no real way to answer that for you. (TL;DR: you cannot just switch from relational to, say, document, without any changes, as they are fundamentally different storage concepts.)
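If it helps, here is a minimal sketch of what rolling your own masking might look like (the MaskAllButLast4 helper is purely illustrative, not a library API):

```csharp
using System;

static class Masking
{
    // Illustrative stand-in for SQL's dynamic data masking: applied in the
    // application layer when reading a sensitive property back out.
    public static string MaskAllButLast4(string value)
    {
        if (string.IsNullOrEmpty(value) || value.Length <= 4)
            return value;
        return new string('X', value.Length - 4) + value.Substring(value.Length - 4);
    }
}

// Usage: Masking.MaskAllButLast4("4111111111111111") -> "XXXXXXXXXXXX1111"
```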
I'm trying to find the best solution for storing dynamic spatial data. I wonder if any of Microsoft's Azure solutions could work. Azure Table Storage would let me create a lot of custom and dynamic structures stored on fast SSD disks.
Because of the data's dynamic nature, common indexing seems useless. I would also like to create a lot of table-like structures, so the whole architecture cannot be static. Using Azure Table Storage I would dynamically create a table based on country, city, etc., sorted by latitude or longitude.
I would appreciate any pointers.
Azure Table Storage has mostly been replaced by Azure Cosmos DB.
At the time of writing the Table Storage page even says:
The content in this article applies to the original basic Azure Table storage. However, there is now a premium offering for Azure Table storage in public preview that offers throughput-optimized tables, global distribution, and automatic secondary indexes. To learn more and try out the new premium experience, please check out Azure Cosmos DB: Table API.
You can use Cosmos DB via the Table API, but you'll probably find the DocumentDB API to be more powerful.
Documents are "schema-free". You can just throw your documents into a collection, and then you can query against them.
You can create documents which have geo-spatial properties which are indexed automatically.
Then you can perform geo-spatial queries against those properties.
For example you might give each of your documents a point, and then create a query to select all documents that are inside of a polygon.
Or maybe you want to find out how far away each document is from a given point.
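For a rough sketch of those two queries (database, collection, and property names are placeholders; ST_WITHIN and ST_DISTANCE are the built-in geospatial functions in Cosmos DB's SQL dialect):

```csharp
using System.Linq;
using Microsoft.Azure.Documents.Client;

class GeoQuerySketch
{
    static void Run(DocumentClient client)
    {
        var collectionUri = UriFactory.CreateDocumentCollectionUri("mydb", "places");

        // All documents whose "location" point lies inside a polygon
        // (GeoJSON coordinates are [longitude, latitude], and the ring
        // must close back on its first point).
        var inside = client.CreateDocumentQuery<dynamic>(collectionUri,
            "SELECT * FROM p WHERE ST_WITHIN(p.location, " +
            "{'type':'Polygon','coordinates':[[[-122.4,47.5],[-122.2,47.5],[-122.2,47.7],[-122.4,47.7],[-122.4,47.5]]]})")
            .AsEnumerable();

        // Distance in meters from each document's point to a fixed point.
        var distances = client.CreateDocumentQuery<dynamic>(collectionUri,
            "SELECT p.id, ST_DISTANCE(p.location, {'type':'Point','coordinates':[-122.3,47.6]}) AS meters FROM p")
            .AsEnumerable();
    }
}
```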
I'm developing a .NET app which needs to run both on Azure and on regular Windows Servers (2003). It needs to store a few GB of data, and SQL Azure is too expensive for me, so I'll use Azure tables in the cloud version. Can you recommend a storage solution which will run on standalone servers and has an API and behavior similar to Azure tables? From what I've seen, Server AppFabric does not include Tables.
If you consider what Windows Azure Table Storage is, it is a key-value based non-relational database accessible through a REST API. Please download this document about Windows Azure and NoSQL database details.
If I were in your situation, my approach would be to find something similar to Azure Table Storage that I can access over REST with a similar API. So if you try to find a similar database to run on a machine, you really need to look for:
A key-value pair DB
Support for basic operations, i.e., add, delete, insert, and modify an entity
Partition key and row key based access
A RESTful interface to connect to
If you would like to try something, you can look at:
DBreeze (C#-based key-value pair NoSQL DB). I just saw it and it looks exciting.
Google's LevelDB (key-value pair DB, open source and available on Windows). I have no idea about its API.
Redis (great key-value pair DB, but I'm not sure about Windows compatibility and the API).
Here is a list of key/value databases without additional indexing facilities:
Berkeley DB
HBase
MemcacheDB
Redis
SimpleDB
Tokyo Cabinet/Tyrant
Voldemort
Riak
If none of these works, you can take any open-source DB and modify it to fit your requirements, then make it available to others as your contribution to the community.
ADDED
Now you can use a Windows Azure Virtual Machine to run any kind of key-value pair DB on a Linux or Windows machine and connect it to your application.
I'm not sure which storage solution to recommend, but just about any database solution would work, provided that you write an interface to abstract all your data-storage code. Then write implementations of that interface for Azure Table Storage and whatever other database you want to use on the non-cloud server.
You should be doing that anyway so that your code isn't tightly coupled with Azure Table Storage APIs.
If you combine coding against that interface with an IoC container, then a single line of code or a single configuration setting would let you switch between data implementations based on which platform the code is running on.
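For example, a minimal sketch of that abstraction (IEntityStore and InMemoryEntityStore are hypothetical names; an Azure Table Storage implementation would sit behind the same interface):

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Abstraction mirroring the Azure Table addressing model
// (PartitionKey + RowKey), so callers never see the concrete store.
public interface IEntityStore<T>
{
    Task<T> GetAsync(string partitionKey, string rowKey);
    Task UpsertAsync(string partitionKey, string rowKey, T entity);
}

// Standalone-server implementation; the Azure version would wrap the
// Table Storage client behind the exact same interface, and an IoC
// container (or a config switch) picks which one to construct.
public class InMemoryEntityStore<T> : IEntityStore<T>
{
    private readonly ConcurrentDictionary<(string, string), T> _data =
        new ConcurrentDictionary<(string, string), T>();

    public Task<T> GetAsync(string partitionKey, string rowKey)
    {
        _data.TryGetValue((partitionKey, rowKey), out var entity);
        return Task.FromResult(entity);
    }

    public Task UpsertAsync(string partitionKey, string rowKey, T entity)
    {
        _data[(partitionKey, rowKey)] = entity;
        return Task.CompletedTask;
    }
}
```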