Schema migration for Cosmos DB SQL API. Does it make sense?

I started working on a Java project where the chosen database is Azure Cosmos DB with the SQL API. From reading the Cosmos DB SQL API introduction, I understand that SQL, in this case, is only for querying and not for data manipulation (insert, delete).
The question is: Does it make sense to use a schema migration tool like Flyway/Liquibase for this kind of database?

Cosmos DB does not have any support for schemas at the database level. It is schema-free, with an indexing mechanism that allows efficient querying of arbitrary JSON data. As such, a SQL schema migration tool doesn't make sense in this context and wouldn't work anyway. It's up to your application code to ensure that data is normalized and migrated to new formats if necessary.
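As a rough illustration of what "migrating in application code" can look like, here is a minimal sketch that upgrades documents lazily when they are read. The `schemaVersion` field, the three document shapes, and all function names are assumptions made up for this example; nothing here is a Cosmos DB feature or SDK call.

```python
from typing import Callable, Dict

# Hypothetical field that records which shape a document has.
VERSION_FIELD = "schemaVersion"
CURRENT_VERSION = 3


def _v1_to_v2(doc: dict) -> dict:
    # Assumed change in v2: the single "name" string is split in two.
    first, _, last = doc.pop("name", "").partition(" ")
    doc["firstName"], doc["lastName"] = first, last
    return doc


def _v2_to_v3(doc: dict) -> dict:
    # Assumed change in v3: every document carries a "tags" list.
    doc.setdefault("tags", [])
    return doc


# Maps a version number to the step that upgrades it to the next version.
MIGRATIONS: Dict[int, Callable[[dict], dict]] = {1: _v1_to_v2, 2: _v2_to_v3}


def upgrade(doc: dict) -> dict:
    """Return a copy of doc upgraded step by step to CURRENT_VERSION."""
    doc = dict(doc)  # shallow copy so the caller's document is not modified
    version = doc.get(VERSION_FIELD, 1)  # documents without the field count as v1
    while version < CURRENT_VERSION:
        doc = MIGRATIONS[version](doc)
        version += 1
        doc[VERSION_FIELD] = version
    return doc
```

The application would call `upgrade` on every document it reads and write the result back when convenient, so old and new shapes can coexist in the same container.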

Little late to the party but I think this might help: https://github.com/liquibase/liquibase-cosmosdb. It's an extension for Liquibase for Cosmos DB. So, pretty much what you were looking for!

Related

Where does Azure Cosmos DB store data?

Where does Azure Cosmos DB store data? Is it stored in Azure Table or Blob Storage, or something else?
I have looked through the Docs from MS and I do not see this explained anywhere.
I'm sure we can't really answer this question beyond what Microsoft tells us:
Key/value (table), columnar, document, and graph data models are all natively supported because of the ARS (atoms, records, and sequences) design that Azure Cosmos DB is built on. Atoms, records, and sequences can be easily mapped and projected to various data models. The APIs for a subset of models are available right now (SQL, MongoDB, Table, and Gremlin) and others specific to additional data models will be available in the future.
Azure Cosmos DB has a schema agnostic indexing engine capable of automatically indexing all the data it ingests without requiring any schema or secondary indexes from the developer. The engine relies on a set of logical index layouts (inverted, columnar, tree) which decouple the storage layout from the index and query processing subsystems. Cosmos DB also has the ability to support a set of wire protocols and APIs in an extensible manner and translate them efficiently to the core data model (1) and the logical index layouts (2) making it uniquely capable of supporting more than one data model natively.
But the comments to your question are still relevant. If Microsoft does their job right, you shouldn't need to care how or where it's stored. And if you feel that how you proceed with building an app is determined by how the data is stored, you probably need to ask a different question. But if you only wanted to know out of curiosity, then here you go.

How do you perform queries without specifying shard key in the MongoDB API and how do you query across partitions?

How do you perform queries without specifying the shard key in the MongoDB API, and how do you query across partitions?
In the SQL API the latter is enabled by setting EnableCrossPartitionQuery to true on the request, but I'm not able to find anything like that for the MongoDB API. My queries that work on an unsharded collection now fail (queries that specify the shard key work as expected).
The queries fail regardless of whether I use the AsQueryable extension syntax or the aggregation framework.
As far as I know, there is no property similar to EnableCrossPartitionQuery in the Cosmos DB MongoDB API. In fact, Cosmos DB is an independent server implementation that does not directly align with MongoDB server versions and features.
CosmosDB supports a subset of the MongoDB API and translates requests into the CosmosDB SQL equivalent. CosmosDB has some different behaviours and results, particularly with their implementation of partitioning as compared to MongoDB's sharding. But the onus is on CosmosDB to improve their emulation of MongoDB.
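To make the difference between a single-partition query and a cross-partition query concrete, here is a toy in-memory model. It is only a sketch: `PartitionedCollection` and its `cross_partition` flag are hypothetical names for this example and do not correspond to any Cosmos DB or MongoDB driver class.

```python
from collections import defaultdict


class PartitionedCollection:
    """Toy in-memory stand-in for a partitioned container (not a real SDK class)."""

    def __init__(self, partition_key: str):
        self.partition_key = partition_key
        self.partitions = defaultdict(list)  # partition key value -> documents

    def insert(self, doc: dict) -> None:
        # Each document lands in the partition named by its key value.
        self.partitions[doc[self.partition_key]].append(doc)

    def query(self, predicate, partition_value=None, cross_partition=False):
        if partition_value is not None:
            # Single-partition query: only one partition is touched.
            candidates = self.partitions.get(partition_value, [])
        elif cross_partition:
            # Fan-out: every partition is scanned and the results are merged.
            candidates = [d for docs in self.partitions.values() for d in docs]
        else:
            raise ValueError("query needs a partition key value or cross_partition=True")
        return [d for d in candidates if predicate(d)]
```

In this model a query without a key value is rejected unless the caller opts in to the fan-out, which mirrors the role the question attributes to EnableCrossPartitionQuery in the SQL API.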
Certainly, you could add feedback here to get official assistance or consider using MongoDB Atlas on Azure if you'd like full MongoDB feature support.
Hope it helps you.
This was confirmed as a bug by the Product Group team. It will be fixed in the first two weeks of September, in case anyone runs into the same problem in the meantime.

jOOQ with SQL Data Warehouse?

Does jOOQ support a dialect for "SQL Data Warehouse"?
Any pointers?
From a jOOQ perspective, SQL Data Warehouse is just another flavour of SQL Server as can be seen in the documentation:
https://learn.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-reference-tsql-statements
Like SQL Server, SQL Data Warehouse implements parts of the T-SQL language.
I question why you're planning to use a Java-focused query generator with an MPP data warehouse. If you're intending to use it for updates, deletes, etc. in some kind of ETL flow, you're going to be in a world of pain.
There is nothing wrong with jOOQ, but it may not be the right technology for interacting with Azure SQL Data Warehouse (ASDW).

What does it mean that Azure Cosmos DB is multi-model?

Looking at the new Azure Cosmos DB, I'm a bit confused about its multi-model nature. Specifically, does it mean:
a) That the same underlying database/store can be queried multiple ways concurrently, so that I can use both Gremlin graph queries and the MongoDB API against the same collections.
or -
b) That you choose a model (graph, key-value, column, document) at the time of provisioning your Cosmos DB, and that is how the data will be stored from then on.
The brochure makes it sound like a), but creating a Cosmos DB instance in the Azure dashboard makes it seem like b), since you have to choose a model type at creation.
Additionally, the literature makes reference to columnar data, but I don't see the option for it at create time.
Cosmos DB is a single NoSQL data engine, an evolution of DocumentDB. When you create a container ("database instance") you choose the most relevant API for your use case, which optimises the way you interact with the underlying data store and how the data is persisted into that store.
So, depending on the API chosen, it projects the desired model (graph, column, key-value or document) onto the underlying store.
You can only use one API against a container. Using several is not possible because of the way the data is stored and retrieved. The API dictates the storage model (graph, key-value, column, etc.), but they all map back onto the same technology under the hood.
Thanks to @Jesse Carter's comment below, it appears you are, however, able to mix and match the graph and DocumentDB SQL APIs.
From the docs:
Multi-model, multi-API support
Azure Cosmos DB natively supports multiple data models including documents, key-value, graph, and column-family. The core content-model of Cosmos DB’s database engine is based on atom-record-sequence (ARS). Atoms consist of a small set of primitive types like string, bool, and number. Records are structs composed of these types. Sequences are arrays consisting of atoms, records, or sequences.
The database engine can efficiently translate and project different data models onto the ARS-based data model. The core data model of Cosmos DB is natively accessible from dynamically typed programming languages and can be exposed as-is as JSON.
The service also supports popular database APIs for data access and querying. Cosmos DB’s database engine currently supports DocumentDB SQL, MongoDB, Azure Tables (preview), and Gremlin (preview). You can continue to build applications using popular OSS APIs and get all the benefits of a battle-tested and fully managed, globally distributed database service.
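The atom-record-sequence description above maps naturally onto parsed JSON. The sketch below classifies Python values produced by `json.loads` into those three categories. It only illustrates the definition quoted from the docs; treating JSON `null` as an atom is an assumption of this example, and the code says nothing about how Cosmos DB actually stores data.

```python
import json


def classify(value) -> str:
    """Classify a parsed JSON value as 'atom', 'record', or 'sequence'."""
    if isinstance(value, dict):
        return "record"      # struct of named fields
    if isinstance(value, list):
        return "sequence"    # array of atoms, records, or sequences
    if value is None or isinstance(value, (str, bool, int, float)):
        return "atom"        # primitive such as string, bool, or number
    raise TypeError(f"not a JSON value: {value!r}")


def describe(value):
    """Return a nested description mirroring the value's ARS structure."""
    kind = classify(value)
    if kind == "record":
        return {"record": {key: describe(child) for key, child in value.items()}}
    if kind == "sequence":
        return {"sequence": [describe(child) for child in value]}
    return "atom"


doc = json.loads('{"id": "1", "tags": ["a", "b"], "owner": {"age": 42}}')
shape = describe(doc)
```

Here `shape` is a record whose `tags` field is a sequence of atoms and whose `owner` field is a nested record.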
Cosmos DB at its heart is a geographically distributed database with its own Atom-Record-Sequence storage engine and index. On top of that infrastructure we are able to implement many different kinds of stores, from SQL like stores using our SQL API, to Mongo, to Cassandra, to Gremlin, to an implementation of Azure Table storage and so on.
Each of the different store types has its own data types (e.g. ways of encoding numbers, dates, etc.), which are encoded in our storage and index layer in their own way. Over time we expect most of those data types to be natively supported by our SQL API. But for now each of our database types uses its own encoding conventions. When creating an account in Cosmos DB (this is a unit of organization, users can have many accounts) the "type" of database is specified on the account. So one can have a Table API account or a Mongo account or what have you.
In some cases it is possible to access an account with Data Type X using API Y. For example, one can use SQL API to talk to tables in a Table API account. But outside of graph, that is usually not a great idea. Right now we encode information for each API in a special format and the different data types don't speak each other's formats. So if one were to write to a Table API using SQL API the end result will most likely be corrupt data.
The exception is graph, which we work hard to make sure works reasonably well with all database types, and we'll have more to say on that in the future.
So if you do want to play around with multi-API access, we strongly encourage you to do so only in "read only" mode when not using the "native" API for the given account. In other words, by all means play around with the SQL API reading from a Table API account, just please don't write to a Table API account using a SQL API client.
The accepted answer misses out on some points.
Cosmos DB is a NoSQL database, but it is highly distributed and its storage format is Atom-Record-Sequence.
Why does that matter? We know that it accepts JSON as its input and output format, but that does not mean Cosmos DB stores its data as JSON. It could actually be any format. This helps us reason about the multi-model nature of Cosmos DB: what you get when you execute a query according to a certain model is probably a projection or view of your data.
@JesseCarter already explained that we can use the Document API and the Graph API interchangeably. Last week the Table API was publicly announced, and this API is probably not too different either.
The guys over at Spectologic have written a nice blog post about cross-API usage of Cosmos DB. They also point out that the multi-model nature is more cosmetic than internal, and that the only real exception seems to be Mongo. The interesting part is in the chapter 'Switching the portal experience' here: https://blog.spectologic.com/2017/06/30/digging-into-cosmosdb-storage/
So maybe in the end it boils down to GlobalDocumentDb vs. MongoDb.
I too was intrigued by this, wanting to understand more from an API usage auditing perspective, and have learned more from reading through these answers.
Upon experimenting, it appears things have progressed further than the original answers describe, so to add a contemporary spin...
I have been able to successfully create a Cosmos DB account choosing the SQL API, created a document in the portal then retrieved the document via the MongoDB API.
The original answers suggested that MongoDB was the odd-one-out and couldn't interact with data created with other APIs.
Whether fuller testing would reveal corrupt documents due to the data type differences hinted at by Yaron (https://stackoverflow.com/a/48286729/141022), and whether the storage differences would still result in poor performance, remains to be seen.
For my purposes I'm interested in whether auditing one API is enough. In this case it is not, since data created through one API can be retrieved through another, so I haven't tested in depth.
Notably, the ARM template deploys with neither GlobalDocumentDB nor MongoDB kind, however exporting the ARM template back from the portal results in GlobalDocumentDB if that happens to make a difference.
If you are interested in the implementation details of CosmosDB, you can read this whitepaper from a long time ago (assuming that the implementation hasn't changed). http://www.vldb.org/pvldb/vol8/p1668-shukla.pdf
TLDR:
At the bottom, CosmosDB stores data in ARS and exposes them in JSON format.
The database engine indexes ALL fields in ALL documents by default, therefore enabling very flexible queries.
The database engine executes an intermediate language similar to JavaScript, bridging the low-level storage and the APIs that the database exposes.
Because of that bridging, more database APIs can be added to support different querying mechanisms (e.g. SQL, document, columnar).
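To give a feel for what "index all fields in all documents" can mean, here is a small sketch that flattens each JSON document into (path, value) pairs and builds an inverted index from them. The path notation (`/tags/[]`) and both function names are assumptions of this example. It is a simplified illustration of schema-agnostic indexing, not the engine's actual index layout.

```python
from collections import defaultdict


def flatten(value, path=""):
    """Yield (path, atom) pairs for every leaf in a parsed JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{path}/{key}")
    elif isinstance(value, list):
        for child in value:
            # Array positions are ignored, so all elements share one path.
            yield from flatten(child, f"{path}/[]")
    else:
        yield path, value


def build_index(docs):
    """Map every (path, atom) pair to the set of ids of documents containing it."""
    index = defaultdict(set)
    for doc in docs:
        for path, atom in flatten(doc):
            index[(path, atom)].add(doc["id"])
    return index


docs = [
    {"id": "a", "city": "Oslo", "tags": ["x", "y"]},
    {"id": "b", "city": "Oslo", "address": {"zip": "0150"}},
]
index = build_index(docs)
```

Because every path is indexed without any prior schema, an equality lookup on any field, such as `index[("/city", "Oslo")]`, is a single dictionary access even though the two documents have different shapes.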
Multimodel means your data can be stored in a number of different ways. Currently, CosmosDB stores 4 different types of data and it allows you to integrate with an API and build out a user experience around these database storage types.
The 4 types are Document DB or MongoDB, Graph Database, Key-Value Pair, and Wide Column or Column Family.

SharePoint List like Data Access Interface

I am impressed by the way we programmatically access lists in SharePoint. I perceive it as a Data Access Layer, where modeling the database is as simple as defining the columns in the List.
I am looking for a tool OR an application that would give me similar interface to a database. Basically, for some reason I cannot use SharePoint and I don't wish to take up the responsibility of modeling, deploying and maintaining the database. I find the SharePoint way of persistence management acceptable and exciting.
Can anyone suggest something even close to this?
BTW, my application is on ASP.Net and my preferred RDBMS is MS SQL Server.
If you don't want the overhead and expense of a SharePoint installation, 90% of the time all you really need is WSS 3.0 (free with a Windows Server license).
For auto-generated entity classes you can use LINQ to SharePoint (SPMetal).
For hand-written POCO entities you can try the SharepointCommon ORM.
Use a NoSQL database like MongoDB or CouchDB. These are schemaless, allowing you to freely add fields to JSON documents without having to define a schema first.
