How to create a meta graph in Azure Cosmos DB with Gremlin API - azure

I am trying to figure out how to create a meta model for a graph database on Azure Cosmos DB using the Gremlin API, similar to the meta graph in Neo4j, but I haven't been able to find a way so far.
I want to be able to see the entities of my database as nodes, and the relationships among them as edges, without having to load any data yet. That way I can map these nodes and edges programmatically to the data sources, and the sources are only called (and the data loaded) when there is a matching query.
The closest information I've managed to find is about visualizing the whole graph rather than its meta structure (and even that seems either not possible yet or only possible through external visualization platforms).
Is this actually possible? Or does Cosmos DB being a schema-free database mean that it isn't?

There isn't a way to specify a meta-graph in Azure Cosmos DB's Gremlin API. Azure Data Factory or other application-level solutions are usually recommended instead.
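As a rough illustration of what an application-level solution could look like, here is a minimal Python sketch that keeps the meta graph in your own code: entity types are nodes, relationship types are edges, and each one points at a data source. The class names, the labels, and the source identifiers (`crm_db`, `orders_api`) are all hypothetical; nothing here is a Cosmos DB feature.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EntityType:
    """A node of the meta graph: a kind of entity, not an instance."""
    label: str
    source: str  # hypothetical identifier of the backing data source


@dataclass(frozen=True)
class RelationType:
    """An edge of the meta graph: a relationship between two entity types."""
    label: str
    from_label: str
    to_label: str
    source: str  # hypothetical identifier of the backing data source


@dataclass
class MetaGraph:
    entities: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)

    def add_entity(self, label, source):
        self.entities[label] = EntityType(label, source)

    def add_relation(self, label, from_label, to_label, source):
        # Reject relations whose endpoints are not declared entity types.
        for endpoint in (from_label, to_label):
            if endpoint not in self.entities:
                raise KeyError(f"unknown entity type: {endpoint}")
        self.relations.append(RelationType(label, from_label, to_label, source))

    def sources_for(self, labels):
        """Return the data sources a query touching these labels would need."""
        needed = {self.entities[label].source
                  for label in labels if label in self.entities}
        needed |= {rel.source for rel in self.relations if rel.label in labels}
        return needed


meta = MetaGraph()
meta.add_entity("customer", "crm_db")
meta.add_entity("order", "orders_api")
meta.add_relation("placed", "customer", "order", "orders_api")
```

With something like this, a query layer could call `meta.sources_for(...)` with the labels a query mentions and only then load data from the matching sources into the graph.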

Related

Microsoft Cosmos DB (DocumentDB API) vs. Cosmos DB (Table API)

Microsoft Cosmos DB includes the DocumentDB API, the Table API and others. I have about 10 TB of data and would like fast key-value lookups (very little updating and writing, mostly reading). Here is a link for Microsoft Cosmos DB:
https://learn.microsoft.com/en-us/azure/cosmos-db/
So how should I choose between DocumentDB API and Table API?
Or when should I choose DocumentDB API? When should I choose Table API?
Is it a good practice to use DocumentDB API to store 10 TB of data?
The Azure Cosmos DB Table API was introduced to make Cosmos DB and its advanced indexing, geo-distribution, etc. features available to the Azure Table storage community. The idea is that someone using Azure Table storage who needs more advanced features only offered by Cosmos DB can literally just change their connection string and their existing code will work with Cosmos DB.
But if you are a greenfield customer then I would recommend using the SQL API (formerly called the DocumentDB API), which is a superset of the Table API. We are constantly investing in providing more advanced features and capabilities to the SQL API, whereas for the Table API we are just looking to maintain compatibility with Azure Table storage's API, which hasn't changed in many years.
How much data you have doesn't have any effect on which API you choose. They both have the same multi-model infrastructure and can handle the same sizes of data, query loads, distribution, etc.
So how should I choose between DocumentDB API and Table API?
Choosing between the DocumentDB API and the Table API will primarily depend on the kind of data that you're going to store. The DocumentDB API provides a schema-less JSON database engine with SQL querying capabilities, whereas the Table API provides a key-value storage database service. Since you mentioned that your data is key-value based, I recommend using the Table API.
Or when should I choose DocumentDB API? When should I choose Table API?
Same as above.
Is it a good practice to use DocumentDB API to store 10 TB of data?
Both the DocumentDB API and the Table API are designed to store huge amounts of data.
However, you may want to look into Azure Table Storage as well. Cosmos DB lets you fine-tune the throughput that you need and offers robust indexing/querying support, and that comes at a price. Azure Tables, on the other hand, comes with fixed throughput and limited indexing/querying support, and is extremely cheap compared to Cosmos DB.
You may find this link helpful to explore more about Cosmos DB: https://learn.microsoft.com/en-us/azure/cosmos-db/introduction.
Please don't flag this as off-topic.
It might help for you to know in advance: if you are considering the document interface, then in fact there is a case-insensitivity that can affect how DataContract classes (and I believe all others) are transformed to and from Cosmos.
In the linked discussion below, you will see that there is a case insensitivity in Newtonsoft.Json that can affect how you handle objects that you pass to or get directly from the API. Not that Cosmos has ANY flaws, and in fact it is totally excellent. But with a document API, you might (like me) start to simply pass DataContract objects into Cosmos (which is obviously not wrong, and in fact very much expected from the object API), and there are some serializer and naming strategy handler options that you are probably better off at least being aware of up front.
So this is just a note to make you aware of this behavior with an object interface. The discussion is here on GitHub:
https://github.com/JamesNK/Newtonsoft.Json/issues/815

How to construct graph based on triple list in Cosmos DB?

I have extracted information from a given text; the result is a triple list in the RDF format (entity1, entity2, relation). I'd like to construct a knowledge graph from the triple list; however, the Cosmos DB Graph API does not provide such APIs. So basically I have two questions:
How can I import a triple list to construct a graph in Azure Cosmos DB? Specifically, a C# solution would be preferable;
Is there such API that allows me to query the knowledge graph using SPARQL?
I'm a newbie in the NLP field, so please correct me if you find any mistakes in my description.
You're going to have to write an application using one of the Cosmos DB SDKs and convert your triple list into Gremlin statements that can be executed by Cosmos to seed the database.
SPARQL is not natively supported; Gremlin is the only graph query language available at this time. However, Cosmos data can be exported into HDInsight for analysis, so you could install SPARQL on your HDInsight cluster and then execute whatever SPARQL you wanted using Spark.
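To make the triple-to-Gremlin conversion mentioned above a bit more concrete, here is a minimal sketch. It is in Python rather than C#, but the string building carries over directly. It assumes each entity name is usable as a vertex id, that every vertex gets the same label (`entity`), and that the graph is partitioned on a property called `pk`; those names are assumptions for illustration, not anything Cosmos DB prescribes.

```python
def escape(value):
    """Escape backslashes and single quotes for a Gremlin string literal."""
    return str(value).replace("\\", "\\\\").replace("'", "\\'")


def triples_to_gremlin(triples, vertex_label="entity", partition_key="pk"):
    """Turn (entity1, entity2, relation) triples into Gremlin statements.

    Each vertex is emitted once, before any edge that refers to it.
    `vertex_label` and `partition_key` are hypothetical defaults.
    """
    statements = []
    seen = set()
    for entity1, entity2, relation in triples:
        for entity in (entity1, entity2):
            if entity not in seen:
                seen.add(entity)
                e = escape(entity)
                statements.append(
                    f"g.addV('{vertex_label}').property('id', '{e}')"
                    f".property('{partition_key}', '{e}')"
                )
        statements.append(
            f"g.V('{escape(entity1)}').addE('{escape(relation)}')"
            f".to(g.V('{escape(entity2)}'))"
        )
    return statements


statements = triples_to_gremlin([("Alice", "Bob", "knows")])
```

The resulting strings would then be submitted one at a time through whichever Gremlin client you use. For large imports you would probably want batching and retry handling on top of this.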

How can we create an Azure Data Factory pipeline with Cosmos DB (with Graph API) as the data sink?

How can we create an Azure Data Factory pipeline with Cosmos DB (with Graph API) as the data sink? The data source is also Cosmos DB (with the Document DB API).
One option that is available to you is to simply continue using the Document API for the graph enabled CosmosDB sink. If you transform and write your documents into the destination in GraphSON format as regular documents they will be automatically usable as vertices and edges in future graph traversals.
The ability to use both DocumentSQL and Gremlin APIs against the same collection is one of the most exciting and powerful features of CosmosDB IMO (and the team plans to support more APIs interacting with the same dataset in the future).
Not only is this possible, but I've personally observed significant improvements in throughput when importing large datasets into a graph enabled Cosmos collection using the Document APIs instead of Gremlin. I plan to release a blog post describing this process in more detail in the near future.
The Cosmos DB Graph API is not supported yet; we will add it to our product backlog.

Is it possible to use Cosmos DB instead of Azure SQL DATABASE?

I am very excited to use Cosmos DB in my current application instead of Azure SQL Database.
Before I use Cosmos DB as the backend of my current application, I have a few questions.
My current application uses Entity Framework.
It also uses the column encryption and dynamic data masking features.
So, if I move to Cosmos DB instead of Azure SQL Database, how can I achieve those features with Cosmos DB?
The documentation doesn't give details about encryption, masking, or Entity Framework.
Can you please tell me whether it is possible to use Cosmos DB with the above requirements instead of Azure SQL Database?
Entity Framework is specific to relational databases, so it doesn't fit with Cosmos DB's document store (or graph, or tables).
Regarding encryption: Cosmos DB provides encryption-at-rest, built-in. There is no per-property data-masking feature built-in; you'd have to do your own data masking.
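As a minimal sketch of what "doing your own data masking" could mean, the application can mask sensitive properties after reading a document and before handing it to callers that shouldn't see the raw value. The field names and the "keep the last four characters" rule below are hypothetical choices, not anything built into Cosmos DB.

```python
import copy


def mask_value(value, visible=4, mask_char="*"):
    """Hide all but the last `visible` characters of a value."""
    text = str(value)
    if len(text) <= visible:
        return mask_char * len(text)
    return mask_char * (len(text) - visible) + text[-visible:]


def mask_document(doc, masked_fields):
    """Return a copy of `doc` with the named top-level fields masked."""
    result = copy.deepcopy(doc)
    for name in masked_fields:
        if name in result and result[name] is not None:
            result[name] = mask_value(result[name])
    return result


# Hypothetical document as it might come back from a read.
customer = {"id": "42", "name": "Ada", "cardNumber": "4111111111111111"}
masked = mask_document(customer, ["cardNumber"])
```

The stored document is left untouched; only the copy returned to the caller is masked, so which users get the masked view is a decision your application code has to make.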
Whether you migrate to a document (or graph, or table) store is really up to you, and whether you want to re-shape your data to fit in such a storage model, vs a relational model. No real way to answer that for you. (TL;DR you cannot just switch from relational to, say, document, without any changes, as they are fundamentally different storage concepts).

Best solution for dynamic spatial data

I'm trying to find the best solution for storing dynamic spatial data. I wonder if any of Microsoft's Azure solutions could work. Azure Table Storage would let me create a lot of custom and dynamic structures stored on fast SSD disks.
Because of the data's dynamic nature, common indexing seems useless. I would also like to create a lot of table-like structures, so the whole architecture cannot be static. Using Azure Table Storage I would dynamically create tables based on country, city, etc., sorted by latitude or longitude.
I would appreciate any clue.
Azure Table Storage has mostly been replaced by Azure Cosmos DB.
At the time of writing the Table Storage page even says:
The content in this article applies to the original basic Azure Table storage. However, there is now a premium offering for Azure Table storage in public preview that offers throughput-optimized tables, global distribution, and automatic secondary indexes. To learn more and try out the new premium experience, please check out Azure Cosmos DB: Table API.
You can use Cosmos DB via the Table API, but you'll probably find the Document DB API to be more powerful.
Documents are "schema-free". You can just throw your documents into a collection, and then you can query against them.
You can create documents which have geo-spatial properties which are indexed automatically.
Then you can perform geo-spatial queries against those properties.
For example you might give each of your documents a point, and then create a query to select all documents that are inside of a polygon.
Or maybe you want to find out how far away each document is from a given point.
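Here is a small sketch of what those two cases could look like. It only builds a GeoJSON document and the query text in Python, without calling any SDK. `ST_WITHIN` and `ST_DISTANCE` are the Cosmos DB SQL spatial functions; the property name `location`, the coordinates and the ids are made up for illustration.

```python
import json

# A document carrying a GeoJSON Point. GeoJSON orders coordinates as
# [longitude, latitude].
store = {
    "id": "store-1",
    "name": "Example store",
    "location": {"type": "Point", "coordinates": [-122.12, 47.66]},
}

# Polygon for the search area. The ring is listed counter-clockwise and
# its first and last positions are identical so that it is closed.
area = {
    "type": "Polygon",
    "coordinates": [[
        [-123.0, 47.0], [-121.0, 47.0], [-121.0, 48.0],
        [-123.0, 48.0], [-123.0, 47.0],
    ]],
}

# Select all documents whose point lies inside the polygon.
within_query = (
    "SELECT c.id FROM c WHERE ST_WITHIN(c.location, " + json.dumps(area) + ")"
)

# Report how far each document is from a given point (in meters).
origin = {"type": "Point", "coordinates": [-122.33, 47.61]}
distance_query = (
    "SELECT c.id, ST_DISTANCE(c.location, "
    + json.dumps(origin)
    + ") AS meters FROM c"
)
```

In real code you would more likely pass the polygon and point as query parameters instead of concatenating them into the query string.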
