Azure Cosmos DB monitoring metrics - Azure

Below is the Total Requests metric broken down by 'OperationType' for a particular Cosmos DB collection.
What do the "Execute" and "ReadFeed" operation types mean in particular?
These metrics do not cover operations performed via stored procedures, as mentioned here, so how can I get total read/write operation metrics for operations performed via stored procedures?

The Execute operation type refers to the execution of stored procedures.
The ReadFeed operation retrieves all documents, or just the incremental changes to documents, within the collection.
As mentioned in the documentation, the metrics currently do not capture operations performed via stored procedures. You might need to log them somewhere manually.

What do the "Execute" and "ReadFeed" operation types mean in particular?
Refer to the ReadFeed Microsoft documentation.
ReadFeed can be used to retrieve all documents, i.e., it reads the entire feed of a container.
Execute operation type:
It executes user-defined functions or JavaScript stored procedures.
You can also refer to this thread for Read vs. ReadFeed vs. Query.
AFAIK you can capture metrics using Application Insights, or you can use the context object provided by Cosmos DB in your stored procedures and use the SDK to track custom events and metrics.
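For example, here is a minimal sketch of tracking stored-procedure executions from the calling client, assuming the .NET SDK v3 (Microsoft.Azure.Cosmos) and Application Insights; the stored procedure id, partition key and metric names are placeholders, not anything built in:

// Hypothetical sketch: execute a stored procedure via the SDK and push its
// request charge and duration to Application Insights as custom metrics.
using System.Diagnostics;
using System.Threading.Tasks;
using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.Extensibility;
using Microsoft.Azure.Cosmos;
using Microsoft.Azure.Cosmos.Scripts;

public static class SprocMetrics
{
    private static readonly TelemetryClient Telemetry =
        new TelemetryClient(TelemetryConfiguration.CreateDefault());

    public static async Task<string> ExecuteAndTrackAsync(Container container)
    {
        var timer = Stopwatch.StartNew();
        StoredProcedureExecuteResponse<string> response =
            await container.Scripts.ExecuteStoredProcedureAsync<string>(
                "bulkImport",                    // placeholder sproc id
                new PartitionKey("customer-42"), // placeholder partition key
                new dynamic[] { });
        timer.Stop();

        // Record what the built-in metrics don't show for sprocs.
        Telemetry.GetMetric("Sproc.RequestCharge").TrackValue(response.RequestCharge);
        Telemetry.GetMetric("Sproc.DurationMs").TrackValue(timer.ElapsedMilliseconds);
        return response.Resource;
    }
}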


How to cache data between Azure Durable Function orchestration instances?

The documentation states that Azure Durable Functions orchestration code should be deterministic because of replays. In my case, I have some data in Azure Table Storage that I need to fetch in the workflow. The workflow is recursive, the data in Azure Table Storage can change during execution, and it is OK to have stale state for ~1 min. In regular code I would rely on a memory cache to improve performance, but in orchestrations I assume it cannot be used directly, because that makes the workflow non-deterministic.
I can still use a cache in an activity and call it from orchestrations, but every activity call involves serialization/deserialization of inputs/outputs and passing messages through the control queue. These operations are heavier than fetching the data itself.
So my question is: is there any pattern that can be used to cache data between orchestration instances in memory, without wrapping this logic in an activity?
What I can suggest is: use a distributed cache, specifically Azure Cache for Redis.
Get your data from Azure Table Storage in your orchestration, do your operation there, and save the result to the Redis cache. Then pass the id of the required data to each activity, and each activity can fetch the data from the Redis cache (a rough sketch follows below).
This is a cache-based solution, as you asked. However, please note that if you want high-performance data queries, Azure Table Storage is not the best store to work with; I suggest using either Azure SQL or Cosmos DB. If you are looking for a cheap option, Table Storage is fine, but in that case Redis cache won't be a good option for you either, because it is not a cheap solution. If Redis cache won't work for you, I would suggest reviewing your algorithm.
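A rough sketch of the cache part, assuming the StackExchange.Redis and Newtonsoft.Json packages; the key handling, one-minute expiry and connection-string setting are made up for illustration:

// Hypothetical helper: the orchestration (or an activity) caches the Table Storage
// result under an id, and each activity looks it up by that id.
using System;
using System.Threading.Tasks;
using Newtonsoft.Json;
using StackExchange.Redis;

public static class OrchestrationCache
{
    private static readonly Lazy<ConnectionMultiplexer> Redis =
        new Lazy<ConnectionMultiplexer>(() =>
            ConnectionMultiplexer.Connect(Environment.GetEnvironmentVariable("RedisConnection")));

    public static Task CacheAsync(string id, object data) =>
        Redis.Value.GetDatabase().StringSetAsync(
            id, JsonConvert.SerializeObject(data), TimeSpan.FromMinutes(1));

    public static async Task<T> GetAsync<T>(string id)
    {
        RedisValue value = await Redis.Value.GetDatabase().StringGetAsync(id);
        return value.HasValue ? JsonConvert.DeserializeObject<T>(value) : default(T);
    }
}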
Good luck!
You can store data between orchestrations with entity functions.
https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-entities
They can handle 64 operations per second:
https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-perf-and-scale#performance-targets
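A minimal sketch of that idea, assuming the Microsoft.Azure.WebJobs.Extensions.DurableTask package; the entity name "TableCache", the "get"/"set" operations and the string state are placeholders:

// Hypothetical durable entity acting as a cache shared between orchestration
// instances. The entity/operation names are made up for illustration.
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class TableCacheEntity
{
    [FunctionName("TableCache")]
    public static void Run([EntityTrigger] IDurableEntityContext ctx)
    {
        switch (ctx.OperationName.ToLowerInvariant())
        {
            case "set":
                ctx.SetState(ctx.GetInput<string>()); // store the latest snapshot
                break;
            case "get":
                ctx.Return(ctx.GetState<string>());   // hand it back to the caller
                break;
        }
    }

    // Reading the cached value from an orchestrator function:
    [FunctionName("CachedOrchestration")]
    public static async Task<string> Orchestrate(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        var entityId = new EntityId("TableCache", "shared");
        return await context.CallEntityAsync<string>(entityId, "get");
    }
}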

What does it mean that Azure Cosmos DB is multi-model?

Looking at the new Azure Cosmos database, I'm a bit confused about its multi-model nature. Specifically, does it mean:
a) That the same underlying database/store can be queried in multiple ways concurrently, so that I can use both Gremlin graph queries and the MongoDB API against the same collections.
or -
b) Does it mean that you can choose a different model (graph, key value, column, document) at the time of provisioning your Cosmos DB and that is how the data will be stored from then on.
The brochure makes it sound like a), but using the Azure dashboard to create a Cosmos instance makes it seem like b), since you have to choose a model type at creation.
Additionally, the literature makes reference to columnar data, but I don't see the option for it at create time.
Cosmos DB is a single NoSQL data engine, an evolution of DocumentDB. When you create a container ("database instance") you choose the most relevant API for your use case, which optimises the way you interact with the underlying data store and how the data is persisted into that store.
So, depending on the API chosen, it projects the desired model (graph, column, key value or document) on to the underlying store.
You can only use one API against a container; multiple are not possible due to the way the data is stored and retrieved. The API dictates the storage model (graph, key-value, column, etc.), but they all map back onto the same technology under the hood.
Thanks to @Jesse Carter's comment below, it appears you are, however, able to mix and match the Graph and DocumentDB SQL APIs.
From the docs:
Multi-model, multi-API support
Azure Cosmos DB natively supports multiple data models including documents, key-value, graph, and column-family. The core content-model of Cosmos DB’s database engine is based on atom-record-sequence (ARS). Atoms consist of a small set of primitive types like string, bool, and number. Records are structs composed of these types. Sequences are arrays consisting of atoms, records, or sequences.
The database engine can efficiently translate and project different data models onto the ARS-based data model. The core data model of Cosmos DB is natively accessible from dynamically typed programming languages and can be exposed as-is as JSON.
The service also supports popular database APIs for data access and querying. Cosmos DB’s database engine currently supports DocumentDB SQL, MongoDB, Azure Tables (preview), and Gremlin (preview). You can continue to build applications using popular OSS APIs and get all the benefits of a battle-tested and fully managed, globally distributed database service.
Cosmos DB at its heart is a geographically distributed database with its own Atom-Record-Sequence storage engine and index. On top of that infrastructure we are able to implement many different kinds of stores, from SQL like stores using our SQL API, to Mongo, to Cassandra, to Gremlin, to an implementation of Azure Table storage and so on.
Each of the different store types has its own data types (e.g. ways of encoding numbers, dates, etc.) and is encoded in our storage and index layer in its own way. Over time we expect most of those data types to be natively supported by our SQL API, but for now each of our database types uses its own encoding conventions. When creating an account in Cosmos DB (this is a unit of organization; users can have many accounts) the "type" of database is specified on the account. So one can have a Table API account or a Mongo account or what have you.
In some cases it is possible to access an account with Data Type X using API Y. For example, one can use SQL API to talk to tables in a Table API account. But outside of graph, that is usually not a great idea. Right now we encode information for each API in a special format and the different data types don't speak each other's formats. So if one were to write to a Table API using SQL API the end result will most likely be corrupt data.
The exception is graph which we work hard to make sure work reasonably well with all database types and we'll have more to say on that in the future.
So if you do want to play around with multi-API access, we strongly encourage you to only do so in "read only" mode when not using the "native" API for the given account. In other words, by all means play around with the SQL API reading from a Table API account, just please don't write to a Table API account using a SQL API client.
The accepted answer misses out on some points.
Cosmos DB is a NoSQL database, but it is highly distributed and its storage format is Atom-Record-Sequence.
Why does that matter? We know that it accepts JSON as input and output format, but that does not mean Cosmos stores its data as JSON; it could actually be any format. This helps us reason about the multi-model nature of Cosmos: what you get when you execute a query according to a certain model is probably a projection or view of your data.
@JesseCarter already explained that we can use the Document API and the Graph API interchangeably. Last week the Table API was publicly announced, and this API is probably not too different either.
The guys over at Spectologic have written a nice blog post about cross-API usage of Cosmos and have also pointed out that the multi-model nature is more cosmetic than internal; the only real exception seems to be Mongo. The interesting part is in the chapter 'Switching the portal experience' here: https://blog.spectologic.com/2017/06/30/digging-into-cosmosdb-storage/
So maybe in the end it boils down to GlobalDocumentDb vs. MongoDb
I too was intrigued by this, wanting to understand more from an API usage auditing perspective, and I have learned more reading through these answers.
Upon experimenting, it appears things have progressed further than the original answers, so to add a contemporary spin...
I have been able to successfully create a Cosmos DB account choosing the SQL API, create a document in the portal, and then retrieve the document via the MongoDB API (a rough sketch of that retrieval is below).
The original answers suggested that MongoDB was the odd one out and couldn't interact with data created with other APIs.
Whether fuller testing would result in corrupt documents due to the data type differences hinted at by Yaron (https://stackoverflow.com/a/48286729/141022), and whether the storage differences would still result in poor performance, remains to be seen.
For my purposes I'm interested in whether auditing one API is enough, which in this case it is not, as data created in one can be retrieved by another, so I haven't tested in depth.
Notably, the ARM template deploys with neither the GlobalDocumentDB nor the MongoDB kind specified; however, exporting the ARM template back from the portal results in GlobalDocumentDB, if that happens to make a difference.
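For reference, the retrieval side of that experiment could look roughly like this, assuming the MongoDB.Driver package and made-up database/collection names, with the account's Mongo connection string taken from the portal:

// Hypothetical read-only access across APIs, in line with the earlier advice
// about not writing through a non-native API. All names are placeholders.
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

public static class CrossApiRead
{
    public static Task<BsonDocument> ReadFirstAsync(string mongoConnectionString)
    {
        var client = new MongoClient(mongoConnectionString);
        IMongoCollection<BsonDocument> collection = client
            .GetDatabase("mydb")                          // placeholder
            .GetCollection<BsonDocument>("mycollection"); // placeholder
        return collection.Find(FilterDefinition<BsonDocument>.Empty).FirstOrDefaultAsync();
    }
}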
If you are interested in the implementation details of CosmosDB, you can read this whitepaper from a long time ago (assuming that the implementation hasn't changed). http://www.vldb.org/pvldb/vol8/p1668-shukla.pdf
TLDR:
At the bottom, Cosmos DB stores data in ARS and exposes it in JSON format.
The database engine indexes ALL fields in ALL documents by default, enabling very flexible queries.
The database engine executes an intermediate language similar to JavaScript, bridging the low-level storage and the APIs that the database exposes.
Because of that bridging, more database APIs can be added to support different querying mechanisms (e.g. SQL, document, columnar).
Multi-model means your data can be stored in a number of different ways. Currently, Cosmos DB supports 4 different types of data and allows you to integrate with an API and build out a user experience around these storage types.
The 4 types are DocumentDB or MongoDB (document), graph database, key-value pair, and wide column or column family.

How to get snapshot consistency when extracting data from MS Dynamics CRM?

I've been researching how to extract data from an MS Dynamics CRM 2011/3 Online instance so that I can replicate entire CRM entities in a target database.
I've looked at the Retrieve and RetrieveAll operations of the Organisation web service. These are able to extract data from a single CRM entity (entity type).
There's also the FetchXML interface, which can retrieve data from multiple entities using a complex query.
It's possible that there will be no quiet time, with no data changes being made by users or via web services, that I could use to extract data from the system in order to get a consistent snapshot of the data.
If I was able to access the SQL Server database directly I would be able to set an isolation level for a transaction and extract all data within that transaction, and get a consistent view of data.
I think FetchXML would give me a consistent snapshot, but only of the data queried by each call to it.
I could use FetchXML to query all the entities I'd like to replicate in a single call, and then renormalise the data with some ETL code on my target database. That query wouldn't be nice though (complex, possibly non-performant, and impacting system performance).
So, basically my problem is this: if I extract from each entity in turn, and the database is changing whilst I'm extracting, I'm highly likely to get an inconsistent data set in my target database.
How can I get a consistent snapshot of data to access?
You can contact support through the Support Portal and request a database backup. Then you can just restore that database to your On-Premise installation through Deployment Manager.
EDIT
After your comments below, I suggest a "push" model instead of a "pull" model. You'll need to create plugins for Create/Update/Delete on all entities in which you are interested in CRM Online. These plugins will push those updates to your database (probably through your own web-service). Since these plugins happen inside the transaction, if your web service throws an error you can cancel the source action in CRM, thus guaranteeing transactional consistency.
Once you get these plugins up and running, you can do a one-time export and your plugins will keep it up to date from there.
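A bare-bones sketch of such a plugin, assuming the Microsoft.Xrm.Sdk assembly; the push target is just a placeholder for your own web service or database:

// Hypothetical plugin registered on Create/Update/Delete of the entities you replicate.
// Throwing here cancels the source operation in CRM, which is what keeps the
// replica transactionally consistent with CRM.
using System;
using Microsoft.Xrm.Sdk;

public class ReplicationPlugin : IPlugin
{
    public void Execute(IServiceProvider serviceProvider)
    {
        var context = (IPluginExecutionContext)serviceProvider
            .GetService(typeof(IPluginExecutionContext));

        if (!context.InputParameters.Contains("Target") ||
            !(context.InputParameters["Target"] is Entity))
        {
            return; // Delete, for example, passes an EntityReference instead
        }
        var entity = (Entity)context.InputParameters["Target"];

        try
        {
            PushToReplicaStore(context.MessageName, entity); // placeholder call
        }
        catch (Exception ex)
        {
            throw new InvalidPluginExecutionException("Replication failed.", ex);
        }
    }

    private static void PushToReplicaStore(string message, Entity entity)
    {
        // Hypothetical: call your own web service / write to your target database.
    }
}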

What would be the best way to migrate data from SQL Azure to Azure Table?

For a project, I am using both SQL Azure and Azure Table Storage. A requirement here is that for the first 7 days, all data is stored in SQL Azure. After the first 7 days, the data is migrated into Azure Table Storage.
Is there any reliable project to achieve this goal? Or any ideas on how to implement this?
Thanks,
I think your best bet is to have a set of SQL queries (or sprocs) that return data older than 7 days. Then have table-insertion code that writes this data to one or more tables, with an appropriate partition/row key based on your query needs. Then, just build some type of background operation to perform the read+write+delete. There's no tool to do this (that I know of), since one is a relational database and the other is a NoSQL variant with no specific schema.
To optimize your writes, see if you can write batches of rows at the same time (this is called an Entity Group Transaction). It reduces the number of transactions, plus the rows in a group will be written atomically. See more info on Entity Group Transactions here; a sketch follows below.
You also may want to consider using a queue for workload assignment. That is, maybe once a day (or hour, whenever), push a queue message telling some background process to transfer data from SQL to Table Storage. This way, in case something fails during the operation, you can process it again later, since the queue message will still be there (you'd only delete the message if the operation succeeded).
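A rough sketch of the batch-write side, assuming the classic WindowsAzure.Storage SDK; the entity shape, table name and key choices are placeholders, and each batch must stay within a single partition and at most 100 entities:

// Hypothetical: write rows pulled from SQL Azure to Table Storage using entity
// group transactions. Entity shape, table name and keys are made up.
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

public class ArchivedRecord : TableEntity
{
    public string Payload { get; set; }
}

public static class Archiver
{
    public static async Task ArchiveAsync(string connectionString, IEnumerable<ArchivedRecord> records)
    {
        CloudTable table = CloudStorageAccount.Parse(connectionString)
            .CreateCloudTableClient()
            .GetTableReference("ArchivedRecords");
        await table.CreateIfNotExistsAsync();

        // A batch (entity group transaction) needs one partition key and at most 100 rows.
        foreach (var partition in records.GroupBy(r => r.PartitionKey))
        {
            foreach (var chunk in partition.Select((r, i) => new { r, i })
                                           .GroupBy(x => x.i / 100, x => x.r))
            {
                var batch = new TableBatchOperation();
                foreach (ArchivedRecord record in chunk)
                {
                    batch.InsertOrReplace(record);
                }
                await table.ExecuteBatchAsync(batch);
            }
        }
    }
}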
If you're looking for a tool to do so, take a look at Cloud Storage Studio (http://www.cerebrata.com/products/cloudstoragestudio) which has a feature to import data from SQL Server to Azure Table Storage. I haven't checked for a long time but I believe ClumsyLeaf's TableXplorer (http://www.clumsyleaf.com) also has this feature. Long time back, we also built an open source tool to do the same. You can find it here: http://azuredatabaseupload.codeplex.com/.
As David mentioned, you could basically write some views in your database to fetch data older than 7 days. The idea is simple: You fetch the data, map the SQL Server data types to Azure data types, choose appropriate PartitionKey/RowKey values, convert the data into entities and then upload entities in batches.

Using the WCF Data Services client for Azure Table Storage - storing graphs of objects

I am working with Azure Table storage using the .NET API (TableServiceContext, WCF Data Service, etc). I have a simple graph of objects that I want to save to the table store. In the service context class, I have the following code.
_TableClient.CreateTableIfNotExist("AggRootTable");
this.AddObject("AggRoots", model);
foreach (var related in model.RelatedObjects)
{
    this.AddRelatedObject(model, "RelatedCollection", related);
}
this.SaveChanges();
I have used this style of code in WCF Data Services via EF and SQL Server, but it doesn't work against Azure Tables. I would not expect it to, as there aren't real relationships between tables in Azure. However, the methods are there. Does anyone know how to use AddRelatedObject, AddLink, etc. in the context of Azure Tables, or can you suggest approaches to storing object graphs in general? I haven't been able to find any docs, and Google hasn't been helpful.
Thanks,
Erick
You can't. ATS does not support relationships. There are many non-working methods available because it uses the Data Services API.
What you can do, however, is store the full object tree in a single table (a sketch follows below). Not sure if this will work for your design/architecture, though.
Also, it is a bad idea to keep calling CreateIfNotExists before every write operation. First, you pay extra for the transactions incurred by the round trip; second, the call is not instantaneous and will slow down your writes.
Just pre-create the tables before deployment or during role start.
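One possible shape of that single-table approach, as a sketch only; it assumes the classic WindowsAzure.Storage SDK and Newtonsoft.Json, and the entity and property names are invented:

// Hypothetical: flatten the aggregate root and its related objects into a single
// entity by serializing the children into one string property.
using System.Collections.Generic;
using Microsoft.WindowsAzure.Storage.Table;
using Newtonsoft.Json;

public class RelatedObject
{
    public string Value { get; set; }
}

public class AggRoot
{
    public string Id { get; set; }
    public string Name { get; set; }
    public List<RelatedObject> RelatedObjects { get; set; }
}

public class AggRootEntity : TableEntity
{
    public string Name { get; set; }
    public string RelatedObjectsJson { get; set; } // children as JSON; mind the entity size limits
}

public static class AggRootMapper
{
    public static AggRootEntity ToEntity(AggRoot model)
    {
        return new AggRootEntity
        {
            PartitionKey = "AggRoots",
            RowKey = model.Id,
            Name = model.Name,
            RelatedObjectsJson = JsonConvert.SerializeObject(model.RelatedObjects)
        };
    }
}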
The Table Storage Service is generally not a good place to store entire object graphs, since there's a size limit (of 1 MB, IIRC) on each row/entity. Obviously, if you know that your object graphs will never be large, you may not care...
A good alternative is often to store a serialized graph in Blob Storage. However, you must have a strategy for how to handle versioning.
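For that alternative, a minimal sketch assuming the classic WindowsAzure.Storage blob client and Newtonsoft.Json; the container name and the "-v1" suffix are placeholders for whatever versioning strategy you choose:

// Hypothetical: serialize the whole object graph to JSON and store it as a block blob.
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;
using Newtonsoft.Json;

public static class GraphBlobStore
{
    public static async Task SaveAsync(string connectionString, string graphId, object graph)
    {
        CloudBlobContainer container = CloudStorageAccount.Parse(connectionString)
            .CreateCloudBlobClient()
            .GetContainerReference("object-graphs");
        await container.CreateIfNotExistsAsync();

        CloudBlockBlob blob = container.GetBlockBlobReference(graphId + "-v1.json");
        await blob.UploadTextAsync(JsonConvert.SerializeObject(graph));
    }
}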
