How to get snapshot consistency when extracting data from MS Dynamics CRM? - dynamics-crm-2011

I've been researching how to extract data from an MS Dynamics CRM 2011/3 Online instance so that I can replicate entire CRM entities in a target database.
I've looked at the Retrieve and RetrieveMultiple operations of the Organization web service. These can extract data from a single CRM entity (entity type).
There's also the FetchXML interface, which can retrieve data from multiple entities using a complex query.
It's possible that there will be no quiet time, when no data changes are being made by users or via web services, that I could use to extract a consistent snapshot of the data from the system.
If I was able to access the SQL Server database directly I would be able to set an isolation level for a transaction and extract all data within that transaction, and get a consistent view of data.
I think FetchXML would give me a consistent snapshot, but only of the data queried by each call to it.
I could use FetchXML to query all the entities I'd like to replicate in a single call, and then renormalise the data with some ETL code on my target database. That query wouldn't be nice though (complex, possibly slow, and likely to affect system performance).
So, basically my problem is this: if I extract from each entity in turn, and the database is changing whilst I'm extracting, I'm highly likely to get an inconsistent data set in my target database.
How can I get a consistent snapshot of data to access?

You can contact support through the Support Portal and request a database backup. Then you can just restore that database to your On-Premise installation through Deployment Manager.
EDIT
After your comments below, I suggest a "push" model instead of a "pull" model. You'll need to create plugins for Create/Update/Delete on every entity you are interested in within CRM Online. These plugins will push those changes to your database (probably through your own web service). Since synchronous plugins run inside the CRM transaction, if your web service returns an error the plugin can throw an exception and cancel the source action in CRM, which keeps the two systems transactionally consistent.
Once you get these plugins up and running, you can do a one-time export and your plugins will keep it up to date from there.
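As a rough illustration of the receiving side of this push model, here is a minimal Python sketch. The names `ChangeEvent` and `Replica` are hypothetical, the in-memory dictionary stands in for your target database, and it assumes that update events carry only the changed attributes. Raising an error is the signal the calling plugin would use to cancel the source action.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeEvent:
    """One Create/Update/Delete notification pushed by a plugin (hypothetical shape)."""
    entity: str          # logical entity name, e.g. "account"
    record_id: str       # primary key of the record
    operation: str       # "create", "update" or "delete"
    attributes: dict = field(default_factory=dict)


class Replica:
    """In-memory stand-in for the target database."""

    def __init__(self):
        self.tables = {}

    def apply(self, event):
        table = self.tables.setdefault(event.entity, {})
        if event.operation == "create":
            table[event.record_id] = dict(event.attributes)
        elif event.operation == "update":
            # Assumption: updates carry only the changed attributes, so merge them.
            table.setdefault(event.record_id, {}).update(event.attributes)
        elif event.operation == "delete":
            table.pop(event.record_id, None)
        else:
            # An error here should make the calling plugin fail and roll back.
            raise ValueError(f"unknown operation: {event.operation}")


replica = Replica()
replica.apply(ChangeEvent("account", "1", "create", {"name": "Contoso", "city": "Leeds"}))
replica.apply(ChangeEvent("account", "1", "update", {"city": "York"}))
```

The one-time export would seed `replica.tables`, and events arriving afterwards keep it current.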

Best practice - Storage options for external reference data that is queried in different ways

We have a cloud platform with various Health Care applications. Each application needs what we call reference data. Reference data is always external data coming from a provider on a daily or some regular schedule. An example of reference data is FDB MedKnowledge which includes a comprehensive compendium of consumer medication monographs, along with drug images and imprints.
Various applications will query the reference data to present it to their target customers (who can be physicians, nurses, technicians, procurement department etc...). A common global API will be developed to return the requested data.
Historical information is required. For example, FDB in 2017 had NDC1, which was then deleted from the FDB feed in 2019; a physician who prescribed NDC1 should still be able to look up that drug's information from history.
Every day we receive the feed from the external provider and use it as the input source to merge (update, insert, delete) into our copy of the reference data, so that its live table reflects the latest external feed.
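To make the daily merge and the history requirement concrete, here is a minimal Python sketch. It is only an illustration of the idea behind system versioning, not how any Azure service implements it; `merge_feed`, `lookup`, and the dictionary layout are hypothetical.

```python
from datetime import date


def merge_feed(live, history, feed, as_of):
    """Make `live` match `feed`, moving superseded rows into `history`."""
    # Deletes: rows that disappeared from the feed are closed off in history.
    for key in list(live):
        if key not in feed:
            old = live.pop(key)
            history.append({"key": key, "data": old["data"],
                            "valid_from": old["valid_from"], "valid_to": as_of})
    # Inserts and updates.
    for key, data in feed.items():
        current = live.get(key)
        if current is None:
            live[key] = {"data": data, "valid_from": as_of}
        elif current["data"] != data:
            history.append({"key": key, "data": current["data"],
                            "valid_from": current["valid_from"], "valid_to": as_of})
            live[key] = {"data": data, "valid_from": as_of}


def lookup(live, history, key, on_date):
    """Return the data for `key` as it was on `on_date`, or None."""
    current = live.get(key)
    if current is not None and current["valid_from"] <= on_date:
        return current["data"]
    for row in history:
        if row["key"] == key and row["valid_from"] <= on_date < row["valid_to"]:
            return row["data"]
    return None


live, history = {}, []
merge_feed(live, history, {"NDC1": {"name": "Drug A"}}, date(2017, 1, 1))
merge_feed(live, history, {}, date(2019, 1, 1))  # NDC1 dropped from the feed
```

After the second merge NDC1 is gone from `live`, but `lookup(live, history, "NDC1", date(2018, 6, 1))` still finds it in `history`.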
In Azure, we have the following storage options:
Blob storage
Cosmos DB
Azure SQL Database with system versioning
Azure SQL Data Warehouse
Azure Data Lake
What is the best practice to store external reference data? We are leaning toward Azure SQL Database with system versioning. Have any of you worked with external reference data? If yes, what was your storage decision and has it worked well for you? I would like to hear your comments and opinions. Thank you!
You need to base your choice on the type of data you are trying to store, and how you need to reference it. It sounds like you might actually need a few different technologies here.
For example, Azure SQL is great for storing relational data. So if your data is tabular in form and needs to have relationships between it, then this is a good choice. However, if you're going to be storing millions and millions of rows then performance might suffer in a relational database. In that sort of scenario, or one where you have lots of transactional data you might want to look at Cosmos DB.
You mentioned images at one point. Putting these in a database is not a good idea; for this sort of scenario you will want to look at Blob storage.
"Reference data" on its own doesn't tell you much. Look at the individual types of data you need to store and how each is used, and make decisions based on that. For lots of different types of data, there is unlikely to be a one-size-fits-all solution.

What does it mean that Azure Cosmos DB is multi-model?

Looking at the new Azure cosmos database, I'm a bit confused about the multi-model nature of it. Specifically, does it mean:
a) That the same underlying database/store can be queried multiple ways concurrently, so that I can use both Gremlin graph queries and the MongoDB API against the same collections.
or -
b) Does it mean that you can choose a different model (graph, key value, column, document) at the time of provisioning your Cosmos DB and that is how the data will be stored from then on.
The brochure makes it sound like a), but creating a Cosmos DB instance in the Azure dashboard makes it seem like b), since you have to choose a model type at creation.
Additionally, the literature makes reference to columnar data, but I don't see the option for it at create time.
Cosmos DB is a single NoSQL data engine, an evolution of DocumentDB. When you create a container ("database instance") you choose the most relevant API for your use case, which optimises the way you interact with the underlying data store and how the data is persisted into that store.
So, depending on the API chosen, it projects the desired model (graph, column, key-value or document) onto the underlying store.
You can only use one API against a container; multiple APIs are not possible because of the way the data is stored and retrieved. The API dictates the storage model (graph, key-value, column, etc.), but they all map back onto the same technology under the hood.
Thanks to @Jesse Carter's comment below, it appears you are, however, able to mix and match the graph and DocumentDB SQL APIs.
From the docs:
Multi-model, multi-API support
Azure Cosmos DB natively supports multiple data models including documents, key-value, graph, and column-family. The core content-model of Cosmos DB’s database engine is based on atom-record-sequence (ARS). Atoms consist of a small set of primitive types like string, bool, and number. Records are structs composed of these types. Sequences are arrays consisting of atoms, records, or sequences.
The database engine can efficiently translate and project different data models onto the ARS-based data model. The core data model of Cosmos DB is natively accessible from dynamically typed programming languages and can be exposed as-is as JSON.
The service also supports popular database APIs for data access and querying. Cosmos DB’s database engine currently supports DocumentDB SQL, MongoDB, Azure Tables (preview), and Gremlin (preview). You can continue to build applications using popular OSS APIs and get all the benefits of a battle-tested and fully managed, globally distributed database service.
Cosmos DB at its heart is a geographically distributed database with its own Atom-Record-Sequence storage engine and index. On top of that infrastructure we are able to implement many different kinds of stores, from SQL like stores using our SQL API, to Mongo, to Cassandra, to Gremlin, to an implementation of Azure Table storage and so on.
Each of the different store types has its own data types (e.g. ways of encoding numbers, dates, etc.), which are encoded in our storage and index layer in their own way. Over time we expect most of those data types to be natively supported by our SQL API. But for now each of our database types uses its own encoding conventions. When creating an account in Cosmos DB (this is a unit of organization; users can have many accounts) the "type" of database is specified on the account. So one can have a Table API account or a Mongo account or what have you.
In some cases it is possible to access an account with Data Type X using API Y. For example, one can use SQL API to talk to tables in a Table API account. But outside of graph, that is usually not a great idea. Right now we encode information for each API in a special format and the different data types don't speak each other's formats. So if one were to write to a Table API using SQL API the end result will most likely be corrupt data.
The exception is graph, which we work hard to make sure works reasonably well with all database types, and we'll have more to say on that in the future.
So if you do want to play around with multi-API access, we strongly encourage you to only do so in "read only" mode when not using the "native" API for the given account. In other words, by all means play around with the SQL API reading from a Table API account, just please don't write to a Table API account using a SQL API client.
The accepted answer misses out on some points.
Cosmos DB is a NoSQL database, but it is highly distributed and its storage format is Atom-Record-Sequence.
Why does that matter? We know that it accepts JSON as its input and output format, but that does not mean Cosmos stores its data as JSON; it could be any format. This helps us to reason about the multi-modelness of Cosmos: what you get when you execute a query according to a certain model is probably a projection or view of your data.
@JesseCarter already explained that we can use the Document API and the Graph API interchangeably. Last week the Table API was publicly announced, and this API is probably not too different either.
The guys over at Spectologic have written a nice blog post about cross-API usage of Cosmos and have also pointed out that the multi-modelness is more cosmetic than internal; the only real exception seems to be Mongo. The interesting part is in the chapter 'Switching the portal experience' here: https://blog.spectologic.com/2017/06/30/digging-into-cosmosdb-storage/
So maybe in the end it boils down to GlobalDocumentDb vs. MongoDb
I too was intrigued by this, wanting to understand more from an API usage auditing perspective, and have learned more by reading through these answers.
Upon experimenting, it appears things have progressed further than the original answers suggest, so to add a contemporary spin...
I have been able to successfully create a Cosmos DB account choosing the SQL API, created a document in the portal then retrieved the document via the MongoDB API.
The original answers suggested that MongoDB was the odd-one-out and couldn't interact with data created with other APIs.
Whether fuller testing would reveal corrupt documents due to the data type differences hinted at by Yaron (https://stackoverflow.com/a/48286729/141022), and whether the storage differences would result in poor performance, remains to be seen.
For my purposes I'm interested in whether auditing one API is enough. In this case it is not, as data created through one API can be retrieved through another, so I haven't tested in depth.
Notably, the ARM template deploys with neither the GlobalDocumentDB nor the MongoDB kind; however, exporting the ARM template back from the portal yields GlobalDocumentDB, in case that makes a difference.
If you are interested in the implementation details of CosmosDB, you can read this whitepaper from a long time ago (assuming that the implementation hasn't changed). http://www.vldb.org/pvldb/vol8/p1668-shukla.pdf
TLDR:
At the bottom, CosmosDB stores data in ARS and exposes it in JSON format.
The database engine indexes ALL fields in ALL documents by default, which enables very flexible queries.
The database engine executes an intermediate language similar to JavaScript, bridging the low-level storage and the APIs that the database exposes.
Because of that bridging, more database APIs can be added to support different querying mechanisms (e.g. SQL, document, columnar).
Multi-model means your data can be stored in a number of different ways. Currently, CosmosDB stores 4 different types of data, and it allows you to integrate with an API and build out a user experience around these database storage types.
The 4 types are document (DocumentDB or MongoDB), graph, key-value pair, and wide column (column family).

CRM 2015: Archive options for Audit logs

We are using MS CRM 2015 and we are looking to know our options/best practice to archive audit logs. Any suggestion please? Thanks!
You can use the MSCRM Toolkit at http://mscrmtoolkit.codeplex.com/, which has a tool called Audit Export Manager to aid in archiving audit logs. The documentation for the tool is available at http://mscrmtoolkit.codeplex.com/documentation#auditexportmanager . The tool lets you filter by entity, metadata, and summary or detail, and pick individual users, actions, and/or operations to include in your export. Exports can be limited to a particular date range and written to CSV, XML, or XML Spreadsheet 2003 format. Note that I've had trouble exporting with a couple of the formats, but I typically get good results when exporting to CSV.
This is one of the tools I've found that gives you some flexibility when exporting audit records since Microsoft CRM allows you to filter the audit data, but doesn't provide a good built in means to export it.
You can try the new Stretch Database feature in SQL Server 2016:
Stretch Database migrates your cold data transparently and securely to the Microsoft Azure cloud.
Stretch warm and cold transactional data dynamically from SQL Server to Microsoft Azure with SQL Server Stretch Database. Unlike typical cold data storage, your data is always online and available to query. You can provide longer data retention timelines without breaking the bank for large tables like Customer Order History.
There is a helpful hands-on review, "SQL Server 2016 Stretch Database", with a very interesting SWITCH TABLE example.
There should also be a solution that moves archived audit data to a separate filegroup. Take a look at "Transferring Data Efficiently by Using Partition Switching":
You can use the Transact-SQL ALTER TABLE...SWITCH statement to quickly and efficiently transfer subsets of your data in the following ways:
Assigning a table as a partition to an already existing partitioned table.
Switching a partition from one partitioned table to another.
Reassigning a partition to form a single table.

What would be the best way to migrate data from SQL Azure to Azure Table

For a project, I am using both SQL Azure and Azure Table storage. A requirement is that all data is stored in SQL Azure for the first 7 days. After the first 7 days, the data is migrated into Azure Table storage.
Is there a reliable project that achieves this? Or any ideas on how to implement it?
thanks,
I think your best bet is to have a set of SQL queries (or sprocs) that return data older than 7 days. Then have table-insertion code that writes this data to one or more tables, with appropriate partition/row keys based on your query needs. Then, just build some type of background operation to perform the read+write+delete. There's no tool to do this (that I know of), since one is a relational database and the other is a NoSQL variant with no specific schema.
To optimize your writes, see if you can write batches of rows at the same time (this is called an Entity Group Transaction). It reduces the number of transactions, and the rows in a group, which must all share the same PartitionKey, are written atomically. See the documentation on entity group transactions for more info.
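As a sketch of how rows could be grouped for entity group transactions, the Python below splits entities into batches that share one PartitionKey and hold at most 100 entities each, which are the documented limits for a single batch. The function name `group_into_batches` and the dictionary representation of an entity are assumptions for illustration; the actual upload call depends on the storage client you use and is not shown.

```python
from collections import defaultdict

MAX_BATCH_SIZE = 100  # an entity group transaction accepts at most 100 entities


def group_into_batches(entities, max_size=MAX_BATCH_SIZE):
    """Split entities into batches that share a PartitionKey and fit the size limit."""
    by_partition = defaultdict(list)
    for entity in entities:
        by_partition[entity["PartitionKey"]].append(entity)

    batches = []
    for rows in by_partition.values():
        # Chunk each partition's rows into slices of at most max_size.
        for start in range(0, len(rows), max_size):
            batches.append(rows[start:start + max_size])
    return batches


entities = [{"PartitionKey": "2012-01", "RowKey": str(i)} for i in range(250)]
entities += [{"PartitionKey": "2012-02", "RowKey": str(i)} for i in range(3)]
batches = group_into_batches(entities)
```

Each resulting batch can then be submitted as one transaction, and the source rows deleted from SQL only after that batch succeeds.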
You also may want to consider using a queue for workload assignment. That is, maybe once a day (or hour, whenever), push a queue message telling some background process to transfer data from SQL to Table Storage. This way, in case something fails during the operation, you can process it again later, since the queue message will still be there (you'd only delete the message if the operation succeeded).
If you're looking for a tool to do so, take a look at Cloud Storage Studio (http://www.cerebrata.com/products/cloudstoragestudio), which has a feature to import data from SQL Server to Azure Table Storage. I haven't checked for a long time, but I believe ClumsyLeaf's TableXplorer (http://www.clumsyleaf.com) also has this feature. A long time ago, we also built an open source tool to do the same. You can find it here: http://azuredatabaseupload.codeplex.com/.
As David mentioned, you could basically write some views in your database to fetch data older than 7 days. The idea is simple: You fetch the data, map the SQL Server data types to Azure data types, choose appropriate PartitionKey/RowKey values, convert the data into entities and then upload entities in batches.
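The mapping step might look something like the following Python sketch. The helper names and the column names in the example row are hypothetical. It assumes that decimals are stored as strings (Table storage has no decimal type), that naive datetimes are already in UTC, and that NULL columns are simply omitted, since Table storage does not persist null properties.

```python
from datetime import datetime, timezone
from decimal import Decimal


def to_table_value(value):
    """Convert a SQL-side value into something Table storage can hold."""
    if isinstance(value, Decimal):
        return str(value)  # assumption: keep exact precision as text
    if isinstance(value, datetime):
        if value.tzinfo is None:
            return value.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
        return value.astimezone(timezone.utc)
    return value


def row_to_entity(row, partition_column, row_column):
    """Build an entity dict from a SQL row, choosing the key columns explicitly."""
    entity = {
        "PartitionKey": str(row[partition_column]),
        "RowKey": str(row[row_column]),
    }
    for column, value in row.items():
        if column in (partition_column, row_column) or value is None:
            continue  # keys are already set; NULLs are left out
        entity[column] = to_table_value(value)
    return entity


row = {"CustomerId": 42, "OrderId": 7, "Amount": Decimal("19.99"),
       "Note": None, "CreatedAt": datetime(2012, 1, 1)}
entity = row_to_entity(row, partition_column="CustomerId", row_column="OrderId")
```

Which columns become PartitionKey and RowKey should follow from how the data will be queried after migration.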

Using the WCF Data Services client for Azure Table Storage - storing graphs of objects

I am working with Azure Table storage using the .NET API (TableServiceContext, WCF Data Service, etc). I have a simple graph of objects that I want to save to the table store. In the service context class, I have the following code.
    _TableClient.CreateTableIfNotExist("AggRootTable");
    this.AddObject("AggRoots", model);
    foreach (var related in model.RelatedObjects)
    {
        this.AddRelatedObject(model, "RelatedCollection", related);
    }
    this.SaveChanges();
I have used this style of code in WCF Data Services via EF and a SQL Server, but it doesn't work against Azure Tables. I would not expect it to, as there aren't real relationships between tables in Azure. However, the methods are there. Does anyone know how to use AddRelatedObject, AddLink, etc in the context of Azure Tables? Or can suggest approaches to storing object graphs in general? I haven't been able to find any docs, and Google hasn't been helpful.
Thanks,
Erick
You can't. ATS does not support relationships. Many of the methods on the context do not work against it; they are only there because the client is built on the generic Data Services API.
What you can do, however, is store the full object tree in a single table. I'm not sure whether this will work for your design/architecture.
Also, it is a bad idea to call CreateTableIfNotExist before every write operation. First, you pay extra for the transactions incurred by the round trip; second, the call is not instantaneous and will slow down your writes.
Just pre-create the tables before deployment or during role start.
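One way to picture storing the full object tree in a single table is sketched below in Python. The `flatten_aggregate` helper, the `Kind` property, and the key scheme are hypothetical: the root and its related objects share the root's id as PartitionKey, so the whole tree can be read back with one partition query.

```python
def flatten_aggregate(root):
    """Turn a root object and its related objects into single-table entities."""
    partition_key = str(root["id"])

    root_entity = {"PartitionKey": partition_key, "RowKey": "root", "Kind": "AggRoot"}
    root_entity.update({k: v for k, v in root.items() if k not in ("id", "related")})
    entities = [root_entity]

    for child in root.get("related", []):
        child_entity = {
            "PartitionKey": partition_key,
            "RowKey": f"related_{child['id']}",
            "Kind": "Related",
        }
        child_entity.update({k: v for k, v in child.items() if k != "id"})
        entities.append(child_entity)
    return entities


model = {"id": "r1", "name": "Root",
         "related": [{"id": "a", "value": 1}, {"id": "b", "value": 2}]}
entities = flatten_aggregate(model)
```

Because every entity lands in the same partition, a small tree could also be saved atomically in one batch.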
The Table Storage Service is generally not a good place to store entire object graphs, since there's a size limit (of 1 MB, IIRC) on each row/entity. Obviously, if you know that your object graphs will never be large, you may not care...
A good alternative is often to store a serialized graph in Blob Storage. However, you must have a strategy for how to handle versioning.
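A minimal sketch of one possible versioning strategy, in Python: wrap the serialized graph in an envelope that records a schema version, and upgrade old payloads step by step when reading. The envelope layout, the version numbers, and the `children` to `related` rename are all made up for illustration; uploading the bytes to Blob Storage is left out.

```python
import json

CURRENT_VERSION = 2


def _upgrade_v1_to_v2(payload):
    # Hypothetical schema change: v1 stored the child list under "children".
    upgraded = dict(payload)
    upgraded["related"] = upgraded.pop("children", [])
    return upgraded


# Maps a version number to the function that upgrades it to the next version.
UPGRADES = {1: _upgrade_v1_to_v2}


def serialize_graph(graph):
    """Serialize the graph to bytes, tagged with the current schema version."""
    envelope = {"schema_version": CURRENT_VERSION, "payload": graph}
    return json.dumps(envelope).encode("utf-8")


def deserialize_graph(blob):
    """Read a blob written by any known version and return a current-shape graph."""
    envelope = json.loads(blob.decode("utf-8"))
    version = envelope["schema_version"]
    payload = envelope["payload"]
    while version < CURRENT_VERSION:
        payload = UPGRADES[version](payload)
        version += 1
    return payload


graph = {"id": "r1", "related": [{"id": "a"}]}
restored = deserialize_graph(serialize_graph(graph))
```

Old blobs never need to be rewritten in place; they are upgraded on read.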
