Can you do distributed transactions across Azure SQL and Azure Cosmos DB?

I have a C# application whose domain transactions are stored in Azure SQL. For my event store, I would like to use Azure Cosmos DB. Will a distributed transaction across the two work?

Of course distributed transactions work across whatever systems, if implemented correctly. But actually implementing a full architecture for a distributed atomic commit protocol with transaction managers is a daunting task.
If you mean built-in support in Azure: no, there is no support for that. Even within a single Cosmos DB container, transaction support is very limited (a transaction is scoped to a single logical partition).
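To give a feel for what a distributed atomic commit protocol involves, here is a minimal in-memory sketch of two-phase commit in Python. The `Participant` and `Coordinator` classes and their method names are hypothetical and do not correspond to any Azure SQL or Cosmos DB API. A real implementation would also need durable logs, timeouts, and crash recovery, which is where most of the difficulty lies.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """Hypothetical resource manager standing in for one database."""
    name: str
    can_commit: bool = True                 # simulated vote in the prepare phase
    committed: list = field(default_factory=list)
    _pending: object = None                 # change staged but not yet committed

    def prepare(self, change) -> bool:
        # Phase 1: stage the change and vote yes or no.
        if not self.can_commit:
            return False
        self._pending = change
        return True

    def commit(self) -> None:
        # Phase 2 (all voted yes): make the staged change permanent.
        self.committed.append(self._pending)
        self._pending = None

    def rollback(self) -> None:
        # Phase 2 (someone voted no): discard the staged change.
        self._pending = None


class Coordinator:
    """Hypothetical transaction manager that drives both phases."""

    def __init__(self, participants):
        self.participants = participants

    def execute(self, change) -> bool:
        prepared = []
        for participant in self.participants:
            if participant.prepare(change):
                prepared.append(participant)
            else:
                # One "no" vote aborts everyone who already prepared.
                for p in prepared:
                    p.rollback()
                return False
        for participant in self.participants:
            participant.commit()
        return True
```

Calling `Coordinator([Participant("azure-sql"), Participant("event-store")]).execute(change)` commits the change on both participants or on neither. The sketch deliberately ignores the hard part: what happens if the coordinator or a participant crashes between the two phases.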

Related

Using ksqldb to join data from multiple types of source connectors

We are evaluating ksqlDB as an ETL tool for our organization. Our entire app is hosted on Microsoft Azure, and our organization generally prefers PaaS offerings. One use case is that we have multiple microservices, each with its own database, and we want to join tables across those databases to produce denormalized data for other tasks. For example, a Users table contains user data and an Orders table contains all the orders. Users may be stored relationally in MySQL, whereas Orders may be stored in NoSQL format in MongoDB. We need to generate a report by joining Orders and Users on user_id. This can be done in ksqlDB by adding a source connector for each database and joining the resulting streams/tables. A sink connector can then write the joined Users_Orders data to a new MongoDB database. As long as the connectors and joins are running, Users_Orders is updated whenever new data is added.
I have read that using ksqlDB in production with Azure Event Hubs is not possible because of some licensing issues. So my questions are:
Before going into other products such as Azure HDInsight or Confluent Cloud, is there any way of running ksqlDB to achieve the same solution (perhaps by managing our own Kafka cluster)?
You don't necessarily need ksql; you should be able to do something similar with SparkSQL, offered in Azure (Databricks). You don't necessarily need Kafka / EventHub either since Spark could read, join, and write Mongo/JDBC data all on its own (with the appropriate plugins).
The main reason ksqlDB isn't offered as a hosted service by Azure is that doing so would conflict with Confluent's licensing. That does not prevent you from running it yourself, as long as you adhere to the license restriction against offering the ksqlDB REST API as a publicly available or paid API. I have not tried it personally, but ksqlDB should work against Event Hubs on its own; I don't think you need to self-manage Kafka, as the documentation suggests.
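Whichever engine runs it, the join described in the question is an inner join on user_id. A minimal sketch in plain Python, independent of ksqlDB and Spark, shows the shape of the denormalized output. All field names other than `user_id` are hypothetical sample data.

```python
def join_users_orders(users, orders):
    """Inner-join orders to users on user_id, producing denormalized rows."""
    users_by_id = {user["user_id"]: user for user in users}
    joined = []
    for order in orders:
        user = users_by_id.get(order["user_id"])
        if user is None:
            continue  # inner join: orders without a matching user are dropped
        # Merge user fields and order fields; user_id is shared by both.
        joined.append({**user, **order})
    return joined


# Hypothetical sample rows standing in for the MySQL and MongoDB sources.
users = [{"user_id": 1, "name": "Ada"}, {"user_id": 2, "name": "Lin"}]
orders = [
    {"order_id": "A1", "user_id": 1, "total": 30},
    {"order_id": "A2", "user_id": 3, "total": 12},
]
users_orders = join_users_orders(users, orders)
```

Here `users_orders` contains one row, for order A1, because order A2 refers to a user that does not exist. A streaming engine does the same thing incrementally as new rows arrive instead of over complete lists.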

Storing IOT Data in Azure: SQL vs Cosmos vs Other Methods

The project I am working on as an architect has an IoT setup in which many sensors send data such as water pressure and temperature to an FTP server (this cannot be changed; we have no control over it for security reasons). From there, a few Windows services on Azure pull the data and store it in an Azure SQL Database.
Here are my observations about this architecture:
Problem 1: Azure SQL has a 1 TB limit. A higher tier can go to 4 TB, but that is the maximum. So it does not appear to be infinitely scalable, and as the size grows, query performance could become a problem. Columnstore indexes and partitioning seem to be options, but the size limitation and DTUs are a deal breaker.
Problem 2: The IoT data and the SQL Database (downstream storage) seem to be tightly coupled. If a customer wants to extract a few months of data or more, with millions of rows, the DB will get busy and possibly throttle other customers because of DTU exhaustion.
I would like some ideas on scaling this further. SQL DB is great, and with JSON support it is awesome, but it is not a horizontally scalable solution.
Here is what I am thinking:
All the messages should be consumed from FTP by Azure IoT Hub by some means.
From the central hub, I want to push all messages to Azure Blob Storage in 128 MB files for later analysis at low cost.
At the same time, I would like all messages to go to IoT Hub and from there to Azure Cosmos DB (for long-term storage) or Azure SQL DB (also long term, but I am not sure because of the size restriction).
I am keeping data in Blob Storage because if a client wants or hires a machine learning team to create some models, I would prefer them to pull data from Blob Storage rather than hitting my DB.
Kindly suggest a few ideas on this. Thanks in advance!!
Chandan Jha
First, Azure SQL DB does have Hyperscale which is much larger than 4TB. That said, there is a tipping point where it makes sense to consider alternative architectures when you get to be bigger than what one machine can handle for your solution. While CosmosDB does give you a horizontal sharding solution, you can do the same with N SQL Databases (there are libraries to help there). Stepping back, it is actually pretty important to understand what you want to do with the data if it were in a database. Both CosmosDB and SQL DB are set up for OLTP-style operations (with some limited forms of broader queries - SQL DB supports columnstore and batch mode, for example, which means you could do a reasonably-sized data mart just fine there too). If you are just storing things in the database in the hope of needing to support future data scientists, then you may or may not really need either of these two OLTP stores.
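Sharding across N SQL databases comes down to routing each key to one database. The following is a minimal sketch of hash-based routing in Python, assuming the device ID is the sharding key. The shard names are hypothetical, and this is not the API of any of the sharding libraries mentioned above.

```python
import hashlib


def shard_for(device_id: str, shards: list) -> str:
    """Pick a shard for a device using a hash that is stable across runs."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


# Hypothetical database names; in practice these would be connection strings.
SHARDS = ["sqldb-0", "sqldb-1", "sqldb-2", "sqldb-3"]
target = shard_for("sensor-42", SHARDS)
```

Simple modulo routing like this remaps most keys when the number of shards changes, so a production design would normally use a shard map or consistent hashing instead.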
Synapse SQL is set up for analytics and generally has support to read from data in formats in Azure Storage. So, this may be a better strategy if you want to support arbitrarily-large IoT data and do analytics/ML processing over it.
If you know your solution will never be above , you may not need to consider something like Synapse, but it is set up for those scenarios if you are of sufficient size.
Option - 1:
Why don't you extract and serialize the data based on the partition ID (device ID) and send it over to the IoT hub, where Azure Functions or Logic Apps can deserialize the data into files that are stored in the blob containers?
Option - 2:
You can also try creating a module that extracts the data into an Excel file, which is then sent to the IoT hub to be stored in the storage containers.
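The grouping and serialization step in Option 1 could be sketched as follows in Python. It assumes each reading is a dictionary with a `device_id` field; that field name and the JSON Lines encoding are illustrative choices, not something the IoT hub requires.

```python
import json
from collections import defaultdict


def partition_by_device(readings):
    """Group readings by device_id and serialize each group as JSON Lines."""
    groups = defaultdict(list)
    for reading in readings:
        groups[reading["device_id"]].append(reading)
    return {
        device_id: "\n".join(json.dumps(row, sort_keys=True) for row in rows)
        for device_id, rows in groups.items()
    }


def deserialize(payload):
    """Inverse step, as a downstream function would run it before writing files."""
    return [json.loads(line) for line in payload.splitlines()]


# Hypothetical sensor readings.
readings = [
    {"device_id": "pump-1", "pressure": 2.4},
    {"device_id": "pump-2", "temperature": 18.5},
    {"device_id": "pump-1", "pressure": 2.6},
]
payloads = partition_by_device(readings)
```

Each value in `payloads` is one serialized batch per device, which maps naturally onto one blob (or one blob prefix) per device.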

SQL Azure and CDN

What is the best way to limit latency for SQL Azure in global applications?
My application uses SQL Azure, and I would like to know whether it is possible to connect users to a SQL Azure database near them, based on their network location.
Logically, this would need a SQL Azure database with global replication, but not geo-replication, because each copy would serve as a master rather than a secondary.
Thank you in advance.
You may want to try Cosmos DB to distribute data globally and obtain low latency, as explained in this article and this documentation.
If you replicate data using SQL Data Sync with Azure SQL Database, take paired regions into consideration, as they may reduce latency. With SQL Data Sync, you can define a hub database and many member databases in other regions, and data can be synced in both directions between the hub and any member database.

Microsoft Cosmos DB (DocumentDB API) vs. Cosmos DB (Table API)

Microsoft Cosmos DB includes the DocumentDB API, the Table API, and others. I have about 10 TB of data and would like fast key-value lookups (very little updating and writing, mostly reading). Here is a link for Microsoft Cosmos DB:
https://learn.microsoft.com/en-us/azure/cosmos-db/
So how should I choose between DocumentDB API and Table API?
Or when should I choose DocumentDB API? When should I choose Table API?
Is it a good practice to use DocumentDB API to store 10 TB of data?
The Azure Cosmos DB Table API was introduced to make Cosmos DB and its advanced indexing, geo-distribution, etc. features available to the Azure Table storage community. The idea is that someone using Azure Table storage who needs more advanced features only offered by Cosmos DB can literally just change their connection string and their existing code will work with Cosmos DB.
But if you are a greenfield customer, I would recommend using the SQL API (formerly called the DocumentDB API), which is a superset of the Table API. We are constantly investing in more advanced features and capabilities for the SQL API, whereas for the Table API we are only looking to maintain compatibility with Azure Table storage's API, which hasn't changed in many years.
How much data you have doesn't have any effect on which API you choose. They both run on the same multi-model infrastructure and can handle the same data sizes, query loads, distribution, etc.
So how should I choose between DocumentDB API and Table API?
Choosing between the DocumentDB API and the Table API will primarily depend on the kind of data that you're going to store. The DocumentDB API provides a schema-less JSON database engine with SQL querying capabilities, whereas the Table API provides a key-value storage service. Since you mentioned that your data is key-value based, I recommend that you use the Table API.
Or when should I choose DocumentDB API? When should I choose Table API?
Same as above.
Is it a good practice to use DocumentDB API to store 10 TB of data?
Both Document DB API and Table API are designed to store huge amounts of data.
However, you may want to look into Azure Table Storage as well. Cosmos DB lets you fine-tune the throughput you need and offers robust indexing and querying support, but that comes at a price. Azure Tables, on the other hand, comes with fixed throughput and limited indexing and querying support, and is extremely cheap compared to Cosmos DB.
You may find this link helpful to explore more about Cosmos DB: https://learn.microsoft.com/en-us/azure/cosmos-db/introduction.
Please don't flag this as off-topic.
It might help for you to know in advance: if you are considering the document interface, then in fact there is a case-insensitivity that can affect how DataContract classes (and I believe all others) are transformed to and from Cosmos.
In the linked discussion below, you will see that there is a case insensitivity in Newtonsoft.Json that can affect how you handle objects that you pass to or get directly from the API. Not that Cosmos has ANY flaws; in fact it is totally excellent. But with a document API, you might (like me) start to simply pass DataContract objects into Cosmos (which is obviously not wrong, and in fact very much expected from the object API), and there are some serializer and naming strategy handler options that you are probably better off at least being aware of up front.
So this is just a note to make you aware of this behavior with an object interface. The discussion is here on GitHub:
https://github.com/JamesNK/Newtonsoft.Json/issues/815

Alternative to Windows Azure tables out of the cloud

I'm developing a .NET app that needs to run both on Azure and on regular Windows servers (2003). It needs to store a few GB of data, and SQL Azure is too expensive for me, so I'll use Azure tables in the cloud version. Can you recommend a storage solution that runs on standalone servers and has an API and behavior similar to Azure tables? From what I've seen, Server AppFabric does not include Tables.
If you think about what Windows Azure Table Storage is, it is a key-value, non-relational database that is accessible through a REST API. Please download this document about Windows Azure and NoSQL database details.
If I were in your situation, my approach would be to find something similar to Azure Table Storage that I can access over REST and that has a similar access API. So if you are trying to find a similar database to run on a machine, you really need to look for:
Key Value Pair DB
Support for basic operations, i.e. add, delete, insert, modify an entity
Partition Key and Row Key based Accessibility
RESTful Interface to connect
If you want to try something, you can look at:
DBreeze (C#-based key-value NoSQL DB); I just saw it and it looks exciting
Google's LevelDB (key-value DB, open source and available on Windows); I have no idea about its API
Redis (great key-value DB, but I am not sure about Windows compatibility and API)
Here is a list of key/value databases without additional indexing facilities:
Berkeley DB
HBase
MemcacheDB
Redis
SimpleDB
Tokyo Cabinet/Tyrant
Voldemort
Riak
If none of these works, you can take any open source DB, modify it to fit your requirements, and then make that available to others as your contribution to the community.
ADDED
You can now use a Windows Azure Virtual Machine to run any kind of key-value DB on a Linux or Windows machine and connect it to your application.
I'm not sure which storage solution to recommend, but just about any database solution would work, provided that you write an interface to abstract all your data storage code. Then write implementations of that interface for Azure Table storage and for whatever other database you want to use on the non-cloud server.
You should be doing that anyway so that your code isn't tightly coupled with Azure Table Storage APIs.
If you combine coding against that Interface with an IoC container, then a single line of code or a single configuration setting would enable you to switch between data implementations based on which platform the code is running on.
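The interface-plus-configuration idea can be sketched briefly. The question is about .NET, so treat the Python below as an illustration of the structure rather than code to copy: `EntityStore`, `InMemoryStore`, `AzureTableStore`, and `create_store` are hypothetical names, the Azure implementation is left as a stub, and a plain dictionary lookup stands in for the IoC container.

```python
from abc import ABC, abstractmethod


class EntityStore(ABC):
    """Storage interface keyed by partition key and row key."""

    @abstractmethod
    def upsert(self, partition_key, row_key, entity): ...

    @abstractmethod
    def get(self, partition_key, row_key): ...

    @abstractmethod
    def delete(self, partition_key, row_key): ...


class InMemoryStore(EntityStore):
    """Stand-in for whatever database runs on the standalone server."""

    def __init__(self):
        self._rows = {}

    def upsert(self, partition_key, row_key, entity):
        self._rows[(partition_key, row_key)] = dict(entity)

    def get(self, partition_key, row_key):
        return self._rows.get((partition_key, row_key))

    def delete(self, partition_key, row_key):
        self._rows.pop((partition_key, row_key), None)


class AzureTableStore(EntityStore):
    """Stub: a real version would wrap the Azure Table storage client."""

    def upsert(self, partition_key, row_key, entity):
        raise NotImplementedError("wire up the Azure Table storage client here")

    def get(self, partition_key, row_key):
        raise NotImplementedError("wire up the Azure Table storage client here")

    def delete(self, partition_key, row_key):
        raise NotImplementedError("wire up the Azure Table storage client here")


# One configuration value selects the implementation for the platform.
STORES = {"local": InMemoryStore, "azure": AzureTableStore}


def create_store(platform: str) -> EntityStore:
    return STORES[platform]()
```

Application code only ever calls `create_store(...)` and the `EntityStore` methods, so switching platforms means changing the one configuration value passed to `create_store`.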
