I am trying to understand how the newer CockroachDB and other distributed SQL databases differ from a cloud-managed database like Azure SQL Database.
It seems there is no difference in the use cases between them:
Like various NoSQL databases, SQL (in general) allows partitioning keys.
I can add cores in Azure to increase performance as needed, and I can also switch to Hyperscale if I have an elastic workload.
I can have read replication across multiple nodes over multiple availability zones (geo-locations).
I can configure data replication in Azure SQL Database too.
It seems to me that a cloud SQL database covers all the use cases the newer distributed databases cover, so why would I want to use a newer product?
Isn't Azure SQL Database basically a distributed database server?
Am I missing something?
Is Azure SQL Server a Distributed SQL database?
No.
Like various NoSQL databases, SQL (in general) allows partitioning keys.
Partitioning in NoSQL databases like Cassandra (and Azure Table Storage) is about distributing partitions to physically distinct nodes, and requires rows to have an explicitly set partition-key value.
Cassandra nodes are physically different machines that can run independently, which gives it excellent resiliency.
Partitioning in SQL Server, Azure SQL, and Azure SQL Managed Instance is about dividing data up into row-groups that exist in the same server for performance, not resiliency.
In on-premises MS SQL Server, these row-groups (well, partitions) can exist in different FILEGROUPs, which means they can exist in different storage volumes to avoid IO bottlenecks, but Azure SQL does not support multiple FILEGROUPs.
The benefits of implementing partitioning, including on Azure SQL, are documented online, and that documentation explains that it is about performance, not resilience.
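To make the contrast concrete, here is a minimal Python sketch of the two ideas. It is only an illustration: the node names and boundary values are made up, the modulo placement is a simplification (Cassandra uses a token ring, not a plain modulo), and the range lookup is only loosely modelled on how a RANGE LEFT partition function assigns rows.

```python
import bisect
import hashlib

# Hypothetical node names, for illustration only.
NODES = ["node-a", "node-b", "node-c"]


def node_for_key(partition_key, nodes=NODES):
    """Distributed style: the partition key decides which machine owns the row."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


# Hypothetical boundary values for a range-partitioned table.
BOUNDARIES = [100, 200]


def partition_for_value(value, boundaries=BOUNDARIES):
    """Single-server style: return a 1-based partition number.

    A boundary value falls into the partition on its left, loosely like
    RANGE LEFT in a SQL Server partition function. Every partition still
    lives on the same server.
    """
    return bisect.bisect_left(boundaries, value) + 1
```

In the first function the result is a different machine; in the second it is just a slice of one table on one server, which is why it helps performance but not resiliency.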
I can add cores in Azure to increase performance as needed, and I can also switch to Hyperscale if I have an elastic workload.
This fact has absolutely nothing to do with distributed databases.
I can have read replication across multiple nodes over multiple availability zones (geo-locations).
I can configure data replication in Azure SQL Database too.
Replication isn't the same thing as a true distributed database:
In Cassandra and other distributed databases, all clients can connect to all nodes and accomplish the same tasks; and you can arbitrarily add and remove nodes while the system is running.
In SQL Server and Azure SQL's replication feature, the replica is strictly a "secondary" that is subordinate to your primary server.
Clients can connect to either the secondary or the primary, but the secondary server can only perform read-only queries, whereas if a client wants to do DML (INSERT/UPDATE/DELETE/MERGE) or DDL (CREATE/ALTER) then the client must connect to the primary server.
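In practice this means the application (or something in front of it) has to decide where each statement goes. A rough Python sketch of that routing follows, assuming two hypothetical server names and a deliberately naive first-keyword check; a real application would usually route per connection or per unit of work rather than by parsing SQL.

```python
# Statements that must run on the primary (the DML/DDL listed above).
WRITE_KEYWORDS = {"INSERT", "UPDATE", "DELETE", "MERGE", "CREATE", "ALTER"}

# Hypothetical server names, for illustration only.
PRIMARY = "primary.example.invalid"
SECONDARY = "secondary.example.invalid"


def needs_primary(sql):
    """Naive check: does the statement start with a DML/DDL keyword?"""
    words = sql.lstrip().split(None, 1)
    return bool(words) and words[0].upper() in WRITE_KEYWORDS


def pick_server(sql):
    """Send writes to the primary and everything else to the read-only secondary."""
    return PRIMARY if needs_primary(sql) else SECONDARY
```

With a distributed database in the sense described above, this function would not be needed, because any node could accept either kind of statement.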
It seems to me that a cloud SQL database covers all the use cases the newer distributed databases cover, so why would I want to use a newer product?
It can't. Because Azure SQL is not a distributed database, it cannot allow any client to read and write to any node or endpoint and have that change replicated to all other nodes (for example, under an eventual consistency model). Instead, Azure SQL requires writes to be performed by the single primary "server".
Note that an Azure SQL "server" or logical server is largely an abstraction that hides what Azure SQL really is: a distinct build of SQL Server's engine that runs in a high-availability Azure Service Fabric environment (which is how cores/RAM can be added and removed while it's running and provides for some kind of local resilience against hardware failure) in a single Azure datacenter.
Related
Azure SQL Database has two similar flavors: Managed Instance and Elastic pools. Both flavors allow you to place multiple databases that share the same resources, and in both cases you can change the CPU/storage for the entire group of databases within the instance/pool. What is the difference between them?
Azure SQL Database Elastic Pool is a shared resource model for Single Azure SQL PaaS databases to achieve higher resource utilization efficiency, and all the databases within an elastic pool share predefined resources within the same elastic pool. The emphasis of this offering is on a simplified database-scoped programming model for multi-tenant SaaS apps where the workload pattern is well defined and delivers high cost-effectiveness when serving many tenants.
SQL Database Managed Instance offers a simplified instance-scoped programming model that is like an on-premises SQL Server instance. The databases in Managed Instance share the resources allocated to the Managed Instance, and the Managed Instance also represents the management grouping for these databases. The emphasis of this offering is on high compatibility with the programming model of on-premises SQL Server and out-of-box support for the large majority of SQL Server features and accompanying tools/services.
Some high-level guidelines might be:
Use Elastic pools if you need to group a large number of single databases that don't need all the instance-level Transact-SQL functionality that exists in SQL Server.
Use Managed Instance if you want to migrate a large number of SQL Server databases that heavily use instance-level features such as CLR, Service Broker, SQL Agent, etc.
See more info in Azure SQL IaaS vs PaaS Comparison Table
What is the best way to limit latency for SQL Azure in global applications?
My application uses SQL Azure, and I would like to know whether it is possible to connect users to a SQL Azure database near them, based on their network location.
So logically I would need a SQL Azure database with global replication, but not geo-replication, as each copy would serve as a master and not a secondary.
Thank you in advance.
You may want to try CosmosDB to distribute data globally and obtain low latency, as explained in this article and this documentation.
For replicating data using SQL Data Sync with Azure SQL Database, take paired regions into consideration, which may reduce latency. With SQL Data Sync, a hub database can be defined along with many member databases in other regions, and data can be synced in both directions between the hub and any member database.
I am new to Azure and HBase.
Say that I have two HDInsight (HBase) clusters, one installed in Asia and one in Europe, to get better read/write performance for users accessing them from different countries.
But how do I run a query over all the data in these clusters? Do I need to run the query separately on each cluster and then combine the results? Or is there some built-in function like Distributed Queries for SQL Server?
There is no distributed query across clusters in HBase. In your scenario the best solution would probably be to set up replication between the two HBase clusters and then query one of them. Each cluster will then hold the complete data set, with the data from the other cluster a few minutes stale because replication is asynchronous. You can also set up more complex replication topologies and have a separate central cluster that has a superset of the data while the two others have their local subsets.
The HDInsight team is working on documentation for replication setup in Azure. For now you would need to work out the configuration yourself. You would need to provision the clusters in VNets, connect the VNets, ensure name resolution is set up correctly, and then follow the HBase replication setup steps to set up replication itself: http://hbase.apache.org/book.html#_cluster_replication
Without a replication solution, you would need to query both clusters separately.
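If you do end up querying both clusters separately, the combining step happens on the client. A minimal Python sketch of that step, assuming each cluster's query has already been run and returned rows as (row_key, timestamp, value) tuples; that row shape and the "newest timestamp wins" rule are assumptions for illustration, not something HBase does for you across clusters.

```python
def merge_cluster_results(*result_sets):
    """Combine rows from several clusters, keeping the newest version per row key.

    Each result set is an iterable of (row_key, timestamp, value) tuples.
    Returns a dict mapping row_key -> (timestamp, value).
    """
    merged = {}
    for rows in result_sets:
        for row_key, timestamp, value in rows:
            # Keep a row if it is new, or newer than the copy already seen.
            if row_key not in merged or timestamp > merged[row_key][0]:
                merged[row_key] = (timestamp, value)
    return merged
```

For example, `merge_cluster_results(asia_rows, europe_rows)` would take the two lists you fetched from each cluster and return one dictionary keyed by row key.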
What is the difference between Data Sync and Standard Geo Replication on SQL Azure databases?
I understand that Active Geo Replication provides the ability to connect to a replicated database whereas Standard does not allow connections. However, how does Data Sync differ? I know it's not immediate replication, but I need to point my BI software to a replica and am debating which configuration to use for replication and disaster recovery.
Data Sync lets you specify what to sync (e.g., which tables) and the sync interval (e.g., 5 minutes, 15 minutes). Replicas are read/write, and you can specify how to resolve conflicts (e.g., hub wins, client wins). The databases can also exist independently of one another.
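To illustrate what a conflict policy means when both sides are writable, here is a toy Python model of one two-way sync pass between a hub and one other database. It is a conceptual sketch only, not how Data Sync is implemented: the databases are plain dicts of row id to value, the policy names are made up for this example, and deletes and change tracking are ignored.

```python
def sync(hub, client, policy):
    """Run one two-way sync pass between two dicts, modifying both in place.

    policy is "hub_wins" or "client_wins" and is only consulted when the
    same row id exists on both sides with different values.
    """
    if policy not in ("hub_wins", "client_wins"):
        raise ValueError("unknown conflict policy: " + repr(policy))
    for row_id in set(hub) | set(client):
        if row_id not in client:
            client[row_id] = hub[row_id]      # row only exists on the hub
        elif row_id not in hub:
            hub[row_id] = client[row_id]      # row only exists on the client
        elif hub[row_id] != client[row_id]:
            # Conflict: both sides hold a different version of the row.
            winner = hub[row_id] if policy == "hub_wins" else client[row_id]
            hub[row_id] = client[row_id] = winner
```

Geo-replication never needs a rule like this, because the secondary is read-only and so the two copies cannot diverge through writes.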
My team's development environment is based on local databases (SQL Server) and now I need to move our application to be based on SQL Azure Federations.
Is there any way to "emulate" SQL Azure Federations in a local environment? Or should our development environment change?
AFAIK, you can't.
While you can simulate partitioning of data on your local SQL Server in terms of where the data gets stored (e.g., table partitions or partitioned views), you can't simulate the FEDERATION statements (i.e., USE FEDERATION, etc.), as they are only valid in Azure SQL Database Federations.