Currently we have an Azure SQL Server, and each time we create a new web app we use EF Core Code First to generate the database. However, after the database is created we manually go to the Azure portal and add the newly created database to the Elastic Pool inside the Azure SQL Server. Is it possible to automate the process so that each newly created database, whether created via the portal, generated using EF, or otherwise, is automatically added to the Pool?
You can use Transact-SQL to programmatically move an existing Azure SQL Database into an elastic pool.
ALTER DATABASE db1
MODIFY ( SERVICE_OBJECTIVE = ELASTIC_POOL ( name = pool1 ) ) ;
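If the move is to be scripted (for example as a step that runs after the EF migration), the statement above could be generated per database. The following is only a minimal sketch: the helper names `quote_identifier` and `move_to_pool_sql` are made up for illustration, the bracket quoting of the pool name is an assumption, and actually executing the statement (through whatever SQL driver you use) is left out.

```python
def quote_identifier(name: str) -> str:
    """Bracket-quote a T-SQL identifier, doubling any closing bracket."""
    if not name:
        raise ValueError("identifier must not be empty")
    return "[" + name.replace("]", "]]") + "]"


def move_to_pool_sql(database: str, pool: str) -> str:
    """Build the ALTER DATABASE statement that moves `database` into `pool`.

    Hypothetical helper: it only builds the text of the statement shown in
    the answer; it does not connect to any server.
    """
    return (
        f"ALTER DATABASE {quote_identifier(database)} "
        f"MODIFY ( SERVICE_OBJECTIVE = ELASTIC_POOL "
        f"( name = {quote_identifier(pool)} ) );"
    )


if __name__ == "__main__":
    print(move_to_pool_sql("db1", "pool1"))
```

The generated text would then be run against the server with admin credentials, the same way the hand-written statement is.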
You have to use the Elastic database client library to manage the creation of the DB so that the ShardMapManager can register the database. Note that I said database (or shard in Elastic DB terminology) NOT Tenant (or Shardlet in Elastic DB terminology).
The Elastic DB components are as follows. For SQL Server, sharding was added via the Elastic Database SDK, which involves the following components (some of which are available only in Azure):
• Elastic Database Client Library
Manage the data distributions and map Tenants to databases.
• Elastic Database Pools (Azure only)
Allocation of a pool of resources that can be shared across a number of databases. Allows databases to consume resources at their own rate, rather than each database having a specific amount of resources available to it.
• Elastic Database Query
The ability to query across all Tenants within a Multi-Tenant database.
• Elastic Database Jobs
Package and reliably deploy database maintenance operations or database schema changes to multiple databases.
• Elastic Transactions
Process changes to several databases in an atomic and isolated way.
• Elastic Database Split-Merge Tools
Allows for the movement of shards between databases that are participating in the sharding framework
An important concept in this library is that a Shard can have one or more mappings that hold Shardlets (the mappings come from the ShardMapManager). In fact the ShardMapManager maintains two collections: Shards (GetShards) and Mappings (GetMappings). Mappings are how ShardKeys are MAPPED to the Shard...
You can have two types of Shard Key definitions: RangeShardMap and ListShardMap, both of which inherit ShardMap (which carries a ShardMapType value identifying the subtype that instantiated the ShardMap).
RangeShardMap - holds a RANGE of keys for ONE mapping, e.g. customers 100-200. The upper bound is exclusive: 200 is one higher than the largest value allowed in the range (a crappy way of defining it in my opinion, but that is the MS documentation). So the ranges 100-199 and 200-299 are defined as 100-200 and 200-300.
ListShardMap - holds one Shard Key for each Shardlet. So 1 value of a ShardKey is equal to 1 mapping (a Shard can hold one or more Shardlets, so a Shard can hold several mappings).
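To make the difference between the range and list flavours concrete, here is a small stand-alone model in Python. It is not the Elastic Database client library (that is a .NET/Java SDK); the class names `RangeMap` and `ListMap` and the shard names are invented for this sketch. The point it tries to show is the half-open range: the upper bound is not part of the range.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple


class RangeMap:
    """Hypothetical model: maps half-open key ranges [low, high) to a shard."""

    def __init__(self) -> None:
        self._ranges: List[Tuple[int, int, str]] = []  # kept sorted by low bound

    def add(self, low: int, high: int, shard: str) -> None:
        if low >= high:
            raise ValueError("low must be smaller than high")
        for other_low, other_high, _ in self._ranges:
            # Two half-open ranges overlap only if each starts before the other ends.
            if low < other_high and other_low < high:
                raise ValueError("ranges must not overlap")
        self._ranges.append((low, high, shard))
        self._ranges.sort()

    def lookup(self, key: int) -> Optional[str]:
        lows = [r[0] for r in self._ranges]
        i = bisect_right(lows, key) - 1  # last range whose low bound is <= key
        if i >= 0 and key < self._ranges[i][1]:  # high bound is exclusive
            return self._ranges[i][2]
        return None


class ListMap:
    """Hypothetical model: maps individual key values to a shard."""

    def __init__(self) -> None:
        self._points: Dict[int, str] = {}

    def add(self, key: int, shard: str) -> None:
        if key in self._points:
            raise ValueError("key is already mapped")
        self._points[key] = shard

    def lookup(self, key: int) -> Optional[str]:
        return self._points.get(key)


if __name__ == "__main__":
    ranges = RangeMap()
    ranges.add(100, 200, "shard_a")  # covers keys 100..199
    ranges.add(200, 300, "shard_b")  # covers keys 200..299
    print(ranges.lookup(199), ranges.lookup(200), ranges.lookup(300))
```

Because the bounds are half-open, 100-200 and 200-300 can sit next to each other without overlapping, which is why the documentation defines them that way.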
So you have to map your shard key(s) to a shard map, then associate the shard map with a shard, and the shard has to be associated with a database. Remember, you can create a database that does not have a shard (or schema) defined within it.
I found that I wanted to think of this the other way around: starting from the DB and sticking the data into that DB. IMO, the Elastic DB SDK works the other way: you define the Shardlet (tenant), map that to a Shard (schema), and then stick the Shard (schema) into a database.
HTH.
Related
I have an Azure SQL server that contains 15 DBs and it keeps rising every week.
When I create a new DB I need to run several queries to create a login and a user and to assign roles. All of this is done as the SQL Server admin, and the SQL is run against the master DB.
I want to automate this entire process.
I was thinking of creating a stored procedure that runs all the queries, so that I only need to run the stored procedure.
My problem is: where do I need to create the stored procedure?
I was thinking of creating it at the master DB level and sharing it with all future DBs I create. Is that possible?
I read that in order to do this I need to name the stored procedure with the prefix "sp_".
I tried it and it didn't work.
Thank you for the help, Tal
You can use elastic jobs to run those queries on one or a set of existing databases, or newly created databases. Elastic Jobs provide the ability to run one or more T-SQL scripts in parallel, across a number of databases, on a schedule or on-demand.
You can use elastic jobs not only to set them up initially, but also to keep them standardized later, applying changes to all of them at once when needed. You can run scheduled jobs against any combination of databases: one or more individual databases, all databases on a server, all databases in an elastic pool, or a shard map, with the added flexibility to include or exclude any specific database. Jobs can run across multiple servers, multiple pools, and can even run against databases in different subscriptions.
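The include/exclude behaviour of a job's target set can be pictured with a small model. This is not the Elastic Jobs API, only a sketch of the idea: `TargetMember`, `resolve_targets` and the `inventory` dictionary are hypothetical names, and the rule that an exclusion wins over any inclusion is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class TargetMember:
    """Hypothetical description of one entry in a job's target set."""
    kind: str              # "database", "server" or "pool"
    name: str
    exclude: bool = False  # True means "never run against this member"


def resolve_targets(
    members: Iterable[TargetMember],
    inventory: Dict[Tuple[str, str], List[str]],
) -> Set[str]:
    """Expand servers and pools to database names, then drop exclusions.

    `inventory` maps ("server", name) or ("pool", name) to the databases
    it currently contains; in a real system this would be looked up live.
    """
    included: Set[str] = set()
    excluded: Set[str] = set()
    for member in members:
        if member.kind == "database":
            databases = [member.name]
        else:
            databases = inventory.get((member.kind, member.name), [])
        (excluded if member.exclude else included).update(databases)
    # Assumption of this sketch: exclusions take precedence over inclusions.
    return included - excluded


if __name__ == "__main__":
    inventory = {
        ("server", "srv1"): ["db1", "db2", "db3"],
        ("pool", "poolA"): ["db4"],
    }
    members = [
        TargetMember("server", "srv1"),
        TargetMember("pool", "poolA"),
        TargetMember("database", "db2", exclude=True),
    ]
    print(sorted(resolve_targets(members, inventory)))
```

Since servers and pools are expanded when the set is resolved, a database created later would be picked up on the next run without editing the member list, which is what makes this approach attractive for the "DB count keeps rising" situation.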
I am trying to understand the differences between the new CockroachDB and other distributed SQL databases as compared to a cloud-managed database like Azure SQL Database.
It seems there is no difference in the use cases between them:
Like various NOSQL databases SQL (in general) allows partitioning keys.
I can add cores in Azure to increase the performance as needed, I can also switch to Hyper-scale if I have an elastic workload.
I can have read replication across multiple nodes over multiple availability zones (geo-locations)
I can configure data replication in Azure SQL Database too.
It seems to me that a cloud SQL database covers all the use cases the newer distributed databases cover, so why would I want to use a newer product ?
Isn't Azure SQL Database basically a distributed database server ?
Am I missing something ?
Is Azure SQL Server a Distributed SQL database?
No.
Like various NOSQL databases SQL (in general) allows partitioning keys.
Partitioning in NoSQL databases like Cassandra (and Azure Table Storage) is about distributing partitions to physically distinct nodes, and requires rows to have an explicitly set partition-key value.
Cassandra nodes are physically different machines that can run independently, which gives it excellent resiliency.
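As a rough illustration of what "distributing partitions to physically distinct nodes" means, the sketch below hashes a partition key to pick a node. It is deliberately simplified: real systems such as Cassandra use a token ring with consistent hashing rather than a plain modulo, and the function name and node names here are made up.

```python
import hashlib
from typing import List


def node_for_key(partition_key: str, nodes: List[str]) -> str:
    """Pick the node that owns a partition key (simplified modulo placement)."""
    if not nodes:
        raise ValueError("at least one node is required")
    # A stable hash, so every client computes the same owner for the same key.
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c"]  # hypothetical machine names
    for key in ("device-1", "device-2", "device-3"):
        print(key, "->", node_for_key(key, nodes))
```

Every row must carry a partition-key value because that value alone decides which machine stores the row.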
Partitioning in SQL Server, Azure SQL, and Azure SQL Managed Instance is about dividing data up into row-groups that exist in the same server for performance, not resiliency.
On on-prem MS SQL Server, these row-groups (well, partitions) can exist in different FILEGROUPs, which means they can exist in different storage volumes to avoid IO bottlenecks, but Azure SQL does not support multiple FILEGROUPs.
The benefits of implementing partitioning, including on Azure SQL, are documented online - and the article explains how it's about performance, not resilience.
I can add cores in Azure to increase the performance as needed, I can also switch to Hyper-scale if I have an elastic workload.
This fact has absolutely nothing to do with distributed databases.
I can have read replication across multiple nodes over multiple availability zones (geo-locations).
I can configure data replication in Azure SQL Database too.
Replication isn't the same thing as a true distributed database:
In Cassandra and other distributed databases, all clients can connect to all nodes and accomplish the same tasks; and you can arbitrarily add and remove nodes while the system is running.
In SQL Server and Azure SQL's replication feature, the replica is strictly a "secondary" that is subordinate to your primary server.
Clients can connect to either the secondary or the primary, but the secondary server can only perform read-only queries, whereas if a client wants to do DML (INSERT/UPDATE/DELETE/MERGE) or DDL (CREATE/ALTER) then the client must connect to the primary server.
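In practice this means the application (or its data layer) has to decide where each statement goes. A naive sketch of that decision follows; `endpoint_for` and the endpoint labels are hypothetical, and the keyword check is intentionally crude (anything that is not a plain SELECT goes to the primary).

```python
def endpoint_for(statement: str) -> str:
    """Return "secondary" for plain read-only SELECTs, otherwise "primary".

    Naive classifier for illustration only: it looks at keywords, does not
    parse SQL, and errs on the side of sending work to the primary.
    """
    tokens = statement.upper().split()
    if not tokens:
        raise ValueError("empty statement")
    # SELECT ... INTO creates a table, so it counts as a write.
    if tokens[0] == "SELECT" and "INTO" not in tokens:
        return "secondary"
    # DML (INSERT/UPDATE/DELETE/MERGE), DDL (CREATE/ALTER) and anything
    # unrecognised must run on the single writable primary.
    return "primary"


if __name__ == "__main__":
    for sql in ("SELECT * FROM orders", "UPDATE orders SET paid = 1"):
        print(endpoint_for(sql), "<-", sql)
```

In a true distributed database no such routing is needed, because any node accepts both reads and writes.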
It seems to me that a cloud SQL database covers all the use cases the newer distributed databases cover, so why would I want to use a newer product?
It can't: because Azure SQL is not a distributed database it cannot allow any client to read and write to any node or endpoint and have that change replicated to all other nodes (using an eventual consistency model). Instead, Azure SQL requires writes to be performed by the single primary "server".
Note that an Azure SQL "server" or logical server is largely an abstraction that hides what Azure SQL really is: a distinct build of SQL Server's engine that runs in a high-availability Azure Service Fabric environment (which is how cores/RAM can be added and removed while it's running and provides for some kind of local resilience against hardware failure) in a single Azure datacenter.
I have multiple devices with assignments, each generating data of similar structure while offline. Each device periodically goes online to sync with a separate Azure SQL database that is assigned only to it. The devices also receive new assignments by syncing with that Azure SQL database.
I want to combine these multiple databases into a single database for management, while receiving updates bidirectionally whenever a sync goes through and relaying any new assignments back to the separate databases.
Any help or ideas would be much appreciated.
You can use Azure SQL Data Sync for the same purpose, which can update bi-directionally and can be scheduled to run according to requirements. However for multiple databases, we need to create multiple sync groups.
Set up SQL Data Sync between databases in Azure SQL Database and SQL Server
I have come across the requirement where I have to choose the API for Cosmos DB.
I have gone through all the APIs: SQL, Graph, Mongo and Table. My current project structure is based on Table storage, where I am storing IoT device data.
In the current structure (Table storage):
I have a separate table for each device, with a payload like the one below:
{
Timestamp,
Parameter name,
value
}
Now, if I plan to use Cosmos DB, I can see that I have to provision RU/throughput against each table, which I think is going to be a big cost. I have not found any way to assign RUs at the database level so that my allocated RUs can be shared across all tables.
Please let me know if there is a way to do this, or is it a limitation I have to accept for Cosmos DB with the Table API?
As far as I can see, with the SQL API I can create a single database and then multiple collections (named after the tables), and then I can provision RUs either at the database level or at the device (collection) level, which gives me more control over cost.
You can set the throughput on the account level.
You can optionally provision throughput at the account level to be shared by all tables in this account, to reduce your bill. These settings can be changed ONLY when you don't have any tables in the account. Note, throughput provisioned at the account level is billed for, whether you have tables created or not. The estimate below is approximate and does not include any discounts you may be entitled to.
Azure Cosmos DB pricing
The throughput configured on the database is shared across all the containers of the database. You can choose to explicitly exclude certain containers from database provisioning and instead provision throughput for those containers at container level.
A Cosmos DB database maps to the following: a database while using SQL or MongoDB APIs, a keyspace while using Cassandra API or a database account while using Gremlin or Table storage APIs.
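To see why shared throughput matters for the one-table-per-device layout, the two provisioning models can be compared with simple arithmetic. All numbers below are hypothetical and are not Cosmos DB prices or limits; the function names are made up for the sketch.

```python
from typing import Sequence


def per_container_total(container_count: int, ru_per_container: int) -> int:
    """Total RU/s when every container gets its own dedicated throughput."""
    return container_count * ru_per_container


def shared_total(shared_ru: int, dedicated_ru: Sequence[int] = ()) -> int:
    """Total RU/s with one shared allocation plus any containers that were
    explicitly excluded from sharing and given dedicated throughput."""
    return shared_ru + sum(dedicated_ru)


if __name__ == "__main__":
    # Hypothetical figures: 10 device tables at 400 RU/s each, versus one
    # shared allocation of 1000 RU/s plus one busy table at 400 RU/s.
    print(per_container_total(10, 400))   # 4000
    print(shared_total(1000, [400]))      # 1400
```

Whether a shared allocation is actually sufficient depends on how bursty the devices are, since all tables then compete for the same RUs.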
You can also use a tool such as Cerebrata, which lets you assign any throughput value after choosing the throughput type (fixed, auto-scale, or no throughput).
Disclaimer: It’s purely based on my experience
Azure SQL Database has two similar flavors: Managed Instance and Elastic pools. Both flavors allow placing multiple databases that share the same resources, and in both cases CPU/storage can be changed for the entire group of databases within the instance/pool. What is the difference between them?
Azure SQL Database Elastic Pool is a shared resource model for Single Azure SQL PaaS databases to achieve higher resource utilization efficiency, and all the databases within an elastic pool share predefined resources within the same elastic pool. The emphasis of this offering is on a simplified database-scoped programming model for multi-tenant SaaS apps where the workload pattern is well defined and delivers high cost-effectiveness when serving many tenants.
SQL Database Managed Instance offers a simplified instance-scoped programming model that is like an on-premises SQL Server instance. The databases in Managed Instance share the resources allocated to the Managed Instance, and the Managed Instance also represents the management grouping for these databases. The emphasis of this offering is on high compatibility with the programming model of on-premises SQL Server and out-of-box support for the large majority of SQL Server features and accompanying tools/services.
Some high-level guidelines might be:
Use Elastic pools if you need to group a large number of single databases that don't need all the instance-level Transact-SQL functionality that exists in SQL Server.
Use Managed Instance if you want to migrate a large number of SQL Server databases that heavily use instance-level features such as CLR, Service Broker, SQL Agent, etc.
See more info in Azure SQL IaaS vs PaaS Comparison Table