How well does Azure SQL Data Warehouse scale?

I want to replace all my on-prem DWs on SQL Server and use Azure SQL DW. My plan is to remove the hub and spoke model that I currently use for my on-prem SQL and basically have a large Azure SQL DW instance that scales with my client base (currently at ~1000). Would SQL DW scale, or do I need to retain my hub and spoke model?

Azure SQL Data Warehouse is a great choice for replacing your hub and spoke model. The service allows you to scale storage and compute independently to meet the compute and storage needs of your data warehouse. For example, you may have a large number of data sets that are infrequently accessed. SQL Data Warehouse allows you to keep a small number of compute resources (to save costs) and, using SQL Server features like table partitioning, efficiently access only the data in the "hot" partitions - say, the last 6 months.
The service offers the ability to adjust the compute power by moving a slider up or down, with the compute change happening in about 60 seconds (during the preview). This allows you to start small - say with a single spoke - and add more over time, making the migration to the cloud easy. As you need more power, you can simply add DWU/compute resources by moving the slider to the right.
As the compute model scales, the number of query and loading concurrency slots increases, offering you the ability to support larger numbers of customers. You can read more about elastic performance and scale with SQL Data Warehouse on Azure.com.
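For reference, the same scale operation the slider performs can also be issued in T-SQL. A minimal sketch, assuming a hypothetical warehouse named MyDataWarehouse and a connection to the logical server's master database (the exact service objective names depend on the generation you are on, e.g. 'DW1000' vs 'DW1000c'):

```sql
-- Scale the warehouse up before a heavy window, then back down afterwards.
ALTER DATABASE MyDataWarehouse
MODIFY (SERVICE_OBJECTIVE = 'DW1000');

-- Check the current service objective once the change completes.
SELECT db.name, ds.service_objective
FROM sys.database_service_objectives AS ds
JOIN sys.databases AS db ON db.database_id = ds.database_id
WHERE db.name = 'MyDataWarehouse';
```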

Related

Storing IOT Data in Azure: SQL vs Cosmos vs Other Methods

The project I am working on as an architect has an IoT setup where lots of sensors send data like water pressure, temperature, etc. to an FTP server (I can't change this, as I have no control over it due to security). From there, a few Windows services on Azure pull the data and store it in an Azure SQL Database.
Here are my observations with respect to this architecture:
Problem 1: The 1 TB limit in Azure SQL DB. With a higher tier it can go to 4 TB, but that's the max, so it does not appear to be infinitely scalable; plus, with size, query performance could become a problem. Columnstore indexes and partitioning seem to be options, but the size limitation and DTUs are a deal breaker.
Problem 2: The IoT data and the SQL Database (downstream storage) seem to be tightly coupled. If some customer wants to extract a few months of data, or even more, with millions of rows, the DB will get busy and could throttle other customers due to DTU exhaustion.
I would like some ideas on scaling this further. SQL DB is great, and with JSON support it is awesome, but it is not a horizontally scalable solution.
Here is what I am thinking:
All the messages should be consumed from the FTP server by Azure IoT Hub by some means.
From the central hub, I want to push all messages to Azure Blob Storage in 128 MB files for later analysis at low cost.
At the same time, I would like all messages to go to IoT Hub and from there to Azure Cosmos DB (for long-term storage) or Azure SQL DB (long term, but I am not sure due to the size restriction).
I am keeping data in Blob Storage because if the client wants or hires a machine learning team to create some models, I would prefer them to pull data from Blob Storage rather than hitting my DB.
Kindly suggest a few ideas on this. Thanks in advance!
Chandan Jha
First, Azure SQL DB does have Hyperscale which is much larger than 4TB. That said, there is a tipping point where it makes sense to consider alternative architectures when you get to be bigger than what one machine can handle for your solution. While CosmosDB does give you a horizontal sharding solution, you can do the same with N SQL Databases (there are libraries to help there). Stepping back, it is actually pretty important to understand what you want to do with the data if it were in a database. Both CosmosDB and SQL DB are set up for OLTP-style operations (with some limited forms of broader queries - SQL DB supports columnstore and batch mode, for example, which means you could do a reasonably-sized data mart just fine there too). If you are just storing things in the database in the hope of needing to support future data scientists, then you may or may not really need either of these two OLTP stores.
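If you do keep a relational store for a reasonably-sized data mart, the columnstore/batch-mode point above is what makes that workable in Azure SQL DB. A minimal sketch against a purely hypothetical telemetry table (names and columns are placeholders, not from the question):

```sql
-- Hypothetical hot-data table for recent sensor readings.
CREATE TABLE dbo.SensorReadings
(
    DeviceId    INT           NOT NULL,
    ReadingTime DATETIME2(3)  NOT NULL,
    Pressure    DECIMAL(9, 3) NULL,
    Temperature DECIMAL(9, 3) NULL
);

-- A clustered columnstore index gives high compression and batch-mode
-- scans for analytic queries over this table.
CREATE CLUSTERED COLUMNSTORE INDEX cci_SensorReadings
    ON dbo.SensorReadings;
```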
Synapse SQL is set up for analytics and generally has support for reading data stored as files in Azure Storage. So, this may be a better strategy if you want to support arbitrarily large IoT data and do analytics/ML processing over it.
If you know your solution will never grow beyond that size, you may not need to consider something like Synapse, but it is set up for those scenarios if you are of sufficient size.
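To make the "read directly from Azure Storage" point concrete, a sketch of a Synapse serverless SQL query, assuming the IoT messages have already been landed as Parquet files; the storage account, container, path, and column names are placeholders:

```sql
-- Query Parquet files in place, without loading them into a database first.
-- Assumes access to the storage account has already been configured.
SELECT TOP 100
    result.DeviceId,
    result.ReadingTime,
    result.Temperature
FROM OPENROWSET(
        BULK 'https://mystorageaccount.dfs.core.windows.net/iot/telemetry/*.parquet',
        FORMAT = 'PARQUET'
) AS result;
```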
Option - 1:
Why don't you extract and serialize the data based on the partition id (device id) and send it over to the IoT Hub, where you can have Azure Functions or Logic Apps that deserialize the data into files stored in blob containers?
Option - 2:
You can also attempt to create a module that extracts the data into an Excel file, which is then sent to the IoT Hub to be stored in the storage containers.

Optimizing Azure Analysis Services processing on Synapse Analytics

I have a massive model with billions of records in Synapse Analytics that needs to be loaded into Azure Analysis Services. At present we have the tables partitioned by month, the data types and joins are perfect, and we have reduced the tables to ONLY the columns we need. We have scaled Synapse to DWU 5000c and Azure Analysis Services to S9v2 (the max setting). Given the above, it still takes hours and hours to run a full process for the first load, with up to 64 partitions running in parallel given the concurrency limits in Synapse.
I am curious: is there a setting that can enable a 'mass bulk import process full' that I am missing at this point?
For example, here: https://www.businessintelligenceinfo.com/business-intelligence/self-service-bi/building-an-azure-analysis-services-model-on-top-of-azure-blob-storage-part-3, he loads differently, from storage blobs, and was able to load 1 TB by changing the config to ReadBlobData(Source, "", 1, 50). Is there a similar config for Synapse to Analysis Services?

Azure Synapse with SSIS package

I have used several Azure services to upload data from on-premises to Azure SQL DW for Power BI.
SQL Server (On-Prem) -> Azure Data Factory (SSIS IR, SSIS in Azure SQL Database) -> Azure SQL Database
2 Azure services used
However, we find that the data size is growing much bigger than the platform was designed for.
We are planning to change to Azure Synapse.
But based on the Microsoft documentation, it seems that Data Factory (Preview) does not come with the SSIS IR.
Here is what comes to mind:
SQL Server (On-Prem) -> Azure Data Factory (SSIS IR, SSIS in Azure SQL Database) -> Azure Synapse
3 Azure services used
I wonder whether there is a better way to use Synapse with SSIS.
Many thanks.
Azure SQL Database now has a Hyperscale service tier.
The Hyperscale service tier in Azure SQL Database provides the following additional capabilities:
Support for up to 100 TB of database size
Nearly instantaneous database backups (based on file snapshots stored in Azure Blob storage) regardless of size, with no IO impact on compute resources
Fast database restores (based on file snapshots) in minutes rather than hours or days (not a size-of-data operation)
Higher overall performance due to higher log throughput and faster transaction commit times regardless of data volumes
Rapid scale out - you can provision one or more read-only nodes for offloading your read workload and for use as hot standbys
Rapid scale up - you can, in constant time, scale up your compute resources to accommodate heavy workloads when needed, and then scale the compute resources back down when not needed
Since Azure Synapse Analytics is not supported with the SSIS IR, I think scaling up the Azure SQL Database service tier is a good choice for you.
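If you do go the Hyperscale route, the tier change can be done with a single ALTER DATABASE (or from the portal). A sketch, assuming a hypothetical vCore database named MyDatabase; check the current documentation on whether the move is reversible before running it against production:

```sql
-- Move an existing Azure SQL Database to the Hyperscale tier.
-- 'HS_Gen5_4' is just an example service objective (Gen5, 4 vCores).
ALTER DATABASE MyDatabase
MODIFY (EDITION = 'Hyperscale', SERVICE_OBJECTIVE = 'HS_Gen5_4');
```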

Migrate data from on-prem DW to Azure

I have an on-prem Data Warehouse using SQL Server. What is the best way to load the data into SQL Data Warehouse?
The process of loading data depends on the amount of data. For very small data sets (<100 GB) you can simply use the bulk copy command line utility (bcp.exe) to export the data from SQL Server and then import to Azure SQL Data Warehouse. For data sets greater than 100 GB, you can export your data using bcp.exe, move the data to Azure Blob Storage using a tool like AzCopy, create an external table (via TSQL code) and then pull the data in via a Create Table As Select (CTAS) statement.
Using the PolyBase/CTAS route will allow you to take advantage of multiple compute nodes and the parallel nature of data processing in Azure SQL Data Warehouse - an MPP based system. This will greatly improve the data ingestion performance as each compute node is able to process a block of data in parallel with the other nodes.
One additional consideration is to increase the number of DWUs (compute resources) available in SQL Data Warehouse at the time of the CTAS statement. The extra compute resources add parallelism, which decreases the total ingestion time.
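A condensed sketch of that external table + CTAS path; every name here (tables, the storage account, the credential, and the secret) is a placeholder rather than anything from the question:

```sql
-- 1. Credential for the storage account that AzCopy staged the bcp exports into.
CREATE MASTER KEY;
CREATE DATABASE SCOPED CREDENTIAL BlobStorageCredential
WITH IDENTITY = 'blob_user', SECRET = '<storage-account-key>';

-- 2. External data source and file format describing the staged files.
CREATE EXTERNAL DATA SOURCE StagingBlob
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://staging@mystorageaccount.blob.core.windows.net',
    CREDENTIAL = BlobStorageCredential
);

CREATE EXTERNAL FILE FORMAT PipeDelimitedText
WITH (FORMAT_TYPE = DELIMITEDTEXT, FORMAT_OPTIONS (FIELD_TERMINATOR = '|'));

-- 3. External table over the staged files.
CREATE EXTERNAL TABLE dbo.FactSales_External
(
    SaleKey  BIGINT         NOT NULL,
    SaleDate DATE           NOT NULL,
    Amount   DECIMAL(18, 2) NOT NULL
)
WITH (LOCATION = '/factsales/', DATA_SOURCE = StagingBlob, FILE_FORMAT = PipeDelimitedText);

-- 4. CTAS pulls the data in across all compute nodes in parallel.
CREATE TABLE dbo.FactSales
WITH (DISTRIBUTION = HASH(SaleKey), CLUSTERED COLUMNSTORE INDEX)
AS
SELECT SaleKey, SaleDate, Amount
FROM dbo.FactSales_External;
```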
The SQL Database Migration Wizard is a helpful tool to migrate schema and data from an on-premises database to Azure SQL Database.
http://sqlazuremw.codeplex.com/

Load data to Azure SQL DW

I have a large amount of data to be loaded into SQL DW. What is the best way to get the data to Azure? Should I use Import/Export or AzCopy? How long would each method take?
The process of loading data depends on the amount of data. For very small data sets (<100 GB) you can simply use the bulk copy command line utility (bcp.exe) to export the data from SQL Server and then import to Azure SQL Data Warehouse.
For data sets greater than 100 GB, you can export your data using bcp.exe, move the data to Azure Blob Storage using a tool like AzCopy, create an external table (via TSQL code) and then pull the data in via a Create Table As Select (CTAS) statement. This works well up to a TB or two, depending on your connectivity to the cloud.
For really large data sets, say greater than a couple of TBs, you can use the Azure Import/Export service to move the data into Azure Blob Storage and then load the data with PolyBase/CTAS.
Using the PolyBase/CTAS route will allow you to take advantage of multiple compute nodes and the parallel nature of data processing in Azure SQL Data Warehouse - an MPP based system. This will greatly improve the data ingestion performance as each compute node is able to process a block of data in parallel with the other nodes.
One additional consideration is to increase the number of DWUs (compute resources) available in SQL Data Warehouse at the time of the CTAS statement. The extra compute resources add parallelism, which decreases the total ingestion time.
You can go through the documentation below and figure out which option suits you best.
https://azure.microsoft.com/en-us/documentation/articles/sql-data-warehouse-overview-load/
If you already have data in an on-premises SQL Server, you can use the migration wizard tool to load that data into Azure SQL DB.
http://sqlazuremw.codeplex.com/
