Azure Synapse with SSIS package

I have used several Azure services to upload data from on-premises to Azure SQL Database for Power BI.
SQL Server (on-prem) -> Azure Data Factory (SSIS IR, SSIS catalog in Azure SQL Database) -> Azure SQL Database
(2 Azure services used)
However, we find that the data size is growing much larger than the platform was designed for.
We are planning to change to Azure Synapse.
But based on the Microsoft documentation, it seems that Data Factory (Preview) in Synapse does not come with an SSIS IR.
Here is what comes to mind:
SQL Server (on-prem) -> Azure Data Factory (SSIS IR, SSIS catalog in Azure SQL Database) -> Azure Synapse
(3 Azure services used)
I wonder whether there is a better way to use Synapse with SSIS.
Many thanks.

Azure SQL Database now has a Hyperscale service tier.
The Hyperscale service tier in Azure SQL Database provides the following additional capabilities:
Support for up to 100 TB of database size
Nearly instantaneous database backups (based on file snapshots stored in Azure Blob storage) regardless of size, with no IO impact on compute resources
Fast database restores (based on file snapshots) in minutes rather than hours or days (not a size-of-data operation)
Higher overall performance due to higher log throughput and faster transaction commit times regardless of data volumes
Rapid scale out - you can provision one or more read-only nodes for offloading your read workload and for use as hot standbys
Rapid scale up - you can, in constant time, scale up your compute resources to accommodate heavy workloads when needed, and then scale the compute resources back down when not needed.
Since Azure Synapse Analytics is not supported with the SSIS IR, I think scaling up the Azure SQL Database service tier is a good choice for you.
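If you go that route, here is a minimal sketch of the tier change, assuming pyodbc with the ODBC Driver 18 for SQL Server; the server, database, credentials, and the HS_Gen5_2 service objective are placeholders you would replace.

```python
# Minimal sketch: move an existing Azure SQL Database to the Hyperscale tier with T-SQL.
# Server, database, user, and service objective below are placeholders.
import pyodbc

conn_str = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myserver.database.windows.net;"   # placeholder logical server
    "DATABASE=master;"                        # run ALTER DATABASE from master
    "UID=sqladmin;PWD=<password>"             # placeholder credentials
)

# ALTER DATABASE cannot run inside a transaction, so use autocommit.
with pyodbc.connect(conn_str, autocommit=True) as conn:
    conn.execute(
        "ALTER DATABASE [mydb] MODIFY "
        "(EDITION = 'Hyperscale', SERVICE_OBJECTIVE = 'HS_Gen5_2');"
    )
    # The change completes asynchronously; progress can be checked in
    # sys.dm_operation_status in the master database.
```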

Related

How to implement "High Availability" for Azure Synapse Analytics?

Does Azure Synapse Analytics support geo-redundancy like Storage Account & Key Vault? If not, how do I implement high availability for Azure Synapse Analytics? I have the following components as part of the Azure Synapse Analytics solution:
SQL Dedicated Pool
SQL Serverless Pool
Spark Pool
Storage Account(ADLS)
Azure DevOps Git Repo
First, designing and documenting a Disaster Recovery plan is a project unto itself. I’ve been working on one for a client of mine using Synapse for several months part-time.
The first task is to define your Recovery Time Objective (RTO, meaning how long before your solution is back up in the event of a disaster) and your Recovery Point Objective (RPO, meaning how many minutes or hours of data you can afford to lose… and with analytics solutions you can usually reload from the source to catch up).
If your RTO and RPO are low for an analytics solution (like 2 hours), then you probably need to spin up parallel environments in another region and load data to both environments in parallel. If your RTO and RPO are typical for an analytics solution (24-48 hours), then you can probably survive by ensuring backups are geo-redundant and restoring in the event of an outage. I would recommend you preconfigure your Synapse workspace and other infrastructure before the outage unless you have a trusted infrastructure-as-code solution. If your RPO and RTO are long (like 7 days), it's extremely unlikely an Azure service or region is going to be down for that long.
ADLS supports RA-GRS redundancy, so you could read all the files from the secondary endpoint in its paired region and copy them to another ADLS account in the secondary region. Unfortunately, ADLS accounts don't yet support user-initiated failover.
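As a rough illustration of that copy, here is a sketch using the azure-storage-blob package; the account names, keys, and container name are placeholders, and for a lake of any real size you would more likely drive this with AzCopy or an ADF copy pipeline.

```python
# Sketch: read from the RA-GRS secondary endpoint of the primary account and copy
# blobs into a second account already provisioned in another region.
from azure.storage.blob import BlobServiceClient

# Read-only secondary endpoint of the primary (RA-GRS) account.
src = BlobServiceClient(
    account_url="https://primaryacct-secondary.blob.core.windows.net",  # placeholder
    credential="<primary-account-key>",
)
# Target account in the DR region (placeholder).
dst = BlobServiceClient(
    account_url="https://dracct.blob.core.windows.net",
    credential="<dr-account-key>",
)

src_container = src.get_container_client("datalake")
dst_container = dst.get_container_client("datalake")

for blob in src_container.list_blobs():
    data = src_container.download_blob(blob.name).readall()
    dst_container.upload_blob(blob.name, data, overwrite=True)
```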
Dedicated SQL pools support built-in geo-redundant backups once a day, but you can't control when they are taken. If this isn't acceptable, then you need to proactively create a user-defined restore point, restore it cross-region, and pause the restored SQL pool.
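A hedged sketch of taking that user-defined restore point from code is below. The resource names are placeholders, and the operation group and model names (sql_pool_restore_points.begin_create, CreateSqlPoolRestorePointDefinition) are an assumption based on recent azure-mgmt-synapse versions, so verify them against the SDK reference you have installed.

```python
# Sketch: take a user-defined restore point on a dedicated SQL pool before a
# planned cross-region restore. All names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.synapse import SynapseManagementClient
from azure.mgmt.synapse.models import CreateSqlPoolRestorePointDefinition

client = SynapseManagementClient(DefaultAzureCredential(), "<subscription-id>")

poller = client.sql_pool_restore_points.begin_create(
    "rg-analytics",        # resource group (placeholder)
    "syn-prod",            # workspace name (placeholder)
    "dedicatedpool01",     # dedicated SQL pool name (placeholder)
    CreateSqlPoolRestorePointDefinition(restore_point_label="pre-dr-drill"),
)
restore_point = poller.result()
print(restore_point.restore_point_creation_date)
```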
Synapse serverless SQL pools have no storage of their own, so ensure you have a backup of the schema (views, permissions, external data sources, external tables, etc.) in source control or somewhere similar. The data will fail over with ADLS.
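One simple way to keep that schema backup current is to script the object definitions to files and commit them. The sketch below dumps view definitions via pyodbc; the serverless endpoint, database, and credentials are placeholders, and external tables/data sources would need similar scripting (e.g. from sys.external_tables).

```python
# Sketch: export every view definition from a serverless SQL pool database to .sql
# files suitable for source control.
import pathlib
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=syn-prod-ondemand.sql.azuresynapse.net;"  # serverless endpoint (placeholder)
    "DATABASE=reporting;"                             # placeholder database
    "UID=sqladmin;PWD=<password>"
)

out_dir = pathlib.Path("serverless_schema")
out_dir.mkdir(exist_ok=True)

rows = conn.execute(
    "SELECT s.name, v.name, OBJECT_DEFINITION(v.object_id) "
    "FROM sys.views v JOIN sys.schemas s ON v.schema_id = s.schema_id"
).fetchall()

for schema_name, view_name, definition in rows:
    (out_dir / f"{schema_name}.{view_name}.sql").write_text(definition)
```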
For Spark Pools ensure you have your notebook artifacts in source control and you can always run them in a different Synapse workspace in another region when needed. Document your cluster configs.
Write out a disaster recovery playbook and do a DR drill periodically (once a quarter or once a year).
Here is another author’s description of the DR plan for Synapse.

What is the cost to download/transfer data from Azure SQL database to local?

I am new to Azure SQL Server.
We have a SQL Server database in Azure. We are stopping the Azure subscription, as the web application that uses the Azure SQL database has been terminated.
We need to download/transfer the web application data from Azure SQL Server to our local storage.
Will it cost us to download/transfer/export the data present in Azure SQL Server database?
Yes, outbound data transfer is charged, but the cost is usually quite minimal.
Here is the pricing page: https://azure.microsoft.com/en-us/pricing/details/bandwidth/.
It'll depend slightly on your region, but the first 5 GB are free each month.
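As a rough illustration of pulling the data down before cancelling the subscription, here is a minimal sketch using pyodbc; the server, database, credentials, and table name are placeholders, and for a complete copy of the database you would more likely export a .bacpac (for example with SqlPackage or the portal's Export feature).

```python
# Sketch: download one table from Azure SQL Database to a local CSV file.
import csv
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myserver.database.windows.net;"   # placeholder server
    "DATABASE=webappdb;"                      # placeholder database
    "UID=sqladmin;PWD=<password>"
)

cursor = conn.execute("SELECT * FROM dbo.Orders")   # placeholder table
with open("orders.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(col[0] for col in cursor.description)  # header row
    writer.writerows(cursor)  # stream rows straight from the cursor
```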

Azure Data Factory architecture with Azure SQL database to Power BI

I'm no MS expert - recently hopped onto the Azure train and apologies in advance if I get some information wrong.
Basically I need some input on an Azure architecture utilising Azure Data Factory (as the ETL/ELT tool) and Azure SQL Database (as the storage), feeding a BI output, Power BI. My situation is this:
I have on-premises data sources such as Oracle DB, Oracle Cloud SSAS, and MS SQL Server DB
I'd like to have a MS cloud infrastructure solution for reporting purposes.
No data migration needed - merely pumping on-prem data onto cloud and producing a BI reporting solution
Based on my limited knowledge and Google research, Azure Data Factory caters for all my on-prem sources, as well as the future cloud Azure SQL database. If future analysis is needed, Azure Storage and Azure Databricks can be added in to this architecture. I have sketched out the architecture of my proposed solution.
Just confirming my understanding
Without Azure Storage & Databricks (the 2 pink boxes), the two Azure components (Data Factory & SQL Database) are sufficient to take data from on-premises sources, process it in the cloud, and output it into Power BI.
With Azure Storage & Databricks (the 2 pink boxes), processing will be more efficient, as their function, in summary, is to store training data/models and act as an analytics processing engine.
Azure SQL Database is more suitable than Azure SQL Data Warehouse, as my data sources do not exceed 1 TB; cost-wise it is cheaper, AND one of my data sources contains data from call centers, hence OLTP is more suitable. Plus I have Azure Databricks to support the analytical bit that SQL Data Warehouse does (OLAP).
Any other comments to help me understand this whole architecture will be great!
Taking data from on-prem or IaaS sources like SQL Server on a VM, Oracle, etc. requires a Self-Hosted Integration Runtime (SHIR).
Please review the Modern Data Warehouse pattern which sounds similar to what you are proposing.
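If you want to register that SHIR from code rather than the portal, a hedged sketch with the azure-mgmt-datafactory SDK is below. The resource names are placeholders, and the model/operation names follow the ADF Python SDK quickstart, so verify them against the SDK version you install.

```python
# Sketch: create a Self-Hosted Integration Runtime in a Data Factory and fetch the
# auth key to paste into the SHIR installer on the on-prem machine.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    IntegrationRuntimeResource,
    SelfHostedIntegrationRuntime,
)

adf = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

adf.integration_runtimes.create_or_update(
    "rg-bi",             # resource group (placeholder)
    "adf-reporting",     # data factory name (placeholder)
    "onprem-shir",       # integration runtime name (placeholder)
    IntegrationRuntimeResource(
        properties=SelfHostedIntegrationRuntime(description="Reaches on-prem Oracle/SQL")
    ),
)

keys = adf.integration_runtimes.list_auth_keys("rg-bi", "adf-reporting", "onprem-shir")
print(keys.auth_key1)  # use this key when installing the SHIR on-premises
```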

Azure Data Lake Analytics Vs Azure SQL Data Warehouse

I am using ADF to connect to sources and get data into Azure Data Lake store. After getting data into Data Lake Store, I want to do some transformation, aggregation and use that data in SSRS reports and also for creating Cubes.
Can anyone suggest which would be the best option (Azure Data Lake Analytics or Azure SQL DW)?
I am looking to make a decision on which one to use after the Data Lake.
There is no more Azure SQL DW. What we have now is Azure Synapse (the renamed Azure SQL DW) and the broader Azure Synapse Analytics workspace (which takes the place of Azure Data Lake Analytics). Microsoft is stopping development of U-SQL and Azure Data Lake Analytics. If the volume of your data is huge and you want to use PolyBase technology, the best choice is Azure Synapse / Azure Synapse Analytics. You can enrich your ADF pipelines by using Databricks for the analytics work. By using PolyBase you can do ELT instead of ETL.
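To make the ELT point concrete, here is a hedged sketch of the pattern: land files in the lake, expose them as a PolyBase external table, and load/transform inside the dedicated SQL pool with CTAS. The endpoint, schemas, lake path, and the pre-created data source and file format names are all placeholders.

```python
# Sketch: PolyBase external table + CTAS load into a dedicated SQL pool (ELT).
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=syn-prod.sql.azuresynapse.net;"   # dedicated SQL pool endpoint (placeholder)
    "DATABASE=dedicatedpool01;"               # placeholder pool/database
    "UID=sqladmin;PWD=<password>",
    autocommit=True,                          # DDL/CTAS should not run in a transaction
)

# External table over the raw files in the lake; 'stg' schema, 'LakeDataSource' and
# 'ParquetFormat' are assumed to exist already.
conn.execute("""
CREATE EXTERNAL TABLE stg.Sales_ext (
    SaleId INT, Amount DECIMAL(18,2), SaleDate DATE
)
WITH (
    LOCATION = '/raw/sales/',
    DATA_SOURCE = LakeDataSource,
    FILE_FORMAT = ParquetFormat
);
""")

# ELT: filter/shape the data while loading it into a distributed table.
conn.execute("""
CREATE TABLE dbo.Sales
WITH (DISTRIBUTION = HASH(SaleId))
AS SELECT SaleId, Amount, SaleDate FROM stg.Sales_ext WHERE Amount > 0;
""")
```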
Microsoft Azure is no longer investing in Azure Data Lake Analytics (ADLA); you can see that the number of enhancements/updates to ADLA in the last couple of years is almost zero. On the other side, Azure SQL Data Warehouse is their flagship service (recently renamed Azure Synapse Analytics) and hence is being enhanced and updated very quickly. Synapse is based on an MPP architecture and provides all the required capabilities of big data computing.
What is the size of your data? Azure Data Lake is meant more for petabyte-scale big data processing, and Azure SQL Data Warehouse for large relational DWH solutions (starting from 250/500 GB and up).
With Azure Data Lake you can even have the data from the data lake feed a NoSQL database, an SSAS cube, or a data mart, or go right into Power BI. With Azure SQL Data Warehouse you can have cubes, Power BI reports, and SSRS.
If you need SQL Server Reporting Services, Integration Services (and you have complex SSIS logic), and Analysis Services (SSAS), you may be better off considering an Azure SQL VM.

How well does Azure SQL Data Warehouse scale?

I want to replace all my on-prem DWs on SQL Server and use Azure SQL DW. My plan is to remove the hub-and-spoke model that I currently use for my on-prem SQL and basically have a large Azure SQL DW instance that scales with my client base (currently at ~1000). Would SQL DW scale, or do I need to retain my hub-and-spoke model?
Azure SQL Data Warehouse is a great choice for removing your hub-and-spoke model. The service allows you to scale storage and compute independently to meet the compute and storage needs of your data warehouse. For example, you may have a large number of data sets that are infrequently accessed. SQL Data Warehouse allows you to have a small number of compute resources (to save costs) and, using SQL Server features like table partitioning, access only the data in the "hot" partitions efficiently, say the last 6 months.
The service offers the ability to adjust the compute power by moving a slider up or down, with the compute change happening in about 60 seconds (during the preview). This allows you to start small, say with a single spoke, and add more over time, making the migration to the cloud easy. As you need more power, you can simply add DWU/compute resources by moving the slider to the right.
As the compute model scales, the number of query and loading concurrency slots increases, offering you the ability to support larger numbers of customers. You can read more about elastic performance and scale with SQL Data Warehouse on Azure.com.
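The same slider operation can also be driven from code. Below is a rough sketch that changes the service objective (DWU) with T-SQL via pyodbc; the server, database, credentials, and the DW400c target are placeholders.

```python
# Sketch: scale a SQL DW / dedicated SQL pool up or down by changing its service
# objective, then poll progress from the master database.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myserver.database.windows.net;"   # placeholder logical server
    "DATABASE=master;"                        # scale commands run from master
    "UID=sqladmin;PWD=<password>",
    autocommit=True,                          # ALTER DATABASE cannot run in a transaction
)

# Scale up before a heavy load window; run again with a smaller DWU to scale back down.
conn.execute("ALTER DATABASE [mydw] MODIFY (SERVICE_OBJECTIVE = 'DW400c');")

# Optional: check the state of the scale operation.
for row in conn.execute(
    "SELECT state_desc, percent_complete FROM sys.dm_operation_status "
    "WHERE resource_type_desc = 'Database' AND major_resource_id = 'mydw'"
):
    print(row.state_desc, row.percent_complete)
```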
