Architecture for Power BI - azure

I am designing an architecture for Power BI.
I am considering the following, built on Azure Data Lake Store Gen1:
ADLS => Databricks => Snowflake => Azure Analysis Services (Tabular) => Power BI
Is this architecture relevant?
Do Snowflake and Analysis Services have the right connectors?
Thank you

Microsoft recommends the Modern Data Warehouse architecture for systems built today, so the answer to your first question is yes.
The answer to your second question is also yes: Power BI supports both Analysis Services and Snowflake, as you can see in the list of supported data sources.

Related

Is Azure Synapse a good choice for Time Series Data?

We are analyzing which database would be the best choice for time series data (such as stock market data, trading data, market sentiment, etc.).
Is Azure Synapse a good choice for time series data?
Azure Synapse data explorer (Preview) provides you with a dedicated query engine optimized and built for log and time series data workloads.
With this new capability now part of Azure Synapse's unified analytics platform, you can easily access your machine and user data to surface insights that can directly improve business decisions.
To complement the existing SQL and Apache Spark analytical runtimes, Azure Synapse data explorer is optimized for efficient log analytics, using powerful indexing technology to automatically index structured, semi-structured, and free-text data commonly found in telemetry data.
For more info please refer to below related articles:
https://learn.microsoft.com/en-us/azure/synapse-analytics/data-explorer/data-explorer-overview
Time series solution - Azure Architecture
Please note that the feature is in public preview.
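
To make "time series workload" concrete: a typical query over trading data is a bucketed aggregation, for example the average price per minute. In Data Explorer this kind of query is normally expressed in KQL with `summarize ... by bin(...)`. The pure-Python sketch below only mimics that logic on a few hypothetical in-memory ticks; the function name and the sample data are assumptions made for illustration, not part of any Azure API.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def bin_average(ticks, width):
    """Average price per fixed-width time bucket.

    ticks: iterable of (timestamp, price) pairs; width: a timedelta.
    """
    epoch = datetime(1970, 1, 1)
    sums = defaultdict(lambda: [0.0, 0])  # bucket start -> [running total, count]
    for ts, price in ticks:
        # Floor the timestamp to the start of its bucket.
        bucket = ts - ((ts - epoch) % width)
        sums[bucket][0] += price
        sums[bucket][1] += 1
    return {b: total / n for b, (total, n) in sorted(sums.items())}


# Hypothetical trade ticks for one instrument.
ticks = [
    (datetime(2023, 1, 2, 9, 30, 5), 100.0),
    (datetime(2023, 1, 2, 9, 30, 40), 102.0),
    (datetime(2023, 1, 2, 9, 31, 10), 101.0),
]
per_minute = bin_average(ticks, timedelta(minutes=1))
```

An engine built for time series runs this kind of bucketing over billions of rows; the sketch is only meant to show the shape of the question you would be asking it.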

Azure Synapse, Azure Analysis Services and Power BI

Could someone clarify whether Azure Analysis Services is still an architecture component that we have to consider if we adopt Azure Synapse as the DWH environment?
The question arises because we want to understand whether there is a best practice for connecting Power BI to Synapse without maintaining another layer (e.g. Analysis Services).
The feature set isn't 100% identical, but at a high level a Power BI Dataset = AAS database. They use the same engine, so you only need to maintain a separate AAS instance if there is a feature currently available in AAS not currently implemented in Power BI.
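
The decision rule in this answer amounts to a set difference: list the features your model needs, subtract what Power BI datasets currently support, and keep AAS only if something is left over. A minimal sketch, where the feature labels are placeholders (the real lists have to come from the current Microsoft documentation):

```python
def missing_in_power_bi(required_features, power_bi_features):
    """Return the required features that Power BI does not cover."""
    return set(required_features) - set(power_bi_features)


# Placeholder feature labels, for illustration only.
required = {"feature_a", "feature_b"}
power_bi = {"feature_a", "feature_b", "feature_c"}

gaps = missing_in_power_bi(required, power_bi)
keep_aas = bool(gaps)  # a separate AAS instance is only justified if gaps exist
```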

Azure Data Explorer (ADX) vs Polybase vs Databricks

Question
Today I discovered another Azure service called Azure Data Explorer (ADX). Sorry for comparing services like this; I have a good understanding of all of them except ADX. I feel there is a big overlap in functionality, so I want to know the exact role of ADX in the Azure infrastructure.
What is the use case when ADX is significantly better than Synapse/Databricks?
My understanding of ADX
AFAIK, ADX is a cluster (with per-hour billing, like Databricks or Synapse, not like ADLA) that manages the database for you and is optimized for streaming ingestion and ad-hoc queries at scale. It also supports external tables, which perform worse but are cheaper (you pay for Blob/ADLS storage).
Details
I don't understand why we need ADX if:
Azure Synapse has a similar pricing model (cluster, per-hour), and it also supports streaming ingestion and ad-hoc querying at scale. Azure Synapse supports querying Blob Storage/ADLS through Polybase external tables.
Databricks is another service that is capable of doing it. Using Databricks Ingest and Delta Lake, you can ingest streaming data and consume it in both streaming and batch mode. You can also have an interactive cluster that handles ad-hoc queries for you.
Also, if you want real-time analytics, use Azure Stream Analytics. If you want an Athena-like experience, use ADLA (though it still doesn't support ADLS gen2).
Azure Data Explorer is focused on high-velocity, high-volume, high-variety data (the 3 Vs of big data). It provides super fast interactive queries over such data as it streams in. It supports JSON and text natively, including full-text search and indexing.
It is used in a broad set of scenarios associated with sensing activity and time series in a large set of verticals: IoT, API logs, transaction monitoring and ad hoc data exploration.
Microsoft offers ADX as a service because it is the main service Microsoft uses for its own telemetry, and all the analytical solutions we offer as a service (security, operational monitoring, game analytics, product insights, usage analytics, IoT, connected vehicles) are built on ADX. You can find a full list in our docs. For clarity, SQL, Synapse and Cosmos DB store their telemetry in Azure Data Explorer.
SQL DW (AKA Synapse SQL pool) is an excellent data warehouse and implements the modern data warehouse pattern: ETL -> curated data model -> load and serve via Analysis Services or Power BI.
ADX is for real-time analytics, letting you apply schema on read (SOR) to data that is only seconds old.
Consider ADX as a fully managed platform when replacing Solr/Lucene-based variants used for logs, time series databases and more.
Try it out in large workloads and you will see it is dramatically cheaper than the alternatives and much more powerful and performant.
Reach out to me if you need help.
Azure Data Explorer, alias Kusto, is focused on high-volume data ingestion and near-real-time query and analytics. It was invented at Microsoft for log and telemetry analytics, but can be used for other purposes, e.g. IoT, sensor data or web analytics. The same technology is used in Azure internal services such as Azure Monitor and Log Analytics.
Similar capabilities could be built on Synapse, Databricks or HDInsight, but I see these as tools that fit much broader use cases. ADX has quite a narrow focus. ADX has its own query language (KQL) but very limited SQL support. It is good for append-only data, not for updates. It is not a data warehouse, database or data lake.
Microsoft material refers to the technology behind ADX with name Kusto. More info on this at https://learn.microsoft.com/en-us/azure/data-explorer/kusto/concepts/. A good comparison of services can be found in this blog post: https://vincentlauzon.com/2020/02/19/azure-data-explorer-kusto

Azure Data Factory architecture with Azure SQL database to Power BI

I'm no MS expert - recently hopped onto the Azure train and apologies in advance if I get some information wrong.
I need some input on an Azure architecture that uses Azure Data Factory (as the ETL/ELT tool) and Azure SQL Database (as the storage) to feed a BI output, Power BI. My situation is this:
I have on-premise data sources such as Oracle DB, Oracle Cloud SSAS, MS SQL server db
I'd like to have a MS cloud infrastructure solution for reporting purposes.
No data migration needed - merely pumping on-prem data onto cloud and producing a BI reporting solution
Based on my limited knowledge and Google research, Azure Data Factory caters for all my on-prem sources, as well as the future cloud Azure SQL database. If further analysis is needed later, Azure Storage and Azure Databricks can be added to this architecture. I have sketched out the architecture of my proposed solution.
Just confirming my understanding
Without Azure Storage & Databricks (the 2 pink boxes), the 2 Azure components (DF & SQL database) are sufficient to take data from on-premise sources, process it in the cloud & output it into Power BI.
With Azure Storage & Databricks (the 2 pink boxes), processing will be more efficient, as their function, in summary, is to store training data models & act as an analytics processing engine.
Azure SQL database is more suitable than Azure SQL datawarehouse because my data sources do not exceed 1TB, it is cheaper, AND one of my data sources contains data from call centers, hence OLTP is more suitable. Plus I have Azure Databricks to support the analytical part that SQL datawarehouse covers (OLAP).
Any other comments to help me understand this whole architecture will be great!
I am new to Azure. I was wondering whether there is an equivalent of @Query(value="...") for DocumentDB (Cosmos DB), because DocumentDB does not accept @Query. I am looking to convert a SQL query from JPA to Cosmos DB.
Taking data from on-prem or IaaS sources like SQL on a VM, Oracle etc, requires a Self-Hosted Integration Runtime (SHIR).
Please review the Modern Data Warehouse pattern which sounds similar to what you are proposing.
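
The SHIR rule of thumb above can be sketched as a small lookup: sources that live on-prem or on a VM go through a self-hosted integration runtime, everything else can use the Azure-hosted one. The `Source` class, the `hosting` labels and the way the example sources are classified are assumptions made for illustration; they are not Data Factory API objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str
    hosting: str  # "on_prem", "iaas" (database on a VM) or "paas"


def integration_runtime(source):
    """Pick the Data Factory integration runtime, following the rule of thumb above."""
    if source.hosting in ("on_prem", "iaas"):
        return "self-hosted"
    return "azure"


# Hosting classification of each source is assumed for this example.
sources = [
    Source("Oracle DB", "on_prem"),
    Source("MS SQL Server", "on_prem"),
    Source("Azure SQL Database", "paas"),
]
plan = {s.name: integration_runtime(s) for s in sources}
```

With the sources from the question, both on-prem databases end up on the self-hosted runtime, so a machine inside your network has to host the SHIR before any pipeline can read from them.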

Is it possible to use Visual Studio for Azure Data Factory?

I am new to Azure. I would like to learn the architecture deployed in my company, which is shown in the diagram below. Can anyone point me to a video example or something similar that reflects the diagram? I also have access to the Azure portal with some credit, so if possible I could create a test environment based on the diagram.
P.S. Is it possible to use Visual Studio for any of the work in that diagram, or does everything have to be created and developed in the Azure portal?
Datasource Oracle DB --> on prem gateway --> ADF--> Azure DB --> AAS --> PowerBI
SQL EDP --------------------------------------^
You've got a fairly straightforward BI architecture there with the following logical components:
raw / source data
integration
data mart / dimensional model
semantic
visualisation
The physical components look a bit like this:
The physical components can be described like this:
Oracle database - former market leader database product. I would guess your employers have rejected OBIEE for some reason
Self-hosted Integration Runtime (SHIR) / On-premises data gateway - the SHIR enables the movement of data from on-prem data sources to the cloud. This must be used when moving data from on-prem to Azure SQL DB using Data Factory. Use the SHIR with Data Factory and the Gateway with Power BI and Azure Analysis Services.
Data Factory - Azure ELT tool for moving data from place to place. ETL feature Data Flow currently in preview.
Azure SQL DB - PaaS SQL database, scalable via service tiers. If your data in Oracle is not already in a data mart / dimensional format, then it can be made so here
Azure Analysis Services (AAS) - PaaS OLAP in-memory engine, scalable for fast slice-and-dice, drill down and semantic modelling. Tabular only.
Power BI - increasingly powerful visualisation tool. Run dashboard in DirectQuery / LiveConnection mode to avoid entirely duplicating the tabular model from AAS in Power BI.
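
One way to keep the two lists straight is to pair each logical layer with the physical component that implements it. The pairing below is my reading of the diagram and the descriptions above, written as a plain data structure rather than anything Azure-specific:

```python
# Logical layer -> physical component, as I read the architecture above.
LAYERS = [
    ("raw / source data", "Oracle database"),
    ("integration", "Self-hosted Integration Runtime + Data Factory"),
    ("data mart / dimensional model", "Azure SQL DB"),
    ("semantic", "Azure Analysis Services"),
    ("visualisation", "Power BI"),
]


def data_flow(layers):
    """Render the physical components in the order the data moves through them."""
    return " --> ".join(physical for _, physical in layers)


flow = data_flow(LAYERS)
print(flow)
```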
In answer to some of your questions: you can have one Azure Data Factory with many pipelines. The Visual Studio Azure Data Factory project type is now defunct.
As to "why" for certain technologies:
why Oracle - Who knows.
why SHIR - SHIR is compulsory when moving data from on-prem to cloud with ADF
why Azure SQL DB - lightweight and powerful PaaS DB requiring no infra and low TCO; scalable. Might be location for restructuring of data from raw / relational structure to dimensional in readiness for semantic layer if your data is not already in that format in Oracle
why AAS - fast, in-memory slice-and-dice; scalable, can pause, can be interrogated by Excel, Power BI Desktop, SSMS, VS, other clients etc. Optionally has row-level security (RLS)
why Power BI - the online service (Power BI.com) offers easy sharing within the organisation, even externally.
why all the components together - you could (in theory) go straight from Oracle to Power BI with a Power BI gateway (I think), BUT you would then have to do all the modelling in Power BI, and your model would then only really be accessible from Power BI. In this model, users with SQL skills can query the data mart, users with DAX (or Excel, or Power BI Desktop) skills can query the AAS tabular model, AAS is a very scalable component, etc.
These opinions are strictly my own personal ones and the value of them may go down, as well as up.
HTH
Azure Data Factory has a 1:M capability with various data sources. One instance of Azure Data Factory will support multiple data movement capabilities: Data movement activities
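
As a rough illustration of "one factory, many pipelines", the sketch below builds two pipeline definitions, each with a single Copy activity, in the JSON shape Data Factory uses for pipelines as far as I understand it. The pipeline and dataset names are hypothetical, and the exact properties each source and sink needs should be checked against the Copy activity documentation before use.

```python
import json


def copy_pipeline(name, source_dataset, sink_dataset, source_type, sink_type):
    """Build a pipeline definition (as a dict) containing a single Copy activity."""
    return {
        "name": name,
        "properties": {
            "activities": [
                {
                    "name": f"Copy_{source_dataset}_to_{sink_dataset}",
                    "type": "Copy",
                    "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
                    "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
                    "typeProperties": {
                        "source": {"type": source_type},
                        "sink": {"type": sink_type},
                    },
                }
            ]
        },
    }


# One factory, several pipelines. Pipeline and dataset names are hypothetical.
pipelines = [
    copy_pipeline("LoadFromOracle", "OracleOrders", "SqlDbOrders", "OracleSource", "AzureSqlSink"),
    copy_pipeline("LoadFromSqlServer", "SqlServerCalls", "SqlDbCalls", "SqlServerSource", "AzureSqlSink"),
]
print(json.dumps(pipelines[0], indent=2))
```

Both definitions would be deployed into the same Data Factory instance; each on-prem source additionally needs a linked service that points at the self-hosted integration runtime.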
Information about On-Premise Gateway:
The on-premises data gateway acts as a bridge, providing secure data transfer between on-premises data sources and your Azure Analysis Services servers in the cloud. In addition to working with multiple Azure Analysis Services servers in the same region, the latest version of the gateway also works with Azure Logic Apps, Power BI, Power Apps, and Microsoft Flow. You can associate multiple services in the same subscription and same region with a single gateway.
Connecting to on-premises data sources with Azure On-premises Data Gateway

Resources