I have several SQL DBs in Azure. All have the same structure. Each DB represents a different location. What would be the best practice to aggregate the data of all locations? The goal would be to be able to answer queries like "How much material of type X was used in time range x to y across all locations?" or "Give me the location that produces the highest output."
You can use an Azure SQL Database elastic pool.
Add all of your databases to the elastic pool.
Elastic query can then help you aggregate the data of all locations in Azure.
The elastic query feature (in preview) enables you to run a Transact-SQL query that spans multiple databases in Azure SQL Database. It allows you to perform cross-database queries to access remote tables, and to connect Microsoft and third-party tools (Excel, Power BI, Tableau, etc.) to query across data tiers with multiple databases. Using this feature, you can scale out queries to large data tiers in SQL Database and visualize the results in business intelligence (BI) reports.
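For reference, here is a minimal sketch of the elastic query setup, run in the database you query from. The server, credential, table, and column names are purely illustrative, and it assumes every location database exposes the same dbo.MaterialUsage table (a second location would get its own external data source and external table created the same way):

-- Credential used to reach the remote location databases
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';
CREATE DATABASE SCOPED CREDENTIAL LocationDbCred
WITH IDENTITY = '<sql login>', SECRET = '<password>';

-- One external data source per remote location database
CREATE EXTERNAL DATA SOURCE LocationBerlin
WITH (
    TYPE = RDBMS,
    LOCATION = 'yourserver.database.windows.net',
    DATABASE_NAME = 'LocationBerlinDb',
    CREDENTIAL = LocationDbCred
);

-- External table mirroring the schema of the remote table
CREATE EXTERNAL TABLE dbo.MaterialUsage_Berlin
(
    MaterialType NVARCHAR(50),
    Quantity DECIMAL(18, 2),
    UsedOn DATE
)
WITH (DATA_SOURCE = LocationBerlin, SCHEMA_NAME = 'dbo', OBJECT_NAME = 'MaterialUsage');

-- Aggregate across locations by UNIONing the external tables
SELECT SUM(Quantity) AS TotalQuantity
FROM (
    SELECT MaterialType, Quantity, UsedOn FROM dbo.MaterialUsage_Berlin
    UNION ALL
    SELECT MaterialType, Quantity, UsedOn FROM dbo.MaterialUsage_Munich
) AS AllLocations
WHERE MaterialType = 'X'
  AND UsedOn BETWEEN '2023-01-01' AND '2023-06-30';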
Hope this helps.
My recommendation in this scenario is to create a new database, which we will call the "hub" database; it will consolidate the information from all location databases, which we will call "member" databases. Use SQL Data Sync to synchronize each member database to the hub database. Use T-SQL and Power BI against the hub database to answer all your questions involving all locations.
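As an illustration only: once the member databases are synced into the hub, the questions from the original post could be answered with plain T-SQL like the following (the table and column names are hypothetical, and a LocationId column is assumed so rows from different members can be told apart):

-- Total usage of material 'X' across all locations in a time range
SELECT SUM(Quantity) AS TotalQuantity
FROM dbo.MaterialUsage
WHERE MaterialType = 'X'
  AND UsedOn BETWEEN '2023-01-01' AND '2023-06-30';

-- Location with the highest output
SELECT TOP (1) LocationId, SUM(OutputQuantity) AS TotalOutput
FROM dbo.Production
GROUP BY LocationId
ORDER BY TotalOutput DESC;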
I participated in a project for a Mexican retail chain with 72 stores across Mexico; they created a hub database to consolidate sales at the end of each day and used Power BI to show the consolidated sales to stakeholders.
I'm currently working with an Azure Synapse DWH and I have some theoretical questions:
How can I create relationships between tables (Dims and Facts), and what implications would there be if I create those relationships?
I read that to create a primary key I would need to make it nonclustered, but what does that mean?
Azure Synapse Analytics (ASA) has three engines:
serverless SQL pools (was SQL on-demand)
dedicated SQL pools (the next step on from Azure SQL Data Warehouse)
Apache Spark pools
None of these currently supports database relationships, as of today. I suspect you mean dedicated SQL pools, and just to confirm, they do not support the FOREIGN KEY syntax. Relationships are more of an OLTP concept and not common in big data platforms, which ASA is.
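On the primary key question: a dedicated SQL pool does let you declare a primary key, but only as NONCLUSTERED and NOT ENFORCED, meaning the engine records the constraint for the optimizer but does not verify uniqueness for you. A sketch with hypothetical names:

-- Dedicated SQL pool: the PK must be nonclustered and not enforced
CREATE TABLE dbo.DimProduct
(
    ProductKey INT NOT NULL,
    ProductName NVARCHAR(100) NULL
)
WITH (DISTRIBUTION = REPLICATE, CLUSTERED COLUMNSTORE INDEX);

ALTER TABLE dbo.DimProduct
ADD CONSTRAINT PK_DimProduct PRIMARY KEY NONCLUSTERED (ProductKey) NOT ENFORCED;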
Therefore your options are to enforce these relationships downstream or on import to your warehouse. A common method is to identify unknown values and substitute them with a -1 / Unknown value on import. This will ensure there are no NULLs in your key columns.
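For example (illustrative staging and dimension names), the surrogate key lookup on load can default missing matches to -1 so the fact table never carries NULL keys:

-- Assumes dbo.DimProduct already contains a ProductKey = -1 'Unknown' row
INSERT INTO dbo.FactSales (ProductKey, SaleDate, Amount)
SELECT
    ISNULL(p.ProductKey, -1) AS ProductKey,  -- unmatched products map to -1
    s.SaleDate,
    s.Amount
FROM stg.Sales AS s
LEFT JOIN dbo.DimProduct AS p
    ON p.ProductCode = s.ProductCode;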
Additionally, enforce your relationships downstream, e.g. in an Azure Analysis Services tabular model or a Power BI model.
If you really need relationships then, depending on your data volumes, you might consider Azure SQL Database, which supports data volumes up to 4 TB alongside columnstore indexes, which give great compression.
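In Azure SQL Database the two can be combined. A rough sketch, with a made-up schema, of a fact table that has an enforced foreign key alongside a clustered columnstore index:

-- Azure SQL Database: enforced relationships and columnstore can coexist
CREATE TABLE dbo.DimMaterial
(
    MaterialId INT NOT NULL CONSTRAINT PK_DimMaterial PRIMARY KEY,
    MaterialType NVARCHAR(50) NOT NULL
);

CREATE TABLE dbo.FactUsage
(
    UsageId BIGINT NOT NULL,
    MaterialId INT NOT NULL,
    Quantity DECIMAL(18, 2) NOT NULL,
    UsedOn DATE NOT NULL,
    CONSTRAINT PK_FactUsage PRIMARY KEY NONCLUSTERED (UsageId),
    CONSTRAINT FK_FactUsage_DimMaterial FOREIGN KEY (MaterialId)
        REFERENCES dbo.DimMaterial (MaterialId),
    INDEX cci_FactUsage CLUSTERED COLUMNSTORE
);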
Having a similar issue:
I cannot find an automated solution thus far.
I'm importing 'entities' from D365 to the data lake, and they do NOT come with the relationships.
It will also NOT suggest the "Related Tables".
One option: introduce ETL of the 'entities' using T-SQL and Spark.
That means governance of: PySpark, notebooks, schemas, T-SQL linting, orchestration of activities and pipelines, workflows, etc.
OR
For small datasets and projects:
1. Reverse look-up each table needed.
2. In Azure Synapse create a new Dataflow and download the .PBIX.
3. Do your ETL: create primary fact and dimension tables (by whatever means, such as using a Power Pivot unique/distinct DAX expression on a Customer table).
4. Once complete, if you like, import the newly ETL'd primary tables into the data lake.
5. Repeat step 2.
6. Create the relationships with Power BI. (Ideally, if the ETL is done correctly, PBI will auto-detect the relationships.)
7. Re-publish the .PBIX with the relationships as a "Dataflow".
   a. You must create relationships for every Dataflow; Dataflows cannot be combined.
Measures and Dataflows will consume resources and require performance analysis if they grow.
At some point Dataverse may expose D365 data, making this easier.
Depending on your cost/spend, cloning all of D365 still doesn't solve your relationship needs.
Two solutions I'm aware of thus far:
Import the serverless DBOs into Power BI; model and create the dataset there. You can do massive ETL, including foreign key creation and filtering of NULL values to create primary keys for dimensions, aggregate data and create fact tables, etc. (see the sketch after this list). It's far easier than using the Synapse GUI. The drawbacks are PBI-licensing related.
Create a "Lake Database" (map as you go; great for 5 or fewer entities/tables). The ETL is low-code. But I'm skeptical; after 40 hours of training, I should have just learned how to script this in a notebook/Spark.
Do BOTH: use Power BI to develop your model and test it, then go back to Synapse and deploy the working model as a pipeline or lake database.
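As a sketch of the serverless side of the first option: a view in the serverless SQL pool over the lake files, which Power BI then imports through the serverless endpoint. The storage path, file format, and columns below are made up:

-- Serverless SQL pool: expose a lake folder as a view Power BI can import
CREATE VIEW dbo.Customer AS
SELECT *
FROM OPENROWSET(
    BULK 'https://<storageaccount>.dfs.core.windows.net/<container>/d365/customer/*.parquet',
    FORMAT = 'PARQUET'
) AS rows;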
Points of clarity on the top-posted solution:
Do not trust the auto-relationship detection of Power BI; stay away from the pre-made REFID relationships in PBI unless you know for sure that is what you want. (Step 6, original poster: if the ETL is correct, it's a 1:M.)
Publishing with a .PBIX has its limitations with sharing and the other issues the OP mentioned. A Lake Database might be the workaround if you have Tableau, Python, or Qlik as your solution.
Dataverse is coming, and PBI analytics as well as predictive analysis with HDInsight will be embedded into D365. You will also be able to create drag-and-drop dashboards. As of 08-05-2022 this is already working in its infancy; even though they want you to go modular, with a hybrid serverless setup you can STILL pull the aggregate measures from D365 into Synapse and reverse-engineer them.
I have come across the requirement where I have to choose the API for Cosmos DB.
I have gone through all the APIs: SQL, Graph, Mongo, and Table. My current project structure is based on Table storage, where I am storing IoT device data.
In the current structure (Table storage):
I have a separate table for each device, with a payload like below
{
Timestamp,
Parameter name,
value
}
Now, if I plan to use Cosmos DB, I can see that I have to provision RU/throughput against each table, which I think is going to be a big cost. I have not found any way to assign RUs at the database level so that my allocated RUs can be shared across all tables.
Please let me know in case we have something here... or is this a limitation I have to accept for Cosmos DB with the Table API?
As far as I can see, with the SQL API and considering my use case, I can create a single database and then multiple collections (with the names of the tables), and then I have both options for RU provisioning, at the database level as well as at the device (collection) level, which gives me more control over cost.
You can set the throughput on the account level.
You can optionally provision throughput at the account level to be shared by all tables in this account, to reduce your bill. These settings can be changed ONLY when you don't have any tables in the account. Note, throughput provisioned at the account level is billed for, whether you have tables created or not. The estimate below is approximate and does not include any discounts you may be entitled to.
Azure Cosmos DB pricing
The throughput configured on the database is shared across all the containers of the database. You can choose to explicitly exclude certain containers from database provisioning and instead provision throughput for those containers at container level.
A Cosmos DB database maps to the following: a database while using SQL or MongoDB APIs, a keyspace while using Cassandra API or a database account while using Gremlin or Table storage APIs.
You can also bring Cerebrata into the picture; the tool lets you assign any throughput value after choosing the throughput type (fixed, autoscale, or no throughput).
Disclaimer: It’s purely based on my experience
What is the best way to limit latency for SQL Azure in global applications?
My application uses SQL Azure, and I would like to know whether, based on the network location of users, it is possible to connect to a SQL Azure instance near those users.
So logically I would need a SQL Azure database with global replication, but not geo-replication, as each copy would serve as a master and not a secondary.
Thank you in advance.
You may want to try Cosmos DB to distribute data globally and obtain low latency, as explained in this article and this documentation.
For replicating data using SQL Data Sync with Azure SQL Database, take into consideration paired regions, which may reduce latency. With SQL Data Sync, a hub database can be defined along with many member databases in other regions, and data can be synced both ways between the hub and any member database.
Are there any tutorials available on the subject of migrating from an existing BI stack based on SQL Server 2008 to Azure SQL Data Warehouse? I'm specifically interested in best practices with regard to how to handle cross-database joins on non-premium tiers (our existing procedures and UDFs are full of joins on multiple database objects) and how to migrate existing SSAS cubes and the related programmability and ETL.
What BI-stack are you using? This will determine your next steps for the actual BI tools.
Specifically for cross-database queries when moving to the cloud, the guidance is to move the databases into schemas and then update your scripts to use schema-based (two-part) names instead of database-qualified (three-part) names when referencing objects. For example, if you have staging and production databases, you can simply move your staging objects into a [staging] schema within a single database.
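A rough illustration of that consolidation, assuming the former staging objects have already been loaded into the single database (object names are hypothetical):

-- Create a schema per former database
CREATE SCHEMA staging AUTHORIZATION dbo;
GO
-- Move an object that used to live in the Staging database into that schema
ALTER SCHEMA staging TRANSFER dbo.CustomerLoad;
GO
-- Old three-part reference: StagingDb.dbo.CustomerLoad
-- New two-part reference:
SELECT COUNT(*) FROM staging.CustomerLoad;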
Azure SQL Data Warehouse is commonly used as a backing store for SSAS cubes (MOLAP/ROLAP/Tabular mode). In the Azure cloud, customers have created IaaS SQL Server VMs to host ETL process (SSIS) and cubes (SSAS) with direct connections to SQL Data Warehouse.
How can I query multiple tables from multiple Azure databases?
Imagine I have a "customers" table in database X and a "sales" table in database Y, and I want to join them in a query. How is it possible to do this in Azure?
David is correct -- currently Azure SQL DB doesn't support cross-database joins.
Is it possible for you to combine the two databases into a single DB, but use separate schemas to keep the namespaces of the objects separate? I am curious about the business reasons for maintaining separate DBs. You can reach out to me directly at Stuarto Microsoft com.
Assuming you're talking about Azure's SQL Database service, then no, you cannot have queries across two separate database instances. The queries are limited to the single database.
If you require queries spanning multiple databases, you'd need to install SQL Server on a VM.
You can join them if they're hosted on the same server. Try this:
SELECT a.userID, b.usersFirstName, b.usersLastName
FROM databaseA.dbo.TableA a
INNER JOIN databaseB.dbo.TableB b ON a.userID = b.userID;