How to upsert/insert records in all tables in an Azure SQL Database with Azure Data Factory v2

I have an Azure SQL database with many tables that I want to update frequently with any change made, be it an update or an insert, using Azure Data Factory v2.
There is a section in the documentation that explains how to do this.
However, the example is about two tables, and for each table a TYPE needs to be defined, and for each table a Stored Procedure is built.
I don't know how to generalize this for a large number of tables.
Any suggestion would be welcome.

You can follow my answer at https://stackoverflow.com/a/69896947/7392069. I don't know how to generalise the creation of table types and stored procedures, but the metadata table of the metadata-driven copy task goes a long way towards what you need.
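One way to avoid hand-writing a TYPE and a stored procedure per table is to generate the upsert SQL from table metadata. The following is only a sketch under stated assumptions: it swaps the table-type approach for a staging table plus a T-SQL MERGE, and the `TableSpec` class, the `staging` schema, and the example table and column names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TableSpec:
    """Hypothetical description of one table to upsert."""
    schema: str
    name: str
    key_columns: List[str]
    value_columns: List[str]


def quote(identifier: str) -> str:
    """Bracket-quote a T-SQL identifier, escaping closing brackets."""
    return "[" + identifier.replace("]", "]]") + "]"


def build_merge(spec: TableSpec, staging_schema: str = "staging") -> str:
    """Return a T-SQL MERGE that upserts from a staging table into the target.

    Assumes the staging table has the same name and columns as the target.
    """
    if not spec.key_columns:
        raise ValueError("at least one key column is required")
    target = f"{quote(spec.schema)}.{quote(spec.name)}"
    source = f"{quote(staging_schema)}.{quote(spec.name)}"
    all_columns = spec.key_columns + spec.value_columns
    on_clause = " AND ".join(
        f"t.{quote(c)} = s.{quote(c)}" for c in spec.key_columns
    )
    insert_cols = ", ".join(quote(c) for c in all_columns)
    insert_vals = ", ".join(f"s.{quote(c)}" for c in all_columns)
    lines = [f"MERGE {target} AS t", f"USING {source} AS s", f"ON {on_clause}"]
    if spec.value_columns:
        # Only emit an UPDATE branch when there is something to update.
        set_clause = ", ".join(
            f"t.{quote(c)} = s.{quote(c)}" for c in spec.value_columns
        )
        lines.append(f"WHEN MATCHED THEN UPDATE SET {set_clause}")
    lines.append(
        f"WHEN NOT MATCHED BY TARGET THEN INSERT ({insert_cols}) "
        f"VALUES ({insert_vals});"
    )
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_merge(TableSpec("dbo", "Customer", ["Id"], ["Name", "City"])))
```

The list of `TableSpec` entries could be filled from the same metadata table that drives the copy task, so adding a table means adding a row rather than writing new SQL by hand.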

Why would I not use Databricks as my data mart?

I'm trying to get my head around Databricks.
I've found documentation stepping through importing data from S3 or Azure Datalake, and then outputting into Azure Synapse Analytics or another Data Warehouse solution.
After a quick play, I've recognised that you can simply save a table in Databricks, access it using SQL, and even pull it into PowerBI as a source.
So my question: for a small Datamart (10 dims, 5 facts), why would I choose to pay for an additional database solution like Azure SQL, Synapse, RDS or other when I could simply leave the data in a table in Databricks and then access it directly from my reporting tool from there?
Thank you in advance.
Andy
Yes, this is very much possible. Note that although Azure SQL and Synapse are both Microsoft offerings, they serve different purposes: Synapse supports MPP, so it is aimed more at big data implementations. Also, it is not only the number of dimension and fact tables that matters; how much data you have, what kind of aggregations you run, and so on become decisive.

Relationships in Azure Synapse (DWH)

I'm currently working in an Azure Synapse DWH and I have some theoretical questions:
How can I create relationships between tables (dims and facts), and what are the implications of creating those relationships?
I read that to create a primary key I would need to make it nonclustered, but what does that mean?
Azure Synapse Analytics (ASA) has three engines:
serverless SQL pools (was SQL on-demand)
dedicated SQL pools (the next step on from Azure SQL Data Warehouse)
Apache Spark pools
None of these currently supports database relationships, as of today. I suspect you mean dedicated SQL pools, and just to confirm, they do not support the FOREIGN KEY syntax. Relationships are more of an OLTP concept and are not common in big data platforms such as ASA.
Therefore your options are to enforce these relationships downstream or on import to your warehouse. A common method is to identify unknown values and substitute them with a -1 / Unknown value on import. This will ensure there are no NULLs in your key columns.
Additionally, enforce your relationships downstream, e.g. in an Azure Analysis Services tabular model or a Power BI model.
If you really need relationships then depending on your data volumes you might consider Azure SQL Database which supports data volumes up to 4TB alongside columnstore indexes which give great compression.
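The "-1 / Unknown" substitution on import described above can be sketched as follows. This is a minimal illustration in plain Python, assuming rows are dictionaries; the function name, the `customer_key` column, and the sample values are hypothetical, and in practice the same logic would usually live in the load step itself.

```python
UNKNOWN_KEY = -1  # surrogate key of the "Unknown" dimension member


def resolve_keys(fact_rows, dimension_keys, key_column):
    """Replace missing or unmatched dimension keys with UNKNOWN_KEY.

    Returns new row dictionaries; the input rows are left untouched.
    """
    valid = set(dimension_keys)
    resolved = []
    for row in fact_rows:
        fixed = dict(row)
        # None (NULL) and keys absent from the dimension both map to -1.
        if fixed.get(key_column) not in valid:
            fixed[key_column] = UNKNOWN_KEY
        resolved.append(fixed)
    return resolved


if __name__ == "__main__":
    facts = [
        {"order_id": 1, "customer_key": 10},
        {"order_id": 2, "customer_key": None},
        {"order_id": 3, "customer_key": 99},
    ]
    for row in resolve_keys(facts, [10, 11], "customer_key"):
        print(row)
```

For this to join cleanly, the dimension table also needs a row with key -1 that represents the Unknown member.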
I'm having a similar issue, and I have not found an automated solution so far.
I'm importing 'entities' from D365 to the data lake, and they do NOT come with the relationships.
It will also NOT suggest the "Related Tables".
One option is to introduce ETL of the 'entities' using T-SQL and Spark, with governance of PySpark, notebooks, schema, and T-SQL linting, plus orchestration of activities, pipelines, workflows, etc.
OR
For small datasets and projects:
1. Reverse look-up each table needed.
2. In Azure Synapse, create a new Dataflow and download the .PBIX.
3. Do your ETL: create the primary fact and dimension tables by whatever means, such as a unique/distinct DAX expression in PowerPivot on a Customer table.
4. Once complete, if you like, import the newly ETL'd primary tables to the data lake.
5. Repeat step 2.
6. Create the relationships with Power BI. (Ideally, if the ETL is done correctly, PBI will find the relationships automatically.)
7. Re-publish the .PBIX with the relationships as a "Dataflow".
   a. You must create relationships for every Dataflow; dataflows cannot be combined.
Measures and Dataflows will consume resources and require performance analysis if they grow.
At some point 'Dataverse' may allow D365 data, making this easier.
Depending on your cost/spend, cloning all of D365 still doesn't solve your relationship needs.
Two solutions I'm aware of thus far:
Import the serverless DBOs into Power BI, then model and create the dataset there. You can do massive ETL, including foreign key creation and filtering of NULL values to create primary keys for dimensions, aggregate data, create fact tables, and so on.
It's far easier than using the Synapse GUI. The drawbacks are related to PBI licensing.
Create a "Lake Database" (map as you go; great for 5 or fewer entities/tables). ETL is low-code, but I suspect that after 40 hours of training I should have just learned how to script this in Workbook/Spark.
Do BOTH: use Power BI to develop and test your model, then go back to Synapse and deploy the working model as a pipeline or lake database.
Points of clarity on the top posted solution:
Do not trust the auto-relationship feature of Power BI; stay away from pre-made REFID relationships in PBI unless you know for sure this is what you want. (Step 6 of the original post: if the ETL is correct, it's a 1:M.)
Publishing with .PBIX has its limitations with sharing, along with the other issues the OP mentioned. A Lake Database might be the workaround if you have Tableau, Python, or Qlik as your solution.
Dataverse is coming, and PBI analytics as well as predictive analysis with HDInsight will be embedded into D365. You will also be able to create drag-and-drop dashboards. As of 08-05-2022 this is already working in its infancy; even though they want you to go modular, with a hybrid serverless setup you can STILL pull the aggregate measures from D365 into Synapse and reverse engineer them.

Is it possible to update row values in tables from Azure Data Factory?

I have a dataset in Data Factory, and I would like to know if it is possible to update row values using only Data Factory activities, without data flows, stored procedures, queries...
There is a way to run an update (and probably any other SQL statement) from Data Factory, although it's a bit tacky.
The Lookup activity can execute a set of statements in Query mode.
The only condition is that the query ends with a SELECT; otherwise the Lookup activity throws an error.
This works for Azure SQL, PostgreSQL, and most likely for any other DB Data Factory can connect to.
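As a rough sketch of the trick described above, the helper below builds the query text for a Lookup activity and appends a trailing SELECT when the last statement is not one. It assumes an Azure SQL source; the helper names, the activity name, the dataset name, and the sample UPDATE are all hypothetical, and the dictionary is only an approximation of the activity's pipeline JSON.

```python
def build_lookup_query(statements):
    """Join SQL statements and make sure the batch ends with a SELECT."""
    cleaned = [s.strip().rstrip(";") for s in statements if s.strip()]
    if not cleaned:
        raise ValueError("at least one statement is required")
    if not cleaned[-1].upper().startswith("SELECT"):
        # Lookup expects a result set, so return a dummy row at the end.
        cleaned.append("SELECT 1 AS done")
    return ";\n".join(cleaned) + ";"


def lookup_activity(name, dataset_name, statements):
    """Return a dict shaped like a Lookup activity definition (Azure SQL)."""
    return {
        "name": name,
        "type": "Lookup",
        "typeProperties": {
            "source": {
                "type": "AzureSqlSource",
                "sqlReaderQuery": build_lookup_query(statements),
            },
            "dataset": {
                "referenceName": dataset_name,  # hypothetical dataset
                "type": "DatasetReference",
            },
            "firstRowOnly": True,
        },
    }


if __name__ == "__main__":
    activity = lookup_activity(
        "UpdateOrderStatus",
        "AzureSqlOrders",
        ["UPDATE dbo.Orders SET Status = 'done' WHERE Id = 1"],
    )
    print(activity["typeProperties"]["source"]["sqlReaderQuery"])
```

Because the statements run as a side effect of a read activity, it is worth keeping them idempotent in case the activity is retried.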
Concepts:
Datasets:
Datasets represent data structures within the data stores, which simply point to or reference the data you want to use in your activities as inputs or outputs.
Now, a dataset is a named view of data that simply points or references the data you want to use in your activities as inputs and outputs. Datasets identify data within different data stores, such as tables, files, folders, and documents. For example, an Azure Blob dataset specifies the blob container and folder in Blob storage from which the activity should read the data.
In my experience, it is currently impossible to update row values using only Data Factory activities. Azure Data Factory doesn't support this yet.
For more details, please refer to:
Datasets
Datasets and linked services in Azure Data Factory.
For example, when I use the Copy activity, Data Factory doesn't provide me with any way to update the rows.
Hope this helps.
This is now possible in Azure Data Factory: your data flow should have an Alter Row transformation, and the sink has a drop-down where you can select the key column for updates.
https://learn.microsoft.com/en-us/azure/data-factory/data-flow-alter-row
Regarding the ADF data flow mentioned in the comment above: ADF data flows do not support on-premises sinks or sources; the sink and source should reside in Azure SQL, Azure Data Lake, or another Azure data service.

Best solution for dynamic spatial data

I'm trying to find the best solution for storing dynamic spatial data. I wonder if any of Microsoft's Azure solutions could work. Azure Table Storage would let me create a lot of custom and dynamic structures stored on fast SSD disks.
Because of the data's dynamic nature, common indexing seems useless. I would also like to create a lot of table-like structures, so the whole architecture cannot be static. Using Azure Table Storage, I would dynamically create tables based on country, city, etc., sorted by latitude or longitude.
I would appreciate any clue.
Azure Table Storage has mostly been replaced by Azure Cosmos DB.
At the time of writing the Table Storage page even says:
The content in this article applies to the original basic Azure Table storage. However, there is now a premium offering for Azure Table storage in public preview that offers throughput-optimized tables, global distribution, and automatic secondary indexes. To learn more and try out the new premium experience, please check out Azure Cosmos DB: Table API.
You can use Cosmos DB via the Table API, but you'll probably find the Document DB API to be more powerful.
Documents are "schema-free". You can just throw your documents into a collection and then query against them.
You can create documents which have geo-spatial properties which are indexed automatically.
Then you can perform geo-spatial queries against those properties.
For example you might give each of your documents a point, and then create a query to select all documents that are inside of a polygon.
Or maybe you want to find out how far away each document is from a given point.
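The two queries described above (documents inside a polygon, and distance from a given point) could look roughly like this. The sketch only builds GeoJSON values and parameterised query specs as plain dictionaries; it does not call any SDK. The `location` property name and the sample coordinates are hypothetical, while `ST_WITHIN` and `ST_DISTANCE` are the built-in spatial functions of the Cosmos DB SQL query language.

```python
def point(lon, lat):
    """GeoJSON point; coordinates are in [longitude, latitude] order."""
    return {"type": "Point", "coordinates": [lon, lat]}


def polygon(ring):
    """GeoJSON polygon from one outer ring of (lon, lat) pairs.

    The ring is closed automatically if the last point differs from the first.
    """
    coords = [list(p) for p in ring]
    if coords[0] != coords[-1]:
        coords.append(list(coords[0]))
    return {"type": "Polygon", "coordinates": [coords]}


def within_query(area):
    """Select every document whose location lies inside the given polygon."""
    return {
        "query": "SELECT * FROM c WHERE ST_WITHIN(c.location, @area)",
        "parameters": [{"name": "@area", "value": area}],
    }


def distance_query(origin):
    """Return each document id with its distance from the origin point."""
    return {
        "query": (
            "SELECT c.id, ST_DISTANCE(c.location, @origin) AS distance FROM c"
        ),
        "parameters": [{"name": "@origin", "value": origin}],
    }


if __name__ == "__main__":
    area = polygon([(16.8, 52.3), (17.1, 52.3), (17.1, 52.5), (16.8, 52.5)])
    print(within_query(area))
    print(distance_query(point(16.93, 52.41)))
```

Each stored document would carry its own `location` value built the same way as `point(...)`, so no per-country or per-city table layout is needed to make the data queryable by position.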

Create Azure table storage folders

I have a worker role running that creates tables in table storage, and I would like to be able to group these tables into categories like you would under a folder.
I cannot see any way to do this with the table classes in .Net, but when I look in my table storage 'Tables', I see a 'Metrics Table' entry which looks like a 'folder' and expands to show multiple metrics tables below it.
How can I create/add one of these myself programmatically?
Any ideas gratefully received.
I'm afraid this is not possible. Metrics tables are handled differently by Visual Studio. They are not even returned by the Query Tables operation of the storage REST API (you can only use them directly by name). Tools like Azure Storage Explorer do not show them at all.
Back to your question: best practice is to use a common prefix for tables in the same 'category',
e.g. WAD* for all Azure diagnostics tables and NLog* for NLog tables.
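The prefix convention above can be turned into folder-like groups on the client side. The following is a small sketch in plain Python that takes table names you have already listed; the function name and the sample table names are hypothetical.

```python
from collections import defaultdict


def group_by_prefix(table_names, prefixes):
    """Group table names by the longest matching prefix.

    Tables that match no prefix are collected under the empty string.
    """
    ordered = sorted(prefixes, key=len, reverse=True)  # longest prefix wins
    groups = defaultdict(list)
    for name in table_names:
        for prefix in ordered:
            if name.startswith(prefix):
                groups[prefix].append(name)
                break
        else:
            groups[""].append(name)
    return dict(groups)


if __name__ == "__main__":
    tables = ["WADLogsTable", "WADMetricsPT1H", "NLogErrors", "Customers"]
    for prefix, names in group_by_prefix(tables, ["WAD", "NLog"]).items():
        print(prefix or "(other)", names)
```

This gives the same visual grouping a 'folder' would, without relying on anything the Table service itself does not provide.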
The simple answer is that you can't. The Table Storage service contains tables, and each table contains entities. The Metrics Table functionality you're talking about is a UI feature, where the UI groups all these tables together.
