Star schema modeling in Azure Synapse

According to the documentation "Primary key, foreign key, and unique key in Synapse SQL pool", Azure Synapse doesn't support foreign keys.
Thus, how can I model a star schema in Azure Synapse? Or is it not necessary on Azure Synapse?

As per the documentation you have linked:
Having primary key and/or unique key allows Synapse SQL pool engine to
generate an optimal execution plan for a query
After creating a table with primary key or unique constraint in
Synapse SQL pool, users need to make sure all values in those columns
are unique. A violation of that may cause the query to return inaccurate result.
The takeaway from the above is that Synapse SQL pool does not enforce constraints. PK and unique "constraints" are purely for query performance optimisation and do not actually enforce uniqueness.
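A minimal sketch of what this looks like in DDL, assuming a dedicated SQL pool reachable over pyodbc and a hypothetical DimCustomer dimension: the primary key has to be declared NONCLUSTERED and NOT ENFORCED, so the engine can use it for plan optimisation while uniqueness remains the loader's responsibility.

```python
import pyodbc

# Hypothetical connection string for a dedicated SQL pool.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<workspace>.sql.azuresynapse.net;DATABASE=<dwh>;"
    "UID=<user>;PWD=<password>"
)
cursor = conn.cursor()

# PRIMARY KEY must be NONCLUSTERED and NOT ENFORCED in a dedicated SQL pool.
# The engine uses it to pick better plans but never checks it, so the load
# process has to guarantee that CustomerKey really is unique.
cursor.execute("""
CREATE TABLE dbo.DimCustomer
(
    CustomerKey  INT           PRIMARY KEY NONCLUSTERED NOT ENFORCED,
    CustomerCode NVARCHAR(20)  NOT NULL,
    CustomerName NVARCHAR(100) NOT NULL
)
WITH (DISTRIBUTION = REPLICATE);
""")
conn.commit()
```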

Related

Azure Synapse Analytics - how to relate tables?

I'm new to Azure Synapse Analytics and I'm trying to create internal tables to build my DW.
I have just found out that we cannot use primary keys/foreign keys.
My question is, how do I relate my fact tables to my dimension tables? Is it necessary to create any kind of relationship between them, or should I just create them plainly and use joins whenever needed?
I suggest creating your tables with the primary key identified; it helps the optimiser make better query plan choices.
There is currently no syntax supported for foreign key identification.
As you suggested, just go ahead and write your joins.
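As a sketch of what that looks like in practice, assuming hypothetical FactSales, DimDate and DimCustomer tables: the fact-to-dimension relationship exists only in the join, not as a declared constraint.

```python
import pyodbc

# Hypothetical connection string for the dedicated SQL pool.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<workspace>.sql.azuresynapse.net;DATABASE=<dwh>;"
    "UID=<user>;PWD=<password>"
)
cursor = conn.cursor()

# No foreign keys are declared anywhere; the star schema is simply a set of
# tables joined on their surrogate keys at query time.
cursor.execute("""
SELECT d.CalendarYear,
       c.CustomerName,
       SUM(f.SalesAmount) AS TotalSales
FROM dbo.FactSales   AS f
JOIN dbo.DimDate     AS d ON f.DateKey     = d.DateKey
JOIN dbo.DimCustomer AS c ON f.CustomerKey = c.CustomerKey
GROUP BY d.CalendarYear, c.CustomerName;
""")
for year, customer, total in cursor.fetchall():
    print(year, customer, total)
```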

Foreign key constraints in Azure Synapse

I am currently creating a data warehouse in Azure Synapse; however, Synapse does not allow the creation of foreign keys. This is vital for referential integrity between the fact and dimension tables. Does anyone have any suggestions as to what the alternatives are in Synapse to enforce a PK/FK relationship?
I searched this topic and found that the focus of Synapse is performance, not integrity enforcement. We can create primary keys, structure the star schema with fact and dimension tables, and code the joins between them.
It confused me too until I worked through this tutorial and read it carefully.
Load Contoso retail data to Synapse SQL
In a star schema any referential integrity should be enforced within the ETL tool used to load the data and not in the DB itself.
Some DBs support logical FKs that can help with query execution plans, but they should never be physicalised.
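A minimal sketch of that ETL-side enforcement, assuming the same hypothetical FactSales/DimCustomer tables: after each load, the pipeline counts orphaned keys itself and fails if it finds any, since the pool will not reject them.

```python
import pyodbc

# Hypothetical connection string for the dedicated SQL pool.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<workspace>.sql.azuresynapse.net;DATABASE=<dwh>;"
    "UID=<user>;PWD=<password>"
)
cursor = conn.cursor()

# The pool happily accepts fact rows that reference missing dimension rows,
# so the load job checks for orphans itself after every run.
cursor.execute("""
SELECT COUNT_BIG(*)
FROM dbo.FactSales AS f
LEFT JOIN dbo.DimCustomer AS c ON f.CustomerKey = c.CustomerKey
WHERE c.CustomerKey IS NULL;
""")
orphans = cursor.fetchone()[0]
if orphans:
    raise RuntimeError(
        f"{orphans} fact rows reference a CustomerKey missing from DimCustomer"
    )
```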

Ordering data in Azure Cosmos Table API

Azure Storage Tables have been superseded by the Azure Cosmos Table API at a significantly higher price point, but also with new features like automatic secondary indexing.
One of the pain points of using Azure Storage Tables was that, in order to achieve custom ordering of query results, we had to redundantly store the data with different Partition/Row keys, because the documentation states that
Query results returned by the Table service are sorted in ascending
order based on PartitionKey and then by RowKey.
However, the next paragraph states that
Query results returned by the Azure Table API in Azure DB are not
sorted by partition key or row key. For a detailed list of feature
differences, see differences between Table API in Azure Cosmos DB and
Azure Table storage.
Following the link, I find that
Query results returned by the Table API aren't sorted in partition
key/row key order as they're in Azure Table storage.
So I am a bit confused now about how to achieve ordering when using the Cosmos Table API. Is there no ordering at all? Or can I specify ordering in my queries?
For the Azure Cosmos Table API, this one is correct: "Query results returned by the Azure Table API in Azure DB are not sorted by partition key or row key".
So the returned results are not sorted, as of now.
Somebody has asked about this issue before on GitHub here.
The MS team suggests voting on this user voice item, and they may add this basic sort feature in the future.
Hope it helps.
Additional information on this topic that I found in the GitHub thread:
The latest preview of CosmosDB Tables SDK (0.11.0-preview) has OrderBy support:
https://github.com/MicrosoftDocs/azure-docs/issues/26228#issuecomment-471095278
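Until service-side ordering is available in your SDK, one workaround is to filter server side and sort client side. A minimal sketch using the azure-data-tables Python package against a hypothetical Telemetry table with a hypothetical EventTime property:

```python
from azure.data.tables import TableClient

# Hypothetical connection string and table name for a Cosmos DB Table API account.
client = TableClient.from_connection_string(
    "<cosmos-table-api-connection-string>", table_name="Telemetry"
)

# The service returns entities in no particular order, so keep the result set
# small with a server-side filter and order it client side on any property.
entities = client.query_entities("PartitionKey eq 'device-001'")
ordered = sorted(entities, key=lambda e: e["EventTime"], reverse=True)

for entity in ordered[:10]:
    print(entity["RowKey"], entity["EventTime"])
```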

Provision throughput at database level using Table API in Cosmos DB

I have come across a requirement where I have to choose the API for Cosmos DB.
I have gone through all the APIs: SQL, Graph, Mongo and Table. My current project structure is based on Table storage, where I am storing IoT device data.
In the current structure (Table storage):
I have a separate table for each device, with a payload like below
{
    Timestamp,
    Parameter name,
    value
}
Now, if I plan to use Cosmos DB, I can see that I have to provision RU/throughput against each table, which I think is going to be a big cost. I have not found any way to assign RUs at the database level so that my allocated RUs can be shared across all tables.
Please let me know if there is a way to do this, or is it a limitation I have to accept for Cosmos DB with the Table API?
As far as I can see with the SQL API, considering my use case, I can create a single database and then multiple collections (named after the tables), and then I have the option to provision RUs at the database level as well as at the device level, which gives me more control over cost.
You can set the throughput on the account level.
You can optionally provision throughput at the account level to be shared by all tables in this account, to reduce your bill. These settings can be changed ONLY when you don't have any tables in the account. Note, throughput provisioned at the account level is billed for, whether you have tables created or not. The estimate below is approximate and does not include any discounts you may be entitled to.
Azure Cosmos DB pricing
The throughput configured on the database is shared across all the containers of the database. You can choose to explicitly exclude certain containers from database provisioning and instead provision throughput for those containers at container level.
A Cosmos DB database maps to the following: a database while using SQL or MongoDB APIs, a keyspace while using Cassandra API or a database account while using Gremlin or Table storage APIs.
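If the SQL (Core) API route mentioned in the question is an option, the shared throughput described above can be provisioned from code. A minimal sketch with the azure-cosmos Python package; the account endpoint, key, database name, container name and partition key path are all hypothetical:

```python
from azure.cosmos import CosmosClient, PartitionKey

# Hypothetical account endpoint and key.
client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")

# 400 RU/s provisioned once at the database level, shared by every container
# that does not ask for its own dedicated throughput.
database = client.create_database_if_not_exists(id="iot", offer_throughput=400)

# One container per device, analogous to one table per device. No
# offer_throughput here, so it draws from the shared database pool.
container = database.create_container_if_not_exists(
    id="device-001",
    partition_key=PartitionKey(path="/parameterName"),
)
```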
You can also bring Cerebrata into the picture; the tool lets you assign any throughput value after choosing the throughput type (fixed, auto-scale, or no throughput).
Disclaimer: It’s purely based on my experience

How to assign a partition key while migrating data from SQL to Cosmos DB in the Azure Cosmos DB Data Migration Tool

I have downloaded the Azure Cosmos DB Data Migration Tool from here.
I'm migrating the SQL data to Cosmos DB using the Migration Tool.
Source Information
What should I provide in the Partition Key field?
In Cosmos DB, the partition key is a property that will exist on every single object and is best used to group similar objects together.
According to docs,
Specify the document property you wish to use as the Partition Key (if
Partition Key is left blank, documents are sharded randomly across the
target collections).
For collections over 10 GB, a partition key is required.
To understand more about partition keys, read here.
It depends: are you migrating one table or multiple tables into a single collection?
I recommend using a unique index that exists in all the tables.
Alternatively, you can create a new partition key.
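To illustrate why the choice matters once the data has landed: if, for example, a device identifier were chosen as the partition key, every query pinned to that value stays inside a single logical partition. A minimal sketch with the azure-cosmos Python package; the account, database, container and the deviceId property are all hypothetical:

```python
from azure.cosmos import CosmosClient

# Hypothetical account endpoint, key, database and container.
client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("iot").get_container_client("telemetry")

# With deviceId as the partition key, all documents for one device share a
# logical partition, so a query pinned to that key never fans out.
items = container.query_items(
    query="SELECT * FROM c WHERE c.deviceId = @id",
    parameters=[{"name": "@id", "value": "device-001"}],
    partition_key="device-001",
)
for item in items:
    print(item["Timestamp"], item["value"])
```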
