I would like to understand why admins in workspaces are treated differently than in Databricks SQL. Let me explain.
In Databricks Data Science and Engineering:
As an admin, you can modify any notebook in the workspace.
In Databricks SQL:
As an admin, on the Queries > Admin View panel, if I select a query that its owner didn't share with me, I can't even read it (as you can see below).
I find this behaviour too restrictive for admins (not for normal users). Am I missing something?
I already looked at the warehouse configuration, and nothing there seemed to relate to this subject.
What is the purpose of such a restriction?
I am looking for some input on how to do a GCP-to-Azure cloud data migration.
Scenario:
I have a Snowflake instance configured on GCP (multiple databases holding legacy data) and another Snowflake instance configured on Azure (the DWH was created on this instance).
I want to move/copy the data of all the databases (including all child objects: schemas, tables, views, etc.) sitting on the GCP Snowflake instance to the Snowflake instance configured on Azure.
Can you please guide me on the best solution for such a data migration? Any steps or documentation links would be really helpful.
Many thanks - Minti
Please check the database replication mechanism, which can be used as a migration tool for moving a Snowflake account from one cloud platform to another: https://docs.snowflake.com/en/user-guide/database-replication-intro.html
Not something I've done before, to be honest, but if you didn't want to use external tools, one possible method would be to secure-share your GCP databases with your Azure Snowflake account.
You might then be able to create a new database that is a clone of this share (I'm not sure if this is possible).
Most objects get cloned apart from stages and pipes, but tables, views, etc. should carry over.
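A hedged sketch of that share setup; all account, share, and object names here are made up, and cross-cloud sharing may need additional enablement on your accounts:

-- On the GCP (provider) account: create a share and expose a database through it
CREATE SHARE legacy_share;
GRANT USAGE ON DATABASE legacy_db TO SHARE legacy_share;
GRANT USAGE ON SCHEMA legacy_db.public TO SHARE legacy_share;
GRANT SELECT ON ALL TABLES IN SCHEMA legacy_db.public TO SHARE legacy_share;
ALTER SHARE legacy_share ADD ACCOUNTS = myorg.azure_account;

-- On the Azure (consumer) account: mount the share as a read-only database
-- (whether this mounted database can then be cloned is the open question above)
CREATE DATABASE legacy_db_shared FROM SHARE myorg.gcp_account.legacy_share;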
This is a pretty easy process with a couple of prerequisites.
Make sure you have Organizations enabled on your GCP account.
This feature allows you to self-provision Snowflake accounts on any cloud provider/region. Open a support case to enable it.
Introduction to Organizations
Create a new account on Azure if you haven't already.
Enable Replication on both accounts
This can be done when logged into the account with the ORGADMIN role
Replicate your databases
Note: this gives you a replica of the GCP Snowflake account's databases in your Azure Snowflake account. If you want to permanently migrate your databases, you need to set up Failover/Failback. This is a Business Critical feature, but Snowflake support will enable it for lower editions until you can complete your migration, at which point they will disable it.
Replicating a Database to Another Account
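A minimal sketch of the steps above, assuming hypothetical identifiers (myorg.gcp_account, myorg.azure_account, legacy_db):

-- As ORGADMIN: enable replication for both accounts
SELECT SYSTEM$GLOBAL_ACCOUNT_SET_PARAMETER('myorg.gcp_account', 'ENABLE_ACCOUNT_DATABASE_REPLICATION', 'true');
SELECT SYSTEM$GLOBAL_ACCOUNT_SET_PARAMETER('myorg.azure_account', 'ENABLE_ACCOUNT_DATABASE_REPLICATION', 'true');

-- On the GCP (source) account: allow the database to replicate to the Azure account
ALTER DATABASE legacy_db ENABLE REPLICATION TO ACCOUNTS myorg.azure_account;

-- On the Azure (target) account: create the secondary database and refresh it
CREATE DATABASE legacy_db AS REPLICA OF myorg.gcp_account.legacy_db;
ALTER DATABASE legacy_db REFRESH;

-- For a permanent migration (requires the Failover feature noted above):
-- on the source: ALTER DATABASE legacy_db ENABLE FAILOVER TO ACCOUNTS myorg.azure_account;
-- on the target, after a final refresh: ALTER DATABASE legacy_db PRIMARY;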
There are two options.
Option 1: make use of the replication feature. High-level steps include the following:
a. Create the target account. You can use the Organizations feature available in Snowflake (enabled by Snowflake Support upon request).
b. Account-level objects should be created manually in the target account.
Note: the Failover feature is supported for accounts on Business Critical edition and above. However, for account migration scenarios, this feature will be enabled for a temporary period by Snowflake Support.
c. Replication: the links below can be referenced for a complete understanding of the process.
https://docs.snowflake.com/en/user-guide/database-replication-intro.html#introduction-to-database-replication-across-multiple-accounts
https://docs.snowflake.com/en/user-guide/database-replication-config.html#replicating-a-database-to-another-account
https://docs.snowflake.com/en/user-guide/database-failover-config.html#failing-over-databases-across-multiple-accounts
Please see the link below for an overview of the associated costs:
https://docs.snowflake.com/en/user-guide/database-replication-billing.html#understanding-billing-for-database-replication
Limitations:
https://docs.snowflake.com/en/user-guide/database-replication-intro.html#current-limitations-of-replication
Option 2: create the target account and use the data unloading and loading features:
https://docs.snowflake.com/en/user-guide-data-unload.html
https://docs.snowflake.com/en/user-guide-data-load.html
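A hedged sketch of the unload/load path, assuming an external stage (my_external_stage) reachable from both accounts and hypothetical database/table names; you would first recreate schemas, tables, and views with DDL, then repeat the COPY per table:

-- On the GCP (source) account: unload a table to the external stage
COPY INTO @my_external_stage/legacy/customers/
FROM legacy_db.public.customers
FILE_FORMAT = (TYPE = PARQUET)
HEADER = TRUE;  -- keep column names in the Parquet files

-- On the Azure (target) account: load the files into the recreated table
COPY INTO dwh_db.public.customers
FROM @my_external_stage/legacy/customers/
FILE_FORMAT = (TYPE = PARQUET)
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;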
I'm trying to get data from a database in Databricks that's not the default one. However, I can't seem to find details about how to go about it.
The docs here only mention that it uses the default database in Databricks; however, my data is not in there.
Can anyone point some resources to be able to query a different database in Databricks?
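For reference, here is a minimal sketch of what I imagine it should look like (database and table names are made up):

-- Switch the current database, then query
USE my_other_database;
SELECT * FROM my_table;

-- Or qualify the table with its database name directly
SELECT * FROM my_other_database.my_table;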
Thanks
In Azure SQL I can query which temp tables currently exist by using the query:
select * from tempdb.sys.tables;
However, I am not able to find out who created them. Surely there must be a simple way to find out who created these temp tables! There are links that suggest approaches, but all of them apply to SQL Server, not Azure SQL.
Permissions
Any user can create temporary objects in tempdb. Users can access only their own objects, unless they receive additional permissions. It's possible to revoke the connect permission to tempdb to prevent a user from using tempdb. We don't recommend it because some routine operations require the use of tempdb.
The tempdb system database is a global resource that's available to all users connected to the instance of SQL Server or connected to Azure SQL Database.
By default, the server admin, the database owner, or a user with the required permissions can access the tables in tempdb.
This official article on the tempdb database relates to Azure SQL Database; please go through it for more details and a better understanding.
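As far as I know, tempdb metadata records no creator, so the closest you can get is to compare creation times against active sessions. A rough sketch (there is no guaranteed join between the two result sets):

-- List current temp tables with their creation time (tempdb has no creator column)
SELECT name, create_date
FROM tempdb.sys.tables
ORDER BY create_date DESC;

-- List active user sessions to compare by login time and program
SELECT session_id, login_name, program_name, login_time
FROM sys.dm_exec_sessions
WHERE is_user_process = 1;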
I'm currently working in Azure synapse DWH and I have some theoretical questions:
How can I create relationships between tables (dims and facts), and what implications would creating those relationships have?
I read that to create a primary key I would need to make it nonclustered, but what does that mean?
Azure Synapse Analytics (ASA) has three engines:
serverless SQL pools (formerly SQL on-demand)
dedicated SQL pools (the next step on from Azure SQL Data Warehouse)
Apache Spark pools
None of these currently supports database relationships, as of today. I suspect you mean dedicated SQL pools, and just to confirm, they do not support the FOREIGN KEY syntax. Relationships are more of an OLTP concept and are not common in big data platforms, which ASA is.
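On the nonclustered primary key question: in a dedicated SQL pool, a PRIMARY KEY is accepted only as a NONCLUSTERED, NOT ENFORCED constraint, i.e. metadata the engine records but does not validate. A minimal sketch with made-up table and column names:

-- Dedicated SQL pool: the PK must be NONCLUSTERED and NOT ENFORCED
CREATE TABLE dbo.DimCustomer
(
    CustomerKey  INT NOT NULL,
    CustomerName NVARCHAR(100)
)
WITH (DISTRIBUTION = HASH(CustomerKey), CLUSTERED COLUMNSTORE INDEX);

ALTER TABLE dbo.DimCustomer
ADD CONSTRAINT PK_DimCustomer PRIMARY KEY NONCLUSTERED (CustomerKey) NOT ENFORCED;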
Therefore your options are to enforce these relationships downstream or on import to your warehouse. A common method is to identify unknown values and substitute them with a -1 / Unknown value on import, which ensures there are no NULLs in your key columns; see the sketch below.
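A minimal sketch of that substitution on load, assuming hypothetical staging and dimension tables:

-- Map incoming fact rows to dimension keys; anything unmatched becomes -1 (Unknown)
INSERT INTO dbo.FactSales (CustomerKey, SaleAmount)
SELECT COALESCE(d.CustomerKey, -1) AS CustomerKey,
       s.SaleAmount
FROM staging.Sales s
LEFT JOIN dbo.DimCustomer d
    ON d.CustomerId = s.CustomerId;

The dimension table would carry a matching -1 / 'Unknown' row so downstream joins never drop facts.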
Additionally, enforce your relationships downstream, e.g. in an Azure Analysis Services tabular model or a Power BI model.
If you really need relationships then, depending on your data volumes, you might consider Azure SQL Database, which supports data volumes up to 4 TB alongside columnstore indexes, which give great compression.
Having a similar issue: I cannot find an automated solution thus far.
I'm importing 'entities' from D365 to the data lake, and they do NOT come with the relationships; it will also NOT suggest the "Related Tables".
Option 1: ETL the 'entities' using T-SQL and Spark. This means governance of PySpark, notebooks, schemas, T-SQL linting, orchestration of activities and pipelines, workflows, etc.
OR
For small datasets and projects:
1. Reverse-look up each table needed.
2. In Azure Synapse, create a new DataFlow and download the .PBIX.
3. Do your ETL: create primary fact and dimension tables (by whatever means, such as using a PowerPivot unique/distinct DAX expression on a Customer table).
4. Once complete, if you like, import the newly ETL'd primary tables to the data lake.
5. Repeat step 2.
6. Create the relationships with Power BI (ideally, if the ETL is done correctly, PBI will auto-find the relationships).
7. Re-publish the .PBIX with the relationships as a "DataFlow".
   a. You must create relationships for every Dataflow; Dataflows cannot be combined.
Measures and Dataflows will consume resources and require performance analysis if they grow.
At some point 'Dataverse' may expose D365 data, making this easier.
Depending on your cost/spend, cloning all of D365 still doesn't solve your relationship needs.
Two solutions I'm aware of thus far:
Import the serverless DBOs into Power BI; model and create the dataset there. You can do massive ETL, including foreign-key creation and filtering of NULL values to create primary keys for dimensions, aggregating data and creating fact tables, etc. It's far easier than using the Synapse GUI. The drawbacks are PBI-licensing related.
Create a "Lake Database" (map as you go; great for 5 or fewer entities/tables). The ETL is low-code, but I'm skeptical: after 40 hours of training, I suspect I should have just learned how to script this in a workbook/Spark.
Do BOTH: use Power BI to develop your model and test it, then go back to Synapse and deploy the working model as a pipeline or lake database.
Points of clarity on the top-posted solution:
Do not trust Power BI's auto-relationships; stay away from pre-made REFID relationships in PBI unless you know for sure that is what you want (re step 6 above: if the ETL is correct, it's a 1:M relationship).
Publishing with .PBIX has its limitations with sharing, plus the other issues the OP mentioned. A Lake Database might be the workaround if you have Tableau, Python, or Qlik as your solution.
Dataverse is coming, and PBI analytics as well as predictive analysis with HDInsight will be embedded into D365. You will also be able to create drag-and-drop dashboards. As of 08-05-2022 this is already working in its infancy; even though they want you to go modular, with a hybrid serverless setup you can STILL pull the aggregate measures from D365 into Synapse and reverse-engineer them.
I'm migrating an application to SQL Azure Federation and I'd like to see and edit the tables' content without SQL (it's just for testing).
With a standard SQL database (SQL Server or SQL Azure) I can use one of these:
Management Studio (SSMS) to see and edit data: right-click on a table > Edit Top xx Rows.
Visual Studio: in Server Explorer, I connect to my database, right-click on a table, and click "Show table data".
Of course this doesn't work for SQL Azure Federation.
Do you know of a tool (even a simple one), free if possible, to edit my data in a federation member?
Btw, you can't use Edit Top xxx Rows from SSMS when connected to a SQL Azure database. The option is disabled (it is not even listed in the context menu).
It does work with Visual Studio (Visual Studio 2012 SQL Server Explorer), though, which is interesting.
And it works with Federations too. Your federation members are actually separate databases with strangely generated names. Connect VS SQL Server Explorer to the SQL Azure server; when you list all the databases in the tree, you will see other databases beside your federation root.
Now the only thing left is to work out which system-xxxxx database corresponds to which federation member. You may be able to find this from that article. The following query might be helpful:
-- Route connection to the federation root
USE FEDERATION ROOT WITH RESET
GO
-- View the federation root metadata
SELECT db_name() [db_name]
SELECT * FROM sys.federations
SELECT * FROM sys.federation_distributions
SELECT * FROM sys.federation_member_distributions ORDER BY federation_id, range_low;
GO
Your task is fairly easily achievable when you have just one federation with only one federation member (because you will have only a single system-xxx-xxx-x DB). But as soon as you split, you will want to find out exactly which federation member database you need to talk to.
UPDATE
There is one reliable way to get the exact database name for a particular federation member: connect to the federation member you want to edit data in. For instance, if our federation is named MyFirstFederation and the federation distribution key is named FederationKey, and we want to connect to the federation member that holds the data with a FederationKey value of 10000, we execute:
USE FEDERATION MyFirstFederation(FederationKey = 10000) WITH RESET
GO
In the same context we execute:
SELECT * FROM sys.databases
GO
This will list master and system-xxxx-yyy-zzz, where the latter is the exact database that holds the rows with a FederationKey value of 10000 (and everything else within the range of that particular federation member).
Now we know the exact name of the database to select in Visual Studio 2012 SQL Server Explorer, and we will be able to visually edit its content. It is a bit slow, but it is the GUI tool you are asking for.