We are trying out Unity Catalog in Azure Databricks. We connected a pre-existing workspace to the new metastore.
I created a new catalog. When I run a notebook and try to write to table "myfirstcatalog.bronze.mytable" I get the error
[UC_NOT_ENABLED] Unity Catalog is not enabled on this cluster.
I have run this both on a pre-existing cluster and on a newly created cluster.
I found the problem: I had used the access mode "None", when it needs to be "Single user" or "Shared".
To create a cluster that can access Unity Catalog, the workspace you are creating the cluster in must be attached to a Unity Catalog metastore and must use a Unity-Catalog-capable access mode (shared or single user).
https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/compute
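For anyone hitting the same error while scripting cluster creation, here is a minimal sketch using the Databricks Python SDK; the authentication setup, runtime version, node type, and user name are placeholders, and the exact field and enum names assume the current databricks-sdk package:

```python
# Minimal sketch (assumptions: databricks-sdk is installed and authentication is
# configured, e.g. via DATABRICKS_HOST / DATABRICKS_TOKEN environment variables).
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.compute import DataSecurityMode

w = WorkspaceClient()

# data_security_mode must be SINGLE_USER or USER_ISOLATION (shared) for Unity
# Catalog; NONE is what produces the [UC_NOT_ENABLED] error above.
cluster = w.clusters.create(
    cluster_name="uc-test-cluster",            # placeholder name
    spark_version="13.3.x-scala2.12",          # any UC-capable Databricks Runtime
    node_type_id="Standard_DS3_v2",            # Azure node type, adjust as needed
    num_workers=1,
    autotermination_minutes=30,
    data_security_mode=DataSecurityMode.SINGLE_USER,
    single_user_name="someone@example.com",    # placeholder principal
).result()

print(cluster.cluster_id)
```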
Related
Has anyone come across the below issue during a Unity Catalog table upgrade?
We have configured Unity Catalog in Azure and assigned one workspace.
Added an external location and credentials in Unity Catalog.
Assigned CREATE permission on the external location.
The metastore was created using Azure Data Lake Storage Gen2.
The location is assigned as an abfss path and the connection test looks good.
When we try to upgrade one table, we get the below error:
[UPGRADE_NOT_SUPPORTED.UNSUPPORTED_FILE_SCHEME] Table is not eligible for upgrade from Hive Metastore to Unity Catalog. Reason: Unsupported file system scheme wasbs.
I am not seeing any issue with Unity Catalog itself. Do we have any prerequisites on the source /mnt, since it uses wasbs? Usually that should not matter, as we are upgrading our external table using the external credential that was configured.
Do we need to convert the existing mount to the abfss format before starting the Unity Catalog external table upgrade? I am not seeing a reason for that.
We have tried updating the table properties and tested the prerequisites.
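For context, this is the shape of the upgrade being run (a sketch with hypothetical catalog, schema, and table names, assuming the SQL SYNC command rather than the Data Explorer upgrade wizard); the error above is raised because the source table's location resolves through the mount to a wasbs:// URI:

```python
# Hypothetical names; run from a Databricks notebook on a UC-enabled cluster.
# Inspect the source table first: the Location row shows the scheme the
# upgrade validates (here it resolves to wasbs://, hence the error).
spark.sql("DESCRIBE TABLE EXTENDED hive_metastore.my_schema.my_table").show(truncate=False)

# The upgrade statement itself: SYNC copies the external table's metadata
# into the Unity Catalog target.
spark.sql("SYNC TABLE my_catalog.my_schema.my_table FROM hive_metastore.my_schema.my_table")
```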
I am trying to access the internal Hive metastore tables like HIVE.SDS, HIVE.TBLS etc.
I have an HDInsight Hadoop cluster running with the default internal metastore. From the Ambari screen, I got the advanced setting details required for connections, such as
javax.jdo.option.ConnectionDriverName, javax.jdo.option.ConnectionURL, javax.jdo.option.ConnectionUserName, as well as the password.
When I try connecting to the SQL Server instance (the internal Hive metastore) from a local machine, I get a message to add my IP address to the allowed list. However, since this Azure SQL server is not visible in the list of Azure SQL databases in the portal, it is not possible for me to whitelist my IP.
So I logged into the cluster via the secure shell user (SSHUSER) and tried accessing the HIVE database from within the cluster using the metastore credentials provided in Ambari. I am still not able to access it. I am using sqlcmd to connect to SQL Server.
Does HDInsight prevent direct access to internal metastores? Is an external metastore the only way to move ahead? Any leads would be helpful.
Update: I created an external SQL Server instance, used it as an external metastore, and was able to access it programmatically.
No luck with the Internal one yet.
There is no way to access the internal metastores of an HDInsight cluster. The internal metastores live in an internal subscription that only the product groups (PGs) are able to access.
If you want more control over your metastores, it is recommended to bring your own "external" metastore.
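For completeness, once you bring your own external metastore database, queries against the HIVE.TBLS / HIVE.SDS style tables work directly. A minimal sketch, assuming pyodbc and the Microsoft ODBC Driver for SQL Server are installed, with placeholder server, database, and credentials:

```python
import pyodbc

# Placeholder connection details for the *external* (bring-your-own) metastore database.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=my-metastore-server.database.windows.net;"   # placeholder
    "DATABASE=hivemetastore;"                            # placeholder
    "UID=metastore_user;PWD=<password>"                  # placeholder credentials
)

# TBLS and SDS are standard Hive metastore tables; join them to list table
# names, types, and storage locations.
cursor = conn.cursor()
cursor.execute("""
    SELECT t.TBL_NAME, t.TBL_TYPE, s.LOCATION
    FROM TBLS t
    JOIN SDS s ON t.SD_ID = s.SD_ID
""")
for row in cursor.fetchall():
    print(row.TBL_NAME, row.TBL_TYPE, row.LOCATION)
```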
I am new to azure and am trying to understand the below things. It would be helpful if anyone can share their knowledge on this.
Can a table created in Cluster A be accessed in Cluster B if Cluster A is down?
What is the connection between the cluster and the data in the tables?
You need a running process (cluster) to be able to access the metastore and read the data, because the data is stored in the customer's location and is not directly accessible from the control plane that runs the UI.
When you write data into a table, that data will be available in another cluster under the following conditions:
both clusters are using the same metastore
the user has the correct permissions (which can be enforced via Table ACLs; see the sketch below)
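As an illustration of the second condition, here is a minimal sketch of granting access via the (legacy, pre-Unity-Catalog) table ACLs from a notebook; the table and principal names are hypothetical, and the cluster must have table access control enabled:

```python
# Hypothetical table and principal; requires table access control to be
# enabled on the workspace/cluster.
spark.sql("GRANT SELECT ON TABLE default.my_table TO `user@example.com`")

# On any other cluster attached to the same metastore, the grantee can then read it:
df = spark.table("default.my_table")
```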
I have 3-4 clusters in my Databricks instance on the Azure cloud platform. I want to maintain a common metastore for all the clusters. Let me know if anyone has implemented this.
I recommend configuring an external Hive metastore. By default, Databricks spins up its own metastore behind the scenes. But you can create your own database (Azure SQL works, as do MySQL and Postgres) and specify it during cluster startup.
Here are detailed steps:
https://learn.microsoft.com/en-us/azure/databricks/data/metastores/external-hive-metastore
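To make those steps concrete, here is a minimal sketch of the Spark configuration for an Azure SQL-backed external Hive metastore, shown as a Python dict whose entries you would put in the cluster's Spark config (or push from an init script). The JDBC URL, credentials, secret scope, and metastore version are placeholders to adjust to your own database:

```python
# Sketch of cluster Spark config for an external Hive metastore on Azure SQL.
# All values are placeholders; in practice the password should come from a
# secret scope (referenced with the {{secrets/...}} syntax) rather than plain text.
external_metastore_conf = {
    "spark.hadoop.javax.jdo.option.ConnectionURL":
        "jdbc:sqlserver://my-server.database.windows.net:1433;database=hivemetastore",
    "spark.hadoop.javax.jdo.option.ConnectionDriverName":
        "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    "spark.hadoop.javax.jdo.option.ConnectionUserName": "metastore_user",
    "spark.hadoop.javax.jdo.option.ConnectionPassword": "{{secrets/my-scope/metastore-password}}",
    # Hive metastore client version that matches your metastore schema:
    "spark.sql.hive.metastore.version": "3.1.0",
    # "builtin" only matches the version bundled with the runtime; for other
    # versions use "maven" or a path to downloaded jars:
    "spark.sql.hive.metastore.jars": "maven",
}
```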
Things to be aware of:
In the Data tab in Databricks, you can choose the cluster and see the different metastores.
To avoid using a SQL username and password, look at Managed Identities: https://learn.microsoft.com/en-us/azure/stream-analytics/sql-database-output-managed-identity
Automate external Hive metastore connections by using initialization scripts for your cluster.
Permission management on your sources. In the case of ADLS Gen2, consider using credential passthrough.
I'm quite new to Azure. I'm trying to create an HDInsight Spark 4.0 cluster using PowerShell. Instead of using the Azure Resource Manager module, I'm trying to do it with the "az" command. Is there a way I can provide the external Hive metastore details while creating the cluster using az? When I try from the Azure portal, I see that it asks for Hive/Oozie metastore details, but I cannot find an option to provide the same while creating the cluster from the command line (az).
There is already an Azure SQL Server and database created, and I'm able to create the HDInsight cluster by providing the server details. But where do I provide these details on the command line?
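Not an authoritative answer, but one route that appears to match what the portal does is passing the metastore settings as custom Ambari configurations. The sketch below builds the hive-site JSON and assumes az hdinsight create accepts it through a --cluster-configurations parameter; treat the flag name and keys as assumptions to verify against the current az documentation. All server, database, and credential values are placeholders:

```python
import json

# Hypothetical metastore details; all values are placeholders.
cluster_configurations = {
    "hive-site": {
        "javax.jdo.option.ConnectionDriverName": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
        "javax.jdo.option.ConnectionURL": (
            "jdbc:sqlserver://my-sql-server.database.windows.net;"
            "database=hivemetastore;encrypt=true;trustServerCertificate=true;"
            "create=false;loginTimeout=300"
        ),
        "javax.jdo.option.ConnectionUserName": "metastore_user",
        "javax.jdo.option.ConnectionPassword": "<password>",
    },
    # ARM templates for custom metastores also set a "hive-env" block
    # (hive_database_type, hive_existing_mssql_server_host, ...); check the
    # quickstart template for your cluster version before relying on this sketch.
}

with open("clusterConfig.json", "w") as f:
    json.dump(cluster_configurations, f, indent=2)

# Then (assumption to verify): pass the file to the CLI, e.g.
#   az hdinsight create ... --cluster-configurations @clusterConfig.json
```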