How to get Databricks Instance ID programmatically? - databricks

Is there any way to retrieve the Databricks instance ID (the blurred piece of the URL) by executing some code in a notebook cell?

For notebooks, you can get the notebook context via dbutils with code like this (in Python, for example):
dbutils.notebook.entry_point.getDbutils().notebook() \
.getContext().tags().get("browserHostName").get()
Then you can parse the URL using a regex.
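For example, a minimal sketch of that parsing step (the hostname pattern is an assumption for an Azure workspace URL of the form adb-<workspace-id>.<n>.azuredatabricks.net; adjust the regex to whatever piece of the host you need):
import re

# full hostname of the workspace, e.g. "adb-1234567890123456.7.azuredatabricks.net"
host = dbutils.notebook.entry_point.getDbutils().notebook() \
    .getContext().tags().get("browserHostName").get()

# pull out the numeric part after "adb-" (assumed format)
match = re.match(r"adb-(\d+)\.\d+\.azuredatabricks\.net", host)
instance_id = match.group(1) if match else None
print(instance_id)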

Related

Object embedded in Databricks SQL command

I came across the following SQL command in a Databricks notebook and I am confused about what the ${da.paths.working_dir} object in it is. Is it a Python object or something else?
SELECT * FROM parquet.${da.paths.working_dir}/weather
I know it contains the path of a working directory, but how can I access/print it?
I tried to demystify it but failed, as illustrated in the following figure.
NOTE: My notebook is SQL notebook
Finally, I figured it out. This is a high-level variable in Databricks SQL and we can access it using the SELECT keyword in Databricks SQL as shown below:
SELECT '${da.paths.working_dir}';
EDIT: This variable is actually a Spark configuration property, which can be set as follows:
# spark.conf.set(key, value)
spark.conf.set("da.paths.working_dir", "/path/to/files")
To access this property in Python:
spark.conf.get("da.paths.working_dir")
To access this property in Databricks SQL:
SELECT '${da.paths.working_dir}';
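A minimal end-to-end sketch tying it together (the path is illustrative):
# Python cell: set the property on the Spark conf
spark.conf.set("da.paths.working_dir", "/mnt/training/weather")
-- SQL cell: the ${...} placeholder is substituted with the conf value
SELECT '${da.paths.working_dir}';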

How to get workspace name inside a python notebook in databricks

I am trying to get the workspace name inside a Python notebook. Is there any way we can do this?
Ex:
My workspace name is databricks-test.
I want to capture this in a variable in a Python notebook.
To get the workspace name (not the Org ID, which the other answer gives you), you can do it in one of two main ways:
spark.conf.get("spark.databricks.workspaceUrl")
which will give you the full workspace URL; you can then split on the first '.',
i.e.
spark.conf.get("spark.databricks.workspaceUrl").split('.')[0]
You could also get it in these two ways:
dbutils.notebook.entry_point.getDbutils().notebook().getContext() \
.browserHostName().toString()
or
import json
json.loads(dbutils.notebook.entry_point.getDbutils().notebook() \
.getContext().toJson())['tags']['browserHostName']
Top tip: if you're ever wondering what Spark confs exist, you can get most of them in a list like this:
sc.getConf().getAll()
By using the command below, we can get the workspace (org) ID, but I think the workspace name is difficult to find.
spark.conf.get("spark.databricks.clusterUsageTags.clusterOwnerOrgId")
spark.conf.get("spark.databricks.clusterUsageTags.clusterName")
This command will return the cluster name :)
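Putting the answers above together, a small helper sketch (it assumes spark.databricks.workspaceUrl is populated on your cluster, which may not hold in every environment):
def workspace_name():
    # e.g. "databricks-test.cloud.databricks.com" -> "databricks-test"
    url = spark.conf.get("spark.databricks.workspaceUrl", None)
    return url.split('.')[0] if url else None

print(workspace_name())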

double percent spark sql in jupyter notebook

I'm using a Jupyter notebook on a Spark EMR cluster and want to learn more about a certain command, but I don't know the right technology stack to search. Is that Spark? Python? Jupyter special syntax? PySpark?
When I try to google it, I get only a couple results and none of them actually include the content I quoted. It's like it ignores the %%.
What does "%%spark_sql" do, what does it originate from, and what are arguments you can pass to it like -s and -n?
An example might look like
%%spark_sql -s true
select
*
from df
These are called magic commands/functions. Try running %pinfo %%spark_sql or %pinfo2 %%spark_sql in a Jupyter cell and see if it gives you detailed information about %%spark_sql.
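If %pinfo doesn't turn anything up, here is a small sketch for inspecting the registered magics directly (it assumes %%spark_sql is registered in the running kernel, e.g. by sparkmagic or an EMR-specific extension):
ip = get_ipython()

# list all registered cell magics to confirm the exact name
print(sorted(ip.magics_manager.magics['cell']))

# fetch the cell magic and print its docstring, which usually documents flags like -s and -n
cm = ip.find_cell_magic('spark_sql')
print(cm.__doc__ if cm else 'spark_sql is not registered in this kernel')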

%run magic using get_ipython().run_line_magic() in Databricks

I am trying to import other modules inside an Azure Databricks notebook. For instance, I want to import the module called 'mynbk.py' that sits at the same level as my current Databricks notebook called 'myfile'.
To do so, inside 'myfile', in a cell, I use the magic command:
%run ./mynbk
And that works fine.
Now, I would like to achieve the same result, but using get_ipython().run_line_magic().
I thought, this is what I needed to type:
get_ipython().run_line_magic('run', './mynbk')
Unfortunately, that does not work. The error I get is:
Exception: File `'./mynbk.py'` not found.
Any help is appreciated.
It won't work on Databricks because IPython's commands don't know about the Databricks-specific implementation. IPython's %run expects a file to execute, but Databricks notebooks aren't files on disk; they are data stored in a database, so %run from IPython can't find the notebook and you get that error.
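If the goal is simply to run the other notebook from code rather than via the %run magic, one option is dbutils.notebook.run, with the caveat that it executes the notebook as a separate run and, unlike %run, does not pull its variables or functions into the current session (the timeout below is illustrative):
# runs ./mynbk as a separate notebook run; returns whatever it passes to dbutils.notebook.exit(...)
result = dbutils.notebook.run("./mynbk", 60)
print(result)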

Databricks - Creating permanent User Defined Functions (UDFs)

I am able to create a UDF and register it with Spark using the spark.udf method. However, this is per session only.
How can I register Python UDFs automatically when the cluster starts? These functions should be available to all users. An example use case is converting time from UTC to the local time zone.
This is not possible; this is not like UDFs in Hive.
Code the UDF as part of the package / program you submit or in the jar included in the Spark App, if using spark-submit.
However,
spark.udf.register("...
is still required as well. This applies to Databricks notebooks, etc. The UDFs need to be re-registered per Spark context/session.
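As a minimal per-session sketch of that registration (the function name and logic are illustrative, chosen to echo the UTC-to-local use case in the question):
from pyspark.sql.types import StringType

def to_local_label(utc_str):
    # placeholder logic; real code would parse the timestamp and shift the time zone
    return None if utc_str is None else utc_str + " (local)"

# makes the function callable from SQL for this session only
spark.udf.register("to_local_label", to_local_label, StringType())
# spark.sql("SELECT to_local_label('2024-01-01T00:00:00Z')").show()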
Actually, you can create a permanent function, but not from a notebook; you need to create it from a JAR file:
https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-function.html
CREATE [TEMPORARY] FUNCTION [db_name.]function_name AS class_name
[USING resource, ...]
resource:
: (JAR|FILE|ARCHIVE) file_uri
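For example, a hypothetical permanent function backed by a Hive UDF class shipped in a JAR (the function name, class name, and JAR path are all illustrative):
-- registers a permanent SQL function in the metastore
CREATE FUNCTION default.utc_to_local
  AS 'com.example.udfs.UtcToLocal'
  USING JAR 'dbfs:/FileStore/jars/my-udfs.jar';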
