How to get workspace name inside a Python notebook in Databricks

I am trying to get the workspace name inside a Python notebook. Is there any way we can do this?
Ex:
My workspace name is databricks-test.
I want to capture it in a variable in the Python notebook.

To get the workspace name (not the Org ID, which the other answer gives you), you can do it one of two main ways:
spark.conf.get("spark.databricks.workspaceUrl")
which will give you the full workspace URL; you can then split on the first '.'
i.e.
spark.conf.get("spark.databricks.workspaceUrl").split('.')[0]
You could also get it these two ways:
dbutils.notebook.entry_point.getDbutils().notebook().getContext() \
.browserHostName().toString()
or
import json
json.loads(dbutils.notebook.entry_point.getDbutils().notebook() \
.getContext().toJson())['tags']['browserHostName']
Top tip: if you're ever wondering what Spark confs exist, you can get most of them in a list like this:
sc.getConf().getAll()
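The split logic above can be illustrated in plain Python on a sample hostname (the hostname below is made up; in a real notebook the value would come from spark.conf.get("spark.databricks.workspaceUrl")):

```python
# Sample value standing in for spark.conf.get("spark.databricks.workspaceUrl")
workspace_url = "databricks-test.cloud.databricks.com"

# Everything before the first '.' is the workspace name
workspace_name = workspace_url.split('.')[0]
print(workspace_name)  # databricks-test
```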

Using the command below, we can get the current workspace's org ID. But I think the workspace name itself is difficult to find.
spark.conf.get("spark.databricks.clusterUsageTags.clusterOwnerOrgId")

spark.conf.get("spark.databricks.clusterUsageTags.clusterName")
This command will return the cluster name :)

Related

Is it possible to install a Databricks notebook into a cluster similarly to a library?

I want to be able to have the outputs/functions/definitions of a notebook available to other notebooks in the same cluster without always having to run the original one over and over again.
For instance, I want to avoid:
definitions_file: has multiple commands, functions etc...
notebook_1
#invoking definitions file
%run ../../0_utilities/definitions_file
notebook_2
#invoking definitions file
%run ../../0_utilities/definitions_file
.....
Therefore I want definitions_file to be available to all other notebooks running on the same cluster.
I am using azure databricks.
Thank you!
No, there is no such thing as a "shared notebook" that is implicitly imported. The closest thing you can do is to package your code as a Python library or put it into a Python file inside Repos, but you will still need to write from my_cool_package import * in all notebooks.
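The "shared Python file" pattern the answer describes might look like the sketch below. To keep the example self-contained, the module is written to a temporary directory here; in Databricks you would instead create a file such as my_definitions.py (a hypothetical name) inside a Repo, whose folder is added to sys.path for you:

```python
import os
import sys
import tempfile

# Create the shared module on the fly so this sketch runs anywhere.
# In Databricks Repos you would simply create my_definitions.py in the repo.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "my_definitions.py"), "w") as f:
    f.write("def greet(name):\n    return f'hello {name}'\n")

sys.path.insert(0, tmpdir)  # Repos does this step for you automatically

# Every notebook still needs an explicit import line like this:
from my_definitions import greet
print(greet("databricks"))  # hello databricks
```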

How to get Databricks Instance ID programmatically?

Is there any possibility to retrieve Databricks Instance Id (the blurred piece of URL) by executing some code in the notebook cell?
for notebooks, you can get context of the notebooks via dbutils with code like this (in Python, for example):
dbutils.notebook.entry_point.getDbutils().notebook() \
.getContext().tags().get("browserHostName").get()
Then you can parse the URL using a regex.
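A sketch of that regex step, assuming an Azure Databricks host of the form adb-&lt;id&gt;.&lt;n&gt;.azuredatabricks.net (the hostname below is made up; in a notebook it would come from the dbutils context call above):

```python
import re

# Stand-in for the value returned by the dbutils context call
host = "adb-1234567890123456.7.azuredatabricks.net"

# For hosts shaped like adb-<id>.<n>.azuredatabricks.net,
# the digits after "adb-" are the instance/workspace ID.
m = re.match(r"adb-(\d+)\.\d+\.azuredatabricks\.net", host)
instance_id = m.group(1) if m else None
print(instance_id)  # 1234567890123456
```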

double percent spark sql in jupyter notebook

I'm using a Jupyter Notebook on a Spark EMR cluster, want to learn more about a certain command but I don't know what the right technology stack is to search. Is that Spark? Python? Jupyter special syntax? Pyspark?
When I try to google it, I get only a couple results and none of them actually include the content I quoted. It's like it ignores the %%.
What does "%%spark_sql" do, what does it originate from, and what are arguments you can pass to it like -s and -n?
An example might look like
%%spark_sql -s true
select
*
from df
These are called magic commands/functions. Try running %pinfo %%spark_sql or %pinfo2 %%spark_sql in a Jupyter cell and see if it gives you detailed information about %%spark_sql.

%run magic using get_ipython().run_line_magic() in Databricks

I am trying to import other modules inside an Azure Databricks notebook. For instance, I want to import the module called 'mynbk.py' that is at the same level as my current Databricks notebook called 'myfile'.
To do so, inside 'myfile', in a cell, I use the magic command:
%run ./mynbk
And that works fine.
Now, I would like to achieve the same result, but with using get_ipython().run_line_magic()
I thought, this is what I needed to type:
get_ipython().run_line_magic('run', './mynbk')
Unfortunately, that does not work. The error I get is:
Exception: File `'./mynbk.py'` not found.
Any help is appreciated.
It won't work on Databricks because IPython commands don't know about Databricks-specific implementation details. IPython's %run expects a file on disk to execute, but Databricks notebooks aren't files on disk; they are data stored in a database. So %run from IPython can't find the notebook, and you get the error.

How to export data from a dataframe to a file databricks

I'm doing right now Introduction to Spark course at EdX.
Is there a possibility to save dataframes from Databricks on my computer?
I'm asking this question, because this course provides Databricks notebooks which probably won't work after the course.
In the notebook data is imported using command:
log_file_path = 'dbfs:/' + os.path.join('databricks-datasets',
'cs100', 'lab2', 'data-001', 'apache.access.log.PROJECT')
I found this solution but it doesn't work:
df.select('year','model').write.format('com.databricks.spark.csv').save('newcars.csv')
Databricks runs a cloud VM and does not have any idea where your local machine is located. If you want to save the CSV results of a DataFrame, you can run display(df) and there's an option to download the results.
You can also save it to the FileStore and download it via its handle, e.g.
df.coalesce(1).write.format("com.databricks.spark.csv").option("header", "true").save("dbfs:/FileStore/df/df.csv")
You can find the handle in the Databricks GUI by going to Data > Add Data > DBFS > FileStore > your_subdirectory > part-00000-...
Download in this case (for Databricks west europe instance)
https://westeurope.azuredatabricks.net/files/df/df.csv/part-00000-tid-437462250085757671-965891ca-ac1f-4789-85b0-akq7bc6a8780-3597-1-c000.csv
I haven't tested it, but I would assume the 1-million-row limit that applies when downloading via the answer from @MrChristine does not apply here.
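The path-to-URL mapping used above can be sketched as a small helper: a file saved under dbfs:/FileStore/... becomes downloadable at https://&lt;instance&gt;/files/... (the helper name and host below are just illustrative; the actual downloadable file inside that directory is the part-00000-... file):

```python
def filestore_download_url(host, dbfs_path):
    """Map a dbfs:/FileStore/ path to its browser download URL."""
    prefix = "dbfs:/FileStore/"
    if not dbfs_path.startswith(prefix):
        raise ValueError("only FileStore paths are downloadable this way")
    return f"https://{host}/files/{dbfs_path[len(prefix):]}"

url = filestore_download_url("westeurope.azuredatabricks.net",
                             "dbfs:/FileStore/df/df.csv")
print(url)  # https://westeurope.azuredatabricks.net/files/df/df.csv
```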
Try this.
df.write.format("com.databricks.spark.csv").save("file:///home/yphani/datacsv")
This will save the file to the local (Unix) filesystem of the driver node.
If you give only /home/yphani/datacsv, it looks for the path on HDFS/DBFS instead.
