How to display markdown output in databricks notebook from a python cell - databricks

With IPython/Jupyter it's possible to output markdown using the IPython display module and its Markdown class.
Question
How can I accomplish this with Azure Databricks?
What I tried
Databricks display
Tried using Databricks' display with the IPython Markdown class:
from IPython.display import Markdown
display(Markdown('*some markdown* test'))
but this results in the following error:
Exception: Cannot call display(<class 'IPython.core.display.Markdown'>)
IPython display
I then tried to use IPython's display:
from IPython.display import display, Markdown
display(Markdown('*some markdown* test'))
but this just displays the text:
<IPython.core.display.Markdown object>
IPython display_markdown
Tried using IPython's display_markdown:
from IPython.display import display_markdown
display_markdown('# Markdown is here!\n*some markdown*\n- and\n- some\n- more')
but this results in nothing showing up:
Looking up documentation
Also tried checking Azure Databricks documentation. At first I visited https://www.databricks.com/databricks-documentation which leads me to https://learn.microsoft.com/en-ca/azure/databricks/ but I wasn't able to find anything via searching or clicking the links and I usually find Microsoft documentation quite good.
Checking Databricks' display source
As Saideep Arikontham mentioned in the comments, Databricks runtime version 11 and above uses the IPython kernel, so I dug a bit deeper.
According to Databricks' source for the display function, it will readily render any object that implements _repr_html_().
However, I'm having a hard time getting the raw HTML output that I assume IPython.display.Markdown should be able to produce. I can only find _repr_markdown_() and _data_and_metadata(), where the former just calls the latter, and the output, at least in Databricks, is just the original raw markdown string.
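The _repr_html_() hook described above can be tried in isolation. The following is a minimal sketch, not Databricks' own code: HtmlSnippet is a hypothetical class name, and the assumption (taken from the observation above) is that a rich front end renders whatever HTML string this method returns.

```python
import html


class HtmlSnippet:
    """Hypothetical wrapper: holds plain text and exposes it as HTML."""

    def __init__(self, text):
        self.text = text

    def _repr_html_(self):
        # Front ends that look for _repr_html_() call it and render the
        # returned string as HTML; escape the text so it stays literal.
        return "<em>{}</em>".format(html.escape(self.text))


snippet = HtmlSnippet("some markdown <test>")
rendered = snippet._repr_html_()
print(rendered)
```

In a notebook, display(snippet) would be the call to try; whether it renders as HTML depends on the runtime.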

Markdown and display_markdown do not give the desired output when used in Azure Databricks. I did the following on the Databricks 11.1 runtime.
From the details in the question, I understood that when a class has _repr_html_(), display is able to render the desired result. When this method is absent from the class, only the object's text representation is shown.
So, to make Markdown work, I wrote my own Markdown class that uses Python's markdown library.
import markdown as md
from IPython.display import TextDisplayObject
class Markdown(TextDisplayObject):
    def __init__(self, text):
        # Convert the markdown source to HTML once, up front
        self.html = md.markdown(text)
    def _repr_html_(self):
        # display() picks up this method and renders the returned HTML
        return self.html
Note that this class is not fully equivalent to IPython.display.Markdown. I reformatted your sample markdown
'# Markdown is here!\n*some markdown*\n- and\n- some\n- more' as follows to get the desired result:
Markdown('''# Markdown is here!\n
*some markdown*\n
- and\n
- some\n
- more''')
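As a side note on what the reformatting changes: inside a triple-quoted string, an explicit \n followed by a real line break produces two newline characters, that is, a blank line between every element. Python's markdown library appears to need that blank line before the list to recognize it as one, though that part is an assumption based on the behavior described here. A minimal sketch (the variable names are only illustrative) that makes the difference between the two strings visible:

```python
# The answer's literal: each line ends with an escaped \n AND a real newline.
spaced = '''# Markdown is here!\n
*some markdown*\n
- and\n
- some\n
- more'''

# The question's original literal uses single newlines only.
compact = '# Markdown is here!\n*some markdown*\n- and\n- some\n- more'

# Doubling every newline in the compact form reproduces the spaced form.
same_after_doubling = spaced == compact.replace('\n', '\n\n')
print(same_after_doubling)
print(spaced.count('\n'), compact.count('\n'))
```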
NOTE:
For display_markdown() to treat its argument as a raw markdown string, you must pass raw=True (display_markdown(<markdown_str>, raw=True)). However, in Databricks this still shows nothing; the call simply returns None.
Install the markdown library first by running %pip install markdown in a Databricks cell.

Related

convert html to png using gmplot in Python

I am using gmplot in Python 3.8 to generate a layer of polygons on top of a satellite google maps layer. The map is saved to .html format but I would like to be able to convert the .html file to .png format to embed it in a pdf created in Python at a later stage (that will contain other elements, such as text and other images).
I generate the map using standard code as described in the gmplot tutorial:
import gmplot
latitude_list = [ 17.4567417, 17.5587901, 17.6245545]
longitude_list = [ 78.2913637, 78.007699, 77.9266135 ]
gmap = gmplot.GoogleMapPlotter(17.438139, 78.3936413, 11)
gmap.polygon(latitude_list, longitude_list, color = 'cornflowerblue')
gmap.draw("path_to_html")
I have checked different posts to get a solution, including this one and this one. From one of these posts, I have managed to get the following snippet of code:
import time
from selenium import webdriver
import chromedriver_binary # adds chromedriver binary to path
driver = webdriver.Chrome()
driver.get("local_url_of_html_file")
time.sleep(3)
driver.save_screenshot('map.png')
driver.quit()
It appears this code takes a screenshot of the html but I was wondering if there is any in-built function in gmplot to do this in a more straightforward way or other packages like bokeh.
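One detail in the Selenium snippet above is the local_url_of_html_file placeholder: a browser is normally given a file:// URL rather than a bare path. A small sketch using only the standard library, assuming the map was saved as map.html in the working directory (a hypothetical file name):

```python
from pathlib import Path

# Hypothetical output file written earlier by gmap.draw(...)
html_path = Path.cwd() / "map.html"

# as_uri() needs an absolute path and turns it into a file:// URL
local_url = html_path.as_uri()
print(local_url)
```

The resulting string can then be passed to driver.get(local_url).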

Getting rid of print "<IPython.core.display.Markdown object>" when using `display`

I'm trying to create nice slides using jupyter notebook and RISE. One of my objectives is to display a pandas-dataframe in a Markdown cell in order to have some styling flexibility.
I am using the following code to display my dataframe in a Markdown cell:
{{Markdown(display(df_x))}}
After running this line, I get the following result:
image of dataframe displayed
I would like to get rid of the text printed below my dataframe (<IPython.core.display.Markdown object>).
I still haven't found a way to achieve this. Could someone give me a hand?
This is the library I'm working with:
from IPython.display import display
I'm not familiar with the Markdown class, so I'm not sure why you need it, but the text printed in the output cell comes from the fact that Markdown(...) returns an object. Since you're not assigning it to any variable, the notebook's default behavior is to run something like repr(your_object), which returns <IPython.core.display.Markdown object>.
So the easiest workaround would be to just assign it to some variable like this:
dummy_var = Markdown(display(df_x))
# or better yet:
_ = Markdown(display(df_x))

Writing into a Jupyter Notebook from Python

Is it possible for a Python script to write into an IPython notebook?
with open("my_notebook.ipynb", "w") as jup:
    jup.write("print(\"Hello there!\")")
If there's some package for doing so, can I also control the way cells are split in the notebook?
I'm designing a software tool (that carries out some optimization) to prepare an IPython notebook that can be run on some server performing scientific computations.
I understand that a related solution is to output to a Python script and load it within an IPython notebook using %load my_python_script.py. However, that requires the user to type something, which I would ideally like to avoid.
Look at the nbformat repo on Github. The reference implementation is shown there.
From their docs
Jupyter (né IPython) notebook files are simple JSON documents, containing text, source code, rich media output, and metadata. Each segment of the document is stored in a cell.
It also sounds like you want to create the notebook programmatically, so you should use the NotebookNode object.
For the code, something like the following should get you what you need. Use new_code_cell if you have code cells rather than plain text cells. Text cells should use the existing markdown formatting.
import nbformat
# Start from an empty v4 notebook
notebook = nbformat.v4.new_notebook()
text = """Hello There """
# Each entry in 'cells' becomes one cell in the notebook
notebook['cells'] = [nbformat.v4.new_markdown_cell(text)]
nbformat.write(notebook, 'filename.ipynb')
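Because notebook files are JSON documents, the way cells are split can also be illustrated without any package. The following is a rough sketch using only the standard library, assuming the version 4.4 notebook layout; make_cell is a hypothetical helper, and the field names reflect my understanding of the format rather than a validated schema. For real use, nbformat is the safer choice because it can validate the result.

```python
import json
import os
import tempfile


def make_cell(cell_type, source):
    """Hypothetical helper: build one notebook cell as a plain dict."""
    cell = {"cell_type": cell_type, "metadata": {}, "source": source}
    if cell_type == "code":
        # Code cells additionally carry an execution counter and outputs
        cell["execution_count"] = None
        cell["outputs"] = []
    return cell


# Each list entry becomes its own cell, so the list controls the splitting
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        make_cell("markdown", "# Optimization run"),
        make_cell("code", 'print("Hello there!")'),
        make_cell("code", "result = 1 + 1\nresult"),
    ],
}

# Write into a fresh temporary directory rather than the working directory
out_path = os.path.join(tempfile.mkdtemp(), "my_notebook.ipynb")
with open(out_path, "w", encoding="utf-8") as fh:
    json.dump(notebook, fh, indent=1)

# Read the file back to confirm the cell layout survived the round trip
with open(out_path, encoding="utf-8") as fh:
    loaded = json.load(fh)
print([cell["cell_type"] for cell in loaded["cells"]])
```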

Importing scripts into a notebook in IBM WATSON STUDIO

I am doing PCA on CIFAR-10 images in the free version of IBM Watson Studio, so I uploaded the Python file for downloading CIFAR-10 to the studio (screenshot not shown).
But when I try to import cache, an error is shown (screenshot not shown).
After spending some time on Google I found a solution, but I can't understand it:
https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/add-script-to-notebook.html
The solution is as follows:
Click the Add Data icon (Shows the Add Data icon), and then browse the script file or drag it into your notebook sidebar.
Click in an empty code cell in your notebook and then click the Insert to code link below the file. Take the returned string, and write to a file in the file system that comes with the runtime session.
To import the classes to access the methods in a script in your notebook, use the following command:
For Python:
from <python file name> import <class name>
I can't understand this part:
"and write to a file in the file system that comes with the runtime session."
Where can I find the file system that comes with the runtime session? Where is it located?
Can anyone please help me with details on where to find it?
You get the import error because the script that you are trying to import is not available in your Python runtime's local filesystem. The files (cache.py, cifar10.py, etc.) that you uploaded are stored in the object storage bucket associated with the Watson Studio project. To use those files you need to make them available to the Python runtime, for example by downloading the script to the runtime's local filesystem.
UPDATE: In the meantime, an option has been added to directly insert StreamingBody objects. This also includes all the required credentials. If you use the insert StreamingBody object option, you can skip to the part of this answer about writing it to a file in the local runtime filesystem.
Alternatively, you can use the code snippet below to read the script into a StreamingBody object:
import types
import pandas as pd
from botocore.client import Config
import ibm_boto3
def __iter__(self): return 0
os_client = ibm_boto3.client(service_name='s3',
    ibm_api_key_id='<IBM_API_KEY_ID>',
    ibm_auth_endpoint="<IBM_AUTH_ENDPOINT>",
    config=Config(signature_version='oauth'),
    endpoint_url='<ENDPOINT>')
# Your data file was loaded into a botocore.response.StreamingBody object.
# Please read the documentation of ibm_boto3 and pandas to learn more about the possibilities to load the data.
# ibm_boto3 documentation: https://ibm.github.io/ibm-cos-sdk-python/
# pandas documentation: http://pandas.pydata.org/
streaming_body_1 = os_client.get_object(Bucket='<BUCKET>', Key='cifar.py')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(streaming_body_1, "__iter__"): streaming_body_1.__iter__ = types.MethodType( __iter__, streaming_body_1 )
And then write it to a file in the local runtime filesystem.
with open('cifar.py', 'wb') as f:
    f.write(streaming_body_1.read())
This opens a file with write access, calls the write method to write to the file, and closes the file when the with block ends. You should then be able to simply import the script.
import cifar
Note: You can get the credentials like IBM_API_KEY_ID for the file by clicking on the Insert credentials option on the drop-down menu for your file.
The instructions that the OP found miss one crucial line of code. I followed them and was able to import modules, but wasn't able to use any functions or classes in those modules. This was fixed by closing the files after writing. This part of the instructions:
f = open('<myScript>.py', 'wb')
f.write(streaming_body_1.read())
should instead be (at least this works in my case):
f = open('<myScript>.py', 'wb')
f.write(streaming_body_1.read())
f.close()
Hopefully this helps someone.
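The explicit close can also be left to a with block, which closes (and therefore flushes) the file automatically. The sketch below is only an illustration under stated assumptions: io.BytesIO stands in for streaming_body_1, a temporary directory stands in for the runtime's local filesystem, and my_script and greet are hypothetical names.

```python
import importlib
import io
import os
import sys
import tempfile

# Stand-in for streaming_body_1: any object whose read() returns bytes
streaming_body = io.BytesIO(b"def greet():\n    return 'hello from my_script'\n")

# Hypothetical target directory standing in for the runtime's local filesystem
target_dir = tempfile.mkdtemp()
script_path = os.path.join(target_dir, "my_script.py")

# The with block closes the file when it ends, so the content is on disk
with open(script_path, "wb") as f:
    f.write(streaming_body.read())

# Make the directory importable, then import the freshly written module
sys.path.insert(0, target_dir)
importlib.invalidate_caches()
my_script = importlib.import_module("my_script")
print(my_script.greet())
```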

Why pandas profiling isn't showing any output in ipython?

I have a quick question about "pandas_profiling". I'm trying to use pandas profiling, but instead of showing the output it prints something like this:
<pandas_profiling.ProfileReport at 0x23c02ed77b8>
Where am I making a mistake? Or does it have anything to do with IPython? I'm using IPython in Anaconda.
Try this:
import pandas_profiling
pfr = pandas_profiling.ProfileReport(df)
pfr.to_notebook_iframe()
pandas_profiling creates an object that then needs to be displayed or written out. One standard way of doing so is to save it as an HTML file:
profile.to_file(outputfile="sample_file_name.html")
("profile" being the variable you used to hold the report itself; newer releases of the library appear to name this parameter output_file instead.)
It doesn't have to do with ipython specifically - the difference is that because you're going line by line (instead of running a full block of code, including the reporting step) it's showing you the object itself. The code above should allow you to see the report once you open it up.
