How to send custom notification email from Azure Databricks?

I have 4 different Databricks notebooks: 1 master notebook and 3 child notebooks. The child notebooks are called from the master notebook and run concurrently. I want to send a notification email at the notebook level whenever a notebook is triggered, passes, or fails. I have looked at the built-in Databricks mailing option, but because my child notebooks are called from the master notebook I do not get notifications for them. Is there any other custom way I can do this, even by script? I am using Scala as my notebook language.
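One possible custom approach (a minimal sketch only, not a tested solution): have the master notebook wrap each child run in a Try and send the mail itself, for example by POSTing to an Azure Logic App or any other HTTP endpoint that sends the actual email. The notifyUrl, JSON payload shape, and helper names below are placeholders.

    // Sketch only: the master notebook wraps each child run and posts a
    // notification to a hypothetical HTTP endpoint (e.g. an Azure Logic App
    // that sends the actual email). notifyUrl and the JSON shape are placeholders.
    import java.net.{HttpURLConnection, URL}
    import java.nio.charset.StandardCharsets
    import scala.util.{Failure, Success, Try}

    val notifyUrl = "https://<your-logic-app-or-mail-relay>/notify"  // placeholder

    def sendNotification(subject: String, body: String): Unit = {
      val payload = s"""{"subject":"$subject","body":"$body"}"""  // naive escaping, sketch only
      val conn = new URL(notifyUrl).openConnection().asInstanceOf[HttpURLConnection]
      conn.setRequestMethod("POST")
      conn.setRequestProperty("Content-Type", "application/json")
      conn.setDoOutput(true)
      conn.getOutputStream.write(payload.getBytes(StandardCharsets.UTF_8))
      conn.getResponseCode  // fire the request; response body ignored in this sketch
      conn.disconnect()
    }

    def runChildWithNotification(path: String): Unit = {
      sendNotification(s"$path triggered", "Child notebook started")
      Try(dbutils.notebook.run(path, 3600, Map.empty[String, String])) match {
        case Success(result) => sendNotification(s"$path passed", s"Exit value: $result")
        case Failure(e)      => sendNotification(s"$path failed", e.getMessage)
      }
    }

Each child could call the same helper at the top of its own code if you also want a mail when a child notebook is run on its own.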

Related

Spark History Server within Jupyterlab

I am running Spark jobs in my Jupyter notebook deployed in an EKS Cluster. Jupyterlab provides a Spark UI Monitoring extension where I can view my Spark jobs by clicking on the "SparkMonitor" tab. I am also trying to access the History Server that is deployed on a different pod. What is the best way for me to access the History Server? Is there any way I can route to the History Server within the Jupyter Notebook?

Does Azure Databricks log run info for child notebooks?

I am working in Azure Databricks. I have a notebook (say, A) that runs another notebook (say, B). Where does the run info for notebook B get logged to, either in Databricks or in Log Analytics? I have checked Databricks (Workflows \ Job runs), but I don't see a log in there for this notebook. I also checked Log Analytics (in table DatabricksJobs), but I don't see it in there either. I see the log for the parent notebook (notebook A) in both Databricks and in Log Analytics, but I don't see a log for the child notebook (notebook B) in either tool. I am beginning to think that child notebook run info doesn't get logged.
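If child runs really are not captured automatically, one workaround (a sketch under that assumption, not confirmed behaviour) is to have notebook A record each child run itself in a table of your own; the table name and notebook path below are placeholders.

    // Sketch: notebook A wraps dbutils.notebook.run and appends the outcome to
    // an audit table. "child_notebook_runs" and the notebook path are placeholders.
    import java.sql.Timestamp
    import scala.util.Try

    def runAndLog(path: String): Unit = {
      val start   = new Timestamp(System.currentTimeMillis())
      val attempt = Try(dbutils.notebook.run(path, 3600, Map.empty[String, String]))
      val end     = new Timestamp(System.currentTimeMillis())
      val (status, detail) = attempt.fold(e => ("FAILED", e.getMessage), r => ("SUCCEEDED", r))

      import spark.implicits._
      Seq((path, start, end, status, detail))
        .toDF("notebook", "start_time", "end_time", "status", "detail")
        .write.mode("append").saveAsTable("child_notebook_runs")
    }

    runAndLog("/Users/me/notebookB")  // placeholder path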

Update Databricks job status through API

We need to execute a long-running EXE on a Windows machine and are thinking of ways to integrate it with the workflow. The plan is to include the EXE as a task in the Databricks workflow.
We are considering a couple of approaches:
1. Create a DB table and insert a row when this particular task starts in the workflow. The EXE running on the Windows machine polls the table for new records; once a new record is found, the EXE proceeds with the actual execution and updates the status after completion. Databricks keeps querying this table for the status and, once it is marked complete, the task finishes.
2. Using the Databricks API, check whether the task has started execution from the EXE and continue with execution. After the application finishes, update the task status to complete; until then the Databricks task runs in a while(true) loop. However, the current API doesn't seem to support updating the task execution status to complete (not 100% sure).
Please share your thoughts or alternate solutions.
This is an interesting problem. Is there a reason you must use Databricks to execute an EXE?
Regardless, I think you have the right idea. Here is how I would do this with the Jobs API:
1. Have your EXE process write a file to a staging location, probably in DBFS, since this will be locally accessible inside Databricks.
2. Build a notebook to load this file; a table is optional but may give you additional logging capabilities if needed. The notebook should end with dbutils.notebook.exit, which lets you output a value (a string or an array). You could return "In Progress" and "Success", or the latest line of the file you've written (see the sketch after this note).
3. Wrap that notebook in a Databricks job, execute it on an interval with a cron schedule (you said 1 minute), and retrieve the output value of your job via the get-output endpoint.
Additional note: the benefit of abstracting this into return values from a notebook is that you can orchestrate it via other workflow tools, e.g. Databricks Workflows or Azure Data Factory inside an Until condition. There are no limits as long as you can orchestrate a notebook in that tool.
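A minimal sketch of step 2 under those assumptions (the DBFS path is a placeholder, and the EXE is assumed to append its latest status line to that file):

    // Status notebook sketch: read the last line of the file the EXE wrote to
    // DBFS and return it through dbutils.notebook.exit. The path is a placeholder.
    import scala.util.Try

    val statusPath = "/dbfs/tmp/exe_status.txt"  // DBFS FUSE path, placeholder

    val status: String = Try {
      val src = scala.io.Source.fromFile(statusPath)
      try src.getLines().toList.lastOption.getOrElse("In Progress")
      finally src.close()
    }.getOrElse("In Progress")  // treat a missing/unreadable file as "not started yet"

    // The value returned here is what the Jobs get-output endpoint surfaces as notebook_output.
    dbutils.notebook.exit(status)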

Can I start another cluster from the current notebook in Databricks?

I have notebook1 assigned to cluster1 and notebook2 assigned to cluster2.
I want to trigger notebook2 from notebook1 but notebook2 should use only cluster2 for execution.
Currently it is getting triggered using cluster1.
Please let me know if you need more information.
Unfortunately, you cannot start another cluster from the current notebook.
This is expected behaviour: when you trigger notebook2 from notebook1, it will use cluster1 and not cluster2.
Reason: any command you run from notebook1 always runs on the cluster notebook1 is attached to.
Notebooks cannot be statically assigned to a cluster; that is actually runtime state only. If you want to run some code on a different cluster (in this case, the code is a notebook), then you have to do it by having your first notebook submit a separate job, rather than using dbutils.notebook.run or %run (see the sketch below).
Notebook Job Details: (screenshot omitted)
Hope this helps.
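A minimal sketch of that pattern, assuming a personal access token stored in a secret scope; the workspace URL, secret scope/key, cluster ID, and notebook path are all placeholders.

    // Sketch: notebook1 submits a one-time run of notebook2 on cluster2 via the
    // Jobs API runs/submit endpoint. All identifiers below are placeholders.
    import java.net.{HttpURLConnection, URL}
    import java.nio.charset.StandardCharsets

    val workspaceUrl = "https://<your-workspace>.azuredatabricks.net"     // placeholder
    val token        = dbutils.secrets.get("my-scope", "databricks-pat")  // placeholder secret

    val payload =
      """{
        |  "run_name": "notebook2-on-cluster2",
        |  "tasks": [{
        |    "task_key": "run_notebook2",
        |    "existing_cluster_id": "<cluster2-id>",
        |    "notebook_task": { "notebook_path": "/Users/me/notebook2" }
        |  }]
        |}""".stripMargin

    val conn = new URL(s"$workspaceUrl/api/2.1/jobs/runs/submit")
      .openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setRequestProperty("Authorization", s"Bearer $token")
    conn.setRequestProperty("Content-Type", "application/json")
    conn.setDoOutput(true)
    conn.getOutputStream.write(payload.getBytes(StandardCharsets.UTF_8))

    val response = scala.io.Source.fromInputStream(conn.getInputStream).mkString
    println(response)  // contains the run_id of the submitted run, which executes on cluster2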

How do I share Databricks Spark Notebook report/dashboard with customers?

I have been using Zeppelin for a few months now. It is a great tool for internal data analytics, but I am looking for more features for sharing reports with customers. I need to send weekly/monthly/quarterly reports to customers and am looking for a way to automate this process.
Please let me know if Databricks Spark notebooks or any other tool has features to help me do this.
You can use a Databricks dashboard for this. Once you have the dashboard, you can do an HTML export of it and share the HTML file publicly.
If you're interested in automating the reporting process, you may want to look into the Databricks REST API: https://docs.databricks.com/api/latest/jobs.html#runs-export. You need to pass the run_id of the notebook job and the desired views_to_export (this value should be DASHBOARD) as query parameters. Note that this run export supports notebook job exports only, which is fine because dashboards are usually generated from notebook jobs.
If your Databricks HTML dashboard export is successful, you'll get a "views" JSON response consisting of a list of key-value pair objects; the HTML string is available under the "content" key of each object. You can then do anything with this HTML string, such as sending it directly to email/Slack for automatic reporting.
In order to get a run_id, you first need to create a notebook job, which you can do via the Databricks UI. Then you can get the run_id by triggering the notebook job to run by either:
using the Databricks scheduler, or
using the Databricks run-now REST API: https://docs.databricks.com/api/latest/jobs.html#run-now.
I prefer the second method, running the job programmatically via the REST API, because I can always find the run_id when I run the job, unlike the first method where I have to look at the Databricks UI each time the job is scheduled to run. Either way, you must wait for the notebook job run to finish before calling the run export in order to get the complete Databricks dashboard HTML (see the sketch below).
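A minimal sketch of the export step, assuming the run has already finished and you have its run_id (workspace URL, token, and run_id are placeholders; json4s is used for parsing because it ships with Spark):

    // Sketch: export the DASHBOARD view of a finished notebook job run and pull
    // the HTML out of the "views" response. All credentials/IDs are placeholders.
    import java.net.{HttpURLConnection, URL}
    import org.json4s._
    import org.json4s.jackson.JsonMethods._

    val workspaceUrl = "https://<your-workspace>.cloud.databricks.com"  // placeholder
    val token        = "<personal-access-token>"                        // placeholder
    val runId        = 12345L                                           // placeholder, from run-now

    def get(path: String): String = {
      val conn = new URL(s"$workspaceUrl$path").openConnection().asInstanceOf[HttpURLConnection]
      conn.setRequestProperty("Authorization", s"Bearer $token")
      scala.io.Source.fromInputStream(conn.getInputStream).mkString
    }

    val export = parse(get(s"/api/2.0/jobs/runs/export?run_id=$runId&views_to_export=DASHBOARD"))

    // Each element of "views" holds its HTML under "content".
    val htmlViews: List[String] = (export \ "views").children
      .map(_ \ "content")
      .collect { case JString(html) => html }

    // htmlViews.head is the dashboard HTML: write it to a file, attach it to an
    // email, or post it to Slack for the automated report.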
