I am working in Azure Databricks. I have a notebook (say, A) that runs another notebook (say, B). Where does the run info for notebook B get logged to, either in Databricks or in Log Analytics? I have checked Databricks (Workflows \ Job runs), but I don't see a log in there for this notebook. I also checked Log Analytics (in table DatabricksJobs), but I don't see it in there either. I see the log for the parent notebook (notebook A) in both Databricks and in Log Analytics, but I don't see a log for the child notebook (notebook B) in either tool. I am beginning to think that child notebook run info doesn't get logged.
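For reference, the call from notebook A looks roughly like the sketch below (Scala); the child notebook path and timeout are hypothetical placeholders.

// Notebook A: run notebook B as a child of this notebook's execution.
// dbutils.notebook.run executes B synchronously and returns whatever B passes
// to dbutils.notebook.exit; "/Shared/notebook-B" and 3600 are placeholder values.
val result = dbutils.notebook.run("/Shared/notebook-B", 3600)
println(s"Child notebook returned: $result")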
Related
I have 4 different Databricks notebooks: 1 master notebook and 3 child notebooks. The child notebooks are called from the master notebook and run concurrently. I want to send a notification email at the notebook level whenever a notebook is triggered, passes, or fails. I have checked the built-in Databricks mailing option, but because my child notebooks are called from the master notebook, I am not getting notifications for the child notebooks. Is there any other custom way I can do this, even by script? I am using Scala as my notebook language.
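One rough sketch of a custom approach (Scala): wrap each dbutils.notebook.run call in the master notebook so it records per-child status and sends its own notifications. The notebook paths and the sendMail helper below are hypothetical placeholders.

import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success, Try}

def sendMail(subject: String, body: String): Unit = {
  // hypothetical helper: plug in JavaMail, an SMTP relay, or a webhook call here
  println(s"[mail] $subject - $body")
}

val children = Seq("/Shared/child-1", "/Shared/child-2", "/Shared/child-3")  // hypothetical paths

// Run the child notebooks concurrently; notify on trigger, success, and failure.
val runs = children.map { path =>
  Future {
    sendMail(s"$path triggered", "child notebook started")
    Try(dbutils.notebook.run(path, 3600)) match {
      case Success(result) => sendMail(s"$path passed", s"returned: $result")
      case Failure(e)      => sendMail(s"$path failed", e.getMessage)
    }
  }
}

Await.result(Future.sequence(runs), 2.hours)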
How does one log on Qubole/access logs from Spark on Qubole? The setup I have:
Java library (JAR)
Zeppelin Notebook (Scala), simply calling a method from the library
Spark, Yarn cluster
Log4j2 used in the library (configured to log on stdout)
How can I access my logs from the log4j2 logger? What I tried so far:
Looking into the 'Logs' section of my Interpreters
Going through Spark UI's stdout logs of each executor
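For reference, the library's logging side looks roughly like the sketch below (Scala, Log4j2 with a Console appender writing to stdout); the object name is a hypothetical placeholder.

import org.apache.logging.log4j.LogManager

object MyLibrary {
  private val log = LogManager.getLogger(getClass)

  def doWork(): Unit = {
    // With a Console appender, these messages go to stdout: the driver/interpreter
    // process when called directly from the notebook, or an executor's stdout file
    // when called inside a task running on YARN.
    log.info("doWork started")
    log.info("doWork finished")
  }
}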
When a Spark job or application fails, you can use the Spark logs to analyze the failures.
The QDS UI provides links to the logs in the Application UI and Spark Application UI.
If you are running the Spark job or application from the Analyze page, you can access the logs via the Application UI and Spark Application UI.
If you are running the Spark job or application from the Notebooks page, you can access the logs via the Spark Application UI.
You can also access additional logs to identify the errors and exceptions in Spark job or application failures.
Accessing the Application UI
To access the logs via the Application UI from the Analyze page of the QDS UI:
Note the command id, which is unique to the Qubole job or command.
Click on the down arrow on the right of the search bar.
The Search History page appears as shown in the following figure.
[Figure: spark-debug1.png]
Enter the command id in the Command Id field and click Apply.
The logs of any Spark job are displayed in the Application UI and the Spark Application UI, which are accessible from the Logs and Resources tabs. The information in these UIs can be used to trace anything related to the command's status.
The following figure shows an example of the Logs tab with links.
Click on the Application UI hyperlink in the Logs tab or Resources tab.
The Hadoop MR application UI is displayed as shown in the following figure.
[Figure: application-ui.png]
The Hadoop MR application UI displays the following information:
MR application master logs
Total Mapper/Reducer tasks
Completed/Failed/Killed/Successful tasks
Note
The MR application master logs correspond to the Spark driver logs. For any Spark driver-related issues, you should verify the AM logs (driver logs).
If you want to check the exceptions of the failed jobs, you can click on the logs link in the Hadoop MR application UI page. The Application Master (AM) logs page, which contains stdout, stderr, and syslog, is displayed.
Accessing the Spark Application UI
You can access the logs by using the Spark Application UI from the Analyze page and Notebooks page.
From the Analyze page
From the Home menu, navigate to the Analyze page.
Note the command id, which is unique to the Qubole job or command.
Click on the down arrow on the right of the search bar. The Search History page appears as shown in the following figure.
[Figure: spark-debug1.png]
Enter the command id in the Command Id field and click Apply.
Click on the Logs tab or Resources tab.
Click on the Spark Application UI hyperlink.
From the Notebooks page
From the Home menu, navigate to the Notebooks page.
Click on the Spark widget on the top right and click on Spark UI as shown in the following figure.
[Figure: spark-ui.png]
OR
Click on the i icon in the paragraph as shown in the following figure.
[Figure: spark-debug2.png]
When you open the Spark UI from the Spark widget of the Notebooks page or from the Analyze page, the Spark Application UI is displayed in a separate tab as shown in the following figure.
[Figure: spark-application-ui.png]
The Spark Application UI displays the following information:
Jobs: The Jobs tab shows the total number of completed, succeeded, and failed jobs. It also shows the number of stages that have succeeded for each job.
Stages: The Stages tab shows the total number of completed and failed stages. If you want to check more details about the failed stages, click on the failed stage in the Description column. The details of the failed stages are displayed as shown in the following figure.
[Figure: spark-app-stage.png]
The Errors column shows the detailed error message for the failed tasks. You should note the executor id and the hostname to view details in the container logs. For more details about the error stack trace, you should check the container logs.
Storage: The Storage tab displays the cached data if caching is enabled.
Environment: The Environment tab shows information about the JVM, Spark properties, system properties, and classpath entries, which helps you find the value of a property used by the Spark cluster at runtime. The following figure shows the Environment tab.
[Figure: spark-app-env.png]
Executors: The Executors tab shows the container logs. You can map the container logs using the executor id and the hostname, which are displayed in the Stages tab.
Spark on Qubole provides the following additional fields in the Executors tab:
Resident size/Container size: Displays the total physical memory used within the container (which is the executor's Java heap + off-heap memory) as Resident size, and the configured YARN container size (which is executor memory + executor overhead) as Container size.
Heap used/committed/max: Displays values corresponding to the executor's Java heap.
The following figure shows the Executors tab.
[Figure: spark-app-exec.png]
The Logs column shows the links to the container logs. Additionally, the number of tasks executed by each executor is displayed, with the number of active, failed, completed, and total tasks.
Note
For debugging container memory issues, you can check the statistics on container size, Heap used, the input size, and shuffle read/write.
Accessing Additional Spark Logs
Apart from accessing the logs from the QDS UI, you can also access the following logs, which reside on the cluster, to identify the errors and exceptions in Spark job failures:
Spark History Server Logs: The spark-yarn-org.apache.spark.deploy.history.HistoryServer-1-localhost.localdomain.log files are stored at /media/ephemeral0/logs/spark. The Spark history server logs are stored only on the master node of the cluster.
Spark Event Logs: The Spark event log files are stored at <scheme><defloc>/logs/hadoop/<cluster_id>/<cluster_inst_id>/spark-eventlogs, where:
scheme is the Cloud-specific URI scheme: s3:// for AWS; wasb:// or adl:// or abfs[s] for Azure; oci:// for Oracle OCI.
defloc is the default storage location for the QDS account.
cluster_id is the cluster ID as shown on the Clusters page of the QDS UI.
cluster_inst_id is the cluster instance ID. You should contact Qubole Support to obtain the cluster instance ID.
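As a quick sanity check that the event logs exist, a rough sketch (Scala, run from a Spark shell or notebook on the cluster) that lists the event-log directory is shown below; the bucket and IDs in the path are hypothetical example values.

import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical values; substitute your account's defloc, cluster ID, and cluster instance ID.
val eventLogDir = "s3://my-defloc/logs/hadoop/1234/5678/spark-eventlogs"
val fs = FileSystem.get(new java.net.URI(eventLogDir), spark.sparkContext.hadoopConfiguration)
fs.listStatus(new Path(eventLogDir)).foreach(status => println(status.getPath))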
I am in the middle of building a PySpark application that fails a lot and has a lot of jobs with a lot of steps, so it is not possible to search by cluster ID and step ID. The current format in which Spark on EMR saves the logs is shown below:
s3://bucket-name/logs/sparksteps/j-{clusterid}/steps/s-{stepid}/stderr.gz
I want something traceable in place of {clusterid} and {stepid}, such as clustername+datetime and step-name.
I saw log4j.properties and it has something named datepattern, but it is not saving anything with a datetime.
You could index the logs into an ELK cluster (managed or not) using Filebeat.
Or send the logs to CloudWatch Logs using a bootstrap script on the EMR cluster or a Lambda. You can then customize the log group and log stream names to your needs.
I have been using Zeppelin for a few months now. It is a great tool for internal data analytics. I am looking for more features for sharing reports with customers. I need to send weekly/monthly/quarterly reports to the customers, and I am looking for a way to automate this process.
Please let me know if Databricks Spark Notebook or any other tool has features to help me do this.
You can use a Databricks dashboard for this. Once you have the dashboard, you can do an HTML export of it and share the HTML file publicly.
If you're interested in automating the reporting process, you may want to look into the Databricks REST API: https://docs.databricks.com/api/latest/jobs.html#runs-export. You need to pass the run_id of the notebook job and the desired views_to_export (this value should be DASHBOARD) as the query parameters. Note that this run export supports notebook job exports only, which is fine because dashboards are usually generated from notebook jobs.
If your Databricks HTML dashboard export is successful, you'll get a "views" JSON response that consists of a list of key-value pair objects; your HTML string will be available under the "content" key in each of the objects. You can then do anything with this HTML string, such as sending it directly to email/Slack for automatic reporting.
In order to generate a run_id, you first need to create a notebook job, which you can do via the Databricks UI. Then, you can get the run_id by triggering the notebook job to run by either:
using the Databricks scheduler, or
using the Databricks run-now REST API: https://docs.databricks.com/api/latest/jobs.html#run-now.
I preferred the second method, running the job programmatically via the REST API, because I can always find the run_id when I run the job, unlike the first method, where I have to look at the Databricks UI each time the job is scheduled to run. Either way, you must wait for the notebook job run to finish before calling the run export in order to successfully get the complete Databricks dashboard in HTML.
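For illustration, a rough sketch of the export call (Scala, using only the JDK's HttpURLConnection) is below; the workspace host, token, and run_id are hypothetical placeholders, while the endpoint and the run_id/views_to_export query parameters follow the runs-export documentation linked above.

import java.net.{HttpURLConnection, URL}
import scala.io.Source

val host  = sys.env("DATABRICKS_HOST")    // e.g. https://<your-workspace>.cloud.databricks.com
val token = sys.env("DATABRICKS_TOKEN")   // a personal access token
val runId = 12345L                        // hypothetical run_id of a finished notebook job run

// GET the dashboard export for the given run.
val url  = new URL(s"$host/api/2.0/jobs/runs/export?run_id=$runId&views_to_export=DASHBOARD")
val conn = url.openConnection().asInstanceOf[HttpURLConnection]
conn.setRequestProperty("Authorization", s"Bearer $token")

// The response is a JSON object with a "views" array; each element's "content" field
// holds the exported HTML, which can then be emailed or posted for the report.
val body = Source.fromInputStream(conn.getInputStream).mkString
println(body.take(500))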
When the application is running, I am able to view the log from the RM UI. But after the application exits, I get this message when trying to view the log:
Failed while trying to construct the redirect url to the log server.
Log Server url may not be configured java.lang.Exception: Unknown
container. Container either has not started or has already completed
or doesn't belong to this node at all.
I looked around my HDInsight storage but I could not find any log file.
If you are using YARN for your Spark execution, you can use its built-in log system.
According to the official Spark documentation:
If log aggregation is turned on (with the yarn.log-aggregation-enable config), container logs are copied to HDFS and deleted on the local machine. These logs can be viewed from anywhere on the cluster with the “yarn logs” command.
HDInsight clusters support this type of logging. To access the logs, the command below can be used from a command line:
yarn logs -applicationId <app ID>
To identify the application ID, you might want to access the Hadoop user interface and look for the All Applications section.
Note: In order to output the entire log into a file, you might want to append > TextFile.txt to the above command.