Databricks, Spark UI, SQL logs: retrieve with the REST API

Is it possible to retrieve the Databricks Spark UI / SQL logs using the REST API? Is there any retention limit? I can't see any related API for this in Azure Databricks.
Note: Cluster -> Advanced Options -> Logging has not been set.

cluster_log_conf: The configuration for delivering Spark logs to a long-term storage destination. Only one destination can be specified for one cluster. If the conf is given, the logs will be delivered to the destination every 5 mins. The destination of driver logs is <destination>/<cluster-id>/driver, while the destination of executor logs is <destination>/<cluster-id>/executor.
Refer to the official documentation.
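For illustration, here is a minimal sketch of setting cluster_log_conf when creating a cluster through the Clusters REST API; the host, token, runtime version, and node type below are placeholder values, not ones from the question.

# Hypothetical sketch: enabling cluster log delivery via the Clusters REST API.
# The workspace URL, token, runtime, and node type are placeholders.
import requests

DATABRICKS_HOST = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
TOKEN = "dapiXXXXXXXXXXXXXXXX"  # placeholder personal access token

payload = {
    "cluster_name": "logged-cluster",
    "spark_version": "13.3.x-scala2.12",   # assumed runtime label
    "node_type_id": "Standard_DS3_v2",     # assumed Azure node type
    "num_workers": 2,
    # Logs are delivered to <destination>/<cluster-id>/driver and .../executor
    # roughly every five minutes once this is set.
    "cluster_log_conf": {
        "dbfs": {"destination": "dbfs:/cluster-logs"}
    },
}

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json())  # contains the new cluster_id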

Related

Spark stopped generating driver log and event logs, only executor logs are generated

I am using logging on my Databricks clusters and am sending my log data to a blob container that I have mounted on my cluster (Cluster Configuration -> Advanced Options -> Logging -> Mounted Path).
Earlier, all the logs were getting generated, but after some days (maybe because of some change) no logs are being generated on the Databricks log4j console.
I checked the blob container as well; there, too, only executor logs are being logged.
Blob Log Image 1
I tried recreating the same issue on another cluster, but there all the logs were getting generated as expected.
Blob log Image 2
This may occur if the SAS for your blob storage expires after mounting it with Databricks.
To troubleshoot this, create two clusters and store one cluster's logs in DBFS and the other's in the mounted blob, with a specified expiration time on the SAS.
After expiration, check the logs stored in both the blob and DBFS. If logs stop being stored only in the blob, the issue is the SAS expiration. If logs stop being stored in both DBFS and the blob, then there is an issue with the workspace's cluster logs.
My suggestion is to try a new Databricks workspace; if there is no issue with logs in either DBFS or the blob there, you can try Diagnostic logging in Azure Databricks - Azure Databricks | Microsoft Docs.
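As a rough illustration of the mount being discussed, here is a minimal sketch of mounting a blob container with a SAS token from a Databricks notebook; the storage account, container, and SAS values are placeholders.

# Minimal sketch (Databricks notebook): mounting a blob container with a SAS token.
# Storage account, container, and SAS value are placeholders.
storage_account = "mystorageacct"     # placeholder
container = "cluster-logs"            # placeholder
sas_token = "sv=...&se=...&sig=..."   # placeholder SAS; note its expiry (se=)

dbutils.fs.mount(
    source=f"wasbs://{container}@{storage_account}.blob.core.windows.net",
    mount_point="/mnt/cluster-logs",
    extra_configs={
        f"fs.azure.sas.{container}.{storage_account}.blob.core.windows.net": sas_token
    },
)

# Once the SAS embedded in the mount expires, writes to /mnt/cluster-logs from
# cluster log delivery stop working, which matches the symptom described above.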

Log analytics in Databricks Azure

I have created a Log Analytics workspace that is used to grab all metrics from a Databricks cluster.
Specifically, my query would be to get the times when the cluster was running more than 10 worker instances. Can I find this information through Log Analytics in Azure for Databricks?
Any hint would be appreciated.
You can use a Log4j appender to send your Azure Databricks application logs to Azure Log Analytics.
Refer to Send application logs using Log4j for the steps to implement this.
You can also visualize the Databricks resource metrics using Grafana on a virtual machine. Check the official Microsoft documentation for this: Use dashboards to visualize Azure Databricks metrics.
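As a complementary approach not covered in the answer above, the Databricks Clusters REST API also records resize events per cluster, so the "more than 10 workers" question can be checked directly from those events. A minimal sketch, assuming placeholder host, token, and cluster ID; verify the event field names against the API docs for your workspace.

# Hedged sketch: list cluster resize events and flag those with more than 10 workers.
import requests

DATABRICKS_HOST = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
TOKEN = "dapiXXXXXXXXXXXXXXXX"                                           # placeholder
CLUSTER_ID = "0123-456789-abcde123"                                      # placeholder

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/clusters/events",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "cluster_id": CLUSTER_ID,
        "event_types": ["RESIZING", "UPSIZE_COMPLETED"],
        "limit": 500,
    },
)
resp.raise_for_status()

for event in resp.json().get("events", []):
    workers = event.get("details", {}).get("current_num_workers")
    if workers is not None and workers > 10:
        # timestamp is milliseconds since epoch
        print(event["timestamp"], event["type"], workers)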

Spark: How to access AzureFileSystemInstrumentation when using Azure Blob Storage with a Spark cluster?

I am working on a Spark project where the storage sink is Azure Blob Storage. I write data in Parquet format. I need some metrics around storage, e.g. numberOfFilesCreated, writtenBytes, etc. While searching online I came across a particular metrics class in the hadoop-azure package called AzureFileSystemInstrumentation. I am not sure how to access it from Spark and can't find any resources on this. How would one access this instrumentation for a given Spark job?
Based on my experience, I think there are three solutions that can be used in your current scenario:
1. Directly use the Hadoop API for HDFS to get HDFS metrics data in Spark, because hadoop-azure just implements the HDFS APIs on top of Azure Blob Storage. See the Hadoop official documentation for Metrics to find the particular metrics you want, such as CreateFileOps or FilesCreated, to get numberOfFilesCreated. There is also a similar SO thread, How do I get HDFS bytes read and write for Spark applications?, which you can refer to.
2. Directly use the Azure Storage SDK for Java (or another language you use) to write a program that computes statistics over the files stored in Azure Blob Storage as blobs, ordered by creation timestamp or other properties; refer to the official document Quickstart: Azure Blob storage client library v8 for Java to learn how to use the SDK. A sketch of this option follows this list.
3. Use an Azure Function with a Blob Trigger to monitor the events of files created in Azure Blob Storage, and write your statistics code in the handler for every blob-created event; refer to the official document Create a function triggered by Azure Blob storage to learn how to use the Blob Trigger. You can even send the metrics you want to Azure Table Storage, Azure SQL Database, or another service for later analysis from the Blob Trigger Function.
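As a sketch of option 2 using the Python SDK (azure-storage-blob v12) rather than Java, the following counts the Parquet files under a prefix and sums their sizes; the connection string, container, and prefix are placeholders.

# Hedged sketch: derive numberOfFilesCreated / writtenBytes by listing blobs.
from azure.storage.blob import ContainerClient

container = ContainerClient.from_connection_string(
    conn_str="DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;",  # placeholder
    container_name="datalake",                                                   # placeholder
)

files_created = 0
written_bytes = 0
for blob in container.list_blobs(name_starts_with="output/parquet/"):  # placeholder prefix
    if blob.name.endswith(".parquet"):
        files_created += 1
        written_bytes += blob.size

print(f"numberOfFilesCreated={files_created}, writtenBytes={written_bytes}")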

How to re-direct logs from Azure Databricks to another destination?

We could use some help on how to send Spark driver and worker logs to a destination outside Azure Databricks, e.g. Azure Blob Storage or Elasticsearch using Elastic Beats.
When configuring a new cluster, the only option offered for the log delivery destination is DBFS; see
https://docs.azuredatabricks.net/user-guide/clusters/log-delivery.html.
Any input much appreciated, thanks!
Maybe the following could be helpful:
First, you specify a DBFS location for your Spark driver and worker logs:
https://docs.databricks.com/user-guide/clusters/log-delivery.html
Then, you create a mount point that links your DBFS folder to a Blob Storage container:
https://docs.databricks.com/spark/latest/data-sources/azure/azure-storage.html#mount-azure-blob-storage-containers-with-dbfs
Hope this helps!
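Putting the two steps together, a minimal sketch (paths are placeholders):

# 1) Mount the blob container at /mnt/cluster-logs (same idea as the mount sketch earlier).
# 2) Point the cluster's log delivery at that mounted DBFS path.
cluster_log_conf = {"dbfs": {"destination": "dbfs:/mnt/cluster-logs"}}
# Pass this in the clusters/create or clusters/edit request body, or set the same
# path in the cluster UI under Advanced Options -> Logging.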

Can we fetch custom logs in Azure OMS

Our project is a Java Spring Boot application. We have a logging system using log4j, which we are pushing into Azure Storage accounts.
Question:
I want to query these custom logs in OMS. (Is it possible?)
If yes, how?
What I have tried so far:
1. Pushed the logs into blob storage using Logback (container layout shown in a screenshot).
2. Pushed logs into table storage.
3. Configured the storage accounts in Log Analytics in the Azure workspace.
But I am unable to see any analytics data to query in OMS.
Please help.
If you can't use Application Insights, you can read the log files from Storage and use the HTTP Data Collector API to push the logs into a Log Analytics workspace. Samples and reference: https://learn.microsoft.com/en-us/azure/log-analytics/log-analytics-data-collector-api
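A minimal sketch of the HTTP Data Collector API call, loosely following the Python sample in the linked documentation; the workspace ID, shared key, and log type are placeholders.

# Hedged sketch: push custom log records to Log Analytics via the Data Collector API.
import base64, datetime, hashlib, hmac, json, requests

workspace_id = "00000000-0000-0000-0000-000000000000"  # placeholder workspace ID
shared_key = "base64-primary-key=="                    # placeholder workspace key
log_type = "CustomAppLogs"                             # becomes CustomAppLogs_CL in Log Analytics

def build_signature(date, content_length, method, content_type, resource):
    # String to sign per the Data Collector API docs, HMAC-SHA256 with the decoded key.
    string_to_hash = (f"{method}\n{content_length}\n{content_type}\n"
                      f"x-ms-date:{date}\n{resource}")
    decoded_key = base64.b64decode(shared_key)
    encoded_hash = base64.b64encode(
        hmac.new(decoded_key, string_to_hash.encode("utf-8"), hashlib.sha256).digest()
    ).decode()
    return f"SharedKey {workspace_id}:{encoded_hash}"

body = json.dumps([{"level": "ERROR", "message": "example log line"}])  # placeholder record
rfc1123date = datetime.datetime.utcnow().strftime("%a, %d %b %Y %H:%M:%S GMT")
signature = build_signature(rfc1123date, len(body), "POST", "application/json", "/api/logs")

resp = requests.post(
    f"https://{workspace_id}.ods.opinsights.azure.com/api/logs?api-version=2016-04-01",
    data=body,
    headers={
        "Content-Type": "application/json",
        "Authorization": signature,
        "Log-Type": log_type,
        "x-ms-date": rfc1123date,
    },
)
print(resp.status_code)  # 200 means the records were accepted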
