Is there a way to find the creation date of a Databricks interactive cluster?
I looked at the Configuration tab as well as the ARM JSON but couldn't find it.
Check the cluster event log, which captures cluster lifecycle events such as creation, termination, and configuration edits.
The cluster event log displays important cluster lifecycle events that are triggered manually by user actions or automatically by Azure Databricks. Such events affect the operation of a cluster as a whole and the jobs running in the cluster.
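If you'd rather not click through the UI, a hedged sketch (not from the original answer): you can pull the same information from the Clusters API by asking api/2.0/clusters/events for the oldest CREATING event; its timestamp is effectively the creation time. The workspace URL, token, and cluster ID below are placeholders, and note that the event log has limited retention, so very old clusters may not return a CREATING event.

import datetime
import requests

HOST = "https://<databricks-instance>"   # placeholder workspace URL
TOKEN = "<personal-access-token>"        # placeholder personal access token

resp = requests.post(
    f"{HOST}/api/2.0/clusters/events",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "cluster_id": "<cluster-id>",
        "event_types": ["CREATING"],
        "order": "ASC",   # oldest event first
        "limit": 1,
    },
)
events = resp.json().get("events", [])
if events:
    created_ms = events[0]["timestamp"]  # epoch milliseconds
    print("Created at:", datetime.datetime.fromtimestamp(created_ms / 1000))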
When someone is connected to a Databricks cluster, I can see in the cluster details that the cluster is active and that some notebooks are attached.
But when I'm using the cluster with databricks-connect, the cluster isn't shown as running.
Is there a way to check if someone is connected to the cluster with databricks-connect?
You can see that in the cluster's Spark UI: in the Jobs tab, the description of an executed job will contain a message like: DB Connect execution from <user-name>
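If you'd rather not click through the UI, here is a rough sketch of the same check from a notebook attached to that cluster, via Spark's REST API; whether sc.uiWebUrl is reachable this way on your Databricks runtime is an assumption.

import requests

ui = sc.uiWebUrl  # sc is the notebook's SparkContext; e.g. http://10.1.2.3:40001
for app in requests.get(f"{ui}/api/v1/applications").json():
    for job in requests.get(f"{ui}/api/v1/applications/{app['id']}/jobs").json():
        text = " ".join(filter(None, [job.get("name"), job.get("description")]))
        if "DB Connect execution from" in text:
            print(job["jobId"], text)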
I have a Spark cluster running on HDInsight. Ambari shows some general metrics in its dashboard, such as NameNode uptime. Where/how may I find the raw data behind such metrics?
Thanks
You should "Enable HDInsight Azure Monitor logs integration" to find the raw data behind any metrics that are available in the Ambari UI.
Using Ambari Dashboard:
The Ambari dashboard contains widgets that show a handful of metrics to give you a quick overview of your HDInsight cluster's health. These widgets show metrics such as the number of live DataNodes (worker nodes) and JournalNodes (ZooKeeper nodes), NameNode (head node) uptime, as well as metrics specific to certain cluster types, like YARN ResourceManager uptime for Spark and Hadoop clusters.
Using Azure Monitor:
Azure Monitor logs enable data generated by multiple resources, such as HDInsight clusters, to be collected and aggregated in one place to achieve a unified monitoring experience.
As a prerequisite, you'll need a Log Analytics Workspace to store the collected data. If you haven't already created one, you can follow instructions here: Create a Log Analytics Workspace.
HDInsight clusters include Azure Monitor logs integration, which provides queryable metrics and logs, as well as configurable alerts. This article shows how to use Azure Monitor to monitor your cluster.
As an example, run the Availability rate sample query by selecting Run on that query. This shows the availability rate of each node in your cluster as a percentage. If you have enabled multiple HDInsight clusters to send metrics to the same Log Analytics workspace, you'll see the availability rate for all nodes in those clusters.
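If you want the raw data programmatically rather than in the portal, here is a minimal sketch using the azure-monitor-query SDK (pip install azure-monitor-query azure-identity). The workspace ID is a placeholder, and the Heartbeat query is only an illustration of the kind of data behind the availability widgets.

from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())
result = client.query_workspace(
    workspace_id="<log-analytics-workspace-id>",   # placeholder
    query="Heartbeat | summarize heartbeats = count() by Computer, bin(TimeGenerated, 15m)",
    timespan=timedelta(hours=24),
)
for table in result.tables:
    for row in table.rows:
        print(row)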
The Ambari agent is likely gathering this with simple OS calls (e.g. Python's os.system()). You can do the same with
ssh user@node uptime
We have an ADFv1 pipeline with an HDInsightHive-type activity that submits a Hive script to a Hadoop HDInsight cluster. Looking at the JSON for the pipeline, there doesn't seem to be any way to specify a YARN queue that the job should be submitted to.
So it's assuming that the job is always to be submitted to the default queue. I haven't found anything in the ADFv1 documentation yet about specifying a queue name (assuming we actually create more YARN queues on the cluster using the capacity scheduler).
Can someone provide sample JSON for specifying a YARN queue in an activity, if it is possible at all? Also, my requirement is specifically for ADFv1; if this is a limitation of ADFv1, I would also like to know whether it is fixed in ADFv2.
Currently, Azure Data Factory doesn't support submitting an activity to a specific queue.
Azure Data Factory activities are always submitted to the default queue.
I would suggest you vote up an idea submitted by another Azure customer.
https://feedback.azure.com/forums/270578-data-factory/suggestions/32956186-hdinsightspark-activity-should-support-additional
All of the feedback you share in these forums will be monitored and reviewed by the Microsoft engineering teams responsible for building Azure.
We installed the following Presto cluster on Linux Red Hat 7.2:
Presto latest version - 0.216
1 Presto coordinator
231 Presto workers
On each worker machine we can use the following command to verify the status:
/app/presto/presto-server-0.216/bin/launcher status
Running as 61824
and also stop/start it as follows:
/app/presto/presto-server-0.216/bin/launcher stop
/app/presto/presto-server-0.216/bin/launcher start
I also searched Google for a UI that can manage the Presto status/stop/start,
but haven't seen anything about this.
It's very strange that Presto doesn't come with some user interface that can show the cluster status and perform stop/start actions if we need to do so.
As far as we know, the only user interface Presto offers shows status and doesn't have actions such as stop/start.
In our example the UI shows that only 5 of the 231 Presto workers are active, but it doesn't support stop/start actions and doesn't show on which workers Presto isn't active.
So what can we do about it?
It's a very bad idea to have to access each worker machine to see whether Presto is up or down.
Why doesn't Presto have a centralized UI that can perform stop/start actions?
Example of what we are expecting from the UI (partial list):
…
Presto currently uses a discovery service where workers announce themselves to join the cluster, so if a worker node is not registered there is no way for the coordinator or discovery server to know about its presence and/or restart it.
At Qubole, we use an external service alongside the Presto master that tracks nodes which do not register with the discovery service within a certain interval. This service is responsible for removing such nodes from the cluster.
One more thing we do is run the monit service on each of the Presto worker nodes, which ensures that the Presto server is restarted whenever it goes down.
You may have to do something similar for cluster management, as Presto does not provide it right now.
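If you stay on plain VMs, here is a rough monit-style sketch of that idea, using the launcher commands from the question over SSH. The host list, SSH user, and install path are assumptions.

import subprocess

LAUNCHER = "/app/presto/presto-server-0.216/bin/launcher"
WORKERS = ["worker001", "worker002"]   # ...extend to all 231 workers
SSH_USER = "presto"                    # assumption

for host in WORKERS:
    # Check the launcher status on the remote worker
    status = subprocess.run(
        ["ssh", f"{SSH_USER}@{host}", f"{LAUNCHER} status"],
        capture_output=True, text=True,
    )
    if "Running as" not in status.stdout:
        print(f"{host}: Presto is down, starting it")
        subprocess.run(["ssh", f"{SSH_USER}@{host}", f"{LAUNCHER} start"])
    else:
        print(f"{host}: {status.stdout.strip()}")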
In my opinion and experience managing PrestoSQL clusters, this comes down to the service discovery patterns in the architecture.
So far, the open source releases of prestodb/prestosql use the following patterns:
Server-side service discovery - a client app like the Presto CLI, or any app using a Presto SDK, just needs to reach the coordinator without being aware of the worker nodes.
Service registry - a place that keeps track of the available instances.
Self-registration - a service instance is responsible for registering itself with the service registry. This is the key part, and it forces several behaviors:
Service instances must be registered with the service registry on startup and unregistered on shutdown
Service instances that crash must be unregistered from the service registry
Service instances that are running but incapable of handling requests must be unregistered from the service registry
So the lifecycle management of each Presto worker is left to the instance itself.
So what can we do about it?
The Presto cluster itself provides some observability, such as the HTTP APIs /v1/node and /v1/service/presto, to see instance status (see the sketch below). Personally, I recommend using another cluster manager like k8s or Nomad to manage the Presto cluster members.
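For example, a small sketch that asks the coordinator which workers are currently registered (and which recently failed) instead of logging into each node; the coordinator URL is a placeholder, and the exact field names can vary between versions.

import requests

COORDINATOR = "http://<coordinator-host>:8080"   # placeholder

active = requests.get(f"{COORDINATOR}/v1/node").json()
failed = requests.get(f"{COORDINATOR}/v1/node/failed").json()

print(f"{len(active)} workers registered with the coordinator")
for node in active:
    print("  up:", node.get("uri"))
for node in failed:
    print("  failed:", node.get("uri"))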
It's a very bad idea to have to access each worker machine to see whether Presto is up or down.
Why doesn't Presto have a centralized UI that can perform stop/start actions?
No opinion on good/bad. Take k8s for example: you can manage all Presto workers as one k8s Deployment, with each Presto worker in one pod. It can use liveness, readiness, and startup probes to automate the instance lifecycle with a little YAML, e.g. the livenessProbe design of the helm chart stable/presto. A cluster manager like k8s also provides a web UI, so you can touch resources and act like an admin. Or you can choose to write more Java code to extend Presto.
We use Spark 2.2 on Azure HDInsight for ad hoc exploration and batch jobs.
The jobs should run ok on a 5x medium VM cluster. They are
1. notebooks (Zeppelin with Livy.spark2 magics)
2. compiled jars being run with Livy.
I have to remember to scale this cluster down to 1 worker when not using it, to save money. (0 workers would be nice, if that were possible).
I'd like Spark to manage this for me... When a Job starts, scale the cluster up to a minimum size first, then pause ~10 mins while that completes. After an idle period without Jobs, scale down again.
You can use PowerShell or the Azure classic CLI to scale the cluster up/down. But you might need to write a script that tracks the cluster resource usage and scales down automatically.
Here is the PowerShell syntax:
Set-AzureRmHDInsightClusterSize -ClusterName <Cluster Name> -TargetInstanceCount <NewSize>
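For the "track usage" part, one rough sketch (an assumption on my side, not a turnkey autoscaler) is to poll the YARN ResourceManager REST API for running/pending applications and only trigger the scale-down command above once the cluster has been idle for a while. The ResourceManager host is a placeholder, and a secured cluster may require authentication.

import time
import requests

RM_METRICS_URL = "http://<resourcemanager-host>:8088/ws/v1/cluster/metrics"  # placeholder host
IDLE_CHECKS_BEFORE_SCALE_DOWN = 6   # e.g. 6 checks x 10 minutes = 1 hour of idle time

idle_checks = 0
while True:
    # clusterMetrics includes appsRunning and appsPending counters
    metrics = requests.get(RM_METRICS_URL, timeout=30).json()["clusterMetrics"]
    busy = metrics["appsRunning"] > 0 or metrics["appsPending"] > 0
    idle_checks = 0 if busy else idle_checks + 1
    if idle_checks >= IDLE_CHECKS_BEFORE_SCALE_DOWN:
        print("Cluster idle; call Set-AzureRmHDInsightClusterSize (or the REST API) here")
        idle_checks = 0
    time.sleep(600)   # check every 10 minutes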
Here is a PowerShell workflow runbook that will help you automate the process of scaling in or out your HDInsight clusters depending on your needs
https://gallery.technet.microsoft.com/scriptcenter/Scale-your-HDInsight-f57bb4d8
or
You can use the below option to scale it manually (even though your question is how to scale up/down automatically, I thought it would be useful to someone who wants to scale up/down manually)
Below is a link to an article explaining different methods to scale the cluster using PowerShell or the classic CLI (remember: the latest CLI doesn't support the scaling feature):
https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-scaling-best-practices
If you want Spark to handle this dynamically, Azure Databricks is the best choice (but it is only a Spark cluster, with no Hadoop components except Hive). HDInsight Spark is not managed by Azure in that way and will not solve your use case.
When you create a new cluster in Azure Databricks, there is an "Enable autoscaling" option that allows the cluster to scale dynamically while a job is executed.
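If you script cluster creation, the same option is available through the Clusters API. Here is a minimal sketch; the workspace URL, token, node type, and Spark version are placeholders.

import requests

HOST = "https://<databricks-instance>"   # placeholder workspace URL
TOKEN = "<personal-access-token>"        # placeholder personal access token

payload = {
    "cluster_name": "autoscaling-adhoc",
    "spark_version": "<spark-version>",   # pick one from /api/2.0/clusters/spark-versions
    "node_type_id": "<node-type-id>",     # pick one from /api/2.0/clusters/list-node-types
    "autoscale": {"min_workers": 1, "max_workers": 5},
    "autotermination_minutes": 30,        # also terminates the cluster after 30 idle minutes
}
resp = requests.post(
    f"{HOST}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
)
print(resp.json())   # {"cluster_id": "..."} on success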
I'm told that Azure Databricks may be a better solution for this use case.