How can I check if someone is using a cluster with databricks-connect? - databricks

When someone is connected to a Databricks cluster, I can see in the cluster details that the cluster is active and that there are some notebooks attached.
But when I'm using the cluster with databricks-connect, the cluster isn't shown as running.
Is there a way to check whether someone is connected to the cluster with databricks-connect?

You can see that in the cluster's Spark UI: in the Jobs tab, the description of an executed job will contain a message like: DB Connect execution from <user-name>
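For reference, a minimal sketch (assuming databricks-connect is already configured against that cluster) of where such a job comes from - any action run through the remote SparkSession creates a job you can then spot in the Jobs tab:
from pyspark.sql import SparkSession

# With databricks-connect configured, this SparkSession points at the remote Databricks cluster.
spark = SparkSession.builder.getOrCreate()

# Any action triggers a Spark job; in the cluster's Spark UI its description will
# contain something like "DB Connect execution from <user-name>".
spark.range(1000).count()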

Related

How to find creation date of databricks interactive cluster?

Is there a way to know the creation date of a Databricks interactive cluster?
I looked at the Configuration tab as well as the JSON of the ARM template but couldn't find it.
Use the cluster event logs, which capture cluster lifecycle events such as creation, termination, configuration edits, and so on.
The cluster event log displays important cluster lifecycle events that are triggered manually by user actions or automatically by Azure Databricks. Such events affect the operation of a cluster as a whole and the jobs running in the cluster.
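If you prefer to query it programmatically, here is a hedged sketch using the Clusters REST API (POST /api/2.0/clusters/events) to look for the creation event - the host, token and cluster ID below are placeholders for your own values:
import requests

DATABRICKS_HOST = "https://<your-workspace>.azuredatabricks.net"  # placeholder
TOKEN = "<personal-access-token>"                                  # placeholder
CLUSTER_ID = "<cluster-id>"                                        # placeholder

# Ask the cluster event log for the earliest CREATING event.
resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/clusters/events",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"cluster_id": CLUSTER_ID, "event_types": ["CREATING"], "order": "ASC", "limit": 1},
)
resp.raise_for_status()
for event in resp.json().get("events", []):
    print(event["timestamp"], event["type"])  # timestamp is in epoch milliseconds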

Namenode uptime metric Ambari server

I have a Spark cluster running on HDInsight. Ambari shows some general metrics on its dashboard, such as NameNode uptime. Where/how can I find the raw data behind such a metric?
Thanks
You should "Enable HDInsight Azure Monitor logs integration" to find the raw data behind any metric that is available in the Ambari UI.
Using Ambari Dashboard:
The Ambari dashboard contains widgets that show a handful of metrics to give you a quick overview of your HDInsight cluster's health. These widgets show metrics such as the number of live DataNodes (worker nodes) and JournalNodes (ZooKeeper nodes), NameNode (head node) uptime, as well as metrics specific to certain cluster types, like YARN ResourceManager uptime for Spark and Hadoop clusters.
Using Azure Monitor:
Azure Monitor logs enable data generated by multiple resources, such as HDInsight clusters, to be collected and aggregated in one place to achieve a unified monitoring experience.
As a prerequisite, you'll need a Log Analytics Workspace to store the collected data. If you haven't already created one, you can follow instructions here: Create a Log Analytics Workspace.
HDInsight clusters include Azure Monitor logs integration, which provides queryable metrics and logs, as well as configurable alerts. This article shows how to use Azure Monitor to monitor your cluster.
As an example, run the Availability rate sample query by selecting Run on that query. This shows the availability rate of each node in your cluster as a percentage. If you have enabled multiple HDInsight clusters to send metrics to the same Log Analytics workspace, you'll see the availability rate for all nodes in those clusters.
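If you want the raw data outside the portal, here is a rough sketch (illustrative, not the exact portal sample query) using the azure-monitor-query package against the Log Analytics workspace the cluster writes to - the workspace ID and the KQL below are assumptions:
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

WORKSPACE_ID = "<log-analytics-workspace-id>"  # placeholder

client = LogsQueryClient(DefaultAzureCredential())

# Count heartbeats per node over the last day as a simple availability signal.
kql = "Heartbeat | summarize heartbeats = count() by Computer | order by heartbeats desc"

result = client.query_workspace(WORKSPACE_ID, kql, timespan=timedelta(days=1))
for table in result.tables:
    for row in table.rows:
        print(row)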
The Ambari agent is likely using os.system() calls from Python. You can do the same with:
ssh user@node "uptime"
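If you want to collect that yourself across nodes, here is a small sketch along the same lines (the node names and SSH user are placeholders):
import subprocess

nodes = ["hn0-mycluster", "hn1-mycluster"]  # hypothetical head-node hostnames

for node in nodes:
    result = subprocess.run(
        ["ssh", f"sshuser@{node}", "uptime"],
        capture_output=True, text=True, check=True,
    )
    print(node, result.stdout.strip())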

How to connect scheduler and workers in airflow?

We have configured a webserver, a scheduler and a VMSS for workers in Airflow. We created Postgres for metadata about all Airflow-related activities like tasks and connections, Redis for orchestrating the workers, and Azure Blob Storage for logging. We created a sample DAG, and when triggered the DAG keeps on running and is never executed. I find my scheduler, workers and webserver up and working fine, but I am not sure why my jobs are not picked up by the scheduler. Is there any connection that I could have missed? Kindly let me know.
Ensure you have set a FERNET_KEY and that it's the same across the webserver, scheduler and workers.
To generate a FERNET_KEY, see:
https://bcb.github.io/airflow/fernet-key
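The linked page boils down to generating a key with the cryptography package and putting the same value in airflow.cfg (or the AIRFLOW__CORE__FERNET_KEY environment variable) on the webserver, scheduler and every worker; a quick sketch:
from cryptography.fernet import Fernet

# Print a new Fernet key; copy this single value to every Airflow component.
print(Fernet.generate_key().decode())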

presto + how to manage presto servers stop/start/status action

We installed the following Presto cluster on Linux Red Hat 7.2:
Presto latest version - 0.216
1 Presto coordinator
231 Presto workers
On each worker machine we can use the following command to verify the status:
/app/presto/presto-server-0.216/bin/launcher status
Running as 61824
and we can also stop/start it with the following:
/app/presto/presto-server-0.216/bin/launcher stop
/app/presto/presto-server-0.216/bin/launcher start
I also searched Google for a UI that can manage Presto status/stop/start, but didn't find anything about this.
It's very strange that Presto doesn't come with some user interface that can show the cluster status and perform stop/start actions if we need to do so.
As far as we know, the only Presto user interface shows status and doesn't offer actions such as stop/start.
In the Presto UI, for example, we can see that only 5 of the 231 Presto workers are active, but this UI doesn't support stop/start actions and doesn't show on which workers Presto isn't active.
So what can we do about it?
It's a very bad idea to have to access each worker machine to see whether Presto is up or down.
Why doesn't Presto have a centralized UI that can do stop/start actions?
Example of what we are expecting from the UI - partial list (screenshot omitted).
Presto currently uses a discovery service where workers announce themselves to join the cluster, so if a worker node is not registered, there is no way for the coordinator or discovery server to know about its presence and/or restart it.
At Qubole, we use an external service alongside presto master that tracks nodes which do not register with discovery service within a certain interval. This service is responsible for removing such nodes from the cluster.
One more thing we do is use the monit service on each of the Presto worker nodes, which ensures that the Presto server is restarted whenever it goes down.
You may have to do something similar for cluster management, as Presto does not provide it right now.
In my opinion, and from my experience managing a prestosql cluster, this comes down to the service discovery patterns in the architecture.
So far, the open source release of prestodb/prestosql uses the following patterns:
Server-side service discovery - a client app like the Presto CLI, or any app using a Presto SDK, only needs to reach the coordinator, without being aware of the worker nodes.
Service registry - a place to keep track of the available instances.
Self-registration - a service instance is responsible for registering itself with the service registry. This is the key part, as it forces several behaviors:
Service instances must be registered with the service registry on startup and unregistered on shutdown
Service instances that crash must be unregistered from the service registry
Service instances that are running but incapable of handling requests must be unregistered from the service registry
So the lifecycle management of each Presto worker is left to the instance itself.
So what can we do about it?
Presto itself provides some observability, such as the HTTP API endpoints /v1/node and /v1/service/presto, to see instance status. Personally, I recommend using another cluster manager like k8s or Nomad to manage the Presto cluster members.
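As a small sketch of that observability, you can ask the coordinator which workers it currently knows about via /v1/node (the coordinator URL below is a placeholder):
import requests

COORDINATOR = "http://presto-coordinator:8080"  # placeholder

# The coordinator returns the worker nodes it has discovered.
for node in requests.get(f"{COORDINATOR}/v1/node").json():
    print(node.get("uri"), "recent failures:", node.get("recentFailures"))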
It's a very bad idea to have to access each worker machine to see whether Presto is up or down.
Why doesn't Presto have a centralized UI that can do stop/start actions?
No opinion on good/bad. Take k8s for example: you can manage all Presto workers as one k8s deployment, with each Presto worker in one pod. You can use liveness, readiness and startup probes to automate the instance lifecycle with a little YAML, e.g. the livenessProbe design of the helm chart stable/presto. And a cluster manager like k8s does provide a web UI, so that you can touch resources and act like an admin. Or you can choose to write more Java code to extend Presto.

Using Apache Falcon to set up data replication across clusters

We have been PoC-ing Falcon for our data ingestion workflow. We have a requirement to use Falcon to set up replication between two clusters (feed replication, not mirroring). The problem I have is that the user ID on cluster A is different from the ID on cluster B. Has anyone used Falcon with this setup? I can't seem to find a way to get this to work.
1) I am setting up a replication from Cluster A => Cluster B
2) I am defining the falcon job on cluster A
At the time of the job setup it looks like I can only define one user ID that owns the job. How do I set up a job where the ID on cluster A is different from the ID on cluster B? Any help would be awesome!!
Apache Falcon uses an 'ACL owner', which should have write access on the target cluster where the data is to be copied.
The source cluster should have WebHDFS enabled, through which the data will be accessed.
So don't schedule the feed on the source cluster if the user does not have the write access that is required for retention.
Hope this helps.
