headnodehost in Azure HDInsight - azure

What is this headnodehost in Azure HDInsight? I set up an HBase cluster, which has head nodes. When I RDP to the cluster and open the Hadoop Name Node status web link from the desktop, the browser opens with the link set to headnodehost:30070. Is headnodehost the same as the head nodes? The hostname command in the RDP session gives me "headnode0" rather than "headnodehost".

Each HDInsight cluster has two head nodes for high availability; this is documented at https://azure.microsoft.com/en-us/documentation/articles/hdinsight-high-availability/. The headnodehost name is an alias that resolves to whichever head node is currently active, which is why the hostname command on the box itself reports headnode0 (or headnode1).
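If you want to sanity-check this from the RDP session yourself, a quick (purely illustrative) way is to resolve the alias and compare it with the local machine name:

    REM On the Windows head node reached via RDP
    hostname
    REM prints headnode0 (or headnode1)
    ping -n 1 headnodehost
    REM resolves the headnodehost alias so you can compare it with the local node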

Related

Is there a way to access internal metastore of Azure HDInsight to fire queries on Hive metastore tables?

I am trying to access the internal Hive metastore tables like HIVE.SDS, HIVE.TBLS etc.
I have an HDInsight Hadoop cluster running with the default internal metastore. From the Ambari screen, I got the advanced settings required for the connection:
javax.jdo.option.ConnectionDriverName, javax.jdo.option.ConnectionURL, javax.jdo.option.ConnectionUserName, as well as the password.
When I try connecting to the SQL Server instance (the internal Hive metastore) from a local machine, I get a message telling me to add my IP address to the allowed list. However, since this Azure SQL server is not visible in the list of Azure SQL databases in the portal, I cannot whitelist my IP.
So I tried logging in to the cluster via the secure shell user (SSHUSER) and accessing the HIVE database from within the cluster using the metastore credentials provided in Ambari. I am still not able to access it. I am using sqlcmd to connect to SQL Server.
Does HDInsight prevent direct access to internal Metastores? Is External Metastore the only way to move ahead? Any leads would be helpful.
Update: I created an external SQL Server instance, used it as an external metastore, and was able to access it programmatically.
No luck with the Internal one yet.
There is no way to access the internal metastores of an HDInsight cluster. The internal metastores live in an internal subscription that only the product group (PG) can access.
If you want more control over your metastores, it is recommended to bring your own "external" metastore.
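As a rough illustration of the external route: once you have your own Azure SQL database serving as the metastore (and your client IP is allowed through its firewall), you can query the standard Hive metastore schema tables directly. The server, database, and credentials below are placeholders:

    # Query the Hive metastore schema in your own (external) Azure SQL database
    sqlcmd -S yourserver.database.windows.net -d yourhivemetastoredb \
           -U yoursqladmin -P 'yourpassword' \
           -Q "SELECT TBL_ID, TBL_NAME, TBL_TYPE FROM TBLS;"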

Provide an external Hive metastore while creating an HDInsight Spark 4.0 cluster from the command line using az

I'm quite new to Azure. I'm trying to create an HDInsight Spark 4.0 cluster using PowerShell. Instead of using the Azure Resource Manager module, I'm trying to do it with the "az" command. Is there a way I can provide external Hive metastore details while creating the cluster using az? When I try from the Azure portal, I see that it asks for Hive/Oozie metastore details, but I cannot find an option to provide the same while creating the cluster from the command line (az).
There is already an Azure SQL server and database created, and I'm able to create the HDInsight cluster by providing the server details. But where do I provide these details on the command line?
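No answer is recorded here, but as a hedged sketch: recent versions of az hdinsight create expose Hive metastore parameters. The flag names below are my best recollection of the documented ones, so confirm them with az hdinsight create --help on your CLI version; all resource names and passwords are placeholders.

    # Sketch only: verify flag names with `az hdinsight create --help`
    az hdinsight create \
        --name my-spark40-cluster \
        --resource-group my-rg \
        --type spark \
        --version 4.0 \
        --http-user admin \
        --http-password 'ClusterPassw0rd!' \
        --ssh-user sshuser \
        --ssh-password 'SshPassw0rd!' \
        --storage-account mystorageaccount \
        --hive-metastore-server-name my-sql-server.database.windows.net \
        --hive-metastore-database-name my-hive-metastore-db \
        --hive-metastore-user-name sqladmin \
        --hive-metastore-password 'SqlPassw0rd!'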

How to submit custom spark application on Azure Databricks?

I have created a small application that submits a Spark job at certain intervals and creates some analytical reports. These jobs can read data from a local filesystem or a distributed filesystem (the fs could be HDFS, ADLS, or WASB). Can I run this application on an Azure Databricks cluster?
The application works fine on an HDInsight cluster because I was able to access the nodes: I kept my deployable jar at one location and started it using a start script; similarly, I could stop it using a stop script that I prepared.
One thing I found is that Azure Databricks has its own file system, ADFS. I can also add support for this file system, but then will I be able to deploy and run my application as I did on the HDInsight cluster? If not, is there a way I can submit jobs from an edge node, my HDInsight cluster, or any other on-prem cluster to an Azure Databricks cluster?
Have you looked at Jobs? https://docs.databricks.com/user-guide/jobs.html You can submit jars via spark-submit just like on HDInsight.
The Databricks file system is DBFS; ABFS is used for Azure Data Lake. You should not need to modify your application for these: the file paths will be handled by Databricks.
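As a hedged sketch of the spark-submit route through the Jobs API (payload shape as documented for Jobs API 2.0; the workspace URL, token, jar path, and class name are placeholders):

    # Create a Databricks job that runs your jar with spark-submit semantics (Jobs API 2.0)
    curl -X POST "https://<your-databricks-instance>/api/2.0/jobs/create" \
      -H "Authorization: Bearer <personal-access-token>" \
      -H "Content-Type: application/json" \
      -d '{
        "name": "analytics-report",
        "new_cluster": {
          "spark_version": "7.3.x-scala2.12",
          "node_type_id": "Standard_DS3_v2",
          "num_workers": 2
        },
        "spark_submit_task": {
          "parameters": ["--class", "com.example.ReportJob", "dbfs:/jars/my-app.jar"]
        }
      }'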

How to submit Spark job to AWS EC2 cluster?

I am new to AWS EC2 and need to know how I can submit my Spark job to an AWS EC2 Spark cluster. In Azure we can submit the job directly through IntelliJ IDEA with the Azure plugin.
You can submit a Spark job easily with the spark-submit command. Refer to http://spark.apache.org/docs/latest/submitting-applications.html. An example invocation is sketched after the options below.
Options:
1) Log in to the master or another driver/gateway node and use spark-submit to submit the job through YARN/Mesos/etc.
2) Use spark-submit in cluster deploy mode from any machine with sufficient ports and firewall access (this may require configuration, such as client config files from Cloudera Manager for a CDH cluster).
3) Use a server setup like Livy (open source through Cloudera; Azure HDInsight uses and contributes to it) or perhaps the Thrift server. Livy (Livy.io) is a nice, simple REST service that also has language APIs for Scala/Java to make it extra easy to submit jobs (and run interactive persisted sessions!).
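A minimal sketch of options 1 and 3, assuming a YARN cluster and a Livy server on its default port 8998; hosts, paths, and class names are placeholders:

    # Option 1: from the master/gateway node, submit to YARN in cluster mode
    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --class com.example.MyJob \
      hdfs:///apps/my-job.jar arg1 arg2

    # Option 3: submit the same jar as a Livy batch over REST
    curl -X POST "http://<livy-host>:8998/batches" \
      -H "Content-Type: application/json" \
      -d '{"file": "hdfs:///apps/my-job.jar", "className": "com.example.MyJob", "args": ["arg1", "arg2"]}'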

Hive ODBC connecting to HDInsight

I'm setting up a SQL Server VM in Azure and I want it to be able to connect to Hive on an HDInsight cluster. I'm trying to set up the ODBC DSN and I'm unsure what the various settings are and how to find them in the Azure portal:
Hostname
Username
Password (can I reset this if I've forgotten it?)
Cheers, Chris.
Hostname: the HDInsight cluster address (clustername.azurehdinsight.net)
Username: the HDInsight cluster username
Password: the HDInsight cluster password
I don't think you can recover the password. You can delete the HDInsight cluster and create another one. Because Hadoop jobs are batch jobs and an HDInsight cluster usually contains multiple nodes, people usually create a cluster, run a MapReduce job, and delete the cluster right after the job is completed. It is too costly to leave an HDInsight cluster sitting idle in the cloud.
Because an HDInsight cluster uses Windows Azure Blob storage for data storage, deleting the cluster will not impact the data.
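For reference, the DSN fields typically map to values along these lines when using the Microsoft Hive ODBC Driver against HDInsight (field names vary by driver version, so treat this as a sketch; all values are placeholders):

    # Typical Hive ODBC DSN values for HDInsight (sketch; verify against your driver's docs)
    Host:      <clustername>.azurehdinsight.net
    Port:      443
    Database:  default
    Mechanism: Windows Azure HDInsight Service
    Username:  <cluster login user, e.g. admin>
    Password:  <cluster login password>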
