Accessing Spark in Azure HDInsight via JDBC

I'm able to connect to Hive externally using the following URL for an HDInsight cluster in Azure.
jdbc:hive2://<host>:443/default;transportMode=http;ssl=true;httpPath=/
However, I'm not able to find such a string for Spark. The documentation says the port is 10002, but it's not open externally. How do I connect to the cluster to run SparkSQL queries through JDBC?

There is not one available. But you can vote for the feature at https://feedback.azure.com/forums/217335-hdinsight/suggestions/14794632-create-a-jdbc-driver-for-spark-on-hdinsight.

HDInsight is deployed behind a gateway, which is why clusters expose only HTTPS (port 443) and SSH (ports 22 and 23) out of the box. Unless you deploy the cluster in a virtual network (vnet), there is no other way to communicate with it. So to reach the Spark thrift server from outside, you use port 443 instead of port 10002. If you deploy the cluster in a vnet, you can also access the thrift server directly via the IP address of the headnode it is running on and the standard port 10002. See also the public and non-public ports in the documentation.
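To make the two cases in this answer concrete, here is a minimal Python sketch that only assembles the connection string. The function name, the `via_gateway` flag, and the example headnode IP are made up for illustration. The gateway form copies the parameters from the Hive URL in the question; the `httpPath` that the gateway expects for the Spark thrift server is not given in this thread, so it is left as a parameter. For the direct in-vnet case the session parameters depend on how the thrift server is configured, so the sketch leaves them out.

```python
def build_hive2_jdbc_url(host, port=443, database="default",
                         http_path="/", via_gateway=True):
    """Assemble a hive2-style JDBC URL (illustrative helper, not an official API).

    via_gateway=True  -> HTTPS through the HDInsight gateway on port 443,
                         using the same parameters as the Hive URL in the question.
    via_gateway=False -> direct host:port inside a vnet; no session parameters
                         are appended because they depend on the server setup.
    """
    base = f"jdbc:hive2://{host}:{port}/{database}"
    if not via_gateway:
        return base
    return f"{base};transportMode=http;ssl=true;httpPath={http_path}"


# From outside the cluster: port 443 through the gateway.
external_url = build_hive2_jdbc_url("<host>")

# From inside the vnet: headnode IP (hypothetical) and the standard port 10002.
internal_url = build_hive2_jdbc_url("10.0.0.4", port=10002, via_gateway=False)
```

The point of the split is that only the host, port, and transport settings change between the two access paths; the database part of the URL stays the same.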

Related

Connect to Azure HDInsight Kafka cluster from public network / local machine

Is there a way to connect to an Azure HDInsight Kafka cluster from the public network without using a VPN?
Is it only possible to connect to the Azure Kafka cluster from within the Azure network?
Thanks
You don't need a VPN. Just open the firewall ports from Azure.

Cassandra inter DC sync over VPN on GCP

I have a VPN between the company network 172.16.0.0/16 and GCP 10.164.0.0/24.
On GCP there is a Cassandra cluster running with 3 instances. These instances get dynamic local IP addresses, for example 10.4.7.4, 10.4.6.5, 10.4.3.4.
My issue: from the company network I cannot access the 10.4.x addresses, as the tunnel works only for 10.164.0.0/24.
I tried setting up an LB service on 10.164.0.100 with the Cassandra nodes behind it. This doesn't work: when I configure that IP address as a seed node on the local cluster, it gets a reply from one of the 10.4.x IP addresses, which it doesn't have in its seed list.
I need advice on how to set up inter-DC sync in this scenario.
The IP addresses that Kubernetes assigns to Pods and Services are internal, cluster-only addresses that are not accessible from outside the cluster. Some CNI plugins can connect in-cluster addresses to external networks, but I don't think that is a good idea in your case.
You need to expose Cassandra using a Service of type NodePort or LoadBalancer. There is another answer with the same solution on the Kubernetes GitHub.
If you add a Service of type NodePort, Cassandra will be available on a selected port on all Kubernetes nodes.
If you choose LoadBalancer, Kubernetes will create a cloud load balancer for you, which will be the entry point for Cassandra. Because you have a VPN to your VPC, I think you will need an internal load balancer.

WebUI access for workers in spark

We have a cluster built with Docker Swarm.
The cluster consists of 1 manager and 3 worker nodes.
It can be seen as follows:
We run Apache Spark on the cluster. It consists of a master and four workers. It is shown as follows on the master web UI:
The problem is that I cannot access the details of a worker node. The link wants to connect to an IP (10.0.0.5:8081), but I cannot access that link from my local machine.
You need to publish the port of the Spark web UI service and access the web UI at localhost:8081 (if you bind local port 8081).
For example, in the docker-compose.yml file, add a ports entry to the Spark web UI service, as described here:
https://docs.docker.com/compose/compose-file/#ports
The IP you specified (10.0.0.5) belongs to the subnet created by Docker; you cannot reach that IP from your machine.

Access Cassandra Node on the Azure Cloud from outside

I have created a Linux VM with a single node Cassandra cluster installed.
cassandra.yaml has the following:
seeds:
listen_address:
rpc_address:
A netstat -an check shows that all required ports (i.e. 9160, 9042) are up and listening.
I am trying to connect my application, which runs outside the Azure cloud, to the Cassandra cluster in the cloud. It looks like the connection from the outside host to the Cassandra node in Azure is being blocked.
I wonder whether there is a real restriction on accessing an Azure VM from outside the network. Is there a way to access this Cassandra node from outside?
It would be very nice if someone could answer my question.
Thank you!
You need to go to the "Endpoints" of your virtual machine:
At the bottom click on "Add", and add new endpoints for these ports.
Then you will need to manage the ACL for each endpoint, defining the allowed and blocked IP ranges.
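Once the endpoints and ACLs are in place, a quick way to confirm from the client side that the ports are actually reachable is a plain TCP connect test. This is a minimal Python sketch using only the standard library; the function names and the host placeholder are illustrative, and a successful connect only shows that the port is open, not that Cassandra itself is healthy.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host, ports, timeout=3.0):
    """Map each port to True/False depending on whether it accepts connections."""
    return {port: is_port_open(host, port, timeout) for port in ports}


# Example call (placeholder host; 9042 and 9160 are the ports from the question):
# print(check_ports("<public-ip-of-the-vm>", [9042, 9160]))
```

If a port shows as closed from outside but netstat on the VM shows it listening, the block is most likely at the endpoint or ACL level rather than in Cassandra.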
Keep in mind that if the internal IP used by the virtual machine differs from the external (public) IP used by the client, then, depending on the driver, you may need to teach it how to do address translation. Otherwise, the cluster will report only internal IPs in response to the discovery request, and those will obviously not be accessible from outside.
For this reason, and from a security perspective, I would recommend setting up the Cassandra cluster inside a virtual network and accessing it via VPN.
There is a comprehensive tutorial on how to do it here: http://azure.microsoft.com/en-us/documentation/articles/virtual-machines-linux-nodejs-running-cassandra/
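To illustrate the address translation mentioned in this answer: the idea is a lookup from the internal IP that the node reports to the public IP that the client can reach. Below is a minimal, driver-independent Python sketch. The class name and the IP addresses are invented for illustration. As far as I know, the DataStax Python driver accepts an object with a `translate(addr)` method through the `address_translator` argument of `Cluster`, but treat that as an assumption and check your driver's documentation.

```python
class StaticAddressTranslator:
    """Translate internal node addresses to public ones via a fixed mapping.

    Illustrative sketch only: the mapping has to be maintained by hand and
    must be updated whenever a node's internal or public IP changes.
    """

    def __init__(self, internal_to_public):
        # Copy the mapping so later changes by the caller do not affect us.
        self._mapping = dict(internal_to_public)

    def translate(self, addr):
        # Unknown addresses are returned unchanged.
        return self._mapping.get(addr, addr)


# Hypothetical addresses: internal VM IP -> public endpoint IP.
translator = StaticAddressTranslator({"10.0.0.4": "203.0.113.10"})
public_addr = translator.translate("10.0.0.4")
```

A static mapping is fine for a single node like the one in the question. With several nodes or changing IPs, the VPN setup recommended above avoids the need for translation altogether.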

Opening a port on HDInsight cluster on Azure

I have a Microsoft Azure HDInsight cluster.
I RDP into the node and start an application that binds to port 8080. I would like to be able to connect to this application from outside the cluster.
I have my cluster connection string (https://xxxxx.azurehdinsight.net); however, when I try to connect to it, the connection times out.
I believe this is because I have not opened port 8080 to the public. How can I do this, given that under the cluster I only have Hadoop Services and username....
At this point in time, we don't allow you to control / open additional network ports on an HDInsight cluster.
You can deploy an HDInsight cluster into an Azure virtual network if you'd like another machine in Azure to have access to all of the ports/nodes on the cluster. We've documented how to deploy into a vnet in this article.
