Spark standalone master behind the VPN - apache-spark

I have a Spark standalone master running on a machine in an AWS VPC, bound to its private IP address. I'm able to run Spark jobs from a machine inside the VPC, but not from my laptop, which connects to the cluster via VPN. I checked the executor logs on a Spark worker and found "Cannot receive any reply in 120 seconds." It looks like a networking issue.
Does anybody know how to solve that?
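For reference, this is roughly the kind of setup being described; a minimal sketch, assuming a standalone master configured through conf/spark-env.sh (the private IP, port, and jar path below are placeholders, not values from the question):

    # conf/spark-env.sh on the master node -- 10.0.1.15 stands in for the private VPC address
    export SPARK_MASTER_HOST=10.0.1.15
    export SPARK_MASTER_PORT=7077

    # Clients then point jobs at the master's private address
    ./bin/spark-submit --master spark://10.0.1.15:7077 \
        --class org.apache.spark.examples.SparkPi \
        examples/jars/spark-examples_*.jar 100

Note that in client deploy mode the executors also have to connect back to the driver (spark.driver.host / spark.driver.port), which is one place a VPN-only client can run into a timeout like the one above.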

Related

My Kubernetes pods don't reach a VM in the same network

I had to set up a cluster on CentOS virtual machines, and I did it following what I found on the internet. At first it seemed to be working well, but recently I've been running into a lot of trouble with Istio and communications. My main problem is that I can't reach a VM that hosts the database: the master node and the worker nodes can reach and telnet to the database VM, but the pods can't. Do I have to set some Kubernetes network config?
Thanks (:

Failing to create Azure Databricks cluster because of unreachable instances

I'm trying to create a cluster in Azure Databricks and I'm getting the following error message:
Resources were not reachable via SSH. If the problem persists, this usually indicates a network environment misconfiguration. Please check your cloud provider configuration, and make sure that Databricks control plane can reach Spark clusters instances.
I have the default configuration:
Cluster mode: Standard
Pool: None
Runtime version: 5.5 LTS
Autoscaling enabled
Worker Type: Standard_DS3_v2
Driver Type: Standard_DS3_v2
From Log Analytics I can see that Azure tried to create the virtual machines and then, for no stated reason (I suppose because they were unreachable), deleted all of them.
Did anyone face such an issue?
If the issue is temporary, it may be caused by the virtual machine hosting the driver going down or by a networking issue, since Azure Databricks was able to launch the cluster but then lost the connection to the instance hosting the Spark driver (referring to this). You could try to remove the cluster and create it again.
If the problem persists, this may happen when you have an Azure Databricks workspace deployed to your own VNet. If the virtual network where the workspace is deployed is already peered or has an ExpressRoute connection to on-premises resources, the Azure Databricks control plane cannot make an SSH connection to the cluster nodes when it attempts to create a cluster. You could add a user-defined route (UDR) to give the Azure Databricks control plane SSH access to the cluster instances.
For detailed UDR instructions, see Step 3: Create user-defined routes and associate them with your Azure Databricks virtual network subnets. For more VNet-related troubleshooting information, see Troubleshooting.
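For illustration, a user-defined route like the one described above can be created and attached with the Azure CLI. This is only a sketch: the resource group, VNet, and subnet names are placeholders, and <control-plane-ip> has to be taken from the region-specific list in the Azure Databricks documentation:

    # Placeholders: my-rg, my-databricks-vnet, public-subnet
    az network route-table create --resource-group my-rg --name databricks-udr
    az network route-table route create --resource-group my-rg --route-table-name databricks-udr \
        --name to-control-plane --address-prefix <control-plane-ip>/32 --next-hop-type Internet
    az network vnet subnet update --resource-group my-rg --vnet-name my-databricks-vnet \
        --name public-subnet --route-table databricks-udr

The same route table would normally be associated with both Databricks subnets, as described in the linked step.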
Hope this could help you.
Issue: Instances Unreachable: Resources were not reachable via SSH.
Possible cause: traffic from control plane to workers is blocked. If you are deploying to an existing virtual network connected to your on-premises network, review your setup using the information supplied in Connect your Azure Databricks Workspace to your On-Premises Network.
Reference: Azure Databricks - Troubleshooting
Hope this helps.

Spark with Kubernetes connecting to pod id, not address

We have a k8s deployment of several services including Apache Spark. All services seem to be operational. Our application connects to the Spark master to submit a job using the k8s DNS service for the cluster, where the master is called spark-api, so we use master=spark://spark-api:7077 with spark.submit.deployMode=cluster. We submit the job through the API, not via the spark-submit script.
This will run the "driver" and all "executors" on the cluster and this part seems to work but there is a callback to the launching code in our app from some Spark process. For some reason it is trying to connect to harness-64d97d6d6-4r4d8, which is the pod ID, not the k8s cluster IP or DNS.
How could this pod ID be getting into the system? Spark somehow seems to think it is the address of the service that called it. Needless to say any connection to the k8s pod ID fails and so does the job.
Any idea how Spark could think the pod ID is an IP address or DNS name?
BTW if we run a small sample job with master=local all is well, but the same job executed with the above config tries to connect to the spurious pod ID.
BTW2: the k8s DNS for the calling pod is harness-api
You could consider using a headless Service for the harness-64etcetc Pod in order to accomplish backward DNS discovery. It will create an endpoint for the relevant Service by matching the appropriate selector on your application Pod, and as a result an A record is expected to be added to the Kubernetes DNS configuration, as in the sketch below.
Eventually, I found the related GitHub issue #266, which can probably provide some useful information for further investigation.
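A minimal sketch of such a headless Service; the Service name, the app: harness selector label, and the port are placeholders rather than values from the question:

    apiVersion: v1
    kind: Service
    metadata:
      name: harness-headless      # placeholder name
    spec:
      clusterIP: None             # headless: DNS resolves directly to the matching pod IPs
      selector:
        app: harness              # assumed label on the harness-... pods
      ports:
        - port: 9090              # placeholder callback port
          targetPort: 9090

With the selector in place, the Service gets Endpoints for the matching pods and an A record is published in the cluster DNS, which is the backward discovery referred to above.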

Accessing Spark in Azure HDInsights via JDBC

I'm able to connect to hive externally using the following URL for a HDInsight cluster in Azure.
jdbc:hive2://<host>:443/default;transportMode=http;ssl=true;httpPath=/
However, I'm not able to find such a string for Spark. The documentation says the port is 10002, but it's not open externally. How do I connect to the cluster to run Spark SQL queries through JDBC?
There is not one available. But you can vote for the feature at https://feedback.azure.com/forums/217335-hdinsight/suggestions/14794632-create-a-jdbc-driver-for-spark-on-hdinsight.
HDInsight is deployed with a gateway. This is the reason why HDInsight clusters out of the box only allow HTTPS (port 443) and SSH (ports 22, 23) communication to the cluster. If you don't deploy the cluster in a virtual network (VNet), there is no other way to communicate with HDInsight clusters. So if you want to access the Spark Thrift Server, port 443 is used instead of port 10002. If you deploy the cluster in a VNet, you could also access the Thrift Server via the IP address it is running on (one of the head nodes) and the standard port 10002. See also public and non-public ports in the documentation.
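As an illustration, connecting to the Spark Thrift Server through the gateway would follow the same pattern as the Hive URL above. This is only a sketch: the httpPath for Spark is shown here as /sparkhive2, which is an assumption that should be checked against the HDInsight documentation:

    beeline -u "jdbc:hive2://<clustername>.azurehdinsight.net:443/default;ssl=true;transportMode=http;httpPath=/sparkhive2" \
        -n admin -p '<cluster-login-password>'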

WebUI access for workers in spark

We have a cluster built with Docker Swarm. The cluster consists of 1 manager and 3 worker nodes, as shown in the screenshot. We run Apache Spark on the cluster; it consists of a master and four workers, as shown on the master web UI.
The problem is that I cannot access the details of a worker node. The link wants to connect to an IP (10.0.0.5:8081), but I cannot reach that address from my local machine.
You need to publish the port of the Spark web UI service and access the web UI via localhost:8081 (if you are binding local port 8081).
For example, in the docker-compose.yml file, put something like the sketch below in the Spark web UI service.
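A minimal sketch of such a mapping; the service name and image are placeholders, and only the ports section matters here:

    services:
      spark-worker:               # placeholder service name
        image: bitnami/spark      # placeholder image
        ports:
          - "8081:8081"           # publish the worker web UI so localhost:8081 works from your machine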
https://docs.docker.com/compose/compose-file/#ports
The IP you specified (10.0.0.5) is on a subnet created by Docker; you cannot access that IP from your machine.
