Spark workers not able to connect to Spark master service - apache-spark

I have deployed Spark on a Kubernetes cluster, following this guide: https://testdriven.io/blog/deploying-spark-on-kubernetes/
However, the Spark workers are not able to join the Spark master service; they fail to connect to the master.
Expected behaviour: the Spark workers should be able to connect to the Spark master service.
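In setups like the linked tutorial's, this usually comes down to the worker not resolving the master Service name, the Service having no endpoints, or the worker using a master URL that doesn't match what the master advertises. A debugging sketch (the service name `spark-master`, port 7077, and pod names are assumptions based on the tutorial, not taken from your cluster):

```shell
# Debugging sketch, assuming the names used in the linked tutorial
# (a Service called spark-master exposing port 7077).

# 1) Confirm the master Service exists and actually has endpoints:
kubectl get svc spark-master
kubectl get endpoints spark-master

# 2) From inside a worker pod, check that the Service DNS resolves
#    and that the master port is reachable:
kubectl exec -it <worker-pod> -- nslookup spark-master
kubectl exec -it <worker-pod> -- nc -zv spark-master 7077

# 3) The worker must be started against the URL the master advertises;
#    the hostname has to match SPARK_MASTER_HOST on the master pod:
/opt/spark/sbin/start-worker.sh spark://spark-master:7077   # start-slave.sh on older Spark
```

If step 2 fails, the problem is the Service or namespace, not Spark itself; if step 3 fails with a connection refused while step 2 succeeds, compare the worker's master URL with the one printed in the master pod's logs.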

Related

SparkSession Connect to Databricks Azure

I'm using Maven and Scala to create a Spark application that needs to connect to a cluster on Azure Databricks.
How can I point my SparkSession at the Databricks cluster?
I looked at databricks-connect, but it loads some jar files using sbt, and I don't understand exactly how it achieves that connectivity.
My use case is to run a Spark job programmatically on the Databricks cluster upon request, so I need to be able to connect to it from my application.
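The way classic databricks-connect achieves this is by replacing the local Spark jars with client jars that route a plain SparkSession to the remote cluster. A sketch of the workflow (the version and all configured values are placeholders you'd replace with your own):

```shell
# Sketch of the classic databricks-connect setup (values are placeholders).
# The package version should match your Databricks cluster's runtime version.
pip install -U databricks-connect

# Interactive prompt for workspace host, token, cluster id, org id, and port:
databricks-connect configure

# Sanity-check that a session can reach the cluster:
databricks-connect test

# For a Maven/Scala project, point your Spark dependencies at the client jars:
databricks-connect get-jar-dir
```

After this, an ordinary `SparkSession.builder.getOrCreate()` in your application is transparently backed by the Databricks cluster, which is what makes programmatic, on-request job execution possible without sbt being involved.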

Can we use Spark on Kubernetes in a PROD environment with the beta edition?

We are planning to deploy all Spark batch and streaming jobs on Kubernetes (as cluster services). I want to know whether we can deploy all Spark jobs using the Spark Operator manifest file on a PROD Kubernetes server.
You can take a look at https://cloud.google.com/dataproc/docs/concepts/jobs/dataproc-gke, which describes running a Dataproc cluster on GKE. It is a managed service from Google Cloud.
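For reference, the Spark Operator route mentioned in the question uses a `SparkApplication` custom resource. A minimal manifest sketch (image, class, jar path, namespace, and service account are all placeholder assumptions):

```yaml
# Minimal SparkApplication sketch for the Kubernetes Operator for Apache Spark.
# All names, the image, and the jar path are placeholders.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: my-batch-job
  namespace: spark-jobs
spec:
  type: Scala
  mode: cluster
  image: "my-registry/spark:3.5.0"
  mainClass: com.example.MyBatchJob
  mainApplicationFile: "local:///opt/spark/jars/my-batch-job.jar"
  sparkVersion: "3.5.0"
  restartPolicy:
    type: OnFailure
    onFailureRetries: 3
  driver:
    cores: 1
    memory: "1g"
    serviceAccount: spark-operator-spark
  executor:
    instances: 2
    cores: 2
    memory: "2g"
```

Applying this with `kubectl apply -f` lets the operator manage submission, retries, and cleanup, which is what you'd rely on for both batch and streaming jobs in PROD.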

how to remote submit spark jobs to Azure HDInsights cluster without Livy

I want to submit a Spark job to an Azure HDInsight cluster from Airflow. I don't want to use Livy because it doesn't accumulate logs in Airflow. Is it possible to submit the job remotely? SSH is one option, but if the job is long-running the connection might break. Is there any other option?
Note - the Airflow cluster is remote; it is not colocated with the Spark cluster.
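One alternative to SSH worth considering: install a Spark client plus the cluster's client configs on the Airflow workers and submit in cluster deploy mode, so the driver runs on the cluster and the submitting process surviving is not required. A sketch (paths, class name, and jar location are assumptions):

```shell
# Sketch: submit from the Airflow host in cluster deploy mode (placeholders throughout).
# Requires a Spark client and the cluster's YARN/HDFS client configs
# (core-site.xml, yarn-site.xml) on the Airflow worker.
export HADOOP_CONF_DIR=/etc/hadoop/conf

# With --deploy-mode cluster the driver runs on the cluster, so a dropped
# client connection does not kill a long-running job.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyJob \
  --name airflow-submitted-job \
  wasbs:///jobs/my-job.jar
```

Afterwards you can poll with `yarn application -status <appId>` and pull logs back with `yarn logs -applicationId <appId>`, which addresses the log-accumulation concern without Livy.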

Spark Deployment in Azure cloud

Is it possible to deploy Spark code in the Azure cloud without the YARN component? Thanks in advance.
Yes, you can deploy an Apache Spark cluster in Azure HDInsight without YARN.
Spark clusters in HDInsight include the following components, available on the clusters by default:
1) Spark Core, including Spark SQL, the Spark Streaming APIs, GraphX, and MLlib
2) Anaconda
3) Livy
4) Jupyter notebook
5) Zeppelin notebook
Spark clusters on HDInsight also provide an ODBC driver for connecting to them from BI tools such as Microsoft Power BI and Tableau.
Refer to the following sites for more information,
Create an Apache Spark cluster in Azure HDInsight
Introduction to Spark on HDInsight
I don't think it is possible to deploy an HDInsight cluster without YARN. Refer to the HDInsight documentation:
https://learn.microsoft.com/en-sg/azure/hdinsight/hdinsight-hadoop-introduction
https://learn.microsoft.com/en-sg/azure/hdinsight/hdinsight-component-versioning
YARN is the resource manager for Hadoop. Is there any particular reason you don't want to use YARN with an HDInsight Spark cluster?
If you want to use standalone mode, you can change the master URL when submitting the job with the spark-submit command.
I have some examples in my repo of spark-submit both in local mode and on an HDInsight cluster:
https://github.com/NileshGule/learning-spark
You can refer to
local mode : https://github.com/NileshGule/learning-spark/blob/master/src/main/java/com/nileshgule/movielens/MovieLens.md
HDInsight Spark cluster : https://github.com/NileshGule/learning-spark/blob/master/Azure.md
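To illustrate the point about changing the master URL, a sketch of the two submissions (class name, host, and jar path are placeholders, not taken from the linked repo):

```shell
# Sketch: the --master flag selects the cluster manager (all paths are placeholders).

# Local mode, using all cores on this machine:
spark-submit --master "local[*]" --class com.example.App target/app.jar

# Spark standalone cluster (no YARN), pointing at the standalone master's host:port:
spark-submit --master spark://master-host:7077 --class com.example.App target/app.jar
```

The application code is identical in both cases; only the `--master` value decides where the executors run.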

How to submit Spark job to AWS EC2 cluster?

I am new to AWS EC2 and need to know how I can submit my Spark job to an AWS EC2 Spark cluster. In Azure we can submit the job directly from IntelliJ IDEA with the Azure plugin.
You can submit a Spark job easily through the spark-submit command. Refer to http://spark.apache.org/docs/latest/submitting-applications.html
Options:
1) Log in to the master or another driver gateway node and use spark-submit to submit the job through YARN/Mesos/etc.
2) Use spark-submit in cluster deploy mode from any machine with sufficient ports and firewall access (this may require configuration, such as client config files from Cloudera Manager for a CDH cluster).
3) Use a server setup like Livy (open source via Cloudera; Microsoft Azure HDInsight uses and contributes to it) or perhaps the Spark Thrift Server. Livy (Livy.io) is a nice, simple REST service that also has Scala/Java client APIs to make it extra easy to submit jobs (and run interactive persisted sessions!).
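For option 3, Livy's batch endpoint takes a JSON body describing the job. A sketch of building that request body in Python (the Livy URL, jar path, and class name are placeholder assumptions; the actual POST is left commented out since it needs network access to a Livy server):

```python
import json

# Hypothetical endpoint -- replace with your cluster's Livy server.
LIVY_URL = "http://livy-server:8998/batches"

# Livy's POST /batches body: 'file' is the application jar or Python file,
# 'className' the entry point, 'args' the application arguments,
# 'conf' extra Spark configuration.
payload = {
    "file": "wasbs:///jobs/my-spark-job.jar",   # assumed jar location
    "className": "com.example.MySparkJob",      # assumed entry point
    "args": ["--date", "2024-01-01"],
    "conf": {"spark.executor.memory": "4g"},
}

body = json.dumps(payload).encode("utf-8")
print(body.decode("utf-8"))

# To actually submit (requires network access to the Livy server):
# import urllib.request
# req = urllib.request.Request(
#     LIVY_URL, data=body,
#     headers={"Content-Type": "application/json"}, method="POST")
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

The response to a successful POST includes a batch id, which you can then poll via `GET /batches/{id}` to track the job's state and pull its logs.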
