I'm trying to submit a Spark batch job using Apache Livy.
I am running my Spark cluster and the Livy service on OpenShift.
The Spark cluster is in standalone mode, and Livy runs Spark in cluster deploy mode.
I've configured Livy to point at the Spark cluster's master service:
livy.spark.master = spark://spark.spark.svc:7077
When I try to run a Scala Spark job using the POST /batches method, I get the exception:
Unable to connect to server spark://spark.spark.svc:7077
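For context, here is a minimal sketch of the batch submission I am attempting (the Livy URL, jar path, and class name are placeholders for my deployment):

import json
import requests

# Placeholder Livy endpoint inside the cluster
LIVY_URL = "http://livy.spark.svc:8998"

# The jar path must be visible to the Spark cluster, e.g. baked into the image
payload = {
    "file": "local:///opt/jobs/my-spark-job.jar",
    "className": "com.example.MyJob",
    "name": "livy-batch-example",
}

resp = requests.post(
    f"{LIVY_URL}/batches",
    data=json.dumps(payload),
    headers={"Content-Type": "application/json"},
)
print(resp.status_code, resp.json())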
Has anyone experienced this issue?
The main problem is that we are unable to run Spark in client mode.
Whenever we try to connect to Spark on YARN from a Kubeflow notebook, we get the following error:
`Py4JJavaError: An error occurred while calling o81.showString.
: org.apache.spark.SparkException: Job 0 cancelled because SparkContext was shut down
at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:932)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:930)`
It seems we have the exact same issue here:
Up to now:
we have managed to submit Spark jobs from the notebook.
Also, it is possible to connect in cluster mode from the Kubeflow notebook.
We have also managed to run a Spark session with the Python shell on one of the worker servers on Kubernetes. We are able to connect to the remote edge node, which is managed by Cloudera.
We have checked that there is no network issue between the Hadoop clusters and the Kubernetes clusters.
However, we still cannot run interactive Spark from the Jupyter notebook.
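For reference, here is a sketch of the client-mode session we are trying to create (the service name and ports are placeholders; in client mode the YARN executors must be able to reach back to the driver running in the notebook pod, which is why the driver address and ports are pinned):

from pyspark.sql import SparkSession

# Sketch, assuming the notebook pod is reachable from the YARN nodes
spark = (
    SparkSession.builder
    .master("yarn")
    .config("spark.submit.deployMode", "client")
    # Address the executors use to reach the driver in this pod (placeholder)
    .config("spark.driver.host", "notebook-svc.kubeflow.svc.cluster.local")
    .config("spark.driver.bindAddress", "0.0.0.0")
    # Fixed ports so they can be exposed on the pod/service
    .config("spark.driver.port", "29413")
    .config("spark.blockManager.port", "29414")
    .getOrCreate()
)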
I am trying to connect to Snowflake from an EMR cluster using pyspark.
I am using these two jars in spark-submit:
snowflake-jdbc-3.5.2.jar
spark-snowflake_2.11-2.7.0-spark_2.4.jar
But it is failing with a connection timeout error.
I have the correct proxy configured for the EMR cluster. From the same EC2 instance (the EMR master),
I am able to connect to Snowflake using snowsql and the Python connector.
I am not sure why it is timing out for pyspark.
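For reference, here is a minimal sketch of the read I am attempting (the account, credentials, and table are placeholders):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("snowflake-test").getOrCreate()

# Placeholder connection options -- substitute real account details
sf_options = {
    "sfURL": "myaccount.snowflakecomputing.com",
    "sfUser": "my_user",
    "sfPassword": "my_password",
    "sfDatabase": "MY_DB",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "MY_WH",
}

df = (
    spark.read.format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("dbtable", "MY_TABLE")
    .load()
)
df.show()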
This looks like a network issue. You can use our SnowCD tool to run connectivity diagnostics:
https://docs.snowflake.com/en/user-guide/snowcd.html
Below are the commands I tried when running through the EMR shell.
pyspark --packages net.snowflake:snowflake-jdbc:3.6.27,net.snowflake:spark-snowflake_2.12:2.4.14-spark_2.4
spark-submit --packages net.snowflake:snowflake-jdbc:3.8.0,net.snowflake:spark-snowflake_2.11:2.4.14-spark_2.4 SparkConnPythonWithCert.py
spark-shell --packages net.snowflake:snowflake-jdbc:3.8.0,net.snowflake:spark-snowflake_2.11:2.4.14-spark_2.4
We are running Spark on Mesos in client mode.
We also have a Spark history server.
Spark event logs can be seen fine in the Spark history server.
But how can we get the Spark executor logs from the Spark UI or the Spark history server?
Scenario:
I have a Spark cluster and I also want to use Livy.
I am new to Livy.
Problem:
I built my Spark cluster using Docker Swarm, and I will also create a service for Livy.
Can Livy communicate with an external Spark master and send jobs to it? If so, what configuration needs to be done? Or does Livy have to be installed on the Spark master node?
I think this is a little late, but I hope it helps you.
You can use Docker to send jobs via Livy, and you can also send jobs through the Livy REST API.
The Livy server can be outside of the Spark cluster; you only need a configuration file for Livy that points to your Spark cluster.
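For illustration, a minimal livy.conf sketch pointing Livy at an external standalone master (the hostname is a placeholder):

# livy.conf -- point Livy at the external standalone master
livy.spark.master = spark://spark-master.example.com:7077
# Run the driver inside the cluster rather than in the Livy process
livy.spark.deploy-mode = cluster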
It looks like you are running Spark standalone. The easiest way to configure Livy is to have it live on the Spark master node. If you already have YARN on your cluster machines, you can install Livy on any node and run Spark applications in yarn-cluster or yarn-client mode.
I am able to submit a Spark job on a Linux server using the console. But is there any API or framework that makes it possible to submit Spark jobs to the server programmatically?
You can use port 7077 to submit Spark jobs to your Spark cluster directly from your application instead of using spark-submit:
import org.apache.spark.sql.SparkSession
val spark = SparkSession
  .builder()
  .appName("my-app")  // placeholder application name
  .master("spark://master-machine:7077")
  .getOrCreate()
You can look into the Livy server. It is GA in the Hortonworks and Cloudera distributions of Apache Hadoop, and we have had good success with it; its documentation is good enough to get started with. Spark jobs start almost instantaneously when submitted via Livy, since it keeps multiple SparkContexts running inside it.
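To illustrate, here is a sketch of an interactive session through the Livy REST API (the host is a placeholder); because the SparkContext stays alive inside Livy, each statement runs without JVM startup cost:

import json
import time
import requests

LIVY_URL = "http://livy-host:8998"  # placeholder
headers = {"Content-Type": "application/json"}

# Create an interactive Scala session; Livy keeps its SparkContext alive
sess = requests.post(
    f"{LIVY_URL}/sessions",
    data=json.dumps({"kind": "spark"}),
    headers=headers,
).json()

# Wait for the session to become idle before submitting statements
while requests.get(f"{LIVY_URL}/sessions/{sess['id']}").json()["state"] != "idle":
    time.sleep(1)

# Submit a statement to the already-running context
stmt = requests.post(
    f"{LIVY_URL}/sessions/{sess['id']}/statements",
    data=json.dumps({"code": "sc.parallelize(1 to 10).sum()"}),
    headers=headers,
).json()
print(stmt)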