How to fix "Connection refused" error when running a Spark job in cluster mode - apache-spark

I am running the TeraSort benchmark with Spark on the university cluster, which uses the SLURM job management system. It works fine when I use --master local[8], but when I set the master to my current node I get a connection-refused error.
I run this command to launch the app locally without a problem:
> spark-submit \
--class com.github.ehiggs.spark.terasort.TeraGen \
--master local[8] \
target/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar 1g \
data/terasort_in
When I use cluster mode I get the following error (iris-055 is the name of the cluster node in use):
> spark-submit \
--class com.github.ehiggs.spark.terasort.TeraGen \
--master spark://iris-055:7077 \
--deploy-mode cluster \
--executor-memory 20G \
--total-executor-cores 24 \
target/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar 5g \
data/terasort_in
Output:
WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
at ...
/* many lines of timeout logs etc. */
Caused by: java.net.ConnectException: Connection refused
... 11 more
I expect the command to run smoothly and terminate, but I cannot get past this connection error.

The problem could be that the driver and executor memory are not defined via --conf variables. The following could work:
spark-submit \
--class com.github.ehiggs.spark.terasort.TeraGen \
--master spark://iris-055:7077 \
--conf spark.driver.memory=4g \
--conf spark.executor.memory=20g \
--executor-memory 20g \
--total-executor-cores 24 \
target/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar 5g \
data/terasort_in
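
If the error persists, it is worth confirming that a standalone master is actually listening on iris-055:7077 before submitting, since spark-submit only connects to an existing master and does not start one. A minimal sketch, assuming a standalone Spark installation under $SPARK_HOME on that node (older releases name the worker script start-slave.sh):
# Start the standalone master and a worker on the node, then probe the port.
$SPARK_HOME/sbin/start-master.sh --host iris-055 --port 7077
$SPARK_HOME/sbin/start-worker.sh spark://iris-055:7077
nc -zv iris-055 7077   # "Connection refused" here means no master is listening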

Related

Can we have multiple executors in Spark with master local[*] and deploy mode client?

I have a 1-node Hadoop cluster, and I am submitting a Spark job like this:
spark-submit \
--class com.compq.scriptRunning \
--master local[*] \
--deploy-mode client \
--num-executors 3 \
--executor-cores 4 \
--executor-memory 21g \
--driver-cores 2 \
--driver-memory 5g \
--conf "spark.local.dir=/data/spark_tmp" \
--conf "spark.sql.shuffle.partitions=2000" \
--conf "spark.sql.inMemoryColumnarStorage.compressed=true" \
--conf "spark.sql.autoBroadcastJoinThreshold=200000" \
--conf "spark.speculation=false" \
--conf "spark.hadoop.mapreduce.map.speculative=false" \
--conf "spark.hadoop.mapreduce.reduce.speculative=false" \
--conf "spark.ui.port=8099" \
.....
Though I define 3 executors, I see only 1 executor on the Spark UI page running all the time. Can we have multiple executors running in parallel with
--master local[*] \
--deploy-mode client \
It's an on-prem, plain open-source Hadoop flavor installed in the cluster.
I tried changing the master from local to local[*] and playing around with deployment modes; still, I could see only 1 executor running in the Spark UI.
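
(For context, a hedged note: in local[*] mode the driver and a single executor share one JVM, so --num-executors has no effect; getting several executors requires a cluster manager such as YARN. An illustrative sketch of the same submit against YARN, with the sizes copied from above:)
# Illustrative only: submitted to YARN so that --num-executors can take effect.
spark-submit \
--class com.compq.scriptRunning \
--master yarn \
--deploy-mode client \
--num-executors 3 \
--executor-cores 4 \
--executor-memory 21g \
.....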

Spark Streaming - Diagnostics: Container is running beyond physical memory limits

My Spark Streaming job failed with the below exception
Diagnostics: Container is running beyond physical memory limits.
Current usage: 1.5 GB of 1.5 GB physical memory used; 3.6 GB of 3.1 GB
virtual memory used. Killing container.
Here is my spark submit command
spark2-submit \
--name App name \
--class Class name \
--master yarn \
--deploy-mode cluster \
--queue Queue name \
--num-executors 5 --executor-cores 3 --executor-memory 5G \
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
--conf spark.locality.wait=10 \
--conf spark.task.maxFailures=8 \
--conf spark.ui.killEnabled=false \
--conf spark.logConf=true \
--conf spark.yarn.driver.memoryOverhead=512 \
--conf spark.yarn.executor.memoryOverhead=2048 \
--conf spark.yarn.max.executor.failures=40 \
jar path
I am not sure what's causing the above issue. Am I missing something in the above command, or is it failing because I didn't set --driver-memory in my spark-submit command?
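
(One hedged observation: the killed 1.5 GB container matches a default-sized driver, roughly the 1 GB default heap plus the 512 MB overhead set above, so explicitly sizing the driver is a reasonable first thing to try. The values below are illustrative, not tuned:)
# Illustrative sizing only; adjust to the actual workload.
spark2-submit \
--name App name \
--class Class name \
--master yarn \
--deploy-mode cluster \
--driver-memory 4G \
--conf spark.yarn.driver.memoryOverhead=1024 \
--num-executors 5 --executor-cores 3 --executor-memory 5G \
--conf spark.yarn.executor.memoryOverhead=2048 \
jar path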

spark-submit does not terminate driver when process terminates on Kubernetes

I am using Spark (2.4.0) on Kubernetes. If I run the following from a pod:
$SPARK_HOME/bin/spark-submit \
--master k8s://https://master-node \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=1 \
--conf spark.executor.cores=1 \
--conf spark.kubernetes.container.image=container-image \
local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar 1000000
and exit the process with Ctrl+C, the application (driver/executors) keeps running. This is an issue I have with streaming jobs: when the client pod terminates, the application keeps running.
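
(A hedged note: in cluster deploy mode the driver runs in its own pod, so killing the local spark-submit process does not stop it. A minimal sketch, assuming kubectl access to the cluster and Spark's default driver-pod labelling; the pod name below is illustrative:)
kubectl get pods -l spark-role=driver       # locate the driver pod
kubectl delete pod spark-pi-driver          # deleting it stops the app; executors follow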

java.lang.ClassNotFoundException: org.apache.spark.deploy.kubernetes.submit.Client

I am running a sample Spark job in a Kubernetes cluster with the following command:
bin/spark-submit \
--deploy-mode cluster \
--class org.apache.spark.examples.SparkPi \
--master k8s://https://XXXXX \
--kubernetes-namespace sidartha-spark-cluster \
--conf spark.executor.instances=2 \
--conf spark.app.name=spark-pi \
--conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.1.0-kubernetes-0.1.0-rc1 \
--conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.1.0-kubernetes-0.1.0-rc1 \
examples/jars/spark-examples_2.11-2.1.0-k8s-0.1.0-SNAPSHOT.jar 1000
I am building Spark from apache-spark-on-k8s.
I am not able to find the jar for the org.apache.spark.deploy.kubernetes.submit.Client class.
This issue is resolved: we need to build spark/resource-managers/kubernetes from source.
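
(A hedged sketch of that build step, following the fork's Maven kubernetes profile; the exact flags may differ by branch:)
# Build with the kubernetes profile so resource-managers/kubernetes, which
# contains org.apache.spark.deploy.kubernetes.submit.Client, is compiled in.
./build/mvn -Pkubernetes -DskipTests clean package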

spark throws java.lang.NoClassDefFoundError: kafka/common/TopicAndPartition

When I use the spark-submit command in a Cloudera YARN environment, I get this kind of exception:
java.lang.NoClassDefFoundError: kafka/common/TopicAndPartition
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.getDeclaredMethods(Class.java:1975)
at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.com$fasterxml$jackson$module$scala$introspect$BeanIntrospector$$listMethods$1(BeanIntrospector.scala:93)
at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.findMethod$1(BeanIntrospector.scala:99)
at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.com$fasterxml$jackson$module$scala$introspect$BeanIntrospector$$findGetter$1(BeanIntrospector.scala:124)
at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$$anonfun$3$$anonfun$apply$5.apply(BeanIntrospector.scala:177)
at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$$anonfun$3$$anonfun$apply$5.apply(BeanIntrospector.scala:173)
at scala.collection.TraversableLike$WithFilter$$anonfun$map$2.apply(TraversableLike.scala:722)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at scala.collection.TraversableLike$WithFilter.map(TraversableLike.scala:721)
at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$$anonfun$3.apply(BeanIntrospector.scala:173)
at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$$anonfun$3.apply(BeanIntrospector.scala:172)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.immutable.List.foreach(List.scala:318)
...
The spark-submit command looks like this:
spark-submit --master yarn-cluster \
--num-executors $2 \
--executor-cores $3 \
--class "APP" \
--deploy-mode cluster \
--properties-file $1 \
--files $HDFS_PATH/log4j.properties,$HDFS_PATH/metrics.properties \
--conf spark.metrics.conf=metrics.properties \
APP.jar
Note that TopicAndPartition.class is in the shaded APP.jar.
Please try adding the Kafka jar using the --jars option as shown in the example below:
spark-submit --master yarn-cluster \
--num-executors $2 \
--executor-cores $3 \
--class "APP" \
--deploy-mode cluster \
--properties-file $1 \
--jars /path/to/kafka.jar \
--files $HDFS_PATH/log4j.properties,$HDFS_PATH/metrics.properties \
--conf spark.metrics.conf=metrics.properties \
APP.jar
After trying several approaches, it turns out that the issue is caused by a version incompatibility. As #user1050619 said, make sure the versions of Kafka, Spark, ZooKeeper and Scala are compatible with each other.
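
(A hedged sketch of one way to keep the versions aligned: pull the Kafka integration artifact matching the cluster's Spark and Scala versions via --packages instead of relying on the shaded jar; the coordinates below are illustrative, so substitute the versions on your cluster:)
spark-submit --master yarn --deploy-mode cluster \
--class "APP" \
--packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0 \
APP.jar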
