Can we have multiple executors in Spark master local[*] deployment code client

Can we have multiple executors in Spark master local[*] deployment code client - apache-spark

I have a 1 node Hadoop Cluster, I am submitting a spark job like this
spark-submit \
--class com.compq.scriptRunning \
--master local[*] \
--deploy-mode client \
--num-executors 3 \
--executor-cores 4 \
--executor-memory 21g \
--driver-cores 2 \
--driver-memory 5g \
--conf "spark.local.dir=/data/spark_tmp" \
--conf "spark.sql.shuffle.partitions=2000" \
--conf "spark.sql.inMemoryColumnarStorage.compressed=true" \
--conf "spark.sql.autoBroadcastJoinThreshold=200000" \
--conf "spark.speculation=false" \
--conf "spark.hadoop.mapreduce.map.speculative=false" \
--conf "spark.hadoop.mapreduce.reduce.speculative=false" \
--conf "spark.ui.port=8099" \
.....
Though I define 3 executors, I see only 1 executor in spark UI page running all the time. Can we have multiple executors running in parallel with
--master local[*] \
--deploy-mode client \
Its a on-prem, plain open source hadoop flavor installed in the cluster.
I tried changing master local to local[*] and playing around with deployment modes still, I could see only 1 executor running in spark UI

Related

MountVolume.Setup failed for volume "spark-conf-volume"

We are running Spark Cluster on Kubernetes. When we submited jobs as below, driver pod and executer pods were all up and running. However, the application failed to work as expected, the root cause we suspected is that it failed to find the source path as specified by parameter "py-files". As we witnessed, the driver pod has a warning MountVolume.Setup failed for volume "spark-conf-volume".
Would you please advise?
bin/spark-submit \
--master k8s://https://k8s-master-ip:6443 \
--deploy-mode cluster \
--name algo-vm \
--py-files hdfs://{our_ip}:9000/testdata/src.zip \
--conf spark.executor.instances=2 \
--conf spark.driver.port=10000 \
--conf spark.port.maxRetries=1 \
--conf spark.blockManager.port=20000 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.container.image.pullPolicy=Always \
--conf spark.kubernetes.pyspark.pythonVersion=3 \
--conf spark.kubernetes.container.image={our_ip}/sutpc/k8s-spark-242-entry/spark-py:1.0 \
--jars hdfs://hdfs-master-ip:9000/jar/spark-sql-kafka-0-10_2.11-2.4.5.jar,hdfs://{our_ip}:9000/jar/kafka-clients-0.11.0.2.jar \
hdfs://{our_ip}:9000/testdata/spark_main.py

Kubernetes sport submit in cluster mode --packages not working as expected

I am trying to submit a spark job to a kubernetes cluster in cluster mode from a client in the cluster with --packages attribute to enable dependencies are downloaded by driver and executer but it is not working. It refers to path on submitting client. ( kubectl proxyis on )
here it the the submit options
/usr/local/bin/spark-submit \
--verbose \
--master=k8s://http://127.0.0.1:8001 \
--deploy-mode cluster \
--class org.apache.spark.examples.SparkPi \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.namespace=spark \
--conf spark.kubernetes.container.image= <...> \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.pyspark.pythonVersion=3 \
--conf spark.kubernetes.driver.secretKeyRef.AWS_ACCESS_KEY_ID=datazone-s3-secret:AWS_ACCESS_KEY_ID \
--conf spark.kubernetes.driver.secretKeyRef.AWS_SECRET_ACCESS_KEY=datazone-s3-secret:AWS_SECRET_ACCESS_KEY \
--packages com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.3 \
s3.py 10
On the logs I can see that packages are referring my local file system.
Spark config:
(spark.kubernetes.namespace,spark)
(spark.jars,file:///Users/<my username>/.ivy2/jars/com.amazonaws_aws-java-sdk-1.7.4.jar,file:///Users/<my username>/.ivy2/jars/org.apache.hadoop_hadoop-aws-2.7.3.jar,file:///Users/<my username>/.ivy2/jars/joda-time_joda-time-2.10.5.jar, ....
Did someone face this problem?

Spark Thrift server queuing up queries

When Parallel queries are hitting Spark Thrift server, in Spark UI --> JDBC/ODBC Server , it shows up all queries as started but all of them gets executed in a sequential manner
Here's the Thrift Server startup script---
start_thriftserver (){
sudo /usr/lib/spark/sbin/start-thriftserver.sh \
--master yarn \
--deploy-mode client \
--executor-memory 3200m \
--executor-cores 2 \
--driver-memory 4g \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.shuffle.service.enabled=true \
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
--conf spark.dynamicAllocation.schedulerBacklogTimeout=1s \
--conf spark.dynamicAllocation.minExecutors=50 \
--conf spark.executor.memoryOverhead=684

This is indeed a confusing topic.
spark.sql.hive.thriftServer.singleSession=false
Try this.
That said, I am a little sceptical on all this.

Spark Streaming - Diagnostics: Container is running beyond physical memory limits

My Spark Streaming job failed with the below exception
Diagnostics: Container is running beyond physical memory limits.
Current usage: 1.5 GB of 1.5 GB physical memory used; 3.6 GB of 3.1 GB
virtual memory used. Killing container.
Here is my spark submit command
spark2-submit \
--name App name \
--class Class name \
--master yarn \
--deploy-mode cluster \
--queue Queue name \
--num-executors 5 --executor-cores 3 --executor-memory 5G \
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
--conf spark.locality.wait=10 \
--conf spark.task.maxFailures=8 \
--conf spark.ui.killEnabled=false \
--conf spark.logConf=true \
--conf spark.yarn.driver.memoryOverhead=512 \
--conf spark.yarn.executor.memoryOverhead=2048 \
--conf spark.yarn.max.executor.failures=40 \
jar path
I am not sure what's causing the above issue. Am I missing something in the above command or is it failing as I didn't set --driver-memory in my spark submit command?

java.lang.ClassNotFoundException: org.apache.spark.deploy.kubernetes.submit.Client

I am running a sample spark job in kubernetes cluster with following command:
bin/spark-submit \
--deploy-mode cluster \
--class org.apache.spark.examples.SparkPi \
--master k8s://https://XXXXX \
--kubernetes-namespace sidartha-spark-cluster \
--conf spark.executor.instances=2 \
--conf spark.app.name=spark-pi \
--conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.1.0-kubernetes-0.1.0-rc1 \
--conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.1.0-kubernetes-0.1.0-rc1 \
examples/jars/spark-examples_2.11-2.1.0-k8s-0.1.0-SNAPSHOT.jar 1000
I am building the spark from apache-spark-on-k8s
I am not able find the jar for org.apache.spark.deploy.kubernetes.submit.Client Class.

This issue is resolved. We need to build the spark/resource-manager/kubernetes from the source.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Can we have multiple executors in Spark master local[*] deployment code client - apache-spark

Related

MountVolume.Setup failed for volume "spark-conf-volume"

Kubernetes sport submit in cluster mode --packages not working as expected

Spark Thrift server queuing up queries

Spark Streaming - Diagnostics: Container is running beyond physical memory limits

java.lang.ClassNotFoundException: org.apache.spark.deploy.kubernetes.submit.Client

Categories

Resources