I have 5 different node labels; all of them are exclusive and all belong to the same queue. When I submit Spark jobs on different node labels, resources are shared between them, but that should not be the case when using exclusive node labels.
What could be the possible reason?
HDP Version - HDP-3.1.0.0
My Spark Submit -
$SPARK_HOME/bin/spark-submit \
  --packages net.java.dev.jets3t:jets3t:0.9.0,com.google.guava:guava:16.0.1,com.amazonaws:aws-java-sdk:1.7.4,com.amazonaws:aws-java-sdk-pom:1.10.34,org.apache.hadoop:hadoop-aws:2.6.0 \
  --master yarn \
  --queue prodQueue \
  --conf "spark.executor.extraJavaOptions= -XX:SurvivorRatio=16 -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintReferenceGC -XX:+PrintAdaptiveSizePolicy -XX:MaxDirectMemorySize=4g -XX:NewRatio=1" \
  --conf spark.hadoop.yarn.timeline-service.enabled=false \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.yarn.executor.memoryOverhead=2048 \
  --conf spark.task.maxFailures=40 \
  --conf spark.kryoserializer.buffer.max=8m \
  --conf spark.driver.memory=3g \
  --conf spark.shuffle.sort.bypassMergeThreshold=5000 \
  --conf spark.executor.heartbeatInterval=60s \
  --conf spark.memory.storageFraction=0.20 \
  --conf spark.ui.port=7070 \
  --conf spark.reducer.maxReqsInFlight=10 \
  --conf spark.scheduler.mode=FAIR \
  --conf spark.port.maxRetries=100 \
  --conf spark.yarn.max.executor.failures=280 \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.cleaner.ttl=600 \
  --executor-cores 2 \
  --executor-memory 6g \
  --num-executors 8 \
  --conf spark.yarn.am.nodeLabelExpression=amNodeLabel \
  --conf spark.yarn.executor.nodeLabelExpression=myNodeLabel \
  my-application.jar
Thanks for helping.
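Not an answer, but a couple of checks that might narrow it down. A minimal sketch, assuming the label and queue names from the command above (myNodeLabel, amNodeLabel, prodQueue), a queue path of root.prodQueue, and standard HDP 3.x YARN tooling:

# Confirm the labels are actually registered as exclusive
yarn cluster --list-node-labels

# Confirm which label each node carries (Node-Labels appears in the status output)
yarn node -status <node-id>

# If a label turns out to be non-exclusive, it can be re-created with the flag set
# (disruptive on a live cluster, shown only as an illustration):
yarn rmadmin -removeFromClusterNodeLabels "myNodeLabel"
yarn rmadmin -addToClusterNodeLabels "myNodeLabel(exclusive=true)"

It is also worth checking that the queue has per-label capacities configured in capacity-scheduler.xml (for example yarn.scheduler.capacity.root.prodQueue.accessible-node-labels and yarn.scheduler.capacity.root.prodQueue.accessible-node-labels.myNodeLabel.capacity); without per-label capacities for the queue, label-partition scheduling may not behave as expected.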
Related
I am getting the below error even though I am running as administrator
C:\Spark\spark-3.3.1-bin-hadoop3\bin>
C:\Spark\spark-3.3.1-bin-hadoop3\bin>spark-shell --packages io.delta:delta-core_2.12:1.2.1,org.apache.hadoop:hadoop-aws:3.3.1 --conf spark.hadoop.fs.s3a.access.key=<my key> --conf spark.hadoop.fs.s3a.secret.key=<my secret> --conf "spark.hadoop.fs.s3a.endpoint=<my endpoint> --conf "spark.databricks.delta.retentionDurationCheck.enabled=false" --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
"C:\Program Files\Eclipse Adoptium\jdk-11.0.13.8-hotspot\bin\java" -cp "C:\Spark\spark-3.2.3-bin-hadoop3.2\bin\..\conf\;C:\Spark\spark-3.2.3-bin-hadoop3.2\jars\*" "-Dscala.usejavacp=true" -Xmx1g org.apache.spark.deploy.SparkSubmit --conf "spark.hadoop.fs.s3a.endpoint=<my endpoint> --conf spark.databricks.delta.retentionDurationCheck.enabled=false --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog > C:\Users\shari\AppData\Local\Temp\spark-class-launcher-output-8183.txt" --conf "spark.hadoop.fs.s3a.secret.key=<my secret>" --conf "spark.hadoop.fs.s3a.access.key=<my key>" --class org.apache.spark.repl.Main --name "Spark shell" --packages "io.delta:delta-core_2.12:1.2.1,org.apache.hadoop:hadoop-aws:3.3.1" spark-shell
The system cannot find the file C:\Users\shari\AppData\Local\Temp\spark-class-launcher-output-8183.txt.
Could Not Find C:\Users\shari\AppData\Local\Temp\spark-class-launcher-output-8183.txt
However, if I execute just spark-shell it works.
Can anyone please help me with it?
OS: Windows 11
Spark: Apache Spark 3.3.1
Java: openjdk version "11.0.13" 2021-10-19
thanks
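For what it's worth, the expanded java command above points at quoting rather than permissions: the double quote opened at spark.hadoop.fs.s3a.endpoint=<my endpoint> is never closed, so the output redirection that spark-class2.cmd appends (> ...\spark-class-launcher-output-8183.txt) ends up inside that argument, the temp file is never created, and the script then reports that it cannot find it. A minimal sketch of the same invocation with the quote closed (placeholders kept as-is; ^ is cmd's line continuation):

spark-shell --packages io.delta:delta-core_2.12:1.2.1,org.apache.hadoop:hadoop-aws:3.3.1 ^
  --conf spark.hadoop.fs.s3a.access.key=<my key> ^
  --conf spark.hadoop.fs.s3a.secret.key=<my secret> ^
  --conf "spark.hadoop.fs.s3a.endpoint=<my endpoint>" ^
  --conf "spark.databricks.delta.retentionDurationCheck.enabled=false" ^
  --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" ^
  --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"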
I have a 1-node Hadoop cluster, and I am submitting a Spark job like this:
spark-submit \
--class com.compq.scriptRunning \
--master local[*] \
--deploy-mode client \
--num-executors 3 \
--executor-cores 4 \
--executor-memory 21g \
--driver-cores 2 \
--driver-memory 5g \
--conf "spark.local.dir=/data/spark_tmp" \
--conf "spark.sql.shuffle.partitions=2000" \
--conf "spark.sql.inMemoryColumnarStorage.compressed=true" \
--conf "spark.sql.autoBroadcastJoinThreshold=200000" \
--conf "spark.speculation=false" \
--conf "spark.hadoop.mapreduce.map.speculative=false" \
--conf "spark.hadoop.mapreduce.reduce.speculative=false" \
--conf "spark.ui.port=8099" \
.....
Though I define 3 executors, I see only 1 executor running all the time on the Spark UI page. Can we have multiple executors running in parallel with
--master local[*] \
--deploy-mode client \
It's an on-prem, plain open-source Hadoop flavor installed on the cluster.
I tried changing --master from local to local[*] and playing around with deploy modes; still, I could see only 1 executor running in the Spark UI.
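As far as I know this is expected: with --master local[*] the driver and the single executor backend run inside one JVM, and flags like --num-executors only take effect under a real cluster manager (YARN, Kubernetes, standalone). On a 1-node Hadoop cluster you would submit through YARN to get 3 separate executors. A minimal sketch reusing the flags from the command above; the memory values are shrunk to arbitrary example sizes so that three executors can fit on one node:

spark-submit \
  --class com.compq.scriptRunning \
  --master yarn \
  --deploy-mode client \
  --num-executors 3 \
  --executor-cores 4 \
  --executor-memory 6g \
  --driver-memory 5g \
  --conf "spark.local.dir=/data/spark_tmp" \
  --conf "spark.sql.shuffle.partitions=2000" \
  .....

Whether the three executors actually start then depends on the node's YARN settings (yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores) being large enough for the requested containers.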
In the K8s Spark Operator, submitted jobs are getting stuck at the Java thread running the following command, with no error details:
/opt/tools/Linux/jdk/openjdk1.8.0.332_8.62.0.20_x64/bin/java -cp /opt/spark/conf/:/opt/spark/jars/* org.apache.spark.deploy.SparkSubmit --master k8s://https://x.y.z.a:443 --deploy-mode cluster --conf spark.kubernetes.container.image.pullPolicy=IfNotPresent --conf spark.executor.memory=512m --conf spark.driver.memory=512m --conf spark.network.crypto.enabled=true --conf spark.driver.cores=0.100000 --conf spark.io.encryption.enabled=true --conf spark.kubernetes.driver.limit.cores=200m --conf spark.kubernetes.driver.label.version=3.0.1 --conf spark.app.name=sparkimpersonationx42aa8bff --conf spark.kubernetes.submission.waitAppCompletion=false --conf spark.executor.cores=1 --conf spark.authenticate=true --conf spark.kubernetes.driver.label.sparkoperator.k8s.io/launched-by-spark-operator=true --conf spark.kubernetes.namespace=abc --conf spark.kubernetes.container.image=placeholder:94 --conf spark.kubernetes.executor.label.sparkoperator.k8s.io/launched-by-spark-operator=true --conf spark.kubernetes.driver.label.sparkoperator.k8s.io/submission-id=b651fb42-90fd-4675-8e2f-9b4b6e380010 --conf spark.kubernetes.executor.label.sparkoperator.k8s.io/app-name=sparkimpersonationx42aa8bff --conf spark.kubernetes.executor.label.sparkoperator.k8s.io/submission-id=b651fb42-90fd-4675-8e2f-9b4b6e380010 --conf spark.kubernetes.driver.pod.name=sparkimpersonationx42aa8bff-driver --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-driver-abc --conf spark.executor.instances=1 --conf spark.kubernetes.executor.label.version=3.0.1 --conf spark.kubernetes.driver.label.sparkoperator.k8s.io/app-name=sparkimpersonationx42aa8bff --class org.apache.spark.examples.SparkPi --jars local:///sample-apps/sample-basic-spark-operator/extra-jars/* local:///sample-apps/sample-basic-spark-operator/sample-basic-spark-operator.jar
From the available information, the causes for this can be:
The workload pods cannot get scheduled on your k8s nodes. You can check this with kubectl get pods; are the pods in the Running state?
The resource limits have been reached and the pods are unresponsive.
The spark-operator itself might not be running. You should check the logs for the operator itself.
That's all I can say from what is available.
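A few commands that could confirm which of those it is; a minimal sketch, assuming kubectl access, with the namespace abc and the driver pod name taken from the submission above (the SparkApplication name is assumed to match spark.app.name, and the operator's deployment name and namespace are placeholders):

# Is the driver pod scheduled at all, or stuck Pending / ImagePullBackOff?
kubectl get pods -n abc
kubectl describe pod sparkimpersonationx42aa8bff-driver -n abc

# What state does the operator report for the application?
kubectl get sparkapplications -n abc
kubectl describe sparkapplication sparkimpersonationx42aa8bff -n abc

# Is the operator itself running, and what do its logs say about this submission?
kubectl logs deployment/<spark-operator-deployment> -n <operator-namespace>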
I'm quite new to configuring Spark, so I wanted to know whether I am fully utilising my EMR cluster.
The EMR cluster is using spark 2.4 and hadoop 2.8.5.
The app reads loads of small gzipped json files from s3, transforms the data and writes them back out to s3.
I've read various articles, but I was hoping I could get my configuration double-checked in case there are settings that conflict with each other.
I'm using a c4.8xlarge cluster with each of the 3 worker nodes having 36 CPU cores and 60 GB of RAM.
So that's 108 CPU cores and 180 GB of RAM overall.
Here are the spark-submit settings that I paste into the EMR Add Step box:
--class com.example.app
--master yarn
--driver-memory 12g
--executor-memory 3g
--executor-cores 3
--num-executors 33
--conf spark.executor.memory=5g
--conf spark.executor.cores=3
--conf spark.executor.instances=33
--conf spark.driver.cores=16
--conf spark.driver.memory=12g
--conf spark.default.parallelism=200
--conf spark.sql.shuffle.partitions=500
--conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2
--conf spark.speculation=false
--conf spark.yarn.am.memory=1g
--conf spark.executor.heartbeatInterval=360000
--conf spark.network.timeout=420000
--conf spark.hadoop.fs.hdfs.impl.disable.cache=true
--conf spark.kryoserializer.buffer.max=512m
--conf spark.shuffle.consolidateFiles=true
--conf spark.hadoop.fs.s3a.multiobjectdelete.enable=false
--conf spark.hadoop.fs.s3a.fast.upload=true
--conf spark.worker.instances=3
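One concrete thing that stands out, independent of sizing: several properties are set twice, once through the dedicated flag and once through --conf, and one pair disagrees (--executor-memory 3g versus spark.executor.memory=5g; the executor-cores, num-executors and driver-memory pairs are merely redundant). A de-duplicated sketch of the same submission, keeping one value per property (5g is kept only as an example; the right executor size depends on your sizing exercise):

--class com.example.app
--master yarn
--driver-memory 12g
--executor-memory 5g
--executor-cores 3
--num-executors 33
--conf spark.driver.cores=16
--conf spark.default.parallelism=200
--conf spark.sql.shuffle.partitions=500
--conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2
--conf spark.speculation=false
--conf spark.yarn.am.memory=1g
--conf spark.executor.heartbeatInterval=360000
--conf spark.network.timeout=420000
--conf spark.hadoop.fs.hdfs.impl.disable.cache=true
--conf spark.kryoserializer.buffer.max=512m
--conf spark.shuffle.consolidateFiles=true
--conf spark.hadoop.fs.s3a.multiobjectdelete.enable=false
--conf spark.hadoop.fs.s3a.fast.upload=true
--conf spark.worker.instances=3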
I have the following settings in my Spark job:
--num-executors 2
--executor-cores 1
--executor-memory 12G
--driver-memory 16G
--conf spark.streaming.dynamicAllocation.enabled=false \
--conf spark.dynamicAllocation.enabled=false \
--conf spark.streaming.receiver.writeAheadLog.enable=false
--conf spark.executor.memoryOverhead=8192
--conf spark.driver.memoryOverhead=8192
My understanding is that the job should run with 2 executors; however, it is running with 3. This is happening with multiple of my jobs. Could someone please explain the reason?
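One possibility worth ruling out first: if the count of 3 comes from the Executors page of the Spark UI (or its REST API), the driver is listed there as its own row with id "driver", so --num-executors 2 legitimately produces three entries. A minimal sketch for checking what was actually allocated, assuming the job runs on YARN, the driver UI is reachable on port 4040, and <driver-host>, <app-id> and <application-attempt-id> are placeholders:

# Executors as Spark sees them; the entry with "id": "driver" is the driver,
# not a third executor.
curl -s http://<driver-host>:4040/api/v1/applications/<app-id>/executors

# Containers YARN actually granted (one of them is the ApplicationMaster/driver):
yarn applicationattempt -list <app-id>
yarn container -list <application-attempt-id>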