Driver cores must be a positive number - apache-spark

I have upgraded Spark from version 3.1.1 to 3.2.1, and now all existing Spark jobs break with the following error:
Exception in thread "main" org.apache.spark.SparkException: Driver cores must be a positive number
at org.apache.spark.deploy.SparkSubmitArguments.error(SparkSubmitArguments.scala:634)
at org.apache.spark.deploy.SparkSubmitArguments.validateSubmitArguments(SparkSubmitArguments.scala:257)
at org.apache.spark.deploy.SparkSubmitArguments.validateArguments(SparkSubmitArguments.scala:234)
at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:119)
at org.apache.spark.deploy.SparkSubmit$$anon$2$$anon$3.<init>(SparkSubmit.scala:1026)
at org.apache.spark.deploy.SparkSubmit$$anon$2.parseArguments(SparkSubmit.scala:1026)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:85)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
We are using Spark in cluster mode with Apache Mesos, co-located with Cassandra.
I tried a few options, e.g.:
appl/spark/bin/spark-submit --name "Testjob" --deploy-mode cluster --master mesos://<master node>:7077 --executor-cores 4 --driver-memory 1G --driver-cores 1 --class ....
Do you have any hints or solutions for this problem?
Many thanks...
cheers

Unfortunately I think it is impossible to run Spark 3.2.x with Mesos in cluster mode, because of this and the way the MesosClusterDispatcher works.
Basically what's happening is that the dispatcher submits the Spark application with the --driver-cores argument as a floating-point number, and then Spark (SparkSubmitArguments.scala) reads it as a String and parses it just like this:
driverCores.toInt
and of course this fails for a floating-point string.
I proposed a quick fix for this, but in the meantime I simply built Spark with the change from my PR. I also reported this as a bug.
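For illustration, here is a minimal Scala sketch of why the dispatcher-submitted value blows up and what a more tolerant parse could look like (this is only a sketch of the general idea, not the actual patch):

// The Mesos dispatcher passes driver cores as a floating-point string, e.g. "1.0".
val driverCores = "1.0"

// What Spark 3.2.x effectively does in SparkSubmitArguments.scala:
// driverCores.toInt  // throws java.lang.NumberFormatException for "1.0"

// A tolerant parse that accepts both "1" and "1.0" (sketch only):
val cores = driverCores.toDouble.toInt
require(cores > 0, "Driver cores must be a positive number")
println(s"Parsed driver cores: $cores")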

Related

PySpark driver memory exceptions while reading too many small files

PySpark version: 2.3.0, HDP 2.6.5
My source is populating a Hive table (HDFS backed) with 826 partitions and 1557242 small files (~40 KB each). I know this is a highly inefficient way to store data, but I don't have control over my source.
The problem is that when I need to do a historical load and scan all the files at once, the driver runs into memory exceptions. I tried setting driver-memory to 8g and 16g, and a similar configuration for the driver memory overhead, but the problem still persists.
What makes me wonder is that this is failing while listing files, which I presume is just metadata. Is there an explanation for why file metadata would need so much memory?
py4j.protocol.Py4JJavaError: An error occurred while calling o351.saveAsTable.
: java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.lang.String.substring(String.java:1956)
at java.net.URI$Parser.substring(URI.java:2869)
at java.net.URI$Parser.parse(URI.java:3049)
at java.net.URI.<init>(URI.java:746)
at org.apache.hadoop.fs.Path.initialize(Path.java:202)
at org.apache.hadoop.fs.Path.<init>(Path.java:171)
at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$org$apache$spark$sql$execution$datasources$InMemoryFileIndex$$bulkListLeafFiles$3$$anonfun$7.apply(InMemoryFileIndex.scala:251)
It would be helpful if you could share the parameters you were passing in spark-submit.
I too faced a similar issue; adjusting the parameters made it work.
Try different configs (I can't suggest exact numbers, since it depends on the server configuration).
Mine:
spark-submit \
--master yarn \
--deploy-mode client \
--driver-memory 5g \
--executor-memory 6g \
--executor-cores 3
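Coming back to the question's "why would file metadata need so much memory": every leaf file becomes Path/URI/FileStatus objects on the driver while InMemoryFileIndex lists the table, so the listing itself carries a real heap cost. A hedged back-of-envelope sketch in Scala (the per-entry footprint is an assumption, not a measured number):

// Rough estimate of driver heap used just to hold the file listing.
val files = 1557242L            // small files reported in the question
val bytesPerEntry = 1024L       // assumed per-file metadata footprint (Path + URI + FileStatus)
val estimatedHeap = files * bytesPerEntry
println(f"~${estimatedHeap / (1024.0 * 1024 * 1024)}%.2f GiB just for listing metadata")
// Add GC churn on top of that and a small driver can hit "GC overhead limit
// exceeded" during the listing phase alone, before any data is read.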

Spark on EMR-5.32.0 not spawning requested executors

I am running into some problems in (Py)Spark on EMR (release 5.32.0). Approximately a year ago I ran the same program on an EMR cluster (I think the release must have been 5.29.0). Then I was able to configure my PySpark program using spark-submit arguments properly. However, now I am running the same/similar code, but the spark-submit arguments do not seem to have any effect.
My cluster configuration:
master node: 8 vCores, 32 GiB memory, EBS-only storage: 128 GiB
slave nodes: 10 x 16 vCores, 64 GiB memory, EBS-only storage: 256 GiB
I run the program with the following spark-submit arguments:
spark-submit --master yarn --conf "spark.executor.cores=3" --conf "spark.executor.instances=40" --conf "spark.executor.memory=8g" --conf "spark.driver.memory=8g" --conf "spark.driver.maxResultSize=8g" --conf "spark.dynamicAllocation.enabled=false" --conf "spark.default.parallelism=480" update_from_text_context.py
I did not change anything in the default configurations on the cluster.
Below is a screenshot of the Spark UI, which indicates only 10 executors, whereas I expect to have 40 executors available...
I tried different spark-submit arguments in order to make sure that the error was unrelated to Apache Spark: setting the executor instances does not change the number of executors. I tried a lot of things, and nothing seems to help.
I am a little lost here, could someone help?
UPDATE:
I ran the same code on EMR release label 5.29.0, and there the conf settings in the spark-submit arguments do seem to have an effect:
Why is this happening?
Sorry for the confusion, but this is intentional. On emr-5.32.0, Spark+YARN will coalesce multiple executor requests that land on the same node into a larger executor container. Note how even though you had fewer executors than you expected, each of them had more memory and cores than you had specified. (There's one asterisk here, though, that I'll explain below.)
This feature is intended to provide better performance by default in most cases. If you would really prefer to keep the previous behavior, you may disable this new feature by setting spark.yarn.heterogeneousExecutors.enabled=false, though we (I am on the EMR team) would like to hear from you about why the previous behavior is preferable.
One thing that doesn't make sense to me, though, is that you should be ending up with the same total number of executor cores that you would have without this feature, but that doesn't seem to have occurred in the example you shared. You asked for 40 executors with 3 cores each but then got 10 executors with 15 cores each, which is a bit more in total (150 cores vs. the 120 requested). This may have to do with the way that your requested spark.executor.memory of 8g divides into the memory available on your chosen instance type, which I'm guessing is probably m5.4xlarge. One thing that may help you is to remove all of your overrides for spark.executor.memory/cores/instances and just use the defaults. Our hope is that the defaults will give the best performance in most cases. If not, like I said above, please let us know so that we can improve further!
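For concreteness, a hedged Scala sketch of the question's overrides together with the opt-out flag mentioned above; spark.driver.memory generally still needs to be passed on spark-submit rather than set in code, and the values are the question's own, not recommendations:

import org.apache.spark.sql.SparkSession

// Sketch only: executor settings mirror the question's spark-submit flags;
// spark.yarn.heterogeneousExecutors.enabled=false restores the pre-5.32.0
// one-request-per-executor behaviour described in the answer.
val spark = SparkSession.builder()
  .appName("update_from_text_context")
  .config("spark.executor.cores", "3")
  .config("spark.executor.instances", "40")
  .config("spark.executor.memory", "8g")
  .config("spark.driver.maxResultSize", "8g")
  .config("spark.dynamicAllocation.enabled", "false")
  .config("spark.default.parallelism", "480")
  .config("spark.yarn.heterogeneousExecutors.enabled", "false") // EMR >= 5.32.0 only
  .getOrCreate()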
OK, if someone is facing the same problem: as a workaround you can just revert to a previous version of EMR. In my case I reverted to EMR release label 5.29.0, which solved all my problems. Suddenly I was able to configure the Spark job again!
I am still not sure why it doesn't work on EMR release label 5.32.0, so if someone has suggestions, please let me know!

Spark : Understanding Dynamic Allocation

I have launched a spark job with the following configuration :
--master yarn --deploy-mode cluster --conf spark.scheduler.mode=FAIR --conf spark.dynamicAllocation.enabled=true --conf spark.dynamicAllocation.maxExecutors=19 --conf spark.dynamicAllocation.minExecutors=0
It works well and finishes successfully, but after checking the Spark history UI, this is what I saw:
My questions are (I'm more interested in understanding than in solutions):
Why does Spark request the last executor if it has no task to do?
How can I optimise the cluster resources requested by my job in dynamic allocation mode?
I'm using Spark 2.3.0 on YARN.
You need to respect the two requirements for using Spark dynamic allocation:
spark.dynamicAllocation.enabled
spark.shuffle.service.enabled => The purpose of the external shuffle service is to allow executors to be removed without deleting shuffle files.
The resources are adjusted dynamically based on the workload; the app gives resources back to the cluster when they are no longer used.
I am not sure that there is a strict order; it just requests executors in rounds, exponentially, i.e. an application will add 1 executor in the first round, and then 2, 4, 8 and so on...
Configuring external shuffle service
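A hedged Scala sketch of those two requirements together with the min/max bounds from the question (the values are illustrative, taken from the submitted configuration):

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Sketch only: dynamic allocation needs the external shuffle service so that
// executors can be released without losing the shuffle files they wrote.
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "0")
  .set("spark.dynamicAllocation.maxExecutors", "19")

val spark = SparkSession.builder()
  .appName("dynamic-allocation-sketch")
  .config(conf)
  .getOrCreate()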
It's difficult to know what Spark did there without knowing the content of the job you submitted. Unfortunately the configuration string you provided does not say much about what Spark will actually perform upon job submission.
You will likely get a better understanding of what happened during a task by looking at the 'SQL' part of the history UI (right side of the top bar) as well as at the stdout logs.
Generally one of the better places to read about how Spark works is the official page: https://spark.apache.org/docs/latest/cluster-overview.html
Happy sparking ;)
It's because of the allocation policy:
Additionally, the number of executors requested in each round
increases exponentially from the previous round.
reference
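To make the quoted policy concrete, here is a small hedged Scala illustration of how the requested executor target ramps up toward the question's maxExecutors=19; the exact timing and counts depend on the backlog timeouts, so treat this only as an illustration of the doubling behaviour:

// Requested target roughly doubles each round until it reaches the cap.
val maxExecutors = 19
val targets = Iterator.iterate(1)(_ * 2).takeWhile(_ < maxExecutors).toList :+ maxExecutors
// targets == List(1, 2, 4, 8, 16, 19)
targets.zipWithIndex.foreach { case (target, round) =>
  println(s"round ${round + 1}: request up to $target executors")
}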

A SPARK CLUSTER ISSUE

I know that when the Spark cluster in the production environment is running a job, it is in standalone mode.
While I was running a job, memory overflow at a few points on the workers caused the worker node process to die.
I would like to ask how to analyze the error shown in the image below:
Spark Worker Fatal Error
EDIT: This is a relatively common problem; if the answer below doesn't help you, please also see Spark java.lang.OutOfMemoryError: Java heap space.
Without seeing your code, here is the process you should follow:
(1) If the issue is caused primarily by the Java allocation running out of space within the container allocation, I would advise adjusting your memory overhead settings (below). Your current values are a little high and will cause excess spin-up of vcores. Add the two settings below to your spark-submit and re-run.
--conf "spark.yarn.executor.memoryOverhead=4000m"
--conf "spark.yarn.driver.memoryOverhead=2000m"
(2) Adjust Executor and Driver Memory Levels. Start low and climb. Add these values to the spark-submit statement.
--driver-memory 10g
--executor-memory 5g
(3) Adjust the number of executors in the spark-submit.
--num-executors ##
(4) Look at the YARN stages of the job, figure out where inefficiencies in the code are present, and see where persists can be added or replaced (a small sketch follows below). I would advise looking heavily into Spark tuning.
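As a minimal sketch of the kind of persistence step (4) refers to, assuming a placeholder input path, column name and output path:

import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder().appName("persistence-sketch").getOrCreate()

// Placeholder paths and column; replace with the real source and schema.
val df = spark.read.parquet("/data/input")

// Persist an intermediate result that is reused by more than one action,
// spilling to disk in serialized form instead of recomputing it each time.
val enriched = df.filter("value IS NOT NULL").persist(StorageLevel.MEMORY_AND_DISK_SER)

println(enriched.count())                 // first action materialises the cache
enriched.write.parquet("/data/output")    // second action reuses the cached data

enriched.unpersist()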

Spark job fails when cluster size is large, succeeds when small

I have a Spark job which takes in three inputs and does two outer joins. The data is in key-value format (String, Array[String]). The most important part of the code is:
val partitioner = new HashPartitioner(8000)
val joined = inputRdd1.fullOuterJoin(inputRdd2.fullOuterJoin(inputRdd3, partitioner), partitioner).cache
saveAsSequenceFile(joined, filter="X")
saveAsSequenceFile(joined, filter="Y")
I'm running the job on EMR with r3.4xlarge driver node and 500 m3.xlarge worker nodes. The spark-submit parameters are:
spark-submit --deploy-mode client --master yarn-client --executor-memory 3g --driver-memory 100g --executor-cores 3 --num-executors 4000 --conf spark.default.parallelism=8000 --conf spark.storage.memoryFraction=0.1 --conf spark.shuffle.memoryFraction=0.2 --conf spark.yarn.executor.memoryOverhead=4000 --conf spark.network.timeout=600s
UPDATE: with these settings, the number of executors seen in the Spark jobs UI was 500 (one per node).
The exception I see in the driver log is the following:
17/10/13 21:37:57 WARN HeartbeatReceiver: Removing executor 470 with no recent heartbeats: 616136 ms exceeds timeout 600000 ms
17/10/13 21:39:04 ERROR ContextCleaner: Error cleaning broadcast 5
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [600 seconds]. This timeout is controlled by spark.network.timeout at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcEnv.scala:214)
...
Some of the things I tried that failed:
I thought the problem might be that too many executors were being spawned and the driver had the overhead of tracking all of them. I tried reducing the number of executors by increasing executor-memory to 4g. This did not help.
I tried changing the instance type of the driver to r3.8xlarge; this did not help either.
Surprisingly, when I reduce the number of worker nodes to 300, the job runs fine. Does anyone have another hypothesis on why this would happen?
Well, this is partly a problem of understanding how Spark's allocation works.
According to your information, you have 500 nodes with 4 cores each, so you have 2,000 cores in total. What you are doing with your request is creating 4,000 executors with 3 cores each, which means you are requesting 12,000 cores from your cluster, and there is nothing like that available.
This RPC timeout error is regularly associated with how many JVMs you start on the same machine; the machine is not able to respond in time because too much is happening at once.
You need to know that --num-executors is better tied to the number of nodes you have, and the number of cores per executor should be tied to the cores available on each node.
For example, an m3.xlarge has 4 cores with 15 GB of RAM. What is the best configuration to run a job there? That depends on what you are planning to do. If you are going to run just one job, I suggest you set it up like this:
spark-submit --deploy-mode client --master yarn-client --executor-memory 10g --executor-cores 4 --num-executors 500 --conf spark.default.parallelism=2000 --conf spark.yarn.executor.memoryOverhead=4000
This will allow your job to run fine. If you don't have a problem fitting your data onto your workers, it is better to keep spark.default.parallelism at 2000, or you are going to lose a lot of time on shuffles.
But the best approach, I think, is to keep the dynamic allocation that EMR enables by default; just set the number of cores, the parallelism and the memory, and your job will run like a charm.
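The core-counting argument above can be written out explicitly; a small Scala sketch of the arithmetic, using the m3.xlarge figures quoted in the answer:

// Back-of-envelope core accounting for the cluster in the question.
val workerNodes = 500
val coresPerNode = 4                                         // m3.xlarge: 4 vCPUs, 15 GB RAM
val availableCores = workerNodes * coresPerNode              // 2,000 cores in the cluster

val requestedExecutors = 4000
val coresPerExecutor = 3
val requestedCores = requestedExecutors * coresPerExecutor   // 12,000 cores requested

println(s"available: $availableCores, requested: $requestedCores")
// Requesting roughly six times the available cores means many executor JVMs
// competing on each node, which is exactly the kind of pressure that surfaces
// as heartbeat and RPC timeouts.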
I experimented with a lot of configurations, modifying one parameter at a time, with 500 nodes. I finally got the job to work by lowering the number of partitions in the HashPartitioner from 8000 to 3000.
val partitioner = new HashPartitioner(3000)
So probably the driver is overwhelmed by the large number of shuffle tasks that have to be tracked when there are more partitions, and hence the lower partition count helps.
