Spark on Yarn IO issues - apache-spark

When scheduling a Spark job on YARN, is there any way to control how many executors are placed on a physical node?
I currently set spark.executor.cores to 4.
Now when YARN places multiple executors on the same data node, there are 4 * #numberOfExecutorsOnSameNode threads trying to read, and possibly also swapping, since HDFS and Spark's temporary directory reside on the same disks.
This results in huge amounts of blocked IO time. Getting SSDs is not an option for now. Are there other things I can try?

One quick fix is to increase spark.executor.memory: this may over-allocate memory, but it prevents too many executors from being launched on the same data nodes.
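As a rough sketch of that idea (the memory figure and the application jar name are placeholders; the right value depends on how much memory your node managers offer to YARN), requesting larger executors means fewer of them fit on each data node:

    # Illustrative only: with ~32 GB of YARN memory per node, a 16 GB executor
    # (plus overhead) leaves room for at most one or two executors per node.
    spark-submit \
      --master yarn \
      --executor-cores 4 \
      --executor-memory 16g \
      your-app.jar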

Related

Spark Master vs Yarn Resource manager

How does YARN allocate resources for Spark applications, and how is it done when Spark runs in standalone mode?
You define the driver memory size, deployment mode, number of executors and their memory sizes when you run spark-submit. If no options are provided, the defaults from spark-env and/or yarn-site.xml are used. That amount of resources is then scheduled.
If dynamic executor allocation is enabled and you're reading data from HDFS, for example, then more or fewer executors may start, depending on how many file blocks the data contains.
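As an illustration of the options referred to above (all values and the jar name are made up and need to be sized for your cluster):

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --driver-memory 4g \
      --num-executors 10 \
      --executor-memory 8g \
      --executor-cores 4 \
      your-app.jar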

apache spark executors and data locality

The Spark literature says:
Each application gets its own executor processes, which stay up for the duration of the whole application and run tasks in multiple threads.
And if I understand this right, in static allocation the executors are acquired by the Spark application when the SparkContext is created, on all nodes in the cluster (in cluster mode). I have a couple of questions:
If executors are acquired on all nodes and stay allocated to this application for the duration of the whole application, isn't there a chance a lot of nodes remain idle?
What is the advantage of acquiring resources when the Spark context is created and not in the DAGScheduler? I mean the application could be arbitrarily long and it is just holding the resources.
So when the DAGScheduler tries to get the preferred locations and the executors on those nodes are running the tasks, would it relinquish the executors on other nodes?
I have checked a related question, Does Spark on yarn deal with Data locality while launching executors, but I'm not sure there is a conclusive answer.
If executors are acquired on all nodes and stay allocated to this application for the duration of the whole application, isn't there a chance a lot of nodes remain idle?
Yes, there is a chance. If you have data skew this will happen. The challenge is to tune the executors and executor cores so that you get maximum utilization. Spark also provides dynamic resource allocation, which ensures idle executors are removed.
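A minimal sketch of enabling dynamic allocation on YARN (the executor bounds and jar name are arbitrary examples; the external shuffle service must be set up on the node managers):

    spark-submit \
      --master yarn \
      --conf spark.dynamicAllocation.enabled=true \
      --conf spark.dynamicAllocation.minExecutors=2 \
      --conf spark.dynamicAllocation.maxExecutors=20 \
      --conf spark.shuffle.service.enabled=true \
      your-app.jar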
What is the advantage of acquiring resources when the Spark context is created and not in the DAGScheduler? I mean the application could be arbitrarily long and it is just holding the resources.
Spark tries to keep data in memory while doing transformations, contrary to the MapReduce model, which writes to disk after every map operation. Spark can keep data in memory only if it can ensure the code is executed on the same machine. This is the reason for allocating resources beforehand.
So when the DAGScheduler tries to get the preferred locations and the executors on those nodes are running the tasks, would it relinquish the executors on other nodes?
Spark can't start a task on an executor unless the executor is free. The Spark application master negotiates with YARN to get the preferred location; it may or may not get it. If it doesn't, it will start the task on a different executor.
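Related to this, how long the scheduler waits for a preferred (data-local) executor to free up before falling back to a less local one can be tuned with spark.locality.wait. A sketch, assuming you want to wait longer than the default (the jar name is a placeholder):

    # Wait up to 10 seconds for a data-local slot before launching the task
    # on a less local executor (the default is 3s).
    spark-submit \
      --master yarn \
      --conf spark.locality.wait=10s \
      your-app.jar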

Limit Spark application from grabbing all the resources in a YARN cluster

We (an engineering team) are running an EMR cluster with YARN and Spark. What typically happens is that when one user submits a memory-intensive job, it grabs all of the available YARN memory, and then all subsequently submitted jobs have to wait for that memory to clear. (I know that autoscaling will solve this problem to a certain extent and we are looking into it, but we would like to avoid a single user occupying all the memory even when the cluster is autoscaled to its full limits.)
Is there a way to configure YARN such that any application (Spark or otherwise) may not occupy more than, say, 75% of the available memory?
Thanks
According to the documentation, you can manage the amount of memory allocated to each executor with the spark.executor.memory parameter.
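For instance (illustrative numbers and jar name; this assumes dynamic allocation is disabled, otherwise cap spark.dynamicAllocation.maxExecutors as well), fixing both the executor size and the executor count roughly bounds what a single application can take:

    # The application is bounded at about num-executors * (executor-memory + overhead).
    spark-submit \
      --master yarn \
      --num-executors 10 \
      --executor-memory 8g \
      your-app.jar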

Cannot run memory intense program on Spark through YARN

I'm trying to benchmark a program on an Azure cluster using Spark. We previously ran this on EC2 and know that 150 GB of RAM is sufficient. I have tried multiple setups for the executors and given them 160-180 GB of RAM, but regardless of what I do, the program dies due to executors requesting more memory.
What can I do? Are there more launch options I should consider, I have tried every conceivable executor setup and nothing seems to want to work. I'm at a total loss.
For your command, you specified 7 executors, each with 40 GB of memory. That's 280 GB of memory in total, but you said your cluster has only 160-180 GB of memory. If only 150 GB of memory is needed, why is spark-submit configured that way?
What is your HDI cluster node type, and how many nodes did you create?
Were you using YARN previously on EC2 as well? If so, is the configuration the same?
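As a sketch of a configuration that would stay within roughly 150-180 GB (the split below is a guess and needs tuning against your actual node sizes; the jar name is a placeholder):

    # 5 executors * 28 GB = 140 GB of executor heap, leaving headroom for the
    # ~10% per-executor YARN memory overhead and the driver.
    spark-submit \
      --master yarn \
      --num-executors 5 \
      --executor-memory 28g \
      --executor-cores 4 \
      your-app.jar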

Why are my Spark completed applications still using my worker's disk space?

My DataStax Spark completed applications are using my worker's disk space. Therefore Spark can't run because it doesn't have any disk space left.
This is my Spark worker directory. The applications outlined in blue take up 92 GB in total, but they shouldn't even exist anymore since they are completed applications. Thanks for the help; I don't know where the problem lies.
This is my Spark front UI:
Spark doesn't automatically clean up the JARs transferred to the worker nodes. If you want it to do so, and you're running Spark Standalone (YARN is a bit different and won't work the same way), you can set spark.worker.cleanup.enabled to true and set the cleanup interval via spark.worker.cleanup.interval. This will allow Spark to clean up the data retained on your workers. You may also configure a default TTL for all application directories via spark.worker.cleanup.appDataTtl.
From the docs of spark.worker.cleanup.enabled:
Enable periodic cleanup of worker / application directories. Note that
this only affects standalone mode, as YARN works differently. Only the
directories of stopped applications are cleaned up.
For more, see Spark Configuration.
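A sketch of how these properties are typically set on each standalone worker in conf/spark-env.sh (the interval and TTL values below are just examples, in seconds):

    # conf/spark-env.sh on each worker (Spark Standalone only, not YARN)
    SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
      -Dspark.worker.cleanup.interval=1800 \
      -Dspark.worker.cleanup.appDataTtl=604800"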
