Spark - what triggers a Spark job to be re-attempted? - apache-spark

For educational purposes mostly, I was trying to get YARN + Spark to re-attempt my Spark job on purpose (i.e. fail it, and see it be rescheduled by YARN in another app attempt).
Various failures seem to cause a Spark job to be re-run; I have seen this happen numerous times. I'm having trouble simulating it, though.
I have tried forcefully stopping the streaming context and calling System.exit(-1), and neither achieved the desired effect.

After a lot of playing around with this, I have seen that Spark and YARN do not play well together with exit codes (at least not in the versions shipped with MapR 5.2.1), but I don't think it's MapR-specific.
Sometimes a Spark program will throw an exception and die, yet still report SUCCESS to YARN (or YARN records SUCCESS somehow), so there are no re-attempts.
Calling System.exit(-1) does not give any more stable results: the same code can end up recorded as either SUCCESS or FAILURE from one run to the next.
Interestingly, getting a reference to the main thread of the driver and killing it does seem to force a re-attempt, but that is very dirty and requires using a deprecated method on the Thread class; see the sketch below.
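For reference, this is roughly what that hack looks like (the object and variable names are just illustrative, and Thread.stop() is deprecated and unsafe, so treat it purely as an experiment):

```scala
import org.apache.spark.SparkContext

object ReattemptExperiment {
  // Captured at startup so another thread can later kill the driver's main thread.
  @volatile private var mainThread: Thread = _

  def main(args: Array[String]): Unit = {
    mainThread = Thread.currentThread()
    val sc = new SparkContext()   // configuration comes from spark-submit

    // ... set up the streaming context and start the job here ...

    // From a separate thread (e.g. a timer or a failure handler), killing the
    // main thread is what seemed to make YARN record the attempt as FAILED
    // and schedule another application attempt:
    // mainThread.stop()          // deprecated on java.lang.Thread, very dirty
  }
}
```

Note that YARN will only re-attempt the application if the attempt limit (yarn.resourcemanager.am.max-attempts, and spark.yarn.maxAppAttempts if set) is greater than one.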

Related

Spark SQL How to Allow Some Tasks To Fail but Overall Job Still Succeed?

I have a Spark job where a small minority of the tasks keep failing, causing the whole job to fail, and nothing gets written to the table where the results are supposed to go. Is there a way to get Spark to tolerate a few failed tasks and still write the output from the successful ones? I don't actually need 100% of the data to get through, so I'm fine with a few tasks failing.
No, that is not possible, and not part of the design of Spark. No is also an answer.
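To make the design point concrete: the closest knob Spark offers is how many times an individual task is retried before the whole job is failed (spark.task.maxFailures); there is no setting that lets a job finish successfully while permanently dropping failed tasks. A sketch, with an illustrative app name:

```scala
import org.apache.spark.sql.SparkSession

// Raise the per-task retry limit (the default is 4). If a task still fails
// after all retries, the stage and therefore the job fails, and the output
// of the successful tasks is not committed.
val spark = SparkSession.builder()
  .appName("task-retry-sketch")
  .config("spark.task.maxFailures", "8")
  .getOrCreate()
```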

Apache Spark: what can be the root cause of an OOM on the driver if the code is NOT pulling anything to the driver explicitly?

I am running an Apache Spark job that does a lot of transformations written in Spark SQL.
At some point, my job hangs.
That point is when my job issues a query that drops a table.
At the same time, in the Spark UI I don't see the corresponding stage start: the previous one finishes, and a new one never starts.
In an executor's logs I see:
WARN executor.Executor: Issue communicating with driver in heartbeater
...which happens exactly at the moment the driver stops writing logs.
I don't see any error in the driver logs, but I do see a warning, which appears right after some executors report that they have stopped trying to reach the driver:
19/03/25 18:28:29 WARN hdfs.DataStreamer: Exception for BP-yyy-127.0.0.1-zzz:blk_xxx
java.io.EOFException: Unexpected EOF while trying to read response from server
at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:399)
at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:213)
at org.apache.hadoop.hdfs.DataStreamer$ResponseProcessor.run(DataStreamer.java:1020)
I checked the table that is about to be dropped. Nothing special, except that there is one empty Parquet file in it. If I issue the drop command from another process, it succeeds, so I doubt the nature of the DROP TABLE is relevant here. Something goes wrong with the driver at some point and I cannot work out what. I checked the driver's memory: no problem.
Any ideas on this puzzler?
UPDATE:
I raised spark.executor.heartbeatInterval and both of the issues mentioned above went away (a sketch of the setting is at the end of this question).
Now I clearly see the issue: the driver fails with a Java heap space OOM.
But what can be the reason for this if I am NOT pulling anything to the driver?
I see a lot of broadcast/accumulator operations in logs, but they come from the engine, not from me.
UPDATE 2.0:
Actually the idea of this question is quite simple: how can a Spark SQL query lead to an OOM on the driver?
What else can lead to an OOM on the driver? The logging policy?
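For anyone trying the same mitigation, a sketch of raising the heartbeat interval when building the session (values are illustrative; spark.network.timeout has to stay larger than the heartbeat interval):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("heartbeat-tuning-sketch")                  // illustrative name
  .config("spark.executor.heartbeatInterval", "60s")   // default is 10s
  .config("spark.network.timeout", "600s")             // must exceed the heartbeat interval
  .getOrCreate()
```

This only changes how tolerant the heartbeat mechanism is of a slow driver; it does not address the underlying driver OOM described in the update.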

Apache Spark Executors Dead - is this the expected behaviour?

I am running a pipeline to process my data on Spark. My executors seem to die every now and then when they get close to the storage memory limit. The job continues and eventually finishes, but is this normal behaviour? Is there something I should be doing to prevent this from happening? Every time this happens the job hangs for a while until (I am guessing here) YARN provides some new executors for the job to continue.
I think this turned out to be related to a YARN bug. It doesn't happen anymore after I set the following YARN options, as suggested in section 4 of this blog post:
Best practice 5: Always set the virtual and physical memory check flag to false.
"yarn.nodemanager.vmem-check-enabled":"false",
"yarn.nodemanager.pmem-check-enabled":"false"

Spark stage stays stuck in pending

I am running a rather simple Spark job: read a couple of Parquet datasets (10-100 GB each), do a bunch of joins, and write the result back to Parquet.
Spark always seems to get stuck on the last stage. The stage stays "Pending" even though all previous stages have completed and there are executors waiting. I've waited up to 1.5 hours and it just stays stuck.
I have tried the following desperate measures:
Using smaller datasets appears to work, but then the plan changes (e.g., some broadcast joins start to pop up), so that doesn't really help with troubleshooting (one way to keep the plan comparable is sketched at the end of this question).
Allocating more executor or driver memory doesn't seem to help.
Any idea?
Details
Running Spark 2.3.1 on Amazon EMR (5.17)
client-mode on YARN
Driver thread dump
Appears similar to Spark job showing unknown in active stages and stuck although I can't be sure
Job details showing the stage staying in Pending.
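As mentioned in the list of measures above, shrinking the data changes the plan because broadcast joins start to kick in; one way to keep the plan comparable for a debugging run is to disable automatic broadcast joins. A sketch, assuming the SparkSession is named spark:

```scala
// Setting the threshold to -1 turns automatic broadcast joins off, so a small
// test dataset goes through the same sort-merge join plan as the full-size run.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)
```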

Spark tasks block randomly on a standalone cluster

We have a quite complex application that runs on Spark Standalone.
In some cases the tasks from one of the workers block randomly for an indefinite amount of time in the RUNNING state.
Extra info:
there aren't any errors in the logs
I ran with the logger in debug and didn't see any relevant messages (I see when the task starts, but then there is no activity for it)
the jobs work fine if I have only 1 worker
the same job may execute a second time without any issues, in a reasonable amount of time
I don't have any really big partitions that could cause delays for some of the tasks
in Spark 2.0 I moved from RDDs to Datasets and I have the same issue
in Spark 1.4 I was able to overcome the issue by turning on speculation, but in Spark 2.0 the blocked tasks come from different workers (while in 1.4 I had blocked tasks on only 1 worker), so speculation isn't fixing my issue
I have the issue in multiple environments, so I don't think it's hardware-related
Has anyone experienced something similar? Any suggestions on how I could identify the issue?
Thanks a lot!
Later edit: I think I'm facing the same issue described here: Spark Indefinite Waiting with "Asked to send map output locations for shuffle" and here: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-stalling-during-shuffle-maybe-a-memory-issue-td6067.html, but neither has a working solution.
The last thing in the log repeated infinitely is: [dispatcher-event-loop-18] DEBUG org.apache.spark.scheduler.TaskSchedulerImpl - parentName: , name: TaskSet_2, runningTasks: 6
The issue was fixed for me by allocating just one core per executor. If I have executors with more than one core, the issue appears again. I haven't yet understood why this happens, but anyone with a similar issue can try this.
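A sketch of that workaround, assuming configuration is done when building the session (the same property can also be passed with --conf spark.executor.cores=1 on spark-submit):

```scala
import org.apache.spark.sql.SparkSession

// Cap each executor at a single core. On a standalone cluster the master then
// launches several one-core executors per worker instead of a single executor
// that takes all of the worker's cores.
val spark = SparkSession.builder()
  .appName("single-core-executors")        // illustrative name
  .config("spark.executor.cores", "1")
  .getOrCreate()
```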
