Spark Streaming - Jobs run concurrently with default spark.streaming.concurrentJobs setting

I have come across a weird behaviour in a Spark Streaming job.
We have used the default value for spark.streaming.concurrentJobs, which is 1.
The same streaming job had been running properly for more than a day with the batch interval set to 10 minutes.
Suddenly the same job started running all incoming batches concurrently instead of putting them in a queue.
Has anyone faced this before?
This would be of great help!

This kind of behavior is curious, but with spark.streaming.concurrentJobs set to 1 only one job should run at a time, and as long as the batch processing time is less than the batch interval the system should remain stable.
Spark Streaming creator Tathagata Das has commented on this: How jobs are assigned to executors in Spark Streaming?.
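For anyone checking their own setup, here is a minimal sketch (app name is a placeholder) that pins the setting explicitly instead of relying on the default, so the effective value can be ruled out as a cause:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Minutes, StreamingContext}

    // Pin the concurrency setting explicitly rather than relying on the default.
    val conf = new SparkConf()
      .setAppName("streaming-job")                      // placeholder app name
      .set("spark.streaming.concurrentJobs", "1")       // default: one batch's jobs at a time

    val ssc = new StreamingContext(conf, Minutes(10))   // 10-minute batch interval, as in the question

    // ... define input DStreams and output operations here ...

    ssc.start()
    ssc.awaitTermination()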

Related

Spark long running jobs with dataset

I have Spark code that used to run batch jobs (each job's span varies from a few seconds to a few minutes). Now I want to take this same code and run it as a long-running process. To do this, I plan to create the Spark context only once and then, in a while loop, wait for new config/tasks to come in and start executing them.
So far, whenever I have tried to run this code, my application stops running after 5-6 iterations without any exception or error printed. This long-running job has been assigned 1 executor with 10 GB of memory and a Spark driver with 4 GB of memory (which was fine for our batch jobs). So my question is: what are the various things we need to do in the code itself to move from small batch jobs to long-running jobs? I have seen this useful link - http://mkuthan.github.io/blog/2016/09/30/spark-streaming-on-yarn/ - but it is mostly about Spark configurations for keeping jobs running for a long time.
Spark version: 2.3 (can move to Spark 2.4.1), running on a YARN cluster.
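For what it's worth, a minimal sketch of that pattern, assuming a single SparkSession created up front; Task, fetchNextTask and runTask are hypothetical placeholders for however work actually arrives in your application:

    import org.apache.spark.sql.SparkSession

    // Create the session once and reuse it for every task. Task, fetchNextTask
    // and runTask are hypothetical stand-ins (queue poll, REST call, file watch, ...).
    object LongRunningDriver {

      case class Task(id: String, inputPath: String, outputPath: String)

      def fetchNextTask(): Option[Task] = None   // stub: poll a queue, REST endpoint, etc.

      def runTask(spark: SparkSession, task: Task): Unit =
        spark.read.parquet(task.inputPath).write.mode("overwrite").parquet(task.outputPath)

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("long-running-driver").getOrCreate()

        while (true) {
          fetchNextTask() match {
            case Some(task) =>
              // Catch per-task failures so one bad task does not silently end the
              // driver -- an uncaught exception here would look like "stopped after
              // a few iterations with nothing useful in the logs".
              try runTask(spark, task)
              catch { case e: Exception => println(s"Task ${task.id} failed: ${e.getMessage}") }
            case None =>
              Thread.sleep(5000)                 // back off while idle
          }
        }
      }
    }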

Possible reasons that Spark waits and does not schedule tasks to run?

This might be a very generic question, but I hope someone can give a hint. I have found that sometimes my Spark job seems to hit a "pause" many times:
The nature of the job is: read ORC files (from a Hive table), filter by certain columns, no joins, then write out to another Hive table.
There were 64K tasks in total for my job/stage (FileScan orc, followed by Filter and Project).
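For reference, a minimal sketch of the job shape described above (database, table and column names are hypothetical):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    val spark = SparkSession.builder()
      .appName("orc-filter-write")
      .enableHiveSupport()
      .getOrCreate()

    spark.table("source_db.events")              // FileScan orc
      .filter(col("event_type") === "click")     // Filter
      .select("user_id", "event_time", "url")    // Project
      .write
      .mode("overwrite")
      .saveAsTable("target_db.click_events")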
The application has 500 executors, each with 4 cores. Initially, about 2000 tasks were running concurrently and things looked good.
After a while, I noticed the number of running tasks dropped to nearly 100. Many cores/executors were just waiting with nothing to do. (I checked the logs from these waiting executors; there was no error. All tasks assigned to them were done; they were just waiting.)
After about 3-5 minutes, these waiting executors suddenly got tasks assigned and were working happily again.
Any particular reason this can happen? The application is running from spark-shell (--master yarn --deploy-mode client, with the number/size of executors etc. specified).
Thanks!

Long scheduler delay in Spark UI

I am running PySpark jobs on a 2.3.0 cluster on YARN.
I see that all the stages have a very long scheduler delay.
BUT - that is just the max time; the 75th percentile is 28 ms...
All the other time metrics are very low (GC time, task deserialization, etc.).
There is almost no shuffle write size.
The locality changes between mostly node local, process local and rack local.
What can be the reason for such a long scheduler delay time?
Is it YARN, or just missing resources to run the tasks?
Will increasing/decreasing partitions help with this issue?
Answering my own question in case somebody has the same issue: it appeared to be related to skewed data that caused long delays. That was caused by using coalesce instead of repartition on the data, which divided the data unevenly.
On top of that, I also cached the data frame after partitioning, so the processing ran locally (PROCESS_LOCAL) rather than NODE_LOCAL or RACK_LOCAL.
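A minimal sketch of that fix (table/column names and the partition count are hypothetical): repartition shuffles and rebalances the data, whereas coalesce only merges existing partitions and can leave them skewed; caching afterwards keeps the rebalanced partitions on their executors:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    val spark = SparkSession.builder().appName("repartition-example").getOrCreate()
    val df = spark.table("some_db.some_table")

    // Full shuffle that rebalances the data, instead of df.coalesce(400).
    val balanced = df.repartition(400, col("customer_id"))

    // Caching after repartitioning keeps the rebalanced partitions on their
    // executors, so later stages run PROCESS_LOCAL rather than NODE_LOCAL/RACK_LOCAL.
    balanced.cache()
    balanced.count()   // materialize the cache before the downstream work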

Spark concurrency performance issue Vs Presto

We are benchmarking Spark with Alluxio and Presto with Alluxio. To evaluate performance we took 5 different queries (with some joins, group by and sort) and ran them on a 650 GB dataset in ORC format.
The Spark execution environment is set up so that we have an always-running Spark context, and we submit queries using a REST API (Jetty server). We are not counting the first batch's execution time in this load test, as it takes a little more time because of task deserialization and so on.
What we observed while evaluating is that when we ran individual queries, or even all 5 queries concurrently, Spark performed very well compared to Presto and finished all executions in half the time of Presto.
But for the actual load test, we executed 10 batches (one batch being these 5 queries submitted at the same time) with a batch interval of 60 sec. At this point Presto performs a lot better than Spark: Presto finished all jobs in ~11 min while Spark takes ~20 min to complete all the tasks.
We tried different configurations to improve Spark concurrency, such as:
Using 20 pools with equal resource allocation and submitting jobs in a round-robin fashion (a minimal sketch of this is shown after this list).
Using one FAIR pool, submitting all jobs to this default pool, and letting Spark decide on resource allocation.
Tuning some Spark properties like spark.locality.wait and some other memory-related properties.
All tasks are NODE_LOCAL (we replicated data in Alluxio to make this happen).
Also playing around with executor memory allocation, e.g. 35 small executors (5 cores, 30 GB) and also (60-core, 200 GB) executors.
But all of these resulted in the same execution time.
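For context, a minimal sketch of the round-robin submission over FAIR pools mentioned in the list above (pool names and the stand-in queries are hypothetical):

    import org.apache.spark.sql.SparkSession

    // Enable the FAIR scheduler; an allocation file can define the pools.
    val spark = SparkSession.builder()
      .appName("concurrent-queries")
      .config("spark.scheduler.mode", "FAIR")
      // .config("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml")
      .getOrCreate()

    // Stand-ins for the real benchmark queries.
    val queries = Seq("SELECT 1 AS a", "SELECT 2 AS a", "SELECT 3 AS a", "SELECT 4 AS a", "SELECT 5 AS a")

    val threads = queries.zipWithIndex.map { case (sql, i) =>
      new Thread(new Runnable {
        def run(): Unit = {
          // Route this thread's jobs to one of the pools, round-robin.
          spark.sparkContext.setLocalProperty("spark.scheduler.pool", s"pool_${i % 20}")
          spark.sql(sql).collect()
        }
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())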
We used dstat on all the workers to see what was happening while Spark was executing tasks, and we could see no or minimal IO or network activity. CPU was always at 95%+ (it looks like it is CPU-bound). (We saw almost identical dstat output with Presto.)
Can someone suggest something we can try to achieve similar or better results than Presto?
And is there any explanation why Presto performs better under concurrency than Spark? We observed that Presto's first batch takes more time than the succeeding batches. Is Presto caching some data in memory that Spark is missing? Or is Presto's resource management / execution plan better than Spark's?
Note: both clusters are running with the same hardware configuration.

Spark tasks block randomly on a standalone cluster

We have a quite complex application that runs on Spark Standalone.
In some cases the tasks from one of the workers block randomly for an infinite amount of time in the RUNNING state.
Extra info:
There aren't any errors in the logs.
I ran with the logger in debug and didn't see any relevant messages (I see when a task starts, but then there is no activity for it).
The jobs work OK if I have only 1 worker.
The same job may execute a second time without any issues, in a proper amount of time.
I don't have any really big partitions that could cause delays for some of the tasks.
In Spark 2.0 I moved from RDDs to Datasets and I have the same issue.
In Spark 1.4 I was able to overcome the issue by turning on speculation, but in Spark 2.0 the blocking tasks are from different workers (while in 1.4 I had blocking tasks on only 1 worker), so speculation isn't fixing my issue.
I have the issue on multiple environments, so I don't think it's hardware related.
Has anyone experienced something similar? Any suggestions on how I could identify the issue?
Thanks a lot!
Later edit: I think I'm facing the same issue described here: Spark Indefinite Waiting with "Asked to send map output locations for shuffle" and here: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-stalling-during-shuffle-maybe-a-memory-issue-td6067.html, but both are without a working solution.
The last thing in the log repeated infinitely is: [dispatcher-event-loop-18] DEBUG org.apache.spark.scheduler.TaskSchedulerImpl - parentName: , name: TaskSet_2, runningTasks: 6
The issue was fixed for me by allocating just one core per executor. If I have executors with more than 1 core, the issue appears again. I haven't yet understood why this happens, but anyone having a similar issue can try this.
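For anyone wanting to try the same workaround, a minimal sketch of the relevant configuration on a standalone cluster (memory and total-core values are placeholders):

    import org.apache.spark.SparkConf

    // Force one core per executor so each executor runs a single task at a time;
    // scale out via the total core cap instead of per-executor cores.
    val conf = new SparkConf()
      .setAppName("standalone-job")            // placeholder app name
      .set("spark.executor.cores", "1")        // one task slot per executor
      .set("spark.executor.memory", "4g")      // placeholder
      .set("spark.cores.max", "32")            // placeholder: total cores across the cluster

    // Equivalent spark-submit flags: --executor-cores 1 --executor-memory 4g --total-executor-cores 32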
