Spark Out of Memory Error For MapOutputTracker serializeMapStatuses - apache-spark

I have a Spark job with several hundred thousand tasks (300,000 and more) at stage 0, and during the shuffle the following exception is thrown on the driver side:
util.Utils: Suppressing exception in finally: null
java.lang.OutOfMemoryError at
java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123) at
java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117) at
java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93) at
java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153) at
java.util.zip.DeflaterOutputStream.deflate(DeflaterOutputStream.java:253) at
java.util.zip.DeflaterOutputStream.write(DeflaterOutputStream.java:211) at
java.util.zip.GZIPOutputStream.write(GZIPOutputStream.java:145) at
java.io.ObjectOutputStream$BlockDataOutputStream.writeBlockHeader(ObjectOutputStream.java:1894) at
java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1875) at
java.io.ObjectOutputStream$BlockDataOutputStream.flush(ObjectOutputStream.java:1822) at
java.io.ObjectOutputStream.flush(ObjectOutputStream.java:719) at
java.io.ObjectOutputStream.close(ObjectOutputStream.java:740) at
org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$2.apply$mcV$sp(MapOutputTracker.scala:618) at
org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1319) at
org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:617) at
org.apache.spark.MapOutputTrackerMaster.getSerializedMapOutputStatuses(MapOutputTracker.scala:560) at
org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:349) at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)
I checked the ByteArrayOutputStream code, and it throws an OutOfMemoryError when the array size grows beyond Integer.MAX_VALUE, which is about 2 GB. That means the serialized map statuses must be smaller than 2 GB.
I also checked the MapOutputTracker code; the size of the map statuses depends on the number of tasks in this stage and in the following stage.
I was wondering if anyone has encountered this issue and how you resolved it. My understanding is that I can only reduce the number of tasks, but fewer partitions will slow down the computation.

This is likely caused by a single block that exceeds 2 GB of memory during a shuffle. It usually means your operation requires higher parallelism, which will reduce the size of any individual block - hopefully below the 2 GB limit (which is extremely high).
No Spark shuffle block can be greater than 2 GB
Spark uses ByteBuffer as the abstraction for storing blocks:
val buf = ByteBuffer.allocate(length.toInt)
ByteBuffer is limited by Integer.MAX_VALUE (2 GB).
Increasing your Parallelism
1) Repartition your data before invoking the operation that causes this error as follows:
DataFrame.repartition(400)
RDD.repartition(400)
2) Pass the number of partitions into the operation as the last argument (where supported):
import org.apache.spark.rdd.PairRDDFunctions
RDD.groupByKey(numPartitions: Int)
RDD.join(other: RDD, numPartitions: Int)
3) Set the default parallelism (partitions) through the SparkConf as follows (NOT YET SUPPORTED in Databricks Cloud):
// create the SparkConf used to create the SparkContext
val conf = new SparkConf()

// set the parallelism/partitions
conf.set("spark.default.parallelism", "400")

// create the SparkContext with the conf
val sc = new SparkContext(conf)

// check the parallelism/partitions
sc.defaultParallelism
4) Set the SQL partitions through SQL as follows (default is 200):
SET spark.sql.shuffle.partitions=400;
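As a concrete illustration of options 1) and 4) above, here is a minimal Scala sketch (the input path, the "someKey" column name, and the partition count of 400 are assumptions for illustration, not taken from the original question):
import org.apache.spark.sql.SparkSession

object IncreaseParallelism {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("increase-parallelism-sketch")
      // option 4): more shuffle partitions for DataFrame/SQL shuffles
      .config("spark.sql.shuffle.partitions", "400")
      .getOrCreate()

    // hypothetical input path; any large DataFrame behaves the same way
    val df = spark.read.parquet("/path/to/large/input")

    // option 1): repartition before the wide operation so each shuffle block stays well below 2 GB
    val result = df
      .repartition(400)
      .groupBy("someKey") // assumed column name
      .count()

    result.write.parquet("/path/to/output")
    spark.stop()
  }
}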
Why 2GB?
This limit exists because of the limit on Java integers: 2^31 - 1 == 2,147,483,647 ~= 2 GB.
Spark's shuffle mechanism currently uses Java byte arrays to transport the data across the network.
This may be enhanced in the future by expanding Spark's shuffle to use larger byte arrays indexed by Longs, by chaining byte arrays together, or both.
https://forums.databricks.com/questions/1140/im-seeing-an-outofmemoryerror-requested-array-size.html

Related

When does Spark send data to the executors after I create a DataFrame from an RDD?

I'm trying to construct a DataFrame from a list of data, then write it as parquet files:
dataframe = None
while True:
    data_list = get_data_list()  # this function would return a list of data, about 1 million rows
    rdd = sparkContext.parallelize(data_list, 20)
    if dataframe:
        dataframe.union(sparkSession.createDataFrame(data=rdd))
    else:
        dataframe = sparkSession.createDataFrame(data=rdd)
    if some_judgement:
        break
dataframe.write.parquet('...')
But I found the driver fails with java.lang.OutOfMemoryError: Java heap space after a few cycles. If I increase the driver memory or decrease the number of cycles in the loop, this exception stops occurring. So I guess that even though I created an RDD, the data is still stored on the driver. So when will the data be sent to the executors? I want to decrease the memory usage of the driver.
Can you check the logs and see where the exception happens (on the driver or on an executor)? If it happens on the driver, can you increase driver memory to 8 or 10 GB and see if the job succeeds?
I would also suggest setting higher values for the memoryOverhead parameters (a sketch follows the two settings below):
spark.driver.memoryOverhead
spark.executor.memoryOverhead
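A minimal Scala sketch of raising these values (the 10g/2g figures are placeholder assumptions, not recommendations; in practice these are usually passed to spark-submit with --conf, since spark.driver.memory only takes effect if set before the driver JVM starts):
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Placeholder values for illustration only; tune them to your cluster.
val conf = new SparkConf()
  .set("spark.driver.memory", "10g")            // only effective if set before the driver JVM starts
  .set("spark.driver.memoryOverhead", "2g")
  .set("spark.executor.memoryOverhead", "2g")

val spark = SparkSession.builder().config(conf).getOrCreate()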

What will happen if a single file is larger than an executor in a map operation in YARN - Spark?

I'm working on a solution where the driver program reads an XML file, takes an HDFS file path from it, and that file is read inside a map operation. I have a few questions here.
Since the map operation will be performed in containers (containers are allocated when the job starts):
What if the single input file is greater than an executor? Since the file is not read in the driver program, it cannot allocate more resources? Or will the application master request more memory from the resource manager?
Any help is highly appreciated.
What if the single input file is greater than an executor?
As the file is in HDFS, Spark will create one partition per HDFS block. Every partition is processed on a worker.
If the file has more blocks than can be computed at once, Spark makes sure the pending partitions are computed once resources are free (after the running tasks of the stage complete).
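A quick way to see this block-to-partition mapping is to check the partition count of a loaded file; the HDFS path below is a placeholder:
// Assumes an existing SparkContext `sc`; the HDFS path is a placeholder.
val lines = sc.textFile("hdfs:///data/some/large/file.txt")

// One partition per HDFS block, e.g. roughly 8 partitions for a 1 GB file with 128 MB blocks.
println(s"number of partitions: ${lines.getNumPartitions}")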
A loaded file appears as an RDD. An RDD is a collection of pieces called partitions, which reside across the cluster. Reading the file is not the problem, but a subsequent transformation can throw an OOM exception depending on the executor memory limits, because some shuffle operations require transferring partitions to one place. By default executor memory is set to 512 MB, but for processing large amounts of data you should set a custom memory parameter.
Spark reserves parts of that memory for cached data storage and for temporary shuffle data. Set the heap for these with the parameters spark.storage.memoryFraction (default 0.6) and spark.shuffle.memoryFraction (default 0.2). Because these parts of the heap can grow before Spark can measure and limit them, two additional safety parameters must be set: spark.storage.safetyFraction (default 0.9) and spark.shuffle.safetyFraction (default 0.8). Safety parameters lower the memory fraction by the amount specified. The actual part of the heap used for storage by default is 0.6 × 0.9 (safety fraction times the storage memory fraction), which equals 54%. Similarly, the part of the heap used for shuffle data is 0.2 × 0.8 (safety fraction times the shuffle memory fraction), which equals 16%. You then have 30% of the heap reserved for other Java objects and resources needed to run tasks. You should, however, count on only 20%.
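A small sketch of that arithmetic and of setting the fraction parameters mentioned above (these apply to the legacy static memory manager; the 4 GB heap is an assumed figure):
import org.apache.spark.SparkConf

// Legacy memory-manager settings from the answer (defaults shown).
val conf = new SparkConf()
  .set("spark.storage.memoryFraction", "0.6")
  .set("spark.storage.safetyFraction", "0.9")
  .set("spark.shuffle.memoryFraction", "0.2")
  .set("spark.shuffle.safetyFraction", "0.8")

// The arithmetic from the answer, for an assumed 4 GB executor heap.
val heap = 4L * 1024 * 1024 * 1024
val storageBytes = (heap * 0.6 * 0.9).toLong // ~54% of the heap for cached data
val shuffleBytes = (heap * 0.2 * 0.8).toLong // ~16% of the heap for shuffle data
println(s"storage: $storageBytes bytes, shuffle: $shuffleBytes bytes")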

Why does Spark Streaming fail at String decoding due to java.lang.OutOfMemoryError?

I run a Spark Streaming application (createStream API) on a YARN cluster of 3 nodes with 128 GB RAM each (!). The app reads records from a Kafka topic and writes to HDFS.
Most of the time the application fails or is killed (mostly the receiver fails) due to a Java heap error, no matter how much memory I configure for the executor/driver.
16/11/23 13:00:20 WARN ReceiverTracker: Error reported by receiver for stream 0: Error handling message; exiting - java.lang.OutOfMemoryError: Java heap space
at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:149)
at java.lang.StringCoding.decode(StringCoding.java:193)
at java.lang.String.<init>(String.java:426)
at java.lang.String.<init>(String.java:491)
at kafka.serializer.StringDecoder.fromBytes(Decoder.scala:50)
at kafka.serializer.StringDecoder.fromBytes(Decoder.scala:42)
at kafka.message.MessageAndMetadata.message(MessageAndMetadata.scala:32)
at org.apache.spark.streaming.kafka.KafkaReceiver$MessageHandler.run(KafkaInputDStream.scala:137)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
If you are using KafkaUtils.createStream(...), a single receiver runs in one Spark executor, and if the topic is partitioned, multiple receiver threads run, one per partition. So if your stream has large string objects, the message rate is high, and all threads share a single executor's memory, you may hit an OOM issue.
The following are possible solutions.
Since the job runs out of memory in the receiver, first check the batch and block interval properties. If the batch interval is large (like 5 min), try a smaller value (like 100 ms).
Limit the rate of records received per second with "spark.streaming.receiver.maxRate", and also make sure that "spark.streaming.unpersist" is set to "true".
You may use KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](streamingContext, kafkaParams, topics). In this case, instead of a single receiver, the Spark executors connect directly to the Kafka partition leaders and receive the data in parallel (each Kafka partition is one KafkaRDD partition). Unlike multiple threads in a single receiver executor, here multiple executors run in parallel and the load is distributed, as in the sketch below.
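A minimal sketch of the direct-stream approach with the old spark-streaming-kafka 0.8 API that matches the StringDecoder signature above (the broker list, topic name, batch interval, and output path are assumptions):
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("direct-stream-sketch")
val ssc = new StreamingContext(conf, Seconds(10)) // assumed 10-second batch interval

// Assumed Kafka connection details.
val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
val topics = Set("my-topic")

// No receiver: each Kafka partition becomes one KafkaRDD partition, read in parallel by the executors.
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topics)

stream.map(_._2).saveAsTextFiles("hdfs:///output/records") // assumed output path prefix
ssc.start()
ssc.awaitTermination()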

Running Spark MLlib Decision Tree gives block size error

I'm running a decision tree on a dataframe of about 2000 points and 500 features. maxBins is 182. No matter how I increase the shuffle partition setting from 200 up to 4000, I keep getting a failure at stage 3 of the decision tree training saying "max integer reached", referring to Spark's shuffle block size limit. Note my dataframes are Spark SQL dataframes, not RDDs.
Here is the error:
...
Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:828)
at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:125)
at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:113)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1206)
at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:127)
at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:134)
at org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:522)
at org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:312)
at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:58)
at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:58)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
...
Here is the code producing it:
val assembled = assembler.transform(features)

val dt = new DecisionTreeClassifier()
  .setLabelCol("indexedLabel")
  .setFeaturesCol("indexedFeatures")
  .setImpurity(impurity)
  .setMaxBins(maxBins)
  .setMaxDepth(maxDepth)

val pipeline = new Pipeline().setStages(Array(labelIndexer, dt))
val model = pipeline.fit(assembled)
Thank you for any pointers on what might be causing this and how to fix it.
Thank you.
Try to increase the number of partitions - try the repartition() method.
The reason for this error is that Spark uses memory-mapped files to handle partition data blocks, and it is currently not possible to memory-map anything larger than 2 GB (Integer.MAX_VALUE) - this is not a Spark issue, though.
The workaround is to increase the number of partitions. That reduces the block size of each partition and might help with the issue; see the sketch below.
There is also some activity to work around this in Spark itself, by processing partition blocks in chunks.
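A minimal sketch of that suggestion applied to the question's pipeline (`assembled`, `labelIndexer`, and `dt` are the values from the question's code; the partition count of 1000 is an arbitrary example):
// Spread the assembled features over more partitions before training,
// so that no single shuffle/storage block approaches the 2 GB limit.
val repartitioned = assembled.repartition(1000) // assumed value; tune for your data

val pipeline = new Pipeline().setStages(Array(labelIndexer, dt))
val model = pipeline.fit(repartitioned)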

Spark groupBy driver throws OutOfMemory

I have an RDD[((Long, Long), Float)] of about 150 GB (shown in the web UI storage tab).
When I do a groupBy on this RDD, the driver program throws the following error:
15/07/16 04:37:08 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.remote.default-remote-dispatcher-39] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2271)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1876)
at java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1785)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1188)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
at akka.serialization.JavaSerializer$$anonfun$toBinary$1.apply$mcV$sp(Serializer.scala:129)
at akka.serialization.JavaSerializer$$anonfun$toBinary$1.apply(Serializer.scala:129)
at akka.serialization.JavaSerializer$$anonfun$toBinary$1.apply(Serializer.scala:129)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at akka.serialization.JavaSerializer.toBinary(Serializer.scala:129)
at akka.remote.MessageSerializer$.serialize(MessageSerializer.scala:36)
at akka.remote.EndpointWriter$$anonfun$serializeMessage$1.apply(Endpoint.scala:845)
at akka.remote.EndpointWriter$$anonfun$serializeMessage$1.apply(Endpoint.scala:845)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at akka.remote.EndpointWriter.serializeMessage(Endpoint.scala:844)
at akka.remote.EndpointWriter.writeSend(Endpoint.scala:747)
The executors didn't even start the stage.
This RDD has 120000 partitions. Could this be the cause of the error?
The size of at least one of the partitions is more than the memory you have allocated to the executor (you can set that with the --executor-memory flag on the command line when running the Spark job).
After grouping by (Long, Long), at least one of your groups is too big to fit in memory. Spark expects each record after grouping, ((Long, Long), Iterator[Float]), to fit in memory, and this is not the case for your data. See https://spark.apache.org/docs/1.2.0/tuning.html and look for "Memory Usage of Reduce Tasks".
I suggest working around this by increasing your data parallelism. Add a mapping step before the groupBy to break down your data, for example:
ds.map(x => ((x._1._1, x._1._2, x._1._1 % 2), x._2))
Then group by the new key (you might do something more sophisticated than x._1._1 % 2).
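A self-contained sketch of this salting idea (the sample data, the 10 salt buckets, and the value-based salt are assumptions; if you only need per-key aggregates, reduce within each salt bucket and then reduce again after stripping the salt):
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("salted-groupby-sketch"))

// Assumed data with the same shape as the question: RDD[((Long, Long), Float)].
val data = sc.parallelize(Seq(((1L, 2L), 0.5f), ((1L, 2L), 1.5f), ((3L, 4L), 2.0f)))

val numSalts = 10 // assumed number of salt buckets

// Spread each key over several salted keys so no single group has to fit in one task's memory.
val salted = data.map { case ((a, b), v) => ((a, b, v.hashCode % numSalts), v) }

// Each salted group is now much smaller than the original group.
val grouped = salted.groupByKey()

// If per-key sums are what you actually need, aggregate per salted key,
// then strip the salt and aggregate again.
val sums = salted.reduceByKey(_ + _)
  .map { case ((a, b, _), s) => ((a, b), s) }
  .reduceByKey(_ + _)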
