Why does Cassandra fail with OutOfMemoryError during SSTable compaction?

Hi, maybe this is a stupid question, but I did not find the answer via Google.
What I have:
Java 1.7
Cassandra 1.2.8 running as a single node with -Xmx1G and -Xms1G, without any changes to the yaml file
I've created the following test column family:
CREATE COLUMN FAMILY TEST_HUGE_SF
WITH comparator = UTF8Type
AND key_validation_class=UTF8Type;
Then I try to insert rows into this column family. I use the Astyanax library to access Cassandra:
final long START = 1;
final long MAX_ROWS_COUNT = 1000000000; // 1 billion

Keyspace keyspace = AstyanaxProvider.getAstyanaxContext().getClient();
ColumnFamily<String, String> cf = new ColumnFamily<>(
        "TEST_HUGE_SF",
        StringSerializer.get(),
        StringSerializer.get());

MutationBatch mb = keyspace.prepareMutationBatch()
        .withRetryPolicy(new BoundedExponentialBackoff(250, 5000, 20));
for (long i = START; i < MAX_ROWS_COUNT; i++) {
    long t = i % 1000;
    if (t == 0) {
        System.out.println("pushed: " + i);
        mb.execute();
        Thread.sleep(1);
        mb = keyspace.prepareMutationBatch()
                .withRetryPolicy(new BoundedExponentialBackoff(250, 5000, 20));
    }
    ColumnListMutation<String> clm = mb.withRow(cf, String.format("row_%012d", i));
    clm.putColumn("col1", i);
    clm.putColumn("col2", t);
}
mb.execute();
As you can see from the code, I try to insert 1 billion rows, each containing two columns, with each column holding a simple long value.
After inserting ~122 million rows, Cassandra crashed with an OutOfMemoryError.
The logs show the following:
INFO [CompactionExecutor:1571] 2014-08-08 08:31:45,334 CompactionTask.java (line 263) Compacted 4 sstables to [\var\lib\cassandra\data\cyodaTest1\TEST_HUGE_SF\cyodaTest1-TEST_HUGE_SF-ib-2941,]. 865 252 169 bytes to 901 723 715 (~104% of original) in 922 963ms = 0,931728MB/s. 26 753 257 total rows, 26 753 257 unique. Row merge counts were {1:26753257, 2:0, 3:0, 4:0, }
INFO [CompactionExecutor:1571] 2014-08-08 08:31:45,337 CompactionTask.java (line 106) Compacting [SSTableReader(path='\var\lib\cassandra\data\cyodaTest1\TEST_HUGE_SF\cyodaTest1-TEST_HUGE_SF-ib-2069-Data.db'), SSTableReader(path='\var\lib\cassandra\data\cyodaTest1\TEST_HUGE_SF\cyodaTest1-TEST_HUGE_SF-ib-629-Data.db'), SSTableReader(path='\var\lib\cassandra\data\cyodaTest1\TEST_HUGE_SF\cyodaTest1-TEST_HUGE_SF-ib-2941-Data.db'), SSTableReader(path='\var\lib\cassandra\data\cyodaTest1\TEST_HUGE_SF\cyodaTest1-TEST_HUGE_SF-ib-1328-Data.db')]
ERROR [CompactionExecutor:1571] 2014-08-08 08:31:46,167 CassandraDaemon.java (line 132) Exception in thread Thread[CompactionExecutor:1571,1,main]
java.lang.OutOfMemoryError
at sun.misc.Unsafe.allocateMemory(Native Method)
at org.apache.cassandra.io.util.Memory.<init>(Memory.java:52)
at org.apache.cassandra.io.util.Memory.allocate(Memory.java:60)
at org.apache.cassandra.utils.obs.OffHeapBitSet.<init>(OffHeapBitSet.java:40)
at org.apache.cassandra.utils.FilterFactory.createFilter(FilterFactory.java:143)
at org.apache.cassandra.utils.FilterFactory.getFilter(FilterFactory.java:137)
at org.apache.cassandra.utils.FilterFactory.getFilter(FilterFactory.java:126)
at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.<init>(SSTableWriter.java:445)
at org.apache.cassandra.io.sstable.SSTableWriter.<init>(SSTableWriter.java:92)
at org.apache.cassandra.db.ColumnFamilyStore.createCompactionWriter(ColumnFamilyStore.java:1958)
at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:144)
at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:59)
at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:62)
at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:191)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
INFO [CompactionExecutor:1570] 2014-08-08 08:31:46,994 CompactionTask.java (line 263) Compacted 4 sstables to [\var\lib\cassandra\data\cyodaTest1\TEST_HUGE_SF\cyodaTest1-TEST_HUGE_SF-ib-3213,]. 34 773 524 bytes to 35 375 883 (~101% of original) in 44 162ms = 0,763939MB/s. 1 151 482 total rows, 1 151 482 unique. Row merge counts were {1:1151482, 2:0, 3:0, 4:0, }
INFO [CompactionExecutor:1570] 2014-08-08 08:31:47,105 CompactionTask.java (line 106) Compacting [SSTableReader(path='\var\lib\cassandra\data\cyodaTest1\TEST_HUGE_SF\cyodaTest1-TEST_HUGE_SF-ib-2069-Data.db'), SSTableReader(path='\var\lib\cassandra\data\cyodaTest1\TEST_HUGE_SF\cyodaTest1-TEST_HUGE_SF-ib-629-Data.db'), SSTableReader(path='\var\lib\cassandra\data\cyodaTest1\TEST_HUGE_SF\cyodaTest1-TEST_HUGE_SF-ib-2941-Data.db'), SSTableReader(path='\var\lib\cassandra\data\cyodaTest1\TEST_HUGE_SF\cyodaTest1-TEST_HUGE_SF-ib-1328-Data.db')]
ERROR [CompactionExecutor:1570] 2014-08-08 08:31:47,110 CassandraDaemon.java (line 132) Exception in thread Thread[CompactionExecutor:1570,1,main]
java.lang.OutOfMemoryError
at sun.misc.Unsafe.allocateMemory(Native Method)
at org.apache.cassandra.io.util.Memory.<init>(Memory.java:52)
at org.apache.cassandra.io.util.Memory.allocate(Memory.java:60)
at org.apache.cassandra.utils.obs.OffHeapBitSet.<init>(OffHeapBitSet.java:40)
at org.apache.cassandra.utils.FilterFactory.createFilter(FilterFactory.java:143)
at org.apache.cassandra.utils.FilterFactory.getFilter(FilterFactory.java:137)
at org.apache.cassandra.utils.FilterFactory.getFilter(FilterFactory.java:126)
at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.<init>(SSTableWriter.java:445)
at org.apache.cassandra.io.sstable.SSTableWriter.<init>(SSTableWriter.java:92)
at org.apache.cassandra.db.ColumnFamilyStore.createCompactionWriter(ColumnFamilyStore.java:1958)
at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:144)
at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:59)
at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:62)
at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:191)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
As far as I can see, Cassandra crashes during SSTable compaction.
Does this mean that Cassandra needs more heap space to handle more rows?
I expected that a lack of heap space would only affect performance. Can someone explain why my expectation is wrong?

Someone else noted this as well: a 1 GB heap is very small. With Cassandra 2.0, you could look into this tuning guide for further information:
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_tune_jvm_c.html
Another consideration is how garbage collection is being handled. In the Cassandra log directory there should also be GC logs indicating how often collections ran and how long they took. You can monitor them live using jvisualvm, if you want.
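Note also that the stack trace fails inside sun.misc.Unsafe.allocateMemory while building the bloom filter (OffHeapBitSet) for the new SSTable, so this particular allocation is off-heap (native) memory, and filter size grows with the number of rows in the SSTables being compacted. A back-of-envelope sketch, using the textbook bloom filter formula rather than Cassandra's exact sizing, and assuming the default bloom_filter_fp_chance of 0.01:

import math

# Textbook bloom filter sizing: m = -n * ln(p) / (ln 2)^2 bits.
# n and p are illustrative: ~122 million rows (roughly where the crash
# happened) and a 1% false-positive chance.
n = 122_000_000
p = 0.01
bits = -n * math.log(p) / (math.log(2) ** 2)
print(f"~{bits / 8 / 1024**2:.0f} MB per filter")  # roughly 139 MB

So a single compaction over that many rows can plausibly need a filter allocation of well over a hundred megabytes in one chunk, which fails when native memory is exhausted.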

Related

Azure-Databricks autoloader Binaryfile option with foreach() gives java.lang.OutOfMemoryError: Java heap space

I am trying to copy a file from one location to another using the BinaryFile option and foreach(copy) in Autoloader. It runs well with smaller files (up to 150 MB) but fails with bigger files, throwing the exception below:
22/09/07 10:25:51 INFO FileScanRDD: Reading File path: dbfs:/mnt/somefile.csv, range: 0-1652464461, partition values: [empty row], modificationTime: 1662542176000.
22/09/07 10:25:52 ERROR Utils: Uncaught exception in thread stdout writer for /databricks/python/bin/python
java.lang.OutOfMemoryError: Java heap space
at org.apache.spark.sql.catalyst.expressions.UnsafeRow.getBinary(UnsafeRow.java:416)
at org.apache.spark.sql.catalyst.expressions.SpecializedGettersReader.read(SpecializedGettersReader.java:75)
at org.apache.spark.sql.catalyst.expressions.UnsafeRow.get(UnsafeRow.java:333)
at org.apache.spark.sql.execution.python.EvaluatePython$.toJava(EvaluatePython.scala:58)
at org.apache.spark.sql.execution.python.PythonForeachWriter.$anonfun$inputByteIterator$1(PythonForeachWriter.scala:43)
at org.apache.spark.sql.execution.python.PythonForeachWriter$$Lambda$1830/1643360976.apply(Unknown Source)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.next(SerDeUtil.scala:92)
at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.next(SerDeUtil.scala:82)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:82)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:442)
at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:871)
at org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:573)
at org.apache.spark.api.python.BasePythonRunner$WriterThread$$Lambda$2008/2134044540.apply(Unknown Source)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2275)
at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:365)
22/09/07 10:25:52 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[stdout writer for /databricks/python/bin/python,5,main]
java.lang.OutOfMemoryError: Java heap space
at org.apache.spark.sql.catalyst.expressions.UnsafeRow.getBinary(UnsafeRow.java:416)
at org.apache.spark.sql.catalyst.expressions.SpecializedGettersReader.read(SpecializedGettersReader.java:75)
at org.apache.spark.sql.catalyst.expressions.UnsafeRow.get(UnsafeRow.java:333)
at org.apache.spark.sql.execution.python.EvaluatePython$.toJava(EvaluatePython.scala:58)
at org.apache.spark.sql.execution.python.PythonForeachWriter.$anonfun$inputByteIterator$1(PythonForeachWriter.scala:43)
at org.apache.spark.sql.execution.python.PythonForeachWriter$$Lambda$1830/1643360976.apply(Unknown Source)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.next(SerDeUtil.scala:92)
at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.next(SerDeUtil.scala:82)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:82)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:442)
at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:871)
at org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:573)
at org.apache.spark.api.python.BasePythonRunner$WriterThread$$Lambda$2008/2134044540.apply(Unknown Source)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2275)
at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:365)
Below is the high-level code snippet for reference. The cluster size is 2 workers and 1 driver, each with 14 GB of RAM and 4 cores:
import shutil

cloudfile_options = {
    "cloudFiles.subscriptionId": subscription_ID,
    "cloudFiles.connectionString": queue_SAS_connection_string,
    "cloudFiles.format": "BinaryFile",
    "cloudFiles.tenantId": tenant_ID,
    "cloudFiles.clientId": client_ID,
    "cloudFiles.clientSecret": client_secret,
    "cloudFiles.useNotifications": "true"
}

def copy(row):
    source = row['path']
    destination = "somewhere"
    shutil.copy(source, destination)

(spark.readStream.format("cloudFiles")
    .options(**cloudfile_options)
    .load(storage_input_path)
    .writeStream
    .foreach(copy)
    .option("checkpointLocation", checkpoint_location)
    .trigger(once=True)
    .start())
I also tested shutil.copy with huge files (20 GB) outside foreach() and it works seamlessly.
Any leads on this would be much appreciated 😊
It happens because you're passing the full row, which includes the file content that has to be serialized from the JVM to Python. If all you're doing is copying the file, just add .select("path") before .writeStream, so that only the file path is passed to Python, not the content.
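A sketch of the adjusted pipeline, reusing the question's (hypothetical) variable names:

(spark.readStream.format("cloudFiles")
    .options(**cloudfile_options)
    .load(storage_input_path)
    .select("path")  # pass only the file path to the foreach writer, not the binary content
    .writeStream
    .foreach(copy)
    .option("checkpointLocation", checkpoint_location)
    .trigger(once=True)
    .start())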

Why does increasing spark.sql.shuffle.partitions cause a FetchFailedException?

I hit a FetchFailedException when joining tables with spark.sql.shuffle.partitions = 2700, but the join runs successfully with spark.sql.shuffle.partitions = 500.
As far as I know, increasing shuffle.partitions should decrease the amount of data each task handles during the shuffle read.
Am I missing something?
Exception:
FetchFailed(BlockManagerId(699, nfjd-hadoop02-node120.jpushoa.com, 7337, None), shuffleId=4, mapId=59, reduceId=1140, message=
org.apache.spark.shuffle.FetchFailedException: failed to allocate 16777216 byte(s) of direct memory (used: 2147483648, max: 2147483648)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:554)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:485)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:64)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCode
Config:
spark.executor.cores = 1
spark.dynamicAllocation.maxExecutors = 800
After reading the shuffle fetch code:
The problem I hit is that the real block produced by the ShuffleMapTask is too large to fetch into memory at once. With more than 2000 shuffle partitions (see spark.shuffle.minNumPartitionsToHighlyCompress), the driver only reports an average block size, which understates the real size when the data is skewed.
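A possible mitigation, sketched under the assumption that oversized skewed blocks are the culprit: cap spark.maxRemoteBlockSizeFetchToMem (the same property tuned in the Word2Vec question below) so that blocks above the threshold are streamed to disk instead of being pulled into direct memory. The threshold value here is illustrative:

from pyspark import SparkConf
from pyspark.sql import SparkSession

# Blocks larger than ~200 MB are fetched to disk rather than into
# netty direct memory, avoiding the direct-memory allocation failure.
conf = (SparkConf()
        .set("spark.maxRemoteBlockSizeFetchToMem", "200m")
        .set("spark.sql.shuffle.partitions", "2700"))
spark = SparkSession.builder.config(conf=conf).getOrCreate()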

Why does Word2Vec.fit run out of heap space?

I am tokenizing a month's worth of URLs and am unable to get through fit without running out of heap space. The cached dataframe looks like it takes up less than 400 MB of RAM, and I have 1.9 TB available in total across 11 nodes.
As you can see in the code below, I have thrown a ton of space at it in the form of executor RAM and driver RAM, and have created a number of partitions, both in the dataframe and in the Word2Vec process. I am not sure how to provide more RAM or how to divide the task into smaller portions. I feel like I must be missing something fundamental about this error.
myconf = SparkConf().setAll([
    ("spark.driver.memory", "200G"),
    ("spark.driver.cores", "10"),
    ("spark.executor.cores", "10"),
    ("spark.executor.memory", "200G"),
    ("spark.driver.maxResultSize", "100G"),
    ("spark.yarn.executor.memoryOverhead", "20G"),
    ("spark.kryoserializer.buffer.max", "1400mb"),
    ("spark.maxRemoteBlockSizeFetchToMem", "1400mb"),
    ("spark.executor.extraJavaOptions", "-XX:+UseConcMarkSweepGC"),
    ("spark.sql.shuffle.partitions", "3000"),
])

appName = "word2vec-test2"
spark = (SparkSession.builder.config(conf=myconf)
         .getOrCreate())

interval_df = (spark.read.parquet("/output/ddrml/preprocessed/201906*/")
               .select("request")
               .withColumnRenamed("request", "uri")
               .where(F.col("uri").isNotNull())
               .dropDuplicates()
               .withColumn("uniqueID", F.monotonically_increasing_id())
               .cache())

tokenizer = Tokenizer(inputCol="uri", outputCol="words")
regexTokenizer = RegexTokenizer(inputCol="uri", outputCol="words", pattern="\\W")
regexTokenized = regexTokenizer.transform(interval_df)
regexTokenized = regexTokenized.repartition(2000)

word2Vec = Word2Vec(vectorSize=15, minCount=0, inputCol="words", outputCol="uri_vec",
                    numPartitions=500, maxSentenceLength=10)
model = word2Vec.fit(regexTokenized)
print("Made it through fit")
It appears to have some problems keeping up; I presume something is doing a lot of garbage collection. Then it dies in fit with a Java heap space error.
19/07/17 12:28:43 ERROR scheduler.LiveListenerBus: Dropping SparkListenerEvent because no remaining room in event queue. This likely means one of the SparkListeners is too slow and cannot keep up with the rate at which tasks are being started by the scheduler.
19/07/17 12:28:43 WARN scheduler.LiveListenerBus: Dropped 1 SparkListenerEvents since Wed Dec 31 19:00:00 EST 1969
Exception in thread "dispatcher-event-loop-4" java.lang.OutOfMemoryError: Java heap space
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "dispatcher-event-loop-4"
19/07/17 12:30:48 WARN scheduler.LiveListenerBus: Dropped 794 SparkListenerEvents since Wed Jul 17 12:28:43 EDT 2019
19/07/17 12:32:00 WARN server.TransportChannelHandler:
Exception in connection from /192.168.7.29:46960
java.lang.OutOfMemoryError: Java heap space
at sun.reflect.ByteVectorImpl.trim(ByteVectorImpl.java:70)
at sun.reflect.MethodAccessorGenerator.generate(MethodAccessorGenerator.java:386)
at sun.reflect.MethodAccessorGenerator.generateSerializationConstructor(MethodAccessorGenerator.java:112)
at sun.reflect.ReflectionFactory.generateConstructor(ReflectionFactory.java:398)
at sun.reflect.ReflectionFactory.newConstructorForSerialization(ReflectionFactory.java:360)
at java.io.ObjectStreamClass.getSerializableConstructor(ObjectStreamClass.java:1520)
at java.io.ObjectStreamClass.access$1500(ObjectStreamClass.java:79)
at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:507)
at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:482)
at java.security.AccessController.doPrivileged(Native Method)
at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:482)
at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:379)
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:669)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1876)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1745)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2033)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:557)
at org.apache.spark.rpc.netty.NettyRpcEndpointRef.readObject(NettyRpcEnv.scala:495)
at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1158)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:427)
Update 1:
I was able to make this run over 20 days of data by changing the conf as shown below, but I'm still not able to make it run over 30 days. Also, when looking at the job in the UI via the history server, I don't see any failed executors in the stage the job is in when it dies. I'm not sure what to make of that.
myconf = SparkConf().setAll([
    ("spark.driver.memory", "32G"),
    ("spark.driver.cores", "10"),
    ("spark.executor.cores", "5"),
    ("spark.executor.memory", "15G"),
    ("spark.rpc.message.maxSize", "800"),
    ("spark.default.parallelism", "2000"),
])

Why does Spark shuffling not spill to disk?

A simple wordcount program in Spark doesn't spill to disk and instead results in an OOM error. In short:
The environment:
Spark 2.3.0, Scala 2.11.8
3 executors, each with 1 core and 512 MB RAM
Text file: 341 MB
Other configurations are default (spark.memory.fraction = 0.6)
The code:
import org.apache.spark.SparkContext

object WordCount {
  def main(args: Array[String]): Unit = {
    val inPath = args(0)
    val sc = new SparkContext("spark://master:7077", "Word Count ver3")
    val words = sc.textFile(inPath, minPartitions = 20)
      .map(line => line.toLowerCase())
      .flatMap(text => text.split(' '))
    val wc = words.groupBy(word => word)
      .map({ case (groupName, groupList) => (groupName, groupList.size) })
      .count()
  }
}
The error:
2018-05-04 13:46:36 WARN TaskSetManager:66 - Lost task 1.0 in stage 1.0 (TID 21, 192.168.10.107, executor 0): java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.lang.String.<init>(String.java:325)
at com.esotericsoftware.kryo.io.Input.readAscii(Input.java:598)
at com.esotericsoftware.kryo.io.Input.readString(Input.java:472)
at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.read(DefaultSerializers.java:195)
at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.read(DefaultSerializers.java:184)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:278)
at org.apache.spark.serializer.DeserializationStream.readKey(Serializer.scala:156)
at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:188)
at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:185)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:153)
at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:41)
at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:90)
at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:105)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
The heap dump: (screenshot not reproduced here)
The problems are:
The heap size available for execution should be (512 - 300) * 0.6 = 127 MB (since I don't use caching). Why is the ExternalAppendOnlyMap larger than 380 MB? The object must be stored in heap memory, and its size cannot exceed the heap size.
ExternalAppendOnlyMap is a spillable class, and it should have spilled its data to disk when memory ran low, but in this case it didn't, resulting in an OOM error.
The heap memory of the program is divided into Spark execution memory and user memory. Looking at the heap dump, which objects are stored in which division of heap memory?
Thanks, your time is really appreciated.

cassandra sstableloader hanging with receiving progress 100%

I'm trying to load data with sstableloader in Cassandra 2.0.7.
The terminal shows progress 100%. When I check netstats with nodetool netstats, it shows:
Mode: NORMAL
Bulk Load 21d7d610-a5f2-11e5-baa7-8fc95be03ac4
/10.19.150.70
Receiving 4 files, 138895248 bytes total
/root/data/whatyyun/metadata/whatyyun-metadata-tmp-jb-8-Data.db 67039680/67039680 bytes(100%) received from /10.19.150.70
/root/data/whatyyun/metadata/whatyyun-metadata-tmp-jb-10-Data.db 3074549/3074549 bytes(100%) received from /10.19.150.70
/root/data/whatyyun/metadata/whatyyun-metadata-tmp-jb-9-Data.db 43581052/43581052 bytes(100%) received from /10.19.150.70
/root/data/whatyyun/metadata/whatyyun-metadata-tmp-jb-7-Data.db 25199967/25199967 bytes(100%) received from /10.19.150.70
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name Active Pending Completed
Commands n/a 0 0
Responses n/a 0 11671
The sstableloader has been hanging for hours. Checking the log, there is an error that may be relevant:
ERROR [CompactionExecutor:7] 2015-12-19 09:45:53,811 CassandraDaemon.java (line 198) Exception in thread Thread[CompactionExecutor:7,1,main]
java.lang.IndexOutOfBoundsException
at java.nio.Buffer.checkIndex(Buffer.java:532)
at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:139)
at org.apache.cassandra.db.marshal.TimeUUIDType.compareTimestampBytes(TimeUUIDType.java:62)
at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:51)
at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:31)
at org.apache.cassandra.dht.LocalToken.compareTo(LocalToken.java:44)
at org.apache.cassandra.db.DecoratedKey.compareTo(DecoratedKey.java:85)
at org.apache.cassandra.db.DecoratedKey.compareTo(DecoratedKey.java:36)
at java.util.concurrent.ConcurrentSkipListMap.findNode(ConcurrentSkipListMap.java:804)
at java.util.concurrent.ConcurrentSkipListMap.doGet(ConcurrentSkipListMap.java:828)
at java.util.concurrent.ConcurrentSkipListMap.get(ConcurrentSkipListMap.java:1626)
at org.apache.cassandra.db.Memtable.resolve(Memtable.java:215)
at org.apache.cassandra.db.Memtable.put(Memtable.java:173)
at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:893)
at org.apache.cassandra.db.index.AbstractSimplePerColumnSecondaryIndex.insert(AbstractSimplePerColumnSecondaryIndex.java:107)
at org.apache.cassandra.db.index.SecondaryIndexManager.indexRow(SecondaryIndexManager.java:441)
at org.apache.cassandra.db.Keyspace.indexRow(Keyspace.java:407)
at org.apache.cassandra.db.index.SecondaryIndexBuilder.build(SecondaryIndexBuilder.java:62)
at org.apache.cassandra.db.compaction.CompactionManager$9.run(CompactionManager.java:833)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
ERROR [NonPeriodicTasks:1] 2015-12-19 09:45:53,812 CassandraDaemon.java (line 198) Exception in thread Thread[NonPeriodicTasks:1,5,main]
java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.IndexOutOfBoundsException
at org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:413)
at org.apache.cassandra.db.index.SecondaryIndexManager.maybeBuildSecondaryIndexes(SecondaryIndexManager.java:142)
at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:113)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: java.lang.IndexOutOfBoundsException
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:188)
at org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:409)
... 9 more
Caused by: java.lang.IndexOutOfBoundsException
at java.nio.Buffer.checkIndex(Buffer.java:532)
at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:139)
at org.apache.cassandra.db.marshal.TimeUUIDType.compareTimestampBytes(TimeUUIDType.java:62)
at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:51)
at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:31)
at org.apache.cassandra.dht.LocalToken.compareTo(LocalToken.java:44)
at org.apache.cassandra.db.DecoratedKey.compareTo(DecoratedKey.java:85)
at org.apache.cassandra.db.DecoratedKey.compareTo(DecoratedKey.java:36)
at java.util.concurrent.ConcurrentSkipListMap.findNode(ConcurrentSkipListMap.java:804)
at java.util.concurrent.ConcurrentSkipListMap.doGet(ConcurrentSkipListMap.java:828)
at java.util.concurrent.ConcurrentSkipListMap.get(ConcurrentSkipListMap.java:1626)
at org.apache.cassandra.db.Memtable.resolve(Memtable.java:215)
at org.apache.cassandra.db.Memtable.put(Memtable.java:173)
at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:893)
at org.apache.cassandra.db.index.AbstractSimplePerColumnSecondaryIndex.insert(AbstractSimplePerColumnSecondaryIndex.java:107)
at org.apache.cassandra.db.index.SecondaryIndexManager.indexRow(SecondaryIndexManager.java:441)
at org.apache.cassandra.db.Keyspace.indexRow(Keyspace.java:407)
at org.apache.cassandra.db.index.SecondaryIndexBuilder.build(SecondaryIndexBuilder.java:62)
at org.apache.cassandra.db.compaction.CompactionManager$9.run(CompactionManager.java:833)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
... 3 more
The schema of the table is as follows:
CREATE TABLE metadata (
    userid timeuuid,
    dirname text,
    basename text,
    ctime timestamp,
    fileid timeuuid,
    imagefileid timeuuid,
    imagefilesize int,
    mtime timestamp,
    nodetype int,
    showname text,
    size bigint,
    timelong text,
    PRIMARY KEY (userid, dirname, basename, ctime)
) WITH
    bloom_filter_fp_chance=0.010000 AND
    caching='KEYS_ONLY' AND
    comment='' AND
    dclocal_read_repair_chance=0.000000 AND
    gc_grace_seconds=864000 AND
    index_interval=128 AND
    read_repair_chance=0.100000 AND
    replicate_on_write='true' AND
    populate_io_cache_on_flush='false' AND
    default_time_to_live=0 AND
    speculative_retry='99.0PERCENTILE' AND
    memtable_flush_period_in_ms=0 AND
    compaction={'class': 'SizeTieredCompactionStrategy'} AND
    compression={'sstable_compression': 'LZ4Compressor'};

CREATE INDEX idx_fileid ON metadata (fileid);
CREATE INDEX idx_nodetype ON metadata (nodetype);
Can I safely kill the sstableloader process? Has this bulk load actually finished?
You should try to increase the heap for sstableloader:
vim $(which sstableloader)
Comment out the original java invocation and add one with a larger heap (note that the last argument should be "$@", which forwards the script's arguments):
######### "$JAVA" $JAVA_AGENT -ea -cp "$CLASSPATH" $JVM_OPTS -Xmx$MAX_HEAP_SIZE \
"$JAVA" $JAVA_AGENT -ea -cp "$CLASSPATH" $JVM_OPTS -XX:+UseG1GC -Xmx10G -Xms10G -XX:+UseTLAB -XX:+ResizeTLAB \
        -Dcassandra.storagedir="$cassandra_storagedir" \
        -Dlogback.configurationFile=logback-tools.xml \
        org.apache.cassandra.tools.BulkLoader "$@"
I hope that solves your issue.
Your node may be running out of resources, perhaps due to heavy load or some other process.
Try restarting Cassandra on the nodes running under high load and see if that helps.
From the error above it seems you are hitting a resource crunch, so you need to tune some memory settings, such as Xmx and Xms (the maximum and minimum heap sizes), in the cassandra-env.sh file for older versions of Cassandra (i.e. 2.x).
After tuning the above, restart your node/cluster and try loading again.
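For illustration, the relevant knobs in cassandra-env.sh on a 2.x install look roughly like this (the values are examples, not recommendations; size them to your hardware):

MAX_HEAP_SIZE="8G"     # feeds -Xms/-Xmx
HEAP_NEWSIZE="800M"    # feeds -Xmn (young generation size)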
