Cassandra: java.lang.OutOfMemoryError: Java heap space

I am using Cassandra 2.0.8 and getting this exception:
INFO 16:44:50,132 Initializing system.batchlog
INFO 16:44:50,138 Initializing system.sstable_activity
INFO 16:44:50,142 Opening /var/lib/cassandra/data/system/sstable_activity/system-sstable_activity-jb-10 (826 bytes)
INFO 16:44:50,142 Opening /var/lib/cassandra/data/system/sstable_activity/system-sstable_activity-jb-9 (827 bytes)
INFO 16:44:50,142 Opening /var/lib/cassandra/data/system/sstable_activity/system-sstable_activity-jb-11 (825 bytes)
INFO 16:44:50,150 reading saved cache /var/lib/cassandra/saved_caches/system-sstable_activity-KeyCache-b.db
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid3460.hprof ...
Heap dump file created [13378724 bytes in 0.071 secs]
ERROR 16:44:50,333 Exception encountered during startup
java.lang.OutOfMemoryError: Java heap space
at java.util.ArrayList.<init>(ArrayList.java:144)
at org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:120)
at org.apache.cassandra.service.CacheService$KeyCacheSerializer.deserialize(CacheService.java:365)
at org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:119)
at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:262)
at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:421)
at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:392)
at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:309)
at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:266)
at org.apache.cassandra.db.Keyspace.open(Keyspace.java:110)
at org.apache.cassandra.db.Keyspace.open(Keyspace.java:88)
at org.apache.cassandra.db.SystemKeyspace.checkHealth(SystemKeyspace.java:536)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:261)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496)
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:585)
java.lang.OutOfMemoryError: Java heap space
at java.util.ArrayList.<init>(ArrayList.java:144)
at org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:120)
at org.apache.cassandra.service.CacheService$KeyCacheSerializer.deserialize(CacheService.java:365)
at org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:119)
at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:262)
at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:421)
at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:392)
at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:309)
at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:266)
at org.apache.cassandra.db.Keyspace.open(Keyspace.java:110)
at org.apache.cassandra.db.Keyspace.open(Keyspace.java:88)
at org.apache.cassandra.db.SystemKeyspace.checkHealth(SystemKeyspace.java:536)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:261)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496)
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:585)
Exception encountered during startup: Java heap space
Can anyone tell me the reason and a solution?

Go to cassandra/conf/cassandra-env.sh and check the current heap size. As a rule of thumb, you can assign at most half of the machine's RAM to the heap:
#MAX_HEAP_SIZE="4G"
#HEAP_NEWSIZE="800M"
If you are changing the current heap size, remove the comment markers and set the values:
MAX_HEAP_SIZE="4G"
HEAP_NEWSIZE="800M"
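If you want the half-of-RAM rule applied automatically, something like the sketch below could compute the values for cassandra-env.sh (assumptions: a Linux host with /proc/meminfo, and an 8 GB cap, which is my own choice rather than anything Cassandra mandates):
#!/usr/bin/env bash
# Sketch: derive MAX_HEAP_SIZE as half of physical RAM, capped at 8 GB (the cap is an assumption).
total_mb=$(awk '/MemTotal/ {print int($2 / 1024)}' /proc/meminfo)
half_mb=$(( total_mb / 2 ))
heap_mb=$(( half_mb > 8192 ? 8192 : half_mb ))
# HEAP_NEWSIZE of roughly 1/4 of the heap is a common starting point (assumption; tune per workload).
new_mb=$(( heap_mb / 4 ))
echo "MAX_HEAP_SIZE=\"${heap_mb}M\""
echo "HEAP_NEWSIZE=\"${new_mb}M\""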

It's possible your key cache is taking up too much space (since that's where it died), but it seems unlikely. You can try deleting the saved key cache in
/var/lib/cassandra/saved_caches
before starting, and set
key_cache_size_in_mb: 0
in your cassandra.yaml as a test (I would not recommend this permanently) to have it disabled.
You can determine what is filling up your heap by opening the java_pid3460.hprof file it created in YourKit or another heap analyzer. There may be something funny going on; it is very strange to be dying at 13 MB or so (the size of the heap dump).
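A minimal sketch of that test, assuming a package-style install (service script, cassandra.yaml under /etc/cassandra with key_cache_size_in_mb uncommented) and a JDK that ships jhat:
sudo service cassandra stop
# Clear the saved key cache so it is not reloaded at startup.
rm -f /var/lib/cassandra/saved_caches/*KeyCache*.db
# Disable the key cache temporarily (test only; re-enable it afterwards).
sudo sed -i 's/^key_cache_size_in_mb:.*/key_cache_size_in_mb: 0/' /etc/cassandra/cassandra.yaml
sudo service cassandra start
# Inspect the heap dump to see what actually filled the heap.
jhat java_pid3460.hprof    # or open the file in YourKit / Eclipse MAT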

Delete all commit log files in /usr/local/var/lib/cassandra/commitlog/ and restart Cassandra.
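For example, assuming a Homebrew-style install with the commit log under /usr/local/var/lib/cassandra (note that removing commit log segments discards any writes that have not yet been flushed to SSTables):
# Stop Cassandra, remove the commit log segments, then restart.
# WARNING: any unflushed writes in these segments are lost.
pkill -f CassandraDaemon || true
rm -f /usr/local/var/lib/cassandra/commitlog/CommitLog-*.log
cassandra    # or restart via your service manager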

Related

Compaction causes out-of-memory error and shuts down the Cassandra process

Using Cassandra 3.11 with 18 nodes and 32 GB of memory in production, the weekly compaction job throws the following error in system.log and debug.log, then the Cassandra process dies and I have to restart it.
ERROR [ReadStage-4] JVMStabilityInspector.java:142 - JVM state determined to be unstable. Exiting forcefully due to: java.lang.OutOfMemoryError: Direct buffer memory
ERROR [ReadStage-5] JVMStabilityInspector.java:142 - JVM state determined to be unstable. Exiting forcefully due to: java.lang.OutOfMemoryError: Direct buffer memory
DEBUG [ReadRepairStage:29517] ReadCallback.java:242 - Digest mismatch:
org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey(-3787568997731881233, 0000000004ca5c48) (2b912cd2000e6bb5b481fa849e438ae4 vs 962e899380ce22ac970c6be0014707de)
The Java heap size is 8 GB, set in /opt/apache-cassandra-3.11.0/conf/jvm.options:
-Xms8G
-Xmx8G
Is there any workaround, other than increasing the heap size, to prevent the out-of-memory issue during compaction?
On their own, the errors you posted don't indicate to me that they're related to compaction. Instead, the thread IDs (ReadStage-*) indicate that they're the result of read requests.
If anything, the DigestMismatchException is a bit more telling because it indicates that your replicas are out of sync. If you're seeing dropped mutations in the logs, that's a clear indication that the nodes are overloaded. That symptom is more in line with the OOM you're seeing, which I believe is a result of your cluster not being able to keep up with the application traffic.
With 32GB RAM systems, we recommend bumping up the heap to 16GB for production systems. I realise you prefer not to do this but it's an appropriate action and it's something I would do if I were managing the cluster.
I'd also make sure that data/ and commitlog/ are on separate disks/volumes so they're not competing for the same IO (unless you have direct-attached SSDs). If you're still seeing lots of dropped mutations, consider increasing the capacity of your cluster by adding more nodes. Cheers!
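A quick way to look for the overload symptoms mentioned above, and to apply the heap change, might be the following (a sketch; paths assume the tarball install from the question, and the heap change needs a rolling restart, one node at a time):
# Check for dropped mutations, which indicate the nodes cannot keep up with writes.
nodetool tpstats | grep -A 15 'Message type'
grep -i 'messages were dropped' /opt/apache-cassandra-3.11.0/logs/system.log
# Bump the heap from 8 GB to 16 GB in jvm.options, then restart this node.
sed -i 's/^-Xms8G/-Xms16G/; s/^-Xmx8G/-Xmx16G/' /opt/apache-cassandra-3.11.0/conf/jvm.options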

Compaction causes OutOfMemoryError

I'm getting an OutOfMemoryError when running compaction on some big SSTables in production; the table size is around 800 GB. Compaction on small SSTables works properly, though.
$ nodetool compact keyspace1 users
error: Direct buffer memory
-- StackTrace --
java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:693)
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
at org.apache.cassandra.io.compress.BufferType$2.allocate(BufferType.java:35)
Java heap memory (Xms and Xmx) has been set to 8 GB; I'm wondering if I should increase it to 12 or 16 GB?
It's not the heap size; it's the so-called "direct memory" that is exhausted. You need to check how much you have (it can be specified with something like -XX:MaxDirectMemorySize=512m, or it defaults to the same maximum size as the heap). You can increase it indirectly by increasing the heap size, or you can control it explicitly via the -XX flag. There are good articles about non-heap memory in Java worth reading.
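For example, to raise the direct-memory limit explicitly rather than growing the heap (a sketch; the 16G value, the $CASSANDRA_HOME path, and the use of jvm.options rather than cassandra-env.sh are assumptions to adapt to your install):
# Cap direct (off-heap NIO) memory explicitly instead of letting it default to the heap's max size.
echo '-XX:MaxDirectMemorySize=16G' >> "$CASSANDRA_HOME/conf/jvm.options"
# Restart the node cleanly, then retry the compaction.
nodetool drain && sudo systemctl restart cassandra
nodetool compact keyspace1 users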

ODI Import: java.lang.OutOfMemoryError: GC overhead limit exceeded

I'm trying to import a large project into ODI with ODI Studio 12c.
After 3 hours of import the process fails with the following error:
java.lang.RuntimeException: java.lang.OutOfMemoryError: GC overhead limit exceeded
at oracle.odi.ui.framework.adapter.DefaultAdapter.executeBackgroundTask(DefaultAdapter.java:636)
at oracle.odi.ui.framework.UIFramework.executeBackgroundTask(UIFramework.java:452)
at oracle.odi.ui.smartie.imp.ImportSmartWizard.runImportProcess(ImportSmartWizard.java:394)
at oracle.odi.ui.smartie.imp.ImportSmartWizard.runImport(ImportSmartWizard.java:260)
at oracle.odi.ui.smartie.imp.ImportSmartWizard.finished(ImportSmartWizard.java:205)
at
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOfRange(Arrays.java:2694)
at java.lang.String.<init>(String.java:203)
at java.lang.StringBuilder.toString(StringBuilder.java:405)
at java.lang.Class.getDeclaredMethod(Class.java:2004)
at com.sunopsis.dwg.smartie.RunSmartImport.run(RunSmartImport.java:3264)
I increased the option
AddVMOption -XX:MaxPermSize=1024M to
AddVMOption -XX:MaxPermSize=4096M
Any other suggestions?
Thank you so much
You can try -XX:-UseGCOverheadLimit
You need to increase the size of the young/old generations, not the permanent generation. Change the argument to -Xmx4096m (or similar).
The permanent generation contains all the reflective data of the virtual machine itself, such as class and method objects. Your OOM exception is coming from an array copy, so it is not related to the perm gen.
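A sketch of that change, assuming ODI Studio reads its VM options from odi/studio/bin/odi.conf under your ODI install (the exact file can differ between installs, but the AddVMOption syntax is the same one used in the question):
# Raise the maximum heap for ODI Studio; MaxPermSize can stay as it was, since the
# OOM here happens in the ordinary heap (an array copy), not the permanent generation.
echo 'AddVMOption -Xmx4096m' >> "$ODI_HOME/odi/studio/bin/odi.conf"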

java.lang.OutOfMemoryError for simple rdd.count() operation

I'm having a lot of trouble getting a simple count operation working on about 55 files on HDFS, about 1B records in total. Both spark-shell and PySpark fail with OOM errors. I'm using YARN, MapR, Spark 1.3.1, and HDFS 2.4.1. (It fails in local mode as well.) I've tried following the tuning and configuration advice, throwing more and more memory at the executor. My configuration is
conf = (SparkConf()
.setMaster("yarn-client")
.setAppName("pyspark-testing")
.set("spark.executor.memory", "6g")
.set("spark.driver.memory", "6g")
.set("spark.executor.instances", 20)
.set("spark.yarn.executor.memoryOverhead", "1024")
.set("spark.yarn.driver.memoryOverhead", "1024")
.set("spark.yarn.am.memoryOverhead", "1024")
)
sc = SparkContext(conf=conf)
sc.textFile('/data/on/hdfs/*.csv').count() # fails every time
The job gets split into 893 tasks and after about 50 tasks are successfully completed, many start failing. I see ExecutorLostFailure in the stderr of the application. When digging through the executor logs, I see errors like the following:
15/06/24 16:54:07 ERROR util.Utils: Uncaught exception in thread stdout writer for /work/analytics2/analytics/python/envs/santon/bin/python
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:57)
at java.nio.CharBuffer.allocate(CharBuffer.java:331)
at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:792)
at org.apache.hadoop.io.Text.decode(Text.java:406)
at org.apache.hadoop.io.Text.decode(Text.java:383)
at org.apache.hadoop.io.Text.toString(Text.java:281)
at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:558)
at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:558)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:379)
at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply$mcV$sp(PythonRDD.scala:242)
at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:204)
at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:204)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1550)
at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:203)
15/06/24 16:54:07 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[stdout writer for /work/analytics2/analytics/python/envs/santon/bin/python,5,main]
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:57)
at java.nio.CharBuffer.allocate(CharBuffer.java:331)
at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:792)
at org.apache.hadoop.io.Text.decode(Text.java:406)
at org.apache.hadoop.io.Text.decode(Text.java:383)
at org.apache.hadoop.io.Text.toString(Text.java:281)
at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:558)
at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:558)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:379)
at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply$mcV$sp(PythonRDD.scala:242)
at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:204)
at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:204)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1550)
at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:203)
15/06/24 16:54:07 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM
In the stdout:
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill %p"
# Executing /bin/sh -c "kill 16490"...
In general, I think I understand the OOM errors and troubleshooting, but I'm stuck conceptually here. This is just a simple count. I don't understand how the Java heap could possibly be overflowing when the executors have ~3G heaps. Has anyone run into this before or have any pointers? Is there something going on under the hood that would shed light on the issue?
Update:
I've also noticed that when I specify the parallelism explicitly (for example sc.textFile(..., 1000)) instead of the default number of tasks (893), the created job has 920 tasks, all but the last of which complete without error; the very last task then hangs indefinitely. This seems exceedingly strange!
It turns out that the issue I was having was actually related to a single file that was corrupted. Running a simple cat or wc -l on the file would cause the terminal to hang.
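A rough way to locate a file like that (a sketch, assuming the CSVs can be read with hadoop fs and that a corrupted file exhibits the same hang outside Spark):
# Scan each input file and flag any that cannot be fully read within a timeout.
for f in $(hadoop fs -ls /data/on/hdfs/*.csv | awk '{print $NF}'); do
  if ! timeout 60 hadoop fs -cat "$f" > /dev/null; then
    echo "suspect file: $f"
  fi
done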
Try increasing the Java heap size as follows in your console:
export JAVA_OPTS="-Xms512m -Xmx5g"
You can change the values according to your data and memory size; -Xms means the minimum heap size and -Xmx the maximum. Hopefully it will help you.

Cassandra terminates with OutOfMemory (OOM) error

We have a 3-node Cassandra cluster on AWS. These nodes are running Cassandra 1.2.2 and have 8 GB of memory. We didn't change any of the default heap or GC settings, so each node allocates 1.8 GB of heap space. The rows are wide; each row stores around 260,000 columns. We are reading the data using Astyanax. If our application tries to read 80,000 columns each from 10 or more rows at the same time, some of the nodes run out of heap space and terminate with an OOM error. Here is the error message:
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.duplicate(HeapByteBuffer.java:107)
at org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:50)
at org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:60)
at org.apache.cassandra.db.marshal.AbstractCompositeType.split(AbstractCompositeType.java:126)
at org.apache.cassandra.db.filter.ColumnCounter$GroupByPrefix.count(ColumnCounter.java:96)
at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:164)
at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:136)
at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:84)
at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:294)
at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65)
at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1363)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1220)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1132)
at org.apache.cassandra.db.Table.getRow(Table.java:355)
at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:70)
at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1052)
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1578)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
ERROR 02:14:05,351 Exception in thread Thread[Thrift:6,5,main] java.lang.OutOfMemoryError: Java heap space
at java.lang.Long.toString(Long.java:269)
at java.lang.Long.toString(Long.java:764)
at org.apache.cassandra.dht.Murmur3Partitioner$1.toString(Murmur3Partitioner.java:171)
at org.apache.cassandra.service.StorageService.describeRing(StorageService.java:1068)
at org.apache.cassandra.thrift.CassandraServer.describe_ring(CassandraServer.java:1192)
at org.apache.cassandra.thrift.Cassandra$Processor$describe_ring.getResult(Cassandra.java:3766)
at org.apache.cassandra.thrift.Cassandra$Processor$describe_ring.getResult(Cassandra.java:3754)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722) ERROR 02:14:05,350 Exception in thread Thread[ACCEPT-/10.0.0.170,5,main] java.lang.RuntimeException: java.nio.channels.ClosedChannelException
at org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:893) Caused by: java.nio.channels.ClosedChannelException
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:211)
at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:99)
at org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:882)
The data in each column is less than 50 bytes. After adding all the column overheads (column name + metadata), it should not be more than 100 bytes. So reading 80,000 columns from 10 rows each means we are reading 80,000 * 10 * 100 = 80 MB of data. That is large, but not large enough to fill up the 1.8 GB heap, so I wonder why the heap is getting full. If the data request is too big to serve in a reasonable amount of time, I would expect Cassandra to return a TimeoutException instead of terminating.
One easy solution is to increase the heap size, but that will only mask the problem. Reading 80MB of data should not make a 1.8 GB heap full.
Is there some other Cassandra setting that I can tweak to prevent the OOM exception?
No, there is no write operation in progress when I read the data. I am sure that increasing the heap space may help, but I am trying to understand why reading 80 MB of data is filling a 1.8 GB heap.
Cassandra uses both on-heap and off-heap caching.
First, loading 80 MB of user data may result in 200-400 MB of Java heap usage (which VM are you on? 64-bit?).
Secondly, this memory is added to the memory already used for the caches. It seems that Cassandra does not free those caches to serve your particular query, which could make sense for overall throughput.
Did you solve your problem in the meantime by increasing the max heap?
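To check how much of the heap the caches actually hold on a node, nodetool can help (a sketch; <node-ip> is a placeholder, and shrinking the caches here is only an experiment, not a recommendation):
# Show heap usage and key/row cache sizes on a node (Cassandra 1.2).
nodetool -h <node-ip> info
# As an experiment, shrink the key and row cache capacities (values in MB)
# to see whether the caches are crowding out the large read.
nodetool -h <node-ip> setcachecapacity 64 0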
