Cannot record QUEUE latency of n minutes - DSE - cassandra

One of our nodes in our 3 node cluster is down and on checking the log file, it shows the below messages
INFO [keyspace.core Index WorkPool work thread-2] 2016-09-14 14:05:32,891 AbstractMetrics.java:114 - Cannot record QUEUE latency of 11 minutes because higher than 10 minutes.
INFO [keyspace.core Index WorkPool work thread-2] 2016-09-14 14:05:33,233 AbstractMetrics.java:114 - Cannot record QUEUE latency of 10 minutes because higher than 10 minutes.
WARN [keyspace.core Index WorkPool work thread-2] 2016-09-14 14:05:33,398 Worker.java:99 - Interrupt/timeout detected.
java.util.concurrent.BrokenBarrierException: null
at java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:200) ~[na:1.7.0_79]
at java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:355) ~[na:1.7.0_79]
at com.datastax.bdp.concurrent.FlushTask.bulkSync(FlushTask.java:76) ~[dse-core-4.8.3.jar:4.8.3]
at com.datastax.bdp.concurrent.Worker.run(Worker.java:94) ~[dse-core-4.8.3.jar:4.8.3]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_79]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_79]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_79]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_79]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79]
WARN [keyspace.core Index WorkPool work thread-2] 2016-09-14 14:05:33,398 Worker.java:99 - Interrupt/timeout detected.
java.util.concurrent.BrokenBarrierException: null
at java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:200) ~[na:1.7.0_79]
at java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:355) ~[na:1.7.0_79]
at com.datastax.bdp.concurrent.FlushTask.bulkSync(FlushTask.java:76) ~[dse-core-4.8.3.jar:4.8.3]
at com.datastax.bdp.concurrent.Worker.run(Worker.java:94) ~[dse-core-4.8.3.jar:4.8.3]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_79]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_79]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_79]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_79]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79]
INFO [keyspace.core Index WorkPool work thread-4] 2016-09-14 14:05:33,720 AbstractMetrics.java:114 - Cannot record QUEUE latency of 13 minutes because higher than 10 minutes.
INFO [keyspace.core Index WorkPool work thread-4] 2016-09-14 14:05:33,721 AbstractMetrics.java:114 - Cannot record QUEUE latency of 13 minutes because higher than 10 minutes.
The nodes configuration are 8 CPU, 32 GB RAM, 500 GB Disk space. What could be the reasons for only one particular node going down?

So I'm going to answer with some general info here, your case might be more complex. 32GB RAM might not be large enough for a Solr node; using the G1 collector on Java 1.8 has proved better for Solr with heap sizes above 26GB.
I'm also not sure what heap sizes, JVM settings and how many solr cores you have here. However, I've seen similar errors to this when a node is busy indexing and its trying to keep up. Once of the most common problems seen on Solr nodes in my experience is where the max_solr_concurrency_per_core is left at default (commented out) in the dse.yaml. This will typically allocate the number of indexing threads to the number of CPU cores, and to further compound the problem, you might see 8 cores but if you have HT then its actually likely 4 physical cores.
Check your dse.yaml and make sure you are setting it to num physcal cpu cores / num of solr cores with 2 at a minimum. This might index slower but you should remove the pressure off of your node.
I'd recommend this useful blog here as a good start to tuning DSE Solr:
http://www.datastax.com/dev/blog/tuning-dse-search
Also docs on the subject:
https://docs.datastax.com/en/datastax_enterprise/4.8/datastax_enterprise/srch/srchTune.html

Related

timeouts on ReadRepairStage error messages

We are using Apache Cassandra 3.11.4 .Recently we are seeing overloaded readrepair ERROR messages in the entire cluster because that we are getting timeouts ..I'm not able to find the root cause for this . Appreciate any inputs on this issue ..
ERROR [ReadRepairStage:2537] 2019-07-18 17:08:15,119 CassandraDaemon.java:228 - Exception in thread Thread[ReadRepairStage:2537,5,main]
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 1 responses.
at org.apache.cassandra.service.DataResolver$RepairMergeListener.close(DataResolver.java:202) ~[apache-cassandra-3.11.3.jar:3.11.3]
at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.close(UnfilteredPartitionIterators.java:175) ~[apache-cassandra-3.11.3.jar:3.11.3]
at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:92) ~[apache-cassandra-3.11.3.jar:3.11.3]
at org.apache.cassandra.service.DataResolver.compareResponses(DataResolver.java:79) ~[apache-cassandra-3.11.3.jar:3.11.3]
at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:50) ~[apache-cassandra-3.11.3.jar:3.11.3]
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.11.3.jar:3.11.3]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_212]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_212]
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) ~[apache-cassandra-3.11.3.jar:3.11.3]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_212]
reduced dclocalreadrepair to 0.0
Timeouts are a common issue while attempting repairs, and without more specifics of the errors, or your configuration, this will be a shot in the dark.
Repairs depend on disk space, as it will create temporary copies of files, as a rule of thumb the disk utilization should be lower than or equal to 50% to ensure that you'll have enough space.
Repairs can be delayed/aborted if the cluster is stressed, if that is the case, you may need to scale up the cluster to increase the available resources.
You may want to take a look in these other recommendations from Aaron regarding updates of the JVM settings in repairs.
Also note that since Cassandra 3.11.3, the settings read_repair_chance and dc_read_repair_chance were removed, as their names were misleading with the result obtained. Adding them won't have any effect.

Encounter SparkException "Cannot broadcast the table that is larger than 8GB"

I am using Spark 2.2.0 to do data processing. I am using Dataframe.join to join 2 dataframes together, however I encountered this stack trace:
18/03/29 11:27:06 INFO YarnAllocator: Driver requested a total number of 0 executor(s).
18/03/29 11:27:09 ERROR FileFormatWriter: Aborting job null.
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:123)
at org.apache.spark.sql.execution.InputAdapter.doExecuteBroadcast(WholeStageCodegenExec.scala:248)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
at org.apache.spark.sql.execution.SparkPlan.executeBroadcast(SparkPlan.scala:126)
at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.prepareBroadcast(BroadcastHashJoinExec.scala:98)
at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.codegenInner(BroadcastHashJoinExec.scala:197)
at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doConsume(BroadcastHashJoinExec.scala:82)
at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:155)
...........
Caused by: org.apache.spark.SparkException: Cannot broadcast the table that is larger than 8GB: 10 GB
at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:86)
at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:73)
at org.apache.spark.sql.execution.SQLExecution$.withExecutionId(SQLExecution.scala:103)
at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1.apply(BroadcastExchangeExec.scala:72)
at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1.apply(BroadcastExchangeExec.scala:72)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I searched on Internet for this error, but didn't get any hint or solution how to fix this.
Does Spark automatically broadcast Dataframe as part of the join? I am very surprise with this 8GB limit because I would have thought Dataframe supports "big data" and 8GB is not very big at all.
Thank you very much in advance for your advice on this.
Linh
After some reading, I've tried to disable the auto-broadcast and it seemed to work. Change Spark config with:
'spark.sql.autoBroadcastJoinThreshold': '-1'
Currently it is a hard limit in spark that the broadcast variable size should be less than 8GB. See here.
The 8GB size is generally big enough. If you consider that you re running a job with 100 executors, spark driver needs to send the 8GB data to 100 Nodes resulting 800GB network traffic. This cost will be much less if you don't broadcast and use simple join.

Making Spark app work on m4 instances

I have a pyspark application running on EMR (5.7.0) which processes about 140M json records. Nothing terribly fancy outside of the size of the dataset -- main operations are map, filter, count, repartitionAndSort, and mapPartition.
This app runs on a cluster of 40 m3.2xlarge instances, but I wanted to try it on the m4 family (since I get access to beefier machines that way).
However, the executors fall over the the cluster is rendered inoperable (If I rerun my application, it fails far earlier with no available executors).
Here's the stack trace I get on failure.
17/08/26 15:51:15 WARN TaskSetManager: Lost task 1118.0 in stage 145.0 (TID 72871, ip-172-22-247-134.ec2.internal, executor 91): java.lang.IllegalArgumentException: Self-suppression not permitted
at java.lang.Throwable.addSuppressed(Throwable.java:1043)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1316)
at org.apache.spark.rdd.ReliableCheckpointRDD$.writePartitionToCheckpointFile(ReliableCheckpointRDD.scala:182)
at org.apache.spark.rdd.ReliableCheckpointRDD$$anonfun$writeRDDToCheckpointDirectory$1.apply(ReliableCheckpointRDD.scala:137)
at org.apache.spark.rdd.ReliableCheckpointRDD$$anonfun$writeRDDToCheckpointDirectory$1.apply(ReliableCheckpointRDD.scala:137)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /checkpoints/48b56f21-7f74-429c-934c-0aea983c0175/rdd-359/.part-01118-attempt-0 could only be replicated to 0 nodes instead of minReplication (=1). There are 36 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1580)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:725)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
Any help? Seems very weird that this can't run on m4-class machines.

Decommission cassandra node times out with "received only 0 responses"

When I try to decommission a node in my Cassandra cluster, the process starts (I see active streams flowing from the node to decommission to the other nodes in the cluster (using vnodes)), but then after a little delay nodetool decommission exists with the following error message.
I can repeatedly run nodetool decommission and it will start streaming data to other nodes, but so far always exists with the below error.
Why am I seeing this, and is there a way I can safely decommission this node?
Exception in thread "main" java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
at org.apache.cassandra.db.HintedHandOffManager.getHintsSlice(HintedHandOffManager.java:578)
at org.apache.cassandra.db.HintedHandOffManager.listEndpointsPendingHints(HintedHandOffManager.java:528)
at org.apache.cassandra.service.StorageService.streamHints(StorageService.java:2854)
at org.apache.cassandra.service.StorageService.unbootstrap(StorageService.java:2834)
at org.apache.cassandra.service.StorageService.decommission(StorageService.java:2795)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1454)
at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:74)
at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1295)
at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1387)
at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:818)
at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:303)
at sun.rmi.transport.Transport$1.run(Transport.java:159)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
at org.apache.cassandra.service.ReadCallback.get(ReadCallback.java:100)
at org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:1213)
at org.apache.cassandra.db.HintedHandOffManager.getHintsSlice(HintedHandOffManager.java:573)
... 33 more
The hinted handoff manager is checking for hints to see if it needs to pass those off during
the decommission so that the hints don't get lost. You most likely have a lot of hints, or
a bunch of tombstones, or something in the table causing the query to timeout. You aren't
seeing any other exceptions in your logs before the timeout are you? Raising the read timeout
period on your nodes before you decommission them, or manually deleting the hints CF, should
most likely get your past this. If you delete them, you would then want to make sure you
ran a full cluster repair when you are done with all of your decommissions, to propagate data
from any hints you deleted.
The short answer is that the node I was trying to decommission was underpowered for the amount of data it held. As of this writing there seems to be a reasonable hard minimum of resources needed to handle nodes with arbitrary amounts of data, which seems to be somewhere in the neighborhood of what an AWS i2.2xlarge provides. In particular, the old m1 instances let you get into trouble by allowing you to store far more data on each node than the memory and compute resources available can support on it.

Cassandra terminates with OutOfMemory (OOM) error

We have a 3-node cassandra cluster on AWS. These nodes are running cassandra 1.2.2 and have 8GB memory. We didn't change any of the default heap or GC settings. So each node is allocating 1.8GB of heap space. The rows are wide; each row stores around 260,000 columns. We are reading the data using Astyanax. If our application tries to read 80,000 columns each from 10 or more rows at the same time, some of the nodes run out of heap space and terminate with OOM error. Here is the error message:
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.duplicate(HeapByteBuffer.java:107)
at org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:50)
at org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:60)
at org.apache.cassandra.db.marshal.AbstractCompositeType.split(AbstractCompositeType.java:126)
at org.apache.cassandra.db.filter.ColumnCounter$GroupByPrefix.count(ColumnCounter.java:96)
at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:164)
at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:136)
at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:84)
at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:294)
at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65)
at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1363)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1220)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1132)
at org.apache.cassandra.db.Table.getRow(Table.java:355)
at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:70)
at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1052)
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1578)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
ERROR 02:14:05,351 Exception in thread Thread[Thrift:6,5,main] java.lang.OutOfMemoryError: Java heap space
at java.lang.Long.toString(Long.java:269)
at java.lang.Long.toString(Long.java:764)
at org.apache.cassandra.dht.Murmur3Partitioner$1.toString(Murmur3Partitioner.java:171)
at org.apache.cassandra.service.StorageService.describeRing(StorageService.java:1068)
at org.apache.cassandra.thrift.CassandraServer.describe_ring(CassandraServer.java:1192)
at org.apache.cassandra.thrift.Cassandra$Processor$describe_ring.getResult(Cassandra.java:3766)
at org.apache.cassandra.thrift.Cassandra$Processor$describe_ring.getResult(Cassandra.java:3754)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722) ERROR 02:14:05,350 Exception in thread Thread[ACCEPT-/10.0.0.170,5,main] java.lang.RuntimeException: java.nio.channels.ClosedChannelException
at org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:893) Caused by: java.nio.channels.ClosedChannelException
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:211)
at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:99)
at org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:882)
The data in each column is less than 50 bytes. After adding all the column overheads (column name + metadata), it should not be more than 100 bytes. So reading 80,000 columns from 10 rows each means that we are reading 80,000 * 10 * 100 = 80 MB of data. It is large, but not large enough to fill up the 1.8 GB heap. So I wonder why the heap is getting full. If the data request is to big to fill in a reasonable amount of time, I would expect Cassandra to return a TimeOutException instead of terminating.
One easy solution is to increase the heap size, but that will only mask the problem. Reading 80MB of data should not make a 1.8 GB heap full.
Is there some other Cassandra setting that I can tweak to prevent the OOM exception?
No, there is no write operation in progress when I read the data. I am
sure that increasing the heap space may help. but I am trying to
understand why reading 80MB of data is making a 1.8GB heap full.
Cassandra uses Heap and OfHeap chaching.
First loading of 80MB userdata may result in 200-400 MB of Java Heap usage. (which vm? 64 bit?)
Secondly this memory is added to memory allready used for caches. It seemes that cassandra does not frees that caches to serve your private query. Could make sence for overal throughput.
Did you meanwhile solved your problem by increasing MaxHeap?

Resources