We are using Cassandra to store the data collected from ThingsBoard. According to systemctl status, the Cassandra service started at 4 GB of memory, and after 15 hours it has reached 9.3 GB.
I want to know why memory grows this much, and whether there is a way to restrict Cassandra to a fixed amount of memory without losing data.
Check this for setting the maximum heap size. But tune Cassandra's GC properly when you change it.
We have a 6-node Cassandra cluster under heavy utilization. We have been struggling with garbage collector stop-the-world events, which can take up to 50 seconds on our nodes; during that time the Cassandra node is unresponsive and does not even accept new logins.
Extra details:
Cassandra Version: 3.11
Heap Size = 12 GB
We are using G1 Garbage Collector with default settings
Node size: 4 CPUs, 28 GB RAM
The G1 GC behavior is the same across all nodes.
Any help would be very much appreciated!
Edit 1:
Checking object creation stats, it does not look healthy at all.
Edit 2:
I have tried the settings suggested by Chris Lohfink; here are the GC reports:
Using CMS suggested settings
http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMTAvOC8tLWdjLmxvZy4wLmN1cnJlbnQtLTE5LTAtNDk=
Using G1 suggested settings
http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMTAvOC8tLWdjLmxvZy4wLmN1cnJlbnQtLTE5LTExLTE3
The behavior remains basically the same:
Old Gen starts to fill up.
GC can't clean it properly without a full GC and a STW event.
The full GC starts to take longer, until the node is completely unresponsive.
I'm going to get the cfstats output for maximum partition size and tombstones per read asap and edit the post again.
Have you looked at using Zing? Cassandra situations like these are a classic use case, as Zing fundamentally eliminates all GC-related glitches in Cassandra nodes and clusters.
You can see some details on the how/why in my recent "Understanding GC" talk from JavaOne (https://www.slideshare.net/howarddgreen/understanding-gc-javaone-2017). Or just skip to slides 56-60 for Cassandra-specific results.
Without knowing your existing settings or possible data model problems, here's a guess at some conservative settings to try, to reduce evacuation pauses caused by not having enough to-space (check the GC logs):
-Xmx12G -Xms12G -XX:+UseG1GC -XX:G1ReservePercent=25 -XX:G1RSetUpdatingPauseTimePercent=5 -XX:MaxGCPauseMillis=500 -XX:-ReduceInitialCardMarks -XX:G1HeapRegionSize=32m
This should also help shorten the remembered set update pause, which can become an issue, and setting G1HeapRegionSize reduces the number of humongous objects, which can be a problem depending on the data model. Make sure -Xmn is not set.
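To see why G1HeapRegionSize matters for humongous objects, here is a rough sketch in Python. It assumes the usual G1 rules as I understand them: the default region size targets heap / 2048, rounded down to a power of two and clamped to 1..32 MB, and an object larger than half a region is humongous. The function names are made up for illustration; check the GC logs of your own JVM for the real region size.

```python
def default_g1_region_size_mb(heap_mb: int) -> int:
    """Approximate G1's default region size (assumption: heap / 2048,
    rounded down to a power of two, clamped to 1..32 MB)."""
    target = max(heap_mb // 2048, 1)
    region = 1 << (target.bit_length() - 1)  # largest power of two <= target
    return min(max(region, 1), 32)


def humongous_threshold_mb(region_mb: int) -> float:
    """Objects larger than half a region are treated as humongous."""
    return region_mb / 2


heap_mb = 12 * 1024  # the 12 GB heap from the question
for label, region in (
    ("default region size", default_g1_region_size_mb(heap_mb)),
    ("G1HeapRegionSize=32m", 32),
):
    print(f"{label}: {region} MB regions, humongous above {humongous_threshold_mb(region)} MB")
```

Under those assumptions a 12 GB heap gets 4 MB regions by default, so anything over 2 MB (a large partition read, for example) is humongous; with 32 MB regions the threshold moves up to 16 MB.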
For what it's worth, a 12 GB heap with C* is probably better suited to CMS; you can certainly get better throughput. You just need to be careful of fragmentation over time with the rather large objects that can get allocated.
-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=55 -XX:MaxTenuringThreshold=3 -Xmx12G -Xms12G -Xmn3G -XX:+CMSEdenChunksRecordAlways -XX:+CMSParallelInitialMarkEnabled -XX:+CMSParallelRemarkEnabled -XX:CMSWaitDuration=10000 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCondCardMark
Most likely there's an issue with the data model, or you're under-provisioned, though.
What is the maximum value that one can set for transaction_buffer in memsql.cnf? I assume there is a correlation with the RAM allocated on the server. My leaves have 32 GB each, and at the moment we have transaction_buffer set to 0. We are past the design phase of our cluster and would like to do some performance tuning, and this is one parameter that needs to be set appropriately.
The transaction_buffer size is an amount of memory reserved per database partition, i.e. each leaf node will need (transaction_buffer size * partitions per leaf * number of databases) of memory. The default is 128 MB, and this should generally be sufficient.
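As a quick illustration of that formula, here is a small Python sketch. The partition and database counts are made-up example values, not something from your cluster; only the 128 MB default and the 32 GB leaf size come from this thread.

```python
def transaction_buffer_memory_mb(buffer_mb: int, partitions_per_leaf: int, databases: int) -> int:
    """Memory reserved on one leaf: buffer size * partitions per leaf * databases."""
    return buffer_mb * partitions_per_leaf * databases


leaf_ram_mb = 32 * 1024  # 32 GB leaves, as in the question
reserved_mb = transaction_buffer_memory_mb(
    buffer_mb=128,          # default transaction_buffer
    partitions_per_leaf=8,  # hypothetical
    databases=4,            # hypothetical
)
print(f"reserved: {reserved_mb} MB ({reserved_mb / leaf_ram_mb:.1%} of leaf RAM)")
```

With these example numbers the buffers reserve 4096 MB, or 12.5% of a 32 GB leaf, which shows how quickly the cost grows with the number of databases.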
Basically, it's a balancing act - data in transaction_buffer will exist in memory before being written to disk. A transaction_buffer of 0 may save you some memory, but it's not taking full advantage of the speed of being in memory. If you have a lot of databases that are updated infrequently a low transaction_buffer may be the right balance as it is a per database cost (keeping in mind that each partition is a database itself).
Transaction_buffer may also be valuable for you as a "get out of jail free" card - since if your workload becomes more and more memory intensive it's possible to get into a situation where your OS is killing MemSQL too frequently to reduce memory consumption. Once you get stuck in a vicious cycle like that, restarting with a reduced transaction buffer can reduce memory overhead enough to keep the system from being OOM-killed long enough to troubleshoot and correct the issue on your end.
Eventually, it might become adaptive, and you'll be left without that easy way to get some wiggle room, which is why it is essential to make sure maximum_memory is low enough that your system doesn't begin to OOM-kill processes. https://docs.memsql.com/docs/memory-management
DataStax Enterprise 4.0 introduced an In-Memory option for Cassandra:
http://www.datastax.com/documentation/datastax_enterprise/4.0/datastax_enterprise/inMemory.html
But an in-memory table is limited to 1 GB.
Does anyone know why it is limited to 1 GB? And is it possible to extend an in-memory table to a larger size, such as 64 GB?
To answer your question: today it's not possible to bypass this limitation.
In-Memory tables are stored within the JVM heap, and regardless of the amount of memory available on a single node, allocating more than 8 GB to the JVM heap is not recommended.
The main reason for this limitation is that the Java garbage collector slows down when dealing with a huge amount of memory.
However, if you consider Cassandra as a distributed system, 1 GB is not the real limit:
(nodes*allocated_memory)/ReplicationFactor
allocated_memory is at most 1 GB, so your table may contain many GB in memory, spread across different nodes.
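To make the formula concrete, here is a small Python sketch. The node counts and replication factor are example values only, and the helper names are made up for illustration.

```python
import math


def cluster_in_memory_capacity_gb(nodes: int, allocated_memory_gb: float, replication_factor: int) -> float:
    """(nodes * allocated_memory) / ReplicationFactor."""
    return nodes * allocated_memory_gb / replication_factor


def nodes_needed(target_gb: float, allocated_memory_gb: float, replication_factor: int) -> int:
    """Invert the formula: how many nodes hold target_gb of unique data."""
    return math.ceil(target_gb * replication_factor / allocated_memory_gb)


print(cluster_in_memory_capacity_gb(nodes=6, allocated_memory_gb=1, replication_factor=3))
print(nodes_needed(target_gb=64, allocated_memory_gb=1, replication_factor=3))
```

With a replication factor of 3, six nodes give 2 GB of unique in-memory data, and the 64 GB from the question would need 192 nodes at 1 GB each.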
I think this will improve in the future, but dealing with 64 GB in memory could be a real problem when you need to flush data to disk. One more consideration that adds to the limitation: avoid TTL when working with In-Memory tables. A TTL creates tombstones, and a tombstone is not deallocated until the GCGraceSeconds period passes, so with the default value of 10 days each tombstone keeps its portion of memory busy and unavailable, possibly for a long time.
HTH,
Carlo
We are currently doing some stress tests with the ab tool. Single inserts are doing fine in Cassandra. However, when it comes to batch inserts, I'm running into a Java out-of-memory error: Java Heap Space.
I have a VirtualBox machine with Ubuntu Server 13.04 installed and 2 GB of memory.
I don't know much about the internal configuration of Cassandra.
I'm just making a batch insert of size 100 (100 inserts in a BATCH).
After I see this error, I have no cqlsh access and no nodetool access for almost 1 hour.
How can I fix this error under heavy load?
NOTE: It doesn't happen on single inserts with HTTP POST requests.
NOTE: In my column family, I have a key of TimeUUIDType, and the column values are ints and varchars.
UPDATE: Test results show that nothing goes wrong up to 6000 requests. However, at 7000 requests, the PHP code throws the following:
Error connecting to 127.0.0.1: Thrift\Exception\TTransportException: TSocket: timed out reading 4 bytes from 127.0.0.1:9160
Moreover, Cassandra logs the following under heavy load:
WARN [ScheduledTasks:1] 2013-06-28 03:43:07,931 GCInspector.java (line 142)
Heap is 0.9231763795560355 full. You may need to reduce memtable and/or cache sizes.
Cassandra will now flush up to the two largest memtables to free up memory. Adjust
flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to
do this automatically
The batch doesn't sound like a large enough dataset to cause the memory problem, so this sounds like a problem with the JVM on the virtual machine. How much memory have you allocated to it?
You can check by starting JConsole (just type jconsole in the terminal / prompt) and viewing the 'Memory' tab, specifically the value under Max:
You can also get some solid details about what caused the crash thanks to the -XX:+HeapDumpOnOutOfMemoryError parameter included in C*'s startup script; it makes the JVM write a heap dump, a snapshot of the heap at the moment the memory error occurred.
Typically the heap size is calculated automatically by the calculate_heap_sizes() function in cassandra-env.sh. You can, however, override the number that function generates by setting MAX_HEAP_SIZE to a different value. The same variable is used on lines 174 & 175 of cassandra-env.sh (JVM_OPTS="$JVM_OPTS -Xmx${MAX_HEAP_SIZE}") to set the min and max heap size.
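For a feel of what the automatic calculation gives you, here is a Python sketch of the rule as I remember it from the comment in cassandra-env.sh: max(min(1/2 ram, 1024MB), min(1/4 ram, 8GB)). Treat that formula as an assumption and check the calculate_heap_sizes() function in your own copy of the script, since it can differ between versions. The Python function name below is made up for illustration.

```python
def auto_max_heap_mb(system_memory_mb: int) -> int:
    """Sketch of the assumed rule: max(min(1/2 ram, 1024 MB), min(1/4 ram, 8192 MB))."""
    half_capped = min(system_memory_mb // 2, 1024)
    quarter_capped = min(system_memory_mb // 4, 8192)
    return max(half_capped, quarter_capped)


for ram_mb in (2 * 1024, 24 * 1024):
    print(f"{ram_mb} MB RAM -> MAX_HEAP_SIZE of about {auto_max_heap_mb(ram_mb)} MB")
```

If that rule applies to your version, a 2 GB virtual machine ends up with roughly a 1 GB heap, which leaves little headroom once memtables and caches fill up.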
I would like to tune Cassandra for a heavy-read scenario with skinny rows (5-50 columns). The idea is to use the row cache, and enable the key cache just in case, for when the data is too large for the row cache.
I have dual Intel Xeon servers with 24 GB RAM (3 in the ring, two data centers, giving 6 machines in total).
These are the changes that I've made to the default configuration:
cassandra-env.sh
#JVM_OPTS="$JVM_OPTS -ea"
MAX_HEAP_SIZE="6G"
HEAP_NEWSIZE="500M"
cassandra.yaml
# do not persist caches to disk
key_cache_save_period: 0
row_cache_save_period: 0
key_cache_size_in_mb: 512
row_cache_size_in_mb: 14336
row_cache_provider: SerializingCacheProvider
The idea is to dedicate 6 GB to the Cassandra JVM, 0.5 GB to the key cache (out of the 6 GB heap), and 14 GB to the row cache as off-heap memory.
The OS still has 4 GB, which should be enough, since only one JVM process is running and its overhead should be at most 2 GB.
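The budget above can be checked with a few lines of Python. This is only arithmetic on the numbers from the configuration; it assumes the row cache really stays within row_cache_size_in_mb, and the function name is made up for illustration.

```python
def remaining_for_os_mb(total_mb: int, heap_mb: int, off_heap_mb: int, jvm_overhead_mb: int = 0) -> int:
    """RAM left for the OS after the JVM heap, off-heap caches and JVM overhead."""
    return total_mb - heap_mb - off_heap_mb - jvm_overhead_mb


total_ram_mb = 24 * 1024  # 24 GB per server
heap_mb = 6 * 1024        # MAX_HEAP_SIZE="6G" (the 512 MB key cache lives inside it)
row_cache_mb = 14336      # row_cache_size_in_mb, off-heap

print(remaining_for_os_mb(total_ram_mb, heap_mb, row_cache_mb))
print(remaining_for_os_mb(total_ram_mb, heap_mb, row_cache_mb, jvm_overhead_mb=2 * 1024))
```

That leaves 4096 MB outside the heap and row cache, and 2048 MB for the OS itself in the worst case where the JVM overhead reaches the full 2 GB.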
Is this setup optimal? Any hints?
Thanks,
Maciej
I'm using version 1.1.6.
SerializingCacheProvider stores cache data in the native heap (off-heap) area.
That area is not inspected by the GC, so it does not trigger garbage collection.
Your row_cache_size_in_mb setting applies to SerializingCache's reference objects.
Those references are stored using FreeableMemory (in 1.1.x; this changed in 1.2 and later).
In other words, your actual cached values are not counted when row_cache_size_in_mb is calculated.
As a result, if you want to find the right row_cache_size_in_mb, start from a minimal size.
In my case, when I set 500 MB, each node was using 2 GB of old gen (depending on the data set).
Run the heapspace_calculator and use the suggested value as an initial heap configuration. Monitor your heap usage with "nodetool info".
Try to use short column names and merge columns when possible.
This setup works just fine - I've tested it.