Cassandra Java heap space issues with phpcassa

We are currently running stress tests with the ab tool. Single inserts work fine in Cassandra. However, with batch inserts I get a Java out-of-memory error: Java heap space.
I have a VirtualBox machine running Ubuntu Server 13.04 with 2 GB of memory.
I don't know much about Cassandra's internal configuration.
I'm just making batch inserts of size 100 (100 inserts in a BATCH).
After I see this error, I have no cqlsh access and no nodetool access for almost 1 hour.
How can I fix this error under heavy load?
NOTE: It doesn't happen on single inserts with HTTP POST requests.
NOTE: In my column family, I have a key of TimeUUIDType, and the column values are ints and varchars.
UPDATE: Test results show that nothing went wrong before 6000 requests. However, at 7000 requests, the PHP code throws the following:
Error connecting to 127.0.0.1: Thrift\Exception\TTransportException: TSocket: timed out reading 4 bytes from 127.0.0.1:9160
Moreover, Cassandra logs the following under heavy load:
WARN [ScheduledTasks:1] 2013-06-28 03:43:07,931 GCInspector.java (line 142)
Heap is 0.9231763795560355 full. You may need to reduce memtable and/or cache sizes.
Cassandra will now flush up to the two largest memtables to free up memory. Adjust
flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to
do this automatically

The batch doesn't sound like a large enough dataset to cause the memory problem, so this sounds like a problem with the JVM on the virtual machine. How much memory have you allocated to it?
You can check by starting JConsole (just type jconsole in the terminal / prompt) and viewing the 'Memory' tab, specifically the value under Max.
You can also get some solid details about what caused the crash thanks to the -XX:+HeapDumpOnOutOfMemoryError parameter included in C*'s startup script. It makes the JVM write a heap dump when it runs out of memory, which you can inspect to see what was filling the heap.
Typically the heap size is calculated automatically by the calculate_heap_sizes() function in cassandra-env.sh. You can, however, override the value that function generates by setting MAX_HEAP_SIZE to a different value. The same variable is used on lines 174 & 175 in cassandra-env.sh (JVM_OPTS="$JVM_OPTS -Xmx${MAX_HEAP_SIZE}") for setting the min and max heap size.
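As a rough illustration of what calculate_heap_sizes() arrives at, here is a minimal Python sketch. It assumes the rule stated in the comments of cassandra-env.sh for releases of that era, max(min(1/2 RAM, 1024MB), min(1/4 RAM, 8GB)); check the script shipped with your version, since the rule may differ. The function name default_max_heap_mb is made up for this example.

```python
def default_max_heap_mb(system_memory_mb: int) -> int:
    """Approximate the default max heap (assumed rule, see lead-in)."""
    half_capped = min(system_memory_mb // 2, 1024)     # 1/2 RAM, capped at 1024 MB
    quarter_capped = min(system_memory_mb // 4, 8192)  # 1/4 RAM, capped at 8 GB
    return max(half_capped, quarter_capped)


if __name__ == "__main__":
    # 2048 MB is the VM from the question; the others are for comparison.
    for ram_mb in (2048, 8192, 24576):
        print(f"{ram_mb} MB RAM -> {default_max_heap_mb(ram_mb)} MB heap")
```

Under that assumed rule, the 2 GB VM from the question would end up with about a 1 GB heap, which leaves little room for memtables and caches under a heavy batch load.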

Related

How to restrict Cassandra to fixed memory

We are using Cassandra to collect data from ThingsBoard. It started with 4GB of memory (according to systemctl status for cassandra), and after 15 hours it had reached 9.3GB.
I want to know why memory increases this much, and whether there is any way to control it or restrict it to a fixed amount of memory without losing data.
Check this for setting the max heap size used. But tune Cassandra's GC properly when you change it.

Cassandra - how to disable memtable flush

I'm running Cassandra with a very small dataset so that the data can exist on memtable only. Below are my configurations:
In jvm.options:
-Xms4G
-Xmx4G
In cassandra.yaml,
memtable_cleanup_threshold: 0.50
memtable_allocation_type: heap_buffers
As per the documentation in cassandra.yaml, memtable_heap_space_in_mb and memtable_offheap_space_in_mb will each be set to 1/4 of the heap size, i.e. 1000MB.
According to the documentation here (http://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__memtable_cleanup_threshold), a memtable flush will trigger if the total size of the memtable(s) goes beyond (1000+1000)*0.50=1000MB.
Now if I perform several write requests that result in roughly 300MB of data, the memtable still gets flushed, since I see sstables being created on the file system (Data.db etc.), and I don't understand why.
Could anyone explain this behavior and point out if I'm missing something here?
One additional trigger for memtable flushing is commitlog space used (default 32mb).
http://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsMemtableThruput.html
http://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__commitlog_total_space_in_mb
Since Cassandra is meant to be persistent, it has to write to disk so that it can come back up with the data after a node failure. If you don't need this durability, you can use an in-memory database instead - redis, memcache etc.
Below is the response I got from Cassandra user group, copying it here in case someone else is looking for the similar info.
After thinking about your scenario I believe your small SSTable size might be due to data compression. By default, all tables enable SSTable compression.
Let's go through your scenario. Let's say you have allocated 4GB to your Cassandra node. Your memtable_heap_space_in_mb and memtable_offheap_space_in_mb will each come to roughly 1GB. Since you have memtable_cleanup_threshold set to 0.50, memtable cleanup will be triggered when the total allocated memtable space exceeds 1/2GB. Note that the cleanup threshold is 0.50 of 1GB and not of the combined heap and off-heap space. This memtable allocation size is the total amount available for all tables on your node. This includes all system-related keyspaces. The cleanup process will write the largest memtable to disk.
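The arithmetic behind the two readings can be sketched as follows. This is only a back-of-the-envelope illustration in Python, not Cassandra's actual code. It assumes the 1/4-of-heap default mentioned in the question and the 4G heap from jvm.options; the function names are invented for this example.

```python
def memtable_pool_mb(heap_mb: float, heap_fraction: float = 0.25) -> float:
    """Size of one memtable pool, assuming the 1/4-of-heap default."""
    return heap_mb * heap_fraction


def flush_trigger_mb(pool_mb: float, cleanup_threshold: float) -> float:
    """Memtable usage at which a cleanup (flush) would start."""
    return pool_mb * cleanup_threshold


heap_mb = 4096                      # -Xmx4G
pool_mb = memtable_pool_mb(heap_mb)

# Reading in this answer: threshold applies to a single 1GB pool.
single_pool_trigger = flush_trigger_mb(pool_mb, 0.50)

# Reading in the question: threshold applies to heap + off-heap combined.
combined_trigger = flush_trigger_mb(2 * pool_mb, 0.50)

print(f"single pool: {single_pool_trigger} MB, combined: {combined_trigger} MB")
```

Either way, the roughly 300MB written in the question stays below both thresholds, which is why the answer goes on to look at other explanations.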
For your case, I am assuming that you are on a single node with only one table with insert activity. I do not think the commit log will trigger a flush in this circumstance as by default the commit log has 8192 MB of space unless the commit log is placed on a very small disk.
I am assuming your table on disk is smaller than 500MB because of compression. You can disable compression on your table and see if this helps get the desired size.
I have written up a blog post explaining memtable flushing (http://abiasforaction.net/apache-cassandra-memtable-flush/)
Let me know if you have any other question.
I hope this helps.

Cassandra Mutation too Large, for small insert

I'm getting these errors:
java.lang.IllegalArgumentException: Mutation of 16.000MiB is too large for the maximum size of 16.000MiB
in Apache Cassandra 3.x. I'm doing inserts of 4MB or 8MB blobs, but not anything greater than 8MB. Why am I hitting the 16MB limit? Is Cassandra batching up multiple writes (inserts) and creating a "mutation" that is too large? (If so, why would it do that, since the configured limit is 8MB?)
There is little documentation on mutations -- except to say that a mutation is an insert or delete. How can I prevent these errors?
You can increase the commit log segment size to 64 MB in cassandra.yaml:
commitlog_segment_size_in_mb: 64
By default the commit log segment size is 32 MB.
By design, the maximum allowed mutation size is 50% of the configured commitlog_segment_size_in_mb. This is so Cassandra avoids writing segments with large amounts of empty space.
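To make the relationship concrete, here is a small Python sketch of the arithmetic. It assumes only the 50% rule described above; the helper names are hypothetical, and the power-of-two stepping is just an illustrative choice for picking a segment size, not something Cassandra requires.

```python
def max_mutation_mb(commitlog_segment_size_mb: float) -> float:
    """Largest mutation allowed, assuming the 50%-of-segment rule."""
    return commitlog_segment_size_mb / 2


def suggested_segment_size_mb(largest_mutation_mb: float, start_mb: int = 32) -> int:
    """Smallest power-of-two multiple of start_mb whose half strictly
    exceeds the largest expected mutation (leaves some headroom)."""
    segment_mb = start_mb
    while max_mutation_mb(segment_mb) <= largest_mutation_mb:
        segment_mb *= 2
    return segment_mb


print(max_mutation_mb(32))            # limit with the default segment size
print(suggested_segment_size_mb(16))  # segment size for a 16 MiB mutation
```

With the default 32 MB segment the limit works out to the 16 MiB shown in the error message, and a 16 MiB mutation only fits comfortably once the segment size is raised to 64 MB.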
You should also investigate why the write size has suddenly increased. If the increase is not expected (i.e. not due to a planned change), it may well be a problem with the client application that needs further inspection.

Cassandra - Understanding Java Heap Behavior (depending on internet connection?)

We are running tests on Cassandra 1.2.5 and cannot fully understand the behavior of the Java heap.
The same test behaves differently depending on the location within the company network from which it was started. Is it possible that the internet connection has an effect on Cassandra's Java heap behavior? It seems to us that decreasing the upload speed changed the behavior.
Picture taken after a test on the worst connection, showing a Java heap overflow. VPN, low upload speed.
Picture taken after a test on the best connection.
We ran a third test on an average connection (WLAN) and saw behavior in between the two shown in the pictures.
Our configuration is:
We use Cassandra 1.2.5 and, for now, have not changed much of the original settings in cassandra.yaml, except for:
key_cache_size_in_mb: 0
We just set seeds, listen_address and rpc_address.
cassandra-env.sh was not changed, so Cassandra gets nearly 2G of our 7.9G RAM.
We are using the JDBC driver.
We are just testing with write load.
The test is pushing about 200 kb per second.
We have no wide rows.
Apart from "pretty normal" inserts, we update a variable on every write.
Any ideas to help us understand this issue would be much appreciated.

Cassandra 1.1 - Setup for 24GB RAM and Row Cache

I would like to tune Cassandra for a heavy-read scenario with skinny rows (5-50 columns). The idea is to use the row cache, and to enable the key cache just in case, for when the data is too large for the row cache.
I have dual Intel Xeon servers with 24GB RAM (3 in a ring, two data centers, which gives 6 machines in total).
These are the changes that I've made to the default configuration:
cassandra-env.sh
#JVM_OPTS="$JVM_OPTS -ea"
MAX_HEAP_SIZE="6G"
HEAP_NEWSIZE="500M"
cassandra.yaml
# do not persist caches to disk
key_cache_save_period: 0
row_cache_save_period: 0
key_cache_size_in_mb: 512
row_cache_size_in_mb: 14336
row_cache_provider: SerializingCacheProvider
The idea is to dedicate 6GB to the Cassandra JVM, 0.5GB to the key cache (out of the 6GB heap), and 14GB to the row cache as off-heap.
The OS still has 4GB, which should be enough, since only one JVM process is running and it should have an overhead of 2GB at most.
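The budget above can be checked with a few lines of Python. This is only the arithmetic of the plan as described, using the values from the configuration shown; it says nothing about how much memory the row cache will really consume. The variable names are made up for this sketch.

```python
total_ram_gb = 24.0

heap_gb = 6.0                  # MAX_HEAP_SIZE="6G"
key_cache_gb = 512 / 1024      # key_cache_size_in_mb: 512 (lives inside the heap)
row_cache_gb = 14336 / 1024    # row_cache_size_in_mb: 14336 (off-heap)

# Memory left for the OS, page cache and JVM overhead.
os_headroom_gb = total_ram_gb - heap_gb - row_cache_gb

# Heap left for everything else once the key cache is full.
heap_after_key_cache_gb = heap_gb - key_cache_gb

print(f"OS headroom: {os_headroom_gb} GB")
print(f"Heap minus key cache: {heap_after_key_cache_gb} GB")
```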
Is this setup optimal? Any hints?
Thanks,
Maciej
I'm using 1.1.6 version.
SerializingCacheProvider stores cache data in the native (off-heap) memory area.
That area is not inspected by the GC, so it does not cause garbage collection.
Your row_cache_size_in_mb setting applies to SerializingCache's reference objects.
Those references are stored using FreeableMemory (in 1.1.x; this changed after 1.2).
In other words, the actual cached values are not counted when row_cache_size_in_mb is calculated.
As a result, when choosing row_cache_size_in_mb, start from a minimal size and work up.
In my case, when I set it to 500 MB, each node was using 2G of old gen (depending on the data set).
Run the heapspace_calculator and use the suggested value as an initial heap configuration. Monitor your heap usage with "nodetool info".
Try to use short column names and merge columns when possible.
This setup works just fine - I've tested it.

Resources