How to limit the memory used by VoltDB

I didn't see any properties for limiting the amount of memory used by VoltDB when I was configuring it, so I want to ask:
Can I configure the maximum amount of memory VoltDB uses? If so, how, and where is the configuration file? Can VoltDB work in a mixed mode of memory and disk?

You can set VoltDB to pause when a certain memory limit is reached. You do this in the configuration file that you start VoltDB with, specifically within the resource monitor section. It would look something like this:
<systemsettings>
    <resourcemonitor frequency="30">
        <memorylimit size="70%" alert="60%"/>
    </resourcemonitor>
</systemsettings>
You can read more about it in the VoltDB documentation: https://docs.voltdb.com/AdminGuide/MonitorSysResource.php
As for your other question, VoltDB doesn't work in a mixed mode of memory and disk. In fact, it is highly recommended that swapping be disabled. You should review the memory management optimizations listed here:
https://docs.voltdb.com/AdminGuide/adminmemmgt.php
Full disclosure: I work at VoltDB.

Related

How do I force a memtable flush for every write?

I would like to flush the memtable to disk after every update/write operation (or in any case, as frequently as possible). My sole purpose is to stress test the underlying disk using a production-level database software.
It seems like memtable_cleanup_threshold is the way to go, but it's deprecated. Is there another way to accomplish this? What about memtable_heap_space_in_mb and memtable_offheap_space_in_mb? I'm not a Java programmer; which one should I tune without compromising the rest of the functionality?
You can definitely try setting both memtable_heap_space_in_mb and memtable_offheap_space_in_mb to a really low value.
Additionally, you can also configure commitlog_total_space_in_mb. If the occupied space goes above this property, it will cause more frequent flushes.
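For reference, those settings might look something like this in cassandra.yaml (the values here are deliberately low and purely illustrative, not recommendations):

```yaml
# cassandra.yaml -- illustrative, deliberately low values to force frequent flushes
memtable_heap_space_in_mb: 64
memtable_offheap_space_in_mb: 64
commitlog_total_space_in_mb: 1024   # flushes trigger sooner once this space fills
```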
But since your goal is to stress-test the disk, my suggestion is to do the following:
Configure both data_file_directories and commitlog_directory to be mounted on the same disk.
Use NoSQLBench to stress test with heavy writes.
This way, you don't have to muck around with the memtable settings. Have a look at the NoSQLBench Beginner's Guide blog post for details. Cheers!
You could also just trigger a flush on the table by issuing nodetool flush, or run the corresponding JMX operation after each write. However, Cassandra stores data distributed over many nodes, and a flush is always a node-local operation. To find out which nodes you need to flush, you would have to query the list of endpoints the written data is stored on (also available via JMX or nodetool); otherwise you would need to flush all nodes.
While this is fine for testing purposes, I would not recommend it for production.
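If you go the flush-every-node route, scripting it is straightforward. A minimal sketch; the host, keyspace, and table names below are placeholders:

```python
def build_flush_commands(hosts, keyspace, table):
    """One `nodetool flush` invocation per node, since a flush is node-local.
    In practice, pass each command to subprocess.run(cmd, check=True)."""
    return [["nodetool", "-h", host, "flush", keyspace, table] for host in hosts]

# Hypothetical cluster of two nodes and a stress-test table:
cmds = build_flush_commands(["node1", "node2"], "stress_ks", "stress_table")
print(cmds[0])  # ['nodetool', '-h', 'node1', 'flush', 'stress_ks', 'stress_table']
```

This only builds the command lines; actually running them requires nodetool on the PATH and network access to each node.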

transaction_buffer Maximum value

What is the maximum value one can set for transaction_buffer in the MemSQL cnf file? I assume there is a correlation with the RAM allocated on the server. My leaves have 32G each, and at the moment we have transaction_buffer set to 0. We are past the design phase on our cluster, and we would like to do some performance tuning; one parameter that needs to be set accordingly is this one.
The transaction_buffer size is an amount of memory reserved per database partition - i.e. each leaf node will need transaction_buffer size * partitions per leaf * number of databases of memory. The default is 128 MB, and this should generally be sufficient.
Basically, it's a balancing act - data in the transaction_buffer exists in memory before being written to disk. A transaction_buffer of 0 may save you some memory, but it isn't taking full advantage of the speed of being in memory. If you have a lot of databases that are updated infrequently, a low transaction_buffer may be the right balance, as it is a per-database cost (keeping in mind that each partition is a database itself).
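The per-leaf footprint described above is simple multiplication; a quick sketch (the partition and database counts here are hypothetical):

```python
def transaction_buffer_footprint_mb(buffer_mb, partitions_per_leaf, num_databases):
    """Memory reserved on one leaf: buffer size * partitions per leaf * databases."""
    return buffer_mb * partitions_per_leaf * num_databases

# Default 128 MB buffer with, say, 8 partitions per leaf and 4 databases:
print(transaction_buffer_footprint_mb(128, 8, 4))  # 4096 MB reserved per leaf
```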
Transaction_buffer may also be valuable for you as a "get out of jail free" card - since if your workload becomes more and more memory intensive it's possible to get into a situation where your OS is killing MemSQL too frequently to reduce memory consumption. Once you get stuck in a vicious cycle like that, restarting with a reduced transaction buffer can reduce memory overhead enough to keep the system from being OOM-killed long enough to troubleshoot and correct the issue on your end.
Eventually, transaction_buffer may become adaptive, and you'll be left without that easy way to get some wiggle room. That is why it is essential to make sure maximum_memory is low enough that your system doesn't begin to OOM-kill processes. https://docs.memsql.com/docs/memory-management

What configuration options are worth reviewing in Cassandra if one is using SSDs instead of spinning disk?

I just replaced a Cassandra cluster with brand new SSDs instead of spinning disks. What configuration options would you recommend that I review? Feel free to post links to blog posts/presentations if you know of any (yes, I've Googled).
Based on a quick look through the cassandra.yaml, there are three that I see right away:
memtable_flush_writers : It is set to 2 by default, but the text above the setting indicates that "If your data directories are backed by SSD, you should increase this to the number of cores."
trickle_fsync : Forces the OS to run an fsync to flush the dirty buffers during sequential writes. The text above the setting indicates that setting it to true is "Almost always a good idea on SSDs; not necessarily on platters."
concurrent_compactors : The number of simultaneous compactions allowed. Just like the memtable_flush_writers setting, the text above indicates that SSD users should set it to the number of system cores.
Also, according to the DataStax documentation on Selecting hardware for enterprise implementations:
Unlike with spinning disks, it's all right to store both commit logs and SSTables on the same mount point.
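Taken together, those cassandra.yaml changes might look like this on an 8-core node (the core count is an assumption; scale to your hardware):

```yaml
# cassandra.yaml -- assuming an 8-core node with SSD-backed data directories
memtable_flush_writers: 8      # "increase this to the number of cores" on SSDs
trickle_fsync: true            # "almost always a good idea on SSDs"
concurrent_compactors: 8       # likewise recommended as the number of cores
```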

Cassandra 1.2.x - RAM heap and JRE parameter - questions for better understanding

May I ask some questions, to get a better understanding of Cassandra, the JRE, and RAM configuration (referring to V1.2.5 and the documentation of May 2013)?
The current documentation and lots of Google research have still left some open questions for me.
I'm interested in using it as a simple embedded datastore for a few hundred GB of data on 6 machines distributed across 3 locations, which also run a Java application.
1) Cassandra's stack sizing
The Windows .bat file has a default set to 1GB, which I think is a bug; the Linux cassandra-env.sh defines 180k. Is this a "just leave it at 180k, fire and forget about stack size" thing?
2) Cassandra's RAM usage
When using JNA, system RAM is basically split into 3 main areas:
Cassandra uses the assigned Java heap
Cassandra uses extra RAM obtained via JNA
The operating system uses the leftover RAM as disk cache
The current documentation basically only recommends: "don't set the Java heap size higher than 8GB."
Is this info still up to date? (It could be that this statement dates from a time when the CMS garbage collector wasn't yet included in Java 1.6.)
How do I limit the JNA heap (is it the 'row_cache_size_in_mb' parameter?)
What is a good rule-of-thumb layout for the 3 RAM areas (Java heap, JNA extra heap, OS cache) on a dedicated system in Cassandra 1.2.x?
when there is lots of RAM (128GB)?
when there is little RAM (4GB)?
(I know about the heap size calculator, this question is more for theoretical understanding and up to date info)
3) Java Runtime
Why is the recommendation still to use Java 1.6 and not Java 1.7?
Is this a "maturity" operational recommendation?
Are there specific problems known from the recent past?
Or is it just a matter of waiting until more people report flawless operation with 1.7?
4) Embedding Cassandra
The "-XX:MaxTenuringThreshold=1" in the C* start scripts is a slight hint to separate Cassandra from application code, which usually does better with a higher threshold. On the other hand, the "1" might also be a bit outdated.
Is this setting still that important (now that the CMS garbage collector and JNA RAM are in use, and maybe even Java 1.7)?
1) Are you looking at Xmx? I don't see Xss at all in cassandra.bat
2) Mostly correct. Cassandra hasn't actually required JNA for off-heap allocation for a long time now (since 1.0, IIRC).
You don't want heap larger than 8GB because CMS and G1 still choke and cause STW pauses eventually. Short explanation: fragmentation. Longer: http://www.scribd.com/doc/37127094/GCTuningPresentationFISL10
Cassandra does off-heap allocation for row cache and for storage engine metadata. The former is straightforward to tune; the latter is not. Basically, you need to have about 20GB of ram per TB of compressed data, end of story. Things you can do to reduce memory usage include disabling compression, reducing bloom filter accuracy, and increasing index_interval. All of which are going to reduce your performance, other things being equal.
3) Maturity. We're late adopters; we have fewer problems that way. Cassandra 2.0 will require Java 7.
4) This is not outdated.
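Pulling the numbers from this thread together, the relevant cassandra-env.sh-style settings might look like this (illustrative values only; actual defaults vary by version):

```shell
# cassandra-env.sh-style settings (illustrative; check your version's defaults)
JVM_OPTS=""                                       # start from empty opts for this sketch
MAX_HEAP_SIZE="8G"                                # stay at/below 8GB to limit CMS pauses
HEAP_NEWSIZE="800M"
JVM_OPTS="$JVM_OPTS -Xss180k"                     # per-thread stack size
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"   # tenure survivors quickly
```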

Cassandra in-memory configuration

We are currently evaluating Apache Cassandra 1.2 as a large-scale data processing solution. As our application is read-intensive, and to provide users with the fastest possible response times, we would like to configure Apache Cassandra to keep all data in memory.
Is it enough to set the caching storage option to rows_only on all column families and give each Cassandra node sufficient memory to hold its data portion? Or are there other possibilities in Cassandra?
Read performance tuning is more complex than write tuning. Based on my experience, there are several factors you can take into consideration. Some of these points are not memory-related, but they also help improve read performance.
1. Row cache: avoids disk hits, but enable it only if the rows are not updated frequently. You can also enable the off-heap row cache to reduce JVM heap usage.
2. Key cache: enabled by default; no need to disable it. It avoids disk seeks when the row cache is not hit.
3. Reduce the frequency of memtable flushes: adjust memtable_total_space_in_mb, commitlog_total_space_in_mb, and flush_largest_memtables_at.
4. Use LeveledCompactionStrategy: avoids a row being spread across multiple SSTables.
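The 1.2-era settings mentioned in points 1-3 could look like this in cassandra.yaml (values are illustrative, not recommendations):

```yaml
# cassandra.yaml (1.2-era setting names; defaults vary by version)
row_cache_size_in_mb: 2048            # only worthwhile for rarely-updated rows
key_cache_size_in_mb: 512
memtable_total_space_in_mb: 4096      # larger memtables -> fewer flushes
commitlog_total_space_in_mb: 8192
flush_largest_memtables_at: 0.80
```

Point 4 is set per column family rather than in cassandra.yaml (e.g. via ALTER TABLE ... WITH compaction = {'class': 'LeveledCompactionStrategy'}).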
DataStax has added an in-memory computing feature in the latest version of its Apache Cassandra-based NoSQL database, as part of a drive to increase the performance of online applications.
Reference :
http://www.datastax.com/2014/02/welcome-to-datastax-enterprise-4-0-and-opscenter-4-1