We are currently deploying a larger Cassandra cluster and looking for ways to estimate the best size of the key cache, or more accurately, for a way of finding out the size of one row in the key cache.
I have tried tapping into the integrated metrics system using Graphite, but I wasn't able to get a clear answer. I also tried putting my own debugging code into org.apache.cassandra.io.sstable, but this didn't yield any concrete results either.
We are using Cassandra 1.2.10. Are there any foolproof ways of getting the size of one row in the key cache?
With best regards,
Ben
Check out jamm. It's a library for measuring the size of an object in memory.
You need to add -javaagent:"/path/to/jamm.jar" to your JVM startup parameters, but Cassandra is already configured to start with jamm, so if you are changing internal Cassandra code this is already done for you.
To get the size of an object (in bytes):
MemoryMeter meter = new MemoryMeter();
long sizeInBytes = meter.measureDeep(object); // follows references, so the whole object graph is counted
measureDeep() is more costly than the shallow measure(), but it gives a much more accurate picture of an object's memory size because it follows references.
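For example, a minimal standalone sketch (assuming jamm is on the classpath and the JVM is started with the -javaagent flag above; the sample key string is made up, and on newer jamm versions you may need MemoryMeter.builder().build() instead of the constructor):

import org.github.jamm.MemoryMeter;

public class KeyCacheEntrySize {
    public static void main(String[] args) {
        MemoryMeter meter = new MemoryMeter();

        // A made-up key value, roughly the shape of a 60-byte row key.
        String sampleKey = "user:0000000000000000000000000000000000000000000000000001";

        // measure() counts only the String object itself;
        // measureDeep() also follows the reference to the backing char array.
        long shallow = meter.measure(sampleKey);
        long deep = meter.measureDeep(sampleKey);

        System.out.println("shallow = " + shallow + " bytes, deep = " + deep + " bytes");
    }
}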
To estimate key size, let's assume you intend to store 1 million keys in the cache, each 60 bytes long on average. There will be some overhead to store each key; let's say it is 40 bytes, which means the key size per row = 100 bytes.
Since we need to cache 1 million keys:
total key cache = 1 million * 100 bytes = 100 MB
Perform this calculation for each CF in your keyspace.
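To sanity-check the estimate against a live node, nodetool info reports the current key cache size (and, on newer Cassandra versions, the number of entries), so dividing size by entries gives a real-world average per cached key:

nodetool info
# look for the "Key Cache" line in the output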
We found a delay of 2 hours when starting the Cassandra service, along with WARN messages in the system.log file for one table.
Here is the warning seen on a few of the servers:
WARN [SSTableBatchOpen:5] 2022-08-29 10:01:13,732 IndexSummaryBuilder.java:115 - min_index_interval of 128 is too low for 5511836446 expected keys of avg size 64; using interval of 185 instead
Aaron's answer pointed to the right code: since you have a LOT of keys in a single SSTable, the default min_index_interval is no longer efficient and Cassandra recomputes it. That in turn triggers a rewrite of the index summary during startup, and in this case it takes a very long time.
Aaron's suggestion of using sstablesplit would only be a temporary fix, as the SSTables will eventually get compacted together again and you'll be back in the same situation.
Changes will have to be made in production to remediate this anyway. Raising min_index_interval seems easy enough as a fix, and it is really the only option that doesn't require deep schema changes to reduce the number of partitions per SSTable (or compaction strategy changes, which could have hard-to-predict performance impacts).
Note that changing the min_index_interval will not trigger the rewrite of the sstables straight away. Only newly written sstables will get the new setting, which can be (and should be) forced onto all the sstables using nodetool upgradesstables -a.
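As a sketch, with a made-up keyspace/table name and an interval value chosen only for illustration:

-- in cqlsh: raise the sampling interval for the affected table
ALTER TABLE my_keyspace.my_table WITH min_index_interval = 256;

# then, from the shell, force existing sstables to be rewritten with the new setting
nodetool upgradesstables -a my_keyspace my_table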
On a side note, there seems to be some confusion in the comments between the partition index and secondary indexes. They are two distinct things, and the reported warning message refers to the partition index summary, not to secondary indexes.
It's difficult to discern a question from the above, so I'll assume you're wondering why Cassandra is taking 2 hours to start up.
If you look in the source of Cassandra 3.0, there are some clues given in the IndexSummaryBuilder class. Specifically, the calculations just prior to the warning:
if (maxExpectedEntriesSize > Integer.MAX_VALUE)
{
    // that's a _lot_ of keys, and a very low min index interval
    int effectiveMinInterval = (int) Math.ceil((double)(expectedKeys * expectedEntrySize) / Integer.MAX_VALUE);
    maxExpectedEntries = expectedKeys / effectiveMinInterval;
    maxExpectedEntriesSize = maxExpectedEntries * expectedEntrySize;
    assert maxExpectedEntriesSize <= Integer.MAX_VALUE : maxExpectedEntriesSize;
    logger.warn("min_index_interval of {} is too low for {} expected keys of avg size {}; using interval of {} instead",
                minIndexInterval, expectedKeys, defaultExpectedKeySize, effectiveMinInterval);
}
The comment about "that's a _lot_ of keys" is a big one, and 5,511,836,446 keys is certainly a lot.
The calculations shown in the method above are driven by the number of keys and the sampling interval of a particular SSTable, and are used to build that SSTable's Partition Summary in RAM. The Partition Summary is the in-memory structure on Cassandra's read path that is consulted to narrow down the position in the on-disk partition index.
Based on this, I would hypothesize that one particular table's SSTable file(s) is getting too big to handle efficiently. Have a look at the underlying data directory for that table. You may have to split some of those files with tools/bin/sstablesplit to make them more manageable.
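A rough sketch of what that might look like (the paths and target size are illustrative, sstablesplit must be run with the node stopped, and -s is the maximum size per output file in MB):

sudo service cassandra stop
tools/bin/sstablesplit --no-snapshot -s 51200 /var/lib/cassandra/data/my_keyspace/my_table-*/*-Data.db
sudo service cassandra start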
What's the best/most reliable method of estimating the space required in Cassandra? My cluster consists of 2 nodes (RHEL 6.5) on Cassandra 3.11.2. I want to estimate the average size each row in every table will take in my database so that I can plan accordingly. I know about methods like the nodetool status command, du -sh on the data directory, nodetool cfstats, etc. However, each of these gives different values, so I'm not sure which one I should use in my calculations.
I also found that apart from the actual data, Cassandra stores various metadata in system tables such as size_estimates, sstable_activity, etc. Does this metadata also keep growing with the data? What's the ratio of the space occupied by such metadata to the space occupied by the actual data? Also, which settings in cassandra.yaml (if any) should I keep in mind that might affect the size of the data?
A similar question was asked before, but I wasn't satisfied with the answer.
If you are expecting 20 GB of data per day, here is the calculation.
1 day = 20 GB, 1 month = 600 GB, 1 year = 7.2 TB, so your raw data size for one year is 7.2 TB. With a replication factor of 3 it would be around 21.6 TB of data for one year.
Taking compaction into consideration, and your use case being write-heavy, if you go with size-tiered compaction you would need up to twice the space of your data on disk.
So you would need around 43 TB to 45 TB of disk space.
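The same back-of-envelope math as a small sketch (the 20 GB/day figure, the replication factor of 3, and the 2x size-tiered compaction headroom are the assumptions stated above):

public class DiskCapacityEstimate {
    public static void main(String[] args) {
        double rawPerDayGb = 20.0;                              // expected daily ingest
        double rawPerYearTb = rawPerDayGb * 30 * 12 / 1000.0;   // 600 GB/month -> 7.2 TB/year
        int replicationFactor = 3;
        double replicatedTb = rawPerYearTb * replicationFactor; // ~21.6 TB across the cluster
        double withHeadroomTb = replicatedTb * 2;               // size-tiered compaction can need up to 2x
        System.out.printf("Plan for roughly %.1f TB of disk%n", withHeadroomTb);
    }
}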
I am trying to migrate (copy) 35 million documents (which is a standard amount, not too big) from Couchbase to Elasticsearch.
My Elasticsearch (version 1.3) cluster is composed of 3 A3 (4 cores, 7 GB memory) CentOS servers on Microsoft Azure (each roughly equivalent to a large server on Amazon).
I used time-based ("timing data flow") indexing to store the documents: each index represents a month and is composed of 3 shards and 2 replicas.
When I start the migration script I see that insertion becomes very slow (about 10 documents per second) and the load average of each server in the cluster jumps above 1.5.
In addition, JVM memory usage climbs to almost 100%, while the CPU shows 20% and IOPS shows 20 at most.
(I used Marvel CNC to get all this data.)
Has anyone faced this kind of indexing problem in Elasticsearch?
I would like to know if there are any parameters I should be aware of to extend the Java memory.
Are my cluster specifications good enough to handle 100 indexing operations per second?
Does indexing time depend on how big the index is, and should it be that slow?
Thanks, Niv
I am quoting an answer I got in a Google group (link):
A couple of suggestions:
Disable replicas before large amounts of inserts (set the replica count to 0), and only enable them again afterwards (see the sketch after this list).
Use batching; the actual batch size depends on many factors (doc sizes, network, instance strength).
Follow ES's advice on node setup, e.g. allocate 50% of the available memory to the Java heap of ES, don't run anything else on that machine, and disable swappiness.
Your index is already sharded; try spreading it out to 3 different servers instead of having them on one server ("virtual shards"). This will help fan out the indexing load.
If you don't specify the document IDs yourself, make sure you use the latest ES; there's a significant improvement in the ID generation mechanism which could help speed things up.
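For the first point, a minimal sketch against an ES 1.x cluster (the index name is illustrative):

# drop replicas before the bulk load
curl -XPUT 'http://localhost:9200/docs-2014.08/_settings' -d '{"index": {"number_of_replicas": 0}}'

# ... run the migration ...

# restore replicas afterwards
curl -XPUT 'http://localhost:9200/docs-2014.08/_settings' -d '{"index": {"number_of_replicas": 2}}'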
I applied points 1 & 3 and it seems that the problem is solved :)
Now I am indexing at a rate of 80 docs per second and the load average is low (0.7 at most).
I have to give credit to Itamar Syn-Hershko, who posted this reply.
Here is the situation:
I am trying to fetch around 10k keys from a CF.
Size of cluster : 10 nodes
Data on node : 250 GB
Heap allotted : 12 GB
Snitch used : property file snitch with 2 racks in the same data center.
no. of sstables for cf per node : around 8 to 10
I am using the super column approach. Each row contains around 300 super columns, which in turn contain 5-10 columns each. I am firing a multiget with 10k row keys and 1 super column.
When I fire the call the first time, it takes around 30 to 50 seconds to return the result. After that Cassandra serves the data from the key cache, and it returns the result in 2-4 seconds.
So Cassandra read performance is hampering our project. I am using phpcassa. Is there any way I can tweak the Cassandra servers so that I can get results faster?
Does the super column approach affect read performance?
Use of super columns is best suited for use cases where the number of sub-columns is relatively small. Read more here:
http://www.datastax.com/docs/0.8/ddl/column_family
Just in case you haven't done this already: since you're using the phpcassa library, make sure that you've compiled the Thrift C extension. Per the "INSTALLING" text file in the phpcassa library folder:
Using the C Extension
The C extension is crucial for phpcassa's performance.
You need to configure and make to be able to use the C extension.
cd thrift/ext/thrift_protocol
phpize
./configure
make
sudo make install
Add the following line to your php.ini file:
extension=thrift_protocol.so
After doing a lot of R&D on this, we figured out that there is no way to get this working optimally.
When Cassandra fetches these 10k rows for the first time it is going to take time, and there is no way to optimize that.
1) In practice, however, the probability of people accessing the same records is higher, so we take maximum advantage of the key cache. The default setting for the key cache is 2 MB, so we can afford to increase it to 128 MB with no memory problems.
After loading the data, run the expected queries to warm up the key cache.
2) The JVM works optimally with an 8-10 GB heap (I don't have numbers to prove it, just observation).
3) Most importantly, if you are using physical machines (not cloud or virtual machines), check which disk scheduler you are using and set it to NOOP. That is good for Cassandra, as it reads all keys from one section, reducing disk head movement (see the sketch below).
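A quick sketch of checking and changing the scheduler (sda is just an example device, and the echo change does not persist across reboots):

# see which scheduler the data disk is using (the one in [brackets] is active)
cat /sys/block/sda/queue/scheduler

# switch it to noop at runtime
echo noop | sudo tee /sys/block/sda/queue/scheduler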
The above changes helped bring the query time down to within acceptable limits.
Along with the above changes, if you have CFs that are small in size but frequently accessed, enable row caching for them.
Hope the above info is useful.
I'm completely new to using Cassandra. Is there a maximum key size, and would it ever impact performance?
Thanks!
The key (and column names) must be under 64K bytes.
Routing is O(N) of the key size and querying and updating are O(N log N). In practice these factors are usually dwarfed by other overhead, but some users with very large "natural" keys use their hashes instead to cut down the size.
http://en.wikipedia.org/wiki/Apache_Cassandra claims (apparently incorrectly!) that:
The row key in a table is a string with no size restrictions, although typically 16 to 36 bytes long
See also:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/value-size-is-there-a-suggested-limit-td4959690.html which suggests that there is some limit.
Clearly, very large keys could have some network performance impact if they need to be sent over the Thrift RPC interface - and they would cost storage. I'd suggest you try a quick benchmark to see what impact it has for your data.
One way to deal with this might be to pre-hash your keys and just use the hash value as the key, though this won't suit all use cases.
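As a sketch of that pre-hashing idea (SHA-256 is just one reasonable choice here; any stable hash with enough bits to avoid collisions would do):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class RowKeyHasher {
    // Turns an arbitrarily large "natural" key into a fixed-size 64-character hex key.
    public static String hashKey(String naturalKey) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(naturalKey.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException("SHA-256 not available", e); // every standard JRE ships SHA-256
        }
    }
}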