MemSQL: why is memory usage so high?

Firstly, forgive my poor English.
My test environment:
[CentOS 6.5, 16 GB RAM] * 2
MemSQL Community edition: MemSQL Ops server version 4.1.10, MemSQL version 4.1.2
(Screenshots: the cluster overview and the leaf node's memory usage in Ops.)
Yeah, just one master and one leaf. I set maximum_memory = 10G on the leaf node.
Here is the question: only 334 MB of memory is used for table data, but the leaf node is using 13.34 GB. Why is that?
By the way, my usage scenario is much like the "Speed Test": batch inserts and batch deletes.
Thank you very much!

As noted in the info tooltip next to "Memory", that figure includes memory MemSQL has cached: memory that was allocated in the past but isn't being actively used. That's the most likely reason it is so much higher than your current table size.
To see exactly where MemSQL is using memory, click on the node in Ops to open that MemSQL node's page; it has a more detailed breakdown of memory usage.
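If you prefer the command line, you can also read the memory counters directly. A minimal sketch, assuming the leaf listens on the default port 3306 and that your build reports allocator counters through SHOW STATUS EXTENDED (MemSQL speaks the MySQL wire protocol, so any MySQL client works; counter names vary by version):
#shell> mysql -h leaf-host -P 3306 -u root -e "SHOW STATUS EXTENDED" | grep -iE "memory|alloc"
# Total_server_memory is roughly the figure Ops shows; comparing it with the
# individual Alloc_* counters shows how much is table data versus caches,
# buffers, and memory held by the allocator for reuse.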

Related

Cassandra CPU imbalance in Azure

We have a 30+ node Cassandra cluster (3.11.2) in 4 data centers. One of the centers consists of 8 nodes in Azure running on Standard DS12 v2 (4 CPU, 28 GB) nodes with a 500 GB premium SSD drive, all in the same data center (Central US).
We are seeing a dramatic CPU imbalance in the node activity when pushed to the max. We have a keyspace with about 200 million records, and we're running a process to check and refresh the records if necessary from another data stream.
What's happening is that we have 4 nodes running at 70-90% CPU, compared to 15-25% on the other 4. We measure CPU on the nodes themselves, because Azure's own metrics are broken and never represent what is actually happening.
Digging into a pair of nodes (one low CPU and one high), the difference is the iowait% of the two. The data in the keyspace is balanced (within reason: they are all within 5% of one another in record count and size). The number of reads looks balanced, and even the read latency as reported by Cassandra is similar.
When I do an iostat comparison of the nodes, the high-CPU node reports much higher (by 50 to 100%) rKB/s numbers, which is likely what leads to the difference in iowait% time.
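For reference, a minimal way to make that comparison on each node (assumes the sysstat package is installed; the interval and count are arbitrary):
#shell> iostat -xm 5 3
# While the refresh process is running, compare the r/s and rMB/s columns for
# the data disk and the %iowait value in the avg-cpu section between a busy
# node and a quiet one.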
These nodes are 100% configured the same, running the same version of everything (OS, libraries, everything I can think to check). I cannot figure out why some nodes are deciding to do more disk reads than the others, which slows the cluster down as a whole.
Anybody have any suggestions on where I can look for differences?
The only pattern is that the slower nodes are the 4 nodes that were added later in our expansion. We started with 4 nodes for a while and added 4 more when we needed space. All the appropriate repairs and other tasks required with node additions were done; the fact that the record counts and the physical size of the data files on disk are equal should attest to that.
When we shut down our refresh process, all the nodes settle down to an even 5% or less CPU across the board. No compaction or other maintenance is happening that would indicate something different.
plz help... :)
Our final solution for this, to fix ONLY the imbalance problem, was to run cleanup, a full repair, and a major compaction (rough commands sketched below). At that point the nodes are used roughly equally. We suspect that expanding the cluster (adding nodes) may have left data on the older nodes that regular compaction never removed.
We are still working to solve the load issue, but now at least all the nodes are feeling the same CPU crunch.
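For anyone in the same spot, the corresponding nodetool invocations look roughly like this (my_keyspace is a placeholder; run them node by node during a quiet window, and check the exact flags for your version):
#shell> nodetool cleanup my_keyspace
#shell> nodetool repair -full my_keyspace
#shell> nodetool compact my_keyspace
# cleanup drops data the node no longer owns after the expansion, the full
# repair brings replicas back in sync, and the major compaction rewrites the
# SSTables so stale data is merged out.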

Cassandra and G1 Garbage collector stop the world event (STW)

We have a 6 node Cassandra cluster under heavy utilization. We have been dealing a lot with garbage collector stop-the-world (STW) events, which can take up to 50 seconds on our nodes; in the meantime the Cassandra node is unresponsive, not even accepting new logins.
Extra details:
Cassandra Version: 3.11
Heap Size = 12 GB
We are using G1 Garbage Collector with default settings
Node size: 4 CPUs, 28 GB RAM
The G1 GC behavior is the same across all nodes.
Any help would be very much appreciated!
Edit 1:
Checking object creation stats, it does not look healthy at all.
Edit 2:
I have tried the settings suggested by Chris Lohfink; here are the GC reports:
Using CMS suggested settings
http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMTAvOC8tLWdjLmxvZy4wLmN1cnJlbnQtLTE5LTAtNDk=
Using G1 suggested settings
http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMTAvOC8tLWdjLmxvZy4wLmN1cnJlbnQtLTE5LTExLTE3
The behavior remains basically the same:
Old Gen starts to fill up.
GC can't clean it properly without a full GC and an STW event.
The full GC starts to take longer, until the node is completely unresponsive.
I'm going to get the cfstats output for maximum partition size and tombstones per read asap and edit the post again.
Have you looked at using Zing? Cassandra situations like these are a classic use case, as Zing fundamentally eliminates all GC-related glitches in Cassandra nodes and clusters.
You can see some details on the how/why in my recent "Understanding GC" talk from JavaOne (https://www.slideshare.net/howarddgreen/understanding-gc-javaone-2017). Or just skip to slides 56-60 for Cassandra-specific results.
Without knowing your existing settings or possible data model problems, here's a guess at some conservative settings to try to reduce evacuation pauses caused by not having enough to-space (check the GC logs):
-Xmx12G -Xms12G -XX:+UseG1GC -XX:G1ReservePercent=25 -XX:G1RSetUpdatingPauseTimePercent=5 -XX:MaxGCPauseMillis=500 -XX:-ReduceInitialCardMarks -XX:G1HeapRegionSize=32m
This should also help reduce the pause from updating the remembered set, which becomes an issue, and setting G1HeapRegionSize reduces humongous objects, which can become a problem depending on the data model. Make sure -Xmn is not set.
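A sketch of where these flags would go on a stock Cassandra 3.11 install, assuming the usual packaging where GC settings live in conf/jvm.options (comment out the CMS lines there, add the G1 lines, then restart the node):
# conf/jvm.options (excerpt)
-Xms12G
-Xmx12G
-XX:+UseG1GC
-XX:G1ReservePercent=25
-XX:G1RSetUpdatingPauseTimePercent=5
-XX:MaxGCPauseMillis=500
-XX:-ReduceInitialCardMarks
-XX:G1HeapRegionSize=32m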
For what it's worth, a 12 GB heap with C* is probably better suited to CMS; you can certainly get better throughput. You just need to be careful about fragmentation over time with the rather large objects that can get allocated.
-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=55 -XX:MaxTenuringThreshold=3 -Xmx12G -Xms12G -Xmn3G -XX:+CMSEdenChunksRecordAlways -XX:+CMSParallelInitialMarkEnabled -XX:+CMSParallelRemarkEnabled -XX:CMSWaitDuration=10000 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCondCardMark
Most likely, though, there's an issue with the data model, or you're under-provisioned.

memsql data inflation during recovery

During leaf recovery, data from our databases is inflated to almost 50x its original size. If we run "show databases extended;" on each leaf, we can watch the size go from ~40 MB to ~2000 MB for each partition of the database (only during recovery). After recovery it returns to the original size.
We were hosting an 80 GB database, but MemSQL was unable to recover it due to the huge amount of memory required. We had to remove this database for MemSQL to work again.
Is there a way to stop this inflation from crashing the recovery process? It seems like MemSQL should be able to host 80 GB.
We have one aggregator and 5 leaves, each with ~30 GB of memory and ~400 GB of disk.
Edit:
After upgrading to version 5.0.8 this problem disappeared.
You could try snapshotting the database (http://docs.memsql.com/docs/snapshot-database). This is likely to help if there is a significant amount of data being updated/deleted.
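A minimal sketch of doing that from the master aggregator (my_db is a placeholder database name; MemSQL speaks the MySQL wire protocol, so any MySQL client works):
#shell> mysql -h master-agg -u root -e "SNAPSHOT DATABASE my_db"
# A snapshot compacts the transaction log, so recovery has far less log to
# replay, which is presumably where the temporary inflation comes from.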

How to add new nodes in Cassandra with different machine configuration

I have 6 Cassandra nodes in two datacenters, each with 16 GB of memory and a 1 TB hard drive.
Now I am adding 3 more nodes with 32 GB of memory. Will these machines cause overhead for the existing machines (maybe in token distribution)? If so, please suggest how to configure these machines to avoid those problems.
Thanks in advance.
The "balance" between nodes is best regulated using vnodes. If you recall (if you don't, you should read about it), the ring that Cassandra nodes form is actually consisted out of virtual nodes (vnodes). Each node in the ring has a certain portion of vnodes, which is set up in the Cassandra configuration on each node. Based on that number of vnodes, or rather the proportion between them, the amount of data going to those nodes is calculated. The configuration you are looking for is num_tokens. If you have similarly powerful machines, than an equal vnode number is available. The default is 256.
When adding a new, more powerful machine, you should assign a greater number of vnodes to it. How much? I think it's hard to tell. It's unwise to give it twice more, only be looking at the RAM, since those nodes will have twice as many data than the others. Than you might expect more IO operations on them (remember, you still have the same HDD) and CPU utilization (and the same CPU). You might want to take a look at this answer also.
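As a rough sketch of what that looks like in cassandra.yaml (the 384 is just a placeholder ratio, assuming you want the 32 GB machines to take about 1.5x the data of the 16 GB ones; set num_tokens before bootstrapping a node, and never change it on a node that already holds data):
# cassandra.yaml on the existing 16 GB nodes
num_tokens: 256
# cassandra.yaml on the new 32 GB nodes
num_tokens: 384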

Datastax Cassandra repair service weird estimation and heavy load

I have a 5 node cluster with around 1 TB of data, vnodes enabled, OpsCenter 5.12 and DSE 4.6.7. I would like to do a full repair within 10 days, and use the Repair Service in OpsCenter so that I don't put unnecessary load on the cluster.
The problem I'm facing is that the Repair Service puts too much load on the cluster and works too fast. Its progress is around 30% (according to OpsCenter) in 24 h. I even tried changing the target to 40 days, without any difference.
Questions,
Can I trust the percent-complete number in OpsCenter?
The suggested number is something like 0.000006 days. Could that estimate be related to the problem?
Are there any settings/tweaks that could be useful to lower the load?
You can use OpsCenter as a guideline about where data is stored and what's going on in the cluster, but it's really more of a dashboard. The real 'tale of the tape' comes from 'nodetool' on the command line of the server nodes, for example:
#shell> nodetool status
Status=Up/Down |/ State=Normal/Leaving/Joining/Moving
--  Address        Load      Tokens  Owns   Host ID                               Rack
UN  10.xxx.xxx.xx  43.95 GB  256     33.3%  b1e56789-8a5f-48b0-9b76-e0ed451754d4  RAC1
What type of compaction are you using?
You've asked a sort of 'magic bullet' question, as there could be several factors in play. These are examples, but not limited to:
A. Size of data, and of whole rows in Cassandra (you can see these in the 'nodetool cfstats' output per table). Rows whose binary size is larger than 16 MB will be seen as "ultra-wide" rows, which might be an indicator that the schema in your data model needs a 'compound' or 'composite' row key.
B. The type of setup you have with respect to replication and network strategy.
C. Data entry point: how Cassandra gets its data. Are you using Python? PHP? What inputs the data? You can get funky behavior from a cluster with a bad PHP driver, for example.
D. Vnodes are good, but can be bad. What version of Cassandra are you running? You can find out via CQLSH: run cqlsh -3 and then type 'show version' (a quick check for this and for E is sketched right after this list).
E. Type of compaction is a big killer. Are you using SizeTieredCompactionStrategy or LeveledCompactionStrategy?
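A quick way to check both D and E from the shell (my_keyspace is a placeholder; the output layout differs between versions):
# Cassandra version of the local node:
#shell> nodetool version
# Then check the schema from cqlsh:
#shell> cqlsh -3 hostname
cqlsh> show version;
cqlsh> describe keyspace my_keyspace;
In each table definition printed by 'describe keyspace', the compaction options show whether SizeTieredCompactionStrategy or LeveledCompactionStrategy is configured.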
Start by running 'nodetool cfstats' from the command line on the server any given node is running on. The particular areas of interest at this point would be:
Compacted row minimum size:
Compacted row maximum size:
More than X amount of bytes in size here on systems with Y amount of RAM can be a significant problem. Be sure Cassandra has enough RAM and that the stack is tuned.
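A quick way to pull just those fields out (my_keyspace and my_table are placeholders; on recent versions you can pass keyspace.table to limit the output, and newer releases label the fields 'Compacted partition ...' rather than 'Compacted row ...'):
#shell> nodetool cfstats my_keyspace.my_table | grep -i compacted
# Shows the minimum, maximum, and mean compacted row/partition sizes for the
# table, which is the quickest way to spot ultra-wide rows.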
The default performance configuration for Cassandra should normally be enough, so the next step would be to open a CQLSH session to the node with 'cqlsh -3 hostname' and issue 'describe keyspaces'. Take the keyspace you are running, issue 'describe keyspace FOO', and look at your schema. Of particular interest are your primary keys. Are you using "composite row keys" or a "composite primary key" (as described here: http://www.datastax.com/dev/blog/whats-new-in-cql-3-0)? If not, you probably need to, depending on the expected read/write load.
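For illustration only, a hypothetical CQL 3 table that uses a compound partition key to keep any single partition from growing without bound (all names are made up):
cqlsh> CREATE TABLE my_keyspace.sensor_readings (
           sensor_id uuid,
           day text,              -- bucketing column, part of the partition key
           reading_time timestamp,
           value double,
           PRIMARY KEY ((sensor_id, day), reading_time)
       );
Each (sensor_id, day) pair becomes its own partition, so one busy sensor's data is split across many modest partitions instead of one ultra-wide row.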
Also check how your application layer is inserting data into Cassandra. Using PHP? Python? What drivers are being used? There are significant bugs in Cassandra versions < 1.2.10 with certain Thrift connectors, such as the Java driver or the PHPcassa driver, so you might need to upgrade Cassandra and make some driver changes.
In addition to these steps also consider how your nodes were created.
Note that migration from static tokens to virtual nodes (vnodes) has to be handled carefully; you can't simply switch configs on a node that's already been populated. You will want to check your initial_token: setting in /etc/cassandra/cassandra.yaml. The questions I ask myself here are: what initial tokens are set (there are no initial tokens for vnodes)? Were the tokens changed after the data was populated? For the static nodes I typically run, I calculate tokens using a tool like http://www.geroba.com/cassandra/cassandra-token-calculator/ , as I've run into complications with vnodes (though they are much more reliable now than before).
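For reference, the two styles look roughly like this in cassandra.yaml (the explicit token value is a placeholder you would compute with a token calculator for your ring size):
# vnode setup: let Cassandra pick the token ranges, leave initial_token unset
num_tokens: 256
# initial_token:

# static setup: one calculated token per node, vnodes effectively disabled
# num_tokens: 1
initial_token: -9223372036854775808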
