I've put up a test cluster: four nodes. Severely underpowered (!): OK CPU, only 2 GB of RAM, shared non-SSD storage. Hey, it's a test :)
I just kept it running for three days. No data going in or out; everything's just idle. Connected with OpsCenter.
This morning, we found that one of the nodes went down around 2 am last night. The OS didn't go down (it was responding to pings). The Cassandra log around that time is:
INFO [MemtableFlushWriter:114] 2014-07-29 02:07:34,952 Memtable.java:360 - Completed flushing /var/lib/cassandra/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-107-Data.db (686 bytes) for commitlog position ReplayPosition(segmentId=1406304454537, position=29042136)
INFO [ScheduledTasks:1] 2014-07-29 02:08:24,227 GCInspector.java:116 - GC for ParNew: 276 ms for 1 collections, 648591696 used; max is 1040187392
Next entry is:
INFO [main] 2014-07-29 09:18:41,661 CassandraDaemon.java:102 - Hostname: xxxxx
i.e. when we restarted the node through OpsCenter.
Does that mean it crashed on GC, or that GC finished and something else crashed? Is there some other log I should be looking at?
Note: In the OpsCenter event log, we see this:
7/29/2014, 2:15am Warning Node reported as being down: xxxxxxx
I appreciate that the nodes are underpowered, but since it was completely idle, it shouldn't crash, should it?
Using 2.1.0-rc4 btw.
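My guess is that your node was killed by the Linux OOM killer. Because Linux overcommits RAM, when the system comes under memory pressure the kernel may kill a process (typically the one using the most memory, which on this box is likely Cassandra) to recover memory for the OS. With 2 GB of total RAM this can happen very easily. A process killed this way gets no chance to write anything, which would explain why the Cassandra log simply stops after the GC line; that GC line shows the heap at about 62% of its roughly 1 GB maximum, so the JVM itself was not out of heap. The kernel does log the kill, so check dmesg or the system log (e.g. /var/log/kern.log or /var/log/messages, depending on the distribution).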
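To check this, a small script can scan kernel log output for OOM-killer entries. This is a minimal sketch; find_oom_kills is a hypothetical helper, and the regex assumes the common wording "Out of memory: Kill process 1234 (java) ..." (older kernels) or "Out of memory: Killed process 1234 (java) ..." (newer kernels). The exact text varies by kernel version, so adjust the pattern if it finds nothing.

```python
import re
from typing import Iterable, List, Tuple

# Assumed kernel message formats (they differ between kernel versions):
#   "Out of memory: Kill process 1234 (java) score 912 or sacrifice child"
#   "Out of memory: Killed process 1234 (java) total-vm:..."
OOM_RE = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def find_oom_kills(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (pid, process name) for every OOM-killer entry in the given log lines."""
    kills = []
    for line in lines:
        match = OOM_RE.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills
```

For example, pass it an open file such as /var/log/kern.log, or the lines of dmesg or journalctl -k output; a hit naming the java process at around 2 am would confirm the guess above.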
I have data consisting of 100 nodes and 165 relations to be inserted into one keyspace. My Grakn image has a 4-core CPU and 3 GB of memory. When I try to insert the data, I get this error: [grpc-request-handler-4] ERROR grakn.core.server.Grakn - Uncaught exception at thread [grpc-request-handler-4] java.lang.OutOfMemoryError: Java heap space. I noticed that the image used only 346% CPU and 1.46 GB of RAM. Another finding in the log was: Caused by: com.datastax.oss.driver.api.core.AllNodesFailedException: Could not reach any contact point, make sure you've provided valid addresses (showing first 1, use getErrors() for more: Node(endPoint=/127.0.0.1:9042, hostId=null, hashCode=3cb85440): io.netty.channel.ChannelException: Unable to create Channel from class class io.netty.channel.socket.nio.NioSocketChannel)
Could you please help me with this?
It sounds like Cassandra ran out of memory. Currently, Grakn spawns two processes: one for Cassandra and one for the Grakn server. You can increase their memory limits with the following flags (Unix):
SERVER_JAVAOPTS=-Xms1G STORAGE_JAVAOPTS=-Xms2G ./grakn server start
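This would give the server 1 GB and the storage engine (Cassandra) 2 GB of memory, for instance. Note that -Xms sets the JVM's initial heap size, while the maximum heap is set with -Xmx, so you may need to pass both (e.g. STORAGE_JAVAOPTS="-Xms2G -Xmx2G"). 3 GB may be a bit on the low end once your data grows, so keep these flags in mind :)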
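With the example values above, the two heaps alone add up to the whole 3 GB image. The sketch below makes that budget explicit; jvm_size_to_bytes and heap_headroom are hypothetical helpers, and the parser assumes sizes written with the k, m or g suffixes the JVM accepts. Each JVM also needs memory outside its heap (thread stacks, direct buffers, metaspace), and the OS needs some too, so zero or negative headroom is a warning sign.

```python
import re

# Multipliers for the size suffixes used in -Xms/-Xmx (case-insensitive).
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def jvm_size_to_bytes(size: str) -> int:
    """Convert a JVM size such as '2G' or '1024m' to bytes."""
    match = re.fullmatch(r"(\d+)([kKmMgG]?)", size.strip())
    if not match:
        raise ValueError(f"unrecognised JVM size: {size!r}")
    return int(match.group(1)) * _UNITS[match.group(2).lower()]


def heap_headroom(container_bytes: int, *heap_sizes: str) -> int:
    """Bytes left for off-heap memory and the OS after reserving the given heaps."""
    return container_bytes - sum(jvm_size_to_bytes(s) for s in heap_sizes)
```

For example, heap_headroom(jvm_size_to_bytes("3G"), "1G", "2G") returns 0: nothing is left over for anything outside the two heaps.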
We just added a new, second DC with 7 nodes (5 JBOD SSDs per node) to our Cassandra cluster, and after replicating to the new DC we periodically get stuck compactions of the OpsCenter.rollup_state table. When this happens, the node is shown as down to the other nodes, but it stays alive itself; nodetool drain also hangs on the node, and only rebooting the node helps.
The logs below are from after a restart of the node; both nodes below are stuck in this state.
DEBUG [CompactionExecutor:14] 2019-09-03 17:03:44,456 CompactionTask.java:154 - Compacting (a43c8d71-ce53-11e9-972d-59ed5390f0df) [/cass-db1/data/OpsCenter/rollup_state-43e776914d2911e79ab41dbbeab1d831/mc-581-big-Data.db:level=0, /cass-db1/data/OpsCenter/rollup_state-43e776914d2911e79ab41dbbeab1d831/mc-579-big-Data.db:level=0, ]
Other node:
DEBUG [CompactionExecutor:14] 2019-09-03 20:38:22,272 CompactionTask.java:154 - Compacting (a00354f0-ce71-11e9-91a4-3731a2137ea5) [/cass-db2/data/OpsCenter/rollup_state-43e776914d2911e79ab41dbbeab1d831/mc-610-big-Data.db:level=0, /cass-db2/data/OpsCenter/rollup_state-43e776914d2911e79ab41dbbeab1d831/mc-606-big-Data.db:level=0, ]
WARN [CompactionExecutor:14] 2019-09-03 20:38:22,273 LeveledCompactionStrategy.java:273 - Live sstable /cass-db2/data/OpsCenter/rollup_state-43e776914d2911e79ab41dbbeab1d831/mc-606-big-Data.db from level 0 is not on corresponding level in the leveled manifest. This is not a problem per se, but may indicate an orphaned sstable due to a failed compaction not cleaned up properly.
What is the way to solve this problem?
I'm using Apache Cassandra 3.0.6: 4-node cluster, RF=3, consistency level ONE, 16 GB heap.
I'm getting this INFO message in system.log:
INFO [SharedPool-Worker-1] 2017-03-14 20:47:14,929 NoSpamLogger.java:91 - Maximum memory usage reached (536870912 bytes), cannot allocate chunk of 1048576 bytes
I don't know exactly which memory it means. I tried increasing file_cache_size_in_mb from 512 to 1024 in cassandra.yaml, but the extra 512 MB filled up immediately and the application stopped recording, showing the same INFO message:
INFO [SharedPool-Worker-5] 2017-03-16 06:01:27,034 NoSpamLogger.java:91 - Maximum memory usage reached (1073741824 bytes), cannot allocate chunk of 1048576 bytes
Please let me know if anyone has faced the same issue. Thanks!
Bhargav
As far as I can tell with Cassandra 3.11, no matter how large you set file_cache_size_in_mb, you will still get this message. The cache fills up, and writes this useless message. It happens in my case whether I set it to 2GB or 20GB. This may be a bug in the cache eviction strategy, but I can't tell.
The log message indicates that the node's off-heap cache is full because the node is busy servicing reads.
The 536870912 bytes in the log message is equivalent to 512 MB which is the default file_cache_size_in_mb.
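The same conversion applies to both log lines in the question: the limit is file_cache_size_in_mb expressed in bytes (1 MB = 1024 × 1024 bytes here), so 536870912 bytes is 512 MB and 1073741824 bytes is 1024 MB, matching the 512 to 1024 change, while the chunk that could not be allocated is 1 MB in both cases. A minimal sketch that extracts both numbers; parse_pool_message is a hypothetical helper and the regex assumes the exact message wording shown above.

```python
import re

# Assumes the wording of the INFO messages quoted in the question.
MSG_RE = re.compile(
    r"Maximum memory usage reached \((\d+) bytes\), "
    r"cannot allocate chunk of (\d+) bytes"
)

MIB = 1024 * 1024


def parse_pool_message(line: str):
    """Return (limit in MB, requested chunk in MB), or None if the line doesn't match."""
    match = MSG_RE.search(line)
    if match is None:
        return None
    limit, chunk = (int(group) for group in match.groups())
    return limit / MIB, chunk / MIB
```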
It is fine to see the message occasionally in the logs, which is why it is logged at INFO level, but if it is logged repeatedly, it is an indicator that the node is overloaded and you should consider increasing the capacity of your cluster by adding more nodes.
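To tell "occasional" from "repeated", it can help to count the messages per hour in system.log. A minimal sketch, assuming the timestamp format of the log lines above ("2017-03-14 20:47:14,929"); hourly_counts is a hypothetical helper, and what rate counts as "repeated" is left to your judgment.

```python
import re
from collections import Counter
from typing import Iterable

# Captures "YYYY-MM-DD HH" from lines that contain the buffer-pool message.
LINE_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2},\d{3} .*Maximum memory usage reached"
)


def hourly_counts(lines: Iterable[str]) -> Counter:
    """Count 'Maximum memory usage reached' messages per hour."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

For example, hourly_counts(open("system.log")).most_common(5) lists the five busiest hours.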
For more info, see my post on DBA Stack Exchange -- What does "Maximum memory usage reached" mean in the Cassandra logs?. Cheers!
Setup:
We have a 3-node Cassandra cluster with around 850G of data on each node. The Cassandra data directory is on an LVM volume (currently consisting of 3 drives: 800G + 100G + 100G), and cassandra_logs is on a separate (non-LVM) volume.
Versions:
Cassandra v2.0.14.425
DSE v4.6.6-1
Issue:
After we added the 3rd (100G) volume to the LVM on each node, disk I/O on all nodes went very high and the nodes go down quite often. The servers also become inaccessible and have to be rebooted; they don't stabilize, and we need to reboot them every 10-15 minutes.
Other Info:
We have the DSE-recommended server settings (vm.max_map_count, file descriptors) configured on all nodes
RAM on each node : 24G
CPU on each node : 6 cores / 2600MHz
Disk on each node : 1000G (Data dir) / 8G (Logs)
As I suspected, you are having throughput problems on your disk. Here's what I looked at to give you background. The nodetool tpstats output from your three nodes had these lines:
Pool Name    Active  Pending  Completed  Blocked  All time blocked
FlushWriter  0       0        22         0        8
FlushWriter  0       0        80         0        6
FlushWriter  0       0        38         0        9
The column I'm concerned about is the All Time Blocked. As a ratio to completed, you have a lot of blocking. The flushwriter is responsible for flushing memtables to the disk to keep the JVM from running out of memory or creating massive GC problems. The memtable is an in-memory representation of your tables. As your nodes take more writes, they start to fill and need to be flushed. That operation is a long sequential write to disk. Bookmark that. I'll come back to it.
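The ratio can be computed per node straight from the tpstats rows. For the three rows above it comes to 8/22 ≈ 36%, 6/80 = 7.5% and 9/38 ≈ 24% of completed flushes having been blocked. A minimal sketch; blocked_ratio is a hypothetical helper that assumes the column order shown above and a pool name without spaces.

```python
def blocked_ratio(tpstats_row: str) -> float:
    """Return 'All time blocked' / 'Completed' for one nodetool tpstats pool row.

    Assumes the columns: Pool Name, Active, Pending, Completed, Blocked,
    All time blocked (as in the FlushWriter rows above).
    """
    fields = tpstats_row.split()
    completed, all_time_blocked = int(fields[3]), int(fields[5])
    # Avoid dividing by zero on a pool that has not completed any tasks yet.
    return all_time_blocked / completed if completed else 0.0
```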
When flushwriters are blocked, the heap starts to fill. If they stay blocked, you will see the requests starting to queue up and eventually the node will OOM.
Compaction might be running as well. Compaction is a long sequential read of SSTables into memory and then a long sequential flush of the merge sorted results. More sequential IO.
So all these disk operations are sequential, not random IOPS. If your disk can't handle simultaneous sequential reads and writes, IOWait shoots up, requests get blocked, and then Cassandra has a really bad day.
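IOWait can be measured directly on Linux from two samples of the aggregate "cpu" line in /proc/stat: the share of elapsed CPU time spent in iowait is the growth of the iowait counter divided by the growth of the total. A minimal sketch; the helper names are hypothetical, and the total deliberately sums only user through steal, because the guest columns are already included in user and nice.

```python
from typing import Tuple


def parse_cpu_line(line: str) -> Tuple[int, int]:
    """Return (iowait, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # user nice system idle iowait irq softirq steal (guest columns excluded)
    values = [int(v) for v in fields[1:9]]
    return values[4], sum(values)


def iowait_percent(before: str, after: str) -> float:
    """Percentage of CPU time spent in iowait between two samples."""
    io_before, total_before = parse_cpu_line(before)
    io_after, total_after = parse_cpu_line(after)
    elapsed = total_after - total_before
    return 100.0 * (io_after - io_before) / elapsed if elapsed else 0.0


def read_cpu_line(path: str = "/proc/stat") -> str:
    """The first line of /proc/stat is the aggregate 'cpu' line."""
    with open(path) as f:
        return f.readline()
```

Usage: take a = read_cpu_line(), wait a second (time.sleep(1)), take b = read_cpu_line(), then call iowait_percent(a, b). A sustained high value during flushes and compactions points at the disk.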
You mentioned you are using Ceph. I haven't seen a successful deployment of Cassandra on Ceph yet. It will hold up for a while and then tip over on sequential load. Your easiest solution in the short term is to add more nodes to spread out the load. The medium term is to find some ways to optimize your stack for sequential disk loads, but that will eventually fail. Long term is get your data on real disks and off shared storage.
I have told consulting clients this for years when they use Cassandra: "If your storage has an Ethernet plug, you are doing it wrong." Good rule of thumb.
I'm trying to add a new node to our cluster (Cassandra 2.1.11, 16 nodes, 32 GB RAM, 2x3 TB HDD, 8-core CPU, 1 datacenter, 2 racks, about 700 GB of data on each node). After the new node started, data (approx. 600 GB total) from the 16 existing nodes was successfully transferred to it, and secondary index building started. The secondary index build looks normal; I see messages about the successful completion of some secondary index builds and some stream tasks:
INFO [StreamReceiveTask:9] 2015-11-22 02:15:23,153 StreamResultFuture.java:180 - [Stream #856adc90-8ddd-11e5-a4be-69bddd44a709] Session with /192.168.21.66 is complete
INFO [StreamReceiveTask:9] 2015-11-22 02:15:23,152 SecondaryIndexManager.java:174 - Index build of [docs.docs_ex_pl_ph_idx, docs.docs_lo_pl_ph_idx, docs.docs_author_login_idx, docs.docs_author_extid_idx, docs.docs_url_idx] complete
Currently 9 out of 16 streams have finished successfully, according to the logs. Everything looks fine except for one issue: this process has already lasted 5 full days. There are no errors in the logs and nothing suspicious, except extremely slow progress.
nodetool compactionstats -H
shows
Secondary index build ... docs 882,4 MB 1,69 GB bytes 51,14%
So an index build is running and making progress, but very slowly: about 1% every half hour.
The only significant difference between the new node and the existing nodes is that the Cassandra java process has 21k open files, versus 300 open files on each existing node, and the data dir on the new node holds 80k files, versus 300-500 files on each existing node.
Is this normal? At this speed, it looks like I'll spend 16 weeks or so adding 16 more nodes.
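The compactionstats figures above use a decimal comma, and at about 1% per half hour (2% per hour) the remaining 48.86% of this particular index build works out to roughly 24 more hours, assuming the rate stays constant. A minimal sketch of that estimate; both helpers are hypothetical, and parse_localized_number only handles a decimal comma without thousands separators.

```python
def parse_localized_number(text: str) -> float:
    """Parse a number printed with a decimal comma, e.g. '882,4' or '51,14%'."""
    return float(text.strip().rstrip("%").replace(",", "."))


def hours_remaining(percent_done: float, percent_per_hour: float) -> float:
    """Linear estimate of the time left for a task progressing at a constant rate."""
    if percent_per_hour <= 0:
        raise ValueError("progress rate must be positive")
    return (100.0 - percent_done) / percent_per_hour
```

For example, hours_remaining(parse_localized_number("51,14%"), 2.0) gives about 24.4 hours.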
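To see where the 80k files in the new node's data dir come from, one option is to count SSTables per table directory and compare the result with an existing node. A minimal sketch, assuming the 2.1-style layout <data_dir>/<keyspace>/<table>-<id>/ and counting only *-Data.db files (one per SSTable); sstables_per_table is a hypothetical helper, snapshot and backup directories are skipped, and secondary-index SSTables are counted under their base table's directory.

```python
from collections import Counter
from pathlib import Path


def sstables_per_table(data_dir: str) -> Counter:
    """Count *-Data.db files per '<keyspace>/<table dir>' under a Cassandra data dir."""
    root = Path(data_dir)
    counts = Counter()
    for data_file in root.rglob("*-Data.db"):
        parts = data_file.relative_to(root).parts
        # Ignore SSTables held in snapshots or incremental backups.
        if "snapshots" in parts or "backups" in parts:
            continue
        counts["/".join(parts[:2])] += 1
    return counts
```

For example, sstables_per_table("/var/lib/cassandra/data").most_common(10) shows the ten tables with the most SSTables.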
I know this is an old question, but we ran into this exact issue with 2.1.13 using DTCS. We were able to fix it in our test environment by increasing memtable flush thresholds to 0.7 - which didn't make any sense to us, but may be worth trying.