280K ./saved_caches
112G ./data
215G ./commitlog
326G .
How can this be explained? Why is the commit log (215G) almost twice the size of the data (112G)? Is there a bug in Cassandra 2.0.6?
It turned out I had accidentally disabled commitlog_total_space_in_mb. I've re-enabled it and restarted Cassandra. The restart took a long time because of the huge log files, but it started and cleaned up the extra commit log segments.
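For anyone hitting the same thing, the knob lives in cassandra.yaml; a minimal sketch follows (4096 mirrors the commented-out default of that era; size it to your own commit log volume):
# cassandra.yaml
# Cap the total commit log size; older segments get flushed and recycled.
commitlog_total_space_in_mb: 4096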
I'm using the latest Spark 3.3.0 but still get this exception: java.io.FileNotFoundException: File does not exist: /LOG_DIR/application_1657344020931_1400038_1.inprogress
which seems to contradict https://github.com/apache/spark/pull/29350 ; how can that be?
There are 70k event logs in LOG_DIR.
One SHS is running on node A with Spark 2.4.5.
To resolve the exception, I started a new SHS on node B with Spark 3.3.0, but still got the exception.
The basic problem is that some finished Spark applications randomly cannot be seen from the SHS. I think the FileNotFoundException is the main cause, and the PR aims to resolve it.
I re-checked the whole of FsHistoryProvider.checkForLogs(...) and found that I had misunderstood the PR.
My question: the SHS always shows fewer (COMPLETED) apps than the number of event logs (excluding in-progress ones) in my LOG_DIR.
This will always happen, because when the SHS finds an untracked log, e.g. aaa.inprogress, Spark may rename it to aaa at the very moment checkForLogs is writing aaa.inprogress to LevelDB; a FileNotFoundException is thrown and the scan loop is aborted.
So the rest of the logs in that loop, which may be 10% of all apps, are skipped. That's why my SHS shows so many fewer apps than LOG_DIR holds.
The PR above skips such problematic aaa event logs within the current scan; on the next scan, since aaa is not yet in LevelDB, it gets tracked.
The SHS still shows fewer apps than the number of event logs, but in my cluster the gap is only 40~70 apps, and those apps appear soon after.
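To quantify the gap on your own cluster, a rough check (this assumes the SHS default port 18080, that jq is installed, and that shs-host is a placeholder for your SHS machine):
# Finished-app event logs on disk vs. completed apps the SHS reports
ls /LOG_DIR | grep -vc '\.inprogress$'
curl -s 'http://shs-host:18080/api/v1/applications?status=completed' | jq length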
We are using Cassandra 1.2.9 + WSO2 BAM 2.5 for API analysis.
We have scheduled a job to purge Cassandra data. The purge job is divided into three steps (a Hector sketch of the step-2 deletion follows the list):
The 1st step queries the original column family and inserts the rows into a temporary columnFamily_purge.
The 2nd step deletes from the original column family by adding tombstones, and inserts the data from columnFamily_purge back into the original column family.
The 3rd step drops the temporary columnFamily_purge.
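For context, the per-row deletion in step 2 amounts to something like the following Hector sketch; the cluster, keyspace, and column family names here are hypothetical placeholders, not the real BAM ones:
// Minimal Hector sketch of a step-2 row deletion. "bam-cluster",
// "EVENT_KS" and "api_stats_cf" are placeholder names.
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class PurgeStep2Sketch {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("bam-cluster", "127.0.0.1:9160");
        Keyspace ks = HFactory.createKeyspace("EVENT_KS", cluster);
        Mutator<String> mutator = HFactory.createMutator(ks, StringSerializer.get());
        // Deleting a row only writes a tombstone; a mass purge means mass tombstone writes.
        mutator.addDeletion("someRowKey", "api_stats_cf");
        mutator.execute();
    }
}
Every such delete is itself a write, so a Hadoop job issuing them from every mapper at once puts real write load on the cluster.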
The 1st step works well, but the 2nd step frequently crashes the Cassandra servers during the Hadoop map tasks, which makes Cassandra unavailable. The exception stack trace is as follows:
2016-08-23 10:27:43,718 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName hadoop for UID 47338 from the native implementation
2016-08-23 10:27:43,720 WARN org.apache.hadoop.mapred.Child: Error running child
me.prettyprint.hector.api.exceptions.HectorException: All host pools marked down. Retry burden pushed out to client.
at me.prettyprint.cassandra.connection.HConnectionManager.getClientFromLBPolicy(HConnectionManager.java:390)
at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:244)
at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:113)
at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:243)
at me.prettyprint.cassandra.service.template.AbstractColumnFamilyTemplate.deleteRow(AbstractColumnFamilyTemplate.java:173)
at org.wso2.carbon.bam.cassandra.data.archive.mapred.CassandraMapReduceRowDeletion$RowKeyMapper.map(CassandraMapReduceRowDeletion.java:246)
at org.wso2.carbon.bam.cassandra.data.archive.mapred.CassandraMapReduceRowDeletion$RowKeyMapper.map(CassandraMapReduceRowDeletion.java:139)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Could someone advise on what may lead to this problem? Thanks!
This can happen for three reasons:
1) The Cassandra servers are down. I don't think this is the case in your setup.
2) Network issues.
3) The load is higher than what the cluster can handle.
How do you delete the data? Using a Hive script?
After I increased the number of open files and the max thread count, the problem was gone.
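For reference, the usual place to raise those limits on Linux is /etc/security/limits.conf; the values below are illustrative, not prescriptive:
# /etc/security/limits.conf -- assumes Cassandra runs as the "cassandra" user
cassandra  -  nofile  100000   # max open file descriptors
cassandra  -  nproc   32768    # max threads/processes
Log out and back in (or restart the service) for the new limits to take effect.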
I'm trying to add a new node to our cluster (Cassandra 2.1.11, 16 nodes, 32GB RAM, 2x3TB HDD, 8-core CPU, 1 datacenter, 2 racks, about 700GB of data on each node). After the new node started, data (approx. 600GB total) from the 16 existing nodes was successfully transferred to it, and the building of secondary indexes began. The secondary index build looks normal; I see info about the successful completion of some index builds and some stream tasks:
INFO [StreamReceiveTask:9] 2015-11-22 02:15:23,153 StreamResultFuture.java:180 - [Stream #856adc90-8ddd-11e5-a4be-69bddd44a709] Session with /192.168.21.66 is complete
INFO [StreamReceiveTask:9] 2015-11-22 02:15:23,152 SecondaryIndexManager.java:174 - Index build of [docs.docs_ex_pl_ph_idx, docs.docs_lo_pl_ph_idx, docs.docs_author_login_idx, docs.docs_author_extid_idx, docs.docs_url_idx] complete
Currently 9 out of 16 streams have successfully finished, according to the logs. Everything looks fine except one issue: this process has already lasted 5 full days. There are no errors in the logs, nothing suspicious, except the extremely slow progress.
nodetool compactionstats -H
shows
Secondary index build ... docs 882,4 MB 1,69 GB bytes 51,14%
So there is some index-building activity and it makes some progress, but very slowly, about 1% in half an hour.
The only significant difference between the new node and the existing ones is that the Cassandra java process holds 21k open files, versus about 300 on any existing node, and there are 80k files in the data dir on the new node versus 300-500 files on any existing node.
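(Those counts can be reproduced with something like the following, assuming a single Cassandra JVM and the default data path:)
# File descriptors held by the Cassandra JVM, then files under the data dir
ls /proc/$(pgrep -f CassandraDaemon)/fd | wc -l
find /var/lib/cassandra/data -type f | wc -l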
Is this normal? At this speed it looks like I'll spend 16 weeks or so adding 16 more nodes.
I know this is an old question, but we ran into this exact issue with 2.1.13 using DTCS. We were able to fix it in our test environment by increasing the memtable flush threshold to 0.7, which didn't make any sense to us, but may be worth trying.
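The answer doesn't name the exact setting; if it means the 2.1-era cassandra.yaml knob, the change would presumably look like this (an assumption on my part, so verify against your own config):
# cassandra.yaml
# Default is 1 / (memtable_flush_writers + 1); raising it delays flushes.
memtable_cleanup_threshold: 0.7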
I'm trying to simultaneously add 4 nodes to my current 2-node DC. I have vnodes turned off, as per the DataStax suggestion. Right after the major index build on each node, the following warning is printed several times in the logs:
WARN [SolrSecondaryIndex ks.cf index initializer.] 2014-06-20 09:39:59,904 CassandraUtil.java (line 108) Error Operation timed out - received only 3 responses. on attempt 1 out of 4 with CL QUORUM...
I understand what it means. But why does Cassandra expect the nodes to fulfill the CL when those nodes are still bootstrapping? More importantly, how does the warning affect the bootstrap? I noticed that the nodes are no longer doing any index build or streaming, but they also remain in the "Active - Joining" state. Is there any chance that they will finish? What should I do?
I'm using DSE 4.0.3. All existing and new nodes in the DC are Search nodes. I pre-computed the tokens using the Python program for Murmur3Partitioner.
EDIT:
Although nodetool compactionstats does not show any ongoing index build on the nodes, for some reason I still see a lot of these lines in the logs:
INFO [IndexPool backpressure thread-0] 2014-06-20 12:30:31,346 IndexPool.java (line 472) Throttling at 26 index requests per second with target total queue size at 40
INFO [IndexPool backpressure thread-0] 2014-06-20 12:30:34,169 IndexPool.java (line 428) Back pressure is active with total index queue size 18586 and average processing time 2770
EDIT:
Interestingly, I found the following lines in each node after digging through the log files:
INFO [main] 2014-06-20 09:39:48,588 StorageService.java (line 1036) Bootstrap completed! for the tokens [node token]
INFO [SolrSecondaryIndex ks.cf index initializer.] 2014-06-20 11:32:07,833 AbstractSolrSecondaryIndex.java (line 411) Reindexing 1417116631 commit log updates for core ks.cf
Based on these lines, I feel a lot safer that the bootstrap actually completed and that the nodes are simply re-indexing their data. I don't know, though, why the re-indexing process is not shown in nodetool compactionstats.
It appears the bootstrap completed, and the DSE Search system is running normally.
why the re-indexing process is not shown in nodetool compactionstats
DSE Search is not generally exposed via the Cassandra command line tools. The log output should show the indexing as having completed; were you able to verify that?
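One way to check, assuming a package-install log path (an assumption; adjust for your layout):
# Tail the Solr secondary-index activity seen earlier in the question
grep 'AbstractSolrSecondaryIndex' /var/log/cassandra/system.log | tail -n 20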
I've upgraded Cassandra from 1.2.13 to 2.0.4 on a cluster of 5 nodes.
When I run nodetool -h localhost ring I see this error message at the end:
ERROR 10:33:28,324 Unable to initialize MemoryMeter (jamm not specified as javaagent). This means Cassandra will be unable to measure object sizes accurately and may consequently OOM.
According to this:
https://issues.apache.org/jira/browse/CASSANDRA-6404
it should be fixed.
I'm running java-1.7.0-oracle-1.7.0.45-1jpp.2.el6_4.x86_64.
This is the beginning of the process options:
/usr/lib/jvm/java-1.7.0-oracle-1.7.0.45.x86_64/jre/bin/java -ea -javaagent:/usr/share/cassandra//lib/jamm-0.2.5.jar
Is there anyone who could point me in a direction where to look for a solution?
Are these errors serious, or merely cosmetic?
//john
If this error is only emitted by the tools, you can ignore it. If you also see it in your output.log or system.log, then it can be problematic. Your JVM version seems fine, and if the command line you pasted here is from the Cassandra process itself, then that process is good. The tools use a different script to initialize their environment. Check the cassandra/bin folder and inspect the tools' scripts to see if they include this change:
https://issues.apache.org/jira/secure/attachment/12615604/0001-Set-javaagent-when-running-tools-in-bin.patch
Chances are your upgrade process somehow didn't update them.
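A quick way to spot scripts missing the fix (the path assumes a package install; adjust as needed):
# List bin scripts that do NOT set the jamm javaagent
grep -L 'javaagent.*jamm' /usr/share/cassandra/bin/*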