How can the Cassandra commitlog be corrupted? - cassandra

This is the second time my commitlog has become corrupted and the server refuses to start. What worries me is that I get these errors even when no updates were made to the database.
My config says that the commitlog is synced every 10 seconds, so how can a file be corrupted unless a crash occurs within those 10 seconds?
Is this a Cassandra bug? Or by design, i.e. bad design?
I am using Cassandra 3.4 on Windows 10, installed with the DataStax installer.
The last part of the stdout log is:
INFO 06:17:39 Replaying C:\Program Files\DataStax-DDC\data\commitlog\CommitLog-6-1471353812251.log, C:\Program Files\DataStax-DDC\data\commitlog\CommitLog-6-1471353812252.log, C:\Program Files\DataStax-DDC\data\commitlog\CommitLog-6-1471411951134.log, C:\Program Files\DataStax-DDC\data\commitlog\CommitLog-6-1471454506802.log, C:\Program Files\DataStax-DDC\data\commitlog\CommitLog-6-1471532812678.log
ERROR 06:17:39 Exiting due to error while processing commit log during initialization.
org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: Could not read commit log descriptor in file C:\Program Files\DataStax-DDC\data\commitlog\CommitLog-6-1471353812252.log
at org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:611) [apache-cassandra-3.4.0.jar:3.4.0]
at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:373) [apache-cassandra-3.4.0.jar:3.4.0]
at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:236) [apache-cassandra-3.4.0.jar:3.4.0]
at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:192) [apache-cassandra-3.4.0.jar:3.4.0]
at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:172) [apache-cassandra-3.4.0.jar:3.4.0]
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:283) [apache-cassandra-3.4.0.jar:3.4.0]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:551) [apache-cassandra-3.4.0.jar:3.4.0]
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:680) [apache-cassandra-3.4.0.jar:3.4.0]

I have seen similar errors. This happens when the Cassandra process crashes, possibly due to OOM. Run "dmesg" and check whether it was killed because of OOM. In that case the commit log it was writing to may have been corrupted or left as a 0 KB file (check the size of the file named in the error), and Cassandra throws the above error when it is restarted and replays that file.
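The size check suggested above can be scripted. This is a minimal sketch, not a Cassandra tool: `find_empty_segments` is a hypothetical helper name, the directory is the one shown in the question's log, and the check only looks at file size. It does not parse or validate the commit log descriptor that Cassandra reads during replay.

```python
from pathlib import Path


def find_empty_segments(commitlog_dir):
    """Return commit log segment files in commitlog_dir that are 0 bytes long.

    Only the file size is checked; the descriptor is not validated.
    """
    directory = Path(commitlog_dir)
    return sorted(
        p for p in directory.glob("CommitLog-*.log")
        if p.is_file() and p.stat().st_size == 0
    )


if __name__ == "__main__":
    # Directory taken from the log output in the question; adjust as needed.
    commitlog_dir = r"C:\Program Files\DataStax-DDC\data\commitlog"
    if Path(commitlog_dir).is_dir():
        for segment in find_empty_segments(commitlog_dir):
            print(f"empty segment: {segment}")
```

If the file named in the error shows up in this list, that is consistent with the crash explanation above.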

Related

Spark 2.4 Got an error when resolving hostNames Falling back to /default-rack

When running an application in client mode, the driver logs are filled with the info message below. Any idea how to resolve this? Are there any Spark configs that need to be updated or are missing?
[INFO ][dispatcher-event-loop-29][SparkRackResolver:54] Got an error when resolving hostNames. Falling back to /default-rack for all
The job runs fine, and this message does not appear in the executor logs.
Check this bug:
https://issues.apache.org/jira/browse/SPARK-28005
If you want to suppress this message in the logs, you can try adding this to your log4j.properties:
log4j.logger.org.apache.spark.deploy.yarn.SparkRackResolver=ERROR
This can happen when using spark-submit with master yarn and the driver running locally in client deploy mode (not using --deploy-mode cluster), while the path to the topology.py script in your core-site.xml is not correct.
The path to core-site.xml can be set via the environment variable HADOOP_CONF_DIR (or YARN_CONF_DIR).
Check the path in the value of the net.topology.script.file.name parameter in core-site.xml.
If the path is incorrect, running the driver in client mode fails to execute the script and logs the following warning:
23/01/15 18:39:43 WARN ScriptBasedMapping: Exception running /home/alexander/xxx/.conf/topology.py 10.15.21.199
java.io.IOException: Cannot run program "/etc/hadoop/conf.cloudera.yarn/topology.py" (in directory "/home/john"): error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
...
23/01/15 18:39:43 INFO SparkRackResolver: Got an error when resolving hostNames. Falling back to /default-rack for all
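The path check described above can be sketched in Python. This is a minimal sketch under a few assumptions: core-site.xml uses the usual Hadoop `<property><name>…</name><value>…</value></property>` layout, and `topology_script_path` and `check_topology_script` are hypothetical helper names, not part of Spark or Hadoop.

```python
import os
import xml.etree.ElementTree as ET

PROPERTY_NAME = "net.topology.script.file.name"


def topology_script_path(core_site_xml):
    """Return the configured topology script path, or None if it is unset."""
    root = ET.parse(core_site_xml).getroot()
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == PROPERTY_NAME:
            value = prop.findtext("value", "").strip()
            return value or None
    return None


def check_topology_script(conf_dir):
    """Describe whether the topology script named in core-site.xml is usable."""
    core_site = os.path.join(conf_dir, "core-site.xml")
    if not os.path.isfile(core_site):
        return f"no core-site.xml in {conf_dir}"
    script = topology_script_path(core_site)
    if script is None:
        return f"{PROPERTY_NAME} is not set"
    if not os.path.isfile(script):
        return f"missing: {script}"
    if not os.access(script, os.X_OK):
        return f"not executable: {script}"
    return f"ok: {script}"


if __name__ == "__main__":
    conf_dir = os.environ.get("HADOOP_CONF_DIR") or os.environ.get("YARN_CONF_DIR")
    if conf_dir:
        print(check_topology_script(conf_dir))
    else:
        print("HADOOP_CONF_DIR / YARN_CONF_DIR is not set")
```

Run it on the machine that hosts the driver, since in client mode that is where the script is executed.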

cassandra Failed to classify files in /var/lib/cassandra/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33

cassandra 3.6
openjdk version "1.8.0_91"
When I start the Cassandra service, it throws this error:
INFO 05:57:10 Initializing system_schema.keyspaces
INFO 05:57:10 Initializing system_schema.tables
ERROR 05:57:10 Failed to classify files in /var/lib/cassandra/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f
Some old files are missing but the txn log is still there and not completed
Files in folder:
/var/lib/cassandra/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f/ma-404-big-CompressionInfo.db
/var/lib/cassandra/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f/ma-404-big-Data.db
/var/lib/cassandra/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f/ma-404-big-Digest.crc32
/var/lib/cassandra/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f/ma-404-big-Filter.db
/var/lib/cassandra/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f/ma-404-big-Index.db
/var/lib/cassandra/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f/ma-404-big-Statistics.db
/var/lib/cassandra/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f/ma-404-big-Summary.db
/var/lib/cassandra/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f/ma-404-big-TOC.txt
Txn: [ma_txn_compaction_4c1ecb40-0ff9-11e8-a162-c50ddd47b4bb.log in /var/lib/cassandra/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f]
What does this mean, and what should I do to solve this issue?
Thanks in advance
Finally I got the answer.
The system files were corrupted, which caused the issue.
So you need to delete every directory inside "/var/lib/cassandra/data/" except your own keyspace.
*Note: Don't remove your keyspace data.
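Before deleting anything, it may help to list which directories the step above would touch. The following is a dry-run sketch only: `dirs_other_than` is a hypothetical helper, `my_keyspace` is a placeholder for your own keyspace name, and nothing is removed.

```python
from pathlib import Path


def dirs_other_than(data_dir, keep):
    """List subdirectories of data_dir whose names are not in keep."""
    keep = set(keep)
    return sorted(
        p for p in Path(data_dir).iterdir()
        if p.is_dir() and p.name not in keep
    )


if __name__ == "__main__":
    data_dir = Path("/var/lib/cassandra/data")
    if data_dir.is_dir():
        # "my_keyspace" is a placeholder; list every keyspace you want to keep.
        for candidate in dirs_other_than(data_dir, keep=["my_keyspace"]):
            print(f"would be removed: {candidate}")
```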

Cassandra 2.1.8: Node refuses to start with NPE in removeUnfinishedCompactionLeftovers

I am trying to upgrade a Cassandra 2.1.0 cluster to 2.1.8 (latest release).
When I start a first node with 2.1.8 runtime, I get an error and the node refuses to start.
This is the error's stack trace :
org.apache.cassandra.io.FSReadError: java.lang.NullPointerException
at org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:642) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:302) [apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:524) [apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:613) [apache-cassandra-2.1.8.jar:2.1.8]
Caused by: java.lang.NullPointerException: null
at org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:634) ~[apache-cassandra-2.1.8.jar:2.1.8]
... 3 common frames omitted
FSReadError in Failed to remove unfinished compaction leftovers (file: /home/nudgeca2/datas/data/main/segment-97b5ba00571011e49a928bffe429b6b5/main-segment-ka-15432-Statistics.db). See log for details.
at org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:642)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:302)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:524)
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:613)
Caused by: java.lang.NullPointerException
at org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:634)
... 3 more
Exception encountered during startup: java.lang.NullPointerException
The cluster has 7 nodes and runs on AWS Linux EC2 instances.
The node I am trying to upgrade was stopped after a nodetool drain.
Then I tried to go back to the 2.1.0 runtime, but I now get a similar error.
I also tried stopping and starting another node and everything was fine; that node restarted without any problem.
I tried to touch the missing file (as it was due to be removed, I thought it might not need any specific content). Two other files produced the same error, and I touched those as well. The node now fails later on, while trying to read these files.
Does anyone have any idea what I should do?
Thank you for any help.
It might be worth opening a Jira for that issue, so if nothing else, they can catch the NPE and provide a better error message.
It looks like it's trying to open:
file: /home/nudgeca2/datas/data/main/segment-97b5ba00571011e49a928bffe429b6b5/main-segment-ka-15432-Statistics.db
It's possible that it's trying to read that file because it finds the associated data file: (/home/nudgeca2/datas/data/main/segment-97b5ba00571011e49a928bffe429b6b5/main-segment-ka-15432-Data.db). Does that data file exist? I'd be tempted to move it out of the way, and see if it starts properly.
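To see which data files lack a matching statistics file, the table directory can be scanned. This is a minimal sketch that assumes component files are named `<prefix>-<Component>.db`, as in `main-segment-ka-15432-Statistics.db` from the error message; `incomplete_sstables` is a hypothetical helper and does not move or delete anything.

```python
from collections import defaultdict
from pathlib import Path


def incomplete_sstables(table_dir, required=("Data.db", "Statistics.db")):
    """Map each SSTable file prefix to the required components it lacks.

    Only files ending in ".db" are considered, and the component name is
    taken to be everything after the last "-" in the file name.
    """
    components = defaultdict(set)
    for path in Path(table_dir).iterdir():
        if path.is_file() and path.suffix == ".db" and "-" in path.name:
            prefix, component = path.name.rsplit("-", 1)
            components[prefix].add(component)
    return {
        prefix: sorted(set(required) - present)
        for prefix, present in components.items()
        if set(required) - present
    }


if __name__ == "__main__":
    # Directory taken from the error message in the question.
    table_dir = Path(
        "/home/nudgeca2/datas/data/main/segment-97b5ba00571011e49a928bffe429b6b5"
    )
    if table_dir.is_dir():
        for prefix, missing in sorted(incomplete_sstables(table_dir).items()):
            print(f"{prefix}: missing {', '.join(missing)}")
```

Any prefix reported with a missing Statistics.db is a candidate for the "move it out of the way" test suggested above.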

apache shark installation on spark cluster

When running Shark on a Spark cluster with one node, I get the following error.
Can anyone please help me solve it?
Thanks in advance
Error:
Executor updated: app-20140619165031-0000/0 is now FAILED (class java.io.IOException: Cannot run program "/home/trendwise/Hadoop_tools/jdk1.7.0_40/bin/java" (in directory "/home/trendwise/Hadoop_tools/spark/spark-0.9.1-bin-hadoop1/work/app-20140619165031-0000/0"): error=2, No such file or directory)
In my experience "No such file or directory" is often a symptom of some other exception, usually "no space left on device" and sometimes "too many open files". Mine the logs for other stack traces and monitor your disk usage and inode usage to confirm.
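The disk and inode checks mentioned above can be done from Python on a Unix host. This is a minimal sketch: `space_report` is a hypothetical helper, and the two paths in the usage section are the ones from the error message in the question.

```python
import os
import shutil


def space_report(path):
    """Return free-space and free-inode figures for the filesystem holding path."""
    usage = shutil.disk_usage(path)
    stats = os.statvfs(path)  # Unix only
    return {
        "bytes_total": usage.total,
        "bytes_free": usage.free,
        "inodes_total": stats.f_files,
        "inodes_free": stats.f_favail,
    }


if __name__ == "__main__":
    # Paths copied from the error message in the question.
    java_bin = "/home/trendwise/Hadoop_tools/jdk1.7.0_40/bin/java"
    work_dir = "/home/trendwise/Hadoop_tools/spark/spark-0.9.1-bin-hadoop1/work"

    print("java exists:", os.path.isfile(java_bin))
    print("java executable:", os.access(java_bin, os.X_OK))
    if os.path.isdir(work_dir):
        print(space_report(work_dir))
```

A free-byte or free-inode count near zero would support the "no space left on device" explanation; if both look healthy and the java path exists, the other stack traces in the logs are the next place to look.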

Cassandra 1.2 using symbolic link for /var/lib/cassandra/data

I thought this would be simple. I guess not.
I have an external hard drive mounted at /root/storage - OK
I moved the data directory from /var/lib/cassandra/ to /root/storage - OK
I then created a symbolic link from /var/lib/cassandra pointing to where the directory is now: ln -s /root/storage/data /var/lib/cassandra - OK
Now I am unable to start cassandra. I am getting this error in /var/log/cassandra/system.log:
INFO [main] 2013-02-15 10:08:36,329 CacheService.java (line 166) Scheduling row cache save to each 0 seconds (going to save all keys).
ERROR [main] 2013-02-15 10:08:36,366 FileUtils.java (line 373) Stopping the gossiper and the RPC server
ERROR [main] 2013-02-15 10:08:36,367 CassandraDaemon.java (line 387) Exception encountered during startup
java.lang.IllegalStateException: No configured daemon
at org.apache.cassandra.service.StorageService.stopRPCServer(StorageService.java:314)
at org.apache.cassandra.io.util.FileUtils.handleFSError(FileUtils.java:375)
at org.apache.cassandra.db.Directories.<init>(Directories.java:113)
at org.apache.cassandra.db.Directories.create(Directories.java:91)
at org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:403)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:174)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:370)
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:413)
[root@cassandra-new6 storage]# vi /usr/share/cassandra/default.conf/cassandra.yaml
The permissions are exactly the same on this directory. All file permissions are the same too. Any ideas would be appreciated.
When I get rid of the sym link and move the data directory back, everything works again.
The symlink regression is fixed in Cassandra 1.2.2. https://issues.apache.org/jira/browse/CASSANDRA-5185
Instead of creating a symlink, you can change where Cassandra looks for the data directory by setting the data_file_directories parameter in the cassandra.yaml file.
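Whichever path ends up in data_file_directories, it can be useful to confirm what that path actually resolves to for the user running Cassandra. This is a minimal sketch: `describe_data_path` is a hypothetical helper, and the path in the usage section is the one from the question.

```python
import os


def describe_data_path(path):
    """Report how a data directory path resolves and whether it is usable."""
    return {
        "is_symlink": os.path.islink(path),
        "resolved": os.path.realpath(path),
        "is_directory": os.path.isdir(path),
        # Access is checked for the current user, so run this as the
        # account that runs Cassandra.
        "read_write_traverse": os.access(path, os.R_OK | os.W_OK | os.X_OK),
    }


if __name__ == "__main__":
    for key, value in describe_data_path("/var/lib/cassandra/data").items():
        print(f"{key}: {value}")
```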
