Cassandra bootstrap hangs in joining state for a very long time

Our production environment has two Cassandra nodes (v2.0.5), and we want to add an additional node to extend scalability. We followed the steps described in the DataStax documentation for adding a node.
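For reference, that procedure essentially comes down to setting a few cassandra.yaml values on the new node before its first start. The keys below are the standard ones; the values are placeholders rather than our actual configuration:
cluster_name: '<same name as the existing cluster>'
num_tokens: 256                        # match the existing nodes' vnode count
listen_address: <the new node's own private IP>
endpoint_snitch: <same snitch as the existing nodes>
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "172.18.8.56,172.18.8.57"   # the existing nodes as seeds
Starting the Cassandra service after that kicks off the bootstrap.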
After bootstrapping the new node, we observed exceptions like the following in its log:
ERROR [CompactionExecutor:42] 2015-03-25 19:01:01,821 CassandraDaemon.java (line 192) Exception in thread Thread[CompactionExecutor:42,1,main]
java.lang.RuntimeException: org.apache.cassandra.db.filter.TombstoneOverwhelmingException
at org.apache.cassandra.service.pager.QueryPagers$1.next(QueryPagers.java:154)
at org.apache.cassandra.service.pager.QueryPagers$1.next(QueryPagers.java:137)
It keeps repeating compaction tasks, and after two weeks the bootstrap still has not completed; the node remains in the joining state:
INFO [CompactionExecutor:4468] 2015-03-30 09:18:20,288 ColumnFamilyStore.java (line 784) Enqueuing flush of Memtable-compactions_in_progress#1247174540(212/13568 serialized/live bytes, 7 ops)
INFO [FlushWriter:314] 2015-03-30 09:18:22,408 Memtable.java (line 373) Completed flushing /var/lib/cassandra/data/production_alarm_keyspace/alarm_history_data_new/production_alarm_keyspace-alarm_history_data_new-jb-118-Data.db (11216962 bytes) for commitlog position ReplayPosition(segmentId=1427280544702, position=24550137)
INFO [FlushWriter:314] 2015-03-30 09:18:22,409 Memtable.java (line 333) Writing Memtable-alarm_master_data#37361826(26718076/141982437 serialized/live bytes, 791595 ops)
INFO [FlushWriter:314] 2015-03-30 09:18:24,018 Memtable.java (line 373) Completed flushing /var/lib/cassandra/data/production_alarm_keyspace/alarm_master_data/production_alarm_keyspace-alarm_master_data-jb-346-Data.db (8407637 bytes) for commitlog position ReplayPosition(segmentId=1427280544702, position=24550137)
INFO [FlushWriter:314] 2015-03-30 09:18:24,018 Memtable.java (line 333) Writing Memtable-compactions_in_progress#1247174540(212/13568 serialized/live bytes, 7 ops)
INFO [FlushWriter:314] 2015-03-30 09:18:24,185 Memtable.java (line 373) Completed flushing /var/lib/cassandra/data/system/compactions_in_progress/system-compactions_in_progress-jb-1019-Data.db (201 bytes) for commitlog position ReplayPosition(segmentId=1427280544702, position=24550511)
INFO [CompactionExecutor:4468] 2015-03-30 09:18:24,186 CompactionTask.java (line 115) Compacting [SSTableReader(path='/var/lib/cassandra/data/production_alarm_keyspace/alarm_common_dump_by_minutes/production_alarm_keyspace-alarm_common_dump_by_minutes-jb-356-Data.db'), SSTableReader(path='/var/lib/cassandra/data/production_alarm_keyspace/alarm_common_dump_by_minutes/production_alarm_keyspace-alarm_common_dump_by_minutes-jb-357-Data.db'), SSTableReader(path='/var/lib/cassandra/data/production_alarm_keyspace/alarm_common_dump_by_minutes/production_alarm_keyspace-alarm_common_dump_by_minutes-jb-355-Data.db'), SSTableReader(path='/var/lib/cassandra/data/production_alarm_keyspace/alarm_common_dump_by_minutes/production_alarm_keyspace-alarm_common_dump_by_minutes-jb-354-Data.db')]
INFO [CompactionExecutor:4468] 2015-03-30 09:18:39,189 ColumnFamilyStore.java (line 784) Enqueuing flush of Memtable-compactions_in_progress#810255650(0/0 serialized/live bytes, 1 ops)
INFO [FlushWriter:314] 2015-03-30 09:18:39,189 Memtable.java (line 333) Writing Memtable-compactions_in_progress#810255650(0/0 serialized/live bytes, 1 ops)
INFO [FlushWriter:314] 2015-03-30 09:18:39,357 Memtable.java (line 373) Completed flushing /var/lib/cassandra/data/system/compactions_in_progress/system-compactions_in_progress-jb-1020-Data.db (42 bytes) for commitlog position ReplayPosition(segmentId=1427280544702, position=25306969)
INFO [CompactionExecutor:4468] 2015-03-30 09:18:39,367 CompactionTask.java (line 275) Compacted 4 sstables to [/var/lib/cassandra/data/production_alarm_keyspace/alarm_common_dump_by_minutes/production_alarm_keyspace-alarm_common_dump_by_minutes-jb-358,]. 70,333,241 bytes to 70,337,669 (~100% of original) in 15,180ms = 4.418922MB/s. 260 total partitions merged to 248. Partition merge counts were {1:236, 2:12, }
nodetool status shows only the two original nodes, which is expected because 2.0.5 has a bug where nodetool does not display joining nodes:
[bpmesb#bpmesbap2 ~]$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 172.18.8.56 99 GB 256 51.1% 7321548a-3998-4122-965f-0366dd0cc47e rack1
UN 172.18.8.57 93.78 GB 256 48.9% bb306032-ff1c-4209-8300-d8c3de843f26 rack1
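One way to see what the joining node is actually doing, independent of that display bug, is to point the standard nodetool subcommands at the new node itself (offered as a suggestion; these were not part of our original steps):
nodetool -h <new node IP> netstats           # active streaming sessions and bytes remaining
nodetool -h <new node IP> compactionstats    # pending/active compactions, including secondary index builds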
Can anybody help with this situation? DataStax says a bootstrap should take only a few minutes, but ours has not completed after two weeks. We searched Stack Overflow and found this issue, which may be related to our problem.

After a few days of testing and digging through the exception log, we found what appears to be the key issue:
ERROR 19:01:01,821 Exception in thread Thread[NonPeriodicTasks:1,5,main]
java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.cassandra.db.filter.TombstoneOverwhelmingException
at org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:413)
at org.apache.cassandra.db.index.SecondaryIndexManager.maybeBuildSecondaryIndexes(SecondaryIndexManager.java:140)
at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:113)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
The log shows that a stream receiver task hit a TombstoneOverwhelmingException while building secondary indexes, and we think that is why Cassandra never changes the node's state to normal.
Here are the steps we took to fix the problem. We used nodetool to compact the table and rebuild its secondary index on the two original nodes:
nohup nodetool compact production_alarm_keyspace object_daily_data &
nohup nodetool rebuild_index production_alarm_keyspace object_daily_data object_daily_data_alarm_no_idx_1 &
Then we restarted the new node. After about an hour it jumped to the normal state, and it has worked fine ever since.
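For context, this exception is governed by the tombstone guard introduced in Cassandra 2.0: tombstone_warn_threshold and tombstone_failure_threshold in cassandra.yaml (defaults 1000 and 100000). Temporarily raising the failure threshold on the joining node is sometimes used as a stopgap while the underlying tombstones are dealt with; this is only a hedged suggestion, not something we did:
# cassandra.yaml on the joining node, temporary values only (defaults: 1000 / 100000)
tombstone_warn_threshold: 10000
tombstone_failure_threshold: 1000000
# revert once the node reaches normal, and clean up the tombstones themselves afterwards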

Related

Cassandra killed by O/S without error in logs

One node of my Cassandra cluster breaks very often, almost every day. It usually breaks in the morning, most probably because of memory consumption by other jobs, but I don't see any error messages in the logs. The following are the log messages from when it breaks:
INFO [MemtableFlushWriter:313] 2016-04-07 07:19:19,629 Memtable.java:385 - Completed flushing /data/cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-6666-Data.db (6945 bytes) for commitlog position ReplayPosition(segmentId=1459914506391, position=14934142)
INFO [MemtableFlushWriter:313] 2016-04-07 07:19:19,631 Memtable.java:346 - Writing Memtable-sstable_activity#823770964(21691 serialized bytes, 7605 ops, 0%/0% of on/off-heap limit)
INFO [MemtableFlushWriter:313] 2016-04-07 07:19:19,650 Memtable.java:385 - Completed flushing /data/cassandra/data/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-7558-Data.db (13689 bytes) for commitlog position ReplayPosition(segmentId=1459914506391, position=14934142)
INFO [CompactionExecutor:1176] 2016-04-07 07:19:19,651 CompactionTask.java:140 - Compacting [SSTableReader(path='/data/cassandra/data/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-7557-Data.db'), SSTableReader(path='/data/cassandra/data/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-7556-Data.db'), SSTableReader(path='/data/cassandra/data/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-7555-Data.db'), SSTableReader(path='/data/cassandra/data/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-7558-Data.db')]
INFO [CompactionExecutor:1176] 2016-04-07 07:19:19,696 CompactionTask.java:270 - Compacted 4 sstables to [/data/cassandra/data/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-7559,]. 46,456 bytes to 9,197 (~19% of original) in 45ms = 0.194910MB/s. 1,498 total partitions merged to 216. Partition merge counts were {1:675, 2:33, 3:11, 4:181, }
INFO [MemtableFlushWriter:312] 2016-04-07 07:19:19,717 Memtable.java:385 - Completed flushing /data/cassandra/data/system/size_estimates-618f817b005f3678b8a453f3930b8e86/system-size_estimates-ka-8014-Data.db (849738 bytes) for commitlog position ReplayPosition(segmentId=1459914506391, position=14934142)
I know that the OS kills it, but how can I find the root cause of the problem, and are there any configuration changes I can make for Cassandra?
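If it is the Linux OOM killer, it leaves a trace in the kernel log even when Cassandra's own logs end cleanly. A minimal check, assuming a typical Linux install:
dmesg | grep -i -E 'killed process|out of memory'
grep -i oom /var/log/syslog          # or /var/log/messages on RHEL/CentOS
# the matching entries name the victim process, its PID and its memory usage at kill time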

Cassandra: Long Par New GC Pauses when Bootstrapping new nodes to cluster

I've seen an issue that happens fairly often when bootstrapping new nodes into a DataStax Enterprise Cassandra cluster (version 2.0.10.71).
When the new node being bootstrapped is started, the bootstrap process begins streaming data from other nodes in the cluster. After a short period of time (usually a minute or less), other nodes in the cluster show high ParNew GC pause times and then drop off from the cluster, failing the stream session.
INFO [main] 2015-04-27 16:59:58,644 StreamResultFuture.java (line 91) [Stream #d42dfef0-ecfe-11e4-8099-5be75b0950b8] Beginning stream session with /10.1.214.186
INFO [GossipTasks:1] 2015-04-27 17:01:06,342 Gossiper.java (line 890) InetAddress /10.1.214.186 is now DOWN
INFO [HANDSHAKE-/10.1.214.186] 2015-04-27 17:01:21,400 OutboundTcpConnection.java (line 386) Handshaking version with /10.1.214.186
INFO [RequestResponseStage:11] 2015-04-27 17:01:23,439 Gossiper.java (line 876) InetAddress /10.1.214.186 is now UP
Then on the other node:
10.1.214.186 ERROR [STREAM-IN-/10.1.212.233] 2015-04-27 17:02:07,007 StreamSession.java (line 454) [Stream #d42dfef0-ecfe-11e4-8099-5be75b0950b8] Streaming error occurred
We also see things like this in the logs:
10.1.219.232 INFO [ScheduledTasks:1] 2015-04-27 18:20:19,987 GCInspector.java (line 116) GC for ParNew: 118272 ms for 2 collections, 980357368 used; max is 12801015808
10.1.221.146 INFO [ScheduledTasks:1] 2015-04-27 18:20:29,468 GCInspector.java (line 116) GC for ParNew: 154911 ms for 1 collections, 1287263224 used; max is 12801015808
It seems that it happens on different nodes each time we try to bootstrap a new node.
I've found this related ticket: https://issues.apache.org/jira/browse/CASSANDRA-6653
My only guess is that when the new node comes up, a lot of compactions fire off and that might be causing the GC pause times. I had considered setting concurrent_compactors to half my total CPU count.
Anyone have an idea?
Edit: more details on the GC settings. We are using i2.2xlarge nodes on EC2:
MAX_HEAP_SIZE="12G"
HEAP_NEWSIZE="800M"
Also
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"
With help from the DSE crew, the following settings helped us.
On an i2.2xlarge node (8 CPUs, 60 GB of RAM, local SSD only):
Increasing Heap New Size to 512M * num CPU (in our case 4G)
Setting memtable_flush_writers = 8
Setting concurrent_compactors = total CPU / 2 (in our case 4)
After making these changes we no longer see ParNew GC times exceeding 1 second during bootstrap (previously we were seeing 50-100 second GC pauses). FWIW, we don't see long ParNew GC pauses during normal operation, only during bootstrap.
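For reference, those changes map onto the stock configuration files roughly as follows (file names assume a standard package layout; verify against your DSE install):
# cassandra-env.sh
MAX_HEAP_SIZE="12G"
HEAP_NEWSIZE="4G"                 # 512M * 8 vCPUs on an i2.2xlarge
# cassandra.yaml
memtable_flush_writers: 8
concurrent_compactors: 4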

Cassandra nodetool repair freezes entire cluster

Need help understanding what's happening to Cassandra when attempting a nodetool repair on one of the column families in our keyspace.
We are running Cassandra 2.0.7 and have a table we use for indexing object data in our system.
CREATE TABLE ids_by_text (
    object_type text,
    field_name text,
    ref_type text,
    value text,
    ref_id timeuuid,
    PRIMARY KEY ((object_type, field_name, ref_type), value, ref_id)
)
Rows can grow to be quite large. We have roughly 10 million objects in the database, each indexed via the table above on an average of 4-6 fields. That doesn't seem like a lot to me.
When we run nodetool repair, it runs for a bit and then hits a point where the following exception is thrown:
ERROR [AntiEntropySessions:8] 2014-07-06 16:47:48,863 RepairSession.java (line 286) [repair #5f37c2e0-052b-11e4-92f5-b9bfa38ef354] session completed with the following error
org.apache.cassandra.exceptions.RepairException: [repair #5f37c2e0-052b-11e4-92f5-b9bfa38ef354 on apps/ids_by_text, (-7683110849073497716,-7679039947314690170]] Sync failed between /10.0.2.166 and /10.0.2.163
at org.apache.cassandra.repair.RepairSession.syncComplete(RepairSession.java:207)
at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:236)
at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:59)
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
INFO [ScheduledTasks:1] 2014-07-06 16:47:48,909 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 66029 ms for 1 collections, 7898896176 used; max is 8547991552
INFO [GossipTasks:1] 2014-07-06 16:47:48,901 Gossiper.java (line 883) InetAddress /10.0.2.162 is now DOWN
INFO [GossipTasks:1] 2014-07-06 16:47:49,181 Gossiper.java (line 883) InetAddress /10.0.2.163 is now DOWN
INFO [GossipTasks:1] 2014-07-06 16:47:49,184 StreamResultFuture.java (line 186) [Stream #da84b3e1-052b-11e4-92f5-b9bfa38ef354] Session with /10.0.2.163 is complete
WARN [GossipTasks:1] 2014-07-06 16:47:49,186 StreamResultFuture.java (line 215) [Stream #da84b3e1-052b-11e4-92f5-b9bfa38ef354] Stream failed
INFO [GossipTasks:1] 2014-07-06 16:47:49,187 Gossiper.java (line 883) InetAddress /10.0.2.165 is now DOWN
INFO [GossipTasks:1] 2014-07-06 16:47:49,188 Gossiper.java (line 883) InetAddress /10.0.2.164 is now DOWN
INFO [GossipTasks:1] 2014-07-06 16:47:49,189 Gossiper.java (line 883) InetAddress /10.0.2.166 is now DOWN
INFO [GossipTasks:1] 2014-07-06 16:47:49,189 StreamResultFuture.java (line 186) [Stream #da84b3e0-052b-11e4-92f5-b9bfa38ef354] Session with /10.0.2.166 is complete
WARN [GossipTasks:1] 2014-07-06 16:47:49,189 StreamResultFuture.java (line 215) [Stream #da84b3e0-052b-11e4-92f5-b9bfa38ef354] Stream failed
At this point the other nodes become unresponsive, logging thread-pool status messages, and the system does not recover. We are dead.
I went through and ran 'nodetool scrub' on all of the nodes. That worked on most of them; for the ones that failed I used 'sstablescrub'. We wrote a script that does a subrange repair, and I can identify the ranges that are problematic, but I haven't done enough testing to know whether that is consistent or just symptomatic. Testing is tough when it brings production down, so I have to be cautious.
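For what it's worth, a single subrange repair of the kind such a script drives is just the stock repair options with explicit tokens (-st/--start-token and -et/--end-token in 2.0). The range below is the one named in the sync failure above, used purely as an illustration:
nodetool repair -st -7683110849073497716 -et -7679039947314690170 apps ids_by_text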
Sidebar question... how do you stop a repair that is underway? If I can see things going sideways, I'd like to stop it.
Note that every other column family in the keyspace repairs just fine.
I am not sure what other detail to give. We have been beating our heads against this for a week and, well, we're stuck.
This issue (https://issues.apache.org/jira/browse/CASSANDRA-7330) may relate to the unresponsiveness after the repair failure. It is fixed in 2.0.9.
how do you stop a repair that is underway?
A built-in way to do this is still a work in progress (https://issues.apache.org/jira/browse/CASSANDRA-3486).
You can stop a repair in 2.1.* as follows:
wget -q -O jmxterm.jar http://downloads.sourceforge.net/cyclops-group/jmxterm-1.0-alpha-4-uber.jar
java -jar ./jmxterm.jar
open localhost:7199 -u [optional username] -p [optional password]
bean org.apache.cassandra.db:type=StorageService
run forceTerminateAllRepairSessions
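The same call can also be scripted rather than typed interactively; something along these lines should work with jmxterm's non-interactive mode (flag names taken from the jmxterm documentation, so verify them against your version):
echo "run -b org.apache.cassandra.db:type=StorageService forceTerminateAllRepairSessions" | java -jar ./jmxterm.jar -l localhost:7199 -n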

Cassandra 2.0.1 fails to restart when the private IP changes on EC2 with EBS

I run a Cassandra cluster of 3 EC2 instances on AWS.
Each node mounts an EBS volume at /usr/lib/cassandra/, so the Cassandra data is stored on the EBS volume.
Each node has a DNS name pointing to its own private IP, and all Cassandra configuration files use the DNS names instead of the private IPs.
My question: when a node is terminated, I launch another instance and mount the EBS volume on the newly launched instance, so the data is not lost. But when I start the Cassandra service, it fails. The log is below.
I don't know why it is still using the other nodes' IPs:
INFO 12:47:36,364 Writing Memtable-schema_keyspaces#1358300025(251/2510 serialized/live bytes, 8 ops)
INFO 12:47:36,367 Enqueuing flush of Memtable-schema_columns#1982787565(91991/919910 serialized/live bytes, 2375 ops)
INFO 12:47:36,367 Enqueuing flush of Memtable-schema_columnfamilies#59370809(38866/388660 serialized/live bytes, 1052 ops)
INFO 12:47:36,385 Completed flushing /var/lib/cassandra/data/system/schema_keyspaces/system-schema_keyspaces-jb-10-Data.db (214 bytes) for commitlog position ReplayPosition(segmentId=1382719654978, position=289)
INFO 12:47:36,391 Writing Memtable-schema_columns#1982787565(91991/919910 serialized/live bytes, 2375 ops)
INFO 12:47:36,627 Completed flushing /var/lib/cassandra/data/system/schema_columns/system-schema_columns-jb-24-Data.db (19358 bytes) for commitlog position ReplayPosition(segmentId=1382719654978, position=289)
INFO 12:47:36,628 Writing Memtable-schema_columnfamilies#59370809(38866/388660 serialized/live bytes, 1052 ops)
INFO 12:47:36,660 Completed flushing /var/lib/cassandra/data/system/schema_columnfamilies/system-schema_columnfamilies-jb-24-Data.db (8750 bytes) for commitlog position ReplayPosition(segmentId=1382719654978, position=289)
INFO 12:47:36,661 Log replay complete, 64 replayed mutations
INFO 12:47:36,746 Compacting [SSTableReader(path='/var/lib/cassandra/data/system/schema_columnfamilies/system-schema_columnfamilies-jb-23-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/schema_columnfamilies/system-schema_columnfamilies-jb-22-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/schema_columnfamilies/system-schema_columnfamilies-jb-24-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/schema_columnfamilies/system-schema_columnfamilies-jb-21-Data.db')]
INFO 12:47:37,584 Compacted 4 sstables to [/var/lib/cassandra/data/system/schema_columnfamilies/system-schema_columnfamilies-jb-25,]. 34,990 bytes to 8,759 (~25% of original) in 809ms = 0.010325MB/s. 16 total rows, 4 unique. Row merge counts were {1:0, 2:0, 3:0, 4:4, }
INFO 12:47:38,411 Cassandra version: 2.0.1
INFO 12:47:38,412 Thrift API version: 19.37.0
INFO 12:47:38,425 CQL supported versions: 2.0.0,3.1.1 (default: 3.1.1)
INFO 12:47:38,485 Loading persisted ring state
ERROR 12:47:38,537 Exception encountered during startup
java.lang.RuntimeException: Unknown host /172.31.9.175 with no default configured
at org.apache.cassandra.locator.PropertyFileSnitch.getEndpointInfo(PropertyFileSnitch.java:90)
at org.apache.cassandra.locator.PropertyFileSnitch.getDatacenter(PropertyFileSnitch.java:113)
at org.apache.cassandra.locator.DynamicEndpointSnitch.getDatacenter(DynamicEndpointSnitch.java:127)
at org.apache.cassandra.locator.TokenMetadata$Topology.addEndpoint(TokenMetadata.java:1049)
at org.apache.cassandra.locator.TokenMetadata.updateNormalTokens(TokenMetadata.java:187)
at org.apache.cassandra.locator.TokenMetadata.updateNormalTokens(TokenMetadata.java:159)
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:470)
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:428)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:343)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:442)
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:485)
java.lang.RuntimeException: Unknown host /172.31.9.175 with no default configured
at org.apache.cassandra.locator.PropertyFileSnitch.getEndpointInfo(PropertyFileSnitch.java:90)
at org.apache.cassandra.locator.PropertyFileSnitch.getDatacenter(PropertyFileSnitch.java:113)
at org.apache.cassandra.locator.DynamicEndpointSnitch.getDatacenter(DynamicEndpointSnitch.java:127)
at org.apache.cassandra.locator.TokenMetadata$Topology.addEndpoint(TokenMetadata.java:1049)
at org.apache.cassandra.locator.TokenMetadata.updateNormalTokens(TokenMetadata.java:187)
at org.apache.cassandra.locator.TokenMetadata.updateNormalTokens(TokenMetadata.java:159)
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:470)
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:428)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:343)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:442)
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:485)
Exception encountered during startup: Unknown host /172.31.9.175 with no default configured
I may be wrong, but in your cassandra.yaml you may want to look at the seeds variable; it is probably still referencing an old IP. Another way around this is to use some of the floating (Elastic) IPs in AWS: if you spin up a new machine you point the floating IP at it, and the configuration doesn't have to change.
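Concretely, the seed list lives under seed_provider in cassandra.yaml; making sure it carries the stable DNS names rather than instance-private IPs would look roughly like this (host names below are placeholders):
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "node1.mycluster.internal,node2.mycluster.internal"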
Check whether you have configured a default for unknown nodes in your snitch property file:
default=<any of your DC names>:RAC1
After making the change, restart the node; hopefully this helps.
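For the PropertyFileSnitch, that means cassandra-topology.properties should either list every node explicitly or carry a default entry. A minimal sketch using the address from the error above (the DC/rack names are placeholders):
# conf/cassandra-topology.properties
172.31.9.175=DC1:RAC1
default=DC1:RAC1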

Why is Cassandra ignoring my schema modifications?

I'm pretty new to Cassandra. I've had a single-node cluster running for a few days with no problems, but today it started ignoring some of my CQL commands. SELECTs work fine, but if I run DROP TABLE foo; from cqlsh, nothing happens: after a half-second pause it returns me to the prompt, but the table isn't dropped. The same goes for creating an index using CREATE INDEX.
I'm running in a virtual machine, using the Cassandra distribution from OpenStax on Ubuntu 12.04.
I checked the Cassandra logs and I definitely get output when I run a CREATE INDEX, but no apparent errors:
CREATE INDEX number_uri_index ON numbers (number);
Produces:
INFO [MigrationStage:1] 2012-07-25 14:25:59,120 ColumnFamilyStore.java (line 643) Enqueuing flush of Memtable-schema_columnfamilies#15955724(1212/1515 serialized/live bytes, 20 ops)
INFO [FlushWriter:5] 2012-07-25 14:25:59,122 Memtable.java (line 266) Writing Memtable-schema_columnfamilies#15955724(1212/1515 serialized/live bytes, 20 ops)
INFO [FlushWriter:5] 2012-07-25 14:25:59,139 Memtable.java (line 307) Completed flushing /var/lib/cassandra/data/system/schema_columnfamilies/system-schema_columnfamilies-hd-50-Data.db (1267 bytes) for commitlog position ReplayPosition(segmentId=140485087964, position=8551)
INFO [MigrationStage:1] 2012-07-25 14:25:59,141 ColumnFamilyStore.java (line 643) Enqueuing flush of Memtable-schema_columns#7576227(320/400 serialized/live bytes, 5 ops)
INFO [FlushWriter:5] 2012-07-25 14:25:59,141 Memtable.java (line 266) Writing Memtable-schema_columns#7576227(320/400 serialized/live bytes, 5 ops)
INFO [FlushWriter:5] 2012-07-25 14:25:59,172 Memtable.java (line 307) Completed flushing /var/lib/cassandra/data/system/schema_columns/system-schema_columns-hd-46-Data.db (367 bytes) for commitlog position ReplayPosition(segmentId=140485087964, position=8551)
Same problem here in a 3-node setup. It was solved by applying the same modification on a second node.
Investigating the Cassandra JIRA, we discovered that it could be related to the way timestamps are handled by schema-related commands; it should be fixed in 1.1.3:
CASSANDRA-4461
CASSANDRA-4432
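One quick way to confirm whether a schema change actually propagated is to compare schema versions and check for clock skew, since these bugs are timestamp-related (standard commands, offered only as a sanity check):
nodetool describecluster        # the "Schema versions" section should show a single UUID across the cluster
date                            # run on each node; schema mutations are ordered by wall-clock timestamps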
