Are there any known issues with initial_token collision when adding nodes to a cluster in a VM environment?
I'm working on a 4-node cluster set up in a VM environment. We're running into issues when we attempt to add nodes to the cluster.
In the cassandra.yaml file, initial_token is left blank.
Since we're running Cassandra > 1.0, auto_bootstrap should be true by default.
It's my understanding that each of the nodes in the cluster should be assigned an initial token at startup.
This is not what we're currently seeing.
We do not want to manually set the value of initial_token for each node (that rather defeats the goal of being dynamic).
We have also set the partitioner to random: partitioner: org.apache.cassandra.dht.RandomPartitioner
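For reference, the relevant lines in cassandra.yaml can be double-checked with something like this (the conf path is assumed):
grep -E '^(initial_token|partitioner)' conf/cassandra.yaml
# initial_token:
# partitioner: org.apache.cassandra.dht.RandomPartitioner
# auto_bootstrap is not set, so it defaults to true on Cassandra > 1.0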
I've outlined the steps we follow and results we are seeing below.
Can someone please advise as to what we're missing here?
Here are the detailed steps we are taking:
1) Kill all Cassandra instances and delete the data & commit log files on each node.
2) Start up the seed node (S.S.S.S)
It starts up fine.
3) Run nodetool -h W.W.W.W ring and see:
Address DC Rack Status State Load Effective-Ownership Token
S.S.S.S datacenter1 rack1 Up Normal 28.37 GB 100.00% 24360745721352799263907128727168388463
4) Start up X.X.X.X
INFO [GossipStage:1] 2012-11-29 21:16:02,194 Gossiper.java (line 850) Node /X.X.X.X is now part of the cluster
INFO [GossipStage:1] 2012-11-29 21:16:02,194 Gossiper.java (line 816) InetAddress /X.X.X.X is now UP
INFO [GossipStage:1] 2012-11-29 21:16:02,195 StorageService.java (line 1138) Nodes /X.X.X.X and /Y.Y.Y.Y have the same token 113436792799830839333714191906879955254. /X.X.X.X is the new owner
WARN [GossipStage:1] 2012-11-29 21:16:02,195 TokenMetadata.java (line 160) Token 113436792799830839333714191906879955254 changing ownership from /Y.Y.Y.Y to /X.X.X.X
5) Run nodetool -h W.W.W.W ring and see:
Address DC Rack Status State Load Effective-Ownership Token
113436792799830839333714191906879955254
S.S.S.S datacenter1 rack1 Up Normal 28.37 GB 100.00% 24360745721352799263907128727168388463
W.W.W.W datacenter1 rack1 Up Normal 123.87 KB 100.00% 113436792799830839333714191906879955254
6) Start up Y.Y.Y.Y
INFO [GossipStage:1] 2012-11-29 21:17:36,458 Gossiper.java (line 850) Node /Y.Y.Y.Y is now part of the cluster
INFO [GossipStage:1] 2012-11-29 21:17:36,459 Gossiper.java (line 816) InetAddress /Y.Y.Y.Y is now UP
INFO [GossipStage:1] 2012-11-29 21:17:36,459 StorageService.java (line 1138) Nodes /Y.Y.Y.Y and /X.X.X.X have the same token 113436792799830839333714191906879955254. /Y.Y.Y.Y is the new owner
WARN [GossipStage:1] 2012-11-29 21:17:36,459 TokenMetadata.java (line 160) Token 113436792799830839333714191906879955254 changing ownership from /X.X.X.X to /Y.Y.Y.Y
7) Run nodetool -h W.W.W.W ring and see:
Address DC Rack Status State Load Effective-Ownership Token
113436792799830839333714191906879955254
S.S.S.S datacenter1 rack1 Up Normal 28.37 GB 100.00% 24360745721352799263907128727168388463
Y.Y.Y.Y datacenter1 rack1 Up Normal 123.87 KB 100.00% 113436792799830839333714191906879955254
8) Start up Z.Z.Z.Z
INFO [GossipStage:1] 2012-11-30 04:52:28,590 Gossiper.java (line 850) Node /Z.Z.Z.Z is now part of the cluster
INFO [GossipStage:1] 2012-11-30 04:52:28,591 Gossiper.java (line 816) InetAddress /Z.Z.Z.Z is now UP
INFO [GossipStage:1] 2012-11-30 04:52:28,591 StorageService.java (line 1138) Nodes /Z.Z.Z.Z and /Y.Y.Y.Y have the same token 113436792799830839333714191906879955254. /Z.Z.Z.Z is the new owner
WARN [GossipStage:1] 2012-11-30 04:52:28,592 TokenMetadata.java (line 160) Token 113436792799830839333714191906879955254 changing ownership from /Y.Y.Y.Y to /Z.Z.Z.Z
9) Run nodetool -h W.W.W.W ring and see:
Address DC Rack Status State Load Effective-Ownership Token
113436792799830839333714191906879955254
W.W.W.W datacenter1 rack1 Up Normal 28.37 GB 100.00% 24360745721352799263907128727168388463
S.S.S.S datacenter1 rack1 Up Normal 28.37 GB 100.00% 24360745721352799263907128727168388463
Z.Z.Z.Z datacenter1 rack1 Up Normal 123.87 KB 100.00% 113436792799830839333714191906879955254
Thanks in advance.
This is what I did to fix this problem:
Stop the Cassandra service
Set auto_bootstrap: false on the seed node.
Empty data and commitlog directories:
sudo rm -rf /var/lib/cassandra/data/*
sudo rm -rf /var/lib/cassandra/commitlog/*
Then restart the service.
I tested this with Cassandra 3.7.
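After the restart, a quick sanity check (a sketch, using the placeholder address from the question) is to confirm that each node now reports its own distinct token:
nodetool -h S.S.S.S ring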
Clearly your nodes are holding onto some past cluster information that is being used at startup. Make sure to delete the LocationInfo directories, which contain the data about the cluster. You have a very strange token layout (where's the 0 token, for example?), so you're certainly going to need to reassign them if you want the proper ownership.
It may help to explain how token assignment works, so let me also address this. In a brand new cluster, the first node will get assigned token 0 by default and will have 100% ownership. If you do not specify a token for your next node, Cassandra will calculate a token such that the original node owns the lower 50% and the new node the higher 50%.
When you add node 3, it will insert the token between the first and second, so you'll actually end up with ownership that looks like 25%, 25%, 50%. This is really important, because the lesson to learn here is that Cassandra will NEVER reassign a token by itself to balance the ring. If you want your ownership balanced properly, you must assign your own tokens. This is not hard to do, and there's actually a utility provided to do this.
So Cassandra's initial bootstrap process, while dynamic, may not yield the desired ring balance. You can't simply allow new nodes to join willy-nilly without some intervention to make sure you get the desired result. Otherwise you will end up with the scenario you have laid out in your question.
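If you do want to assign balanced tokens yourself, a minimal sketch (assuming RandomPartitioner and its 2^127 token space) is to space them evenly, token_i = i * (2^127 / N), and set each value as initial_token in the corresponding node's cassandra.yaml before that node's first start:
# Evenly spaced tokens for a 4-node RandomPartitioner ring (adjust N as needed)
N=4
for i in 0 1 2 3; do echo "$i * (2^127 / $N)" | bc; done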
Related
My main objective is to experiment with a Cassandra cluster. I have a laptop, so my options, as I understand them, are: (1) use some virtualization software (e.g. Hyper-V) to create multiple VMs and run a Cassandra instance in each VM, (2) use Docker to create multiple instances of Cassandra, or (3) directly run multiple instances.
I thought (3) would be faster and provide me with more insight, so I tried it (by following https://stackoverflow.com/a/25348301/1029599). But I've run into a strange situation: I'm not able to change the JMX port. More details below:
I've created two copies of Cassandra 3.11.7, one on the C drive and the other on the D drive.
For the C drive copy, I've edited cassandra.yaml to replace 'listen_address: localhost' with 'listen_address: 127.0.0.1' and 'rpc_address: localhost' with 'rpc_address: 127.0.0.1'. In addition, I set the seeds to point to the D-drive instance as '- seeds: "127.0.0.2"'. I've NOT edited cassandra-env.sh, so JMX_PORT stays at the default 7199.
For the D drive copy, I've edited cassandra.yaml to use '127.0.0.2' instead of localhost and the seed as '- seeds: "127.0.0.1"'. In addition, I've edited cassandra-env.sh to set JMX_PORT=7200.
Surprisingly, when I start the D drive's Cassandra instance, it always picks up JMX_PORT as 7199, not 7200.
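(For reference, which process holds each port can be checked from a Windows command prompt with something like:)
netstat -ano | findstr :7199
netstat -ano | findstr :7200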
Log (relevant portion from the start):
D:\apache-cassandra-3.11.7\bin>.\cassandra
WARNING! Powershell script execution unavailable. Please use 'powershell Set-ExecutionPolicy Unrestricted' on this user-account to run cassandra with fully featured functionality on this platform. Starting with legacy startup options Starting Cassandra Server INFO [main] 2020-08-17 17:31:21,632 YamlConfigurationLoader.java:89 - Configuration location: file:/D:/apache-cassandra-3.11.7/conf/cassandra.yaml INFO [main] 2020-08-17 17:31:22,359 Config.java:534 - Node configuration:[allocate_tokens_for_keyspace=null; authenticator=AllowAllAuthenticator; authorizer=AllowAllAuthorizer; auto_bootstrap=true; auto_snapshot=true; back_pressure_enabled=false; back_pressure_strategy=org.apache.cassandra.net.RateBasedBackPressure{high_ratio=0.9, factor=5, flow=FAST}; batch_size_fail_threshold_in_kb=50; batch_size_warn_threshold_in_kb=5; batchlog_replay_throttle_in_kb=1024; broadcast_address=null; broadcast_rpc_address=null; buffer_pool_use_heap_if_exhausted=true; cas_contention_timeout_in_ms=1000; cdc_enabled=false; cdc_free_space_check_interval_ms=250; cdc_raw_directory=null; cdc_total_space_in_mb=0; check_for_duplicate_rows_during_compaction=true; check_for_duplicate_rows_during_reads=true; client_encryption_options=<REDACTED>; cluster_name=Test Cluster; column_index_cache_size_in_kb=2; column_index_size_in_kb=64; commit_failure_policy=stop; commitlog_compression=null; commitlog_directory=null; commitlog_max_compression_buffers_in_pool=3; commitlog_periodic_queue_size=-1; commitlog_segment_size_in_mb=32; commitlog_sync=periodic; commitlog_sync_batch_window_in_ms=NaN; commitlog_sync_period_in_ms=10000; commitlog_total_space_in_mb=null; compaction_large_partition_warning_threshold_mb=100; compaction_throughput_mb_per_sec=16; concurrent_compactors=null; concurrent_counter_writes=32; concurrent_materialized_view_writes=32; concurrent_reads=32; concurrent_replicates=null; concurrent_writes=32; counter_cache_keys_to_save=2147483647; counter_cache_save_period=7200; counter_cache_size_in_mb=null; counter_write_request_timeout_in_ms=5000; credentials_cache_max_entries=1000; credentials_update_interval_in_ms=-1; credentials_validity_in_ms=2000; cross_node_timeout=false; data_file_directories=[Ljava.lang.String;#235834f2; disk_access_mode=auto; disk_failure_policy=stop; disk_optimization_estimate_percentile=0.95; disk_optimization_page_cross_chance=0.1; disk_optimization_strategy=ssd; dynamic_snitch=true; dynamic_snitch_badness_threshold=0.1; dynamic_snitch_reset_interval_in_ms=600000; dynamic_snitch_update_interval_in_ms=100; enable_materialized_views=true; enable_sasi_indexes=true; enable_scripted_user_defined_functions=false; enable_user_defined_functions=false; enable_user_defined_functions_threads=true; encryption_options=null; endpoint_snitch=SimpleSnitch; file_cache_round_up=null; file_cache_size_in_mb=null; gc_log_threshold_in_ms=200; gc_warn_threshold_in_ms=1000; hinted_handoff_disabled_datacenters=[]; hinted_handoff_enabled=true; hinted_handoff_throttle_in_kb=1024; hints_compression=null; hints_directory=null; hints_flush_period_in_ms=10000; incremental_backups=false; index_interval=null; index_summary_capacity_in_mb=null; index_summary_resize_interval_in_minutes=60; initial_token=null; inter_dc_stream_throughput_outbound_megabits_per_sec=200; inter_dc_tcp_nodelay=false; internode_authenticator=null; internode_compression=dc; internode_recv_buff_size_in_bytes=0; internode_send_buff_size_in_bytes=0; key_cache_keys_to_save=2147483647; key_cache_save_period=14400; key_cache_size_in_mb=null; 
listen_address=127.0.0.2; listen_interface=null; listen_interface_prefer_ipv6=false; listen_on_broadcast_address=false; max_hint_window_in_ms=10800000; max_hints_delivery_threads=2; max_hints_file_size_in_mb=128; max_mutation_size_in_kb=null; max_streaming_retries=3; max_value_size_in_mb=256; memtable_allocation_type=heap_buffers; memtable_cleanup_threshold=null; memtable_flush_writers=0; memtable_heap_space_in_mb=null; memtable_offheap_space_in_mb=null; min_free_space_per_drive_in_mb=50; native_transport_flush_in_batches_legacy=true; native_transport_max_concurrent_connections=-1; native_transport_max_concurrent_connections_per_ip=-1; native_transport_max_concurrent_requests_in_bytes=-1; native_transport_max_concurrent_requests_in_bytes_per_ip=-1; native_transport_max_frame_size_in_mb=256; native_transport_max_negotiable_protocol_version=-2147483648; native_transport_max_threads=128; native_transport_port=9042; native_transport_port_ssl=null; num_tokens=256; otc_backlog_expiration_interval_ms=200; otc_coalescing_enough_coalesced_messages=8; otc_coalescing_strategy=DISABLED; otc_coalescing_window_us=200; partitioner=org.apache.cassandra.dht.Murmur3Partitioner; permissions_cache_max_entries=1000; permissions_update_interval_in_ms=-1; permissions_validity_in_ms=2000; phi_convict_threshold=8.0; prepared_statements_cache_size_mb=null; range_request_timeout_in_ms=10000; read_request_timeout_in_ms=5000; repair_session_max_tree_depth=18; request_scheduler=org.apache.cassandra.scheduler.NoScheduler; request_scheduler_id=null; request_scheduler_options=null; request_timeout_in_ms=10000; role_manager=CassandraRoleManager; roles_cache_max_entries=1000; roles_update_interval_in_ms=-1; roles_validity_in_ms=2000; row_cache_class_name=org.apache.cassandra.cache.OHCProvider; row_cache_keys_to_save=2147483647; row_cache_save_period=0; row_cache_size_in_mb=0; rpc_address=127.0.0.2; rpc_interface=null; rpc_interface_prefer_ipv6=false; rpc_keepalive=true; rpc_listen_backlog=50; rpc_max_threads=2147483647; rpc_min_threads=16; rpc_port=9160; rpc_recv_buff_size_in_bytes=null; rpc_send_buff_size_in_bytes=null; rpc_server_type=sync; saved_caches_directory=null; seed_provider=org.apache.cassandra.locator.SimpleSeedProvider{seeds=127.0.0.1}; server_encryption_options=<REDACTED>; slow_query_log_timeout_in_ms=500; snapshot_before_compaction=false; snapshot_on_duplicate_row_detection=false; ssl_storage_port=7001; sstable_preemptive_open_interval_in_mb=50; start_native_transport=true; start_rpc=false; storage_port=7000; stream_throughput_outbound_megabits_per_sec=200; streaming_keep_alive_period_in_secs=300; streaming_socket_timeout_in_ms=86400000; thrift_framed_transport_size_in_mb=15; thrift_max_message_length_in_mb=16; thrift_prepared_statements_cache_size_mb=null; tombstone_failure_threshold=100000; tombstone_warn_threshold=1000; tracetype_query_ttl=86400; tracetype_repair_ttl=604800; transparent_data_encryption_options=org.apache.cassandra.config.TransparentDataEncryptionOptions#5656be13; trickle_fsync=false; trickle_fsync_interval_in_kb=10240; truncate_request_timeout_in_ms=60000; unlogged_batch_across_partitions_warn_threshold=10; user_defined_function_fail_timeout=1500; user_defined_function_warn_timeout=500; user_function_timeout_policy=die; windows_timer_interval=1; write_request_timeout_in_ms=2000] INFO [main] 2020-08-17 17:31:22,361 DatabaseDescriptor.java:381 - DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap INFO [main] 2020-08-17 17:31:22,366 DatabaseDescriptor.java:439 - Global 
memtable on-heap threshold is enabled at 503MB INFO [main] 2020-08-17 17:31:22,367 DatabaseDescriptor.java:443 - Global memtable off-heap threshold is enabled at 503MB INFO [main] 2020-08-17 17:31:22,538 RateBasedBackPressure.java:123 - Initialized back-pressure with high ratio: 0.9, factor: 5, flow: FAST, window size: 2000. INFO [main] 2020-08-17 17:31:22,538 DatabaseDescriptor.java:773 - Back-pressure is disabled with strategy org.apache.cassandra.net.RateBasedBackPressure{high_ratio=0.9, factor=5, flow=FAST}. INFO [main] 2020-08-17 17:31:22,686 JMXServerUtils.java:252 - Configured JMX server at: service:jmx:rmi://127.0.0.1/jndi/rmi://127.0.0.1:7199/jmxrmi INFO [main] 2020-08-17 17:31:22,700 CassandraDaemon.java:490 - Hostname: DESKTOP-NQ7673H INFO [main] 2020-08-17 17:31:22,703 CassandraDaemon.java:497 - JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.8.0_261 INFO [main] 2020-08-17 17:31:22,709 CassandraDaemon.java:498 - Heap size: 1.968GiB/1.968GiB INFO [main] 2020-08-17 17:31:22,712 CassandraDaemon.java:503 - Code Cache Non-heap memory: init = 2555904(2496K) used = 5181696(5060K) committed = 5242880(5120K) max = 251658240(245760K) INFO [main] 2020-08-17 17:31:22,714 CassandraDaemon.java:503 - Metaspace Non-heap memory: init = 0(0K) used = 19412472(18957K) committed = 20054016(19584K) max
= -1(-1K) INFO [main] 2020-08-17 17:31:22,738 CassandraDaemon.java:503 - Compressed Class Space Non-heap memory: init = 0(0K) used = 2373616(2317K) committed = 2621440(2560K) max = 1073741824(1048576K) INFO [main] 2020-08-17 17:31:22,739 CassandraDaemon.java:503 - Par Eden Space Heap memory: init = 279183360(272640K) used = 111694624(109076K) committed = 279183360(272640K) max = 279183360(272640K) INFO [main] 2020-08-17 17:31:22,740 CassandraDaemon.java:503 - Par Survivor Space Heap memory: init = 34865152(34048K) used = 0(0K) committed = 34865152(34048K) max = 34865152(34048K) INFO [main] 2020-08-17 17:31:22,743 CassandraDaemon.java:503 - CMS Old Gen Heap memory: init
= 1798569984(1756416K) used = 0(0K) committed = 1798569984(1756416K) max = 1798569984(1756416K) INFO [main] 2020-08-17 17:31:22,744 CassandraDaemon.java:505 - Classpath: D:\apache-cassandra-3.11.7\conf;D:\apache-cassandra-3.11.7\lib\airline-0.6.jar;D:\apache-cassandra-3.11.7\lib\antlr-runtime-3.5.2.jar;D:\apache-cassandra-3.11.7\lib\apache-cassandra-3.11.7.jar;D:\apache-cassandra-3.11.7\lib\apache-cassandra-thrift-3.11.7.jar;D:\apache-cassandra-3.11.7\lib\asm-5.0.4.jar;D:\apache-cassandra-3.11.7\lib\caffeine-2.2.6.jar;D:\apache-cassandra-3.11.7\lib\cassandra-driver-core-3.0.1-shaded.jar;D:\apache-cassandra-3.11.7\lib\commons-cli-1.1.jar;D:\apache-cassandra-3.11.7\lib\commons-codec-1.9.jar;D:\apache-cassandra-3.11.7\lib\commons-lang3-3.1.jar;D:\apache-cassandra-3.11.7\lib\commons-math3-3.2.jar;D:\apache-cassandra-3.11.7\lib\compress-lzf-0.8.4.jar;D:\apache-cassandra-3.11.7\lib\concurrent-trees-2.4.0.jar;D:\apache-cassandra-3.11.7\lib\concurrentlinkedhashmap-lru-1.4.jar;D:\apache-cassandra-3.11.7\lib\disruptor-3.0.1.jar;D:\apache-cassandra-3.11.7\lib\ecj-4.4.2.jar;D:\apache-cassandra-3.11.7\lib\guava-18.0.jar;D:\apache-cassandra-3.11.7\lib\HdrHistogram-2.1.9.jar;D:\apache-cassandra-3.11.7\lib\high-scale-lib-1.0.6.jar;D:\apache-cassandra-3.11.7\lib\hppc-0.5.4.jar;D:\apache-cassandra-3.11.7\lib\jackson-annotations-2.9.10.jar;D:\apache-cassandra-3.11.7\lib\jackson-core-2.9.10.jar;D:\apache-cassandra-3.11.7\lib\jackson-databind-2.9.10.4.jar;D:\apache-cassandra-3.11.7\lib\jamm-0.3.0.jar;D:\apache-cassandra-3.11.7\lib\javax.inject.jar;D:\apache-cassandra-3.11.7\lib\jbcrypt-0.3m.jar;D:\apache-cassandra-3.11.7\lib\jcl-over-slf4j-1.7.7.jar;D:\apache-cassandra-3.11.7\lib\jctools-core-1.2.1.jar;D:\apache-cassandra-3.11.7\lib\jflex-1.6.0.jar;D:\apache-cassandra-3.11.7\lib\jna-4.2.2.jar;D:\apache-cassandra-3.11.7\lib\joda-time-2.4.jar;D:\apache-cassandra-3.11.7\lib\json-simple-1.1.jar;D:\apache-cassandra-3.11.7\lib\jstackjunit-0.0.1.jar;D:\apache-cassandra-3.11.7\lib\libthrift-0.9.2.jar;D:\apache-cassandra-3.11.7\lib\log4j-over-slf4j-1.7.7.jar;D:\apache-cassandra-3.11.7\lib\logback-classic-1.1.3.jar;D:\apache-cassandra-3.11.7\lib\logback-core-1.1.3.jar;D:\apache-cassandra-3.11.7\lib\lz4-1.3.0.jar;D:\apache-cassandra-3.11.7\lib\metrics-core-3.1.5.jar;D:\apache-cassandra-3.11.7\lib\metrics-jvm-3.1.5.jar;D:\apache-cassandra-3.11.7\lib\metrics-logback-3.1.5.jar;D:\apache-cassandra-3.11.7\lib\netty-all-4.0.44.Final.jar;D:\apache-cassandra-3.11.7\lib\ohc-core-0.4.4.jar;D:\apache-cassandra-3.11.7\lib\ohc-core-j8-0.4.4.jar;D:\apache-cassandra-3.11.7\lib\reporter-config-base-3.0.3.jar;D:\apache-cassandra-3.11.7\lib\reporter-config3-3.0.3.jar;D:\apache-cassandra-3.11.7\lib\sigar-1.6.4.jar;D:\apache-cassandra-3.11.7\lib\slf4j-api-1.7.7.jar;D:\apache-cassandra-3.11.7\lib\snakeyaml-1.11.jar;D:\apache-cassandra-3.11.7\lib\snappy-java-1.1.1.7.jar;D:\apache-cassandra-3.11.7\lib\snowball-stemmer-1.3.0.581.1.jar;D:\apache-cassandra-3.11.7\lib\ST4-4.0.8.jar;D:\apache-cassandra-3.11.7\lib\stream-2.5.2.jar;D:\apache-cassandra-3.11.7\lib\thrift-server-0.3.7.jar;D:\apache-cassandra-3.11.7\build\classes\main;D:\apache-cassandra-3.11.7\build\classes\thrift;D:\apache-cassandra-3.11.7\lib\jamm-0.3.0.jar INFO [main] 2020-08-17 17:31:22,747 CassandraDaemon.java:507 - JVM Arguments: [-ea,
-javaagent:D:\apache-cassandra-3.11.7\lib\jamm-0.3.0.jar, -Xms2G, -Xmx2G, -XX:+HeapDumpOnOutOfMemoryError, -XX:+UseParNewGC, -XX:+UseConcMarkSweepGC, -XX:+CMSParallelRemarkEnabled, -XX:SurvivorRatio=8, -XX:MaxTenuringThreshold=1, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -Dlogback.configurationFile=logback.xml, -Djava.library.path=D:\apache-cassandra-3.11.7\lib\sigar-bin, -Dcassandra.jmx.local.port=7199, -Dcassandra, -Dcassandra-foreground=yes, -Dcassandra.logdir=D:\apache-cassandra-3.11.7\logs, -Dcassandra.storagedir=D:\apache-cassandra-3.11.7\data] WARN [main] 2020-08-17 17:31:22,763 StartupChecks.java:169 - JMX is not enabled to receive remote connections. Please see cassandra-env.sh for more info. WARN [main] 2020-08-17 17:31:22,768 StartupChecks.java:220 - The JVM is not configured to stop on OutOfMemoryError which can cause data corruption. Use one of the following JVM options to configure the behavior on OutOfMemoryError: -XX:+ExitOnOutOfMemoryError,
-XX:+CrashOnOutOfMemoryError, or -XX:OnOutOfMemoryError="<cmd args>;<cmd args>" INFO [main] 2020-08-17 17:31:22,774 SigarLibrary.java:44 - Initializing SIGAR library
Can you please help me resolve the port issue that is stopping me from running two instances?
In addition, I'd appreciate suggestions for other ways to get this done, e.g. options 1 or 2 mentioned at the beginning of my post, or GCP or AWS with a free account.
Option 2, using Docker, is very easy. This is described in detail here: cassandra - Docker Hub
First of all, you must increase the resources available to Docker, because the default configuration might be too small for running more than 1-2 nodes. I am using 8 GB RAM and 3 CPUs on my laptop and am able to start 3 Cassandra nodes simultaneously.
The first step is to create a bridge network which will be used by Cassandra's cluster:
docker network create mynet
Then start the first node:
docker run --name cas1 --network mynet -d cassandra
Then wait about a minute for the node to start, and then start the second node:
docker run --name cas2 --network mynet -d -e CASSANDRA_SEEDS=cas1 cassandra
And again, start the third node:
docker run --name cas3 --network mynet -d -e CASSANDRA_SEEDS=cas1 cassandra
Finally, start another Docker container in interactive mode, run cqlsh in it, and connect to your cluster (to the first node, named cas1):
docker run -it --network mynet --rm cassandra cqlsh cas1
Connected to Test Cluster at cas1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.7 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh>
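To confirm that all three nodes have joined the ring, you can also run nodetool inside one of the containers, for example:
docker exec -it cas1 nodetool status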
There is a 3-node Cassandra cluster running and serving production traffic. In cassandra.yaml, "endpoint_snitch: GossipingPropertyFileSnitch" is configured, but somehow we forgot to remove the file cassandra-topology.properties from the Cassandra conf directory. As per the Cassandra documentation, if you are using GossipingPropertyFileSnitch you should remove the cassandra-topology.properties file.
Since all three nodes are running and serving production traffic, can I remove this file on all three nodes as they are, or do I have to shut the nodes down one by one and remove it?
The Apache Cassandra version is "3.11.2".
./bin/nodetool status
Datacenter: dc1
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN x.x.x.x1 409.39 GiB 256 62.9% cshdkd-6065-4813-ae53-sdh89hs98so RAC1
UN x.x.x.x2 546.33 GiB 256 67.8% jfdsdk-f18f-4d46-af95-33jw9yhfcsd RAC2
UN x.x.x.x3 594.73 GiB 256 69.3% 7s9skk-a27f-4875-a410-sdsiudw9eww RAC3
If the cluster is already migrated to GossipingPropertyFileSnitch, then you can safely remove that file without stopping the cluster nodes. See item 7 in the DSE 5.1 documentation (compatible with Cassandra 3.11).
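As a sketch of how that might look in practice (host names and the conf path are placeholders, adjust to your installation), remove the file on each node in turn and then confirm the reported topology has not changed:
ssh x.x.x.x1 'sudo rm /etc/cassandra/conf/cassandra-topology.properties'
ssh x.x.x.x2 'sudo rm /etc/cassandra/conf/cassandra-topology.properties'
ssh x.x.x.x3 'sudo rm /etc/cassandra/conf/cassandra-topology.properties'
# rack/DC assignments should be unchanged afterwards
./bin/nodetool status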
dsetool status
DC: dc1 Workload: Cassandra Graph: no
======================================================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns VNodes Rack Health [0,1]
UN 192.168.1.130 810.47 MiB ? 256 2a 0.90
UN 192.168.1.131 683.53 MiB ? 256 2a 0.90
UN 192.168.1.132 821.33 MiB ? 256 2a 0.90
DC: dc2 Workload: Analytics Graph: no Analytics Master: 192.168.2.131
=========================================================================================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns VNodes Rack Health [0,1]
UN 192.168.2.130 667.05 MiB ? 256 2a 0.90
UN 192.168.2.131 845.48 MiB ? 256 2a 0.90
UN 192.168.2.132 887.92 MiB ? 256 2a 0.90
When I try to launch the spark-submit job
dse -u user -p password spark-submit --class com.sparkLauncher test.jar prf
I am getting the following error (edited):
ERROR 2017-09-14 20:14:14,174 org.apache.spark.deploy.rm.DseAppClient$ClientEndpoint: Failed to connect to DSE resource manager
java.io.IOException: Failed to register with master: dse://?
....
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: The method DseResourceManager.registerApplication does not exist. Make sure that the required component for that method is active/enabled
....
ERROR 2017-09-14 20:14:14,179 org.apache.spark.deploy.rm.DseSchedulerBackend: Application has been killed. Reason: Failed to connect to DSE resource manager: Failed to register with master: dse://?
org.apache.spark.SparkException: Exiting due to error from cluster scheduler: Failed to connect to DSE resource manager: Failed to register with master: dse://?
....
WARN 2017-09-14 20:14:14,179 org.apache.spark.deploy.rm.DseSchedulerBackend: Application ID is not initialized yet.
ERROR 2017-09-14 20:14:14,384 org.apache.spark.SparkContext: Error initializing SparkContext.
java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers on a running MetricsSystem
ERROR 2017-09-14 20:14:14,387 org.apache.spark.deploy.DseSparkSubmitBootstrapper: Failed to start or submit Spark application
java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers on a running MetricsSystem
I can confirm that I have granted privileges as mentioned in this documentation, https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/security/secAuthSpark.html
I am trying this on AWS if that makes a difference and I can confirm that the routes between the nodes are all open.
I am able to start the Spark shell from any of the Spark nodes, can bring up the Spark UI, and can get the Spark master address from cqlsh commands.
Any pointers will be helpful, thanks in advance!
The master address must point to one or more nodes in a valid Analytics-enabled datacenter.
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException:
The method DseResourceManager.registerApplication does not exist.
Make sure that the required component for that method is active/enabled
indicates that the connected node was not Analytics-enabled.
If you run from a non-Analytics node, you must still point at one of the Analytics nodes in the master URL:
dse://[Spark node address[:port number]]?[parameter name=parameter value;]...
By default the dse://? URL connects to localhost for its initial cluster connection.
See the documentation for more information.
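As a sketch (the Analytics master address is taken from the dsetool output above; adjust to your cluster), the submit command from the question with an explicit master URL would look like:
dse -u user -p password spark-submit --master dse://192.168.2.131 --class com.sparkLauncher test.jar prf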
For some reason I am unable to pinpoint, I can run it as mentioned in cluster mode but not in client mode.
I am trying to start Cassandra, so I ran
sudo ./cassandra
and I came across this error:
Error: Exception thrown by the agent : java.net.MalformedURLException: Local host name unknown: java.net.UnknownHostException: node24.nise.local: node24.nise.local
So I did what was mentioned in the "problem on starting cassandra" link and changed the /etc/hosts file.
Then the starting process got stuck after this:
INFO 22:27:14,227 CFS(Keyspace='system', ColumnFamily='local') liveRatio is 33.904761904761905 (just-counted was 33.904761904761905). calculation took 110ms for 3 cells
INFO 22:27:14,260 Enqueuing flush of Memtable-local#726006040(84/840 serialized/live bytes, 4 ops)
INFO 22:27:14,262 Writing Memtable-local#726006040(84/2848 serialized/live bytes, 4 ops)
INFO 22:27:14,280 Completed flushing /var/lib/cassandra/data/system/local/system-local-jb-50-Data.db (116 bytes) for commitlog position ReplayPosition(segmentId=1401859631027, position=500327)
WARN 22:27:14,327 setting live ratio to maximum of 64.0 instead of Infinity
INFO 22:27:14,327 Enqueuing flush of Memtable-local#1689909512(10100/101000 serialized/live bytes, 259 ops)
INFO 22:27:14,328 CFS(Keyspace='system', ColumnFamily='local') liveRatio is 64.0 (just-counted was 64.0). calculation took 0ms for 0 cells
INFO 22:27:14,350 Writing Memtable-local#1689909512(10100/101000 serialized/live bytes, 259 ops)
INFO 22:27:14,386 Completed flushing /var/lib/cassandra/data/system/local/system-local-jb-51-Data.db (5278 bytes) for commitlog position ReplayPosition(segmentId=1401859631027, position=512328)
INFO 22:27:14,493 Node localhost/127.0.0.1 state jump to normal
No other line was printed after this. Can anyone help me understand why exactly this happened?
I was getting the same error.
You just need to run this command in the command prompt:
hostname localhost (or the hostname of the machine where Cassandra is running)
I believe this will solve your problem.
I think that after this statement
INFO 22:27:14,493 Node localhost/127.0.0.1 state jump to normal
your server is running normally. To verify, run jps and check whether CassandraDaemon is running.
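For example (a sketch; the process id will differ), the jps listing should include a CassandraDaemon entry:
jps
# 12345 CassandraDaemon
# 23456 Jps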