How to check from the logs when Cassandra was shut down or started/restarted? - cassandra

nodetool status shows some nodes as down, and the application is not able to reach the Cassandra nodes. When I connect to the Cassandra nodes and check the logs, there are some errors in them (system.log and debug.log).
I don't understand how to check when Cassandra was started, restarted, or shut down. Is there any way to check this from the logs? If so, which log, and how?

In the logs folder of your Cassandra installation you should find the system.log file. The default location for the log files is /var/log/cassandra.
When Cassandra starts the output looks like this:
INFO [main] 2018-08-18 19:10:27,162 YamlConfigurationLoader.java:89 - Configuration location: file:/home/20171127/.ccm/3113/node1/conf/cassandra.yaml
INFO [main] 2018-08-18 19:10:27,696 Config.java:495 - Node configuration:[allocate_tokens_for_keyspace=null; authenticator=AllowAllAuthenticator;
INFO [main] 2018-08-18 19:10:27,698 DatabaseDescriptor.java:367 - DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
INFO [main] 2018-08-18 19:10:27,699 DatabaseDescriptor.java:425 - Global memtable on-heap threshold is enabled at 123MB
INFO [main] 2018-08-18 19:10:27,700 DatabaseDescriptor.java:429 - Global memtable off-heap threshold is enabled at 123MB
WARN [main] 2018-08-18 19:10:27,974 DatabaseDescriptor.java:550 - Only 30.922GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
INFO [main] 2018-08-18 19:10:28,025 RateBasedBackPressure.java:123 - Initialized back-pressure with high ratio: 0.9, factor: 5, flow: FAST, window size: 2000.
INFO [main] 2018-08-18 19:10:28,026 DatabaseDescriptor.java:729 - Back-pressure is disabled with strategy org.apache.cassandra.net.RateBasedBackPressure{high_ratio=0.9, factor=5, flow=FAST}.
INFO [main] 2018-08-18 19:10:28,265 JMXServerUtils.java:246 - Configured JMX server at: service:jmx:rmi://127.0.0.1/jndi/rmi://127.0.0.1:7100/jmxrmi
INFO [main] 2018-08-18 19:10:28,282 CassandraDaemon.java:473 - Hostname: 2VVg5
INFO [main] 2018-08-18 19:10:28,284 CassandraDaemon.java:480 - JVM vendor/version: OpenJDK 64-Bit Server VM/1.8.0_181
INFO [main] 2018-08-18 19:10:28,285 CassandraDaemon.java:481 - Heap size: 495.000MiB/495.000MiB
INFO [main] 2018-08-18 19:10:28,287 CassandraDaemon.java:486 - Code Cache Non-heap memory: init = 2555904(2496K) used = 4178816(4080K) committed = 4194304(4096K) max = 251658240(245760K)
INFO [main] 2018-08-18 19:10:28,289 CassandraDaemon.java:486 - Metaspace Non-heap memory: init = 0(0K) used = 18530200(18095K) committed = 19005440(18560K) max = -1(-1K)
INFO [main] 2018-08-18 19:10:28,289 CassandraDaemon.java:486 - Compressed Class Space Non-heap memory: init = 0(0K) used = 2092504(2043K) committed = 2228224(2176K) max = 1073741824(1048576K)
INFO [main] 2018-08-18 19:10:28,290 CassandraDaemon.java:486 - Par Eden Space Heap memory: init = 41943040(40960K) used = 41943040(40960K) committed = 41943040(40960K) max = 41943040(40960K)
INFO [main] 2018-08-18 19:10:28,291 CassandraDaemon.java:486 - Par Survivor Space Heap memory: init = 5242880(5120K) used = 5242880(5120K) committed = 5242880(5120K) max = 5242880(5120K)
INFO [main] 2018-08-18 19:10:28,293 CassandraDaemon.java:486 - CMS Old Gen Heap memory: init = 471859200(460800K) used = 669336(653K) committed = 471859200(460800K) max = 471859200(460800K)
When Cassandra shuts down properly, the log entries look like this:
INFO [StorageServiceShutdownHook] 2018-08-18 19:28:23,661 Gossiper.java:1559 - Announcing shutdown
INFO [StorageServiceShutdownHook] 2018-08-18 19:28:23,662 StorageService.java:2289 - Node /127.0.0.1 state jump to shutdown
INFO [StorageServiceShutdownHook] 2018-08-18 19:28:25,664 MessagingService.java:992 - Waiting for messaging service to quiesce
INFO [ACCEPT-/127.0.0.1] 2018-08-18 19:28:25,665 MessagingService.java:1346 - MessagingService has terminated the accept() thread
INFO [StorageServiceShutdownHook] 2018-08-18 19:28:25,729 HintsService.java:220 - Paused hints dispatch
Depending on how Cassandra stopped (for example, if the process was killed or crashed), there may be no log entries at all showing that it stopped.
Other interesting files that you could look into are debug.log and gc.log.
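The check described above can be sketched in Python. This is only a minimal sketch: it assumes the log lines have the same layout as the samples shown here, and it uses "Configuration location:" and "Announcing shutdown" as the start and shutdown markers because those appear in the sample output. Other Cassandra versions may word these messages differently, and the function name find_lifecycle_events is made up for this example.

```python
import re

# Marker substrings taken from the sample log lines in this answer.
# Assumption: other Cassandra versions may log different messages.
START_MARKER = "Configuration location:"
STOP_MARKER = "Announcing shutdown"

# Matches lines such as:
# INFO [main] 2018-08-18 19:10:27,162 YamlConfigurationLoader.java:89 - ...
LINE_RE = re.compile(
    r"^(\w+)\s+\[([^\]]+)\]\s+"
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+(.*)$"
)


def find_lifecycle_events(lines):
    """Return a list of (kind, timestamp) tuples for start/shutdown lines."""
    events = []
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # stack traces and wrapped lines carry no timestamp
        timestamp, message = match.group(3), match.group(4)
        if START_MARKER in message:
            events.append(("start", timestamp))
        elif STOP_MARKER in message:
            events.append(("shutdown", timestamp))
    return events
```

To use it, open system.log (for example /var/log/cassandra/system.log) and pass the file object to find_lifecycle_events. A "start" entry that is not preceded by a "shutdown" entry suggests the previous stop was not a clean one.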

Related

Setting up a two-instance Cassandra cluster on a single physical node (laptop with Windows 10)

My main objective is to experiment with a Cassandra cluster. I have a Laptop so my options are, as I understand, (1) Use some virtualization software (e.g. Hyper-V) to create multiple VMs and then run each VM with a Cassandra instance, or (2) Use docker to create multiple instances of Cassandra, or (3) Directly run multiple instances.
I thought (3) would be faster and would give me more insight, so I tried it (by following https://stackoverflow.com/a/25348301/1029599). But I'm running into a strange situation: I'm not able to change the JMX port. More details below:
I've created two folders of Cassandra 3.11.7 - one in C drive and other in D drive.
For the C drive folder, I've edited cassandra.yaml to replace 'listen_address: localhost' with 'listen_address: 127.0.0.1' and 'rpc_address: localhost' with 'rpc_address: 127.0.0.1'. In addition, I set the seeds to point to the D drive instance as '- seeds: "127.0.0.2"'. I've NOT edited cassandra-env.sh, so JMX_PORT stays at the default 7199.
For the D drive folder, I've edited cassandra.yaml to replace localhost with '127.0.0.2' and set the seed as '- seeds: "127.0.0.1"'. In addition, I've edited cassandra-env.sh to set JMX_PORT=7200.
Surprisingly, when I start the D drive's Cassandra instance, it always picks 7199 as the JMX port and not 7200.
Log (relevant portion from the start):
D:\apache-cassandra-3.11.7\bin>.\cassandra
WARNING! Powershell script execution unavailable. Please use 'powershell Set-ExecutionPolicy Unrestricted' on this user-account to run cassandra with fully featured functionality on this platform.
Starting with legacy startup options
Starting Cassandra Server
INFO [main] 2020-08-17 17:31:21,632 YamlConfigurationLoader.java:89 - Configuration location: file:/D:/apache-cassandra-3.11.7/conf/cassandra.yaml
INFO [main] 2020-08-17 17:31:22,359 Config.java:534 - Node configuration:[allocate_tokens_for_keyspace=null; authenticator=AllowAllAuthenticator; authorizer=AllowAllAuthorizer; auto_bootstrap=true; auto_snapshot=true; back_pressure_enabled=false; back_pressure_strategy=org.apache.cassandra.net.RateBasedBackPressure{high_ratio=0.9, factor=5, flow=FAST}; batch_size_fail_threshold_in_kb=50; batch_size_warn_threshold_in_kb=5; batchlog_replay_throttle_in_kb=1024; broadcast_address=null; broadcast_rpc_address=null; buffer_pool_use_heap_if_exhausted=true; cas_contention_timeout_in_ms=1000; cdc_enabled=false; cdc_free_space_check_interval_ms=250; cdc_raw_directory=null; cdc_total_space_in_mb=0; check_for_duplicate_rows_during_compaction=true; check_for_duplicate_rows_during_reads=true; client_encryption_options=<REDACTED>; cluster_name=Test Cluster; column_index_cache_size_in_kb=2; column_index_size_in_kb=64; commit_failure_policy=stop; commitlog_compression=null; commitlog_directory=null; commitlog_max_compression_buffers_in_pool=3; commitlog_periodic_queue_size=-1; commitlog_segment_size_in_mb=32; commitlog_sync=periodic; commitlog_sync_batch_window_in_ms=NaN; commitlog_sync_period_in_ms=10000; commitlog_total_space_in_mb=null; compaction_large_partition_warning_threshold_mb=100; compaction_throughput_mb_per_sec=16; concurrent_compactors=null; concurrent_counter_writes=32; concurrent_materialized_view_writes=32; concurrent_reads=32; concurrent_replicates=null; concurrent_writes=32; counter_cache_keys_to_save=2147483647; counter_cache_save_period=7200; counter_cache_size_in_mb=null; counter_write_request_timeout_in_ms=5000; credentials_cache_max_entries=1000; credentials_update_interval_in_ms=-1; credentials_validity_in_ms=2000; cross_node_timeout=false; data_file_directories=[Ljava.lang.String;@235834f2; disk_access_mode=auto; disk_failure_policy=stop; disk_optimization_estimate_percentile=0.95; disk_optimization_page_cross_chance=0.1; disk_optimization_strategy=ssd; dynamic_snitch=true; dynamic_snitch_badness_threshold=0.1; dynamic_snitch_reset_interval_in_ms=600000; dynamic_snitch_update_interval_in_ms=100; enable_materialized_views=true; enable_sasi_indexes=true; enable_scripted_user_defined_functions=false; enable_user_defined_functions=false; enable_user_defined_functions_threads=true; encryption_options=null; endpoint_snitch=SimpleSnitch; file_cache_round_up=null; file_cache_size_in_mb=null; gc_log_threshold_in_ms=200; gc_warn_threshold_in_ms=1000; hinted_handoff_disabled_datacenters=[]; hinted_handoff_enabled=true; hinted_handoff_throttle_in_kb=1024; hints_compression=null; hints_directory=null; hints_flush_period_in_ms=10000; incremental_backups=false; index_interval=null; index_summary_capacity_in_mb=null; index_summary_resize_interval_in_minutes=60; initial_token=null; inter_dc_stream_throughput_outbound_megabits_per_sec=200; inter_dc_tcp_nodelay=false; internode_authenticator=null; internode_compression=dc; internode_recv_buff_size_in_bytes=0; internode_send_buff_size_in_bytes=0; key_cache_keys_to_save=2147483647; key_cache_save_period=14400; key_cache_size_in_mb=null; listen_address=127.0.0.2; listen_interface=null; listen_interface_prefer_ipv6=false; listen_on_broadcast_address=false; max_hint_window_in_ms=10800000; max_hints_delivery_threads=2; max_hints_file_size_in_mb=128; max_mutation_size_in_kb=null; max_streaming_retries=3; max_value_size_in_mb=256; memtable_allocation_type=heap_buffers; memtable_cleanup_threshold=null; memtable_flush_writers=0; memtable_heap_space_in_mb=null; memtable_offheap_space_in_mb=null; min_free_space_per_drive_in_mb=50; native_transport_flush_in_batches_legacy=true; native_transport_max_concurrent_connections=-1; native_transport_max_concurrent_connections_per_ip=-1; native_transport_max_concurrent_requests_in_bytes=-1; native_transport_max_concurrent_requests_in_bytes_per_ip=-1; native_transport_max_frame_size_in_mb=256; native_transport_max_negotiable_protocol_version=-2147483648; native_transport_max_threads=128; native_transport_port=9042; native_transport_port_ssl=null; num_tokens=256; otc_backlog_expiration_interval_ms=200; otc_coalescing_enough_coalesced_messages=8; otc_coalescing_strategy=DISABLED; otc_coalescing_window_us=200; partitioner=org.apache.cassandra.dht.Murmur3Partitioner; permissions_cache_max_entries=1000; permissions_update_interval_in_ms=-1; permissions_validity_in_ms=2000; phi_convict_threshold=8.0; prepared_statements_cache_size_mb=null; range_request_timeout_in_ms=10000; read_request_timeout_in_ms=5000; repair_session_max_tree_depth=18; request_scheduler=org.apache.cassandra.scheduler.NoScheduler; request_scheduler_id=null; request_scheduler_options=null; request_timeout_in_ms=10000; role_manager=CassandraRoleManager; roles_cache_max_entries=1000; roles_update_interval_in_ms=-1; roles_validity_in_ms=2000; row_cache_class_name=org.apache.cassandra.cache.OHCProvider; row_cache_keys_to_save=2147483647; row_cache_save_period=0; row_cache_size_in_mb=0; rpc_address=127.0.0.2; rpc_interface=null; rpc_interface_prefer_ipv6=false; rpc_keepalive=true; rpc_listen_backlog=50; rpc_max_threads=2147483647; rpc_min_threads=16; rpc_port=9160; rpc_recv_buff_size_in_bytes=null; rpc_send_buff_size_in_bytes=null; rpc_server_type=sync; saved_caches_directory=null; seed_provider=org.apache.cassandra.locator.SimpleSeedProvider{seeds=127.0.0.1}; server_encryption_options=<REDACTED>; slow_query_log_timeout_in_ms=500; snapshot_before_compaction=false; snapshot_on_duplicate_row_detection=false; ssl_storage_port=7001; sstable_preemptive_open_interval_in_mb=50; start_native_transport=true; start_rpc=false; storage_port=7000; stream_throughput_outbound_megabits_per_sec=200; streaming_keep_alive_period_in_secs=300; streaming_socket_timeout_in_ms=86400000; thrift_framed_transport_size_in_mb=15; thrift_max_message_length_in_mb=16; thrift_prepared_statements_cache_size_mb=null; tombstone_failure_threshold=100000; tombstone_warn_threshold=1000; tracetype_query_ttl=86400; tracetype_repair_ttl=604800; transparent_data_encryption_options=org.apache.cassandra.config.TransparentDataEncryptionOptions@5656be13; trickle_fsync=false; trickle_fsync_interval_in_kb=10240; truncate_request_timeout_in_ms=60000; unlogged_batch_across_partitions_warn_threshold=10; user_defined_function_fail_timeout=1500; user_defined_function_warn_timeout=500; user_function_timeout_policy=die; windows_timer_interval=1; write_request_timeout_in_ms=2000]
INFO [main] 2020-08-17 17:31:22,361 DatabaseDescriptor.java:381 - DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
INFO [main] 2020-08-17 17:31:22,366 DatabaseDescriptor.java:439 - Global memtable on-heap threshold is enabled at 503MB
INFO [main] 2020-08-17 17:31:22,367 DatabaseDescriptor.java:443 - Global memtable off-heap threshold is enabled at 503MB
INFO [main] 2020-08-17 17:31:22,538 RateBasedBackPressure.java:123 - Initialized back-pressure with high ratio: 0.9, factor: 5, flow: FAST, window size: 2000.
INFO [main] 2020-08-17 17:31:22,538 DatabaseDescriptor.java:773 - Back-pressure is disabled with strategy org.apache.cassandra.net.RateBasedBackPressure{high_ratio=0.9, factor=5, flow=FAST}.
INFO [main] 2020-08-17 17:31:22,686 JMXServerUtils.java:252 - Configured JMX server at: service:jmx:rmi://127.0.0.1/jndi/rmi://127.0.0.1:7199/jmxrmi
INFO [main] 2020-08-17 17:31:22,700 CassandraDaemon.java:490 - Hostname: DESKTOP-NQ7673H
INFO [main] 2020-08-17 17:31:22,703 CassandraDaemon.java:497 - JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.8.0_261
INFO [main] 2020-08-17 17:31:22,709 CassandraDaemon.java:498 - Heap size: 1.968GiB/1.968GiB
INFO [main] 2020-08-17 17:31:22,712 CassandraDaemon.java:503 - Code Cache Non-heap memory: init = 2555904(2496K) used = 5181696(5060K) committed = 5242880(5120K) max = 251658240(245760K)
INFO [main] 2020-08-17 17:31:22,714 CassandraDaemon.java:503 - Metaspace Non-heap memory: init = 0(0K) used = 19412472(18957K) committed = 20054016(19584K) max = -1(-1K)
INFO [main] 2020-08-17 17:31:22,738 CassandraDaemon.java:503 - Compressed Class Space Non-heap memory: init = 0(0K) used = 2373616(2317K) committed = 2621440(2560K) max = 1073741824(1048576K)
INFO [main] 2020-08-17 17:31:22,739 CassandraDaemon.java:503 - Par Eden Space Heap memory: init = 279183360(272640K) used = 111694624(109076K) committed = 279183360(272640K) max = 279183360(272640K)
INFO [main] 2020-08-17 17:31:22,740 CassandraDaemon.java:503 - Par Survivor Space Heap memory: init = 34865152(34048K) used = 0(0K) committed = 34865152(34048K) max = 34865152(34048K)
INFO [main] 2020-08-17 17:31:22,743 CassandraDaemon.java:503 - CMS Old Gen Heap memory: init = 1798569984(1756416K) used = 0(0K) committed = 1798569984(1756416K) max = 1798569984(1756416K)
INFO [main] 2020-08-17 17:31:22,744 CassandraDaemon.java:505 - Classpath: D:\apache-cassandra-3.11.7\conf;D:\apache-cassandra-3.11.7\lib\airline-0.6.jar;D:\apache-cassandra-3.11.7\lib\antlr-runtime-3.5.2.jar;D:\apache-cassandra-3.11.7\lib\apache-cassandra-3.11.7.jar;D:\apache-cassandra-3.11.7\lib\apache-cassandra-thrift-3.11.7.jar;D:\apache-cassandra-3.11.7\lib\asm-5.0.4.jar;D:\apache-cassandra-3.11.7\lib\caffeine-2.2.6.jar;D:\apache-cassandra-3.11.7\lib\cassandra-driver-core-3.0.1-shaded.jar;D:\apache-cassandra-3.11.7\lib\commons-cli-1.1.jar;D:\apache-cassandra-3.11.7\lib\commons-codec-1.9.jar;D:\apache-cassandra-3.11.7\lib\commons-lang3-3.1.jar;D:\apache-cassandra-3.11.7\lib\commons-math3-3.2.jar;D:\apache-cassandra-3.11.7\lib\compress-lzf-0.8.4.jar;D:\apache-cassandra-3.11.7\lib\concurrent-trees-2.4.0.jar;D:\apache-cassandra-3.11.7\lib\concurrentlinkedhashmap-lru-1.4.jar;D:\apache-cassandra-3.11.7\lib\disruptor-3.0.1.jar;D:\apache-cassandra-3.11.7\lib\ecj-4.4.2.jar;D:\apache-cassandra-3.11.7\lib\guava-18.0.jar;D:\apache-cassandra-3.11.7\lib\HdrHistogram-2.1.9.jar;D:\apache-cassandra-3.11.7\lib\high-scale-lib-1.0.6.jar;D:\apache-cassandra-3.11.7\lib\hppc-0.5.4.jar;D:\apache-cassandra-3.11.7\lib\jackson-annotations-2.9.10.jar;D:\apache-cassandra-3.11.7\lib\jackson-core-2.9.10.jar;D:\apache-cassandra-3.11.7\lib\jackson-databind-2.9.10.4.jar;D:\apache-cassandra-3.11.7\lib\jamm-0.3.0.jar;D:\apache-cassandra-3.11.7\lib\javax.inject.jar;D:\apache-cassandra-3.11.7\lib\jbcrypt-0.3m.jar;D:\apache-cassandra-3.11.7\lib\jcl-over-slf4j-1.7.7.jar;D:\apache-cassandra-3.11.7\lib\jctools-core-1.2.1.jar;D:\apache-cassandra-3.11.7\lib\jflex-1.6.0.jar;D:\apache-cassandra-3.11.7\lib\jna-4.2.2.jar;D:\apache-cassandra-3.11.7\lib\joda-time-2.4.jar;D:\apache-cassandra-3.11.7\lib\json-simple-1.1.jar;D:\apache-cassandra-3.11.7\lib\jstackjunit-0.0.1.jar;D:\apache-cassandra-3.11.7\lib\libthrift-0.9.2.jar;D:\apache-cassandra-3.11.7\lib\log4j-over-slf4j-1.7.7.jar;D:\apache-cassandra-3.11.7\lib\logback-classic-1.1.3.jar;D:\apache-cassandra-3.11.7\lib\logback-core-1.1.3.jar;D:\apache-cassandra-3.11.7\lib\lz4-1.3.0.jar;D:\apache-cassandra-3.11.7\lib\metrics-core-3.1.5.jar;D:\apache-cassandra-3.11.7\lib\metrics-jvm-3.1.5.jar;D:\apache-cassandra-3.11.7\lib\metrics-logback-3.1.5.jar;D:\apache-cassandra-3.11.7\lib\netty-all-4.0.44.Final.jar;D:\apache-cassandra-3.11.7\lib\ohc-core-0.4.4.jar;D:\apache-cassandra-3.11.7\lib\ohc-core-j8-0.4.4.jar;D:\apache-cassandra-3.11.7\lib\reporter-config-base-3.0.3.jar;D:\apache-cassandra-3.11.7\lib\reporter-config3-3.0.3.jar;D:\apache-cassandra-3.11.7\lib\sigar-1.6.4.jar;D:\apache-cassandra-3.11.7\lib\slf4j-api-1.7.7.jar;D:\apache-cassandra-3.11.7\lib\snakeyaml-1.11.jar;D:\apache-cassandra-3.11.7\lib\snappy-java-1.1.1.7.jar;D:\apache-cassandra-3.11.7\lib\snowball-stemmer-1.3.0.581.1.jar;D:\apache-cassandra-3.11.7\lib\ST4-4.0.8.jar;D:\apache-cassandra-3.11.7\lib\stream-2.5.2.jar;D:\apache-cassandra-3.11.7\lib\thrift-server-0.3.7.jar;D:\apache-cassandra-3.11.7\build\classes\main;D:\apache-cassandra-3.11.7\build\classes\thrift;D:\apache-cassandra-3.11.7\lib\jamm-0.3.0.jar
INFO [main] 2020-08-17 17:31:22,747 CassandraDaemon.java:507 - JVM Arguments: [-ea, -javaagent:D:\apache-cassandra-3.11.7\lib\jamm-0.3.0.jar, -Xms2G, -Xmx2G, -XX:+HeapDumpOnOutOfMemoryError, -XX:+UseParNewGC, -XX:+UseConcMarkSweepGC, -XX:+CMSParallelRemarkEnabled, -XX:SurvivorRatio=8, -XX:MaxTenuringThreshold=1, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -Dlogback.configurationFile=logback.xml, -Djava.library.path=D:\apache-cassandra-3.11.7\lib\sigar-bin, -Dcassandra.jmx.local.port=7199, -Dcassandra, -Dcassandra-foreground=yes, -Dcassandra.logdir=D:\apache-cassandra-3.11.7\logs, -Dcassandra.storagedir=D:\apache-cassandra-3.11.7\data]
WARN [main] 2020-08-17 17:31:22,763 StartupChecks.java:169 - JMX is not enabled to receive remote connections. Please see cassandra-env.sh for more info.
WARN [main] 2020-08-17 17:31:22,768 StartupChecks.java:220 - The JVM is not configured to stop on OutOfMemoryError which can cause data corruption. Use one of the following JVM options to configure the behavior on OutOfMemoryError: -XX:+ExitOnOutOfMemoryError, -XX:+CrashOnOutOfMemoryError, or -XX:OnOutOfMemoryError="<cmd args>;<cmd args>"
INFO [main] 2020-08-17 17:31:22,774 SigarLibrary.java:44 - Initializing SIGAR library
Can you please help me resolve the port issue, which is stopping me from running two instances?
In addition, I'd appreciate it if you could suggest other ways to get this done, e.g. options 1 or 2 that I mentioned at the beginning of my post, or GCP or AWS with a free account.
Option 2, using Docker, is very easy. It is described in detail on the cassandra page on Docker Hub.
First of all, you must increase the resources available to Docker, because the default configuration might be too small for running more than 1-2 nodes. I am using 8 GB of RAM and 3 CPUs on my laptop, and I am able to start 3 Cassandra nodes simultaneously.
The first step is to create a bridge network which will be used by Cassandra's cluster:
docker network create mynet
Then start the first node:
docker run --name cas1 --network mynet -d cassandra
Then wait about one minute until the node starts, and start another node:
docker run --name cas2 --network mynet -d -e CASSANDRA_SEEDS=cas1 cassandra
And again, start a third node:
docker run --name cas3 --network mynet -d -e CASSANDRA_SEEDS=cas1 cassandra
Finally start another docker container instance in interactive mode, run cqlsh in it and connect cqlsh to your cluster (to the first node named cas1):
docker run -it --network mynet --rm cassandra cqlsh cas1
Connected to Test Cluster at cas1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.7 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh>
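The same sequence of commands can be generated for any number of nodes. The following is a minimal Python sketch that only builds the command lines and does not run them; the function name cluster_commands is made up for this example, while the network name mynet, the container names cas1, cas2, ... and the CASSANDRA_SEEDS variable are the ones used in the steps above.

```python
def cluster_commands(node_count, network="mynet", image="cassandra"):
    """Build the docker commands for a small Cassandra cluster.

    Mirrors the manual steps above: one bridge network, a first node
    without seeds, and every further node pointing at cas1 as its seed.
    """
    commands = [["docker", "network", "create", network]]
    for index in range(1, node_count + 1):
        cmd = ["docker", "run", "--name", f"cas{index}",
               "--network", network, "-d"]
        if index > 1:
            # Later nodes join the cluster through the first node.
            cmd += ["-e", "CASSANDRA_SEEDS=cas1"]
        cmd.append(image)
        commands.append(cmd)
    return commands


def print_commands(node_count):
    """Print the commands one per line so they can be pasted into a shell."""
    for cmd in cluster_commands(node_count):
        print(" ".join(cmd))
```

Calling print_commands(3) prints the network command followed by the three docker run commands shown above. If the commands are executed from a script (for example with subprocess.run), remember to wait about a minute between nodes, as described in the steps.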

Optimization of Spark cluster configuration (low-configuration virtual machine cluster)

When I use Spark's standalone mode to process a large number of datasets, the log says:
ERROR TaskSchedulerImpl:70 - Lost executor 1 on : Executor heartbeat timed out after 381181 ms
I searched the Internet, and the advice was to set the parameter with spark-submit:
[hadoop@Master spark2.4.0]$ bin/spark-submit --master spark://master:7077 --conf spark.worker.timeout 10000000 --py-files id.py id.py --name id
Error message in log:
Error: Invalid argument to --conf: spark.worker.timeout
Question:
How do I set the timeout parameter?
Update: thanks to meniluca's answer, I see that I had left out the '=' symbol in the command.
After adjusting the timeout, the log displays:
2019-12-05 19:42:27 WARN Utils:87 - Suppressing exception in finally: broken pipe (Write failed)
java.net.SocketException: broken pipe (Write failed)
2019-12-05 21:13:09 INFO SparkContext:54 - Invoking stop() from shutdown hook
Exception in thread "serve-DataFrame" java.net.SocketException: Connection reset
Suppressed: java.net.SocketException: broken pipe (Write failed)
Then I changed the SSH configuration, adding ServerAliveInterval 60 to ~/.ssh/config:
ServerAliveInterval 60
The error still exists. I then tried increasing the driver memory, but the error is still there, and the log shows that the connection is dropped:
[hadoop@Master spark2.4.0]$ bin/spark-submit --master spark://master:7077 --conf spark.worker.timeout=10000000 --driver-memory 1g --py-files id.py id.py --name id
2019-12-06 10:38:49 INFO ContextCleaner:54 - Cleaned accumulator 374
Exception in thread "serve-DataFrame" java.net.SocketException: broken pipe (Write failed)
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
at org.apache.spark.api.python.PythonRDD$.org$apache$spark$api$python$PythonRDD$$write$1(PythonRDD.scala:212)
at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:224)
at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:224)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:148)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:224)
at org.apache.spark.api.python.PythonRDD$$anonfun$serveIterator$1.apply(PythonRDD.scala:413)
at org.apache.spark.api.python.PythonRDD$$anonfun$serveIterator$1.apply(PythonRDD.scala:412)
at org.apache.spark.api.python.PythonRDD$$anonfun$6$$anonfun$apply$1.apply$mcV$sp(PythonRDD.scala:435)
at org.apache.spark.api.python.PythonRDD$$anonfun$6$$anonfun$apply$1.apply(PythonRDD.scala:435)
at org.apache.spark.api.python.PythonRDD$$anonfun$6$$anonfun$apply$1.apply(PythonRDD.scala:435)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.api.python.PythonRDD$$anonfun$6.apply(PythonRDD.scala:436)
at org.apache.spark.api.python.PythonRDD$$anonfun$6.apply(PythonRDD.scala:432)
at org.apache.spark.api.python.PythonServer$$anon$1.run(PythonRDD.scala:862)
2019-12-06 11:06:12 WARN HeartbeatReceiver:66 - Removing executor 1 with no recent heartbeats: 149103 ms exceeds timeout 120000 ms
2019-12-06 11:06:12 ERROR TaskSchedulerImpl:70 - Lost executor 1 on 219.226.109.129: Executor heartbeat timed out after 149103 ms
2019-12-06 11:06:13 INFO SparkContext:54 - Invoking stop() from shutdown hook
2019-12-06 11:06:13 INFO DAGScheduler:54 - Executor lost: 1 (epoch 6)
2019-12-06 11:06:13 WARN HeartbeatReceiver:66 - Removing executor 0 with no recent heartbeats: 155761 ms exceeds timeout 120000 ms
2019-12-06 11:06:13 ERROR TaskSchedulerImpl:70 - Lost executor 0 on 219.226.109.131: Executor heartbeat timed out after 155761 ms
2019-12-06 11:06:13 INFO StandaloneSchedulerBackend:54 - Requesting to kill executor(s) 1
2019-12-06 11:06:13 INFO BlockManagerMasterEndpoint:54 - Trying to remove executor 1 from BlockManagerMaster.
2019-12-06 11:06:13 INFO BlockManagerMasterEndpoint:54 - Removing block manager BlockManagerId(1, 219.226.109.129, 42501, None)
2019-12-06 11:06:13 INFO BlockManagerMaster:54 - Removed 1 successfully in removeExecutor
2019-12-06 11:06:13 INFO DAGScheduler:54 - Shuffle files lost for executor: 1 (epoch 6)
2019-12-06 11:06:13 INFO StandaloneSchedulerBackend:54 - Actual list of executor(s) to be killed is 1
2019-12-06 11:06:13 INFO DAGScheduler:54 - Host added was in lost list earlier: 219.226.109.129
2019-12-06 11:06:13 INFO DAGScheduler:54 - Executor lost: 0 (epoch 7)
2019-12-06 11:06:13 INFO AbstractConnector:318 - Stopped Spark@490228e{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2019-12-06 11:06:13 INFO BlockManagerMasterEndpoint:54 - Trying to remove executor 0 from BlockManagerMaster.
2019-12-06 11:06:13 INFO BlockManagerMasterEndpoint:54 - Removing block manager BlockManagerId(0, 219.226.109.131, 42164, None)
2019-12-06 11:06:13 INFO BlockManagerMaster:54 - Removed 0 successfully in removeExecutor
2019-12-06 11:06:13 INFO DAGScheduler:54 - Shuffle files lost for executor: 0 (epoch 7)
2019-12-06 11:06:13 INFO DAGScheduler:54 - Host added was in lost list earlier: 219.226.109.131
2019-12-06 11:06:13 INFO SparkUI:54 - Stopped Spark web UI at http://Master:4040
2019-12-06 11:06:13 INFO BlockManagerMasterEndpoint:54 - Registering block manager 219.226.109.129:42501 with 413.9 MB RAM, BlockManagerId(1, 219.226.109.129, 42501, None)
2019-12-06 11:06:13 INFO BlockManagerMasterEndpoint:54 - Registering block manager 219.226.109.131:42164 with 413.9 MB RAM, BlockManagerId(0, 219.226.109.131, 42164, None)
2019-12-06 11:06:14 INFO StandaloneSchedulerBackend:54 - Shutting down all executors
2019-12-06 11:06:14 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:54 - Asking each executor to shut down
2019-12-06 11:06:14 INFO BlockManagerInfo:54 - Added broadcast_15_piece0 in memory on 219.226.109.129:42501 (size: 21.1 KB, free: 413.9 MB)
2019-12-06 11:06:15 INFO MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
2019-12-06 11:06:15 INFO BlockManagerInfo:54 - Added broadcast_15_piece0 in memory on 219.226.109.131:42164 (size: 21.1 KB, free: 413.9 MB)
2019-12-06 11:06:16 INFO MemoryStore:54 - MemoryStore cleared
2019-12-06 11:06:16 INFO BlockManager:54 - BlockManager stopped
2019-12-06 11:06:16 INFO BlockManagerMaster:54 - BlockManagerMaster stopped
2019-12-06 11:06:17 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
2019-12-06 11:06:17 ERROR TransportResponseHandler:144 - Still have 1 requests outstanding when connection from Master/219.226.109.130:7077 is closed
2019-12-06 11:06:17 INFO SparkContext:54 - Successfully stopped SparkContext
2019-12-06 11:06:17 INFO ShutdownHookManager:54 - Shutdown hook called
2019-12-06 11:06:17 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-e2a29bac-7277-4476-ad23-315a27e9ccf0
2019-12-06 11:06:17 INFO ShutdownHookManager:54 - Deleting directory /tmp/localPyFiles-dd95954c-2e77-41ca-969d-a201269f5b5b
2019-12-06 11:06:17 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-bcd56b4a-fb32-4b58-a1d5-71abc5218d32
2019-12-06 11:06:17 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-e2a29bac-7277-4476-ad23-315a27e9ccf0/pyspark-d04b799f-a116-44d5-b6a5-811cc8c03743
Questions:
Is SSH related to the broken pipe?
Is increasing the driver memory helpful for this problem?
I have seen configuration posts on the Internet, but they are for highly specced machines. Since I built the cluster on virtual machines on my own computer, the master has two cores and the slave has one core. How should I adjust the configuration?
Please try with
--conf spark.worker.timeout=10000000
You are missing the equals sign between the configuration name and its value.
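To avoid this kind of mistake, the command line can be assembled programmatically. The following is a minimal Python sketch; the function build_spark_submit is made up for this example and is not part of Spark. It writes every setting as key=value after --conf and places all options before the application file, since spark-submit passes anything that follows the application file to the application itself (so a --name given after id.py would not be read as a spark-submit option).

```python
def build_spark_submit(app, master, conf=None, name=None,
                       py_files=None, app_args=()):
    """Build a spark-submit argument list with options in a safe order."""
    cmd = ["bin/spark-submit", "--master", master]
    if name:
        cmd += ["--name", name]
    for key, value in (conf or {}).items():
        # --conf expects a single "key=value" token, not two tokens.
        cmd += ["--conf", f"{key}={value}"]
    if py_files:
        cmd += ["--py-files", ",".join(py_files)]
    # Everything after the application file goes to the application.
    cmd.append(app)
    cmd.extend(app_args)
    return cmd


command = build_spark_submit(
    "id.py",
    "spark://master:7077",
    conf={"spark.worker.timeout": 10000000},
    name="id",
)
print(" ".join(command))
```

Note that the log in the question still reports "exceeds timeout 120000 ms" after spark.worker.timeout was set, which suggests that a different setting governs the executor heartbeat check. spark.network.timeout (120s by default) is a likely candidate, but treat that as an assumption and verify it against the Spark configuration documentation for your version.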
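java.net.SocketException: broken pipe (Write failed) means that a write was attempted on a connection whose other end had already been closed. This can happen when something is wrong with the port being used.
I suggest changing the port of the master, whose web UI is at port 8080 by default. The port can be changed either in the configuration file or via command-line options when starting the master: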
sbin/start-master.sh
The same can be tried on the worker node as well if the above does not fix the issue.
To see which ports are in use, you can run:
sudo netstat -ltup
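If netstat is not available, a quick check of a single port can also be done from Python. This is a minimal sketch using only the standard library; the function name is_port_in_use is made up for this example, and it only tells you whether something accepts TCP connections on the given host and port, not which process owns it.

```python
import socket


def is_port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0
```

For example, is_port_in_use(8080) checks the port mentioned above on the local machine, and is_port_in_use(7077, host="master") checks the master address used in the question's spark-submit command.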

Cassandra stopped working after nodetool repair

After running the "nodetool repair" command, the Cassandra node went down and did not start again.
INFO [main] 2016-10-19 12:44:50,244 ColumnFamilyStore.java:405 - Initializing system_schema.aggregates
INFO [main] 2016-10-19 12:44:50,247 ColumnFamilyStore.java:405 - Initializing system_schema.indexes
INFO [main] 2016-10-19 12:44:50,248 ViewManager.java:139 - Not submitting build tasks for views in keyspace system_schema as storage service is not initialized
Cassandra version 3.7
Update: I started the node again and it is fine, but it took too long to start (more than 30 minutes).
INFO [main] 2016-10-19 15:32:48,348 ColumnFamilyStore.java:405 - Initializing system_schema.indexes
INFO [main] 2016-10-19 15:32:48,354 ViewManager.java:139 - Not submitting build tasks for views in keyspace system_schema as storage service is not initialized
INFO [main] 2016-10-19 16:07:36,529 ColumnFamilyStore.java:405 - Initializing system_distributed.parent_repair_history
INFO [main] 2016-10-19 16:07:36,546 ColumnFamilyStore.java:405 - Initializing system_distributed.repair_history
Now I'm trying to figure out why it is so slow.
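One way to narrow this down is to look for the largest time gaps between consecutive log entries. In the excerpt above, roughly 35 minutes pass between the ViewManager line at 15:32:48 and the system_distributed.parent_repair_history line at 16:07:36. The following is a minimal Python sketch that finds such gaps; the function name largest_gaps is made up for this example, and it assumes the timestamp format shown in these logs.

```python
import re
from datetime import datetime

# Timestamp format as it appears in the log lines above,
# e.g. "2016-10-19 15:32:48,348".
TS_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def largest_gaps(lines, top=3):
    """Return the `top` largest (gap, line_before, line_after) tuples."""
    stamped = []
    for line in lines:
        match = TS_RE.search(line)
        if match:
            when = datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S,%f")
            stamped.append((when, line.rstrip("\n")))
    gaps = [
        (current[0] - previous[0], previous[1], current[1])
        for previous, current in zip(stamped, stamped[1:])
    ]
    gaps.sort(key=lambda gap: gap[0], reverse=True)
    return gaps[:top]
```

Passing the open system.log file to largest_gaps shows which entries surround the longest pauses. The entry after a long gap indicates what the node had just finished when it resumed logging; what it was doing during the gap usually has to be looked up in debug.log.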

Unable to start Cassandra via Init service. But can start by directly running $ cassandra

I just installed Cassandra on my server
Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.41-2+deb7u2 x86_64 GNU/Linux
by following the instructions on DataStax. It is recommended to start Cassandra using Init.
$ sudo service cassandra start (dies)
However, my process simply dies after a short while. But strangely I am able to start Cassandra by directly running it with no problems.
$ cassandra (no problem!)
I examined Cassandra's logs in /var/log/cassandra, but there are no errors. What am I doing wrong?
INFO [main] 2015-03-04 08:24:25,759 CassandraDaemon.java (line 135) Logging initialized
INFO [main] 2015-03-04 08:24:25,819 YamlConfigurationLoader.java (line 80) Loading settings from file:/etc/cassandra/cassandra.yaml
INFO [main] 2015-03-04 08:24:26,426 DatabaseDescriptor.java (line 143) Data files directories: [/var/lib/cassandra/data]
INFO [main] 2015-03-04 08:24:26,428 DatabaseDescriptor.java (line 144) Commit log directory: /var/lib/cassandra/commitlog
INFO [main] 2015-03-04 08:24:26,428 DatabaseDescriptor.java (line 184) DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
INFO [main] 2015-03-04 08:24:26,428 DatabaseDescriptor.java (line 198) disk_failure_policy is stop
INFO [main] 2015-03-04 08:24:26,428 DatabaseDescriptor.java (line 199) commit_failure_policy is stop
INFO [main] 2015-03-04 08:24:26,438 DatabaseDescriptor.java (line 269) Global memtable threshold is enabled at 29MB
INFO [main] 2015-03-04 08:24:26,652 DatabaseDescriptor.java (line 408) Not using multi-threaded compaction
INFO [main] 2015-03-04 08:24:27,052 YamlConfigurationLoader.java (line 80) Loading settings from file:/etc/cassandra/cassandra.yaml
INFO [main] 2015-03-04 08:24:27,080 YamlConfigurationLoader.java (line 80) Loading settings from file:/etc/cassandra/cassandra.yaml
INFO [main] 2015-03-04 08:24:27,100 CassandraDaemon.java (line 160) JVM vendor/version: OpenJDK 64-Bit Server VM/1.7.0_75
WARN [main] 2015-03-04 08:24:27,100 CassandraDaemon.java (line 165) OpenJDK is not recommended. Please upgrade to the newest Oracle Java release
INFO [main] 2015-03-04 08:24:27,101 CassandraDaemon.java (line 188) Heap size: 123731968/123731968
INFO [main] 2015-03-04 08:24:27,101 CassandraDaemon.java (line 190) Code Cache Non-heap memory: init = 2555904(2496K) used = 651968(636K) committed = 2555904(2496K) max = 50331648(49152K)
INFO [main] 2015-03-04 08:24:27,102 CassandraDaemon.java (line 190) Eden Space Heap memory: init = 83886080(81920K) used = 64290696(62783K) committed = 83886080(81920K) max = 83886080(81920K)
INFO [main] 2015-03-04 08:24:27,107 CassandraDaemon.java (line 190) Survivor Space Heap memory: init = 10485760(10240K) used = 0(0K) committed = 10485760(10240K) max = 10485760(10240K)
INFO [main] 2015-03-04 08:24:27,107 CassandraDaemon.java (line 190) CMS Old Gen Heap memory: init = 29360128(28672K) used = 0(0K) committed = 29360128(28672K) max = 29360128(28672K)
INFO [main] 2015-03-04 08:24:27,108 CassandraDaemon.java (line 190) CMS Perm Gen Non-heap memory: init = 21757952(21248K) used = 15079368(14725K) committed = 21757952(21248K) max = 174063616(169984K)
INFO [main] 2015-03-04 08:24:27,108 CassandraDaemon.java (line 191) Classpath: /etc/cassandra:/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/guava-15.0.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.9.1.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.2.0.jar:/usr/share/cassandra/lib/metrics-core-2.2.0.jar:/usr/share/cassandra/lib/netty-3.6.6.Final.jar:/usr/share/cassandra/lib/reporter-config-2.1.0.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java-1.0.5.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/lib/super-csv-2.1.0.jar:/usr/share/cassandra/lib/thrift-server-0.3.7.jar:/usr/share/cassandra/apache-cassandra-2.0.12.jar:/usr/share/cassandra/apache-cassandra-thrift-2.0.12.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/stress.jar:/usr/share/java/jna.jar::/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar
system.log (END)
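The log above stops right after the Classpath line, and the only WARN is the OpenJDK notice. A quick way to double-check that, and to see when each startup attempt happened, is to scan system.log for the "Logging initialized" line and for WARN/ERROR entries. This is only a minimal sketch: the function names are made up, and it assumes the 2.0-style line layout shown above (level, thread, date, time, source, message).

```shell
#!/bin/sh
# Hypothetical helpers for a Cassandra 2.0-style system.log.
# Assumed line layout: LEVEL [thread] DATE TIME Source.java (line N) message

# Print the date and time of every "Logging initialized" line,
# i.e. one line per startup attempt.
list_startups() {
    grep 'Logging initialized' "$1" | awk '{print $3, $4}'
}

# Print every WARN or ERROR line (the level may be padded with leading spaces).
list_problems() {
    grep -E '^ *(WARN|ERROR) ' "$1"
}

# Usage (default log location taken from the question):
#   list_startups /var/log/cassandra/system.log
#   list_problems /var/log/cassandra/system.log
```

If list_startups shows several attempts but list_problems shows nothing useful, the failure is probably happening outside the JVM (for example in the init script), which is what the answer below goes after.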
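Try changing the first line of the init script from:
#! /bin/sh
to:
#! /bin/sh -x
Then restart the service via the init script (/etc/init.d/cassandra restart) and check the output. The -x flag makes the shell print each command as it runs, so the trace should show where the script fails.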
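If you would rather not edit the init script, the same trace can be obtained by passing -x to the shell when invoking the script. A minimal sketch, assuming the script is a plain /bin/sh script as shown above; the helper name and the trace file path are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: run an init script with shell tracing enabled,
# without touching its first line.
trace_init_script() {
    script="$1"   # e.g. /etc/init.d/cassandra
    action="$2"   # e.g. start
    # sh -x prints each command to stderr (prefixed with "+") before running it;
    # 2>&1 merges that trace into the normal output so it can be piped or saved.
    sh -x "$script" "$action" 2>&1
}

# Usage (run as root on the real system):
#   trace_init_script /etc/init.d/cassandra start | tee /tmp/cassandra-init-trace.log
```

The last few "+" lines before the script exits are usually the ones worth reading.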
I ran into this issue, so I'm writing down the solution even though the question is really old.
I had modified the cassandra.yaml file (updated ports and a few things), and after that the service just wouldn't start. It would always show an active (exited) status, and there wouldn't be anything in any log.
It turns out that the service needs to be forcefully reloaded once, to ensure that all the config files are re-read.
sudo service cassandra force-reload
After this, you should be able to control the service as usual using service or systemctl.
sudo service cassandra status
sudo systemctl status cassandra
sudo systemctl stop cassandra
sudo systemctl start cassandra
sudo systemctl restart cassandra

Cannot start Cassandra with "bin/cassandra -f"

I have a problem using Cassandra: I can start it with "bin/cassandra", but not with "bin/cassandra -f". Does anyone know the reason?
Here is the detailed output:
root@server1:~/cassandra# bin/cassandra -f
INFO 10:51:31,500 JNA not found. Native methods will be disabled.
INFO 10:51:31,740 DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
INFO 10:51:32,043 Deleted /var/lib/cassandra/data/system/LocationInfo-61-Data.db
INFO 10:51:32,044 Deleted /var/lib/cassandra/data/system/LocationInfo-62-Data.db
INFO 10:51:32,052 Deleted /var/lib/cassandra/data/system/LocationInfo-63-Data.db
INFO 10:51:32,053 Deleted /var/lib/cassandra/data/system/LocationInfo-64-Data.db
INFO 10:51:32,063 Sampling index for /var/lib/cassandra/data/system/LocationInfo-65-Data.db
INFO 10:51:32,117 Sampling index for /var/lib/cassandra/data/Keyspace1/Standard2-5-Data.db
INFO 10:51:32,118 Sampling index for /var/lib/cassandra/data/Keyspace1/Standard2-6-Data.db
INFO 10:51:32,120 Sampling index for /var/lib/cassandra/data/Keyspace1/Standard2-7-Data.db
INFO 10:51:32,131 Replaying /var/lib/cassandra/commitlog/CommitLog-1285869561954.log
INFO 10:51:32,143 Finished reading /var/lib/cassandra/commitlog/CommitLog-1285869561954.log
INFO 10:51:32,145 Creating new commitlog segment /var/lib/cassandra/commitlog/CommitLog-1286301092145.log
INFO 10:51:32,153 Standard2 has reached its threshold; switching in a fresh Memtable at CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1286301092145.log', position=121)
INFO 10:51:32,155 Enqueuing flush of Memtable-Standard2@1811560891(29 bytes, 1 operations)
INFO 10:51:32,156 Writing Memtable-Standard2@1811560891(29 bytes, 1 operations)
INFO 10:51:32,200 Completed flushing /var/lib/cassandra/data/Keyspace1/Standard2-8-Data.db
INFO 10:51:32,203 Compacting [org.apache.cassandra.io.SSTableReader(path='/var/lib/cassandra/data/Keyspace1/Standard2-5-Data.db'),org.apache.cassandra.io.SSTableReader(path='/var/lib/cassandra/data/Keyspace1/Standard2-6-Data.db'),org.apache.cassandra.io.SSTableReader(path='/var/lib/cassandra/data/Keyspace1/Standard2-7-Data.db'),org.apache.cassandra.io.SSTableReader(path='/var/lib/cassandra/data/Keyspace1/Standard2-8-Data.db')]
INFO 10:51:32,214 Recovery complete
INFO 10:51:32,214 Log replay complete
INFO 10:51:32,230 Saved Token found: 47408016217042861442279446207060121025
INFO 10:51:32,230 Saved ClusterName found: Test Cluster
INFO 10:51:32,231 Saved partitioner not found. Using org.apache.cassandra.dht.RandomPartitioner
INFO 10:51:32,250 LocationInfo has reached its threshold; switching in a fresh Memtable at CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1286301092145.log', position=345)
INFO 10:51:32,250 Enqueuing flush of Memtable-LocationInfo@1120194637(95 bytes, 2 operations)
INFO 10:51:32,251 Writing Memtable-LocationInfo@1120194637(95 bytes, 2 operations)
INFO 10:51:32,307 Completed flushing /var/lib/cassandra/data/system/LocationInfo-66-Data.db
INFO 10:51:32,316 Starting up server gossip
INFO 10:51:32,329 Compacted to /var/lib/cassandra/data/Keyspace1/Standard2-9-Data.db. 1670/1440 bytes for 6 keys. Time: 125ms.
INFO 10:51:32,366 Binding thrift service to /172.24.0.80:9160
INFO 10:51:32,369 Cassandra starting up...
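I can't see any problem in this output; the node appears to start normally. (-f is short for 'foreground': Cassandra stays attached to the terminal instead of returning to the prompt.)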
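Because -f keeps Cassandra in the foreground, the prompt never comes back even after a successful start, which can look like a hang. One way to confirm that startup got as far as it did above is to capture the console output and look for the last line, "Cassandra starting up...". This is just a sketch: the helper name, the capture file path and the sleep duration are made up, and the message text is taken from the output above, so it may differ in other versions.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a captured console log contains the final
# startup message seen in the output above.
startup_completed() {
    grep -q 'Cassandra starting up\.\.\.' "$1"
}

# Usage sketch (path and wait time are illustrative):
#   bin/cassandra -f > /tmp/cassandra-console.log 2>&1 &
#   sleep 10
#   startup_completed /tmp/cassandra-console.log && echo "startup message seen"
```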

Resources