How to install yugabyte-2.0.10.0 on CentOS7? - yugabytedb

I am trying to install yugabyte-2.0.10.0:
a) Environment:
OS: CentOS 7.6
CPU model: Intel(R) Core(TM) i7 CPU M 620
Kernel: 3.10.0-957.el7.x86_64
GCC: 4.8.5
Python: 2.7.5
b) Commands:
cd ~
rm -rf /opt/yugabyte
mkdir -p /opt/yugabyte
mkdir -p /opt/yugabyte/data
wget https://downloads.yugabyte.com/yugabyte-2.0.10.0-linux.tar.gz
tar -xvzf /root/yugabyte/yugabyte-2.0.10.0-linux.tar.gz -C /opt/yugabyte
/opt/yugabyte/yugabyte-2.0.10.0/bin/post_install.sh
/opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl --data_dir "/opt/yugabyte/data" destroy
/opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl --data_dir "/opt/yugabyte/data" create
Error Logs:
[root#srvr0 ~]# /opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl --data_dir "/opt/yugabyte/data" create
Creating cluster.
Waiting for cluster to be ready.
Traceback (most recent call last):
File "/opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl", line 1969, in <module>
control.run()
File "/opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl", line 1946, in run
self.args.func()
File "/opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl", line 1706, in create_cmd_impl
self.wait_for_cluster_or_raise()
File "/opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl", line 1551, in wait_for_cluster_or_raise
raise RuntimeError("Timed out waiting for a YugaByte DB cluster!")
RuntimeError: Timed out waiting for a YugaByte DB cluster!
Viewing file /tmp/tmptCw8eu:
2020-01-09 21:21:18,413 INFO: Starting master-1 with:
/opt/yugabyte/yugabyte-2.0.10.0/bin/yb-master --fs_data_dirs "/opt/yugabyte/data/node-1/disk-1" --webserver_interface 127.0.0.1 --rpc_bind_addresses 127.0.0.1 --v 0 --version_file_json_path=/opt/yugabyte/yugabyte-2.0.10.0 --webserver_doc_root "/opt/yugabyte/yugabyte-2.0.10.0/www" --replication_factor=1 --yb_num_shards_per_tserver 2 --ysql_num_shards_per_tserver=2 --master_addresses 127.0.0.1:7100 --enable_ysql=true >"/opt/yugabyte/data/node-1/disk-1/master.out" 2>"/opt/yugabyte/data/node-1/disk-1/master.err" &
2020-01-09 21:21:18,475 INFO: Starting tserver-1 with:
/opt/yugabyte/yugabyte-2.0.10.0/bin/yb-tserver --fs_data_dirs "/opt/yugabyte/data/node-1/disk-1" --webserver_interface 127.0.0.1 --rpc_bind_addresses 127.0.0.1 --v 0 --version_file_json_path=/opt/yugabyte/yugabyte-2.0.10.0 --webserver_doc_root "/opt/yugabyte/yugabyte-2.0.10.0/www" --tserver_master_addrs=127.0.0.1:7100 --yb_num_shards_per_tserver=2 --redis_proxy_bind_address=127.0.0.1:6379 --cql_proxy_bind_address=127.0.0.1:9042 --local_ip_for_outbound_sockets=127.0.0.1 --use_cassandra_authentication=false --ysql_num_shards_per_tserver=2 --enable_ysql=true --pgsql_proxy_bind_address=127.0.0.1:5433 >"/opt/yugabyte/data/node-1/disk-1/tserver.out" 2>"/opt/yugabyte/data/node-1/disk-1/tserver.err" &
2020-01-09 21:21:18,483 INFO: Waiting for master and tserver processes to come up.
2020-01-09 21:21:18,627 INFO: Waiting for master leader election and tablet server registration.
2020-01-09 21:22:15,331 INFO: Master leader election still pending...
2020-01-09 21:22:16,333 ERROR: Failed waiting for None tservers, got None
^^^ Encountered errors ^^^
Please help me resolve the above issue!
Update 1:
Info and Error Logs:
[root#srvr0 ~]# cat /opt/yugabyte/data/node-1/disk-1/master.out
[root#srvr0 ~]# cat /opt/yugabyte/data/node-1/disk-1/master.err
[root#srvr0 ~]# cat /opt/yugabyte/data/node-1/disk-1/tserver.out
The files belonging to this database system will be owned by user "root".
This user must also own the server process.
The database cluster will be initialized with locales
COLLATE: C
CTYPE: en_US.UTF-8
MESSAGES: en_US.UTF-8
MONETARY: en_US.UTF-8
NUMERIC: en_US.UTF-8
TIME: en_US.UTF-8
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
creating directory /opt/yugabyte/data/node-1/disk-1/pg_data ... ok
creating subdirectories ... ok
selecting default max_connections ... 300
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
syncing data to disk ... ok
[root#srvr0 ~]# cat /opt/yugabyte/data/node-1/disk-1/tserver.err
In YugaByte DB, setting LC_COLLATE to C and all other locale settings to en_US.UTF-8 by default. Locale support will be enhanced as part of addressing https://github.com/YugaByte/yugabyte-db/issues/1557
2020-01-13 15:07:18.447 UTC [12159] LOG: YugaByte is ENABLED in PostgreSQL. Transactions are enabled.
2020-01-13 15:07:18.488 UTC [12159] LOG: listening on IPv4 address "127.0.0.1", port 5433
2020-01-13 15:07:18.595 UTC [12159] LOG: redirecting log output to logging collector process
2020-01-13 15:07:18.595 UTC [12159] HINT: Future log output will appear in directory "/opt/yugabyte/data/node-1/disk-1/yb-data/tserver/logs".
Update 2:
[root#srvr0 ~]# cat /opt/yugabyte/data/node-1/disk-1/yb-data/master/logs/yb-master.WARNING
Log file created at: 2020/01/14 13:47:23
Running on machine: srvr0
Application fingerprint: version 2.0.10.0 build 4 revision 83610e77c7659c7587bc0c8aea76db47ff8e2df1 build_type RELEASE built at 06 Jan 2020 08:02:49 UTC
Running duration (h:mm:ss): 0:00:00
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
W0114 13:47:23.925465 12631 master_service.cc:108] Could not set master raft config : Illegal state (yb/master/catalog_manager.cc:6130): Node 1d36ad7c7b89457197595fc8f9e57f6f peer not initialized.
W0114 13:47:23.928180 12631 master_service.cc:108] Could not set master raft config : Illegal state (yb/master/catalog_manager.cc:6130): Node 1d36ad7c7b89457197595fc8f9e57f6f peer not initialized.
W0114 13:47:23.929930 12631 master_service.cc:108] Could not set master raft config : Illegal state (yb/master/catalog_manager.cc:6130): Node 1d36ad7c7b89457197595fc8f9e57f6f peer not initialized.
W0114 13:47:23.931773 12631 master_service.cc:108] Could not set master raft config : Illegal state (yb/master/catalog_manager.cc:6130): Node 1d36ad7c7b89457197595fc8f9e57f6f peer not initialized.
W0114 13:47:25.277549 12595 log.cc:702] Time spent Fsync log took a long time: real 0.289s user 0.000s sys 0.000s
W0114 13:47:27.635577 12595 log.cc:702] Time spent Fsync log took a long time: real 0.144s user 0.000s sys 0.000s
W0114 13:47:29.459060 12595 log.cc:702] Time spent Fsync log took a long time: real 0.088s user 0.000s sys 0.000s
...
W0114 13:48:17.587898 12595 log.cc:702] Time spent Fsync log took a long time: real 0.068s user 0.000s sys 0.000s
W0114 13:48:17.652386 12595 log.cc:702] Time spent Fsync log took a long time: real 0.064s user 0.000s sys 0.000s
W0114 13:48:18.864150 12595 log.cc:702] Time spent Fsync log took a long time: real 0.089s user 0.000s sys 0.000s
W0114 13:48:25.154635 12654 permissions_manager.cc:1050] Multiple security configs found when loading sys catalog
W0114 13:48:25.181205 12654 catalog_manager.cc:606] Time spent T 00000000000000000000000000000000 P 1d36ad7c7b89457197595fc8f9e57f6f: Loading metadata into memory: real 60.895s user 0.132s sys 0.026s
[root#srvr0 ~]# cat /opt/yugabyte/data/node-1/disk-1/yb-data/tserver/logs/yb-tserver.WARNING
Log file created at: 2020/01/14 13:47:23
Running on machine: srvr0
Application fingerprint: version 2.0.10.0 build 4 revision 83610e77c7659c7587bc0c8aea76db47ff8e2df1 build_type RELEASE built at 06 Jan 2020 08:02:49 UTC
Running duration (h:mm:ss): 0:00:00
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
W0114 13:47:23.926698 12628 heartbeater.cc:598] P 4bbca70b45944a7e9f66463471e11466: Failed to heartbeat to 127.0.0.1:7100: Service unavailable (yb/tserver/heartbeater.cc:479): master is no longer the leader tries=0, num=1, masters=0x00000000029c8d60 -> [[127.0.0.1:7100]], code=Service unavailable
W0114 13:47:23.928352 12628 heartbeater.cc:598] P 4bbca70b45944a7e9f66463471e11466: Failed to heartbeat to 127.0.0.1:7100: Service unavailable (yb/tserver/heartbeater.cc:479): master is no longer the leader tries=1, num=1, masters=0x00000000029c8d60 -> [[127.0.0.1:7100]], code=Service unavailable
W0114 13:47:23.930130 12628 heartbeater.cc:598] P 4bbca70b45944a7e9f66463471e11466: Failed to heartbeat to 127.0.0.1:7100: Service unavailable (yb/tserver/heartbeater.cc:479): master is no longer the leader tries=2, num=1, masters=0x00000000029c8d60 -> [[127.0.0.1:7100]], code=Service unavailable
W0114 13:47:23.930173 12628 heartbeater.cc:323] P 4bbca70b45944a7e9f66463471e11466: Failed 3 heartbeats in a row: no longer allowing fast heartbeat attempts.
...
W0114 13:48:22.868005 12628 heartbeater.cc:598] P 4bbca70b45944a7e9f66463471e11466: Failed to heartbeat to 127.0.0.1:7100: Service unavailable (yb/tserver/heartbeater.cc:479): master is no longer the leader tries=61, num=1, masters=0x00000000029c8d60 -> [[127.0.0.1:7100]], code=Service unavailable
W0114 13:48:23.869757 12628 heartbeater.cc:598] P 4bbca70b45944a7e9f66463471e11466: Failed to heartbeat to 127.0.0.1:7100: Service unavailable (yb/tserver/heartbeater.cc:479): master is no longer the leader tries=62, num=1, masters=0x00000000029c8d60 -> [[127.0.0.1:7100]], code=Service unavailable
W0114 13:48:24.915241 12628 heartbeater.cc:598] P 4bbca70b45944a7e9f66463471e11466: Failed to heartbeat to 127.0.0.1:7100: Service unavailable (yb/tserver/heartbeater.cc:479): master is no longer the leader tries=63, num=1, masters=0x00000000029c8d60 -> [[127.0.0.1:7100]], code=Service unavailable
Update 3:
Last 20 lines of the master and tserver INFO files:
[root#srvr0 logs]# tail -20 /opt/yugabyte/data/node-1/disk-1/yb-data/master/logs/yb-master.INFO
}
}
table_type: TRANSACTION_STATUS_TABLE_TYPE
namespace {
name: "system"
}
I0121 07:00:23.625553 12478 catalog_manager.cc:1937] Setting default tablets to 2 with 1 primary servers
I0121 07:00:23.625607 12478 partition.cc:388] Creating partitions with num_tablets: 2
I0121 07:00:23.701505 12478 catalog_manager.cc:2155] Successfully created table transactions [id=ebe4eab3526e4030a8ef44796223f904] per request from internal request
I0121 07:00:23.701651 12478 catalog_manager.cc:741] Finished creating transaction status table asynchronously
I0121 07:00:23.701782 12478 catalog_manager.cc:3790] 5536e8fad1d04d52902a0d9488ab5b4e now has full report for 0 tablets.
I0121 07:00:23.701819 12478 catalog_manager.cc:3796] 5536e8fad1d04d52902a0d9488ab5b4e sent full tablet report with 0 tablets.
I0121 07:00:23.901152 12478 catalog_manager.cc:4037] Peer 5536e8fad1d04d52902a0d9488ab5b4e sent incremental report for 1bd70a13590146de9fa3feb16e90b120, prev state op id: -1, prev state term: 0, prev state has_leader_uuid: 0. Consensus state: current_term: 0 config { opid_index: -1 peers { permanent_uuid: "5536e8fad1d04d52902a0d9488ab5b4e" member_type: VOTER last_known_private_addr { host: "127.0.0.1" port: 9100 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } } }
I0121 07:00:24.024679 12478 catalog_manager.cc:4037] Peer 5536e8fad1d04d52902a0d9488ab5b4e sent incremental report for 1bd70a13590146de9fa3feb16e90b120, prev state op id: -1, prev state term: 0, prev state has_leader_uuid: 0. Consensus state: current_term: 1 config { opid_index: -1 peers { permanent_uuid: "5536e8fad1d04d52902a0d9488ab5b4e" member_type: VOTER last_known_private_addr { host: "127.0.0.1" port: 9100 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } } }
I0121 07:00:24.035790 12457 catalog_manager.cc:4037] Peer 5536e8fad1d04d52902a0d9488ab5b4e sent incremental report for ec9bb307331442b3b1fd7ba43a0199a0, prev state op id: -1, prev state term: 0, prev state has_leader_uuid: 0. Consensus state: current_term: 1 config { opid_index: -1 peers { permanent_uuid: "5536e8fad1d04d52902a0d9488ab5b4e" member_type: VOTER last_known_private_addr { host: "127.0.0.1" port: 9100 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } } }
I0121 07:00:24.035905 12457 catalog_manager.cc:4002] Tablet: 1bd70a13590146de9fa3feb16e90b120 reported consensus state change. New consensus state: current_term: 1 leader_uuid: "5536e8fad1d04d52902a0d9488ab5b4e" config { opid_index: -1 peers { permanent_uuid: "5536e8fad1d04d52902a0d9488ab5b4e" member_type: VOTER last_known_private_addr { host: "127.0.0.1" port: 9100 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } } } from 5536e8fad1d04d52902a0d9488ab5b4e
I0121 07:00:24.036085 12457 catalog_entity_info.cc:97] T 1bd70a13590146de9fa3feb16e90b120: Leader changed from <NULL> to 0x00000000038ee010 -> { permanent_uuid: 5536e8fad1d04d52902a0d9488ab5b4e registration: common { private_rpc_addresses { host: "127.0.0.1" port: 9100 } http_addresses { host: "127.0.0.1" port: 9000 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } placement_uuid: "" } capabilities: 2189743739 placement_id: cloud1:datacenter1:rack1 }
I0121 07:00:24.069538 12478 catalog_manager.cc:4002] Tablet: ec9bb307331442b3b1fd7ba43a0199a0 reported consensus state change. New consensus state: current_term: 1 leader_uuid: "5536e8fad1d04d52902a0d9488ab5b4e" config { opid_index: -1 peers { permanent_uuid: "5536e8fad1d04d52902a0d9488ab5b4e" member_type: VOTER last_known_private_addr { host: "127.0.0.1" port: 9100 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } } } from 5536e8fad1d04d52902a0d9488ab5b4e
I0121 07:00:24.069607 12478 catalog_entity_info.cc:97] T ec9bb307331442b3b1fd7ba43a0199a0: Leader changed from <NULL> to 0x00000000038ee010 -> { permanent_uuid: 5536e8fad1d04d52902a0d9488ab5b4e registration: common { private_rpc_addresses { host: "127.0.0.1" port: 9100 } http_addresses { host: "127.0.0.1" port: 9000 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } placement_uuid: "" } capabilities: 2189743739 placement_id: cloud1:datacenter1:rack1 }
I0121 07:00:28.951503 12447 reactor.cc:450] Master_R000: Timing out connection Connection (0x0000000002cc3690) server 127.0.0.1:49899 => 127.0.0.1:7100 - it has been idle for 65.0008s (delta: 65.0008, current time: 751.024, last activity time: 686.023)
[root#srvr0 logs]# tail -20 /opt/yugabyte/data/node-1/disk-1/yb-data/tserver/logs/yb_tserver.INFO
tail: cannot open ‘/opt/yugabyte/data/node-1/disk-1/yb-data/tserver/logs/yb_tserver.INFO’ for reading: No such file or directory
[root#srvr0 logs]# tail -20 /opt/yugabyte/data/node-1/disk-1/yb-data/tserver/logs/yb-tserver.INFO
I0121 07:00:24.024483 13021 consensus_meta.cc:275] T 1bd70a13590146de9fa3feb16e90b120 P 5536e8fad1d04d52902a0d9488ab5b4e: Updating active role from FOLLOWER to LEADER. Consensus state: current_term: 1 leader_uuid: "5536e8fad1d04d52902a0d9488ab5b4e" config { opid_index: -1 peers { permanent_uuid: "5536e8fad1d04d52902a0d9488ab5b4e" member_type: VOTER last_known_private_addr { host: "127.0.0.1" port: 9100 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } } }, has_pending_config = 0
I0121 07:00:24.024521 13021 raft_consensus.cc:2803] T 1bd70a13590146de9fa3feb16e90b120 P 5536e8fad1d04d52902a0d9488ab5b4e [term 1 LEADER]: Calling mark dirty synchronously for reason code NEW_LEADER_ELECTED
I0121 07:00:24.024586 13021 raft_consensus.cc:838] T 1bd70a13590146de9fa3feb16e90b120 P 5536e8fad1d04d52902a0d9488ab5b4e [term 1 LEADER]: Becoming Leader. State: Replica: 5536e8fad1d04d52902a0d9488ab5b4e, State: 1, Role: LEADER, Watermarks: {Received: 0.0 Committed: 0.0} Leader: 0.0
I0121 07:00:24.024760 13021 consensus_queue.cc:207] T 1bd70a13590146de9fa3feb16e90b120 P 5536e8fad1d04d52902a0d9488ab5b4e [LEADER]: Queue going to LEADER mode. State: All replicated op: 0.0, Majority replicated op: 0.0, Committed index: 0.0, Last appended: 0.0, Current term: 1, Majority size: 1, State: QUEUE_OPEN, Mode: LEADER, active raft config: opid_index: -1 peers { permanent_uuid: "5536e8fad1d04d52902a0d9488ab5b4e" member_type: VOTER last_known_private_addr { host: "127.0.0.1" port: 9100 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } }
I0121 07:00:24.024852 13021 raft_consensus.cc:856] Sending NO_OP at op { term: 0 index: 0 }
I0121 07:00:24.026254 13023 replica_state.cc:1268] T 1bd70a13590146de9fa3feb16e90b120 P 5536e8fad1d04d52902a0d9488ab5b4e [term 1 LEADER]: SetLeaderNoOpCommittedUnlocked(1)
I0121 07:00:24.026321 13023 replica_state.cc:725] T 1bd70a13590146de9fa3feb16e90b120 P 5536e8fad1d04d52902a0d9488ab5b4e [term 1 LEADER]: Advanced the committed_op_id across terms. Last committed operation was: { term: 0 index: 0 } New committed index is: { term: 1 index: 1 }
I0121 07:00:24.035311 13018 leader_election.cc:239] T ec9bb307331442b3b1fd7ba43a0199a0 P 5536e8fad1d04d52902a0d9488ab5b4e [CANDIDATE]: Term 1 election: Election decided. Result: candidate won.
I0121 07:00:24.035398 13018 raft_consensus.cc:2867] T ec9bb307331442b3b1fd7ba43a0199a0 P 5536e8fad1d04d52902a0d9488ab5b4e [term 1 FOLLOWER]: Snoozing failure detection for 3.178s
I0121 07:00:24.035445 13018 raft_consensus.cc:2773] T ec9bb307331442b3b1fd7ba43a0199a0 P 5536e8fad1d04d52902a0d9488ab5b4e [term 1 FOLLOWER]: Leader election won for term 1
I0121 07:00:24.035468 13018 replica_state.cc:1268] T ec9bb307331442b3b1fd7ba43a0199a0 P 5536e8fad1d04d52902a0d9488ab5b4e [term 1 FOLLOWER]: SetLeaderNoOpCommittedUnlocked(0)
I0121 07:00:24.035542 13018 consensus_meta.cc:275] T ec9bb307331442b3b1fd7ba43a0199a0 P 5536e8fad1d04d52902a0d9488ab5b4e: Updating active role from FOLLOWER to LEADER. Consensus state: current_term: 1 leader_uuid: "5536e8fad1d04d52902a0d9488ab5b4e" config { opid_index: -1 peers { permanent_uuid: "5536e8fad1d04d52902a0d9488ab5b4e" member_type: VOTER last_known_private_addr { host: "127.0.0.1" port: 9100 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } } }, has_pending_config = 0
I0121 07:00:24.035590 13018 raft_consensus.cc:2803] T ec9bb307331442b3b1fd7ba43a0199a0 P 5536e8fad1d04d52902a0d9488ab5b4e [term 1 LEADER]: Calling mark dirty synchronously for reason code NEW_LEADER_ELECTED
I0121 07:00:24.035641 13018 raft_consensus.cc:838] T ec9bb307331442b3b1fd7ba43a0199a0 P 5536e8fad1d04d52902a0d9488ab5b4e [term 1 LEADER]: Becoming Leader. State: Replica: 5536e8fad1d04d52902a0d9488ab5b4e, State: 1, Role: LEADER, Watermarks: {Received: 0.0 Committed: 0.0} Leader: 0.0
I0121 07:00:24.035706 13018 consensus_queue.cc:207] T ec9bb307331442b3b1fd7ba43a0199a0 P 5536e8fad1d04d52902a0d9488ab5b4e [LEADER]: Queue going to LEADER mode. State: All replicated op: 0.0, Majority replicated op: 0.0, Committed index: 0.0, Last appended: 0.0, Current term: 1, Majority size: 1, State: QUEUE_OPEN, Mode: LEADER, active raft config: opid_index: -1 peers { permanent_uuid: "5536e8fad1d04d52902a0d9488ab5b4e" member_type: VOTER last_known_private_addr { host: "127.0.0.1" port: 9100 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } }
I0121 07:00:24.035748 13018 raft_consensus.cc:856] Sending NO_OP at op { term: 0 index: 0 }
I0121 07:00:24.036341 13021 replica_state.cc:1268] T ec9bb307331442b3b1fd7ba43a0199a0 P 5536e8fad1d04d52902a0d9488ab5b4e [term 1 LEADER]: SetLeaderNoOpCommittedUnlocked(1)
I0121 07:00:24.036391 13021 replica_state.cc:725] T ec9bb307331442b3b1fd7ba43a0199a0 P 5536e8fad1d04d52902a0d9488ab5b4e [term 1 LEADER]: Advanced the committed_op_id across terms. Last committed operation was: { term: 0 index: 0 } New committed index is: { term: 1 index: 1 }
I0121 07:01:28.862228 12462 reactor.cc:450] TabletServer_R000: Timing out connection Connection (0x0000000003fb4490) server 127.0.0.1:49050 => 127.0.0.1:9100 - it has been idle for 65.0008s (delta: 65.0008, current time: 810.935, last activity time: 745.934)
I0121 07:01:28.862249 12463 reactor.cc:450] TabletServer_R001: Timing out connection Connection (0x0000000003fb47f0) server 127.0.0.1:33000 => 127.0.0.1:9100 - it has been idle for 65.0008s (delta: 65.0008, current time: 810.936, last activity time: 745.935)
Update 4:
Install Python 2.7.10 on CentOS 7 (reference: https://myopswork.com/install-python-2-7-10-on-centos-rhel-75f90c5239a5), as follows:
cd /usr/src
wget https://www.python.org/ftp/python/2.7.10/Python-2.7.10.tgz
tar xzf Python-2.7.10.tgz
cd Python-2.7.10
./configure
make altinstall
python2.7
### Make Python 2.7.10 the default
echo "alias python=\"/usr/local/bin/python2.7\"" >> /etc/profile
Execute the following commands to install YugabyteDB 2.0.10.0:
cd ~
rm -rf /opt/yugabyte
mkdir -p /opt/yugabyte
mkdir -p /opt/yugabyte/data
tar -xvzf /tmp/yugabyte/yugabyte-2.0.10.0-linux.tar.gz -C /opt/yugabyte
/opt/yugabyte/yugabyte-2.0.10.0/bin/post_install.sh
/opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl --data_dir "/opt/yugabyte/data" destroy
/opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl --data_dir "/opt/yugabyte/data" create
/opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl --data_dir "/opt/yugabyte/data" setup_redis
/opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl --data_dir "/opt/yugabyte/data" status
Note: The first attempt to create the DB fails; destroy it and create it again.
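A small wrapper for that destroy-and-retry step could look like this (purely illustrative):
YB=/opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl
for attempt in 1 2; do
  "$YB" --data_dir "/opt/yugabyte/data" create && break
  echo "create failed (attempt $attempt), destroying and retrying..."
  "$YB" --data_dir "/opt/yugabyte/data" destroy
done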
Logs:
Python 2.7.10:
[root#srvr0 ~]# python
Python 2.7.10 (default, Jan 27 2020, 17:09:56)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> exit();
Installation:
[root#srvr0 ~]# cd ~
[root#srvr0 ~]# rm -rf /opt/yugabyte
[root#srvr0 ~]# mkdir -p /opt/yugabyte
[root#srvr0 ~]# mkdir -p /opt/yugabyte/data
[root#srvr0 ~]# ###cp /root/yugabyte-2.0.10.0-linux.tar.gz /index
[root#srvr0 ~]# tar -xvzf /index/yugabyte/yugabyte-2.0.10.0-linux.tar.gz -C /opt/yugabyte
yugabyte-2.0.10.0/
yugabyte-2.0.10.0/bin/
yugabyte-2.0.10.0/bin/ysqlsh
yugabyte-2.0.10.0/bin/psql
yugabyte-2.0.10.0/bin/bulk_load_cleanup.sh
yugabyte-2.0.10.0/bin/bulk_load_helper.sh
yugabyte-2.0.10.0/bin/log_cleanup.sh
yugabyte-2.0.10.0/bin/yb-check-failed-tablets.sh
yugabyte-2.0.10.0/bin/yb-check-consistency.py
yugabyte-2.0.10.0/bin/configure
...
yugabyte-2.0.10.0/ui/conf/evolutions/default/1.sql
yugabyte-2.0.10.0/ui/conf/application.conf
yugabyte-2.0.10.0/ui/conf/k8s-expose-all.yml
yugabyte-2.0.10.0/ui/conf/application.default.conf
yugabyte-2.0.10.0/ui/conf/default_cmk_policy.json
yugabyte-2.0.10.0/ui/conf/version.txt
yugabyte-2.0.10.0/ui/README.md
yugabyte-2.0.10.0/version_metadata.json
[root#srvr0 ~]# /opt/yugabyte/yugabyte-2.0.10.0/bin/post_install.sh
+ /opt/yugabyte/yugabyte-2.0.10.0/bin/patchelf --set-interpreter /opt/yugabyte/yugabyte-2.0.10.0/lib/ld.so log-dump
...
+ /opt/yugabyte/yugabyte-2.0.10.0/bin/patchelf --set-interpreter /opt/yugabyte/yugabyte-2.0.10.0/lib/ld.so vacuumlo
+ /opt/yugabyte/yugabyte-2.0.10.0/bin/patchelf --set-rpath /opt/yugabyte/yugabyte-2.0.10.0/lib/yb:/opt/yugabyte/yugabyte-2.0.10.0/lib/yb-thirdparty:/opt/yugabyte/yugabyte-2.0.10.0/linuxbrew/lib vacuumlo
[root#srvr0 ~]# /opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl --data_dir "/opt/yugabyte/data" destroy
Destroying cluster.
[root#srvr0 ~]# /opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl --data_dir "/opt/yugabyte/data" create
Creating cluster.
Waiting for cluster to be ready.
Traceback (most recent call last):
File "/opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl", line 1969, in <module>
control.run()
File "/opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl", line 1946, in run
self.args.func()
File "/opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl", line 1706, in create_cmd_impl
self.wait_for_cluster_or_raise()
File "/opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl", line 1551, in wait_for_cluster_or_raise
raise RuntimeError("Timed out waiting for a YugaByte DB cluster!")
RuntimeError: Timed out waiting for a YugaByte DB cluster!
Viewing file /tmp/tmpJb_KSP:
2020-01-27 19:09:25,732 INFO: Starting master-1 with:
/opt/yugabyte/yugabyte-2.0.10.0/bin/yb-master --fs_data_dirs "/opt/yugabyte/data/node-1/disk-1" --webserver_interface 127.0.0.1 --rpc_bind_addresses 127.0.0.1 --v 0 --version_file_json_path=/opt/yugabyte/yugabyte-2.0.10.0 --webserver_doc_root "/opt/yugabyte/yugabyte-2.0.10.0/www" --replication_factor=1 --yb_num_shards_per_tserver 2 --ysql_num_shards_per_tserver=2 --master_addresses 127.0.0.1:7100 --enable_ysql=true >"/opt/yugabyte/data/node-1/disk-1/master.out" 2>"/opt/yugabyte/data/node-1/disk-1/master.err" &
2020-01-27 19:09:25,792 INFO: Starting tserver-1 with:
/opt/yugabyte/yugabyte-2.0.10.0/bin/yb-tserver --fs_data_dirs "/opt/yugabyte/data/node-1/disk-1" --webserver_interface 127.0.0.1 --rpc_bind_addresses 127.0.0.1 --v 0 --version_file_json_path=/opt/yugabyte/yugabyte-2.0.10.0 --webserver_doc_root "/opt/yugabyte/yugabyte-2.0.10.0/www" --tserver_master_addrs=127.0.0.1:7100 --yb_num_shards_per_tserver=2 --redis_proxy_bind_address=127.0.0.1:6379 --cql_proxy_bind_address=127.0.0.1:9042 --local_ip_for_outbound_sockets=127.0.0.1 --use_cassandra_authentication=false --ysql_num_shards_per_tserver=2 --enable_ysql=true --pgsql_proxy_bind_address=127.0.0.1:5433 >"/opt/yugabyte/data/node-1/disk-1/tserver.out" 2>"/opt/yugabyte/data/node-1/disk-1/tserver.err" &
2020-01-27 19:09:25,800 INFO: Waiting for master and tserver processes to come up.
2020-01-27 19:09:25,934 INFO: Waiting for master leader election and tablet server registration.
2020-01-27 19:10:22,502 INFO: Master leader election still pending...
2020-01-27 19:10:23,504 ERROR: Failed waiting for None tservers, got None
^^^ Encountered errors ^^^
[root#srvr0 ~]# /opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl --data_dir "/opt/yugabyte/data" destroy
Destroying cluster.
[root#srvr0 ~]# /opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl --data_dir "/opt/yugabyte/data" create
Creating cluster.
Waiting for cluster to be ready.
.
----------------------------------------------------------------------------------------------------
| Node Count: 1 | Replication Factor: 1 |
----------------------------------------------------------------------------------------------------
| JDBC : jdbc:postgresql://127.0.0.1:5433/postgres |
| YSQL Shell : /opt/yugabyte/yugabyte-2.0.10.0/bin/ysqlsh |
| YCQL Shell : /opt/yugabyte/yugabyte-2.0.10.0/bin/cqlsh |
| YEDIS Shell : /opt/yugabyte/yugabyte-2.0.10.0/bin/redis-cli |
| Web UI : http://127.0.0.1:7000/ |
| Cluster Data : /opt/yugabyte/data |
----------------------------------------------------------------------------------------------------
For more info, please use: yb-ctl --data_dir /opt/yugabyte/data status
[root#srvr0 ~]# /opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl --data_dir "/opt/yugabyte/data" setup_redis
Setting up YugaByte DB support for Redis API.
Waiting for cluster to be ready.
Setup Redis successful.
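With the cluster up, a quick smoke test of the three APIs could look like this (ports taken from the yb-ctl output above; the client defaults for 2.0.x are an assumption on my side):
/opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl --data_dir "/opt/yugabyte/data" status
/opt/yugabyte/yugabyte-2.0.10.0/bin/ysqlsh -h 127.0.0.1 -p 5433 -c 'SELECT 1;'
/opt/yugabyte/yugabyte-2.0.10.0/bin/cqlsh 127.0.0.1 9042 -e 'DESCRIBE KEYSPACES;'
/opt/yugabyte/yugabyte-2.0.10.0/bin/redis-cli -h 127.0.0.1 -p 6379 PING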

You got it working in this question: "How to run yugabyte-db yugastore application locally?".
Can you check these logs and report them:
/opt/yugabyte/data/node-1/disk-1/master.out, /opt/yugabyte/data/node-1/disk-1/master.err, /opt/yugabyte/data/node-1/disk-1/tserver.out, /opt/yugabyte/data/node-1/disk-1/tserver.err.

We are trying to reproduce this internally and will get back to you. In the meantime, could you please check the tserver.err file and the tserver.INFO logs (see the yb-ctl docs for how to find the tserver logs) to see if anything bad is happening? It feels like the tservers are not up and running.
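A quick way to pull the relevant logs in one go (paths follow the cluster layout shown above):
cat /opt/yugabyte/data/node-1/disk-1/tserver.err
tail -50 /opt/yugabyte/data/node-1/disk-1/yb-data/tserver/logs/yb-tserver.INFO
tail -50 /opt/yugabyte/data/node-1/disk-1/yb-data/master/logs/yb-master.INFO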

Related

How to use the DBus system in a container with docker root-less

I would like to use DBus in a container with docker in root-less mode.
I use Ubuntu 22.10 :
host$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.10
Release: 22.10
Codename: kinetic
and docker root-less :
host$ docker info
Client:
Context: rootless
Debug Mode: false
Plugins:
app: Docker App (Docker Inc., v0.9.1-beta3)
buildx: Docker Buildx (Docker Inc., v0.9.1-docker)
compose: Docker Compose (Docker Inc., v2.12.2)
scan: Docker Scan (Docker Inc., v0.21.0)
Server:
Containers: 1
Running: 1
Paused: 0
Stopped: 0
Images: 3
Server Version: 20.10.21
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: false
userxattr: true
Logging Driver: json-file
Cgroup Driver: systemd
Cgroup Version: 2
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
Default Runtime: runc
Init Binary: docker-init
containerd version: d986545181c905378b0f90faa9c5eae3cbfa3755
runc version: v1.1.4-0-g5fd4c4d
init version: de40ad0
Security Options:
seccomp
Profile: default
rootless
cgroupns
Kernel Version: 5.19.0-26-generic
Operating System: Ubuntu 22.10
OSType: linux
Architecture: x86_64
CPUs: 12
Total Memory: 31.23GiB
Name: ****************
ID: LAEG:NBQE:RME5:OPHR:TT4C:PHA3:25FE:7DPW:46PD:E2VI:6FB6:HQ2P
Docker Root Dir: /home/*******/.local/share/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
I tried to create a container with the dbus socket mounted in it :
docker run -it --rm -v /var/run/dbus:/var/run/dbus ubuntu:latest bash
In my case I need to launch the container with a user other than root, so I created a test user with uid 1000:
root#163974703e4c:/# adduser test
Adding user `test' ...
Adding new group `test' (1000) ...
Adding new user `test' (1000) with group `test' ...
Creating home directory `/home/test' ...
Copying files from `/etc/skel' ...
New password:
Retype new password:
passwd: password updated successfully
Changing the user information for test
Enter the new value, or press ENTER for the default
Full Name []:
Room Number []:
Work Phone []:
Home Phone []:
Other []:
Is the information correct? [Y/n] Y
I switch to this new user :
root#163974703e4c:/# su test
test#163974703e4c:/$ id
uid=1000(test) gid=1000(test) groups=1000(test)
Since this user is not root, it maps to a subordinate uid on my host. My /etc/subuid:
user:100000:65536
Therefore I set an ACL on the D-Bus socket to allow that subordinate uid to use it:
host$ sudo setfacl -R -m u:100999:rwx /run/dbus/system_bus_socket
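The 100999 comes from the rootless uid mapping: container uid 0 maps to my own host uid, and container uid N (for N >= 1) maps to subuid_start + N - 1, so container uid 1000 lands on 100000 + 1000 - 1 = 100999. A small sketch to derive it from /etc/subuid (the first field is the host user name, "user" here):
SUBUID_START=$(awk -F: '$1=="user" {print $2}' /etc/subuid)
CONTAINER_UID=1000
echo $((SUBUID_START + CONTAINER_UID - 1))    # -> 100999
sudo setfacl -m u:$((SUBUID_START + CONTAINER_UID - 1)):rwx /run/dbus/system_bus_socket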
So the D-Bus socket is visible in the container with the following access:
test#163974703e4c:/$ ls -lan /run/dbus/system_bus_socket
srw-rwxrw-+ 1 65534 65534 0 Dec 9 17:46 /run/dbus/system_bus_socket
test#163974703e4c:/$ getfacl /run/dbus/system_bus_socket
getfacl: Removing leading '/' from absolute path names
# file: run/dbus/system_bus_socket
# owner: nobody
# group: nogroup
user::rw-
user:test:rwx
group::rw-
mask::rwx
other::rw-
I test the command dbus-monitor --system but I get this output:
$ dbus-monitor --system
Failed to open connection to system bus: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
Can you help me please?
I tried to launch my container in privileged mode and with --cap-add ALL, but I still get this error message.
I tried to use strace to show all the system calls, but it did not reveal much more information.
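A trace like the one below can be captured along these lines (a sketch, not my exact command):
strace -f -s 256 -o /tmp/dbus-monitor.trace dbus-monitor --system
The relevant part of the output: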
prctl(PR_CAPBSET_READ, CAP_MAC_OVERRIDE) = 0
prctl(PR_CAPBSET_READ, 0x30 /* CAP_??? */) = -1 EINVAL (Invalid argument)
prctl(PR_CAPBSET_READ, CAP_CHECKPOINT_RESTORE) = 0
prctl(PR_CAPBSET_READ, 0x2c /* CAP_??? */) = -1 EINVAL (Invalid argument)
prctl(PR_CAPBSET_READ, 0x2a /* CAP_??? */) = -1 EINVAL (Invalid argument)
prctl(PR_CAPBSET_READ, 0x29 /* CAP_??? */) = -1 EINVAL (Invalid argument)
getresuid([1000], [1000], [1000]) = 0
getresgid([1000], [1000], [1000]) = 0
socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0) = 3
connect(3, {sa_family=AF_UNIX, sun_path="/run/dbus/system_bus_socket"}, 29) = 0
fcntl(3, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
geteuid() = 1000
getsockname(3, {sa_family=AF_UNIX}, [128 => 2]) = 0
poll([{fd=3, events=POLLOUT}], 1, 0) = 1 ([{fd=3, revents=POLLOUT}])
sendto(3, "\0", 1, MSG_NOSIGNAL, NULL, 0) = 1
sendto(3, "AUTH EXTERNAL 31303030\r\n", 24, MSG_NOSIGNAL, NULL, 0) = 24
poll([{fd=3, events=POLLIN}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
read(3, "REJECTED EXTERNAL\r\n", 2048) = 19
close(3) = 0
write(2, "Failed to open connection to sys"..., 252Failed to open connection to system bus: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
) = 252
exit_group(1) = ?
+++ exited with 1 +++
I want to get the same output in my container as on my host:
dbus-monitor --system
dbus-monitor: unable to enable new-style monitoring: org.freedesktop.DBus.Error.AccessDenied: "Rejected send message, 1 matched rules; type="method_call", sender=":1.544" (uid=1000 pid=32723 comm="dbus-monitor --system" label="unconfined") interface="org.freedesktop.DBus.Monitoring" member="BecomeMonitor" error name="(unset)" requested_reply="0" destination="org.freedesktop.DBus" (bus)". Falling back to eavesdropping.
signal time=1670624207.443897 sender=org.freedesktop.DBus -> destination=:1.544 serial=2 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=NameAcquired
string ":1.544"
signal time=1670624214.344658 sender=:1.12 -> destination=(null destination) serial=47 path=/org/freedesktop/UDisks2/drives/ST2000DM008_2FR102_ZFL3HVF7; interface=org.freedesktop.DBus.Properties; member=PropertiesChanged
string "org.freedesktop.UDisks2.Drive.Ata"
array [
dict entry(
string "SmartUpdated"
variant uint64 1670624214
)
]
array [
]
The issue is the EXTERNAL authentication used by libdbus, which leads to a credentials discrepancy when crossing user-namespace boundaries. It is described here: https://bugreports.qt.io/browse/QTBUG-108408.
If you can afford to patch libdbus in your project, or at least in your containers, then you should be good to go with the patch below.
From 0d18f455194924ffb100bc980239082187b48301 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=F0=9F=98=8
Date: Sun, 13 Nov 2022 20:08:02 +0100
Subject: [PATCH] fix: Do not send UID by External Auth
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
sending the UID per EXTERNAL authentication crossing user-namespace would cause
mismatch with out-of-band credentials acquired over UDS
An empty "AUTH EXTERNAL" is still a valid implementation of EXTERNAL authentication
Upstream-ticket: https://gitlab.freedesktop.org/dbus/dbus/-/issues/195
---
dbus/dbus-auth.c | 37 ++++++++++++++-----------------------
1 file changed, 14 insertions(+), 23 deletions(-)
diff --git a/dbus/dbus-auth.c b/dbus/dbus-auth.c
index d4faa737..1d8f3b53 100644
--- a/dbus/dbus-auth.c
+++ b/dbus/dbus-auth.c
@@ -1231,31 +1231,22 @@ static dbus_bool_t
handle_client_initial_response_external_mech (DBusAuth *auth,
DBusString *response)
{
- /* We always append our UID as an initial response, so the server
- * doesn't have to send back an empty challenge to check whether we
- * want to specify an identity. i.e. this avoids a round trip that
- * the spec for the EXTERNAL mechanism otherwise requires.
- */
- DBusString plaintext;
-
- if (!_dbus_string_init (&plaintext))
+ /* We don't send the UID as crossing user-namespace would cause
+ mismatch with out-of-band credentials acquired over UDS
+ it is still a valid implementation of EXTERNAL authentication
+ check related tickets in sd-bus
+ https://github.com/systemd/systemd/commit/1ed4723d38cd0d1423c8fe650f90fa86007ddf55
+ and gdbus
+ https://gitlab.gnome.org/GNOME/glib/-/merge_requests/2832
+
+ Upstream ticket for proper fix: https://gitlab.freedesktop.org/dbus/dbus/-/issues/195
+ */
+ if (!_dbus_string_append (response,
+ "\r\nDATA"))
+ {
return FALSE;
-
- if (!_dbus_append_user_from_current_process (&plaintext))
- goto failed;
-
- if (!_dbus_string_hex_encode (&plaintext, 0,
- response,
- _dbus_string_get_length (response)))
- goto failed;
-
- _dbus_string_free (&plaintext);
-
+ }
return TRUE;
-
- failed:
- _dbus_string_free (&plaintext);
- return FALSE;
}
static dbus_bool_t
--
2.38.1
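If you build libdbus yourself for your containers, applying it could look like this (a sketch, assuming the patch above is saved as dbus-external-auth.patch next to a libdbus source checkout):
git clone https://gitlab.freedesktop.org/dbus/dbus.git
cd dbus
git apply ../dbus-external-auth.patch    # or: patch -p1 < ../dbus-external-auth.patch
Then rebuild and install libdbus into the container image as usual.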

meteor Verifying Deployment - Connection refused

I am trying to deploy a Meteor application, but in the Verifying Deployment section it fails with the following error message:
------------------------------------STDERR------------------------------------
: (7) Failed to connect to 172.17.0.2 port 3000: Connection refused
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
curl: (7) Failed to connect to 172.17.0.2 port 3000: Connection refused
=> Logs:
=> Setting node version
NODE_VERSION=14.17.4
v14.17.4 is already installed.
Now using node v14.17.4 (npm v6.14.14)
default -> 14.17.4 (-> v14.17.4 *)
=> Starting meteor app on port 3000
=> Redeploying previous version of the app
When I run sudo netstat -tulpn | grep LISTEN on the server, it shows this:
tcp 0 0 10.0.3.1:53 0.0.0.0:* LISTEN 609/dnsmasq
tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN 406/systemd-resolve
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 745/sshd: /usr/sbin
tcp6 0 0 :::22 :::* LISTEN 745/sshd: /usr/sbin
When I run sudo docker ps I receive the following:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
e51b1b4bf3a3 mup-appName:latest "/bin/sh -c 'exec $M…" About an hour ago Restarting (1) 49 seconds ago appName
68b723183f3d mongo:3.4.1 "/entrypoint.sh mong…" 9 days ago Restarting (100) 9 seconds ago mongodb
In my firewall I have also opened port 3000.
When I check whether Docker is running, it seems like nothing is actually running!
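Since both containers are stuck in a restart loop, my next step is to look at their logs, e.g. (names taken from the docker ps output above):
sudo docker logs --tail 100 appName
sudo docker logs --tail 100 mongodb
sudo docker inspect --format '{{json .State}}' appName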
Also in my mup.js file I am using http and not https
module.exports = {
servers: {
one: {
host: 'xx.xx.xxx.xxx',
username: 'ubuntu',
pem: '/home/runner/.ssh/id_rsa'
}
},
meteor: {
name: 'appName',
path: '../../',
docker: {
image: 'zodern/meteor:latest',
},
servers: {
one: {}
},
buildOptions: {
serverOnly: true
},
env: {
PORT: 3000,
ROOT_URL: 'http://dev-api.appName.com/',
NODE_ENV: 'production',
MAIL_URL: 'smtp://xxxx:xxx/eLPCB3nw3jubkq:#email-smtp.eu-north-1.amazonaws.com:587',
MONGO_URL: 'mongodb+srv://xxx:xx#xxx.iiitd.mongodb.net/Development?retryWrites=true&w=majority'
},
deployCheckWaitTime: 15
},
proxy: {
domains: 'dev.xxx.com',
ssl: {
letsEncryptEmail: 'info#xxx.com'
}
}
}
Any idea what might cause this issue?
I don't know why, but in the MUP docs the correct image name is zodern/meteor:root
If your app is slow to start, increase the deployCheckWaitTime. In my complex apps I set it to 600, just to ensure the app is up.
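After fixing the image name and the wait time, you can redeploy and repeat the same check mup runs (IP and port taken from the error above; just a sketch):
mup deploy
curl -v http://172.17.0.2:3000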

RabbitMQ cannot start after upgrading Azure Kubernetes Service (AKS)

I had the same problem as @Amir Soleimani, but the error was a bit different, and I tried all the solutions in that post but none of them worked. I'm using Azure Kubernetes Service (AKS), and after upgrading from 1.13.xx to 1.18.xx I can't start RabbitMQ anymore.
UPDATED - Solution that worked for me (please consider this approach as it may affect your existing queues)
Remove the current rabbitmq StatefulSet, including its persistent disks (see the sketch below).
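A minimal sketch of that cleanup (PVC names assumed from the volumeClaimTemplates name "volume" in the manifest below; this deletes the queue data):
kubectl delete statefulset rabbitmq
kubectl delete pvc volume-rabbitmq-0 volume-rabbitmq-1 volume-rabbitmq-2
Then re-apply the manifest.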
========
Here is my StatefulSet file:
apiVersion: v1
kind: Service
metadata:
name: rabbitmq-management
labels:
app: rabbitmq
spec:
ports:
- port: 80
targetPort: 15672
name: http
selector:
app: rabbitmq
type: LoadBalancer
---
apiVersion: v1
kind: Service
metadata:
name: rabbitmq
labels:
app: rabbitmq
spec:
ports:
- port: 5672
name: amqp
- port: 4369
name: epmd
- port: 25672
name: rabbitmq-dist
clusterIP: None
selector:
app: rabbitmq
---
apiVersion: v1
kind: Secret
metadata:
name: rabbitmq-config
namespace: default
type: Opaque
data:
erlang.cookie: samplecookie==
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: rabbitmq
labels:
app: rabbitmq
spec:
serviceName: rabbitmq
selector:
matchLabels:
app: rabbitmq
replicas: 3
template:
metadata:
labels:
app: rabbitmq
spec:
containers:
- name: rabbitmq
image: 'rabbitmq:3.6.6-management-alpine'
lifecycle:
postStart:
exec:
command:
- /bin/sh
- -c
- >
if [ -z "$(grep rabbitmq /etc/resolv.conf)" ]; then
sed "s/^search \([^ ]\+\)/search rabbitmq.\1 \1/" /etc/resolv.conf > /etc/resolv.conf.new;
cat /etc/resolv.conf.new > /etc/resolv.conf;
rm /etc/resolv.conf.new;
fi;
until rabbitmqctl node_health_check; do sleep 1; done;
if [[ "$HOSTNAME" != "rabbitmq-0" && -z "$(rabbitmqctl cluster_status | grep rabbitmq-0)" ]]; then
rabbitmqctl stop_app;
rabbitmqctl join_cluster rabbit#rabbitmq-0;
rabbitmqctl start_app;
fi;
rabbitmqctl set_policy ha-all "." '{"ha-mode":"exactly","ha-params":3,"ha-sync-mode":"automatic"}'
env:
- name: RABBITMQ_ERLANG_COOKIE
valueFrom:
secretKeyRef:
name: rabbitmq-config
key: erlang.cookie
- name: RABBITMQ_DEFAULT_USER
value: username
- name: RABBITMQ_DEFAULT_PASS
value: password
ports:
- containerPort: 5672
name: amqp
- containerPort: 15672
name: amqp-management
volumeMounts:
- mountPath: /var/lib/rabbitmq
name: volume
volumeClaimTemplates:
- metadata:
name: volume
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
Result of kubectl describe pod rabbitmq-0
DIAGNOSTICS
===========
attempted to contact: ['rabbit#rabbitmq-0']
rabbit#rabbitmq-0:
* connected to epmd (port 4369) on rabbitmq-0
* epmd reports: node 'rabbit' not running at all
no other nodes on rabbitmq-0
* suggestion: start the node
current node details:
- node name: 'rabbitmq-cli-91#rabbitmq-0'
- home dir: /var/lib/rabbitmq
- cookie hash: P1XNOe5pN3Ug2FCRFzH7Xg==
Error: unable to connect to node 'rabbit#rabbitmq-0': nodedown
DIAGNOSTICS
===========
attempted to contact: ['rabbit#rabbitmq-0']
rabbit#rabbitmq-0:
* connected to epmd (port 4369) on rabbitmq-0
* epmd reports: node 'rabbit' not running at all
no other nodes on rabbitmq-0
* suggestion: start the node
current node details:
- node name: 'rabbitmq-cli-26#rabbitmq-0'
- home dir: /var/lib/rabbitmq
- cookie hash: P1XNOe5pN3Ug2FCRFzH7Xg==
Error: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}
Error: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}
Error: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}
Error: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}
Error: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}
Error: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}
Error: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}
Error: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}
Error: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}
Error: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}
Error: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}
Error: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}
Error: rabbit application is not running on node rabbit#rabbitmq-0.
* Suggestion: start it with "rabbitmqctl start_app" and try again
, message: "Timeout: 70.0 seconds ...\nChecking health of node 'rabbit#rabbitmq-0' ...\nTimeout: 70.0 seconds ...\nChecking health of node 'rabbit#rabbitmq-0' ...\nTimeout: 70.0 seconds ...\nChecking health of node 'rabbit#rabbitmq-0' ...\nTimeout: 70.0 seconds ...\nChecking health of node 'rabbit#rabbitmq-0' ...\nTimeout: 70.0 seconds ...\nChecking health of node 'rabbit#rabbitmq-0' ...\nTimeout: 70.0 seconds ...\nChecking health of node 'rabbit#rabbitmq-0' ...\nTimeout: 70.0 seconds ...\nChecking health of node 'rabbit#rabbitmq-0' ...\nTimeout: 70.0 seconds ...\nChecking health of node 'rabbit#rabbitmq-0' ...\nTimeout: 70.0 seconds ...\nChecking health of node 'rabbit#rabbitmq-0' ...\nTimeout: 70.0 seconds ...\nChecking health of node 'rabbit#rabbitmq-0' ...\nTimeout: 70.0 seconds ...\nChecking health of node 'rabbit#rabbitmq-0' ...\nTimeout: 70.0 seconds ...\nChecking health of node 'rabbit#rabbitmq-0' ...\nError: unable to connect to node 'rabbit#rabbitmq-0': nodedown\n\nDIAGNOSTICS\n===========\n\nattempted to contact: ['rabbit#rabbitmq-0']\n\nrabbit#rabbitmq-0:\n * connected to epmd (port 4369) on rabbitmq-0\n * epmd reports: node 'rabbit' not running at all\n no other nodes on rabbitmq-0\n * suggestion: start the node\n\ncurrent node details:\n- node name: 'rabbitmq-cli-91#rabbitmq-0'\n- home dir: /var/lib/rabbitmq\n- cookie hash: P1XNOe5pN3Ug2FCRFzH7Xg==\n\nError: unable to connect to node 'rabbit#rabbitmq-0': nodedown\n\nDIAGNOSTICS\n===========\n\nattempted to contact: ['rabbit#rabbitmq-0']\n\nrabbit#rabbitmq-0:\n * connected to epmd (port 4369) on rabbitmq-0\n * epmd reports: node 'rabbit' not running at all\n no other nodes on rabbitmq-0\n * suggestion: start the node\n\ncurrent node details:\n- node name: 'rabbitmq-cli-26#rabbitmq-0'\n- home dir: /var/lib/rabbitmq\n- cookie hash: P1XNOe5pN3Ug2FCRFzH7Xg==\n\nError: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}\nError: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}\nError: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}\nError: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}\nError: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}\nError: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}\nError: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}\nError: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}\nError: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}\nError: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}\nError: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}\nError: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}\nError: rabbit application is not running on node rabbit#rabbitmq-0.\n * Suggestion: start it with \"rabbitmqctl start_app\" and try again\n"
Warning FailedPostStartHook 23m kubelet Exec lifecycle hook ([/bin/sh -c if [ -z "$(grep rabbitmq /etc/resolv.conf)" ]; then
sed "s/^search \([^ ]\+\)/search rabbitmq.\1 \1/" /etc/resolv.conf > /etc/resolv.conf.new;
cat /etc/resolv.conf.new > /etc/resolv.conf;
rm /etc/resolv.conf.new;
fi; until rabbitmqctl node_health_check; do sleep 1; done; if [[ "$HOSTNAME" != "rabbitmq-0" && -z "$(rabbitmqctl cluster_status | grep rabbitmq-0)" ]]; then
rabbitmqctl stop_app;
rabbitmqctl join_cluster rabbit#rabbitmq-0;
rabbitmqctl start_app;
fi; rabbitmqctl set_policy ha-all "." '{"ha-mode":"exactly","ha-params":3,"ha-sync-mode":"automatic"}'
]) for Container "rabbitmq" in Pod "rabbitmq-0_default(3ac91d73-de7b-4cde-81f6-c31bacd10252)" failed - error: command '/bin/sh -c if [ -z "$(grep rabbitmq /etc/resolv.conf)" ]; then
sed "s/^search \([^ ]\+\)/search rabbitmq.\1 \1/" /etc/resolv.conf > /etc/resolv.conf.new;
cat /etc/resolv.conf.new > /etc/resolv.conf;
rm /etc/resolv.conf.new;
fi; until rabbitmqctl node_health_check; do sleep 1; done; if [[ "$HOSTNAME" != "rabbitmq-0" && -z "$(rabbitmqctl cluster_status | grep rabbitmq-0)" ]]; then
rabbitmqctl stop_app;
rabbitmqctl join_cluster rabbit#rabbitmq-0;
rabbitmqctl start_app;
fi; rabbitmqctl set_policy ha-all "." '{"ha-mode":"exactly","ha-params":3,"ha-sync-mode":"automatic"}'
' exited with 137: Error: unable to connect to node 'rabbit#rabbitmq-0': nodedown
Result of kubectl logs rabbitmq-0
=CRASH REPORT==== 18-Jul-2021::11:06:01 ===
crasher:
initial call: application_master:init/4
pid: <0.156.0>
registered_name: []
exception exit: {{timeout_waiting_for_tables,
[rabbit_user,rabbit_user_permission,rabbit_vhost,
rabbit_durable_route,rabbit_durable_exchange,
rabbit_runtime_parameters,rabbit_durable_queue]},
{rabbit,start,[normal,[]]}}
in function application_master:init/4 (application_master.erl, line 134)
ancestors: [<0.155.0>]
messages: [{'EXIT',<0.157.0>,normal}]
links: [<0.155.0>,<0.31.0>]
dictionary: []
trap_exit: true
status: running
heap_size: 987
stack_size: 27
reductions: 98
neighbours:
=INFO REPORT==== 18-Jul-2021::11:06:01 ===
application: rabbit
exited: {{timeout_waiting_for_tables,
[rabbit_user,rabbit_user_permission,rabbit_vhost,
rabbit_durable_route,rabbit_durable_exchange,
rabbit_runtime_parameters,rabbit_durable_queue]},
{rabbit,start,[normal,[]]}}
type: temporary
=INFO REPORT==== 18-Jul-2021::11:06:01 ===
application: amqp_client
exited: stopped
type: temporary
=INFO REPORT==== 18-Jul-2021::11:06:01 ===
application: rabbit_common
exited: stopped
type: temporary
=INFO REPORT==== 18-Jul-2021::11:06:01 ===
application: xmerl
exited: stopped
type: temporary
=INFO REPORT==== 18-Jul-2021::11:06:01 ===
application: os_mon
exited: stopped
type: temporary
=INFO REPORT==== 18-Jul-2021::11:06:01 ===
application: inets
exited: stopped
type: temporary
=INFO REPORT==== 18-Jul-2021::11:06:01 ===
application: asn1
exited: stopped
type: temporary
=INFO REPORT==== 18-Jul-2021::11:06:01 ===
application: syntax_tools
exited: stopped
type: temporary
=INFO REPORT==== 18-Jul-2021::11:06:01 ===
application: mnesia
exited: stopped
type: temporary
=INFO REPORT==== 18-Jul-2021::11:06:01 ===
application: crypto
exited: stopped
type: temporary
=INFO REPORT==== 18-Jul-2021::11:06:01 ===
application: ranch
exited: stopped
type: temporary
=INFO REPORT==== 18-Jul-2021::11:06:01 ===
application: compiler
exited: stopped
type: temporary
BOOT FAILED
===========
Timeout contacting cluster nodes: ['rabbit#rabbitmq-1','rabbit#rabbitmq-2'].
BACKGROUND
==========
This cluster node was shut down while other nodes were still running.
To avoid losing data, you should start the other nodes first, then
start this one. To force this node to start, first invoke
"rabbitmqctl force_boot". If you do so, any changes made on other
cluster nodes after this one was shut down may be lost.
DIAGNOSTICS
===========
attempted to contact: ['rabbit#rabbitmq-1','rabbit#rabbitmq-2']
rabbit#rabbitmq-1:
* unable to connect to epmd (port 4369) on rabbitmq-1: nxdomain (non-existing domain)
rabbit#rabbitmq-2:
* unable to connect to epmd (port 4369) on rabbitmq-2: nxdomain (non-existing domain)
current node details:
- node name: 'rabbit#rabbitmq-0'
- home dir: /var/lib/rabbitmq
- cookie hash: P1XNOe5pN3Ug2FCRFzH7Xg==
=INFO REPORT==== 18-Jul-2021::11:06:01 ===
Timeout contacting cluster nodes: ['rabbit#rabbitmq-1','rabbit#rabbitmq-2'].
BACKGROUND
==========
This cluster node was shut down while other nodes were still running.
To avoid losing data, you should start the other nodes first, then
start this one. To force this node to start, first invoke
"rabbitmqctl force_boot". If you do so, any changes made on other
cluster nodes after this one was shut down may be lost.
DIAGNOSTICS
===========
attempted to contact: ['rabbit#rabbitmq-1','rabbit#rabbitmq-2']
rabbit#rabbitmq-1:
* unable to connect to epmd (port 4369) on rabbitmq-1: nxdomain (non-existing domain)
rabbit#rabbitmq-2:
* unable to connect to epmd (port 4369) on rabbitmq-2: nxdomain (non-existing domain)
current node details:
- node name: 'rabbit#rabbitmq-0'
- home dir: /var/lib/rabbitmq
- cookie hash: P1XNOe5pN3Ug2FCRFzH7Xg==
{"init terminating in do_boot",timeout_waiting_for_tables}
init terminating in do_boot (timeout_waiting_for_tables)
Crash dump is being written to: erl_crash.dump...
What I tried but didn't work:
rabbitmqctl stop_app
rabbitmqctl force_boot
Remove StatefulSet and re-install
Re-configure the yaml file
Please try a force boot in the postStart script:
...
fi;
if [[ "$HOSTNAME" == "rabbitmq-0" ]]; then
rabbitmqctl stop_app;
rabbitmqctl force_boot;
fi;
until rabbitmqctl node_health_check; do sleep 1; done;
...
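To try the same thing once by hand before changing the manifest, kubectl exec can be used (a sketch):
kubectl exec -it rabbitmq-0 -- rabbitmqctl stop_app
kubectl exec -it rabbitmq-0 -- rabbitmqctl force_boot
kubectl exec -it rabbitmq-0 -- rabbitmqctl start_app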

Cannot mount volume to pod in Kubernetes using Azure file provisioner

I have a problem where I cannot mount volumes to pods in Kubernetes using the Azure File CSI driver in the Azure cloud.
The error message I am receiving in the pod is:
Warning FailedMount 38s kubelet Unable to attach or mount volumes: unmounted volumes=[sensu-backend-etcd], unattached volumes=[default-token-42kfh sensu-backend-etcd sensu-asset-server-ca-cert]: timed out waiting for the condition
My storageclass looks like the following:
items:
- allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: |
{"allowVolumeExpansion":true,"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"},"name":"azure-csi-standard-lrs"},"mountOptions":["dir_mode=0640","file_mode=0640","uid=0","gid=0","mfsymlinks","cache=strict","nosharesock"],"parameters":{"location":"eastus","resourceGroup":"kubernetes-resource-group","shareName":"kubernetes","skuName":"Standard_LRS","storageAccount":"kubernetesrf"},"provisioner":"kubernetes.io/azure-file","reclaimPolicy":"Delete","volumeBindingMode":"Immediate"}
storageclass.kubernetes.io/is-default-class: "true"
creationTimestamp: "2020-12-21T19:16:19Z"
managedFields:
- apiVersion: storage.k8s.io/v1
fieldsType: FieldsV1
fieldsV1:
f:allowVolumeExpansion: {}
f:metadata:
f:annotations:
.: {}
f:kubectl.kubernetes.io/last-applied-configuration: {}
f:storageclass.kubernetes.io/is-default-class: {}
f:mountOptions: {}
f:parameters:
.: {}
f:location: {}
f:resourceGroup: {}
f:shareName: {}
f:skuName: {}
f:storageAccount: {}
f:provisioner: {}
f:reclaimPolicy: {}
f:volumeBindingMode: {}
manager: kubectl-client-side-apply
operation: Update
time: "2020-12-21T19:16:19Z"
name: azure-csi-standard-lrs
resourceVersion: "15914"
selfLink: /apis/storage.k8s.io/v1/storageclasses/azure-csi-standard-lrs
uid: 3de65d08-14e7-4d0b-a6fe-39ab9a714191
mountOptions:
- dir_mode=0640
- file_mode=0640
- uid=0
- gid=0
- mfsymlinks
- cache=strict
- nosharesock
parameters:
location: eastus
resourceGroup: kubernetes-resource-group
shareName: kubernetes
skuName: Standard_LRS
storageAccount: kubernetesrf
provisioner: kubernetes.io/azure-file
reclaimPolicy: Delete
volumeBindingMode: Immediate
kind: List
metadata:
resourceVersion: ""
selfLink: ""
My PV and PVC are bound:
sensu-backend-etcd 10Gi RWX Retain Bound sensu-system/sensu-backend-etcd azure-csi-standard-lrs 4m31s
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
sensu-backend-etcd Bound sensu-backend-etcd 10Gi RWX azure-csi-standard-lrs 4m47s
In the kubelet log I get the following:
Dez 21 19:26:37 kubernetes-3 kubelet[34828]: E1221 19:26:37.766476 34828 pod_workers.go:191] Error syncing pod bab5a69a-f8af-43f1-a3ae-642de8daa05d ("sensu-backend-0_sensu-system(bab5a69a-f8af-43f1-a3ae-642de8daa05d)"), skipping: unmounted volumes=[sensu-backend-etcd], unattached volumes=[sensu-backend-etcd sensu-asset-server-ca-cert default-token-42kfh]: timed out waiting for the condition
Dez 21 19:26:58 kubernetes-3 kubelet[34828]: I1221 19:26:58.002474 34828 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "sensu-backend-etcd" (UniqueName: "kubernetes.io/csi/file.csi.azure.com^sensu-backend-etcd") pod "sensu-backend-0" (UID: "bab5a69a-f8af-43f1-a3ae-642de8daa05d")
Dez 21 19:26:58 kubernetes-3 kubelet[34828]: E1221 19:26:58.006699 34828 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/file.csi.azure.com^sensu-backend-etcd podName: nodeName:}" failed. No retries permitted until 2020-12-21 19:29:00.006639988 +0000 UTC m=+3608.682310977 (durationBeforeRetry 2m2s). Error: "Volume not attached according to node status for volume \"sensu-backend-etcd\" (UniqueName: \"kubernetes.io/csi/file.csi.azure.com^sensu-backend-etcd\") pod \"sensu-backend-0\" (UID: \"bab5a69a-f8af-43f1-a3ae-642de8daa05d\") "
Dez 21 19:28:51 kubernetes-3 kubelet[34828]: E1221 19:28:51.768309 34828 kubelet.go:1594] Unable to attach or mount volumes for pod "sensu-backend-0_sensu-system(bab5a69a-f8af-43f1-a3ae-642de8daa05d)": unmounted volumes=[sensu-backend-etcd], unattached volumes=[sensu-backend-etcd sensu-asset-server-ca-cert default-token-42kfh]: timed out waiting for the condition; skipping pod
Dez 21 19:28:51 kubernetes-3 kubelet[34828]: E1221 19:28:51.768335 34828 pod_workers.go:191] Error syncing pod bab5a69a-f8af-43f1-a3ae-642de8daa05d ("sensu-backend-0_sensu-system(bab5a69a-f8af-43f1-a3ae-642de8daa05d)"), skipping: unmounted volumes=[sensu-backend-etcd], unattached volumes=[sensu-backend-etcd sensu-asset-server-ca-cert default-token-42kfh]: timed out waiting for the condition
Dez 21 19:29:00 kubernetes-3 kubelet[34828]: I1221 19:29:00.103881 34828 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "sensu-backend-etcd" (UniqueName: "kubernetes.io/csi/file.csi.azure.com^sensu-backend-etcd") pod "sensu-backend-0" (UID: "bab5a69a-f8af-43f1-a3ae-642de8daa05d")
Dez 21 19:29:00 kubernetes-3 kubelet[34828]: E1221 19:29:00.108069 34828 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/file.csi.azure.com^sensu-backend-etcd podName: nodeName:}" failed. No retries permitted until 2020-12-21 19:31:02.108044076 +0000 UTC m=+3730.783715065 (durationBeforeRetry 2m2s). Error: "Volume not attached according to node status for volume \"sensu-backend-etcd\" (UniqueName: \"kubernetes.io/csi/file.csi.azure.com^sensu-backend-etcd\") pod \"sensu-backend-0\" (UID: \"bab5a69a-f8af-43f1-a3ae-642de8daa05d\") "
Dez 21 19:31:02 kubernetes-3 kubelet[34828]: I1221 19:31:02.169246 34828 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "sensu-backend-etcd" (UniqueName: "kubernetes.io/csi/file.csi.azure.com^sensu-backend-etcd") pod "sensu-backend-0" (UID: "bab5a69a-f8af-43f1-a3ae-642de8daa05d")
Dez 21 19:31:02 kubernetes-3 kubelet[34828]: E1221 19:31:02.172474 34828 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/file.csi.azure.com^sensu-backend-etcd podName: nodeName:}" failed. No retries permitted until 2020-12-21 19:33:04.172432877 +0000 UTC m=+3852.848103766 (durationBeforeRetry 2m2s). Error: "Volume not attached according to node status for volume \"sensu-backend-etcd\" (UniqueName: \"kubernetes.io/csi/file.csi.azure.com^sensu-backend-etcd\") pod \"sensu-backend-0\" (UID: \"bab5a69a-f8af-43f1-a3ae-642de8daa05d\") "
Dez 21 19:31:09 kubernetes-3 kubelet[34828]: E1221 19:31:09.766084 34828 kubelet.go:1594] Unable to attach or mount volumes for pod "sensu-backend-0_sensu-system(bab5a69a-f8af-43f1-a3ae-642de8daa05d)": unmounted volumes=[sensu-backend-etcd], unattached volumes=[default-token-42kfh sensu-backend-etcd sensu-asset-server-ca-cert]: timed out waiting for the condition; skipping pod
In the kube-controller-manager pod I get:
E1221 20:21:34.069309 1 csi_attacher.go:500] kubernetes.io/csi: attachdetacher.WaitForDetach timeout after 2m0s [volume=sensu-backend-etcd; attachment.ID=csi-9a83de4bef35f5d01e10e3a7d598204c459cac705371256e818e3a35b4b29e4e]
E1221 20:21:34.069453 1 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/file.csi.azure.com^sensu-backend-etcd podName: nodeName:kubernetes-3}" failed. No retries permitted until 2020-12-21 20:21:34.569430175 +0000 UTC m=+6862.322990347 (durationBeforeRetry 500ms). Error: "AttachVolume.Attach failed for volume \"sensu-backend-etcd\" (UniqueName: \"kubernetes.io/csi/file.csi.azure.com^sensu-backend-etcd\") from node \"kubernetes-3\" : attachdetachment timeout for volume sensu-backend-etcd"
I1221 20:21:34.069757 1 event.go:291] "Event occurred" object="sensu-system/sensu-backend-0" kind="Pod" apiVersion="v1" type="Warning" reason="FailedAttachVolume" message="AttachVolume.Attach failed for volume \"sensu-backend-etcd\" : attachdetachment timeout for volume sensu-backend-etcd"
Does anyone know this error and how to mitigate it?
Thanks in advance.
Best regards,
rforberger
I fixed it.
I switched to the disk.csi.azure.com provisioner, and in the PV I had to set the volume handle to the full Azure resource ID of the disk, like
volumeHandle: /subscriptions/XXXXXXXXXXXXXXXXXXXXXX/resourcegroups/kubernetes-resource-group/providers/Microsoft.Compute/disks/sensu-backend-etcd
I also had to drop some mount options from the PV that do not work with the Azure disk provisioner.
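For reference, a minimal sketch of such a statically provisioned PV might look roughly like the following. The name, size, storage class, and volumeHandle are taken from the output above; the access mode and fsType are assumptions, and note that Azure managed disks only allow single-node access, so the RWX mode used with azure-file will not work here:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: sensu-backend-etcd
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce            # assumption: managed disks cannot be mounted RWX
  persistentVolumeReclaimPolicy: Retain
  storageClassName: azure-csi-standard-lrs
  csi:
    driver: disk.csi.azure.com
    # volumeHandle must be the full resource ID of the managed disk
    volumeHandle: /subscriptions/XXXXXXXXXXXXXXXXXXXXXX/resourcegroups/kubernetes-resource-group/providers/Microsoft.Compute/disks/sensu-backend-etcd
    fsType: ext4               # assumption: adjust to your filesystem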

Error while sending query request from client : No peer available to query

I am getting the following error while sending a query request from my client.
FabricError: No peers available to query. Errors: ["Failed to connect before the deadline
URL:grpcs://localhost:12051","Failed to connect before the deadline
URL:grpcs://localhost:11051"].
The following is the relevant part of my connection-org3.json connection profile:
"organizations": {
"Org3": {
"mspid": "Org3MSP",
"peers": [
"peer0.org3.bc4scm.de",
"peer1.org3.bc4scm.de"
],
"certificateAuthorities": [
"ca.org3.bc4scm.de"
]
}
},
"peers": {
"peer0.org3.bc4scm.de": {
"url": "grpcs://localhost:11051",
"tlsCACerts": {
"path": "crypto-config/peerOrganizations/org3.bc4scm.de/tlsca/tlsca.org3.bc4scm.de-cert.pem"
},
"grpcOptions": {
"ssl-target-name-override": "peer0.org3.bc4scm.de"
}
},
"peer1.org3.bc4scm.de": {
"url": "grpcs://localhost:12051",
"tlsCACerts": {
"path": "crypto-config/peerOrganizations/supplier.bc4scm.de/tlsca/tlsca.org3.bc4scm.de-cert.pem"
},
"grpcOptions": {
"ssl-target-name-override": "peer1.org3.bc4scm.de"
}
}
},
"certificateAuthorities": {
"ca.org3.bc4scm.de": {
"url": "https://localhost:9054",
"caName": "ca-supplier",
"tlsCACerts": {
"path": "crypto-config/peerOrganizations/org3.bc4scm.de/tlsca/tlsca.org3.bc4scm.de-cert.pem"
},
"httpOptions": {
"verify": false
}
}
}
And the following is part of my docker-compose file:
peer0.org3.bc4scm.de:
  container_name: peer0.org3.bc4scm.de
  extends:
    file: peer-base.yaml
    service: peer-base
  environment:
    - CORE_PEER_ID=peer0.org3.bc4scm.de
    - CORE_PEER_ADDRESS=peer0.org3.bc4scm.de:11051
    - CORE_PEER_LISTENADDRESS=0.0.0.0:11051
    - CORE_PEER_CHAINCODEADDRESS=peer0.org3.bc4scm.de:11052
    - CORE_PEER_CHAINCODELISTENADDRESS=0.0.0.0:11052
    - CORE_PEER_GOSSIP_EXTERNALENDPOINT=peer0.org3.bc4scm.de:12051
    - CORE_PEER_GOSSIP_BOOTSTRAP=peer1.org3.bc4scm.de:11051
    - CORE_PEER_LOCALMSPID=Org3MSP
  volumes:
    - /var/run/:/host/var/run/
    - ../crypto-config/peerOrganizations/org3.bc4scm.de/peers/peer0.org3.bc4scm.de/msp:/etc/hyperledger/fabric/msp
    - ../crypto-config/peerOrganizations/org3.bc4scm.de/peers/peer0.org3.bc4scm.de/tls:/etc/hyperledger/fabric/tls
    - peer0.org3.bc4scm.de:/var/hyperledger/production
  ports:
    - 11051:11051
peer1.org3.bc4scm.de:
  container_name: peer1.org3.bc4scm.de
  extends:
    file: peer-base.yaml
    service: peer-base
  environment:
    - CORE_PEER_ID=peer1.org3.bc4scm.de
    - CORE_PEER_ADDRESS=peer1.org3.bc4scm.de:12051
    - CORE_PEER_LISTENADDRESS=0.0.0.0:12051
    - CORE_PEER_CHAINCODEADDRESS=peer1.org3.bc4scm.de:12052
    - CORE_PEER_CHAINCODELISTENADDRESS=0.0.0.0:12052
    - CORE_PEER_GOSSIP_EXTERNALENDPOINT=peer1.org3.bc4scm.de:11051
    - CORE_PEER_GOSSIP_BOOTSTRAP=peer0.org3.bc4scm.de:12051
    - CORE_PEER_LOCALMSPID=Org3MSP
  volumes:
    - /var/run/:/host/var/run/
    - ../crypto-config/peerOrganizations/org3.bc4scm.de/peers/peer1.org3.bc4scm.de/msp:/etc/hyperledger/fabric/msp
    - ../crypto-config/peerOrganizations/supplier.bc4scm.de/peers/peer1.org3.bc4scm.de/tls:/etc/hyperledger/fabric/tls
    - peer1.org3.bc4scm.de:/var/hyperledger/production
  ports:
    - 12051:12051
I got this code from the FabCar sample and tried to query from a client in Org3 instead of Org1. I created an admin user and then a regular user in this organization successfully. According to my observations, the error occurs when the following line is executed:
const result = await contract.evaluateTransaction('queryAllProducts','123');
What is the possible reason for this issue? Appreciate your insights on this.
Updates:
I checked the open ports on peer0.org3.bc4scm.de:
root#e52992a76c3d:/opt/gopath/src/github.com/hyperledger/fabric/peer# netstat -tulpn | grep LISTEN
tcp 0 0 127.0.0.1:9443 0.0.0.0:* LISTEN 1/peer
tcp 0 0 127.0.0.11:46353 0.0.0.0:* LISTEN -
tcp6 0 0 :::11051 :::* LISTEN 1/peer
tcp6 0 0 :::6060 :::* LISTEN 1/peer
tcp6 0 0 :::11052 :::* LISTEN 1/peer
Here I can see ports 11051 and 11052 are open and listening.
Also, there is a container for the installed chaincode:
cd0b165e5186 dev-peer0.org3.bc4scm.de-scmlogic-1.0-9c7e776aa8a752e530f79d0b456f1bda28aac3f5db0af734be2f315d8d1a4f53 "/bin/sh -c 'cd /usr…" 48 seconds ago Up 47 seconds dev-peer0.org3.bc4scm.de-scmlogic-1.0
When I look at the logs of that peer (peer0.org3), I can see the following error being printed continuously. It is complaining about the connection to org1:
2019-07-06 10:26:52.278 UTC [gossip.discovery] expireDeadMembers -> WARN 164 Exiting
2019-07-06 10:26:56.381 UTC [gossip.comm] func1 -> WARN 165 peer1.org1.bc4scm.de:8051, PKIid:42214b7584f3fabcdb84e5770c62e4cf0f7c00b2a9d0441d772925882d4457a7 isn't responsive: EOF
2019-07-06 10:26:56.381 UTC [gossip.discovery] expireDeadMembers -> WARN 166 Entering [42214b7584f3fabcdb84e5770c62e4cf0f7c00b2a9d0441d772925882d4457a7]
2019-07-06 10:26:56.381 UTC [gossip.discovery] expireDeadMembers -> WARN 167 Closing connection to Endpoint: peer1.org1.bc4scm.de:8051, InternalEndpoint: , PKI-ID: 42214b7584f3fabcdb84e5770c62e4cf0f7c00b2a
You could check whether the peer is accessible from a browser (e.g. Firefox): request localhost:11051, and if you see a response, your peer is accessible. If not, the port is not open, so go to the docker-compose file, expose that port, and bring the peer up again with docker-compose. Do the same for every peer you want to access.
You can also check the peer logs with:
docker logs --follow peer0.org3.bc4scm.de
Update:
Check CORE_PEER_GOSSIP_BOOTSTRAP and CORE_PEER_GOSSIP_EXTERNALENDPOINT for both peers:
CORE_PEER_GOSSIP_BOOTSTRAP=<a list of peer endpoints within the peer's org>
CORE_PEER_GOSSIP_EXTERNALENDPOINT=<the peer endpoint, as known outside the org>
for peer0.org3.bc4scm.de
CORE_PEER_GOSSIP_BOOTSTRAP=peer1.org3.bc4scm.de:12051
CORE_PEER_GOSSIP_EXTERNALENDPOINT=peer0.org3.bc4scm.de:11051
for peer1.org3.bc4scm.de :
CORE_PEER_GOSSIP_BOOTSTRAP=peer0.org3.bc4scm.de:11051
CORE_PEER_GOSSIP_EXTERNALENDPOINT=peer1.org3.bc4scm.de:12051
Adjust the ports accordingly for your peers and bring the containers up again with docker-compose.
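As a rough sketch, assuming the port mappings from the compose file above stay unchanged and all other environment variables remain as in the original file, the corrected environment sections would look like this:
peer0.org3.bc4scm.de:
  environment:
    # peer0 listens on 11051, so advertise 11051 and bootstrap from peer1 on 12051
    - CORE_PEER_GOSSIP_BOOTSTRAP=peer1.org3.bc4scm.de:12051
    - CORE_PEER_GOSSIP_EXTERNALENDPOINT=peer0.org3.bc4scm.de:11051
peer1.org3.bc4scm.de:
  environment:
    # peer1 listens on 12051, so advertise 12051 and bootstrap from peer0 on 11051
    - CORE_PEER_GOSSIP_BOOTSTRAP=peer0.org3.bc4scm.de:11051
    - CORE_PEER_GOSSIP_EXTERNALENDPOINT=peer1.org3.bc4scm.de:12051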
This could be due to multiple reasons:
Your peers are not accessible, so first check whether these ports are open.
You should confirm whether the chaincode is installed on these peers.
If neither of those is the problem, check the logs inside the docker containers of the chaincode and of these peers; for that you can use:
docker exec -it [container-name] bash
Do tell me if you find something there and you can't resolve it.
I had this same problem and realized the issue was that I had set the "asLocalhost" property to false while trying to access peers at localhost. Below is the working line with the property set correctly. (I pulled it from an example using FabCar, which was great otherwise.)
await gateway.connect(ccpPath, { wallet, identity: 'user1', discovery: { enabled: true, asLocalhost: true } });

Resources