How to install yugabyte-2.0.10.0 on CentOS7? - yugabytedb
I am trying to install yugabyte-2.0.10.0:
a) environment:
os: centos7.6
cpu Model: Intel(R) Core(TM) i7 CPU M 620
kernel: 3.10.0-957.el7.x86_64
gcc version 4.8.5
Python 2.7.5
b) commands:
cd ~
rm -rf /opt/yugabyte
mkdir -p /opt/yugabyte
mkdir -p /opt/yugabyte/data
wget -P /root/yugabyte https://downloads.yugabyte.com/yugabyte-2.0.10.0-linux.tar.gz
tar -xvzf /root/yugabyte/yugabyte-2.0.10.0-linux.tar.gz -C /opt/yugabyte
/opt/yugabyte/yugabyte-2.0.10.0/bin/post_install.sh
/opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl --data_dir "/opt/yugabyte/data" destroy
/opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl --data_dir "/opt/yugabyte/data" create
Error Logs:
[root@srvr0 ~]# /opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl --data_dir "/opt/yugabyte/data" create
Creating cluster.
Waiting for cluster to be ready.
Traceback (most recent call last):
  File "/opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl", line 1969, in <module>
    control.run()
  File "/opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl", line 1946, in run
    self.args.func()
  File "/opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl", line 1706, in create_cmd_impl
    self.wait_for_cluster_or_raise()
  File "/opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl", line 1551, in wait_for_cluster_or_raise
    raise RuntimeError("Timed out waiting for a YugaByte DB cluster!")
RuntimeError: Timed out waiting for a YugaByte DB cluster!
Viewing file /tmp/tmptCw8eu:
2020-01-09 21:21:18,413 INFO: Starting master-1 with:
/opt/yugabyte/yugabyte-2.0.10.0/bin/yb-master --fs_data_dirs "/opt/yugabyte/data/node-1/disk-1" --webserver_interface 127.0.0.1 --rpc_bind_addresses 127.0.0.1 --v 0 --version_file_json_path=/opt/yugabyte/yugabyte-2.0.10.0 --webserver_doc_root "/opt/yugabyte/yugabyte-2.0.10.0/www" --replication_factor=1 --yb_num_shards_per_tserver 2 --ysql_num_shards_per_tserver=2 --master_addresses 127.0.0.1:7100 --enable_ysql=true >"/opt/yugabyte/data/node-1/disk-1/master.out" 2>"/opt/yugabyte/data/node-1/disk-1/master.err" &
2020-01-09 21:21:18,475 INFO: Starting tserver-1 with:
/opt/yugabyte/yugabyte-2.0.10.0/bin/yb-tserver --fs_data_dirs "/opt/yugabyte/data/node-1/disk-1" --webserver_interface 127.0.0.1 --rpc_bind_addresses 127.0.0.1 --v 0 --version_file_json_path=/opt/yugabyte/yugabyte-2.0.10.0 --webserver_doc_root "/opt/yugabyte/yugabyte-2.0.10.0/www" --tserver_master_addrs=127.0.0.1:7100 --yb_num_shards_per_tserver=2 --redis_proxy_bind_address=127.0.0.1:6379 --cql_proxy_bind_address=127.0.0.1:9042 --local_ip_for_outbound_sockets=127.0.0.1 --use_cassandra_authentication=false --ysql_num_shards_per_tserver=2 --enable_ysql=true --pgsql_proxy_bind_address=127.0.0.1:5433 >"/opt/yugabyte/data/node-1/disk-1/tserver.out" 2>"/opt/yugabyte/data/node-1/disk-1/tserver.err" &
2020-01-09 21:21:18,483 INFO: Waiting for master and tserver processes to come up.
2020-01-09 21:21:18,627 INFO: Waiting for master leader election and tablet server registration.
2020-01-09 21:22:15,331 INFO: Master leader election still pending...
2020-01-09 21:22:16,333 ERROR: Failed waiting for None tservers, got None
^^^ Encountered errors ^^^
Please help me resolve the above issue.
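The yb-ctl log above records when it started waiting for master leader election (21:21:18,627) and when it gave up (21:22:16,333). A minimal bash sketch for measuring gaps like this one, assuming both timestamps fall on the same day; `to_ms` and `elapsed_ms` are hypothetical helper names, not part of yb-ctl:

```bash
# Hypothetical helpers (not part of yb-ctl): convert "HH:MM:SS,mmm" (yb-ctl log)
# or "HH:MM:SS.ffffff" (glog) to milliseconds since midnight, and subtract two times.
# Assumes both times fall on the same day.
to_ms() {
  local t="${1/,/.}"              # accept "," or "." before the fractional part
  local hms="${t%%.*}" frac="${t#*.}" h m s
  [ "$frac" = "$t" ] && frac=0    # no fractional part given
  frac="${frac}000"               # right-pad, then keep exactly 3 digits (milliseconds)
  frac="${frac:0:3}"
  IFS=: read -r h m s <<< "$hms"
  # 10# forces base 10 so fields such as "08" are not parsed as octal
  echo $(( 10#$h * 3600000 + 10#$m * 60000 + 10#$s * 1000 + 10#$frac ))
}

elapsed_ms() {
  echo $(( $(to_ms "$2") - $(to_ms "$1") ))
}
```

For this run, `elapsed_ms 21:21:18,627 21:22:16,333` gives 57706, so yb-ctl gave up after roughly 58 seconds. The master WARNING log in Update 2 (a later run, so only a rough comparison) reports 60.895 s spent loading metadata into memory.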
Update 1:
Info and Error Logs:
[root@srvr0 ~]# cat /opt/yugabyte/data/node-1/disk-1/master.out
[root@srvr0 ~]# cat /opt/yugabyte/data/node-1/disk-1/master.err
[root@srvr0 ~]# cat /opt/yugabyte/data/node-1/disk-1/tserver.out
The files belonging to this database system will be owned by user "root".
This user must also own the server process.
The database cluster will be initialized with locales
COLLATE: C
CTYPE: en_US.UTF-8
MESSAGES: en_US.UTF-8
MONETARY: en_US.UTF-8
NUMERIC: en_US.UTF-8
TIME: en_US.UTF-8
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
creating directory /opt/yugabyte/data/node-1/disk-1/pg_data ... ok
creating subdirectories ... ok
selecting default max_connections ... 300
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
syncing data to disk ... ok
[root@srvr0 ~]# cat /opt/yugabyte/data/node-1/disk-1/tserver.err
In YugaByte DB, setting LC_COLLATE to C and all other locale settings to en_US.UTF-8 by default. Locale support will be enhanced as part of addressing https://github.com/YugaByte/yugabyte-db/issues/1557
2020-01-13 15:07:18.447 UTC [12159] LOG: YugaByte is ENABLED in PostgreSQL. Transactions are enabled.
2020-01-13 15:07:18.488 UTC [12159] LOG: listening on IPv4 address "127.0.0.1", port 5433
2020-01-13 15:07:18.595 UTC [12159] LOG: redirecting log output to logging collector process
2020-01-13 15:07:18.595 UTC [12159] HINT: Future log output will appear in directory "/opt/yugabyte/data/node-1/disk-1/yb-data/tserver/logs".
Update 2:
[root@srvr0 ~]# cat /opt/yugabyte/data/node-1/disk-1/yb-data/master/logs/yb-master.WARNING
Log file created at: 2020/01/14 13:47:23
Running on machine: srvr0
Application fingerprint: version 2.0.10.0 build 4 revision 83610e77c7659c7587bc0c8aea76db47ff8e2df1 build_type RELEASE built at 06 Jan 2020 08:02:49 UTC
Running duration (h:mm:ss): 0:00:00
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
W0114 13:47:23.925465 12631 master_service.cc:108] Could not set master raft config : Illegal state (yb/master/catalog_manager.cc:6130): Node 1d36ad7c7b89457197595fc8f9e57f6f peer not initialized.
W0114 13:47:23.928180 12631 master_service.cc:108] Could not set master raft config : Illegal state (yb/master/catalog_manager.cc:6130): Node 1d36ad7c7b89457197595fc8f9e57f6f peer not initialized.
W0114 13:47:23.929930 12631 master_service.cc:108] Could not set master raft config : Illegal state (yb/master/catalog_manager.cc:6130): Node 1d36ad7c7b89457197595fc8f9e57f6f peer not initialized.
W0114 13:47:23.931773 12631 master_service.cc:108] Could not set master raft config : Illegal state (yb/master/catalog_manager.cc:6130): Node 1d36ad7c7b89457197595fc8f9e57f6f peer not initialized.
W0114 13:47:25.277549 12595 log.cc:702] Time spent Fsync log took a long time: real 0.289s user 0.000s sys 0.000s
W0114 13:47:27.635577 12595 log.cc:702] Time spent Fsync log took a long time: real 0.144s user 0.000s sys 0.000s
W0114 13:47:29.459060 12595 log.cc:702] Time spent Fsync log took a long time: real 0.088s user 0.000s sys 0.000s
...
W0114 13:48:17.587898 12595 log.cc:702] Time spent Fsync log took a long time: real 0.068s user 0.000s sys 0.000s
W0114 13:48:17.652386 12595 log.cc:702] Time spent Fsync log took a long time: real 0.064s user 0.000s sys 0.000s
W0114 13:48:18.864150 12595 log.cc:702] Time spent Fsync log took a long time: real 0.089s user 0.000s sys 0.000s
W0114 13:48:25.154635 12654 permissions_manager.cc:1050] Multiple security configs found when loading sys catalog
W0114 13:48:25.181205 12654 catalog_manager.cc:606] Time spent T 00000000000000000000000000000000 P 1d36ad7c7b89457197595fc8f9e57f6f: Loading metadata into memory: real 60.895s user 0.132s sys 0.026s
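The master WARNING log flags several slow fsyncs and reports that loading metadata into memory took 60.895 s. A small sketch that extracts every glog "Time spent ... real Ns" line from a log file and lists the slowest first; the function name is hypothetical and the pattern is based only on the lines shown in this thread:

```bash
# Hypothetical helper: print "<seconds> <operation>" for each glog "Time spent" warning,
# slowest first. The sed pattern matches lines shaped like the ones above:
#   ... Time spent <operation>: real <N>s user <N>s sys <N>s
slow_ops() {
  local log="$1" top="${2:-10}"
  sed -n 's/.*Time spent \(.*\): real \([0-9.]*\)s.*/\2 \1/p' "$log" |
    LC_ALL=C sort -rn |     # C locale so "." is always the decimal point
    head -n "$top"
}
```

Usage: `slow_ops /opt/yugabyte/data/node-1/disk-1/yb-data/master/logs/yb-master.WARNING 5`.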
[root@srvr0 ~]# cat /opt/yugabyte/data/node-1/disk-1/yb-data/tserver/logs/yb-tserver.WARNING
Log file created at: 2020/01/14 13:47:23
Running on machine: srvr0
Application fingerprint: version 2.0.10.0 build 4 revision 83610e77c7659c7587bc0c8aea76db47ff8e2df1 build_type RELEASE built at 06 Jan 2020 08:02:49 UTC
Running duration (h:mm:ss): 0:00:00
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
W0114 13:47:23.926698 12628 heartbeater.cc:598] P 4bbca70b45944a7e9f66463471e11466: Failed to heartbeat to 127.0.0.1:7100: Service unavailable (yb/tserver/heartbeater.cc:479): master is no longer the leader tries=0, num=1, masters=0x00000000029c8d60 -> [[127.0.0.1:7100]], code=Service unavailable
W0114 13:47:23.928352 12628 heartbeater.cc:598] P 4bbca70b45944a7e9f66463471e11466: Failed to heartbeat to 127.0.0.1:7100: Service unavailable (yb/tserver/heartbeater.cc:479): master is no longer the leader tries=1, num=1, masters=0x00000000029c8d60 -> [[127.0.0.1:7100]], code=Service unavailable
W0114 13:47:23.930130 12628 heartbeater.cc:598] P 4bbca70b45944a7e9f66463471e11466: Failed to heartbeat to 127.0.0.1:7100: Service unavailable (yb/tserver/heartbeater.cc:479): master is no longer the leader tries=2, num=1, masters=0x00000000029c8d60 -> [[127.0.0.1:7100]], code=Service unavailable
W0114 13:47:23.930173 12628 heartbeater.cc:323] P 4bbca70b45944a7e9f66463471e11466: Failed 3 heartbeats in a row: no longer allowing fast heartbeat attempts.
...
W0114 13:48:22.868005 12628 heartbeater.cc:598] P 4bbca70b45944a7e9f66463471e11466: Failed to heartbeat to 127.0.0.1:7100: Service unavailable (yb/tserver/heartbeater.cc:479): master is no longer the leader tries=61, num=1, masters=0x00000000029c8d60 -> [[127.0.0.1:7100]], code=Service unavailable
W0114 13:48:23.869757 12628 heartbeater.cc:598] P 4bbca70b45944a7e9f66463471e11466: Failed to heartbeat to 127.0.0.1:7100: Service unavailable (yb/tserver/heartbeater.cc:479): master is no longer the leader tries=62, num=1, masters=0x00000000029c8d60 -> [[127.0.0.1:7100]], code=Service unavailable
W0114 13:48:24.915241 12628 heartbeater.cc:598] P 4bbca70b45944a7e9f66463471e11466: Failed to heartbeat to 127.0.0.1:7100: Service unavailable (yb/tserver/heartbeater.cc:479): master is no longer the leader tries=63, num=1, masters=0x00000000029c8d60 -> [[127.0.0.1:7100]], code=Service unavailable
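The tserver WARNING log shows its heartbeats to the master at 127.0.0.1:7100 being rejected with "master is no longer the leader" for about a minute (tries=0 at 13:47:23 up to tries=63 at 13:48:24), ending right around the time the master WARNING log reports the metadata load finishing (13:48:25). A hypothetical awk-based summary of those warnings, assuming the glog line format given in the log header:

```bash
# Hypothetical helper: summarize "Failed to heartbeat to" warnings in a tserver glog file.
# Glog layout (from the log header): [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
heartbeat_summary() {
  awk '
    /Failed to heartbeat to/ {
      n++
      if (n == 1) first = $2          # $2 is the hh:mm:ss.uuuuuu field
      last = $2
      if (match($0, /tries=[0-9]+/)) {
        t = substr($0, RSTART + 6, RLENGTH - 6) + 0   # skip the 6 chars of "tries="
        if (t > max) max = t
      }
    }
    END { printf "failures=%d first=%s last=%s max_tries=%d\n", n, first, last, max }
  ' "$1"
}
```

Usage: `heartbeat_summary /opt/yugabyte/data/node-1/disk-1/yb-data/tserver/logs/yb-tserver.WARNING`. Lines such as "Failed 3 heartbeats in a row" are not counted, because they do not contain "Failed to heartbeat to".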
Update 3:
Last 20 lines of the master and tserver INFO files:
[root@srvr0 logs]# tail -20 /opt/yugabyte/data/node-1/disk-1/yb-data/master/logs/yb-master.INFO
}
}
table_type: TRANSACTION_STATUS_TABLE_TYPE
namespace {
name: "system"
}
I0121 07:00:23.625553 12478 catalog_manager.cc:1937] Setting default tablets to 2 with 1 primary servers
I0121 07:00:23.625607 12478 partition.cc:388] Creating partitions with num_tablets: 2
I0121 07:00:23.701505 12478 catalog_manager.cc:2155] Successfully created table transactions [id=ebe4eab3526e4030a8ef44796223f904] per request from internal request
I0121 07:00:23.701651 12478 catalog_manager.cc:741] Finished creating transaction status table asynchronously
I0121 07:00:23.701782 12478 catalog_manager.cc:3790] 5536e8fad1d04d52902a0d9488ab5b4e now has full report for 0 tablets.
I0121 07:00:23.701819 12478 catalog_manager.cc:3796] 5536e8fad1d04d52902a0d9488ab5b4e sent full tablet report with 0 tablets.
I0121 07:00:23.901152 12478 catalog_manager.cc:4037] Peer 5536e8fad1d04d52902a0d9488ab5b4e sent incremental report for 1bd70a13590146de9fa3feb16e90b120, prev state op id: -1, prev state term: 0, prev state has_leader_uuid: 0. Consensus state: current_term: 0 config { opid_index: -1 peers { permanent_uuid: "5536e8fad1d04d52902a0d9488ab5b4e" member_type: VOTER last_known_private_addr { host: "127.0.0.1" port: 9100 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } } }
I0121 07:00:24.024679 12478 catalog_manager.cc:4037] Peer 5536e8fad1d04d52902a0d9488ab5b4e sent incremental report for 1bd70a13590146de9fa3feb16e90b120, prev state op id: -1, prev state term: 0, prev state has_leader_uuid: 0. Consensus state: current_term: 1 config { opid_index: -1 peers { permanent_uuid: "5536e8fad1d04d52902a0d9488ab5b4e" member_type: VOTER last_known_private_addr { host: "127.0.0.1" port: 9100 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } } }
I0121 07:00:24.035790 12457 catalog_manager.cc:4037] Peer 5536e8fad1d04d52902a0d9488ab5b4e sent incremental report for ec9bb307331442b3b1fd7ba43a0199a0, prev state op id: -1, prev state term: 0, prev state has_leader_uuid: 0. Consensus state: current_term: 1 config { opid_index: -1 peers { permanent_uuid: "5536e8fad1d04d52902a0d9488ab5b4e" member_type: VOTER last_known_private_addr { host: "127.0.0.1" port: 9100 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } } }
I0121 07:00:24.035905 12457 catalog_manager.cc:4002] Tablet: 1bd70a13590146de9fa3feb16e90b120 reported consensus state change. New consensus state: current_term: 1 leader_uuid: "5536e8fad1d04d52902a0d9488ab5b4e" config { opid_index: -1 peers { permanent_uuid: "5536e8fad1d04d52902a0d9488ab5b4e" member_type: VOTER last_known_private_addr { host: "127.0.0.1" port: 9100 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } } } from 5536e8fad1d04d52902a0d9488ab5b4e
I0121 07:00:24.036085 12457 catalog_entity_info.cc:97] T 1bd70a13590146de9fa3feb16e90b120: Leader changed from <NULL> to 0x00000000038ee010 -> { permanent_uuid: 5536e8fad1d04d52902a0d9488ab5b4e registration: common { private_rpc_addresses { host: "127.0.0.1" port: 9100 } http_addresses { host: "127.0.0.1" port: 9000 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } placement_uuid: "" } capabilities: 2189743739 placement_id: cloud1:datacenter1:rack1 }
I0121 07:00:24.069538 12478 catalog_manager.cc:4002] Tablet: ec9bb307331442b3b1fd7ba43a0199a0 reported consensus state change. New consensus state: current_term: 1 leader_uuid: "5536e8fad1d04d52902a0d9488ab5b4e" config { opid_index: -1 peers { permanent_uuid: "5536e8fad1d04d52902a0d9488ab5b4e" member_type: VOTER last_known_private_addr { host: "127.0.0.1" port: 9100 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } } } from 5536e8fad1d04d52902a0d9488ab5b4e
I0121 07:00:24.069607 12478 catalog_entity_info.cc:97] T ec9bb307331442b3b1fd7ba43a0199a0: Leader changed from <NULL> to 0x00000000038ee010 -> { permanent_uuid: 5536e8fad1d04d52902a0d9488ab5b4e registration: common { private_rpc_addresses { host: "127.0.0.1" port: 9100 } http_addresses { host: "127.0.0.1" port: 9000 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } placement_uuid: "" } capabilities: 2189743739 placement_id: cloud1:datacenter1:rack1 }
I0121 07:00:28.951503 12447 reactor.cc:450] Master_R000: Timing out connection Connection (0x0000000002cc3690) server 127.0.0.1:49899 => 127.0.0.1:7100 - it has been idle for 65.0008s (delta: 65.0008, current time: 751.024, last activity time: 686.023)
[root@srvr0 logs]# tail -20 /opt/yugabyte/data/node-1/disk-1/yb-data/tserver/logs/yb_tserver.INFO
tail: cannot open ‘/opt/yugabyte/data/node-1/disk-1/yb-data/tserver/logs/yb_tserver.INFO’ for reading: No such file or directory
[root@srvr0 logs]# tail -20 /opt/yugabyte/data/node-1/disk-1/yb-data/tserver/logs/yb-tserver.INFO
I0121 07:00:24.024483 13021 consensus_meta.cc:275] T 1bd70a13590146de9fa3feb16e90b120 P 5536e8fad1d04d52902a0d9488ab5b4e: Updating active role from FOLLOWER to LEADER. Consensus state: current_term: 1 leader_uuid: "5536e8fad1d04d52902a0d9488ab5b4e" config { opid_index: -1 peers { permanent_uuid: "5536e8fad1d04d52902a0d9488ab5b4e" member_type: VOTER last_known_private_addr { host: "127.0.0.1" port: 9100 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } } }, has_pending_config = 0
I0121 07:00:24.024521 13021 raft_consensus.cc:2803] T 1bd70a13590146de9fa3feb16e90b120 P 5536e8fad1d04d52902a0d9488ab5b4e [term 1 LEADER]: Calling mark dirty synchronously for reason code NEW_LEADER_ELECTED
I0121 07:00:24.024586 13021 raft_consensus.cc:838] T 1bd70a13590146de9fa3feb16e90b120 P 5536e8fad1d04d52902a0d9488ab5b4e [term 1 LEADER]: Becoming Leader. State: Replica: 5536e8fad1d04d52902a0d9488ab5b4e, State: 1, Role: LEADER, Watermarks: {Received: 0.0 Committed: 0.0} Leader: 0.0
I0121 07:00:24.024760 13021 consensus_queue.cc:207] T 1bd70a13590146de9fa3feb16e90b120 P 5536e8fad1d04d52902a0d9488ab5b4e [LEADER]: Queue going to LEADER mode. State: All replicated op: 0.0, Majority replicated op: 0.0, Committed index: 0.0, Last appended: 0.0, Current term: 1, Majority size: 1, State: QUEUE_OPEN, Mode: LEADER, active raft config: opid_index: -1 peers { permanent_uuid: "5536e8fad1d04d52902a0d9488ab5b4e" member_type: VOTER last_known_private_addr { host: "127.0.0.1" port: 9100 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } }
I0121 07:00:24.024852 13021 raft_consensus.cc:856] Sending NO_OP at op { term: 0 index: 0 }
I0121 07:00:24.026254 13023 replica_state.cc:1268] T 1bd70a13590146de9fa3feb16e90b120 P 5536e8fad1d04d52902a0d9488ab5b4e [term 1 LEADER]: SetLeaderNoOpCommittedUnlocked(1)
I0121 07:00:24.026321 13023 replica_state.cc:725] T 1bd70a13590146de9fa3feb16e90b120 P 5536e8fad1d04d52902a0d9488ab5b4e [term 1 LEADER]: Advanced the committed_op_id across terms. Last committed operation was: { term: 0 index: 0 } New committed index is: { term: 1 index: 1 }
I0121 07:00:24.035311 13018 leader_election.cc:239] T ec9bb307331442b3b1fd7ba43a0199a0 P 5536e8fad1d04d52902a0d9488ab5b4e [CANDIDATE]: Term 1 election: Election decided. Result: candidate won.
I0121 07:00:24.035398 13018 raft_consensus.cc:2867] T ec9bb307331442b3b1fd7ba43a0199a0 P 5536e8fad1d04d52902a0d9488ab5b4e [term 1 FOLLOWER]: Snoozing failure detection for 3.178s
I0121 07:00:24.035445 13018 raft_consensus.cc:2773] T ec9bb307331442b3b1fd7ba43a0199a0 P 5536e8fad1d04d52902a0d9488ab5b4e [term 1 FOLLOWER]: Leader election won for term 1
I0121 07:00:24.035468 13018 replica_state.cc:1268] T ec9bb307331442b3b1fd7ba43a0199a0 P 5536e8fad1d04d52902a0d9488ab5b4e [term 1 FOLLOWER]: SetLeaderNoOpCommittedUnlocked(0)
I0121 07:00:24.035542 13018 consensus_meta.cc:275] T ec9bb307331442b3b1fd7ba43a0199a0 P 5536e8fad1d04d52902a0d9488ab5b4e: Updating active role from FOLLOWER to LEADER. Consensus state: current_term: 1 leader_uuid: "5536e8fad1d04d52902a0d9488ab5b4e" config { opid_index: -1 peers { permanent_uuid: "5536e8fad1d04d52902a0d9488ab5b4e" member_type: VOTER last_known_private_addr { host: "127.0.0.1" port: 9100 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } } }, has_pending_config = 0
I0121 07:00:24.035590 13018 raft_consensus.cc:2803] T ec9bb307331442b3b1fd7ba43a0199a0 P 5536e8fad1d04d52902a0d9488ab5b4e [term 1 LEADER]: Calling mark dirty synchronously for reason code NEW_LEADER_ELECTED
I0121 07:00:24.035641 13018 raft_consensus.cc:838] T ec9bb307331442b3b1fd7ba43a0199a0 P 5536e8fad1d04d52902a0d9488ab5b4e [term 1 LEADER]: Becoming Leader. State: Replica: 5536e8fad1d04d52902a0d9488ab5b4e, State: 1, Role: LEADER, Watermarks: {Received: 0.0 Committed: 0.0} Leader: 0.0
I0121 07:00:24.035706 13018 consensus_queue.cc:207] T ec9bb307331442b3b1fd7ba43a0199a0 P 5536e8fad1d04d52902a0d9488ab5b4e [LEADER]: Queue going to LEADER mode. State: All replicated op: 0.0, Majority replicated op: 0.0, Committed index: 0.0, Last appended: 0.0, Current term: 1, Majority size: 1, State: QUEUE_OPEN, Mode: LEADER, active raft config: opid_index: -1 peers { permanent_uuid: "5536e8fad1d04d52902a0d9488ab5b4e" member_type: VOTER last_known_private_addr { host: "127.0.0.1" port: 9100 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } }
I0121 07:00:24.035748 13018 raft_consensus.cc:856] Sending NO_OP at op { term: 0 index: 0 }
I0121 07:00:24.036341 13021 replica_state.cc:1268] T ec9bb307331442b3b1fd7ba43a0199a0 P 5536e8fad1d04d52902a0d9488ab5b4e [term 1 LEADER]: SetLeaderNoOpCommittedUnlocked(1)
I0121 07:00:24.036391 13021 replica_state.cc:725] T ec9bb307331442b3b1fd7ba43a0199a0 P 5536e8fad1d04d52902a0d9488ab5b4e [term 1 LEADER]: Advanced the committed_op_id across terms. Last committed operation was: { term: 0 index: 0 } New committed index is: { term: 1 index: 1 }
I0121 07:01:28.862228 12462 reactor.cc:450] TabletServer_R000: Timing out connection Connection (0x0000000003fb4490) server 127.0.0.1:49050 => 127.0.0.1:9100 - it has been idle for 65.0008s (delta: 65.0008, current time: 810.935, last activity time: 745.934)
I0121 07:01:28.862249 12463 reactor.cc:450] TabletServer_R001: Timing out connection Connection (0x0000000003fb47f0) server 127.0.0.1:33000 => 127.0.0.1:9100 - it has been idle for 65.0008s (delta: 65.0008, current time: 810.936, last activity time: 745.935)
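The first tail command in this update failed only because of a typo (yb_tserver.INFO instead of yb-tserver.INFO). A small sketch that builds these paths from the directory layout seen in this thread (node-N/disk-1/yb-data/<server>/logs/yb-<server>.<SEVERITY>); the helper name is hypothetical, and the layout is assumed from the paths above for a yb-ctl cluster with one disk per node:

```bash
# Hypothetical helper: print the glog path for one yb-ctl node, following the layout above:
#   <data_dir>/node-<N>/disk-1/yb-data/<master|tserver>/logs/yb-<master|tserver>.<SEVERITY>
# Assumption: a single disk (disk-1) per node, as in this single-node cluster.
yb_log() {
  local data_dir="$1" node="$2" server="$3" severity="${4:-INFO}"
  case "$server" in
    master|tserver) ;;
    *) echo "server must be 'master' or 'tserver', got '$server'" >&2; return 2 ;;
  esac
  echo "${data_dir}/node-${node}/disk-1/yb-data/${server}/logs/yb-${server}.${severity}"
}
```

Usage: `tail -20 "$(yb_log /opt/yugabyte/data 1 tserver)"` or `cat "$(yb_log /opt/yugabyte/data 1 master WARNING)"`.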
Update 4:
Installed Python 2.7.10 on CentOS 7 (reference: https://myopswork.com/install-python-2-7-10-on-centos-rhel-75f90c5239a5) as follows:
cd /usr/src
wget https://www.python.org/ftp/python/2.7.10/Python-2.7.10.tgz
tar xzf Python-2.7.10.tgz
cd Python-2.7.10
./configure
make altinstall
python2.7
### Make Python 2.7.10 the default
echo "alias python=\"/usr/local/bin/python2.7\"" >> /etc/profile
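One caveat worth checking with this step: /etc/profile is read by login shells, and bash does not expand aliases in non-interactive shells unless `expand_aliases` is set. A script run directly, such as yb-ctl (a Python script, judging by the traceback), gets its interpreter from its own shebang line, and for `#!/usr/bin/env NAME` from PATH, so the alias may not affect it. The thread does not show yb-ctl's shebang, so the sketch below only reports what a given script's first line resolves to; the function name is hypothetical:

```bash
# Hypothetical helper: print the interpreter a script's shebang line resolves to.
# Shell aliases play no part: the kernel reads the "#!" line, and
# "/usr/bin/env NAME" looks NAME up in PATH.
shebang_interpreter() {
  local first env_prefix='#!/usr/bin/env '
  IFS= read -r first < "$1" || [ -n "$first" ] || return 1
  case "$first" in
    "$env_prefix"*)
      # "#!/usr/bin/env NAME [args]": resolve NAME through PATH, as env would
      local rest="${first#"$env_prefix"}"
      command -v "${rest%% *}" ;;
    '#!'*)
      # absolute interpreter path (plus any arguments), printed as written
      echo "${first:2}" ;;
    *)
      echo "no shebang line in $1" >&2
      return 1 ;;
  esac
}
```

Usage: `shebang_interpreter /opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl`, then run the printed interpreter with `--version` to see which Python yb-ctl would actually use.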
Then I executed the following commands to install yugabyte-2.0.10.0:
cd ~
rm -rf /opt/yugabyte
mkdir -p /opt/yugabyte
mkdir -p /opt/yugabyte/data
tar -xvzf /tmp/yugabyte/yugabyte-2.0.10.0-linux.tar.gz -C /opt/yugabyte
/opt/yugabyte/yugabyte-2.0.10.0/bin/post_install.sh
/opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl --data_dir "/opt/yugabyte/data" destroy
/opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl --data_dir "/opt/yugabyte/data" create
/opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl --data_dir "/opt/yugabyte/data" setup_redis
/opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl --data_dir "/opt/yugabyte/data" status
Note: the first attempt to create the cluster fails; destroying it and running create again succeeds.
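The workaround in the note above (destroy the failed cluster and run create again) can be scripted. A minimal sketch, assuming the yb-ctl path and data directory used in this update; `create_with_retry` and its attempt limit are hypothetical, not a yb-ctl feature, and this only automates the workaround without addressing why the first attempt times out:

```bash
# Sketch of the "destroy and create again" workaround from the note above.
# Assumptions: yb-ctl path and data dir as used in this thread (override via env vars);
# create_with_retry and its attempt limit are hypothetical.
YB_CTL="${YB_CTL:-/opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl}"
DATA_DIR="${DATA_DIR:-/opt/yugabyte/data}"

create_with_retry() {
  local max_attempts="${1:-2}" attempt
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    if "$YB_CTL" --data_dir "$DATA_DIR" create; then
      echo "cluster created on attempt $attempt"
      return 0
    fi
    # yb-ctl create timed out (or failed): remove the partial cluster before retrying
    echo "create attempt $attempt failed; destroying the partial cluster" >&2
    "$YB_CTL" --data_dir "$DATA_DIR" destroy
  done
  return 1    # every attempt failed; the last partial cluster has been destroyed
}
```

Usage: `create_with_retry 3 && "$YB_CTL" --data_dir "$DATA_DIR" setup_redis`.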
Logs:
Python 2.7.10:
[root@srvr0 ~]# python
Python 2.7.10 (default, Jan 27 2020, 17:09:56)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> exit();
Installation:
[root@srvr0 ~]# cd ~
[root@srvr0 ~]# rm -rf /opt/yugabyte
[root@srvr0 ~]# mkdir -p /opt/yugabyte
[root@srvr0 ~]# mkdir -p /opt/yugabyte/data
[root@srvr0 ~]# ###cp /root/yugabyte-2.0.10.0-linux.tar.gz /index
[root@srvr0 ~]# tar -xvzf /index/yugabyte/yugabyte-2.0.10.0-linux.tar.gz -C /opt/yugabyte
yugabyte-2.0.10.0/
yugabyte-2.0.10.0/bin/
yugabyte-2.0.10.0/bin/ysqlsh
yugabyte-2.0.10.0/bin/psql
yugabyte-2.0.10.0/bin/bulk_load_cleanup.sh
yugabyte-2.0.10.0/bin/bulk_load_helper.sh
yugabyte-2.0.10.0/bin/log_cleanup.sh
yugabyte-2.0.10.0/bin/yb-check-failed-tablets.sh
yugabyte-2.0.10.0/bin/yb-check-consistency.py
yugabyte-2.0.10.0/bin/configure
...
yugabyte-2.0.10.0/ui/conf/evolutions/default/1.sql
yugabyte-2.0.10.0/ui/conf/application.conf
yugabyte-2.0.10.0/ui/conf/k8s-expose-all.yml
yugabyte-2.0.10.0/ui/conf/application.default.conf
yugabyte-2.0.10.0/ui/conf/default_cmk_policy.json
yugabyte-2.0.10.0/ui/conf/version.txt
yugabyte-2.0.10.0/ui/README.md
yugabyte-2.0.10.0/version_metadata.json
[root@srvr0 ~]# /opt/yugabyte/yugabyte-2.0.10.0/bin/post_install.sh
+ /opt/yugabyte/yugabyte-2.0.10.0/bin/patchelf --set-interpreter /opt/yugabyte/yugabyte-2.0.10.0/lib/ld.so log-dump
...
+ /opt/yugabyte/yugabyte-2.0.10.0/bin/patchelf --set-interpreter /opt/yugabyte/yugabyte-2.0.10.0/lib/ld.so vacuumlo
+ /opt/yugabyte/yugabyte-2.0.10.0/bin/patchelf --set-rpath /opt/yugabyte/yugabyte-2.0.10.0/lib/yb:/opt/yugabyte/yugabyte-2.0.10.0/lib/yb-thirdparty:/opt/yugabyte/yugabyte-2.0.10.0/linuxbrew/lib vacuumlo
[root@srvr0 ~]# /opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl --data_dir "/opt/yugabyte/data" destroy
Destroying cluster.
[root@srvr0 ~]# /opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl --data_dir "/opt/yugabyte/data" create
Creating cluster.
Waiting for cluster to be ready.
Traceback (most recent call last):
  File "/opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl", line 1969, in <module>
    control.run()
  File "/opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl", line 1946, in run
    self.args.func()
  File "/opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl", line 1706, in create_cmd_impl
    self.wait_for_cluster_or_raise()
  File "/opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl", line 1551, in wait_for_cluster_or_raise
    raise RuntimeError("Timed out waiting for a YugaByte DB cluster!")
RuntimeError: Timed out waiting for a YugaByte DB cluster!
Viewing file /tmp/tmpJb_KSP:
2020-01-27 19:09:25,732 INFO: Starting master-1 with:
/opt/yugabyte/yugabyte-2.0.10.0/bin/yb-master --fs_data_dirs "/opt/yugabyte/data/node-1/disk-1" --webserver_interface 127.0.0.1 --rpc_bind_addresses 127.0.0.1 --v 0 --version_file_json_path=/opt/yugabyte/yugabyte-2.0.10.0 --webserver_doc_root "/opt/yugabyte/yugabyte-2.0.10.0/www" --replication_factor=1 --yb_num_shards_per_tserver 2 --ysql_num_shards_per_tserver=2 --master_addresses 127.0.0.1:7100 --enable_ysql=true >"/opt/yugabyte/data/node-1/disk-1/master.out" 2>"/opt/yugabyte/data/node-1/disk-1/master.err" &
2020-01-27 19:09:25,792 INFO: Starting tserver-1 with:
/opt/yugabyte/yugabyte-2.0.10.0/bin/yb-tserver --fs_data_dirs "/opt/yugabyte/data/node-1/disk-1" --webserver_interface 127.0.0.1 --rpc_bind_addresses 127.0.0.1 --v 0 --version_file_json_path=/opt/yugabyte/yugabyte-2.0.10.0 --webserver_doc_root "/opt/yugabyte/yugabyte-2.0.10.0/www" --tserver_master_addrs=127.0.0.1:7100 --yb_num_shards_per_tserver=2 --redis_proxy_bind_address=127.0.0.1:6379 --cql_proxy_bind_address=127.0.0.1:9042 --local_ip_for_outbound_sockets=127.0.0.1 --use_cassandra_authentication=false --ysql_num_shards_per_tserver=2 --enable_ysql=true --pgsql_proxy_bind_address=127.0.0.1:5433 >"/opt/yugabyte/data/node-1/disk-1/tserver.out" 2>"/opt/yugabyte/data/node-1/disk-1/tserver.err" &
2020-01-27 19:09:25,800 INFO: Waiting for master and tserver processes to come up.
2020-01-27 19:09:25,934 INFO: Waiting for master leader election and tablet server registration.
2020-01-27 19:10:22,502 INFO: Master leader election still pending...
2020-01-27 19:10:23,504 ERROR: Failed waiting for None tservers, got None
^^^ Encountered errors ^^^
[root@srvr0 ~]# /opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl --data_dir "/opt/yugabyte/data" destroy
Destroying cluster.
[root@srvr0 ~]# /opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl --data_dir "/opt/yugabyte/data" create
Creating cluster.
Waiting for cluster to be ready.
.
----------------------------------------------------------------------------------------------------
| Node Count: 1 | Replication Factor: 1 |
----------------------------------------------------------------------------------------------------
| JDBC : jdbc:postgresql://127.0.0.1:5433/postgres |
| YSQL Shell : /opt/yugabyte/yugabyte-2.0.10.0/bin/ysqlsh |
| YCQL Shell : /opt/yugabyte/yugabyte-2.0.10.0/bin/cqlsh |
| YEDIS Shell : /opt/yugabyte/yugabyte-2.0.10.0/bin/redis-cli |
| Web UI : http://127.0.0.1:7000/ |
| Cluster Data : /opt/yugabyte/data |
----------------------------------------------------------------------------------------------------
For more info, please use: yb-ctl --data_dir /opt/yugabyte/data status
[root@srvr0 ~]# /opt/yugabyte/yugabyte-2.0.10.0/bin/yb-ctl --data_dir "/opt/yugabyte/data" setup_redis
Setting up YugaByte DB support for Redis API.
Waiting for cluster to be ready.
Setup Redis successful.
You got it working in this issue: "How to run yugabyte-db yugastore application locally?"
Can you check these logs and post their contents:
/opt/yugabyte/data/node-1/disk-1/master.out, /opt/yugabyte/data/node-1/disk-1/master.err, /opt/yugabyte/data/node-1/disk-1/tserver.out, /opt/yugabyte/data/node-1/disk-1/tserver.err.
We are trying to reproduce this internally and will get back to you. In the meantime, could you please check the tserver.err file and the tserver INFO logs (see the yb-ctl instructions on where to find the tserver logs) for anything unusual? It looks like the tservers are not up and running.
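To gather the four files requested above in one go, a small sketch that prints each one under a header and marks empty or missing files explicitly (master.out and master.err were empty in Update 1, which is easy to miss with a plain cat). The function name is hypothetical; the file names and directory are the ones from this thread:

```bash
# Hypothetical helper: print master/tserver .out and .err files from one yb-ctl node
# directory, each under a header, so the output can be pasted into the thread.
dump_node_logs() {
  local dir="$1" f
  for f in master.out master.err tserver.out tserver.err; do
    echo "===== ${dir}/${f} ====="
    if [ -s "${dir}/${f}" ]; then
      cat "${dir}/${f}"
    elif [ -e "${dir}/${f}" ]; then
      echo "(empty)"      # file exists but has no content
    else
      echo "(missing)"    # file was never created
    fi
  done
}
```

Usage: `dump_node_logs /opt/yugabyte/data/node-1/disk-1 > /tmp/yb-node-1-logs.txt`.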
Related
How to use the DBus system in a container with docker root-less
I would like to use DBus in a container with docker in root-less mode. I use Ubuntu 22.10 : host$ lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 22.10 Release: 22.10 Codename: kinetic and docker root-less : host$ docker info Client: Context: rootless Debug Mode: false Plugins: app: Docker App (Docker Inc., v0.9.1-beta3) buildx: Docker Buildx (Docker Inc., v0.9.1-docker) compose: Docker Compose (Docker Inc., v2.12.2) scan: Docker Scan (Docker Inc., v0.21.0) Server: Containers: 1 Running: 1 Paused: 0 Stopped: 0 Images: 3 Server Version: 20.10.21 Storage Driver: overlay2 Backing Filesystem: extfs Supports d_type: true Native Overlay Diff: false userxattr: true Logging Driver: json-file Cgroup Driver: systemd Cgroup Version: 2 Plugins: Volume: local Network: bridge host ipvlan macvlan null overlay Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog Swarm: inactive Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc Default Runtime: runc Init Binary: docker-init containerd version: d986545181c905378b0f90faa9c5eae3cbfa3755 runc version: v1.1.4-0-g5fd4c4d init version: de40ad0 Security Options: seccomp Profile: default rootless cgroupns Kernel Version: 5.19.0-26-generic Operating System: Ubuntu 22.10 OSType: linux Architecture: x86_64 CPUs: 12 Total Memory: 31.23GiB Name: **************** ID: LAEG:NBQE:RME5:OPHR:TT4C:PHA3:25FE:7DPW:46PD:E2VI:6FB6:HQ2P Docker Root Dir: /home/*******/.local/share/docker Debug Mode: false Registry: https://index.docker.io/v1/ Labels: Experimental: false Insecure Registries: 127.0.0.0/8 Live Restore Enabled: false I tried to create a container with the dbus socket mounted in it : docker run -it --rm -v /var/run/dbus:/var/run/dbus ubuntu:latest bash In my case I need to launch the container with a user different from root. Then I created a test user with the uid 1000: root#163974703e4c:/# adduser test Adding user `test' ... 
Adding new group `test' (1000) ... Adding new user `test' (1000) with group `test' ... Creating home directory `/home/test' ... Copying files from `/etc/skel' ... New password: Retype new password: passwd: password updated successfully Changing the user information for test Enter the new value, or press ENTER for the default Full Name []: Room Number []: Work Phone []: Home Phone []: Other []: Is the information correct? [Y/n] Y I switch to this new user : root#163974703e4c:/# su test test#163974703e4c:/$ id uid=1000(test) gid=1000(test) groups=1000(test) As I have a user other than root, he has on my host a subuid. My /etc/subuid: user:100000:65536 Therefore I put an acl on my dbus socket to allow my sub user to use the socket: host$ sudo setfacl -R -m u:100999:rwx /run/dbus/system_bus_socket So I have the DBus socket with an access to this socket in the container: test#163974703e4c:/$ ls -lan /run/dbus/system_bus_socket srw-rwxrw-+ 1 65534 65534 0 Dec 9 17:46 /run/dbus/system_bus_socket test#163974703e4c:/$ getfacl /run/dbus/system_bus_socket getfacl: Removing leading '/' from absolute path names # file: run/dbus/system_bus_socket # owner: nobody # group: nogroup user::rw- user:test:rwx group::rw- mask::rwx other::rw- I test the command dbus-monitor --system but I have this output : $ dbus-monitor --system Failed to open connection to system bus: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken. Can you help me please? I tried to launch my container in privileged mode, with --add-cap ALL, but I still get this error message. I tried to use strace to show all system call nothing more information : prctl(PR_CAPBSET_READ, CAP_MAC_OVERRIDE) = 0 prctl(PR_CAPBSET_READ, 0x30 /* CAP_??? */) = -1 EINVAL (Invalid argument) prctl(PR_CAPBSET_READ, CAP_CHECKPOINT_RESTORE) = 0 prctl(PR_CAPBSET_READ, 0x2c /* CAP_??? 
*/) = -1 EINVAL (Invalid argument) prctl(PR_CAPBSET_READ, 0x2a /* CAP_??? */) = -1 EINVAL (Invalid argument) prctl(PR_CAPBSET_READ, 0x29 /* CAP_??? */) = -1 EINVAL (Invalid argument) getresuid([1000], [1000], [1000]) = 0 getresgid([1000], [1000], [1000]) = 0 socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0) = 3 connect(3, {sa_family=AF_UNIX, sun_path="/run/dbus/system_bus_socket"}, 29) = 0 fcntl(3, F_GETFL) = 0x2 (flags O_RDWR) fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0 geteuid() = 1000 getsockname(3, {sa_family=AF_UNIX}, [128 => 2]) = 0 poll([{fd=3, events=POLLOUT}], 1, 0) = 1 ([{fd=3, revents=POLLOUT}]) sendto(3, "\0", 1, MSG_NOSIGNAL, NULL, 0) = 1 sendto(3, "AUTH EXTERNAL 31303030\r\n", 24, MSG_NOSIGNAL, NULL, 0) = 24 poll([{fd=3, events=POLLIN}], 1, -1) = 1 ([{fd=3, revents=POLLIN}]) read(3, "REJECTED EXTERNAL\r\n", 2048) = 19 close(3) = 0 write(2, "Failed to open connection to sys"..., 252Failed to open connection to system bus: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken. ) = 252 exit_group(1) = ? +++ exited with 1 +++ I want to get the same output as on my host in my container : dbus-monitor --system dbus-monitor: unable to enable new-style monitoring: org.freedesktop.DBus.Error.AccessDenied: "Rejected send message, 1 matched rules; type="method_call", sender=":1.544" (uid=1000 pid=32723 comm="dbus-monitor --system" label="unconfined") interface="org.freedesktop.DBus.Monitoring" member="BecomeMonitor" error name="(unset)" requested_reply="0" destination="org.freedesktop.DBus" (bus)". Falling back to eavesdropping. 
signal time=1670624207.443897 sender=org.freedesktop.DBus -> destination=:1.544 serial=2 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=NameAcquired
   string ":1.544"
signal time=1670624214.344658 sender=:1.12 -> destination=(null destination) serial=47 path=/org/freedesktop/UDisks2/drives/ST2000DM008_2FR102_ZFL3HVF7; interface=org.freedesktop.DBus.Properties; member=PropertiesChanged
   string "org.freedesktop.UDisks2.Drive.Ata"
   array [
      dict entry(
         string "SmartUpdated"
         variant uint64 1670624214
      )
   ]
   array [
   ]
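The 100999 in the setfacl command follows from user-namespace arithmetic. A minimal sketch, assuming a rootless-Podman-style mapping in which container UID 0 maps to the host user who started the container and container UIDs 1 to 65536 map onto the /etc/subuid range. That assumption is consistent with the ACL for host UID 100999 showing up as user:test (UID 1000) inside the container. The helper name and the host user's UID (1000) are hypothetical.

```python
def container_to_host_uid(container_uid, host_user_uid, subuid_start, subuid_count):
    """Map a UID inside the container to the UID the host kernel reports.

    Hypothetical helper assuming a rootless-style mapping:
      container 0                -> host_user_uid (the user who started the container)
      container 1..subuid_count  -> subuid_start .. subuid_start + subuid_count - 1
    """
    if container_uid == 0:
        return host_user_uid
    if 1 <= container_uid <= subuid_count:
        return subuid_start + container_uid - 1
    raise ValueError(f"UID {container_uid} is not mapped in this namespace")


# /etc/subuid from the question: user:100000:65536
# host_user_uid=1000 is an assumed value for the host account "user".
print(container_to_host_uid(1000, host_user_uid=1000,
                            subuid_start=100000, subuid_count=65536))
# 100999: the UID granted by setfacl, i.e. how the host sees "test"
```

Under this mapping, any process in the container running as test reaches the host's dbus-daemon as UID 100999, which is why the ACL targets that number rather than 1000.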
The issue is the EXTERNAL authentication used by libdbus, which leads to a discrepancy when crossing user-namespace boundaries. It is described here: https://bugreports.qt.io/browse/QTBUG-108408. If you can afford to patch libdbus in your project, or at least in your containers, this patch should get you going:

From 0d18f455194924ffb100bc980239082187b48301 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=F0=9F=98=8
Date: Sun, 13 Nov 2022 20:08:02 +0100
Subject: [PATCH] fix: Do not send UID by External Auth
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

sending the UID per EXTERNAL authentication crossing user-namespace
would cause mismatch with out-of-band credentials acquired over UDS

An empty "AUTH EXTERNAL" is still a valid implementation of EXTERNAL
authentication

Upstream-ticket: https://gitlab.freedesktop.org/dbus/dbus/-/issues/195
---
 dbus/dbus-auth.c | 37 ++++++++++++++-----------------------
 1 file changed, 14 insertions(+), 23 deletions(-)

diff --git a/dbus/dbus-auth.c b/dbus/dbus-auth.c
index d4faa737..1d8f3b53 100644
--- a/dbus/dbus-auth.c
+++ b/dbus/dbus-auth.c
@@ -1231,31 +1231,22 @@ static dbus_bool_t
 handle_client_initial_response_external_mech (DBusAuth         *auth,
                                               DBusString       *response)
 {
-  /* We always append our UID as an initial response, so the server
-   * doesn't have to send back an empty challenge to check whether we
-   * want to specify an identity. i.e. this avoids a round trip that
-   * the spec for the EXTERNAL mechanism otherwise requires.
-   */
-  DBusString plaintext;
-
-  if (!_dbus_string_init (&plaintext))
+  /* We don't send the UID as crossing user-namespace would cause
+     mismatch with out-of-band credentials acquired over UDS
+     it is still a valid implementation of EXTERNAL authentication
+     check related tickets in sd-bus
+     https://github.com/systemd/systemd/commit/1ed4723d38cd0d1423c8fe650f90fa86007ddf55
+     and gdbus
+     https://gitlab.gnome.org/GNOME/glib/-/merge_requests/2832
+
+     Upstream ticket for proper fix: https://gitlab.freedesktop.org/dbus/dbus/-/issues/195
+  */
+  if (!_dbus_string_append (response,
+                            "\r\nDATA"))
+  {
     return FALSE;
-
-  if (!_dbus_append_user_from_current_process (&plaintext))
-    goto failed;
-
-  if (!_dbus_string_hex_encode (&plaintext, 0,
-                                response,
-                                _dbus_string_get_length (response)))
-    goto failed;
-
-  _dbus_string_free (&plaintext);
-
+  }
   return TRUE;
-
- failed:
-  _dbus_string_free (&plaintext);
-  return FALSE;
 }
 
 static dbus_bool_t
-- 
2.38.1
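The strace output in the question shows why the daemon answers REJECTED EXTERNAL: unpatched libdbus sends its own UID as the initial response, hex-encoding the decimal digits, so UID 1000 becomes 31303030. The host-side daemon compares that claim with the UID it obtains out of band from the socket's peer credentials, which (per the setfacl in the question) is 100999. The sketch below models that comparison in simplified form; server_accepts is a hypothetical stand-in for the daemon's check, not dbus-daemon code, and the DATA line that the patch pipelines is omitted.

```python
def auth_external_line(uid=None):
    """Build the client's first SASL line for the D-Bus EXTERNAL mechanism.

    With a UID, the initial response is the UID's decimal digits, hex-encoded
    (what unpatched libdbus sends). Without one, the client makes no identity
    claim and the server relies on the socket's peer credentials.
    """
    if uid is None:
        return b"AUTH EXTERNAL\r\n"
    return b"AUTH EXTERNAL " + str(uid).encode("ascii").hex().encode("ascii") + b"\r\n"


def server_accepts(claimed_uid, peer_cred_uid):
    """Simplified model of the server-side identity check (hypothetical helper)."""
    if claimed_uid is None:
        return True  # no claim: authenticate as the UID the kernel reports
    return claimed_uid == peer_cred_uid


print(auth_external_line(1000))      # b'AUTH EXTERNAL 31303030\r\n', as in the strace
print(server_accepts(1000, 100999))  # False -> "REJECTED EXTERNAL"
print(server_accepts(None, 100999))  # True: the behavior the patch relies on
```

Dropping the claim moves the decision entirely to the kernel-provided credentials, which are already translated into the daemon's user namespace, so the mismatch disappears.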
Meteor: Verifying Deployment - Connection refused
I am trying to deploy a Meteor application, but the Verifying Deployment step fails with the following error:

------------------------------------STDERR------------------------------------
: (7) Failed to connect to 172.17.0.2 port 3000: Connection refused
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (7) Failed to connect to 172.17.0.2 port 3000: Connection refused
=> Logs:
=> Setting node version
NODE_VERSION=14.17.4
v14.17.4 is already installed.
Now using node v14.17.4 (npm v6.14.14)
default -> 14.17.4 (-> v14.17.4 *)
=> Starting meteor app on port 3000
=> Redeploying previous version of the app

When I run sudo netstat -tulpn | grep LISTEN on the server, it shows:

tcp        0      0 10.0.3.1:53             0.0.0.0:*               LISTEN      609/dnsmasq
tcp        0      0 127.0.0.53:53           0.0.0.0:*               LISTEN      406/systemd-resolve
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      745/sshd: /usr/sbin
tcp6       0      0 :::22                   :::*                    LISTEN      745/sshd: /usr/sbin

sudo docker ps shows:

CONTAINER ID   IMAGE                COMMAND                  CREATED             STATUS                           PORTS   NAMES
e51b1b4bf3a3   mup-appName:latest   "/bin/sh -c 'exec $M…"   About an hour ago   Restarting (1) 49 seconds ago            appName
68b723183f3d   mongo:3.4.1          "/entrypoint.sh mong…"   9 days ago          Restarting (100) 9 seconds ago           mongodb

I have also opened port 3000 in my firewall. When I check whether Docker is running, it looks like no container is actually running.
Also, in my mup.js file I am using http, not https:

module.exports = {
  servers: {
    one: {
      host: 'xx.xx.xxx.xxx',
      username: 'ubuntu',
      pem: '/home/runner/.ssh/id_rsa'
    }
  },
  meteor: {
    name: 'appName',
    path: '../../',
    docker: {
      image: 'zodern/meteor:latest',
    },
    servers: {
      one: {}
    },
    buildOptions: {
      serverOnly: true
    },
    env: {
      PORT: 3000,
      ROOT_URL: 'http://dev-api.appName.com/',
      NODE_ENV: 'production',
      MAIL_URL: 'smtp://xxxx:xxx/eLPCB3nw3jubkq:@email-smtp.eu-north-1.amazonaws.com:587',
      MONGO_URL: 'mongodb+srv://xxx:xx@xxx.iiitd.mongodb.net/Development?retryWrites=true&w=majority'
    },
    deployCheckWaitTime: 15
  },
  proxy: {
    domains: 'dev.xxx.com',
    ssl: {
      letsEncryptEmail: 'info@xxx.com'
    }
  }
};

Any idea what might cause this issue?
I don't know why, but according to the MUP docs the correct image name is zodern/meteor:root. If your app is slow to start, increase deployCheckWaitTime. In my more complex apps I set it to 600, just to make sure the app is up.
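Roughly speaking, the Verifying Deployment step keeps trying to reach the app on its port (the log shows curl against 172.17.0.2:3000) until a deadline passes, and deployCheckWaitTime sets that deadline. The sketch below is a rough Python equivalent of such a poll, handy for checking the port by hand on the server. wait_for_port is a hypothetical helper, and the host, port and timeout values are assumptions taken from the question.

```python
import socket
import time


def wait_for_port(host, port, timeout, interval=0.5):
    """Poll host:port until a TCP connection succeeds or `timeout` seconds pass.

    Returns True as soon as something accepts the connection, False otherwise.
    "Connection refused" only means nothing is listening yet (or the container
    keeps restarting), so it is retried until the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example (values from the mup log and mup.js above):
# wait_for_port("172.17.0.2", 3000, timeout=15)
```

A longer wait only helps a slow-starting app. In the docker ps output above both containers are stuck in Restarting, so nothing ever listens on port 3000; in that case the container logs (docker logs appName) are the place to look.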
RabbitMQ cannot start after upgrading Azure Kubernetes Service (AKS)
I had the same problem as @Amir Soleimani, but the error was a bit different. I tried all the solutions in that post, but none of them worked. I'm using Azure Kubernetes Service (AKS), and after upgrading from 1.13.xx to 1.18.xx RabbitMQ can't start anymore.

UPDATE: the solution that worked for me was to remove the current RabbitMQ StatefulSet, including its persistent disks. Use this approach with care, as it may affect your existing queues.

Here is my StatefulSet file:

apiVersion: v1
kind: Service
metadata:
  name: rabbitmq-management
  labels:
    app: rabbitmq
spec:
  ports:
  - port: 80
    targetPort: 15672
    name: http
  selector:
    app: rabbitmq
  type: LoadBalancer
---
apiVersion: v1
kind: Service
metadata:
  name: rabbitmq
  labels:
    app: rabbitmq
spec:
  ports:
  - port: 5672
    name: amqp
  - port: 4369
    name: epmd
  - port: 25672
    name: rabbitmq-dist
  clusterIP: None
  selector:
    app: rabbitmq
---
apiVersion: v1
kind: Secret
metadata:
  name: rabbitmq-config
  namespace: default
type: Opaque
data:
  erlang.cookie: samplecookie==
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: rabbitmq
  labels:
    app: rabbitmq
spec:
  serviceName: rabbitmq
  selector:
    matchLabels:
      app: rabbitmq
  replicas: 3
  template:
    metadata:
      labels:
        app: rabbitmq
    spec:
      containers:
      - name: rabbitmq
        image: 'rabbitmq:3.6.6-management-alpine'
        lifecycle:
          postStart:
            exec:
              command:
              - /bin/sh
              - -c
              - >
                if [ -z "$(grep rabbitmq /etc/resolv.conf)" ]; then
                sed "s/^search \([^ ]\+\)/search rabbitmq.\1 \1/" /etc/resolv.conf > /etc/resolv.conf.new;
                cat /etc/resolv.conf.new > /etc/resolv.conf;
                rm /etc/resolv.conf.new;
                fi;
                until rabbitmqctl node_health_check; do sleep 1; done;
                if [[ "$HOSTNAME" != "rabbitmq-0" && -z "$(rabbitmqctl cluster_status | grep rabbitmq-0)" ]]; then
                rabbitmqctl stop_app;
                rabbitmqctl join_cluster rabbit@rabbitmq-0;
                rabbitmqctl start_app;
                fi;
                rabbitmqctl set_policy ha-all "." '{"ha-mode":"exactly","ha-params":3,"ha-sync-mode":"automatic"}'
        env:
        - name: RABBITMQ_ERLANG_COOKIE
          valueFrom:
            secretKeyRef:
              name: rabbitmq-config
              key: erlang.cookie
        - name: RABBITMQ_DEFAULT_USER
          value: username
        - name: RABBITMQ_DEFAULT_PASS
          value: password
        ports:
        - containerPort: 5672
          name: amqp
        - containerPort: 15672
          name: amqp-management
        volumeMounts:
        - mountPath: /var/lib/rabbitmq
          name: volume
  volumeClaimTemplates:
  - metadata:
      name: volume
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi

Result of kubectl describe pod rabbitmq-0:

DIAGNOSTICS =========== attempted to contact: ['rabbit@rabbitmq-0'] rabbit@rabbitmq-0: * connected to epmd (port 4369) on rabbitmq-0 * epmd reports: node 'rabbit' not running at all no other nodes on rabbitmq-0 * suggestion: start the node current node details: - node name: 'rabbitmq-cli-91@rabbitmq-0' - home dir: /var/lib/rabbitmq - cookie hash: P1XNOe5pN3Ug2FCRFzH7Xg== Error: unable to connect to node 'rabbit@rabbitmq-0': nodedown DIAGNOSTICS =========== attempted to contact: ['rabbit@rabbitmq-0'] rabbit@rabbitmq-0: * connected to epmd (port 4369) on rabbitmq-0 * epmd reports: node 'rabbit' not running at all no other nodes on rabbitmq-0 * suggestion: start the node current node details: - node name: 'rabbitmq-cli-26@rabbitmq-0' - home dir: /var/lib/rabbitmq - cookie hash: P1XNOe5pN3Ug2FCRFzH7Xg== Error: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}} Error: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}} Error: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}} Error: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}} Error: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}} Error: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}} Error: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}} Error: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}} Error:
{aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}} Error: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}} Error: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}} Error: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}} Error: rabbit application is not running on node rabbit@rabbitmq-0. * Suggestion: start it with "rabbitmqctl start_app" and try again , message: "Timeout: 70.0 seconds ...\nChecking health of node 'rabbit@rabbitmq-0' ...\nTimeout: 70.0 seconds ...\nChecking health of node 'rabbit@rabbitmq-0' ...\nTimeout: 70.0 seconds ...\nChecking health of node 'rabbit@rabbitmq-0' ...\nTimeout: 70.0 seconds ...\nChecking health of node 'rabbit@rabbitmq-0' ...\nTimeout: 70.0 seconds ...\nChecking health of node 'rabbit@rabbitmq-0' ...\nTimeout: 70.0 seconds ...\nChecking health of node 'rabbit@rabbitmq-0' ...\nTimeout: 70.0 seconds ...\nChecking health of node 'rabbit@rabbitmq-0' ...\nTimeout: 70.0 seconds ...\nChecking health of node 'rabbit@rabbitmq-0' ...\nTimeout: 70.0 seconds ...\nChecking health of node 'rabbit@rabbitmq-0' ...\nTimeout: 70.0 seconds ...\nChecking health of node 'rabbit@rabbitmq-0' ...\nTimeout: 70.0 seconds ...\nChecking health of node 'rabbit@rabbitmq-0' ...\nTimeout: 70.0 seconds ...\nChecking health of node 'rabbit@rabbitmq-0' ...\nError: unable to connect to node 'rabbit@rabbitmq-0': nodedown\n\nDIAGNOSTICS\n===========\n\nattempted to contact: ['rabbit@rabbitmq-0']\n\nrabbit@rabbitmq-0:\n * connected to epmd (port 4369) on rabbitmq-0\n * epmd reports: node 'rabbit' not running at all\n no other nodes on rabbitmq-0\n * suggestion: start the node\n\ncurrent node details:\n- node name: 'rabbitmq-cli-91@rabbitmq-0'\n- home dir: /var/lib/rabbitmq\n- cookie hash: P1XNOe5pN3Ug2FCRFzH7Xg==\n\nError: unable to connect to node 'rabbit@rabbitmq-0': nodedown\n\nDIAGNOSTICS\n===========\n\nattempted to contact: ['rabbit@rabbitmq-0']\n\nrabbit@rabbitmq-0:\n * connected to epmd
(port 4369) on rabbitmq-0\n * epmd reports: node 'rabbit' not running at all\n no other nodes on rabbitmq-0\n * suggestion: start the node\n\ncurrent node details:\n- node name: 'rabbitmq-cli-26@rabbitmq-0'\n- home dir: /var/lib/rabbitmq\n- cookie hash: P1XNOe5pN3Ug2FCRFzH7Xg==\n\nError: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}\nError: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}\nError: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}\nError: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}\nError: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}\nError: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}\nError: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}\nError: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}\nError: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}\nError: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}\nError: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}\nError: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}\nError: rabbit application is not running on node rabbit@rabbitmq-0.\n * Suggestion: start it with \"rabbitmqctl start_app\" and try again\n"
Warning FailedPostStartHook 23m kubelet Exec lifecycle hook ([/bin/sh -c if [ -z "$(grep rabbitmq /etc/resolv.conf)" ]; then sed "s/^search \([^ ]\+\)/search rabbitmq.\1 \1/" /etc/resolv.conf > /etc/resolv.conf.new; cat /etc/resolv.conf.new > /etc/resolv.conf; rm /etc/resolv.conf.new; fi; until rabbitmqctl node_health_check; do sleep 1; done; if [[ "$HOSTNAME" != "rabbitmq-0" && -z "$(rabbitmqctl cluster_status | grep rabbitmq-0)" ]]; then rabbitmqctl stop_app; rabbitmqctl join_cluster rabbit@rabbitmq-0; rabbitmqctl start_app; fi; rabbitmqctl set_policy ha-all "."
'{"ha-mode":"exactly","ha-params":3,"ha-sync-mode":"automatic"}' ]) for Container "rabbitmq" in Pod "rabbitmq-0_default(3ac91d73-de7b-4cde-81f6-c31bacd10252)" failed - error: command '/bin/sh -c if [ -z "$(grep rabbitmq /etc/resolv.conf)" ]; then sed "s/^search \([^ ]\+\)/search rabbitmq.\1 \1/" /etc/resolv.conf > /etc/resolv.conf.new; cat /etc/resolv.conf.new > /etc/resolv.conf; rm /etc/resolv.conf.new; fi; until rabbitmqctl node_health_check; do sleep 1; done; if [[ "$HOSTNAME" != "rabbitmq-0" && -z "$(rabbitmqctl cluster_status | grep rabbitmq-0)" ]]; then rabbitmqctl stop_app; rabbitmqctl join_cluster rabbit@rabbitmq-0; rabbitmqctl start_app; fi; rabbitmqctl set_policy ha-all "." '{"ha-mode":"exactly","ha-params":3,"ha-sync-mode":"automatic"}' ' exited with 137: Error: unable to connect to node 'rabbit@rabbitmq-0': nodedown

Result of kubectl logs rabbitmq-0:

=CRASH REPORT==== 18-Jul-2021::11:06:01 === crasher: initial call: application_master:init/4 pid: <0.156.0> registered_name: [] exception exit: {{timeout_waiting_for_tables, [rabbit_user,rabbit_user_permission,rabbit_vhost, rabbit_durable_route,rabbit_durable_exchange, rabbit_runtime_parameters,rabbit_durable_queue]}, {rabbit,start,[normal,[]]}} in function application_master:init/4 (application_master.erl, line 134) ancestors: [<0.155.0>] messages: [{'EXIT',<0.157.0>,normal}] links: [<0.155.0>,<0.31.0>] dictionary: [] trap_exit: true status: running heap_size: 987 stack_size: 27 reductions: 98 neighbours: =INFO REPORT==== 18-Jul-2021::11:06:01 === application: rabbit exited: {{timeout_waiting_for_tables, [rabbit_user,rabbit_user_permission,rabbit_vhost, rabbit_durable_route,rabbit_durable_exchange, rabbit_runtime_parameters,rabbit_durable_queue]}, {rabbit,start,[normal,[]]}} type: temporary =INFO REPORT==== 18-Jul-2021::11:06:01 === application: amqp_client exited: stopped type: temporary =INFO REPORT==== 18-Jul-2021::11:06:01 === application: rabbit_common exited: stopped type: temporary =INFO REPORT====
18-Jul-2021::11:06:01 === application: xmerl exited: stopped type: temporary =INFO REPORT==== 18-Jul-2021::11:06:01 === application: os_mon exited: stopped type: temporary =INFO REPORT==== 18-Jul-2021::11:06:01 === application: inets exited: stopped type: temporary =INFO REPORT==== 18-Jul-2021::11:06:01 === application: asn1 exited: stopped type: temporary =INFO REPORT==== 18-Jul-2021::11:06:01 === application: syntax_tools exited: stopped type: temporary =INFO REPORT==== 18-Jul-2021::11:06:01 === application: mnesia exited: stopped type: temporary =INFO REPORT==== 18-Jul-2021::11:06:01 === application: crypto exited: stopped type: temporary =INFO REPORT==== 18-Jul-2021::11:06:01 === application: ranch exited: stopped type: temporary =INFO REPORT==== 18-Jul-2021::11:06:01 === application: compiler exited: stopped type: temporary BOOT FAILED =========== Timeout contacting cluster nodes: ['rabbit@rabbitmq-1','rabbit@rabbitmq-2']. BACKGROUND ========== This cluster node was shut down while other nodes were still running. To avoid losing data, you should start the other nodes first, then start this one. To force this node to start, first invoke "rabbitmqctl force_boot". If you do so, any changes made on other cluster nodes after this one was shut down may be lost. DIAGNOSTICS =========== attempted to contact: ['rabbit@rabbitmq-1','rabbit@rabbitmq-2'] rabbit@rabbitmq-1: * unable to connect to epmd (port 4369) on rabbitmq-1: nxdomain (non-existing domain) rabbit@rabbitmq-2: * unable to connect to epmd (port 4369) on rabbitmq-2: nxdomain (non-existing domain) current node details: - node name: 'rabbit@rabbitmq-0' - home dir: /var/lib/rabbitmq - cookie hash: P1XNOe5pN3Ug2FCRFzH7Xg== =INFO REPORT==== 18-Jul-2021::11:06:01 === Timeout contacting cluster nodes: ['rabbit@rabbitmq-1','rabbit@rabbitmq-2']. BACKGROUND ========== This cluster node was shut down while other nodes were still running. To avoid losing data, you should start the other nodes first, then start this one.
To force this node to start, first invoke "rabbitmqctl force_boot". If you do so, any changes made on other cluster nodes after this one was shut down may be lost. DIAGNOSTICS =========== attempted to contact: ['rabbit@rabbitmq-1','rabbit@rabbitmq-2'] rabbit@rabbitmq-1: * unable to connect to epmd (port 4369) on rabbitmq-1: nxdomain (non-existing domain) rabbit@rabbitmq-2: * unable to connect to epmd (port 4369) on rabbitmq-2: nxdomain (non-existing domain) current node details: - node name: 'rabbit@rabbitmq-0' - home dir: /var/lib/rabbitmq - cookie hash: P1XNOe5pN3Ug2FCRFzH7Xg== {"init terminating in do_boot",timeout_waiting_for_tables} init terminating in do_boot (timeout_waiting_for_tables) Crash dump is being written to: erl_crash.dump...

What I tried that didn't work:

- rabbitmqctl stop_app
- rabbitmqctl force_boot
- Removing the StatefulSet and reinstalling it
- Reconfiguring the YAML file
Try running force_boot in the postStart script, before the health-check loop:

...
fi;
if [[ "$HOSTNAME" == "rabbitmq-0" ]]; then rabbitmqctl stop_app; rabbitmqctl force_boot; fi;
until rabbitmqctl node_health_check; do sleep 1; done;
...
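Why this helps: the boot log shows rabbit@rabbitmq-0 refusing to start until it can reach rabbitmq-1 and rabbitmq-2, which fail with nxdomain. The StatefulSet in the question sets no podManagementPolicy, so the default OrderedReady applies, and Kubernetes does not create rabbitmq-1 until rabbitmq-0 is Running and Ready. The nodes therefore wait on each other. The toy model below (a hypothetical function, not Kubernetes or RabbitMQ code) shows that forcing only the first node to boot breaks the cycle.

```python
def ordered_ready_startup(replicas, force_boot_first):
    """Toy model of a StatefulSet with podManagementPolicy: OrderedReady.

    Simplifying assumptions (hypothetical, for illustration only):
      * pod N is created only after pods 0..N-1 are Ready;
      * rabbitmq-0 was shut down while its peers were running, so on restart
        it waits for them unless `rabbitmqctl force_boot` was run;
      * every later pod can start once rabbitmq-0 is up and then join it.
    Returns the ordinals that become Ready.
    """
    ready = []
    for ordinal in range(replicas):
        if ordinal == 0 and not force_boot_first:
            # rabbitmq-0 waits for rabbitmq-1/2, but those pods are only created
            # after rabbitmq-0 is Ready (hence nxdomain): nothing becomes Ready.
            break
        ready.append(ordinal)
    return ready


print(ordered_ready_startup(3, force_boot_first=False))  # []
print(ordered_ready_startup(3, force_boot_first=True))   # [0, 1, 2]
```

Setting podManagementPolicy: Parallel on the StatefulSet is another way to remove the ordering dependency, though that was not tried in this thread.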
Cannot mount volume to pod in Kubernetes using Azure file provisioner
I cannot mount volumes into pods in Kubernetes using the Azure File CSI driver in the Azure cloud. The error message I get in the pod is:

Warning  FailedMount  38s  kubelet  Unable to attach or mount volumes: unmounted volumes=[sensu-backend-etcd], unattached volumes=[default-token-42kfh sensu-backend-etcd sensu-asset-server-ca-cert]: timed out waiting for the condition

My StorageClass looks like this:

items:
- allowVolumeExpansion: true
  apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"allowVolumeExpansion":true,"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"},"name":"azure-csi-standard-lrs"},"mountOptions":["dir_mode=0640","file_mode=0640","uid=0","gid=0","mfsymlinks","cache=strict","nosharesock"],"parameters":{"location":"eastus","resourceGroup":"kubernetes-resource-group","shareName":"kubernetes","skuName":"Standard_LRS","storageAccount":"kubernetesrf"},"provisioner":"kubernetes.io/azure-file","reclaimPolicy":"Delete","volumeBindingMode":"Immediate"}
      storageclass.kubernetes.io/is-default-class: "true"
    creationTimestamp: "2020-12-21T19:16:19Z"
    managedFields:
    - apiVersion: storage.k8s.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:allowVolumeExpansion: {}
        f:metadata:
          f:annotations:
            .: {}
            f:kubectl.kubernetes.io/last-applied-configuration: {}
            f:storageclass.kubernetes.io/is-default-class: {}
        f:mountOptions: {}
        f:parameters:
          .: {}
          f:location: {}
          f:resourceGroup: {}
          f:shareName: {}
          f:skuName: {}
          f:storageAccount: {}
        f:provisioner: {}
        f:reclaimPolicy: {}
        f:volumeBindingMode: {}
      manager: kubectl-client-side-apply
      operation: Update
      time: "2020-12-21T19:16:19Z"
    name: azure-csi-standard-lrs
    resourceVersion: "15914"
    selfLink: /apis/storage.k8s.io/v1/storageclasses/azure-csi-standard-lrs
    uid: 3de65d08-14e7-4d0b-a6fe-39ab9a714191
  mountOptions:
  - dir_mode=0640
  - file_mode=0640
  - uid=0
  - gid=0
  - mfsymlinks
  - cache=strict
  - nosharesock
  parameters:
    location: eastus
    resourceGroup: kubernetes-resource-group
    shareName: kubernetes
    skuName: Standard_LRS
    storageAccount: kubernetesrf
  provisioner: kubernetes.io/azure-file
  reclaimPolicy: Delete
  volumeBindingMode: Immediate
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

My PV and PVC are bound:

sensu-backend-etcd   10Gi   RWX   Retain   Bound   sensu-system/sensu-backend-etcd   azure-csi-standard-lrs   4m31s

NAME                 STATUS   VOLUME               CAPACITY   ACCESS MODES   STORAGECLASS             AGE
sensu-backend-etcd   Bound    sensu-backend-etcd   10Gi       RWX            azure-csi-standard-lrs   4m47s

In the kubelet log I get the following:

Dez 21 19:26:37 kubernetes-3 kubelet[34828]: E1221 19:26:37.766476 34828 pod_workers.go:191] Error syncing pod bab5a69a-f8af-43f1-a3ae-642de8daa05d ("sensu-backend-0_sensu-system(bab5a69a-f8af-43f1-a3ae-642de8daa05d)"), skipping: unmounted volumes=[sensu-backend-etcd], unattached volumes=[sensu-backend-etcd sensu-asset-server-ca-cert default-token-42kfh]: timed out waiting for the condition
Dez 21 19:26:58 kubernetes-3 kubelet[34828]: I1221 19:26:58.002474 34828 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "sensu-backend-etcd" (UniqueName: "kubernetes.io/csi/file.csi.azure.com^sensu-backend-etcd") pod "sensu-backend-0" (UID: "bab5a69a-f8af-43f1-a3ae-642de8daa05d")
Dez 21 19:26:58 kubernetes-3 kubelet[34828]: E1221 19:26:58.006699 34828 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/file.csi.azure.com^sensu-backend-etcd podName: nodeName:}" failed. No retries permitted until 2020-12-21 19:29:00.006639988 +0000 UTC m=+3608.682310977 (durationBeforeRetry 2m2s).
Error: "Volume not attached according to node status for volume \"sensu-backend-etcd\" (UniqueName: \"kubernetes.io/csi/file.csi.azure.com^sensu-backend-etcd\") pod \"sensu-backend-0\" (UID: \"bab5a69a-f8af-43f1-a3ae-642de8daa05d\") "
Dez 21 19:28:51 kubernetes-3 kubelet[34828]: E1221 19:28:51.768309 34828 kubelet.go:1594] Unable to attach or mount volumes for pod "sensu-backend-0_sensu-system(bab5a69a-f8af-43f1-a3ae-642de8daa05d)": unmounted volumes=[sensu-backend-etcd], unattached volumes=[sensu-backend-etcd sensu-asset-server-ca-cert default-token-42kfh]: timed out waiting for the condition; skipping pod
Dez 21 19:28:51 kubernetes-3 kubelet[34828]: E1221 19:28:51.768335 34828 pod_workers.go:191] Error syncing pod bab5a69a-f8af-43f1-a3ae-642de8daa05d ("sensu-backend-0_sensu-system(bab5a69a-f8af-43f1-a3ae-642de8daa05d)"), skipping: unmounted volumes=[sensu-backend-etcd], unattached volumes=[sensu-backend-etcd sensu-asset-server-ca-cert default-token-42kfh]: timed out waiting for the condition
Dez 21 19:29:00 kubernetes-3 kubelet[34828]: I1221 19:29:00.103881 34828 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "sensu-backend-etcd" (UniqueName: "kubernetes.io/csi/file.csi.azure.com^sensu-backend-etcd") pod "sensu-backend-0" (UID: "bab5a69a-f8af-43f1-a3ae-642de8daa05d")
Dez 21 19:29:00 kubernetes-3 kubelet[34828]: E1221 19:29:00.108069 34828 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/file.csi.azure.com^sensu-backend-etcd podName: nodeName:}" failed. No retries permitted until 2020-12-21 19:31:02.108044076 +0000 UTC m=+3730.783715065 (durationBeforeRetry 2m2s).
Error: "Volume not attached according to node status for volume \"sensu-backend-etcd\" (UniqueName: \"kubernetes.io/csi/file.csi.azure.com^sensu-backend-etcd\") pod \"sensu-backend-0\" (UID: \"bab5a69a-f8af-43f1-a3ae-642de8daa05d\") "
Dez 21 19:31:02 kubernetes-3 kubelet[34828]: I1221 19:31:02.169246 34828 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "sensu-backend-etcd" (UniqueName: "kubernetes.io/csi/file.csi.azure.com^sensu-backend-etcd") pod "sensu-backend-0" (UID: "bab5a69a-f8af-43f1-a3ae-642de8daa05d")
Dez 21 19:31:02 kubernetes-3 kubelet[34828]: E1221 19:31:02.172474 34828 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/file.csi.azure.com^sensu-backend-etcd podName: nodeName:}" failed. No retries permitted until 2020-12-21 19:33:04.172432877 +0000 UTC m=+3852.848103766 (durationBeforeRetry 2m2s). Error: "Volume not attached according to node status for volume \"sensu-backend-etcd\" (UniqueName: \"kubernetes.io/csi/file.csi.azure.com^sensu-backend-etcd\") pod \"sensu-backend-0\" (UID: \"bab5a69a-f8af-43f1-a3ae-642de8daa05d\") "
Dez 21 19:31:09 kubernetes-3 kubelet[34828]: E1221 19:31:09.766084 34828 kubelet.go:1594] Unable to attach or mount volumes for pod "sensu-backend-0_sensu-system(bab5a69a-f8af-43f1-a3ae-642de8daa05d)": unmounted volumes=[sensu-backend-etcd], unattached volumes=[default-token-42kfh sensu-backend-etcd sensu-asset-server-ca-cert]: timed out waiting for the condition; skipping pod

In the kube-controller-manager pod I get:

E1221 20:21:34.069309 1 csi_attacher.go:500] kubernetes.io/csi: attachdetacher.WaitForDetach timeout after 2m0s [volume=sensu-backend-etcd; attachment.ID=csi-9a83de4bef35f5d01e10e3a7d598204c459cac705371256e818e3a35b4b29e4e]
E1221 20:21:34.069453 1 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/file.csi.azure.com^sensu-backend-etcd podName: nodeName:kubernetes-3}" failed.
No retries permitted until 2020-12-21 20:21:34.569430175 +0000 UTC m=+6862.322990347 (durationBeforeRetry 500ms). Error: "AttachVolume.Attach failed for volume \"sensu-backend-etcd\" (UniqueName: \"kubernetes.io/csi/file.csi.azure.com^sensu-backend-etcd\") from node \"kubernetes-3\" : attachdetachment timeout for volume sensu-backend-etcd"
I1221 20:21:34.069757 1 event.go:291] "Event occurred" object="sensu-system/sensu-backend-0" kind="Pod" apiVersion="v1" type="Warning" reason="FailedAttachVolume" message="AttachVolume.Attach failed for volume \"sensu-backend-etcd\" : attachdetachment timeout for volume sensu-backend-etcd"

Does anyone know this error and how to mitigate it? Thanks in advance.

Best regards,
rforberger
I fixed it. I switched to the disk.csi.azure.com provisioner, and in the PV I had to reference the volume by its Azure resource ID, like this: volumeHandle: /subscriptions/XXXXXXXXXXXXXXXXXXXXXX/resourcegroups/kubernetes-resource-group/providers/Microsoft.Compute/disks/sensu-backend-etcd. I also had some mount options in the PV that did not work with the Azure disk provisioner.
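A sketch of a PersistentVolume along the lines the answer describes, built as a Python dict and printed as JSON (kubectl apply -f accepts JSON as well as YAML). The subscription ID is a placeholder for the elided one, the storage class name "azure-disk-static" is hypothetical (the PVC must request the same class), and fsType ext4 is an assumption. Azure managed disks attach to one node at a time, so the access mode is ReadWriteOnce rather than the RWX used with Azure Files, and no mountOptions are set.

```python
import json


def azure_disk_volume_handle(subscription_id, resource_group, disk_name):
    """Build the Azure resource ID used as volumeHandle for disk.csi.azure.com."""
    return (f"/subscriptions/{subscription_id}/resourcegroups/{resource_group}"
            f"/providers/Microsoft.Compute/disks/{disk_name}")


def azure_disk_pv(name, volume_handle, storage_class, size="10Gi"):
    """Static PersistentVolume for an existing Azure managed disk (illustrative)."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            # An Azure disk attaches to a single node, so RWX is not possible here.
            "accessModes": ["ReadWriteOnce"],
            "persistentVolumeReclaimPolicy": "Retain",
            "storageClassName": storage_class,
            # No mountOptions: the Azure Files options (dir_mode, uid, ...) are dropped.
            "csi": {
                "driver": "disk.csi.azure.com",
                "volumeHandle": volume_handle,
                "fsType": "ext4",  # assumed filesystem
            },
        },
    }


handle = azure_disk_volume_handle(
    "00000000-0000-0000-0000-000000000000",  # placeholder subscription ID
    "kubernetes-resource-group",
    "sensu-backend-etcd",
)
pv = azure_disk_pv("sensu-backend-etcd", handle, storage_class="azure-disk-static")  # hypothetical class
print(json.dumps(pv, indent=2))  # save to pv.json and run: kubectl apply -f pv.json
```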
Error while sending query request from client: No peer available to query
I get the following error when sending a query request from my client:

FabricError: No peers available to query. Errors: ["Failed to connect before the deadline URL:grpcs://localhost:12051","Failed to connect before the deadline URL:grpcs://localhost:11051"].

This is part of my connection-org3.json connection profile:

"organizations": {
    "Org3": {
        "mspid": "Org3MSP",
        "peers": [
            "peer0.org3.bc4scm.de",
            "peer1.org3.bc4scm.de"
        ],
        "certificateAuthorities": [
            "ca.org3.bc4scm.de"
        ]
    }
},
"peers": {
    "peer0.org3.bc4scm.de": {
        "url": "grpcs://localhost:11051",
        "tlsCACerts": {
            "path": "crypto-config/peerOrganizations/org3.bc4scm.de/tlsca/tlsca.org3.bc4scm.de-cert.pem"
        },
        "grpcOptions": {
            "ssl-target-name-override": "peer0.org3.bc4scm.de"
        }
    },
    "peer1.org3.bc4scm.de": {
        "url": "grpcs://localhost:12051",
        "tlsCACerts": {
            "path": "crypto-config/peerOrganizations/supplier.bc4scm.de/tlsca/tlsca.org3.bc4scm.de-cert.pem"
        },
        "grpcOptions": {
            "ssl-target-name-override": "peer1.org3.bc4scm.de"
        }
    }
},
"certificateAuthorities": {
    "ca.org3.bc4scm.de": {
        "url": "https://localhost:9054",
        "caName": "ca-supplier",
        "tlsCACerts": {
            "path": "crypto-config/peerOrganizations/org3.bc4scm.de/tlsca/tlsca.org3.bc4scm.de-cert.pem"
        },
        "httpOptions": {
            "verify": false
        }
    }
}

And this is part of my Docker Compose file:
peer0.org3.bc4scm.de:
  container_name: peer0.org3.bc4scm.de
  extends:
    file: peer-base.yaml
    service: peer-base
  environment:
    - CORE_PEER_ID=peer0.org3.bc4scm.de
    - CORE_PEER_ADDRESS=peer0.org3.bc4scm.de:11051
    - CORE_PEER_LISTENADDRESS=0.0.0.0:11051
    - CORE_PEER_CHAINCODEADDRESS=peer0.org3.bc4scm.de:11052
    - CORE_PEER_CHAINCODELISTENADDRESS=0.0.0.0:11052
    - CORE_PEER_GOSSIP_EXTERNALENDPOINT=peer0.org3.bc4scm.de:12051
    - CORE_PEER_GOSSIP_BOOTSTRAP=peer1.org3.bc4scm.de:11051
    - CORE_PEER_LOCALMSPID=Org3MSP
  volumes:
    - /var/run/:/host/var/run/
    - ../crypto-config/peerOrganizations/org3.bc4scm.de/peers/peer0.org3.bc4scm.de/msp:/etc/hyperledger/fabric/msp
    - ../crypto-config/peerOrganizations/org3.bc4scm.de/peers/peer0.org3.bc4scm.de/tls:/etc/hyperledger/fabric/tls
    - peer0.org3.bc4scm.de:/var/hyperledger/production
  ports:
    - 11051:11051

peer1.org3.bc4scm.de:
  container_name: peer1.org3.bc4scm.de
  extends:
    file: peer-base.yaml
    service: peer-base
  environment:
    - CORE_PEER_ID=peer1.org3.bc4scm.de
    - CORE_PEER_ADDRESS=peer1.org3.bc4scm.de:12051
    - CORE_PEER_LISTENADDRESS=0.0.0.0:12051
    - CORE_PEER_CHAINCODEADDRESS=peer1.org3.bc4scm.de:12052
    - CORE_PEER_CHAINCODELISTENADDRESS=0.0.0.0:12052
    - CORE_PEER_GOSSIP_EXTERNALENDPOINT=peer1.org3.bc4scm.de:11051
    - CORE_PEER_GOSSIP_BOOTSTRAP=peer0.org3.bc4scm.de:12051
    - CORE_PEER_LOCALMSPID=Org3MSP
  volumes:
    - /var/run/:/host/var/run/
    - ../crypto-config/peerOrganizations/org3.bc4scm.de/peers/peer1.org3.bc4scm.de/msp:/etc/hyperledger/fabric/msp
    - ../crypto-config/peerOrganizations/supplier.bc4scm.de/peers/peer1.org3.bc4scm.de/tls:/etc/hyperledger/fabric/tls
    - peer1.org3.bc4scm.de:/var/hyperledger/production
  ports:
    - 12051:12051

I took this code from the Fabcar sample and tried to query from a client in Org3 instead of Org1. I successfully created an admin user and then a regular user in this organization. As far as I can tell, the error is thrown by this line:
const result = await contract.evaluateTransaction('queryAllProducts','123');

What could be causing this? I would appreciate any insights.

Update: I checked the open ports in peer0.org3.bc4scm.de:

root@e52992a76c3d:/opt/gopath/src/github.com/hyperledger/fabric/peer# netstat -tulpn | grep LISTEN
tcp        0      0 127.0.0.1:9443          0.0.0.0:*               LISTEN      1/peer
tcp        0      0 127.0.0.11:46353        0.0.0.0:*               LISTEN      -
tcp6       0      0 :::11051                :::*                    LISTEN      1/peer
tcp6       0      0 :::6060                 :::*                    LISTEN      1/peer
tcp6       0      0 :::11052                :::*                    LISTEN      1/peer

Ports 11051 and 11052 are open and listening. There is also a container for the installed chaincode:

cd0b165e5186 dev-peer0.org3.bc4scm.de-scmlogic-1.0-9c7e776aa8a752e530f79d0b456f1bda28aac3f5db0af734be2f315d8d1a4f53 "/bin/sh -c 'cd /usr…" 48 seconds ago Up 47 seconds dev-peer0.org3.bc4scm.de-scmlogic-1.0

The logs of that peer (peer0.org3) continuously print the following warnings, complaining about the connection to org1:

2019-07-06 10:26:52.278 UTC [gossip.discovery] expireDeadMembers -> WARN 164 Exiting
2019-07-06 10:26:56.381 UTC [gossip.comm] func1 -> WARN 165 peer1.org1.bc4scm.de:8051, PKIid:42214b7584f3fabcdb84e5770c62e4cf0f7c00b2a9d0441d772925882d4457a7 isn't responsive: EOF
2019-07-06 10:26:56.381 UTC [gossip.discovery] expireDeadMembers -> WARN 166 Entering [42214b7584f3fabcdb84e5770c62e4cf0f7c00b2a9d0441d772925882d4457a7]
2019-07-06 10:26:56.381 UTC [gossip.discovery] expireDeadMembers -> WARN 167 Closing connection to Endpoint: peer1.org1.bc4scm.de:8051, InternalEndpoint: , PKI-ID: 42214b7584f3fabcdb84e5770c62e4cf0f7c00b2a
You could first check whether the peer is reachable at all, even from a browser such as Firefox: request localhost:11051. If you get a response, the peer is reachable; if not, the port is not open. In that case, open (publish) the port in the Docker Compose file, bring the peer up again with docker-compose, and do the same for every peer you want to access. You can also follow a peer's logs with:

docker logs --follow peer0.org3.bc4scm.de

Update: check CORE_PEER_GOSSIP_BOOTSTRAP and CORE_PEER_GOSSIP_EXTERNALENDPOINT for both peers:

CORE_PEER_GOSSIP_BOOTSTRAP=<a list of peer endpoints within the peer's org>
CORE_PEER_GOSSIP_EXTERNALENDPOINT=<the peer endpoint, as known outside the org>

For peer0.org3.bc4scm.de:

CORE_PEER_GOSSIP_BOOTSTRAP=peer1.org3.bc4scm.de:12051
CORE_PEER_GOSSIP_EXTERNALENDPOINT=peer0.org3.bc4scm.de:11051

For peer1.org3.bc4scm.de:

CORE_PEER_GOSSIP_BOOTSTRAP=peer0.org3.bc4scm.de:11051
CORE_PEER_GOSSIP_EXTERNALENDPOINT=peer1.org3.bc4scm.de:12051

Make sure the ports match your peers, then bring the containers up again with your Docker Compose file.
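The mismatch described in the update can be checked mechanically. A minimal sketch, assuming two simplified rules for a single-org setup like this one: a peer's CORE_PEER_GOSSIP_EXTERNALENDPOINT should equal its own CORE_PEER_ADDRESS, and each CORE_PEER_GOSSIP_BOOTSTRAP entry should be another peer's CORE_PEER_ADDRESS. The helper name is hypothetical, and bootstrap entries are assumed to be whitespace-separated.

```python
def check_gossip(peers):
    """Return likely gossip misconfigurations in a {peer_name: env_dict} mapping."""
    problems = []
    addresses = {name: env["CORE_PEER_ADDRESS"] for name, env in peers.items()}
    for name, env in peers.items():
        external = env.get("CORE_PEER_GOSSIP_EXTERNALENDPOINT")
        if external != addresses[name]:
            problems.append(f"{name}: EXTERNALENDPOINT {external} != ADDRESS {addresses[name]}")
        others = {addr for peer, addr in addresses.items() if peer != name}
        for entry in env.get("CORE_PEER_GOSSIP_BOOTSTRAP", "").split():
            if entry not in others:
                problems.append(f"{name}: BOOTSTRAP {entry} is not another peer's ADDRESS")
    return problems


ASKED = {  # values from the compose file in the question
    "peer0.org3.bc4scm.de": {
        "CORE_PEER_ADDRESS": "peer0.org3.bc4scm.de:11051",
        "CORE_PEER_GOSSIP_EXTERNALENDPOINT": "peer0.org3.bc4scm.de:12051",
        "CORE_PEER_GOSSIP_BOOTSTRAP": "peer1.org3.bc4scm.de:11051",
    },
    "peer1.org3.bc4scm.de": {
        "CORE_PEER_ADDRESS": "peer1.org3.bc4scm.de:12051",
        "CORE_PEER_GOSSIP_EXTERNALENDPOINT": "peer1.org3.bc4scm.de:11051",
        "CORE_PEER_GOSSIP_BOOTSTRAP": "peer0.org3.bc4scm.de:12051",
    },
}

FIXED = {  # values suggested in the update above
    "peer0.org3.bc4scm.de": {
        "CORE_PEER_ADDRESS": "peer0.org3.bc4scm.de:11051",
        "CORE_PEER_GOSSIP_EXTERNALENDPOINT": "peer0.org3.bc4scm.de:11051",
        "CORE_PEER_GOSSIP_BOOTSTRAP": "peer1.org3.bc4scm.de:12051",
    },
    "peer1.org3.bc4scm.de": {
        "CORE_PEER_ADDRESS": "peer1.org3.bc4scm.de:12051",
        "CORE_PEER_GOSSIP_EXTERNALENDPOINT": "peer1.org3.bc4scm.de:12051",
        "CORE_PEER_GOSSIP_BOOTSTRAP": "peer0.org3.bc4scm.de:11051",
    },
}

for problem in check_gossip(ASKED):
    print(problem)
print(check_gossip(FIXED))  # []
```

On the question's values this reports four problems (one external endpoint and one bootstrap entry per peer, each pointing at the other peer's port); on the corrected values it reports none.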
This could have several causes:

- Your peers are not reachable, so first check whether these ports are open.
- The chaincode is not installed on these peers; confirm that it is.
- If neither applies, check the logs inside the Docker containers of the chaincode and of these peers. To get a shell in a container, use:

docker exec -it [container-name] bash

Let me know if you find something there that you can't resolve.
I had the same problem and realized that I had set the "asLocalhost" property to false while trying to access the peers at http://localhost/. Below is the working line with the property set correctly (I took it from a Fabcar example, which was otherwise great):

await gateway.connect(ccpPath, { wallet, identity: 'user1', discovery: { enabled: true, asLocalhost: true } });
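With discovery enabled, the SDK learns peer endpoints from the network itself, i.e. the hostnames peers advertise inside the Docker network (such as peer0.org3.bc4scm.de:11051), which a client on the host usually cannot resolve. asLocalhost: true tells the SDK to replace those hosts with localhost while keeping the port. The function below only illustrates that rewrite under the assumption that ports are published 1:1, as in the compose file in the question; it is not fabric-network code.

```python
from urllib.parse import urlsplit, urlunsplit


def as_localhost(url):
    """Replace the host of a discovered endpoint URL with 'localhost', keeping scheme and port."""
    parts = urlsplit(url)
    netloc = "localhost" if parts.port is None else f"localhost:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(as_localhost("grpcs://peer0.org3.bc4scm.de:11051"))  # grpcs://localhost:11051
```

Because discovery reports the endpoints that peers advertise over gossip, a wrong CORE_PEER_GOSSIP_EXTERNALENDPOINT likely turns into a wrong localhost port after this rewrite as well.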