RabbitMQ problem loading page (connection reset, bad_header GET/FAV) - Linux

RabbitMQ is up and listening on its ports and everything looks good so far, but when I try to connect to localhost in the browser I get this error message:
The connection was reset
The connection to the server was reset while the page was loading.
The site could be temporarily unavailable or too busy. Try again in a few moments.
If you are unable to load any pages, check your computer’s network connection.
If your computer or network is protected by a firewall or proxy, make sure that Firefox is permitted to access the Web
So the first thing I did was look at my RabbitMQ log, and I see this:
Starting RabbitMQ 3.7.9 on Erlang 20.2.2
Copyright (C) 2007-2018 Pivotal Software, Inc.
Licensed under the MPL. See http://www.rabbitmq.com/
2019-09-15 08:06:51.710 [info] <0.240.0>
node : rabbit@bowzer
home dir : /var/lib/rabbitmq
config file(s) : (none)
cookie hash : XC8syc3LUBiQChoU4UJxPA==
log(s) : /var/log/rabbitmq/rabbit@bowzer.log
: /var/log/rabbitmq/rabbit@bowzer_upgrade.log
database dir : /var/lib/rabbitmq/mnesia/rabbit@bowzer
2019-09-15 08:06:52.982 [info] <0.248.0> Memory high watermark set to 6421 MiB (6733540556 bytes) of 16054 MiB (16833851392 bytes) total
2019-09-15 08:06:52.987 [info] <0.250.0> Enabling free disk space monitoring
2019-09-15 08:06:52.987 [info] <0.250.0> Disk free limit set to 50MB
2019-09-15 08:06:52.991 [info] <0.253.0> Limiting to approx 32668 file handles (29399 sockets)
2019-09-15 08:06:52.991 [info] <0.254.0> FHC read buffering: OFF
2019-09-15 08:06:52.991 [info] <0.254.0> FHC write buffering: ON
2019-09-15 08:06:52.993 [info] <0.240.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2019-09-15 08:06:53.164 [info] <0.240.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2019-09-15 08:06:53.164 [info] <0.240.0> Peer discovery backend rabbit_peer_discovery_classic_config does not support registration, skipping registration.
2019-09-15 08:06:53.165 [info] <0.240.0> Priority queues enabled, real BQ is rabbit_variable_queue
2019-09-15 08:06:53.219 [info] <0.278.0> Starting rabbit_node_monitor
2019-09-15 08:06:53.241 [info] <0.306.0> Making sure data directory '/var/lib/rabbitmq/mnesia/rabbit#bowzer/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L' for vhost '/' exists
2019-09-15 08:06:53.302 [info] <0.306.0> Starting message stores for vhost '/'
2019-09-15 08:06:53.303 [info] <0.310.0> Message store "628WB79CIFDYO9LJI6DKMI09L/msg_store_transient": using rabbit_msg_store_ets_index to provide index
2019-09-15 08:06:53.305 [info] <0.306.0> Started message store of type transient for vhost '/'
2019-09-15 08:06:53.306 [info] <0.313.0> Message store "628WB79CIFDYO9LJI6DKMI09L/msg_store_persistent": using rabbit_msg_store_ets_index to provide index
2019-09-15 08:06:53.308 [info] <0.306.0> Started message store of type persistent for vhost '/'
2019-09-15 08:06:53.312 [warning] <0.334.0> Setting Ranch options together with socket options is deprecated. Please use the new map syntax that allows specifying socket options separately from other options.
2019-09-15 08:06:53.313 [info] <0.348.0> started TCP listener on [::]:5672
2019-09-15 08:06:53.314 [info] <0.240.0> Setting up a table for connection tracking on this node: tracked_connection_on_node_rabbit@bowzer
2019-09-15 08:06:53.314 [info] <0.240.0> Setting up a table for per-vhost connection counting on this node: tracked_connection_per_vhost_on_node_rabbit@bowzer
2019-09-15 08:06:53.315 [info] <0.33.0> Application rabbit started on node rabbit@bowzer
2019-09-15 08:06:53.375 [notice] <0.86.0> Changed loghwm of /var/log/rabbitmq/rabbit@bowzer.log to 50
2019-09-15 08:06:53.540 [info] <0.5.0> Server startup complete; 0 plugins started.
2019-09-15 08:10:51.196 [info] <0.378.0> accepting AMQP connection <0.378.0> (127.0.0.1:55986 -> 127.0.0.1:5672)
2019-09-15 08:10:51.196 [error] <0.378.0> closing AMQP connection <0.378.0> (127.0.0.1:55986 -> 127.0.0.1:5672):
{bad_header,<<"GET / HT">>}
2019-09-15 08:13:26.916 [info] <0.385.0> accepting AMQP connection <0.385.0> (127.0.0.1:55990 -> 127.0.0.1:5672)
2019-09-15 08:13:26.916 [error] <0.385.0> closing AMQP connection <0.385.0> (127.0.0.1:55990 -> 127.0.0.1:5672):
{bad_header,<<"GET / HT">>}
2019-09-15 08:13:27.007 [info] <0.389.0> accepting AMQP connection <0.389.0> (127.0.0.1:55992 -> 127.0.0.1:5672)
2019-09-15 08:13:27.007 [error] <0.389.0> closing AMQP connection <0.389.0> (127.0.0.1:55992 -> 127.0.0.1:5672):
{bad_header,<<"GET /fav">>}
So I checked my ports and I get this:
sudo lsof -i -P -n | grep rabbitmq:
epmd 5687 rabbitmq 3u IPv4 59822 0t0 TCP *:4369 (LISTEN)
epmd 5687 rabbitmq 4u IPv6 59823 0t0 TCP *:4369 (LISTEN)
beam.smp 5892 rabbitmq 59u IPv4 57068 0t0 TCP *:25672 (LISTEN)
beam.smp 5892 rabbitmq 69u IPv6 58166 0t0 TCP *:5672 (LISTEN)
sudo service rabbitmq-server status:
● rabbitmq-server.service - RabbitMQ broker
Loaded: loaded (/lib/systemd/system/rabbitmq-server.service; enabled; vendor preset:
Active: active (running) since Sun 2019-09-15 07:53:32 MST; 2min 19s ago
Main PID: 875 (beam.smp)
Status: "Initialized"
Tasks: 90 (limit: 4915)
CGroup: /system.slice/rabbitmq-server.service
├─ 875 /usr/lib/erlang/erts-9.2/bin/beam.smp -W w -A 64 -P 1048576 -t 500000
├─1037 /usr/lib/erlang/erts-9.2/bin/epmd -daemon
├─1387 erl_child_setup 32768
├─1691 inet_gethost 4
└─1692 inet_gethost 4
Sep 15 07:53:27 bowzer rabbitmq-server[875]: ## ##
Sep 15 07:53:27 bowzer rabbitmq-server[875]: ## ## RabbitMQ 3.7.9. Copyright (C
Sep 15 07:53:27 bowzer rabbitmq-server[875]: ########## Licensed under the MPL. See
Sep 15 07:53:27 bowzer rabbitmq-server[875]: ###### ##
Sep 15 07:53:27 bowzer rabbitmq-server[875]: ########## Logs: /var/log/rabbitmq/rabb
Sep 15 07:53:27 bowzer rabbitmq-server[875]: /var/log/rabbitmq/rabb
Sep 15 07:53:27 bowzer rabbitmq-server[875]: Starting broker...
Sep 15 07:53:32 bowzer rabbitmq-server[875]: systemd unit for activation check: "rabbit
Sep 15 07:53:32 bowzer systemd[1]: Started RabbitMQ broker.
Sep 15 07:53:33 bowzer rabbitmq-server[875]: completed with 0 plugins.
I also noticed that when others download and install the server they get 'completed with 6 plugins', while mine started with 0 plugins.

Your browser is trying to talk HTTP on port 5672, which is the AMQP port of the RabbitMQ broker.
If you want to access the management console, enable the management plugin and access it at http://your-rabbitmq-host:15672/.
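A minimal sketch, assuming a default Linux package install: the management plugin ships with RabbitMQ but is disabled out of the box, which is also why your startup log says '0 plugins started'.
sudo rabbitmq-plugins enable rabbitmq_management   # takes effect without a restart on 3.7
Afterwards http://localhost:15672/ should serve the management UI; the default guest/guest login only works from localhost.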

Related

Tor not starting cleanly on boot

I'm running Debian 10 with bitcoind configured as a systemd service that accesses the tor service via localhost. On every reboot the bitcoind error log fills with the following, and tor seems to be stuck in a non-working state.
2020-11-18T03:38:30Z connect() to 127.0.0.1:9050 failed after select(): Connection refused (111)
2020-11-18T03:38:30Z connect() to 127.0.0.1:9050 failed after select(): Connection refused (111)
2020-11-18T03:38:31Z connect() to 127.0.0.1:9050 failed after select(): Connection refused (111)
2020-11-18T03:38:31Z connect() to 127.0.0.1:9050 failed after select(): Connection refused (111)
Upon startup, systemctl status tor returns the following, indicating tor started successfully:
tor.service - Anonymizing overlay network for TCP (multi-instance-master)
Loaded: loaded (/lib/systemd/system/tor.service; enabled; vendor preset: enabled)
Active: active (exited) since Tue 2020-11-17 19:54:04 PST; 4min 19s ago
Process: 413 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
Main PID: 413 (code=exited, status=0/SUCCESS)
Nov 17 19:54:04 cryptoDaemon systemd[1]: Starting Anonymizing overlay network for TCP (multi-instance-master)...
Nov 17 19:54:04 cryptoDaemon systemd[1]: Started Anonymizing overlay network for TCP (multi-instance-master).
However, tail -f /var/log/tor/notices.log indicates tor hasn't started: there are no entries after the reboot.
If I restart tor.service with sudo systemctl restart tor, the error immediately disappears and bitcoind starts to function over tor correctly. This indicates to me that the tor service itself is not starting properly after reboot.
After restarting the service, /var/log/tor/notices.log gets new entries:
Nov 17 20:02:22.000 [notice] Tor 0.3.5.10 opening log file.
Nov 17 20:02:22.875 [notice] We compiled with OpenSSL 1010104f: OpenSSL 1.1.1d 10 Sep 2019 and we are running with OpenSSL 1010107f: OpenSSL 1.1.1g 21 Apr 2020. These two versions should be binary compatible.
Nov 17 20:02:22.877 [notice] Tor 0.3.5.10 running on Linux with Libevent 2.1.8-stable, OpenSSL 1.1.1g, Zlib 1.2.11, Liblzma 5.2.4, and Libzstd 1.3.8.
Nov 17 20:02:22.877 [notice] Tor can't help you if you use it wrong! Learn how to be safe at https://www.torproject.org/download/download#warning
Nov 17 20:02:22.877 [notice] Read configuration file "/usr/share/tor/tor-service-defaults-torrc".
Nov 17 20:02:22.877 [notice] Read configuration file "/etc/tor/torrc".
Nov 17 20:02:22.881 [notice] You configured a non-loopback address '10.1.10.20:9050' for SocksPort. This allows everybody on your local network to use your machine as a proxy. Make sure this is what you wanted.
Nov 17 20:02:22.881 [notice] Opening Socks listener on 127.0.0.1:9050
Nov 17 20:02:22.881 [notice] Opened Socks listener on 127.0.0.1:9050
Nov 17 20:02:22.881 [notice] Opening Control listener on 127.0.0.1:9051
Nov 17 20:02:22.881 [notice] Opened Control listener on 127.0.0.1:9051
Nov 17 20:02:22.881 [warn] Unable to make /var/lib/tor group-readable: Permission denied
Nov 17 20:02:22.881 [warn] Unable to make /var/lib/tor group-readable: Permission denied
Nov 17 20:02:22.000 [notice] Not disabling debugger attaching for unprivileged users.
Nov 17 20:02:22.000 [notice] Parsing GEOIP IPv4 file /usr/share/tor/geoip.
Nov 17 20:02:23.000 [notice] Parsing GEOIP IPv6 file /usr/share/tor/geoip6.
Nov 17 20:02:23.000 [notice] Bootstrapped 0%: Starting
Nov 17 20:02:23.000 [notice] Starting with guard context "default"
Nov 17 20:02:23.000 [notice] Signaled readiness to systemd
Nov 17 20:02:24.000 [notice] Bootstrapped 10%: Finishing handshake with directory server
Nov 17 20:02:24.000 [notice] Bootstrapped 80%: Connecting to the Tor network
Nov 17 20:02:24.000 [notice] Opening Control listener on /run/tor/control
Nov 17 20:02:24.000 [notice] Opened Control listener on /run/tor/control
Nov 17 20:02:24.000 [notice] Bootstrapped 90%: Establishing a Tor circuit
Nov 17 20:02:25.000 [notice] Bootstrapped 100%: Done
Further investigation reveals that tor is not starting at boot: /var/log/tor/debug.log is empty after reboot. I can even run systemctl start tor and it starts; systemctl start tor won't touch a service that is already started, so for some reason systemd isn't starting tor, despite it being enabled. Just for fun I disabled it with systemctl disable tor and re-enabled it, but to no avail.
Any ideas why tor doesn't start?
I also use this server as a Tor SOCKS proxy on the LAN, with SocksPort IP.OF.SERVER:9050 in torrc. Disabling this and the associated SocksPolicy accept IP.OF.SERVER/24 fixed the issue. If anybody has any insight as to why tor behaves this way and doesn't log why, it'd be appreciated.
I fixed it by overriding the systemd configuration for the tor service, since I need the Tor proxy to listen on an actual network interface.
/etc/systemd/system/tor@default.service.d/override.conf
[Unit]
After=network.target nss-lookup.target network-online.target
Wants=network-online.target
An easy way to create the file is systemctl edit tor@default.service
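For reference, the whole sequence looks roughly like this (a sketch, assuming Debian's tor@default.service instance):
sudo systemctl edit tor@default.service   # opens an editor; paste the [Unit] override above
sudo systemctl daemon-reload
sudo reboot                               # confirm tor now binds the LAN SocksPort at boot
The likely root cause: a SocksPort bound to a LAN address fails when the interface has no address assigned yet at boot, and ordering the unit after network-online.target delays tor until addresses are up.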

Error initializing Tor on PythonAnywhere - Stuck at 5%

I'm making a project on PythonAnywhere. I have installed Tor, but when I run it I get this message:
Sep 13 15:04:29.000 [warn] Problem bootstrapping. Stuck at 5% (conn): Connecting to a relay. (Connection refused; CONNECTREFUSED; count 10; recommendation warn; host 24E2F139121D4394C54B5BCC368B3B411857C413 at
204.13.164.118:443)
Sep 13 15:04:29.000 [warn] 9 connections have failed:
Sep 13 15:04:29.000 [warn] 9 connections died in state connect()ing with SSL state (No SSL object)
Sep 13 15:04:36.000 [warn] Problem bootstrapping. Stuck at 5% (conn): Connecting to a relay. (Connection refused; CONNECTREFUSED; count 11; recommendation warn; host D8B7A3A6542AA54D0946B9DC0257C53B6C376679 at
85.10.201.47:9001)
Sep 13 15:04:36.000 [warn] 10 connections have failed:
Sep 13 15:04:36.000 [warn] 10 connections died in state connect()ing with SSL state (No SSL object)
Sep 13 15:04:37.000 [warn] Problem bootstrapping. Stuck at 5% (conn): Connecting to a relay. (Connection refused; CONNECTREFUSED; count 12; recommendation warn; host A0F06C2FADF88D3A39AA3072B406F09D7095AC9E at
46.165.230.5:443) ...
How do I solve it?
This will not work on free PythonAnywhere accounts, as free accounts can only connect to the outside world using the HTTP protocol, and only to hosts on the whitelist. See https://www.pythonanywhere.com/whitelist/

Node.js: Handshake terminated by server: 403 (RabbitMQ)

I am trying to connect to a RabbitMQ broker (rabbitmq-server-3.8.0) from Node.js with the amqplib/callback_api library. After installing amqplib with:
npm i amqplib
I wrote this code:
const amqp = require("amqplib/callback_api");
amqp.connect('amqp://guest:guest@xxxx:5672', (err, conn) => {
  if (err) throw err;
  console.log('Connected to broker successfully!');
});
As the official site says:
By default, the guest user is prohibited from connecting from remote hosts; it can only connect over a loopback interface (i.e. localhost).
It is possible to allow the guest user to connect from a remote host by setting the loopback_users configuration to none.
There was no rabbitmq.conf file in the %APPDATA%\RabbitMQ\ directory of my broker server, so I created the file with just this content:
loopback_users = none
C:\Users\tazik.WIN-LKH5BTVHRCM\AppData\Roaming\RabbitMQ>dir
Volume in drive C has no label.
Volume Serial Number is A852-F618
Directory of C:\Users\tazik.WIN-LKH5BTVHRCM\AppData\Roaming\RabbitMQ
10/13/2019 11:39 AM <DIR> .
10/13/2019 11:39 AM <DIR> ..
10/12/2019 02:05 PM 3 advanced.config
10/13/2019 11:41 AM <DIR> db
10/12/2019 02:07 PM 23 enabled_plugins
10/13/2019 10:37 AM <DIR> log
10/13/2019 10:22 AM 21 rabbitmq.conf
3 File(s) 47 bytes
4 Dir(s) 116,768,235,520 bytes free
C:\Users\tazik.WIN-LKH5BTVHRCM\AppData\Roaming\RabbitMQ>
Now, after running the Node.js code, I still get this error:
2019-10-13 10:37:46.818 [info] <0.895.0> accepting AMQP connection <0.895.0> (94.182.192.28:25759 -> *********:5672)
2019-10-13 10:37:46.834 [error] <0.895.0> Error on AMQP connection <0.895.0> (94.182.192.28:25759 -> ******:5672, state: starting):
PLAIN login refused: user 'guest' can only connect via localhost
2019-10-13 10:37:46.849 [info] <0.895.0> closing AMQP connection <0.895.0> (94.182.192.28:25759 -> ******:5672)
Have you restarted RabbitMQ after changing the configuration? Also, check whether the RABBITMQ_CONFIG_FILE environment variable is set to the location where you placed the config file:
Open the "RabbitMQ Command Prompt (sbin dir)"
.\rabbitmq-service.bat stop
.\rabbitmq-service.bat remove
Run the following commands in the previous shell:
.\rabbitmq-service.bat install
.\rabbitmq-service.bat start
After that, you should be able to connect.
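A safer alternative than letting guest connect remotely is to create a dedicated user; a sketch with placeholder credentials, run from the same RabbitMQ Command Prompt:
.\rabbitmqctl.bat add_user myuser mypassword
.\rabbitmqctl.bat set_permissions -p / myuser ".*" ".*" ".*"
Then connect with amqp://myuser:mypassword@xxxx:5672 and leave loopback_users at its default.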

PostgreSQL on IBM Cloud Kubernetes returns "psql: FATAL: password authentication failed for user "replica_user"" error. Works on GCP and Azure

I have deployed this PostgreSQL image to IBM Cloud, Google Cloud Platform and Microsoft Azure using Kubernetes: https://github.com/paunin/PostDock
The deployment used identical configurations and an identical process on all three platforms. On Google Cloud and Azure it works, but IBM Cloud fails with the error 'psql: FATAL: password authentication failed for user "replica_user"'.
You can find the logs from all three cloud platforms below. Has anyone experienced this?
IBM Cloud Log
>>> Setting up STOP handlers...
>>> STARTING SSH (if required)...
>>> SSH is not enabled!
>>> STARTING POSTGRES...
>>> TUNING UP POSTGRES...
>>> Cleaning data folder which might have some garbage...
psql: FATAL: password authentication failed for user "replica_user"
psql: could not connect to server: Connection refused
Is the server running on host "cyclos-postgres-node2-service" (172.30.65.206) and accepting
TCP/IP connections on port 5432?
>>> Auto-detected master name: ''
>>> Setting up repmgr...
>>> Setting up repmgr config file '/etc/repmgr.conf'...
>>> Setting up upstream node...
cat: /var/lib/postgresql/data/standby.lock: No such file or directory
>>> Previously Locked standby upstream node LOCKED_STANDBY=''
>>> Waiting for upstream postgres server...
>>> Wait db replica_db on cyclos-postgres-node1-service:5432(user: replica_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
psql: FATAL: password authentication failed for user "replica_user"
>>>>>> Db replica_db is still not accessable on cyclos-postgres-node1-service:5432 (will try 30 times more)
....
The last couple of lines are then repeated many times.
This is the log from deploying the same application, using an identical process, on Google Cloud, where it works just fine.
Google Cloud Log
>>> Setting up STOP handlers...
>>> STARTING SSH (if required)...
>>> SSH is not enabled!
>>> STARTING POSTGRES...
>>> TUNING UP POSTGRES...
>>> Cleaning data folder which might have some garbage...
psql: could not connect to server: Connection refused
Is the server running on host "cyclos-postgres-node1-service" (10.52.0.11) and accepting
TCP/IP connections on port 5432?
psql: could not connect to server: Connection refused
Is the server running on host "cyclos-postgres-node2-service" (10.52.0.12) and accepting
TCP/IP connections on port 5432?
>>> Auto-detected master name: ''
>>> Setting up repmgr...
>>> Setting up repmgr config file '/etc/repmgr.conf'...
>>> Setting up upstream node...
cat: /var/lib/postgresql/data/standby.lock: No such file or directory
>>> Previously Locked standby upstream node LOCKED_STANDBY=''
>>> Waiting for upstream postgres server...
>>> Wait db replica_db on cyclos-postgres-node1-service:5432(user: replica_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
psql: could not connect to server: Connection refused
Is the server running on host "cyclos-postgres-node1-service" (10.52.0.11) and accepting
TCP/IP connections on port 5432?
>>>>>> Db replica_db is still not accessable on cyclos-postgres-node1-service:5432 (will try 30 times more)
>>>>>> Db replica_db is still not accessable on cyclos-postgres-node1-service:5432 (will try 29 times more)
psql: could not connect to server: Connection refused
Is the server running on host "cyclos-postgres-node1-service" (10.52.0.11) and accepting
TCP/IP connections on port 5432?
psql: could not connect to server: Connection refused
Is the server running on host "cyclos-postgres-node1-service" (10.52.0.11) and accepting
TCP/IP connections on port 5432?
>>>>>> Db replica_db is still not accessable on cyclos-postgres-node1-service:5432 (will try 28 times more)
>>>>>> Db replica_db exists on cyclos-postgres-node1-service:5432!
>>> REPLICATION_UPSTREAM_NODE_ID=1
>>> Sending in background postgres start...
>>> Waiting for upstream postgres server...
>>> Wait db replica_db on cyclos-postgres-node1-service:5432(user: replica_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
>>>>>> Db replica_db exists on cyclos-postgres-node1-service:5432!
>>> Starting standby node...
>>> Instance hasn't been set up yet.
>>> Clonning primary node...
>>> Waiting for upstream postgres server...
>>> Wait db replica_db on cyclos-postgres-node1-service:5432(user: replica_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
NOTICE: destination directory '/var/lib/postgresql/data' provided
INFO: connecting to upstream node
INFO: Successfully connected to upstream node. Current installation size is 34 MB
INFO: checking and correcting permissions on existing directory /var/lib/postgresql/data ...
>>>>>> Db replica_db exists on cyclos-postgres-node1-service:5432!
NOTICE: starting backup (using pg_basebackup)...
INFO: executing: '/usr/lib/postgresql/9.5/bin/pg_basebackup -l "repmgr base backup" -D /var/lib/postgresql/data -h cyclos-postgres-node1-service -p 5432 -U replica_user -c fast -X stream '
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example : pg_ctl -D /var/lib/postgresql/data start
HINT: After starting the server, you need to register this standby with "repmgr standby register"
[REPMGR EVENT] Node id: 2; Event type: standby_clone; Success [1|0]: 1; Time: 2018-02-02 13:24:32.87843+00; Details: Cloned from host 'cyclos-postgres-node1-service', port 5432; backup method: pg_basebackup; --force: Y
>>> Configuring /var/lib/postgresql/data/postgresql.conf
>>>>>> Will add configs to exists file
>>> Starting postgres...
>>> Waiting for local postgres server start...
>>> Wait db replica_db on cyclos-postgres-node2-service:5432(user: replica_user,password: *******), will try 60 times with delay 10 seconds (TIMEOUT=600)
LOG: incomplete startup packet
LOG: incomplete startup packet
LOG: database system was interrupted; last known up at 2018-02-02 13:24:31 UTC
FATAL: the database system is starting up
psql: FATAL: the database system is starting up
>>>>>> Db replica_db is still not accessable on cyclos-postgres-node2-service:5432 (will try 60 times more)
LOG: entering standby mode
LOG: redo starts at 0/2000028
LOG: consistent recovery state reached at 0/20000F8
LOG: database system is ready to accept read only connections
LOG: started streaming WAL from primary at 0/3000000 on timeline 1
>>>>>> Db replica_db exists on cyclos-postgres-node2-service:5432!
>>> Waiting for replication on this node is over(if any in progress): CLEAN_UP_ON_FAIL=, INTERVAL=30
>>> Replication is done
>>> Unregister the node if it was done before
DELETE 0
>>> Registering node with role standby
INFO: connecting to standby database
INFO: connecting to master database
INFO: retrieving node list for cluster 'postgres_cluster'
INFO: registering the standby
[REPMGR EVENT] Node id: 2; Event type: standby_register; Success [1|0]: 1; Time: 2018-02-02 13:24:51.891592+00; Details:
INFO: standby registration complete
NOTICE: standby node correctly registered for cluster postgres_cluster with id 2 (conninfo: user=replica_user password=replica_pass host=cyclos-postgres-node2-service dbname=replica_db port=5432 connect_timeout=2)
Locking standby (NEW_UPSTREAM_NODE_ID=1)...
>>> Starting repmgr daemon...
[2018-02-02 13:24:53] [NOTICE] looking for configuration file in current directory
[2018-02-02 13:24:53] [NOTICE] looking for configuration file in /etc
[2018-02-02 13:24:53] [NOTICE] configuration file found at: /etc/repmgr.conf
[2018-02-02 13:24:53] [INFO] connecting to database 'user=replica_user password=replica_pass host=cyclos-postgres-node2-service dbname=replica_db port=5432 connect_timeout=2'
[2018-02-02 13:24:53] [INFO] connected to database, checking its state
[2018-02-02 13:24:53] [INFO] connecting to master node of cluster 'postgres_cluster'
[2018-02-02 13:24:53] [INFO] retrieving node list for cluster 'postgres_cluster'
[2018-02-02 13:24:53] [INFO] checking role of cluster node '1'
[2018-02-02 13:24:53] [INFO] checking cluster configuration with schema 'repmgr_postgres_cluster'
[2018-02-02 13:24:53] [INFO] checking node 2 in cluster 'postgres_cluster'
[2018-02-02 13:24:53] [INFO] reloading configuration file
[2018-02-02 13:24:53] [INFO] configuration has not changed
[2018-02-02 13:24:53] [INFO] starting continuous standby node monitoring
ERROR: cannot execute DELETE in a read-only transaction
STATEMENT: DELETE FROM repmgr_postgres_cluster.repl_nodes WHERE conninfo LIKE '%host=cyclos-postgres-node3-service%'
And on the Azure Cloud, it works just fine as well.
Azure Cloud Log
>>> Setting up STOP handlers...
>>> STARTING SSH (if required)...
>>> SSH is not enabled!
>>> STARTING POSTGRES...
>>> TUNING UP POSTGRES...
>>> Cleaning data folder which might have some garbage...
psql: could not connect to server: Connection refused
Is the server running on host "cyclos-postgres-node2-service" (10.244.0.9) and accepting
TCP/IP connections on port 5432?
>>> Auto-detected master name: 'cyclos-postgres-node1-service'
>>> Setting up repmgr...
>>> Setting up repmgr config file '/etc/repmgr.conf'...
>>> Setting up upstream node...
cat: /var/lib/postgresql/data/standby.lock: No such file or directory
>>> Previously Locked standby upstream node LOCKED_STANDBY=''
>>> Waiting for upstream postgres server...
>>> Wait db replica_db on cyclos-postgres-node1-service:5432(user: replica_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
>>>>>> Db replica_db exists on cyclos-postgres-node1-service:5432!
>>> REPLICATION_UPSTREAM_NODE_ID=1
>>> Sending in background postgres start...
>>> Waiting for upstream postgres server...
>>> Wait db replica_db on cyclos-postgres-node1-service:5432(user: replica_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
>>>>>> Db replica_db exists on cyclos-postgres-node1-service:5432!
>>> Starting standby node...
>>> Instance hasn't been set up yet.
>>> Clonning primary node...
>>> Waiting for upstream postgres server...
>>> Wait db replica_db on cyclos-postgres-node1-service:5432(user: replica_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
NOTICE: destination directory '/var/lib/postgresql/data' provided
INFO: connecting to upstream node
>>>>>> Db replica_db exists on cyclos-postgres-node1-service:5432!
INFO: Successfully connected to upstream node. Current installation size is 34 MB
INFO: checking and correcting permissions on existing directory /var/lib/postgresql/data ...
NOTICE: starting backup (using pg_basebackup)...
INFO: executing: '/usr/lib/postgresql/9.5/bin/pg_basebackup -l "repmgr base backup" -D /var/lib/postgresql/data -h cyclos-postgres-node1-service -p 5432 -U replica_user -c fast -X stream '
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example : pg_ctl -D /var/lib/postgresql/data start
HINT: After starting the server, you need to register this standby with "repmgr standby register"
[REPMGR EVENT] Node id: 2; Event type: standby_clone; Success [1|0]: 1; Time: 2018-02-02 06:50:47.340146+00; Details: Cloned from host 'cyclos-postgres-node1-service', port 5432; backup method: pg_basebackup; --force: Y
>>> Configuring /var/lib/postgresql/data/postgresql.conf
>>>>>> Will add configs to exists file
>>> Starting postgres...
>>> Waiting for local postgres server start...
>>> Wait db replica_db on cyclos-postgres-node2-service:5432(user: replica_user,password: *******), will try 60 times with delay 10 seconds (TIMEOUT=600)
LOG: incomplete startup packet
LOG: database system was interrupted; last known up at 2018-02-02 06:50:46 UTC
LOG: incomplete startup packet
FATAL: the database system is starting up
psql: FATAL: the database system is starting up
>>>>>> Db replica_db is still not accessable on cyclos-postgres-node2-service:5432 (will try 60 times more)
LOG: entering standby mode
LOG: redo starts at 0/2000028
LOG: consistent recovery state reached at 0/2000130
LOG: database system is ready to accept read only connections
LOG: started streaming WAL from primary at 0/3000000 on timeline 1
>>>>>> Db replica_db exists on cyclos-postgres-node2-service:5432!
>>> Waiting for replication on this node is over(if any in progress): CLEAN_UP_ON_FAIL=, INTERVAL=30
>>> Replication is done
>>> Unregister the node if it was done before
DELETE 0
>>> Registering node with role standby
INFO: connecting to standby database
INFO: connecting to master database
INFO: retrieving node list for cluster 'postgres_cluster'
INFO: registering the standby
[REPMGR EVENT] Node id: 2; Event type: standby_register; Success [1|0]: 1; Time: 2018-02-02 06:51:05.083455+00; Details:
INFO: standby registration complete
NOTICE: standby node correctly registered for cluster postgres_cluster with id 2 (conninfo: user=replica_user password=replica_pass host=cyclos-postgres-node2-service dbname=replica_db port=5432 connect_timeout=2)
Locking standby (NEW_UPSTREAM_NODE_ID=1)...
>>> Starting repmgr daemon...
[2018-02-02 06:51:05] [NOTICE] looking for configuration file in current directory
[2018-02-02 06:51:05] [NOTICE] looking for configuration file in /etc
[2018-02-02 06:51:05] [NOTICE] configuration file found at: /etc/repmgr.conf
[2018-02-02 06:51:05] [INFO] connecting to database 'user=replica_user password=replica_pass host=cyclos-postgres-node2-service dbname=replica_db port=5432 connect_timeout=2'
[2018-02-02 06:51:06] [INFO] connected to database, checking its state
[2018-02-02 06:51:06] [INFO] connecting to master node of cluster 'postgres_cluster'
[2018-02-02 06:51:06] [INFO] retrieving node list for cluster 'postgres_cluster'
[2018-02-02 06:51:06] [INFO] checking role of cluster node '1'
[2018-02-02 06:51:06] [INFO] checking cluster configuration with schema 'repmgr_postgres_cluster'
[2018-02-02 06:51:06] [INFO] checking node 2 in cluster 'postgres_cluster'
[2018-02-02 06:51:06] [INFO] reloading configuration file
[2018-02-02 06:51:06] [INFO] configuration has not changed
[2018-02-02 06:51:06] [INFO] starting continuous standby node monitoring
ERROR: cannot execute DELETE in a read-only transaction
STATEMENT: DELETE FROM repmgr_postgres_cluster.repl_nodes WHERE conninfo LIKE '%host=cyclos-postgres-node3-service%'
I was able to run this on a paid cluster in IBM Cloud and it appears to be working. I did NOT use persistent volumes. Note that persistent volumes are not available on free clusters, so if you are testing on a free cluster with persistent volumes, that would explain the failure.
My cluster has 3 workers of size u2c.2x4 (the smallest available) and is on the default Kubernetes version for IBM Cloud (1.8.6), if that helps you debug at all. Please try again, or if your setup is different from mine, let me know and I can try with a matching setup.
$ kubectl logs --namespace=mysystem mysystem-db-node1-0
>>> Setting up STOP handlers...
>>> STARTING SSH (if required)...
>>> SSH is not enabled!
>>> STARTING POSTGRES...
>>> TUNING UP POSTGRES...
>>> Cleaning data folder which might have some garbage...
psql: could not translate host name "mysystem-db-node1-service" to address: Name or service not known
psql: could not translate host name "mysystem-db-node2-service" to address: Name or service not known
>>> Auto-detected master name: ''
>>> Setting up repmgr...
>>> Setting up repmgr config file '/etc/repmgr.conf'...
>>> Setting up upstream node...
>>> Sending in background postgres start...
>>> Waiting for local postgres server start...
>>> Wait db replica_db on mysystem-db-node1-service:5432(user: replica_user,password: *******), will try 60 times with delay 10 seconds (TIMEOUT=600)
psql: could not translate host name "mysystem-db-node3-service" to address: Name or service not known
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
fixing permissions on existing directory /var/lib/postgresql/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
psql: could not connect to server: Connection refused
Is the server running on host "mysystem-db-node1-service" (172.30.207.54) and accepting
TCP/IP connections on port 5432?
selecting default shared_buffers ... >>>>>> Db replica_db is still not accessable on mysystem-db-node1-service:5432 (will try 60 times more)
128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
creating template1 database in /var/lib/postgresql/data/base/1 ... ok
initializing pg_authid ... ok
initializing dependencies ... ok
creating system views ... ok
loading system objects' descriptions ... ok
creating collations ... ok
creating conversions ... ok
creating dictionaries ... ok
setting privileges on built-in objects ... ok
creating information schema ... ok
loading PL/pgSQL server-side language ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... ok
copying template1 to postgres ... ok
syncing data to disk ... ok
Success. You can now start the database server using:
pg_ctl -D /var/lib/postgresql/data -l logfile start
WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
waiting for server to start....LOG: could not bind IPv6 socket: Cannot assign requested address
HINT: Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
LOG: database system was shut down at 2018-02-14 15:40:14 UTC
LOG: MultiXact member wraparound protections are now enabled
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
done
server started
CREATE DATABASE
CREATE ROLE
/docker-entrypoint.sh: running /docker-entrypoint-initdb.d/entrypoint.sh
>>> Configuring /var/lib/postgresql/data/postgresql.conf
>>>>>> Config file was replaced with standard one!
>>>>>> Adding config 'wal_keep_segments'='250'
>>>>>> Adding config 'shared_buffers'='300MB'
>>>>>> Adding config 'archive_command'=''/bin/true''
>>> Creating replication user 'replica_user'
CREATE ROLE
>>> Creating replication db 'replica_db'
LOG: received fast shutdown request
LOG: aborting any active transactions
LOG: autovacuum launcher shutting down
waiting for server to shut down....LOG: shutting down
LOG: database system is shut down
done
server stopped
PostgreSQL init process complete; ready for start up.
LOG: database system was shut down at 2018-02-14 15:40:16 UTC
LOG: MultiXact member wraparound protections are now enabled
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
LOG: incomplete startup packet
LOG: incomplete startup packet
>>>>>> Db replica_db exists on mysystem-db-node1-service:5432!
>>> Registering node with role master
INFO: connecting to master database
INFO: master register: creating database objects inside the 'repmgr_mysystem_cluster' schema
INFO: retrieving node list for cluster 'mysystem_cluster'
[REPMGR EVENT] Node id: 1; Event type: master_register; Success [1|0]: 1; Time: 2018-02-14 15:40:27.337393+00; Details:
[REPMGR EVENT] will execute script '/usr/local/bin/cluster/repmgr/events/execs/master_register.sh' for the event
[REPMGR EVENT::master_register] Node id: 1; Event type: master_register; Success [1|0]: 1; Time: 2018-02-14 15:40:27.337393+00; Details:
[REPMGR EVENT::master_register] Locking master...
[REPMGR EVENT::master_register] Unlocking standby...
NOTICE: master node correctly registered for cluster 'mysystem_cluster' with id 1 (conninfo: user=replica_user password=replica_pass host=mysystem-db-node1-service dbname=replica_db port=5432 connect_timeout=2)
>>> Starting repmgr daemon...
[2018-02-14 15:40:27] [NOTICE] looking for configuration file in current directory
[2018-02-14 15:40:27] [NOTICE] looking for configuration file in /etc
[2018-02-14 15:40:27] [NOTICE] configuration file found at: /etc/repmgr.conf
[2018-02-14 15:40:27] [INFO] connecting to database 'user=replica_user password=replica_pass host=mysystem-db-node1-service dbname=replica_db port=5432 connect_timeout=2'
[2018-02-14 15:40:27] [INFO] connected to database, checking its state
[2018-02-14 15:40:27] [INFO] checking cluster configuration with schema 'repmgr_mysystem_cluster'
[2018-02-14 15:40:27] [INFO] checking node 1 in cluster 'mysystem_cluster'
[2018-02-14 15:40:27] [INFO] reloading configuration file
[2018-02-14 15:40:27] [INFO] configuration has not changed
[2018-02-14 15:40:27] [INFO] starting continuous master connection check
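If you are on a free cluster and the pods request persistent volumes, a quick way to check whether storage is the problem (a sketch, assuming the namespace and pod names from the logs above) is:
kubectl get pvc --namespace=mysystem                            # claims stuck in Pending never bind on free clusters
kubectl describe pod mysystem-db-node1-0 --namespace=mysystem   # events show volume mount failures
A PersistentVolumeClaim stuck in Pending, or a reused volume carrying a stale data directory (and therefore old credentials) from an earlier run, would both fit the 'password authentication failed' symptom.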

Cassandra and Vault error logs

After upgrading a node in Cassandra, these errors appeared in the log.
I want to investigate but don't have a direction; any clue will help.
Thanks.
2016/09/19 06:24:49 [INFO] core: post-unseal setup starting
2016/09/19 06:24:49 [INFO] core: mounted backend of type generic at secret/
2016/09/19 06:24:49 [INFO] core: mounted backend of type cubbyhole at cubbyhole/
2016/09/19 06:24:49 [INFO] core: mounted backend of type system at sys/
2016/09/19 06:24:49 [INFO] core: mounted backend of type cassandra at cassandra/
2016/09/19 06:24:49 [INFO] rollback: starting rollback manager
2016/09/19 06:24:50 [INFO] expire: restored 2 leases
2016/09/19 06:24:50 [INFO] core: post-unseal setup complete
2016/09/19 06:24:55 gocql: unable to dial control conn node-0.cassandra-app.mesos:9042: dial tcp 10.0.2.42:9042: getsockopt: connection refused
2016/09/19 06:25:12 error: failed to connect to 10.0.2.42:9042 due to error: gocql: no response to connection startup within timeout
A Cassandra cluster is not cross-version compatible: you cannot upgrade only one node, you have to upgrade the whole cluster. This is a common mistake people tend to make. Please see this video; it mentions this problem and is also very useful, with lots of good info.
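To confirm whether the cluster is running mixed versions, a quick check (assuming nodetool is on the PATH and JMX is reachable) is:
nodetool version           # prints the ReleaseVersion of the local node
nodetool describecluster   # schema versions; a split here also indicates divergence
Run nodetool version on every node; any node reporting a different ReleaseVersion should be brought in line with the rest before expecting clients such as Vault's cassandra backend to connect reliably.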
