cassandra and vault error logs - cassandra

After upgrading a node in Cassandra, these errors appeared in the log.
I want to investigate but don't have a direction; any clue will help.
Thanks.
2016/09/19 06:24:49 [INFO] core: post-unseal setup starting
2016/09/19 06:24:49 [INFO] core: mounted backend of type generic at secret/
2016/09/19 06:24:49 [INFO] core: mounted backend of type cubbyhole at cubbyhole/
2016/09/19 06:24:49 [INFO] core: mounted backend of type system at sys/
2016/09/19 06:24:49 [INFO] core: mounted backend of type cassandra at cassandra/
2016/09/19 06:24:49 [INFO] rollback: starting rollback manager
2016/09/19 06:24:50 [INFO] expire: restored 2 leases
2016/09/19 06:24:50 [INFO] core: post-unseal setup complete
2016/09/19 06:24:55 gocql: unable to dial control conn node-0.cassandra-app.mesos:9042: dial tcp 10.0.2.42:9042: getsockopt: connection refused
2016/09/19 06:25:12 error: failed to connect to 10.0.2.42:9042 due to error: gocql: no response to connection startup within timeout

A Cassandra cluster is not cross-version compatible: you cannot upgrade a single node only, you have to upgrade the whole cluster. This is a common mistake. Please see this video, which mentions this problem and is also very useful, with lots of good information.
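The log above shows gocql first getting "connection refused" on port 9042 and then a startup timeout. As a first diagnostic step, a minimal sketch like the following can confirm whether the node accepts TCP connections at all. The function name `port_is_open` is only illustrative, and the check does not speak the CQL protocol, so it cannot detect the second kind of failure (a node that accepts the connection but never answers the startup message).

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name-resolution failures.
        return False


# Example call, using the host name and port from the log above:
#   port_is_open("node-0.cassandra-app.mesos", 9042)
```

Running it from the Vault host against every Cassandra node shows quickly whether only the upgraded node is refusing connections.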

Configuration Error in Azure IoT Edge installation - "configuration has correct URIs for daemon mgmt endpoint - Error"

Version details
OS: Ubuntu 18.04.5 LTS
aziot-edge: bionic,now 1.2.3-1 amd64
aziot-identity-service: bionic,now 1.2.2-1 amd64
docker: Docker version 20.10.8+azure, build 3967b7d28e15a020e4ee344283128ead633b3e0c
Verifying the installation shows that aziot-identityd is in the "Down - activating" state:
# sudo iotedge system status
System services:
aziot-edged Running
aziot-identityd Down - activating
aziot-keyd Running
aziot-certd Running
aziot-tpmd Ready
aziot-identityd is in a bad state because:
aziot-identityd.service: Down - activating : Printing the last 10 log lines.
-- Logs begin at Fri 2020-11-06 12:29:56 IST, end at Fri 2021-09-10 19:07:13 IST. --
Sep 10 19:07:10 vm-DevIoTEdge1-poc-CentIN aziot-identityd[1871]: 2021-09-10T13:37:10Z [INFO] - Could not reconcile Identities with current device data. Reprovisioning.
Sep 10 19:07:10 vm-DevIoTEdge1-poc-CentIN aziot-identityd[1871]: 2021-09-10T13:37:10Z [INFO] - Updated device info for Edge1.
Sep 10 19:07:10 vm-DevIoTEdge1-poc-CentIN aziot-identityd[1871]: 2021-09-10T13:37:10Z [ERR!] - Failed to provision with IoT Hub, and no valid device backup was found: Hub client error
Sep 10 19:07:10 vm-DevIoTEdge1-poc-CentIN aziot-identityd[1871]: 2021-09-10T13:37:10Z [ERR!] - service encountered an error
Sep 10 19:07:10 vm-DevIoTEdge1-poc-CentIN aziot-identityd[1871]: 2021-09-10T13:37:10Z [ERR!] - caused by: Hub client error
Sep 10 19:07:10 vm-DevIoTEdge1-poc-CentIN aziot-identityd[1871]: 2021-09-10T13:37:10Z [ERR!] - caused by: internal error
Sep 10 19:07:10 vm-DevIoTEdge1-poc-CentIN aziot-identityd[1871]: 2021-09-10T13:37:10Z [ERR!] - 0: <unknown>
Sep 10 19:07:10 vm-DevIoTEdge1-poc-CentIN aziot-identityd[1871]: 1: <unknown>
Sep 10 19:07:10 vm-DevIoTEdge1-poc-CentIN systemd[1]: aziot-identityd.service: Main process exited, code=exited, status=1/FAILURE
Sep 10 19:07:10 vm-DevIoTEdge1-poc-CentIN systemd[1]: aziot-identityd.service: Failed with result 'exit-code'.
iotedge check shows 2 configuration related errors:
# iotedge check --verbose
Configuration checks (aziot-identity-service)
---------------------------------------------
√ keyd configuration is well-formed - OK
√ certd configuration is well-formed - OK
√ tpmd configuration is well-formed - OK
√ identityd configuration is well-formed - OK
√ daemon configurations up-to-date with config.toml - OK
√ identityd config toml file specifies a valid hostname - OK
√ aziot-identity-service package is up-to-date - OK
√ host time is close to reference time - OK
√ preloaded certificates are valid - OK
√ keyd is running - OK
√ certd is running - OK
√ identityd is running - OK
× read all preloaded certificates from the Certificates Service - Error
could not load cert with ID "aziot-edged-trust-bundle"
Caused by:
parameter "id" has an invalid value
caused by: not found
√ read all preloaded key pairs from the Keys Service - OK
√ ensure all preloaded certificates match preloaded private keys with the same ID - OK
Connectivity checks (aziot-identity-service)
--------------------------------------------
√ host can connect to and perform TLS handshake with iothub AMQP port - OK
√ host can connect to and perform TLS handshake with iothub HTTPS / WebSockets port - OK
√ host can connect to and perform TLS handshake with iothub MQTT port - OK
Configuration checks
--------------------
√ aziot-edged configuration is well-formed - OK
√ configuration up-to-date with config.toml - OK
√ container engine is installed and functional - OK
× configuration has correct URIs for daemon mgmt endpoint - Error
SocketError - SocketErrorCode (TimedOut) : Operation timed out
One or more errors occurred. (Got bad response: )
caused by: docker returned exit code: 1, stderr = SocketError - SocketErrorCode (TimedOut) : Operation timed out
One or more errors occurred. (Got bad response: )
√ aziot-edge package is up-to-date - OK
√ container time is close to host time - OK
‼ DNS server - Warning
Container engine is not configured with DNS server setting, which may impact connectivity to IoT Hub.
Please see https://aka.ms/iotedge-prod-checklist-dns for best practices.
You can ignore this warning if you are setting DNS server per module in the Edge deployment.
caused by: Could not open container engine config file /etc/docker/daemon.json
caused by: No such file or directory (os error 2)
√ production readiness: container engine - OK
‼ production readiness: logs policy - Warning
Container engine is not configured to rotate module logs which may cause it run out of disk space.
Please see https://aka.ms/iotedge-prod-checklist-logs for best practices.
You can ignore this warning if you are setting log policy per module in the Edge deployment.
caused by: Could not open container engine config file /etc/docker/daemon.json
caused by: No such file or directory (os error 2)
× production readiness: Edge Agent's storage directory is persisted on the host filesystem - Error
Could not check current state of edgeAgent container
caused by: docker returned exit code: 1, stderr = Error: No such object: edgeAgent
× production readiness: Edge Hub's storage directory is persisted on the host filesystem - Error
Could not check current state of edgeHub container
caused by: docker returned exit code: 1, stderr = Error: No such object: edgeHub
√ Agent image is valid and can be pulled from upstream - OK
Connectivity checks
-------------------
√ container on the default network can connect to upstream AMQP port - OK
√ container on the default network can connect to upstream HTTPS / WebSockets port - OK
√ container on the default network can connect to upstream MQTT port - OK
√ container on the IoT Edge module network can connect to upstream AMQP port - OK
√ container on the IoT Edge module network can connect to upstream HTTPS / WebSockets port - OK
√ container on the IoT Edge module network can connect to upstream MQTT port - OK
30 check(s) succeeded.
2 check(s) raised warnings.
4 check(s) raised errors.
The TOML file contains only manual provisioning with a connection string.
I had this error because my IoT Hub's "Public network access" setting was "Disabled".
You can correct this by doing the following:
Go to the Azure portal and open the IoT Hub resource in question.
Go to the Networking menu option.
Change "Public network access" to either "All Networks" or "Selected IP ranges", depending on your use case. Remember that if you select "Selected IP ranges", you must add the IP address of the VM or IoT device to the list of allowed IP addresses.
I came across this issue many times while working in an enterprise environment. My findings relate more to the environment and security aspects of the whole system.
In my case, the working environment was Red Hat Linux, and Azure was hosted on-prem behind an additional proxy server layer. My one piece of advice for solving the most common issues in such an environment is to grant all the necessary rwx (read, write, execute) permissions.
As for the specific problem asked about, the identity daemon is failing because the aziot-edged trust bundle is not loading properly.
read all preloaded certificates from the Certificates Service - Error
could not load cert with ID "aziot-edged-trust-bundle"
Check that the certificate is properly set up to use the device identity certificate.
The second error is related to the daemon management socket:
× configuration has correct URIs for daemon mgmt endpoint - Error
SocketError - SocketErrorCode (TimedOut) : Operation timed out
One or more errors occurred. (Got bad response: )
caused by: docker returned exit code: 1, stderr = SocketError - SocketErrorCode (TimedOut) : Operation timed out
One or more errors occurred. (Got bad response: )
This can be resolved by manually granting ownership of mgmt.sock in /var/lib/iotedge.
Nevertheless, there can be many reasons why iotedge provisioning fails and, as a result, why edgeAgent and edgeHub do not start. It is better to find the root of the issue and resolve it from there.
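Before changing ownership of the management socket, it can help to see who owns it now. The following is a minimal sketch, assuming the socket sits at /var/lib/iotedge/mgmt.sock as described above (the location may differ between IoT Edge versions); `describe_path` is a hypothetical helper name, not part of any IoT Edge tooling.

```python
import grp
import os
import pwd
import stat


def _user_name(uid: int) -> str:
    """Resolve a uid to a user name, falling back to the numeric id."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def _group_name(gid: int) -> str:
    """Resolve a gid to a group name, falling back to the numeric id."""
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)


def describe_path(path: str) -> dict:
    """Return owner, group, permission string and socket flag for path."""
    info = os.stat(path)
    return {
        "owner": _user_name(info.st_uid),
        "group": _group_name(info.st_gid),
        "mode": stat.filemode(info.st_mode),
        "is_socket": stat.S_ISSOCK(info.st_mode),
    }


# Example call (path taken from the answer above, adjust if yours differs):
#   describe_path("/var/lib/iotedge/mgmt.sock")
```

If the owner or mode is not what the edge daemon expects, that supports the permission explanation; if they look right, the timeout probably has another cause.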

hyperledger fabric couchdb heap allocation fail crash

I am monitoring the CouchDB instance of a Hyperledger Fabric network, and sometimes the CouchDB
Docker container is stopped by a heap allocation failure.
Below are the logs from right before the CouchDB Docker container stopped.
The server host is an AWS t2.micro, and the Fabric network consists of 1 peer (dev mode), 1 orderer, 1 cli, 1 ca, 1 couchdb and 1 ccenv container.
I have no idea what the reason is or how to solve this problem.
[info] 2018-11-06T14:19:19.839617Z nonode#nohost <0.19858.2908> -------- Closing index for db: shards/80000000-9fffffff/my-fabric.1533697140 idx: _design/16c2f95e8309df9a6f993586266150ba639ba7e8 sig: "97b910c8bbd4a195d0813dc30b7f0b91" because normal
[info] 2018-11-06T14:19:19.844610Z nonode#nohost <0.12086.2908> -------- Index shutdown by monitor notice for db: shards/80000000-9fffffff/_replicator.1533696646 idx: _design/_replicator
[info] 2018-11-06T14:19:19.847820Z nonode#nohost <0.12086.2908> -------- Closing index for db: shards/80000000-9fffffff/_replicator.1533696646 idx: _design/_replicator sig: "3e823c2a4383ac0c18d4e574135a5b08" because normal
eheap_alloc: Cannot allocate 212907632 bytes of memory (of type "heap").
[os_mon] memory supervisor port (memsup): Erlang has closed
[os_mon] cpu supervisor port (cpu_sup): Erlang has closed
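To put the number in the eheap_alloc line into context, here is a small arithmetic sketch. It assumes the standard 1 GiB of memory for a t2.micro instance and only converts units; it does not show how much memory was actually free when the allocation failed, so it is a starting point rather than a diagnosis.

```python
FAILED_ALLOCATION_BYTES = 212_907_632  # from the eheap_alloc line above
T2_MICRO_RAM_BYTES = 1 * 1024**3       # assumption: t2.micro has 1 GiB of memory


def to_mib(num_bytes: int) -> float:
    """Convert a byte count to mebibytes."""
    return num_bytes / 1024**2


request_mib = to_mib(FAILED_ALLOCATION_BYTES)
share_of_ram = FAILED_ALLOCATION_BYTES / T2_MICRO_RAM_BYTES

print(f"failed allocation: {request_mib:.1f} MiB")   # 203.0 MiB
print(f"share of total RAM: {share_of_ram:.1%}")     # 19.8%
```

A single request for roughly a fifth of the host's memory, on a machine that also runs the peer, orderer, CA and other containers, suggests checking overall memory usage on the host around the time of the crash.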

PostgreSQL on IBM Cloud Kubernetes returns "psql: FATAL: password authentication failed for user "replica_user"" error. Works on GCP and Azure

I have deployed this PostgreSQL image to the IBM Cloud, Google Cloud Platform and Microsoft Azure using Kubernetes. https://github.com/paunin/PostDock
It was deployed on all 3 platforms with identical configurations and an identical process. On the IBM Cloud, however, it fails with the error "psql: FATAL: password authentication failed for user "replica_user""
You can find below the logs from all 3 cloud platforms. Has anyone experienced this?
IBM Cloud Log
>>> Setting up STOP handlers...
>>> STARTING SSH (if required)...
>>> SSH is not enabled!
>>> STARTING POSTGRES...
>>> TUNING UP POSTGRES...
>>> Cleaning data folder which might have some garbage...
psql: FATAL: password authentication failed for user "replica_user"
psql: could not connect to server: Connection refused
Is the server running on host "cyclos-postgres-node2-service" (172.30.65.206) and accepting
TCP/IP connections on port 5432?
>>> Auto-detected master name: ''
>>> Setting up repmgr...
>>> Setting up repmgr config file '/etc/repmgr.conf'...
>>> Setting up upstream node...
cat: /var/lib/postgresql/data/standby.lock: No such file or directory
>>> Previously Locked standby upstream node LOCKED_STANDBY=''
>>> Waiting for upstream postgres server...
>>> Wait db replica_db on cyclos-postgres-node1-service:5432(user: replica_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
psql: FATAL: password authentication failed for user "replica_user"
>>>>>> Db replica_db is still not accessable on cyclos-postgres-node1-service:5432 (will try 30 times more)
....
The last couple of lines are then repeated many times.
This is the log file from deploying the same application, using identical processes on the Google Cloud. It works just fine on the Google Cloud Platform.
Google Cloud Log
>>> Setting up STOP handlers...
>>> STARTING SSH (if required)...
>>> SSH is not enabled!
>>> STARTING POSTGRES...
>>> TUNING UP POSTGRES...
>>> Cleaning data folder which might have some garbage...
psql: could not connect to server: Connection refused
Is the server running on host "cyclos-postgres-node1-service" (10.52.0.11) and accepting
TCP/IP connections on port 5432?
psql: could not connect to server: Connection refused
Is the server running on host "cyclos-postgres-node2-service" (10.52.0.12) and accepting
TCP/IP connections on port 5432?
>>> Auto-detected master name: ''
>>> Setting up repmgr...
>>> Setting up repmgr config file '/etc/repmgr.conf'...
>>> Setting up upstream node...
cat: /var/lib/postgresql/data/standby.lock: No such file or directory
>>> Previously Locked standby upstream node LOCKED_STANDBY=''
>>> Waiting for upstream postgres server...
>>> Wait db replica_db on cyclos-postgres-node1-service:5432(user: replica_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
psql: could not connect to server: Connection refused
Is the server running on host "cyclos-postgres-node1-service" (10.52.0.11) and accepting
TCP/IP connections on port 5432?
>>>>>> Db replica_db is still not accessable on cyclos-postgres-node1-service:5432 (will try 30 times more)
>>>>>> Db replica_db is still not accessable on cyclos-postgres-node1-service:5432 (will try 29 times more)
psql: could not connect to server: Connection refused
Is the server running on host "cyclos-postgres-node1-service" (10.52.0.11) and accepting
TCP/IP connections on port 5432?
psql: could not connect to server: Connection refused
Is the server running on host "cyclos-postgres-node1-service" (10.52.0.11) and accepting
TCP/IP connections on port 5432?
>>>>>> Db replica_db is still not accessable on cyclos-postgres-node1-service:5432 (will try 28 times more)
>>>>>> Db replica_db exists on cyclos-postgres-node1-service:5432!
>>> REPLICATION_UPSTREAM_NODE_ID=1
>>> Sending in background postgres start...
>>> Waiting for upstream postgres server...
>>> Wait db replica_db on cyclos-postgres-node1-service:5432(user: replica_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
>>>>>> Db replica_db exists on cyclos-postgres-node1-service:5432!
>>> Starting standby node...
>>> Instance hasn't been set up yet.
>>> Clonning primary node...
>>> Waiting for upstream postgres server...
>>> Wait db replica_db on cyclos-postgres-node1-service:5432(user: replica_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
NOTICE: destination directory '/var/lib/postgresql/data' provided
INFO: connecting to upstream node
INFO: Successfully connected to upstream node. Current installation size is 34 MB
INFO: checking and correcting permissions on existing directory /var/lib/postgresql/data ...
>>>>>> Db replica_db exists on cyclos-postgres-node1-service:5432!
NOTICE: starting backup (using pg_basebackup)...
INFO: executing: '/usr/lib/postgresql/9.5/bin/pg_basebackup -l "repmgr base backup" -D /var/lib/postgresql/data -h cyclos-postgres-node1-service -p 5432 -U replica_user -c fast -X stream '
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example : pg_ctl -D /var/lib/postgresql/data start
HINT: After starting the server, you need to register this standby with "repmgr standby register"
[REPMGR EVENT] Node id: 2; Event type: standby_clone; Success [1|0]: 1; Time: 2018-02-02 13:24:32.87843+00; Details: Cloned from host 'cyclos-postgres-node1-service', port 5432; backup method: pg_basebackup; --force: Y
>>> Configuring /var/lib/postgresql/data/postgresql.conf
>>>>>> Will add configs to exists file
>>> Starting postgres...
>>> Waiting for local postgres server start...
>>> Wait db replica_db on cyclos-postgres-node2-service:5432(user: replica_user,password: *******), will try 60 times with delay 10 seconds (TIMEOUT=600)
LOG: incomplete startup packet
LOG: incomplete startup packet
LOG: database system was interrupted; last known up at 2018-02-02 13:24:31 UTC
FATAL: the database system is starting up
psql: FATAL: the database system is starting up
>>>>>> Db replica_db is still not accessable on cyclos-postgres-node2-service:5432 (will try 60 times more)
LOG: entering standby mode
LOG: redo starts at 0/2000028
LOG: consistent recovery state reached at 0/20000F8
LOG: database system is ready to accept read only connections
LOG: started streaming WAL from primary at 0/3000000 on timeline 1
>>>>>> Db replica_db exists on cyclos-postgres-node2-service:5432!
>>> Waiting for replication on this node is over(if any in progress): CLEAN_UP_ON_FAIL=, INTERVAL=30
>>> Replication is done
>>> Unregister the node if it was done before
DELETE 0
>>> Registering node with role standby
INFO: connecting to standby database
INFO: connecting to master database
INFO: retrieving node list for cluster 'postgres_cluster'
INFO: registering the standby
[REPMGR EVENT] Node id: 2; Event type: standby_register; Success [1|0]: 1; Time: 2018-02-02 13:24:51.891592+00; Details:
INFO: standby registration complete
NOTICE: standby node correctly registered for cluster postgres_cluster with id 2 (conninfo: user=replica_user password=replica_pass host=cyclos-postgres-node2-service dbname=replica_db port=5432 connect_timeout=2)
Locking standby (NEW_UPSTREAM_NODE_ID=1)...
>>> Starting repmgr daemon...
[2018-02-02 13:24:53] [NOTICE] looking for configuration file in current directory
[2018-02-02 13:24:53] [NOTICE] looking for configuration file in /etc
[2018-02-02 13:24:53] [NOTICE] configuration file found at: /etc/repmgr.conf
[2018-02-02 13:24:53] [INFO] connecting to database 'user=replica_user password=replica_pass host=cyclos-postgres-node2-service dbname=replica_db port=5432 connect_timeout=2'
[2018-02-02 13:24:53] [INFO] connected to database, checking its state
[2018-02-02 13:24:53] [INFO] connecting to master node of cluster 'postgres_cluster'
[2018-02-02 13:24:53] [INFO] retrieving node list for cluster 'postgres_cluster'
[2018-02-02 13:24:53] [INFO] checking role of cluster node '1'
[2018-02-02 13:24:53] [INFO] checking cluster configuration with schema 'repmgr_postgres_cluster'
[2018-02-02 13:24:53] [INFO] checking node 2 in cluster 'postgres_cluster'
[2018-02-02 13:24:53] [INFO] reloading configuration file
[2018-02-02 13:24:53] [INFO] configuration has not changed
[2018-02-02 13:24:53] [INFO] starting continuous standby node monitoring
ERROR: cannot execute DELETE in a read-only transaction
STATEMENT: DELETE FROM repmgr_postgres_cluster.repl_nodes WHERE conninfo LIKE '%host=cyclos-postgres-node3-service%'
And on the Azure Cloud, it works just fine as well.
Azure Cloud Log
>>> Setting up STOP handlers...
>>> STARTING SSH (if required)...
>>> SSH is not enabled!
>>> STARTING POSTGRES...
>>> TUNING UP POSTGRES...
>>> Cleaning data folder which might have some garbage...
psql: could not connect to server: Connection refused
Is the server running on host "cyclos-postgres-node2-service" (10.244.0.9) and accepting
TCP/IP connections on port 5432?
>>> Auto-detected master name: 'cyclos-postgres-node1-service'
>>> Setting up repmgr...
>>> Setting up repmgr config file '/etc/repmgr.conf'...
>>> Setting up upstream node...
cat: /var/lib/postgresql/data/standby.lock: No such file or directory
>>> Previously Locked standby upstream node LOCKED_STANDBY=''
>>> Waiting for upstream postgres server...
>>> Wait db replica_db on cyclos-postgres-node1-service:5432(user: replica_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
>>>>>> Db replica_db exists on cyclos-postgres-node1-service:5432!
>>> REPLICATION_UPSTREAM_NODE_ID=1
>>> Sending in background postgres start...
>>> Waiting for upstream postgres server...
>>> Wait db replica_db on cyclos-postgres-node1-service:5432(user: replica_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
>>>>>> Db replica_db exists on cyclos-postgres-node1-service:5432!
>>> Starting standby node...
>>> Instance hasn't been set up yet.
>>> Clonning primary node...
>>> Waiting for upstream postgres server...
>>> Wait db replica_db on cyclos-postgres-node1-service:5432(user: replica_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
NOTICE: destination directory '/var/lib/postgresql/data' provided
INFO: connecting to upstream node
>>>>>> Db replica_db exists on cyclos-postgres-node1-service:5432!
INFO: Successfully connected to upstream node. Current installation size is 34 MB
INFO: checking and correcting permissions on existing directory /var/lib/postgresql/data ...
NOTICE: starting backup (using pg_basebackup)...
INFO: executing: '/usr/lib/postgresql/9.5/bin/pg_basebackup -l "repmgr base backup" -D /var/lib/postgresql/data -h cyclos-postgres-node1-service -p 5432 -U replica_user -c fast -X stream '
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example : pg_ctl -D /var/lib/postgresql/data start
HINT: After starting the server, you need to register this standby with "repmgr standby register"
[REPMGR EVENT] Node id: 2; Event type: standby_clone; Success [1|0]: 1; Time: 2018-02-02 06:50:47.340146+00; Details: Cloned from host 'cyclos-postgres-node1-service', port 5432; backup method: pg_basebackup; --force: Y
>>> Configuring /var/lib/postgresql/data/postgresql.conf
>>>>>> Will add configs to exists file
>>> Starting postgres...
>>> Waiting for local postgres server start...
>>> Wait db replica_db on cyclos-postgres-node2-service:5432(user: replica_user,password: *******), will try 60 times with delay 10 seconds (TIMEOUT=600)
LOG: incomplete startup packet
LOG: database system was interrupted; last known up at 2018-02-02 06:50:46 UTC
LOG: incomplete startup packet
FATAL: the database system is starting up
psql: FATAL: the database system is starting up
>>>>>> Db replica_db is still not accessable on cyclos-postgres-node2-service:5432 (will try 60 times more)
LOG: entering standby mode
LOG: redo starts at 0/2000028
LOG: consistent recovery state reached at 0/2000130
LOG: database system is ready to accept read only connections
LOG: started streaming WAL from primary at 0/3000000 on timeline 1
>>>>>> Db replica_db exists on cyclos-postgres-node2-service:5432!
>>> Waiting for replication on this node is over(if any in progress): CLEAN_UP_ON_FAIL=, INTERVAL=30
>>> Replication is done
>>> Unregister the node if it was done before
DELETE 0
>>> Registering node with role standby
INFO: connecting to standby database
INFO: connecting to master database
INFO: retrieving node list for cluster 'postgres_cluster'
INFO: registering the standby
[REPMGR EVENT] Node id: 2; Event type: standby_register; Success [1|0]: 1; Time: 2018-02-02 06:51:05.083455+00; Details:
INFO: standby registration complete
NOTICE: standby node correctly registered for cluster postgres_cluster with id 2 (conninfo: user=replica_user password=replica_pass host=cyclos-postgres-node2-service dbname=replica_db port=5432 connect_timeout=2)
Locking standby (NEW_UPSTREAM_NODE_ID=1)...
>>> Starting repmgr daemon...
[2018-02-02 06:51:05] [NOTICE] looking for configuration file in current directory
[2018-02-02 06:51:05] [NOTICE] looking for configuration file in /etc
[2018-02-02 06:51:05] [NOTICE] configuration file found at: /etc/repmgr.conf
[2018-02-02 06:51:05] [INFO] connecting to database 'user=replica_user password=replica_pass host=cyclos-postgres-node2-service dbname=replica_db port=5432 connect_timeout=2'
[2018-02-02 06:51:06] [INFO] connected to database, checking its state
[2018-02-02 06:51:06] [INFO] connecting to master node of cluster 'postgres_cluster'
[2018-02-02 06:51:06] [INFO] retrieving node list for cluster 'postgres_cluster'
[2018-02-02 06:51:06] [INFO] checking role of cluster node '1'
[2018-02-02 06:51:06] [INFO] checking cluster configuration with schema 'repmgr_postgres_cluster'
[2018-02-02 06:51:06] [INFO] checking node 2 in cluster 'postgres_cluster'
[2018-02-02 06:51:06] [INFO] reloading configuration file
[2018-02-02 06:51:06] [INFO] configuration has not changed
[2018-02-02 06:51:06] [INFO] starting continuous standby node monitoring
ERROR: cannot execute DELETE in a read-only transaction
STATEMENT: DELETE FROM repmgr_postgres_cluster.repl_nodes WHERE conninfo LIKE '%host=cyclos-postgres-node3-service%'
I was able to run this on a paid cluster in IBM Cloud and it appears to be working. I did NOT use the persistent volumes and I was on a paid cluster. Please note that persistent volumes are not available on free clusters, so if you are testing on a free cluster you will get issues if you use persistent volumes.
My cluster has 3 workers of size u2c.2x4 (the smallest available) and is on the default version of Kubernetes for IBM Cloud (1.8.6), if that helps you debug at all. Please try again or if your setup is different than mine, let me know and I can try with a matching setup.
$ kubectl logs --namespace=mysystem mysystem-db-node1-0
>>> Setting up STOP handlers...
>>> STARTING SSH (if required)...
>>> SSH is not enabled!
>>> STARTING POSTGRES...
>>> TUNING UP POSTGRES...
>>> Cleaning data folder which might have some garbage...
psql: could not translate host name "mysystem-db-node1-service" to address: Name or service not known
psql: could not translate host name "mysystem-db-node2-service" to address: Name or service not known
>>> Auto-detected master name: ''
>>> Setting up repmgr...
>>> Setting up repmgr config file '/etc/repmgr.conf'...
>>> Setting up upstream node...
>>> Sending in background postgres start...
>>> Waiting for local postgres server start...
>>> Wait db replica_db on mysystem-db-node1-service:5432(user: replica_user,password: *******), will try 60 times with delay 10 seconds (TIMEOUT=600)
psql: could not translate host name "mysystem-db-node3-service" to address: Name or service not known
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
fixing permissions on existing directory /var/lib/postgresql/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
psql: could not connect to server: Connection refused
Is the server running on host "mysystem-db-node1-service" (172.30.207.54) and accepting
TCP/IP connections on port 5432?
selecting default shared_buffers ... >>>>>> Db replica_db is still not accessable on mysystem-db-node1-service:5432 (will try 60 times more)
128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
creating template1 database in /var/lib/postgresql/data/base/1 ... ok
initializing pg_authid ... ok
initializing dependencies ... ok
creating system views ... ok
loading system objects' descriptions ... ok
creating collations ... ok
creating conversions ... ok
creating dictionaries ... ok
setting privileges on built-in objects ... ok
creating information schema ... ok
loading PL/pgSQL server-side language ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... ok
copying template1 to postgres ... ok
syncing data to disk ... ok
Success. You can now start the database server using:
pg_ctl -D /var/lib/postgresql/data -l logfile start
WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
waiting for server to start....LOG: could not bind IPv6 socket: Cannot assign requested address
HINT: Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
LOG: database system was shut down at 2018-02-14 15:40:14 UTC
LOG: MultiXact member wraparound protections are now enabled
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
done
server started
CREATE DATABASE
CREATE ROLE
/docker-entrypoint.sh: running /docker-entrypoint-initdb.d/entrypoint.sh
>>> Configuring /var/lib/postgresql/data/postgresql.conf
>>>>>> Config file was replaced with standard one!
>>>>>> Adding config 'wal_keep_segments'='250'
>>>>>> Adding config 'shared_buffers'='300MB'
>>>>>> Adding config 'archive_command'=''/bin/true''
>>> Creating replication user 'replica_user'
CREATE ROLE
>>> Creating replication db 'replica_db'
LOG: received fast shutdown request
LOG: aborting any active transactions
LOG: autovacuum launcher shutting down
waiting for server to shut down....LOG: shutting down
LOG: database system is shut down
done
server stopped
PostgreSQL init process complete; ready for start up.
LOG: database system was shut down at 2018-02-14 15:40:16 UTC
LOG: MultiXact member wraparound protections are now enabled
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
LOG: incomplete startup packet
LOG: incomplete startup packet
>>>>>> Db replica_db exists on mysystem-db-node1-service:5432!
>>> Registering node with role master
INFO: connecting to master database
INFO: master register: creating database objects inside the 'repmgr_mysystem_cluster' schema
INFO: retrieving node list for cluster 'mysystem_cluster'
[REPMGR EVENT] Node id: 1; Event type: master_register; Success [1|0]: 1; Time: 2018-02-14 15:40:27.337393+00; Details:
[REPMGR EVENT] will execute script '/usr/local/bin/cluster/repmgr/events/execs/master_register.sh' for the event
[REPMGR EVENT::master_register] Node id: 1; Event type: master_register; Success [1|0]: 1; Time: 2018-02-14 15:40:27.337393+00; Details:
[REPMGR EVENT::master_register] Locking master...
[REPMGR EVENT::master_register] Unlocking standby...
NOTICE: master node correctly registered for cluster 'mysystem_cluster' with id 1 (conninfo: user=replica_user password=replica_pass host=mysystem-db-node1-service dbname=replica_db port=5432 connect_timeout=2)
>>> Starting repmgr daemon...
[2018-02-14 15:40:27] [NOTICE] looking for configuration file in current directory
[2018-02-14 15:40:27] [NOTICE] looking for configuration file in /etc
[2018-02-14 15:40:27] [NOTICE] configuration file found at: /etc/repmgr.conf
[2018-02-14 15:40:27] [INFO] connecting to database 'user=replica_user password=replica_pass host=mysystem-db-node1-service dbname=replica_db port=5432 connect_timeout=2'
[2018-02-14 15:40:27] [INFO] connected to database, checking its state
[2018-02-14 15:40:27] [INFO] checking cluster configuration with schema 'repmgr_mysystem_cluster'
[2018-02-14 15:40:27] [INFO] checking node 1 in cluster 'mysystem_cluster'
[2018-02-14 15:40:27] [INFO] reloading configuration file
[2018-02-14 15:40:27] [INFO] configuration has not changed
[2018-02-14 15:40:27] [INFO] starting continuous master connection check
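When comparing logs like these across platforms, it helps to separate the kinds of psql errors: "Connection refused" means nothing was listening yet, while "password authentication failed" means a server answered and rejected the credentials. The following is a minimal sketch that counts each kind in a log; the patterns are copied from the messages above, and the label names are my own.

```python
import re
from collections import Counter

# Patterns copied from the psql messages that appear in the logs above.
PATTERNS = {
    "auth_failed": re.compile(r"password authentication failed for user"),
    "connection_refused": re.compile(r"could not connect to server: Connection refused"),
    "dns_failure": re.compile(r"could not translate host name"),
    "starting_up": re.compile(r"the database system is starting up"),
}


def classify(line: str):
    """Return a label for a psql error line, or None for any other line."""
    if not line.startswith("psql:"):
        return None
    for label, pattern in PATTERNS.items():
        if pattern.search(line):
            return label
    return "other"


def summarize(log_text: str) -> Counter:
    """Count psql error kinds in a complete log dump."""
    labels = (classify(line) for line in log_text.splitlines())
    return Counter(label for label in labels if label)


# Example: summarize(open("node2.log").read())  # "node2.log" is a placeholder name
```

In the failing IBM Cloud log the summary would be dominated by auth_failed, whereas the working logs only show connection_refused and starting_up while the nodes come up.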

How to connect Mist to the private blockchain on remote server (Azure)?

I've installed Mist on my local PC (Windows 10), but I don't want to sync Main/Test networks. So I've used this Ethereum + Azure tutorial and now I can work via SSH on my private network.
geth --dev console
I also know that it's possible to run Mist against a custom blockchain using a special flag:
mist.exe --rpc http://YOUR_IP:PORT
So, following geth --help, I run geth --dev --rpc console on the Azure virtual machine and then run mist.exe --rpc http://VM_IP:8545 locally, which produces this error:
[2016-09-24 18:01:21.928] [INFO] Sockets/node-ipc - Connect to {"hostPort":"http://VM_IP:8545"}
[2016-09-24 18:01:24.968] [ERROR] Sockets/node-ipc - Connection failed (3000ms elapsed)
[2016-09-24 18:01:24.971] [WARN] EthereumNode - Failed to connect to node. Maybe it's not running so let's start our own...
[2016-09-24 18:01:24.979] [INFO] EthereumNode - Node type: geth
[2016-09-24 18:01:24.982] [INFO] EthereumNode - Network: test
[2016-09-24 18:01:24.983] [INFO] EthereumNode - Start node: geth test
[2016-09-24 18:01:32.284] [INFO] EthereumNode - 3000ms elapsed, assuming node started up successfully
[2016-09-24 18:01:32.286] [INFO] EthereumNode - Started node successfully: geth test
[2016-09-24 18:01:32.327] [INFO] Sockets/node-ipc - Connect to {"hostPort":"http://VM_IP:8545"}
[2016-09-24 18:02:02.332] [ERROR] Sockets/node-ipc - Connection failed (30000ms elapsed)
[2016-09-24 18:02:02.333] [ERROR] EthereumNode - Failed to connect to node Error: Unable to connect to socket: timeout
P.S. Mist version - 0.8.2
Your approach is correct. I would say that you have a network configuration issue that prevents Mist from talking to geth.
I would suggest running the following test to see whether you hit the same issue:
- on the machine where you have Mist, find the geth.exe executable
- run geth with geth --testnet --rpc
- start Mist with ./Mist --rpc /.../Ethereum/testnet/geth.ipc or ./Mist --rpc http://localhost:8545
I am on a Mac, so on Windows I guess you will have to reverse the / and add some C: decorations here and there.
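One way to tell a network problem apart from a Mist problem is to check, from the Windows machine, whether the RPC port on the VM accepts TCP connections at all. The following is a minimal sketch, not part of Mist or geth: the class name PortProbe is hypothetical, the port 8545 is taken from the question, and the host is whatever you pass on the command line. If the probe fails, two things worth ruling out are the Azure firewall rules for that port and the address geth binds its RPC server to (it listens on localhost unless told otherwise with --rpcaddr).

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortProbe {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolvable host: treat all as "not reachable"
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults are assumptions for illustration; pass the VM address as the first argument
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8545;
        System.out.println(host + ":" + port + " reachable: " + isReachable(host, port, 3000));
    }
}
```

Run it as java PortProbe VM_IP 8545. A result of false means the connection never reaches geth, so Mist cannot be the culprit.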

Spring-boot slow to start

When I launch my jhipster app using "mvn spring-boot:run", it takes up to 60 seconds to start...
The first part of my log is:
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building jhipster 0.0.1-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] >>> spring-boot-maven-plugin:1.1.9.RELEASE:run (default-cli) @ jhipster >>>
[INFO]
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-versions) @ jhipster ---
[INFO]
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ jhipster ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 4 resources
[INFO] Copying 22 resources
[INFO]
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ jhipster ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ jhipster ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 3 resources
[INFO]
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ jhipster ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] <<< spring-boot-maven-plugin:1.1.9.RELEASE:run (default-cli) @ jhipster <<<
[INFO]
[INFO] --- spring-boot-maven-plugin:1.1.9.RELEASE:run (default-cli) @ jhipster ---
[INFO] Attaching agents: []
Listening for transport dt_socket at address: 5005
--> Then it hangs for around 30 seconds before continuing :
[INFO] com.mycompany.myapp.Application - Starting Application on MacBook-Pro.local with PID 5130 (/Users/othomas/Developpement/jhipster-1.9.0/target/classes started by othomas in /Users/othomas/Developpement/jhipster-1.9.0)
[DEBUG] com.mycompany.myapp.Application - Running with Spring Boot v1.1.9.RELEASE, Spring v4.0.8.RELEASE
[DEBUG] org.jboss.logging - Logging Provider: org.jboss.logging.Log4jLoggerProvider
...
I remember using older versions of the jhipster generator (0.17 etc.), and the app started in 15-20 seconds.
Is this normal, or is there a problem on my side? Where should I look?
Thanks,
O.
I've been suffering from slow startup times myself and wondering what the cause was. I get all the console messages saying various things have started, and then it hangs just before the final message saying the app has loaded.
Eventually I found I could use Java VisualVM, which ships with the JDK, to see what was going on. If you have the JDK installed, it's jvisualvm.exe in the bin folder. Then, when I debug Application.java, the Tomcat process pops up and you can track what's going on.
I took a couple of thread dumps at the point where it hangs, and it always seemed to be where the Swagger API docs are being generated. A bit more digging showed that this is configured in a class called MetricsConfiguration, which is excluded if you run with a profile called "fast".
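If VisualVM is not at hand, a thread dump can also be taken from inside the JVM with the standard java.lang.management API (or from the command line with the JDK's jstack tool). The sketch below is only an illustration of that idea: the class name ThreadDumpSketch is hypothetical, and it assumes you can call dump() from within the application, for example from a background thread started early in Application.java.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

class ThreadDumpSketch {
    /** Builds a text dump (name, state, stack) of all live threads in the current JVM. */
    static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // false, false: skip locked monitor/synchronizer details to keep the dump cheap
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Taking two or three dumps a few seconds apart during the hang and comparing the stack of the main thread usually shows which component is busy.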
In Eclipse I edited my debug configuration to include a program argument of:
--spring.profiles.active=dev,fast
This cuts down the startup time from 230 seconds to just 25!
I had a quick scan, and "fast" seems to disable all sorts of things. It mainly looks like the stuff under the admin menu, which you'll probably not need during development anyway. Personally, I prefer a fast boot to being able to see the REST docs during development.
Swagger being such a hog made me wonder whether it's such a good idea after all. Is it worth the cost? I then read this http://java.dzone.com/articles/swagger-great and I'm considering just removing it altogether. It's a nice idea, but it seems to add 33 MB to the build and, for me, was causing really slow startup times.
For info I have around 16 entities. So not small but not excessively large either.
Make sure you aren't running the server in debug mode and have a breakpoint set. This reduced the startup time of one of my applications from 3 min to 22 sec.
This is weird.
Indeed, it should start in 5-15 seconds depending on your machine and specific setup.
But it should not hang for 30 seconds. The "Listening for transport dt_socket" line you show is fairly new: we now launch the application in debug mode when you use the dev profile, so that you can attach a debugger to it.
It looks like it's waiting for you to connect a debugger. I've never seen that myself, so maybe you have some specific JVM option for attaching a debugger at startup, with a timeout of 30 seconds?
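Whether the JVM has been told to wait for a debugger is visible in its startup arguments: a JDWP agent option with suspend=y (which is also the default when suspend is omitted) blocks the application until a debugger attaches. Here is a small sketch for checking this from inside the running JVM. The class name DebugAgentCheck is hypothetical, and the code only inspects the arguments; it makes no assumption about how the Maven plugin passes them.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class DebugAgentCheck {
    /** Returns the first JVM argument that enables the JDWP debug agent, or null if none. */
    static String findJdwpArgument(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith("-agentlib:jdwp") || arg.startsWith("-Xrunjdwp")) {
                return arg;
            }
        }
        return null;
    }

    /** True if the given JDWP argument makes the JVM wait for a debugger before running. */
    static boolean suspendsOnStart(String jdwpArg) {
        // JDWP treats a missing suspend option as suspend=y
        return jdwpArg != null && !jdwpArg.contains("suspend=n");
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        String jdwp = findJdwpArgument(jvmArgs);
        if (jdwp == null) {
            System.out.println("No JDWP debug agent configured");
        } else {
            System.out.println("Debug agent: " + jdwp
                    + " (waits for debugger: " + suspendsOnStart(jdwp) + ")");
        }
    }
}
```

If this reports suspend=n, the JVM is not waiting for a debugger and the pause has to come from somewhere else.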
Thanks for your feedback. I investigated and put more logs in the app (Application.java).
Actually, the problem does not come from debug mode; the application does not hang there.
The first big "pause" comes from scanning the Liquibase packages (addLiquibaseScanPackages(); in Application.java): 26 seconds!
My second pause is also related to Liquibase (log line "Configuring Liquibase"): 20 seconds. During that time, if I set the Liquibase log level to DEBUG, I see that a lock is set and then released, but that happens very quickly.
I really don't understand. I am using an H2 in-memory database, JDK 1.7.0_25 and Maven 3.0.5, running on a MacBook Pro with an SSD.
Here is my full log when I run with "mvn spring-boot:run".
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building jhipster 0.0.1-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] >>> spring-boot-maven-plugin:1.1.9.RELEASE:run (default-cli) @ jhipster >>>
[INFO]
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-versions) @ jhipster ---
[INFO]
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ jhipster ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 4 resources
[INFO] Copying 22 resources
[INFO]
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ jhipster ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ jhipster ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 3 resources
[INFO]
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ jhipster ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] <<< spring-boot-maven-plugin:1.1.9.RELEASE:run (default-cli) @ jhipster <<<
[INFO]
[INFO] --- spring-boot-maven-plugin:1.1.9.RELEASE:run (default-cli) @ jhipster ---
[INFO] Attaching agents: []
Listening for transport dt_socket at address: 5005
Wed Nov 26 16:32:23 CET 2014 Added log : Application is about to start
Wed Nov 26 16:32:28 CET 2014Added log : Application started, now we set banner to false
Wed Nov 26 16:32:28 CET 2014Added log : About to add Default profile
Wed Nov 26 16:32:28 CET 2014Added log : Default Profile added. Now we scan liquibase packages
Wed Nov 26 16:32:28 CET 2014Added log : Liquibase pakages scanned. Now we run the app
2014-11-26 16:32:54,564 [INFO] com.mycompany.myapp.Application - Starting Application on MacBook-Pro.local with PID 25452 (/Users/othomas/Developpement/jhipster-1.9.0/target/classes started by othomas in /Users/othomas/Developpement/jhipster-1.9.0)
2014-11-26 16:32:54,567 [DEBUG] com.mycompany.myapp.Application - Running with Spring Boot v1.1.9.RELEASE, Spring v4.0.8.RELEASE
2014-11-26 16:32:57,429 [DEBUG] org.jboss.logging - Logging Provider: org.jboss.logging.Log4jLoggerProvider
2014-11-26 16:32:57,559 [DEBUG] com.mycompany.myapp.config.AsyncConfiguration - Creating Async Task Executor
2014-11-26 16:32:58,305 [DEBUG] com.mycompany.myapp.config.MetricsConfiguration - Registering JVM gauges
2014-11-26 16:32:58,379 [INFO] com.mycompany.myapp.config.MetricsConfiguration - Initializing Metrics JMX reporting
2014-11-26 16:32:58,445 [DEBUG] com.mycompany.myapp.config.DatabaseConfiguration - Configuring Datasource
2014-11-26 16:32:59,353 [DEBUG] com.mycompany.myapp.config.DatabaseConfiguration - Configuring Liquibase
2014-11-26 16:33:19,489 [DEBUG] com.mycompany.myapp.config.CacheConfiguration - Starting Ehcache
2014-11-26 16:33:19,491 [DEBUG] com.mycompany.myapp.config.CacheConfiguration - Registering Ehcache Metrics gauges
2014-11-26 16:33:23,419 [DEBUG] com.mycompany.myapp.config.MailConfiguration - Configuring mail server
2014-11-26 16:33:24,559 [INFO] com.mycompany.myapp.config.WebConfigurer - Web application configuration, using profiles: [dev]
2014-11-26 16:33:24,560 [DEBUG] com.mycompany.myapp.config.WebConfigurer - Initializing Metrics registries
2014-11-26 16:33:24,564 [DEBUG] com.mycompany.myapp.config.WebConfigurer - Registering Metrics Filter
2014-11-26 16:33:24,565 [DEBUG] com.mycompany.myapp.config.WebConfigurer - Registering Metrics Servlet
2014-11-26 16:33:24,567 [DEBUG] com.mycompany.myapp.config.WebConfigurer - Registering GZip Filter
2014-11-26 16:33:24,569 [DEBUG] com.mycompany.myapp.config.WebConfigurer - Initialize H2 console
2014-11-26 16:33:24,570 [INFO] com.mycompany.myapp.config.WebConfigurer - Web application fully configured
2014-11-26 16:33:29,753 [INFO] com.mycompany.myapp.Application - Running with Spring profile(s) : [dev]
2014-11-26 16:33:30,012 [INFO] com.mycompany.myapp.config.ThymeleafConfiguration - loading non-reloadable mail messages resources
2014-11-26 16:33:30,896 [DEBUG] com.mycompany.myapp.aop.logging.LoggingAspect - Enter: com.mycompany.myapp.repository.CustomAuditEventRepository.auditEventRepository() with argument[s] = []
2014-11-26 16:33:30,905 [DEBUG] com.mycompany.myapp.aop.logging.LoggingAspect - Exit: com.mycompany.myapp.repository.CustomAuditEventRepository.auditEventRepository() with result = com.mycompany.myapp.repository.CustomAuditEventRepository$1@1edce963
2014-11-26 16:33:37,229 [INFO] com.mycompany.myapp.Application - Started Application in 68.311 seconds (JVM running for 73.972)
Wed Nov 26 16:33:37 CET 2014Added log : App is running
Thanks,
Olivier
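The pauses in a log like the one above can also be located mechanically by comparing the timestamps of consecutive lines. The following is a rough sketch, assuming the "yyyy-MM-dd HH:mm:ss,SSS" prefix used by the application log lines; the class name LogGaps and the 5-second threshold are made up for illustration, and lines without such a prefix (the Maven output and the hand-added "Added log" lines) are simply skipped.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LogGaps {
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");
    private static final int STAMP_LENGTH = 23; // length of "2014-11-26 16:32:59,353"

    /** Parses the leading timestamp of a log line, or returns null if there is none. */
    static LocalDateTime parseStamp(String line) {
        if (line.length() < STAMP_LENGTH) {
            return null;
        }
        try {
            return LocalDateTime.parse(line.substring(0, STAMP_LENGTH), STAMP);
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    /** Reports every timestamped line preceded by a silence of at least minMillis. */
    static List<String> findGaps(List<String> lines, long minMillis) {
        List<String> gaps = new ArrayList<>();
        LocalDateTime previous = null;
        for (String line : lines) {
            LocalDateTime current = parseStamp(line);
            if (current == null) {
                continue; // not an application log line
            }
            if (previous != null) {
                long millis = Duration.between(previous, current).toMillis();
                if (millis >= minMillis) {
                    gaps.add(millis + " ms before: " + line);
                }
            }
            previous = current;
        }
        return gaps;
    }

    public static void main(String[] args) {
        // Three consecutive lines taken from the log in the question
        List<String> lines = Arrays.asList(
            "2014-11-26 16:32:58,445 [DEBUG] com.mycompany.myapp.config.DatabaseConfiguration - Configuring Datasource",
            "2014-11-26 16:32:59,353 [DEBUG] com.mycompany.myapp.config.DatabaseConfiguration - Configuring Liquibase",
            "2014-11-26 16:33:19,489 [DEBUG] com.mycompany.myapp.config.CacheConfiguration - Starting Ehcache");
        for (String gap : findGaps(lines, 5000)) {
            System.out.println(gap);
        }
        // prints: 20136 ms before: 2014-11-26 16:33:19,489 [DEBUG] com.mycompany.myapp.config.CacheConfiguration - Starting Ehcache
    }
}
```

On these three lines it flags the roughly 20-second silence after "Configuring Liquibase", which matches the second pause described in the post.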
It is advisable to start the application with breakpoints disabled unless you want to debug during startup.
You can simply raise the maximum heap size with -Xmx, e.g. java -jar -Xmx1024m.
Spring Boot loads a lot of Spring beans at startup, so giving it more heap memory can improve its performance.
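Before raising -Xmx, it may be worth checking whether the heap is actually the constraint, since nothing in this thread establishes that it is. The snippet below is a minimal sketch (the class name HeapInfo is hypothetical) that prints the maximum heap the JVM will use and how much is currently occupied. It relies only on the standard Runtime API.

```java
class HeapInfo {
    private static final long MB = 1024L * 1024L;

    /** Maximum heap the JVM will attempt to use (reflects -Xmx), in megabytes. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    /** Heap currently occupied by objects (allocated minus free), in megabytes. */
    static long usedHeapMegabytes() {
        Runtime runtime = Runtime.getRuntime();
        return (runtime.totalMemory() - runtime.freeMemory()) / MB;
    }

    public static void main(String[] args) {
        System.out.println("Max heap:  " + maxHeapMegabytes() + " MB");
        System.out.println("Used heap: " + usedHeapMegabytes() + " MB");
    }
}
```

If the used heap stays far below the maximum during startup, more heap is unlikely to help, and the time is probably going elsewhere.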

Resources