I am monitoring Hyperledger Fabric's CouchDB, and the CouchDB Docker container sometimes stops because of a heap allocation failure.
Below are the logs from right before the CouchDB container stopped.
The host is an AWS t2.micro instance, and the Fabric network consists of 1 peer (dev mode), 1 orderer, 1 cli, 1 ca, 1 couchdb and 1 ccenv container.
I have no idea what causes this or how to solve it.
[info] 2018-11-06T14:19:19.839617Z nonode#nohost <0.19858.2908> -------- Closing index for db: shards/80000000-9fffffff/my-fabric.1533697140 idx: _design/16c2f95e8309df9a6f993586266150ba639ba7e8 sig: "97b910c8bbd4a195d0813dc30b7f0b91" because normal
[info] 2018-11-06T14:19:19.844610Z nonode#nohost <0.12086.2908> -------- Index shutdown by monitor notice for db: shards/80000000-9fffffff/_replicator.1533696646 idx: _design/_replicator
[info] 2018-11-06T14:19:19.847820Z nonode#nohost <0.12086.2908> -------- Closing index for db: shards/80000000-9fffffff/_replicator.1533696646 idx: _design/_replicator sig: "3e823c2a4383ac0c18d4e574135a5b08" because normal
eheap_alloc: Cannot allocate 212907632 bytes of memory (of type "heap").
[os_mon] memory supervisor port (memsup): Erlang has closed
[os_mon] cpu supervisor port (cpu_sup): Erlang has closed
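For context, eheap_alloc failing to allocate ~213 MB on a t2.micro (1 GiB of RAM shared by the peer, orderer, ca, cli, ccenv and couchdb containers) points to plain memory exhaustion on the host rather than a CouchDB bug. A minimal sketch of how one might confirm the pressure and buy some headroom; the swap file path and the 1G size are arbitrary choices, not a recommendation:

# while the network is running, see how much memory is actually left
free -m
docker stats --no-stream

# stop-gap: add a swap file so the Erlang VM isn't killed outright
sudo fallocate -l 1G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

A larger instance type, or memory limits on the other containers so CouchDB keeps enough to work with, would be the more durable fixes.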
I have been struggling for some days now to deploy an Azure App Service with two minimalist Docker containers.
My first image is a basic PostgreSQL image and my second image is built with the following Dockerfile:
FROM python:3.7
RUN pip install streamlit
COPY app.py /streamlit-docker/
WORKDIR /streamlit-docker/
CMD streamlit run app.py
and where app.py is:
import streamlit as st
st.write('Hello world')
The docker-compose.yml file I use for creating my Azure App Service is the following:
version: '3.3'
services:
  db:
    image: atestcr.azurecr.io/postgres:latest
    environment:
      POSTGRES_PASSWORD: postgres
  app:
    image: atestcr.azurecr.io/my_app:latest
    ports:
      - '8501:8501'
I get an 'Application Error' whenever I try to access the webapp URL. However, when I deploy this app with a single docker container, using this docker-compose.yml:
version: '3.3'
services:
  app:
    image: atestcr.azurecr.io/my_app:latest
    ports:
      - '8501:8501'
everything works fine.
Here is a GitHub repo to recreate the bug.
These are the Azure logs:
2021-08-09T17:29:34.382Z INFO - Starting multi-container app..
2021-08-09T17:29:35.089Z INFO - Pulling image: atestcr.azurecr.io/postgres:latest
2021-08-09T17:29:35.313Z INFO - latest Pulling from postgres
2021-08-09T17:29:35.315Z INFO - Digest: sha256:b6df1345afa5990ea32866e5c331eefbf2e30a05f2a715c3a9691a6cb18fa253
2021-08-09T17:29:35.317Z INFO - Status: Image is up to date for atestcr.azurecr.io/postgres:latest
2021-08-09T17:29:35.319Z INFO - Pull Image successful, Time taken: 0 Minutes and 0 Seconds
2021-08-09T17:29:35.335Z INFO - Starting container for site
2021-08-09T17:29:35.335Z INFO - docker run -d -p 8477:5432 --name testapp32_db_0_6662435d -e WEBSITES_ENABLE_APP_SERVICE_STORAGE=false -e WEBSITE_SITE_NAME=testapp32 -e WEBSITE_AUTH_ENABLED=False -e WEBSITE_ROLE_INSTANCE_ID=0 -e WEBSITE_HOSTNAME=testapp32.azurewebsites.net -e WEBSITE_INSTANCE_ID=42c09ff46e6c54cb467d28e88f2ab5b1e8971ee4daf2e883f44401bde67fe89f -e HTTP_LOGGING_ENABLED=1 atestcr.azurecr.io/postgres:latest
2021-08-09T17:29:35.781Z INFO - Pulling image: atestcr.azurecr.io/my_app:latest
2021-08-09T17:29:35.984Z INFO - latest Pulling from my_app
2021-08-09T17:29:35.987Z INFO - Digest: sha256:659937a52a6223b938b3d429901ab8648497870bf8068b5dcc05816050db5eaf
2021-08-09T17:29:35.988Z INFO - Status: Image is up to date for atestcr.azurecr.io/my_app:latest
2021-08-09T17:29:35.993Z INFO - Pull Image successful, Time taken: 0 Minutes and 0 Seconds
2021-08-09T17:29:36.014Z INFO - Starting container for site
2021-08-09T17:29:36.014Z INFO - docker run -d -p 0:8501 --name testapp32_app_0_6662435d -e WEBSITES_ENABLE_APP_SERVICE_STORAGE=false -e WEBSITE_SITE_NAME=testapp32 -e WEBSITE_AUTH_ENABLED=False -e WEBSITE_ROLE_INSTANCE_ID=0 -e WEBSITE_HOSTNAME=testapp32.azurewebsites.net -e WEBSITE_INSTANCE_ID=42c09ff46e6c54cb467d28e88f2ab5b1e8971ee4daf2e883f44401bde67fe89f -e HTTP_LOGGING_ENABLED=1 atestcr.azurecr.io/my_app:latest
2021-08-09T17:33:26.741Z ERROR - multi-container unit was not started successfully
2021-08-09T17:33:26.846Z INFO - Container logs from testapp32_db_0_6662435d = 2021-08-09T17:29:40.459721366Z The files belonging to this database system will be owned by user "postgres".
2021-08-09T17:29:40.491740899Z This user must also own the server process.
2021-08-09T17:29:40.493019808Z
2021-08-09T17:29:40.497263739Z The database cluster will be initialized with locale "en_US.utf8".
2021-08-09T17:29:40.502456876Z The default database encoding has accordingly been set to "UTF8".
2021-08-09T17:29:40.503203182Z The default text search configuration will be set to "english".
2021-08-09T17:29:40.503218482Z
2021-08-09T17:29:40.503223282Z Data page checksums are disabled.
2021-08-09T17:29:40.506809008Z
2021-08-09T17:29:40.521275113Z fixing permissions on existing directory /var/lib/postgresql/data ... ok
2021-08-09T17:29:40.523912632Z creating subdirectories ... ok
2021-08-09T17:29:40.525237242Z selecting dynamic shared memory implementation ... posix
2021-08-09T17:29:40.642905697Z selecting default max_connections ... 100
2021-08-09T17:29:40.733900958Z selecting default shared_buffers ... 128MB
2021-08-09T17:29:41.039050775Z selecting default time zone ... Etc/UTC
2021-08-09T17:29:41.047757638Z creating configuration files ... ok
2021-08-09T17:29:44.416903507Z running bootstrap script ... ok
2021-08-09T17:29:50.023628737Z performing post-bootstrap initialization ... ok
2021-08-09T17:30:04.217544961Z syncing data to disk ... ok
2021-08-09T17:30:04.218321267Z
2021-08-09T17:30:04.219189873Z initdb: warning: enabling "trust" authentication for local connections
2021-08-09T17:30:04.219206473Z You can change this by editing pg_hba.conf or using the option -A, or
2021-08-09T17:30:04.219223973Z --auth-local and --auth-host, the next time you run initdb.
2021-08-09T17:30:04.219491175Z
2021-08-09T17:30:04.219513275Z Success. You can now start the database server using:
2021-08-09T17:30:04.219519575Z
2021-08-09T17:30:04.219523675Z pg_ctl -D /var/lib/postgresql/data -l logfile start
2021-08-09T17:30:04.219527775Z
2021-08-09T17:30:08.340584036Z waiting for server to start.......2021-08-09 17:30:08.340 UTC [44] LOG: starting PostgreSQL 13.3 (Debian 13.3-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
2021-08-09T17:30:08.478247580Z 2021-08-09 17:30:08.410 UTC [44] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2021-08-09T17:30:08.680599067Z 2021-08-09 17:30:08.680 UTC [45] LOG: database system was shut down at 2021-08-09 17:29:49 UTC
2021-08-09T17:30:08.753589768Z 2021-08-09 17:30:08.753 UTC [44] LOG: database system is ready to accept connections
2021-08-09T17:30:08.755965684Z done
2021-08-09T17:30:08.760382514Z server started
2021-08-09T17:30:09.790821978Z
2021-08-09T17:30:09.799723839Z /usr/local/bin/docker-entrypoint.sh: ignoring /docker-entrypoint-initdb.d/*
2021-08-09T17:30:09.802079455Z
2021-08-09T17:30:09.840304817Z 2021-08-09 17:30:09.834 UTC [44] LOG: received fast shutdown request
2021-08-09T17:30:09.862102366Z waiting for server to shut down....2021-08-09 17:30:09.861 UTC [44] LOG: aborting any active transactions
2021-08-09T17:30:09.950488072Z 2021-08-09 17:30:09.950 UTC [44] LOG: background worker "logical replication launcher" (PID 51) exited with exit code 1
2021-08-09T17:30:09.954360498Z 2021-08-09 17:30:09.950 UTC [46] LOG: shutting down
2021-08-09T17:30:13.063848848Z ...2021-08-09 17:30:13.063 UTC [44] LOG: database system is shut down
2021-08-09T17:30:13.138558463Z done
2021-08-09T17:30:13.140082774Z server stopped
2021-08-09T17:30:13.144102302Z
2021-08-09T17:30:13.162430928Z PostgreSQL init process complete; ready for start up.
2021-08-09T17:30:13.165654351Z
2021-08-09T17:30:14.083504086Z 2021-08-09 17:30:13.992 UTC [1] LOG: starting PostgreSQL 13.3 (Debian 13.3-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
2021-08-09T17:30:14.084760295Z 2021-08-09 17:30:14.011 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2021-08-09T17:30:14.084774095Z 2021-08-09 17:30:14.015 UTC [1] LOG: listening on IPv6 address "::", port 5432
2021-08-09T17:30:14.084779495Z 2021-08-09 17:30:14.082 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2021-08-09T17:30:14.156076187Z 2021-08-09 17:30:14.132 UTC [63] LOG: database system was shut down at 2021-08-09 17:30:12 UTC
2021-08-09T17:30:14.198749982Z 2021-08-09 17:30:14.176 UTC [1] LOG: database system is ready to accept connections
2021-08-09T17:30:15.006882460Z 2021-08-09 17:30:14.998 UTC [70] LOG: invalid length of startup packet
2021-08-09T17:30:16.112919592Z 2021-08-09 17:30:16.112 UTC [71] LOG: invalid length of startup packet
2021-08-09T17:30:17.178350344Z 2021-08-09 17:30:17.174 UTC [72] LOG: invalid length of startup packet
...
2021-08-09T17:33:25.735293291Z 2021-08-09 17:33:25.735 UTC [260] LOG: invalid length of startup packet
2021-08-09T17:33:29.142Z INFO - Container logs from testapp32_app_0_6662435d = 2021-08-09T17:30:24.010212694Z
2021-08-09T17:30:24.011085300Z You can now view your Streamlit app in your browser.
2021-08-09T17:30:24.011101800Z
2021-08-09T17:30:24.012092107Z Network URL: http://172.16.232.3:8501
2021-08-09T17:30:24.012855612Z External URL: http://20.40.148.207:8501
2021-08-09T17:30:24.013407416Z
2021-08-09T17:33:36.002Z INFO - Stopping site testapp32 because it failed during startup.
It's worth noting that your shared example doesn't work as-is, because your docker-compose.yml references your own private container registry.
However, by tinkering around the edges I have managed to get your dummy example working with changes to the following files:
Your app Dockerfile:
FROM python:3.7
RUN pip install streamlit
COPY app.py /streamlit-docker/
WORKDIR /streamlit-docker/
EXPOSE 80
CMD streamlit run app.py --server.port 80
And adjusting your docker-compose.yml
version: '3.3'
services:
  db:
    image: postgis/postgis:13-master
    environment:
      - POSTGRES_DB=counterfactualcovid
      - POSTGRES_USER=django
      - POSTGRES_PASSWORD=django
  app:
    image: testazurecontainerregistryajc.azurecr.io/mcmegaapp:latest
    ports:
      - '80:80'
Ignore the fact that I've used a generic Postgres-flavoured docker image and my own private container registry for the images; the key changes are:
- in the Dockerfile, exposing port 80 and adding --server.port 80 to the streamlit CMD call
- in the docker-compose.yml, setting the port mapping to 80:80
I think your issues arise from the preview limitations of the multi-container web apps feature, so setting everything to work via port 80 appears to resolve this.
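For anyone wanting to reproduce this without access to the private registries above, here is a rough local-testing equivalent of the working setup, assuming the Streamlit Dockerfile shown earlier sits next to the compose file. The public postgres image and the local build: are stand-ins of my own; App Service itself expects images pulled from a registry rather than build:, so this is only for checking the port changes locally before pushing:

version: '3.3'
services:
  db:
    image: postgres:13          # public image standing in for the private registry copy
    environment:
      POSTGRES_PASSWORD: postgres
  app:
    build: .                    # builds the Streamlit Dockerfile above (EXPOSE 80, --server.port 80)
    ports:
      - '80:80'

Running docker-compose up with this locally is a quick way to confirm both containers start and the app answers on port 80 before pushing the images to a registry for App Service.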
I have deployed this PostgreSQL image to the IBM Cloud, Google Cloud Platform and Microsoft Azure using Kubernetes: https://github.com/paunin/PostDock
It was deployed on all 3 platforms with identical configurations and an identical process, but on the IBM Cloud it fails with the error "psql: FATAL: password authentication failed for user "replica_user"".
You can find the logs from all 3 cloud platforms below. Has anyone experienced this?
IBM Cloud Log
>>> Setting up STOP handlers...
>>> STARTING SSH (if required)...
>>> SSH is not enabled!
>>> STARTING POSTGRES...
>>> TUNING UP POSTGRES...
>>> Cleaning data folder which might have some garbage...
psql: FATAL: password authentication failed for user "replica_user"
psql: could not connect to server: Connection refused
Is the server running on host "cyclos-postgres-node2-service" (172.30.65.206) and accepting
TCP/IP connections on port 5432?
>>> Auto-detected master name: ''
>>> Setting up repmgr...
>>> Setting up repmgr config file '/etc/repmgr.conf'...
>>> Setting up upstream node...
cat: /var/lib/postgresql/data/standby.lock: No such file or directory
>>> Previously Locked standby upstream node LOCKED_STANDBY=''
>>> Waiting for upstream postgres server...
>>> Wait db replica_db on cyclos-postgres-node1-service:5432(user: replica_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
psql: FATAL: password authentication failed for user "replica_user"
>>>>>> Db replica_db is still not accessable on cyclos-postgres-node1-service:5432 (will try 30 times more)
....
The last couple of lines are then repeated many times.
This is the log file from deploying the same application, using identical processes on the Google Cloud. It works just fine on the Google Cloud Platform.
Google Cloud Log
>>> Setting up STOP handlers...
>>> STARTING SSH (if required)...
>>> SSH is not enabled!
>>> STARTING POSTGRES...
>>> TUNING UP POSTGRES...
>>> Cleaning data folder which might have some garbage...
psql: could not connect to server: Connection refused
Is the server running on host "cyclos-postgres-node1-service" (10.52.0.11) and accepting
TCP/IP connections on port 5432?
psql: could not connect to server: Connection refused
Is the server running on host "cyclos-postgres-node2-service" (10.52.0.12) and accepting
TCP/IP connections on port 5432?
>>> Auto-detected master name: ''
>>> Setting up repmgr...
>>> Setting up repmgr config file '/etc/repmgr.conf'...
>>> Setting up upstream node...
cat: /var/lib/postgresql/data/standby.lock: No such file or directory
>>> Previously Locked standby upstream node LOCKED_STANDBY=''
>>> Waiting for upstream postgres server...
>>> Wait db replica_db on cyclos-postgres-node1-service:5432(user: replica_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
psql: could not connect to server: Connection refused
Is the server running on host "cyclos-postgres-node1-service" (10.52.0.11) and accepting
TCP/IP connections on port 5432?
>>>>>> Db replica_db is still not accessable on cyclos-postgres-node1-service:5432 (will try 30 times more)
>>>>>> Db replica_db is still not accessable on cyclos-postgres-node1-service:5432 (will try 29 times more)
psql: could not connect to server: Connection refused
Is the server running on host "cyclos-postgres-node1-service" (10.52.0.11) and accepting
TCP/IP connections on port 5432?
psql: could not connect to server: Connection refused
Is the server running on host "cyclos-postgres-node1-service" (10.52.0.11) and accepting
TCP/IP connections on port 5432?
>>>>>> Db replica_db is still not accessable on cyclos-postgres-node1-service:5432 (will try 28 times more)
>>>>>> Db replica_db exists on cyclos-postgres-node1-service:5432!
>>> REPLICATION_UPSTREAM_NODE_ID=1
>>> Sending in background postgres start...
>>> Waiting for upstream postgres server...
>>> Wait db replica_db on cyclos-postgres-node1-service:5432(user: replica_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
>>>>>> Db replica_db exists on cyclos-postgres-node1-service:5432!
>>> Starting standby node...
>>> Instance hasn't been set up yet.
>>> Clonning primary node...
>>> Waiting for upstream postgres server...
>>> Wait db replica_db on cyclos-postgres-node1-service:5432(user: replica_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
NOTICE: destination directory '/var/lib/postgresql/data' provided
INFO: connecting to upstream node
INFO: Successfully connected to upstream node. Current installation size is 34 MB
INFO: checking and correcting permissions on existing directory /var/lib/postgresql/data ...
>>>>>> Db replica_db exists on cyclos-postgres-node1-service:5432!
NOTICE: starting backup (using pg_basebackup)...
INFO: executing: '/usr/lib/postgresql/9.5/bin/pg_basebackup -l "repmgr base backup" -D /var/lib/postgresql/data -h cyclos-postgres-node1-service -p 5432 -U replica_user -c fast -X stream '
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example : pg_ctl -D /var/lib/postgresql/data start
HINT: After starting the server, you need to register this standby with "repmgr standby register"
[REPMGR EVENT] Node id: 2; Event type: standby_clone; Success [1|0]: 1; Time: 2018-02-02 13:24:32.87843+00; Details: Cloned from host 'cyclos-postgres-node1-service', port 5432; backup method: pg_basebackup; --force: Y
>>> Configuring /var/lib/postgresql/data/postgresql.conf
>>>>>> Will add configs to exists file
>>> Starting postgres...
>>> Waiting for local postgres server start...
>>> Wait db replica_db on cyclos-postgres-node2-service:5432(user: replica_user,password: *******), will try 60 times with delay 10 seconds (TIMEOUT=600)
LOG: incomplete startup packet
LOG: incomplete startup packet
LOG: database system was interrupted; last known up at 2018-02-02 13:24:31 UTC
FATAL: the database system is starting up
psql: FATAL: the database system is starting up
>>>>>> Db replica_db is still not accessable on cyclos-postgres-node2-service:5432 (will try 60 times more)
LOG: entering standby mode
LOG: redo starts at 0/2000028
LOG: consistent recovery state reached at 0/20000F8
LOG: database system is ready to accept read only connections
LOG: started streaming WAL from primary at 0/3000000 on timeline 1
>>>>>> Db replica_db exists on cyclos-postgres-node2-service:5432!
>>> Waiting for replication on this node is over(if any in progress): CLEAN_UP_ON_FAIL=, INTERVAL=30
>>> Replication is done
>>> Unregister the node if it was done before
DELETE 0
>>> Registering node with role standby
INFO: connecting to standby database
INFO: connecting to master database
INFO: retrieving node list for cluster 'postgres_cluster'
INFO: registering the standby
[REPMGR EVENT] Node id: 2; Event type: standby_register; Success [1|0]: 1; Time: 2018-02-02 13:24:51.891592+00; Details:
INFO: standby registration complete
NOTICE: standby node correctly registered for cluster postgres_cluster with id 2 (conninfo: user=replica_user password=replica_pass host=cyclos-postgres-node2-service dbname=replica_db port=5432 connect_timeout=2)
Locking standby (NEW_UPSTREAM_NODE_ID=1)...
>>> Starting repmgr daemon...
[2018-02-02 13:24:53] [NOTICE] looking for configuration file in current directory
[2018-02-02 13:24:53] [NOTICE] looking for configuration file in /etc
[2018-02-02 13:24:53] [NOTICE] configuration file found at: /etc/repmgr.conf
[2018-02-02 13:24:53] [INFO] connecting to database 'user=replica_user password=replica_pass host=cyclos-postgres-node2-service dbname=replica_db port=5432 connect_timeout=2'
[2018-02-02 13:24:53] [INFO] connected to database, checking its state
[2018-02-02 13:24:53] [INFO] connecting to master node of cluster 'postgres_cluster'
[2018-02-02 13:24:53] [INFO] retrieving node list for cluster 'postgres_cluster'
[2018-02-02 13:24:53] [INFO] checking role of cluster node '1'
[2018-02-02 13:24:53] [INFO] checking cluster configuration with schema 'repmgr_postgres_cluster'
[2018-02-02 13:24:53] [INFO] checking node 2 in cluster 'postgres_cluster'
[2018-02-02 13:24:53] [INFO] reloading configuration file
[2018-02-02 13:24:53] [INFO] configuration has not changed
[2018-02-02 13:24:53] [INFO] starting continuous standby node monitoring
ERROR: cannot execute DELETE in a read-only transaction
STATEMENT: DELETE FROM repmgr_postgres_cluster.repl_nodes WHERE conninfo LIKE '%host=cyclos-postgres-node3-service%'
And on the Azure Cloud, it works just fine as well.
Azure Cloud Log
>>> Setting up STOP handlers...
>>> STARTING SSH (if required)...
>>> SSH is not enabled!
>>> STARTING POSTGRES...
>>> TUNING UP POSTGRES...
>>> Cleaning data folder which might have some garbage...
psql: could not connect to server: Connection refused
Is the server running on host "cyclos-postgres-node2-service" (10.244.0.9) and accepting
TCP/IP connections on port 5432?
>>> Auto-detected master name: 'cyclos-postgres-node1-service'
>>> Setting up repmgr...
>>> Setting up repmgr config file '/etc/repmgr.conf'...
>>> Setting up upstream node...
cat: /var/lib/postgresql/data/standby.lock: No such file or directory
>>> Previously Locked standby upstream node LOCKED_STANDBY=''
>>> Waiting for upstream postgres server...
>>> Wait db replica_db on cyclos-postgres-node1-service:5432(user: replica_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
>>>>>> Db replica_db exists on cyclos-postgres-node1-service:5432!
>>> REPLICATION_UPSTREAM_NODE_ID=1
>>> Sending in background postgres start...
>>> Waiting for upstream postgres server...
>>> Wait db replica_db on cyclos-postgres-node1-service:5432(user: replica_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
>>>>>> Db replica_db exists on cyclos-postgres-node1-service:5432!
>>> Starting standby node...
>>> Instance hasn't been set up yet.
>>> Clonning primary node...
>>> Waiting for upstream postgres server...
>>> Wait db replica_db on cyclos-postgres-node1-service:5432(user: replica_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
NOTICE: destination directory '/var/lib/postgresql/data' provided
INFO: connecting to upstream node
>>>>>> Db replica_db exists on cyclos-postgres-node1-service:5432!
INFO: Successfully connected to upstream node. Current installation size is 34 MB
INFO: checking and correcting permissions on existing directory /var/lib/postgresql/data ...
NOTICE: starting backup (using pg_basebackup)...
INFO: executing: '/usr/lib/postgresql/9.5/bin/pg_basebackup -l "repmgr base backup" -D /var/lib/postgresql/data -h cyclos-postgres-node1-service -p 5432 -U replica_user -c fast -X stream '
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example : pg_ctl -D /var/lib/postgresql/data start
HINT: After starting the server, you need to register this standby with "repmgr standby register"
[REPMGR EVENT] Node id: 2; Event type: standby_clone; Success [1|0]: 1; Time: 2018-02-02 06:50:47.340146+00; Details: Cloned from host 'cyclos-postgres-node1-service', port 5432; backup method: pg_basebackup; --force: Y
>>> Configuring /var/lib/postgresql/data/postgresql.conf
>>>>>> Will add configs to exists file
>>> Starting postgres...
>>> Waiting for local postgres server start...
>>> Wait db replica_db on cyclos-postgres-node2-service:5432(user: replica_user,password: *******), will try 60 times with delay 10 seconds (TIMEOUT=600)
LOG: incomplete startup packet
LOG: database system was interrupted; last known up at 2018-02-02 06:50:46 UTC
LOG: incomplete startup packet
FATAL: the database system is starting up
psql: FATAL: the database system is starting up
>>>>>> Db replica_db is still not accessable on cyclos-postgres-node2-service:5432 (will try 60 times more)
LOG: entering standby mode
LOG: redo starts at 0/2000028
LOG: consistent recovery state reached at 0/2000130
LOG: database system is ready to accept read only connections
LOG: started streaming WAL from primary at 0/3000000 on timeline 1
>>>>>> Db replica_db exists on cyclos-postgres-node2-service:5432!
>>> Waiting for replication on this node is over(if any in progress): CLEAN_UP_ON_FAIL=, INTERVAL=30
>>> Replication is done
>>> Unregister the node if it was done before
DELETE 0
>>> Registering node with role standby
INFO: connecting to standby database
INFO: connecting to master database
INFO: retrieving node list for cluster 'postgres_cluster'
INFO: registering the standby
[REPMGR EVENT] Node id: 2; Event type: standby_register; Success [1|0]: 1; Time: 2018-02-02 06:51:05.083455+00; Details:
INFO: standby registration complete
NOTICE: standby node correctly registered for cluster postgres_cluster with id 2 (conninfo: user=replica_user password=replica_pass host=cyclos-postgres-node2-service dbname=replica_db port=5432 connect_timeout=2)
Locking standby (NEW_UPSTREAM_NODE_ID=1)...
>>> Starting repmgr daemon...
[2018-02-02 06:51:05] [NOTICE] looking for configuration file in current directory
[2018-02-02 06:51:05] [NOTICE] looking for configuration file in /etc
[2018-02-02 06:51:05] [NOTICE] configuration file found at: /etc/repmgr.conf
[2018-02-02 06:51:05] [INFO] connecting to database 'user=replica_user password=replica_pass host=cyclos-postgres-node2-service dbname=replica_db port=5432 connect_timeout=2'
[2018-02-02 06:51:06] [INFO] connected to database, checking its state
[2018-02-02 06:51:06] [INFO] connecting to master node of cluster 'postgres_cluster'
[2018-02-02 06:51:06] [INFO] retrieving node list for cluster 'postgres_cluster'
[2018-02-02 06:51:06] [INFO] checking role of cluster node '1'
[2018-02-02 06:51:06] [INFO] checking cluster configuration with schema 'repmgr_postgres_cluster'
[2018-02-02 06:51:06] [INFO] checking node 2 in cluster 'postgres_cluster'
[2018-02-02 06:51:06] [INFO] reloading configuration file
[2018-02-02 06:51:06] [INFO] configuration has not changed
[2018-02-02 06:51:06] [INFO] starting continuous standby node monitoring
ERROR: cannot execute DELETE in a read-only transaction
STATEMENT: DELETE FROM repmgr_postgres_cluster.repl_nodes WHERE conninfo LIKE '%host=cyclos-postgres-node3-service%'
I was able to run this on a paid cluster in IBM Cloud and it appears to be working. I did NOT use persistent volumes. Please note that persistent volumes are not available on free clusters, so if you are testing on a free cluster and use persistent volumes you will run into issues.
My cluster has 3 workers of size u2c.2x4 (the smallest available) and is on the default Kubernetes version for IBM Cloud (1.8.6), if that helps you debug at all. Please try again, or if your setup is different from mine, let me know and I can try with a matching setup.
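If you do end up using persistent volumes, one hedged guess for the "password authentication failed for user "replica_user"" error is a data directory left over from an earlier attempt that was initialised with different credentials. A quick way to check for and clear stale claims; the mysystem namespace matches my log below, substitute your own, and <claim-name> is a placeholder:

kubectl get pvc --namespace=mysystem                    # claims left over from previous deployments
kubectl get pv                                          # and the volumes bound to them
kubectl delete pvc <claim-name> --namespace=mysystem    # removing a stale claim forces a clean re-init on the next deploy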
$ kubectl logs --namespace=mysystem mysystem-db-node1-0
>>> Setting up STOP handlers...
>>> STARTING SSH (if required)...
>>> SSH is not enabled!
>>> STARTING POSTGRES...
>>> TUNING UP POSTGRES...
>>> Cleaning data folder which might have some garbage...
psql: could not translate host name "mysystem-db-node1-service" to address: Name or service not known
psql: could not translate host name "mysystem-db-node2-service" to address: Name or service not known
>>> Auto-detected master name: ''
>>> Setting up repmgr...
>>> Setting up repmgr config file '/etc/repmgr.conf'...
>>> Setting up upstream node...
>>> Sending in background postgres start...
>>> Waiting for local postgres server start...
>>> Wait db replica_db on mysystem-db-node1-service:5432(user: replica_user,password: *******), will try 60 times with delay 10 seconds (TIMEOUT=600)
psql: could not translate host name "mysystem-db-node3-service" to address: Name or service not known
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
fixing permissions on existing directory /var/lib/postgresql/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
psql: could not connect to server: Connection refused
Is the server running on host "mysystem-db-node1-service" (172.30.207.54) and accepting
TCP/IP connections on port 5432?
selecting default shared_buffers ... >>>>>> Db replica_db is still not accessable on mysystem-db-node1-service:5432 (will try 60 times more)
128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
creating template1 database in /var/lib/postgresql/data/base/1 ... ok
initializing pg_authid ... ok
initializing dependencies ... ok
creating system views ... ok
loading system objects' descriptions ... ok
creating collations ... ok
creating conversions ... ok
creating dictionaries ... ok
setting privileges on built-in objects ... ok
creating information schema ... ok
loading PL/pgSQL server-side language ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... ok
copying template1 to postgres ... ok
syncing data to disk ... ok
Success. You can now start the database server using:
pg_ctl -D /var/lib/postgresql/data -l logfile start
WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
waiting for server to start....LOG: could not bind IPv6 socket: Cannot assign requested address
HINT: Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
LOG: database system was shut down at 2018-02-14 15:40:14 UTC
LOG: MultiXact member wraparound protections are now enabled
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
done
server started
CREATE DATABASE
CREATE ROLE
/docker-entrypoint.sh: running /docker-entrypoint-initdb.d/entrypoint.sh
>>> Configuring /var/lib/postgresql/data/postgresql.conf
>>>>>> Config file was replaced with standard one!
>>>>>> Adding config 'wal_keep_segments'='250'
>>>>>> Adding config 'shared_buffers'='300MB'
>>>>>> Adding config 'archive_command'=''/bin/true''
>>> Creating replication user 'replica_user'
CREATE ROLE
>>> Creating replication db 'replica_db'
LOG: received fast shutdown request
LOG: aborting any active transactions
LOG: autovacuum launcher shutting down
waiting for server to shut down....LOG: shutting down
LOG: database system is shut down
done
server stopped
PostgreSQL init process complete; ready for start up.
LOG: database system was shut down at 2018-02-14 15:40:16 UTC
LOG: MultiXact member wraparound protections are now enabled
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
LOG: incomplete startup packet
LOG: incomplete startup packet
>>>>>> Db replica_db exists on mysystem-db-node1-service:5432!
>>> Registering node with role master
INFO: connecting to master database
INFO: master register: creating database objects inside the 'repmgr_mysystem_cluster' schema
INFO: retrieving node list for cluster 'mysystem_cluster'
[REPMGR EVENT] Node id: 1; Event type: master_register; Success [1|0]: 1; Time: 2018-02-14 15:40:27.337393+00; Details:
[REPMGR EVENT] will execute script '/usr/local/bin/cluster/repmgr/events/execs/master_register.sh' for the event
[REPMGR EVENT::master_register] Node id: 1; Event type: master_register; Success [1|0]: 1; Time: 2018-02-14 15:40:27.337393+00; Details:
[REPMGR EVENT::master_register] Locking master...
[REPMGR EVENT::master_register] Unlocking standby...
NOTICE: master node correctly registered for cluster 'mysystem_cluster' with id 1 (conninfo: user=replica_user password=replica_pass host=mysystem-db-node1-service dbname=replica_db port=5432 connect_timeout=2)
>>> Starting repmgr daemon...
[2018-02-14 15:40:27] [NOTICE] looking for configuration file in current directory
[2018-02-14 15:40:27] [NOTICE] looking for configuration file in /etc
[2018-02-14 15:40:27] [NOTICE] configuration file found at: /etc/repmgr.conf
[2018-02-14 15:40:27] [INFO] connecting to database 'user=replica_user password=replica_pass host=mysystem-db-node1-service dbname=replica_db port=5432 connect_timeout=2'
[2018-02-14 15:40:27] [INFO] connected to database, checking its state
[2018-02-14 15:40:27] [INFO] checking cluster configuration with schema 'repmgr_mysystem_cluster'
[2018-02-14 15:40:27] [INFO] checking node 1 in cluster 'mysystem_cluster'
[2018-02-14 15:40:27] [INFO] reloading configuration file
[2018-02-14 15:40:27] [INFO] configuration has not changed
[2018-02-14 15:40:27] [INFO] starting continuous master connection check
I've installed Mist on my local PC (Windows 10), but I don't want to sync the Main/Test networks. So I've used this Ethereum + Azure tutorial, and now I can work via SSH on my private network:
geth --dev console
More than that, I know that it's possible to run Mist against a custom blockchain using a special flag:
mist.exe --rpc http://YOUR_IP:PORT
So, following geth --help, I'm running geth --dev --rpc console on the Azure virtual machine; after that I run mist.exe --rpc http://VM_IP:8545 and get this error:
[2016-09-24 18:01:21.928] [INFO] Sockets/node-ipc - Connect to {"hostPort":"http://VM_IP:8545"}
[2016-09-24 18:01:24.968] [ERROR] Sockets/node-ipc - Connection failed (3000ms elapsed)
[2016-09-24 18:01:24.971] [WARN] EthereumNode - Failed to connect to node. Maybe it's not running so let's start our own...
[2016-09-24 18:01:24.979] [INFO] EthereumNode - Node type: geth
[2016-09-24 18:01:24.982] [INFO] EthereumNode - Network: test
[2016-09-24 18:01:24.983] [INFO] EthereumNode - Start node: geth test
[2016-09-24 18:01:32.284] [INFO] EthereumNode - 3000ms elapsed, assuming node started up successfully
[2016-09-24 18:01:32.286] [INFO] EthereumNode - Started node successfully: geth test
[2016-09-24 18:01:32.327] [INFO] Sockets/node-ipc - Connect to {"hostPort":"http://VM_IP:8545"}
[2016-09-24 18:02:02.332] [ERROR] Sockets/node-ipc - Connection failed (30000ms elapsed)
[2016-09-24 18:02:02.333] [ERROR] EthereumNode - Failed to connect to node Error: Unable to connect to socket: timeout
P.S. Mist version - 0.8.2
Your approach is correct. I would say that you have a network configuration issue that prevents your Mist from talking to geth.
I would suggest running the following test and seeing if you run into the same issue:
- on the machine where you have Mist, find the geth.exe executable
- run geth with geth --testnet --rpc
- start mist with ./Mist --rpc /.../Ethereum/testnet/geth.ipc or ./Mist --rpc http://localhost:8545
I am on a Mac so I guess you will have to reverse the / and add some C: decorations here and there.
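One more thing worth checking on the Azure VM: by default geth binds its HTTP-RPC endpoint to localhost only, so a remote Mist cannot reach it even if the port itself is open. A sketch of starting the dev chain so it listens externally, using the RPC flags geth of that era provides (the Azure endpoint/NSG must also allow 8545, and an open rpcaddr/corsdomain like this is only sane for a throwaway dev chain):

geth --dev --rpc --rpcaddr 0.0.0.0 --rpcport 8545 --rpccorsdomain "*" console

After that, mist.exe --rpc http://VM_IP:8545 should at least be able to complete the socket connection.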
After upgrading a node in Cassandra, the errors below appeared in the log.
I want to investigate but don't have a direction; any clue will help.
Thanks.
2016/09/19 06:24:49 [INFO] core: post-unseal setup starting
2016/09/19 06:24:49 [INFO] core: mounted backend of type generic at secret/
2016/09/19 06:24:49 [INFO] core: mounted backend of type cubbyhole at cubbyhole/
2016/09/19 06:24:49 [INFO] core: mounted backend of type system at sys/
2016/09/19 06:24:49 [INFO] core: mounted backend of type cassandra at cassandra/
2016/09/19 06:24:49 [INFO] rollback: starting rollback manager
2016/09/19 06:24:50 [INFO] expire: restored 2 leases
2016/09/19 06:24:50 [INFO] core: post-unseal setup complete
2016/09/19 06:24:55 gocql: unable to dial control conn node-0.cassandra-app.mesos:9042: dial tcp 10.0.2.42:9042: getsockopt: connection refused
2016/09/19 06:25:12 error: failed to connect to 10.0.2.42:9042 due to error: gocql: no response to connection startup within timeout
A Cassandra cluster is not cross-version compatible: you cannot upgrade only one node, you have to upgrade the whole cluster. This is a common mistake people make; please see this video here, it mentions this problem and is also very useful, with lots of good info.
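As a concrete starting point, confirming the version skew and then bringing the rest of the cluster to the same version node by node looks roughly like this; a sketch assuming a package-based install, the exact service commands depend on your setup:

nodetool version               # run on every node; all releases must match
nodetool status                # see which nodes this node can reach

# then, on each remaining node, one at a time:
nodetool drain                 # flush memtables and stop accepting traffic on this node
sudo service cassandra stop
# upgrade the Cassandra package/binaries to the target version here
sudo service cassandra start
nodetool upgradesstables       # rewrite SSTables into the new on-disk format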
Ubuntu 10.04 system, new Plone install. The install went fine, I created some content, and everything seemed fine. One kernel update and a reboot later, Plone is running but will not serve any pages to a browser; a browser request just times out. I can telnet to port 8080 on the system and send an HTTP GET by hand, and nothing comes back. The log file for client1 in a ZEO install keeps repeating:
2011-08-10T16:59:57 INFO ZServer HTTP server started at Wed Aug 10 16:59:57 2011
Hostname: 0.0.0.0
Port: 8080
------
2011-08-10T16:59:57 INFO Zope Set effective user to "plone"
------
2011-08-10T17:00:02 INFO ZEO.ClientStorage zeostorage ClientStorage (pid=24596) created RW/normal for storage: '1'
------
2011-08-10T17:00:02 INFO ZEO.cache created temporary cache file '<fdopen>'
------
2011-08-10T17:00:02 INFO ZEO.ClientStorage zeostorage Testing connection <ManagedClientConnection ('127.0.0.1', 8100)>
------
2011-08-10T17:00:02 INFO ZEO.zrpc.Connection(C) (127.0.0.1:8100) received handshake 'Z3101'
------
2011-08-10T17:00:02 INFO ZEO.ClientStorage zeostorage Server authentication protocol None
------
2011-08-10T17:00:02 INFO ZEO.ClientStorage zeostorage Connected to storage: ('dns', 8100)
------
2011-08-10T17:00:02 INFO ZEO.ClientStorage zeostorage No verification necessary -- empty cache
------
2011-08-10T17:00:22 INFO ZServer HTTP server started at Wed Aug 10 17:00:22 2011
Hostname: 0.0.0.0
Port: 8080
I haven't been able to find any other info on what is causing this, nor can I find any documentation on debugging a Plone install.
Thanks for any help you can provide.
Forgive the aborted answer, I misread the log snippet. The repeated log entries you're seeing are what you'd expect from repeated restarts. Are you repeatedly restarting the instance? If not, then it seems your instance is restarting on its own. Shut down the instance and start it using "bin/instance fg" and see if that gives you more information.
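Since the log comes from client1 of a ZEO install, the equivalent commands would be something like the following, assuming a standard buildout layout (use bin/instance instead for a standalone install; the buildout path is a placeholder):

cd /path/to/your/buildout      # placeholder: wherever your Plone buildout lives
bin/client1 stop               # stop the client that keeps restarting
bin/client1 fg                 # run it in the foreground so startup errors and tracebacks print to the console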