OpenEBS target pod is not able to communicate with its replicas after deleting one of the worker nodes from the cluster - openebs

Having a problem with an OpenEBS data store. The setup has 3 OpenEBS storage replicas on 3 different VMs.
Initially the workload pod (PostgreSQL) went into read-only mode, so I first deleted the worker node and, after the volume didn't recover, the OpenEBS ctrl pod. Now it seems the ctrl pod cannot reconnect with all 3 replicas and keeps showing the message:
level=warning msg="No of yet to be registered replicas are less than 3 , No of registered replicas: 1"
The replica that seems to have managed to connect keeps logging repeatedly:
time="2019-01-22T08:04:12Z" level=info msg="Get Volume info from controller"
time="2019-01-22T08:04:12Z" level=info msg="Register replica at controller"
Target pod logs:
"2019-01-22T06:55:46.064Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:46Z"" level=error msg=""Mode: ReadOnly"""
"2019-01-22T06:55:46.065Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:46Z"" level=error msg=""Mode: ReadOnly"""
"2019-01-22T06:55:48.076Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:48Z"" level=error msg=""Mode: ReadOnly"""
"2019-01-22T06:55:48.075Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:48Z"" level=error msg=""Mode: ReadOnly"""
"2019-01-22T06:55:50.085Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:50Z"" level=error msg=""Mode: ReadOnly"""
"2019-01-22T06:55:49.083Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:49Z"" level=error msg=""Mode: ReadOnly"""
"2019-01-22T06:55:50.086Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:50Z"" level=warning msg=busy"
"2019-01-22T06:55:50.085Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:50Z"" level=error msg=""Mode: ReadOnly"""
"2019-01-22T06:55:49.084Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:49Z"" level=error msg=""Mode: ReadOnly"""
"2019-01-22T06:55:53.105Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:53Z"" level=warning msg=busy"
"2019-01-22T06:55:53.104Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:53Z"" level=error msg=""Mode: ReadOnly"""
"2019-01-22T06:55:55.117Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:55Z"" level=error msg=""Mode: ReadOnly"""
"2019-01-22T06:55:54.107Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:54Z"" level=error msg=""Mode: ReadOnly"""
"2019-01-22T06:55:54.107Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:54Z"" level=error msg=""Mode: ReadOnly"""
Replica pod which is not yet connected:
"2019-01-22T06:56:24.117Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:24Z"" level=info msg=""Done running ssync [ssync -host 10.233.91.202 -timeout 7 -port 9700 volume-head-010.img.meta]"""
"2019-01-22T06:56:24.866Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:24Z"" level=info msg=""source file size: 12884901888, setting up directIo: true"""
"2019-01-22T06:56:11.390Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:11Z"" level=info msg=""Get Volume Usage"""
"2019-01-22T06:56:23.881Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:23Z"" level=info msg=""Snapshotting [d82c79af-06fd-4bc4-bd67-c54fa636e596] volume, user created false, created time 2019-01-22T06:56:23Z"""
"2019-01-22T06:56:23.924Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","10.233.96.147 - - [22/Jan/2019:06:56:23 +0000] ""POST /v1/replicas/1?action=snapshot HTTP/1.1"" 200 14804"
"2019-01-22T06:56:24.049Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:24Z"" level=info msg=""Running ssync [ssync -host 10.233.91.202 -timeout 7 -port 9700 volume-head-010.img.meta]"""
"2019-01-22T06:56:24.828Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:24Z"" level=info msg=""Running ssync [ssync -host 10.233.91.202 -timeout 7 -port 9701 volume-snap-6b38fe32-98ab-4f95-8b2d-05ba9aebfe0e.img]"""
"2019-01-22T06:56:24.885Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:24Z"" level=info msg=""The file is a hole: [ 0: 3145728](3145728)"""
"2019-01-22T06:56:24.886Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:24Z"" level=info msg=""Ssync client: exit code 0"""
"2019-01-22T06:56:23.872Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:23Z"" level=info msg=""GetReplica for id 1"""
"2019-01-22T06:56:24.019Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:24Z"" level=info msg=GetReplica"
"2019-01-22T06:56:24.019Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:24Z"" level=info msg=""GetReplica for id 1"""
"2019-01-22T06:56:24.886Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:24Z"" level=info msg=""Done running ssync [ssync -host 10.233.91.202 -timeout 7 -port 9701 volume-snap-6b38fe32-98ab-4f95-8b2d-05ba9aebfe0e.img]"""
"2019-01-22T06:56:25.607Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:25Z"" level=info msg=""source file size: 112, setting up directIo: false"""
"2019-01-22T06:56:25.614Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:25Z"" level=warning msg=""Failed to open server: 10.233.91.202:9702, Retrying..."""
"2019-01-22T06:56:26.628Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:26Z"" level=info msg=""Ssync client: exit code 0"""
"2019-01-22T06:56:28.353Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:28Z"" level=info msg=""Running ssync [ssync -host 10.233.91.202 -timeout 7 -port 9703 volume-snap-a03e749d-31ac-4375-9559-14fb141fc3d7.img]"""
"2019-01-22T06:56:28.419Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:28Z"" level=info msg=""source file size: 12884901888, setting up directIo: true"""
"2019-01-22T06:56:28.428Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:28Z"" level=info msg=""The file is a hole: [ 0: 3145728](3145728)"""
"2019-01-22T06:56:28.431Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:28Z"" level=info msg=""Ssync client: exit code 0"""
"2019-01-22T06:56:29.121Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:29Z"" level=info msg=""Running ssync [ssync -host 10.233.91.202 -timeout 7 -port 9704 volume-snap-a03e749d-31ac-4375-9559-14fb141fc3d7.img.meta]"""
"2019-01-22T06:56:29.900Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:29Z"" level=info msg=""Syncing volume-snap-f8771212-06d3-400b-ad12-c063ef8ed827.img to 10.233.91.202:9705...\n"""
"2019-01-22T06:56:29.900Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:29Z"" level=info msg=""source file size: 12884901888, setting up directIo: true"""
"2019-01-22T06:56:29.904Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:29Z"" level=warning msg=""Failed to open server: 10.233.91.202:9705, Retrying..."""
"2019-01-22T06:56:25.607Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:25Z"" level=info msg=""Syncing volume-snap-6b38fe32-98ab-4f95-8b2d-05ba9aebfe0e.img.meta to 10.233.91.202:9702...\n"""
"2019-01-22T06:56:25.584Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:25Z"" level=info msg=""Running ssync [ssync -host 10.233.91.202 -timeout 7 -port 9702 volume-snap-6b38fe32-98ab-4f95-8b2d-05ba9aebfe0e.img.meta]"""
"2019-01-22T06:56:28.419Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:28Z"" level=info msg=""Syncing volume-snap-a03e749d-31ac-4375-9559-14fb141fc3d7.img to 10.233.91.202:9703...\n"""
"2019-01-22T06:56:29.215Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:29Z"" level=info msg=""Done running ssync [ssync -host 10.233.91.202 -timeout 7 -port 9704 volume-snap-a03e749d-31ac-4375-9559-14fb141fc3d7.img.meta]"""
"2019-01-22T06:56:29.880Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:29Z"" level=info msg=""Running ssync [ssync -host 10.233.91.202 -timeout 7 -port 9705 volume-snap-f8771212-06d3-400b-ad12-c063ef8ed827.img]"""
"2019-01-22T06:56:28.434Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:28Z"" level=info msg=""Done running ssync [ssync -host 10.233.91.202 -timeout 7 -port 9703 volume-snap-a03e749d-31ac-4375-9559-14fb141fc3d7.img]"""
"2019-01-22T06:56:29.211Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:29Z"" level=info msg=""Ssync client: exit code 0"""
"2019-01-22T06:56:29.183Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:29Z"" level=info msg=""source file size: 164, setting up directIo: false"""
"2019-01-22T06:56:29.905Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:29Z"" level=warning msg=""Failed to open server: 10.233.91.202:9705, Retrying..."""
"2019-01-22T06:56:41.391Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:41Z"" level=info msg=GetUsage"
"2019-01-22T06:56:41.392Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","10.233.96.147 - - [22/Jan/2019:06:56:41 +0000] ""GET /v1/replicas/1/volusage HTTP/1.1"" 200 200"
"2019-01-22T06:56:41.390Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:41Z"" level=info msg=""Get Volume Usage"""
"2019-01-22T06:59:11.392Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:59:11Z"" level=info msg=GetUsage"
"2019-01-22T06:59:11.392Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:59:11Z"" level=info msg=""Get Volume Usage"""
"2019-01-22T07:00:38.050Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T07:00:38Z"" level=error msg=""Received EOF: EOF"""
"2019-01-22T07:00:38.050Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T07:00:38Z"" level=info msg=""Restart AutoConfigure Process"""
"2019-01-22T07:00:43.232Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","10.233.91.234 - - [22/Jan/2019:07:00:43 +0000] ""POST /v1/replicas/1?action=start HTTP/1.1"" 200 1091"
"2019-01-22T07:00:43.238Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T07:00:43Z"" level=info msg=""GetReplica for id 1"""
"2019-01-22T07:00:43.409Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T07:00:43Z"" level=info msg=""GetReplica for id 1"""
"2019-01-22T07:00:43.465Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T07:00:43Z"" level=info msg=GetReplica"
"2019-01-22T07:00:43.239Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T07:00:43Z"" level=info msg=""Got signal: 'open', proceed to open replica"""
"2019-01-22T07:00:43.585Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","10.233.91.234 - - [22/Jan/2019:07:00:43 +0000] ""POST /v1/replicas/1?action=snapshot HTTP/1.1"" 200 15190"
"2019-01-22T07:00:43.666Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T07:00:43Z"" level=info msg=GetReplica"

After going through the logs, I can see that the replicas were registered with the controller, but one of the replicas is being synced from another healthy replica, which might take some time.
After a while, the target pod's warning
level=warning msg="No of yet to be registered replicas are less than 3 , No of registered replicas: 1"
no longer shows up, so I think the volume is recovering now. The data is about 12 GiB in size.
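For reference, a 3-replica Jiva setup like the one described above is usually declared through a StorageClass along these lines (a minimal sketch; the class name is illustrative and the annotations may differ between OpenEBS versions):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-jiva-3-replicas   # illustrative name
  annotations:
    openebs.io/cas-type: jiva
    cas.openebs.io/config: |
      - name: ReplicaCount
        value: "3"
provisioner: openebs.io/provisioner-iscsi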

Related

Grafana UI not showing up (via docker-compose)

I've been facing some issues with Grafana and Docker. I'm trying to get Grafana and InfluxDB up for monitoring my MISP instance.
The project which I'm following is: https://github.com/MISP/misp-grafana
The InfluxDB container is up and receiving logs via Telegraf, so that part seems to be working fine. But the Grafana dashboard does not show up via https://ip:3000/login. From inside the machine where MISP is running (the Docker containers are also running on this machine) I can curl the address and receive the HTML of Grafana's login page.
Could someone help me? I have already tried lots of suggestions, but none of them worked as expected.
I've already tried disabling iptables and firewalld (since I'm using CentOS 7), and nothing helped.
My docker-compose file is:
services:
  influxdb:
    image: influxdb:latest
    container_name: influxdb
    volumes:
      - influxdb-storage:/var/lib/influxdb2:rw
      # - ./influxdb/ssl/influxdb-selfsigned.crt:/etc/ssl/influxdb-selfsigned.crt:rw
      # - ./influxdb/ssl/influxdb-selfsigned.key:/etc/ssl/influxdb-selfsigned.key:rw
    ports:
      - "8086:8086"
    environment:
      - DOCKER_INFLUXDB_INIT_MODE=${DOCKER_INFLUXDB_INIT_MODE}
      - DOCKER_INFLUXDB_INIT_USERNAME=${DOCKER_INFLUXDB_INIT_USERNAME}
      - DOCKER_INFLUXDB_INIT_PASSWORD=${DOCKER_INFLUXDB_INIT_PASSWORD}
      - DOCKER_INFLUXDB_INIT_ORG=${DOCKER_INFLUXDB_INIT_ORG}
      - DOCKER_INFLUXDB_INIT_BUCKET=${DOCKER_INFLUXDB_INIT_BUCKET}
      - DOCKER_INFLUXDB_INIT_ADMIN_TOKEN=${DOCKER_INFLUXDB_INIT_ADMIN_TOKEN}
      # - INFLUXD_TLS_CERT=/etc/ssl/influxdb-selfsigned.crt
      # - INFLUXD_TLS_KEY=/etc/ssl/influxdb-selfsigned.key
  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3000:3000"
    volumes:
      - grafana-storage:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning
    depends_on:
      - influxdb
    environment:
      - GF_SECURITY_ADMIN_USER=${DOCKER_GRAFANA_USERNAME}
      - GF_SECURITY_ADMIN_PASSWORD=${DOCKER_GRAFANA_PASSWORD}
    network_mode: host
volumes:
  influxdb-storage:
  grafana-storage:
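Note that Docker Compose discards the ports: mapping for a service that uses network_mode: host, so with this file Grafana binds directly on the host's port 3000. For comparison, a minimal sketch of the same Grafana service on the default bridge network (everything else unchanged) would be:
  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3000:3000"   # published through docker-proxy, like the influxdb service
    volumes:
      - grafana-storage:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning
    depends_on:
      - influxdb
    environment:
      - GF_SECURITY_ADMIN_USER=${DOCKER_GRAFANA_USERNAME}
      - GF_SECURITY_ADMIN_PASSWORD=${DOCKER_GRAFANA_PASSWORD}
    # no network_mode: host, so the ports: mapping above takes effect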
The logs from Grafana container are:
GF_PATHS_CONFIG='/etc/grafana/grafana.ini' is not readable.
GF_PATHS_DATA='/var/lib/grafana' is not writable.
GF_PATHS_HOME='/usr/share/grafana' is not readable.
You may have issues with file permissions, more information here: http://docs.grafana.org/installation/docker/#migrate-to-v51-or-later
logger=settings t=2023-01-18T11:23:18.545281441Z level=info msg="Starting Grafana" version=9.3.2 commit=21c1d14e91 branch=HEAD compiled=2022-12-14T10:40:18Z
logger=settings t=2023-01-18T11:23:18.545574106Z level=info msg="Config loaded from" file=/usr/share/grafana/conf/defaults.ini
logger=settings t=2023-01-18T11:23:18.545595127Z level=info msg="Config loaded from" file=/etc/grafana/grafana.ini
logger=settings t=2023-01-18T11:23:18.545601849Z level=info msg="Config overridden from command line" arg="default.paths.data=/var/lib/grafana"
logger=settings t=2023-01-18T11:23:18.545610343Z level=info msg="Config overridden from command line" arg="default.paths.logs=/var/log/grafana"
logger=settings t=2023-01-18T11:23:18.54561656Z level=info msg="Config overridden from command line" arg="default.paths.plugins=/var/lib/grafana/plugins"
logger=settings t=2023-01-18T11:23:18.545623137Z level=info msg="Config overridden from command line" arg="default.paths.provisioning=/etc/grafana/provisioning"
logger=settings t=2023-01-18T11:23:18.5456313Z level=info msg="Config overridden from command line" arg="default.log.mode=console"
logger=settings t=2023-01-18T11:23:18.545637996Z level=info msg="Config overridden from Environment variable" var="GF_PATHS_DATA=/var/lib/grafana"
logger=settings t=2023-01-18T11:23:18.545648448Z level=info msg="Config overridden from Environment variable" var="GF_PATHS_LOGS=/var/log/grafana"
logger=settings t=2023-01-18T11:23:18.545654176Z level=info msg="Config overridden from Environment variable" var="GF_PATHS_PLUGINS=/var/lib/grafana/plugins"
logger=settings t=2023-01-18T11:23:18.545663184Z level=info msg="Config overridden from Environment variable" var="GF_PATHS_PROVISIONING=/etc/grafana/provisioning"
logger=settings t=2023-01-18T11:23:18.545668879Z level=info msg="Config overridden from Environment variable" var="GF_SECURITY_ADMIN_USER=tsec"
logger=settings t=2023-01-18T11:23:18.545682275Z level=info msg="Config overridden from Environment variable" var="GF_SECURITY_ADMIN_PASSWORD=*********"
logger=settings t=2023-01-18T11:23:18.545689113Z level=info msg="Path Home" path=/usr/share/grafana
logger=settings t=2023-01-18T11:23:18.545699682Z level=info msg="Path Data" path=/var/lib/grafana
logger=settings t=2023-01-18T11:23:18.545705402Z level=info msg="Path Logs" path=/var/log/grafana
logger=settings t=2023-01-18T11:23:18.545710714Z level=info msg="Path Plugins" path=/var/lib/grafana/plugins
logger=settings t=2023-01-18T11:23:18.545732177Z level=info msg="Path Provisioning" path=/etc/grafana/provisioning
logger=settings t=2023-01-18T11:23:18.5457395Z level=info msg="App mode production"
logger=sqlstore t=2023-01-18T11:23:18.545859098Z level=info msg="Connecting to DB" dbtype=sqlite3
logger=migrator t=2023-01-18T11:23:18.575806909Z level=info msg="Starting DB migrations"
logger=migrator t=2023-01-18T11:23:18.584646143Z level=info msg="migrations completed" performed=0 skipped=464 duration=1.036135ms
logger=plugin.loader t=2023-01-18T11:23:18.694560017Z level=info msg="Plugin registered" pluginID=input
logger=secrets t=2023-01-18T11:23:18.695056176Z level=info msg="Envelope encryption state" enabled=true currentprovider=secretKey.v1
logger=query_data t=2023-01-18T11:23:18.698004003Z level=info msg="Query Service initialization"
logger=live.push_http t=2023-01-18T11:23:18.709944098Z level=info msg="Live Push Gateway initialization"
logger=infra.usagestats.collector t=2023-01-18T11:23:19.076511711Z level=info msg="registering usage stat providers" usageStatsProvidersLen=2
logger=provisioning.plugins t=2023-01-18T11:23:19.133661231Z level=error msg="Failed to read plugin provisioning files from directory" path=/etc/grafana/provisioning/plugins error="open /etc/grafana/provisioning/plugins: no such file or directory"
logger=provisioning.notifiers t=2023-01-18T11:23:19.133823449Z level=error msg="Can't read alert notification provisioning files from directory" path=/etc/grafana/provisioning/notifiers error="open /etc/grafana/provisioning/notifiers: no such file or directory"
logger=provisioning.alerting t=2023-01-18T11:23:19.133926705Z level=error msg="can't read alerting provisioning files from directory" path=/etc/grafana/provisioning/alerting error="open /etc/grafana/provisioning/alerting: no such file or directory"
logger=provisioning.alerting t=2023-01-18T11:23:19.133951102Z level=info msg="starting to provision alerting"
logger=provisioning.alerting t=2023-01-18T11:23:19.133992848Z level=info msg="finished to provision alerting"
logger=ngalert.state.manager t=2023-01-18T11:23:19.134747843Z level=info msg="Warming state cache for startup"
logger=grafanaStorageLogger t=2023-01-18T11:23:19.140618617Z level=info msg="storage starting"
logger=http.server t=2023-01-18T11:23:19.140693638Z level=info msg="HTTP Server Listen" address=[::]:3000 protocol=http subUrl= socket=
logger=ngalert.state.manager t=2023-01-18T11:23:19.16651119Z level=info msg="State cache has been initialized" states=0 duration=31.757492ms
logger=ticker t=2023-01-18T11:23:19.166633607Z level=info msg=starting first_tick=2023-01-18T11:23:20Z
logger=ngalert.multiorg.alertmanager t=2023-01-18T11:23:19.166666209Z level=info msg="starting MultiOrg Alertmanager"
logger=context userId=0 orgId=0 uname= t=2023-01-18T11:23:27.6249068Z level=info msg="Request Completed" method=GET path=/ status=302 remote_addr=1.134.26.244 time_ms=0 duration=610.399µs size=29 referer= handler=/
logger=cleanup t=2023-01-18T11:33:19.14617771Z level=info msg="Completed cleanup jobs" duration=7.595219ms
The curl http://ip:3000/login command from inside the CentOS 7 machine answers with the following (just a part of it, since it's big):
<!doctype html><html lang="en"><head><meta charset="utf-8"/><meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"/><meta name="viewport" content="width=device-width"/><meta name="theme-color" content="#000"/><title>Grafana</title><base href="/"/><link rel="preload" href="public/fonts/roboto/RxZJdnzeo3R5zSexge8UUVtXRa8TVwTICgirnJhmVJw.woff2" as="font" crossorigin/><link rel="icon" type="image/png" href="public/img/fav32.png"/><link rel="apple-touch-icon" sizes="180x180" href="public/img/apple-touch-icon.png"/><link rel="mask-icon" href="public/img/grafana_mask_icon.svg" color="#F05A28"/><link rel="stylesheet" href="public/build/grafana.dark.960bbecc684cac29c4a2.css"/><script nonce="">performance.mark('frontend_boot_css_time_seconds');</script><meta name="apple-mobile-web-app-capable" content="yes"/><meta name="apple-mobile-web-app-status-bar-style" content="black"/><meta name="msapplication-TileColor" content="#2b5797"/><meta name="msapplication-config" content="public/img/browserconfig.xml"/></head><body class="theme-dark app-grafana"><style>.preloader {
height: 100%;
flex-direction: column;
display: flex;
justify-content: center;
align-items: center;
}
...
The telnet ip 3000 command from another machine gives me an error.
The netstat -naput | grep LISTEN output on the CentOS 7 machine:
tcp6 0 0 :::8086 :::* LISTEN 30124/docker-proxy-
tcp6 0 0 :::3000 :::* LISTEN 30241/grafana-serve
I've already tried changing the 3000 port to another one (to avoid firewall blocks), but it did not work.
Help me please.

Prometheus Execution Timeout Exceeded

I use Grafana to monitor my company's infrastructure. Everything worked fine until this week, when I started to see alerts in Grafana with an error message:
request handler error: Post "http://prometheus-ip:9090/api/v1/query_range": dial tcp prometheus-ip:9090: i/o timeout
I tried to restart the Prometheus server, but it seems it can't be stopped; I have to kill -9 the process and restart it. Here's the log:
Jun 16 01:04:01 prometheus prometheus[18869]: time="2022-06-16T01:04:01+02:00" level=info msg="All requests for rebuilding the label indexes queued. (Actual processing may lag behind.)" source="crashrecovery.go:529"
Jun 16 01:04:01 prometheus prometheus[18869]: time="2022-06-16T01:04:01+02:00" level=info msg="Checkpointing fingerprint mappings..." source="persistence.go:1480"
Jun 16 01:04:02 prometheus prometheus[18869]: time="2022-06-16T01:04:02+02:00" level=info msg="Done checkpointing fingerprint mappings in 286.224481ms." source="persistence.go:1503"
Jun 16 01:04:02 prometheus prometheus[18869]: time="2022-06-16T01:04:02+02:00" level=warning msg="Crash recovery complete." source="crashrecovery.go:152"
Jun 16 01:04:02 prometheus prometheus[18869]: time="2022-06-16T01:04:02+02:00" level=info msg="362306 series loaded." source="storage.go:378"
Jun 16 01:04:02 prometheus prometheus[18869]: time="2022-06-16T01:04:02+02:00" level=info msg="Starting target manager..." source="targetmanager.go:61"
Jun 16 01:04:02 prometheus prometheus[18869]: time="2022-06-16T01:04:02+02:00" level=info msg="Listening on :9090" source="web.go:235"
Jun 16 01:04:15 prometheus prometheus[18869]: time="2022-06-16T01:04:15+02:00" level=warning msg="Storage has entered rushed mode." chunksToPersist=420483 maxChunksToPersist=524288 maxMemoryChunks=1048576 memoryChunks=655877 source="storage.go:1660" urgencyScore=0.8020076751708984
Jun 16 01:09:02 prometheus prometheus[18869]: time="2022-06-16T01:09:02+02:00" level=info msg="Checkpointing in-memory metrics and chunks..." source="persistence.go:612"
Jun 16 01:10:05 prometheus prometheus[18869]: time="2022-06-16T01:10:05+02:00" level=info msg="Done checkpointing in-memory metrics and chunks in 1m3.127365726s." source="persistence.go:639"
Jun 16 01:12:25 prometheus prometheus[18869]: time="2022-06-16T01:12:25+02:00" level=warning msg="Received SIGTERM, exiting gracefully..." source="main.go:230"
Jun 16 01:12:25 prometheus prometheus[18869]: time="2022-06-16T01:12:25+02:00" level=info msg="See you next time!" source="main.go:237"
Jun 16 01:12:25 prometheus prometheus[18869]: time="2022-06-16T01:12:25+02:00" level=info msg="Stopping target manager..." source="targetmanager.go:75"
Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Stopping rule manager..." source="manager.go:374"
Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Rule manager stopped." source="manager.go:380"
Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Stopping notification handler..." source="notifier.go:369"
Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Stopping local storage..." source="storage.go:396"
Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Stopping maintenance loop..." source="storage.go:398"
Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Maintenance loop stopped." source="storage.go:1259"
Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Stopping series quarantining..." source="storage.go:402"
Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Series quarantining stopped." source="storage.go:1701"
Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Stopping chunk eviction..." source="storage.go:406"
Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Chunk eviction stopped." source="storage.go:1079"
Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Checkpointing in-memory metrics and chunks..." source="persistence.go:612"
Jun 16 01:12:44 prometheus prometheus[18869]: time="2022-06-16T01:12:44+02:00" level=info msg="Done checkpointing in-memory metrics and chunks in 16.170119611s." source="persistence.go:639"
Jun 16 01:12:44 prometheus prometheus[18869]: time="2022-06-16T01:12:44+02:00" level=info msg="Checkpointing fingerprint mappings..." source="persistence.go:1480"
Jun 16 01:12:45 prometheus prometheus[18869]: time="2022-06-16T01:12:45+02:00" level=info msg="Done checkpointing fingerprint mappings in 651.409422ms." source="persistence.go:1503"
Jun 16 01:12:45 prometheus systemd[1]: prometheus.service: State 'stop-sigterm' timed out. Skipping SIGKILL.
Jun 16 01:13:06 prometheus systemd[1]: prometheus.service: State 'stop-final-sigterm' timed out. Skipping SIGKILL. Entering failed mode.
Jun 16 01:13:06 prometheus systemd[1]: prometheus.service: Unit entered failed state.
Jun 16 01:13:06 prometheus systemd[1]: prometheus.service: Failed with result 'timeout'.
Jun 16 01:13:24 prometheus prometheus[20547]: time="2022-06-16T01:13:24+02:00" level=info msg="Starting prometheus (version=1.5.2+ds, branch=debian/sid, revision=1.5.2+ds-2+b3)" source="main.go:75"
Jun 16 01:13:24 prometheus prometheus[20547]: time="2022-06-16T01:13:24+02:00" level=info msg="Build context (go=go1.7.4, user=pkg-go-maintainers#lists.alioth.debian.org, date=20170521-14:39:14)" source="main.go:76"
Jun 16 01:13:24 prometheus prometheus[20547]: time="2022-06-16T01:13:24+02:00" level=info msg="Loading configuration file /etc/prometheus/prometheus.yml" source="main.go:248"
Jun 16 01:13:24 prometheus prometheus[20547]: time="2022-06-16T01:13:24+02:00" level=error msg="Could not lock /path/to/prometheus/metrics/DIRTY, Prometheus already running?" source="persistence.go:198"
Jun 16 01:13:24 prometheus prometheus[20547]: time="2022-06-16T01:13:24+02:00" level=error msg="Error opening memory series storage: resource temporarily unavailable" source="main.go:182"
Jun 16 01:13:24 prometheus systemd[1]: prometheus.service: Main process exited, code=exited, status=1/FAILURE
Jun 16 01:13:44 prometheus systemd[1]: prometheus.service: State 'stop-sigterm' timed out. Skipping SIGKILL.
Jun 16 01:14:02 prometheus prometheus[18869]: time="2022-06-16T01:14:02+02:00" level=info msg="Local storage stopped." source="storage.go:421"
Jun 16 01:14:02 prometheus systemd[1]: prometheus.service: Unit entered failed state.
Jun 16 01:14:02 prometheus systemd[1]: prometheus.service: Failed with result 'exit-code'.
Jun 16 01:14:03 prometheus systemd[1]: prometheus.service: Service hold-off time over, scheduling restart.
Jun 16 01:14:03 prometheus prometheus[20564]: time="2022-06-16T01:14:03+02:00" level=info msg="Starting prometheus (version=1.5.2+ds, branch=debian/sid, revision=1.5.2+ds-2+b3)" source="main.go:75"
Jun 16 01:14:03 prometheus prometheus[20564]: time="2022-06-16T01:14:03+02:00" level=info msg="Build context (go=go1.7.4, user=pkg-go-maintainers#lists.alioth.debian.org, date=20170521-14:39:14)" source="main.go:76"
Jun 16 01:14:03 prometheus prometheus[20564]: time="2022-06-16T01:14:03+02:00" level=info msg="Loading configuration file /etc/prometheus/prometheus.yml" source="main.go:248"
Jun 16 01:14:04 prometheus prometheus[20564]: time="2022-06-16T01:14:04+02:00" level=info msg="Loading series map and head chunks..." source="storage.go:373"
Jun 16 01:14:08 prometheus prometheus[20564]: time="2022-06-16T01:14:08+02:00" level=info msg="364314 series loaded." source="storage.go:378"
Jun 16 01:14:08 prometheus prometheus[20564]: time="2022-06-16T01:14:08+02:00" level=info msg="Starting target manager..." source="targetmanager.go:61"
Jun 16 01:14:08 prometheus prometheus[20564]: time="2022-06-16T01:14:08+02:00" level=info msg="Listening on :9090" source="web.go:235"
Jun 16 01:14:08 prometheus prometheus[20564]: time="2022-06-16T01:14:08+02:00" level=warning msg="Storage has entered rushed mode." chunksToPersist=448681 maxChunksToPersist=524288 maxMemoryChunks=1048576 memoryChunks=687476 source="storage.go:1660" urgencyScore=0.8557910919189453
When restarted like this, Prometheus enters crash recovery, which takes about 1 h 30 min to complete. When it's done, the logs show the following:
Jun 16 16:10:42 prometheus prometheus[32708]: time="2022-06-16T16:10:42+02:00" level=info msg="Storage does not need throttling anymore." chunksToPersist=524288 maxChunksToPersist=524288 maxToleratedMemChunks=1153433 memoryChunks=1049320 source="storage.go:935"
Jun 16 16:10:42 prometheus prometheus[32708]: time="2022-06-16T16:10:42+02:00" level=error msg="Storage needs throttling. Scrapes and rule evaluations will be skipped." chunksToPersist=525451 maxChunksToPersist=524288 maxToleratedMemChunks=1153433 memoryChunks=1050483 source="storage.go:927"
Jun 16 16:15:31 prometheus prometheus[32708]: time="2022-06-16T16:15:31+02:00" level=info msg="Checkpointing in-memory metrics and chunks..." source="persistence.go:612"
Jun 16 16:16:28 prometheus prometheus[32708]: time="2022-06-16T16:16:28+02:00" level=info msg="Done checkpointing in-memory metrics and chunks in 57.204367083s." source="persistence.go:639"
The checkpointing repeats often and takes about 1 min each time.
The monitoring for this server shows the following:
Here are the flags used:
/usr/bin/prometheus --storage.local.path /path/to/prometheus/metrics --storage.local.retention=1460h0m0s --storage.local.series-file-shrink-ratio=0.3
Prometheus version:
prometheus --version
prometheus, version 1.5.2+ds (branch: debian/sid, revision: 1.5.2+ds-2+b3)
build user: pkg-go-maintainers#lists.alioth.debian.org
build date: 20170521-14:39:14
go version: go1.7.4
I decided to move some metrics to another server so this one is not as loaded as before. However, this server still has to scrape the metrics of 50+ other servers. What could be the cause of this?

Prometheus is crash-looping when the pod is recreated

We are running Prometheus version 2.26.0 and Kubernetes version 1.21.7 in Azure. We mount the data on Azure storage via NFS, and it was working fine. For the last few days the Prometheus container has been crash-looping; below are the logs:
level=info ts=2022-01-26T08:04:14.375Z caller=main.go:418 msg="Starting Prometheus" version="(version=2.26.0, branch=HEAD, revision=3cafc58827d1ebd1a67749f88be4218f0bab3d8d)"
level=info ts=2022-01-26T08:04:14.375Z caller=main.go:423 build_context="(go=go1.16.2, user=root#a67cafebe6d0, date=20210331-11:56:23)"
level=info ts=2022-01-26T08:04:14.375Z caller=main.go:424 host_details="(Linux 5.4.0-1065-azure #68~18.04.1-Ubuntu SMP Fri Dec 3 14:08:44 UTC 2021 x86_64 prometheus-6b9d9d54f4-nc45x (none))"
level=info ts=2022-01-26T08:04:14.375Z caller=main.go:425 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2022-01-26T08:04:14.375Z caller=main.go:426 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2022-01-26T08:04:14.503Z caller=web.go:540 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2022-01-26T08:04:14.507Z caller=main.go:795 msg="Starting TSDB ..."
level=info ts=2022-01-26T08:04:14.509Z caller=tls_config.go:191 component=web msg="TLS is disabled." http2=false
level=info ts=2022-01-26T08:04:14.560Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1641478251052 maxt=1641513600000 ulid=01FRSEHC4YHV3N26JY5AMNZFRW
level=info ts=2022-01-26T08:04:14.593Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1641513600037 maxt=1641578400000 ulid=01FRVCAP2VJGDF0Z9CS24EXAJJ
level=info ts=2022-01-26T08:04:14.624Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1641578400038 maxt=1641643200000 ulid=01FRXA4AQHMHAEYWRKQFGP075M
level=info ts=2022-01-26T08:04:14.651Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1641643200422 maxt=1641708000000 ulid=01FRZ7XQQ4RA96DCPPBP22D71N
level=info ts=2022-01-26T08:04:14.679Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1641708000020 maxt=1641772800000 ulid=01FS15QDG6BS7H6M6Y09HG3E12
level=info ts=2022-01-26T08:04:14.707Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1641772800011 maxt=1641837600000 ulid=01FS33GT38PRSB9VP56YFXT2M0
level=info ts=2022-01-26T08:04:14.736Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1641963555381 maxt=1641967200000 ulid=01FS6MRNZEWT1Z6P697K09KHD7
level=info ts=2022-01-26T08:04:14.763Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1641837600100 maxt=1641902400000 ulid=01FS6R88C70TCD8CYC4XJ95X23
level=info ts=2022-01-26T08:04:14.810Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1641967200019 maxt=1642032000000 ulid=01FS8WXQP3YJ7EXBVNYBQG4DVY
level=info ts=2022-01-26T08:04:14.836Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642032000072 maxt=1642096800000 ulid=01FSATQBR4XBQRDM72ATFS9PQ2
level=info ts=2022-01-26T08:04:14.863Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642096800059 maxt=1642161600000 ulid=01FSCRHE2YBDX7GPRPSH6BNGRX
level=info ts=2022-01-26T08:04:14.895Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642161600091 maxt=1642226400000 ulid=01FSEPB1GPGAANVCQ2VKW9BQ4G
level=info ts=2022-01-26T08:04:14.948Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642226400026 maxt=1642291200000 ulid=01FSGM4J0G1D0A6H1GD3N9C372
level=info ts=2022-01-26T08:04:14.973Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642291200005 maxt=1642356000000 ulid=01FSJHY6W0FRYDHCXBVB5XPFYG
level=info ts=2022-01-26T08:04:15.002Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642356000027 maxt=1642420800000 ulid=01FSMFR96DASV6YPN66W7C86H9
level=info ts=2022-01-26T08:04:15.077Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642420800042 maxt=1642485600000 ulid=01FSPDHGWRT65D8CKWQ2JPRHW3
level=info ts=2022-01-26T08:04:15.105Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642485600006 maxt=1642550400000 ulid=01FSRBAVP2MW71H08F32D6HGB4
level=info ts=2022-01-26T08:04:15.130Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642550400028 maxt=1642615200000 ulid=01FST9482FD0Z3PHXHNW2W616E
level=info ts=2022-01-26T08:04:15.157Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642680000018 maxt=1642687200000 ulid=01FSW00TJKJ7CGCQ7JJS3XQK8G
level=info ts=2022-01-26T08:04:15.187Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642687200018 maxt=1642694400000 ulid=01FSW6WHTSEAXHWV5J7PQP94X7
level=info ts=2022-01-26T08:04:15.213Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642615200021 maxt=1642680000000 ulid=01FSW6XYH2Y429PG5YRM0K45XS
level=info ts=2022-01-26T08:04:15.275Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642694400018 maxt=1642701600000 ulid=01FSWDR92Y7H302NDZRX1V2PX9
level=info ts=2022-01-26T08:04:21.840Z caller=head.go:696 component=tsdb msg="Replaying on-disk memory mappable chunks if any"
level=info ts=2022-01-26T08:04:22.623Z caller=head.go:710 component=tsdb msg="On-disk memory mappable chunks replay completed" duration=782.403397ms
level=info ts=2022-01-26T08:04:22.623Z caller=head.go:716 component=tsdb msg="Replaying WAL, this may take a while"
level=info ts=2022-01-26T08:04:34.169Z caller=head.go:742 component=tsdb msg="WAL checkpoint loaded"
level=info ts=2022-01-26T08:04:38.895Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=299 maxSegment=7511
level=warn ts=2022-01-26T08:04:46.423Z caller=main.go:645 msg="Received SIGTERM, exiting gracefully..."
level=info ts=2022-01-26T08:04:46.424Z caller=main.go:668 msg="Stopping scrape discovery manager..."
level=info ts=2022-01-26T08:04:46.424Z caller=main.go:682 msg="Stopping notify discovery manager..."
level=info ts=2022-01-26T08:04:46.424Z caller=main.go:704 msg="Stopping scrape manager..."
level=info ts=2022-01-26T08:04:46.424Z caller=main.go:678 msg="Notify discovery manager stopped"
level=info ts=2022-01-26T08:04:46.425Z caller=main.go:698 msg="Scrape manager stopped"
level=info ts=2022-01-26T08:04:46.426Z caller=manager.go:934 component="rule manager" msg="Stopping rule manager..."
level=info ts=2022-01-26T08:04:46.426Z caller=manager.go:944 component="rule manager" msg="Rule manager stopped"
level=info ts=2022-01-26T08:04:46.426Z caller=notifier.go:601 component=notifier msg="Stopping notification manager..."
level=info ts=2022-01-26T08:04:46.426Z caller=main.go:872 msg="Notifier manager stopped"
level=info ts=2022-01-26T08:04:46.426Z caller=main.go:664 msg="Scrape discovery manager stopped"
level=info ts=2022-01-26T08:04:46.792Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=300 maxSegment=7511
level=info ts=2022-01-26T08:04:46.870Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=301 maxSegment=7511
level=info ts=2022-01-26T08:04:46.901Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=302 maxSegment=7511
level=info ts=2022-01-26T08:04:46.946Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=303 maxSegment=7511
level=info ts=2022-01-26T08:04:46.974Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=304 maxSegment=7511
level=info ts=2022-01-26T08:04:47.008Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=305 maxSegment=7511
level=info ts=2022-01-26T08:04:47.034Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=306 maxSegment=7511
level=info ts=2022-01-26T08:04:47.067Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=307 maxSegment=7511
level=info ts=2022-01-26T08:04:47.098Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=308 maxSegment=7511
level=info ts=2022-01-26T08:04:47.124Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=309 maxSegment=7511
level=info ts=2022-01-26T08:04:47.158Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=310 maxSegment=7511
level=info ts=2022-01-26T08:04:47.203Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=311 maxSegment=7511
level=info ts=2022-01-26T08:04:47.254Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=312 maxSegment=7511
level=info ts=2022-01-26T08:04:47.486Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=313 maxSegment=7511
level=info ts=2022-01-26T08:04:47.511Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=314 maxSegment=7511
level=info ts=2022-01-26T08:04:47.539Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=315 maxSegment=7511
level=info ts=2022-01-26T08:04:47.564Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=316 maxSegment=7511
...
level=info ts=2022-01-26T08:05:15.161Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=1401 maxSegment=7511
level=info ts=2022-01-26T08:05:15.182Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=1402 maxSegment=7511
level=info ts=2022-01-26T08:05:15.205Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=1403 maxSegment=7511
level=info ts=2022-01-26T08:05:15.229Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=1404 maxSegment=7511
level=info ts=2022-01-26T08:05:15.251Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=1405 maxSegment=7511
level=info ts=2022-01-26T08:05:15.274Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=1406 maxSegment=7511
level=info ts=2022-01-26T08:05:15.297Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=1407 maxSegment=7511
level=info ts=2022-01-26T08:05:15.323Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=1408 maxSegment=7511
level=info ts=2022-01-26T08:05:15.349Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=1409 maxSegment=7511
level=info ts=2022-01-26T08:05:15.372Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=1410 maxSegment=7511
level=info ts=2022-01-26T08:05:15.426Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=1411 maxSegment=7511
level=info ts=2022-01-26T08:05:15.452Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=1412 maxSegment=7511
level=info ts=2022-01-26T08:05:15.475Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=1413 maxSegment=7511
level=info ts=2022-01-26T08:05:15.498Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=1414 maxSegment=7511
rpc error: code = NotFound desc = an error occurred when try to find container "ae14079418f59b04bb80d8413e8fdc34f167bfe762317ef674e05466d34c9e1f": not found
So I deleted the deployment and redeployed it to the same storage account, then I got a new error:
level=info ts=2022-01-26T11:10:11.530Z caller=main.go:418 msg="Starting Prometheus" version="(version=2.26.0, branch=HEAD, revision=3cafc58827d1ebd1a67749f88be4218f0bab3d8d)"
level=info ts=2022-01-26T11:10:11.534Z caller=main.go:423 build_context="(go=go1.16.2, user=root#a67cafebe6d0, date=20210331-11:56:23)"
level=info ts=2022-01-26T11:10:11.535Z caller=main.go:424 host_details="(Linux 5.4.0-1064-azure #67~18.04.1-Ubuntu SMP Wed Nov 10 11:38:21 UTC 2021 x86_64 prometheus-6b9d9d54f4-wnmzh (none))"
level=info ts=2022-01-26T11:10:11.536Z caller=main.go:425 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2022-01-26T11:10:11.536Z caller=main.go:426 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2022-01-26T11:10:14.168Z caller=web.go:540 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2022-01-26T11:10:15.385Z caller=main.go:795 msg="Starting TSDB ..."
level=info ts=2022-01-26T11:10:16.022Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1641837600024 maxt=1641902400000 ulid=01FS51ANKBFTVNRPZ68FGQQ5GA
level=info ts=2022-01-26T11:10:16.309Z caller=tls_config.go:191 component=web msg="TLS is disabled." http2=false
level=info ts=2022-01-26T11:10:16.494Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1641902400005 maxt=1641967200000 ulid=01FS6Z46FGXN932K7D39D9166D
level=info ts=2022-01-26T11:10:16.806Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1641967200106 maxt=1642032000000 ulid=01FS8WXRJ7Q80FKD4C8EJNR0AD
level=info ts=2022-01-26T11:10:17.011Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642032000003 maxt=1642096800000 ulid=01FSATQE1VMNR101KRW1X10Q75
level=info ts=2022-01-26T11:10:17.305Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642096800206 maxt=1642161600000 ulid=01FSCRGVT1E7562SF7EQN12JBM
level=info ts=2022-01-26T11:10:18.240Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642161600059 maxt=1642226400000 ulid=01FSEPAFP2CX03ANRB7Q1AG514
level=info ts=2022-01-26T11:10:21.046Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642226400051 maxt=1642291200000 ulid=01FSGM3WT0TKR0XW9BD4QSKPQE
level=info ts=2022-01-26T11:10:21.422Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642291200113 maxt=1642356000000 ulid=01FSJHXKHMANW0E6FXDXVM265G
level=info ts=2022-01-26T11:10:22.822Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642356000032 maxt=1642420800000 ulid=01FSMFQ6XJ97VJFKNCYQBVB4DZ
level=info ts=2022-01-26T11:10:23.536Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642420800021 maxt=1642485600000 ulid=01FSPDGM95FDDV2CDWX93BTDCS
level=info ts=2022-01-26T11:10:23.880Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642485600072 maxt=1642550400000 ulid=01FSRBA555RWY4QNP4HD9YKRBM
level=info ts=2022-01-26T11:10:25.021Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642550400031 maxt=1642615200000 ulid=01FST93N3C82K9VS20MKTMGGYC
level=info ts=2022-01-26T11:10:25.713Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642615200014 maxt=1642680000000 ulid=01FSW6X95FRNSN1XJZ2YK0MXW7
level=info ts=2022-01-26T11:10:26.634Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642680000012 maxt=1642744800000 ulid=01FSY4PXA7V1XQHHA3MC35JSWQ
level=info ts=2022-01-26T11:10:27.776Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642744800174 maxt=1642809600000 ulid=01FT02G9XGHPV8GME53ZPMYXE6
level=info ts=2022-01-26T11:10:28.760Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642809600070 maxt=1642874400000 ulid=01FT209WP8AXXVZB1NCSC55ACE
level=info ts=2022-01-26T11:10:29.618Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642874400253 maxt=1642939200000 ulid=01FT3Y3A4H72FFW318RKHEXXGA
level=info ts=2022-01-26T11:10:30.313Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642939200047 maxt=1643004000000 ulid=01FT5VX3YC838QN5VQFAERV1QX
level=info ts=2022-01-26T11:10:30.483Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1643004000040 maxt=1643068800000 ulid=01FT7SPHC5EV0SS1R0WT04H9FR
level=info ts=2022-01-26T11:10:30.696Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1643068800035 maxt=1643133600000 ulid=01FT9QFZXBZ7EYY2CTE8WXZTB9
level=info ts=2022-01-26T11:10:31.838Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1643133600000 maxt=1643155200000 ulid=01FTA574G4M45WX97Z470DQF73
level=info ts=2022-01-26T11:10:33.686Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1643176800008 maxt=1643184000000 ulid=01FTASSZCG8V5N2VGAGFBYJBSR
level=info ts=2022-01-26T11:10:36.078Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1643184000000 maxt=1643191200000 ulid=01FTB0NP47JW5JCF808QZZ8WZQ
level=info ts=2022-01-26T11:10:36.442Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1643155200065 maxt=1643176800000 ulid=01FTB0P9H3H09B2ADD5X1RXFW6
level=info ts=2022-01-26T11:10:40.079Z caller=main.go:668 msg="Stopping scrape discovery manager..."
level=info ts=2022-01-26T11:10:40.079Z caller=main.go:682 msg="Stopping notify discovery manager..."
level=info ts=2022-01-26T11:10:40.079Z caller=main.go:704 msg="Stopping scrape manager..."
level=info ts=2022-01-26T11:10:40.079Z caller=main.go:678 msg="Notify discovery manager stopped"
level=info ts=2022-01-26T11:10:40.079Z caller=main.go:664 msg="Scrape discovery manager stopped"
level=info ts=2022-01-26T11:10:40.079Z caller=main.go:698 msg="Scrape manager stopped"
level=info ts=2022-01-26T11:10:40.080Z caller=manager.go:934 component="rule manager" msg="Stopping rule manager..."
level=info ts=2022-01-26T11:10:40.080Z caller=manager.go:944 component="rule manager" msg="Rule manager stopped"
level=info ts=2022-01-26T11:10:40.080Z caller=notifier.go:601 component=notifier msg="Stopping notification manager..."
level=info ts=2022-01-26T11:10:40.080Z caller=main.go:872 msg="Notifier manager stopped"
level=error ts=2022-01-26T11:10:40.080Z caller=main.go:881 err="opening storage failed: lock DB directory: resource temporarily unavailable"
The YAML is provided by Istio. Below is the deployment YAML file.
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    component: "server"
    app: prometheus
    release: prometheus
    chart: prometheus-14.6.1
    heritage: Helm
  name: prometheus
  namespace: istio-system
spec:
  selector:
    matchLabels:
      component: "server"
      app: prometheus
      release: prometheus
  replicas: 1
  template:
    metadata:
      labels:
        component: "server"
        app: prometheus
        release: prometheus
        chart: prometheus-14.6.1
        heritage: Helm
        sidecar.istio.io/inject: "false"
    spec:
      enableServiceLinks: true
      serviceAccountName: prometheus
      containers:
        - name: prometheus-server-configmap-reload
          image: "jimmidyson/configmap-reload:v0.5.0"
          imagePullPolicy: "IfNotPresent"
          args:
            - --volume-dir=/etc/config
            - --webhook-url=http://127.0.0.1:9090/-/reload
          resources:
            {}
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config
              readOnly: true
        - name: prometheus-server
          image: "prom/prometheus:v2.26.0"
          imagePullPolicy: "IfNotPresent"
          args:
            - --storage.tsdb.retention.time=15d
            - --config.file=/etc/config/prometheus.yml
            - --storage.tsdb.path=/data
            - --web.console.libraries=/etc/prometheus/console_libraries
            - --web.console.templates=/etc/prometheus/consoles
            - --web.enable-lifecycle
          ports:
            - containerPort: 9090
          readinessProbe:
            httpGet:
              path: /-/ready
              port: 9090
            initialDelaySeconds: 0
            periodSeconds: 5
            timeoutSeconds: 4
            failureThreshold: 3
            successThreshold: 1
          livenessProbe:
            httpGet:
              path: /-/healthy
              port: 9090
            initialDelaySeconds: 30
            periodSeconds: 15
            timeoutSeconds: 10
            failureThreshold: 3
            successThreshold: 1
          resources:
            {}
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config
            - name: azurefileshare
              mountPath: /data
              subPath: ""
      hostNetwork: false
      dnsPolicy: ClusterFirst
      securityContext:
        fsGroup: 65534
        runAsGroup: 65534
        runAsNonRoot: true
        runAsUser: 65534
      terminationGracePeriodSeconds: 300
      volumes:
        - name: config-volume
          configMap:
            name: prometheus
        - name: azurefileshare
          azureFile:
            secretName: log-storage-secret
            shareName: prometheusfileshare
            readOnly: false
Expected Behavior
When I mount the data to the new container, it should load the data.
Actual Behavior
Prometheus is not able to load the data, or cannot bind the data to the newly created pod when the old pod dies.
Help me out to resolve the issue.
Thank you YwH for your suggestion. Posting this as an answer so it can help other community members if they encounter the same issue in the future.
As stated in this document, Istio provides a basic sample installation to quickly get Prometheus up and running:
This is intended for demonstration only, and is not tuned for performance or security.
Note: Istio's configuration is well suited for small clusters and monitoring over short time horizons; it is not suitable for large-scale meshes or monitoring over a period of days or weeks.
Solution: Prometheus is a stateful application, better deployed with a StatefulSet, not a Deployment.
StatefulSets are valuable for applications that require one or more of the following: stable, persistent storage; ordered, graceful deployment and scaling.
You can use StatefulSet code like the following for deploying the Prometheus container.
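A minimal sketch, assuming the same prom/prometheus:v2.26.0 image and ConfigMap used by the Deployment above (the storage class name and size are illustrative):
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: istio-system
spec:
  serviceName: prometheus
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      securityContext:
        fsGroup: 65534
        runAsUser: 65534
        runAsNonRoot: true
      containers:
        - name: prometheus-server
          image: prom/prometheus:v2.26.0
          args:
            - --config.file=/etc/config/prometheus.yml
            - --storage.tsdb.path=/data
            - --storage.tsdb.retention.time=15d
          ports:
            - containerPort: 9090
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config
            - name: data
              mountPath: /data
      volumes:
        - name: config-volume
          configMap:
            name: prometheus
  volumeClaimTemplates:
    # one PersistentVolumeClaim per pod, re-attached when the pod is recreated
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: managed-premium   # illustrative name; pick a class available in your cluster
        resources:
          requests:
            storage: 50Gi
Because the claim created from volumeClaimTemplates follows the pod, the TSDB data directory is re-bound to the replacement pod instead of being shared through one Azure file share between an old and a new pod at the same time.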

GitLab keeps loading and finally fails when deploying a dockerized node.js app

GitLab Job Log
Running with gitlab-runner 13.2.0-rc2 (45f2b4ec)
  on docker-auto-scale fa6cab46
Preparing the "docker+machine" executor
Using Docker executor with image gitlab/dind:latest ...
Starting service docker:dind ...
Pulling docker image docker:dind ...
Using docker image sha256:d5d139be840a6ffa04348fc87740e8c095cade6e9cb977785fdba51e5fd7ffec for docker:dind ...
Waiting for services to be up and running...

*** WARNING: Service runner-fa6cab46-project-18378289-concurrent-0-31a688551619da9f-docker-0 probably didn't start properly.
Health check error:
service "runner-fa6cab46-project-18378289-concurrent-0-31a688551619da9f-docker-0-wait-for-service" timeout
Health check container logs:
Service container logs:
2020-07-20T08:21:19.734721788Z time="2020-07-20T08:21:19.734543379Z" level=info msg="Starting up"
2020-07-20T08:21:19.742928068Z time="2020-07-20T08:21:19.742802844Z" level=warning msg="could not change group /var/run/docker.sock to docker: group docker not found"
2020-07-20T08:21:19.743943014Z time="2020-07-20T08:21:19.743853574Z" level=warning msg="[!] DON'T BIND ON ANY IP ADDRESS WITHOUT setting --tlsverify IF YOU DON'T KNOW WHAT YOU'RE DOING [!]"
2020-07-20T08:21:19.764021012Z time="2020-07-20T08:21:19.763898078Z" level=info msg="libcontainerd: started new containerd process" pid=23
2020-07-20T08:21:19.764159337Z time="2020-07-20T08:21:19.764107864Z" level=info msg="parsed scheme: \"unix\"" module=grpc
2020-07-20T08:21:19.764207629Z time="2020-07-20T08:21:19.764179926Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
2020-07-20T08:21:19.764319635Z time="2020-07-20T08:21:19.764279612Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/docker/containerd/containerd.sock 0 <nil>}] <nil>}" module=grpc
2020-07-20T08:21:19.764371375Z time="2020-07-20T08:21:19.764344798Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
2020-07-20T08:21:19.969344247Z time="2020-07-20T08:21:19.969193121Z" level=info msg="starting containerd" revision=7ad184331fa3e55e52b890ea95e65ba581ae3429 version=v1.2.13
2020-07-20T08:21:19.969863044Z time="2020-07-20T08:21:19.969784495Z" level=info msg="loading plugin "io.containerd.content.v1.content"..." type=io.containerd.content.v1
2020-07-20T08:21:19.970042302Z time="2020-07-20T08:21:19.969997665Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.btrfs"..." type=io.containerd.snapshotter.v1
2020-07-20T08:21:19.970399514Z time="2020-07-20T08:21:19.970336671Z" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.btrfs" error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter"
2020-07-20T08:21:19.970474776Z time="2020-07-20T08:21:19.970428684Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.aufs"..." type=io.containerd.snapshotter.v1
2020-07-20T08:21:20.019585153Z time="2020-07-20T08:21:20.019421401Z" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.aufs" error="modprobe aufs failed: "ip: can't find device 'aufs'\nmodprobe: can't change directory to '/lib/modules': No such file or directory\n": exit status 1"
2020-07-20T08:21:20.019709540Z time="2020-07-20T08:21:20.019668899Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.native"..." type=io.containerd.snapshotter.v1
2020-07-20T08:21:20.019934319Z time="2020-07-20T08:21:20.019887606Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.overlayfs"..." type=io.containerd.snapshotter.v1
2020-07-20T08:21:20.020299876Z time="2020-07-20T08:21:20.020218529Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.zfs"..." type=io.containerd.snapshotter.v1
2020-07-20T08:21:20.021038477Z time="2020-07-20T08:21:20.020887571Z" level=info msg="skip loading plugin "io.containerd.snapshotter.v1.zfs"..." type=io.containerd.snapshotter.v1
2020-07-20T08:21:20.021162370Z time="2020-07-20T08:21:20.021121663Z" level=info msg="loading plugin "io.containerd.metadata.v1.bolt"..." type=io.containerd.metadata.v1
2020-07-20T08:21:20.021406797Z time="2020-07-20T08:21:20.021348536Z" level=warning msg="could not use snapshotter aufs in metadata plugin" error="modprobe aufs failed: "ip: can't find device 'aufs'\nmodprobe: can't change directory to '/lib/modules': No such file or directory\n": exit status 1"
2020-07-20T08:21:20.021487917Z time="2020-07-20T08:21:20.021435946Z" level=warning msg="could not use snapshotter zfs in metadata plugin" error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter: skip plugin"
2020-07-20T08:21:20.021581245Z time="2020-07-20T08:21:20.021533539Z" level=warning msg="could not use snapshotter btrfs in metadata plugin" error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter"
2020-07-20T08:21:20.030531741Z time="2020-07-20T08:21:20.030427430Z" level=info msg="loading plugin "io.containerd.differ.v1.walking"..." type=io.containerd.differ.v1
2020-07-20T08:21:20.030639854Z time="2020-07-20T08:21:20.030604536Z" level=info msg="loading plugin "io.containerd.gc.v1.scheduler"..." type=io.containerd.gc.v1
2020-07-20T08:21:20.030779501Z time="2020-07-20T08:21:20.030736875Z" level=info msg="loading plugin "io.containerd.service.v1.containers-service"..." type=io.containerd.service.v1
2020-07-20T08:21:20.030865060Z time="2020-07-20T08:21:20.030833703Z" level=info msg="loading plugin "io.containerd.service.v1.content-service"..." type=io.containerd.service.v1
2020-07-20T08:21:20.030955439Z time="2020-07-20T08:21:20.030912981Z" level=info msg="loading plugin "io.containerd.service.v1.diff-service"..." type=io.containerd.service.v1
2020-07-20T08:21:20.031027842Z time="2020-07-20T08:21:20.030998003Z" level=info msg="loading plugin "io.containerd.service.v1.images-service"..." type=io.containerd.service.v1
2020-07-20T08:21:20.031132325Z time="2020-07-20T08:21:20.031083782Z" level=info msg="loading plugin "io.containerd.service.v1.leases-service"..." type=io.containerd.service.v1
2020-07-20T08:21:20.031202966Z time="2020-07-20T08:21:20.031174445Z" level=info msg="loading plugin "io.containerd.service.v1.namespaces-service"..." type=io.containerd.service.v1
2020-07-20T08:21:20.031286993Z time="2020-07-20T08:21:20.031253528Z" level=info msg="loading plugin "io.containerd.service.v1.snapshots-service"..." type=io.containerd.service.v1
2020-07-20T08:21:20.031370557Z time="2020-07-20T08:21:20.031312376Z" level=info msg="loading plugin "io.containerd.runtime.v1.linux"..." type=io.containerd.runtime.v1
2020-07-20T08:21:20.031709756Z time="2020-07-20T08:21:20.031650044Z" level=info msg="loading plugin "io.containerd.runtime.v2.task"..." type=io.containerd.runtime.v2
2020-07-20T08:21:20.031941868Z time="2020-07-20T08:21:20.031897088Z" level=info msg="loading plugin "io.containerd.monitor.v1.cgroups"..." type=io.containerd.monitor.v1
2020-07-20T08:21:20.032929781Z time="2020-07-20T08:21:20.032846588Z" level=info msg="loading plugin "io.containerd.service.v1.tasks-service"..." type=io.containerd.service.v1
2020-07-20T08:21:20.033064279Z time="2020-07-20T08:21:20.033014391Z" level=info msg="loading plugin "io.containerd.internal.v1.restart"..." type=io.containerd.internal.v1
2020-07-20T08:21:20.034207198Z time="2020-07-20T08:21:20.034120505Z" level=info msg="loading plugin "io.containerd.grpc.v1.containers"..." type=io.containerd.grpc.v1
2020-07-20T08:21:20.034316027Z time="2020-07-20T08:21:20.034279582Z" level=info msg="loading plugin "io.containerd.grpc.v1.content"..." type=io.containerd.grpc.v1
2020-07-20T08:21:20.034402334Z time="2020-07-20T08:21:20.034369239Z" level=info msg="loading plugin "io.containerd.grpc.v1.diff"..." type=io.containerd.grpc.v1
2020-07-20T08:21:20.034482782Z time="2020-07-20T08:21:20.034452282Z" level=info msg="loading plugin "io.containerd.grpc.v1.events"..." type=io.containerd.grpc.v1
2020-07-20T08:21:20.034564724Z time="2020-07-20T08:21:20.034533365Z" level=info msg="loading plugin "io.containerd.grpc.v1.healthcheck"..." type=io.containerd.grpc.v1
2020-07-20T08:21:20.034645756Z time="2020-07-20T08:21:20.034617060Z" level=info msg="loading plugin "io.containerd.grpc.v1.images"..." type=io.containerd.grpc.v1
2020-07-20T08:21:20.034722695Z time="2020-07-20T08:21:20.034689037Z" level=info msg="loading plugin "io.containerd.grpc.v1.leases"..." type=io.containerd.grpc.v1
2020-07-20T08:21:20.034800005Z time="2020-07-20T08:21:20.034770572Z" level=info msg="loading plugin "io.containerd.grpc.v1.namespaces"..." type=io.containerd.grpc.v1
2020-07-20T08:21:20.034873069Z time="2020-07-20T08:21:20.034837050Z" level=info msg="loading plugin "io.containerd.internal.v1.opt"..." type=io.containerd.internal.v1
2020-07-20T08:21:20.036608424Z time="2020-07-20T08:21:20.036525701Z" level=info msg="loading plugin "io.containerd.grpc.v1.snapshots"..." type=io.containerd.grpc.v1
2020-07-20T08:21:20.036722927Z time="2020-07-20T08:21:20.036684403Z" level=info msg="loading plugin "io.containerd.grpc.v1.tasks"..." type=io.containerd.grpc.v1
2020-07-20T08:21:20.036799326Z time="2020-07-20T08:21:20.036769392Z" level=info msg="loading plugin "io.containerd.grpc.v1.version"..." type=io.containerd.grpc.v1
2020-07-20T08:21:20.036876692Z time="2020-07-20T08:21:20.036844684Z" level=info msg="loading plugin "io.containerd.grpc.v1.introspection"..." type=io.containerd.grpc.v1
2020-07-20T08:21:20.037291381Z time="2020-07-20T08:21:20.037244979Z" level=info msg=serving... address="/var/run/docker/containerd/containerd-debug.sock"
2020-07-20T08:21:20.037493736Z time="2020-07-20T08:21:20.037445814Z" level=info msg=serving... address="/var/run/docker/containerd/containerd.sock"
2020-07-20T08:21:20.037563487Z time="2020-07-20T08:21:20.037522305Z" level=info msg="containerd successfully booted in 0.069638s"
2020-07-20T08:21:20.087933162Z time="2020-07-20T08:21:20.087804902Z" level=info msg="Setting the storage driver from the $DOCKER_DRIVER environment variable (overlay2)"
2020-07-20T08:21:20.088415387Z time="2020-07-20T08:21:20.088327506Z" level=info msg="parsed scheme: \"unix\"" module=grpc
2020-07-20T08:21:20.088533804Z time="2020-07-20T08:21:20.088465157Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
2020-07-20T08:21:20.088620947Z time="2020-07-20T08:21:20.088562235Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/docker/containerd/containerd.sock 0 <nil>}] <nil>}" module=grpc
2020-07-20T08:21:20.088709546Z time="2020-07-20T08:21:20.088654016Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
2020-07-20T08:21:20.092857445Z time="2020-07-20T08:21:20.092749940Z" level=info msg="parsed scheme: \"unix\"" module=grpc
2020-07-20T08:21:20.092962469Z time="2020-07-20T08:21:20.092914347Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
2020-07-20T08:21:20.093060327Z time="2020-07-20T08:21:20.093013905Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/docker/containerd/containerd.sock 0 <nil>}] <nil>}" module=grpc
2020-07-20T08:21:20.093142744Z time="2020-07-20T08:21:20.093102173Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
2020-07-20T08:21:20.149109416Z time="2020-07-20T08:21:20.148965236Z" level=info msg="Loading containers: start."
2020-07-20T08:21:20.159351905Z time="2020-07-20T08:21:20.159146135Z" level=warning msg="Running modprobe bridge br_netfilter failed with message: ip: can't find device 'bridge'\nbridge 167936 1 br_netfilter\nstp 16384 1 bridge\nllc 16384 2 bridge,stp\nip: can't find device 'br_netfilter'\nbr_netfilter 24576 0 \nbridge 167936 1 br_netfilter\nmodprobe: can't change directory to '/lib/modules': No such file or directory\n, error: exit status 1"
2020-07-20T08:21:20.280536391Z time="2020-07-20T08:21:20.280402152Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.18.0.0/16. Daemon option --bip can be used to set a preferred IP address"
2020-07-20T08:21:20.337028532Z time="2020-07-20T08:21:20.336889956Z" level=info msg="Loading containers: done."
2020-07-20T08:21:20.435200532Z time="2020-07-20T08:21:20.435033092Z" level=info msg="Docker daemon" commit=48a66213fe graphdriver(s)=overlay2 version=19.03.12
2020-07-20T08:21:20.436386855Z time="2020-07-20T08:21:20.436266338Z" level=info msg="Daemon has completed initialization"
2020-07-20T08:21:20.476621441Z time="2020-07-20T08:21:20.475137317Z" level=info msg="API listen on [::]:2375"
2020-07-20T08:21:20.477679219Z time="2020-07-20T08:21:20.477535808Z" level=info msg="API listen on /var/run/docker.sock"
*********
Pulling docker image gitlab/dind:latest ...
Using docker image sha256:cc674e878f23bdc3c36cc37596d31adaa23bca0fc3ed18bea9b59abc638602e1 for gitlab/dind:latest ...
Preparing environment
Running on runner-fa6cab46-project-18378289-concurrent-0 via runner-fa6cab46-srm-1595233216-1bd30100...
Getting source from Git repository
$ eval "$CI_PRE_CLONE_SCRIPT"
Fetching changes with git depth set to 50...
Initialized empty Git repository in /builds/xxx.us/backend/.git/
Created fresh repository.
Checking out 257ffdf2 as stage...
Skipping Git submodules setup
Restoring cache
Checking cache for stage node:14.5.0-alpine-2...
Downloading cache.zip from https://storage.googleapis.com/gitlab-com-runners-cache/project/18378289/stage%20node:14.5.0-alpine-2
Successfully extracted cache
Executing "step_script" stage of the job script
ln: failed to create symbolic link '/sys/fs/cgroup/systemd/name=systemd': Operation not permitted
time="2020-07-20T08:22:14.844844859Z" level=warning msg="[!] DON'T BIND ON ANY IP ADDRESS WITHOUT setting -tlsverify IF YOU DON'T KNOW WHAT YOU'RE DOING [!]"
time="2020-07-20T08:22:14.846663310Z" level=info msg="libcontainerd: new containerd process, pid: 57"
time="2020-07-20T08:22:14.906788853Z" level=info msg="Graph migration to content-addressability took 0.00 seconds"
time="2020-07-20T08:22:14.907996055Z" level=info msg="Loading containers: start."
time="2020-07-20T08:22:14.910877638Z" level=warning msg="Running modprobe bridge br_netfilter failed with message: modprobe: ERROR: ../libkmod/libkmod.c:556 kmod_search_moddep() could not open moddep file '/lib/modules/4.19.78-coreos/modules.dep.bin'\nmodprobe: ERROR: ../libkmod/libkmod.c:556 kmod_search_moddep() could not open moddep file '/lib/modules/4.19.78-coreos/modules.dep.bin'\n, error: exit status 1"
time="2020-07-20T08:22:14.912665866Z" level=warning msg="Running modprobe nf_nat failed with message: `modprobe: ERROR: ../libkmod/libkmod.c:556 kmod_search_moddep() could not open moddep file '/lib/modules/4.19.78-coreos/modules.dep.bin'`, error: exit status 1"
time="2020-07-20T08:22:14.914201302Z" level=warning msg="Running modprobe xt_conntrack failed with message: `modprobe: ERROR: ../libkmod/libkmod.c:556 kmod_search_moddep() could not open moddep file '/lib/modules/4.19.78-coreos/modules.dep.bin'`, error: exit status 1"
time="2020-07-20T08:22:14.989456423Z" level=warning msg="Could not load necessary modules for IPSEC rules: Running modprobe xfrm_user failed with message: `modprobe: ERROR: ../libkmod/libkmod.c:556 kmod_search_moddep() could not open moddep file '/lib/modules/4.19.78-coreos/modules.dep.bin'`, error: exit status 1"
time="2020-07-20T08:22:14.990108153Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.18.0.0/16. Daemon option --bip can be used to set a preferred IP address"
time="2020-07-20T08:22:15.029286773Z" level=info msg="Loading containers: done."
time="2020-07-20T08:22:15.029664106Z" level=info msg="Daemon has completed initialization"
time="2020-07-20T08:22:15.029823541Z" level=info msg="Docker daemon" commit=23cf638 graphdriver=overlay2 version=1.12.1
time="2020-07-20T08:22:15.048665494Z" level=info msg="API listen on /var/run/docker.sock"
time="2020-07-20T08:22:15.049046558Z" level=info msg="API listen on [::]:7070"
# The job hangs at this point and finally fails after a couple of seconds
gitlab-ci.yml
cache:
  key: '$CI_COMMIT_REF_NAME node:14.5.0-alpine'
  paths:
    - node_modules/

stages:
  - release
  - deploy

variables:
  TAGGED_IMAGE: '$CI_REGISTRY_IMAGE:latest'

.release:
  stage: release
  image: docker:19.03.12
  services:
    - docker:dind
  variables:
    DOCKER_DRIVER: overlay2
    DOCKER_BUILDKIT: 1
  before_script:
    - docker version
    - docker info
    - echo "$CI_JOB_TOKEN" | docker login --username $CI_REGISTRY_USER --password-stdin $CI_REGISTRY
  script:
    - docker build --pull --tag $TAGGED_IMAGE --cache-from $TAGGED_IMAGE --build-arg NODE_ENV=$CI_ENVIRONMENT_NAME .
    - docker push $TAGGED_IMAGE
  after_script:
    - docker logout $CI_REGISTRY

.deploy:
  stage: deploy
  image: gitlab/dind:latest
  services:
    - docker:dind
  variables:
    DOCKER_COMPOSE_PATH: '~/docker-composes/$CI_PROJECT_PATH/docker-compose.yml'
  before_script:
    - 'which ssh-agent || ( apt-get update -y && apt-get install openssh-client -y )'
    - eval $(ssh-agent -s)
    - echo "$DEPLOY_SERVER_PRIVATE_KEY" | tr -d '\r' | ssh-add -
    - mkdir -p ~/.ssh
    - chmod 700 ~/.ssh
    - ssh-keyscan $DEPLOYMENT_SERVER_IP >> ~/.ssh/known_hosts
    - chmod 644 ~/.ssh/known_hosts
  script:
    - rsync -avR --rsync-path="mkdir -p ~/docker-composes/$CI_PROJECT_PATH/; rsync" ./docker-compose.yml root@$DEPLOYMENT_SERVER_IP:~/docker-composes/$CI_PROJECT_PATH/
    - ssh root@$DEPLOYMENT_SERVER_IP "echo "$CI_REGISTRY_PASSWORD" | docker login --username $CI_REGISTRY_USER --password-stdin $CI_REGISTRY; docker-compose -f $DOCKER_COMPOSE_PATH rm -f -s -v $CI_COMMIT_REF_NAME; docker pull $TAGGED_IMAGE; docker-compose -f $DOCKER_COMPOSE_PATH up -d $CI_COMMIT_REF_NAME;"

release_stage:
  extends: .release
  only:
    - stage
  environment:
    name: staging

deploy_stage:
  extends: .deploy
  only:
    - stage
  environment:
    name: staging
Dockerfile
ARG NODE_ENV
FROM node:14.5.0-alpine
ARG NODE_ENV
ENV NODE_ENV ${NODE_ENV}
# Set working directory
WORKDIR /var/www/
# Install app dependencies
COPY package.json package-lock.json ./
RUN npm ci --silent --only=production
COPY . ./
# Start the application
CMD [ "npm", "run", "start" ]
docker-compose.yml
version: '3.8'

services:
  redis-stage:
    container_name: redis-stage
    image: redis:6.0.5-alpine
    ports:
      - '7075:6379'
    restart: always
    networks:
      - my-proxy-net

  stage:
    container_name: xxx-backend-stage
    image: registry.gitlab.com/xxx.us/backend:latest
    build: .
    expose:
      - '7070'
    restart: always
    networks:
      - my-proxy-net
    depends_on:
      - redis-stage
    environment:
      VIRTUAL_HOST: backend.xxx.us
      VIRTUAL_PROTO: https
      LETSENCRYPT_HOST: backend.xxx.us

networks:
  my-proxy-net:
    external:
      name: mynetwork
Update 1
I got a warning on the page saying I have used over 30% of my shared runner minutes. Maybe the problem is that I am running out of minutes.
Update 2
The release stage completes successfully.
Update 3
Before I ran into this problem, I had deployed successfully once. I decided to re-run that commit to see if it still succeeds, but it fails as well!
I fixed the issue. In my case, it was the PORT (definitely) and HOST (possibly) environment variables that I had defined manually in the GitLab CI/CD Variables section. It seems PORT, and maybe HOST, are reserved environment variable names for GitLab and/or Docker.
By the way, I couldn't find anything in the docs that says not to use those variable names.
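For anyone hitting the same symptom, here is a minimal sketch of the workaround, assuming the Node app reads PORT and HOST from its environment: remove them from the CI/CD Variables UI and set them directly on the service that needs them in docker-compose, so they never leak into the runner's or dind's environment. The concrete values (7070 and 0.0.0.0) simply mirror the port this service already exposes and are assumptions, not something GitLab requires.
# Sketch: set PORT/HOST only inside the container, not as CI/CD variables.
# The values are assumptions; adjust them to whatever your app expects.
version: '3.8'
services:
  stage:
    image: registry.gitlab.com/xxx.us/backend:latest
    expose:
      - '7070'
    environment:
      PORT: '7070'
      HOST: '0.0.0.0'
This keeps the application configuration in the compose file that is rsynced to the deployment server, while the job environment only carries GitLab's own predefined variables.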

Monitoring Cassandra with the Prometheus monitoring tool

My Prometheus server is on a CentOS 7 machine and Cassandra is on CentOS 6. I am trying to monitor Cassandra's JMX port 7199 with Prometheus, but I keep getting errors with my YAML file. I am not sure why I am not able to connect to the CentOS 6 (Cassandra) machine. Is my YAML file wrong, or does it have something to do with JMX port 7199?
Here is my YAML file:
# my global config
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: cassandra
    static_configs:
      - targets: ['10.1.0.22:7199']
Here is my prometheus log:
level=info ts=2017-12-08T04:30:53.92549611Z caller=main.go:215 msg="Starting Prometheus" version="(version=2.0.0, branch=HEAD, revision=0a74f98628a0463dddc90528220c94de5032d1a0)"
level=info ts=2017-12-08T04:30:53.925623847Z caller=main.go:216 build_context="(go=go1.9.2, user=root#615b82cb36b6, date=20171108-07:11:59)"
level=info ts=2017-12-08T04:30:53.92566228Z caller=main.go:217 host_details="(Linux 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 20:32:50 UTC 2017 x86_64 localhost.localdomain (none))"
level=info ts=2017-12-08T04:30:53.932807536Z caller=web.go:380 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2017-12-08T04:30:53.93303681Z caller=targetmanager.go:71 component="target manager" msg="Starting target manager..."
level=info ts=2017-12-08T04:30:53.932905473Z caller=main.go:314 msg="Starting TSDB"
level=info ts=2017-12-08T04:30:53.987468942Z caller=main.go:326 msg="TSDB started"
level=info ts=2017-12-08T04:30:53.987582063Z caller=main.go:394 msg="Loading configuration file" filename=prometheus.yml
level=info ts=2017-12-08T04:30:53.988366778Z caller=main.go:371 msg="Server is ready to receive requests."
level=warn ts=2017-12-08T04:31:00.561007282Z caller=main.go:377 msg="Received SIGTERM, exiting gracefully..."
level=info ts=2017-12-08T04:31:00.563191668Z caller=main.go:384 msg="See you next time!"
level=info ts=2017-12-08T04:31:00.566231211Z caller=targetmanager.go:87 component="target manager" msg="Stopping target manager..."
level=info ts=2017-12-08T04:31:00.567070099Z caller=targetmanager.go:99 component="target manager" msg="Target manager stopped"
level=info ts=2017-12-08T04:31:00.567136027Z caller=manager.go:455 component="rule manager" msg="Stopping rule manager..."
level=info ts=2017-12-08T04:31:00.567162215Z caller=manager.go:461 component="rule manager" msg="Rule manager stopped"
level=info ts=2017-12-08T04:31:00.567186356Z caller=notifier.go:483 component=notifier msg="Stopping notification handler..."
If anyone has instructions on how to connect Prometheus to Cassandra when they are on two different machines, that would be helpful too.
This is not a problem with your config; Prometheus received a TERM signal and terminated gracefully.
If you are not getting metrics, check whether 10.1.0.22:7199/metrics loads and returns metrics. You can also check the Prometheus server's /targets endpoint for scraping status.
If you are not getting anything on your Cassandra server's /metrics endpoint, it could be because you have not configured the Cassandra Prometheus exporter properly.
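As a concrete sketch of that last point, with an assumed exporter port of 7070 and assumed file paths: Cassandra's JMX port 7199 does not serve Prometheus' text format, so the usual approach is to attach the Prometheus JMX exporter to Cassandra as a Java agent and point the scrape config at the exporter's HTTP port instead of 7199.
# On the Cassandra host (CentOS 6), e.g. in cassandra-env.sh.
# The jar path, exporter config path, and port 7070 are assumptions.
JVM_OPTS="$JVM_OPTS -javaagent:/opt/jmx_prometheus_javaagent.jar=7070:/opt/cassandra-jmx-exporter.yml"

# prometheus.yml on the CentOS 7 host - scrape the exporter, not 7199
scrape_configs:
  - job_name: cassandra
    static_configs:
      - targets: ['10.1.0.22:7070']
You can then verify end to end with something like curl http://10.1.0.22:7070/metrics from the Prometheus machine (make sure the CentOS 6 firewall allows that port) and check the target's state under Status -> Targets on the Prometheus web UI at port 9090.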

Resources