OpenEBS target pod is not able to communicate with its replicas after deleting one of the worker nodes from the cluster - openebs
I'm having a problem with an OpenEBS data store. The setup has 3 OpenEBS storage replicas on 3 different VMs.
Initially the workload pod (PostgreSQL) went into read-only mode, so I deleted first the worker node and, after the volume didn't recover, the OpenEBS ctrl pod as well. Now the ctrl pod seems unable to reconnect to all 3 replicas and keeps showing the message:
level=warning msg="No of yet to be registered replicas are less than 3 , No of registered replicas: 1"
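To double-check what the controller actually sees, I pulled the registered-replica count out of that warning. This is just a sketch: the warning line is inlined here for illustration, and in the cluster you would feed it from the ctrl pod's logs instead (the pod name in the comment is the one from my cluster):

```shell
# Sketch: extract the registered-replica count from the ctrl pod's warning.
# In the cluster this would come from the pod logs, e.g.:
#   kubectl logs pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf | tail -n 50
# Here the warning line is inlined for illustration.
line='level=warning msg="No of yet to be registered replicas are less than 3 , No of registered replicas: 1"'
registered=$(printf '%s\n' "$line" | sed -n 's/.*No of registered replicas: \([0-9][0-9]*\).*/\1/p')
echo "registered replicas: $registered (expected: 3)"
```

Watching this number climb back to 3 is a quick way to tell whether the replicas are re-registering.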
The replica that seems to have managed to connect keeps logging repeatedly:
time="2019-01-22T08:04:12Z" level=info msg="Get Volume info from controller"
time="2019-01-22T08:04:12Z" level=info msg="Register replica at controller"
Target (ctrl) pod logs:
"2019-01-22T06:55:46.064Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:46Z"" level=error msg=""Mode: ReadOnly"""
"2019-01-22T06:55:46.065Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:46Z"" level=error msg=""Mode: ReadOnly"""
"2019-01-22T06:55:48.076Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:48Z"" level=error msg=""Mode: ReadOnly"""
"2019-01-22T06:55:48.075Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:48Z"" level=error msg=""Mode: ReadOnly"""
"2019-01-22T06:55:50.085Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:50Z"" level=error msg=""Mode: ReadOnly"""
"2019-01-22T06:55:49.083Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:49Z"" level=error msg=""Mode: ReadOnly"""
"2019-01-22T06:55:50.086Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:50Z"" level=warning msg=busy"
"2019-01-22T06:55:50.085Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:50Z"" level=error msg=""Mode: ReadOnly"""
"2019-01-22T06:55:49.084Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:49Z"" level=error msg=""Mode: ReadOnly"""
"2019-01-22T06:55:53.105Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:53Z"" level=warning msg=busy"
"2019-01-22T06:55:53.104Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:53Z"" level=error msg=""Mode: ReadOnly"""
"2019-01-22T06:55:55.117Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:55Z"" level=error msg=""Mode: ReadOnly"""
"2019-01-22T06:55:54.107Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:54Z"" level=error msg=""Mode: ReadOnly"""
"2019-01-22T06:55:54.107Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:54Z"" level=error msg=""Mode: ReadOnly"""
Replica pod which is not yet connected:
"2019-01-22T06:56:24.117Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:24Z"" level=info msg=""Done running ssync [ssync -host 10.233.91.202 -timeout 7 -port 9700 volume-head-010.img.meta]"""
"2019-01-22T06:56:24.866Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:24Z"" level=info msg=""source file size: 12884901888, setting up directIo: true"""
"2019-01-22T06:56:11.390Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:11Z"" level=info msg=""Get Volume Usage"""
"2019-01-22T06:56:23.881Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:23Z"" level=info msg=""Snapshotting [d82c79af-06fd-4bc4-bd67-c54fa636e596] volume, user created false, created time 2019-01-22T06:56:23Z"""
"2019-01-22T06:56:23.924Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","10.233.96.147 - - [22/Jan/2019:06:56:23 +0000] ""POST /v1/replicas/1?action=snapshot HTTP/1.1"" 200 14804"
"2019-01-22T06:56:24.049Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:24Z"" level=info msg=""Running ssync [ssync -host 10.233.91.202 -timeout 7 -port 9700 volume-head-010.img.meta]"""
"2019-01-22T06:56:24.828Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:24Z"" level=info msg=""Running ssync [ssync -host 10.233.91.202 -timeout 7 -port 9701 volume-snap-6b38fe32-98ab-4f95-8b2d-05ba9aebfe0e.img]"""
"2019-01-22T06:56:24.885Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:24Z"" level=info msg=""The file is a hole: [ 0: 3145728](3145728)"""
"2019-01-22T06:56:24.886Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:24Z"" level=info msg=""Ssync client: exit code 0"""
"2019-01-22T06:56:23.872Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:23Z"" level=info msg=""GetReplica for id 1"""
"2019-01-22T06:56:24.019Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:24Z"" level=info msg=GetReplica"
"2019-01-22T06:56:24.019Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:24Z"" level=info msg=""GetReplica for id 1"""
"2019-01-22T06:56:24.886Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:24Z"" level=info msg=""Done running ssync [ssync -host 10.233.91.202 -timeout 7 -port 9701 volume-snap-6b38fe32-98ab-4f95-8b2d-05ba9aebfe0e.img]"""
"2019-01-22T06:56:25.607Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:25Z"" level=info msg=""source file size: 112, setting up directIo: false"""
"2019-01-22T06:56:25.614Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:25Z"" level=warning msg=""Failed to open server: 10.233.91.202:9702, Retrying..."""
"2019-01-22T06:56:26.628Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:26Z"" level=info msg=""Ssync client: exit code 0"""
"2019-01-22T06:56:28.353Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:28Z"" level=info msg=""Running ssync [ssync -host 10.233.91.202 -timeout 7 -port 9703 volume-snap-a03e749d-31ac-4375-9559-14fb141fc3d7.img]"""
"2019-01-22T06:56:28.419Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:28Z"" level=info msg=""source file size: 12884901888, setting up directIo: true"""
"2019-01-22T06:56:28.428Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:28Z"" level=info msg=""The file is a hole: [ 0: 3145728](3145728)"""
"2019-01-22T06:56:28.431Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:28Z"" level=info msg=""Ssync client: exit code 0"""
"2019-01-22T06:56:29.121Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:29Z"" level=info msg=""Running ssync [ssync -host 10.233.91.202 -timeout 7 -port 9704 volume-snap-a03e749d-31ac-4375-9559-14fb141fc3d7.img.meta]"""
"2019-01-22T06:56:29.900Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:29Z"" level=info msg=""Syncing volume-snap-f8771212-06d3-400b-ad12-c063ef8ed827.img to 10.233.91.202:9705...\n"""
"2019-01-22T06:56:29.900Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:29Z"" level=info msg=""source file size: 12884901888, setting up directIo: true"""
"2019-01-22T06:56:29.904Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:29Z"" level=warning msg=""Failed to open server: 10.233.91.202:9705, Retrying..."""
"2019-01-22T06:56:25.607Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:25Z"" level=info msg=""Syncing volume-snap-6b38fe32-98ab-4f95-8b2d-05ba9aebfe0e.img.meta to 10.233.91.202:9702...\n"""
"2019-01-22T06:56:25.584Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:25Z"" level=info msg=""Running ssync [ssync -host 10.233.91.202 -timeout 7 -port 9702 volume-snap-6b38fe32-98ab-4f95-8b2d-05ba9aebfe0e.img.meta]"""
"2019-01-22T06:56:28.419Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:28Z"" level=info msg=""Syncing volume-snap-a03e749d-31ac-4375-9559-14fb141fc3d7.img to 10.233.91.202:9703...\n"""
"2019-01-22T06:56:29.215Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:29Z"" level=info msg=""Done running ssync [ssync -host 10.233.91.202 -timeout 7 -port 9704 volume-snap-a03e749d-31ac-4375-9559-14fb141fc3d7.img.meta]"""
"2019-01-22T06:56:29.880Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:29Z"" level=info msg=""Running ssync [ssync -host 10.233.91.202 -timeout 7 -port 9705 volume-snap-f8771212-06d3-400b-ad12-c063ef8ed827.img]"""
"2019-01-22T06:56:28.434Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:28Z"" level=info msg=""Done running ssync [ssync -host 10.233.91.202 -timeout 7 -port 9703 volume-snap-a03e749d-31ac-4375-9559-14fb141fc3d7.img]"""
"2019-01-22T06:56:29.211Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:29Z"" level=info msg=""Ssync client: exit code 0"""
"2019-01-22T06:56:29.183Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:29Z"" level=info msg=""source file size: 164, setting up directIo: false"""
"2019-01-22T06:56:29.905Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:29Z"" level=warning msg=""Failed to open server: 10.233.91.202:9705, Retrying..."""
"2019-01-22T06:56:41.391Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:41Z"" level=info msg=GetUsage"
"2019-01-22T06:56:41.392Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","10.233.96.147 - - [22/Jan/2019:06:56:41 +0000] ""GET /v1/replicas/1/volusage HTTP/1.1"" 200 200"
"2019-01-22T06:56:41.390Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:41Z"" level=info msg=""Get Volume Usage"""
"2019-01-22T06:59:11.392Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:59:11Z"" level=info msg=GetUsage"
"2019-01-22T06:59:11.392Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:59:11Z"" level=info msg=""Get Volume Usage"""
"2019-01-22T07:00:38.050Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T07:00:38Z"" level=error msg=""Received EOF: EOF"""
"2019-01-22T07:00:38.050Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T07:00:38Z"" level=info msg=""Restart AutoConfigure Process"""
"2019-01-22T07:00:43.232Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","10.233.91.234 - - [22/Jan/2019:07:00:43 +0000] ""POST /v1/replicas/1?action=start HTTP/1.1"" 200 1091"
"2019-01-22T07:00:43.238Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T07:00:43Z"" level=info msg=""GetReplica for id 1"""
"2019-01-22T07:00:43.409Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T07:00:43Z"" level=info msg=""GetReplica for id 1"""
"2019-01-22T07:00:43.465Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T07:00:43Z"" level=info msg=GetReplica"
"2019-01-22T07:00:43.239Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T07:00:43Z"" level=info msg=""Got signal: 'open', proceed to open replica"""
"2019-01-22T07:00:43.585Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","10.233.91.234 - - [22/Jan/2019:07:00:43 +0000] ""POST /v1/replicas/1?action=snapshot HTTP/1.1"" 200 15190"
"2019-01-22T07:00:43.666Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T07:00:43Z"" level=info msg=GetReplica"
After going through the logs, I can see that the replicas were registered with the controller, but one of the replicas is being synced from another healthy replica, which might take some time.
After a while, the target pod warning
level=warning msg="No of yet to be registered replicas are less than 3 , No of registered replicas: 1"
no longer shows up, so I think the volume is recovering now. The volume holds about 12 GiB of data.
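For anyone hitting the same thing, here is a back-of-the-envelope estimate of how long a full resync of this volume could take. The throughput figure is purely an assumption; measure your own from how quickly the ssync transfers in the replica logs progress:

```shell
# Rough sketch: worst-case full-rebuild duration for the volume.
size_gib=12            # data size from the volume above (12 GiB)
throughput_mib_s=50    # ASSUMED effective ssync throughput; measure yours
seconds=$(( size_gib * 1024 / throughput_mib_s ))
minutes=$(( seconds / 60 ))
echo "worst-case full resync: ~${seconds}s (~${minutes} min)"
```

In practice the rebuild is usually faster than this, since ssync detects holes (as in the "The file is a hole" log lines above) and only transfers allocated blocks.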
Related
Grafana UI not showing up (via docker-compose)
I've been facing some issues with grafana and docker. I'm trying to get a grafana and influxdb up, for monitoring my MISP instance. The project which I'm following is: https://github.com/MISP/misp-grafana The InfluxDB docker is up and receiving logs via telegraf... seems to be working fine. But the Grafana Dashboards does not show up via https://ip:3000/login. From inside of the machine where MISP is running(the docker containers are also running on this machine) i can "CURL" the address and receive the HTML from Grafana's login page. Could someone help me? Have already tried lots of suggestions, but none of them work as expected. I've already tried to disable Iptables and firewalld(since i'm using an CentOS7), and nothing helped. My docker-compose file is: services: influxdb: image: influxdb:latest container_name: influxdb volumes: - influxdb-storage:/var/lib/influxdb2:rw # - ./influxdb/ssl/influxdb-selfsigned.crt:/etc/ssl/influxdb-selfsigned.crt:rw # - ./influxdb/ssl/influxdb-selfsigned.key:/etc/ssl/influxdb-selfsigned.key:rw ports: - "8086:8086" environment: - DOCKER_INFLUXDB_INIT_MODE=${DOCKER_INFLUXDB_INIT_MODE} - DOCKER_INFLUXDB_INIT_USERNAME=${DOCKER_INFLUXDB_INIT_USERNAME} - DOCKER_INFLUXDB_INIT_PASSWORD=${DOCKER_INFLUXDB_INIT_PASSWORD} - DOCKER_INFLUXDB_INIT_ORG=${DOCKER_INFLUXDB_INIT_ORG} - DOCKER_INFLUXDB_INIT_BUCKET=${DOCKER_INFLUXDB_INIT_BUCKET} - DOCKER_INFLUXDB_INIT_ADMIN_TOKEN=${DOCKER_INFLUXDB_INIT_ADMIN_TOKEN} # - INFLUXD_TLS_CERT=/etc/ssl/influxdb-selfsigned.crt # - INFLUXD_TLS_KEY=/etc/ssl/influxdb-selfsigned.key grafana: image: grafana/grafana:latest container_name: grafana ports: - "3000:3000" volumes: - grafana-storage:/var/lib/grafana - ./grafana/provisioning:/etc/grafana/provisioning depends_on: - influxdb environment: - GF_SECURITY_ADMIN_USER=${DOCKER_GRAFANA_USERNAME} - GF_SECURITY_ADMIN_PASSWORD=${DOCKER_GRAFANA_PASSWORD} network_mode: host volumes: influxdb-storage: grafana-storage: The logs from Grafana container are: 
GF_PATHS_CONFIG='/etc/grafana/grafana.ini' is not readable. GF_PATHS_DATA='/var/lib/grafana' is not writable. GF_PATHS_HOME='/usr/share/grafana' is not readable. You may have issues with file permissions, more information here: http://docs.grafana.org/installation/docker/#migrate-to-v51-or-later logger=settings t=2023-01-18T11:23:18.545281441Z level=info msg="Starting Grafana" version=9.3.2 commit=21c1d14e91 branch=HEAD compiled=2022-12-14T10:40:18Z logger=settings t=2023-01-18T11:23:18.545574106Z level=info msg="Config loaded from" file=/usr/share/grafana/conf/defaults.ini logger=settings t=2023-01-18T11:23:18.545595127Z level=info msg="Config loaded from" file=/etc/grafana/grafana.ini logger=settings t=2023-01-18T11:23:18.545601849Z level=info msg="Config overridden from command line" arg="default.paths.data=/var/lib/grafana" logger=settings t=2023-01-18T11:23:18.545610343Z level=info msg="Config overridden from command line" arg="default.paths.logs=/var/log/grafana" logger=settings t=2023-01-18T11:23:18.54561656Z level=info msg="Config overridden from command line" arg="default.paths.plugins=/var/lib/grafana/plugins" logger=settings t=2023-01-18T11:23:18.545623137Z level=info msg="Config overridden from command line" arg="default.paths.provisioning=/etc/grafana/provisioning" logger=settings t=2023-01-18T11:23:18.5456313Z level=info msg="Config overridden from command line" arg="default.log.mode=console" logger=settings t=2023-01-18T11:23:18.545637996Z level=info msg="Config overridden from Environment variable" var="GF_PATHS_DATA=/var/lib/grafana" logger=settings t=2023-01-18T11:23:18.545648448Z level=info msg="Config overridden from Environment variable" var="GF_PATHS_LOGS=/var/log/grafana" logger=settings t=2023-01-18T11:23:18.545654176Z level=info msg="Config overridden from Environment variable" var="GF_PATHS_PLUGINS=/var/lib/grafana/plugins" logger=settings t=2023-01-18T11:23:18.545663184Z level=info msg="Config overridden from Environment variable" 
var="GF_PATHS_PROVISIONING=/etc/grafana/provisioning" logger=settings t=2023-01-18T11:23:18.545668879Z level=info msg="Config overridden from Environment variable" var="GF_SECURITY_ADMIN_USER=tsec" logger=settings t=2023-01-18T11:23:18.545682275Z level=info msg="Config overridden from Environment variable" var="GF_SECURITY_ADMIN_PASSWORD=*********" logger=settings t=2023-01-18T11:23:18.545689113Z level=info msg="Path Home" path=/usr/share/grafana logger=settings t=2023-01-18T11:23:18.545699682Z level=info msg="Path Data" path=/var/lib/grafana logger=settings t=2023-01-18T11:23:18.545705402Z level=info msg="Path Logs" path=/var/log/grafana logger=settings t=2023-01-18T11:23:18.545710714Z level=info msg="Path Plugins" path=/var/lib/grafana/plugins logger=settings t=2023-01-18T11:23:18.545732177Z level=info msg="Path Provisioning" path=/etc/grafana/provisioning logger=settings t=2023-01-18T11:23:18.5457395Z level=info msg="App mode production" logger=sqlstore t=2023-01-18T11:23:18.545859098Z level=info msg="Connecting to DB" dbtype=sqlite3 logger=migrator t=2023-01-18T11:23:18.575806909Z level=info msg="Starting DB migrations" logger=migrator t=2023-01-18T11:23:18.584646143Z level=info msg="migrations completed" performed=0 skipped=464 duration=1.036135ms logger=plugin.loader t=2023-01-18T11:23:18.694560017Z level=info msg="Plugin registered" pluginID=input logger=secrets t=2023-01-18T11:23:18.695056176Z level=info msg="Envelope encryption state" enabled=true currentprovider=secretKey.v1 logger=query_data t=2023-01-18T11:23:18.698004003Z level=info msg="Query Service initialization" logger=live.push_http t=2023-01-18T11:23:18.709944098Z level=info msg="Live Push Gateway initialization" logger=infra.usagestats.collector t=2023-01-18T11:23:19.076511711Z level=info msg="registering usage stat providers" usageStatsProvidersLen=2 logger=provisioning.plugins t=2023-01-18T11:23:19.133661231Z level=error msg="Failed to read plugin provisioning files from directory" 
path=/etc/grafana/provisioning/plugins error="open /etc/grafana/provisioning/plugins: no such file or directory" logger=provisioning.notifiers t=2023-01-18T11:23:19.133823449Z level=error msg="Can't read alert notification provisioning files from directory" path=/etc/grafana/provisioning/notifiers error="open /etc/grafana/provisioning/notifiers: no such file or directory" logger=provisioning.alerting t=2023-01-18T11:23:19.133926705Z level=error msg="can't read alerting provisioning files from directory" path=/etc/grafana/provisioning/alerting error="open /etc/grafana/provisioning/alerting: no such file or directory" logger=provisioning.alerting t=2023-01-18T11:23:19.133951102Z level=info msg="starting to provision alerting" logger=provisioning.alerting t=2023-01-18T11:23:19.133992848Z level=info msg="finished to provision alerting" logger=ngalert.state.manager t=2023-01-18T11:23:19.134747843Z level=info msg="Warming state cache for startup" logger=grafanaStorageLogger t=2023-01-18T11:23:19.140618617Z level=info msg="storage starting" logger=http.server t=2023-01-18T11:23:19.140693638Z level=info msg="HTTP Server Listen" address=[::]:3000 protocol=http subUrl= socket= logger=ngalert.state.manager t=2023-01-18T11:23:19.16651119Z level=info msg="State cache has been initialized" states=0 duration=31.757492ms logger=ticker t=2023-01-18T11:23:19.166633607Z level=info msg=starting first_tick=2023-01-18T11:23:20Z logger=ngalert.multiorg.alertmanager t=2023-01-18T11:23:19.166666209Z level=info msg="starting MultiOrg Alertmanager" logger=context userId=0 orgId=0 uname= t=2023-01-18T11:23:27.6249068Z level=info msg="Request Completed" method=GET path=/ status=302 remote_addr=1.134.26.244 time_ms=0 duration=610.399µs size=29 referer= handler=/ logger=cleanup t=2023-01-18T11:33:19.14617771Z level=info msg="Completed cleanup jobs" duration=7.595219ms The curl http://ip:3000/login command from inside the CentOS7 machine answer (just a part of it, since it's big): <!doctype 
html><html lang="en"><head><meta charset="utf-8"/><meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"/><meta name="viewport" content="width=device-width"/><meta name="theme-color" content="#000"/><title>Grafana</title><base href="/"/><link rel="preload" href="public/fonts/roboto/RxZJdnzeo3R5zSexge8UUVtXRa8TVwTICgirnJhmVJw.woff2" as="font" crossorigin/><link rel="icon" type="image/png" href="public/img/fav32.png"/><link rel="apple-touch-icon" sizes="180x180" href="public/img/apple-touch-icon.png"/><link rel="mask-icon" href="public/img/grafana_mask_icon.svg" color="#F05A28"/><link rel="stylesheet" href="public/build/grafana.dark.960bbecc684cac29c4a2.css"/><script nonce="">performance.mark('frontend_boot_css_time_seconds');</script><meta name="apple-mobile-web-app-capable" content="yes"/><meta name="apple-mobile-web-app-status-bar-style" content="black"/><meta name="msapplication-TileColor" content="#2b5797"/><meta name="msapplication-config" content="public/img/browserconfig.xml"/></head><body class="theme-dark app-grafana"><style>.preloader { height: 100%; flex-direction: column; display: flex; justify-content: center; align-items: center; } ... The telnet ip 3000 command from another machine gives me and error. The `netstat -naput | grep LISTEN" on the CentOS7 machine: tcp6 0 0 :::8086 :::* LISTEN 30124/docker-proxy- tcp6 0 0 :::3000 :::* LISTEN 30241/grafana-serve I've already tried to change de 3000 port to another one (avoiding firewall blocks) but it did not work. Help me please.....
Prometheus Execution Timeout Exceeded
I use Grafana to monitor my company's infrastructure. Everything worked fine until this week, I started to see alerts on Grafana with an error message : request handler error: Post "http://prometheus-ip:9090/api/v1/query_range": dial tcp prometheus-ip:9090: i/o timeout I tried to restart the prometheus server but it seems that it can't be stopped. I have to kill -9 the server and restart it. Here's the log : Jun 16 01:04:01 prometheus prometheus[18869]: time="2022-06-16T01:04:01+02:00" level=info msg="All requests for rebuilding the label indexes queued. (Actual processing may lag behind.)" source="crashrecovery.go:529" Jun 16 01:04:01 prometheus prometheus[18869]: time="2022-06-16T01:04:01+02:00" level=info msg="Checkpointing fingerprint mappings..." source="persistence.go:1480" Jun 16 01:04:02 prometheus prometheus[18869]: time="2022-06-16T01:04:02+02:00" level=info msg="Done checkpointing fingerprint mappings in 286.224481ms." source="persistence.go:1503" Jun 16 01:04:02 prometheus prometheus[18869]: time="2022-06-16T01:04:02+02:00" level=warning msg="Crash recovery complete." source="crashrecovery.go:152" Jun 16 01:04:02 prometheus prometheus[18869]: time="2022-06-16T01:04:02+02:00" level=info msg="362306 series loaded." source="storage.go:378" Jun 16 01:04:02 prometheus prometheus[18869]: time="2022-06-16T01:04:02+02:00" level=info msg="Starting target manager..." source="targetmanager.go:61" Jun 16 01:04:02 prometheus prometheus[18869]: time="2022-06-16T01:04:02+02:00" level=info msg="Listening on :9090" source="web.go:235" Jun 16 01:04:15 prometheus prometheus[18869]: time="2022-06-16T01:04:15+02:00" level=warning msg="Storage has entered rushed mode." chunksToPersist=420483 maxChunksToPersist=524288 maxMemoryChunks=1048576 memoryChunks=655877 source="storage.go:1660" urgencyScore=0.8020076751708984 Jun 16 01:09:02 prometheus prometheus[18869]: time="2022-06-16T01:09:02+02:00" level=info msg="Checkpointing in-memory metrics and chunks..." 
source="persistence.go:612" Jun 16 01:10:05 prometheus prometheus[18869]: time="2022-06-16T01:10:05+02:00" level=info msg="Done checkpointing in-memory metrics and chunks in 1m3.127365726s." source="persistence.go:639" Jun 16 01:12:25 prometheus prometheus[18869]: time="2022-06-16T01:12:25+02:00" level=warning msg="Received SIGTERM, exiting gracefully..." source="main.go:230" Jun 16 01:12:25 prometheus prometheus[18869]: time="2022-06-16T01:12:25+02:00" level=info msg="See you next time!" source="main.go:237" Jun 16 01:12:25 prometheus prometheus[18869]: time="2022-06-16T01:12:25+02:00" level=info msg="Stopping target manager..." source="targetmanager.go:75" Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Stopping rule manager..." source="manager.go:374" Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Rule manager stopped." source="manager.go:380" Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Stopping notification handler..." source="notifier.go:369" Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Stopping local storage..." source="storage.go:396" Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Stopping maintenance loop..." source="storage.go:398" Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Maintenance loop stopped." source="storage.go:1259" Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Stopping series quarantining..." source="storage.go:402" Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Series quarantining stopped." source="storage.go:1701" Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Stopping chunk eviction..." 
source="storage.go:406" Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Chunk eviction stopped." source="storage.go:1079" Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Checkpointing in-memory metrics and chunks..." source="persistence.go:612" Jun 16 01:12:44 prometheus prometheus[18869]: time="2022-06-16T01:12:44+02:00" level=info msg="Done checkpointing in-memory metrics and chunks in 16.170119611s." source="persistence.go:639" Jun 16 01:12:44 prometheus prometheus[18869]: time="2022-06-16T01:12:44+02:00" level=info msg="Checkpointing fingerprint mappings..." source="persistence.go:1480" Jun 16 01:12:45 prometheus prometheus[18869]: time="2022-06-16T01:12:45+02:00" level=info msg="Done checkpointing fingerprint mappings in 651.409422ms." source="persistence.go:1503" Jun 16 01:12:45 prometheus systemd[1]: prometheus.service: State 'stop-sigterm' timed out. Skipping SIGKILL. Jun 16 01:13:06 prometheus systemd[1]: prometheus.service: State 'stop-final-sigterm' timed out. Skipping SIGKILL. Entering failed mode. Jun 16 01:13:06 prometheus systemd[1]: prometheus.service: Unit entered failed state. Jun 16 01:13:06 prometheus systemd[1]: prometheus.service: Failed with result 'timeout'. 
Jun 16 01:13:24 prometheus prometheus[20547]: time="2022-06-16T01:13:24+02:00" level=info msg="Starting prometheus (version=1.5.2+ds, branch=debian/sid, revision=1.5.2+ds-2+b3)" source="main.go:75" Jun 16 01:13:24 prometheus prometheus[20547]: time="2022-06-16T01:13:24+02:00" level=info msg="Build context (go=go1.7.4, user=pkg-go-maintainers#lists.alioth.debian.org, date=20170521-14:39:14)" source="main.go:76" Jun 16 01:13:24 prometheus prometheus[20547]: time="2022-06-16T01:13:24+02:00" level=info msg="Loading configuration file /etc/prometheus/prometheus.yml" source="main.go:248" Jun 16 01:13:24 prometheus prometheus[20547]: time="2022-06-16T01:13:24+02:00" level=error msg="Could not lock /path/to/prometheus/metrics/DIRTY, Prometheus already running?" source="persistence.go:198" Jun 16 01:13:24 prometheus prometheus[20547]: time="2022-06-16T01:13:24+02:00" level=error msg="Error opening memory series storage: resource temporarily unavailable" source="main.go:182" Jun 16 01:13:24 prometheus systemd[1]: prometheus.service: Main process exited, code=exited, status=1/FAILURE Jun 16 01:13:44 prometheus systemd[1]: prometheus.service: State 'stop-sigterm' timed out. Skipping SIGKILL. Jun 16 01:14:02 prometheus prometheus[18869]: time="2022-06-16T01:14:02+02:00" level=info msg="Local storage stopped." source="storage.go:421" Jun 16 01:14:02 prometheus systemd[1]: prometheus.service: Unit entered failed state. Jun 16 01:14:02 prometheus systemd[1]: prometheus.service: Failed with result 'exit-code'. Jun 16 01:14:03 prometheus systemd[1]: prometheus.service: Service hold-off time over, scheduling restart. 
Jun 16 01:14:03 prometheus prometheus[20564]: time="2022-06-16T01:14:03+02:00" level=info msg="Starting prometheus (version=1.5.2+ds, branch=debian/sid, revision=1.5.2+ds-2+b3)" source="main.go:75" Jun 16 01:14:03 prometheus prometheus[20564]: time="2022-06-16T01:14:03+02:00" level=info msg="Build context (go=go1.7.4, user=pkg-go-maintainers#lists.alioth.debian.org, date=20170521-14:39:14)" source="main.go:76" Jun 16 01:14:03 prometheus prometheus[20564]: time="2022-06-16T01:14:03+02:00" level=info msg="Loading configuration file /etc/prometheus/prometheus.yml" source="main.go:248" Jun 16 01:14:04 prometheus prometheus[20564]: time="2022-06-16T01:14:04+02:00" level=info msg="Loading series map and head chunks..." source="storage.go:373" Jun 16 01:14:08 prometheus prometheus[20564]: time="2022-06-16T01:14:08+02:00" level=info msg="364314 series loaded." source="storage.go:378" Jun 16 01:14:08 prometheus prometheus[20564]: time="2022-06-16T01:14:08+02:00" level=info msg="Starting target manager..." source="targetmanager.go:61" Jun 16 01:14:08 prometheus prometheus[20564]: time="2022-06-16T01:14:08+02:00" level=info msg="Listening on :9090" source="web.go:235" Jun 16 01:14:08 prometheus prometheus[20564]: time="2022-06-16T01:14:08+02:00" level=warning msg="Storage has entered rushed mode." chunksToPersist=448681 maxChunksToPersist=524288 maxMemoryChunks=1048576 memoryChunks=687476 source="storage.go:1660" urgencyScore=0.8557910919189453 When restarted like so, Prometheus enters Recovery Mode which takes 1h 30 min to complete. When it's done, the logs show the following : Jun 16 16:10:42 prometheus prometheus[32708]: time="2022-06-16T16:10:42+02:00" level=info msg="Storage does not need throttling anymore." chunksToPersist=524288 maxChunksToPersist=524288 maxToleratedMemChunks=1153433 memoryChunks=1049320 source="storage.go:935" Jun 16 16:10:42 prometheus prometheus[32708]: time="2022-06-16T16:10:42+02:00" level=error msg="Storage needs throttling. 
Scrapes and rule evaluations will be skipped." chunksToPersist=525451 maxChunksToPersist=524288 maxToleratedMemChunks=1153433 memoryChunks=1050483 source="storage.go:927" Jun 16 16:15:31 prometheus prometheus[32708]: time="2022-06-16T16:15:31+02:00" level=info msg="Checkpointing in-memory metrics and chunks..." source="persistence.go:612" Jun 16 16:16:28 prometheus prometheus[32708]: time="2022-06-16T16:16:28+02:00" level=info msg="Done checkpointing in-memory metrics and chunks in 57.204367083s." source="persistence.go:639" The checkpointing is repeating often and takes about 1 min. The monitoring for this server show the following : Here are the flags used : /usr/bin/prometheus --storage.local.path /path/to/prometheus/metrics --storage.local.retention=1460h0m0s --storage.local.series-file-shrink-ratio=0.3 Prometheus version : prometheus --version prometheus, version 1.5.2+ds (branch: debian/sid, revision: 1.5.2+ds-2+b3) build user: pkg-go-maintainers#lists.alioth.debian.org build date: 20170521-14:39:14 go version: go1.7.4 I decided to move some metrics on another server so this one is not as loaded as before. However, this server does have to scrape the metrics for 50+ other servers. What could be the cause of this ?
Prometheus is crash-looping when the pod is recreated
We are running Prometheus version 2.26.0 and Kubernetes version 1.21.7 in Azure. We mount the data on Azure NFS storage, and it was working fine. For the last few days the Prometheus container has been crash-looping; below are the logs:
level=info ts=2022-01-26T08:04:14.375Z caller=main.go:418 msg="Starting Prometheus" version="(version=2.26.0, branch=HEAD, revision=3cafc58827d1ebd1a67749f88be4218f0bab3d8d)"
level=info ts=2022-01-26T08:04:14.375Z caller=main.go:423 build_context="(go=go1.16.2, user=root@a67cafebe6d0, date=20210331-11:56:23)"
level=info ts=2022-01-26T08:04:14.375Z caller=main.go:424 host_details="(Linux 5.4.0-1065-azure #68~18.04.1-Ubuntu SMP Fri Dec 3 14:08:44 UTC 2021 x86_64 prometheus-6b9d9d54f4-nc45x (none))"
level=info ts=2022-01-26T08:04:14.375Z caller=main.go:425 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2022-01-26T08:04:14.375Z caller=main.go:426 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2022-01-26T08:04:14.503Z caller=web.go:540 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2022-01-26T08:04:14.507Z caller=main.go:795 msg="Starting TSDB ..."
level=info ts=2022-01-26T08:04:14.509Z caller=tls_config.go:191 component=web msg="TLS is disabled."
http2=false level=info ts=2022-01-26T08:04:14.560Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1641478251052 maxt=1641513600000 ulid=01FRSEHC4YHV3N26JY5AMNZFRW level=info ts=2022-01-26T08:04:14.593Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1641513600037 maxt=1641578400000 ulid=01FRVCAP2VJGDF0Z9CS24EXAJJ level=info ts=2022-01-26T08:04:14.624Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1641578400038 maxt=1641643200000 ulid=01FRXA4AQHMHAEYWRKQFGP075M level=info ts=2022-01-26T08:04:14.651Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1641643200422 maxt=1641708000000 ulid=01FRZ7XQQ4RA96DCPPBP22D71N level=info ts=2022-01-26T08:04:14.679Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1641708000020 maxt=1641772800000 ulid=01FS15QDG6BS7H6M6Y09HG3E12 level=info ts=2022-01-26T08:04:14.707Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1641772800011 maxt=1641837600000 ulid=01FS33GT38PRSB9VP56YFXT2M0 level=info ts=2022-01-26T08:04:14.736Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1641963555381 maxt=1641967200000 ulid=01FS6MRNZEWT1Z6P697K09KHD7 level=info ts=2022-01-26T08:04:14.763Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1641837600100 maxt=1641902400000 ulid=01FS6R88C70TCD8CYC4XJ95X23 level=info ts=2022-01-26T08:04:14.810Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1641967200019 maxt=1642032000000 ulid=01FS8WXQP3YJ7EXBVNYBQG4DVY level=info ts=2022-01-26T08:04:14.836Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642032000072 maxt=1642096800000 ulid=01FSATQBR4XBQRDM72ATFS9PQ2 level=info ts=2022-01-26T08:04:14.863Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642096800059 maxt=1642161600000 ulid=01FSCRHE2YBDX7GPRPSH6BNGRX level=info ts=2022-01-26T08:04:14.895Z caller=repair.go:57 component=tsdb msg="Found healthy block" 
mint=1642161600091 maxt=1642226400000 ulid=01FSEPB1GPGAANVCQ2VKW9BQ4G level=info ts=2022-01-26T08:04:14.948Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642226400026 maxt=1642291200000 ulid=01FSGM4J0G1D0A6H1GD3N9C372 level=info ts=2022-01-26T08:04:14.973Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642291200005 maxt=1642356000000 ulid=01FSJHY6W0FRYDHCXBVB5XPFYG level=info ts=2022-01-26T08:04:15.002Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642356000027 maxt=1642420800000 ulid=01FSMFR96DASV6YPN66W7C86H9 level=info ts=2022-01-26T08:04:15.077Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642420800042 maxt=1642485600000 ulid=01FSPDHGWRT65D8CKWQ2JPRHW3 level=info ts=2022-01-26T08:04:15.105Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642485600006 maxt=1642550400000 ulid=01FSRBAVP2MW71H08F32D6HGB4 level=info ts=2022-01-26T08:04:15.130Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642550400028 maxt=1642615200000 ulid=01FST9482FD0Z3PHXHNW2W616E level=info ts=2022-01-26T08:04:15.157Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642680000018 maxt=1642687200000 ulid=01FSW00TJKJ7CGCQ7JJS3XQK8G level=info ts=2022-01-26T08:04:15.187Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642687200018 maxt=1642694400000 ulid=01FSW6WHTSEAXHWV5J7PQP94X7 level=info ts=2022-01-26T08:04:15.213Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642615200021 maxt=1642680000000 ulid=01FSW6XYH2Y429PG5YRM0K45XS level=info ts=2022-01-26T08:04:15.275Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642694400018 maxt=1642701600000 ulid=01FSWDR92Y7H302NDZRX1V2PX9 level=info ts=2022-01-26T08:04:21.840Z caller=head.go:696 component=tsdb msg="Replaying on-disk memory mappable chunks if any" level=info ts=2022-01-26T08:04:22.623Z caller=head.go:710 component=tsdb msg="On-disk memory mappable 
chunks replay completed" duration=782.403397ms level=info ts=2022-01-26T08:04:22.623Z caller=head.go:716 component=tsdb msg="Replaying WAL, this may take a while" level=info ts=2022-01-26T08:04:34.169Z caller=head.go:742 component=tsdb msg="WAL checkpoint loaded" level=info ts=2022-01-26T08:04:38.895Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=299 maxSegment=7511 level=warn ts=2022-01-26T08:04:46.423Z caller=main.go:645 msg="Received SIGTERM, exiting gracefully..." level=info ts=2022-01-26T08:04:46.424Z caller=main.go:668 msg="Stopping scrape discovery manager..." level=info ts=2022-01-26T08:04:46.424Z caller=main.go:682 msg="Stopping notify discovery manager..." level=info ts=2022-01-26T08:04:46.424Z caller=main.go:704 msg="Stopping scrape manager..." level=info ts=2022-01-26T08:04:46.424Z caller=main.go:678 msg="Notify discovery manager stopped" level=info ts=2022-01-26T08:04:46.425Z caller=main.go:698 msg="Scrape manager stopped" level=info ts=2022-01-26T08:04:46.426Z caller=manager.go:934 component="rule manager" msg="Stopping rule manager..." level=info ts=2022-01-26T08:04:46.426Z caller=manager.go:944 component="rule manager" msg="Rule manager stopped" level=info ts=2022-01-26T08:04:46.426Z caller=notifier.go:601 component=notifier msg="Stopping notification manager..." 
level=info ts=2022-01-26T08:04:46.426Z caller=main.go:872 msg="Notifier manager stopped" level=info ts=2022-01-26T08:04:46.426Z caller=main.go:664 msg="Scrape discovery manager stopped" level=info ts=2022-01-26T08:04:46.792Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=300 maxSegment=7511 level=info ts=2022-01-26T08:04:46.870Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=301 maxSegment=7511 level=info ts=2022-01-26T08:04:46.901Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=302 maxSegment=7511 level=info ts=2022-01-26T08:04:46.946Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=303 maxSegment=7511 level=info ts=2022-01-26T08:04:46.974Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=304 maxSegment=7511 level=info ts=2022-01-26T08:04:47.008Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=305 maxSegment=7511 level=info ts=2022-01-26T08:04:47.034Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=306 maxSegment=7511 level=info ts=2022-01-26T08:04:47.067Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=307 maxSegment=7511 level=info ts=2022-01-26T08:04:47.098Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=308 maxSegment=7511 level=info ts=2022-01-26T08:04:47.124Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=309 maxSegment=7511 level=info ts=2022-01-26T08:04:47.158Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=310 maxSegment=7511 level=info ts=2022-01-26T08:04:47.203Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=311 maxSegment=7511 level=info ts=2022-01-26T08:04:47.254Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=312 maxSegment=7511 level=info ts=2022-01-26T08:04:47.486Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=313 maxSegment=7511 level=info ts=2022-01-26T08:04:47.511Z 
caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=314 maxSegment=7511 level=info ts=2022-01-26T08:04:47.539Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=315 maxSegment=7511 level=info ts=2022-01-26T08:04:47.564Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=316 maxSegment=7511 . . . . . . . . . level=info ts=2022-01-26T08:05:15.161Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=1401 maxSegment=7511 level=info ts=2022-01-26T08:05:15.182Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=1402 maxSegment=7511 level=info ts=2022-01-26T08:05:15.205Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=1403 maxSegment=7511 level=info ts=2022-01-26T08:05:15.229Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=1404 maxSegment=7511 level=info ts=2022-01-26T08:05:15.251Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=1405 maxSegment=7511 level=info ts=2022-01-26T08:05:15.274Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=1406 maxSegment=7511 level=info ts=2022-01-26T08:05:15.297Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=1407 maxSegment=7511 level=info ts=2022-01-26T08:05:15.323Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=1408 maxSegment=7511 level=info ts=2022-01-26T08:05:15.349Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=1409 maxSegment=7511 level=info ts=2022-01-26T08:05:15.372Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=1410 maxSegment=7511 level=info ts=2022-01-26T08:05:15.426Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=1411 maxSegment=7511 level=info ts=2022-01-26T08:05:15.452Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=1412 maxSegment=7511 level=info ts=2022-01-26T08:05:15.475Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=1413 
maxSegment=7511 level=info ts=2022-01-26T08:05:15.498Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=1414 maxSegment=7511
rpc error: code = NotFound desc = an error occurred when try to find container "ae14079418f59b04bb80d8413e8fdc34f167bfe762317ef674e05466d34c9e1f": not found
So I deleted the deployment and redeployed to the same storage account, then I got a new error:
level=info ts=2022-01-26T11:10:11.530Z caller=main.go:418 msg="Starting Prometheus" version="(version=2.26.0, branch=HEAD, revision=3cafc58827d1ebd1a67749f88be4218f0bab3d8d)"
level=info ts=2022-01-26T11:10:11.534Z caller=main.go:423 build_context="(go=go1.16.2, user=root@a67cafebe6d0, date=20210331-11:56:23)"
level=info ts=2022-01-26T11:10:11.535Z caller=main.go:424 host_details="(Linux 5.4.0-1064-azure #67~18.04.1-Ubuntu SMP Wed Nov 10 11:38:21 UTC 2021 x86_64 prometheus-6b9d9d54f4-wnmzh (none))"
level=info ts=2022-01-26T11:10:11.536Z caller=main.go:425 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2022-01-26T11:10:11.536Z caller=main.go:426 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2022-01-26T11:10:14.168Z caller=web.go:540 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2022-01-26T11:10:15.385Z caller=main.go:795 msg="Starting TSDB ..."
level=info ts=2022-01-26T11:10:16.022Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1641837600024 maxt=1641902400000 ulid=01FS51ANKBFTVNRPZ68FGQQ5GA
level=info ts=2022-01-26T11:10:16.309Z caller=tls_config.go:191 component=web msg="TLS is disabled."
http2=false level=info ts=2022-01-26T11:10:16.494Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1641902400005 maxt=1641967200000 ulid=01FS6Z46FGXN932K7D39D9166D level=info ts=2022-01-26T11:10:16.806Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1641967200106 maxt=1642032000000 ulid=01FS8WXRJ7Q80FKD4C8EJNR0AD level=info ts=2022-01-26T11:10:17.011Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642032000003 maxt=1642096800000 ulid=01FSATQE1VMNR101KRW1X10Q75 level=info ts=2022-01-26T11:10:17.305Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642096800206 maxt=1642161600000 ulid=01FSCRGVT1E7562SF7EQN12JBM level=info ts=2022-01-26T11:10:18.240Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642161600059 maxt=1642226400000 ulid=01FSEPAFP2CX03ANRB7Q1AG514 level=info ts=2022-01-26T11:10:21.046Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642226400051 maxt=1642291200000 ulid=01FSGM3WT0TKR0XW9BD4QSKPQE level=info ts=2022-01-26T11:10:21.422Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642291200113 maxt=1642356000000 ulid=01FSJHXKHMANW0E6FXDXVM265G level=info ts=2022-01-26T11:10:22.822Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642356000032 maxt=1642420800000 ulid=01FSMFQ6XJ97VJFKNCYQBVB4DZ level=info ts=2022-01-26T11:10:23.536Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642420800021 maxt=1642485600000 ulid=01FSPDGM95FDDV2CDWX93BTDCS level=info ts=2022-01-26T11:10:23.880Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642485600072 maxt=1642550400000 ulid=01FSRBA555RWY4QNP4HD9YKRBM level=info ts=2022-01-26T11:10:25.021Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642550400031 maxt=1642615200000 ulid=01FST93N3C82K9VS20MKTMGGYC level=info ts=2022-01-26T11:10:25.713Z caller=repair.go:57 component=tsdb msg="Found healthy block" 
mint=1642615200014 maxt=1642680000000 ulid=01FSW6X95FRNSN1XJZ2YK0MXW7 level=info ts=2022-01-26T11:10:26.634Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642680000012 maxt=1642744800000 ulid=01FSY4PXA7V1XQHHA3MC35JSWQ level=info ts=2022-01-26T11:10:27.776Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642744800174 maxt=1642809600000 ulid=01FT02G9XGHPV8GME53ZPMYXE6 level=info ts=2022-01-26T11:10:28.760Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642809600070 maxt=1642874400000 ulid=01FT209WP8AXXVZB1NCSC55ACE level=info ts=2022-01-26T11:10:29.618Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642874400253 maxt=1642939200000 ulid=01FT3Y3A4H72FFW318RKHEXXGA level=info ts=2022-01-26T11:10:30.313Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642939200047 maxt=1643004000000 ulid=01FT5VX3YC838QN5VQFAERV1QX level=info ts=2022-01-26T11:10:30.483Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1643004000040 maxt=1643068800000 ulid=01FT7SPHC5EV0SS1R0WT04H9FR level=info ts=2022-01-26T11:10:30.696Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1643068800035 maxt=1643133600000 ulid=01FT9QFZXBZ7EYY2CTE8WXZTB9 level=info ts=2022-01-26T11:10:31.838Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1643133600000 maxt=1643155200000 ulid=01FTA574G4M45WX97Z470DQF73 level=info ts=2022-01-26T11:10:33.686Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1643176800008 maxt=1643184000000 ulid=01FTASSZCG8V5N2VGAGFBYJBSR level=info ts=2022-01-26T11:10:36.078Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1643184000000 maxt=1643191200000 ulid=01FTB0NP47JW5JCF808QZZ8WZQ level=info ts=2022-01-26T11:10:36.442Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1643155200065 maxt=1643176800000 ulid=01FTB0P9H3H09B2ADD5X1RXFW6 level=info ts=2022-01-26T11:10:40.079Z caller=main.go:668 
msg="Stopping scrape discovery manager..." level=info ts=2022-01-26T11:10:40.079Z caller=main.go:682 msg="Stopping notify discovery manager..." level=info ts=2022-01-26T11:10:40.079Z caller=main.go:704 msg="Stopping scrape manager..." level=info ts=2022-01-26T11:10:40.079Z caller=main.go:678 msg="Notify discovery manager stopped" level=info ts=2022-01-26T11:10:40.079Z caller=main.go:664 msg="Scrape discovery manager stopped" level=info ts=2022-01-26T11:10:40.079Z caller=main.go:698 msg="Scrape manager stopped" level=info ts=2022-01-26T11:10:40.080Z caller=manager.go:934 component="rule manager" msg="Stopping rule manager..." level=info ts=2022-01-26T11:10:40.080Z caller=manager.go:944 component="rule manager" msg="Rule manager stopped" level=info ts=2022-01-26T11:10:40.080Z caller=notifier.go:601 component=notifier msg="Stopping notification manager..." level=info ts=2022-01-26T11:10:40.080Z caller=main.go:872 msg="Notifier manager stopped"
level=error ts=2022-01-26T11:10:40.080Z caller=main.go:881 err="opening storage failed: lock DB directory: resource temporarily unavailable"
The YAML is provided by Istio. Below is the deployment YAML file.
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    component: "server"
    app: prometheus
    release: prometheus
    chart: prometheus-14.6.1
    heritage: Helm
  name: prometheus
  namespace: istio-system
spec:
  selector:
    matchLabels:
      component: "server"
      app: prometheus
      release: prometheus
  replicas: 1
  template:
    metadata:
      labels:
        component: "server"
        app: prometheus
        release: prometheus
        chart: prometheus-14.6.1
        heritage: Helm
        sidecar.istio.io/inject: "false"
    spec:
      enableServiceLinks: true
      serviceAccountName: prometheus
      containers:
        - name: prometheus-server-configmap-reload
          image: "jimmidyson/configmap-reload:v0.5.0"
          imagePullPolicy: "IfNotPresent"
          args:
            - --volume-dir=/etc/config
            - --webhook-url=http://127.0.0.1:9090/-/reload
          resources: {}
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config
              readOnly: true
        - name: prometheus-server
          image: "prom/prometheus:v2.26.0"
          imagePullPolicy: "IfNotPresent"
          args:
            - --storage.tsdb.retention.time=15d
            - --config.file=/etc/config/prometheus.yml
            - --storage.tsdb.path=/data
            - --web.console.libraries=/etc/prometheus/console_libraries
            - --web.console.templates=/etc/prometheus/consoles
            - --web.enable-lifecycle
          ports:
            - containerPort: 9090
          readinessProbe:
            httpGet:
              path: /-/ready
              port: 9090
            initialDelaySeconds: 0
            periodSeconds: 5
            timeoutSeconds: 4
            failureThreshold: 3
            successThreshold: 1
          livenessProbe:
            httpGet:
              path: /-/healthy
              port: 9090
            initialDelaySeconds: 30
            periodSeconds: 15
            timeoutSeconds: 10
            failureThreshold: 3
            successThreshold: 1
          resources: {}
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config
            - name: azurefileshare
              mountPath: /data
              subPath: ""
      hostNetwork: false
      dnsPolicy: ClusterFirst
      securityContext:
        fsGroup: 65534
        runAsGroup: 65534
        runAsNonRoot: true
        runAsUser: 65534
      terminationGracePeriodSeconds: 300
      volumes:
        - name: config-volume
          configMap:
            name: prometheus
        - name: azurefileshare
          azureFile:
            secretName: log-storage-secret
            shareName: prometheusfileshare
            readOnly: false
Expected Behavior: When I mount the data to a new container, it should load the data.
Actual Behavior: Not able to load the data, or not able to bind the data to the newly created pod when the old pod dies.
Help me out to resolve the issue.
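A side note on reading the startup logs above: the mint/maxt values on each "Found healthy block" line are Unix epoch timestamps in milliseconds, so you can check which time range every TSDB block covers. A small throwaway helper (not part of Prometheus) does the conversion:

```python
from datetime import datetime, timezone

def block_range(mint_ms: int, maxt_ms: int):
    """Convert a TSDB block's mint/maxt (Unix epoch milliseconds) to UTC datetimes."""
    to_utc = lambda ms: datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
    return to_utc(mint_ms), to_utc(maxt_ms)

# Values taken from one of the "Found healthy block" lines in the logs above
start, end = block_range(1641478251052, 1641513600000)
print(start.isoformat(), "->", end.isoformat())
```

This makes it easy to spot stale or overlapping blocks when comparing the two startup attempts.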
Thank you YwH for your suggestion. Posting this as an answer so it can help other community members if they encounter the same issue in the future.
As stated in this document, Istio provides a basic sample installation to quickly get Prometheus up and running. This is intended for demonstration only, and is not tuned for performance or security.
Note: the Istio configuration is well-suited for small clusters and monitoring over short time horizons, but it is not suitable for large-scale meshes or monitoring over a period of days or weeks.
Solution: Prometheus is a stateful application and is better deployed with a StatefulSet, not a Deployment. StatefulSets are valuable for applications that require one or more of the following: stable, persistent storage; ordered, graceful deployment and scaling. You can use this StatefulSet code for deploying the Prometheus container.
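For orientation, a minimal shape of such a StatefulSet is sketched below. This is not the exact manifest from the linked code: the storage class, size, and container args are placeholders, and a per-replica disk-backed PersistentVolumeClaim replaces the shared Azure file share (which is what produced the "lock DB directory" error when two pods pointed at the same data):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: istio-system
spec:
  serviceName: prometheus
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      containers:
        - name: prometheus-server
          image: prom/prometheus:v2.26.0
          args:
            - --config.file=/etc/config/prometheus.yml
            - --storage.tsdb.path=/data
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        # Placeholder storage class; choose one backed by a disk rather than NFS/SMB
        storageClassName: managed-premium
        resources:
          requests:
            storage: 50Gi
```

With volumeClaimTemplates, the recreated pod reattaches the same volume by name, so the TSDB is bound to exactly one writer at a time.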
GitLab keeps loading and finally fails when deploying a dockerized node.js app
GitLab Job Log
Running with gitlab-runner 13.2.0-rc2 (45f2b4ec) on docker-auto-scale fa6cab46
Preparing the "docker+machine" executor
Using Docker executor with image gitlab/dind:latest ...
Starting service docker:dind ...
Pulling docker image docker:dind ...
Using docker image sha256:d5d139be840a6ffa04348fc87740e8c095cade6e9cb977785fdba51e5fd7ffec for docker:dind ...
Waiting for services to be up and running...
*** WARNING: Service runner-fa6cab46-project-18378289-concurrent-0-31a688551619da9f-docker-0 probably didn't start properly.
Health check error: service "runner-fa6cab46-project-18378289-concurrent-0-31a688551619da9f-docker-0-wait-for-service" timeout
Health check container logs:
Service container logs:
2020-07-20T08:21:19.734721788Z time="2020-07-20T08:21:19.734543379Z" level=info msg="Starting up"
2020-07-20T08:21:19.742928068Z time="2020-07-20T08:21:19.742802844Z" level=warning msg="could not change group /var/run/docker.sock to docker: group docker not found"
2020-07-20T08:21:19.743943014Z time="2020-07-20T08:21:19.743853574Z" level=warning msg="[!] 
DON'T BIND ON ANY IP ADDRESS WITHOUT setting --tlsverify IF YOU DON'T KNOW WHAT YOU'RE DOING [!]" 2020-07-20T08:21:19.764021012Z time="2020-07-20T08:21:19.763898078Z" level=info msg="libcontainerd: started new containerd process" pid=23 2020-07-20T08:21:19.764159337Z time="2020-07-20T08:21:19.764107864Z" level=info msg="parsed scheme: \"unix\"" module=grpc 2020-07-20T08:21:19.764207629Z time="2020-07-20T08:21:19.764179926Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc 2020-07-20T08:21:19.764319635Z time="2020-07-20T08:21:19.764279612Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/docker/containerd/containerd.sock 0 <nil>}] <nil>}" module=grpc 2020-07-20T08:21:19.764371375Z time="2020-07-20T08:21:19.764344798Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc 2020-07-20T08:21:19.969344247Z time="2020-07-20T08:21:19.969193121Z" level=info msg="starting containerd" revision=7ad184331fa3e55e52b890ea95e65ba581ae3429 version=v1.2.13 2020-07-20T08:21:19.969863044Z time="2020-07-20T08:21:19.969784495Z" level=info msg="loading plugin "io.containerd.content.v1.content"..." type=io.containerd.content.v1 2020-07-20T08:21:19.970042302Z time="2020-07-20T08:21:19.969997665Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.btrfs"..." type=io.containerd.snapshotter.v1 2020-07-20T08:21:19.970399514Z time="2020-07-20T08:21:19.970336671Z" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.btrfs" error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter" 2020-07-20T08:21:19.970474776Z time="2020-07-20T08:21:19.970428684Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.aufs"..." 
type=io.containerd.snapshotter.v1 2020-07-20T08:21:20.019585153Z time="2020-07-20T08:21:20.019421401Z" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.aufs" error="modprobe aufs failed: "ip: can't find device 'aufs'\nmodprobe: can't change directory to '/lib/modules': No such file or directory\n": exit status 1" 2020-07-20T08:21:20.019709540Z time="2020-07-20T08:21:20.019668899Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.native"..." type=io.containerd.snapshotter.v1 2020-07-20T08:21:20.019934319Z time="2020-07-20T08:21:20.019887606Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.overlayfs"..." type=io.containerd.snapshotter.v1 2020-07-20T08:21:20.020299876Z time="2020-07-20T08:21:20.020218529Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.zfs"..." type=io.containerd.snapshotter.v1 2020-07-20T08:21:20.021038477Z time="2020-07-20T08:21:20.020887571Z" level=info msg="skip loading plugin "io.containerd.snapshotter.v1.zfs"..." type=io.containerd.snapshotter.v1 2020-07-20T08:21:20.021162370Z time="2020-07-20T08:21:20.021121663Z" level=info msg="loading plugin "io.containerd.metadata.v1.bolt"..." 
type=io.containerd.metadata.v1 2020-07-20T08:21:20.021406797Z time="2020-07-20T08:21:20.021348536Z" level=warning msg="could not use snapshotter aufs in metadata plugin" error="modprobe aufs failed: "ip: can't find device 'aufs'\nmodprobe: can't change directory to '/lib/modules': No such file or directory\n": exit status 1" 2020-07-20T08:21:20.021487917Z time="2020-07-20T08:21:20.021435946Z" level=warning msg="could not use snapshotter zfs in metadata plugin" error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter: skip plugin" 2020-07-20T08:21:20.021581245Z time="2020-07-20T08:21:20.021533539Z" level=warning msg="could not use snapshotter btrfs in metadata plugin" error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter" 2020-07-20T08:21:20.030531741Z time="2020-07-20T08:21:20.030427430Z" level=info msg="loading plugin "io.containerd.differ.v1.walking"..." type=io.containerd.differ.v1 2020-07-20T08:21:20.030639854Z time="2020-07-20T08:21:20.030604536Z" level=info msg="loading plugin "io.containerd.gc.v1.scheduler"..." type=io.containerd.gc.v1 2020-07-20T08:21:20.030779501Z time="2020-07-20T08:21:20.030736875Z" level=info msg="loading plugin "io.containerd.service.v1.containers-service"..." type=io.containerd.service.v1 2020-07-20T08:21:20.030865060Z time="2020-07-20T08:21:20.030833703Z" level=info msg="loading plugin "io.containerd.service.v1.content-service"..." type=io.containerd.service.v1 2020-07-20T08:21:20.030955439Z time="2020-07-20T08:21:20.030912981Z" level=info msg="loading plugin "io.containerd.service.v1.diff-service"..." type=io.containerd.service.v1 2020-07-20T08:21:20.031027842Z time="2020-07-20T08:21:20.030998003Z" level=info msg="loading plugin "io.containerd.service.v1.images-service"..." 
type=io.containerd.service.v1 2020-07-20T08:21:20.031132325Z time="2020-07-20T08:21:20.031083782Z" level=info msg="loading plugin "io.containerd.service.v1.leases-service"..." type=io.containerd.service.v1 2020-07-20T08:21:20.031202966Z time="2020-07-20T08:21:20.031174445Z" level=info msg="loading plugin "io.containerd.service.v1.namespaces-service"..." type=io.containerd.service.v1 2020-07-20T08:21:20.031286993Z time="2020-07-20T08:21:20.031253528Z" level=info msg="loading plugin "io.containerd.service.v1.snapshots-service"..." type=io.containerd.service.v1 2020-07-20T08:21:20.031370557Z time="2020-07-20T08:21:20.031312376Z" level=info msg="loading plugin "io.containerd.runtime.v1.linux"..." type=io.containerd.runtime.v1 2020-07-20T08:21:20.031709756Z time="2020-07-20T08:21:20.031650044Z" level=info msg="loading plugin "io.containerd.runtime.v2.task"..." type=io.containerd.runtime.v2 2020-07-20T08:21:20.031941868Z time="2020-07-20T08:21:20.031897088Z" level=info msg="loading plugin "io.containerd.monitor.v1.cgroups"..." type=io.containerd.monitor.v1 2020-07-20T08:21:20.032929781Z time="2020-07-20T08:21:20.032846588Z" level=info msg="loading plugin "io.containerd.service.v1.tasks-service"..." type=io.containerd.service.v1 2020-07-20T08:21:20.033064279Z time="2020-07-20T08:21:20.033014391Z" level=info msg="loading plugin "io.containerd.internal.v1.restart"..." type=io.containerd.internal.v1 2020-07-20T08:21:20.034207198Z time="2020-07-20T08:21:20.034120505Z" level=info msg="loading plugin "io.containerd.grpc.v1.containers"..." type=io.containerd.grpc.v1 2020-07-20T08:21:20.034316027Z time="2020-07-20T08:21:20.034279582Z" level=info msg="loading plugin "io.containerd.grpc.v1.content"..." type=io.containerd.grpc.v1 2020-07-20T08:21:20.034402334Z time="2020-07-20T08:21:20.034369239Z" level=info msg="loading plugin "io.containerd.grpc.v1.diff"..." 
type=io.containerd.grpc.v1
2020-07-20T08:21:20.034482782Z time="2020-07-20T08:21:20.034452282Z" level=info msg="loading plugin "io.containerd.grpc.v1.events"..." type=io.containerd.grpc.v1
2020-07-20T08:21:20.034564724Z time="2020-07-20T08:21:20.034533365Z" level=info msg="loading plugin "io.containerd.grpc.v1.healthcheck"..." type=io.containerd.grpc.v1
2020-07-20T08:21:20.034645756Z time="2020-07-20T08:21:20.034617060Z" level=info msg="loading plugin "io.containerd.grpc.v1.images"..." type=io.containerd.grpc.v1
2020-07-20T08:21:20.034722695Z time="2020-07-20T08:21:20.034689037Z" level=info msg="loading plugin "io.containerd.grpc.v1.leases"..." type=io.containerd.grpc.v1
2020-07-20T08:21:20.034800005Z time="2020-07-20T08:21:20.034770572Z" level=info msg="loading plugin "io.containerd.grpc.v1.namespaces"..." type=io.containerd.grpc.v1
2020-07-20T08:21:20.034873069Z time="2020-07-20T08:21:20.034837050Z" level=info msg="loading plugin "io.containerd.internal.v1.opt"..." type=io.containerd.internal.v1
2020-07-20T08:21:20.036608424Z time="2020-07-20T08:21:20.036525701Z" level=info msg="loading plugin "io.containerd.grpc.v1.snapshots"..." type=io.containerd.grpc.v1
2020-07-20T08:21:20.036722927Z time="2020-07-20T08:21:20.036684403Z" level=info msg="loading plugin "io.containerd.grpc.v1.tasks"..." type=io.containerd.grpc.v1
2020-07-20T08:21:20.036799326Z time="2020-07-20T08:21:20.036769392Z" level=info msg="loading plugin "io.containerd.grpc.v1.version"..." type=io.containerd.grpc.v1
2020-07-20T08:21:20.036876692Z time="2020-07-20T08:21:20.036844684Z" level=info msg="loading plugin "io.containerd.grpc.v1.introspection"..." type=io.containerd.grpc.v1
2020-07-20T08:21:20.037291381Z time="2020-07-20T08:21:20.037244979Z" level=info msg=serving... address="/var/run/docker/containerd/containerd-debug.sock"
2020-07-20T08:21:20.037493736Z time="2020-07-20T08:21:20.037445814Z" level=info msg=serving... address="/var/run/docker/containerd/containerd.sock"
2020-07-20T08:21:20.037563487Z time="2020-07-20T08:21:20.037522305Z" level=info msg="containerd successfully booted in 0.069638s"
2020-07-20T08:21:20.087933162Z time="2020-07-20T08:21:20.087804902Z" level=info msg="Setting the storage driver from the $DOCKER_DRIVER environment variable (overlay2)"
2020-07-20T08:21:20.088415387Z time="2020-07-20T08:21:20.088327506Z" level=info msg="parsed scheme: \"unix\"" module=grpc
2020-07-20T08:21:20.088533804Z time="2020-07-20T08:21:20.088465157Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
2020-07-20T08:21:20.088620947Z time="2020-07-20T08:21:20.088562235Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/docker/containerd/containerd.sock 0 <nil>}] <nil>}" module=grpc
2020-07-20T08:21:20.088709546Z time="2020-07-20T08:21:20.088654016Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
2020-07-20T08:21:20.092857445Z time="2020-07-20T08:21:20.092749940Z" level=info msg="parsed scheme: \"unix\"" module=grpc
2020-07-20T08:21:20.092962469Z time="2020-07-20T08:21:20.092914347Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
2020-07-20T08:21:20.093060327Z time="2020-07-20T08:21:20.093013905Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/docker/containerd/containerd.sock 0 <nil>}] <nil>}" module=grpc
2020-07-20T08:21:20.093142744Z time="2020-07-20T08:21:20.093102173Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
2020-07-20T08:21:20.149109416Z time="2020-07-20T08:21:20.148965236Z" level=info msg="Loading containers: start."
2020-07-20T08:21:20.159351905Z time="2020-07-20T08:21:20.159146135Z" level=warning msg="Running modprobe bridge br_netfilter failed with message: ip: can't find device 'bridge'\nbridge 167936 1 br_netfilter\nstp 16384 1 bridge\nllc 16384 2 bridge,stp\nip: can't find device 'br_netfilter'\nbr_netfilter 24576 0 \nbridge 167936 1 br_netfilter\nmodprobe: can't change directory to '/lib/modules': No such file or directory\n, error: exit status 1"
2020-07-20T08:21:20.280536391Z time="2020-07-20T08:21:20.280402152Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.18.0.0/16. Daemon option --bip can be used to set a preferred IP address"
2020-07-20T08:21:20.337028532Z time="2020-07-20T08:21:20.336889956Z" level=info msg="Loading containers: done."
2020-07-20T08:21:20.435200532Z time="2020-07-20T08:21:20.435033092Z" level=info msg="Docker daemon" commit=48a66213fe graphdriver(s)=overlay2 version=19.03.12
2020-07-20T08:21:20.436386855Z time="2020-07-20T08:21:20.436266338Z" level=info msg="Daemon has completed initialization"
2020-07-20T08:21:20.476621441Z time="2020-07-20T08:21:20.475137317Z" level=info msg="API listen on [::]:2375"
2020-07-20T08:21:20.477679219Z time="2020-07-20T08:21:20.477535808Z" level=info msg="API listen on /var/run/docker.sock"
Pulling docker image gitlab/dind:latest ...
Using docker image sha256:cc674e878f23bdc3c36cc37596d31adaa23bca0fc3ed18bea9b59abc638602e1 for gitlab/dind:latest ...
Preparing environment
Running on runner-fa6cab46-project-18378289-concurrent-0 via runner-fa6cab46-srm-1595233216-1bd30100...
Getting source from Git repository
$ eval "$CI_PRE_CLONE_SCRIPT"
Fetching changes with git depth set to 50...
Initialized empty Git repository in /builds/xxx.us/backend/.git/
Created fresh repository.
Checking out 257ffdf2 as stage...
Skipping Git submodules setup
Restoring cache
Checking cache for stage node:14.5.0-alpine-2...
Downloading cache.zip from https://storage.googleapis.com/gitlab-com-runners-cache/project/18378289/stage%20node:14.5.0-alpine-2
Successfully extracted cache
Executing "step_script" stage of the job script
ln: failed to create symbolic link '/sys/fs/cgroup/systemd/name=systemd': Operation not permitted
time="2020-07-20T08:22:14.844844859Z" level=warning msg="[!] DON'T BIND ON ANY IP ADDRESS WITHOUT setting -tlsverify IF YOU DON'T KNOW WHAT YOU'RE DOING [!]"
time="2020-07-20T08:22:14.846663310Z" level=info msg="libcontainerd: new containerd process, pid: 57"
time="2020-07-20T08:22:14.906788853Z" level=info msg="Graph migration to content-addressability took 0.00 seconds"
time="2020-07-20T08:22:14.907996055Z" level=info msg="Loading containers: start."
time="2020-07-20T08:22:14.910877638Z" level=warning msg="Running modprobe bridge br_netfilter failed with message: modprobe: ERROR: ../libkmod/libkmod.c:556 kmod_search_moddep() could not open moddep file '/lib/modules/4.19.78-coreos/modules.dep.bin'\nmodprobe: ERROR: ../libkmod/libkmod.c:556 kmod_search_moddep() could not open moddep file '/lib/modules/4.19.78-coreos/modules.dep.bin'\n, error: exit status 1"
time="2020-07-20T08:22:14.912665866Z" level=warning msg="Running modprobe nf_nat failed with message: `modprobe: ERROR: ../libkmod/libkmod.c:556 kmod_search_moddep() could not open moddep file '/lib/modules/4.19.78-coreos/modules.dep.bin'`, error: exit status 1"
time="2020-07-20T08:22:14.914201302Z" level=warning msg="Running modprobe xt_conntrack failed with message: `modprobe: ERROR: ../libkmod/libkmod.c:556 kmod_search_moddep() could not open moddep file '/lib/modules/4.19.78-coreos/modules.dep.bin'`, error: exit status 1"
time="2020-07-20T08:22:14.989456423Z" level=warning msg="Could not load necessary modules for IPSEC rules: Running modprobe xfrm_user failed with message: `modprobe: ERROR: ../libkmod/libkmod.c:556 kmod_search_moddep() could not open moddep file '/lib/modules/4.19.78-coreos/modules.dep.bin'`, error: exit status 1"
time="2020-07-20T08:22:14.990108153Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.18.0.0/16. Daemon option --bip can be used to set a preferred IP address"
time="2020-07-20T08:22:15.029286773Z" level=info msg="Loading containers: done."
time="2020-07-20T08:22:15.029664106Z" level=info msg="Daemon has completed initialization"
time="2020-07-20T08:22:15.029823541Z" level=info msg="Docker daemon" commit=23cf638 graphdriver=overlay2 version=1.12.1
time="2020-07-20T08:22:15.048665494Z" level=info msg="API listen on /var/run/docker.sock"
time="2020-07-20T08:22:15.049046558Z" level=info msg="API listen on [::]:7070"
# Keeps loading and finally fails after a couple of seconds

gitlab-ci.yml

cache:
  key: '$CI_COMMIT_REF_NAME node:14.5.0-alpine'
  paths:
    - node_modules/

stages:
  - release
  - deploy

variables:
  TAGGED_IMAGE: '$CI_REGISTRY_IMAGE:latest'

.release:
  stage: release
  image: docker:19.03.12
  services:
    - docker:dind
  variables:
    DOCKER_DRIVER: overlay2
    DOCKER_BUILDKIT: 1
  before_script:
    - docker version
    - docker info
    - echo "$CI_JOB_TOKEN" | docker login --username $CI_REGISTRY_USER --password-stdin $CI_REGISTRY
  script:
    - docker build --pull --tag $TAGGED_IMAGE --cache-from $TAGGED_IMAGE --build-arg NODE_ENV=$CI_ENVIRONMENT_NAME .
    - docker push $TAGGED_IMAGE
  after_script:
    - docker logout $CI_REGISTRY

.deploy:
  stage: deploy
  image: gitlab/dind:latest
  services:
    - docker:dind
  variables:
    DOCKER_COMPOSE_PATH: '~/docker-composes/$CI_PROJECT_PATH/docker-compose.yml'
  before_script:
    - 'which ssh-agent || ( apt-get update -y && apt-get install openssh-client -y )'
    - eval $(ssh-agent -s)
    - echo "$DEPLOY_SERVER_PRIVATE_KEY" | tr -d '\r' | ssh-add -
    - mkdir -p ~/.ssh
    - chmod 700 ~/.ssh
    - ssh-keyscan $DEPLOYMENT_SERVER_IP >> ~/.ssh/known_hosts
    - chmod 644 ~/.ssh/known_hosts
  script:
    - rsync -avR --rsync-path="mkdir -p ~/docker-composes/$CI_PROJECT_PATH/; rsync" ./docker-compose.yml root@$DEPLOYMENT_SERVER_IP:~/docker-composes/$CI_PROJECT_PATH/
    - ssh root@$DEPLOYMENT_SERVER_IP "echo "$CI_REGISTRY_PASSWORD" | docker login --username $CI_REGISTRY_USER --password-stdin $CI_REGISTRY; docker-compose -f $DOCKER_COMPOSE_PATH rm -f -s -v $CI_COMMIT_REF_NAME; docker pull $TAGGED_IMAGE; docker-compose -f $DOCKER_COMPOSE_PATH up -d $CI_COMMIT_REF_NAME;"

release_stage:
  extends: .release
  only:
    - stage
  environment:
    name: staging

deploy_stage:
  extends: .deploy
  only:
    - stage
  environment:
    name: staging

Dockerfile

ARG NODE_ENV
FROM node:14.5.0-alpine
ARG NODE_ENV
ENV NODE_ENV ${NODE_ENV}

# Set working directory
WORKDIR /var/www/

# Install app dependencies
COPY package.json package-lock.json ./
RUN npm ci --silent --only=production
COPY . ./

# Start the application
CMD [ "npm", "run", "start" ]

docker-compose.yml

version: '3.8'

services:
  redis-stage:
    container_name: redis-stage
    image: redis:6.0.5-alpine
    ports:
      - '7075:6379'
    restart: always
    networks:
      - my-proxy-net

  stage:
    container_name: xxx-backend-stage
    image: registry.gitlab.com/xxx.us/backend:latest
    build: .
    expose:
      - '7070'
    restart: always
    networks:
      - my-proxy-net
    depends_on:
      - redis-stage
    environment:
      VIRTUAL_HOST: backend.xxx.us
      VIRTUAL_PROTO: https
      LETSENCRYPT_HOST: backend.xxx.us

networks:
  my-proxy-net:
    external:
      name: mynetwork

Update 1
I got a warning on the page claiming I have used over 30% of my shared runner minutes. Maybe the problem is simply that I am running out of minutes.

Update 2
The release stage completes successfully.

Update 3
Before running into this problem, I had deployed successfully once. I tried that same commit again to see whether it would still succeed, but now it fails!
I fixed the issue. In my case, the culprit was the PORT (definitely) and possibly the HOST environment variables I had defined manually in the GitLab CI/CD Variables section. It seems PORT, and maybe HOST, are reserved environment variable names for GitLab and/or Docker. That said, I couldn't find anything in the docs stating that those variable names must not be used.
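A minimal sketch of the workaround (the name APP_PORT is made up; any non-reserved name works): store the value under a different key in Settings > CI/CD > Variables and recreate the name the app expects only inside the job:

```yaml
# Hypothetical .gitlab-ci.yml fragment: APP_PORT is defined in the CI/CD
# Variables UI instead of the reserved-looking PORT, and PORT is exported
# only inside the job's shell, where the application actually reads it.
.deploy:
  before_script:
    - export PORT="$APP_PORT"
```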
Monitoring Cassandra with the Prometheus monitoring tool
My Prometheus server is on a CentOS 7 machine and Cassandra is on CentOS 6. I am trying to monitor Cassandra's JMX port 7199 with Prometheus, but I keep getting an error with my yml file, and I am not sure why I cannot connect to the CentOS 6 (Cassandra) machine. Is my YAML file wrong, or does it have something to do with JMX port 7199?

Here is my YAML file:

# my global config
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: cassandra
    static_configs:
      - targets: ['10.1.0.22:7199']

Here is my Prometheus log:

level=info ts=2017-12-08T04:30:53.92549611Z caller=main.go:215 msg="Starting Prometheus" version="(version=2.0.0, branch=HEAD, revision=0a74f98628a0463dddc90528220c94de5032d1a0)"
level=info ts=2017-12-08T04:30:53.925623847Z caller=main.go:216 build_context="(go=go1.9.2, user=root@615b82cb36b6, date=20171108-07:11:59)"
level=info ts=2017-12-08T04:30:53.92566228Z caller=main.go:217 host_details="(Linux 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 20:32:50 UTC 2017 x86_64 localhost.localdomain (none))"
level=info ts=2017-12-08T04:30:53.932807536Z caller=web.go:380 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2017-12-08T04:30:53.93303681Z caller=targetmanager.go:71 component="target manager" msg="Starting target manager..."
level=info ts=2017-12-08T04:30:53.932905473Z caller=main.go:314 msg="Starting TSDB"
level=info ts=2017-12-08T04:30:53.987468942Z caller=main.go:326 msg="TSDB started"
level=info ts=2017-12-08T04:30:53.987582063Z caller=main.go:394 msg="Loading configuration file" filename=prometheus.yml
level=info ts=2017-12-08T04:30:53.988366778Z caller=main.go:371 msg="Server is ready to receive requests."
level=warn ts=2017-12-08T04:31:00.561007282Z caller=main.go:377 msg="Received SIGTERM, exiting gracefully..."
level=info ts=2017-12-08T04:31:00.563191668Z caller=main.go:384 msg="See you next time!"
level=info ts=2017-12-08T04:31:00.566231211Z caller=targetmanager.go:87 component="target manager" msg="Stopping target manager..."
level=info ts=2017-12-08T04:31:00.567070099Z caller=targetmanager.go:99 component="target manager" msg="Target manager stopped"
level=info ts=2017-12-08T04:31:00.567136027Z caller=manager.go:455 component="rule manager" msg="Stopping rule manager..."
level=info ts=2017-12-08T04:31:00.567162215Z caller=manager.go:461 component="rule manager" msg="Rule manager stopped"
level=info ts=2017-12-08T04:31:00.567186356Z caller=notifier.go:483 component=notifier msg="Stopping notification handler..."

If anyone has instructions on how to connect Prometheus to Cassandra when they are on two different machines, that would be helpful too.
This is not a problem with your config: Prometheus received a TERM signal and shut down gracefully. If you are not getting metrics, check whether 10.1.0.22:7199/metrics loads and returns metrics. You can also check the Prometheus server's /targets endpoint for the scrape status of each target. If your Cassandra server returns nothing on its /metrics endpoint, it is likely because the Cassandra Prometheus exporter is not configured properly.
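Note that port 7199 speaks the JMX protocol, not HTTP, so Prometheus cannot scrape it directly; the usual approach is to run the Prometheus JMX exporter as a Java agent inside Cassandra and scrape the HTTP port the agent opens. A sketch, where the agent's listen port (7070) and file paths are assumptions to adjust for your setup:

```yaml
# prometheus.yml — scrape the JMX exporter's HTTP endpoint, not JMX port 7199.
# On the Cassandra node, the exporter is enabled by adding something like
#   JVM_OPTS="$JVM_OPTS -javaagent:/opt/jmx_prometheus_javaagent.jar=7070:/opt/cassandra.yml"
# to cassandra-env.sh (jar path, config path, and port 7070 are all assumptions).
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: cassandra
    static_configs:
      - targets: ['10.1.0.22:7070']  # the exporter's HTTP port, not 7199
```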