Prometheus Execution Timeout Exceeded - linux

I use Grafana to monitor my company's infrastructure. Everything worked fine until this week, when I started to see alerts in Grafana with this error message:
request handler error: Post "http://prometheus-ip:9090/api/v1/query_range": dial tcp prometheus-ip:9090: i/o timeout
I tried to restart the Prometheus server, but it seems it can't be stopped cleanly; I have to kill -9 it and start it again. Here's the log:
Jun 16 01:04:01 prometheus prometheus[18869]: time="2022-06-16T01:04:01+02:00" level=info msg="All requests for rebuilding the label indexes queued. (Actual processing may lag behind.)" source="crashrecovery.go:529"
Jun 16 01:04:01 prometheus prometheus[18869]: time="2022-06-16T01:04:01+02:00" level=info msg="Checkpointing fingerprint mappings..." source="persistence.go:1480"
Jun 16 01:04:02 prometheus prometheus[18869]: time="2022-06-16T01:04:02+02:00" level=info msg="Done checkpointing fingerprint mappings in 286.224481ms." source="persistence.go:1503"
Jun 16 01:04:02 prometheus prometheus[18869]: time="2022-06-16T01:04:02+02:00" level=warning msg="Crash recovery complete." source="crashrecovery.go:152"
Jun 16 01:04:02 prometheus prometheus[18869]: time="2022-06-16T01:04:02+02:00" level=info msg="362306 series loaded." source="storage.go:378"
Jun 16 01:04:02 prometheus prometheus[18869]: time="2022-06-16T01:04:02+02:00" level=info msg="Starting target manager..." source="targetmanager.go:61"
Jun 16 01:04:02 prometheus prometheus[18869]: time="2022-06-16T01:04:02+02:00" level=info msg="Listening on :9090" source="web.go:235"
Jun 16 01:04:15 prometheus prometheus[18869]: time="2022-06-16T01:04:15+02:00" level=warning msg="Storage has entered rushed mode." chunksToPersist=420483 maxChunksToPersist=524288 maxMemoryChunks=1048576 memoryChunks=655877 source="storage.go:1660" urgencyScore=0.8020076751708984
Jun 16 01:09:02 prometheus prometheus[18869]: time="2022-06-16T01:09:02+02:00" level=info msg="Checkpointing in-memory metrics and chunks..." source="persistence.go:612"
Jun 16 01:10:05 prometheus prometheus[18869]: time="2022-06-16T01:10:05+02:00" level=info msg="Done checkpointing in-memory metrics and chunks in 1m3.127365726s." source="persistence.go:639"
Jun 16 01:12:25 prometheus prometheus[18869]: time="2022-06-16T01:12:25+02:00" level=warning msg="Received SIGTERM, exiting gracefully..." source="main.go:230"
Jun 16 01:12:25 prometheus prometheus[18869]: time="2022-06-16T01:12:25+02:00" level=info msg="See you next time!" source="main.go:237"
Jun 16 01:12:25 prometheus prometheus[18869]: time="2022-06-16T01:12:25+02:00" level=info msg="Stopping target manager..." source="targetmanager.go:75"
Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Stopping rule manager..." source="manager.go:374"
Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Rule manager stopped." source="manager.go:380"
Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Stopping notification handler..." source="notifier.go:369"
Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Stopping local storage..." source="storage.go:396"
Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Stopping maintenance loop..." source="storage.go:398"
Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Maintenance loop stopped." source="storage.go:1259"
Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Stopping series quarantining..." source="storage.go:402"
Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Series quarantining stopped." source="storage.go:1701"
Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Stopping chunk eviction..." source="storage.go:406"
Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Chunk eviction stopped." source="storage.go:1079"
Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Checkpointing in-memory metrics and chunks..." source="persistence.go:612"
Jun 16 01:12:44 prometheus prometheus[18869]: time="2022-06-16T01:12:44+02:00" level=info msg="Done checkpointing in-memory metrics and chunks in 16.170119611s." source="persistence.go:639"
Jun 16 01:12:44 prometheus prometheus[18869]: time="2022-06-16T01:12:44+02:00" level=info msg="Checkpointing fingerprint mappings..." source="persistence.go:1480"
Jun 16 01:12:45 prometheus prometheus[18869]: time="2022-06-16T01:12:45+02:00" level=info msg="Done checkpointing fingerprint mappings in 651.409422ms." source="persistence.go:1503"
Jun 16 01:12:45 prometheus systemd[1]: prometheus.service: State 'stop-sigterm' timed out. Skipping SIGKILL.
Jun 16 01:13:06 prometheus systemd[1]: prometheus.service: State 'stop-final-sigterm' timed out. Skipping SIGKILL. Entering failed mode.
Jun 16 01:13:06 prometheus systemd[1]: prometheus.service: Unit entered failed state.
Jun 16 01:13:06 prometheus systemd[1]: prometheus.service: Failed with result 'timeout'.
Jun 16 01:13:24 prometheus prometheus[20547]: time="2022-06-16T01:13:24+02:00" level=info msg="Starting prometheus (version=1.5.2+ds, branch=debian/sid, revision=1.5.2+ds-2+b3)" source="main.go:75"
Jun 16 01:13:24 prometheus prometheus[20547]: time="2022-06-16T01:13:24+02:00" level=info msg="Build context (go=go1.7.4, user=pkg-go-maintainers#lists.alioth.debian.org, date=20170521-14:39:14)" source="main.go:76"
Jun 16 01:13:24 prometheus prometheus[20547]: time="2022-06-16T01:13:24+02:00" level=info msg="Loading configuration file /etc/prometheus/prometheus.yml" source="main.go:248"
Jun 16 01:13:24 prometheus prometheus[20547]: time="2022-06-16T01:13:24+02:00" level=error msg="Could not lock /path/to/prometheus/metrics/DIRTY, Prometheus already running?" source="persistence.go:198"
Jun 16 01:13:24 prometheus prometheus[20547]: time="2022-06-16T01:13:24+02:00" level=error msg="Error opening memory series storage: resource temporarily unavailable" source="main.go:182"
Jun 16 01:13:24 prometheus systemd[1]: prometheus.service: Main process exited, code=exited, status=1/FAILURE
Jun 16 01:13:44 prometheus systemd[1]: prometheus.service: State 'stop-sigterm' timed out. Skipping SIGKILL.
Jun 16 01:14:02 prometheus prometheus[18869]: time="2022-06-16T01:14:02+02:00" level=info msg="Local storage stopped." source="storage.go:421"
Jun 16 01:14:02 prometheus systemd[1]: prometheus.service: Unit entered failed state.
Jun 16 01:14:02 prometheus systemd[1]: prometheus.service: Failed with result 'exit-code'.
Jun 16 01:14:03 prometheus systemd[1]: prometheus.service: Service hold-off time over, scheduling restart.
Jun 16 01:14:03 prometheus prometheus[20564]: time="2022-06-16T01:14:03+02:00" level=info msg="Starting prometheus (version=1.5.2+ds, branch=debian/sid, revision=1.5.2+ds-2+b3)" source="main.go:75"
Jun 16 01:14:03 prometheus prometheus[20564]: time="2022-06-16T01:14:03+02:00" level=info msg="Build context (go=go1.7.4, user=pkg-go-maintainers#lists.alioth.debian.org, date=20170521-14:39:14)" source="main.go:76"
Jun 16 01:14:03 prometheus prometheus[20564]: time="2022-06-16T01:14:03+02:00" level=info msg="Loading configuration file /etc/prometheus/prometheus.yml" source="main.go:248"
Jun 16 01:14:04 prometheus prometheus[20564]: time="2022-06-16T01:14:04+02:00" level=info msg="Loading series map and head chunks..." source="storage.go:373"
Jun 16 01:14:08 prometheus prometheus[20564]: time="2022-06-16T01:14:08+02:00" level=info msg="364314 series loaded." source="storage.go:378"
Jun 16 01:14:08 prometheus prometheus[20564]: time="2022-06-16T01:14:08+02:00" level=info msg="Starting target manager..." source="targetmanager.go:61"
Jun 16 01:14:08 prometheus prometheus[20564]: time="2022-06-16T01:14:08+02:00" level=info msg="Listening on :9090" source="web.go:235"
Jun 16 01:14:08 prometheus prometheus[20564]: time="2022-06-16T01:14:08+02:00" level=warning msg="Storage has entered rushed mode." chunksToPersist=448681 maxChunksToPersist=524288 maxMemoryChunks=1048576 memoryChunks=687476 source="storage.go:1660" urgencyScore=0.8557910919189453
When restarted like this, Prometheus enters crash recovery, which takes about 1h 30min to complete. When it's done, the logs show the following:
Jun 16 16:10:42 prometheus prometheus[32708]: time="2022-06-16T16:10:42+02:00" level=info msg="Storage does not need throttling anymore." chunksToPersist=524288 maxChunksToPersist=524288 maxToleratedMemChunks=1153433 memoryChunks=1049320 source="storage.go:935"
Jun 16 16:10:42 prometheus prometheus[32708]: time="2022-06-16T16:10:42+02:00" level=error msg="Storage needs throttling. Scrapes and rule evaluations will be skipped." chunksToPersist=525451 maxChunksToPersist=524288 maxToleratedMemChunks=1153433 memoryChunks=1050483 source="storage.go:927"
Jun 16 16:15:31 prometheus prometheus[32708]: time="2022-06-16T16:15:31+02:00" level=info msg="Checkpointing in-memory metrics and chunks..." source="persistence.go:612"
Jun 16 16:16:28 prometheus prometheus[32708]: time="2022-06-16T16:16:28+02:00" level=info msg="Done checkpointing in-memory metrics and chunks in 57.204367083s." source="persistence.go:639"
The checkpointing repeats frequently and takes about a minute each time.
The monitoring for this server shows the following:
Here are the flags used:
/usr/bin/prometheus --storage.local.path /path/to/prometheus/metrics --storage.local.retention=1460h0m0s --storage.local.series-file-shrink-ratio=0.3
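For reference, the rushed-mode and throttling warnings in the logs above correspond to the default chunk limits (memoryChunks=1048576, chunksToPersist=524288). A hedged sketch of how these Prometheus 1.x limits could be raised; the values are only illustrative and have to fit in the machine's RAM (each chunk is roughly 1 KiB, plus in-memory overhead):
# illustrative values only, not the flags actually in use on this server
/usr/bin/prometheus \
  --storage.local.path /path/to/prometheus/metrics \
  --storage.local.retention=1460h0m0s \
  --storage.local.series-file-shrink-ratio=0.3 \
  --storage.local.memory-chunks=2097152 \
  --storage.local.max-chunks-to-persist=1048576 \
  --storage.local.checkpoint-interval=10m0s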
Prometheus version:
prometheus --version
prometheus, version 1.5.2+ds (branch: debian/sid, revision: 1.5.2+ds-2+b3)
build user: pkg-go-maintainers@lists.alioth.debian.org
build date: 20170521-14:39:14
go version: go1.7.4
I decided to move some metrics to another server so this one is not as loaded as before. However, this server still has to scrape metrics from 50+ other servers. What could be the cause of this?
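One note on the shutdown behaviour seen in the log: systemd reports 'State stop-sigterm timed out' while Prometheus is still writing its shutdown checkpoint, which is what leaves the DIRTY lock behind and forces crash recovery on the next start. A minimal sketch of a systemd drop-in that gives it more time to stop (the 15-minute value is an assumption, not something taken from this setup):
sudo systemctl edit prometheus
# in the override file that opens, add:
#   [Service]
#   TimeoutStopSec=15min
sudo systemctl daemon-reload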

Related

Docker is only available after restarting docker.socket

I built an ISO based on Thinstation that includes a Docker package. The problem is that after the service comes up, a network connection to the Docker host only works if I restart docker.socket. How can I diagnose Docker to understand why, by default and without restarting the service, no remote connection to the host over the network is possible?
Here is the service log:
-- Logs begin at Sat 2022-06-04 06:44:32 PDT, end at Sat 2022-06-04 06:44:45 PDT. --
Jun 04 06:44:39 thinstation-linux systemd[1]: Starting Docker Application Container Engine...
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.146186290-07:00" level=info msg="Starting up"
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.147352611-07:00" level=info msg="libcontainerd: started new containerd process" pid=3130
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.147385993-07:00" level=info msg="parsed scheme: \"unix\"" module=grpc
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.147399661-07:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.147446573-07:00" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/docker/containerd/containerd.sock <nil> 0 <nil>}] <nil> <nil>}" module=grpc
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.147473357-07:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45-07:00" level=warning msg="deprecated version : `1`, please switch to version `2`"
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.173591406-07:00" level=info msg="starting containerd" revision=212e8b6fa2f44b9c21b2798135fc6fb7c53efc16 version=v1.6.4
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.199975908-07:00" level=info msg="loading plugin \"io.containerd.content.v1.content\"..." type=io.containerd.content.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.200078325-07:00" level=info msg="loading plugin \"io.containerd.snapshotter.v1.aufs\"..." type=io.containerd.snapshotter.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.201561989-07:00" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.aufs\"..." error="aufs is not supported (modprobe aufs failed: exit status 1 \"modprobe: FATAL: Module aufs not found in directory /lib/modules/5.10.119TS\\n\"): skip plugin" type=io.containerd.snapshotter.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.201609700-07:00" level=info msg="loading plugin \"io.containerd.snapshotter.v1.btrfs\"..." type=io.containerd.snapshotter.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.201738040-07:00" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.btrfs\"..." error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.btrfs (rootfs) must be a btrfs filesystem to be used with the btrfs snapshotter: skip plugin" type=io.containerd.snapshotter.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.201769872-07:00" level=info msg="loading plugin \"io.containerd.snapshotter.v1.devmapper\"..." type=io.containerd.snapshotter.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.201791635-07:00" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.devmapper" error="devmapper not configured"
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.201809462-07:00" level=info msg="loading plugin \"io.containerd.snapshotter.v1.native\"..." type=io.containerd.snapshotter.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.201863240-07:00" level=info msg="loading plugin \"io.containerd.snapshotter.v1.overlayfs\"..." type=io.containerd.snapshotter.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.202132975-07:00" level=info msg="loading plugin \"io.containerd.snapshotter.v1.zfs\"..." type=io.containerd.snapshotter.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.202304778-07:00" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.zfs\"..." error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter: skip plugin" type=io.containerd.snapshotter.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.202343950-07:00" level=info msg="loading plugin \"io.containerd.metadata.v1.bolt\"..." type=io.containerd.metadata.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.202388411-07:00" level=warning msg="could not use snapshotter devmapper in metadata plugin" error="devmapper not configured"
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.202419790-07:00" level=info msg="metadata content store policy set" policy=shared
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.202662008-07:00" level=info msg="loading plugin \"io.containerd.differ.v1.walking\"..." type=io.containerd.differ.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.202696415-07:00" level=info msg="loading plugin \"io.containerd.event.v1.exchange\"..." type=io.containerd.event.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.202717498-07:00" level=info msg="loading plugin \"io.containerd.gc.v1.scheduler\"..." type=io.containerd.gc.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.202759330-07:00" level=info msg="loading plugin \"io.containerd.service.v1.introspection-service\"..." type=io.containerd.service.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.202783160-07:00" level=info msg="loading plugin \"io.containerd.service.v1.containers-service\"..." type=io.containerd.service.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.202805065-07:00" level=info msg="loading plugin \"io.containerd.service.v1.content-service\"..." type=io.containerd.service.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.202840054-07:00" level=info msg="loading plugin \"io.containerd.service.v1.diff-service\"..." type=io.containerd.service.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.202875755-07:00" level=info msg="loading plugin \"io.containerd.service.v1.images-service\"..." type=io.containerd.service.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.203282487-07:00" level=info msg="loading plugin \"io.containerd.service.v1.leases-service\"..." type=io.containerd.service.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.203309113-07:00" level=info msg="loading plugin \"io.containerd.service.v1.namespaces-service\"..." type=io.containerd.service.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.203330056-07:00" level=info msg="loading plugin \"io.containerd.service.v1.snapshots-service\"..." type=io.containerd.service.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.203351267-07:00" level=info msg="loading plugin \"io.containerd.runtime.v1.linux\"..." type=io.containerd.runtime.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.203471409-07:00" level=info msg="loading plugin \"io.containerd.runtime.v2.task\"..." type=io.containerd.runtime.v2
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.203573569-07:00" level=info msg="loading plugin \"io.containerd.monitor.v1.cgroups\"..." type=io.containerd.monitor.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.205104698-07:00" level=info msg="loading plugin \"io.containerd.service.v1.tasks-service\"..." type=io.containerd.service.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.205208659-07:00" level=info msg="loading plugin \"io.containerd.grpc.v1.introspection\"..." type=io.containerd.grpc.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.205286555-07:00" level=info msg="loading plugin \"io.containerd.internal.v1.restart\"..." type=io.containerd.internal.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.205403958-07:00" level=info msg="loading plugin \"io.containerd.grpc.v1.containers\"..." type=io.containerd.grpc.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.205491828-07:00" level=info msg="loading plugin \"io.containerd.grpc.v1.content\"..." type=io.containerd.grpc.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.205568653-07:00" level=info msg="loading plugin \"io.containerd.grpc.v1.diff\"..." type=io.containerd.grpc.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.205648226-07:00" level=info msg="loading plugin \"io.containerd.grpc.v1.events\"..." type=io.containerd.grpc.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.205719790-07:00" level=info msg="loading plugin \"io.containerd.grpc.v1.healthcheck\"..." type=io.containerd.grpc.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.205789802-07:00" level=info msg="loading plugin \"io.containerd.grpc.v1.images\"..." type=io.containerd.grpc.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.205858833-07:00" level=info msg="loading plugin \"io.containerd.grpc.v1.leases\"..." type=io.containerd.grpc.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.205934557-07:00" level=info msg="loading plugin \"io.containerd.grpc.v1.namespaces\"..." type=io.containerd.grpc.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.206015063-07:00" level=info msg="loading plugin \"io.containerd.internal.v1.opt\"..." type=io.containerd.internal.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.206175458-07:00" level=info msg="loading plugin \"io.containerd.grpc.v1.snapshots\"..." type=io.containerd.grpc.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.209389255-07:00" level=info msg="loading plugin \"io.containerd.grpc.v1.tasks\"..." type=io.containerd.grpc.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.209569204-07:00" level=info msg="loading plugin \"io.containerd.grpc.v1.version\"..." type=io.containerd.grpc.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.209670052-07:00" level=info msg="loading plugin \"io.containerd.tracing.processor.v1.otlp\"..." type=io.containerd.tracing.processor.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.209832522-07:00" level=info msg="skip loading plugin \"io.containerd.tracing.processor.v1.otlp\"..." error="no OpenTelemetry endpoint: skip plugin" type=io.containerd.tracing.processor.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.209912037-07:00" level=info msg="loading plugin \"io.containerd.internal.v1.tracing\"..." type=io.containerd.internal.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.210113358-07:00" level=error msg="failed to initialize a tracing processor \"otlp\"" error="no OpenTelemetry endpoint: skip plugin"
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.211156374-07:00" level=info msg=serving... address=/var/run/docker/containerd/containerd-debug.sock
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.211531426-07:00" level=info msg=serving... address=/var/run/docker/containerd/containerd.sock.ttrpc
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.211662344-07:00" level=info msg=serving... address=/var/run/docker/containerd/containerd.sock
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.211736920-07:00" level=info msg="containerd successfully booted in 0.038934s"
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.225142806-07:00" level=info msg="parsed scheme: \"unix\"" module=grpc
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.225280438-07:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.225384775-07:00" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/docker/containerd/containerd.sock <nil> 0 <nil>}] <nil> <nil>}" module=grpc
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.225519679-07:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.229535732-07:00" level=info msg="parsed scheme: \"unix\"" module=grpc
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.229567923-07:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.229606224-07:00" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/docker/containerd/containerd.sock <nil> 0 <nil>}] <nil> <nil>}" module=grpc
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.229623973-07:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.249833626-07:00" level=info msg="Loading containers: start."
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.366721376-07:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.470044352-07:00" level=info msg="Loading containers: done."
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.481283885-07:00" level=warning msg="Not using native diff for overlay2, this may cause degraded performance for building images: kernel has CONFIG_OVERLAY_FS_REDIRECT_DIR enabled" storage-driver=overlay2
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.481634808-07:00" level=info msg="Docker daemon" commit=f756502 graphdriver(s)=overlay2 version=20.10.16
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.481778501-07:00" level=info msg="Daemon has completed initialization"
Jun 04 06:44:45 ts_080027b3b812 systemd[1]: Started Docker Application Container Engine.
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.512160761-07:00" level=info msg="API listen on /run/docker.sock"
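A few diagnostic commands that could narrow this down (the TCP port 2375 below is only an assumption about how remote access is configured on this image):
systemctl cat docker.socket docker.service   # check ListenStream= and any -H flags passed to dockerd
ss -ltnp | grep -E 'dockerd|2375'            # is anything actually listening on the TCP endpoint?
journalctl -b -u docker.socket -u docker.service --no-pager | tail -n 50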

Gitlab pipeline access to docker repo?

GitLab is installed on a Linux machine and the pipeline failed because it doesn't have access to the Docker repository (the Docker registry inside GitLab), with the error below:
ERROR: Preparation failed: Error response from daemon: Get
https://docker.*****/v2/operator/kubectl/manifests/1.15: unauthorized: HTTP
Basic: Access denied (executor_docker.go:188:0s)
I think I found the issue with the bridge, as shown below:
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:dc:8d:c0:4b brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
systemctl status docker
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2020-07-21 11:54:45 UTC; 19min ago
Docs: https://docs.docker.com
Main PID: 4488 (dockerd)
Tasks: 37
CGroup: /system.slice/docker.service
└─4488 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
Jul 21 11:54:44 agora dockerd[4488]: time="2020-07-21T11:54:44.168245639Z" level=warning msg="Your kernel does not support cgroup rt period"
Jul 21 11:54:44 agora dockerd[4488]: time="2020-07-21T11:54:44.168261691Z" level=warning msg="Your kernel does not support cgroup rt runtime"
Jul 21 11:54:44 agora dockerd[4488]: time="2020-07-21T11:54:44.168739695Z" level=info msg="Loading containers: start."
Jul 21 11:54:44 agora dockerd[4488]: time="2020-07-21T11:54:44.660694452Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/
Jul 21 11:54:44 agora dockerd[4488]: time="2020-07-21T11:54:44.843200704Z" level=info msg="Loading containers: done."
Jul 21 11:54:44 agora dockerd[4488]: time="2020-07-21T11:54:44.870880302Z" level=warning msg="failed to retrieve runc version: unknown output format: runc ver
Jul 21 11:54:45 agora dockerd[4488]: time="2020-07-21T11:54:45.887294851Z" level=info msg="Docker daemon" commit=2d0083d graphdriver(s)=overlay2 version=18.09
Jul 21 11:54:45 agora dockerd[4488]: time="2020-07-21T11:54:45.887403804Z" level=info msg="Daemon has completed initialization"
Jul 21 11:54:45 agora dockerd[4488]: time="2020-07-21T11:54:45.928154658Z" level=info msg="API listen on /var/run/docker.sock"
Jul 21 11:54:45 agora systemd[1]: Started Docker Application Container Engine.
You need to log in to the Docker registry in your pipeline:
before_script:
  - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
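To confirm the credentials and registry access outside of CI, a manual test from the runner host could look like this (the registry hostname is masked in the question, so registry.example.com below is only a placeholder):
docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" registry.example.com
docker pull registry.example.com/operator/kubectl:1.15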

Prometheus won't start: could not use snapshotter btrfs

I have Prometheus installed on an Amazon Linux instance, and here is the status of the service:
[ec2-user@ip-10-193-192-56 ~]$ sudo systemctl status prometheus
● prometheus.service - Prometheus Server
Loaded: loaded (/etc/systemd/system/prometheus.service; enabled; vendor preset: disabled)
Active: failed (Result: start-limit) since Tue 2020-01-28 16:27:41 UTC; 16h ago
Docs: https://prometheus.io/docs/introduction/overview/
Main PID: 21129 (code=exited, status=1/FAILURE)
Jan 28 16:27:40 ip-10-193-192-56.service.essilor systemd[1]: prometheus.service: main process exited, code=exited, status=1/FAILURE
Jan 28 16:27:40 ip-10-193-192-56.service.essilor systemd[1]: Unit prometheus.service entered failed state.
Jan 28 16:27:40 ip-10-193-192-56.service.essilor systemd[1]: prometheus.service failed.
Jan 28 16:27:41 ip-10-193-192-56.service.essilor systemd[1]: prometheus.service holdoff time over, scheduling restart.
Jan 28 16:27:41 ip-10-193-192-56.service.essilor systemd[1]: start request repeated too quickly for prometheus.service
Jan 28 16:27:41 ip-10-193-192-56.service.essilor systemd[1]: Failed to start Prometheus Server.
Jan 28 16:27:41 ip-10-193-192-56.service.essilor systemd[1]: Unit prometheus.service entered failed state.
Jan 28 16:27:41 ip-10-193-192-56.service.essilor systemd[1]: prometheus.service failed.
[ec2-user@ip-10-193-192-56 ~]$
When I look at the logs, I see the following:
[ec2-user@ip-10-193-192-56 ~]$ sudo cat /var/log/messages | grep "error"
Jan 28 16:27:37 ip-10-193-192-56 prometheus: level=error ts=2020-01-28T16:27:37.650Z caller=main.go:736 err="opening storage failed: block dir: \"/app/prometheus/data/01DZ9119BY4ZGCSRF1H27TDXSB\": open /app/prometheus/data/01DZ9119BY4ZGCSRF1H27TDXSB/meta.json: no such file or directory"
Jan 28 16:27:38 ip-10-193-192-56 prometheus: level=error ts=2020-01-28T16:27:38.716Z caller=main.go:736 err="opening storage failed: block dir: \"/app/prometheus/data/01DZ9119BY4ZGCSRF1H27TDXSB\": open /app/prometheus/data/01DZ9119BY4ZGCSRF1H27TDXSB/meta.json: no such file or directory"
Jan 28 16:27:39 ip-10-193-192-56 prometheus: level=error ts=2020-01-28T16:27:39.340Z caller=main.go:736 err="opening storage failed: block dir: \"/app/prometheus/data/01DZ9119BY4ZGCSRF1H27TDXSB\": open /app/prometheus/data/01DZ9119BY4ZGCSRF1H27TDXSB/meta.json: no such file or directory"
Jan 28 16:27:40 ip-10-193-192-56 prometheus: level=error ts=2020-01-28T16:27:40.142Z caller=main.go:736 err="opening storage failed: block dir: \"/app/prometheus/data/01DZ9119BY4ZGCSRF1H27TDXSB\": open /app/prometheus/data/01DZ9119BY4ZGCSRF1H27TDXSB/meta.json: no such file or directory"
Jan 28 16:27:40 ip-10-193-192-56 prometheus: level=error ts=2020-01-28T16:27:40.946Z caller=main.go:736 err="opening storage failed: block dir: \"/app/prometheus/data/01DZ9119BY4ZGCSRF1H27TDXSB\": open /app/prometheus/data/01DZ9119BY4ZGCSRF1H27TDXSB/meta.json: no such file or directory"
Jan 28 16:27:46 ip-10-193-192-56 containerd: time="2020-01-28T16:27:46.134041576Z" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.btrfs" error="path /var/lib/containerd/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter"
Jan 28 16:27:46 ip-10-193-192-56 containerd: time="2020-01-28T16:27:46.136910879Z" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.aufs" error="modprobe aufs failed: "modprobe: FATAL: Module aufs not found in directory /lib/modules/4.14.77-81.59.amzn2.x86_64\n": exit status 1"
Jan 28 16:27:46 ip-10-193-192-56 containerd: time="2020-01-28T16:27:46.138682614Z" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.zfs" error="path /var/lib/containerd/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter"
Jan 28 16:27:46 ip-10-193-192-56 containerd: time="2020-01-28T16:27:46.139296419Z" level=warning msg="could not use snapshotter zfs in metadata plugin" error="path /var/lib/containerd/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter"
Jan 28 16:27:46 ip-10-193-192-56 containerd: time="2020-01-28T16:27:46.139604765Z" level=warning msg="could not use snapshotter btrfs in metadata plugin" error="path /var/lib/containerd/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter"
Jan 28 16:27:46 ip-10-193-192-56 containerd: time="2020-01-28T16:27:46.139907894Z" level=warning msg="could not use snapshotter aufs in metadata plugin" error="modprobe aufs failed: "modprobe: FATAL: Module aufs not found in directory /lib/modules/4.14.77-81.59.amzn2.x86_64\n": exit status 1"
Jan 28 16:27:46 ip-10-193-192-56 containerd: time="2020-01-28T16:27:46.159112796Z" level=error msg="Failed to load cni during init, please check CRI plugin status before setting up network for pods" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"
[ec2-user@ip-10-193-192-56 ~]$
There are many errors here and I don't know what to do about them. I tried deleting the broken block /app/prometheus/data/01DZ9119BY4ZGCSRF1H27TDXSB, but it didn't change anything.
/app/prometheus/data is on an AWS EFS volume.
Delete the directory /app/prometheus/data/01DZ9119BY4ZGCSRF1H27TDXSB and restart the server.
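A minimal sketch of that procedure, using the paths and unit name from the question; reset-failed is included because the unit already hit its start limit:
sudo systemctl stop prometheus
sudo rm -rf /app/prometheus/data/01DZ9119BY4ZGCSRF1H27TDXSB
sudo systemctl reset-failed prometheus   # clear the start-limit state
sudo systemctl start prometheus
sudo systemctl status prometheus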

monitoring cassandra with prometheus monitoring tool

My Prometheus instance is on a CentOS 7 machine and Cassandra is on CentOS 6. I am trying to monitor Cassandra's JMX port 7199 with Prometheus, but I keep getting errors with my YML file. I'm not sure why I can't connect to the CentOS 6 (Cassandra) machine. Is my YAML file wrong, or does it have something to do with JMX port 7199?
Here is my YAML file:
# my global config
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: cassandra
    static_configs:
      - targets: ['10.1.0.22:7199']
Here is my prometheus log:
level=info ts=2017-12-08T04:30:53.92549611Z caller=main.go:215 msg="Starting Prometheus" version="(version=2.0.0, branch=HEAD, revision=0a74f98628a0463dddc90528220c94de5032d1a0)"
level=info ts=2017-12-08T04:30:53.925623847Z caller=main.go:216 build_context="(go=go1.9.2, user=root#615b82cb36b6, date=20171108-07:11:59)"
level=info ts=2017-12-08T04:30:53.92566228Z caller=main.go:217 host_details="(Linux 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 20:32:50 UTC 2017 x86_64 localhost.localdomain (none))"
level=info ts=2017-12-08T04:30:53.932807536Z caller=web.go:380 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2017-12-08T04:30:53.93303681Z caller=targetmanager.go:71 component="target manager" msg="Starting target manager..."
level=info ts=2017-12-08T04:30:53.932905473Z caller=main.go:314 msg="Starting TSDB"
level=info ts=2017-12-08T04:30:53.987468942Z caller=main.go:326 msg="TSDB started"
level=info ts=2017-12-08T04:30:53.987582063Z caller=main.go:394 msg="Loading configuration file" filename=prometheus.yml
level=info ts=2017-12-08T04:30:53.988366778Z caller=main.go:371 msg="Server is ready to receive requests."
level=warn ts=2017-12-08T04:31:00.561007282Z caller=main.go:377 msg="Received SIGTERM, exiting gracefully..."
level=info ts=2017-12-08T04:31:00.563191668Z caller=main.go:384 msg="See you next time!"
level=info ts=2017-12-08T04:31:00.566231211Z caller=targetmanager.go:87 component="target manager" msg="Stopping target manager..."
level=info ts=2017-12-08T04:31:00.567070099Z caller=targetmanager.go:99 component="target manager" msg="Target manager stopped"
level=info ts=2017-12-08T04:31:00.567136027Z caller=manager.go:455 component="rule manager" msg="Stopping rule manager..."
level=info ts=2017-12-08T04:31:00.567162215Z caller=manager.go:461 component="rule manager" msg="Rule manager stopped"
level=info ts=2017-12-08T04:31:00.567186356Z caller=notifier.go:483 component=notifier msg="Stopping notification handler..."
If anyone has instructions on how to connect Prometheus to Cassandra when they are on two different machines, that would be helpful too.
This is not a problem with your config; Prometheus received a TERM signal and terminated gracefully.
If you are not getting metrics, check whether 10.1.0.22:7199/metrics loads and returns metrics. You can also check the prometheus server's /targets endpoint for scraping status.
If you're not getting anything on your cassandra server's /metrics endpoint, it could be because you did not configure the cassandra prometheus exporter properly.
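For reference, Prometheus cannot scrape the raw JMX protocol on port 7199 directly; a common setup (the file paths and port 7070 below are assumptions) runs the Prometheus JMX exporter as a Java agent inside Cassandra and scrapes that HTTP port instead:
# on the Cassandra host, add the agent to cassandra-env.sh (example paths):
JVM_OPTS="$JVM_OPTS -javaagent:/opt/jmx_prometheus_javaagent.jar=7070:/opt/cassandra-jmx.yml"
# then change the Prometheus scrape target to 10.1.0.22:7070 and verify the endpoint:
curl http://10.1.0.22:7070/metrics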

docker exec: rpc error: code = 2 desc = oci runtime error: exec failed

Every time I try to run:
$ docker exec
I get the error message:
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:83: executing setns process caused \"exit status 16\""
Session 1 (works like expected):
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
alpine latest baa5d63471ea 7 weeks ago 4.8 MB
hello-world latest c54a2cc56cbb 5 months ago 1.85 kB
$ docker run --rm --name alpine -it alpine sh
/ # pwd
/
Session 2:
$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7bd39b37aee2 alpine "sh" 22 seconds ago Up 21 seconds alpine
$ docker exec -it alpine sh
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:83: executing setns process caused \"exit status 16\""
$ docker exec -it 7bd39b37aee2 sh
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:83: executing setns process caused \"exit status 16\""
/var/log/syslog shows some warnings, but I was neither able to understand the root cause nor to find matching answers.
Thanks for any hint.
= = = = = = = = = = = = = = = = = = = = = = = = =
$ docker info
Containers: 1
Running: 1
Paused: 0
Stopped: 0
Images: 2
Server Version: 1.13.0-rc3
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 4
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 03e5862ec0d8d3b3f750e19fca3ee367e13c090e
runc version: 51371867a01c467f08af739783b8beafc154c4d7
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.4.0-53-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.487 GiB
Name: pb7tt6ts
ID: YQ4G:ETTP:5VCM:PAJD:F3KB:O7JN:AZOF:VLTI:SKH4:BTSR:KP7D:NXIZ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
= = =
/var/log/syslog docker restart and steps above
= = =
Dec 13 14:28:09 pb7tt6ts systemd[1]: Stopping Docker Socket for the API.
Dec 13 14:28:09 pb7tt6ts systemd[1]: Starting Docker Socket for the API.
Dec 13 14:28:09 pb7tt6ts systemd[1]: Listening on Docker Socket for the API.
Dec 13 14:28:09 pb7tt6ts systemd[1]: Starting Docker Application Container Engine...
Dec 13 14:28:09 pb7tt6ts dockerd[1436]: time="2016-12-13T14:28:09.291301057+01:00" level=info msg="libcontainerd: new containerd process, pid: 1448"
Dec 13 14:28:10 pb7tt6ts kernel: [25908.125394] audit: type=1400 audit(1481635690.357:28): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="docker-default" pid=1466 comm="apparmor_parser"
Dec 13 14:28:10 pb7tt6ts dockerd[1436]: time="2016-12-13T14:28:10.370364923+01:00" level=info msg="[graphdriver] using prior storage driver: aufs"
Dec 13 14:28:10 pb7tt6ts dockerd[1436]: time="2016-12-13T14:28:10.387915069+01:00" level=info msg="Graph migration to content-addressability took 0.00 seconds"
Dec 13 14:28:10 pb7tt6ts dockerd[1436]: time="2016-12-13T14:28:10.388367650+01:00" level=warning msg="Your kernel does not support swap memory limit."
Dec 13 14:28:10 pb7tt6ts dockerd[1436]: time="2016-12-13T14:28:10.388465142+01:00" level=warning msg="Your kernel does not support cgroup rt period"
Dec 13 14:28:10 pb7tt6ts dockerd[1436]: time="2016-12-13T14:28:10.388508739+01:00" level=warning msg="Your kernel does not support cgroup rt runtime"
Dec 13 14:28:10 pb7tt6ts dockerd[1436]: time="2016-12-13T14:28:10.389419384+01:00" level=info msg="Loading containers: start."
Dec 13 14:28:10 pb7tt6ts dockerd[1436]: time="2016-12-13T14:28:10.397339748+01:00" level=info msg="Firewalld running: false"
Dec 13 14:28:10 pb7tt6ts dockerd[1436]: time="2016-12-13T14:28:10.628011070+01:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
Dec 13 14:28:10 pb7tt6ts dockerd[1436]: time="2016-12-13T14:28:10.743703578+01:00" level=info msg="Loading containers: done."
Dec 13 14:28:10 pb7tt6ts kernel: [25908.510718] aufs au_opts_verify:1597:dockerd[1462]: dirperm1 breaks the protection by the permission bits on the lower branch
Dec 13 14:28:10 pb7tt6ts dockerd[1436]: time="2016-12-13T14:28:10.808510166+01:00" level=info msg="Daemon has completed initialization"
Dec 13 14:28:10 pb7tt6ts dockerd[1436]: time="2016-12-13T14:28:10.808575966+01:00" level=info msg="Docker daemon" commit=4d92237 graphdriver=aufs version=1.13.0-rc3
Dec 13 14:28:10 pb7tt6ts dockerd[1436]: time="2016-12-13T14:28:10.820562161+01:00" level=info msg="API listen on /var/run/docker.sock"
Dec 13 14:28:10 pb7tt6ts systemd[1]: Started Docker Application Container Engine.
Dec 13 14:28:10 pb7tt6ts console-kit-daemon[3106]: console-kit-daemon[3106]: GLib-CRITICAL: Source ID 226 was not found when attempting to remove it
Dec 13 14:28:10 pb7tt6ts console-kit-daemon[3106]: GLib-CRITICAL: Source ID 226 was not found when attempting to remove it
Dec 13 14:28:16 pb7tt6ts kernel: [25914.206672] aufs au_opts_verify:1597:dockerd[1460]: dirperm1 breaks the protection by the permission bits on the lower branch
Dec 13 14:28:16 pb7tt6ts kernel: [25914.388393] aufs au_opts_verify:1597:dockerd[1460]: dirperm1 breaks the protection by the permission bits on the lower branch
Dec 13 14:28:16 pb7tt6ts kernel: [25914.492197] aufs au_opts_verify:1597:dockerd[1460]: dirperm1 breaks the protection by the permission bits on the lower branch
Dec 13 14:28:16 pb7tt6ts NetworkManager[1343]: <warn> [1481635696.7320] device (vethff6f844): failed to find device 35 'vethff6f844' with udev
Dec 13 14:28:16 pb7tt6ts NetworkManager[1343]: <info> [1481635696.7340] manager: (vethff6f844): new Veth device (/org/freedesktop/NetworkManager/Devices/46)
Dec 13 14:28:16 pb7tt6ts systemd-udevd[1614]: Could not generate persistent MAC address for vethff6f844: No such file or directory
Dec 13 14:28:16 pb7tt6ts NetworkManager[1343]: <warn> [1481635696.7345] device (veth13c2a1d): failed to find device 36 'veth13c2a1d' with udev
Dec 13 14:28:16 pb7tt6ts systemd-udevd[1615]: Could not generate persistent MAC address for veth13c2a1d: No such file or directory
Dec 13 14:28:16 pb7tt6ts NetworkManager[1343]: <info> [1481635696.7417] manager: (veth13c2a1d): new Veth device (/org/freedesktop/NetworkManager/Devices/47)
Dec 13 14:28:16 pb7tt6ts kernel: [25914.509027] device veth13c2a1d entered promiscuous mode
Dec 13 14:28:16 pb7tt6ts kernel: [25914.509240] IPv6: ADDRCONF(NETDEV_UP): veth13c2a1d: link is not ready
Dec 13 14:28:16 pb7tt6ts NetworkManager[1343]: <info> [1481635696.7632] devices added (path: /sys/devices/virtual/net/vethff6f844, iface: vethff6f844)
Dec 13 14:28:16 pb7tt6ts NetworkManager[1343]: <info> [1481635696.7632] device added (path: /sys/devices/virtual/net/vethff6f844, iface: vethff6f844): no ifupdown configuration found.
Dec 13 14:28:16 pb7tt6ts NetworkManager[1343]: <info> [1481635696.7639] devices added (path: /sys/devices/virtual/net/veth13c2a1d, iface: veth13c2a1d)
Dec 13 14:28:16 pb7tt6ts NetworkManager[1343]: <info> [1481635696.7640] device added (path: /sys/devices/virtual/net/veth13c2a1d, iface: veth13c2a1d): no ifupdown configuration found.
Dec 13 14:28:16 pb7tt6ts dockerd[1436]: time="2016-12-13T14:28:16.965015836+01:00" level=warning msg="Your kernel does not support swap memory limit."
Dec 13 14:28:16 pb7tt6ts dockerd[1436]: time="2016-12-13T14:28:16.965090775+01:00" level=warning msg="Your kernel does not support cgroup rt period"
Dec 13 14:28:16 pb7tt6ts dockerd[1436]: time="2016-12-13T14:28:16.965117179+01:00" level=warning msg="Your kernel does not support cgroup rt runtime"
Dec 13 14:28:17 pb7tt6ts kernel: [25914.808163] eth0: renamed from vethff6f844
Dec 13 14:28:17 pb7tt6ts acvpnagent[2339]: Function: tableCallbackHandler File: RouteMgr.cpp Line: 1723 Invoked Function: recv Return Code: 11 (0x0000000B) Description: unknown
Dec 13 14:28:17 pb7tt6ts NetworkManager[1343]: <info> [1481635697.0599] devices removed (path: /sys/devices/virtual/net/vethff6f844, iface: vethff6f844)
Dec 13 14:28:17 pb7tt6ts acvpnagent[2339]: A new network interface has been detected.
Dec 13 14:28:17 pb7tt6ts NetworkManager[1343]: <info> [1481635697.0600] device (vethff6f844): driver 'veth' does not support carrier detection.
Dec 13 14:28:17 pb7tt6ts acvpnagent[2339]: Function: logInterfaces File: RouteMgr.cpp Line: 2105 Invoked Function: logInterfaces Return Code: 0 (0x00000000) Description: IP Address Interface List: 192.168.178.24 172.17.0.1 9.145.68.34 FE80:0:0:0:D8B4:C1E0:F8E4:DB77 FE80:0:0:0:42:44FF:FEC9:5D85 FE80:0:0:0:60A9:A1FF:FEED:F31C
Dec 13 14:28:17 pb7tt6ts NetworkManager[1343]: <info> [1481635697.0604] device (veth13c2a1d): link connected
Dec 13 14:28:17 pb7tt6ts NetworkManager[1343]: <info> [1481635697.0605] device (docker0): link connected
Dec 13 14:28:17 pb7tt6ts kernel: [25914.823988] IPv6: ADDRCONF(NETDEV_CHANGE): veth13c2a1d: link becomes ready
Dec 13 14:28:17 pb7tt6ts kernel: [25914.824039] docker0: port 1(veth13c2a1d) entered forwarding state
Dec 13 14:28:17 pb7tt6ts kernel: [25914.824061] docker0: port 1(veth13c2a1d) entered forwarding state
Dec 13 14:28:18 pb7tt6ts acvpnagent[2339]: Function: tableCallbackHandler File: RouteMgr.cpp Line: 1723 Invoked Function: recv Return Code: 11 (0x0000000B) Description: unknown
Dec 13 14:28:18 pb7tt6ts avahi-daemon[1217]: Joining mDNS multicast group on interface veth13c2a1d.IPv6 with address fe80::60a9:a1ff:feed:f31c.
Dec 13 14:28:18 pb7tt6ts avahi-daemon[1217]: New relevant interface veth13c2a1d.IPv6 for mDNS.
Dec 13 14:28:18 pb7tt6ts avahi-daemon[1217]: Registering new address record for fe80::60a9:a1ff:feed:f31c on veth13c2a1d.*.
Dec 13 14:28:32 pb7tt6ts kernel: [25929.850840] docker0: port 1(veth13c2a1d) entered forwarding state
Dec 13 14:28:36 pb7tt6ts dockerd[1436]: time="2016-12-13T14:28:36.704565159+01:00" level=error msg="Error running exec in container: rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused \"process_linux.go:83: executing setns process caused \\\"exit status 16\\\"\"\n"
Dec 13 14:28:36 pb7tt6ts dockerd[1436]: time="2016-12-13T14:28:36.705362948+01:00" level=error msg="Handler for POST /v1.25/exec/8a78f29ef71d4c3ab982a8dd7a4a325e280766072dea7337860874a72c42f42c/resize returned error: rpc error: code = 2 desc = containerd: process not found for container"
Dec 13 14:28:46 pb7tt6ts dockerd[1436]: time="2016-12-13T14:28:46.921880770+01:00" level=error msg="Error running exec in container: rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused \"process_linux.go:83: executing setns process caused \\\"exit status 16\\\"\"\n"
Dec 13 14:28:46 pb7tt6ts dockerd[1436]: time="2016-12-13T14:28:46.922576933+01:00" level=error msg="Handler for POST /v1.25/exec/5ad25668cac553118b8c702f02c69b427436eb67d1488d4170641bcacfdad50b/resize returned error: rpc error: code = 2 desc = containerd: process not found for container"
As recommended, I reverted to a mainline version of Docker and installed docker-engine 1.12.4:
$ docker info
Containers: 2
Running: 1
Paused: 0
Stopped: 1
Images: 3
Server Version: 1.12.4
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 11
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: host bridge null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: apparmor seccomp
Kernel Version: 4.4.0-53-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.487 GiB
Name: pb7tt6ts
ID: YQ4G:ETTP:5VCM:PAJD:F3KB:O7JN:AZOF:VLTI:SKH4:BTSR:KP7D:NXIZ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
127.0.0.0/8
Still no success, but a different error:
$ docker exec -it alpine sh
rpc error: code = 13 desc = invalid header field value "oci runtime error: exec failed: container_linux.go:247: starting container process caused \"process_linux.go:83: executing setns process caused \\\"exit status 17\\\"\"\n"
Corresponding /var/log/syslog from service docker start (21:00), docker run ... (21:01), docker exec ... (21:01)
Dec 13 21:00:01 pb7tt6ts systemd[1]: Starting Docker Socket for the API.
Dec 13 21:00:01 pb7tt6ts systemd[1]: Listening on Docker Socket for the API.
Dec 13 21:00:01 pb7tt6ts systemd[1]: Starting Docker Application Container Engine...
Dec 13 21:00:01 pb7tt6ts dockerd[8675]: time="2016-12-13T21:00:01.468921183+01:00" level=info msg="libcontainerd: new containerd process, pid: 8686"
Dec 13 21:00:02 pb7tt6ts kernel: [49419.124965] audit: type=1400 audit(1481659202.536:37): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="docker-default" pid=8700 comm="apparmor_parser"
Dec 13 21:00:02 pb7tt6ts dockerd[8675]: time="2016-12-13T21:00:02.550070413+01:00" level=info msg="[graphdriver] using prior storage driver \"aufs\""
Dec 13 21:00:02 pb7tt6ts dockerd[8675]: time="2016-12-13T21:00:02.572067603+01:00" level=info msg="Graph migration to content-addressability took 0.00 seconds"
Dec 13 21:00:02 pb7tt6ts dockerd[8675]: time="2016-12-13T21:00:02.572336166+01:00" level=warning msg="Your kernel does not support swap memory limit."
Dec 13 21:00:02 pb7tt6ts dockerd[8675]: time="2016-12-13T21:00:02.572799562+01:00" level=info msg="Loading containers: start."
Dec 13 21:00:02 pb7tt6ts dockerd[8675]: time="2016-12-13T21:00:02.579465999+01:00" level=info msg="Firewalld running: false"
Dec 13 21:00:02 pb7tt6ts dockerd[8675]: time="2016-12-13T21:00:02.779165187+01:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
Dec 13 21:00:02 pb7tt6ts dockerd[8675]: time="2016-12-13T21:00:02.903085523+01:00" level=info msg="Loading containers: done."
Dec 13 21:00:02 pb7tt6ts dockerd[8675]: time="2016-12-13T21:00:02.903179108+01:00" level=info msg="Daemon has completed initialization"
Dec 13 21:00:02 pb7tt6ts dockerd[8675]: time="2016-12-13T21:00:02.903208197+01:00" level=info msg="Docker daemon" commit=1564f02 graphdriver=aufs version=1.12.4
Dec 13 21:00:02 pb7tt6ts dockerd[8675]: time="2016-12-13T21:00:02.923282443+01:00" level=info msg="API listen on /var/run/docker.sock"
Dec 13 21:00:02 pb7tt6ts systemd[1]: Started Docker Application Container Engine.
Dec 13 21:01:01 pb7tt6ts kernel: [49477.834789] aufs au_opts_verify:1597:dockerd[8692]: dirperm1 breaks the protection by the permission bits on the lower branch
Dec 13 21:01:01 pb7tt6ts kernel: [49477.896566] aufs au_opts_verify:1597:dockerd[8692]: dirperm1 breaks the protection by the permission bits on the lower branch
Dec 13 21:01:01 pb7tt6ts kernel: [49478.080340] aufs au_opts_verify:1597:dockerd[8692]: dirperm1 breaks the protection by the permission bits on the lower branch
Dec 13 21:01:01 pb7tt6ts kernel: [49478.192100] aufs au_opts_verify:1597:dockerd[8682]: dirperm1 breaks the protection by the permission bits on the lower branch
Dec 13 21:01:01 pb7tt6ts NetworkManager[1343]: <warn> [1481659261.6125] device (veth2b5b07c): failed to find device 47 'veth2b5b07c' with udev
Dec 13 21:01:01 pb7tt6ts systemd-udevd[8810]: Could not generate persistent MAC address for vethc2e4873: No such file or directory
Dec 13 21:01:01 pb7tt6ts kernel: [49478.196917] device vethc2e4873 entered promiscuous mode
Dec 13 21:01:01 pb7tt6ts NetworkManager[1343]: <info> [1481659261.6215] manager: (veth2b5b07c): new Veth device (/org/freedesktop/NetworkManager/Devices/63)
Dec 13 21:01:01 pb7tt6ts NetworkManager[1343]: <warn> [1481659261.6222] device (vethc2e4873): failed to find device 48 'vethc2e4873' with udev
Dec 13 21:01:01 pb7tt6ts NetworkManager[1343]: <info> [1481659261.6241] manager: (vethc2e4873): new Veth device (/org/freedesktop/NetworkManager/Devices/64)
Dec 13 21:01:01 pb7tt6ts systemd-udevd[8809]: Could not generate persistent MAC address for veth2b5b07c: No such file or directory
Dec 13 21:01:01 pb7tt6ts kernel: [49478.211913] IPv6: ADDRCONF(NETDEV_UP): vethc2e4873: link is not ready
Dec 13 21:01:01 pb7tt6ts NetworkManager[1343]: <info> [1481659261.6454] devices added (path: /sys/devices/virtual/net/veth2b5b07c, iface: veth2b5b07c)
Dec 13 21:01:01 pb7tt6ts NetworkManager[1343]: <info> [1481659261.6454] device added (path: /sys/devices/virtual/net/veth2b5b07c, iface: veth2b5b07c): no ifupdown configuration found.
Dec 13 21:01:01 pb7tt6ts NetworkManager[1343]: <info> [1481659261.6507] devices added (path: /sys/devices/virtual/net/vethc2e4873, iface: vethc2e4873)
Dec 13 21:01:01 pb7tt6ts NetworkManager[1343]: <info> [1481659261.6507] device added (path: /sys/devices/virtual/net/vethc2e4873, iface: vethc2e4873): no ifupdown configuration found.
Dec 13 21:01:01 pb7tt6ts kernel: [49478.557310] eth0: renamed from veth2b5b07c
Dec 13 21:01:01 pb7tt6ts NetworkManager[1343]: <info> [1481659261.9915] devices removed (path: /sys/devices/virtual/net/veth2b5b07c, iface: veth2b5b07c)
Dec 13 21:01:01 pb7tt6ts NetworkManager[1343]: <info> [1481659261.9916] device (veth2b5b07c): driver 'veth' does not support carrier detection.
Dec 13 21:01:01 pb7tt6ts NetworkManager[1343]: <info> [1481659261.9919] device (vethc2e4873): link connected
Dec 13 21:01:01 pb7tt6ts NetworkManager[1343]: <info> [1481659261.9937] device (docker0): link connected
Dec 13 21:01:01 pb7tt6ts kernel: [49478.573434] IPv6: ADDRCONF(NETDEV_CHANGE): vethc2e4873: link becomes ready
Dec 13 21:01:01 pb7tt6ts kernel: [49478.573503] docker0: port 1(vethc2e4873) entered forwarding state
Dec 13 21:01:01 pb7tt6ts kernel: [49478.573527] docker0: port 1(vethc2e4873) entered forwarding state
Dec 13 21:01:03 pb7tt6ts avahi-daemon[1217]: Joining mDNS multicast group on interface vethc2e4873.IPv6 with address fe80::d02a:ecff:fea8:662c.
Dec 13 21:01:03 pb7tt6ts avahi-daemon[1217]: New relevant interface vethc2e4873.IPv6 for mDNS.
Dec 13 21:01:03 pb7tt6ts avahi-daemon[1217]: Registering new address record for fe80::d02a:ecff:fea8:662c on vethc2e4873.*.
Dec 13 21:01:17 pb7tt6ts kernel: [49493.628038] docker0: port 1(vethc2e4873) entered forwarding state
Dec 13 21:02:02 pb7tt6ts dockerd[8675]: time="2016-12-13T21:02:02.072027206+01:00" level=error msg="Error running exec in container: rpc error: code = 13 desc = invalid header field value \"oci runtime error: exec failed: container_linux.go:247: starting container process caused \\\"process_linux.go:83: executing setns process caused \\\\\\\"exit status 17\\\\\\\"\\\"\\n\""
Dec 13 21:02:02 pb7tt6ts dockerd[8675]: time="2016-12-13T21:02:02.072759152+01:00" level=error msg="Handler for POST /v1.24/exec/00c0dcac7a178129a17cd9eb833d154d428f2a6efbcd0f421ab3c5c54e52a236/resize returned error: rpc error: code = 2 desc = containerd: process not found for container"
From the linked issue, this comment appears to describe the root cause:
I think I found the root reason. It has nothing to do with Docker.
Actually docker exec always fails because of Symantec AutoProtect
running on my system. It loads a custom kernel module that adds some
file operation hooks, which affects the result of setns.
$ lsmod | grep symev
symev_custom_dkms_x86_64 72166 2 symap_custom_dkms_x86_64
The workaround is to disable Symantec AutoProtect and reboot.
sudo update-rc.d autoprotect disable
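After disabling it and rebooting, a quick way to verify (the container name comes from the session above):
lsmod | grep symev      # should now print nothing
docker exec -it alpine sh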
