I use Grafana to monitor my company's infrastructure. Everything worked fine until this week, when I started to see alerts in Grafana with this error message:
request handler error: Post "http://prometheus-ip:9090/api/v1/query_range": dial tcp prometheus-ip:9090: i/o timeout
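For reference, the timeout can be reproduced outside Grafana with a quick connectivity check from the Grafana host (a sketch; prometheus-ip is the placeholder from the error above):
curl -sv --max-time 10 'http://prometheus-ip:9090/api/v1/query?query=up'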
I tried to restart the Prometheus server, but it seems it can't be stopped gracefully; I have to kill -9 the process and start it again. Here's the log:
Jun 16 01:04:01 prometheus prometheus[18869]: time="2022-06-16T01:04:01+02:00" level=info msg="All requests for rebuilding the label indexes queued. (Actual processing may lag behind.)" source="crashrecovery.go:529"
Jun 16 01:04:01 prometheus prometheus[18869]: time="2022-06-16T01:04:01+02:00" level=info msg="Checkpointing fingerprint mappings..." source="persistence.go:1480"
Jun 16 01:04:02 prometheus prometheus[18869]: time="2022-06-16T01:04:02+02:00" level=info msg="Done checkpointing fingerprint mappings in 286.224481ms." source="persistence.go:1503"
Jun 16 01:04:02 prometheus prometheus[18869]: time="2022-06-16T01:04:02+02:00" level=warning msg="Crash recovery complete." source="crashrecovery.go:152"
Jun 16 01:04:02 prometheus prometheus[18869]: time="2022-06-16T01:04:02+02:00" level=info msg="362306 series loaded." source="storage.go:378"
Jun 16 01:04:02 prometheus prometheus[18869]: time="2022-06-16T01:04:02+02:00" level=info msg="Starting target manager..." source="targetmanager.go:61"
Jun 16 01:04:02 prometheus prometheus[18869]: time="2022-06-16T01:04:02+02:00" level=info msg="Listening on :9090" source="web.go:235"
Jun 16 01:04:15 prometheus prometheus[18869]: time="2022-06-16T01:04:15+02:00" level=warning msg="Storage has entered rushed mode." chunksToPersist=420483 maxChunksToPersist=524288 maxMemoryChunks=1048576 memoryChunks=655877 source="storage.go:1660" urgencyScore=0.8020076751708984
Jun 16 01:09:02 prometheus prometheus[18869]: time="2022-06-16T01:09:02+02:00" level=info msg="Checkpointing in-memory metrics and chunks..." source="persistence.go:612"
Jun 16 01:10:05 prometheus prometheus[18869]: time="2022-06-16T01:10:05+02:00" level=info msg="Done checkpointing in-memory metrics and chunks in 1m3.127365726s." source="persistence.go:639"
Jun 16 01:12:25 prometheus prometheus[18869]: time="2022-06-16T01:12:25+02:00" level=warning msg="Received SIGTERM, exiting gracefully..." source="main.go:230"
Jun 16 01:12:25 prometheus prometheus[18869]: time="2022-06-16T01:12:25+02:00" level=info msg="See you next time!" source="main.go:237"
Jun 16 01:12:25 prometheus prometheus[18869]: time="2022-06-16T01:12:25+02:00" level=info msg="Stopping target manager..." source="targetmanager.go:75"
Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Stopping rule manager..." source="manager.go:374"
Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Rule manager stopped." source="manager.go:380"
Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Stopping notification handler..." source="notifier.go:369"
Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Stopping local storage..." source="storage.go:396"
Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Stopping maintenance loop..." source="storage.go:398"
Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Maintenance loop stopped." source="storage.go:1259"
Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Stopping series quarantining..." source="storage.go:402"
Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Series quarantining stopped." source="storage.go:1701"
Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Stopping chunk eviction..." source="storage.go:406"
Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Chunk eviction stopped." source="storage.go:1079"
Jun 16 01:12:28 prometheus prometheus[18869]: time="2022-06-16T01:12:28+02:00" level=info msg="Checkpointing in-memory metrics and chunks..." source="persistence.go:612"
Jun 16 01:12:44 prometheus prometheus[18869]: time="2022-06-16T01:12:44+02:00" level=info msg="Done checkpointing in-memory metrics and chunks in 16.170119611s." source="persistence.go:639"
Jun 16 01:12:44 prometheus prometheus[18869]: time="2022-06-16T01:12:44+02:00" level=info msg="Checkpointing fingerprint mappings..." source="persistence.go:1480"
Jun 16 01:12:45 prometheus prometheus[18869]: time="2022-06-16T01:12:45+02:00" level=info msg="Done checkpointing fingerprint mappings in 651.409422ms." source="persistence.go:1503"
Jun 16 01:12:45 prometheus systemd[1]: prometheus.service: State 'stop-sigterm' timed out. Skipping SIGKILL.
Jun 16 01:13:06 prometheus systemd[1]: prometheus.service: State 'stop-final-sigterm' timed out. Skipping SIGKILL. Entering failed mode.
Jun 16 01:13:06 prometheus systemd[1]: prometheus.service: Unit entered failed state.
Jun 16 01:13:06 prometheus systemd[1]: prometheus.service: Failed with result 'timeout'.
Jun 16 01:13:24 prometheus prometheus[20547]: time="2022-06-16T01:13:24+02:00" level=info msg="Starting prometheus (version=1.5.2+ds, branch=debian/sid, revision=1.5.2+ds-2+b3)" source="main.go:75"
Jun 16 01:13:24 prometheus prometheus[20547]: time="2022-06-16T01:13:24+02:00" level=info msg="Build context (go=go1.7.4, user=pkg-go-maintainers@lists.alioth.debian.org, date=20170521-14:39:14)" source="main.go:76"
Jun 16 01:13:24 prometheus prometheus[20547]: time="2022-06-16T01:13:24+02:00" level=info msg="Loading configuration file /etc/prometheus/prometheus.yml" source="main.go:248"
Jun 16 01:13:24 prometheus prometheus[20547]: time="2022-06-16T01:13:24+02:00" level=error msg="Could not lock /path/to/prometheus/metrics/DIRTY, Prometheus already running?" source="persistence.go:198"
Jun 16 01:13:24 prometheus prometheus[20547]: time="2022-06-16T01:13:24+02:00" level=error msg="Error opening memory series storage: resource temporarily unavailable" source="main.go:182"
Jun 16 01:13:24 prometheus systemd[1]: prometheus.service: Main process exited, code=exited, status=1/FAILURE
Jun 16 01:13:44 prometheus systemd[1]: prometheus.service: State 'stop-sigterm' timed out. Skipping SIGKILL.
Jun 16 01:14:02 prometheus prometheus[18869]: time="2022-06-16T01:14:02+02:00" level=info msg="Local storage stopped." source="storage.go:421"
Jun 16 01:14:02 prometheus systemd[1]: prometheus.service: Unit entered failed state.
Jun 16 01:14:02 prometheus systemd[1]: prometheus.service: Failed with result 'exit-code'.
Jun 16 01:14:03 prometheus systemd[1]: prometheus.service: Service hold-off time over, scheduling restart.
Jun 16 01:14:03 prometheus prometheus[20564]: time="2022-06-16T01:14:03+02:00" level=info msg="Starting prometheus (version=1.5.2+ds, branch=debian/sid, revision=1.5.2+ds-2+b3)" source="main.go:75"
Jun 16 01:14:03 prometheus prometheus[20564]: time="2022-06-16T01:14:03+02:00" level=info msg="Build context (go=go1.7.4, user=pkg-go-maintainers@lists.alioth.debian.org, date=20170521-14:39:14)" source="main.go:76"
Jun 16 01:14:03 prometheus prometheus[20564]: time="2022-06-16T01:14:03+02:00" level=info msg="Loading configuration file /etc/prometheus/prometheus.yml" source="main.go:248"
Jun 16 01:14:04 prometheus prometheus[20564]: time="2022-06-16T01:14:04+02:00" level=info msg="Loading series map and head chunks..." source="storage.go:373"
Jun 16 01:14:08 prometheus prometheus[20564]: time="2022-06-16T01:14:08+02:00" level=info msg="364314 series loaded." source="storage.go:378"
Jun 16 01:14:08 prometheus prometheus[20564]: time="2022-06-16T01:14:08+02:00" level=info msg="Starting target manager..." source="targetmanager.go:61"
Jun 16 01:14:08 prometheus prometheus[20564]: time="2022-06-16T01:14:08+02:00" level=info msg="Listening on :9090" source="web.go:235"
Jun 16 01:14:08 prometheus prometheus[20564]: time="2022-06-16T01:14:08+02:00" level=warning msg="Storage has entered rushed mode." chunksToPersist=448681 maxChunksToPersist=524288 maxMemoryChunks=1048576 memoryChunks=687476 source="storage.go:1660" urgencyScore=0.8557910919189453
When restarted like this, Prometheus enters crash recovery, which takes about 1 h 30 min to complete. When it's done, the logs show the following:
Jun 16 16:10:42 prometheus prometheus[32708]: time="2022-06-16T16:10:42+02:00" level=info msg="Storage does not need throttling anymore." chunksToPersist=524288 maxChunksToPersist=524288 maxToleratedMemChunks=1153433 memoryChunks=1049320 source="storage.go:935"
Jun 16 16:10:42 prometheus prometheus[32708]: time="2022-06-16T16:10:42+02:00" level=error msg="Storage needs throttling. Scrapes and rule evaluations will be skipped." chunksToPersist=525451 maxChunksToPersist=524288 maxToleratedMemChunks=1153433 memoryChunks=1050483 source="storage.go:927"
Jun 16 16:15:31 prometheus prometheus[32708]: time="2022-06-16T16:15:31+02:00" level=info msg="Checkpointing in-memory metrics and chunks..." source="persistence.go:612"
Jun 16 16:16:28 prometheus prometheus[32708]: time="2022-06-16T16:16:28+02:00" level=info msg="Done checkpointing in-memory metrics and chunks in 57.204367083s." source="persistence.go:639"
Checkpointing repeats frequently and takes about a minute each time.
The monitoring for this server shows the following:
Here are the flags used:
/usr/bin/prometheus --storage.local.path /path/to/prometheus/metrics --storage.local.retention=1460h0m0s --storage.local.series-file-shrink-ratio=0.3
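For comparison, Prometheus 1.x exposes flags that directly influence the persistence pressure visible in these logs (the limits correspond to the maxChunksToPersist/maxMemoryChunks values logged above). A sketch with illustrative values only; the exact numbers are assumptions, not tested recommendations, and raising the chunk limits requires correspondingly more RAM:
/usr/bin/prometheus \
  --storage.local.path /path/to/prometheus/metrics \
  --storage.local.retention=1460h0m0s \
  --storage.local.series-file-shrink-ratio=0.3 \
  --storage.local.memory-chunks=2097152 \
  --storage.local.max-chunks-to-persist=1048576 \
  --storage.local.checkpoint-interval=10m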
Prometheus version:
prometheus --version
prometheus, version 1.5.2+ds (branch: debian/sid, revision: 1.5.2+ds-2+b3)
build user: pkg-go-maintainers@lists.alioth.debian.org
build date: 20170521-14:39:14
go version: go1.7.4
I decided to move some metrics to another server, so this one is not as loaded as before. However, it still has to scrape metrics from 50+ other servers. What could be the cause of this?
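One contributing factor visible in the logs above: systemd gives up ('stop-sigterm' timed out) while Prometheus is still checkpointing, and the next start then fails on the DIRTY lock file, forcing crash recovery. A drop-in raising the stop timeout is one possible mitigation (a sketch; the path follows the systemd drop-in convention and the timeout value is an assumption to tune):
# /etc/systemd/system/prometheus.service.d/stop-timeout.conf
[Service]
# Give the shutdown checkpoint enough time to finish before systemd escalates
TimeoutStopSec=15min
Apply it with systemctl daemon-reload before the next restart.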
Related
Based on Thinstation, I built an ISO with a Docker package. The problem is that after the service comes up, a network connection to the host is only available after restarting docker.socket. How can I diagnose Docker to understand why, by default and without restarting that unit, a remote connection to the host over the network is not possible?
Here is the service log:
-- Logs begin at Sat 2022-06-04 06:44:32 PDT, end at Sat 2022-06-04 06:44:45 PDT. --
Jun 04 06:44:39 thinstation-linux systemd[1]: Starting Docker Application Container Engine...
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.146186290-07:00" level=info msg="Starting up"
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.147352611-07:00" level=info msg="libcontainerd: started new containerd process" pid=3130
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.147385993-07:00" level=info msg="parsed scheme: \"unix\"" module=grpc
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.147399661-07:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.147446573-07:00" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/docker/containerd/containerd.sock <nil> 0 <nil>}] <nil> <nil>}" module=grpc
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.147473357-07:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45-07:00" level=warning msg="deprecated version : `1`, please switch to version `2`"
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.173591406-07:00" level=info msg="starting containerd" revision=212e8b6fa2f44b9c21b2798135fc6fb7c53efc16 version=v1.6.4
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.199975908-07:00" level=info msg="loading plugin \"io.containerd.content.v1.content\"..." type=io.containerd.content.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.200078325-07:00" level=info msg="loading plugin \"io.containerd.snapshotter.v1.aufs\"..." type=io.containerd.snapshotter.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.201561989-07:00" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.aufs\"..." error="aufs is not supported (modprobe aufs failed: exit status 1 \"modprobe: FATAL: Module aufs not found in directory /lib/modules/5.10.119TS\\n\"): skip plugin" type=io.containerd.snapshotter.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.201609700-07:00" level=info msg="loading plugin \"io.containerd.snapshotter.v1.btrfs\"..." type=io.containerd.snapshotter.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.201738040-07:00" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.btrfs\"..." error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.btrfs (rootfs) must be a btrfs filesystem to be used with the btrfs snapshotter: skip plugin" type=io.containerd.snapshotter.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.201769872-07:00" level=info msg="loading plugin \"io.containerd.snapshotter.v1.devmapper\"..." type=io.containerd.snapshotter.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.201791635-07:00" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.devmapper" error="devmapper not configured"
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.201809462-07:00" level=info msg="loading plugin \"io.containerd.snapshotter.v1.native\"..." type=io.containerd.snapshotter.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.201863240-07:00" level=info msg="loading plugin \"io.containerd.snapshotter.v1.overlayfs\"..." type=io.containerd.snapshotter.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.202132975-07:00" level=info msg="loading plugin \"io.containerd.snapshotter.v1.zfs\"..." type=io.containerd.snapshotter.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.202304778-07:00" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.zfs\"..." error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter: skip plugin" type=io.containerd.snapshotter.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.202343950-07:00" level=info msg="loading plugin \"io.containerd.metadata.v1.bolt\"..." type=io.containerd.metadata.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.202388411-07:00" level=warning msg="could not use snapshotter devmapper in metadata plugin" error="devmapper not configured"
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.202419790-07:00" level=info msg="metadata content store policy set" policy=shared
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.202662008-07:00" level=info msg="loading plugin \"io.containerd.differ.v1.walking\"..." type=io.containerd.differ.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.202696415-07:00" level=info msg="loading plugin \"io.containerd.event.v1.exchange\"..." type=io.containerd.event.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.202717498-07:00" level=info msg="loading plugin \"io.containerd.gc.v1.scheduler\"..." type=io.containerd.gc.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.202759330-07:00" level=info msg="loading plugin \"io.containerd.service.v1.introspection-service\"..." type=io.containerd.service.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.202783160-07:00" level=info msg="loading plugin \"io.containerd.service.v1.containers-service\"..." type=io.containerd.service.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.202805065-07:00" level=info msg="loading plugin \"io.containerd.service.v1.content-service\"..." type=io.containerd.service.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.202840054-07:00" level=info msg="loading plugin \"io.containerd.service.v1.diff-service\"..." type=io.containerd.service.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.202875755-07:00" level=info msg="loading plugin \"io.containerd.service.v1.images-service\"..." type=io.containerd.service.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.203282487-07:00" level=info msg="loading plugin \"io.containerd.service.v1.leases-service\"..." type=io.containerd.service.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.203309113-07:00" level=info msg="loading plugin \"io.containerd.service.v1.namespaces-service\"..." type=io.containerd.service.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.203330056-07:00" level=info msg="loading plugin \"io.containerd.service.v1.snapshots-service\"..." type=io.containerd.service.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.203351267-07:00" level=info msg="loading plugin \"io.containerd.runtime.v1.linux\"..." type=io.containerd.runtime.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.203471409-07:00" level=info msg="loading plugin \"io.containerd.runtime.v2.task\"..." type=io.containerd.runtime.v2
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.203573569-07:00" level=info msg="loading plugin \"io.containerd.monitor.v1.cgroups\"..." type=io.containerd.monitor.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.205104698-07:00" level=info msg="loading plugin \"io.containerd.service.v1.tasks-service\"..." type=io.containerd.service.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.205208659-07:00" level=info msg="loading plugin \"io.containerd.grpc.v1.introspection\"..." type=io.containerd.grpc.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.205286555-07:00" level=info msg="loading plugin \"io.containerd.internal.v1.restart\"..." type=io.containerd.internal.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.205403958-07:00" level=info msg="loading plugin \"io.containerd.grpc.v1.containers\"..." type=io.containerd.grpc.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.205491828-07:00" level=info msg="loading plugin \"io.containerd.grpc.v1.content\"..." type=io.containerd.grpc.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.205568653-07:00" level=info msg="loading plugin \"io.containerd.grpc.v1.diff\"..." type=io.containerd.grpc.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.205648226-07:00" level=info msg="loading plugin \"io.containerd.grpc.v1.events\"..." type=io.containerd.grpc.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.205719790-07:00" level=info msg="loading plugin \"io.containerd.grpc.v1.healthcheck\"..." type=io.containerd.grpc.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.205789802-07:00" level=info msg="loading plugin \"io.containerd.grpc.v1.images\"..." type=io.containerd.grpc.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.205858833-07:00" level=info msg="loading plugin \"io.containerd.grpc.v1.leases\"..." type=io.containerd.grpc.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.205934557-07:00" level=info msg="loading plugin \"io.containerd.grpc.v1.namespaces\"..." type=io.containerd.grpc.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.206015063-07:00" level=info msg="loading plugin \"io.containerd.internal.v1.opt\"..." type=io.containerd.internal.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.206175458-07:00" level=info msg="loading plugin \"io.containerd.grpc.v1.snapshots\"..." type=io.containerd.grpc.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.209389255-07:00" level=info msg="loading plugin \"io.containerd.grpc.v1.tasks\"..." type=io.containerd.grpc.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.209569204-07:00" level=info msg="loading plugin \"io.containerd.grpc.v1.version\"..." type=io.containerd.grpc.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.209670052-07:00" level=info msg="loading plugin \"io.containerd.tracing.processor.v1.otlp\"..." type=io.containerd.tracing.processor.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.209832522-07:00" level=info msg="skip loading plugin \"io.containerd.tracing.processor.v1.otlp\"..." error="no OpenTelemetry endpoint: skip plugin" type=io.containerd.tracing.processor.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.209912037-07:00" level=info msg="loading plugin \"io.containerd.internal.v1.tracing\"..." type=io.containerd.internal.v1
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.210113358-07:00" level=error msg="failed to initialize a tracing processor \"otlp\"" error="no OpenTelemetry endpoint: skip plugin"
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.211156374-07:00" level=info msg=serving... address=/var/run/docker/containerd/containerd-debug.sock
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.211531426-07:00" level=info msg=serving... address=/var/run/docker/containerd/containerd.sock.ttrpc
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.211662344-07:00" level=info msg=serving... address=/var/run/docker/containerd/containerd.sock
Jun 04 06:44:45 ts_080027b3b812 dockerd[3130]: time="2022-06-04T06:44:45.211736920-07:00" level=info msg="containerd successfully booted in 0.038934s"
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.225142806-07:00" level=info msg="parsed scheme: \"unix\"" module=grpc
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.225280438-07:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.225384775-07:00" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/docker/containerd/containerd.sock <nil> 0 <nil>}] <nil> <nil>}" module=grpc
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.225519679-07:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.229535732-07:00" level=info msg="parsed scheme: \"unix\"" module=grpc
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.229567923-07:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.229606224-07:00" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/docker/containerd/containerd.sock <nil> 0 <nil>}] <nil> <nil>}" module=grpc
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.229623973-07:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.249833626-07:00" level=info msg="Loading containers: start."
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.366721376-07:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.470044352-07:00" level=info msg="Loading containers: done."
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.481283885-07:00" level=warning msg="Not using native diff for overlay2, this may cause degraded performance for building images: kernel has CONFIG_OVERLAY_FS_REDIRECT_DIR enabled" storage-driver=overlay2
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.481634808-07:00" level=info msg="Docker daemon" commit=f756502 graphdriver(s)=overlay2 version=20.10.16
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.481778501-07:00" level=info msg="Daemon has completed initialization"
Jun 04 06:44:45 ts_080027b3b812 systemd[1]: Started Docker Application Container Engine.
Jun 04 06:44:45 ts_080027b3b812 dockerd[2349]: time="2022-06-04T06:44:45.512160761-07:00" level=info msg="API listen on /run/docker.sock"
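The log above only shows the daemon serving /run/docker.sock, so a first diagnostic step could be to confirm what dockerd is actually bound to and how docker.socket and docker.service are ordered at boot (a sketch of standard commands; ports 2375/2376 are the conventional Docker TCP ports, assumed here):
# How is dockerd launched? Look for -H/--host flags or a "hosts" entry
systemctl cat docker.service docker.socket
cat /etc/docker/daemon.json 2>/dev/null
# Which sockets is the daemon listening on right now?
ss -tlnp | grep dockerd
# Socket/service ordering during this boot, to spot an activation race
journalctl -b -u docker.socket -u docker.service --no-pager | tail -n 50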
GitLab is installed on a Linux machine and the pipeline fails because it doesn't have access to the Docker repository (a Docker repo inside GitLab), with the error below:
ERROR: Preparation failed: Error response from daemon: Get
https://docker.*****/v2/operator/kubectl/manifests/1.15: unauthorized: HTTP
Basic: Access denied (executor_docker.go:188:0s)
I found the following issue with the docker0 bridge:
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:dc:8d:c0:4b brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
systemctl status docker
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2020-07-21 11:54:45 UTC; 19min ago
Docs: https://docs.docker.com
Main PID: 4488 (dockerd)
Tasks: 37
CGroup: /system.slice/docker.service
└─4488 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
Jul 21 11:54:44 agora dockerd[4488]: time="2020-07-21T11:54:44.168245639Z" level=warning msg="Your kernel does not support cgroup rt period"
Jul 21 11:54:44 agora dockerd[4488]: time="2020-07-21T11:54:44.168261691Z" level=warning msg="Your kernel does not support cgroup rt runtime"
Jul 21 11:54:44 agora dockerd[4488]: time="2020-07-21T11:54:44.168739695Z" level=info msg="Loading containers: start."
Jul 21 11:54:44 agora dockerd[4488]: time="2020-07-21T11:54:44.660694452Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/
Jul 21 11:54:44 agora dockerd[4488]: time="2020-07-21T11:54:44.843200704Z" level=info msg="Loading containers: done."
Jul 21 11:54:44 agora dockerd[4488]: time="2020-07-21T11:54:44.870880302Z" level=warning msg="failed to retrieve runc version: unknown output format: runc ver
Jul 21 11:54:45 agora dockerd[4488]: time="2020-07-21T11:54:45.887294851Z" level=info msg="Docker daemon" commit=2d0083d graphdriver(s)=overlay2 version=18.09
Jul 21 11:54:45 agora dockerd[4488]: time="2020-07-21T11:54:45.887403804Z" level=info msg="Daemon has completed initialization"
Jul 21 11:54:45 agora dockerd[4488]: time="2020-07-21T11:54:45.928154658Z" level=info msg="API listen on /var/run/docker.sock"
Jul 21 11:54:45 agora systemd[1]: Started Docker Application Container Engine.
You need to log in to the Docker registry:
before_script:
  - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
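In context, a minimal .gitlab-ci.yml sketch (the job name, image, and pulled tag are assumptions; $CI_REGISTRY, $CI_REGISTRY_USER, $CI_REGISTRY_PASSWORD, and $CI_REGISTRY_IMAGE are GitLab's predefined CI variables):
build:
  image: docker:latest
  before_script:
    - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
  script:
    - docker pull $CI_REGISTRY_IMAGE:latest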
I have Prometheus installed on an Amazon Linux instance, and here is the status of the service:
[ec2-user@ip-10-193-192-56 ~]$ sudo systemctl status prometheus
● prometheus.service - Prometheus Server
Loaded: loaded (/etc/systemd/system/prometheus.service; enabled; vendor preset: disabled)
Active: failed (Result: start-limit) since Tue 2020-01-28 16:27:41 UTC; 16h ago
Docs: https://prometheus.io/docs/introduction/overview/
Main PID: 21129 (code=exited, status=1/FAILURE)
Jan 28 16:27:40 ip-10-193-192-56.service.essilor systemd[1]: prometheus.service: main process exited, code=exited, status=1/FAILURE
Jan 28 16:27:40 ip-10-193-192-56.service.essilor systemd[1]: Unit prometheus.service entered failed state.
Jan 28 16:27:40 ip-10-193-192-56.service.essilor systemd[1]: prometheus.service failed.
Jan 28 16:27:41 ip-10-193-192-56.service.essilor systemd[1]: prometheus.service holdoff time over, scheduling restart.
Jan 28 16:27:41 ip-10-193-192-56.service.essilor systemd[1]: start request repeated too quickly for prometheus.service
Jan 28 16:27:41 ip-10-193-192-56.service.essilor systemd[1]: Failed to start Prometheus Server.
Jan 28 16:27:41 ip-10-193-192-56.service.essilor systemd[1]: Unit prometheus.service entered failed state.
Jan 28 16:27:41 ip-10-193-192-56.service.essilor systemd[1]: prometheus.service failed.
[ec2-user@ip-10-193-192-56 ~]$
When I check the logs, I see the following:
[ec2-user@ip-10-193-192-56 ~]$ sudo cat /var/log/messages | grep "error"
Jan 28 16:27:37 ip-10-193-192-56 prometheus: level=error ts=2020-01-28T16:27:37.650Z caller=main.go:736 err="opening storage failed: block dir: \"/app/prometheus/data/01DZ9119BY4ZGCSRF1H27TDXSB\": open /app/prometheus/data/01DZ9119BY4ZGCSRF1H27TDXSB/meta.json: no such file or directory"
Jan 28 16:27:38 ip-10-193-192-56 prometheus: level=error ts=2020-01-28T16:27:38.716Z caller=main.go:736 err="opening storage failed: block dir: \"/app/prometheus/data/01DZ9119BY4ZGCSRF1H27TDXSB\": open /app/prometheus/data/01DZ9119BY4ZGCSRF1H27TDXSB/meta.json: no such file or directory"
Jan 28 16:27:39 ip-10-193-192-56 prometheus: level=error ts=2020-01-28T16:27:39.340Z caller=main.go:736 err="opening storage failed: block dir: \"/app/prometheus/data/01DZ9119BY4ZGCSRF1H27TDXSB\": open /app/prometheus/data/01DZ9119BY4ZGCSRF1H27TDXSB/meta.json: no such file or directory"
Jan 28 16:27:40 ip-10-193-192-56 prometheus: level=error ts=2020-01-28T16:27:40.142Z caller=main.go:736 err="opening storage failed: block dir: \"/app/prometheus/data/01DZ9119BY4ZGCSRF1H27TDXSB\": open /app/prometheus/data/01DZ9119BY4ZGCSRF1H27TDXSB/meta.json: no such file or directory"
Jan 28 16:27:40 ip-10-193-192-56 prometheus: level=error ts=2020-01-28T16:27:40.946Z caller=main.go:736 err="opening storage failed: block dir: \"/app/prometheus/data/01DZ9119BY4ZGCSRF1H27TDXSB\": open /app/prometheus/data/01DZ9119BY4ZGCSRF1H27TDXSB/meta.json: no such file or directory"
Jan 28 16:27:46 ip-10-193-192-56 containerd: time="2020-01-28T16:27:46.134041576Z" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.btrfs" error="path /var/lib/containerd/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter"
Jan 28 16:27:46 ip-10-193-192-56 containerd: time="2020-01-28T16:27:46.136910879Z" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.aufs" error="modprobe aufs failed: "modprobe: FATAL: Module aufs not found in directory /lib/modules/4.14.77-81.59.amzn2.x86_64\n": exit status 1"
Jan 28 16:27:46 ip-10-193-192-56 containerd: time="2020-01-28T16:27:46.138682614Z" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.zfs" error="path /var/lib/containerd/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter"
Jan 28 16:27:46 ip-10-193-192-56 containerd: time="2020-01-28T16:27:46.139296419Z" level=warning msg="could not use snapshotter zfs in metadata plugin" error="path /var/lib/containerd/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter"
Jan 28 16:27:46 ip-10-193-192-56 containerd: time="2020-01-28T16:27:46.139604765Z" level=warning msg="could not use snapshotter btrfs in metadata plugin" error="path /var/lib/containerd/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter"
Jan 28 16:27:46 ip-10-193-192-56 containerd: time="2020-01-28T16:27:46.139907894Z" level=warning msg="could not use snapshotter aufs in metadata plugin" error="modprobe aufs failed: "modprobe: FATAL: Module aufs not found in directory /lib/modules/4.14.77-81.59.amzn2.x86_64\n": exit status 1"
Jan 28 16:27:46 ip-10-193-192-56 containerd: time="2020-01-28T16:27:46.159112796Z" level=error msg="Failed to load cni during init, please check CRI plugin status before setting up network for pods" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"
[ec2-user@ip-10-193-192-56 ~]$
There are many errors here and I don't know what to do about them. I tried deleting the broken block /app/prometheus/data/01DZ9119BY4ZGCSRF1H27TDXSB, but it didn't change anything.
/app/prometheus/data is on an AWS EFS volume.
Delete the directory /app/prometheus/data/01DZ9119BY4ZGCSRF1H27TDXSB and restart the server.
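Spelled out as commands (paths taken from the question; the service is stopped first so the TSDB is not being written while the block is removed):
sudo systemctl stop prometheus
sudo rm -rf /app/prometheus/data/01DZ9119BY4ZGCSRF1H27TDXSB
sudo systemctl start prometheus
journalctl -u prometheus -f    # confirm the storage opens cleanly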
My Prometheus instance is on a CentOS 7 machine and Cassandra is on CentOS 6. I am trying to monitor Cassandra's JMX port 7199 with Prometheus, but I keep getting errors with my YAML file. I'm not sure why I can't connect to the CentOS 6 (Cassandra) machine. Is my YAML file wrong, or does it have something to do with JMX port 7199?
Here is my YAML file:
# my global config
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: cassandra
    static_configs:
      - targets: ['10.1.0.22:7199']
Here is my Prometheus log:
level=info ts=2017-12-08T04:30:53.92549611Z caller=main.go:215 msg="Starting Prometheus" version="(version=2.0.0, branch=HEAD, revision=0a74f98628a0463dddc90528220c94de5032d1a0)"
level=info ts=2017-12-08T04:30:53.925623847Z caller=main.go:216 build_context="(go=go1.9.2, user=root#615b82cb36b6, date=20171108-07:11:59)"
level=info ts=2017-12-08T04:30:53.92566228Z caller=main.go:217 host_details="(Linux 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 20:32:50 UTC 2017 x86_64 localhost.localdomain (none))"
level=info ts=2017-12-08T04:30:53.932807536Z caller=web.go:380 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2017-12-08T04:30:53.93303681Z caller=targetmanager.go:71 component="target manager" msg="Starting target manager..."
level=info ts=2017-12-08T04:30:53.932905473Z caller=main.go:314 msg="Starting TSDB"
level=info ts=2017-12-08T04:30:53.987468942Z caller=main.go:326 msg="TSDB started"
level=info ts=2017-12-08T04:30:53.987582063Z caller=main.go:394 msg="Loading configuration file" filename=prometheus.yml
level=info ts=2017-12-08T04:30:53.988366778Z caller=main.go:371 msg="Server is ready to receive requests."
level=warn ts=2017-12-08T04:31:00.561007282Z caller=main.go:377 msg="Received SIGTERM, exiting gracefully..."
level=info ts=2017-12-08T04:31:00.563191668Z caller=main.go:384 msg="See you next time!"
level=info ts=2017-12-08T04:31:00.566231211Z caller=targetmanager.go:87 component="target manager" msg="Stopping target manager..."
level=info ts=2017-12-08T04:31:00.567070099Z caller=targetmanager.go:99 component="target manager" msg="Target manager stopped"
level=info ts=2017-12-08T04:31:00.567136027Z caller=manager.go:455 component="rule manager" msg="Stopping rule manager..."
level=info ts=2017-12-08T04:31:00.567162215Z caller=manager.go:461 component="rule manager" msg="Rule manager stopped"
level=info ts=2017-12-08T04:31:00.567186356Z caller=notifier.go:483 component=notifier msg="Stopping notification handler..."
If anyone has instructions on how to connect Prometheus to Cassandra when they are on two different machines, that would be helpful too.
This is not a problem with your config: Prometheus received a TERM signal and terminated gracefully.
If you are not getting metrics, check whether http://10.1.0.22:7199/metrics loads and returns metrics. You can also check the Prometheus server's /targets endpoint for scrape status.
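For example (IP and port taken from the config above; localhost assumes you run this on the Prometheus server itself):
# Does the target itself serve metrics over HTTP?
curl -s http://10.1.0.22:7199/metrics | head
# What does Prometheus report about its targets?
curl -s http://localhost:9090/api/v1/targets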
If you're not getting anything from your Cassandra server's /metrics endpoint, it may be because the Cassandra Prometheus exporter is not configured properly.
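Note that port 7199 speaks the JMX protocol, not HTTP, so Prometheus cannot scrape it directly; the usual approach is to attach the jmx_exporter javaagent to Cassandra's JVM and scrape its HTTP port instead (a sketch; the jar path, HTTP port 7070, and config file are assumptions):
# In cassandra-env.sh (or equivalent JVM options) on the Cassandra host:
JVM_OPTS="$JVM_OPTS -javaagent:/opt/jmx_prometheus_javaagent.jar=7070:/opt/jmx_exporter/cassandra.yml"
The scrape target in prometheus.yml would then be 10.1.0.22:7070 rather than 7199.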
Every time I try to do:
$ docker exec
I get the error message:
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:83: executing setns process caused \"exit status 16\""
Session 1 (works as expected):
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
alpine latest baa5d63471ea 7 weeks ago 4.8 MB
hello-world latest c54a2cc56cbb 5 months ago 1.85 kB
$ docker run --rm --name alpine -it alpine sh
/ # pwd
/
Session 2:
$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7bd39b37aee2 alpine "sh" 22 seconds ago Up 21 seconds alpine
$ docker exec -it alpine sh
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:83: executing setns process caused \"exit status 16\""
$ docker exec -it 7bd39b37aee2 sh
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:83: executing setns process caused \"exit status 16\""
/var/log/syslog shows some warnings, but I was neither able to understand the root cause nor to find matching answers.
Thanks for any hint.
= = = = = = = = = = = = = = = = = = = = = = = = =
$ docker info
Containers: 1
Running: 1
Paused: 0
Stopped: 0
Images: 2
Server Version: 1.13.0-rc3
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 4
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 03e5862ec0d8d3b3f750e19fca3ee367e13c090e
runc version: 51371867a01c467f08af739783b8beafc154c4d7
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.4.0-53-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.487 GiB
Name: pb7tt6ts
ID: YQ4G:ETTP:5VCM:PAJD:F3KB:O7JN:AZOF:VLTI:SKH4:BTSR:KP7D:NXIZ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
= = =
/var/log/syslog covering the Docker restart and the steps above:
= = =
Dec 13 14:28:09 pb7tt6ts systemd[1]: Stopping Docker Socket for the API.
Dec 13 14:28:09 pb7tt6ts systemd[1]: Starting Docker Socket for the API.
Dec 13 14:28:09 pb7tt6ts systemd[1]: Listening on Docker Socket for the API.
Dec 13 14:28:09 pb7tt6ts systemd[1]: Starting Docker Application Container Engine...
Dec 13 14:28:09 pb7tt6ts dockerd[1436]: time="2016-12-13T14:28:09.291301057+01:00" level=info msg="libcontainerd: new containerd process, pid: 1448"
Dec 13 14:28:10 pb7tt6ts kernel: [25908.125394] audit: type=1400 audit(1481635690.357:28): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="docker-default" pid=1466 comm="apparmor_parser"
Dec 13 14:28:10 pb7tt6ts dockerd[1436]: time="2016-12-13T14:28:10.370364923+01:00" level=info msg="[graphdriver] using prior storage driver: aufs"
Dec 13 14:28:10 pb7tt6ts dockerd[1436]: time="2016-12-13T14:28:10.387915069+01:00" level=info msg="Graph migration to content-addressability took 0.00 seconds"
Dec 13 14:28:10 pb7tt6ts dockerd[1436]: time="2016-12-13T14:28:10.388367650+01:00" level=warning msg="Your kernel does not support swap memory limit."
Dec 13 14:28:10 pb7tt6ts dockerd[1436]: time="2016-12-13T14:28:10.388465142+01:00" level=warning msg="Your kernel does not support cgroup rt period"
Dec 13 14:28:10 pb7tt6ts dockerd[1436]: time="2016-12-13T14:28:10.388508739+01:00" level=warning msg="Your kernel does not support cgroup rt runtime"
Dec 13 14:28:10 pb7tt6ts dockerd[1436]: time="2016-12-13T14:28:10.389419384+01:00" level=info msg="Loading containers: start."
Dec 13 14:28:10 pb7tt6ts dockerd[1436]: time="2016-12-13T14:28:10.397339748+01:00" level=info msg="Firewalld running: false"
Dec 13 14:28:10 pb7tt6ts dockerd[1436]: time="2016-12-13T14:28:10.628011070+01:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
Dec 13 14:28:10 pb7tt6ts dockerd[1436]: time="2016-12-13T14:28:10.743703578+01:00" level=info msg="Loading containers: done."
Dec 13 14:28:10 pb7tt6ts kernel: [25908.510718] aufs au_opts_verify:1597:dockerd[1462]: dirperm1 breaks the protection by the permission bits on the lower branch
Dec 13 14:28:10 pb7tt6ts dockerd[1436]: time="2016-12-13T14:28:10.808510166+01:00" level=info msg="Daemon has completed initialization"
Dec 13 14:28:10 pb7tt6ts dockerd[1436]: time="2016-12-13T14:28:10.808575966+01:00" level=info msg="Docker daemon" commit=4d92237 graphdriver=aufs version=1.13.0-rc3
Dec 13 14:28:10 pb7tt6ts dockerd[1436]: time="2016-12-13T14:28:10.820562161+01:00" level=info msg="API listen on /var/run/docker.sock"
Dec 13 14:28:10 pb7tt6ts systemd[1]: Started Docker Application Container Engine.
Dec 13 14:28:10 pb7tt6ts console-kit-daemon[3106]: console-kit-daemon[3106]: GLib-CRITICAL: Source ID 226 was not found when attempting to remove it
Dec 13 14:28:10 pb7tt6ts console-kit-daemon[3106]: GLib-CRITICAL: Source ID 226 was not found when attempting to remove it
Dec 13 14:28:16 pb7tt6ts kernel: [25914.206672] aufs au_opts_verify:1597:dockerd[1460]: dirperm1 breaks the protection by the permission bits on the lower branch
Dec 13 14:28:16 pb7tt6ts kernel: [25914.388393] aufs au_opts_verify:1597:dockerd[1460]: dirperm1 breaks the protection by the permission bits on the lower branch
Dec 13 14:28:16 pb7tt6ts kernel: [25914.492197] aufs au_opts_verify:1597:dockerd[1460]: dirperm1 breaks the protection by the permission bits on the lower branch
Dec 13 14:28:16 pb7tt6ts NetworkManager[1343]: <warn> [1481635696.7320] device (vethff6f844): failed to find device 35 'vethff6f844' with udev
Dec 13 14:28:16 pb7tt6ts NetworkManager[1343]: <info> [1481635696.7340] manager: (vethff6f844): new Veth device (/org/freedesktop/NetworkManager/Devices/46)
Dec 13 14:28:16 pb7tt6ts systemd-udevd[1614]: Could not generate persistent MAC address for vethff6f844: No such file or directory
Dec 13 14:28:16 pb7tt6ts NetworkManager[1343]: <warn> [1481635696.7345] device (veth13c2a1d): failed to find device 36 'veth13c2a1d' with udev
Dec 13 14:28:16 pb7tt6ts systemd-udevd[1615]: Could not generate persistent MAC address for veth13c2a1d: No such file or directory
Dec 13 14:28:16 pb7tt6ts NetworkManager[1343]: <info> [1481635696.7417] manager: (veth13c2a1d): new Veth device (/org/freedesktop/NetworkManager/Devices/47)
Dec 13 14:28:16 pb7tt6ts kernel: [25914.509027] device veth13c2a1d entered promiscuous mode
Dec 13 14:28:16 pb7tt6ts kernel: [25914.509240] IPv6: ADDRCONF(NETDEV_UP): veth13c2a1d: link is not ready
Dec 13 14:28:16 pb7tt6ts NetworkManager[1343]: <info> [1481635696.7632] devices added (path: /sys/devices/virtual/net/vethff6f844, iface: vethff6f844)
Dec 13 14:28:16 pb7tt6ts NetworkManager[1343]: <info> [1481635696.7632] device added (path: /sys/devices/virtual/net/vethff6f844, iface: vethff6f844): no ifupdown configuration found.
Dec 13 14:28:16 pb7tt6ts NetworkManager[1343]: <info> [1481635696.7639] devices added (path: /sys/devices/virtual/net/veth13c2a1d, iface: veth13c2a1d)
Dec 13 14:28:16 pb7tt6ts NetworkManager[1343]: <info> [1481635696.7640] device added (path: /sys/devices/virtual/net/veth13c2a1d, iface: veth13c2a1d): no ifupdown configuration found.
Dec 13 14:28:16 pb7tt6ts dockerd[1436]: time="2016-12-13T14:28:16.965015836+01:00" level=warning msg="Your kernel does not support swap memory limit."
Dec 13 14:28:16 pb7tt6ts dockerd[1436]: time="2016-12-13T14:28:16.965090775+01:00" level=warning msg="Your kernel does not support cgroup rt period"
Dec 13 14:28:16 pb7tt6ts dockerd[1436]: time="2016-12-13T14:28:16.965117179+01:00" level=warning msg="Your kernel does not support cgroup rt runtime"
Dec 13 14:28:17 pb7tt6ts kernel: [25914.808163] eth0: renamed from vethff6f844
Dec 13 14:28:17 pb7tt6ts acvpnagent[2339]: Function: tableCallbackHandler File: RouteMgr.cpp Line: 1723 Invoked Function: recv Return Code: 11 (0x0000000B) Description: unknown
Dec 13 14:28:17 pb7tt6ts NetworkManager[1343]: <info> [1481635697.0599] devices removed (path: /sys/devices/virtual/net/vethff6f844, iface: vethff6f844)
Dec 13 14:28:17 pb7tt6ts acvpnagent[2339]: A new network interface has been detected.
Dec 13 14:28:17 pb7tt6ts NetworkManager[1343]: <info> [1481635697.0600] device (vethff6f844): driver 'veth' does not support carrier detection.
Dec 13 14:28:17 pb7tt6ts acvpnagent[2339]: Function: logInterfaces File: RouteMgr.cpp Line: 2105 Invoked Function: logInterfaces Return Code: 0 (0x00000000) Description: IP Address Interface List: 192.168.178.24 172.17.0.1 9.145.68.34 FE80:0:0:0:D8B4:C1E0:F8E4:DB77 FE80:0:0:0:42:44FF:FEC9:5D85 FE80:0:0:0:60A9:A1FF:FEED:F31C
Dec 13 14:28:17 pb7tt6ts NetworkManager[1343]: <info> [1481635697.0604] device (veth13c2a1d): link connected
Dec 13 14:28:17 pb7tt6ts NetworkManager[1343]: <info> [1481635697.0605] device (docker0): link connected
Dec 13 14:28:17 pb7tt6ts kernel: [25914.823988] IPv6: ADDRCONF(NETDEV_CHANGE): veth13c2a1d: link becomes ready
Dec 13 14:28:17 pb7tt6ts kernel: [25914.824039] docker0: port 1(veth13c2a1d) entered forwarding state
Dec 13 14:28:17 pb7tt6ts kernel: [25914.824061] docker0: port 1(veth13c2a1d) entered forwarding state
Dec 13 14:28:18 pb7tt6ts acvpnagent[2339]: Function: tableCallbackHandler File: RouteMgr.cpp Line: 1723 Invoked Function: recv Return Code: 11 (0x0000000B) Description: unknown
Dec 13 14:28:18 pb7tt6ts avahi-daemon[1217]: Joining mDNS multicast group on interface veth13c2a1d.IPv6 with address fe80::60a9:a1ff:feed:f31c.
Dec 13 14:28:18 pb7tt6ts avahi-daemon[1217]: New relevant interface veth13c2a1d.IPv6 for mDNS.
Dec 13 14:28:18 pb7tt6ts avahi-daemon[1217]: Registering new address record for fe80::60a9:a1ff:feed:f31c on veth13c2a1d.*.
Dec 13 14:28:32 pb7tt6ts kernel: [25929.850840] docker0: port 1(veth13c2a1d) entered forwarding state
Dec 13 14:28:36 pb7tt6ts dockerd[1436]: time="2016-12-13T14:28:36.704565159+01:00" level=error msg="Error running exec in container: rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused \"process_linux.go:83: executing setns process caused \\\"exit status 16\\\"\"\n"
Dec 13 14:28:36 pb7tt6ts dockerd[1436]: time="2016-12-13T14:28:36.705362948+01:00" level=error msg="Handler for POST /v1.25/exec/8a78f29ef71d4c3ab982a8dd7a4a325e280766072dea7337860874a72c42f42c/resize returned error: rpc error: code = 2 desc = containerd: process not found for container"
Dec 13 14:28:46 pb7tt6ts dockerd[1436]: time="2016-12-13T14:28:46.921880770+01:00" level=error msg="Error running exec in container: rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused \"process_linux.go:83: executing setns process caused \\\"exit status 16\\\"\"\n"
Dec 13 14:28:46 pb7tt6ts dockerd[1436]: time="2016-12-13T14:28:46.922576933+01:00" level=error msg="Handler for POST /v1.25/exec/5ad25668cac553118b8c702f02c69b427436eb67d1488d4170641bcacfdad50b/resize returned error: rpc error: code = 2 desc = containerd: process not found for container"
As recommended, I reverted to a mainline version of Docker and installed docker-engine 1.12.4:
$ docker info
Containers: 2
Running: 1
Paused: 0
Stopped: 1
Images: 3
Server Version: 1.12.4
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 11
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: host bridge null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: apparmor seccomp
Kernel Version: 4.4.0-53-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.487 GiB
Name: pb7tt6ts
ID: YQ4G:ETTP:5VCM:PAJD:F3KB:O7JN:AZOF:VLTI:SKH4:BTSR:KP7D:NXIZ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
127.0.0.0/8
Still no success, but a different error:
$ docker exec -it alpine sh
rpc error: code = 13 desc = invalid header field value "oci runtime error: exec failed: container_linux.go:247: starting container process caused \"process_linux.go:83: executing setns process caused \\\"exit status 17\\\"\"\n"
Corresponding /var/log/syslog entries from service docker start (21:00), docker run ... (21:01), and docker exec ... (21:01):
Dec 13 21:00:01 pb7tt6ts systemd[1]: Starting Docker Socket for the API.
Dec 13 21:00:01 pb7tt6ts systemd[1]: Listening on Docker Socket for the API.
Dec 13 21:00:01 pb7tt6ts systemd[1]: Starting Docker Application Container Engine...
Dec 13 21:00:01 pb7tt6ts dockerd[8675]: time="2016-12-13T21:00:01.468921183+01:00" level=info msg="libcontainerd: new containerd process, pid: 8686"
Dec 13 21:00:02 pb7tt6ts kernel: [49419.124965] audit: type=1400 audit(1481659202.536:37): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="docker-default" pid=8700 comm="apparmor_parser"
Dec 13 21:00:02 pb7tt6ts dockerd[8675]: time="2016-12-13T21:00:02.550070413+01:00" level=info msg="[graphdriver] using prior storage driver \"aufs\""
Dec 13 21:00:02 pb7tt6ts dockerd[8675]: time="2016-12-13T21:00:02.572067603+01:00" level=info msg="Graph migration to content-addressability took 0.00 seconds"
Dec 13 21:00:02 pb7tt6ts dockerd[8675]: time="2016-12-13T21:00:02.572336166+01:00" level=warning msg="Your kernel does not support swap memory limit."
Dec 13 21:00:02 pb7tt6ts dockerd[8675]: time="2016-12-13T21:00:02.572799562+01:00" level=info msg="Loading containers: start."
Dec 13 21:00:02 pb7tt6ts dockerd[8675]: time="2016-12-13T21:00:02.579465999+01:00" level=info msg="Firewalld running: false"
Dec 13 21:00:02 pb7tt6ts dockerd[8675]: time="2016-12-13T21:00:02.779165187+01:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
Dec 13 21:00:02 pb7tt6ts dockerd[8675]: time="2016-12-13T21:00:02.903085523+01:00" level=info msg="Loading containers: done."
Dec 13 21:00:02 pb7tt6ts dockerd[8675]: time="2016-12-13T21:00:02.903179108+01:00" level=info msg="Daemon has completed initialization"
Dec 13 21:00:02 pb7tt6ts dockerd[8675]: time="2016-12-13T21:00:02.903208197+01:00" level=info msg="Docker daemon" commit=1564f02 graphdriver=aufs version=1.12.4
Dec 13 21:00:02 pb7tt6ts dockerd[8675]: time="2016-12-13T21:00:02.923282443+01:00" level=info msg="API listen on /var/run/docker.sock"
Dec 13 21:00:02 pb7tt6ts systemd[1]: Started Docker Application Container Engine.
Dec 13 21:01:01 pb7tt6ts kernel: [49477.834789] aufs au_opts_verify:1597:dockerd[8692]: dirperm1 breaks the protection by the permission bits on the lower branch
Dec 13 21:01:01 pb7tt6ts kernel: [49477.896566] aufs au_opts_verify:1597:dockerd[8692]: dirperm1 breaks the protection by the permission bits on the lower branch
Dec 13 21:01:01 pb7tt6ts kernel: [49478.080340] aufs au_opts_verify:1597:dockerd[8692]: dirperm1 breaks the protection by the permission bits on the lower branch
Dec 13 21:01:01 pb7tt6ts kernel: [49478.192100] aufs au_opts_verify:1597:dockerd[8682]: dirperm1 breaks the protection by the permission bits on the lower branch
Dec 13 21:01:01 pb7tt6ts NetworkManager[1343]: <warn> [1481659261.6125] device (veth2b5b07c): failed to find device 47 'veth2b5b07c' with udev
Dec 13 21:01:01 pb7tt6ts systemd-udevd[8810]: Could not generate persistent MAC address for vethc2e4873: No such file or directory
Dec 13 21:01:01 pb7tt6ts kernel: [49478.196917] device vethc2e4873 entered promiscuous mode
Dec 13 21:01:01 pb7tt6ts NetworkManager[1343]: <info> [1481659261.6215] manager: (veth2b5b07c): new Veth device (/org/freedesktop/NetworkManager/Devices/63)
Dec 13 21:01:01 pb7tt6ts NetworkManager[1343]: <warn> [1481659261.6222] device (vethc2e4873): failed to find device 48 'vethc2e4873' with udev
Dec 13 21:01:01 pb7tt6ts NetworkManager[1343]: <info> [1481659261.6241] manager: (vethc2e4873): new Veth device (/org/freedesktop/NetworkManager/Devices/64)
Dec 13 21:01:01 pb7tt6ts systemd-udevd[8809]: Could not generate persistent MAC address for veth2b5b07c: No such file or directory
Dec 13 21:01:01 pb7tt6ts kernel: [49478.211913] IPv6: ADDRCONF(NETDEV_UP): vethc2e4873: link is not ready
Dec 13 21:01:01 pb7tt6ts NetworkManager[1343]: <info> [1481659261.6454] devices added (path: /sys/devices/virtual/net/veth2b5b07c, iface: veth2b5b07c)
Dec 13 21:01:01 pb7tt6ts NetworkManager[1343]: <info> [1481659261.6454] device added (path: /sys/devices/virtual/net/veth2b5b07c, iface: veth2b5b07c): no ifupdown configuration found.
Dec 13 21:01:01 pb7tt6ts NetworkManager[1343]: <info> [1481659261.6507] devices added (path: /sys/devices/virtual/net/vethc2e4873, iface: vethc2e4873)
Dec 13 21:01:01 pb7tt6ts NetworkManager[1343]: <info> [1481659261.6507] device added (path: /sys/devices/virtual/net/vethc2e4873, iface: vethc2e4873): no ifupdown configuration found.
Dec 13 21:01:01 pb7tt6ts kernel: [49478.557310] eth0: renamed from veth2b5b07c
Dec 13 21:01:01 pb7tt6ts NetworkManager[1343]: <info> [1481659261.9915] devices removed (path: /sys/devices/virtual/net/veth2b5b07c, iface: veth2b5b07c)
Dec 13 21:01:01 pb7tt6ts NetworkManager[1343]: <info> [1481659261.9916] device (veth2b5b07c): driver 'veth' does not support carrier detection.
Dec 13 21:01:01 pb7tt6ts NetworkManager[1343]: <info> [1481659261.9919] device (vethc2e4873): link connected
Dec 13 21:01:01 pb7tt6ts NetworkManager[1343]: <info> [1481659261.9937] device (docker0): link connected
Dec 13 21:01:01 pb7tt6ts kernel: [49478.573434] IPv6: ADDRCONF(NETDEV_CHANGE): vethc2e4873: link becomes ready
Dec 13 21:01:01 pb7tt6ts kernel: [49478.573503] docker0: port 1(vethc2e4873) entered forwarding state
Dec 13 21:01:01 pb7tt6ts kernel: [49478.573527] docker0: port 1(vethc2e4873) entered forwarding state
Dec 13 21:01:03 pb7tt6ts avahi-daemon[1217]: Joining mDNS multicast group on interface vethc2e4873.IPv6 with address fe80::d02a:ecff:fea8:662c.
Dec 13 21:01:03 pb7tt6ts avahi-daemon[1217]: New relevant interface vethc2e4873.IPv6 for mDNS.
Dec 13 21:01:03 pb7tt6ts avahi-daemon[1217]: Registering new address record for fe80::d02a:ecff:fea8:662c on vethc2e4873.*.
Dec 13 21:01:17 pb7tt6ts kernel: [49493.628038] docker0: port 1(vethc2e4873) entered forwarding state
Dec 13 21:02:02 pb7tt6ts dockerd[8675]: time="2016-12-13T21:02:02.072027206+01:00" level=error msg="Error running exec in container: rpc error: code = 13 desc = invalid header field value \"oci runtime error: exec failed: container_linux.go:247: starting container process caused \\\"process_linux.go:83: executing setns process caused \\\\\\\"exit status 17\\\\\\\"\\\"\\n\""
Dec 13 21:02:02 pb7tt6ts dockerd[8675]: time="2016-12-13T21:02:02.072759152+01:00" level=error msg="Handler for POST /v1.24/exec/00c0dcac7a178129a17cd9eb833d154d428f2a6efbcd0f421ab3c5c54e52a236/resize returned error: rpc error: code = 2 desc = containerd: process not found for container"
From the linked issue, this comment appears to identify the root cause:
I think I found the root reason. It's nothing to do with Docker.
Actually, docker exec always fails because of Symantec AutoProtect
running on my system. It loads a custom kernel module that adds some
file operation hooks, which affects the result of setns.
$ lsmod | grep symev
symev_custom_dkms_x86_64 72166 2 symap_custom_dkms_x86_64
The workaround is to disable Symantec AutoProtect and reboot:
sudo update-rc.d autoprotect disable
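After the reboot, the module list from the comment above can be used to verify that the hooks are gone (no output means the Symantec modules are no longer loaded):
lsmod | grep -E 'symev|symap'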