Traefik persistent volume timeouts on AKS (Azure)

I'm struggling to get Traefik working on K8s with ACME enabled. I want to store the certs on a persistent volume, as suggested, because requesting certs is rate limited and the certs would get lost on pod restarts. Below is the full config used for the stable/traefik Helm chart, installed on Azure AKS.
There is one issue that I cannot get working (or I'm just doing it wrong, of course).
pod has unbound immediate PersistentVolumeClaims
This is the initial error I receive when booting up the pods. The weird thing is that the PersistentVolumeClaim is actually there and ready. When I check the volume itself in the Azure portal, it also says it is mounted to my server:
traefik-acme
Namespace: default
pv.kubernetes.io/bind-completed: yes
pv.kubernetes.io/bound-by-controller: yes
volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/azure-disk
Creation Time: 2019-04-16T09:55 UTC
Status: Bound
Volume: pvc-b673da74-602d-11e9-a537-9275388
Access modes: ReadWriteOnce
Storage class: default
Also the storageClass itself is active:
$ kubectl get sc --all-namespaces
NAME PROVISIONER AGE
default (default) kubernetes.io/azure-disk 4d
managed-premium kubernetes.io/azure-disk 4d
When I then wait a little longer, I receive the error below:
Unable to mount volumes for pod "traefik-d65fcbc8b-lkzsh_default(b68c8aa3-602d-11e9-a537-92753888c74b)": timeout expired waiting for volumes to attach or mount for pod "default"/"traefik-d65fcbc8b-lkzsh". list of unmounted volumes=[acme]. list of unattached volumes=[config acme default-token-p2lgf]
Here is the full K8s event trace:
pod has unbound immediate PersistentVolumeClaims
default-scheduler
2019-04-16T09:55 UTC
Successfully assigned default/traefik-d65fcbc8b-lkzsh to aks-default-22301976-0
default-scheduler
2019-04-16T09:55 UTC
Unable to mount volumes for pod "traefik-d65fcbc8b-lkzsh_default(b68c8aa3-602d-11e9-a537-92753888c74b)": timeout expired waiting for volumes to attach or mount for pod "default"/"traefik-d65fcbc8b-lkzsh". list of unmounted volumes=[acme]. list of unattached volumes=[config acme default-token-p2lgf]
kubelet aks-default-22301976-0
2019-04-16T09:57 UTC
AttachVolume.Attach succeeded for volume "pvc-b673da74-602d-11e9-a537-92753888c74b"
attachdetach-controller
2019-04-16T09:58 UTC
Container image "traefik:1.7.9" already present on machine
kubelet aks-default-22301976-0
2019-04-16T10:01 UTC
Created container
kubelet aks-default-22301976-0
2019-04-16T10:00 UTC
Started container
kubelet aks-default-22301976-0
2019-04-16T10:00 UTC
Back-off restarting failed container
kubelet aks-default-22301976-0
2019-04-16T10:02 UTC
Install
Installing the Traefik Helm chart is done with:
helm install -f values.yaml stable/traefik --name traefik
Below is the full values.yaml used to install the chart:
## Default values for Traefik
image: traefik
imageTag: 1.7.9
testFramework:
image: "dduportal/bats"
tag: "0.4.0"
## can switch the service type to NodePort if required
serviceType: LoadBalancer
# loadBalancerIP: ""
# loadBalancerSourceRanges: []
whiteListSourceRange: []
externalTrafficPolicy: Cluster
replicas: 1
# startupArguments:
# - "--ping"
# - "--ping.entrypoint=http"
podDisruptionBudget: {}
# maxUnavailable: 1
# minAvailable: 2
# priorityClassName: ""
# rootCAs: []
resources: {}
debug:
enabled: false
deploymentStrategy: {}
# rollingUpdate:
# maxSurge: 1
# maxUnavailable: 0
# type: RollingUpdate
securityContext: {}
env: {}
nodeSelector: {}
# key: value
affinity: {}
# key: value
tolerations: []
# - key: "key"
# operator: "Equal|Exists"
# value: "value"
# effect: "NoSchedule|PreferNoSchedule|NoExecute(1.6 only)"
## Kubernetes ingress filters
# kubernetes:
# endpoint:
# namespaces:
# - default
# labelSelector:
# ingressClass:
# ingressEndpoint:
# hostname: "localhost"
# ip: "127.0.0.1"
# publishedService: "namespace/servicename"
# useDefaultPublishedService: false
proxyProtocol:
enabled: false
# trustedIPs is required when enabled
trustedIPs: []
# - 10.0.0.0/8
forwardedHeaders:
enabled: false
# trustedIPs is required when enabled
trustedIPs: []
# - 10.0.0.0/8
## Add arbitrary ConfigMaps to deployment
## Will be mounted to /configs/, i.e. myconfig.json would
## be mounted to /configs/myconfig.json.
configFiles: {}
# myconfig.json: |
# filecontents...
## Add arbitrary Secrets to deployment
## Will be mounted to /secrets/, i.e. file.name would
## be mounted to /secrets/mysecret.txt.
## The contents will be base64 encoded when added
secretFiles: {}
# mysecret.txt: |
# filecontents...
ssl:
enabled: false
enforced: false
permanentRedirect: false
upstream: false
insecureSkipVerify: false
generateTLS: false
# defaultCN: "example.com"
# or *.example.com
defaultSANList: []
# - example.com
# - test1.example.com
defaultIPList: []
# - 1.2.3.4
# cipherSuites: []
# https://docs.traefik.io/configuration/entrypoints/#specify-minimum-tls-version
# tlsMinVersion: VersionTLS12
defaultCert: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUVtekNDQTRPZ0F3SUJBZ0lKQUpBR1FsTW1DMGt5TUEwR0NTcUdTSWIzRFFFQkJRVUFNSUdQTVFzd0NRWUQKVlFRR0V3SlZVekVSTUE4R0ExVUVDQk1JUTI5c2IzSmhaRzh4RURBT0JnTlZCQWNUQjBKdmRXeGtaWEl4RkRBUwpCZ05WQkFvVEMwVjRZVzF3YkdWRGIzSndNUXN3Q1FZRFZRUUxFd0pKVkRFV01CUUdBMVVFQXhRTktpNWxlR0Z0CmNHeGxMbU52YlRFZ01CNEdDU3FHU0liM0RRRUpBUllSWVdSdGFXNUFaWGhoYlhCc1pTNWpiMjB3SGhjTk1UWXgKTURJME1qRXdPVFV5V2hjTk1UY3hNREkwTWpFd09UVXlXakNCanpFTE1Ba0dBMVVFQmhNQ1ZWTXhFVEFQQmdOVgpCQWdUQ0VOdmJHOXlZV1J2TVJBd0RnWURWUVFIRXdkQ2IzVnNaR1Z5TVJRd0VnWURWUVFLRXd0RmVHRnRjR3hsClEyOXljREVMTUFrR0ExVUVDeE1DU1ZReEZqQVVCZ05WQkFNVURTb3VaWGhoYlhCc1pTNWpiMjB4SURBZUJna3EKaGtpRzl3MEJDUUVXRVdGa2JXbHVRR1Y0WVcxd2JHVXVZMjl0TUlJQklqQU5CZ2txaGtpRzl3MEJBUUVGQUFPQwpBUThBTUlJQkNnS0NBUUVBdHVKOW13dzlCYXA2SDROdUhYTFB6d1NVZFppNGJyYTFkN1ZiRUJaWWZDSStZNjRDCjJ1dThwdTNhVTVzYXVNYkQ5N2pRYW95VzZHOThPUHJlV284b3lmbmRJY3RFcmxueGpxelUyVVRWN3FEVHk0bkEKNU9aZW9SZUxmZXFSeGxsSjE0VmlhNVFkZ3l3R0xoRTlqZy9jN2U0WUp6bmg5S1dZMnFjVnhEdUdEM2llaHNEbgphTnpWNFdGOWNJZm1zOHp3UHZPTk5MZnNBbXc3dUhUKzNiSzEzSUloeDI3ZmV2cXVWcENzNDFQNnBzdStWTG4yCjVIRHk0MXRoQkN3T0wrTithbGJ0ZktTcXM3TEFzM25RTjFsdHpITHZ5MGE1RGhkakpUd2tQclQrVXhwb0tCOUgKNFpZazErRUR0N09QbGh5bzM3NDFRaE4vSkNZK2RKbkFMQnNValFJREFRQUJvNEgzTUlIME1CMEdBMVVkRGdRVwpCQlJwZVc1dFhMdHh3TXJvQXM5d2RNbTUzVVVJTERDQnhBWURWUjBqQklHOE1JRzVnQlJwZVc1dFhMdHh3TXJvCkFzOXdkTW01M1VVSUxLR0JsYVNCa2pDQmp6RUxNQWtHQTFVRUJoTUNWVk14RVRBUEJnTlZCQWdUQ0VOdmJHOXkKWVdSdk1SQXdEZ1lEVlFRSEV3ZENiM1ZzWkdWeU1SUXdFZ1lEVlFRS0V3dEZlR0Z0Y0d4bFEyOXljREVMTUFrRwpBMVVFQ3hNQ1NWUXhGakFVQmdOVkJBTVVEU291WlhoaGJYQnNaUzVqYjIweElEQWVCZ2txaGtpRzl3MEJDUUVXCkVXRmtiV2x1UUdWNFlXMXdiR1V1WTI5dGdna0FrQVpDVXlZTFNUSXdEQVlEVlIwVEJBVXdBd0VCL3pBTkJna3EKaGtpRzl3MEJBUVVGQUFPQ0FRRUFjR1hNZms4TlpzQit0OUtCemwxRmw2eUlqRWtqSE8wUFZVbEVjU0QyQjRiNwpQeG5NT2pkbWdQcmF1SGI5dW5YRWFMN3p5QXFhRDZ0YlhXVTZSeENBbWdMYWpWSk5aSE93NDVOMGhyRGtXZ0I4CkV2WnRRNTZhbW13QzFxSWhBaUE2MzkwRDNDc2V4N2dMNm5KbzdrYnIxWVdVRzN6SXZveGR6OFlEclpOZVdLTEQKcFJ2V2VuMGxNYnBqSVJQNFhac25DNDVDOWdWWGRoM0xSZTErd3lRcTZoOVFQaWxveG1ENk5wRTlpbVRPbjJBNQovYkozVktJekFNdWRlVTZrcHlZbEpCemRHMXVhSFRqUU9Xb3NHaXdlQ0tWVVhGNlV0aXNWZGRyeFF0aDZFTnlXCnZJRnFhWng4NCtEbFNDYzkzeWZrL0dsQnQrU0tHNDZ6RUhNQjlocVBiQT09Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
defaultKey: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFb3dJQkFBS0NBUUVBdHVKOW13dzlCYXA2SDROdUhYTFB6d1NVZFppNGJyYTFkN1ZiRUJaWWZDSStZNjRDCjJ1dThwdTNhVTVzYXVNYkQ5N2pRYW95VzZHOThPUHJlV284b3lmbmRJY3RFcmxueGpxelUyVVRWN3FEVHk0bkEKNU9aZW9SZUxmZXFSeGxsSjE0VmlhNVFkZ3l3R0xoRTlqZy9jN2U0WUp6bmg5S1dZMnFjVnhEdUdEM2llaHNEbgphTnpWNFdGOWNJZm1zOHp3UHZPTk5MZnNBbXc3dUhUKzNiSzEzSUloeDI3ZmV2cXVWcENzNDFQNnBzdStWTG4yCjVIRHk0MXRoQkN3T0wrTithbGJ0ZktTcXM3TEFzM25RTjFsdHpITHZ5MGE1RGhkakpUd2tQclQrVXhwb0tCOUgKNFpZazErRUR0N09QbGh5bzM3NDFRaE4vSkNZK2RKbkFMQnNValFJREFRQUJBb0lCQUhrTHhka0dxNmtCWWQxVAp6MkU4YWFENnhneGpyY2JSdGFCcTc3L2hHbVhuQUdaWGVWcE81MG1SYW8wbHZ2VUgwaE0zUnZNTzVKOHBrdzNmCnRhWTQxT1dDTk1PMlYxb1MvQmZUK3Zsblh6V1hTemVQa0pXd2lIZVZMdVdEaVVMQVBHaWl4emF2RFMyUnlQRmEKeGVRdVNhdE5pTDBGeWJGMG5Zd3pST3ZoL2VSa2NKVnJRZlZudU1melFkOGgyMzZlb1UxU3B6UnhSNklubCs5UApNc1R2Wm5OQmY5d0FWcFo5c1NMMnB1V1g3SGNSMlVnem5oMDNZWUZJdGtDZndtbitEbEdva09YWHBVM282aWY5ClRIenBleHdubVJWSmFnRG85bTlQd2t4QXowOW80cXExdHJoU1g1U2p1K0xyNFJvOHg5bytXdUF1VnVwb0lHd0wKMWVseERFRUNnWUVBNzVaWGp1enNJR09PMkY5TStyYVFQcXMrRHZ2REpzQ3gyZnRudk1WWVJKcVliaGt6YnpsVQowSHBCVnk3NmE3WmF6Umxhd3RGZ3ljMlpyQThpM0F3K3J6d1pQclNJeWNieC9nUVduRzZlbFF1Y0FFVWdXODRNCkdSbXhKUGlmOGRQNUxsZXdRalFjUFJwZVoxMzlYODJreGRSSEdma1pscHlXQnFLajBTWExRSEVDZ1lFQXcybkEKbUVXdWQzZFJvam5zbnFOYjBlYXdFUFQrbzBjZ2RyaENQOTZQK1pEekNhcURUblZKV21PeWVxRlk1eVdSSEZOLwpzbEhXU2lTRUFjRXRYZys5aGlMc0RXdHVPdzhUZzYyN2VrOEh1UUtMb2tWWEFUWG1NZG9xOWRyQW9INU5hV2lECmRSY3dEU2EvamhIN3RZV1hKZDA4VkpUNlJJdU8vMVZpbDBtbEk5MENnWUVBb2lsNkhnMFNUV0hWWDNJeG9raEwKSFgrK1ExbjRYcFJ5VEg0eldydWY0TjlhYUxxNTY0QThmZGNodnFiWGJHeEN6U3RxR1E2cW1peUU1TVpoNjlxRgoyd21zZEpxeE14RnEzV2xhL0lxSzM0cTZEaHk3cUNld1hKVGRKNDc0Z3kvY0twZkRmeXZTS1RGZDBFejNvQTZLCmhqUUY0L2lNYnpxUStQREFQR0YrVHFFQ2dZQmQ1YnZncjJMMURzV1FJU3M4MHh3MDBSZDdIbTRaQVAxdGJuNk8KK0IvUWVNRC92UXBaTWV4c1hZbU9lV2Noc3FCMnJ2eW1MOEs3WDY1NnRWdGFYay9nVzNsM3ZVNTdYSFF4Q3RNUwpJMVYvcGVSNHRiN24yd0ZncFFlTm1XNkQ4QXk4Z0xiaUZhRkdRSDg5QWhFa0dTd1d5cWJKc2NoTUZZOUJ5OEtUCkZaVWZsUUtCZ0V3VzJkVUpOZEJMeXNycDhOTE1VbGt1ZnJxbllpUTNTQUhoNFZzWkg1TXU0MW55Yi95NUUyMW4KMk55d3ltWGRlb3VJcFZjcUlVTXl0L3FKRmhIcFJNeVEyWktPR0QyWG5YaENNVlRlL0FQNDJod294Nm02QkZpQgpvemZFa2wwak5uZmREcjZrL1p2MlQ1TnFzaWxaRXJBQlZGOTBKazdtUFBIa0Q2R1ZMUUJ4Ci0tLS0tRU5EIFJTQSBQUklWQVRFIEtFWS0tLS0tCg==
# Basic auth to protect all the routes. Can use htpasswd to generate passwords
# > htpasswd -n -b testuser testpass
# > testuser:$apr1$JXRA7j2s$LpVns9vsme8FHN0r.aSt11
auth: {}
# basic:
# testuser: $apr1$JXRA7j2s$LpVns9vsme8FHN0r.aSt11
kvprovider:
## If you want to run Traefik in HA mode, you will need to setup a KV Provider. Therefore you can choose one of
## * etcd
## * consul
## * boltdb
## * zookeeper
##
## ref: https://docs.traefik.io/user-guide/cluster/
## storeAcme has to be enabled to support HA Support using acme, but at least one kvprovider is needed
storeAcme: false
importAcme: false
# etcd:
# endpoint: etcd-service:2379
# useAPIV3: false
# watch: true
# prefix: traefik
## Override default configuration template.
## For advanced users :)
##
## Optional
# filename: consul.tmpl
# username: foo
# password: bar
# tls:
# ca: "/etc/ssl/ca.crt"
# cert: "/etc/ssl/consul.crt"
# key: "/etc/ssl/consul.key"
# insecureSkipVerify: true
#
# consul:
# endpoint: consul-service:8500
# watch: true
# prefix: traefik
## Override default configuration template.
## For advanced users :)
##
## Optional
# filename: consul.tmpl
# username: foo
# password: bar
# tls:
# ca: "/etc/ssl/ca.crt"
# cert: "/etc/ssl/consul.crt"
# key: "/etc/ssl/consul.key"
# insecureSkipVerify: true
## only relevant for etcd
acme:
enabled: true
email: me#gmail.com
onHostRule: true
staging: true
logging: true
# Configure a Let's Encrypt certificate to be managed by default.
# This is the only way to request wildcard certificates (works only with dns challenge).
domains:
enabled: true
# List of sets of main and (optional) SANs to generate for
# for wildcard certificates see https://docs.traefik.io/configuration/acme/#wildcard-domains
domainsList:
- main: "*.k8s-test.hardstyletop40.com"
# - sans:
# - "k8s-test.hardstyletop40.com"
# - main: "*.example2.com"
# - sans:
# - "test1.example2.com"
# - "test2.example2.com"
## ACME challenge type: "tls-sni-01", "tls-alpn-01", "http-01" or "dns-01"
## Note the chart's default of tls-sni-01 has been DEPRECATED and (except in
## certain circumstances) DISABLED by Let's Encrypt. It remains as a default
## value in this chart to preserve legacy behavior and avoid a breaking
## change. Users of this chart should strongly consider making the switch to
## the recommended "tls-alpn-01" (available since v1.7), dns-01 or http-01
## (available since v1.5) challenge.
challengeType: tls-alpn-01
## Configure dnsProvider to perform domain verification using dns challenge
## Applicable only if using the dns-01 challenge type
delayBeforeCheck: 0
resolvers: []
# - 1.1.1.1:53
# - 8.8.8.8:53
dnsProvider:
name: nil
auroradns:
AURORA_USER_ID: ""
AURORA_KEY: ""
AURORA_ENDPOINT: ""
azure:
AZURE_CLIENT_ID: ""
AZURE_CLIENT_SECRET: ""
AZURE_SUBSCRIPTION_ID: ""
AZURE_TENANT_ID: ""
AZURE_RESOURCE_GROUP: ""
cloudflare:
CLOUDFLARE_EMAIL: ""
CLOUDFLARE_API_KEY: ""
digitalocean:
DO_AUTH_TOKEN: ""
dnsimple:
DNSIMPLE_OAUTH_TOKEN: ""
DNSIMPLE_BASE_URL: ""
dnsmadeeasy:
DNSMADEEASY_API_KEY: ""
DNSMADEEASY_API_SECRET: ""
DNSMADEEASY_SANDBOX: ""
dnspod:
DNSPOD_API_KEY: ""
dyn:
DYN_CUSTOMER_NAME: ""
DYN_USER_NAME: ""
DYN_PASSWORD: ""
exoscale:
EXOSCALE_API_KEY: ""
EXOSCALE_API_SECRET: ""
EXOSCALE_ENDPOINT: ""
gandi:
GANDI_API_KEY: ""
godaddy:
GODADDY_API_KEY: ""
GODADDY_API_SECRET: ""
gcloud:
GCE_PROJECT: ""
GCE_SERVICE_ACCOUNT_FILE: ""
linode:
LINODE_API_KEY: ""
namecheap:
NAMECHEAP_API_USER: ""
NAMECHEAP_API_KEY: ""
ns1:
NS1_API_KEY: ""
otc:
OTC_DOMAIN_NAME: ""
OTC_USER_NAME: ""
OTC_PASSWORD: ""
OTC_PROJECT_NAME: ""
OTC_IDENTITY_ENDPOINT: ""
ovh:
OVH_ENDPOINT: ""
OVH_APPLICATION_KEY: ""
OVH_APPLICATION_SECRET: ""
OVH_CONSUMER_KEY: ""
pdns:
PDNS_API_URL: ""
rackspace:
RACKSPACE_USER: ""
RACKSPACE_API_KEY: ""
rfc2136:
RFC2136_NAMESERVER: ""
RFC2136_TSIG_ALGORITHM: ""
RFC2136_TSIG_KEY: ""
RFC2136_TSIG_SECRET: ""
RFC2136_TIMEOUT: ""
route53:
AWS_REGION: ""
AWS_ACCESS_KEY_ID: ""
AWS_SECRET_ACCESS_KEY: ""
vultr:
VULTR_API_KEY: ""
## Save ACME certs to a persistent volume.
## WARNING: If you do not do this and you did not have configured
## a kvprovider, you will re-request certs every time a pod (re-)starts
## and you WILL be rate limited!
persistence:
enabled: true
annotations: {}
## acme data Persistent Volume Storage Class
## If defined, storageClassName: <storageClass>
## If set to "-", storageClassName: "", which disables dynamic provisioning
## If undefined (the default) or set to null, no storageClassName spec is
## set, choosing the default provisioner. (gp2 on AWS, standard on
## GKE, AWS & OpenStack)
##
storageClass: "default"
accessMode: ReadWriteOnce
size: 1Gi
## A manually managed Persistent Volume Claim
## Requires persistence.enabled: true
## If defined, PVC must be created manually before volume will be bound
##
# existingClaim:
dashboard:
enabled: true
domain: traefik.k8s-test.hardstyletop40.com
# serviceType: ClusterIP
service: {}
# annotations:
# key: value
ingress: {}
# annotations:
# key: value
# labels:
# key: value
# tls:
# - hosts:
# - traefik.example.com
# secretName: traefik-default-cert
auth: {}
# basic:
# username: password
statistics: {}
## Number of recent errors to show in the ‘Health’ tab
# recentErrors:
service:
# annotations:
# key: value
# labels:
# key: value
## Further config for service of type NodePort
## Default config with empty string "" will assign a dynamic
## nodePort to http and https ports
nodePorts:
http: ""
https: ""
## If static nodePort configuration is required it can be enabled as below
## Configure ports in allowable range (eg. 30000 - 32767 on minikube)
# nodePorts:
# http: 30080
# https: 30443
gzip:
enabled: true
traefikLogFormat: json
accessLogs:
enabled: false
## Path to the access logs file. If not provided, Traefik defaults it to stdout.
# filePath: ""
format: common # choices are: common, json
## for JSON logging, finer-grained control over what is logged. Fields can be
## retained or dropped, and request headers can be retained, dropped or redacted
fields:
# choices are keep, drop
defaultMode: keep
names: {}
# ClientUsername: drop
headers:
# choices are keep, drop, redact
defaultMode: keep
names: {}
# Authorization: redact
rbac:
enabled: false
## Enable the /metrics endpoint, for now only supports prometheus
## set to true to enable metric collection by prometheus
metrics:
prometheus:
enabled: false
## If true, prevents exposing port 8080 on the main Traefik service, reserving
## it to the dashboard service only
restrictAccess: false
# buckets: [0.1,0.3,1.2,5]
datadog:
enabled: false
# address: localhost:8125
# pushinterval: 10s
statsd:
enabled: false
# address: localhost:8125
# pushinterval: 10s
deployment:
# labels to add to the pod container metadata
# podLabels:
# key: value
# podAnnotations:
# key: value
hostPort:
httpEnabled: false
httpsEnabled: false
dashboardEnabled: false
# httpPort: 80
# httpsPort: 443
# dashboardPort: 8080
sendAnonymousUsage: false
tracing:
enabled: false
serviceName: traefik
# backend: choices are jaeger, zipkin, datadog
# jaeger:
# localAgentHostPort: "127.0.0.1:6831"
# samplingServerURL: http://localhost:5778/sampling
# samplingType: const
# samplingParam: 1.0
# zipkin:
# httpEndpoint: http://localhost:9411/api/v1/spans
# debug: false
# sameSpan: false
# id128bit: true
# datadog:
# localAgentHostPort: "127.0.0.1:8126"
# debug: false
# globalTag: ""
## Create HorizontalPodAutoscaler object.
##
# autoscaling:
# minReplicas: 1
# maxReplicas: 10
# metrics:
# - type: Resource
# resource:
# name: cpu
# targetAverageUtilization: 60
# - type: Resource
# resource:
# name: memory
# targetAverageUtilization: 60
## Timeouts
##
# timeouts:
# ## responding are timeouts for incoming requests to the Traefik instance
# responding:
# readTimeout: 0s
# writeTimeout: 0s
# idleTimeout: 180s
# ## forwarding are timeouts for requests forwarded to the backend servers
# forwarding:
# dialTimeout: 30s
# responseHeaderTimeout: 0s

For your issue, it seems you misunderstand persistent volume claims. When you use the command:
kubectl get sc --all-namespaces
it just shows the storage classes, not the persistent volume claims. A storage class defines how a unit of storage is dynamically created as a persistent volume. You still need to create the persistent volume claims you need, like this:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: azure-managed-disk
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: managed-premium
  resources:
    requests:
      storage: 5Gi
You can then list the persistent volume claims with the command below:
kubectl get pvc --all-namespaces
It shows the persistent volume claims that you actually created. Take a look at "Dynamically create and use a persistent volume with Azure disks in Azure Kubernetes Service (AKS)", or use the specific disk that you create.
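If you create the claim manually like this, the chart can be pointed at it instead of provisioning its own: the values.yaml above already has a commented existingClaim setting under acme.persistence. A minimal sketch, reusing the claim name from the example above, would be:
acme:
  persistence:
    enabled: true
    existingClaim: azure-managed-disk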
Update
Also, I got the same error as you, but once the pod was in the running state I checked inside the pod and found the volumes all mounted correctly. So I guess the error appears because the pod is not yet in the running state; once the pod is running, the volumes mount as expected.

The main issue is that attaching external Azure resources is slow, and initially there is a retry. During that time the pod throws a lot of errors that it cannot mount, since the volume is dynamically created; thanks to the retry it recovers after a few minutes.
In fact, the actual container crash was due to an issue with ACME and Traefik itself and not directly with the volumes.
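To tell the two failure modes apart, comparing the pod events with the container logs is usually enough; for example, using the pod name from the event trace above:
kubectl describe pod traefik-d65fcbc8b-lkzsh   # attach/mount events
kubectl logs traefik-d65fcbc8b-lkzsh           # ACME/Traefik errors once the container has started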

Related

snakemake allocates memory twice

I am noticing that all my rules request memory twice: once at a lower maximum than what I requested (mem_mb), and then at what I actually requested (mem_gb). If I run the rules as localrules they do run faster. How can I make sure the default settings do not interfere?
resources: mem_mb=100, disk_mb=8620, tmpdir=/tmp/pop071.54835, partition=h24, qos=normal, mem_gb=100, time=120:00:00
The rules are as follows:
rule bwa_mem2_mem:
    input:
        R1 = "data/results/qc/{species}.{population}.{individual}_1.fq.gz",
        R2 = "data/results/qc/{species}.{population}.{individual}_2.fq.gz",
        R1_unp = "data/results/qc/{species}.{population}.{individual}_1_unp.fq.gz",
        R2_unp = "data/results/qc/{species}.{population}.{individual}_2_unp.fq.gz",
        idx = "data/results/genome/genome",
        ref = "data/results/genome/genome.fa"
    output:
        bam = "data/results/mapped_reads/{species}.{population}.{individual}.bam",
    log:
        bwa = "logs/bwa_mem2/{species}.{population}.{individual}.log",
        sam = "logs/samtools_view/{species}.{population}.{individual}.log",
    benchmark:
        "benchmark/bwa_mem2_mem/{species}.{population}.{individual}.tsv",
    resources:
        time = parameters["bwa_mem2"]["time"],
        mem_gb = parameters["bwa_mem2"]["mem_gb"],
    params:
        extra = parameters["bwa_mem2"]["extra"],
        tag = compose_rg_tag,
    threads:
        parameters["bwa_mem2"]["threads"],
    shell:
        "bwa-mem2 mem -t {threads} -R '{params.tag}' {params.extra} {input.idx} {input.R1} {input.R2} | "
        "samtools sort -l 9 -o {output.bam} --reference {input.ref} --output-fmt CRAM -# {threads} /dev/stdin 2> {log.sam}"
and the config is:
cluster:
  mkdir -p logs/{rule} && # change the log file to logs/slurm/{rule}
  sbatch
    --partition={resources.partition}
    --time={resources.time}
    --qos={resources.qos}
    --cpus-per-task={threads}
    --mem={resources.mem_gb}
    --job-name=smk-{rule}-{wildcards}
    --output=logs/{rule}/{rule}-{wildcards}-%j.out
    --parsable # Required to pass job IDs to scancel
default-resources:
  - partition=h24
  - qos=normal
  - mem_gb=100
  - time="04:00:00"
restart-times: 3
max-jobs-per-second: 10
max-status-checks-per-second: 1
local-cores: 1
latency-wait: 60
jobs: 100
keep-going: True
rerun-incomplete: True
printshellcmds: True
scheduler: greedy
use-conda: True # Required to run with a local conda environment
cluster-status: status-sacct.sh # Required to monitor the status of the submitted jobs
cluster-cancel: scancel # Required to cancel the jobs with Ctrl + C
cluster-cancel-nargs: 50
Cheers,
Angel
Right now there are two separate memory resource requirements:
mem_mb
mem_gb
From the perspective of Snakemake these are different, so both will be passed to the cluster. A quick fix is to use the same units, e.g. if the resource really requires only 100 MB, then the default resources should be changed to:
default-resources:
  - partition=h24
  - qos=normal
  - mem_mb=100
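Alternatively, if the rule really needs 100 GB, you can keep a single memory resource by switching the rule's resources entry from mem_gb to mem_mb (for example multiplying the configured gigabytes by 1000) and pointing the profile at that same key. A rough sketch of the relevant profile lines, assuming that change in the rule:
cluster:
  sbatch
    ...
    --mem={resources.mem_mb}
default-resources:
  - mem_mb=100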

Prometheus is empty (no targets)

Describe the bug
After rolling out, I try to access Prometheus; unfortunately, no targets are displayed.
Version of Helm and Kubernetes:
Helm Version:
$ helm version
version.BuildInfo{Version:"v3.5.3", GitCommit:"041ce5a2c17a58be0fcd5f5e16fb3e7e95fea622", GitTreeState:"dirty", GoVersion:"go1.15.8"}
Kubernetes Version:
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.0", GitCommit:"af46c47ce925f4c4ad5cc8d1fca46c7b77d13b38", GitTreeState:"clean", BuildDate:"2020-12-08T17:59:43Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.5", GitCommit:"9a45ba1752db920873e084791faff8d470278b09", GitTreeState:"clean", BuildDate:"2021-05-19T22:28:02Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
Which chart:
kube-prometheus-stack
Which version of the chart:
16.6.0
What you expected to happen:
Targets and service discovery should contain information
Changed values of values.yaml (only put values which differ from the defaults):
values.yaml
alertmanagerSpec:
nodeSelector:
beta.kubernetes.io/os: linux
------------------
grafana:
enabled: true
namespaceOverride: ""
## ForceDeployDatasources Create datasource configmap even if grafana deployment has been disabled
##
forceDeployDatasources: false
nodeSelector:
beta.kubernetes.io/os: linux
## ForceDeployDashboard Create dashboard configmap even if grafana deployment has been disabled
##
forceDeployDashboards: false
## Deploy default dashboards.
##
storageSpec:
volumeClaimTemplate:
metadata:
name: grafana-pvc
spec:
storageClassName: default
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 32Gi
selector:
defaultDashboardsEnabled: true
adminPassword: admin
------------------
kubeStateMetrics:
nodeSelector:
beta.kubernetes.io/os: linux
enabled: true
serviceMonitor:
-------------------
kube-state-metrics:
namespaceOverride: ""
rbac:
create: true
podSecurityPolicy:
enabled: true
nodeSelector:
beta.kubernetes.io/os: linux
--------------------
nodeExporter:
enabled: true
nodeSelector:
beta.kubernetes.io/os: linux
---------------------
prometheus-node-exporter:
namespaceOverride: ""
podLabels:
## Add the 'node-exporter' label to be used by serviceMonitor to match standard common usage in rules and grafana dashboards
##
jobLabel: node-exporter
extraArgs:
- --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/.+)($|/)
- --collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$
nodeSelector:
beta.kubernetes.io/os: linux
----------------
prometheusOperator:
enabled: true
patch:
enabled: true
image:
repository: jettech/kube-webhook-certgen
tag: v1.5.2
sha: ""
pullPolicy: IfNotPresent
resources: {}
## Provide a priority class name to the webhook patching job
##
priorityClassName: ""
podAnnotations: {}
nodeSelector:
beta.kubernetes.io/os: linux
affinity: {}
tolerations: []
-------------------------
prometheusSpec:
nodeSelector:
beta.kubernetes.io/os: linux
storageSpec:
volumeClaimTemplate:
metadata:
name: prometheus-pvc
spec:
storageClassName: default
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 32Gi
selector:
Anything else we need to know:
EDIT
NAME READY STATUS RESTARTS AGE
pod/alertmanager-kube-monitoring-kube-prome-alertmanager-0 2/2 Running 0 9h
pod/kube-monitoring-grafana-6896d856d9-krtgd 2/2 Running 0 9h
pod/kube-monitoring-kube-prome-operator-74b76d89f7-td5t9 1/1 Running 0 9h
pod/kube-monitoring-kube-state-metrics-7db74b856-p2qmx 1/1 Running 0 9h
pod/kube-monitoring-prometheus-node-exporter-96mdj 1/1 Running 0 9h
pod/kube-monitoring-prometheus-node-exporter-lmstk 1/1 Running 0 9h
pod/prometheus-kube-monitoring-kube-prome-prometheus-0 2/2 Running 1 9h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 9h
service/kube-monitoring-grafana ClusterIP 10.0.122.52 <none> 80/TCP 9h
service/kube-monitoring-kube-prome-alertmanager ClusterIP 10.0.115.147 <none> 9093/TCP 9h
service/kube-monitoring-kube-prome-operator ClusterIP 10.0.127.119 <none> 443/TCP 9h
service/kube-monitoring-kube-prome-prometheus ClusterIP 10.0.229.127 <none> 9090/TCP 9h
service/kube-monitoring-kube-state-metrics ClusterIP 10.0.106.71 <none> 8080/TCP 9h
service/kube-monitoring-prometheus-node-exporter ClusterIP 10.0.32.130 <none> 9100/TCP 9h
service/prometheus-operated ClusterIP None <none> 9090/TCP 9h
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/kube-monitoring-prometheus-node-exporter 2 2 2 2 2 beta.kubernetes.io/os=linux 9h
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/kube-monitoring-grafana 1/1 1 1 9h
deployment.apps/kube-monitoring-kube-prome-operator 1/1 1 1 9h
deployment.apps/kube-monitoring-kube-state-metrics 1/1 1 1 9h
NAME DESIRED CURRENT READY AGE
replicaset.apps/kube-monitoring-grafana-6896d856d9 1 1 1 9h
replicaset.apps/kube-monitoring-kube-prome-operator-74b76d89f7 1 1 1 9h
replicaset.apps/kube-monitoring-kube-state-metrics-7db74b856 1 1 1 9h
If more information is needed please let me know. Thanks in advance!

Ansible: how to find a dictionary out of a list of dictionaries

$ more defaults/mail.yaml
---
envs:
  - dev:
      acr-names:
        - intake.azurecr.io
        - dit.azurecr.io
        - dev.azurecr.io
      subscription-id: xxx
  - uat:
      acr-names:
        - stagreg.azurecr.io
      subscription-id: yyy
  - prod:
      acr-names:
        - prodreg.azurecr.io
      subscription-id: zzz
I want to write an Ansible play to copy an image between registries in Azure: https://learn.microsoft.com/en-us/azure/container-registry/container-registry-import-images#import-from-a-registry-in-a-different-subscription
The play should accept two parameters, source_image and target_image, so that the play imports the image from the source to the destination.
For Ex:
ansible-playbook sync-docker-image.yml -e source_image=dit.azurecr.io/repo1:v1.0.0.0 -e target_image=stagreg.azurecr.io/stage-repo:latest
Two questions:
How can I find out which env (dev, uat or prod) the source_image or target_image belongs to in the Ansible playbook? Based on the env, I want to choose the subscription-id. So from the above example, I want to create two variables called source_subscription and target_subscription and assign them the dev and uat subscriptions respectively.
In YAML, is it possible to access a variable in a list of dictionaries based on a key, for example something like envs[dev]?
Thanks
First, if possible: when you only have the three stages, don't use a list of dict items in envs. I assume they are already named, so use:
envs:
  dev:
    acr-names:
      - ...
    subscription-id: xxx
  uat:
    acr-names:
      - ...
    subscription-id: yyy
  prod:
    acr-names:
      - ...
    subscription-id: zzz
This would make it easier to access the stages via envs.dev or envs.uat etc. So you need to iterate only over envs.dev.acr-names (maybe use _ instead of -, otherwise you'll get in trouble later). Inside the iteration you can use the when condition to check the item against your source:
- name: "Facts"
set_fact:
envs:
dev:
acr_names:
- intake.azurecr.io
- dit.azurecr.io
- dev.azurecr.io
subscription_id: xxx
uat:
acr_names:
- stagreg.azurecr.io
subscription_id: yyy
prod:
acr_names:
- prodreg.azurecr.io
subscription_id: zzz
source_image: "dit.azurecr.io/repo1:v1.0.0.0"
target_image: "stagreg.azurecr.io/stage-repo:latest"
- name: "Identify source subscription"
set_fact:
source_subscription: "{{ envs.dev.subscription_id }}"
when:
- "item in source_image"
- "source_subscription is undefined"
loop: "{{ envs.dev.acr_names }}"
If it isn't possible to change the dict (because you have "many"), you need to iterate over the items in envs. If possible, do not create "random" keys but use a "name"d item. So a structure like this would be better:
envs:
  - name: dev
    acr_names:
      - ...
    subscription_id: xxx
  - name: uat
    acr_names:
      - ...
    subscription_id: yyy
  ...
So you iterate over the items in envs and then iterate over item.acr_names to find your system. This is more complicated, because you loop over a list and then iterate over the items in that list; I think this isn't possible with one single task. But with the given structure the problem is that the string in source_image is not exactly what is in acr_names. So remove anything after the slash, and then you can use a different method to search for a string in a list.
- name: "Identify source subscription"
set_fact:
source_subscription: "{{ env.subscription_id }}"
when:
- "source_image.split('/')[0] in env.acr_names"
- "source_subscription is undefined"
loop: "{{ envs }}"
loop_control:
loop_var: env
You could also use the split filter in the first example without looping over envs.dev etc.
- name: "Show result"
set_fact:
source_subscription: "{{ envs.dev.subscription_id }}"
when:
- "source_image.split('/')[0] in envs.dev.acr_names"
If you really need to use your given structure, then you need to iterate over envs. Each item contains a dictionary with a random key as its root element, which makes it very complicated. In that case you need to loop over it, include a separate tasks file with include_tasks, and inside that tasks list you need the lookup('dict', env) filter (or dict2items, as below) to get a dict you can iterate over, so you can access item.key, item.value.acr_names and item.value.subscription_id to get the values inside the dict. I wouldn't recommend that.
- name: "Identify source subscription"
include_tasks: find_env.yml
loop: "{{ envs }}"
loop_control:
loop_var: env
and find_env.yml contains:
- name: "Show result"
set_fact:
source_subscription: "{{ env[item.key].subscription_id }}"
when:
- "source_image.split('/')[0] in env[item.key].acr_names"
- "source_subscription is undefined"
loop: "{{ env | dict2items }}"
All of this must be done twice for source and target.
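For example, the target side of the loop-based variant above might look like this (a sketch, assuming the same name-based envs structure):
- name: "Identify target subscription"
  set_fact:
    target_subscription: "{{ env.subscription_id }}"
  when:
    - "target_image.split('/')[0] in env.acr_names"
    - "target_subscription is undefined"
  loop: "{{ envs }}"
  loop_control:
    loop_var: env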

DataDog GKE NestJS integration using DataDog's Helm chart

I'm trying to deploy my service and read my local log file from inside the pod.
I'm using DataDog's Helm chart values with the following config:
## Default values for Datadog Agent
## See Datadog helm documentation to learn more:
## https://docs.datadoghq.com/agent/kubernetes/helm/
## #param image - object - required
## Define the Datadog image to work with.
#
image:
## #param repository - string - required
## Define the repository to use:
## use "datadog/agent" for Datadog Agent 6
## use "datadog/dogstatsd" for Standalone Datadog Agent DogStatsD6
#
repository: datadog/agent
## #param tag - string - required
## Define the Agent version to use.
## Use 6.13.0-jmx to enable jmx fetch collection
#
tag: 6.13.0
## #param pullPolicy - string - required
## The Kubernetes pull policy.
#
pullPolicy: IfNotPresent
## #param pullSecrets - list of key:value strings - optional
## It is possible to specify docker registry credentials
## See https://kubernetes.io/docs/concepts/containers/images/#specifying-imagepullsecrets-on-a-pod
#
# pullSecrets:
# - name: "<REG_SECRET>"
nameOverride: ""
fullnameOverride: ""
datadog:
## #param apiKey - string - required
## Set this to your Datadog API key before the Agent runs.
## ref: https://app.datadoghq.com/account/settings#agent/kubernetes
#
apiKey: "xxxxxxxx"
## #param apiKeyExistingSecret - string - optional
## Use existing Secret which stores API key instead of creating a new one.
## If set, this parameter takes precedence over "apiKey".
#
# apiKeyExistingSecret: <DATADOG_API_KEY_SECRET>
## #param appKey - string - optional
## If you are using clusterAgent.metricsProvider.enabled = true, you must set
## a Datadog application key for read access to your metrics.
#
appKey: "xxxxxx"
## #param appKeyExistingSecret - string - optional
## Use existing Secret which stores APP key instead of creating a new one
## If set, this parameter takes precedence over "appKey".
#
# appKeyExistingSecret: <DATADOG_APP_KEY_SECRET>
## #param securityContext - object - optional
## You can modify the security context used to run the containers by
## modifying the label type below:
#
# securityContext:
# seLinuxOptions:
# seLinuxLabel: "spc_t"
## #param clusterName - string - optional
## Set a unique cluster name to allow scoping hosts and Cluster Checks easily
#
# clusterName: <CLUSTER_NAME>
## #param name - string - required
## Daemonset/Deployment container name
## See clusterAgent.containerName if clusterAgent.enabled = true
#
name: datadog
## #param site - string - optional - default: 'datadoghq.com'
## The site of the Datadog intake to send Agent data to.
## Set to 'datadoghq.eu' to send data to the EU site.
#
# site: datadoghq.com
## #param dd_url - string - optional - default: 'https://app.datadoghq.com'
## The host of the Datadog intake server to send Agent data to, only set this option
## if you need the Agent to send data to a custom URL.
## Overrides the site setting defined in "site".
#
# dd_url: https://app.datadoghq.com
## #param logLevel - string - required
## Set logging verbosity, valid log levels are:
## trace, debug, info, warn, error, critical, and off
#
logLevel: INFO
## #param podLabelsAsTags - list of key:value strings - optional
## Provide a mapping of Kubernetes Labels to Datadog Tags.
#
# podLabelsAsTags:
# app: kube_app
# release: helm_release
# <KUBERNETES_LABEL>: <DATADOG_TAG_KEY>
## #param podAnnotationsAsTags - list of key:value strings - optional
## Provide a mapping of Kubernetes Annotations to Datadog Tags
#
# podAnnotationsAsTags:
# iam.amazonaws.com/role: kube_iamrole
# <KUBERNETES_ANNOTATIONS>: <DATADOG_TAG_KEY>
## #param tags - list of key:value elements - optional
## List of tags to attach to every metric, event and service check collected by this Agent.
##
## Learn more about tagging: https://docs.datadoghq.com/tagging/
#
# tags:
# - <KEY_1>:<VALUE_1>
# - <KEY_2>:<VALUE_2>
## #param useCriSocketVolume - boolean - required
## Enable container runtime socket volume mounting
#
useCriSocketVolume: true
## #param dogstatsdOriginDetection - boolean - optional
## Enable origin detection for container tagging
## https://docs.datadoghq.com/developers/dogstatsd/unix_socket/#using-origin-detection-for-container-tagging
#
# dogstatsdOriginDetection: true
## #param useDogStatsDSocketVolume - boolean - optional
## Enable dogstatsd over Unix Domain Socket
## ref: https://docs.datadoghq.com/developers/dogstatsd/unix_socket/
#
# useDogStatsDSocketVolume: true
## #param nonLocalTraffic - boolean - optional - default: false
## Enable this to make each node accept non-local statsd traffic.
## ref: https://github.com/DataDog/docker-dd-agent#environment-variables
#
nonLocalTraffic: true
## #param collectEvents - boolean - optional - default: false
## Enables this to start event collection from the kubernetes API
## ref: https://docs.datadoghq.com/agent/kubernetes/event_collection/
#
collectEvents: true
## #param leaderElection - boolean - optional - default: false
## Enables leader election mechanism for event collection.
#
# leaderElection: false
## #param leaderLeaseDuration - integer - optional - default: 60
## Set the lease time for leader election in second.
#
# leaderLeaseDuration: 60
## #param logsEnabled - boolean - optional - default: false
## Enables this to activate Datadog Agent log collection.
## ref: https://docs.datadoghq.com/agent/basic_agent_usage/kubernetes/#log-collection-setup
#
logsEnabled: true
## #param logsConfigContainerCollectAll - boolean - optional - default: false
## Enable this to allow log collection for all containers.
## ref: https://docs.datadoghq.com/agent/basic_agent_usage/kubernetes/#log-collection-setup
#
logsConfigContainerCollectAll: true
## #param containerLogsPath - string - optional - default: /var/lib/docker/containers
## This to allow log collection from container log path. Set to a different path if not
## using docker runtime.
## ref: https://docs.datadoghq.com/agent/kubernetes/daemonset_setup/?tab=k8sfile#create-manifest
#
containerLogsPath: /var/lib/docker/containers
## #param apmEnabled - boolean - optional - default: false
## Enable this to enable APM and tracing, on port 8126
## ref: https://github.com/DataDog/docker-dd-agent#tracing-from-the-host
#
apmEnabled: true
## #param processAgentEnabled - boolean - optional - default: false
## Enable this to activate live process monitoring.
## Note: /etc/passwd is automatically mounted to allow username resolution.
## ref: https://docs.datadoghq.com/graphing/infrastructure/process/#kubernetes-daemonset
#
processAgentEnabled: true
## #param env - list of object - optional
## The dd-agent supports many environment variables
## ref: https://github.com/DataDog/datadog-agent/tree/master/Dockerfiles/agent#environment-variables
#
# env:
# - name: <ENV_VAR_NAME>
# value: <ENV_VAR_VALUE>
## #param volumes - list of objects - optional
## Specify additional volumes to mount in the dd-agent container
#
# volumes:
# - hostPath:
# path: <HOST_PATH>
# name: <VOLUME_NAME>
## #param volumeMounts - list of objects - optional
## Specify additional volumes to mount in the dd-agent container
#
# volumeMounts:
# - name: <VOLUME_NAME>
# mountPath: <CONTAINER_PATH>
# readOnly: true
## #param confd - list of objects - optional
## Provide additional check configurations (static and Autodiscovery)
## Each key becomes a file in /conf.d
## ref: https://github.com/DataDog/datadog-agent/tree/master/Dockerfiles/agent#optional-volumes
## ref: https://docs.datadoghq.com/agent/autodiscovery/
#
confd:
conf.yaml: |-
init_config:
instances:
logs:
- type: "file"
path: "/app/logs/service.log"
service: nodejs
source: nodejs
sourcecategory: sourcecode
# kubernetes_state.yaml: |-
# ad_identifiers:
# - kube-state-metrics
# init_config:
# instances:
# - kube_state_url: http://%%host%%:8080/metrics
## #param checksd - list of key:value strings - optional
## Provide additional custom checks as python code
## Each key becomes a file in /checks.d
## ref: https://github.com/DataDog/datadog-agent/tree/master/Dockerfiles/agent#optional-volumes
#
# checksd:
# service.py: |-
## #param criSocketPath - string - optional
## Path to the container runtime socket (if different from Docker)
## This is supported starting from agent 6.6.0
#
# criSocketPath: /var/run/containerd/containerd.sock
## #param dogStatsDSocketPath - string - optional
## Path to the DogStatsD socket
#
# dogStatsDSocketPath: /var/run/datadog/dsd.socket
## #param livenessProbe - object - optional
## Override the agent's liveness probe logic from the default:
## In case of issues with the probe, you can disable it with the
## following values, to allow easier investigating:
#
# livenessProbe:
# exec:
# command: ["/bin/true"]
## #param resources - object -required
## datadog-agent resource requests and limits
## Make sure to keep requests and limits equal to keep the pods in the Guaranteed QoS class
## Ref: http://kubernetes.io/docs/user-guide/compute-resources/
#
resources: {}
# requests:
# cpu: 200m
# memory: 256Mi
# limits:
# cpu: 200m
# memory: 256Mi
## #param clusterAgent - object - required
## This is the Datadog Cluster Agent implementation that handles cluster-wide
## metrics more cleanly, separates concerns for better rbac, and implements
## the external metrics API so you can autoscale HPAs based on datadog metrics
## ref: https://docs.datadoghq.com/agent/kubernetes/cluster/
#
clusterAgent:
## #param enabled - boolean - required
## Set this to true to enable Datadog Cluster Agent
#
enabled: false
containerName: cluster-agent
image:
repository: datadog/cluster-agent
tag: 1.3.2
pullPolicy: IfNotPresent
## #param token - string - required
## This needs to be at least 32 characters a-zA-z
## It is a preshared key between the node agents and the cluster agent
## ref:
#
token: ""
replicas: 1
## #param metricsProvider - object - required
## Enable the metricsProvider to be able to scale based on metrics in Datadog
#
metricsProvider:
enabled: true
## #param clusterChecks - object - required
## Enable the Cluster Checks feature on both the cluster-agents and the daemonset
## ref: https://docs.datadoghq.com/agent/autodiscovery/clusterchecks/
## Autodiscovery via Kube Service annotations is automatically enabled
#
clusterChecks:
enabled: true
## #param confd - list of objects - optional
## Provide additional cluster check configurations
## Each key will become a file in /conf.d
## ref: https://docs.datadoghq.com/agent/autodiscovery/
#
# confd:
# mysql.yaml: |-
# cluster_check: true
# instances:
# - server: '<EXTERNAL_IP>'
# port: 3306
# user: datadog
# pass: '<YOUR_CHOSEN_PASSWORD>'
## #param resources - object -required
## Datadog cluster-agent resource requests and limits.
#
resources: {}
# requests:
# cpu: 200m
# memory: 256Mi
# limits:
# cpu: 200m
# memory: 256Mi
## #param priorityclassName - string - optional
## Name of the priorityClass to apply to the Cluster Agent
# priorityClassName: system-cluster-critical
## #param livenessProbe - object - optional
## Override the agent's liveness probe logic from the default:
## In case of issues with the probe, you can disable it with the
## following values, to allow easier investigating:
#
# livenessProbe:
# exec:
# command: ["/bin/true"]
## #param podAnnotations - list of key:value strings - optional
## Annotations to add to the cluster-agents's pod(s)
#
# podAnnotations:
# key: "value"
## #param readinessProbe - object - optional
## Override the cluster-agent's readiness probe logic from the default:
#
# readinessProbe:
rbac:
## #param created - boolean - required
## If true, create & use RBAC resources
#
create: true
## #param serviceAccountName - string - required
## Ignored if rbac.create is true
#
serviceAccountName: default
tolerations: []
kubeStateMetrics:
## #param enabled - boolean - required
## If true, deploys the kube-state-metrics deployment.
## ref: https://github.com/kubernetes/charts/tree/master/stable/kube-state-metrics
#
enabled: true
kube-state-metrics:
rbac:
## #param created - boolean - required
## If true, create & use RBAC resources
#
create: true
serviceAccount:
## #param created - boolean - required
## If true, create ServiceAccount, require rbac kube-state-metrics.rbac.create true
#
create: true
## #param name - string - required
## The name of the ServiceAccount to use.
## If not set and create is true, a name is generated using the fullname template
#
name: coupon-service-account
## #param resources - object - optional
## Resource requests and limits for the kube-state-metrics container.
#
# resources:
# requests:
# cpu: 200m
# memory: 256Mi
# limits:
# cpu: 200m
# memory: 256Mi
daemonset:
## #param enabled - boolean - required
## You should keep Datadog DaemonSet enabled!
## The exceptional case could be a situation when you need to run
## single DataDog pod per every namespace, but you do not need to
## re-create a DaemonSet for every non-default namespace install.
## Note: StatsD and DogStatsD work over UDP, so you may not
## get guaranteed delivery of the metrics in Datadog-per-namespace setup!
#
enabled: true
## #param useDedicatedContainers - boolean - optional
## Deploy each datadog agent process in a separate container. Allow fine-grained
## control over allocated resources and better isolation.
#
# useDedicatedContainers: false
containers:
agent:
## #param env - list - required
## Additionnal environment variables for the agent container.
#
# env:
## #param logLevel - string - optional
## Set logging verbosity, valid log levels are:
## trace, debug, info, warn, error, critical, and off.
## If not set, fall back to the value of datadog.logLevel.
#
logLevel: INFO
## #param resources - object - required
## Resource requests and limits for the agent container.
#
resources:
requests:
cpu: 200m
memory: 256Mi
limits:
cpu: 200m
memory: 256Mi
processAgent:
## #param env - list - required
## Additionnal environment variables for the process-agent container.
#
# env:
## #param logLevel - string - optional
## Set logging verbosity, valid log levels are:
## trace, debug, info, warn, error, critical, and off.
## If not set, fall back to the value of datadog.logLevel.
#
logLevel: INFO
## #param resources - object - required
## Resource requests and limits for the process-agent container.
#
resources:
requests:
cpu: 100m
memory: 200Mi
limits:
cpu: 100m
memory: 200Mi
traceAgent:
## #param env - list - required
## Additionnal environment variables for the trace-agent container.
#
# env:
## #param logLevel - string - optional
## Set logging verbosity, valid log levels are:
## trace, debug, info, warn, error, critical, and off.
## If not set, fall back to the value of datadog.logLevel.
#
logLevel: INFO
## #param resources - object - required
## Resource requests and limits for the trace-agent container.
#
resources:
requests:
cpu: 100m
memory: 200Mi
limits:
cpu: 100m
memory: 200Mi
## #param useHostNetwork - boolean - optional
## Bind ports on the hostNetwork. Useful for CNI networking where hostPort might
## not be supported. The ports need to be available on all hosts. It Can be
## used for custom metrics instead of a service endpoint.
##
## WARNING: Make sure that hosts using this are properly firewalled otherwise
## metrics and traces are accepted from any host able to connect to this host.
#
useHostNetwork: true
## #param useHostPort - boolean - optional
## Sets the hostPort to the same value of the container port. Needs to be used
## to receive traces in a standard APM set up. Can be used as for sending custom metrics.
## The ports need to be available on all hosts.
##
## WARNING: Make sure that hosts using this are properly firewalled otherwise
## metrics and traces are accepted from any host able to connect to this host.
#
useHostPort: true
## #param useHostPID - boolean - optional
## Run the agent in the host's PID namespace. This is required for Dogstatsd origin
## detection to work. See https://docs.datadoghq.com/developers/dogstatsd/unix_socket/
#
# useHostPID: true
## #param podAnnotations - list of key:value strings - optional
## Annotations to add to the DaemonSet's Pods
#
# podAnnotations:
# <POD_ANNOTATION>: '[{"key": "<KEY>", "value": "<VALUE>"}]'
## #param tolerations - array - optional
## Allow the DaemonSet to schedule on tainted nodes (requires Kubernetes >= 1.6)
#
# tolerations: []
## #param nodeSelector - object - optional
## Allow the DaemonSet to schedule on selected nodes
## Ref: https://kubernetes.io/docs/user-guide/node-selection/
#
# nodeSelector: {}
## #param affinity - object - optional
## Allow the DaemonSet to schedule using affinity rules
## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
#
# affinity: {}
## #param updateStrategy - string - optional
## Allow the DaemonSet to perform a rolling update on helm update
## ref: https://kubernetes.io/docs/tasks/manage-daemon/update-daemon-set/
#
# updateStrategy: RollingUpdate
## #param priorityClassName - string - optional
## Sets PriorityClassName if defined.
#
# priorityClassName:
## #param podLabels - object - optional
## Sets podLabels if defined.
#
# podLabels: {}
## #param useConfigMap - boolean - optional
# Configures a configmap to provide the agent configuration
#
# useConfigMap: false
deployment:
## #param enabled - boolean - required
## Apart from DaemonSet, deploy Datadog agent pods and related service for
## applications that want to send custom metrics. Provides DogStasD service.
#
enabled: false
## #param replicas - integer - required
## If you want to use datadog.collectEvents, keep deployment.replicas set to 1.
#
replicas: 1
## #param affinity - object - required
## Affinity for pod assignment
## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
#
affinity: {}
## #param tolerations - array - required
## Tolerations for pod assignment
## Ref: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
#
tolerations: []
## #param dogstatsdNodePort - integer - optional
## If you're using a NodePort-type service and need a fixed port, set this parameter.
#
# dogstatsdNodePort: 8125
## #param traceNodePort - integer - optional
## If you're using a NodePort-type service and need a fixed port, set this parameter.
#
# traceNodePort: 8126
## #param service - object - required
##
#
service:
type: ClusterIP
annotations: {}
## #param priorityClassName - string - optional
## Sets PriorityClassName if defined.
#
# priorityClassName:
clusterchecksDeployment:
## #param enabled - boolean - required
## If true, deploys agent dedicated for running the Cluster Checks instead of running in the Daemonset's agents.
## ref: https://docs.datadoghq.com/agent/autodiscovery/clusterchecks/
#
enabled: false
rbac:
## #param dedicated - boolean - required
## If true, use a dedicated RBAC resource for the cluster checks agent(s)
#
dedicated: false
## #param serviceAccountName - string - required
## Ignored if rbac.create is true
#
serviceAccountName: default
## #param replicas - integer - required
## If you want to deploy the cluckerchecks agent in HA, keep at least clusterchecksDeployment.replicas set to 2.
## And increase the clusterchecksDeployment.replicas according to the number of Cluster Checks.
#
replicas: 2
## #param resources - object -required
## Datadog clusterchecks-agent resource requests and limits.
#
resources: {}
# requests:
# cpu: 200m
# memory: 500Mi
# limits:
# cpu: 200m
# memory: 500Mi
## #param affinity - object - optional
## Allow the ClusterChecks Deployment to schedule using affinity rules.
## By default, ClusterChecks Deployment Pods are forced to run on different Nodes.
## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
#
# affinity:
## #param nodeSelector - object - optional
## Allow the ClusterChecks Deploument to schedule on selected nodes
## Ref: https://kubernetes.io/docs/user-guide/node-selection/
#
# nodeSelector: {}
## #param tolerations - array - required
## Tolerations for pod assignment
## Ref: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
#
# tolerations: []
## #param livenessProbe - object - optional
## Override the agent's liveness probe logic from the default:
## In case of issues with the probe, you can disable it with the
## following values, to allow easier investigating:
#
# livenessProbe:
# exec:
# command: ["/bin/true"]
## #param env - list of object - optional
## The dd-agent supports many environment variables
## ref: https://github.com/DataDog/datadog-agent/tree/master/Dockerfiles/agent#environment-variables
#
# env:
# - name: <ENV_VAR_NAME>
# value: <ENV_VAR_VALUE>
As you can see, I expect my logs to be available at /app/logs/service.log, and that's what I'm supplying to my conf.d:
confd:
  conf.yaml: |-
    init_config:
    instances:
    logs:
      - type: "file"
        path: "/app/logs/service.log"
        service: nodejs
        source: nodejs
        sourcecategory: sourcecode
In my service, I use the Winston logger with a file transport and the JSON format.
transports: [
  new transports.File({
    format: winston.format.json(),
    filename: `${process.env.LOGS_PATH}/service.log`,
  }),
]
process.env.LOGS_PATH = '/app/logs'
After all that, exploring my pod and tailing (tail -f) service.log in the expected /app/logs folder, I see that the application actually writes the logs in JSON format as expected.
DataDog doesn't pick up the logs, though, and they are not showing in the Logs section.
NOTE: I do not mount any volume to or from my service.
What am I missing?
Should I mount my local log to /var/log/pods/[service_name]/ ?
The main issue is that you would have to mount the volume in both the app container and the agent container in order to make it available. It also means you have to find a place to store the log file before it gets picked up by the agent. Doing this for every container could become difficult to maintain and time consuming.
An alternative approach would be to instead send the logs to stdout and let the agent collect them with the Docker integration. Since you configured logsConfigContainerCollectAll to true, the agent is already configured to collect the logs from every container output, so configuring Winston to output to stdout should just work.
See: https://docs.datadoghq.com/agent/docker/log/
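If you do go the stdout route, the source/service tags that the file-based confd entry was providing can instead be set with a Datadog autodiscovery annotation on the app pod; a sketch, where the container name is just a placeholder for your app container:
metadata:
  annotations:
    ad.datadoghq.com/your_nodejs_app.logs: '[{"source": "nodejs", "service": "nodejs"}]'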
To support rochdev's comment, here are a few code snippets to help out (if you do not opt for the stdout method, which should be simpler). This is only about mounting the right volume inside the agent container.
On your app deployment, add:
spec:
  containers:
    - name: your_nodejs_app
      ...
      volumeMounts:
        - name: abc
          mountPath: /app/logs
  volumes:
    - hostPath:
        path: /app/logs
      name: abc
And on your agent daemonset:
spec:
  containers:
    - image: datadog/agent
      ...
      volumeMounts:
        ...
        - name: plop
          mountPath: /app/logs
  volumes:
    ...
    - hostPath:
        path: /app/logs/
      name: plop

Dynamically assign master/slave variables in Ansible role

I have a simple MariaDB role which sets up master/slave replication on two servers. In order to do this, I have to define my two nodes in my inventory like this:
node1 master=true
node2 slave=true
This way, I can use a single role to set up master/slave replication, using the Ansible when statement together with these vars.
- name: Setup master conf
  template: >-
    src="templates/master.conf.j2"
    dest="{{ master_config_file }}"
  when:
    - master is defined
Now, I would like something more automatic that could dynamically and randomly assign a master variable to one node and a slave variable to all other nodes.
I have seen some Ansible docs about variables and filters, but none of them seems suited to this. I guess I would have to develop my own Ansible variable plugin to do that.
You can utilise facts.d. Something like this:
- hosts: all
  become: yes
  tasks:
    - file:
        path: /etc/ansible/facts.d
        state: directory
    - shell: echo '{{ my_facts | to_json }}' > /etc/ansible/facts.d/role.fact
      args:
        creates: /etc/ansible/facts.d/role.fact
      vars:
        my_facts:
          is_master: "{{ true if play_hosts.index(inventory_hostname) == 0 else false }}"
      register: role_fact
    # refresh facts if fact has been just added
    - setup:
      when: role_fact | changed
    - set_fact:
        is_master: "{{ ansible_local.role.is_master }}"
    - debug:
        var: is_master
This will create role.fact on the remote nodes if it is not there and use the is_master fact from it. During subsequent runs, ansible_local.role.is_master is fetched automatically.
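If you don't need the decision persisted in a local fact between runs, a simpler sketch of the same election idea (the first host of the play becomes the master) could be:
- hosts: all
  tasks:
    - set_fact:
        is_master: "{{ inventory_hostname == play_hosts[0] }}"
    - debug:
        var: is_master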
You can use a dynamic group to do that. Another use case: you don't know which node is the master because it is elected, and you need to perform actions only on the master.
To use a dynamic group, you need to define two plays in your playbook:
The first determines which node is the master and adds it to a dynamic group; you need a command for this.
The second executes tasks on the master and the slaves.
The following playbook determines which nodes are masters and slaves and executes a play on each type:
- hosts: all
  tasks:
    - shell: <command on node to retrieve node type>
      register: result__node_type
    - name: If node is a master, add it in masters group
      add_host:
        name: "{{ inventory_hostname }}"
        groups: temp_master
      when: result__node_type.stdout == "MASTER"
    - name: If node is a slave, add it in slaves group
      add_host:
        name: "{{ inventory_hostname }}"
        groups: temp_slave
      when: result__node_type.stdout == "SLAVE"
    - name: No master found, then assign first one (or a random one if you want) to masters group
      add_host:
        name: "{{ groups['all'][0] }}"
        groups: temp_master
      run_once: yes
      when: groups['temp_master'] | default([]) | length == 0
    - name: No slave found, then assign the others to slaves group
      add_host:
        name: "{{ item }}"
        groups: temp_slave
      run_once: yes
      with_items: "{{ groups['all'][1:] }}"
      when: groups['temp_slave'] | default([]) | length == 0

- hosts: temp_master
  gather_facts: false
  tasks:
    - debug:
        msg: "Action on master {{ ansible_host }}"

- hosts: temp_slave
  gather_facts: false
  tasks:
    - debug:
        msg: "Action on slave {{ ansible_host }}"
