kube-proxy changes reverting after a couple of minutes on my AKS cluster (Azure)

I am experimenting and tweaking a bit on my sandbox AKS cluster, with the intention of getting it into a production-ready state. As part of that, I am following a book in which the author redeploys the initial kube-proxy daemonset with some modifications (the only difference is that he is doing it on AWS EKS).
The problem is that the daemonset and its pods return to their initial state after 2-3 minutes. AKS is simply rolling my changes back, which I can see when I run the rollout history command:
> kubectl rollout history daemonset kube-proxy -n kube-system
daemonset.apps/kube-proxy
REVISION  CHANGE-CAUSE
2         <none>
8         <none>
10        <none>
14        <none>
16        <none>
I tried to redeploy the daemonset with my minor changes (the CPU request changed from 100m to 120m, and the -v flag changed from 3 to 2) declaratively, by applying the following manifest:
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    component: kube-proxy
    tier: node
    deployment: custom
  name: kube-proxy
  namespace: kube-system
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      component: kube-proxy
      tier: node
  template:
    metadata:
      creationTimestamp: null
      labels:
        component: kube-proxy
        tier: node
        deployedBy: Luka
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.azure.com/cluster
                operator: Exists
              - key: type
                operator: NotIn
                values:
                - virtual-kubelet
              - key: kubernetes.io/os
                operator: In
                values:
                - linux
      containers:
      - command:
        - kube-proxy
        - --conntrack-max-per-core=0
        - --metrics-bind-address=0.0.0.0:10249
        - --kubeconfig=/var/lib/kubelet/kubeconfig
        - --cluster-cidr=10.244.0.0/16
        - --detect-local-mode=ClusterCIDR
        - --pod-interface-name-prefix=
        - --v=2
        image: mcr.microsoft.com/oss/kubernetes/kube-proxy:v1.23.12-hotfix.20220922.1
        imagePullPolicy: IfNotPresent
        name: kube-proxy
        resources:
          requests:
            cpu: 120m
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/lib/kubelet
          name: kubeconfig
          readOnly: true
        - mountPath: /etc/kubernetes/certs
          name: certificates
          readOnly: true
        - mountPath: /run/xtables.lock
          name: iptableslock
        - mountPath: /lib/modules
          name: modules
      dnsPolicy: ClusterFirst
      hostNetwork: true
      initContainers:
      - command:
        - /bin/sh
        - -c
        - |
          SYSCTL=/proc/sys/net/netfilter/nf_conntrack_max
          echo "Current net.netfilter.nf_conntrack_max: $(cat $SYSCTL)"
          DESIRED=$(awk -F= '/net.netfilter.nf_conntrack_max/ {print $2}' /etc/sysctl.d/999-sysctl-aks.conf)
          if [ -z "$DESIRED" ]; then
            DESIRED=$((32768*$(nproc)))
            if [ $DESIRED -lt 131072 ]; then
              DESIRED=131072
            fi
            echo "AKS custom config for net.netfilter.nf_conntrack_max not set."
            echo "Setting nf_conntrack_max to $DESIRED (32768 * $(nproc) cores, minimum 131072)."
            echo $DESIRED > $SYSCTL
          else
            echo "AKS custom config for net.netfilter.nf_conntrack_max set to $DESIRED."
            echo "Setting nf_conntrack_max to $DESIRED."
            echo $DESIRED > $SYSCTL
          fi
        image: mcr.microsoft.com/oss/kubernetes/kube-proxy:v1.23.12-hotfix.20220922.1
        imagePullPolicy: IfNotPresent
        name: kube-proxy-bootstrap
        resources:
          requests:
            cpu: 100m
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/sysctl.d
          name: sysctls
        - mountPath: /lib/modules
          name: modules
      priorityClassName: system-node-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      - effect: NoExecute
        operator: Exists
      - effect: NoSchedule
        operator: Exists
      volumes:
      - hostPath:
          path: /var/lib/kubelet
          type: ""
        name: kubeconfig
      - hostPath:
          path: /etc/kubernetes/certs
          type: ""
        name: certificates
      - hostPath:
          path: /run/xtables.lock
          type: FileOrCreate
        name: iptableslock
      - hostPath:
          path: /etc/sysctl.d
          type: Directory
        name: sysctls
      - hostPath:
          path: /lib/modules
          type: Directory
        name: modules
  updateStrategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
    type: RollingUpdate
status:
  currentNumberScheduled: 4
  desiredNumberScheduled: 4
  numberAvailable: 4
  numberMisscheduled: 0
  numberReady: 4
  observedGeneration: 1
  updatedNumberScheduled: 4
I also tried it with the initContainer removed. Even the solution of editing the daemonset in place, explained in this Stack Overflow post, didn't work.
Am I missing something? Why does the kube-proxy daemonset always roll back?

In Kubernetes, a rolling update is the default strategy for updating the running version of an application.
When you upgrade the pods from version 1 to version 2, the Deployment creates a new ReplicaSet and increases its replica count while the previous ReplicaSet's count goes to 0.
After the rolling update, the previous ReplicaSet is not deleted.
If we execute another rolling update, from version 2 to 3, at the end of the upgrade we will have two ReplicaSets with a count of 0.
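As a minimal illustration (assuming a Deployment named nginx whose container is also named nginx), you can trigger a rolling update and see the old ReplicaSet kept around at 0 replicas:
# Trigger a rolling update by changing the container image
kubectl set image deployment/nginx nginx=nginx:1.25
# After the rollout, the previous ReplicaSet remains, scaled down to 0
kubectl get replicasets
kubectl rollout history deployment nginx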
I created the manifest and deployed it; checking the history of the daemonset shows the revisions:
kubectl rollout history daemonset kube-proxy -n kube-system
We can roll back to a specific revision:
kubectl rollout undo daemonset kube-proxy --to-revision=4 -n kube-system
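Before undoing, a specific revision can be inspected with the --revision flag, which prints the pod template recorded for that revision:
kubectl rollout history daemonset kube-proxy -n kube-system --revision=4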
After the undo, the revision history of my daemonset looks like this:
kubectl rollout history daemonset kube-proxy -n kube-system
The output has two columns, REVISION and CHANGE-CAUSE, and the change-cause shows <none> by default. I set the change-cause to 'Kube' by adding the annotation below to the manifest, reapplying it, and fetching the rollout history again:
kubernetes.io/change-cause: "Kube" # recorded for this particular revision
kubectl apply -f filename
kubectl rollout history daemonset kube-proxy -n kube-system
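For context, a sketch of where the annotation sits in the DaemonSet manifest (only the metadata is shown here):
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-proxy
  namespace: kube-system
  annotations:
    kubernetes.io/change-cause: "Kube"  # surfaced in the CHANGE-CAUSE column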
Reference: to learn more about rolling updates, use this Kubernetes link.

Related

Apache Flink Operator - enable azure-fs-hadoop

I am trying to run a Flink job on k8s using the Flink Operator (https://github.com/apache/flink-kubernetes-operator) that uses a connection to Azure Blob Storage, as described here: https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/filesystems/azure/
Following the guide, I need to copy the jar file flink-azure-fs-hadoop-1.15.0.jar from one directory to another.
I have already tried to do this via the podTemplate and command functionality, but unfortunately it does not work, and the file does not appear in the destination directory.
Can you guide me on how to do it properly?
Below you can find my FlinkDeployment file.
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  namespace: flink
  name: basic-example
spec:
  image: flink:1.15
  flinkVersion: v1_15
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
  serviceAccount: flink
  podTemplate:
    apiVersion: v1
    kind: Pod
    metadata:
      name: pod-template
    spec:
      serviceAccount: flink
      containers:
      - name: flink-main-container
        volumeMounts:
        - mountPath: /opt/flink/data
          name: flink-data
        # command:
        # - "touch"
        # - "/tmp/test.txt"
      volumes:
      - name: flink-data
        emptyDir: {}
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
    podTemplate:
      apiVersion: v1
      kind: Pod
      metadata:
        name: job-manager-pod-template
      spec:
        initContainers:
        - name: fetch-jar
          image: cirrusci/wget
          volumeMounts:
          - mountPath: /opt/flink/data
            name: flink-data
          command:
          - "wget"
          - "LINK_TO_CUSTOM_JAR_FILE_ON_AZURE_BLOB_STORAGE"
          - "-O"
          - "/opt/flink/data/test.jar"
        containers:
        - name: flink-main-container
          command:
          - "touch"
          - "/tmp/test.txt"
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  job:
    jarURI: local:///opt/flink/data/test.jar
    parallelism: 2
    upgradeMode: stateless
    state: running
  ingress:
    template: "CUSTOM_LINK_TO_AZURE"
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt
      kubernetes.io/ingress.allow-http: 'false'
      traefik.ingress.kubernetes.io/router.entrypoints: websecure
      traefik.ingress.kubernetes.io/router.tls: 'true'
      traefik.ingress.kubernetes.io/router.tls.options: default
Since you are using the stock Flink 1.15 image, the Azure filesystem plugin comes built in. You can enable it by setting the ENABLE_BUILT_IN_PLUGINS environment variable:
spec:
  podTemplate:
    spec:
      containers:
      # Do not change the main container name
      - name: flink-main-container
        env:
        - name: ENABLE_BUILT_IN_PLUGINS
          value: flink-azure-fs-hadoop-1.15.0.jar
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/docker/#using-filesystem-plugins
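Note that enabling the plugin does not by itself provide credentials. Per the Flink Azure filesystem docs, the storage account access key can be set in flinkConfiguration; a sketch, assuming a hypothetical storage account named mystorageaccount (in practice you would inject the key from a secret rather than inline it):
spec:
  flinkConfiguration:
    fs.azure.account.key.mystorageaccount.blob.core.windows.net: <storage-access-key>
With that in place, wasbs://<container>@mystorageaccount.blob.core.windows.net/... paths become usable, e.g. for checkpoints, savepoints, or fetching artifacts.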

Elasticsearch upgrade to v8 on Kubernetes

I have an Elasticsearch deployment on a Microsoft (Azure) Kubernetes cluster that was deployed with a 7.x chart, and I changed the image to 8.x. The upgrade worked and both Elasticsearch and Kibana were accessible, but now I need to enable the new security features, which are included in the basic license from 8.x onwards. The requirement for security came originally from the need to enable APM Server/Agents.
I have the following values:
- name: cluster.initial_master_nodes
  value: elasticsearch-master-0,
- name: discovery.seed_hosts
  value: elasticsearch-master-headless
- name: cluster.name
  value: elasticsearch
- name: network.host
  value: 0.0.0.0
- name: cluster.deprecation_indexing.enabled
  value: 'false'
- name: node.roles
  value: data,ingest,master,ml,remote_cluster_client
The Elasticsearch and Kibana pods are able to start, but I am unable to set up the APM integration due to security. So I am enabling security using the values below:
- name: xpack.security.enabled
  value: 'true'
Then I get an error in the Elasticsearch pod's log: "Transport SSL must be enabled if security is enabled. Please set [xpack.security.transport.ssl.enabled] to [true] or disable security by setting [xpack.security.enabled] to [false]". So I enable SSL using the values below:
- name: xpack.security.transport.ssl.enabled
  value: 'true'
Then i am getting an error log from elastic search pod: "invalid SSL configuration for xpack.security.transport.ssl - server ssl configuration requires a key and certificate, but these have not been configured; you must set either [xpack.security.transport.ssl.keystore.path] (p12 file), or both [xpack.security.transport.ssl.key] (pem file) and [xpack.security.transport.ssl.certificate] (pem key file)".
I start with option 1: I create the keys using the commands below (no password: enter, enter / enter, enter, enter) and copy them to a persistent folder:
./bin/elasticsearch-certutil ca
./bin/elasticsearch-certutil cert --ca elastic-stack-ca.p12
cp elastic-stack-ca.p12 data/elastic-stack-ca.p12
cp elastic-certificates.p12 data/elastic-certificates.p12
In addition I am also configuring the below values:
- name: xpack.security.transport.ssl.truststore.path
  value: '/usr/share/elasticsearch/data/elastic-certificates.p12'
- name: xpack.security.transport.ssl.keystore.path
  value: '/usr/share/elasticsearch/data/elastic-certificates.p12'
But the pod is still stuck initializing. If I generate the certificates with a password instead, I get this error from the Elasticsearch pod: "cannot read configured [PKCS12] keystore (as a truststore) [/usr/share/elasticsearch/data/elastic-certificates.p12] - this is usually caused by an incorrect password; (no password was provided)"
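(As an aside: if the PKCS12 files are password-protected, the password has to be registered in the Elasticsearch keystore rather than passed as a plain setting; a sketch, run inside the Elasticsearch container or baked into the image:
./bin/elasticsearch-keystore add xpack.security.transport.ssl.keystore.secure_password
./bin/elasticsearch-keystore add xpack.security.transport.ssl.truststore.secure_password
)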
Then I move on to option 2: I create the keys using the commands below and copy them to a persistent folder:
./bin/elasticsearch-certutil ca --pem
unzip elastic-stack-ca.zip -d
cp ca.crt data/ca.crt
cp ca.key data/ca.key
In addition I am also configuring the below values:
- name: xpack.security.transport.ssl.key
  value: '/usr/share/elasticsearch/data/ca.key'
- name: xpack.security.transport.ssl.certificate
  value: '/usr/share/elasticsearch/data/ca.crt'
But the pod is still stuck in the initializing state without producing any logs; as far as I know, while a pod is initializing it does not produce any container logs. On the portal side, everything in the events looks OK except the Elasticsearch pod, which is not in the ready state.
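(For what it's worth, individual container logs can usually still be fetched while a pod is stuck, by naming the container explicitly; a sketch using the container names from the StatefulSet below:
kubectl logs elasticsearch-master-0 -c configure-sysctl
kubectl logs elasticsearch-master-0 -c elasticsearch --previous
)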
Finally, I found the same issue raised in the Elasticsearch community, without any response: https://discuss.elastic.co/t/elasticsearch-pods-are-not-ready-when-xpack-security-enabled-is-configured/281709?u=s19k15
Here is my StatefulSet:
status:
  observedGeneration: 169
  replicas: 1
  updatedReplicas: 1
  currentRevision: elasticsearch-master-7449d7bd69
  updateRevision: elasticsearch-master-7d8c7b6997
  collisionCount: 0
spec:
  replicas: 1
  selector:
    matchLabels:
      app: elasticsearch-master
  template:
    metadata:
      name: elasticsearch-master
      creationTimestamp: null
      labels:
        app: elasticsearch-master
        chart: elasticsearch
        release: platform
    spec:
      initContainers:
      - name: configure-sysctl
        image: docker.elastic.co/elasticsearch/elasticsearch:8.1.2
        command:
        - sysctl
        - '-w'
        - vm.max_map_count=262144
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        imagePullPolicy: IfNotPresent
        securityContext:
          privileged: true
          runAsUser: 0
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:8.1.2
        ports:
        - name: http
          containerPort: 9200
          protocol: TCP
        - name: transport
          containerPort: 9300
          protocol: TCP
        env:
        - name: node.name
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: cluster.initial_master_nodes
          value: elasticsearch-master-0,
        - name: discovery.seed_hosts
          value: elasticsearch-master-headless
        - name: cluster.name
          value: elasticsearch
        - name: cluster.deprecation_indexing.enabled
          value: 'false'
        - name: ES_JAVA_OPTS
          value: '-Xmx512m -Xms512m'
        - name: node.roles
          value: data,ingest,master,ml,remote_cluster_client
        - name: xpack.license.self_generated.type
          value: basic
        - name: xpack.security.enabled
          value: 'true'
        - name: xpack.security.transport.ssl.enabled
          value: 'true'
        - name: xpack.security.transport.ssl.truststore.path
          value: /usr/share/elasticsearch/data/elastic-certificates.p12
        - name: xpack.security.transport.ssl.keystore.path
          value: /usr/share/elasticsearch/data/elastic-certificates.p12
        - name: xpack.security.http.ssl.enabled
          value: 'true'
        - name: xpack.security.http.ssl.truststore.path
          value: /usr/share/elasticsearch/data/elastic-certificates.p12
        - name: xpack.security.http.ssl.keystore.path
          value: /usr/share/elasticsearch/data/elastic-certificates.p12
        - name: logger.org.elasticsearch.discovery
          value: debug
        - name: path.logs
          value: /usr/share/elasticsearch/data
        - name: xpack.security.enrollment.enabled
          value: 'true'
        resources:
          limits:
            cpu: '1'
            memory: 2Gi
          requests:
            cpu: 100m
            memory: 512Mi
        volumeMounts:
        - name: elasticsearch-master
          mountPath: /usr/share/elasticsearch/data
        readinessProbe:
          exec:
            command:
            - bash
            - '-c'
            - |
              set -e
              # If the node is starting up wait for the cluster to be ready (request params: "wait_for_status=green&timeout=1s" )
              # Once it has started only check that the node itself is responding
              START_FILE=/tmp/.es_start_file
              # Disable nss cache to avoid filling dentry cache when calling curl
              # This is required with Elasticsearch Docker using nss < 3.52
              export NSS_SDB_USE_CACHE=no
              http () {
                local path="${1}"
                local args="${2}"
                set -- -XGET -s
                if [ "$args" != "" ]; then
                  set -- "$@" $args
                fi
                if [ -n "${ELASTIC_PASSWORD}" ]; then
                  set -- "$@" -u "elastic:${ELASTIC_PASSWORD}"
                fi
                curl --output /dev/null -k "$@" "http://127.0.0.1:9200${path}"
              }
              if [ -f "${START_FILE}" ]; then
                echo 'Elasticsearch is already running, lets check the node is healthy'
                HTTP_CODE=$(http "/" "-w %{http_code}")
                RC=$?
                if [[ ${RC} -ne 0 ]]; then
                  echo "curl --output /dev/null -k -XGET -s -w '%{http_code}' \${BASIC_AUTH} http://127.0.0.1:9200/ failed with RC ${RC}"
                  exit ${RC}
                fi
                # ready if HTTP code 200, 503 is tolerable if ES version is 6.x
                if [[ ${HTTP_CODE} == "200" ]]; then
                  exit 0
                elif [[ ${HTTP_CODE} == "503" && "8" == "6" ]]; then
                  exit 0
                else
                  echo "curl --output /dev/null -k -XGET -s -w '%{http_code}' \${BASIC_AUTH} http://127.0.0.1:9200/ failed with HTTP code ${HTTP_CODE}"
                  exit 1
                fi
              else
                echo 'Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=1s" )'
                if http "/_cluster/health?wait_for_status=green&timeout=1s" "--fail" ; then
                  touch ${START_FILE}
                  exit 0
                else
                  echo 'Cluster is not yet ready (request params: "wait_for_status=green&timeout=1s" )'
                  exit 1
                fi
              fi
          initialDelaySeconds: 10
          timeoutSeconds: 5
          periodSeconds: 10
          successThreshold: 3
          failureThreshold: 3
        lifecycle:
          postStart:
            exec:
              command:
              - bash
              - '-c'
              - |
                #!/bin/bash
                # Create the dev.general.logcreation.elasticsearchlogobject.v1.json index
                ES_URL=http://localhost:9200
                while [[ "$(curl -s -o /dev/null -w '%{http_code}\n' $ES_URL)" != "200" ]]; do sleep 1; done
                curl --request PUT --header 'Content-Type: application/json' "$ES_URL/dev.general.logcreation.elasticsearchlogobject.v1.json/" --data '{"mappings":{"properties":{"Properties":{"properties":{"StatusCode":{"type":"text"}}}}},"settings":{"index":{"number_of_shards":"1","number_of_replicas":"0"}}}'
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        imagePullPolicy: IfNotPresent
        securityContext:
          capabilities:
            drop:
            - ALL
          runAsUser: 1000
          runAsNonRoot: true
      restartPolicy: Always
      terminationGracePeriodSeconds: 120
      dnsPolicy: ClusterFirst
      automountServiceAccountToken: true
      securityContext:
        runAsUser: 1000
        fsGroup: 1000
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - elasticsearch-master
            topologyKey: kubernetes.io/hostname
      schedulerName: default-scheduler
      enableServiceLinks: true
  volumeClaimTemplates:
  - kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: elasticsearch-master
      creationTimestamp: null
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 4Gi
      volumeMode: Filesystem
    status:
      phase: Pending
  serviceName: elasticsearch-master-headless
  podManagementPolicy: Parallel
  updateStrategy:
    type: RollingUpdate
  revisionHistoryLimit: 10
Any ideas?
I finally found the answer; maybe it helps others who face something similar. When a pod initializes endlessly, it is effectively sleeping. In my case, a piece of code inside my chart's StatefulSet started causing this issue once security was enabled:
while [[ "$(curl -s -o /dev/null -w '%{http_code}\n' $ES_URL)" != "200" ]]; do sleep 1; done
This never returns 200, because the HTTP endpoint now also expects a username and password for authentication, and so the loop sleeps forever.
So if your pods are stuck in the initializing state and stay there, make sure there is no while/sleep loop like this waiting on an endpoint that now requires authentication.
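A sketch of a fixed wait loop, assuming the elastic user's password is available in the ELASTIC_PASSWORD environment variable (with xpack.security.http.ssl.enabled the URL would also need to become https):
while [[ "$(curl -s -o /dev/null -w '%{http_code}' -u "elastic:${ELASTIC_PASSWORD}" -k "$ES_URL")" != "200" ]]; do sleep 1; done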

Elasticsearch PVC

I am trying to build an Elasticsearch cluster using the Helm chart with the following values:
resources:
  requests:
    cpu: ".1"
    memory: "2Gi"
  limits:
    cpu: "1"
    memory: "3.5Gi"
volumeClaimTemplate:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 500Gi
esConfig:
  elasticsearch.yml: |
    path.data: /mnt/azure
The problem is that the pods throw the following error at startup:
"Caused by: java.nio.file.AccessDeniedException: /mnt/azure"
I made the Azure disk the default storage class so that I would not have to specify it explicitly. I don't know whether this is best practice, or whether I should create the storage first and then mount it into the pods.
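For reference, a sketch of how a class is marked as the cluster default (assuming the built-in managed-premium class; the name may differ in your cluster):
kubectl patch storageclass managed-premium -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'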
You need to keep an init container that changes the ownership of the mounted directory. Update the paths as needed; in your case the changes would apply to /mnt/azure:
initContainers:
- command:
  - sh
  - -c
  - |
    chown -R 1000:1000 /usr/share/elasticsearch/data
    sysctl -w vm.max_map_count=262144
    chmod 777 /usr/share/elasticsearch/data
    chmod 777 /usr/share/elasticsearch/data/node
    chmod g+rwx /usr/share/elasticsearch/data
    chgrp 1000 /usr/share/elasticsearch/data
Example StatefulSet file:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: elasticsearch
    component: elasticsearch
    release: elasticsearch
  name: elasticsearch
spec:
  podManagementPolicy: Parallel
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: elasticsearch
      component: elasticsearch
      release: elasticsearch
  serviceName: elasticsearch
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: elasticsearch
        component: elasticsearch
        release: elasticsearch
    spec:
      containers:
      - env:
        - name: cluster.name
          value: <SET THIS>
        - name: discovery.type
          value: single-node
        - name: ES_JAVA_OPTS
          value: -Xms512m -Xmx512m
        - name: bootstrap.memory_lock
          value: "false"
        image: elasticsearch:6.5.0
        imagePullPolicy: IfNotPresent
        name: elasticsearch
        ports:
        - containerPort: 9200
          name: http
          protocol: TCP
        - containerPort: 9300
          name: transport
          protocol: TCP
        resources:
          limits:
            cpu: 250m
            memory: 1Gi
          requests:
            cpu: 150m
            memory: 512Mi
        securityContext:
          privileged: true
          runAsUser: 1000
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/share/elasticsearch/data
          name: elasticsearch-data
      dnsPolicy: ClusterFirst
      initContainers:
      - command:
        - sh
        - -c
        - |
          chown -R 1000:1000 /usr/share/elasticsearch/data
          sysctl -w vm.max_map_count=262144
          chmod 777 /usr/share/elasticsearch/data
          chmod 777 /usr/share/elasticsearch/data/node
          chmod g+rwx /usr/share/elasticsearch/data
          chgrp 1000 /usr/share/elasticsearch/data
        image: busybox:1.29.2
        imagePullPolicy: IfNotPresent
        name: set-dir-owner
        resources: {}
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/share/elasticsearch/data
          name: elasticsearch-data
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 10
  updateStrategy:
    type: OnDelete
  volumeClaimTemplates:
  - metadata:
      creationTimestamp: null
      name: elasticsearch-data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
The mounted Elasticsearch data directory by default is owned by root. Try the following container to change it before Elasticsearch starts:
initContainers:
- name: chown
  image: busybox
  imagePullPolicy: IfNotPresent
  command:
  - chown
  args:
  - 1000:1000
  - /mnt/azure
  volumeMounts:
  - name: <your volume claim template name>
    mountPath: /mnt/azure
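Where the volume plugin supports it, an alternative to a chown-style init container is letting the kubelet apply group ownership through the pod-level security context; a sketch:
spec:
  securityContext:
    fsGroup: 1000  # kubelet recursively applies this group to the volume's files on mount
This avoids the extra container, at the cost of a recursive permission change on every mount, which can be slow for very large volumes.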

Reclaim data (keyspaces) in persistentVolume Kubernetes Cassandra

I have created a Cassandra cluster on Kubernetes on AWS. I created the volumes as PersistentVolumes with the reclaim policy set to Retain. But when I delete the pods (all instances) and recreate them, the old data is lost.
Here is status of my setting.
$kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                    STORAGECLASS   REASON   AGE
pvc-1bc3f896-c0a5-11e8-84a8-02c7556b5a4a   320Gi      RWO            Retain           Bound    default/cassandra-storage-cassandra-1   gp2                     21d
pvc-f3ff4203-c0a4-11e8-84a8-02c7556b5a4a   320Gi      RWO            Retain           Bound    default/cassandra-storage-cassandra-0   gp2                     21d
$kubectl get pvc
NAME                            STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
cassandra-storage-cassandra-0   Bound    pvc-f3ff4203-c0a4-11e8-84a8-02c7556b5a4a   320Gi      RWO            gp2            21d
cassandra-storage-cassandra-1   Bound    pvc-1bc3f896-c0a5-11e8-84a8-02c7556b5a4a   320Gi      RWO            gp2            21d
$kubectl get pods
NAME          READY   STATUS    RESTARTS   AGE
cassandra-0   1/1     Running   0          39s
cassandra-1   1/1     Running   0          27s
$kubectl get statefulsets
NAME        DESIRED   CURRENT   AGE
cassandra   2         2         1m
----
Now, if I add some data (keyspaces, tables), then delete the StatefulSet and recreate it, the old data (keyspaces, tables) is missing. Since my policy is Retain, the data should still be there.
Here is my StatefulSet creation YAML:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cassandra
  labels:
    app: cassandra
spec:
  serviceName: cassandra
  replicas: 2
  selector:
    matchLabels:
      app: cassandra
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      terminationGracePeriodSeconds: 180
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - topologyKey: kubernetes.io/hostname
            labelSelector:
              matchLabels:
                app: cassandra
      containers:
      - env:
        - name: MAX_HEAP_SIZE
          value: 1024M
        - name: HEAP_NEWSIZE
          value: 1024M
        - name: CASSANDRA_SEEDS
          value: "cassandra-0.cassandra.default.svc.cluster.local"
        - name: CASSANDRA_CLUSTER_NAME
          value: "CassandraCluster"
        - name: CASSANDRA_DC
          value: "DC1-Cassandra"
        - name: CASSANDRA_RACK
          value: "Rack1-Cassandra"
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        image: library/cassandra
        name: cassandra
        volumeMounts:
        - mountPath: /cassandra-storage
          name: cassandra-storage
  volumeClaimTemplates:
  - metadata:
      name: cassandra-storage
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 320Gi
The configuration of the PVs is as follows.
$kubectl describe pv
Name: pvc-1bc3f896-c0a5-11e8-84a8-02c7556b5a4a
Labels: failure-domain.beta.kubernetes.io/region=us-west-2
failure-domain.beta.kubernetes.io/zone=us-west-2b
Annotations: kubernetes.io/createdby=aws-ebs-dynamic-provisioner
pv.kubernetes.io/bound-by-controller=yes
pv.kubernetes.io/provisioned-by=kubernetes.io/aws-ebs
Finalizers: [kubernetes.io/pv-protection]
StorageClass: gp2
Status: Bound
Claim: default/cassandra-storage-cassandra-1
Reclaim Policy: Retain
Access Modes: RWO
Capacity: 320Gi
Node Affinity: <none>
Message:
Source:
Type: AWSElasticBlockStore (a Persistent Disk resource in AWS)
VolumeID: aws://us-west-2b/vol-0dceef39c7948c69e
FSType: ext4
Partition: 0
ReadOnly: false
Events: <none>
Name: pvc-f3ff4203-c0a4-11e8-84a8-02c7556b5a4a
Labels: failure-domain.beta.kubernetes.io/region=us-west-2
failure-domain.beta.kubernetes.io/zone=us-west-2b
Annotations: kubernetes.io/createdby=aws-ebs-dynamic-provisioner
pv.kubernetes.io/bound-by-controller=yes
pv.kubernetes.io/provisioned-by=kubernetes.io/aws-ebs
Finalizers: [kubernetes.io/pv-protection]
StorageClass: gp2
Status: Bound
Claim: default/cassandra-storage-cassandra-0
Reclaim Policy: Retain
Access Modes: RWO
Capacity: 320Gi
Node Affinity: <none>
Message:
Source:
Type: AWSElasticBlockStore (a Persistent Disk resource in AWS)
VolumeID: aws://us-west-2b/vol-07c16900909f80cd1
FSType: ext4
Partition: 0
ReadOnly: false
Events: <none>
Am I missing some setting, or is reclaiming not possible when the whole StatefulSet is deleted, so that only individual pod deletion/restart can reuse the volume data?
I think your Cassandra mountPath "/cassandra-storage" may not be right. If you did not change Cassandra's data path, the default data path is "/var/lib/cassandra/data", so you need to change the volume mountPath to "/var/lib/cassandra/data" in your Cassandra YAML file.
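A sketch of the corrected mount in the StatefulSet above (same volume claim; only the path changes):
volumeMounts:
- mountPath: /var/lib/cassandra/data
  name: cassandra-storage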

Multi-broker Kafka on Kubernetes: how to set KAFKA_ADVERTISED_HOST_NAME

My current Kafka deployment file with 3 Kafka brokers looks like this:
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: kafka
spec:
  selector:
    matchLabels:
      app: kafka
  serviceName: kafka-headless
  replicas: 3
  updateStrategy:
    type: RollingUpdate
  podManagementPolicy: Parallel
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
      - name: kafka-instance
        image: wurstmeister/kafka
        ports:
        - containerPort: 9092
        env:
        - name: KAFKA_ADVERTISED_PORT
          value: "9092"
        - name: KAFKA_ADVERTISED_HOST_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: KAFKA_ZOOKEEPER_CONNECT
          value: "zookeeper-0.zookeeper-headless.default.svc.cluster.local:2181,\
            zookeeper-1.zookeeper-headless.default.svc.cluster.local:2181,\
            zookeeper-2.zookeeper-headless.default.svc.cluster.local:2181"
        - name: BROKER_ID_COMMAND
          value: "hostname | awk -F '-' '{print $2}'"
        - name: KAFKA_CREATE_TOPICS
          value: hello:2:1
        volumeMounts:
        - name: data
          mountPath: /var/lib/kafka/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 50Gi
This creates 3 Kafka brokers as a StatefulSet and connects to the ZooKeeper cluster using kube-dns service names with FQDNs (fully qualified domain names) such as:
zookeeper-0.zookeeper-headless.default.svc.cluster.local:2181
Broker IDs are generated based on the pod name:
- name: BROKER_ID_COMMAND
  value: "hostname | awk -F '-' '{print $2}'"
Result:
kafka-0 = 0
kafka-1 = 1
kafka-2 = 2
However, in order to use the kube-dns names for the Kafka brokers:
kafka-0.kafka-headless.default.svc.cluster.local:9092
kafka-1.kafka-headless.default.svc.cluster.local:9092
kafka-2.kafka-headless.default.svc.cluster.local:9092
I need to be able to set the KAFKA_ADVERTISED_HOST_NAME variable to the above FQDN values based on the name of the pod.
Currently I have the variable set to the name of the pod:
- name: KAFKA_ADVERTISED_HOST_NAME
  valueFrom:
    fieldRef:
      fieldPath: metadata.name
Result:
KAFKA_ADVERTISED_HOST_NAME=kafka-0
KAFKA_ADVERTISED_HOST_NAME=kafka-1
KAFKA_ADVERTISED_HOST_NAME=kafka-2
But somehow I would need to append the rest of the DNS name.
Is there a way I could set the DNS value directly?
Something like this:
- name: KAFKA_ADVERTISED_HOST_NAME
  valueFrom:
    fieldRef:
      fieldPath: kubedns.name
I managed to solve the problem with a command field inside the pod definition:
command:
- sh
- -c
- "export KAFKA_ADVERTISED_HOST_NAME=$(hostname).kafka-headless.default.svc.cluster.local &&
  start-kafka.sh"
This runs a shell command which exports the advertised hostname environment variable based on the hostname value.
Another option is to build the advertised listener from the pod name, using Kubernetes' support for referencing other environment variables:
- name: MY_POD_NAME
  valueFrom:
    fieldRef:
      fieldPath: metadata.name
- name: KAFKA_ZOOKEEPER_CONNECT
  value: zook-zookeeper.zook.svc.cluster.local:2181
- name: KAFKA_PORT_NUMBER
  value: "9092"
- name: KAFKA_LISTENERS
  value: SASL_SSL://:$(KAFKA_PORT_NUMBER)
- name: KAFKA_ADVERTISED_LISTENERS
  value: SASL_SSL://$(MY_POD_NAME).kafka-kafka-headless.kafka.svc.cluster.local:$(KAFKA_PORT_NUMBER)
The above config would create your FQDN.
You should be able to see those names in your Kafka logs when Kafka server starts.
NOTE: Kubernetes allows you to reference environment variables using the syntax $(VARIABLE)
None of the above worked for me; my setup is wurstmeister/kafka:2.12-2.5.0 and wurstmeister/zookeeper:3.4.6 in a single pod on Kubernetes 1.16 (don't ask), with a ClusterIP service on top that forwards 9092 to the Kafka container.
This set of environment variables works:
- name: KAFKA_LISTENERS
  value: "INSIDE://:9094,OUTSIDE://:9092"
- name: KAFKA_ADVERTISED_LISTENERS
  value: "INSIDE://:9094,OUTSIDE://my-service.my-namespace.svc.cluster.local:9092"
- name: KAFKA_LISTENER_SECURITY_PROTOCOL_MAP
  value: "INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT" # not production-ready!
- name: KAFKA_INTER_BROKER_LISTENER_NAME
  value: INSIDE
- name: KAFKA_ZOOKEEPER_CONNECT
  value: "localhost:2181" # since it's in the same pod
Sources: wurstmeister/kafka doc, Kafka doc
The inherent problem seems to be that Kafka itself needs to be an IP-ish thing to bind to and to talk to itself via, while clients need a DNS-ish name to connect to from the outside. The latter one can't contain the pod name for some reason. (Might be a separate configuration issue on my end.)
