Elasticsearch upgrade to v8 on Kubernetes - security

I have an Elasticsearch deployment on a Microsoft Azure Kubernetes cluster that was deployed with a 7.x chart, and I changed the image to 8.x. The upgrade worked and both Elasticsearch and Kibana were accessible, but now I need to enable the new security features, which are included in the basic license from version 8 onwards. The security requirement originally came from the need to enable the APM Server/Agents.
I have the following values:
- name: cluster.initial_master_nodes
value: elasticsearch-master-0,
- name: discovery.seed_hosts
value: elasticsearch-master-headless
- name: cluster.name
value: elasticsearch
- name: network.host
value: 0.0.0.0
- name: cluster.deprecation_indexing.enabled
value: 'false'
- name: node.roles
value: data,ingest,master,ml,remote_cluster_client
The Elasticsearch and Kibana pods are able to start, but I am unable to set up the APM integration because of security. So I am enabling security using the values below:
- name: xpack.security.enabled
value: 'true'
Then I get an error log from the Elasticsearch pod: "Transport SSL must be enabled if security is enabled. Please set [xpack.security.transport.ssl.enabled] to [true] or disable security by setting [xpack.security.enabled] to [false]". So I am enabling transport SSL using the values below:
- name: xpack.security.transport.ssl.enabled
value: 'true'
Then I get an error log from the Elasticsearch pod: "invalid SSL configuration for xpack.security.transport.ssl - server ssl configuration requires a key and certificate, but these have not been configured; you must set either [xpack.security.transport.ssl.keystore.path] (p12 file), or both [xpack.security.transport.ssl.key] (pem file) and [xpack.security.transport.ssl.certificate] (pem key file)".
I start with Option 1: I create the keys using the commands below (no password: Enter, Enter / Enter, Enter, Enter) and copy them to a persistent folder:
./bin/elasticsearch-certutil ca
./bin/elasticsearch-certutil cert --ca elastic-stack-ca.p12
cp elastic-stack-ca.p12 data/elastic-stack-ca.p12
cp elastic-certificates.p12 data/elastic-certificates.p12
In addition, I also configure the values below:
- name: xpack.security.transport.ssl.truststore.path
value: '/usr/share/elasticsearch/data/elastic-certificates.p12'
- name: xpack.security.transport.ssl.keystore.path
value: '/usr/share/elasticsearch/data/elastic-certificates.p12'
But the pod is still stuck initializing. If I instead generate the certificates with a password, I get an error log from the Elasticsearch pod: "cannot read configured [PKCS12] keystore (as a truststore) [/usr/share/elasticsearch/data/elastic-certificates.p12] - this is usually caused by an incorrect password; (no password was provided)"
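For reference, when the PKCS12 files are generated with a password, Elasticsearch expects that password in its secure settings rather than in plain configuration. A minimal sketch, run inside the Elasticsearch image before startup (each command prompts for the password):
# Store the PKCS12 passwords in the Elasticsearch keystore
./bin/elasticsearch-keystore add xpack.security.transport.ssl.keystore.secure_password
./bin/elasticsearch-keystore add xpack.security.transport.ssl.truststore.secure_password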
Then I move to Option 2: I create the keys using the commands below and copy them to a persistent folder:
./bin/elasticsearch-certutil ca --pem
unzip elastic-stack-ca.zip
cp ca.crt data/ca.crt
cp ca.key data/ca.key
In addition, I also configure the values below:
- name: xpack.security.transport.ssl.key
value: '/usr/share/elasticsearch/data/ca.key'
- name: xpack.security.transport.ssl.certificate
value: '/usr/share/elasticsearch/data/ca.crt'
But the pod is still stuck in the initializing state without producing any logs; as far as I know, while a pod is initializing it does not produce any container logs. On the portal side the events all look fine, except that the Elasticsearch pod is not in the ready state.
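A side note on Option 2: ca.crt/ca.key are the certificate authority itself, while the transport settings normally expect a node certificate signed by that CA. A minimal sketch of generating one with certutil (the output file name is just an example passed via --out):
# Generate a node key/certificate in PEM format, signed by the CA created earlier
./bin/elasticsearch-certutil cert --ca-cert ca.crt --ca-key ca.key --pem --out node-certs.zip
unzip node-certs.zip
Then xpack.security.transport.ssl.key and xpack.security.transport.ssl.certificate would point at the generated node key and certificate, with xpack.security.transport.ssl.certificate_authorities set to ca.crt.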
Finally, I posted the same issue to the Elastic community forum, without any response: https://discuss.elastic.co/t/elasticsearch-pods-are-not-ready-when-xpack-security-enabled-is-configured/281709?u=s19k15
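While the pod is stuck like this, a few standard kubectl commands usually surface more detail than the portal (a sketch; adjust the pod and release names to your deployment):
kubectl describe pod elasticsearch-master-0                       # events, probe failures, volume mount errors
kubectl logs elasticsearch-master-0 -c elasticsearch --previous   # logs from the last container run, if it crashed
kubectl get statefulset elasticsearch-master -o yaml              # the spec that was actually applied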
Here is my StatefulSet:
status:
observedGeneration: 169
replicas: 1
updatedReplicas: 1
currentRevision: elasticsearch-master-7449d7bd69
updateRevision: elasticsearch-master-7d8c7b6997
collisionCount: 0
spec:
replicas: 1
selector:
matchLabels:
app: elasticsearch-master
template:
metadata:
name: elasticsearch-master
creationTimestamp: null
labels:
app: elasticsearch-master
chart: elasticsearch
release: platform
spec:
initContainers:
- name: configure-sysctl
image: docker.elastic.co/elasticsearch/elasticsearch:8.1.2
command:
- sysctl
- '-w'
- vm.max_map_count=262144
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
imagePullPolicy: IfNotPresent
securityContext:
privileged: true
runAsUser: 0
containers:
- name: elasticsearch
image: docker.elastic.co/elasticsearch/elasticsearch:8.1.2
ports:
- name: http
containerPort: 9200
protocol: TCP
- name: transport
containerPort: 9300
protocol: TCP
env:
- name: node.name
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: cluster.initial_master_nodes
value: elasticsearch-master-0,
- name: discovery.seed_hosts
value: elasticsearch-master-headless
- name: cluster.name
value: elasticsearch
- name: cluster.deprecation_indexing.enabled
value: 'false'
- name: ES_JAVA_OPTS
value: '-Xmx512m -Xms512m'
- name: node.roles
value: data,ingest,master,ml,remote_cluster_client
- name: xpack.license.self_generated.type
value: basic
- name: xpack.security.enabled
value: 'true'
- name: xpack.security.transport.ssl.enabled
value: 'true'
- name: xpack.security.transport.ssl.truststore.path
value: /usr/share/elasticsearch/data/elastic-certificates.p12
- name: xpack.security.transport.ssl.keystore.path
value: /usr/share/elasticsearch/data/elastic-certificates.p12
- name: xpack.security.http.ssl.enabled
value: 'true'
- name: xpack.security.http.ssl.truststore.path
value: /usr/share/elasticsearch/data/elastic-certificates.p12
- name: xpack.security.http.ssl.keystore.path
value: /usr/share/elasticsearch/data/elastic-certificates.p12
- name: logger.org.elasticsearch.discovery
value: debug
- name: path.logs
value: /usr/share/elasticsearch/data
- name: xpack.security.enrollment.enabled
value: 'true'
resources:
limits:
cpu: '1'
memory: 2Gi
requests:
cpu: 100m
memory: 512Mi
volumeMounts:
- name: elasticsearch-master
mountPath: /usr/share/elasticsearch/data
readinessProbe:
exec:
command:
- bash
- '-c'
- >
set -e
# If the node is starting up wait for the cluster to be ready (request params: "wait_for_status=green&timeout=1s" )
# Once it has started only check that the node itself is responding
START_FILE=/tmp/.es_start_file
# Disable nss cache to avoid filling dentry cache when calling curl
# This is required with Elasticsearch Docker using nss < 3.52
export NSS_SDB_USE_CACHE=no
http () {
local path="${1}"
local args="${2}"
set -- -XGET -s
if [ "$args" != "" ]; then
set -- "$#" $args
fi
if [ -n "${ELASTIC_PASSWORD}" ]; then
set -- "$#" -u "elastic:${ELASTIC_PASSWORD}"
fi
curl --output /dev/null -k "$#" "http://127.0.0.1:9200${path}"
}
if [ -f "${START_FILE}" ]; then
echo 'Elasticsearch is already running, lets check the node is healthy'
HTTP_CODE=$(http "/" "-w %{http_code}")
RC=$?
if [[ ${RC} -ne 0 ]]; then
echo "curl --output /dev/null -k -XGET -s -w '%{http_code}' \${BASIC_AUTH} http://127.0.0.1:9200/ failed with RC ${RC}"
exit ${RC}
fi
# ready if HTTP code 200, 503 is tolerable if ES version is 6.x
if [[ ${HTTP_CODE} == "200" ]]; then
exit 0
elif [[ ${HTTP_CODE} == "503" && "8" == "6" ]]; then
exit 0
else
echo "curl --output /dev/null -k -XGET -s -w '%{http_code}' \${BASIC_AUTH} http://127.0.0.1:9200/ failed with HTTP code ${HTTP_CODE}"
exit 1
fi
else
echo 'Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=1s" )'
if http "/_cluster/health?wait_for_status=green&timeout=1s" "--fail" ; then
touch ${START_FILE}
exit 0
else
echo 'Cluster is not yet ready (request params: "wait_for_status=green&timeout=1s" )'
exit 1
fi
fi
initialDelaySeconds: 10
timeoutSeconds: 5
periodSeconds: 10
successThreshold: 3
failureThreshold: 3
lifecycle:
postStart:
exec:
command:
- bash
- '-c'
- >
#!/bin/bash
# Create the dev.general.logcreation.elasticsearchlogobject.v1.json index
ES_URL=http://localhost:9200
while [[ "$(curl -s -o /dev/null -w '%{http_code}\n' $ES_URL)" != "200" ]]; do sleep 1; done
curl --request PUT --header 'Content-Type: application/json' "$ES_URL/dev.general.logcreation.elasticsearchlogobject.v1.json/" --data '{"mappings":{"properties":{"Properties":{"properties":{"StatusCode":{"type":"text"}}}}},"settings":{"index":{"number_of_shards":"1","number_of_replicas":"0"}}}'
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
imagePullPolicy: IfNotPresent
securityContext:
capabilities:
drop:
- ALL
runAsUser: 1000
runAsNonRoot: true
restartPolicy: Always
terminationGracePeriodSeconds: 120
dnsPolicy: ClusterFirst
automountServiceAccountToken: true
securityContext:
runAsUser: 1000
fsGroup: 1000
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- elasticsearch-master
topologyKey: kubernetes.io/hostname
schedulerName: default-scheduler
enableServiceLinks: true
volumeClaimTemplates:
- kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: elasticsearch-master
creationTimestamp: null
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 4Gi
volumeMode: Filesystem
status:
phase: Pending
serviceName: elasticsearch-master-headless
podManagementPolicy: Parallel
updateStrategy:
type: RollingUpdate
revisionHistoryLimit: 10
Any ideas?

Finally found the answer; maybe it helps other people who face something similar. When the pod is stuck initializing endlessly, it is effectively sleeping. In my case, a piece of code inside my chart's StatefulSet started causing this issue once security was enabled.
while [[ "$(curl -s -o /dev/null -w '%{http_code}\n'
$ES_URL)" != "200" ]]; do sleep 1; done
This will never return 200, because with security enabled the HTTP endpoint now also expects a user and password to authenticate, so the loop keeps sleeping forever.
So if your pods are stuck in the initializing state and remain there, make sure there is no such while/sleep loop.
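A minimal sketch of a wait loop that still works with security enabled, assuming the password of the built-in elastic user is exposed to the container as ELASTIC_PASSWORD (adjust to your setup):
ES_URL=http://localhost:9200
# Authenticate so the root endpoint can actually return 200
# (alternatively, treat a 401 response as proof that Elasticsearch is up).
until [ "$(curl -s -o /dev/null -w '%{http_code}' -u "elastic:${ELASTIC_PASSWORD}" "$ES_URL")" = "200" ]; do
  sleep 1
done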

Related

kube-proxy changes reverting after couple of minutes on my AKS cluster

I am experimenting and tweaking a bit on my sandbox AKS cluster with the intention of configuring it into a production-ready state. For that, I am following a book where the writer redeploys the initial kube-proxy daemonset with some modifications (the only difference is that he is doing it on AWS EKS).
The problem is that the daemonset and pods revert to their initial state after 2-3 minutes. AKS is just doing a rollback, which I can see when I execute the rollout history command:
> kubectl rollout history daemonset kube-proxy -n kube-system
daemonset.apps/kube-proxy
REVISION CHANGE-CAUSE
2 <none>
8 <none>
10 <none>
14 <none>
16 <none>
I tried to redeploy the daemonset with my minor changes (changed the CPU request from 100m to 120m and the -v flag from 3 to 2) declaratively by applying the following manifest:
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
labels:
addonmanager.kubernetes.io/mode: Reconcile
component: kube-proxy
tier: node
deployment: custom
name: kube-proxy
namespace: kube-system
spec:
revisionHistoryLimit: 10
selector:
matchLabels:
component: kube-proxy
tier: node
template:
metadata:
creationTimestamp: null
labels:
component: kube-proxy
tier: node
deployedBy: Luka
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.azure.com/cluster
operator: Exists
- key: type
operator: NotIn
values:
- virtual-kubelet
- key: kubernetes.io/os
operator: In
values:
- linux
containers:
- command:
- kube-proxy
- --conntrack-max-per-core=0
- --metrics-bind-address=0.0.0.0:10249
- --kubeconfig=/var/lib/kubelet/kubeconfig
- --cluster-cidr=10.244.0.0/16
- --detect-local-mode=ClusterCIDR
- --pod-interface-name-prefix=
- --v=2
image: mcr.microsoft.com/oss/kubernetes/kube-proxy:v1.23.12-hotfix.20220922.1
imagePullPolicy: IfNotPresent
name: kube-proxy
resources:
requests:
cpu: 120m
securityContext:
privileged: true
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/lib/kubelet
name: kubeconfig
readOnly: true
- mountPath: /etc/kubernetes/certs
name: certificates
readOnly: true
- mountPath: /run/xtables.lock
name: iptableslock
- mountPath: /lib/modules
name: modules
dnsPolicy: ClusterFirst
hostNetwork: true
initContainers:
- command:
- /bin/sh
- -c
- |
SYSCTL=/proc/sys/net/netfilter/nf_conntrack_max
echo "Current net.netfilter.nf_conntrack_max: $(cat $SYSCTL)"
DESIRED=$(awk -F= '/net.netfilter.nf_conntrack_max/ {print $2}' /etc/sysctl.d/999-sysctl-aks.conf)
if [ -z "$DESIRED" ]; then
DESIRED=$((32768*$(nproc)))
if [ $DESIRED -lt 131072 ]; then
DESIRED=131072
fi
echo "AKS custom config for net.netfilter.nf_conntrack_max not set."
echo "Setting nf_conntrack_max to $DESIRED (32768 * $(nproc) cores, minimum 131072)."
echo $DESIRED > $SYSCTL
else
echo "AKS custom config for net.netfilter.nf_conntrack_max set to $DESIRED."
echo "Setting nf_conntrack_max to $DESIRED."
echo $DESIRED > $SYSCTL
fi
image: mcr.microsoft.com/oss/kubernetes/kube-proxy:v1.23.12-hotfix.20220922.1
imagePullPolicy: IfNotPresent
name: kube-proxy-bootstrap
resources:
requests:
cpu: 100m
securityContext:
privileged: true
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/sysctl.d
name: sysctls
- mountPath: /lib/modules
name: modules
priorityClassName: system-node-critical
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 30
tolerations:
- key: CriticalAddonsOnly
operator: Exists
- effect: NoExecute
operator: Exists
- effect: NoSchedule
operator: Exists
volumes:
- hostPath:
path: /var/lib/kubelet
type: ""
name: kubeconfig
- hostPath:
path: /etc/kubernetes/certs
type: ""
name: certificates
- hostPath:
path: /run/xtables.lock
type: FileOrCreate
name: iptableslock
- hostPath:
path: /etc/sysctl.d
type: Directory
name: sysctls
- hostPath:
path: /lib/modules
type: Directory
name: modules
updateStrategy:
rollingUpdate:
maxSurge: 0
maxUnavailable: 1
type: RollingUpdate
status:
currentNumberScheduled: 4
desiredNumberScheduled: 4
numberAvailable: 4
numberMisscheduled: 0
numberReady: 4
observedGeneration: 1
updatedNumberScheduled: 4
I also tried it with the initContainer removed. Even the solution of editing the daemonset, explained in this Stack Overflow post, didn't work.
Am I missing something? Why does the kube-proxy daemonset keep rolling back?
In Kubernetes, rolling updates are the default strategy for updating the running version of an application.
When you upgrade the pods from version 1 to version 2, the deployment creates a new ReplicaSet and increases its replica count while the previous ReplicaSet's count goes to 0.
After a rolling update, the previous ReplicaSet is not deleted.
If we execute another rolling update from version 2 to 3, at the end of the upgrade we may notice two ReplicaSets with a count of 0.
I created the deployment file and deployed it; when I check the history of the daemonset I see the following results:
kubectl rollout history daemonset kube-proxy -n kube-system
We can roll back to a specific revision:
kubectl rollout undo daemonset kube-proxy --to-revision=4 -n kube-system
After the undo, the revision history of my daemonset looks like this:
kubectl rollout history daemonset kube-proxy -n kube-system
In the output above there are two columns: REVISION and CHANGE-CAUSE, and CHANGE-CAUSE is always set to <none> by default.
I set the change-cause to 'Kube' by adding the annotation below to the manifest, re-applied it, and then fetched the rollout history again:
kubernetes.io/change-cause: "Kube" # for a particular revision
kubectl apply -f filename
kubectl rollout history daemonset kube-proxy -n kube-system
Reference: to learn more about rolling updates, see the Kubernetes documentation.
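As a sketch of where that annotation lives, it goes under the object's metadata (a hypothetical fragment, not the full AKS manifest):
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-proxy
  namespace: kube-system
  annotations:
    kubernetes.io/change-cause: "Kube"   # shown in the CHANGE-CAUSE column of rollout history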

Elasticsearch PVC

I am trying to build an Elasticsearch cluster using the Helm chart with the following values YAML:
resources:
requests:
cpu: ".1"
memory: "2Gi"
limits:
cpu: "1"
memory: "3.5Gi"
volumeClaimTemplate:
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 500Gi
esConfig:
elasticsearch.yml: |
path.data: /mnt/azure
The problem is that the pods throw the following error at startup:
"Caused by: java.nio.file.AccessDeniedException: /mnt/azure"
I set the Azure disk as the default storage class so that I do not have to specify a storage class explicitly. I don't know if this is best practice, or whether I should create the storage first and then mount it to the pods.
You need to keep an init container to change the ownership of the mounted directory. You can adjust the path as needed; in your case the changes would apply to /mnt/azure.
initContainers:
- command:
  - sh
  - -c
  - |
    chown -R 1000:1000 /usr/share/elasticsearch/data
    sysctl -w vm.max_map_count=262144
    chmod 777 /usr/share/elasticsearch/data
    chmod 777 /usr/share/elasticsearch/data/node
    chmod g+rwx /usr/share/elasticsearch/data
    chgrp 1000 /usr/share/elasticsearch/data
Example StatefulSet file:
apiVersion: apps/v1
kind: StatefulSet
metadata:
labels:
app : elasticsearch
component: elasticsearch
release: elasticsearch
name: elasticsearch
spec:
podManagementPolicy: Parallel
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app : elasticsearch
component: elasticsearch
release: elasticsearch
serviceName: elasticsearch
template:
metadata:
creationTimestamp: null
labels:
app : elasticsearch
component: elasticsearch
release: elasticsearch
spec:
containers:
- env:
- name: cluster.name
value: <SET THIS>
- name: discovery.type
value: single-node
- name: ES_JAVA_OPTS
value: -Xms512m -Xmx512m
- name: bootstrap.memory_lock
value: "false"
image: elasticsearch:6.5.0
imagePullPolicy: IfNotPresent
name: elasticsearch
ports:
- containerPort: 9200
name: http
protocol: TCP
- containerPort: 9300
name: transport
protocol: TCP
resources:
limits:
cpu: 250m
memory: 1Gi
requests:
cpu: 150m
memory: 512Mi
securityContext:
privileged: true
runAsUser: 1000
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /usr/share/elasticsearch/data
name: elasticsearch-data
dnsPolicy: ClusterFirst
initContainers:
- command:
  - sh
  - -c
  - |
    chown -R 1000:1000 /usr/share/elasticsearch/data
    sysctl -w vm.max_map_count=262144
    chmod 777 /usr/share/elasticsearch/data
    chmod 777 /usr/share/elasticsearch/data/node
    chmod g+rwx /usr/share/elasticsearch/data
    chgrp 1000 /usr/share/elasticsearch/data
image: busybox:1.29.2
imagePullPolicy: IfNotPresent
name: set-dir-owner
resources: {}
securityContext:
privileged: true
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /usr/share/elasticsearch/data
name: elasticsearch-data
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 10
updateStrategy:
type: OnDelete
volumeClaimTemplates:
- metadata:
creationTimestamp: null
name: elasticsearch-data
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Gi
The mounted Elasticsearch data directory is owned by root by default. Try the following init container to change its ownership before Elasticsearch starts:
initContainers:
- name: chown
image: busybox
imagePullPolicy: IfNotPresent
command:
- chown
args:
- 1000:1000
- /mnt/azure
volumeMounts:
- name: <your volume claim template name>
mountPath: /mnt/azure
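Alternatively (a sketch not taken from the answers above), you can often avoid the chown init container entirely by letting Kubernetes set the group ownership of the volume through the pod securityContext; the official Elasticsearch images run as UID/GID 1000:
spec:
  securityContext:
    runAsUser: 1000   # UID the official Elasticsearch image runs as
    fsGroup: 1000     # kubelet makes mounted volumes group-writable by this GID
This goes in the pod template spec of the StatefulSet and works with most dynamically provisioned volumes, including Azure disks.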

Is it possible to get k8s pod limits in my application?

I have an application running a kafka consumer inside a pod with 1.5GB of memory limit.
As you probably know, we need to write some logic to stop the consumer from fetching messages when we are about to reach the memory limit.
I would like to pause the consumer when the memory my application is using goes above 75% of the memory limit.
So my question is: is it possible to get the k8s memory limit at runtime? How can I pause my consumer based on how much free memory I have?
this.consumer.on('message', (message) => {
checkApplicationMemoryUsage();
executeSomethingWithMessage(message);
});
function checkApplicationMemoryUsage() {
const appMemoryConsumption = process.memoryUsage().heapUsed;
const appMemoryLimit = <?????>;
if (appMemoryConsumption / appMemoryLimit > 0.75) this.consumer.pause();
else this.consumer.resume();
}
The solution I was thinking of is to pass the limits as env vars to my pod in the deployment spec, but I wish there were a better way.
Yes, the Downward API provides a way to achieve what you need.
apiVersion: v1
kind: Pod
metadata:
name: kubernetes-downwardapi-volume-example-2
spec:
containers:
- name: client-container
image: k8s.gcr.io/busybox:1.24
command: ["sh", "-c"]
args:
- while true; do
echo -en '\n';
if [[ -e /etc/podinfo/cpu_limit ]]; then
echo -en '\n'; cat /etc/podinfo/cpu_limit; fi;
if [[ -e /etc/podinfo/cpu_request ]]; then
echo -en '\n'; cat /etc/podinfo/cpu_request; fi;
if [[ -e /etc/podinfo/mem_limit ]]; then
echo -en '\n'; cat /etc/podinfo/mem_limit; fi;
if [[ -e /etc/podinfo/mem_request ]]; then
echo -en '\n'; cat /etc/podinfo/mem_request; fi;
sleep 5;
done;
resources:
requests:
memory: "32Mi"
cpu: "125m"
limits:
memory: "64Mi"
cpu: "250m"
volumeMounts:
- name: podinfo
mountPath: /etc/podinfo
volumes:
- name: podinfo
downwardAPI:
items:
- path: "cpu_limit"
resourceFieldRef:
containerName: client-container
resource: limits.cpu
divisor: 1m
- path: "cpu_request"
resourceFieldRef:
containerName: client-container
resource: requests.cpu
divisor: 1m
- path: "mem_limit"
resourceFieldRef:
containerName: client-container
resource: limits.memory
divisor: 1Mi
- path: "mem_request"
resourceFieldRef:
containerName: client-container
resource: requests.memory
divisor: 1Mi
See the example above, taken from the "Store Container fields" section of the Kubernetes docs.
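If you prefer the env-var approach mentioned in the question, the Downward API can also expose the limit directly as an environment variable. A minimal sketch added to the container above (the variable name MEMORY_LIMIT_BYTES is just an example):
env:
- name: MEMORY_LIMIT_BYTES
  valueFrom:
    resourceFieldRef:
      containerName: client-container
      resource: limits.memory
      divisor: "1"   # expose the raw limit in bytes
The Node.js code from the question could then read process.env.MEMORY_LIMIT_BYTES instead of a hard-coded value.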

How does the master bootstrap process work and how can I debug it?

I am working on standing up 3 instances of the YugabyteDB master and tserver in separate k8s clusters, connected over LoadBalancer services on bare metal. However, on all three master instances it looks like the bootstrap process is failing:
I0531 19:50:28.081645 1 master_main.cc:94] NumCPUs determined to be: 2
I0531 19:50:28.082594 1 server_base_options.cc:124] Updating master addrs to {yb-master-black.example.com:7100},{yb-master-blue.example.com:7100},{yb-master-white.example.com:7100},{:7100}
I0531 19:50:28.082682 1 server_base_options.cc:124] Updating master addrs to {yb-master-black.example.com:7100},{yb-master-blue.example.com:7100},{yb-master-white.example.com:7100},{:7100}
I0531 19:50:28.082937 1 mem_tracker.cc:249] MemTracker: hard memory limit is 1.699219 GB
I0531 19:50:28.082963 1 mem_tracker.cc:251] MemTracker: soft memory limit is 1.444336 GB
I0531 19:50:28.083189 1 server_base_options.cc:124] Updating master addrs to {yb-master-black.example.com:7100},{yb-master-blue.example.com:7100},{yb-master-white.example.com:7100},{:7100}
I0531 19:50:28.090148 1 server_base_options.cc:124] Updating master addrs to {yb-master-black.example.com:7100},{yb-master-blue.example.com:7100},{yb-master-white.example.com:7100},{:7100}
I0531 19:50:28.090863 1 rpc_server.cc:86] yb::server::RpcServer created at 0x1a7e210
I0531 19:50:28.090924 1 master.cc:146] yb::master::Master created at 0x7ffe2d4bd140
I0531 19:50:28.090958 1 master.cc:147] yb::master::TSManager created at 0x1a90850
I0531 19:50:28.090975 1 master.cc:148] yb::master::CatalogManager created at 0x1dea000
I0531 19:50:28.091152 1 master_main.cc:115] Initializing master server...
I0531 19:50:28.093097 1 server_base.cc:462] Could not load existing FS layout: Not found (yb/util/env_posix.cc:1482): /mnt/disk0/yb-data/master/instance: No such file or directory (system error 2)
I0531 19:50:28.093150 1 server_base.cc:463] Creating new FS layout
I0531 19:50:28.193439 1 fs_manager.cc:463] Generated new instance metadata in path /mnt/disk0/yb-data/master/instance:
uuid: "5f2f6ad78d27450b8cde9c8bcf40fefa"
format_stamp: "Formatted at 2020-05-31 19:50:28 on yb-master-0"
I0531 19:50:28.238484 1 fs_manager.cc:463] Generated new instance metadata in path /mnt/disk1/yb-data/master/instance:
uuid: "5f2f6ad78d27450b8cde9c8bcf40fefa"
format_stamp: "Formatted at 2020-05-31 19:50:28 on yb-master-0"
I0531 19:50:28.377483 1 fs_manager.cc:251] Opened local filesystem: /mnt/disk0,/mnt/disk1
uuid: "5f2f6ad78d27450b8cde9c8bcf40fefa"
format_stamp: "Formatted at 2020-05-31 19:50:28 on yb-master-0"
I0531 19:50:28.378015 1 server_base.cc:245] Auto setting FLAGS_num_reactor_threads to 2
I0531 19:50:28.380707 1 thread_pool.cc:166] Starting thread pool { name: Master queue_limit: 10000 max_workers: 1024 }
I0531 19:50:28.382266 1 master_main.cc:118] Starting Master server...
I0531 19:50:28.382313 24 async_initializer.cc:74] Starting to init ybclient
I0531 19:50:28.382365 1 master_main.cc:119] ulimit cur(max)...
ulimit: core file size unlimited(unlimited) blks
ulimit: data seg size unlimited(unlimited) kb
ulimit: open files 1048576(1048576)
ulimit: file size unlimited(unlimited) blks
ulimit: pending signals 22470(22470)
ulimit: file locks unlimited(unlimited)
ulimit: max locked memory 64(64) kb
ulimit: max memory size unlimited(unlimited) kb
ulimit: stack size 8192(unlimited) kb
ulimit: cpu time unlimited(unlimited) secs
ulimit: max user processes unlimited(unlimited)
W0531 19:50:28.383322 24 master.cc:186] Failed to get current config: Illegal state (yb/master/catalog_manager.cc:6854): Node 5f2f6ad78d27450b8cde9c8bcf40fefa peer not initialized.
I0531 19:50:28.383525 24 client-internal.cc:1847] New master addresses: [yb-master-black.example.com:7100,yb-master-blue.example.com:7100,yb-master-white.example.com:7100,:7100]
I0531 19:50:28.383685 1 service_pool.cc:148] yb.master.MasterBackupService: yb::rpc::ServicePoolImpl created at 0x1a82b40
I0531 19:50:28.384888 1 service_pool.cc:148] yb.master.MasterService: yb::rpc::ServicePoolImpl created at 0x1a83680
I0531 19:50:28.385342 1 service_pool.cc:148] yb.tserver.TabletServerService: yb::rpc::ServicePoolImpl created at 0x1a838c0
I0531 19:50:28.388526 1 thread_pool.cc:166] Starting thread pool { name: Master-high-pri queue_limit: 10000 max_workers: 1024 }
I0531 19:50:28.388588 1 service_pool.cc:148] yb.consensus.ConsensusService: yb::rpc::ServicePoolImpl created at 0x201eb40
I0531 19:50:28.393231 1 service_pool.cc:148] yb.tserver.RemoteBootstrapService: yb::rpc::ServicePoolImpl created at 0x201ed80
I0531 19:50:28.393501 1 webserver.cc:148] Starting webserver on 0.0.0.0:7000
I0531 19:50:28.393544 1 webserver.cc:153] Document root: /home/yugabyte/www
I0531 19:50:28.394471 1 webserver.cc:240] Webserver started. Bound to: http://0.0.0.0:7000/
I0531 19:50:28.394668 1 service_pool.cc:148] yb.server.GenericService: yb::rpc::ServicePoolImpl created at 0x201efc0
I0531 19:50:28.395015 1 rpc_server.cc:169] RPC server started. Bound to: 0.0.0.0:7100
I0531 19:50:28.420223 23 tcp_stream.cc:308] { local: 10.233.80.35:55710 remote: 172.16.0.34:7100 }: Recv failed: Network error (yb/util/net/socket.cc:537): recvmsg error: Connection refused (system error 111)
E0531 19:51:28.523921 24 async_initializer.cc:84] Failed to initialize client: Timed out (yb/rpc/rpc.cc:213): Could not locate the leader master: GetLeaderMasterRpc(addrs: [yb-master-black.example.com:7100, yb-master-blue.example.com:7100, yb-master-white.example.com:7100, :7100], num_attempts: 293) passed its deadline 2074493.105s (passed: 60.140s): Not found (yb/master/master_rpc.cc:284): no leader found: GetLeaderMasterRpc(addrs: [yb-master-black.example.com:7100, yb-master-blue.example.com:7100, yb-master-white.example.com:7100, :7100], num_attempts: 1)
W0531 19:51:29.524827 24 master.cc:186] Failed to get current config: Illegal state (yb/master/catalog_manager.cc:6854): Node 5f2f6ad78d27450b8cde9c8bcf40fefa peer not initialized.
I0531 19:51:29.524914 24 client-internal.cc:1847] New master addresses: [yb-master-black.example.com:7100,yb-master-blue.example.com:7100,yb-master-white.example.com:7100,:7100]
E0531 19:52:29.524785 24 async_initializer.cc:84] Failed to initialize client: Timed out (yb/rpc/outbound_call.cc:512): Could not locate the leader master: GetMasterRegistration RPC (request call id 2359) to 172.29.1.1:7100 timed out after 0.033s
W0531 19:52:30.525079 24 master.cc:186] Failed to get current config: Illegal state (yb/master/catalog_manager.cc:6854): Node 5f2f6ad78d27450b8cde9c8bcf40fefa peer not initialized.
I0531 19:52:30.525205 24 client-internal.cc:1847] New master addresses: [yb-master-black.example.com:7100,yb-master-blue.example.com:7100,yb-master-white.example.com:7100,:7100]
W0531 19:53:28.114395 36 master-path-handlers.cc:150] Illegal state (yb/master/catalog_manager.cc:6854): Unable to list Masters: Node 5f2f6ad78d27450b8cde9c8bcf40fefa peer not initialized.
W0531 19:53:29.133951 36 master-path-handlers.cc:1002] Illegal state (yb/master/catalog_manager.cc:6854): Unable to list Masters: Node 5f2f6ad78d27450b8cde9c8bcf40fefa peer not initialized.
E0531 19:53:30.625366 24 async_initializer.cc:84] Failed to initialize client: Timed out (yb/rpc/rpc.cc:213): Could not locate the leader master: GetLeaderMasterRpc(addrs: [yb-master-black.example.com:7100, yb-master-blue.example.com:7100, yb-master-white.example.com:7100, :7100], num_attempts: 299) passed its deadline 2074615.247s (passed: 60.099s): Not found (yb/master/master_rpc.cc:284): no leader found: GetLeaderMasterRpc(addrs: [yb-master-black.example.com:7100, yb-master-blue.example.com:7100, yb-master-white.example.com:7100, :7100], num_attempts: 1)
W0531 19:53:31.625660 24 master.cc:186] Failed to get current config: Illegal state (yb/master/catalog_manager.cc:6854): Node 5f2f6ad78d27450b8cde9c8bcf40fefa peer not initialized.
I0531 19:53:31.625742 24 client-internal.cc:1847] New master addresses: [yb-master-black.example.com:7100,yb-master-blue.example.com:7100,yb-master-white.example.com:7100,:7100]
W0531 19:53:34.024369 37 master-path-handlers.cc:150] Illegal state (yb/master/catalog_manager.cc:6854): Unable to list Masters: Node 5f2f6ad78d27450b8cde9c8bcf40fefa peer not initialized.
E0531 19:54:31.870801 24 async_initializer.cc:84] Failed to initialize client: Timed out (yb/rpc/rpc.cc:213): Could not locate the leader master: GetLeaderMasterRpc(addrs: [yb-master-black.example.com:7100, yb-master-blue.example.com:7100, yb-master-white.example.com:7100, :7100], num_attempts: 300) passed its deadline 2074676.348s (passed: 60.244s): Not found (yb/master/master_rpc.cc:284): no leader found: GetLeaderMasterRpc(addrs: [yb-master-black.example.com:7100, yb-master-blue.example.com:7100, yb-master-white.example.com:7100, :7100], num_attempts: 1)
W0531 19:54:32.871065 24 master.cc:186] Failed to get current config: Illegal state (yb/master/catalog_manager.cc:6854): Node 5f2f6ad78d27450b8cde9c8bcf40fefa peer not initialized.
I0531 19:54:32.871222 24 client-internal.cc:1847] New master addresses: [yb-master-black.example.com:7100,yb-master-blue.example.com:7100,yb-master-white.example.com:7100,:7100]
W0531 19:55:28.190217 41 master-path-handlers.cc:1002] Illegal state (yb/master/catalog_manager.cc:6854): Unable to list Masters: Node 5f2f6ad78d27450b8cde9c8bcf40fefa peer not initialized.
W0531 19:55:31.745038 42 master-path-handlers.cc:1002] Illegal state (yb/master/catalog_manager.cc:6854): Unable to list Masters: Node 5f2f6ad78d27450b8cde9c8bcf40fefa peer not initialized.
E0531 19:55:33.164300 24 async_initializer.cc:84] Failed to initialize client: Timed out (yb/rpc/rpc.cc:213): Could not locate the leader master: GetLeaderMasterRpc(addrs: [yb-master-black.example.com:7100, yb-master-blue.example.com:7100, yb-master-white.example.com:7100, :7100], num_attempts: 299) passed its deadline 2074737.593s (passed: 60.292s): Not found (yb/master/master_rpc.cc:284): no leader found: GetLeaderMasterRpc(addrs: [yb-master-black.example.com:7100, yb-master-blue.example.com:7100, yb-master-white.example.com:7100, :7100], num_attempts: 1)
W0531 19:55:34.164574 24 master.cc:186] Failed to get current config: Illegal state (yb/master/catalog_manager.cc:6854): Node 5f2f6ad78d27450b8cde9c8bcf40fefa peer not initialized.
I0531 19:55:34.164667 24 client-internal.cc:1847] New master addresses: [yb-master-black.example.com:7100,yb-master-blue.example.com:7100,yb-master-white.example.com:7100,:7100]
E0531 19:56:34.315380 24 async_initializer.cc:84] Failed to initialize client: Timed out (yb/rpc/rpc.cc:213): Could not locate the leader master: GetLeaderMasterRpc(addrs: [yb-master-black.example.com:7100, yb-master-blue.example.com:7100, yb-master-white.example.com:7100, :7100], num_attempts: 299) passed its deadline 2074798.886s (passed: 60.150s): Not found (yb/master/master_rpc.cc:284): no leader found: GetLeaderMasterRpc(addrs: [yb-master-black.example.com:7100, yb-master-blue.example.com:7100, yb-master-white.example.com:7100, :7100], num_attempts: 1)
As far as connectivity goes, I am able to verify the LoadBalancer endpoints are responding across the different network boundaries by curling the same service endpoint but on the UI port:
[root@yb-master-0 yugabyte]# curl -I http://yb-master-blue.example.com:7000
HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 1975
Access-Control-Allow-Origin: *
[root@yb-master-0 yugabyte]# curl -I http://yb-master-white.example.com:7000
HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 1975
Access-Control-Allow-Origin: *
[root@yb-master-0 yugabyte]# curl -I http://yb-master-black.example.com:7000
HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 1975
Access-Control-Allow-Origin: *
What strategies are there to debug the bootstrap process?
EDIT:
Here are the startup flags for the master:
/home/yugabyte/bin/yb-master --fs_data_dirs=/mnt/disk0,/mnt/disk1 --server_broadcast_addresses=yb-master-white.example.com:7100 --master_addresses=yb-master-black.example.com:7100, yb-master-blue.example.com:7100, yb-master-white.example.com:7100, --replication_factor=3 --enable_ysql=true --rpc_bind_addresses=0.0.0.0:7100 --metric_node_name=yb-master-0 --memory_limit_hard_bytes=1824522240 --stderrthreshold=0 --num_cpus=2 --undefok=num_cpus,enable_ysql --default_memory_limit_to_ram_ratio=0.85 --leader_failure_max_missed_heartbeat_periods=10 --placement_cloud=AAAA --placement_region=XXXX --placement_zone=XXXX
/home/yugabyte/bin/yb-master --fs_data_dirs=/mnt/disk0,/mnt/disk1 --server_broadcast_addresses=yb-master-blue.example.com:7100 --master_addresses=yb-master-black.example.com:7100, yb-master-blue.example.com:7100, yb-master-white.example.com:7100, --replication_factor=3 --enable_ysql=true --rpc_bind_addresses=0.0.0.0:7100 --metric_node_name=yb-master-0 --memory_limit_hard_bytes=1824522240 --stderrthreshold=0 --num_cpus=2 --undefok=num_cpus,enable_ysql --default_memory_limit_to_ram_ratio=0.85 --leader_failure_max_missed_heartbeat_periods=10 --placement_cloud=AAAA --placement_region=YYYY --placement_zone=YYYY
/home/yugabyte/bin/yb-master --fs_data_dirs=/mnt/disk0,/mnt/disk1 --server_broadcast_addresses=yb-master-black.example.com:7100 --master_addresses=yb-master-black.example.com:7100, yb-master-blue.example.com:7100, yb-master-white.example.com:7100, --replication_factor=3 --enable_ysql=true --rpc_bind_addresses=0.0.0.0:7100 --metric_node_name=yb-master-0 --memory_limit_hard_bytes=1824522240 --stderrthreshold=0 --num_cpus=2 --undefok=num_cpus,enable_ysql --default_memory_limit_to_ram_ratio=0.85 --leader_failure_max_missed_heartbeat_periods=10 --placement_cloud=AAAA --placement_region=ZZZZ --placement_zone=ZZZZ
For the sake of completeness, here is one of the k8s manifests that I've modified from one of the Helm examples. It is modified to use a LoadBalancer for the master service:
---
# Source: yugabyte/templates/service.yaml
apiVersion: v1
kind: Service
metadata:
name: "yb-masters"
labels:
app: "yb-master"
heritage: "Helm"
release: "blue"
chart: "yugabyte"
component: "yugabytedb"
spec:
type: LoadBalancer
loadBalancerIP: 172.16.0.34
ports:
- name: "rpc-port"
port: 7100
- name: "ui"
port: 7000
selector:
app: "yb-master"
---
# Source: yugabyte/templates/service.yaml
apiVersion: v1
kind: Service
metadata:
name: "yb-tservers"
labels:
app: "yb-tserver"
heritage: "Helm"
release: "blue"
chart: "yugabyte"
component: "yugabytedb"
spec:
clusterIP: None
ports:
- name: "rpc-port"
port: 7100
- name: "ui"
port: 9000
- name: "yedis-port"
port: 6379
- name: "yql-port"
port: 9042
- name: "ysql-port"
port: 5433
selector:
app: "yb-tserver"
---
# Source: yugabyte/templates/service.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: "yb-master"
namespace: "yugabytedb"
labels:
app: "yb-master"
heritage: "Helm"
release: "blue"
chart: "yugabyte"
component: "yugabytedb"
spec:
serviceName: "yb-masters"
podManagementPolicy: Parallel
replicas: 1
volumeClaimTemplates:
- metadata:
name: datadir0
annotations:
volume.beta.kubernetes.io/storage-class: rook-ceph-block
labels:
heritage: "Helm"
release: "blue"
chart: "yugabyte"
component: "yugabytedb"
spec:
accessModes:
- "ReadWriteOnce"
storageClassName: rook-ceph-block
resources:
requests:
storage: 10Gi
- metadata:
name: datadir1
annotations:
volume.beta.kubernetes.io/storage-class: rook-ceph-block
labels:
heritage: "Helm"
release: "blue"
chart: "yugabyte"
component: "yugabytedb"
spec:
accessModes:
- "ReadWriteOnce"
storageClassName: rook-ceph-block
resources:
requests:
storage: 10Gi
updateStrategy:
type: RollingUpdate
rollingUpdate:
partition: 0
selector:
matchLabels:
app: "yb-master"
template:
metadata:
labels:
app: "yb-master"
heritage: "Helm"
release: "blue"
chart: "yugabyte"
component: "yugabytedb"
spec:
affinity:
# Set the anti-affinity selector scope to YB masters.
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- "yb-master"
topologyKey: kubernetes.io/hostname
containers:
- name: "yb-master"
image: "yugabytedb/yugabyte:2.1.6.0-b17"
imagePullPolicy: IfNotPresent
lifecycle:
postStart:
exec:
command:
- "sh"
- "-c"
- >
mkdir -p /mnt/disk0/cores;
mkdir -p /mnt/disk0/yb-data/scripts;
if [ ! -f /mnt/disk0/yb-data/scripts/log_cleanup.sh ]; then
if [ -f /home/yugabyte/bin/log_cleanup.sh ]; then
cp /home/yugabyte/bin/log_cleanup.sh /mnt/disk0/yb-data/scripts;
fi;
fi
env:
- name: POD_IP
valueFrom:
fieldRef:
fieldPath: status.podIP
- name: HOSTNAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
resources:
limits:
cpu: 2
memory: 2Gi
requests:
cpu: 500m
memory: 1Gi
command:
- "/home/yugabyte/bin/yb-master"
- "--fs_data_dirs=/mnt/disk0,/mnt/disk1"
- "--server_broadcast_addresses=yb-master-blue.example.com:7100"
- "--master_addresses=yb-master-black.example.com:7100, yb-master-blue.example.com:7100, yb-master-white.example.com:7100, "
- "--replication_factor=3"
- "--enable_ysql=true"
- "--rpc_bind_addresses=0.0.0.0:7100"
- "--metric_node_name=$(HOSTNAME)"
- "--memory_limit_hard_bytes=1824522240"
- "--stderrthreshold=0"
- "--num_cpus=2"
- "--undefok=num_cpus,enable_ysql"
- "--default_memory_limit_to_ram_ratio=0.85"
- "--leader_failure_max_missed_heartbeat_periods=10"
- "--placement_cloud=AAAA"
- "--placement_region=YYYY"
- "--placement_zone=YYYY"
ports:
- containerPort: 7100
name: "rpc-port"
- containerPort: 7000
name: "ui"
volumeMounts:
- name: datadir0
mountPath: /mnt/disk0
- name: datadir1
mountPath: /mnt/disk1
- name: yb-cleanup
image: busybox:1.31
env:
- name: USER
value: "yugabyte"
command:
- "/bin/sh"
- "-c"
- >
mkdir /var/spool/cron;
mkdir /var/spool/cron/crontabs;
echo "0 * * * * /home/yugabyte/scripts/log_cleanup.sh" | tee -a /var/spool/cron/crontabs/root;
crond;
while true; do
sleep 86400;
done
volumeMounts:
- name: datadir0
mountPath: /home/yugabyte/
subPath: yb-data
volumes:
- name: datadir0
hostPath:
path: /mnt/disks/ssd0
- name: datadir1
hostPath:
path: /mnt/disks/ssd1
---
# Source: yugabyte/templates/service.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: "yb-tserver"
namespace: "yugabytedb"
labels:
app: "yb-tserver"
heritage: "Helm"
release: "blue"
chart: "yugabyte"
component: "yugabytedb"
spec:
serviceName: "yb-tservers"
podManagementPolicy: Parallel
replicas: 1
volumeClaimTemplates:
- metadata:
name: datadir0
annotations:
volume.beta.kubernetes.io/storage-class: rook-ceph-block
labels:
heritage: "Helm"
release: "blue"
chart: "yugabyte"
component: "yugabytedb"
spec:
accessModes:
- "ReadWriteOnce"
storageClassName: rook-ceph-block
resources:
requests:
storage: 10Gi
- metadata:
name: datadir1
annotations:
volume.beta.kubernetes.io/storage-class: rook-ceph-block
labels:
heritage: "Helm"
release: "blue"
chart: "yugabyte"
component: "yugabytedb"
spec:
accessModes:
- "ReadWriteOnce"
storageClassName: rook-ceph-block
resources:
requests:
storage: 10Gi
updateStrategy:
type: RollingUpdate
rollingUpdate:
partition: 0
selector:
matchLabels:
app: "yb-tserver"
template:
metadata:
labels:
app: "yb-tserver"
heritage: "Helm"
release: "blue"
chart: "yugabyte"
component: "yugabytedb"
spec:
affinity:
# Set the anti-affinity selector scope to YB masters.
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- "yb-tserver"
topologyKey: kubernetes.io/hostname
containers:
- name: "yb-tserver"
image: "yugabytedb/yugabyte:2.1.6.0-b17"
imagePullPolicy: IfNotPresent
lifecycle:
postStart:
exec:
command:
- "sh"
- "-c"
- >
mkdir -p /mnt/disk0/cores;
mkdir -p /mnt/disk0/yb-data/scripts;
if [ ! -f /mnt/disk0/yb-data/scripts/log_cleanup.sh ]; then
if [ -f /home/yugabyte/bin/log_cleanup.sh ]; then
cp /home/yugabyte/bin/log_cleanup.sh /mnt/disk0/yb-data/scripts;
fi;
fi
env:
- name: POD_IP
valueFrom:
fieldRef:
fieldPath: status.podIP
- name: HOSTNAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
resources:
limits:
cpu: 2
memory: 4Gi
requests:
cpu: 500m
memory: 2Gi
command:
- "/home/yugabyte/bin/yb-tserver"
- "--fs_data_dirs=/mnt/disk0,/mnt/disk1"
- "--server_broadcast_addresses=$(HOSTNAME).yb-tservers.$(NAMESPACE).svc.cluster.local:9100"
- "--rpc_bind_addresses=$(HOSTNAME).yb-tservers.$(NAMESPACE).svc.cluster.local"
- "--cql_proxy_bind_address=$(HOSTNAME).yb-tservers.$(NAMESPACE).svc.cluster.local"
- "--enable_ysql=true"
- "--pgsql_proxy_bind_address=$(POD_IP):5433"
- "--tserver_master_addrs=yb-master-black.example.com:7100, yb-master-blue.example.com:7100, yb-master-white.example.com:7100, "
- "--metric_node_name=$(HOSTNAME)"
- "--memory_limit_hard_bytes=3649044480"
- "--stderrthreshold=0"
- "--num_cpus=2"
- "--undefok=num_cpus,enable_ysql"
- "--leader_failure_max_missed_heartbeat_periods=10"
- "--placement_cloud=AAAA"
- "--placement_region=YYYY"
- "--placement_zone=YYYY"
- "--use_cassandra_authentication=false"
ports:
- containerPort: 7100
name: "rpc-port"
- containerPort: 9000
name: "ui"
- containerPort: 6379
name: "yedis-port"
- containerPort: 9042
name: "yql-port"
- containerPort: 5433
name: "ysql-port"
volumeMounts:
- name: datadir0
mountPath: /mnt/disk0
- name: datadir1
mountPath: /mnt/disk1
- name: yb-cleanup
image: busybox:1.31
env:
- name: USER
value: "yugabyte"
command:
- "/bin/sh"
- "-c"
- >
mkdir /var/spool/cron;
mkdir /var/spool/cron/crontabs;
echo "0 * * * * /home/yugabyte/scripts/log_cleanup.sh" | tee -a /var/spool/cron/crontabs/root;
crond;
while true; do
sleep 86400;
done
volumeMounts:
- name: datadir0
mountPath: /home/yugabyte/
subPath: yb-data
volumes:
- name: datadir0
hostPath:
path: /mnt/disks/ssd0
- name: datadir1
hostPath:
path: /mnt/disks/ssd1
This was mostly resolved (it looks like I've now run into an unrelated issue) by dropping the extraneous trailing comma from the master addresses list:
--master_addresses=yb-master-black.example.com:7100, yb-master-blue.example.com:7100, yb-master-white.example.com:7100,
vs
--master_addresses=yb-master-black.example.com:7100, yb-master-blue.example.com:7100, yb-master-white.example.com:7100
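As a general debugging aid (a sketch, not part of the original resolution), once the masters can reach each other you can ask any of them for their view of the master Raft quorum with yb-admin:
# Ask a reachable master to list the current master quorum and leader
/home/yugabyte/bin/yb-admin \
  --master_addresses yb-master-black.example.com:7100,yb-master-blue.example.com:7100,yb-master-white.example.com:7100 \
  list_all_masters
The master web UI on port 7000 shows similar information.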

Kubernetes CronJob Failure to start a job "Timeout" and "Job already exists"

I am trying to run a CronJob in Kubernetes, but keep getting these two errors:
type: 'Warning' reason: 'FailedCreate' Error creating job: jobs.batch "dev-cron-1516702680" already exists
and
type: 'Warning' reason: 'FailedCreate' Error creating job: Timeout: request did not complete within allowed duration
Below are my cronjob yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
creationTimestamp: 2018-01-23T09:45:10Z
name: dev-cron
namespace: dev
resourceVersion: "16768201"
selfLink: /apis/batch/v1beta1/namespaces/dev/cronjobs/dev-cron
uid: 1a32eb94-0022-11e8-9256-065eb556d6a2
spec:
concurrencyPolicy: Allow
failedJobsHistoryLimit: 1
jobTemplate:
metadata:
creationTimestamp: null
spec:
template:
metadata:
creationTimestamp: null
spec:
containers:
- args:
- for country in th;
- do
- 'curl -X POST -d "{'footprint':'xxxx-xxxx'}"-H "Content-Type: application/json" https://dev.xxx.com/xxx/xxx'
- done
image: appropriate/curl:latest
imagePullPolicy: Always
name: cron
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirst
restartPolicy: Never
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 30
schedule: '* * * * *'
startingDeadlineSeconds: 10
successfulJobsHistoryLimit: 3
suspend: false
status: {}
I am not sure why this keeps happening. I am running Kubernetes version 1.9.1 in an AWS cluster. Any idea why?
It turned out to be caused by the automatic sidecar injection done by the Istio initializer. Once I disabled Istio initializer injection for CronJobs, it worked fine.
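A minimal sketch of one way to opt a CronJob's pods out of Istio sidecar injection, assuming your Istio version honors the standard sidecar.istio.io/inject annotation on the pod template:
jobTemplate:
  spec:
    template:
      metadata:
        annotations:
          sidecar.istio.io/inject: "false"   # skip Istio sidecar injection for the job's pods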
