Azure Policy (Gatekeeper) monitoring on AKS via Prometheus and Grafana

I have enabled Azure Policy via Terraform and applied it to an AKS cluster. I can see the pods are deployed, up, and running. I also applied the built-in initiative with the effect "audit" to test out how Azure Policy works on an AKS cluster.
$ kubectl get pods -n gatekeeper-system
NAME READY STATUS RESTARTS AGE
gatekeeper-audit-77754c7d8-g44qb 1/1 Running 0 44h
gatekeeper-controller-78cff9c89-7pftn 1/1 Running 0 44h
gatekeeper-controller-78cff9c89-8dsfg 1/1 Running 0 44h
I found a dashboard https://grafana.com/grafana/dashboards/15763
But some of the metrics are different or missing. I'm not sure whether this is because Azure manages this Gatekeeper installation. Some panels do display data and their metrics are available in Prometheus, but others are not; for example, opa_scorecard_constraint_violations is not available.
How do I monitor Azure Policy via Prometheus properly?

I don't think metrics like opa_scorecard_constraint_violations can be exported when you're using Azure Policy (+ Gatekeeper).
However, you can export Gatekeeper's own metrics; you just need to create a monitor that scrapes the proper endpoint.
My PodMonitor looks like this:
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  labels:
    monitoring: prometheus
  name: gatekeeper-system-pod-monitor
  namespace: monitoring
spec:
  jobLabel: gatekeeper.sh/system
  namespaceSelector:
    matchNames:
      - gatekeeper-system
  podMetricsEndpoints:
    - honorLabels: true
      path: /metrics
      port: metrics
  selector:
    matchLabels:
      gatekeeper.sh/system: "yes"
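To double-check that the endpoint the PodMonitor targets actually serves data, you can port-forward to one of the Gatekeeper pods and curl it directly. Note that 8888 is the upstream Gatekeeper default for the metrics port; the Azure-managed add-on may use a different number, so check the pod spec if this fails:
kubectl -n gatekeeper-system port-forward deploy/gatekeeper-audit 8888:8888 &
curl -s http://localhost:8888/metrics | grep '^gatekeeper_' | head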
Grafana screenshot showing the available Gatekeeper metrics

Related

Alert for AKS pending pods with Azure Monitor

I need to set an alert for pending pods in AKS with Azure Monitor. If the pending pod count reaches 10 for a duration of 1 hour, an alert should be triggered. I have tried some approaches but have not been able to get it working.
I tried to create alert rules for pods whose state is Pending.
I followed the steps below to create the alert rule (an equivalent Azure CLI sketch follows these steps):
In my AKS cluster I created the alert rule as shown below:
Go to → Portal → AKS cluster → click on Alerts → create new alert rule
Selected the signal type as Metrics and the signal name as "Number of pods by phase"
On the condition page I filled in the required fields
Used the dimensions to monitor the specific time series
Set the lookback period to 1 hour for checking the alerts
Enabled the recommended alert rules to send to email and clicked Review + create
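For reference, a similar alert rule can in principle be created from the Azure CLI. This is only a sketch: the metric name kube_pod_status_phase and the phase dimension are my reading of the portal's "Number of pods by phase" signal, and the scope is a placeholder:
az monitor metrics alert create \
  --name aks-pending-pods \
  --resource-group <resource-group> \
  --scopes <AKS-cluster-resource-id> \
  --condition "avg kube_pod_status_phase > 10 where phase includes Pending" \
  --window-size 1h \
  --evaluation-frequency 15m \
  --description "Fires when 10 or more pods stay Pending over the last hour"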
I created an example pod that stays in the Pending state, as shown below:
apiVersion: v1
kind: Pod
metadata:
  name: nodepod
  labels:
    name: nodepod
spec:
  restartPolicy: Never
  containers:
    - name: nodepod
      image: alpine
      resources:
        limits:
          memory: "128Mi"
          cpu: "500m"
      ports:
        - containerPort: 8080
  nodeSelector:
    memoryOptimised: "yes"   # no node carries this label, so the pod stays Pending
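For reference, one way to create ten copies of this pod (assuming the manifest above is saved as pod.yaml; the file name and the nodepod-$i naming are placeholders) is a small shell loop:
for i in $(seq 1 10); do
  sed "s/name: nodepod/name: nodepod-$i/" pod.yaml | kubectl apply -f -
done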
I created the 10 pods and all of them are in the Pending state.
Once the count reached 10 pending pods, the new alert moved to the Fired state.
Below is my newly created alert for the Pending pod state.

AKS PersistentVolume Affinity?

Disclaimer: This question is very specific about the platforms used and the use case we are trying to solve with them. It also compares two approaches we currently use, at least in a development stage, but perhaps don't fully understand yet. I am asking for guidance on this very specific topic...
A) We are running a Kafka cluster as Kafka Tasks on DC/OS, where persistence of data is maintained via local Disk Storage which is provisioned on the very same host as the according kafka broker instance.
B) We are trying to run Kafka on Kubernetes (via Strimzi Operator), specifically Azure Kubernetes Service (AKS) and are struggling to get reliable Data Persistence using the StorageClasses you get in AKS. We tried three possibilities:
(Default) Azure Disk
Azure File
emptyDir
I see two major issues with Azure Disk: while we are able to set the Kafka pod affinity so that the pods do not end up in the same maintenance zone / on the same host, we have no instrument to bind the corresponding PersistentVolume anywhere near the pod. There is nothing like node affinity for Azure Disks. Also, it seems fairly common that an Azure Disk ends up on a different host than its corresponding pod, which might then be limited by network bandwidth?
With Azure Files we don't have issues with maintenance zones going down temporarily, but as a high-latency storage option it doesn't seem to be a good fit, and Kafka also has trouble deleting / updating files on retention.
So I ended up using an ephemeral storage cluster, which is commonly NOT recommended but doesn't come with the problems above. The volume "lives" near the pod and is available to it as long as the pod itself runs on a node. In the maintenance case, pod AND volume die together. As long as I am able to maintain a quorum, I don't see where this might cause issues.
Is there anything like podAffinity for PersistentVolumes as Azure-Disk is per definition Node bound?
What are the major downsides in using emptyDir for persistence in a Kafka Cluster on Kubernetes?
Is there anything like podAffinity for PersistentVolumes as Azure-Disk is per definition Node bound?
As far as I know, there is nothing like podAffinity for PersistentVolumes backed by Azure Disk. The Azure disk has to be attached to a node, so if the pod moves to another host node, it can't use the volume on that disk. Only an Azure file share can follow the pod in that way.
What are the major downsides in using emptyDir for persistence in a Kafka Cluster on Kubernetes?
You can take a look at what emptyDir is meant for in the Kubernetes docs, for example:
scratch space, such as for a disk-based merge sort
Disk space is the main thing you need to watch out for when you use it on AKS: an emptyDir volume is carved out of the node's own disk, so you need to calculate the required space, and perhaps attach multiple Azure disks to the nodes. A sketch of capping that usage follows below.
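As a minimal sketch of keeping that usage bounded (the names and the size are placeholders, not from the question), an emptyDir volume can carry a sizeLimit; if the pod writes more than that, the kubelet evicts it:
apiVersion: v1
kind: Pod
metadata:
  name: kafka-broker-example        # placeholder name
spec:
  containers:
    - name: kafka
      image: bitnami/kafka:latest   # placeholder image
      volumeMounts:
        - name: kafka-data
          mountPath: /var/lib/kafka
  volumes:
    - name: kafka-data
      emptyDir:
        sizeLimit: 100Gi            # pod is evicted if the volume grows beyond this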
Starting off - I'm not sure what you mean about an Azure Disk ending up on a node other than where the pod is assigned - that shouldn't be possible, per my understanding (for completeness, you can do this on a VM with the shared disks feature outside of AKS, but as far as I'm aware that's not supported in AKS for dynamic disks at the time of writing). If you're looking at the volume.kubernetes.io/selected-node annotation on the PVC, I don't believe that's updated after initial creation.
You can reach the configuration you're looking for by using a StatefulSet with anti-affinity. Consider a StatefulSet like the one sketched after the node listing below: it creates three pods, which must land in different availability zones. I'm deploying it to an AKS cluster with a nodepool (nodepool2) that has two nodes per AZ:
❯ kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{","}{.metadata.labels.topology\.kubernetes\.io\/zone}{"\n"}{end}'
aks-nodepool1-25997496-vmss000000,0
aks-nodepool2-25997496-vmss000000,westus2-1
aks-nodepool2-25997496-vmss000001,westus2-2
aks-nodepool2-25997496-vmss000002,westus2-3
aks-nodepool2-25997496-vmss000003,westus2-1
aks-nodepool2-25997496-vmss000004,westus2-2
aks-nodepool2-25997496-vmss000005,westus2-3
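The StatefulSet itself isn't reproduced in the original answer; a minimal sketch of what it could look like is below. The echo name, storage class, and PVC size are taken from the output that follows, while the image and the zone anti-affinity details are assumptions:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: echo
spec:
  serviceName: echo
  replicas: 3
  selector:
    matchLabels:
      app: echo
  template:
    metadata:
      labels:
        app: echo
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: echo
              topologyKey: topology.kubernetes.io/zone   # at most one pod per AZ
      containers:
        - name: echo
          image: k8s.gcr.io/echoserver:1.4   # placeholder image
          volumeMounts:
            - name: demo
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: demo                           # yields PVCs named demo-echo-0, demo-echo-1, demo-echo-2
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: managed-premium
        resources:
          requests:
            storage: 1Gi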
Once the statefulset is deployed and spun up, you can see each pod was assigned to one of the nodepool2 nodes:
❯ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
echo-0 1/1 Running 0 3m42s 10.48.36.102 aks-nodepool2-25997496-vmss000001 <none> <none>
echo-1 1/1 Running 0 3m19s 10.48.36.135 aks-nodepool2-25997496-vmss000002 <none> <none>
echo-2 1/1 Running 0 2m55s 10.48.36.72 aks-nodepool2-25997496-vmss000000 <none> <none>
Each pod created a PVC based on the template:
❯ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
demo-echo-0 Bound pvc-bf6104e0-c05e-43d4-9ec5-fae425998f9d 1Gi RWO managed-premium 25m
demo-echo-1 Bound pvc-9d9fbd5f-617a-4582-abc3-ca34b1b178e4 1Gi RWO managed-premium 25m
demo-echo-2 Bound pvc-d914a745-688f-493b-9b82-21598d4335ca 1Gi RWO managed-premium 24m
Let's take a look at one of the PVs that was created:
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/bound-by-controller: "yes"
    pv.kubernetes.io/provisioned-by: kubernetes.io/azure-disk
    volumehelper.VolumeDynamicallyCreatedByKey: azure-disk-dynamic-provisioner
  creationTimestamp: "2021-04-05T14:08:12Z"
  finalizers:
    - kubernetes.io/pv-protection
  labels:
    failure-domain.beta.kubernetes.io/region: westus2
    failure-domain.beta.kubernetes.io/zone: westus2-3
  name: pvc-9d9fbd5f-617a-4582-abc3-ca34b1b178e4
  resourceVersion: "19275047"
  uid: 945ad69a-92cc-4d8d-96f4-bdf0b80f9965
spec:
  accessModes:
    - ReadWriteOnce
  azureDisk:
    cachingMode: ReadOnly
    diskName: kubernetes-dynamic-pvc-9d9fbd5f-617a-4582-abc3-ca34b1b178e4
    diskURI: /subscriptions/02a062c5-366a-4984-9788-d9241055dda2/resourceGroups/rg-sandbox-aks-mc-sandbox0-westus2/providers/Microsoft.Compute/disks/kubernetes-dynamic-pvc-9d9fbd5f-617a-4582-abc3-ca34b1b178e4
    fsType: ""
    kind: Managed
    readOnly: false
  capacity:
    storage: 1Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: demo-echo-1
    namespace: zonetest
    resourceVersion: "19275017"
    uid: 9d9fbd5f-617a-4582-abc3-ca34b1b178e4
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: failure-domain.beta.kubernetes.io/region
              operator: In
              values:
                - westus2
            - key: failure-domain.beta.kubernetes.io/zone
              operator: In
              values:
                - westus2-3
  persistentVolumeReclaimPolicy: Delete
  storageClassName: managed-premium
  volumeMode: Filesystem
status:
  phase: Bound
As you can see, that PV has a required nodeAffinity for nodes in failure-domain.beta.kubernetes.io/zone with value westus2-3. This ensures that the pod that owns that PV will only ever get placed on a node in westus2-3, and the underlying disk will be attached to whichever node in that zone the pod is started on.
At this point, I deleted all the pods to get them on the other nodes:
❯ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
echo-0 1/1 Running 0 4m4s 10.48.36.168 aks-nodepool2-25997496-vmss000004 <none> <none>
echo-1 1/1 Running 0 3m30s 10.48.36.202 aks-nodepool2-25997496-vmss000005 <none> <none>
echo-2 1/1 Running 0 2m56s 10.48.36.42 aks-nodepool2-25997496-vmss000003 <none> <none>
There's no way to see it via Kubernetes, but you can see via the Azure portal that managed disk kubernetes-dynamic-pvc-bf6104e0-c05e-43d4-9ec5-fae425998f9d, which backs PV pvc-bf6104e0-c05e-43d4-9ec5-fae425998f9d, which backs PVC zonetest/demo-echo-0, is listed as Managed by: aks-nodepool2-25997496-vmss_4, so it has been detached from the old node and attached to the node where the pod is now running.
Portal screenshot showing disk attached to node 4
If I were to remove nodes such that I didn't have nodes in AZ 3, I wouldn't be able to start pod echo-1, since it's bound to a disk in AZ 3, which can't be attached to a node not in AZ 3.

Create new pod when old pod dies or crosses a threshold

I am a newbie to Kubernetes and I am experimenting with these pods.
I have 3 pods running on 3 different nodes. One of the pod's apps is running at high usage (90%+) and I want to create a health check for that.
Is there any way to create a health check in Kubernetes?
If I set an 80% CPU limit, will Kubernetes create a new pod or not?
You need a Horizontal Pod Autoscaler to scale pods. There is a simple guide that will walk you through creating one. Here's a resource example from the mentioned guide:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
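For what it's worth, the same autoscaler can also be created imperatively; this one-liner from that walkthrough is equivalent to the manifest above:
kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10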
As mentioned in the other answer, you are supposed to create a HorizontalPodAutoscaler for that Deployment object. The Kubernetes metrics server continuously watches the CPU utilization of each pod, and once the usage crosses the threshold (averageUtilization: 50 in the example above), a new pod gets spawned.
This is different from a health check: the health of a pod decides whether traffic is sent to it or not, via liveness and readiness probes.
Make sure you specify the resource requests and limits for the pod in the Deployment manifest, so that the HPA has a reference CPU value against which to calculate the utilization percentage (a sketch follows below).
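A minimal sketch of what that looks like, reusing the php-apache example from the guide above (the image and the 200m/500m values follow the upstream walkthrough and are only illustrative):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  replicas: 1
  selector:
    matchLabels:
      app: php-apache
  template:
    metadata:
      labels:
        app: php-apache
    spec:
      containers:
        - name: php-apache
          image: registry.k8s.io/hpa-example
          resources:
            requests:
              cpu: 200m        # the HPA computes utilization as a percentage of this request
            limits:
              cpu: 500m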

spark-submit with prometheus operator

I am trying to use Spark on Kubernetes. The idea is to use spark-submit against a k8s cluster which is running the Prometheus Operator. I know that the Prometheus Operator can respond to a ServiceMonitor YAML, but I am confused about how to provide some of the things required in the YAML using spark-submit.
Here is the YAML:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: sparkloads-metrics
  namespace: runspark
spec:
  selector:
    matchLabels:
      app: runspark
  namespaceSelector:
    matchNames:
      - runspark
  endpoints:
    - port: 8192   # ---> How to provide the name to the port using `spark-submit`?
      interval: 30s
      scheme: http
You cannot provide additional ports and their names to the Service created by spark-submit yet (as of Spark v2.4.4). This may change in later versions.
What you can do is create an additional Kubernetes Service (a Spark monitoring Service, e.g. of type ClusterIP) per Spark job after the job submission with spark-submit, for instance by running spark-submit ... && kubectl apply ... (a sketch of such a Service follows below). Or use any of the available Kubernetes clients with the language of your choice.
Note that you can use Kubernetes OwnerReference to configure automatic Service deletion/GC on Spark Driver Pod deletion.
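A minimal sketch of such a monitoring Service, matching the labels and port name used in the Helm values below (the spark-role: driver selector and the 8090 metrics port are assumptions, not something spark-submit sets up for you):
apiVersion: v1
kind: Service
metadata:
  name: spark-metrics              # the Spark Monitoring Service referenced below
  namespace: runspark
  labels:
    k8s-app: spark-metrics         # matched by the ServiceMonitor label selector
spec:
  type: ClusterIP
  selector:
    spark-role: driver             # assumption: target the Spark driver pod
  ports:
    - name: metrics                # port name referenced by the ServiceMonitor
      port: 8090
      targetPort: 8090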
Then you can supply the ServiceMonitors via the Prometheus Operator Helm values:
prometheus:
  additionalServiceMonitors:
    - name: spark-metrics            # <- Spark Monitoring Service name
      selector:
        matchLabels:
          k8s-app: spark-metrics     # <- Spark Monitoring Service label
      namespaceSelector:
        any: true
      endpoints:
        - interval: 10s
          port: metrics              # <- Spark Monitoring Service port name
Be aware that Spark doesn't provide a way to customize Spark pods yet, so the pod ports that should expose metrics are not exposed at the pod level and won't be accessible via the Service. To overcome this you can add an additional EXPOSE ... 8088 statement in the Dockerfile and rebuild the Spark image.
This guide should help you to setup Spark monitoring with PULL strategy using for example Jmx Exporter.
There is an alternative (recommended only for short-running Spark jobs, but you can try it in your environment if you do not run huge workloads):
Deploy Prometheus Pushgateway and integrate it with your Prometheus Operator
Configure Spark Prometheus Sink
By doing that, your Spark pods will PUSH metrics to the gateway and Prometheus will in turn PULL them from the gateway.
You can refer to the Spark Monitoring Helm chart example, which combines the Prometheus Operator and the Prometheus Pushgateway.
Hope it helps.

Azure AKS 'Kube-Proxy' Kubernetes Node Log file location?

My question is 'probably' specific to Azure.
How can I review the Kube-Proxy logs?
After SSH'ing into an Azure AKS Node (done) I can use the following to view the Kubelet logs:
journalctl -u kubelet -o cat
Azure docs on the Azure Kubelet logs can be found here:
https://learn.microsoft.com/en-us/azure/aks/kubelet-logs
I have reviewed the following Kubernetes resource regarding logs but Kube-Proxy logs on Azure do not appear in any of the suggested locations on the AKS node:
https://kubernetes.io/docs/tasks/debug-application-cluster/debug-cluster/#looking-at-logs
This is part of a troubleshooting effort related to a Kubernetes NGINX Ingress temporarily returning a '504 Gateway Time-out' when a service has not been accessed / has gone idle for some period of time (perhaps 5 to 10 minutes), but then becomes accessible on the next attempt(s).
On AKS, kube-proxy runs as a DaemonSet in the kube-system namespace
You can list the kube-proxy pods + node information with:
kubectl get pods -l component=kube-proxy -n kube-system -o wide
And then you can review the logs by running:
kubectl logs kube-proxy-<suffix> -n kube-system
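Since the random pod-name suffix changes whenever the DaemonSet recreates a pod, you can also pull the logs from all kube-proxy pods at once via the same label selector used above:
kubectl logs -l component=kube-proxy -n kube-system --tail=100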
On the same note as Acanthamoeba's answer, the logs for the kube-proxy pod can also be accessed via the Kubernetes dashboard UI, which can be launched with:
az aks browse --resource-group <ClusterResourceGroup> --name <ClusterName>
The above should pop open a new browser window pointed at the following URL: http://127.0.0.1:8001/#!/overview?namespace=default
Switch to Kube-System Namespace
Once the browser window is open, change to the kube-system namespace by selecting that option from the drop-down on the left side:
Kube-System namespace is all the way at the bottom of the drop down... and probably requires scrolling.
Navigate to Pods
From there click "pods" (also on the left hand side menu, below the namespaces drop down) and then click the Kube-Proxy pod:
View Kube-Proxy Logs
Click to view the logs of your Azure AKS kube-proxy pod; the logs button is in the top right-hand menu, to the left of 'Delete' and 'Edit', just below 'Create':
Other Azure AKS Troubleshooting Resources
Since you are trying to view the kube-proxy logs, you are probably troubleshooting some networking issues or something along those lines. Here are some other resources that I used during my troubleshooting tour of my Azure AKS cluster:
View Kubelet Logs on Azure AKS: https://learn.microsoft.com/en-us/azure/aks/kubelet-logs
nGinx Ingress Troubleshooting: https://github.com/kubernetes/ingress-nginx/blob/master/docs/troubleshooting.md
SSH into an Azure AKS Cluster VM: https://learn.microsoft.com/en-us/azure/aks/aks-ssh
