Messages not picked up from queue after KEDA scaling in AKS - Azure

I have deployed an AKS cluster with KEDA installed. The KEDA scaler schedules a pod for me whenever a new message appears in an Azure Storage Queue. Each message is processed by one pod, and each pod (except for system pods) is scheduled on a new node because the workload is very resource-intensive.
The application in the pod is essentially just an Azure Function written in Node. I found it easier to use it this way because I don't have to use any SDKs and can just use bindings.
KEDA successfully launches one scaled job per message, but when the pods start running I see logs from the Azure Functions runtime saying that it started, not that it is being triggered by the message in the queue. If I check the queue I also don't see the messages being picked up; they're still there.
I have also checked the storage connection string environment variable, and it is correct.
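For reference, I verified it from inside a running pod with something like the following (the pod name is a placeholder):
kubectl exec -n places places-job-xxxxx -- printenv AzureWebJobsStorage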
I found two places where I can make configuration changes that might help: the host.json of the Azure Function and the deployment.yml I applied for the ScaledJob.
In host.json I've set batchSize to 1 and newBatchThreshold to 0.
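For reference, the relevant section of my host.json looks roughly like this (a sketch, assuming the standard Functions v2+ extensions layout):
{
  "version": "2.0",
  "extensions": {
    "queues": {
      "batchSize": 1,
      "newBatchThreshold": 0
    }
  }
}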
Below is the YAML I apply; it applies without errors.
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: places-job
  namespace: places
spec:
  jobTargetRef:
    parallelism: 10
    completions: 1
    backoffLimit: 3
    template:
      metadata:
        namespace: places
        labels:
          app: places
      spec:
        affinity:
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - places
              topologyKey: kubernetes.io/hostname
        containers:
        - name: places
          image: {azure-container-registry}:latest
          env:
          - name: AzureFunctionsJobHost__functions__0
            value: places
          envFrom:
          - secretRef:
              name: places
          readinessProbe:
            failureThreshold: 3
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 240
            httpGet:
              path: /
              port: 80
              scheme: HTTP
          startupProbe:
            failureThreshold: 3
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 240
            httpGet:
              path: /
              port: 80
              scheme: HTTP
          resources:
            limits:
              cpu: 4
              memory: 10G
        restartPolicy: Never
  pollingInterval: 10 # Optional. Default: 30 seconds
  successfulJobsHistoryLimit: 1 # Optional. Default: 100. How many completed jobs should be kept.
  failedJobsHistoryLimit: 1 # Optional. Default: 100. How many failed jobs should be kept.
  triggers:
  - type: azure-queue
    metadata:
      queueName: places-requests
      queueLength: "1"
      activationQueueLength: "0"
      connectionFromEnv: AzureWebJobsStorage
      accountName: {storage-account-name}
Does anyone know any other setting I might have to tweak so that the newly deployed pods/nodes do pick up the messages from the queue?

Related

Azure AKS and Application Gateway returning 404

I have an AKS cluster deployed with an Application Gateway. Three Docker images run on the cluster behind a simple ingress, all in the default namespace: a Vue frontend, a Java Spring backend, and a full-stack Tomcat image. All three services ran fine without any issues; however, today all three started returning 404 errors (without any changes to the cluster, to my knowledge).
When testing the health of these services, I was still able to reach them all via kubectl. In addition, the health probe on Azure Application Gateway returned a healthy 200 status code for all three services.
I have tried removing the workloads and adding them back again, and did the same for the services and the ingress.
I have also tried setting everything up from scratch on a new AKS cluster, where all three images perform without any issues.
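For the kubectl check mentioned above, I tested each service directly with a port-forward along these lines (the local port is arbitrary):
kubectl port-forward svc/testvuefe-service 8080:80
curl -I http://localhost:8080/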
The YAML for one of the applications on the AKS cluster looks like:
- apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: testvuefe
  spec:
    replicas: 2
    selector:
      matchLabels:
        app: testvuefe
    template:
      metadata:
        labels:
          app: testvuefe
      spec:
        nodeSelector:
          kubernetes.io/os: linux
        containers:
        - name: testvuefe
          image: testdocker.azurecr.io/testvuefe:latest
          ports:
          - containerPort: 80
          resources:
            requests:
              cpu: '0'
              memory: '0'
            limits:
              cpu: '128'
              memory: 512G
          volumeMounts:
          - mountPath: "/fileShare"
            name: volume
        volumes:
        - name: volume
          persistentVolumeClaim:
            claimName: aks-azurefile
- apiVersion: v1
  kind: Service
  metadata:
    name: testvuefe-service
  spec:
    type: ClusterIP
    ports:
    - targetPort: 80
      name: port80
      port: 80
      protocol: TCP
    selector:
      app: testvuefe
- apiVersion: networking.k8s.io/v1
  kind: Ingress
  metadata:
    name: testvuefe-ingress
    annotations:
      kubernetes.io/ingress.class: azure/application-gateway
      appgw.ingress.kubernetes.io/health-probe-path: /user
      appgw.ingress.kubernetes.io/cookie-based-affinity: "true"
  spec:
    rules:
    - host: test.wbsoft.co.kr
      http:
        paths:
        - path: /
          pathType: Prefix
          backend:
            service:
              name: testvuefe-service
              port:
                number: 80
As this issue has occurred once on a cluster, I need to understand what is causing it so that it does not happen in production. It would also be great if a solution could be found.

How to change, edit, save /etc/hosts file from Azure Bash for Kubernetes?

I uploaded a project to Kubernetes, and for its gateway to route to the services it requires the following host entries:
127.0.0.1 app.my.project
127.0.0.1 event-manager.my.project
127.0.0.1 logger.my.project
and so on.
I can't run any sudo commands, so sudo nano /etc/hosts doesn't work. I tried vi /etc/hosts and it gives a permission-denied error. How can I edit the /etc/hosts file, or do some configuration on Azure to make it work like that?
Edit:
To give more information, I have uploaded a project to Kubernetes that has reverse-proxy settings.
So the project's web app is not reachable via IP. Instead, when running the application locally, I have to edit the hosts file of the computer I'm using with
127.0.0.1 app.my.project
127.0.0.1 event-manager.my.project
127.0.0.1 logger.my.project
and so on. So whenever I type web-app.my.project, the gateway redirects to the web-app part; if I write app.my.project, it redirects to the app part, etc.
When I uploaded it to Azure Kubernetes Service, a default-http-backend was added in the ingress-nginx namespace, which was created by itself. To expose these services, I enabled the HTTP routing option in Azure, which gave me the load balancer on the left side of the image. So if I'm reading the situation correctly (I'm most probably wrong, though), it is something like the image below:
So I added hostAliases to the kube-system, ingress-nginx, and default namespaces to mimic editing the hosts file the way I did when running the project locally. But it still gives me the default backend - 404 ingress error.
Edit 2:
I have an nginx-ingress-controller, which handles the redirection as far as I understand. So I add hostAliases to it as follows:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-ingress-controller
  namespace: ingress-nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ingress-nginx
  template:
    metadata:
      labels:
        app: ingress-nginx
      annotations:
        prometheus.io/port: '10254'
        prometheus.io/scrape: 'true'
    spec:
      hostAliases:
      - ip: "127.0.0.1"
        hostnames:
        - "app.ota.local"
        - "gateway.ota.local"
        - "treehub.ota.local"
        - "tuf-reposerver.ota.local"
        - "web-events.ota.local"
      hostNetwork: true
      serviceAccountName: nginx-ingress-serviceaccount
      containers:
      - name: nginx-ingress-controller
        image: {{ .ingress_controller_docker_image }}
        args:
        - /nginx-ingress-controller
        - --default-backend-service=$(POD_NAMESPACE)/default-http-backend
        - --configmap=$(POD_NAMESPACE)/nginx-configuration
        - --tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
        - --udp-services-configmap=$(POD_NAMESPACE)/udp-services
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        ports:
        - name: tcp
          containerPort: 8000
        - name: http
          containerPort: 80
        - name: https
          containerPort: 443
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
When I edit the YAML file as above, Azure gives the following error:
Failed to update the deployment
Failed to update the deployment 'nginx-ingress-controller'. Error: BadRequest (400) : Deployment in version "v1" cannot be handled as a Deployment: v1.Deployment.Spec: v1.DeploymentSpec.Template: v1.PodTemplateSpec.Spec: v1.PodSpec.HostAliases: []v1.HostAlias: readObjectStart: expect { or n, but found ", error found in #10 byte of ...|liases":["ip "127.0|..., bigger context ...|theus.io/scrape":"true"}},"spec":{"hostAliases":["ip "127.0.0.1""],"hostnames":["app.ota.local","g|...
If I edit the YAML file locally and apply it with a local kubectl connected to Azure, it gives the following error:
serviceaccount/weave-net configured
clusterrole.rbac.authorization.k8s.io/weave-net configured
clusterrolebinding.rbac.authorization.k8s.io/weave-net configured
role.rbac.authorization.k8s.io/weave-net configured
rolebinding.rbac.authorization.k8s.io/weave-net configured
daemonset.apps/weave-net configured
Using cluster from kubectl context: k8s_14
namespace/ingress-nginx unchanged
deployment.apps/default-http-backend unchanged
service/default-http-backend unchanged
configmap/nginx-configuration unchanged
configmap/tcp-services unchanged
configmap/udp-services unchanged
serviceaccount/nginx-ingress-serviceaccount unchanged
clusterrole.rbac.authorization.k8s.io/nginx-ingress-clusterrole unchanged
role.rbac.authorization.k8s.io/nginx-ingress-role unchanged
rolebinding.rbac.authorization.k8s.io/nginx-ingress-role-nisa-binding unchanged
clusterrolebinding.rbac.authorization.k8s.io/nginx-ingress-clusterrole-nisa-binding unchanged
error: error validating "/home/.../ota-community-edition/scripts/../generated/templates/ingress": error validating data: ValidationError(Deployment.spec.template.spec): unknown field "hostnames" in io.k8s.api.core.v1.PodSpec; if you choose to ignore these errors, turn validation off with --validate=false
make: *** [Makefile:34: start_start-all] Error 1
Adding entries to a Pod's /etc/hosts file provides a Pod-level override of hostname resolution when DNS and other options are not applicable. You can add these custom entries with the hostAliases field in the PodSpec. Modifying the file any other way is not recommended, because the file is managed by the kubelet and can be overwritten during Pod creation/restart.
Note that your second error, unknown field "hostnames" in io.k8s.api.core.v1.PodSpec, indicates the hostnames list ended up at the PodSpec level rather than nested under a hostAliases entry; each entry needs an ip with an indented hostnames list.
I suggest that you use hostAliases like this instead:
apiVersion: v1
kind: Pod
metadata:
  name: hostaliases-pod
spec:
  restartPolicy: Never
  hostAliases:
  - ip: "127.0.0.1"
    hostnames:
    - "app.my.project"
    - "event-manager.my.project"
    - "logger.my.project"
  containers:
  - name: cat-hosts
    image: busybox
    command:
    - cat
    args:
    - "/etc/hosts"

How do I keep a Kubernetes pod running with no http endpoint? (prevent back-off)

I have a use case where (at least for now) I need a k8s pod to stay up without an HTTP or TCP endpoint. I tried the following deployment...
apiVersion: apps/v1
kind: Deployment
metadata:
  name: node-deployment
  labels:
    app: node-app
spec:
  selector:
    matchLabels:
      app: node-app
  template:
    metadata:
      labels:
        app: node-app
    spec:
      containers:
      - name: node-server
        image: node:17-alpine
        exec:
          command: ["node","-v"]
        livenessProbe:
          exec:
            command: ["node","-v"]
          initialDelaySeconds: 5
          periodSeconds: 5
        readinessProbe:
          exec:
            command: ["node","-v"]
          initialDelaySeconds: 5
          periodSeconds: 5
But after a while it stops the pod with the following error...
Warning BackOff 5s (x10 over 73s) kubelet, minikube Back-off restarting failed container
And the pod status shows...
node-deployment-58445f5649-z6lkz 0/1 CrashLoopBackOff 5 (103s ago) 4m47s
I know it is running because I see the version in kubectl logs <node-name>. How do I get the image to stay up with no long-running process? Is this even possible?
The container's process simply exited; the same happens if you run node -v in your terminal.
Not sure if the use case you provide is your real use case (I can't see any reason to run only the Node version as an application), so:
if you really want to keep printing the version, you can change the command to
["watch","-n1","node","-v"]
so it will print the Node version every second.
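For illustration, a sketch of how the container spec might look with that change; note that containers take a command: field directly, not an exec: wrapper like the stray block in the deployment above:
containers:
- name: node-server
  image: node:17-alpine
  # busybox watch re-runs `node -v` every second, so the process never exits
  command: ["watch", "-n1", "node", "-v"]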

Implementing Readiness and Liveness probe in Azure AKS

I am trying to follow this documentation to enable readiness and liveness probes on my pods for health checks in my cluster; however, it gives me a connection-refused error against the container IP and port. The portion where I have added the readiness and liveness probes is below.
I am using Helm for deployment, and the port I am trying to monitor is 80. The service file for the ingress is also given below.
https://learn.microsoft.com/en-us/azure/application-gateway/ingress-controller-add-health-probes
Service.yaml
apiVersion: v1
kind: Service
metadata:
  name: expose-portal
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: "{{ .Values.isInternal }}"
spec:
  type: LoadBalancer
  ports:
  - port: 80
  selector:
    app: portal
Deployment.yaml
spec:
  containers:
  - name: App-server-portal
    image: myacr-app-image-:{{ .Values.image_tag }}
    imagePullPolicy: Always
    ports:
    - name: http
      containerPort: 80
      protocol: TCP
    readinessProbe:
      httpGet:
        path: /
        port: 80
      periodSeconds: 3
      timeoutSeconds: 1
    livenessProbe:
      httpGet:
        path: /
        port: 80
      periodSeconds: 3
      timeoutSeconds: 1
    volumeMounts:
    - mountPath: /etc/nginx
      readOnly: true
      name: mynewsite
  imagePullSecrets:
  - name: my-secret
  volumes:
  - name: mynewsite.conf
    configMap:
      name: mynewsite.conf
      items:
      - key: mynewsite.conf
        path: mynewsite.conf
Am I doing something wrong here? As per the Azure documentation as of today, probing on a port other than the one exposed on the pod is currently not supported. My understanding is that port 80 on my pod is already exposed.
Taken from the docs:
initialDelaySeconds: Number of seconds after the container has started before liveness or readiness probes are initiated. Defaults to 0 seconds. Minimum value is 0.
periodSeconds: How often (in seconds) to perform the probe. Default to 10 seconds. Minimum value is 1.
timeoutSeconds: Number of seconds after which the probe times out. Defaults to 1 second. Minimum value is 1.
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#configure-probes
The solution was to increase the timeout.
PS: I think you might need to introduce initialDelaySeconds instead of increasing the timeout to 3 minutes.
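For illustration, a sketch of the readiness probe with such a delay (the values are guesses and need tuning per app):
readinessProbe:
  httpGet:
    path: /
    port: 80
  initialDelaySeconds: 30   # give the app time to start before the first probe
  periodSeconds: 3
  timeoutSeconds: 3         # increased from 1, per the fix above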

Pull image Azure Container Registry - Kubernetes

Does anyone have any advice on how to pull from Azure Container Registry whilst running within Azure Container Service (Kubernetes)?
I've tried a sample deployment like the following but the image pull is failing:
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  name: jenkins-master
spec:
  replicas: 1
  template:
    metadata:
      name: jenkins-master
      labels:
        name: jenkins-master
    spec:
      containers:
      - name: jenkins-master
        image: myregistry.azurecr.io/infrastructure/jenkins-master:1.0.0
        imagePullPolicy: Always
        readinessProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 20
          timeoutSeconds: 5
        ports:
        - name: jenkins-web
          containerPort: 8080
        - name: jenkins-agent
          containerPort: 50000
I got this working after reading this info.
http://kubernetes.io/docs/user-guide/images/#specifying-imagepullsecrets-on-a-pod
So first, create the registry access key:
kubectl create secret docker-registry myregistrykey --docker-server=https://myregistry.azurecr.io --docker-username=ACR_USERNAME --docker-password=ACR_PASSWORD --docker-email=ANY_EMAIL_ADDRESS
Replace the server address with the address of your ACR, and the USERNAME, PASSWORD, and EMAIL with the values from the admin user of your ACR. Note: the email address can be any value.
Then in the deployment you simply tell Kubernetes to use that key for pulling the image, like so:
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  name: jenkins-master
spec:
  replicas: 1
  template:
    metadata:
      name: jenkins-master
      labels:
        name: jenkins-master
    spec:
      containers:
      - name: jenkins-master
        image: myregistry.azurecr.io/infrastructure/jenkins-master:1.0.0
        imagePullPolicy: Always
        readinessProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 20
          timeoutSeconds: 5
        ports:
        - name: jenkins-web
          containerPort: 8080
        - name: jenkins-agent
          containerPort: 50000
      imagePullSecrets:
      - name: myregistrykey
This is something we've actually made easier. When you provision a Kubernetes cluster through the Azure CLI, a service principal is created with contributor privileges. This enables pulls from any Azure Container Registry in the subscription.
There was a PR (https://github.com/kubernetes/kubernetes/pull/40142) that was merged into new deployments of Kubernetes. It won't work on existing Kubernetes instances.
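For what it's worth, on current AKS the equivalent grant can also be made explicitly with the Azure CLI; a sketch with placeholder resource names, which assigns the AcrPull role on the registry to the cluster's identity:
az aks update -n myAKSCluster -g myResourceGroup --attach-acr myregistry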
