I have an Azure Function in Node.js running in an AKS cluster. Jobs are spawned based on queue storage messages. After the function/job is done processing, the job should turn to Complete and be removed. For some reason it doesn't.
Below is the KEDA job I've deployed to AKS:
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: keda-queue-storage-job
  namespace: orchestrator
spec:
  jobTargetRef:
    completionMode: Indexed
    parallelism: 10
    completions: 1
    backoffLimit: 3
    template:
      metadata:
        namespace: orchestrator
        labels:
          app: orchestrator
      spec:
        affinity:
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - orchestrator
                topologyKey: kubernetes.io/hostname
        containers:
          - name: orchestrator
            image: {acr}:latest
            env:
              - name: AzureFunctionsJobHost__functions__0
                value: orchestrator
            envFrom:
              - secretRef:
                  name: orchestrator
            readinessProbe:
              failureThreshold: 3
              periodSeconds: 10
              successThreshold: 1
              timeoutSeconds: 240
              httpGet:
                path: /
                port: 80
                scheme: HTTP
            startupProbe:
              failureThreshold: 3
              periodSeconds: 10
              successThreshold: 1
              timeoutSeconds: 240
              httpGet:
                path: /
                port: 80
                scheme: HTTP
            resources:
              limits:
                cpu: 4
                memory: 10G
        restartPolicy: Never
  pollingInterval: 10 # Optional. Default: 30 seconds
  successfulJobsHistoryLimit: 1 # Optional. Default: 100. How many completed jobs should be kept.
  failedJobsHistoryLimit: 1 # Optional. Default: 100. How many failed jobs should be kept.
  triggers:
    - type: azure-queue
      metadata:
        queueName: searches
        queueLength: "1"
        activationQueueLength: "0"
        connectionFromEnv: AzureWebJobsStorage
        accountName: {account-name}
It seems like the Azure Function isn't signalling to Kubernetes that it's done. When I call process.exit(0), the entire process seems to stop too early, since the output bindings aren't executed.
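For reference, the function handler is roughly shaped like this (a simplified sketch; the output binding name is illustrative), with the premature process.exit(0) at the point where I tried it:
// Queue-triggered Azure Function handler (Node.js v3 programming model, simplified)
module.exports = async function (context, queueItem) {
  context.log('Processing message', queueItem);

  // ... orchestration work ...

  // Output binding (name is illustrative) - it is only flushed after the handler returns
  context.bindings.resultQueue = { status: 'done' };

  // Calling process.exit(0) here kills the process before the bindings are written
  // process.exit(0);
};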
Does anyone know how to make the job complete?
I have an Artillery config set up to test two different versions of my API gateway (an Apollo federated GraphQL gateway). The V1 URL of the gateway is accessible by Artillery, but the V2 URL returns a 404, even though it is accessible from the browser and Postman. I am quite confused as to why this is the case. What could be the issue here? Here is my config file:
config:
  environments:
    v2:
      target: "https://v2.com/graphql" # not accessible by artillery but works with postman and browser
      phases:
        - duration: 60
          arrivalRate: 5
          name: Warm up
        - duration: 120
          arrivalRate: 5
          rampTo: 50
          name: Ramp up load
        - duration: 300
          arrivalRate: 50
          name: Sustained load
    v1:
      target: "https://v1.com/graphql" # accessible by artillery and other agents
      phases:
        - duration: 60
          arrivalRate: 5
          name: Warm up
        - duration: 120
          arrivalRate: 5
          rampTo: 50
          name: Ramp up load
  payload:
    - path: "test-user.csv"
      skipHeader: true
      fields:
        - "email"
        - "_id"
    - path: "test-campaign.csv"
      skipHeader: true
      fields:
        - "campaignId"
scenarios:
  - name: "Test"
    flow:
      - post:
          url: "/"
          headers:
            Authorization: Bearer <token>
          json:
            query: |
              query Query {
                randomQuery {
                  ... on Error {
                    message
                    statusCode
                  }
                  ... on Response {
                    res {
                      __typename
                    }
                  }
                }
              }
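For what it's worth, the V2 endpoint responds fine when I hit it outside Artillery, for example with a plain Node script like the sketch below (the token and query are placeholders; Node 18+ for the built-in fetch):
// sanity-check.mjs - hit the V2 gateway directly, outside Artillery (Node 18+ built-in fetch)
const res = await fetch('https://v2.com/graphql', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: 'Bearer <token>',
  },
  body: JSON.stringify({ query: '{ __typename }' }),
});

console.log(res.status); // the direct request goes through here, while Artillery gets a 404 for the same URL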
Any help would be appreciated. Thank you.
We are receiving random 502 errors from our ALB. Our backend does not get hit at all, as there are no logs of the request.
The ALB logs show nothing usable for debugging, just the 502:
h2 2020-03-26T14:30:52.495547Z app/path/tomytarget 10.111.11.111:50103 100.00.00.00:8080:8080 0.001 18.799 -1 502 - 1213 208 "POST https://mydomain:443/user/auth HTTP/2.0" "Name/3 CFNetwork/1121.2.2 Darwin/19.2.0" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:ap-southeast-1:0000000000:targetgroup/path/tomytarget "Root=someId" "mydomain.com" "arn:aws:acm:ap-southeast-1:0000000000:certificate/certificatedId" 0 2020-03-26T14:30:33.694000Z "forward" "-" "-" "100.00.00.00:8080" "-"
We started noticing it after we enabled the health checks with a proper route in Node.js and Express:
app.get("/health-check", (req, res) => {
  res.status(200).end();
});
This is our ALB config; we are using VPC peering to another VPC:
ElasticLoadBalancer:
  Type: AWS::ElasticLoadBalancingV2::LoadBalancer
  Properties:
    IpAddressType: ipv4
    Scheme: internet-facing
    SecurityGroups:
      - !Ref ELBSecurityGroup
    Subnets:
      - !Ref PublicSubnetA
      - !Ref PublicSubnetB
    Type: application
TargetGroup:
  Type: AWS::ElasticLoadBalancingV2::TargetGroup
  Properties:
    Port: 8080
    Protocol: HTTP
    Targets:
      - Id: <some ip in the other VPC>
        AvailabilityZone: all
        Port: 8080
    TargetType: ip
    VpcId: !Ref VPC
    HealthCheckEnabled: true
    HealthCheckIntervalSeconds: 30
    HealthCheckPath: /health-check
    HealthCheckPort: 8080
    HealthCheckProtocol: HTTP
    HealthCheckTimeoutSeconds: 5
    HealthyThresholdCount: 3
    UnhealthyThresholdCount: 5
Listener:
  Type: AWS::ElasticLoadBalancingV2::Listener
  Properties:
    DefaultActions:
      - Type: forward
        TargetGroupArn: !Ref TargetGroup
    LoadBalancerArn: !Ref ElasticLoadBalancer
    Certificates:
      - CertificateArn: !Ref CertificateArn
    Port: 443
    Protocol: HTTPS
As I said, we are using VPC peering and HTTPS, with certificates coming from AWS Certificate Manager.
EDIT:
If you are using Node.js, one solution could be the following:
// AWS ALB keepAlive is set to 60 seconds, we need to increase the default KeepAlive timeout
// of our node server
server.keepAliveTimeout = 65000; // Ensure all inactive connections are terminated by the ALB, by setting this a few seconds higher than the ALB idle timeout
server.headersTimeout = 66000; // Ensure the headersTimeout is set higher than the keepAliveTimeout due to this nodejs regression bug: https://github.com/nodejs/node/issues/27363
I solved it like this in Node.js:
// AWS ALB keepAlive is set to 60 seconds, we need to increase the default KeepAlive timeout
// of our node server
server.keepAliveTimeout = 65000; // Ensure all inactive connections are terminated by the ALB, by setting this a few seconds higher than the ALB idle timeout
server.headersTimeout = 66000; // Ensure the headersTimeout is set higher than the keepAliveTimeout due to this nodejs regression bug: https://github.com/nodejs/node/issues/27363
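For context, server here is the underlying Node HTTP server that Express returns from listen(); a minimal sketch of how the timeouts are applied (the port and app wiring are illustrative):
const express = require('express');

const app = express();

app.get('/health-check', (req, res) => {
  res.status(200).end();
});

// app.listen() returns the underlying http.Server instance
const server = app.listen(8080);

// The ALB idle timeout defaults to 60 seconds, so keep Node's keep-alive slightly longer
server.keepAliveTimeout = 65000;
// headersTimeout must be higher than keepAliveTimeout (https://github.com/nodejs/node/issues/27363)
server.headersTimeout = 66000;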
I am creating an application with Node.js. It uses Hapi as the web framework and Knex as the SQL builder. The primary code is the following:
server.route({
  method: 'POST',
  path: '/location',
  config: {
    tags: ['api'],
    validate: {
      payload: {
        longitude: Joi.string().regex(/^\d+\.\d+$/).required(),
        latitude: Joi.string().regex(/^\d+\.\d+$/).required(),
        date: Joi.string().regex(/^\d{4}-\d{1,2}-\d{1,2}\s\d{1,2}:\d{1,2}.+$/).required(),
        phone: Joi.string().regex(/^1\d{10}$/).required()
      }
    }
  },
  handler: createLocation
})
async function createLocation(request, reply) {
  try {
    const data = await knex('locations').insert(request.payload)
    reply(data)
  } catch (error) {
    reply(error)
  }
}
It simply inserts some data into PostgreSQL. I am using wrk to benchmark its concurrent throughput on Google Compute Engine (the cheapest machine type). Result:
$ wrk -c 100 -t 12 http://localhost/api/location -s wrk.lua
Running 10s test @ http://panpan.tuols.com/api/location
  12 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   542.22ms  102.93ms   1.02s    88.32%
    Req/Sec    21.95     18.74     70.00     78.71%
  1730 requests in 10.02s, 0.94MB read
Requests/sec:    172.65
Transfer/sec:     96.44KB
Then I used pgbench to test raw PostgreSQL insert performance:
$ pgbench location -U postgres -h localhost -r -t 1000 -f index.sql -c 10
transaction type: index.sql
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
latency average = 1.663 ms
tps = 6014.610692 (including connections establishing)
tps = 6029.973067 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
1.595 INSERT INTO "public"."locations"("phone", "longitude", "latitude", "date", "createdAt", "updatedAt") VALUES('18382383428', '123,33', '123.33', 'now()', 'now()', 'now()') RETURNING "id", "phone", "longitude", "latitude", "date", "createdAt", "updatedAt";
Node.js handles 172.65 req/s, while native PostgreSQL handles about 6,000 tps. Both are doing essentially the same work, so even ignoring the HTTP overhead, the difference should not be this big. Why is the performance so hugely different? Is it a Node.js problem or a node-pg package problem?
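To try to isolate the driver from Hapi, Joi and Knex, I can also benchmark bare node-pg inserts with a small script like the sketch below (the connection settings are placeholders; the table and column names are taken from the pgbench script above):
// raw node-pg insert benchmark, bypassing Hapi/Joi/knex
const { Pool } = require('pg');

const pool = new Pool({ connectionString: 'postgres://postgres@localhost/location', max: 10 });

async function main() {
  const total = 10000;
  const start = Date.now();

  // queue all inserts on the pool (10 connections) and wait for them to finish
  await Promise.all(
    Array.from({ length: total }, () =>
      pool.query(
        'INSERT INTO locations(phone, longitude, latitude, date, "createdAt", "updatedAt") VALUES($1, $2, $3, now(), now(), now())',
        ['18382383428', '123.33', '123.33']
      )
    )
  );

  console.log(`${total} inserts in ${(Date.now() - start) / 1000}s`);
  await pool.end();
}

main().catch(console.error);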
I've been trying to start kubernetes-dashboard (and eventually other services) on a NodePort outside the default port range, with little success.
Here is my setup:
Cloud provider: Azure (not Azure Container Service)
OS: CentOS 7
Here is what I have tried:
Update the host
$ yum update
Install kubeadm
$ cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=http://yum.kubernetes.io/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg
       https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOF
$ setenforce 0
$ yum install -y docker kubelet kubeadm kubectl kubernetes-cni
$ systemctl enable docker && systemctl start docker
$ systemctl enable kubelet && systemctl start kubelet
Start the cluster with kubeadm
$ kubeadm init
Allow running containers on the master node, because we have a single-node cluster
$ kubectl taint nodes --all dedicated-
Install a pod network
$ kubectl apply -f https://git.io/weave-kube
Our kubernetes-dashboard Deployment (~/kubernetes-dashboard.yaml):
# Copyright 2015 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Configuration to deploy release version of the Dashboard UI.
#
# Example usage: kubectl create -f <this_file>
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  labels:
    app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kubernetes-dashboard
  template:
    metadata:
      labels:
        app: kubernetes-dashboard
      # Comment the following annotation if Dashboard must not be deployed on master
      annotations:
        scheduler.alpha.kubernetes.io/tolerations: |
          [
            {
              "key": "dedicated",
              "operator": "Equal",
              "value": "master",
              "effect": "NoSchedule"
            }
          ]
    spec:
      containers:
        - name: kubernetes-dashboard
          image: gcr.io/google_containers/kubernetes-dashboard-amd64:v1.5.1
          imagePullPolicy: Always
          ports:
            - containerPort: 9090
              protocol: TCP
          args:
            # Uncomment the following line to manually specify Kubernetes API server Host
            # If not specified, Dashboard will attempt to auto discover the API server and connect
            # to it. Uncomment only if the default does not work.
            # - --apiserver-host=http://my-address:port
          livenessProbe:
            httpGet:
              path: /
              port: 9090
            initialDelaySeconds: 30
            timeoutSeconds: 30
---
kind: Service
apiVersion: v1
metadata:
  labels:
    app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kube-system
spec:
  type: NodePort
  ports:
    - port: 8880
      targetPort: 9090
      nodePort: 8880
  selector:
    app: kubernetes-dashboard
Create our Deployment
$ kubectl create -f ~/kubernetes-dashboard.yaml
deployment "kubernetes-dashboard" created
The Service "kubernetes-dashboard" is invalid: spec.ports[0].nodePort: Invalid value: 8880: provided port is not in the valid range. The range of valid ports is 30000-32767
I found out that to change the range of valid ports I could set the service-node-port-range option on kube-apiserver to allow a different port range, so I tried this:
$ kubectl get po --namespace=kube-system
NAME READY STATUS RESTARTS AGE
dummy-2088944543-lr2zb 1/1 Running 0 31m
etcd-test2-highr 1/1 Running 0 31m
kube-apiserver-test2-highr 1/1 Running 0 31m
kube-controller-manager-test2-highr 1/1 Running 2 31m
kube-discovery-1769846148-wmbhb 1/1 Running 0 31m
kube-dns-2924299975-8vwjm 4/4 Running 0 31m
kube-proxy-0ls9c 1/1 Running 0 31m
kube-scheduler-test2-highr 1/1 Running 2 31m
kubernetes-dashboard-3203831700-qrvdn 1/1 Running 0 22s
weave-net-m9rxh 2/2 Running 0 31m
Add "--service-node-port-range=8880-8880" to kube-apiserver-test2-highr
$ kubectl edit po kube-apiserver-test2-highr --namespace=kube-system
{
  "kind": "Pod",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-apiserver",
    "namespace": "kube-system",
    "creationTimestamp": null,
    "labels": {
      "component": "kube-apiserver",
      "tier": "control-plane"
    }
  },
  "spec": {
    "volumes": [
      {
        "name": "k8s",
        "hostPath": {
          "path": "/etc/kubernetes"
        }
      },
      {
        "name": "certs",
        "hostPath": {
          "path": "/etc/ssl/certs"
        }
      },
      {
        "name": "pki",
        "hostPath": {
          "path": "/etc/pki"
        }
      }
    ],
    "containers": [
      {
        "name": "kube-apiserver",
        "image": "gcr.io/google_containers/kube-apiserver-amd64:v1.5.3",
        "command": [
          "kube-apiserver",
          "--insecure-bind-address=127.0.0.1",
          "--admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,ResourceQuota",
          "--service-cluster-ip-range=10.96.0.0/12",
          "--service-node-port-range=8880-8880",
          "--service-account-key-file=/etc/kubernetes/pki/apiserver-key.pem",
          "--client-ca-file=/etc/kubernetes/pki/ca.pem",
          "--tls-cert-file=/etc/kubernetes/pki/apiserver.pem",
          "--tls-private-key-file=/etc/kubernetes/pki/apiserver-key.pem",
          "--token-auth-file=/etc/kubernetes/pki/tokens.csv",
          "--secure-port=6443",
          "--allow-privileged",
          "--advertise-address=100.112.226.5",
          "--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname",
          "--anonymous-auth=false",
          "--etcd-servers=http://127.0.0.1:2379"
        ],
        "resources": {
          "requests": {
            "cpu": "250m"
          }
        },
        "volumeMounts": [
          {
            "name": "k8s",
            "readOnly": true,
            "mountPath": "/etc/kubernetes/"
          },
          {
            "name": "certs",
            "mountPath": "/etc/ssl/certs"
          },
          {
            "name": "pki",
            "mountPath": "/etc/pki"
          }
        ],
        "livenessProbe": {
          "httpGet": {
            "path": "/healthz",
            "port": 8080,
            "host": "127.0.0.1"
          },
          "initialDelaySeconds": 15,
          "timeoutSeconds": 15,
          "failureThreshold": 8
        }
      }
    ],
    "hostNetwork": true
  },
  "status": {}
}
$ :wq
The following is the truncated response
# pods "kube-apiserver-test2-highr" was not valid:
# * spec: Forbidden: pod updates may not change fields other than `containers[*].image` or `spec.activeDeadlineSeconds`
So I tried a different approach: I edited the manifest file for kube-apiserver with the same change described above
and ran the following:
$ kubectl apply -f /etc/kubernetes/manifests/kube-apiserver.json --namespace=kube-system
And got this response:
The connection to the server localhost:8080 was refused - did you specify the right host or port?
So now I'm stuck: how can I change the range of valid ports?
You are specifying --service-node-port-range=8880-8880 incorrectly. You set it to one port only; set it to a range.
Second problem: you are setting the service to use 9090, and it's not in the range.
ports:
  - port: 80
    targetPort: 9090
    nodePort: 9090
The API server should have a deployment too. Try editing the port range in the deployment itself and delete the API server pod so it gets recreated with the new config.
The Service node port range is set to infrequently-used ports for a reason. Why do you want to publish this on every node? Do you really want that?
An alternative is to expose it on a semi-random nodeport, then use a proxy pod on a known node or set of nodes to access it via hostport.
This issue:
The connection to the server localhost:8080 was refused - did you specify the right host or port?
was caused by my port range excluding 8080, which kube-apiserver was serving on, so I could not send any updates through kubectl.
I fixed it by changing the port range to 8080-8881 and restarting the kubelet service like so:
$ service kubelet restart
Everything works as expected now.