How to tell Knative Pod Autoscaler not to kill an in-progress long-running pod - autoscaling

My goal:
Implement a cron job that runs once per week. I intend to implement this topology on Knative to save computing resources:
PingSource -> knative service
The PingSource will emit a dummy event to a Knative service once per week, just to bring up one Knative service pod. That pod will then fetch a huge amount of data and process it.
My concern:
If I set enable-scale-to-zero to true, the Knative pod autoscaler will probably shut down the Knative service pod even when it has not finished its work.
So far, I explored:
The scale-to-zero-grace-period setting, which tells the autoscaler how long it should wait after the last request ends before shutting down the pod. But I don't think this approach is subtle enough. I would prefer something similar to a readinessProbe or livenessProbe: the autoscaler should probe the pod to find out whether it is still processing something before sending the kill signal.
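For reference, this is roughly where those knobs live, in the config-autoscaler ConfigMap of the knative-serving namespace (the values below are illustrative, not recommendations):
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving
data:
  enable-scale-to-zero: "true"        # allow revisions to scale down to zero pods
  scale-to-zero-grace-period: "30s"   # how long to keep the last pod after traffic stops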
In addition, according to Knative's docs, there are two types of event sink: callable and addressable. Both return a response or acknowledgement. Would the Knative autoscaler consider the pod to be handling the request until the pod returns the response/acknowledgement? If so, as long as the pod has not responded, it won't be removed by the autoscaler.

The Knative autoscaler relies on the pod strictly working in a request/response fashion. As long as the "huge amount of data" is processed as part of an HTTP request (or WebSocket session, or gRPC session, etc.), the pod will not even be considered for deletion.
What will not work is accepting the request, returning immediately, and then munging the data in the background. The autoscaler will think that there is no activity at all and thus shut the pod down. There is a sandbox project that tries to implement such asynchronous semantics, though.
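A minimal sketch of a Knative Service set up along those lines, where fetching and processing happen synchronously inside the request handler (the service name, image and timeout are illustrative; a timeout above the default cap also requires raising max-revision-timeout-seconds in the config-defaults ConfigMap):
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: weekly-processor                       # hypothetical service name
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/min-scale: "0" # scale to zero between the weekly runs
    spec:
      timeoutSeconds: 3600                     # keep the long-running request from being cut off
      containers:
        - image: example.com/weekly-processor:latest
As long as the handler only returns once the processing is finished, the request stays active and the revision is not scaled to zero underneath it.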

Related

Vertx clustered eventbus not removing old node on kubernetes rolling deployment

I have two Vert.x microservices running in a cluster and communicating with each other using a headless service (link) in an on-premise cloud. Whenever I do a rolling deployment I face connectivity issues between the services. When I analysed the logs I could see that the old node/pod is removed from the cluster list, but the event bus does not remove it and keeps using it on a round-robin basis.
Below is the member group information before deployment
Member [192.168.4.54]:5701 - ace32cef-8cb2-4a3b-b15a-2728db068b80 //pod 1
Member [192.168.4.54]:5705 - f0c39a6d-4834-4b1d-a179-1f0d74cabbce this
Member [192.168.101.79]:5701 - ac0dcea9-898a-4818-b7e2-e9f8aaefb447 //pod 2
When deployment is started, pod 2 gets removed from the member list,
[192.168.4.54]:5701 [dev] [4.0.2] Could not connect to: /192.168.101.79:5701. Reason: SocketException[Connection refused to address /192.168.101.79:5701]
Removing connection to endpoint [192.168.101.79]:5701 Cause => java.net.SocketException {Connection refused to address /192.168.101.79:5701}, Error-Count: 5
Removing Member [192.168.101.79]:5701 - ac0dcea9-898a-4818-b7e2-e9f8aaefb447
And new member is added,
Member [192.168.4.54]:5701 - ace32cef-8cb2-4a3b-b15a-2728db068b80
Member [192.168.4.54]:5705 - f0c39a6d-4834-4b1d-a179-1f0d74cabbce this
Member [192.168.94.85]:5701 - 1347e755-1b55-45a3-bb9c-70e07a29d55b //new pod
All migration tasks have been completed. (repartitionTime=Mon May 10 08:54:19 MST 2021, plannedMigrations=358, completedMigrations=358, remainingMigrations=0, totalCompletedMigrations=3348, elapsedMigrationTime=1948ms, totalElapsedMigrationTime=27796ms)
But when a request is made to the deployed service, even though the old pod has been removed from the member group, the event bus still uses the old pod/service reference (ac0dcea9-898a-4818-b7e2-e9f8aaefb447):
[vert.x-eventloop-thread-1] DEBUG io.vertx.core.eventbus.impl.clustered.ConnectionHolder - tx.id=f9f5cfc9-8ad8-4eb1-b12c-322feb0d1acd Not connected to server ac0dcea9-898a-4818-b7e2-e9f8aaefb447 - starting queuing
I checked the official documentation for rolling deployments and my deployment seems to follow the two key points mentioned there: only one pod is removed and then the new one is added.
never start more than one new pod at once
forbid more than one unavailable pod during the process
I am using Vert.x 4.0.3 and hazelcast-kubernetes 1.2.2. My verticle class extends AbstractVerticle and is deployed using:
Vertx.clusteredVertx(options, vertx -> {
    vertx.result().deployVerticle(verticleName, deploymentOptions);
});
Sorry for the long post, any help is highly appreciated.
One possible reason could be a race condition between Kubernetes removing the pod and updating the endpoints in kube-proxy, as detailed in this extensive article. This race condition leads to Kubernetes continuing to send traffic to the pod being removed after it has terminated.
One TL;DR solution is to add a delay when terminating a pod by either:
Have the service delay when it receives a SIGTERM (e.g. for 15 seconds) so that it keeps responding to requests as normal during that delay period.
Use the Kubernetes preStop hook to execute a sleep 15 command in the container (see the sketch below). This allows the service to continue responding to requests during that 15-second period while Kubernetes is updating its endpoints. Kubernetes sends SIGTERM only once the preStop hook completes.
Both solutions will give Kubernetes some time to propagate changes to its internal components so that traffic stops being routed to the pod being removed.
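A minimal sketch of the preStop variant, assuming the service runs as an ordinary Deployment (the names, image and 15-second value are illustrative):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vertx-service                 # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: vertx-service
  template:
    metadata:
      labels:
        app: vertx-service
    spec:
      containers:
        - name: app
          image: example.com/vertx-service:latest
          lifecycle:
            preStop:
              exec:
                # keep serving while endpoints/iptables catch up;
                # SIGTERM is only sent after this hook finishes
                command: ["sh", "-c", "sleep 15"]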
A caveat to this answer is that I'm not familiar with Hazelcast clustering and how your specific discovery mode is set up.

How does Application Gateway prevent requests being sent to recently terminated pods?

I'm currently researching and experimenting with Kubernetes in Azure. I'm playing with AKS and the Application Gateway ingress. As I understand it, when a pod is added to a service, the endpoints are updated and the ingress controller continuously polls this information. As new endpoints are added AG is updated. As they're removed AG is also updated.
As pods are added there will be a small delay whilst that pod is added to the AG before it receives requests. However, when pods are removed, does that delay in update result in requests being forwarded to a pod that no longer exists?
If not, how does AG/K8S guarantee this? What behaviour could the end client potentially experience in this scenario?
Azure Application Gateway Ingress is an ingress controller for your Kubernetes deployment which allows you to use a native Azure Application Gateway to expose your application to the internet. Its purpose is to route traffic directly to pods. At the same time, everything about pod availability, scheduling and, generally speaking, management stays with Kubernetes itself.
When a pod receives a command to terminate, it doesn't happen instantly. Right after that, kube-proxies update iptables to stop directing traffic to the pod. There may also be ingress controllers or load balancers forwarding connections directly to the pod (which is the case with an application gateway). It's impossible to solve this issue completely, but adding a 5-10 second delay can significantly improve the user experience.
If you need to terminate or scale down your application, you should consider the following steps:
Wait for a few seconds and then stop accepting connections
Close all keep-alive connections not in the middle of request
Wait for all active requests to finish
Shut down the application completely
Here are the exact Kubernetes mechanisms which will help you resolve your questions:
preStop hook - this hook is called immediately before a container is terminated. It is very helpful for graceful shutdown of an application. For example, a simple sh "sleep 5" command in a preStop hook can prevent users from seeing "Connection refused" errors. After the pod receives an API request to be terminated, it takes some time to update iptables and to let an application gateway know that this pod is out of service. Since the preStop hook is executed prior to the SIGTERM signal, it helps to resolve this issue.
(example can be found in attach lifecycle event)
readiness probe - this type of probe always runs on the container and defines whether the pod is ready to accept and serve requests. When a container's readiness probe returns success, it means the container can handle requests and it is added to the endpoints. If a readiness probe fails, the pod is not capable of handling requests and it is removed from the endpoints object. This works very well for newly created pods when an application takes some time to load, as well as for already running pods when an application takes some time for processing.
Before the pod is removed from the endpoints, the readiness probe has to fail several times. It's possible to lower this to a single failure using the failureThreshold field; however, it still needs to detect at least one failed check.
(additional information on how to set it up can be found in configure liveness readiness startup probes)
startup probe - for some applications which require additional time for their first initialisation, it can be tricky to set readiness probe parameters correctly without compromising a fast response from the application.
Using the failureThreshold * periodSeconds fields will provide this flexibility.
terminationGracePeriodSeconds - may also be considered if an application requires more than the default 30-second delay to shut down gracefully (this is important, for example, for stateful applications).
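A minimal sketch of how these pieces fit together in a pod spec (the name, image, paths, ports and timings are all illustrative):
apiVersion: v1
kind: Pod
metadata:
  name: web-app                        # hypothetical pod name
spec:
  terminationGracePeriodSeconds: 60    # more than the default 30s, for a slow graceful shutdown
  containers:
    - name: app
      image: example.com/web-app:latest
      ports:
        - containerPort: 8080
      startupProbe:                    # gives a slow-starting app up to 30 * 10 = 300s
        httpGet:
          path: /healthz
          port: 8080
        failureThreshold: 30
        periodSeconds: 10
      readinessProbe:                  # drops the pod from the endpoints while it is not ready
        httpGet:
          path: /ready
          port: 8080
        periodSeconds: 5
        failureThreshold: 1            # leave the endpoints after a single failed check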

Avoid consuming same events parallel from EventHub

I'm using:
Azure platform to run some microservice architecture software solution.
microservices that use Azure Event Hub for communicating in special cases.
Kubernetes with 2 clusters (primary, secondary)
per application namespace, there is 1 event-listener pod running per cluster for consuming from eventhub
The last point is relevant to my current problem:
The load balancers will share traffic between the primary and secondary clusters. This means that two event-listener pods are running per application at the same time. So they just react to events, but sometimes they consume the same event from the Event Hub, and this causes duplicated notification mails.
So finally my question is: how can I avoid reading the same event twice at the same time? I thought the Event Hub index was always increasing, but starting at the same moment is not "secured".
You will need to use a separate consumer group per pod to avoid an EPOCH error.
That said, both pods will read the same events, so you have two options.
Have an active-passive setup. One consumer group, one pod that reads the events and delegates the work out on each event. If that pod fails, a health/heartbeat mechanism brings the second pod online.
Have an active-active setup. Two consumer groups, two active pods. You will need to implement idempotent processing.
Idempotent processing, where processing the same message multiple times produces the same result, is good practice regardless of approach. It would allow you to replay batches of events in which one errored without adverse effects on the integrity of your data.
I would opt for the first option, a single event hub reader will process thousands of events per second and pass off the work to your micro services.
If you have lower volumes of messages and need guaranteed message processing, then using Service Bus may be a better choice where messages can be locked, completed and abandoned.

One Instance-One Request at a time App Engine Flexible

I am using
App Engine Flexible, custom runtime.
nodejs, as base Image.
express
Cloud Tasks for queuing the requests
puppeteer job
My Requirements
20GB RAM
long-running process
Because of my unique requirement, I want one request to be handled by only one instance. Only when the instance gets free, or the request times out, should it get a new request.
I have managed to reject other requests while the instance is processing a request, but I am not able to figure out the appropriate automatic-scaling settings.
Please suggest the best way to achieve this.
Thanks in advance!
In your app.yaml, try restricting max_instances and max_concurrent_requests.
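As a rough app.yaml sketch (all values are illustrative; note that the flexible environment spells these knobs min_num_instances/max_num_instances, while max_instances and max_concurrent_requests are the standard-environment equivalents):
runtime: custom
env: flex

resources:
  cpu: 4
  memory_gb: 20              # matches the 20 GB RAM requirement

automatic_scaling:
  min_num_instances: 1
  max_num_instances: 5       # cap how many instances can exist at once
  cpu_utilization:
    target_utilization: 0.6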
I also recommend looking into rate-limiting your Cloud Tasks queue in order to reduce unnecessary attempts to send requests. You may also want to increase your MIN_INTERVAL for retry attempts to spread out requests.
Your task queue will continue to process and send tasks at the rate you have set, so if your instance rejects a request it will go into a retry pattern. It seems like you're focused on the scaling of App Engine, but your issue is with Cloud Tasks. You may want to schedule your tasks so they fire at the interval you want.
You could set readiness checks on your app.
When an instance is handling a request, set the readiness check to return a non-ready status. 429 (too many requests) seems like a good option.
This should avoid traffic to that specific instance.
Once the request is finished, return a 200 from the readiness endpoint to signal that the instance is ready to accept a new request.
However, I'm not sure how this will work with the auto-scaling options. Since the app only scales up once the average CPU is over the defined threshold, if all instances are occupied but do not reach that threshold, the load balancer won't know where to route requests (no instances are ready), and the app won't scale up.
You could play around a little bit with this idea and manual scaling, or by programmatically changing min_instances (in automatic scaling) through the GAE Admin API.
Be sure to always return a 200 for the liveness check, or the instance will be killed as it will be considered unhealthy.
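A sketch of what the corresponding health-check configuration could look like in an App Engine flexible app.yaml (the paths and timings are illustrative; the busy/ready logic itself lives in your Express handlers):
readiness_check:
  path: "/ready"             # return 429 here while the instance is busy with its one request
  check_interval_sec: 5
  timeout_sec: 4
  failure_threshold: 1       # drop out of rotation after a single failed check
  success_threshold: 1
  app_start_timeout_sec: 300

liveness_check:
  path: "/alive"             # always return 200 here, or the instance will be restarted
  check_interval_sec: 30
  timeout_sec: 4
  failure_threshold: 4
  success_threshold: 2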

What is the signal sent to the process running in the container when k8s liveness probe fails? KILL or TERM

I have a use case to gracefully terminate the container, where I have a script that kills the process gracefully from within the container by using the command "kill PID" (which sends the TERM signal).
But I have liveness probe configured as well.
Currently the liveness probe is configured to probe at a 60-second interval. So if the liveness probe takes place shortly after the graceful termination signal is sent, the overall health of the container might become CRITICAL while the termination is still in progress.
In this case the liveness probe will fail and the container will be terminated immediately.
So I wanted to know whether the kubelet kills the container with TERM or KILL.
Appreciate your support
Thanks in advance
In Kubernetes, the liveness probe checks the health state of a container.
To answer your question on whether it uses SIGKILL or SIGTERM: both are used, but in order. Here is what happens under the hood.
Liveness probe check fails
Kubernetes stops routing of traffic to the container
Kubernetes restarts the container
Kubernetes starts routing traffic to the container again
For the container restart, SIGTERM is sent first, Kubernetes then waits for a parameterized grace period, and after that it sends SIGKILL.
A hack around your issue is to use the attribute:
timeoutSeconds
This specifies how long a probe request can take to respond before it's considered a failure. You can add and adjust this parameter if the time taken for your application to come online is predictable.
Also, you can use a readinessProbe before the livenessProbe, with an adequate delay so the container can come back into service after restarting the process. Check https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/ for more details on which parameters to use.
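For illustration, a sketch of how those timing parameters relate to the 60-second interval and the grace period discussed above (the name, image, health command and numbers are placeholders, not recommendations):
apiVersion: v1
kind: Pod
metadata:
  name: graceful-app                   # hypothetical name
spec:
  terminationGracePeriodSeconds: 60    # SIGTERM first, SIGKILL only after this grace period
  containers:
    - name: app
      image: example.com/app:latest
      livenessProbe:
        exec:
          command: ["sh", "-c", "test -f /tmp/healthy"]   # placeholder health check
        periodSeconds: 60              # the 60-second interval from the question
        timeoutSeconds: 5              # how long the check may take before it counts as a failure
        failureThreshold: 3            # tolerate a few failed checks while graceful shutdown is in progress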
