Why is the OpenWhisk Kubernetes pod status Pending on Linux?

I have implemented OpenWhisk using Kubernetes on the Windows operating system. Now I need to implement the same thing on Linux. I followed this document for the Linux setup: https://medium.com/#ansjin/openwhisk-deployment-on-a-kubernetes-cluster-7fd3fc2f3726. But when I get the details of all the pods, the OpenWhisk pods' status is stuck at Pending.
How can I bring these pods up?

It looks to me like something is going wrong with your Kubernetes/flannel installation. Kubernetes won't create new pods until it can assign the pod IPs properly, so the CNI that you use (in your case, flannel) needs to be working properly before OpenWhisk (or other applications) can be deployed.
A good place to start debugging would be to investigate the flannel-ds pod that is in CrashLoopBackOff, and to figure out why the coredns pods haven't finished creating.
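As a rough starting point, kubectl commands along these lines should surface the reason; the pod names are placeholders, and kube-system is assumed as the namespace for flannel and coredns:

# Overall state of the system pods (flannel, coredns, kube-proxy, ...)
kubectl get pods -n kube-system -o wide

# Why is the flannel DaemonSet pod crash-looping? (substitute the real pod name)
kubectl describe pod <flannel-pod> -n kube-system
kubectl logs <flannel-pod> -n kube-system --previous

# Why haven't the coredns pods finished creating?
kubectl describe pod <coredns-pod> -n kube-system

# What does the scheduler report for the Pending OpenWhisk pods?
kubectl describe pod <openwhisk-pod> -n <openwhisk-namespace>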

Related

How to kick off Linux script in AKS from Web App (AZURE) on-demand

Given that I have a 24x7 AKS cluster on Azure, for which, as far as I know, Kubernetes cannot stop/pause a pod and then resume it in any standard way,
and given, in my case, a small container in a pod, which can be sidelined via --replicas=0:
how can I best kick off, on demand, a Linux script packaged in that pod/container (which may not be running),
from an Azure Web App?
I thought using ssh should work, after first scaling the pod up to 1 replica. Is this correct?
I am also curious whether there are simple HTTP calls in Azure to do this. I see CLI and PowerShell commands to start/stop an AKS cluster, but that is different, of course.
You can interact with AKS remotely in several ways. The key here is to use the control-plane API to manage your Kubernetes resources programmatically (https://kubernetes.io/docs/concepts/overview/kubernetes-api/).
To do that, you should use the client libraries that provide that kind of access. Examples for many programming languages can be found here:
https://github.com/kubernetes-client
ssh is not really recommended, since it is essentially god access to the cluster and is not meant for this purpose.
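As an illustration of the flow a client library would drive for you (scale up, wait for readiness, run the script, scale back down), here is the kubectl equivalent; the deployment name myjob and the script path are placeholders:

# Scale the sidelined deployment back up
kubectl scale deployment/myjob --replicas=1

# Wait until the pod is actually Ready before running anything in it
kubectl wait --for=condition=available deployment/myjob --timeout=120s

# Run the Linux script packaged in the container
kubectl exec deploy/myjob -- sh /scripts/run.sh

# Sideline the pod again once the work is done
kubectl scale deployment/myjob --replicas=0

If the Web App cannot reach the cluster API directly, az aks command invoke can run this kind of kubectl command through Azure instead of requiring network line-of-sight to the cluster.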

Azure k8s API call timeouts

I have a problem with Azure and its Kubernetes environment. From time to time, calls to the k8s API fail, and when that happens the pod experiencing the issue stops responding to calls from other pods (it looks more like a network issue than the application hanging). The health check still passes, so k8s does not restart the pod. The only way to restore it is to delete the pod.
Here is part of the stack trace.
Failed to get list of services from DiscoveryClient org.springframework.cloud.kubernetes.discovery.KubernetesDiscoveryClient#7057dbda due to: {}
io.fabric8.kubernetes.client.KubernetesClientException: Operation: [list] for kind: [Service] with name: [null] in namespace: [dev] failed.
at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.list(BaseOperation.java:602)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.list(BaseOperation.java:63)
at org.springframework.cloud.kubernetes.discovery.KubernetesDiscoveryClient.getServices(KubernetesDiscoveryClient.java:253)
at org.springframework.cloud.kubernetes.discovery.KubernetesDiscoveryClient.getServices(KubernetesDiscoveryClient.java:249)
.....
Caused by: java.net.SocketTimeoutException: timeout
at okhttp3.internal.http2.Http2Stream$StreamTimeout.newTimeoutException(Http2Stream.java:593)
at okhttp3.internal.http2.Http2Stream$StreamTimeout.exitAndThrowIfTimedOut(Http2Stream.java:601)
What I know so far:
the issue happens on Azure
it doesn't depend on the load (the actual calls are background jobs that also don't depend on the load, so this is expected)
it doesn't depend on the number of services deployed or the frequency of calls to the k8s API (so the actual traffic to the k8s API from the cluster doesn't matter)
it is very selective: if one service/replica is affected, others can keep working without issues
if the affected pod is restarted quickly (we have a job to automate restarts), the problem tends to jump to another service
Azure support says it is a problem with our apps, which are built on Spring Boot and its auto-discovery mechanism, but I am starting to doubt that.
Basically, it looks like the pod is partially lost by the k8s engine.
So the question is: what is wrong, and what else can I check?
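One way to narrow down whether this is a network problem between the affected pod and the API server, rather than the Spring Boot discovery client itself, is to probe the in-cluster API endpoint from that pod while it is misbehaving (the pod names are placeholders, the dev namespace is taken from the stack trace, and the image needs curl available):

# Can the affected pod still reach the in-cluster API endpoint?
kubectl exec -n dev <affected-pod> -- sh -c 'curl -sk --max-time 5 https://kubernetes.default.svc/version || echo API unreachable'

# Compare against a healthy pod, ideally on another node
kubectl exec -n dev <healthy-pod> -- sh -c 'curl -sk --max-time 5 https://kubernetes.default.svc/version'

# Check that the kubernetes service still has healthy endpoints
kubectl get endpoints kubernetes -n default

If the affected pod times out while the healthy one responds, that points at the node/network layer rather than the application.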

Can I have a custom script executed in an AKS node group?

I would like to tweak some settings in an AKS node group with something like user data in AWS. Is that possible in AKS?
How about using
https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/virtual_machine_scale_set_extension
The underlying Virtual Machine Scale Set (VMSS) is an implementation detail, and one that you do not get to adjust beyond the SKU and disk choice. Just as you cannot pick the image that goes on the VMSS, you also cannot use VM extensions on that scale set without going out of support. Any direct manipulation of the VMSSs behind your node pools (from an Azure resource-provider perspective) puts you out of support. The only supported way to perform host (node)-level actions is to deploy your custom script work to the cluster as a DaemonSet. This is fully supported and gives you the ability to run (almost) anything you need at the host level, for example installing/executing custom security agents, FIM solutions, or anti-virus.
From the support FAQ:
Any modification done directly to the agent nodes using any of the IaaS APIs renders the cluster unsupportable. Any modification done to the agent nodes must be done using kubernetes-native mechanisms such as Daemon Sets.
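As a minimal sketch of that DaemonSet approach (the name, image, and the sysctl being tweaked are illustrative placeholders; privileged access is needed for host-level changes):

cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-setup                # placeholder name
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: node-setup
  template:
    metadata:
      labels:
        app: node-setup
    spec:
      containers:
      - name: node-setup
        image: busybox:1.36       # placeholder; use whatever image ships your script
        securityContext:
          privileged: true        # required for host-level tweaks
        command:
        - sh
        - -c
        # example tweak: raise a host-wide (non-namespaced) sysctl, then idle
        - sysctl -w vm.max_map_count=262144 && while true; do sleep 3600; done
EOF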

Will the application be live during pod deployment in AKS?

Will the application be live (in transaction) during a pod deployment in AKS?
While we are performing the pod deployment, will the application transactions go through, or will they error out?
The Deployment controller does a rolling update: new pods are created from the new template and, once they are Ready, they are added to the Service load balancer; then the old ones are removed and terminated.
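Whether those transactions go through with zero errors depends on the Deployment's rollout settings and on a readiness probe that only reports Ready when the app can actually serve traffic. A minimal sketch, with placeholder names, port, and health endpoint:

cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp                         # placeholder
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0               # never drop below the desired replica count
      maxSurge: 1                     # bring up one extra pod at a time
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myregistry/myapp:1.2.3 # placeholder image
        ports:
        - containerPort: 8080
        readinessProbe:               # pod only receives traffic once this passes
          httpGet:
            path: /healthz            # placeholder health endpoint
            port: 8080
EOF

kubectl rollout status deployment/myapp will then report whether the rollout completed cleanly.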

Who manages the nodes in an AKS cluster?

I started using the AKS service with a 3-node setup. Out of curiosity, I peeked at the provisioned VMs that are used as nodes. I noticed that I can get root on them and that some updates need to be installed. As I couldn't find anything in the docs, my question is: who is in charge of managing the AKS nodes (VMs)?
Do I have to do this myself, or what is the idea here?
Thank you in advance.
Azure automatically applies security patches to the nodes in your cluster on a nightly schedule. However, you are responsible for ensuring that nodes are rebooted as required.
You have several options for performing node reboots:
Manually, through the Azure portal or the Azure CLI.
By upgrading your AKS cluster. Cluster upgrades automatically cordon and drain nodes, then bring them back up with the latest Ubuntu image. You can update the OS image on your nodes without changing Kubernetes versions by specifying the current cluster version in az aks upgrade.
Using Kured, an open-source reboot daemon for Kubernetes. Kured runs as a DaemonSet and monitors each node for the presence of a file indicating that a reboot is required. It then manages OS reboots across the cluster, following the same cordon and drain process described earlier.
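For the manual route, a sketch of the cordon/drain flow around a reboot (the node name is a placeholder; the reboot itself is done through the Azure portal or CLI as described above):

NODE=aks-nodepool1-12345678-vmss000000   # placeholder node name

# Stop new pods being scheduled onto the node and evict the existing ones
kubectl cordon "$NODE"
kubectl drain "$NODE" --ignore-daemonsets --delete-emptydir-data

# Reboot the underlying VM via the Azure portal or CLI, then watch for it
# to come back and report Ready
kubectl get node "$NODE" -w

# Allow scheduling on the node again
kubectl uncordon "$NODE"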
