How many Spark executor pods do you run per Kubernetes node - apache-spark

Spark needs a lot of resources to do its job, and Kubernetes is a great environment for resource management. How many Spark pods do you run per node to get the best resource utilization?
I am trying to run a Spark cluster on a Kubernetes cluster.

It depends on many factors. You need to know how many resources you have and how much of them the pods are consuming. To find that out, you need to set up the Metrics Server.
Metrics Server is a cluster-wide aggregator of resource usage data.
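Once you know a node's allocatable CPU and memory and what is already running on it, the "how many executors per node" question comes down to simple arithmetic. The sketch below assumes Spark's default executor memory overhead of max(384 MiB, 10% of executor memory); the node sizes, reserved amounts, and executor sizes in the usage line are hypothetical, not recommendations.

```python
def executor_pod_memory_mib(executor_memory_mib: int, overhead_factor: float = 0.10,
                            min_overhead_mib: int = 384) -> int:
    """Memory request of one executor pod: heap plus off-heap overhead.

    Assumes Spark's default overhead rule: max(384 MiB, factor * executor memory).
    """
    overhead = max(min_overhead_mib, int(executor_memory_mib * overhead_factor))
    return executor_memory_mib + overhead


def executors_per_node(allocatable_cpu_millis: int, allocatable_mem_mib: int,
                       executor_cores: int, executor_memory_mib: int,
                       reserved_cpu_millis: int = 0, reserved_mem_mib: int = 0) -> int:
    """How many identical executor pods fit on one node.

    reserved_* covers pods already on the node (DaemonSets, system pods, ...).
    The result is limited by whichever resource (CPU or memory) runs out first.
    """
    free_cpu = allocatable_cpu_millis - reserved_cpu_millis
    free_mem = allocatable_mem_mib - reserved_mem_mib
    if free_cpu <= 0 or free_mem <= 0:
        return 0
    pod_cpu = executor_cores * 1000  # cores -> millicores
    pod_mem = executor_pod_memory_mib(executor_memory_mib)
    return min(free_cpu // pod_cpu, free_mem // pod_mem)


# Hypothetical node: 15800m CPU / 58000 MiB allocatable, 500m / 2000 MiB already used.
# Hypothetical executor: 4 cores, 12288 MiB heap.
print(executors_per_node(15800, 58000, 4, 12288, 500, 2000))  # 3 (CPU-bound)
```

With these numbers CPU runs out before memory, so roughly 10 GiB of memory per node would sit idle. Sizing executors so that CPU and memory are exhausted at about the same time tends to give better utilization than any fixed "pods per node" number.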
The next step is to set up a Horizontal Pod Autoscaler (HPA).
The Horizontal Pod Autoscaler automatically scales the number of pods in a replication controller, deployment or replica set based on observed CPU utilization or other custom metrics. HPA normally fetches metrics from a series of aggregated APIs:
metrics.k8s.io
custom.metrics.k8s.io
external.metrics.k8s.io
How to make it work?
kubectl supports HPA out of the box:
kubectl create - creates a new autoscaler
kubectl get hpa - lists your autoscalers
kubectl describe hpa - gets a detailed description of autoscalers
kubectl delete - deletes an autoscaler
Example:
kubectl autoscale rs foo --min=2 --max=5 --cpu-percent=80
This creates an autoscaler for ReplicaSet foo, with target CPU utilization set to 80% and the number of replicas kept between 2 and 5. You can, and should, adjust all values to your needs.
Here is detailed documentation on how to use the kubectl autoscale command.
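To see what an autoscaler created with --min=2 --max=5 --cpu-percent=80 will do, here is a simplified sketch of the replica calculation described in the Kubernetes HPA documentation: desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric), skipped when the ratio is within the default 10% tolerance, then clamped to [min, max]. It deliberately ignores details of the real controller such as unready pods, missing metrics, and scale-down stabilization.

```python
import math


def desired_replicas(current_replicas: int, current_cpu_percent: float,
                     target_cpu_percent: float, min_replicas: int, max_replicas: int,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA replica calculation for a CPU-utilization target."""
    ratio = current_cpu_percent / target_cpu_percent
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to the target: no change
    else:
        desired = math.ceil(current_replicas * ratio)
    # The HPA never goes outside the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


# Values from `kubectl autoscale rs foo --min=2 --max=5 --cpu-percent=80`:
print(desired_replicas(2, 160, 80, 2, 5))  # 4: load doubled, replicas doubled
print(desired_replicas(4, 200, 80, 2, 5))  # 5: would be 10, capped by --max
print(desired_replicas(4, 20, 80, 2, 5))   # 2: would be 1, floored by --min
```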
Please let me know if you find that useful.

Related

Problem with Kubernetes Cluster Autoscaler on Azure

I have a Kubernetes cluster running on an Azure Virtual Machine Scale Set (VMSS). I use the Kubernetes Cluster Autoscaler to scale the number of nodes. It works fine if I set the limits from 1 to 10, but a problem appears when I set the lower limit to 0, in one particular case:
The number of nodes has been scaled to 0, and after that the cluster autoscaler pod restarted. Then I want to run a pod on this VMSS (a pod with nodeSelector agentpool: memory), but it looks like the autoscaler can't read the appropriate labels from the VMSS when the number of instances is scaled to 0.
According to the documentation, I added the following tag to the VMSS: k8s.io_cluster-autoscaler_node-template_label_agentpool: memory.
These are the logs from the autoscaler pod:
GeneralPredicates predicate mismatch, reason: node(s) didn't match node selector
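For context, a VMSS scaled to 0 has no real node for the autoscaler to inspect, so it builds a template node from the VMSS tags that start with the k8s.io_cluster-autoscaler_node-template_label_ prefix and checks the pending pod's nodeSelector against it. The sketch below is a toy model of that matching, not the autoscaler's code; it only shows that if the tag is not picked up, the template node has no agentpool label and the check fails the way the log message reports.

```python
from typing import Dict

# Tag prefix as used in the question; everything after it is the label key.
TEMPLATE_LABEL_PREFIX = "k8s.io_cluster-autoscaler_node-template_label_"


def template_labels_from_tags(vmss_tags: Dict[str, str]) -> Dict[str, str]:
    """Turn VMSS tags like '<prefix>agentpool: memory' into node labels."""
    return {key[len(TEMPLATE_LABEL_PREFIX):]: value
            for key, value in vmss_tags.items()
            if key.startswith(TEMPLATE_LABEL_PREFIX)}


def matches_node_selector(node_labels: Dict[str, str],
                          node_selector: Dict[str, str]) -> bool:
    """A nodeSelector matches only if every key/value pair is present on the node."""
    return all(node_labels.get(k) == v for k, v in node_selector.items())


selector = {"agentpool": "memory"}
labels = template_labels_from_tags(
    {"k8s.io_cluster-autoscaler_node-template_label_agentpool": "memory"})
print(labels, matches_node_selector(labels, selector))  # {'agentpool': 'memory'} True
print(matches_node_selector({}, selector))              # False: tags were not read
```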

Manage Docker containers at low scale

I have deployed 5 apps using Azure Container Instances, and they are working fine. The issue is that all containers currently run all the time, which gets expensive.
What I want to do is start and stop instances when required, using a master container or VM that runs all the time.
For example:
This master service gets a request to spin up service number 3 for 2 hours and then shut it down; all other containers stay off until they receive a similar request.
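The master-service idea described above boils down to a lease table: starting a service records when it should stop, and a periodic tick stops anything whose time is up. This is a minimal sketch under stated assumptions: start and stop are hypothetical callbacks standing in for whatever API actually starts and stops the containers (no specific Azure call is implied), "service-3" is a hypothetical name, and time is passed in as seconds so the logic can be tested without waiting.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class OnDemandScheduler:
    """Keeps services running only for the time they were requested."""
    start: Callable[[str], None]  # hypothetical: starts the service's container
    stop: Callable[[str], None]   # hypothetical: stops the service's container
    running_until: Dict[str, float] = field(default_factory=dict)

    def request(self, service: str, hours: float, now: float) -> None:
        if service not in self.running_until:
            self.start(service)
        # A new request extends (never shortens) an existing lease.
        expiry = now + hours * 3600
        self.running_until[service] = max(expiry, self.running_until.get(service, 0.0))

    def tick(self, now: float) -> None:
        """Call periodically; stops every service whose lease has expired."""
        for service, expiry in list(self.running_until.items()):
            if now >= expiry:
                self.stop(service)
                del self.running_until[service]


events = []
scheduler = OnDemandScheduler(lambda s: events.append(("start", s)),
                              lambda s: events.append(("stop", s)))
scheduler.request("service-3", hours=2, now=0)
scheduler.tick(now=3600)  # 1 hour in: still running
scheduler.tick(now=7200)  # 2 hours in: stopped
print(events)  # [('start', 'service-3'), ('stop', 'service-3')]
```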
For my use case, each service will be used for less than 5 hours a day most of the time.
Now, I know Kubernetes is an engine made to manage containers, but all the examples I have found are for high-scale services, not for 5 services with only one container each. I am also not sure whether Kubernetes allows having all the containers off most of the time.
What I was thinking of is handling all of this through some API, but I can't find any service in Azure that allows something like this; I have only found options to create new containers, not to start them up and shut them down.
EDIT:
Also, these apps run processes that are too heavy to run on a serverless platform.
The solution is to define a Horizontal Pod Autoscaler for your deployment.
The Horizontal Pod Autoscaler automatically scales the number of pods in a replication controller, deployment or replica set based on observed CPU utilization (or, with custom metrics support, on some other application-provided metrics). Note that Horizontal Pod Autoscaling does not apply to objects that can’t be scaled, for example, DaemonSets.
The Horizontal Pod Autoscaler is implemented as a Kubernetes API resource and a controller. The resource determines the behavior of the controller. The controller periodically adjusts the number of replicas in a replication controller or deployment to match the observed average CPU utilization to the target specified by user.
The configuration file should look like this:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-images-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-deployment
  minReplicas: 2
  maxReplicas: 100
  targetCPUUtilizationPercentage: 75
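scaleTargetRef should refer to your deployment definition. Set minReplicas as low as your workload allows (by default the HPA does not accept minReplicas: 0; scaling to zero requires the alpha HPAScaleToZero feature gate), and set targetCPUUtilizationPercentage according to your preferences. This approach should help you save money, because pods are terminated when CPU utilization is low.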
Kubernetes official documentation: kubernetes-hpa.
GKE autoscaler documentation: gke-autoscaler.
Useful blog about saving cash using GCP: kubernetes-google-cloud.

Scale Azure nginx ingress controller

We have a K8s cluster on Azure (AKS). On this cluster, we added a load balancer during setup, which installed an nginx-ingress controller.
Looking at the deployments:
addon-http-application-routing-default-http-backend 1
addon-http-application-routing-external-dns 1
addon-http-application-routing-nginx-ingress-controller 1
I see there is 1 of each running. I can find very little information on whether these should be scaled (there is 1 pod each) and, if so, how.
I've tried running
kubectl scale deployment addon-http-application-routing-nginx-ingress-controller --replicas=3
Which temporarily scales it to 3 pods, but after a few moments, it is downscaled again.
So again, are these supposed to be scaled? Why? How?
EDIT
For those who missed it like I did: the AKS addon-http-application-routing add-on is not ready for production; it is there to get you set up quickly so you can start experimenting. That is why I wasn't able to scale it properly.
That's generally how you do it:
$ kubectl scale deployment addon-http-application-routing-nginx-ingress-controller --replicas=3
However, I suspect you have an HPA configured which will scale up/down depending on the load or some metrics and has the minReplicas spec set to 1. You can check with:
$ kubectl get hpa
$ kubectl describe hpa <hpa-name>
If that's the case, you can scale up by just patching the HPA:
$ kubectl patch hpa <hpa-name> -p '{"spec": {"minReplicas": 3}}'
or edit it manually:
$ kubectl edit hpa <hpa-name>
More information on HPAs here.
And yes, the ingress controllers are supposed to be scaled up and down depending on the load.
In AKS, being a managed service, these "system" workloads, like kube-dns and the ingress controller, are managed by the service itself and cannot be modified by the user (because they're labeled with addonmanager.kubernetes.io/mode: Reconcile, which forces the current configuration to reflect what's on disk at /etc/kubernetes/addons on the masters).
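The Reconcile behavior explains why kubectl scale --replicas=3 only "temporarily" worked. Here is a toy model of one reconcile pass, not the addon manager's actual implementation: every managed object is reset to the manifest on disk, so a manual change survives only until the next pass, while objects that are not managed are left alone. The object names and replica counts are illustrative.

```python
from typing import Dict


def reconcile(desired_on_disk: Dict[str, dict], live: Dict[str, dict]) -> Dict[str, dict]:
    """One simplified reconcile pass: managed objects are reset to their on-disk manifest.

    Returns the new live state; objects not present on disk are left unchanged.
    """
    reconciled = dict(live)
    for name, manifest in desired_on_disk.items():
        reconciled[name] = dict(manifest)  # manual edits to managed objects are discarded
    return reconciled


on_disk = {"addon-http-application-routing-nginx-ingress-controller": {"replicas": 1}}
live = {
    # after `kubectl scale ... --replicas=3`
    "addon-http-application-routing-nginx-ingress-controller": {"replicas": 3},
    "my-app": {"replicas": 2},  # hypothetical user workload, not managed by the add-on
}
print(reconcile(on_disk, live))
```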

How to limit the number of pods with attached managed disks per node

Imagine there is a cluster with lots of different deployments running on it. Some pods use PersistentVolumes (Azure Disks). Azure limits how many disks can be attached to a VM, and this leads to scheduling errors like:
Status=409 Code="OperationNotAllowed" Message="The maximum number of data disks allowed to be attached to a VM of this size is 8
Pods stay in the Waiting: ContainerCreating state forever, even though some nodes had far fewer pods with attached disks at the moment of scheduling. It would be great to limit the number of pods with attached disks per node so that this error never happens. I believe podAntiAffinity is what I need, and I know I can prevent pods with the same label from being scheduled on the same node, but I don't know how to allow it only until a node reaches the maximum number of pods with disks.
My installation is AKS.
az acs create \
--orchestrator-type=kubernetes \
--orchestrator-version 1.7.9 \
--resource-group <resource_group_here> \
--name=<name_here> \
...
KUBE_MAX_PD_VOLS is what you are looking for. By default, its value is 16 for Azure Disks. So you can either use instances that have the same limit of attached disks (16) or set it to your preferred value. You can see where it's declared on GitHub.
You should set this environment variable in your scheduler declaration. I found mine in /etc/kubernetes/manifests/kube-scheduler.yaml. This is what it looks like now:
apiVersion: "v1"
kind: "Pod"
metadata:
  name: "kube-scheduler"
  ...
spec:
  containers:
    - name: "kube-scheduler"
      ...
      env:
        - name: KUBE_MAX_PD_VOLS
          value: "8"
      ...
Note the KUBE_MAX_PD_VOLS entry under spec.containers[].env: it prevents the scheduler from placing more than 8 disks on any node.
This way, pods spread across nodes without issues, and pods that cannot fit stay in the Pending state until a node with enough free capacity is available.
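The effect of KUBE_MAX_PD_VOLS=8 can be modeled as a simple filter: a node is a candidate only if its attached disks plus the pod's new disks stay within the limit, and a pod with no candidate stays Pending instead of failing at attach time. This is a simplified sketch, not the scheduler's code; the real scheduler counts unique volumes and scores nodes with other criteria, so preferring the node with the fewest disks here is an assumption made for illustration, and the node names are hypothetical.

```python
from typing import Dict, Optional


def pick_node(attached_disks: Dict[str, int], pod_disks: int,
              max_pd_vols: int = 8) -> Optional[str]:
    """Return a node that can take the pod's disks without exceeding max_pd_vols.

    Among fitting nodes, prefer the one with the fewest attached disks (ties broken
    by name). None means no node fits, i.e. the pod stays Pending.
    """
    candidates = [node for node, used in attached_disks.items()
                  if used + pod_disks <= max_pd_vols]
    if not candidates:
        return None
    return min(candidates, key=lambda node: (attached_disks[node], node))


nodes = {"node-a": 8, "node-b": 3, "node-c": 7}
print(pick_node(nodes, pod_disks=1))            # node-b
print(pick_node({"node-a": 8}, pod_disks=1))    # None -> Pending
```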

Change Kubernetes pods logging

I have AKS (Azure Container Service) configured, up and running, with Kubernetes installed.
I deploy containers using kubectl proxy and the Kubernetes GUI it provides.
I am trying to increase the log level of the pods in order to get more information for better debugging.
I read a lot about kubectl config set and the log level flag --v=0 (values 0 to 10), but I am not able to change the log level. It seems the documentation
Can someone point me in the right direction?
The --v flag is an argument to kubectl and specifies the verbosity of the kubectl output. It has nothing to do with the log levels of the application running inside your Pods.
To get the logs from your Pods, you can run kubectl logs <pod>, or read /var/log/pods/<namespace>_<pod_name>_<pod_id>/<container_name>/ on the Kubernetes node.
To increase the log level of your application, your application has to support it. And as @Jose Armesto said above, this is usually configured using an environment variable.
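As an illustration of the environment-variable approach, here is a minimal sketch of an application that reads its log level at startup. The variable name LOG_LEVEL is an assumption, not a Kubernetes convention; you would set it in the container's env list of the Pod or Deployment spec and restart the pod to change verbosity, after which the extra output shows up in kubectl logs <pod>.

```python
import logging
import os

# Accepted level names; anything else falls back to INFO.
_LEVELS = {
    "CRITICAL": logging.CRITICAL,
    "ERROR": logging.ERROR,
    "WARNING": logging.WARNING,
    "INFO": logging.INFO,
    "DEBUG": logging.DEBUG,
}


def configure_logging(env_var: str = "LOG_LEVEL", default: str = "INFO") -> int:
    """Set the root logger's level from an environment variable (hypothetical name)."""
    level_name = os.environ.get(env_var, default).upper()
    level = _LEVELS.get(level_name, logging.INFO)
    logging.basicConfig(level=level)  # adds a stderr handler if none exists yet
    logging.getLogger().setLevel(level)  # also applies if basicConfig already ran
    return level


if __name__ == "__main__":
    configure_logging()
    logging.debug("only visible when LOG_LEVEL=DEBUG")
    logging.info("visible at INFO and DEBUG")
```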
