Webhooks on spark-gcp deployed through operatorhub - apache-spark

I deployed the gcp-spark operator on K8s. It's working perfectly fine: I am able to run Scala and Python jobs with no issues.
However, I am unable to create volume mounts on my pods, so I cannot use the local filesystem. It looks like the spark-operator needs to be deployed with webhooks enabled for this to work, going by the documentation here.
There was a spark-operator-with-webhooks YAML here, but the name is different from the deployment that comes through OperatorHub. I updated the names to the best of my knowledge and tried to apply the deployment, but ran into the issue below.
kubectl apply -f spark-operator-with-webhook.yaml
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
deployment.apps/spark-operator configured
service/spark-webhook unchanged
The Job "spark-operator-init" is invalid: spec.template: Invalid value: core.PodTemplateSpec{ObjectMeta:v1.ObjectMeta{Name:"", GenerateName:"", Namespace:"", SelfLink:"", UID:"", ResourceVers......int(nil)}}: field is immutable
Is there an easy way of enabling webhooks on the spark-operator? I want to be able to mount the local filesystem in the SparkApplication. Please assist.
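To be concrete, what I am after is roughly a spec like the following. This is only a minimal sketch (the image, application file and paths are placeholders, and the apiVersion may be v1beta1 depending on the operator version); it is the volumes/volumeMounts part that requires the mutating webhook:

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-local-fs-example        # placeholder name
spec:
  type: Python
  mode: cluster
  image: gcr.io/spark-operator/spark-py:v3.1.1   # placeholder image
  mainApplicationFile: local:///opt/spark/app/main.py
  sparkVersion: "3.1.1"
  volumes:
    - name: data
      hostPath:
        path: /tmp/spark-data         # local fs path on the node
        type: Directory
  driver:
    volumeMounts:
      - name: data
        mountPath: /tmp/spark-data
  executor:
    volumeMounts:
      - name: data
        mountPath: /tmp/spark-data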

I purged the init object and redeployed. The manifest was successfully applied.

Related

Wrong connection port despite Kubernetes deployments/services ports specified

It might take a while to explain what I'm trying to do but bear with me please.
I have the following infrastructure specified:
I have a job called questo-server-deployment (I know, confusing but this was the only way to access the deployment without using ingress on minikube)
This is how the parts should talk to one another:
And here you can find the entire Kubernetes/Terraform config file for the above setup
I have 2 endpoints exposed from the node.js app (questo-server-deployment)
I'm making the requests using 10.97.189.215 which is the questo-server-service external IP address (as you can see in the first picture)
So I have 2 endpoints:
health - which simply returns 200 OK from the node.js app - and this part is fine confirming the node app is working as expected.
dynamodb - which should be able to send a request to the questo-dynamodb-deployment (pod) and get a response back, but it can't.
When I print env vars I'm getting the following:
➜ kubectl -n minikube-local-ns exec questo-server-deployment--1-7ptnz -- printenv
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=questo-server-deployment--1-7ptnz
DB_DOCKER_URL=questo-dynamodb-service
DB_REGION=local
DB_SECRET_ACCESS_KEY=local
DB_TABLE_NAME=Questo
DB_ACCESS_KEY=local
QUESTO_SERVER_SERVICE_PORT_4000_TCP=tcp://10.97.189.215:4000
QUESTO_SERVER_SERVICE_PORT_4000_TCP_PORT=4000
QUESTO_DYNAMODB_SERVICE_SERVICE_PORT=8000
QUESTO_DYNAMODB_SERVICE_PORT_8000_TCP_PROTO=tcp
QUESTO_DYNAMODB_SERVICE_PORT_8000_TCP_PORT=8000
KUBERNETES_SERVICE_HOST=10.96.0.1
KUBERNETES_SERVICE_PORT=443
KUBERNETES_PORT=tcp://10.96.0.1:443
KUBERNETES_PORT_443_TCP_PORT=443
KUBERNETES_PORT_443_TCP_ADDR=10.96.0.1
QUESTO_SERVER_SERVICE_SERVICE_HOST=10.97.189.215
QUESTO_SERVER_SERVICE_PORT=tcp://10.97.189.215:4000
QUESTO_SERVER_SERVICE_PORT_4000_TCP_PROTO=tcp
QUESTO_SERVER_SERVICE_PORT_4000_TCP_ADDR=10.97.189.215
KUBERNETES_PORT_443_TCP_PROTO=tcp
QUESTO_DYNAMODB_SERVICE_PORT_8000_TCP=tcp://10.107.45.125:8000
QUESTO_DYNAMODB_SERVICE_PORT_8000_TCP_ADDR=10.107.45.125
KUBERNETES_PORT_443_TCP=tcp://10.96.0.1:443
QUESTO_SERVER_SERVICE_SERVICE_PORT=4000
QUESTO_DYNAMODB_SERVICE_SERVICE_HOST=10.107.45.125
QUESTO_DYNAMODB_SERVICE_PORT=tcp://10.107.45.125:8000
KUBERNETES_SERVICE_PORT_HTTPS=443
NODE_VERSION=12.22.7
YARN_VERSION=1.22.15
HOME=/root
so it looks like the configuration is aware of the dynamodb address and port:
QUESTO_DYNAMODB_SERVICE_PORT_8000_TCP=tcp://10.107.45.125:8000
You'll also notice in the above env variables that I specified:
DB_DOCKER_URL=questo-dynamodb-service
This is supposed to be the questo-dynamodb-service url:port, which I'm assigning to the config here (in the ConfigMap) and which is then used here in the questo-server-deployment (job).
Also, when I log:
kubectl logs -f questo-server-deployment--1-7ptnz -n minikube-local-ns
I'm getting the following results:
Which indicates that the app (node.js) tried to connect to the db (dynamodb), but on the wrong port: 443 instead of 8000?
The DB_DOCKER_URL should contain the full address (with port) to the questo-dynamodb-service
What am I doing wrong here?
Edit ----
I've explicitly assigned the port 8000 to the DB_DOCKER_URL as suggested in the answer but now I'm getting the following error:
It seems to me there is some kind of default behaviour in Kubernetes that makes it try to communicate between pods using HTTPS?
Any ideas what needs to be done here?
How about specifying the port in the ConfigMap:
...
data = {
  DB_DOCKER_URL = "${kubernetes_service.questo_dynamodb_service.metadata.0.name}:8000"
  ...
}
Otherwise it may default to 443.
Answering my own question in case anyone has an equally brilliant idea of running a local DynamoDB in a minikube cluster.
The issue was not only with the port, but also with the protocol, so the final answer to the question is to modify the ConfigMap as follows:
data = {
  DB_DOCKER_URL = "http://${kubernetes_service.questo_dynamodb_service.metadata.0.name}:8000"
  ...
}
As a side note:
Also, when you are running scripts to create a DynamoDB table in your amazon/dynamodb-local container, make sure you use the same region for creating the table, like so:
#!/bin/bash
aws dynamodb create-table \
  --cli-input-json file://questo_db_definition.json \
  --endpoint-url http://questo-dynamodb-service:8000 \
  --region local
And the same region when querying the data.
Even though this is just a local copy, where you can type anything you want as the value of your AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (and actually in the AWS_REGION as well), the region has to match.
If you query the db with a different region than the one it was created with, you get the Cannot do operations on a non-existent table error.
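For example, a query against the same local endpoint would look roughly like this (a sketch; the table name comes from DB_TABLE_NAME above):

aws dynamodb scan \
  --table-name Questo \
  --endpoint-url http://questo-dynamodb-service:8000 \
  --region local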

How to solve 'GitRepository not found' error in FluxCD?

I am trying to use an Azure Kubernetes cluster and FluxCD to connect to a repository named realtimeapp-infra in GitLab. I created the source and kustomization .yaml files in another repo, training-setup, but I am getting the following error when I run flux get kustomizations on the command line. I was getting the same error with GitHub as well. (I am new to both FluxCD and Kubernetes.)
EDIT: The problem was solved. It was due to there being no master branch in the repository, and I did not have access to create the master branch. After the owner created it, the issue was resolved.
Did you connect to the realtimeapp-infra repository as a GitRepository inside Flux, with username & credentials? This is a CRD type that comes with Flux: kubectl get gitrepository -A
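A minimal GitRepository sketch for reference (URL, namespace and secret name are placeholders; note that the referenced branch has to exist, which turned out to be the actual problem here):

apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: realtimeapp-infra
  namespace: flux-system
spec:
  interval: 1m
  url: https://gitlab.com/<group>/realtimeapp-infra.git
  ref:
    branch: master
  secretRef:
    name: realtimeapp-infra-credentials   # basic-auth secret with username/password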

how to update openshift kube-apiserver component with a new container image?

OpenShift provides an update mechanism that updates the whole platform in a live way, while I (and perhaps others) have a need to update only specific components.
It is possible to update a component such as console or openshift-apiserver with a new container image by managing its operator and setting the image accordingly.
For example, to update the openshift-apiserver component, the following steps do work:
disable the management of the openshift apiserver operator
#oc patch openshiftapiservers.operator.openshift.io cluster --patch '{ "spec": { "managementState": "Unmanaged" } }' --type=merge
set a new container image for the openshift apiserver deployment
#oc set image deploy apiserver openshift-apiserver=registry.somecorp.com:5000/ocp4/openshift4:openshfit-apiserver-4.4.4-t1 -n openshift-apiserver
check and wait for the rollout status
#oc rollout status -w deploy/apiserver -n openshift-apiserver
However, for the base kube-apiserver component, things are different.
Firstly, the way of disabling the related operator does not work; it seems the kubeapiserver operator does not support the "Unmanaged" state.
#oc patch kubeapiserver.operator.openshift.io cluster --patch '{ "spec": { "managementState": "Unmanaged" } }' --type=merge
The KubeAPIServer "cluster" is invalid: spec.managementState: Invalid
value: "": spec.managementState in body should match
'^(Managed|Force)$'
Secondly, instead of a deployment, it seems plain pods are used for kube-apiserver. While there is a way to set the image for a specific pod/container, I cannot figure out how to make the setting actually apply.
#oc set image pod kube-apiserver-master-0 kube-apiserver=registry.somecorp.com:5000/ocp4/openshift4:hyperkube-t1 -n openshift-kube-apiserver
pod/kube-apiserver-master-0 image updated
Is there someone who could help me figure out an approach to manually update kube-apiserver in an OpenShift system? Thanks for any information.
Using option A described here (https://github.com/openshift/enhancements/blob/master/enhancements/operator-dev-doc.md), the kube-apiserver component can indeed be updated on a running cluster.
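A sketch of that approach, assuming the cluster-version-operator override mechanism from the linked document is used to stop the CVO from managing the kube-apiserver operator (the exact group/name/namespace values should be checked against your cluster version):

#oc patch clusterversion version --type merge -p \
  '{"spec":{"overrides":[{"kind":"Deployment","group":"apps","name":"kube-apiserver-operator","namespace":"openshift-kube-apiserver-operator","unmanaged":true}]}}'

With the override in place, the kube-apiserver operator itself can be edited (for example its image references) without the CVO reverting the change.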

"Error: Key not loaded" in h2o deployed through a K3s cluster, using python3 client

I can confirm that the 3-replica H2O cluster inside K3s is correctly deployed, as executing h2o.init(ip="x.x.x.x") in the Python3 interpreter works as expected. I followed the instructions noted here: https://www.h2o.ai/blog/running-h2o-cluster-on-a-kubernetes-cluster/
Nevertheless, I had to modify the service.yaml and comment out the line that says clusterIP: None, as K3s was complaining about its inability to set the clusterIP to None. Even so, I can certify it is working correctly, and I am able to use an external IP to connect to the cluster.
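For reference, the Service ends up looking roughly like this after the change (names and port follow the linked blog post, so they may differ slightly from my actual manifest):

apiVersion: v1
kind: Service
metadata:
  name: h2o-service
spec:
  # clusterIP: None   <- the line I had to comment out for K3s
  selector:
    app: h2o-k8s
  ports:
    - protocol: TCP
      port: 54321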
If I try to load the dataset using the h2o cluster inside the K3s cluster using the exact same steps as described here http://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html, this is the output that I get:
>>> train = h2o.import_file("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv")
...
h2o.exceptions.H2OResponseError: Server error java.lang.IllegalArgumentException:
Error: Key not loaded: Key<Frame> https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv
Request: POST /3/ParseSetup
data: {'check_header': '0', 'source_frames': '["https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv"]'}
The same error occurs if I use the h2o.upload_file("x.csv") method.
There is a clue about what may be happening here: Key not loaded: Key<Frame> while POSTing source frame through ParseSetup in H2O API call, but I am not using curl, and I cannot find any parameter that could help me overcome this issue: http://docs.h2o.ai/h2o/latest-stable/h2o-py/docs/h2o.html?highlight=import_file#h2o.import_file
I need to use the Python client inside the same K3s cluster due to various technical reasons, so I am not able to launch either Flow or Firebug to see what may be happening.
I can confirm it is working correctly when I simply issue a h2o.init(), using the local Java instance.
UPDATE 1:
I have tried in different K3s clusters without success. I changed the service.yaml to a NodePort, and now this is the error traceback:
>>> train = h2o.import_file("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv")
...
h2o.exceptions.H2OResponseError: Server error java.lang.IllegalArgumentException:
Error: Job is missing
Request: GET /3/Jobs/$03010a2a016132d4ffffffff$_a2366be93ec99a78d7bc161de8c54d67
UPDATE 2:
I have tried using different services (NodePort, LoadBalancer, ClusterIP) and none of them work. I have also tried using Minikube, with the official image and with a custom image made by me, without success. I suspect this is something related to either H2O itself or the clustering between pods. I will keep digging and hopefully find some gold in it.
UPDATE 3:
I also found out that the post about running H2O in Docker (https://www.h2o.ai/blog/h2o-docker/) is really outdated, and the Dockerfile present on GitHub does not work either (I changed it to uncomment the ENTRYPOINT section, without success): https://github.com/h2oai/h2o-3/blob/master/Dockerfile
Even so, I tried with the custom image I built for h2o-k8s and it works seamlessly in pure Docker. I am wondering why it is still not working in K8s...
UPDATE 4:
I have tried modifying the environment variable called H2O_KUBERNETES_SERVICE_DNS without success.
In the meantime, the cluster started to become unavailable, that is, the readinessProbes would not complete successfully. No matter what I change now, it does not work.
I spun up a local k3d cluster to see what happened, and surprisingly, the readinessProbes were not failing there, using v3.30.0.6. I then started testing with R instead of Python, and I am glad I did, because I may have pinpointed what was wrong: there is a version mismatch between the client and the server. So I updated the image accordingly, to v3.30.0.1.
But now again the readinessProbe is not working in my k3d cluster, so I am unable to test it.
It seems it is working now: R client version 3.30.0.1 with server version 3.30.0.1. I also tried with Python client version 3.30.0.7 and server version 3.30.0.7, and it started working. Marvelous. The problem was caused by a version mismatch between the client and the server, as the Python client was updated to 3.30.0.7 while the latest server image for Docker was 3.30.0.6.
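A quick way to catch this kind of mismatch is to compare the client and server versions right after connecting; a minimal sketch (the URL is a placeholder for the in-cluster H2O service):

import h2o

print(h2o.__version__)                   # version of the Python client package
h2o.init(url="http://x.x.x.x:54321")     # connect to the existing H2O cluster
h2o.cluster().show_status()              # prints the server (cluster) version, among other details
# If they differ, pin the client to the server version, e.g.: pip install h2o==3.30.0.7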

Azure: what could be the cause of the error "Unable to edit or replace deployment"?

When I recreate my VM, I get the following error:
Problem occurred during request to Azure services. Cloud provider details: Unable to edit or replace deployment 'VM-Name': previous deployment from '8/20/2019 6:20:33 AM' is still active (expiration time is '8/27/2019 5:17:41 AM'). Please see https://aka.ms/arm-deploy for usage details.
Please help me understand: what could be the cause of this error?
UPDATED:
This deployment has not been started previously.
Prior to this, errors were received during creation:
Azure is not available now. Please Try again later
There were several such errors, one at a time, and then I got the error about:
Unable to edit or replace deployment
My assumption about this is the following; tell me whether I am right or not:
I launched the image, then after some time I recreated it.
Creation began, but at that moment the connection with Azure was lost.
Then, when the connection was restored, we tried to deploy again while the deployment from the previous attempt had not been removed (because there was no connection with Azure).
As a result, we got such an error.
Does this theory make sense?
It is exactly what it says: there is another deployment with the same name going on at this time. Either change the name of the deployment you are trying to queue, or wait for the other deployment to finish/fail.
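If you want to check what is still running, something like the following should work (a sketch; resource group and deployment name are placeholders):

az deployment group list --resource-group <resource-group> --output table
az deployment group cancel --resource-group <resource-group> --name <deployment-name>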
This can also occur if you use Bicep templates for your ARM deployment and multiple modules or resources in the template have the same name:
module fooModule '../modules/foo.bicep' = {
  name: 'foo'
}

module barModule '../modules/bar.bicep' = {
  name: 'foo'
}
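The fix in that case is presumably just to give each module a unique deployment name, for example:

module barModule '../modules/bar.bicep' = {
  name: 'bar'   // unique per module
}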
I got the same error. Initially the pipeline was working, but when it was retriggered the pipeline took more time, so I cancelled the deployment and did a fresh rerun, which then hit this error. I think I need to wait until that cancelled deployment fails.
