How to solve 'GitRepository not found' error in FluxCD? - azure

I am trying to use an Azure Kubernetes cluster and FluxCD to connect to a repository named realtimeapp-infra in GitLab. I created the source and kustomization .yaml files in another repo, training-setup, but I get the following error when I run flux get kustomizations in cmd. I was getting the same error with GitHub too. (I am new to both FluxCD and Kubernetes.)
EDIT: The problem was solved. The repository had no master branch, and I did not have permission to create one. After the owner created it, the issue was resolved.

Did you connect the repository realtimeapp-infra as a GitRepository inside Flux, with a username and credentials? GitRepository is a custom resource type (CRD) that ships with Flux; you can list the existing ones with kubectl get gitrepository -A
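Since the root cause here was a missing master branch, it may help to set the branch explicitly on the GitRepository instead of relying on a default. The following is a minimal sketch, not the asker's actual configuration: the URL, namespace, interval, secret name, the branch name main, and the apiVersion (source.toolkit.fluxcd.io/v1, which may need to be v1beta2 on older Flux installs) are all assumptions. It prints a JSON manifest, which kubectl apply -f - accepts just like YAML.

```python
import json


def git_repository_manifest(name, url, branch, secret_name=None,
                            namespace="flux-system", interval="1m"):
    """Build a Flux GitRepository manifest that pins an explicit branch."""
    spec = {
        "interval": interval,
        "url": url,
        # Pin the branch explicitly so the source does not depend on a
        # default branch name that may not exist in the repository.
        "ref": {"branch": branch},
    }
    if secret_name:
        # Name of a Kubernetes Secret holding the Git credentials.
        spec["secretRef"] = {"name": secret_name}
    return {
        "apiVersion": "source.toolkit.fluxcd.io/v1",  # assumption: Flux v2 GA API
        "kind": "GitRepository",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


if __name__ == "__main__":
    manifest = git_repository_manifest(
        name="realtimeapp-infra",
        url="https://gitlab.com/example-group/realtimeapp-infra.git",  # hypothetical URL
        branch="main",                      # assumption: the branch that actually exists
        secret_name="gitlab-credentials",   # hypothetical secret name
    )
    print(json.dumps(manifest, indent=2))
```

The Kustomization's sourceRef then has to use the same name and namespace as this object; a mismatch there is another common way to end up with a "GitRepository not found" status.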

Related

Airflow can't reach logs from webserver due to 403 error

I use Apache Airflow for daily ETL jobs. I installed it in Azure Kubernetes Service using the provided Helm chart. It's been running fine for half a year, but since recently I'm unable to access the logs in the webserver (this used to always work fine).
I'm getting the following error:
*** Log file does not exist: /opt/airflow/logs/dag_id=analytics_etl/run_id=manual__2022-09-26T09:25:50.010763+00:00/task_id=copy_device_table/attempt=18.log
*** Fetching from: http://airflow-worker-0.airflow-worker.default.svc.cluster.local:8793/dag_id=analytics_etl/run_id=manual__2022-09-26T09:25:50.010763+00:00/task_id=copy_device_table/attempt=18.log
*** !!!! Please make sure that all your Airflow components (e.g. schedulers, webservers and workers) have the same 'secret_key' configured in 'webserver' section and time is synchronized on all your machines (for example with ntpd) !!!!!
****** See more at https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#secret-key
****** Failed to fetch log file from worker. Client error '403 FORBIDDEN' for url 'http://airflow-worker-0.airflow-worker.default.svc.cluster.local:8793/dag_id=analytics_etl/run_id=manual__2022-09-26T09:25:50.010763+00:00/task_id=copy_device_table/attempt=18.log'
For more information check: https://httpstatuses.com/403
What have I tried:
I've made sure that the log file exists (I can exec into the airflow-worker-0 pod and read the file on command line in the location specified in the error).
I've rolled back my deployment to an earlier commit from when I know for sure it was still working, but it made no difference.
I was using webserverSecretKeySecretName in the values.yaml configuration. I changed the secret to which that name was pointing (deleted it and created a new one, as described here: https://airflow.apache.org/docs/helm-chart/stable/production-guide.html#webserver-secret-key) but it didn't work (no difference, same error).
I changed the config to use a webserverSecretKey instead (in plain text), no difference.
My thoughts/observations:
The error states that the log file doesn't exist, but that's not true. It probably just can't access it.
The time is the same in all pods (I double checked by exec-ing into them and typing date in the command line)
The webserver secret is the same in the worker, the scheduler, and the webserver (I double checked by exec-ing into them and finding the corresponding env variable)
Any ideas?
Turns out this was a known bug with the latest release (2.4.0) of the official Airflow Helm chart, reported here:
https://github.com/apache/airflow/discussions/26490
Should be resolved in version 2.4.1 which should be available in the next couple of days.
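For anyone repeating the secret-key check from the question, here is a small sketch of how the comparison could be done without printing the key itself. It assumes you have already collected the value of AIRFLOW__WEBSERVER__SECRET_KEY from each pod (for example by exec-ing into it); the function names and the component labels are hypothetical, and this only rules out a key mismatch, it does not address the chart bug described above.

```python
import hashlib
from typing import Dict


def fingerprint(secret: str) -> str:
    """Short SHA-256 fingerprint so secrets can be compared without exposing them."""
    return hashlib.sha256(secret.encode("utf-8")).hexdigest()[:12]


def secret_key_mismatches(secrets_by_component: Dict[str, str]) -> Dict[str, str]:
    """Return component -> fingerprint when the keys differ, else an empty dict."""
    prints = {name: fingerprint(value)
              for name, value in secrets_by_component.items()}
    if len(set(prints.values())) <= 1:
        return {}  # every component reports the same key
    return prints


if __name__ == "__main__":
    # Placeholder values; in practice these come from each pod's environment.
    collected = {
        "webserver": "example-key",
        "scheduler": "example-key",
        "worker": "example-key",
    }
    print(secret_key_mismatches(collected) or "all components share one secret key")
```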

Webhooks on spark-gcp deployed through operatorhub

I deployed the gcp-spark operator on k8s. It's working perfectly fine, and I am able to run Scala and Python jobs with no issues.
But I am unable to create volume mounts on my pods, so I cannot use the local filesystem. Going by the page linked here, it looks like spark-operator must have webhooks enabled for this to work.
There was a spark-operator-with-webhook YAML here, but the names in it differ from those of the deployment that comes through OperatorHub. I updated the names to the best of my knowledge and tried to apply the deployment, but ran into the issue below.
kubectl apply -f spark-operator-with-webhook.yaml
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
deployment.apps/spark-operator configured
service/spark-webhook unchanged
The Job "spark-operator-init" is invalid: spec.template: Invalid value: core.PodTemplateSpec{ObjectMeta:v1.ObjectMeta{Name:"", GenerateName:"", Namespace:"", SelfLink:"", UID:"", ResourceVers......int(nil)}}: field is immutable
Is there an easy way of enabling webhooks on spark-operator? I want to be able to mount local fs on the sparkapplication. Please assist.
I purged the init object (the spark-operator-init Job from the error above) and redeployed. The manifest was then applied successfully.
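The "field is immutable" error comes from the fact that a Job's pod template cannot be changed in place, so the existing Job has to be deleted before the manifest is applied again. A rough sketch of that sequence, assuming kubectl is on the PATH and already pointed at the right cluster; the helper names are hypothetical and the namespace argument is an assumption, since the question does not say where the operator runs.

```python
import subprocess


def redeploy_commands(manifest_path, job_name, namespace=None):
    """Return the kubectl invocations: delete the immutable Job, then re-apply."""
    ns = ["-n", namespace] if namespace else []
    return [
        # --ignore-not-found keeps the delete from failing if the Job is already gone.
        ["kubectl", "delete", "job", job_name, "--ignore-not-found"] + ns,
        ["kubectl", "apply", "-f", manifest_path],
    ]


def redeploy(manifest_path, job_name, namespace=None):
    """Run the commands in order, stopping at the first failure."""
    for cmd in redeploy_commands(manifest_path, job_name, namespace):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for cmd in redeploy_commands("spark-operator-with-webhook.yaml",
                                 "spark-operator-init"):
        print(" ".join(cmd))
```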

Azure: what could be the cause of the error "Unable to edit or replace deployment"?

When I recreate my VM I get the following error:
Problem occurred during request to Azure services. Cloud provider details: Unable to edit or replace deployment 'VM-Name': previous deployment from '8/20/2019 6:20:33 AM' is still active (expiration time is '8/27/2019 5:17:41 AM'). Please see https://aka.ms/arm-deploy for usage details.
Please help me understand.
What could be the cause of the error?
UPDATED:
This deployment has not been started previously.
Prior to this, errors were received during creation:
Azure is not available now. Please Try again later
There were several such errors one at a time and then I got that error related to:
Unable to edit or replace deployment
My assumptions about this.
Tell me, am I right or not ?
I launched the image, then after some time I recreated it.
Creation began, but at that moment the connection with Azure was lost.
Then, when the connection was restored, we tried to make a deployment that was not removed in the previous attempt (because there was no connection with Azure).
As a result, we got such an error.
Does this theory make sense?
It is exactly what it says: there is another deployment with the same name in progress at this time. Either change the name of the deployment you are trying to queue, or wait for the other deployment to finish or fail.
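One way to "change the name" automatically is to append a timestamp, so a retried deployment never collides with one that is still marked active. A minimal sketch; the helper name is hypothetical, and the 64-character cap reflects my understanding of the ARM deployment name limit, so treat it as an assumption.

```python
from datetime import datetime, timezone

MAX_DEPLOYMENT_NAME_LENGTH = 64  # assumption: ARM deployment name limit


def unique_deployment_name(base, now=None):
    """Append a UTC timestamp to base, trimming base to stay within the limit."""
    now = now or datetime.now(timezone.utc)
    suffix = now.strftime("%Y%m%d-%H%M%S")
    # Leave room for the separator and the suffix.
    trimmed = base[: MAX_DEPLOYMENT_NAME_LENGTH - len(suffix) - 1]
    return f"{trimmed}-{suffix}"


if __name__ == "__main__":
    print(unique_deployment_name("VM-Name"))
```

The trade-off is that every run leaves a separate entry in the resource group's deployment history instead of overwriting the previous one.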
This can also occur if you use Bicep templates for your ARM deployment and multiple modules or resources in the template have the same name:
module fooModule '../modules/foo.bicep' = {
  name: 'foo'
}
module barModule '../modules/bar.bicep' = {
  name: 'foo'
}
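A quick way to spot this in a larger template is to scan for module declarations that reuse a name. The sketch below is a naive regex check, not a Bicep parser: it assumes literal, single-quoted names that appear before any nested braces in the module body, and it will miss interpolated names.

```python
import re
from collections import Counter

# Matches: module <symbol> '<path>' = { ... name: '<literal>'
MODULE_NAME_RE = re.compile(
    r"^\s*module\s+\w+\s+'[^']+'\s*=\s*\{[^}]*?\bname:\s*'([^']+)'",
    re.MULTILINE,
)


def duplicate_module_names(bicep_source):
    """Return the sorted list of module deployment names used more than once."""
    counts = Counter(MODULE_NAME_RE.findall(bicep_source))
    return sorted(name for name, n in counts.items() if n > 1)


if __name__ == "__main__":
    sample = """
module fooModule '../modules/foo.bicep' = {
  name: 'foo'
}
module barModule '../modules/bar.bicep' = {
  name: 'foo'
}
"""
    print(duplicate_module_names(sample))
```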
I got the same error. Initially the pipeline was working, but when it was retriggered it took longer than usual, so I canceled the deployment and started a fresh run, which then hit this error. I think I need to wait until the canceled deployment has failed.

Pulling image from ECR via docker-py

I have a script that retrieves a login for ECR, authenticates a DockerClient instance with the login credentials (reauth set to True), and then attempts to pull a nominated container image.
The code seems to work perfectly when running on my local machine interacting with docker daemon on an EC2 instance, but when running from the EC2 instance I am constantly getting
404 Client Error: Not Found ("repository XXXXXXXX.dkr.ecr.eu-west-2.amazonaws.com/autohld-runner not found: does not exist or no pull access")
The same repo is being used for both executing the code locally and remotely on the EC2 instance. I have tried setting the access to the image within ECR to allow pull for both everyone and my AWS ID. I have granted the role assigned to the EC2 instance Full Admin access also. All with no joy.
If I perform the same tasks on the EC2 instance via command line with the exact same repo URI (copied from the error), it works with no issue.
Is there something I am missing within docker-py?
import docker

# ecr and instance are objects built earlier in the script (ECR login details, EC2 instance)
url = "tcp://127.0.0.1:2375"
dockerd = docker.DockerClient(base_url=url, version='auto')
dockerd.login(username=ecr.username, password=ecr.password, email='none', registry=ecr.registry, reauth=True)
dockerd.images.pull(ecr.get_repo(instance.tags['Container']), tag='latest')
get_repo returns the full URI as reported in the error message, the Container element is the name 'autohld-runner'
Thanks
It seems that if the registry has already been accessed via the CLI, an auth token (or something similar) is stored and Docker remembers it, which allows subsequent calls to work. In this case, however, the instance starts up completely fresh and relies only on the login method within docker-py.
That login does not seem to pass the credentials on to the pull. I have found that using the auth_config named argument and passing in a dictionary of auth parameters works.
auth_creds = {'username': ecr.username, 'password': ecr.password}
dockerd.images.pull(ecr.get_repo(instance.tags['Container']), tag='latest', auth_config=auth_creds)
HTH
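For completeness, here is one way the username and password passed in auth_config might be obtained. This is a sketch and not the asker's ecr helper: it assumes boto3 is installed and that the instance role is allowed to call ECR's GetAuthorizationToken, and both function names are hypothetical. The token returned by ECR is a base64-encoded "username:password" pair.

```python
import base64


def decode_ecr_token(authorization_token):
    """Split ECR's base64 'username:password' token into an auth_config dict."""
    decoded = base64.b64decode(authorization_token).decode("utf-8")
    username, password = decoded.split(":", 1)
    return {"username": username, "password": password}


def ecr_auth_config(region_name):
    """Fetch a fresh ECR token with boto3 and return it as an auth_config dict."""
    import boto3  # third-party; imported here so the decoder works without it

    client = boto3.client("ecr", region_name=region_name)
    data = client.get_authorization_token()["authorizationData"][0]
    return decode_ecr_token(data["authorizationToken"])
```

The resulting dictionary can be passed directly as auth_config=ecr_auth_config("eu-west-2") in the pull call shown above.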

WebSphere Database Federated Repository

I'm trying to add a DB2 database repository to my federated repository. I'm using WebSphere version 8.0.
I've been running through Paul Ilechko's instructions (http://www-128.ibm.com/developerworks/websphere/techjournal/0701_ilechko/0701_ilechko.html) and I keep getting stuck at Step 3 (Set up the repository by using this wsadmin command to create the wimDB tables). I keep getting this error:
com.ibm.websphere.wim.exception.WIMSystemException: com.ibm.websphere.wim.exception.WIMSystemException: CWWIM1999E An exception occurred during processing: com.ibm.db2.jcc.DB2Driver
I did a search and it says I should set the Environment Variable DB2_JDBC_DRIVER_PATH to /home/.../sqllib/java/ for the scopes Node=Node, Node=Node01, Node=CellManager.
I tested the data source connection via the WAS Console and it worked, so I don't know what I did wrong. Got any ideas what could cause this?
You can set the required environment variable in the admin console:
Go to: Environment -> WebSphere Variables
Set the scope to Cell
Create DB2_JDBC_DRIVER_PATH
