Files in AIRFLOW_HOME (an Azure File Share mount) are created as root on Azure Container Apps

I have set up Airflow in Azure (Azure Container Apps) and attached an Azure File Share as an external mount/volume.
1. I ran the **airflow init service**; it created `airflow.cfg` and `webserver_config.py` in **AIRFLOW_HOME (/opt/airflow)**, which is actually an Azure-mounted file system.
2. I ran the **airflow webserver service**; it created the `airflow-webserver.pid` file in **AIRFLOW_HOME (/opt/airflow)**, which is also the Azure-mounted file system.
Now the problem is that all of the files above are created with root as user and group, not as the airflow user (50000), even though I set the environment variable AIRFLOW_UID to 50000 when creating the container app. Because of this, my webservers do not start and throw the following error:
PermissionError: [Errno 1] Operation not permitted: '/opt/airflow/airflow-webserver.pid'
Note: Azure Container Apps does not allow running root/sudo commands; otherwise I could solve this problem with a simple chown.
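The first thing to check is which user the webserver process actually runs as, and who owns the mounted directory. As far as I know, AIRFLOW_UID is read by the official docker-compose file for its `user:` setting, not by the image itself. So setting it as a plain environment variable probably does not change the process user. On SMB-backed Azure Files mounts, ownership also generally comes from the mount options rather than from the process that created the file.

The sketch below is a hypothetical diagnostic using standard-library Python, which the apache/airflow image ships with. It reports both owners and tries to create a throwaway probe file next to the pid file. The function name and the `.probe` suffix are made up for illustration.

```python
import os
import stat


def describe_write_access(directory: str, filename: str = "airflow-webserver.pid") -> dict:
    """Report who owns `directory`, who this process is, and whether it can create files there."""
    st = os.stat(directory)
    info = {
        "dir_owner": f"{st.st_uid}:{st.st_gid}",
        "dir_mode": stat.filemode(st.st_mode),
        "process_user": f"{os.getuid()}:{os.getgid()}",
        "access_says_writable": os.access(directory, os.W_OK),
    }
    existing = os.path.join(directory, filename)
    if os.path.exists(existing):
        # A leftover pid file owned by another user is a likely source of EPERM.
        est = os.stat(existing)
        info["existing_file_owner"] = f"{est.st_uid}:{est.st_gid}"
    probe = existing + ".probe"  # hypothetical probe name, removed again below
    try:
        # os.access can be misleading on network file systems, so actually try it.
        with open(probe, "w") as fh:
            fh.write(str(os.getpid()))
        os.remove(probe)
        info["can_create_files"] = True
    except OSError as exc:
        info["can_create_files"] = False
        info["error"] = repr(exc)
    return info


if __name__ == "__main__":
    target = os.environ.get("AIRFLOW_HOME", "/opt/airflow")
    if os.path.isdir(target):
        for key, value in describe_write_access(target).items():
            print(f"{key}: {value}")
    else:
        print(f"{target} does not exist here")
```

If `process_user` is not `50000:...` or `existing_file_owner` shows `0:0`, that narrows the problem down to the user the container runs as, or to a stale root-owned pid file on the share.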
Another problem is that Airflow configuration options passed through environment variables are never picked up by Docker, e.g.:
- name: AIRFLOW__API__AUTH_BACKENDS
  value: 'airflow.api.auth.backend.basic_auth'
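Airflow only treats environment variables named `AIRFLOW__{SECTION}__{KEY}` (double underscores) as config overrides, and these take precedence over `airflow.cfg`. A quick way to confirm whether a variable actually reached the container is to list what Airflow would see. The sketch below does that with the standard library only; the helper names are hypothetical.

```python
import os
from typing import Dict, Optional, Tuple

PREFIX = "AIRFLOW__"


def parse_airflow_env_name(name: str) -> Optional[Tuple[str, str]]:
    """Map AIRFLOW__{SECTION}__{KEY} to (section, key); return None for anything else."""
    if not name.startswith(PREFIX):
        return None  # e.g. AIRFLOW_UID: single underscore, so not a config override
    section, sep, key = name[len(PREFIX):].partition("__")
    if not sep or not section or not key:
        return None
    return section.lower(), key.lower()


def airflow_overrides(environ: Dict[str, str]) -> Dict[Tuple[str, str], str]:
    """Collect every (section, key) -> value override present in an environment mapping."""
    overrides = {}
    for name, value in environ.items():
        parsed = parse_airflow_env_name(name)
        if parsed is not None:
            overrides[parsed] = value
    return overrides


if __name__ == "__main__":
    # Run inside the container to see which overrides Airflow will receive.
    for (section, key), value in sorted(airflow_overrides(dict(os.environ)).items()):
        print(f"[{section}] {key} = {value}")
```

The script prints values, including connection strings, so be careful where its output ends up.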
Attached screenshot for reference
Your help is much appreciated!
YAML file that I use to create my container app:
id: /subscriptions/1234/resourceGroups/<my-res-group>/providers/Microsoft.App/containerApps/<app-name>
identity:
  type: None
location: eastus2
name: webservice
properties:
  configuration:
    activeRevisionsMode: Single
    registries: []
  managedEnvironmentId: /subscriptions/1234/resourceGroups/<my-res-group>/providers/Microsoft.App/managedEnvironments/container-app-env
  template:
    containers:
    - command:
      - /bin/bash
      - -c
      - exec /entrypoint airflow webserver
      env:
      - name: AIRFLOW__API__AUTH_BACKENDS
        value: 'airflow.api.auth.backend.basic_auth'
      - name: AIRFLOW__CELERY__BROKER_URL
        value: redis://:#myredis.redis.cache.windows.net:6379/0
      - name: AIRFLOW__CELERY__RESULT_BACKEND
        value: db+postgresql://user:pass#postres-db-servconn.postgres.database.azure.com/airflow?sslmode=require
      - name: AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION
        value: 'true'
      - name: AIRFLOW__CORE__EXECUTOR
        value: CeleryExecutor
      - name: AIRFLOW__CORE__SQL_ALCHEMY_CONN
        value: postgresql+psycopg2://user:pass#postres-db-servconn.postgres.database.azure.com/airflow?sslmode=require
      - name: AIRFLOW__DATABASE__SQL_ALCHEMY_CONN
        value: postgresql+psycopg2://user:pass#postres-db-servconn.postgres.database.azure.com/airflow?sslmode=require
      - name: AIRFLOW__CORE__LOAD_EXAMPLES
        value: 'false'
      - name: AIRFLOW_UID
        value: 50000
      image: docker.io/apache/airflow:latest
      name: wsr
      volumeMounts:
      - volumeName: randaf-azure-files-volume
        mountPath: /opt/airflow
      probes: []
      resources:
        cpu: 0.25
        memory: 0.5Gi
    scale:
      maxReplicas: 3
      minReplicas: 1
    volumes:
    - name: randaf-azure-files-volume
      storageName: randafstorage
      storageType: AzureFile
resourceGroup: RAND
tags:
  tagname: ws-only
type: Microsoft.App/containerApps

Related

Crunchy Postgres Operator backup to Azure Blob fails

I want to back up the database to Azure Blob storage, but it fails.
My configuration is as follows:
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo-azure
spec:
  image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-14.6-2
  postgresVersion: 14
  instances:
    - dataVolumeClaimSpec:
        accessModes:
        - "ReadWriteOnce"
        resources:
          requests:
            storage: 32Gi
  backups:
    pgbackrest:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.41-2
      configuration:
      - secret:
          name: pgo-azure-creds
      global:
        repo2-path: /pgbackrest/repo2
      repos:
      - name: repo1
        volume:
          volumeClaimSpec:
            accessModes:
            - "ReadWriteOnce"
            resources:
              requests:
                storage: 32Gi
      - name: repo2
        azure:
          container: "pgo"
  patroni:
    dynamicConfiguration:
      postgresql:
        pg_hba:
          - "host all all 0.0.0.0/0 trust"
          - "host all postgres 127.0.0.1/32 md5"
  users:
    - name: qixin
      databases:
        - iot
        - lowcode
      options: "SUPERUSER"
  service:
    type: LoadBalancer
My storage account name is pgobackup, container name is pgo.
The content of azure.conf is as follows:
[global]
repo2-azure-account=pgobackup
repo2-azure-key=aXdafScEP28el......JpkYa28nh5V+AStNtZ5Lg==
But there are no files in the blob container.
I also executed the following command for a one-time backup:
kubectl annotate -n postgres-operator postgrescluster hippo-azure postgres-operator.crunchydata.com/pgbackrest-backup="$( date '+%F_%H:%M:%S' )" --overwrite=true
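One thing worth checking: as I understand the PGO v5 docs, the `postgres-operator.crunchydata.com/pgbackrest-backup` annotation only triggers the one-off backup defined under `spec.backups.pgbackrest.manual`. That section holds a `repoName` plus pgBackRest `options`, and the spec above has no `manual` section. The sketch below builds that section as a JSON merge patch targeting repo2. The helper name is hypothetical; the field names follow the PGO one-off backup example.

```python
import json
from typing import List, Optional


def manual_backup_patch(repo_name: str, backup_type: str = "full",
                        extra_options: Optional[List[str]] = None) -> dict:
    """JSON merge patch that sets spec.backups.pgbackrest.manual for a one-off backup."""
    options = [f"--type={backup_type}"] + list(extra_options or [])
    return {
        "spec": {
            "backups": {
                "pgbackrest": {
                    "manual": {
                        "repoName": repo_name,  # which repo the annotation-triggered backup uses
                        "options": options,     # passed through to pgBackRest
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    # Compact JSON, suitable for: kubectl patch ... --type merge -p '<json>'
    print(json.dumps(manual_backup_patch("repo2"), separators=(",", ":")))
```

The printed patch can be applied with `kubectl patch postgrescluster hippo-azure -n postgres-operator --type merge -p '<printed JSON>'`, after which the annotate command above can be re-run.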
I also tried scheduled backups, but those failed as well:
- name: repo2
  schedules:
    full: "18 * * * *"
    differential: "0 1 * * 1-6"
  azure:
    container: "pgo"
(Taken from the official documentation.)
The official documentation says a CronJob will be created, but no Job or CronJob is created.
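It is also worth double-checking what the two schedules mean. Standard five-field cron syntax is minute, hour, day of month, month, day of week, with 0 = Sunday. In that syntax `18 * * * *` fires at minute 18 of every hour, and `0 1 * * 1-6` fires at 01:00 Monday through Saturday.

Below is a small matcher that supports only the field forms used here. It is a simplified sketch, not the parser Kubernetes uses.

```python
from datetime import datetime
from typing import Set


def _parse_field(field: str, low: int, high: int) -> Set[int]:
    """Expand one cron field; supports '*', 'N', 'A-B', '*/S', 'A-B/S' and comma lists."""
    values: Set[int] = set()
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            start, end = low, high
        elif "-" in part:
            a, b = part.split("-", 1)
            start, end = int(a), int(b)
        else:
            start = end = int(part)
        values.update(range(start, end + 1, step))
    return values


def cron_matches(expr: str, when: datetime) -> bool:
    """True if `when` matches a 5-field cron expression.

    Simplification: all fields are ANDed; real cron ORs day-of-month and
    day-of-week when both are restricted.
    """
    minute, hour, dom, month, dow = expr.split()
    cron_dow = (when.weekday() + 1) % 7  # Python: Monday = 0; cron: Sunday = 0
    return (
        when.minute in _parse_field(minute, 0, 59)
        and when.hour in _parse_field(hour, 0, 23)
        and when.day in _parse_field(dom, 1, 31)
        and when.month in _parse_field(month, 1, 12)
        and cron_dow in _parse_field(dow, 0, 6)
    )
```

If the schedules look right, `kubectl get cronjobs -n postgres-operator` shows whether the operator created any CronJobs at all.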
Can anyone give me some advice?
Thx!

Why does my Selenium Grid in Azure Container Instances take the same time to execute tests regardless of number of nodes?

Because ACI doesn't support scaling, we deploy multiple container groups, each containing an Azure DevOps agent, a Selenium Grid hub and a Selenium Grid node. To try to speed things up, I deployed the container groups with an additional node, identical to the first except that it starts on port 6666 instead of port 5555. I can see both nodes register with the grid without issue, but when I execute the same batch of tests with and without the additional node, they take exactly the same amount of time. How do I go about finding out what's going on here?
My ACI yaml:
apiVersion: 2018-10-01
location: australiaeast
properties:
  containers:
  - name: devops-agent
    properties:
      image: __AZUREDEVOPSAGENTIMAGE__
      resources:
        requests:
          cpu: 0.5
          memoryInGb: 1
      environmentVariables:
      - name: AZP_URL
        value: __AZUREDEVOPSPROJECTURL__
      - name: AZP_POOL
        value: __AGENTPOOLNAME__
      - name: AZP_TOKEN
        secureValue: __AZUREDEVOPSAGENTTOKEN__
      - name: SCREEN_WIDTH
        value: "1920"
      - name: SCREEN_HEIGHT
        value: "1080"
      volumeMounts:
      - name: downloads
        mountPath: /tmp/
  - name: selenium-hub
    properties:
      image: selenium/hub:3.141.59-xenon
      resources:
        requests:
          cpu: 1
          memoryInGb: 1
      ports:
      - port: 4444
  - name: chrome-node
    properties:
      image: selenium/node-chrome:3.141.59-xenon
      resources:
        requests:
          cpu: 1
          memoryInGb: 2
      environmentVariables:
      - name: HUB_HOST
        value: localhost
      - name: HUB_PORT
        value: 4444
      - name: SCREEN_WIDTH
        value: "1920"
      - name: SCREEN_HEIGHT
        value: "1080"
      volumeMounts:
      - name: devshm
        mountPath: /dev/shm
      - name: downloads
        mountPath: /home/seluser/downloads
  - name: chrome-node-2
    properties:
      image: selenium/node-chrome:3.141.59-xenon
      resources:
        requests:
          cpu: 1
          memoryInGb: 2
      environmentVariables:
      - name: HUB_HOST
        value: localhost
      - name: HUB_PORT
        value: 4444
      - name: SCREEN_WIDTH
        value: "1920"
      - name: SCREEN_HEIGHT
        value: "1080"
      - name: SE_OPTS
        value: "-port 6666"
      volumeMounts:
      - name: devshm
        mountPath: /dev/shm
      - name: downloads
        mountPath: /home/seluser/downloads
  osType: Linux
  diagnostics:
    logAnalytics:
      workspaceId: __LOGANALYTICSWORKSPACEID__
      workspaceKey: __LOGANALYTICSPRIMARYKEY__
  volumes:
  - name: devshm
    emptyDir: {}
  - name: downloads
    emptyDir: {}
  ipAddress:
    type: Public
    ports:
    - protocol: tcp
      port: '4444'
  #==================== remove this section if not pulling images from private image registries ===============
  imageRegistryCredentials:
  - server: __IMAGEREGISTRYLOGINSERVER__
    username: __IMAGEREGISTRYUSERNAME__
    password: __IMAGEREGISTRYPASSWORD__
  #========================================================================================================================
tags: null
type: Microsoft.ContainerInstance/containerGroups
When I run my tests locally against a Docker Selenium Grid, either from Visual Studio or via dotnet vstest, they run in parallel across all available nodes and complete in half the time.
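One thing to rule out is the client side. A grid can only run as many sessions at once as the test runner opens, so extra nodes help only if the runner in the pipeline actually runs tests in parallel. Its worker count may differ from your local machine; for example, it may be derived from the CPU count of the 0.5-CPU agent container. The node's own session limit matters too: in the 3.x docker-selenium images this is, as far as I know, controlled by NODE_MAX_INSTANCES and NODE_MAX_SESSION.

Below is a toy model of the effect. It assumes equal-length tests handed out greedily to whichever slot frees up first; the function name is hypothetical.

```python
import heapq
from typing import List


def wall_clock(durations: List[float], grid_slots: int, client_workers: int) -> float:
    """Estimate total run time when tests are handed out greedily.

    Only min(grid_slots, client_workers) tests can be in flight at once, so
    extra grid nodes sit idle if the runner opens one session at a time.
    """
    concurrency = max(1, min(grid_slots, client_workers))
    lanes = [0.0] * concurrency  # time at which each lane becomes free
    heapq.heapify(lanes)
    for duration in durations:
        free_at = heapq.heappop(lanes)           # earliest free lane
        heapq.heappush(lanes, free_at + duration)
    return max(lanes)


if __name__ == "__main__":
    tests = [10.0] * 4
    print(wall_clock(tests, grid_slots=1, client_workers=4))
    print(wall_clock(tests, grid_slots=2, client_workers=4))
    print(wall_clock(tests, grid_slots=2, client_workers=1))
```

Take four 10-second tests, two grid slots, and a runner that opens only one session at a time. The estimate stays at 40 seconds, the same as with a single node, which matches the symptom described above.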

Sharing volumes inside docker container on gitlab runner

So, I am trying to mount a working directory with project files into a child instance on a GitLab runner, in a sort of DinD setup. I want to be able to mount a volume in a Docker instance so I can muck around and test stuff, like e2e testing, without building a new container to inject the files I need. Ideally, I could share data in a DinD environment without having to build a new container for each job that runs…
I tried following (Docker volumes not mounted when using docker:dind (#41227) · Issues · GitLab.org / GitLab FOSS · GitLab), and some directories are being mounted, but not the project data I am looking for.
So, in the test jobs I create a dummy file, and I want to mount the directory in a container and view the files…
I have a test CI YAML that sort of does what I am looking for. I create test files in the volume I wish to mount, and I would expect to see them in a directory listing, but sadly I do not. In my second attempt, I couldn't get the container ID because the labels don't exist on the runner, so it always comes up blank… However, the first stages show promise: they work perfectly on a "shell" runner outside of k8s. But as soon as I change the tag to use a k8s runner, it craps out. I can see the old directory /web and the directory I am mounting, but not the files within it. Weird?
ci.yml
image: docker:stable
services:
  - docker:dind
stages:
  - compile
variables:
  SHARED_PATH: /builds/$CI_PROJECT_PATH/shared/
  DOCKER_DRIVER: overlay2
.test: &test
  stage: compile
  tags:
    - k8s-vols
  script:
    - docker version
    - 'export TESTED_IMAGE=$(echo ${CI_JOB_NAME} | sed "s/test //")'
    - docker pull ${TESTED_IMAGE}
    - 'export SHARED_PATH="$(dirname ${CI_PROJECT_DIR})/shared"'
    - echo ${SHARED_PATH}
    - echo ${CI_PROJECT_DIR}
    - mkdir -p ${SHARED_PATH}
    - touch ${SHARED_PATH}/test_file
    - touch ${CI_PROJECT_DIR}/test_file2
    - find ${SHARED_PATH}
    #- find ${CI_PROJECT_DIR}
    - docker run --rm -v ${CI_PROJECT_DIR}:/mnt ${TESTED_IMAGE} find /mnt
    - docker run --rm -v ${CI_PROJECT_DIR}:/mnt ${TESTED_IMAGE} ls -lR /mnt
    - docker run --rm -v ${SHARED_PATH}:/mnt ${TESTED_IMAGE} find /mnt
    - docker run --rm -v ${SHARED_PATH}:/mnt ${TESTED_IMAGE} ls -lR /mnt
test alpine: *test
test ubuntu: *test
test centos: *test
testing:
  stage: compile
  tags:
    - k8s-vols
  image:
    name: docker:stable
    entrypoint: ["/bin/sh", "-c"]
  script:
    # get id of container
    - export CONTAINER_ID=$(docker ps -q -f "label=com.gitlab.gitlab-runner.job.id=$CI_JOB_ID" -f "label=com.gitlab.gitlab-runner.type=build")
    # get mount name
    - export MOUNT_NAME=$(docker inspect $CONTAINER_ID -f "{{ range .Mounts }}{{ if eq .Destination \"/builds/${CI_PROJECT_NAMESPACE}\" }}{{ .Source }}{{end}}{{end}}" | cut -d "/" -f 6)
    # run container
    - docker run -v $MOUNT_NAME:/builds -w /builds/$CI_PROJECT_NAME --entrypoint=/bin/sh busybox -c "ls -la"
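A detail that may explain the empty directory: the source side of `docker run -v SOURCE:TARGET` is resolved by the Docker daemon, on the machine where the daemon runs, not inside the job container. If an absolute source path does not exist there, Docker creates it as an empty directory. With /var/run/docker.sock mounted from the node and no DOCKER_HOST set, `docker` in the job would talk to the node's daemon, so ${CI_PROJECT_DIR} would be looked up on the node's file system.

Below is a small standard-library sketch of how the `-v` values in the script are interpreted. It also covers the `cut -d "/" -f 6` step used to pull a volume name out of a mount source. The helper names are hypothetical, and anonymous volumes (`-v /path`) are not handled.

```python
from typing import NamedTuple, Optional


class VolumeSpec(NamedTuple):
    source: str
    target: str
    mode: Optional[str]
    kind: str  # "bind" (path on the daemon's host) or "volume" (named volume)


def parse_volume_flag(spec: str) -> VolumeSpec:
    """Split a Linux-style `-v SOURCE:TARGET[:MODE]` value."""
    parts = spec.split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected SOURCE:TARGET[:MODE], got {spec!r}")
    source, target = parts[0], parts[1]
    mode = parts[2] if len(parts) == 3 else None
    # An absolute source is a bind mount resolved by the daemon; otherwise a volume name.
    kind = "bind" if source.startswith("/") else "volume"
    return VolumeSpec(source, target, mode, kind)


def volume_name_from_source(source_path: str) -> str:
    """Equivalent of `cut -d "/" -f 6` on /var/lib/docker/volumes/<name>/_data."""
    return source_path.split("/")[5]


if __name__ == "__main__":
    print(parse_volume_flag("/builds/group/project:/mnt"))
    print(volume_name_from_source("/var/lib/docker/volumes/example/_data"))
```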
This is the values file I am working with…
image: docker-registry.corp.com/base-images/gitlab-runner:alpine-v13.3.1
imagePullPolicy: IfNotPresent
gitlabUrl: http://gitlab.corp.com
runnerRegistrationToken: "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
runnerToken: ""
unregisterRunners: true
terminationGracePeriodSeconds: 3600
concurrent: 5
checkInterval: 10
rbac:
  create: true
  resources: ["pods", "pods/exec", "secrets"]
  verbs: ["get", "list", "watch","update", "create", "delete"]
  clusterWideAccess: false
metrics:
  enabled: true
runners:
  image: docker-registry.corp.com/base-images/docker-dind:v1
  imagePullPolicy: "if-not-present"
  requestConcurrency: 5
  locked: true
  tags: "k8s-vols"
  privileged: true
  secret: gitlab-runner-vols
  namespace: gitlab-runner-k8s-vols
  pollTimeout: 180
  outputLimit: 4096
  kubernetes:
    volumes:
      - type: host_path
        volume:
          name: docker
          host_path: /var/run/docker.sock
          mount_path: /var/run/docker.sock
          read_only: false
  cache: {}
  builds: {}
  services: {}
  helpers:
    cpuLimit: 200m
    memoryLimit: 256Mi
    cpuRequests: 100m
    memoryRequests: 128Mi
    image: docker-registry.corp.com/base-images/gitlab-runner-helper:x86_64-latest
  env:
    NAME: VALUE
    CI_SERVER_URL: http://gitlab.corp.com
    CLONE_URL:
    RUNNER_REQUEST_CONCURRENCY: '1'
    RUNNER_EXECUTOR: kubernetes
    REGISTER_LOCKED: 'true'
    RUNNER_TAG_LIST: k8s-vols
    RUNNER_OUTPUT_LIMIT: '4096'
    KUBERNETES_IMAGE: ubuntu:18.04
    KUBERNETES_PRIVILEGED: 'true'
    KUBERNETES_NAMESPACE: gitlab-runners-k8s-vols
    KUBERNETES_POLL_TIMEOUT: '180'
    KUBERNETES_CPU_LIMIT:
    KUBERNETES_MEMORY_LIMIT:
    KUBERNETES_CPU_REQUEST:
    KUBERNETES_MEMORY_REQUEST:
    KUBERNETES_SERVICE_ACCOUNT:
    KUBERNETES_SERVICE_CPU_LIMIT:
    KUBERNETES_SERVICE_MEMORY_LIMIT:
    KUBERNETES_SERVICE_CPU_REQUEST:
    KUBERNETES_SERVICE_MEMORY_REQUEST:
    KUBERNETES_HELPER_CPU_LIMIT:
    KUBERNETES_HELPER_MEMORY_LIMIT:
    KUBERNETES_HELPER_CPU_REQUEST:
    KUBERNETES_HELPER_MEMORY_REQUEST:
    KUBERNETES_HELPER_IMAGE:
    KUBERNETES_PULL_POLICY:
securityContext:
  fsGroup: 65533
  runAsUser: 100
resources: {}
affinity: {}
nodeSelector: {}
tolerations: []
envVars:
  - name: CI_SERVER_URL
    value: http://gitlab.corp.com
  - name: CLONE_URL
  - name: RUNNER_REQUEST_CONCURRENCY
    value: '1'
  - name: RUNNER_EXECUTOR
    value: kubernetes
  - name: REGISTER_LOCKED
    value: 'true'
  - name: RUNNER_TAG_LIST
    value: k8s-vols
  - name: RUNNER_OUTPUT_LIMIT
    value: '4096'
  - name: KUBERNETES_IMAGE
    value: ubuntu:18.04
  - name: KUBERNETES_PRIVILEGED
    value: 'true'
  - name: KUBERNETES_NAMESPACE
    value: gitlab-runner-k8s-vols
  - name: KUBERNETES_POLL_TIMEOUT
    value: '180'
  - name: KUBERNETES_CPU_LIMIT
  - name: KUBERNETES_MEMORY_LIMIT
  - name: KUBERNETES_CPU_REQUEST
  - name: KUBERNETES_MEMORY_REQUEST
  - name: KUBERNETES_SERVICE_ACCOUNT
  - name: KUBERNETES_SERVICE_CPU_LIMIT
  - name: KUBERNETES_SERVICE_MEMORY_LIMIT
  - name: KUBERNETES_SERVICE_CPU_REQUEST
  - name: KUBERNETES_SERVICE_MEMORY_REQUEST
  - name: KUBERNETES_HELPER_CPU_LIMIT
  - name: KUBERNETES_HELPER_MEMORY_LIMIT
  - name: KUBERNETES_HELPER_CPU_REQUEST
  - name: KUBERNETES_HELPER_MEMORY_REQUEST
  - name: KUBERNETES_HELPER_IMAGE
  - name: KUBERNETES_PULL_POLICY
hostAliases:
  - ip: "10.10.x.x"
    hostnames:
    - "ch01"
podAnnotations:
  prometheus.io/path: "/metrics"
  prometheus.io/scrape: "true"
  prometheus.io/port: "9252"
podLabels: {}
So, I have made a couple of tweaks to the Helm chart. I have added a volumes section in the config map…
config.toml: |
  concurrent = {{ .Values.concurrent }}
  check_interval = {{ .Values.checkInterval }}
  log_level = {{ default "info" .Values.logLevel | quote }}
  {{- if .Values.metrics.enabled }}
  listen_address = '[::]:9252'
  {{- end }}
  volumes = ["/builds:/builds"]
  #volumes = ["/var/run/docker.sock:/var/run/docker.sock", "/cache", "/builds:/builds"]
I tried using the last line, which includes the docker.sock mount, but when it ran it complained that it could not find the docker.sock mount (file not found). So I used only the builds directory in this section and added the docker.sock mount in the values file instead. That seems to work fine for everything except this mounting issue…
I also saw examples of setting the runner to privileged, but that didn't seem to do much for me…
When I run the pipeline, this is the output…
So, as you can see, no files…
Thanks for taking the time to be thorough in your request, it really helps!

Managing multiple Pods with one Deploy

Good afternoon. I need help defining the structure of my production cluster. I want something like:
- 1 Deployment that controls the pods
- multiple pods (one pod per customer)
- multiple services (one service per pod)
But how would I build this structure if each pod has env vars that connect to that customer's database, like this:
env:
  - name: dbuser
    value: "svc_iafox_test#***"
  - name: dbpassword
    value: "****"
  - name: dbname
    value: "ts-demo1"
  - name: dbconnectstring
    value: "jdbc:sqlserver://***-test.database.windows.net:1433;database=$(dbname);user=$(dbuser);password=$(dbpassword);encrypt=true;trustServerCertificate=true;hostNameInCertificate=*.database.windows.net;loginTimeout=30;"
So for each pod I would have to change these env vars... What is the best way to do this?
You could use a ConfigMap to achieve that:
apiVersion: v1
kind: Pod
metadata:
  name: dapi-test-pod
spec:
  containers:
    - name: test-container
      image: k8s.gcr.io/busybox
      command: [ "/bin/sh", "-c", "echo $(SPECIAL_LEVEL_KEY) $(SPECIAL_TYPE_KEY)" ]
      env:
        - name: SPECIAL_LEVEL_KEY
          valueFrom:
            configMapKeyRef:
              name: special-config
              key: SPECIAL_LEVEL
        - name: SPECIAL_TYPE_KEY
          valueFrom:
            configMapKeyRef:
              name: special-config
              key: SPECIAL_TYPE
  restartPolicy: Never
https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/#use-configmap-defined-environment-variables-in-pod-commands
P.S. I don't think 1 deployment per pod makes sense; 1 deployment per customer does. I don't think you understand exactly what a Deployment does: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/
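Following the one-Deployment-per-customer suggestion, the per-customer values can be generated rather than edited by hand. This is a minimal sketch under a few assumptions:
- a Secret per customer holds the database settings (a Secret rather than a ConfigMap, since a password is involved);
- each customer also gets a Deployment and a Service;
- all names, the image and the ports are hypothetical placeholders.

`dbconnectstring` is appended last because $(dbname)-style references only resolve against entries defined earlier in the same env list.

```python
import json
from typing import Dict, List, Optional


def customer_manifests(customer: str, image: str, db: Dict[str, str],
                       connect_template: Optional[str] = None,
                       container_port: int = 8080) -> List[dict]:
    """Secret + Deployment + Service for one customer (names and ports are placeholders)."""
    secret_name = f"{customer}-db"
    labels = {"app": "customer-app", "customer": customer}
    secret = {
        "apiVersion": "v1", "kind": "Secret", "type": "Opaque",
        "metadata": {"name": secret_name},
        "stringData": dict(db),  # stringData: Kubernetes does the base64 encoding
    }
    # Keep the caller's order; each variable is read from this customer's Secret.
    env = [{"name": k, "valueFrom": {"secretKeyRef": {"name": secret_name, "key": k}}}
           for k in db]
    if connect_template is not None:
        # Must come last so $(dbname), $(dbuser), ... are already defined.
        env.append({"name": "dbconnectstring", "value": connect_template})
    deployment = {
        "apiVersion": "apps/v1", "kind": "Deployment",
        "metadata": {"name": customer, "labels": labels},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{
                    "name": "app", "image": image,
                    "ports": [{"containerPort": container_port}],
                    "env": env,
                }]},
            },
        },
    }
    service = {
        "apiVersion": "v1", "kind": "Service",
        "metadata": {"name": customer},
        "spec": {"selector": labels, "ports": [{"port": 80, "targetPort": container_port}]},
    }
    return [secret, deployment, service]


if __name__ == "__main__":
    items = customer_manifests(
        "ts-demo1", "example.azurecr.io/app:1.0",  # hypothetical image
        {"dbuser": "change-me", "dbpassword": "change-me", "dbname": "ts-demo1"},
        "jdbc:sqlserver://example.database.windows.net:1433;database=$(dbname);"
        "user=$(dbuser);password=$(dbpassword);encrypt=true;",
    )
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

The printed JSON can be piped to `kubectl apply -f -`; kubectl accepts JSON as well as YAML. Running the script once per customer gives each customer its own Deployment, Service and Secret.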

Multi-broker Kafka on Kubernetes how to set KAFKA_ADVERTISED_HOST_NAME

My current Kafka deployment file with 3 Kafka brokers looks like this:
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: kafka
spec:
  selector:
    matchLabels:
      app: kafka
  serviceName: kafka-headless
  replicas: 3
  updateStrategy:
    type: RollingUpdate
  podManagementPolicy: Parallel
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
      - name: kafka-instance
        image: wurstmeister/kafka
        ports:
        - containerPort: 9092
        env:
        - name: KAFKA_ADVERTISED_PORT
          value: "9092"
        - name: KAFKA_ADVERTISED_HOST_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: KAFKA_ZOOKEEPER_CONNECT
          value: "zookeeper-0.zookeeper-headless.default.svc.cluster.local:2181,\
            zookeeper-1.zookeeper-headless.default.svc.cluster.local:2181,\
            zookeeper-2.zookeeper-headless.default.svc.cluster.local:2181"
        - name: BROKER_ID_COMMAND
          value: "hostname | awk -F '-' '{print $2}'"
        - name: KAFKA_CREATE_TOPICS
          value: hello:2:1
        volumeMounts:
        - name: data
          mountPath: /var/lib/kafka/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 50Gi
This creates 3 Kafka brokers as a StatefulSet and connects to the ZooKeeper cluster through kube-dns using FQDNs (Fully Qualified Domain Names) such as:
zookeeper-0.zookeeper-headless.default.svc.cluster.local:2181
Broker IDs are generated based on the pod name:
- name: BROKER_ID_COMMAND
  value: "hostname | awk -F '-' '{print $2}'"
Result:
kafka-0 = 0
kafka-1 = 1
kafka-2 = 2
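The awk command prints the second '-'-separated field of the hostname. That equals the pod ordinal only while the StatefulSet name has no hyphen in it: `my-kafka-0` would yield `kafka`. A Python illustration of the difference (the function names are hypothetical):

```python
def broker_id_awk_style(hostname: str) -> str:
    """What `hostname | awk -F '-' '{print $2}'` prints: the second '-'-separated field."""
    fields = hostname.split("-")
    return fields[1] if len(fields) > 1 else ""


def broker_id_from_ordinal(hostname: str) -> int:
    """StatefulSet pods are named <statefulset-name>-<ordinal>; use the last field."""
    _, sep, ordinal = hostname.rpartition("-")
    if not sep or not ordinal.isdigit():
        raise ValueError(f"not a StatefulSet pod name: {hostname!r}")
    return int(ordinal)


if __name__ == "__main__":
    for name in ("kafka-0", "kafka-2", "my-kafka-2"):
        print(name, broker_id_awk_style(name), broker_id_from_ordinal(name))
```

In a POSIX shell, `h=$(hostname); echo "${h##*-}"` strips everything up to the last hyphen and matches the second function.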
However, in order to use the kube-dns names for the Kafka brokers:
kafka-0.kafka-headless.default.svc.cluster.local:9092
kafka-1.kafka-headless.default.svc.cluster.local:9092
kafka-2.kafka-headless.default.svc.cluster.local:9092
I need to be able to set the KAFKA_ADVERTISED_HOST_NAME variable to the above FQDN values based on the name of the pod.
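StatefulSet pods get stable DNS names of the form `<pod-name>.<headless-service>.<namespace>.svc.<cluster-domain>`. The cluster domain is `cluster.local` unless the cluster was configured otherwise. So only the pod name varies and the rest is a fixed suffix.

Below is a sketch of the names the three brokers should advertise. The helper names and defaults are assumptions matching the manifest above.

```python
from typing import List


def statefulset_pod_fqdn(pod: str, headless_service: str, namespace: str = "default",
                         cluster_domain: str = "cluster.local") -> str:
    """Stable DNS name of a StatefulSet pod behind a headless Service."""
    return f"{pod}.{headless_service}.{namespace}.svc.{cluster_domain}"


def broker_addresses(statefulset: str, replicas: int, headless_service: str,
                     namespace: str = "default", port: int = 9092,
                     cluster_domain: str = "cluster.local") -> List[str]:
    addresses = []
    for ordinal in range(replicas):
        pod = f"{statefulset}-{ordinal}"  # StatefulSet pods are numbered from 0
        host = statefulset_pod_fqdn(pod, headless_service, namespace, cluster_domain)
        addresses.append(f"{host}:{port}")
    return addresses


if __name__ == "__main__":
    for address in broker_addresses("kafka", 3, "kafka-headless"):
        print(address)
```

Inside the container, the same string can be assembled from the pod's own hostname, which is what the shell-based answer further down does.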
Currently I have the variable set to the name of the pod:
- name: KAFKA_ADVERTISED_HOST_NAME
  valueFrom:
    fieldRef:
      fieldPath: metadata.name
Result:
KAFKA_ADVERTISED_HOST_NAME=kafka-0
KAFKA_ADVERTISED_HOST_NAME=kafka-1
KAFKA_ADVERTISED_HOST_NAME=kafka-2
But somehow I need to append the rest of the DNS name.
Is there a way to set the DNS value directly?
Something like this:
- name: KAFKA_ADVERTISED_HOST_NAME
  valueFrom:
    fieldRef:
      fieldPath: kubedns.name
I managed to solve the problem with a command field inside the pod definition:
command:
- sh
- -c
- "export KAFKA_ADVERTISED_HOST_NAME=$(hostname).kafka-headless.default.svc.cluster.local &&
  start-kafka.sh"
This runs a shell command which exports the advertised hostname environment variable based on the hostname value.
- name: MY_POD_NAME
  valueFrom:
    fieldRef:
      fieldPath: metadata.name
- name: KAFKA_ZOOKEEPER_CONNECT
  value: zook-zookeeper.zook.svc.cluster.local:2181
- name: KAFKA_PORT_NUMBER
  value: "9092"
- name: KAFKA_LISTENERS
  value: SASL_SSL://:$(KAFKA_PORT_NUMBER)
- name: KAFKA_ADVERTISED_LISTENERS
  value: SASL_SSL://$(MY_POD_NAME).kafka-kafka-headless.kafka.svc.cluster.local:$(KAFKA_PORT_NUMBER)
The above config would create your FQDN.
You should be able to see those names in your Kafka logs when Kafka server starts.
NOTE: Kubernetes lets you reference environment variables with the syntax $(VARIABLE), as long as the variable is defined earlier in the same container's env list.
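The expansion rules, concretely:
- the kubelet expands $(NAME) only against variables defined earlier in the same container's env list (plus service variables);
- references it cannot resolve are left untouched;
- $$ becomes a literal $.

The same expansion applies to `command` and `args`. That is why the earlier `$(hostname)` in `command` still reaches the shell for command substitution, as long as no env var named `hostname` exists.

Below is a simplified emulation of those rules. It ignores service variables, and the function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# "$$" is an escaped "$"; "$(NAME)" is a reference up to the first ")".
_REF = re.compile(r"\$\$|\$\(([^)]*)\)")


def expand(value: str, defined: Dict[str, str]) -> str:
    """Expand $(NAME) from `defined`; unknown references stay as written."""
    def repl(match: "re.Match[str]") -> str:
        if match.group(0) == "$$":
            return "$"
        return defined.get(match.group(1), match.group(0))
    return _REF.sub(repl, value)


def resolve_env(env: List[Tuple[str, str]]) -> Dict[str, str]:
    """Resolve an ordered env list: each entry only sees the entries before it."""
    resolved: Dict[str, str] = {}
    for name, value in env:
        resolved[name] = expand(value, resolved)
    return resolved


if __name__ == "__main__":
    print(resolve_env([
        ("MY_POD_NAME", "kafka-0"),
        ("KAFKA_PORT_NUMBER", "9092"),
        ("KAFKA_ADVERTISED_LISTENERS",
         "SASL_SSL://$(MY_POD_NAME).kafka-kafka-headless.kafka.svc.cluster.local:$(KAFKA_PORT_NUMBER)"),
    ])["KAFKA_ADVERTISED_LISTENERS"])
```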
None of the above worked for me. My setup is wurstmeister/kafka:2.12-2.5.0 and wurstmeister/zookeeper:3.4.6 in a single pod on Kubernetes 1.16 (don't ask), with a ClusterIP service on top that forwards 9092 to the Kafka container.
This set of environment variables works:
- name: KAFKA_LISTENERS
  value: "INSIDE://:9094,OUTSIDE://:9092"
- name: KAFKA_ADVERTISED_LISTENERS
  value: "INSIDE://:9094,OUTSIDE://my-service.my-namespace.svc.cluster.local:9092"
- name: KAFKA_LISTENER_SECURITY_PROTOCOL_MAP
  value: "INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT" # not production-ready!
- name: KAFKA_INTER_BROKER_LISTENER_NAME
  value: INSIDE
- name: KAFKA_ZOOKEEPER_CONNECT
  value: "localhost:2181" # since it's in the same pod
Sources: wurstmeister/kafka doc, Kafka doc
The inherent problem seems to be that Kafka itself needs an IP-ish address to bind to and to talk to itself through, while clients need a DNS-ish name to connect to from the outside. The latter can't contain the pod name for some reason. (This might be a separate configuration issue on my end.)
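The four listener settings have to agree with each other. As I understand Kafka's startup checks:
- every advertised listener name must also appear in KAFKA_LISTENERS;
- custom listener names (like INSIDE/OUTSIDE) need an entry in the security protocol map;
- the inter-broker listener must be one of the advertised ones.

The wurstmeister image turns KAFKA_* variables into the matching server.properties keys. Below is a hypothetical checker for those rules, run against the working configuration above. It is a simplification: it treats protocol names such as SASL_SSL as always mapped.

```python
from typing import Dict, List, Tuple

# Listener names that are also protocol names map to themselves by default.
PROTOCOL_NAMES = {"PLAINTEXT", "SSL", "SASL_PLAINTEXT", "SASL_SSL"}


def parse_listeners(value: str) -> Dict[str, Tuple[str, int]]:
    """Parse 'NAME://host:port,...' into {NAME: (host, port)}; empty host = all interfaces."""
    result = {}
    for item in value.split(","):
        name, sep, address = item.strip().partition("://")
        if not sep:
            raise ValueError(f"missing '://' in {item!r}")
        host, _, port = address.rpartition(":")
        result[name] = (host, int(port))
    return result


def check_listener_config(listeners: str, advertised: str, protocol_map: str,
                          inter_broker: str) -> List[str]:
    """Return human-readable problems; an empty list means the settings are consistent."""
    problems = []
    bound = parse_listeners(listeners)
    adv = parse_listeners(advertised)
    mapping = dict(pair.strip().split(":", 1) for pair in protocol_map.split(","))
    for name in adv:
        if name not in bound:
            problems.append(f"advertised listener {name} is missing from KAFKA_LISTENERS")
    for name in bound:
        if name not in mapping and name not in PROTOCOL_NAMES:
            problems.append(f"listener {name} has no security protocol mapping")
    if inter_broker not in adv:
        problems.append(f"inter-broker listener {inter_broker} is not advertised")
    return problems


if __name__ == "__main__":
    print(check_listener_config(
        "INSIDE://:9094,OUTSIDE://:9092",
        "INSIDE://:9094,OUTSIDE://my-service.my-namespace.svc.cluster.local:9092",
        "INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT",
        "INSIDE",
    ))
```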
