Crunchy Postgres Operator backup to Azure Blob fails

I want to back up the database to Azure Blob Storage, but the backup fails.
My configuration is as follows:
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo-azure
spec:
  image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-14.6-2
  postgresVersion: 14
  instances:
    - dataVolumeClaimSpec:
        accessModes:
          - "ReadWriteOnce"
        resources:
          requests:
            storage: 32Gi
  backups:
    pgbackrest:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.41-2
      configuration:
        - secret:
            name: pgo-azure-creds
      global:
        repo2-path: /pgbackrest/repo2
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              accessModes:
                - "ReadWriteOnce"
              resources:
                requests:
                  storage: 32Gi
        - name: repo2
          azure:
            container: "pgo"
  patroni:
    dynamicConfiguration:
      postgresql:
        pg_hba:
          - "host all all 0.0.0.0/0 trust"
          - "host all postgres 127.0.0.1/32 md5"
  users:
    - name: qixin
      databases:
        - iot
        - lowcode
      options: "SUPERUSER"
  service:
    type: LoadBalancer
My storage account name is pgobackup, container name is pgo.
The content of azure.conf is as follows.
[global]
repo2-azure-account=pgobackup
repo2-azure-key=aXdafScEP28el......JpkYa28nh5V+AStNtZ5Lg==
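For reference, pgBackRest reads this azure.conf from the pgo-azure-creds Secret referenced under spec.backups.pgbackrest.configuration. A minimal sketch of creating such a Secret, assuming the postgres-operator namespace used in the annotate command below:

kubectl create secret generic pgo-azure-creds \
  --namespace postgres-operator \
  --from-file=azure.conf=./azure.conf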
However, no files appear in the blob container.
I also executed the following command to trigger a one-time backup:
kubectl annotate -n postgres-operator postgrescluster hippo-azure postgres-operator.crunchydata.com/pgbackrest-backup="$( date '+%F_%H:%M:%S' )" --overwrite=true
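A quick way to check whether that annotation produced anything is to look for backup Jobs and at the cluster's status; these are plain kubectl commands against the namespace used above, nothing operator-specific is assumed:

# list Jobs created in the cluster's namespace
kubectl -n postgres-operator get jobs
# the PostgresCluster status and events usually show why a backup did not start
kubectl -n postgres-operator describe postgrescluster hippo-azure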
I also tried scheduled backups, but they failed as well:
- name: repo2
  schedules:
    full: "18 * * * *"
    differential: "0 1 * * 1-6"
  azure:
    container: "pgo"
This is taken from the official documentation, which says a CronJob will be created, but no Job or CronJob is created.
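If the schedules were accepted, they should appear as CronJobs in the same namespace; a minimal check:

kubectl -n postgres-operator get cronjobs
# recent events can explain a rejected repo or schedule
kubectl -n postgres-operator get events --sort-by=.lastTimestamp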
Can anyone give me some advice?
Thx!

Related

Apache Flink Operator - enable azure-fs-hadoop

I am trying to run a Flink job on k8s using the Flink Operator (https://github.com/apache/flink-kubernetes-operator) that uses a connection to Azure Blob Storage as described here: https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/filesystems/azure/
Following that guide, I need to copy the jar file flink-azure-fs-hadoop-1.15.0.jar from one directory to another.
I have already tried to do it via the podTemplate and command functionality, but unfortunately it does not work and the file does not appear in the destination directory.
Can you guide me on how to do it properly?
Below you can find my FlinkDeployment file.
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  namespace: flink
  name: basic-example
spec:
  image: flink:1.15
  flinkVersion: v1_15
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
  serviceAccount: flink
  podTemplate:
    apiVersion: v1
    kind: Pod
    metadata:
      name: pod-template
    spec:
      serviceAccount: flink
      containers:
        - name: flink-main-container
          volumeMounts:
            - mountPath: /opt/flink/data
              name: flink-data
          # command:
          #   - "touch"
          #   - "/tmp/test.txt"
      volumes:
        - name: flink-data
          emptyDir: { }
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
    podTemplate:
      apiVersion: v1
      kind: Pod
      metadata:
        name: job-manager-pod-template
      spec:
        initContainers:
          - name: fetch-jar
            image: cirrusci/wget
            volumeMounts:
              - mountPath: /opt/flink/data
                name: flink-data
            command:
              - "wget"
              - "LINK_TO_CUSTOM_JAR_FILE_ON_AZURE_BLOB_STORAGE"
              - "-O"
              - "/opt/flink/data/test.jar"
        containers:
          - name: flink-main-container
            command:
              - "touch"
              - "/tmp/test.txt"
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  job:
    jarURI: local:///opt/flink/data/test.jar
    parallelism: 2
    upgradeMode: stateless
    state: running
  ingress:
    template: "CUSTOM_LINK_TO_AZURE"
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt
      kubernetes.io/ingress.allow-http: 'false'
      traefik.ingress.kubernetes.io/router.entrypoints: websecure
      traefik.ingress.kubernetes.io/router.tls: 'true'
      traefik.ingress.kubernetes.io/router.tls.options: default
Since you are using the stock Flink 1.15 image, the Azure filesystem plugin comes built in. You can enable it by setting the ENABLE_BUILT_IN_PLUGINS environment variable:
spec:
  podTemplate:
    containers:
      # Do not change the main container name
      - name: flink-main-container
        env:
          - name: ENABLE_BUILT_IN_PLUGINS
            value: flink-azure-fs-hadoop-1.15.0.jar
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/docker/#using-filesystem-plugins
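With the plugin enabled, the storage account credentials still need to be supplied through flinkConfiguration. A minimal sketch of the key-based wasb(s) setup from the Azure filesystem docs linked earlier in the question; <account-name> and <account-key> are placeholders, not values from this deployment:

flinkConfiguration:
  taskmanager.numberOfTaskSlots: "2"
  # key-based authentication for wasb(s)://; replace the placeholders
  fs.azure.account.key.<account-name>.blob.core.windows.net: <account-key>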

Files in AIRFLOW_HOME (which is an Azure File Share MOUNT) are created as root

I have set up Airflow in Azure Cloud (Azure Container Apps) and attached an Azure file share as an external mount/volume.
1. I ran the **airflow init** service; it created the `airflow.cfg` and `webserver_config.py` files in **AIRFLOW_HOME (/opt/airflow)**, which is actually the Azure-mounted file system.
2. I ran the **airflow webserver** service; it created the `airflow-webserver.pid` file in **AIRFLOW_HOME (/opt/airflow)**, which is again the Azure-mounted file system.
The problem is that all of the files above are created as the root user and group, not as the airflow user (50000), even though I set the AIRFLOW_UID environment variable to 50000 when creating the container app. Because of this my webserver does not start and throws the error below:
PermissionError: [Errno 1] Operation not permitted: '/opt/airflow/airflow-webserver.pid'
Note: Azure Container Apps does not allow running root/sudo commands, otherwise I could solve this with a simple chown.
Another problem is that the Airflow configuration passed through environment variables is never picked up, e.g.:
- name: AIRFLOW__API__AUTH_BACKENDS
  value: 'airflow.api.auth.backend.basic_auth'
Attached screenshot for reference
Your help is much appreciated!
YAML file that I use to create my container app:
id: /subscriptions/1234/resourceGroups/<my-res-group>/providers/Microsoft.App/containerApps/<app-name>
identity:
  type: None
location: eastus2
name: webservice
properties:
  configuration:
    activeRevisionsMode: Single
    registries: []
  managedEnvironmentId: /subscriptions/1234/resourceGroups/<my-res-group>/providers/Microsoft.App/managedEnvironments/container-app-env
  template:
    containers:
      - command:
          - /bin/bash
          - -c
          - exec /entrypoint airflow webserver
        env:
          - name: AIRFLOW__API__AUTH_BACKENDS
            value: 'airflow.api.auth.backend.basic_auth'
          - name: AIRFLOW__CELERY__BROKER_URL
            value: redis://:#myredis.redis.cache.windows.net:6379/0
          - name: AIRFLOW__CELERY__RESULT_BACKEND
            value: db+postgresql://user:pass#postres-db-servconn.postgres.database.azure.com/airflow?sslmode=require
          - name: AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION
            value: 'true'
          - name: AIRFLOW__CORE__EXECUTOR
            value: CeleryExecutor
          - name: AIRFLOW__CORE__SQL_ALCHEMY_CONN
            value: postgresql+psycopg2://user:pass#postres-db-servconn.postgres.database.azure.com/airflow?sslmode=require
          - name: AIRFLOW__DATABASE__SQL_ALCHEMY_CONN
            value: postgresql+psycopg2://user:pass#postres-db-servconn.postgres.database.azure.com/airflow?sslmode=require
          - name: AIRFLOW__CORE__LOAD_EXAMPLES
            value: 'false'
          - name: AIRFLOW_UID
            value: 50000
        image: docker.io/apache/airflow:latest
        name: wsr
        volumeMounts:
          - volumeName: randaf-azure-files-volume
            mountPath: /opt/airflow
        probes: []
        resources:
          cpu: 0.25
          memory: 0.5Gi
    scale:
      maxReplicas: 3
      minReplicas: 1
    volumes:
      - name: randaf-azure-files-volume
        storageName: randafstorage
        storageType: AzureFile
resourceGroup: RAND
tags:
  tagname: ws-only
type: Microsoft.App/containerApps
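One way to confirm which user the container actually runs as, and whether AIRFLOW_UID is visible inside it, is an interactive exec into the app. This is only a diagnostic sketch; the resource group and app name are taken from the YAML above:

az containerapp exec --resource-group RAND --name webservice --command bash
# inside the container:
#   id                      # effective uid/gid that new files will be created with
#   env | grep AIRFLOW_UID  # check the variable actually reached the container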

Deploying influxDB 2 in Azure AKS cluster with provisioned storage account

I'm having trouble deploying InfluxDB 2 into my Azure AKS cluster. I'm using a simple storage account as storage. Looking at the influxdb pod, I see:
ts=2021-11-26T00:43:44.126091Z lvl=error msg="Failed to apply SQL migrations" log_id=0Y2Q~wH0000 error="database is locked"
Error: database is locked
I changed my PVC to use the CSI driver:
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: sc-influxdb
  namespace: #{NAMESPACE}#
provisioner: file.csi.azure.com
allowVolumeExpansion: true
parameters:
  storageAccount: #{STORAGE_ACCOUNT_NAME}#
  location: #{STORAGE_ACCOUNT_LOCATION}#
  # Check driver parameters here:
  # https://github.com/kubernetes-sigs/azurefile-csi-driver/blob/master/docs/driver-parameters.md
reclaimPolicy: Delete
volumeBindingMode: Immediate
mountOptions:
  - dir_mode=0777
  - file_mode=0777
  - uid=0
  - gid=0
  - mfsymlinks
  - cache=strict  # https://linux.die.net/man/8/mount.cifs
  - nosharesock   # reduce probability of reconnect race
  - actimeo=30    # reduce latency for metadata-heavy workload
---
# Create a Secret to hold the name and key of the Storage Account
# Remember: values are base64 encoded
apiVersion: v1
kind: Secret
metadata:
  name: #{STORAGE_ACCOUNT_NAME}#
  namespace: #{NAMESPACE}#
type: Opaque
data:
  azurestorageaccountname: #{STORAGE_ACCOUNT_NAME_B64}#
  azurestorageaccountkey: #{STORAGE_ACCOUNT_KEY_B64}#
---
# Create a persistent volume, with the corresponding StorageClass and the reference to the Azure File secret.
# Remember: create the share in the storage account, otherwise the pods will fail with a "No such file or directory"
apiVersion: v1
kind: PersistentVolume
metadata:
  name: influxdb-pv
spec:
  capacity:
    storage: 5Ti
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: sc-influxdb
  claimRef:
    name: influxdb-pvc
    namespace: #{NAMESPACE}#
  azureFile:
    secretName: #{STORAGE_ACCOUNT_NAME}#
    secretNamespace: #{NAMESPACE}#
    shareName: influxdb
    readOnly: false
  mountOptions:
    - dir_mode=0777
    - file_mode=0777
    - uid=0
    - gid=0
    - mfsymlinks
    - cache=strict
    - nosharesock
    - nobrl
---
# Create a PersistentVolumeClaim referencing the StorageClass and the volume
# Remember: this is a static scenario. The volume was created in the previous step.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: influxdb-pvc
  namespace: #{NAMESPACE}#
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Ti
  storageClassName: sc-influxdb
  volumeName: influxdb-pv
In my values.yml I defined my persistence as:
## Persist data to a persistent volume
##
persistence:
  enabled: true
  ## If true will use an existing PVC instead of creating one
  useExisting: true
  ## Name of existing PVC to be used in the influx deployment
  name: influxdb-pvc
  ## influxdb data Persistent Volume Storage Class
  ## If defined, storageClassName: <storageClass>
  ## If set to "-", storageClassName: "", which disables dynamic provisioning
  ## If undefined (the default) or set to null, no storageClassName spec is
  ## set, choosing the default provisioner. (gp2 on AWS, standard on
  ## GKE, AWS & OpenStack)
  ##
  # storageClass: sc-influxdb
  size: 5Ti
To install I ran:
helm upgrade --install influxdb influxdata/influxdb2 -n influxdb -f values.yml
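Before digging into InfluxDB itself, it is worth confirming that the static PV/PVC pair actually bound and that the pod mounted the share. These are plain kubectl checks, with the namespace taken from the helm command above:

kubectl get pv influxdb-pv
kubectl -n influxdb get pvc influxdb-pvc
kubectl -n influxdb get pods
# mount problems show up in the pod's events
kubectl -n influxdb describe pod <influxdb-pod-name>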

Using a PV in an OpenShift 3 cron job

I have been able to successfully create a cron job for my OpenShift 3 project. The project is a lift and shift from an existing Linux web server, and part of the existing application requires several cron tasks to run. The one I am looking at right now is a daily update to the application's database. As part of the execution of the cron job I want to write to a log file. There is already a PV/PVC defined for the main application, and I was intending to use that to hold the logs from my cron job, but it seems the cron job is not being given access to the PV.
I am using the following inProgress.yml to define the cron job:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: in-progress
spec:
  schedule: "*/5 * * * *"
  concurrencyPolicy: "Replace"
  startingDeadlineSeconds: 200
  suspend: false
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            parent: "cronjobInProgress"
        spec:
          containers:
            - name: in-progress
              image: <image name>
              command: ["php", "inProgress.php"]
      restartPolicy: OnFailure
      volumeMounts:
        - mountPath: /data-pv
          name: log-vol
      volumes:
        - name: log-vol
          persistentVolumeClaim:
            claimName: data-pv
I am using the following command to create the cron job
oc create -f inProgress.yml
When the job runs, it produces the following errors:
PHP Warning: fopen(/data-pv/logs/2022-04-27-app.log): failed to open stream: No such file or directory in /opt/app-root/src/errorHandler.php on line 75
WARNING: [2] mkdir(): Permission denied, line 80 in file /opt/app-root/src/errorLogger.php
WARNING: [2] fopen(/data-pv/logs/2022-04-27-inprogress.log): failed to open stream: No such file or directory, line 60 in file /opt/app-root/src/errorLogger.php
Looking at the YAML for the pod that is executed, there is no mention of data-pv; it appears as though the secret volumeMount, which has been added by OpenShift, is removing any further volumeMounts.
apiVersion: v1
kind: Pod
metadata:
  annotations:
    openshift.io/scc: restricted
  creationTimestamp: '2022-04-27T13:25:04Z'
  generateName: in-progress-1651065900-
...
  volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-n9jsw
      readOnly: true
...
  volumes:
    - name: default-token-n9jsw
      secret:
        defaultMode: 420
        secretName: default-token-n9jsw
How can I access the PV from within the cron job?
Your manifest is incorrect. The volumes block needs to be part of the spec.jobTemplate.spec.template.spec, that is, it needs to be indented at the same level as spec.jobTemplate.spec.template.spec.containers. In its current position it is invisible to OpenShift. See e.g. this pod example.
Similarly, volumeMounts and restartPolicy are arguments to the container block, and need to be indented accordingly.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: in-progress
spec:
  schedule: '*/5 * * * *'
  concurrencyPolicy: Replace
  startingDeadlineSeconds: 200
  suspend: false
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            parent: cronjobInProgress
        spec:
          containers:
            - name: in-progress
              image: <image name>
              command:
                - php
                - inProgress.php
              restartPolicy: OnFailure
              volumeMounts:
                - mountPath: /data-pv
                  name: log-vol
          volumes:
            - name: log-vol
              persistentVolumeClaim:
                claimName: data-pv
Thanks for the informative response larsks.
OpenShift displayed the following when I copied your manifest suggestions:
$ oc create -f InProgress.yml
The CronJob "in-progress" is invalid: spec.jobTemplate.spec.template.spec.restartPolicy: Unsupported value: "Always": supported values: "OnFailure", "Never"
As your answer was very helpful, I was able to resolve this by moving restartPolicy: OnFailure up to the pod spec, so the final manifest is below.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: in-progress
spec:
  schedule: "*/5 * * * *"
  concurrencyPolicy: "Replace"
  startingDeadlineSeconds: 200
  suspend: false
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            parent: "cronjobInProgress"
        spec:
          restartPolicy: OnFailure
          containers:
            - name: in-progress
              image: <image name>
              command: ["php", "updateToInProgress.php"]
              volumeMounts:
                - mountPath: /data-pv
                  name: log-vol
          volumes:
            - name: log-vol
              persistentVolumeClaim:
                claimName: data-pv
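To verify the fix, the CronJob and the Jobs it spawns can be checked with the oc client; the label selector below comes from the manifest above:

oc get cronjob in-progress
oc get jobs
# once a run has started, confirm the PV is mounted in the spawned pod
oc describe pod -l parent=cronjobInProgress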

Nextcloud with Replicas on Azure Kubernetes - Failing to Mount Azure Files ReadWriteMany Volume

I'm trying to deploy Nextcloud with HPA (replicas, horizontal scaling) on Azure Kubernetes with the official Nextcloud Helm chart and a ReadWriteMany volume created following these official instructions, but the volume never mounts and I get this (or some variation of it) error:
kind: Event
apiVersion: v1
metadata:
  name: nextcloud-6bc9b947bf-z6rlh.16bf7711bc2827a5
  namespace: nextcloud
  uid: c3c5619b-19da-4070-afbb-24bce111ddbe
  resourceVersion: '55858'
  creationTimestamp: '2021-12-10T18:08:27Z'
  managedFields:
    - manager: kubelet
      operation: Update
      apiVersion: v1
      time: '2021-12-10T18:08:27Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:count: {}
        f:firstTimestamp: {}
        f:involvedObject: {}
        f:lastTimestamp: {}
        f:message: {}
        f:reason: {}
        f:source:
          f:component: {}
          f:host: {}
        f:type: {}
involvedObject:
  kind: Pod
  namespace: nextcloud
  name: nextcloud-6bc9b947bf-z6rlh
  uid: 6106d13f-7033-4a4e-a6e9-a8e3947c52a4
  apiVersion: v1
  resourceVersion: '55764'
reason: FailedMount
message: >
  MountVolume.MountDevice failed for volume "nextcloud-rwx" : rpc error: code =
  Internal desc = volume(#azure-secret#aksshare#) mount
  "//nextcloudcluster.file.core.windows.net/aksshare" on
  "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/nextcloud-rwx/globalmount"
  failed with mount failed: exit status 32
  Mounting command: mount
  Mounting arguments: -t cifs -o
  dir_mode=0777,file_mode=0777,gid=33,mfsymlinks,actimeo=30,<masked>
  //nextcloudcluster.file.core.windows.net/aksshare
  /var/lib/kubelet/plugins/kubernetes.io/csi/pv/nextcloud-rwx/globalmount
  Output: mount error(13): Permission denied
  Refer to the mount.cifs(8) manual page (e.g. man mount.cifs) and kernel log
  messages (dmesg)
source:
  component: kubelet
  host: aks-agentpool-16596208-vmss000002
firstTimestamp: '2021-12-10T18:08:27Z'
lastTimestamp: '2021-12-10T18:08:35Z'
count: 5
type: Warning
eventTime: null
reportingComponent: ''
reportingInstance: ''
Here is my PersistentVolume yaml:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nextcloud-rwx
  namespace: nextcloud
spec:
  capacity:
    storage: 32Gi
  accessModes:
    - ReadWriteMany
  azureFile:
    secretName: azure-secret
    shareName: aksshare
    readOnly: false
  mountOptions:
    - dir_mode=0777
    - file_mode=0777
    - gid=33
    - mfsymlinks
PersistentVolumeClaim yaml:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nextcloud-rwx
  namespace: nextcloud
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  resources:
    requests:
      storage: 32Gi
I've also tried changing uid and gid to 0, 1000, etc., and I get an even more egregious permission-denied message because it doesn't "match the fsgroup(33)" (hence why I tried gid=33).
Any ideas would be greatly appreciated! Thank you for your time.
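For a mount error(13) Permission denied on Azure Files, one thing worth re-checking is the azure-secret itself; the azureFile mount expects exactly the key names azurestorageaccountname and azurestorageaccountkey. A minimal sketch of recreating it with placeholder values (not the actual account details from this cluster):

kubectl -n nextcloud create secret generic azure-secret \
  --from-literal=azurestorageaccountname=<storage-account-name> \
  --from-literal=azurestorageaccountkey='<storage-account-key>'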
