I am using postgres-operator version 1.8.2 and trying to create a Postgres cluster with a persistent volume on Azure Blob Storage. Every cluster I try to create fails. Please find the error below:
2022-07-18 01:11:24,907 - bootstrapping - INFO - Figuring out my environment (Google? AWS? Openstack? Local?)
2022-07-18 01:11:24,924 - bootstrapping - INFO - No meta-data available for this provider
2022-07-18 01:11:24,925 - bootstrapping - INFO - Looks like your running unsupported
2022-07-18 01:11:25,011 - bootstrapping - INFO - Configuring bootstrap
2022-07-18 01:11:25,011 - bootstrapping - INFO - Configuring certificate
2022-07-18 01:11:25,011 - bootstrapping - INFO - Generating ssl self-signed certificate
2022-07-18 01:11:25,420 - bootstrapping - INFO - Configuring pgbouncer
2022-07-18 01:11:25,420 - bootstrapping - INFO - No PGBOUNCER_CONFIGURATION was specified, skipping
2022-07-18 01:11:25,420 - bootstrapping - INFO - Configuring patroni
2022-07-18 01:11:25,428 - bootstrapping - INFO - Writing to file /run/postgres.yml
2022-07-18 01:11:25,428 - bootstrapping - INFO - Configuring pam-oauth2
2022-07-18 01:11:25,429 - bootstrapping - INFO - Writing to file /etc/pam.d/postgresql
2022-07-18 01:11:25,429 - bootstrapping - INFO - Configuring log
2022-07-18 01:11:25,429 - bootstrapping - INFO - Configuring wal-e
2022-07-18 01:11:25,429 - bootstrapping - INFO - Configuring crontab
2022-07-18 01:11:25,429 - bootstrapping - INFO - Skipping creation of renice cron job due to lack of SYS_NICE capability
2022-07-18 01:11:25,430 - bootstrapping - INFO - Configuring pgqd
2022-07-18 01:11:25,430 - bootstrapping - INFO - Configuring standby-cluster
2022-07-18 01:11:26,966 WARNING: Kubernetes RBAC doesn't allow GET access to the 'kubernetes' endpoint in the 'default' namespace. Disabling 'bypass_api_service'.
2022-07-18 01:11:27,018 INFO: No PostgreSQL configuration items changed, nothing to reload.
2022-07-18 01:11:27,111 INFO: Lock owner: None; I am platform-postgres-ha-cluster-0
2022-07-18 01:11:27,227 INFO: trying to bootstrap a new cluster
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.utf-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
fixing permissions on existing directory /home/postgres/pgdata/pgroot/data ... 2022-07-18T01:11:27.439948215Z ok
creating subdirectories ... 2022-07-18T01:11:30.354678043Z ok
selecting default max_connections ... 2022-07-18T01:11:30.719106299Z 100
selecting default shared_buffers ... 2022-07-18T01:11:30.913778725Z 128MB
selecting default timezone ... 2022-07-18T01:11:30.936790305Z Etc/UTC
selecting dynamic shared memory implementation ... 2022-07-18T01:11:30.937331209Z posix
creating configuration files ... 2022-07-18T01:11:31.421589304Z ok
running bootstrap script ... 2022-07-18T01:11:32.551608060Z LOG: could not link file "pg_xlog/xlogtemp.70" to "pg_xlog/000000010000000000000001": Function not implemented
FATAL: could not open file "pg_xlog/000000010000000000000001": No such file or directory
child process exited with exit code 1
initdb: removing contents of data directory "/home/postgres/pgdata/pgroot/data"
pg_ctl: database system initialization failed
2022-07-18 01:11:32,647 INFO: removing initialize key after failed attempt to bootstrap the cluster
2022-07-18 01:11:32,725 INFO: renaming data directory to /home/postgres/pgdata/pgroot/data_2022-07-18-01-11-32
My postgres-cluster.yaml is:
kind: "postgresql"
apiVersion: "acid.zalan.do/v1"
metadata:
name: "platform-postgres-ha-cluster"
namespace: "postgres"
labels:
team: platform
spec:
teamId: "platform"
postgresql:
version: "9.6"
numberOfInstances: 3
enableMasterLoadBalancer: true
enableReplicaLoadBalancer: true
volume:
storageClass: blob-fuse-retained #storageClass
subPath: postgres-1
size: "10Gi"
users:
test: []
databases:
test: test
allowedSourceRanges:
# IP ranges to access your cluster go here
# Your odd host IP
- 127.0.0.1/32
resources:
requests:
cpu: 100m
memory: 1Gi
limits:
cpu: 500m
memory: 5Gi
I am using a Helm chart to install the operator. My values.yml is:
image:
registry: registry.opensource.zalan.do
repository: acid/postgres-operator
tag: v1.8.2
pullPolicy: "IfNotPresent"
# Optionally specify an array of imagePullSecrets.
# Secrets must be manually created in the namespace.
# ref: https://kubernetes.io/docs/concepts/containers/images/#specifying-imagepullsecrets-on-a-pod
# imagePullSecrets:
# - name: myRegistryKeySecretName
podAnnotations: {}
podLabels: {}
configTarget: "OperatorConfigurationCRD"
# JSON logging format
enableJsonLogging: false
# general configuration parameters
configGeneral:
# the deployment should create/update the CRDs
enable_crd_registration: true
# specify categories under which crds should be listed
crd_categories:
- "all"
# update only the statefulsets without immediately doing the rolling update
enable_lazy_spilo_upgrade: false
# set the PGVERSION env var instead of providing the version via postgresql.bin_dir in SPILO_CONFIGURATION
enable_pgversion_env_var: true
# start any new database pod without limitations on shm memory
enable_shm_volume: true
# enables backwards compatible path between Spilo 12 and Spilo 13+ images
enable_spilo_wal_path_compat: true
# etcd connection string for Patroni. Empty uses K8s-native DCS.
etcd_host: ""
# Select if setup uses endpoints (default), or configmaps to manage leader (DCS=k8s)
# kubernetes_use_configmaps: false
# Spilo docker image
docker_image: registry.opensource.zalan.do/acid/spilo-14:2.1-p6
# min number of instances in Postgres cluster. -1 = no limit
min_instances: -1
# max number of instances in Postgres cluster. -1 = no limit
max_instances: -1
# period between consecutive repair requests
repair_period: 5m
# period between consecutive sync requests
resync_period: 30m
# can prevent certain cases of memory overcommitment
# set_memory_request_to_limit: false
# map of sidecar names to docker images
# sidecar_docker_images:
# example: "exampleimage:exampletag"
# number of routines the operator spawns to process requests concurrently
workers: 8
# parameters describing Postgres users
configUsers:
# roles to be granted to database owners
# additional_owner_roles:
# - cron_admin
# enable password rotation for app users that are not database owners
enable_password_rotation: false
# rotation interval for updating credentials in K8s secrets of app users
password_rotation_interval: 600000
# retention interval to keep rotation users
password_rotation_user_retention: 600000
# postgres username used for replication between instances
replication_username: standby
# postgres superuser name to be created by initdb
super_username: postgres
configMajorVersionUpgrade:
# "off": no upgrade, "manual": manifest triggers action, "full": minimal version violation triggers too
major_version_upgrade_mode: "off"
# upgrades will only be carried out for clusters of listed teams when mode is "off"
# major_version_upgrade_team_allow_list:
# - acid
# minimal Postgres major version that will not automatically be upgraded
minimal_major_version: "9.6"
# target Postgres major version when upgrading clusters automatically
target_major_version: "14"
configKubernetes:
# list of additional capabilities for postgres container
# additional_pod_capabilities:
# - "SYS_NICE"
# default DNS domain of K8s cluster where operator is running
cluster_domain: cluster.local
# additional labels assigned to the cluster objects
cluster_labels:
application: spilo
# label assigned to Kubernetes objects created by the operator
cluster_name_label: postgres-ha
# additional annotations to add to every database pod
# custom_pod_annotations:
# keya: valuea
# keyb: valueb
# key name for annotation that compares manifest value with current date
# delete_annotation_date_key: "delete-date"
# key name for annotation that compares manifest value with cluster name
# delete_annotation_name_key: "delete-clustername"
# list of annotations propagated from cluster manifest to statefulset and deployment
# downscaler_annotations:
# - deployment-time
# - downscaler/*
# allow user secrets in other namespaces than the Postgres cluster
enable_cross_namespace_secret: false
# enables initContainers to run actions before Spilo is started
enable_init_containers: true
# toggles pod anti affinity on the Postgres pods
enable_pod_antiaffinity: false
# toggles PDB to set minAvailable to 0 or 1
enable_pod_disruption_budget: true
# enables sidecar containers to run alongside Spilo in the same pod
enable_sidecars: true
# annotations to be ignored when comparing statefulsets, services etc.
# ignored_annotations:
# - k8s.v1.cni.cncf.io/network-status
# namespaced name of the secret containing infrastructure roles names and passwords
# infrastructure_roles_secret_name: postgresql-infrastructure-roles
# list of annotation keys that can be inherited from the cluster manifest
# inherited_annotations:
# - owned-by
# list of label keys that can be inherited from the cluster manifest
# inherited_labels:
# - application
# - environment
# timeout for successful migration of master pods from unschedulable node
# master_pod_move_timeout: 20m
# set of labels that a running and active node should possess to be considered ready
# node_readiness_label:
# status: ready
# defines how nodeAffinity from manifest should be merged with node_readiness_label
# node_readiness_label_merge: "OR"
# namespaced name of the secret containing the OAuth2 token to pass to the teams API
# oauth_token_secret_name: postgresql-operator
# defines the template for PDB (Pod Disruption Budget) names
pdb_name_format: "postgres-{cluster}-pdb"
# override topology key for pod anti affinity
pod_antiaffinity_topology_key: "kubernetes.io/hostname"
# namespaced name of the ConfigMap with environment variables to populate on every pod
pod_environment_configmap: "postgres-pod-config"
# name of the Secret (in cluster namespace) with environment variables to populate on every pod
pod_environment_secret: "postgres-pod-secrets"
# specify the pod management policy of stateful sets of Postgres clusters
pod_management_policy: "ordered_ready"
# label assigned to the Postgres pods (and services/endpoints)
pod_role_label: spilo-role
# service account definition as JSON/YAML string to be used by postgres cluster pods
# pod_service_account_definition: ""
# role binding definition as JSON/YAML string to be used by pod service account
# pod_service_account_role_binding_definition: ""
# Postgres pods are terminated forcefully after this timeout
pod_terminate_grace_period: 5m
# template for database user secrets generated by the operator,
# here username contains the namespace in the format namespace.username
# if the user is in different namespace than cluster and cross namespace secrets
# are enabled via `enable_cross_namespace_secret` flag in the configuration.
secret_name_template: "{username}.{cluster}.credentials.{tprkind}.{tprgroup}"
# set user and group for the spilo container (required to run Spilo as non-root process)
# spilo_runasuser: 101
# spilo_runasgroup: 103
# group ID with write-access to volumes (required to run Spilo as non-root process)
# spilo_fsgroup: 103
# whether the Spilo container should run in privileged mode
spilo_privileged: false
# whether the Spilo container should run with additional permissions other than parent.
# required by cron which needs setuid
spilo_allow_privilege_escalation: true
# storage resize strategy, available options are: ebs, pvc, off
storage_resize_mode: pvc
# pod toleration assigned to instances of every Postgres cluster
# toleration:
# key: db-only
# operator: Exists
# effect: NoSchedule
# operator watches for postgres objects in the given namespace
watched_namespace: "*" # listen to all namespaces
# configure resource requests for the Postgres pods
configPostgresPodResources:
# CPU limits for the postgres containers
default_cpu_limit: "2"
# CPU request value for the postgres containers
default_cpu_request: 1000m
# memory limits for the postgres containers
default_memory_limit: 5Gi
# memory request value for the postgres containers
default_memory_request: 100Mi
# hard CPU minimum required to properly run a Postgres cluster
min_cpu_limit: 250m
# hard memory minimum required to properly run a Postgres cluster
min_memory_limit: 250Mi
# timeouts related to some operator actions
configTimeouts:
# interval between consecutive attempts of operator calling the Patroni API
patroni_api_check_interval: 20s
# timeout when waiting for successful response from Patroni API
patroni_api_check_timeout: 20s
# timeout when waiting for the Postgres pods to be deleted
pod_deletion_wait_timeout: 10m
# timeout when waiting for pod role and cluster labels
pod_label_wait_timeout: 10m
# interval between consecutive attempts waiting for postgresql CRD to be created
ready_wait_interval: 20s
# timeout for the complete postgres CRD creation
ready_wait_timeout: 30s
# interval to wait between consecutive attempts to check for some K8s resources
resource_check_interval: 10s
# timeout when waiting for the presence of a certain K8s resource (e.g. Sts, PDB)
resource_check_timeout: 10m
# configure behavior of load balancers
configLoadBalancer:
# DNS zone for cluster DNS name when load balancer is configured for cluster
db_hosted_zone: db.example.com
# annotations to apply to service when load balancing is enabled
# custom_service_annotations:
# keyx: valuez
# keya: valuea
# toggles service type load balancer pointing to the master pod of the cluster
enable_master_load_balancer: true
# toggles service type load balancer pointing to the master pooler pod of the cluster
enable_master_pooler_load_balancer: false
# toggles service type load balancer pointing to the replica pod of the cluster
enable_replica_load_balancer: true
# toggles service type load balancer pointing to the replica pooler pod of the cluster
enable_replica_pooler_load_balancer: true
# define external traffic policy for the load balancer
external_traffic_policy: "Cluster"
# defines the DNS name string template for the master load balancer cluster
master_dns_name_format: "{cluster}.{team}.{hostedzone}"
# defines the DNS name string template for the replica load balancer cluster
replica_dns_name_format: "{cluster}-repl.{team}.{hostedzone}"
# options to aid debugging of the operator itself
configDebug:
# toggles verbose debug logs from the operator
debug_logging: true
# toggles operator functionality that require access to the postgres database
enable_database_access: true
# parameters affecting logging and REST API listener
configLoggingRestApi:
# REST API listener listens to this port
api_port: 8080
# number of entries in the cluster history ring buffer
cluster_history_entries: 1000
# number of lines in the ring buffer used to store cluster logs
ring_log_lines: 100
# configure interaction with non-Kubernetes objects from AWS or GCP
configAwsOrGcp:
# Additional Secret (aws or gcp credentials) to mount in the pod
# additional_secret_mount: "some-secret-name"
# Path to mount the above Secret in the filesystem of the container(s)
# additional_secret_mount_path: "/some/dir"
# AWS region used to store EBS volumes
aws_region: eu-central-1
# enable automatic migration on AWS from gp2 to gp3 volumes
enable_ebs_gp3_migration: false
# defines maximum volume size in GB until which auto migration happens
# enable_ebs_gp3_migration_max_size: 1000
# GCP credentials that will be used by the operator / pods
# gcp_credentials: ""
# AWS IAM role to supply in the iam.amazonaws.com/role annotation of Postgres pods
# kube_iam_role: ""
# S3 bucket to use for shipping postgres daily logs
# log_s3_bucket: ""
# S3 bucket to use for shipping WAL segments with WAL-E
# wal_s3_bucket: ""
# GCS bucket to use for shipping WAL segments with WAL-E
# wal_gs_bucket: ""
# Azure Storage Account to use for shipping WAL segments with WAL-G
wal_az_storage_account: "azure-account-name"
# configure K8s cron job managed by the operator
configLogicalBackup:
# image for pods of the logical backup job (example runs pg_dumpall)
logical_backup_docker_image: "registry.opensource.zalan.do/acid/logical-backup:v1.8.0"
# path of google cloud service account json file
# logical_backup_google_application_credentials: ""
# prefix for the backup job name
logical_backup_job_prefix: "logical-backup-"
# storage provider - either "s3" or "gcs"
logical_backup_provider: "s3"
# S3 Access Key ID
logical_backup_s3_access_key_id: ""
# S3 bucket to store backup results
logical_backup_s3_bucket: "my-bucket-url"
# S3 region of bucket
logical_backup_s3_region: ""
# S3 endpoint url when not using AWS
logical_backup_s3_endpoint: ""
# S3 Secret Access Key
logical_backup_s3_secret_access_key: ""
# S3 server side encryption
logical_backup_s3_sse: "AES256"
# S3 retention time for stored backups for example "2 week" or "7 days"
logical_backup_s3_retention_time: ""
# backup schedule in the cron format
logical_backup_schedule: "30 00 * * *"
# automate creation of human users with teams API service
configTeamsApi:
# team_admin_role will have the rights to grant roles coming from PG manifests
enable_admin_role_for_users: true
# operator watches for PostgresTeam CRs to assign additional teams and members to clusters
enable_postgres_team_crd: false
# toggle to create additional superuser teams from PostgresTeam CRs
enable_postgres_team_crd_superusers: false
# toggle to automatically rename roles of former team members and deny LOGIN
enable_team_member_deprecation: false
# toggle to grant superuser to team members created from the Teams API
enable_team_superuser: false
# toggles usage of the Teams API by the operator
enable_teams_api: false
# should contain a URL to use for authentication (username and token)
# pam_configuration: https://info.example.com/oauth2/tokeninfo?access_token= uid realm=/employees
# operator will add all team member roles to this group and add a pg_hba line
pam_role_name: zalandos
# List of teams which members need the superuser role in each Postgres cluster
postgres_superuser_teams:
- postgres_superusers
# List of roles that cannot be overwritten by an application, team or infrastructure role
protected_role_names:
- admin
- cron_admin
# Suffix to add if members are removed from TeamsAPI or PostgresTeam CRD
role_deletion_suffix: "_deleted"
# role name to grant to team members created from the Teams API
team_admin_role: admin
# postgres config parameters to apply to each team member role
team_api_role_configuration:
log_statement: all
# URL of the Teams API service
# teams_api_url: http://fake-teams-api.default.svc.cluster.local
# configure connection pooler deployment created by the operator
configConnectionPooler:
# db schema to install lookup function into
connection_pooler_schema: "pooler"
# db user for pooler to use
connection_pooler_user: "pooler"
# docker image
connection_pooler_image: "registry.opensource.zalan.do/acid/pgbouncer:master-22"
# max db connections the pooler should hold
connection_pooler_max_db_connections: 60
# default pooling mode
connection_pooler_mode: "transaction"
# number of pooler instances
connection_pooler_number_of_instances: 2
# default resources
connection_pooler_default_cpu_request: 500m
connection_pooler_default_memory_request: 100Mi
connection_pooler_default_cpu_limit: "1"
connection_pooler_default_memory_limit: 100Mi
# Zalando's internal CDC stream feature
enableStreams: false
rbac:
# Specifies whether RBAC resources should be created
create: true
# Specifies whether ClusterRoles that are aggregated into the K8s default roles should be created. (https://kubernetes.io/docs/reference/access-authn-authz/rbac/#default-roles-and-role-bindings)
createAggregateClusterRoles: false
serviceAccount:
# Specifies whether a ServiceAccount should be created
create: true
# The name of the ServiceAccount to use.
# If not set and create is true, a name is generated using the fullname template
name:
podServiceAccount:
# The name of the ServiceAccount to be used by postgres cluster pods
# If not set a name is generated using the fullname template and "-pod" suffix
name: "postgres-pod"
# priority class for operator pod
priorityClassName: ""
# priority class for database pods
podPriorityClassName: ""
resources:
limits:
cpu: 500m
memory: 500Mi
requests:
cpu: 100m
memory: 250Mi
securityContext:
runAsUser: 1000
runAsNonRoot: true
readOnlyRootFilesystem: true
allowPrivilegeEscalation: true
# Affinity for pod assignment
# Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
affinity: {}
# Node labels for pod assignment
# Ref: https://kubernetes.io/docs/user-guide/node-selection/
nodeSelector: {}
# Tolerations for pod assignment
# Ref: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
tolerations: []
controllerID:
# Specifies whether a controller ID should be defined for the operator
# Note, all postgres manifest must then contain the following annotation to be found by this operator
# "acid.zalan.do/controller": <controller-ID-of-the-operator>
create: false
# The name of the controller ID to use.
# If not set and create is true, a name is generated using the fullname template
name:
The Secret and ConfigMap details are below:
#postgres/psql-wale-creds
apiVersion: v1
kind: Secret
metadata:
name: postgres-pod-secrets
namespace: postgres
data: #all base64 encoded
AZURE_STORAGE_ACCESS_KEY: 'ACCESS_KEY'
AZURE_STORAGE_ACCOUNT: 'azure-account'
CLONE_AZURE_STORAGE_ACCESS_KEY: 'ACCESS_KEY'
CLONE_AZURE_STORAGE_ACCOUNT: 'azure-account'
#postgres/pod-env-overrides
apiVersion: v1
kind: ConfigMap
metadata:
name: postgres-pod-config
namespace: postgres
data:
USE_WALG_BACKUP: "true"
BACKUP_SCHEDULE: "0 3,15 * * *" # Schedule a base backup at 3:00 and 15:00
BACKUP_NUM_TO_RETAIN: "180" # For 2 backups per day, keep 90 days of base backups
WALG_AZ_PREFIX: "azure://kubernetes-test/$(SCOPE)/$(PGVERSION)"
# For point in time recovery/restore
USE_WALG_RESTORE: "true"
CLONE_WALG_AZ_PREFIX: "azure://kubernetes-test/$(CLONE_SCOPE)/$(PGVERSION)"
CLONE_USE_WALG_BACKUP: "true"
CLONE_USE_WALG_RESTORE: "true"
I am using the blob-csi-driver to connect to Azure Blob Storage. The setup works fine with other containers; it errors out only with the Postgres container. Can anyone help me solve this issue?
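For reference, the blob-fuse-retained StorageClass referenced in the cluster manifest is provisioned by the Blob CSI driver. A minimal sketch of how such a class is typically defined is below; the skuName and protocol parameters are illustrative assumptions, not my exact values:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: blob-fuse-retained
provisioner: blob.csi.azure.com    # blob-csi-driver provisioner
reclaimPolicy: Retain              # keep the blob container when the PVC is released
volumeBindingMode: Immediate
parameters:
  skuName: Standard_LRS            # storage account SKU (assumption)
  protocol: fuse                   # mount via blobfuse (assumption)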
Related
We have a small collection of Kubernetes pods which run React/Next.js UIs in a Node 16 Alpine container (node:16.18.1-alpine3.15 to be precise). All of this runs in AWS EKS 1.23. We use annotations on these pods to inject secrets from HashiCorp Vault at startup. The annotations pull the desired secrets from Vault and write them to a file on the pod. An example of those annotations is below:
vault.hashicorp.com/agent-inject: "true"
vault.hashicorp.com/agent-init-first: "true"
vault.hashicorp.com/agent-pre-populate-only: "true"
vault.hashicorp.com/role: "onejourney-ui"
vault.hashicorp.com/agent-inject-secret-config: "secret/data/onejourney-ui"
vault.hashicorp.com/agent-inject-template-config: |
{{- with secret "secret/data/onejourney-ui" -}}
export AUTH0_CLIENT_ID="{{ .Data.data.auth0_client_id }}"
export SENTRY_DSN="{{ .Data.data.sentry_admin_dsn }}"
{{- end }}
When the pod starts up, we source this file (which is created by default at /vault/secrets/config) to set environment variables and then delete the file. We do that with the following pod arguments in our Helm chart:
node:
args:
- /bin/sh
- -c
- source /vault/secrets/config; rm -rf /vault/secrets/config; yarn start-admin;
We recently upgraded some of our AWS EKS clusters from 1.23 to 1.24. After doing so, we noted that our Node applications were failing to start and entering a crash loop. Looking in the logs of these containers, the problem seemed to be that the pod was unable to locate the secrets file anymore.
Interestingly, the Vault init container completed successfully and shows that the file was successfully created...
Out of curiosity, I removed the node args that source the file, which allowed the container to start successfully, but when exec-ing into the pod I found the file WAS in fact present and had the content I was expecting. The file also had the correct owner and permissions, matching a known-good instance on EKS 1.23.
We have other containers (php-fpm) which consume secrets in the same manner; however, these were not affected on 1.24, only the Node containers were. I saw no namespace, pod, or deployment annotations added that could have been the cause. After rolling the cluster back to EKS 1.23, the deployment worked as expected.
I'm left scratching my head as to why the pod is unable to source that file on 1.24. Any suggestions on what to check or a possible cause would be greatly appreciated.
I'm working on a POC for getting a Spark cluster set up to use Kubernetes for resource management using AKS (Azure Kubernetes Service). I'm using spark-submit to submit pyspark applications to k8s in cluster mode and I've been successful in getting applications to run fine.
I set up an Azure file share to store application scripts, plus a PersistentVolume and a PersistentVolumeClaim pointing to this file share so that Spark can access the scripts from Kubernetes. This works fine for applications that don't write any output, like the pi.py example in the Spark source code, but writing any kind of output fails in this setup. I tried running a script to compute word counts, and the line
wordCounts.saveAsTextFile(f"./output/counts")
causes an exception, where wordCounts is an RDD.
Traceback (most recent call last):
File "/opt/spark/work-dir/wordcount2.py", line 14, in <module>
wordCounts.saveAsTextFile(f"./output/counts")
File "/opt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1570, in saveAsTextFile
File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o65.saveAsTextFile.
: ExitCodeException exitCode=1: chmod: changing permissions of '/opt/spark/work-dir/output/counts': Operation not permitted
The directory "counts" has been created from the spark application just fine, so it seems like it has required permissions, but this subsequent chmod that spark tries to perform internally fails. I haven't been able to figure out the cause and what exact configuration I'm missing in my commands that's causing this. Any help would be greatly appreciated.
The kubectl version I'm using is
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:45:37Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2", GitCommit:"881d4a5a3c0f4036c714cfb601b377c4c72de543", GitTreeState:"clean", BuildDate:"2021-10-21T05:13:01Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}
The spark version is 2.4.5 and the command I'm using is
<SPARK_PATH>/bin/spark-submit --master k8s://<HOST>:443 \
--deploy-mode cluster \
--name spark-pi3 \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.container.image=docker.io/datamechanics/spark:2.4.5-hadoop-3.1.0-java-8-scala-2.11-python-3.7-dm14 \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.azure-fileshare-pvc.options.claimName=azure-fileshare-pvc \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.azure-fileshare-pvc.mount.path=/opt/spark/work-dir \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.azure-fileshare-pvc.options.claimName=azure-fileshare-pvc \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.azure-fileshare-pvc.mount.path=/opt/spark/work-dir \
--verbose /opt/spark/work-dir/wordcount2.py
The PV and PVC are pretty basic. The PV yml is:
apiVersion: v1
kind: PersistentVolume
metadata:
name: azure-fileshare-pv
labels:
usage: azure-fileshare-pv
spec:
capacity:
storage: 10Gi
accessModes:
- ReadWriteMany
persistentVolumeReclaimPolicy: Retain
azureFile:
secretName: azure-storage-secret
shareName: dssparktestfs
readOnly: false
secretNamespace: spark-operator
The PVC yml is:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: azure-fileshare-pvc
# Set this annotation to NOT let Kubernetes automatically create
# a persistent volume for this volume claim.
annotations:
volume.beta.kubernetes.io/storage-class: ""
spec:
accessModes:
- ReadWriteMany
resources:
requests:
storage: 10Gi
selector:
# To make sure we match the claim with the exact volume, match the label
matchLabels:
usage: azure-fileshare-pv
Let me know if more info is needed.
The owner and group are root.
It looks like you've mounted your volume as root. Your problem:
chmod: changing permissions of '/opt/spark/work-dir/output/counts': Operation not permitted
is due to the fact that you are trying to change the permissions of a file that you are not the owner of. So you need to change the owner of the file first.
The easiest solution is to run chown on the resource you want to access. However, this is often not feasible, as it can lead to privilege escalation, and the image itself may block this possibility. In this case you can define a security context.
A security context defines privilege and access control settings for a Pod or Container. Security context settings include, but are not limited to:
Discretionary Access Control: Permission to access an object, like a file, is based on user ID (UID) and group ID (GID).
Security Enhanced Linux (SELinux): Objects are assigned security labels.
Running as privileged or unprivileged.
Linux Capabilities: Give a process some privileges, but not all the privileges of the root user.
AppArmor: Use program profiles to restrict the capabilities of individual programs.
Seccomp: Filter a process's system calls.
AllowPrivilegeEscalation: Controls whether a process can gain more privileges than its parent process. This bool directly controls whether the no_new_privs flag gets set on the container process. AllowPrivilegeEscalation is true always when the container is: 1) run as Privileged OR 2) has CAP_SYS_ADMIN.
readOnlyRootFilesystem: Mounts the container's root filesystem as read-only.
The above bullets are not a complete set of security context settings -- please see SecurityContext for a comprehensive list.
For more information about security mechanisms in Linux, see Overview of Linux Kernel Security Features
You can Configure volume permission and ownership change policy for Pods.
By default, Kubernetes recursively changes ownership and permissions for the contents of each volume to match the fsGroup specified in a Pod's securityContext when that volume is mounted. For large volumes, checking and changing ownership and permissions can take a lot of time, slowing Pod startup. You can use the fsGroupChangePolicy field inside a securityContext to control the way that Kubernetes checks and manages ownership and permissions for a volume.
Here is an example:
securityContext:
runAsUser: 1000
runAsGroup: 3000
fsGroup: 2000
fsGroupChangePolicy: "OnRootMismatch"
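For context, here is a minimal sketch of where that snippet sits in a complete Pod spec; the pod name, image and command are placeholders, while the claim name is taken from the question. With fsGroup set, files created on the mounted volume are group-owned by GID 2000, so the application does not need to chmod or chown them itself:
apiVersion: v1
kind: Pod
metadata:
  name: fsgroup-demo                   # placeholder name
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000
    fsGroupChangePolicy: "OnRootMismatch"
  volumes:
  - name: work-dir
    persistentVolumeClaim:
      claimName: azure-fileshare-pvc   # the PVC from the question
  containers:
  - name: app
    image: busybox:1.36                # placeholder image
    command: ["sh", "-c", "id && touch /opt/spark/work-dir/test && ls -ln /opt/spark/work-dir && sleep 3600"]
    volumeMounts:
    - name: work-dir
      mountPath: /opt/spark/work-dir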
See also this similar question.
I'm building a blockchain application using the Hyperledger Fabric Node SDK, and in it I have to implement the functionality of deleting a user's account.
Below is my code for this:
// Revoke the user, then delete the user from wallet.
await ca.revoke({ enrollmentID: email }, adminUser);
const identityService = ca.newIdentityService();
identityService.delete(email, adminUser, true).then( async function(response) {
await wallet.remove(email);
logger.debug("Successfully removed user from wallet!");
}).catch((error) => {
logger.error(`Getting error: ${error}`);
return error.message;
});
I've tried debugging and running each line of the code separately, and found that everything works except the identityService.delete call.
This call fails with the error Identity removal is disabled.
Looking more into this issue, I found that I have to add:
cfg:
identities:
passwordattempts: 10
allowremove: true
in the standard configuration file.
Here is the Fabric CA Server's Configuration File.
In this file the cfg is at the end just above the Operations Section.
Since I have multiple configuration files in my project, I looked through them, found the same Operations section in one of them, and added the cfg lines above it.
But the same Identity removal is disabled error still shows.
What I want to know is: after adding these lines to the configuration file, why is the error still showing? I re-created the channel after adding the lines. How do I know that my blockchain network is actually using the updated configuration file? Or is there a proper way to update a configuration file that I have not followed?
This solved the issue. I had to add the two flags --cfg.identities.allowremove and --cfg.affiliations.allowremove, then restart the Docker containers, and it worked.
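For reference, in a docker-compose based network this roughly comes down to appending the flags to the CA's start command; the service name, image tag and admin credentials in this sketch are placeholders, not my actual setup:
services:
  ca_org1:                             # placeholder service name
    image: hyperledger/fabric-ca:1.5   # placeholder tag
    environment:
      - FABRIC_CA_HOME=/etc/hyperledger/fabric-ca-server
    command: sh -c 'fabric-ca-server start -b admin:adminpw --cfg.identities.allowremove --cfg.affiliations.allowremove -d'
After changing the command, the CA container has to be recreated (for example with docker-compose up -d --force-recreate) so it picks up the new flags.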
I think something is missing in your Fabric CA server's configuration file, so you can try the file below, which is running fine for me. You can check that I have added:
cfg:
identities:
passwordattempts: 10
allowremove: true
fabric-ca-server.yaml
#############################################################################
# This is a configuration file for the fabric-ca-server command.
#
# COMMAND LINE ARGUMENTS AND ENVIRONMENT VARIABLES
# ------------------------------------------------
# Each configuration element can be overridden via command line
# arguments or environment variables. The precedence for determining
# the value of each element is as follows:
# 1) command line argument
# Examples:
# a) --port 443
# To set the listening port
# b) --ca.keyfile ../mykey.pem
# To set the "keyfile" element in the "ca" section below;
# note the '.' separator character.
# 2) environment variable
# Examples:
# a) FABRIC_CA_SERVER_PORT=443
# To set the listening port
# b) FABRIC_CA_SERVER_CA_KEYFILE="../mykey.pem"
# To set the "keyfile" element in the "ca" section below;
# note the '_' separator character.
# 3) configuration file
# 4) default value (if there is one)
# All default values are shown beside each element below.
#
# FILE NAME ELEMENTS
# ------------------
# The value of all fields whose name ends with "file" or "files" are
# name or names of other files.
# For example, see "tls.certfile" and "tls.clientauth.certfiles".
# The value of each of these fields can be a simple filename, a
# relative path, or an absolute path. If the value is not an
# absolute path, it is interpreted as being relative to the location
# of this configuration file.
#
#############################################################################
# Version of config file
version: 1.2.0
# Server's listening port (default: 7054)
port: 7054
# Enables debug logging (default: false)
debug: false
# Size limit of an acceptable CRL in bytes (default: 512000)
crlsizelimit: 512000
#############################################################################
# TLS section for the server's listening port
#
# The following types are supported for client authentication: NoClientCert,
# RequestClientCert, RequireAnyClientCert, VerifyClientCertIfGiven,
# and RequireAndVerifyClientCert.
#
# Certfiles is a list of root certificate authorities that the server uses
# when verifying client certificates.
#############################################################################
tls:
# Enable TLS (default: false)
enabled: true
# TLS for the server's listening port
certfile:
keyfile:
clientauth:
type: noclientcert
certfiles:
#############################################################################
# The CA section contains information related to the Certificate Authority
# including the name of the CA, which should be unique for all members
# of a blockchain network. It also includes the key and certificate files
# used when issuing enrollment certificates (ECerts) and transaction
# certificates (TCerts).
# The chainfile (if it exists) contains the certificate chain which
# should be trusted for this CA, where the 1st in the chain is always the
# root CA certificate.
#############################################################################
ca:
# Name of this CA
name: Org1CA
# Key file (is only used to import a private key into BCCSP)
keyfile:
# Certificate file (default: ca-cert.pem)
certfile:
# Chain file
chainfile:
#############################################################################
# The gencrl REST endpoint is used to generate a CRL that contains revoked
# certificates. This section contains configuration options that are used
# during gencrl request processing.
#############################################################################
crl:
# Specifies expiration for the generated CRL. The number of hours
# specified by this property is added to the UTC time, the resulting time
# is used to set the 'Next Update' date of the CRL.
expiry: 24h
#############################################################################
# The registry section controls how the fabric-ca-server does two things:
# 1) authenticates enrollment requests which contain a username and password
# (also known as an enrollment ID and secret).
# 2) once authenticated, retrieves the identity's attribute names and
# values which the fabric-ca-server optionally puts into TCerts
# which it issues for transacting on the Hyperledger Fabric blockchain.
# These attributes are useful for making access control decisions in
# chaincode.
# There are two main configuration options:
# 1) The fabric-ca-server is the registry.
# This is true if "ldap.enabled" in the ldap section below is false.
# 2) An LDAP server is the registry, in which case the fabric-ca-server
# calls the LDAP server to perform these tasks.
# This is true if "ldap.enabled" in the ldap section below is true,
# which means this "registry" section is ignored.
#############################################################################
registry:
# Maximum number of times a password/secret can be reused for enrollment
# (default: -1, which means there is no limit)
maxenrollments: -1
# Contains identity information which is used when LDAP is disabled
identities:
- name: admin
pass: adminpw
type: client
affiliation: ""
attrs:
hf.Registrar.Roles: "*"
hf.Registrar.DelegateRoles: "*"
hf.Revoker: true
hf.IntermediateCA: true
hf.GenCRL: true
hf.Registrar.Attributes: "*"
hf.AffiliationMgr: true
#############################################################################
# Database section
# Supported types are: "sqlite3", "postgres", and "mysql".
# The datasource value depends on the type.
# If the type is "sqlite3", the datasource value is a file name to use
# as the database store. Since "sqlite3" is an embedded database, it
# may not be used if you want to run the fabric-ca-server in a cluster.
# To run the fabric-ca-server in a cluster, you must choose "postgres"
# or "mysql".
#############################################################################
db:
type: sqlite3
datasource: fabric-ca-server.db
tls:
enabled: false
certfiles:
client:
certfile:
keyfile:
#############################################################################
# LDAP section
# If LDAP is enabled, the fabric-ca-server calls LDAP to:
# 1) authenticate enrollment ID and secret (i.e. username and password)
# for enrollment requests;
# 2) To retrieve identity attributes
#############################################################################
ldap:
# Enables or disables the LDAP client (default: false)
# If this is set to true, the "registry" section is ignored.
enabled: false
# The URL of the LDAP server
url: ldap://<adminDN>:<adminPassword>@<host>:<port>/<base>
# TLS configuration for the client connection to the LDAP server
tls:
certfiles:
client:
certfile:
keyfile:
# Attribute related configuration for mapping from LDAP entries to Fabric CA attributes
attribute:
# 'names' is an array of strings containing the LDAP attribute names which are
# requested from the LDAP server for an LDAP identity's entry
names: ['uid','member']
# The 'converters' section is used to convert an LDAP entry to the value of
# a fabric CA attribute.
# For example, the following converts an LDAP 'uid' attribute
# whose value begins with 'revoker' to a fabric CA attribute
# named "hf.Revoker" with a value of "true" (because the boolean expression
# evaluates to true).
# converters:
# - name: hf.Revoker
# value: attr("uid") =~ "revoker*"
converters:
- name:
value:
# The 'maps' section contains named maps which may be referenced by the 'map'
# function in the 'converters' section to map LDAP responses to arbitrary values.
# For example, assume a user has an LDAP attribute named 'member' which has multiple
# values which are each a distinguished name (i.e. a DN). For simplicity, assume the
# values of the 'member' attribute are 'dn1', 'dn2', and 'dn3'.
# Further assume the following configuration.
# converters:
# - name: hf.Registrar.Roles
# value: map(attr("member"),"groups")
# maps:
# groups:
# - name: dn1
# value: peer
# - name: dn2
# value: client
# The value of the user's 'hf.Registrar.Roles' attribute is then computed to be
# "peer,client,dn3". This is because the value of 'attr("member")' is
# "dn1,dn2,dn3", and the call to 'map' with a 2nd argument of
# "group" replaces "dn1" with "peer" and "dn2" with "client".
maps:
groups:
- name:
value:
#############################################################################
# Affiliations section. Fabric CA server can be bootstrapped with the
# affiliations specified in this section. Affiliations are specified as maps.
# For example:
# businessunit1:
# department1:
# - team1
# businessunit2:
# - department2
# - department3
#
# Affiliations are hierarchical in nature. In the above example,
# department1 (used as businessunit1.department1) is the child of businessunit1.
# team1 (used as businessunit1.department1.team1) is the child of department1.
# department2 (used as businessunit2.department2) and department3 (businessunit2.department3)
# are children of businessunit2.
# Note: Affiliations are case sensitive except for the non-leaf affiliations
# (like businessunit1, department1, businessunit2) that are specified in the configuration file,
# which are always stored in lower case.
#############################################################################
affiliations:
org1:
- department1
- department2
org2:
- department1
#############################################################################
# Signing section
#
# The "default" subsection is used to sign enrollment certificates;
# the default expiration ("expiry" field) is "8760h", which is 1 year in hours.
#
# The "ca" profile subsection is used to sign intermediate CA certificates;
# the default expiration ("expiry" field) is "43800h" which is 5 years in hours.
# Note that "isca" is true, meaning that it issues a CA certificate.
# A maxpathlen of 0 means that the intermediate CA cannot issue other
# intermediate CA certificates, though it can still issue end entity certificates.
# (See RFC 5280, section 4.2.1.9)
#
# The "tls" profile subsection is used to sign TLS certificate requests;
# the default expiration ("expiry" field) is "8760h", which is 1 year in hours.
#############################################################################
signing:
default:
usage:
- digital signature
expiry: 8760h
profiles:
ca:
usage:
- cert sign
- crl sign
expiry: 43800h
caconstraint:
isca: true
maxpathlen: 0
tls:
usage:
- signing
- key encipherment
- server auth
- client auth
- key agreement
expiry: 8760h
###########################################################################
# Certificate Signing Request (CSR) section.
# This controls the creation of the root CA certificate.
# The expiration for the root CA certificate is configured with the
# "ca.expiry" field below, whose default value is "131400h" which is
# 15 years in hours.
# The pathlength field is used to limit CA certificate hierarchy as described
# in section 4.2.1.9 of RFC 5280.
# Examples:
# 1) No pathlength value means no limit is requested.
# 2) pathlength == 1 means a limit of 1 is requested which is the default for
# a root CA. This means the root CA can issue intermediate CA certificates,
# but these intermediate CAs may not in turn issue other CA certificates
# though they can still issue end entity certificates.
# 3) pathlength == 0 means a limit of 0 is requested;
# this is the default for an intermediate CA, which means it can not issue
# CA certificates though it can still issue end entity certificates.
###########################################################################
csr:
cn: ca.org1.example.com
names:
- C: US
ST: "North Carolina"
L: "Durham"
O: org1.example.com
OU:
hosts:
- localhost
- org1.example.com
ca:
expiry: 131400h
pathlength: 1
#############################################################################
# BCCSP (BlockChain Crypto Service Provider) section is used to select which
# crypto library implementation to use
#############################################################################
bccsp:
default: SW
sw:
hash: SHA2
security: 256
filekeystore:
# The directory used for the software file-based keystore
keystore: msp/keystore
#############################################################################
# Multi CA section
#
# Each Fabric CA server contains one CA by default. This section is used
# to configure multiple CAs in a single server.
#
# 1) --cacount <number-of-CAs>
# Automatically generate <number-of-CAs> non-default CAs. The names of these
# additional CAs are "ca1", "ca2", ... "caN", where "N" is <number-of-CAs>
# This is particularly useful in a development environment to quickly set up
# multiple CAs. Note that, this config option is not applicable to intermediate CA server
# i.e., Fabric CA server that is started with intermediate.parentserver.url config
# option (-u command line option)
#
# 2) --cafiles <CA-config-files>
# For each CA config file in the list, generate a separate signing CA. Each CA
# config file in this list MAY contain all of the same elements as are found in
# the server config file except port, debug, and tls sections.
#
# Examples:
# fabric-ca-server start -b admin:adminpw --cacount 2
#
# fabric-ca-server start -b admin:adminpw --cafiles ca/ca1/fabric-ca-server-config.yaml
# --cafiles ca/ca2/fabric-ca-server-config.yaml
#
#############################################################################
cacount:
cafiles:
#############################################################################
# Intermediate CA section
#
# The relationship between servers and CAs is as follows:
# 1) A single server process may contain or function as one or more CAs.
# This is configured by the "Multi CA section" above.
# 2) Each CA is either a root CA or an intermediate CA.
# 3) Each intermediate CA has a parent CA which is either a root CA or another intermediate CA.
#
# This section pertains to configuration of #2 and #3.
# If the "intermediate.parentserver.url" property is set,
# then this is an intermediate CA with the specified parent
# CA.
#
# parentserver section
# url - The URL of the parent server
# caname - Name of the CA to enroll within the server
#
# enrollment section used to enroll intermediate CA with parent CA
# profile - Name of the signing profile to use in issuing the certificate
# label - Label to use in HSM operations
#
# tls section for secure socket connection
# certfiles - PEM-encoded list of trusted root certificate files
# client:
# certfile - PEM-encoded certificate file for when client authentication
# is enabled on server
# keyfile - PEM-encoded key file for when client authentication
# is enabled on server
#############################################################################
intermediate:
parentserver:
url:
caname:
enrollment:
hosts:
profile:
label:
tls:
certfiles:
client:
certfile:
keyfile:
#############################################################################
# CA configuration section
#
# Configure the number of incorrect password attempts are allowed for
# identities. By default, the value of 'passwordattempts' is 10, which
# means that 10 incorrect password attempts can be made before an identity get
# locked out.
#############################################################################
cfg:
identities:
passwordattempts: 10
allowremove: true
###############################################################################
When I execute the following PowerShell command:
.\kubectl get nodes
I get no nodes in response. I noticed that the kubectl config file is mostly empty too:
apiVersion: v1
clusters:
- cluster:
server: ""
name: cl-kubernetes
contexts: []
current-context: ""
kind: Config
preferences: {}
users: []
When I enter the server address in the config file, I get the message that the connection was refused. I suspect this is due to missing certificates. During another installation, the following information was apparently created automatically, and it is now missing:
certificate-authority-data,
contexts - cluster,
contexts - user,
current context,
users - name,
client-certificate-data,
client-key-data,
token,
Could that be it? If so, where do I get this information?
Many thanks for the help
You need to use the Azure CLI first to get the credentials. Run
az aks get-credentials
https://learn.microsoft.com/en-us/cli/azure/aks?view=azure-cli-latest#az-aks-get-credentials
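For example (the resource group and cluster name below are placeholders for your own values); this merges the cluster's credentials into your kubeconfig, after which kubectl should see the nodes:
az aks get-credentials --resource-group myResourceGroup --name myAKSCluster
.\kubectl get nodes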
I have Puppet enterprise installed on my VM, running in Virtualbox.
The installation went fine, but when I try to run the command puppet agent -t I get the following error:
[root@puppetmaster ~]# puppet agent -t
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Function Call, Could not find data item role in any Hiera data file and no default supplied at /etc/puppetlabs/code/environments/production/manifests/site.pp:32:10 on node puppetmaster.localdomain
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
Here is my site.pp file line where the error is coming from;
## site.pp ##
# This file (/etc/puppetlabs/puppet/manifests/site.pp) is the main entry point
# used when an agent connects to a master and asks for an updated configuration.
#
# Global objects like filebuckets and resource defaults should go in this file,
# as should the default node definition. (The default node can be omitted
# if you use the console and don't define any other nodes in site.pp. See
# http://docs.puppetlabs.com/guides/language_guide.html#nodes for more on
# node definitions.)
## Active Configurations ##
# Disable filebucket by default for all File resources:
#http://docs.puppetlabs.com/pe/latest/release_notes.html#filebucket-resource-no-longer-created-by-default
File { backup => false }
# DEFAULT NODE
# Node definitions in this file are merged with node data from the console. See
# http://docs.puppetlabs.com/guides/language_guide.html#nodes for more on
# node definitions.
# The default node definition matches any node lacking a more specific node
# definition. If there are no other nodes in this file, classes declared here
# will be included in every node's catalog, *in addition* to any classes
# specified in the console for that node.
node default {
# This is where you can declare classes for all nodes.
# Example:
# class { 'my_class': }
$role = hiera('role')
$location = hiera('location')
notify{"in the top level site.pp : role is '${role}', location is '${location}'": }
include "::roles::${role}"
}
If you look at the error, it can't find the hiera key that you've asked for in your site.pp:
Could not find data item role in any Hiera data file and no default supplied at /etc/puppetlabs/code/environments/production/manifests/site.pp:32:10 on node puppetmaster.localdomain
In your code, you have the following:
$role = hiera('role')
$location = hiera('location')
Both of these are hiera calls, which require that Hiera is set up and that the relevant key exists in your hieradata.
A useful tool to help you diagnose Hiera issues is hiera_explain, which shows you how your Hiera hierarchy is set up and configured, and might help explain what the issue is with your code.
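For example, a minimal fix would be a common-level data file that defines both keys; the path and values below are placeholders and depend on the datadir and hierarchy in your hiera.yaml, and the role value must match an existing roles:: class:
# e.g. /etc/puppetlabs/code/environments/production/hieradata/common.yaml
---
role: 'webserver'        # placeholder; must correspond to a roles::webserver class
location: 'datacenter1'  # placeholder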