Apache Flink Operator - enable azure-fs-hadoop

I am trying to run a Flink job, using the Flink Operator (https://github.com/apache/flink-kubernetes-operator) on Kubernetes, that uses a connection to Azure Blob Storage as described here: https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/filesystems/azure/
Following that guide, I need to copy the flink-azure-fs-hadoop-1.15.0.jar file from one directory to another.
I have already tried to do it via the podTemplate and command functionality, but unfortunately it does not work and the file does not appear in the destination directory.
Can you guide me on how to do it properly?
Below you can find my FlinkDeployment file.
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  namespace: flink
  name: basic-example
spec:
  image: flink:1.15
  flinkVersion: v1_15
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
  serviceAccount: flink
  podTemplate:
    apiVersion: v1
    kind: Pod
    metadata:
      name: pod-template
    spec:
      serviceAccount: flink
      containers:
        - name: flink-main-container
          volumeMounts:
            - mountPath: /opt/flink/data
              name: flink-data
          # command:
          #   - "touch"
          #   - "/tmp/test.txt"
      volumes:
        - name: flink-data
          emptyDir: { }
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
    podTemplate:
      apiVersion: v1
      kind: Pod
      metadata:
        name: job-manager-pod-template
      spec:
        initContainers:
          - name: fetch-jar
            image: cirrusci/wget
            volumeMounts:
              - mountPath: /opt/flink/data
                name: flink-data
            command:
              - "wget"
              - "LINK_TO_CUSTOM_JAR_FILE_ON_AZURE_BLOB_STORAGE"
              - "-O"
              - "/opt/flink/data/test.jar"
        containers:
          - name: flink-main-container
            command:
              - "touch"
              - "/tmp/test.txt"
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  job:
    jarURI: local:///opt/flink/data/test.jar
    parallelism: 2
    upgradeMode: stateless
    state: running
  ingress:
    template: "CUSTOM_LINK_TO_AZURE"
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt
      kubernetes.io/ingress.allow-http: 'false'
      traefik.ingress.kubernetes.io/router.entrypoints: websecure
      traefik.ingress.kubernetes.io/router.tls: 'true'
      traefik.ingress.kubernetes.io/router.tls.options: default

Since you are using the stock Flink 1.15 image, the Azure filesystem plugin comes built in. You can enable it by setting the ENABLE_BUILT_IN_PLUGINS environment variable.
spec:
  podTemplate:
    spec:
      containers:
        # Do not change the main container name
        - name: flink-main-container
          env:
            - name: ENABLE_BUILT_IN_PLUGINS
              value: flink-azure-fs-hadoop-1.15.0.jar
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/docker/#using-filesystem-plugins
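With the plugin enabled, the deployment still needs credentials for the storage account. Per the linked Azure filesystem docs, the account key goes into flinkConfiguration; a minimal sketch, where the storage account name and key below are placeholders you would substitute:
spec:
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
    # placeholder storage account name and key - take these from your own account
    fs.azure.account.key.<storage-account>.blob.core.windows.net: <storage-account-key>
With the plugin and key in place, the wasb:// / wasbs:// paths described in the linked docs can be used directly, and there is no need to copy the plugin jar between directories yourself; ENABLE_BUILT_IN_PLUGINS performs that copy at container startup.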

Related

Using a PV in an OpenShift 3 cron job

I have been able to successfully create a cron job for my OpenShift 3 project. The project is a lift and shift from an existing Linux web server. Part of the existing application requires several cron tasks to run. The one I am looking at right now is a daily update to the application's database. As part of the execution of the cron job I want to write to a log file. There is already a PV/PVC defined for the main application, and I was intending to use that to hold the logs for my cron job, but it seems the cron job is not being given access to the PV.
I am using the following inProgress.yml for the definition of the cron job:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: in-progress
spec:
  schedule: "*/5 * * * *"
  concurrencyPolicy: "Replace"
  startingDeadlineSeconds: 200
  suspend: false
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            parent: "cronjobInProgress"
        spec:
          containers:
            - name: in-progress
              image: <image name>
              command: ["php", "inProgress.php"]
      restartPolicy: OnFailure
      volumeMounts:
        - mountPath: /data-pv
          name: log-vol
      volumes:
        - name: log-vol
          persistentVolumeClaim:
            claimName: data-pv
I am using the following command to create the cron job
oc create -f inProgress.yml
PHP Warning: fopen(/data-pv/logs/2022-04-27-app.log): failed to open
stream: No such file or directory in
/opt/app-root/src/errorHandler.php on line 75 WARNING: [2] mkdir():
Permission denied, line 80 in file
/opt/app-root/src/errorLogger.php WARNING: [2]
fopen(/data-pv/logs/2022-04-27-inprogress.log): failed to open stream:
No such file or directory, line 60 in file
/opt/app-root/src/errorLogger.php
Looking at the YAML for the pod that is executed, there is no mention of data-pv; it appears as though the secret volumeMount, which has been added by OpenShift, is removing any further volumeMounts.
apiVersion: v1
kind: Pod
metadata:
  annotations:
    openshift.io/scc: restricted
  creationTimestamp: '2022-04-27T13:25:04Z'
  generateName: in-progress-1651065900-
...
    volumeMounts:
      - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
        name: default-token-n9jsw
        readOnly: true
...
  volumes:
    - name: default-token-n9jsw
      secret:
        defaultMode: 420
        secretName: default-token-n9jsw
How can I access the PV from within the cron job?
Your manifest is incorrect. The volumes block needs to be part of the spec.jobTemplate.spec.template.spec, that is, it needs to be indented at the same level as spec.jobTemplate.spec.template.spec.containers. In its current position it is invisible to OpenShift. See e.g. this pod example.
Similarly, volumeMounts and restartPolicy are arguments to the container block, and need to be indented accordingly.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: in-progress
spec:
  schedule: '*/5 * * * *'
  concurrencyPolicy: Replace
  startingDeadlineSeconds: 200
  suspend: false
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            parent: cronjobInProgress
        spec:
          containers:
            - name: in-progress
              image: <image name>
              command:
                - php
                - inProgress.php
              restartPolicy: OnFailure
              volumeMounts:
                - mountPath: /data-pv
                  name: log-vol
          volumes:
            - name: log-vol
              persistentVolumeClaim:
                claimName: data-pv
Thanks for the informative response, larsks.
OpenShift displayed the following when I copied your manifest suggestions:
$ oc create -f InProgress.yml
The CronJob "in-progress" is invalid: spec.jobTemplate.spec.template.spec.restartPolicy: Unsupported value: "Always": supported values: "OnFailure", "Never"
As your answer was very helpful, I was able to resolve this problem by moving restartPolicy: OnFailure up to the pod spec level; the final manifest is below.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: in-progress
spec:
  schedule: "*/5 * * * *"
  concurrencyPolicy: "Replace"
  startingDeadlineSeconds: 200
  suspend: false
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            parent: "cronjobInProgress"
        spec:
          restartPolicy: OnFailure
          containers:
            - name: in-progress
              image: <image name>
              command: ["php", "updateToInProgress.php"]
              volumeMounts:
                - mountPath: /data-pv
                  name: log-vol
          volumes:
            - name: log-vol
              persistentVolumeClaim:
                claimName: data-pv
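One way to double-check the result is to look at a pod spawned by the cron job and confirm that the Mounts section now lists /data-pv alongside the service-account token, for example (the pod name placeholder is whatever the first command returns):
oc get pods -l parent=cronjobInProgress
oc describe pod <pod-name>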

How to inject environment variables into the driver pod when using spark-on-k8s?

I am writing a Kubernetes Spark application using the GCP spark-on-k8s operator.
Currently, I am stuck at not being able to inject environment variables into my container.
I am following the doc here.
Manifest:
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
name: spark-search-indexer
namespace: spark-operator
spec:
type: Scala
mode: cluster
image: "gcr.io/spark-operator/spark:v2.4.5"
imagePullPolicy: Always
mainClass: com.quid.indexer.news.jobs.ESIndexingJob
mainApplicationFile: "https://lala.com/baba-0.0.43.jar"
arguments:
- "--esSink"
- "http://something:9200/mo-sn-{yyyy-MM}-v0.0.43/searchable-article"
- "-streaming"
- "--kafkaTopics"
- "annotated_blogs,annotated_ln_news,annotated_news"
- "--kafkaBrokers"
- "10.1.1.1:9092"
sparkVersion: "2.4.5"
restartPolicy:
type: Never
volumes:
- name: "test-volume"
hostPath:
path: "/tmp"
type: Directory
driver:
cores: 1
coreLimit: "1200m"
memory: "512m"
env:
- name: "DEMOGRAPHICS_ES_URI"
value: "somevalue"
labels:
version: 2.4.5
volumeMounts:
- name: "test-volume"
mountPath: "/tmp"
executor:
cores: 1
instances: 1
memory: "512m"
env:
- name: "DEMOGRAPHICS_ES_URI"
value: "somevalue"
labels:
version: 2.4.5
volumeMounts:
- name: "test-volume"
mountPath: "/tmp"
Environment variables set on the pod:
Environment:
  SPARK_DRIVER_BIND_ADDRESS:  (v1:status.podIP)
  SPARK_LOCAL_DIRS:           /var/data/spark-1ed8539d-b157-4fab-9aa6-daff5789bfb5
  SPARK_CONF_DIR:             /opt/spark/conf
It turns out that to use this, one must enable the operator's mutating admission webhook (how to set it up is described in the quick-start guide here).
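For reference, with the operator's Helm chart the webhook is switched on at install time; a sketch along the lines of the quick-start guide, where the release and namespace names are the guide's defaults and the exact flag name may vary by chart version:
helm install spark-operator spark-operator/spark-operator \
  --namespace spark-operator \
  --set webhook.enable=true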
The other approach could be to use envVars
Example:
spec:
  executor:
    envVars:
      DEMOGRAPHICS_ES_URI: "somevalue"
Ref: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/issues/978
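The same shortcut exists on the driver spec, so presumably both driver and executors can be covered without the webhook; a sketch:
spec:
  driver:
    envVars:
      DEMOGRAPHICS_ES_URI: "somevalue"
  executor:
    envVars:
      DEMOGRAPHICS_ES_URI: "somevalue"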

write access error for mounted volume on kubernetes

We are deploying ActiveMQ in Azure Kubernetes Service (AKS), with the ActiveMQ data folder mounted on an Azure managed disk as a persistent volume claim. Below is the YAML used for the deployment.
ActiveMQ Image used: rmohr/activemq
Kubernetes Version: v1.15.7
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: activemqcontainer
spec:
  replicas: 1
  selector:
    matchLabels:
      app: activemqcontainer
  template:
    metadata:
      labels:
        app: activemqcontainer
    spec:
      securityContext:
        runAsUser: 1000
        fsGroup: 2000
        runAsNonRoot: false
      containers:
        - name: web
          image: azureregistry.azurecr.io/rmohractivemq
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 61616
          volumeMounts:
            - mountPath: /opt/activemq/data
              subPath: data
              name: volume
            - mountPath: /opt/apache-activemq-5.15.6/conf/activemq.xml
              name: config-xml
              subPath: activemq.xml
      imagePullSecrets:
        - name: secret
      volumes:
        - name: config-xml
          configMap:
            name: active-mq-xml
        - name: volume
          persistentVolumeClaim:
            claimName: azure-managed-disk
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: azure-managed-disk
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: managed-premium
  resources:
    requests:
      storage: 100Gi
I am getting the error below.
WARN | Failed startup of context o.e.j.w.WebAppContext#517566b{/admin,file:/opt/apache-activemq-5.15.6/webapps/admin/,null}
java.lang.IllegalStateException: Parent for temp dir not configured correctly: writeable=false
at org.eclipse.jetty.webapp.WebInfConfiguration.makeTempDirectory(WebInfConfiguration.java:336)[jetty-all-9.2.25.v20180606.jar:9.2.25.v20180606]
at org.eclipse.jetty.webapp.WebInfConfiguration.resolveTempDirectory(WebInfConfiguration.java:304)[jetty-all-9.2.25.v20180606.jar:9.2.25.v20180606]
at org.eclipse.jetty.webapp.WebInfConfiguration.preConfigure(WebInfConfiguration.java:69)[jetty-all-9.2.25.v20180606.jar:9.2.25.v20180606]
at org.eclipse.jetty.webapp.WebAppContext.preConfigure(WebAppContext.java:468)[jetty-all-9.2.25.v20180606.jar:9.2.25.v20180606]
at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:504)[jetty-all-9.2.25.v20180606.jar:9.2.25.v20180606]
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)[jetty-all-9.2.25.v20180606.jar:9.2.25.v20180606]
at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:132)[jetty-all-9.2.25.v20180606.jar:9.2.25.v20180606]
at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:114)[jetty-all-9.2.25.v20180606.jar:9.2.25.v20180606]
at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61)[jetty-all-9.2.25.v20180606.jar:9.2.2
It's a warning from the ActiveMQ web admin console: Jetty, which hosts the web console, is unable to create its temp directory.
WARN | Failed startup of context o.e.j.w.WebAppContext#517566b{/admin,file:/opt/apache-activemq-5.15.6/webapps/admin/,null}
java.lang.IllegalStateException: Parent for temp dir not configured correctly: writeable=false
You can override the default temp directory by setting the ACTIVEMQ_TMP environment variable in the container spec, as below:
env:
  - name: ACTIVEMQ_TMP
    value: "/tmp"
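In the Deployment above that env block would sit under the web container; a sketch of just the relevant fragment:
containers:
  - name: web
    image: azureregistry.azurecr.io/rmohractivemq
    imagePullPolicy: IfNotPresent
    env:
      # point Jetty's temp dir at a writeable location
      - name: ACTIVEMQ_TMP
        value: "/tmp"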

Kubernetes volume mounting

I'm trying to mount a directory into my pods, but it always shows me the error "no file or directory found".
This is the YAML file used for the deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp1-deployment
  labels:
    app: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      volumes:
        - name: test-mount-1
          persistentVolumeClaim:
            claimName: task-pv-claim-1
      containers:
        - name: myapp
          image: 192.168.11.168:5002/dev:0.0.1-SNAPSHOT-6f4b1db
          command: ["java -jar /jar/myapp1-0.0.1-SNAPSHOT.jar --spring.config.location=file:/etc/application.properties"]
          ports:
            - containerPort: 8080
          volumeMounts:
            - mountPath: "/etc/application.properties"
              #subPath: application.properties
              name: test-mount-1
      # hostNetwork: true
      imagePullSecrets:
        - name: regcred
      #volumes:
      #  - name: test-mount
and this is the persistent volume config:
kind: PersistentVolume
apiVersion: v1
metadata:
  name: test-mount-1
  labels:
    type: local
    app: myapp
spec:
  storageClassName: manual
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany
  hostPath:
    path: "/mnt/share"
and this is the volume claim config:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: task-pv-claim-1
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
and this is the service config used for the deployment:
apiVersion: v1
kind: Service
metadata:
  name: myapp-service
spec:
  selector:
    app: myapp
  externalIPs:
    - 192.168.11.145
  ports:
    - protocol: TCP
      port: 8080
      nodePort: 31000
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
If anyone can help, I will be grateful. Thanks.
You haven't included your storage class in your question, but I'm assuming you're attempting local storage on a node. It might be a simple thing to check, but does the directory exist on the node where your pod is running? And is it writeable? Depending on how many worker nodes you have, your pod could be running on any node, and the PV isn't tied to any particular node. You could use node affinity to ensure that your pod runs on the node that contains the directory referenced in your PV, if that's the issue; see the sketch below.
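The simplest form of that is a nodeSelector on the Deployment's pod template; a sketch, where the hostname value is a placeholder taken from whatever kubectl get nodes --show-labels reports for the node that actually has /mnt/share:
spec:
  template:
    spec:
      # pin the pods to the node that holds the hostPath directory
      nodeSelector:
        kubernetes.io/hostname: <node-with-the-directory>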
Edit: if it's NFS, you need to change your PV to include:
nfs:
  path: /mnt/share
  server: <nfs server node ip/fqdn>
Example here

How to mount the Cassandra data location to an Azure file share using a Kubernetes StatefulSet

I am setting up a 3-node Cassandra cluster on Azure using a Kubernetes StatefulSet and am not able to mount the data location on an Azure file share.
I am able to do it using the default Kubernetes storage, but not with the Azure file share option.
I have tried the steps given below and am having difficulty with volumeClaimTemplates.
apiVersion: "apps/v1"
kind: StatefulSet
metadata:
name: cassandra
labels:
app: cassandra
spec:
serviceName: cassandra
replicas: 3
selector:
matchLabels:
app: cassandra
template:
metadata:
labels:
app: cassandra
spec:
containers:
- name: cassandra
image: cassandra
imagePullPolicy: Always
ports:
- containerPort: 7000
name: intra-node
- containerPort: 7001
name: tls-intra-node
- containerPort: 7199
name: jmx
- containerPort: 9042
name: cql
env:
- name: CASSANDRA_SEEDS
value: cassandra-0.cassandra.default.svc.cluster.local
- name: MAX_HEAP_SIZE
value: 256M
- name: HEAP_NEWSIZE
value: 100M
- name: CASSANDRA_CLUSTER_NAME
value: "Cassandra"
- name: CASSANDRA_DC
value: "DC1"
- name: CASSANDRA_RACK
value: "Rack1"
- name: CASSANDRA_ENDPOINT_SNITCH
value: GossipingPropertyFileSnitch
- name: POD_IP
valueFrom:
fieldRef:
fieldPath: status.podIP
volumeMounts:
- mountPath: /var/lib/cassandra/data
name: pv002
volumeClaimTemplates:
- metadata:
name: pv002
spec:
storageClassName: default
accessModes:
- ReadWriteOnce
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv002
spec:
  accessModes:
    - ReadWriteOnce
  azureFile:
    secretName: storage-secret
    shareName: xxxxx
    readOnly: false
  claimRef:
    namespace: default
    name: az-files-02
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: az-files-02
spec:
  accessModes:
    - ReadWriteOnce
---
apiVersion: v1
kind: Secret
metadata:
  name: storage-secret
type: Opaque
data:
  azurestorageaccountname: xxxxx
  azurestorageaccountkey: jjbfjbsfljbafkljasfkl;jf;kjd;kjklsfdhjbsfkjbfkjbdhueueeknekneiononeojnjnjHBDEJKBJBSDJBDJ==
I should be able to mount the data folder of each Cassandra node onto the Azure file share.
For using Azure Files in a StatefulSet, I think you could follow this example: https://github.com/andyzhangx/demo/blob/master/linux/azurefile/attach-stress-test/statefulset-azurefile1-2files.yaml
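A common way to do this on AKS is to let a storage class provision the file share and drive it entirely from volumeClaimTemplates, instead of hand-building the PV/PVC pair; a minimal sketch of the template, assuming the built-in azurefile storage class (or a custom class wired to your storage-secret account) and a placeholder size:
volumeClaimTemplates:
  - metadata:
      name: pv002
    spec:
      accessModes:
        - ReadWriteOnce
      storageClassName: azurefile
      resources:
        requests:
          storage: 100Gi
Each Cassandra pod then gets its own dynamically provisioned file share mounted at /var/lib/cassandra/data.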
