I have two different Docker stacks, one for HBase and one for Spark, and I need to get the HBase jars onto the Spark classpath. One way I can do this without having to modify the Spark containers is to use a volume. In my docker-compose.yml for HBase, I have defined a volume that points to the HBase home (it happens to be /opt/hbase-1.2.6). Is it possible to share that volume with the Spark stack?
Right now, since the two docker-compose files are separate projects, the volume names are being prefixed with the project name (hbase_hbasehome and spark_hbasehome), which causes the share to fail.
You could use an external volume. From the official documentation:
if set to true, specifies that this volume has been created outside of
Compose. docker-compose up does not attempt to create it, and raises
an error if it doesn’t exist.
external cannot be used in conjunction with other volume configuration
keys (driver, driver_opts).
In the example below, instead of attempting to create a volume called
[projectname]_data, Compose looks for an existing volume simply called
data and mounts it into the db service’s containers.
As an example:
version: '2'
services:
  db:
    image: postgres
    volumes:
      - data:/var/lib/postgresql/data
volumes:
  data:
    external: true
You can also specify the name of the volume separately from the name used to refer to it within the Compose file:
volumes:
  data:
    external:
      name: actual-name-of-volume
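Applied to your two stacks, a minimal sketch could look like the following (the service names and image names are placeholders, not taken from your files). Create the volume once with docker volume create hbase-home, then declare it as external in both compose files so neither project prefixes its name:
# HBase stack
version: '2'
services:
  hbase:
    image: my-hbase-image            # placeholder
    volumes:
      - hbasehome:/opt/hbase-1.2.6
volumes:
  hbasehome:
    external:
      name: hbase-home

# Spark stack
version: '2'
services:
  spark:
    image: my-spark-image            # placeholder
    volumes:
      - hbasehome:/opt/hbase-1.2.6:ro   # read-only on the consuming side
volumes:
  hbasehome:
    external:
      name: hbase-home
When the HBase container first mounts the empty named volume at /opt/hbase-1.2.6, Docker copies the image's contents into it, so the jars become visible to the Spark containers as well.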
This is related to the Docker-compose, conditional statements? (e.g. add volume only if condition) question, but I prefer to ask a new one because my problem is a bit different:
I have a docker-compose.yml with multiple services and multiple volumes.
I'd like to deploy it either locally or on Azure ACI.
I'd like to have only one docker-compose.yml, but Azure ACI needs the storage driver to be specified:
mysql-data:
  driver: azure_file
  driver_opts:
    share_name: mysql
    storage_account_name: mystore
Obviously this driver does not exist locally.
Using a variable for the driver like:
mysql-data:
  driver: ${STORAGE_DRIVER}
  driver_opts:
    share_name: mysql
    storage_account_name: mystore
gives an error, because the local overlay2 driver doesn't support the share_name or storage_account_name options.
How can I solve this while keeping only one docker-compose.yml for my multi-container deployment?
Thank you.
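One pattern that may help here (a sketch of a possible approach, not from the original thread, and assuming your Compose/ACI tooling accepts multiple -f files): keep the storage-agnostic definitions in the base docker-compose.yml and put the ACI-only driver settings in an override file that you pass only when deploying to Azure, e.g. docker compose -f docker-compose.yml -f docker-compose.aci.yml up. The service name and image below are assumptions for illustration:
# docker-compose.yml (base, used everywhere)
services:
  mysql:
    image: mysql:8
    volumes:
      - mysql-data:/var/lib/mysql
volumes:
  mysql-data:

# docker-compose.aci.yml (only passed when deploying to Azure ACI)
volumes:
  mysql-data:
    driver: azure_file
    driver_opts:
      share_name: mysql
      storage_account_name: mystore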
I have 4 nodes: one manager and three workers. On the three worker nodes I have configured lsyncd with the rsync -u flag (so it does not overwrite a file if the copy on the remote side is newer) and delete=false. The daemon syncs /home/user/mydocker/vaultwarden/data bidirectionally across all worker nodes. Syncing works flawlessly (I also tried GlusterFS).
My idea is to have only one replica on a worker node; in case of failure, Docker Swarm brings the service up on another node, and thanks to the synced folder I should get the same copy of Vaultwarden with its data. And it works, with one exception: when I reboot the node where the service is running, Docker redeploys the container on another node and it picks up data from some kind of cache, which replaces everything in my synced folder. Since that data is now the newer version, lsyncd syncs it out to the other nodes, so I end up with a clean Vaultwarden without any data, or, if there was data before, it reverts to a previous version. But if I manually bring Vaultwarden up with docker compose, then turn off the node (to simulate a failure) and bring the service up on another node with docker compose, everything works like a charm: the data persists and syncs without any problems.
My YAML config for the deployment:
version: '3'
services:
  vaultwarden:
    image: vaultwarden/server:latest
    environment:
      - ADMIN_TOKEN=XXXXXXXXXXXXX
      - SIGNUPS_ALLOWED=true
    volumes:
      - /home/user/mydocker/vaultwarden/data:/data
    ports:
      - "8877:80"
    deploy:
      placement:
        constraints:
          - "node.role==worker"
      mode: replicated
      replicas: 1
From a Jupyter notebook I am creating a Spark context which deploys Spark on Kubernetes. This has been working fine for some time. I am now trying to configure the Spark context so that the driver and executors mount an NFS share to a local directory. Note that the NFS share I am trying to mount has been in use for some time, both from my k8s cluster and by other means.
According to the official documentation and release article for 3.1.x I should be able to modify my spark conf with options that are in turn passed to kubernetes.
My spark conf in this example is set as:
sparkConf.set(f"spark.kubernetes.driver.volumes.nfs.myshare.mount.readOnly", "false")
sparkConf.set(f"spark.kubernetes.driver.volumes.nfs.myshare.mount.path", "/deltalake")
sparkConf.set(f"spark.kubernetes.driver.volumes.nfs.myshare.options.server", "15.4.4.1")
sparkConf.set(f"spark.kubernetes.driver.volumes.nfs.myshare.options.path", "/deltalake")
In my scenario the nfs share is "15.4.4.1:/deltalake" and I arbitrarily selected the name myshare to represent this nfs mount.
When I describe the pods created after instantiating the Spark context, I do not see any mounts resembling these directives.
# kubectl describe <a-spark-pod>
...
Volumes:
  spark-conf-volume-exec:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      spark-exec-85efd381ea403488-conf-map
    Optional:  false
  spark-local-dir-1:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  kube-api-access-947xd:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
I also do not see anything in the logs for the pod indicating an issue.
Update:
I missed a key line of the documentation which states that drivers and executors have different configs.
The configuration properties for mounting volumes into the executor pods use prefix spark.kubernetes.executor. instead of spark.kubernetes.driver.
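So, mirroring the driver settings above for the executors would look like this (same values as my snippet above, only the prefix changes):
# Same NFS volume definition, repeated with the executor prefix
sparkConf.set("spark.kubernetes.executor.volumes.nfs.myshare.mount.readOnly", "false")
sparkConf.set("spark.kubernetes.executor.volumes.nfs.myshare.mount.path", "/deltalake")
sparkConf.set("spark.kubernetes.executor.volumes.nfs.myshare.options.server", "15.4.4.1")
sparkConf.set("spark.kubernetes.executor.volumes.nfs.myshare.options.path", "/deltalake")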
The second thing I missed is that the Docker image used by the Spark conf to provision the Kubernetes pods hosting the Spark executors needs the software required to mount NFS shares (i.e. the command-line NFS utilities). If the NFS utils are not installed, the Spark integration fails silently: describing the pod still lists an nfs volume, but code executed on each executor to list the contents of the mount directory shows an empty directory. There is no indication of the failure in either the pod description or the pod logs.
I am rebuilding the container images and will try again.
There are a few things needed to get this to work:
The spark conf needs to be configured for the driver and executor
The nfs utils package needs to be installed on the driver and executor nodes
The nfs server needs to be active and properly configured to allow connections
There are a few possible problems:
The mount does not succeed (server offline, path doesn't exist, path in use)
As a workaround:
After the spark session is created, run a shell command on all the workers to confirm they have access to the mount and the contents look right.
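For example, a quick check from the notebook could look like this (a sketch, assuming a SparkContext named sc and the /deltalake mount path used above):
import os
import socket

def check_mount(_):
    # Runs on an executor: list the NFS mount path and report the host name
    try:
        entries = os.listdir("/deltalake")
    except OSError as exc:
        entries = ["ERROR: %s" % exc]
    return [(socket.gethostname(), entries[:5])]

# Spread a few tasks across the executors and collect what each one sees
for host, sample in sc.parallelize(range(8), 8).mapPartitions(check_mount).collect():
    print(host, sample)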
The setup is 3 nodes: two warm nodes with 5 TB of storage and a hot node with 2 TB. I want to add 2 TB of storage to each of the two warm nodes.
Each node runs as a Docker container on a Linux server, which will be shut down while adding the disks. I do not know how to make Elasticsearch utilize the extra space after adding the disks.
No docker-compose files are used.
The Elasticsearch container is started without specifying any volumes; only the elasticsearch.yml file is passed in, and it does not set any of the path properties.
You can use multiple data paths by editing the YAML configuration file:
path:
  data:
    - /mnt/disk_1
    - /mnt/disk_2
    - /mnt/disk_3
In recent Elasticsearch versions this option is deprecated; see the official documentation for how to migrate to an alternative configuration.
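One detail worth noting for a Docker setup: the paths listed under path.data only help if the new disks are also mounted into the container when it is started, otherwise they do not exist inside it. A minimal sketch of the relevant flags (mount paths, config location, and image tag are assumptions; keep whatever other flags you already use):
docker run -d --name elasticsearch \
  -v /mnt/disk_1:/mnt/disk_1 \
  -v /mnt/disk_2:/mnt/disk_2 \
  -v /mnt/disk_3:/mnt/disk_3 \
  -v /path/to/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
  docker.elastic.co/elasticsearch/elasticsearch:7.17.0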
I have been using a targets.json file inside a Node.js application running locally to dynamically add IP addresses for Prometheus to probe, via the file_sd_configs service-discovery option. It has worked well: I was able to add new IPs, call the Prometheus reload API from the Node app, monitor those IPs, and issue alerts (with Blackbox exporter and Alertmanager).
However, the application and Prometheus now run inside Docker on the same network. How can I make my Node application write to (or update) a file inside a folder in the Prometheus container?
You could bind the targets.json file into both the Prometheus container and the application container by adding a volume mapping to your docker-compose file:
volumes:
  - /hostpath/targets.json:/containerpath/targets.json
Instead of a mapped host folder you can also use named volumes; see here for more information about Docker volumes.
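For example, a shared named volume could look like this (a sketch; the service names and mount paths are assumptions, so adjust them to your setup and point file_sd_configs at the mounted directory):
version: "3"
services:
  prometheus:
    image: prom/prometheus
    volumes:
      - targets:/etc/prometheus/targets   # file_sd_configs reads targets.json from here
  node-app:
    build: .
    volumes:
      - targets:/app/targets              # the Node app writes targets.json here
volumes:
  targets: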