Init script for Cassandra with docker-compose - cassandra

I would like to create keyspaces and column-families at the start of my Cassandra container.
I tried the following in a docker-compose.yml file:
# shortened for clarity
cassandra:
hostname: my-cassandra
image: my/cassandra:latest
command: "cqlsh -f init-database.cql"
The image my/cassandra:latest contains init-database.cql in /. But this does not seem to work.
Is there a way to make this happen?

I was also searching for a solution to this question, and here is how I accomplished it.
Here a second Cassandra instance has a volume containing schema.cql and runs a cqlsh command against the first one.
My version, with a healthcheck so we can get rid of the sleep command:
version: '2.2'
services:
cassandra:
image: cassandra:3.11.2
container_name: cassandra
ports:
- "9042:9042"
environment:
- "MAX_HEAP_SIZE=256M"
- "HEAP_NEWSIZE=128M"
restart: always
volumes:
- ./out/cassandra_data:/var/lib/cassandra
healthcheck:
test: ["CMD", "cqlsh", "-u cassandra", "-p cassandra" ,"-e describe keyspaces"]
interval: 15s
timeout: 10s
retries: 10
cassandra-load-keyspace:
container_name: cassandra-load-keyspace
image: cassandra:3.11.2
depends_on:
cassandra:
condition: service_healthy
volumes:
- ./src/main/resources/cassandra_schema.cql:/schema.cql
command: /bin/bash -c "echo loading cassandra keyspace && cqlsh cassandra -f /schema.cql"
Netflix version using sleep:
version: '3.5'
services:
cassandra:
image: cassandra:latest
container_name: cassandra
ports:
- "9042:9042"
environment:
- "MAX_HEAP_SIZE=256M"
- "HEAP_NEWSIZE=128M"
restart: always
volumes:
- ./out/cassandra_data:/var/lib/cassandra
cassandra-load-keyspace:
container_name: cassandra-load-keyspace
image: cassandra:latest
depends_on:
- cassandra
volumes:
- ./src/main/resources/cassandra_schema.cql:/schema.cql
command: /bin/bash -c "sleep 60 && echo loading cassandra keyspace && cqlsh cassandra -f /schema.cql"
P.S. I found this approach in one of the Netflix repos.

We recently tried to solve a similar problem in KillrVideo, a reference application for Cassandra. We are using Docker Compose to spin up the environment needed by the application which includes a DataStax Enterprise (i.e. Cassandra) node. We wanted that node to do some bootstrapping the first time it was started to install the CQL schema (using cqlsh to run the statements in a .cql file just like you're trying to do). Basically the approach we took was to write a shell script for our Docker entrypoint that:
Starts the node normally but in the background.
Waits until port 9042 is available (this is where clients connect to run CQL statements).
Uses cqlsh -f to run some CQL statements and init the schema.
Stops the node that's running in the background.
Continues on to the usual entrypoint for our Docker image that starts up the node normally (in the foreground like Docker expects).
We just use the existence of a file to indicate whether the node has already been bootstrapped and check that on startup to determine whether we need to do that logic above or can just start it normally. You can see the results in the killrvideo-dse-docker repository on GitHub.
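For illustration, here is a minimal sketch of that kind of entrypoint wrapper (assuming the stock cassandra image and its docker-entrypoint.sh, a schema file mounted at /schema.cql, and a hypothetical marker file; the real KillrVideo script is more thorough):
#!/bin/bash
# bootstrap-entrypoint.sh -- sketch only, not the actual KillrVideo script
set -e

MARKER=/var/lib/cassandra/.bootstrapped   # hypothetical "already bootstrapped" flag

if [ ! -f "$MARKER" ]; then
  # 1. Start the node in the background via the image's normal entrypoint
  /docker-entrypoint.sh cassandra -f &
  pid=$!

  # 2. Wait until CQL is reachable on port 9042
  until cqlsh -e 'DESCRIBE KEYSPACES' > /dev/null 2>&1; do sleep 5; done

  # 3. Install the schema
  cqlsh -f /schema.cql

  # 4. Stop the background node and remember that bootstrapping is done
  nodetool stopdaemon || kill "$pid"
  wait "$pid" || true
  touch "$MARKER"
fi

# 5. Start the node normally, in the foreground, as Docker expects
exec /docker-entrypoint.sh cassandra -f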
There is one caveat to this approach. This worked great for us because in our reference application, we're only spinning up a single node (i.e. we aren't creating a cluster with more than one node). If you're running multiple nodes, you'll probably want to make sure that only one of the nodes does the bootstrapping to create the schema because multiple clients modifying the schema simultaneously can cause some issues with your cluster. (This is a known issue and will hopefully be fixed at some point.)

I solved this problem by patching cassandra's docker-entrypoint.sh so it will execute sh and cql files located in /docker-entrypoint-initdb.d on startup. This is similar to how MySQL docker containers work.
Basically, I add a small script at the end of the docker-entrypoint.sh (right before the last line, exec "$@") that runs the CQL scripts once Cassandra is up. A simplified version is:
INIT_DIR=docker-entrypoint-initdb.d
# this whole block will execute in the background
(
cd $INIT_DIR
# wait for cassandra to be ready
while ! cqlsh -e 'describe cluster' > /dev/null 2>&1; do sleep 6; done
echo "$0: Cassandra cluster ready: executing cql scripts found in $INIT_DIR"
# find and execute cql scripts, in name order
for f in $(find . -type f -name "*.cql" -print | sort); do
echo "$0: running $f"
cqlsh -f "$f"
echo "$0: $f executed"
done
) &
This solution works for all Cassandra versions (at least up to 3.11, at the time of writing).
You then only have to build and use this patched Cassandra image, and add the proper initialization scripts to the container using docker-compose volumes.
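With that patched image (here tagged my/cassandra-initdb, a hypothetical name), wiring the init scripts in via docker-compose is just a volume mount, for example:
# sketch only, assuming the patched image has been built and tagged locally
version: '3'
services:
  cassandra:
    image: my/cassandra-initdb:latest
    ports:
      - "9042:9042"
    volumes:
      # every *.cql file in this host folder is executed once Cassandra is up
      - ./cql-init:/docker-entrypoint-initdb.d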
A complete gist with a more robust entrypoint patch (and example) is available here.

Related

Connect to kafka running in Azure Container Instance from outside

I have a Kafka instance running in an Azure Container Instance. I want to connect to it (send messages) from outside the container (from an application running on an external server/local computer, or from another container).
After searching the internet, I understand that we need to provide the external IP address in the Kafka advertised listeners so that clients connecting from outside can reach it.
Eg: KAFKA_ADVERTISED_LISTENERS: PLAINTEXT_INTERNAL://kafkaserver:29092,PLAINTEXT://<ip-address>:9092
But since an Azure Container Instance only gets its IP address after it has spun up, how can we connect in this case?
docker-compose.yaml
version: '3.9'
services:
zookeeper:
image: confluentinc/cp-zookeeper:7.0.1
container_name: zookeeper
environment:
ZOOKEEPER_CLIENT_PORT: 2181
ZOOKEEPER_TICK_TIME: 2000
KAFKA_JMX_PORT: 39999
volumes:
- ../zookeeper_data:/var/lib/zookeeper/data
- ../zookeeper_log:/var/lib/zookeeper/log
networks:
- app_net
#*************kafka***************
kafkaserver:
image: confluentinc/cp-kafka:7.0.1
container_name: kafkaserver
ports:
# To learn about configuring Kafka for access across networks see
# https://www.confluent.io/blog/kafka-client-cannot-connect-to-broker-on-aws-on-docker-etc/
- "9092:9092"
depends_on:
- zookeeper
environment:
KAFKA_BROKER_ID: 1
KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181'
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_INTERNAL:PLAINTEXT
KAFKA_ADVERTISED_LISTENERS: PLAINTEXT_INTERNAL://kafkaserver:29092,PLAINTEXT://localhost:9092
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
KAFKA_JMX_PORT: 49999
volumes:
- ../kafka_data:/var/lib/kafka/data
networks:
- app_net
networks:
app_net:
driver: bridge
You could create an EventHubs cluster with Kafka support instead...
But if you want to run Kafka in Docker, the Confluent image would need to be extended with your own Dockerfile that injects your own shell script between these lines; that script would use some shell command to fetch the external listener address defined at runtime.
e.g. create an aci-run file with this section:
echo "===> Configuring for ACI networking ..."
/etc/confluent/docker/aci-override
echo "===> Configuring ..."
/etc/confluent/docker/configure
echo "===> Running preflight checks ... "
/etc/confluent/docker/ensure
(Might need source /etc/confluent/docker/aci-override ... I haven't tested this)
Create a Dockerfile like so and build/push to your registry
ARG CONFLUENT_VERSION=7.0.1
FROM confluentinc/cp-kafka:${CONFLUENT_VERSION}
COPY aci-override /etc/confluent/docker/aci-override
# override the stock run script with our aci-run
COPY aci-run /etc/confluent/docker/run
In aci-override
#!/bin/bash
ACI_IP=...
ACI_EXTERNAL_PORT=...
ACI_SERVICE_NAME=...
export KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://${ACI_IP}:${ACI_EXTERNAL_PORT}
You can remove the localhost listener since you want to connect externally.
Then update the YAML to run that image.
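For example, assuming the extended image was pushed as myregistry.azurecr.io/cp-kafka-aci:7.0.1 (a hypothetical tag), the kafkaserver service could become something like:
kafkaserver:
  image: myregistry.azurecr.io/cp-kafka-aci:7.0.1   # hypothetical tag for the extended image
  container_name: kafkaserver
  ports:
    - "9092:9092"
  depends_on:
    - zookeeper
  environment:
    KAFKA_BROKER_ID: 1
    KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181'
    # KAFKA_ADVERTISED_LISTENERS is now exported at runtime by aci-override,
    # so the static localhost entry is no longer needed here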
I know Heroku, Apache Mesos, Kubernetes, etc. all set some PORT environment variable within the container when it starts. I'm not sure what that is for ACI, but if you can exec into a simple running container and run env, you might see it.

docker keep restarting until file is created in a mounted volume

I am trying to create a script that would restart a microservice (in my case it's node-red).
Here is my docker compose file:
docker-compose.yml
version: '2.1'
services:
wifi-connect:
build: ./wifi-connect
restart: always
network_mode: host
privileged: true
google-iot:
build: ./google-iot
volumes:
- 'app-data:/data'
restart: always
network_mode: host
depends_on:
- "wifi-connect"
ports:
- "8883:8883"
node-red:
build: ./node-red/node-red
volumes:
- 'app-data:/data'
restart: always
privileged: true
network_mode: host
depends_on:
- "google-iot"
volumes:
app-data:
I am using wait-for-it.sh in order to check that the previous container is up.
Here is an extract from the Dockerfile of the node-red microservice.
RUN chmod +x ./wait-for-it/wait-for-it.sh
# server.js will run when container starts up on the device
CMD ["bash", "/usr/src/app/start.sh", "bash", "/usr/src/app/wait-for-it/wait-for-it.sh google-iot:8883 -- echo Google IoT Service is up and running"]
I have seen the inotify.
Basically, all I want is to restart the node-red container after a file has been created within the app-data volume, which is also mounted into the node-red container under the /data folder; the file path would be, for example, /data/myfile.txt.
Please note that this file gets generated automatically by the google-iot microservice, but the node-red container needs that file, and pretty often the node-red container starts while /data/myfile.txt is not yet present.
It sounds like you're trying to delay one container's startup until another has produced the file you're looking for, or exit if it's not available.
You can write that logic into a shell script fairly straightforwardly. For example:
#!/bin/sh
# entrypoint.sh
# Wait for the server to be available
./wait-for-it/wait-for-it.sh google-iot:8883
if [ $? -ne 0 ]; then
echo 'google-iot container did not become available' >&2
exit 1
fi
# Wait for the file to be present
seconds=30
while [ $seconds -gt 0 ]; do
if [ -f /data/myfile.txt ]; then
break
fi
sleep 1
seconds=$(($seconds-1))
done
if [ $seconds -eq 0 ]; then
echo '/data/myfile.txt was not created' >&2
exit 1
fi
# Run the command passed to us as arguments
exec "$#"
In your Dockerfile, make this script be the ENTRYPOINT. You must use JSON-array syntax in the ENTRYPOINT line. Your CMD can use any valid syntax. Note that we're running the wait-for-it script in the entrypoint wrapper, so you don't need to include that in the CMD. (And since the script is executable and begins with a "shebang" line #!/bin/sh, we do not need to explicitly name an interpreter to run it.)
# Dockerfile
RUN chmod +x entrypoint.sh wait-for-it/wait-for-it.sh
ENTRYPOINT ["/usr/src/app/entrypoint.sh"]
CMD ["/usr/src/app/start.sh"]
The entrypoint wrapper has two checks: first that the google-iot container eventually accepts TCP connections on port 8883, and second that the file is created. If either of these checks fails, the script exits with status 1 before it runs the CMD. This will cause the container as a whole to exit with that status code (a restart: on-failure policy will still restart it).
I also might consider whether some other approach to get the file might work, like using curl to make an HTTP request to the other container. There are several practical issues with sharing Docker volumes (particularly around ownership, but also if an old copy of the file is still around from a previous run) and sharing files works especially badly in a clustered environment like Kubernetes.
You can fix the issue with the race condition by using the long-syntax of depends_on where you can specify a health check. This will guarantee that your file is present when your node-red service runs.
node-red:
build: ./node-red/node-red
volumes:
- 'app-data:/data'
restart: always
privileged: true
network_mode: host
depends_on:
google-iot:
condition: service_healthy
Then you can define a health-check (see docs here) to see if your file is present in the volume. You can add the following to the service description for google-iot service:
healthcheck:
test: ["CMD", "cat", "/data/myfile.txt"]
interval: 1m30s
timeout: 10s
retries: 3
start_period: 40s
Feel free to tune the duration values as needed.
Does this fix your problem?

How can I switch the Spark version in Zeppelin to use Spark 3.x

I'm trying to set up a Dockerised version of Spark and Zeppelin, but I cannot figure out how to switch Zeppelin to the 3.x version of Spark.
I'm using the default Zeppelin image from Docker Hub. Here's an excerpt from my docker-compose.yml.
zeppelin:
image: apache/zeppelin:0.9.0
container_name: zeppelin
#depends_on:
# - spark-master
ports:
- "8083:8080"
If I access Zeppelin (at localhost:8083) and execute spark.version, it still reports the version as 2.4.5.
How do I change the Spark version in Zeppelin? I can see a fair number of versions supported, but the docs don't clarify how to switch versions: https://github.com/apache/zeppelin/blob/master/spark/spark-shims/src/main/scala/org/apache/zeppelin/spark/SparkVersion.java#L25
You can run Spark in a separate container and point the Spark master to it. Another easy way is to build your own image with Spark on top of Zeppelin.
Create a Dockerfile with Zeppelin as the base image:
FROM apache/zeppelin:0.9.0
ENV SPARK_VERSION=3.0.0
ENV HADOOP_VERSION=3.2
ENV SPARK_INSTALL_ROOT=/spark
ENV SPARK_HOME=${SPARK_INSTALL_ROOT}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}
USER root
RUN mkdir "${SPARK_INSTALL_ROOT}"
USER $USER
RUN cd "${SPARK_INSTALL_ROOT}" && \
wget --show-progress https://archive.apache.org/dist/spark/spark-$SPARK_VERSION/spark-$SPARK_VERSION-bin-hadoop${HADOOP_VERSION}.tgz && \
tar -xzf spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz
Now use this image with docker-compose. You can build the image with a tag and use it here, or you can refer directly to the Dockerfile as below:
version: '2'
services:
zeppelin:
image: zeppelin-spark
build:
context: .
dockerfile: Dockerfile
container_name: zeppelin
ports:
- "8083:8080"
Now run docker-compose up -d. Make sure both files are in the same directory, or adjust the path in the context accordingly.
Then you will see the version as 3.0.0.
You need to set SPARK_HOME and the Spark master to proper values. Please follow the instructions in Zeppelin's documentation:
https://zeppelin.apache.org/docs/latest/interpreter/spark.html
Referring to https://zeppelin.apache.org/docs/latest/interpreter/spark.html, you need to mount your version of Spark and set SPARK_HOME:
docker run -u $(id -u) -p 8080:8080 -p 4040:4040 --rm -v /mnt/disk1/spark-3.1.2:/opt/spark -e SPARK_HOME=/opt/spark --name zeppelin apache/zeppelin:0.10.0
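Translated into docker-compose (a sketch, assuming the same host path for the Spark distribution and keeping the question's 8083 host port), that run command would look roughly like:
zeppelin:
  image: apache/zeppelin:0.10.0
  container_name: zeppelin
  user: "1000"                        # match your host UID, as id -u does above
  ports:
    - "8083:8080"
    - "4040:4040"
  environment:
    - SPARK_HOME=/opt/spark
  volumes:
    - /mnt/disk1/spark-3.1.2:/opt/spark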
In my case, spark 3.3.0/scala 2.13 did not work; spark 3.1.3 did.

MongoDB and unrecognised option '--enableEncryption'

I have a problem when I run the mongo image with docker-compose.yml. I need to encrypt my data because it is very sensitive. My docker-compose.yml is:
version: '3'
services:
mongo:
image: "mongo"
command: ["mongod","--enableEncryption","--encryptionKeyFile", "/data/db/mongodb-keyfile"]
ports:
- "27017:27017"
volumes:
- $PWD/data:/data/db
I checked that mongodb-keyfile exists in data/db, no problem there, but when I build and bring up the image, the command is:
"docker-entrypoint.sh mongod --enableEncryption --encryptionKeyFile /data/db/mongodb-keyfile"
The status:
About a minute ago Exited (2) About a minute ago
I show the logs and see:
Error parsing command line: unrecognised option '--enableEncryption'
I understand the error, but I don't know how to solve it. I am thinking of making a Dockerfile based on an Ubuntu (or other Linux) image and installing MongoDB with all the necessary configuration, or of solving it some other way.
Please help me, thx.
According to the documentation, encryption is available in MongoDB Enterprise only, so you need a paid subscription to use it.
For the Docker image of the enterprise version, the documentation here says that you can build it yourself:
Download the Docker build files for MongoDB Enterprise.
Set MONGODB_VERSION to your major version of choice.
export MONGODB_VERSION=4.0
curl -O --remote-name-all https://raw.githubusercontent.com/docker-library/mongo/master/$MONGODB_VERSION/{Dockerfile,docker-entrypoint.sh}
Build the Docker container.
Use the downloaded build files to create a Docker container image wrapped around MongoDB Enterprise. Set DOCKER_USERNAME to your Docker Hub username.
export DOCKER_USERNAME=username
chmod 755 ./docker-entrypoint.sh
docker build --build-arg MONGO_PACKAGE=mongodb-enterprise --build-arg MONGO_REPO=repo.mongodb.com -t $DOCKER_USERNAME/mongo-enterprise:$MONGODB_VERSION .
Test your image.
The following commands run mongod locally in a Docker container and check the version.
docker run --name mymongo -itd $DOCKER_USERNAME/mongo-enterprise:$MONGODB_VERSION
docker exec -it mymongo /usr/bin/mongo --eval "db.version()"
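Once the enterprise image is built, the compose file from the question only needs to point at it (a sketch, reusing the tag built above):
version: '3'
services:
  mongo:
    image: "myuser/mongo-enterprise:4.0"   # substitute your own $DOCKER_USERNAME/tag from the build step
    command: ["mongod", "--enableEncryption", "--encryptionKeyFile", "/data/db/mongodb-keyfile"]
    ports:
      - "27017:27017"
    volumes:
      - $PWD/data:/data/db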

Docker-compose and node container as not the primary one

I'm new to Docker and I've successfully set up PHP/Apache/MySQL. But once I try to add the node container (in order to use npm), that container always shuts down upon composing up. And yes, I understand that I can use node directly without involving Docker, but I find it useful for myself.
And as for composing, I want to use volumes in the node container in order to persist node_modules inside the src folder.
I compose it up using docker-compose up -d --build command.
During composing it shows no errors (even the node container seems to be successfully built).
If it might help, I can share the log file (it's too big to include it here).
PS. If you find something that can be improved, please let me know.
Thank you in advance!
Dockerfile
FROM php:7.2-apache
RUN apt-get update
RUN a2enmod rewrite
RUN apt-get install zip unzip zlib1g-dev
RUN docker-php-ext-install pdo pdo_mysql mysqli zip
RUN curl -sS https://getcomposer.org/installer | php -- --install-dir=/usr/local/bin --filename=composer
RUN composer global require laravel/installer
ENV PATH="~/.composer/vendor/bin:${PATH}"
docker-compose.yml
version: '3'
services:
app:
build:
.
volumes:
- ./src:/var/www/html
depends_on:
- mysql
- nodejs
ports:
- 80:80
mysql:
image: mysql:5.7
environment:
MYSQL_ROOT_PASSWORD: qwerty
phpmyadmin:
image: phpmyadmin/phpmyadmin
links:
- mysql:db
ports:
- 8765:80
environment:
MYSQL_ROOT_PASSWORD: qwerty
PMA_HOST: mysql
depends_on:
- mysql
nodejs:
image: node:9.11
volumes:
- ./src:/var/www/html
As the Dockerfile you are using shows, you are not actually running any application in the node container, so as soon as it builds and starts up, it shuts down because it has nothing else to do.
The solution is simple: provide an application that you want to run in the container, and run it.
I've modified a part of your compose file:
nodejs:
image: node:9.11
command: node app.js
volumes:
- ./src:/var/www/html
Here app.js is the script in which your app is written; you are free to use your own name.
Edit: providing a small improvement you asked for.
You are not waiting until your database is fully initialized (depends_on is not capable of that), so take a look at one of my previous answers dealing with that problem here.
