Spark Streaming with Neo4j hangs while running with Docker - apache-spark

I have created a Docker image of my application. When I simply run it from the bash script, it works properly. However, when I run it as part of the docker-compose file, the application hangs on the message:
18/06/27 13:17:18 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
Even after I wait for a while, the streaming heartbeat times out. What may be the reason for such behavior of a Spark Streaming + Neo4j application with Docker, and how can it be improved?
The docker-compose file for my application:
version: '3.3'
services:
  consumer-demo:
    build:
      context: .
      dockerfile: Dockerfile
      args:
        - ARG_CLASS=consumer
        - HOST=neo4jdb
    volumes:
      - ./:/workdir
    working_dir: /workdir
    restart: always
Overall docker-compose file for all the applications:
version: '3.3'
services:
  kafka:
    image: spotify/kafka
    ports:
      - "9092:9092"
    networks:
      - docker_elk
    environment:
      - ADVERTISED_HOST=localhost
  neo4jdb:
    image: neo4j:latest
    container_name: neo4jdb
    ports:
      - "7474:7474"
      - "7473:7473"
      - "7687:7687"
    networks:
      - docker_elk
    volumes:
      - /var/lib/neo4j/import:/var/lib/neo4j/import
      - /var/lib/neo4j/data:/data
      - /var/lib/neo4j/conf:/conf
    environment:
      - NEO4J_dbms_active__database=graphImport.db
  elasticsearch:
    image: elasticsearch:latest
    ports:
      - "9200:9200"
      - "9300:9300"
    networks:
      - docker_elk
    volumes:
      - esdata1:/usr/share/elasticsearch/data
  kibana:
    image: kibana:latest
    ports:
      - "5601:5601"
    networks:
      - docker_elk
volumes:
  esdata1:
    driver: local
networks:
  docker_elk:
    driver: bridge
The bash script with which the application works properly:
#!/usr/bin/env bash
if [ "$1" = "consumer" ]
then
java -cp "jars/spark_consumer.jar" consumer.SparkConsumer
else
echo "Wrong parameter. It should be consumer or producer, but it is $1"
fi
The application Dockerfile, which may be the reason for the slowdown of the application execution:
FROM java:8
ARG ARG_CLASS
ARG HOST
ENV MAIN_CLASS $ARG_CLASS
ENV SCALA_VERSION 2.11.8
ENV SBT_VERSION 1.1.1
ENV SPARK_VERSION 2.2.0
ENV SPARK_DIST spark-$SPARK_VERSION-bin-hadoop2.6
ENV SPARK_ARCH $SPARK_DIST.tgz
ENV HOSTNAME bolt://$HOST:7687
VOLUME /workdir
WORKDIR /opt
# Install Scala
RUN \
cd /root && \
curl -o scala-$SCALA_VERSION.tgz http://downloads.typesafe.com/scala/$SCALA_VERSION/scala-$SCALA_VERSION.tgz && \
tar -xf scala-$SCALA_VERSION.tgz && \
rm scala-$SCALA_VERSION.tgz && \
echo >> /root/.bashrc && \
echo 'export PATH=~/scala-$SCALA_VERSION/bin:$PATH' >> /root/.bashrc
# Install SBT
RUN \
curl -L -o sbt-$SBT_VERSION.deb https://dl.bintray.com/sbt/debian/sbt-$SBT_VERSION.deb && \
dpkg -i sbt-$SBT_VERSION.deb && \
rm sbt-$SBT_VERSION.deb
# Install Spark
RUN \
cd /opt && \
curl -o $SPARK_ARCH http://d3kbcqa49mib13.cloudfront.net/$SPARK_ARCH && \
tar xvfz $SPARK_ARCH && \
rm $SPARK_ARCH && \
echo 'export PATH=$SPARK_DIST/bin:$PATH' >> /root/.bashrc
EXPOSE 9851 9852 4040 9092 9200 9300 5601 7474 7687 7473
CMD /workdir/runDemo.sh "$MAIN_CLASS"

The problem was that another Spark process was running on the machine, blocking Spark data streaming. I checked all the processes with ps aux | grep spark and found another running process. Simply killing that process and restarting the Spark Streaming application solved the problem.
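For reference, a minimal sketch of that cleanup on the host, assuming a Linux machine where the stray driver belongs to your user; the <pid> placeholder stands for whatever PID the first command prints:
# List any Spark JVMs still running on the host
ps aux | grep -i spark | grep -v grep
# Stop the stray driver, then bring the streaming service back up
kill <pid>
docker-compose up -d consumer-demo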

Related

docker-compose run: django and postgres "strange" behavior

When I run the following command:
docker-compose run django python manage.py startapp hello
OR
docker-compose run django python manage.py createsuperuser
Instead of starting a new container, it opens a Postgres shell.
However, when I'm using exec, like so:
docker-compose exec django python manage.py createsuperuser
Then it works as expected.
Here is my docker-compose.yml file:
version: '3.7'
services:
  postgres:
    image: 'postgres:9.6.9-alpine'
    volumes:
      - postgres_data:/var/lib/postgresql/data
    env_file:
      - ./.env.dev
  django:
    image: django_dev
    build:
      context: .
      dockerfile: ./app/docker/django/Dockerfile
    volumes:
      - ./app/:/usr/src/app/
    command: /usr/src/app/docker/django/start_dev
    ports:
      - 8000:8000
    env_file:
      - ./.env.dev
    depends_on:
      - postgres
  node:
    image: node_dev
    build:
      context: .
      dockerfile: ./app/docker/node/Dockerfile
    depends_on:
      - django
    volumes:
      - ./app/:/usr/src/app/
      - /usr/src/app/node_modules
    command: npm run dev
    ports:
      - '3000:3000'
      - "3001:3001"
volumes:
  postgres_data:
Dockerfile:
FROM python:3.9.5-slim-buster
WORKDIR /usr/src/app
ENV PYTHONDONTWRITEBYTECODE 1
ENV PYTHONUNBUFFERED 1
RUN apt update \
&& apt install -y curl \
&& curl -sL https://deb.nodesource.com/setup_lts.x | bash - \
&& apt-get -y install libpq-dev gcc \
&& pip install psycopg2 \
&& pip install psycopg2-binary
RUN pip install --upgrade pip \
&& pip install pipenv
COPY ./app/Pipfile /usr/src/app/Pipfile
RUN pipenv install --skip-lock --system --dev
RUN apt-get update && apt-get install -y libglu1
COPY ./app/docker/django/entrypoint /usr/src/app/docker/django/entrypoint
COPY ./app/docker/django/start_dev /usr/src/app/docker/django/start_dev
RUN sed -i 's/\r$//g' /usr/src/app/docker/django/start_dev
RUN chmod +x /usr/src/app/docker/django/start_dev
COPY . /usr/src/app/
ENTRYPOINT ["/usr/src/app/docker/django/entrypoint"]
entrypoint:
#!/bin/sh
set -o errexit
set +o pipefail
set -o nounset
if [ -z "${POSTGRES_USER}" ]; then
    base_postgres_image_default_user='postgres'
    export POSTGRES_USER="${base_postgres_image_default_user}"
fi
export DATABASE_URL="postgres://${POSTGRES_USER}:${POSTGRES_PASSWORD}@${POSTGRES_HOST}:${POSTGRES_PORT}/${POSTGRES_DB}"
postgres_ready() {
python << END
import sys
import psycopg2
try:
    psycopg2.connect(
        dbname="${POSTGRES_DB}",
        user="${POSTGRES_USER}",
        password="${POSTGRES_PASSWORD}",
        host="${POSTGRES_HOST}",
        port="${POSTGRES_PORT}",
    )
except psycopg2.OperationalError:
    sys.exit(-1)
sys.exit(0)
END
}
until postgres_ready; do
  >&2 echo 'Waiting for PostgreSQL to become available...'
  sleep 1
done
>&2 echo 'PostgreSQL is available'
exec "$@"
start_dev:
#!/bin/sh
set -o errexit
set +o pipefail
set -o nounset
python manage.py flush --no-input
python manage.py makemigrations
python manage.py migrate
python manage.py createsuperuser --noinput
python manage.py runserver 0.0.0.0:8000
I racked my brains for a few days, but couldn't find an explanation for this behavior.
Actually, it is not opening a Postgres shell. It is telling you that the dependent postgres container is running. The path shown, /usr/src/app/, is the working directory of your django service. You can also compare the hash of your django container with the hash shown in the bash prompt, '7fbcde....'.
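If it helps, one way to confirm which container you are actually inside is to open an explicit shell with docker-compose run and compare the prompt's hash against docker ps; a rough sketch, assuming bash is available in the django image:
# Open an explicit one-off shell in the django service
docker-compose run django bash
# The prompt looks like root@7fbcde...:/usr/src/app# -- that working
# directory comes from the django image, not from postgres
pwd
# From another terminal: the short ID in the prompt should match the django
# one-off container, while postgres runs as a separate dependency container
docker ps --format '{{.ID}}  {{.Image}}  {{.Names}}'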

Dockerized Node JS application not accessible from host machine

I have a Node JS application.
docker-compose.yml
version: '3'
services:
  app:
    build:
      context: .
      dockerfile: Dockerfile
    command: 'yarn nuxt'
    ports:
      - 3000:3000
    volumes:
      - '.:/app'
Dockerfile
FROM node:15
RUN apt-get update \
&& apt-get install -y curl
RUN curl -sS https://dl.yarnpkg.com/debian/pubkey.gpg | apt-key add - \
&& echo "deb https://dl.yarnpkg.com/debian/ stable main" > /etc/apt/sources.list.d/yarn.list \
&& apt-get update \
&& apt-get install -y yarn
WORKDIR /app
After running $ docker-compose up -d, the application starts and is accessible from inside the container:
$ docker-compose exec admin sh -c 'curl -i localhost:3000'
// 200 OK
But from outside the container it doesn't work. For example, in Chrome I get ERR_SOCKET_NOT_CONNECTED.
Adding this to the app service in docker-compose.yml solves the problem:
environment:
  HOST: 0.0.0.0
Thanks to Marc Mintel's article Development setup with Nuxt, Node and Docker.
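To double-check the bind address from inside the container, something along these lines should work, assuming the service is called app as above and the image ships ss or netstat:
# 127.0.0.1:3000 means the dev server only accepts connections from inside
# the container; 0.0.0.0:3000 (or *:3000) means the published port will work
docker-compose exec app sh -c 'ss -tlnp || netstat -tlnp'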
Did you try to add
published: 3000
You can read more here: https://docs.docker.com/compose/compose-file/compose-file-v3/
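For completeness, a quick way to verify what is actually published on the host side, using the app service name from the compose file above:
# Show which host port is mapped to container port 3000
docker-compose port app 3000
# Then test the mapping from the host itself
curl -i http://localhost:3000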

docker container can not see the data from a shared volume

I'm trying to set up a lab using docker containers with base image centos7 and docker-compose.
Here is my docker-compose.yaml file
version: "3"
services:
base:
image: centos_base
build:
context: base
master:
links:
- base
build:
context: master
image: centos_master
container_name: master01
hostname: master01
volumes:
- ansible_vol:/var/ans
networks:
- net
host01:
links:
- base
- master
build:
context: host
image: centos_host
container_name: host01
hostname: host01
command: ["/var/run.sh"]
volumes:
- ansible_vol:/var/ans
networks:
- net
networks:
net:
volumes:
ansible_vol:
My Docker files are as below
Base Image docker file:
# For centos7.0
FROM centos:7
RUN yum install -y net-tools man vim initscripts openssh-server
RUN echo "12345" | passwd root --stdin
RUN mkdir /root/.ssh
Master Dockerfile :
FROM centos_base:latest
# install ansible package
RUN yum install -y epel-release
RUN yum install -y ansible openssh-clients
RUN mkdir /var/ans
# change working directory
WORKDIR /var/ans
RUN ssh-keygen -t rsa -N 12345 -C "master key" -f master_key
CMD /usr/sbin/sshd -D
Host Image Dockerfile:
FROM centos_base:latest
RUN mkdir /var/ans
COPY run.sh /var/
RUN chmod 755 /var/run.sh
My run.sh file:
#!/bin/bash
cat /var/ans/master_key.pub >> /root/.ssh/authorized_keys
# start SSH server
/usr/sbin/sshd -D
My problems are:
If I run docker-compose up -d --build, I see no containers coming up; they all get created but then exit.
Successfully tagged centos_host:latest
Creating working_base_1 ... done
Creating master01 ... done
Creating host01 ... done
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
433baf2dd0d8 centos_host "/var/run.sh" 12 minutes ago Exited (1) 12 minutes ago host01
a2a57e480635 centos_master "/bin/sh -c '/usr/sb…" 13 minutes ago Exited (1) 12 minutes ago master01
a4acf6fb3e7b centos_base "/bin/bash" 13 minutes ago Exited (0) 13 minutes ago working_base_1
The SSH keys generated in the 'centos_master' image are not available in the centos_host container, even though I have added the volume mapping 'ansible_vol:/var/ans' in the docker-compose file.
My intention is that the SSH key files generated in master should be available in the host containers, so that the run.sh script can copy them into authorized_keys of the host containers.
Any help is greatly appreciated.
Try to put this in base/Dockerfile:
RUN echo "12345" | passwd root --stdin; \
ssh-keygen -f /etc/ssh/ssh_host_rsa_key -N '' -b 4096 -t rsa
and rerun docker-compose build
/etc/ssh/ssh_host_rsa_key is a host key used by sshd (the SSH daemon), so that the containers can start sshd properly.
The key you generated and copied into authorized_keys will be used to allow an SSH client to connect to the container via SSH.
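A short sketch of how to apply and verify that change, using the container names from the compose file above:
# Rebuild the images so the new RUN line in base/Dockerfile takes effect
docker-compose build
docker-compose up -d
# sshd should now find its host key and the containers should stay up
docker ps
docker logs master01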
Try using external: false, so that Compose does not attempt to create the volume and override the previous data at creation:
version: "3"
services:
base:
image: centos_base
build:
context: base
master:
links:
- base
build:
context: master
image: centos_master
container_name: master01
hostname: master01
volumes:
- ansible_vol:/var/ans
networks:
- net
host01:
links:
- base
- master
build:
context: host
image: centos_host
container_name: host01
hostname: host01
command: ["/var/run.sh"]
volumes:
- ansible_vol:/var/ans
networks:
- net
networks:
net:
volumes:
ansible_vol:
external: false
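Once the containers stay up, you can check whether the generated key is actually visible through the shared volume; a rough sketch using the service names from the compose file:
# The key generated in the master image should be reachable from both
# containers via the ansible_vol volume mounted at /var/ans
docker-compose exec master ls -l /var/ans
docker-compose exec host01 cat /var/ans/master_key.pub
# The named volume itself is created with the compose project name as a prefix
docker volume ls | grep ansible_vol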

ServiceUnavailableException with Neo4j and Spark from Docker

I have created a Docker image for my application which runs with Spark Streaming, Kafka, ElasticSearch, and Kibana. I packaged it into an executable jar file. When I run the application with this command everything works fine as expected (the data starts to be produced):
java -cp "target/scala-2.11/test_producer.jar" producer.KafkaCheckinsProducer
However, when I run it from Docker I get a connection error to Neo4j, although the database is running from the docker-compose file:
INFO: Closing connection pool towards localhost:7687
Exception in thread "main" org.neo4j.driver.v1.exceptions.ServiceUnavailableException: Unable to connect to localhost:7687, ensure the database is running and that there is a working network connection to it.
I run my application this way:
docker run -v my-volume:/workdir -w /workdir container-name
What could cause this problem? And what should I change in my Dockerfile to execute this application?
Here is the Dockerfile:
FROM java:8
ARG ARG_CLASS
ENV MAIN_CLASS $ARG_CLASS
ENV SCALA_VERSION 2.11.8
ENV SBT_VERSION 1.1.1
ENV SPARK_VERSION 2.2.0
ENV SPARK_DIST spark-$SPARK_VERSION-bin-hadoop2.6
ENV SPARK_ARCH $SPARK_DIST.tgz
VOLUME /workdir
WORKDIR /opt
# Install Scala
RUN \
cd /root && \
curl -o scala-$SCALA_VERSION.tgz http://downloads.typesafe.com/scala/$SCALA_VERSION/scala-$SCALA_VERSION.tgz && \
tar -xf scala-$SCALA_VERSION.tgz && \
rm scala-$SCALA_VERSION.tgz && \
echo >> /root/.bashrc && \
echo 'export PATH=~/scala-$SCALA_VERSION/bin:$PATH' >> /root/.bashrc
# Install SBT
RUN \
curl -L -o sbt-$SBT_VERSION.deb https://dl.bintray.com/sbt/debian/sbt-$SBT_VERSION.deb && \
dpkg -i sbt-$SBT_VERSION.deb && \
rm sbt-$SBT_VERSION.deb
# Install Spark
RUN \
cd /opt && \
curl -o $SPARK_ARCH http://d3kbcqa49mib13.cloudfront.net/$SPARK_ARCH && \
tar xvfz $SPARK_ARCH && \
rm $SPARK_ARCH && \
echo 'export PATH=$SPARK_DIST/bin:$PATH' >> /root/.bashrc
EXPOSE 9851 9852 4040 9092 9200 9300 5601 7474 7687 7473
CMD /workdir/runDemo.sh "$MAIN_CLASS"
And here is a docker-compose file:
version: '3.3'
services:
  kafka:
    image: spotify/kafka
    ports:
      - "9092:9092"
    environment:
      - ADVERTISED_HOST=localhost
  neo4j_db:
    image: neo4j:latest
    ports:
      - "7474:7474"
      - "7473:7473"
      - "7687:7687"
    volumes:
      - /var/lib/neo4j/import:/var/lib/neo4j/import
      - /var/lib/neo4j/data:/data
      - /var/lib/neo4j/conf:/conf
    environment:
      - NEO4J_dbms_active__database=graphImport.db
  elasticsearch:
    image: elasticsearch:latest
    ports:
      - "9200:9200"
      - "9300:9300"
    networks:
      - docker_elk
    volumes:
      - esdata1:/usr/share/elasticsearch/data
  kibana:
    image: kibana:latest
    ports:
      - "5601:5601"
    networks:
      - docker_elk
volumes:
  esdata1:
    driver: local
networks:
  docker_elk:
    driver: bridge
From the error message: you're trying to connect to localhost, which is local to your application container, not to the host it's running on. You need to connect to the correct host name inside the Docker network. You don't need to map all ports to your host; you just need to make sure that all the containers are on the same network.
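A rough sketch of what that looks like in practice; <project>_default is a placeholder for the default network that docker-compose creates (its exact name depends on your project directory):
# Find the network created by docker-compose for these services
docker network ls
# Run the application container on that network so service names resolve
docker run --network <project>_default -v my-volume:/workdir -w /workdir container-name
# Inside the application, connect to bolt://neo4j_db:7687 instead of bolt://localhost:7687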

docker-compose up didn't finish npm install.

I'm new to docker-compose and I'd like to use it for my current development.
After I ran docker-compose up -d, everything started OK and it looked good. But my Node.js application wasn't installed correctly. It seems like npm install didn't complete, and I had to do docker exec -it api bash to run npm i manually inside the container.
Here's my docker-compose.
version: '2'
services:
  app:
    build: .
    container_name: sparrow-api-1
    volumes:
      - .:/usr/src/app
      - $HOME/.aws:/root/.aws
    working_dir: /usr/src/app
    environment:
      - SPARROW_EVENT_QUEUE_URL=amqp://guest:guest@rabbitmq:5672
      - REDIS_URL=redis
      - NSOLID_APPNAME=sparrow-api
      - NSOLID_HUB=registry:4001
      - NODE_ENV=local
      - REDIS_PORT=6379
      - NODE_PORT=8081
      - SOCKET_PORT=8002
      - ELASTICSEARCH_URL=elasticsearch
      - STDIN_OPEN=${STDIN_OPEN}
    networks:
      - default
    depends_on:
      - redis
      - rabbitmq
      - elasticsearch
    expose:
      - "8081"
    ports:
      - "8081:8081"
    command: bash docker-command.sh
  redis:
    container_name: redis
    image: redis:3.0.7-alpine
    networks:
      - default
    ports:
      - "6379:6379"
  rabbitmq:
    container_name: rabbitmq
    image: rabbitmq:3.6.2-management
    networks:
      - default
    ports:
      - "15672:15672"
  elasticsearch:
    container_name: elasticsearch
    image: elasticsearch:1.5.2
    networks:
      - default
    ports:
      - "9200:9200"
      - "9300:9300"
  registry:
    image: nodesource/nsolid-registry
    container_name: registry
    networks:
      - default
    ports:
      - 4001:4001
  proxy:
    image: nodesource/nsolid-hub
    container_name: hub
    networks:
      - default
    environment:
      - REGISTRY=registry:4001
      - NODE_DEBUG=nsolid
  console:
    image: nodesource/nsolid-console
    container_name: console
    networks:
      - default
    environment:
      - NODE_DEBUG=nsolid
      - NSOLID_APPNAME=console
      - NSOLID_HUB=registry:4001
    command: --hub hub:9000
    ports:
      - 3000:3000
# don't forget to create network as well
networks:
  default:
    driver: bridge
Here's my docker-command.sh
#!/usr/bin/env bash
# link the node modules to the root directory of our app, if not exists
modules_link="/usr/src/app/node_modules"
if [ ! -d "${modules_link}" ]; then
ln -s /usr/lib/app/node_modules ${modules_link}
fi
if [ -n "$STDIN_OPEN" ]; then
# if we want to be interactive with our app container, it needs to run in
# the background
tail -f /dev/null
else
nodemon
fi
Here's my Dockerfile
FROM nodesource/nsolid:latest
RUN mkdir /usr/lib/app
WORKDIR /usr/lib/app
COPY [".npmrc", "package.json", "/usr/lib/app/"]
RUN npm install \
&& npm install -g mocha \
&& npm install -g nodemon \
&& rm -rf package.json .npmrc
In your Dockerfile you are running npm install without any arguments first:
RUN npm install \
&& npm install -g mocha \
This will cause a non-zero exit code and, due to the &&, the following commands are not executed. This should also fail the build, though, so I'm guessing you already had a working image and added the npm instructions later. To rebuild the image use docker-compose build or simply docker-compose up --build. By default, docker-compose up will only build the image if it did not exist yet.
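A hedged set of commands for forcing the rebuild and checking the result, using the service name and paths from the files above:
# Rebuild without cache so the RUN npm install layer runs again and any
# error shows up in the build output
docker-compose build --no-cache app
docker-compose up -d --build
# Confirm the modules made it into the image (they live under /usr/lib/app)
docker-compose run app ls /usr/lib/app/node_modules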
