Running Spark history server in Docker to view AWS Glue jobs - apache-spark

I have set up AWS Glue to output Spark event logs so that they can be imported into Spark History Server. AWS provide a CloudFormation stack for this, I just want to run the history server locally and import the event logs. I want to use Docker for this so colleagues can easily run the same thing.
I'm running into problems because the history server is a daemon process, so the container starts and immediately shuts down.
How can I keep the Docker image alive?
My Dockerfile is as follows
ARG SPARK_IMAGE=gcr.io/spark-operator/spark:v2.4.4
FROM ${SPARK_IMAGE}
RUN apk --update add coreutils
RUN mkdir /tmp/spark-events
ENTRYPOINT ["/opt/spark/sbin/start-history-server.sh"]
I start it using:
docker run -v ${PWD}/events:/tmp/spark-events -p 18080:18080 sparkhistoryserver

You need the SPARK_NO_DAEMONIZE environment variable, see here. This will keep the container alive.
Just modify your Dockerfile as follows:
ARG SPARK_IMAGE=gcr.io/spark-operator/spark:v2.4.4
FROM ${SPARK_IMAGE}
RUN apk --update add coreutils
RUN mkdir /tmp/spark-events
ENV SPARK_NO_DAEMONIZE TRUE
ENTRYPOINT ["/opt/spark/sbin/start-history-server.sh"]
See here for a repo with more detailed readme.

Related

Update nodejs docker container env variables in container [duplicate]

If I have a docker container that I started a while back, what is the best way to set an environment variable in that running container? I set an environment variable initially when I ran the run command.
$ docker run --name my-wordpress -e VIRTUAL_HOST=domain.example --link my-mysql:mysql -d spencercooley/wordpress
but now that it has been running for a while I want to add another VIRTUAL_HOST to the environment variable. I do not want to delete the container and then just re-run it with the environment variable that I want because then I would have to migrate the old volumes to the new container, it has theme files and uploads in it that I don't want to lose.
I would just like to change the value of VIRTUAL_HOST environment variable.
There are generaly two options, because docker doesn't support this feature now:
Create your own script, which will act like runner for your command. For example:
#!/bin/bash
export VAR1=VAL1
export VAR2=VAL2
your_cmd
Run your command following way:
docker exec -i CONTAINER_ID /bin/bash -c "export VAR1=VAL1 && export VAR2=VAL2 && your_cmd"
Docker doesn't offer this feature.
There is an issue: "How to set an enviroment variable on an existing container? #8838"
Also from "Allow docker start to take environment variables #7561":
Right now Docker can't change the configuration of the container once it's created, and generally this is OK because it's trivial to create a new container.
For a somewhat narrow use case, docker issue 8838 mentions this sort-of-hack:
You just stop docker daemon and change container config in /var/lib/docker/containers/[container-id]/config.json (sic)
This solution updates the environment variables without the need to delete and re-run the container, having to migrate volumes and remembering parameters to run.
However, this requires a restart of the docker daemon. And, until issue issue 2658 is addressed, this includes a restart of all containers.
To:
set up many env. vars in one step,
prevent exposing them in 'sh' history, like with '-e' option (passing credentials/api tokens!),
you can use
--env-file key_value_file.txt
option:
docker run --env-file key_value_file.txt $INSTANCE_ID
Here's how you can modify a running container to update its environment variables. This assumes you're running on Linux. I tested it with Docker 19.03.8
Live Restore
First, ensure that your Docker daemon is set to leave containers running when it's shut down. Edit your /etc/docker/daemon.json, and add "live-restore": true as a top-level key.
sudo vim /etc/docker/daemon.json
My file looks like this:
{
"default-runtime": "nvidia",
"runtimes": {
"nvidia": {
"path": "nvidia-container-runtime",
"runtimeArgs": []
}
},
"live-restore": true
}
Taken from here.
Get the Container ID
Save the ID of the container you want to edit for easier access to the files.
export CONTAINER_ID=`docker inspect --format="{{.Id}}" <YOUR CONTAINER NAME>`
Edit Container Configuration
Edit the configuration file, go to the "Env" section, and add your key.
sudo vim /var/lib/docker/containers/$CONTAINER_ID/config.v2.json
My file looks like this:
...,"Env":["TEST=1",...
Stop and Start Docker
I found that restarting Docker didn't work, I had to stop and then start Docker with two separate commands.
sudo systemctl stop docker
sudo systemctl start docker
Because of live-restore, your containers should stay up.
Verify That It Worked
docker exec <YOUR CONTAINER NAME> bash -c 'echo $TEST'
Single quotes are important here.
You can also verify that the uptime of your container hasn't changed:
docker ps
You wrote that you do not want to migrate the old volumes. So I assume either the Dockerfile that you used to build the spencercooley/wordpress image has VOLUMEs defined or you specified them on command line with the -v switch.
You could simply start a new container which imports the volumes from the old one with the --volumes-from switch like:
$ docker run --name my-new-wordpress --volumes-from my-wordpress -e VIRTUAL_HOST=domain.com --link my-mysql:mysql -d spencercooley/wordpres
So you will have a fresh container but you do not loose the old data. You do not even need to touch or migrate it.
A well-done container is always stateless. That means its process is supposed to add or modify only files on defined volumes. That can be verified with a simple docker diff <containerId> after the container ran a while.
In that case it is not dangerous when you re-create the container with the same parameters (in your case slightly modified ones). Assuming you create it from exactly the same image from which the old one was created and you re-use the same volumes with the above mentioned switch.
After the new container has started successfully and you verified that everything runs correctly you can delete the old wordpress container. The old volumes are then referred from the new container and will not be deleted.
If you are running the container as a service using docker swarm, you can do:
docker service update --env-add <you environment variable> <service_name>
Also remove using --env-rm
To make sure it's addedd as you wanted, just run:
docker exec -it <container id> env
1. Enter your running container:
sudo docker exec -it <container_name> /bin/bash
2. Run command to all available to user accessing the container and copy them to user running session that needs to run the commands:
printenv | grep -v "no_proxy" >> /etc/environment
3. Stop and Start the container
sudo docker stop <container_name>
sudo docker start <container_name>
Firstly you can set env inside the container the same way as you do on a linux box.
Secondly, you can do it by modifying the config file of your docker container (/var/lib/docker/containers/xxxx/config.v2.json). Note you need restart docker service to take affect. This way you can change some other things like port mapping etc.
here is how to update a docker container config permanently
stop container: docker stop <container name>
edit container config: docker run -it -v /var/lib/docker:/var/lib/docker alpine vi $(docker inspect --format='/var/lib/docker/containers/{{.Id}}/config.v2.json' <container name>)
restart docker
I solve this problem with docker commit after some modifications in the base container, we only need to tag the new image and start that one
docs.docker.com/engine/reference/commandline/commit
docker commit [container-id] [tag]
docker commit b0e71de98cb9 stack-overflow:0.0.1
then you can pass environment vars or file
docker run --env AWS_ACCESS_KEY_ID --env AWS_SECRET_ACCESS_KEY --env AWS_SESSION_TOKEN --env-file env.local -p 8093:8093 stack-overflow:0.0.1
the quick working hack would be:
get into the running container.
docker exec -it <container_name> bash
set env variable,
install vim if not installed in the container
apt-get install vim
vi ~/.profile at the end of the file add export MAPPING_FILENAME=p_07302021
source ~/.profile
check whether it has been set! echo $MAPPING_FILENAME(make sure you should come out of the container.)
Now, you can run whatever you're running outside of the container from inside the container.
Note, in case you're worried that you might lose your work if the current session you logged in gets logged off. you can always use screen even before starting step 1. That way if you logged off by chance of your inside running container session, you can log back in.
After understand that docker run an image constructed with a dockerfile , and the only way to change it is build another image stop everything and run everything again .
So the easy way to "set an environment variable in a running docker container" is read dockerfile [1] (with docker inspect) understand how docker starts [1].
In the example [1] we can see that docker start with /usr/local/bin/docker-php-entrypoint and we could edit it with vi and add one line with export myvar=myvalue since /usr/local/bin/docker-php-entrypoint Posix script .
If you can change dockerfile, you can add a call to a script [2] for example /usr/local/bin/mystart.sh and in that file we can set your environment var.
Of course after change the scripts you need restart the container [3]
[1]
$ docker inspect 011aa33ba92b
[{
. . .
"ContainerConfig": {
"Cmd": [
"php-fpm"
],
"WorkingDir": "/app",
"Entrypoint": [
"docker-php-entrypoint"
],
. . .
}]
[2]
/usr/local/bin/mystart.sh
#!/bin/bash
export VAR1=VAL1
export VAR2=VAL2
your_cmd
[3]
docker restart dev-php (container name)
Hack with editing docker inner configs and then restarting docker daemon was unsuitable for my case.
There is a way to recreate container with new environment settings and use it for some time.
1. Create new image from runnning container:
docker commit my-service
a1b2c3d4e5f6032165497
Docker created new image, and answered with its id. Note, the image doesn't include mounts and networks.
2. Stop and rename original container:
docker stop my-service
docker rename my-service my-service-original
3. Create and start new container with modified environment:
docker run \
-it --rm \
--name my-service \
--network=required-network \
--mount type=bind,source=/host/path,target=/inside/path,readonly \
--env 'MY_NEW_ENV_VAR=blablabla OLD_ENV=zzz' \
a1b2c3d4e5f6032165497
Here, I did the following:
created new temporary container from image built on step 1, that will show its output on terminal, will exit on Ctrl+C, and will be deleted after that
configured its mounts and networks
added my custom environment configuration
4. After you worked with temporary container, press Ctrl+C to stop and remove it, and then return old container back:
docker rename my-service-original my-service
docker start my-service
How to set environment variable in a running docker container as a development environment
Basically you can do like in normal linux, adding export MY_VAR="value" to ~/.bashrc file.
Instructions
Using VScode attach to your running container
Then with VScode open the ~/.bashrc file
Export your variable by adding the code in the end of the file
export MY_VAR="value"
Finally execute .bashrc using source command
source ~/.bashrc
You could set an environment variable to a running Docker container by
docker exec -it -e "your environment Key"="your new value" <container> /bin/bash
Verify it using below command
printenv
This will update your key with the new value provided.
Note: This will get reverted back to old on if docker gets restarted.
Use export VAR=Value
Then type printenv in terminal to validate it is set correctly.

Dockerfile USER cmd vs Linux su command

I am trying to deploy db2 express image to docker using non-root user.
The below code is used to start the db2engine using root user, it works fine.
FROM ibmoms/db2express-c:10.5.0.5-3.10.0
ENV LICENSE=accept \
DB2INST1_PASSWORD=password
RUN su - db2inst1 -c "db2start"
CMD ["db2start"]
The below code is used to start the db2engine from db2inst1 profile, giving below exception during image build. please help to resolve this.( I am trying to avoid su - command )
FROM ibmoms/db2express-c:10.5.0.5-3.10.0
ENV LICENSE=accept \
DB2INST1_PASSWORD=password
USER db2inst1
RUN /bin/bash -c ~db2inst1/sqllib/adm/db2start
CMD ["db2start"]
SQL1641N The db2start command failed because one or more DB2 database manager program files was prevented from executing with root privileges by file system mount settings.
Can you show us your Dockerfile please?
It's worth noting that a Dockerfile is used to build an image. You can execute commands while building, but once an image is published, running processses are not maintained in the image definition.
This is the reason that the CMD directive exists, so that you can tell the container which process to start and encapsulate.
If you're using the pre-existing db2 image from IBM on DockerHub (docker pull ibmcom/db2), then you will not need to start the process yourself.
Their quickstart guide demonstrates this with the following example command:
docker run -itd --name mydb2 --privileged=true -p 50000:50000 -e LICENSE=accept -e DB2INST1_PASSWORD=<choose an instance password> -e DBNAME=testdb -v <db storage dir>:/database ibmcom/db2
As you can see, you only specify the image, and leave the default ENTRYPOINT and CMD, resulting in the DB starting.
Their recommendation for building your own container on top of theirs (FROM) is to load all custom scripts into /var/custom, and they will be executed automatically after the main process has started.

docker RUN/CMD is possibly not executed

I'm trying to build a docker file in which I first download and install the Cloud SQL Proxy, before running nodejs.
FROM node:13
WORKDIR /usr/src/app
RUN wget https://dl.google.com/cloudsql/cloud_sql_proxy.linux.amd64 -O cloud_sql_proxy
RUN chmod +x cloud_sql_proxy
COPY . .
RUN npm install
EXPOSE 8000
RUN cloud_sql_proxy -instances=[project-id]:[region]:[instance-id]=tcp:5432 -credential_file=serviceaccount.json &
CMD node index.js
When building the docker file, I don't get any errors. Also, the file serviceaccount.json is included and is found.
When running the docker file and checking the logs, I see that the connection in my nodejs app is refused. So there must be a problem with the Cloud SQL proxy. Also, I don't see any output of the Cloud SQL proxy in the logs, only from the nodejs app. When I create a VM and install both packages separately, it works. I get output like "ready for connections".
So somehow, my docker file isn't correct, because the Cloud SQL proxy is not installed or running. What am I missing?
Edit:
I got it working, but I'm not sure this is the correct way to do.
This is my dockerfile now:
FROM node:13
WORKDIR /usr/src/app
COPY . .
RUN chmod +x wrapper.sh
RUN wget https://dl.google.com/cloudsql/cloud_sql_proxy.linux.amd64 -O cloud_sql_proxy
RUN chmod +x cloud_sql_proxy
RUN npm install
EXPOSE 8000
CMD ./wrapper.sh
And this is my wrapper.sh file:
#!/bin/bash
set -m
./cloud_sql_proxy -instances=phosphor-dev-265913:us-central1:dev-sql=tcp:5432 -credential_file=serviceaccount.json &
sleep 5
node index.js
fg %1
When I remove the "sleep 5", it does not work because the server is already running before the connection of the cloud_sql_proxy is established. With sleep 5, it works.
Is there any other/better way to wait untill the first command is completely done?
RUN commands are used to do stuff that changes something in the file system of the image like installing packages etc. It is not meant to run a process when the you start a container from the resulting image like you are trying to do. Dockerfile is only used to build a static container image. When you run this image, only the arguments you give to CMD instruction(node index.js) is executed inside the container.
If you need to run both cloud_sql_proxy and node inside your container, put them in a shell script and run that shell script as part of CMD instruction.
See Run multiple services in a container
You should ideally have a separate container per process. I'm not sure what cloud_sql_proxy does, but probably you can run it in its own container and run your node process in its own container and link them using docker network if required.
You can use docker-compose to manage, start and stop these multiple containers with single command. docker-compose also takes care of setting up the network between the containers automatically. You can also declare that your node app depends on cloud_sql_proxy container so that docker-compose starts cloud_sql_proxy container first and then it starts the node app.

Nodejs Kubernetes Deployment keeps crashing

I'm pulling my hair out for a week but I am close to giving up. Please share your wisdom.
This is my Docker file:
FROM node
RUN apt-get update
RUN mkdir -p /var/www/stationconnect
RUN mkdir -p /var/log/node
WORKDIR /var/www/stationconnect
COPY stationconnect /var/www/stationconnect
RUN chown node:node /var/log/node
COPY ./stationconnect_fromstage/api/config /var/www/stationconnect/api/config
COPY ./etc/stationconnect /etc/stationconnect
WORKDIR /var/www/stationconnect/api
RUN cd /var/www/stationconnect/api
RUN npm install
RUN apt-get install -y vim nano
RUN npm install supervisor forever -g
EXPOSE 8888
USER node
WORKDIR /var/www/stationconnect/api
CMD ["bash"]
It works fine in docker alone running e.g.
docker run -it 6bcee4528c7c
Any advice?
When create a container, you should have a foreground process to keep the container alive.
What i’ve done is add a shell script line
while true; do sleep 1000; done at the end of my docker-entrypoint.sh, and refer to it in ENTRYPOINT [/docker-entrypoint.sh]
Take a look at this issue to find out more.
There’s an example how to make a Nodejs dockerfile, be sure to check it out.
this is kind of obvious. You are running it with interactive terminal bash session with docker run -it <container>. When you run a container in kube (or in docker without -it) bash will exit immediately, so this is what it is doing in kube deployment. Not crashing per say, just terminating as expected.
Change your command to some long lasting process. Even sleep 1d will do - it will die no longer. Nor will your node app work though... for that you need your magic command to launch your app in foreground.
You could add an ENTRYPOINT command to your Dockerfile that executes something that is run in the background indefinitely, say, for example, you run a script my_service.sh. This, in turn, could start a webserver like nginx as a service or simply do a tail -f /dev/null. This will keep your pod running in kubernetes as the main task of this container is not done yet. In your Dockerfile above, bash is executed, but once it runs it finishes and the container completes. Therefore, when you try to do kubectl run NAME --image=YOUR_IMAGE it fails to connect because k8s is terminating the pod that runs your container almost immediately after the new pod is started. This process will continue like this infinitely.
Please see this answer here for a in-line command that can help you run your image as is for debugging purposes...

Cron containers for docker - how do they actually work?

I have been using docker for a couple of months now, and am working on dockerizing various different server images. One consistent issue is that many servers need to run cron jobs. There is a lot of discussion about that online (including on Stackoverflow), but I don't completely understand the mechanics of it.
Currently, I am using the host's cron and docker exec into each container to run a script. I created a convention about the script's name and location; all my containers have the same script. This avoids having the host's cron depending on the containers.
Basically, once a minute, the host's cron does this:
for each container
docker exec -it <containername> /cronscript/minute-script
That works, but makes the containers depend on the host.
What I would like to do is create a cron container that kicks off a script within each of the other containers - but I am not aware of an equivalent to "docker exec" that works from one container to the other.
The specific situations I have right now are running a backup in a MySQL container, and running the cron jobs Moodle requires to be run every minute. Eventually, there will be additional things I need to do via cron. Moodle uses command-line PHP scripts.
What is the "proper" dockerized way to kick off a script from one container in another container?
Update: maybe it helps to mention my specific use cases, although there will be more as time goes on.
Currently, cron needs to do the following:
Perform a database dump from MySQL. I can do that via mysqldump TCP link from a cron container; the drawback here is that I can't limit the backup user to host 127.0.0.1. I might also be able to somehow finagle the MySQL socket into the cron container via a volume.
Perform regular maintenance on a Moodle installation. Moodle includes a php command line script that runs all of the maintenance tasks. This is the biggie for me. I can probably run this script through a volume, but Moodle was not designed with that situation in mind, and I would not rule out race conditions. Also, I do not want my moodle installation in a volume because it makes updating the container much harder (remember that in Docker, volumes are not reinitialized when you update the container with a new image).
Future: perform routine maintenance on a number of other of my servers, such as cleaning out email queues, etc.
My solution is:
install crond inside container
install Your soft
run cron as a daemon
run Your soft
Part of my Dockerfile
FROM debian:jessie
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app
COPY .crontab /usr/src/app
# Set timezone
RUN echo "Europe/Warsaw" > /etc/timezone \
&& dpkg-reconfigure --frontend noninteractive tzdata
# Cron, mail
RUN set -x \
&& apt-get update \
&& apt-get install -y cron rsyslog mailutils --no-install-recommends \
&& rm -rf /var/lib/apt/lists/*
CMD rsyslogd && env > /tmp/crontab && cat .crontab >> /tmp/crontab && crontab /tmp/crontab && cron -f
Description
Set timezone, because cron need this to proper run tasks
Install cron package - package with cron daemon
Install rsyslog package to log cron task output
Install mailutils package if You want to send e-mails from cron tasks
Run rsyslogd
Copy ENV variables to tmp file, because cron run tasks with minimal ENV and You tasks may need access to containers ENV variables
Append Your .crontab file (with Your tasks) to tmp file
Set root crontab from tmp file
Run cron daemon
I use this in my containers and work very well.
one-process-per-container
If You like this paradigm, then make one Dockerfile per cron task. e.g.
Dockerfile - main program
Dockerfile_cron_task_1 - cron task 1
Dockerfile_cron_task_1 - cron task 2
and build all containers:
docker build -f Dockerfile_cron_task_1 ...

Resources