How to pause/resume a fleet unit? - CoreOS

I have a Vagrant CoreOS cluster set up on my computer. I can submit, load, start, stop, unload, and destroy fleet units on different hosts in the cluster. Are there fleetctl commands to pause/resume a unit that has already been loaded/started? If there is no built-in command, how can I achieve pause/resume functionality for fleet units?

Containers are supposed to be stateless, and you should design your app that way.
However, if you want to pause a unit, you can connect to the host running it and use docker pause/unpause.
Or, if you never want to stop your container, tweak your unit file to use wrapper scripts like this:
[Unit]
Description=blah
[Service]
ExecStart=<full path>/start.sh
ExecStop=<full path>/stop.sh
start.sh script:
#!/bin/bash
# Start the container if it is not listed by `docker ps`, otherwise unpause it
if [[ "$(docker ps | grep <CONTAINER NAME/ID>)" == "" ]]; then
  docker start <CONTAINER NAME/ID>
else
  docker unpause <CONTAINER NAME/ID>
fi
stop.sh script:
#!/bin/bash
# Pause the container if it is currently running
if [[ "$(docker ps | grep <CONTAINER NAME/ID>)" == "" ]]; then
  echo "container not running"
else
  docker pause <CONTAINER NAME/ID>
fi
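For the manual approach mentioned above, a minimal sketch using fleetctl to reach the right host might look like this (the unit name myapp.service and the container name myapp are placeholders, not taken from the question):
# Find the machine running the unit, then pause/resume its container over SSH
fleetctl list-units | grep myapp.service
fleetctl ssh myapp.service docker pause myapp
fleetctl ssh myapp.service docker unpause myapp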

There isn't a way to do this in fleet today. My question is, how is pause/resume any different from stop/start or destroy/start?

Related

Finish background process when next process is completed

Hi all,
I am trying to implement automated test runs from a Makefile target. As my tests depend on a running Docker container, I need to check that the container is up and running during the whole test execution, and restart it if it's down. I am trying to do it with a bash script running in the background.
At a glance, the code looks like this:
run-tests:
	./check_container.sh & \
	docker-compose run --rm tests; \
	#Need to finish check_container.sh right here, after tests execution
	RESULT=$$?; \
	docker-compose logs test-container; \
	docker-compose kill; \
	docker-compose rm -fv; \
	exit $$RESULT
Tests have varying execution times (from 20 min to 2 hrs), so I don't know beforehand how long a run will take. So I try to poll from the script for longer than the longest test suite. The script looks like this:
#!/bin/bash
time=0
while [ $time -le 5000 ]; do
    num=$(docker ps | grep selenium--standalone-chrome -c)
    if [[ "$num" -eq 0 ]]; then
        echo 'selenium--standalone-chrome container is down!';
        echo 'try to recreate'
        docker-compose up -d selenium
    elif [[ "$num" -eq 1 ]]; then
        echo 'selenium--standalone-chrome container is up and running'
    else
        docker ps | grep selenium--standalone-chrome
        echo 'more than one selenium--standalone-chrome containers is up and running'
    fi
    time=$(($time + 1))
    sleep 30
done
So, I am looking for a way to exit the script exactly when the test run is finished, i.e. after the command docker-compose run --rm tests completes.
P.S. It is also OK if the background process can be finished when the Makefile target finishes.
Docker (and Compose) can restart containers automatically when they exit. If your docker-compose.yml file has:
version: '3.8'
services:
  selenium:
    restart: unless-stopped
Then Docker will do everything your shell script does. If it also has
services:
  tests:
    depends_on:
      - selenium
then the docker-compose run tests line will also cause the selenium container to be started, and you don't need a script to start it at all.
When you launch a command in the background, the special parameter $! contains its process ID. You can save this in a variable and later kill(1) it.
In plain shell-script syntax, without Make-related escaping:
./check_container.sh &
CHECK_CONTAINER_PID=$!
docker-compose run --rm tests
RESULT=$?
kill "$CHECK_CONTAINER_PID"

Check docker-compose services are running or not

I am writing a script which will check a Docker service, but I want to check the services that run inside docker-compose without getting into the containers.
For example: we have custom services like tass and inception; basically we check their status with the command "service tass status".
Is there any way to check these services in docker-compose?
You can use:
docker-compose ps <name>
docker-compose top <name>
docker-compose logs <name>
To "check" the service with name <name>. You can learn many more commands by doing `docker-compose --help.
Finally, you can run docker-compose exec <name> <shell> and get an interactive shell inside the cont inaner, which with some unix-utilities will allow you to "check" the container further with ease.
Finally, you can extract the name of the running container as in How do I retrieve the exact container name started by docker-compose.yml , and use any of the docker commands mentioned in the other answer to "check". From docker inspect <the container name> you can get the cgroup name and mounted filesystem, which you can "check".
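For instance, a minimal sketch using one of the service names from the question (tass) would be:
# Resolve the container ID behind the "tass" service, then query its state
CID=$(docker-compose ps -q tass)
docker inspect -f '{{.State.Status}}' "$CID"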
Docker Compose is only a tool for defining and running Docker containers.
You should rely on docker commands to check each service's health, for example:
docker ps
docker stats
docker inspect
docker container ls
In the thread How to check if the docker engine and a docker container are running? you can find many alternatives for checking containers.
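As a quick scripted liveness check with those commands (the container name is a placeholder):
# Prints "true" while the container's main process is running
docker inspect -f '{{.State.Running}}' my-container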
Checking .State.Status, .State.Running, etc. will tell you whether the container is running, but it is better to check the health of your containers. Below is a script you can run that makes sure two containers are in good health before executing a command in the second one. It prints the docker logs if the wait time/attempts threshold has been reached.
Example taken from npm sql-mdb.
#!/bin/bash
# Wait for two docker healthchecks to be in a "healthy" state before executing a "docker exec -it $2 bash $3"
##############################################################################################################################
# $1 Docker container name that will wait for a "healthy" healthcheck (required)
# $2 Docker container name that will wait for a "healthy" healthcheck and will be used to run the execution command (required)
# $3 The actual execution command that will be run (required)
attempt=0
health1=checking
health2=checking
while [ $attempt -le 79 ]; do
  attempt=$(( $attempt + 1 ))
  echo "Waiting for docker healthcheck on services $1 ($health1) and $2 ($health2): attempt: $attempt..."
  if [[ $health1 != "healthy" ]]; then
    health1=$(docker inspect -f '{{.State.Health.Status}}' $1)
  fi
  if [[ $health2 != "healthy" ]]; then
    health2=$(docker inspect -f '{{.State.Health.Status}}' $2)
  fi
  if [[ $health1 == "healthy" && $health2 == "healthy" ]]; then
    echo "Docker healthcheck on services $1 ($health1) and $2 ($health2) - executing: $3"
    docker exec -it $2 bash -c "$3"
    [[ $? != 0 ]] && { echo "Failed to execute \"$3\" in docker container \"$2\"" >&2; exit 1; }
    break
  fi
  sleep 2
done
if [[ $health1 != "healthy" || $health2 != "healthy" ]]; then
  echo "Failed to wait for docker healthcheck on services $1 ($health1) and $2 ($health2) after $attempt attempts"
  docker logs --details $1
  docker logs --details $2
  exit 1
fi
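A hypothetical invocation of the script above (the file name and container names are placeholders, not from the original post):
# Wait for both containers to report healthy, then run the tests inside the app container
./wait-healthy.sh my-db my-app "npm test"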

How to start single process using service script passed to ENTRYPOINT

I am passing the service script to ENTRYPOINT. The service is started, but then the container exits.
I have to start one process per container using a service script from ENTRYPOINT or CMD. This way, I can reload the configuration inside the container using the service script. I tried the CMD statement as well; it starts the service, but the container exits immediately.
ENTRYPOINT ["/etc/init.d/elasticsearch", "start"]
The /etc/init.d/elasticsearch script has the code below to start the service as a daemon.
cd $ES_HOME
echo -n $"Starting $prog: "
daemon --user elasticsearch --pidfile $pidfile $exec -p $pidfile -d
retval=$?
echo
[ $retval -eq 0 ] && touch $lockfile
return $retval
Is it not possible to start the service using the startup script and keep the container running?
Commands used to create and run the containers:
docker build -f Dockerfile -t="elk/elasticsearch" .
docker run -d elk/elasticsearch
docker run -it elk/elasticsearch bash
The SysV init scripts are of type "forking", speaking in terms of a service manager, so the daemon detaches from the start script. The container then needs some init process on PID 1 that controls the background process(es).
If you do not want to extract the relevant command from the init script, then you could still use docker-systemctl-replacement to do both things for you. If it is run as the CMD, it will start enabled service scripts just as you are used to from a normal machine.
In general you do not use service scripts with Docker. Also in general, you never restart the service inside a container; instead, you stop the existing container, delete it, and start a new one.
The standard pattern is to launch whatever service it is you are trying to run, directly, as a foreground process. (No /etc/init.d, service, or systemctl anything.) You can extract the relevant command from the init script you show. I would replace your ENTRYPOINT command with
CMD ["elasticsearch"]
(but also double-check the Elasticsearch documentation just in case there are some other command-line options that matter).
The second part of this is to make sure database data is stored outside the container. Usually you use the docker run -v option to mount some alternate storage into the container. For example:
docker run \
  --name elasticsearch \
  -p 9200:9200 \
  -v "$PWD/elasticsearch:/var/data/elasticsearch" \
  imagename
Once you’ve done this, you are free to stop, delete, and recreate the container, which is the right way to restart the service. (You need to do this if the underlying image ever changes; this happens if there is a bug fix or security issue in the image software or in the underlying Linux distribution.)
docker stop elasticsearch
docker rm elasticsearch
docker run --name elasticsearch ...
You can write a simple shell script to hold the docker run command, or use an orchestration tool like Docker Compose that lets you declare the container parameters.
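As a sketch, such a wrapper script could look like this (the file name is illustrative; the image name and paths follow the example above):
#!/bin/bash
# recreate-elasticsearch.sh: replace the running container with a fresh one
set -e
docker stop elasticsearch || true
docker rm elasticsearch || true
docker run -d \
  --name elasticsearch \
  -p 9200:9200 \
  -v "$PWD/elasticsearch:/var/data/elasticsearch" \
  imagename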

Use gcloud metadata-from-file shutdown-script to stop docker container gracefully

I have created a gcloud compute instance from a Docker image and configured it to launch a shutdown script which should call docker stop in order to shut down the app in the container gracefully.
gcloud beta compute instances create-with-container mycontainername \
--container-image ypapax/trap_exit \
--metadata-from-file shutdown-script=./on_shutdown.sh
And here is my initial on_shutdown.sh:
#!/usr/bin/env bash
docker stop $(docker ps -a -q)
I then added more debugging lines to it, and now on_shutdown.sh looks like this:
#!/usr/bin/env bash
# https://cloud.google.com/compute/docs/shutdownscript
curl -X POST -d "$(date) running containers on $(hostname): $(docker ps)" http://trap_exit.requestcatcher.com/test
docker stop $(docker ps -a -q)
result=$?
curl -X POST -d "$(date) $(hostname) stop status code: $result" http://trap_exit.requestcatcher.com/test
When I reboot the google compute instance:
sudo reboot
The script on_shutdown.sh is launched (I can see it hitting the request listener). But when it tries to stop the Docker container, there is nothing left to stop: docker ps prints an empty list.
So this line:
curl -X POST -d "$(date) running containers on $(hostname): $(docker ps)" http://trap_exit.requestcatcher.com/test
gives me
Thu Jul 12 04:29:48 UTC 2018 running containers on myinstance:
Before calling sudo reboot I checked docker ps and saw my container running:
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
bbaba30e65ff ypapax/trap_exit "/entrypoint.sh" 7 seconds ago Up 7 seconds myinstance
So it looks like the Docker container is killed between calling reboot and launching on_shutdown.sh. The problem is that killing it doesn't trigger trap cleanup EXIT in my entrypoint; the container needs to be stopped gracefully for the cleanup to run.
Here is my entry point:
#!/usr/bin/env bash
set -ex
cleanup(){
  echo cleanup is called
  curl -X POST -d "cleanup is called on $(hostname) $(date)" http://trap_exit.requestcatcher.com/test
}
trap cleanup EXIT
curl -X POST -d "container is started on $(hostname) $(date)" http://trap_exit.requestcatcher.com/test
sleep 10000
So I would like to run my container's cleanup on gcloud compute instance reboot or shutdown, but the --metadata-from-file shutdown-script=./on_shutdown.sh flag doesn't achieve it. I also tried other methods of calling a script on reboot, like this one, but my script was not launched at all.
Here is my Dockerfile, in case it helps.
First, there are limitations coming with this approach:
Create and run shutdown scripts that execute commands right before an instance is terminated or restarted, on a best-effort basis.
Shutdown scripts are especially useful for instances in a managed instance group with an autoscaler.
The script runs during the limited shutdown period before the instance stops
As you have seen, Docker might already have stopped the container by the time the shutdown script runs: check with docker ps -a (instead of docker ps) to see the status of all containers, including exited ones.
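For debugging, the shutdown script could log every container, including exited ones, for example:
# Name and status of all containers, whether running or already exited
docker ps -a --format '{{.Names}}: {{.Status}}'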
Try adding a supervisor (as in this example) to the Docker image itself, to see whether the supervisor catches the shutdown, or at least use the docker run --init option: the goal is to check whether the containers themselves get to run their supervisor/cleanup scripts.
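For reference, a minimal sketch of the --init option mentioned above (reusing the image from the question):
# An init process on PID 1 forwards the SIGTERM from "docker stop" to the entrypoint
docker run -d --init --name trap_exit ypapax/trap_exit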

How to properly configure Jenkins swarm as a service to get proper screenshots?

I am having trouble finding out what the problem is in my node setup (CentOS + GNOME + swarm as a service): the node connects and runs GUI tests properly, but returns "broken" screenshots (entirely white, or showing "Something went wrong").
In our CI env, we build and test a GUI application (RED - Robot Editor) using the Eclipse tool RCPTT, which can click on GUI elements to validate functionality.
Tests are executed on CentOS 7 nodes with metacity + GNOME + vncserver. Whenever something in the GUI is wrong (a GUI element is not found, or validation is not consistent with the test criteria), a report is created together with a screenshot, so the tester can have a look at what has changed in the tested app.
When the node is configured manually (from the Jenkins Nodes configuration page) or the swarm script is executed by the user on the node (via ssh), the screenshots are fine.
When swarm is run as a service (the node is connected, systemctl status is green, under the same user as when run manually), everything is OK except that the screenshots are off (the screen resolution is fine, but the whole screen is white or shows the error "Oh no! Something has gone wrong").
I do not see any errors in the logs from RCPTT or xvnc, or in the job console.
What could be the root cause of the broken screenshots?
env setup:
service definition
[Unit]
Description=Swarm client to create Jenkins slave
After=network.target
After=display-manager.service
[Service]
ExecStart=<path>/swarm_client.sh start
ExecStop=<path>/swarm_client.sh stop
Type=forking
PIDFile=<path>/slave.pid
User=root
Group=root
[Install]
WantedBy=graphical.target
swarm_client.sh
function startclient {
    PUBIP=`public ip string`
    java \
        -jar ${SWARM_HOME}/swarm-client-3.3.jar \
        -executors 1 \
        -deleteExistingClients \
        -disableClientsUniqueId \
        -fsroot ${CLIENT_HOME} \
        -labels "linux" \
        -master <jenkins> \
        -name node-Swarm-${PUBIP} 2>&1 > ${CLIENT_HOME}/slave.log &
    PID=$!
    RETCODE=$?
    echo $PID > ${CLIENT_HOME}/slave.pid
    exit $RETCODE
}

function stopclient {
    if [ -f ${CLIENT_HOME}/slave.pid ]; then
        PID=`head -n1 ${CLIENT_HOME}/slave.pid`
        kill $PID
        rm -f ${CLIENT_HOME}/slave.pid
    fi
}

SWARM_HOME=<path>/jenkins/swarm
CLIENT_HOME=<path>/jenkins

case "$1" in
    start)
        startclient
        ;;
    stop)
        stopclient
        ;;
    *)
        echo "Usage: systemctl {start|stop} swarm_client.service" || true
        exit 1
esac
xvnc logs:
Fri Jul 7 11:05:40 2017
vncext: VNC extension running!
vncext: Listening for VNC connections on all interface(s), port 5942
vncext: created VNC server for screen 0
gnome-session-is-accelerated: llvmpipe detected.
OK, after a rubber duck session and some googling, it seems that when setting up a service that depends on user environment properties/settings (the swarm client is, in effect, a reverse remote shell), the service should import at least the environment properties of the user's shell.
In my case, swarm_client.sh was working fine from ssh but not as a service, so it needed the user's ssh/bash environment properties:
# export the user's environment to a file
env > user.env
Add this file to the service definition under the [Service] section:
EnvironmentFile=<path>/user.env
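Putting it together, the [Service] section of the unit above would then look roughly like this (a sketch; the paths remain the placeholders used earlier):
[Service]
EnvironmentFile=<path>/user.env
ExecStart=<path>/swarm_client.sh start
ExecStop=<path>/swarm_client.sh stop
Type=forking
PIDFile=<path>/slave.pid
User=root
Group=root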
I have not investigated what exactly was missing, but this is good enough for my case.
Hope this helps someone with the same problems with swarm as a service under CentOS/RHEL.
