Check whether docker-compose services are running or not - Linux

I am writing a script that checks a Docker service, but I want to check the services that live inside docker-compose without getting into the containers.
For example: we have custom services like tass and inception; normally we check their status with the command "service tass status".
Is there any way to check these services in docker-compose?

You can use:
docker-compose ps <name>
docker-compose top <name>
docker-compose logs <name>
To "check" the service with name <name>. You can learn many more commands by doing `docker-compose --help.
Finally, you can run docker-compose exec <name> <shell> and get an interactive shell inside the cont inaner, which with some unix-utilities will allow you to "check" the container further with ease.
Finally, you can extract the name of the running container as in How do I retrieve the exact container name started by docker-compose.yml , and use any of the docker commands mentioned in the other answer to "check". From docker inspect <the container name> you can get the cgroup name and mounted filesystem, which you can "check".
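If you need this from a script, a minimal sketch along these lines could work (the service name db is a placeholder, not something from the question):
#!/bin/bash
# Check whether the compose service "db" has a running container
cid=$(docker-compose ps -q db)
if [ -n "$cid" ] && [ "$(docker inspect -f '{{.State.Running}}' "$cid")" = "true" ]; then
  echo "db is running"
else
  echo "db is not running" >&2
  exit 1
fi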

Docker Compose is just a tool for building and running containers from a compose file; it does not report service health itself.
You should rely on docker commands in order to check each service health, for example:
docker ps
docker stats
docker inspect
docker container ls
In the How to check if the docker engine and a docker container are running? thread you can find many alternatives for checking containers.
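For example, a quick check from a script (container name tass borrowed from the question; adjust to your real container name):
# Prints "true" if the container is running, "false" or an error otherwise
docker inspect -f '{{.State.Running}}' tass
# Usable as a test in a script:
docker inspect -f '{{.State.Running}}' tass 2>/dev/null | grep -q true && echo "tass is up"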

Checking .State.Status, .State.Running, etc. will tell you whether a container is running, but it's better to check the health of your containers. Below is a script you can run that makes sure two containers are in good health before executing a command in the second container. It prints the docker logs if the wait time/attempt threshold has been reached.
Example taken from npm sql-mdb.
#!/bin/bash
# Wait for two docker healthchecks to be in a "healthy" state before executing a "docker exec -it $2 bash $3"
##############################################################################################################################
# $1 Docker container name that will wait for a "healthy" healthcheck (required)
# $2 Docker container name that will wait for a "healthy" healthcheck and will be used to run the execution command (required)
# $3 The actual execution command that will be ran (required)
attempt=0
health1=checking
health2=checking
while [ $attempt -le 79 ]; do
  attempt=$(( $attempt + 1 ))
  echo "Waiting for docker healthcheck on services $1 ($health1) and $2 ($health2): attempt: $attempt..."
  if [[ $health1 != "healthy" ]]; then
    health1=$(docker inspect -f '{{.State.Health.Status}}' $1)
  fi
  if [[ $health2 != "healthy" ]]; then
    health2=$(docker inspect -f '{{.State.Health.Status}}' $2)
  fi
  if [[ $health1 == "healthy" && $health2 == "healthy" ]]; then
    echo "Docker healthcheck on services $1 ($health1) and $2 ($health2) - executing: $3"
    docker exec -it $2 bash -c "$3"
    [[ $? != 0 ]] && { echo "Failed to execute \"$3\" in docker container \"$2\"" >&2; exit 1; }
    break
  fi
  sleep 2
done
if [[ $health1 != "healthy" || $health2 != "healthy" ]]; then
  echo "Failed to wait for docker healthcheck on services $1 ($health1) and $2 ($health2) after $attempt attempts"
  docker logs --details $1
  docker logs --details $2
  exit 1
fi
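A hypothetical invocation, assuming the script above is saved as wait-healthy.sh and the compose project created containers named db and app:
# Wait for both containers to report "healthy", then run the test suite inside "app"
./wait-healthy.sh db app "npm test"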

Related

Finish background process when next process is completed

Hi, all
I am trying to implement automated test running from a Makefile target. As my tests depend on a running Docker container, I need to check that the container is up and running during the whole test execution, and restart it if it's down. I am trying to do this with a bash script running in background mode.
At a glance the code looks like this:
run-tests:
	./check_container.sh & \
	docker-compose run --rm tests; \
	#Need to finish check_container.sh right here, after tests execution
	RESULT=$$?; \
	docker-compose logs test-container; \
	docker-compose kill; \
	docker-compose rm -fv; \
	exit $$RESULT
Tests have varying execution times (from 20 min to 2 hrs), so I don't know beforehand how long a run will take. So I try to poll within the script for longer than the longest test suite. The script looks like:
#!/bin/bash
time=0
while [ $time -le 5000 ]; do
  num=$(docker ps | grep selenium--standalone-chrome -c)
  if [[ "$num" -eq 0 ]]; then
    echo 'selenium--standalone-chrome container is down!';
    echo 'try to recreate'
    docker-compose up -d selenium
  elif [[ "$num" -eq 1 ]]; then
    echo 'selenium--standalone-chrome container is up and running'
  else
    docker ps | grep selenium--standalone-chrome
    echo 'more than one selenium--standalone-chrome containers is up and running'
  fi
  time=$(($time + 1))
  sleep 30
done
So, I am looking for a way to exit the script exactly when the test run is finished, that is, after the command docker-compose run --rm tests completes.
P.S. It is also OK if the background process is simply killed when the Makefile target finishes.
Docker (and Compose) can restart containers automatically when they exit. If your docker-compose.yml file has:
version: '3.8'
services:
  selenium:
    restart: unless-stopped
Then Docker will do everything your shell script does. If it also has
services:
  tests:
    depends_on:
      - selenium
then the docker-compose run tests line will also cause the selenium container to be started, and you don't need a script to start it at all.
When you launch a command in the background, the special parameter $! contains its process ID. You can save this in a variable and later kill(1) it.
In plain shell-script syntax, without Make-related escaping:
./check_container.sh &
CHECK_CONTAINER_PID=$!
docker-compose run --rm tests
RESULT=$?
kill "$CHECK_CONTAINER_PID"

Server not reachable within a VPN (SNX) out of a Docker container

I am working with the latest Manjaro with the kernel: x86_64 Linux 5.10.15-1-MANJARO.
I am connected to my company network via VPN.
For this I use SNX with the build version 800010003.
When I start a Docker container (Docker version 20.10.3, build 48d30b5b32) which should connect to a machine from the company network, I get the following message.
[maurice#laptop ~]$ docker run --rm alpine ping company-server
ping: bad address 'company-server'
Using the IP of the 'company-server' doesn't work either.
A ping outside the container works, whether using the name or the IP.
The resolv.conf looks correct to me.
[maurice#laptop ~]$ docker run --rm alpine cat /etc/resolv.conf
# Generated by NetworkManager
search lan
nameserver 10.1.0.250
nameserver 10.1.0.253
nameserver 192.168.86.1
What I have found out so far:
If I downgrade the packages glibc and lib32-glibc to version 2.32-5, the ping from inside the container works again. Because of dependencies I also have to downgrade gcc, gcc-libs and lib32-gcc-libs to version 10.2.0-4.
I tried the whole thing with a fresh Pop OS 20.10 installation, same problem.
I also did a test with another VPN (OpenVPN) which worked fine. However, this was only a test scenario and cannot be used as an alternative.
I have been looking for a solution for several days but have not found anything. It would be really nice if someone could help me with this.
TL;DR:
On kernels newer than 5.8 the tunsnx interface is no longer created with global scope and needs to be recreated. A small script to the rescue: https://gist.github.com/Fahl-Design/ec1e066ec2ef8160d101dff96a9b56e8
Longer version:
Here are my findings and the solution to fix it temporarily:
Steps to reproduce:
connect your snx tunnel
see that a ping to a server behind the tunnel fails
docker run --rm -ti --net=company_net busybox /bin/sh -c "ping 192.168.210.210"
run this command to check the IP and scope of the "tunsnx" interface
ip -o address show "tunsnx" | awk -F ' +' '{print $4 " " $6 " " $8}'
if you get something like
192.168.210.XXX 192.168.210.30/32 247
or (thx Timz)
192.168.210.XXX 192.168.210.30/32 nowhere
then the scope is not set to "global" and no connection can be established
To fix this, as "ronan lanore" suggested, you need to change the scope to global.
this can be done with a little helper script like this one:
#!/usr/bin/env bash
#
# Usage: [dry_run=1] [debug=1] [interface=tunsnx] docker-fix-snx
#
# Credits to: https://github.com/docker/for-linux/issues/288#issuecomment-825580160
#
# Env Variables:
# interface - Defaults to tunsnx
# dry_run - Set to 1 to have a dry run, just printing out the iptables command
# debug - Set to 1 to see bash substitutions
set -eu
_log_stderr() {
  echo "$*" >&2
}
if [ "${debug:=0}" = 1 ]; then
  set -x
  dry_run=${dry_run:=1}
fi
: ${dry_run:=0}
: ${interface:=tunsnx}
data=($(ip -o address show "$interface" | awk -F ' +' '{print $4 " " $6 " " $8}'))
LOCAL_ADDRESS_INDEX=0
PEER_ADDRESS_INDEX=1
SCOPE_INDEX=2
if [ "$dry_run" = 1 ]; then
  echo "[-] DRY-RUN MODE"
fi
if [ "${data[$SCOPE_INDEX]}" == "global" ]; then
  echo "[+] Interface ${interface} is already set to global scope. Skip!"
  exit 0
else
  echo "[+] Interface ${interface} is set to scope ${data[$SCOPE_INDEX]}."
  tmpfile=$(mktemp --suffix=snxwrapper-routes)
  echo "[+] Saving current IP routing table..."
  if [ "$dry_run" = 0 ]; then
    sudo ip route save >$tmpfile
  fi
  echo "[+] Deleting current interface ${interface}..."
  if [ "$dry_run" = 0 ]; then
    sudo ip address del ${data[$LOCAL_ADDRESS_INDEX]} peer ${data[$PEER_ADDRESS_INDEX]} dev ${interface}
  fi
  echo "[+] Recreating interface ${interface} with global scope..."
  if [ "$dry_run" = 0 ]; then
    sudo ip address add ${data[$LOCAL_ADDRESS_INDEX]} dev ${interface} peer ${data[$PEER_ADDRESS_INDEX]} scope global
  fi
  echo "[+] Restoring routing table..."
  if [ "$dry_run" = 0 ]; then
    sudo ip route restore <$tmpfile 2>/dev/null
  fi
  echo "[+] Cleaning temporary files..."
  rm $tmpfile
  echo "[+] Interface ${interface} is set to global scope. Done!"
  if [ "$dry_run" = 0 ]; then
    echo "[+] Result:"
    ip -o address show "tunsnx" | awk -F ' +' '{print $4 " " $6 " " $8}'
  fi
  exit 0
fi
[ "$debug" = 1 ] && set +x
Same problem for me now. Nothing big changed, but the tunsnx interface scope changed from global to 247. Delete it and recreate it with global scope.
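If you prefer doing it by hand instead of running the script above, the delete/recreate step boils down to something like this (substitute the local address, peer and device reported by ip -o address show tunsnx; the addresses here are placeholders from the example output above):
sudo ip address del 192.168.210.XXX peer 192.168.210.30/32 dev tunsnx
sudo ip address add 192.168.210.XXX dev tunsnx peer 192.168.210.30/32 scope global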
Just to add to the collection of possible solutions: I had the same problem but found that the "tunsnx" interface was configured properly with the "global" keyword. In my case the problem was that snx was started after the Docker daemon, and restarting Docker (service docker restart) helped.

bash script does not capture exit code 1 properly

I have a bash script in which I start a Docker container. The container start fails due to some error, and it clearly says exit code 1. This is the script I use to run the docker command:
startContainer(){
  echo "change directory to ..."
  cd "..."
  docker-compose -f ./docker-compose.yml up -d
  if [[ $? -eq 0 ]]; then
    echo "Executed docker-compose successfully on ${HOST_APP_HOME}"
  else
    echo "Failed to start container on ${HOST_APP_HOME}. Failed command: docker-compose -f ${DOCKER_CONF_FILE} up -d"
    printErrorFinish
  fi
}
The docker-compose command fails and it clearly prints this message
exited with code 1
But my script does not capture it and the first condition (-eq 0) gets executed. Why can't it capture this error instead of considering the command successful?
I think the exit code of docker-compose doesn't really make sense on its own. It is in charge of running multiple other containers; the exit status you see printed is probably from one of those containers.
Based on what your docker-compose file is doing, you can use the --exit-code-from option to get the exit code of a particular service (see the example below). You can also add a health-check mechanism for the desired services in order to know which one is running and which one is not (a service that is deployed successfully doesn't return any value, but it can be checked with a health check).
You can read about --exit-code-from here.
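For example, a minimal sketch (assuming the failing service in your compose file is called app; substitute your real service name):
# Run in the foreground and make docker-compose return the exit code of the "app" service
# (--exit-code-from implies --abort-on-container-exit, so it cannot be combined with -d)
docker-compose -f ./docker-compose.yml up --exit-code-from app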
Sorry that I don't know a better way.

Running a Bash Script Continuously Across Multiple Servers (Local, Jumpbox and Dev server)

I wrote a bash script that builds a Docker container (a Node.js app) so that, at the end, the application runs on a DEV server. The thing is that the bash script has to run in stages across multiple servers, i.e., we need to build the Docker container locally, ssh to a jump box and rsync it over there, and then rsync it from the jumpbox to the dev server and ssh to it again (if this process is done manually).
If you look at step 5 in the bash script below, the rsync completes and the script goes on to SSH to the jumpbox server; however, the script stops there and I am just logged in to the jumpbox.
Could someone let me know what is wrong with this bash script and how should I fix it?
Thank you in advance.
Best,
R
#!/bin/bash
#This script allows users to deploy the application with minimal work
echo -n "Shall we begin the deployment process? Are you ready to rule the world with QA Hero's next version? `\n` If you are, then type YES and press [ENTER]?: "
read begin
echo "Wohoooo! You did the right thing! We are now ready to roll and you can actually see your terminal scroll! Lol, what a troll!"
echo -n "Yo mate! Can you enter your enumber and press [ENTER]?: "
read enumber
docker build --build-arg NODE_ENV=staging -t course-hero-x -f Dockerfile .
echo "The file has been built! Step 1 completed"
termdown 3
echo "BOOM! Get ready for step 2."
docker save -o course-hero.tar course-hero-x
termdown 3
echo "BOOM! Get ready for step 3."
rsync -avzh --progress --stats course-hero.tar `echo ${enumber}`@10.188.129.25:/home/`echo ${enumber}`
termdown 3
echo "BOOM! Get ready for step 4."
ssh `echo ${enumber}`@10.188.129.25
termdown 3
echo "BOOM! Get ready for step 5."
ls -la
termdown 3
echo "BOOM! Get ready for step 6."
if ["$enumber" == "e30157" || "$enumber" == "E30167"]; then
rsync -avzh --progress --stats course-hero.tar `echo ${enumber^^}`#10.80.63.65:/home/eh7/`echo ${enumber^^}`
elif ["$enumber" == "e32398" || "$enumber" == "E32398"]; then
rsync -avzh --progress --stats course-hero.tar `echo ${enumber^^}`#10.80.63.65:/home/eh8/`echo ${enumber^^}`
else
echo "You cannot access this system."
fi
termdown 3
echo "BOOM! Get ready for step 7."
ssh `echo ${enumber^^}`@10.80.63.65
echo "BOOM! Get ready for step 8."
ls -la
echo "BOOM! Get ready for step 9."
sudo docker load -i course-hero.tar
termdown 3
echo "BOOM! Get ready for step 10."
sudo docker ps
termdown 3
echo "BOOM! Get ready for step 11."
sudo docker stop $(sudo docker ps -a -q)
termdown 3
echo "BOOM! Get ready for step 12."
sudo docker rm $(sudo docker ps -a -q)
termdown 3
echo "BOOM! Get ready for step 13."
sudo docker load -i course-hero.tar
termdown 3
echo "BOOM! Get ready for step 14."
sudo docker run -d -e TZ=Australia/Melbourne --net=courseHero -p 0.0.0.0:80:3000 course-hero-x
echo "QA Hero is up and running. Go to http://cohero-dev.rmit.edu.au to checkout the latest version!"
termdown 3
echo "Step 15 completed! We are done here! See you next time homey!"
Once you've opened an ssh session to the other box, your local script stops because it's waiting for the ssh session to end. You're just left sitting at a command prompt on the remote box.
It looks like what you intend is for the commands following your ssh to run on the remote box. To send the commands to the remote machine, redirect them via a here-document:
ssh ${enumber}@10.188.129.25 <<'ENDSSH'
# series of commands to
# run on remote host
ENDSSH
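A rough sketch of how the first hop could look non-interactively, using the host and path from the question (everything between the ENDSSH markers runs on the jumpbox, not locally):
rsync -avzh --progress --stats course-hero.tar "${enumber}@10.188.129.25:/home/${enumber}"
ssh "${enumber}@10.188.129.25" <<'ENDSSH'
ls -la
# any further commands for the jumpbox (e.g. the second rsync) go here
ENDSSH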
Consider running your script through Shellcheck — you've got several problems with it beyond the one you asked about.

how to pause/resume a fleet unit?

I have a Vagrant CoreOS cluster set up on my computer. I can submit, load, start, stop, unload and destroy fleet units on different hosts in the cluster. Are there fleetctl commands to pause/resume a unit that has already been loaded/started? If there is no built-in command, how can I achieve pause/resume functionality for fleet units?
Containers are supposed to be stateless, and you should design your app that way.
However if you want to pause, you can connect to the host running your unit and use docker pause/unpause.
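For example (the container name is a placeholder):
docker pause my-unit-container    # freeze all processes in the container
docker unpause my-unit-container  # resume them later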
Or if you never want to stop your container, tweak your unit file using wrapper scripts this way:
[Unit]
Description=blah
[Service]
ExecStart=<full path>/start.sh
ExecStop=<full path>/stop.sh
start.sh script:
#!/bin/bash
if [[ $(docker ps | grep <CONTAINER NAME/ID>) == "" ]]; then
  docker start <yourname>/<yourcontainer>
else
  docker unpause <CONTAINER NAME/ID>
fi
stop.sh script:
#!/bin/bash
if [[ $(docker ps | grep <CONTAINER NAME/ID>) == "" ]]; then
  echo "container not running"
else
  docker pause <CONTAINER NAME/ID>
fi
There isn't a way to do this in fleet today. My question is, how is pause/resume any different from stop/start or destroy/start?
