Finish background process when the next process completes - Linux

Hi, all
I am trying to implement automated test running from a Makefile target. As my tests depend on a running Docker container, I need to check that the container is up and running during the whole test execution, and restart it if it's down. I am trying to do this with a bash script running in the background.
At a glance, the code looks like this:
# check_container.sh needs to be stopped right after the tests finish
run-tests:
	./check_container.sh & \
	docker-compose run --rm tests; \
	RESULT=$$?; \
	docker-compose logs test-container; \
	docker-compose kill; \
	docker-compose rm -fv; \
	exit $$RESULT
Test execution time varies (from 20 min to 2 hrs), so I don't know in advance how long a run will take. So within the script I poll for longer than the longest test suite. The script looks like:
#!/bin/bash
time=0
while [ $time -le 5000 ]; do
  num=$(docker ps | grep selenium--standalone-chrome -c)
  if [[ "$num" -eq 0 ]]; then
    echo 'selenium--standalone-chrome container is down!'
    echo 'trying to recreate'
    docker-compose up -d selenium
  elif [[ "$num" -eq 1 ]]; then
    echo 'selenium--standalone-chrome container is up and running'
  else
    docker ps | grep selenium--standalone-chrome
    echo 'more than one selenium--standalone-chrome container is up and running'
  fi
  time=$((time + 1))
  sleep 30
done
So, how can I make the script exit exactly when the test run is finished, i.e. after the docker-compose run --rm tests command completes?
P.S. It is also OK if the background process is finished when the Makefile target finishes.

Docker (and Compose) can restart containers automatically when they exit. If your docker-compose.yml file has:
version: '3.8'
services:
  selenium:
    restart: unless-stopped
Then Docker will do everything your shell script does. If it also has
services:
  tests:
    depends_on:
      - selenium
then the docker-compose run tests line will also cause the selenium container to be started, and you don't need a script to start it at all.
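Putting the two fragments together, the relevant part of a compose file might look like this (the image name is an assumption; the question only names the selenium--standalone-chrome container):
version: '3.8'
services:
  selenium:
    image: selenium/standalone-chrome  # assumed image; substitute whatever the project actually uses
    restart: unless-stopped
  tests:
    depends_on:
      - selenium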

When you launch a command in the background, the special parameter $! contains its process ID. You can save this in a variable and later kill(1) it.
In plain shell-script syntax, without Make-related escaping:
./check_container.sh &         # start the watcher in the background
CHECK_CONTAINER_PID=$!         # remember its process ID
docker-compose run --rm tests
RESULT=$?
kill "$CHECK_CONTAINER_PID"    # stop the watcher as soon as the tests are done

Related

Why isn't this script killing the Docker background process?

I've read How do I kill background processes / jobs when my shell script exits?, but I can't get it to work.
IDK if it's Docker shenanigans or something else.
#!/bin/bash -e
base="$(dirname "$0")"
trap 'kill $(jobs -p)' SIGINT SIGTERM EXIT
docker run --rm -p 5432:5432 -e POSTGRES_PASSWORD=password postgres:12 &
while ! nc -z localhost 5432; do
  sleep 0.1
done
# uh-oh, error
false
When I run this, I am left with a running Docker container.
Why? How can I stop the container when my script exits?
Docker is a client/server application, consisting of a thin client, docker, and a server, dockerd. When you run a container, the client makes a few API calls to the server: one to create the container, another to start it, and, since you didn't run it detached, an attach call. When you kill the docker process, it detaches from the container, no longer showing you the logs, and kills that client portion. But the dockerd server is still running the container until the process inside it, running as PID 1 in the container's namespace, exits. You never killed that process, since it was spawned from the dockerd daemon, not directly from the docker client.
To fix this, my suggestion is to run a docker stop, with the container name or id, as part of your trap handler. I wouldn't even bother running docker in the background, and instead pass -d to run detached.
Follow-up: testing the script locally, it looks like killing the docker client does stop the container when you run the client attached like that. However, there's a race condition that can cause that stop to happen before the database is running. The command:
nc -z localhost 5432
is always going to succeed, even before PostgreSQL starts listening on the port, because Docker creates a port forward. E.g.:
$ nc -z localhost 5432 && echo it works
$ docker run -itd --rm -p 5432:5432 busybox tail -f /dev/null
c72427053124608fe18c31e5d6f3307d74a5cdce018503e9fff85dbc039b4fff
$ nc -z localhost 5432 && echo it works
it works
$ docker stop c72
c72
$ nc -z localhost 5432 && echo it works
However, if I add a sleep to the script that forces it to wait long enough for the container to finish starting up and the attach to complete, the container is stopped.
A better version of the script looks like the following; it waits for the database to completely start by checking the logs, and changes the trap to run a docker stop command:
#!/bin/bash -e
base="$(dirname "$0")"
trap 'kill $(jobs -p)' SIGINT SIGTERM EXIT
cid=$(docker run --rm -d -p 5432:5432 -e POSTGRES_PASSWORD=password postgres:12)
# leaving the kill assuming you have other background processes
trap 'docker stop $cid; kill $(jobs -p)' SIGINT SIGTERM EXIT
# waiting for the db to actually start, assuming later steps need the db to be up
while ! docker logs "$cid" 2>&1 | grep -q "database system is ready to accept connections"; do
  sleep 0.1
done
# uh-oh, error
false
It was Docker shenanigans.
I needed to use the --init option to run the tini shim, because:
A process running as PID 1 inside a container is treated specially by Linux: it ignores any signal with the default action. As a result, the process will not terminate on SIGINT or SIGTERM unless it is coded to do so.
docker run --init --rm -p 5432:5432 -e POSTGRES_PASSWORD=password postgres:12 &

Using wait-for-it.sh to wait on an endpoint with a slash

Background:
I have a project that runs a docker image (in production it will run in Knative, but for testing, docker/docker compose is a good first step), and as part of my CI/CD pipeline, I would like to run acceptance tests against the image. I have the image configured to run in docker, and the (cucumber) acceptance tests run as part of a gradle task. I would like the gradle testing to wait until the image is spun up (the image itself takes under 0.02 seconds to start thanks to GraalVM ahead-of-time compilation, but the gitlab runner can be slow).
I have hooked up a wait-for-it.sh script (https://gitlab.com/connorbutch/reading-comprehension/-/blob/5-implement-glue-code/wait-for-it.sh), and it works to wait for localhost:8080. You can verify this yourself by executing the following command:
docker run -i --rm -p 8080:8080 connorbutch/reading-comprehension-server-quarkus-impl & ./wait-for-it.sh localhost:8080 --timeout=10 --strict -- ./gradlew acceptanceTest
However, I would like to have the task wait for localhost:8080/health/readiness instead, and the following command does not work:
docker run -i --rm -p 8080:8080 connorbutch/reading-comprehension-server-quarkus-impl & ./wait-for-it.sh localhost:8080/health/readiness --timeout=10 --strict -- ./gradlew acceptanceTest
It is important to note the application starts up in 0.02 seconds at the slowest, so the endpoint is up and working; it is an issue with how I am running the wait-for-it script.
Any ideas on how to pass the values to wait-for-it.sh so that it tests a URL containing a slash? Alternatively, any other input on how to enforce this ordering would be great too!
If you'd like to run it locally, clone the code from https://gitlab.com/connorbutch/reading-comprehension/-/tree/5-implement-glue-code (the only requirement is to have docker), execute ./run-it.sh, then use the above commands.
Thanks,
Connor
wait-for-it.sh is a pure bash script that waits on the availability of a host and TCP port.
You can't use it to check whether a service deployed behind a URL is returning a response or not, so you have to implement your own entrypoint script, using the curl command and sleep to wait for that URL to return 200 OK or whatever response you are expecting.
You can start from here:
S=0
while [ $S -ne 200 ]; do
  S=$(curl -s -o /dev/null -w "%{http_code}" http://www.google.com)
  sleep 1
done
echo "DONE !!"
# write the script to start the service
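Adapted to the readiness endpoint from the question, a minimal sketch might look like this (the 60-attempt cap is an assumption so the loop cannot hang forever):
#!/bin/bash
# Poll the readiness endpoint until it returns 200, then run the tests.
attempts=0
until [ "$(curl -s -o /dev/null -w '%{http_code}' http://localhost:8080/health/readiness)" -eq 200 ]; do
  attempts=$((attempts + 1))
  if [ "$attempts" -ge 60 ]; then  # assumption: give up after roughly a minute
    echo "service never became ready" >&2
    exit 1
  fi
  sleep 1
done
./gradlew acceptanceTest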

bash script does not capture exit code 1 properly

I have a bash script in which I start a Docker container. The container start fails due to some error, and it clearly says exit code 1. This is the script I have to run the docker command:
startContainer(){
  echo "change directory to ..."
  cd "..."
  docker-compose -f ./docker-compose.yml up -d
  if [[ $? -eq 0 ]]; then
    echo "Executed docker-compose successfully on ${HOST_APP_HOME}"
  else
    echo "Failed to start container on ${HOST_APP_HOME}. Failed command: docker-compose -f ${DOCKER_CONF_FILE} up -d"
    printErrorFinish
  fi
}
The docker-compose command fails and it clearly prints this message
exited with code 1
But my script does not capture it, and the first condition (-eq 0) gets executed. Why can't it capture this error, and why does it consider the command successful?
I think the status code of docker-compose doesn't really make sense on its own. It is in charge of running multiple other containers; the exit status you see printed is probably from one of those containers.
Based on what your docker-compose file is doing, you can use the --exit-code-from option to get the exit code of a specific service. You can also add a health-check mechanism for the desired services in order to know which one is running and which one is not (a service which is deployed successfully doesn't return any value, but it can be checked with a health check).
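For the health-check idea, a compose stanza might look like this (the service name and test command are assumptions about the asker's stack):
services:
  app:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 10s
      timeout: 5s
      retries: 5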
You can read about --exit-code-from here.
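For illustration, the docker-compose call in the script above might then become something like this (the service name app is an assumption; note that --exit-code-from implies --abort-on-container-exit, so it cannot be combined with -d):
docker-compose -f ./docker-compose.yml up --exit-code-from app
echo "docker-compose exited with $?"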
Sorry that I don't know a better way.

Cucumber tests fail but travis build still passes

I am using Travis for CI. For some reason, the builds pass even when some tests fail. See the full log here
https://travis-ci.org/msm1089/hobnob/jobs/534173396
The way I am running the tests is via a bash script, e2e.test.sh, that is run by yarn.
Searching for this specific issue has not turned up anything that helps. It is something to do with exit codes, I believe. I think I need to somehow get the build to exit with a non-zero code, but as you can see at the bottom of the log, yarn exits with 0.
e2e.test.sh
#!/usr/bin/env bash
RETRY_INTERVAL=${RETRY_INTERVAL:-0.2}
# Run our API server as a background process
if [[ "$OSTYPE" == "msys" ]]; then
  if ! netstat -aon | grep "0.0.0.0:$SERVER_PORT" | grep "LISTENING"; then
    pm2 start --no-autorestart --name test:serve "C:\Program Files\nodejs\node_modules\npm\bin\npm-cli.js" -- run test:serve
    until netstat -aon | grep "0.0.0.0:$SERVER_PORT" | grep "LISTENING"; do
      sleep $RETRY_INTERVAL
    done
  fi
else
  if ! ss -lnt | grep -q :$SERVER_PORT; then
    yarn run test:serve &
  fi
  until ss -lnt | grep -q :$SERVER_PORT; do
    sleep $RETRY_INTERVAL
  done
fi
npx cucumber-js spec/cucumber/features --require-module @babel/register --require spec/cucumber/steps
if [[ "$OSTYPE" == "msys" ]]; then
  pm2 delete test:serve
fi
travis.yml
language: node_js
node_js:
  - 'node'
  - 'lts/*'
  - '10'
  - '10.15.3'
services:
  - elasticsearch
before_install:
  - curl -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.6.1.deb
  - sudo dpkg -i --force-confnew elasticsearch-6.6.1.deb
  - sudo service elasticsearch restart
before_script:
  - sleep 10
env:
  global:
    - NODE_ENV=test
    - SERVER_PROTOCOL=http
    - SERVER_HOSTNAME=localhost
    - SERVER_PORT=8888
    - ELASTICSEARCH_PROTOCOL=http
    - ELASTICSEARCH_HOSTNAME=localhost
    - ELASTICSEARCH_PORT=9200
    - ELASTICSEARCH_INDEX=test
package.json
...
"scripts": {
  "test": "yarn run test:unit && yarn run test:integration && yarn run test:e2e"
}
...
So, how can I ensure that the cucumber exit code is the one that is returned, so that the build fails as it should when the tests don't pass?
There are a few possible ways to solve this. Here are two of my favorites.
Option 1:
Add set -e at the top of your bash script, so that it exits on the first error, preserving the exit code and, subsequently, failing Travis if it's non-zero.
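For example, at the top of e2e.test.sh (a sketch):
#!/usr/bin/env bash
set -e  # any failing command (e.g. cucumber-js) now aborts the script with its exit code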
Option 2:
Capture whatever exit code you want, and exit with it wherever it makes sense:
# run whatever command here
exitcode=$?
[[ $exitcode == 0 ]] || exit $exitcode
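Applied to the end of the asker's e2e.test.sh, that might look like the following sketch; it lets the pm2 cleanup run before the exit code is propagated:
npx cucumber-js spec/cucumber/features --require-module @babel/register --require spec/cucumber/steps
exitcode=$?
if [[ "$OSTYPE" == "msys" ]]; then
  pm2 delete test:serve
fi
exit $exitcode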
As a side note, it seems like your bash script has too many responsibilities. I would consider separating them if possible; then you give Travis a list of commands to run, and possibly one or two before_script commands.
Something along these lines:
# .travis.yml
before_script:
  - ./start_server.sh
script:
  - npx cucumber-js spec/cucumber/features ...
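Here start_server.sh could contain just the server-startup half of the original script, e.g. the Linux branch (a sketch; SERVER_PORT and RETRY_INTERVAL come from the existing env):
#!/usr/bin/env bash
# start_server.sh - start the API server if nothing is listening on the port yet
RETRY_INTERVAL=${RETRY_INTERVAL:-0.2}
if ! ss -lnt | grep -q ":$SERVER_PORT"; then
  yarn run test:serve &
fi
until ss -lnt | grep -q ":$SERVER_PORT"; do
  sleep "$RETRY_INTERVAL"
done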

Stopping a started background service (phantomjs) in gitlab-ci

I'm starting phantomjs with specific arguments as part of my job.
This is running on a custom gitlab/gitlab-ci server. I'm currently not using containers; I guess that would simplify things.
I'm starting phantomjs like this:
- "timeout 300 phantomjs --ssl-protocol=any --ignore-ssl-errors=true vendor/jcalderonzumba/gastonjs/src/Client/main.js 8510 1024 768 2>&1 >> /tmp/gastonjs.log &"
Then I'm running my behat tests, and then I'm stopping that process again:
- "pkill -f 'src/Client/main.js' || true"
The problem is that when a behat test fails, the pkill is never executed, and the test run is stuck waiting for phantomjs to finish. I already added the timeout 300, but that means I'm still waiting around 2 minutes after a failure, and eventually the timeout will stop phantomjs while tests are still running once they get slow enough.
I haven't found a way to run some kind of post-run/cleanup command that also runs in case of failure.
Is there a better way to do this? Can I start phantomjs in a way that gitlab-ci doesn't care that it is still running? nohup maybe?
TL;DR: spawn the process in the background with &, but then you have to make sure the process is killed in both successful and failed builds.
I use this (with comments):
'E2E tests':
  before_script:
    - yarn install --force >/dev/null
    # if there is already an instance running, kill it - this is OK in my case, as this is not run very often
    - /bin/bash -c '/usr/bin/killall -q lite-server; exit 0'
    - export DOCKERHOST=$(ifconfig | grep -E "([0-9]{1,3}\\.){3}[0-9]{1,3}" | grep -v 127.0.0.1 | awk '{ print $2 }' | cut -f2 -d ':' | head -n1)
    - export E2E_BASE_URL="http://$DOCKERHOST:8000/#."
    # start the lite-server in a new process
    - lite-server -c bs-config.js >/dev/null &
  script:
    # run the tests
    - node_modules/.bin/protractor ./protractor.conf.js --seleniumAddress="http://localhost:4444/wd/hub" --baseUrl="http://$DOCKERHOST:8000" --browser chrome
    # on a successful run - kill lite-server
    - killall lite-server >/dev/null
  after_script:
    # when a test fails, try to kill lite-server in the after_script. This looks rather complicated, but it makes sure your build doesn't fail when the tests succeed and lite-server is already killed. To have a successful build we ensure a non-error return code (exit 0).
    - /bin/bash -c '/usr/bin/killall -q lite-server; exit 0'
  stage: test
  dependencies:
    - Build
  tags:
    - selenium
https://gist.github.com/rufinus/9ee8f04fc1f9248eeb0c73ad5360a006#file-gitlab-ci-yml-L7
As hinted, my problem basically wasn't that I couldn't kill the process; it's that my test script failing stopped execution at that point, resulting in a deadlock.
I was already doing something quite similar to the example from @Rufinus, but it just didn't work for me. There could be a few reasons, like a different way of running the tests, or starting phantomjs in before_script, which is not an option for me.
I did find a way to make it work, which was to prevent my test runner from stopping the execution of further tasks. I managed that with a set +e and then storing the exit code (something I had tried before, but it didn't work).
This is the relevant part from my job:
# Set option to prevent gitlab from stopping if behat fails.
- set +e
- "phantomjs --ssl-protocol=any --ignore-ssl-errors=true vendor/jcalderonzumba/gastonjs/src/Client/main.js 8510 1024 768 2>&1 >> /dev/null &"
# Store the exit code.
- "./vendor/bin/behat -f progress --stop-on-failure; export TEST_BEHAT=${PIPESTATUS[0]}"
- "pkill -f 'src/Client/main.js' || true"
# Exit the build
- if [ $TEST_BEHAT -eq 0 ]; then exit 0; else exit 1; fi
Try the -9 signal:
- "pkill -9 -f 'src/Client/main.js' || true"
You can try other signals as well; you can find a list here.
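For reference, the shell can list the available signals itself:
kill -l      # list all signal names
kill -l 9    # prints KILL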
