Server not reachable within a VPN (SNX) out of a Docker container - linux

I am working with the latest Manjaro with the kernel: x86_64 Linux 5.10.15-1-MANJARO.
I am connected to my company network via VPN.
For this I use SNX with the build version 800010003.
When I start a Docker container (Docker version 20.10.3, build 48d30b5b32) which should connect to a machine from the company network, I get the following message.
[maurice#laptop ~]$ docker run --rm alpine ping company-server
ping: bad address 'company-server'
Also using the IP from the 'company-server' doesn't work.
A ping outside the container works, no matter using the name or IP.
The resolv.conf looks correct to me.
[maurice#laptop ~]$ docker run --rm alpine cat /etc/resolv.conf
# Generated by NetworkManager
search lan
nameserver 10.1.0.250
nameserver 10.1.0.253
nameserver 192.168.86.1
What I have found out so far.
If I downgrade packages glibc and lib32-glibc to version 2.32-5, the ping out of the container works again. Because of dependencies I have also to downgrade gcc, gcc-libs and lib32-gcc-libs to version 10.2.0-4.
I tried the whole thing with a fresh Pop OS 20.10 installation, same problem.
I also did a test with another VPN (OpenVPN) which worked fine. However, this was only a test scenario and cannot be used as an alternative.
I have been looking for a solution for several days but have not found anything. It would be really nice if someone could help me with this.

TL;DR:
on kernel >5.8 the tunsnx interface is no longer created with global scope and need to be recreated. small script to the rescure https://gist.github.com/Fahl-Design/ec1e066ec2ef8160d101dff96a9b56e8
Longer version:
Here are my findings and the solution to (temp) fix it:
Steps to reproduce:
connect your snx tunnel
see ping fails to server behind tunnel
docker run --rm -ti --net=company_net busybox /bin/sh -c "ping 192.168.210.210"
run this command to check ip and scope of the "tunsnx" interface
ip -o address show "tunsnx" | awk -F ' +' '{print $4 " " $6 " " $8}'
if you get something like
192.168.210.XXX 192.168.210.30/32 247
or (Thx Timz)
192.168.210.XXX 192.168.210.30/32 nowhere
the scope is not set to "global" and no connection can be established
to fix this, like "ronan lanore" suggested, you need to change the scope to global
this can be done with a little helper script like this one:
#!/usr/bin/env bash
#
# Usage: [dry_run=1] [debug=1] [interface=tunsnx] docker-fix-snx
#
# Credits to: https://github.com/docker/for-linwux/issues/288#issuecomment-825580160
#
# Env Variables:
# interface - Defaults to tunsnx
# dry_run - Set to 1 to have a dry run, just printing out the iptables command
# debug - Set to 1 to see bash substitutions
set -eu
_log_stderr() {
echo "$*" >&2
}
if [ "${debug:=0}" = 1 ]; then
set -x
dry_run=${dry_run:=1}
fi
: ${dry_run:=0}
: ${interface:=tunsnx}
data=($(ip -o address show "$interface" | awk -F ' +' '{print $4 " " $6 " " $8}'))
LOCAL_ADDRESS_INDEX=0
PEER_ADDRESS_INDEX=1
SCOPE_INDEX=2
if [ "$dry_run" = 1 ]; then
echo "[-] DRY-RUN MODE"
fi
if [ "${data[$SCOPE_INDEX]}" == "global" ]; then
echo "[+] Interface ${interface} is already set to global scope. Skip!"
exit 0
else
echo "[+] Interface ${interface} is set to scope ${data[$SCOPE_INDEX]}."
tmpfile=$(mktemp --suffix=snxwrapper-routes)
echo "[+] Saving current IP routing table..."
if [ "$dry_run" = 0 ]; then
sudo ip route save >$tmpfile
fi
echo "[+] Deleting current interface ${interface}..."
if [ "$dry_run" = 0 ]; then
sudo ip address del ${data[$LOCAL_ADDRESS_INDEX]} peer ${data[$PEER_ADDRESS_INDEX]} dev ${interface}
fi
echo "[+] Recreating interface ${interface} with global scope..."
if [ "$dry_run" = 0 ]; then
sudo ip address add ${data[$LOCAL_ADDRESS_INDEX]} dev ${interface} peer ${data[$PEER_ADDRESS_INDEX]} scope global
fi
echo "[+] Restoring routing table..."
if [ "$dry_run" = 0 ]; then
sudo ip route restore <$tmpfile 2>/dev/null
fi
echo "[+] Cleaning temporary files..."
rm $tmpfile
echo "[+] Interface ${interface} is set to global scope. Done!"
if [ "$dry_run" = 0 ]; then
echo "[+] Result:"
ip -o address show "tunsnx" | awk -F ' +' '{print $4 " " $6 " " $8}'
fi
exit 0
fi
[ "$debug" = 1 ] && set +x

Same problem for me now. Nothing big change but tunsnx interface scope change from global to 247. Delete it and re create with global scope.

Just for collection of possible solutions. I had the same problem but found that "tunsnx" interface was configured properly with "global" keyword. In my case the problem was that snx was started after docker daemon and restarting docker service docker restart helped.

Related

Check docker-compose services are running or not

I am writing a script which will check docker service but I want check services which are inside the docker-compose without getting into it.
For ex: we have custom services like tass and inception basically we check it's status by this command "service tass status"
Is there any way to check this services in Docker-compose
You can use:
docker-compose ps <name>
docker-compose top <name>
docker-compose logs <name>
To "check" the service with name <name>. You can learn many more commands by doing `docker-compose --help.
Finally, you can run docker-compose exec <name> <shell> and get an interactive shell inside the cont inaner, which with some unix-utilities will allow you to "check" the container further with ease.
Finally, you can extract the name of the running container as in How do I retrieve the exact container name started by docker-compose.yml , and use any of the docker commands mentioned in the other answer to "check". From docker inspect <the container name> you can get the cgroup name and mounted filesystem, which you can "check".
Docker compose is only a tool to build docker images.
You should rely on docker commands in order to check each service health, for example:
docker ps
docker stat
docker inspect
docker container ls
In this How to check if the docker engine and a docker container are running? thread you can find a lot of alternatives about container checking.
Checking for .State.Status, .State.Running, etc. will tell you if it's running, but it's better to ensure that the health of your containers. Below is a script that you can run that will make sure that two containers are in good health before executing a command in the 2nd container. It prints out the docker logs if the wait time/attempts threshold has been reached.
Example taken from npm sql-mdb.
#!/bin/bash
# Wait for two docker healthchecks to be in a "healthy" state before executing a "docker exec -it $2 bash $3"
##############################################################################################################################
# $1 Docker container name that will wait for a "healthy" healthcheck (required)
# $2 Docker container name that will wait for a "healthy" healthcheck and will be used to run the execution command (required)
# $3 The actual execution command that will be ran (required)
attempt=0
health1=checking
health2=checking
while [ $attempt -le 79 ]; do
attempt=$(( $attempt + 1 ))
echo "Waiting for docker healthcheck on services $1 ($health1) and $2 ($health2): attempt: $attempt..."
if [[ health1 != "healthy" ]]; then
health1=$(docker inspect -f {{.State.Health.Status}} $1)
fi
if [[ $health2 != "healthy" ]]; then
health2=$(docker inspect -f {{.State.Health.Status}} $2)
fi
if [[ $health1 == "healthy" && $health2 == "healthy" ]]; then
echo "Docker healthcheck on services $1 ($health1) and $2 ($health2) - executing: $3"
docker exec -it $2 bash -c "$3"
[[ $? != 0 ]] && { echo "Failed to execute \"$3\" in docker container \"$2\"" >&2; exit 1; }
break
fi
sleep 2
done
if [[ $health1 != "healthy" || $health2 != "healthy" ]]; then
echo "Failed to wait for docker healthcheck on services $1 ($health1) and $2 ($health2) after $attempt attempts"
docker logs --details $1
docker logs --details $2
exit 1
fi

How can I determine the IP address from a bash script?

How do I get the IPaddress of the current machine from Bash in a cross platform compatible way? I need to get the IP address the same way for Windows Linux and Mac OS.
I am currently using docker-compose to create a local version of my full deployment, however I can't access it using localhost or 127.0.0.1, I have to refer to the current machines IP address, for example curl 192.168.0.23:80
Currently I make the user set the IP address manually:
# Return true if we pass in an IPv4 pattern.
not_ip() {
rx='([1-9]?[0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])'
if [[ $1 =~ ^$rx\.$rx\.$rx\.$rx$ ]]; then
return 1
else
return 0
fi
}
# Ensure lower case
OPTION=$(echo $1 | tr [:upper:] [:lower:])
case "$OPTION" in
"test")
LOCAL_IP=${LOCAL_IP:-$2}
if not_ip "$LOCAL_IP" ; then
echo The test command couldn\'t resolve your computers Network IP. $LOCAL_IP
echo
help_comment
exit 1
fi
python -m webbrowser "http://${LOCAL_IP}:80/" &
;;
esac
However I would love to be able to get this without having to have the user set any environment variables, especially when dealing with Windows machines.
Any ideas?
You need to detect host before use of the following because of OS specific commands
Windows (Cygwin command line )
LOCAL_IP=${LOCAL_IP:-`ipconfig.exe | grep -im1 'IPv4 Address' | cut -d ':' -f2`}
MacOS
LOCAL_IP=${LOCAL_IP:-`ipconfig getifaddr en0`} #en0 for host and en1 for wireless
Linux
LOCAL_IP=${LOCAL_IP:-`ifconfig | sed -En 's/127.0.0.1//;s/.*inet (addr:)?(([0-9]*\.){3}[0-9]*).*/\2/p'`}

CentOS Network Interface Post-Up Script Not Executing

I am running CentOS 7.2, and I'm struggling to get a simple script to execute on ifup of any interface.
My /sbin/ifup-local looks like this:
[root#oracle2 ~]# cat /sbin/ifup-local
#!/bin/bash
if [[ "$1" == "eth0" ]]
then
exec /vpnup
fi
[root#oracle2 ~]#
The referenced script /vpnup looks like this:
[root#oracle2 ~]# cat /vpnup
#!/bin/bash
#
# CompanyX Production L2TP VPN - UP
#
#
echo -e "\n"
echo -e "PLEASE WAIT\n"
echo -e "Dialling Production L2TP VPN... \n"
echo -e ".........................................\n"
ipsec auto --up L2TP-PSK && echo "c qvprodvpn" > /var/run/xl2tpd/l2tp-control
echo -e ".........................................\n"
echo "Connected..."
echo "Adding local static route to manage VPN bound traffic..."
sleep 6s
ip route add 10.10.24.0/24 via 10.10.24.51
echo "Route added..."
echo -e "...\n"
[root#oracle2 ~]#
Fairly simple, the script works fine when called at command line. It just dials into a L2TP VPN that I've setup, to get this box access to the production LAN of another segment of their network.
However, if I execute "service network restart" or indeed "systemctl restart network.service", the VPN interface does not come up, nor does the ip route get added. If I manually execute ifdown eth0, and then ifup eth0, it also does not run the script as intended.
If I execute "/sbin/ifup-local eth0" the script runs as expected, so I know my script is fine, and I know my ifup-local is fine.
Am I missing something obvious? I've never worked with pre/post up scripts before, but I always figured they were pretty simple... Was I wrong?
Ensure your ifcfg-eth0 script includes
NM_CONTROLLED=no
Otherwise, calling systemctl restart network or ifup eth0 will not execute ifup-pre-local, ifup-eth, ifup-post, ifup-local, etc. for eth0. They will still be called for lo, though.

how to create a script for checking if ssh is running and if not restart it

I'm running a headless system with a raspberry pi and after a while of not connecting via ssh the system will stop responding to ssh, it is not the Wi-Fi dongle falling asleep, I have checked, seeing I have a piglow running piglow-sysmon, and the part of the pi glow that monitors network activity does show activity when the pi stops responding to ssh. I found a nice script for checking if Wi-Fi is up and if not restart it, although im not that great with bash scripting, and cannot figure out how or if i can mod it to work with ssh instead of Wi-Fi, if anyone can help me mod it, or provide a small quick one, I'm using cron to run it (once I can get it modded) every few minutes
here the script I'm trying to mod
#!/bin/bash
LOGFILE=/home/pi/network-monitor.log
if ifconfig wlan0 | grep -q "inet addr:" ;
then
echo "$(date "+%m %d %Y %T") : Wifi OK" >> $LOGFILE
else
echo "$(date "+%m %d %Y %T") : Wifi connection down! Attempting reconnection." >> $LOGFILE
ifup --force wlan0
OUT=$? #save exit status of last command to decide what to do next
if [ $OUT -eq 0 ] ; then
STATE=$(ifconfig wlan0 | grep "inet addr:")
echo "$(date "+%m %d %Y %T") : Network connection reset. Current state is" $STATE >> $LOGFILE
else
echo "$(date "+%m %d %Y %T") : Failed to reset wifi connection" >> $LOGFILE
fi
fi
Try the following script. It makes a few assumptions:
1) Your account has its own ssh key in the authorized_keys file, so that "ssh localhost" essentially just gives you another shell, without prompting for a password
2) If the ssh command does not complete in three seconds, it would be safe to assume that the ssh daemon is up, but stuck for some reason:
#! /bin/bash
ssh localhost /bin/true &
sleep 3; kill -9 $!
if wait $!
then
echo Up
else
echo Down
fi
A bit crude, but should be effective. It's up to you to figure out how to restart the ssh service in an optimum way, here. Fill in the blanks.
You may also want to discard all standard error here, as it'll likely to have some unimportant noise...
If, on the other hand, this script reports that the ssh service is running, but you still can't connect from the outside, the problem is not the ssh service, but it lies elsewhere, so it's back to the drawing board for you.

ssh script returns 255 error

In my code I have the following to run a remote script.
ssh root#host.domain.com "sh /home/user/backup_mysql.sh"
For some reason it keeps 255'ing on me. Any ideas?
I can SSH into the box just fine (passless keys setup)
REMOTE SCRIPT:
MUSER='root'
MPASS='123123'
MHOST="127.0.0.1"
VERBOSE=0
### Set bins path ###
GZIP=/bin/gzip
MYSQL=/usr/bin/mysql
MYSQLDUMP=/usr/bin/mysqldump
RM=/bin/rm
MKDIR=/bin/mkdir
MYSQLADMIN=/usr/bin/mysqladmin
GREP=/bin/grep
### Setup dump directory ###
BAKRSNROOT=/.snapshots/tmp
#####################################
### ----[ No Editing below ]------###
#####################################
### Default time format ###
TIME_FORMAT='%H_%M_%S%P'
### Make a backup ###
backup_mysql_rsnapshot(){
local DBS="$($MYSQL -u $MUSER -h $MHOST -p$MPASS -Bse 'show databases')"
local db="";
[ ! -d $BAKRSNROOT ] && ${MKDIR} -p $BAKRSNROOT
${RM} -f $BAKRSNROOT/* >/dev/null 2>&1
# [ $VERBOSE -eq 1 ] && echo "*** Dumping MySQL Database ***"
# [ $VERBOSE -eq 1 ] && echo -n "Database> "
for db in $DBS
do
local tTime=$(date +"${TIME_FORMAT}")
local FILE="${BAKRSNROOT}/${db}.${tTime}.gz"
# [ $VERBOSE -eq 1 ] && echo -n "$db.."
${MYSQLDUMP} --single-transaction -u ${MUSER} -h ${MHOST} -p${MPASS} $db | ${GZIP} -9 > $FILE
done
# [ $VERBOSE -eq 1 ] && echo ""
# [ $VERBOSE -eq 1 ] && echo "*** Backup done [ files wrote to $BAKRSNROOT] ***"
}
### Die on demand with message ###
die(){
echo "$#"
exit 999
}
### Make sure bins exists.. else die
verify_bins(){
[ ! -x $GZIP ] && die "File $GZIP does not exists. Make sure correct path is set in $0."
[ ! -x $MYSQL ] && die "File $MYSQL does not exists. Make sure correct path is set in $0."
[ ! -x $MYSQLDUMP ] && die "File $MYSQLDUMP does not exists. Make sure correct path is set in $0."
[ ! -x $RM ] && die "File $RM does not exists. Make sure correct path is set in $0."
[ ! -x $MKDIR ] && die "File $MKDIR does not exists. Make sure correct path is set in $0."
[ ! -x $MYSQLADMIN ] && die "File $MYSQLADMIN does not exists. Make sure correct path is set in $0."
[ ! -x $GREP ] && die "File $GREP does not exists. Make sure correct path is set in $0."
}
### Make sure we can connect to server ... else die
verify_mysql_connection(){
$MYSQLADMIN -u $MUSER -h $MHOST -p$MPASS ping | $GREP 'alive'>/dev/null
[ $? -eq 0 ] || die "Error: Cannot connect to MySQL Server. Make sure username and password are set correctly in $0"
}
### main ####
verify_bins
verify_mysql_connection
backup_mysql_rsnapshot
This is usually happens when the remote is down/unavailable; or the remote machine doesn't have ssh installed; or a firewall doesn't allow a connection to be established to the remote host.
ssh returns 255 when an error occurred or 255 is returned by the remote script:
EXIT STATUS
ssh exits with the exit status of the remote command or
with 255 if an error occurred.
Usually you would an error message something similar to:
ssh: connect to host host.domain.com port 22: No route to host
Or
ssh: connect to host HOSTNAME port 22: Connection refused
Check-list:
What happens if you run the ssh command directly from the command line?
Are you able to ping that machine?
Does the remote has ssh installed?
If installed, then is the ssh service running?
This error will also occur when using pdsh to hosts which are not contained in your "known_hosts" file.
I was able to correct this by SSH'ing into each host manually and accepting the question "Do you want to add this to known hosts".
If there's a problem with authentication or connection, such as not being able to read a password from the terminal, ssh will exit with 255 without being able to run your actual script. Verify to make sure you can run 'true' instead, to see if the ssh connection is established successfully.
Isn't the problem in the lines:
### Die on demand with message ###
die(){
echo "$#"
exit 999
}
Correct me if I'm wrong but I believe exit 999 is out of range for an exit code and results in a exit status of 255.
I was stumped by this. Once I got passed the 255 problem... I ended up with a mysterious error code 1. This is the foo to get that resolved:
pssh -x '-tt' -h HOSTFILELIST -P "sudo yum -y install glibc"
-P means write the output out as you go and is optional. But the -x '-tt' trick is what forces a psuedo tty to be allocated.
You can get a clue what the error code 1 means this if you try:
ssh AHOST "sudo yum -y install glibc"
You may see:
[slc#bastion-ci ~]$ ssh MYHOST "sudo yum -y install glibc"
sudo: sorry, you must have a tty to run sudo
[slc#bastion-ci ~]$ echo $?
1
Notice the return code for this is 1, which is what pssh is reporting to you.
I found this -x -tt trick here. Also note that turning on verbose mode (pssh --verbose) for these cases does nothing to help you.
It can very much be an ssh-agent issue.
Check whether there is an ssh-agent PID currently running with eval "$(ssh-agent -s)"
Check whether your identity is added with ssh-add -l and if not, add it with ssh-add <pathToYourRSAKey>.
Then try again your ssh command (or any other command that spawns ssh daemons, like autossh for example) that returned 255.
If above didn't help: check if locale is valid on client and server:
https://www.linuxbabe.com/linux-server/fix-ssh-locale-environment-variable-error
How do not pass locale through ssh
### Die on demand with message ###
die(){
echo "$#"
exit 999
}
I don't have the rep to comment on Alex's answer but the exit 999 line returns code 231 on my WSL Ubuntu 20.04.4 box. Not quite sure why that is returned but I understand that it's out of range.

Resources