autossh tunnel hangs because of "Address already in use" regardless of all timeouts - linux

I use autossh to create a remote tunnel with the following command (IPs and port changed):
autossh -M 0 -o "ServerAliveInterval 5" -o "ServerAliveCountMax 3" -f -T -N -i /root/.ssh/id_rsa -R 1602:localhost:443 root@123.123.123.123
And the server has this sshd config:
GatewayPorts yes
ClientAliveInterval 10
ClientAliveCountMax 6
This works like a charm most of the time, and timeouts and disconnects are handled very well.
But there is one exception:
If there is only a very short interruption of the network connection, the client notices it and starts a reconnect. But the server has not noticed the drop yet and still has port 1602 in use, so I see this message in the server log: sshd[431646]: error: bind [::]:1602: Address already in use.
But autossh does not hang up and try again; it keeps the non-working tunnel open. A few seconds later, the server recognises the disconnect of the old tunnel and frees port 1602.
Now I have an autossh/ssh tunnel that does all the watchdog stuff (I can see the keep-alive message in the log every 5 seconds) and stays alive. The port on the server is now unused, and the tunnel does not work because the port is not bound at all anymore.
Autossh does not recover from this state without manual interaction. There are multiple ways to recover manually, but this is not the question.
My questions are:
Why does autossh not hang up and retry if the port is in use? (This would solve the issue.)
Or
How can I force-free the port and rebind it to the new tunnel on reconnect?
Or
How can I detect tunnels that have no port actually bound to them, in order to kill them (for example every minute from a cronjob)?
I'm searching for a way to recover from this state automatically. And I wonder why this race condition is not mentioned anywhere on the internet, even though it can be reproduced easily.

You need to add -o "ExitOnForwardFailure yes" as an autossh option.
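With this option, ssh exits when the remote forward cannot be established instead of keeping a tunnel-less connection alive, and autossh then restarts it until the bind succeeds. Applied to the command from the question, it would look roughly like this:
autossh -M 0 -o "ServerAliveInterval 5" -o "ServerAliveCountMax 3" -o "ExitOnForwardFailure yes" -f -T -N -i /root/.ssh/id_rsa -R 1602:localhost:443 root@123.123.123.123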

Related

Socket connection into Docker initially succeeds then fails

Running under macOS, I am connecting from a Node.js app with net.Socket() into a Docker container running on the same host, which contains a C++ socket server under CentOS. The docker run command is:
docker run -it --rm -p 14000-14010:14000-14010 -v /Users/me/Development/spdz:/spdz spdz/spdzdev
When the C++ server in Docker is not running, I see a successful connection in Node followed 3 ms later by a socket-closed message.
It appears as if a proxy in front of the container is accepting the request and passing it through to Docker, where it is rejected. However, this leads to erroneous messages in my front-end application, which thinks the connection was successful, only to find out later that it was not.
I would like to see a simple connection declined. Any suggestions as to how this may be remedied or better understood would be helpful.
I am confident that the behaviour is introduced by Docker, as running the components outside Docker gives the expected immediate failure on connection. Also I have tried mapping the exported ports to an external network interface rather than localhost but see the same behaviour.
I suggest you check whether the error is actually coming from your server application.
You can use the netcat command line tool to open a socket in your Docker container:
nc -l 14000
This will create a TCP server socket listening on port 14000.
Then, from your host computer (macOS), open a terminal and try to connect with telnet:
telnet -e q localhost 14000
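If nothing inside the container is listening yet, you can start the netcat listener through docker exec; the container name my_container below is a placeholder for your actual container name or ID:
docker exec -it my_container nc -l 14000
If telnet stays connected against this listener but is dropped when your C++ server should be answering, the reset is coming from the application rather than from Docker's port publishing.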

Reverse ssh tunnel fails to bind to port when tunnel is torn down and restarted

I have a host that starts a reverse ssh tunnel upon bootup like this:
ssh -N -R 2222:localhost:22 root@10.1.2.6
It works great and the reverse tunnel is formed. But whenever I reboot the host, the remote server that the tunnel is built to says this:
Sep 28 13:13:59 kali sshd[4547]: error: bind: Address already in use
Sep 28 13:13:59 kali sshd[4547]: error: channel_setup_fwd_listener_tcpip: cannot listen to port: 2222
To resolve this, I have to wait a few minutes for the old ssh tunnel to time out, then find the new ssh connection and kill it; only then, when I rebuild the ssh tunnel, does it work fine.
Is there an ssh or autossh option that does something like check whether the remote host can bind that port and, if not, try again a few seconds later?
I believe I have run into the same issue as the original poster. I seem to have found the solution at the end of the accepted answer of this question:
If the client reconnects before the connection has terminated on the server, you can end up in a situation where the new ssh connection is live but has no port forwardings. In order to avoid that, you need to use the ExitOnForwardFailure keyword on the client side.
I have thus added the following line to my /etc/ssh/ssh_config file at the client side:
ExitOnForwardFailure yes
According to the ssh man page, this option will cause "a client started with -f [to] wait for all remote port forwards to be successfully established before placing itself in the background".
This seems to cause ssh to fail when attempting to start an ssh tunnel immediately after killing one. The option thus enables repeating the attempt until the tunnel is correctly re-established.
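As a rough sketch of how this fits together on the client (the IP, port, and retry delay below come from the question or are chosen arbitrarily), the option can also be passed on the command line and combined with a retry loop:
while true; do
    ssh -o "ExitOnForwardFailure yes" -N -R 2222:localhost:22 root@10.1.2.6
    sleep 5
done
With ExitOnForwardFailure set, ssh exits whenever the remote port cannot be bound instead of staying connected without a forwarding, so the loop (or autossh, which does the same job) simply tries again a few seconds later.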

Binding IPv4 and IPv6 socket failure

I installed a license server for a piece of software and run this server on a Linux machine. The license server is executed with something like:
./exefile -logfile log -loglevel 4
where I ask for all log info to be written to the file named log.
Everything is running smoothly, but when checking the log file I see the error messages:
"Binding IPv4 socket" "Failure. Socket 16286 probably already in use"
"Binding IPv6 socket" "Failure. Socket 16287 probably already in use"
It seems that this error makes it impossible for the server to record the IP addresses of the machines that (successfully) run the program, but I cannot understand the message or find how to solve this error...
Any idea ?
Thanks.
OK, so to sum up: thanks to Marc's comments I was able to see that the first run of the license server had created a process that was still using ports 16286 and 16287, found with the command:
netstat -ap
After killing this process and restarting the license server, everything works well.
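For reference, the cleanup boils down to something like the following; the PID is a placeholder to be read from the netstat output (on newer systems ss -ltnp gives the same information):
netstat -ap | grep 16286
kill <PID>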

How to free stuck server ports?

I'm fairly sure that this is a bug in Node v0.10.18, but it has left my machine in a polluted state that I don't know how to clear.
I have this simple TCP server (CoffeeScript) script:
net = require 'net'
server = net.createServer ->
server.listen 'localhost:4545'
When I run it using coffee z.coffee and then press Ctrl+C to interrupt it, I am unable to run it again on the same port due to an EADDRINUSE exception. The same thing happens on other ports with the same result.
I am aware of other answers about a similar issue, but they do not solve mine, because even restarting the machine (OS X Mountain Lion) still leaves the port blocked. Obviously, ps -A | grep node shows nothing as well.
What can I do to free up the stuck ports again?
Edit
Here is an abstract of the comments below. It seems that Node uses SO_REUSEADDR by default, so TIME_WAIT should not be the issue, especially since the ports have been stuck for over an hour. Neither netstat nor lsof run as root shows anything using the ports, and neither multiple reboots nor killing all but essential programs resolved the issue. There is no VPN or firewall.
https://github.com/joyent/node/blob/3d4c663ee68326990e0732a4aa76445688e1064e/lib/net.js#L1159
You are passing invalid arguments to server.listen. It is interpreting your string as a unix domain socket filesystem path.
This program works fine and can be killed and restarted immediately.
net = require "net"
server = net.createServer ->
  console.log "connection"
server.listen 1337, "127.0.0.1"
Pass correct arguments to server.listen and all is well.

Openswan tunnel not working after network restart

I observed some strange behaviour while trying to create an IPsec connection.
I configured IPsec between a Cisco ASA and my Linux box and it works as expected. But when I restart the network service on my Linux box or restart the port on the Cisco side, the tunnel stops working while the tunnel status still says it is up:
/etc/init.d/ipsec status
/usr/libexec/ipsec/addconn Non-fips mode set in /proc/sys/crypto/fips_enabled
IPsec running - pluto pid: 2684
pluto pid 2684
1 tunnels up
some eroutes exist
When I try to connect to the other side (telnet, ping, ssh), the connection doesn't work.
My /etc/ipsec.conf looks like this:
# /etc/ipsec.conf - Openswan IPsec configuration file
#
# Manual: ipsec.conf.5
#
# Please place your own config files in /etc/ipsec.d/ ending in .conf
version 2.0 # conforms to second version of ipsec.conf specification
# basic configuration
config setup
    # Debug-logging controls: "none" for (almost) none, "all" for lots.
    # klipsdebug=none
    # plutodebug="control parsing"
    # For Red Hat Enterprise Linux and Fedora, leave protostack=netkey
    protostack=netkey
    nat_traversal=yes
    virtual_private=
    oe=off
    # Enable this if you see "failed to find any available worker"
    nhelpers=0

# You may put your configuration (.conf) files in "/etc/ipsec.d/" and uncomment this.
include /etc/ipsec.d/*.conf
And my /etc/ipsec.d/myvpn.conf looks like this:
conn myvpn
    authby=secret             # Key exchange method
    left=server-ip            # Public Internet IP address of the
                              # LEFT VPN device
    leftsubnet=server-ip/32   # Subnet protected by the LEFT VPN device
    leftnexthop=%defaultroute # correct in many situations
    right=asa-ip              # Public Internet IP address of
                              # the RIGHT VPN device
    rightsubnet=network/16    # Subnet protected by the RIGHT VPN device
    rightnexthop=asa-ip       # correct in many situations
    auto=start                # authorizes and starts this connection
                              # on booting
    auth=esp
    esp=aes-sha1
    compress=no
When I restart the Openswan service everything starts working again, but I think there should be some logic that does this automatically. Does anyone have an idea what I am missing?
You probably want to enable dead peer detection if available on both sides. Dead peer detection notices when the tunnel isn't actually working anymore and disconnects or resets it.
If not available, you can also try changing your session renegotiation time down very low; your tunnel will create new keys frequently and set up new tunnels to replace the old ones on a regular basis effectively recreating the tunnel after that timeout when the session has gone down.
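In Openswan, DPD is configured per connection; a minimal sketch of what could be added to the conn myvpn block shown above (the timer values are illustrative, not taken from this question):
    dpddelay=30        # send a DPD probe after 30 seconds of silence
    dpdtimeout=120     # consider the peer dead after 120 seconds without an answer
    dpdaction=restart  # tear the tunnel down and bring it back up automatically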
For PPP sessions on Linux myself, I simply have a "service ipsec restart" in /etc/ppp/ip-up.local to restart all tunnels whenever the PPP device comes back online.
YMMV.
I just tried DPD, but it did not work.
So, following mikebabcock's answer, I added the following line to my /etc/ppp/ip-down:
service ipsec restart
With this workaround, L2TP/IPsec now works like a charm.
I don't like the idea of restarting ipsec every time you lose the connection. Actually, /usr/libexec/ipsec/_updown is run on different actions in ipsec, and the same script can be run on leftupdown/rightupdown. But the problem is that it does not perform any actual command when the remote client connects back to your host. To fix this, you need to add doroute replace after up-client) in /usr/libexec/ipsec/_updown.netkey (if you use NETKEY, of course):
# ...skipped...
#
up-client)
	# connection to my client subnet coming up
	# If you are doing a custom version, firewall commands go here.
	doroute replace
	#
# ...skipped...
But be aware that this file will be overwritten if you update your packages, so put your copy somewhere else and then add the following commands to your connection config:
rightupdown="/usr/local/libexec/ipsec/_updown"
leftupdown="/usr/local/libexec/ipsec/_updown"
Now the routes will be restored as soon as the remote connects back to your server.
For me too, for some strange reason, DPD does not work properly in every situation. I use this script to check the status every minute. The script runs on the peer (e.g. the firewall):
#!/bin/sh
# Count established IPsec connections; restart ipsec if none are up.
C=$(ipsec auto --status | grep "established" | wc -l)
if [ $C -eq 0 ]
then
    echo "Tunnel is down... Restarting"
    ipsec restart
else
    echo "Tunnel is up...Bye!"
fi
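To run the check every minute as described, a crontab entry along these lines can be used; the path /usr/local/sbin/check-tunnel.sh is just a placeholder for wherever the script is saved:
* * * * * /usr/local/sbin/check-tunnel.sh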
This could happen because of iptables rules.
Be sure to allow UDP port 500 and the ESP protocol towards the remote public IP address.
Example:
iptables -A OUTPUT -p udp -d 1.2.3.4 --dport 500 -j ACCEPT
iptables -A OUTPUT -p esp -d 1.2.3.4 -j ACCEPT
Bye
