Openswan tunnel not working after network restart - linux

I observed some strange behaviour while trying to create ipsec connection.
I configured ipsec between cisco asa and my Linux box and it works as expected. But when I restart the network service on my Linux box or restart the port on the cisco side, the tunnel stops working but tunnel status is up:
/etc/init.d/ipsec status
/usr/libexec/ipsec/addconn Non-fips mode set in /proc/sys/crypto/fips_enabled
IPsec running - pluto pid: 2684
pluto pid 2684
1 tunnels up
some eroutes exist
When I try to connect to the other side (telnet, ping, ssh), the connection doesn't work.
My /etc/ipsec.conf looks like this:
# /etc/ipsec.conf - Openswan IPsec configuration file
#
# Manual: ipsec.conf.5
#
# Please place your own config files in /etc/ipsec.d/ ending in .conf
version 2.0 # conforms to second version of ipsec.conf specification
# basic configuration
config setup
# Debug-logging controls: "none" for (almost) none, "all" for lots.
# klipsdebug=none
# plutodebug="control parsing"
# For Red Hat Enterprise Linux and Fedora, leave protostack=netkey
protostack=netkey
nat_traversal=yes
virtual_private=
oe=off
# Enable this if you see "failed to find any available worker"
nhelpers=0
#You may put your configuration (.conf) file in the "/etc/ipsec.d/" and uncomment this.
include /etc/ipsec.d/*.conf
And my /etc/ipsec.d/myvpn.conf looks like this:
conn myvpn
authby=secret # Key exchange method
left=server-ip # Public Internet IP address of the
# LEFT VPN device
leftsubnet=server-ip/32 # Subnet protected by the LEFT VPN device
leftnexthop=%defaultroute # correct in many situations
right=asa-ip # Public Internet IP address of
# the RIGHT VPN device
rightsubnet=network/16 # Subnet protected by the RIGHT VPN device
rightnexthop=asa-ip # correct in many situations
auto=start # authorizes and starts this connection
# on booting
auth=esp
esp=aes-sha1
compress=no
When I restart the openswan service everything starts working, but i think there should be some logic that does this automatically. has anyone an idea what i am missing?

You probably want to enable dead peer detection if available on both sides. Dead peer detection notices when the tunnel isn't actually working anymore and disconnects or resets it.
If not available, you can also try changing your session renegotiation time down very low; your tunnel will create new keys frequently and set up new tunnels to replace the old ones on a regular basis effectively recreating the tunnel after that timeout when the session has gone down.
For PPP sessions on Linux myself, I simply have a "service ipsec restart" in /etc/ppp/ip-up.local to restart all tunnels whenever the PPP device comes back online.
YMMV.

Just try DPD, but not work.
So I just learned from mikebabcock.
add the following line in my /etc/ppp/ip-down
service ipsec restart
With this workaround, now L2TP/IPSec worked like a charm.

I don't like the idea restarting ipsec every time you lose connection. Actually /usr/libexec/ipsec/_updown is ran on different actions in ipsec. The same script can be run on leftupdown/rightupdown. But the problem is that it doesn't perform any actual command when the remote client connects back to your host. To fix this issue you need add doroute replace after up-client) in /usr/libexec/ipsec/_updown.netkey (if you use Netkey of course):
# ...skipped...
#
up-client)
# connection to my client subnet coming up
# If you are doing a custom version, firewall commands go here.
doroute replace
#
# ...skipped...
But be aware, this file will be overwritten, if you update your packages, so just put it somewhere else, and then add the following commands to your connection config:
rightupdown="/usr/local/libexec/ipsec/_updown"
leftupdown="/usr/local/libexec/ipsec/_updown"
Now the routes will be restored as soon as the remote connects back to your server.

Also to me, for strange reasons DPD not work properly in every situation.
I use this script to check every minute the status. The scripts runs on the Peer (e.g. the Firewall):
C=$(ipsec auto --status | grep "established" | wc -l)
if [ $C -eq 0 ]
then
echo "Tunnel is down... Restarting"
ipsec restart
else
echo "Tunnel is up...Bye!"
fi

this could happen because of iptables rules.
Be sure to have enabled the udp port 500 and the esp protocol towards the remote public ip address.
Example:
iptables -A OUTPUT -p udp -d 1.2.3.4 --dport 500 -j ACCEPT
iptables -A OUTPUT -p esp -d 1.2.3.4 -j ACCEPT
Bye

Related

autossh tunnel hangs because of „Adress already in use” regardless of all timeouts

I use autossh to create a remote tunnel with the following command (IPs and Port Changed):
autossh -M 0 -o "ServerAliveInterval 5" -o "ServerAliveCountMax 3" -f -T -N -i /root/.ssh/id_rsa -R 1602:localhost:443 root#123.123.123.123
And the server has this config in sshd:
GatewayPort yes
ClientAliveInterval 10
ClientAliveCountMax 6
This works most of the time like a charm. Also timeouts and disconnects get handled very well.
But there is one exception:
If there is only a very short interruption of the network connection – the client notice this and start a reconnect. But the server hasn’t noticed this yet and still uses this port 1602. I can then see in server log the message: sshd[431646]: error: bind [::]:1602: Address already in use.
But autossh does not hang up and try again, it keeps the not working tunnel open. A few seconds later, the server recognise the disconnect of the old tunnel and frees the port 1602.
Now I have a autossh/ssh tunnel – doing all the watchdog stuff (I can see in log this keep alive message all 5 seconds) and staying alive. The port on the server is now unused. And the tunnel is not working, because the port is not allocated at all now.
Autossh does not recover from this state without manual interaction. There are multiple ways to recover manually, but this is not the question.
My questions are:
Why autossh does not hang up and retry if the port is in use (would solve the issue)
Or
How to force free the port and rebind to the new tunnel on reconnect?
Or
How to detect tunnels without actual ports bound to it in order to kill them (for example each minute in a cronjob)
Im searching for a way to automatically recover from this state. And I wonder why this race condition is not mentioned in any place of the internet, even if it can be reproduced easily.
You need to add -o "ExitOnForwardFailure yes" as an autossh option.

Setup ssh to connect 2 PC and use MPI

I am here because I've found different problems setting up SSH using this guide proposed in this other question.
First of all I've a computer (I want to use it as master) called: timmy#timmy-Lenovo-G50-80. My other computer is a Virtual Machine always with linux mint called: test#test-VirtualBox and I'd like to use it as a slave.
What I've done until now is:
install needed packets (both PC):
sudo apt-get install openssh-server openssh-client
Change inside the file /etc/ssh/sshd_config: (Only master)
the port of server from 22 to 2222
set PubkeyAuthentication yes (so no change)
remove comment at line: Banner /etc/issue.net
STOP
I am stuck when I've to execute this command:
ssh-copy-id username#remotehost
I imagine, reading what's written, that I've to execute something like:
ssh-copy-id timmy#timmy-Lenovo-G50-80
but:
from timmy#timmy-Lenovo-G50-80 everything goes OK, I can connect to myself (not what I actually want)
from test#test-VirtualBox it tells me ERROR: ssh: Could not resolve hostname timmy#timmy-Lenovo-G50-80: Name or service not known
Finally, what I've to do in order to connect these 2 PC?
You need to enable port forwarding into your VirtualBox'ed machine. Simply right click on the virtual machine, then go into Network. Then click on advance which will expand the Network window, and then on the button that appeared labeled Port forwarding.
A table will appear with several columns (Name, Protocol, Host IP, Host Port, ...). Simply add a new entry for protocol TCP, host port = X and guest port = 22 (see the list of well-known ports here https://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers#Well-known_ports). The screenshot below is from my cloudera quickstart VM. Notice the outlined entry in the port forwarding rules, which is about setting up the SSH port in the guest OS.
Once you reboot the virtual machine, you can simply connect to it through
# ssh -p X localhost
the -p parameter tells to connect through the port X. Notice that if you want to use scp then you have to use the uppercase -P option rather than the lowercase -p.
# scp -P X localfile localhost:remote-dir/

CHECK_NRPE: Error - Could not complete SSL handshake

I have NRPE daemon process running under xinetd on amazon ec2 instance and nagios server on my local machine.
The check_nrpe -H [amazon public IP] gives this error:
CHECK_NRPE: Error - Could not complete SSL handshake.
Both Nrpe are same versions. Both are compiled with this option:
./configure --with-ssl=/usr/bin/openssl --with-ssl-lib=/usr/lib/i386-linux-gnu/
"allowed host" entry contains my local IP address.
What could be the possible reason of this error now??
If you are running nrpe as a service, make sure you have this line in your nrpe.cfg on the client side:
# example 192. IP, yours will probably differ
allowed_hosts=127.0.0.1,192.168.1.100
You say that is done, however, if you are running nrpe under xinetd, make sure to edit the only_from directive in the file /etc/xinetd.d/nrpe.
Don't forget to restart the xinetd service:
service xinetd restart
To check if you have access to it at all attempt a simple telnet on the address:port, a ping or traceroute to see where it is blocking.
telnet IP port
ping IP
traceroute -p $port IP
Also check on the target server that the nrpe daemon is working properly.
netstat -at | grep nrpe
You also need to check the versions of OpenSSL installed on both servers, as I have seen this break checks on occasion with the SSL handshake!
check your /var/sys/system.log . In my case, it turned out my monitored IP was set to something else than the one I set in nrpe.cfg file. I don't know the cause of this change, though.
#jgritty was right.
you should edit nrpe.cfg and nrpe config files to allow your master nagios server's access:
vim /usr/local/nagios/etc/nrpe.cf
allowed_hosts=127.0.0.1,172.16.16.150
and
vim /etc/xinetd.d/nrpe
only_from= 127.0.0.1 172.16.16.150
That's somewhat of a catch-all error message for NRPE. Check your firewall rules and make sure that port is open. Also try disabling SELinux and seeing if that lets the connection through. It's likely not an SSL issue, but just an issue with the connection being refused.
It looks like you are running your Nagios server in a virtual machine on a host-only network. If this is so, this would stop any external access. Ensure that you have a NAT or Bridged Network available.
So many answers, none of them hit the reason why I ran into this issue.
It turns out that nagios has terrible cross-version support and this was caused by me having a version 2 "client" (machine being monitored) and a version 3 "server" (monitoring machine).
Once I upgraded the client to version 3, the problem went away and I could do a check_nrpe -H [client IP] without issues.
Note that I am not sure if client/server are the right terms with nagios, as in the case of an NRPE call, the server is really the machine being called, but I digress.
Make sure that you have restarted the Nagios Client Plugin as well.
I'm running nrpe using the xinetd service.
Make sure also (in addition to the above basic steps) that your nagios user is authenticating properly. In my case:
Jun 6 15:05:52 gse2 xinetd[33237]: **Unknown user: nagios**<br>[file=/etc/xinetd.d/nrpe] [line=9]
Jun 6 15:05:52 gse2 xinetd[33237]: Error parsing attribute user - DISABLING
SERVICE [file=/etc/xinetd.d/nrpe] [line=9]
Jun 6 15:05:52 gse2 xinetd[33237]: **Unknown group: nagios**<br>[file=/etc/xinetd.d/nrpe] [line=10]
Jun 6 15:05:52 gse2 xinetd[33237]: Error parsing attribute group - DISABLING
SERVICE [file=/etc/xinetd.d/nrpe] [line=10]
Jun 6 15:05:52 gse2 xinetd[33237]: Service nrpe missing attribute user - DISABLING
Was showing in the /var/log messages.
It escaped me at first, but then I did a check on ypbind service and found it was not started.
After starting ypbind, nagios user and group was authenticating properly, the error went away.
some edge cases restarting nagios-nrpe-server doesn't help, due to the fact that process was not killed or it was not properly restarted.
just kill it manually then, and start.
SSL handshake error msg.Beside the allow_host you should assign.
your nagios server is in a local lan with C type ip address such as 192.168.xxxx
when the target monitored server feedback the ssl msg to your local nagios server,the message should first comes to your public IP of your line,the message cannot across the public IP into your nagios server which ip is an internal one.
you need NAT to guide the SSL message from target server to inner nagios server.
Or you better use "GET" method which just get monitor message from the nagios client side,such as SNMP to fulfill the remote monitor of local resource of linux servers.
SSL need feedback in double direction.
Best Regards
For me setting the following in /etc/nagios/nrpe.cfg on Client worked:
dont_blame_nrpe=1
It's and ubuntu 16.04 machine.
For other possible problems, I recommend looking at nrpe logs. Here is good article for configuring logs.
If you are running Debian 9 then there is a known issue regarding this problem, caused by OpenSSL dropping support for the method NRPE uses to initiate anonymous SSL connections.
The issue seems to be fixed but the fix hasn't made it into the official packages, yet.
Currently there seems to be no secure work-around.
check configuration in /etc/xinetd.d/nrpe and verify the server IP. If it is showing only_from = 127.0.0.1 change it with Server IP .

Linux Debian SSH connection to another machine has delay after network settings change

Hi StackOverflow members,
I have an issue with ssh connection on my Debian 7 system to a remote OpenSSH server located on the same network. It looks like there is some network configuration problem but I cann't find where it lays. This two debian machines are connect with a switch that is NOT connected to a router. So the two machines have no internet connection.
A-Debian 7
IP: 192.168.1.2
MASK: 255.255.255.0
GW: 192.168.1.1
B-Debian 7
IP: 192.168.1.3
MASK: 255.255.255.0
GW: 192.168.1.1
With that configuration the ssh command prompts my for a password in less then a second. But the with the following network configuration I get the password prompt after a 10+ second delay:
A-Debian 7
IP: 10.10.1.83
MASK: 255.255.255.128
GW: 10.10.1.1
B-Debian 7
IP: 10.10.1.82
MASK: 255.255.255.128
GW: 10.10.1.1
The ssh connection from the server A -> B runs with both configs on custom 1111 port.
The B machine has also a Web server running on port 8080 that has no delays with both net configurations.
Thank you in advance for any clues or tips how to solve that problem.
SOLVED: Removing of the gateway parameter "GW: 10.10.1.1" in the network settings has solved the problem.
The usual culprits here are IPv6 and DNS lookups.
SSH might try to connect via IPv6, first, but the timeout is too low for that. You can see whether IPv6 is enabled with
cat /proc/sys/net/ipv6/conf/eth0/disable_ipv6
To disable:
echo 1 > /proc/sys/net/ipv6/conf/eth0/disable_ipv6
The second culprit is DNS; my guess is that DNS lookups don't work correctly with the second configuration. Try host www.google.com to test this theory.
If that also has a delay, you need to fix your DNS setup.
If that's not it, check the rest of your networking parameters: Gateway, cables, etc.
Start to ping the other host. Is that fast & reliable?
Next, try remote login (ssh, telnet). Note that you can give telnet a port to connect to, so if you have DB server running, you can still use telnet to connect to the server. It will print an error but it allows you to test the TCP/IP connection without any extra error sources.

Connect to PostgreSql database in Linux VirtualBox from Win7

As said in headline, from Win7 host I'm trying to access Postgres 9.3 established in Linux Centos 5.8 which is in VirtualBox on the same machine. I'm trying to access it from PGAdmin and everything is OK when I start the Postgre from Win7 services, so PGAdmin is well configured.
What have I tried? I've read many articles about this subject, and even some questions on this forum but nothing worked. I have:
switched to NAT and forwarded port 5432 in VirtualBox GUI
set listenadresses = '*' in postgresql.conf file
put host all all 10.0.2.1/24 md5 line in the pg_hba.conf file
put 5432 port inbound and outbound rule in win7 firewall settings
disabled linux firewall with #service iptables stop
Just to mention. When service is started in virtual linux, I can access it from linux, so service is properly started. Problem is that windows doesn't see that service. And when service is started from linux, I can start the same service in Win and vice-versa although the port 5432 should be occupied.
The most suspicious part to me is point 3) because I'm not sure whether i have put good address in rule. That address vary from article to article, and I would appreciate if someone could explain me how to be sure which address (or range) to put there, according to my network. Or some other advice if possible. Thanks.
Solved.
Replacing:
"host all all 10.0.2.1/24 md5" with "host all all 0.0.0.0/0 trust" solved it.
In my case adding the below line to pg_hba.conf was enough:
host all all 10.0.0.0/16 md5
and then restart:
sudo /etc/init.d/postgresql restart
The Solution by Filip works, but you can tailor it further.
First, enable Adapter 2 in VM and set it to Host-only Adapter:
Second go to your host machine and find it's ip address.
This can be found by running ipconfig in your windows host machine.
Now you need to edit two files in your VMBox.
First is postgresql.conf
sudo nano /etc/postgresql/<version>/main/postgresql.conf
and add the following line:
listen_addresses = '*'
save it and then edit pg_hba.conf
sudo nano /etc/postgresql/<version>/main/pg_hba.conf
Here you need to add your host machine ip (in my case it was 192.168.56.1:
host all all 192.168.56.1/0 trust
Save it and restart postgresql
sudo /etc/init.d/postgresql restart
Now you can use pgadmin to connect to vm postgresql.
Convenience!

Resources