Anycast/ECMP not working with iproute2/netlink between network namespaces - linux

I am attempting to validate ECMP functionality on a Linux host with unnumbered interfaces and network namespaces.
The following example demonstrates the setup:
# add address to loopback for unnumbered veth interfaces
ip addr add 198.51.100.0/32 dev lo
# namespace 1
ip netns add ns1
ip link add veth100 type veth peer name veth101
ip link set veth100 up
ip link set veth101 netns ns1
ip netns exec ns1 ip link set veth101 name eth0
ip netns exec ns1 ip addr add 192.0.2.1/32 dev eth0
ip netns exec ns1 ip link set eth0 up
ip netns exec ns1 ip route add 198.51.100.0/32 dev eth0
ip netns exec ns1 ip route add 0.0.0.0/0 via 198.51.100.0
ip route add 192.0.2.1/32 dev veth100
# namespace 2
ip netns add ns2
ip link add veth200 type veth peer name veth201
ip link set veth200 up
ip link set veth201 netns ns2
ip netns exec ns2 ip link set veth201 name eth0
ip netns exec ns2 ip addr add 192.0.2.2/32 dev eth0
ip netns exec ns2 ip link set eth0 up
ip netns exec ns2 ip route add 198.51.100.0/32 dev eth0
ip netns exec ns2 ip route add 203.0.113.0/32 dev eth0
ip netns exec ns2 ip route add 0.0.0.0/0 via 198.51.100.0
ip route add 192.0.2.2/32 dev veth200
# anycast / ecmp setup
ip netns exec ns1 ip addr add 203.0.113.0/32 dev lo
ip netns exec ns1 ip link set dev lo up
ip netns exec ns2 ip addr add 203.0.113.0/32 dev lo
ip netns exec ns2 ip link set dev lo up
ip route append 203.0.113.0/32 nexthop via 192.0.2.1 weight 100
ip route append 203.0.113.0/32 nexthop via 192.0.2.2 weight 100
I can see that I have two routes in my routing table:
$ ip route show
...
203.0.113.0 via 192.0.2.1 dev veth100 onlink
203.0.113.0 via 192.0.2.2 dev veth200 onlink
...
Ping to 203.0.113.0 works (as expected):
$ ping 203.0.113.0 -c 2
PING 203.0.113.0 (203.0.113.0) 56(84) bytes of data.
64 bytes from 203.0.113.0: icmp_seq=1 ttl=64 time=0.096 ms
64 bytes from 203.0.113.0: icmp_seq=2 ttl=64 time=0.079 ms
--- 203.0.113.0 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1024ms
rtt min/avg/max/mdev = 0.079/0.087/0.096/0.008 ms
I can set either veth100 or veth200 down and achieve failover. However, the load does not appear to be shared across veth100 and veth200 at the same time. I verified this by running tcpdump on both veth100 and veth200 simultaneously.
Experimenting, I've tried adding the ecmp route this way:
ip route add 203.0.113.0/32 nexthop via 192.0.2.2 weight 10 nexthop via 192.0.2.1 weight 10
The route appears to be installed differently. I'm not sure what the difference is in reality, but it looks different.
$ ip route show
...
203.0.113.0
nexthop via 192.0.2.2 dev veth200 weight 10
nexthop via 192.0.2.1 dev veth100 weight 10
...
But, this still has the same problem as mentioned above.
I'm not sure what next steps to take. What am I doing wrong? Is there any way to achieve ECMP load sharing in this scenario?

If you're only testing with ICMP pings, the behaviour is expected. ECMP's 5-tuple hash (source IP, source port, destination IP, destination port, protocol) has nothing to vary on with ICMP, since ICMP doesn't use port numbers, so you'll always hit the same host.
Experiment with multiple UDP and TCP flows and you should see the load-balancing effect, since at least the source ports should be ephemeral (unlike the well-known destination service ports).
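One caveat worth checking on Linux specifically: as far as I know, the kernel's IPv4 multipath hash covers only the source and destination addresses by default, and ports are included only when net.ipv4.fib_multipath_hash_policy is set to 1 (available since kernel 4.12). The following is a minimal sketch rather than a definitive recipe. The function names, UDP port 9999 and the flow count are made up for illustration, and send_flows assumes it is run under a bash build with /dev/udp redirection support.

```shell
# Sketch only: function names, port and flow count are illustrative assumptions.

# Switch IPv4 multipath hashing from layer 3 (addresses only, the default)
# to layer 4 (addresses, ports and protocol). Needs root and kernel >= 4.12.
enable_l4_hash() {
    sysctl -w net.ipv4.fib_multipath_hash_policy=1
}

# Send a number of single-datagram UDP flows.
# $1 = destination IP, $2 = UDP port (default 9999), $3 = flow count (default 20)
# Each redirection opens a new socket, so each datagram normally leaves
# from a different ephemeral source port. /dev/udp is a bash feature.
send_flows() {
    sf_dest="$1"
    sf_port="${2:-9999}"
    sf_count="${3:-20}"
    sf_i=1
    while [ "$sf_i" -le "$sf_count" ]; do
        echo "probe $sf_i" > "/dev/udp/${sf_dest}/${sf_port}"
        sf_i=$((sf_i + 1))
    done
}
```

Run as root on the host: call enable_l4_hash, then send_flows 203.0.113.0 while tcpdump is watching veth100 and veth200. If the policy is in effect, the probes should be split across both interfaces instead of all following one.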
BTW - thanks for spelling out the steps you took since I'm currently experimenting with the same concepts in order to replace the K8S network mess with simple load-balanced, routed, IPv6 only.

Related

How to specify the IP address on curl?

My server has 5 different external IPs (all working)
I added them by using:
ip addr add xx.xx.xx.xx/32 dev eth0
ip addr add yy.yy.yy.yy/32 dev eth0
ip addr add zz.zz.zz.zz/32 dev eth0
How can I tell curl to use a specific one of these addresses, for example zz.zz.zz.zz?
You should be able to use
curl --interface zz.zz.zz.zz http://example.com/
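To confirm which source address a request actually used, curl can report it through its local_ip write-out variable. A minimal sketch: the function name check_source_ip is made up for illustration, and zz.zz.zz.zz remains a placeholder for one of the server's addresses.

```shell
# Print the local address curl bound to for a single request.
# $1 = local IP (or interface name) to pass to --interface, $2 = URL
check_source_ip() {
    curl --silent --output /dev/null \
         --interface "$1" \
         --write-out '%{local_ip}\n' \
         "$2"
}
```

For example, check_source_ip zz.zz.zz.zz http://example.com/ should print zz.zz.zz.zz if the bind took effect.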

Linux tap interface not forwarding ip fragmentations

I have 4 tap interfaces: tap0 and tap1 are connected to each other, and so are tap2 and tap3:
vde_switch -d -tap tap0 -tap tap1 click
vde_switch -d -tap tap2 -tap tap3 --sock /run/vde.ctl/ctl2
I then assigned IP addresses to tap1 and tap2:
ip addr add 1.1.1.1/24 dev tap1
ip addr add 1.2.1.1/24 dev tap2
From a raw socket application, I sent a UDP packet from tap0 with source IP 1.1.1.3 and destination IP 1.2.1.3, and it arrived at tap3 (according to Wireshark).
The problem is that if I send a fragmented IP/UDP packet, Linux doesn't forward it to tap3.
I checked the first fragment: its checksum and destination MAC address are both correct. The odd thing is that if I clear the "more fragments" bit in the IP header (which changes the IP checksum), the packet gets forwarded.
By the way, I am using Linux 3.19.0-65 on a 64-bit laptop.
Any idea why? Thanks a lot!
EDIT1
Here is the output of ip route list
default via 10.0.0.1 dev wlan0 proto static
1.1.1.0/24 dev tap1 proto kernel scope link src 1.1.1.1
1.2.1.0/24 dev tap2 proto kernel scope link src 1.2.1.1
10.0.0.0/24 dev wlan0 proto kernel scope link src 10.0.0.3 metric 9
172.16.83.0/24 dev vmnet1 proto kernel scope link src 172.16.83.1
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
192.168.181.0/24 dev vmnet8 proto kernel scope link src 192.168.181.1
Edit2
Here is the link to the pcap of the IP fragment packet, captured on tap0 interface.

how to save ip rule and ip route configuration

The issue is:
I configured the following policy routing:
ip route add 192.168.1.0/24 via 10.0.2.15 dev eth0 table 10
ip route add default via 10.0.2.15 dev eth0 table 10
ip rule add from 10.0.2.15 table 10
These settings are temporary: once the network is restarted or the machine reboots, the policy routes I configured are lost. Is there any way to make them persistent?
Create the file if it does not exist:
/etc/sysconfig/network-scripts/route-eth0
Add the following contents:
192.168.1.0/24 via 10.0.2.15 dev eth0
default via 10.0.2.15 dev eth0
The default route can also be added in /etc/sysconfig/network-scripts/ifcfg-eth0:
GATEWAY=10.0.2.15
The third command (the ip rule) is not clear to me.
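On that third command: as far as I know, the same initscripts mechanism that reads route-eth0 also reads a rule-eth0 file and passes each line to ip rule add, and route-eth0 lines may carry a table suffix so the routes land in table 10 rather than in the main table. The following is a sketch under that assumption (RHEL/CentOS-style network-scripts; on NetworkManager-managed systems the rule- files may need the NetworkManager-dispatcher-routing-rules package). The function name and the directory argument are made up for illustration, so the files can be generated in a scratch directory first.

```shell
# Write route-eth0 and rule-eth0 into the directory given as $1
# (default: /etc/sysconfig/network-scripts, which needs root).
write_policy_routing() {
    pr_dir="${1:-/etc/sysconfig/network-scripts}"

    # Each line is handed to "ip route add"; "table 10" keeps the routes
    # out of the main table, matching the commands in the question.
    cat > "$pr_dir/route-eth0" <<'EOF'
192.168.1.0/24 via 10.0.2.15 dev eth0 table 10
default via 10.0.2.15 dev eth0 table 10
EOF

    # Each line is handed to "ip rule add".
    cat > "$pr_dir/rule-eth0" <<'EOF'
from 10.0.2.15 table 10
EOF
}
```

After writing the files for real, restart the network service and check the result with ip rule show and ip route show table 10.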

Multiple Linux network namespaces for single application

I'm trying to use network namespaces to achieve VRF (virtual routing and forwarding) behavior for network isolation. Essentially, I have a server application (C/C++) listening on a TCP port in the default namespace. What I'd like to do is use network namespaces to create isolated VRFs using VLANs, and then have the application running in the default namespace spawn one thread per namespace, each listening on that same port inside its namespace.
I have the network side figured out. I just can't see how to spawn a thread (I'd prefer pthreads over clone() if possible), call setns() on one of those namespaces, and then bind to the same port inside the namespace. Here's what I'm doing to create the namespaces and bridges (limited to one namespace here for simplicity):
# ip netns add ns_vlan100
# ip link add link eno1 eno1.100 type vlan id 100
# ip link add veth0 type veth peer name veth_vlan100
# ip link set veth0 netns ns_vlan100
# ip netns exec ns_vlan100 ip link set dev veth0 up
# ip link set dev veth_vlan100 up
# brctl addbr bridge_vlan100
# brctl addif bridge_vlan100 eno1.100
# brctl addif bridge_vlan100 veth_vlan100
# ip link set dev bridge_vlan100 up
# ip link set dev eno1.100 up
# ip netns exec ns_vlan100 ifconfig veth0 10.10.10.1 netmask 255.255.255.0 up
# ip netns exec ns_vlan100 ip route add default via 10.10.10.1
With this, I can create a VLAN on a peer machine (no containers) and ping 10.10.10.1 without issue. So I know the links are good. What I want to do is then have my existing application be able to spawn a thread in C or C++ (pthreads are heavily preferred), and that thread call setns() with something to put it into namespace ns_vlan100, so I can then bind to the same port for my application, just inside that namespace.
I can't seem to figure out how to do this. Any help is much appreciated.

How to use Linux Network Namespaces for per processes routing?

I want to crawl webpages through a browser and store the network traffic per URL (not only HTTP but also UDP, RTMP, etc.). I came across this solution of using Linux network namespaces for per-process routing. These are the steps I followed; however, I am unable to browse any webpage.
ip netns add test
create a pair of virtual network interfaces (veth-a and veth-b):
ip link add veth-a type veth peer name veth-b
change the active namespace of the veth-a interface:
ip link set veth-a netns test
configure the IP addresses of the virtual interfaces:
ip netns exec test ifconfig veth-a up 192.168.163.1 netmask 255.255.255.0
ifconfig veth-b up 192.168.163.254 netmask 255.255.255.0
configure the routing in the test namespace:
ip netns exec test route add default gw 192.168.163.254 dev veth-a
sudo bash -c 'echo 1 > /proc/sys/net/ipv4/ip_forward'
sudo iptables -t nat -A POSTROUTING -s 192.168.163.0/24 -o wlan0 -j MASQUERADE
Opening a browser in the namespace gives the following:
sudo ip netns exec test /usr/bin/firefox http://google.com
(firefox:15861): GConf-WARNING **: Client failed to connect to the D-BUS daemon:
Failed to connect to socket /tmp/dbus-xE8M4KnMPn: Connection refused
(firefox:15861): LIBDBUSMENU-GLIB-WARNING **: Unable to get session bus: Could not connect: Connection refused
In wireshark: sudo ip netns exec test wireshark
I can see only outgoing DNS requests from 192.168.163 to 127.0.1.1.
Kindly let me know what I am missing here?
Instead of modifying the host's /etc/resolv.conf, a cleaner way is to create a namespace-specific resolv.conf under /etc/netns/<namespace>/ (here, /etc/netns/test/resolv.conf). The "ip netns" utility bind-mounts any resolv.conf found in that directory over /etc/resolv.conf, in a mount namespace, for processes launched with ip netns exec.
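As a sketch of that approach for the namespace named test from the question: ip-netns(8) documents that files under /etc/netns/NAME/ are bind-mounted over their /etc counterparts for commands started with ip netns exec. The function name and the optional base-directory argument below are made up for illustration, and 8.8.8.8 is just an example resolver.

```shell
# Create <base>/<namespace>/resolv.conf with a single nameserver entry.
# $1 = namespace name, $2 = nameserver IP, $3 = base dir (default /etc/netns)
write_netns_resolv_conf() {
    ns_dir="${3:-/etc/netns}/$1"
    mkdir -p "$ns_dir"
    printf 'nameserver %s\n' "$2" > "$ns_dir/resolv.conf"
}
```

As root, write_netns_resolv_conf test 8.8.8.8 creates the file; afterwards sudo ip netns exec test cat /etc/resolv.conf should show the namespace-specific contents while the host's file stays untouched.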
Got it. I am able to ping 8.8.8.8. The problem was DNS resolution.
Update the DNS resolver:
put nameserver 8.8.8.8 in /etc/resolvconf/resolv.conf.d/base and in /etc/resolvconf/resolv.conf.d/head.
Restart networking:
sudo service network-manager restart
Now /etc/resolv.conf looks like this:
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 8.8.8.8
nameserver 127.0.1.1
Finally, launch the browser inside the namespace:
sudo ip netns exec test /opt/google/chrome/google-chrome --user-data-dir=/tmp/chrome2/ http://yahoo.com
