how to force user-apps to resolve "route lookup" using a specific routing table - linux

I understand that:
a. one can maintain multiple routing tables in Linux using "ip route ... table <table>"
b. the forwarding decision for packets that ingress from the outside network can be made using "ip rule add iif <dev> table <table>"
However, if I want a user app to talk to the outside world using a specific routing table, I don't see an easy way out except to use "ip netns".
Is there a way to tell the system to do the route lookup using a specific routing table?
My understanding is that "ip rules" apply only after a packet has been generated, but user apps consult the routing table even before the packet is generated, so that the ARP for the gateway can be sent.
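For reference, the mechanisms in (a) and (b) above look roughly like this (the interface name eth1, the gateway 203.0.113.1, and the table number 100 are placeholders):
ip route add default via 203.0.113.1 dev eth1 table 100
ip rule add iif eth1 table 100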

This is a bit of a complicated matter. You should familiarize yourself with SELinux labels and containers.
The Docker documentation for Red Hat states:
By default, Docker creates a virtual Ethernet card for each container. Each container has its own routing tables and iptables. In addition to this, when you ask for specific ports to be forwarded, Docker creates certain host iptables rules for you. The Docker daemon itself does some of the proxying. The takeaway here is that if you map applications to containers, you provide flexibility to yourself by limiting network access on a per-application basis.
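As a quick illustration of the "own routing tables" point, a throwaway container prints its own table, which is separate from the host's main table (the busybox image is an arbitrary choice):
docker run --rm busybox ip route
# shows the container's routing table (a default route via the bridge gateway), not the host's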

Related

DNSMASQ: serve different DNS results to different subnets

In my network infrastructure I have multiple subnets intended to segregate different types of devices. I would like the ability to serve different DNS responses from different DNS servers based on the requesting subnet. For example, I'd like to use Google's DNS for one subnet but, say, Cloudflare's anti-malware DNS for another. I would also like the ability to then further lock things down by using different "address" declarations on the different subnets.
One way that some people accomplish the first part is to use the "dhcp-option" declaration to serve different server addresses to the different subnets, but this somewhat defeats the purpose of DNSMASQ, turning it into just a DHCP server, and it also defeats using a firewall to restrict access to port 53 to control any hard-coded DNS servers.
The other option I've seen is to run two instances of DNSMASQ, but this creates a highly customized setup which doesn't make use of any of the system-level configuration files or run scripts, which I'd like to avoid.
So I'm hoping someone can offer a solution for this.
Thanks in advance.
Presumably you want all of the subnets to use DNSMasq to resolve local domain names, but you want the subnets to use different recursive resolvers for Internet queries?
You should be able to do that with the DHCP settings (so that each subnet will receive two DNS entries: one for DNSMasq and one for another resolver, e.g. 8.8.8.8). These entries will end up in /etc/resolv.conf on each device and will be tried in order when the device needs to resolve DNS. If DNSMasq is set to resolve local queries only, then the device will be forced to try the second address (e.g. 8.8.8.8) to resolve Internet queries.
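A minimal sketch of that idea as dnsmasq configuration, assuming tagged DHCP ranges; the ranges, tag names, server addresses, and the "lan" domain below are all made-up examples:
# hand each subnet dnsmasq itself first, then a different public resolver
dhcp-range=set:net1,192.168.1.100,192.168.1.200,12h
dhcp-range=set:net2,192.168.2.100,192.168.2.200,12h
dhcp-option=tag:net1,option:dns-server,192.168.1.1,8.8.8.8
dhcp-option=tag:net2,option:dns-server,192.168.2.1,1.1.1.2
# with no upstream servers configured, dnsmasq itself can only answer local names
no-resolv
local=/lan/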

Docker creates two bridges that corrupt my internet access

I'm facing a pretty strange issue:
Here is my config:
docker 17-ce
ubuntu 16.04.
I work from two different places with different internet providers.
In the first place, everything works just fine: I can run Docker out of the box and access the internet without any problems.
But in the second place I cannot access the internet while Docker is running, more precisely while the two virtual bridges created by Docker are up.
In this place the internet connection behaves very strangely: I can ping Google's DNS at 8.8.8.8, but nearly all DNS requests fail, and most of the time the internet connection goes down completely after a few seconds.
(The only difference between the first and the second place is the internet provider.)
At first I thought I could fix that by changing the default network bridge IP, but this did not solve the problem at all.
The point is that the --bip option of the Docker daemon changes the IP of the default Docker bridge docker0, but Docker also creates another bridge called br-1a0208f108d9 which does not reflect the settings passed to the --bip option.
I guess that this second bridge is causing trouble on my network because it overlaps my wifi adapter's configuration.
I'm having a hard time trying to diagnose this.
My questions are:
How can I be sure that my assumptions are right and that this second bridge is in conflict with my wifi adapter?
What is this second bridge? It's easy to find documentation about the docker0 bridge, but I cannot find anything related to this second bridge br-1a0208f108d9.
How can the exact same setup work in one place and not in the other?
With this trouble I feel like I'm pretty close to levelling up my Docker knowledge, but before that I have to improve my network administration knowledge.
Hope you can help.
I managed to solve this issue after reading this:
https://success.docker.com/Architecture/Docker_Reference_Architecture%3A_Designing_Scalable%2C_Portable_Docker_Container_Networks
The second Docker bridge br-1a0208f108d9 was created by Docker because I was using a docker-compose file which involves the creation of another custom network.
This network was using a fixed IP range:
networks:
  my_network:
    driver: bridge
    ipam:
      config:
        - subnet: 172.16.0.0/16
          gateway: 172.16.0.1
At home, the physical wifi network adapter was automatically assigned the address 192.168.0.x by DHCP.
But in the other place, the same wifi adapter gets the address 172.16.0.x, which collides with the custom Docker network.
The solution was simply to change the IP range of the custom Docker network.
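For example, the same compose fragment moved to a range that neither location hands out over DHCP (172.31.0.0/16 here is only an illustration; pick whatever does not overlap your LANs):
networks:
  my_network:
    driver: bridge
    ipam:
      config:
        - subnet: 172.31.0.0/16
          gateway: 172.31.0.1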
You have to tell Docker to use a different subnet. Edit /etc/docker/daemon.json and use something like this:
{
  "bip": "198.18.251.1/24",
  "default-address-pools": [
    {
      "base": "198.18.252.0/22",
      "size": 26
    }
  ]
}
Information is a bit hard to come by, but it looks like the bip option controls the IP and subnet assigned to the docker0 interface, while default-address-pools controls the addresses used for the br-* interfaces. You can omit bip in which case it will grab an allocation from the pool, and bip doesn't have to reside in the pool, as shown above.
The size is how big of a subnet to allocate to each Docker network. For example if your base is a /24 and you also set size to 24, then you'll be able to create exactly one Docker network, and probably you'll only be able to run one Docker container. If you try to start another you'll get the message could not find an available, non-overlapping IPv4 address pool among the defaults to assign to the network, which means you've run out of IP addresses in the pool.
In the above example I have allocated a /22 (1024 addresses) with each network/container taking a /26 (64 addresses) from that pool. 1024 ÷ 64 = 16, so you can run up to 16 Docker networks with this config (so max 16 containers running at the same time, or more if some of them share the same network). Since I rarely have more than two or three running containers at any one time this is fine for me.
In my example I'm using part of the 198.18.0.0/15 subnet as listed in RFC 3330 (but fully documented in RFC 2544) which is reserved for performance testing. It is unlikely that these addresses will appear on the real Internet, and no professional network provider will use these subnets in their private network either, so in my opinion they are a good choice for use with Docker as conflicts are very unlikely. But technically this is a misuse of this IP range so just be aware of potential future conflicts if you also choose to use these subnets.
The defaults listed in the documentation are:
{
  "bip": "",
  "default-address-pools": [
    {"base": "172.80.0.0/16", "size": 24},
    {"base": "172.90.0.0/16", "size": 24}
  ]
}
As mentioned above, the default empty bip means it will just grab an allocation from the pool, like any other network/container will.
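For completeness, after editing /etc/docker/daemon.json something along these lines should apply and verify the change (assuming a systemd-based host):
sudo systemctl restart docker
ip addr show docker0                      # should now sit in the bip range
docker network inspect bridge | grep -i subnet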
In my case I would not apply Clement's solution, because I have the network conflict only on my dev PC, while the container is delivered to many servers which are not affected.
This problem, in my opinion, should be resolved as suggested here.
I tried this workaround:
I stopped the container with "docker-compose down", which destroys the bridge.
I started the container while on the "bad" network, so the container uses another network.
Since then, if I restart the container on any network it doesn't try to use the "bad" one; it normally gets the last used one.
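In command terms the workaround is roughly this, assuming the compose file does not pin a fixed subnet (run the second step while connected to the network whose range conflicts):
docker-compose down     # removes the project's br-* bridge and frees its subnet
docker-compose up -d    # the recreated network gets a range that avoids the routes currently on the host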

Test setup on AWS to test TCP transparent proxy (TPROXY) and spoofing sockets

I'm developing a proof-of-concept of some kind of transparent proxy on Linux.
The transparent proxy intercepts TCP traffic and forwards it to a backend.
I use https://www.kernel.org/doc/Documentation/networking/tproxy.txt and spoofing sockets for the outgoing TCP connections.
On my dev PC I was able to emulate the network using Docker and everything works fine.
But I need to deploy test environment on AWS.
Proposed design:
Three VMs within the same subnet:
client, 192.168.0.2
proxy, 192.168.0.3
backend, 192.168.0.4
On the client I add a route to 192.168.0.4 through 192.168.0.3.
On the proxy I configure TPROXY to intercept TCP packets and forward them to the backend with 192.168.0.2 as the source IP address. This is where the transparent proxy does its work.
On the backend I run a simple web server. I also add a route to 192.168.0.2 through 192.168.0.3, otherwise reply packets would go back directly to 192.168.0.2 (both routes and the TPROXY rules are sketched below).
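A rough sketch of those steps in commands; the TPROXY lines follow the usual tproxy.txt recipe, and the mark value, routing table number, and listen port 3128 are assumptions rather than anything from the question:
# on the client: reach the backend via the proxy
ip route add 192.168.0.4/32 via 192.168.0.3
# on the backend: send replies for the client back through the proxy
ip route add 192.168.0.2/32 via 192.168.0.3
# on the proxy: steer intercepted TCP to the local transparent listener
iptables -t mangle -A PREROUTING -p tcp -d 192.168.0.4 -j TPROXY --tproxy-mark 0x1/0x1 --on-port 3128
ip rule add fwmark 1 lookup 100
ip route add local 0.0.0.0/0 dev lo table 100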
The question:
Will proposed network design work as expected?
AWS uses some kind of software-defined network and I don't know whether it will behave the same way as connecting 3 Linux boxes to one Ethernet switch.
Will proposed network design work as expected?
Highly unlikely.
The IP network in VPC that instances can access is, from all appearances, an IP network (Layer 3), not an Ethernet network (Layer 2), even though it's presented to the instances as though it were Ethernet.
The from/to address that is "interesting" to an Ethernet switch is the MAC address. The from/to address of interest to the EC2 network is the IP address. If you tweak your instances' IP stacks by spoofing addresses and manipulating the route tables, there are only two possible outcomes: the packets will actually arrive at the correct instance according to the infrastructure's knowledge of where that IP address should exist... or the packets will be dropped by the network. Most likely, the latter.
There is an IP Source/Destination Check Flag on each EC2 instance that disables some of the network's built-in blocking of packets the network would otherwise have considered spoofed, but this should only apply to traffic with IP addresses outside the VPC supernet CIDR block -- the IP address of each instance is known to the infrastructure and not subject to the kind of tweaking you're contemplating.
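If you do decide to turn that check off, it is a per-instance attribute; with the AWS CLI it looks roughly like this (the instance ID is a placeholder):
aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 --no-source-dest-check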
You could conceivably build tunnels among the instances using the Generic Routing Encapsulation (GRE) protocol, or OpenVPN, or some other tunneling solution; the instances would then have additional network interfaces in different IP subnets where they could directly exchange traffic using a subnet and rules they make up, since the network wouldn't see the addresses on the packets encapsulated in the tunnels and wouldn't impose any restrictions on the inner payload.
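As a minimal sketch, a GRE tunnel between the proxy and the backend could look like this (the 10.99.0.0/30 inner addressing is invented for illustration, and the security groups must allow IP protocol 47 between the instances):
# on the proxy (192.168.0.3)
ip tunnel add gre1 mode gre local 192.168.0.3 remote 192.168.0.4 ttl 255
ip addr add 10.99.0.1/30 dev gre1
ip link set gre1 up
# on the backend (192.168.0.4): the same commands with local/remote swapped and 10.99.0.2/30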
Possibly related: In a certain cloud provider other than AWS, a provider with a network design that is far less sensible than VPC, I use inter-instance tunnels (built with OpenVPN) to build my own virtual private subnets that make more sense than what that other cloud provider offers, so I would say this is potentially a perfectly viable alternative -- the increased latency of my solution is sub-millisecond.
But this all assumes that you have a valid reason for choosing a solution involving packet mangling. There should be a better, more inside-the-box way of solving the exact problem you are trying to solve.

using netcat for external loop-back test between two ports

I am writing a test script to exercise processor boards for a burn-in cycle during manufacturing. I would like to use netcat to transfer files from one process, out one Ethernet port and back into another Ethernet port to a receiving process. It looks like netcat would be an easy tool to use for this.
The problem is that if I set up the Ethernet ports with IP addresses on separate IP subnets and attempt to transfer data from one to the other, the kernel's protocol stack detects an internal route, and although the data transfer completes as expected, it does NOT go out over the wire. The packets are routed internally.
That's great for network optimization but it foils the test I want to do.
Is there an easy way to make this work? Is there a trick with iptables that would work? Or maybe something you can do to the route table?
I use network namespaces to do this sort of thing. With each of the adapters in a different namespace, the traffic definitely goes through the wire instead of being reflected in the network stack. The separate namespaces also prevent reverse-path filters and the like from getting in the way.
So presume eth0 and eth1, with iperf3 as the reflecting agent (ping server or whatever). [DISCLAIMER: text from memory, all typos are typos, YMMV]
ip netns add target
ip link set dev eth1 netns target
ip netns exec target ip link set dev eth1 up
ip netns exec target ip address add xxx.xxx.xxx.xxx/y dev eth1
ip netns exec target iperf3 --server
So now you've created the namespace "target", moved one of your adapters into that namespace, set its IP address, and finally run your application in that target namespace.
You can now run any (compatible) program in the native namespace, and if it references the xxx.xxx.xxx.xxx IP address (which clearly must be reachable via some route), it will generate on-wire traffic that, with a proper loop-back path, finds the adapter in the other namespace as if it were a different computer altogether.
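For example, with iperf3 as above, the client side run from the native namespace would simply be (assuming xxx.xxx.xxx.xxx is routable out of eth0, e.g. both ports are cabled together on the same subnet):
iperf3 --client xxx.xxx.xxx.xxx
# traffic leaves eth0 on the wire and arrives at eth1 inside the "target" namespace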
Once finished, you kill the daemon server and delete the namespace by name; the namespace members then revert and you are back to vanilla.
killall iperf3
ip netns delete target
This also works with "virtual functions" of a single interface, but that example requires teasing out one or more virtual functions --- e.g. SR-IOV type adapters -- and handing out local mac addresses. So I haven't done that enough to have a sample code tidbit ready.
Internal routing is preferred because, with the default routing behaviour, all the internal routes are marked as scope link in the local table. Check this out with:
ip rule show
ip route show table local
If your kernel supports multiple routing tables, you can simply alter the local table to achieve your goal. You don't need iptables.
Let's say 192.168.1.1 is your target IP address and eth0 is the interface from which you want to send your packets out to the wire.
ip route add 192.168.1.1/32 dev eth0 table local
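To check which path the stack will now choose, and that frames really hit the wire, something like this should do (same address and interface as in the example above):
ip route get 192.168.1.1
tcpdump -ni eth0 host 192.168.1.1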

gsoap client multiple ethernets

I have a Linux system with two Ethernet cards, eth0 and eth1. I am creating a client that sends to endpoint 1.2.3.4.
I send my web service request with the soap_call_ functions. How can I select eth1 instead of eth0?
The code is like this:
soap_call_ns__add(&soap, server, "", a, b, &result);
How can I set eth0 or eth1 inside the &soap variable?
(gsoap does not have a bind for clients... like soap_bind)
You want outgoing packets from your host to take a specific route (in this case a specific NIC)? If that's the case, then you have to adjust the kernel's routing tables.
Shorewall has excellent documentation on that kind of setup. You'll find information there about how to direct certain traffic through a particular network interface.
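As a minimal illustration of that approach, a host route for the single endpoint from the question is often all it takes (the gateway 192.0.2.1 is a placeholder for whatever is reachable on eth1):
ip route add 1.2.3.4/32 dev eth1
# or, if the endpoint sits behind a gateway on eth1:
ip route add 1.2.3.4/32 via 192.0.2.1 dev eth1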
For gsoap we need to manually bind(2) to the desired source address before connect(2) in tcp_connect.
