Writing linux conntrack module for DNS requests - dns

I set up a netfilter rule that balances DNS requests using the random mode of the statistics module with some NAT rules. That part worked well however when a DNS client sends all its requests from the same source port the DNS requests all are balanced to the same backend server.
I'm assuming this happens because the connection tracking identifies all the UDP packets as part of the same UDP connection. I couldn't find an easy fix for this, is there one?
In the case there isn't I will have to write some code to make things behave how I'd like. What is the proper approach to doing this?
My first thought was to create something similar to ip_conntrack_ftp that identifies DNS connections by using the ip source/dest as well as the DNS sequence number.

You shouldn't need anything that complicated - you just need to find a way to make the load balancer work "per packet" instead of "per flow".

Related

Designing a DSR load balancer

I want to build a DSR load balancer for an application I am writing. I wont go into the application because it is irrelevant for this discussion. My goal is to create a simple load balancer that does direct server response for TCP packets. The idea is to receive all packets at the load balancer, then using something like round robin, select a server from a list of available servers which are defined in some config file. The next step would be to alter the packer received and change the destination ip to be equal to the chosen backend server. Finally, the packet will be sent over to the backend server using normal system calls for sending packets. Theoretically the backend server should receive the packet, and send one back to the original requester, and then the requester can communicate directly with the backend server rather than going through the load balancer.
I am concerned that this design will not work as I expect it to. The main question is, what happens when computer A send a packet to IP Y, but receives a packet back in the same TCP stream from a computer at IP X? Will it continue to send packets to IP Y? Or will it switch over to IP X?
So it turns out this is possible, but only halfway so, and I will explain what I mean by this. I have three processes, one which is netcat, used to initiate an tcp request, a second process, the dsr-lb, which receives packets on a certain port, changes the destination ip to a backend server(passed in via command line arg), and forwards it off using raw sockets, and a third process which is a basic echo server. I got this working on a local setup. The local setup consists of netcat running on my desktop, and dsr-lb and echo servers running on two different linux VMs on the desktop as well. The path of the packets was like this:
nc -> dsr-lb -> echo -> nc
When I said it only half works, what I meant was that outgoing traffic has to always go through the dsr-lb, but returning traffic can go directly to the client. The client does not send further traffic directly to the backend server, but still goes through the dsr-lb. This makes sense since the client opened a socket to the dsr-lb ip, and internally still remembers this ip, regardless of where the packet came from.
The comment saying "if its from a different IP, it's not the same stream. tcp is connection-based" is incorrect. I read through the linux source code, specifically the receive tcp packet portion, and it turns out that linux uses source ip, source port, destination ip, and destination port to calculate a hash which is uses to find the socket that should receive the traffic. However, if no such socket matches the hash, it tries again using only the destination ip and destination port and that is how this "magic" works. I have no idea if this would work on a windows machine though.
One caveat to this answer is that I also spun up two remote VMs and tried the same experiment, and it did not work. I am guessing it worked while all the machines were on the same switch, but there might be a little more work to do to get it to work if it goes through different routers. I am still trying to figure this out, but from using tcpdump to analyze the traffic, for some reason the dsr-lb is forwarding to the wrong port on the echo server. I am not sure if something is corrupted, or if the checksum is wrong after changing the destination ip and some router along the way is dropping it or changing it somehow(I suspect this might be the case) but hopefully I can get it working over an actual network.
The theory should still hold though. The IP layer is basically a packet forwarding layer and routers should not care about the contents of the packets, they should just forward packets based on their routing tables, so changing the destination of the packet while leaving the source the same should result in the source receiving any answer. The fact that the linux kernel ultimately resolves packets to sockets just using destination ip and port means the only real roadblock to this working does not really exist.
Also, if anyone is wondering why bother doing this, it may be useful for a loadbalancer in front of websocket servers. Its not as great as a direct connection from client to websocket server, but it is better than a loadbalancer that handles both requests and responses, which makes it more scalable, and more able to run on less resources.

Node.js: Disable UDP DNS lookup and use the given IP instead

I have a simple CentOS node.js server that is supposed to consume high frequency UDP messages and then forward them to another service.
Trouble is that dgram.send does a DNS lookup on EVERY call. This DNS lookup is both slowing down the processing of the messages and occasionally getting the DNS server to blacklist the node.js host server thinking it's getting DOS'd.
The question is: how do I send a UDP packet in node.js WITHOUT incurring a DNS lookup?
Thanks for the time.
Glancing through the code for Node, it looks like you can pass an IP address to dgram.send and it won't do anything with DNS. Is it possible to look up or cache your IPs manually and then pass them to the send method?

tcp_tw_reuse vs tcp_tw_recycle : Which to use (or both)?

I have a website and application which use a significant number of connections. It normally has about 3,000 connections statically open, and can receive anywhere from 5,000 to 50,000 connection attempts in a few seconds time frame.
I have had the problem of running out of local ports to open new connections due to TIME_WAIT status sockets. Even with tcp_fin_timeout set to a low value (1-5), this seemed to just be causing too much overhead/slowdown, and it would still occasionally be unable to open a new socket.
I've looked at tcp_tw_reuse and tcp_tw_recycle, but I am not sure which of these would be the preferred choice, or if using both of them is an option.
According to Linux documentation, you should use the TCP_TW_REUSE flag to allow reusing sockets in TIME_WAIT state for new connections.
It seems to be a good option when dealing with a web server that have to handle many short TCP connections left in a TIME_WAIT state.
As described here, The TCP_TW_RECYCLE could cause some problems when using load balancers...
EDIT (to add some warnings ;) ):
as mentionned in comment by #raittes, the "problems when using load balancers" is about public-facing servers. When recycle is enabled, the server can't distinguish new incoming connections from different clients behind the same NAT device.
NOTE: net.ipv4.tcp_tw_recycle has been removed from Linux in 4.12 (4396e46187ca tcp: remove tcp_tw_recycle).
SOURCE: https://vincent.bernat.im/en/blog/2014-tcp-time-wait-state-linux
pevik mentioned an interesting blog post going the extra mile in describing all available options at the time.
Modifying kernel options must be seen as a last-resort option, and shall generally be avoided unless you know what you are doing... if that were the case you would not be asking for help over here. Hence, I would advise against doing that.
The most suitable piece of advice I can provide is pointing out the part describing what a network connection is: quadruplets (client address, client port, server address, server port).
If you can make the available ports pool bigger, you will be able to accept more concurrent connections:
Client address & client ports you cannot multiply (out of your control)
Server ports: you can only change by tweaking a kernel parameter: less critical than changing TCP buckets or reuse, if you know how much ports you need to leave available for other processes on your system
Server addresses: adding addresses to your host and balancing traffic on them:
behind L4 systems already sized for your load or directly
resolving your domain name to multiple IP addresses (and hoping the load will be shared across addresses through DNS for instance)
According to the VMWare document, the main difference is TCP_TW_REUSE works only on outbound communications.
TCP_TW_REUSE uses server-side time-stamps to allow the server to use a time-wait socket port number for outbound communications once the time-stamp is larger than the last received packet. The use of these time-stamps allows duplicate packets or delayed packets from the old connection to be discarded safely.
TCP_TW_RECYCLE uses the same server-side time-stamps, however it affects both inbound and outbound connections. This is useful when the server is the first party to initiate connection closure. This allows a new client inbound connection from the source IP to the server. Due to this difference, it causes issues where client devices are behind NAT devices, as multiple devices attempting to contact the server may be unable to establish a connection until the Time-Wait state has aged out in its entirety.

Intercept traffic above the transport layer

Firstly, I'm relatively new to network programming. I want to intercept and delay HTTP traffic before it gets to the server application. I've delved into libnetfilter_queue which gives me all the information I need to delay suitably, but at too low a level. I can delay traffic there, but unless I accept the IP datagrams almost immediately (so sending them up the stack when I want to delay them), they will get resent (when no ACK arrives), which isn't what I want.
I don't want or need to have to deal with TCP, just the payloads it delivers. So my question is how do I intercept traffic on a particular port before it reaches its destination, but after TCP has acknowledged and checked it?
Thanks
Edit: Hopefully it's obvious from the tag and libnetfilter_queue - this is for Linux
Hijack the connections through an HTTP proxy. Google up a good way to do this if you can't just set HTTP_PROXY on the client, or set up your filter running with the IP and port number of the current server, moving the real server to another IP.
So the actual TCP connections are between the client and you, then from you to the server. Then you don't have to deal with ACKs, because TCP always sees mission accomplished.
edit: I see the comments on the original already came up with this idea using iptables to redirect the traffic through your transparent proxy process on the same machine.
Well I've done what I suggested in my comment, and it works, even if it did feel a long-winded way of doing it.
The (or a) problem is that the web server now, understandably, thinks that every request comes from localhost. Really I would like this delay to be transparent to both client and server (except in time of course!). Is there anything I can do about this?
If not, what are the implications? Each HTTP session happens through a different port - is that enough for them to be separated completely as they should be? Presumably so considering it works when behind a NAT where the address for many sessions is the same.

DNS Server Refusing Connection

I am implementing a dns client, in which i try to connect to a local dns server, but the dns server is returning the message with an error code 5 , which means that its refusing the connection.
Any thoughts on why this might be happening ?? Thanks
DNS response error code 5 ("Refused") doesn't mean that the connection to the DNS server is refused.
It means that the DNS server refuses to provide whatever data you asked for, or to do whatever action you asked it to do (for example a dynamic update).
Since you mention a "connection", I assume that you are using TCP?
DNS primarilly uses UDP, and some DNS servers will refuse all requests over TCP.
So the solution might be as simple as switching to UDP.
Otherwise, assuming you are building your own DNS client from scratch, my first guess would be that you are formatting the request incorrectly. Eventhough the DNS protocol seems fairly simple, it is very easy to get this wrong.
Finally, the DNS server may of course simply be configured to refuse requests for whatever you are asking.
explicitly adding the network from which i wanted to allow-recursion fixed this problem for me:
these two lines added to /etc/bind/named.conf.options
recursion yes;
allow-recursion { 10.2.0.0/16; };
Policy enforcement?
The DNS server could be configured to accept only connections from certain hosts.
Hmm, if you're able to access StackOverflow you have a working DNS server SOMEwhere. Try doing
host -v stackoverflow.com
and look for messages like
Received 50 bytes from 192.168.1.1#53 in 75 ms
then pick the address out of that line and use THAT as your DNS - it's obviously willing to talk to you.
If you're on Windows, use NSLOOKUP for the same purpose. Your name server's address will be SOMEwhere in the output.
EDIT:
When I'm stuck for a DNS server, I use the one whose address I can remember most easily: 4.2.2.2 . See how that works for you.
You might try monitoring the conversation using WireShark. It can also decode the packets for you, which might help you determine if your client's packets are correctly encoded. Just filter on port 53 (DNS) to limit the packets captured by the trace.
Also, make sure you're using UDP and not TCP for queries; TCP should be used primarily for zone transfers, not queries.

Resources