IP_TRANSPARENT usage - linux

I am implementing a transparent TCP/UDP proxy for all ports (1-65535) on a Raspberry Pi on LAN. I am currently testing routing TCP packets with destination port 80 to the Raspberry Pi. The idea is that one interface (cf "proxy ip") captures incoming traffic and the other (cf "server ip") sends it to the internet and processes it before the original one sends the response to the client. The necessary routing on the router is done via
iptables -t mangle -A PREROUTING -p tcp -s SERVER_IP -j ACCEPT
iptables -t mangle -A PREROUTING -p tcp -s SOME_TEST_CLIENT_IP --dport 80 -j MARK --set-mark 3
ip rule add fwmark 3 table 2
ip route add default via PROXY_IP dev br0 table 2
inspired by this page. This architecture implies a one-to-one port mapping between external IP addresses and the Raspberry Pi's proxy interface. The packets arrive with the correct port and destination on the Raspberry Pi (verified with tcpdump), but the proxy doesn't accept the connections: no SYN-ACK is sent for the incoming SYNs. The proxy's listening sockets are mainly configured with
const char PROXY_IP_ADDR[] = "192.168.1....";
const char SERVER_IP_ADDR[] = "192.168.1....";
...
struct sockaddr_in saProxy = {0};
saProxy.sin_family = AF_INET;
saProxy.sin_port = htons(80);
inet_pton(AF_INET, PROXY_IP_ADDR, &(saProxy.sin_addr.s_addr));
int enable = 1;
int sockfd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
if(-1 == setsockopt(sockfd, SOL_IP, IP_TRANSPARENT, (const char*)&enable, sizeof(enable))) /*error processing*/;
if(-1 == bind(sockfd, (sockaddr*)&saProxy, sizeof(saProxy))) /* error processing*/;
if(-1 == listen(sockfd, 1)) /* error processing*/;
This is followed by epoll_ctl() and epoll_wait(). The proxy has been tested by sending HTTP requests and NBNS traffic directly to PROXY_IP without the aforementioned routing in place, and it receives and processes these connections properly.
Unfortunately, I have found very little documentation or examples related to IP_TRANSPARENT. I asked my original, Windows-related question before I could do any testing on Linux. Kernel version is 4.1.13-v7+. How can I achieve this type of proxying?
Edit: I believe I may be missing some routing settings on the Raspberry Pi, perhaps those described here. I have very little experience with iptables, so I don't quite understand the rules described there. I have read, though, that the kernel rejects non-local traffic unless some specific routing is set up, since it doesn't know about the sockets.
I have also tested binding directly to an external IP address and attempted to listen for packets with this destination address, but the symptoms remain unchanged.

The solution was actually pretty simple. To use IP_TRANSPARENT for this purpose, you need a single listening socket bound to some port X. Then you need to set up the following rules, assuming you want to redirect ALL traffic going through any (I believe) interface, excluding traffic generated for/by the proxy itself. Here the proxy's IP is 192.168.1.100, and we redirect TCP to port 82 and UDP to port 83.
iptables -t mangle -A PREROUTING ! -d 192.168.1.100 -p tcp -j TPROXY --on-port 82 --on-ip 0.0.0.0 --tproxy-mark 0x1/0x1
iptables -t mangle -A PREROUTING ! -d 192.168.1.100 -p udp -j TPROXY --on-port 83 --on-ip 0.0.0.0 --tproxy-mark 0x1/0x1
ip rule add fwmark 1 lookup 100
ip route add local 0.0.0.0/0 dev lo table 100
Linux has a special mechanism called tproxy for this.
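The listening socket that these rules deliver to could be opened roughly as follows. This is only a sketch: open_transparent_listener is a hypothetical helper name, not something from the original proxy, and it assumes the process has the privileges IP_TRANSPARENT requires (setsockopt fails with EPERM otherwise).

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open a TCP listener that can accept connections
 * handed over by the TPROXY target. Returns the fd, or -1 with errno set. */
int open_transparent_listener(const struct sockaddr_in *bind_addr)
{
    assert(bind_addr != NULL);

    int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (fd == -1)
        return -1;

    int enable = 1;
    /* IP_TRANSPARENT must be set before bind(); it needs elevated
     * privileges, so an unprivileged process gets EPERM here. */
    if (setsockopt(fd, SOL_IP, IP_TRANSPARENT, &enable, sizeof(enable)) == -1 ||
        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &enable, sizeof(enable)) == -1 ||
        bind(fd, (const struct sockaddr *)bind_addr, sizeof(*bind_addr)) == -1 ||
        listen(fd, SOMAXCONN) == -1) {
        int saved = errno;   /* keep the real error across close() */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

With the rules above, bind_addr would be 0.0.0.0 with port 82 (in network byte order) for the TCP side.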
For TCP
From here on, the socket returned by accept is automatically bound to the original destination and connected to the source, so using it for transparent proxying requires no more work on this side of the proxy.
In order to get the original destination of the socket as a sockaddr_in structure, call getsockname() on the socket returned by accept() as usual.
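As a rough illustration of the getsockname() step, the sketch below accepts one connection and reports both the client address and the local address of the accepted socket. The helper name accept_with_original_dst is made up for this example. On a socket fed by TPROXY the local address would be the original destination; on an ordinary listener it is simply the address the client connected to.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: accept one connection and fill in
 *   client   - the peer (source) address
 *   orig_dst - the local address of the accepted socket, which for a
 *              TPROXY-intercepted connection is the original destination.
 * Returns the connected fd, or -1 with errno set. */
int accept_with_original_dst(int listen_fd,
                             struct sockaddr_in *client,
                             struct sockaddr_in *orig_dst)
{
    assert(client != NULL && orig_dst != NULL);

    socklen_t len = sizeof(*client);
    int conn = accept(listen_fd, (struct sockaddr *)client, &len);
    if (conn == -1)
        return -1;

    len = sizeof(*orig_dst);
    if (getsockname(conn, (struct sockaddr *)orig_dst, &len) == -1) {
        int saved = errno;   /* keep the real error across close() */
        close(conn);
        errno = saved;
        return -1;
    }
    return conn;
}
```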
For UDP
To be able to get the original destination, on the UDP socket, set this option before binding:
int enable = 1;
setsockopt(sockfd, SOL_IP, IP_RECVORIGDSTADDR, (const char*)&enable, sizeof(enable));
Then, to receive the data and get the original destination
char cmbuf[100];
unsigned char bytes[16*1024];
sockaddr_in srcIpAddr, dstIpAddr;
int dstPort;
iovec iov;
iov.iov_base = bytes;
iov.iov_len = sizeof(bytes)-1;
msghdr mh;
mh.msg_name = &srcIpAddr;
mh.msg_namelen = sizeof(sockaddr_in);
mh.msg_control = cmbuf;
mh.msg_controllen = 100;
mh.msg_iovlen = 1;
mh.msg_iov = &iov;
int res = recvmsg(sock, &mh, 0);
sem_post(&udpSem); //I use a semaphore to indicate when incoming data is read and socket is ready for new datagram to be processed
for(cmsghdr *cmsg = CMSG_FIRSTHDR(&mh); cmsg != NULL; cmsg = CMSG_NXTHDR(&mh, cmsg))
{
if(cmsg->cmsg_level != SOL_IP || cmsg->cmsg_type != IP_ORIGDSTADDR) continue; //normally we use IP_PKTINFO if not using tproxy, but this would yield 192.168.1.100:83 in the example
std::memcpy(&dstIpAddr, CMSG_DATA(cmsg), sizeof(sockaddr_in));
dstPort = ntohs(dstIpAddr.sin_port);
}
Then, if we want to reply to the datagram, we need to create a new UDP socket (as UDP is connectionless) and bind it to the original destination of the datagram, stored in dstIpAddr. I had some trouble here: I first tried IP_FREEBIND, but that option does not seem to work for sending data over UDP (I think it is only intended for TCP listening sockets). So we set IP_TRANSPARENT again before binding, which lets us bind to a non-local address.
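A minimal sketch of that reply step might look like this. The function name reply_from_original_dst and its parameters are illustrative assumptions; orig_dst and client correspond to dstIpAddr and srcIpAddr in the receive code. As with the listener, IP_TRANSPARENT needs elevated privileges, so the call fails with EPERM for an unprivileged process.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send one datagram to `client` so that it appears
 * to come from `orig_dst`, the address the client originally targeted.
 * Returns the number of bytes sent, or -1 with errno set. */
ssize_t reply_from_original_dst(const struct sockaddr_in *orig_dst,
                                const struct sockaddr_in *client,
                                const void *data, size_t len)
{
    assert(orig_dst != NULL && client != NULL);

    int fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
    if (fd == -1)
        return -1;

    int enable = 1;
    ssize_t sent = -1;
    /* IP_TRANSPARENT before bind() is what permits a non-local source. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &enable, sizeof(enable)) == 0 &&
        setsockopt(fd, SOL_IP, IP_TRANSPARENT, &enable, sizeof(enable)) == 0 &&
        bind(fd, (const struct sockaddr *)orig_dst, sizeof(*orig_dst)) == 0) {
        sent = sendto(fd, data, len, 0,
                      (const struct sockaddr *)client, sizeof(*client));
    }

    int saved = errno;   /* keep the real error across close() */
    close(fd);
    errno = saved;
    return sent;
}
```

Opening a socket per reply keeps the sketch short; a real proxy would probably cache one socket per original destination.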

Related

Can't get nftables to redirect port 80 to 8080

I've tried setting up my server so it redirects traffic for port 80 to port 8080, but it doesn't work. (I get "Connection refused" errors if I telnet to port 80, and "Unable to connect" with firefox.)
I have been able to get it to work using iptables, but would prefer using nftables. Does anybody have an idea what the problem might be? (In case it's relevant, the server is running on linode.com, with a kernel provided by linode.)
I've got the following in /etc/nftables.conf:
#!/usr/sbin/nft -f
flush ruleset
table ip fw {
chain in {
type filter hook input priority 0;
# accept any localhost traffic
iif lo accept
# accept traffic originated from us
ct state established,related accept
# accept ssh, alternative http
tcp dport { ssh, http, http-alt } ct state new counter accept
counter drop
}
}
table ip nat {
chain prerouting {
type nat hook prerouting priority 0;
tcp dport http redirect to http-alt
}
chain postrouting {
type nat hook postrouting priority 0;
}
}
If you're routing on localhost only, try using
table ip nat {
chain output {
type nat hook output priority 0;
tcp dport http redirect to http-alt
}
}
Some years ago I read, for iptables, that packets on the loopback device don't traverse the prerouting chains but go through the output chains instead. That was my problem.
Did you mean table inet filter instead of table ip fw?
If so, I had a similar problem. Changing the ip nat prerouting priority to -101 got it working, but I'm not sure why. It might be related to the default priority for NF_IP_PRI_NAT_DST (-100): destination NAT. The only range that seemed to work was -101 to -200.
#!/usr/sbin/nft -f
flush ruleset
table inet filter {
chain input {
type filter hook input priority 0;
counter
# accept any localhost traffic
iif lo accept
# accept traffic originated from us
ct state {established,related} accept
# activate the following line to accept common local services
tcp dport { 22, 80, 443, 9443 } ct state new accept
# accept neighbour discovery otherwise IPv6 connectivity breaks.
ip6 nexthdr icmpv6 icmpv6 type { nd-neighbor-solicit, nd-router-advert, nd-neighbor-advert } accept
# count and drop any other traffic
counter drop
}
}
table ip nat {
chain input {
type nat hook input priority 0;
counter
}
chain prerouting {
type nat hook prerouting priority -101;
counter
tcp dport 443 counter redirect to 9443
}
chain postrouting {
type nat hook postrouting priority 0;
counter
}
}
The counter rules make it easy to see whether the chain is even being processed; the counter values can be seen via nft list ruleset.

Dnsmasq not receiving response after TPROXY intercept

I am developing a 'monitor the traffic' kind of an application on the router, where I use the TPROXY feature to intercept the DNS packet & send it to my application server listening on a port. After processing, I forward the packet to the actual destination (i.e., dnsmasq) after modifying the TTL.
JFYI, my firewall rule to TPROXY forward the DNS Response packets to my application server listening on port 2345 looks like this:
iptables -t mangle -A PREROUTING -i <WAN-INTERFACE> -p udp --sport 53 -j TPROXY --tproxy-mark 0x3 --on-port 2345
At my application server, without the error checks:
int enabled = 1;
sock_fd = socket(AF_INET, SOCK_DGRAM, 0);
setsockopt(sock_fd, SOL_IP, IP_PKTINFO, &enabled, sizeof(enabled));
setsockopt(sock_fd, SOL_IP, IP_TRANSPARENT, &enabled, sizeof(enabled));
setsockopt(sock_fd, SOL_IP, IP_RECVORIGDSTADDR, &enabled, sizeof(enabled));
setsockopt(sock_fd, SOL_SOCKET, SO_REUSEADDR, &enabled, sizeof(enabled));
/* client_addr points to the source IP (i.e. upstream DNS server's IP) */
bind(sock_fd, (const struct sockaddr *)client_addr, sizeof(struct sockaddr));
/* dst_addr points to the router IP on the WAN interface */
sendto(sock_fd, dns_packet_buffer, data_len, 0,
(const struct sockaddr *)dst_addr, sizeof(struct sockaddr));
This sendto succeeds, i.e., no error!!! But, dnsmasq does not receive the data! To be more precise, the fd on which dnsmasq is waiting for data does not become "ready."
At the dnsmasq code, inside check_dns_listeners
for (serverfdp = daemon->sfds; serverfdp; serverfdp = serverfdp->next)
if (FD_ISSET(serverfdp->fd, set))
reply_query(serverfdp->fd, serverfdp->source_addr.sa.sa_family, now);
the FD_ISSET() returns false. If I do not intercept the DNS response flow then this FD_ISSET() returns true. What am I missing here?
Finally I found the answer!! Let me put it here in case it is helpful for anybody else.
As I had mentioned earlier, my application was running on the router. The router manufacturers had modified the existing dnsmasq code to add an option that limits the interface on which it listens for the upstream server! In other words, it accepts responses from the upstream server only via a given interface (like eth2). From the code perspective, it doesn't even listen on interfaces other than eth2! Since my response was coming via 'lo', it wasn't listening!! :)
I restarted dnsmasq without that option and voilà, it works! :)
I wish they had documented it on a public forum! So that generic googling works and not reading through 1000s lines of code!!

Access web service on azure VM

I have a jersey/grizzly web service running on an Azure VM with Ubuntu server 14.04 LTS. I can access the web service from within the machine with curl, but from outside the machine the request just times out.
That is, I can do this curl line:
curl -X POST http://example.cloudapp.net:80/resource/path --data "text=´Mario"
and get a response from within the shell of the VM, but not from my windows console, or another VM in the same Azure datacenter.
I start the Grizzly web server on port 9998 in my Main.java:
public class Main {
public static final URI BASE_URI = UriBuilder.fromUri("http://0.0.0.0/").port(9998).build();
protected static SelectorThread startServer() throws IOException {
final Map<String, String> initParams = new HashMap<String, String>();
initParams.put("com.sun.jersey.config.property.packages", "mypackage");
SelectorThread threadSelector = GrizzlyWebContainerFactory.create(BASE_URI, initParams);
return threadSelector;
}
public static void main(String[] args) throws IOException {
SelectorThread threadSelector = startServer();
System.in.read();
threadSelector.stopEndpoint();
}
}
And I can see it listening with netstat -a:
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 *:9998 *:* LISTEN
I have set up an Azure endpoint so that I can access the webservice on the VM, making the translation between the inner port 9998 and the outer port 80.
There are no ACL rules on the endpoint, so the default is to let everything through.
My iptables in the VM is set to let everything through:
$ sudo iptables -L -n
Chain INPUT (policy ACCEPT)
target prot opt source destination
ACCEPT udp -- 0.0.0.0/0 0.0.0.0/0 udp dpt:68
My theory is that the problem is outside the virtual machine, but I don't know where to begin. Does anyone have an idea about how to solve this or how to proceed with the troubleshooting? Please ask, if there is anything else you need. One thing that might help is if I could get the access attempt logs for the Azure endpoints, but I don't know how to get them.

Distinguish forwarding traffic and locally originated traffic in Linux network driver

Is there any information in struct sk_buff that distinguishes forwarded traffic (bridge forwarding and IP forwarding) from locally originated traffic? We want to treat these two kinds of traffic differently in the network driver because forwarded traffic does not require cache invalidation over the whole packet size.
Any suggestions are appreciated. Thank you very much!
Yes, it's possible. You can follow the life cycle of a received packet by looking at the calls made from ip_rcv_finish (http://lxr.free-electrons.com/source/net/ipv4/ip_input.c?v=3.3#L317).
struct sk_buff contains a pointer to the destination entry:
struct dst_entry *dst;
which contains a function pointer:
int (*input)(struct sk_buff*);
that is called for the incoming packet. For a local packet the kernel calls ip_local_deliver, and for a forwarded packet it calls ip_forward.
I think you can check for local and forwarded packets like this:
- Local:
/* struct sk_buff *skb: the incoming packet */
/* skb_rtable() returns the route attached to the skb (its dst entry cast to struct rtable) */
if (skb_rtable(skb)->rt_type == RTN_LOCAL)
{
/* This packet is to be consumed locally */
}
- Forward:
if (skb_rtable(skb)->rt_type == RTN_UNICAST)
{
/* This packet will be forwarded */
}

join/leave multicast group using libpcap

I need to receive a multicast stream but filter incoming packets by source MAC address on CentOS 5.5.
I'm planning to use libpcap library.
Is it possible to join/leave multicast group using libpcap?
If yes, how to do that?
Thanks
Sure, just construct and send the appropriate IGMP packets.
1. Create a dummy socket: sd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
2. Bind it: rc = bind(sd, (sockaddr*) &addr, sizeof(sockaddr_in));
3. Join the multicast group:
ip_mreq mreq;
mreq.imr_interface.s_addr = htonl(InterfaceIp);
mreq.imr_multiaddr.s_addr = htonl(DestIp);
if (setsockopt(sd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq)) < 0) {
close(sd);
// Error handle...
}
Don't send or receive packets using the dummy socket.
4. Open pcap using pcap_open_live().
The general idea is to use a regular socket to "tell" the kernel to send the IGMP join packet, and then use pcap to capture the packets.

Resources