How to make sure packets from the same flow land on the same queue on two NICs when bridging - linux

I'm writing a network bridge that reassembles and analyzes TCP flows on the fly. I have a pair of multi-queue NICs, and I use netmap to capture packets from each RX queue on a different thread and then pass them on to the other NIC for transmission. The problem is that packets from the same flow do not land on the same queue on the two NICs, because the source and destination addresses and ports are reversed.
I tried using ethtool to change the tuple whose hash is used to distribute packets across queues. Running this:
# ethtool --show-nfc p1p1 rx-flow-hash tcp4
results in:
TCP over IPV4 flows use these fields for computing Hash flow key:
IP SA
IP DA
L4 bytes 0 & 1 [TCP/UDP src port]
L4 bytes 2 & 3 [TCP/UDP dst port]
Repeating the above command for the other NIC gives the same output. I thought I could change the order for one of the NICs by running ethtool --config-nfc p1p1 rx-flow-hash tcp4 dsfn, but that doesn't change the order of the fields used in the hash. It seems that both sdfn and dsfn result in the same tuple. The same is true for ds and sd.
Is there a way to make the packets of one flow always land on the same queue (and therefore the same thread) on both NICs, whether by configuring the NICs with ethtool or by some other method or tool?
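One direction worth testing is the hash key rather than the field order. RSS hardware typically uses a Toeplitz hash, and if the key repeats with a 16-bit period, swapping the two 32-bit addresses or the two 16-bit ports does not change the result. The sketch below only demonstrates that property in software. The key value, the 8-queue count, and the modulo mapping are assumptions for illustration (real NICs index an indirection table), and whether your driver lets you set the key (for example through ethtool's -X/--rxfh hkey option) is something you would have to check.

```python
import socket
import struct

def toeplitz_hash(key: bytes, data: bytes) -> int:
    """XOR together the 32-bit key window that starts at each set input bit."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i in range(len(data) * 8):
        if data[i // 8] & (0x80 >> (i % 8)):
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result

def rss_input(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """Concatenate the TCP/IPv4 4-tuple in the order shown by ethtool above."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!HH", src_port, dst_port))

# Assumption: a 40-byte key made of one repeated 16-bit pattern.
# Any 16-bit pattern gives the symmetry; 0x6d5a is just an example.
SYMMETRIC_KEY = bytes([0x6D, 0x5A]) * 20

NUM_QUEUES = 8  # hypothetical queue count

def queue_for(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> int:
    # Simplification: real hardware looks the hash up in an indirection table.
    h = toeplitz_hash(SYMMETRIC_KEY, rss_input(src_ip, dst_ip, src_port, dst_port))
    return h % NUM_QUEUES

forward = queue_for("10.0.0.1", "10.0.0.2", 40000, 80)
reverse = queue_for("10.0.0.2", "10.0.0.1", 80, 40000)
print("forward queue:", forward, "reverse queue:", reverse)
```

If the same key and indirection table are configured on both NICs, both directions of a flow should then map to the same queue index, but that needs to be verified on the actual hardware.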

Specify MTU value

I'm trying to pentest an IPsec implementation for a uni project. Following this guide, I'm stuck at:
Step 1 (common): Forging an ICMP PTB packet from the untrusted network The attacker first has to forge an appropriate ICMP PTB packet (a single packet is sufficient). This is done by eavesdropping a valid packet from the IPsec tunnel on the untrusted network. Then the attacker forges an ICMP PTB packet, specifying a very small MTU value equal or smaller than 576 with IPv4 (resp. 1280 with IPv6). The attacker can use 0 for instance. This packet spoofs the IP address of a router of the untrusted network (in case the source IP address is checked), and in order to bypass the IPsec protection mechanism against blind attacks, it includes as a payload a part of the outer IP packet that has just been eavesdropped. This is the only packet an attacker needs to send. None of the following steps involve the attacker.
I know what MTU is, but what does the bold statement mean?
How do I set the MTU size of a packet with scapy?
Does it mean that I have to make the IP packet smaller than 576 bytes?
It's already 140 bytes, or at least that is what the len command shows.
There's something I didn't get right. Maybe I have to set the fragmentation?
I know nothing about the subject, but some quick searching seems to indicate that it's referring to an IPv6 ICMP packet with a type of 2 ("packet too big").
Then from some poking around scapy, this appears to be how you'd create one:
from scapy.layers.inet6 import ICMPv6PacketTooBig
icmp_ptb = ICMPv6PacketTooBig(mtu=0)
Of course though, you'll need to do some testing to verify this.
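For IPv4 the corresponding message is ICMP type 3 (destination unreachable) with code 4 (fragmentation needed), and the "MTU value" the guide talks about is a 16-bit next-hop MTU field inside that message, not the size of the packet you send. The following is a rough sketch of the wire layout using only the standard library, for a lab setup. The function names and the placeholder datagram are made up for illustration; the quoted part must come from a real captured packet.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """Standard 16-bit one's-complement checksum used by ICMP."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_icmp_frag_needed(next_hop_mtu: int, original_datagram: bytes) -> bytes:
    """Build an ICMPv4 type 3 / code 4 message quoting the original datagram."""
    ihl = (original_datagram[0] & 0x0F) * 4      # IP header length in bytes
    quoted = original_datagram[:ihl + 8]         # IP header + first 8 data bytes
    # Layout: type, code, checksum, unused (16 bits), next-hop MTU (16 bits)
    header = struct.pack("!BBHHH", 3, 4, 0, 0, next_hop_mtu)
    checksum = internet_checksum(header + quoted)
    return struct.pack("!BBHHH", 3, 4, checksum, 0, next_hop_mtu) + quoted

# Placeholder standing in for a captured packet: a 20-byte IPv4 header
# (version 4, IHL 5) followed by 8 bytes of payload, all other bytes zero.
placeholder_datagram = bytes([0x45]) + bytes(27)

message = build_icmp_frag_needed(576, placeholder_datagram)
msg_type, msg_code, _cksum, _unused, mtu = struct.unpack("!BBHHH", message[:8])
print(msg_type, msg_code, mtu, len(message))
```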

How to automate measuring of bandwidth usage between two hosts

I have an application that has a TCP client and a server. I set up the client and server on separate machines. Now I want to measure how much bandwidth is being consumed (bytes sent and received during a single run of the application). I have discovered that Wireshark is one tool that can give me this statistic. However, Wireshark seems to be GUI dependent. What I want is a way to automate the measuring and reporting of this statistic. I don't care about the individual packets captured by Wireshark; I don't need that information. Is there some way to run Wireshark so that all it does is write to a file the total bytes sent and received between two hosts while the application was running on both ends?
Also, is there a better way to capture this statistic? Through netstat or /proc/net/dev or any other tool?
Both my machines run Ubuntu 10.04 or later.
Bro is an appropriate tool to measure connection-oriented statistics. You can either record a trace of your application communication or analyze it in realtime:
bro -r <trace>
bro -i <interface>
Thereafter, have a look at the connection log (conn.log) in the same directory for the amount of bytes sent and received by the application. Specifically, you're interested in the TCP payload size, which conn.log exposes via the columns orig_bytes and resp_bytes. Here is an example:
bro-cut id.orig_h id.resp_h conn_state orig_bytes resp_bytes < conn.log | head
which yields the following output:
192.168.1.102 192.168.1.1 SF 301 300
192.168.1.103 192.168.1.255 S0 350 0
192.168.1.102 192.168.1.255 S0 350 0
192.168.1.103 192.168.1.255 S0 560 0
192.168.1.102 192.168.1.255 S0 348 0
192.168.1.104 192.168.1.255 S0 350 0
192.168.1.104 192.168.1.255 S0 549 0
192.168.1.103 192.168.1.1 SF 303 300
192.168.1.102 192.168.1.255 S0 - -
192.168.1.104 192.168.1.1 SF 311 300
Each row represents a single connection, transport-layer ports omitted. The last two columns represent the bytes sent by the originator (first column) and responder (second column). The column conn_state represents the connection status. Please refer to the documentation for all possible field values. Some important values are:
S0: Connection attempt seen, no reply.
S1: Connection established, not terminated.
SF: Normal establishment and termination. Note that this is the same symbol as for state S1. You can tell the two apart because for S1 there will not be any byte counts in the summary, while for SF there will be.
REJ: Connection attempt rejected.
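To turn that output into a single number per host pair, you could sum the two byte columns with a few lines of Python. This is a minimal sketch that assumes the five-column bro-cut output shown above and treats "-" (no byte count available) as zero; the function name is made up.

```python
from collections import defaultdict

def total_bytes(bro_cut_output: str) -> dict:
    """Sum orig_bytes and resp_bytes per (originator, responder) pair."""
    totals = defaultdict(lambda: [0, 0])
    for line in bro_cut_output.strip().splitlines():
        orig_h, resp_h, _state, orig_bytes, resp_bytes = line.split()
        pair = totals[(orig_h, resp_h)]
        pair[0] += 0 if orig_bytes == "-" else int(orig_bytes)
        pair[1] += 0 if resp_bytes == "-" else int(resp_bytes)
    return {hosts: tuple(counts) for hosts, counts in totals.items()}

# The sample rows from above.
sample = """
192.168.1.102 192.168.1.1 SF 301 300
192.168.1.103 192.168.1.255 S0 350 0
192.168.1.102 192.168.1.255 S0 350 0
192.168.1.103 192.168.1.255 S0 560 0
192.168.1.102 192.168.1.255 S0 348 0
192.168.1.104 192.168.1.255 S0 350 0
192.168.1.104 192.168.1.255 S0 549 0
192.168.1.103 192.168.1.1 SF 303 300
192.168.1.102 192.168.1.255 S0 - -
192.168.1.104 192.168.1.1 SF 311 300
"""

totals = total_bytes(sample)
for (orig_h, resp_h), (sent, received) in sorted(totals.items()):
    print(orig_h, resp_h, sent, received)
```

In a script you would feed it the real output, for example by piping bro-cut into the program and reading sys.stdin.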

How to programmatically increase the per-socket buffer for UDP sockets on Linux?

I'm trying to understand the correct way to increase the socket buffer size on Linux for our streaming network application. The application receives variable bitrate data streamed to it on a number of UDP sockets. The volume of data is substantially higher at the start of the stream and I've used:
# sar -n UDP 1 200
to show that the UDP stack is discarding packets and
# ss -un -pa
to show that each socket's Recv-Q length grows to nearly the limit (124928, from sysctl net.core.rmem_default) before packets are discarded. This implies that the application simply can't keep up with the start of the stream. After enough initial packets have been discarded, the data rate slows down and the application catches up. Recv-Q trends towards 0 and remains there for the duration.
I'm able to address the packet loss by substantially increasing the rmem_default value which increases the socket buffer size and gives the application time to recover from the large initial bursts. My understanding is that this changes the default allocation for all sockets on the system. I'd rather just increase the allocation for the specific UDP sockets and not modify the global default.
My initial strategy was to modify rmem_max and to use setsockopt(SO_RCVBUF) on each individual socket. However, this question makes me concerned about disabling Linux autotuning for all sockets and not just UDP.
udp(7) describes the udp_mem setting but I'm confused how these values interact with the rmem_default and rmem_max values. The language it uses is "all sockets", so my suspicion is that these settings apply to the complete UDP stack and not individual UDP sockets.
Is udp_rmem_min the setting I'm looking for? It seems to apply to individual sockets but global to all UDP sockets on the system.
Is there a way to safely increase the socket buffer length for the specific UDP ports used in my application without modifying any global settings?
Thanks.
Jim Gettys is armed and coming for you. Don't go to sleep.
The solution to network packet floods is almost never to increase buffering. Why is your protocol's queueing strategy not backing off? Why can't you just use TCP if you're trying to send so much data in a stream (which is what TCP was designed for)?
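If you still want to try the per-socket approach from the question, the call itself is small. The sketch below (in Python rather than whatever language the application uses) requests a larger receive buffer on a single UDP socket and reads back what the kernel actually granted. On Linux the kernel doubles the requested value for bookkeeping overhead and caps it at net.core.rmem_max, so the number you read back is what matters. The 4 MiB request is an arbitrary example.

```python
import socket

def udp_socket_with_rcvbuf(requested_bytes: int):
    """Create a UDP socket and ask for a larger receive buffer on it only."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
    # Read the value back: the kernel may have adjusted or capped the request.
    effective = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return sock, effective

requested = 4 * 1024 * 1024  # arbitrary example size
sock, effective = udp_socket_with_rcvbuf(requested)
print("requested:", requested, "effective:", effective)
sock.close()
```

If the effective value is much smaller than requested, rmem_max is the limiting factor. The option applies only to the socket it is set on, so other sockets keep the system defaults.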

Do MTU modifications impact both directions?

ifconfig 1.2.3.4 mtu 1492
Will this set the MTU to 1492 for incoming packets, outgoing packets, or both? I think it is only for incoming.
TLDR: Both. It will only transmit packets with a payload length less than or equal to that size. Similarly, it will only accept packets with a payload length within your MTU. If a device sends a larger packet, it should respond with an ICMP unreachable (oversized) message.
The nitty gritty:
Tuning the MTU for your device is useful because other hops between you and your destination may encapsulate your packet in another form (for example, a VPN or PPPoE.) This layer around your packet results in a bigger packet being sent along the wire. If this new, larger packet exceeds the maximum size of the layer, then the packet will be split into multiple packets (in a perfect world) or will be dropped entirely (in the real world.)
As a practical example, consider having a computer connected over ethernet to an ADSL modem that speaks PPPoE to an ISP. Ethernet allows for a 1500 byte payload, of which 8 bytes will be used by PPPoE. Now we're down to 1492 bytes that can be delivered in a single packet to your ISP. If you were to send a full-size ethernet payload of 1500 bytes, it would get "fragmented" by your router and split into two packets (one with a 1492 byte payload, the other with an 8 byte payload.)
The problem comes when you want to send more data over this connection. Let's say you wanted to send 3000 bytes: your computer would split this up based on its MTU (in this case, two packets of 1500 bytes each) and send them to your ADSL modem, which would then split them again to fit its own MTU. Now your 3000 bytes of data have been fragmented into four packets: two with a payload of 1492 bytes and two with a payload of 8 bytes. This is obviously inefficient; we really only need three packets to send this data. Had your computer been configured with the correct MTU for the network, it would have sent three packets in the first place (two 1492 byte packets and one 16 byte packet.)
To avoid this inefficiency, many IP stacks set a bit in the IP header called "Don't Fragment." In this case, we would have sent our first 1500 byte packet to the ADSL modem and it would have rejected the packet, replying with an Internet Control Message Protocol (ICMP) message informing us that our packet is too large. We then would have retried the transmission with a smaller packet. This is called Path MTU discovery. Similarly, one layer up, at the TCP layer, another factor in avoiding fragmentation is the MSS (Maximum Segment Size) option, where each host advertises the largest segment it can receive without fragmentation. This is typically computed from the MTU.
The problem here arises when misconfigured firewalls drop all ICMP traffic. When you connect to (say) a web server, you build a TCP session and announce that you're willing to accept TCP packets based on your 1500 byte MTU (since you're connected over ethernet to your router.) If the foreign web server wanted to send you a lot of data, it would split this into chunks that (when combined with the TCP and IP headers) came out to 1500 byte payloads and send them to you. Your ISP would receive one of these and then try to wrap it into a PPPoE packet to send to your ADSL modem, but it would be too large to send. So it would reply with an ICMP unreachable, which would (in a perfect world) cause the remote computer to downsize its MSS for the connection and retransmit. If there was a broken firewall in the way, however, this ICMP message would never reach the foreign web server and this packet would never make it to you.
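The arithmetic in this example can be reproduced with a small sketch. It deliberately uses the same simplified model as the text (payload bytes only). Real IPv4 fragmentation also repeats the IP header in every fragment and aligns fragment data to 8-byte boundaries, so actual sizes differ slightly. The function names are made up.

```python
from typing import List

def split_payload(size: int, mtu: int) -> List[int]:
    """Split `size` bytes into chunks of at most `mtu` bytes (headers ignored)."""
    full, rest = divmod(size, mtu)
    return [mtu] * full + ([rest] if rest else [])

def send_through(size: int, host_mtu: int, link_mtu: int) -> List[int]:
    """Split at the host first, then again at a link with a smaller MTU."""
    pieces = []
    for packet in split_payload(size, host_mtu):
        pieces.extend(split_payload(packet, link_mtu))
    return pieces

print(send_through(3000, 1500, 1492))  # [1492, 8, 1492, 8]
print(send_through(3000, 1492, 1492))  # [1492, 1492, 16]
```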
Ultimately setting your MTU on your ethernet device is desirable to send the right size frames to your ADSL modem (to avoid it asking you to retransmit with a smaller frame), but it's critical to influence the MSS size you send to remote hosts when building TCP connections.
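As a quick illustration of how the MSS follows from the MTU: for IPv4 and TCP headers without options (20 bytes each), the advertised MSS is the MTU minus 40. This is a sketch under that assumption; headers with options are larger.

```python
IPV4_HEADER = 20  # bytes, without IP options
TCP_HEADER = 20   # bytes, without TCP options

def mss_for_mtu(mtu: int) -> int:
    """Largest TCP segment payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - TCP_HEADER

print(mss_for_mtu(1500))  # 1460, plain ethernet
print(mss_for_mtu(1492))  # 1452, ethernet with PPPoE
```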
ifconfig ... mtu <value> sets the MTU for layer2 payloads sent out the interface, and will reject larger layer2 payloads received on this interface. You must ensure your MTU matches on both sides of an ethernet link; you should not have mismatched mtu values anywhere in the same ethernet broadcast domain. Note that the ethernet headers are not included in the MTU you are setting.
Also, ifconfig has not been maintained in Linux for ages and is deprecated; sadly, Linux distributions still include it because they're afraid of breaking old scripts. This has the very negative effect of encouraging people to continue using it. You should be using the iproute2 family of commands:
[mpenning@hotcoffee ~]$ sudo ip link set mtu 1492 eth0
[mpenning@hotcoffee ~]$ ip link show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1492 qdisc mq state UP qlen 1000
    link/ether 00:1e:c9:cd:46:c8 brd ff:ff:ff:ff:ff:ff
[mpenning@hotcoffee ~]$
Large incoming packets may be dropped based on the interface MTU size.
For example, the default MTU 1500 on
Linux 2.6 CentOS (tested with Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01))
drops Jumbo packets >1504. Errors appear in ifconfig and there are rx_long_length_errors indications for this in ethtool -S output.
Increasing MTU indicates Jumbo packets should be supported.
The threshold for when to drop packets based on their size being too large appears to depend on MTU (-4096, -8192, etc.)
Oren
It's the Maximum Transmission Unit, so it definitely sets the outgoing maximum packet size. I'm not sure if it will reject incoming packets larger than the MTU.
There is no doubt that the MTU configured by ifconfig affects IP fragmentation in the Tx direction, so I have nothing to add there.
For the Rx direction, however, whether the parameter affects incoming IP packets depends on the hardware. Different manufacturers behave differently.
I tested all the devices on hand and found 3 cases below.
Test case:
Device0 eth0 (192.168.225.1, mtu 2000)<--ETH cable-->Device1 eth0
(192.168.225.34, mtu MTU_SIZE)
On Device0 ping 192.168.225.34 -s ICMP_SIZE,
Checking how MTU_SIZE impacts Rx of Device1.
case 1:
Device1 = Linux 4.4.0 with Intel I218-LM:
When MTU_SIZE=1500, ping succeeds at ICMP_SIZE=1476, fails at ICMP_SIZE=1477 and above. It seems that there is a PRACTICAL MTU=1504 (20B(IP header)+8B(ICMP header)+1476B(ICMP data)).
When MTU_SIZE=1490, ping succeeds at ICMP_SIZE=1476, fails at ICMP_SIZE=1477 and above, so it behaves the same as MTU_SIZE=1500.
When MTU_SIZE=1501, ping succeeds at ICMP_SIZE=1476, 1478, 1600, 1900. It seems that jumbo frames are switched on once MTU_SIZE is set >1500 and the 1504 restriction no longer applies.
case 2:
Device1 = Linux 3.18.31 with Qualcomm Atheros AR8151 v2.0 Gigabit Ethernet:
When MTU_SIZE=1500, ping succeeds at ICMP_SIZE=1476, fails at ICMP_SIZE=1477 and above.
When MTU_SIZE=1490, ping succeeds at ICMP_SIZE=1466, fails at ICMP_SIZE=1467 and above.
When MTU_SIZE=1501, ping succeeds at ICMP_SIZE=1477, fails at ICMP_SIZE=1478 and above.
When MTU_SIZE=500, ping succeeds at ICMP_SIZE=476, fails at ICMP_SIZE=477 and above.
When MTU_SIZE=1900, ping succeeds at ICMP_SIZE=1876, fails at ICMP_SIZE=1877 and above.
This case behaves exactly as Edward Thomson said, except that in my test the PRACTICAL MTU=MTU_SIZE+4.
case 3:
Device1 = Linux 4.4.50 with Raspberry Pi 2 Module B ETH:
When MTU_SIZE=1500, ping succeeds at ICMP_SIZE=1472, fails at ICMP_SIZE=1473 and above. So there is a PRACTICAL MTU=1500 (20B(IP header)+8B(ICMP header)+1472B(ICMP data)) working there.
When MTU_SIZE=1490, it behaves the same as MTU_SIZE=1500.
When MTU_SIZE=1501, it behaves the same as MTU_SIZE=1500.
When MTU_SIZE=2000, it behaves the same as MTU_SIZE=1500.
When MTU_SIZE=500, it behaves the same as MTU_SIZE=1500.
This case behaves exactly as Ron Maupin said in Why MTU configuration doesn't take effect on receiving direction?.
To sum up, in the real world, after you set the MTU with ifconfig:
sometimes Rx IP packets get dropped when they exceed 1504 bytes, no matter what MTU value you set (unless jumbo frames are enabled).
sometimes Rx IP packets get dropped when they exceed the MTU+4 you set on the receiving device.
sometimes Rx IP packets get dropped when they exceed 1500 bytes, no matter what MTU value you set.
... ...
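For reference, the "PRACTICAL MTU" figures above come from adding the ICMP and IPv4 header sizes to the largest ping -s value that still succeeds. A small sketch of that conversion, using the MTU_SIZE=1500 results from the three cases (the dictionary keys are just labels):

```python
ICMP_HEADER = 8   # bytes
IPV4_HEADER = 20  # bytes, without IP options

def ip_packet_size(icmp_size: int) -> int:
    """IP packet size produced by `ping -s icmp_size`."""
    return icmp_size + ICMP_HEADER + IPV4_HEADER

# Largest ICMP_SIZE that still succeeded with MTU_SIZE=1500 in each case.
largest_ok = {
    "case 1 (Intel I218-LM)": 1476,
    "case 2 (Atheros AR8151)": 1476,
    "case 3 (Raspberry Pi 2)": 1472,
}

practical_mtu = {name: ip_packet_size(size) for name, size in largest_ok.items()}
for name, size in practical_mtu.items():
    print(name, size)
```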

UDP IP Fragmentation and MTU

I'm trying to understand some behavior I'm seeing in the context of sending UDP packets.
I have two little Java programs: one that transmits UDP packets, and the other that receives them. I'm running them locally on my network between two computers that are connected via a single switch.
The MTU setting (reported by /sbin/ifconfig) is 1500 on both network adapters.
If I send packets with a size < 1500, I receive them. Expected.
If I send packets with 1500 < size < 24258 I receive them. Expected. I have confirmed via wireshark that the IP layer is fragmenting them.
If I send packets with size > 24258, they are lost. Not Expected. When I run wireshark on the receiving side, I don't see any of these packets.
I was able to see similar behavior with ping -s.
ping -s 24258 hostA works but
ping -s 24259 hostA fails.
Does anyone understand what may be happening, or have ideas of what I should be looking for?
Both computers are running CentOS 5 64-bit. I'm using a 1.6 JDK, but I don't really think it's a programming problem, it's a networking or maybe OS problem.
Implementations of the IP protocol are not required to be capable of handling arbitrarily large packets. In theory, the maximum possible IP packet size is 65,535 octets, but the standard only requires that implementations support at least 576 octets.
It would appear that your host's implementation supports a maximum size much greater than 576, but still significantly smaller than the maximum theoretical size of 65,535. (I don't think the switch should be a problem, because it shouldn't need to do any defragmentation -- it's not even operating at the IP layer).
The IP standard further recommends that hosts not send packets larger than 576 bytes unless they are certain that the receiving host can handle the larger packet size. You should consider whether it would be better for your program to send smaller packets. 24,258 seems awfully large to me. I think there is a real possibility that a lot of hosts won't handle packets that large.
Note that these packet size limits are entirely separate from MTU (the maximum frame size supported by the data link layer protocol).
I found the following which may be of interest:
Determine the maximum size of a UDP datagram packet on Linux
Set the DF bit in the IP header and send continually larger packets to determine at what point a packet is fragmented, as per Path MTU Discovery. A packet that would need fragmentation should then result in an ICMP type 3 packet with code 4, indicating that the packet was too large to be sent without being fragmented.
Dan's answer is useful but note that after headers you're really limited to 65507 bytes.
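The 65507 figure is the 65,535-byte IPv4 limit minus a 20-byte IP header and an 8-byte UDP header. The sketch below works that out and also estimates how many fragments a datagram of the size from the question needs at an MTU of 1500. It assumes IPv4 headers without options and that every fragment carries at most the largest 8-byte-aligned amount of data that fits; it does not explain why the limit sits at 24258.

```python
import math

IPV4_MAX_PACKET = 65535  # bytes, from the 16-bit total length field
IPV4_HEADER = 20         # bytes, without IP options
UDP_HEADER = 8           # bytes

def max_udp_payload() -> int:
    """Largest UDP payload that fits in a single IPv4 packet."""
    return IPV4_MAX_PACKET - IPV4_HEADER - UDP_HEADER

def fragment_count(udp_payload: int, mtu: int) -> int:
    """Estimate the number of IPv4 fragments for one UDP datagram."""
    ip_payload = udp_payload + UDP_HEADER
    # Fragment data (except in the last fragment) must be a multiple of 8 bytes.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return math.ceil(ip_payload / per_fragment)

print(max_udp_payload())            # 65507
print(fragment_count(24258, 1500))  # 17
```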
