I'm trying and failing to change the MTU for Bluetooth RFCOMM links to a Windows CE device. The documentation states that I can use setsockopt() with the SO_BTH_SET_MTU option to change the MTU for either a listening or non-connection socket.
The issue is that if I set the client/server socket MTU to 1024 (on 2 separate devices, of course), the MTU is always negotiated back to the default of 672! If I set the Minimum MTU on the listening socket to 1024, the client fails to connect, which leads me to believe that client sockets are ignoring the MTU configuration.
Does anybody have experience with altering the default MTU on Windows CE 6.0R3 devices?
I don't understand if you are changing MTU on 2 devices that you control (server and client) or if the client device is a third-party one.
In the second case it may be that it can't handle the minimum MTU you configured and this explains why it can't connect.
Related
Question update: In addition to the problem below, it seems our client/server application using the Linux PLPMTUD mechanism gets too large path MTU. Has anyone seen this, i.e. actual path MTU being 1500, but getsockopt() w TCP_MAXSEG returning the MTU:s of the endpoints, in our case 3000? I have tried turning of GRO, GSO and TSO with ethtool but the error persists. Normal ping only manages to push through packets 1472 bytes or smaller. Also worth mentioning is that PLPMTUD works perfectly for smaller MTU:s. For example, w endpoints at 1500 MTU and one interface of the intermediate router set to e.g 1200 MTU, the kernel TCP probes and reports correct TCP_MAXSEG (1200 - headers).
I am using the Linux RFC4821-compliant packetization layer path MTU discovery in an application. Basically, the client does a setsockopt on a TCP socket:
setsockopt(fd, SOL_IP, IP_MTU_DISCOVER, &sopt, sizeof(sopt));
with option value set to IP_PMTUDISC_PROBE. The setsockopt() does not return an error.
The client sends large tcp packets to a discard server, and the path MTU is calibrated by Linux kernel - tcpdump shows tcp packets with DF bit set being sent, packet size varies until the kernel knows the path MTU. However, to get this to work in the other direction (listening server accept:ing connections from clients, sending data and calibrating PMTU in direction from server to client) I have to set the global option for tcp path mtu discovery, /proc/sys/net/ipv4/tcp_mtu_probing. If I do not, server will stupidly continue to send too large packets, which get discarded by an intermediate router without ICMP sent back. Both endpoints have an MTU set to 3000, while the intermediate hops have MTU 1500.
I hope someone has an idea on what goes wrong. If more info is needed, let me know and I edit the question. Problems exist on both Linux kernel 4.2.0 and 3.19.0, both are stock Kubuntu LTS kernels. (x86/x86-64)
I do set the same socket option server-side as well, on all accept:ed sockets, before sending data in reverse direction.
FWIW, I have found workarounds/solutions for the problems, will do more testing but shortly describe my findings here, in case it helps someone else.
The problem with not being able to set path mtu discovery per socket was fixed, brute force, by enabling it system-wide during execution of my program, then disabling again.
The second problem, of incorrect path mtu returned by getsockopt TCP_MAXSEG, was fixed by waiting for TCP ACK of sent TCP data, also using getsockopt (tcp_info.tcpi_unacked). That way, I can be sure that probing has finished before I get TCP_MAXSEG.
Finally, there was a patchset for improving path mtu probing accuracy merged to mainline Linux kernel in March 2015. Without those patches, the probing is very imprecise. Patchset is part of 4.1.y-series kernels and onward.
I have use sock_raw get all ip packet from kernel.
socket(PF_PACKET, SOCK_RAW, htons(protocol);
But packet still alive in kernel, how can i drop it?
You cannot. When you receive the packet on a raw socket, the kernel has created a copy and delivered it to your receiving process. The packet will continue being processed in the meantime according to the usual semantics. It's likely this will have completed (i.e. whatever the stack would normally do with it will already be done) by the time your process receives it.
However, if the packet is not actually destined to your box (e.g. you're receiving it only because you have the network interface in promiscuous mode), or if there is no local process [or in-kernel component] interested in receiving it, the packet will just be discarded anyway.
If you simply wish to receive all packets that arrive on an interface without processing them, you can simply bring the interface up in promiscuous mode without giving it an IP address. Then packets will be delivered to your raw socket but will then be discarded by the stack.
Old question, but others might find this answer useful.
Depends on the usecase, but you can actually drop ingress packets after you get them from AF_PACKET SOCK_RAW. To do that, put an ingress qdisc where we have drop action. Example:
sudo tc qdisc add dev eth0 ingress
sudo tc filter add dev eth0 parent ffff: matchall action drop
Explaination: this works, because the AF_PACKET sniff the packet's copy from the per-device tap, which is a little bit earlier than the ingress qdisc in the kernel network stack's packet processing pipeline. That way you can implement a simple userspace switch.
What you have:
bond (bond0) interface (all modes except 4) with at least 2 ifaces (say eth0 / eth1) connected on the same external switch
bond0 interface joined on a software bridge (br0)
virtual machine (vm0) (eg LibVirt::LXC) with an interface on br0
What you get:
vm0 is not able to connect to (most) IP addresses via bond0 over br0
"bond0: received packet with own address as source address" in syslog
Why you get this:
When vm0 wants to contact an external IP address it will send out an ARP request. This L2 broadcast with the source mac of vm0 will leave through (depending on bonding mode) eg eth0, but via the external switch, re-enter through eth1 and thus bond0. Hence the switch br0 will learn the mac-address of vm0 on the port connected to bond0. As a consequence the ARP-reply is never received by vm0.
What can you do to resolve:
The reason I post this, next to sharing the info, is that I wasn't able to figure out a good enough solution. Those I did find are:
On vm0 set static ARP entry
Use bond0 mode=4 but your external switch must support this
Configure your external siwtch to use private VLAN on eth0/eth1 but only works in some use-cases and adds complexity
Add both physical interfaces to the bridge with spanning tree enabled, instead of using bond driver
Statically configuring the MAC of vm0 on the correct port of br0 is not an option on Linux (works on OpenBSD though)
I'm really hoping for a more elegant solution here... Anyone?
Thanks
I've got the same problem and I come up with the same analysis.
The only non-invasive/scalable solution I've found is to use the active/backup bonding (mode 1). The tradeoff is that you lose the aggregation.
IMO, the best solution is to use 802.3ad, but I can't always use it because I'm limited with 6 port-channels on most of my switches.
Try these options in bridge:
brigde_fd 0
bridge_stp off # switch on with more system like this
bridge_maxage 0
bridge_ageing 0
bridge_maxwait 0
Taken from this thread:
kvm bridge also in proxmox
I'm developing a tftp client and server and I want to dynamically select the udp payload size to boost transfer performance.
I have tested it with two linux machines ( one has a gigabit ethernet card, the other a fast ethernet one ). I changed the MTU of the gigabit card to 2048 bytes and left the other to 1500.
I have used setsockopt(sockfd, IPPROTO_IP, IP_MTU_DISCOVER, &optval, sizeof(optval)) to set the MTU_DISCOVER flag to IP_PMTUDISC_DO.
From what I have read this option should set the DF bit to one and so it should be possible to find the minimum MTU of the network ( the MTU of the host that has the lowest MTU ). However this thing only gives me an error when I send a packet which size is bigger than the MTU of the machine from which I'm sending packets.
Also the other machine ( the server in this case ) doesn't receive the oversized packets ( the server has a MTU of 1500 ). All the UDP packets are dropped, the only way is to send packets of 1472 bytes.
Why the hosts do this? From what I have read, if I send a packet larger than MTU, the ip layer should fragment it.
I fail to see the problem. You are setting the "don't fragment" bit, and you send a package smaller than the sending host's MTU, but larger than the receiving host's MTU. Of course nobody will fragment here (doing so would violate the DF bit). Instead, the sending host should get an ICMP message back.
Edit: IP specifies that an ICMP error message type 3 (destination unreachable) code 4 (Fragmentation Required but DF Bit Is Set) is sent to the originating host at the point where the fragmentation would have occurred. The TCP layer handles this on its own for PMTU discovery. On connection-less sockets, Linux reports the error in the socket's error queue if the IP_RECVERR option is activated; see ip(7).
That "DF bit" you're setting, stands for "Don't Fragment". The IP layer should not be expected to fragment packets when you've told it not to.
It is not correct to run hosts with different interface MTUs on the same subnet1.
This is a host/network misconfiguration, and IP path MTU discovery is not expected to work correctly in this situation.
If you wish to test your application's path MTU discovery, you will need to set up multiple subnets connected by a router2, with different MTUs. In this situation, the router is the device that will pick up the MTU mismatch, and send back an ICMP "Fragmentation Needed" error.
1. Well, technically, same broadcast domain.
2. The devices sold as "home routers" are really router/switches - they route between the WAN and the LAN, but switch between the ethernet ports on the LAN. This isn't sufficient to separate networks with different MTUs.
In Linux, how do you set the maximum segment size that is allowed on a TCP connection? I need to set this for an application I did not write (so I cannot use setsockopt to do it). I need to set this ABOVE the mtu in the network stack.
I have two streams sharing the same network connection. One sends small packets periodically, which need absolute minimum latency. The other sends tons of data--I am using SCP to simulate that link.
I have setup traffic control (tc) to give the minimum latency traffic high priority. The problem I am running into, though, is that the TCP packets that are coming down from SCP end up with sizes up to 64K bytes. Yes, these are broken into smaller packets based on mtu, but this unfortunately occurs AFTER tc prioritizes the packets. Thus, my low latency packet gets stuck behind up to 64K bytes of SCP traffic.
This article indicates that on Windows you can set this value.
Is there something on Linux I can set? I've tried ip route and iptables, but these are applied too low in the network stack. I need to limit the TCP packet size before tc, so it can prioritize the high priority packets appropriately.
Are you using tcp segmentation offload to the nic? (You can use "ethtool -k $your_network_device" to see the offload settings.) This is the only way as far as I know that you would see 64k tcp packets with a device MTU of 1500. Not that this answers the question, but it might help avoid misdiagnosis.
ip route command with option advmss helps to set MSS value.
ip route add 192.168.1.0/24 dev eth0 advmss 1500
The upper bound of the advertised TCP MSS is the MTU of the first hop route. If you're seeing 64k segments, that tends to indicate that the first hop route MTU is excessively large - are you using loopback or something for testing?
MSS = MTU – 40bytes (standard TCP/IP overhead of 40 bytes [20+20])
If the MTU is 1500 bytes then the MSS will be 1460 bytes.
You are definitely misdiagnosing the problem; as someone else pointed out, tc doesn't see TCP packets, it sees IP packets, and they'd already be in chunks at that point.
You are probably just experiencing bufferbloat: you're overloading your outbound queue in a totally separate device (probably a DSL modem or cable modem). The only fix is to tell tc to limit your outbound bandwidth to less than the modem's bandwidth, eg. using TBF.