Identifying the preferred IPv6 source address for an adapter

Identifying the preferred IPv6 source address for an adapter - linux

If you have a IPv6 enabled host that has more than one global-scope address, how can you programmatically identify the preferred address for bind()?
Example address list:
eth0 Link encap:Ethernet HWaddr 00:14:5e:bd:6d:da
inet addr:10.6.28.31 Bcast:10.6.28.255 Mask:255.255.255.0
inet6 addr: 2002:dce8:d28e:0:214:5eff:febd:6dda/64 Scope:Global
inet6 addr: fe80::214:5eff:febd:6dda/64 Scope:Link
inet6 addr: 2002:dce8:d28e::31/64 Scope:Global
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
On Solaris you can indicate a preferred address with an interface flag and it is available programmatically via SIOCGLIFCONF:
/usr/include/net/if.h:
#define IFF_PREFERRED 0x0400000000 /* Prefer as source address */
As listed in the interface list:
eri0: flags=2104841<UP,RUNNING,MULTICAST,DHCP,ROUTER,IPv6> mtu 1500 index 2
inet6 fe80::203:baff:fe4e:6cc8/10
eri0:1: flags=402100841<UP,RUNNING,MULTICAST,ROUTER,IPv6,PREFERRED> mtu 1500 index 2
inet6 2002:dce8:d28e::36/64
This is not portable to OSX, Linux, FreeBSD, or Windows though. Windows is let off easy though as it has completely useless, from an administrators perspective, UUID based adapter names (depending upon the Windows version).
For Linux this article details how the parameter preferred_lft, where lft is short for "lifetime", can be altered to weight the selection process by the kernel. This setting doesn't appear conveniently available in the results of SIOCGIFCONF or getifaddrs() though.
So I want to bind to eth0, eri0, or whatever available interface name. The choices are a bit stark:
Fail on adapter names resolving to multiple interfaces. I take this approach for handling multicast transports (OpenPGM) as the protocol MUST have one-only sending address.
Bind to everything. This is a cop out and would be unexpected to users.
Bind to the adapter with SO_BINDTODEVICE. This requires CAP_NET_RAW system capability on Linux which can be quite a cumbersome overhead for administrators.
Bind to the first IPv6 interface on the adapter. The ordering tends to be completely bogus.
Bind to the last interface. David Croft's article implies Linux does this, but is also a bit bogus.
Enumerate over every interface and create a new socket explicitly for each.
With option #6 I would expect you could usually be smarter and take the approach that if only a link-local scope address is available bind to that, otherwise bind to just the available global-link scope addresses.
When connecting to another host then RFC 3484 can be used, but as you can see all the choices are dependent upon matching the destination address:
Prefer same address. (i.e. destination is local machine)
Prefer appropriate scope. (i.e. smallest scope shared with the destination)
Avoid deprecated addresses.
Prefer home addresses. Prefer outgoing
interface. (i.e. prefer an address on the interface we're sending
out of)
Prefer matching label.
Prefer public addresses.
Use longest matching prefix.
In some circumstances we can use #7 here, but in the interface example above both global-scope interfaces have a 64-bit prefix length.
RFC 3484 has the following pertinent lines:
The IPv6 addressing architecture 5 allows multiple unicast
addresses to be assigned to interfaces. These addresses may have
different reachability scopes (link-local, site-local, or global).
These addresses may also be "preferred" or "deprecated" 6.
The link being to RFC 2462, similarly expanded:
preferred address - an address assigned to an interface whose use
by
upper layer protocols is unrestricted. Preferred addresses may
be used as the source (or destination) address of packets sent
from (or to) the interface.
But no methods to programmatically acquire this detail.
Props to Win32 API that exposes an ioctl SIO_ADDRESS_LIST_SORT that allows a developer to use not only RFC 3484 sorting but to take into consideration any system administrator overrides. Linux has /etc/gai.conf as used for RFC 3484 sorting in getaddrinfo() but no API for directly accessing the sorting. Solaris has the ipaddrsel command. OSX is following FreeBSD by adding ip6addrctl in 10.7.
edit: Some concerns with RFC 3484 sorting are listed and referred to in this additional IETF draft document:
https://datatracker.ietf.org/doc/html/draft-axu-addr-sel-01
Solaris, for example, creates new alias-interfaces for each new
address assigned to a physical interface. So if_index could also be
used to uniquely identify a source address specific routing table on
that platform. Other operating systems do not work the same way.
The author likes Solaris's approach of giving each additional IPv6 interface a new alias, so that eri0 would become the link-local scope address, and eri0:1 or eri0:2, etc, must be specified to use a global-scope address.
Clearly whilst a nice idea one couldn't expect to see other OS change for quite some time.

I'm not sure this is in the direction you're seeking, but...
Poking around in the iproute bundle's ip code (ip/ipaddress.c) under linux shows that the ip command digs up interface flags like primary and secondary from a struct ifaddrmsg, member ifa_flags. The ifaddmsg seems to be acquired through a struct nlmsghdr which is documented in man 7 netlink, and used via sendmsg and recvmsg interaction with the kernel, which overall sounds like a royal pain but it's at least programmatic. Whether primary and secondary would be enough to be useful is a separate question.

Related

Kernel API to know up address of interface

Is there any kernel side/space API to know the ip address of an interface , given it's name?

I think you're looking for rtnetlink (man page)
Rtnetlink allows the kernel's routing tables to be read and altered.
It is used within the kernel to communicate between various
subsystems, though this usage is not documented here, and for
communication with user-space programs. Network routes, IP addresses,
link parameters, neighbor setups, queueing disciplines, traffic
classes and packet classifiers may all be controlled through
NETLINK_ROUTE sockets.
According to strace, tt's the api ip addr show dev XXX uses:
strace ip addr sh dev lo 2>&1 | grep sendmsg
sendmsg(4, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base={{len=48, type=RTM_GETLINK, flags=NLM_F_REQUEST, seq=1596838225, pid=0}, {ifi_family=AF_UNSPEC, ifi_type=ARPHRD_NETROM, ifi_index=0, ifi_flags=0, ifi_change=0}, [{{nla_len=8, nla_type=IFLA_EXT_MASK}, 9}, {{nla_len=7, nla_type=IFLA_IFNAME}, "lo"}]}, iov_len=48}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 48
sendmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base={{len=40, type=RTM_GETLINK, flags=NLM_F_REQUEST, seq=1596838225, pid=0}, {ifi_family=AF_UNSPEC, ifi_type=ARPHRD_NETROM, ifi_index=if_nametoindex("lo"), ifi_flags=0, ifi_change=0}, {{nla_len=8, nla_type=IFLA_EXT_MASK}, 9}}, iov_len=40}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 40
However, it looks like a non-trivial api so if you don't need it often, it might be easier to just run ip addr sh dev XXX and parse the response.
Edit:
Looks like it's also possible using netdevice (man page), specifically, the SIOCGIFADDR ioctl:
SIOCGIFADDR, SIOCSIFADDR
Get or set the address of the device using ifr_addr. Setting the interface address is a privileged operation. For compatibility, only
AF_INET addresses are accepted or returned.
There's example code here

Live with Predictable Network Interface Name

I'm facing for the first time with the new name scheme of network interfaces: Predictable Network Interface Name.
My question is NOT related if this scheme is better or worse... I'm just trying to understand how to use it correcly.
Here I read:
When changing the interface naming scheme, do not forget to update all network-related configuration files and custom systemd unit files to reflect the change.
So I have to write in all the configuration files the actual interface name. In the previous scheme it was i.e. eth0 and it just means the first ethernet card, with the known caveats if there are multiple interfaces.
Now, instead, I have to write the predictable name, that is composed of some easy parts (i.e. type of the interface) and other un-predictable ones like the MAC address. As far as I understand each card will have a different name.
I admit my question might appear fool, but I don't understand how to prepare a configuration file. Let's see an example, /etc/dhcpcd.conf:
profile static_eth0
static ip_address=192.168.1.23/24
static routers=192.168.1.1
static domain_name_servers=192.168.1.1
interface eth0
fallback static_eth0
What should I put instead of eth0 in the o.s. image?
Only when I run the target machine I can retreive the actual name of the ethernet interface.
100% of my systems are headless, and I never connect a keyboard and display to them. Furthermore, if I have to send a spare part of the SBC do I need to reconfigure all?
Would you please help me to understand the correct usage?
ps. I know I can revert back to the old naming scheme... but that's not the point of my question.

See https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/
it explains how the names are assigned
Names incorporating Firmware/BIOS provided index numbers for on-board devices (example: eno1)
Names incorporating Firmware/BIOS provided PCI Express hotplug slot index numbers (example: ens1)
Names incorporating physical/geographical location of the connector of the hardware (example: enp2s0)
Names incorporating the interfaces's MAC address (example: enx78e7d1ea46da)
Classic, unpredictable kernel-native ethX naming (example: eth0)
By default, systemd v197 will now name interfaces following policy 1) if that
information from the firmware is applicable and available, falling back to 2) if
that information from the firmware is applicable and available, falling back to 3)
if applicable, falling back to 5) in all other cases. Policy 4) is not used by
default, but is available if the user chooses so.
So you could opt for a different approach, likely in your setup its easiest to take the mac, just reboot once an image that tries pxe/dhcp requests and note down the sended mac.
Another way, that may work, depending on your setup, would be interface groupings.
From "man interfaces"
auto /eth*
If the kernel knows about the interfaces with names lo, eth0 and eth1, then the above line is then interpreted as:
auto eth0 eth1
Note that there must still be valid "iface" stanzas for each matching interface. However, it is possible to combine a pattern with a mapping to a logical interface, like so:
auto /eth*=eth
iface eth inet dhcp
So maybe if you only have one interface, but can't tell where it will be assigned, you could write "auto /e*=eth" to catch all interfaces starting with e and address them inside the configuration file as "eth".

Linux IPV6 primary and secondary addresses

I need someone to explain to me how the IPv6 addresses are assigned and treated as primary/secondary. Is it the same as it was with IPv4 - single primary, multiple secondaries or not.Is there an hierarchy among the IPv6 addresses assigned or it is a flat list.
Thanks!

It is a flat list. All other things like common prefix length equal and with no temporary (privacy) addresses Linux will usually use the last added static address as the default source address. You can override this by setting a specific src address on the default route:
ip -6 route add default via 2001:db8::1 dev ethic src 2001:db8::abcd

tcpdump: capture outgoing packets on virtual interfaces that has an unknown link type to libpcap?

In the system I am testing right now, it has a couple of virtual L2 devices chained together to add our own L2.5 headers between Eth headers and IP headers. Now when I use
tcpdump -xx -i vir_device_1
, it actually shows the SLL header with IP header. How do I capture the full packet that is actually going out of the vir_device_1, i.e. after the ndo_start_xmit() device call?

How do I capture the full packet that is actually going out of the vir_device_1, i.e. after the ndo_start_xmit() device call?
Either by writing your own code to directly use a PF_PACKET/SOCK_RAW socket (you say "SLL header", so this is presumably Linux), or by:
making sure you've assigned a special ARPHRD_ value for your virtual interface;
using one of the DLT_USERn values for your special set of headers, or asking tcpdump-workers#lists.tcpdump.org for an official DLT_ value to be assigned for them;
modifying libpcap to map that ARPHRD_ value to the DLT_ value you're using;
modifying tcpdump to handle that DLT_ value;
if necessary, modifying other programs that would capture on that interface or read capture files as written by tcpdump on that interface to handle that value as well.
Note that the DLT_USERn values are specifically reserved for private use, and no official versions of libpcap, tcpdump, or Wireshark will ever assign them for their own use (i.e., if you use a DLT_USERn value, don't bother contributing patches to assign that value to your type of headers, as they won't be accepted; other people may already be using it for their own special headers, and that must continue to be supported), so you'll have to maintain the modified versions of libpcap, tcpdump, etc. yourself if you use one of those values rather than getting an official value assigned.

Thanks Guy Harris for providing very helpful answers to my original question!
I am adding this as an answer/note to a follow up question I asked in the comments.
Basically my question was what is the status of the packet received by PF_PACKET/SOCK_RAW.
For an software device(no queue), dev_queue_xmit() will call dev_hard_start_xmit(skb, dev) to start transmitting skb buffer. This function calls dev_queue_xmit_nit() before it calls dev->ops->ndo_start_xmit(skb,dev), which means the packet PF_PACKET sees is at the state before any changes made in ndo_start_xmit().

Specify source IP address for TCP socket when using Linux network device aliases

For some specific networking tests, I've created a VLAN device, eth1.900, and a couple of aliases, eth1.900:1 and eth1.900.2.
eth1.900 Link encap:Ethernet HWaddr 00:18:E7:17:2F:13
inet addr:1.0.1.120 Bcast:1.0.1.255 Mask:255.255.255.0
eth1.900:1 Link encap:Ethernet HWaddr 00:18:E7:17:2F:13
inet addr:1.0.1.200 Bcast:1.0.1.255 Mask:255.255.255.0
eth1.900:2 Link encap:Ethernet HWaddr 00:18:E7:17:2F:13
inet addr:1.0.1.201 Bcast:1.0.1.255 Mask:255.255.255.0
When connecting to a server, is there a way to specify which of these aliases will be used? I can ping using the -I <ip> address option to select which alias to use, but I can't see how to do it with a TCP socket in code without using raw sockets, since I would also like to run without extra socket privileges, i.e. not running as root, if possible.
Unfortunately, even with root, SO_BINDTODEVICE doesn't work because the alias device name is not recognized:
printf("Bind to %s\n", devname);
if (setsockopt(s, SOL_SOCKET, SO_BINDTODEVICE, (char*)devname, sizeof(devname)) != 0)
{
perror("SO_BINDTODEVICE");
return 1;
}
Output:
Bind to eth1.900:1
SO_BINDTODEVICE: No such device

Use getifaddrs() to enumerate all the interfaces and find the IP address for the interface you want to bind to. Then use bind() to bind to that IP address, before you call connect().

Since a packet can't be send out on an aliased interface anyway, it would make no sense to use SO_BINDTODEVICE on one. SO_BINDTODEVICE controls which device a packet is sent out from if routing cannot be used for this purpose (for example, if it's a raw Ethernet frame).

You don't show the definition of devname, but if it's a string pointer, e.g.:
char *devname = "eth1.900:1";
Then perhaps it's failing since you specify the argument size using sizeof devname, which would in this case be the same as sizeof (char *), i.e. typically 4 on a 32-bit system.
If setsockopt() expects to see the actual size of the argument, i.e. the length of the string, this could explain the issue since it's then perhaps just inspecting the first four characters and failing since the result is an invalid interface name.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string