How do SO_REUSEADDR and SO_REUSEPORT differ? - linux

The man pages and programmer documentations for the socket options SO_REUSEADDR and SO_REUSEPORT are different for different operating systems and often highly confusing. Some operating systems don't even have the option SO_REUSEPORT. The WWW is full of contradicting information regarding this subject and often you can find information that is only true for one socket implementation of a specific operating system, which may not even be explicitly mentioned in the text.
So how exactly is SO_REUSEADDR different than SO_REUSEPORT?
Are systems without SO_REUSEPORT more limited?
And what exactly is the expected behavior if I use either one on different operating systems?

Welcome to the wonderful world of portability... or rather the lack of it. Before we start analyzing these two options in detail and take a deeper look how different operating systems handle them, it should be noted that the BSD socket implementation is the mother of all socket implementations. Basically all other systems copied the BSD socket implementation at some point in time (or at least its interfaces) and then started evolving it on their own. Of course the BSD socket implementation was evolved as well at the same time and thus systems that copied it later got features that were lacking in systems that copied it earlier. Understanding the BSD socket implementation is the key to understanding all other socket implementations, so you should read about it even if you don't care to ever write code for a BSD system.
There are a couple of basics you should know before we look at these two options. A TCP/UDP connection is identified by a tuple of five values:
{<protocol>, <src addr>, <src port>, <dest addr>, <dest port>}
Any unique combination of these values identifies a connection. As a result, no two connections can have the same five values, otherwise the system would not be able to distinguish these connections any longer.
The protocol of a socket is set when a socket is created with the socket() function. The source address and port are set with the bind() function. The destination address and port are set with the connect() function. Since UDP is a connectionless protocol, UDP sockets can be used without connecting them. Yet it is allowed to connect them and in some cases very advantageous for your code and general application design. In connectionless mode, UDP sockets that were not explicitly bound when data is sent over them for the first time are usually automatically bound by the system, as an unbound UDP socket cannot receive any (reply) data. Same is true for an unbound TCP socket, it is automatically bound before it will be connected.
If you explicitly bind a socket, it is possible to bind it to port 0, which means "any port". Since a socket cannot really be bound to all existing ports, the system will have to choose a specific port itself in that case (usually from a predefined, OS specific range of source ports). A similar wildcard exists for the source address, which can be "any address" (0.0.0.0 in case of IPv4 and :: in case of IPv6). Unlike in case of ports, a socket can really be bound to "any address" which means "all source IP addresses of all local interfaces". If the socket is connected later on, the system has to choose a specific source IP address, since a socket cannot be connected and at the same time be bound to any local IP address. Depending on the destination address and the content of the routing table, the system will pick an appropriate source address and replace the "any" binding with a binding to the chosen source IP address.
By default, no two sockets can be bound to the same combination of source address and source port. As long as the source port is different, the source address is actually irrelevant. Binding socketA to ipA:portA and socketB to ipB:portB is always possible if ipA != ipB holds true, even when portA == portB. E.g. socketA belongs to a FTP server program and is bound to 192.168.0.1:21 and socketB belongs to another FTP server program and is bound to 10.0.0.1:21, both bindings will succeed. Keep in mind, though, that a socket may be locally bound to "any address". If a socket is bound to 0.0.0.0:21, it is bound to all existing local addresses at the same time and in that case no other socket can be bound to port 21, regardless which specific IP address it tries to bind to, as 0.0.0.0 conflicts with all existing local IP addresses.
Anything said so far is pretty much equal for all major operating system. Things start to get OS specific when address reuse comes into play. We start with BSD, since as I said above, it is the mother of all socket implementations.
BSD
SO_REUSEADDR
If SO_REUSEADDR is enabled on a socket prior to binding it, the socket can be successfully bound unless there is a conflict with another socket bound to exactly the same combination of source address and port. Now you may wonder how is that any different than before? The keyword is "exactly". SO_REUSEADDR mainly changes the way how wildcard addresses ("any IP address") are treated when searching for conflicts.
Without SO_REUSEADDR, binding socketA to 0.0.0.0:21 and then binding socketB to 192.168.0.1:21 will fail (with error EADDRINUSE), since 0.0.0.0 means "any local IP address", thus all local IP addresses are considered in use by this socket and this includes 192.168.0.1, too. With SO_REUSEADDR it will succeed, since 0.0.0.0 and 192.168.0.1 are not exactly the same address, one is a wildcard for all local addresses and the other one is a very specific local address. Note that the statement above is true regardless in which order socketA and socketB are bound; without SO_REUSEADDR it will always fail, with SO_REUSEADDR it will always succeed.
To give you a better overview, let's make a table here and list all possible combinations:
SO_REUSEADDR socketA socketB Result
---------------------------------------------------------------------
ON/OFF 192.168.0.1:21 192.168.0.1:21 Error (EADDRINUSE)
ON/OFF 192.168.0.1:21 10.0.0.1:21 OK
ON/OFF 10.0.0.1:21 192.168.0.1:21 OK
OFF 0.0.0.0:21 192.168.1.0:21 Error (EADDRINUSE)
OFF 192.168.1.0:21 0.0.0.0:21 Error (EADDRINUSE)
ON 0.0.0.0:21 192.168.1.0:21 OK
ON 192.168.1.0:21 0.0.0.0:21 OK
ON/OFF 0.0.0.0:21 0.0.0.0:21 Error (EADDRINUSE)
The table above assumes that socketA has already been successfully bound to the address given for socketA, then socketB is created, either gets SO_REUSEADDR set or not, and finally is bound to the address given for socketB. Result is the result of the bind operation for socketB. If the first column says ON/OFF, the value of SO_REUSEADDR is irrelevant to the result.
Okay, SO_REUSEADDR has an effect on wildcard addresses, good to know. Yet that isn't its only effect it has. There is another well known effect which is also the reason why most people use SO_REUSEADDR in server programs in the first place. For the other important use of this option we have to take a deeper look on how the TCP protocol works.
If a TCP socket is being closed, normally a 3-way handshake is performed; the sequence is called FIN-ACK. The problem here is, that the last ACK of that sequence may have arrived on the other side or it may not have arrived and only if it has, the other side also considers the socket as being fully closed. To prevent re-using an address+port combination, that may still be considered open by some remote peer, the system will not immediately consider a socket as dead after sending the last ACK but instead put the socket into a state commonly referred to as TIME_WAIT. It can be in that state for minutes (system dependent setting). On most systems you can get around that state by enabling lingering and setting a linger time of zero1 but there is no guarantee that this is always possible, that the system will always honor this request, and even if the system honors it, this causes the socket to be closed by a reset (RST), which is not always a great idea. To learn more about linger time, have a look at my answer about this topic.
The question is, how does the system treat a socket in state TIME_WAIT? If SO_REUSEADDR is not set, a socket in state TIME_WAIT is considered to still be bound to the source address and port and any attempt to bind a new socket to the same address and port will fail until the socket has really been closed. So don't expect that you can rebind the source address of a socket immediately after closing it. In most cases this will fail. However, if SO_REUSEADDR is set for the socket you are trying to bind, another socket bound to the same address and port in state TIME_WAIT is simply ignored, after all its already "half dead", and your socket can bind to exactly the same address without any problem. In that case it plays no role that the other socket may have exactly the same address and port. Note that binding a socket to exactly the same address and port as a dying socket in TIME_WAIT state can have unexpected, and usually undesired, side effects in case the other socket is still "at work", but that is beyond the scope of this answer and fortunately those side effects are rather rare in practice.
There is one final thing you should know about SO_REUSEADDR. Everything written above will work as long as the socket you want to bind to has address reuse enabled. It is not necessary that the other socket, the one which is already bound or is in a TIME_WAIT state, also had this flag set when it was bound. The code that decides if the bind will succeed or fail only inspects the SO_REUSEADDR flag of the socket fed into the bind() call, for all other sockets inspected, this flag is not even looked at.
SO_REUSEPORT
SO_REUSEPORT is what most people would expect SO_REUSEADDR to be. Basically, SO_REUSEPORT allows you to bind an arbitrary number of sockets to exactly the same source address and port as long as all prior bound sockets also had SO_REUSEPORT set before they were bound. If the first socket that is bound to an address and port does not have SO_REUSEPORT set, no other socket can be bound to exactly the same address and port, regardless if this other socket has SO_REUSEPORT set or not, until the first socket releases its binding again. Unlike in case of SO_REUSEADDR the code handling SO_REUSEPORT will not only verify that the currently bound socket has SO_REUSEPORT set but it will also verify that the socket with a conflicting address and port had SO_REUSEPORT set when it was bound.
SO_REUSEPORT does not imply SO_REUSEADDR. This means if a socket did not have SO_REUSEPORT set when it was bound and another socket has SO_REUSEPORT set when it is bound to exactly the same address and port, the bind fails, which is expected, but it also fails if the other socket is already dying and is in TIME_WAIT state. To be able to bind a socket to the same addresses and port as another socket in TIME_WAIT state requires either SO_REUSEADDR to be set on that socket or SO_REUSEPORT must have been set on both sockets prior to binding them. Of course it is allowed to set both, SO_REUSEPORT and SO_REUSEADDR, on a socket.
There is not much more to say about SO_REUSEPORT other than that it was added later than SO_REUSEADDR, that's why you will not find it in many socket implementations of other systems, which "forked" the BSD code before this option was added, and that there was no way to bind two sockets to exactly the same socket address in BSD prior to this option.
Connect() Returning EADDRINUSE?
Most people know that bind() may fail with the error EADDRINUSE, however, when you start playing around with address reuse, you may run into the strange situation that connect() fails with that error as well. How can this be? How can a remote address, after all that's what connect adds to a socket, be already in use? Connecting multiple sockets to exactly the same remote address has never been a problem before, so what's going wrong here?
As I said on the very top of my reply, a connection is defined by a tuple of five values, remember? And I also said, that these five values must be unique otherwise the system cannot distinguish two connections any longer, right? Well, with address reuse, you can bind two sockets of the same protocol to the same source address and port. That means three of those five values are already the same for these two sockets. If you now try to connect both of these sockets also to the same destination address and port, you would create two connected sockets, whose tuples are absolutely identical. This cannot work, at least not for TCP connections (UDP connections are no real connections anyway). If data arrived for either one of the two connections, the system could not tell which connection the data belongs to. At least the destination address or destination port must be different for either connection, so that the system has no problem to identify to which connection incoming data belongs to.
So if you bind two sockets of the same protocol to the same source address and port and try to connect them both to the same destination address and port, connect() will actually fail with the error EADDRINUSE for the second socket you try to connect, which means that a socket with an identical tuple of five values is already connected.
Multicast Addresses
Most people ignore the fact that multicast addresses exist, but they do exist. While unicast addresses are used for one-to-one communication, multicast addresses are used for one-to-many communication. Most people got aware of multicast addresses when they learned about IPv6 but multicast addresses also existed in IPv4, even though this feature was never widely used on the public Internet.
The meaning of SO_REUSEADDR changes for multicast addresses as it allows multiple sockets to be bound to exactly the same combination of source multicast address and port. In other words, for multicast addresses SO_REUSEADDR behaves exactly as SO_REUSEPORT for unicast addresses. Actually, the code treats SO_REUSEADDR and SO_REUSEPORT identically for multicast addresses, that means you could say that SO_REUSEADDR implies SO_REUSEPORT for all multicast addresses and the other way round.
FreeBSD/OpenBSD/NetBSD
All these are rather late forks of the original BSD code, that's why they all three offer the same options as BSD and they also behave the same way as in BSD.
macOS (MacOS X)
At its core, macOS is simply a BSD-style UNIX named "Darwin", based on a rather late fork of the BSD code (BSD 4.3), which was then later on even re-synchronized with the (at that time current) FreeBSD 5 code base for the Mac OS 10.3 release, so that Apple could gain full POSIX compliance (macOS is POSIX certified). Despite having a microkernel at its core ("Mach"), the rest of the kernel ("XNU") is basically just a BSD kernel, and that's why macOS offers the same options as BSD and they also behave the same way as in BSD.
iOS / watchOS / tvOS
iOS is just a macOS fork with a slightly modified and trimmed kernel, somewhat stripped down user space toolset and a slightly different default framework set. watchOS and tvOS are iOS forks, that are stripped down even further (especially watchOS). To my best knowledge they all behave exactly as macOS does.
Linux
Linux < 3.9
Prior to Linux 3.9, only the option SO_REUSEADDR existed. This option behaves generally the same as in BSD with two important exceptions:
As long as a listening (server) TCP socket is bound to a specific port, the SO_REUSEADDR option is entirely ignored for all sockets targeting that port. Binding a second socket to the same port is only possible if it was also possible in BSD without having SO_REUSEADDR set. E.g. you cannot bind to a wildcard address and then to a more specific one or the other way round, both is possible in BSD if you set SO_REUSEADDR. What you can do is you can bind to the same port and two different non-wildcard addresses, as that's always allowed. In this aspect Linux is more restrictive than BSD.
The second exception is that for client sockets, this option behaves exactly like SO_REUSEPORT in BSD, as long as both had this flag set before they were bound. The reason for allowing that was simply that it is important to be able to bind multiple sockets to exactly to the same UDP socket address for various protocols and as there used to be no SO_REUSEPORT prior to 3.9, the behavior of SO_REUSEADDR was altered accordingly to fill that gap. In that aspect Linux is less restrictive than BSD.
Linux >= 3.9
Linux 3.9 added the option SO_REUSEPORT to Linux as well. This option behaves exactly like the option in BSD and allows binding to exactly the same address and port number as long as all sockets have this option set prior to binding them.
Yet, there are still two differences to SO_REUSEPORT on other systems:
To prevent "port hijacking", there is one special limitation: All sockets that want to share the same address and port combination must belong to processes that share the same effective user ID! So one user cannot "steal" ports of another user. This is some special magic to somewhat compensate for the missing SO_EXCLBIND/SO_EXCLUSIVEADDRUSE flags.
Additionally the kernel performs some "special magic" for SO_REUSEPORT sockets that isn't found in other operating systems: For UDP sockets, it tries to distribute datagrams evenly, for TCP listening sockets, it tries to distribute incoming connect requests (those accepted by calling accept()) evenly across all the sockets that share the same address and port combination. Thus an application can easily open the same port in multiple child processes and then use SO_REUSEPORT to get a very inexpensive load balancing.
Android
Even though the whole Android system is somewhat different from most Linux distributions, at its core works a slightly modified Linux kernel, thus everything that applies to Linux should apply to Android as well.
Windows
Windows only knows the SO_REUSEADDR option, there is no SO_REUSEPORT. Setting SO_REUSEADDR on a socket in Windows behaves like setting SO_REUSEPORT and SO_REUSEADDR on a socket in BSD, with one exception:
Prior to Windows 2003, a socket with SO_REUSEADDR could always been bound to exactly the same source address and port as an already bound socket, even if the other socket did not have this option set when it was bound. This behavior allowed an application "to steal" the connected port of another application. Needless to say that this has major security implications!
Microsoft realized that and added another important socket option: SO_EXCLUSIVEADDRUSE. Setting SO_EXCLUSIVEADDRUSE on a socket makes sure that if the binding succeeds, the combination of source address and port is owned exclusively by this socket and no other socket can bind to them, not even if it has SO_REUSEADDR set.
This default behavior was changed first in Windows 2003, Microsoft calls that "Enhanced Socket Security" (funny name for a behavior that is default on all other major operating systems). For more details just visit this page. There are three tables: The first one shows the classic behavior (still in use when using compatibility modes!), the second one shows the behavior of Windows 2003 and up when the bind() calls are made by the same user, and the third one when the bind() calls are made by different users.
Solaris
Solaris is the successor of SunOS. SunOS was originally based on a fork of BSD, SunOS 5 and later was based on a fork of SVR4, however SVR4 is a merge of BSD, System V, and Xenix, so up to some degree Solaris is also a BSD fork, and a rather early one. As a result Solaris only knows SO_REUSEADDR, there is no SO_REUSEPORT. The SO_REUSEADDR behaves pretty much the same as it does in BSD. As far as I know there is no way to get the same behavior as SO_REUSEPORT in Solaris, that means it is not possible to bind two sockets to exactly the same address and port.
Similar to Windows, Solaris has an option to give a socket an exclusive binding. This option is named SO_EXCLBIND. If this option is set on a socket prior to binding it, setting SO_REUSEADDR on another socket has no effect if the two sockets are tested for an address conflict. E.g. if socketA is bound to a wildcard address and socketB has SO_REUSEADDR enabled and is bound to a non-wildcard address and the same port as socketA, this bind will normally succeed, unless socketA had SO_EXCLBIND enabled, in which case it will fail regardless the SO_REUSEADDR flag of socketB.
Other Systems
In case your system is not listed above, I wrote a little test program that you can use to find out how your system handles these two options. Also if you think my results are wrong, please first run that program before posting any comments and possibly making false claims.
All that the code requires to build is a bit POSIX API (for the network parts) and a C99 compiler (actually most non-C99 compiler will work as well as long as they offer inttypes.h and stdbool.h; e.g. gcc supported both long before offering full C99 support).
All that the program needs to run is that at least one interface in your system (other than the local interface) has an IP address assigned and that a default route is set which uses that interface. The program will gather that IP address and use it as the second "specific address".
It tests all possible combinations you can think of:
TCP and UDP protocol
Normal sockets, listen (server) sockets, multicast sockets
SO_REUSEADDR set on socket1, socket2, or both sockets
SO_REUSEPORT set on socket1, socket2, or both sockets
All address combinations you can make out of 0.0.0.0 (wildcard), 127.0.0.1 (specific address), and the second specific address found at your primary interface (for multicast it's just 224.1.2.3 in all tests)
and prints the results in a nice table. It will also work on systems that don't know SO_REUSEPORT, in which case this option is simply not tested.
What the program cannot easily test is how SO_REUSEADDR acts on sockets in TIME_WAIT state as it's very tricky to force and keep a socket in that state. Fortunately most operating systems seems to simply behave like BSD here and most of the time programmers can simply ignore the existence of that state.
Here's the code (I cannot include it here, answers have a size limit and the code would push this reply over the limit).

Mecki's answer is absolutly perfect, but it's worth adding that FreeBSD also supports SO_REUSEPORT_LB, which mimics Linux' SO_REUSEPORT behaviour - it balances the load; see setsockopt(2)

Related

TCP's socket vs Linux's TCP socket

Linux API and TCP protocol both have concepts called "socket". Are they the same concept, and does Linux's TCP socket implement TCP's socket concept?
Relation between connections and sockets:
I have heard that two connections can't share a Linux's TCP socket, and is it true?
Tenebaum's Computer Networks (5ed 2011, Section 6.5.2 The TCP Service Model, p553) says:
A socket may be used for multiple connections at the same time. In other words, two or more connections may terminate at the same socket. Connections are identified by the socket identifiers at both ends.
Since the quote says two connections can share a "socket", does the book use a different "socket" concept from Linux's TCP socket? Does the book use TCP's socket concept?
Relation between processes and sockets:
I also heard that two processes can share a Linux's TCP socket. But if two processes can share a socket, can't the processes create their own connections on the socket at will, so there are two connections on the same Linux's TCP socket? Is it a contradiction to 1, where two connections can't share a Linux TCP socket?
Can two processes share a TCP's socket?
The book references a more abstract concept of a socket, one that is not tied to a particular OS or even a network/transport protocol. In the book, a socket is simply a uniquely defined connection endpoint. A connection is thus a pair (S1, S2) of sockets, and this pair should be unique in some undefined context. An example specific to TCP using my connection right now would have an abstract socket consisting of an interface IP address and a TCP port number. There are many, many connections between stackoverflow users like myself and the abstract socket [443, 151.101.193.69] but only a single connection from my machine [27165, 192.168.1.231] to [443, 151.101.193.69], which is a fake example using a non-routable IP address so as to protect my privacy.
If we get even more concrete and assume that stackoverflow and my computer are both running linux, than we can talk about the socket as defined by man 2 socket, and the linux API that uses it. Here a socket can be created in listening mode, and this is typically called a server. This socket can be shared (shared in the sense of shared memory or state) amongst multiple processes. However, when a peer connects to this listening socket a new socket is created (as a result of the accept() call. The original listening socket may again be used to accept() another connection. I believe if there are multiple processes blocked on the accept() system call then exactly one of these is unblocked and returns with the newly created connected socket.
Let me know if there is something missing here.
Speaking as the docs you're reading do is convenient, but it's not really accurate.
Sockets are a general networking API. Their only relation with TCP is, you can set sockets up to use it. You can also set sockets up to talk over any other networking protocol the OS backs; also, you don't necessarily have to use sockets, many OS's still offer other networking APIs, some with substantial niche advantages.
The problem this leaves you with is, the nontechnical language leaves you with an idea of how things are put together but it glosses over implementation details, you can't do any detailed reasoning from the characterizations and analogies in layman's terms.
So ignore the concept you've formed of sockets. Read the actual docs, not tutorials. Write code to see if it works as you think it does. You'll learn that what you have now is a layman's understanding of "a socket", glossing over the differences between sockets you create with socket(), the ones you get from accept(), the ones you can find in Unix's filesystem, and so forth.
Even "connection" is somewhat of a simplification, even for TCP.
To give you an idea just how deep the rabbit hole goes, so is "sharing" -- you can send fd's over some kinds of sockets, and sockets are fd's, after fork() the two processes share the fd namespace, and you can dup() fd's...
A fully-set-up TCP network connection is a {host1:port1, host2:port2} pair with some tracked state at both ends and packets getting sent between those ends that update the state according to the TCP protocol i.e. rules. You can bind() a socket to a local TCP address, and connect() through that socket to remote (or local) addresses one after another, so in that sense connections can share a socket—but if you're running a server, accept()ed connections get their own dedicated socket, it's how you identify where the data you read() is coming from.
One of the common conflations is between the host:port pair a socket can be bound to and the socket itself. You wind up with one OS socket listening for new connections plus one per connection over a connection-based protocol like TCP, but they can all use the same host:port, it's easy to gloss over the reality and think of that as "a socket", and it looks like the book you're reading fell into that usage.

Can TCPv4 source and destination ports conflict with each other? Or do source and destination ports live in their own address spaces?

Let me be more specific about my question with an example: Let's say that I have a slew of little servers that all start up on different ports using TCPv4. These ports are going to be destination ports, of course. Let's further assume that these little servers don't just start up at boot time like a typical server, but rather they churn dynamically based on demand. They start up when needed, and may shut themselves down for a while, and then later start up again.
Now let's say that on this same computer, we also have lots of client processes making requests to server processes on other computers via TCPv4. When a client makes such a request, it is assigned a source port by the OS.
Let's say for the sake of this example that a client processes makes a web request to a RESTful server running on a different computer. Let's also say that the source port assigned by the OS to this request is port 7777.
For this example let's also say that while the above request is still occurring, one of our little servers wants to start up, and it wants to start up on destination port 7777.
My question is will this cause a conflict? I.e., will the server get an error because port 7777 is already in use? Or will everything be fine because these two different kinds of ports live in different address spaces that cannot conflict with each other?
One reason I'm worried about the potential for conflict here is that I've seen web pages that say that "ephemeral source port selection" is typically done in a port number range that begins at a relatively high number. Here is such a web page:
https://www.cymru.com/jtk/misc/ephemeralports.html
A natural assumption for why source ports would begin at high numbers, rather than just starting at 1, is to avoid conflict with the destination ports used by server processes. Though I haven't yet seen anything that explicitly comes out and says that this is the case.
P.S. There is, of course, a potential distinction between what the TCPv4 protocol spec has to say on this issue, and what OSes actually do. E.g., perhaps the protocol is agnostic, but OSes tend to only use a single address space? Or perhaps different OSes treat the issue differently?
Personally, I'm most interested at the moment in what Linux would do.
The TCP specification says that connections are identified by the tuple:
{local addr, local port, remote addr, remote port}
Based on this, there theoretically shouldn't be a conflict between a local port used in an existing connection and trying to bind that same port for a server to listen on, since the listening socket doesn't have a remote address/port (these are represented as wildcards in the spec).
However, most TCP implementations, including the Unix sockets API, are more strict than this. If the local port is already in use in any existing socket, you won't be able to bind it, you'll get the error EADDRINUSE. A special exception is made if the existing sockets are all in TIME_WAIT state and the new socket has the SO_REUSEADDR socket option; this is used to allow a server to restart while the sockets left over from a previous process are still waiting to time out.
For this reason, the port range is generally divided into ranges with different uses. When a socket doesn't bind a local port (either because it just called connect() without calling bind(), or by specifying IPPORT_ANY as the port in bind()), the port is chosen from the ephemeral range, which is usually very high numbered ports. Servers, on the other hand, are expected to bind to low-numbered ports. If network applications follow this convention, you should not run into conflicts.

Usage of SO_REUSEPORT with multicast UDP

In a discussion on SO_REUSEPORT, the following question was posted to reddit, but there was no answer. I am wondering if Stack Overflow knows the answer.
Can anyone tell me how this interacts with multicast?
I've got an application where the program should listen to multicast UDP messages, and this program may be started multiple times on the same computer. When a message comes it, all listening processes should get it.
I've noticed that on Linux, it works fine if I don't set SO_REUSEPORT, and if I understand correctly, setting SO_REUSEPORT may be the wrong thing to do -- I don't want UDP messages distributed between the processes, I want all processes to get a copy. However, on OS X, the second execution of the program fails to find a free port unless SO_REUSEPORT is set.
tl;dr: Is it expected to set SO_REUSEPORT when using multicast?
WIth SO_REUSEPORT, one can bind multiple sockets to the same port and address. The only requirement is that earlier sockets must have set this option. Thus, if we
want two sockets, sock1 and sock2 to be bound ot the same port (and address), then s2 would be able to reuse the port/address only if both sock1 and sock2 set SO_REUSEPORT. WIth respect to multicast, if both sock1 and sock2 are recievers of the same multicast group, then they would both get a copy of data.
You might find this answer helpful: Socket options SO_REUSEADDR and SO_REUSEPORT, how do they differ? Do they mean the same across all major operating systems?

Binding multiple times to the same port

is there a way to bind multiple listening TCP sockets on the same {IP, port}? I know I can just open a socket, bind, fork and then listen in each of the processes. But I'd like to do the same with separate processes that cannot fork after binding. Is there some way to allow this and not get the "Address already in use" error?
The only option I need is automatic load-balancing of the connections.
It looks like it makes sense to introduce a separate process that would listen on the port and act as a load balancing proxy forwarding the traffic to a pool of backend processes, either over the loopback interface or Unix sockets. If you're dealing with HTTP you could use one of the existing HTTP reverse proxies, like pound or nginx.
You could do something similar and pass the socket fd through a unix domain socket as suggested here
Unfortunately, i don't believe this is possible.
Only one process can bind TCP socket to a given port and IP address (even if it's INADDR_ANY) - that would be a completely duplicate binding. The only exception to this is the bind(2)/fork(2) dance, as you already mentioned.
That said, if you have multiple network interfaces on the machine (or setup IP aliases on a single interface), you can bind one socket to each IP address with the same port. Just remember to set SO_REUSEADDR socket option between socket(2) and bind(2) calls.
Load balancing could be done in multiple ways:
do it on a firewall mapping source IPs to pool of machines/ports,
proxy/pre-process in one process, do real work in a pool of processes,
use file descriptor passing as #Hasturkun suggests.
appears "newly possible" for linux with kernel 3.9 http://freeprogrammersblog.vhex.net/post/linux-39-introdued-new-way-of-writing-socket-servers/2
by using SO_REUSEPORT
For a highly expert variation of your task, on Linux you may map incoming packets into your process, via using netfilter's queue module. Then you can drop/modify/them.
prog1: checks if packet came from IP xxx.xxx.xxx. Match? Catch packet.
prog2: checks if prog3 busy. Match? Catch packet.
prog3: checks packet's origin...
However this results in ~20Kb of code just to handle packets, and your TCP/IP stack wont survive it long (big traffic == big mistake).
On Windows, you can achieve same with Winsock driver.
It's an esotheric solution, just create one dispatcher process to fork your others. Look after nginx, or apache2, or cometd, or any Perl's asynchronous TCP module to catch the idea.

What is the meaning of SO_REUSEADDR (setsockopt option) - Linux? [duplicate]

This question already has answers here:
How do SO_REUSEADDR and SO_REUSEPORT differ?
(2 answers)
Closed 9 years ago.
From the man page:
SO_REUSEADDR Specifies that the rules
used in validating addresses supplied
to bind() should allow reuse of local
addresses, if this is supported by the
protocol. This option takes an int
value. This is a Boolean option
When should I use it? Why does "reuse of local addresses" give?
TCP's primary design goal is to allow reliable data communication in the face of packet loss, packet reordering, and — key, here — packet duplication.
It's fairly obvious how a TCP/IP network stack deals with all this while the connection is up, but there's an edge case that happens just after the connection closes. What happens if a packet sent right at the end of the conversation is duplicated and delayed, such that the 4-way shutdown packets get to the receiver before the delayed packet? The stack dutifully closes down its connection. Then later, the delayed duplicate packet shows up. What should the stack do?
More importantly, what should it do if a program with open sockets on a given IP address + TCP port combo closes its sockets, and then a brief time later, a program comes along and wants to listen on that same IP address and TCP port number? (Typical case: A program is killed and is quickly restarted.)
There are a couple of choices:
Disallow reuse of that IP/port combo for at least 2 times the maximum time a packet could be in flight. In TCP, this is usually called the 2×MSL delay. You sometimes also see 2×RTT, which is roughly equivalent.
This is the default behavior of all common TCP/IP stacks. 2×MSL is typically between 30 and 120 seconds, and it shows up in netstat output as the TIME_WAIT period. After that time, the stack assumes that any rogue packets have been dropped en route due to expired TTLs, so that socket leaves the TIME_WAIT state, allowing that IP/port combo to be reused.
Allow the new program to re-bind to that IP/port combo. In stacks with BSD sockets interfaces — essentially all Unixes and Unix-like systems, plus Windows via Winsock — you have to ask for this behavior by setting the SO_REUSEADDR option via setsockopt() before you call bind().
SO_REUSEADDR is most commonly set in network server programs, since a common usage pattern is to make a configuration change, then be required to restart that program to make the change take effect. Without SO_REUSEADDR, the bind() call in the restarted program's new instance will fail if there were connections open to the previous instance when you killed it. Those connections will hold the TCP port in the TIME_WAIT state for 30-120 seconds, so you fall into case 1 above.
The risk in setting SO_REUSEADDR is that it creates an ambiguity: the metadata in a TCP packet's headers isn't sufficiently unique that the stack can reliably tell whether the packet is stale and so should be dropped rather than be delivered to the new listener's socket because it was clearly intended for a now-dead listener.
If you don't see that that is true, here's all the listening machine's TCP/IP stack has to work with per-connection to make that decision:
Local IP: Not unique per-conn. In fact, our problem definition here says we're reusing the local IP, on purpose.
Local TCP port: Ditto.
Remote IP: The machine causing the ambiguity could re-connect, so that doesn't help disambiguate the packet's proper destination.
Remote port: In well-behaved network stacks, the remote port of an outgoing connection isn't reused quickly, but it's only 16 bits, so you've got 30-120 seconds to force the stack to get through a few tens of thousands of choices and reuse the port. Computers could do work that fast back in the 1960s.
If your answer to that is that the remote stack should do something like TIME_WAIT on its side to disallow ephemeral TCP port reuse, that solution assumes that the remote host is benign. A malicious actor is free to reuse that remote port.
I suppose the listener's stack could choose to strictly disallow connections from the TCP 4-tuple only, so that during the TIME_WAIT state a given remote host is prevented from reconnecting with the same remote ephemeral port, but I'm not aware of any TCP stack with that particular refinement.
Local and remote TCP sequence numbers: These are also not sufficiently unique that a new remote program couldn't come up with the same values.
If we were re-designing TCP today, I think we'd integrate TLS or something like it as a non-optional feature, one effect of which is to make this sort of inadvertent and malicious connection hijacking impossible. That requires adding large fields (128 bits and up) which wasn't at all practical back in 1981, when the document for the current version of TCP (RFC 793) was published.
Without such hardening, the ambiguity created by allowing re-binding during TIME_WAIT means you can either a) have stale data intended for the old listener be misdelivered to a socket belonging to the new listener, thereby either breaking the listener's protocol or incorrectly injecting stale data into the connection; or b) new data for the new listener's socket mistakenly assigned to the old listener's socket and thus inadvertently dropped.
The safe thing to do is wait out the TIME_WAIT period.
Ultimately, it comes down to a choice of costs: wait out the TIME_WAIT period or take on the risk of unwanted data loss or inadvertent data injection.
Many server programs take this risk, deciding that it's better to get the server back up immediately so as to not miss any more incoming connections than necessary.
This is not a universal choice. Many programs — even server programs requiring a restart to apply a settings change — choose instead to leave SO_REUSEADDR alone. The programmer may know these risks and is choosing to leave the default alone, or they may be ignorant of the issues but are getting the benefit of a wise default.
Some network programs offer the user a choice among the configuration options, fobbing the responsibility off on the end user or sysadmin.
SO_REUSEADDR allows your server to
bind to an address which is in a
TIME_WAIT state.
This socket option tells the kernel that even if this port is busy (in the TIME_WAIT state), go ahead and reuse it anyway. If it is busy, but with another state, you will still get an address already in use error. It is useful if your server has been shut down, and then restarted right away while sockets are still active on its port.
From unixguide.net
When you create a socket, you don't really own it. The OS (TCP stack) creates it for you and gives you a handle (file descriptor) to access it. When your socket is closed, it take time for the OS to "fully close it" while it goes through several states. As EJP mentioned in the comments, the longest delay is usually from the TIME_WAIT state. This extra delay is required to handle edge cases at the very end of the termination sequence and make sure the last termination acknowledgement either got through or had the other side reset itself because of a timeout. Here you can find some extra considerations about this state. The main considerations are pointed out as follow :
Remember that TCP guarantees all data transmitted will be delivered,
if at all possible. When you close a socket, the server goes into a
TIME_WAIT state, just to be really really sure that all the data has
gone through. When a socket is closed, both sides agree by sending
messages to each other that they will send no more data. This, it
seemed to me was good enough, and after the handshaking is done, the
socket should be closed. The problem is two-fold. First, there is no
way to be sure that the last ack was communicated successfully.
Second, there may be "wandering duplicates" left on the net that must
be dealt with if they are delivered.
If you try to create multiple sockets with the same ip:port pair really quick, you get the "address already in use" error because the earlier socket will not have been fully released. Using SO_REUSEADDR will get rid of this error as it will override checks for any previous instance.

Resources