AFAIK, if a unix socket is created as type SOCK_STREAMS then it's unidirectional, and if SOCK_DGRAM then it's bidirectional.
I'm looking at using node-ipc for IPC via unix sockets. It has an example of a client and server that communicate supposedly via a unidirectional unix socket SOCK_STREAMS - yet seems to send messages in both directions.
What am I missing? Is the same socket path being used in two directions? Is it really using SOCK_STREAMS?
AFAIK, if a unix socket is created as type SOCK_STREAMS then it's unidirectional, and if SOCK_DGRAM then it's bidirectional.
No. Domain SOCK_STREAMS are bi-directional as in the case of internet SOCK_STREAMS.
Related
Linux API and TCP protocol both have concepts called "socket". Are they the same concept, and does Linux's TCP socket implement TCP's socket concept?
Relation between connections and sockets:
I have heard that two connections can't share a Linux's TCP socket, and is it true?
Tenebaum's Computer Networks (5ed 2011, Section 6.5.2 The TCP Service Model, p553) says:
A socket may be used for multiple connections at the same time. In other words, two or more connections may terminate at the same socket. Connections are identified by the socket identifiers at both ends.
Since the quote says two connections can share a "socket", does the book use a different "socket" concept from Linux's TCP socket? Does the book use TCP's socket concept?
Relation between processes and sockets:
I also heard that two processes can share a Linux's TCP socket. But if two processes can share a socket, can't the processes create their own connections on the socket at will, so there are two connections on the same Linux's TCP socket? Is it a contradiction to 1, where two connections can't share a Linux TCP socket?
Can two processes share a TCP's socket?
The book references a more abstract concept of a socket, one that is not tied to a particular OS or even a network/transport protocol. In the book, a socket is simply a uniquely defined connection endpoint. A connection is thus a pair (S1, S2) of sockets, and this pair should be unique in some undefined context. An example specific to TCP using my connection right now would have an abstract socket consisting of an interface IP address and a TCP port number. There are many, many connections between stackoverflow users like myself and the abstract socket [443, 151.101.193.69] but only a single connection from my machine [27165, 192.168.1.231] to [443, 151.101.193.69], which is a fake example using a non-routable IP address so as to protect my privacy.
If we get even more concrete and assume that stackoverflow and my computer are both running linux, than we can talk about the socket as defined by man 2 socket, and the linux API that uses it. Here a socket can be created in listening mode, and this is typically called a server. This socket can be shared (shared in the sense of shared memory or state) amongst multiple processes. However, when a peer connects to this listening socket a new socket is created (as a result of the accept() call. The original listening socket may again be used to accept() another connection. I believe if there are multiple processes blocked on the accept() system call then exactly one of these is unblocked and returns with the newly created connected socket.
Let me know if there is something missing here.
Speaking as the docs you're reading do is convenient, but it's not really accurate.
Sockets are a general networking API. Their only relation with TCP is, you can set sockets up to use it. You can also set sockets up to talk over any other networking protocol the OS backs; also, you don't necessarily have to use sockets, many OS's still offer other networking APIs, some with substantial niche advantages.
The problem this leaves you with is, the nontechnical language leaves you with an idea of how things are put together but it glosses over implementation details, you can't do any detailed reasoning from the characterizations and analogies in layman's terms.
So ignore the concept you've formed of sockets. Read the actual docs, not tutorials. Write code to see if it works as you think it does. You'll learn that what you have now is a layman's understanding of "a socket", glossing over the differences between sockets you create with socket(), the ones you get from accept(), the ones you can find in Unix's filesystem, and so forth.
Even "connection" is somewhat of a simplification, even for TCP.
To give you an idea just how deep the rabbit hole goes, so is "sharing" -- you can send fd's over some kinds of sockets, and sockets are fd's, after fork() the two processes share the fd namespace, and you can dup() fd's...
A fully-set-up TCP network connection is a {host1:port1, host2:port2} pair with some tracked state at both ends and packets getting sent between those ends that update the state according to the TCP protocol i.e. rules. You can bind() a socket to a local TCP address, and connect() through that socket to remote (or local) addresses one after another, so in that sense connections can share a socket—but if you're running a server, accept()ed connections get their own dedicated socket, it's how you identify where the data you read() is coming from.
One of the common conflations is between the host:port pair a socket can be bound to and the socket itself. You wind up with one OS socket listening for new connections plus one per connection over a connection-based protocol like TCP, but they can all use the same host:port, it's easy to gloss over the reality and think of that as "a socket", and it looks like the book you're reading fell into that usage.
I understand that, SOCK_DGRAM and SOCK_STREAM corresponds to Connection-less and connection oriented network communication done using INET Address family.
Now i am trying to learn AF_UNIX sockets to carry out IPC between processes running on same host, and there i see we need to specify the sub_socket_type as SOCK_DGRAM Or SOCK_STREAM. I am not able to understand for AF_UNIX sockets, what is the purpose of specifying the sub socket type.
Can anyone pls help understand the significance of SOCK_DGRAM and SOCK_STREAM in the context of AF_UNIX sockets ?
It happens that TCP is both a stream protocol, and connection oriented, whereas UDP is a datagram protocol, and connectionless. However it is possible to have a connection-oriented datagram protocol. That is what a block special file (or a Windows Mailslot) are.
(You can't have a connectionless stream protocol though, it doesn't make sense, unless /dev/null counts)
The flag SOCK_DGRAM does not mean the socket is connectionless, it means that the socket is datagram oriented.
A stream-oriented socket (and a character special file like /dev/random or /dev/null) provides (or consumes, or both) a continuous sequence of bytes, with no inherent structure. Structure is provided by interpreting the contents of the stream. Generally speaking there is only one process on either end of the stream.
A datagram-oriented socket, provides (or consumes or both) short messages which are limited in size and self-contained. Generally speaking, the server can receive datagrams from multiple clients using recvfrom (which provides the caller with an address to send replies to) and replies to them with sendto specifying that address.
The question also confused me for a while, but as Ben said, with socket type is SOCK_STREAM OR SOCK_DGRAM ,they all means the same way to access inter-process communication between client and server. Under domain AF_UNIX ,it makes not one jot of difference.
I've been learning about Linux socket programming recently, mostly from this site.
The site says that using the domain/type combination PF_LOCAL/SOCK_DGRAM...
Provides datagram services within the local host. Note that this
service is connectionless, but reliable, with the possible exception
that packets might be lost if kernel buffers should become exhausted.
My question, then, is why does socketpair(int domain, int type, int protocol, int sv[2]) allow this combination, when according to its man page...
The socketpair() call creates an unnamed pair of connected sockets in
the specified domain, of the specified type...
Isn't there a contradiction here?
I thought SOCK_DGRAM in the PF_LOCAL and PF_INET domains implied UDP, which is a connectionless protocol, so I can't reconcile the seeming conflict with socketpair()'s claim to create connected sockets.
Datagram sockets have "pseudo-connections". The protocol doesn't really have connections, but you can still call connect(). This associates a remote address and port with the socket, and then it only receives packets that come from that source, rather than all packets whose destination is the address/port that the socket is bound to, and you can use send() rather than sendto() to send back to this remote address.
An example where this might be used is the TFTP protocol. The server initially listens for incoming requests on the well-known port. Once a transfer has started, a different port is used, and the sender and receiver can use connect() to associate a socket with that pair of ports. Then they can simply send and receive on that new socket to participate in the transfer.
Similarly, if you use socketpair() with datagram sockets, it creates a pseudo-connection between the two sockets.
When receiving a packet on an unconnected UDP socket bound to 0.0.0.0/INADDR_ANY, how can I determine what the local IP it was sent to?
Can I determine what interface it was received on?
Can this be also be done for connection-oriented sockets such as TCP?
Update0
Platform is Linux, so language is irrelevant but C is native.
UDP sockets are bound to INADDR_ANY host, so getsockname() returns 0.0.0.0.
Hmmm, have a look at this. So looks like there is probably a socket option, at least in the Linux/Unix world. What OSes does it need to work on?
I've had to deal with the same issue on Windows platforms. My solution was to explicitly listen on all available interfaces as that way getsockname() works as expected.
I'm trying to figure out a protocol to use with domain sockets and can't find information on how blindly the domain sockets can be trusted.
Can data be lost? Are messages always received in the same order as sent? Even when using datagram sockets?
Are transfers atomic? When reading the socket, can I trust that I get the whole message on one read or do I have to check it myself?
From 'man AF_UNIX':
Valid types are: SOCK_STREAM, for a stream-oriented socket and
SOCK_DGRAM, for a datagram-oriented socket that preserves message
boundaries (as on most Unix implementations, Unix domain datagram sockets
are always reliable and don’t reorder datagrams);