what is the difference between socket and sock? - linux

I found "socket: /tmp/mysql.sock" in some config files, so what is the difference between socket and sock?

sock is an abbreviation of socket (also a sock is something that is worn on the feet).

'Socket' is the technical term for a handle that refers to a network endpoint. It originated in the Unix API for networks, and has since leaked over to Windows.
A socket can refer to a TCP connection, a UDP packet endpoint, an X.25 connection, or ... a Unix domain socket, which is a mutant named pipe.
People tend to name Unix domain sockets with 'sock' in the name, but there's no requirement.

Sock is just an abbreviation for socket. But if you see a socket as a file it's a Unix socket as opposed to a TCP socket.
This link has some information on the differences between the two.
http://lists.freebsd.org/pipermail/freebsd-performance/2005-February/001143.html
Basically, a TCP socket communicates over the network, while a Unix socket is purely local: it connects processes on the same machine through the filesystem, bypassing the network stack entirely (it doesn't use 127.0.0.1 or any IP address at all).
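To make the file-vs-network distinction concrete, here is a minimal Python sketch (the socket path below is a throwaway example, not a real MySQL socket):

```python
import os
import socket
import stat
import tempfile

# A Unix domain socket is named by a filesystem path; bind() creates the file.
path = os.path.join(tempfile.mkdtemp(), "demo.sock")
u = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
u.bind(path)
print(stat.S_ISSOCK(os.stat(path).st_mode))  # True -- `ls -l` shows type 's'

# A TCP socket is named by an address:port pair; nothing appears on disk.
t = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
t.bind(("127.0.0.1", 0))  # port 0 = let the kernel pick a free port
print(t.getsockname())    # e.g. ('127.0.0.1', 54321)
```

Note that the socket file left behind by bind() is not removed automatically; servers typically unlink() it on shutdown.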

Related

Node.js - How to check if Unix Domain Socket exists in Node.js?

I am writing a Docker client in Node.js. It must connect to the Unix domain socket /var/run/docker.sock if it exists, or fall back to a TCP socket, typically tcp://localhost:2375 (when the client is running on Windows).
The question is: how can I verify the presence of the domain socket before falling back to the TCP socket?
I can probably check for the presence of the TCP socket as mentioned in this answer, or attempt to connect to the domain socket. But that does not look clean.

TCP's socket vs Linux's TCP socket

Linux API and TCP protocol both have concepts called "socket". Are they the same concept, and does Linux's TCP socket implement TCP's socket concept?
Relation between connections and sockets:
I have heard that two connections can't share a Linux TCP socket. Is that true?
Tanenbaum's Computer Networks (5th ed., 2011, Section 6.5.2 The TCP Service Model, p. 553) says:
A socket may be used for multiple connections at the same time. In other words, two or more connections may terminate at the same socket. Connections are identified by the socket identifiers at both ends.
Since the quote says two connections can share a "socket", does the book use a different "socket" concept from Linux's TCP socket? Does the book use TCP's socket concept?
Relation between processes and sockets:
I have also heard that two processes can share a Linux TCP socket. But if two processes can share a socket, can't each process create its own connection on that socket at will, putting two connections on the same Linux TCP socket? Doesn't that contradict 1, where two connections can't share a Linux TCP socket?
Can two processes share a TCP's socket?
The book references a more abstract concept of a socket, one that is not tied to a particular OS or even a network/transport protocol. In the book, a socket is simply a uniquely defined connection endpoint. A connection is thus a pair (S1, S2) of sockets, and this pair should be unique in some undefined context. An example specific to TCP using my connection right now would have an abstract socket consisting of an interface IP address and a TCP port number. There are many, many connections between stackoverflow users like myself and the abstract socket [443, 151.101.193.69] but only a single connection from my machine [27165, 192.168.1.231] to [443, 151.101.193.69], which is a fake example using a non-routable IP address so as to protect my privacy.
If we get even more concrete and assume that stackoverflow and my computer are both running Linux, then we can talk about the socket as defined by man 2 socket, and the Linux API that uses it. Here a socket can be created in listening mode, and this is typically called a server. This socket can be shared (in the sense of shared memory or state) amongst multiple processes. However, when a peer connects to this listening socket, a new socket is created (as a result of the accept() call). The original listening socket may again be used to accept() another connection. I believe that if multiple processes are blocked in the accept() system call, exactly one of them is unblocked and returns with the newly created connected socket.
Let me know if there is something missing here.
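The listening-socket-vs-connected-socket behavior described above can be demonstrated in a few lines of Python; this is a sketch on loopback with kernel-chosen ports, not server code you would deploy:

```python
import socket

# The listening socket hands out a brand-new socket per accepted peer.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))   # port 0: let the kernel pick a free port
srv.listen(5)
addr = srv.getsockname()

c1 = socket.create_connection(addr)
conn1, _ = srv.accept()      # new socket dedicated to this connection
c2 = socket.create_connection(addr)
conn2, _ = srv.accept()      # the listener is reused; another new socket

print(conn1 is srv, conn1 is conn2)                # False False
print(conn1.getsockname() == conn2.getsockname())  # True: same local host:port
```

The last line also shows why counting "sockets" is subtle: three distinct kernel sockets (the listener and two accepted ones) all carry the same local host:port.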
Speaking as the docs you're reading do is convenient, but it's not really accurate.
Sockets are a general networking API. Their only relation to TCP is that you can set sockets up to use it. You can also set sockets up to talk over any other networking protocol the OS supports; and you don't even have to use sockets, since many OSes still offer other networking APIs, some with substantial niche advantages.
The problem this leaves you with is that nontechnical language gives you an idea of how things fit together while glossing over implementation details; you can't do any detailed reasoning from characterizations and analogies in layman's terms.
So ignore the concept you've formed of sockets. Read the actual docs, not tutorials. Write code to see if it works as you think it does. You'll learn that what you have now is a layman's understanding of "a socket", glossing over the differences between sockets you create with socket(), the ones you get from accept(), the ones you can find in Unix's filesystem, and so forth.
Even "connection" is somewhat of a simplification, even for TCP.
To give you an idea just how deep the rabbit hole goes, so is "sharing" -- you can send fd's over some kinds of sockets, and sockets are fd's, after fork() the two processes share the fd namespace, and you can dup() fd's...
A fully-set-up TCP network connection is a {host1:port1, host2:port2} pair with some tracked state at both ends and packets getting sent between those ends that update the state according to the TCP protocol i.e. rules. You can bind() a socket to a local TCP address, and connect() through that socket to remote (or local) addresses one after another, so in that sense connections can share a socket—but if you're running a server, accept()ed connections get their own dedicated socket, it's how you identify where the data you read() is coming from.
One of the common conflations is between the host:port pair a socket can be bound to and the socket itself. You wind up with one OS socket listening for new connections plus one per connection over a connection-based protocol like TCP, but they can all use the same host:port, it's easy to gloss over the reality and think of that as "a socket", and it looks like the book you're reading fell into that usage.

How to multiplex AF_INET sockets to a daemon and get the information about the original port in the application?

I would like to write a daemon that is available on a large number of AF_INET-SOCK_STREAM and AF_INET-SOCK_DGRAM sockets for network debugging purposes.
To avoid excessive resource usage I want to avoid opening a large number of ports on the application layer but try to multiplex the connections per socket type on lower layers.
Knowledge of the original incoming port on the application layer is a requirement.
I have successfully implemented a daemon that listens on an AF_INET SOCK_STREAM socket that is multiplexed by an iptables REDIRECT rule. The original incoming port of the connection can be retrieved by calling getsockopt with SO_ORIGINAL_DST. As I understand this does not work with AF_INET SOCK_DGRAM.
I have also successfully implemented a daemon that listens on an AF_INET SOCK_DGRAM socket that is multiplexed by an iptables TPROXY rule. The original incoming port of the connection can be retrieved by using recvmsg() and consuming the available ancillary message containing information about the connection before multiplexing. As I understand this does not work with AF_INET SOCK_STREAM.
Is there a transport-layer-agnostic way of multiplexing such socket connections and retrieving information about the original incoming port? Possibly even suitable for protocols like SCTP or DCCP?
With TCP you have a connection. The target port for this connection is the same for all packets inside the connection. In this case each connected socket (the result of accept()) corresponds to a single connection, and the incoming port is a property of that socket. It does not matter here whether the listening socket accepts connections on multiple ports; all that matters is the connected socket.
With UDP you don't have a connection. Instead, the same socket is used to receive packets from multiple clients and, in your case, on multiple incoming ports. Source and destination IP and port are thus a property of each packet, not of the socket.
That's why you need different interfaces to retrieve the original incoming port: a socket-based one for TCP and a packet-based one for UDP.
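The packet-based interface can be sketched without any iptables setup. This Linux-only Python example uses IP_PKTINFO on loopback rather than TPROXY's IP_RECVORIGDSTADDR, but the mechanism is the same: recvmsg() delivers the destination address of each individual datagram as ancillary data (the fallback constant 8 is IP_PKTINFO's value on Linux, in case the Python build doesn't expose it):

```python
import socket
import struct

# With IP_PKTINFO enabled, each recvmsg() carries per-packet address info.
IP_PKTINFO = getattr(socket, "IP_PKTINFO", 8)  # 8 on Linux

rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.setsockopt(socket.IPPROTO_IP, IP_PKTINFO, 1)
rx.bind(("127.0.0.1", 0))
port = rx.getsockname()[1]

tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.sendto(b"hello", ("127.0.0.1", port))

data, ancdata, flags, addr = rx.recvmsg(512, socket.CMSG_SPACE(12))
for level, ctype, cdata in ancdata:
    if level == socket.IPPROTO_IP and ctype == IP_PKTINFO:
        # struct in_pktinfo: ifindex, spec_dst, addr (4 + 4 + 4 bytes)
        ifindex, spec_dst, dst = struct.unpack("I4s4s", cdata[:12])
        print(socket.inet_ntoa(dst))  # destination of this one packet
```

For TCP, by contrast, you would just call getsockname()/getsockopt() once on the accepted socket, because the addresses are fixed for the connection's lifetime.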

Why does socketpair() allow SOCK_DGRAM type?

I've been learning about Linux socket programming recently, mostly from this site.
The site says that using the domain/type combination PF_LOCAL/SOCK_DGRAM...
Provides datagram services within the local host. Note that this service is connectionless, but reliable, with the possible exception that packets might be lost if kernel buffers should become exhausted.
My question, then, is why does socketpair(int domain, int type, int protocol, int sv[2]) allow this combination, when according to its man page...
The socketpair() call creates an unnamed pair of connected sockets in the specified domain, of the specified type...
Isn't there a contradiction here?
I thought SOCK_DGRAM in the PF_LOCAL and PF_INET domains implied UDP, which is a connectionless protocol, so I can't reconcile the seeming conflict with socketpair()'s claim to create connected sockets.
Datagram sockets have "pseudo-connections". The protocol doesn't really have connections, but you can still call connect(). This associates a remote address and port with the socket, and then it only receives packets that come from that source, rather than all packets whose destination is the address/port that the socket is bound to, and you can use send() rather than sendto() to send back to this remote address.
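A minimal Python sketch of such a pseudo-connection over UDP (everything here runs on loopback with kernel-chosen ports):

```python
import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
srv.bind(("127.0.0.1", 0))

cli = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
cli.connect(srv.getsockname())  # no handshake: just pins the peer address
cli.send(b"hi")                 # send() now works without a destination

data, peer = srv.recvfrom(16)   # server still sees per-packet source info
srv.sendto(b"ok", peer)
print(cli.recv(16))             # b'ok' -- only that peer's packets arrive
```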
An example where this might be used is the TFTP protocol. The server initially listens for incoming requests on the well-known port. Once a transfer has started, a different port is used, and the sender and receiver can use connect() to associate a socket with that pair of ports. Then they can simply send and receive on that new socket to participate in the transfer.
Similarly, if you use socketpair() with datagram sockets, it creates a pseudo-connection between the two sockets.
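That pseudo-connection is easy to see in Python; socketpair() hands back two datagram sockets that can already talk to each other, with no address or name involved:

```python
import socket

# An unnamed, pre-"connected" pair of local datagram sockets.
a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
a.send(b"ping")    # no destination address needed
print(b.recv(16))  # b'ping'
b.send(b"pong")    # works in both directions
print(a.recv(16))  # b'pong'
```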

Is there only one Unix Domain Socket in the communication between two processes?

There are two kinds of sockets: network sockets and Unix Domain Sockets.
When two processes communicate using network sockets, each process creates its own socket, and the two communicate over a connection between those sockets. There are two sockets, each belonging to a different process and serving as that process's endpoint of the connection.
When two processes communicate using Unix Domain sockets, a Unix Domain socket is addressed by a filename in the filesystem.
Does that imply the two processes communicate by only one Unix Domain socket, instead of two?
Does the Unix Domain socket not belong to any process, i.e. is the Unix domain socket not a connection endpoint of any process, but somehow like a "middle point" between the two processes?
There are two sockets, one on each end of the connection. Each of them, independently, may or may not have a name in the filesystem.
The thing you see when you ls -l that starts with srwx is not really "the socket". It's a name that is bound to a socket (or was bound to a socket in the past - they don't automatically get removed when they're dead).
An analogy: think about TCP sockets. Most of them involve an endpoint with a well-known port number (22 SSH; 25 SMTP; 80 HTTP; etc.) A server creates a socket and binds to the well-known port. A client creates a socket and connects to the well-known port. The client socket also has a port number, which you can see in a packet trace (tcpdump/wireshark), but it's not a fixed number, it's just some number that was automatically chosen by the client's kernel because it wasn't already in use.
In unix domain sockets, the pathname is like the port number. If you want clients to be able to find your server socket, you need to bind it to a well-known name, like /dev/log or /tmp/.X11-unix/X0. But the client doesn't need to have a well-known name, so normally it doesn't do a bind(). Therefore the name /tmp/.X11-unix/X0 is only associated with the server socket. You can confirm this with netstat -x. About half the sockets listed will have pathnames, and the other half won't. Or write your own client/server pair, and call getsockname() on the client. Its name will be empty, while getsockname() on the server gives the pathname.
The ephemeral port number automatically assigned to a TCP client has no counterpart in unix domain socket addresses. In TCP it's necessary to have a local port number so incoming packets can be matched to the correct socket. Unix domain sockets are linked directly in their kernel data structures, so there's no need. A client can be connected to a server and have no name.
And then there's socketpair() which creates 2 unix domain sockets connected to each other, without giving a name to either of them.
(Not mentioned here, and not really interesting: the "abstract" namespace.)
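The named-server/nameless-client asymmetry described above can be checked directly with getsockname(); this Python sketch uses a throwaway path in a temp directory rather than a real well-known name like /tmp/.X11-unix/X0:

```python
import os
import socket
import tempfile

# Only the server binds a pathname; the client never gets a name.
path = os.path.join(tempfile.mkdtemp(), "demo.sock")

srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
srv.bind(path)                  # the filesystem name belongs to this socket
srv.listen(1)

cli = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
cli.connect(path)               # no bind(), so no name is ever assigned
conn, _ = srv.accept()

print(srv.getsockname())        # the pathname
print(repr(cli.getsockname()))  # '' -- the client socket is anonymous
```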
