In Linux, is there any example of using AF_LOCAL (Unix domain sockets) to communicate between processes (IPC) without using a file? (on a read-only filesystem)
I must use a Unix Domain socket, but I don't have file create/write access on the system.
Thank you in advance.
You can create a unix domain socket with an "abstract socket address". Simply make the first character of the sun_path string in the sockaddr_un you pass to bind be '\0'. After this initial NUL, write a string to the remainder of sun_path and pad it out to UNIX_PATH_MAX with NULs (or anything else).
Sockets created this way will not have any filesystem entry, but instead will be placed into an invisible system-wide socket namespace. The socket name is not a null-terminated string; it's a UNIX_PATH_MAX length string starting with a NUL, and any other NULs have no special significance. So it's vitally important to pad out that name, or you'll put extra uninitialized memory garbage into that name, with unexpected results. By convention, this is generally done with NUL pads, but it's up to you.
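For concreteness, here's a minimal sketch of binding to an abstract address; the name "example_abstract_socket" is made up for illustration and error handling is kept short:

/* Minimal sketch: bind a Unix domain socket to an abstract address.
 * No filesystem entry is created, so this works on a read-only filesystem. */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_un addr;
    memset(&addr, 0, sizeof addr);        /* zero-fill pads sun_path with NULs */
    addr.sun_family = AF_UNIX;
    /* sun_path[0] stays '\0', which selects the abstract namespace;
       the name follows after that initial NUL */
    strncpy(addr.sun_path + 1, "example_abstract_socket",
            sizeof(addr.sun_path) - 2);

    /* passing the full structure size means the whole NUL-padded name is used */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        perror("bind");
        return 1;
    }
    if (listen(fd, 5) < 0) { perror("listen"); return 1; }

    /* ... accept() connections here; no file was created anywhere ... */
    close(fd);
    return 0;
}

A client connects the same way: fill in an identically padded sockaddr_un and pass it to connect() instead of bind().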
For more information, consult unix(7), and specifically the part on abstract socket addresses. A fully worked example can also be found here.
Is a Scheme input port the same thing as a C FILE* or a Python file? Is that the same thing as the Unix file descriptor concept? If not, how does an input port differ from the others (and why is it called that way and not just a 'file')?
It's approximately, but only approximately equivalent: ports are objects to and from which you can write or read other objects: bytes or characters (usually).
But the approximation is not terribly close. Ports from which you can read or write characters rather than bytes are, well, ports which handle characters. That means that before they can, for instance, write some octets down on the TCP connection underlying the port, they have to translate those characters into octets using (with luck) some standard encoding. I don't think the mechanism for controlling this encoding & decoding is specified in Scheme, but it must exist. So a port is, at least sometimes, a complicated thing.
(As for why they're called 'ports': well, C made a choice to call communication endpoints 'file descriptors' which makes sense in the context of the Unix 'everything's a file' idea (calling bytes chars was always just a mistake though). But Scheme doesn't come from a Unix/C background so there's really no reason to do that. Given what we call communication endpoints in TCP/IP, 'port' seems quite a good choice of name.)
Everywhere I read about named pipes vs sockets, I see that a socket can do what a named pipe can, and more. I am guessing that pipes may be faster because they are simpler, but I haven't found this stated explicitly anywhere.
In Linux, is there any advantage in using named pipes instead of sockets?
Thanks for your help and time.
EDIT: I found some comparisons here, but those are UNIX sockets (a.k.a. UNIX domain sockets), which, based on this, I understand are a different creature.
Therefore, I clarify: I am asking about TCP sockets versus Unix named pipes (because MS Windows has another thing called "named pipes", but they seem to be functionally more similar to UNIX domain sockets).
At this point, it looks more likely to me that UNIX named pipes are faster than TCP sockets, but I would like to know some technicalities on why that would be the case, and other potential benefits.
This is likely an imperfect answer, but here's what I know.
Sending data over a TCP socket means that the transmission needs to go through your networking stack: it gets a source and destination IP, is split up into packets (at most 64 KB each), and is wrapped in both a TCP and an IP envelope, potentially passing through firewall rules. The receiver also needs to acknowledge packets and make sure they arrive in order, and there needs to be a retransmission mechanism in place in case packets get lost and need to be re-sent.
Sending data through a named pipe works more or less like writing to a file descriptor such as STDOUT and reading from STDIN.
Going through the network stack (even if it's just localhost) simply has a lot more layers and complexity. The system that can send a message reliably to the other side of the world needs that, but a simple local named pipe does not.
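To show how little is involved on the pipe side, here's a minimal sketch of the writing end of a named pipe; the path /tmp/myNamedPipe is made up, and the reading end could simply be cat /tmp/myNamedPipe in another shell:

/* Minimal sketch: writing to a named pipe is just write() on a file descriptor. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/tmp/myNamedPipe";

    /* create the FIFO; EEXIST just means it is already there */
    if (mkfifo(path, 0600) < 0 && errno != EEXIST) {
        perror("mkfifo");
        return 1;
    }

    int fd = open(path, O_WRONLY);   /* blocks until a reader opens the pipe */
    if (fd < 0) { perror("open"); return 1; }

    const char *msg = "hello over the pipe\n";
    if (write(fd, msg, strlen(msg)) < 0)
        perror("write");

    close(fd);
    return 0;
}

No addressing, no acknowledgements: the kernel simply moves the bytes from the writer to the reader through an in-kernel buffer.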
Everywhere I read about named pipes vs sockets, I see that a socket can do what a named pipe can, and more.
Named pipes are used like files.
This is a large benefit if you want to do something that works with a file but does not work with a socket.
Examples:
ls > /dev/lp1
ls > ./myNamedPipe
# Not possible: ls > 127.0.0.1:9323
dd if=myFile bs=1 count=100 of=/dev/lp1
dd if=myFile bs=1 count=100 of=./myNamedPipe
# Not possible: dd if=myFile bs=1 count=100 of=127.0.0.1:9323
MS Windows has another thing called "named pipes", but they seem to be functionally more similar to UNIX domain sockets
The truth is that MS Windows has additional API functions for accessing named pipes. However, the standard Windows file API (that is used by C functions like open(), read() and write()) also works with named pipes.
For this reason, you can also use a named pipe as a file in Windows. I have already done that to emulate a serial port with a certain device connected.
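As a rough sketch of that Windows side (the pipe name \\.\pipe\myNamedPipe is made up, and some server must already have created it with CreateNamedPipe()), a client can open the pipe through the ordinary file API, just like a regular file:

/* Hedged sketch (Windows): opening an existing named pipe by name and writing
 * to it with the same file API calls used for regular files. */
#include <stdio.h>
#include <string.h>
#include <windows.h>

int main(void)
{
    HANDLE h = CreateFileA("\\\\.\\pipe\\myNamedPipe",
                           GENERIC_READ | GENERIC_WRITE,
                           0, NULL, OPEN_EXISTING, 0, NULL);
    if (h == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "CreateFileA failed: %lu\n", GetLastError());
        return 1;
    }

    const char *msg = "hello over the pipe";
    DWORD written = 0;
    if (!WriteFile(h, msg, (DWORD)strlen(msg), &written, NULL))
        fprintf(stderr, "WriteFile failed: %lu\n", GetLastError());

    CloseHandle(h);
    return 0;
}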
... and other potential benefits.
One obvious benefit (of both named pipes and Unix sockets) is the naming itself:
If some program foobarSimpleProgram wants to communicate to another program named foobarOtherProgram, it can simply create a Unix socket or a named pipe named /tmp/foobarProgramSuite.
It's very unlikely that any other program uses this name.
In the case of a TCP socket listening on localhost, the program must either use a fixed TCP port, with the risk that another program uses the same port, so that only one of the two programs can run at a time.
Or the program binds to TCP port 0 so the OS assigns some free TCP port; the program then writes the port number to a file like /tmp/foobarProgramSuite, which is read by the other program, which then does the connect().
This is more complicated than directly connecting to the pipe or socket with the given name.
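Here's a rough sketch of that port-0 dance for comparison (names are made up; the part that publishes the port number to a file is left as a comment):

/* Sketch of the "bind to port 0" approach: the OS picks a free port, and
 * getsockname() tells us which one, so it can be published somewhere
 * (e.g. written to a well-known file) for the peer to read. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* listen on localhost only */
    addr.sin_port = htons(0);                       /* let the OS choose a port */

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        perror("bind");
        return 1;
    }

    socklen_t len = sizeof addr;
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        perror("getsockname");
        return 1;
    }
    printf("OS assigned port %d\n", ntohs(addr.sin_port));
    /* ... write this port number to an agreed-upon file for the peer ... */

    close(fd);
    return 0;
}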
So, basically the title says it all. I've been porting my Unix socket C code to Windows, and apparently those structures do not have sin_len or sin6_len in Windows.
I'm using a union between sockaddr_storage, sockaddr_in and sockaddr_in6 everywhere, and just using the correct member according to ss_family. It would make sense that the socket library could just deduce the size according to the family, so the length field would indeed be redundant.
If I comment out the code that sets the length field, everything still works on OS X and Linux, but that may be just an illusion, so I decided to ask here.
Is that variable deprecated, somehow? Can I safely stop using it, and rely on the socket implementation to use the family variable?
The sin_len field is not required by the POSIX specification.
Here's the relevant information from Unix Network Programming, Volume 1: The Sockets Networking API, 3rd Edition:
3.2 Socket Address Structures
The length member, sin_len, was added with 4.3BSD-Reno, when support for the OSI protocols was added (Figure 1.15). Before this release, the first member was sin_family, which was historically an unsigned short. Not all vendors support a length field for socket address structures and the POSIX specification does not require this member.
Further, Stevens provides the motivation behind the field:
Having a length field simplifies the handling of variable-length socket address structures.
Even if the length field is present, we need never set it and need never examine it, unless we are dealing with routing sockets (Chapter 18). It is used within the kernel by the routines that deal with socket address structures from various protocol families (e.g., the routing table code).
The four socket functions that pass a socket address structure from the process to the kernel, bind, connect, sendto, and sendmsg, all go through the sockargs function in a Berkeley-derived implementation (p. 452 of TCPv2). This function copies the socket address structure from the process and explicitly sets its sin_len member to the size of the structure that was passed as an argument to these four functions. The five socket functions that pass a socket address structure from the kernel to the process, accept, recvfrom, recvmsg, getpeername, and getsockname, all set the sin_len member before returning to the process.
Unfortunately, there is normally no simple compile-time test to determine whether an implementation defines a length field for its socket address structures...We will see in Figure 3.4 that IPv6 implementations are required to define SIN6_LEN if the socket address structures have a length field. Some IPv4 implementations provide the length field of the socket address structure to the application based on a compile-time option (e.g., _SOCKADDR_LEN).
You'll want to evaluate the usage of sin_len in your code. If it's just initializing it to 0, you can remove that code. If you're reading the value from the result of accept, recvfrom, recvmsg, getpeername, or getsockname, you'll unfortunately need some platform-specific compile switches, or you can switch to using a separate address-length variable.
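If you fill in the structures yourself, one way to handle the difference is to key off the SIN6_LEN macro mentioned in the quote above. A sketch (the helper name is made up):

/* Sketch: set sin6_len only where the field exists (such platforms are required
 * to define SIN6_LEN), and carry the address length in a separate variable. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

socklen_t fill_loopback_v6(struct sockaddr_in6 *sa, unsigned short port)
{
    memset(sa, 0, sizeof *sa);
    sa->sin6_family = AF_INET6;
    sa->sin6_port   = htons(port);
    sa->sin6_addr   = in6addr_loopback;
#ifdef SIN6_LEN
    sa->sin6_len = sizeof *sa;   /* field only exists on BSD-derived systems */
#endif
    return sizeof *sa;           /* pass this as the addrlen to bind()/connect() */
}

int main(void)
{
    int fd = socket(AF_INET6, SOCK_STREAM, 0);
    struct sockaddr_in6 sa;
    socklen_t len = fill_loopback_v6(&sa, 0);    /* port 0: OS picks a port */
    if (fd < 0 || bind(fd, (struct sockaddr *)&sa, len) < 0)
        perror("socket/bind");
    return 0;
}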
Is there an equivalent in Linux for the mbuf (message buffer) data structures that hold the actual packet data to be transmitted over networks? I assumed this was a generic UNIX structure, but apparently it's unique to FreeBSD.
There's sk_buff; I don't know enough to say how similar it is to mbuf in practice: Networking: sk_buff.
So it turns out that the mbuf (message buffer) and pbuf (packet buffer) structures are part of the FreeBSD network stack. The sk_buff (socket buffer) is the Linux equivalent of mbuf and contains all the information about the message data being transmitted as well as the packet structure.
read(2) and write(2) work on socket descriptors as well as on file descriptors. In the case of a file descriptor, the lookup goes from the user file descriptor table to the file table and finally to the inode table, where the file type (regular file/char/block) is checked and the read is done accordingly. In the case of a character special file, the function pointers are obtained from the character device switch based on the major number of the file, and the appropriate read/write routines registered for the device are called.
Similarly, the appropriate read/write routine for a block special file is called by getting the function pointers from the block device switch.
Could you please let me know what exactly happens when read/write is called on a socket descriptor? And if read/write works on socket descriptors, why can't we use open instead of socket to get the descriptor?
As far as I know, the file descriptor in memory carries enough information to identify the file-system type of that fd, and the kernel invokes the corresponding handler function depending on that type. You can see the source read_write.c in the Linux kernel.
In brief, the kernel does the following:
In fs/read_write.c, a wrapper calls the handler that corresponds to the fd's file type (ext2/ext3/socket/...).
In net/socket.c, a wrapper calls the handler that corresponds to the socket's address family (IPv4, IPv6, ATM, others).
For IPv4 sockets, a further wrapper calls the handler for the protocol in use (UDP/TCP).
For TCP over IPv4, that handler is tcp_sendmsg (in net/ipv4/tcp.c), which is what ends up being called when you write to an fd of TCP IPv4 type.
Hope this is clear.
Thanks,
Houcheng
Socket descriptors are associated with file structures too, but the set of file_operations functions for those structures differs from the usual one. Initialization and use of those descriptors are therefore different; the read and write parts of the kernel-level interface just happen to be equivalent.
read and write are valid for some types of sockets in some states; this all depends on the various structs which are passed around inside the kernel.
In principle, open() could create a socket descriptor, but the BSD sockets API was never defined that way.
There are some other (somewhat Linux-specific) types of file descriptors which are created by system calls other than open(), for example epoll_create or timerfd_create. These work the same way.
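As a small illustration of the first point above, here's a sketch that uses socketpair() to get two connected socket descriptors (no open() and no pathname involved) and then uses plain read() and write() on them:

/* Minimal sketch: read()/write() on descriptors returned by socketpair(). */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0) {
        perror("socketpair");
        return 1;
    }

    const char *msg = "hello";
    /* write() on a connected stream socket behaves like send(fd, buf, len, 0) */
    if (write(fds[0], msg, strlen(msg)) < 0)
        perror("write");

    char buf[64];
    /* read() on a connected stream socket behaves like recv(fd, buf, len, 0) */
    ssize_t n = read(fds[1], buf, sizeof buf - 1);
    if (n > 0) {
        buf[n] = '\0';
        printf("received: %s\n", buf);
    }

    close(fds[0]);
    close(fds[1]);
    return 0;
}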