AF_UNIX socket: can I pass socket handle between processes? - linux

Let's say I create a socketpair() and pass the handle of one of the sockets to a spawned process (popen). Will that process be able to communicate back with the parent?
The examples I saw are applied using fork() which is out of scope for my current project.
Updated: I tried a simple test:
Client: create a socketpair(), keeping sockets[0]
From the client, use posix_spawn with sockets[1] passed as a command-line argument
Client: write to the socket ... the client exits without any warning...
It would appear that there is a problem with this method.
UPDATED: I also found this note:
Pipes and socketpairs are limited to communication between processes with a common ancestor.

The man page for execve states:
File descriptors open in the calling process image remain open in the new
process image, except for those for which the close-on-exec flag is set
(see close(2) and fcntl(2)). Descriptors that remain open are unaffected
by execve().
Since functions like popen are based on execve, the file descriptors that you got from your socketpair() call should be good across both processes, and I don't see why you can't pass the descriptor in whatever manner pleases you. I'm assuming that in this case you mean to convert it to a string and send it over STDIN to the sub-process, which would convert it back to an int to use as a file descriptor.
It would certainly be worth writing some trial code for.
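For example, here is a minimal parent-side sketch along those lines. It assumes a hypothetical child program ./child that takes the descriptor number as its first argument; the names are placeholders, not a definitive implementation:

#include <spawn.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

int main(void) {
    int sockets[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sockets) == -1) {
        perror("socketpair");
        return 1;
    }

    /* Pass the child's descriptor number as a plain string argument. */
    char fd_str[16];
    snprintf(fd_str, sizeof(fd_str), "%d", sockets[1]);

    char *argv[] = { "./child", fd_str, NULL };   /* "./child" is a placeholder */
    pid_t pid;
    int err = posix_spawn(&pid, "./child", NULL, NULL, argv, environ);
    if (err != 0) {
        fprintf(stderr, "posix_spawn: %s\n", strerror(err));
        return 1;
    }

    close(sockets[1]);                 /* parent keeps sockets[0] only */
    write(sockets[0], "hello child\n", 12);

    char buf[64];
    ssize_t n = read(sockets[0], buf, sizeof(buf) - 1);
    if (n > 0) {
        buf[n] = '\0';
        printf("parent got: %s", buf);
    }
    waitpid(pid, NULL, 0);
    return 0;
}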

Yes, you can pass it to the child process. The trick is really that socketpair() gives you a pair of connected sockets - make sure that the child keeps one and the parent keeps the other (the parent should close the child's and vice versa).
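On the child side, the spawned process just converts the argument back to an int and uses it directly. A sketch under the same assumptions as above:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s <fd>\n", argv[0]);
        return 1;
    }

    /* The inherited descriptor is still open; only its number was passed as text. */
    int fd = atoi(argv[1]);

    char buf[64];
    ssize_t n = read(fd, buf, sizeof(buf) - 1);
    if (n > 0) {
        buf[n] = '\0';
        printf("child got: %s", buf);
    }

    write(fd, "hello parent\n", 13);
    close(fd);
    return 0;
}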
Most programs use a pair of pipes instead, though.

Related

Can I override a system function before calling fork?

I'd like to be able to intercept filenames with a certain prefix from any of my child processes that I launch. This would be names like "pipe://pipe_name". I think wrapping the open() system call would be a good way to do this for my application, but I'd like to do it without having to compile a separate shared library and hook it with the LD_PRELOAD trick (or use FUSE and have to have a mounted directory).
I'll be forking the processes myself, is there a way to redirect open() to my own function before forking and have it persist in the child after an exec()?
Edit: The thought behind this is that I want to implement multi-reader pipes by having an intermediate process tee() the data from one pipe into all the others. I'd like this to be transparent to my child processes, so that they can take a filename and open() it, and, if it's a pipe, I'll return the file descriptor for it, while if it's a normal file, I'll just pass that to the regular open() function. Any alternative way to do this that makes it transparent to the child processes would be interesting to hear. I'd like to not have to compile a separate library that has to be pre-linked, though.
I believe the answer here is no, it's not possible. To my knowledge, there are only three ways to achieve this:
LD_PRELOAD trick: compile a .so that's pre-loaded to override the system call (a minimal sketch of this approach follows below)
Implement a FUSE filesystem, pass a path inside it to the client program, and intercept the calls there
Use ptrace to intercept system calls and fix them up as needed
Options 2 and 3 will be very slow, as they'll have to intercept every I/O call and switch back to user space to handle it. Probably 500% slower than normal. Option 1 requires building and maintaining an external shared library to link in.
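For reference, a minimal sketch of option 1: an LD_PRELOAD wrapper around open(). The pipe:// handling is left as a stub, since it depends entirely on your application; treat this as an outline rather than production code.

/* openwrap.c - build with: gcc -shared -fPIC -o libopenwrap.so openwrap.c -ldl
   run the child with:      LD_PRELOAD=./libopenwrap.so ./child_program       */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <string.h>

int open(const char *path, int flags, ...) {
    /* Look up the real open() the first time we are called. */
    static int (*real_open)(const char *, int, ...);
    if (!real_open)
        real_open = dlsym(RTLD_NEXT, "open");

    /* open() takes a mode argument only when O_CREAT is given. */
    mode_t mode = 0;
    if (flags & O_CREAT) {
        va_list ap;
        va_start(ap, flags);
        mode = (mode_t)va_arg(ap, unsigned int);
        va_end(ap);
    }

    if (strncmp(path, "pipe://", 7) == 0) {
        /* Application-specific handling would go here:
           return a descriptor for the named pipe instead. */
    }

    return real_open(path, flags, mode);
}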
For my application, where I'm just wanting to be able to pass in a path to a pipe, I realized I can open both ends in my own process (via pipe()), then grab the path to the read end in /proc/<pid>/fd and pass that path to the client program, which I think gives me everything I need.
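A rough sketch of that /proc-based idea; how the path is handed to the child is up to you (here it is just printed):

#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fds[2];
    if (pipe(fds) == -1) {
        perror("pipe");
        return 1;
    }

    /* On Linux, /proc/<pid>/fd/<n> can be open()ed like a file name and
       yields a new descriptor for the same pipe. The parent must keep
       fds[0] open until the child has opened the path. */
    char path[64];
    snprintf(path, sizeof(path), "/proc/%d/fd/%d", (int)getpid(), fds[0]);
    printf("path to hand to the child: %s\n", path);

    /* Parent writes into fds[1]; whoever open()s the path can read the data. */
    write(fds[1], "some data\n", 10);
    return 0;
}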

Any way to give a file descriptor to a child process, without closing it?

The Stdio type implements FromRawFd, which lets me build one out of any file descriptor. (In my case, I want to use pipes.) That's exactly what I need, but my problem is that the stdin()/stdout()/stderr() methods take their Stdio argument by value. That means that when the Command object goes out of scope, all its fd's get closed. Is there any way to give an fd to a child process sort of by reference, so that it's still available in the parent process after the child is done? Right now I've settled for just calling libc::dup() for each child, which doesn't seem great.
There is currently no better solution, alas. However, the correct solution would be a Command::into_io(self) -> (Option<Stdio>, Option<Stdio>, Option<Stdio>) method that deconstructs the Command to return stdin, stdout and stderr, if available.
I've filed an issue to add that function.

How to send huge amounts of data from child process to parent process in a non-blocking way in Node.js?

I'm trying to send a huge json string from a child process to the parent process. My initial approach was the following:
child:
process.stdout.write(myHugeJsonString);
parent:
child.stdout.on('data', function(data) { ...
But now I read that process.stdout is blocking:
process.stderr and process.stdout are unlike other streams in Node in
that writes to them are usually blocking.
They are blocking in the case that they refer to regular files or TTY file descriptors.
In the case they refer to pipes:
They are blocking in Linux/Unix.
They are non-blocking like other streams in Windows.
The documentation for child_process.spawn says I can create a pipe between the child process and the parent process using the pipe option. But isn't piping my stdout blocking in Linux/Unix (according to cited docs above)?
Ok, what about the Stream object option? Hmmmm, it seems I can share a readable or writable stream that refers to a socket with the child process. Would this be non-blocking? How would I implement that?
So the question stands: How do I send huge amounts of data from a child process to the parent process in a non-blocking way in Node.js? A cross-platform solution would be really neat, examples with explanation very appreciated.
One neat trick I used on *nix for this is FIFO pipes (http://linux.about.com/library/cmd/blcmdl4_fifo.htm). They allow the child to write to a file-like thing and the parent to read from the same. The file is not really on the fs, so you don't get any I/O problems; all access is handled by the kernel itself. But... if you want it cross-platform, that won't work. There's no such thing on Windows (as far as I know).
Just note that the pipe has a fixed size, and if what you write to it (from the child) is not read by something else (from the parent), then the child will block when the pipe is full. This does not block the node processes; they see the pipe as a normal file stream.
I had a similar problem and I think I have a good solution: setting up a pipe when spawning the child process and using the resulting file descriptor to duplex data to the client's end.
How to transfer/stream big data from/to child processes in node.js without using the blocking stdio?
Apparently you can use fs to stream to/from file descriptors:
How to stream to/from a file descriptor in node?
The documentation for child_process.spawn says I can create a pipe between the child process and the parent process using the pipe option. But isn't piping my stdout blocking in Linux/Unix (according to cited docs above)?
No. The docs above say stdout/stderr, and in no way do they say "all pipes".
It won't matter that stdout/stderr are blocking. In order for a pipe to block, it needs to fill up, which takes a lot of data. In order to fill up, the reader at the other end has to be reading slower than you are writing. But... you are the other end, you wrote the parent process. So, as long as your parent process is functioning, it should be reading from the pipes.
Generally, blocking of the child is a good thing. If it's producing data faster than the parent can handle, there are ultimately only two possibilities:
1. it blocks, so stops producing data until the parent catches up
2. it produces more data than the parent can consume, and buffers that data in local memory until it hits the v8 memory limit, and the process aborts
You can use stdout to send your json, if you want 1)
You can use a new 'pipe' to send your json, if you want 2)

cross-process locking in linux

I am looking to make an application in Linux where only one instance of the application can run at a time. I want to make it robust, such that if an instance of the app crashes, it won't block all the other instances indefinitely. I would really appreciate some example code on how to do this (as there's lots of discussion on this topic on the web, but I couldn't find anything which worked when I tried it).
You can use the file locking facilities that Linux provides. You haven't specified the language, but you will likely find this capability pretty much everywhere in some form or another.
Here is a simple idea of how to do that in a C program. When the program starts, take an exclusive non-blocking lock on the whole file using the fcntl system call. When another instance of the application tries to start, it will get an error trying to lock the file, which means the application is already running.
Here is a small example of how to take a full-file lock using fcntl (the call provides byte-range locks, but when the length is 0 the whole file is locked).
/* fd must refer to an already-open file, e.g.
   int fd = open("/var/run/myapp.lock", O_CREAT | O_RDWR, 0600); */
struct flock lock_struct;
memset(&lock_struct, 0, sizeof(lock_struct));  /* l_start = 0, l_len = 0: whole file */
lock_struct.l_type = F_WRLCK;                  /* exclusive (write) lock */
lock_struct.l_whence = SEEK_SET;
lock_struct.l_pid = getpid();                  /* informational; ignored by F_SETLK */
int ret = fcntl(fd, F_SETLK, &lock_struct);    /* non-blocking: fails if already held */
Please note that you need to open a file first to put a lock on it. This means you need to have a file around to use for locking. It might be useful to put it somewhere where it won't cause any distraction/confusion for other applications.
When the process terminates, all locks that it has taken will be released, so nothing will be blocked.
This is just one of the ideas. I'm pretty sure there are other ways around.
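To make this concrete, here is a minimal self-contained sketch; the lock file path /var/run/myapp.lock is just a placeholder, pick whatever location suits your application:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    int fd = open("/var/run/myapp.lock", O_CREAT | O_RDWR, 0600);
    if (fd == -1) {
        perror("open");
        return 1;
    }

    struct flock fl;
    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;      /* exclusive lock */
    fl.l_whence = SEEK_SET;   /* l_start = 0, l_len = 0: whole file */

    if (fcntl(fd, F_SETLK, &fl) == -1) {
        fprintf(stderr, "another instance is already running\n");
        return 1;
    }

    /* ... application runs here; the lock is released automatically
       when the process exits, even if it crashes. */
    pause();
    return 0;
}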
The conventional UNIX way of doing this is with PID files.
Before a process starts, it checks to see if a pre-determined file, usually /var/run/<process_name>.pid, exists. If found, it's an indication that a process is already running, and this process quits.
If the file does not exist, this is the first process to run. It creates the file /var/run/<process_name>.pid and writes its PID into it. The process unlinks the file on exit.
Update:
To handle cases where a daemon has crashed and left behind the pid file, additional checks can be made during startup if a pid file was found:
Do a ps and ensure that a process with that PID doesn't exist
If it exists, ensure that it's a different process, either from the said ps output or from /proc/$PID/stat (a sketch of this check follows below)
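A rough sketch of that stale-pid-file check using /proc; the pid file path and the expected process name are placeholders you would supply:

#include <stdio.h>
#include <string.h>

/* Returns 1 if the PID recorded in the pid file refers to a live process
   whose name matches expected_name, 0 otherwise (stale or missing file). */
static int pidfile_is_live(const char *pidfile, const char *expected_name) {
    FILE *f = fopen(pidfile, "r");
    if (!f)
        return 0;                      /* no pid file: nothing running */

    int pid = 0;
    if (fscanf(f, "%d", &pid) != 1 || pid <= 0) {
        fclose(f);
        return 0;                      /* unreadable/garbage pid file */
    }
    fclose(f);

    /* /proc/<pid>/comm holds the short process name on Linux. */
    char path[64], name[64] = "";
    snprintf(path, sizeof(path), "/proc/%d/comm", pid);
    f = fopen(path, "r");
    if (!f)
        return 0;                      /* process no longer exists: stale file */
    if (fgets(name, sizeof(name), f))
        name[strcspn(name, "\n")] = '\0';
    fclose(f);

    return strcmp(name, expected_name) == 0;
}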

Why isn't close_on_exec the default configuration?

Since there seems to be no way to use an already opened fd after exec,
why isn't this flag the default?
File descriptors can be used past an exec call; that's how the Unix utilities get their standard input/output/error fds from the shell, for example.
Close-on-exec is not the default because the POSIX standard (and Unix tradition) mandates the opposite behavior:
File descriptors open in the calling process image shall remain open in the new process image, except for those whose close-on- exec flag FD_CLOEXEC is set.
Because on UNIX one of the most used features is/was piping streams between processes, and you cannot do that if the CLOEXEC flag is set (the child process cannot inherit the file descriptor(s), e.g. STDOUT_FILENO).
And no, it is not true that you cannot use inherited file descriptors after exec (example: the standard streams). You can also use any inherited file descriptor as long as you know its value (it is an integer). This value is often passed to a child process as an argument (quite a few UNIX programs do that), or you can pass it any other way using the IPC (Inter-Process Communication) mechanism of your choice.
I wouldn't mind getting a more complete answer to this, but it's quite easy to guess it's for backward compatibility. The close-on-exec flag had to be introduced sometime. Code existing from before that time didn't know about it, and wouldn't work correctly unless changed. Therefore it's off by default.
Unfortunately, bugs happen because of this: a daemon process forking a CGI might leave the listening socket open, and if the CGI doesn't quit or close it, the daemon cannot be restarted. So I agree with you that it's not really a good default.
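For completeness, opting in is a single fcntl() call per descriptor (or the SOCK_CLOEXEC flag at socket creation time). A small sketch for a listening socket, with placeholder values:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/socket.h>

int main(void) {
    /* Either request close-on-exec at creation time ... */
    int listen_fd = socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, 0);
    if (listen_fd == -1) {
        perror("socket");
        return 1;
    }

    /* ... or set the flag afterwards on an existing descriptor. */
    int flags = fcntl(listen_fd, F_GETFD);
    if (flags == -1 || fcntl(listen_fd, F_SETFD, flags | FD_CLOEXEC) == -1) {
        perror("fcntl");
        return 1;
    }

    /* Any process exec()ed from here on will not inherit listen_fd. */
    return 0;
}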
