I have a command cmd and three Vec<u8>: buff1, buff2, and buff3.
I want to execute cmd, using buff1 as stdin, and capturing stdout into buff2 and stderr into buff3.
And I'd like to do all this without explicitly writing any temporary files.
std::process seems to allow all of those things, just not all at the same time.
If I use Command::new(cmd).output() it will return the buffers for stdout and stderr, but there's no way to give it stdin.
If I use Command::new(cmd).stdin(Stdio::piped()).spawn()
then I can child.stdin.as_mut().unwrap().write_all(&buff1)
but I can't capture stdout and stderr.
As far as I can tell, there's no way to call Command::new(cmd).stdout(XXX) to explicitly tell it to capture stdout in a buffer, the way it does by default with .output().
It seems like something like this should be possible:
Command::new(cmd)
    .stdin(buff1)
    .stdout(buff2)
    .stderr(buff3)
    .output()
since Rust can already treat a Vec<u8> as an in-memory reader or writer. But Vec doesn't implement Into<Stdio>, so that code doesn't compile.
Am I missing something? Is there a way to do this, or do I need to read and write with actual files?
If you're ok with using an external library, the subprocess crate supports this use case:
let (buff2, buff3) = subprocess::Exec::cmd(cmd)
    .stdin(buff1)
    .communicate()?
    .read()?;
Doing this with std::process::Command is trickier than it seems, because the OS doesn't make it easy to connect a region of memory to a subprocess's stdin. It's easy to connect a file or anything file-like, but to feed a chunk of memory to a subprocess, you basically have to write() in a loop. And while you can read from a Vec<u8> through std::io::Read (as a &[u8] slice or wrapped in a Cursor), you can't use it to construct an actual File (or anything else that contains a file descriptor/handle).
Feeding data into a subprocess while at the same time reading its output is sometimes referred to as communicating in reference to the Python method introduced in 2004 with the then-new subprocess module of Python 2.4. You can implement it yourself using std::process, but you need to be careful to avoid deadlock in case the command generates output while you are trying to feed it input. (E.g. a naive loop that feeds a chunk of data to the subprocess and then reads its stdout and stderr will be prone to such deadlocks.) The documentation describes a possible approach to implement it safely using just the standard library.
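For illustration, here is a minimal sketch of one such standard-library approach, assuming a cmd and a buff1 like the ones in the question; the helper name run_with_input is made up for this example. The trick is to push the input from a separate thread so the parent is free to drain stdout and stderr at the same time:

use std::io::Write;
use std::process::{Command, Stdio};
use std::thread;

fn run_with_input(cmd: &str, buff1: Vec<u8>) -> std::io::Result<(Vec<u8>, Vec<u8>)> {
    let mut child = Command::new(cmd)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()?;

    // Move the stdin handle into a writer thread; when the closure finishes, the
    // handle is dropped, which closes the pipe and gives the child its EOF.
    let mut stdin = child.stdin.take().expect("stdin was piped");
    let writer = thread::spawn(move || stdin.write_all(&buff1));

    // wait_with_output() drains stdout and stderr to the end and reaps the child.
    let output = child.wait_with_output()?;

    // A write error here usually just means the child stopped reading early
    // (broken pipe); this sketch simply ignores it.
    let _ = writer.join().expect("writer thread panicked");

    Ok((output.stdout, output.stderr))
}

This is roughly the plumbing that the subprocess crate's communicate() wraps up for you, minus extras like timeouts.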
If you want to read and write with buffers, you need to use the piped forms. The reason is that, at least on Unix, input and output to a process are done through file descriptors. Since a buffer cannot intrinsically be turned into a file descriptor, you're required to use a pipe and both read and write incrementally. The fact that Rust provides an abstraction for buffers doesn't let you avoid the fact that the operating system doesn't, and Rust doesn't abstract this away for you.
However, since you'll be using pipes for both reading and writing, you'll need to use something like select so you don't deadlock. Otherwise, you could end up trying to write when your subprocess isn't accepting new input because it's waiting for its standard output to be read. Using select or poll (or similar) lets you determine when each of those file descriptors is ready to be read from or written to. In Rust, these functions are in the libc crate; I don't believe that Rust provides them natively. Windows will have some similar functionality, but I have no clue what it is.
It should be noted that unless you are certain that the subprocess's output can fit into memory, it may be better to process it in a more incremental way. Since you're going to be using select, that shouldn't be too difficult.
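For illustration, here is a rough, Unix-only sketch of that kind of poll() loop built on the libc crate. The function name, the 4096-byte read chunk, and the minimal error handling are my own choices for the example, not anything prescribed by std or libc:

use std::io;
use std::os::unix::io::AsRawFd;
use std::process::{Command, Stdio};

fn communicate(cmd: &str, buff1: &[u8]) -> io::Result<(Vec<u8>, Vec<u8>)> {
    let mut child = Command::new(cmd)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()?;

    let mut stdin = child.stdin.take();
    let stdout = child.stdout.take().unwrap();
    let stderr = child.stderr.take().unwrap();

    // Make the write end non-blocking so that one large write can never stall the loop.
    if let Some(s) = &stdin {
        unsafe {
            let fd = s.as_raw_fd();
            let flags = libc::fcntl(fd, libc::F_GETFL);
            libc::fcntl(fd, libc::F_SETFL, flags | libc::O_NONBLOCK);
        }
    }

    let (mut buff2, mut buff3) = (Vec::new(), Vec::new());
    let (mut out_open, mut err_open) = (true, true);
    let mut written = 0;

    while stdin.is_some() || out_open || err_open {
        // Rebuild the pollfd set each round: wait for stdin to become writable and
        // for stdout/stderr to become readable.
        let mut fds = Vec::new();
        if let Some(s) = &stdin {
            fds.push(libc::pollfd { fd: s.as_raw_fd(), events: libc::POLLOUT, revents: 0 });
        }
        if out_open {
            fds.push(libc::pollfd { fd: stdout.as_raw_fd(), events: libc::POLLIN, revents: 0 });
        }
        if err_open {
            fds.push(libc::pollfd { fd: stderr.as_raw_fd(), events: libc::POLLIN, revents: 0 });
        }
        if unsafe { libc::poll(fds.as_mut_ptr(), fds.len() as libc::nfds_t, -1) } < 0 {
            return Err(io::Error::last_os_error());
        }

        for p in &fds {
            if p.revents == 0 {
                continue;
            }
            if stdin.as_ref().map_or(false, |s| s.as_raw_fd() == p.fd) {
                // If the child closed its stdin, stop feeding it; otherwise write the
                // next chunk, and close our end once everything has been written so
                // the child sees EOF.
                if p.revents & (libc::POLLERR | libc::POLLHUP) != 0 {
                    stdin = None;
                    continue;
                }
                let chunk = &buff1[written..];
                let n = unsafe { libc::write(p.fd, chunk.as_ptr().cast(), chunk.len()) };
                if n < 0 {
                    let err = io::Error::last_os_error();
                    if err.kind() != io::ErrorKind::WouldBlock {
                        return Err(err);
                    }
                } else {
                    written += n as usize;
                }
                if written == buff1.len() {
                    stdin = None; // dropping ChildStdin closes the pipe
                }
            } else {
                // One of the output streams is ready; a read of zero bytes means EOF.
                let mut tmp = [0u8; 4096];
                let n = unsafe { libc::read(p.fd, tmp.as_mut_ptr().cast(), tmp.len()) };
                if n < 0 {
                    return Err(io::Error::last_os_error());
                }
                let n = n as usize;
                if p.fd == stdout.as_raw_fd() {
                    if n == 0 { out_open = false } else { buff2.extend_from_slice(&tmp[..n]) }
                } else if n == 0 {
                    err_open = false
                } else {
                    buff3.extend_from_slice(&tmp[..n])
                }
            }
        }
    }

    child.wait()?;
    Ok((buff2, buff3))
}

A thread-per-stream version like the sketch in the previous answer is usually easier to get right; the poll() approach mainly pays off when you want the incremental processing mentioned above.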
Related
I have a batch processing system that can execute a number of commands sequentially. These commands are specified as a list of words, which is executed by Python's subprocess.call() function without using a shell. For various reasons I do not want to change the processing system.
I would like to write something to a file, so a subsequent command can use it. Unfortunately, all the ways I can think of to write something to the disk involve some sort of redirection, which is a shell concept.
So is there a way to write a Linux command line that will take its argument and write it to a file, in a context where it is executed outside a shell?
Well, one could write a generalised parser and process manager that could handle this for you, but, luckily, one already comes with Linux. All you have to do is tell it what command to run, and it will handle the redirection for you.
So, if you were to modify your commands a bit, you could easily do this. Just concatenate the words together with spaces, quoting any words that may contain spaces or other special characters, and then you can use a list such as:
["/bin/sh", "-c", "{your new string here} > /some/file"]
Et voila, stuff written to disk. :)
Looking at the docs for subprocess.call, I see it has extra parameters:
subprocess.call(args, *, stdin=None, stdout=None, stderr=None, shell=False)
If you pass stdout= a file object you have opened, then the output of your command will go to that file, which is basically the behaviour you want.
I don't know your exact use case, but this is certainly a way to synthesise the command-line redirection behaviour with very little change to your code.
Note that, depending on your exact requirements, the docs also say that you should not use the built-in stdout=PIPE / stderr=PIPE support with call(): it is important to read data from a pipe regularly, or the writer will stall when the pipe buffer is full.
Could someone please explain the proper way to use the appendto function?
I am trying to use it to write debug text to a file. I want it written immediately when I call the function, but for some reason the program waits until it exits, and then writes everything at once.
Am I using the right function? Do I need to open, then write, then close the file each time I write to it instead?
Thanks.
Looks like you are having an issue with buffering (this is also a common question in other languages, btw). The data you want to write to the file is being held in a memory buffer and is only written to disk at a later time (this is done to batch writes to disk together, for better performance).
One possibility is to open and close the file as you already suggested. Closing a file handle will flush the contents of the buffer to disk.
A second possibility is to use the flush function to explicitly request that the data be written to disk. In Lua 4.0.1, you can either call flush passing a file handle
-- If you have opened your file with openfile:
local myfile = openfile("myfile.txt", "a")
flush(myfile)
-- If you used appendto the output file handle is in the _OUTPUT global variable
appendto("myfile.txt")
flush(_OUTPUT)
or you can call flush with no arguments, in which case it will flush all the files you have currently open.
flush()
For details, see the reference manual: http://www.lua.org/manual/4.0/manual.html#6.
I'd like to be able to intercept filenames with a certain prefix from any of the child processes that I launch. These would be names like "pipe://pipe_name". I think wrapping the open() system call would be a good way to do this for my application, but I'd like to do it without having to compile a separate shared library and hook it in with the LD_PRELOAD trick (or use FUSE and have to have a mounted directory).
I'll be forking the processes myself, is there a way to redirect open() to my own function before forking and have it persist in the child after an exec()?
Edit: The thought behind this is that I want to implement multi-reader pipes by having an intermediate process tee() the data from one pipe into all the others. I'd like this to be transparent to my child processes, so that they can take a filename and open() it; if it's a pipe, I'll return the file descriptor for it, while if it's a normal file, I'll just pass it to the regular open() function. Any alternative way to do this that makes it transparent to the child processes would be interesting to hear. I'd like to not have to compile a separate library that has to be pre-linked, though.
I believe the answer here is no, it's not possible. To my knowledge, there are only three ways to achieve this:
1. The LD_PRELOAD trick: compile a .so that is pre-loaded to override the open() library call.
2. Implement a FUSE filesystem and pass a path inside it to the client program, intercepting the calls there.
3. Use ptrace to intercept system calls, and fix them up as needed.
Options 2 and 3 will be very slow, as they have to intercept every I/O call and switch back out to user space to handle it; probably 500% slower than normal. Option 1 requires building and maintaining an external shared library to link in.
For my application, where I just want to be able to pass in a path to a pipe, I realized I can open both ends in my own process (via pipe()), then grab the path to the read end under /proc/<pid>/fd and pass that path to the client program, which I think gives me everything I need.
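If it helps anyone, here is a rough, Linux-only sketch of that trick in Rust with the libc crate; cat is just a stand-in for a client program that takes a filename:

use std::io::Write;
use std::os::unix::io::FromRawFd;
use std::process::Command;

fn main() -> std::io::Result<()> {
    // Create an anonymous pipe with the raw syscall: fds[0] is the read end,
    // fds[1] is the write end.
    let mut fds = [0 as libc::c_int; 2];
    if unsafe { libc::pipe(fds.as_mut_ptr()) } != 0 {
        return Err(std::io::Error::last_os_error());
    }
    let (read_fd, write_fd) = (fds[0], fds[1]);

    // Keep the write end out of the child (close-on-exec); otherwise the child
    // would hold its own copy of the write end and never see EOF.
    unsafe { libc::fcntl(write_fd, libc::F_SETFD, libc::FD_CLOEXEC) };
    let mut write_end = unsafe { std::fs::File::from_raw_fd(write_fd) };

    // This path names the read end of the pipe; a process running as the same
    // user (here, our own child) can normally open() it like any other file.
    let path = format!("/proc/{}/fd/{}", std::process::id(), read_fd);

    let mut child = Command::new("cat").arg(&path).spawn()?;

    write_end.write_all(b"hello over a pipe-backed path\n")?;
    drop(write_end); // closing the write end is what gives the reader its EOF

    child.wait()?;
    unsafe { libc::close(read_fd) }; // release our copy of the read end
    Ok(())
}

The same idea works in the other direction if the client program is supposed to produce data rather than consume it.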
Suppose I have an application A that takes some time to load (it opens a couple of libraries). A reads from stdin and processes it into some stdout.
I want to serve A on a network over a socket (instead of stdin and stdout).
The simplest way of doing that efficiently that I can think of is by hacking at the code and adding a forking server loop, replacing stdin and stdout with socket input and output.
The performance improvement compared to having an independent server application that spawns A (fork+exec) on each connection comes at a cost however. The latter is much easier to write and I don't need to have access to the source code of A or know the language it's written in.
I want to have my cake and eat it too. Is there a mechanism that would factor that forking loop out of A for me?
What I want is something like fast_spawnp("A", "/tmp/A.pid", stdin_fd, stdout_fd, stderr_fd) (start process A unless it's already running, clone A from outside and make sure the standard streams of the child point to the argument-supplied file descriptors).
I am working on a flow-based programming system called net2sh. It is currently based on shell tools connected by named pipes. Several processes work together to get the job done, communicating over named pipes, not unlike a production line in a factory.
In general it is working delightfully well, however there is one major problem with it. In the case where processes are communicating over two or more named pipes, the "sending" process and "receiving" process must open the pipes in the same order. This is because when a process opens a named pipe, it blocks until the other end has also been opened.
I want a way to avoid this, without spawning extra "helper" processes for each pipe, without having to hack existing components, and without having to mess with the program networks to avoid this problem.
Ideally I am looking for some "non-blocking fifo" option, where "open" on a fifo always succeeds immediately but subsequent operations may block if the pipe buffer is full (or empty, for reads)... I'd even consider using a kernel patch to that effect. According to fifo(7), O_NONBLOCK does change what happens when a fifo is opened, but not in the way I want, and in order to use it I would have to rewrite every existing shell tool, such as cat.
Here is a minimal example which deadlocks:
mkfifo a b
(> a; > b; ) &   # opens a for writing, then b
(< b; < a; ) &   # opens b for reading, then a: the opposite order, so both subshells block
wait
If you can help me to solve this sensibly I will be very grateful!
There is a good description of using O_NONBLOCK with named pipes here: How do I perform a non-blocking fopen on a named pipe (mkfifo)?
It sounds like you want this to work across your entire environment without changing any C code. One approach, then, would be to set LD_PRELOAD to a shared library containing a wrapper for open(2) that adds O_NONBLOCK to the flags whenever the pathname refers to a named pipe.
A concise example of using LD_PRELOAD to override a library function is here: https://www.technovelty.org/c/using-ld_preload-to-override-a-function.html
Whether this actually works in practice without breaking anything else, you'll have to find out for yourself (please let us know!).
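Just to make the fifo(7) semantics concrete (and to show why such a wrapper needs a little care on the write side), here is a small sketch in Rust; it assumes a FIFO named a has already been created with mkfifo a, and uses the libc crate only for the flag constant:

use std::fs::{File, OpenOptions};
use std::os::unix::fs::OpenOptionsExt;

// Open the FIFO at `path` without blocking, either for reading or for writing.
fn open_fifo(path: &str, for_writing: bool) -> std::io::Result<File> {
    OpenOptions::new()
        .read(!for_writing)
        .write(for_writing)
        .custom_flags(libc::O_NONBLOCK)
        .open(path)
}

fn main() -> std::io::Result<()> {
    // With no reader attached yet, a non-blocking open for writing fails with ENXIO.
    println!("write end first: {:?}", open_fifo("a", true).err());

    // A non-blocking open for reading succeeds immediately, writer or no writer.
    let _reader = open_fifo("a", false)?;

    // Now that a read end exists, the non-blocking open for writing succeeds too.
    let _writer = open_fifo("a", true)?;
    println!("both ends are open, and neither open() blocked");
    Ok(())
}

That asymmetry means a wrapper that blindly adds O_NONBLOCK would make writers fail with an error where they used to wait, so it may need to retry, or only apply the flag to read-only opens; presumably that is part of the "without breaking anything else" caveat.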