Delete a file tree in parallel with Rust

I'm trying to delete a file tree in parallel with Rust, using jwalk for parallel traversal and deletion of files. The code shown below deletes all the files and, in general, works as expected, but the performance is terrible.
Compared to a Python version I've implemented, it's 5 times slower on Windows! What am I doing wrong that makes the Rust version so slow?
What I've found out so far is that std::fs::remove_file is the reason for the bad performance. I'm wondering whether the implementation of this function has a performance issue, at least on Windows.
I'm using Rust version 1.42.0 with toolchain stable-x86_64-pc-windows-msvc.
let (tx, rx) = mpsc::channel();
WalkBuilder::new(tmpr)
    .hidden(false)
    .standard_filters(false)
    .threads(cmp::min(30, num_cpus::get()))
    .build_parallel()
    .run(move || {
        let tx = tx.clone();
        Box::new(move |dir_entry_result| {
            if let Ok(dir_entry) = dir_entry_result {
                if dir_entry.file_type().unwrap().is_dir() {
                    // Directories can't be removed yet; collect them for later.
                    tx.send(dir_entry.path().to_owned()).unwrap();
                } else {
                    // Try to remove the file; on failure, fix permissions and retry.
                    if let Err(_) = std::fs::remove_file(&dir_entry.path()) {
                        match fix_permissions(&dir_entry.path()) {
                            Ok(_) => {
                                if let Err(_) = std::fs::remove_file(&dir_entry.path()) {
                                    tx.send(dir_entry.path().to_owned()).unwrap();
                                }
                            }
                            Err(_) => {
                                tx.send(dir_entry.path().to_owned()).unwrap();
                            }
                        }
                    }
                }
            }
            ignore::WalkState::Continue
        })
    });
let paths: Vec<_> = rx.into_iter().collect();
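A hypothetical follow-up step, not shown in the question: once every file is gone, the collected directory paths would presumably be removed deepest-first, roughly like this.

// Hypothetical sketch, not part of the question's code: remove the collected
// directories from deepest to shallowest so children go before their parents.
let mut dirs = paths;
dirs.sort_by_key(|p| std::cmp::Reverse(p.components().count()));
for dir in dirs {
    let _ = std::fs::remove_dir(&dir);
}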

I've found the reason for the slowdown: it was the virus scanner. The Python version was faster because Python.exe is ignored by the virus scanner, while the Rust executable is not. After I disabled the virus scanner, the Rust version became lightning fast.

Related

Dynamically add an RTMP source to a running pipeline using compositor

I have a very simple GStreamer pipeline that looks like this:
uridecodebin -> compositor -> videoconvert -> autovideosink
I want to be able to add a new uridecodebin to the compositor at any time.
If I add the second source before the pipeline is running, it works fine, but if I delay the addition of the second source, the pipeline gets stuck and I get tons of QoS events telling me frames are being dropped.
This issue does not occur if I only read non-live sources, but it happens with my RTMP streams, or if I mix live and non-live sources.
When sync=false is set on the sink, RTMP streams are played, but it does not work with non-live sources.
My assumption is that I am missing a step with time/clock/latency, but I don't know what.
Here is the code (in Rust) used to add a new source:
fn connect_pad_added(src_pad: &gst::Pad, src: &gst::Element, compositor: &gst::Element) {
    println!("Received new pad {} from {}", src_pad.name(), src.name());
    let new_pad_caps = src_pad
        .current_caps()
        .expect("Failed to get caps of new pad.");
    let new_pad_struct = new_pad_caps
        .structure(0)
        .expect("Failed to get first structure of caps.");
    let new_pad_type = new_pad_struct.name();
    let is_video = new_pad_type.starts_with("video/x-raw");
    if !is_video {
        println!(
            "It has type {} which is not raw video. Ignoring.",
            new_pad_type
        );
        return;
    }
    println!("Created template");
    let sink_pad = compositor
        .request_pad_simple("sink_%u")
        .expect("Could not get sink pad from compositor");
    println!("Got pad");
    if sink_pad.is_linked() {
        println!("We are already linked. Ignoring.");
        return;
    }
    if sink_pad.name() == "sink_0" {
        sink_pad.set_property("width", 1920i32);
        sink_pad.set_property("height", 1080i32);
    } else {
        sink_pad.set_property("alpha", 0.8f64);
    }
    let res = src_pad.link(&sink_pad);
    if res.is_err() {
        println!("Type is {} but link failed.", new_pad_type);
    } else {
        println!("Link succeeded (type {}).", new_pad_type);
    }
}

fn add_new_element(pipeline: &gst::Pipeline, uri: &str) {
    println!("Adding new element");
    let source = gst::ElementFactory::make("uridecodebin")
        .property("uri", uri)
        .build()
        .unwrap();
    let compositor = pipeline.by_name("compositor").unwrap();
    pipeline.add(&source).unwrap();
    source.connect_pad_added(move |src, src_pad| {
        println!("Received new pad {} from {}", src_pad.name(), src.name());
        connect_pad_added(src_pad, src, &compositor);
    });
    source
        .set_state(gst::State::Paused)
        .expect("Unable to set the uridecodebin to the `Paused` state");
    println!("Added new element");
}
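A minimal sketch of the state-handling direction hinted at above. The function name add_new_element_synced and the reliance on sync_state_with_parent() are assumptions on my part, not a confirmed fix; pad offsets or latency configuration may additionally be needed for the live RTMP branches.

// Sketch only: let the new element adopt the state of the already-running
// pipeline instead of pinning it to Paused.
fn add_new_element_synced(pipeline: &gst::Pipeline, uri: &str) {
    let source = gst::ElementFactory::make("uridecodebin")
        .property("uri", uri)
        .build()
        .unwrap();
    let compositor = pipeline.by_name("compositor").unwrap();
    pipeline.add(&source).unwrap();
    source.connect_pad_added(move |src, src_pad| {
        connect_pad_added(src_pad, src, &compositor);
    });
    // sync_state_with_parent() brings the element to the pipeline's current
    // state (Playing when the pipeline is already running).
    source
        .sync_state_with_parent()
        .expect("Unable to sync the uridecodebin state with the pipeline");
}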

How to cheaply send a delayed message?

My requirement is very simple and quite common in many programs: send a specified message to my channel after a specified delay.
I've checked tokio for topics related to delay, interval, or timeout, but none of them seem that straightforward to implement.
What I've come up with so far is to spawn an asynchronous task, then wait or sleep for a certain amount of time, and finally send the message.
But spawning an asynchronous task is obviously a relatively heavy operation. Is there a better solution?
use std::time::Duration;
use tokio::{sync::mpsc, time};

async fn my_handler(sender: mpsc::Sender<i32>, dur: Duration) {
    tokio::spawn(async move {
        time::sleep(dur).await;
        sender.send(0).await;
    });
}
You could try adding a second channel and a continuously running task that buffers messages until the time they are to be received. Implementing this is more involved than it sounds; I hope I'm handling cancellations correctly here:
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use tokio::sync::mpsc::{self, Receiver, Sender};
use tokio::time::{sleep_until, timeout_at, Instant};

fn make_timed_channel<T: Ord + Send + Sync + 'static>() -> (Sender<(Instant, T)>, Receiver<T>) {
    // Ord is an unnecessary requirement arising from me stuffing both the Instant and the T into the binary heap.
    // You could drop this requirement by using the priority_queue crate instead.
    let (sender1, receiver1) = mpsc::channel::<(Instant, T)>(42);
    let (sender2, receiver2) = mpsc::channel::<T>(42);
    let mut receiver1 = Some(receiver1);
    tokio::spawn(async move {
        let mut buf = BinaryHeap::<Reverse<(Instant, T)>>::new();
        loop {
            // Pretend we're a bounded channel, or exit if the upstream closed
            if buf.len() >= 42 || receiver1.is_none() {
                match buf.pop() {
                    Some(Reverse((time, element))) => {
                        sleep_until(time).await;
                        if sender2.send(element).await.is_err() {
                            break;
                        }
                    }
                    None => break,
                }
            }
            // We have some deadline to send a message at
            else if let Some(Reverse((then, _))) = buf.peek() {
                if let Ok(recv) = timeout_at(*then, receiver1.as_mut().unwrap().recv()).await {
                    match recv {
                        Some(recv) => buf.push(Reverse(recv)),
                        None => receiver1 = None,
                    }
                } else {
                    if sender2.send(buf.pop().unwrap().0 .1).await.is_err() {
                        break;
                    }
                }
            }
            // We're empty, wait around
            else {
                match receiver1.as_mut().unwrap().recv().await {
                    Some(recv) => buf.push(Reverse(recv)),
                    None => receiver1 = None,
                }
            }
        }
    });
    (sender1, receiver2)
}
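A quick, hypothetical usage example of the function above (the two-second delay and the i32 payload are arbitrary):

use std::time::Duration;
use tokio::time::Instant;

#[tokio::main]
async fn main() {
    let (tx, mut rx) = make_timed_channel::<i32>();
    // Queue a value that should only be delivered two seconds from now.
    tx.send((Instant::now() + Duration::from_secs(2), 42))
        .await
        .unwrap();
    // recv() completes only once the deadline has passed.
    println!("got {}", rx.recv().await.unwrap());
}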
Whether this is more efficient than spawning tasks, you'd have to benchmark. (I doubt it; IIRC Tokio uses a much fancier mechanism than a BinaryHeap for waking up at the next timeout.)
One optimization you could make, if you don't need a Receiver<T> but just something you can await for the next element: you could drop the second channel and maintain the BinaryHeap inside a custom receiver, as sketched below.
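A rough sketch of that custom-receiver idea, assuming Tokio's mpsc and timers; the TimedReceiver name and its structure are illustrative only (construction is omitted):

use std::cmp::Reverse;
use std::collections::BinaryHeap;
use tokio::sync::mpsc::Receiver;
use tokio::time::{sleep_until, timeout_at, Instant};

// Hypothetical custom receiver: owns the input channel and the heap,
// so no second channel or forwarding task is needed.
struct TimedReceiver<T: Ord> {
    input: Receiver<(Instant, T)>,
    buf: BinaryHeap<Reverse<(Instant, T)>>,
}

impl<T: Ord> TimedReceiver<T> {
    // Returns the next element whose deadline has passed, or None once the
    // input channel is closed and the buffer is drained.
    async fn recv(&mut self) -> Option<T> {
        loop {
            // Copy the earliest deadline out of the heap (if any) so the
            // borrow ends before the heap is mutated below.
            let next_deadline = match self.buf.peek() {
                Some(Reverse((then, _))) => Some(*then),
                None => None,
            };
            match next_deadline {
                Some(then) => match timeout_at(then, self.input.recv()).await {
                    // New input arrived before the deadline: buffer it.
                    Ok(Some(item)) => self.buf.push(Reverse(item)),
                    // Deadline reached, or input closed: wait out the deadline
                    // (a no-op if it already passed) and deliver the element.
                    Ok(None) | Err(_) => {
                        sleep_until(then).await;
                        return self.buf.pop().map(|Reverse((_, t))| t);
                    }
                },
                // Nothing buffered: just wait for input or for the channel to close.
                None => match self.input.recv().await {
                    Some(item) => self.buf.push(Reverse(item)),
                    None => return None,
                },
            }
        }
    }
}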

Is it possible to make this Rust code easier to reason about? [closed]

I'm reading this Rust code and I barely have the mental capacity to understand what's going on with all the mutexes and handles. It's all overhead to make the Rust gods happy, and it's making it hard to focus on what's actually going on. Take a look:
#[tauri::command]
fn spawn(param: String, window: Window<Wry>) {
    let window_arc = Arc::new(Mutex::new(window));

    // Spawn bin
    let (mut rx, child) = tauri::api::process::Command::new_sidecar("bin")
        .expect("failed to create binary command")
        .args([param])
        .spawn()
        .expect("Failed to spawn sidecar");
    let child_arc = Arc::new(Mutex::new(child));

    // Handle data from bin
    let window = window_arc.clone();
    let (handle, mut handle_rx) = broadcast::channel(1);
    let handle_arc = Arc::new(Mutex::new(handle));
    tauri::async_runtime::spawn(async move {
        loop {
            tokio::select! {
                _ = handle_rx.recv() => {
                    return;
                }
                Some(event) = rx.recv() => {
                    if let CommandEvent::Stdout(line) = &event {
                        let data = decode_and_xor(line.clone());
                        println!("Data from bin: {}", data);
                        window.lock().unwrap().emit("from_bin", data).expect("failed to emit message");
                    }
                    if let CommandEvent::Stderr(line) = &event {
                        println!("Fatal error bin: {}", &line);
                        window.lock().unwrap().emit("bin_fatal_error", line).expect("failed to emit message");
                    }
                }
            }
        }
    });

    let window = window_arc.clone();
    let window_cc = window.clone();
    window_cc.lock().unwrap().listen("kill_bin", move |event| {
        let handle = handle_arc.clone();
        handle.lock().unwrap().send(true).unwrap();
        window.lock().unwrap().unlisten(event.id());
    });

    // Handle data to bin
    let window = window_arc.clone();
    tauri::async_runtime::spawn(async move {
        let child_clone = child_arc.clone();
        let (handle, handle_rx) = broadcast::channel(1);
        let handle_rx_arc = Arc::new(Mutex::new(handle_rx));
        let handle_arc = Arc::new(Mutex::new(handle));
        let window_c = window.clone();
        window.lock().unwrap().listen("to_bin", move |event| {
            let handle_rx = handle_rx_arc.clone();
            if handle_rx.lock().unwrap().try_recv().is_ok() {
                window_c.lock().unwrap().unlisten(event.id());
                return;
            }
            // Send data to bin
            let payload = String::from(event.payload().unwrap());
            let encrypted = xor_and_encode(payload) + "\n";
            println!("Data to send: {}", event.payload().unwrap());
            child_clone.clone().lock().unwrap().write(encrypted.as_bytes()).expect("could not write to child");
        });
        let window_c = window.clone();
        window.lock().unwrap().listen("kill_bin", move |event| {
            let handle = handle_arc.clone();
            handle.lock().unwrap().send(true).unwrap();
            window_c.lock().unwrap().unlisten(event.id());
        });
    });
}
Are all these Arcs, Mutexes and clones necessary? How would I go about cleaning this up in a Rust idiomatic way, making it easier to see what's going on?
Are all these Arcs, Mutexes and clones necessary?
Probably not; you seem to be way over-cloning and re-wrapping already-concurrent structures, but you'll have to look at the specific APIs.
E.g., assuming broadcast::channel is Tokio's, it's designed for concurrent usage (that's kind of the point): senders are designed to be clonable (for multiple producers), and you can create as many receivers as you need from the senders.
There's no need to wrap them in an Arc, and there's especially no need whatsoever to protect them behind locks; they're designed to work as-is.
Furthermore, in this case it's even less necessary because you have just one sender task and one receiver task; neither is shared. Nor do you need to clone them when you use them. So, e.g.,
let handle_arc = Arc::new(Mutex::new(handle));
[...]
window_cc.lock().unwrap().listen("kill_bin", move |event| {
    let handle = handle_arc.clone();
    handle.lock().unwrap().send(true).unwrap();
    window.lock().unwrap().unlisten(event.id());
});
I'm pretty sure this can just be
window_cc.lock().unwrap().listen("kill_bin", move |event| {
    handle.send(true).unwrap();
    window.lock().unwrap().unlisten(event.id());
});
That'll move the handle inside the closure, then send on that. Sender is internally mutable, so it needs no locking to send an event (locking would rather defeat the point).
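As a generic illustration of that point (assuming Tokio's broadcast channel; this snippet is not taken from the question's code): the sender can be moved into closures or tasks directly, and each consumer gets its own receiver via subscribe(), without any Arc or Mutex.

use tokio::sync::broadcast;

#[tokio::main]
async fn main() {
    let (tx, mut rx1) = broadcast::channel::<bool>(1);
    // Each consumer gets its own receiver; no Arc or Mutex needed.
    let mut rx2 = tx.subscribe();

    tokio::spawn(async move {
        // send() only needs &self, so the moved sender works as-is.
        tx.send(true).unwrap();
    });

    assert!(rx1.recv().await.unwrap());
    assert!(rx2.recv().await.unwrap());
}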

Check if a command is in PATH/executable as process

I want to execute an external program via std::process::Command::spawn. Furthermore, I want to know the reason why spawning the process failed: is it because the given program name doesn't exist/is not in PATH, or because of some different error?
Example code of what I want to achieve:
match Command::new("rustc").spawn() {
    Ok(_) => println!("Was spawned :)"),
    Err(e) => {
        if /* ??? */ {
            println!("`rustc` was not found! Check your PATH!")
        } else {
            println!("Some strange error occurred :(");
        }
    },
}
When I try to execute a program that isn't on my system, I get:
Error { repr: Os { code: 2, message: "No such file or directory" } }
But I don't want to rely on that. Is there a way to determine if a program exists in PATH?
You can use e.kind() to find out which ErrorKind the error was.
use std::io::ErrorKind::NotFound;
use std::process::Command;

match Command::new("rustc").spawn() {
    Ok(_) => println!("Was spawned :)"),
    Err(e) => {
        if let NotFound = e.kind() {
            println!("`rustc` was not found! Check your PATH!")
        } else {
            println!("Some strange error occurred :(");
        }
    },
}
Edit: I didn't find any explicit documentation about which error kinds can be returned, so I looked up the source code. It seems the error is returned straight from the OS. The relevant code seems to be in src/libstd/sys/[unix/windows/..]/process.rs. A snippet from the Unix version:
One more edit: on second thought, I'm not sure the license actually allows posting parts of the Rust sources here, so you can see it on GitHub.
It just returns Error::from_raw_os_error(...). The Windows version seemed more complicated, and I couldn't immediately find where it even returns errors from. Either way, it seems you're at the mercy of your operating system regarding that. At least I found the following test in src/libstd/process.rs:
Same as above: see GitHub.
That seems to guarantee that an ErrorKind::NotFound should be returned at least when the binary is not found. It makes sense to assume that the OS wouldn't give a NotFound error in other cases, but who knows. If you want to be absolutely sure that the program really was not found, you'll have to search the directories in $PATH manually. Something like:
use std::env;
use std::fs;

fn is_program_in_path(program: &str) -> bool {
    if let Ok(path) = env::var("PATH") {
        // Note: ':' is the Unix PATH separator; Windows uses ';'.
        for p in path.split(":") {
            let p_str = format!("{}/{}", p, program);
            if fs::metadata(p_str).is_ok() {
                return true;
            }
        }
    }
    false
}

fn main() {
    let program = "rustca"; // shouldn't be found
    if is_program_in_path(program) {
        println!("Yes.");
    } else {
        println!("No.");
    }
}
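A more portable variant of the same check, as a sketch of my own (not part of the answer above), could use std::env::split_paths, which handles the platform's PATH separator:

use std::env;

// Sketch: std::env::split_paths understands the platform-specific PATH
// separator (':' on Unix, ';' on Windows) and yields PathBufs directly.
fn is_program_in_path(program: &str) -> bool {
    env::var_os("PATH")
        .map(|paths| env::split_paths(&paths).any(|dir| dir.join(program).is_file()))
        .unwrap_or(false)
}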

How would you stream output from a Process in Rust?

This question refers to Rust as of October 2014.
If you are using Rust 1.0 or above, you'd best look elsewhere for a solution.
I have a long-running Rust process that generates log values, which I'm running using Process.
It looks as though I might be able to periodically "check on" the running process using set_timeout() and wait(), and do some kind of high-level loop like:
let mut child = match Command::new("thing").arg("...").spawn() {
    Ok(child) => child,
    Err(e) => fail!("failed to execute child: {}", e),
};

loop {
    child.set_timeout(Some(100));
    match child.wait() {
        // ??? Something goes here
    }
}
The things I'm not 100% sure on are: how do I tell the difference between a timeout error and a process-return error from wait(), and how do I use the PipeStream to "read as much as you can without blocking" from the stream every interval and push it out?
Is this the best approach? Should I start a task to monitor stdout and stderr instead?
To distinguish errors from the process from the timeout, you have to handle the return values of wait; an example:
fn run() {
    let mut child = match Command::new("sleep").arg("1").spawn() {
        Ok(child) => child,
        Err(e) => fail!("failed to execute child: {}", e),
    };
    loop {
        child.set_timeout(Some(1000));
        match child.wait() {
            // Here assume any error is timeout, you can filter from IoErrorKind
            Err(..) => println!("Timeout"),
            Ok(ExitStatus(0)) => {
                println!("Finished without errors");
                return;
            }
            Ok(ExitStatus(a)) => {
                println!("Finished with error number: {}", a);
                return;
            }
            Ok(ExitSignal(a)) => {
                println!("Terminated by signal number: {}", a);
                return;
            }
        }
    }
}
For using streams, check wait_with_output, or implement something similar with channels and threads: http://doc.rust-lang.org/src/std/home/rustbuild/src/rust-buildbot/slave/nightly-linux/build/src/libstd/io/process.rs.html#601
Hope it helped
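For readers on modern (post-1.0) Rust, the "channels and threads" approach mentioned above could look roughly like the sketch below; the ping command is only a placeholder, and this is not the 2014-era API the question asks about.

use std::io::{BufRead, BufReader};
use std::process::{Command, Stdio};
use std::sync::mpsc;
use std::thread;

fn main() {
    // Placeholder child command; any long-running process with stdout works.
    let mut child = Command::new("ping")
        .args(["-c", "3", "example.com"])
        .stdout(Stdio::piped())
        .spawn()
        .expect("failed to spawn child");

    let stdout = child.stdout.take().expect("child had no stdout");
    let (tx, rx) = mpsc::channel();

    // Reader thread: forward each stdout line to the main thread as it appears.
    thread::spawn(move || {
        for line in BufReader::new(stdout).lines() {
            if tx.send(line).is_err() {
                break; // receiver was dropped
            }
        }
    });

    // The loop ends when the child's stdout closes and the sender is dropped.
    for line in rx {
        println!("child said: {}", line.expect("failed to read line"));
    }

    child.wait().expect("child wasn't running");
}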
Have a look at cargo:
https://docs.rs/cargo-util/0.1.1/cargo_util/struct.ProcessBuilder.html#method.exec_with_streaming
The only downside is that cargo-util seems to need openssl even with default-features=false...
But you can at least see how it and read2 are done.
