Rust threadpool crate: how to execute on an iterator?

Cargo.toml
threadpool = "1.8.1"
I am using the threadpool crate purely for learning purposes; if there are better thread pool crates, please let me know.
Code with problem, simplified version:
use threadpool::ThreadPool;
fn main() {
    let pool = ThreadPool::new(4);
    let v2 = vec!["a".to_string(), "b".to_string(), "c".to_string()];
    for i in v2.iter() { // .iter() is the problem
        pool.execute(move || println!("{}", i));
    }
    pool.join();
}
Error:
error[E0597]: `v2` does not live long enough
for i in v2.iter() { // .iter() is the problem
| ^^^^^^^^^ borrowed value does not live long enough
pool.execute(move || println!("{}", i));
| --------------------------------------- argument requires that `v2` is borrowed for `'static`
It works fine without .iter(). I use .iter() here only for demonstration purposes: in my real code I use iter and zip to iterate over two vectors of the same size, both of type Vec<String>, e.g. urls.iter().zip(files_t4.iter()).
I understand why I get the error, but I have no idea how to solve it.
I know that this is similar to How can I pass a reference to a stack variable to a thread?, but apart from trying another crate, I am not sure whether it is possible to use std::thread::scope as in the first example there.
I am learning how to download files in Rust, and now I want to see how it is done with a thread pool.
Any ideas are appreciated.

I understand why I get the error, but I have no idea how to solve it.
Use .into_iter(), or just iterate directly over the vector (which does the same thing).
Because you're using iter, the iterator borrows from the vector, so i is of type &String. pool.execute requires a closure that is valid for 'static, and the compiler cannot prove that v2 outlives the jobs running on the thread pool, hence the lifetime error.
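A minimal sketch of the owning-iterator fix applied to the zipped case. It uses std::thread::spawn as a stand-in for pool.execute (both need a closure that owns its data), and the helper name spawn_downloads and the sample strings are made up for illustration:

```rust
use std::thread;

// Hypothetical helper: consumes both vectors so each closure owns its Strings.
fn spawn_downloads(urls: Vec<String>, files: Vec<String>) -> Vec<String> {
    let mut handles = Vec::new();
    // into_iter() yields owned `String`s, so nothing borrowed from the
    // vectors ends up inside the `move` closure.
    for (url, file) in urls.into_iter().zip(files) {
        handles.push(thread::spawn(move || format!("{} -> {}", url, file)));
    }
    // Join in spawn order so the result order is deterministic.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let urls = vec!["u1".to_string(), "u2".to_string()];
    let files = vec!["f1".to_string(), "f2".to_string()];
    for line in spawn_downloads(urls, files) {
        println!("{}", line);
    }
}
```

With the threadpool crate, the body of the loop would presumably become pool.execute(move || ...) with the same owned url and file values.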
not sure whether it is possible to use std::thread::scope as in the first example.
It is, however you'll have to implement the "threadpool" by hand, as the threadpool crate does not support scoped threads / tasks. Alternatively, use Rayon which, aside from parallel iterators, provides an explicit thread pool and scoped tasks. It also provides other interesting parallelism tools, e.g. broadcast, join, and of course spawn.
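As a rough idea of what a hand-rolled "threadpool" on top of std::thread::scope could look like, here is a sketch using only the standard library. The function name process_borrowed, the worker count, and the shared-counter design are assumptions made for the example, not something the threadpool crate or Rayon prescribes:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

// Hypothetical helper: runs `workers` scoped threads over borrowed pairs.
fn process_borrowed(urls: &[String], files: &[String], workers: usize) -> Vec<String> {
    let pairs: Vec<(&String, &String)> = urls.iter().zip(files.iter()).collect();
    let next = AtomicUsize::new(0); // index of the next unclaimed pair
    let results = Mutex::new(Vec::new()); // (index, output) from all workers

    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| loop {
                // Claim the next pair; stop when none are left.
                let i = next.fetch_add(1, Ordering::Relaxed);
                if i >= pairs.len() {
                    break;
                }
                let (url, file) = pairs[i];
                results.lock().unwrap().push((i, format!("{} -> {}", url, file)));
            });
        }
    }); // all workers are joined here, so the borrows end safely

    let mut out = results.into_inner().unwrap();
    out.sort(); // restore input order by index
    out.into_iter().map(|(_, line)| line).collect()
}

fn main() {
    let urls = vec!["u1".to_string(), "u2".to_string(), "u3".to_string()];
    let files = vec!["f1".to_string(), "f2".to_string(), "f3".to_string()];
    println!("{:?}", process_borrowed(&urls, &files, 2));
    // The vectors were only borrowed, so they are still usable here.
    println!("{} urls remain", urls.len());
}
```

Because the scope joins every worker before returning, the closures may hold plain &String references and .iter().zip(...) works unchanged.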

Related

Reading a vector from multiple threads [duplicate]

I have a function that returns a vector of strings, which is later read by multiple threads. How do I do this in Rust?
fn get_list() -> Vec<String> { ... }
fn read_vec() {
    let v = get_list();
    for i in 1..10 {
        handles.push(thread::spawn(|| { do_work(&v); }));
    }
    handles.join();
}
I think I need to extend the lifetime of v to 'static and pass it as an immutable reference to the threads, but I am not sure how.
The problem you are facing is that the threads spawned by thread::spawn run for an unknown amount of time. You'll need to make sure that your Vec<String> outlives these threads.
* You can use atomic reference counting: create an Arc<Vec<String>> and clone the Arc for each thread. The Vec<String> will be deallocated only when all Arcs are dropped.
* You can leak the Vec<String>. I personally like this approach, but only if you need the Vec<String> for the entire runtime of your program. To achieve this, you can turn your Vec<String> into a &'static [String] by using Vec::leak.
* You can ensure that your threads will not run after the read_vec function returns. This is what you're essentially doing by calling handles.join(). However, the compiler doesn't see that these threads are joined later, and there might be edge cases where they are not joined (what happens when the 2nd thread::spawn panics?). To make this explicit, use the scope function in std::thread.
* Of course, you can also just clone the Vec<String>, and give each thread a unique copy.
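The Arc and leak options can be sketched roughly as follows. do_work is a stand-in for the function from the question (here it simply sums string lengths), and the helper names and thread count are assumptions for the example:

```rust
use std::sync::Arc;
use std::thread;

// Stand-in for the question's `do_work`; it just sums the string lengths.
fn do_work(v: &[String]) -> usize {
    v.iter().map(|s| s.len()).sum()
}

// Option 1: share the vector through an Arc, one handle per thread.
fn read_with_arc(v: Vec<String>) -> Vec<usize> {
    let shared = Arc::new(v);
    let handles: Vec<_> = (0..3)
        .map(|_| {
            let v = Arc::clone(&shared); // bumps the reference count only
            thread::spawn(move || do_work(&v))
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

// Option 2: leak the vector so the slice is valid for `'static`.
fn read_with_leak(v: Vec<String>) -> Vec<usize> {
    let leaked: &'static [String] = v.leak(); // this memory is never freed
    let handles: Vec<_> = (0..3)
        .map(|_| thread::spawn(move || do_work(leaked)))
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let v = vec!["ab".to_string(), "cde".to_string()];
    println!("{:?}", read_with_arc(v.clone()));
    println!("{:?}", read_with_leak(v));
}
```

Note that read_with_leak leaks on every call, so it only makes sense when the data really is needed until the program exits.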
TL;DR:
For this particular use-case, I'd recommend std::thread::scope. If the Vec<String> lives for the entire duration of your program, leaking it using Vec::leak is a great and often under-used solution. For more complex scenarios, wrapping the Vec<String> in an Arc is probably the right way to go.
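For the std::thread::scope route, a minimal sketch might look like this. Again do_work is a stand-in for the question's function, and read_scoped and the thread count are made up for the example:

```rust
use std::thread;

// Stand-in for the question's `do_work`; it just sums the string lengths.
fn do_work(v: &[String]) -> usize {
    v.iter().map(|s| s.len()).sum()
}

fn read_scoped(v: Vec<String>) -> Vec<usize> {
    thread::scope(|s| {
        let mut handles = Vec::new();
        for _ in 0..3 {
            // The closure borrows `v`; no Arc, clone or 'static is needed.
            handles.push(s.spawn(|| do_work(&v)));
        }
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    }) // the scope guarantees every thread has finished before `v` is dropped
}

fn main() {
    let v = vec!["ab".to_string(), "cde".to_string()];
    println!("{:?}", read_scoped(v));
}
```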

How can I make a variable borrow for 'static?

In vulkano, to create a CpuAccessibleBuffer you need to give it some data, and the CpuAccessibleBuffer::from_data function requires the data to have the 'static lifetime.
I have some data in a &[u8] slice (created at runtime) that I would like to pass to that function.
However, it errors with this message:
argument requires that `data` is borrowed for `'static`
So how can I make the lifetime of the data 'static?
You should use CpuAccessibleBuffer::from_iter instead, it does the same thing but does not require the collection to be Copy or 'static:
let data: &[u8] = todo!();
let _ = CpuAccessibleBuffer::from_iter(
    device,
    usage,
    host_cached,
    data.iter().copied(), // <--- pass like so
);
Or if you actually have a Vec<u8>, you can pass it directly:
let data: Vec<u8> = todo!();
let _ = CpuAccessibleBuffer::from_iter(
    device,
    usage,
    host_cached,
    data, // <--- pass like so
);
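To illustrate why an iterator-taking constructor sidesteps the 'static requirement, here is a sketch with a made-up Buffer type. It is not vulkano's API; the type, from_byte_iter, and the two helper functions are hypothetical and only mirror the shape of the calls above:

```rust
// Hypothetical stand-in for a buffer type; this is NOT vulkano's API.
struct Buffer {
    bytes: Vec<u8>,
}

impl Buffer {
    // Consumes any iterator of bytes and copies the items into owned
    // storage, so the source only has to live for the duration of the call.
    fn from_byte_iter<I>(data: I) -> Buffer
    where
        I: IntoIterator<Item = u8>,
    {
        Buffer { bytes: data.into_iter().collect() }
    }
}

// A borrowed slice of any lifetime works: the bytes are copied out.
fn build_from_slice(data: &[u8]) -> Buffer {
    Buffer::from_byte_iter(data.iter().copied())
}

// An owned Vec can be handed over directly.
fn build_from_vec(data: Vec<u8>) -> Buffer {
    Buffer::from_byte_iter(data)
}

fn main() {
    let runtime_data: Vec<u8> = (1..=4).collect();
    println!("{:?}", build_from_slice(&runtime_data).bytes);
    println!("{:?}", build_from_vec(runtime_data).bytes);
}
```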
If you really must create the data at runtime, and you really need it to last for 'static, then you can use one of the memory leaking methods such as Box::leak or Vec::leak to deliberately leak a heap allocation and ensure it is never freed.
While leaking memory is normally something one avoids, in this case it's actually a sensible thing to do. If the data must live forever then leaking it is actually the correct thing to do, semantically speaking. You don't want the memory to be freed, not ever, which is exactly what happens when memory is leaked.
Example:
fn require_static_data(data: &'static [u8]) {
    unimplemented!()
}
fn main() {
    let data = vec![1, 2, 3, 4];
    require_static_data(data.leak());
}
That said, really think over the reallys I led with. Make sure you understand why the code you're calling wants 'static data and ask yourself why your data isn't already 'static.
Is it possible to create the data at compile time? Rust has a powerful compile-time macro system. It's possible, for example, to use include_bytes! to read in a file at build time so that its contents are embedded in your executable as 'static data (any processing of the file would have to happen in const code or a build script).
Is there another API you can use, another function call you're not seeing that doesn't require 'static?
(These questions aren't for you specifically, but for anyone who comes across this Q&A in the future.)
If the data is created at runtime, it can't normally have a static lifetime (short of deliberately leaking it). Static means that data is present for the whole lifetime of the program, which is necessary in some contexts, especially when threading is involved. One way for data to be static is, as Paul already answered, explicitly declaring it as such, i.e.:
static CONSTANT_VALUE: i32 = 0;
However, there's no universally applicable way to make arbitrary data static. This type of inference is made at compile-time by the borrow checker, not by the programmer.
Usually if a function requires 'static (type) arguments (as in this case) it means that anything less could potentially be unsafe, and you need to reorganize the way data flows in and out of your program to provide this type of data safely. Unfortunately, that's not something SO can provide within the scope of this question.
Make a constant with static lifetime:
static NUM: i32 = 18;

Why do I have to declare a variable as mutable in order for internal functions to modify its own contents?

I have a CPU struct with a load_rom method:
use std::fs::File;
use std::io::{self, Read};
pub struct CPU {
    pub mem: [u8; 4096],
    V: [u8; 16],
    I: u16,
    stack: [u16; 16],
    opcode: u16,
}
impl CPU {
    pub fn new() -> CPU {
        CPU {
            mem: [0; 4096],
            V: [0; 16],
            I: 0,
            stack: [0; 16],
            opcode: 0,
        }
    }
    pub fn load_rom(&self, filepath: &str) {
        let mut rom: Vec<u8> = Vec::new();
        let mut file = File::open(filepath).unwrap();
        file.read_to_end(&mut rom);
        for (i, mut byte) in rom.iter().enumerate() {
            self.mem[i] = *byte;
        }
    }
}
fn main() {}
This generates the error:
error: cannot assign to immutable indexed content `self.mem[..]`
--> src/main.rs:28:13
|
28 | self.mem[i] = *byte;
| ^^^^^^^^^^^^^^^^^^^
When I create a CPU with let mut cpu = CPU::new(); and pass &mut self to the load_rom method, everything works just fine.
If I don't use mut on creation, I get the error:
error: cannot borrow immutable local variable `cpu` as mutable
--> src/main.rs:10:2
|
9 | let cpu = CPU::new();
| --- use `mut cpu` here to make mutable
10 | cpu.load_rom("/Users/.../Code/Rust/chip8/src/roms/connect4.ch8");
| ^^^ cannot borrow mutably
It doesn't seem right that I have to make cpu mutable in order for internal functions to modify its own contents. Do I really have to declare cpu as mutable? Or am I missing something?
make cpu mutable in order for internal functions to modify its own contents
(emphasis mine)
Rust is a systems language, which means that it attempts to give you the ability to create fast and efficient code. One of the primary ways that this is done is by providing references to existing data instead of copying it.
Rust is also a safe language, which (among other things) means that accessing an invalid reference should be impossible.
To accomplish both of these goals, there have to be tradeoffs. Some languages move the safety checks to runtime, enforce mandatory synchronization primitives (e.g. a mutex and friends), or some other interesting solution. Some languages avoid the mess entirely and opt to disallow references or not attempt to guarantee safety.
Rust differs from these by checking as many things at compile time as feasible. This implies that the compiler has to be able to reason about when and where a piece of memory might be mutated.
If it didn't know this, then you might get a reference to something within a value and then call a mutating method on that value that invalidates the reference. When you go to use the now-invalid reference... BOOOOOM. Your program crashes at best, or leaks information or creates a backdoor at worst.
&mut self is an indication to the compiler that this method might mutate the values within. It is only valid to get a mutable reference to a value that is already mutable, which is denoted by the mut keyword on a variable binding (mut cpu here).
However, this isn't just useful to the compiler. Knowing that something is being changed is highly valuable to the programmer too. Mutability in a large system adds hard-to-reason-about complexity, and being forced to explicitly list when something is and isn't mutable can be very informative and mentally freeing.
It's also useful to know the rules for borrowing that Rust applies. These restrict you to one or the other of:
* one or more references (`&T`) to a resource,
* exactly one mutable reference (`&mut T`).
Succinctly, this can be summed up as "aliasing XOR mutability".
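A small sketch of those two rules in action. The function name and values are made up; the comments mark where the compiler would object:

```rust
fn shared_then_exclusive() -> (usize, Vec<i32>) {
    let mut value = vec![1, 2, 3];

    // Any number of shared references may coexist...
    let a = &value;
    let b = &value;
    let seen = a.len() + b.len();

    // ...and once they are no longer used, one exclusive reference is allowed.
    let m = &mut value;
    m.push(4);

    // Using `a` or `b` after this point would be rejected by the compiler,
    // because their borrows would then overlap with the mutable borrow `m`.
    (seen, value)
}

fn main() {
    println!("{:?}", shared_then_exclusive());
}
```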
If your mutation is truly internal, then you can also make use of interior mutability, such as by using a RefCell or a Mutex. What you use depends on your needs and what kind of data you want to store.
These constructs are a good mental fit for structures like caches, where you want to "hide" the mutability from the outside. However, there are also limitations to these, as the lifetime of references to the data within must be shortened to continue providing the "aliasing XOR mutability" guarantee that keeps the code safe.
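As an illustration of the cache case, here is a minimal sketch using RefCell. The Squares type and its methods are invented for the example:

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Hypothetical example type: caches squares behind a `&self` method.
struct Squares {
    cache: RefCell<HashMap<u64, u64>>,
}

impl Squares {
    fn new() -> Self {
        Squares { cache: RefCell::new(HashMap::new()) }
    }

    // Takes `&self`, yet fills the cache: the mutation is an implementation
    // detail, and RefCell checks the borrow rules at run time instead.
    fn get(&self, n: u64) -> u64 {
        let mut cache = self.cache.borrow_mut();
        *cache.entry(n).or_insert_with(|| n * n)
    }

    // Number of cached entries, read through a shared borrow.
    fn cached(&self) -> usize {
        self.cache.borrow().len()
    }
}

fn main() {
    let squares = Squares::new(); // note: no `mut` binding needed
    println!("{}", squares.get(4));
    println!("{}", squares.get(4));
    println!("{} entry cached", squares.cached());
}
```

The borrow returned by borrow_mut is kept short on purpose: holding it while calling borrow or borrow_mut again on the same RefCell would panic at run time.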
For your specific problem, I agree with the commenters that it makes sense for load_rom to accept a &mut self. It can even be simplified:
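pub fn load_rom(&mut self, filepath: &str) {
    let mut file = File::open(filepath).unwrap();
    // Note: read_exact returns Err(UnexpectedEof) when the ROM is shorter
    // than `mem`; the result is explicitly ignored here.
    let _ = file.read_exact(&mut self.mem);
}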
You may want to zero out any old data before loading. Otherwise, if you load a second ROM that's smaller than the first, data from the first ROM can leak to the second (an actual bug from older computers / operating systems).
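A sketch of the zero-then-load idea, under a few assumptions: the struct is trimmed down to just mem, load_rom is made generic over Read so it can be exercised with in-memory bytes, and it reads in a loop (rather than with read_exact) so that ROMs shorter than mem load without an error:

```rust
use std::io::{self, Read};

// Trimmed-down stand-in for the question's CPU struct.
pub struct Cpu {
    pub mem: [u8; 4096],
}

impl Cpu {
    pub fn new() -> Cpu {
        Cpu { mem: [0; 4096] }
    }

    // Generic over `Read` (an assumption for testability); a `File` works too.
    // Returns the number of bytes loaded.
    pub fn load_rom<R: Read>(&mut self, mut source: R) -> io::Result<usize> {
        self.mem.fill(0); // clear leftovers from a previously loaded ROM
        let mut loaded = 0;
        while loaded < self.mem.len() {
            let n = source.read(&mut self.mem[loaded..])?;
            if n == 0 {
                break; // end of input
            }
            loaded += n;
        }
        Ok(loaded)
    }
}

fn main() -> io::Result<()> {
    let mut cpu = Cpu::new();
    cpu.load_rom(&[1u8, 2, 3, 4][..])?;
    cpu.load_rom(&[9u8][..])?; // a smaller second ROM
    println!("{:?}", &cpu.mem[..4]);
    Ok(())
}
```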
Rust uses a transitive immutability model. This means that if a variable is marked as immutable, the variable may not be mutated, and data accessed through the variable may not be mutated.
Furthermore, if you have a mutable reference to a variable, the type system disallows any immutable references from coexisting; and so data not marked as `mut` is truly unchanging throughout the lifetime of the immutable reference.
Together this makes it so that by default, it is not possible for there to be two mutable references to the same data at the same time. This is a requirement for efficient memory safety and thread safety; and also makes it simpler to reason about mutation in Rust code.
If you want "interior" mutability you can use Cell<T> or RefCell<T> from the std::cell module. However, this is probably the wrong thing to do for a CPU struct that is meant to represent a CPU that is expected to be run, and have its state change after each operation. Interior mutability should generally be reserved for performing mutation within the implementation of an operation that does not perform any logical (externally visible) mutation of an object. A CPU running operations or loading memory would not be a good candidate for this, as each operation such as "load memory", "run instruction" or whatever will alter the logical state of the CPU.
See the std::cell documentation for further discussion of when you might want interior mutability.

Borrow problems with compiled SQL statements

My program uses rusqlite to build a database from another data source. The database builds multiple tables in the same manner, so I thought I'd make a reusable function to do so:
fn download_generic<Inserter>(table_name: &str,
                              connection: &mut rusqlite::Connection,
                              inserter: &mut Inserter)
                              -> Result<(), String>
    where Inserter: FnMut(&str, &json::JsonValue) -> ()
{}
inserter is a function that binds the correct values from a previously-prepared statement and does the insertion.
I call it like this:
let mut insert_stmt = connection
    .prepare("insert or replace into categories values(?,?);")
    .unwrap();
download_generic("categories",
                 &mut connection,
                 &mut |uuid, jsonproperties| {
                     insert_stmt.execute(&[&uuid, &jsonproperties["name"].as_str().unwrap_or("")]);
                 });
However I can't pass &mut connection to download_generic because it's already being borrowed by the insert_stmt. Putting it into a RefCell makes no sense because I shouldn't need runtime overhead to make this work.
I could try making the insert_stmt generated by a lambda that you pass to download_generic, but then I get overwhelmed by having to add lifetime markers everywhere, and it seems unnatural, anyway.
By design, Rust prevents you from having an immutable borrow and a mutable borrow on the same object active at the same time. This is to prevent dangling pointers and data races.
In rusqlite's API, some methods on Connection require a mutable self, and some methods only require an immutable self. However, some of the methods that only require an immutable self return objects that keep that borrow active; prepare is an example of this. Therefore, as long as one of these objects stays in scope, Rust will not allow you to take a mutable borrow on the Connection.
There's probably a reason why some methods take self by mutable reference. Requiring a mutable reference assures the callee that it has exclusive access to that object. If you think that might not be the case for the methods you need to use, or you think there could be another way to solve this, you should report an issue to the library's maintainers.
Regarding prepare specifically, you can work around the conflicting borrows by calling prepare_cached from within the closure instead. In order to do that, you'll have to make download_generic pass the connection back as a parameter to the closure, otherwise you'd have two mutable borrows on connection and that's not allowed.
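The "pass the connection back to the closure" shape can be sketched without rusqlite. Everything below is a hypothetical stand-in: Conn, its begin and insert methods, and the hard-coded items are invented to show the borrow structure only:

```rust
// Hypothetical stand-in for a database connection; NOT rusqlite's API.
struct Conn {
    rows: Vec<(String, String)>,
}

impl Conn {
    // Needs exclusive access, like the methods that take `&mut self`.
    fn begin(&mut self) {}

    fn insert(&mut self, table: &str, value: &str) {
        self.rows.push((table.to_string(), value.to_string()));
    }
}

// The closure receives the connection as an argument instead of capturing
// something that borrows from it, so only one borrow is live at a time.
fn download_generic<F>(table_name: &str, connection: &mut Conn, mut inserter: F) -> Result<(), String>
where
    F: FnMut(&mut Conn, &str, &str),
{
    connection.begin();
    // Made-up stand-in for the downloaded data.
    for item in ["alpha", "beta"] {
        inserter(&mut *connection, table_name, item);
    }
    Ok(())
}

fn main() {
    let mut conn = Conn { rows: Vec::new() };
    download_generic("categories", &mut conn, |c, table, name| c.insert(table, name)).unwrap();
    println!("{} rows inserted", conn.rows.len());
}
```

With rusqlite, the closure body would be where prepare_cached is called on the connection it is handed, as described above.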

How do I use static lifetimes with threads?

I'm currently struggling with lifetimes in Rust (1.0), especially when it comes to passing structs via channels.
How would I get this simple example to compile:
use std::sync::mpsc::{Receiver, Sender};
use std::sync::mpsc;
use std::thread::spawn;
use std::io;
use std::io::prelude::*;
struct Message<'a> {
    text: &'a str,
}
fn main() {
    let (tx, rx): (Sender<Message>, Receiver<Message>) = mpsc::channel();
    let _handle_receive = spawn(move || {
        for message in rx.iter() {
            println!("{}", message.text);
        }
    });
    let stdin = io::stdin();
    for line in stdin.lock().lines() {
        let message = Message {
            text: &line.unwrap()[..],
        };
        tx.send(message).unwrap();
    }
}
I get:
error[E0597]: borrowed value does not live long enough
--> src/main.rs:23:20
|
23 | text: &line.unwrap()[..],
| ^^^^^^^^^^^^^ does not live long enough
...
26 | }
| - temporary value only lives until here
|
= note: borrowed value must be valid for the static lifetime...
I can see why this is (line only lives for one iteration of for), but I can't figure out what the right way of doing this is.
Should I, as the compiler hints, try to convert the &str into &'static str?
Am I leaking memory if every line would have a 'static lifetime?
When am I supposed to use 'static anyway? Is it something I should try to avoid or is it perfectly OK?
Is there a better way of passing Strings in structs via channels?
I apologize for those naive questions. I've spent quite some time searching already, but I can't quite wrap my head around it. It's probably my dynamic language background getting in the way :)
As an aside: Is &input[..] for converting a String into a &str considered OK? It's the only stable way I could find to do this.
You can't convert &'a T into &'static T except by leaking memory. Luckily, this is not necessary at all. There is no reason to send borrowed pointers to the thread and keep the lines on the main thread. You don't need the lines on the main thread. Just send the lines themselves, i.e. send String.
If access from multiple threads is necessary (and you don't want to clone), use Arc<String> (Arc<str> also works in current Rust). This way the string is properly shared between threads, so that it will be deallocated exactly when no thread uses it any more.
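A short sketch of sharing one line between several threads through an Arc. It uses Arc<str>, built with Arc::from(String); the helper name share_line and the reader count are assumptions for the example:

```rust
use std::sync::Arc;
use std::thread;

// Hypothetical helper: every reader thread gets its own handle to one string.
fn share_line(line: String, readers: usize) -> Vec<usize> {
    let shared: Arc<str> = Arc::from(line); // one allocation, shared by all
    let handles: Vec<_> = (0..readers)
        .map(|_| {
            let text = Arc::clone(&shared); // bumps the reference count only
            thread::spawn(move || text.len())
        })
        .collect();
    // The string is freed when the last Arc handle is dropped.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    println!("{:?}", share_line("hello".to_string(), 3));
}
```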
Sending non-'static references between threads is unsafe because you never know how long the other thread will keep using them, so you don't know when the borrow expires and the object can be freed. Note that scoped threads (which aren't in 1.0 but are being redesigned as we speak) don't have this problem and do allow this, but regular, spawned threads do have it.
'static is not something you should avoid, it is perfectly fine for what it does: Denoting that a value lives for the entire duration the program is running. But if that is not what you're trying to convey, of course it is the wrong tool.
Think about it this way: A thread has no syntactical lifetime, i.e. the thread will not be dropped at the end of code block where it was created. Whatever data you send to the thread, you must be sure that it will live as long as the thread does, which means forever. Which means 'static.
What can go wrong in your case, is if the main loop sends a reference to a thread and destroys the string before it has been handled by the thread. The thread would access invalid memory when dealing with the string.
One option would be to put your lines into some statically allocated container, but this would mean that you can never destroy those strings. Generally speaking, that is a bad idea. Another option is to think: does the main thread actually need the line once it is read? What if the main thread transferred responsibility for line to the handling thread?
struct Message {
    text: String,
}
for line in stdin.lock().lines() {
    let message = Message {
        text: line.unwrap(),
    };
    tx.send(message).unwrap();
}
Now you are transferring ownership (move) from the main thread to the handler thread. Because you move your value, no references are involved and no checks for lifetime apply anymore.
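Put together as a complete program, the ownership-transfer version might look like the sketch below. To keep it easy to run and test, the lines come from any iterator of String rather than stdin, the receiver upper-cases each line instead of printing it, and the relay helper is made up for the example:

```rust
use std::sync::mpsc;
use std::thread;

struct Message {
    text: String,
}

// Hypothetical helper: takes lines from any iterator instead of stdin.
fn relay<I>(lines: I) -> Vec<String>
where
    I: IntoIterator<Item = String>,
{
    let (tx, rx) = mpsc::channel::<Message>();
    let receiver = thread::spawn(move || {
        // The iterator ends once every Sender has been dropped.
        rx.iter().map(|m| m.text.to_uppercase()).collect::<Vec<_>>()
    });
    for line in lines {
        // The String is moved into the message, then into the channel.
        tx.send(Message { text: line }).unwrap();
    }
    drop(tx); // close the channel so the receiver loop finishes
    receiver.join().unwrap()
}

fn main() {
    let lines = vec!["first line".to_string(), "second line".to_string()];
    for text in relay(lines) {
        println!("{}", text);
    }
}
```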
