Borrow checker and shared I/O

I'm hitting a problem in my code where multiple structs need to send data to a shared output sink and the borrow checker doesn't like it.
struct SharedWriter {
count: u32,
}
impl SharedWriter {
pub fn write(&mut self) {
self.count += 1;
}
}
struct Container<'a> {
writer: &'a mut SharedWriter,
}
impl Container<'_> {
pub fn write(&mut self) {
self.writer.write();
}
}
pub fn test() {
let mut writer = SharedWriter { count: 0 };
let mut c0 = Container {
writer: &mut writer,
};
let mut c1 = Container {
// compiler chokes here with:
// cannot borrow `writer` as mutable more than once at a time
writer: &mut writer,
};
c0.write();
c1.write();
}
I understand the problem and why it's happening; you can't borrow something as mutable more than once at a time.
What I don't understand is a good general solution. This pattern happens a lot. You've got a common output sink, like a file or a socket or a database, and you want to feed multiple streams of data to it. It has to be mutable if it maintains any kind of state. It has to be just a single entity if it holds any resources.
You could pass a reference to the sink in every single write() method (write(&mut writer, some_data)), but this clutters the code, and in my particular app it will be called millions of times per second. I suspect there is some extra overhead in passing this parameter over and over.
Is there some syntax that will get past this problem?

Interior mutability.
In your case the easiest way is probably to use RefCell. It will have some runtime overhead, but it is safe.
use std::cell::RefCell;
struct SharedWriter {
count: RefCell<u32>,
}
impl SharedWriter {
pub fn new(count: u32) -> Self {
Self { count: RefCell::new(count) }
}
pub fn write(&self) {
*self.count.borrow_mut() += 1;
}
}
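Because `write` now takes `&self`, the containers can hold shared references instead of `&mut`, and any number of them can coexist. A minimal sketch, assuming the question's `Container` is changed to store `&'a SharedWriter`; the `count()` accessor is hypothetical and only added to inspect the result:

```rust
use std::cell::RefCell;

struct SharedWriter {
    count: RefCell<u32>,
}

impl SharedWriter {
    pub fn new(count: u32) -> Self {
        Self { count: RefCell::new(count) }
    }

    // Takes &self: the RefCell performs the mutation, checked at runtime.
    pub fn write(&self) {
        *self.count.borrow_mut() += 1;
    }

    // Hypothetical accessor, used only to look at the result.
    pub fn count(&self) -> u32 {
        *self.count.borrow()
    }
}

// Each container holds a shared reference, so several can point at one writer.
struct Container<'a> {
    writer: &'a SharedWriter,
}

impl Container<'_> {
    pub fn write(&self) {
        self.writer.write();
    }
}

fn main() {
    let writer = SharedWriter::new(0);
    let c0 = Container { writer: &writer };
    let c1 = Container { writer: &writer };
    c0.write();
    c1.write();
    println!("count = {}", writer.count()); // count = 2
}
```

The only change the containers see is `&'a SharedWriter` instead of `&'a mut SharedWriter`; the exclusivity check moves from compile time to the `borrow_mut()` call.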
If the data is Copy (like u32, in case this is your real data), you may want to use Cell. It applies to fewer types but is zero-cost:
use std::cell::Cell;
struct SharedWriter {
count: Cell<u32>,
}
impl SharedWriter {
pub fn new(count: u32) -> Self {
Self { count: Cell::new(count) }
}
pub fn write(&self) {
self.count.set(self.count.get() + 1);
}
}
There are more interior mutability primitives (for example, UnsafeCell for zero-cost but unsafe access, or mutexes and atomics for thread-safe mutation).
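For the thread-safe case, here is a sketch that assumes the state really is a simple counter. It uses `AtomicU32` in place of `RefCell<u32>` and `std::thread::scope` (stable since Rust 1.63) so the threads can borrow the writer. A real sink that holds a file or socket would more likely sit behind a `Mutex`:

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

struct SharedWriter {
    count: AtomicU32,
}

impl SharedWriter {
    pub fn new(count: u32) -> Self {
        Self { count: AtomicU32::new(count) }
    }

    // &self is enough: the atomic handles concurrent increments.
    pub fn write(&self) {
        self.count.fetch_add(1, Ordering::Relaxed);
    }

    pub fn count(&self) -> u32 {
        self.count.load(Ordering::Relaxed)
    }
}

struct Container<'a> {
    writer: &'a SharedWriter,
}

impl Container<'_> {
    pub fn write(&self) {
        self.writer.write();
    }
}

fn main() {
    let writer = SharedWriter::new(0);
    // Scoped threads are joined before scope() returns, so they may borrow `writer`.
    thread::scope(|s| {
        for _ in 0..4 {
            let c = Container { writer: &writer };
            s.spawn(move || {
                for _ in 0..1000 {
                    c.write();
                }
            });
        }
    });
    println!("count = {}", writer.count()); // count = 4000
}
```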

One option would be to use a channel. Here is an example of how that might look. This also has the added benefit of letting you scale your I/O across multiple threads. The helper moves the handler onto a new thread and runs a loop there: it blocks until a value sent through the sender is received, then calls func with that value and a mutable reference to the handler. The thread exits when all the senders have been dropped, and the handler is returned through the JoinHandle. However, one downside of this approach is that the channel only works in one direction.
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};
pub fn create_shared_io<T, H, F>(mut handler: H, mut func: F) -> (JoinHandle<H>, Sender<T>)
where
T: 'static + Send,
H: 'static + Send,
F: 'static + FnMut(&mut H, T) + Send,
{
let (send, recv) = channel();
let join_handle = thread::spawn(move || loop {
let value = match recv.recv() {
Ok(v) => v,
Err(_) => break handler,
};
func(&mut handler, value);
});
(join_handle, send)
}
And then it can be used similarly to your example. Since no data was passed in your example, it sends () as a placeholder.
pub fn main() {
let writer = SharedWriter { count: 0 };
println!("Starting!");
let (join_handle, sender) = create_shared_io(writer, |writer, _| {
writer.count += 1;
println!("Current count: {}", writer.count);
});
let mut c0 = Container {
writer: sender.clone(),
};
let mut c1 = Container {
writer: sender,
};
c0.write();
c1.write();
// Drop the senders before joining the io thread. Otherwise they would only be
// dropped at the end of main, after join(), and the thread would wait forever.
std::mem::drop((c0, c1));
// Writer is returned when the thread is joined
let writer = join_handle.join().unwrap();
println!("Finished! Final count: {}", writer.count);
}
struct SharedWriter {
count: u32,
}
struct Container {
writer: Sender<()>,
}
impl Container {
pub fn write(&mut self) {
self.writer.send(()).unwrap();
}
}
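A sketch of the same helper carrying real data instead of `()`. The handler here is a hypothetical `Vec<String>` log standing in for a file or socket, and the `name` field on `Container` is added only for illustration; `create_shared_io` is repeated from above so the block is self-contained:

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

pub fn create_shared_io<T, H, F>(mut handler: H, mut func: F) -> (JoinHandle<H>, Sender<T>)
where
    T: 'static + Send,
    H: 'static + Send,
    F: 'static + FnMut(&mut H, T) + Send,
{
    let (send, recv) = channel();
    let join_handle = thread::spawn(move || loop {
        match recv.recv() {
            Ok(value) => func(&mut handler, value),
            // All senders dropped: hand the handler back to whoever joins.
            Err(_) => break handler,
        }
    });
    (join_handle, send)
}

// Each container owns a Sender; cloning a Sender is cheap.
struct Container {
    name: &'static str,
    writer: Sender<String>,
}

impl Container {
    pub fn write(&self, data: &str) {
        self.writer.send(format!("{}: {}", self.name, data)).unwrap();
    }
}

fn main() {
    // Hypothetical sink: a Vec<String> that records every line it receives.
    let (join_handle, sender) =
        create_shared_io(Vec::<String>::new(), |log, line: String| log.push(line));
    let c0 = Container { name: "c0", writer: sender.clone() };
    let c1 = Container { name: "c1", writer: sender };
    c0.write("hello");
    c1.write("world");
    // Dropping every Sender ends the loop on the I/O thread.
    drop((c0, c1));
    let log = join_handle.join().unwrap();
    println!("{:?}", log); // ["c0: hello", "c1: world"]
}
```

Messages from one sending thread arrive in the order they were sent, which is why the log order is predictable here.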

Related

Lockless processing of non overlapping non contiguous indexes by multiple threads in Rust

I am practicing Rust and decided to create a matrix ops/factorization project.
Basically, I want to be able to process the underlying vector in multiple threads. Since I will be providing each thread non-overlapping indexes (which may or may not be contiguous) and the threads will be joined before the end of whatever function created them, there is no need for a lock/synchronization.
I know that there are several crates that can do this, but I would like to know if there is a relatively idiomatic crate-free way to implement it on my own.
The best I could come up with is (simplified the code a bit):
use std::thread;
//This represents the Matrix
#[derive(Debug, Clone)]
pub struct MainStruct {
pub data: Vec<f64>,
}
//This is the bit that will be shared by the threads,
//ideally it should have its lifetime tied to that of MainStruct
//but i have no idea how to make phantomdata work in this case
#[derive(Debug, Clone)]
pub struct SliceTest {
pub data: Vec<SubSlice>,
}
//This struct is to hide *mut f64 to allow it to be shared to other threads
#[derive(Debug, Clone)]
pub struct SubSlice {
pub data: *mut f64,
}
impl MainStruct {
pub fn slice(&mut self) -> (SliceTest, SliceTest) {
let mut out_vec_odd: Vec<SubSlice> = Vec::new();
let mut out_vec_even: Vec<SubSlice> = Vec::new();
unsafe {
let ptr = self.data.as_mut_ptr();
for i in 0..self.data.len() {
let ptr_to_push = ptr.add(i);
//Non contiguous idxs
if i % 2 == 0 {
out_vec_even.push(SubSlice{data:ptr_to_push});
} else {
out_vec_odd.push(SubSlice{data:ptr_to_push});
}
}
}
(SliceTest{data: out_vec_even}, SliceTest{data: out_vec_odd})
}
}
impl SubSlice {
pub fn set(&self, val: f64) {
unsafe {*(self.data) = val;}
}
}
unsafe impl Send for SliceTest {}
unsafe impl Send for SubSlice {}
fn main() {
let mut maindata = MainStruct {
data: vec![0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
};
let (mut outvec1, mut outvec2) = maindata.slice();
let mut threads = Vec::new();
threads.push(
thread::spawn(move || {
for i in 0..outvec1.data.len() {
outvec1.data[i].set(999.9);
}
})
);
threads.push(
thread::spawn(move || {
for i in 0..outvec2.data.len() {
outvec2.data[i].set(999.9);
}
})
);
for handles in threads {
handles.join();
}
println!("maindata = {:?}", maindata.data);
}
EDIT:
Following kmdreko's suggestion below, I got the code to work exactly how I wanted without using unsafe code, yay!
Of course, in terms of performance, copying the f64 slices may be cheaper than building vectors of mutable references, unless the matrix elements are larger structs rather than f64.
extern crate crossbeam;
use crossbeam::thread;
#[derive(Debug, Clone)]
pub struct Matrix {
data: Vec<f64>,
m: usize, //number of rows
n: usize, //number of cols
}
...
impl Matrix {
...
pub fn get_data_mut(&mut self) -> &mut Vec<f64> {
&mut self.data
}
pub fn calculate_idx(max_cols: usize, i: usize, j: usize) -> usize {
let actual_idx = j + max_cols * i;
actual_idx
}
//Get individual mutable references for contiguous indexes (rows)
pub fn get_all_row_slices(&mut self) -> Vec<Vec<&mut f64>> {
let max_cols = self.max_cols();
let max_rows = self.max_rows();
let inner_data = self.get_data_mut().chunks_mut(max_cols);
let mut out_vec: Vec<Vec<&mut f64>> = Vec::with_capacity(max_rows);
for chunk in inner_data {
let row_vec = chunk.iter_mut().collect();
out_vec.push(row_vec);
}
out_vec
}
//Get mutable references for disjoint indexes (columns)
pub fn get_all_col_slices(&mut self) -> Vec<Vec<&mut f64>> {
let max_cols = self.max_cols();
let max_rows = self.max_rows();
let inner_data = self.get_data_mut().chunks_mut(max_cols);
let mut out_vec: Vec<Vec<&mut f64>> = Vec::with_capacity(max_cols);
for _ in 0..max_cols {
out_vec.push(Vec::with_capacity(max_rows));
}
let mut inner_idx = 0;
for chunk in inner_data {
let row_vec_it = chunk.iter_mut();
for elem in row_vec_it {
out_vec[inner_idx].push(elem);
inner_idx += 1;
}
inner_idx = 0;
}
out_vec
}
...
}
fn test_multithreading() {
fn test(in_vec: Vec<&mut f64>) {
for elem in in_vec {
*elem = 33.3;
}
}
fn launch_task(mat: &mut Matrix, f: fn(Vec<&mut f64>)) {
let test_vec = mat.get_all_row_slices();
thread::scope(|s| {
for elem in test_vec.into_iter() {
s.spawn(move |_| {
println!("Spawning thread...");
f(elem);
});
}
}).unwrap();
}
let rows = 4;
let cols = 3;
//new function code omitted, returns Result<Self, MatrixError>
let mut mat = Matrix::new(rows, cols).unwrap();
launch_task(&mut mat, test);
for i in 0..rows {
for j in 0..cols {
//Requires index trait implemented for matrix
assert_eq!(mat[(i, j)], 33.3);
}
}
}

This API is unsound. Since there is no lifetime annotation binding SliceTest and SubSlice to the MainStruct, they can be preserved after the data has been destroyed and if used would result in use-after-free errors.
It's easy to make it safe, though; you can use .iter_mut() to get distinct mutable references to your elements:
pub fn slice(&mut self) -> (Vec<&mut f64>, Vec<&mut f64>) {
let mut out_vec_even = vec![];
let mut out_vec_odd = vec![];
for (i, item_ref) in self.data.iter_mut().enumerate() {
if i % 2 == 0 {
out_vec_even.push(item_ref);
} else {
out_vec_odd.push(item_ref);
}
}
(out_vec_even, out_vec_odd)
}
However, this surfaces another problem: thread::spawn cannot hold references to local variables. The threads created are allowed to live beyond the scope they're created in, so even though you did .join() them, you aren't required to. This was a potential issue in your original code as well; the compiler just couldn't warn about it.
There's no easy way to solve this with thread::spawn. You'd need a non-referential way to share the data with the other threads, which means Arc; but Arc doesn't allow mutating its contents, so you'd have to resort to a Mutex, which is what you've tried to avoid.
I would suggest reaching for scope from the crossbeam crate, which does allow you to spawn threads that reference local data. I know you've wanted to avoid using crates, but this is the best solution in my opinion.
See a working version on the playground.
See:
How to get multiple mutable references to elements in a Vec?
Can you specify a non-static lifetime for threads?
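Since Rust 1.63 the standard library also provides `std::thread::scope`, which covers the crate-free case the question asked about. A minimal sketch combining it with the safe `slice` above, assuming `MainStruct` is just the `Vec<f64>` wrapper from the question (the values written by each thread are arbitrary):

```rust
use std::thread;

#[derive(Debug, Clone)]
pub struct MainStruct {
    pub data: Vec<f64>,
}

impl MainStruct {
    // Split into disjoint &mut references: even indexes and odd indexes.
    pub fn slice(&mut self) -> (Vec<&mut f64>, Vec<&mut f64>) {
        let mut out_vec_even = vec![];
        let mut out_vec_odd = vec![];
        for (i, item_ref) in self.data.iter_mut().enumerate() {
            if i % 2 == 0 {
                out_vec_even.push(item_ref);
            } else {
                out_vec_odd.push(item_ref);
            }
        }
        (out_vec_even, out_vec_odd)
    }
}

fn main() {
    let mut maindata = MainStruct {
        data: vec![0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    };
    let (even, odd) = maindata.slice();
    // Every thread spawned on `s` is joined before scope() returns,
    // so the threads may hold references into `maindata`.
    thread::scope(|s| {
        s.spawn(move || {
            for x in even {
                *x = 999.9;
            }
        });
        s.spawn(move || {
            for x in odd {
                *x = -1.0;
            }
        });
    });
    // The borrows created by slice() have ended, so maindata is usable again.
    println!("maindata = {:?}", maindata.data); // [999.9, -1.0, 999.9, -1.0, 999.9, -1.0]
}
```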

Using a trait object in a background job (different thread)

I want to have a background worker which uses a trait implementation / object for some time. The background worker owns this object as long as it is used. After the background worker is "destroyed", the object should be free to be used again.
I tried to do everything with async/await, but it produced more problems. Therefore, I use plain threads to create a minimal example. At first I also used Box<&mut dyn ...> to pass the object to the background worker, but I think that is not even needed.
My minimal example contains a MyWriter trait which can write a string somewhere. There is one implementation which writes strings to stdout. A background worker uses this writer for a background job. The worker has a start method to start the job and a stop method to join it (in my real code I would use a channel to send a stop message to the worker and then join it).
I'll post the code and then a description with my problems:
https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=a01745c15ba1088acd2e3d287d60e270
use std::sync::Arc;
use std::sync::Mutex;
use std::thread::{spawn, JoinHandle};
/* Trait + an implementation */
trait MyWriter {
fn write(&mut self, text: &str);
}
struct StdoutWriter {}
impl StdoutWriter {
pub fn new() -> Self {
Self {}
}
}
impl MyWriter for StdoutWriter {
fn write(&mut self, text: &str) {
println!("{}", text);
}
}
/* A background job which uses a "MyWriter" */
struct BackgroundJob<'a> {
writer: Arc<Mutex<&'a dyn MyWriter>>,
job: Option<JoinHandle<()>>,
}
impl<'a> BackgroundJob<'a> {
pub fn new(writer: &'a mut dyn MyWriter) -> Self {
Self {
writer: Arc::new(Mutex::new(writer)),
job: None,
}
}
pub fn start(&mut self) {
assert!(self.job.is_none());
let writer = &self.writer;
self.job = Some(std::thread::spawn(move || {
// this background job uses "writer"
let mut my_writer = writer.lock().unwrap();
my_writer.write("x");
// do something
my_writer.write("y");
}));
}
pub fn stop(&mut self) {
if let Some(job) = self.job {
job.join().unwrap();
self.job = None;
}
}
}
/* Using BackgroundJob */
fn main() {
let mut writer = StdoutWriter::new();
writer.write("a");
{
let mut job = BackgroundJob::new(&mut writer);
// inside this block, writer is owned by "job"
job.start();
job.stop();
}
// writer should be usable again
writer.write("b");
}
The desired output on stdout is a\nx\ny\nb\n, but the program does not even compile. My main problem is that (dyn MyWriter + 'a) cannot be shared between threads safely (compiler error).
How can I implement Send / Sync for a trait? It does not seem to be possible. Actually, I assumed it should be OK if the object (or a reference to it) is inside Arc<Mutex<...>>, but that does not seem to be sufficient. Why not?
Maybe someone has an idea how this can be fixed or even more important what exactly is the underlying issue?

Putting a reference in an Arc doesn't work. Since the Arc can be kept alive indefinitely simply by cloning it, the reference could easily outlive whatever it was borrowed from, so that can't compile. You need to put an owned object in the Arc, such as Box<dyn MyWriter>. (Ideally you'd just use Arc<dyn MyWriter>, but that would conflict with returning the writer from the BackgroundJob, as shown below.)
Since you can't borrow from writer in main, you must move it into the BackgroundJob. But at this point you've relinquished ownership over writer, having moved the value to BackgroundJob, so your only option is to have BackgroundJob return the writer. However, since BackgroundJob keeps its writer behind a trait object, it can only give back the Box<dyn MyWriter> it stores, not the original StdoutWriter.
Here is the version that works that way, retaining type erasure and giving back the type-erased writer:
// Trait definition and StdoutWriter implementation unchanged
struct BackgroundJob {
writer: Arc<Mutex<Box<dyn MyWriter + Send>>>,
job: Option<JoinHandle<()>>,
}
impl BackgroundJob {
pub fn new(writer: Box<dyn MyWriter + Send>) -> Self {
Self {
writer: Arc::new(Mutex::new(writer)),
job: None,
}
}
pub fn start(&mut self) {
assert!(self.job.is_none());
let writer = Arc::clone(&self.writer);
self.job = Some(std::thread::spawn(move || {
// this background job uses "writer"
let mut my_writer = writer.lock().unwrap();
my_writer.write("x");
// do something
my_writer.write("y");
}));
}
pub fn stop(&mut self) {
if let Some(job) = self.job.take() {
job.join().unwrap();
}
}
pub fn into_writer(self) -> Box<dyn MyWriter> {
Arc::try_unwrap(self.writer)
.unwrap_or_else(|_| panic!())
.into_inner()
.unwrap()
}
}
fn main() {
let mut writer = StdoutWriter::new();
writer.write("a");
let mut writer = {
let mut job = BackgroundJob::new(Box::new(writer));
job.start();
job.stop();
job.into_writer()
};
writer.write("b");
}
A version that gave back the writer of the same type would have to give up on type erasure and be generic over the writer type. Though a bit more complex, its ownership semantics would be very close (at least conceptually) to what you originally envisioned:
struct BackgroundJob<W> {
writer: Arc<Mutex<W>>,
job: Option<JoinHandle<()>>,
}
impl<W: MyWriter + Send + 'static> BackgroundJob<W> {
pub fn new(writer: W) -> Self {
Self {
writer: Arc::new(Mutex::new(writer)),
job: None,
}
}
// start() and stop() are unchanged
pub fn into_writer(self) -> W {
Arc::try_unwrap(self.writer)
.unwrap_or_else(|_| panic!())
.into_inner()
.unwrap()
}
}
fn main() {
let mut writer = StdoutWriter::new();
writer.write("a");
{
// inside this block, writer is moved into "job"
let mut job = BackgroundJob::new(writer);
job.start();
job.stop();
// reclaim the writer
writer = job.into_writer();
}
writer.write("b");
}
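For reference, a self-contained sketch of the generic version with `start()` and `stop()` filled in from the type-erased version. `RecordingWriter` is a hypothetical writer that records lines instead of printing them, which makes the order of writes easy to check:

```rust
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

trait MyWriter {
    fn write(&mut self, text: &str);
}

// Hypothetical writer used for illustration: it keeps every line it receives.
struct RecordingWriter {
    lines: Vec<String>,
}

impl MyWriter for RecordingWriter {
    fn write(&mut self, text: &str) {
        self.lines.push(text.to_string());
    }
}

struct BackgroundJob<W> {
    writer: Arc<Mutex<W>>,
    job: Option<JoinHandle<()>>,
}

impl<W: MyWriter + Send + 'static> BackgroundJob<W> {
    pub fn new(writer: W) -> Self {
        Self { writer: Arc::new(Mutex::new(writer)), job: None }
    }

    pub fn start(&mut self) {
        assert!(self.job.is_none());
        let writer = Arc::clone(&self.writer);
        self.job = Some(thread::spawn(move || {
            let mut my_writer = writer.lock().unwrap();
            my_writer.write("x");
            my_writer.write("y");
        }));
    }

    pub fn stop(&mut self) {
        if let Some(job) = self.job.take() {
            job.join().unwrap();
        }
    }

    // Succeeds only after the thread has dropped its Arc clone, which stop() ensures.
    pub fn into_writer(self) -> W {
        Arc::try_unwrap(self.writer)
            .unwrap_or_else(|_| panic!("background job still holds the writer"))
            .into_inner()
            .unwrap()
    }
}

fn main() {
    let mut writer = RecordingWriter { lines: Vec::new() };
    writer.write("a");
    let mut job = BackgroundJob::new(writer);
    job.start();
    job.stop();
    let mut writer = job.into_writer();
    writer.write("b");
    println!("{:?}", writer.lines); // ["a", "x", "y", "b"]
}
```

`unwrap_or_else(|_| panic!(...))` is used instead of `unwrap()` because `Result::unwrap` would require `Arc<Mutex<W>>: Debug`, and therefore `W: Debug`.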

The main issue is that you want to pass a reference to the thread. The problem with that approach is that the thread can outlive the referenced object. Obviously this does not happen in your case, but the Rust compiler cannot reason about that.
The solution to that problem is to use Arc<Mutex<dyn MyWriter>> instead of Arc<Mutex<&dyn MyWriter>>: no lifetimes, no problems.
The next issue is with Mutex<T>: it can be sent across threads only if T can. So you have to make T, in your case dyn MyWriter, implement Send. This can be done in two ways:
Make MyWriter require Send; in that case the trait can be implemented only by Send types:
trait MyWriter : Send{
fn write(&mut self, text: &str);
}
Or use an additional trait bound; in that case your trait is less restrictive, but you must always specify MyWriter + Send when you want to send it across threads:
Arc<Mutex<dyn MyWriter + Send>>
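So far so good, but now your new() method does not work, because dyn MyWriter is not Sized. In order to fix that, you have to make your method generic (the 'static bound is needed because the dyn MyWriter + Send stored in the Arc is implicitly 'static):
pub fn new<T: MyWriter + Send + 'static>(writer: T) -> Self {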
Self {
writer: Arc::new(Mutex::new(writer)),
job: None,
}
}
or directly pass an Arc<Mutex<dyn MyWriter + Send>>:
pub fn new(writer: Arc<Mutex<dyn MyWriter + Send>>) -> Self {
Self { writer, job: None }
}
Full working code
use std::sync::Arc;
use std::sync::Mutex;
use std::thread::JoinHandle;
trait MyWriter {
fn write(&mut self, text: &str);
}
struct StdoutWriter {}
impl StdoutWriter {
pub fn new() -> Self {
Self {}
}
}
impl MyWriter for StdoutWriter {
fn write(&mut self, text: &str) {
println!("{}", text);
}
}
/* A background job which uses a "MyWriter" */
struct BackgroundJob {
writer: Arc<Mutex<dyn MyWriter + Send>>,
job: Option<JoinHandle<()>>,
}
impl BackgroundJob {
pub fn new(writer: Arc<Mutex<dyn MyWriter + Send>>) -> Self {
Self { writer, job: None }
}
pub fn start(&mut self) {
assert!(self.job.is_none());
let writer = self.writer.clone();
self.job = Some(std::thread::spawn(move || {
let mut my_writer = writer.lock().unwrap();
my_writer.write("x");
// do something
my_writer.write("y");
}));
}
pub fn stop(&mut self) {
if let Some(job) = self.job.take() {
job.join().unwrap();
}
}
}
fn main() {
let mut writer = StdoutWriter::new();
writer.write("a");
let writer = Arc::new(Mutex::new(writer));
{
let mut job = BackgroundJob::new(writer.clone());
// inside this block, the writer is shared with "job"
job.start();
job.stop();
}
// you have to acquire the lock in order to use the writer
writer.lock().unwrap_or_else(|e| e.into_inner()).write("b");
}

How to implement streams from future functions

In order to understand how streams work, I was trying to implement an infinite number generator that uses random.org. The first thing I did was implement a version where I call an async function called get_number, which fills a buffer and returns the next available number:
use std::mem::replace;
struct RandomGenerator {
buffer: Vec<u8>,
position: usize,
}
impl RandomGenerator {
pub fn new() -> RandomGenerator {
Self {
buffer: Vec::new(),
position: 0,
}
}
pub async fn get_number(&mut self) -> u8 {
self.fill_buffer().await;
let value = self.buffer[self.position];
self.position += 1;
value
}
async fn fill_buffer(&mut self) {
if self.buffer.is_empty() || self.is_buffer_depleted() {
let new_numbers = self.fetch_numbers().await;
drop(replace(&mut self.buffer, new_numbers));
self.position = 0;
}
}
fn is_buffer_depleted(&self) -> bool {
// depleted once every buffered number has been handed out
self.position >= self.buffer.len()
}
async fn fetch_numbers(&mut self) -> Vec<u8> {
let response = reqwest::get("https://www.random.org/integers/?num=10&min=1&max=100&col=1&base=10&format=plain&rnd=new").await.unwrap();
let numbers = response.text().await.unwrap();
numbers
.lines()
.map(|line| line.trim().parse::<u8>().unwrap())
.collect()
}
}
With this implementation, I can call get_number in a loop and get as many numbers as I want, but the idea was to have a stream so I can use composition functions like take, take_while, and others.
But the moment I try to implement a Stream, problems start to arise:
My first try was to have a struct that would hold a reference to the generator
struct RandomGeneratorStream<'a> {
generator: &'a mut RandomGenerator,
}
and then I've implemented the following Stream
impl<'a> Stream for RandomGeneratorStream<'a> {
type Item = u8;
fn poll_next(
self: std::pin::Pin<&mut Self>,
cx: &mut std::task::Context<'_>,
) -> std::task::Poll<Option<Self::Item>> {
let f = self.get_mut().generator.get_number();
pin_mut!(f);
f.poll_unpin(cx).map(Some)
}
}
but calling this would just hang the process
generator.into_stream().take(18).collect::<Vec<u8>>().await
On the next tries, I tried to hold a state of the future on the stream struct using pin_mut! but ended up having many errors with lifetimes without being able to solve them.
What can be done in that case?
Here is working code without the streams:
use std::mem::replace;
struct RandomGenerator {
buffer: Vec<u8>,
position: usize,
}
impl RandomGenerator {
pub fn new() -> RandomGenerator {
Self {
buffer: Vec::new(),
position: 0,
}
}
pub async fn get_number(&mut self) -> u8 {
self.fill_buffer().await;
let value = self.buffer[self.position];
self.position += 1;
value
}
async fn fill_buffer(&mut self) {
if self.buffer.is_empty() || self.is_buffer_depleted() {
let new_numbers = self.fetch_numbers().await;
drop(replace(&mut self.buffer, new_numbers));
self.position = 0;
}
}
fn is_buffer_depleted(&self) -> bool {
// depleted once every buffered number has been handed out
self.position >= self.buffer.len()
}
async fn fetch_numbers(&mut self) -> Vec<u8> {
let response = reqwest::get("https://www.random.org/integers/?num=10&min=1&max=100&col=1&base=10&format=plain&rnd=new").await.unwrap();
let numbers = response.text().await.unwrap();
numbers
.lines()
.map(|line| line.trim().parse::<u8>().unwrap())
.collect()
}
}
#[tokio::main]
async fn main() {
let mut generator = RandomGenerator::new();
dbg!(generator.get_number().await);
}
Here you can find a link to the first working sample (instead of calling random.org I've used a Cursor because DNS resolution was not working on the playground): https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=730eaf1f7db842877d3f3e7ca1c6d2a5
And my last try with streams you can find here https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=de0b212ee70865f6ac6c19430cd952cd

On the next tries, I tried to hold a state of the future on the stream struct using pin_mut! but ended up having many errors with lifetimes without being able to solve them.
You were on the right track, you would need to persist the future in order for poll_next to work properly.
Unfortunately, you'll run into a roadblock with mutable references. You're keeping a &mut RandomGenerator in order to use it repeatedly, but the future itself also has to keep a &mut RandomGenerator for it to do its job. This would violate the exclusivity of mutable references. Any way you cut it, you will likely face this problem.
The better way to go from Futures to a Stream is to follow the advice here and use futures::stream::unfold:
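// Inside impl RandomGenerator; requires `use futures::Stream;`
fn as_stream<'a>(&'a mut self) -> impl Stream<Item = u8> + 'a {
// unfold hands the &mut RandomGenerator to the closure, and the future
// gives it back alongside each number, so only one &mut exists at a time
futures::stream::unfold(self, |rng| async move {
let number = rng.get_number().await;
Some((number, rng))
})
}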
See it on the playground.
This may not necessarily help you learn more about streams, but the provided functions are usually better than hand-rolling. The key reason this avoids the multiple-mutable-references problem above is that the function generating the future takes ownership of the mutable reference, and the future gives it back when it's done. That way only one exists at a time. Even if you implemented Stream yourself, you'd have to use a similar mechanism.
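A std-only sketch of that mechanism, under several assumptions: `MyStream` is a hypothetical stand-in for `futures::Stream` with the same `poll_next` signature, `Generator` counts up instead of calling random.org, and `block_on_next` is a toy busy-polling driver, not a real executor such as Tokio. The stream persists one boxed future between polls; that future owns the generator and returns it together with the number, so only one `&mut` to the generator exists at a time:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical stand-in for futures::Stream, so the sketch needs only std.
trait MyStream {
    type Item;
    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>>;
}

// Hypothetical generator: counts up instead of calling random.org.
struct Generator {
    next: u8,
}

impl Generator {
    async fn get_number(&mut self) -> u8 {
        let n = self.next;
        self.next = self.next.wrapping_add(1);
        n
    }
}

// The in-flight future owns the generator and returns it with the number.
type NumberFuture = Pin<Box<dyn Future<Output = (u8, Generator)>>>;

enum State {
    Idle(Generator),
    Busy(NumberFuture),
    Done,
}

struct GeneratorStream {
    state: State,
}

impl MyStream for GeneratorStream {
    type Item = u8;

    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<u8>> {
        // GeneratorStream is Unpin (the future is boxed), so get_mut is allowed.
        let this = self.get_mut();
        loop {
            match std::mem::replace(&mut this.state, State::Done) {
                // Move the generator into a new future; only that future holds it now.
                State::Idle(mut generator) => {
                    this.state = State::Busy(Box::pin(async move {
                        let n = generator.get_number().await;
                        (n, generator)
                    }));
                }
                // Poll the persisted future; keep it if it is not finished yet.
                State::Busy(mut fut) => {
                    return match fut.as_mut().poll(cx) {
                        Poll::Ready((n, generator)) => {
                            this.state = State::Idle(generator);
                            Poll::Ready(Some(n))
                        }
                        Poll::Pending => {
                            this.state = State::Busy(fut);
                            Poll::Pending
                        }
                    };
                }
                // Only reached if an earlier poll panicked; treat the stream as finished.
                State::Done => return Poll::Ready(None),
            }
        }
    }
}

// Toy driver that busy-polls; fine here only because these futures never return Pending.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

fn block_on_next<S: MyStream + Unpin>(stream: &mut S) -> Option<S::Item> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(item) = Pin::new(&mut *stream).poll_next(&mut cx) {
            return item;
        }
    }
}

fn main() {
    let mut stream = GeneratorStream { state: State::Idle(Generator { next: 1 }) };
    let first: Vec<u8> = (0..3).map(|_| block_on_next(&mut stream).unwrap()).collect();
    println!("{:?}", first); // [1, 2, 3]
}
```

The important difference from the question's attempt is that a `Pending` future is stored back in `self.state` instead of being dropped, so the next poll resumes the same request rather than starting a new one.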

Rust struct within struct: borrowing, lifetime, generic types and more total confusion

I'm trying to modify an existing application, which forces me to learn Rust, and it's giving me a hard time.
I would like to have a struct with two fields:
pub struct Something<'a> {
pkt_wtr: PacketWriter<&'a mut Vec<u8>>,
buf: Vec<u8>,
}
Where 'buf' will be used as an io for PacketWriter to write its results. So PacketWriter is something like
use std::io::{self};
pub struct PacketWriter<T :io::Write> {
wtr :T,
}
impl <T :io::Write> PacketWriter<T> {
pub fn new(wtr :T) -> Self {
return PacketWriter {
wtr,
};
}
pub fn into_inner(self) -> T {
self.wtr
}
pub fn write(&mut self) {
self.wtr.write_all(&[10,11,12]).unwrap();
println!("wrote packet");
}
}
Then inside 'Something' I want to use PacketWriter this way: let it write what it needs in 'buf' and drain it by pieces.
impl Something<'_> {
pub fn process(&mut self) {
self.pkt_wtr.write();
let c = self.buf.drain(0..1);
}
}
What seems to be impossible is to create a workable constructor for 'Something'
impl Something<'_> {
pub fn new() -> Self {
let mut buf = Vec::new();
let pkt_wtr = PacketWriter::new(&mut buf);
return Something {
pkt_wtr: pkt_wtr,
buf: buf,
};
}
}
However I try, I cannot construct PacketWriter on a borrowed reference to 'buf' while 'buf' is also stored in the same 'Something' object.
I can give 'buf' entirely to 'PacketWriter' (as in the example below), but then I cannot access the contents of 'buf' later. I know the example below works, but only because I can still reach 'buf' after giving it to 'PacketWriter' (through 'wtr'). In reality, the 'wtr' field of 'PacketWriter' is private, and it's code I cannot modify, for example to add a getter for 'wtr'.
Thanks
I wrote a small working program to describe the intent and the problem, with the two options
use std::io::{self};
pub struct PacketWriter<T :io::Write> {
wtr :T,
}
impl <T :io::Write> PacketWriter<T> {
pub fn new(wtr :T) -> Self {
return PacketWriter {
wtr,
};
}
pub fn into_inner(self) -> T {
self.wtr
}
pub fn write(&mut self) {
self.wtr.write_all(&[10,11,12]).unwrap();
println!("wrote packet");
}
}
/*
// that does not work of course because buf is local but this is not the issue
pub struct Something<'a> {
pkt_wtr: PacketWriter<&'a mut Vec<u8>>,
buf: Vec<u8>,
}
impl Something<'_> {
pub fn new() -> Self {
let mut buf = Vec::new();
let pkt_wtr = PacketWriter::new(&mut buf);
//let mut pkt_wtr = PacketWriter::new(buf);
return Something {
pkt_wtr,
buf,
};
}
pub fn process(&mut self) {
self.pkt_wtr.write();
println!("process {:?}", self.buf);
}
}
*/
pub struct Something {
pkt_wtr: PacketWriter<Vec<u8>>,
}
impl Something {
pub fn new() -> Self {
let pkt_wtr = PacketWriter::new(Vec::new());
return Something {
pkt_wtr,
};
}
pub fn process(&mut self) {
self.pkt_wtr.write();
let file = &mut self.pkt_wtr.wtr;
println!("processing Something {:?}", file);
let c = file.drain(0..1);
println!("Drained {:?}", c);
}
}
fn main() -> std::io::Result<()> {
let mut file = Vec::new();
let mut wtr = PacketWriter::new(&mut file);
wtr.write();
println!("Got data {:?}", file);
{
let c = file.drain(0..2);
println!("Drained {:?}", c);
}
println!("Remains {:?}", file);
let mut data = Something::new();
data.process();
Ok(())
}

It's not totally clear what the question is, given that the code appears to compile, but I can take a stab at one part: why can't you use into_inner() on self.wtr inside the process function?
into_inner takes ownership of the PacketWriter that gets passed into its self parameter. (You can tell this because the parameter is spelled self, rather than &self or &mut self.) Taking ownership means that it is consumed: it cannot be used anymore by the caller and the callee is responsible for dropping it (read: running destructors). After taking ownership of the PacketWriter, the into_inner function returns just the wtr field and drops (runs destructors on) the rest. But where does that leave the Something struct? It has a field that needs to contain a PacketWriter, and you just took its PacketWriter away and destroyed it! The function ends, and the value held in the PacketWriter field is unknown: it can't be the thing that was in there from the beginning, because that was taken over by into_inner and destroyed. But it also can't be anything else.
Rust generally forbids structs from having uninitialized or undefined fields. You need to have that field defined at all times.
Here's the worked example:
pub fn process(&mut self) {
self.pkt_wtr.write();
// There's a valid PacketWriter in pkt_wtr
let raw_wtr: Vec<u8> = self.pkt_wtr.into_inner();
// The PacketWriter in pkt_wtr was consumed by into_inner!
// We have a raw_wtr of type Vec<u8>, but that's not the right type for pkt_wtr
// We could try to call this function here, but what would it do?
self.pkt_wtr.write();
println!("processing Something");
}
(Note: The example above has slightly squishy logic. Formally, because you don't own self, you can't do anything that would take ownership of any part of it, even if you put everything back neatly when you're done.)
You have a few options to fix this, but with one major caveat: with the public interface you have described, there is no way to get access to the PacketWriter::wtr field and put it back into the same PacketWriter. You'll have to extract the PacketWriter::wtr field and put it into a new PacketWriter.
Here's one way you could do it. Remember, the goal is to have self.pkt_wtr defined at all times, so we'll use a function called mem::replace to put a dummy PacketWriter into self.pkt_wtr. This ensures that self.pkt_wtr always has something in it. (Vec::new() does not allocate, so the dummy is cheap.)
pub fn process(&mut self) {
self.pkt_wtr.write();
// Create a new dummy PacketWriter and swap it with self.pkt_wtr
// Returns an owned version of pkt_wtr that we're free to consume
let pkt_wtr_owned = std::mem::replace(&mut self.pkt_wtr, PacketWriter::new(Vec::new()));
// Consume pkt_wtr_owned, returning its wtr field
let raw_wtr = pkt_wtr_owned.into_inner();
// Do anything you want with raw_wtr here -- you own it.
println!("The vec is: {:?}", &raw_wtr);
// Create a new PacketWriter with the old PacketWriter's buffer.
// The dummy PacketWriter is dropped here.
self.pkt_wtr = PacketWriter::new(raw_wtr);
println!("processing Something");
}
This solution is definitely a hack, and it's potentially a place where the borrow checker could be improved to realize that leaving a field temporarily undefined is fine, as long as it's not accessed before it is assigned again. (Though there may be an edge case I missed; this stuff is hard to reason about in general.) Additionally, this is the kind of thing that can be optimized away by later compiler passes through dead store elimination.
If this turns out to be a hotspot when profiling, there are unsafe techniques that would allow the field to be invalid for that period, but that would probably need a new question.
However, my recommendation would be to find a way to get an "escape hatch" function added to PacketWriter that lets you do exactly what you want to do: get a mutable reference to the inner wtr without taking ownership of PacketWriter.
impl<T: io::Write> PacketWriter<T> {
pub fn inner_mut(&mut self) -> &mut T {
&mut self.wtr
}
}
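If `PacketWriter` could gain such a method, `Something` would no longer need the swap. A minimal sketch, assuming `inner_mut` is added as shown; it is not part of the `PacketWriter` described in the question, and returning the drained bytes from `process` is an illustrative change:

```rust
use std::io::Write;

// Minimal copy of the question's PacketWriter, plus the hypothetical inner_mut().
pub struct PacketWriter<T: Write> {
    wtr: T,
}

impl<T: Write> PacketWriter<T> {
    pub fn new(wtr: T) -> Self {
        PacketWriter { wtr }
    }

    pub fn write(&mut self) {
        self.wtr.write_all(&[10, 11, 12]).unwrap();
    }

    // Hypothetical escape hatch: borrow the inner writer without consuming self.
    pub fn inner_mut(&mut self) -> &mut T {
        &mut self.wtr
    }
}

pub struct Something {
    pkt_wtr: PacketWriter<Vec<u8>>,
}

impl Something {
    pub fn new() -> Self {
        Something { pkt_wtr: PacketWriter::new(Vec::new()) }
    }

    // Writes a packet, then drains the first byte through the borrowed buffer.
    pub fn process(&mut self) -> Vec<u8> {
        self.pkt_wtr.write();
        self.pkt_wtr.inner_mut().drain(0..1).collect()
    }
}

fn main() {
    let mut data = Something::new();
    println!("{:?}", data.process()); // [10]
    println!("{:?}", data.process()); // [11]
}
```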

For clarification, I found a solution using Rc+RefCell or Arc+Mutex. I encapsulated the buffer in an Arc<Mutex<...>> (an Rc<RefCell<...>> works the same way) and implemented Write for a wrapper around it:
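use std::io::{Error, Write};
use std::sync::{Arc, Mutex};
// PacketWriter is the type from the question
pub struct WrappedWriter {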
data :Arc<Mutex<Vec<u8>>>,
}
impl WrappedWriter {
pub fn new(data : Arc<Mutex<Vec<u8>>>) -> Self {
return WrappedWriter {
data,
};
}
}
impl Write for WrappedWriter {
fn write(&mut self, buf: &[u8]) -> Result<usize, Error> {
let mut data = self.data.lock().unwrap();
data.write(buf)
}
fn flush(&mut self) -> Result<(), Error> {
Ok(())
}
}
pub struct Something {
wtr: PacketWriter<WrappedWriter>,
data : Arc<Mutex<Vec<u8>>>,
}
impl Something {
pub fn new() -> Result<Self, Error> {
let data :Arc<Mutex<Vec<u8>>> = Arc::new(Mutex::new(Vec::new()));
let wtr = PacketWriter::new(WrappedWriter::new(Arc::clone(&data)));
return Ok(Something {
wtr,
data,
});
}
pub fn process(&mut self) {
let mut data = self.data.lock().unwrap();
data.clear();
}
}
You can replace Arc with Rc and Mutex with RefCell if you don't need thread safety, in which case the access becomes:
let mut data = self.data.borrow_mut();
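For the single-threaded case, a self-contained sketch of the same WrappedWriter with Rc<RefCell<Vec<u8>>> might look like this. PacketWriter is left out so the example compiles on its own; the writer is used directly through the Write trait instead:

```rust
use std::cell::RefCell;
use std::io::{self, Write};
use std::rc::Rc;

// Single-threaded variant: the buffer is shared through Rc<RefCell<...>>
// instead of Arc<Mutex<...>>.
pub struct WrappedWriter {
    data: Rc<RefCell<Vec<u8>>>,
}

impl WrappedWriter {
    pub fn new(data: Rc<RefCell<Vec<u8>>>) -> Self {
        WrappedWriter { data }
    }
}

impl Write for WrappedWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // The RefMut guard lives only for this call, so the other handle
        // can borrow the buffer afterwards without a runtime panic.
        self.data.borrow_mut().write(buf)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

fn main() -> io::Result<()> {
    let data = Rc::new(RefCell::new(Vec::new()));
    // One handle goes into the writer; the other stays with the owner.
    let mut wtr = WrappedWriter::new(Rc::clone(&data));
    wtr.write_all(b"hello")?;
    println!("{:?}", data.borrow());
    // The guard must be bound with `let mut` to call &mut methods through it.
    let mut buf = data.borrow_mut();
    buf.clear();
    Ok(())
}
```

RefCell moves the aliasing check to run time: a second borrow_mut while a guard is still alive panics instead of failing to compile, which is why each guard above is kept short-lived.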

How do I efficiently build a vector and an index of that vector while processing a data stream?

I have a struct Foo:
struct Foo {
v: String,
// Other data not important for the question
}
I want to handle a data stream and save the result into Vec<Foo> and also create an index for this Vec<Foo> on the field Foo::v.
I want to use a HashMap<&str, usize> for the index, where the keys will be &Foo::v and the value is the position in the Vec<Foo>, but I'm open to other suggestions.
I want to do the data stream handling as fast as possible, which requires not doing obvious things twice.
For example, I want to:
allocate a String only once per item read from the data stream
not search the index twice, once to check that the key does not exist, once for inserting new key.
not increase the run time by using Rc or RefCell.
The borrow checker does not allow this code:
let mut l = Vec::<Foo>::new();
{
let mut hash = HashMap::<&str, usize>::new();
//here is loop in real code, like:
//let mut s: String;
//while get_s(&mut s) {
let s = "aaa".to_string();
let idx: usize = match hash.entry(&s) { //a
Occupied(ent) => {
*ent.get()
}
Vacant(ent) => {
l.push(Foo { v: s }); //b
ent.insert(l.len() - 1);
l.len() - 1
}
};
// do something with idx
}
There are multiple problems:
hash.entry borrows the key so s must have a "bigger" lifetime than hash
I want to move s at line (b), while I have a read-only reference at line (a)
So how should I implement this simple algorithm without an extra call to String::clone or calling HashMap::get after calling HashMap::insert?

In general, what you are trying to accomplish is unsafe, and Rust is correctly preventing you from doing something you shouldn't. For a simple example of why, consider a Vec<u8>. If the vector has one item and a capacity of one, adding another value will cause a reallocation and a copy of all the values in the vector, invalidating any references into it. All of the keys in your index would then point to arbitrary memory addresses, leading to undefined behavior. The compiler prevents that.
In this case, there are two extra pieces of information that the programmer knows but the compiler does not:
There's an extra indirection — String is heap-allocated, so moving the pointer to that heap allocation isn't really a problem.
The String will never be changed. If it were, it might reallocate, invalidating the referred-to address. Using a Box<str> instead of a String would be a way to enforce this via the type system.
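Both points can be checked with a small sketch. Moving a String into a Vec (and letting that Vec grow) moves only the String's pointer/length/capacity header; the heap bytes stay where they were, so as_ptr() is unchanged. into_boxed_str turns the String into a Box<str>, which has no methods that grow it:

```rust
// Moving a String moves only its (pointer, length, capacity) header;
// the heap bytes it points to stay where they are.
fn heap_ptr_survives_move() -> bool {
    let s = String::from("alice");
    let before = s.as_ptr();
    let mut v = Vec::with_capacity(1);
    v.push(s); // the String header is moved into the Vec's buffer
    // Growing the Vec may reallocate its buffer, moving the header again.
    v.push(String::from("bob"));
    v[0].as_ptr() == before
}

// Box<str> has no capacity and no push methods, so the text cannot be
// grown (and thus reallocated) through it.
fn boxed_str_demo() -> Box<str> {
    String::from("clarice").into_boxed_str()
}

fn main() {
    println!("heap buffer unchanged after move: {}", heap_ptr_survives_move());
    let name = boxed_str_demo();
    println!("{} ({} bytes)", name, name.len());
}
```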
In cases like this, it is OK to use unsafe code, so long as you properly document why it is actually sound.
use std::collections::HashMap;
#[derive(Debug)]
struct Player {
name: String,
}
fn main() {
let names = ["alice", "bob", "clarice", "danny", "eustice", "frank"];
let mut players = Vec::new();
let mut index = HashMap::new();
for &name in &names {
let player = Player { name: name.into() };
let idx = players.len();
// I copied this code from Stack Overflow without reading the prose
// that describes why this unsafe block is actually safe
let stable_name: &str = unsafe { &*(player.name.as_str() as *const str) };
players.push(player);
index.insert(stable_name, idx);
}
for (k, v) in &index {
println!("{:?} -> {:?}", k, v);
}
for v in &players {
println!("{:?}", v);
}
}
However, my guess is that you don't want this code in your main method but want to return it from some function. That will be a problem, as you will quickly run into Why can't I store a value and a reference to that value in the same struct?
Honestly, there are styles of code that don't fit well within Rust's limitations. If you run into these, you could:
decide that Rust isn't a good fit for you or your problem.
use unsafe code, preferably thoroughly tested and only exposing a safe API.
investigate alternate representations.
For example, I'd probably rewrite the code to have the index be the primary owner of the key:
use std::collections::BTreeMap;
#[derive(Debug)]
struct Player<'a> {
name: &'a str,
data: &'a PlayerData,
}
#[derive(Debug)]
struct PlayerData {
hit_points: u8,
}
#[derive(Debug)]
struct Players(BTreeMap<String, PlayerData>);
impl Players {
fn new<I>(iter: I) -> Self
where
I: IntoIterator,
I::Item: Into<String>,
{
let players = iter
.into_iter()
.map(|name| (name.into(), PlayerData { hit_points: 100 }))
.collect();
Players(players)
}
fn get<'a>(&'a self, name: &'a str) -> Option<Player<'a>> {
self.0.get(name).map(|data| Player { name, data })
}
}
fn main() {
let names = ["alice", "bob", "clarice", "danny", "eustice", "frank"];
let players = Players::new(names.iter().copied());
for (k, v) in &players.0 {
println!("{:?} -> {:?}", k, v);
}
println!("{:?}", players.get("eustice"));
}
Alternatively, as shown in What's the idiomatic way to make a lookup table which uses field of the item as the key?, you could wrap your type and store it in a set container instead:
use std::collections::BTreeSet;
#[derive(Debug, PartialEq, Eq)]
struct Player {
name: String,
hit_points: u8,
}
#[derive(Debug, Eq)]
struct PlayerByName(Player);
impl PlayerByName {
fn key(&self) -> &str {
&self.0.name
}
}
impl PartialOrd for PlayerByName {
fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
Some(self.cmp(other))
}
}
impl Ord for PlayerByName {
fn cmp(&self, other: &Self) -> std::cmp::Ordering {
self.key().cmp(&other.key())
}
}
impl PartialEq for PlayerByName {
fn eq(&self, other: &Self) -> bool {
self.key() == other.key()
}
}
impl std::borrow::Borrow<str> for PlayerByName {
fn borrow(&self) -> &str {
self.key()
}
}
#[derive(Debug)]
struct Players(BTreeSet<PlayerByName>);
impl Players {
fn new<I>(iter: I) -> Self
where
I: IntoIterator,
I::Item: Into<String>,
{
let players = iter
.into_iter()
.map(|name| {
PlayerByName(Player {
name: name.into(),
hit_points: 100,
})
})
.collect();
Players(players)
}
fn get(&self, name: &str) -> Option<&Player> {
self.0.get(name).map(|pbn| &pbn.0)
}
}
fn main() {
let names = ["alice", "bob", "clarice", "danny", "eustice", "frank"];
let players = Players::new(names.iter().copied());
for player in &players.0 {
println!("{:?}", player.0);
}
println!("{:?}", players.get("eustice"));
}
not increase the run time by using Rc or RefCell
Guessing about performance characteristics without performing profiling is never a good idea. I honestly don't believe that there'd be a noticeable performance loss from incrementing an integer when a value is cloned or dropped. If the problem required both an index and a vector, then I would reach for some kind of shared ownership.
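To make "incrementing an integer when a value is cloned or dropped" concrete, here is a small sketch with Rc<str>. It only shows what a clone and a drop do; it does not measure performance, which still calls for profiling the real workload:

```rust
use std::rc::Rc;

// Returns the strong count right after a clone and right after that clone
// is dropped.
fn counts() -> (usize, usize) {
    let a: Rc<str> = Rc::from("alice");
    // Cloning copies a pointer and increments a non-atomic counter;
    // the string bytes themselves are not copied.
    let b = Rc::clone(&a);
    assert!(Rc::ptr_eq(&a, &b));
    let after_clone = Rc::strong_count(&a);
    // Dropping a clone decrements the counter.
    drop(b);
    let after_drop = Rc::strong_count(&a);
    (after_clone, after_drop)
}

fn main() {
    let (after_clone, after_drop) = counts();
    // prints "after clone: 2, after drop: 1"
    println!("after clone: {after_clone}, after drop: {after_drop}");
}
```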

not increase the run time by using Rc or RefCell.
@Shepmaster already demonstrated how to accomplish this using unsafe. Before going that route, I would encourage you to check how much Rc would actually cost you. Here is a full version with Rc:
use std::{
collections::{hash_map::Entry, HashMap},
rc::Rc,
};
#[derive(Debug)]
struct Foo {
v: Rc<str>,
}
#[derive(Debug)]
struct Collection {
vec: Vec<Foo>,
index: HashMap<Rc<str>, usize>,
}
impl Foo {
fn new(s: &str) -> Foo {
Foo {
v: s.into(),
}
}
}
impl Collection {
fn new() -> Collection {
Collection {
vec: Vec::new(),
index: HashMap::new(),
}
}
fn insert(&mut self, foo: Foo) {
match self.index.entry(foo.v.clone()) {
Entry::Occupied(o) => panic!(
"Duplicate entry for: {}, {:?} inserted before {:?}",
foo.v,
o.get(),
foo
),
Entry::Vacant(v) => v.insert(self.vec.len()),
};
self.vec.push(foo)
}
}
fn main() {
let mut collection = Collection::new();
for foo in vec![Foo::new("Hello"), Foo::new("World"), Foo::new("Go!")] {
collection.insert(foo)
}
println!("{:?}", collection);
}
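A stripped-down variant of the same idea, closer to the question's "index of a Vec" shape, is sketched below. Unlike the insert above, which panics on duplicates, this hypothetical insert returns the existing position. Because Rc<str> implements Borrow<str>, lookups can take a plain &str:

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::rc::Rc;

// The Vec and the index share each string through Rc<str>.
struct Collection {
    vec: Vec<Rc<str>>,
    index: HashMap<Rc<str>, usize>,
}

impl Collection {
    fn new() -> Self {
        Collection { vec: Vec::new(), index: HashMap::new() }
    }

    // Returns the position of `s`, inserting it if it is new.
    // Only one hash lookup (the `entry` call) happens per string.
    fn insert(&mut self, s: &str) -> usize {
        // One allocation for the text; on a duplicate it is simply dropped.
        let key: Rc<str> = Rc::from(s);
        let next = self.vec.len();
        match self.index.entry(Rc::clone(&key)) {
            Entry::Occupied(o) => *o.get(),
            Entry::Vacant(v) => {
                v.insert(next);
                self.vec.push(key);
                next
            }
        }
    }

    // Rc<str>: Borrow<str>, so no Rc has to be built just to look up.
    fn position(&self, s: &str) -> Option<usize> {
        self.index.get(s).copied()
    }
}

fn main() {
    let mut c = Collection::new();
    for s in ["Hello", "World", "Hello", "Go!"] {
        let idx = c.insert(s);
        println!("{s} -> {idx}");
    }
    println!("{:?}", c.position("World"));
}
```

Allocating the Rc<str> before the lookup means a duplicate costs one throwaway allocation. Avoiding that would need a get followed by an insert, which is the second lookup the question wanted to avoid.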

The error is:
error: `s` does not live long enough
--> <anon>:27:5
|
16 | let idx: usize = match hash.entry(&s) { //a
| - borrow occurs here
...
27 | }
| ^ `s` dropped here while still borrowed
|
= note: values in a scope are dropped in the opposite order they are created
The note at the end is where the answer is.
s must outlive hash because you are using &s as a key in the HashMap. This reference will become invalid when s is dropped. But, as the note says, hash will be dropped after s. A quick fix is to swap the order of their declarations:
let s = "aaa".to_string();
let mut hash = HashMap::<&str, usize>::new();
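The drop-order rule from the note can be observed directly. This sketch uses a hypothetical Noisy type whose Drop impl records its name in a shared log. Locals are dropped in reverse order of declaration, which is why declaring s before hash lets s outlive hash:

```rust
use std::cell::RefCell;

// Hypothetical helper type: records its name in a shared log when dropped.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

fn drop_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let _first = Noisy { name: "first", log: &log };
        let _second = Noisy { name: "second", log: &log };
    } // _second is dropped here, then _first
    log.into_inner()
}

fn main() {
    println!("{:?}", drop_order()); // ["second", "first"]
}
```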
But now you have another problem:
error[E0505]: cannot move out of `s` because it is borrowed
--> <anon>:22:33
|
17 | let idx: usize = match hash.entry(&s) { //a
| - borrow of `s` occurs here
...
22 | l.push(Foo { v: s }); //b
| ^ move out of `s` occurs here
This one is more obvious. s is borrowed by the Entry, which will live to the end of the block. Cloning s will fix that:
l.push(Foo { v: s.clone() }); //b
I only want to allocate s once, not clone it
But the type of Foo.v is String, so it will own its own copy of the str anyway. That type alone means you have to copy s.
You can replace it with a &str instead, which will allow it to stay as a reference into s:
use std::collections::hash_map::Entry::{Occupied, Vacant};
use std::collections::HashMap;
struct Foo<'a> {
v: &'a str,
}
pub fn main() {
// s now lives longer than l
let s = "aaa".to_string();
let mut l = Vec::<Foo>::new();
{
let mut hash = HashMap::<&str, usize>::new();
let idx: usize = match hash.entry(&s) {
Occupied(ent) => {
*ent.get()
}
Vacant(ent) => {
l.push(Foo { v: &s });
ent.insert(l.len() - 1);
l.len() - 1
}
};
}
}
Note that previously I had to move the declaration of s before hash so that it would outlive hash. Now l also holds a reference to s, so s has to be declared even earlier, so that it outlives l.
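In a real streaming loop, the same rule means the owned strings need to live somewhere that outlives both the Vec<Foo> and the index. The sketch below assumes the input can be buffered into a Vec<String> first. The build function is hypothetical, and the array of literals stands in for the real data stream. Each line is allocated once, and each is looked up in the index exactly once:

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

struct Foo<'a> {
    v: &'a str,
}

// Assumption: `lines` owns every input string and outlives both the
// Vec<Foo> and the index. Returns the Vec, the index, and the position
// assigned to each input line.
fn build(lines: &[String]) -> (Vec<Foo<'_>>, HashMap<&str, usize>, Vec<usize>) {
    let mut foos = Vec::new();
    let mut index: HashMap<&str, usize> = HashMap::new();
    let mut positions = Vec::with_capacity(lines.len());
    for s in lines {
        // One hash lookup per line and no String clones: both the Vec
        // and the index only borrow from `lines`.
        let idx = match index.entry(s.as_str()) {
            Entry::Occupied(ent) => *ent.get(),
            Entry::Vacant(ent) => {
                let i = foos.len();
                foos.push(Foo { v: s.as_str() });
                *ent.insert(i)
            }
        };
        positions.push(idx);
    }
    (foos, index, positions)
}

fn main() {
    // Stand-in for reading the stream; each line is allocated exactly once.
    let lines: Vec<String> = ["aaa", "bbb", "aaa"].iter().map(|s| s.to_string()).collect();
    let (foos, index, positions) = build(&lines);
    for (i, foo) in foos.iter().enumerate() {
        println!("{i}: {}", foo.v);
    }
    println!("{:?} {:?}", index, positions);
}
```

If the stream cannot be buffered up front, this shape no longer works, and the Rc-based version or the unsafe approach discussed earlier becomes the relevant option.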
