How to implement streams from future functions - rust

In order to understand how streams work, I was trying to implement an infinite number generator that uses random.org. The first thing I did was implement a version where I call an async function named get_number, which fills a buffer and returns the next available number:
struct RandomGenerator {
buffer: Vec<u8>,
position: usize,
}
impl RandomGenerator {
pub fn new() -> RandomGenerator {
Self {
buffer: Vec::new(),
position: 0,
}
}
pub async fn get_number(&mut self) -> u8 {
self.fill_buffer().await;
let value = self.buffer[self.position];
self.position += 1;
value
}
async fn fill_buffer(&mut self) {
if self.buffer.is_empty() || self.is_buffer_depleted() {
let new_numbers = self.fetch_numbers().await;
drop(replace(&mut self.buffer, new_numbers));
self.position = 0;
}
}
fn is_buffer_depleted(&self) -> bool {
self.position >= self.buffer.len()
}
async fn fetch_numbers(&mut self) -> Vec<u8> {
let response = reqwest::get("https://www.random.org/integers/?num=10&min=1&max=100&col=1&base=10&format=plain&rnd=new").await.unwrap();
let numbers = response.text().await.unwrap();
numbers
.lines()
.map(|line| line.trim().parse::<u8>().unwrap())
.collect()
}
}
With this implementation, I can call get_number in a loop and get as many numbers as I want, but the idea was to have an iterator-like interface so I can use combinators like take, take_while, and others.
But the moment I try to implement a Stream, the problems start to arise.
My first try was to have a struct that would hold a reference to the generator:
struct RandomGeneratorStream<'a> {
generator: &'a mut RandomGenerator,
}
and then I implemented Stream for it:
impl<'a> Stream for RandomGeneratorStream<'a> {
type Item = u8;
fn poll_next(
self: std::pin::Pin<&mut Self>,
cx: &mut std::task::Context<'_>,
) -> std::task::Poll<Option<Self::Item>> {
let f = self.get_mut().generator.get_number();
pin_mut!(f);
f.poll_unpin(cx).map(Some)
}
}
But calling it just hangs the process:
generator.into_stream().take(18).collect::<Vec<u8>>().await
On the next tries, I tried to hold a state of the future on the stream struct using pin_mut! but ended up having many errors with lifetimes without being able to solve them.
What can be done in that case?
Here is a working code without the streams:
use std::mem::replace;
struct RandomGenerator {
buffer: Vec<u8>,
position: usize,
}
impl RandomGenerator {
pub fn new() -> RandomGenerator {
Self {
buffer: Vec::new(),
position: 0,
}
}
pub async fn get_number(&mut self) -> u8 {
self.fill_buffer().await;
let value = self.buffer[self.position];
self.position += 1;
value
}
async fn fill_buffer(&mut self) {
if self.buffer.is_empty() || self.is_buffer_depleted() {
let new_numbers = self.fetch_numbers().await;
drop(replace(&mut self.buffer, new_numbers));
self.position = 0;
}
}
fn is_buffer_depleted(&self) -> bool {
self.position >= self.buffer.len()
}
async fn fetch_numbers(&mut self) -> Vec<u8> {
let response = reqwest::get("https://www.random.org/integers/?num=10&min=1&max=100&col=1&base=10&format=plain&rnd=new").await.unwrap();
let numbers = response.text().await.unwrap();
numbers
.lines()
.map(|line| line.trim().parse::<u8>().unwrap())
.collect()
}
}
#[tokio::main]
async fn main() {
let mut generator = RandomGenerator::new();
dbg!(generator.get_number().await);
}
Here is a link to the first working sample (instead of calling random.org, I used a Cursor because DNS resolution does not work on the playground): https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=730eaf1f7db842877d3f3e7ca1c6d2a5
My last attempt with streams is here: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=de0b212ee70865f6ac6c19430cd952cd

"On the next tries, I tried to hold a state of the future on the stream struct using pin_mut! but ended up having many errors with lifetimes without being able to solve them."
You were on the right track: you need to persist the future across calls for poll_next to work properly. That is most likely why the first attempt hangs: it creates a new future on every call to poll_next and drops it before it can complete.
Unfortunately, you'll run into a roadblock with mutable references. You're keeping a &mut RandomGenerator in order to use it repeatedly, but the future itself also has to keep a &mut RandomGenerator for it to do its job. This would violate the exclusivity of mutable references. Any way you cut it will likely face this problem.
The better way to go from Futures to a Stream is to use futures::stream::unfold:
fn as_stream<'a>(&'a mut self) -> impl Stream<Item = u8> + 'a {
futures::stream::unfold(self, |rng| async {
let number = rng.get_number().await;
Some((number, rng))
})
}
This may not necessarily help you learn more about streams, but the provided functions are usually better than hand-rolling. The key reason this avoids the multiple-mutable-references problem above is that the function generating the future takes ownership of the mutable reference, and then gives it back when it's done. That way only one exists at a time. Even if you implemented Stream yourself, you'd have to use a similar mechanism.
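To make the hand-off concrete without pulling in an async runtime, here is a minimal synchronous sketch of the same idea. The Unfold type, the unfold function, and the Counter stand-in for RandomGenerator are hypothetical names written for this illustration; futures::stream::unfold follows the same "take the state, return the state" shape, only with a future in between.

```rust
// Hypothetical, synchronous analogue of futures::stream::unfold.
struct Unfold<S, F> {
    state: Option<S>,
    step: F,
}

impl<S, T, F> Iterator for Unfold<S, F>
where
    F: FnMut(S) -> Option<(T, S)>,
{
    type Item = T;

    fn next(&mut self) -> Option<T> {
        // Move the state out: while `step` runs, the iterator holds nothing.
        let state = self.state.take()?;
        // `step` owns the state and hands it back together with the item.
        let (item, state) = (self.step)(state)?;
        self.state = Some(state);
        Some(item)
    }
}

fn unfold<S, T, F>(init: S, step: F) -> Unfold<S, F>
where
    F: FnMut(S) -> Option<(T, S)>,
{
    Unfold { state: Some(init), step }
}

// Toy stand-in for RandomGenerator (assumption: no network, just counts up).
struct Counter {
    next: u8,
}

impl Counter {
    fn get_number(&mut self) -> u8 {
        let value = self.next;
        self.next += 1;
        value
    }
}

// The state is a `&mut Counter`; only one copy of it exists at any time.
fn first_numbers(counter: &mut Counter, n: usize) -> Vec<u8> {
    unfold(counter, |c| {
        let number = c.get_number();
        Some((number, c))
    })
    .take(n)
    .collect()
}
```

Calling first_numbers twice on the same Counter continues where the previous call stopped, because the mutable reference is returned to the caller once the iterator is dropped.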

Related

How to store a callback in a struct that can change the struct's internal state?

In one of my projects, I would like to store a function pointer used as a callback to change the state of a struct. I've tried different things, but always encountered errors.
Consider the following situation:
struct CallbackStruct {
callback: fn(i32) -> i32,
}
struct MainStruct {
pub callback_struct: Vec<CallbackStruct>,
pub intern_state: i32,
}
impl MainStruct {
pub fn new() -> MainStruct {
let result = MainStruct {
callback_struct: Vec::new(),
intern_state: 0,
};
// push a new call back struct
result.callback_struct.push(CallbackStruct{callback: result.do_stuff});
return result;
}
pub fn do_stuff(&mut self, i: i32) -> i32 {
self.intern_state = i * 2 + 1;
self.intern_state
}
}
fn main() {
let my_struct = MainStruct::new();
}
Here, I'm trying to keep a callback to the MainStruct that can change its internal state. This callback will only be stored by other structs owned by this main structure, so I don't think I have lifetime issues: as long as the callback exists, the main struct does as well, since it effectively owns it.
Now, I'm wondering if having such a callback (with the &mut self reference) isn't a borrow of the main struct, preventing me from having multiple of them, or even keeping them?
In the example, I'm keeping a Vec of the CallbackStruct because I may have different structs all having these kinds of callbacks.
In C/C++, I can use function pointers, and I couldn't find a way to store such things in Rust.
How would one implement such a thing?
Rust is preventing the following problem: if a callback can mutate the struct that holds it, it can also mutate or destroy itself while it is executing.
If it is a common pattern that you want your callbacks to modify the structure that invokes the callback, then there are a couple of options. Both require adjusting the function signature to pass in the data the callbacks are allowed to mutate. This also allows multiple callbacks to mutate the same state, since they only hold the mutable reference while they are running.
#1: Keep the state and callbacks separate
The idea is that the callback is explicitly given mutable access to the state which does not include the callback itself. This can be done by constructing a separate structure to hold the non-callback data, as shown here, or you can simply pass in multiple parameters:
struct InternalState(i32);
impl InternalState {
fn do_stuff(&mut self, i: i32) {
self.0 = i * 2 + 1;
}
}
struct MainStruct {
callbacks: Vec<Box<dyn FnMut(&mut InternalState)>>,
state: InternalState,
}
impl MainStruct {
fn new() -> MainStruct {
MainStruct {
callbacks: Vec::new(),
state: InternalState(0),
}
}
fn push(&mut self, f: Box<dyn FnMut(&mut InternalState)>) {
self.callbacks.push(f);
}
fn invoke(&mut self) {
for callback in &mut self.callbacks {
callback(&mut self.state)
}
}
}
fn main() {
let mut my_struct = MainStruct::new();
my_struct.push(Box::new(|state| { state.do_stuff(1); }));
my_struct.push(Box::new(|state| { state.do_stuff(2); }));
my_struct.invoke();
dbg!(my_struct.state.0);
}
#2: Remove the callback from the struct before executing it
You can mutate the whole struct itself if you remove the callback being run. This can be done with the take-and-replace method (std::mem::take). This has the added benefit that you have the opportunity to add new callbacks; you just have to reconcile the two sets of callbacks when putting them back:
struct InternalState(i32);
impl InternalState {
fn do_stuff(&mut self, i: i32) {
self.0 = i * 2 + 1;
}
}
struct MainStruct {
callbacks: Vec<Box<dyn FnMut(&mut MainStruct)>>,
state: InternalState,
}
impl MainStruct {
fn new() -> MainStruct {
MainStruct {
callbacks: Vec::new(),
state: InternalState(0),
}
}
fn push(&mut self, f: Box<dyn FnMut(&mut MainStruct)>) {
self.callbacks.push(f);
}
fn invoke(&mut self) {
let mut callbacks = std::mem::take(&mut self.callbacks);
for callback in &mut callbacks {
callback(self)
}
self.callbacks = callbacks;
}
}
fn main() {
let mut my_struct = MainStruct::new();
my_struct.push(Box::new(|main| { main.state.do_stuff(1); }));
my_struct.push(Box::new(|main| { main.state.do_stuff(2); }));
my_struct.invoke();
dbg!(my_struct.state.0);
}
You'll also notice I changed the code from function pointers (fn(i32) -> i32) to boxed function trait objects such as Box<dyn FnMut(&mut InternalState)>, since the latter are more flexible and more common: they can capture other variables if needed.
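The reconciliation step mentioned in option #2 is not shown in the code above, which simply overwrites self.callbacks after the loop. Here is a minimal sketch of one way to do it, assuming that callbacks registered during invoke should only start running from the next invoke. The state field is simplified to a plain i32, and the run_twice helper is hypothetical.

```rust
struct MainStruct {
    callbacks: Vec<Box<dyn FnMut(&mut MainStruct)>>,
    state: i32,
}

impl MainStruct {
    fn invoke(&mut self) {
        // Take the callbacks out so they can receive `&mut self` safely.
        let mut running = std::mem::take(&mut self.callbacks);
        for callback in &mut running {
            callback(self);
        }
        // Anything pushed during the loop landed in `self.callbacks`.
        // Put the original callbacks back first, then append the new ones.
        let added = std::mem::replace(&mut self.callbacks, running);
        self.callbacks.extend(added);
    }
}

// Hypothetical driver: one callback that also registers another callback.
fn run_twice() -> (i32, usize) {
    let mut main_struct = MainStruct {
        callbacks: Vec::new(),
        state: 0,
    };
    main_struct.callbacks.push(Box::new(|m: &mut MainStruct| {
        m.state += 1;
        m.callbacks.push(Box::new(|m: &mut MainStruct| m.state += 10));
    }));
    main_struct.invoke();
    main_struct.invoke();
    (main_struct.state, main_struct.callbacks.len())
}
```

Whether new callbacks go before or after the existing ones, or run immediately, is a design choice; the point is only that nothing pushed during the loop gets lost.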

Borrow checker and shared I/O

I'm hitting a problem in my code where multiple structs need to send data to a shared output sink and the borrow checker doesn't like it.
struct SharedWriter {
count: u32,
}
impl SharedWriter {
pub fn write(&mut self) {
self.count += 1;
}
}
struct Container<'a> {
writer: &'a mut SharedWriter,
}
impl Container<'_> {
pub fn write(&mut self) {
self.writer.write();
}
}
pub fn test() {
let mut writer = SharedWriter { count: 0 };
let mut c0 = Container {
writer: &mut writer,
};
let mut c1 = Container {
// compiler chokes here with:
// cannot borrow `writer` as mutable more than once at a time
writer: &mut writer,
};
c0.write();
c1.write();
}
I understand the problem and why it's happening; you can't borrow something as mutable more than once at a time.
What I don't understand is a good general solution. This pattern happens a lot. You've got a common output sink, like a file or a socket or a database, and you want to feed multiple streams of data to it. It has to be mutable if it maintains any kind of state. It has to be just a single entity if it holds any resources.
You could pass a reference to the sink in every single write() method (write(&mut writer, some_data)), but this clutters the code and will get called (in my particular app) millions of times per second. I'm speculating that there is some extra overhead in passing this parameter over and over.
Is there some syntax that will get past this problem?
Interior mutability.
In your case the easiest way is probably to use RefCell. It will have some runtime overhead, but it is safe.
use std::cell::RefCell;
struct SharedWriter {
count: RefCell<u32>,
}
impl SharedWriter {
pub fn new(count: u32) -> Self {
Self { count: RefCell::new(count) }
}
pub fn write(&self) {
*self.count.borrow_mut() += 1;
}
}
If the data is Copy (like u32, in case this is your real data), you may want to use Cell. It is applicable to fewer types, but it is zero-cost:
use std::cell::Cell;
struct SharedWriter {
count: Cell<u32>,
}
impl SharedWriter {
pub fn new(count: u32) -> Self {
Self { count: Cell::new(count) }
}
pub fn write(&self) {
self.count.set(self.count.get() + 1);
}
}
There are more interior mutability primitives (for example, UnsafeCell for zero-cost but unsafe access, or mutexes and atomics for thread safe mutation).
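For completeness, here is a minimal sketch of how the two containers from the question could share the Cell-based writer. Since write now takes &self, a shared reference is enough and both containers can hold one at the same time. The shared_count function is a hypothetical stand-in for the question's test function.

```rust
use std::cell::Cell;

struct SharedWriter {
    count: Cell<u32>,
}

impl SharedWriter {
    fn write(&self) {
        self.count.set(self.count.get() + 1);
    }
}

// A shared reference is enough, so several containers can coexist.
struct Container<'a> {
    writer: &'a SharedWriter,
}

impl Container<'_> {
    fn write(&self) {
        self.writer.write();
    }
}

// Hypothetical stand-in for the question's `test` function.
fn shared_count() -> u32 {
    let writer = SharedWriter { count: Cell::new(0) };
    let c0 = Container { writer: &writer };
    let c1 = Container { writer: &writer };
    c0.write();
    c1.write();
    writer.count.get()
}
```

Note that Cell and RefCell are not Sync, so this only works when all containers live on the same thread.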
One option would be to use a channel. Here is an example of how that might look. It also has the added benefit of letting you scale your I/O across multiple threads. create_shared_io takes a handler and runs a loop on a new thread: the loop blocks until a value sent through the sender is received, then calls func with a mutable reference to the handler and the received value. The thread exits when all the senders have been dropped. One downside of this approach is that the channel only works in one direction.
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};
pub fn create_shared_io<T, H, F>(mut handler: H, mut func: F) -> (JoinHandle<H>, Sender<T>)
where
T: 'static + Send,
H: 'static + Send,
F: 'static + FnMut(&mut H, T) + Send,
{
let (send, recv) = channel();
let join_handle = thread::spawn(move || loop {
let value = match recv.recv() {
Ok(v) => v,
Err(_) => break handler,
};
func(&mut handler, value);
});
(join_handle, send)
}
And then it can be used similarly to your example. Since no data was passed in your example, it sends () as a placeholder.
pub fn main() {
let writer = SharedWriter { count: 0 };
println!("Starting!");
let (join_handle, sender) = create_shared_io(writer, |writer, _| {
writer.count += 1;
println!("Current count: {}", writer.count);
});
let mut c0 = Container {
writer: sender.clone(),
};
let mut c1 = Container {
writer: sender,
};
c0.write();
c1.write();
// Ensure the senders are dropped before we join the io thread to avoid possible deadlock
// where the compiler attempts to drop these values after the join.
std::mem::drop((c0, c1));
// Writer is returned when the thread is joined
let writer = join_handle.join().unwrap();
println!("Finished!");
}
struct SharedWriter {
count: u32,
}
struct Container {
writer: Sender<()>,
}
impl Container {
pub fn write(&mut self) {
self.writer.send(()).unwrap();
}
}

Lockless processing of non overlapping non contiguous indexes by multiple threads in Rust

I am practicing Rust and decided to create a matrix ops/factorization project.
Basically I want to be able to process the underlying vector in multiple threads. Since I will be providing each thread non-overlapping indexes (which may or may not be contiguous) and the threads will be joined before the end of whatever function created them, there is no need for a lock /synchronization.
I know that there are several crates that can do this, but I would like to know if there is a relatively idiomatic crate-free way to implement it on my own.
The best I could come up with is (simplified the code a bit):
use std::thread;
//This represents the Matrix
#[derive(Debug, Clone)]
pub struct MainStruct {
pub data: Vec<f64>,
}
//This is the bit that will be shared by the threads,
//ideally it should have its lifetime tied to that of MainStruct
//but i have no idea how to make phantomdata work in this case
#[derive(Debug, Clone)]
pub struct SliceTest {
pub data: Vec<SubSlice>,
}
//This struct is to hide *mut f64 to allow it to be shared to other threads
#[derive(Debug, Clone)]
pub struct SubSlice {
pub data: *mut f64,
}
impl MainStruct {
pub fn slice(&mut self) -> (SliceTest, SliceTest) {
let mut out_vec_odd: Vec<SubSlice> = Vec::new();
let mut out_vec_even: Vec<SubSlice> = Vec::new();
unsafe {
let ptr = self.data.as_mut_ptr();
for i in 0..self.data.len() {
let ptr_to_push = ptr.add(i);
//Non contiguous idxs
if i % 2 == 0 {
out_vec_even.push(SubSlice{data:ptr_to_push});
} else {
out_vec_odd.push(SubSlice{data:ptr_to_push});
}
}
}
(SliceTest{data: out_vec_even}, SliceTest{data: out_vec_odd})
}
}
impl SubSlice {
pub fn set(&self, val: f64) {
unsafe {*(self.data) = val;}
}
}
unsafe impl Send for SliceTest {}
unsafe impl Send for SubSlice {}
fn main() {
let mut maindata = MainStruct {
data: vec![0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
};
let (mut outvec1, mut outvec2) = maindata.slice();
let mut threads = Vec::new();
threads.push(
thread::spawn(move || {
for i in 0..outvec1.data.len() {
outvec1.data[i].set(999.9);
}
})
);
threads.push(
thread::spawn(move || {
for i in 0..outvec2.data.len() {
outvec2.data[i].set(999.9);
}
})
);
for handles in threads {
handles.join();
}
println!("maindata = {:?}", maindata.data);
}
EDIT:
Following kmdreko's suggestion below, I got the code to work exactly how I wanted it without using unsafe code, yay!
Of course in terms of performance it may be cheaper to copy the f64 slices than to create mutable reference vectors unless your struct is filled with other structs instead of f64
extern crate crossbeam;
use crossbeam::thread;
#[derive(Debug, Clone)]
pub struct Matrix {
data: Vec<f64>,
m: usize, //number of rows
n: usize, //number of cols
}
...
impl Matrix {
...
pub fn get_data_mut(&mut self) -> &mut Vec<f64> {
&mut self.data
}
pub fn calculate_idx(max_cols: usize, i: usize, j: usize) -> usize {
let actual_idx = j + max_cols * i;
actual_idx
}
//Get individual mutable references for contiguous indexes (rows)
pub fn get_all_row_slices(&mut self) -> Vec<Vec<&mut f64>> {
let max_cols = self.max_cols();
let max_rows = self.max_rows();
let inner_data = self.get_data_mut().chunks_mut(max_cols);
let mut out_vec: Vec<Vec<&mut f64>> = Vec::with_capacity(max_rows);
for chunk in inner_data {
let row_vec = chunk.iter_mut().collect();
out_vec.push(row_vec);
}
out_vec
}
//Get mutable references for disjoint indexes (columns)
pub fn get_all_col_slices(&mut self) -> Vec<Vec<&mut f64>> {
let max_cols = self.max_cols();
let max_rows = self.max_rows();
let inner_data = self.get_data_mut().chunks_mut(max_cols);
let mut out_vec: Vec<Vec<&mut f64>> = Vec::with_capacity(max_cols);
for _ in 0..max_cols {
out_vec.push(Vec::with_capacity(max_rows));
}
let mut inner_idx = 0;
for chunk in inner_data {
let row_vec_it = chunk.iter_mut();
for elem in row_vec_it {
out_vec[inner_idx].push(elem);
inner_idx += 1;
}
inner_idx = 0;
}
out_vec
}
...
}
fn test_multithreading() {
fn test(in_vec: Vec<&mut f64>) {
for elem in in_vec {
*elem = 33.3;
}
}
fn launch_task(mat: &mut Matrix, f: fn(Vec<&mut f64>)) {
let test_vec = mat.get_all_row_slices();
thread::scope(|s| {
for elem in test_vec.into_iter() {
s.spawn(move |_| {
println!("Spawning thread...");
f(elem);
});
}
}).unwrap();
}
let rows = 4;
let cols = 3;
//new function code omitted, returns Result<Self, MatrixError>
let mut mat = Matrix::new(rows, cols).unwrap();
launch_task(&mut mat, test);
for i in 0..rows {
for j in 0..cols {
//Requires index trait implemented for matrix
assert_eq!(mat[(i, j)], 33.3);
}
}
}
This API is unsound. Since there is no lifetime annotation binding SliceTest and SubSlice to the MainStruct, they can be preserved after the data has been destroyed and if used would result in use-after-free errors.
It's easy to make it safe though; you can use .iter_mut() to get distinct mutable references to your elements:
pub fn slice(&mut self) -> (Vec<&mut f64>, Vec<&mut f64>) {
let mut out_vec_even = vec![];
let mut out_vec_odd = vec![];
for (i, item_ref) in self.data.iter_mut().enumerate() {
if i % 2 == 0 {
out_vec_even.push(item_ref);
} else {
out_vec_odd.push(item_ref);
}
}
(out_vec_even, out_vec_odd)
}
However, this surfaces another problem: thread::spawn cannot hold references to local variables. The threads created are allowed to live beyond the scope they're created in, so even though you did .join() them, you aren't required to. This was a potential issue in your original code as well; the compiler just couldn't warn about it.
There's no easy way to solve this. You'd need to use a non-referential way to use data on the other threads, but that would be using Arc, which doesn't allow mutating its data, so you'd have to resort to a Mutex, which is what you've tried to avoid.
I would suggest reaching for scope from the crossbeam crate, which does allow you to spawn threads that reference local data. I know you've wanted to avoid using crates, but this is the best solution in my opinion.
See:
How to get multiple mutable references to elements in a Vec?
Can you specify a non-static lifetime for threads?
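Since this answer was written, the standard library has gained its own scoped threads: std::thread::scope, stable since Rust 1.63. Assuming a toolchain at least that recent, a crate-free version of the same idea could be sketched as follows. The function names split_even_odd and fill_in_parallel are hypothetical.

```rust
use std::thread;

// Safe split into disjoint, non-contiguous sets of mutable references.
fn split_even_odd(data: &mut [f64]) -> (Vec<&mut f64>, Vec<&mut f64>) {
    let mut even = Vec::new();
    let mut odd = Vec::new();
    for (i, item) in data.iter_mut().enumerate() {
        if i % 2 == 0 {
            even.push(item);
        } else {
            odd.push(item);
        }
    }
    (even, odd)
}

// Each thread gets exclusive access to its own set of elements; no locks.
fn fill_in_parallel(data: &mut [f64], even_val: f64, odd_val: f64) {
    let (even, odd) = split_even_odd(data);
    // `scope` joins every spawned thread before it returns, which is what
    // lets the threads borrow from `data`.
    thread::scope(|s| {
        s.spawn(move || {
            for x in even {
                *x = even_val;
            }
        });
        s.spawn(move || {
            for x in odd {
                *x = odd_val;
            }
        });
    });
}
```

As with the crossbeam version, building a Vec of references has a cost of its own, so for plain f64 data this is more of a demonstration of the borrowing rules than a performance recipe.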

Rust struct within struct: borrowing, lifetime, generic types and more total confusion

I'm trying to modify an existing application, which forces me to learn Rust, and it's giving me a hard time (reformulating...)
I would like to have a struct with two fields:
pub struct Something<'a> {
pkt_wtr: PacketWriter<&'a mut Vec<u8>>,
buf: Vec<u8>,
}
Where 'buf' will be used as an io for PacketWriter to write its results. So PacketWriter is something like
use std::io::{self};
pub struct PacketWriter<T :io::Write> {
wtr :T,
}
impl <T :io::Write> PacketWriter<T> {
pub fn new(wtr :T) -> Self {
return PacketWriter {
wtr,
};
}
pub fn into_inner(self) -> T {
self.wtr
}
pub fn write(&mut self) {
self.wtr.write_all(&[10,11,12]).unwrap();
println!("wrote packet");
}
}
Then inside 'Something' I want to use PacketWriter this way: let it write what it needs in 'buf' and drain it by pieces.
impl Something<'_> {
pub fn process(&mut self) {
self.pkt_wtr.write();
let c = self.buf.drain(0..1);
}
}
What seems to be impossible is to create a workable constructor for 'Something'
impl Something<'_> {
pub fn new() -> Self {
let mut buf = Vec::new();
let pkt_wtr = PacketWriter::new(&mut buf);
return Something {
pkt_wtr: pkt_wtr,
buf: buf,
};
}
}
However I try, I cannot construct the PacketWriter on a mutable reference borrowed from 'buf' while 'buf' itself is also stored in the 'Something' object.
I can give 'buf' fully to 'PacketWriter' (as in the example below), but then I cannot access the content of 'buf' later. It works in the example below only because I can still reach 'buf' after it has been given to the 'PacketWriter' (through 'wtr'). In reality, the 'wtr' field of 'PacketWriter' is private, and it is code that I cannot modify, for example to add a getter for 'wtr'.
Thanks
I wrote a small working program to describe the intent and the problem, with the two options
use std::io::{self};
pub struct PacketWriter<T :io::Write> {
wtr :T,
}
impl <T :io::Write> PacketWriter<T> {
pub fn new(wtr :T) -> Self {
return PacketWriter {
wtr,
};
}
pub fn into_inner(self) -> T {
self.wtr
}
pub fn write(&mut self) {
self.wtr.write_all(&[10,11,12]).unwrap();
println!("wrote packet");
}
}
/*
// that does not work of course because buf is local but this is not the issue
pub struct Something<'a> {
pkt_wtr: PacketWriter<&'a mut Vec<u8>>,
buf: Vec<u8>,
}
impl Something<'_> {
pub fn new() -> Self {
let mut buf = Vec::new();
let pkt_wtr = PacketWriter::new(&mut buf);
//let mut pkt_wtr = PacketWriter::new(buf);
return Something {
pkt_wtr,
buf,
};
}
pub fn process(&mut self) {
self.pkt_wtr.write();
println!("process {:?}", self.buf);
}
}
*/
pub struct Something {
pkt_wtr: PacketWriter<Vec<u8>>,
}
impl Something {
pub fn new() -> Self {
let pkt_wtr = PacketWriter::new(Vec::new());
return Something {
pkt_wtr,
};
}
pub fn process(&mut self) {
self.pkt_wtr.write();
let file = &mut self.pkt_wtr.wtr;
println!("processing Something {:?}", file);
let c = file.drain(0..1);
println!("Drained {:?}", c);
}
}
fn main() -> std::io::Result<()> {
let mut file = Vec::new();
let mut wtr = PacketWriter::new(&mut file);
wtr.write();
println!("Got data {:?}", file);
{
let c = file.drain(0..2);
println!("Drained {:?}", c);
}
println!("Remains {:?}", file);
let mut data = Something::new();
data.process();
Ok(())
}
It's not totally clear what the question is, given that the code appears to compile, but I can take a stab at one part: why can't you use into_inner() on self.wtr inside the process function?
into_inner takes ownership of the PacketWriter that gets passed into its self parameter. (You can tell this because the parameter is spelled self, rather than &self or &mut self.) Taking ownership means that it is consumed: it cannot be used anymore by the caller and the callee is responsible for dropping it (read: running destructors). After taking ownership of the PacketWriter, the into_inner function returns just the wtr field and drops (runs destructors on) the rest. But where does that leave the Something struct? It has a field that needs to contain a PacketWriter, and you just took its PacketWriter away and destroyed it! The function ends, and the value held in the PacketWriter field is unknown: it can't be the thing that was in there from the beginning, because that was taken over by into_inner and destroyed. But it also can't be anything else.
Rust generally forbids structs from having uninitialized or undefined fields. You need to have that field defined at all times.
Here's the worked example:
pub fn process(&mut self) {
self.pkt_wtr.write();
// There's a valid PacketWriter in pkt_wtr
let raw_wtr: Vec<u8> = self.pkt_wtr.into_inner();
// The PacketWriter in pkt_wtr was consumed by into_inner!
// We have a raw_wtr of type Vec<u8>, but that's not the right type for pkt_wtr
// We could try to call this function here, but what would it do?
self.pkt_wtr.write();
println!("processing Something");
}
(Note: The example above has slightly squishy logic. Formally, because you don't own self, you can't do anything that would take ownership of any part of it, even if you put everything back neatly when you're done.)
You have a few options to fix this, but with one major caveat: with the public interface you have described, there is no way to get access to the PacketWriter::wtr field and put it back into the same PacketWriter. You'll have to extract the PacketWriter::wtr field and put it into a new PacketWriter.
Here's one way you could do it. Remember, the goal is to have self.pkt_wtr defined at all times, so we'll use a function called mem::replace to put a dummy PacketWriter into self.pkt_wtr. This ensures that self.pkt_wtr always has something in it.
pub fn process(&mut self) {
self.pkt_wtr.write();
// Create a new dummy PacketWriter and swap it with self.pkt_wtr
// Returns an owned version of pkt_wtr that we're free to consume
let pkt_wtr_owned = std::mem::replace(&mut self.pkt_wtr, PacketWriter::new(Vec::new()));
// Consume pkt_wtr_owned, returning its wtr field
let raw_wtr = pkt_wtr_owned.into_inner();
// Do anything you want with raw_wtr here -- you own it.
println!("The vec is: {:?}", &raw_wtr);
// Create a new PacketWriter with the old PacketWriter's buffer.
// The dummy PacketWriter is dropped here.
self.pkt_wtr = PacketWriter::new(raw_wtr);
println!("processing Something");
}
This solution is definitely a hack, and it's potentially a place where the borrow checker could be improved to realize that leaving a field temporarily undefined is fine, as long as it's not accessed before it is assigned again. (Though there may be an edge case I missed; this stuff is hard to reason about in general.) Additionally, this is the kind of thing that can be optimized away by later compiler passes through dead store elimination.
If this turns out to be a hotspot when profiling, there are unsafe techniques that would allow the field to be invalid for that period, but that would probably need a new question.
However, my recommendation would be to find a way to get an "escape hatch" function added to PacketWriter that lets you do exactly what you want to do: get a mutable reference to the inner wtr without taking ownership of PacketWriter.
impl<T: io::Write> PacketWriter<T> {
pub fn inner_mut(&mut self) -> &mut T {
&mut self.wtr
}
}
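If the getter cannot be added and building a dummy PacketWriter is undesirable, another common pattern is to store the writer in an Option and take() it out while working on the buffer. This is a swapped-in alternative to mem::replace, not something the original code does. PacketWriter below is a stand-in reduced from the question, and having process return the drained bytes is an assumption made for illustration.

```rust
use std::io::Write;

// Stand-in for the external PacketWriter from the question.
pub struct PacketWriter<T: Write> {
    wtr: T,
}

impl<T: Write> PacketWriter<T> {
    pub fn new(wtr: T) -> Self {
        PacketWriter { wtr }
    }
    pub fn into_inner(self) -> T {
        self.wtr
    }
    pub fn write(&mut self) {
        self.wtr.write_all(&[10, 11, 12]).unwrap();
    }
}

pub struct Something {
    // Always `Some` between calls; `None` only while `process` is running.
    pkt_wtr: Option<PacketWriter<Vec<u8>>>,
}

impl Something {
    pub fn new() -> Self {
        Something {
            pkt_wtr: Some(PacketWriter::new(Vec::new())),
        }
    }

    pub fn process(&mut self) -> Vec<u8> {
        // Take ownership of the writer, leaving `None` behind.
        let mut pkt_wtr = self.pkt_wtr.take().expect("writer present between calls");
        pkt_wtr.write();
        // Consume the writer to reach the buffer it owns.
        let mut buf = pkt_wtr.into_inner();
        let drained: Vec<u8> = buf.drain(0..1).collect();
        // Wrap the remaining bytes in a fresh writer and put it back.
        self.pkt_wtr = Some(PacketWriter::new(buf));
        drained
    }
}
```

The trade-off is that every access has to go through the Option, and a panic between take() and the final assignment would leave the field empty.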
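For clarification, I found a solution using Rc+RefCell or Arc+Mutex. I wrapped the buffer in an Arc<Mutex<...>> (Rc<RefCell<...>> works the same way) and implemented Write for the wrapper:
use std::io::{Error, Write};
use std::sync::{Arc, Mutex};
pub struct WrappedWriter {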
data :Arc<Mutex<Vec<u8>>>,
}
impl WrappedWriter {
pub fn new(data : Arc<Mutex<Vec<u8>>>) -> Self {
return WrappedWriter {
data,
};
}
}
impl Write for WrappedWriter {
fn write(&mut self, buf: &[u8]) -> Result<usize, Error> {
let mut data = self.data.lock().unwrap();
data.write(buf)
}
fn flush(&mut self) -> Result<(), Error> {
Ok(())
}
}
pub struct Something {
wtr: PacketWriter<WrappedWriter>,
data : Arc<Mutex<Vec<u8>>>,
}
impl Something {
pub fn new() -> Result<Self, Error> {
let data :Arc<Mutex<Vec<u8>>> = Arc::new(Mutex::new(Vec::new()));
let wtr = PacketWriter::new(WrappedWriter::new(Arc::clone(&data)));
return Ok(Something {
wtr,
data,
});
}
pub fn process(&mut self) {
let mut data = self.data.lock().unwrap();
data.clear();
}
}
You can replace Arc with Rc and Mutex with RefCell if you don't need thread safety, in which case the access becomes
let mut data = self.data.borrow_mut();
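Put together, the single-threaded Rc/RefCell variant might look like the sketch below. PacketWriter is again a stand-in reduced from the question (with its wtr field private, as in the real code), SharedBuf is a hypothetical name for the wrapper, and returning the drained bytes from process is an assumption for illustration.

```rust
use std::cell::RefCell;
use std::io::{self, Write};
use std::rc::Rc;

// Stand-in for the external PacketWriter; `wtr` is not reachable from outside.
pub struct PacketWriter<T: Write> {
    wtr: T,
}

impl<T: Write> PacketWriter<T> {
    pub fn new(wtr: T) -> Self {
        PacketWriter { wtr }
    }
    pub fn write(&mut self) {
        self.wtr.write_all(&[10, 11, 12]).unwrap();
    }
}

// Hypothetical wrapper: a Write implementation over a shared buffer.
struct SharedBuf(Rc<RefCell<Vec<u8>>>);

impl Write for SharedBuf {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // The RefCell borrow lasts only for this call.
        self.0.borrow_mut().write(buf)
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

pub struct Something {
    pkt_wtr: PacketWriter<SharedBuf>,
    buf: Rc<RefCell<Vec<u8>>>,
}

impl Something {
    pub fn new() -> Self {
        let buf = Rc::new(RefCell::new(Vec::new()));
        let pkt_wtr = PacketWriter::new(SharedBuf(Rc::clone(&buf)));
        Something { pkt_wtr, buf }
    }

    pub fn process(&mut self) -> Vec<u8> {
        self.pkt_wtr.write();
        // The writer's borrow has ended, so borrowing here cannot panic.
        let drained: Vec<u8> = self.buf.borrow_mut().drain(0..1).collect();
        drained
    }
}
```

The borrow is checked at run time, so holding a borrow_mut() guard across a call to pkt_wtr.write() would panic; keeping each borrow short, as above, avoids that.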

How to keep track of how many bytes written when using 'std::io::Write'?

When writing to a binary file format, it's useful to be able to check how many bytes have been written (for alignment, for example), or just to ensure nested functions wrote the correct amount of data.
Is there a way to inspect std::io::Write to know how much has been written? If not, what would be a good approach to wrap the writer so it could track how many bytes have been written?
Write has two required methods: write and flush. Since write already returns the number of bytes written, you just track that:
use std::io::{self, Write};
struct ByteCounter<W> {
inner: W,
count: usize,
}
impl<W> ByteCounter<W>
where W: Write
{
fn new(inner: W) -> Self {
ByteCounter {
inner: inner,
count: 0,
}
}
fn into_inner(self) -> W {
self.inner
}
fn bytes_written(&self) -> usize {
self.count
}
}
impl<W> Write for ByteCounter<W>
where W: Write
{
fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
let res = self.inner.write(buf);
if let Ok(size) = res {
self.count += size
}
res
}
fn flush(&mut self) -> io::Result<()> {
self.inner.flush()
}
}
fn main() {
let out = std::io::stdout();
let mut out = ByteCounter::new(out);
writeln!(&mut out, "Hello, world! {}", 42).unwrap();
println!("Wrote {} bytes", out.bytes_written());
}
It's important to not delegate write_all or write_fmt because these do not return the count of bytes. Delegating them would allow bytes to be written and not tracked.
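As a small illustration of the alignment use case from the question, the sketch below relies on the default write_all, which loops over write and is therefore counted. The pad_to and write_record helpers are hypothetical, the counter is reduced to the essentials, and a Vec<u8> is used as the sink so the result is easy to inspect.

```rust
use std::io::{self, Write};

struct ByteCounter<W> {
    inner: W,
    count: usize,
}

impl<W: Write> Write for ByteCounter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let res = self.inner.write(buf);
        if let Ok(size) = res {
            self.count += size;
        }
        res
    }
    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

// Hypothetical helper: pad with zero bytes up to the next multiple of
// `align` (which must be non-zero).
fn pad_to<W: Write>(out: &mut ByteCounter<W>, align: usize) -> io::Result<()> {
    let rem = out.count % align;
    if rem != 0 {
        // Goes through the default `write_all`, so the padding is counted too.
        out.write_all(&vec![0u8; align - rem])?;
    }
    Ok(())
}

// Hypothetical example: write 5 bytes, then align to a 4-byte boundary.
fn write_record() -> io::Result<(usize, Vec<u8>)> {
    let mut out = ByteCounter {
        inner: Vec::new(),
        count: 0,
    };
    out.write_all(b"abcde")?;
    pad_to(&mut out, 4)?;
    Ok((out.count, out.inner))
}
```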
If the type you write to implements std::io::Seek, you can use seek to get the current position:
let pos = f.seek(SeekFrom::Current(0))?;
Seek is implemented by std::fs::File (and std::io::BufWriter if the wrapped type implements Seek too).
So the function signature:
use ::std::io::{Write, Seek, SeekFrom, Error};
fn my_write<W: Write>(f: &mut W) -> Result<(), Error> { ... }
Needs to have the Seek trait added:
fn my_write<W: Write + Seek>(f: &mut W) -> Result<(), Error> { ... }
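A minimal sketch of that approach, using an in-memory Cursor in place of a file so it runs anywhere. It uses Seek::stream_position (stable since Rust 1.51), which returns the same value as seek(SeekFrom::Current(0)). The write_and_measure function is a hypothetical name.

```rust
use std::io::{self, Seek, Write};

// Returns how many bytes the write advanced the stream by.
fn write_and_measure<W: Write + Seek>(f: &mut W, payload: &[u8]) -> io::Result<u64> {
    let start = f.stream_position()?;
    f.write_all(payload)?;
    let end = f.stream_position()?;
    Ok(end - start)
}
```

For example, with let mut file = std::io::Cursor::new(Vec::new()), calling write_and_measure(&mut file, b"header") yields Ok(6). Keep in mind that this measures position rather than bytes written, so it only matches when the writes are sequential and nothing else seeks in between.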
