Rust MemWriter return pointer to Buffer - rust

I'd like to have a function use a MemWriter to write some bytes and then return a pointer to the buffer. I'm struggling to understand how to use lifetimes in this case. How would I make the below code work and what should I read to fill my knowledge gap here?
struct Request<T: Encodable> {
id: i16,
e: T
}
impl <T: Encodable> Request<T> {
fn serialize<'s>(&'s self) -> io::IoResult<&'s Vec<u8>> {
let mut writer = io::MemWriter::new();
try!(writer.write_be_i16(0 as i16));
let buf = writer.unwrap();
let size = buf.len();
let result: io::IoResult<&Vec<u8>> = Ok(&buf);
result
}
}

You can't return a reference to a buffer that is not stored anywhere.
The buffer has to be stored somewhere that outlives the call, for example inside the struct. Otherwise you would return a reference to freed memory, which is dangerous and therefore rejected by the borrow checker.
For example:
struct Request<T: Encodable> {
buf: Vec<u8>
}
impl <T: Encodable> Request<T> {
fn serialize<'s>(&'s mut self) -> io::IoResult<&'s Vec<u8>> { //'
let mut writer = io::MemWriter::new();
try!(writer.write_be_i16(0 as i16));
self.buf = writer.unwrap();
let size = self.buf.len();
let result: io::IoResult<&Vec<u8>> = Ok(&self.buf);
result
}
}
Or, as Vladimir Matveev pointed out in the comments, you can simply return the Vec. A Vec already manages its heap memory safely, so returning it directly should work in most situations and avoids any lifetime issues.
impl <T: Encodable> Request<T> {
fn serialize(&mut self) -> io::IoResult<Vec<u8>> {
let mut writer = io::MemWriter::new();
try!(writer.write_be_i16(0 as i16));
let buf = writer.unwrap();
let size = buf.len();
Ok(buf)
}
}
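MemWriter and IoResult no longer exist in current Rust. As a rough modern equivalent of the "return the Vec" approach, here is a minimal sketch that assumes a simplified, hypothetical Request holding only an id. It relies on Vec<u8> implementing std::io::Write and on i16::to_be_bytes for the big-endian encoding.

```rust
use std::io::{self, Write};

// Hypothetical request type for illustration; only the id is serialized here.
struct Request {
    id: i16,
}

impl Request {
    // Returns the encoded bytes by value, so no lifetime annotations are needed.
    fn serialize(&self) -> io::Result<Vec<u8>> {
        let mut buf = Vec::new();
        // `Vec<u8>` implements `Write`; `to_be_bytes` gives the big-endian encoding.
        buf.write_all(&self.id.to_be_bytes())?;
        Ok(buf)
    }
}
```

Calling Request { id: 0x0102 }.serialize() should yield the two bytes 1 and 2. The caller owns the returned Vec, so nothing has to be stored in the struct.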

Related

Is there a way in Rust to overload method for a specific type?

The following is only an example. If there's a native solution for this exact problem with reading bytes - cool, but my goal is to learn how to do it by myself, for any other purpose as well.
I'd like to do something like this: (pseudo-code below)
let mut reader = Reader::new(bytesArr);
let int32: i32 = reader.read(); // separate implementation to read 4 bytes and convert into int32
let int64: i64 = reader.read(); // separate implementation to read 8 bytes and convert into int64
I imagine it looking like this: (pseudo-code again)
impl Reader {
read<T>(&mut self) -> T {
// if T is i32 ... else if ...
}
}
or like this:
impl Reader {
read(&mut self) -> i32 {
// ...
}
read(&mut self) -> i64 {
// ...
}
}
But I haven't found anything relevant yet.
(I actually have for the first case (if T is i32 ...), but it looked really unreadable and inconvenient.)
You could do this with a Readable trait, implemented for i32 and i64, that performs the operation. Reader can then have a generic method that works for any type that is Readable and returns it, for example:
struct Reader {
n: u8,
}
trait Readable {
fn read_from_reader(reader: &mut Reader) -> Self;
}
impl Readable for i32 {
fn read_from_reader(reader: &mut Reader) -> i32 {
reader.n += 1;
reader.n as i32
}
}
impl Readable for i64 {
fn read_from_reader(reader: &mut Reader) -> i64 {
reader.n += 1;
reader.n as i64
}
}
impl Reader {
fn read<T: Readable>(&mut self) -> T {
T::read_from_reader(self)
}
}
fn main() {
let mut r = Reader { n: 0 };
let int32: i32 = r.read();
let int64: i64 = r.read();
println!("{} {}", int32, int64);
}
You can try it on the playground
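The same trait pattern can be applied to an actual byte buffer. The following is only a sketch under some assumptions that are not in the question: the Reader borrows a byte slice, values are big-endian, and read returns an Option so that running out of input is not a panic. The take helper is a hypothetical name.

```rust
use std::convert::TryInto;

// Hypothetical reader over a borrowed byte slice; `pos` is the next unread byte.
struct Reader<'a> {
    data: &'a [u8],
    pos: usize,
}

trait Readable: Sized {
    fn read_from_reader(reader: &mut Reader) -> Option<Self>;
}

impl<'a> Reader<'a> {
    fn new(data: &'a [u8]) -> Self {
        Reader { data, pos: 0 }
    }

    // Takes exactly N bytes, or returns None if fewer remain.
    fn take<const N: usize>(&mut self) -> Option<[u8; N]> {
        let end = self.pos.checked_add(N)?;
        let bytes = self.data.get(self.pos..end)?;
        self.pos = end;
        bytes.try_into().ok()
    }

    // The return type chosen by the caller selects the implementation.
    fn read<T: Readable>(&mut self) -> Option<T> {
        T::read_from_reader(self)
    }
}

impl Readable for i32 {
    fn read_from_reader(reader: &mut Reader) -> Option<i32> {
        reader.take::<4>().map(i32::from_be_bytes)
    }
}

impl Readable for i64 {
    fn read_from_reader(reader: &mut Reader) -> Option<i64> {
        reader.take::<8>().map(i64::from_be_bytes)
    }
}
```

With this, let x: Option<i32> = reader.read(); consumes 4 bytes and let y: Option<i64> = reader.read(); consumes 8, which is close to the interface asked for in the question.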
After some trials and searches, I found that implementing them in current Rust seems a bit difficult, but not impossible.
Here is the code, I'll explain it afterwards:
#![feature(generic_const_exprs)]
use std::{
mem::{self, MaybeUninit},
ptr,
};
static DATA: [u8; 8] = [
u8::MAX,
u8::MAX,
u8::MAX,
u8::MAX,
u8::MAX,
u8::MAX,
u8::MAX,
u8::MAX,
];
struct Reader;
impl Reader {
fn read<T: Copy + Sized>(&self) -> T
where
[(); mem::size_of::<T>()]: ,
{
let mut buf = [unsafe { MaybeUninit::uninit().assume_init() }; mem::size_of::<T>()];
unsafe {
ptr::copy_nonoverlapping(DATA.as_ptr(), buf.as_mut_ptr(), buf.len());
mem::transmute_copy(&buf)
}
}
}
fn main() {
let reader = Reader;
let v_u8: u8 = reader.read();
dbg!(v_u8);
let v_u16: u16 = reader.read();
dbg!(v_u16);
let v_u32: u32 = reader.read();
dbg!(v_u32);
let v_u64: u64 = reader.read();
dbg!(v_u64);
}
Suppose the global static variable DATA is the target data you want to read.
In current Rust, we cannot directly use the size of a generic parameter as the length of an array. This does not work:
fn example<T: Copy + Sized>() {
let mut _buf = [0_u8; mem::size_of::<T>()];
}
The compiler gives a weird error:
error: unconstrained generic constant
--> src\main.rs:34:31
|
34 | let mut _buf = [0_u8; mem::size_of::<T>()];
| ^^^^^^^^^^^^^^^^^^^
|
= help: try adding a `where` bound using this expression: `where [(); mem::size_of::<T>()]:`
There is an issue that is tracking it, if you want to go deeper into this error you can take a look.
We just follow the compiler's suggestion to add a where bound. This requires feature generic_const_exprs to be enabled.
Next, the unsafe { MaybeUninit::uninit().assume_init() } initializer is intended to skip the cost of zeroing the array, since it is overwritten completely afterwards. Be aware that calling assume_init on an uninitialized integer is undefined behavior according to the MaybeUninit documentation, so replacing it with 0_u8 is the safe choice.
Finally, copy the bytes you need, transmute the array to your generic type, and return it.
I think you will see the output you expect:
[src\main.rs:38] v_u8 = 255
[src\main.rs:41] v_u16 = 65535
[src\main.rs:44] v_u32 = 4294967295
[src\main.rs:47] v_u64 = 18446744073709551615
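For comparison, the same values can be produced on stable Rust without unsafe or nightly features. This is a sketch of a different technique, not the approach above: a hypothetical FromBytes trait is implemented for each integer type by a small macro, using the standard from_ne_bytes constructors. It returns an Option because the input may be shorter than the requested type.

```rust
// Same all-0xFF input as in the answer above.
static DATA: [u8; 8] = [u8::MAX; 8];

// Hypothetical helper trait: build a value from the front of a byte slice.
trait FromBytes: Sized {
    fn from_bytes(bytes: &[u8]) -> Option<Self>;
}

// Generates one impl per integer type, using the safe `from_ne_bytes` constructor.
macro_rules! impl_from_bytes {
    ($($t:ty),*) => {$(
        impl FromBytes for $t {
            fn from_bytes(bytes: &[u8]) -> Option<Self> {
                // The array length is a concrete constant here, so no feature gate is needed.
                let mut buf = [0u8; std::mem::size_of::<$t>()];
                let src = bytes.get(..buf.len())?;
                buf.copy_from_slice(src);
                Some(<$t>::from_ne_bytes(buf))
            }
        }
    )*};
}

impl_from_bytes!(u8, u16, u32, u64);

struct Reader;

impl Reader {
    // Always reads from the start of DATA, like the original example.
    fn read<T: FromBytes>(&self) -> Option<T> {
        T::from_bytes(&DATA)
    }
}
```

The trade-off is that every supported type has to be listed in the macro call, whereas the transmute version accepts any Copy type.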

What's the idiomatic way to create an iterator that owns some intermediate data and also points to it?

I'm trying to create a struct that wraps around stdin to provide something like C++'s std::cin.
I want to keep a String with the current line of the input and a SplitAsciiWhitespace iterator to its current token. When I reach the end of the iterator, I want to get a new line.
I'm not worried about error checking and I'm not interested in any crates. This is not for production code, it's just for practicing. I want to avoid using unsafe, as a way to practice the correct mindset.
The idea is that I can use it as follows:
let mut reader = Reader::new();
let x: i32 = reader.read();
let s: f32 = reader.read();
My current attempt is the following, but it doesn't compile. Can somebody give me a pointer on the proper way to do this?
struct Reader<'a> {
line: String,
token: std::str::SplitAsciiWhitespace<'a>,
}
impl<'a> Reader<'a> {
fn new() -> Self {
let line = String::new();
let token = line.split_ascii_whitespace();
Reader { line, token }
}
fn read<T: std::str::FromStr + std::default::Default>(&'a mut self) -> T {
let token = loop {
if let Some(token) = self.token.next() {
break token;
}
let stdin = io::stdin();
stdin.read_line(&mut self.line).unwrap();
self.token = self.line.split_ascii_whitespace();
};
token.parse().unwrap_or_default()
}
}
This question explains why it can't be done this way but does not provide an alternative solution. The "How do I fix it" section simply says "don't put these two things in the same struct", but I can't think of a way to do it separately while keeping a similar interface to the user.
Found a solution: keeping track of how much of the string we've read so far by using a simple index.
It does require some pointer arithmetic, but seems to work nicely.
Not sure if this counts as "idiomatic" Rust, though.
struct Reader {
line: String,
offset: usize,
}
impl Reader {
fn new() -> Self {
Reader { line: String::new(), offset: 0 }
}
fn next<T: std::str::FromStr + std::default::Default> (&mut self) -> T {
loop {
let rem = &self.line[self.offset..];
let token = rem.split_whitespace().next();
if let Some(token) = token {
self.offset = token.as_ptr() as usize - self.line.as_ptr() as usize + token.len();
return token.parse::<T>().unwrap_or_default();
}
self.line.clear();
std::io::stdin().read_line(&mut self.line).unwrap();
self.offset = 0;
}
}
}
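The pointer subtraction can be avoided by computing the token bounds from string lengths instead. Below is a sketch of that variant, with two assumptions that go beyond the answer above: the reader is generic over any BufRead source (so it can be fed from a Cursor in tests, or from a locked stdin), and next returns None at end of input instead of looping forever.

```rust
use std::io::BufRead;
use std::str::FromStr;

// Hypothetical variant of the reader above, generic over the input source so it
// can be exercised without a terminal.
struct Reader<R> {
    input: R,
    line: String,
    offset: usize,
}

impl<R: BufRead> Reader<R> {
    fn new(input: R) -> Self {
        Reader { input, line: String::new(), offset: 0 }
    }

    // Returns None at end of input; unparsable tokens fall back to the default.
    fn next<T: FromStr + Default>(&mut self) -> Option<T> {
        loop {
            let rem = &self.line[self.offset..];
            // Skip leading whitespace, then cut the token at the next whitespace.
            let trimmed = rem.trim_start();
            if !trimmed.is_empty() {
                let start = self.offset + (rem.len() - trimmed.len());
                let len = trimmed.find(char::is_whitespace).unwrap_or(trimmed.len());
                self.offset = start + len;
                return Some(self.line[start..start + len].parse::<T>().unwrap_or_default());
            }
            // The current line is exhausted: fetch the next one.
            self.line.clear();
            self.offset = 0;
            if self.input.read_line(&mut self.line).unwrap() == 0 {
                return None;
            }
        }
    }
}
```

For interactive use, something like Reader::new(std::io::stdin().lock()) should work on recent Rust versions. Only an index into the owned String is stored, so the struct never borrows from itself.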

Why can I just pass an immutable reference to BufReader, instead of a mutable reference? [duplicate]

This question already has an answer here:
Why is it possible to implement Read on an immutable reference to File?
(1 answer)
Closed 6 years ago.
I am writing a simple TCP-based echo server. When I tried to use BufReader and BufWriter to read from and write to a TcpStream, I found that passing a TcpStream to BufReader::new() by value moves its ownership so that I couldn't pass it to a BufWriter. Then, I found an answer in this thread that solves the problem:
fn handle_client(stream: TcpStream) {
let mut reader = BufReader::new(&stream);
let mut writer = BufWriter::new(&stream);
// Receive a message
let mut message = String::new();
reader.read_line(&mut message).unwrap();
// ignored
}
This is simple and it works. However, I don't quite understand why it works. Why can I pass an immutable reference to BufReader::new() instead of a mutable reference?
The whole program can be found here.
More Details
In the above code, I used reader.read_line(&mut message). So I opened the source code of BufRead in Rust standard library and saw this:
fn read_line(&mut self, buf: &mut String) -> Result<usize> {
// ignored
append_to_string(buf, |b| read_until(self, b'\n', b))
}
Here we can see that it passes the self (which may be a &mut BufReader in my case) to read_until(). Next, I found the following code in the same file:
fn read_until<R: BufRead + ?Sized>(r: &mut R, delim: u8, buf: &mut Vec<u8>)
-> Result<usize> {
let mut read = 0;
loop {
let (done, used) = {
let available = match r.fill_buf() {
Ok(n) => n,
Err(ref e) if e.kind() == ErrorKind::Interrupted => continue,
Err(e) => return Err(e)
};
match memchr::memchr(delim, available) {
Some(i) => {
buf.extend_from_slice(&available[..i + 1]);
(true, i + 1)
}
None => {
buf.extend_from_slice(available);
(false, available.len())
}
}
};
r.consume(used);
read += used;
if done || used == 0 {
return Ok(read);
}
}
}
In this part, there are two places using the BufReader: r.fill_buf() and r.consume(used). I thought r.fill_buf() is what I want to see. Therefore, I went to the code of BufReader in Rust standard library and found this:
fn fill_buf(&mut self) -> io::Result<&[u8]> {
// ignored
if self.pos == self.cap {
self.cap = try!(self.inner.read(&mut self.buf));
self.pos = 0;
}
Ok(&self.buf[self.pos..self.cap])
}
It seems like it uses self.inner.read(&mut self.buf) to read the data from self.inner. Then, we take a look at the structure of BufReader and the BufReader::new():
pub struct BufReader<R> {
inner: R,
buf: Vec<u8>,
pos: usize,
cap: usize,
}
// ignored
impl<R: Read> BufReader<R> {
// ignored
#[stable(feature = "rust1", since = "1.0.0")]
pub fn new(inner: R) -> BufReader<R> {
BufReader::with_capacity(DEFAULT_BUF_SIZE, inner)
}
// ignored
#[stable(feature = "rust1", since = "1.0.0")]
pub fn with_capacity(cap: usize, inner: R) -> BufReader<R> {
BufReader {
inner: inner,
buf: vec![0; cap],
pos: 0,
cap: 0,
}
}
// ignored
}
From the above code, we can see that inner is a type that implements Read. In my case, inner would be a &TcpStream.
I know that the signature of Read::read() is:
fn read(&mut self, buf: &mut [u8]) -> Result<usize>
It requires a mutable reference here, but I only lent it an immutable reference. Shouldn't this be a problem when the program reaches self.inner.read() in fill_buf()?
Quick answer: we pass a &TcpStream as R: Read, not a TcpStream. Thus self in Read::read is &mut &TcpStream, not &mut TcpStream. Read is implemented for &TcpStream, as you can see in the documentation.
Look at this working code:
let stream = TcpStream::connect("...").unwrap();
let mut buf = [0; 100];
Read::read(&mut (&stream), &mut buf);
Note that stream is not even bound as mut, because we use it immutably, just having a mutable reference to the immutable one.
Next, you could ask why Read can be implemented for &TcpStream, because it's necessary to mutate something during the read operation.
This is where the nice Rust-world 🌈 ☮ ends, and the evil C-/operating system-world starts 😈. For example, on Linux you have a simple integer as "file descriptor" for the stream. You can use this for all operations on the stream, including reading and writing. Since you pass the integer by value (it's also a Copy-type), it doesn't matter if you have a mutable or immutable reference to the integer as you can just copy it.
Therefore a minimal amount of synchronization has to be done by the operating system or by the Rust std implementation, because usually it's strange and dangerous to mutate through an immutable reference. This behavior is called "interior mutability" and you can read a little bit more about it:
in the cell documentation
in the book 📖
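To illustrate the pattern without a network connection, here is a sketch with a hypothetical in-memory type, SharedSource, that implements Read for a shared reference to itself. It uses a Cell for the read position, which is only an analogy: TcpStream delegates to the operating system instead of a Cell. The helper first_line is also an assumed name.

```rust
use std::cell::Cell;
use std::io::{self, BufRead, BufReader, Read};

// Hypothetical in-memory source; the read position lives in a `Cell`, so it can
// be advanced through a shared reference.
struct SharedSource {
    data: Vec<u8>,
    pos: Cell<usize>,
}

// `Read` is implemented for `&SharedSource`, mirroring `impl Read for &TcpStream`.
impl<'a> Read for &'a SharedSource {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // `self` is `&mut &SharedSource`; only the Cell inside is mutated.
        let start = self.pos.get();
        let n = buf.len().min(self.data.len() - start);
        buf[..n].copy_from_slice(&self.data[start..start + n]);
        self.pos.set(start + n);
        Ok(n)
    }
}

// Reads one line through a shared reference, as `handle_client` does with `&stream`.
fn first_line(source: &SharedSource) -> io::Result<String> {
    let mut reader = BufReader::new(source);
    let mut line = String::new();
    reader.read_line(&mut line)?;
    Ok(line)
}
```

Here BufReader<&SharedSource> calls read on a &mut &SharedSource, so the source itself never needs to be declared mut, just like stream in the question.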

Can't bring trait methods into scope

I have this lib.rs file.
use std::io::{ Result, Read };
pub trait ReadExt: Read {
/// Read all bytes until EOF in this source, returning them as a new `Vec`.
///
/// See `read_to_end` for other semantics.
fn read_into_vec(&mut self) -> Result<Vec<u8>> {
let mut buf = Vec::new();
let res = self.read_to_end(&mut buf);
res.map(|_| buf)
}
/// Read all bytes until EOF in this source, returning them as a new buffer.
///
/// See `read_to_string` for other semantics.
fn read_into_string(&mut self) -> Result<String> {
let mut buf = String::new();
let res = self.read_to_string(&mut buf);
res.map(|_| buf)
}
}
impl<T> ReadExt for T where T: Read {}
And now I want to write tests for it in a separate test/lib.rs
extern crate readext;
use std::io::{Read,Cursor};
use readext::ReadExt;
#[test]
fn test () {
let bytes = b"hello";
let mut input = Cursor::new(bytes);
let s = input.read_into_string();
assert_eq!(s, "hello");
}
But Rust keeps telling me
type std::io::cursor::Cursor<&[u8; 5]> does not implement any method in scope named read_into_string
I don't know why. Obviously I'm useing it already. Confused.
The answer is already in the error:
type std::io::cursor::Cursor<&[u8; 5]> does not implement any method
in scope named read_into_string
The problem is that Cursor<&[u8; 5]> does not implement Read, because the wrapped type is a pointer to a fixed-size array instead of a slice, and so it does not implement your trait either. I guess something along these lines should work:
#[test]
fn test () {
let bytes = b"hello";
let mut input = Cursor::new(bytes as &[u8]);
let s = input.read_into_string().unwrap(); // unwrap the Result before comparing with a &str
assert_eq!(s, "hello");
}
This way input is of type Cursor<&[u8]> which implements Read and so should implement your trait too.
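Put together as a self-contained sketch in current Rust syntax: the trait below is a trimmed copy of the one from the question, and read_hello is a hypothetical helper that shows the slice conversion at the call site.

```rust
use std::io::{self, Cursor, Read};

// Extension trait as in the question, with a blanket impl for every `Read` type.
trait ReadExt: Read {
    fn read_into_string(&mut self) -> io::Result<String> {
        let mut buf = String::new();
        self.read_to_string(&mut buf)?;
        Ok(buf)
    }
}

impl<T: Read> ReadExt for T {}

// Hypothetical helper showing the slice conversion at the call site.
fn read_hello() -> io::Result<String> {
    let bytes = b"hello";
    // `&bytes[..]` turns `&[u8; 5]` into `&[u8]`, so the cursor is `Cursor<&[u8]>`.
    let mut input = Cursor::new(&bytes[..]);
    input.read_into_string()
}
```

Note that read_into_string returns a Result, so a test has to unwrap it (or compare against Ok(...)) before comparing with "hello".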

Using the same iterator multiple times in Rust

Editor's note: This code example is from a version of Rust prior to 1.0, when many iterators implemented Copy. Updated versions of this code produce different errors, but the answers still contain valuable information.
I'm trying to write a function to split a string into clumps of letters and numbers; for example, "test123test" would turn into [ "test", "123", "test" ]. Here's my attempt so far:
pub fn split(input: &str) -> Vec<String> {
let mut bits: Vec<String> = vec![];
let mut iter = input.chars().peekable();
loop {
match iter.peek() {
None => return bits,
Some(c) => if c.is_digit() {
bits.push(iter.take_while(|c| c.is_digit()).collect());
} else {
bits.push(iter.take_while(|c| !c.is_digit()).collect());
}
}
}
return bits;
}
However, this doesn't work, looping forever. It seems that it is using a clone of iter each time I call take_while, starting from the same position over and over again. I would like it to use the same iter each time, advancing the same iterator across all the take_while calls. Is this possible?
As you identified, each take_while call is duplicating iter, since take_while takes self and the Peekable chars iterator is Copy. (Only true before Rust 1.0 - editor)
You want to modify the iterator each time, that is, have take_while operate on an &mut to your iterator, which is exactly what the .by_ref adaptor is for:
pub fn split(input: &str) -> Vec<String> {
let mut bits: Vec<String> = vec![];
let mut iter = input.chars().peekable();
loop {
match iter.peek().map(|c| *c) {
None => return bits,
Some(c) => if c.is_digit(10) {
bits.push(iter.by_ref().take_while(|c| c.is_digit(10)).collect());
} else {
bits.push(iter.by_ref().take_while(|c| !c.is_digit(10)).collect());
},
}
}
}
fn main() {
println!("{:?}", split("123abc456def"))
}
Prints
["123", "bc", "56", "ef"]
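However, I imagine this is not what you want: take_while also consumes the first element that fails the predicate, so the first character of each following group is lost.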
I would actually recommend writing this as a normal for loop, using the char_indices iterator:
pub fn split(input: &str) -> Vec<String> {
let mut bits: Vec<String> = vec![];
if input.is_empty() {
return bits;
}
let mut is_digit = input.chars().next().unwrap().is_digit(10);
let mut start = 0;
for (i, c) in input.char_indices() {
let this_is_digit = c.is_digit(10);
if is_digit != this_is_digit {
bits.push(input[start..i].to_string());
is_digit = this_is_digit;
start = i;
}
}
bits.push(input[start..].to_string());
bits
}
This form also allows for doing this with much fewer allocations (that is, the Strings are not required), because each returned value is just a slice into the input, and we can use lifetimes to state this:
pub fn split<'a>(input: &'a str) -> Vec<&'a str> {
let mut bits = vec![];
if input.is_empty() {
return bits;
}
let mut is_digit = input.chars().next().unwrap().is_digit(10);
let mut start = 0;
for (i, c) in input.char_indices() {
let this_is_digit = c.is_digit(10);
if is_digit != this_is_digit {
bits.push(&input[start..i]);
is_digit = this_is_digit;
start = i;
}
}
bits.push(&input[start..]);
bits
}
All that changed was the type signature, removing the Vec<String> type hint and the .to_string calls.
One could even write an iterator like this, to avoid having to allocate the Vec. Something like fn split<'a>(input: &'a str) -> Splits<'a> { /* construct a Splits */ } where Splits is a struct that implements Iterator<&'a str>.
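A minimal sketch of such an iterator follows. The field name rest and the exact structure are assumptions for illustration; each call to next slices off one maximal run of digits or non-digits without allocating.

```rust
// Hypothetical `Splits` iterator: yields maximal runs of digits or non-digits.
struct Splits<'a> {
    rest: &'a str,
}

fn split<'a>(input: &'a str) -> Splits<'a> {
    Splits { rest: input }
}

impl<'a> Iterator for Splits<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        // The first character decides which kind of run this is; None on empty input.
        let first_is_digit = self.rest.chars().next()?.is_digit(10);
        // The run ends at the first character of the other kind, or at the end.
        let end = self
            .rest
            .char_indices()
            .find(|&(_, c)| c.is_digit(10) != first_is_digit)
            .map(|(i, _)| i)
            .unwrap_or(self.rest.len());
        let (run, rest) = self.rest.split_at(end);
        self.rest = rest;
        Some(run)
    }
}
```

Callers that still want a Vec can write split("test123test").collect::<Vec<_>>(), while others can iterate lazily.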
take_while takes self by value: it consumes the iterator. Before Rust 1.0 it also was unfortunately able to be implicitly copied, leading to the surprising behaviour that you are observing.
You cannot use take_while for what you are wanting for these reasons. You will need to manually unroll your take_while invocations.
Here is one of many possible ways of dealing with this:
pub fn split(input: &str) -> Vec<String> {
let mut bits: Vec<String> = vec![];
let mut iter = input.chars().peekable();
loop {
let seeking_digits = match iter.peek() {
None => return bits,
Some(c) => c.is_digit(10),
};
if seeking_digits {
bits.push(take_while(&mut iter, |c| c.is_digit(10)));
} else {
bits.push(take_while(&mut iter, |c| !c.is_digit(10)));
}
}
}
fn take_while<I, F>(iter: &mut std::iter::Peekable<I>, predicate: F) -> String
where
I: Iterator<Item = char>,
F: Fn(&char) -> bool,
{
let mut out = String::new();
loop {
match iter.peek() {
Some(c) if predicate(c) => out.push(*c),
_ => return out,
}
let _ = iter.next();
}
}
fn main() {
println!("{:?}", split("test123test"));
}
This yields a solution with two levels of looping; another valid approach would be to model it as a state machine one level deep only. Ask if you aren't sure what I mean and I'll demonstrate.
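One possible reading of the single-level state machine idea, as a sketch rather than the answerer's own demonstration: the state is the kind of the run currently being collected (digit or non-digit), and a change of kind closes the run. The variable names are assumptions.

```rust
// Single loop over the characters; `current_is_digit` is the machine's state.
pub fn split(input: &str) -> Vec<String> {
    let mut bits: Vec<String> = Vec::new();
    let mut current = String::new();
    let mut current_is_digit = false;
    for c in input.chars() {
        let is_digit = c.is_digit(10);
        // A change of kind closes the current run and starts a new one.
        if !current.is_empty() && is_digit != current_is_digit {
            bits.push(std::mem::replace(&mut current, String::new()));
        }
        current_is_digit = is_digit;
        current.push(c);
    }
    // Flush the last run, if any.
    if !current.is_empty() {
        bits.push(current);
    }
    bits
}
```

Compared with the two-level version, no peeking is needed, because each character is inspected once and the transition is handled before it is pushed.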

Resources