Let's say I would like to generate a string using write! (it might be called conditionally, inside a loop, etc., which is why I prefer write! over format!) and then use it in multiple places (so I need some sort of Rc, otherwise I can't easily share it between structures). I don't need to change the string after generation, so I think Rc<str> is a good choice (it avoids the extra indirection of Rc<String>, which would store a pointer to a pointer).
The problem is a conversion between String (valid argument for write!) and Rc<str>:
use std::error::Error;
use std::fmt::Write;
use std::rc::Rc;
pub fn generate() -> Result<Rc<str>, Box<dyn Error>> {
let mut out = String::new();
writeln!(out, "Hello {} {}", 1, 2)?; // Just an example
Ok(Rc::from(out.into_boxed_str()))
}
As I understand it, Rc::from copies all the character data into a new memory block (because Box and Rc have different layouts).
Is it possible to somehow use the knowledge that we will create an Rc later, so that we can avoid the copy? Maybe there is a way to create something like String (at least something that is accepted by write!), but with a layout compatible with Rc?
With an unknown number of bytes up front
Your best bet is to just use the From<String> implementation of Rc<str>.
The String handles growing and copying while you're still appending; once you know how much memory you need, the data is copied over into the Rc<str> exactly once, which also gets rid of any over-allocation.
use std::error::Error;
use std::fmt::Write;
use std::rc::Rc;
pub fn generate() -> Result<Rc<str>, Box<dyn Error>> {
let mut out = String::new();
writeln!(out, "Hello {} {}", 1, 2)?; // Just an example
Ok(Rc::from(out))
}
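A minimal sketch of the same pattern with conditional writes inside a loop. The `generate` signature taking a slice is an assumption made for illustration, not the question's code. The text is built in a String, converted once, and every later clone of the Rc<str> shares that single allocation:

```rust
use std::fmt::{self, Write};
use std::rc::Rc;

/// Builds a comma-separated list with `write!`, then converts it once.
fn generate(items: &[u32]) -> Result<Rc<str>, fmt::Error> {
    let mut out = String::new();
    for (i, item) in items.iter().enumerate() {
        if i > 0 {
            // Conditional write: only between elements.
            write!(out, ", ")?;
        }
        write!(out, "{}", item)?;
    }
    // The single copy into the reference-counted allocation happens here.
    Ok(Rc::from(out))
}
```

Cloning the returned value with Rc::clone only bumps the reference count; the character data is not copied again.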
With a known number of bytes
You can directly allocate an Rc with enough storage to store the whole str and copy the data directly into it. To avoid initializing you can use MaybeUninit. To make things more ergonomic I wrapped the Rc<[MaybeUninit<u8>]> in a struct and implemented fmt::Write for it.
This currently requires nightly since there is no way to create an Rc with uninitialized storage on stable (that I know of).
The code below is mine with Chaiym's improvements.
#![feature(new_uninit, maybe_uninit_write_slice)]
use std::error::Error;
use std::fmt::Write;
use std::mem::{self, MaybeUninit};
use std::rc::Rc;
pub fn generate() -> Result<Rc<str>, Box<dyn Error>> {
let mut rc = StrRcBuidler::new(6);
writeln!(rc, "Hello")?;
Ok(rc.build().unwrap())
}
#[derive(Debug)]
struct StrRcBuidler {
rc: Rc<[MaybeUninit<u8>]>,
pos: usize,
}
impl StrRcBuidler {
fn new(len: usize) -> Self {
let rc: Rc<[MaybeUninit<u8>]> = Rc::new_uninit_slice(len);
Self { rc, pos: 0 }
}
fn build(self) -> Result<Rc<str>, Self> {
if self.pos < self.rc.len() {
return Err(self);
}
// SAFETY:
// - transmute is safe because `Rc` has the same layout as a pointer
// - everything is utf-8 because we only allow writes with fmt::Write
// - we check above that all bytes are initialized
Ok(unsafe { mem::transmute(self.rc) })
}
}
impl Write for StrRcBuidler {
fn write_str(&mut self, i: &str) -> Result<(), std::fmt::Error> {
let i = i.as_bytes();
let data = Rc::get_mut(&mut self.rc).unwrap()[self.pos..]
.get_mut(..i.len())
.ok_or(std::fmt::Error)?;
MaybeUninit::write_slice(data, i);
self.pos += i.len();
Ok(())
}
}
If you have a known length, you can do it even on stable, at the cost of zeroing the array (and unsafe code). This is not guaranteed, but it currently works without an intermediate allocation because std specializes Rc's FromIterator for TrustedLen iterators, and Take<Repeat> is TrustedLen (code adapted from @cafce25's answer):
use std::error::Error;
use std::fmt::Write;
use std::rc::Rc;
pub fn generate() -> Result<Rc<str>, Box<dyn Error>> {
let mut rc = StrRcBuidler::new(6);
writeln!(rc, "Hello")?;
Ok(rc.build().unwrap())
}
#[derive(Debug)]
struct StrRcBuidler {
rc: Rc<[u8]>,
pos: usize,
}
impl StrRcBuidler {
fn new(len: usize) -> Self {
Self {
rc: std::iter::repeat(0).take(len).collect(),
pos: 0,
}
}
fn build(self) -> Result<Rc<str>, Self> {
if self.pos < self.rc.len() {
return Err(self);
}
// SAFETY:
// - `[u8]` has the same layout as `str`.
// - Everything is UTF-8 because we only allow writes with `fmt::Write`.
// - We checked above that all bytes are initialized.
Ok(unsafe { Rc::from_raw(Rc::into_raw(self.rc) as *const str) })
}
}
impl Write for StrRcBuidler {
fn write_str(&mut self, i: &str) -> Result<(), std::fmt::Error> {
let i = i.as_bytes();
let data = Rc::get_mut(&mut self.rc).unwrap()[self.pos..]
.get_mut(..i.len())
.ok_or(std::fmt::Error)?;
data.copy_from_slice(i);
self.pos += i.len();
Ok(())
}
}
Technically, you can use this also with only an upper limit on the length (just omit the if self.pos < self.rc.len() check), but this will pad the string with zeroes.
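To make the padding behaviour concrete, here is a sketch of the upper-limit variant. The names PaddedStrRcBuilder, with_capacity and written are hypothetical, and the same caveats about unsafe code apply as for the builder above:

```rust
use std::fmt::{self, Write};
use std::rc::Rc;

/// Hypothetical variant of the builder that only needs an upper bound.
struct PaddedStrRcBuilder {
    rc: Rc<[u8]>,
    pos: usize,
}

impl PaddedStrRcBuilder {
    fn with_capacity(max_len: usize) -> Self {
        Self {
            // Zero-filled storage of the maximum size.
            rc: std::iter::repeat(0u8).take(max_len).collect(),
            pos: 0,
        }
    }

    /// Number of bytes actually written; everything after is padding.
    fn written(&self) -> usize {
        self.pos
    }

    fn build(self) -> Rc<str> {
        // SAFETY:
        // - `[u8]` has the same layout as `str`.
        // - The written prefix is UTF-8 because it only arrives via `fmt::Write`.
        // - The unwritten tail is all zero bytes, and NUL is valid UTF-8.
        unsafe { Rc::from_raw(Rc::into_raw(self.rc) as *const str) }
    }
}

impl Write for PaddedStrRcBuilder {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        let bytes = s.as_bytes();
        let data = Rc::get_mut(&mut self.rc)
            .expect("the builder holds the only reference")[self.pos..]
            .get_mut(..bytes.len())
            .ok_or(fmt::Error)?; // Not enough room left.
        data.copy_from_slice(bytes);
        self.pos += bytes.len();
        Ok(())
    }
}
```

Consumers then have to cope with the trailing NUL bytes themselves, for example with trim_end_matches('\0'), which borrows from the Rc<str> and does not shrink the allocation.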
bindgen has nicely given me
extern "C" {
pub fn Hacl_Bignum4096_new_bn_from_bytes_be(len: u32, b: *mut u8) -> *mut u64;
}
returning something of type *mut u64. Unfortunately there is no reliable way (that I have found) to determine how many u64s are allocated. This makes it very hard (for me) to extract the data pointed to into something I can safely persist in a Rust struct instance.
As a consequence, any time I want to use any function from the Hacl library I have to perform that conversion and free up the created pointers in an unsafe block.
impl Bignum {
/// Returns true if self < other
pub fn lt(&self, other: &Bignum) -> Result<bool, Error> {
let hacl_result: HaclBnWord;
unsafe {
let a = self.get_hacl_bn()?;
let b = other.get_hacl_bn()?;
hacl_result = Hacl_Bignum4096_lt_mask(a, b);
free_hacl_bn(a);
free_hacl_bn(b);
}
Ok(hacl_result != 0 as HaclBnWord)
}
}
unsafe fn get_hacl_bn(&self) is suitably defined and calls Hacl_Bignum4096_new_bn_from_bytes_be() appropriately. And unsafe fn free_hacl_bn(bn: HaclBnType) also lives in this module.
I haven't benchmarked anything yet, but having to perform the conversion to a Hacl_Bignum from bytes each and every time feels wasteful.
So is there a way to determine the size of what is pointed to or is there a way to copy the data out of it into something safe?
You write: "having to perform the conversion to a Hacl_Bignum from bytes each and every time feels wasteful". It seems like you are not letting the library do its job. You should not keep a copy of the bignum data in your Rust struct Bignum, but only the pointer you get from the library. Something like:
extern "C" {
pub fn Hacl_Bignum4096_new_bn_from_bytes_be(len: u32, b: *mut u8) -> *mut u64;
pub fn Hacl_Bignum4096_lt_mask(a: *mut u64, b: *mut u64) -> u64;
}
struct Bignum {
handle: *mut u64,
}
struct BignumError {}
impl Bignum {
pub fn new(bytes: &mut [u8]) -> Result<Self, BignumError> {
unsafe {
let handle =
Hacl_Bignum4096_new_bn_from_bytes_be(bytes.len() as u32, bytes.as_mut_ptr());
if handle.is_null() {
return Err(BignumError {});
} else {
Ok(Self { handle })
}
}
}
/// Returns true if self < other
pub fn lt(&self, other: &Bignum) -> bool {
unsafe { Hacl_Bignum4096_lt_mask(self.handle, other.handle) == u64::MAX }
}
}
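A wrapper that keeps the handle also needs to release it exactly once, which in Rust usually means a Drop implementation. The sketch below shows the ownership pattern only. The mock_* functions are stand-ins written in Rust so the example runs without linking the C library; they are not the real HACL API, and the real deallocation routine (the question's free_hacl_bn) is an assumption to be checked against the library's documentation.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts live handles so the sketch can show that Drop releases them.
static LIVE_HANDLES: AtomicUsize = AtomicUsize::new(0);

// Stand-in for the C constructor: parses up to 8 big-endian bytes.
unsafe fn mock_new_bn_from_bytes_be(len: u32, b: *const u8) -> *mut u64 {
    if len == 0 || len > 8 {
        return std::ptr::null_mut();
    }
    let bytes = unsafe { std::slice::from_raw_parts(b, len as usize) };
    let value = bytes.iter().fold(0u64, |acc, &x| (acc << 8) | u64::from(x));
    LIVE_HANDLES.fetch_add(1, Ordering::SeqCst);
    Box::into_raw(Box::new(value))
}

// Stand-in for the comparison: all-ones mask when a < b, zero otherwise.
unsafe fn mock_lt_mask(a: *mut u64, b: *mut u64) -> u64 {
    let less = unsafe { *a < *b };
    if less { u64::MAX } else { 0 }
}

// Stand-in for the library's deallocation routine.
unsafe fn mock_free_bn(p: *mut u64) {
    drop(unsafe { Box::from_raw(p) });
    LIVE_HANDLES.fetch_sub(1, Ordering::SeqCst);
}

pub struct Bignum {
    handle: *mut u64,
}

impl Bignum {
    pub fn new(bytes: &[u8]) -> Option<Self> {
        let handle =
            unsafe { mock_new_bn_from_bytes_be(bytes.len() as u32, bytes.as_ptr()) };
        if handle.is_null() { None } else { Some(Self { handle }) }
    }

    /// Returns true if self < other, reusing both handles.
    pub fn lt(&self, other: &Bignum) -> bool {
        unsafe { mock_lt_mask(self.handle, other.handle) == u64::MAX }
    }
}

impl Drop for Bignum {
    fn drop(&mut self) {
        // SAFETY: `handle` came from the constructor, is non-null,
        // and is released exactly once, here.
        unsafe { mock_free_bn(self.handle) }
    }
}
```

With this shape, the conversion from bytes happens once per value, and no caller has to remember to free anything.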
PS. I used the comments in this file, which seems to be the library in question.
I'm trying to modify an existing application, which forces me to learn Rust, and it's giving me a hard time (reformulating...)
I would like to have a struct with two fields:
pub struct Something<'a> {
pkt_wtr: PacketWriter<&'a mut Vec<u8>>,
buf: Vec<u8>,
}
Where 'buf' will be used as an io for PacketWriter to write its results. So PacketWriter is something like
use std::io::{self};
pub struct PacketWriter<T :io::Write> {
wtr :T,
}
impl <T :io::Write> PacketWriter<T> {
pub fn new(wtr :T) -> Self {
return PacketWriter {
wtr,
};
}
pub fn into_inner(self) -> T {
self.wtr
}
pub fn write(&mut self) {
self.wtr.write_all(&[10,11,12]).unwrap();
println!("wrote packet");
}
}
Then inside 'Something' I want to use PacketWriter this way: let it write what it needs in 'buf' and drain it by pieces.
impl Something<'_> {
pub fn process(&mut self) {
self.pkt_wtr.write();
let c = self.buf.drain(0..1);
}
}
What seems to be impossible is to create a workable constructor for 'Something'
impl Something<'_> {
pub fn new() -> Self {
let mut buf = Vec::new();
let pkt_wtr = PacketWriter::new(&mut buf);
return Something {
pkt_wtr: pkt_wtr,
buf: buf,
};
}
}
What does not seem to be doable, however I try, is to have PacketWriter constructed on a borrowed reference to 'buf' while 'buf' is also stored in the 'Something' object.
I can give 'buf' fully to 'PacketWriter' (as in the example below), but then I cannot access the content of 'buf' later. I know that it works in the example below, but only because I can access 'buf' after it is given to 'PacketWriter' (through 'wtr'). In reality, 'PacketWriter' keeps that field (wtr) private, and it is code that I cannot modify to, for example, add a getter for 'wtr'.
Thanks
I wrote a small working program to describe the intent and the problem, with the two options
use std::io::{self};
pub struct PacketWriter<T :io::Write> {
wtr :T,
}
impl <T :io::Write> PacketWriter<T> {
pub fn new(wtr :T) -> Self {
return PacketWriter {
wtr,
};
}
pub fn into_inner(self) -> T {
self.wtr
}
pub fn write(&mut self) {
self.wtr.write_all(&[10,11,12]).unwrap();
println!("wrote packet");
}
}
/*
// that does not work of course because buf is local but this is not the issue
pub struct Something<'a> {
pkt_wtr: PacketWriter<&'a mut Vec<u8>>,
buf: Vec<u8>,
}
impl Something<'_> {
pub fn new() -> Self {
let mut buf = Vec::new();
let pkt_wtr = PacketWriter::new(&mut buf);
//let mut pkt_wtr = PacketWriter::new(buf);
return Something {
pkt_wtr,
buf,
};
}
pub fn process(&mut self) {
self.pkt_wtr.write();
println!("process {:?}", self.buf);
}
}
*/
pub struct Something {
pkt_wtr: PacketWriter<Vec<u8>>,
}
impl Something {
pub fn new() -> Self {
let pkt_wtr = PacketWriter::new(Vec::new());
return Something {
pkt_wtr,
};
}
pub fn process(&mut self) {
self.pkt_wtr.write();
let file = &mut self.pkt_wtr.wtr;
println!("processing Something {:?}", file);
let c = file.drain(0..1);
println!("Drained {:?}", c);
}
}
fn main() -> std::io::Result<()> {
let mut file = Vec::new();
let mut wtr = PacketWriter::new(&mut file);
wtr.write();
println!("Got data {:?}", file);
{
let c = file.drain(0..2);
println!("Drained {:?}", c);
}
println!("Remains {:?}", file);
let mut data = Something::new();
data.process();
Ok(())
}
It's not totally clear what the question is, given that the code appears to compile, but I can take a stab at one part: why can't you use into_inner() on self.wtr inside the process function?
into_inner takes ownership of the PacketWriter that gets passed into its self parameter. (You can tell this because the parameter is spelled self, rather than &self or &mut self.) Taking ownership means that it is consumed: it cannot be used anymore by the caller and the callee is responsible for dropping it (read: running destructors). After taking ownership of the PacketWriter, the into_inner function returns just the wtr field and drops (runs destructors on) the rest. But where does that leave the Something struct? It has a field that needs to contain a PacketWriter, and you just took its PacketWriter away and destroyed it! The function ends, and the value held in the PacketWriter field is unknown: it can't be the thing that was in there from the beginning, because that was taken over by into_inner and destroyed. But it also can't be anything else.
Rust generally forbids structs from having uninitialized or undefined fields. You need to have that field defined at all times.
Here's the worked example:
pub fn process(&mut self) {
self.pkt_wtr.write();
// There's a valid PacketWriter in pkt_wtr
let raw_wtr: Vec<u8> = self.pkt_wtr.into_inner();
// The PacketWriter in pkt_wtr was consumed by into_inner!
// We have a raw_wtr of type Vec<u8>, but that's not the right type for pkt_wtr
// We could try to call this function here, but what would it do?
self.pkt_wtr.write();
println!("processing Something");
}
(Note: The example above has slightly squishy logic. Formally, because you don't own self, you can't do anything that would take ownership of any part of it, even if you put everything back neatly when you're done.)
You have a few options to fix this, but with one major caveat: with the public interface you have described, there is no way to get access to the PacketWriter::wtr field and put it back into the same PacketWriter. You'll have to extract the PacketWriter::wtr field and put it into a new PacketWriter.
Here's one way you could do it. Remember, the goal is to have self.pkt_wtr defined at all times, so we'll use a function called mem::replace to put a dummy PacketWriter into self.pkt_wtr. This ensures that self.pkt_wtr always has something in it.
pub fn process(&mut self) {
self.pkt_wtr.write();
// Create a new dummy PacketWriter and swap it with self.pkt_wtr
// Returns an owned version of pkt_wtr that we're free to consume
let pkt_wtr_owned = std::mem::replace(&mut self.pkt_wtr, PacketWriter::new(Vec::new()));
// Consume pkt_wtr_owned, returning its wtr field
let raw_wtr = pkt_wtr_owned.into_inner();
// Do anything you want with raw_wtr here -- you own it.
println!("The vec is: {:?}", &raw_wtr);
// Create a new PacketWriter with the old PacketWriter's buffer.
// The dummy PacketWriter is dropped here.
self.pkt_wtr = PacketWriter::new(raw_wtr);
println!("processing Something");
}
This solution is definitely a hack, and it's potentially a place where the borrow checker could be improved to realize that leaving a field temporarily undefined is fine, as long as it's not accessed before it is assigned again. (Though there may be an edge case I missed; this stuff is hard to reason about in general.) Additionally, this is the kind of thing that can be optimized away by later compiler passes through dead store elimination.
If this turns out to be a hotspot when profiling, there are unsafe techniques that would allow the field to be invalid for that period, but that would probably need a new question.
However, my recommendation would be to find a way to get an "escape hatch" function added to PacketWriter that lets you do exactly what you want to do: get a mutable reference to the inner wtr without taking ownership of PacketWriter.
impl<T: io::Write> PacketWriter<T> {
pub fn inner_mut(&mut self) -> &mut T {
&mut self.wtr
}
}
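If PacketWriter cannot be changed, a variation on the mem::replace approach is to store the field as an Option and take it out temporarily, which avoids constructing a dummy PacketWriter. A minimal sketch, assuming you control the Something struct; the PacketWriter here is a stand-in that exposes only new, write and into_inner, and the process signature returning the drained bytes is illustrative:

```rust
use std::io;

// Stand-in for the third-party type: `wtr` is only reachable via into_inner.
pub struct PacketWriter<T: io::Write> {
    wtr: T,
}

impl<T: io::Write> PacketWriter<T> {
    pub fn new(wtr: T) -> Self {
        PacketWriter { wtr }
    }
    pub fn into_inner(self) -> T {
        self.wtr
    }
    pub fn write(&mut self) {
        self.wtr.write_all(&[10, 11, 12]).unwrap();
    }
}

pub struct Something {
    // Invariant: always `Some` outside of `process`.
    pkt_wtr: Option<PacketWriter<Vec<u8>>>,
}

impl Something {
    pub fn new() -> Self {
        Something {
            pkt_wtr: Some(PacketWriter::new(Vec::new())),
        }
    }

    /// Writes one packet, then drains and returns up to `n` buffered bytes.
    pub fn process(&mut self, n: usize) -> Vec<u8> {
        // Leaves `None` behind, so the field is never uninitialized.
        let mut pkt_wtr = self
            .pkt_wtr
            .take()
            .expect("pkt_wtr is always Some between calls");
        pkt_wtr.write();
        // Consume the writer to get the buffer back.
        let mut buf = pkt_wtr.into_inner();
        let n = n.min(buf.len());
        let drained: Vec<u8> = buf.drain(..n).collect();
        // Rebuild the writer around the remaining bytes.
        self.pkt_wtr = Some(PacketWriter::new(buf));
        drained
    }
}
```

The trade-off is the same as before: the writer is rebuilt on every call, so this only works if PacketWriter carries no state that must survive between packets.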
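For clarification, I found a solution using Rc+RefCell or Arc+Mutex. I encapsulated the buffer in an Arc<Mutex<...>> (or Rc<RefCell<...>>) and implemented Write for a wrapper around it:
use std::io::{Error, Write};
use std::sync::{Arc, Mutex};
pub struct WrappedWriter {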
data :Arc<Mutex<Vec<u8>>>,
}
impl WrappedWriter {
pub fn new(data : Arc<Mutex<Vec<u8>>>) -> Self {
return WrappedWriter {
data,
};
}
}
impl Write for WrappedWriter {
fn write(&mut self, buf: &[u8]) -> Result<usize, Error> {
let mut data = self.data.lock().unwrap();
data.write(buf)
}
fn flush(&mut self) -> Result<(), Error> {
Ok(())
}
}
pub struct Something {
wtr: PacketWriter<WrappedWriter>,
data : Arc<Mutex<Vec<u8>>>,
}
impl Something {
pub fn new() -> Result<Self, Error> {
let data :Arc<Mutex<Vec<u8>>> = Arc::new(Mutex::new(Vec::new()));
let wtr = PacketWriter::new(WrappedWriter::new(Arc::clone(&data)));
return Ok(Something {
wtr,
data,
});
}
pub fn process(&mut self) {
let mut data = self.data.lock().unwrap();
data.clear();
}
}
You can replace Arc with Rc and Mutex with RefCell if you don't need thread safety, in which case the access becomes
let mut data = self.data.borrow_mut();
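For reference, a self-contained sketch of the single-threaded Rc/RefCell variant. The PacketWriter is a stand-in for the third-party type, and the buffered helper and the return value of process are additions for illustration:

```rust
use std::cell::RefCell;
use std::io::{self, Write};
use std::rc::Rc;

// Stand-in for the third-party writer with a private `wtr` field.
pub struct PacketWriter<T: Write> {
    wtr: T,
}

impl<T: Write> PacketWriter<T> {
    pub fn new(wtr: T) -> Self {
        PacketWriter { wtr }
    }
    pub fn write(&mut self) {
        self.wtr.write_all(&[10, 11, 12]).unwrap();
    }
}

/// Writer that appends to a buffer shared with its creator.
pub struct WrappedWriter {
    data: Rc<RefCell<Vec<u8>>>,
}

impl Write for WrappedWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // The borrow ends when this statement ends.
        self.data.borrow_mut().extend_from_slice(buf);
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

pub struct Something {
    pkt_wtr: PacketWriter<WrappedWriter>,
    data: Rc<RefCell<Vec<u8>>>,
}

impl Something {
    pub fn new() -> Self {
        let data = Rc::new(RefCell::new(Vec::new()));
        let pkt_wtr = PacketWriter::new(WrappedWriter {
            data: Rc::clone(&data),
        });
        Something { pkt_wtr, data }
    }

    /// Writes one packet, then drains and returns the first buffered byte.
    pub fn process(&mut self) -> Vec<u8> {
        // Write first: holding a borrow across this call would panic.
        self.pkt_wtr.write();
        let mut data = self.data.borrow_mut();
        let drained: Vec<u8> = data.drain(..1).collect();
        drained
    }

    /// Copy of the bytes still buffered (hypothetical helper).
    pub fn buffered(&self) -> Vec<u8> {
        self.data.borrow().clone()
    }
}
```

Unlike a poisoned Mutex, a conflicting RefCell borrow panics immediately, so the borrow in process is taken only after the writer has finished.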
An answer to How do I read the entire body of a Tokio-based Hyper request? suggests:
you may wish to establish some kind of cap on the number of bytes read [when using futures::Stream::concat2]
How can I actually achieve this? For example, here's some code that mimics a malicious user who is sending my service an infinite amount of data:
extern crate futures; // 0.1.25
use futures::{prelude::*, stream};
fn some_bytes() -> impl Stream<Item = Vec<u8>, Error = ()> {
stream::repeat(b"0123456789ABCDEF".to_vec())
}
fn limited() -> impl Future<Item = Vec<u8>, Error = ()> {
some_bytes().concat2()
}
fn main() {
let v = limited().wait().unwrap();
println!("{}", v.len());
}
One solution is to create a stream combinator that ends the stream once some threshold of bytes has passed. Here's one possible implementation:
struct TakeBytes<S> {
inner: S,
seen: usize,
limit: usize,
}
impl<S> Stream for TakeBytes<S>
where
S: Stream<Item = Vec<u8>>,
{
type Item = Vec<u8>;
type Error = S::Error;
fn poll(&mut self) -> Poll<Option<Self::Item>, Self::Error> {
if self.seen >= self.limit {
return Ok(Async::Ready(None)); // Stream is over
}
let inner = self.inner.poll();
if let Ok(Async::Ready(Some(ref v))) = inner {
self.seen += v.len();
}
inner
}
}
trait TakeBytesExt: Sized {
fn take_bytes(self, limit: usize) -> TakeBytes<Self>;
}
impl<S> TakeBytesExt for S
where
S: Stream<Item = Vec<u8>>,
{
fn take_bytes(self, limit: usize) -> TakeBytes<Self> {
TakeBytes {
inner: self,
limit,
seen: 0,
}
}
}
This can then be chained onto the stream before concat2:
fn limited() -> impl Future<Item = Vec<u8>, Error = ()> {
some_bytes().take_bytes(999).concat2()
}
This implementation has caveats:
It only works for Vec<u8>. You can introduce generics to make it more broadly applicable, of course.
It allows more bytes than the limit to come in; it just stops the stream after that point. Those types of decisions are application-dependent.
Another thing to keep in mind is that you want to tackle this problem at as low a level as you can: if the source of the data has already allocated a gigabyte of memory, placing a limit won't help as much.
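If the application needs a hard cap instead, the chunk that crosses the limit can be truncated. The sketch below shows only that bookkeeping, and it uses a plain std Iterator as a stand-in for the futures 0.1 Stream so that it runs without external crates; the same remaining-bytes logic would go inside poll. The take_bytes free function is a hypothetical helper:

```rust
/// Iterator adaptor that yields at most `limit` bytes in total,
/// truncating the chunk that crosses the limit.
struct TakeBytes<I> {
    inner: I,
    remaining: usize,
}

impl<I> Iterator for TakeBytes<I>
where
    I: Iterator<Item = Vec<u8>>,
{
    type Item = Vec<u8>;

    fn next(&mut self) -> Option<Vec<u8>> {
        if self.remaining == 0 {
            return None; // Budget used up: the sequence is over.
        }
        let mut chunk = self.inner.next()?;
        // Cut the chunk down so the total never exceeds the limit.
        chunk.truncate(self.remaining);
        self.remaining -= chunk.len();
        Some(chunk)
    }
}

fn take_bytes<I>(inner: I, limit: usize) -> TakeBytes<I>
where
    I: Iterator<Item = Vec<u8>>,
{
    TakeBytes {
        inner,
        remaining: limit,
    }
}
```

Whether to truncate silently like this or to fail the request with an error once the limit is hit is again an application-level decision.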
I need to serialize a class of structs according to the TLV format with Serde. TLV can be nested in a tree format.
The fields of these structs are serialized normally, much like bincode does, but before the field data I must include a tag (to be associated, ideally) and the length, in bytes, of the field data.
Ideally, Serde would recognize the structs that need this kind of serialization, probably by having them implement a TLV trait. This part is optional, as I can also explicitly annotate each of these structs.
So this question breaks down in 3 parts, in order of priority:
How do I get the length data (from Serde?) before the serialization of that data has been performed?
How do I associate tags with structs (though I guess I could also include tags inside the structs..)?
How do I make Serde recognize a class of structs and apply custom serialization?
Note that 1) is the (core) question here. I will post 2) and 3) as individual questions if 1) can be solved with Serde.
Brace yourself, long post. Also, for convention: I'm picking both type and length to be unsigned 4 byte big endian. Let's start with the easy stuff:
How do I make Serde recognize a class of structs and apply custom serialization?
That's really a separate question, but you can either do that via the #[serde(serialize_with = …)] attributes, or in your serializer's fn serialize_struct(self, name: &'static str, _: usize) based on the name, depending on what exactly you have in mind.
How do I associate tags with structs (though I guess I could also include tags inside the structs..)?
This is a known limitation of serde, and the reason protobuf implementations typically aren't based on serde (take e.g. prost), but have their own derive proc macros that allow you to annotate structs and fields with the respective tags. You should probably do the same as it's clean and fast. But since you asked about serde, I'll pick an alternative inspired by serde_protobuf: if you look at it from a weird angle, serde is just a visitor-based reflection framework. It will provide you with structure information about the type you're currently (de-)serializing, e.g. it'll tell you type and name and fields of the type you're visiting. All you need is a (user-supplied) function that maps from this type information to the tags. For example:
struct TLVSerializer<'a> {
ttf: &'a dyn Fn(TypeTagFor) -> u32,
…
}
impl<'a> Serializer for TLVSerializer<'a> {
fn serialize_bool(self, v: bool) -> Result<Self::Ok, Self::Error> {
let tag = &(self.ttf)(TypeTagFor::Bool).to_be_bytes();
let len = &1u32.to_be_bytes();
todo!("write");
}
fn serialize_i32(self, v: i32) -> Result<Self::Ok, Self::Error> {
let tag = &(self.ttf)(TypeTagFor::Int {
signed: true,
width: 4,
})
.to_be_bytes();
let len = &4u32.to_be_bytes();
todo!("write");
}
}
Then, you need to write a function that supplies the tags, e.g. something like:
enum TypeTagFor {
Bool,
Int { width: u8, signed: bool },
Struct { name: &'static str },
// ...
}
fn foobar_type_tag_for(ttf: TypeTagFor) -> u32 {
match ttf {
TypeTagFor::Int {
width: 4,
signed: true,
} => 0x69333200,
TypeTagFor::Bool => 0x626f6f6c,
_ => unreachable!(),
}
}
If you only have one set of type → tag mappings, you could also put it into the serializer directly.
How do I get the length data (from Serde?) before the serialization of that data has been performed?
The short answer is: Can't. The length can't be known without inspecting the entire structure (there could be Vecs in it, e.g.). But that also tells you what you need to do: You need to inspect the entire structure first, deduce the length, and then do the serialization. And you have precisely one method for inspecting the entire structure at hand: serde. So, you'll write a serializer that doesn't actually serialize anything and only records the length:
struct TLVLenVisitor;
impl Serializer for TLVLenVisitor {
type Ok = usize;
type SerializeSeq = TLVLenSumVisitor;
fn serialize_i32(self, _v: i32) -> Result<Self::Ok, Self::Error> {
Ok(4)
}
fn serialize_str(self, str: &str) -> Result<Self::Ok, Self::Error> {
Ok(str.len())
}
fn serialize_seq(self, _len: Option<usize>) -> Result<Self::SerializeSeq, Self::Error> {
Ok(TLVLenSumVisitor { sum: 0 })
}
}
struct TLVLenSumVisitor {
sum: usize,
}
impl serde::ser::SerializeSeq for TLVLenSumVisitor {
type Ok = usize;
fn serialize_element<T: Serialize + ?Sized>(&mut self, value: &T) -> Result<(), Self::Error> {
// The length of a sequence is the length of all its parts, plus the bytes for type tag and length
self.sum += value.serialize(TLVLenVisitor)? + HEADER_LEN;
Ok(())
}
fn end(self) -> Result<Self::Ok, Self::Error> {
Ok(self.sum)
}
}
Fortunately, serialization is non-destructive, so you can use this first serializer to get the length, and then do the actual serialization in a second pass:
let len = foobar.serialize(TLVLenVisitor).unwrap();
foobar.serialize(TLVSerializer {
target: &mut File::create("foobar").unwrap(), // No seeking performed on the file
len,
ttf: &foobar_type_tag_for,
})
.unwrap();
Since you already know the length of what you're serializing, the second serializer is relatively straightforward:
struct TLVSerializer<'a> {
target: &'a mut dyn Write, // Using dyn to reduce verbosity of the example
len: usize,
ttf: &'a dyn Fn(TypeTagFor) -> u32,
}
impl<'a> Serializer for TLVSerializer<'a> {
type Ok = ();
type SerializeSeq = TLVSeqSerializer<'a>;
// Glossing over error handling here.
fn serialize_seq(self, _len: Option<usize>) -> Result<Self::SerializeSeq, Self::Error> {
self.target
.write_all(&(self.ttf)(TypeTagFor::Seq).to_be_bytes())
.unwrap();
// Normally, there'd be no way to find the length here.
// But since TLVSerializer has been told, there's no problem
self.target
.write_all(&u32::try_from(self.len).unwrap().to_be_bytes())
.unwrap();
Ok(TLVSeqSerializer {
target: self.target,
ttf: self.ttf,
})
}
}
The only snag you may hit is that the TLVLenVisitor only gave you one length. But you have many TLV-structures, recursively nested. When you want to write out one of the nested structures (e.g. a Vec), you just run the TLVLenVisitor again, for each element.
struct TLVSeqSerializer<'a> {
target: &'a mut dyn Write,
ttf: &'a dyn Fn(TypeTagFor) -> u32,
}
impl<'a> serde::ser::SerializeSeq for TLVSeqSerializer<'a> {
type Ok = ();
fn serialize_element<T: Serialize + ?Sized>(&mut self, value: &T) -> Result<(), Self::Error> {
value.serialize(TLVSerializer {
// Getting the length of a subfield here
len: value.serialize(TLVLenVisitor)?,
target: self.target,
ttf: self.ttf,
})
}
fn end(self) -> Result<Self::Ok, Self::Error> {
Ok(())
}
}
This also means that you may have to do many passes over the structure you're serializing. This might be fine if speed is not of the essence and you're memory-constrained, but in general, I don't think it's a good idea. You may be tempted to try to get all the lengths in the entire structure in a single pass, which can be done, but it'll either be brittle (since you'd have to rely on visiting order) or difficult (because you'd have to build a shadow structure which contains all the lengths).
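The repeated passes are easier to see without the serde machinery. The sketch below applies the same measure-then-write idea to a hypothetical Node tree standing in for the serde data model; it is not the serializer from this answer. Every nested node is measured again when its own header is written, so deep trees are walked many times:

```rust
use std::io::{self, Write};

/// 4-byte tag plus 4-byte length, both big-endian.
const HEADER_LEN: usize = 8;

/// Hypothetical stand-in for the data being serialized.
enum Node {
    Leaf { tag: u32, data: Vec<u8> },
    Seq { tag: u32, children: Vec<Node> },
}

/// First pass: payload length of `node`, excluding its own header.
fn payload_len(node: &Node) -> usize {
    match node {
        Node::Leaf { data, .. } => data.len(),
        Node::Seq { children, .. } => children
            .iter()
            .map(|child| HEADER_LEN + payload_len(child))
            .sum(),
    }
}

/// Second pass: stream the encoding to `out` without seeking.
fn write_tlv<W: Write>(node: &Node, out: &mut W) -> io::Result<()> {
    let tag = match node {
        Node::Leaf { tag, .. } | Node::Seq { tag, .. } => *tag,
    };
    // Re-measures the whole subtree for every nested node.
    let len = payload_len(node);
    if len > u32::MAX as usize {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "payload too long",
        ));
    }
    out.write_all(&tag.to_be_bytes())?;
    out.write_all(&(len as u32).to_be_bytes())?;
    match node {
        Node::Leaf { data, .. } => out.write_all(data)?,
        Node::Seq { children, .. } => {
            for child in children {
                write_tlv(child, &mut *out)?;
            }
        }
    }
    Ok(())
}
```

The upside is that `out` can be any io::Write, including a socket or a file that is never seeked, and no intermediate buffer is needed.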
Also, do note that this approach expects that two serializer invocations of the same struct traverse the same structure. But an implementer of Serialize is perfectly capable of generating random data on the fly or mutating itself via interior mutability, which would make this serializer generate invalid data. You can ignore that problem since it's far-fetched, or add a check to the end call and make sure the written length matches the actual written data.
Really, I think it'd be best if you don't worry about finding the length before serialization and wrote the serialization result to memory first. To do so, you can first write all length fields as a dummy value to a Vec<u8>:
struct TLVSerializer<'a> {
target: &'a mut Vec<u8>,
ttf: &'a dyn Fn(TypeTagFor) -> u32,
}
impl<'a> Serializer for TLVSerializer<'a> {
type Ok = ();
type SerializeSeq = TLVSeqSerializer<'a>;
fn serialize_seq(self, _len: Option<usize>) -> Result<Self::SerializeSeq, Self::Error> {
let idx = self.target.len();
self.target
.extend((self.ttf)(TypeTagFor::Seq).to_be_bytes());
// Writing dummy length here
self.target.extend(u32::MAX.to_be_bytes());
Ok(TLVSeqSerializer {
target: self.target,
idx,
ttf: self.ttf,
})
}
}
Then after you serialize the content and know its length, you can overwrite the dummies:
struct TLVSeqSerializer<'a> {
target: &'a mut Vec<u8>,
idx: usize, // This is how it knows where it needs to write the length
ttf: &'a dyn Fn(TypeTagFor) -> u32,
}
impl<'a> serde::ser::SerializeSeq for TLVSeqSerializer<'a> {
type Ok = ();
fn serialize_element<T: Serialize + ?Sized>(&mut self, value: &T) -> Result<(), Self::Error> {
value.serialize(TLVSerializer {
target: self.target,
ttf: self.ttf,
})
}
fn end(self) -> Result<Self::Ok, Self::Error> {
end(self.target, self.idx)
}
}
fn end(target: &mut Vec<u8>, idx: usize) -> Result<(), std::fmt::Error> {
let len = u32::try_from(target.len() - idx - HEADER_LEN)
.unwrap()
.to_be_bytes();
target[idx + 4..][..4].copy_from_slice(&len);
Ok(())
}
And there you go: single-pass TLV serialization with serde.
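Stripped of the serde traits, the back-patching trick is just "remember where the header starts, write a placeholder, fix it up afterwards". A minimal sketch on a hypothetical Node tree (a stand-in for the serde data model, not the serializer above), using the same 4-byte big-endian tag and length:

```rust
/// 4-byte tag plus 4-byte length, both big-endian.
const HEADER_LEN: usize = 8;

/// Hypothetical stand-in for the data being serialized.
enum Node {
    Leaf { tag: u32, data: Vec<u8> },
    Seq { tag: u32, children: Vec<Node> },
}

/// Appends the TLV encoding of `node` to `out` in a single pass.
fn write_tlv(node: &Node, out: &mut Vec<u8>) {
    // Remember where this header starts so the length can be patched.
    let idx = out.len();
    let tag = match node {
        Node::Leaf { tag, .. } | Node::Seq { tag, .. } => *tag,
    };
    out.extend_from_slice(&tag.to_be_bytes());
    // Dummy length, overwritten below.
    out.extend_from_slice(&u32::MAX.to_be_bytes());

    match node {
        Node::Leaf { data, .. } => out.extend_from_slice(data),
        Node::Seq { children, .. } => {
            for child in children {
                write_tlv(child, out);
            }
        }
    }

    // Everything written after the header is this node's payload.
    let payload = out.len() - idx - HEADER_LEN;
    assert!(payload <= u32::MAX as usize, "payload too long");
    out[idx + 4..idx + HEADER_LEN].copy_from_slice(&(payload as u32).to_be_bytes());
}
```

Each node is visited once, at the cost of holding the whole output in memory until the outermost length has been patched.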
I have an object that can be in either of two modes: a source or a sink. It is always in one of them and it is always known at compile time (when passed the object you know if you are going to read or write to it obviously).
I can put all the methods on the same object, and just assume I won't be called improperly (or return an error when I am), or I was thinking I could make two tuple structs of the single underlying object and attach the methods to those tuple structs instead. The methods are almost entirely disjoint.
It is kind of abusing the fact that both tuple structs have the same layout and there is zero overhead for the casts and tuple storage.
Think of this similar to the Java ByteBuffer and related classes where you write then flip then read then flip back and write more. Except this would catch errors in usage.
However, it does seem a little unusual and might be overly confusing for such a small problem. And it seems like there is a better way to do this -- only requirement is zero overhead so no dynamic dispatch.
https://play.rust-lang.org/?gist=280d2ec2548e4f38e305&version=stable
#[derive(Debug)]
struct Underlying {
a: u32,
b: u32,
}
#[derive(Debug)]
struct FaceA(Underlying);
impl FaceA {
fn make() -> FaceA { FaceA(Underlying{a:1,b:2}) }
fn doa(&self) { println!("FaceA do A {:?}", *self); }
fn dou(&self) { println!("FaceA do U {:?}", *self); }
fn tob(&self) -> &FaceB { unsafe{std::mem::transmute::<&FaceA,&FaceB>(self)} }
}
#[derive(Debug)]
struct FaceB(Underlying);
impl FaceB {
fn dob(&self) { println!("FaceB do B {:?}", *self); }
fn dou(&self) { println!("FaceB do U {:?}", *self); }
fn toa(&self) -> &FaceA { unsafe{std::mem::transmute::<&FaceB,&FaceA>(self)} }
}
fn main() {
let a = FaceA::make();
a.doa();
a.dou();
let b = a.tob();
b.dob();
b.dou();
let aa = b.toa();
aa.doa();
aa.dou();
}
First of all, it seems like you don't understand how ownership works in Rust; you may want to read the Ownership chapter of the Rust Book. Specifically, the way you keep re-aliasing the original FaceA is how you would specifically enable the very thing you say you want to avoid. Also, all the borrows are immutable, so it's not clear how you intend to do any sort of mutation.
As such, I've written a new example from scratch that involves going between two types with disjoint interfaces (view on playpen).
#[derive(Debug)]
pub struct Inner {
pub value: i32,
}
impl Inner {
pub fn new(value: i32) -> Self {
Inner {
value: value,
}
}
}
#[derive(Debug)]
pub struct Upper(Inner);
impl Upper {
pub fn new(inner: Inner) -> Self {
Upper(inner)
}
pub fn into_downer(self) -> Downer {
Downer::new(self.0)
}
pub fn up(&mut self) {
self.0.value += 1;
}
}
#[derive(Debug)]
pub struct Downer(Inner);
impl Downer {
pub fn new(inner: Inner) -> Self {
Downer(inner)
}
pub fn into_upper(self) -> Upper {
Upper::new(self.0)
}
pub fn down(&mut self) {
self.0.value -= 1;
}
}
fn main() {
let mut a = Upper::new(Inner::new(0));
a.up();
let mut b = a.into_downer();
b.down();
b.down();
b.down();
let mut c = b.into_upper();
c.up();
show_i32(c.0.value);
}
#[inline(never)]
fn show_i32(v: i32) {
println!("v: {:?}", v);
}
Here, the into_upper and into_downer methods consume the subject value, preventing anyone from using it afterwards (try accessing a after the call to a.into_downer()).
This should not be particularly inefficient; there is no heap allocation going on here, and Rust is pretty good at moving values around efficiently. If you're curious, this is what the main function compiles down to with optimisations enabled:
mov edi, -1
jmp _ZN8show_i3220h2a10d619fa41d919UdaE
It literally inlines the entire program (save for the show function that I specifically told it not to inline). Unless profiling shows this to be a serious performance problem, I wouldn't worry about it.
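A common variation on the same idea uses one generic struct with a zero-sized marker parameter instead of two wrapper types, which keeps shared fields in one place. This is a sketch of that alternative, not something the code above requires; Buffer, Sink, Source and the method names are all hypothetical, loosely modelled on the ByteBuffer comparison in the question:

```rust
use std::marker::PhantomData;

/// Zero-sized mode markers.
pub struct Sink;
pub struct Source;

pub struct Buffer<Mode> {
    data: Vec<u8>,
    pos: usize,
    _mode: PhantomData<Mode>,
}

// Methods that only exist while writing.
impl Buffer<Sink> {
    pub fn new() -> Self {
        Buffer { data: Vec::new(), pos: 0, _mode: PhantomData }
    }

    pub fn put(&mut self, byte: u8) {
        self.data.push(byte);
    }

    /// Consumes the sink and returns the same storage in read mode.
    pub fn into_source(self) -> Buffer<Source> {
        Buffer { data: self.data, pos: 0, _mode: PhantomData }
    }
}

// Methods that only exist while reading.
impl Buffer<Source> {
    pub fn get(&mut self) -> Option<u8> {
        let byte = self.data.get(self.pos).copied()?;
        self.pos += 1;
        Some(byte)
    }

    /// Drops the bytes already read and switches back to write mode.
    pub fn into_sink(mut self) -> Buffer<Sink> {
        self.data.drain(..self.pos);
        Buffer { data: self.data, pos: 0, _mode: PhantomData }
    }
}
```

Calling put on a Buffer<Source>, or get on a Buffer<Sink>, is rejected at compile time because the method does not exist for that mode, and the conversions take self by value so the old mode cannot be used afterwards.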