I am having a problem with the Rust bincode library. When it (de)serializes a vector, it always uses an 8-byte length prefix. This is a fine assumption when you always encode data using bincode, because bincode can read its own serialized data.
I am in a situation where I cannot influence the serializer: I did not write it, and it has to stay the same for legacy reasons. It encodes its vectors as length-prefixed arrays where the prefix is always 2 bytes (in some cases it is 4 bytes, but I know these cases well; once I know how to do it with 2 bytes, 4 bytes should not be a problem).
How can I use bincode (and serde for that matter) to deserialize these fields? Can I work around the default 8 bytes of length hardcoded in bincode?
Bincode is not supposed to be compatible with any existing serializer or standard. Nor is, according to the comment, the format you are trying to read.
I suggest you get the bincode sources—they are MIT-licensed, so you are free to do basically whatever you please with them—and modify them to suit your format (and give it your name and include it in your project).
serde::Deserializer is quite well documented, as is the underlying data model, and the implementation in bincode is trivial to find (in de/mod.rs), so take it as your starting point and adjust as needed.
I have figured out a (possibly very ugly) way to do it without implementing my own deserializer; bincode could do it after all. It looks something like this:
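use serde::de::{self, Deserialize, Deserializer, SeqAccess, Visitor};
use std::fmt;

/// A byte vector encoded with a 2-byte length prefix.
pub struct VarLen16(pub Vec<u8>);

impl<'de> Deserialize<'de> for VarLen16 {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        struct VarLen16Visitor;

        impl<'de> Visitor<'de> for VarLen16Visitor {
            type Value = VarLen16;

            fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
                formatter.write_str("VarLen16")
            }

            fn visit_seq<A>(self, mut seq: A) -> Result<Self::Value, A::Error>
            where
                A: SeqAccess<'de>,
            {
                // The first "tuple element" is the 2-byte length prefix.
                let length: u16 = seq
                    .next_element()?
                    .ok_or_else(|| de::Error::invalid_length(0, &self))?;
                let mut res = Vec::with_capacity(length as usize);
                // Then read exactly `length` single-byte elements.
                for i in 0..length as usize {
                    let byte: u8 = seq
                        .next_element()?
                        .ok_or_else(|| de::Error::invalid_length(i + 1, &self))?;
                    res.push(byte);
                }
                Ok(VarLen16(res))
            }
        }

        // Claim a tuple long enough for the prefix plus the largest possible
        // payload (1 + u16::MAX elements); only `length + 1` are actually read.
        deserializer.deserialize_tuple(1 << 16, VarLen16Visitor)
    }
}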
In short, I make the deserializer think I am reading a tuple whose length is the maximum I could need. I have tested this; it does not actually allocate that much memory. I then treat the length prefix as the first element of this tuple, read it, and continue reading exactly as many elements as it specifies. It's not pretty, but it works.
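If a custom serde visitor feels heavy, the same 2-byte prefix can also be decoded by hand. The following is a minimal std-only sketch, assuming the legacy format stores the prefix as a little-endian u16 (check what your format actually uses) followed by that many raw bytes; `read_varlen16` is a hypothetical helper name, not part of bincode or serde.

```rust
/// Reads one u16-length-prefixed byte string from the front of `input`.
/// Assumption: the prefix is little-endian; use `u16::from_be_bytes`
/// instead if the legacy format is big-endian.
/// Returns the payload and the unread remainder, or `None` if `input`
/// is too short.
fn read_varlen16(input: &[u8]) -> Option<(Vec<u8>, &[u8])> {
    // Need at least the 2-byte length prefix.
    if input.len() < 2 {
        return None;
    }
    let (prefix, rest) = input.split_at(2);
    let len = u16::from_le_bytes([prefix[0], prefix[1]]) as usize;
    // Need at least `len` payload bytes after the prefix.
    if rest.len() < len {
        return None;
    }
    let (payload, remainder) = rest.split_at(len);
    Some((payload.to_vec(), remainder))
}
```

The 4-byte variant is the same function with `split_at(4)` and `u32::from_le_bytes`.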
Maybe I haven't read the documentation carefully enough, but I couldn't find an answer to the following question.
When having a custom serialization logic defined like:
pub fn serialize_foo<S>(t: &Foo, serializer: S) -> Result<S::Ok, S::Error>
where
    S: Serializer,
{
    let mut map = serializer.serialize_map(Some(len))?;
    /*
    ...
    */
    map.end()
}
What is len? Is it the size of a type in bytes, or something else? For example, what would it be for an i32?
From Serializer::serialize_map:
The argument is the number of elements in the map, which may or may not be computable before the map is iterated. Some serializers only support maps whose length is known up front.
So it is the number of times that you will call .serialize_entry() (or .serialize_key() and .serialize_value() pairs) on map.
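To make the distinction concrete, here is a hypothetical, std-only length-prefixed map encoder (not serde's API) that writes the same kind of number `serialize_map` receives: the count of entries. With `i32` keys and values each entry takes 8 bytes, yet the prefix is still just the entry count; `encode_map` and its byte layout are assumptions for illustration only.

```rust
use std::collections::BTreeMap;

/// Hypothetical encoder: a u32 little-endian entry count, followed by each
/// key and value as 4-byte little-endian integers.
/// The prefix plays the role of `len` in `serialize_map(Some(len))`:
/// it counts entries, not bytes.
fn encode_map(map: &BTreeMap<i32, i32>) -> Vec<u8> {
    let mut out = Vec::new();
    // Entry count, analogous to `Some(map.len())`.
    out.extend_from_slice(&(map.len() as u32).to_le_bytes());
    for (k, v) in map {
        // One entry = one key + one value (like one `serialize_entry` call).
        out.extend_from_slice(&k.to_le_bytes());
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}
```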
The documentation for serialize_seq states
Begin to serialize a variably sized sequence. This call must be followed by zero or more calls to serialize_element, then a call to end.
The argument is the number of elements in the sequence, which may or may not be computable before the sequence is iterated. Some serializers only support sequences whose length is known up front.
I want to serialize a sequence whose length is unknown before iterating over the sequence.
The trivial example they give is:
use serde::ser::{Serialize, Serializer, SerializeSeq};

impl<T> Serialize for Vec<T>
where
    T: Serialize,
{
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        let mut seq = serializer.serialize_seq(Some(self.len()))?;
        for element in self {
            seq.serialize_element(element)?;
        }
        seq.end()
    }
}
How can I modify the code above to support a length which is not computable before the sequence is iterated?
Maybe I have misunderstood the documentation, but I would expect something similar to this to work:
use serde::ser::{Serialize, Serializer, SerializeSeq};

impl<T> Serialize for Vec<T>
where
    T: Serialize,
{
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        let mut lazy_len = 0;
        let mut seq = serializer.serialize_seq(Some(lazy_len))?;
        for element in self {
            lazy_len += 2;
            seq.serialize_element(element)?;
        }
        seq.end()
    }
}
This would report twice the length of the sequence.
The length parameter is an Option<usize> because you may or may not have one. You don't have one, so use None instead of Some:
let mut seq = serializer.serialize_seq(None)?;
As the note says, some serializers require a length and others don't.
Serializers will use the value that you pass to serialize_seq to (for example) allocate a buffer immediately. They get better performance by doing this once and then serializing each element into that memory. Extending the length with each item would not be at all beneficial since they will need to extend the buffer each time too, which is exactly what they'll do if you pass None for the length.
This should work fine with JSON, but some other, more efficient formats like bincode might refuse to serialize without it. In particular, some encoding formats serialize the length as part of the format, before the values.
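One way a length-prefixed format could still accept a sequence of unknown length (the `None` case) is to serialize the elements into a scratch buffer while counting them, then write the count before the buffered bytes. This std-only sketch assumes a hypothetical format (u32 little-endian count, then u16 little-endian elements) and a hypothetical `encode_unknown_len` helper; formats such as bincode may instead simply reject `None`. The price is an extra buffer and copy.

```rust
/// Hypothetical length-prefixed format: u32 LE count, then u16 LE elements.
/// Accepts any iterator, even one whose length isn't known in advance
/// (the analogue of `serialize_seq(None)`).
fn encode_unknown_len<I>(items: I) -> Vec<u8>
where
    I: IntoIterator<Item = u16>,
{
    // Serialize elements into a scratch buffer, counting as we go.
    let mut body = Vec::new();
    let mut count: u32 = 0;
    for item in items {
        body.extend_from_slice(&item.to_le_bytes());
        count += 1;
    }
    // Only now is the length known: write it first, then the elements.
    let mut out = Vec::with_capacity(4 + body.len());
    out.extend_from_slice(&count.to_le_bytes());
    out.extend_from_slice(&body);
    out
}
```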
Maybe I have misunderstood the documentation, but I would expect something similar to this to work:
...
let mut lazy_len = 0;
let mut seq = serializer.serialize_seq(Some(lazy_len))?;
for element in self {
    lazy_len += 2;
    seq.serialize_element(element)?;
}
...
Integers are Copy types, so serializer.serialize_seq(Some(lazy_len)) just copies the value of lazy_len and mutating lazy_len afterwards will have no effect.
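A small std-only demonstration of that copy: `record_len_hint` is a hypothetical stand-in for a serializer that just keeps the length hint it was handed, the same way `serialize_seq` receives its `Option<usize>` by value.

```rust
/// Hypothetical stand-in for a serializer that stores its length hint.
fn record_len_hint(len: Option<usize>) -> Option<usize> {
    len
}

/// Returns (hint seen by the callee, final value of the local counter).
fn demo() -> (Option<usize>, usize) {
    let mut lazy_len = 0;
    // `Some(lazy_len)` copies the current value (0) into the Option.
    let hint = record_len_hint(Some(lazy_len));
    for _ in 0..3 {
        lazy_len += 2; // changes only the local variable
    }
    (hint, lazy_len)
}
```

The callee still sees `Some(0)` even though the local counter ends up at 6.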
I am trying to implement std::io::Read trait for a struct.
My objective is to convert the object to a byte array and read it through the implementation of the Read trait.
Following is the code I have written so far.
use chrono::{DateTime, Utc};
use std::io::Error;
use std::io::Read;
use std::vec::Vec;
use std::str;
use super::{Chain, Transaction};

// The struct I need to convert to byte array and add the Read impl.
#[derive(Debug)]
pub struct Block {
    index: u64,
    timestamp: DateTime<Utc>,
    transactions: Vec<Transaction>,
    proof: i64,
    previous_hash: String,
}

// The Read trait implementation for Block
impl Read for Block {
    fn read(&mut self, buf: &mut [u8]) -> std::result::Result<usize, Error> {
        let bytes: &[u8] = unsafe { any_as_u8_slice(&self) };
        buf.clone_from_slice(bytes);
        Ok(bytes.len())
    }
}

// Function that converts to byte array. (found on stackoverflow)
unsafe fn any_as_u8_slice<T: Sized>(p: &T) -> &[u8] {
    ::std::slice::from_raw_parts((p as *const T) as *const u8, ::std::mem::size_of::<T>())
}
I get an error when I execute the code this way.
let mut buffer: Vec<u8> = Vec::new();
let result = block.read(buffer.as_mut());
ERROR
thread 'main' panicked at 'destination and source slices have different lengths', /Users/harsh/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/slice/mod.rs:2554:9
I am new to Rust, trying to learn by porting another program in Rust.
How do I copy a &[u8] into another &mut [u8] that comes from a Vec (i.e. how do I fix the Read impl for Block)?
And is there a better way to do this?
Convert object to byte array and return it from the Read implementation.
There are a few different problems here:
What you're trying to do here won't be sound in general. Rust structs might include padding bytes or otherwise uninitialized bytes, which means that reading them as a [u8] is undefined behavior. The name for what you're trying to do here is a transmute, and transmutes are famously very difficult to do correctly.
It's not clear to me why specifically you're doing this in terms of the Read trait. Read is generally for I/O devices, like files or stdin, or in-memory buffers that behave like I/O devices. Even if we assume that a direct, in-place transmute to a byte slice is appropriate here, it would make more sense to just have a method on Block resembling fn as_byte_slice(&self) -> &[u8] { ... }.
Even if we set aside the above issues, it's still not clear to me that you're going to get the outcome you expect. Transmuting a struct to a byte array captures only the bytes stored inline in the struct. That works fine for primitive types like u64, but for types like Vec<T> and String you only get their pointer, length, and capacity, not the heap-allocated data they point to.
I'm guessing that what you actually want is for all of the contents of the Block, including the list of transactions and the previous_hash, to be converted into the byte slice. This is called serialization, and the de facto way to do it in Rust is with serde. Serde is an abstract library that connects types (like Vec and your own Block) to data formats like JSON and bincode.
In your question, you've said you want to "convert the [object] to a byte array". This is a bit nonspecific; it's likely that there is actually a specific data format into which you're supposed to convert this Block. Your application will likely specify which data format is in use, and you can then check whether a serde Serializer already exists for that format, or whether you'll need to write your own.
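As a hand-rolled illustration of what serialization means here (not a substitute for choosing the real format), this std-only sketch uses a simplified, hypothetical Block with only index, proof and previous_hash (the chrono timestamp and transactions are dropped to avoid external crates), encodes each field in a made-up layout, and wraps the bytes in std::io::Cursor, which already implements Read, so no custom Read impl is needed. As for the original panic: clone_from_slice requires equal lengths, and buffer was an empty Vec, so buffer.as_mut() was a zero-length slice.

```rust
use std::io::{Cursor, Read};

/// Simplified, hypothetical Block: the timestamp and transactions
/// are left out to keep the sketch std-only.
struct Block {
    index: u64,
    proof: i64,
    previous_hash: String,
}

impl Block {
    /// Encodes the fields in a made-up layout: u64 LE, i64 LE,
    /// then a u32 LE byte length followed by the UTF-8 hash string.
    fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::new();
        out.extend_from_slice(&self.index.to_le_bytes());
        out.extend_from_slice(&self.proof.to_le_bytes());
        let hash = self.previous_hash.as_bytes();
        out.extend_from_slice(&(hash.len() as u32).to_le_bytes());
        out.extend_from_slice(hash);
        out
    }

    /// Returns a reader over the encoded bytes; `Cursor<Vec<u8>>`
    /// implements `Read`, so callers can use `read`, `read_to_end`, etc.
    fn reader(&self) -> Cursor<Vec<u8>> {
        Cursor::new(self.to_bytes())
    }
}
```

With serde, the to_bytes half would be replaced by a derived Serialize impl plus the serializer for whatever format your application requires.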
Editor's note: This question is from a version of Rust prior to 1.0 and references some items that are not present in Rust 1.0. The answers still contain valuable information.
What's the idiomatic way to convert from (say) a usize to a u32?
For example, casting using 4294967295us as u32 works and the Rust 0.12 reference docs on type casting say
A numeric value can be cast to any numeric type. A raw pointer value can be cast to or from any integral type or raw pointer type. Any other cast is unsupported and will fail to compile.
but 4294967296us as u32 will silently overflow and give a result of 0.
I found ToPrimitive and FromPrimitive which provide nice functions like to_u32() -> Option<u32>, but they're marked as unstable:
#[unstable(feature = "core", reason = "trait is likely to be removed")]
What's the idiomatic (and safe) way to convert between numeric (and pointer) types?
The platform-dependent size of isize / usize is one reason why I'm asking this question. The original scenario was that I wanted to convert from u32 to usize so I could represent a tree in a Vec<u32> (e.g. let t = vec![0u32, 0u32, 1u32]; then the grandparent of node 2 would be t[t[2us] as usize]), and I wondered how it would fail if usize were less than 32 bits.
Converting values
From a type that fits completely within another
There's no problem here. Use the From trait to be explicit that there's no loss occurring:
fn example(v: i8) -> i32 {
    i32::from(v) // or v.into()
}
You could choose to use as, but it's recommended to avoid it when you don't need it (see below):
fn example(v: i8) -> i32 {
    v as i32
}
From a type that doesn't fit completely in another
There isn't a single method that makes general sense; you are asking how to fit two things in a space meant for one. One good initial attempt is to use an Option: Some when the value fits and None otherwise. You can then fail your program or substitute a default value, depending on your needs.
Since Rust 1.34, you can use TryFrom:
use std::convert::TryFrom;

fn example(v: i32) -> Option<i8> {
    i8::try_from(v).ok()
}
Before that, you'd have to write similar code yourself:
fn example(v: i32) -> Option<i8> {
    // Check both bounds: values below i8::MIN don't fit either.
    if v > std::i8::MAX as i32 || v < std::i8::MIN as i32 {
        None
    } else {
        Some(v as i8)
    }
}
From a type that may or may not fit completely within another
The range of numbers isize / usize can represent changes based on the platform you are compiling for. You'll need to use TryFrom regardless of your current platform.
See also:
How do I convert a usize to a u32 using TryFrom?
Why is type conversion from u64 to usize allowed using `as` but not `From`?
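Applied to the tree scenario from the question, a minimal sketch (Rust 1.34+) looks like this; `grandparent` is a hypothetical helper. Because `usize::try_from(u32)` returns a Result rather than truncating, the lookup fails cleanly instead of silently indexing the wrong node on a platform where usize is narrower than 32 bits.

```rust
use std::convert::TryFrom;

/// Looks up the grandparent of `node` in a parent-index tree stored as
/// `Vec<u32>`. A failed u32 -> usize conversion or an out-of-range index
/// both yield `None` instead of a wrong value or a panic.
fn grandparent(t: &[u32], node: usize) -> Option<u32> {
    let parent = usize::try_from(*t.get(node)?).ok()?;
    t.get(parent).copied()
}
```

The same pattern works in the other direction, e.g. `u32::try_from(some_u64)` returns `Err` for values above `u32::MAX` where `as` would wrap.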
What as does
but 4294967296us as u32 will silently overflow and give a result of 0
When converting to a smaller type, as just takes the lower bits of the number, disregarding the upper bits, including the sign:
fn main() {
    let a: u16 = 0x1234;
    let b: u8 = a as u8;
    println!("0x{:04x}, 0x{:02x}", a, b); // 0x1234, 0x34

    let a: i16 = -257;
    let b: u8 = a as u8;
    println!("0x{:02x}, 0x{:02x}", a, b); // 0xfeff, 0xff
}
See also:
What is the difference between From::from and as in Rust?
About ToPrimitive / FromPrimitive
RFC 369, Num Reform, states:
Ideally [...] ToPrimitive [...] would all be removed in favor of a more principled way of working with C-like enums
In the meantime, these traits live on in the num crate:
ToPrimitive
FromPrimitive
Rust sadly cannot produce a fixed size array [u8; 16] with a fixed size slicing operator s[0..16]. It'll throw errors like "expected array of 16 elements, found slice".
I've some KDFs that output several keys in wrapper structs like
pub struct LeafKey([u8; 16]);
pub struct MessageKey([u8; 32]);

fn kdfLeaf(...) -> (MessageKey,LeafKey) {
    // let mut r: [u8; 32+16];
    let mut r: (MessageKey, LeafKey);
    debug_assert_eq!(mem::size_of_val(&r), 384/8);
    let mut sha = Sha3::sha3_384();
    sha.input(...);
    // sha.result(r);
    sha.result(
        unsafe { mem::transmute::<&mut (MessageKey, LeafKey),&mut [u8;32+16]>(&r) }
    );
    sha.reset();
    // (MessageKey(r[0..31]), LeafKey(r[32..47]))
    r
}
Is there a safer way to do this? We know mem::transmute will refuse to compile if the types do not have the same size, but that only checks that pointers have the same size here, so I added that debug_assert.
In fact, I'm not terribly worried about extra copies since I'm running SHA3 here, but AFAIK Rust offers no ergonomic way to copy between byte arrays.
Can I avoid writing (MessageKey, LeafKey) three times here? Is there a type alias for the return type of the current function? Is it safe to use _ in the mem::transmute given that I want the code to refuse to compile if the sizes do not match? Yes, I know I could make a type alias, but that seems silly.
As an aside, there is a longer discussion of s[0..16] not having type [u8; 16] here
There's the copy_from_slice method.
fn main() {
    use std::default::Default;

    // Using 16+8 because Default isn't implemented
    // for [u8; 32+16] due to type explosion unfortunateness
    let b: [u8; 24] = Default::default();
    let mut c: [u8; 16] = Default::default();
    let mut d: [u8; 8] = Default::default();

    c.copy_from_slice(&b[..16]);
    d.copy_from_slice(&b[16..16+8]);
}
Note that copy_from_slice panics at runtime if the slices are not the same length, so make sure you test this thoroughly, or use the lengths of the other arrays to guard.
Unfortunately, c.copy_from_slice(&b[..c.len()]) doesn't work (at least with the compiler this answer was written against) because Rust thinks c is borrowed both immutably and mutably at the same time.
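Following the suggestion to guard with the destination's length, a minimal sketch: read the length into a local first, which avoids mixing the immutable and mutable uses of the destination in one expression, and use `get` so a source that is too short is reported instead of panicking. `fill_from` is a hypothetical helper name.

```rust
/// Copies the first `dst.len()` bytes of `src` into `dst`.
/// Returns `false` (leaving `dst` untouched) if `src` is too short,
/// instead of letting `copy_from_slice` panic on a length mismatch.
fn fill_from(dst: &mut [u8], src: &[u8]) -> bool {
    let n = dst.len(); // read the length before the mutable use of `dst`
    match src.get(..n) {
        Some(prefix) => {
            dst.copy_from_slice(prefix);
            true
        }
        None => false,
    }
}
```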
I marked the accepted answer as best since it's safe, and it led me to the clone_into_array answer here, but...
Another idea that improves safety is to make a version of mem::transmute for references that checks the sizes of the referenced types, as opposed to just the pointers. It might look like:
#[inline]
unsafe fn transmute_ptr_mut<A, B>(v: &mut A) -> &mut B {
    // Compare the sizes of the pointed-to types, not of the references.
    debug_assert_eq!(core::mem::size_of::<A>(), core::mem::size_of::<B>());
    core::mem::transmute::<&mut A, &mut B>(v)
}
I have raised an issue on the arrayref crate to discuss this, as arrayref might be a reasonable crate for it to live in.
Update: We now have a new "best answer" from the arrayref crate developer:
let (a,b) = array_refs![&r,32,16];
(MessageKey(*a), LeafKey(*b))
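For comparison, a std-only sketch of the same split without the arrayref macro, assuming Rust 1.34 or later (where `<[u8; N]>::try_from(&[u8])` is available for these sizes) and that the SHA3-384 output has already been written into a 48-byte buffer `r`. The key wrappers mirror the ones in the question; `split_keys` is a hypothetical name.

```rust
use std::convert::TryFrom;

pub struct LeafKey([u8; 16]);
pub struct MessageKey([u8; 32]);

/// Splits a 48-byte KDF output into a 32-byte message key and a 16-byte
/// leaf key. The lengths follow from the array type, so the `try_from`
/// calls cannot fail here; the `expect`s only document that.
fn split_keys(r: &[u8; 48]) -> (MessageKey, LeafKey) {
    let (a, b) = r.split_at(32);
    let a = <[u8; 32]>::try_from(a).expect("split_at(32) yields 32 bytes");
    let b = <[u8; 16]>::try_from(b).expect("48 - 32 = 16 bytes remain");
    (MessageKey(a), LeafKey(b))
}
```

Unlike array_refs!, the sizes are checked at runtime rather than at compile time, but no unsafe code or external crate is involved.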