Variable size data structure (e.g. packet) in Rust

I am new to Rust and trying to find the best representation in Rust for a variable size data structure, such as an IP packet. Consider the scenario below:
I have a pointer/reference to the start of a data structure in memory (e.g. &[u8]) with a fixed size header (which contains a length field for the data structure in a fixed, known location) and a variable size body/payload. From this pointer, I would like to construct an immutable reference of a Rust struct.
I assume this is packed with no alignment, but I don't know the best way to 1) represent this as a Rust struct and 2) obtain the reference from the pointer to a byte array.
I have read other similar questions and the documentation on Dynamically Sized Types (DSTs), but they do not seem to offer the whole solution.
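One way to approach this without `unsafe` is a borrowed "view" type that wraps the byte slice and decodes fields on demand, instead of casting the bytes to a struct reference. The following is a minimal sketch under an assumed layout: a hypothetical 4-byte header (version, flags, 2-byte big-endian total length) followed by the payload. `PacketView`, `HEADER_LEN` and the field layout are illustrative names and assumptions, not part of any real protocol or crate.

```rust
/// Hypothetical layout (an assumption for illustration only):
/// byte 0: version, byte 1: flags, bytes 2..4: total length (big-endian),
/// bytes 4..total length: payload.
pub const HEADER_LEN: usize = 4;

/// Zero-copy, read-only view over one packet stored in a byte slice.
#[derive(Debug, Clone, Copy)]
pub struct PacketView<'a> {
    bytes: &'a [u8],
}

impl<'a> PacketView<'a> {
    /// Validates the header and borrows exactly one packet from `buf`.
    /// Returns `None` if `buf` is shorter than the header or than the
    /// length declared in the header.
    pub fn parse(buf: &'a [u8]) -> Option<Self> {
        let header = buf.get(..HEADER_LEN)?;
        let total = u16::from_be_bytes([header[2], header[3]]) as usize;
        if total < HEADER_LEN {
            return None;
        }
        let bytes = buf.get(..total)?;
        Some(PacketView { bytes })
    }

    /// First header byte.
    pub fn version(&self) -> u8 {
        self.bytes[0]
    }

    /// Total packet length in bytes, header included.
    pub fn total_len(&self) -> usize {
        self.bytes.len()
    }

    /// Variable-size body, borrowed from the original buffer without copying.
    pub fn payload(&self) -> &'a [u8] {
        &self.bytes[HEADER_LEN..]
    }
}
```

Because every field is read byte by byte, the packed, unaligned layout does not matter here, and the lifetime `'a` ties the view to the buffer it was parsed from.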

Related

Is it possible to save some structure as sequence of bytes?

I am using a library that only allows storing arrays of [u8], and I have a struct from an external crate that does not implement Serialize and does not expose its fields (i.e. they are private). Question: is it possible to turn an instance of this struct into an array of [u8] without causing undefined behaviour? I was told that a simple transmute may cause undefined behaviour because the structure may contain uninitialized data in the form of padding.
Question: is it possible to turn an instance of this struct into an array of [u8] without causing undefined behaviour?
You can always serialize it by hand (to whatever format you choose), especially if all sub-fields are serializable.
It's extremely risky and wildly unsafe if you're serializing a pointer and expecting it to come out fine the other way around, but your deserialization will make that rather clear I guess.
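As a rough illustration of serializing by hand, the sketch below assumes the external type offers public getters and a constructor. The `external` module, `Sample`, `to_bytes` and `from_bytes` are all made-up stand-ins; a real crate may expose a different API or none at all. Each field is written explicitly in little-endian order, so no padding bytes are ever read.

```rust
/// Stand-in for a type from another crate (hypothetical): its fields are
/// private, so only the public methods can be used from outside.
pub mod external {
    pub struct Sample {
        id: u32,
        level: i16,
    }

    impl Sample {
        pub fn new(id: u32, level: i16) -> Self {
            Sample { id, level }
        }
        pub fn id(&self) -> u32 {
            self.id
        }
        pub fn level(&self) -> i16 {
            self.level
        }
    }
}

/// Writes each field explicitly: 4 bytes of `id`, then 2 bytes of `level`,
/// both little-endian. The format is chosen here, not dictated by the type.
pub fn to_bytes(s: &external::Sample) -> [u8; 6] {
    let mut out = [0u8; 6];
    out[..4].copy_from_slice(&s.id().to_le_bytes());
    out[4..].copy_from_slice(&s.level().to_le_bytes());
    out
}

/// Rebuilds a value through the public constructor.
pub fn from_bytes(bytes: &[u8; 6]) -> external::Sample {
    let id = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    let level = i16::from_le_bytes([bytes[4], bytes[5]]);
    external::Sample::new(id, level)
}
```

This only works for state that is reachable through the public API; anything the type keeps hidden cannot be round-tripped this way.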

Can From trait implementations be lossy?

Context
I have a pair of related structs in my program, Rom and ProfiledRom. They both store a list of u8 values and implement a common trait, GetRom, to provide access to those values.
trait GetRom {
    fn get(&self, index: usize) -> u8;
}
The difference is that Rom just wraps a simple Vec<u8>, but ProfiledRom wraps each byte in a ProfiledByte type that counts the number of times it is returned by get.
struct Rom(Vec<u8>);
struct ProfiledRom(Vec<ProfiledByte>);
struct ProfiledByte {
    value: u8,
    get_count: u32,
}
Much of my program operates on trait GetRom values, so I can substitute in Rom or ProfiledRom type/value depending on whether I want profiling to occur.
Question
I have implemented From<Rom> for ProfiledRom, because converting a Rom to a ProfiledRom just involves wrapping each byte in a new ProfiledByte: a simple and lossless operation.
However, I'm not sure whether it's appropriate to implement From<ProfiledRom> for Rom, because ProfiledRom contains information (the get counts) that can't be represented in a Rom. If you did a round-trip conversion, these values would be lost/reset.
Is it appropriate to implement the From trait when only parts of the source object will be used?
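For concreteness, the two directions could look roughly like the sketch below. The lossless direction uses `From` as described; the lossy direction is written as an explicitly named method, which is only one possible design and not something the question prescribes. The method name `into_rom_discarding_counts` and the `pub` fields are assumptions made for the example.

```rust
pub struct Rom(pub Vec<u8>);

pub struct ProfiledByte {
    pub value: u8,
    pub get_count: u32,
}

pub struct ProfiledRom(pub Vec<ProfiledByte>);

/// Lossless direction: every byte is wrapped with a fresh counter.
impl From<Rom> for ProfiledRom {
    fn from(rom: Rom) -> Self {
        ProfiledRom(
            rom.0
                .into_iter()
                .map(|value| ProfiledByte { value, get_count: 0 })
                .collect(),
        )
    }
}

impl ProfiledRom {
    /// Lossy direction as a named method (hypothetical name): the counts are
    /// dropped, and the name makes that visible at the call site.
    pub fn into_rom_discarding_counts(self) -> Rom {
        Rom(self.0.into_iter().map(|byte| byte.value).collect())
    }
}
```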
Related
I have seen that the standard library doesn't implement integer conversions like From<i64> for i32 because these could result in bytes being truncated/lost. However, that seems like a somewhat distinct case from what we have here.
With the potentially-truncating integer conversion, you would need to inspect the original i64 to know whether it would be converted appropriately. If you didn't, the behaviour of your code could change unexpectedly when you get an out-of-bounds value. However, in our case above, it's always statically clear what data is being preserved and what data is being lost. The conversion's behaviour won't suddenly change. It should be safer, but is it an appropriate use of the From trait?
From implementations are usually lossless, but there is currently no strict requirement that they be.
The ongoing discussion at rust-lang/rfcs#2484 is related. Some possibilities include adding a FromLossy trait and more exactly prescribing the behaviour of From. We'll have to see where that goes.
For consideration, here are some Target::from(Source) implementations in the standard library:
Lossless conversions
Each Source value is converted into a distinct Target value.
u16::from(u8), i16::from(u8) and other conversions to strictly-larger integer types.
Vec<u8>::from(String)
Vec<T>::from(BinaryHeap<T>)
OsString::from(String)
char::from(u8)
Lossy conversions
Multiple Source values may be converted into the same Target value.
BinaryHeap<T>::from(Vec<T>) loses the order of elements.
Box<[T]>::from(Vec<T>) and Box<str>::from(String) lose any excess capacity.
Vec<T>::from(VecDeque<T>) loses the internal split of elements exposed by .as_slices().
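A few of the conversions listed above can be tried directly. This is a small sketch using only standard-library `From` implementations; the helper function names are made up for the example.

```rust
use std::collections::BinaryHeap;

/// Lossless: each source value maps to a distinct target value.
pub fn lossless_examples() -> (u16, char, Vec<u8>) {
    (
        u16::from(200u8),
        char::from(65u8),
        Vec::<u8>::from(String::from("hi")),
    )
}

/// Lossy: the heap does not remember the order of the input Vec, so
/// differently ordered inputs with the same elements give the same
/// sorted output.
pub fn heap_round_trip(v: Vec<i32>) -> Vec<i32> {
    BinaryHeap::from(v).into_sorted_vec()
}

/// Lossy: converting to Box<[T]> drops spare capacity. Going back to a Vec
/// yields a capacity equal to the number of elements.
pub fn boxed_capacity(v: Vec<u8>) -> usize {
    let boxed = Box::<[u8]>::from(v);
    boxed.into_vec().capacity()
}
```

For instance, `heap_round_trip(vec![3, 1, 2])` and `heap_round_trip(vec![2, 3, 1])` both return `[1, 2, 3]`, so the original ordering cannot be recovered from the heap.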

How to map a structure from a buffer like in C with a pointer and cast

In C, I can define many structures and structures of structures.
From a buffer, I can just set a pointer at the beginning of the buffer to say that this buffer represents this structure.
Of course, I do not want to copy anything, just map the data; otherwise I lose the speed benefit.
Is it possible in Node.js? How can I do it? How can I be sure it is a mapping, and not a new object with the information copied into it?
Example:
struct House = {
uint8 door,
uint16BE kitchen,
etc...
}
var mybuff = Buffer.allocate(10, 0)
var MyHouse = new House(mybuff) // same as `House* MyHouse = (House*) mybuff`
console.log(MyHouse.door) // will display the value of door
console.log(MyHouse.kitchen) // will display the value of kitchen with BE function.
This is wrong, but it explains well what I am looking for.
All of this without copying anything.
And if I do MyHouse.door=56, mybuff now contains the 56. I consider mybuff as a pointer.
Edit after question update below
As opposed to C/C++, javascript uses pointers by default, so you don't have to do anything. It's the other way around, actually: you have to put in some effort if you want a copy of the current object.
In C, a struct is nothing more than a compile-time reference to different parts of data in the struct. So:
struct X {
    int foo;
    int bar;
};
is nothing more than saying: if you want bar from a variable with type X, just add the length of foo (length of int) to the base pointer.
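The same layout rule can be observed from Rust when a struct is declared with `#[repr(C)]`, which requests C-compatible field layout. This is only a sketch to make the offset idea concrete; `bar_offset` is a made-up helper, and `i32` stands in for C's `int`, which is an assumption about the platform.

```rust
/// Same shape as the C struct above, with C-compatible layout requested.
#[repr(C)]
pub struct X {
    pub foo: i32,
    pub bar: i32,
}

/// Byte distance from the start of `x` to its `bar` field, computed from
/// the two addresses.
pub fn bar_offset(x: &X) -> usize {
    let base = x as *const X as usize;
    let bar = &x.bar as *const i32 as usize;
    bar - base
}
```

With this layout, `bar_offset` returns the size of `foo`, which is exactly the "add the length of foo to the base pointer" rule.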
In Javascript, we do not even have such a type. We can just say:
var x = {
foo: 1,
bar: 2
}
The lookup of bar will automatically be a pointer (we call them references in javascript) lookup. Because javascript does not have types, you can view an object as a map/dictionary with pointers to mixed types.
If you, for any reason, want to create a copy of a data structure, you would have to iterate through the entire data structure (recursively) and create the copy manually. The basic types are not pointer based. These include number (Javascript automatically differentiates between int and float under the hood), string and boolean.
Edit after question update
Although I am not an expert in this area, I do not think it is possible. The problem is that the underlying data representation (how the data is laid out as bytes in memory) is different, because javascript does not have compile-time information about data structures. As I said before, javascript doesn't have classes/structs, just objects with fields, which basically behave like (and may be implemented as) maps/dictionaries.
There are, however, some third party libraries to cope with these problems. There are two general approaches:
Unpack everything to javascript objects. The data will be copied, but you can work with it as normal javascript objects. You should use this if you read/write the data intensively, because the performance increase you get when working with normal javascript objects outweighs the advantage of not having to unpack the data. Link to example library
Leave all data in the buffer. When you need some of the data, compute the location of the data in the buffer at runtime, and read/write at this location accordingly. Because the struct data location computations are done in runtime, you should use this only when you have loads of data and only a few reads/writes to it. In this case the performance decrease of unpacking all data outweighs the few runtime computations that have to be done. Link to example library
As a side-note, if the amount of data you have to process isn't that much, I'd recommend to just unpack the data. It saves you the headache of having to use the library as interface to your data. Computers are fast enough nowadays to copy/process some amount of data in memory. Also, these third party libraries are just some examples. I recommend you do a little more research for libraries to decide which one suits your needs.

If I make a struct and put it in a vector, does it reside on the heap or the stack?

I'm writing some code that generates a vector of geometric elements:
struct Geom_Entity {
// a bunch of geometric information,
// like tangent planes, force vectors, etc
}
The code parses many of these entities from a text file (for example), so we currently have a function:
fn parse_Geom(x: String) -> Vec<Geom_Entity> {
    // a bunch of code
}
These geometric entities are large structs with 17 f64s and a few other fields. The file may contain well over 1000 of these, but not so many that they can't all fit into memory (at least for now).
Also, should I be doing
Box::new(Geom_Entity { ...
and then putting the box in the vector?
The documentation for Vec says (emphasis mine):
If a Vec has allocated memory, then the memory it points to is on the heap
So yes, the members of the vector are owned by the vector and are stored on the heap.
In general, boxing an element before putting it in the Vec is wasteful - there's extra memory allocation and indirection. There are times when you need that extra allocation or indirection, so it's never say never.
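To see that elements are stored inline in the vector's buffer rather than behind individual pointers, one can compare addresses of neighbouring elements. The sketch below uses a made-up `GeomEntity` with 17 `f64` values and one extra field as a stand-in for the struct in the question.

```rust
/// Hypothetical stand-in for the large struct from the question.
pub struct GeomEntity {
    pub values: [f64; 17],
    pub id: u32,
}

/// Byte distance between element 0 and element 1 of the slice.
/// Panics if the slice has fewer than two elements.
pub fn element_stride(v: &[GeomEntity]) -> usize {
    let first = &v[0] as *const GeomEntity as usize;
    let second = &v[1] as *const GeomEntity as usize;
    second - first
}

/// Size of one element stored inline versus the size of a Box pointing to one.
pub fn inline_vs_boxed_size() -> (usize, usize) {
    (
        std::mem::size_of::<GeomEntity>(),
        std::mem::size_of::<Box<GeomEntity>>(),
    )
}
```

The stride equals the size of one `GeomEntity`, so the elements sit back to back in a single allocation. A `Vec<Box<GeomEntity>>` would instead store pointer-sized entries and make one additional allocation per element.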
See also:
Does Rust box the individual items that are added to a vector?

How can a vector be both dynamic and known at compile time in Rust?

I'm confused by what seem to be conflicting statements in the documentation for vectors in Rust:
A ‘vector’ is a dynamic or ‘growable’ array, implemented as the standard library type Vec<T>.
and
Vectors store their contents as contiguous arrays of T on the heap. This means that they must be able to know the size of T at compile time (that is, how many bytes are needed to store a T?). The size of some things can't be known at compile time. For these you'll have to store a pointer to that thing: thankfully, the Box type works perfectly for this.
Rust vectors are dynamically growable, but I don't see how that fits with the statement that their size must be known at compile time.
It's been a while since I've worked with a lower-level language where I have to think about memory allocation so I'm probably missing something obvious.
Note the wording:
they must be able to know the size of T
This says that the size of an individual element must be known at compile time. The total count of elements, and thus the total amount of memory allocated, is not known until run time.
When the vector allocates memory, it says "I want to store 12 FooBar structs. One FooBar is 24 bytes, therefore I need to allocate 288 bytes total".
The 12 is the dynamic capacity of the vector, the 24 is the static size of one element (the T).
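The arithmetic above can be reproduced with `std::mem::size_of` and `Vec::capacity`. This is a small sketch; `FooBar` here is a made-up struct of three `u64` fields chosen so that it is 24 bytes, and the helper names are hypothetical.

```rust
use std::mem::size_of;

/// Hypothetical 24-byte element type: three 8-byte fields.
pub struct FooBar {
    pub a: u64,
    pub b: u64,
    pub c: u64,
}

/// Static part times dynamic part: bytes needed to store `count` values of T.
pub fn bytes_needed<T>(count: usize) -> usize {
    count * size_of::<T>()
}

/// Bytes currently reserved by a vector's buffer (capacity, not length).
pub fn reserved_bytes<T>(v: &Vec<T>) -> usize {
    v.capacity() * size_of::<T>()
}
```

`size_of::<FooBar>()` is fixed at compile time, while the capacity passed to `Vec::with_capacity` can be any value computed at run time; `bytes_needed::<FooBar>(12)` gives the 288 bytes from the example.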

Resources