Efficiently storing uniform enum variant in collection

I am writing a library that parses an existing serialization format which has objects of various types and collections which can contain many of a uniform type (array, set, map).
My value type is defined as:
enum Value {
    Bool(bool),
    Int(i32),
    Float(f32),
    Str(String),
    // .. 20+ additional types
}
I can of course define my array type as Vec<Value>, but this does not enforce that every value in the array is of the same variant. For arrays, I can get away with defining an array enum that contains a Vec variant for each variant of the base value:
enum ValueArray {
    Bool(Vec<bool>),
    Int(Vec<i32>),
    Float(Vec<f32>),
    Str(Vec<String>),
    // etc.
}
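As a rough illustration of how the per-variant array keeps its elements uniform, here is a minimal sketch cut down to four variants. The push helper and its choice of handing a mismatched value back as an error are assumptions made for the example, not part of the original library:

```rust
// Cut-down versions of the two enums above (four variants only).
#[derive(Debug, PartialEq)]
enum Value {
    Bool(bool),
    Int(i32),
    Float(f32),
    Str(String),
}

#[derive(Debug, PartialEq)]
enum ValueArray {
    Bool(Vec<bool>),
    Int(Vec<i32>),
    Float(Vec<f32>),
    Str(Vec<String>),
}

impl ValueArray {
    // Hypothetical helper: append `value` if its variant matches the array's
    // element type, otherwise hand the value back unchanged as the error.
    fn push(&mut self, value: Value) -> Result<(), Value> {
        match (self, value) {
            (ValueArray::Bool(v), Value::Bool(x)) => v.push(x),
            (ValueArray::Int(v), Value::Int(x)) => v.push(x),
            (ValueArray::Float(v), Value::Float(x)) => v.push(x),
            (ValueArray::Str(v), Value::Str(x)) => v.push(x),
            // Any other pairing is a variant mismatch.
            (_, other) => return Err(other),
        }
        Ok(())
    }
}
```

The variant tag is stored once per array, and the only place a mismatch can occur is at the boundary where a loose Value is pushed in.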
This approach breaks down for a map type, which maps one type to another, because I would have to define a variant for each of the N*N type combinations:
enum ValueMap {
    BoolBool(Map<bool, bool>),
    BoolInt(Map<bool, i32>),
    BoolFloat(Map<bool, f32>),
    BoolStr(Map<bool, String>),
    IntBool(Map<i32, bool>),
    IntInt(Map<i32, i32>),
    IntFloat(Map<i32, f32>),
    IntStr(Map<i32, String>),
    FloatBool(Map<f32, bool>),
    FloatInt(Map<f32, i32>),
    FloatFloat(Map<f32, f32>),
    FloatStr(Map<f32, String>),
    StrBool(Map<String, bool>),
    StrInt(Map<String, i32>),
    StrFloat(Map<String, f32>),
    StrStr(Map<String, String>),
    // etc.
}
Is there an alternative method that would:
enforce uniform types contained in a collection
store the enum variant once, as opposed to once for each element in the collection

Related

How to return different data types from a function depending on the arguments passed

Introduction
In my Rust library, I have "Shape" objects (called Types in Rust). Each shape object has attributes.
Some of those attributes, like "color" and "name", are string values; others are integers or bools.
The attributes module is implemented so that each attribute has its own getter and setter function.
An example
pub struct Attributes{
//--strings
bounding_rectangle_color:String,
shadow_color:String,
name:String,
color:String,
...
pub fn set_bounding_rectangle_color(&mut self,v:String){
self.bounding_rectangle_color = v;
}
pub fn get_bounding_rectangle_color(&self)->String{
String::from(&self.bounding_rectangle_color)
}
This is (in my humble opinion) the correct way of doing it: since each attribute has its own get and set functions, those functions can soon hold logic specific to that attribute.
The Challenge
Having separate getters and setters for each attribute is great, but the same API cannot be exposed to the user because it would be confusing.
The user should have something like
let some_result:???? = my_shape.get_attr(attribute_name);
The problem is that the get_attr function has to return a bool, a string, an int, etc., but a function cannot return two different types.
Please keep in mind the problem is getting get_attr and not set_attr since in set_attr we can just take in a string and change it internally to whatever.
I want to wrap the attributes module in a wrapper such that internally attributes can use separate getter and setter for each value whereas from the top the wrapper should be able to communicate with attribute module using a simple API like
let value = shape.get_attr(attribute_name);
What I have tried so far
Tuple Structs
I tried to use a tuple struct in place of each attribute; inside the tuple struct there would be a value of the attribute's data type (i.e. bool, String, etc.).
But the problem was the same: at some point the tuple struct has to return a value, and that value's type cannot be unknown at compile time.
Returning Functions
Once again, when we return a function we have to state the return type of that function as well.
Summary
Rust does not have union types. Am I just trying to re-implement union types? Am I going in the wrong direction? Please help.

The easiest way to return multiple types is to return an enum. For example, you could define an enum as follows (with the exact types/names up to you):
enum Return {
    Int(usize),
    String(String),
    Bool(bool),
}
From here, you could have a function like so:
fn do_something(num: usize) -> Return {
    match num {
        1 => Return::Int(1),
        2 => Return::String(String::from("two")),
        3 => Return::Bool(true),
        _ => unreachable!(),
    }
}
Now, depending on the input parameter, it will return different types. To actually do something with the types, you could do something like:
if let Return::Int(int) = do_something(1) {
    // do something with the int
}
However, it might be best to try and find another way to solve your problem - having a single getter/setter is not really better than having individual getters/setters, and could lead to your code being confusing.
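If a single entry point is still wanted, the enum approach could be combined with the per-attribute getters from the question roughly as follows. This is only a sketch: AttrValue, the three fields, and the decision to return None for an unknown attribute name are all assumptions made for illustration.

```rust
// Hypothetical value type covering the attribute kinds mentioned in the question.
#[derive(Debug, PartialEq)]
enum AttrValue {
    Str(String),
    Int(i64),
    Bool(bool),
}

// Cut-down, illustrative attribute set; the field names are placeholders.
struct Attributes {
    name: String,
    corner_count: i64,
    visible: bool,
}

impl Attributes {
    // Per-attribute getters stay separate and can hold attribute-specific logic.
    fn get_name(&self) -> String {
        self.name.clone()
    }

    fn get_corner_count(&self) -> i64 {
        self.corner_count
    }

    fn get_visible(&self) -> bool {
        self.visible
    }

    // Single entry point: dispatch on the attribute name and wrap the result.
    // Returns None when the name is not a known attribute.
    fn get_attr(&self, attribute_name: &str) -> Option<AttrValue> {
        match attribute_name {
            "name" => Some(AttrValue::Str(self.get_name())),
            "corner_count" => Some(AttrValue::Int(self.get_corner_count())),
            "visible" => Some(AttrValue::Bool(self.get_visible())),
            _ => None,
        }
    }
}
```

The caller then matches on the returned AttrValue, so the set of possible types stays visible in the signature instead of being hidden behind strings.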

What is a memory-efficient type for a map with no meaningful value in Rust?

In Go, a memory-efficient way of storing values you want to retrieve by key that has no associated value is to use a map of empty structs keyed with the data you want to store. For instance, if you have a list of strings you want to check have been previously seen by your program, you could do something like this:
var seen = map[string]struct{}{}
for _, str := range strings {
	if _, ok := seen[str]; ok {
		// do something
	} else {
		seen[str] = struct{}{}
	}
}
Is there a Rust equivalent to this? I am aware that Rust doesn't have anonymous structs like Go, so what Rust type would use the least amount of memory in a map like the above example? Or is there a different, more idiomatic approach?

A HashSet is essentially a HashMap with the unit type () as the value. When this answer was written, the standard library defined it as:
pub struct HashSet<T, S = RandomState> {
    map: HashMap<T, (), S>,
}
The same is true for BTreeSet / BTreeMap:
pub struct BTreeSet<T> {
    map: BTreeMap<T, ()>,
}
"what Rust type would use the least amount of memory"
Any type with only one possible value uses zero bytes. () is an easy-to-type one.
See also:
What does an empty set of parentheses mean when used in a generic type declaration?
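For the "have I seen this string before" loop from the question, a HashSet version might look like the sketch below. The function name find_repeats is made up for the example; HashSet::insert returning false for an already-present value is standard library behavior.

```rust
use std::collections::HashSet;

// Returns every occurrence of a string that had already been seen earlier
// in the input, in the order encountered.
fn find_repeats<'a>(strings: &[&'a str]) -> Vec<&'a str> {
    let mut seen = HashSet::new();
    let mut repeats = Vec::new();
    for &s in strings {
        // `insert` returns false when the value was already in the set,
        // so the lookup and the insertion happen in one call.
        if !seen.insert(s) {
            repeats.push(s);
        }
    }
    repeats
}
```

This replaces the separate lookup and insert of the Go version with a single call, and no placeholder value has to be written at all.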

How can I store an enum so I can retrieve it by only identifying the variant?

I have an enum like:
pub enum Component {
    Position { vector: [f64; 2] },
    RenderFn { render_fn: fn(Display, &mut Frame, Entity) },
}
I would like to store Components in a hashset/hashmap where they are identified only by their enum variant (Position or RenderFn).
There can be zero or one Position and zero or one RenderFn in the collection. I would like to be able to remove/retrieve it by passing an identifier/type (Position/RenderFn).
Is there any way to do this without any ugly hacks? Perhaps enums are not the way to go?

It sounds like you want a structure, not a collection of enum variants.
struct Component {
    position: Option<[f64; 2]>,
    render_fn: Option<fn(Display, &mut Frame, Entity)>,
}
If this is likely to involve many kinds of components, and they mostly won't all be present, then maybe you want something like the typemap crate.
But to answer your question: no, a variant can't be separated from its associated values.
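A minimal sketch of how the struct-of-options version supports "zero or one of each" with retrieval and removal by kind. Display, Frame and Entity are placeholder unit structs here because their real definitions are not shown in the question, and the RenderFn alias and the draw function are likewise assumptions:

```rust
// Placeholder types standing in for the question's Display, Frame and Entity.
struct Display;
struct Frame;
struct Entity;

// Alias for the render function pointer type used in the question.
type RenderFn = fn(Display, &mut Frame, Entity);

// Each kind of component is a field, so "which variant" becomes "which field".
#[derive(Default)]
struct Component {
    position: Option<[f64; 2]>,
    render_fn: Option<RenderFn>,
}

// Hypothetical render function used only to populate the field.
fn draw(_display: Display, _frame: &mut Frame, _entity: Entity) {}

// Builds a Component with both kinds present.
fn full_component() -> Component {
    let render: RenderFn = draw;
    Component {
        position: Some([1.0, 2.0]),
        render_fn: Some(render),
    }
}
```

Retrieval is then a field access such as component.position, and removal is component.position.take(), which returns the old value and leaves None behind.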

Is it possible to have a variable local to a trait implementation?

I have an indexable type that I want to iterate over. It consists of some metadata and an array. I need to iterate first over the bytes of the metadata and then over those of the array. From what I understand, the iterator cannot have any storage local to the trait implementation. I think this is very disorganized, and I don't want my data types to be muddled by the need to satisfy extraneous influence.
impl Iterator for IndexableData {
type Item = u8
let index : isize = 0;
fn next(& mut self) -> Option<Item> {
if self.index > self.len() { None }
if self.index > size_of::<Metadata> {
Some (self.data[index - size_of::<Metadata>])
}
Some (self.metadata[index])
}
}
This is what I think the implementation should look like. The index variable belongs to the iterator trait implementation, not to my IndexableData type. How can I achieve this?

The Iterator should be a separate struct that has a reference to the collection plus any other data it may need (such as this index). The collection object itself should not be an iterator. That would not only require misplaced additional metadata in the collection, it would prevent you from having multiple independent iterators over the collection.
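A sketch of that arrangement, under the assumption that the metadata is a small fixed-size byte array and the payload is a Vec<u8> (the question does not show the real layout). The names Bytes and bytes() are invented for the example:

```rust
// Hypothetical stand-in for the question's type: metadata bytes plus a payload.
struct IndexableData {
    metadata: [u8; 4],
    data: Vec<u8>,
}

// The iterator is its own type: it borrows the collection and owns the cursor.
struct Bytes<'a> {
    source: &'a IndexableData,
    index: usize,
}

impl IndexableData {
    // Each call creates a fresh, independent iterator starting at the beginning.
    fn bytes(&self) -> Bytes<'_> {
        Bytes { source: self, index: 0 }
    }
}

impl<'a> Iterator for Bytes<'a> {
    type Item = u8;

    fn next(&mut self) -> Option<u8> {
        let meta_len = self.source.metadata.len();
        // Metadata bytes come first, then the payload.
        // `get` yields None once the index runs past the end.
        let byte = if self.index < meta_len {
            self.source.metadata.get(self.index)
        } else {
            self.source.data.get(self.index - meta_len)
        };
        let byte = byte.copied()?;
        self.index += 1;
        Some(byte)
    }
}
```

Because the cursor lives in Bytes, IndexableData carries no iteration state, and several iterators can walk the same value at once.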

Is Option<T> optimized to a single byte when T allows it?

Suppose we have an enum Foo { A, B, C }.
Is an Option<Foo> optimized to a single byte in this case?
Bonus question: if so, what are the limits of the optimization process? Enums can be nested and contain other types. Is the compiler always capable of calculating the maximum number of combinations and then choosing the smallest representation?

The compiler is not very smart when it comes to optimizing the layout of enums for space. Given:
enum Option<T> { None, Some(T) }
enum Weird<T> { Nil, NotNil { x: isize, y: T } }
enum Foo { A, B, C }
There's really only one case the compiler considers:
An Option-like enum: one variant carrying no data ("nullary"), one variant containing exactly one datum. When used with a pointer known to never be null (currently, only references and Box<T>) the representation will be that of a single pointer, null indicating the nullary variant. As a special case, Weird will receive the same treatment, but the value of the y field will be used to determine which variant the value represents.
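Beyond this, there are many, many possible optimizations available, but the compiler doesn't do them yet. In particular, your case will not be represented as a single byte. For a single enum, not considering the nested case, it will be represented as the smallest integer it can. (This describes the compiler when the answer was written; more recent versions of rustc also use unused discriminant values as a niche, so Option<Foo> typically does fit in a single byte.)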
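Since enum layout is a compiler implementation detail that has changed over time, the most reliable way to answer the question for a given toolchain is to measure it. The sketch below uses std::mem::size_of; the helper function names are made up for the example. Only the pointer case is a documented guarantee, so the result for Option<Foo> should be read as an observation about the compiler in use rather than a promise.

```rust
use std::mem::size_of;

#[allow(dead_code)]
enum Foo {
    A,
    B,
    C,
}

// Returns (size of Foo, size of Option<Foo>) in bytes for the current compiler.
fn foo_sizes() -> (usize, usize) {
    (size_of::<Foo>(), size_of::<Option<Foo>>())
}

// Documented guarantee: wrapping a reference or a Box in Option does not
// change its size, because the null pointer represents None.
fn pointer_options_are_pointer_sized() -> bool {
    size_of::<Option<&u8>>() == size_of::<&u8>()
        && size_of::<Option<Box<u8>>>() == size_of::<Box<u8>>()
}
```

Printing foo_sizes() shows directly whether the compiler in use folds the None case into an unused discriminant value of Foo.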
