Rust - Get Enum-With-Data Ordinal

I'm trying to serialize an enum. I plan to serialize by first encoding the ordinal of the enum, then the values, much as in enum to bytes / bytes to enum?. The answer to that question uses the serde crate; I would like to avoid using that crate.
It seems there are two kinds of enums, with data and without, and they are incompatible in some ways. Getting an ordinal from an enum without data is simple enough with as u8. Casting a data-carrying variant with as u8 does return a value, but casting a unit variant fails to compile when other variants of the same enum carry data:
https://play.rust-lang.org/?gist=2f6a4e8507a59d451546a69407bc0d77
#[repr(u8)]
enum Enumeration {
    One = 0,
}

#[repr(u8)]
enum Other {
    Twelve(String) = 4,
    Thirteen = 5,
}

fn main() {
    println!("Got unsigned {:?}", Enumeration::One as u8);
    println!("Got other {:?}", Other::Twelve as u8);
    // Uncommenting the next line produces a compiler error
    //println!("Got other {:?}", Other::Thirteen as u8);
}
(I get the impression that the values coming back from enum variants with data are not useful.)
How do I get the ordinal for enum variants with data?

TL;DR: you are casting a function pointer to a u8, which is not what you want. Use serde or a similar crate to achieve your goal if you are willing to.
So, running clippy on your code gives a very good indication of what's going on here:
warning: casting function pointer `Other::Twelve` to `u8`, which truncates the value
--> src/main.rs:13:32
|
13 | println!("Got other {:?}", Other::Twelve as u8);
| ^^^^^^^^^^^^^^^^^^^ help: try: `Other::Twelve as usize`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#fn_to_numeric_cast_with_truncation
= note: `#[warn(clippy::fn_to_numeric_cast_with_truncation)]` on by default
warning: cast of an enum tuple constructor to an integer
--> src/main.rs:13:32
|
13 | println!("Got other {:?}", Other::Twelve as u8);
| ^^^^^^^^^^^^^^^^^^^
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#cast_enum_constructor
= note: `#[warn(clippy::cast_enum_constructor)]` on by default
So, that's probably not what you want.
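To see what clippy means by "function pointer": a tuple variant such as Other::Twelve, mentioned without arguments, is the variant's constructor function, not a value of the enum. A small sketch of my own (not from the original answer) to illustrate:

#[repr(u8)]
enum Other {
    Twelve(String) = 4,
    Thirteen = 5,
}

fn main() {
    // `Other::Twelve` by itself is the variant's constructor function...
    let ctor: fn(String) -> Other = Other::Twelve;
    let _value = ctor("twelve".to_string());
    // ...so `Other::Twelve as u8` casts that function's address (truncated),
    // not the variant's discriminant.
}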
If we dig deeper into the nomicon, we find this passage:
If the enum has fields, the effect is similar to the effect of repr(C) in that there is a defined layout of the type. This makes it possible to pass the enum to C code, or access the type's raw representation and directly manipulate its tag and fields. See the RFC for details.
[...]
Adding an explicit repr(u*), repr(i*), or repr(C) to an enum with fields suppresses the null-pointer optimization, like:
enum MyOption<T> {
    Some(T),
    None,
}

#[repr(u8)]
enum MyReprOption<T> {
    Some(T),
    None,
}

assert_eq!(8, size_of::<MyOption<&u16>>());
assert_eq!(16, size_of::<MyReprOption<&u16>>());
If you want to read what you can do with that, follow the RFC: https://github.com/rust-lang/rfcs/blob/master/text/2195-really-tagged-unions.md#guide-level-explanation

Based on this statement from the RFC discussion (https://github.com/rust-lang/rfcs/pull/2363/files):
Given that rustc guarantees that #[repr(u16)] enumerations start with their discriminant stored as a u16...
the first byte of a #[repr(u8)] enumeration value should be the underlying u8 discriminant. So...
https://play.rust-lang.org/?gist=21e3ab42f76ccbc05b6b61560cbd29ec
#[repr(u8)]
enum MyOption {
    Option(String),
    NotOption,
}

fn get_ordinal(option: MyOption) -> u8 {
    let ptr_to_option = (&option as *const MyOption) as *const u8;
    unsafe { *ptr_to_option }
}

fn main() {
    let option = MyOption::Option("Yolo".to_string());
    let not_option = MyOption::NotOption;
    println!("ordinal for Option is {}; NotOption is {}",
             get_ordinal(option), get_ordinal(not_option));
}
This outputs: ordinal for Option is 0; NotOption is 1.
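If you'd rather avoid unsafe entirely, a plain match also gives you the ordinal, at the cost of listing every variant by hand. A minimal sketch (my addition, not part of the original answer):

#[repr(u8)]
enum MyOption {
    Option(String),
    NotOption,
}

// Safe alternative: map each variant to its ordinal explicitly,
// without relying on layout guarantees.
fn ordinal(value: &MyOption) -> u8 {
    match value {
        MyOption::Option(_) => 0,
        MyOption::NotOption => 1,
    }
}

fn main() {
    assert_eq!(ordinal(&MyOption::Option("Yolo".to_string())), 0);
    assert_eq!(ordinal(&MyOption::NotOption), 1);
}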

Related

Why does Default derived on an enum not apply to references to the enum?

I have an enum that has Default derived on it, say for example:
#[derive(Copy, Clone, Debug, Default, PartialEq, Eq)]
pub enum Enum<T> {
    #[default]
    None,
    One(T),
    Two(T),
}
Vec has methods like last() which return an Option<&T>. But when I call unwrap_or_default() on the Option<&T>, I get the following error:
22 | println!("{:?}", v.last().unwrap_or_default());
| ^^^^^^^^ ----------------- required by a bound introduced by this call
| |
| the trait `Default` is not implemented for `&Enum<i8>`
|
= help: the trait `Default` is implemented for `Enum<T>`
note: required by a bound in `Option::<T>::unwrap_or_default`
I can work around this in one of two ways:
Use unwrap_or() and provide a reference to the default element manually:
v.last().unwrap_or(&Enum::None)
Implement Default for &Enum.
impl<'a, T> Default for &'a Enum<T> {
    fn default() -> &'a Enum<T> { &Enum::None }
}
But I don't understand why I should have to. Is there a separate trait I should derive to use the tagged default element correctly here?
Playground
Context: the goal was to output "None" if the list was empty, and to display val from Some(val) for the last element, within a sequence of format arguments.
match self.history.last() { None => "None", Some(val) => val }
is illegal: the types of the match arms must match. Many other options don't work because val is borrowed and self is not Copy. Defining a separate default for Enum was an easy option to avoid extra string formatting/allocation.
What should Default return for &Enum?
Remember, a reference must point to some valid data, and does not keep said data alive.
So even if the Default implementation would allocate a new object, there is nowhere to keep it alive, because the returning reference wouldn't keep it alive. The only way is to store the default element somewhere globally; and then it would be shared for everyone. Many objects are more complicated and cannot be simply stored globally, because they are not const-initializable. What should happen then?
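As an illustration of the "store it globally" route mentioned above, here is a sketch of my own, using a non-generic enum to keep it simple:

#[derive(Debug, PartialEq, Eq)]
pub enum Enum {
    None,
    One(u32),
}

// The only place a returned &'static default can point to: a global value.
static DEFAULT_ENUM: Enum = Enum::None;

impl<'a> Default for &'a Enum {
    fn default() -> Self {
        &DEFAULT_ENUM
    }
}

fn main() {
    let d: &Enum = Default::default();
    assert_eq!(*d, Enum::None);
}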
The fact that a reference does not implement Default automatically is intentional and for good reasons. unwrap_or_default is simply the wrong function to call if your contained value is a reference. Use match or if let to extract the value without having to create a fallback value. unwrap_or is also a valid choice if you absolutely must have a fallback value, although I don't see how that is easily possible with a reference without falling back to global const values. Which is also fine, if it fits your use case.
In your case, it's of course fine to implement Default for the reference manually, and that is probably what I would do if I absolutely had to. But to me, needing it at all is most likely a code smell.
For enums that simple, it's actually an anti-pattern to use an immutable reference instead of a value. A reference is 64 bits on a 64-bit system, while an enum with three simple variants is most likely 8 bits large. So using a reference is overhead.
So in this case, you can use copied() to query the object by-value. This is most likely even faster than querying it by reference.
#[derive(Copy, Clone, Debug, Default, PartialEq, Eq)]
pub enum Enum<T> {
    #[default]
    None,
    One(T),
    Two(T),
}

fn main() {
    let v: Vec<Enum<u32>> = vec![];
    println!("{:?}", v.last().copied().unwrap_or_default());
}
None
Either way, I think the fact that you have this question in the first place indicates that there is an architectural issue in your code.
There are two cases:
Your enum members all contain valid data; in that case it is always important to distinguish in .last() between "one element with some data" and "no element", and falling back to a default wouldn't be viable.
Your enum has a None member, in which case having the None element at .last() would be identical to having no members (which is what I suspect your code does). In that case it's pointless to manually define a None member, though; simply use the Option enum that already exists. That would make your problem trivially solvable; Option already implements Default (which is None) and can propagate the reference to its member via .as_ref():
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum EnumContent<T> {
    One(T),
    Two(T),
}

// Option already implements `Default`.
pub type Enum<T> = Option<EnumContent<T>>;

fn main() {
    let mut v: Vec<Enum<u32>> = vec![];
    println!("{:?} -> {:?}", v, v.last().and_then(Option::as_ref));
    v.push(Some(EnumContent::One(42)));
    println!("{:?} -> {:?}", v, v.last().and_then(Option::as_ref));
    v.push(None);
    println!("{:?} -> {:?}", v, v.last().and_then(Option::as_ref));
}
[] -> None
[Some(One(42))] -> Some(One(42))
[Some(One(42)), None] -> None

Struct property as key and the struct itself as value in HashMaps - Rust [duplicate]

This question already has answers here:
Why can't I store a value and a reference to that value in the same struct?
The following is a snippet of more complicated code. The idea is to load a SQL table and build a hashmap with one of the table struct's fields as the key and the struct itself as the value. (Implementation details are not important, since the code works fine if I clone the String; however, the Strings in the DB can be arbitrarily long and cloning can be expensive.)
The following code will fail with
error[E0382]: use of partially moved value: `foo`
--> src/main.rs:24:35
|
24 | foo_hashmap.insert(foo.a, foo);
| ----- ^^^ value used here after partial move
| |
| value partially moved here
|
= note: partial move occurs because `foo.a` has type `String`, which does not implement the `Copy` trait
For more information about this error, try `rustc --explain E0382`.
use std::collections::HashMap;

struct Foo {
    a: String,
    b: String,
}

fn main() {
    let foo_1 = Foo {
        a: "bar".to_string(),
        b: "bar".to_string(),
    };
    let foo_2 = Foo {
        a: "bar".to_string(),
        b: "bar".to_string(),
    };

    let foo_vec = vec![foo_1, foo_2];
    let mut foo_hashmap = HashMap::new();

    foo_vec.into_iter().for_each(|foo| {
        foo_hashmap.insert(foo.a, foo); // foo.a.clone() will make this compile
    });
}
The struct Foo cannot implement Copy since its fields are String. I tried wrapping foo.a with Rc::new(RefCell::new()) but ran into the pitfall that RefCell<String> does not implement Hash, so currently I'm not sure whether to use something else for the struct fields (would Cow work?) or to handle that logic within the for_each loop.
There are at least two problems here: First, the resulting HashMap<K, V> would be a self-referential struct, as the K borrows V; there are many questions and answers on Stack Overflow about the pitfalls of this. Second, even if you could construct such a HashMap, you'd easily break the guarantees provided by HashMap, which allows you to modify V while assuming that K always stays constant: there is no way to get a &mut K from a HashMap, but you can get a &mut V; if K is actually a &V, one could easily modify K through V (by way of mutating Foo.a) and break the map.
One possibility is to change Foo.a from a String to an Rc<str>, which you can clone with minimal runtime cost in order to put the value both into K and into V. As Rc<str> is Borrow<str>, you can still look up values in the map by means of &str. This still has the (theoretical) downside that you can break the map by getting a &mut Foo from the map and std::mem::swap-ing the a, which makes it impossible to look up the correct value from its key; but you'd have to do that deliberately.
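For illustration, here is a minimal sketch of the Rc<str> approach (my code, reusing the Foo shape from the question):

use std::collections::HashMap;
use std::rc::Rc;

struct Foo {
    a: Rc<str>, // cheap to clone; the string data is shared
    b: String,
}

fn main() {
    let foo = Foo {
        a: Rc::from("bar"),
        b: "bar".to_string(),
    };

    let mut foo_hashmap: HashMap<Rc<str>, Foo> = HashMap::new();
    // Cloning the Rc only bumps a reference count.
    foo_hashmap.insert(Rc::clone(&foo.a), foo);

    // Rc<str> is Borrow<str>, so lookups still work with a plain &str.
    assert!(foo_hashmap.get("bar").is_some());
}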
Another option is to actually use a HashSet instead of a HashMap, and use a newtype for Foo which behaves like a Foo.a. You'd have to implement PartialEq, Eq, Hash (and Borrow<str> for good measure) like this:
use std::collections::HashSet;

#[derive(Debug)]
struct Foo {
    a: String,
    b: String,
}

/// A newtype for `Foo` which behaves like a `str`
#[derive(Debug)]
struct FooEntry(Foo);

/// `FooEntry` compares to other `FooEntry` only via `.a`
impl PartialEq<FooEntry> for FooEntry {
    fn eq(&self, other: &FooEntry) -> bool {
        self.0.a == other.0.a
    }
}

impl Eq for FooEntry {}

/// It also hashes the same way as a `Foo.a`
impl std::hash::Hash for FooEntry {
    fn hash<H>(&self, hasher: &mut H)
    where
        H: std::hash::Hasher,
    {
        self.0.a.hash(hasher);
    }
}

/// Due to the above, we can implement `Borrow`, so now we can look up
/// a `FooEntry` in the Set using &str
impl std::borrow::Borrow<str> for FooEntry {
    fn borrow(&self) -> &str {
        &self.0.a
    }
}

fn main() {
    let foo_1 = Foo {
        a: "foo".to_string(),
        b: "bar".to_string(),
    };
    let foo_2 = Foo {
        a: "foobar".to_string(),
        b: "barfoo".to_string(),
    };

    let foo_vec = vec![foo_1, foo_2];
    let mut foo_hashmap = HashSet::new();

    foo_vec.into_iter().for_each(|foo| {
        foo_hashmap.insert(FooEntry(foo));
    });

    // Look up `Foo` using &str as keys...
    println!("{:?}", foo_hashmap.get("foo").unwrap().0);
    println!("{:?}", foo_hashmap.get("foobar").unwrap().0);
}
Notice that HashSet provides no way to get a &mut FooEntry due to the reasons described above. You'd have to use RefCell (and read what the docs of HashSet have to say about this).
The third option is to simply clone() the foo.a as you described. Given the above, this is probably the simplest solution. If using an Rc<str> doesn't bother you for other reasons, that would be my choice.
Sidenote: If you don't need to modify a and/or b, a Box<str> instead of String is smaller by one machine word.
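To make that size difference concrete, a quick sketch of mine:

use std::mem::size_of;

fn main() {
    // String = pointer + length + capacity; Box<str> = pointer + length.
    assert_eq!(size_of::<String>(), 3 * size_of::<usize>());
    assert_eq!(size_of::<Box<str>>(), 2 * size_of::<usize>());

    let fixed: Box<str> = "bar".into(); // or String::into_boxed_str()
    println!("{}", fixed);
}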

Can I define my own "strong" type alias in Rust?

tl;dr: in Rust, is there a "strong" type alias (or typing mechanism) such that the rustc compiler will reject (emit an error for) mix-ups of aliases that share the same underlying type?
Problem
Currently, type aliases of the same underlying type may be defined
type WidgetCounter = usize;
type FoobarTally = usize;
However, the compiler will not reject (emit an error or a warning) if I mistakenly mix up instances of the two type aliases.
fn tally_the_foos(tally: FoobarTally) -> FoobarTally {
    // ...
    tally
}

fn main() {
    let wc: WidgetCounter = 33;
    let ft: FoobarTally = 1;

    // whoops, passed the wrong variable!
    let tally_total = tally_the_foos(wc);
}
(Rust Playground)
Possible Solutions?
I'm hoping for something like an additional keyword strong
strong type WidgetCounter = usize;
strong type FoobarTally = usize;
such that the previous code, when compiled, would cause a compiler error:
error[E4444]: mismatched strong alias type WidgetCounter,
expected a FoobarTally
Or maybe there is a clever trick with structs that would achieve this?
Or a crate that defines a macro to accomplish this?
I know I could "hack" this by type aliasing different number types, i.e. i32, then u32, then i64, etc. But that's an ugly hack for many reasons.
Is there a way to have the compiler help me avoid these custom type alias mixups?
Rust has a nice trick called the New Type Idiom just for this. By wrapping a single item in a tuple struct, you can create a "strong" or "distinct" type wrapper.
This idiom is also mentioned briefly in the tuple struct section of the Rust docs.
The "New Type Idiom" link has a great example. Here is one similar to the types you are looking for:
// Defines two distinct types. Counter and Tally are incompatible with
// each other, even though they contain the same item type.
struct Counter(usize);
struct Tally(usize);

// You can destructure the parameter here to easily get the contained value.
fn print_tally(Tally(value): &Tally) {
    println!("Tally is {}", value);
}

fn return_tally(tally: Tally) -> Tally {
    tally
}

fn print_value(value: usize) {
    println!("Value is {}", value);
}

fn main() {
    let count: Counter = Counter(12);
    let mut tally: Tally = Tally(10);

    print_tally(&tally);
    tally = return_tally(tally);

    // This is a compile time error.
    // Counter is not compatible with type Tally.
    // print_tally(&count);

    // The contained value can be obtained through destructuring
    // or by position.
    let Tally(tally_value) = tally;
    let tally_value_from_position: usize = tally.0;

    print_value(tally_value);
    print_value(tally_value_from_position);
}

Is it possible to use a type to access a specific field of a Rust union?

As part of mapping a C interface to Rust, I want to handle a union that stores a few native types directly and holds a pointer to an allocated type for all others.
How can I implement a parameterized wrapper type around the union that can select and use the appropriate field based on the wrapper type parameter?
In this case, I want to add a Rust wrapper that reads the data structure directly and not a solution that converts it to a Rust-native type first. Adding other support types to "trick" the compiler is fine though. The end goal is to be able to write code similar to this:
let the_list: List<i32> = get_a_list();
for i in the_list {
    println!("{}", i);
}
I am skipping the definitions of IntoIterator and friends since that boils down to accessing the correct field based on the type.
The code is highly unsafe, but we can assume that the user provides the correct type for the type parameter.
There are other solutions to this problem such as having explicit reader functions for each type, but this is focused on understanding if there are ways to make it work without those and without introducing an undue overhead.
Code that does not work, but illustrates what I want to accomplish:
#![feature(specialization)]
use std::convert::From;
union Data<T> {
    int_value: i64,
    ptr_value: *const T,
}

default impl<T> From<&Data<T>> for &T {
    fn from(data: &Data<T>) -> &T {
        &*data.ptr_value
    }
}

impl From<&Data<i64>> for &i64 {
    fn from(data: &Data<i64>) -> &i64 {
        &*data.int_value
    }
}

fn show<T>(value: &T) {
    println!("value: {}", value);
}

fn main() {
    let value = String::from("magic");
    let data: Data<&str> = Data {
        ptr_value: value.as_ptr(),
    };
    show(data.into());
}
I have minimized the example to avoid discussions about other aspects. This gives the following error:
error[E0210]: type parameter `T` must be used as the type parameter for some local type (e.g., `MyStruct<T>`)
--> examples/union_convert.rs:10:14
|
10 | default impl<T> From<&Data<T>> for &T {
| ^ type parameter `T` must be used as the type parameter for some local type
|
= note: implementing a foreign trait is only possible if at least one of the types for which it is implemented is local
= note: only traits defined in the current crate can be implemented for a type parameter
I have tried adding a wrapper around the union to handle the type-punning, but that seems to just push the error message around. Returning some type other than &T would also be possible, but I do not understand how to make that behave correctly; it still boils down to selecting the correct field based on a type.
Looking at the implementation of std::vec::Vec, it does a similar thing, but in that case it always maps the memory representation of the type to a single real type. Here, I want to select the correct union field based on the type that was used when writing the value.
Other similar questions that do not really answer this question:
How to force a union to behave as if there is only one type? asks a similar question, but in that case there are explicit functions to retrieve each type; here I want to use the type to resolve which field to read.
Resolve union structure in Rust FFI just addresses the issue that you need to explicitly pick which field to read, which is what I intend to do.
How is there a conflicting implementation of From when using a generic type? This question discusses From as well, but it is about converting from a generic type (e.g., &T) to the implemented type (e.g., Data<T>). My question is about going in the other direction: converting from a new type (Data<T>) to a more generic type (&T).
Update: Provided a more distinct example of using into() to convert the union type to one of the fields.
I would define my own trait instead of From to avoid conflicting with the standard library implementations. I'd also define a newtype wrapper / marker type. This removes the possibility of conflict when storing one of the specific types in the generic spot.
struct Other<T>(T);

union Data<T> {
    bool_value: bool,
    int_value: i64,
    ptr_value: *const T,
}

trait Access<T> {
    type Output;
    unsafe fn access(&self) -> &Self::Output;
}

impl<T> Access<Other<T>> for Data<T> {
    type Output = T;
    unsafe fn access(&self) -> &Self::Output {
        &*self.ptr_value
    }
}

impl<T> Access<bool> for Data<T> {
    type Output = bool;
    unsafe fn access(&self) -> &Self::Output {
        &self.bool_value
    }
}

impl<T> Access<i64> for Data<T> {
    type Output = i64;
    unsafe fn access(&self) -> &Self::Output {
        &self.int_value
    }
}

fn main() {
    let value = 123_f64;
    let data: Data<f64> = Data { ptr_value: &value };

    // This is safe because we just created this with
    // a `f64` and nothing happened in-between.
    unsafe {
        println!("{}", Access::<Other<f64>>::access(&data));
    }
}
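For completeness, reading the native fields goes through the same trait; here is a small usage sketch of my own, assuming the Data union and Access trait defined above and that each union was initialized with the matching field:

// Hypothetical extra usage; relies on `Data` and `Access` from the answer above.
fn other_fields() {
    let int_data: Data<f64> = Data { int_value: 42 };
    let bool_data: Data<f64> = Data { bool_value: true };

    // Safe only because we read exactly the field we just wrote.
    unsafe {
        assert_eq!(*Access::<i64>::access(&int_data), 42);
        assert!(*Access::<bool>::access(&bool_data));
    }
}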
See also:
How is there a conflicting implementation of `From` when using a generic type?

Why does calling a method on a dereferenced trait object or slice compile?

Given the following code:
trait Function {
    fn filter(&self);
}

#[derive(Debug, Copy, Clone)]
struct Kidney {}

impl Function for Kidney {
    fn filter(&self) {
        println!("filtered");
    }
}

fn main() {
    let k = Kidney {};
    let f: &Function = &k;
    //let k1 = (*f); //--> This gives a "size not satisfied" error
    (*f).filter(); //--> Works; what exactly happens here?
}
I am not sure why it compiles. I was expecting the last statement to fail. I guess I have overlooked some fundamentals while learning Rust, as I am failing to understand why dereferencing a trait object (which lives behind a pointer) should compile.
Is this issue similar to the following case?
let v = vec![1, 2, 3, 4];
//let s: &[i32] = *v;
println!("{}", (*v)[0]);
*v gives a slice, but a slice is unsized, so again it is not clear to me how this compiles. If I uncomment the second statement I get
| let s:&[i32]= *v;
| ^^
| |
| expected &[i32], found slice
| help: consider borrowing here: `&*v`
|
= note: expected type `&[i32]`
found type `[{integer}]`
Does expected type &[i32] mean "expected a reference to a slice"?
Dereferencing a trait object is no problem. In fact, it must be dereferenced at some point, otherwise it would be quite useless.
let k1 = (*f); fails not because of dereferencing but because you try to put the raw trait object on the stack (this is where local variables live). Values on the stack must have a size known at compile time, which is not the case for trait objects because any type could implement the trait.
Here is an example where structs with different sizes implement the trait:
trait Function {
    fn filter(&self);
}

#[derive(Debug, Copy, Clone)]
struct Kidney {}

impl Function for Kidney {
    fn filter(&self) {
        println!("filtered");
    }
}

#[derive(Debug, Copy, Clone)]
struct Liver {
    size: f32,
}

impl Function for Liver {
    fn filter(&self) {
        println!("filtered too!");
    }
}

fn main() {
    let k = Kidney {};
    let l = Liver { size: 1.0 };
    let f: &Function;
    if true {
        f = &k;
    } else {
        f = &l;
    }
    // Now what is the size of *f - Kidney (0 bytes) or Liver (4 bytes)?
}
(*f).filter(); works because the temporarily dereferenced object is not put on the stack. In fact, this is the same as f.filter(). Rust automatically applies as many dereferences as required to get to an actual object. This is documented in the book.
What happens in the second case is that Vec implements Deref to slices, so it gets all methods implemented for slices for free. *v gives you the dereferenced slice [i32], which you then try to assign to a variable of type &[i32]; the types don't match, hence the error and the compiler's hint to borrow with &*v.
Judging by the MIR produced by the first piece of code, (*f).filter() is equivalent to f.filter(); it appears that the compiler is aware that since filter is a method on &self, dereferencing it doesn't serve any purpose and is omitted altogether.
The second case, however, is different, because dereferencing the slice introduces bounds-checking code. In my opinion the compiler should also be able to tell that this operation (dereferencing) doesn't introduce any meaningful changes (and/or that there won't be an out-of-bounds error) and treat it as regular slice indexing, but there might be some reason behind this.
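To make the Vec/slice part concrete, here is a short sketch of my own:

fn main() {
    let v = vec![1, 2, 3, 4];

    // Vec<i32> implements Deref<Target = [i32]>, so `*v` is a place of type [i32].
    // The unsized slice cannot be moved into a local, but it can be re-borrowed:
    let s: &[i32] = &*v; // what the compiler's "consider borrowing" hint suggests
    let first = (*v)[0]; // indexing through the dereferenced place also works

    println!("{} {}", s.len(), first);
}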
