An enum is clearly a kind of key/value pair structure. Consequently, it would be nice to automatically create a dictionary from one wherein the enum variants become the possible keys and their payload the associated values. Keys without a payload would use the unit value. Here is a possible usage example:
enum PaperType {
PageSize(f32, f32),
Color(String),
Weight(f32),
IsGlossy,
}
let mut dict = make_enum_dictionary!(
PaperType,
allow_duplicates = true,
);
dict.insert(dict.PageSize, (8.5, 11.0));
dict.insert(dict.IsGlossy, ());
dict.insert_def(dict.IsGlossy);
dict.remove_all(dict.PageSize);
Significantly, since an enum is merely a list of values that may optionally carry a payload, auto-magically constructing a dictionary from it presents some semantic issues.
How does a strongly typed Dictionary<K, V> maintain the discriminant/value_type dependency inherent with enums where each discriminant has a specific payload type?
enum Ta {
K1(V1),
K2(V2),
...,
Kn(Vn),
}
How do you conveniently refer to an enum discriminant in code without its payload (Ta.K1?) and what type is it (Ta::Discriminant?) ?
Is the value to be set and get the entire enum value or just the payload?
get(&self, key: Ta::Discriminant) -> Option<Ta>
set(&mut self, value: Ta)
If it were possible to augment an existing enum auto-magically with another enum of of its variants then a reasonably efficient solution seems plausible in the following pseudo code:
type D = add_discriminant_keys!( T );
impl<D> for Vec<D> {
fn get(&self, key: D::Discriminant) -> Option<D> { todo!() }
fn set(&mut self, value: D) { todo!() }
}
I am not aware whether the macro, add_discriminant_keys!, or the construct, D::Discriminant, is even feasible. Unfortunately, I am still splashing in the shallow end of the Rust pool, despite this suggestion. However, the boldness of its macro language suggests many things are possible to those who believe.
Handling of duplicates is an implementation detail.
Enum discriminants are typically functions and therefore have a fixed pointer value (as far as I know). If such values could become constants of an associated type within the enum (like a trait) with attributes similar to what has been realized by strum::EnumDiscriminants things would look good. As it is, EnumDiscriminants seems like a sufficient interim solution.
A generic implementation over HashMap using strum_macros crate is provided based on in the rust playground; however, it is not functional there due to the inability of rust playground to load the strum crate from there. A macro derived solution would be nice.
First, like already said here, the right way to go is a struct with optional values.
However, for completeness sake, I'll show here how you can do that with a proc macro.
When you want to design a macro, especially a complicated one, the first thing to do is to plan what the emitted code will be. So, let's try to write the macro's output for the following reduced enum:
enum PaperType {
PageSize(f32, f32),
IsGlossy,
}
I will already warn you that our macro will not support brace-style enum variants, nor combining enums (your add_discriminant_keys!()). Both are possible to support, but both will complicate this already-complicated answer more. I'll refer to them shortly at the end.
First, let's design the map. It will be in a support crate. Let's call this crate denum (a name will be necessary later, when we'll refer to it from our macro):
pub struct Map<E> {
map: std::collections::HashMap<E, E>, // You can use any map implementation you want.
}
We want to store the discriminant as a key, and the enum as the value. So, we need a way to refer to the free discriminant. So, let's create a trait Enum:
pub trait Enum {
type DiscriminantsEnum: Eq + Hash; // The constraints are those of `HashMap`.
}
Now our map will look like that:
pub struct Map<E: Enum> {
map: std::collections::HashMap<E::DiscriminantsEnum, E>,
}
Our macro will generate the implementation of Enum. Hand-written, it'll be the following (note that in the macro, I wrap it in const _: () = { ... }. This is a technique used to prevent names polluting the global namespaces):
#[derive(PartialEq, Eq, Hash)]
pub enum PaperTypeDiscriminantsEnum {
PageSize,
IsGlossy,
}
impl Enum for PaperType {
type DiscriminantsEnum = PaperTypeDiscriminantsEnum;
}
Next. insert() operation:
impl<E: Enum> Map<E> {
pub fn insert(discriminant: E::DiscriminantsEnum, value: /* What's here? */) {}
}
There is no way in current Rust to refer to an enum discriminant as a distinct type. But there is a way to refer to struct as a distinct type.
We can think about the following:
pub struct PageSize;
But this pollutes the global namespace. Of course, we can call it something like PaperTypePageSize, but I much prefer something like PaperTypeDiscriminants::PageSize.
Modules to the rescue!
#[allow(non_snake_case)]
pub mod PaperTypeDiscriminants {
#[derive(Clone, Copy)]
pub struct PageSize;
#[derive(Clone, Copy)]
pub struct IsGlossy;
}
Now we need a way in insert() to validate the the provided discriminant indeed matches the wanted enum, and to refer to its value. A new trait!
pub trait EnumDiscriminant: Copy {
type Enum: Enum;
type Value;
fn to_discriminants_enum(self) -> <Self::Enum as Enum>::DiscriminantsEnum;
fn to_enum(self, value: Self::Value) -> Self::Enum;
}
And here's how our macro will implements it:
impl EnumDiscriminant for PaperTypeDiscriminants::PageSize {
type Enum = PaperType;
type Value = (f32, f32);
fn to_discriminants_enum(self) -> PaperTypeDiscriminantsEnum { PaperTypeDiscriminantsEnum::PageSize }
fn to_enum(self, (v0, v1): Self::Value) -> Self::Enum { Self::Enum::PageSize(v0, v1) }
}
impl EnumDiscriminant for PaperTypeDiscriminants::IsGlossy {
type Enum = PaperType;
type Value = ();
fn to_discriminants_enum(self) -> PaperTypeDiscriminantsEnum { PaperTypeDiscriminantsEnum::IsGlossy }
fn to_enum(self, (): Self::Value) -> Self::Enum { Self::Enum::IsGlossy }
}
And now insert():
pub fn insert<D>(&mut self, discriminant: D, value: D::Value)
where
D: EnumDiscriminant<Enum = E>,
{
self.map.insert(
discriminant.to_discriminants_enum(),
discriminant.to_enum(value),
);
}
And trivially insert_def():
pub fn insert_def<D>(&mut self, discriminant: D)
where
D: EnumDiscriminant<Enum = E, Value = ()>,
{
self.insert(discriminant, ());
}
And get() (note: seprately getting the value is possible when removing, by adding a method to the trait EnumDiscriminant with the signature fn enum_to_value(enum_: Self::Enum) -> Self::Value. It can be unsafe fn and use unreachable_unchecked() for better performance. But with get() and get_mut(), that returns reference, it's harder because you can't get a reference to the discriminant value. Here's a playground that does that nonetheless, but requires nightly):
pub fn get_entry<D>(&self, discriminant: D) -> Option<&E>
where
D: EnumDiscriminant<Enum = E>,
{
self.map.get(&discriminant.to_discriminants_enum())
}
get_mut() is very similar.
Note that my code doesn't handle duplicates but instead overwrites them, as it uses HashMap. However, you can easily create your own map that handles duplicates in another way.
Now that we have a clear picture in mind what the macro should generate, let's write it!
I decided to write it as a derive macro. You can use an attribute macro too, and even a function-like macro, but you must call it at the declaration site of your enum - because macros cannot inspect code other than the code the're applied to.
The enum will look like:
#[derive(denum::Enum)]
enum PaperType {
PageSize(f32, f32),
Color(String),
Weight(f32),
IsGlossy,
}
Usually, when my macro needs helper code, I put this code in crate and the macro in crate_macros, and reexports the macro from crate. So, the code in denum (besides the aforementioned Enum, EnumDiscriminant and Map):
pub use denum_macros::Enum;
denum_macros/src/lib.rs:
use proc_macro::TokenStream;
use quote::{format_ident, quote};
#[proc_macro_derive(Enum)]
pub fn derive_enum(item: TokenStream) -> TokenStream {
let item = syn::parse_macro_input!(item as syn::DeriveInput);
if item.generics.params.len() != 0 {
return syn::Error::new_spanned(
item.generics,
"`denum::Enum` does not work with generics currently",
)
.into_compile_error()
.into();
}
if item.generics.where_clause.is_some() {
return syn::Error::new_spanned(
item.generics.where_clause,
"`denum::Enum` does not work with `where` clauses currently",
)
.into_compile_error()
.into();
}
let (vis, name, variants) = match item {
syn::DeriveInput {
vis,
ident,
data: syn::Data::Enum(syn::DataEnum { variants, .. }),
..
} => (vis, ident, variants),
_ => {
return syn::Error::new_spanned(item, "`denum::Enum` works only with enums")
.into_compile_error()
.into()
}
};
let discriminants_mod_name = format_ident!("{}Discriminants", name);
let discriminants_enum_name = format_ident!("{}DiscriminantsEnum", name);
let mut discriminants_enum = Vec::new();
let mut discriminant_structs = Vec::new();
let mut enum_discriminant_impls = Vec::new();
for variant in variants {
let variant_name = variant.ident;
discriminant_structs.push(quote! {
#[derive(Clone, Copy)]
pub struct #variant_name;
});
let fields = match variant.fields {
syn::Fields::Named(_) => {
return syn::Error::new_spanned(
variant.fields,
"`denum::Enum` does not work with brace-style variants currently",
)
.into_compile_error()
.into()
}
syn::Fields::Unnamed(fields) => Some(fields.unnamed),
syn::Fields::Unit => None,
};
let value_destructuring = fields
.iter()
.flatten()
.enumerate()
.map(|(index, _)| format_ident!("v{}", index));
let value_destructuring = quote!((#(#value_destructuring,)*));
let value_builder = if fields.is_some() {
value_destructuring.clone()
} else {
quote!()
};
let value_type = fields.into_iter().flatten().map(|field| field.ty);
enum_discriminant_impls.push(quote! {
impl ::denum::EnumDiscriminant for #discriminants_mod_name::#variant_name {
type Enum = #name;
type Value = (#(#value_type,)*);
fn to_discriminants_enum(self) -> #discriminants_enum_name { #discriminants_enum_name::#variant_name }
fn to_enum(self, #value_destructuring: Self::Value) -> Self::Enum { Self::Enum::#variant_name #value_builder }
}
});
discriminants_enum.push(variant_name);
}
quote! {
#[allow(non_snake_case)]
#vis mod #discriminants_mod_name { #(#discriminant_structs)* }
const _: () = {
#[derive(PartialEq, Eq, Hash)]
pub enum #discriminants_enum_name { #(#discriminants_enum,)* }
impl ::denum::Enum for #name {
type DiscriminantsEnum = #discriminants_enum_name;
}
#(#enum_discriminant_impls)*
};
}
.into()
}
This macro has several flaws: it doesn't handle visibility modifiers and attributes correctly, for example. But in the general case, it works, and you can fine-tune it more.
If you want to also support brace-style variants, you can create a struct with the data (instead of the tuple we use currently).
Combining enum is possible if you'll not use a derive macro but a function-like macro, and invoke it on both enums, like:
denum::enums! {
enum A { ... }
enum B { ... }
}
Then the macro will have to combine the discriminants and use something like Either<A, B> when operating with the map.
Unfortunately, a couple of questions arise in that context:
should it be possible to use enum types only once? Or are there some which might be there multiple times?
what should happen if you insert a PageSize and there's already a PageSize in the dictionary?
All in all, a regular struct PaperType is much more suitable to properly model your domain. If you don't want to deal with Option, you can implement the Default trait to ensure that some sensible defaults are always available.
If you really, really want to go with a collection-style interface, the closest approximation would probably be a HashSet<PaperType>. You could then insert a value PaperType::PageSize.
TL;DR: I want to implement trait std::io::Write that outputs to a memory buffer, ideally String, for unit-testing purposes.
I must be missing something simple.
Similar to another question, Writing to a file or stdout in Rust, I am working on a code that can work with any std::io::Write implementation.
It operates on structure defined like this:
pub struct MyStructure {
writer: Box<dyn Write>,
}
Now, it's easy to create instance writing to either a file or stdout:
impl MyStructure {
pub fn use_stdout() -> Self {
let writer = Box::new(std::io::stdout());
MyStructure { writer }
}
pub fn use_file<P: AsRef<Path>>(path: P) -> Result<Self> {
let writer = Box::new(File::create(path)?);
Ok(MyStructure { writer })
}
pub fn printit(&mut self) -> Result<()> {
self.writer.write(b"hello")?;
Ok(())
}
}
But for unit testing, I also need to have a way to run the business logic (here represented by method printit()) and trap its output, so that its content can be checked in the test.
I cannot figure out how to implement this. This playground code shows how I would like to use it, but it does not compile because it breaks borrowing rules.
// invalid code - does not compile!
fn main() {
let mut buf = Vec::new(); // This buffer should receive output
let mut x2 = MyStructure { writer: Box::new(buf) };
x2.printit().unwrap();
// now, get the collected output
let output = std::str::from_utf8(buf.as_slice()).unwrap().to_string();
// here I want to analyze the output, for instance in unit-test asserts
println!("Output to string was {}", output);
}
Any idea how to write the code correctly? I.e., how to implement a writer on top of a memory structure (String, Vec, ...) that can be accessed afterwards?
Something like this does work:
let mut buf = Vec::new();
{
// Use the buffer by a mutable reference
//
// Also, we're doing it inside another scope
// to help the borrow checker
let mut x2 = MyStructure { writer: Box::new(&mut buf) };
x2.printit().unwrap();
}
let output = std::str::from_utf8(buf.as_slice()).unwrap().to_string();
println!("Output to string was {}", output);
However, in order for this to work, you need to modify your type and add a lifetime parameter:
pub struct MyStructure<'a> {
writer: Box<dyn Write + 'a>,
}
Note that in your case (where you omit the + 'a part) the compiler assumes that you use 'static as the lifetime of the trait object:
// Same as your original variant
pub struct MyStructure {
writer: Box<dyn Write + 'static>
}
This limits the set of types which could be used here, in particular, you cannot use any kinds of borrowed references. Therefore, for maximum genericity we have to be explicit here and define a lifetime parameter.
Also note that depending on your use case, you can use generics instead of trait objects:
pub struct MyStructure<W: Write> {
writer: W
}
In this case the types are fully visible at any point of your program, and therefore no additional lifetime annotation is needed.
So, I'm working on porting a string tokenizer that I wrote in Python over to Rust, and I've run into an issue I can't seem to get past with lifetimes and structs.
So, the process is basically:
Get an array of files
Convert each file to a Vec<String> of tokens
User a Counter and Unicase to get counts of individual instances of tokens from each vec
Save that count in a struct, along with some other data
(Future) do some processing on the set of Structs to accumulate the total data along side the per-file data
struct Corpus<'a> {
words: Counter<UniCase<&'a String>>,
parts: Vec<CorpusPart<'a>>
}
pub struct CorpusPart<'a> {
percent_of_total: f32,
word_count: usize,
words: Counter<UniCase<&'a String>>
}
fn process_file(entry: &DirEntry) -> CorpusPart {
let mut contents = read_to_string(entry.path())
.expect("Could not load contents.");
let tokens = tokenize(&mut contents);
let counted_words = collect(&tokens);
CorpusPart {
percent_of_total: 0.0,
word_count: tokens.len(),
words: counted_words
}
}
pub fn tokenize(normalized: &mut String) -> Vec<String> {
// snip ...
}
pub fn collect(results: &Vec<String>) -> Counter<UniCase<&'_ String>> {
results.iter()
.map(|w| UniCase::new(w))
.collect::<Counter<_>>()
}
However, when I try to return CorpusPart it complains that it is trying to reference a local variable tokens. How can/should I deal with this? I tried adding lifetime annotations, but couldn't figure it out...
Essentially, I no longer need the Vec<String>, but I do need some of the Strings that were in it for the counter.
Any help is appreciated, thank you!
The issue here is that you are throwing away Vec<String>, but still referencing the elements inside it. If you no longer need Vec<String>, but still require some of the contents inside, you have to transfer the ownership to something else.
I assume you want Corpus and CorpusPart to both point to the same Strings, so you are not duplicating Strings needlessly. If that is the case, either Corpus or CorpusPart must own the String, so that the one that don't own the String references the Strings owned by the other. (Sounds more complicated that it actually is)
I will assume CorpusPart owns the String, and Corpus just points to those strings
use std::fs::DirEntry;
use std::fs::read_to_string;
pub struct UniCase<a> {
test: a
}
impl<a> UniCase<a> {
fn new(item: a) -> UniCase<a> {
UniCase {
test: item
}
}
}
type Counter<a> = Vec<a>;
struct Corpus<'a> {
words: Counter<UniCase<&'a String>>, // Will reference the strings in CorpusPart (I assume you implemented this elsewhere)
parts: Vec<CorpusPart>
}
pub struct CorpusPart {
percent_of_total: f32,
word_count: usize,
words: Counter<UniCase<String>> // Has ownership of the strings
}
fn process_file(entry: &DirEntry) -> CorpusPart {
let mut contents = read_to_string(entry.path())
.expect("Could not load contents.");
let tokens = tokenize(&mut contents);
let length = tokens.len(); // Cache the length, as tokens will no longer be valid once passed to collect
let counted_words = collect(tokens);
CorpusPart {
percent_of_total: 0.0,
word_count: length,
words: counted_words
}
}
pub fn tokenize(normalized: &mut String) -> Vec<String> {
Vec::new()
}
pub fn collect(results: Vec<String>) -> Counter<UniCase<String>> {
results.into_iter() // Use into_iter() to consume the Vec that is passed in, and take ownership of the internal items
.map(|w| UniCase::new(w))
.collect::<Counter<_>>()
}
I aliased Counter<a> to Vec<a>, as I don't know what Counter you are using.
Playground
I want to write a macro that generates varying structs from an integer argument. For example, make_struct!(3) might generate something like this:
pub struct MyStruct3 {
field_0: u32,
field_1: u32,
field_2: u32
}
What's the best way to transform that "3" literal into a number that I can use to generate code? Should I be using macro_rules! or a proc-macro?
You need a procedural attribute macro and quite a bit of pipework. An example implementation is on Github; bear in mind that it is pretty rough around the edges, but works pretty nicely to start with.
The aim is to have the following:
#[derivefields(u32, "field", 3)]
struct MyStruct {
foo: u32
}
transpile to:
struct MyStruct {
pub field_0: u32,
pub field_1: u32,
pub field_2: u32,
foo: u32
}
To do this, first, we're going to establish a couple of things. We're going to need a struct to easily store and retrieve our arguments:
struct MacroInput {
pub field_type: syn::Type,
pub field_name: String,
pub field_count: u64
}
The rest is pipework:
impl Parse for MacroInput {
fn parse(input: ParseStream) -> syn::Result<Self> {
let field_type = input.parse::<syn::Type>()?;
let _comma = input.parse::<syn::token::Comma>()?;
let field_name = input.parse::<syn::LitStr>()?;
let _comma = input.parse::<syn::token::Comma>()?;
let count = input.parse::<syn::LitInt>()?;
Ok(MacroInput {
field_type: field_type,
field_name: field_name.value(),
field_count: count.base10_parse().unwrap()
})
}
}
This defines syn::Parse on our struct and allows us to use syn::parse_macro_input!() to easily parse our arguments.
#[proc_macro_attribute]
pub fn derivefields(attr: TokenStream, item: TokenStream) -> TokenStream {
let input = syn::parse_macro_input!(attr as MacroInput);
let mut found_struct = false; // We actually need a struct
item.into_iter().map(|r| {
match &r {
&proc_macro::TokenTree::Ident(ref ident) if ident.to_string() == "struct" => { // react on keyword "struct" so we don't randomly modify non-structs
found_struct = true;
r
},
&proc_macro::TokenTree::Group(ref group) if group.delimiter() == proc_macro::Delimiter::Brace && found_struct == true => { // Opening brackets for the struct
let mut stream = proc_macro::TokenStream::new();
stream.extend((0..input.field_count).fold(vec![], |mut state:Vec<proc_macro::TokenStream>, i| {
let field_name_str = format!("{}_{}", input.field_name, i);
let field_name = Ident::new(&field_name_str, Span::call_site());
let field_type = input.field_type.clone();
state.push(quote!(pub #field_name: #field_type,
).into());
state
}).into_iter());
stream.extend(group.stream());
proc_macro::TokenTree::Group(
proc_macro::Group::new(
proc_macro::Delimiter::Brace,
stream
)
)
}
_ => r
}
}).collect()
}
The behavior of the modifier creates a new TokenStream and adds our fields first. This is extremely important; assume that the struct provided is struct Foo { bar: u8 }; appending last would cause a parse error due to a missing ,. Prepending allows us to not have to care about this, since a trailing comma in a struct is not a parse error.
Once we have this TokenStream, we successively extend() it with the generated tokens from quote::quote!(); this allows us to not have to build the token fragments ourselves. One gotcha is that the field name needs to be converted to an Ident (it gets quoted otherwise, which isn't something we want).
We then return this modified TokenStream as a TokenTree::Group to signify that this is indeed a block delimited by brackets.
In doing so, we also solved a few problems:
Since structs without named members (pub struct Foo(u32) for example) never actually have an opening bracket, this macro is a no-op for this
It will no-op any item that isn't a struct
It will also no-op structs without a member
I have a custom struct like the following:
struct MyStruct {
first_field: i32,
second_field: String,
third_field: u16,
}
Is it possible to get the number of struct fields programmatically (like, for example, via a method call field_count()):
let my_struct = MyStruct::new(10, "second_field", 4);
let field_count = my_struct.field_count(); // Expecting to get 3
For this struct:
struct MyStruct2 {
first_field: i32,
}
... the following call should return 1:
let my_struct_2 = MyStruct2::new(7);
let field_count = my_struct2.field_count(); // Expecting to get count 1
Is there any API like field_count() or is it only possible to get that via macros?
If this is achievable with macros, how should it be implemented?
Are there any possible API like field_count() or is it only possible to get that via macros?
There is no such built-in API that would allow you to get this information at runtime. Rust does not have runtime reflection (see this question for more information). But it is indeed possible via proc-macros!
Note: proc-macros are different from "macro by example" (which is declared via macro_rules!). The latter is not as powerful as proc-macros.
If this is achievable with macros, how should it be implemented?
(This is not an introduction into proc-macros; if the topic is completely new to you, first read an introduction elsewhere.)
In the proc-macro (for example a custom derive), you would somehow need to get the struct definition as TokenStream. The de-facto solution to use a TokenStream with Rust syntax is to parse it via syn:
#[proc_macro_derive(FieldCount)]
pub fn derive_field_count(input: TokenStream) -> TokenStream {
let input = parse_macro_input!(input as ItemStruct);
// ...
}
The type of input is ItemStruct. As you can see, it has the field fields of the type Fields. On that field you can call iter() to get an iterator over all fields of the struct, on which in turn you could call count():
let field_count = input.fields.iter().count();
Now you have what you want.
Maybe you want to add this field_count() method to your type. You can do that via the custom derive (by using the quote crate here):
let name = &input.ident;
let output = quote! {
impl #name {
pub fn field_count() -> usize {
#field_count
}
}
};
// Return output tokenstream
TokenStream::from(output)
Then, in your application, you can write:
#[derive(FieldCount)]
struct MyStruct {
first_field: i32,
second_field: String,
third_field: u16,
}
MyStruct::field_count(); // returns 3
It's possible when the struct itself is generated by the macros - in this case you can just count tokens passed into macros, as shown here. That's what I've come up with:
macro_rules! gen {
($name:ident {$($field:ident : $t:ty),+}) => {
struct $name { $($field: $t),+ }
impl $name {
fn field_count(&self) -> usize {
gen!(#count $($field),+)
}
}
};
(#count $t1:tt, $($t:tt),+) => { 1 + gen!(#count $($t),+) };
(#count $t:tt) => { 1 };
}
Playground (with some test cases)
The downside for this approach (one - there could be more) is that it's not trivial to add an attribute to this function - for example, to #[derive(...)] something on it. Another approach would be to write the custom derive macros, but this is something that I can't speak about for now.