Rust macro that counts and generates repetitive struct fields - rust

I want to write a macro that generates varying structs from an integer argument. For example, make_struct!(3) might generate something like this:
pub struct MyStruct3 {
field_0: u32,
field_1: u32,
field_2: u32
}
What's the best way to transform that "3" literal into a number that I can use to generate code? Should I be using macro_rules! or a proc-macro?

You need a procedural attribute macro and quite a bit of pipework. An example implementation is on Github; bear in mind that it is pretty rough around the edges, but works pretty nicely to start with.
The aim is to have the following:
#[derivefields(u32, "field", 3)]
struct MyStruct {
foo: u32
}
transpile to:
struct MyStruct {
pub field_0: u32,
pub field_1: u32,
pub field_2: u32,
foo: u32
}
To do this, first, we're going to establish a couple of things. We're going to need a struct to easily store and retrieve our arguments:
struct MacroInput {
pub field_type: syn::Type,
pub field_name: String,
pub field_count: u64
}
The rest is pipework:
impl Parse for MacroInput {
fn parse(input: ParseStream) -> syn::Result<Self> {
let field_type = input.parse::<syn::Type>()?;
let _comma = input.parse::<syn::token::Comma>()?;
let field_name = input.parse::<syn::LitStr>()?;
let _comma = input.parse::<syn::token::Comma>()?;
let count = input.parse::<syn::LitInt>()?;
Ok(MacroInput {
field_type: field_type,
field_name: field_name.value(),
field_count: count.base10_parse().unwrap()
})
}
}
This defines syn::Parse on our struct and allows us to use syn::parse_macro_input!() to easily parse our arguments.
#[proc_macro_attribute]
pub fn derivefields(attr: TokenStream, item: TokenStream) -> TokenStream {
let input = syn::parse_macro_input!(attr as MacroInput);
let mut found_struct = false; // We actually need a struct
item.into_iter().map(|r| {
match &r {
&proc_macro::TokenTree::Ident(ref ident) if ident.to_string() == "struct" => { // react on keyword "struct" so we don't randomly modify non-structs
found_struct = true;
r
},
&proc_macro::TokenTree::Group(ref group) if group.delimiter() == proc_macro::Delimiter::Brace && found_struct == true => { // Opening brackets for the struct
let mut stream = proc_macro::TokenStream::new();
stream.extend((0..input.field_count).fold(vec![], |mut state:Vec<proc_macro::TokenStream>, i| {
let field_name_str = format!("{}_{}", input.field_name, i);
let field_name = Ident::new(&field_name_str, Span::call_site());
let field_type = input.field_type.clone();
state.push(quote!(pub #field_name: #field_type,
).into());
state
}).into_iter());
stream.extend(group.stream());
proc_macro::TokenTree::Group(
proc_macro::Group::new(
proc_macro::Delimiter::Brace,
stream
)
)
}
_ => r
}
}).collect()
}
The behavior of the modifier creates a new TokenStream and adds our fields first. This is extremely important; assume that the struct provided is struct Foo { bar: u8 }; appending last would cause a parse error due to a missing ,. Prepending allows us to not have to care about this, since a trailing comma in a struct is not a parse error.
Once we have this TokenStream, we successively extend() it with the generated tokens from quote::quote!(); this allows us to not have to build the token fragments ourselves. One gotcha is that the field name needs to be converted to an Ident (it gets quoted otherwise, which isn't something we want).
We then return this modified TokenStream as a TokenTree::Group to signify that this is indeed a block delimited by brackets.
In doing so, we also solved a few problems:
Since structs without named members (pub struct Foo(u32) for example) never actually have an opening bracket, this macro is a no-op for this
It will no-op any item that isn't a struct
It will also no-op structs without a member

Related

Reduce handling of Optional by leveraging the type system?

I'm trying to access an api, where I can specify what kind of fields I want included in the result. (for example "basic", "advanced", "irrelevant"
the Rust Struct to represent that would look something like
Values {
a: Option<String>;
b: Option<String>;
c: Option<String>;
d: Option<String>;
}
or probably better:
Values {
a: Option<Basic>; // With field a
b: Option<Advanced>; // With fields b,c
c: Option<Irrelevant>; // With field d
}
Using this is possible, but I'd love to reduce the handling of Option for the caller.
Is it possible to leverage the type system to simplify the usage? (Or any other way I'm not realizing?)
My idea was something in this direction, but I think that might not be possible with rust (at least without macros):
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=093bdf1853978af61443d547082576ca
struct Values {
a: Option<&'static str>,
b: Option<&'static str>,
c: Option<&'static str>,
}
trait ValueTraits{}
impl ValueTraits for dyn Basic{}
impl ValueTraits for dyn Advanced{}
impl ValueTraits for Values{}
trait Basic {
fn a(&self) -> &'static str;
}
trait Advanced {
fn b(&self) -> &'static str;
fn c(&self) -> &'static str;
}
impl Basic for Values {
fn a(&self) -> &'static str {
self.a.unwrap()
}
}
impl Advanced for Values {
fn b(&self) -> &'static str {
self.b.unwrap()
}
fn c(&self) -> &'static str {
self.c.unwrap()
}
}
//Something like this is probably not possible, as far as I understand Rust
fn get_values<T1, T2>() -> T1 + T2{
Values {
a: "A",
b: "B",
c: "C"
}
}
fn main() {
let values = get_values::<Basic, Advanced>();
println!("{}, {}, {}", values.a(), values.b(), values.c());
}
Clarifications (Edit)
The Values struct contains deserialized json data from the api I called. I can request groups of fields to be included in the response(1-n requested fields groups), the fields are of different types.
If I knew beforehand, which of those fields are returned, I wouldn't need them to be Option, but as the caller decides which fields are returned, the fields needs to be Option (either directly, or grouped by the field groups)
There are too many possible combinations to create a struct for each of those.
I fully realize that this cannot work, it was just "peudorust":
fn get_values<T1, T2>() -> T1 + T2{
Values {
a: "A",
b: "B",
c: "C"
}
}
But my thought process was:
In theory, I could request the field groups via generics, so I could create a "dynamic" type, that implements these traits, because I know which traits are requested.
The Traits are supposed to act like a "view" into the actual struct, because if they are requested beforehand, I know I should request them from the api to include them in the Struct.
My knowledge of generics and traits isn't enough to confidently say "this isn't possible at all" and I couldn't find a conclusive answer before I asked here.
Sorry for the initial question not being clear of what the actual issue was, I hope the clarification helps with that.
I can't quite gauge whether or not you want to be able to request and return fields of multiple different types from the question. But if all the information being returned is of a single type you could try using a HashMap:
use std::collections::HashMap;
fn get_values(fields: &[&'static str]) -> HashMap<&'static str, &'static str> {
let mut selected = HashMap::new();
for field in fields {
let val = match *field {
"a" => "Value of a",
"b" => "Value of b",
"c" => "Value of c",
// Skip requested fields that don't exist.
_ => continue,
};
selected.insert(*field, val);
}
selected
}
fn main() {
let fields = ["a","c"];
let values = get_values(&fields);
for (field, value) in values.iter() {
println!("`{}` = `{}`", field, value);
}
}
Additionally you've given me the impression that you haven't quite been able to form a relationship between generics and traits yet. I highly recommend reading over the book's "Generic Types, Traits, and Lifetimes" section.
The gist of it is that generics exist to generalize a function, struct, enum, or even a trait to any type, and traits are used to assign behaviour to a type. Traits cannot be passed as a generic parameter, because traits are not types, they are behaviours. Which is why doing: get_values::<Basic, Advanced>(); doesn't work. Basic and Advanced are both traits, not types.
If you want practice with generics try generalizing get_values so that it can accept any type which can be converted into an iterator that yields &'static strs.
Edit:
The clarification is appreciated. The approach you have in mind is possible, but I wouldn't recommend it because it's implementing it is extremely verbose and will panic the moment the format of the json you're parsing changes. Though if you really need to use traits for some reason you could try something like this:
// One possible construct returned to you.
struct All {
a: Option<i32>,
b: Option<i32>,
c: Option<i32>,
}
// A variation that only returned b and c
struct Bc {
b: Option<i32>,
c: Option<i32>,
}
// impl Advanced + Basic + Default for All {...}
// impl Advanced + Default for Bc {...}
fn get_bc<T: Advanced + Default>() -> T {
// Here you would set the fields on T.
Default::default()
}
fn get_all<T: Basic + Advanced + Default>() -> T {
Default::default()
}
fn main() {
// This isn't really useful unless you want to create multiple structs that
// are supposed to hold b and c but otherwise have different fields.
let bc = get_bc::<Bc>();
let all = get_all::<All>();
// Could also do something like:
let bc = get_bc::<All>();
// but that could get confusing.
}
I think the above is how you're looking to solve your problem. Though if you can, I would still recommend using a HashMap with a trait object like this:
use std::collections::HashMap;
use std::fmt::Debug;
// Here define the behaviour you need to interact with the data. In this case
// it's just ability to print to console.
trait Value: Debug {}
impl Value for &'static str {}
impl Value for i32 {}
impl<T: Debug> Value for Vec<T> {}
fn get_values(fields: &[&'static str]) -> HashMap<&'static str, Box<dyn Value>> {
let mut selected = HashMap::new();
for field in fields {
let val = match *field {
"a" => Box::new("Value of a") as Box<dyn Value>,
"b" => Box::new(2) as Box<dyn Value>,
"c" => Box::new(vec![1,3,5,7]) as Box<dyn Value>,
// Skip requested fields that don't exist.
_ => continue,
};
selected.insert(*field, val);
}
selected
}
fn main() {
let fields = ["a","c"];
let values = get_values(&fields);
for (field, value) in values.iter() {
println!("`{}` = `{:?}`", field, value);
}
}

Is there a macro that automatically creates a dictionary from an enum?

An enum is clearly a kind of key/value pair structure. Consequently, it would be nice to automatically create a dictionary from one wherein the enum variants become the possible keys and their payload the associated values. Keys without a payload would use the unit value. Here is a possible usage example:
enum PaperType {
PageSize(f32, f32),
Color(String),
Weight(f32),
IsGlossy,
}
let mut dict = make_enum_dictionary!(
PaperType,
allow_duplicates = true,
);
dict.insert(dict.PageSize, (8.5, 11.0));
dict.insert(dict.IsGlossy, ());
dict.insert_def(dict.IsGlossy);
dict.remove_all(dict.PageSize);
Significantly, since an enum is merely a list of values that may optionally carry a payload, auto-magically constructing a dictionary from it presents some semantic issues.
How does a strongly typed Dictionary<K, V> maintain the discriminant/value_type dependency inherent with enums where each discriminant has a specific payload type?
enum Ta {
K1(V1),
K2(V2),
...,
Kn(Vn),
}
How do you conveniently refer to an enum discriminant in code without its payload (Ta.K1?) and what type is it (Ta::Discriminant?) ?
Is the value to be set and get the entire enum value or just the payload?
get(&self, key: Ta::Discriminant) -> Option<Ta>
set(&mut self, value: Ta)
If it were possible to augment an existing enum auto-magically with another enum of of its variants then a reasonably efficient solution seems plausible in the following pseudo code:
type D = add_discriminant_keys!( T );
impl<D> for Vec<D> {
fn get(&self, key: D::Discriminant) -> Option<D> { todo!() }
fn set(&mut self, value: D) { todo!() }
}
I am not aware whether the macro, add_discriminant_keys!, or the construct, D::Discriminant, is even feasible. Unfortunately, I am still splashing in the shallow end of the Rust pool, despite this suggestion. However, the boldness of its macro language suggests many things are possible to those who believe.
Handling of duplicates is an implementation detail.
Enum discriminants are typically functions and therefore have a fixed pointer value (as far as I know). If such values could become constants of an associated type within the enum (like a trait) with attributes similar to what has been realized by strum::EnumDiscriminants things would look good. As it is, EnumDiscriminants seems like a sufficient interim solution.
A generic implementation over HashMap using strum_macros crate is provided based on in the rust playground; however, it is not functional there due to the inability of rust playground to load the strum crate from there. A macro derived solution would be nice.
First, like already said here, the right way to go is a struct with optional values.
However, for completeness sake, I'll show here how you can do that with a proc macro.
When you want to design a macro, especially a complicated one, the first thing to do is to plan what the emitted code will be. So, let's try to write the macro's output for the following reduced enum:
enum PaperType {
PageSize(f32, f32),
IsGlossy,
}
I will already warn you that our macro will not support brace-style enum variants, nor combining enums (your add_discriminant_keys!()). Both are possible to support, but both will complicate this already-complicated answer more. I'll refer to them shortly at the end.
First, let's design the map. It will be in a support crate. Let's call this crate denum (a name will be necessary later, when we'll refer to it from our macro):
pub struct Map<E> {
map: std::collections::HashMap<E, E>, // You can use any map implementation you want.
}
We want to store the discriminant as a key, and the enum as the value. So, we need a way to refer to the free discriminant. So, let's create a trait Enum:
pub trait Enum {
type DiscriminantsEnum: Eq + Hash; // The constraints are those of `HashMap`.
}
Now our map will look like that:
pub struct Map<E: Enum> {
map: std::collections::HashMap<E::DiscriminantsEnum, E>,
}
Our macro will generate the implementation of Enum. Hand-written, it'll be the following (note that in the macro, I wrap it in const _: () = { ... }. This is a technique used to prevent names polluting the global namespaces):
#[derive(PartialEq, Eq, Hash)]
pub enum PaperTypeDiscriminantsEnum {
PageSize,
IsGlossy,
}
impl Enum for PaperType {
type DiscriminantsEnum = PaperTypeDiscriminantsEnum;
}
Next. insert() operation:
impl<E: Enum> Map<E> {
pub fn insert(discriminant: E::DiscriminantsEnum, value: /* What's here? */) {}
}
There is no way in current Rust to refer to an enum discriminant as a distinct type. But there is a way to refer to struct as a distinct type.
We can think about the following:
pub struct PageSize;
But this pollutes the global namespace. Of course, we can call it something like PaperTypePageSize, but I much prefer something like PaperTypeDiscriminants::PageSize.
Modules to the rescue!
#[allow(non_snake_case)]
pub mod PaperTypeDiscriminants {
#[derive(Clone, Copy)]
pub struct PageSize;
#[derive(Clone, Copy)]
pub struct IsGlossy;
}
Now we need a way in insert() to validate the the provided discriminant indeed matches the wanted enum, and to refer to its value. A new trait!
pub trait EnumDiscriminant: Copy {
type Enum: Enum;
type Value;
fn to_discriminants_enum(self) -> <Self::Enum as Enum>::DiscriminantsEnum;
fn to_enum(self, value: Self::Value) -> Self::Enum;
}
And here's how our macro will implements it:
impl EnumDiscriminant for PaperTypeDiscriminants::PageSize {
type Enum = PaperType;
type Value = (f32, f32);
fn to_discriminants_enum(self) -> PaperTypeDiscriminantsEnum { PaperTypeDiscriminantsEnum::PageSize }
fn to_enum(self, (v0, v1): Self::Value) -> Self::Enum { Self::Enum::PageSize(v0, v1) }
}
impl EnumDiscriminant for PaperTypeDiscriminants::IsGlossy {
type Enum = PaperType;
type Value = ();
fn to_discriminants_enum(self) -> PaperTypeDiscriminantsEnum { PaperTypeDiscriminantsEnum::IsGlossy }
fn to_enum(self, (): Self::Value) -> Self::Enum { Self::Enum::IsGlossy }
}
And now insert():
pub fn insert<D>(&mut self, discriminant: D, value: D::Value)
where
D: EnumDiscriminant<Enum = E>,
{
self.map.insert(
discriminant.to_discriminants_enum(),
discriminant.to_enum(value),
);
}
And trivially insert_def():
pub fn insert_def<D>(&mut self, discriminant: D)
where
D: EnumDiscriminant<Enum = E, Value = ()>,
{
self.insert(discriminant, ());
}
And get() (note: seprately getting the value is possible when removing, by adding a method to the trait EnumDiscriminant with the signature fn enum_to_value(enum_: Self::Enum) -> Self::Value. It can be unsafe fn and use unreachable_unchecked() for better performance. But with get() and get_mut(), that returns reference, it's harder because you can't get a reference to the discriminant value. Here's a playground that does that nonetheless, but requires nightly):
pub fn get_entry<D>(&self, discriminant: D) -> Option<&E>
where
D: EnumDiscriminant<Enum = E>,
{
self.map.get(&discriminant.to_discriminants_enum())
}
get_mut() is very similar.
Note that my code doesn't handle duplicates but instead overwrites them, as it uses HashMap. However, you can easily create your own map that handles duplicates in another way.
Now that we have a clear picture in mind what the macro should generate, let's write it!
I decided to write it as a derive macro. You can use an attribute macro too, and even a function-like macro, but you must call it at the declaration site of your enum - because macros cannot inspect code other than the code the're applied to.
The enum will look like:
#[derive(denum::Enum)]
enum PaperType {
PageSize(f32, f32),
Color(String),
Weight(f32),
IsGlossy,
}
Usually, when my macro needs helper code, I put this code in crate and the macro in crate_macros, and reexports the macro from crate. So, the code in denum (besides the aforementioned Enum, EnumDiscriminant and Map):
pub use denum_macros::Enum;
denum_macros/src/lib.rs:
use proc_macro::TokenStream;
use quote::{format_ident, quote};
#[proc_macro_derive(Enum)]
pub fn derive_enum(item: TokenStream) -> TokenStream {
let item = syn::parse_macro_input!(item as syn::DeriveInput);
if item.generics.params.len() != 0 {
return syn::Error::new_spanned(
item.generics,
"`denum::Enum` does not work with generics currently",
)
.into_compile_error()
.into();
}
if item.generics.where_clause.is_some() {
return syn::Error::new_spanned(
item.generics.where_clause,
"`denum::Enum` does not work with `where` clauses currently",
)
.into_compile_error()
.into();
}
let (vis, name, variants) = match item {
syn::DeriveInput {
vis,
ident,
data: syn::Data::Enum(syn::DataEnum { variants, .. }),
..
} => (vis, ident, variants),
_ => {
return syn::Error::new_spanned(item, "`denum::Enum` works only with enums")
.into_compile_error()
.into()
}
};
let discriminants_mod_name = format_ident!("{}Discriminants", name);
let discriminants_enum_name = format_ident!("{}DiscriminantsEnum", name);
let mut discriminants_enum = Vec::new();
let mut discriminant_structs = Vec::new();
let mut enum_discriminant_impls = Vec::new();
for variant in variants {
let variant_name = variant.ident;
discriminant_structs.push(quote! {
#[derive(Clone, Copy)]
pub struct #variant_name;
});
let fields = match variant.fields {
syn::Fields::Named(_) => {
return syn::Error::new_spanned(
variant.fields,
"`denum::Enum` does not work with brace-style variants currently",
)
.into_compile_error()
.into()
}
syn::Fields::Unnamed(fields) => Some(fields.unnamed),
syn::Fields::Unit => None,
};
let value_destructuring = fields
.iter()
.flatten()
.enumerate()
.map(|(index, _)| format_ident!("v{}", index));
let value_destructuring = quote!((#(#value_destructuring,)*));
let value_builder = if fields.is_some() {
value_destructuring.clone()
} else {
quote!()
};
let value_type = fields.into_iter().flatten().map(|field| field.ty);
enum_discriminant_impls.push(quote! {
impl ::denum::EnumDiscriminant for #discriminants_mod_name::#variant_name {
type Enum = #name;
type Value = (#(#value_type,)*);
fn to_discriminants_enum(self) -> #discriminants_enum_name { #discriminants_enum_name::#variant_name }
fn to_enum(self, #value_destructuring: Self::Value) -> Self::Enum { Self::Enum::#variant_name #value_builder }
}
});
discriminants_enum.push(variant_name);
}
quote! {
#[allow(non_snake_case)]
#vis mod #discriminants_mod_name { #(#discriminant_structs)* }
const _: () = {
#[derive(PartialEq, Eq, Hash)]
pub enum #discriminants_enum_name { #(#discriminants_enum,)* }
impl ::denum::Enum for #name {
type DiscriminantsEnum = #discriminants_enum_name;
}
#(#enum_discriminant_impls)*
};
}
.into()
}
This macro has several flaws: it doesn't handle visibility modifiers and attributes correctly, for example. But in the general case, it works, and you can fine-tune it more.
If you want to also support brace-style variants, you can create a struct with the data (instead of the tuple we use currently).
Combining enum is possible if you'll not use a derive macro but a function-like macro, and invoke it on both enums, like:
denum::enums! {
enum A { ... }
enum B { ... }
}
Then the macro will have to combine the discriminants and use something like Either<A, B> when operating with the map.
Unfortunately, a couple of questions arise in that context:
should it be possible to use enum types only once? Or are there some which might be there multiple times?
what should happen if you insert a PageSize and there's already a PageSize in the dictionary?
All in all, a regular struct PaperType is much more suitable to properly model your domain. If you don't want to deal with Option, you can implement the Default trait to ensure that some sensible defaults are always available.
If you really, really want to go with a collection-style interface, the closest approximation would probably be a HashSet<PaperType>. You could then insert a value PaperType::PageSize.

Passing a data structure as an args to a proc macro attribute

I'm trying to create a macro that to use on a struct, in which you get another struct as an argument, then concatenate the structa fields. This the code of the macro
#[proc_macro_attribute]
pub fn cat(args: proc_macro::TokenStream, input: proc_macro::TokenStream) -> proc_macro:: TokenStream {
let mut base_struct = parse_macro_input!(input as ItemStruct);
let mut new_fields_struct = parse_macro_input!(args as ItemStruct);
if let syn::Fields::Named(ref mut base_fields) = base_struct.fields {
if let syn::Fields::Named(ref mut new_fields) = new_fields_struct.fields {
for field in new_fields.named.iter() {
base_fields.push(field.clone());
}
}
}
return quote! {
#base_struct
}.into();
}
Here's a basic example of what I want it to be used
struct new_fields{
field: usize,
}
#[cat(new_fields)]
struct base_struct{
base_field, usize,
}
However, it will just convert the word 'new_fields' into a TokenStream then fail to compile because it isn't a struct. Is there a way to 'pass' a struct to the macro as an argument without writing the code inside the the macro usage?
Is there a way to 'pass' a struct to the macro as an argument without writing the code inside the the macro usage?
No. Macros operate on tokens they are given and nothing else.
It's hard to give advice without knowing more detail, but the closest to what you've provided would be to have your macro generate the other struct, so that it knows the definition. You could perhaps pass it like this:
#[cat(
struct new_fields{
field: usize,
}
)]
struct base_struct{
base_field, usize,
}

Rust lifetime scoping in structs

So, I'm working on porting a string tokenizer that I wrote in Python over to Rust, and I've run into an issue I can't seem to get past with lifetimes and structs.
So, the process is basically:
Get an array of files
Convert each file to a Vec<String> of tokens
User a Counter and Unicase to get counts of individual instances of tokens from each vec
Save that count in a struct, along with some other data
(Future) do some processing on the set of Structs to accumulate the total data along side the per-file data
struct Corpus<'a> {
words: Counter<UniCase<&'a String>>,
parts: Vec<CorpusPart<'a>>
}
pub struct CorpusPart<'a> {
percent_of_total: f32,
word_count: usize,
words: Counter<UniCase<&'a String>>
}
fn process_file(entry: &DirEntry) -> CorpusPart {
let mut contents = read_to_string(entry.path())
.expect("Could not load contents.");
let tokens = tokenize(&mut contents);
let counted_words = collect(&tokens);
CorpusPart {
percent_of_total: 0.0,
word_count: tokens.len(),
words: counted_words
}
}
pub fn tokenize(normalized: &mut String) -> Vec<String> {
// snip ...
}
pub fn collect(results: &Vec<String>) -> Counter<UniCase<&'_ String>> {
results.iter()
.map(|w| UniCase::new(w))
.collect::<Counter<_>>()
}
However, when I try to return CorpusPart it complains that it is trying to reference a local variable tokens. How can/should I deal with this? I tried adding lifetime annotations, but couldn't figure it out...
Essentially, I no longer need the Vec<String>, but I do need some of the Strings that were in it for the counter.
Any help is appreciated, thank you!
The issue here is that you are throwing away Vec<String>, but still referencing the elements inside it. If you no longer need Vec<String>, but still require some of the contents inside, you have to transfer the ownership to something else.
I assume you want Corpus and CorpusPart to both point to the same Strings, so you are not duplicating Strings needlessly. If that is the case, either Corpus or CorpusPart must own the String, so that the one that don't own the String references the Strings owned by the other. (Sounds more complicated that it actually is)
I will assume CorpusPart owns the String, and Corpus just points to those strings
use std::fs::DirEntry;
use std::fs::read_to_string;
pub struct UniCase<a> {
test: a
}
impl<a> UniCase<a> {
fn new(item: a) -> UniCase<a> {
UniCase {
test: item
}
}
}
type Counter<a> = Vec<a>;
struct Corpus<'a> {
words: Counter<UniCase<&'a String>>, // Will reference the strings in CorpusPart (I assume you implemented this elsewhere)
parts: Vec<CorpusPart>
}
pub struct CorpusPart {
percent_of_total: f32,
word_count: usize,
words: Counter<UniCase<String>> // Has ownership of the strings
}
fn process_file(entry: &DirEntry) -> CorpusPart {
let mut contents = read_to_string(entry.path())
.expect("Could not load contents.");
let tokens = tokenize(&mut contents);
let length = tokens.len(); // Cache the length, as tokens will no longer be valid once passed to collect
let counted_words = collect(tokens);
CorpusPart {
percent_of_total: 0.0,
word_count: length,
words: counted_words
}
}
pub fn tokenize(normalized: &mut String) -> Vec<String> {
Vec::new()
}
pub fn collect(results: Vec<String>) -> Counter<UniCase<String>> {
results.into_iter() // Use into_iter() to consume the Vec that is passed in, and take ownership of the internal items
.map(|w| UniCase::new(w))
.collect::<Counter<_>>()
}
I aliased Counter<a> to Vec<a>, as I don't know what Counter you are using.
Playground

How to programmatically get the number of fields of a struct?

I have a custom struct like the following:
struct MyStruct {
first_field: i32,
second_field: String,
third_field: u16,
}
Is it possible to get the number of struct fields programmatically (like, for example, via a method call field_count()):
let my_struct = MyStruct::new(10, "second_field", 4);
let field_count = my_struct.field_count(); // Expecting to get 3
For this struct:
struct MyStruct2 {
first_field: i32,
}
... the following call should return 1:
let my_struct_2 = MyStruct2::new(7);
let field_count = my_struct2.field_count(); // Expecting to get count 1
Is there any API like field_count() or is it only possible to get that via macros?
If this is achievable with macros, how should it be implemented?
Are there any possible API like field_count() or is it only possible to get that via macros?
There is no such built-in API that would allow you to get this information at runtime. Rust does not have runtime reflection (see this question for more information). But it is indeed possible via proc-macros!
Note: proc-macros are different from "macro by example" (which is declared via macro_rules!). The latter is not as powerful as proc-macros.
If this is achievable with macros, how should it be implemented?
(This is not an introduction into proc-macros; if the topic is completely new to you, first read an introduction elsewhere.)
In the proc-macro (for example a custom derive), you would somehow need to get the struct definition as TokenStream. The de-facto solution to use a TokenStream with Rust syntax is to parse it via syn:
#[proc_macro_derive(FieldCount)]
pub fn derive_field_count(input: TokenStream) -> TokenStream {
let input = parse_macro_input!(input as ItemStruct);
// ...
}
The type of input is ItemStruct. As you can see, it has the field fields of the type Fields. On that field you can call iter() to get an iterator over all fields of the struct, on which in turn you could call count():
let field_count = input.fields.iter().count();
Now you have what you want.
Maybe you want to add this field_count() method to your type. You can do that via the custom derive (by using the quote crate here):
let name = &input.ident;
let output = quote! {
impl #name {
pub fn field_count() -> usize {
#field_count
}
}
};
// Return output tokenstream
TokenStream::from(output)
Then, in your application, you can write:
#[derive(FieldCount)]
struct MyStruct {
first_field: i32,
second_field: String,
third_field: u16,
}
MyStruct::field_count(); // returns 3
It's possible when the struct itself is generated by the macros - in this case you can just count tokens passed into macros, as shown here. That's what I've come up with:
macro_rules! gen {
($name:ident {$($field:ident : $t:ty),+}) => {
struct $name { $($field: $t),+ }
impl $name {
fn field_count(&self) -> usize {
gen!(#count $($field),+)
}
}
};
(#count $t1:tt, $($t:tt),+) => { 1 + gen!(#count $($t),+) };
(#count $t:tt) => { 1 };
}
Playground (with some test cases)
The downside for this approach (one - there could be more) is that it's not trivial to add an attribute to this function - for example, to #[derive(...)] something on it. Another approach would be to write the custom derive macros, but this is something that I can't speak about for now.

Resources