How do I process enum/struct/field attributes in a procedural macro? - rust

Serde supports applying custom attributes that are used with #[derive(Serialize)]:
#[derive(Serialize)]
struct Resource {
// Always serialized.
name: String,
// Never serialized.
#[serde(skip_serializing)]
hash: String,
// Use a method to decide whether the field should be skipped.
#[serde(skip_serializing_if = "Map::is_empty")]
metadata: Map<String, String>,
}
I understand how to implement a procedural macro (Serialize in this example) but what should I do to implement #[serde(skip_serializing)]? I was unable to find this information anywhere. The docs don't even mention this. I have tried to look at the serde-derive source code but it is very complicated for me.

First you must register all of your attributes in the same place you register your procedural macro. Let's say we want to add two attributes (we still don't talk what will they belong to: structs or fields or both of them):
#[proc_macro_derive(FiniteStateMachine, attributes(state_transitions, state_change))]
pub fn fxsm(input: TokenStream) -> TokenStream {
// ...
}
After that you may already compile your user code with the following:
#[derive(Copy, Clone, Debug, FiniteStateMachine)]
#[state_change(GameEvent, change_condition)] // optional
enum GameState {
#[state_transitions(NeedServer, Ready)]
Prepare { players: u8 },
#[state_transitions(Prepare, Ready)]
NeedServer,
#[state_transitions(Prepare)]
Ready,
}
Without that compiler will give a error with message like:
state_change does not belong to any known attribute.
These attributes are optional and all we have done is allow them to be to specified. When you derive your procedural macro you may check for everything you want (including attributes existence) and panic! on some condition with meaningful message which will be told by the compiler.
Now we will talk about handling the attribute! Let's forget about state_transitions attribute because it's handling will not vary too much from handling struct/enum attributes (actually it is only a little bit more code) and talk about state_change. The syn crate gives you all the needed information about definitions (but not implementations unfortunately (I am talking about impl here) but this is enough for handling attributes of course). To be more detailed, we need syn::DeriveInput, syn::Body, syn::Variant, syn::Attribute and finally syn::MetaItem.
To handle the attribute of a field you need to go through all these structures from one to another. When you reach Vec<syn:: Attribute> - this is what you want, a list of all attributes of a field. Here our state_transitions can be found. When you find it, you may want to get its content and this can be done by using matching syn::MetaItem enum. Just read the docs :) Here is a simple example code which panics when we find state_change attribute on some field plus it checks does our target entity derive Copy or Clone or neither of them:
#[proc_macro_derive(FiniteStateMachine, attributes(state_transitions, state_change))]
pub fn fxsm(input: TokenStream) -> TokenStream {
// Construct a string representation of the type definition
let s = input.to_string();
// Parse the string representation
let ast = syn::parse_derive_input(&s).unwrap();
// Build the impl
let gen = impl_fsm(&ast);
// Return the generated impl
gen.parse().unwrap()
}
fn impl_fsm(ast: &syn::DeriveInput) -> Tokens {
const STATE_CHANGE_ATTR_NAME: &'static str = "state_change";
if let syn::Body::Enum(ref variants) = ast.body {
// Looks for state_change attriute (our attribute)
if let Some(ref a) = ast.attrs.iter().find(|a| a.name() == STATE_CHANGE_ATTR_NAME) {
if let syn::MetaItem::List(_, ref nested) = a.value {
panic!("Found our attribute with contents: {:?}", nested);
}
}
// Looks for derive impls (not our attribute)
if let Some(ref a) = ast.attrs.iter().find(|a| a.name() == "derive") {
if let syn::MetaItem::List(_, ref nested) = a.value {
if derives(nested, "Copy") {
return gen_for_copyable(&ast.ident, &variants, &ast.generics);
} else if derives(nested, "Clone") {
return gen_for_clonable(&ast.ident, &variants, &ast.generics);
} else {
panic!("Unable to produce Finite State Machine code on a enum which does not drive Copy nor Clone traits.");
}
} else {
panic!("Unable to produce Finite State Machine code on a enum which does not drive Copy nor Clone traits.");
}
} else {
panic!("How have you been able to call me without derive!?!?");
}
} else {
panic!("Finite State Machine must be derived on a enum.");
}
}
fn derives(nested: &[syn::NestedMetaItem], trait_name: &str) -> bool {
nested.iter().find(|n| {
if let syn::NestedMetaItem::MetaItem(ref mt) = **n {
if let syn::MetaItem::Word(ref id) = *mt {
return id == trait_name;
}
return false
}
false
}).is_some()
}
You may be interested in reading serde_codegen_internals, serde_derive, serenity's #[command] attr, another small project of mine - unique-type-id, fxsm-derive. The last link is actually my own project to explain to myself how to use procedural macros in Rust.
After some Rust 1.15 and updating the syn crate, it is no longer possible to check derives of a enums/structs, however, everything else works okay.

You implement attributes on fields as part of the derive macro for the struct (you can only implement derive macros for structs and enums).
Serde does this by checking every field for an attribute within the structures provided by syn and changing the code generation accordingly.
You can find the relevant code here: https://github.com/serde-rs/serde/blob/master/serde_derive/src/internals/attr.rs

To expand Victor Polevoy's answer when it comes to the state_transitions attribute. I'm providing an example of how to extract the field attribute #[state_transitions(NeedServer, Ready)] on a enum that derives #[derive(FiniteStateMachine)]:
#[derive(FiniteStateMachine)]
enum GameState {
#[state_transitions(NeedServer, Ready)] // <-- extract this
Prepare { players: u8 },
#[state_transitions(Prepare, Ready)]
NeedServer,
#[state_transitions(Prepare)]
Ready,
}
use proc_macro::TokenStream;
#[proc_macro_derive(FiniteStateMachine, attributes(state_transitions))]
pub fn finite_state_machine(input: TokenStream) -> TokenStream {
let ast = syn::parse(input).unwrap();
// Extract the enum variants
let variants: Vec<&syn::Variant> = match &ast.data {
syn::Data::Enum(ref data_enum) => data_enum.variants.iter().collect(),
other => panic!("#[derive(FiniteStateMachine)] expects enum, got {:#?}", other)
};
// For each variant, extract the attributes
let _ = variants.iter().map(|variant| {
let attrs = variant.attrs.iter()
// checks attribute named "state_transitions(...)"
.find_map(|attr| match attr.path.is_ident("state_transitions") {
true => Some(&attr.tokens),
false => None,
})
.expect("#[derive(FiniteStateMachine)] expects attribute macros #[state_transitions(...)] on each variant, found none");
// outputs: attr: "(NeedServer, Ready)"
eprintln!("attr: {:#?}", attrs.to_string());
// do something with the extracted attributes
...
})
.collect();
...
}
The content of the extracted attrs (typed TokenStream) looks like this:
TokenStream [
Group {
delimiter: Parenthesis,
stream: TokenStream [
Ident {
ident: "NeedServer",
span: #0 bytes(5511..5521),
},
Punct {
ch: ',',
spacing: Alone,
span: #0 bytes(5521..5522),
},
Ident {
ident: "Ready",
span: #0 bytes(5523..5528),
},
],
span: #0 bytes(5510..5529),
},
]

Related

Rust safe const iterable associative array

I would like to create a structure that's something like a compile-time immutable map with safely checked keys at compile-time. More generally, I would like an iterable associative array with safe key access.
My first attempt at this was using a const HashMap (such as described here) but then the keys are not safely accessible:
use phf::{phf_map};
static COUNTRIES: phf::Map<&'static str, &'static str> = phf_map! {
"US" => "United States",
"UK" => "United Kingdom",
};
COUNTRIES.get("EU") // no compile-time error
Another option I considered was using an enumerable enum with the strum crate as described here:
use strum::IntoEnumIterator; // 0.17.1
use strum_macros::EnumIter; // 0.17.1
#[derive(Debug, EnumIter)]
enum Direction {
NORTH,
SOUTH,
EAST,
WEST,
}
fn main() {
for direction in Direction::iter() {
println!("{:?}", direction);
}
}
This works, except that enum values in rust can only be integers. To assign a different value would require something like implementing a value() function for the enum with a match statement, (such as what's described here), however this means that any time the developer decides to append a new item, the value function must be updated as well, and rewriting the enum name in two places every time isn't ideal.
My last attempt was to use consts in an impl, like so:
struct MyType {
value: &'static str
}
impl MyType {
const ONE: MyType = MyType { value: "one" };
const TWO: MyType = MyType { value: "two" };
}
This allows single-write implementations and the objects are safely-accessible compile-time constants, however there's no way that I've found to iterate over them (as expressed by work-arounds here) (although this may be possible with some kind of procedural macro).
I'm coming from a lot of TypeScript where this kind of task is very simple:
const values = {
one: "one",
two: "two" // easy property addition
}
values.three; // COMPILE-TIME error
Object.keys(values).forEach(key => {...}) // iteration
Or even in Java where this can be done simply with enums with properties.
I'm aware this smells a bit like an XY problem, but I don't really think it's an absurd thing to ask generally for a safe, iterable, compile-time immutable constant associative array (boy is it a mouthful though). Is this pattern possible in Rust? The fact that I can't find anything on it and that it seems so difficult leads me to believe what I'm doing isn't the best practice for Rust code. In that case, what are the alternatives? If this is a bad design pattern for Rust, what would a good substitute be?
#JakubDóka How would I implement it? I did some looking at procedural macros and couldn't seem to understand how to implement such a macro.
macro_rules! decl_named_iterable_enum {
(
// notice how we format input as it should be inputted (good practice)
// here is the indentifier bound to $name variable, when we later mention it
// it will be replaced with the passed value
$name:ident {
// the `$(...)*` matches 0-infinity of consecutive `...`
// the `$(...)?` matches 0-1 of `...`
$($variant:ident $(= $repr:literal)?,)*
}
) => {
#[derive(Clone, Copy)]
enum $name {
// We use the metavar same way we bind it,
// just ommitting its token type
$($variant),*
// ^ this will insert `,` between the variants
}
impl $name {
// same story just with additional tokens
pub const VARIANTS: &[Self] = &[$(Self::$variant),*];
pub const fn name(self) -> &'static str {
match self {
$(
// see comments on other macro branches, this si a
// common way to handle optional patterns
Self::$variant => decl_named_iterable_enum!(#repr $variant $($repr)?),
)*
}
}
}
};
// this branch will match if literal is present
// in this case we just ignore the name
(#repr $name:ident $repr:literal) => {
$repr
};
// fallback for no literal provided,
// we stringify the name of variant
(#repr $name:ident) => {
stringify!($name)
};
}
// this is how you use the macro, similar to typescript
decl_named_iterable_enum! {
MyEnum {
Variant,
Short = "Long",
}
}
// some example code collecting names of variants
fn main() {
let name_list = MyEnum::VARIANTS
.iter()
.map(|v| v.name())
.collect::<Vec<_>>();
println!("{name_list:?}");
}
// Exercise for you:
// 1. replace `=` for name override with `:`
// 2. add a static `&[&str]` names accessed by `MyEnum::VARIANT_NAMES`

Metaprogamming name to function and type lookup in Rust?

I am working on a system which produces and consumes large numbers of "events", they are a name with some small payload of data, and an attached function which is used as a kind of fold-left over the data, something like a reducer.
I receive from the upstream something like {t: 'fieldUpdated', p: {new: 'new field value'}}, and must in my program associate the fieldUpdated "callback" function with the incoming event and apply it. There is a confirmation command I must echo back (which follows a programatic naming convention), and each type is custome.
I tried using simple macros to do codegen for the structs, callbacks, and with the paste::paste! macro crate, and with the stringify macro I made quite good progress.
Regrettably however I did not find a good way to metaprogram these into a list or map using macros. Extending an enum through macros doesn't seem to be possible, and solutions such as the use of ctors seems extremely hacky.
My ideal case is something this:
type evPayload = {
new: String
}
let evHandler = fn(evPayload: )-> Result<(), Error> { Ok(()) }
// ...
let data = r#"{"t": 'fieldUpdated', "p": {"new": 'new field value'}}"#'
let v: Value = serde_json::from_str(data)?;
Given only knowledge of data how can use macros, specifically (boilerplate is actually 2-3 types, 3 functions, some factory and helper functions) in a way that I can do a name-to-function lookup?
It seems like Serde's adjacently, or internally tagged would get me there, if I could modify a enum in a macro https://serde.rs/enum-representations.html#internally-tagged
It almost feels like I need a macro which can either maintain an enum, or I can "cheat" and use module scoped ctors to do a quasi-static initialization of the names and types into a map.
My program would have on the order of 40-100 of these, with anything from 3-10 in a module. I don't think ctors are necessarily a problem here, but the fact that they're a little grey area handshake, and that ctors might preclude one day being able to cross-compile to wasm put me off a little.
I actually had need of something similar today; the enum macro part specifically. But beware of my method: here be dragons!
Someone more experienced than me — and less mad — should probably vet this. Please do not assume my SAFETY comments to be correct.
Also, if you don't have variant that collide with rust keywords, you might want to tear out the '_' prefix hack entirely. I used a static mut byte array for that purpose, as manipulating strings was an order of magnitude slower, but that was benchmarked in a simplified function. There are likely better ways of doing this.
Finally, I am using it where failing to parse must cause panic, so error handling somewhat limited.
With that being said, here's my current solution:
/// NOTE: It is **imperative** that the length of this array is longer that the longest variant name +1
static mut CHECK_BUFF: [u8; 32] = [b'_'; 32];
macro_rules! str_enums {
($enum:ident: $($variant:ident),* $(,)?) => {
#[allow(non_camel_case_types)]
#[derive(Debug, Default, Hash, Clone, PartialEq, Eq, PartialOrd, Ord)]
enum $enum {
#[default]
UNINIT,
$($variant),*,
UNKNOWN
}
impl FromStr for $enum {
type Err = String;
fn from_str(s: &str) -> Result<Self, Self::Err> {
unsafe {
// SAFETY: Currently only single threaded
CHECK_BUFF[1..len].copy_from_slice(s.as_bytes());
let len = s.len() + 1;
assert!(CHECK_BUFF.len() >= len);
// SAFETY: Safe as long as CHECK_BUFF.len() >= s.len() + 1
match from_utf8_unchecked(&CHECK_BUFF[..len]) {
$(stringify!($variant) => Ok(Self::$variant),)*
_ => Err(format!(
"{} variant not accounted for: {s} ({},)",
stringify!($enum),
from_utf8_unchecked(&CHECK_BUFF[..len])
))
}
}
}
}
impl From<&$enum> for &'static str {
fn from(variant: &$enum) -> Self {
unsafe {
match variant {
// SAFETY: The first byte is always '_', and stripping it of should be safe.
$($enum::$variant => from_utf8_unchecked(&stringify!($variant).as_bytes()[1..]),)*
$enum::UNINIT => {
eprintln!("uninitialized {}!", stringify!($enum));
""
}
$enum::UNKNOWN => {
eprintln!("unknown {}!", stringify!($enum));
""
}
}
}
}
}
impl Display for $enum {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "{}", Into::<&str>::into(self))
}
}
};
}
And then I call it like so:
str_enums!(
AttributeKind:
_alias,
_allowduplicate,
_altlen,
_api,
...
_enum,
_type,
_struct,
);
str_enums!(
MarkupKind:
_alias,
_apientry,
_command,
_commands,
...
);

Force use of a constructor [duplicate]

This question already has answers here:
How to restrict the construction of struct?
(2 answers)
Closed 5 months ago.
I want to give some business-rule guarantees about certain structs. For example, that an EmailAddress is a valid email, or that a DateRange has a from that lies before a from, and so on. So that when passing such a value around, it is guaranteed to adhere to all business rules for that struct.
struct InvalidEmailAddress;
struct EmailAddress {
value: String
}
impl EmailAddress {
fn new(value: String) -> Result<Self, InvalidEmailAddress> {
if value.contains("#") { // I know, this isn't any sort of validation. It's an example.
Ok(Self { value })
} else {
Err(InvalidEmailAddress)
}
}
}
Ignoring that now new() behaves unexpected (it probably would be better to use a build() method), this brings an issue: When someone builds an EmailAddress through the constructor, it is guaranteed to be "valid". But when someone constructs it as normal struct, it may not be.:
let guaranteed_valid = EmailAddress::new(String::from("hi#example.com")).unwrap();
let will_crash = EmailAddress::new(String::from("localhost")).unwrap()
let unknown_valid = EmailAddress { value: String::from("hi-at-example.com") }
I would like to prohibit any users of those structs from constructing them directly like in the last line.
Is that possible at all? Are there any more ways someone could construct an EmailAddress in an invalid way?
I'm OK with placing the structs in a module, and using public/private visibility if that is possible at all. But from what I can see, any code that wants to now enforce the EmailAddress type, say a send_report(to: EmailAddress) would have access to the struct and can build it directly. Or am I missing something crucial?
You need to place your struct in a module. That way any code outside of that module will only be able to access the public functionality. Since value is not public, direct construction will not be allowed:
mod email {
#[derive(Debug)]
pub struct InvalidEmailAddress;
pub struct EmailAddress {
value: String,
}
impl EmailAddress {
pub fn new(value: String) -> Result<Self, InvalidEmailAddress> {
if value.contains("#") {
// I know, this isn't any sort of validation. It's an example.
Ok(Self { value })
} else {
Err(InvalidEmailAddress)
}
}
}
}
use email::EmailAddress;
fn main() {
let e = EmailAddress::new("foo#bar".to_string()).unwrap(); // no error
//let e = EmailAddress { value: "invalid".to_string() }; // "field `value` of struct `EmailAddress` is private"
}
Playground
More details on visibility in the Book.

How to programmatically get the number of fields of a struct?

I have a custom struct like the following:
struct MyStruct {
first_field: i32,
second_field: String,
third_field: u16,
}
Is it possible to get the number of struct fields programmatically (like, for example, via a method call field_count()):
let my_struct = MyStruct::new(10, "second_field", 4);
let field_count = my_struct.field_count(); // Expecting to get 3
For this struct:
struct MyStruct2 {
first_field: i32,
}
... the following call should return 1:
let my_struct_2 = MyStruct2::new(7);
let field_count = my_struct2.field_count(); // Expecting to get count 1
Is there any API like field_count() or is it only possible to get that via macros?
If this is achievable with macros, how should it be implemented?
Are there any possible API like field_count() or is it only possible to get that via macros?
There is no such built-in API that would allow you to get this information at runtime. Rust does not have runtime reflection (see this question for more information). But it is indeed possible via proc-macros!
Note: proc-macros are different from "macro by example" (which is declared via macro_rules!). The latter is not as powerful as proc-macros.
If this is achievable with macros, how should it be implemented?
(This is not an introduction into proc-macros; if the topic is completely new to you, first read an introduction elsewhere.)
In the proc-macro (for example a custom derive), you would somehow need to get the struct definition as TokenStream. The de-facto solution to use a TokenStream with Rust syntax is to parse it via syn:
#[proc_macro_derive(FieldCount)]
pub fn derive_field_count(input: TokenStream) -> TokenStream {
let input = parse_macro_input!(input as ItemStruct);
// ...
}
The type of input is ItemStruct. As you can see, it has the field fields of the type Fields. On that field you can call iter() to get an iterator over all fields of the struct, on which in turn you could call count():
let field_count = input.fields.iter().count();
Now you have what you want.
Maybe you want to add this field_count() method to your type. You can do that via the custom derive (by using the quote crate here):
let name = &input.ident;
let output = quote! {
impl #name {
pub fn field_count() -> usize {
#field_count
}
}
};
// Return output tokenstream
TokenStream::from(output)
Then, in your application, you can write:
#[derive(FieldCount)]
struct MyStruct {
first_field: i32,
second_field: String,
third_field: u16,
}
MyStruct::field_count(); // returns 3
It's possible when the struct itself is generated by the macros - in this case you can just count tokens passed into macros, as shown here. That's what I've come up with:
macro_rules! gen {
($name:ident {$($field:ident : $t:ty),+}) => {
struct $name { $($field: $t),+ }
impl $name {
fn field_count(&self) -> usize {
gen!(#count $($field),+)
}
}
};
(#count $t1:tt, $($t:tt),+) => { 1 + gen!(#count $($t),+) };
(#count $t:tt) => { 1 };
}
Playground (with some test cases)
The downside for this approach (one - there could be more) is that it's not trivial to add an attribute to this function - for example, to #[derive(...)] something on it. Another approach would be to write the custom derive macros, but this is something that I can't speak about for now.

Is there a shorter way than a match or if let to get data through many nested levels without increasing the total amount of code?

I work with a bunch of structs / enums included in each other. I need to get ty.node<TyKind::Path>.1.segments.last().identifiers and ty.node<TyKind::Path>.1.segments.last().parameters<AngleBracketed::AngleBracketed>.types.
Is there a simpler way to get these two values then my implementation of f? My ideal syntax would be:
ty.node<TyKind::Path>?.1.segments.last().identifiers
// and
ty.node<TyKind::Path>?.1.segments.last().parameters<AngleBracketed::AngleBracketed>?.types
It that's impossible, maybe there is a way to reduce the number of if let? I want to solve only this particular case, so simplification should be possible compared to f. If an analog of Option::map / Option::unwrap_or_else were introduced, then the sum of its code + the code in f should be less then my original f.
#[derive(Clone)]
struct Ty {
node: TyKind,
}
#[derive(Clone)]
enum TyKind {
Path(Option<i32>, Path),
}
#[derive(Clone)]
struct Path {
segments: Vec<PathSegment>,
}
#[derive(Clone)]
struct PathSegment {
identifier: String,
parameters: Option<Box<PathParameters>>,
}
#[derive(Clone)]
enum PathParameters {
AngleBracketed(AngleBracketedParameterData),
}
#[derive(Clone)]
struct AngleBracketedParameterData {
types: Vec<Box<Ty>>,
}
/// If Tylnode == Path -> return last path segment + types
fn f(ty: &Ty) -> Option<(String, Vec<Box<Ty>>)> {
match ty.node {
TyKind::Path(_, ref path) => if let Some(seg) = path.segments.iter().last() {
let ident = seg.identifier.clone();
println!("next_ty: seg.id {:?}", seg.identifier);
match seg.parameters.as_ref() {
Some(params) => match **params {
PathParameters::AngleBracketed(ref params) => {
Some((ident, params.types.clone()))
}
_ => Some((ident, vec![])),
},
None => Some((ident, vec![])),
}
} else {
None
},
_ => None,
}
}
To simplify the question, I have removed unrelated enum variants and struct fields.
No.
The closest you can get, using nightly features and helper code, is probably this
fn f(ty: &Ty) -> MyOption<(String, Vec<Box<Ty>>)> {
let last = ty.node.path()?.segments.my_last()?;
Just((
last.identifier.clone(),
last.ab_parameters()
.map(|v| v.types.clone())
.unwrap_or_else(|| vec![]),
))
}
Playground
I guess what you want is called Lenses. Not sure about Rust, but here is about Haskell https://en.m.wikibooks.org/wiki/Haskell/Lenses_and_functional_references
It might be possible to implement that in Rust, if somebody haven't done yet.

Resources