How can I create hygienic identifiers in code generated by procedural macros?

When writing a declarative (macro_rules!) macro, we automatically get macro hygiene. In this example, I declare a variable named f in the macro and pass in an identifier f which becomes a local variable:
macro_rules! decl_example {
($tname:ident, $mname:ident, ($($fstr:tt),*)) => {
impl std::fmt::Display for $tname {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
let Self { $mname } = self;
write!(f, $($fstr),*)
}
}
}
}
struct Foo {
f: String,
}
decl_example!(Foo, f, ("I am a Foo: {}", f));
fn main() {
let f = Foo {
f: "with a member named `f`".into(),
};
println!("{}", f);
}
This code compiles, but if you look at the partially-expanded code, you can see that there's an apparent conflict:
impl std::fmt::Display for Foo {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
let Self { f } = self;
write!(f, "I am a Foo: {}", f)
}
}
I am writing the equivalent of this declarative macro as a procedural macro, but do not know how to avoid potential name conflicts between the user-provided identifiers and identifiers created by my macro. As far as I can see, the generated code has no notion of hygiene and is just a string:
src/main.rs
use my_derive::MyDerive;
#[derive(MyDerive)]
#[my_derive(f)]
struct Foo {
f: String,
}
fn main() {
let f = Foo {
f: "with a member named `f`".into(),
};
println!("{}", f);
}
Cargo.toml
[package]
name = "example"
version = "0.1.0"
edition = "2018"
[dependencies]
my_derive = { path = "my_derive" }
my_derive/src/lib.rs
extern crate proc_macro;
use proc_macro::TokenStream;
use quote::quote;
use syn::{parse_macro_input, DeriveInput, Meta, NestedMeta};
#[proc_macro_derive(MyDerive, attributes(my_derive))]
pub fn my_macro(input: TokenStream) -> TokenStream {
let input = parse_macro_input!(input as DeriveInput);
let name = input.ident;
let attr = input.attrs.into_iter().find(|a| a.path.is_ident("my_derive")).expect("No name passed");
let meta = attr.parse_meta().expect("Unknown attribute format");
let meta = match meta {
Meta::List(ml) => ml,
_ => panic!("Invalid attribute format"),
};
let meta = meta.nested.first().expect("Must have one path");
let meta = match meta {
NestedMeta::Meta(Meta::Path(p)) => p,
_ => panic!("Invalid nested attribute format"),
};
let field_name = meta.get_ident().expect("Not an ident");
let expanded = quote! {
impl std::fmt::Display for #name {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
let Self { #field_name } = self;
write!(f, "I am a Foo: {}", #field_name)
}
}
};
TokenStream::from(expanded)
}
my_derive/Cargo.toml
[package]
name = "my_derive"
version = "0.1.0"
edition = "2018"
[lib]
proc-macro = true
[dependencies]
syn = "1.0.13"
quote = "1.0.2"
proc-macro2 = "1.0.7"
With Rust 1.40, this produces the compiler error:
error[E0599]: no method named `write_fmt` found for type `&std::string::String` in the current scope
--> src/main.rs:3:10
|
3 | #[derive(MyDerive)]
| ^^^^^^^^ method not found in `&std::string::String`
|
= help: items from traits can only be used if the trait is in scope
= note: this error originates in a macro outside of the current crate (in Nightly builds, run with -Z external-macro-backtrace for more info)
help: the following trait is implemented but not in scope; perhaps add a `use` for it:
|
1 | use std::fmt::Write;
|
What techniques exist to namespace my identifiers from identifiers outside of my control?

Summary: as of Rust 1.40, you can't yet use hygienic identifiers with proc macros on stable Rust. Your best bet is to use a particularly ugly name such as __your_crate_your_name.
You are creating identifiers (in particular, f) by using quote!. This is certainly convenient, but it's just a helper around the actual proc macro API the compiler offers. So let's take a look at that API to see how we can create identifiers! In the end we need a TokenStream, as that's what our proc macro returns. How can we construct such a token stream?
We can parse it from a string, e.g. "let f = 3;".parse::<TokenStream>(). But this was basically an early solution and is discouraged now. In any case, all identifiers created this way behave in a non-hygienic manner, so this won't solve your problem.
The second way (which quote! uses under the hood) is to create a TokenStream manually by creating a bunch of TokenTrees. One kind of TokenTree is an Ident (identifier). We can create an Ident via new:
fn new(string: &str, span: Span) -> Ident
The string parameter is self explanatory, but the span parameter is the interesting part! A Span stores the location of something in the source code and is usually used for error reporting (in order for rustc to point to the misspelled variable name, for example). But in the Rust compiler, spans carry more than location information: the kind of hygiene! We can see two constructor functions for Span:
fn call_site() -> Span: creates a span with call site hygiene. This is what you call "unhygienic" and is equivalent to "copy and pasting". If two identifiers have the same string, they will collide or shadow each other.
fn def_site() -> Span: this is what you are after. Technically called definition site hygiene, this is what you call "hygienic". The identifiers you define and the ones of your user live in different universes and won't ever collide. As you can see in the docs, this method is still unstable and thus only usable on a nightly compiler. Bummer!
There are no really great workarounds. The obvious one is to use a really ugly name like __your_crate_some_variable. To make it a bit easier for you, you can create that identifier once and use it within quote! (slightly better solution here):
let ugly_name = quote! { __your_crate_some_variable };
quote! {
let #ugly_name = 3;
println!("{}", #ugly_name);
}
Sometimes you can even search through all of the user's identifiers that could collide with yours and then algorithmically choose an identifier that does not collide. This is actually what we did for auto_impl, with a super ugly name as the fallback. The main motivation was to keep super ugly names out of the generated documentation.
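A minimal sketch of that idea, assuming the user's identifiers have already been collected into a set of strings. The function name pick_unused_name and the numeric-suffix scheme are made up for illustration and are not taken from auto_impl:

```rust
use std::collections::HashSet;

/// Hypothetical helper: returns `base` if the user has not used it,
/// otherwise the first free name among `base_1`, `base_2`, ...
fn pick_unused_name(base: &str, used: &HashSet<String>) -> String {
    if !used.contains(base) {
        return base.to_string();
    }
    let mut n = 1u32;
    loop {
        let candidate = format!("{}_{}", base, n);
        if !used.contains(&candidate) {
            return candidate;
        }
        n += 1;
    }
}
```

In a real proc macro, the returned string would then be turned into an Ident and interpolated into quote!, as in the ugly-name example above.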
Apart from that, I'm afraid you cannot really do anything.

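You can sidestep collisions by building a unique name from a UUID. This assumes the uuid crate with its v4 feature enabled. The identifier still has call-site hygiene; it is just very unlikely to clash with anything the user wrote: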
fn generate_unique_ident(prefix: &str) -> Ident {
let uuid = uuid::Uuid::new_v4();
let ident = format!("{}_{}", prefix, uuid).replace('-', "_");
Ident::new(&ident, Span::call_site())
}

Related

Rust & FFI lib share string & free from both

I have a library that is used through its Rust interface by Rust programs, as well as by C/C++ programs through bindings generated with cbindgen, so I implemented a free function to release the string once the FFI caller is done with it. However, I also want Rust to manage the memory when the library is used as a Rust lib. How do I achieve this? Is it even possible, or is calling the free function manually in Rust the only option?
I also tried implementing Drop, but that led to this:
free(): double free detected in tcache 2
[1] 11097 IOT instruction cargo run
This block allows the string to be freed from C/C++, but the string is not freed in Rust (valgrind reports a "definitely lost" block). data is assigned using CString::into_raw().
use std::{ffi::CString, os::raw::c_char};
pub struct SomeData {
pub data: *const c_char
}
impl SomeData {
#[no_mangle] pub extern fn free_shared_string(&mut self) {
if !self.data.is_null() {
unsafe { CString::from_raw(self.data.cast_mut()); }
}
}
}
The docs for from_raw warn against doing exactly this.
Safety
This should only ever be called with a pointer that was earlier obtained by calling CString::into_raw. Other usage (e.g., trying to take ownership of a string that was allocated by foreign code) is likely to lead to undefined behavior or allocator corruption.
So do not use from_raw to pretend that a foreign string was allocated using Rust. If you just need to borrow it and let C free it, you should use the CStr type for borrowed strings. If you want to take ownership, you should copy it into a new string, or wrap it in a custom structure that has a Drop implementation capable of freeing the original memory.
You cannot have two different languages owning that memory. Rust is fundamentally built on a single-ownership model, so every piece of memory has a unique owner. There are some (intra-Rust) workarounds for that like Rc, but none of that will translate to C. So pick an owner, and make that language responsible for freeing the data.
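As a sketch of the "pick an owner" advice, here is one way to make Rust the sole owner of a string that is lent to C. The type name OwnedCString and its methods are invented for this illustration; the only standard-library facts relied on are that CString::into_raw and CString::from_raw form a matched pair:

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical wrapper: Rust allocates the string and Rust frees it.
pub struct OwnedCString {
    ptr: *mut c_char,
}

impl OwnedCString {
    /// Returns None if `s` contains an interior NUL byte.
    pub fn new(s: &str) -> Option<Self> {
        let c_string = CString::new(s).ok()?;
        Some(Self { ptr: c_string.into_raw() })
    }

    /// Pointer to lend to C. C must not free it and must not keep it
    /// after this wrapper has been dropped.
    pub fn as_ptr(&self) -> *const c_char {
        self.ptr
    }

    /// Reads the string back on the Rust side.
    pub fn to_string_lossy(&self) -> String {
        // SAFETY: `ptr` came from `CString::into_raw` and is still live.
        let c_str = unsafe { CStr::from_ptr(self.ptr) };
        c_str.to_string_lossy().into_owned()
    }
}

impl Drop for OwnedCString {
    fn drop(&mut self) {
        if !self.ptr.is_null() {
            // SAFETY: `ptr` came from `CString::into_raw` and is reclaimed
            // exactly once, here.
            unsafe { drop(CString::from_raw(self.ptr)) };
            self.ptr = std::ptr::null_mut();
        }
    }
}
```

With this shape there is no exported free function at all: C only ever borrows the pointer, so a double free cannot come from the C side.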
The best solution for me was to add a separate Cargo feature that is enabled when building the library for C/C++ applications (i.e. as .a/.so) and disabled for the .rlib that cargo builds when the crate is included in a Rust project through Cargo.toml.
This lets me use the same API from both application languages: C/C++ calls free on my string, and in Rust the string is freed when it is dropped.
Note: the NUL byte at the end is there because most of the time my lib is used from C apps, so storing the terminator up front makes returning the string to them faster.
In a Rust app, add default-features = false to the dependency entry in Cargo.toml.
lib.rs
use std::{ffi::{c_char, CStr, FromBytesWithNulError, CString}, mem::forget, str::Utf8Error, string::FromUtf8Error};
#[cfg(feature = "c-str")]
#[repr(C)]
pub struct SharedString {
str: *const c_char
}
#[cfg(not(feature = "c-str"))]
pub struct SharedString {
str: Vec<u8>
}
#[cfg(feature = "c-str")]
impl SharedString {
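pub fn from_bytes(buf: &[u8]) -> Self {
    // Stop at the first NUL so the buffer has no interior NUL bytes;
    // C would not read past it anyway. This also covers empty input.
    let end = buf.iter().position(|&b| b == 0).unwrap_or(buf.len());
    // SAFETY: `buf[..end]` contains no NUL bytes.
    let c_string = unsafe { CString::from_vec_unchecked(buf[..end].to_vec()) };
    // `into_raw` is the documented counterpart of the `from_raw` call in
    // `free`; a forgotten Vec's pointer is not.
    Self { str: c_string.into_raw() }
}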
pub unsafe fn get_string(&self) -> Result<String, SharedStringError> {
Ok(CStr::from_ptr(self.str).to_str()?.to_owned())
}
pub unsafe fn free(&self) {
if !self.str.is_null() {
let _ = CString::from_raw(self.str.cast_mut());
}
}
}
#[cfg(not(feature = "c-str"))]
impl SharedString {
pub fn from_bytes(buf: &[u8]) -> Self {
let mut buf = buf.to_vec();
if let Some(c) = buf.last() {
if *c != 0 {
buf.push(0);
}
}
Self { str: buf }
}
pub fn get_string(&self) -> Result<String, SharedStringError> {
let mut s = self.str.clone();
if let Some(c) = s.last() {
if *c == 0 {
s.pop();
}
}
String::from_utf8(s).map_err(|e| e.into())
}
// do nothing because rust vec will get dropped automatically
pub fn free(&self) {}
}
// Just for proof of concept
#[derive(Debug)]
pub enum SharedStringError {
NullError,
Utf8Error
}
impl From<FromBytesWithNulError> for SharedStringError {
fn from(_: FromBytesWithNulError) -> Self {
Self::NullError
}
}
impl From<Utf8Error> for SharedStringError {
fn from(_: Utf8Error) -> Self {
Self::Utf8Error
}
}
impl From<FromUtf8Error> for SharedStringError {
fn from(_: FromUtf8Error) -> Self {
Self::Utf8Error
}
}
Cargo.toml
[package]
name = "mylib"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
[features]
default = ["c-str"]
c-str = []

Attempt to implement sscanf in Rust, failing when passing &str as argument

Problem:
I'm new to Rust, and I'm trying to implement a macro that simulates sscanf from C.
So far it works with any numeric type, but not with strings, as I am already trying to parse a string.
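macro_rules! splitter {
    ($string:expr, $sep:expr) => {{
        // The inner braces make the expansion a block expression, so the
        // macro can appear on the right-hand side of `let`.
        let parts: Vec<&str> = $string.split($sep).collect();
        parts
    }};
}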
macro_rules! scan_to_types {
($buffer:expr,$sep:expr,[$($y:ty),+],$($x:expr),+) => {
let res = splitter!($buffer,$sep);
let mut i = 0;
$(
$x = res[i].parse::<$y>().unwrap_or_default();
i+=1;
)*
};
}
fn main() {
let mut a :u8; let mut b :i32; let mut c :i16; let mut d :f32;
let buffer = "00:98;76,39.6";
let sep = [':',';',','];
scan_to_types!(buffer,sep,[u8,i32,i16,f32],a,b,c,d); // this will work
println!("{} {} {} {}",a,b,c,d);
}
This obviously won't work, because the macro expands to a call that tries to parse a string slice into &str, and &str does not implement FromStr:
let a :u8; let b :i32; let c :i16; let d :f32; let e :&str;
let buffer = "02:98;abc,39.6";
let sep = [':',';',','];
scan_to_types!(buffer,sep,[u8,i32,&str,f32],a,b,e,d);
println!("{} {} {} {}",a,b,e,d);
$x = res[i].parse::<$y>().unwrap_or_default();
| ^^^^^ the trait `FromStr` is not implemented for `&str`
What I have tried
I have tried comparing types using TypeId, with an if/else condition inside the macro to skip the parsing, but the same error occurs: both branches are still type-checked, so the macro does not expand to valid code:
macro_rules! scan_to_types {
($buffer:expr,$sep:expr,[$($y:ty),+],$($x:expr),+) => {
let res = splitter!($buffer,$sep);
let mut i = 0;
$(
if TypeId::of::<$y>() == TypeId::of::<&str>(){
$x = res[i];
}else{
$x = res[i].parse::<$y>().unwrap_or_default();
}
i+=1;
)*
};
}
Is there a way to set conditions or skip a repetition inside a macro? Or is there a better approach to building sscanf with macros? I have already written functions that parse those strings, but I couldn't pass types as arguments or make them generic.
Note before the answer: you probably don't want to emulate sscanf() in Rust. There are many very capable parsers in Rust, so you should probably use one of them.
Simple answer: the simplest way to address your problem is to replace the use of &str with String, which makes your macro compile and run. If your code is not performance-critical, that is probably all you need. If you care about performance and about avoiding allocation, read on.
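A small sketch of why the String substitution works: String implements both FromStr and Default, so the same parse-and-fall-back expression used by the macro is valid for it. The helper name parse_or_default is made up for this illustration:

```rust
use std::str::FromStr;

/// Hypothetical helper mirroring the expression the macro expands to.
fn parse_or_default<T: FromStr + Default>(s: &str) -> T {
    s.parse().unwrap_or_default()
}
```

Calling parse_or_default::<String>("abc") allocates a new owned string with the same contents, which is the copy the following paragraphs try to avoid.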
A downside of String is that under the hood it copies the string data from the string you're scanning into a freshly allocated owned string. Your original approach of using an &str should have allowed for your &str to directly point into the data that was scanned, without any copying. Ideally we'd like to write something like this:
trait MyFromStr {
fn my_from_str(s: &str) -> Self;
}
// when called on a type that impls `FromStr`, use `parse()`
impl<T: FromStr + Default> MyFromStr for T {
fn my_from_str(s: &str) -> T {
s.parse().unwrap_or_default()
}
}
// when called on &str, just return it without copying
impl MyFromStr for &str {
fn my_from_str(s: &str) -> &str {
s
}
}
Unfortunately that doesn't compile, complaining of a "conflicting implementation of trait MyFromStr for &str", even though there is no conflict between the two implementations, as &str doesn't implement FromStr. But the way Rust currently works, a blanket implementation of a trait precludes manual implementations of the same trait, even on types not covered by the blanket impl.
In the future this will be resolved by specialization. Specialization is not yet part of stable Rust, and might not come to stable Rust for years, so we have to think of another solution. In case of macro usage, we can just let the compiler "specialize" for us by creating two traits with the same name. (This is similar to the autoref-based specialization invented by David Tolnay, but even simpler because it doesn't require autoref resolution to work, as we have the types provided explicitly.)
We create separate traits for parsed and unparsed values, and implement them as needed:
use std::str::FromStr;
trait ParseFromStr {
fn my_from_str(s: &str) -> Self;
}
impl<T: FromStr + Default> ParseFromStr for T {
fn my_from_str(s: &str) -> T {
s.parse().unwrap_or_default()
}
}
pub trait StrFromStr {
fn my_from_str(s: &str) -> &str;
}
impl StrFromStr for &str {
fn my_from_str(s: &str) -> &str {
s
}
}
Then in the macro we just call <$y>::my_from_str() and let the compiler generate the correct code. Since macros are untyped, this works because we never need to provide a single "trait bound" that would disambiguate which my_from_str() we want. (Such a trait bound would require specialization.)
macro_rules! scan_to_types {
($buffer:expr,$sep:expr,[$($y:ty),+],$($x:expr),+) => {
#[allow(unused_assignments)]
{
let res = splitter!($buffer,$sep);
let mut i = 0;
$(
$x = <$y>::my_from_str(&res[i]);
i+=1;
)*
}
};
}
Complete example in the playground.
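For readers without access to the playground link, here is a self-contained sketch that puts the pieces together. It is a reconstruction under assumptions, not the linked code: it splits lazily instead of going through splitter!, falls back to an empty string when the input runs out of fields, and the function scan_demo exists only to exercise the macro:

```rust
use std::str::FromStr;

trait ParseFromStr {
    fn my_from_str(s: &str) -> Self;
}

// Anything that can be parsed and has a default goes through `parse()`.
impl<T: FromStr + Default> ParseFromStr for T {
    fn my_from_str(s: &str) -> T {
        s.parse().unwrap_or_default()
    }
}

trait StrFromStr {
    fn my_from_str(s: &str) -> &str;
}

// `&str` is handed back as-is, pointing into the scanned buffer.
impl StrFromStr for &str {
    fn my_from_str(s: &str) -> &str {
        s
    }
}

macro_rules! scan_to_types {
    ($buffer:expr, $sep:expr, [$($y:ty),+], $($x:expr),+) => {{
        let mut parts = $buffer.split($sep);
        $(
            // Only one of the two traits applies to each `$y`, so the
            // compiler picks the matching `my_from_str`.
            $x = <$y>::my_from_str(parts.next().unwrap_or(""));
        )+
    }};
}

// Hypothetical driver, present only to show the macro in use.
fn scan_demo() -> (u8, i32, &'static str, f32) {
    let a: u8;
    let b: i32;
    let e: &str;
    let d: f32;
    let buffer = "02:98;abc,39.6";
    let sep: &[char] = &[':', ';', ','];
    scan_to_types!(buffer, sep, [u8, i32, &str, f32], a, b, e, d);
    (a, b, e, d)
}
```

Because the pieces come straight from the split iterator, the &str result borrows from the scanned buffer rather than from a temporary inside the macro.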

How to share parts of a string with Rc?

I want to create some references to a str with Rc, without cloning str:
fn main() {
let s = Rc::<str>::from("foo");
let t = Rc::clone(&s); // Creating a new pointer to the same address is easy
let u = Rc::clone(&s[1..2]); // But how can I create a new pointer to a part of `s`?
let w = Rc::<str>::from(&s[0..2]); // This seems to clone str
assert_ne!(&w as *const _, &s as *const _);
}
How can I do this?
While it's possible in principle, the standard library's Rc does not support the case you're trying to create: a counted reference to a part of reference-counted memory.
However, we can get the effect for strings using a fairly straightforward wrapper around Rc which remembers the substring range:
use std::ops::{Deref, Range};
use std::rc::Rc;
#[derive(Clone, Debug, Eq, Hash, PartialEq)]
pub struct RcSubstr {
string: Rc<str>,
span: Range<usize>,
}
impl RcSubstr {
fn new(string: Rc<str>) -> Self {
let span = 0..string.len();
Self { string, span }
}
fn substr(&self, span: Range<usize>) -> Self {
// A full implementation would also have bounds checks to ensure
// the requested range is not larger than the current substring
Self {
string: Rc::clone(&self.string),
span: (self.span.start + span.start)..(self.span.start + span.end)
}
}
}
impl Deref for RcSubstr {
type Target = str;
fn deref(&self) -> &str {
&self.string[self.span.clone()]
}
}
fn main() {
let s = RcSubstr::new(Rc::<str>::from("foo"));
let u = s.substr(1..2);
// We need to deref to print the string rather than the wrapper struct.
// A full implementation would `impl Debug` and `impl Display` to produce
// the expected substring.
println!("{}", &*u);
}
There are a lot of conveniences missing here, such as suitable implementations of Display, Debug, AsRef, Borrow, From, and Into — I've provided only enough code to illustrate how it can work. Once supplemented with the appropriate trait implementations, this should be just as usable as Rc<str> (with the one edge case that it can't be passed to a library type that wants to store Rc<str> in particular).
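As a sketch of two of the missing pieces, here is a variant with a bounds-checked substring method and a Display implementation. The method names try_substr and shares_buffer_with are made up for this illustration; the bounds check leans on str::get, which returns None for out-of-range or non-char-boundary ranges:

```rust
use std::fmt;
use std::ops::{Deref, Range};
use std::rc::Rc;

#[derive(Clone, Debug)]
pub struct RcSubstr {
    string: Rc<str>,
    span: Range<usize>,
}

impl RcSubstr {
    pub fn new(string: Rc<str>) -> Self {
        let span = 0..string.len();
        Self { string, span }
    }

    /// `span` is relative to the current substring. Returns None if it is
    /// out of bounds or does not fall on char boundaries.
    pub fn try_substr(&self, span: Range<usize>) -> Option<Self> {
        let current: &str = self;
        current.get(span.clone())?;
        let start = self.span.start + span.start;
        let end = self.span.start + span.end;
        Some(Self { string: Rc::clone(&self.string), span: start..end })
    }

    /// True if both values point into the same reference-counted buffer.
    pub fn shares_buffer_with(&self, other: &Self) -> bool {
        Rc::ptr_eq(&self.string, &other.string)
    }
}

impl Deref for RcSubstr {
    type Target = str;
    fn deref(&self) -> &str {
        &self.string[self.span.clone()]
    }
}

impl fmt::Display for RcSubstr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Delegate to str so padding and width flags keep working.
        fmt::Display::fmt(&**self, f)
    }
}
```

With Display in place, println!("{}", u) prints the substring directly, without the explicit &* dereference.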
The crate arcstr claims to offer a finished version of this basic idea, but I haven't used or studied it and so can't guarantee its quality.
The crate owning_ref provides a way to hold references to parts of an Rc or other smart pointer, but there are concerns about its soundness and I don't fully understand which circumstances that applies to (issue search which currently has 3 open issues).

Is it possible to get the full "namespace" of a struct in a custom derive?

I've read this documentation page but I'm still unable to figure out how to do this.
My files are:
|- pancakes.rs
|- main.rs
I'm deriving on the struct Pancakes in "pancakes.rs":
#[derive(HelloWorld)]
struct Pancakes;
I have the following implementation copied from the documentation, but the ident does not contain the full "namespace":
#[proc_macro_derive(HelloWorld)]
pub fn hello_world(input: TokenStream) -> TokenStream {
let s = input.to_string();
let ast = syn::parse_derive_input(&s).unwrap();
let gen = impl_hello_world(&ast);
gen.parse().unwrap()
}
fn impl_hello_world(ast: &syn::DeriveInput) -> quote::Tokens {
let name = &ast.ident; // <---- HERE name = Pancakes, not pancakes::Pancakes
quote! {
impl HelloWorld for #name {
fn hello_world() {
println!("Hello, World! My name is {}", stringify!(#name));
}
}
}
}
Is it possible to get all the information about the struct? I'd also like to get the cargo's lib name from where the derive is used.
This is not possible. Custom derives work on token streams, from which you can easily build an AST. But at this level, names have not been resolved yet (which makes sense: macros and custom derives can affect how names are resolved, so they need to be fully expanded first).
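The derive itself cannot see the path, but the code it generates can ask the compiler for it, because the built-in module_path!() macro is expanded where the generated code ends up. The sketch below uses a declarative macro as a stand-in for the derive's output so that it fits in one file; the full_name method and the impl_hello_world macro are hypothetical:

```rust
trait HelloWorld {
    fn full_name() -> String;
}

// Stand-in for the tokens a derive could emit for `$name`.
macro_rules! impl_hello_world {
    ($name:ident) => {
        impl HelloWorld for $name {
            fn full_name() -> String {
                // Evaluated in the module that contains the struct,
                // not in the crate that defines the macro.
                format!("{}::{}", module_path!(), stringify!($name))
            }
        }
    };
}

mod pancakes {
    use super::HelloWorld;

    pub struct Pancakes;
    impl_hello_world!(Pancakes);
}
```

The first segment of the resulting path is the name of the crate being compiled, which may cover the second part of the question as well.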

No method/field name found

I'm trying to apply some OOP but I'm facing a problem.
use std::io::Read;
struct Source {
look: char
}
impl Source {
fn new() {
Source {look: '\0'};
}
fn get_char(&mut self) {
self.look = 'a';
}
}
fn main() {
let src = Source::new();
src.get_char();
println!("{}", src.look);
}
Compiler reports these errors, for src.get_char();:
error: no method named get_char found for type () in the current
scope
and for println!("{}", src.look);:
attempted access of field look on type (), but no field with that
name was found
I can't find out what I've missed.
Source::new has no return type specified, and thus returns () (the empty tuple, also called unit).
As a result, src has type (), which does not have a get_char method, which is what the error message is telling you.
So, first things first, let's set a proper signature for new: fn new() -> Source. Now we get:
error: not all control paths return a value [E0269]
fn new() -> Source {
Source {look: '\0'};
}
This happens because Rust is an expression-oriented language: nearly everything is an expression, and a trailing semicolon turns an expression into a statement whose value is discarded. You can write new in either of two ways:
fn new() -> Source {
return Source { look: '\0' };
}
Or:
fn new() -> Source {
Source { look: '\0' } // look Ma, no semi-colon!
}
The latter being the more idiomatic in Rust.
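To make the difference concrete, here is a small sketch with two otherwise identical functions; the names make_source and make_nothing are made up for this illustration:

```rust
struct Source {
    look: char,
}

// No semicolon: the struct expression is the function's value.
fn make_source() -> Source {
    Source { look: '\0' }
}

// With a semicolon the struct is built and immediately discarded,
// so the function returns the unit value `()`.
fn make_nothing() {
    Source { look: '\0' };
}
```

The second function has the same body as the original new, which is why src ended up with type ().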
So, let's do that, now we get:
error: cannot borrow immutable local variable `src` as mutable
src.get_char();
^~~
This is because src is declared immutable (the default); to make it mutable, declare it with let mut src.
And now it all works!
Final code:
use std::io::Read;
struct Source {
look: char
}
impl Source {
fn new() -> Source {
Source {look: '\0'}
}
fn get_char(&mut self) {
self.look = 'a';
}
}
fn main() {
let mut src = Source::new();
src.get_char();
println!("{}", src.look);
}
Note: there is a warning because std::io::Read is unused, but I assume you plan to use it.
