I wanted a way to create an array with non-copy values. So I came up with the following:
use proc_macro::TokenStream;
use quote::quote;
use syn::parse::{Parse, ParseStream, Result};
use syn::{parse_macro_input, Expr, LitInt, Token};
struct ArrayLit(Expr, LitInt);
impl Parse for ArrayLit {
fn parse(input: ParseStream) -> Result<Self> {
let v: Expr = input.parse()?;
let _ = input.parse::<Token![;]>()?;
let n: LitInt = input.parse()?;
Ok(ArrayLit(v, n))
}
}
#[proc_macro]
pub fn arr(input: TokenStream) -> TokenStream {
let arr = parse_macro_input!(input as ArrayLit);
let items = std::iter::repeat(arr.0).take(
arr.1.base10_parse::<usize>().expect("error parsing array length"),
);
(quote! { [#(#items),*] }).into()
}
This works with numeric literal sizes, like:
fn f() -> [Option<u32>; 10] {
let mut it = 0..5;
arr![it.next(); 10]
}
How do I change this proc-macro code so that it will take a const generic? For example, I'd like it to work with the following function:
fn f<const N: usize>() -> [Option<u32>; N] {
let mut it = 0..5;
arr![it.next(); N]
}
This is not possible as written. Your macro works by repeating a fragment of code, which necessarily happens before the program is parsed. Generics, including const generics are monomorphized (converted into code with, in this case, a specific value for N) well after the program has been parsed, type-checked, and so on.
You will have to use a different strategy. For example, you could have your macro generate a loop which loops N times.
By the way, if you're not already aware of it, check out std::array::from_fn() which allows constructing an array from repeatedly calling a function (so, more or less the same effect as your macro).
Related
Problem:
Im new to Rust, and im trying to implement a macro which simulates sscanf from C.
So far it works with any numeric types, but not with strings, as i am already trying to parse a string.
macro_rules! splitter {
( $string:expr, $sep:expr) => {
let mut iter:Vec<&str> = $string.split($sep).collect();
iter
}
}
macro_rules! scan_to_types {
($buffer:expr,$sep:expr,[$($y:ty),+],$($x:expr),+) => {
let res = splitter!($buffer,$sep);
let mut i = 0;
$(
$x = res[i].parse::<$y>().unwrap_or_default();
i+=1;
)*
};
}
fn main() {
let mut a :u8; let mut b :i32; let mut c :i16; let mut d :f32;
let buffer = "00:98;76,39.6";
let sep = [':',';',','];
scan_to_types!(buffer,sep,[u8,i32,i16,f32],a,b,c,d); // this will work
println!("{} {} {} {}",a,b,c,d);
}
This obviously wont work, because at compile time, it will try to parse a string slice to str:
let a :u8; let b :i32; let c :i16; let d :f32; let e :&str;
let buffer = "02:98;abc,39.6";
let sep = [':',';',','];
scan_to_types!(buffer,sep,[u8,i32,&str,f32],a,b,e,d);
println!("{} {} {} {}",a,b,e,d);
$x = res[i].parse::<$y>().unwrap_or_default();
| ^^^^^ the trait `FromStr` is not implemented for `&str`
What i have tried
I have tried to compare types using TypeId, and a if else condition inside of the macro to skip the parsing, but the same situation happens, because it wont expand to a valid code:
macro_rules! scan_to_types {
($buffer:expr,$sep:expr,[$($y:ty),+],$($x:expr),+) => {
let res = splitter!($buffer,$sep);
let mut i = 0;
$(
if TypeId::of::<$y>() == TypeId::of::<&str>(){
$x = res[i];
}else{
$x = res[i].parse::<$y>().unwrap_or_default();
}
i+=1;
)*
};
}
Is there a way to set conditions or skip a repetition inside of a macro ? Or instead, is there a better aproach to build sscanf using macros ? I have already made functions which parse those strings, but i couldnt pass types as arguments, or make them generic.
Note before the answer: you probably don't want to emulate sscanf() in Rust. There are many very capable parsers in Rust, so you should probably use one of them.
Simple answer: the simplest way to address your problem is to replace the use of &str with String, which makes your macro compile and run. If your code is not performance-critical, that is probably all you need. If you care about performance and about avoiding allocation, read on.
A downside of String is that under the hood it copies the string data from the string you're scanning into a freshly allocated owned string. Your original approach of using an &str should have allowed for your &str to directly point into the data that was scanned, without any copying. Ideally we'd like to write something like this:
trait MyFromStr {
fn my_from_str(s: &str) -> Self;
}
// when called on a type that impls `FromStr`, use `parse()`
impl<T: FromStr + Default> MyFromStr for T {
fn my_from_str(s: &str) -> T {
s.parse().unwrap_or_default()
}
}
// when called on &str, just return it without copying
impl MyFromStr for &str {
fn my_from_str(s: &str) -> &str {
s
}
}
Unfortunately that doesn't compile, complaining of a "conflicting implementation of trait MyFromStr for &str", even though there is no conflict between the two implementations, as &str doesn't implement FromStr. But the way Rust currently works, a blanket implementation of a trait precludes manual implementations of the same trait, even on types not covered by the blanket impl.
In the future this will be resolved by specialization. Specialization is not yet part of stable Rust, and might not come to stable Rust for years, so we have to think of another solution. In case of macro usage, we can just let the compiler "specialize" for us by creating two traits with the same name. (This is similar to the autoref-based specialization invented by David Tolnay, but even simpler because it doesn't require autoref resolution to work, as we have the types provided explicitly.)
We create separate traits for parsed and unparsed values, and implement them as needed:
trait ParseFromStr {
fn my_from_str(s: &str) -> Self;
}
impl<T: FromStr + Default> ParseFromStr for T {
fn my_from_str(s: &str) -> T {
s.parse().unwrap_or_default()
}
}
pub trait StrFromStr {
fn my_from_str(s: &str) -> &str;
}
impl StrFromStr for &str {
fn my_from_str(s: &str) -> &str {
s
}
}
Then in the macro we just call <$y>::my_from_str() and let the compiler generate the correct code. Since macros are untyped, this works because we never need to provide a single "trait bound" that would disambiguate which my_from_str() we want. (Such a trait bound would require specialization.)
macro_rules! scan_to_types {
($buffer:expr,$sep:expr,[$($y:ty),+],$($x:expr),+) => {
#[allow(unused_assignments)]
{
let res = splitter!($buffer,$sep);
let mut i = 0;
$(
$x = <$y>::my_from_str(&res[i]);
i+=1;
)*
}
};
}
Complete example in the playground.
fn main() {
fn calculate_two_numbers<T:Debug, const N: usize>(data_set: [T;N]) -> (i32,i32) {
// Key Code
let a = &data_set[0]+&data_set[1];
println!("{:?},{:?},{:?}",&data_set[0],&data_set[1],a);
// Ignore
return (0,0)
}
let data = [1509,1857,1736,1815,1576];
let result = calculate_two_numbers(data);
}
I have a very simple function which takes a list of size n.
From this list, I want to take the first two variables and add them together. I then want to print all of them out with println!.
However, I get the error error[E0369]: cannot add &T to &T
This is the solution the complier suggests but I have trouble understanding it
fn calculate_two_numbers<T:Debug + std::ops::Add<Output = &T>, const N: usize>(data_set: [T;N])
Can someone explain what std::ops::Add<Output = &T> does?
In Rust, when using generic types, you need to tell the compiler what kind of operations you would want to do with the type. This is done be constraining the bounds of your type (You are already using it with Debug, it means you can use it as it were a debug type).
The compiler suggest to add the Add trait, but it will not really work straight away because of the references. You can either add a lifetime or use Copy (with would be probably desirable if you are gonna work with numbers), again add it as another bound:
use std::fmt::Debug;
use std::ops::Add;
fn calculate_two_numbers<T: Debug + Add::<Output=T> + Copy, const N: usize>(data_set: [T; N]) -> (i32, i32) {
// Key Code
let a = data_set[0] + data_set[1];
println!("{:?},{:?},{:?}", data_set[0], data_set[1], a);
// Ignore
return (0, 0);
}
fn main() {
let data = [1509, 1857, 1736, 1815, 1576];
let result = calculate_two_numbers(data);
}
Playground
If using the references approach, you need to specify that for any lifetime for your references that reference implements Add<Output=T>:
fn calculate_two_numbers<T: Debug, const N: usize>(data_set: [T; N]) -> (i32, i32)
where
for<'a> &'a T: Add<Output = T>,
{
// Key Code
let a = &data_set[0] + &data_set[1];
println!("{:?},{:?},{:?}", data_set[0], data_set[1], a);
// Ignore
return (0, 0);
}
Playground
I want to write a macro that generates varying structs from an integer argument. For example, make_struct!(3) might generate something like this:
pub struct MyStruct3 {
field_0: u32,
field_1: u32,
field_2: u32
}
What's the best way to transform that "3" literal into a number that I can use to generate code? Should I be using macro_rules! or a proc-macro?
You need a procedural attribute macro and quite a bit of pipework. An example implementation is on Github; bear in mind that it is pretty rough around the edges, but works pretty nicely to start with.
The aim is to have the following:
#[derivefields(u32, "field", 3)]
struct MyStruct {
foo: u32
}
transpile to:
struct MyStruct {
pub field_0: u32,
pub field_1: u32,
pub field_2: u32,
foo: u32
}
To do this, first, we're going to establish a couple of things. We're going to need a struct to easily store and retrieve our arguments:
struct MacroInput {
pub field_type: syn::Type,
pub field_name: String,
pub field_count: u64
}
The rest is pipework:
impl Parse for MacroInput {
fn parse(input: ParseStream) -> syn::Result<Self> {
let field_type = input.parse::<syn::Type>()?;
let _comma = input.parse::<syn::token::Comma>()?;
let field_name = input.parse::<syn::LitStr>()?;
let _comma = input.parse::<syn::token::Comma>()?;
let count = input.parse::<syn::LitInt>()?;
Ok(MacroInput {
field_type: field_type,
field_name: field_name.value(),
field_count: count.base10_parse().unwrap()
})
}
}
This defines syn::Parse on our struct and allows us to use syn::parse_macro_input!() to easily parse our arguments.
#[proc_macro_attribute]
pub fn derivefields(attr: TokenStream, item: TokenStream) -> TokenStream {
let input = syn::parse_macro_input!(attr as MacroInput);
let mut found_struct = false; // We actually need a struct
item.into_iter().map(|r| {
match &r {
&proc_macro::TokenTree::Ident(ref ident) if ident.to_string() == "struct" => { // react on keyword "struct" so we don't randomly modify non-structs
found_struct = true;
r
},
&proc_macro::TokenTree::Group(ref group) if group.delimiter() == proc_macro::Delimiter::Brace && found_struct == true => { // Opening brackets for the struct
let mut stream = proc_macro::TokenStream::new();
stream.extend((0..input.field_count).fold(vec![], |mut state:Vec<proc_macro::TokenStream>, i| {
let field_name_str = format!("{}_{}", input.field_name, i);
let field_name = Ident::new(&field_name_str, Span::call_site());
let field_type = input.field_type.clone();
state.push(quote!(pub #field_name: #field_type,
).into());
state
}).into_iter());
stream.extend(group.stream());
proc_macro::TokenTree::Group(
proc_macro::Group::new(
proc_macro::Delimiter::Brace,
stream
)
)
}
_ => r
}
}).collect()
}
The behavior of the modifier creates a new TokenStream and adds our fields first. This is extremely important; assume that the struct provided is struct Foo { bar: u8 }; appending last would cause a parse error due to a missing ,. Prepending allows us to not have to care about this, since a trailing comma in a struct is not a parse error.
Once we have this TokenStream, we successively extend() it with the generated tokens from quote::quote!(); this allows us to not have to build the token fragments ourselves. One gotcha is that the field name needs to be converted to an Ident (it gets quoted otherwise, which isn't something we want).
We then return this modified TokenStream as a TokenTree::Group to signify that this is indeed a block delimited by brackets.
In doing so, we also solved a few problems:
Since structs without named members (pub struct Foo(u32) for example) never actually have an opening bracket, this macro is a no-op for this
It will no-op any item that isn't a struct
It will also no-op structs without a member
With this sample code:
use std::fs::{File};
use std::io::{BufRead, BufReader};
use std::path::Path;
type BoxIter<T> = Box<Iterator<Item=T>>;
fn tokens_from_str<'a>(text: &'a str)
-> Box<Iterator<Item=String> + 'a> {
Box::new(text.lines().flat_map(|s|
s.split_whitespace().map(|s| s.to_string())
))
}
// Returns an iterator of an iterator. The use case is a very large file where
// each line is very long. The outer iterator goes over the file's lines.
// The inner iterator returns the words of each line.
pub fn tokens_from_path<P>(path_arg: P)
-> BoxIter<BoxIter<String>>
where P: AsRef<Path> {
let reader = reader_from_path(path_arg);
let iter = reader.lines()
.filter_map(|result| result.ok())
.map(|s| tokens_from_str(&s));
Box::new(iter)
}
fn reader_from_path<P>(path_arg: P) -> BufReader<File>
where P: AsRef<Path> {
let path = path_arg.as_ref();
let file = File::open(path).unwrap();
BufReader::new(file)
}
I get this compiler error message:
rustc 1.18.0 (03fc9d622 2017-06-06)
error: `s` does not live long enough
--> <anon>:23:35
|
23 | .map(|s| tokens_from_str(&s));
| ^- borrowed value only lives until here
| |
| does not live long enough
|
= note: borrowed value must be valid for the static lifetime...
My questions are:
How can this be fixed (without changing the function signatures, if possible?)
Any suggestions on better function arguments and return values?
One issue is that .split_whitespace() takes a reference, and doesn't own its content. So when you try to construct a SplitWhitespace object with an owned owned object (this happens when you call .map(|s| tokens_from_str(&s))), the string s is dropped while SplitWhitespace is still trying to reference it. I wrote a quick fix to this by creating a struct that takes ownership of the String and yields a SplitWhitespace on demand.
use std::fs::File;
use std::io::{BufRead, BufReader};
use std::path::Path;
use std::iter::IntoIterator;
use std::str::SplitWhitespace;
pub struct SplitWhitespaceOwned(String);
impl<'a> IntoIterator for &'a SplitWhitespaceOwned {
type Item = &'a str;
type IntoIter = SplitWhitespace<'a>;
fn into_iter(self) -> Self::IntoIter {
self.0.split_whitespace()
}
}
// Returns an iterator of an iterator. The use case is a very large file where
// each line is very long. The outer iterator goes over the file's lines.
// The inner iterator returns the words of each line.
pub fn tokens_from_path<P>(path_arg: P) -> Box<Iterator<Item = SplitWhitespaceOwned>>
where P: AsRef<Path>
{
let reader = reader_from_path(path_arg);
let iter = reader
.lines()
.filter_map(|result| result.ok())
.map(|s| SplitWhitespaceOwned(s));
Box::new(iter)
}
fn reader_from_path<P>(path_arg: P) -> BufReader<File>
where P: AsRef<Path>
{
let path = path_arg.as_ref();
let file = File::open(path).unwrap();
BufReader::new(file)
}
fn main() {
let t = tokens_from_path("test.txt");
for line in t {
for word in &line {
println!("{}", word);
}
}
}
Disclaimer: Frame Challenge
When dealing with huge files, the simplest solution is to use Memory Mapped Files.
That is, you tell the OS that you want the whole file to be accessible in memory, and it's up to it to deal with paging parts of the file in and out of memory.
Once this is down, then your whole file is accessible as a &[u8] or &str (at your convenience), and you can trivially access slices of it.
It may not always be the fastest solution; it certainly is the easiest.
The problem here is that you do use to_string() to turn each item into an owned value, this is done lazily. Since it's lazy, the value before to_string is used (a &str) still exists in the returned iterator's state, and thus is invalid (since the source String is dropped as soon as your map closure returns).
Naive solution
The simplest solution here is to remove the lazy evaluation for that part of the iterator, and simply allocate all tokens as soon as the line is allocated. This will not be as fast, and will involve an additional allocation, but has minimal changes from your current function, and keeps the same signature:
// Returns an iterator of an iterator. The use case is a very large file where
// each line is very long. The outer iterator goes over the file's lines.
// The inner iterator returns the words of each line.
pub fn tokens_from_path<P>(path_arg: P) -> BoxIter<BoxIter<String>>
where
P: AsRef<Path>
{
let reader = reader_from_path(path_arg);
let iter = reader.lines()
.filter_map(|result| result.ok())
.map(|s| {
let collected = tokens_from_str(&s).collect::<Vec<_>>();
Box::new(collected.into_iter()) as Box<Iterator<Item=String>>
});
Box::new(iter)
}
This solution will be fine for any small workload, and it will only allocate about twice as much memory for the line at the same time. There's a performance penalty, but unless you have 10mb+ lines, it probably won't matter.
If you do choose this solution, I'd recommend changing the function signature of tokens_from_path to directly return a BoxIter<String>:
pub fn tokens_from_path<P>(path_arg: P) -> BoxIter<String>
where
P: AsRef<Path>
{
let reader = reader_from_path(path_arg);
let iter = reader.lines()
.filter_map(|result| result.ok())
.flat_map(|s| {
let collected = tokens_from_str(&s).collect::<Vec<_>>();
Box::new(collected.into_iter()) as Box<Iterator<Item=String>>
});
Box::new(iter)
}
Alternative: decouple tokens_from_path and tokens_from_str
The original code doesn't work because you are trying to return borrows to a String which you don't return.
We can fix that by returning the String instead - just hidden behind an opaque API. This is pretty similar to breeden's solution, but slightly different in execution.
use std::fs::{File};
use std::io::{BufRead, BufReader};
use std::path::Path;
type BoxIter<T> = Box<Iterator<Item=T>>;
/// Structure representing in our code a line, but with an opaque API surface.
pub struct TokenIntermediate(String);
impl<'a> IntoIterator for &'a TokenIntermediate {
type Item = String;
type IntoIter = Box<Iterator<Item=String> + 'a>;
fn into_iter(self) -> Self::IntoIter {
// delegate to tokens_from_str
tokens_from_str(&self.0)
}
}
fn tokens_from_str<'a>(text: &'a str) -> Box<Iterator<Item=String> + 'a> {
Box::new(text.lines().flat_map(|s|
s.split_whitespace().map(|s| s.to_string())
))
}
// Returns an iterator of an iterator. The use case is a very large file where
// each line is very long. The outer iterator goes over the file's lines.
// The inner iterator returns the words of each line.
pub fn token_parts_from_path<P>(path_arg: P) -> BoxIter<TokenIntermediate>
where
P: AsRef<Path>
{
let reader = reader_from_path(path_arg);
let iter = reader.lines()
.filter_map(|result| result.ok())
.map(|s| TokenIntermediate(s));
Box::new(iter)
}
fn reader_from_path<P>(path_arg: P) -> BufReader<File>
where P: AsRef<Path> {
let path = path_arg.as_ref();
let file = File::open(path).unwrap();
BufReader::new(file)
}
As you notice, there's no difference to tokens_from_str, and tokens_from_path just returns this opaque TokenIntermediate struct. This will be just as usable as your original solution, all it does is push the ownership of your intermediate String values onto the caller, so they can iterate the tokens in them.
Rust slices do not currently support some iterator methods, i.e. take_while. What is the best way to implement take_while for slices?
const STRHELLO:&'static[u8] = b"HHHello";
fn main() {
let subslice:&[u8] = STRHELLO.iter().take_while(|c|(**c=='H' as u8)).collect();
println!("Expecting: {}, Got {}",STRHELLO.slice_to(3),subslice);
assert!(subslice==STRHELLO.slice_to(3));
}
results in the error:
<anon>:6:74: 6:83 error: the trait `core::iter::FromIterator<&u8>` is not implemented for the type `&[u8]`
This code in the playpen:
http://is.gd/1xkcUa
First of all, the issue you have is that collect is about creating a new collection, while a slice is about referencing a contiguous range of items in an existing array (be it dynamically allocated or not).
I am afraid that due to the nature of traits, the fact that the original container (STRHELLO) was a contiguous range has been lost, and cannot be reconstructed after the fact. I am also afraid that any use of "generic" iterators simply cannot lead to the desired output; the type system would have to somehow carry the fact that:
the original container was a contiguous range
the chain of operations performed so far conserve this property
This may be doable or not, but I do not see it done now, and I am unsure in what way it could be elegantly implemented.
On the other hand, you can go about it in the do-it-yourself way:
fn take_while<'a>(initial: &'a [u8], predicate: |&u8| -> bool) -> &'a [u8] { // '
let mut i = 0u;
for c in initial.iter() {
if predicate(c) { i += 1; } else { break; }
}
initial.slice_to(i)
}
And then:
fn main() {
let subslice: &[u8] = take_while(STRHELLO, |c|(*c==b'H'));
println!("Expecting: {}, Got {}",STRHELLO.slice_to(3), subslice);
assert!(subslice == STRHELLO.slice_to(3));
}
Note: 'H' as u8 can be rewritten as b'H' as show here, which is symmetric with the strings.
It is possible via some heavy gymnastics to implement this functionality using the stock iterators:
use std::raw::Slice;
use std::mem::transmute;
/// Splice together to slices of the same type that are contiguous in memory.
/// Panics if the slices aren't contiguous with "a" coming first.
/// i.e. slice b must follow slice a immediately in memory.
fn splice<'a>(a:&'a[u8], b:&'a[u8]) -> &'a[u8] {
unsafe {
let aa:Slice<u8> = transmute(a);
let bb:Slice<u8> = transmute(b);
let pa = aa.data as *const u8;
let pb = bb.data as *const u8;
let off = aa.len as int; // Risks overflow into negative!!!
assert!(pa.offset(off) == pb, "Slices were not contiguous!");
let cc = Slice{data:aa.data,len:aa.len+bb.len};
transmute(cc)
}
}
/// Wrapper around splice that lets you use None as a base case for fold
/// Will panic if the slices cannot be spliced! See splice.
fn splice_for_fold<'a>(oa:Option<&'a[u8]>, b:&'a[u8]) -> Option<&'a[u8]> {
match oa {
Some(a) => Some(splice(a,b)),
None => Some(b),
}
}
/// Implementaton using pure iterators
fn take_while<'a>(initial: &'a [u8],
predicate: |&u8| -> bool) -> Option<&'a [u8]> {
initial
.chunks(1)
.take_while(|x|(predicate(&x[0])))
.fold(None, splice_for_fold)
}
usage:
const STRHELLO:&'static[u8] = b"HHHello";
let subslice: &[u8] = super::take_while(STRHELLO, |c|(*c==b'H')).unwrap();
println!("Expecting: {}, Got {}",STRHELLO.slice_to(3), subslice);
assert!(subslice == STRHELLO.slice_to(3));
Matthieu's implementation is way cleaner if you just need take_while. I am posting this anyway since it may be a path towards solving the more general problem of using iterator functions on slices cleanly.