I'm writing a parser in Rust, which needs at various points to match the current token against candidate values. Some of the candidate values are characters, others are integer constants, so the token is declared as i32, which would be plenty to accommodate both. (All the characters to be matched against are ASCII.)
The problem is that when I supply a character constant like '(' to be matched against, the compiler complains that it expected i32 and is getting char.
I tried writing e.g. '(' as i32 but an as expression is not allowed as a match candidate.
Obviously I could look up the ASCII values and provide them as numbers, but it seems there should be a more readable solution. Declaring the token as char doesn't really seem correct, as it sometimes needs to hold integers that are not actually characters.
What's the recommended way to solve this problem?
It’s a bit verbose, but your match arms could be of the form c if c == i32::from(b'(').
Another alternative would be to match on u8::try_from(some_i32), which returns a Result (branch arms Ok(b'(') and then either Err(_) if some_i32 == … or Err(_) => { match some_i32 { … } }).
Yet another would be to change the type from i32 to your own enum, which is probably the cleanest option but might require some convincing of the Rust compiler to get an i32-like representation if you need that for some reason.
Finally, you could define const PAREN_OPEN: i32 = b'(' as i32; and use PAREN_OPEN as the pattern.
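For illustration, here is a minimal sketch of the guard form and the u8::try_from form side by side. The token code TOK_NUMBER and the function names are made up for this example and are not part of the original parser.

```rust
use std::convert::TryFrom;

// Hypothetical non-character token code, assumed for illustration only.
const TOK_NUMBER: i32 = 256;

// Guard form: every character arm compares against the widened byte literal.
fn describe_guard(tok: i32) -> &'static str {
    match tok {
        c if c == i32::from(b'(') => "open paren",
        c if c == i32::from(b')') => "close paren",
        TOK_NUMBER => "number",
        _ => "other",
    }
}

// try_from form: tokens that fit in a u8 are matched as byte literals,
// everything else falls through to the Err arms.
fn describe_try_from(tok: i32) -> &'static str {
    match u8::try_from(tok) {
        Ok(b'(') => "open paren",
        Ok(b')') => "close paren",
        Ok(_) => "other byte",
        Err(_) if tok == TOK_NUMBER => "number",
        Err(_) => "other",
    }
}
```

The guard form keeps a single flat match, at the cost of the compiler not being able to check the character arms for overlap; the try_from form keeps real patterns for the ASCII cases.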
Since as expressions are allowed in constants, and matching is allowed against constants, you can use a constant:
const LPAREN: i32 = '(' as i32;
match v {
LPAREN => { ... }
// ...
}
If you can use nightly, you can use the inline_const_pat feature to reduce the boilerplate:
#![feature(inline_const_pat)]
match v {
const { '(' as i32 } => { ... }
// ...
}
Another way: here's a small proc macro that will replace the characters with their numerical value (it does not work with nested char patterns):
use proc_macro::TokenStream;
use quote::ToTokens;
#[proc_macro]
pub fn i32_match(input: TokenStream) -> TokenStream {
let mut input = syn::parse_macro_input!(input as syn::ExprMatch);
for arm in &mut input.arms {
if let syn::Pat::Lit(lit) = &mut arm.pat {
if let syn::Expr::Lit(syn::ExprLit { lit, .. }) = &mut *lit.expr {
if let syn::Lit::Char(ch) = lit {
*lit = syn::Lit::Int(syn::LitInt::new(
&(ch.value() as i32).to_string(),
ch.span(),
));
}
}
}
}
input.into_token_stream().into()
}
i32_match! {
match v {
'(' => { ... }
// ...
}
}
I am trying to convert Rc<Vec<F>> into Rc<Vec<T>> where T and F are numeric types like u8, f32, f64, etc. As the vectors may be quite large, I would like to avoid copying them if F and T are the same type. I have not managed to find out how to do that. Something like this (it does not compile, as the type comparison T == F is invalid):
fn convert_vec<F: num::NumCast + Copy, T: num::NumCast + Copy>(data: &[F], undef: T) -> Vec<T> {
data.iter()
.map(|v| match T::from(*v) {
Some(x) => x,
None => undef,
})
.collect()
}
fn convert_rc_vec<F: num::NumCast + Copy, T: num::NumCast + Copy>(
data: &Rc<Vec<F>>,
undef: T,
) -> anyhow::Result<Rc<Vec<T>>> {
if (T == F) { // invalid
Ok(data.clone()) // invalid
} else {
Ok(Rc::new(convert_vec(data, undef)))
}
}
The vector that I need to convert from is the response from a server which first sends the data type (something like "u8", "f32", "f64", ...) and then the actual data. At present, I store the vector with these data in enum like
pub enum Values {
UInt8(Rc<Vec<u8>>),
Float32(Rc<Vec<f32>>),
Float64(Rc<Vec<f64>>),
// ...
}
At compile time, I do not know in which format the server will send the data, i.e. I do not know F in advance. I do know T in every case I use it, but T might be a different type depending on the use case.
Using specialized functions like convert_rc_vec_to_f32 it is easy to handle the case where clone() is best. But that requires a separate function for each T with almost identical text. I am trying to find a more elegant solution than writing a macro or more or less repeating the code 9 times.
You should not try to prevent your function from being monomorphized with T and F being the same type, or even change its behavior in that case. Instead, you should not call it at all in that case. This is possible because, if T and F were the same type, you would know it at compile time, so you could simply remove the function call altogether.
It seems that you are actually storing all these vectors into an enum, which means you only know the actual type at run-time. But this doesn't mean my suggestion doesn't apply. Typically, if you wanted to get a vec of f32, you can do something like
match data {
Float32(v) => v,
Float64(v) => convert_rc_vec(v),
UInt8(v) => convert_rc_vec(v),
...
}
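A rough, std-only sketch of that match, written out for the f32 case. The method name to_f32 is hypothetical, and plain numeric casts stand in for num::NumCast, so there is no undef fallback here.

```rust
use std::rc::Rc;

pub enum Values {
    UInt8(Rc<Vec<u8>>),
    Float32(Rc<Vec<f32>>),
    Float64(Rc<Vec<f64>>),
}

impl Values {
    // Hypothetical helper: returns the data as f32, sharing the existing
    // allocation when it is already f32 and converting otherwise.
    pub fn to_f32(&self) -> Rc<Vec<f32>> {
        match self {
            // Same type: only the reference count is bumped, nothing is copied.
            Values::Float32(v) => Rc::clone(v),
            // Different type: build a new vector element by element.
            Values::UInt8(v) => Rc::new(v.iter().map(|&x| f32::from(x)).collect()),
            Values::Float64(v) => Rc::new(v.iter().map(|&x| x as f32).collect()),
        }
    }
}
```

One such method is still needed per target type, but each is a short match rather than a full copy of the conversion logic.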
If T and F both have a 'static lifetime, then you can use TypeId to compare the two types "at runtime":
if TypeId::of::<T>() == TypeId::of::<F>() {
Ok(data.clone()) // still invalid: the compiler sees Rc<Vec<F>>, not Rc<Vec<T>>
} else {
/* ... */
}
However, since this comparison happens "at runtime", the type system still doesn't know that T == F inside of this branch. You can use unsafe code to force this "conversion":
if TypeId::of::<T>() == TypeId::of::<F>() {
Ok(unsafe {
// SAFETY: this is sound because `T == F`, so we're
// just helping the compiler along here, with no actual
// type conversions
Rc::<Vec<T>>::from_raw(
Rc::<Vec<F>>::into_raw(data.clone()) as *const _
)
})
} else {
/* ... */
}
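If you would rather avoid unsafe, a similar effect can probably be had with std::any::Any, whose downcast_ref does the TypeId comparison internally and returns a correctly typed reference. This is a sketch of a different technique than the raw-pointer cast above; the convert closure parameter is an assumption that stands in for num::NumCast so that the example needs only std.

```rust
use std::any::Any;
use std::rc::Rc;

// Sketch: clone the Rc when F and T are the same type, convert otherwise.
// `convert` is a hypothetical stand-in for the NumCast-based conversion.
fn convert_rc_vec<F, T>(data: &Rc<Vec<F>>, convert: impl Fn(F) -> T) -> Rc<Vec<T>>
where
    F: Copy + 'static,
    T: 'static,
{
    // The downcast succeeds only if Rc<Vec<F>> and Rc<Vec<T>> are the same type.
    if let Some(same) = (data as &dyn Any).downcast_ref::<Rc<Vec<T>>>() {
        return Rc::clone(same);
    }
    // Different types: allocate a new vector and convert every element.
    Rc::new(data.iter().map(|v| convert(*v)).collect())
}
```

Calling it with F = T = f32 should return an Rc that points at the same allocation as the input, which can be checked with Rc::ptr_eq.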
Example code snippet:
fn foo() -> i32 {
let a = return 2;
a + 1
}
fn main() {
println!("{}", foo());
}
I would expect that since a will never actually get assigned anything, its type should be !. However, the compiler tells me that its type is in fact () (the unit type). This struck me as weird. What could be the reason behind this?
The type of return 42 is !:
break, continue and return expressions also have type !. For example we are allowed to write:
#![feature(never_type)]
let x: ! = {
return 123
};
From https://doc.rust-lang.org/std/primitive.never.html.
But one of the characteristics of ! is that it can be coerced to any type:
fn foo() -> i32 {
let a: String = return 2;
42
}
fn main() {
println!("{}", foo());
}
This is what enables things like
let num: u32 = match get_a_number() {
Some(num) => num,
None => break,
};
(from the same page).
Both branches must have the same type. num is clearly u32, and break is !. Both can then have the same type u32 by coercing break to u32.
This is perfectly fine because a value of type ! can never exist, so the compiler can "convert" it to any value of any other type.
Where the confusion arises in your example is that the compiler will claim "error[E0277]: cannot add i32 to ()". This is probably for historical reasons. ! didn't use to exist as such in the Rust 1.0 days. Over time it became more of a first class citizen, but some special cases were required for backwards compatibility where ! will be treated as () over any other type.
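As a small sketch of the coercion in practice: giving a an explicit type annotation lets the original foo compile, and the same mechanism is what makes an early return inside a match arm type-check. The function parse_plus_one is invented for this example.

```rust
#[allow(unreachable_code, unused_variables)]
fn foo() -> i32 {
    // `return 2` has type `!`; with the annotation it is coerced to i32
    // rather than being treated as `()`.
    let a: i32 = return 2;
    a + 1
}

// Hypothetical helper showing the same coercion in a match arm.
fn parse_plus_one(input: &str) -> i32 {
    let n: i32 = match input.parse() {
        Ok(n) => n,
        // This arm has type `!`, which coerces to the i32 the other arm yields.
        Err(_) => return -1,
    };
    n + 1
}
```

The compiler still warns that a + 1 in foo is unreachable, which is why the allow attribute is there.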
How do I match against a nested String in Rust? Suppose I have an enum like
pub enum TypeExpr {
Ident((String, Span)),
// other variants...
}
and a value lhs of type &Box<TypeExpr>. How do I check whether it is an Ident with the value "float"?
I tried
if let TypeExpr::Ident(("float", lhs_span)) = **lhs {}
but this doesn't work since TypeExpr contains a String, not a &str. I tried every variation of the pattern I could think of, but nothing seems to work.
If you really want to do this with an if let, you might have to do it like this (lhs is a &Box<TypeExpr>, so it is dereferenced twice and re-borrowed to get a &TypeExpr):
if let TypeExpr::Ident((lhs_name, lhs_span)) = &**lhs {
if lhs_name == "float" {
// Do the things
}
}
Of course, it can also be done with a match:
match &**lhs {
TypeExpr::Ident((lhs_name, lhs_span)) if lhs_name == "float" => {
// Do the things
}
_ => {}
}
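Put together as a self-contained sketch, the check can also be written with the matches! macro and a guard. The Span struct, the Other variant, and the function name is_float are placeholders made up for this example, since the question does not show them.

```rust
// Hypothetical stand-in for the real Span type.
pub struct Span {
    pub lo: usize,
    pub hi: usize,
}

pub enum TypeExpr {
    Ident((String, Span)),
    // Placeholder for the other variants.
    Other,
}

// `&**lhs` goes from &Box<TypeExpr> to &TypeExpr so the pattern can apply;
// the guard compares the borrowed String with a string slice.
fn is_float(lhs: &Box<TypeExpr>) -> bool {
    matches!(&**lhs, TypeExpr::Ident((name, _)) if name == "float")
}
```

String patterns cannot match a String directly, so the comparison has to live in a guard (or an inner if) either way.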
I am implementing a function-like procedural macro which takes a single string literal as an argument, but I don't know how to get the value of the string literal.
If I print the variable, it shows a bunch of fields, which includes both the type and the value. They are clearly there, somewhere. How do I get them?
extern crate proc_macro;
use proc_macro::{TokenStream,TokenTree};
#[proc_macro]
pub fn my_macro(input: TokenStream) -> TokenStream {
let input: Vec<TokenTree> = input.into_iter().collect();
let literal = match &input.get(0) {
Some(TokenTree::Literal(literal)) => literal,
_ => panic!()
};
// can't do anything with "literal"
// println!("{:?}", literal.lit.symbol); says "unknown field"
format!("{:?}", format!("{:?}", literal)).parse().unwrap()
}
#![feature(proc_macro_hygiene)]
extern crate macros;
fn main() {
let value = macros::my_macro!("hahaha");
println!("it is {}", value);
// prints "it is Literal { lit: Lit { kind: Str, symbol: "hahaha", suffix: None }, span: Span { lo: BytePos(100), hi: BytePos(108), ctxt: #0 } }"
}
After running into the same problem countless times already, I finally wrote a library to help with this: litrs on crates.io. It compiles faster than syn and lets you inspect your literals.
use std::convert::TryFrom;
use litrs::StringLit;
use proc_macro::TokenStream;
use quote::quote;
#[proc_macro]
pub fn my_macro(input: TokenStream) -> TokenStream {
let input = input.into_iter().collect::<Vec<_>>();
if input.len() != 1 {
let msg = format!("expected exactly one input token, got {}", input.len());
return quote! { compile_error!(#msg) }.into();
}
let string_lit = match StringLit::try_from(&input[0]) {
// Error if the token is not a string literal
Err(e) => return e.to_compile_error(),
Ok(lit) => lit,
};
// `StringLit::value` returns the actual string value represented by the
// literal. Quotes are removed and escape sequences replaced with the
// corresponding value.
let v = string_lit.value();
// TODO: implement your logic here
}
See the documentation of litrs for more information.
To obtain more information about a literal, litrs uses the Display impl of Literal to obtain a string representation (as it would be written in source code) and then parses that string. For example, if the string starts with 0x one knows it has to be an integer literal, if it starts with r#" one knows it is a raw string literal. The crate syn does exactly the same.
Of course, it seems a bit wasteful to write and run a second parser given that rustc already parsed the literal. Yes, that's unfortunate and having a better API in proc_macro would be preferable. But right now, I think litrs (or syn if you are using syn anyway) are the best solutions.
(PS: I'm usually not a fan of promoting one's own libraries on Stack Overflow, but I am very familiar with the problem OP is having and I very much think litrs is the best tool for the job right now.)
If you're writing procedural macros, I'd recommend that you look into using the crates syn (for parsing) and quote (for code generation) instead of using proc-macro directly, since those are generally easier to deal with.
In this case, you can use syn::parse_macro_input to parse a token stream into any syntactic element of Rust (such as literals, expressions, functions); it also takes care of error messages in case parsing fails.
You can use LitStr to represent a string literal, if that's exactly what you need. The .value() function will give you a String with the contents of that literal.
You can use quote::quote to generate the output of the macro, and use # to insert the contents of a variable into the generated code.
use proc_macro::TokenStream;
use syn::{parse_macro_input, LitStr};
use quote::quote;
#[proc_macro]
pub fn my_macro(input: TokenStream) -> TokenStream {
// macro input must be `LitStr`, which is a string literal.
// if not, a relevant error message will be generated.
let input = parse_macro_input!(input as LitStr);
// get value of the string literal.
let str_value = input.value();
// do something with value...
let str_value = str_value.to_uppercase();
// generate code, include `str_value` variable (automatically encodes
// `String` as a string literal in the generated code)
(quote!{
#str_value
}).into()
}
I always want a string literal, so I found this solution that is good enough. Literal implements ToString, which I can then use with .parse().
#[proc_macro]
pub fn my_macro(input: TokenStream) -> TokenStream {
let input: Vec<TokenTree> = input.into_iter().collect();
let value = match &input.get(0) {
Some(TokenTree::Literal(literal)) => literal.to_string(),
_ => panic!()
};
let str_value: String = value.parse().unwrap();
// do whatever
format!("{:?}", str_value).parse().unwrap()
}
I had a similar problem when parsing the doc attribute. It is also represented as a TokenStream. This is not an exact answer, but it may point you in the right direction:
fn from(value: &Vec<Attribute>) -> Vec<String> {
let mut lines = Vec::new();
for attr in value {
if !attr.path.is_ident("doc") {
continue;
}
if let Ok(Meta::NameValue(nv)) = attr.parse_meta() {
if let Lit::Str(lit) = nv.lit {
lines.push(lit.value());
}
}
}
lines
}
I tried to run the following code snippet:
let a = &[Some(1), Some(2), Some(3), None, Some(4)];
let mut sum = 0;
for &Some(x) in a.iter() {
sum += x;
}
assert_eq!(sum, 1+2+3+4);
The compiler replied with:
about_loops.rs:39:9: 43:18 error: non-exhaustive patterns: None not covered
about_loops.rs:39 for &Some(x) in a.iter() {
about_loops.rs:40 sum += x;
about_loops.rs:41 }
about_loops.rs:42
about_loops.rs:43 assert_eq!(sum, 1+2+3+4);
error: aborting due to previous error
make: *** [all] Error 101
Can I make such a construct compile for a for loop without using a match expression as suggested by luke and hobbs? Or is this error message misleading?
It does not seem so given the grammar definition of for.
for_expr : "for" pat "in" expr '{' block '}' ;
I'm on:
rustc 0.11.0-pre-nightly (6291955 2014-05-19 23:41:20 -0700)
host: x86_64-apple-darwin
To clarify: How expressive is the 'pat' portion of for_expr? This is not specified under http://doc.rust-lang.org/rust.html#for-expressions in contrast to the definition under http://doc.rust-lang.org/rust.html#match-expressions.
The pattern of a for loop essentially has the same restrictions as a let: it has to be irrefutable, that is, it can't ever fail to match.
Examples of irrefutable patterns are &, tuples, structs and single-variant enums. Other patterns (like multivariant enums or literals) aren't guaranteed to always match, since the type allows for values that aren't covered by the pattern.
The for construct is essentially a macro that desugars as follows (it desugars in the same pass as macros are expanded; you can see it by running rustc manually with --pretty expanded):
for <pattern> in <iter_expression> {
<code>
}
// becomes
match &mut <iter_expression> { // match to guarantee data lives long enough
it => {
loop {
match it.next() {
None => break,
Some(<pattern>) => { <code> }
}
}
}
}
That is a normal match, i.e. the arms have to be exhaustive (cover every possibility), and so if <pattern> is just &Some(_), then the Some(&None) possibility isn't covered.
(The Some arm is essentially equivalent to Some(value) => { let <pattern> = value; .... Thinking about it now, this might actually be a desugaring that gives better error messages: I filed #14390.)
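To make the missing case concrete, here is the desugared loop written out by hand with the uncovered arm added. This is only a sketch in current Rust syntax; the function name sum_some is made up.

```rust
fn sum_some(a: &[Option<i32>]) -> i32 {
    let mut sum = 0;
    let mut it = a.iter();
    loop {
        match it.next() {
            // Iterator exhausted.
            None => break,
            // The case the `for &Some(x)` pattern was written for.
            Some(&Some(x)) => sum += x,
            // The case that pattern leaves uncovered.
            Some(&None) => {}
        }
    }
    sum
}
```

With the third arm in place the match is exhaustive, which is exactly what a refutable pattern in the for header cannot express.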
Some is a variant of an enum. The Option enum has two variants, Some(T) and None. Your code assumes that every item from a.iter() is Some(T), and never checks for None. To add the check, you can use a match. Something like this:
let a = &[Some(1), Some(2), Some(3), None, Some(4)];
let mut sum = 0;
for &j in a.iter() {
match j {
Some(x) => sum += x,
None => ()
}
}
assert_eq!(sum, 1+2+3+4);
Hope that helps!
for is binding each element in a to the pattern &Some(x), so when the first element of a is &Some(1), x becomes 1. But None doesn't match the pattern &Some(x), so the binding can't succeed. Rust infers from the literal values that the elements of a are actually of type Option (the type that encompasses either Some(_) or None) and that your pattern doesn't cover all of the possibilities. Instead of waiting until runtime to tell you it doesn't know what to do, it reports an error at compile time.
From what little Rust I know (mostly having read the tutorial) I think you need to do something like:
for &thing in a.iter() {
match thing {
Some(x) => sum += x,
None => {} // do nothing
}
}
The following works as well:
use std::iter::AdditiveIterator;
fn main() {
let a = &[Some(1), Some(2), Some(3), None, Some(4)];
let sum = a.iter().filter_map(|x| *x).sum();
assert_eq!(sum, 1+2+3+4);
}
This also works:
let sum = a.iter().fold(0, |s, e| s + e.unwrap_or(0));
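Note that std::iter::AdditiveIterator dates from before Rust 1.0; in current Rust, sum is a method on Iterator itself. As a sketch of how the same thing might be written today (the function names are invented for this example):

```rust
// Option implements IntoIterator, so `flatten` skips the None entries.
fn sum_flatten(a: &[Option<i32>]) -> i32 {
    a.iter().flatten().sum()
}

// The same thing with an explicit loop: `if let` is allowed to be refutable,
// unlike the pattern in the `for` header.
fn sum_if_let(a: &[Option<i32>]) -> i32 {
    let mut sum = 0;
    for item in a {
        if let Some(x) = item {
            sum += x;
        }
    }
    sum
}
```

Both versions keep the for pattern irrefutable and move the Some/None distinction into a place where a failed match is allowed.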