How to return different tokens during tokenization?

How to return different tokens during tokenization? - rust

I'm writing a compiler in Rust and, given a string, part of the logic is to find out of which "kind" the characters are.
I want to return the "value" of each character. For an input of 1 + 2 each character has a "token" and should return something like:
NumberToken, 1
WhiteSpaceToken, ' '
PlusToken, '+'
WhiteSpaceToken, ' '
NumberToken, 1
My function should return something like
enum SyntaxKind {
NumberToken,
WhiteSpaceToken,
PlusToken
}
struct SyntaxToken {
kind: SyntaxKind,
value: // Some general type
}
fn next_token(line: String) -> SyntaxToken {
// Logic goes here
}
How would I implement such logic?

If you're writing out a tokenizer and you wonder what such logic might look like, you can couple these values in the same enum, e.g:
#[derive(Debug)]
enum Token {
Add,
Sub,
Whitespace,
Number(f64),
}
For more, see The Rust Programming Language, "Defining an Enum" on adding data to variants.
… and then you can use a match inside of an iterator to handle it accordingly:
#[derive(Debug)]
enum Token {
Add,
Sub,
Whitespace,
Number(f64),
}
use std::str::Chars;
use std::iter::Peekable;
struct Tokens<'a> {
source: Peekable<Chars<'a>>,
}
pub type TokenIterator<'a> = Peekable<Tokens<'a>>;
impl<'a> Tokens<'a> {
pub fn new(s: &'a str) -> TokenIterator {
Self {
source: s.chars().peekable(),
}
.peekable()
}
}
impl<'a> Iterator for Tokens<'a> {
type Item = Token;
fn next(&mut self) -> Option<Self::Item> {
match self.source.next() {
Some(' ') => Some(Token::Whitespace),
Some('+') => Some(Token::Add),
Some('-') => Some(Token::Sub),
n # Some('0'..='9') => {
let mut number = String::from(n.unwrap());
while let Some(n) = self.source.next_if(char::is_ascii_digit) {
number.push(n);
}
Some(Token::Number(number.parse::<f64>().unwrap()))
}
Some(_) => unimplemented!(),
None => None,
}
}
}
fn main() {
let tokens = Tokens::new("1 + 2");
for token in tokens {
println!("{:?}", token);
}
}
This should then give you:
Number(1.0)
Whitespace
Add
Whitespace
Number(2.0)
Playground

Related

in expressions, `_` can only be used on the left-hand side of an assignment

I've the following code I implemented to return an error when a string doesn't contain a match to something like "Mark, 55" to return an error. The compiler complains about my code Err(error) => Err(ParsePersonError::ParseInt(_))
use std::num::ParseIntError;
use std::str::FromStr;
#[derive(Debug, PartialEq)]
struct Person {
name: String,
age: usize,
}
// We will use this error type for the `FromStr` implementation.
#[derive(Debug, PartialEq)]
enum ParsePersonError {
// Empty input string
Empty,
// Incorrect number of fields
BadLen,
// Empty name field
NoName,
// Wrapped error from parse::<usize>()
ParseInt(ParseIntError),
}
// My implementation
impl FromStr for Person {
type Err = ParsePersonError;
fn from_str(s: &str) -> Result<Person, Self::Err> {
if s.len() == 0 {
Err(ParsePersonError::Empty)
}
else {
let v: Vec<&str> = s.split(",").collect();
println!("{:?}",v);
if &v[0]== &""{
Err(ParsePersonError::NoName)
}
else if v.len()!=2 {
Err(ParsePersonError::BadLen)
}
else {
let num = match v[1].parse::<usize>() {
Ok(n) => {
let name = v[0].to_string();
let age = n;
return Ok(Person {name,age})
},
Err(error) => Err(ParsePersonError::ParseInt(_))
};
Err(ParsePersonError::ParseInt(_))
}
}
}
}
However, the compiler doesn't complain about the same code in my test case.
ParsePersonError::ParseInt(_))
#[test]
fn missing_name_and_age() {
assert!(matches!(
",".parse::<Person>(),
Err(ParsePersonError::NoName | ParsePersonError::ParseInt(_))
));
}
What am I missing?

If you de-sugar the match macro in test, it is something like
let result = ",".parse::<Person>();
let r = match result {
Err(ParsePersonError::NoName) => true,
Err(ParsePersonError::ParseInt(_)) => true,
_ => false,
};
assert!(r);
Essentially the '_' is to hold the value attached to ParseInt. Hence an assignment. This is what the compiler error message means.

How do I extract information about the type in a derive macro?

I am implementing a derive macro to reduce the amount of boilerplate I have to write for similar types.
I want the macro to operate on structs which have the following format:
#[derive(MyTrait)]
struct SomeStruct {
records: HashMap<Id, Record>
}
Calling the macro should generate an implementation like so:
impl MyTrait for SomeStruct {
fn foo(&self, id: Id) -> Record { ... }
}
So I understand how to generate the code using quote:
#[proc_macro_derive(MyTrait)]
pub fn derive_answer_fn(item: TokenStream) -> TokenStream {
...
let generated = quote!{
impl MyTrait for #struct_name {
fn foo(&self, id: #id_type) -> #record_type { ... }
}
}
...
}
But what is the best way to get #struct_name, #id_type and #record_type from the input token stream?

One way is to use the venial crate to parse the TokenStream.
use proc_macro2;
use quote::quote;
use venial;
#[proc_macro_derive(MyTrait)]
pub fn derive_answer_fn(item: proc_macro::TokenStream) -> proc_macro::TokenStream {
// Ensure it's deriving for a struct.
let s = match venial::parse_declaration(proc_macro2::TokenStream::from(item)) {
Ok(venial::Declaration::Struct(s)) => s,
Ok(_) => panic!("Can only derive this trait on a struct"),
Err(_) => panic!("Error parsing into valid Rust"),
};
let struct_name = s.name;
// Get the struct's first field.
let fields = s.fields;
let named_fields = match fields {
venial::StructFields::Named(named_fields) => named_fields,
_ => panic!("Expected a named field"),
};
let inners: Vec<(venial::NamedField, proc_macro2::Punct)> = named_fields.fields.inner;
if inners.len() != 1 {
panic!("Expected exactly one named field");
}
// Get the name and type of the first field.
let first_field_name = &inners[0].0.name;
let first_field_type = &inners[0].0.ty;
// Extract Id and Record from the type HashMap<Id, Record>
if first_field_type.tokens.len() != 6 {
panic!("Expected type T<R, S> for first named field");
}
let id = first_field_type.tokens[2].clone();
let record = first_field_type.tokens[4].clone();
// Implement MyTrait.
let generated = quote! {
impl MyTrait for #struct_name {
fn foo(&self, id: #id) -> #record { *self.#first_field_name.get(&id).unwrap() }
}
};
proc_macro::TokenStream::from(generated)
}

Which signature is most effective when using multiple conditions or Results? How to bubble errors correctly?

Introduction
I'm learning rust and have been trying to find the right signature for using multiple Results in a single function and then returning either correct value, or exit the program with a message.
So far I have 2 different methods and I'm trying to combine them.
Context
This is what I'm trying to achieve:
fn blur(image: DynamicImage, amount: &str) -> DynamicImage {
let amount = parse_between_or_error_out("blur", amount, 0.0, 10.0);
image.brighten(amount)
}
This is what I have working now, but would like to refactor.
fn blur(image: DynamicImage, amount: &str) -> DynamicImage {
match parse::<f32>(amount) {
Ok(amount) => {
verify_that_value_is_between("blur", amount, 0.0, 10.0);
image.blur(amount)
}
_ => {
println!("Error");
process::exit(1)
}
}
}
Combining these methods
Now here's the two working methods that I'm trying to combine, to achieve this.
fn parse<T: FromStr>(value: &str) -> Result<T, <T as FromStr>::Err> {
value.parse::<T>()
}
fn verify_that_value_is_between<T: PartialOrd + std::fmt::Display>(
name: &str,
amount: T,
minimum: T,
maximum: T,
) {
if amount > maximum || amount < minimum {
println!(
"Error: Expected {} amount to be between {} and {}",
name, minimum, maximum
);
process::exit(1)
};
println!("- Using {} of {:.1}/{}", name, amount, maximum);
}
Here's what I tried
I have tried the following. I realise I'm likely doing a range of things wrong. This is because I'm still learning Rust, and I'd like any feedback that helps me learn how to improve.
fn parse_between_or_error_out<T: PartialOrd + FromStr + std::fmt::Display>(
name: &str,
amount: &str,
minimum: T,
maximum: T,
) -> Result<T, <T as FromStr>::Err> {
fn error_and_exit() {
println!(
"Error: Expected {} amount to be between {} and {}",
name, minimum, maximum
);
process::exit(1);
}
match amount.parse::<T>() {
Ok(amount) => {
if amount > maximum || amount < minimum {
error_and_exit();
};
println!("- Using {} of {:.1}/{}", name, amount, maximum);
amount
}
_ => {
error_and_exit();
}
}
}
Currently this looks quite messy, probably I'm using too many or the wrong types and the error needs to be in two places (hence the inlined function, which I know is not good practice).
Full reproducible example.
The question
How to best combine logic that is using a Result and another condition (or Result), exit with a message or give T as a result?
Comments on any of the mistakes are making are very welcome too.

You can use a crate such as anyhow to bubble your events up and handle them as needed.
Alternatively, you can write your own trait and implement it on Result.
trait PrintAndExit<T> {
fn or_print_and_exit(&self) -> T;
}
Then use it by calling the method on any type that implements it:
fn try_get_value() -> Result<bool, MyError> {
MyError { msg: "Something went wrong".to_string() }
}
let some_result: Result<bool, MyError> = try_get_value();
let value: bool = some_result.or_print_and_exit();
// Exits with message: "Error: Something went wrong"
Implementing this trait on Result could be done with:
struct MyError {
msg: String,
}
impl<T> PrintAndExit<T> for Result<T, MyError> {
fn or_print_and_exit(&self) -> T {
match self {
Ok(val) => val,
Err(e) => {
println!("Error: {}", e.msg);
std::process::exit(1);
},
}
}
}

Here are a few DRY tricks.
tl;dr:
Convert other Errors into your unified error type(s) with impl From<ExxError> for MyError;
In any function that may result in an Error, use ? as much as you can. Return Result<???, MyError> (*). ? will utilize the implicit conversion.
(*) Only if MyError is an appropriate type for the function. Always create or use the most appropriate error types. (Kinda obvious, but people often treat error types as a second-class code, pun intended)
Recommendations are in the comments.
use std::error::Error;
use std::str::FromStr;
// Debug and Display are required by "impl Error" below.
#[derive(Debug)]
enum ProcessingError {
NumberFormat{ message: String },
NumberRange{ message: String },
ProcessingError{ message: String },
}
// Display will be used when the error is printed.
// No need to litter the business logic with error
// formatting code.
impl Display for ProcessingError {
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
match self {
ProcessingError::NumberFormat { message } =>
write!(f, "Number format error: {}", message),
ProcessingError::NumberRange { message } =>
write!(f, "Number range error: {}", message),
ProcessingError::ProcessingError { message } =>
write!(f, "Image processing error: {}", message),
}
}
}
impl Error for ProcessingError {}
// FromStr::Err will be implicitly converted into ProcessingError,
// when ProcessingError is needed. I guess this is what
// anyhow::Error does under the hood.
// Implement From<X> for ProcessingError for every X error type
// that your functions like process_image() may encounter.
impl From<FromStr::Err> for ProcessingError {
fn from(e: FromStr::Err) -> ProcessingError {
ProcessingError::NumberFormat { message: format!("{}", e) }
}
}
pub fn try_parse<T: FromStr>(value: &str) -> Result<T, ProcessingError> {
// Note ?. It will implicitly return
// Err(ProcessingError created from FromStr::Err)
Ok (
value.parse::<T>()?
)
}
// Now, we can have each function only report/handle errors that
// are relevant to it. ? magically eliminates meaningless code like
// match x { ..., Err(e) => Err(e) }.
pub fn parse_between<T>(value: &str, min_amount: T, max_amount: T)
-> Result<T, ProcessingError>
where
T: FromStr + PartialOrd + std::fmt::Display,
{
let amount = try_parse::<T>(value)?;
if amount > max_amount || amount < min_amount {
Err(ProcessingError::NumberRange {
message: format!(
"Expected value to be between {} and {} but received {}",
min_amount,
max_amount,
amount)
})
} else {
Ok(amount)
}
}
main.rs
use image::{DynamicImage};
use std::fmt::{Debug, Formatter, Display};
fn blur(image: DynamicImage, value: &str)
-> Result<DynamicImage, ProcessingError>
{
let min_amount = 0.0;
let max_amount = 10.0;
// Again, note ? in the end.
let amount = parse_between(value, min_amount, max_amount)?;
image.blur(amount)
}
// All processing extracted into a function, whose Error
// then can be handled by main().
fn process_image(image: DynamicImage, value: &str)
-> Result<DynamicImage, ProcessingError>
{
println!("applying blur {:.1}/{:.1}...", amount, max_amount);
image = blur(image, value);
// save image ...
image
}
fn main() {
let mut image = DynamicImage::new(...);
image = match process_image(image, "1") {
Ok(image) => image,
// No need to reuse print-and-exit functionality. I doubt
// you want to reuse it a lot.
// If you do, and then change your mind, you will have to
// root it out of all corners of your code. Better return a
// Result and let the caller decide what to do with errors.
// Here's a single point to process errors and exit() or do
// something else.
Err(e) => {
println!("Error processing image: {:?}", e);
std::process::exit(1);
}
}
}

Sharing my results
I'll share my results/answer as well for other people who are new to Rust. This answer is based on that of #Acidic9's answer.
The types seem to be fine
anyhow looks to be the de facto standard in Rust.
I should have used a trait and implement that trait for the Error type.
I believe the below example is close to what it might look like in the wild.
// main.rs
use image::{DynamicImage};
use app::{parse_between, PrintAndExit};
fn main() {
// mut image = ...
image = blur(image, "1")
// save image
}
fn blur(image: DynamicImage, value: &str) -> DynamicImage {
let min_amount = 0.0;
let max_amount = 10.0;
match parse_between(value, min_amount, max_amount).context("Input error") {
Ok(amount) => {
println!("applying blur {:.1}/{:.1}...", amount, max_amount);
image.blur(amount)
}
Err(error) => error.print_and_exit(),
}
}
And the implementation inside the apps library, using anyhow.
// lib.rs
use anyhow::{anyhow, Error, Result};
use std::str::FromStr;
pub trait Exit {
fn print_and_exit(self) -> !;
}
impl Exit for Error {
fn print_and_exit(self) -> ! {
eprintln!("{:#}", self);
std::process::exit(1);
}
}
pub fn try_parse<T: FromStr>(value: &str) -> Result<T, Error> {
match value.parse::<T>() {
Ok(value) => Ok(value),
Err(_) => Err(anyhow!("\"{}\" is not a valid value.", value)),
}
}
pub fn parse_between<T>(value: &str, min_amount: T, max_amount: T) -> Result<T, Error>
where
T: FromStr + PartialOrd + std::fmt::Display,
{
match try_parse::<T>(value) {
Ok(amount) => {
if amount > max_amount || amount < min_amount {
return Err(anyhow!(
"Expected value to be between {} and {} but received {}",
min_amount,
max_amount,
amount
));
};
Ok(amount)
}
Err(error) => Err(error),
}
}
Hopefully seeing this full implementation will help someone out there.
Source code.

How do I return an error within match statement while implementing from_str in rust?

I'm new to rust.
I'm trying to follow the example for implementing the from_str trait here
https://doc.rust-lang.org/std/str/trait.FromStr.html
But I keep getting this error pointing at 'return Err(Self::Err)'
variant or associated item not found in `black_jack_tools::PlayerDifficulty`
I have an idea of why, Self::Err isn't defined in my enum But I don't get why rust cares in this scenario since I'm returning an Err of my Err object which is inline with the Result<Self,Self::Err> type.
Here's my FromStr is below here's a link to the rust playground with an MRE
impl FromStr for PlayerDifficulty {
type Err = ParseError;
fn from_str(s:&str) -> Result<Self,Self::Err>{
let result = match s {
"Player" => Ok(PlayerDifficulty::Player),
"Dealer" => Ok(PlayerDifficulty::Dealer),
"Normal" => Ok(PlayerDifficulty::Normal),
"Perfect"=> Ok(PlayerDifficulty::Perfect),
"Micky" => Ok(PlayerDifficulty::Micky),
"Elliot" => Ok(PlayerDifficulty::Elliot),
"Cultist"=> Ok(PlayerDifficulty::Cultist),
_ => return Err(Self::Err)
};
}
}
What Am I doing wrong?
Is there a better way to do this?

There are three issues with your code. The first is that you need to use <Self as FromStr>::Err if you want to refer to the Err type in your FromStr implementation:
impl FromStr for PlayerDifficulty {
type Err = ParseError;
fn from_str(s:&str) -> Result<Self,Self::Err>{
let result = match s {
"Player" => Ok(PlayerDifficulty::Player),
/* ... */
_ => return Err(<Self as FromStr>::Err)
};
}
}
Self::Err tries to look for an Err variant in the PlayerDifficulty enum but there is no such variant.
The second issue is that std::string::ParseError is in fact an alias for std::convert::Infallible, which is an error that can never happen and cannot be instantiated. Since your conversion may fail, you need to use an error that can be instantiated or define your own:
struct UnknownDifficultyError;
impl FromStr for PlayerDifficulty {
type Err = UnknownDifficultyError;
fn from_str(s:&str) -> Result<Self,Self::Err>{
let result = match s {
"Player" => Ok(PlayerDifficulty::Player),
/* ... */
_ => return Err(UnknownDifficultyError),
};
}
}
Finally, you need to return the result even when conversion succeeds, by removing the let result = and the semicolon:
struct UnknownDifficultyError;
impl FromStr for PlayerDifficulty {
type Err = UnknownDifficultyError;
fn from_str(s:&str) -> Result<Self,Self::Err>{
match s {
"Player" => Ok(PlayerDifficulty::Player),
/* ... */
_ => return Err(UnknownDifficultyError),
}
}
}
Playground

The function will return it last statement. Remove the last semicolon, and you could also remove the internal return statement, the result of the match statement will be returned.
Is there a better way? It looks like you are parsing a string to a enum, the create enum-utils does that. Instead of implementing the parser with boilerplate code you just derive it.
#[derive(Debug, PartialEq, enum_utils::FromStr)]
enum PlayerDifficulty {
Player,
Dealer,
Cultist,
Normal,
}
fn main() {
let _x:PlayerDifficulty= "Player".parse().unwrap();
}
And in your cargo.toml
[dependencies]
enum-utils = "0.1.2"

You should define a custom error
#[derive(Debug)]
struct PlayerError;
impl std::fmt::Display for PlayerError {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "Could not parse player")
}
}
impl std::error::Error for PlayerError{}
Then change the match always return the Result in the same path
use std::str::FromStr;
impl FromStr for PlayerDifficulty {
type Err = PlayerError;
fn from_str(s:&str) -> Result<Self,Self::Err>{
match s {
"Player" => Ok(PlayerDifficulty::Player),
"Dealer" => Ok(PlayerDifficulty::Dealer),
"Normal" => Ok(PlayerDifficulty::Normal),
"Perfect"=> Ok(PlayerDifficulty::Perfect),
"Micky" => Ok(PlayerDifficulty::Micky),
"Elliot" => Ok(PlayerDifficulty::Elliot),
"Cultist"=> Ok(PlayerDifficulty::Cultist),
_ => Err(PlayerError)
}
}
}
And use it with ? to propagate error.
fn main() -> (Result<(),Box<dyn std::error::Error>>) {
let _x = PlayerDifficulty::from_str("Player")?;
let _x = PlayerDifficulty::from_str("PlayerPlayer")?;
Ok(())
}

Write a macro that invokes methods base on the input ident token [duplicate]

I'd like to create a setter/getter pair of functions where the names are automatically generated based on a shared component, but I couldn't find any example of macro rules generating a new name.
Is there a way to generate code like fn get_$iden() and SomeEnum::XX_GET_$enum_iden?

If you are using Rust >= 1.31.0 I would recommend using my paste crate which provides a stable way to create concatenated identifiers in a macro.
macro_rules! make_a_struct_and_getters {
($name:ident { $($field:ident),* }) => {
// Define the struct. This expands to:
//
// pub struct S {
// a: String,
// b: String,
// c: String,
// }
pub struct $name {
$(
$field: String,
)*
}
paste::item! {
// An impl block with getters. Stuff in [<...>] is concatenated
// together as one identifier. This expands to:
//
// impl S {
// pub fn get_a(&self) -> &str { &self.a }
// pub fn get_b(&self) -> &str { &self.b }
// pub fn get_c(&self) -> &str { &self.c }
// }
impl $name {
$(
pub fn [<get_ $field>](&self) -> &str {
&self.$field
}
)*
}
}
};
}
make_a_struct_and_getters!(S { a, b, c });

My mashup crate provides a stable way to create new identifiers that works with any Rust version >= 1.15.0.
#[macro_use]
extern crate mashup;
macro_rules! make_a_struct_and_getters {
($name:ident { $($field:ident),* }) => {
// Define the struct. This expands to:
//
// pub struct S {
// a: String,
// b: String,
// c: String,
// }
pub struct $name {
$(
$field: String,
)*
}
// Use mashup to define a substitution macro `m!` that replaces every
// occurrence of the tokens `"get" $field` in its input with the
// concatenated identifier `get_ $field`.
mashup! {
$(
m["get" $field] = get_ $field;
)*
}
// Invoke the substitution macro to build an impl block with getters.
// This expands to:
//
// impl S {
// pub fn get_a(&self) -> &str { &self.a }
// pub fn get_b(&self) -> &str { &self.b }
// pub fn get_c(&self) -> &str { &self.c }
// }
m! {
impl $name {
$(
pub fn "get" $field(&self) -> &str {
&self.$field
}
)*
}
}
}
}
make_a_struct_and_getters!(S { a, b, c });

No, not as of Rust 1.22.
If you can use nightly builds...
Yes: concat_idents!(get_, $iden) and such will allow you to create a new identifier.
But no: the parser doesn't allow macro calls everywhere, so many of the places you might have sought to do this won't work. In such cases, you are sadly on your own. fn concat_idents!(get_, $iden)(…) { … }, for example, won't work.

There's a little known crate gensym that can generate unique UUID names and pass them as the first argument to a macro, followed by a comma:
macro_rules! gen_fn {
($a:ty, $b:ty) => {
gensym::gensym!{ _gen_fn!{ $a, $b } }
};
}
macro_rules! _gen_fn {
($gensym:ident, $a:ty, $b:ty) => {
fn $gensym(a: $a, b: $b) {
unimplemented!()
}
};
}
mod test {
gen_fn!{ u64, u64 }
gen_fn!{ u64, u64 }
}
If all you need is a unique name, and you don't care what it is, that can be useful. I used it to solve a problem where each invocation of a macro needed to create a unique static to hold a singleton struct. I couldn't use paste, since I didn't have unique identifiers I could paste together in the first place.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

How to return different tokens during tokenization? - rust

Related

in expressions, `_` can only be used on the left-hand side of an assignment

How do I extract information about the type in a derive macro?

Which signature is most effective when using multiple conditions or Results? How to bubble errors correctly?

How do I return an error within match statement while implementing from_str in rust?

Write a macro that invokes methods base on the input ident token [duplicate]

Categories

Resources