Is there a more beautiful way to convert a character literal to its corresponding escape character? - rust

I'm writing a parser:
match ch {
    // ...
    'b' => {
        token.push('\b');
        continue;
    },
    'f' => {
        token.push('\f');
        continue;
    },
    'n' => {
        token.push('\n');
        continue;
    },
    'r' => {
        token.push('\r');
        continue;
    },
    't' => {
        token.push('\t');
        continue;
    },
    // ...
},
There's a lot of repeating code, so I'm thinking about a more elegant way to do it. I thought something like this would be possible:
macro_rules! escaped_match {
    ($char:expr) => (
        '$char' => {
            token.push('\$char')
            continue;
        }
    )
}
But my hope is gone:
error: character literal may only contain one codepoint: '$
--> src/main.rs:3:9
|
3 | '$char' => {
| ^^
Is there a more beautiful way to do it, whether using macros, compiler plugins, hacks, or black magic?

Rust macros are not C macros — you cannot create invalid tokens and hope that they are valid sometime in the future. Likewise, they aren't a fancy way of concatenating strings that later get interpreted as code.
Looking at the code, it seems like the main repetition is in the push and continue. I'd probably use a normal function and pattern matching to DRY up that specific code:
fn escape_char(c: char) -> Option<char> {
    Some(match c {
        // 'b' => '\b',
        // 'f' => '\f',
        'n' => '\n',
        'r' => '\r',
        't' => '\t',
        _ => return None,
    })
}
fn main() {
    // ...
    if let Some(escape) = escape_char('b') {
        token.push(escape);
        continue;
    }
    // ...
}
Now the mapping is constrained to a single x => '\y' line.
Note that \b and \f aren't recognized escape codes in Rust; not sure what you are going to do for those.
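One possible way to handle the two missing escapes is to spell them as Unicode escapes: backspace is U+0008 and form feed is U+000C. The sketch below extends escape_char that way and wraps it in a hypothetical unescape driver (not from the question) so the push/continue flow can be run end to end. Keeping unknown escapes verbatim is an assumption; a real parser might report an error instead.

```rust
/// Maps the character following a backslash to the character it denotes.
/// Returns `None` for characters that are not recognized escapes.
fn escape_char(c: char) -> Option<char> {
    Some(match c {
        'b' => '\u{0008}', // backspace; Rust has no `\b` escape
        'f' => '\u{000C}', // form feed; Rust has no `\f` escape
        'n' => '\n',
        'r' => '\r',
        't' => '\t',
        _ => return None,
    })
}

/// Hypothetical driver: expands backslash escapes in `input`.
/// Unknown escapes are kept verbatim (an assumption, not the asker's rule).
fn unescape(input: &str) -> String {
    let mut token = String::new();
    let mut chars = input.chars();
    while let Some(ch) = chars.next() {
        if ch != '\\' {
            token.push(ch);
            continue;
        }
        match chars.next() {
            Some(next) => match escape_char(next) {
                Some(escaped) => token.push(escaped),
                None => {
                    token.push('\\');
                    token.push(next);
                }
            },
            // A trailing lone backslash is kept as-is.
            None => token.push('\\'),
        }
    }
    token
}

fn main() {
    println!("{:?}", unescape(r"a\tb\nc"));
}
```

The mapping still lives in one place, and each new escape costs a single line.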

Related

Zip iterables with Optional and Non Optional parameter in macro

For the testing part of my lexer, I came up with a simple macro that lets me define the expected token type (enum) and the token literal (string):
macro_rules! token_test {
    ($($ttype:ident: $literal:literal)*) => {
        {
            vec!($($ttype,)*).iter().zip(vec!($($literal,)*).iter())
        }
    }
}
and then I can use it like this:
for (ttype, literal) in token_test! {
    Let: "let" Identifier: "five" Assign: "=" Int: "5" Semicolon: ";"
} {
    //...
}
However, this is a little verbose, and we don't need to specify the literal for most of the tokens, since I have another macro that transforms an enum variant into a string (e.g. Let -> "let").
So what I hope to do is something like:
for (ttype, literal) in token_test! {
    Let Identifier: "five" Assign Int: "5" Semicolon
} {
    //...
}
And if I understood properly, I can use optional parameters to match either TYPE: LITERAL or TYPE. Maybe something like:
macro_rules! token_test {
    ($($ttype:ident$(: $literal:literal)?)*) => {
        {
            //...
        }
    }
}
So my question is: is there a way to build a Vec out of this?
To be clearer:
If no literal is passed, it should add the string representation of my enum (e.g. Let -> "let").
If a literal is passed, it should add the literal directly.
I made it work with the following macro (any improvement is welcome):
macro_rules! token_test {
    ($($ttype:ident$(: $literal:literal)?)*) => {
        vec!($($ttype,)*).iter().zip(vec!(
            $(
                {
                    let mut literal = $ttype.as_str().unwrap();
                    $(literal = $literal;)?
                    literal
                }
            ),*).iter())
    }
}
This 'iterates' over the macro arguments and initially sets a local literal variable to the result of as_str, which transforms an enum variant into a string. If $literal is present, the local value is overwritten with it. Finally, the block evaluates to the local literal variable.
Improvement
macro_rules! some_or_none {
    () => { None };
    ($entity:literal) => { Some($entity) }
}
macro_rules! token_test {
    ($($ttype:ident$(: $literal:literal)?)*) => {
        vec!($($ttype,)*).iter().zip(vec!($(
            some_or_none!($($literal)?).unwrap_or($ttype.as_str().unwrap())
        ),*))
    }
}
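This removes some unnecessary scopes and the second .iter(), and adds a some_or_none macro. Note that unwrap_or evaluates its argument eagerly, so $ttype.as_str().unwrap() still runs (and can still panic) even when a literal is provided; unwrap_or_else with a closure would defer that call.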
Further improvement
In the example above, two macros are provided. One is clearly a "private" macro, since it exists only to implement the other. However, there is a small catch in how macro exports work. Unlike functions, an exported macro cannot rely on a helper macro that is in scope where it was defined but is not accessible from the caller. See this playground example. This is not a problem if you don't intend to export the macro, which is plausible here since its only purpose is to be used in a test suite. However, you might still want to expose it publicly at the crate level without exposing some_or_none!. The conventional way to do this is to fold some_or_none! into the token_test! macro as an internal rule, marked by prefixing its name with #:
macro_rules! token_test {
    (#some_or_none) => {
        None
    };
    (#some_or_none $entity:literal) => {
        Some($entity)
    };
    ($($ttype:ident $(: $literal:literal)?)*) => {
        vec!($($ttype,)*)
            .iter()
            .zip(vec!($(
                token_test!(#some_or_none $($literal)?)
                    .unwrap_or($ttype.as_str().unwrap())
            ),*))
    };
}
With this version, you can safely export token_test! without any fears, as shown in this playground.
Little bit more
Original idea from steffahn on the Rust forum.
There is another, similar way to solve this without involving unwrap_or: instead of wrapping the literal in an Option in some_or_none, we can create two rules that take either TYPE: LITERAL or TYPE, like so:
macro_rules! token_test {
    (#ttype_or_literal $ttype:ident) => { $ttype.as_str().unwrap() };
    (#ttype_or_literal $ttype:ident: $literal:literal) => { $literal };
    ($($ttype:ident $(: $literal:literal)?)*) => {
        vec!($($ttype,)*)
            .iter()
            .zip(vec![$(token_test!(#ttype_or_literal $ttype$(: $literal)?)),*])
    };
}
And again
As I only need an iterable whose items can be destructured as (type, literal), an array of pairs is enough:
macro_rules! token_test {
    (#ttype_or_literal $ttype:ident) => { $ttype.as_str().unwrap() };
    (#ttype_or_literal $ttype:ident: $literal:literal) => { $literal };
    ($($ttype:ident $(: $literal:literal)?)*) => {
        [$(($ttype, token_test!(#ttype_or_literal $ttype$(: $literal)?))),*]
    };
}
so no more vec and no more zip.
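To see the array version run end to end, here is a minimal sketch. The TokenType enum, its as_str method, and the sample_tokens helper are hypothetical stand-ins for the asker's real types (the post only says that some macro turns a variant into a string). In this sketch as_str returns None for tokens without a fixed spelling, which is why those tokens must be given a literal.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum TokenType {
    Let,
    Identifier,
    Assign,
    Int,
    Semicolon,
}

impl TokenType {
    /// Hypothetical stand-in for the asker's enum-to-string helper:
    /// a fixed spelling for keywords and punctuation, `None` otherwise.
    fn as_str(&self) -> Option<&'static str> {
        match self {
            TokenType::Let => Some("let"),
            TokenType::Assign => Some("="),
            TokenType::Semicolon => Some(";"),
            TokenType::Identifier | TokenType::Int => None,
        }
    }
}

// The macro expands to bare variant names, so they must be in scope.
use TokenType::*;

macro_rules! token_test {
    // Internal rule: no literal given, fall back to the variant's spelling.
    (#ttype_or_literal $ttype:ident) => { $ttype.as_str().unwrap() };
    // Internal rule: a literal was given, so `as_str` is never called.
    (#ttype_or_literal $ttype:ident: $literal:literal) => { $literal };
    // Public rule: builds an array of (type, literal) pairs.
    ($($ttype:ident $(: $literal:literal)?)*) => {
        [$(($ttype, token_test!(#ttype_or_literal $ttype $(: $literal)?))),*]
    };
}

/// Hypothetical helper that expands the macro once for inspection.
fn sample_tokens() -> [(TokenType, &'static str); 5] {
    token_test!(Let Identifier: "five" Assign Int: "5" Semicolon)
}

fn main() {
    for (ttype, literal) in sample_tokens().iter() {
        println!("{:?} -> {:?}", ttype, literal);
    }
}
```

Because the literal branch never touches as_str, a token such as Identifier can be listed safely as long as it carries an explicit literal.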
A Smart trick
A user on the Rust forum gave this potential trick involving ignoring the second argument if it exists. I made the solution a little bit more compact by not having two macros:
macro_rules! token_test {
    (#ignore_second $value:expr $(, $_ignored:expr)? $(,)?) => { $value };
    ($($ttype:ident $(: $literal:literal)?)*) => {
        [$(($ttype, token_test!(#ignore_second $($literal,)? $ttype.as_str().unwrap()))),*]
    };
}

Match only valid UTF-8 characters

I'm writing an ncurses app with Rust.
When the user inputs a valid UTF-8 char (like ć, or some Asian letters), I want to build up a search string from it and print it to screen. Currently I have this:
use ncurses::*;
fn main() {
    ...
    let mut search_string = String::new();
    ...
    loop {
        let user_input = getch();
        match user_input {
            27 => break,
            KEY_UP => { ... },
            KEY_DOWN => { ... },
            KEY_BACKSPACE => { ... },
            _ => {
                search_string += &std::char::from_u32(user_input as u32).expect("Invalid char.").to_string();
                mvaddstr(0, 0, &search_string);
                app::autosearch();
            }
        }
    }
}
However, this catches all other keys, such as F5, KEY_LEFT, etc.
How can I match only valid UTF-8 letters?
If getch gives you a u8, you could collect subsequent key presses into a Vec<u8> and then call e.g. from_utf8 on each getch, handling the error as appropriate (see Utf8Error for more info).
In C, you could call get_wch() instead of getch() -- it returns KEY_CODE_YES for KEY_* codes, while the actual key is stored to an address passed as a parameter. But I don't know how this translates to Rust.
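The byte-collecting idea can be sketched without any ncurses dependency. Utf8Accumulator and feed_key below are hypothetical names, and the rule that codes above 255 are special keys rather than text bytes is an assumption about getch-style input, not documented ncurses behaviour. The sketch relies on Utf8Error::error_len, which returns None when the input merely ended in the middle of a character.

```rust
use std::str;

/// Collects raw bytes until they form one complete UTF-8 character.
#[derive(Default)]
struct Utf8Accumulator {
    pending: Vec<u8>,
}

impl Utf8Accumulator {
    /// Feeds one byte. Returns `Some(c)` once a full character is decoded;
    /// returns `None` while more bytes are needed or after an invalid
    /// sequence has been discarded.
    fn push(&mut self, byte: u8) -> Option<char> {
        self.pending.push(byte);
        match str::from_utf8(&self.pending) {
            Ok(s) => {
                let c = s.chars().next();
                self.pending.clear();
                c
            }
            // `error_len() == None`: the input stopped mid-character,
            // so keep the bytes and wait for the next one.
            Err(e) if e.error_len().is_none() => None,
            // Otherwise these bytes can never become valid UTF-8: drop them.
            Err(_) => {
                self.pending.clear();
                None
            }
        }
    }
}

/// Hypothetical glue for a `getch`-style `i32` code. Codes outside
/// 0..=255 are assumed to be special keys and are ignored here.
fn feed_key(acc: &mut Utf8Accumulator, code: i32) -> Option<char> {
    if (0..=255).contains(&code) {
        acc.push(code as u8)
    } else {
        None
    }
}

fn main() {
    let mut acc = Utf8Accumulator::default();
    let mut search_string = String::new();
    // 0xC4 0x87 is the UTF-8 encoding of 'ć'; 0x102 stands in for a special key.
    for &code in [0xC4, 0x87, 0x61, 0x102].iter() {
        if let Some(c) = feed_key(&mut acc, code) {
            search_string.push(c);
        }
    }
    println!("{}", search_string);
}
```

Control characters such as 27 (Escape) are valid one-byte UTF-8 too, so the existing match arms for them would still need to run before the fallback arm feeds the accumulator.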

Where will String::from("") be allocated in a match arm?

I am still very new to Rust, coming from the embedded C world.
If I have a piece of code like this:
match self {
    Command::AT => String::from("AT"),
    Command::GetManufacturerId => String::from("AT+CGMI"),
    Command::GetModelId => String::from("AT+CGMM"),
    Command::GetFWVersion => String::from("AT+CGMR"),
    Command::GetSerialNum => String::from("AT+CGSN"),
    Command::GetId => String::from("ATI9"),
    Command::SetGreetingText { ref enable, ref text } => {
        if *enable {
            if text.len() > 49 {
                // TODO: Error!
            }
            write!(buffer, "AT+CSGT={},{}", *enable as u8, text).unwrap();
        } else {
            write!(buffer, "AT+CSGT={}", *enable as u8).unwrap();
        }
        buffer
    },
    Command::GetGreetingText => String::from("AT+CSGT?"),
    Command::Store => String::from("AT&W0"),
    Command::ResetDefault => String::from("ATZ0"),
    Command::ResetFactory => String::from("AT+UFACTORY"),
    Command::SetDTR { ref value } => {
        write!(buffer, "AT&D{}", *value as u8).unwrap();
        buffer
    },
    Command::SetDSR { ref value } => {
        write!(buffer, "AT&S{}", *value as u8).unwrap();
        buffer
    },
    Command::SetEcho { ref enable } => {
        write!(buffer, "ATE{}", *enable as u8).unwrap();
        buffer
    },
    Command::GetEcho => String::from("ATE?"),
    Command::SetEscape { ref esc_char } => {
        write!(buffer, "ATS2={}", esc_char).unwrap();
        buffer
    },
    Command::GetEscape => String::from("ATS2?"),
    Command::SetTermination { ref line_term } => {
        write!(buffer, "ATS3={}", line_term).unwrap();
        buffer
    }
}
How does it work in Rust? Will all these match arms be evaluated immediately, or will only the matching one create a mutable copy on the stack? Also, will all the string literals within my String::from("") calls be allocated in .rodata?
Is there a better way of doing what I am trying to do here? Essentially, I want to return a string literal with replaced parameters (the write! macro bits).
Best regards
Only the matching arm is evaluated. The non-matching arms have no cost apart from the size of the program.
In the general case, it's not even possible to evaluate other arms, as they depend on data read using destructuring of the pattern.
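As for your second question: where string literals are stored is decided by the compiler and the target (on ELF targets they typically end up in a read-only section such as .rodata), and the language neither specifies nor guarantees it. Identical literals are usually deduplicated, but that is only an optimization.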
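On the "better way" part of the question, one option worth considering is to avoid allocating for the fixed commands at all. String::from copies the literal's bytes into a fresh heap buffer at run time, whereas Cow<'static, str> can hand back the literal itself and allocate only for the parameterised commands. The sketch below uses a trimmed-down, hypothetical Command enum and a hypothetical to_at_string method; the field types are assumptions, since the question does not show them.

```rust
use std::borrow::Cow;

/// Trimmed-down, hypothetical version of the asker's enum.
enum Command {
    At,
    GetManufacturerId,
    SetEcho { enable: bool },
    SetEscape { esc_char: u8 },
}

impl Command {
    /// Fixed commands borrow their literal; only parameterised ones allocate.
    fn to_at_string(&self) -> Cow<'static, str> {
        match self {
            Command::At => Cow::Borrowed("AT"),
            Command::GetManufacturerId => Cow::Borrowed("AT+CGMI"),
            // Only the arm that matches is evaluated, so `format!`
            // runs for these two variants and for no others.
            Command::SetEcho { enable } => Cow::Owned(format!("ATE{}", *enable as u8)),
            Command::SetEscape { esc_char } => Cow::Owned(format!("ATS2={}", esc_char)),
        }
    }
}

fn main() {
    let commands = [
        Command::At,
        Command::GetManufacturerId,
        Command::SetEcho { enable: true },
        Command::SetEscape { esc_char: 43 },
    ];
    for cmd in commands.iter() {
        println!("{}", cmd.to_at_string());
    }
}
```

Cow<str> dereferences to str, so callers that only need to read or transmit the command can treat both cases the same way.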

How would I append an Option<char> to a string?

I am using a match statement with .chars().next() and want to append a character to a string if it matches a certain character. I am trying to do so like this
keyword.push(line.chars().next())
but get an error:
expected type 'char' found type Option<<char>>
How would I go about appending this onto my string?
Well, that's the thing: because next() returns an Option<char>, it's possible that it returns None. You need to account for that scenario; otherwise you'll likely cause a panic and your application will exit.
So, the blind and error-prone way is to unwrap it:
keyword.push(line.chars().next().unwrap());
That will likely crash at some point. What you want is to destructure it and make sure there's something there:
match line.chars().next() {
    Some(c) => {
        if c == 'H' || c == 'W' {
            keyword.push(c);
        }
    },
    None => ()
}
As Shepmaster points out in the comments, the particular scenario above (where we only care about a single arm of the match) can be condensed to an if let binding:
if let Some(c) = line.chars().next() {
    if c == 'H' || c == 'W' {
        keyword.push(c);
    }
}
That said, if you want to check every character rather than just the first, you get the None handling for free by iterating with a for loop:
for c in line.chars() {
    if c == 'H' || c == 'W' {
        keyword.push(c);
    }
}
Playground example
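If only the first character matters, another compact form is possible, because Option<char> can be iterated (yielding zero or one item) and String implements Extend<char>. The function name push_first_if_marker below is hypothetical, and the 'H' / 'W' check simply mirrors the examples above.

```rust
/// Appends the first character of `line` to `keyword` when it is 'H' or 'W'.
fn push_first_if_marker(keyword: &mut String, line: &str) {
    // `filter` turns a non-matching `Some(c)` into `None`, and extending a
    // `String` with `None` appends nothing, so no explicit match is needed.
    keyword.extend(line.chars().next().filter(|&c| c == 'H' || c == 'W'));
}

fn main() {
    let mut keyword = String::new();
    for line in ["Hello", "", "xW", "World"].iter() {
        push_first_if_marker(&mut keyword, line);
    }
    println!("{}", keyword);
}
```

Unlike the for loop over line.chars(), this looks at the first character only, so a 'W' later in the line is ignored.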

What is the correct & idiomatic way to check if a string starts with a certain character in Rust?

I want to check whether a string starts with some chars:
for line in lines_of_text.split("\n").collect::<Vec<_>>().iter() {
let rendered = match line.char_at(0) {
'#' => {
// Heading
Cyan.paint(*line).to_string()
}
'>' => {
// Quotation
White.paint(*line).to_string()
}
'-' => {
// Inline list
Green.paint(*line).to_string()
}
'`' => {
// Code
White.paint(*line).to_string()
}
_ => (*line).to_string(),
};
println!("{:?}", rendered);
}
I've used char_at, but it reports an error due to its instability.
main.rs:49:29: 49:39 error: use of unstable library feature 'str_char': frequently replaced by the chars() iterator, this method may be removed or possibly renamed in the future; it is normally replaced by chars/char_indices iterators or by getting the first char from a subslice (see issue #27754)
main.rs:49 let rendered = match line.char_at(0) {
^~~~~~~~~~
I'm currently using Rust 1.5
The error message gives useful hints on what to do:
frequently replaced by the chars() iterator, this method may be removed or possibly renamed in the future; it is normally replaced by chars/char_indices iterators or by getting the first char from a subslice (see issue #27754)
We could follow the error text:
for line in lines_of_text.split("\n") {
    match line.chars().next() {
        Some('#') => println!("Heading"),
        Some('>') => println!("Quotation"),
        Some('-') => println!("Inline list"),
        Some('`') => println!("Code"),
        Some(_) => println!("Other"),
        None => println!("Empty string"),
    };
}
Note that this exposes an error condition you were not handling! What if there was no first character?
We could slice the string and then pattern match on string slices:
for line in lines_of_text.split("\n") {
    match &line[..1] {
        "#" => println!("Heading"),
        ">" => println!("Quotation"),
        "-" => println!("Inline list"),
        "`" => println!("Code"),
        _ => println!("Other")
    };
}
Slicing a string operates by bytes and thus this will panic if your first character isn't exactly 1 byte (a.k.a. an ASCII character). It will also panic if the string is empty. You can choose to avoid these panics:
for line in lines_of_text.split("\n") {
    match line.get(..1) {
        Some("#") => println!("Heading"),
        Some(">") => println!("Quotation"),
        Some("-") => println!("Inline list"),
        Some("`") => println!("Code"),
        _ => println!("Other"),
    };
}
We could use the method that is a direct match to your problem statement, str::starts_with:
for line in lines_of_text.split("\n") {
    if line.starts_with('#') { println!("Heading") }
    else if line.starts_with('>') { println!("Quotation") }
    else if line.starts_with('-') { println!("Inline list") }
    else if line.starts_with('`') { println!("Code") }
    else { println!("Other") }
}
Note that this solution doesn't panic if the string is empty or if the first character isn't ASCII. I'd probably pick this solution for those reasons. Putting the if bodies on the same line as the if statement is not normal Rust style, but I put it that way to leave it consistent with the other examples. You should look to see how separating them onto different lines looks.
As an aside, you don't need collect::<Vec<_>>().iter(); it is just inefficient. There's no reason to take an iterator, build a vector from it, then iterate over the vector. Just use the original iterator.
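For something that can be unit tested, the first-character check can also be pulled into a small function that returns a value instead of printing. The LineKind enum and the classify function below are hypothetical names used for illustration only. The sketch uses the chars().next() approach, so empty lines and non-ASCII first characters are handled without panicking, and it iterates with str::lines, which also tolerates "\r\n" line endings.

```rust
/// Hypothetical classification of a line by its first character.
#[derive(Debug, PartialEq)]
enum LineKind {
    Heading,
    Quotation,
    ListItem,
    Code,
    Other,
    Empty,
}

fn classify(line: &str) -> LineKind {
    match line.chars().next() {
        Some('#') => LineKind::Heading,
        Some('>') => LineKind::Quotation,
        Some('-') => LineKind::ListItem,
        Some('`') => LineKind::Code,
        Some(_) => LineKind::Other,
        // No first character at all: the line is empty.
        None => LineKind::Empty,
    }
}

fn main() {
    let lines_of_text = "# Title\n> quote\n\n- item\n`code`\nćma";
    // Iterate directly; there is no need to collect into a Vec first.
    for line in lines_of_text.lines() {
        println!("{:?}: {:?}", classify(line), line);
    }
}
```

Returning an enum keeps the colouring decision separate from the detection, so the match that paints each line can stay exhaustive.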
