Why does Rust treat '{{' and '}}' differently?

Because of escaping, I thought that given this code:
fn main() {
    println!("{}}");
    println!("{{}");
}
I would get an error message similar to unmatched '}' in format string for the first println! and unmatched '{' in format string for the second println!. However, I actually get the same error for both uses of println!:
error: invalid format string: unmatched `}` found
--> src/main.rs:2:17
|
2 | println!("{}}");
| ^ unmatched `}` in format string
|
= note: if you intended to print `}`, you can escape it using `}}`
error: invalid format string: unmatched `}` found
--> src/main.rs:3:17
|
3 | println!("{{}");
| ^ unmatched `}` in format string
|
= note: if you intended to print `}`, you can escape it using `}}`
This would imply that the first println! must take a format argument while the second doesn't. Why does it behave this way?

This is because the argument is a format string, not just any string, so both { and } are significant.
Note the escaping notation used here: {{ is an escape for a literal {, so a } on its own is a close brace without a corresponding open.
In the first case you have open, close, close, so the second close is unmatched.
In the second case you have a literal { followed by a close, and that close is unmatched.

It's simply due to how Rust parses the format string, from left to right. In {}}, it parses {, which calls for a matching }, which is found. Then it reaches the final } and reports an error. Similarly, with {{}, the first { calls for either another { (making an escaped literal {) or a closing }. It finds a {, then moves on to the next character, }, and reports an error. In both cases, the error is caused by the last }.
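To see the left-to-right parsing in action, here is a minimal sketch of format strings that do compile, using the doubled braces as escapes:

```rust
fn main() {
    // `{{` and `}}` are escapes for literal braces, parsed left to right.
    println!("{{}}"); // prints {}
    // `{{` (literal {), then `{}` (an argument), then `}}` (literal })
    println!("{{{}}}", 1); // prints {1}
}
```

The second call shows why parsing order matters: the leading `{{` is consumed as an escape first, so the following `{}` is free to act as an argument placeholder.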

Related

Is it possible to use separators other than comma in Rust's MacroMatch?

In the Rust reference I see that the definition of MacroMatch is as follows:
MacroMatch :
Token except $ and delimiters
| MacroMatcher
| $ ( IDENTIFIER_OR_KEYWORD except crate | RAW_IDENTIFIER | _ ) : MacroFragSpec
| $ ( MacroMatch+ ) MacroRepSep? MacroRepOp
MacroFragSpec :
block | expr | ident | item | lifetime | literal
| meta | pat | pat_param | path | stmt | tt | ty | vis
MacroRepSep :
Token except delimiters and MacroRepOp
MacroRepOp :
* | + | ?
According to the definition of tokens, I found that >> is a token. So, in my understanding, we can use any token as a MacroRepSep except the delimiters (), [], {} and the repetition operators *, +, ?.
However, the following code doesn't compile, with the error "$a:expr is followed by >>, which is not allowed for expr fragments":
macro_rules! add_list {
    ($($a:expr)>>*) => {
        0
        $(+$a)*
    }
}
pub fn main() {
    println!("{}", add_list!(1>>2>>3));
}
I wonder why, and if I can use a separator other than ,?
Not sure if you're using a different Rust version, but with your code on the current compiler (1.62) it outputs an error that includes what separators are available:
error: `$a:expr` is followed by `>>`, which is not allowed for `expr` fragments
--> src/main.rs:2:16
|
2 | ($($a:expr)>>*) => {
| ^^ not allowed after `expr` fragments
|
= note: allowed there are: `=>`, `,` or `;`
The problem with repetitions on exprs is that, since they are so varied, they can easily be ambiguous or could become ambiguous. I'll quote the section on Follow-set Ambiguity Restrictions:
The parser used by the macro system is reasonably powerful, but it is limited in order to prevent ambiguity in current or future versions of the language. In particular, in addition to the rule about ambiguous expansions, a nonterminal matched by a metavariable must be followed by a token which has been decided can be safely used after that kind of match.
As an example, a macro matcher like $i:expr [ , ] could in theory be accepted in Rust today, since [,] cannot be part of a legal expression and therefore the parse would always be unambiguous. However, because [ can start trailing expressions, [ is not a character which can safely be ruled out as coming after an expression. If [,] were accepted in a later version of Rust, this matcher would become ambiguous or would misparse, breaking working code. Matchers like $i:expr, or $i:expr; would be legal, however, because , and ; are legal expression separators.
And it includes the separators available for various fragment specifiers:
expr and stmt may only be followed by one of: =>, ,, or ;.
pat_param may only be followed by one of: =>, ,, =, |, if, or in.
pat may only be followed by one of: =>, ,, =, if, or in.
path and ty may only be followed by one of: =>, ,, =, |, ;, :, >, >>, [, {, as, where, or a macro variable of block fragment specifier.
vis may only be followed by one of: ,, an identifier other than a non-raw priv, any token that can begin a type, or a metavariable with an ident, ty, or path fragment specifier.
All other fragment specifiers have no restrictions.
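As a sketch of the fix, switching the repetition separator to one of the tokens allowed after expr fragments (here `,`) makes the macro from the question compile:

```rust
// Same macro as in the question, but with `,` as the repetition
// separator, which is allowed after `expr` fragments.
macro_rules! add_list {
    ($($a:expr),*) => {
        0 $(+ $a)*
    };
}

fn main() {
    println!("{}", add_list!(1, 2, 3)); // prints 6
}
```

With an empty invocation, `add_list!()` expands to just `0`, so the macro also works without any arguments.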

Mismatched input with binary operator parsing

I'm trying to parse an existing language in ANTLR that's currently being parsed using the Ruby library Parslet.
Here is a stripped down version of my grammar:
grammar FilterMin;
filter : condition_set;
condition_set: condition_set_type (property_condition)?;
condition_set_type: '=' | '^=';
property_condition: property_lhs CONDITION_SEPARATOR property_rhs;
property_lhs: QUOTED_STRING;
property_rhs: entity_rhs | contains_rhs;
contains_rhs: CONTAINS_OP '(' contains_value ')';
contains_value: QUOTED_STRING;
entity_rhs: NOT_OP? MATCH_OP? QUOTED_STRING;
// operators
MATCH_OP: '~';
NOT_OP: '^';
CONTAINS_OP: 'contains';
QUOTED_STRING: QUOTE STRING QUOTE;
STRING: (~['\\])*;
QUOTE: '\'';
CONDITION_SEPARATOR: ':';
This fails to parse both ='foo':'bar' and ='foo':contains('bar'), with the errors mismatched input ':' expecting ':' and mismatched input ':contains(' expecting ':' respectively.
Why aren't these inputs parsing?
Your STRING rule matches any sequence of characters that aren't backslashes or single quotes. So it overlaps with all of your other lexical rules except QUOTED_STRING. Since the lexer will always pick the rule that produces the longest match, and that's almost always STRING, your lexer will produce a bunch of STRING tokens and never any CONDITION_SEPARATOR tokens.
Since you never use STRING in your parser rules, it doesn't need to be an actual type of token. In fact, you never want STRING tokens to be generated, you only ever want it to be matched as part of a QUOTED_STRING token. Therefore it should be a fragment.
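A sketch of the fix: mark STRING (and QUOTE, which is likewise only used inside other lexer rules) as fragments so the lexer never emits them as standalone tokens:

```antlr
QUOTED_STRING: QUOTE STRING QUOTE;

// Fragments can only be referenced from other lexer rules and
// never produce tokens of their own.
fragment STRING: (~['\\])*;
fragment QUOTE: '\'';
```

With this change the lexer can only ever match the characters of STRING as part of a complete QUOTED_STRING token, so ':' is free to match CONDITION_SEPARATOR.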

ANTLR4: rule 'RULE' contains a closure with at least one alternative that can match an empty string

I am writing a file parser with ANTLR4. The file can have a number of blocks, which all begin and end with the keywords BEGIN and END. Here is a very simple example:
grammar test;
BEGIN: 'BEGIN';
END: 'END';
HEADER:'HEADER';
BODY: 'BODY';
file: block+;
ID: [A-Za-z];
NUM: [0-9];
block:
    | BEGIN HEAD statement* END HEAD
    | BEGIN BODY statement* END BODY
    ;
statement: ID '=' NUM;
The error that gets thrown is error(153): test.g4:8:0: rule file contains a closure with at least one alternative that can match an empty string, which I don't understand, since a block is never empty given the begin-end style. Does anyone see what I am missing here?
block can match the empty string because there's nothing between the colon and the first |. Then, in file, you use block+. This causes the error because you're applying + to something that can match the empty string, which could lead to an infinite loop that doesn't consume any input.
To fix this problem, just remove the first | in block.
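The corrected rule, with the leading empty alternative removed, would look like this (also using the HEADER token the grammar actually defines, on the assumption that HEAD was a typo in the question):

```antlr
block
    : BEGIN HEADER statement* END HEADER
    | BEGIN BODY statement* END BODY
    ;
```

Now every alternative of block consumes at least the BEGIN token, so block+ can no longer loop without consuming input.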

How to transform Go string literal code to its value?

I'm walking the syntax tree in Go, trying to find all calls to a particular function and then get its string argument (it's a file name, and should be a string literal, not any other identifier). I succeeded with this, and now I have an ast.BasicLit node with Kind == token.STRING, but its value is Go source code, not the value of the string it represents.
I found question that answers how to do reverse transformation - from string to go code it represents: golang: given a string, output an equivalent golang string literal
But I want the opposite - something like eval function (but just for Go string literals).
You can use strconv.Unquote() to do the conversion (unquoting).
One thing you should be aware of is that strconv.Unquote() can only unquote strings that are in quotes (i.e. that start and end with a quote char " or a back quote char `), so you have to add the quotes manually if the string doesn't have them.
Example:
fmt.Println(strconv.Unquote("Hi")) // Error: invalid syntax
fmt.Println(strconv.Unquote(`Hi`)) // Error: invalid syntax
fmt.Println(strconv.Unquote(`"Hi"`)) // Prints "Hi"
fmt.Println(strconv.Unquote(`"Hi\x21"`)) // Prints "Hi!"
// This will print 2 lines:
fmt.Println(strconv.Unquote(`"First line\nSecondline"`))
Output (try it on the Go Playground):
invalid syntax
invalid syntax
Hi <nil>
Hi! <nil>
First line
Secondline <nil>

How will I implement the lexing of strings using ocamllex?

I am new to the concept of lexing and am trying to write a lexer in ocaml to read the following example input:
(blue, 4, dog, 15)
Basically the input is a list of any random string or integer. I have found many examples for int based inputs as most of them model a calculator, but have not found any guidance through examples or the documentation for lexing strings. Here is what I have so far as my lexer:
(* File lexer.mll *)
{
  open Parser
}
rule lexer_main = parse
    [' ' '\r' '\t']   { lexer_main lexbuf }  (* skip blanks *)
  | ['0'-'9']+ as lxm { INT(int_of_string lxm) }
  | '('               { LPAREN }
  | ')'               { RPAREN }
  | ','               { COMMA }
  | eof               { EOF }
  | _                 { syntax_error "couldn't identify the token" }
As you can see, I am missing the ability to lex strings. I am aware that a string can be represented in the form ['a'-'z'], so would it be as simple as ['a'-'z'] { STRING }?
Thanks for your help.
The notation ['a'-'z'] represents a single character, not a string. So a string is more or less a sequence of one or more of those. I suspect that this is an assignment, so I'll just say that you can extend a pattern for a single character into a pattern for a sequence of the same kind of character using the same technique you're using for INT.
However, I wonder whether you really want your strings to be so restrictive. Are they really required to consist of alphabetic characters only?
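For concreteness, a sketch of the missing rule, assuming the parser defines a STRING token that carries the matched text (and keeping the caveat above that purely alphabetic strings may be too restrictive):

```ocaml
  | ['a'-'z']+ as lxm { STRING(lxm) }  (* one or more lowercase letters *)
```

This mirrors the INT rule: the + repeats the single-character class, and `as lxm` binds the whole matched lexeme.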
