ANTLR4: Use getText() in label of python

I'm currently having an issue with ANTLR4. I have previously worked with ANTLR4 and generated the classes in Java. There, whenever I had a label, I could call ctx.label.getText() to get the text of the label.
Now I'm trying to do the same thing in Python 3; however, it is not working.
For example, in this grammar, when I try to access value:
expression
: LPARENS expression RPARENS
| ...
| value=(INTEGER | FLOAT | BOOLEAN | STRING | HOLE)
;
When trying to access ctx.value.getText() it gives me the following error:
print(ctx.value.getText())
AttributeError: 'CommonToken' object has no attribute 'getText'
Since I'm pretty new to using ANTLR4 with Python, I was wondering what workaround exists for this.

When the label refers to a token, value=TOKEN, it's .text:
print(ctx.value.text)
When the label refers to a parser rule, value=expression, it is value.getText(), I believe.
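If the same visitor code has to handle both kinds of label, a small helper can hide the difference. The following is only a sketch: label_text is a hypothetical helper name, and FakeToken and FakeRuleContext are hypothetical stand-ins for the runtime's token and rule-context objects so the snippet runs without the antlr4 package installed.

```python
class FakeToken:
    """Hypothetical stand-in for a token label: exposes .text only."""

    def __init__(self, text):
        self.text = text


class FakeRuleContext:
    """Hypothetical stand-in for a rule-context label: exposes getText()."""

    def __init__(self, text):
        self._text = text

    def getText(self):
        return self._text


def label_text(node):
    """Return the matched text of a label, whether it is a token or a rule context."""
    if node is None:
        # Optional labels that did not match are None.
        return None
    get_text = getattr(node, "getText", None)
    if callable(get_text):
        # Rule contexts (and terminal nodes) provide getText().
        return get_text()
    # Plain tokens only provide the .text attribute.
    return node.text


print(label_text(FakeToken("42")))
print(label_text(FakeRuleContext("(1+2)")))
```

In a real listener or visitor the call would simply be label_text(ctx.value).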

Related

How can I use a keyword in git book in a string?

I'm documenting some code that includes a | operator, but since the pipe character is used for filters, I'm getting this error:
“Error:filter not found: quote”
The line itself is “ [somecode] | quote “
This code is being displayed in a table. I've tried <code> tags and ’’’ but it's not working. It still thinks I'm using an undefined filter. How can I fix this?

Why can a variable be formatted, but not an element of an array when using the println macro?

Say you have
let x: u8 = 1;
println!("{x}");
this works fine; however, if you instead have
let x: [u8; 1] = [1];
println!("{x[0]}");
then it throws the error
error: invalid format string: expected `'}'`, found `'['`
|
| println!("{x[0]}");
| - ^ expected `}` in format string
| |
| because of this opening brace
|
= note: if you intended to print `{`, you can escape it using `{{`
why is this?

The inline print functionality is only intended for variable names. To be honest, I rarely see this syntax used. Most people prefer println!("{}", x[0]) instead, and this is one of the main reasons why.
I can definitely see where they were coming from though. The {:?} or {:X} syntax might look weird, since the colon does not seem to serve a purpose in statements that print in debug or hexadecimal mode, but I suspect it was made to mirror fields in structs and function arguments. It starts to look more familiar when you write it with the inline variable and spacing: format!("{name: ?}"). Under this reasoning it makes more sense to only allow idents here (the token for an identifier, essentially the name of a variable/type/module/etc.). But this didn't really materialize (if it ever even was a thing), so we don't have this syntax.
Personally, I think they could have made it work, but you would end up with confusion about how :? (and other format specifiers) work with regard to expressions. For example, if people are taught that {x.foo()} will print the Display output of the expression x.foo(), then does that mean x.foo(): ? is also a valid expression? What about -3.0:3.0?? It kind of looks like a range in Python, but I have just worded it in a confusing way.
Edit: Found the RFC for this: https://rust-lang.github.io/rfcs/1618-ergonomic-format-args.html
Edit 2: I found a Rust forum post which better addresses your question (https://internals.rust-lang.org/t/how-to-allow-arbitrary-expressions-in-format-strings/15812). Their reasoning is as follows:
Curly braces in format strings are escaped with curly braces: format!("{{foo}}") prints {foo}. If arbitrary expressions were supported, parsing this would become ambiguous.
It's ambiguous when type ascription is enabled: format!("{foo:X}") could mean either type ascription or that the UpperHex trait should be used.
The ? operator could be easily confused with Debug formatting: "{foo?}" and "{foo:?}" look very similar.

ANTLR4 - How to parse content between same string values

I'm trying to write an ANTLR4 parser rule that can match the content between two arbitrary string values that are the same. So far I couldn't find a way to do it.
For example, in the input below, I need a rule to extract Hello and Bye. I'm not interested in extracting xyz, though.
TEXT Hello TEXT
TEXT1 Bye TEXT1
TEXT5 xyz TEXT8
As this is very similar to an XML element grammar, I tried the XML parser example from the ANTLR4 XML grammar, but it parses an input like <ABC> ... </XYZ> without error, which is not what I wanted.
I also tried using semantic predicates without much success.
Could anyone please help with a hint on how to match content that is embedded between identical strings?
Thank you!
Satheesh

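Not sure how this works out performance-wise, because of the many checks the parser has to do, but you could try something like this (the predicate is written for the Java target and compares the text of the two tokens; the labels are named open and close to stay clear of the context's own start and stop fields):
token:
open = IDENTIFIER WORD* close = IDENTIFIER { $open.text.equals($close.text) }?
;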
The part between the curly braces is a validating semantic predicate. The lexer tokens are self-explanatory, I believe.
The more I think about it, the more it seems better to just tokenize the input and write your own parser that processes the tokens and acts accordingly. That depends, of course, on the complexity of the syntax.
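A hand-written version of that idea can be very small. The sketch below assumes, as in the sample input from the question, that each element sits on its own line and that the delimiters and content are separated by whitespace; extract_between_matching is a hypothetical function name.

```python
def extract_between_matching(text):
    """Return the content of every line whose first and last words are identical."""
    results = []
    for line in text.splitlines():
        words = line.split()
        # A match needs an opening and a closing delimiter that are equal.
        if len(words) >= 2 and words[0] == words[-1]:
            results.append(" ".join(words[1:-1]))
    return results


sample = "TEXT Hello TEXT\nTEXT1 Bye TEXT1\nTEXT5 xyz TEXT8"
print(extract_between_matching(sample))  # ['Hello', 'Bye']
```

Lines whose delimiters differ, such as TEXT5 xyz TEXT8, are simply skipped; a real implementation might report them as errors instead.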

Why is this left-recursive and how do I fix it?

I'm learning ANTLR4 and I'm confused about one point. For a Java-like language, I'm trying to add rules for constructs like member chaining, something like this:
expr1.MethodCall(expr2).MethodCall(expr3);
I'm getting an error, saying that two of my rules are mutually left-recursive:
expression
: literal
| variableReference
| LPAREN expression RPAREN
| statementExpression
| memberAccess
;
memberAccess: expression DOT (methodCall | fieldReference);
I thought I understood why the above rule combination is considered left-recursive: because memberAccess is a candidate of expression and memberAccess starts with an expression.
However, my understanding broke down when I saw (by looking at the Java example) that if I just move the contents of memberAccess to expression, I got no errors from ANTLR4 (even though it still doesn't parse what I want, seems to fall into a loop):
expression
: literal
| variableReference
| LPAREN expression RPAREN
| statementExpression
| expression DOT (methodCall | fieldReference)
;
Why is the first example left-recursive but the second isn't?
And what do I have to do to actually parse the initial line?

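The second is left-recursive but not mutually left-recursive. ANTLR4 can eliminate direct left recursion (a rule that refers to itself as the first element of one of its own alternatives) with a built-in algorithm. It cannot eliminate mutual left recursion, where the recursion passes through another rule, as with expression and memberAccess in the first example. An algorithm for that probably exists, but it would hardly preserve actions and semantic predicates.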
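To see why direct left recursion is manageable, it can help to look at the classic hand rewrite: expression : expression DOT member | primary becomes primary (DOT member)*, that is, a loop. The Python sketch below illustrates that textbook rewrite only; it is not ANTLR4's actual algorithm, and the tokenizer, the tuple-shaped tree nodes and all function names are assumptions made for the example.

```python
import re

# Identifiers, parentheses and dots; leading whitespace is skipped.
TOKEN_RE = re.compile(r"\s*([A-Za-z_]\w*|[().])")


def tokenize(source):
    source = source.strip()
    tokens, pos = [], 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def expect(tokens, value):
    if not tokens or tokens[0] != value:
        raise SyntaxError(f"expected {value!r}")
    tokens.pop(0)


def parse_primary(tokens):
    """primary : NAME | '(' expression ')'"""
    if not tokens or tokens[0] in (".", ")"):
        raise SyntaxError("expected a name or '('")
    token = tokens.pop(0)
    if token == "(":
        inner = parse_expression(tokens)
        expect(tokens, ")")
        return inner
    return ("var", token)


def parse_expression(tokens):
    """expression : primary ('.' NAME ('(' expression ')')?)*"""
    node = parse_primary(tokens)
    # The loop replaces the left-recursive alternative: each iteration
    # wraps the tree built so far, which keeps the chain left-associative.
    while tokens and tokens[0] == ".":
        tokens.pop(0)
        if not tokens or tokens[0] in ("(", ")", "."):
            raise SyntaxError("expected a member name after '.'")
        name = tokens.pop(0)
        if tokens and tokens[0] == "(":
            tokens.pop(0)
            argument = parse_expression(tokens)
            expect(tokens, ")")
            node = ("call", node, name, argument)
        else:
            node = ("field", node, name)
    return node


tree = parse_expression(tokenize("expr1.MethodCall(expr2).MethodCall(expr3)"))
print(tree)
```

With mutual left recursion there is no single rule in which to place such a loop, which is one way to see why that case is harder.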

For some reason, ANTLRWorks 2 was not responding when my grammar had left-recursion, causing me to (erroneously) believe that my grammar was wrong.
Compiling and testing from commandline revealed that the version with immediate left-recursion did, in fact, compile and parse correctly.
(I'm leaving this here in case anyone else is confused by the behavior of the IDE.)

Handle a bunch of arbitrary literals using Groovy AST Transformation

I have been reading a lot about AST transformations these days, and now I want to handle some arbitrary literals not known to Groovy. The specific idea is to enable Groovy to handle plain SQL.
If you write select a from tab where b = 'x' in GroovyConsole, the AST looks like this at some point
MethodCall - this.select(a).from(tab).where((b = x))
With some effort, it should be possible to turn this into an SQL statement.
So everything is fine as long as I don't use an asterisk.
If I write select * from tab, no AST can be built (no matter what phase) and an error occurs:
Unable to produce AST for this phase due to earlier compilation error:
startup failed:
script123.groovy: 1: expecting EOF, found 'tab' @ line 1, column 15.
So here's the question:
How can I turn the asterisk into something processable?

I tried to get the CST after parsing was done. But it's always null and I couldn't find out why.
But I found another approach:
http://java.dzone.com/articles/run-your-antlr-dsl-groovy
