I'm using ANTLR4 to build an AST. I downloaded the g4 file from: https://github.com/antlr/grammars-v4/tree/master/sqlite
and added these options at the head of the g4 file:
options{
output=AST;
ASTLabelType=CommonTree;
language=Java;
}
But when I compile the g4 file, the tool outputs:
ANTLR Tool v4.6 (D:\antlr-4.6-complete.jar)
SQLite.g4 -o C:\Users\macro\workspace\tdsql\target\generated-sources\antlr4 -listener -no-visitor -encoding UTF-8
warning(83): SQLite.g4:34:4: unsupported option output
warning(83): SQLite.g4:35:4: unsupported option ASTLabelType
Does ANTLR4 not support using ASTLabelType to build an AST? And how can I build an AST with ANTLR4?
I'm an Antlr newbie myself, so there are better-qualified people who can answer this. That said, the AST output option was removed between Antlr3 and Antlr4, which is why the tool reports "unsupported option". Antlr3 will generate an AST, but Antlr4 only builds a parse tree.
Your alternatives in Antlr4 are to use the Listener pattern (to walk the parse tree) or the Visitor pattern (to visit & evaluate nodes). Either - or both - of those can be used after running the Lexer and Parser.
There are a number of examples that can be found with some searching. Here's one for the Visitor pattern. This page compares Listeners and Visitors.
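To give a feel for the Visitor approach without the generated classes, here is a minimal plain-Python sketch. The node classes (Terminal, RuleNode), the rule names (select_stmt, column_list, table_name) and the Select AST class are all stand-ins I made up for illustration; in a real project the parse-tree contexts and the visitor base class come from the code ANTLR generates for your grammar.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical stand-ins for parse-tree nodes; ANTLR would generate the real ones.
@dataclass
class Terminal:
    text: str

    def accept(self, visitor):
        return visitor.visitTerminal(self)


@dataclass
class RuleNode:
    rule: str                                   # rule name, e.g. "select_stmt"
    children: list = field(default_factory=list)

    def accept(self, visitor):
        # Dispatch to visit_<rule> if the visitor defines it, else visit children.
        method = getattr(visitor, "visit_" + self.rule, visitor.visitChildren)
        return method(self)


# The hand-written AST we actually want (hypothetical shape).
@dataclass
class Select:
    columns: List[str]
    table: str


class AstBuilder:
    def visit(self, node):
        return node.accept(self)

    def visitChildren(self, node):
        # Default behaviour: visit every child, return the last result.
        result = None
        for child in node.children:
            result = child.accept(self)
        return result

    def visitTerminal(self, node):
        return node.text

    def visit_select_stmt(self, node):
        # Assumed child layout: SELECT column_list FROM table_name
        columns = self.visit(node.children[1])
        table = self.visit(node.children[3])
        return Select(columns, table)

    def visit_column_list(self, node):
        # Keep the column names, drop the ',' separators.
        return [self.visit(child) for child in node.children if child.text != ","]


tree = RuleNode("select_stmt", [
    Terminal("SELECT"),
    RuleNode("column_list", [Terminal("a"), Terminal(","), Terminal("b")]),
    Terminal("FROM"),
    RuleNode("table_name", [Terminal("t")]),
])
ast = AstBuilder().visit(tree)
print(ast)
```

The point is only the shape: the parse tree keeps every token, and the visitor decides what survives into the AST.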
I am new to ANTLR, and I am digging into it for a project. My work requires me to generate a parse tree from a source code file and convert the parse tree into a string that holds all the information about the parse tree in a somewhat "human-readable" form. Parts of this string (representing the parse tree) will then be modified, and the modified string will have to be converted back into (changed) source code.
I have found out that the .toStringTree(tree) method can be used in ANTLR to print out the tree in LISP format. Is there a better way to represent the parse tree as a string that holds all information?
Can the string-parse-tree be reverted back to the original source code (in the same language) using ANTLR? If no, are there any tools for this?
Can the string-parse-tree be reverted back to the original source code (in the same language) using ANTLR?
That string does not contain the token types, just the matched text. In other words: you cannot recreate a parse tree from the output of toStringTree. Besides, many ANTLR grammars have lexer rules that skip certain input (whitespace and line breaks, for example), so converting a parse tree back to the original input source is not always possible.
If no, are there any tools for this?
Without a doubt, I suggest you do a search on GitHub. But when you have the parse tree, it is trivial to create a custom tree structure and convert that to JSON.
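As a rough sketch of the "custom tree structure to JSON" idea: the Token, Leaf and Rule classes below are stand-ins I wrote so the snippet runs on its own, and the token types are shown as names for readability. With the real Python runtime you would walk the generated contexts instead (as far as I know, the rule name is available via parser.ruleNames[ctx.getRuleIndex()] and a terminal node's token via getSymbol()).

```python
import json


# Stand-in classes (hypothetical); the real ones come from the ANTLR runtime.
class Token:
    def __init__(self, type, text, line, column):
        self.type, self.text, self.line, self.column = type, text, line, column


class Leaf:
    def __init__(self, token):
        self.symbol = token

    def getChildCount(self):
        return 0


class Rule:
    def __init__(self, name, *children):
        self.name = name
        self.children = list(children)

    def getChildCount(self):
        return len(self.children)

    def getChild(self, i):
        return self.children[i]


def to_dict(node):
    """Convert a parse tree into nested dicts, keeping token type and position."""
    if hasattr(node, "symbol"):
        tok = node.symbol
        return {"type": tok.type, "text": tok.text,
                "line": tok.line, "column": tok.column}
    return {"rule": node.name,
            "children": [to_dict(node.getChild(i))
                         for i in range(node.getChildCount())]}


tree = Rule("program",
            Leaf(Token("IDENTIFIER", "foo", 1, 0)),
            Leaf(Token("IDENTIFIER", "bar", 1, 4)))
as_json = json.dumps(to_dict(tree), indent=2)
print(as_json)
```

Unlike the LISP-style string, this keeps the token types and positions, so it can be loaded back with json.loads and edited. Skipped input (whitespace and so on) is still missing from it.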
To demonstrate the problem, I'm going to create a simple grammar to merely detect Python-like variables.
I create a virtual environment and install antlr4-python3-runtime in it, as mentioned in "Where can I get the runtime?":
Then, I create a PyVar.g4 file with the following content:
grammar PyVar;
program: IDENTIFIER+;
IDENTIFIER: [a-zA-Z_][a-zA-Z0-9_]*;
NEWLINE: '\n' | '\r\n';
WHITESPACE: [ ]+ -> skip;
Now if I test the grammar with grun, I can see that the grammar detects the variables just fine:
Now I'm trying to write a parser in Python to do just that. I generate the Lexer and Parser, using this command:
antlr4 -Dlanguage=Python3 PyVar.g4
And they're generated with no errors:
But when I use the example provided in "How do I run the generated lexer and/or parser?", I get no output:
What am I not doing right?
There are two problems here.
1. The grammar:
In the line where I had,
program: IDENTIFIER+;
the parser will only match one or more variables, and it will not match any newline. The output you see when running grun is the output created by the lexer, which is why newlines are present in the tokens. So I had to replace it with something like this for the parser to accept newlines:
program: (IDENTIFIER | NEWLINE)+;
2. Printing the output of the parser
In the PyVar.py file, I created a tree with this line:
tree = parser.program()
But that line doesn't print anything, and I didn't know how to print the tree. The OP's comment on this accepted answer suggests using tree.toStringTree().
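To make the first problem concrete, here is a plain-Python imitation of the three lexer rules and of the two versions of the program rule. This is not ANTLR and not the generated parser, just a sketch with helper names I made up (tokenize, match_program), to show that the lexer emits NEWLINE tokens while the original parser rule stops at the first one.

```python
import re

# Same token definitions as in PyVar.g4, written as Python regexes.
TOKEN_SPEC = [
    ("IDENTIFIER", r"[a-zA-Z_][a-zA-Z0-9_]*"),
    ("NEWLINE", r"\r\n|\n"),
    ("WHITESPACE", r"[ ]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Imitate the lexer: emit IDENTIFIER and NEWLINE, skip WHITESPACE."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        if match.lastgroup != "WHITESPACE":      # WHITESPACE -> skip
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def match_program(tokens, allowed):
    """Imitate a rule of the form (A | B)+ : consume tokens while their type is allowed."""
    matched = []
    for kind, text in tokens:
        if kind not in allowed:
            break
        matched.append((kind, text))
    return matched


tokens = tokenize("foo bar\nbaz\n")
original = match_program(tokens, {"IDENTIFIER"})            # program: IDENTIFIER+;
fixed = match_program(tokens, {"IDENTIFIER", "NEWLINE"})    # program: (IDENTIFIER | NEWLINE)+;
print(len(tokens), len(original), len(fixed))
```

With the original rule only the identifiers before the first newline are consumed; with the fixed rule the whole token list is.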
Now if we fix those, we can see that it works:
I'm trying to handle a compiler directive in a Verilog parser that gives me the real file name/path and the real current line in the non-preprocessed file.
Verilog needs a preprocessing pass, which I already have, but during the visit I have to know the current file name (which can't change by the `include directive) and the real current line in the non-preprocessed file.
The preprocessing pass adds the Verilog `line directive, which indicates the current file and line.
Then I send the preprocessed buffer to the ANTLR lexer, parse it, and extract all the Verilog information with a visitor. I have to keep the `line compiler directive in the Verilog grammar description:
Preprocessing_line
: '`line ' Decimal_number String Decimal_number '\n' -> channel(2)
;
Now, I don't know how to get at the information on this dedicated channel at any point in the visitor. The target language for this parser is Python3.
Given that the Preprocessing_line tokens may not have a reliable relation to the parse-tree tokens (different Verilog compilers can be a bit loose about where they inject the reference lines), the easiest solution is to create a temporary index prior to the visitor walk.
That is, after parsing the pre-processed Verilog source, do a quick pass over the entire token stream (BufferedTokenStream#getTokens), picking out the Preprocessing_line tokens, and building a current_line -> original_line index.
Then, in any visited context, examine the underlying token(s) (ParserRuleContext#getStart, #getStop, #getSourceInterval) to find their current_line (Token#getLine), and look that line up in the index to get the original file and line.
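The index described above could look roughly like this in Python. It is only a sketch under a few assumptions: the input is a list of (preprocessed_line, text) pairs taken from the Preprocessing_line tokens (with the Python runtime I would expect to collect them from the token stream after parsing, filtering on the token's channel == 2 and reading its line and text attributes), and a `line directive is taken to mean that the line right after it is line N of the named file. build_line_index and original_position are names I made up.

```python
import bisect
import re

# `line <number> "<filename>" <level>
LINE_DIRECTIVE = re.compile(r'`line\s*(\d+)\s*"([^"]*)"\s*(\d+)')


def build_line_index(directive_tokens):
    """Build a sorted list of (first_preprocessed_line, filename, original_line)."""
    index = []
    for pre_line, text in directive_tokens:
        match = LINE_DIRECTIVE.match(text)
        if match:
            # Assumption: the line following the directive is original line N.
            index.append((pre_line + 1, match.group(2), int(match.group(1))))
    index.sort()
    return index


def original_position(index, pre_line):
    """Map a line of the preprocessed buffer to (filename, original_line)."""
    starts = [entry[0] for entry in index]
    i = bisect.bisect_right(starts, pre_line) - 1
    if i < 0:
        return None                      # before the first `line directive
    start, filename, original_line = index[i]
    return filename, original_line + (pre_line - start)


# Made-up directive tokens: (line in the preprocessed buffer, token text).
directives = [
    (1, '`line 1 "top.v" 0\n'),
    (4, '`line 1 "defs.vh" 1\n'),
    (7, '`line 3 "top.v" 2\n'),
]
index = build_line_index(directives)
print(original_position(index, 6))
print(original_position(index, 9))
```

In the visitor, the lookup key would be the line of the context's start token. The index is built once before the walk, so each lookup is just a binary search.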
I'm working on a math word problem solver, and would like to pass whole problems to my GATE Embedded application using JAPE. I'm using GATE IDE to display the output, as well as run the pipeline of GATE components. Each problem will be in its own paragraph, and each document will have several problems on it.
Is there a way to match any paragraph using the JAPE left-hand side regex?
I see three options here (there may be more elegant solutions):
1) Use a simple rule like:
Phase: find
Input: Token
Options: control = once
Rule:OneToken
(
{Token}
)
In the RHS you can get the document text and use a standard Java approach for splitting plain text into paragraphs.
2) Use the LHS (if you really want to use only the LHS)
Rule: NewLine
(
({SpaceToken.string=="\n"}) |
({SpaceToken.string=="\r"}) |
({SpaceToken.string=="\n"}{SpaceToken.string=="\r"}) |
({SpaceToken.string=="\r"}{SpaceToken.string=="\n"})
):left
Build a NewLine annotation, then write a JAPE rule similar to 1) but with NewLine instead of Token. Take all the NewLine annotations from outputAS and build your Paragraph annotations.
3) Sometimes the right paragraphs are already present in the Original markups. In this case you could use the Annotation Set Transfer PR to get them into the default annotation set.
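For option 1), the plain-text splitting step might look like the sketch below. It is written in Python only to show the logic (in a JAPE RHS it would be Java), and it assumes that paragraphs are separated by at least one blank line; if your problems are separated by single line breaks, the separator regex has to change. paragraph_spans is a made-up helper name.

```python
import re

# Assumption: a paragraph break is two or more consecutive line breaks
# (optionally with spaces or tabs on the otherwise empty lines).
SEPARATOR = re.compile(r"(?:\r?\n[ \t]*){2,}")


def paragraph_spans(text):
    """Return (start, end) character offsets for every paragraph in text."""
    raw = []
    start = 0
    for sep in SEPARATOR.finditer(text):
        raw.append((start, sep.start()))
        start = sep.end()
    raw.append((start, len(text)))

    spans = []
    for begin, end in raw:
        # Trim surrounding whitespace so the offsets cover only the paragraph text.
        while begin < end and text[begin].isspace():
            begin += 1
        while end > begin and text[end - 1].isspace():
            end -= 1
        if begin < end:
            spans.append((begin, end))
    return spans


text = "Tom has 3 apples.\nHe eats one.\n\nAnn has 5 pens.\r\n\r\nBob runs 2 miles.\n"
spans = paragraph_spans(text)
for begin, end in spans:
    print(begin, end, repr(text[begin:end]))
```

The start/end offsets are what you would then use to create one Paragraph annotation per problem.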
Why not just use the RegEx Sentence Splitter PR and use Split as the Input in your JAPE rules?
I have been trying to use the Stanford CoreNLP API included in the 2015-12-09 release. I start the server using:
java -mx5g -cp "./*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer
The server works in general, but fails for some sentences, including the following:
"Aside from her specifically regional accent, she reveals by the use of the triad, ``irritable, tense, depressed, a certain pedantic itemization that indicates she has some familiarity with literary or scientific language ( i.e., she must have had at least a high-school education ) , and she is telling a story she has mentally rehearsed some time before."
I end up with a result that starts with:
{"sentences":[{"index":0,"parse":"SENTENCE_SKIPPED_OR_UNPARSABLE","basic-dependencies":
I would greatly appreciate some help in setting this up. Am I not including some annotators in the NLP pipeline?
This same sentence works at http://corenlp.run/
If you're looking for a dependency parse (like the one shown at corenlp.run), you should look at the basic-dependencies field rather than the parse field. If you want a constituency parse, you should include the parse annotator in the list of annotators you are sending to the server. By default, the server does not include the parse annotator, as it's relatively slow.
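As a sketch of how the annotator list can be sent along with a request: the server reads a properties URL parameter containing JSON, and the text goes in the POST body. The snippet assumes the server is running on localhost:9000 (the default port, as far as I know); build_url and annotate are helper names made up for this example, and the annotator list is just one reasonable choice.

```python
import json
import urllib.parse
import urllib.request


def build_url(base_url, annotators):
    """Build the request URL with the annotators passed in the properties parameter."""
    properties = {"annotators": ",".join(annotators), "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(properties)})
    return base_url + "/?" + query


def annotate(text, base_url="http://localhost:9000",
             annotators=("tokenize", "ssplit", "pos", "parse")):
    """POST the text to a running CoreNLP server and return the decoded JSON."""
    request = urllib.request.Request(build_url(base_url, annotators),
                                     data=text.encode("utf-8"))
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Only builds the URL; calling annotate() needs a running server.
url = build_url("http://localhost:9000", ["tokenize", "ssplit", "pos", "parse"])
print(url)
```

With the server up, annotate("some sentence")["sentences"][0]["parse"] should then hold the constituency parse instead of SENTENCE_SKIPPED_OR_UNPARSABLE.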