Converting ANTLR parse trees into string and then reverting it - antlr4

I am new to ANTLR, and I am digging into it for a project. My work would require me to generate a parse tree from a source code file, convert the parse tree into a string that holds all the information about the parse tree in a somewhat "human-readable" form. Parts of this string (representing the parse tree) will then be modified, and the modified string will have to be converted to a changed source code.
I have found out that the .toStringTree(tree) method can be used in ANTLR to print out the tree in LISP format. Is there a better way to represent the parse tree as a string that holds all information?
Can the string-parse-tree be reverted back to the original source code (in the same language) using ANTLR? If no, are there any tools for this?

Can the string-parse-tree be reverted back to the original source code (in the same language) using ANTLR?
That string does not contain the token types, just the matched text. In other words: you cannot create a parse tree from the output of the ToStringTree. Besides, many ANTLR grammars have lexer rules that skip certain input (white spaces and line breaks, for example), so converting a parse tree back to the original input source is not always possible.
If no, are there any tools for this?
Without a doubt, I suggest you do a search on GitHub. But when you have the parse tree, it is trivial to create a custom tree structure and convert that to JSON.

Related

Text first data serialization with separate metadata

I'm trying to find a format that will help solve a very particular problem:
Text first solution.
Ability to specify complex objects in a single text line (properties, key\value, lists, complex objects)
Object metadata structure should be separate from the data.
For example:
Metadata: Prop1:int|Prop2:string|PropList:int[,]
Data: 20|Something|10,20,30
that would mean:
Prop1 = 20
Prop2 = "Something"
PropList = [10,20,30]
Is there any existing serialization format resembling this?
I don't see any format can support the scheme from the example you provided. If you really need this schema (Type section, Data section), then you need to write your own parser, and it's easy.
But the most suitable mature format should still be JSON if you don't want to write your own parser.
specify complex objects in a single text line: not YAML, not XML, not INI, not TOML.
Any common format is designed less semantics or business related.

base 64 Decode XML values using Groovy script

I will be receiving the following XML data in a variable.
<order>
<name>xyz</name>
<city>abc</city>
<string>aGVsbG8gd29ybGQgMQ==</string>
<string>aGVsbG8gd29ybGQgMg==</string>
<string>aGVsbG8gd29ybGQgMw==</string>
</order>
Output:
<order>
<name>xyz</name>
<city>abc</city>
<string>hello world 1</string>
<string>hello world 2</string>
<string>hello world 3</string>
</order>
I know how I can decode from base64 but the problem is some of the values are decoded already and some are encoded. What is the best approach to decode this data using groovy so that I get the output as shown?
Always: tag value will be encoded. rest all other tags and value will be decoded.
Since there's no uncertainty on which nodes could come encoded and which not, hence no need to detect base64 encoding, the way to do it is pretty simple:
Parse it. There's two preferable ways to do that in Groovy: XmlSlurper & XmlParser. They differ in computation & mem consumption modes, both provide object/structure representation in the end, though.
Work with that object structure: traverse all required elements, decode the content/attributes you need to decode.
Either proceed further with the data with them and/or serialize it back to the XML text.
Articles to look at:
Load, modify, and write an XML document in Groovy
https://www.baeldung.com/groovy-xml
https://groovy-lang.org/processing-xml.html
and many, many more.
Another cheat sheet always useful for Groovy noobs: http://groovy-lang.org/groovy-dev-kit.html
Check out how to traverse the structures there, for instance.

Converting an ASTNode into code

How does one convert an ASTNode (or at least a CompilationUnit) into a valid piece of source code?
The documentation says that one shouldn't use toString, but doesn't mention any alternatives:
Returns a string representation of this node suitable for debugging purposes only.
CompilationUnits have rewrite, but that one does not work for ASTs created by hand.
Formatting options would be nice to have, but I'd basically be satisfied with anything that turns arbitrary ASTNodes into semantically equivalent source code.
In JDT the normal way for AST manipulation is to start with a basic CompilationUnit and then use a rewriter to add content. Then ASTRewriteAnalyzer / ASTRewriteFormatter should take care of creating formatted source code. Creating a CU just containing a stub type declaration shouldn't be hard, so that's one option.
If that doesn't suite your needs, you may want to experiement with directly calling the internal org.eclipse.jdt.internal.core.dom.rewrite.ASTRewriteFlattener.asString(ASTNode, RewriteEventStore). If not editing existing files, you may probably ignore the events collected in the RewriteEventStore, just use the returned String.

Stanford nndep to get parse trees

Using Stanford CoreNLP, I am trying to parse text using the neural nets dependency parser. It runs really fast (that's why I want to use this and not the LexicalizedParser), and produces high-quality dependency relations. I am also interested in retrieving the parse trees (Penn-tree style) from that too. So, given the GrammaticalStructure, I am getting the root of that (using root()), and then trying to print it out using the toOneLineString() method. However, root() returns the root node of the tree, with an empty/null list of children. I couldn't find anything on this in the instructions or FAQs.
GrammaticalStructure gs = parser.predict(tagged);
// Print typed dependencies
System.err.println(gs);
// get the tree and print it out in the parenthesised form
TreeGraphNode tree = gs.root();
System.err.println(tree.toOneLineString());
The output of this is:
ROOT-0{CharacterOffsetBeginAnnotation=-1, CharacterOffsetEndAnnotation=-1, PartOfSpeechAnnotation=null, TextAnnotation=ROOT}Typed Dependencies:
[nsubj(tell-5, I-1), aux(tell-5, can-2), advmod(always-4, almost-3), advmod(tell-5, always-4), root(ROOT-0, tell-5), advmod(use-8, when-6), nsubj(use-8, movies-7), advcl(tell-5, use-8), amod(dinosaurs-10, fake-9), dobj(use-8, dinosaurs-10), punct(tell-5, .-11)]
ROOT-0
How can I get the parse tree too?
Figured I can use the Shift-Reduce constituency parser made available by Stanford. It's very fast and the results are comparable.

Groovy says my Unicode string is too long

As part of my probably wrong and cumbersome solution to print out a form I have taken a MS-Word document, saved as XML and I'm trying to store that XML as a groovy string so that I can ${fillOutTheFormProgrammatically}
However, with MS-Word documents being as large as they are, the String is 113100 unicode characters and Groovy says its limited to 65536. Is there some way to change this or am I stuck with splitting up the string?
Groovy - need to make a printable form
That's what I'm trying to do.
Update: to be clear its too long of a Groovy String.. I think a regular string might be all good. Going to change strategy and put some strings in the file I can easily find like %!%variable_name%!% and then do .replace(... uh i feel a new question coming on here...
Are you embedding this string directly in your groovy code? The jvm itself has a limit on the length of string constants, see the VM Spec if you are interested in details.
A ugly workaround might be to split the string in smaller parts and concatenate them at runtime. A better solution would be to save the text in an external file and read the contents from your code. You could also package this file along with your code and access it from the classpath using Class#getResourceAsStream.

Resources