How to find matching expressions while using a ParseTreeWalker - antlr4

Say I'd like to find instances of the following expression using the Java7 grammar:
FoobarClass.getInstanceOfType("Bazz");
Using a ParseTreeWalker and listening to exitExpression() calls sounded like a good first place to start. What surprised me was the level of manual traversal of the Java7Parser.ExpressionContext required to find expressions of this type.
What's the appropriate way to find matches for the above expression? At this point, using a regex instead of ANTLR 4 yields simpler code, but that approach won't scale.

ANTLR 4 does not currently include a feature that allows you to write concrete or abstract syntax queries. We hope to add something in the future to help with this type of application.
I've needed to write a few pattern recognition features for ANTLR 4 parse trees. I implemented the predicate itself with relative success by extending BaseMyParserVisitor<Boolean> (the parser in this example is called MyParser).
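To illustrate the predicate idea without depending on the ANTLR runtime, here is a minimal sketch. It assumes a hypothetical `Node` record as a stand-in for ANTLR's `ParseTree`, and a simplified tree shape in which a `call` node has three children (receiver, method name, argument list). These node kinds are illustrative assumptions, not the Java7 grammar's actual rule structure; with real ANTLR trees you would inspect `ExpressionContext` children instead, for example from a visitor that returns `Boolean`. It needs Java 16 or later (records).

```java
import java.util.List;

// Hypothetical, simplified parse-tree node: a rule/token "kind", optional
// token text, and children. Stands in for ANTLR's ParseTree in this sketch.
record Node(String kind, String text, List<Node> children) {
    static Node leaf(String kind, String text) {
        return new Node(kind, text, List.of());
    }

    static Node rule(String kind, Node... children) {
        return new Node(kind, null, List.of(children));
    }
}

class CallMatcher {
    // Predicate: does this node have the shape  receiver.method("<string>") ?
    static boolean isCall(Node n, String receiver, String method) {
        if (!n.kind().equals("call") || n.children().size() != 3) {
            return false;
        }
        Node recv = n.children().get(0);
        Node name = n.children().get(1);
        Node args = n.children().get(2);
        return recv.kind().equals("identifier") && receiver.equals(recv.text())
                && name.kind().equals("identifier") && method.equals(name.text())
                && args.kind().equals("arguments")
                && args.children().size() == 1
                && args.children().get(0).kind().equals("string");
    }

    // Depth-first walk (similar in spirit to a ParseTreeWalker) counting matches.
    static int countCalls(Node n, String receiver, String method) {
        int count = isCall(n, receiver, method) ? 1 : 0;
        for (Node child : n.children()) {
            count += countCalls(child, receiver, method);
        }
        return count;
    }

    public static void main(String[] args) {
        // Models: FoobarClass.getInstanceOfType("Bazz");
        Node call = Node.rule("call",
                Node.leaf("identifier", "FoobarClass"),
                Node.leaf("identifier", "getInstanceOfType"),
                Node.rule("arguments", Node.leaf("string", "\"Bazz\"")));
        Node stmt = Node.rule("statement", call);
        System.out.println(countCalls(stmt, "FoobarClass", "getInstanceOfType")); // prints 1
    }
}
```

Keeping the shape test in one predicate function and the traversal in another mirrors the split between a Boolean-returning visitor and the walker that drives it.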

Related

Parse subset of Python grammar

We are working on a tool to validate user configurations. Invalid configurations will be described in a text or JSON file in the following form:
case1: if something > 5 and something.else != 10
case2: (if a <= 3 or a >= 5) and b == 10
If the if condition evaluates to true, the configuration is invalid. We used the SLY module to create a lexer and parser that parse such a sentence and check whether it's valid or not. After thinking a bit more, we realized that instead of writing our own grammar, it would be interesting to use a subset of the Python grammar (say, expressions, boolean operators and a few others), but not the complete set, as we neither want nor need support for functions, classes and much more. The reason for this approach is that we are writing our tool in Python, so the two would work together nicely.
I've checked the ast module; however, I have a feeling that the grammar is tightly coupled with it. If I understand correctly, the Python parser is not generated automatically by an existing parser generator from a grammar, right? The parser is "hard coded". Or am I wrong?
Is there "simple" way of doing this?
In general, we are looking for a parser generator that generates a parser for a subset of the Python grammar, but I'm afraid that to cover part of the Python grammar, we would need to write the grammar ourselves and generate a parser from it. Is my assumption right?

Repeating Pattern Matching in antlr4

I'm trying to write a lexer rule that would match the following strings:
a
aa
aaa
bbbb
The requirement here is that all characters must be the same.
I tried to use this rule:
REPEAT_CHARS: ([a-z])(\1)*
But \1 is not valid in ANTLR 4. Is it possible to come up with a pattern for this?
You can’t do that in an ANTLR lexer, at least not without target-specific code inside your grammar. And placing code in your grammar is something you should not do (it makes the grammar hard to read and ties it to one target language). It is better to do these kinds of checks/validations inside a listener or visitor.
Things like back-references and look-arounds are features that crept into the regex engines of programming languages. The regular expression syntax available in ANTLR (and in every parser generator I know of) does not support those features; it describes true regular languages.
Many features found in virtually all modern regular expression libraries provide an expressive power that far exceeds the regular languages. For example, many implementations allow grouping subexpressions with parentheses and recalling the value they match in the same expression (backreferences). This means that, among other things, a pattern can match strings of repeated words like "papa" or "WikiWiki", called squares in formal language theory.
-- https://en.wikipedia.org/wiki/Regular_expression#Patterns_for_non-regular_languages
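Following the advice to do this check outside the lexer, a minimal sketch: let the lexer accept any run of lowercase letters with an ordinary rule (shown in a comment; the rule name `WORD` is hypothetical), then verify the "all characters are the same" requirement in plain code that a listener or visitor could call with the token's text. The class and method names are assumptions for illustration.

```java
// Hypothetical lexer rule in the grammar (no back-reference needed):
//   WORD : [a-z]+ ;
// The "all characters are the same" requirement is checked afterwards,
// e.g. from a listener or visitor that receives the WORD token's text.
class RepeatCheck {
    // Returns true if text is non-empty and every character equals the first one.
    static boolean allSameChar(String text) {
        if (text.isEmpty()) {
            return false;
        }
        char first = text.charAt(0);
        for (int i = 1; i < text.length(); i++) {
            if (text.charAt(i) != first) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "a", "aa", "aaa", "bbbb" pass; "ab" is rejected.
        for (String s : new String[] {"a", "aa", "aaa", "bbbb", "ab"}) {
            System.out.println(s + " -> " + allSameChar(s));
        }
    }
}
```

Rejecting a mixed word at this stage lets you report a meaningful error message instead of an opaque token-recognition failure from the lexer.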

Given an antlr4 grammar, can I build up an expression tree?

So I have written my grammar in ANTLR 4 syntax. Then I set up code generation, and now I can parse source files in my own language. This works great!
The next step I took is to create an object model from the expression tree. This is also working well.
However, now I want to generate an expression from my object model.
Can I generate code using the API of the generated parser objects? Obviously, I can write methods that build strings by hand. But I want to use a generated API based on the grammar, to get some level of type safety and to detect errors when I change the grammar.
I'm using the latest ANTLR 4 release: 4.7.1.
There's no generated solution. You have to wire this all up manually.
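Wiring it up manually usually means writing a small hand-written generator (an "unparser") over your own object model. A minimal sketch, assuming a hypothetical model of numbers, variables and binary operators (these types are illustrative; ANTLR does not generate anything like them). Re-parsing the emitted text with your generated parser is one way to catch output that no longer fits the grammar after a change. It needs Java 17 or later (sealed interfaces, records, pattern matching for `instanceof`).

```java
// Hypothetical object model; ANTLR does not generate these types.
sealed interface Expr permits Num, Var, BinOp {}
record Num(int value) implements Expr {}
record Var(String name) implements Expr {}
record BinOp(String op, Expr left, Expr right) implements Expr {}

class Unparser {
    // Hand-written generator: turns the model back into source text.
    // Binary expressions are fully parenthesized to sidestep precedence rules.
    static String emit(Expr e) {
        if (e instanceof Num n) {
            return Integer.toString(n.value());
        } else if (e instanceof Var v) {
            return v.name();
        } else if (e instanceof BinOp b) {
            return "(" + emit(b.left()) + " " + b.op() + " " + emit(b.right()) + ")";
        }
        throw new IllegalArgumentException("unknown node: " + e);
    }

    public static void main(String[] args) {
        Expr e = new BinOp("*", new BinOp("+", new Var("a"), new Num(1)), new Num(2));
        System.out.println(emit(e)); // prints ((a + 1) * 2)
    }
}
```

Because the model types are sealed, adding a new expression kind to the grammar and model forces you to revisit the generator, which gives some of the type safety the question asks for.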

Groovy function call omiting the parentheses

According to the Gradle documentation (section 13.5.2), we can omit parentheses in a method call:
Parentheses are optional for method calls.
But it seems it doesn't work when we try to apply the java plugin. If a script contains the following line:
apply [plugin: 'java']
We'll get the error:
Maybe something should be set in parentheses or a comma is missing?
# line 1, column 8.
apply [plugin: 'java']
^
But if we wrap this map literal in parentheses, it works fine:
apply([plugin: 'java'])
So we can't omit the parentheses when the argument is a Map, can we?
As the specification says, parentheses can be omitted when there is no ambiguity. I suspect the ambiguity in this case arises because the statement without parentheses looks a lot like array index syntax and the parser has trouble working out whether you are calling a method named 'apply' or trying to do something with an array named 'apply'.
Personally, this is why I tend to always use parentheses: if the parser can't work it out, I'm sure another programmer reading the code won't either.
While it's true that the usual syntax for a list or map in Groovy uses brackets (for example, empty ones are typically written [] or [:], respectively), the same bracket symbols are interpreted as the index operator when they follow an identifier. Groovy then tries to interpret apply as a property of the Gradle project, even if no such property exists. Groovy, as a dynamic language, allows properties to be defined dynamically, so there is not always a way to tell at compile time whether a property is going to exist. Whether that is good design is debatable, but Gradle's ExtraPropertiesExtension makes heavy use of this dynamic nature of Groovy for convenience. If you prefer stricter typing, I suggest you try the Kotlin DSL, which has far fewer problems of this kind (I do not think they are completely gone, as we can still explicitly declare variables with a dynamic type).
On the other hand, one purpose of a DSL is to be concise and to remove useless ceremony. That is something Groovy excels at: if the only argument you need to pass to a method or closure is a map, you can omit the brackets entirely, as in apply plugin: 'java'. (Anything more complex than that is questionable for a DSL.)
And that is why I think that adding parentheses everywhere (e.g., apply([plugin: 'java'])) is not the right approach for build scripts whose code is supposed to look declarative and DSL-like, at least not if you use Groovy, which was designed for exactly that purpose. Never using a language feature, even where it might have an advantage, is what led to books like Douglas Crockford's JavaScript: The Good Parts, where the author recommends that we always use semicolons; that is one of the most argued-over practices on the Internet, and JavaScript Standard Style, for example, disagrees with it. Personally, I think it is more important to know the language you work with (e.g., to know when a semicolon is needed) in order to produce the best-quality code. Admittedly, the semicolon example is not a strong one beyond reducing ceremony. But just because a language feature can be misused does not mean we should ban its use. A kitchen knife is not bad just because we can hurt people with it. The same can be said about Kotlin's most controversial features.
Although we now have the plugins DSL, which is the preferred way of applying plugins (even for plugins in buildSrc, which just need some metadata definition), this question may still be relevant for the following reasons:
programmatically applying plugins, e.g. subprojects { apply plugin: 'java' },
including custom build scripts using a similar syntax: apply from: 'myscript.gradle',
and maybe also if for some reason you can only refer to a plugin by its class name.
Therefore the question is still valid.

How to use other types with UIMA ConceptMapper

I have been successfully using UIMA ConceptMapper with a dictionary that I built. I set the TokenAnnotation parameter to uima.tt.TokenAnnotation and the SpanFeatureStructure parameter to uima.tt.SentenceAnnotation (based on the reference example). These types are I believe coming from the OpenNLP parser. But I also do another parse using medkatp and would like to use their types. So far I have not figured out how to do that. If I change either of these two parameters, the whole thing fails, saying that it cannot find the type.
I've searched for hours on the net but have found no examples of ConceptMapper that use anything except these two types. Any suggestions are welcome.
These types are I believe coming from the OpenNLP parser.
These types are defined in the component descriptor (usually in the desc folder), under the tag <typeSystemDescription>.
But I also do another parse using medkatp and would like to use their types.
If you want to use other types (from other UIMA components, or your own types), you need to define them wherever you use them (meaning: in every component descriptor where you need them, under the tag <typeSystemDescription>).
