Can this grammar be parsed using antlr4?

Given a set S of n rules, I need an antlr4 rule to match any subset of S, in any order:
each rule of S can appear zero or one time
any permutation of the subset is ok
Example :
Given S = {a,b}, (n = 2) the rule must match
a
b
a b
b a
while "a b b", for instance, must not match.
Is it possible to parse such an expression with an antlr4 grammar? My real set has n = 6, so listing all combinations in the grammar does not seem to be a viable option!

No, you can't define combinations and/or permutations of rules in ANTLR (or in any other parser generator that I know of).
You could use predicates to accomplish your goal, but that means adding target-specific code to your grammar. I'd just parse any sequence of a or b and validate the structure after parsing (in a custom visitor/listener).
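The post-parse validation suggested above could be sketched as follows (in Python, for brevity). It assumes that a permissive rule such as `(a | b)*` has already matched, and that the visitor/listener has collected the names of the matched alternatives in order. The function name `find_subset_violations` and the list-of-names representation are illustrative assumptions, not part of ANTLR.

```python
from collections import Counter

def find_subset_violations(matched, allowed):
    """Return error messages for a sequence of matched rule names.

    The sequence is valid when every name belongs to `allowed` and
    no name occurs more than once; the order is irrelevant.
    """
    errors = []
    for name, count in Counter(matched).items():
        if name not in allowed:
            errors.append(f"unexpected element '{name}'")
        elif count > 1:
            errors.append(f"'{name}' appears {count} times, at most once is allowed")
    return errors

print(find_subset_violations(["b", "a"], {"a", "b"}))       # valid permutation
print(find_subset_violations(["a", "b", "b"], {"a", "b"}))  # duplicate 'b'
```

Because the check runs after parsing, the error message can name the offending element directly, which is harder to achieve with a predicate inside the grammar.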

XML schema restriction pattern for not allowing specific string

I need to write an XSD schema with a restriction on a field, to ensure that
the value of the field does not contain the substring FILENAME at any location.
For example, all of the following must be invalid:
FILENAME
ORIGINFILENAME
FILENAMETEST
123FILENAME456
None of these values should be valid.
In a regular expression language that supports negative lookahead, I could do this by writing /^((?!FILENAME).)*$ but the XSD pattern language does not support negative lookahead.
How can I implement an XSD pattern restriction with the same effect as /^((?!FILENAME).)*$ ?
I need to use pattern, because I don't have access to XSD 1.1 assertions, which are the other obvious possibility.
The question XSD restriction that negates a matching string covers a similar case, but in that case the forbidden string is forbidden only as a prefix, which makes checking the constraint easier. How can the solution there be extended to cover the case where we have to check all locations within the input string, and not just the beginning?
OK, the OP has persuaded me that while the other question mentioned has an overlapping topic, the fact that the forbidden string is forbidden at all locations, not just as a prefix, complicates things enough to require a separate answer, at least for the XSD 1.0 case. (I started to add this answer as an addendum to my answer to the other question, and it grew too large.)
There are two approaches one can use here.
First, in XSD 1.1, a simple assertion of the form
not(matches($value, 'FILENAME'))
ought to do the job.
Second, if one is forced to work with an XSD 1.0 processor, one needs a pattern that will match all and only strings that don't contain the forbidden substring (here 'FILENAME').
One way to do this is to ensure that the character 'F' never occurs in the input. That's too drastic, but it does do the job: strings not containing the first character of the forbidden string do not contain the forbidden string.
But what of strings that do contain an occurrence of 'F'? They are fine, as long as no 'F' is followed by the string 'ILENAME'.
Putting that last point more abstractly, we can say that any acceptable string (any string that doesn't contain the string 'FILENAME') can be divided into two parts:
a prefix which contains no occurrences of the character 'F'
zero or more occurrences of 'F' followed by a string that doesn't match 'ILENAME' and doesn't contain any 'F'.
The prefix is easy to match: [^F]*.
The strings that start with F but don't match 'FILENAME' are a bit more complicated; just as we don't want to outlaw all occurrences of 'F', we also don't want to outlaw 'FI', 'FIL', etc. -- but each occurrence of such a dangerous string must be followed either by the end of the string, or by a letter that doesn't match the next letter of the forbidden string, or by another 'F' which begins another region we need to test. So for each proper prefix of the forbidden string, we create a regular expression of the form
$prefix || '([^F' || next-character-in-forbidden-string || ']'
|| '[^F]*)?'
Then we join all of those regular expressions with or-bars.
The end result in this case is something like the following (I have inserted newlines here and there, to make it easier to read; before use, they will need to be taken back out):
[^F]*
((F([^FI][^F]*)?)
|(FI([^FL][^F]*)?)
|(FIL([^FE][^F]*)?)
|(FILE([^FN][^F]*)?)
|(FILEN([^FA][^F]*)?)
|(FILENA([^FM][^F]*)?)
|(FILENAM([^FE][^F]*)?))*
Two points to bear in mind:
XSD regular expressions are implicitly anchored; testing this with a non-anchored regular expression evaluator will not produce the correct results.
It may not be obvious at first why the alternatives in the choice all end with [^F]* instead of .*. Thinking about the string 'FEEFIFILENAME' may help. We have to check every occurrence of 'F' to make sure it's not followed by 'ILENAME'.
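The construction described above can be sketched in Python to generate and try out the pattern. This is an illustration under stated assumptions: the helper name `build_exclusion_pattern` is hypothetical, the escaping is Python's rather than XSD's, and the construction is only applied when the first character of the forbidden string does not recur later in it (as is the case for 'FILENAME'). `re.fullmatch` is used to mimic the implicit anchoring of XSD patterns.

```python
import re

def build_exclusion_pattern(forbidden):
    """Build a pattern for strings that do not contain `forbidden`.

    Assumption: the first character of `forbidden` does not occur
    again later in it; the construction sketched here relies on that.
    """
    if len(forbidden) < 2 or forbidden[0] in forbidden[1:]:
        raise ValueError("construction not applicable to this string")
    f = re.escape(forbidden[0])
    alternatives = []
    # One alternative per proper prefix of the forbidden string.
    for i in range(1, len(forbidden)):
        prefix = re.escape(forbidden[:i])
        nxt = re.escape(forbidden[i])
        alternatives.append(f"({prefix}([^{f}{nxt}][^{f}]*)?)")
    return f"[^{f}]*(" + "|".join(alternatives) + ")*"

pattern = build_exclusion_pattern("FILENAME")
print(pattern)
for candidate in ["FILENAME", "123FILENAME456", "FEEFIFILENAME", "FILENAM"]:
    # fullmatch anchors at both ends, like an XSD pattern facet
    print(candidate, bool(re.fullmatch(pattern, candidate)))
```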

Can I put one check on a lexical element instead of on a number of parser rules?

I'm trying to use antlr4 with the IDL.g4 grammar to implement some checks that our IDL files must follow. One rule is about names. The rules are:
ID contains only letters, digits and single underscores,
ID begins with a letter,
ID ends with a letter or digit,
ID is not a reserved word in Ada, C, C++, Java or IDL.
One way to do this check is to write a function that checks a string for these properties and call it in the exit listeners for every rule that has an ID, e.g. (referring to IDL.g4) in exitConst_decl(), exitInit_decl(), exitSimple_declarator() and many more places. Maybe that is the correct way to do it. But I was thinking about putting that check directly on the lexical element ID. I don't know how to do that, or whether it is possible at all.
Validating this type of constraint in the lexer would make it significantly more difficult to provide usable error messages for invalid identifiers. However, you can create a new parser rule identifier, and replace all references to ID in various parser rules to reference identifier instead.
identifier
: ID
;
You can then place your identifier validation logic inside of the single method enterIdentifier instead of all of the various rules that currently reference ID.
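The validation logic itself might look like the following sketch, written in Python and independent of the ANTLR runtime. The names `check_identifier`, `ID_SHAPE` and `RESERVED_WORDS` are hypothetical, the reserved-word set is only a small illustrative subset, and comparing reserved words case-insensitively is an assumption. A listener's enterIdentifier method would call such a function with the text of the ID token.

```python
import re

# Starts with a letter, then letters/digits each optionally preceded by
# one underscore: no doubled underscores and no trailing underscore.
ID_SHAPE = re.compile(r"[A-Za-z](?:_?[A-Za-z0-9])*")

# Illustrative subset only; a real check needs the complete keyword
# lists of Ada, C, C++, Java and IDL.
RESERVED_WORDS = {"begin", "class", "interface", "package", "struct", "while"}

def check_identifier(name):
    """Return a list of problems found in `name` (empty if acceptable)."""
    problems = []
    if not ID_SHAPE.fullmatch(name):
        problems.append(
            f"'{name}' must start with a letter, end with a letter or digit, "
            "and contain only letters, digits and single underscores"
        )
    # Assumption: reserved words are compared case-insensitively.
    if name.lower() in RESERVED_WORDS:
        problems.append(f"'{name}' is a reserved word")
    return problems

for candidate in ["my_name1", "a__b", "name_", "Class"]:
    print(candidate, check_identifier(candidate))
```

Returning a list of messages rather than a boolean keeps the error reporting in one place, which is the advantage of validating in the parser layer rather than in the lexer.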

Jape grammar to identify product release

How can I use an AND operation in a JAPE grammar? I just want to check whether a sentence contains 'organisation', 'jobtitle' and 'person' all together, in any order. How is this possible? There is a '|' (OR) operation, but I didn't see any documentation about an AND operation.
There isn't an "and" operator like that as such but you could do it with a set of contains checks:
Rule: OrgTitlePer
({Sentence contains {Organization},
Sentence contains {JobTitle},
Sentence contains {Person}}):sent
-->
:sent.Interesting = {}
When you have several constraints within the same set of braces that involve the same annotation type on the left (Sentence in this case) then all the constraints must be satisfied simultaneously by the same annotation.

Troubles with returns declaration on the first parser rule in an ANTLR4 grammar

I am using returns for my parser rules, which works for all parser rules except the first one. If the first parser rule in my grammar uses the returns declaration, ANTLR4 complains as follows:
expecting ARG_ACTION while matching a rule
If I add another parser rule above which does not use "returns" ANTLR does not complain.
Here you have a grammar reduced to the problem:
grammar FirstParserRuleReturnIssue;
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
aRule returns [String s]: ID { $s = $ID.text; };
I searched for a special role of the first rule that could explain the behaviour but did not find anything. Is it a bug? Am I missing something?
You need to place parser rules (start with a lowercase letter) before lexer rules (start with an uppercase letter) in your grammar. After encountering a lexer rule, the [ triggers a LEXER_CHAR_SET instead of ARG_ACTION, so the token stream seen by the compiler looks like you're passing a set of characters where the return value should be.

Can anyone point me at a good example of pretty printing rules to "english"

I've got the equivalent of an AST that a user has built using a rule engine. But when displaying a list of the rules, I'd like to be able to "pretty print" each rule into something that looks nice**. Internally when represented as a string they look like s-expressions so imagine something like:
(and (contains "foo" "foobar") (equals 4 (plus 2 2 )))
Can anyone point me at a program that has done a good job of displaying rules in a readable fashion?
** Needs to be localizable too, but I guess we'll leave that for extra credit.
Maybe check out the Attempto project that is developing Attempto Controlled English (ACE). ACE allows you to write rules in a subset of English. For example:
If "foo" contains "foobar" and "foobar" does not contain "foo" then 4 = 2 + 2.
The ACE parser converts such rules into a logical form called Discourse Representation Structure (DRS). For the above example, it looks like this:
[]
[A]
predicate(A, contain, string(foo), string(foobar))-1
NOT
[B]
predicate(B, contain, string(foobar), string(foo))-1
=>
[]
formula(int(4), =, expr(+, int(2), int(2)))-1
There is a tool called DRS verbalizer that converts DRSs into ACE. For the above DRS you would get:
If "foo" contains "foobar" and it is false that "foobar" contains "foo" then 4 = ( 2 + 2 ).
In your case, you would have to convert your rule representation into the DRS (which should be quite straightforward), and then you can directly use the DRS verbalizer. The mentioned tools are available under the LGPL license.
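Whatever the target representation, the conversion starts with a walk over the rule tree. The sketch below is not the Attempto tooling: it is a plain template-based rendering of the s-expression from the question, shown only to illustrate that tree walk. The operator templates and the function names `parse` and `verbalize` are assumptions; a per-language template table would be one simple way to approach localization.

```python
import re

# Parentheses, double-quoted strings, or bare atoms.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')

def parse(text):
    """Parse one s-expression into nested lists of string atoms."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token != "(":
            return token
        items = []
        while tokens[pos] != ")":
            items.append(read())
        pos += 1  # skip the closing parenthesis
        return items

    return read()

# Hypothetical English templates, keyed by operator name.
TEMPLATES = {
    "and": "{0} and {1}",
    "contains": "{0} contains {1}",
    "equals": "{0} = {1}",
    "plus": "({0} + {1})",
}

def verbalize(node):
    """Render a parsed rule as text by filling in operator templates."""
    if isinstance(node, str):
        return node
    operator, *args = node
    return TEMPLATES[operator].format(*(verbalize(arg) for arg in args))

rule = '(and (contains "foo" "foobar") (equals 4 (plus 2 2 )))'
print(verbalize(parse(rule)))  # prints: "foo" contains "foobar" and 4 = (2 + 2)
```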
