SWRL: How to use the built-in swrlb:booleanNot? - protege

I am using Protege 3.4.8. Say I have some instances with a datatype property colors and an object property hasMeaning. The value of colors is a string, e.g. red blue yellow.
I'd like to create a rule like this: if an instance's colors contain red but not blue, then it has the meaning Happy. My current rule is written as below:
colors(?x, ?y)
∧ swrlb:contains(?y, "red")
∧ swrlb:booleanNot(true, swrlb:contains(?y,"blue"))
→ hasMeaning(?x, Happy)
But I got Error: Expecting ',' or ')', got '('.
I followed the grammar provided here.
Any idea of what's wrong here? Thank you very much!

The abstract syntax for SWRL has this grammar for atoms:
atom ::= description '(' i-object ')'
| dataRange '(' d-object ')'
| individualvaluedPropertyID '(' i-object i-object ')'
| datavaluedPropertyID '(' i-object d-object ')'
| sameAs '(' i-object i-object ')'
| differentFrom '(' i-object i-object ')'
| builtIn '(' builtinID { d-object } ')'
builtinID ::= URIreference
The syntax for a builtIn atom takes a list of d-objects as arguments. The production for d-object is:
d-object ::= d-variable | dataLiteral
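The atom booleanNot( true, contains( ?string, "red" )) is malformed because contains( ?string, "red" ) is not a d-object, but an atom. Built-in atoms cannot be nested inside one another, which is presumably why the parser, having read swrlb:contains as an argument, expects ',' or ')' and rejects the '('.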
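To make the grammar point concrete, here is a minimal Python sketch (not part of Protege or any SWRL library) that checks whether an argument of a built-in atom is a d-object. The regular expressions for variables and literals are simplifying assumptions about the concrete syntax, and is_d_object is a hypothetical helper name.

```python
import re

# Simplified assumptions about the concrete syntax:
# a d-variable looks like ?name; a data literal is a quoted string,
# a number, or true/false.
D_VARIABLE = re.compile(r"\?[A-Za-z_][A-Za-z0-9_]*")
DATA_LITERAL = re.compile(r'"[^"]*"|-?[0-9]+(?:\.[0-9]+)?|true|false')


def is_d_object(arg: str) -> bool:
    """Return True if arg is a d-variable or a data literal."""
    arg = arg.strip()
    return bool(D_VARIABLE.fullmatch(arg) or DATA_LITERAL.fullmatch(arg))


# Arguments of swrlb:contains(?y, "red"): both are d-objects.
print([is_d_object(a) for a in ['?y', '"red"']])

# Arguments of swrlb:booleanNot(true, swrlb:contains(?y,"blue")):
# the second one is an atom, not a d-object.
print([is_d_object(a) for a in ['true', 'swrlb:contains(?y,"blue")']])
```

The second call flags the nested swrlb:contains(...) as the offending argument, which matches the production d-object ::= d-variable | dataLiteral.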

Related

Antlr4 Match Force Priority

I have a query grammar I am working on and have found one case that is proving difficult to solve. The below provides a minimal version of the grammar to reproduce it.
grammar scratch;
query : command* ; // input rule
RANGE: '..';
NUMBER: ([0-9]+ | (([0-9]+)? '.' [0-9]+));
STRING: ~([ \t\r\n] | '(' | ')' | ':' | '|' | ',' | '.' )+ ;
WS: [ \t\r\n]+ -> skip ;
command
: 'foo:' number_range # FooCommand
| 'bar:' item_list # BarCommand
;
number_range: NUMBER RANGE NUMBER # NumberRange;
item_list: '(' (NUMBER | STRING)+ ((',' | '|') (NUMBER | STRING)+)* ')' # ItemList;
With this grammar you can match input like bar:(bob, blah, 57, 4.5) foo:2..4.3 with no problem. But given bar:(bob.smith, blah, 57, 4.5) foo:2..4, it complains line 1:8 token recognition error at: '.s' and splits the name into 'bob' and 'mith'. That makes sense, since . is excluded from STRING, although I am not sure why it eats the 's'.
So I changed STRING to STRING: ~([ \t\r\n] | '(' | ')' | ':' | '|' | ',' )+ ; which no longer excludes the dot. Now it recognizes 2..4.3 as a STRING instead of a number_range.
I believe this is because STRING matches more characters in one stretch than the other options. But is there a way to force STRING to match only if the text has not already matched elements higher in the grammar? Meaning it is only a STRING if it does not contain RANGE or NUMBER?
I know I can add TERM: '"' .*? '"'; and then add TERM into the item_list, but I was hoping to avoid having to quote things if possible. So far that seems to be the only route I have found that keeps the .. range working.
You could allow only single dots inside strings like this:
STRING : ATOM+ ( '.' ATOM+ )*;
fragment ATOM : ~[ \t\r\n():|,.];
Oh, and NUMBER: ([0-9]+ | (([0-9]+)? '.' [0-9]+)); is rather verbose. This does the same: NUMBER : ( [0-9]* '.' )? [0-9]+;
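The effect of the single-dot STRING rule can be checked with a small Python simulation of the lexer's matching policy (longest match wins, and on a tie the rule defined first wins). This is only a sketch under assumptions: the regular expressions are hand translations of the rules above, PUNCT stands in for the individual punctuation literals, and tokenize is a hypothetical helper, not ANTLR itself.

```python
import re

ATOM = r"[^ \t\r\n():|,.]"  # same character set as the ATOM fragment

# Rules in definition order; literals used in parser rules come first.
RULES = [
    ("'foo:'", re.compile(r"foo:")),
    ("'bar:'", re.compile(r"bar:")),
    ("PUNCT", re.compile(r"[(),|]")),
    ("RANGE", re.compile(r"\.\.")),
    ("NUMBER", re.compile(r"(?:[0-9]*\.)?[0-9]+")),
    ("STRING", re.compile(ATOM + r"+(?:\." + ATOM + r"+)*")),
    ("WS", re.compile(r"[ \t\r\n]+")),
]


def tokenize(text):
    """Longest match wins; on a tie the earlier rule wins."""
    pos, tokens = 0, []
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Strict '>' keeps the earlier rule when lengths are equal.
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError(f"token recognition error at: {text[pos]!r}")
        if best_name != "WS":  # WS is skipped, as in the grammar
            tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


tokens = tokenize("bar:(bob.smith, blah, 57, 4.5) foo:2..4.3")
print(tokens)
```

In this simulation bob.smith comes out as one STRING, while 2..4.3 is split into NUMBER, RANGE, NUMBER, because a STRING can no longer run across two consecutive dots.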

How to break out of a lexical mode

I've been playing around with modes in an attempt to parse a message like this:
-MSGTXT (DO NOT TOKENIZE (THERE CAN BE PARENS HERE) THIS PART)
-END END OF MESSAGE
-TEST 123
The contents of MSGTXT can be any character so I set up my lexer grammar as follows:
lexer grammar ADEXPLexer;
// Fields
MSGTYP: 'MSGTYP';
ADEP: 'ADEP';
TITLE: 'TITLE';
FILTIM: 'FILTIM';
ORIGINDT: 'ORIGINDT';
IFPLID: 'IFPLID';
MSGTXT: 'MSGTXT' -> pushMode(MSG);
COMMENT: 'COMMENT';
// Message types.
ACK: 'ACK';
IFPL: 'IFPL';
// Lexical rules.
SEP: HYPHEN;
WS: [ \t\n\r] + -> skip;
KEYWORD: (ALPHA|DIGIT)+;
mode MSG;
TEXT: CLOSE_MSG | (ALPHA|DIGIT|SPECIAL|WS|HYPHEN)+;
CLOSE_MSG: ')' -> popMode;
fragment HYPHEN: '-';
fragment ALPHA: [A-Z];
fragment DIGIT: [0-9];
fragment SPECIAL
: '('
| '?'
| ':'
| '.'
| ','
| '\''
| '='
| '+'
| '/'
| ')'
;
The problem now however is that the last closing ')' is never used to break out back into the default mode so it continues on into other parts of the message. The parser rule itself looks like this:
msgtxt: SEP MSGTXT TEXT;
I'm looking for a way to get around this which doesn't involve TokenStreamRewriter as there's no such thing in the JavaScript runtime.
Any help appreciated!
I'm not sure what you need exactly, but if you don't need to check that the contents of TEXT consist only of (ALPHA|DIGIT|SPECIAL|WS|HYPHEN), just use this:
mode MSG;
TEXT: ~[)]+;
CLOSE_MSG: ')' -> popMode;
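If you do, exclude ')' from the SPECIAL fragment and drop the CLOSE_MSG alternative from TEXT, so that ')' can only be matched by the CLOSE_MSG rule and its popMode command actually runs.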
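How the mode switch plays out can be sketched with a small Python simulation of a lexer with a mode stack. It is an approximation under assumptions, not the ANTLR runtime: only a few default-mode rules are included, the regular expressions are hand translations, and the action strings merely imitate the lexer commands.

```python
import re

# Each rule: (token name, pattern, action). Rules are in definition order.
MODES = {
    "DEFAULT": [
        ("MSGTXT", re.compile(r"MSGTXT"), "pushMode(MSG)"),
        ("SEP", re.compile(r"-"), None),
        ("WS", re.compile(r"[ \t\r\n]+"), "skip"),
        ("KEYWORD", re.compile(r"[A-Z0-9]+"), None),
    ],
    "MSG": [
        ("TEXT", re.compile(r"[^)]+"), None),          # TEXT: ~[)]+;
        ("CLOSE_MSG", re.compile(r"\)"), "popMode"),   # CLOSE_MSG: ')' -> popMode;
    ],
}


def tokenize(text):
    stack, pos, tokens = ["DEFAULT"], 0, []
    while pos < len(text):
        best = None
        for name, pattern, action in MODES[stack[-1]]:
            m = pattern.match(text, pos)
            # Longest match wins; the earlier rule wins a tie.
            if m and (best is None or m.end() > best[1]):
                best = (name, m.end(), action)
        if best is None:
            raise ValueError(f"token recognition error at: {text[pos]!r}")
        name, end, action = best
        if action != "skip":
            tokens.append((name, text[pos:end]))
        if action == "pushMode(MSG)":
            stack.append("MSG")
        elif action == "popMode":
            stack.pop()
        pos = end
    return tokens


message = "-MSGTXT (FREE TEXT, ANY CHARS 1+1=2)\n-END END OF MESSAGE\n-TEST 123"
tokens = tokenize(message)
print(tokens)
```

Because TEXT can no longer contain ')', the closing parenthesis is always left for CLOSE_MSG, and the fields after the message are lexed in the default mode again. Note that this pops at the first ')', so a message with nested parentheses, like the one in the question, would leave the mode too early and would need extra handling such as tracking the nesting depth.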

antlr4 literal string handling

I have the following antlr4 grammar:
grammar squirrel;
program: globalstatement+;
globalstatement: globalvardef | classdef | functiondef;
globalvardef: IDENT '=' constantexpr ';';
classdef: CLASS IDENT '{' classstatement+ '}';
functiondef: FUNCTION IDENT '(' parameterlist ')' functionbody;
constructordef: CONSTRUCTOR '(' parameterlist ')' functionbody;
parameterlist: IDENT (',' IDENT)* | ;
functionbody: '{' statement* '}';
classstatement: globalvardef | functiondef | constructordef;
statement: expression ';';
expression:
IDENT # ident |
IDENT '=' expression # assignment |
IDENT ('.' IDENT)+ # lookupchain |
constantexpr # constant |
IDENT '(' expressionlist ')' # functioncall |
expression '+' expression # addition;
constantexpr: INTEGER | STRING;
expressionlist: expression (',' expression)* | ;
CONSTRUCTOR: 'constructor';
CLASS: 'class';
FUNCTION: 'function';
COMMENT: '//'.*[\n];
STRING: '"' CHAR* '"';
CHAR: [ a-zA-Z0-9];
INTEGER: [0-9]+;
IDENT: [a-zA-Z]+;
WS: [ \t\r\n]+ -> skip;
Now if I parse this file:
z = "global variable";
class Base
{
z = 10;
}
everything is fine:
#0,0:0='z',<16>,1:0
#1,2:2='=',<1>,1:2
#2,4:20='"global variable"',<14>,1:4
#3,21:21=';',<2>,1:21
#4,26:30='class',<11>,3:0
#5,32:35='Base',<16>,3:6
#6,38:38='{',<3>,4:0
#7,42:42='z',<16>,5:1
#8,44:44='=',<1>,5:3
#9,46:47='10',<15>,5:5
#10,48:48=';',<2>,5:7
#11,51:51='}',<4>,6:0
#12,56:55='<EOF>',<-1>,8:0
But with this file:
z = "global variable";
class Base
{
z = "10";
}
I get this:
#0,0:0='z',<16>,1:0
#1,2:2='=',<1>,1:2
#2,4:49='"global variable";\r\n\r\nclass Base\r\n{\r\n\tz = "10"',<14>,1:4
#3,50:50=';',<2>,5:9
#4,53:53='}',<4>,6:0
#5,58:57='<EOF>',<-1>,8:0
So it seems like everything between the first " and last " in a file gets matched to one string literal.
How do I prevent this?
Note the string is matching from the first quote to the last possible quote.
By default, a Kleene operator (*) in ANTLR is greedy. So, change
STRING: '"' CHAR* '"';
to
STRING: '"' CHAR*? '"';
to make it non-greedy.
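The difference between the greedy and non-greedy operator can be seen with ordinary regular expressions in Python. This is a sketch by analogy, not ANTLR itself, and it assumes the string body can match any character, as the token dump in the question suggests. The third pattern shows a commonly used alternative that excludes the quote character from the body.

```python
import re

source = 'z = "global variable";\n\nclass Base\n{\n\tz = "10";\n}\n'

greedy = re.compile(r'".*"', re.DOTALL)       # analogous to  '"' .* '"'
non_greedy = re.compile(r'".*?"', re.DOTALL)  # analogous to  '"' .*? '"'
no_quotes = re.compile(r'"[^"]*"')            # analogous to  '"' ~["]* '"'

# Greedy: one match running from the first quote to the last quote.
print(greedy.findall(source))
# Non-greedy: each string literal is matched separately.
print(non_greedy.findall(source))
# Excluding the quote from the body gives the same result here.
print(no_quotes.findall(source))
```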

ANTLR4 Grammar picks up 'and' and 'or' in variable names

Please help me with my ANTLR4 Grammar.
Sample "formel":
(Arbejde.ArbejderIKommuneNr=860) and (Arbejde.ErIArbejde = 'J') &
(Arbejde.ArbejdsTimerPrUge = 40)
(Ansogeren.BorIKommunen = 'J') and (BeregnDato(Ansogeren.Fodselsdato;
'+62Å') < DagsDato)
(Arb.BorI=860)
My problem is that Arb.BorI=860 is not handled correctly. I get this error:
Error: no viable alternative at input '(Arb.Bor' at linenr/position: 1/6 \r\nException: Der blev udløst en undtagelse af typen 'Antlr4.Runtime.NoViableAltException
Please note that Arb.BorI contains the word 'or'.
I think my problem is that my 'booleanOps' in the grammar override 'datakildefelt'
So... My problem is how do I get my grammar correct - I am stuck, so any help will be appreciated.
My Grammar:
grammar UnikFormel;
formel : boolExpression # BooleanExpr
| expression # Expr
| '(' formel ')' # Parentes;
boolExpression : ( '(' expression ')' ) ( booleanOps '(' expression ')' )+;
expression : element compareOps element # Compare;
element : datakildefelt # DatakildeId
| function # Funktion
| int # Integer
| decimal # Real
| string # Text;
datakildefelt : datakilde '.' felt;
datakilde : identifyer;
felt : identifyer;
function : funktionsnavn ('(' funcParameters? ')')?;
funktionsnavn : identifyer;
funcParameters : funcParameter (';' funcParameter)*;
funcParameter : element;
identifyer : LETTER+;
int : DIGIT+;
decimal : DIGIT+ '.' DIGIT+ | '.' DIGIT+;
string : QUOTE .*? QUOTE;
booleanOps : (AND | OR);
compareOps : (LT | GT | EQ | GTEQ | LTEQ);
QUOTE : '\'';
OPERATOR: '+';
DIGIT: [0-9];
LETTER: [a-åA-Å];
MUL : '*';
DIV : '/';
ADD : '+';
SUB : '-';
GT : '>';
LT : '<';
EQ : '=';
GTEQ : '>=';
LTEQ : '<=';
AND : '&' | 'and';
OR : '?' | 'or';
WS : ' '+ -> skip;
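The lexer takes the longest match, and when two rules match text of the same length, the rule defined first wins. So AND and OR need to come before any rule that can match the same text; moving them before LETTER is a sensible first step. GTEQ and LTEQ are safe wherever they sit, because >= is longer than > and wins on length, but OPERATOR and ADD both match '+', so ADD can never be produced.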
EDIT
Additionally, you should make identifyer a lexer rule, i.e. start its name with a capital letter (IDENTIFIER or Identifier). The same goes for int, decimal and string. Input is initially a stream of characters and is first processed into a stream of tokens, using only lexer rules. At this point parser rules (those starting with a lowercase letter) do not come into play yet. So, to make "BorI" parse as a single entity (token), you need to create a lexer rule that matches identifiers. Currently it is tokenized as 3 tokens: LETTER (B) OR (or) LETTER (I).
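The tokenization described above can be reproduced with a small Python simulation of the lexer's policy (longest match first, then rule order). It is a sketch under assumptions: only the relevant rules are included, the letter range is reduced to ASCII for simplicity, and ID stands for the identifier lexer rule suggested above.

```python
import re

# Relevant lexer rules of the original grammar: one LETTER token per character.
OLD_RULES = [
    ("LETTER", re.compile(r"[a-zA-Z]")),
    ("AND", re.compile(r"&|and")),
    ("OR", re.compile(r"\?|or")),
]

# Keywords first, then an identifier rule that matches whole words.
NEW_RULES = [
    ("AND", re.compile(r"&|and")),
    ("OR", re.compile(r"\?|or")),
    ("ID", re.compile(r"[a-zA-Z][a-zA-Z0-9]*")),
]


def tokenize(rules, text):
    """Longest match wins; on a tie the earlier rule wins."""
    pos, tokens = 0, []
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in rules:
            m = pattern.match(text, pos)
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError(f"token recognition error at: {text[pos]!r}")
        tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


print(tokenize(OLD_RULES, "BorI"))  # three tokens, with OR in the middle
print(tokenize(NEW_RULES, "BorI"))  # a single ID token
print(tokenize(NEW_RULES, "or"))    # still OR, because OR precedes ID
```

With the ID rule in place, the embedded "or" is no longer visible to the lexer, while a standalone "or" still becomes OR because that rule is listed before ID and the two matches have the same length.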
Thanks for your help. There were multiple problems. Reading the ANTLR4 book and using "TestRig -gui" got me on the right track. The working grammar is:
grammar UnikFormel;
formel : '(' formel ')' # Parentes
| expression # Expr
| boolExpression # BooleanExpr
;
boolExpression : '(' expression ')' ( booleanOps '(' expression ')' )+
| '(' formel ')' ( booleanOps '(' formel ')' )+;
expression : element compareOps element # Compare;
datakildefelt : ID '.' ID;
function : ID ('(' funcParameters? ')')?;
funcParameters : funcParameter (';' funcParameter)*;
funcParameter : element;
element : datakildefelt # DatakildeId
| function # Funktion
| INT # Integer
| DECIMAL # Real
| STRING # Text;
booleanOps : (AND | OR);
compareOps : ( GTEQ | LTEQ | LT | GT | EQ );
AND : '&' | 'and';
OR : '?' | 'or';
GTEQ : '>=';
LTEQ : '<=';
GT : '>';
LT : '<';
EQ : '=';
ID : LETTER ( LETTER | DIGIT)*;
INT : DIGIT+;
DECIMAL : DIGIT+ '.' DIGIT+ | '.' DIGIT+;
STRING : QUOTE .*? QUOTE;
fragment QUOTE : '\'';
fragment DIGIT: [0-9];
fragment LETTER: [a-åA-Å];
WS : [ \t\r\n]+ -> skip;

Identifier Lexer rule does not match '*' like it's supposed to

I am in the process of finalizing a grammar for a proprietary pattern language. It borrows a few regex syntax elements (like quantifiers) but it's also a lot more complex than regex, since it has to allow macros, different pattern styles etc.
My problem is that '*' does not match against the ID lexer rule like it's supposed to. There is no other rule that could swallow the * token as far as I can see.
Here's the grammar I wrote:
grammar Pattern;
element:
ID
| macro;
macro:
MACRONAME macroarg? ('*'|'+'|'?'|FROMTIL)?;
macroarg: '['( (element | MACROFREE ) ';')* (element | MACROFREE) ']';
and_con :
element '&' element
| and_con '&' element
|'(' and_con ')';
head_con :
'H[' block '=>' block ']';
expression :
element
| and_con
| expression ' ' element
| '(' expression ')';
block :
element
| and_con
| or_con
| '(' block ')';
blocksequence :
(block ' '+)* block;
or_con :
((element | and_con) '|')+ (element | and_con)
| or_con '|' (element | and_con)
| '(' blocksequence (')|(' blocksequence)+ (')'|')*');
patternlist :
(blocksequence ' '* ',' ' '*)* blocksequence;
sentenceord :
'S=(' patternlist ')';
sentenceunord :
'S={' patternlist '}';
pattern :
sentenceord
| sentenceunord
| blocksequence;
multisentence :
MS pattern;
clause :
'CLS' ' '+ pattern;
complexpattern :
pattern
| multisentence
| clause
| SECTIONS ' ' complexpattern;
dictentry:
NUM ';' complexpattern
| NUM ';' NAME ';' complexpattern
| COMMENT;
dictionary:
(dictentry ('\r'|'\n'))* (dictentry)?;
ID : '*' ('*'|'+'|'?'|FROMTIL)?
| ( '^'? '!'? ('F'|'C'|'L'|'P'|'CA'|'N'|'PE'|'G'|'CD'|'T'|'M'|'D')'=' NAME ('*'|'+'|'?'|FROMTIL)? '$'? );
MS : 'MS' [0-9];
SECTIONS: 'SEC' '=' ([0-9]+','?)+;
FROMTIL: '{'NUM'-'NUM'}';
NUM: [0-9]+;
NAME: CHAR+ | ',' | '.' | '*';
CHAR: [a-zA-Z0-9_äöüßÄÖÜ\-];
MACRONAME: '#'[a-zA-Z_][a-zA-Z_0-9]*;
MACROFREE: [a-zA-Z!]+;
COMMENT: '//' ~('\r'|'\n')*;
The complexpattern/pattern/element/block parser rules should accept a simple '*', and I can't figure out why they don't.
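In your macro rule you use the literal '*'. Literals that appear in parser rules become implicit tokens, and these take precedence over lexer rules that match the same text, so a single "*" in the input is tokenized as the literal '*' and never as ID.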
