Jape grammar to identify product release - text

How can i use AND operation on jape grammar?. I just want to check whether a sentence contain 'organisation','jobtitle','person' all together in any order. How it possible? There is '|'(OR) operation allowed but i didnt see any documentation about AND operation.

There isn't an "and" operator like that as such but you could do it with a set of contains checks:
Rule: OrgTitlePer
({Sentence contains {Organization},
Sentence contains {JobTitle},
Sentence contains {Person}}):sent
-->
:sent.Interesting = {}
When you have several constraints within the same set of braces that involve the same annotation type on the left (Sentence in this case) then all the constraints must be satisfied simultaneously by the same annotation.

Related

Using flex to identify variable name without repeating characters

I'm not fully sure how to word my question, so sorry for the rough title.
I am trying to create a pattern that can identify variable names with the following restraints:
Must begin with a letter
First letter may be followed by any combination of letters, numbers, and hyphens
First letter may be followed with nothing
The variable name must not be entirely X's ([xX]+ is a seperate identifier in this grammar)
So for example, these would all be valid:
Avariable123
Bee-keeper
Y
E-3
But the following would not be valid:
XXXX
X
3variable
5
I am able to meet the first three requirements with my current identifier, but I am really struggling to change it so that it doesn't pick up variables that are entirely the letter X.
Here is what I have so far: [a-z][a-z0-9\-]* {return (NAME);}
Can anyone suggest a way of editing this to avoid variables that are made up of just the letter X?
The easiest way to handle that sort of requirement is to have one pattern which matches the exceptional string and another pattern, which comes afterwards in the file, which matches all the strings:
[xX]+ { /* matches all-x tokens */ }
[[:alpha:]][[:alnum:]-]* { /* handle identifiers */ }
This works because lex (and almost all lex derivatives) select the first match if two patterns match the same longest token.
Of course, you need to know what you want to do with the exceptional symbol. If you just want to accept it as some token type, there's no problem; you just do that. If, on the other hand, the intention was to break it into subtokens, perhaps individual letters, then you'll have to use yyless(), and you might want to switch to a new lexing state in order to avoid repeatedly matching the same long sequence of Xs. But maybe that doesn't matter in your case.
See the flex manual for more details and examples.

Can you set multiple (different) tags with the same value?

For some of my projects, I have had to use the viper package to use configuration.
The package requires you to add the mapstructure:"fieldname" to identify and set your configuration object's fields correctly, but I have also had to add other tags for other purposes, leading to something looking like the following :
type MyStruct struct {
MyField string `mapstructure:"myField" json:"myField" yaml:"myField"`
}
As you can see, it is quite redundant for me to write tag:"myField" for each of my tag, so I was wondering if there was any way to "bundle" them up and reduce the verbosity, with something like this mapstructure,json,yaml:"myField"
Or is it simply not possible and you must specify every tag separately ?
Struct tags are arbitrary string literals. Data stored in struct tags may look like whatever you want them to be, but if you don't follow the conventions, you'll have to write your own parser / processing logic. If you follow the conventions, you may use StructTag.Get() and StructTag.Lookup() to easily get tag values.
The conventions do not support "merging" multiple tags, so just write them all out.
The conventions, quoted from reflect.StructTag:
By convention, tag strings are a concatenation of optionally space-separated key:"value" pairs. Each key is a non-empty string consisting of non-control characters other than space (U+0020 ' '), quote (U+0022 '"'), and colon (U+003A ':'). Each value is quoted using U+0022 '"' characters and Go string literal syntax.
See related question: What are the use(s) for tags in Go?

Can JAPE match paragraph Annotation in LHS?

I'm working on a math word problem solver, and would like to pass whole problems to my GATE Embedded application using JAPE. I'm using GATE IDE to display the output, as well as run the pipeline of GATE components. Each problem will be in its own paragraph, and each document will have several problems on it.
Is there a way to match any paragraph using the JAPE left-hand side regex?
I see three options here (there may be more elegant solutions):
1) Use simple rule like:
Phase: find
Input: Token
Options: control = once
Rule:OneToken
(
{Token}
)
In RHS you could get a text and use standard Java approach for getting paragraphs from plain text.
2) Use LHS (if you really want only LHS)
Rule: NewLine
(
({SpaceToken.string=="\n"}) |
({SpaceToken.string=="\r"}) |
({SpaceToken.string=="\n"}{SpaceToken.string=="\r"}) |
({SpaceToken.string=="\r"}{SpaceToken.string=="\n"})
):left
Build annotation NewLine, then write a Jape rule similar to 1) but with NewLine instead of Token. Take all NewLines from outputAS and build your Paragraph annotations.
3) Sometimes there may be right paragraphs in Original markups. In this case you could use Annotation Set Transfer PR and get them in Default Annotations Set.
why not just use RegEx Sentence splitter PR to use Split as the Input in your jape rules?

XML schema restriction pattern for not allowing specific string

I need to write an XSD schema with a restriction on a field, to ensure that
the value of the field does not contain the substring FILENAME at any location.
For example, all of the following must be invalid:
FILENAME
ORIGINFILENAME
FILENAMETEST
123FILENAME456
None of these values should be valid.
In a regular expression language that supports negative lookahead, I could do this by writing /^((?!FILENAME).)*$ but the XSD pattern language does not support negative lookahead.
How can I implement an XSD pattern restriction with the same effect as /^((?!FILENAME).)*$ ?
I need to use pattern, because I don't have access to XSD 1.1 assertions, which are the other obvious possibility.
The question XSD restriction that negates a matching string covers a similar case, but in that case the forbidden string is forbidden only as a prefix, which makes checking the constraint easier. How can the solution there be extended to cover the case where we have to check all locations within the input string, and not just the beginning?
OK, the OP has persuaded me that while the other question mentioned has an overlapping topic, the fact that the forbidden string is forbidden at all locations, not just as a prefix, complicates things enough to require a separate answer, at least for the XSD 1.0 case. (I started to add this answer as an addendum to my answer to the other question, and it grew too large.)
There are two approaches one can use here.
First, in XSD 1.1, a simple assertion of the form
not(matches($v, 'FILENAME'))
ought to do the job.
Second, if one is forced to work with an XSD 1.0 processor, one needs a pattern that will match all and only strings that don't contain the forbidden substring (here 'FILENAME').
One way to do this is to ensure that the character 'F' never occurs in the input. That's too drastic, but it does do the job: strings not containing the first character of the forbidden string do not contain the forbidden string.
But what of strings that do contain an occurrence of 'F'? They are fine, as long as no 'F' is followed by the string 'ILENAME'.
Putting that last point more abstractly, we can say that any acceptable string (any string that doesn't contain the string 'FILENAME') can be divided into two parts:
a prefix which contains no occurrences of the character 'F'
zero or more occurrences of 'F' followed by a string that doesn't match 'ILENAME' and doesn't contain any 'F'.
The prefix is easy to match: [^F]*.
The strings that start with F but don't match 'FILENAME' are a bit more complicated; just as we don't want to outlaw all occurrences of 'F', we also don't want to outlaw 'FI', 'FIL', etc. -- but each occurrence of such a dangerous string must be followed either by the end of the string, or by a letter that doesn't match the next letter of the forbidden string, or by another 'F' which begins another region we need to test. So for each proper prefix of the forbidden string, we create a regular expression of the form
$prefix || '([^F' || next-character-in-forbidden-string || ']'
|| '[^F]*'
Then we join all of those regular expressions with or-bars.
The end result in this case is something like the following (I have inserted newlines here and there, to make it easier to read; before use, they will need to be taken back out):
[^F]*
((F([^FI][^F]*)?)
|(FI([^FL][^F]*)?)
|(FIL([^FE][^F]*)?)
|(FILE([^FN][^F]*)?)
|(FILEN([^FA][^F]*)?)
|(FILENA([^FM][^F]*)?)
|(FILENAM([^FE][^F]*)?))*
Two points to bear in mind:
XSD regular expressions are implicitly anchored; testing this with a non-anchored regular expression evaluator will not produce the correct results.
It may not be obvious at first why the alternatives in the choice all end with [^F]* instead of .*. Thinking about the string 'FEEFIFILENAME' may help. We have to check every occurrence of 'F' to make sure it's not followed by 'ILENAME'.

Can I put one check on a Lexial element instead for on a number of parser rules?

I,m trying to use antlr4 with the IDL.g4 grammar, to implement some checks that our idl-files shall follow. One rule is about names. The rule are like:
ID contains only letters, digits and signle underscores,
ID begin with a letter,
ID end with a letter or digit.
ID is not a reserved Word in ADA, C, C++, Java, IDL
One way to do this check is to write a function that check a string for these properties and call it in the exit listeners for every rule that has an ID. E.g(refering to IDL.g4) in exitConst_decl(), exitInit_decl(), exitSimple_declarator() and a lot of more places. Maybe that is the correct way to do it. But I was thinking about putting that check directly on the lexical element ID. But don't know how to do that, or if it is possible at all.
Validating this type of constraint in the lexer would make it significantly more difficult to provide usable error messages for invalid identifiers. However, you can create a new parser rule identifier, and replace all references to ID in various parser rules to reference identifier instead.
identifier
: ID
;
You can then place your identifier validation logic inside of the single method enterIdentifier instead of all of the various rules that currently reference ID.

Resources