Intent detects regex entity - dialogflow-es

Since regexp entities are available, I added one to my agent.
This entity is used as a required parameter of my intent.
After some tests, it seems the intent does not pick up a word matching the regexp.
Any ideas?
For example:
Intent Training phrase: "my car is registered aa123aa"
"aa123aa" is the resolved value of a parameter of type regNum entity.
Entity regNum : ^[a-hj-np-tv-z]{2}(?:\s|-)?[0-9]{3}(?:\s|-)?[a-hj-np-tv-z]{2}$
I expect the following phrase to match the intent and resolve the parameter value:
"my car is registered bb123bb"
In fact it matches the intent, but it is unable to resolve the parameter value.
Moreover, if I use the training phrase "my car is registered aa123aa" itself, it does not resolve the parameter value either.

Dialogflow uses RE2 regex syntax. For more information, see the RE2 repository.
For example:
The string ABc1234# is matched by
\A([A-Z]{2}[a-z]{1}[0-9]{4}[!##$%^&*(),.?":{}|<>]{1})\z
\A - beginning of text
[A-Z]{2} - two uppercase letter A-Z
[a-z]{1} - one lowercase letter a-z
[0-9]{4} - four digits
[!##$%^&*(),.?":{}|<>]{1} - one special character.
\z - end of text
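
As a quick local sanity check, the regNum pattern from the question can be tried outside Dialogflow. The sketch below uses Python's re module, which is not RE2, on the assumption that this particular pattern only uses constructs both engines treat the same way. The helper name is_reg_num is made up for illustration.

```python
import re

# The regNum pattern from the question, unchanged.
REG_NUM = re.compile(
    r"^[a-hj-np-tv-z]{2}(?:\s|-)?[0-9]{3}(?:\s|-)?[a-hj-np-tv-z]{2}$"
)

def is_reg_num(value):
    """Return True if the anchored pattern matches the given string."""
    return REG_NUM.search(value) is not None

for candidate in ["aa123aa", "bb123bb", "aa-123 aa", "ii123aa",
                  "my car is registered bb123bb"]:
    print(repr(candidate), is_reg_num(candidate))
```

Because of the ^ and $ anchors, the pattern accepts the bare value but not the full phrase. Whether Dialogflow applies the pattern to the candidate value or to the whole utterance is not established here, so treat this only as a way to rule out mistakes in the pattern itself.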

Related

Regex pattern is taking more than 4 digit number

import re
text = """State of California that the foregoing is true and correct. (For California sheriff or marshal use only) 1950-24-12 I certify that the foregoing is true and correct. Date: (SIGNATURE) SUBP-010 [Rev. January 1,2012] PROOF OF SERVICE OF DEPOSITION SUBPOENA FOR PRODUCTION OF BUSINESS RECORDS 055826-00-07 Page 2 of 2"""
pattern = re.findall(r"\d{2,4}[-]\d{1,2}[-]\d{1,2}", text)
print(pattern)
Required_output: 1950-24-12
The pattern also picks up 5826-00-07, even though it is part of a number with more than 4 digits. Is there any way to exclude it?
What you want is called a negative lookbehind. It only matches a pattern when the text directly before the match does not match a given sequence. For example, (?<!something)abc will match any occurrence of "abc" that is not directly preceded by "something".
So in your case, you want to add (?<!\d) to the beginning of your regex to only match a pattern that is not preceded by a digit.
Also, [-] will only match the character - so you don't need the brackets. After this change, the new regex is (?<!\d)\d{2,4}-\d{1,2}-\d{1,2}.
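
A minimal sketch of the corrected pattern in Python. The sample text is a shortened stand-in for the original input, and the helper name find_dates is made up for illustration.

```python
import re

# Shortened stand-in for the original text; it keeps both candidate strings.
sample = "use only) 1950-24-12 I certify ... BUSINESS RECORDS 055826-00-07 Page 2 of 2"

# (?<!\d) rejects any match that starts right after another digit.
DATE_LIKE = re.compile(r"(?<!\d)\d{2,4}-\d{1,2}-\d{1,2}")

def find_dates(text):
    """Return all date-like substrings that are not preceded by a digit."""
    return DATE_LIKE.findall(text)

print(find_dates(sample))  # ['1950-24-12']
```

Every possible start position inside 055826 is either preceded by a digit or has too many digits before the first hyphen, so nothing from 055826-00-07 is returned.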

Using flex to identify variable name without repeating characters

I'm not fully sure how to word my question, so sorry for the rough title.
I am trying to create a pattern that can identify variable names with the following constraints:
Must begin with a letter
First letter may be followed by any combination of letters, numbers, and hyphens
First letter may be followed with nothing
The variable name must not be entirely X's ([xX]+ is a separate identifier in this grammar)
So for example, these would all be valid:
Avariable123
Bee-keeper
Y
E-3
But the following would not be valid:
XXXX
X
3variable
5
I am able to meet the first three requirements with my current identifier, but I am really struggling to change it so that it doesn't pick up variables that are entirely the letter X.
Here is what I have so far: [a-z][a-z0-9\-]* {return (NAME);}
Can anyone suggest a way of editing this to avoid variables that are made up of just the letter X?
The easiest way to handle that sort of requirement is to have one pattern which matches the exceptional string and another pattern, which comes afterwards in the file, which matches all the strings:
[xX]+ { /* matches all-x tokens */ }
[[:alpha:]][[:alnum:]-]* { /* handle identifiers */ }
This works because lex (and almost all lex derivatives) always prefers the longest match, and when two patterns match the same longest token it selects the pattern that appears first in the file.
Of course, you need to know what you want to do with the exceptional symbol. If you just want to accept it as some token type, there's no problem; you just do that. If, on the other hand, the intention was to break it into subtokens, perhaps individual letters, then you'll have to use yyless(), and you might want to switch to a new lexing state in order to avoid repeatedly matching the same long sequence of Xs. But maybe that doesn't matter in your case.
See the flex manual for more details and examples.
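
The rule-ordering idea can be sketched in Python as a toy model of longest match with a first-rule tie-break. This is not how flex is implemented. The names RULES and classify are made up, and [[:alpha:]] and [[:alnum:]] are assumed to mean ASCII letters and digits.

```python
import re

# Rules in file order, mirroring the two flex patterns above.
RULES = [
    ("ALL_X", re.compile(r"[xX]+")),
    ("NAME", re.compile(r"[A-Za-z][A-Za-z0-9-]*")),
]

def classify(text):
    """Return (token_type, lexeme) for the token at the start of text, or None."""
    best = None
    for name, rx in RULES:
        m = rx.match(text)
        # A strictly longer match wins; on a tie the earlier rule is kept.
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best

for word in ["XXXX", "X", "Xavier", "Bee-keeper", "E-3", "3variable"]:
    print(word, classify(word))
```

"XXXX" matches both rules with the same length, so the earlier ALL_X rule wins, while "Xavier" is longer under NAME and is classified as an ordinary identifier.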

Azure Search: Keyword tokenizer don't work with multi word search

I have a field in my index with [Analyzer(<name>)] applied. This analyzer is of type CustomAnalyzer with tokenizer = Keyword. I assumed it would treat both the field value and the search text as one term each. E.g.
ClientName = My Test Client (in the index, kept as 1 term). Search term = My Test Client (kept as 1 term). Result = match.
But surprisingly that's not the case unless I use a phrase search (enclose the term in double quotes). Does anyone know why, and how to solve it? I'd rather have the search term treated as a whole than have to enclose it in quotes.
Regards,
Sergei.
This is expected behavior. Query text is processed first by the query parser, and only the individual query terms it produces go through lexical analysis. When you issue a phrase query, the whole expression between quotes is treated as a single phrase term and goes through lexical analysis as one unit. You can find a complete explanation of this process in the documentation article "How full text search works in Azure Search".
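
The order of operations described above can be illustrated with a toy model in Python. This is only a sketch of the idea, not Azure Search's implementation. The functions parse_query, keyword_analyze and matches are hypothetical names, and whitespace splitting is a simplifying assumption about the query parser.

```python
import re

def parse_query(query):
    """Split query text into terms; a double-quoted span stays one phrase term."""
    return [phrase or word
            for phrase, word in re.findall(r'"([^"]*)"|(\S+)', query)]

def keyword_analyze(text):
    """Toy keyword tokenizer: the whole input becomes a single token."""
    return [text]

def matches(field_value, query):
    """True if any analyzed query term equals an indexed token."""
    indexed = set(keyword_analyze(field_value))
    query_tokens = [tok
                    for term in parse_query(query)
                    for tok in keyword_analyze(term)]
    return any(tok in indexed for tok in query_tokens)

print(matches("My Test Client", "My Test Client"))    # False
print(matches("My Test Client", '"My Test Client"'))  # True
```

In this model the unquoted query is already split into My, Test and Client before the keyword analyzer sees it, so none of the terms equals the single indexed token.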

Can I put one check on a Lexial element instead for on a number of parser rules?

I'm trying to use antlr4 with the IDL.g4 grammar to implement some checks that our IDL files must follow. One rule is about names. The rules are:
ID contains only letters, digits and single underscores,
ID begins with a letter,
ID ends with a letter or digit,
ID is not a reserved word in Ada, C, C++, Java or IDL.
One way to do this check is to write a function that checks a string for these properties and call it in the exit listeners for every rule that has an ID, e.g. (referring to IDL.g4) in exitConst_decl(), exitInit_decl(), exitSimple_declarator() and a lot more places. Maybe that is the correct way to do it. But I was thinking about putting that check directly on the lexical element ID, and I don't know how to do that, or if it is possible at all.
Validating this type of constraint in the lexer would make it significantly more difficult to provide usable error messages for invalid identifiers. However, you can create a new parser rule identifier, and replace all references to ID in various parser rules to reference identifier instead.
identifier
: ID
;
You can then place your identifier validation logic inside of the single method enterIdentifier instead of all of the various rules that currently reference ID.
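
The validation logic itself could look like the following Python sketch, which could be called from such a listener method. The function name check_identifier is made up, the reserved-word set is a tiny placeholder rather than a real list for Ada, C, C++, Java or IDL, and comparing reserved words case-insensitively is an assumption.

```python
import re

# Starts with a letter; each later alphanumeric may be preceded by at most one
# underscore, so doubled underscores and a trailing underscore are rejected.
ID_SHAPE = re.compile(r"[A-Za-z](?:_?[A-Za-z0-9])*")

# Placeholder sample only; a real check needs the full reserved-word lists.
RESERVED = {"class", "int", "begin", "interface"}

def check_identifier(name):
    """Return a list of problems found in name; an empty list means it is valid."""
    problems = []
    if not ID_SHAPE.fullmatch(name):
        problems.append("bad shape")
    if name.lower() in RESERVED:
        problems.append("reserved word")
    return problems

for name in ["my_value1", "a__b", "value_", "_x", "class"]:
    print(name, check_identifier(name))
```

Returning a list of problems instead of a boolean keeps it easy to produce a specific error message for each invalid identifier.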

Jape grammar to identify product release

How can I use an AND operation in a JAPE grammar? I just want to check whether a sentence contains 'organisation', 'jobtitle' and 'person' all together, in any order. How is this possible? The '|' (OR) operation is allowed, but I didn't see any documentation about an AND operation.
There isn't an "and" operator like that as such but you could do it with a set of contains checks:
Rule: OrgTitlePer
({Sentence contains {Organization},
Sentence contains {JobTitle},
Sentence contains {Person}}):sent
-->
:sent.Interesting = {}
When you have several constraints within the same set of braces that involve the same annotation type on the left (Sentence in this case) then all the constraints must be satisfied simultaneously by the same annotation.
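
The semantics of the combined contains checks can be sketched in Python as a toy model, not GATE's implementation. The Ann type and the function names are made up, and "contains" is assumed to mean that the inner annotation's span lies within the sentence's span.

```python
from typing import NamedTuple

class Ann(NamedTuple):
    type: str
    start: int
    end: int

def contains(outer, inner):
    """True if inner's span lies entirely within outer's span."""
    return outer.start <= inner.start and inner.end <= outer.end

def interesting_sentences(anns, required=("Organization", "JobTitle", "Person")):
    """Return Sentence annotations that contain every required annotation type."""
    sentences = [a for a in anns if a.type == "Sentence"]
    return [
        s for s in sentences
        if all(any(a.type == t and contains(s, a) for a in anns)
               for t in required)
    ]

anns = [
    Ann("Sentence", 0, 50), Ann("Person", 0, 8),
    Ann("JobTitle", 20, 23), Ann("Organization", 27, 31),
    Ann("Sentence", 51, 80), Ann("Person", 51, 55),
]
print(interesting_sentences(anns))
```

Only the first sentence is returned, because the second one contains a Person but no Organization or JobTitle.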

Resources