Identifying and comparing syntactic structure of question sentence - nlp

I receive a question from a user and am trying to analyze it syntactically.
My goal is to identify the exact question sentence within the text the user entered. For example:
Obama is president of USA, who is his wife?
I am able to apply anaphora resolution, resolve "his" to "Obama", and convert the sentence above to
Obama is president of USA, who is Obama wife?
But how can I syntactically identify the exact question sentence, i.e. "Who is Obama wife?", within the whole input?
I am trying pylinkgrammar, which gives 54 linkages for the sentence above, for example:
linkparser>
Linkage 54, cost vector = (UNUSED=0 DIS= 8.05 LEN=24)
+------------------------------Xp------------------------------+
+---------------------->WV---------------------->+ |
+-------------------Xx-------------------+-->WV->+---SIs---+ |
+----Wd---+--Ss--+--Oum--+---Mp--+-Js+ +Wq+--Q-+ +Ds**c+ |
| | | | | | | | | | | |
LEFT-WALL Obama[!] is.v president.t of USA.l , who is.v his wife.n ?
What I want to do is define patterns for different question types, such as W5H1 (who, what, when, where, why, how) questions, conjunction-based questions, etc.
But I can't find out how to write rules for these patterns. Any suggestions and references would be much appreciated.

You can try extracting the possible sub-questions (hypotheses) from your original text and then testing for textual entailment between the text and each hypothesis. Check out http://hltfbk.github.io/Excitement-Open-Platform/#Recognizing_Textual_Entailment
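To make the "extract possible sub-questions" step a bit more concrete, here is a minimal Python sketch. It is not the Excitement Open Platform API and does no entailment testing; it is only a naive heuristic that splits the text into clauses and keeps those starting with an interrogative word as candidate hypotheses. The word list and the function name candidate_questions are assumptions made for illustration.

```python
import re

# Interrogative words used to spot a question clause (an illustrative list).
WH_WORDS = {"who", "whom", "whose", "what", "when", "where", "why", "which", "how"}

def candidate_questions(text):
    """Return the clauses of `text` that start with an interrogative word."""
    # Split on commas, semicolons and sentence-final punctuation.
    clauses = re.split(r"[,;.?!]+", text)
    candidates = []
    for clause in clauses:
        words = clause.split()
        if words and words[0].lower() in WH_WORDS:
            candidates.append(" ".join(words) + "?")
    return candidates

print(candidate_questions("Obama is president of USA, who is Obama wife?"))
```

Each candidate could then be passed, together with the full text, to whatever entailment component you choose. Questions that do not begin with a wh-word (for example yes/no questions) would need additional patterns.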


Gherkin - real or representative scenarios?

Let's say my app works with books that have real titles, such as "The Old Man and the Sea", "War and Peace", etc. When creating scenarios, should I use a real title like:
Given I have a book "War and Peace" persisted
When ...
or should I do something like:
Given I have a book "Book1" persisted
When ...
Option 2 is more generic, but it is an artificial example. If I use the first option, the person reading the test needs domain knowledge, and will form presumptions about the scenario as soon as they read the title of the book.
Also, is there a simpler way to create a data table without repeating data (in this case the page, where I always have to repeat 1, 1, 2, 2, 2, 2...)? Example:
When we receive book with following content:
| Page | Line | Text |
| 1 | 1 | a |
| 1 | 2 | b |
| 2 | 1 | a |
| 2 | 2 | b |
Or is this the standard way to do it:
When we receive a book
And page 1 has content
| Line | Text |
| 1 | a |
| 2 | b |
And page 2 has content
| Line | Text |
| 1 | a |
| 2 | b |
First of all, start with the name of the scenario. The name should be meaningful and should summarize what the test is about.
Once you have the name, the remaining steps should describe a business flow, which of course should use domain language. If I know nothing about healthcare, banking, etc., why would I understand a test about a specific subject in that domain? The scenarios are written for a specific group of people (the ones working in that domain).
One of the roles of BDD is to improve communication by helping everyone, from technical to non-technical people in the same business domain, better understand the specifications and the application.
Now for your specific issue.
Given I have a book "War and Peace" persisted does not offer much information, since the title of the book says nothing about the test data: is it a new book that was just added/created? Is it a particular type of book (technical, poetry) or just some book?
What was useful for me was to use a name for the data that says something about the data used in the test.
If you don't have different types of books you can use any name; otherwise a more descriptive name would be more useful.
As for the table, it represents a data set, and you need to state what to check and where. Depending on the case you could group some checks, according to whether you can read all the data at once or need to specify the texts/pages.
One option would be to hide the data set and say something like:
Given I have a book "War and Peace" persisted
Then the book contains the expected content for "War and Peace"
In the first step, "War and Peace" gets/creates a specific book that is identified by this title.
In the second step, "War and Peace" identifies the expected data set, using the same name because it is the expected result for that specific book. This data set can be a list/array/map, depending on the programming language you are using.
Don't think too much about the details; just define the scenario in human-readable language using an outside-in approach, then see if you can refine it, and only then start the implementation.
Always use a description for the feature and a meaningful title for each scenario.
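As a rough illustration of keeping the expected data behind the title as a list/array/map, here is a minimal Python sketch. The names EXPECTED_CONTENT, flatten and book_has_expected_content are made up for this example, and the step-definition glue of your Cucumber binding is left out. Keying the lines by page also means each page number is written only once.

```python
# Hypothetical fixture: expected content keyed by book title.
# Each page number maps to its lines in order, so the page is written once.
EXPECTED_CONTENT = {
    "War and Peace": {
        1: ["a", "b"],
        2: ["a", "b"],
    },
}

def flatten(pages):
    """Expand {page: [texts]} into (page, line, text) rows."""
    return [
        (page, line, text)
        for page, texts in sorted(pages.items())
        for line, text in enumerate(texts, start=1)
    ]

def book_has_expected_content(title, actual_rows):
    """Compare rows read from the book with the expected set for `title`."""
    return flatten(EXPECTED_CONTENT[title]) == list(actual_rows)

# Rows as they might be read back from the persisted book.
rows = [(1, 1, "a"), (1, 2, "b"), (2, 1, "a"), (2, 2, "b")]
print(book_has_expected_content("War and Peace", rows))
```

The Then step would call something like book_has_expected_content with the title captured from the step text, so the feature file stays free of the page/line table.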

How can I display multi-line scenario text in Extent Reports?

In my feature file, I am checking more than one requirement with the same scenario. I have written the scenario like this:
Scenario: My first requirement ID
My second requirement ID
My third requirement ID
Etc
After execution, the Extent report shows only the first line as the result:
Scenario: My first requirement ID
How can I get all three IDs in the Extent report?
NOTE: Each of my scenario titles is lengthy.
Can you explain your scenario text a little more? According to the documentation, the scenario should describe in human terms what we expect the software to do. It is quite unusual to include expected data in the scenario text. Are you using the ID from an enum? If so, it would be better to spell out the enum in human-readable terms, for example Scenario: UserType is Administrator. Another option would be to use a Scenario Outline, something like:
Scenario Outline: My generic requirement statement
Given Id <whateverId> is provided
When I do <activity>
Then I expect to see <result>
Examples:
| whateverId | activity | result |
| 12 | firstMethod | MyResult |
| 20 | secondActivity | anotherResult |
| 42 | thirdExample | thirdResult |
The variable names provided in the outline in angle brackets become the column headers in the examples grid. Just be sure to indent the grid below the Examples: line and also include the pipe | on both the left and right boundaries of the grid. Hopefully that helps.
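To show what the substitution amounts to, here is a small Python sketch that expands the outline above into one concrete scenario per Examples row. This is not how Cucumber is implemented; the function name expand and the data layout are assumptions for illustration only.

```python
import re

# Outline steps and Examples rows, mirroring the scenario above.
OUTLINE = [
    "Given Id <whateverId> is provided",
    "When I do <activity>",
    "Then I expect to see <result>",
]
HEADERS = ["whateverId", "activity", "result"]
ROWS = [
    ["12", "firstMethod", "MyResult"],
    ["20", "secondActivity", "anotherResult"],
    ["42", "thirdExample", "thirdResult"],
]

def expand(outline, headers, rows):
    """Yield one concrete list of steps per Examples row."""
    for row in rows:
        values = dict(zip(headers, row))
        # Replace each <name> with the value from the matching column.
        yield [
            re.sub(r"<(\w+)>", lambda m: values[m.group(1)], step)
            for step in outline
        ]

for steps in expand(OUTLINE, HEADERS, ROWS):
    print("\n".join(steps), end="\n\n")
```

Each row is run and reported as its own scenario, which is why an outline is usually a better fit than stacking several requirement IDs into one scenario title.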

Cucumber: How to execute the entire list of Scenario Outline examples for different sets of attributes

I have a situation where I need to run a scenario outline, with its whole data table, for different sets of values. In effect I am looking for a data table inside another data table: I need to run the entire list of examples of a scenario outline repeatedly for a given list of products.
Note: I am trying to avoid writing a different scenario for each product.
I have given an example and my problem statement below for better understanding.
Scenario Outline : Check the behaviour of all the products
Given the POST retrieveProductdetails api url with valid authorization
When POST api is applied for the <"Products">
Then verify the behaviour of all the <"Properties"> and its <"result">
Examples:
|Properties |result|
|Appearance | Successful|
|reading | Successful|
|writing |Successful|
|memo |Successful|
|Singing |Successful|
|Help |Successful|
|Adancefeature |Successful|
|Antiquefeatuer |Succesful|
|AI nature |Successful|
|Interaction |Successful|
Note: I have around 20 products to validate, and for each product I need to validate all 10 properties mentioned above.
If I add a third variable as shown below, I will end up writing 200 lines/examples (20 * 10 = 200). I also have around 25 to 30 other details, similar to the scenario above, that need to be validated for all 20 products. Maintenance will be very difficult. Is there a better option?
Examples:
|Properties |result |Products|
|Appearance | Successful |Alexa|
List of Products
|Products|
|Alexa|
|firetv|
|GoogleHome|
|Chromecast|
|SmartHub|
|SmartTV|
|AmazonVideo|
|AmazonPhoto|
|Echo|
|Echo Dot|
|Echo Show|
|Ring|
.
.
.
.
|SmartHome|
You are making this very difficult for yourself, for a couple of reasons.
You are not describing the behavior of the system. I'm not sure what your application is supposed to do, but it seems that you have a particular type of product and all instances of that type of product should have certain flags set.
However, you didn't write this down; rather, you appear to be retrieving all products of that type from a database and checking whether they have the right flags set. So I have to infer the behavior of the system from your scenario. This should be the other way around.
You are trying to program in Gherkin. Steps in Gherkin are not steps in a test script. They do not have to describe the exact operation needed to get some result. When you use Gherkin to describe the behavior of a system, it shouldn't matter whether you talk to the system in a unit test, via HTTP or through a browser.
However, by describing the exact operations you are painting yourself into a corner. It means that you can't effectively generalize without using programming language constructs like loops. If you step away from describing exact operations and instead try to describe what the system does, you can use a much bigger vocabulary.
You appear to be testing against fixed data. Your data appears to have been put into the system already, and you are merely checking whether it comes out all right. This is not a good test, because it assumes the system is in a particular state rather than putting the system in that state or verifying that it is.
So to fix your feature file you might want to do something like this:
Scenario: All smart home products are in the category of AI powered spy-devices
Given the smart home product "<Product>"
When I inspect this smart home product
Then it has all the properties of an AI powered spy-device:
| Appearance |
| reading |
| writing |
| memo |
| Singing |
| Help |
| Adancefeature |
| Antiquefeatuer |
| AI nature |
| Interaction |
Examples:
| Product |
| Alexa |
| firetv |
| GoogleHome |
| Chromecast |
| SmartHub |
| SmartTV |
| AmazonVideo |
| AmazonPhoto |
| Echo |
| Echo Dot |
| Echo Show |
| Ring |
.
.
.
.
| SmartHome |
While in the Given step you'd normally create the product, in your case you'll have to fetch the catalogue of products and verify that the catalogue contains the product. In the When step you'd probably fetch the details for the product. Finally in the Then step you'd verify if all properties have been set when looking at the details.
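The three steps described above could be sketched roughly as follows, in plain Python rather than in any particular Cucumber binding. CATALOGUE is a hypothetical in-memory stand-in for the product API, and the function names and the context dictionary are assumptions made for this example; a real suite would call the actual endpoint and register the functions as step definitions.

```python
# Hypothetical in-memory stand-in for the product API; a real suite
# would call the product details endpoint instead.
CATALOGUE = {
    "Alexa": {"Appearance": True, "reading": True, "writing": True},
    "Echo": {"Appearance": True, "reading": True, "writing": False},
}

def given_the_smart_home_product(context, product):
    # Given: fetch the catalogue and verify that it contains the product.
    assert product in CATALOGUE, f"unknown product: {product}"
    context["product"] = product

def when_i_inspect_this_product(context):
    # When: fetch the details of the product chosen in the Given step.
    context["details"] = CATALOGUE[context["product"]]

def then_it_has_all_the_properties(context, properties):
    # Then: collect every property that is missing or not set.
    missing = [p for p in properties if not context["details"].get(p)]
    assert not missing, f"{context['product']} lacks: {missing}"

def run_scenario(product, properties):
    """Run the three steps in order with a shared context."""
    context = {}
    given_the_smart_home_product(context, product)
    when_i_inspect_this_product(context)
    then_it_has_all_the_properties(context, properties)

run_scenario("Alexa", ["Appearance", "reading", "writing"])
```

Because the property list is an argument of the Then step, the feature file needs only one Examples row per product, not one row per product and property.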
edit:
If you actually want to check if all the data has been entered into the system correctly you could also do something like this:
Scenario: All smart home products are in the category of AI powered spy-devices
Given the smart home product "<Product>"
When I inspect this smart home product
Then it has all the properties:
| Appearance | <Apperance> |
| reading | <Reading> |
| writing | <....> |
| memo | |
| Singing |
| Help |
| Adancefeature |
| Antiquefeatuer |
| AI nature |
| Interaction |
Examples:
| Product | Apperance | Reading | ....
| Alexa | Yes | No
| firetv | No | Yes
| GoogleHome | Yes | No
.
.
.
.
| SmartHome | No | Yes | ....
But I would suggest not using Cucumber for this. In that case you'd be better off putting your data into an Excel file and using JUnit 5's parameterized tests.
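For readers who want to see the shape of such a data-driven test, here is a minimal sketch. Note that it is a Python analogue using unittest and subTest, not JUnit 5, and that inline CSV text stands in for the Excel file. The data values are taken from the example table above; fetch_details and its contents are hypothetical placeholders for the real call into the system.

```python
import csv
import io
import unittest

# Hypothetical data; in practice this would be exported from the spreadsheet.
PRODUCT_DATA = """\
Product,Appearance,Reading
Alexa,Yes,No
firetv,No,Yes
GoogleHome,Yes,No
"""

def fetch_details(product):
    """Hypothetical stand-in for the call that retrieves product details."""
    system_state = {
        "Alexa": {"Appearance": "Yes", "Reading": "No"},
        "firetv": {"Appearance": "No", "Reading": "Yes"},
        "GoogleHome": {"Appearance": "Yes", "Reading": "No"},
    }
    return system_state[product]

class ProductDataTest(unittest.TestCase):
    def test_every_row_matches_the_system(self):
        for row in csv.DictReader(io.StringIO(PRODUCT_DATA)):
            product = row.pop("Product")
            # subTest reports each product separately, much like a
            # parameterized test would.
            with self.subTest(product=product):
                self.assertEqual(fetch_details(product), row)

def run():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ProductDataTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)

result = run()
```

Adding a product or a property then means adding a row or a column to the data file, without touching the test code.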

Configure sphinx to rank exact matching higher with morphology enabled

I have a Sphinx index for searching users by name.
I'm using soundex morphology to show more relevant results when the searcher doesn't know exactly how the name is spelled. Consider the following table:
+----+--------------------+
| id | name               |
+----+--------------------+
| 1  | Maciej Makuszewski |
| 2  | Dane Massey        |
| 3  | Lionel Messi       |
| 4  | Mr. No Matches     |
+----+--------------------+
With soundex enabled, Sphinx returns rows 1, 2 and 3 as relevant results for the query messi. However, I'd like to show the exact match first: if a user types messi, they most probably want to see Lionel Messi first.
My problem is that I don't know how to do that. I tried setting different rankers, but it changed nothing.
I also tried adding
index_exact_words = 1
to the index, but it changed nothing.
I'm using the Sphinx API with the node.js sphinxapi module, if it matters.
What is the common way of solving this issue?
You want index_exact_words, but you should also add expand_keywords.
This causes Sphinx to search automatically for both the fuzzy form (via morphology) AND the exact word (via index_exact_words). An exact match then matches both and ranks higher.
You can do the same manually by searching for, say,
messi | =messi
(which is similar to what expand_keywords does internally)
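If you go the manual route, the rewriting can be done on the client before the query is sent. Below is a minimal Python sketch of that idea; the function name with_exact_forms is made up, and it assumes the input contains only plain words (no Sphinx operators that would need escaping) and that the search runs in a match mode that understands the | and = operators.

```python
def with_exact_forms(query):
    """Rewrite 'a b' as '(a | =a) (b | =b)' so exact forms also match."""
    return " ".join(f"({word} | ={word})" for word in query.split())

print(with_exact_forms("messi"))         # (messi | =messi)
print(with_exact_forms("lionel messi"))  # (lionel | =lionel) (messi | =messi)
```

The same one-line transformation is easy to port to the node.js client. The =word form only works on an index built with index_exact_words enabled.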

Pocketsphinx: how to detect out-of-grammar words

I am currently using the pocketsphinx demo (Android and Visual Studio 2010) and I have configured a JSGF grammar
like this:
#JSGF V1.0;
grammar Names;
public <popular> = muhammad | ahmed | maria | john | kelley | peter | jacob | jason;
Whenever I say a name from the list, it is detected correctly in most cases. But when I say a name that is not in the list, it still matches something. I do not want that, or I would at least like to be able to detect that something outside the grammar was said (maybe through some score or pocketsphinx API).
I am sure pocketsphinx has this somewhere and I just don't know about it. Please advise.
Thanks,
Ahmed
No, this feature is not implemented. For more details see
http://cmusphinx.sourceforge.net/wiki/faq#qcan_pocketsphinx_reject_out-of-grammar_words_and_noises
Instead, you can use keyword spotting mode to look for a list of keyphrases. You can configure an activation threshold for every phrase.
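As a starting point, the names from the grammar can be turned into a keyphrase list, which is a text file with one phrase per line followed by its threshold between slashes. The Python sketch below only generates that text; the threshold 1e-20 is an arbitrary placeholder that has to be tuned per phrase (and short single-word names may be hard to tune reliably), and the function name keyphrase_list is made up for this example.

```python
# Names from the grammar above.
NAMES = ["muhammad", "ahmed", "maria", "john", "kelley", "peter", "jacob", "jason"]

def keyphrase_list(phrases, threshold="1e-20"):
    """Format one 'phrase /threshold/' line per keyphrase."""
    return "".join(f"{phrase} /{threshold}/\n" for phrase in phrases)

# Print the list; save it to a file to hand it to the recognizer.
print(keyphrase_list(NAMES), end="")
```

The saved file is then given to pocketsphinx as the keyphrase list (the -kws option on the command line) in place of the JSGF grammar. Utterances that match none of the phrases above their thresholds produce no detection, which is the rejection behavior you are after.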
