I'm developing a question answering system in Java. I have manually created templates that are matched against the question the user asks.
The problem: after preprocessing I have a list of
keywords, and I want to match these against the keywords in the stored templates to narrow the search. Is there an algorithm for this?
Example question: What are the features of Java?
Keywords: features, java
Extract the templates containing the keywords "features" and "java".
As I understand your question, your lexicon holds patterns that contain certain keywords, and other keywords arrive from the user's question. You need an algorithm to find those patterns in the system's input.
As far as I know, in Java you can define a pattern with the Pattern class and then test input against it like this (a simple example):
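import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern pat = Pattern.compile("[A-Z]+");
Matcher matcher = pat.matcher("ABCD");
// matches() succeeds only if the entire input matches the pattern
if (matcher.matches()) {
    System.out.println("It matches.");
}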
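For the keyword-to-template filtering asked about in the question, a regular expression may be more than is needed. Here is a minimal sketch, assuming each stored template has already been reduced to a set of lowercase keywords (the template IDs and keyword sets are made up for illustration). It keeps the templates whose keyword set contains every keyword from the question. It is written in Python for brevity; the same check maps onto Java's Set.containsAll.

```python
# Hypothetical templates: ID -> set of lowercase keywords (illustrative data).
TEMPLATES = {
    "t1": {"features", "java"},
    "t2": {"features", "python"},
    "t3": {"history", "java"},
    "t4": {"features", "java", "versions"},
}


def matching_templates(keywords, templates=TEMPLATES):
    """Return the IDs of templates that contain every question keyword."""
    wanted = {k.lower() for k in keywords}
    # A template qualifies when the question keywords are a subset of its keywords.
    return sorted(tid for tid, kws in templates.items() if wanted <= kws)


if __name__ == "__main__":
    print(matching_templates(["features", "java"]))
```

With the sample data, the keywords "features" and "java" keep templates t1 and t4. For a large template store, an inverted index (keyword to template IDs) would avoid scanning every template, but the subset test is the core idea.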
I have a text field for tags. For example some entities:
{"tags": "apple. fruits. eat."}
{"tags": "green apple."}
{"tags": "banana. apple."}
I want to select entities with the tag apple, not green apple or anything of the form "something apple something". The variants I have tried all come down to the same thing: they select any sentence that contains the expression, regardless of what the rest of the sentence looks like. In this case, though, the rest of the sentence matters.
How can I do this with Lucene syntax or Azure Search tools? Or, more generally, how can I search for an exactly matching sentence?
I presume that the "." is a delimiter between the different tags. There may be a way to express this in Lucene, but you may need to add custom analyzers to preserve the "." characters during tokenization.
A better strategy in this case would be to use a field of type Collection(Edm.String). This lets you preserve the structure of the tag phrases, and you can use a filter to select the specific value "apple". Collection(Edm.String) also lets you enable faceting on the tags, which is useful.
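To illustrate why a collection of tags makes exact matching easy, here is a minimal Python sketch, independent of Azure Search. It splits each "."-delimited tag string into a list and keeps only the entities that have a tag exactly equal to "apple". The helper names are hypothetical. On a Collection(Edm.String) field, the corresponding Azure Search filter would be an OData expression along the lines of tags/any(t: t eq 'apple').

```python
ENTITIES = [
    {"tags": "apple. fruits. eat."},
    {"tags": "green apple."},
    {"tags": "banana. apple."},
]


def split_tags(raw):
    """Split a '.'-delimited tag string into a list of trimmed, non-empty tags."""
    return [t.strip() for t in raw.split(".") if t.strip()]


def with_exact_tag(entities, tag):
    """Keep entities that carry a tag exactly equal to `tag`."""
    return [e for e in entities if tag in split_tags(e["tags"])]


if __name__ == "__main__":
    for entity in with_exact_tag(ENTITIES, "apple"):
        print(entity["tags"])
```

The entity tagged only "green apple" is excluded because the comparison is against whole tags rather than individual words.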
I am using SyntaxNet with Spanish and have found that every word has a field called "feats" whose format depends on the type of word (noun, pronoun, verb). The meaning of some fields is obvious, but in other cases I cannot work out what they show, for example "fPOS" or "Case" on pronouns. Is there a guide or list with explanations available?
Universal features are documented in the Universal Dependencies guidelines.
I have a document
doc = nlp('x-xxmessage-id:')
When I extract the tokens from this document, I get 'x', 'xx', 'message', 'id' and ':'. So far, so good.
Then I create a new document
test_doc = nlp('id')
If I extract the tokens of test_doc, I get 'i' and 'd'. Is there any way around this? I want to get the same token as above, and the difference is causing problems in my text processing.
Just like language itself, tokenization is context-dependent and the language-specific data defines rules that tell spaCy how to split the text based on the surrounding characters. spaCy's defaults are also optimised for general-purpose text, like news text, web texts and other modern writing.
In your example, you've come across an interesting case: the abstract string "x-xxmessage-id:" is split on punctuation, while the isolated lowercase string "id" is split into "i" and "d", because in written text it's most commonly an alternate spelling of "I'd" or "i'd" ("I had", "I would", etc.). You can find the respective rules here.
If you're dealing with specific texts that are substantially different from regular natural language texts, you usually want to customise the tokenization rules or possibly even add a Language subclass for your own custom "dialect". If there's a fixed number of cases you want to tokenize differently that can be expressed by rules, another option would be to add a component to your pipeline that merges the split tokens back together.
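The merging idea can be sketched without any spaCy API. The snippet below is plain Python operating on a list of token strings, not a spaCy pipeline component, and the function name and merge table are hypothetical. It rejoins any adjacent run of tokens listed in the table into a single token.

```python
# Hypothetical merge table: sequence of split tokens -> the token to restore.
MERGE_RULES = {("i", "d"): "id"}


def merge_split_tokens(tokens, rules=MERGE_RULES):
    """Rejoin adjacent tokens that match a rule; leave all others unchanged."""
    merged = []
    i = 0
    while i < len(tokens):
        for pattern, whole in rules.items():
            n = len(pattern)
            if tuple(tokens[i:i + n]) == pattern:
                merged.append(whole)
                i += n
                break
        else:
            # No rule matched at this position: keep the token as it is.
            merged.append(tokens[i])
            i += 1
    return merged


if __name__ == "__main__":
    print(merge_split_tokens(["i", "d"]))
```

In a real pipeline the same logic would run over the document's tokens after tokenization, using whatever merge mechanism your spaCy version provides.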
Finally, you could also try using the language-independent xx / MultiLanguage class instead. It still includes very basic tokenization rules, like splitting on punctuation, but none of the rules specific to the English language.
from spacy.lang.xx import MultiLanguage
nlp = MultiLanguage()
I admit that I haven't searched the SO database extensively. I tried reading up on the natural npm package, but it doesn't seem to provide this feature. Is the requirement below feasible?
I have a database that lists all the cities of a country. I also have ratings for these cities (best place to live, worst place to live, best rated city, worst rated city, etc.). From the user interface, I would like to let the user enter free text and then use that text to search my database.
For example: Best place to live in California
or places near California
or places in California
From the sentences above, I want to extract only the nouns (maybe), since these will be the name of the city or country that I can search for.
Then, extracting 'best' tells me to sort in a particular order, and so on.
Any suggestions or directions to look for?
I realize the question risks being marked as 'debatable', but I posted it to get some direction on how to proceed.
[I came across this question whilst looking for some use cases to test a module I'm working on. Obviously the question is a little old, but since my module addresses the question I thought I might as well add some information here for future searchers.]
You should be able to do what you want with a POS chunker. I've recently released one for Node that is modelled on the chunkers provided by the NLTK (Python) and Stanford NLP (Java) libraries (the chunk() and TokensRegex() methods, respectively).
The module processes strings that already contain parts-of-speech, so first you'll need to run your text through a parts-of-speech tagger, such as pos:
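var pos = require('pos');

// Split the sentence into tokens.
var words = new pos.Lexer().lex('Best place to live in California.');

// Tag each token, then join the [word, tag] pairs into "word/TAG" form.
var tags = new pos.Tagger()
  .tag(words)
  .map(function (tag) { return tag[0] + '/' + tag[1]; })
  .join(' ');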
This will give you:
Best/JJS place/NN to/TO live/VB in/IN California/NNP ./.
Now you can use pos-chunker to find all proper nouns:
var chunker = require('pos-chunker');
var places = chunker.chunk(tags, '[{ tag: NNP }]');
This will give you:
Best/JJS place/NN to/TO live/VB in/IN {California/NNP} ./.
Similarly you could extract verbs to understand what people want to do ('live', 'swim', 'eat', etc.):
var verbs = chunker.chunk(tags, '[{ tag: VB }]');
Which would yield:
Best/JJS place/NN to/TO {live/VB} in/IN California/NNP ./.
You can also match words, sequences of words and tags, use lookahead, group sequences together to create chunks (and then match on those), and other such things.
You probably don't have to identify what is a noun. Since you already have a list of city and country names that your system can handle, you just have to check whether the user input contains one of these names.
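As a minimal sketch of that lookup, assuming the known place names fit in memory (the list below is illustrative, and in practice it would come from your database), the following Python function reports which known names occur in the user's text as whole words, ignoring case.

```python
import re

# Hypothetical list of place names the system knows about (illustrative only).
KNOWN_PLACES = ["California", "New York", "York", "Texas"]


def find_places(text, places=KNOWN_PLACES):
    """Return the known place names that occur in `text` as whole words."""
    found = []
    # Try longer names first so "New York" is preferred over "York".
    for place in sorted(places, key=len, reverse=True):
        pattern = r"\b" + re.escape(place) + r"\b"
        # Blank out every match so shorter names inside it are not reported.
        text, count = re.subn(
            pattern, lambda m: " " * len(m.group()), text, flags=re.IGNORECASE
        )
        if count:
            found.append(place)
    return found


if __name__ == "__main__":
    print(find_places("Best place to live in California"))
```

Words such as "best" or "near" can be handled the same way with a small, fixed vocabulary that maps each word to a sort order or a search mode.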
Well, firstly you'll need a way to identify nouns. There is no core Node module that can do this for you. You need to loop through all the words in the string and look each one up in some kind of dictionary database to check whether it's a noun.
I found this API, which looks pretty promising. You query it for a word and it sends back a blob of data like this:
<?xml version="1.0" encoding="UTF-8"?>
<results>
<result>
<term>consistent, uniform</term>
<definition>the same throughout in structure or composition</definition>
<partofspeech>adj</partofspeech>
<example>bituminous coal is often treated as a consistent and homogeneous product</example>
</result>
</results>
You can see that it includes a partofspeech member which tells you that the word "consistent" is an adjective.
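Reading that field out of the response is straightforward with any XML parser. Here is a minimal Python sketch using the standard library, assuming the response has exactly the shape of the sample above. No HTTP request is shown, because the API's URL and parameters are not given here.

```python
import xml.etree.ElementTree as ET

# The sample response from above, as bytes so the encoding declaration is honoured.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<results>
<result>
<term>consistent, uniform</term>
<definition>the same throughout in structure or composition</definition>
<partofspeech>adj</partofspeech>
<example>bituminous coal is often treated as a consistent and homogeneous product</example>
</result>
</results>
"""


def parts_of_speech(xml_bytes):
    """Return (term, part of speech) pairs from a response shaped like SAMPLE."""
    root = ET.fromstring(xml_bytes)
    return [
        (result.findtext("term"), result.findtext("partofspeech"))
        for result in root.iter("result")
    ]


if __name__ == "__main__":
    print(parts_of_speech(SAMPLE))
```

Keeping only the results whose part of speech marks a noun would then be a one-line filter, once you know which label the API uses for nouns.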
Another (and better) option, if you have control over the text being stored, is to use some kind of markup language to identify the important parts of the string before you save it. Something like BBCode. I even found a BBCode Node module that will help you do this.
Then you can save your strings to the database like this:
Best place to live in [city]California[/city] or places near [city]California[/city] or places in [city]California[/city].
or
My name is [first]Alex[/first] [last]Ford[/last].
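Pulling the marked-up values back out can be done with one regular expression. The Python sketch below is an illustration only, not the BBCode module's API. It assumes tags are simple words, are not nested, and always have a matching closing tag.

```python
import re

# Matches [name]value[/name]; the backreference \1 requires the closing tag
# to use the same name as the opening tag.
TAG_RE = re.compile(r"\[(\w+)\](.*?)\[/\1\]")


def extract_fields(text):
    """Return (tag, value) pairs for every [tag]value[/tag] span in `text`."""
    return TAG_RE.findall(text)


if __name__ == "__main__":
    print(extract_fields("My name is [first]Alex[/first] [last]Ford[/last]."))
```

For the second example string, this yields the pairs ("first", "Alex") and ("last", "Ford").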
If you're letting users type whole sentences and then trying to figure out which parts of those sentences are data your app should use, you're making things unnecessarily hard on yourself. You should either ask them to enter the important pieces of data in their own text boxes, or give them a formatting language such as the aforementioned BBCode syntax so they can identify the important bits for you. Working out which parts of a string are important is going to be a huge job, I think.
I'm fairly new to ExpressionEngine and I feel this is a really simple question, but I can't find a straightforward answer in the documentation.
I have a list of restaurants and an alphabetized menu (A B C D etc.).
I want to search only the listings that start with the letter "A".
In a traditional MySQL search that would be WHERE Title LIKE 'A%'
Any ideas?
I do not believe the Channel Entries module's search parameter allows LIKE matching.
You'll save time by grabbing the Low Alphabet module in this specific case for sure.
ExpressionEngine doesn't have an exact "LIKE" option, but it does have something similar.
I can search a field to see whether it "contains" a string, but there is nothing specifically for determining whether it starts or ends with a given string (as would be easily available in MySQL).
I ended up using the "contains" search parameter and then, inside the exp:channel:entries loop, excluding any results that didn't match my exact criteria.
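The two-step approach (a broad "contains" match, followed by discarding entries that don't start with the letter) looks like this in outline. This is a plain Python illustration of the logic, not ExpressionEngine template code, and the restaurant names are made up.

```python
# Illustrative entry titles (not from a real channel).
RESTAURANTS = ["Alfredo's", "Bella Napoli", "Casa Ana", "antojitos"]


def contains(titles, text):
    """Step 1: the broad match, like a 'contains' search on the title field."""
    needle = text.lower()
    return [t for t in titles if needle in t.lower()]


def starts_with(titles, text):
    """Step 2: keep only the titles that begin with `text`, ignoring case."""
    prefix = text.lower()
    return [t for t in titles if t.lower().startswith(prefix)]


if __name__ == "__main__":
    candidates = contains(RESTAURANTS, "a")
    print(starts_with(candidates, "a"))
```

With this data the broad match returns all four titles, and the second step narrows them to "Alfredo's" and "antojitos".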