I am using the CouchDB river plugin, and when I execute the following curl command:
curl -XPUT 'localhost:9200/_river/blog/_meta' -d '{
  "type": "couchdb",
  "couchdb": {
    "host": "localhost",
    "port": 5984,
    "db": "blog",
    "filter": null
  },
  "index": {
    "analysis": {
      "analyzer": {
        "whitespace": {
          "type": "whitespace",
          "filter": "lowercase"
        },
        "ox_edgeNGram": {
          "type": "custom",
          "tokenizer": "ox_t_edgeNGram",
          "filter": [
            "lowercase"
          ]
        },
        "ox_NGram": {
          "type": "custom",
          "tokenizer": "ox_t_NGram",
          "filter": [
            "lowercase"
          ]
        }
      },
      "tokenizer": {
        "ox_t_edgeNGram": {
          "type": "edgeNGram",
          "min_gram": 2,
          "max_gram": 25,
          "side": "front"
        },
        "ox_t_NGram": {
          "type": "NGram",
          "min_gram": 2,
          "max_gram": 25
        }
      }
    }
  }
}'
I receive the response:
{
  "ok": true,
  "_index": "_river",
  "_type": "blog",
  "_id": "_meta",
  "_version": 1
}
The problem I have is that when I view the settings in the browser at:
http://localhost:9200/blog/_settings?pretty=true
the JSON that is returned is as follows, but I expected it to contain the analyzer information etc. that I thought I had created.
Returned JSON:
{
  "blog": {
    "settings": {
      "index.number_of_shards": "5",
      "index.number_of_replicas": "1"
    }
  }
}
It should also be noted that when I create a blog index without using the river, and run a curl command to supply the analysis settings, I do get a response from the browser showing the settings I put in.
How can I set the default settings of an index when using the river plugin?
To solve this issue:
1. Create the new Elasticsearch index, along with its settings and mappings.
2. Create the new Elasticsearch river with the name of the index set to that of the index created in step one.
I found the answer here:
http://groups.google.com/a/elasticsearch.com/group/users/browse_thread/thread/5ebf1556d139d5ac/f17e71e04cac5889?lnk=gst&q=couchDB+river+settings#f17e71e04cac5889
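For example, a minimal sketch of those two steps, reusing (for brevity) just one of the analyzers from the question; the index block in the river's _meta document points the river at the pre-created index:

curl -XPUT 'localhost:9200/blog' -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "ox_edgeNGram": {
          "type": "custom",
          "tokenizer": "ox_t_edgeNGram",
          "filter": ["lowercase"]
        }
      },
      "tokenizer": {
        "ox_t_edgeNGram": {
          "type": "edgeNGram",
          "min_gram": 2,
          "max_gram": 25,
          "side": "front"
        }
      }
    }
  }
}'

curl -XPUT 'localhost:9200/_river/blog/_meta' -d '{
  "type": "couchdb",
  "couchdb": {
    "host": "localhost",
    "port": 5984,
    "db": "blog",
    "filter": null
  },
  "index": {
    "index": "blog",
    "type": "blog"
  }
}'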
You can try this URL: http://localhost:9200/blog/_mapping?pretty=true
In the returned mapping, if no analyzer is explicitly mentioned for a field, that field uses the default analyzer.
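For instance, a field that does use one of the custom analyzers would name it explicitly; a hypothetical excerpt (the title field is illustrative, not from the question):

{
  "blog": {
    "blog": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "ox_edgeNGram"
        }
      }
    }
  }
}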
Related
I have been trying to match a query using the Elasticsearch Python client, but I am unable to get a match even after using escape characters and setting up custom analyzers and mappings. I want to search using & and it's not returning any results.
from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

doc1 = {
    'name': 'numb',
    'band': 'linkin_park',
    'year': '2006'
}
doc2 = {
    'name': 'Powerless &',
    'band': 'linkin_park',
    'year': '2006'
}
doc3 = {
    'name': 'Crawling !',
    'band': 'linkin_park',
    'year': '2006'
}
doc = [doc1, doc2, doc3]

'''
create_index = {
    "settings": {
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "type": "custom",
                    "filter": [
                        "lowercase"
                    ],
                    "tokenizer": "whitespace"
                }
            }
        }
    }
}
es.indices.create(index="idx_temp", body=create_index)
'''

for i in range(3):
    es.index(index="idx_temp", doc_type='_doc', id=i, body=doc[i])
my_mapping = {
    "properties": {
        "name": {
            "type": "text",
            "fields": {
                "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                }
            },
            "analyzer": "my_analyzer",
            "search_analyzer": "my_analyzer"
        },
        "band": {
            "type": "text",
            "fields": {
                "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                }
            },
            "analyzer": "my_analyzer",
            "search_analyzer": "my_analyzer"
        },
        "year": {
            "type": "text",
            "fields": {
                "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                }
            },
            "analyzer": "my_analyzer",
            "search_analyzer": "my_analyzer"
        }
    }
}
es.indices.put_mapping(index='idx_temp', body=my_mapping, doc_type='_doc', include_type_name=True)
res = es.search(index='idx_temp', body={
    "query": {
        "match": {
            "name": {
                "query": "powerless &",
                "fuzziness": 3
            }
        }
    }
})
for hit in res['hits']['hits']:
    print(hit['_source'])
The expected output was 'name': 'Powerless &', but I got 0 hits and no value returned.
I fixed the problem by adding another setting,
"search_quote_analyzer": "my_analyzer"
to each field's mapping, after
"analyzer": "my_analyzer",
"search_analyzer": "my_analyzer"
After that, searching with & in the query returned my expected output:
'name': 'Powerless &'
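For reference, a minimal sketch of what one corrected field entry would then look like (field and analyzer names are from the question):

"name": {
    "type": "text",
    "fields": {
        "keyword": {
            "type": "keyword",
            "ignore_above": 256
        }
    },
    "analyzer": "my_analyzer",
    "search_analyzer": "my_analyzer",
    "search_quote_analyzer": "my_analyzer"
}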
I just tried it using your index settings, mapping, and query, and was able to get the results. Below are two things I did differently.
First, I escaped the special character & when indexing the doc directly through the ES REST API, using the body below in Postman:
{
  "content": "Powerless \&"
}
ES then gave me an Unrecognized character escape '&' exception, and even Postman, a popular REST client, warned me that it was not a proper string.
I then changed the payload to the one below and was able to index the doc:
{
  "content": "Powerless \\&"
}
(Notice I added another `\` to escape the `&`.)
Second, I changed the query to use the same field that holds the & value; in your case that is the name field, not the content field. A match query is analyzed and uses the same analyzer that was used at index time, and with that change I was able to get the result.
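A sketch of such a query against the name field (the idx_temp index and the doc values are from the question):

curl -XGET 'localhost:9200/idx_temp/_search?pretty' -H 'Content-Type: application/json' -d '{
  "query": {
    "match": {
      "name": "powerless &"
    }
  }
}'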
PS: I also verified your analyzer using the _analyze API, and it generates the tokens below for the text Powerless \\&:
{
  "tokens": [
    {
      "token": "powerless",
      "start_offset": 0,
      "end_offset": 9,
      "type": "word",
      "position": 0
    },
    {
      "token": "\\&",
      "start_offset": 10,
      "end_offset": 12,
      "type": "word",
      "position": 1
    }
  ]
}
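That check can be reproduced with a call like this against the index holding the custom analyzer:

curl -XGET 'localhost:9200/idx_temp/_analyze?pretty' -H 'Content-Type: application/json' -d '{
  "analyzer": "my_analyzer",
  "text": "Powerless \\&"
}'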
I have created an Elasticsearch index using the DSL query below.
It was already created manually, but I am trying to index data using mongoosastic with Node.js. I am using the synchronize method to index my MongoDB collection into Elasticsearch. What should my Node.js mapping code be so that it is indexed properly?
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "ngram_filter": { // ngram filter
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 20
        }
      },
      "analyzer": {
        "ngram_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "ngram_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "employees": {
      "_all": {
        "type": "string",
        "index_analyzer": "ngram_analyzer",
        "search_analyzer": "standard"
      },
      "properties": { // schema start
        "FirstName": {
          "type": "string",
          "include_in_all": true,
          "term_vector": "yes",
          "index_analyzer": "ngram_analyzer",
          "search_analyzer": "standard"
        } // it has more fields, as given in the schema below
      } // schema end
    }
  }
}
My MongoDB collection schema is:
{
  "FirstName": "MISTI",
  "LastName": "RAMSTAD",
  "Designation": "CEO",
  "Salary": "148000",
  "DateOfJoining": "23/09/1997",
  "Address": "32 Pawnee Ave. San Pablo, CA 94806",
  "Gender": "Female",
  "Age": 55,
  "MaritalStatus": "Unmarried",
  "Interests": "Letterboxing,Scuba Diving,Mountain Biking,Handwriting Analysis,Models"
}
See the answer below: you can create the index, with its settings and mappings, from your index.ts file when the server starts.
Also, if you want to update your mapping, just make the update and restart the server.
Elastic Search when to add dynamic mappings
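Either way, it is worth verifying what actually got applied; a sketch assuming the index is named employees (the name is an assumption, matching the mapping above):

curl -XGET 'localhost:9200/employees/_settings?pretty'
curl -XGET 'localhost:9200/employees/_mapping?pretty'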
If I search for the term "Liebe" with the current query and analyzers, results in which "Liebe" appears as part of a different word ("Verlieben") are prioritized over those containing only the word itself.
It should be the other way around.
I am also using some advanced filters and aggregations, but here is the most basic query that I use to search:
{
  "query": {
    "query_string": {
      "query": "Liebe",
      "default_operator": "AND",
      "analyzer": "my_analyzer1"
    }
  },
  "size": "10",
  "from": 0
}
The analyzers and index settings are as follows:
{
  "settings": {
    "analysis": {
      "filter": {
        "nGram_filter": {
          "type": "nGram",
          "min_gram": 2,
          "max_gram": 20,
          "token_chars": [
            "letter",
            "digit",
            "punctuation",
            "symbol"
          ]
        }
      },
      "analyzer": {
        "nGram_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "asciifolding",
            "nGram_filter"
          ]
        },
        "whitespace_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        },
        "my_analyzer1": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "my_stop'.$s_id.'",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "product": {
      "_all": {
        "index_analyzer": "nGram_analyzer",
        "search_analyzer": "whitespace_analyzer"
      },
      "properties": {
        '.$mapping.'
      }
    }
  }
}
Please ignore the $mapping; these are dynamic fields that reside in the index based on some settings in my framework.
Can anyone please point me in a direction where I don't need to change much and can get the behavior I described above?
I have checked many things, like the match query, but I don't have any fixed fields, so I can't use that. And I want both exact matches and results with partial matches (using nGrams).
Please help!
Thanks!
We have an index of items on which I'm attempting to do a fuzzy wildcard search against the item names.
The query:
{
  "from": 0,
  "size": 10,
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "fields": [
            "name.suggest"
          ],
          "query": "avacado*",
          "fuzziness": 0.7
        }
      }
    }
  }
}
The field in the index and the analyzers at play:
"
suggest_analyzer":{
"type": "custom",
"tokenizer": "standard",
"filter": ["standard", "lowercase", "shingle", "punctuation"]
}
"punctuation" : {
"type" : "word_delimiter",
"preserve_original": "true"
}
"name": {
"fields": {
"name": {
"type": "string",
"analyzer": "stem"
},
"suggest":{
"type": "string",
"analyzer": "suggest_analyzer"
},
"untouched": {
"include_in_all": false,
"index": "not_analyzed",
"index_options": "docs",
"omit_norms": true,
"type": "string"
},
"untouched_lowercase": {
"type": "string",
"index_analyzer": "lowercase",
"search_analyzer": "lowercase"
}
},
"type": "multi_field"
},
The problem is this: an item with the name "Avocado Test" will match for the following:
avocado*
avo*
avacado
but fails to match for:
avacado*
ava*
ava~2
I can't seem to make fuzziness work with wildcards; it seems that either fuzziness works or wildcards work, but not in combination.
ES version is 1.3.1.
Note that my query is simplified and we have other filtering going on, but I boiled it down to just the query to take any ambiguity out of the results. I've attempted to use the suggest features, but they won't allow the level of filtering we need.
Is there any other way to handle suggest/typeahead-style searching with fuzziness to catch misspellings?
You could try an edge nGram token filter: use it in an analyzer applied to the desired field and do a fuzzy search on it.
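A minimal sketch of that idea, using ES 1.x syntax to match the question (the edge_ngram_filter and edge_ngram_analyzer names are illustrative):

{
  "settings": {
    "analysis": {
      "filter": {
        "edge_ngram_filter": {
          "type": "edgeNGram",
          "min_gram": 2,
          "max_gram": 20
        }
      },
      "analyzer": {
        "edge_ngram_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "edge_ngram_filter"]
        }
      }
    }
  }
}

With the prefixes indexed as separate tokens, a plain match query with fuzziness (no wildcard) can then catch misspellings such as "avacado":

{
  "query": {
    "match": {
      "name.suggest": {
        "query": "avacado",
        "fuzziness": 2
      }
    }
  }
}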
I am new to Elasticsearch, and right now I am trying to figure out why my synonyms are not returning any results like I expect them to.
I created a custom filter and analyzer for my synonyms file and applied the analyzer to both the _all field and explicitly defined the specialty field to use it as well.
When I search for "specialty": "aids" without the analyzer/tokenizer, it gives me zero results, as expected.
However, when I search for "specialty": "aids" with the analyzer/tokenizer, I expect it to give me the same results as searching for "specialty": "retrovirology", which should yield 3 results; instead it comes back with nothing.
Is there something wrong with how I am approaching this?
Here are my settings and some sample data:
curl -XDELETE "http://localhost:9200/personsearch"
curl -XPUT "http://localhost:9200/personsearch" -d'
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "XYZSynAnalyzer": {
            "tokenizer": "standard",
            "filter": [
              "XYZSynFilter"
            ]
          }
        },
        "filter": {
          "XYZSynFilter": {
            "type": "synonym",
            "synonyms": [
              "aids, retrovirology"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "xyzemployee": {
      "_all": {
        "analyzer": "XYZSynAnalyzer"
      },
      "properties": {
        "firstName": {
          "type": "string"
        },
        "lastName": {
          "type": "string"
        },
        "middleName": {
          "type": "string",
          "include_in_all": false,
          "index": "not_analyzed"
        },
        "specialty": {
          "type": "string",
          "analyzer": "XYZSynAnalyzer"
        }
      }
    }
  }
}'
curl -XPUT "http://localhost:9200/personsearch/xyzemployee/1" -d'
{
  "firstName": "Don",
  "middleName": "W.",
  "lastName": "White",
  "specialty": "Adult Retrovirology"
}'
curl -XPUT "http://localhost:9200/personsearch/xyzemployee/2" -d'
{
  "firstName": "Terrance",
  "middleName": "G.",
  "lastName": "Gartner",
  "specialty": "Retrovirology"
}'
curl -XPUT "http://localhost:9200/personsearch/xyzemployee/3" -d'
{
  "firstName": "Carter",
  "middleName": "L.",
  "lastName": "Taylor",
  "specialty": "Pediatric Retrovirology"
}'
# Why is this returning nothing?
curl -XGET "http://localhost:9200/personsearch/xyzemployee/_search?pretty=true" -d'
{
  "query": {
    "match": {
      "specialty": "retrovirology"
    }
  }
}'
You aren't lowercasing anywhere.
Try this:
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "XYZSynAnalyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase", "XYZSynFilter"
            ]
          }
        },
        "filter": {
          "XYZSynFilter": {
            "type": "synonym",
            "synonyms": [
              "aids, retrovirology"
            ]
          }
        }
      }
    }
  }
}
Note: you may want to split your index analyzer and search analyzer, and have only one of them apply the synonyms. Expanding synonyms only at index time will speed up searches.
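A sketch of that split, expanding synonyms only at index time (the XYZPlainAnalyzer name is illustrative, and index_analyzer is the pre-2.0 mapping syntax matching this question):

{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "XYZSynAnalyzer": {
            "tokenizer": "standard",
            "filter": ["lowercase", "XYZSynFilter"]
          },
          "XYZPlainAnalyzer": {
            "tokenizer": "standard",
            "filter": ["lowercase"]
          }
        },
        "filter": {
          "XYZSynFilter": {
            "type": "synonym",
            "synonyms": ["aids, retrovirology"]
          }
        }
      }
    }
  },
  "mappings": {
    "xyzemployee": {
      "properties": {
        "specialty": {
          "type": "string",
          "index_analyzer": "XYZSynAnalyzer",
          "search_analyzer": "XYZPlainAnalyzer"
        }
      }
    }
  }
}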