I have created an Elasticsearch index using the DSL query below.
The index was already created manually, but I am trying to index data using mongoosastic with Node.js. I am using the synchronize method to index my MongoDB collection into Elasticsearch. What should my Node.js mapping code be so that the data is indexed properly?
{
"settings": {
"number_of_shards": 1,
"analysis": {
"filter": {
"ngram_filter": { // ngrams analyzers
"type": "ngram",
"min_gram": 2,
"max_gram": 20
}
},
"analyzer": {
"ngram_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"ngram_filter"
]
}
}
}
},
"mappings": {
"employees": {
"_all": {
"type": "string",
"index_analyzer": "ngram_analyzer",
"search_analyzer": "standard"
},
"properties": { // schema start
"FirstName": {
"type": "string",
"include_in_all": true,
"term_vector": "yes",
"index_analyzer": "ngram_analyzer",
"search_analyzer": "standard"
} // it has more fields, as given in the schema below
} // schema end
}
}
}
My MongoDB collection schema (a sample document) is:
{
"FirstName": "MISTI",
"LastName": "RAMSTAD",
"Designation": "CEO",
"Salary": "148000",
"DateOfJoining": "23/09/1997",
"Address": "32 Pawnee Ave. San Pablo, CA 94806",
"Gender": "Female",
"Age": 55,
"MaritalStatus": "Unmarried",
"Interests": "Letterboxing,Scuba Diving,Mountain Biking,Handwriting Analysis,Models"
}
See the answer linked below: you can create the index with its settings and mapping from your index.ts file when the server starts.
Also, if you want to update your mapping, just make your change and restart the server.
Elastic Search when to add dynamic mappings
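A minimal sketch of the "create the index when the server starts" approach. It is written in Python purely for illustration (the question uses Node.js/TypeScript) and assumes an elasticsearch-py 7.x style client that exposes `indices.exists(index=...)` and `indices.create(index=..., body=...)`. `ensure_index` and `INDEX_BODY` are hypothetical names, and the body is an abbreviated copy of the settings from the question.

```python
from typing import Any, Dict

# Abbreviated copy of the settings from the question (hypothetical constant).
INDEX_BODY: Dict[str, Any] = {
    "settings": {
        "number_of_shards": 1,
        "analysis": {
            "filter": {
                "ngram_filter": {"type": "ngram", "min_gram": 2, "max_gram": 20}
            },
            "analyzer": {
                "ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "ngram_filter"],
                }
            },
        },
    }
}


def ensure_index(client: Any, name: str, body: Dict[str, Any]) -> bool:
    """Create index `name` with `body` unless it already exists.

    Returns True if the index was created, False if it was already there.
    Intended to run once at startup, before any bulk sync writes documents.
    """
    if client.indices.exists(index=name):
        return False
    client.indices.create(index=name, body=body)
    return True
```

Running it before the first synchronize call means the documents land in an index that already has the custom analyzers, rather than in one that Elasticsearch auto-creates with dynamic mapping.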
I have been trying to match a query using the Elasticsearch Python client, but I am unable to get a match even after using escape characters and setting up some custom analyzers and mapping them. I want to search using &, and it's not returning any results.
from elasticsearch import Elasticsearch
es = Elasticsearch([{'host': 'localhost', 'port': 9200}])
doc1 = {
'name': 'numb',
'band': 'linkin_park',
'year': '2006'
}
doc2 = {
'name': 'Powerless &',
'band': 'linkin_park',
'year': '2006'
}
doc3 = {
'name': 'Crawling !',
'band': 'linkin_park',
'year': '2006'
}
doc =[doc1, doc2, doc3]
'''
create_index = {
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"type": "custom",
"filter": [
"lowercase"
],
"tokenizer": "whitespace"
}
}
}
}
}
es.indices.create(index="idx_temp", body=create_index)
'''
for i in range(3):
es.index(index="idx_temp", doc_type='_doc', id=i, body=doc[i])
my_mapping = {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
'ignore_above': 256
}
},
"analyzer": "my_analyzer",
"search_analyzer": "my_analyzer"
},
"band": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"analyzer": "my_analyzer",
"search_analyzer": "my_analyzer"
},
"year": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"analyzer": "my_analyzer",
"search_analyzer": "my_analyzer"
}
}
}
es.indices.put_mapping(index='idx_temp', body=my_mapping, doc_type='_doc', include_type_name=True)
res = es.search(index='idx_temp', body={
"query": {
"match": {
"name": {
"query": "powerless &",
"fuzziness": 3
}
}
}
})
for hit in res['hits']['hits']:
print(hit['_source'])
The expected output was 'name': 'Poweeerless &', but I got 0 hits and no value was returned.
So I fixed the problem by adding another parameter,
"search_quote_analyzer": "my_analyzer"
to each field in the mapping, after
"analyzer": "my_analyzer",
"search_analyzer": "my_analyzer"
Now I get my output when I search with & in the query:
'name': 'Poweeerless &'
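The fix described above has to be repeated for every field. A minimal sketch that generates the mapping instead of writing it out by hand, assuming the same `my_analyzer` and field names as in the question; `text_field` and `build_mapping` are hypothetical helpers, and the result is the dict that would be passed as `body` to `put_mapping`.

```python
from typing import Any, Dict, List


def text_field(analyzer: str) -> Dict[str, Any]:
    """Text field with a keyword sub-field, using one analyzer everywhere."""
    return {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        "analyzer": analyzer,
        "search_analyzer": analyzer,
        # Used at search time when a phrase is encountered (e.g. a quoted
        # phrase in query_string); it falls back to search_analyzer if unset.
        "search_quote_analyzer": analyzer,
    }


def build_mapping(fields: List[str], analyzer: str) -> Dict[str, Any]:
    """Build the "properties" body for put_mapping from a list of field names."""
    return {"properties": {name: text_field(analyzer) for name in fields}}


my_mapping = build_mapping(["name", "band", "year"], "my_analyzer")
```

Because the analyzer of an existing field cannot be changed with put_mapping, a mapping like this should be supplied when the index is created, before any documents are indexed.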
I just tried it using your index settings, mapping, and query, and was able to get the results. Below are the two different things I did.
1. Escape the special character &. When I tried to index the doc directly with the ES REST API, I used the body below in Postman:
{
"content": "Powerless \&" }
ES then threw an "Unrecognized character escape '&'" exception, and even Postman, a popular REST client, warned that it was not a proper string.
I then changed the payload above to the following and was able to index the doc:
{
"content": "Powerless \\&"
}
Notice that I added another `\`, so the backslash itself is escaped and the stored value is `Powerless \&`.
2. I changed the query to use the same field that holds the & value; in your case that is the name field, not the content field. A match query is analyzed with the field's search analyzer, which here is the same analyzer used at index time, and with that I was able to get the result.
PS: I also verified your analyzer using the _analyze API, and it generates the tokens below for the text Powerless \\&:
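The escaping behaviour described in the answer above can be checked offline with Python's standard `json` module. This is a small sketch; `parses` and the three payload names are hypothetical, chosen to mirror the Postman bodies.

```python
import json

# JSON only defines the escapes \" \\ \/ \b \f \n \r \t and \uXXXX,
# so "\&" is invalid, matching the error Elasticsearch reported.
bad_payload = r'{"content": "Powerless \&"}'
good_payload = r'{"content": "Powerless \\&"}'
plain_payload = '{"content": "Powerless &"}'


def parses(payload: str) -> bool:
    """Return True if `payload` is valid JSON."""
    try:
        json.loads(payload)
    except json.JSONDecodeError:
        return False
    return True
```

The first payload is rejected. The doubled backslash is accepted but stores a literal backslash in the document (`Powerless \&`), which is why `\&` later shows up as a token. If only the `&` is wanted, it needs no escaping in JSON at all, as `plain_payload` shows.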
{
"tokens": [
{
"token": "powerless",
"start_offset": 0,
"end_offset": 9,
"type": "word",
"position": 0
},
{
"token": "\\&",
"start_offset": 10,
"end_offset": 12,
"type": "word",
"position": 1
}
]
}
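The token list above can be reproduced with a rough Python approximation of `my_analyzer` (whitespace tokenizer followed by the lowercase filter). `whitespace_lowercase` is a hypothetical helper; it splits on whitespace with a regex, which approximates but is not identical to Elasticsearch's whitespace tokenizer.

```python
import re
from typing import List, Tuple


def whitespace_lowercase(text: str) -> List[Tuple[str, int, int]]:
    """Approximate my_analyzer: split on whitespace, then lowercase.

    Returns (token, start_offset, end_offset) tuples, like the _analyze API.
    """
    return [
        (m.group().lower(), m.start(), m.end())
        for m in re.finditer(r"\S+", text)
    ]


# The Python literal "Powerless \\&" is the 12-character string Powerless \&
tokens = whitespace_lowercase("Powerless \\&")
```

Because the whitespace tokenizer keeps punctuation, a standalone `&` survives as its own token, so a query such as `powerless &` analyzed the same way produces a matching `&` term.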
We have a table with this type of structure:
{_id: 15_0, createdAt: 1/1/1, task_id: [16_0, 17_0, 18_0], table: "details", a: b, c: d, more}
We created an index using:
{
"index": {},
"name": "paginationQueryIndex",
"type": "text"
}
It automatically created:
{
"ddoc": "_design/28e8db44a5a0862xxx",
"name": "paginationQueryIndex",
"type": "text",
"def": {
"default_analyzer": "keyword",
"default_field": {
},
"selector": {
},
"fields": [
],
"index_array_lengths": true
}
}
We are using the following query
{
"selector": {
"createdAt": { "$gt": 0 },
"task_id": { "$in": [ "18_0" ] },
"table": "details"
},
"sort": [ { "createdAt": "desc" } ],
"limit": 20
}
It takes 700-800 ms the first time; after that it drops to 500-600 ms.
Why does it take longer the first time?
Is there any way to speed up the query?
Is there any way to index only specific fields when the index type is "text" (instead of indexing all the fields in these records)?
You could try creating the index more explicitly, defining the type of each field you wish to index, e.g.:
{
"index": {
"fields": [
{
"name": "createdAt",
"type": "string"
},
{
"name": "task_id",
"type": "string"
},
{
"name": "table",
"type": "string"
}
]
},
"name": "myindex",
"type": "text"
}
Then your query becomes:
{
"selector": {
"createdAt": { "$gt": "1970/01/01" },
"task_id": { "$in": [ "18_0" ] },
"table": "details"
},
"sort": [ { "createdAt": "desc" } ],
"limit": 20
}
Notice that I used strings where the data type is a string.
If you're interested in performance, try removing clauses from your query one at a time to see whether one of them is causing the performance problem. You can also look at the explanation of your query (the _explain endpoint) to see whether it is using your index correctly.
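A small sketch of the "remove clauses one at a time" approach. It only generates the query variants; POSTing each one to the database's `_find` endpoint and timing it is left to whatever client you use. `drop_one_clause` is a hypothetical helper, and it also drops any sort on the removed field, on the assumption that sorting on a field absent from the selector may be rejected by the query planner.

```python
import copy
from typing import Any, Dict, List, Tuple


def _sort_field(entry: Any) -> str:
    """Sort entries are either "field" or {"field": "asc" | "desc"}."""
    return entry if isinstance(entry, str) else next(iter(entry))


def drop_one_clause(query: Dict[str, Any]) -> List[Tuple[str, Dict[str, Any]]]:
    """Return (removed_field, variant) pairs, one per selector clause.

    The original query is left untouched; each variant is a deep copy.
    """
    variants = []
    for field in query["selector"]:
        variant = copy.deepcopy(query)
        del variant["selector"][field]
        # Assumption: sorting on a field no longer in the selector may be
        # rejected, so drop those sort entries as well.
        if "sort" in variant:
            variant["sort"] = [s for s in variant["sort"] if _sort_field(s) != field]
            if not variant["sort"]:
                del variant["sort"]
        variants.append((field, variant))
    return variants
```

Whichever variant is much faster than the full query points at the clause worth examining in the _explain output.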
Documentation on creating an explicit text query index is here
If I search for the term "Liebe", the current query and analyzers return results in which "Liebe" is part of a different word, such as "Verlieben", ranked above results that contain only this word.
It should be the other way around.
I am also using some advanced filters and aggregations, but here is the most basic query that I use to search:
{
"query": {
"query_string": {
"query": "Liebe",
"default_operator": "AND",
"analyzer": "my_analyzer1"
}
},
"size": "10",
"from": 0
}
The analyzers and index settings are as follows:
{
"settings": {
"analysis": {
"filter": {
"nGram_filter": {
"type": "nGram",
"min_gram": 2,
"max_gram": 20,
"token_chars": [
"letter",
"digit",
"punctuation",
"symbol"
]
}
},
"analyzer": {
"nGram_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"asciifolding",
"nGram_filter"
]
},
"whitespace_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"asciifolding"
]
},
"my_analyzer1":{
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"my_stop'.$s_id.'",
"asciifolding"
]
}
}
}
},
"mappings": {
"product": {
"_all": {
"index_analyzer": "nGram_analyzer",
"search_analyzer": "whitespace_analyzer"
},
"properties": {
'.$mapping.'
}
}
}
}
Please ignore $mapping; it stands for the dynamic fields that exist in the index based on some settings in my framework.
Can anyone point me in a direction where I don't need to change much and can still get what I described above?
I have checked many things, like the match query, but I don't have any fixed fields, so I can't use that. I want both exact matches and partial matches (using n-grams) in the search results.
Please help!
Thanks!
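A rough Python simulation of why "Verlieben" matches a search for "Liebe": `_all` is indexed with `nGram_analyzer`, which lowercases and ASCII-folds each whitespace token and then emits every substring between 2 and 20 characters long, so "verlieben" yields "liebe" among its grams. `ngrams` is a hypothetical helper that produces the same set of grams as the nGram filter with those settings; its emission order is not meant to match Lucene's.

```python
from typing import List


def ngrams(token: str, min_gram: int = 2, max_gram: int = 20) -> List[str]:
    """All substrings of `token` whose length lies in [min_gram, max_gram]."""
    grams = []
    for size in range(min_gram, min(max_gram, len(token)) + 1):
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams
```

The query side uses `my_analyzer1`, which does not n-gram, so "liebe" becomes a single query term that matches the "liebe" gram of both kinds of document; which one ranks higher is then decided by scoring, not by whether the match was exact.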
We have an index of items on which I'm attempting to do a fuzzy wildcard search on the item name.
The query:
{
"from": 0,
"size": 10,
"query": {
"bool": {
"must": {
"query_string": {
"fields": [
"name.suggest"
],
"query": "avacado*",
"fuzziness": 0.7
}
}
}
}
}
The field in the index and the analyzers at play:
"suggest_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": ["standard", "lowercase", "shingle", "punctuation"]
}
"punctuation" : {
"type" : "word_delimiter",
"preserve_original": "true"
}
"name": {
"fields": {
"name": {
"type": "string",
"analyzer": "stem"
},
"suggest":{
"type": "string",
"analyzer": "suggest_analyzer"
},
"untouched": {
"include_in_all": false,
"index": "not_analyzed",
"index_options": "docs",
"omit_norms": true,
"type": "string"
},
"untouched_lowercase": {
"type": "string",
"index_analyzer": "lowercase",
"search_analyzer": "lowercase"
}
},
"type": "multi_field"
},
The problem is this: an item with the name "Avocado Test" will match for the following:
avocado*
avo*
avacado
but fails to match for:
avacado*
ava*
ava~2
I can't seem to make fuzzy work with wildcards; it seems that either fuzzy works or wildcards work, but not in combination.
The ES version is 1.3.1.
Note that my query is simplified and we have other filtering going on but I boiled it down to just the query to take any ambiguity out of the results. I've attempted to use the suggest features but they won't allow the level of filtering we need.
Is there any other way to handle doing suggest/typeahead style searching with fuzziness to catch misspellings?
You could try the EdgeNGram token filter: use it in an analyzer applied to the desired field, and do a fuzzy search on that field.
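A rough Python simulation of the idea in this answer: index the edge n-grams (prefixes) of each word, then apply fuzziness to the whole typed prefix against those grams, instead of combining a wildcard with fuzziness. `edge_ngrams`, `levenshtein` and `fuzzy_typeahead` are hypothetical helpers, and `min_gram=2`, `max_gram=20` and `max_edits=1` are assumed values. It uses plain Levenshtein distance; Elasticsearch's fuzzy queries can also count transpositions, which this sketch ignores.

```python
from typing import List


def edge_ngrams(token: str, min_gram: int = 2, max_gram: int = 20) -> List[str]:
    """Prefixes of `token` with length in [min_gram, max_gram] (front side)."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def levenshtein(a: str, b: str) -> int:
    """Edit distance with insert/delete/substitute, using two DP rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def fuzzy_typeahead(query: str, name: str, max_edits: int = 1) -> bool:
    """True if the typed query is within max_edits of any edge n-gram of name."""
    q = query.lower()
    grams = [g for word in name.lower().split() for g in edge_ngrams(word)]
    return any(levenshtein(q, g) <= max_edits for g in grams)
```

With "Avocado Test" indexed this way, the misspelled prefixes "ava" and "avacado" are each one edit away from an indexed gram ("avo" and "avocado"), so they match without any wildcard.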
I am using the river plugin for CouchDB, and when I execute the following curl command:
curl -XPUT 'localhost:9200/_river/blog/_meta' -d '{
"type": "couchdb",
"couchdb": {
"host": "localhost",
"port": 5984,
"db": "blog",
"filter": null
},
"index": {
"analysis": {
"analyzer": {
"whitespace": {
"type": "whitespace",
"filter": "lowercase"
},
"ox_edgeNGram": {
"type": "custom",
"tokenizer": "ox_t_edgeNGram",
"filter": [
"lowercase"
]
},
"ox_NGram": {
"type": "custom",
"tokenizer": "ox_t_NGram",
"filter": [
"lowercase"
]
}
},
"tokenizer": {
"ox_t_edgeNGram": {
"type": "edgeNGram",
"min_gram": 2,
"max_gram": 25,
"side": "front"
},
"ox_t_NGram": {
"type": "NGram",
"min_gram": 2,
"max_gram": 25
}
}
}
}
}'
I receive the response:
{
"ok": true,
"_index": "_river",
"_type": "blog",
"_id": "_meta",
"_version": 1
}
The problem I have is that when I want to view the settings in the browser, I go to:
http://localhost:9200/blog/_settings?pretty=true
The JSON that is returned is as follows, but I was expecting information about the analyzers etc. that I thought I had created.
Returned JSON:
{
"blog": {
"settings": {
"index.number_of_shards": "5",
"index.number_of_replicas": "1"
}
}
}
It should also be noted that when I create a blog index without using the river and run a curl command to add the analysis settings, the browser does show the settings I added.
How can I set the default settings of an index when using the river plugin?
To solve this issue:
1. Create the new Elasticsearch index, with its mappings etc.
2. Create the new Elasticsearch river, with its index name set to the name of the index created in step 1.
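The two steps above can be sketched as an ordered list of HTTP requests. This is a minimal sketch, assuming the CouchDB river's `index` section takes `index` and `type` keys (as in the river's README example) and that the river is named after the index; `river_setup_requests` is a hypothetical helper, and sending the requests is left to any HTTP client.

```python
from typing import Any, Dict, List, Tuple

Request = Tuple[str, str, Dict[str, Any]]  # (method, path, body)


def river_setup_requests(index: str, db: str, analysis: Dict[str, Any]) -> List[Request]:
    """Return the requests in the order they must run.

    Step 1 creates the index with its analysis settings; step 2 registers
    the river and points its "index" section at that existing index.
    """
    create_index = ("PUT", f"/{index}", {"settings": {"analysis": analysis}})
    create_river = (
        "PUT",
        f"/_river/{index}/_meta",
        {
            "type": "couchdb",
            "couchdb": {"host": "localhost", "port": 5984, "db": db, "filter": None},
            # Only the target index/type go here; analysis lives in step 1.
            "index": {"index": index, "type": db},
        },
    )
    return [create_index, create_river]
```

The key point is the order: the analysis settings belong to the index creation request, not to the river's `_meta` document. Note that rivers were deprecated in Elasticsearch 1.5 and removed in 2.0.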
I found the answer here:
http://groups.google.com/a/elasticsearch.com/group/users/browse_thread/thread/5ebf1556d139d5ac/f17e71e04cac5889?lnk=gst&q=couchDB+river+settings#f17e71e04cac5889
You can try this URL: http://localhost:9200/blog/_mapping?pretty=true
In the response mapping, if no analyzer is explicitly mentioned for a field, the default analyzer is used.
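To read that response programmatically, a small sketch can walk the `properties` of a mapping and report which analyzer each string field ends up with. `effective_analyzers` is a hypothetical helper; it only looks at top-level fields (no multi-fields or nested objects) and treats a missing analyzer as the index default, which is the built-in `standard` analyzer unless the index defines one named `default`.

```python
from typing import Any, Dict


def effective_analyzers(properties: Dict[str, Any], default: str = "standard") -> Dict[str, str]:
    """Return {field: analyzer} for top-level string/text fields in a mapping.

    Fields with no explicit "analyzer" (or the 1.x "index_analyzer") fall
    back to `default`. Non-string fields and not_analyzed fields are skipped.
    """
    result: Dict[str, str] = {}
    for name, spec in properties.items():
        if spec.get("type") not in ("string", "text"):
            continue
        if spec.get("index") == "not_analyzed":
            continue
        result[name] = spec.get("analyzer") or spec.get("index_analyzer") or default
    return result
```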