I have indexed multiple (500) documents into Elasticsearch, each with a "content" field containing the text of a different file.
POST /megacorp/employee113
POST /megacorp/employee114
POST /megacorp/employee115
I would like to search for a word or phrase across many of these documents, say 150 of them. I tried "POST /megacorp/employee115,employee116/_search", but listing 150 types this way is unwieldy. Is there any other way?
UPDATE:
POST /megacorp/employee114/7
{
  "first_name" : "Douglas",
  "last_name" : "Fir",
  "age" : 35,
  "about" : "I like to build cabinets",
  "interests" : [ "forestry" ]
}
Here is a sample document. In the same way I have around 150 documents indexed. Now I would like to search in, say, 40 of them. I know we can put a comma between each one like I said in the original post; I would like to know if there is a better way than this.
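One possibility worth sketching (field and type names are taken from the question; whether this fits your Elasticsearch version is an assumption, since type filtering via the _type metadata field applies to older releases where mapping types still exist): instead of listing every type in the URL, search the whole index and restrict the types in the query body with a terms filter on _type.

```json
POST /megacorp/_search
{
  "query": {
    "bool": {
      "must": {
        "match_phrase": { "about": "build cabinets" }
      },
      "filter": {
        "terms": { "_type": ["employee113", "employee114", "employee115"] }
      }
    }
  }
}
```

The list in the terms filter can be built programmatically, which avoids constructing a 150-entry URL by hand.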
We have a collection AbstractEvent with a field 'text' containing 1~30 Chinese characters, and we want to perform a LIKE match with %keyword% at high performance (under 0.3 seconds across more than 2 million records).
After considerable effort, we decided to use an ArangoSearch VIEW with the identity analyzer:
FOR i IN AbstractEventView
SEARCH ANALYZER(i.text LIKE '%keyword%', 'identity')
LIMIT 10
RETURN i.text
And here is the definition of the view AbstractEventView:
{
  "name": "AbstractEventView",
  "type": "arangosearch",
  "links": {
    "AbstractEvent": {
      "analyzers": [
        "identity"
      ],
      "fields": {
        "text": {}
      }
    }
  }
}
However, the returned records contain irrelevant ones. The following is an example:
FOR i IN AbstractEventView
SEARCH ANALYZER(i.text LIKE '%速%', 'identity')
LIMIT 10
RETURN i.text
and the result is
[
"全球经济增速虽军官下滑",
"油食用消费出现明显下滑",
"本次国家经济快速下行",
"这场所迅速爆发的情况",
"经济减速风景空间资本大规模流出",
"苜蓿草众人食品物资价格不稳定",
"荤菜价格快速走低",
"情况快速升级",
"情况快速进展",
"四季功劳增速断崖式回落后"
]
"油食用消费出现明显下滑" and "苜蓿草众人食品物资价格不稳定" are irrelevant: neither contains 速.
We've been struggling with this for days; can anyone help us out? Thanks.
PS:
Why don't we use a full-text index?
A full-text index indexes fields as tokenized text, so we cannot match '货币超发' when the keyword is '货', because '货币' is recognized as a single word.
Why don't we use FILTER with the LIKE operator directly?
Filtering without an index costs about 1 second, which is not acceptable.
I'm using Solr 6.6.2.
I need to search for special characters and highlight them in Solr, but it does not work.
My data:
[
{
"id" : "test1",
"title" : "test1# title C# ",
"dynamic_s": 5
},
{
"id" : "test2",
"title" : "test2 title C#",
"dynamic_s": 10
},
{
"id" : "test3",
"title" : "test3 title",
"dynamic_s": 0
}
]
When I search for "C#", the response is just "test1# title C# ", and only the word "C" is highlighted; the "#" is neither searched for nor highlighted.
How can I make searching and highlighting work for special characters?
The StandardTokenizer splits tokens on special characters, meaning that # will split the content into separate tokens - the first token will be C - and that's what's being highlighted. You'll probably get the exact same result if you just search for C.
The tokenization process will leave your tokens as test2, title, and C.
Using a field type with a WhitespaceTokenizer that only splits on whitespace will probably be a better choice for this exact use case, but it's impossible to say whether that will suit your regular search behavior (i.e. if you actually want 'C' to match 'C-99' etc., splitting on those characters may be needed). But you can use a separate field for highlighting, and that field's analysis chain will determine what gets highlighted. You can ask for both the original field and the more specific field to be highlighted, and then use the better result in your frontend application.
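A minimal sketch of such a field type for the schema (the names text_ws_hl and title_hl are illustrative, not from the question; adapt the filter chain to your needs):

```xml
<!-- Field type that keeps "C#" as a single token: split on whitespace only,
     then lowercase. Assumes a classic schema.xml / managed-schema setup. -->
<fieldType name="text_ws_hl" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- A copy of the title used only for matching/highlighting special characters -->
<field name="title_hl" type="text_ws_hl" indexed="true" stored="true"/>
<copyField source="title" dest="title_hl"/>
```

At query time you could then request highlighting on both fields, e.g. hl.fl=title,title_hl, and pick whichever snippet is better in the frontend.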
I've got an index of hundreds of book titles in Elasticsearch, with documents like:
{"_id": 123, "title": "The Diamond Age", ...}
And I've got a block of freeform text entered by a user. The block of text could contain a number of book titles throughout it, with varying capitalization.
I'd like to find all the book titles in the block of text, so I can link to the specific book pages.
Any idea how I can do this? I've been looking around for exact phrase matches in blocks of text, with no luck.
You need to index the title field as not_analyzed or with the keyword analyzer.
This tells Elasticsearch to perform no analysis on the field, which lets you do exact-match searches.
I would suggest keeping an analyzed version alongside the not_analyzed version, so you can run both exact and analyzed searches. Your mapping would look like this; I assume here that the type name is movies:
"mappings": {
  "movies": {
    "properties": {
      "title": {
        "type": "string",
        "fields": {
          "row": {
            "type": "string",
            "index": "not_analyzed"
          }
        }
      }
    }
  }
}
This will give you two fields title which contains an analyzed title and title.row which contains the exact value indexed with absolutely no processing.
title.row would only match if you entered the exact title.
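A query against the exact sub-field might then look like this (the index name books is an assumption; title.row follows the mapping above):

```json
POST /books/movies/_search
{
  "query": {
    "term": {
      "title.row": "The Diamond Age"
    }
  }
}
```

A term query bypasses analysis on the query side as well, so the input string must match the stored value character for character.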
I have to implement a text search application which indexes news articles and then allows a user to search for keywords, phrases or dates inside these texts.
After some consideration of my options (mainly Solr vs. Elasticsearch), I ended up doing some testing with Elasticsearch.
The part I am stuck on concerns the mapping and search-query construction options best suited to some special cases I have encountered. My current mapping has only one field, which contains all the text and needs to be analyzed in order to be searchable.
The specific part of the mapping with the field:
"txt": {
  "type" : "string",
  "term_vector" : "with_positions_offsets",
  "analyzer" : "shingle_analyzer"
}
where shingle_analyzer is:
"analysis" : {
  "filter" : {
    "filter_snow" : {
      "type" : "snowball",
      "language" : "romanian"
    },
    "shingle" : {
      "type" : "shingle",
      "max_shingle_size" : 4,
      "min_shingle_size" : 2,
      "output_unigrams" : "true",
      "filler_token" : ""
    },
    "filter_stop" : {
      "type" : "stop",
      "stopwords" : ["_romanian_"]
    }
  },
  "analyzer" : {
    "shingle_analyzer" : {
      "type" : "custom",
      "tokenizer" : "standard",
      "filter" : ["lowercase", "asciifolding", "filter_stop", "filter_snow", "shingle"]
    }
  }
}
My question regards the following situations:
I have to search for "ING" and several occurrences of "ing." are returned.
I have to search for "E!" but the analyzer strips the punctuation, so there are no results.
I have to search for certain uppercased common terms used as company names (like "Apple", but multi-word), and the lowercase filter produces useless results.
The idea that I have would be to build different fields with different filters that could cover all these possible issues.
Three questions:
Is splitting the field into three fields with different analyzers the correct way?
How would I use the different fields when searching?
Could someone explain how scoring would work to include all these fields?
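One possible sketch of the multi-field idea (the sub-field names exact and raw and their analyzers are illustrative, not from the original mapping): expose the same text under several analyses and combine them at query time.

```json
"txt": {
  "type": "string",
  "term_vector": "with_positions_offsets",
  "analyzer": "shingle_analyzer",
  "fields": {
    "exact": {
      "type": "string",
      "analyzer": "whitespace"
    },
    "raw": {
      "type": "string",
      "index": "not_analyzed"
    }
  }
}
```

A multi_match query across txt, txt.exact, and txt.raw would then score each variant; with the default best_fields type, the highest-scoring field determines the document score, which is one answer to how scoring combines such fields.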
I've got an ElasticSearch index with a large set of product properties. They are all looking like that:
{'_id':1,'type':'manufacturer','name':'Toyota'},
{'_id':2,'type':'color','name':'Green'},
{'_id':3,'type':'category','name':'SUV Cars'},
{'_id':4,'type':'material','name':'Leather'},
{'_id':5,'type':'manufacturer','name':'BMW'},
{'_id':6,'type':'color','name':'Red'},
{'_id':7,'type':'category','name':'Cabrios'},
{'_id':8,'type':'material','name':'Steel'},
{'_id':9,'type':'category','name':'Cabrios Hardtop'},
{'_id':10,'type':'category','name':'Cabrios Softtop'},
... and 1 million more ...
There are 4 different types of product properties existing: Categories, Manufacturers, Colors and Materials.
The question: how can I retrieve, with a single query (a fixed performance requirement), the best matching result for each type?
So if I issue a full-text search query such as "Green Toyota Cabrios", I should get the following results:
{'_id':2,'type':'color','name':'Green'},
{'_id':1,'type':'manufacturer','name':'Toyota'},
{'_id':7,'type':'category','name':'Cabrios'},
{one matching result of the 'material'-type if found by the query}
That would be the perfect result set: at most 4 results, one per 'type'. If there is no matching result for a specific type, only 3 result items should be returned.
How is that possible with Elasticsearch? Thanks for your ideas!
I don't clearly understand your use case. What are you actually indexing?
If you index cars, you should index it like:
{
"color": "Green",
"manufacturer": "Toyota",
"category": "Cabrios"
}
That said, based on the question you ask:
You can probably define your fields as not_analyzed. That way, if you search for "Green Toyota Cabrios" in the field "name", you won't get "Cabrios Hardtop".
Not sure I really answered, but I don't fully see your use case...
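For the "one best hit per type in a single query" requirement, a terms aggregation with a top_hits sub-aggregation is one approach worth sketching (the index name properties is an assumption, and type must be indexed as an exact, not_analyzed/keyword value for the bucketing to work):

```json
POST /properties/_search
{
  "size": 0,
  "query": {
    "match": { "name": "Green Toyota Cabrios" }
  },
  "aggs": {
    "per_type": {
      "terms": { "field": "type", "size": 4 },
      "aggs": {
        "best_hit": {
          "top_hits": { "size": 1 }
        }
      }
    }
  }
}
```

Each per_type bucket then carries its single best-scoring document; a type with no matching document simply produces no bucket, which yields the "3 instead of 4 results" behavior described above.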