Custom analyzer: Elasticsearch soundex plus snowball

The following works for me (searching for 'testing' also returns fields with 'test'):

index:
  analysis:
    analyzer:
      default:
        type: snowball
        language: english

when set up in my elasticsearch.yml file.
I want to combine this with the soundex filter I have installed, so I have tried this:
index:
  analysis:
    analyzer:
      default:
        type: custom
        tokenizer: standard
        filter: [standard, lowercase, soundex_filter, stemming]
    filter:
      soundex_filter:
        type: phonetic
        encoder: soundex
        replace: true
      stemming:
        type: snowball
        language: english
but with no success; neither filter seems to work (no stemming or soundex).
Has anybody had any success combining filters?

For those interested, here is the right syntax:
index:
  analysis:
    analyzer:
      default:
        type: custom
        tokenizer: standard
        filter: [standard, lowercase, stemming_filter, soundex_filter]
    filter:
      soundex_filter:
        type: phonetic
        encoder: soundex
        replace: false
      stemming_filter:
        type: snowball
        language: English
replace: true was somehow overriding the stemming (note that the stemming filter also now runs before the soundex filter)...
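To sanity-check the chain, you can run a sample word through the index's default analyzer with the analyze API (myindex is a placeholder for your index name):

curl 'localhost:9200/myindex/_analyze?text=testing&pretty'

With replace: false you should see two tokens come back: the stemmed form (test) and its soundex code (T230).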

Related

When I try to build the FNF Kade Engine's source code, it fails (with both lime test and lime build)

C:\HaxeToolkit\haxe\std/eval/_std/haxe/Exception.hx:39: characters 4-53 : Array<haxe.StackItem> should be haxe.CallStack
C:\HaxeToolkit\haxe\std/eval/_std/haxe/Exception.hx:42: characters 4-48 : Array<haxe.StackItem> should be haxe.CallStack
C:\HaxeToolkit\haxe\std/eval/_std/haxe/Exception.hx:57: characters 27-44 : Class<haxe.CallStack> has no field exceptionToString
C:\HaxeToolkit\haxe\std/eval/_std/haxe/Exception.hx:82: characters 20-27 : haxe.CallStack has no field asArray
C:\HaxeToolkit\haxe\std/eval/_std/haxe/Exception.hx:4: lines 4-89 : Field stack has different type than in core type
C:\HaxeToolkit\haxe\std/eval/_std/haxe/Exception.hx:4: lines 4-89 : haxe.CallStack should be haxe.CallStack
export/release/windows/haxe/ApplicationMain.hx:298: characters 1-8 : Build failure
It creates the export folder but it's missing the exe.
How do I fix this?

I need a regular expression to replace a pattern

I am new to Sublime Text 3 but have started to like working in it.
I am using its search and replace to achieve the following.
I have a list of hundreds of items like these:
5149 : Kaliana
5427 : Kalo Chorio
5036 : Kalo Chorio Kapouti
5071 : Kalo Chorio Sleas
5466 : Kalopanagiotis
But I want to replace these with
5149-
5427-
5036-
5071-
5466-
So basically the colon and the words should be replaced by a hyphen (-).
I tried a few regular expressions, for example (?<=WORD).*$, but they aren't working.

I kept trying and finally got an expression that works for my requirement.
In the find area type:
: [a-zA-Z\d" "]+
And in the replace area type:
-
This gives you the numbers followed by a hyphen, eliminating the words.
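For what it's worth, a slightly simpler pattern should also work, assuming every line has the number : words shape. In the find area type:
\s*:.*$
And in the replace area type:
-
Here \s*: matches the space and colon after the number, and .*$ consumes the rest of the line, so only the leading number survives before the hyphen is appended.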

ElasticSearch - Searching for exact text match without keeping two copies in index?

Exact matching for text is supported in Elasticsearch if the field mapping contains "index": "not_analyzed". That way, the field won't be tokenized and ES will use the whole string for exact matching, as described in the documentation.
Is there a way to support both full text searching and exact matching without having to create two fields: one for full-text, and one with not_analyzed mapping for exact matching?
An example use case:
We want to search by book titles.
I like trees should return full-text search results.
exact="I like trees" should return only books whose exact title is I like trees and nothing else. Case-insensitive matching is fine.
You can use a term filter to do exact-match searches. The filter looks like this:

{
  "term": {
    "key": "value"
  }
}

A query would look like this:

{
  "query": {
    "filtered": {
      "filter": {
        "term": {
          "key": "value"
        }
      }
    }
  }
}
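Note that a term filter does not analyze its input, so it only behaves as an exact match when run against a not_analyzed (or otherwise untokenized) field.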
You don't need to store the data in two different fields; what you want is an ES multi-field.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_multi_fields.html#_multi_fields
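For example, a mapping along these lines (the book type name and the raw sub-field name are just placeholders) indexes the title once as analyzed text for full-text search and once verbatim for exact matching:

{
  "book": {
    "properties": {
      "title": {
        "type": "string",
        "fields": {
          "raw": {
            "type": "string",
            "index": "not_analyzed"
          }
        }
      }
    }
  }
}

You would then run full-text queries against title and term filters against title.raw. One caveat: a not_analyzed sub-field is case-sensitive, so for the case-insensitive exact match mentioned in the question you would instead give the sub-field a custom analyzer with a keyword tokenizer plus a lowercase filter.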

How do we replace spaces with dashes in a permalink that Autoroute generates from a pattern?

I would like to generate the following URL for a 'My Type' content type.
mywebsite.com/my-type/some-item
I have tried the following patterns (shown with the URL segment each one produces):
{Content.ContentType}/{Content.Slug} → My Type/some-item
{Content.ContentType.Slug}/{Content.Slug} → some-item
{Content.ContentType.Text.Slug}/{Content.Slug} → some-item
{Text.Slug}{Content.ContentType}/{Content.Slug} → My Type/some-item

Elasticsearch: mapping text field for search optimization

I have to implement a text search application which indexes news articles and then allows a user to search for keywords, phrases or dates inside these texts.
After some consideration of my options (mainly Solr vs. Elasticsearch), I ended up doing some testing with Elasticsearch.
The part I am stuck on concerns the mapping and query-construction options best suited for some special cases I have encountered. My current mapping has only one field, which contains all the text and needs to be analyzed in order to be searchable.
The specific part of the mapping with the field:
"txt": {
"type" : "string",
"term_vector" : "with_positions_offsets",
"analyzer" : "shingle_analyzer"
}
where shingle_analyzer is:
"analysis" : {
"filter" : {
"filter_snow": {
"type":"snowball",
"language":"romanian"
},
"shingle":{
"type":"shingle",
"max_shingle_size":4,
"min_shingle_size":2,
"output_unigrams":"true",
"filler_token":""
},
"filter_stop":{
"type":"stop",
"stopwords":["_romanian_"]
}
},
"analyzer" : {
"shingle_analyzer" : {
"type" : "custom",
"tokenizer" : "standard",
"filter" : ["lowercase","asciifolding", "filter_stop","filter_snow","shingle"]
}
}}
My questions concern the following situations:
I have to search for "ING", and several irrelevant "ing." hits are returned.
I have to search for "E!", but the analyzer strips the punctuation, so there are no results.
I have to search for certain capitalized common terms used as company names (like "Apple", but multi-word), and the lowercase filter produces useless results.
My idea would be to index the text into several fields with different filters that could cover all these cases (a rough sketch of this follows the questions below).
Three questions:
Is splitting the field into three fields with different analyzers the correct way?
How would I use the different fields when searching?
Could someone explain how scoring would work to include all these fields?
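To make the idea concrete, here is a rough sketch of what I have in mind, assuming ES 1.x-style multi-fields; the exact sub-field name is a placeholder, and the built-in whitespace analyzer just stands in for a case- and punctuation-preserving analyzer:

"txt": {
  "type": "string",
  "term_vector": "with_positions_offsets",
  "analyzer": "shingle_analyzer",
  "fields": {
    "exact": {
      "type": "string",
      "analyzer": "whitespace"
    }
  }
}

Because the whitespace analyzer neither lowercases nor strips punctuation, tokens like ING, ing. and E! stay distinct in txt.exact. A query could then target both fields at once:

{
  "query": {
    "multi_match": {
      "query": "ING",
      "fields": ["txt", "txt.exact^2"]
    }
  }
}

Each field is scored separately and, by default, the best-scoring field determines the document's score; the ^2 boost is only there to illustrate how one field can be weighted over the others.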
