Elastic Search: URI Search is inaccurate - search

I am searching via Webbrowser Firefox with the following URI:
http://localhost:9200/xyz-leads/lead/_search?q=datetimeHour:22_09_2015__12&&deliveryType:EMAIL
But I only get hits with deliveryType:RVS, not EMAIL.
If I search for deliveryType:RVS I get 3 hits.
I know there are 3 results for RVS and 0 for EMAIL with the datetimeHour above, in total.
I guess the standard behaviour of ElasticSearch is, that it ignores the second search value 'deliveryType' and uses only the first 'datetimeHour' if there are no results for both.
If I search for deliveryType:Email, then I want no hits in the response.
Do I need to set some additional ElasticSearch parameters?
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-uri-request.html
_source: {
country: "DEU",
brand: "A",
leadGroup: "TEST",
leadID: 108798,
deliveryType: "RVS",
datetime: 1442916913000,
datetimeFormatted: "22.09.2015 12:15:13",
datetimeHour: "22_09_2015__12",
datetimeDay: "22_09_2015"
}

The query string query you stuff into your URL (i.e. in the q=... parameter) needs to be well-formed according to the query string query syntax
Try this instead (i.e. using AND instead of &&)
http://localhost:9200/xyz-leads/lead/_search?q=datetimeHour:22_09_2015__12%20AND%20deliveryType:EMAIL

Related

Irrelevant results returned from view search in arangodb

We have a collection AbstractEvent with field 'text', which contains 1~30 Chinese characters and we want to perform LIKE match with %keyword%, with high performance(less than 0.3 second, for more 2 million records).
After a bunch of effort, we decided to use VIEW and analyzer identity to do this:
FOR i IN AbstractEventView
SEARCH ANALYZER(i.text LIKE '%keyword%', 'identity')
LIMIT 10
RETURN i.text
And here is the definition of view AbstractEventView
{
"name":"AbstractEventView",
"type":"arangosearch",
"links":{
"AbstractEvent":{
"analyzers":[
"identity"
],
"fields":{
"text":{}
}
}
}
}
However, records returned contain irrelevant ones.
The flowlling is an example:
FOR i IN AbstractEventView
SEARCH ANALYZER(i.text LIKE '%速%', 'identity')
LIMIT 10
RETURN i.text
and the result is
[
"全球经济增速虽军官下滑",
"油食用消费出现明显下滑",
"本次国家经济快速下行",
"这场所迅速爆发的情况",
"经济减速风景空间资本大规模流出",
"苜蓿草众人食品物资价格不稳定",
"荤菜价格快速走低",
"情况快速升级",
"情况快速进展",
"四季功劳增速断崖式回落后"
]
油食用消费出现明显下滑and苜蓿草众人食品物资价格不稳定 are irrelavent.
We've been struggling on this for days, can anyone help me out? Thanks.
PS:
Why we do not use FULL-TEXT index?
full-text index indexed fields by tokenized text, so that we can not get matching '货币超发' when keyword is '货',because '货币' is recgonized as a word.
Why we do not use FILTER with LIKE operator directly?
Filtering without index will cost about 1 second and it is not acceptable.

How to allow empty or a string validated by a regex in JOI?

I am to pass a vat value. The validation should allow saving it in the db if it is empty or passes the given regex of a certain country.
.when('country_code', {
is: 'AT',
then: Joi.alternatives([
Joi.string().allow(''),
Joi.string().regex(/(AT)?U[0-9]{8}/)
])
})
This what I have achieved till now but it isn't working.
This is what you are looking for:
Joi.string().regex(/(AT)?U[0-9]{8}$/).allow('')

How to match different instances of the same query in Elasticsearch?

Example 1:
My query term is "abcd".
My query structure is like this:
query: {
query_string: {
query: "abc",
fields: ["field1", "field2", "field3"]
}
},
size: 50,
"highlight": {
"fields": {
"field1": {},
"field2": {},
"field3": {}
}
It matches the following instances:
abc abcs abc_def_ghi
But it does not match def_abc or def_abc_ghi.
Basically instances where abc is in the middle of a string.
Example 2:
In the same example above, if my query is abc_def
It does not match abc_def_ghi, although abc_def is present.
I have tried prefix_phrase and it solves scenario 2 but misses out on example 1's problems.
Any help would be appreciated.
for these usages you should use wildcard in query or regular expression
if you are using term query you can utilize wildcard term query or regexp query instead.
As name suggests phase_prefix is like poor mans autocomplete it searches for fields which starts with given phrase in your case abc,abcs,abc_def_ghi. as your field doesn't start with abc in case of def_abc,def_abc_ghi it won't work with phrase prefix.
Try using character filters specifically Pattern Replace Character Filter to replace _ with (space) from your field while analyzing your field. check this answer . so your token would result in [def,abc,ghi] instead of single token like [def_abc_ghi]. then you can search it using cross_field on analyzed field which should satisfy all of your mentioned cases.

Elasticsearch: How to get the length of a string field(before analysis)?

My index has a string field containing a variable length random id. Obviously it shouldn't be analysed.
But I don't know much about elasticsearch especially when I created the index.
Today I tried a lot to filter documents based on the length of id, finally I got this groovy script:
doc['myfield'].values.size()
or
doc['myfield'].value.size()
both returns mysterious numbers, I think that's because of the field got analysed.
If it's really the case, is there any way to get the original length or fix the problem, without rebuild the whole index?
Use _source instead of doc. That's using the source of the document, meaning the initial indexed text:
_source['myfield'].value.size()
If possible, try to re-index the documents to:
use doc[field] on a not-analyzed version of that field
even better, find out the size of the field before you index the document and consider adding its size as a regular field in the document itself
Elasticsearch stores a string as tokenized in the data structure ( Field data cache )where we have script access to.
So assuming that your field is not not_analyzed , doc['field'].values will look like this
"In america" => [ "in" , "america" ]
Hence what you get from doc['field'].values is a array and not a string.
Now the story doesn't change even if you have a single token or have the field as not_analyzed.
"america" => [ "america" ]
Now to see the size of the first token , you can use the following request
{
"script_fields": {
"test1": {
"script": "doc['field'].values[0].size()"
}
}
}

String Comparison with Elasticsearch Groovy Dynamic Script

I have an elasticsearch index that contains various member documents. Each member document contains a membership object, along with various fields associated with / describing individual membership. For example:
{membership:{'join_date':2015-01-01,'status':'A'}}
Membership status can be 'A' (active) or 'I' (inactive); both Unicode string values. I'm interested in providing a slight boost the score of documents that contain active membership status.
In my groovy script, along with other custom boosters on various numeric fields, I have added the following:
String status = doc['membership.status'].value;
float status_boost = 0.0;
if (status=='A') {status_boost = 2.0} else {status_boost=0.0};
return _score + status_boost
For some reason associated with how strings operate via groovy, the check (status=='A') does not work. I've attempted (status.toString()=='A'), (status.toString()=="A"), (status.equals('A')), plus a number of other variations.
How should I go about troubleshooting this (in a productive, efficient manner)? I don't have a stand-alone installation of groovy, but when I pull the response data in python the status is very much so either a Unicode 'A' or 'I' with no additional spacing or characters.
#VineetMohan is most likely right about the value being 'a' rather than 'A'.
You can check how the values are indexed by spitting them back out as script fields:
$ curl -XGET localhost:9200/test/_search -d '
{
"script_fields": {
"status": {
"script": "doc[\"membership.status\"].values"
}
}
}
'
From there, it should be an indication of what you're actually working with. More than likely based on the name and your usage, you will want to reindex (recreate) your data so that membership.status is mapped as a not_analyzed string. If done, then you won't need to worry about lowercasing of anything.
In the mean time, you can probably get by with:
return _score + (doc['membership.status'].value == 'a' ? 2 : 0)
As a big aside, you should not be using dynamic scripting. Use stored scripts in production to avoid security issues.

Resources