I have an example JSON document:
{
"_id": "1000001101012017",
"_rev": "1-b7de1e91dbcac1ca098be7d99ddea138",
"ACC_ID": 1005,
"DT": 201701010000000000,
"DBO": 0.9,
"CRO": 0,
"SALDO": -2.05
}
and the Search index function:
function (doc) {
  index("saldoSearch", doc._id);
  if (doc.DT) {
    index("DT", doc.DT, {"store": true});
  }
  if (doc.ACC_ID) {
    index("ACC_ID", doc.ACC_ID, {"store": true});
  }
  if (doc.SALDO) {
    index("SALDO", doc.SALDO, {"store": true});
  }
}
The search query is:
{
"q": "ACC_ID: 1005 AND DT: [198905260000000000 TO 201702241046285215]",
"sort": "-DT",
"limit": 1
}
and the response is:
{
"total_rows": 4,
"bookmark": "g1AAAABReJzLYWBgYMxgTmGQS87JL01JzCtxKCkugbFTkoz1Mgr1SpKSc4DqmPJYGFYBAZD6DwRZYDE357aU7W4tTgpJDKvUqrKyACwGG7U",
"rows": [
{
"id": "100502012017",
"order": [
201702010000000000,
11150970
],
"fields": {
"DT": 201702010000000000,
"ACC_ID": 1005
}
}
]
}
The response is missing the "SALDO" field, but with
"q": "ACC_ID: 1005 AND DT: [198905260000000000 TO 201702241046285215] AND SALDO: [-Infinity TO Infinity]"
the response does include "SALDO", though it takes ~10 seconds.
How can I get a response with "SALDO" while keeping performance high?
One way to solve your problem is to add include_docs=true to your query, i.e.:
{
"q": "ACC_ID: 1005 AND DT: [198905260000000000 TO 201702241046285215]",
"sort": "-DT",
"limit": 1,
"include_docs": true
}
The search results will then contain the entire document in the result object.
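For illustration, each row then carries the matched document under a doc property, so SALDO can be read from there without storing it in the index. A sketch based on the example document above (the exact row shape may vary by Cloudant/CouchDB version):
{
  "total_rows": 4,
  "rows": [
    {
      "id": "1000001101012017",
      "fields": { "DT": 201701010000000000, "ACC_ID": 1005 },
      "doc": {
        "_id": "1000001101012017",
        "ACC_ID": 1005,
        "DT": 201701010000000000,
        "DBO": 0.9,
        "CRO": 0,
        "SALDO": -2.05
      }
    }
  ]
}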
Given the following document structure:
{
"name": [
{
"use": "official",
"family": "Chalmers",
"given": [
"Peter",
"James"
]
},
{
"use": "usual",
"given": [
"Jim"
]
},
{
"use": "maiden",
"family": "Windsor",
"given": [
"Peter",
"James"
]
}
]
}
Query:
FOR client IN Patient FILTER client.name[*].use=='official' RETURN client.name[*].given
I have telecom and name arrays.
I want a query that checks whether name[*].use == 'official' and then returns the corresponding given array.
Expected result:
"given": [
"Peter",
"James"
]
client.name[*].use is an array, so you need to use an array operator. It can be either of the following:
'string' in doc.attribute
doc.attribute ANY == 'string'
doc.attribute ANY IN ['string']
To return just the given names from the 'official' array, you can use a subquery:
RETURN { given:
FIRST(FOR name IN client.name FILTER name.use == 'official' LIMIT 1 RETURN name.given)
}
Alternatively, you can use an inline expression:
FOR client IN Patient
FILTER 'official' IN client.name[*].use
RETURN { given:
FIRST(client.name[* FILTER CURRENT.use == 'official' LIMIT 1 RETURN CURRENT.given])
}
Result:
[
{
"given": [
"Peter",
"James"
]
}
]
In your original post, the example document and query didn't match, but assuming the following structure:
{
"telecom": [
{
"use": "official",
"value": "+1 (03) 5555 6473 82"
},
{
"use": "mobile",
"value": "+1 (252) 5555 910 920 3"
}
],
"name": [
{
"use": "official",
"family": "Chalmers",
"given": [
"Peter",
"James"
]
},
{
"use": "usual",
"given": [
"Jim"
]
},
{
"use": "maiden",
"family": "Windsor",
"given": [
"Peter",
"James"
]
}
]
}
… here is a possible query:
FOR client IN Patient
FILTER LENGTH(client.telecom[* FILTER
CONTAINS(CURRENT.value, "(03) 5555 6473") AND
CURRENT.use == 'official']
)
RETURN {
given: client.name[* FILTER CURRENT.use == 'official' RETURN CURRENT.given]
}
Note that client.telecom[*].value LIKE "..." causes the array of phone numbers to be cast to a string "[\"+1 (03) 5555 6473 82\",\"+1 (252) 5555 910 920 3\"]" against which the LIKE operation is run - this kind of works, but it's not ideal.
CONTAINS() is also faster than LIKE with % wildcards on both sides.
There could be multiple 'official' elements, which is why an extra level of array nesting can appear. The above query produces:
[
{
"given": [
[
"Peter",
"James"
]
]
}
]
If you know that there is only one such element, or you restrict the result to one element explicitly, then you can get rid of one of the wrapping square brackets with FIRST() or FLATTEN().
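For instance, a variant of the query above using FLATTEN() to remove the extra nesting (same assumed document structure):
FOR client IN Patient
  FILTER LENGTH(client.telecom[* FILTER
    CONTAINS(CURRENT.value, "(03) 5555 6473") AND
    CURRENT.use == 'official']
  )
  RETURN {
    given: FLATTEN(client.name[* FILTER CURRENT.use == 'official' RETURN CURRENT.given])
  }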
I think it's best if I describe my intent and try to break it down to code.
I want users to have the ability to run the complex queries that query_string offers, should they choose to: for example 'AND', 'OR', '~', etc.
I want fuzziness in effect, which has made me do things I feel dirty about, like appending "#{query}~" to the query sent to ES. In other words, I am specifying a fuzzy query on the user's behalf, because we offer transliteration and it can be difficult to get the exact spelling.
At times, users search for a number of words that are supposed to form a phrase, but query_string matches them individually rather than as a phrase. For example, for 'he who will' the top match should be documents where those three words appear in that order, followed by everything else.
Current query:
{
"indices_boost": {},
"aggregations": {
"by_ayah_key": {
"terms": {
"field": "ayah.ayah_key",
"size": 6236,
"order": {
"average_score": "desc"
}
},
"aggregations": {
"match": {
"top_hits": {
"highlight": {
"fields": {
"text": {
"type": "fvh",
"matched_fields": [
"text.root",
"text.stem_clean",
"text.lemma_clean",
"text.stemmed",
"text"
],
"number_of_fragments": 0
}
},
"tags_schema": "styled"
},
"sort": [
{
"_score": {
"order": "desc"
}
}
],
"_source": {
"include": [
"text",
"resource.*",
"language.*"
]
},
"size": 5
}
},
"average_score": {
"avg": {
"script": "_score"
}
}
}
}
},
"from": 0,
"size": 0,
"_source": [
"text",
"resource.*",
"language.*"
],
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "inna alatheena",
"fuzziness": 1,
"fields": [
"text^1.6",
"text.stemmed"
],
"minimum_should_match": "85%"
}
}
],
"should": [
{
"match": {
"text": {
"query": "inna alatheena",
"type": "phrase"
}
}
}
]
}
}
}
Note: alatheena searched without the ~ will not return anything, although I have allatheena in the index. So I must do a fuzzy search.
Any thoughts?
I see that you're doing ES indexing of Qur'anic verses, +1 ...
Much of your problem domain, if I understood it correctly, can be solved simply by storing lots of transliteration variants (and permutations of how they combine) in a separate field on your Aayah documents.
First off, you should make a char filter that replaces all double letters with single letters: [aa] => [a], [ll] => [l].
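A rough sketch of such a char filter in the index settings, using a pattern_replace char filter that collapses any doubled character (the filter and analyzer names here are made up for illustration):
{
  "settings": {
    "analysis": {
      "char_filter": {
        "collapse_doubles": {
          "type": "pattern_replace",
          "pattern": "(.)\\1",
          "replacement": "$1"
        }
      },
      "analyzer": {
        "translit_text": {
          "type": "custom",
          "tokenizer": "standard",
          "char_filter": ["collapse_doubles"],
          "filter": ["lowercase"]
        }
      }
    }
  }
}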
Maybe also make a separate field containing all of [a, e, i] (because of their "vocative"/transcribal ambiguity) replaced with € or something similar, and do the same while querying in order to get as many matches as possible...
Also, TH in "allatheena" (which as a footnote may really be Dhaal, Thaa, Zhaa, Taa+Haa, Taa+Hhaa, Ttaa+Hhaa transcribed ...) should be replaced by something, or both the Dhaal AND the Thaa should be transcribed multiple times.
Then, because it's Qur'anic script, all Alefs without diacritics, Hamza, Madda, etc should be treated as Alef (or Hamzat) ul-Wasl, and that should also be considered when indexing / searching, because of Waqf / Wasl in reading arabic. (consider all the Wasl`s in the first Aayah of Surat Al-Alaq for example)
Dunno if this is answering your question in any way, but I hope it's of some assistance in implementing your application nonetheless.
You should use Dis Max Query to achieve that.
A query that generates the union of documents produced by its subqueries, and that scores each document with the maximum score for that document as produced by any subquery, plus a tie breaking increment for any additional matching subqueries.
This is useful when searching for a word in multiple fields with different boost factors (so that the fields cannot be combined equivalently into a single search field). We want the primary score to be the one associated with the highest boost.
A quick example of how to use it:
POST /_search
{
"query": {
"dis_max": {
"tie_breaker": 0.7,
"boost": 1.2,
"queries": [
{
"match": {
"text": {
"query": "inna alatheena",
"type": "phrase",
"boost": 5
}
}
},
{
"match": {
"text": {
"query": "inna alatheena",
"type": "phrase",
"fuzziness": "AUTO",
"boost": 3
}
}
},
{
"query_string": {
"default_field": "text",
"query": "inna alatheena"
}
}
]
}
}
}
It will run all of your queries, and for each document the highest score among them is taken (plus the tie breaker for any additional matches). So just define your rules with it, and you should achieve what you wanted.
I'm manipulating JSON data of this shape:
[
{
"id": 1,
"title": "Sun",
"seeAlso": [
{
"id": 2,
"title": "Rain"
},
{
"id": 3,
"title": "Cloud"
}
]
},
{
"id": 2,
"title": "Rain",
"seeAlso": [
{
"id": 3,
"title": "Cloud"
}
]
},
{
"id": 3,
"title": "Cloud",
"seeAlso": [
{
"id": 1,
"title": "Sun"
}
]
}
]
After inserting the documents into the database, a Node.js search using
db.documents.query(
q.where(
q.collection('test films'),
q.value('title','Sun')
).withOptions({categories: 'none'})
)
.result( function(results) {
console.log(JSON.stringify(results, null,2));
});
will return both the film titled 'Sun' and the films that have a seeAlso/title property (forgive the XPath syntax) equal to 'Sun'.
I need to find, separately: 1) films with title = 'Sun'; 2) films with seeAlso/title = 'Sun'.
I tried a container query using q.scope() without success; I can't find how to scope the root object node (first case), and for the second case,
q.where(q.scope(q.property('seeAlso'), q.value('title','Sun')))
returns as first result an item which matches all text inside the root object node
{
"index": 1,
"uri": "/1.json",
"path": "fn:doc(\"/1.json\")",
"score": 137216,
"confidence": 0.6202662,
"fitness": 0.6701325,
"href": "/v1/documents?uri=%2F1.json&database=Documents",
"mimetype": "application/json",
"format": "json",
"matches": [
{
"path": "fn:doc(\"/1.json\")/object-node()",
"match-text": [
"Sun Rain Cloud"
]
}
]
},
which seems crazy.
Any idea how to do such searches on denormalized JSON data?
Laurent:
XPaths on JSON are supported by MarkLogic.
In particular, you might consider setting up a path range index to match /title at the root:
http://docs.marklogic.com/guide/admin/range_index#id_54948
Scoped property matching requires either filtering or indexed positions to be accurate. An alternative is to set up another path range index on /seeAlso/title.
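A minimal sketch of querying through such an index with the Node.js client, assuming a string path range index on /seeAlso/title has already been configured in the database (q.pathIndex and q.range as in the client's query builder):
// Sketch only: relies on a configured string path range index on /seeAlso/title
db.documents.query(
  q.where(
    q.collection('test films'),
    q.range(q.pathIndex('/seeAlso/title'), '=', 'Sun')
  )
)
.result(function (results) {
  console.log(JSON.stringify(results, null, 2));
});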
For the match issue, it would be useful to know the MarkLogic version and to see the entire query.
Hoping that helps,
How do I go about creating a view that would allow me to search for a particular value in a JSON object nested within the main document?
For example, below is a document from my database.
The phone number is kept in "phone", which is within the "patientinfo" key.
How would I search for a phone number with this layout?
{
"_id": "x2484",
"_rev": "2-44ac5e42ce95fcbcc8a743a256a3f6ce",
"patientinfo": {
"name": "Douglas",
"laststat": "Complete",
"timetocall": 1438034383,
"queue": 7,
"notes": "Spouse placed call",
"laststatcode": 501,
"timestamp": 1438023283,
"phone": 3141592653,
"attempts": 1
},
"call": {
"1": {
"timetocall": 1438034383,
"status": "Complete",
"laststat": "Complete",
"timestamp": 1438023283,
"lastintv": "moor",
"laststatcode": 501
}
}
}
So the answer turned out to be much simpler than I thought. All I needed to do was address the next level of the JSON object in my if and emit statements.
function (doc) {
  // Guard against documents that have no patientinfo object at all
  if (doc.patientinfo && doc.patientinfo.phone) {
    emit(doc.patientinfo.phone, doc);
  }
}
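Assuming the map function lives in a (hypothetical) design document named patients under a view named by_phone, the lookup is then a straightforward key query:
GET /db/_design/patients/_view/by_phone?key=3141592653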
How do I search for all unique values of a given field with Elasticsearch?
I have a query like select full_name from authors, so I can display the list to the users on a form.
You could make a terms facet on your 'full_name' field. But in order to do that properly, you need to make sure you're not tokenizing it while indexing; otherwise every entry in the facet will be a different term that is part of the field content. You most likely need to configure it as 'not_analyzed' in your mapping. If you are also searching on it and you still want to tokenize it, you can index it in two different ways using a multi field.
You also need to take into account that depending on the number of unique terms that are part of the full_name field, this operation can be expensive and require quite some memory.
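For illustration, a pre-2.x style mapping that indexes full_name both analyzed (for search) and not_analyzed (for faceting) via a multi field might look like this (the type name authors and the raw sub-field name are assumptions):
{
  "mappings": {
    "authors": {
      "properties": {
        "full_name": {
          "type": "string",
          "fields": {
            "raw": { "type": "string", "index": "not_analyzed" }
          }
        }
      }
    }
  }
}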
For Elasticsearch 1.0 and later, you can leverage the terms aggregation to do this.
Query DSL:
{
"aggs": {
"NAME": {
"terms": {
"field": "",
"size": 10
}
}
}
}
A real example:
{
"aggs": {
"full_name": {
"terms": {
"field": "authors",
"size": 0
}
}
}
}
Then you can get all unique values of the authors field.
size=0 means no limit on the number of terms (this requires ES 1.1.0 or later).
Response:
{
...
"aggregations" : {
"full_name" : {
"buckets" : [
{
"key" : "Ken",
"doc_count" : 10
},
{
"key" : "Jim Gray",
"doc_count" : 10
}
]
}
}
}
see Elasticsearch terms aggregations.
Intuition:
In SQL parlance:
Select distinct full_name from authors;
is equivalent to
Select full_name from authors group by full_name;
So, we can use the grouping/aggregate syntax in ElasticSearch to find distinct entries.
Assume the following is the structure stored in Elasticsearch:
[{
"author": "Brian Kernighan"
},
{
"author": "Charles Dickens"
}]
What did not work: Plain aggregation
{
"aggs": {
"full_name": {
"terms": {
"field": "author"
}
}
}
}
I got the following error:
{
"error": {
"root_cause": [
{
"reason": "Fielddata is disabled on text fields by default...",
"type": "illegal_argument_exception"
}
]
}
}
What worked like a charm: appending .keyword to the field
{
"aggs": {
"full_name": {
"terms": {
"field": "author.keyword"
}
}
}
}
And the sample output could be:
{
"aggregations": {
"full_name": {
"buckets": [
{
"doc_count": 372,
"key": "Charles Dickens"
},
{
"doc_count": 283,
"key": "Brian Kernighan"
}
],
"doc_count": 1000
}
}
}
Bonus tip:
Let us assume the field in question is nested as follows:
[{
"authors": [{
"details": [{
"name": "Brian Kernighan"
}]
}]
},
{
"authors": [{
"details": [{
"name": "Charles Dickens"
}]
}]
}
]
Now the correct query becomes:
{
  "size": 0,
  "aggregations": {
    "full_name": {
      "nested": {
        "path": "authors.details"
      },
      "aggregations": {
        "author_details": {
          "terms": {
            "field": "authors.details.name"
          }
        }
      }
    }
  }
}
Working in Elasticsearch 5.2.2:
curl -XGET http://localhost:9200/articles/_search?pretty -d '
{
"aggs" : {
"whatever" : {
"terms" : { "field" : "yourfield", "size":10000 }
}
},
"size" : 0
}'
The "size":10000 means get (at most) 10000 unique values. Without this, if you have more than 10 unique values, only 10 values are returned.
The "size":0 means that in result, "hits" will contain no documents. By default, 10 documents are returned, which we don't need.
Reference: bucket terms aggregation
Also note, according to this page, facets have been replaced by aggregations in Elasticsearch 1.0, which are a superset of facets.
The existing answers did not work for me in Elasticsearch 5.X, for the following reasons:
I needed to tokenize my input while indexing.
"size": 0 failed to parse because "[size] must be greater than 0."
"Fielddata is disabled on text fields by default." This means by default you cannot search on the full_name field. However, an unanalyzed keyword field can be used for aggregations.
Solution 1: use the Scroll API. It works by keeping a search context and making multiple requests, each time returning subsequent batches of results. If you are using Python, the elasticsearch module has the scan() helper function to handle scrolling for you and return all results.
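A raw-API sketch of scrolling (index and field names follow the examples above; replace the placeholder with the _scroll_id returned by the first call):
POST /authors/_search?scroll=1m
{
  "size": 1000,
  "_source": ["full_name"],
  "query": { "match_all": {} }
}

POST /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "<_scroll_id from the previous response>"
}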
Solution 2: use the Search After API. It is similar to Scroll, but provides a live cursor instead of keeping a search context. Thus it is more efficient for real-time requests.
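A comparable sketch with search_after, sorting on the keyword sub-field and passing the sort value of the last hit from the previous page (the value here comes from the sample data above):
GET /authors/_search
{
  "size": 1000,
  "sort": [{ "full_name.keyword": "asc" }],
  "search_after": ["Charles Dickens"]
}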