Linking nested documents together and faceting in ElasticSearch

I have a mapping which looks like this:
"mappings": {
"mydoc": {
"properties": {
"event": {
"type": "nested",
"properties": {
"eventType": {
"type": "string"
},
"idList": {
"type": "integer"
},
"id": {
"type": "integer"
}
}
}
}
}
}
A mydoc document contains a nested array of event documents.
Within a mydoc document, I want to find all IDs X where:
There exists an event with event.eventType='A' whose event.idList contains X
There exists another event with event.eventType='B' whose event.id equals X
Across the index, I want a list of the IDs for which these criteria hold, along with a count (for each ID) of the number of mydoc documents in which they hold.
Is it possible to achieve this in ElasticSearch? I was thinking it might be possible with a nested facet filter or a terms filter lookup but I have not seen a way to do it with these yet.

I think that a parent-child relation might suit your case better than a nested document.
Then you can query your (child) event documents directly if you're searching only in the scope of the events (or add a condition on the _parent field to limit the search to a specific top document).
You can also use the has_child filter or query to search (or facet) on your top documents with conditions on the events (see http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-has-child-filter.html ).
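To make the criterion from the question concrete, here is a minimal client-side sketch in JavaScript. It is not an Elasticsearch query and says nothing about whether has_child can express the correlation; it only spells out what "matching ID" and "count per ID" mean. The sample docs and the function names matchingIds and countDocsPerId are assumptions made up for illustration.

```javascript
// Client-side illustration of the matching rule; NOT an Elasticsearch query.
// Documents are assumed to be shaped like the mapping in the question.
function matchingIds(doc) {
  const events = doc.event || [];
  // IDs carried by events of type B
  const idsOfB = new Set(
    events.filter((e) => e.eventType === "B").map((e) => e.id)
  );
  // IDs listed by a type A event that are also the id of a type B event
  const matches = new Set();
  for (const e of events) {
    if (e.eventType !== "A") continue;
    for (const x of e.idList || []) {
      if (idsOfB.has(x)) matches.add(x);
    }
  }
  return matches;
}

// For each matching ID, count the number of documents it occurred in.
function countDocsPerId(docs) {
  const counts = new Map();
  for (const doc of docs) {
    for (const id of matchingIds(doc)) {
      counts.set(id, (counts.get(id) || 0) + 1);
    }
  }
  return counts;
}

// Hypothetical sample data
const docs = [
  { event: [{ eventType: "A", idList: [1, 2] }, { eventType: "B", id: 2 }] },
  {
    event: [
      { eventType: "A", idList: [2, 3] },
      { eventType: "B", id: 2 },
      { eventType: "B", id: 3 },
    ],
  },
  { event: [{ eventType: "A", idList: [5] }, { eventType: "B", id: 6 }] },
];

const counts = countDocsPerId(docs);
console.log([...counts.entries()]); // [[2, 2], [3, 1]]
```

Each ID is counted once per document, even if several A/B event pairs in the same document agree on it.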

Related

How can I obtain a document from a Cosmos DB using a field in an array as a filter?

I have a Cosmos DB with documents that look like the following:
{
"name": {
"productName": "someProductName"
},
"identifiers": [
{
"identifierCode": "1234",
"identifierLabel": "someLabel1"
},
{
"identifierCode": "432",
"identifierLabel": "someLabel2"
}
]
}
I would like to write a SQL query that returns the entire document, using "identifierLabel" as the filter when searching for it.
I attempted to write a query based on an example I found from the following blog:
SELECT c,t AS identifiers
FROM c
JOIN t in c.identifiers
WHERE t.identifierLabel = "someLabel2"
However, when the result is returned, it appends the following to the end of the document:
{
"name": {
"productName": "someProductName"
},
"identifiers": [
{
"identifierCode": "1234",
"identifierLabel": "someLabel1"
},
{
"identifierCode": "432",
"identifierLabel": "someLabel2"
}
]
},
{
"identifierCode": "432",
"identifierLabel": "someLabel2"
}
How can I avoid this and get the result that I desire, i.e. the entire document with nothing appended to it?
Thanks in advance.
Using ARRAY_CONTAINS(), you should be able to do something like this to retrieve the entire document, without any need for a self-join:
SELECT *
FROM c
WHERE ARRAY_CONTAINS(c.identifiers, {"identifierLabel":"someLabel2"}, true)
Note that ARRAY_CONTAINS() can search for either scalar values or objects. Passing true as the third parameter enables partial matching: an array element matches if it contains the specified properties, even when it also has others. So the query above matches any document whose identifiers array has an object with identifierLabel set to "someLabel2", and it returns the original document unchanged, avoiding the issue you ran into with the self-join.
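As a rough mental model of that partial match, here is a small JavaScript sketch. It is only an emulation for flat objects, not how Cosmos DB implements ARRAY_CONTAINS; the helper name arrayContainsPartial and the use of strict equality are my own assumptions.

```javascript
// Rough emulation of ARRAY_CONTAINS(arr, fragment, true) for flat objects.
// Illustration only; arrayContainsPartial is a hypothetical helper.
function arrayContainsPartial(arr, fragment) {
  if (!Array.isArray(arr)) return false;
  return arr.some(
    (item) =>
      item !== null &&
      typeof item === "object" &&
      // every property named in the fragment must match; extra ones are ignored
      Object.keys(fragment).every((key) => item[key] === fragment[key])
  );
}

// The document from the question
const doc = {
  name: { productName: "someProductName" },
  identifiers: [
    { identifierCode: "1234", identifierLabel: "someLabel1" },
    { identifierCode: "432", identifierLabel: "someLabel2" },
  ],
};

// Filtering keeps or drops whole documents; nothing is appended to them.
const results = [doc].filter((d) =>
  arrayContainsPartial(d.identifiers, { identifierLabel: "someLabel2" })
);
console.log(results.length); // 1
console.log(results[0] === doc); // true
```

Because the filter only decides whether a document is kept, the result has the same shape as the stored document, unlike the JOIN version, which projects the matched array element as well.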

Cloudant Sorting on a nullable field

I want to sort on a field, say name, which is indexed in Cloudant. Using the index without a sort, I get all the documents, both those that have the name field and those that don't. But when I sort on the name field, the documents that don't have it are no longer returned.
Is there any way to do this using query indexes? I want all the documents in sorted order, including those that don't have the name field.
For example:
Below are some documents:
{
"_id": 1234,
"classId": "abc",
"name": "Happa"
}
{
"_id": 12345,
"classId": "abc",
"name": "Prasanth"
}
{
"_id": 123456,
"classId": "abc"
}
Below is the query I am trying to execute:
{
"selector": {
"classId": "abc",
"name" :{
"or" : [
{"$exists": true},{"$exists": false}
]
}
},
"sort": [{ "classId": "asc" }, { "name": "asc" }],
"use_index": "idx-classId_name"
}
I expect all the documents to be returned in sorted order, including the document that doesn't have the name field.
Your query makes no sense to me as it stands. You're requesting a listing of documents which either have, or don't have a specific field (meaning every document), and expecting to sort those on this field that may or may not exist. Such an order isn't defined out of the box.
I'd remove the name clause from the selector, sort only on the classId field, which appears in every document, and then do the secondary partial ordering on the client side, so you can decide how you intend to mix in the documents without the name field with those that have it.
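A minimal sketch of that client-side ordering, assuming the query results are already in memory as an array. Putting documents without a name last is just one possible choice, and the comparison uses plain JavaScript string ordering rather than Cloudant's collation; compareDocs is a hypothetical helper.

```javascript
// Order by classId, then by name, with documents lacking a name placed last.
// Uses plain string comparison, which is an assumption, not Cloudant collation.
function compareDocs(a, b) {
  if (a.classId !== b.classId) return a.classId < b.classId ? -1 : 1;
  const aHas = typeof a.name === "string";
  const bHas = typeof b.name === "string";
  if (aHas && bHas) return a.name < b.name ? -1 : a.name > b.name ? 1 : 0;
  if (aHas === bHas) return 0; // neither document has a name
  return aHas ? -1 : 1; // named documents come first
}

// Sample documents modelled on the question (with string _id values)
const docs = [
  { _id: "123456", classId: "abc" },
  { _id: "12345", classId: "abc", name: "Prasanth" },
  { _id: "1234", classId: "abc", name: "Happa" },
];

const sorted = [...docs].sort(compareDocs);
console.log(sorted.map((d) => d._id)); // ["1234", "12345", "123456"]
```

Swapping the last return statement would put the unnamed documents first instead.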
Another solution is to use a view instead of a Cloudant Query index. I've not tested this, but hopefully the intent is clear:
function(doc) {
if (doc && doc.classId) {
var name = doc.name || "[notfound]";
emit(doc.classId+"-"+name, 1);
}
}
This keys the docs on "classId-name", using a specified sentinel value for docs with no name.
Querying the view should return the documents lexicographically ordered on this compound key (which you can reverse with a query parameter if you wish).
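To see which keys that map function would produce, it can be run locally with a stand-in emit. This sketch only shows the emitted keys in input order; where the "[notfound]" sentinel ends up in the view depends on CouchDB's collation, which is not reproduced here. The emitted array and the local emit are assumptions for illustration.

```javascript
// Stand-in for the view server's emit(); collects rows in call order.
const emitted = [];
function emit(key, value) {
  emitted.push({ key, value });
}

// The map function from the answer above, unchanged in behaviour
function map(doc) {
  if (doc && doc.classId) {
    var name = doc.name || "[notfound]";
    emit(doc.classId + "-" + name, 1);
  }
}

[
  { _id: "1234", classId: "abc", name: "Happa" },
  { _id: "12345", classId: "abc", name: "Prasanth" },
  { _id: "123456", classId: "abc" },
].forEach(map);

console.log(emitted.map((row) => row.key));
// ["abc-Happa", "abc-Prasanth", "abc-[notfound]"]
```

If the sentinel should sort before or after all real names, pick its value with the view's collation rules in mind.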

Azure Search match against two properties of the same object

I would like to run a query that matches against two properties of the same item in a sub-collection.
Example:
[
{
"name": "Person 1",
"contacts": [
{ "type": "email", "value": "person.1#xpto.org" },
{ "type": "phone", "value": "555-12345" }
]
}
]
I would like to be able to search for emails that contain xpto.org, but
doing something like the following doesn't work:
search.ismatchscoring('email','contacts/type,','full','all') and search.ismatchscoring('/.*xpto.org/','contacts/value,','full','all')
Instead, the conditions are evaluated in the context of the main object, so objects like the following will also match:
[
{
"name": "Person 1",
"contacts": [
{ "type": "email", "value": "555-12345" },
{ "type": "phone", "value": "person.1#xpto.org" }
]
}
]
Is there any way around this without having an additional field that concatenates type and value?
Just saw the official doc. At this moment, there's no support for correlated search:
This happens because each clause applies to all values of its field in
the entire document, so there's no concept of a "current sub-document"
https://learn.microsoft.com/en-us/azure/search/search-howto-complex-data-types
and https://learn.microsoft.com/en-us/azure/search/search-query-understand-collection-filters
The solution I've implemented was creating different collections per contact type.
This way I'm able to search directly in, let's say, the email collection without the need for correlated search. It might not be the solution for all cases, but it works well in this one.
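A sketch of how the documents could be reshaped before indexing, assuming one collection field per contact type. The field names emails and phones and the helper splitContactsByType are hypothetical; they are not taken from my actual index.

```javascript
// Reshape a person so each contact type gets its own collection field.
// `emails` and `phones` are hypothetical field names.
function splitContactsByType(person) {
  const byType = { email: [], phone: [] };
  for (const contact of person.contacts || []) {
    // ignore contact types that have no dedicated collection
    if (Object.prototype.hasOwnProperty.call(byType, contact.type)) {
      byType[contact.type].push(contact.value);
    }
  }
  return { name: person.name, emails: byType.email, phones: byType.phone };
}

const good = splitContactsByType({
  name: "Person 1",
  contacts: [
    { type: "email", value: "person.1#xpto.org" },
    { type: "phone", value: "555-12345" },
  ],
});

const swapped = splitContactsByType({
  name: "Person 1",
  contacts: [
    { type: "email", value: "555-12345" },
    { type: "phone", value: "person.1#xpto.org" },
  ],
});

// A condition on the emails field can no longer be satisfied by a phone value.
const hasXptoEmail = (p) => p.emails.some((v) => v.endsWith("xpto.org"));
console.log(hasXptoEmail(good)); // true
console.log(hasXptoEmail(swapped)); // false
```

With this shape a single-field query on the email collection is enough, since the type is implied by the field itself.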

Filter couchdb document based on value from nested child document

I would like to create a map/reduce function that filters documents based on a nested value from a child document but returns the parent document.
I have following documents:
{
"_id": "1",
"_rev": "1-991baf1d86435a73a3460335cc19063c",
"configuration_id": "225f9d47-841c-43c2-90c2-e65bb49083d3",
"name": "test",
"image": "",
"type": "A",
"created": "",
"updated": 1,
"destroyed": ""
}
{
"_id": "225f9d47-841c-43c2-90c2-e65bb49083d3",
"_rev": "1-3e3a1c357c86cbd1cd42b5980b9655a4",
"configuration_packages_id": "cd19b0ba-157d-4dd4-adac-56fd470bfed4",
"configuration_distribution_id": "5b538411-ca99-46c7-ac3c-1f382e4577a9",
"type": "CONFIGURATION",
"configuration": {
"hostname": "example123",
"images": [
"image1",
"image2"
]
}
}
Now I would like to retrieve all the documents of type A with hostname example123.
At the moment I retrieve all the documents of type A like this:
function (doc) {
if (doc.type === "A") {
emit([doc.updated], doc);
}
}
But now I would also like to filter on the host name as well.
I'm not sure on how to achieve this with CouchDB.
TL;DR
You cannot do this.
Details
Your "nested" document is only accessible through a join but you can't query it.
The correct way to do that kind of query natively would have been to have a real nested document inside the parent document. Separating those documents has a cost.
Join example
function (doc) {
if (doc.type === "A") {
// Row for the parent document itself
emit([doc.updated, 0]);
// Row whose value links to the configuration document
emit([doc.updated, 1], {"_id": doc.configuration_id});
}
}
If you query the view with "include_docs=true", each parent document is returned along with its linked configuration document. You can then query for the updated docs, merge the linked rows (1) with the parent rows (0), and filter them on the client.
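The merge-and-filter step could look roughly like this on the client. It is a sketch that assumes the view response has already been parsed into a rows array, with each row carrying id (the emitting parent), key, and doc as returned with include_docs=true; the sample rows and the helper parentsWithHostname are made up for illustration.

```javascript
// Merge the parent rows (marker 0) with the linked configuration rows
// (marker 1) and keep the parents whose configuration has the given hostname.
function parentsWithHostname(rows, hostname) {
  const parents = new Map(); // parent _id -> parent document
  const configs = new Map(); // parent _id -> linked configuration document
  for (const row of rows) {
    const marker = row.key[1];
    if (marker === 0) parents.set(row.id, row.doc);
    else if (marker === 1) configs.set(row.id, row.doc);
  }
  const result = [];
  for (const [id, parent] of parents) {
    const config = configs.get(id);
    if (config && config.configuration &&
        config.configuration.hostname === hostname) {
      result.push(parent);
    }
  }
  return result;
}

// Hypothetical, trimmed view rows
const rows = [
  { id: "1", key: [1, 0], doc: { _id: "1", type: "A", name: "test" } },
  { id: "1", key: [1, 1],
    doc: { _id: "cfg-1", configuration: { hostname: "example123" } } },
  { id: "2", key: [2, 0], doc: { _id: "2", type: "A", name: "other" } },
  { id: "2", key: [2, 1],
    doc: { _id: "cfg-2", configuration: { hostname: "example999" } } },
];

const matches = parentsWithHostname(rows, "example123");
console.log(matches.map((d) => d._id)); // ["1"]
```

The filtering happens entirely on the client, so every type A document and its configuration still travel over the wire.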

Count duplicate values via Elasticsearch terms aggregation

I am trying to run an Elasticsearch terms aggregation on multiple fields of the documents in my index. Each document contains multiple fields with hashtags, which can be extracted using a custom hashtag analyzer. The goal is to find the most common hashtags in the system.
As stated in the Elasticsearch documentation, it is not possible to run a terms aggregation on multiple fields of a document. I am thus trying to use a copy_to field. The problem is that if a document contains the same hashtag in multiple fields, the term should be counted multiple times. This is not the case with the default terms aggregation:
Given Mapping:
{
"properties": {
"field_one": {
"type": "string",
"copy_to": "hashtags"
},
"field_two": {
"type": "string",
"copy_to": "hashtags"
}
}
}
Given Document:
{
"field_one": "Hello #World",
"field_two": "One #World"
}
The aggregation will return a single bucket {"key": "#World", "doc_count": 1}. What I need is a single bucket {"key": "#World", "doc_count": 2}.
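To pin down the counting I am after, here is a client-side JavaScript sketch of the desired result. It is not an Elasticsearch aggregation; the regular expression only approximates my custom hashtag analyzer, and countHashtags is a hypothetical helper.

```javascript
// Count hashtag occurrences once per field in which they appear.
// The /#\w+/g pattern is an assumption standing in for the hashtag analyzer.
function countHashtags(docs, fields) {
  const counts = new Map();
  for (const doc of docs) {
    for (const field of fields) {
      const tags = String(doc[field] || "").match(/#\w+/g) || [];
      for (const tag of tags) {
        counts.set(tag, (counts.get(tag) || 0) + 1);
      }
    }
  }
  return counts;
}

const counts = countHashtags(
  [{ field_one: "Hello #World", field_two: "One #World" }],
  ["field_one", "field_two"]
);
console.log(counts.get("#World")); // 2
```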

Resources