How to get the count of documents that would be added if one selects another aggregation option of an array field in Elasticsearch

Let's say we have four documents with a tags field. It can contain multiple strings, let's say foo, bar and baz.
docA.tags = ['foo']
docB.tags = ['bar']
docC.tags = ['foo', 'bar']
docD.tags = ['foo', 'baz']
I query the docs using aggregations so I get the four documents and a list of three buckets with the count that matches the specific tag.
buckets = [
{key: 'bar', doc_count: 2}, // docB, docC
{key: 'foo', doc_count: 3}, // docA, docC, docD
{key: 'baz', doc_count: 1} // docD
]
If I now run another query and add one of those tags, let's say foo, as a terms filter to the query, I only get the docs (docA, docC, docD) that have this tag. That's what I want.
But I also get another list of possible aggregations with updated counts.
buckets = [
{key: 'bar', doc_count: 1}, // docC
{key: 'baz', doc_count: 1}, // docD
]
But these counts don't really match what's happening. They reflect the count of documents that match both of the tags, the one I selected in the first place (foo) AND the one of the bucket (bar or baz).
But if I then select a second tag – let's say baz – I get documents that have been tagged with foo OR baz. That's because I use the terms filter.
So what I really want is this
buckets = [
{key: 'bar', doc_count: 1}, //docB
{key: 'baz', doc_count: 0},
]
How can I make the counts reflect the number of documents that would be added if I select the second tag? An example of this is here.
I already tried post_filter, but that always gives me the first result. Then I tried adding a min_doc_count flag to the aggs, but this only shows me the combinations that would result in count=0.
I have a solution for this, but it seems pretty complicated to me. I would have to run another request for each aggregation where I invert the filter criteria. So in the example above I would have to query all docs that don't have the tag foo and match the rest of the query. The aggregation results would be exactly what I need.

It sounds like you're trying to do something a little atypical for facets/aggregations.
(However, it's not invalid... it makes a lot of sense to understand how the size of your selection will change through the application of a filter)
What I think you're asking for is:
Display results for: QUERY AND FILTER
Display term aggregation counts for: QUERY AND NOT FILTER
You mentioned you're doing subsequent request(s) for counts? You should be able to construct this aggregation request inside your main search request.
Structurally it's:
match: (QUERY) or match_all
aggregations:
filter: { not: (FILTER) }
aggregations: { terms: ... }
post_filter: (FILTER)
That post_filter is executed after the aggregations are calculated (but still applied to the search results) so your results will be what you expect.
The aggregations work in the scope of the search query alone. (The post_filter has not been applied yet.)
The filter aggregation excludes all documents matching FILTER from the search query results before the Terms Aggregation calculates the counts.
(giving you the left outside edge of the Venn shown above, but just for the counts)
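As a rough sketch of the structure above, the request body could be built like this in Python. It assumes a recent Elasticsearch version, where the standalone not filter is no longer available and bool with must_not takes its place. The aggregation name would_be_added and both helper names are made up for illustration. The small simulation only mirrors the counting logic on the four example documents; it does not talk to a cluster.

```python
from collections import Counter

# The four example documents from the question.
DOCS = {
    "docA": ["foo"],
    "docB": ["bar"],
    "docC": ["foo", "bar"],
    "docD": ["foo", "baz"],
}


def build_request(selected_tags, field="tags"):
    """Build a search body: hits use FILTER, counts use QUERY AND NOT FILTER."""
    selected = {"terms": {field: selected_tags}}
    return {
        "query": {"match_all": {}},
        "aggs": {
            # Hypothetical aggregation name; pick whatever suits your app.
            "would_be_added": {
                # Assumption: bool/must_not stands in for the old `not` filter.
                "filter": {"bool": {"must_not": [selected]}},
                "aggs": {"tags": {"terms": {"field": field}}},
            }
        },
        # Applied to the hits only, after the aggregations are computed.
        "post_filter": selected,
    }


def simulate_added_counts(docs, selected_tags):
    """Count tags over documents that do NOT match any selected tag."""
    selected = set(selected_tags)
    counts = Counter()
    for tags in docs.values():
        if not selected & set(tags):
            counts.update(tags)
    return dict(counts)


print(build_request(["foo"]))
print(simulate_added_counts(DOCS, ["foo"]))
```

With foo selected, only docB falls outside the filter, so bar gets a count of 1 and baz does not appear at all, which corresponds to the 0 the question asks for.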

Related

Efficiently count Documents with different values for a given field

I am trying to count the number of documents that are in each possible state in a particular Arango collection.
This should be possible in 1 pass over all of the documents using a bucket-sort like strategy where you iterate over all documents, if the value for the state hasn't been seen before, you add a counter with a value of 1 to a list. If you have seen that state before, you increment the counter. Once you've reached the end, you'll have a counter for each possible state in the DB that indicates how many documents are currently stored with that state.
I can't seem to figure out how to write this type of logic in AQL to submit as a query. Current strategy is like this:
Loop over all documents, filtering only docs of a particular state.
Loop over all documents, filtering only docs of a different particular state.
...
All states have been filtered.
Return size of each set
This works, but I'm sure it's much slower than it should be. This also means that if we add a new state, we have to update the query to loop over all docs an additional time, filtering based on the new state. A bucket-sort like query would be quick, and would need no updating as new states are created as well.
If these were the documents:
{A}
{B}
{B}
{C}
{A}
Then I'd like the result to be
{ A:2, B:2, C:1 }
Where A,B,&C are values for a particular field. Current strategy filters like so
LET docsA = (
FOR doc in collection
FILTER doc.state == A
RETURN doc
)
Then I manually construct the return object by calling LENGTH on each list of docs.
Any help or additional info would be greatly appreciated.
What about using a COLLECT function? (see docs here)
FOR doc IN collection
COLLECT s = doc.state WITH COUNT INTO c
RETURN { state: s, count: c }
This would return something like:
[
{ state: 'A', count: 23 },
{ state: 'B', count: 2 },
{ state: 'C', count: 45 }
]
Would that accomplish what you are after?
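For comparison, the single-pass counting idea from the question can be sketched in plain Python. This is only an illustration of the grouping logic, not something that runs inside ArangoDB; the function names are hypothetical, and sorting the groups by state is an assumption made to keep the output deterministic.

```python
from collections import Counter


def count_states(docs, field="state"):
    """One pass over the documents, one counter per distinct state."""
    counts = Counter(doc[field] for doc in docs)
    # Same row shape as the RETURN { state: s, count: c } clause above.
    return [{"state": s, "count": c} for s, c in sorted(counts.items())]


def as_object(rows):
    """Fold the rows into a single { state: count } object."""
    return {row["state"]: row["count"] for row in rows}


docs = [{"state": s} for s in ["A", "B", "B", "C", "A"]]
rows = count_states(docs)
print(rows)
print(as_object(rows))
```

A new state needs no change to the code, because the counters are created as the values are encountered.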

How to filter on view using complex query

I am trying to filter on a view which emits bookName and bookItem.
emit([doc.basicInfo.bookName,doc.basicInfo.bookItem], 1);
it gives me below result without any query:
{"total_rows":10,"offset":0,"rows":[
{"id":"d4e5548fb01e6e2c559e702fe7b138ad","key":["correctaccouts","billing"],"value":1},
{"id":"863c46c645b6344719a08231606f2a7d","key":["credeaccount","system"],"value":1},
{"id":"68d39e64c406127960dc735e8167eee3","key":["credeaccount11","system"],"value":1},
{"id":"1ab4d31588d76a42e85b526a316074de","key":["mayankamazon","billing"],"value":1},
{"id":"3204f5db5df91886373f95995ce09a2d","key":["mayankazure","asset"],"value":1},
{"id":"452c040048fb2b779205b3785615d368","key":["mayankmaaa","system"],"value":1},
{"id":"23f01f7bc60c2c8f24f6b741584a69fa","key":["TEST_AWS_Delete212sss12","asset"],"value":1},
{"id":"f0093f474e0d50f046b9fdc9145bdc91","key":["vijeth-myteam111115555555","asset"],"value":1},
{"id":"c3bce8dd1482d841f445fbd617ba1db7","key":["vijeth-myteam11111555sss5555","asset"],"value":1},
{"id":"347479ba91696b73f4a57252cd00a358","key":["vijeth-myteamOnly","asset"],"value":1}
]}
Now I am trying to query on it using complex keys:
satrtkey=[{},"asset"]&endkey=[{},"asset"]
It should return me:
{"total_rows":5,"offset":0,"rows":[
{"id":"3204f5db5df91886373f95995ce09a2d","key":["mayankazure","asset"],"value":1},
{"id":"23f01f7bc60c2c8f24f6b741584a69fa","key":["TEST_AWS_Delete212sss12","asset"],"value":1},
{"id":"f0093f474e0d50f046b9fdc9145bdc91","key":["vijeth-myteam111115555555","asset"],"value":1},
{"id":"c3bce8dd1482d841f445fbd617ba1db7","key":["vijeth-myteam11111555sss5555","asset"],"value":1},
{"id":"347479ba91696b73f4a57252cd00a358","key":["vijeth-myteamOnly","asset"],"value":1}
]}
But it still gives me all 10 records. I want to filter only records of type "asset".
To use key ranges, you must narrow down your search from the leftmost key field to the rightmost one.
For example, if your key were: [doc.basicInfo.bookItem,doc.basicInfo.bookName]
You could search with start_key=["asset",null]&end_key=["asset",{}]
Also, your current query is equivalent to key=[{},"asset"]. You might instead have tried start_key=[null,"asset"]&end_key=[{},"asset"], but that would not work either: keys are compared element by element from the left, so the second element only matters when the first elements are equal, and this range covers every bookName.
Example
View:
function (doc) {
emit([doc.basicInfo.bookItem,doc.basicInfo.bookName], 1);
}
Query:
http://localhost:5984/<db>/_design/<design_name>/_view/<view_name>?include_docs=true&inclusive_end=true&start_key=%5B%22asset%22%2Cnull%5D&end_key=%5B%22asset%22%2C%7B%7D%5D
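To see why the order of the key fields matters, here is a small Python sketch that imitates how array keys are compared. It is a simplification under stated assumptions: collate_key only covers the broad type ordering (null, booleans, numbers, strings, arrays, objects), it treats all objects as equal, and it compares strings by code point rather than with CouchDB's real string collation. All helper names are hypothetical.

```python
import json
from urllib.parse import quote


def collate_key(value):
    """Simplified stand-in for view collation: rank by type, then by value."""
    if value is None:
        return (0,)
    if value is False:
        return (1,)
    if value is True:
        return (2,)
    if isinstance(value, (int, float)):
        return (3, value)
    if isinstance(value, str):
        return (4, value)  # assumption: plain code point order
    if isinstance(value, list):
        return (5, [collate_key(v) for v in value])
    if isinstance(value, dict):
        return (6,)  # objects sort last; contents ignored in this sketch
    raise TypeError(f"unsupported key type: {type(value).__name__}")


def in_range(key, start, end):
    """True if start <= key <= end under the simplified ordering."""
    return collate_key(start) <= collate_key(key) <= collate_key(end)


def encode_param(value):
    """JSON-encode and percent-encode a key for use in the query string."""
    return quote(json.dumps(value, separators=(",", ":")), safe="")


# (bookName, bookItem) pairs taken from the view output in the question.
ROWS = [
    ("mayankamazon", "billing"),
    ("mayankazure", "asset"),
    ("credeaccount", "system"),
    ("vijeth-myteamOnly", "asset"),
]

# Key [bookItem, bookName]: the range pins the first element to "asset".
item_first = [[item, name] for name, item in ROWS]
assets = [k for k in item_first if in_range(k, ["asset", None], ["asset", {}])]

# Key [bookName, bookItem]: the range spans every possible bookName.
name_first = [[name, item] for name, item in ROWS]
everything = [k for k in name_first if in_range(k, [None, "asset"], [{}, "asset"])]

print(assets)
print(len(everything))
print(encode_param(["asset", None]), encode_param(["asset", {}]))
```

The two encoded values are the same strings that appear as start_key and end_key in the query URL above.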

using collect in arangodb insert to create new documents

I have a collection called prodSampleNew with documents that have hierarchy levels as fields in arangodb:
{
prodId: 1,
LevelOne: "clothes",
LevelTwo: "pants",
LevelThree: "jeans",
... etc....
}
I want to take the hierarchy levels and convert them into their own documents, so I can eventually build a proper graph with the hierarchy.
I was able to extract the first level of the hierarchy and put it in a new collection using the following:
for i IN [1]
let HierarchyList = (
For prod in prodSampleNew
COLLECT LevelOneUnique = prod.LevelOne
RETURN LevelOneUnique
)
FOR hierarchyLevel in HierarchyList
INSERT {"name": hierarchyLevel}
IN tmp
However, having to put a FOR i IN [1] at the top seems wrong, and it feels like there should be a better way. (Yes, I am fairly new to AQL.)
Any pointers on a better way to do this would be appreciated
Not sure what you are trying to achieve exactly.
The FOR i IN [1] seems unnecessary however, so you could start your AQL query directly with the subquery to compute the distinct values from hierarchy level 1:
LET HierarchyList = (
FOR prod IN prodSampleNew
COLLECT LevelOneUnique = prod.LevelOne
RETURN LevelOneUnique
)
FOR hierarchyLevel IN HierarchyList
INSERT {"name": hierarchyLevel} IN tmp
The result should be the same.
If the question is more like "how can I get all distinct names of levels from all hierarchies", then you could use something like
LET HierarchyList = UNIQUE(FLATTEN(
FOR prod IN prodSampleNew
RETURN [ prod.LevelOne, prod.LevelTwo, prod.LevelThree ]
))
...
to produce an array with the unique names of the hierarchy levels for level 1-3.
If this doesn't answer your question, please describe the result the query should produce.
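To illustrate what the flatten-then-deduplicate step produces, here is a plain Python sketch of the same idea. It does not run against ArangoDB; the sample products and the function name are invented for the example, and keeping the first-seen order is a choice made here rather than something the AQL version promises.

```python
def unique_level_names(products, levels=("LevelOne", "LevelTwo", "LevelThree")):
    """Collect every level value from every product, then drop duplicates."""
    flattened = [prod[level] for prod in products for level in levels if level in prod]
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(flattened))


# Hypothetical sample data in the shape of the prodSampleNew documents.
products = [
    {"prodId": 1, "LevelOne": "clothes", "LevelTwo": "pants", "LevelThree": "jeans"},
    {"prodId": 2, "LevelOne": "clothes", "LevelTwo": "shirts", "LevelThree": "tees"},
]

names = unique_level_names(products)
print(names)
# Each name would then become one {"name": ...} document to insert.
print([{"name": n} for n in names])
```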

How to filter by document attributes in couchdb?

Could someone explain to me how I can filter documents with multiple attributes by using arrays and keys?
For example, I have a document with the attributes a, b, c and d. I would like to filter by a user-selected value of attribute "a". Later I would like to narrow the results with a value of attribute "c" or maybe a value of attribute "d".
Does anyone have suggestions on how to accomplish this elegantly?
Assuming your doc looks like:
{ 'a': 123, 'b': 456, 'c': 789, ... }
You can create a view like this:
function(doc){
emit([doc.a, doc.b, doc.c], doc)
}
You can then use the startkey and endkey parameters to access the views, whilst restricting results to a specific subset:
...&startkey=[123]&endkey=[123,{}] // Shows all results with doc.a=123
...&startkey=[123]&endkey=[123,456,{}] // Shows all results with doc.a=123 and doc.b<=456
However all elements will be sorted in a single list and all you can ever access is a subsection of this list. So if you want to access documents where 123 <= doc.a <= 456 and doc.b between 123 and 456, you'll have to create two separate views, one for doc.a and one for doc.b and then have your client app identify the documents returned by both views.
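The client-side step mentioned above, finding the documents returned by both views, could look roughly like this in Python. It assumes each view response has already been fetched and parsed, and that each row carries the usual id field; the function name and the sample rows are made up for illustration.

```python
def ids_in_both(rows_a, rows_b):
    """Return ids present in both result sets, keeping the order of rows_a."""
    ids_b = {row["id"] for row in rows_b}
    return [row["id"] for row in rows_a if row["id"] in ids_b]


# Hypothetical rows from a view keyed on doc.a and a view keyed on doc.b.
rows_by_a = [
    {"id": "doc1", "key": 123, "value": None},
    {"id": "doc2", "key": 200, "value": None},
    {"id": "doc3", "key": 456, "value": None},
]
rows_by_b = [
    {"id": "doc3", "key": 130, "value": None},
    {"id": "doc1", "key": 400, "value": None},
]

print(ids_in_both(rows_by_a, rows_by_b))
```

Building a set from one side keeps the intersection linear in the number of rows instead of comparing every pair.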

Mongodb: How to order a "select in" in same order as elements of the array

I'm performing this mongo query:
db.mytable.find({id:{$in:["1","2", "3", "4" ]}});
It returns the results in a strange order, as follows:
4,3,2,1
I need to retrieve all results in same order as it was defined in the query array.
1,2,3,4
Is it possible?
There is indeed no guarantee about the order of results returned from your query, but you could do a sort afterwards with the result. Two examples, the first one with the order you wanted, the second one reversed.
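const arr = ["1", "2", "3", "4"];
// Assuming the native Node.js driver: find() returns a cursor, toArray() resolves to the documents.
db.collection.find({ id: { $in: arr } }).toArray()
.then(result => {
// Look up each requested id in turn so the output follows the order of arr.
let sorted = arr.map(i => result.find(j => j.id === i));
console.log(sorted) // 1, 2, 3, 4
// Copy before reversing: reverse() mutates the array in place.
let reversed = [...arr].reverse().map(i => result.find(j => j.id === i));
console.log(reversed) // 4, 3, 2, 1
});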
In case you want to do real MongoDB ID lookups, use db.collection.find({ _id: { $in: arr }}) and .map(i => result.find(j => j._id == i)) (Notice the two equal signs instead of three)
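The same reordering idea can be sketched in Python without a nested search for each id. This is only an illustration on already-fetched results; order_like and the sample documents are hypothetical, and ids that matched nothing are simply skipped.

```python
def order_like(ids, results, key="id"):
    """Reorder fetched documents to follow the order of the requested ids."""
    by_id = {doc[key]: doc for doc in results}
    # Skip ids that did not match any document.
    return [by_id[i] for i in ids if i in by_id]


wanted = ["1", "2", "3", "4"]
# Pretend the database returned the documents in this arbitrary order.
fetched = [{"id": "4"}, {"id": "3"}, {"id": "2"}, {"id": "1"}]

ordered = order_like(wanted, fetched)
print([doc["id"] for doc in ordered])
```

Indexing the results by id first makes the reordering a single pass over the requested ids.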
A couple of things to note:
1.) MongoDB, like most databases, makes no guarantees about the order of results returned from your query unless you use a call to sort(). If you really want to guarantee that your query result is returned in a specific order, you'll need to specify that sort order.
2.) In general, the most recently updated/moved document will show up at the end of your result set but there are still no guarantees. MongoDB uses "natural order" for its native ordering of objects and although this is very close to the order of insertion, it is not guaranteed to be the same.
3.) Indexed fields will behave differently. It's worth pointing out that it looks like your query is using id and not _id. The latter, _id, is indexed by default, whereas id is not indexed unless you've explicitly added an index to that field.
You can read more about MongoDB's sorting and ordering here:
http://www.mongodb.org/display/DOCS/Sorting+and+Natural+Order
You can write a query like this:
db.mytable.find({id:{$in:["1","2", "3", "4" ]}}).sort({id:1})
To have your results ORDER BY id ASC
Source : MongoDB Advanced Queries
But if you just want to order the results based on your $in array, you can try passing the $in array in reverse order: the document matching the first element of the $in array is likely to appear last in the results. This is not guaranteed, though.

Resources