How to use NEIGHBORS in AQL? - arangodb

I'm new to ArangoDB and a growing fan already. Among other things, we need to translate many-to-many relations into graphs and query them efficiently.
However I can't seem to reproduce the behaviour in NEIGHBORS as described in the cookbook
under "Using Edge Collections".
After I insert data and run:
FOR b IN books RETURN { book: b, authors: NEIGHBORS(books, written, b._id, 'inbound') }
[
{
"book" : {
"_id" : "books/10519631898915",
"_key" : "10519631898915",
"_rev" : "10519631898915",
"title" : "The beauty of JOINS"
},
"authors" : [ ]
}
]
Empty authors list! I tried this instead:
FOR b IN books RETURN { book: b, authors: NEIGHBORS(authors, written, b._id, 'inbound') }
[
{
"book" : {
"_id" : "books/10519631898915",
"_key" : "10519631898915",
"_rev" : "10519631898915",
"title" : "The beauty of JOINS"
},
"authors" : [
"authors/10519474612515",
"authors/10519475792163"
]
}
]
This returns only the list of _id values. Neither query returns what I need, which is the edge/vertex structure shown in the cookbook.
(All has been tested in 2.6.9)
How is the use of NEIGHBORS intended and how do I get to my goal in pure AQL?
Is there a standard documentation of NEIGHBORS (and other graph AQL features) somewhere with description and type of each argument as well as return value?

Have you tried the includeData option for NEIGHBORS?
FOR b IN books RETURN { book: b, authors: NEIGHBORS(authors, written, b._id, 'inbound', [], {includeData: true}) }
That worked in my test.
It will be far more performant than PATHS on large datasets (PATHS computes much more irrelevant information).
Note: The empty array [] is the edge-example argument that restricts which edges are followed. With an empty array all edges are followed, but you could follow only specific edges by passing, for example, {label: "written"} instead of [].

Right, I found one solution:
FOR p IN PATHS(books, written, 'inbound')
RETURN p.destination
Result:
Warnings:
[1577], 'collection 'books' used as expression operand'
Result:
[
{
"_id": "books/10519631898915",
"_rev": "10519631898915",
"_key": "10519631898915",
"title": "The beauty of JOINS"
},
{
"_id": "authors/10519474612515",
"_rev": "10519474612515",
"_key": "10519474612515",
"name": {
"first": "John",
"last": "Doe"
}
},
{
"_id": "authors/10519475792163",
"_rev": "10519475792163",
"_key": "10519475792163",
"name": {
"first": "Maxima",
"last": "Musterfrau"
}
}
]
It gets the destination vertices at least, but it doesn't seem right since I get a warning and the source vertex is included as a destination.
Further elaboration and suggestions are very welcome.

UPDATE (2017): NEIGHBORS is no longer supported in AQL 3.x
Instead of
NEIGHBORS(books, written, b._id, 'inbound')
you could write a sub-query:
(FOR v IN 1..1 INBOUND b written RETURN v)
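To make the semantics of that one-step inbound traversal concrete, here is a minimal plain-Python sketch of what it computes. It does not talk to ArangoDB; the edge direction (author to book) and the helper name inbound_neighbors are assumptions made for illustration, and the ids and names are taken from the output shown earlier in the thread.

```python
# Edge documents as stored in an edge collection: each has _from and _to.
# Assumption: 'written' edges point from an author to a book.
written = [
    {"_from": "authors/10519474612515", "_to": "books/10519631898915"},
    {"_from": "authors/10519475792163", "_to": "books/10519631898915"},
]

# Vertex documents keyed by _id (trimmed to the name field).
authors = {
    "authors/10519474612515": {"name": {"first": "John", "last": "Doe"}},
    "authors/10519475792163": {"name": {"first": "Maxima", "last": "Musterfrau"}},
}


def inbound_neighbors(vertex_id, edges, vertices):
    """Return the full documents of vertices that have an edge pointing at vertex_id."""
    return [vertices[e["_from"]] for e in edges if e["_to"] == vertex_id]


result = inbound_neighbors("books/10519631898915", written, authors)
print([a["name"]["last"] for a in result])
```

The point of the sketch is that the traversal yields whole vertex documents rather than just _id strings, which is what the original question was after.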

Related

Null when filtering Many-to-Many relationship with JHipster

I have an issue with filtering in JHipster.
Here is my (relevant) jhipster-jdl.jh file :
entity Exercise {
name String required
}
entity Difficulty {
name String required
}
entity Language {
name String required
}
relationship ManyToMany {
Exercise{language(name)} to Language
}
relationship ManyToOne {
Exercise{difficulty} to Difficulty
}
filter Exercise
I generated the Springboot service with JHipster and did not change anything.
Let's say I have an exercise called "test" with difficulty "easy" and languages "spanish" and "dutch".
When I query the GET exercises endpoint with the filter name.equals=test :
http://localhost:8080/myservice/api/exercises?name.equals=test
I get this answer :
[
{
"id" : 1000,
"difficulty" : {
"id": 5,
"name": "easy"
},
"languages" : null,
"name" : "test"
}
]
As you can see, the issue is that I don't have direct access to the languages linked to my exercise.
Note that the difficulty field has no issue because it is a many-to-one relationship.
The database is not the source of these issues, because if I query the GET exercises/{id} endpoint with the exercise's id :
http://localhost:8080/myservice/api/exercises/1000
I get the right result :
{
"id" : 1000,
"difficulty" : {
"id": 5,
"name": "easy"
},
"languages" : [
{
"id" : 200,
"name" : "spanish"
},
{
"id" : 205,
"name" : "dutch"
}
],
"name" : "test"
}
Now let's try to query the GET exercises endpoint with the filter languageId.greaterOrEqualThan=200 (for the sake of the example) :
http://localhost:8080/myservice/api/exercises?languageId.greaterOrEqualThan=200
Then the response will be :
[
{
"id" : 1000,
"difficulty" : {
"id": 5,
"name": "easy"
},
"languages" : null,
"name" : "test"
},
{
"id" : 1000,
"difficulty" : {
"id": 5,
"name": "easy"
},
"languages" : null,
"name" : "test"
}
]
Notice that the exercise comes out twice (or n times if it has n languages meeting the constraint, I checked), which is problematic.
I feel like something in the JHipster generator is broken, but it seems unlikely because I did not find anybody talking about this quite crippling issue.
Did I do something wrong when generating my JHipster project ? Or is it a true issue ?
Please feel free to ask for any other piece of code, I'm not sure what could be relevant. Thanks.
Note : I noticed the exercise endpoint filters for the languages field use the singular (e.g. language.equals), I don't know if this is normal for a many-to-many relationship.
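One plausible explanation for the repeated rows, not confirmed anywhere in this thread, is that filtering on the many-to-many side is implemented as a join, which produces one row per matching link. The plain-Python sketch below only illustrates that effect and a de-duplication step; the join-table layout and the function names are hypothetical and are not JHipster code.

```python
# Hypothetical in-memory stand-ins for the exercise table and the join table.
exercises = [{"id": 1000, "name": "test"}]
exercise_language = [(1000, 200), (1000, 205)]  # (exercise_id, language_id)


def filter_by_language_id(min_language_id):
    """Join exercises to their language links: one output row per matching link."""
    return [
        exercise
        for exercise in exercises
        for exercise_id, language_id in exercise_language
        if exercise_id == exercise["id"] and language_id >= min_language_id
    ]


def distinct_by_id(rows):
    """Keep the first occurrence of each id, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        if row["id"] not in seen:
            seen.add(row["id"])
            unique.append(row)
    return unique


joined = filter_by_language_id(200)
print(len(joined), len(distinct_by_id(joined)))
```

If the generated query behaves like this join, applying a distinct step to the filtered result would be the natural place to look for a fix.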

Find field value that is found in most documents

Suppose there are documents that represent books, and each has a field called author. What aggregation(s) can retrieve the author value that is found in the most documents? Or, rephrased: which author has written the most books?
In case it's not clear from the tag, the question is referring to Elasticsearch.
e.g.
{
"name" : "Book1",
"author" : "John"
},
{
"name" : "Book3",
"author" : "Mike"
},
{
"name" : "Book2",
"author" : "John"
},
{
"name" : "Book4",
"author" : "Frank"
}
For the above data, John must be returned since there are 2 documents with him as an author, while only one book by the others.
I've tried value_count and cardinality, but these only return a count and not the value itself.
Actually, I found that this is quite simple using a terms aggregation. I'm leaving the question up in case others find it useful.
Reference
For example, with the data above:
{
"aggs": {
"author_count": {
"terms": {
"size": 2,
"field": "book.author"
}
}
}
}
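As a rough illustration of what a terms aggregation computes, the following plain-Python sketch counts documents per author and keeps the top buckets. It does not call Elasticsearch; the function name top_terms and the bucket shape (key, doc_count) are chosen here to resemble the aggregation response and are assumptions.

```python
from collections import Counter

# The sample documents from the question.
books = [
    {"name": "Book1", "author": "John"},
    {"name": "Book3", "author": "Mike"},
    {"name": "Book2", "author": "John"},
    {"name": "Book4", "author": "Frank"},
]


def top_terms(docs, field, size):
    """Count documents per distinct field value and return the `size` largest buckets."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return [{"key": key, "doc_count": count} for key, count in counts.most_common(size)]


buckets = top_terms(books, "author", 1)
print(buckets)
```

With a size of 1 only the most frequent author is returned, which is all the question asks for.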

how to query nested array of objects in mongodb?

I am trying to query a nested array of objects in MongoDB from Node.js. I have tried every solution I could find, but with no luck. Can anyone please help?
I have the following document:
{
"name": "Science",
"chapters": [
{
"name": "ScienceChap1",
"tests": [
{
"name": "ScienceChap1Test1",
"id": 1,
"marks": 10,
"duration": 30,
"questions": [
{
"question": "What is the capital city of New Mexico?",
"type": "mcq",
"choice": [
"Guadalajara",
"Albuquerque",
"Santa Fe",
"Taos"
],
"answer": [
"Santa Fe",
"Taos"
]
},
{
"question": "Who is the author of beowulf?",
"type": "notmcq",
"choice": [
"Mark Twain",
"Shakespeare",
"Abraham Lincoln",
"Newton"
],
"answer": [
"Shakespeare"
]
}
]
},
{
"name": "ScienceChap1test2",
"id": 2,
"marks": 20,
"duration": 30,
"questions": [
{
"question": "What is the capital city of New Mexico?",
"type": "mcq",
"choice": [
"Guadalajara",
"Albuquerque",
"Santa Fe",
"Taos"
],
"answer": [
"Santa Fe",
"Taos"
]
},
{
"question": "Who is the author of beowulf?",
"type": "notmcq",
"choice": [
"Mark Twain",
"Shakespeare",
"Abraham Lincoln",
"Newton"
],
"answer": [
"Shakespeare"
]
}
]
}
]
}
]
}
Here is what I've tried so far but still can't get it to work
db.quiz.find({name:"Science"},{"tests":0,chapters:{$elemMatch:{name:"ScienceChap1"}}})
db.quiz.find({ chapters: { $elemMatch: {$elemMatch: { name:"ScienceChap1Test1" } } }})
db.quiz.find({name:"Science"},{chapters:{$elemMatch:{$elemMatch:{name:"ScienceChap1Test1"}}}})
Aggregation Framework
You can use the aggregation framework to transform and combine documents in a collection to display to the client. You build a pipeline that processes a stream of documents through several building blocks: filtering, projecting, grouping, sorting, etc.
If you want to get the mcq-type questions from the test named "ScienceChap1test2", you would do the following:
db.quiz.aggregate([
// Match the course document by name
{"$match":{"name":"Science"}},
// De-normalize the nested chapters array, then the tests array inside each chapter
{"$unwind":"$chapters"},
{"$unwind":"$chapters.tests"},
// Keep only the test named ScienceChap1test2
{"$match":{"chapters.tests.name":"ScienceChap1test2"}},
// Unwind the nested questions array
{"$unwind":"$chapters.tests.questions"},
// Keep only questions of type mcq
{"$match":{"chapters.tests.questions.type":"mcq"}}
]).pretty()
The result will be:
{
"_id" : ObjectId("5629eb252e95c020d4a0c5a5"),
"name" : "Science",
"chapters" : {
"name" : "ScienceChap1",
"tests" : {
"name" : "ScienceChap1test2",
"id" : 2,
"marks" : 20,
"duration" : 30,
"questions" : {
"question" : "What is the capital city of New Mexico?",
"type" : "mcq",
"choice" : [
"Guadalajara",
"Albuquerque",
"Santa Fe",
"Taos"
],
"answer" : [
"Santa Fe",
"Taos"
]
}
}
}
}
$elemMatch does not filter arrays nested inside other arrays: used as a projection, it returns only the first matching element of a top-level array. You can use the aggregation framework for "array filtering" by using $unwind.
You can remove stages from the bottom of the pipeline one at a time to observe how each stage changes the output.
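To see what the $unwind and $match stages do to the document shape, here is a small plain-Python simulation of the pipeline above. It is only a sketch: the helpers get_path, unwind and match are hypothetical stand-ins written for this example, not MongoDB APIs, and the sample data is trimmed to the fields the pipeline touches.

```python
from copy import deepcopy


def get_path(doc, path):
    """Follow a dotted path such as 'chapters.tests.name' through nested dicts."""
    for key in path.split("."):
        doc = doc[key]
    return doc


def unwind(docs, path):
    """Emit one copy of each document per element of the array found at path."""
    *parents, last = path.split(".")
    out = []
    for doc in docs:
        for index in range(len(get_path(doc, path))):
            copy = deepcopy(doc)
            holder = copy
            for key in parents:
                holder = holder[key]
            # Replace the array with its single element at this index.
            holder[last] = holder[last][index]
            out.append(copy)
    return out


def match(docs, path, value):
    """Keep only documents whose value at path equals the given value."""
    return [doc for doc in docs if get_path(doc, path) == value]


questions = [
    {"question": "What is the capital city of New Mexico?", "type": "mcq"},
    {"question": "Who is the author of beowulf?", "type": "notmcq"},
]
quiz = [{
    "name": "Science",
    "chapters": [{
        "name": "ScienceChap1",
        "tests": [
            {"name": "ScienceChap1Test1", "questions": deepcopy(questions)},
            {"name": "ScienceChap1test2", "questions": deepcopy(questions)},
        ],
    }],
}]

# Same stage order as the aggregation pipeline.
docs = match(quiz, "name", "Science")
docs = unwind(docs, "chapters")
docs = unwind(docs, "chapters.tests")
docs = match(docs, "chapters.tests.name", "ScienceChap1test2")
docs = unwind(docs, "chapters.tests.questions")
docs = match(docs, "chapters.tests.questions.type", "mcq")
print(docs[0]["chapters"]["tests"]["questions"]["question"])
```

After each unwind, the array at that path has been replaced by a single element, which is why the later match stages can address it with plain dotted paths.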
You should try the following queries in the mongo JavaScript shell.
There are two scenarios.
Scenario One
If you simply want to return the documents that contain certain chapter names or test names, a single argument to find will do.
For the find method the document you want to be returned is specified by the first argument. You could return documents with the name Science by doing this:
db.quiz.find({name:"Science"})
You can specify criteria that must match a single embedded document in an array by using $elemMatch. To find a document that has a chapter with the name ScienceChap1, you could do this:
db.quiz.find({"chapters":{"$elemMatch":{"name":"ScienceChap1"}}})
If you wanted your criteria to be a test name, you could use dot notation like this:
db.quiz.find({"chapters.tests":{"$elemMatch":{"name":"ScienceChap1Test1"}}})
Scenario Two - Specifying Which Keys to Return
If you want to specify which keys to return, you can pass a second argument to find (or findOne) listing the keys you want. In your case you can search by document name and then state which keys to return, like so:
db.quiz.find({name:"Science"},{"chapters":1})
//Would return
{
"_id": ObjectId(...),
"chapters": [
{
"name": "ScienceChap1",
"tests": [..all object content here..]
}
]
}
If you only want to return the marks from each test object, you can use dot notation to do so:
db.quiz.find({name:"Science"},{"chapters.tests.marks":1})
//Would return
{
"_id": ObjectId(...),
"chapters": [
{
"tests": [
{"marks":10},
{"marks":20}
]
}
]
}
If you only want to return the questions from each test:
db.quiz.find({name:"Science"},{"chapters.tests.questions":1})
Test these out. I hope these help.

Elasticsearch two level sort in aggregation list

Currently I am sorting aggregations by document score, so most relevant items come first in aggregation list like below:
{
'aggs' : {
'guilds' : {
'terms' : {
'field' : 'guilds.title.original',
'order' : [{'max_score' : 'desc'}],
'aggs' : {
'max_score' : {
'script' : 'doc.score'
}
}
}
}
}
}
I want to add another sort criterion to the order array of the terms aggregation, but when I do it like this:
{
'order' : [{'max_score' : 'desc'}, {'_count' : 'desc'}]
}
the second sort does not work. For example, when all of the scores are equal, the buckets should then be sorted by document count, but they are not.
As a correction to Andrei's answer: to order aggregations by multiple criteria, you MUST pass an array as shown in "Terms Aggregation: Order", and you MUST be using Elasticsearch 1.5 or later.
So, for Andrei's answer, the correction is:
"order" : [ { "max_score": "desc" }, { "_count": "desc" } ]
As Andrei has it, ES will not complain but it will ONLY use the last item listed in the "order" element.
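The difference between the two forms can be illustrated with a plain-Python sketch that orders buckets by a list of criteria, most significant first. This is only a model of the intended ordering, not Elasticsearch code; the bucket values and the function name order_buckets are made up for the example.

```python
# Hypothetical buckets: two of them tie on max_score.
buckets = [
    {"key": "a", "doc_count": 3, "max_score": 1.0},
    {"key": "b", "doc_count": 7, "max_score": 1.0},
    {"key": "c", "doc_count": 5, "max_score": 2.0},
]


def order_buckets(buckets, criteria):
    """Order buckets by several (field, direction) pairs, most significant first."""
    ordered = list(buckets)
    # Python's sort is stable, so sorting by the least significant criterion
    # first and the most significant last yields a multi-level ordering.
    for field, direction in reversed(criteria):
        ordered.sort(key=lambda bucket: bucket[field], reverse=(direction == "desc"))
    return ordered


both = order_buckets(buckets, [("max_score", "desc"), ("doc_count", "desc")])
last_only = order_buckets(buckets, [("doc_count", "desc")])
print([b["key"] for b in both], [b["key"] for b in last_only])
```

The second call models the behaviour described above, where only the last item of the "order" element is applied and the score is ignored.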
I don't know how your 'aggs' is even working because I tried it and I had parsing errors in three places: "order" is not allowed to have that array structure, your second "aggs" should be placed outside the first "terms" aggs and, finally, the "max_score" aggs should have had a "max" type of "aggs". In my case, to make it work (and it does actually order properly), it should look like this:
"aggs": {
"guilds": {
"terms": {
"field": "guilds.title.original",
"order": {
"max_score": "desc",
"_count": "desc"
}
},
"aggs": {
"max_score": {
"max": {
"script": "doc.score"
}
}
}
}
}

Using near with elemMatch in Mongoose

I am searching within a collection of Stores. Stores have an embedded collection of outlets with locations. My goal is to return the set of stores that have outlets near a geolocation, and also only return those Outlets within that location.
I can successfully restrict the query to return only Stores that have an Outlet near a particular location using 'near':
Store
.where('isActive').equals(true)
.where('outlets.location')
.near({ center: [153.027117, -27.468515], maxDistance: 1000 / 6378137, spherical: true })
.where('outlets.isActive').equals(true)
.where('products.productType').equals('53433f1f3e02e39addde1954')
.where('products.isActive').equals(true)
.select('name outlets')
.select({'products': {$elemMatch: {'isActive': true, 'productType': '53433f1f3e02e39addde1954'}}})
.select('name outlets')
.execQ()
.then(function (results) {
console.log(results);
})
.fail(function (err) {
console.log(err);
})
.done();
The problem I have is that the store document returns all the outlets, not just the outlet that matched the geolocation. I've tried using elemMatch within a select, like I did with the products:
.select({'outlets': {$elemMatch: {'location': {near:{ center: [153.027117, -27.468515], maxDistance: 10000 / 6378137, spherical: true }}}}})
However, it returns an empty array. Can I use the near operator in an elemMatch clause? Am I doing it incorrectly? Is there a more efficient or otherwise better way to achieve the goal?
I see what you are trying to do here, but there seem to be a few flaws in this sort of design. Though this is not exactly your document structure, I see you are trying to do something like this:
{
"_id" : ObjectId("5344badd519563414f23fdf8"),
"store" : "Mine",
"outlets" : [
{
"name" : "somewhere",
"loc" : {
"type" : "Point",
"coordinates" : [
150.975131,
-33.8440366
]
}
},
{
"name" : "else",
"loc" : {
"type" : "Point",
"coordinates" : [
151.3651524,
-33.8389783
]
}
}
]
}
{
"_id" : ObjectId("5344be6f519563414f23fdf9"),
"store" : "Another",
"outlets" : [
{
"name" : "else",
"loc" : {
"type" : "Point",
"coordinates" : [
151.3651524,
-33.8389783
]
}
},
{
"name" : "somewhere",
"loc" : {
"type" : "Point",
"coordinates" : [
150.975131,
-33.8440366
]
}
}
]
}
So basically you appear to be attempting to nest the outlet locations within an array in a top level document.
What I am referring to as a flaw here is that, by design, any type of "near" based query is going to return more than 1 result. That does seem logical when you look at the purpose. You can of course restrict the results with "maxDistance", but generally there will be more than 1.
So the only way is to .limit() the results returned by the cursor to a single "nearest" response. Also note that with some operations those results are not necessarily sorted with the "nearest" response first.
Now, as these results are actually contained within an array of the document, remember that .find() itself does not "filter" the contents of an array, so of course the document will contain all of the array contents.
If you tried to "project" with the positional $ operator, then the problem falls back to the original point: there is no single actual match, so it is not possible to return an "index" value for the matching element. If you did in fact try this, you would always get the default index value of 0, so it would just return the first element.
If you then thought you could run off to aggregate and try to "de-normalize" the array entries, you would be out of luck, because a geospatial query needs to use the index and so must be the first stage of any aggregation pipeline.
So the summary of this is that embedded entries like this are not suited to this design where you need to do geo-spatial matching on those store locations. The locations are better off in a separate collection:
{
"_id" : ObjectId("5344bec7519563414f23fdfa"),
"store": "Mine",
"name" : "else",
"loc" : {
"type" : "Point",
"coordinates" : [
151.3651524,
-33.8389783
]
}
}
{
"_id" : ObjectId("5344bed5519563414f23fdfb"),
"store": "Mine",
"name" : "somewhere",
"loc" : {
"type" : "Point",
"coordinates" : [
150.975131,
-33.8440366
]
}
}
That would allow you to "limit" the result to the "nearest" match by setting the limit to 1. You can also include any values needed for filtering, such as the "store". If needed, you can include other information as well, or just put the ObjectId values of these documents in an array on the original object, or possibly even duplicate the data in both collections.
But since these queries are by nature intended to return more than one match, there is no way to get this to work on embedded documents. Your solution will require some changes to your overall schema.
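With outlets stored as separate documents, "nearest within a maximum distance, limited to one" becomes a straightforward operation. The plain-Python sketch below models it with a haversine distance. It is not a MongoDB or Mongoose call; the query point, the helper names and the use of 6378137 metres as the earth radius (the constant from the question) are assumptions for illustration.

```python
import math

EARTH_RADIUS_M = 6378137  # same constant the question divides by


def haversine_m(a, b):
    """Great-circle distance in metres between two [longitude, latitude] points."""
    lon1, lat1 = map(math.radians, a)
    lon2, lat2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


# One document per outlet, as in the separate-collection layout above.
outlets = [
    {"store": "Mine", "name": "else",
     "loc": {"type": "Point", "coordinates": [151.3651524, -33.8389783]}},
    {"store": "Mine", "name": "somewhere",
     "loc": {"type": "Point", "coordinates": [150.975131, -33.8440366]}},
]


def nearest(outlets, center, max_distance_m, limit=1):
    """Return up to `limit` outlets within max_distance_m, closest first."""
    scored = [(haversine_m(center, o["loc"]["coordinates"]), o) for o in outlets]
    within = [pair for pair in scored if pair[0] <= max_distance_m]
    within.sort(key=lambda pair: pair[0])
    return [outlet for _, outlet in within[:limit]]


center = [151.0, -33.85]  # hypothetical query point
print([o["name"] for o in nearest(outlets, center, 50000)])
```

Because each outlet is its own document, the limit applies to outlets directly, and the "store" field can still be used to filter or to look up the parent store afterwards.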

Resources