using collect in arangodb insert to create new documents - arangodb

I have a collection called prodSampleNew with documents that have hierarchy levels as fields in arangodb:
{
prodId: 1,
LevelOne: "clothes",
LevelTwo: "pants",
LevelThree: "jeans",
... etc....
}
I want take the hierarchy levels and convert them into their own documents, so I can eventually build a proper graph with the hierarchy.
I was able to get this to extract the first level fo the hierarchy and put it in a new collection using the following:
for i IN [1]
let HierarchyList = (
For prod in prodSampleNew
COLLECT LevelOneUnique = prod.LevelOne
RETURN LevelOneUnique
)
FOR hierarchyLevel in HierarchyList
INSERT {"name": hierarchyLevel}
IN tmp
However, having to put a for I IN [1] at the top seems wrong and that there should be a better way.(yes I am fairly new to AQL)
Any pointers on a better way to do this would be appreciated

Not sure what you are trying to achieve exactly.
The FOR i IN [1] seems unnecessary however, so you could start your AQL query directly with the subquery to compute the distinct values from hierarchy level 1:
LET HierarchyList = (
FOR prod IN prodSampleNew
COLLECT LevelOneUnique = prod.LevelOne
RETURN LevelOneUnique
)
FOR hierarchyLevel IN HierarchyList
INSERT {"name": hierarchyLevel} IN tmp
The result should be the same.
If the question is more like "how can I get all distinct names of levels from all hierarchies", then you could use something like
LET HierarchyList = UNIQUE(FLATTEN(
FOR prod IN prodSampleNew
RETURN [ prod.LevelOne, prod.LevelTwo, prod.LevelThree ]
))
...
to produce an array with the unique names of the hierarchy levels for level 1-3.
Shouldn't this answer your question, please describe the desired result the query should produce.

Related

ArangoDB populate relation as field over graph query

I recently started using Arango since I want to make use of the advantages of graph databases. However, I'm not yet sure what's the most elegant and efficient approach to query an item from a document collection and applying fields to it that are part of a relation.
I'm used to make use of population or joins in SQL and NoSQL databases, but I'm not sure how it works here.
I created a document collection called posts. For example, this is a post:
{
"title": "Foo",
"content": "Bar"
}
And I also have a document collection called tags. A post can have any amount of tags, and my goal is to fetch either all or specific posts, but with their tags included, so for example this as my returning query result:
{
"title": "Foo",
"content": "Bar",
"tags": ["tag1", "tag2"]
}
I tried creating those two document collections and an edge collection post-tags-relation where I added an item for each tag from the post to the tag. I also created a graph, although I'm not yet sure what the vertex field is used for.
My query looked like this
FOR v, e, p IN 1..2 OUTBOUND 'posts/testPost' GRAPH post-tags-relation RETURN v
And it did give me the tag, but my goal is to fetch a post and include the tags in the same document...The path vertices do contain all tags and the post, but in separate arrays, which is not nice and easy to use (and probably not the right way). I'm probably missing something important here. Hopefully someone can help.
You're really close - it looks like your query to get the tags is correct. Now, just add a bit to return the source document:
FOR post IN posts
FILTER post._key == 'testPost'
LET tags = (
FOR v IN 1..2 OUTBOUND post
GRAPH post-tags-relation
RETURN v.value
)
RETURN MERGE(
post,
{ tags }
)
Or, if you want to skip the FOR/FILTER process:
LET post = DOCUMENT('posts/testPost')
LET tags = (
FOR v IN 1..2 OUTBOUND post
GRAPH post-tags-relation
RETURN v.value
)
RETURN MERGE(
post,
{ tags }
)
As for graph definition, there are three required fields:
edge definitions (an edge collection)
from collections (where your edges come from)
to collections (where your edges point to)
The non-obvious vertex collections field is there to allow you to include a set of vertex-only documents in your graph. When these documents are searched and how they're filtered remains a mystery to me. Personally, I've never used this feature (my data has always been connected) so I can't say when it would be valuable, but someone thought it was important to include.

Efficiently count Documents with different values for a given field

I am trying to count the number of documents that are in each possible state in a particular Arango collection.
This should be possible in 1 pass over all of the documents using a bucket-sort like strategy where you iterate over all documents, if the value for the state hasn't been seen before, you add a counter with a value of 1 to a list. If you have seen that state before, you increment the counter. Once you've reached the end, you'll have a counter for each possible state in the DB that indicates how many documents are currently stored with that state.
I can't seem to figure out how to write this type of logic in AQL to submit as a query. Current strategy is like this:
Loop over all documents, filtering only docs of a particular state.
Loop over all documents, filtering only docs of a different particular state.
...
All states have been filtered.
Return size of each set
This works, but I'm sure it's much slower than it should be. This also means that if we add a new state, we have to update the query to loop over all docs an additional time, filtering based on the new state. A bucket-sort like query would be quick, and would need no updating as new states are created as well.
If these were the documents:
{A}
{B}
{B}
{C}
{A}
Then I'd like the result to be
{ A:2, B:2, C:1 }
Where A,B,&C are values for a particular field. Current strategy filters like so
LET docsA = (
FOR doc in collection
FILTER doc.state == A
RETURN doc
)
Then manually construct the return object calling LENGTH on each list of docs
Any help or additional info would be greatly appreciated
What about using a COLLECT function? (see docs here)
FOR doc IN collection
COLLECT s = doc.state WITH COUNT INTO c
RETURN { state: s, count: c }
This would return something like:
[
{ state: 'A', count: 23 },
{ state: 'B', count: 2 },
{ state: 'C', count: 45 }
]
Would that accomplish what you are after?

Many to many AQL query

I have 2 collections and one edge collection. USERS, FILES and FILES_USERS.
Im trying to get all FILES documents, that has the field "what" set to "video", for a specific user, but also embed another document, also from the collection FILES, but where the "what" is set to "trailer" and belongs to the "video" into the results.
I have tried the below code but its not working correctly, im getting a lot of duplicate results...its a mess. Im definitely doing it wrong.
FOR f IN files
FILTER f.what=="video"
LET trailer = (
FOR f2 IN files
FILTER f2.parent_key==f._key
AND f2.what=="trailer"
RETURN f2
)
FOR x IN files_users
FILTER x._from=="users/18418062"
AND x.owner==true
RETURN DISTINCT {f,trailer}
There may be a better way to do this with graph query syntax, but try this. Adjust the UNIQUE functions based on your data-model.
LET user_files = UNIQUE(FOR u IN FILES_USERS
FILTER u._from == "users/18418062" AND u.owner
RETURN u._to)
FOR uf IN user_files
FOR f IN files
FILTER f._key == uf AND f.what == "video"
LET trailers = UNIQUE(FOR t IN files
FILTER t.parent_key == f._key AND t.what == "trailer"
RETURN t)
RETURN {"video": f, "trailers": trailers}
Well, check to see If you have duplicate data as suggested by TMan, however check your query syntax too. It appears that you have no link between your f subquery and the x in the main query. That would cause the query to potentially return a lot of dups if there are multiple records in collection files_users for user users/18418062
Try adding a join in the main query. Something like:
FOR x IN files_users
FILTER x._from=="users/18418062"
AND x.owner==true
AND x._to == f._id
RETURN DISTINCT {f,trailer}
On a related note, if you run into performance issues doing a subquery for trailers , you could instead try just doing a join and array expansion and see if that works for your case

ArangoDb AQL Graph queries traversal example

I am having some trouble wrapping my head around how to traverse a certain graph to extract some data.
Given a collection of "users" and a collection of "places".
And a "likes" edge collection to denote that a user likes a certain place. The "likes" edge collection also has a "review" property to store a user's review about the place.
And a "follows" edge collection to denote that a user follows another user.
How can I traverse the graph to fetch all the places that I like with my review of the place and the reviews of the users I follow that also like the same place.
for example, in the above graph. I am user 6327 and I reviewed both places(7968 and 16213)
I also follow user 6344 which also happens to have reviewed the place 7968.
How can I get all the places that I like and the reviews of the people that I follow who also reviewed the same place that I like.
an expected output would be something like the following:
[
{
name:"my name",
place: "place 1",
id: 1
review,"my review about place 1"
},
{
name:"my name",
place: "place 2",
id: 2
review,"my review about place 2"
},
{
name:"name of the user I follow",
place: "place 2",
id: 2
review,"review about place 2 from the user I follow"
}
]
There are a number of ways to do this query, and it also depends on where you want to add parameters, but for the sake of simplicity I've built this quite verbose query below to help you understand one way of approaching the problem.
One way is to determine the _id of your user record, then find all the _id's of the friends you follow, and then to work out all related reviews in one query.
I take a different approach below, and that is to:
Determine the reviews you have written
Determine who you follow
Determine the reviews the people you follow have written
Merge together your reviews with those of the people you follow
It is possible to merge these queries together more optimally, but I thought it worth breaking them out like this (and showing the output of each stage as well as the final answer) to help you see what data is available.
A key thing to understand about AQL graph queries is how you have access to vertices, edges, and paths when you perform a query.
A path is an object in it's own right and it's worth investigating the contents of that object to better understand how to exploit it for path information.
This query assumes:
users document collection contains users
places document collection contains places
follows edge collection tracks users following other users
reviews edge collection tracks reviews people wrote
Note: When providing an id on each record I used the id of the review, because if you know that id you can fetch the edge document and get the id of both the user and the place as well as read all the data about the review.
LET my_reviews = (
FOR vertices, edges, paths IN 1..1 OUTBOUND "users/6327" reviews
RETURN {
name: FIRST(paths.vertices).name,
review_id: FIRST(paths.edges)._id,
review: FIRST(paths.edges).review,
place: LAST(paths.vertices).place
}
)
LET who_i_follow = (
FOR v IN 1..1 OUTBOUND "users/6327" follows
RETURN v
)
LET reviews_of_who_i_follow = (
FOR users IN who_i_follow
FOR vertices, edges, paths in 1..1 OUTBOUND users._id reviews
RETURN {
name: FIRST(paths.vertices).name,
review_id: FIRST(paths.edges)._id,
review: FIRST(paths.edges).review,
place: LAST(paths.vertices).place
}
)
RETURN {
my_reviews: my_reviews,
who_i_follow: who_i_follow,
reviews_of_who_i_follow: reviews_of_who_i_follow,
merged_reviews: UNION(my_reviews, reviews_of_who_i_follow)
}
The first vertex in paths.vertices is the starting vertex (users/6327)
The last vertex in paths.vertices is the end of the path, e.g. who you follow
The first edge in paths.edges is the review that the user made of the place
Here is another more compact version of the query that takes a param, the _id of the user that is 'you'.
LET target_users = APPEND(TO_ARRAY(#user), (
FOR v IN 1..1 OUTBOUND #user follows RETURN v._id
))
LET selected_reviews = (
FOR u IN target_users
FOR vertices, edges, paths in 1..1 OUTBOUND u reviews
LET user = FIRST(paths.vertices)
LET place = LAST(paths.vertices)
LET review = FIRST(paths.edges)
RETURN {
name: user.name,
review_id: review._id,
review: review.review,
place: place.place
}
)
RETURN selected_reviews

Couchdb query for values calculated from key input

suppose i have the following data in my database:
[1,2],[2,1],[1,3],[3,1]...
were the numbers represent the a and b values of the formula a*x+b
what i now want is a query that returns the difference to a given point x,y.
for example: the point [2,6] is given. i want my query to return
[1,2] = -2 (1*2+2=4 4-6=-2)
[2,1] = -1 (2*2+1=5 5-6=-1)
[1,3] = -1 (1*2+3=5 4-6=-1)
[3,1] = 1 (3*2+1=7 7-6=-1)
I know how to do this in SQL but the data is already in a couchdb. I'm quite new to the NoSQL world and was wondering if something like this would be possible in couchdb.
what you can do is to use the standard MapReduce functionality of CouchDB.
Map is function you put in a view, which finds your data. You can have various criteria how to locate the docs you need. Next, if you specify so in the query with reduce=true, a reduce function is executed on each document that matched the map condition. You can use JavaScript to perform various operations on the document's values.
In your case, the map can look something like this:
function(doc) {
if(doc.a && doc.b) {
emit(doc._id,[doc.a, doc.b]);
}
}
then, the reduce gets called, like this:
function(keys, values, rereduce) {
var res;
//do something with values...
return res;
}
In your case keys will be list of document ID's and values will be the array of your a & b fields.
When you call the MapReduce (depending what method you use to access the DB), you should specify reduce=true.
Good resources on MapReduce (and on Views, Sorting and List funtions) are:
http://guide.couchdb.org/draft/views.html
http://www.slideshare.net/okurow/couchdb-mapreduce-13321353
Another way to go is to use a list function on the Map result, if you want to output the result in HTML. A good reason to use List function is that you can pass arguments to it with querystring, in your case it may be the point for which you want to calculate distances.
For detailed description on List functions, have a look here:
http://guide.couchdb.org/draft/transforming.html
Hope this helps.

Resources