Cloudant 1 to many function - couchdb

I’ve just started to use Cloudant and I just can’t get my head around the map functions. I’ve been fiddling with the data below but it isn’t working out as I expected.
The relationship is: a user can have many vehicles, and a vehicle belongs to one user. The vehicle's 'userId' is the key of the user. There is a bit of redundancy, since in the user document _id and userId are the same; I guess the latter is not required.
Anyhow, how can I find for a/every user, the vehicles which belong to it? The closest I’ve come through trial and error is a result which displays the owner of every vehicle, but I would like it the other way round, the user and the vehicles belonging to it. All the examples I’ve found use another document which ‘joins’ two or more documents, but I don’t need to do that?
Any point in the right direction appreciated - I really have no idea.
function (doc) {
  if (doc.$doctype == "vehicle") {
    emit(doc.userId, {_id: doc.userId});
  }
}
EDIT: Getting closer. I'm not sure exactly what I was expecting, but the result seems a bit 'messy'. Row[0] is the user document, row[n > 0] are the vehicle documents. I guess it's fine when a startkey/endkey is used, but without one the results are a bit jumbled up.
function (doc) {
  if (doc.$doctype == 'user') {
    emit([doc._id, 0], doc);
  } else if (doc.$doctype == 'vehicle') {
    emit([doc.userId, 1, doc._id], doc);
  }
}
A user is described as,
{
  "_id": "user:10",
  "firstname": "firstnamehere",
  "secondname": "secondnamehere",
  "userId": "user:10",
  "$doctype": "user"
}
a vehicle is described as,
{
  "_id": "vehicle:4002",
  "name": "avehicle",
  "userId": "user:10",
  "$doctype": "vehicle"
}

You're heading in the right direction! You already got the global IDs right. Having the type of the document as part of the ID in some form is a very good idea, so that you don't get confused later (all documents are in the same "pot").
Here are some minor problems with your current solution (before getting to your actual question):
Don't emit the doc as the value in emit(key, value). You can always ask for the document that belongs to a view row by querying with include_docs=true. Storing the doc as the view value increases the size of the view index a lot. When you don't need a specific value, use emit(key, null).
You also don't need the ID in the emit value. You'll get the ID of the document that belongs to a view row as part of the row anyway.
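As a rough sketch, here is the map function from the question with both points applied, so only the emitted values change. The emit stub, the currentId bookkeeping and the sample documents are assumptions added so the snippet runs outside CouchDB; inside a design document only the map function itself is needed.

```javascript
// Stand-in for CouchDB's emit(), only so this runs outside the database.
const rows = [];
let currentId = null;
function emit(key, value) {
  rows.push({ id: currentId, key, value });
}

// Map function from the question, emitting null instead of the whole doc.
function map(doc) {
  if (doc.$doctype === 'user') {
    emit([doc._id, 0], null);
  } else if (doc.$doctype === 'vehicle') {
    emit([doc.userId, 1, doc._id], null);
  }
}

// Made-up sample documents.
const docs = [
  { _id: 'user:10', $doctype: 'user' },
  { _id: 'vehicle:4002', userId: 'user:10', $doctype: 'vehicle' },
];
for (const doc of docs) {
  currentId = doc._id; // CouchDB records the source document ID per row itself
  map(doc);
}
console.log(JSON.stringify(rows));
```

Each row still carries the ID of the document that produced it, which is what include_docs=true uses to attach the document at query time.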
View Collation
Now to your problem of aggregating the vehicles with their user. You got the basic pattern right. This pattern is called view collation; you can read more about it in the CouchDB docs (ignore that it is in the "Couchapp" section).
The trick with view collation is that you return two or more types of documents, but make sure that they are sorted in a way that allows for direct grouping. Thus it is important to understand how CouchDB sorts the view result. See the collation specification for more information on that one. An important key to understanding view collation is that rows with array keys are sorted by key elements. So when two rows have the same key[0], they sort by key[1]. If that's equal as well, key[2] is considered, and so on.
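The element-by-element comparison can be sketched in a few lines of JavaScript. This is a simplified illustration, not CouchDB's implementation: strings are compared by plain code-unit order here (CouchDB uses ICU collation), objects are not compared with each other, and the function names and sample keys are made up.

```javascript
// Rank of each JSON type in CouchDB's documented collation order:
// null < false < true < numbers < strings < arrays < objects.
function typeRank(v) {
  if (v === null) return 0;
  if (v === false) return 1;
  if (v === true) return 2;
  if (typeof v === 'number') return 3;
  if (typeof v === 'string') return 4;
  if (Array.isArray(v)) return 5;
  return 6; // plain object
}

// Simplified comparison of two view keys.
function compareKeys(a, b) {
  const ra = typeRank(a);
  const rb = typeRank(b);
  if (ra !== rb) return ra - rb;
  if (ra === 3) return a - b;
  if (ra === 4) return a < b ? -1 : a > b ? 1 : 0;
  if (ra === 5) {
    // Arrays: compare key[0], then key[1], and so on.
    const n = Math.min(a.length, b.length);
    for (let i = 0; i < n; i++) {
      const c = compareKeys(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length; // a shorter prefix sorts first
  }
  return 0; // same simple type, or objects (not compared in this sketch)
}

const keys = [
  ['user:2', 0],
  ['user:1', 1, 'vehicle:7'],
  ['user:1', 0],
  ['user:1', 1, 'vehicle:3'],
];
keys.sort(compareKeys);
console.log(JSON.stringify(keys));
```

Because objects rank after every other type, a key such as ['user:1', 1, {}] compares greater than any key whose third element is a string.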
Your map function first groups users and vehicles by user ID (key[0]). It then uses the fact that 0 sorts before 1 in the second element of the key, so your view will contain the following:
user 1
vehicle of user 1
vehicle of user 1
vehicle of user 1
user 2
user 3
vehicle of user 3
user 4
etc.
As you can see, the vehicles of a user immediately follow their user. Thus you can group this result into aggregates without performing expensive sort or lookup operations.
Note that users are sorted according to their ID, and vehicles within users also according to their ID. This is because you use the IDs in the key array.
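Grouping the sorted rows on the client then takes a single pass. The following is a minimal sketch, assuming the view was queried with include_docs=true so that each row has a doc field; the function name groupUsers and the sample rows are made up for illustration.

```javascript
// Rows as they might come back from the view with include_docs=true,
// already sorted by key. The sample data is made up.
const rows = [
  { id: 'user:1', key: ['user:1', 0], doc: { _id: 'user:1' } },
  { id: 'vehicle:7', key: ['user:1', 1, 'vehicle:7'], doc: { _id: 'vehicle:7', userId: 'user:1' } },
  { id: 'user:2', key: ['user:2', 0], doc: { _id: 'user:2' } },
];

// Single pass: a row with key[1] === 0 starts a new user, any other row
// belongs to the most recent user.
function groupUsers(viewRows) {
  const result = [];
  let current = null;
  for (const row of viewRows) {
    if (row.key[1] === 0) {
      current = { user: row.doc, vehicles: [] };
      result.push(current);
    } else if (current !== null && current.user._id === row.key[0]) {
      current.vehicles.push(row.doc);
    }
    // Vehicles whose user row is missing are skipped in this sketch.
  }
  return result;
}

console.log(JSON.stringify(groupUsers(rows), null, 2));
```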
Creating Queries
Now that view isn't worth much if you can't query according to your needs. A view as you have it supports the following queries:
Get all users with their vehicles
Get a range of users with their vehicles
Get a single user with its vehicles
Get a single user without vehicles (you could also use the _all_docs view for that though)
Example query for "all users between user 1 and user 3 (inclusive) with their vehicles"
We want to query for a range, so we use startkey and endkey in the query:
startkey=["user:1", 0]
endkey=["user:3", 1, {}]
Note the use of {} as a sentinel value, which is required so that the end key sorts after any row that has a key of ["user:3", 1, (anyConceivableVehicleId)].
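When sending the request, the keys have to be JSON-encoded and then URL-encoded. A small sketch of building such a query string follows; the database, design document and view names in viewPath are hypothetical, as is the helper name rangeQuery.

```javascript
// Hypothetical database, design document and view names.
const viewPath = '/mydb/_design/app/_view/users_with_vehicles';

// Build the query string for "users from firstUserId to lastUserId
// (inclusive) with their vehicles".
function rangeQuery(firstUserId, lastUserId) {
  const params = {
    startkey: [firstUserId, 0],
    endkey: [lastUserId, 1, {}], // {} sorts after any vehicle ID
    include_docs: true,
  };
  const query = Object.entries(params)
    .map(([name, value]) => name + '=' + encodeURIComponent(JSON.stringify(value)))
    .join('&');
  return viewPath + '?' + query;
}

console.log(rangeQuery('user:1', 'user:3'));
```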


Getting index of the resultset

Is there a way to get the index of the results within an aql query?
Something like
FOR user IN Users sort user.age DESC RETURN {id:user._id, order:{index?}}
If you want to enumerate the result set and store these numbers in an attribute order, then this is possible with the following AQL query:
LET sorted_ids = (
  FOR user IN Users
    SORT user.age DESC
    RETURN user._key
)
FOR i IN 0..LENGTH(sorted_ids)-1
  UPDATE sorted_ids[i] WITH { order: i+1 } IN Users
  RETURN NEW
A subquery is used to sort users by age and return an array of document keys. Then a loop over a numeric range from the first to the last index of that array is used to iterate over its elements, which gives you the desired order value (minus 1) as variable i. The current array element is a document key, which is used to update the user document with an order attribute.
The above query can be useful for a one-off computation of an order attribute. If your data changes a lot, however, the attribute will quickly become stale, and you may want to move the numbering to the client side.
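Numbering on the client side can be as simple as the following sketch. It assumes the rows have already been fetched in sorted order (for example with FOR user IN Users SORT user.age DESC RETURN user); the sample data is made up.

```javascript
// Made-up result rows, already sorted by age descending on the server.
const users = [
  { _id: 'Users/3', age: 41 },
  { _id: 'Users/1', age: 35 },
  { _id: 'Users/2', age: 29 },
];

// Attach a 1-based position to each row, in the order the server returned them.
const ranked = users.map((user, index) => ({ id: user._id, order: index + 1 }));
console.log(ranked);
```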
For a related discussion see AQL: Counter / enumerator
If I understand your question correctly - and feel free to correct me, this is what you're looking for:
FOR user IN Users
SORT user.age DESC
RETURN {
id: user._id,
order: user._key
}
The _key is the primary key in ArangoDB.
If, however, you're looking for the order in which the data was entered (chronological order), then you will have to set the key on your inserts and/or create a date/time attribute and sort or filter using that.
Edit:
Upon doing some research, I believe this link might be of use to you for auto-incrementing the keys: https://www.arangodb.com/2013/03/auto-increment-values-in-arangodb/

ArangoDb AQL Graph queries traversal example

I am having some trouble wrapping my head around how to traverse a certain graph to extract some data.
Given a collection of "users" and a collection of "places".
And a "likes" edge collection to denote that a user likes a certain place. The "likes" edge collection also has a "review" property to store a user's review about the place.
And a "follows" edge collection to denote that a user follows another user.
How can I traverse the graph to fetch all the places that I like, with my review of each place and the reviews of the users I follow who also like the same place?
For example, in the above graph, I am user 6327 and I reviewed both places (7968 and 16213).
I also follow user 6344, who also happens to have reviewed place 7968.
How can I get all the places that I like and the reviews of the people I follow who also reviewed the same places?
An expected output would be something like the following:
[
  {
    name: "my name",
    place: "place 1",
    id: 1,
    review: "my review about place 1"
  },
  {
    name: "my name",
    place: "place 2",
    id: 2,
    review: "my review about place 2"
  },
  {
    name: "name of the user I follow",
    place: "place 2",
    id: 2,
    review: "review about place 2 from the user I follow"
  }
]
There are a number of ways to do this query, and it also depends on where you want to add parameters, but for the sake of simplicity I've built this quite verbose query below to help you understand one way of approaching the problem.
One way is to determine the _id of your user record, then find all the _id's of the friends you follow, and then to work out all related reviews in one query.
I take a different approach below, and that is to:
Determine the reviews you have written
Determine who you follow
Determine the reviews the people you follow have written
Merge together your reviews with those of the people you follow
It is possible to merge these queries together more optimally, but I thought it worth breaking them out like this (and showing the output of each stage as well as the final answer) to help you see what data is available.
A key thing to understand about AQL graph queries is how you have access to vertices, edges, and paths when you perform a query.
A path is an object in its own right, and it's worth investigating the contents of that object to better understand how to exploit it for path information.
This query assumes:
users document collection contains users
places document collection contains places
follows edge collection tracks users following other users
reviews edge collection tracks reviews people wrote
Note: When providing an id on each record I used the id of the review, because if you know that id you can fetch the edge document and get the id of both the user and the place as well as read all the data about the review.
LET my_reviews = (
  FOR vertices, edges, paths IN 1..1 OUTBOUND "users/6327" reviews
    RETURN {
      name: FIRST(paths.vertices).name,
      review_id: FIRST(paths.edges)._id,
      review: FIRST(paths.edges).review,
      place: LAST(paths.vertices).place
    }
)
LET who_i_follow = (
  FOR v IN 1..1 OUTBOUND "users/6327" follows
    RETURN v
)
LET reviews_of_who_i_follow = (
  FOR users IN who_i_follow
    FOR vertices, edges, paths IN 1..1 OUTBOUND users._id reviews
      RETURN {
        name: FIRST(paths.vertices).name,
        review_id: FIRST(paths.edges)._id,
        review: FIRST(paths.edges).review,
        place: LAST(paths.vertices).place
      }
)
RETURN {
  my_reviews: my_reviews,
  who_i_follow: who_i_follow,
  reviews_of_who_i_follow: reviews_of_who_i_follow,
  merged_reviews: UNION(my_reviews, reviews_of_who_i_follow)
}
The first vertex in paths.vertices is the starting vertex (users/6327)
The last vertex in paths.vertices is the end of the path: the reviewed place in the reviews traversals, or the user you follow in a follows traversal
The first edge in paths.edges is the review that the user made of the place
Here is another more compact version of the query that takes a param, the _id of the user that is 'you'.
LET target_users = APPEND(TO_ARRAY(@user), (
  FOR v IN 1..1 OUTBOUND @user follows RETURN v._id
))
LET selected_reviews = (
  FOR u IN target_users
    FOR vertices, edges, paths IN 1..1 OUTBOUND u reviews
      LET user = FIRST(paths.vertices)
      LET place = LAST(paths.vertices)
      LET review = FIRST(paths.edges)
      RETURN {
        name: user.name,
        review_id: review._id,
        review: review.review,
        place: place.place
      }
)
RETURN selected_reviews

loopback relational database hasManyThrough pivot table

I seem to be stuck on a classic ORM issue and don't know really how to handle it, so at this point any help is welcome.
Is there a way to get the pivot table on a hasManyThrough query? Better yet, apply some filter or sort to it. A typical example
Table products
id,title
Table categories
id,title
table products_categories
productsId, categoriesId, orderBy, main
So, in the above scenario, say you want to get all categories of product X that are main (main = true), or you want to sort the product categories by orderBy.
What happens now is a first SELECT on products to get the product data, a second SELECT on products_categories to get the categoriesId and a final SELECT on categories to get the actual categories. Ideally, filters and sort should be applied to the 2nd SELECT like
SELECT `id`,`productsId`,`categoriesId`,`orderBy`,`main` FROM `products_categories` WHERE `productsId` IN (180) AND `main` = 1 ORDER BY `orderBy` DESC
Another typical example would be wanting to order the product images based on the order the user wants them to
so you would have a products_images table
id,image,productsID,orderBy
and you would want to
SELECT * FROM products_images WHERE productsId IN (180) ORDER BY orderBy ASC
Is that even possible?
EDIT : Here is the relationship needed for an intermediate table to get what I need based on my schema.
Products.hasMany(Images, {
  as: "Images",
  "foreignKey": "productsId",
  "through": ProductsImagesItems,
  scope: function (inst, filter) {
    return {active: 1};
  }
});
Thing is the scope function is giving me access to the final result and not to the intermediate table.
I am not sure I fully understand your problem(s), but you do need to move away from the table concept and express your problem in terms of Models and Relations.
The way I see it, you have two models Product(properties: title) and Category (properties: main).
Then, you can have relations between the two, potentially
Product belongsTo Category
Category hasMany Product
This means a product will belong to a single category, while a category may contain many products. There are other relations available.
Then, using the generated REST API, you can filter GET requests to select items according to their properties (like main in your case), or use custom GET requests (automatically generated when you add relations) to get, for instance, all products belonging to a specific category.
Does this help?
Based on what you have here I'd probably recommend using the scope option when defining the relationship. The LoopBack docs show a very similar example of the "product - category" scenario:
Product.hasMany(Category, {
  as: 'categories',
  scope: function(instance, filter) {
    return { type: instance.type };
  }
});
In the example above, instance is a category that is being matched, and each product would have a new categories property that would contain the matching Category entities for that Product. Note that this does not follow your exact data scheme, so you may need to play around with it. Also, I think your API query would have to specify that you want the categories related data loaded (those are not included by default):
/api/Products/13?filter={"include":["categories"]}
I suggest you define a custom / remote method in Product.js that does the work for you.
Product.getCategories = function (_productId) {
  // If you are taking the product title as a param instead of _productId,
  // you will first need to find the product ID.
  // Then execute a find query on products_categories with:
  // 1. a where filter to get only main categories with productId = _productId
  // 2. an include filter to include the product and category objects
  // 3. an order filter to sort items based on the orderBy column
  // You will now get an array of products_categories.
  // Each item / object in the array will have nested objects of Product and Category.
};
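To make the three steps concrete, here is a plain JavaScript illustration that runs them over in-memory arrays. It deliberately does not use the LoopBack API: the arrays stand in for the products_categories and categories tables, and the function name getMainCategories and all sample rows are made up. In a real remote method the same where, include and order conditions would be passed to the model's find call.

```javascript
// Made-up rows standing in for the products_categories and categories tables.
const productsCategories = [
  { productsId: 180, categoriesId: 1, orderBy: 2, main: 1 },
  { productsId: 180, categoriesId: 2, orderBy: 1, main: 0 },
  { productsId: 180, categoriesId: 3, orderBy: 3, main: 1 },
  { productsId: 181, categoriesId: 1, orderBy: 1, main: 1 },
];
const categories = new Map([
  [1, { id: 1, title: 'Shoes' }],
  [2, { id: 2, title: 'Sale' }],
  [3, { id: 3, title: 'Outdoor' }],
]);

function getMainCategories(productId) {
  return productsCategories
    // 1. "where": only main categories of the given product
    .filter((row) => row.productsId === productId && row.main === 1)
    // 3. "order": sort on the pivot table's orderBy column, descending
    .sort((a, b) => b.orderBy - a.orderBy)
    // 2. "include": attach the related category object to each pivot row
    .map((row) => ({ ...row, category: categories.get(row.categoriesId) }));
}

console.log(getMainCategories(180));
```

The point of the sketch is that filtering and sorting happen on the pivot rows, before the related categories are attached.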

cassandra data model for web logging

I've been playing around with Cassandra and I am trying to evaluate what would be the best data model for storing things like views or hits for unique page IDs. Would it be best to have a single column family per pageid, or one super-column (logs) with pageid columns? Each page has a unique ID, and I would like to store the date and some other metrics for each view.
I am just not sure which solution scales better: lots of column families OR one giant super-column?
page-92838 { date:sept 2, browser:IE }
page-22939 { date:sept 2, browser:IE5 }
OR
logs {
page-92838 {
date:sept 2,
browser:IE
}
page-22939 {
date:sept 2,
browser:IE5
}
}
And secondly, how to handle lots of different date: entries for page-92838?
You don't need a column-family per pageid.
One solution is to have a row for each page, keyed on the pageid.
You could then have a column for each page-view or hit, keyed and sorted on time-UUID (assuming having the views in time-sorted order would be useful) or other unique, always-increasing counter. Note that all Cassandra columns are time-stamped anyway, so you would have a precise timestamp 'for free' regardless of what other time- or date- stamps you use. Using a precise time-UUID as the key also solves the problem of storing many hits on the same date.
The value of each column could then be a textual value or JSON document containing any other metadata you want to store (such as browser).
page-12345 -> {timeuuid1:metadata1}{timeuuid2:metadata2}{timeuuid3:metadata3}...
page-12346 -> ...
With Cassandra, it is best to start with the queries you need to run, and model your schema to support those queries.
Assuming you want to query hits on a page, and hits by browser, you can have a counter column for each page like,
stats { #cf
page-id { #key
hits : # counter column for hits
browser-ie : #counts of views with ie
browser-firefox : ....
}
}
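The effect of writing to such counter columns can be illustrated with an in-memory stand-in. This is only a sketch of the data shape, not Cassandra client code: the Map named stats plays the role of the column family, and recordHit is a made-up helper.

```javascript
// In-memory stand-in for the "stats" column family: page id -> counter columns.
const stats = new Map();

// On every page view, bump the total and the per-browser counter for that page.
function recordHit(pageId, browser) {
  if (!stats.has(pageId)) stats.set(pageId, new Map());
  const counters = stats.get(pageId);
  for (const column of ['hits', 'browser-' + browser]) {
    counters.set(column, (counters.get(column) || 0) + 1);
  }
}

recordHit('page-92838', 'ie');
recordHit('page-92838', 'firefox');
recordHit('page-92838', 'ie');
console.log(Object.fromEntries(stats.get('page-92838')));
```

Reading hits for a page, or hits by browser, is then a single row lookup instead of a scan over individual log entries.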
If you need to do time-based queries, look at how Twitter's Rainbird denormalizes as it writes to Cassandra.

Couchdb: filter and group in a single view

I have a Couchdb database with documents of the form: { Name, Timestamp, Value }
I have a view that shows a summary grouped by name with the sum of the values. This is a straightforward reduce function.
Now I want to filter the view to only take into account documents where the timestamp occurred in a given range.
AFAIK this means I have to include the timestamp in the emitted key of the map function, e.g. emit([doc.Timestamp, doc.Name], doc)
But as soon as I do that, the reduce function no longer sees the rows grouped together to calculate the sum. If I put the name first I can group at level 1 only, but how do I filter at level 2?
Is there a way to do this?
I don't think this is possible with only one HTTP fetch and/or without additional logic in your own code.
If you emit([time, name]) you would be able to query startkey=[timeA]&endkey=[timeB]&group_level=2 to get items between timeA and timeB grouped where their timestamp and name were identical. You could then post-process this to add up whenever the names matched, but the initial result set might be larger than you want to handle.
An alternative would be to emit([name,time]). Then you could first query with group_level=1 to get a list of names [if your application doesn't already know what they'll be]. Then for each one of those you would query startkey=[nameN]&endkey=[nameN,{}]&group_level=2 to get the summary for each name.
(Note that in my query examples I've left the JSON start/end keys unencoded, so as to make them more human readable, but you'll need to apply your language's equivalent of JavaScript's encodeURIComponent on them in actual use.)
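The post-processing for the first option, adding up values wherever the names match, could look like the following sketch. It assumes rows of the shape a reduce view returns with group_level=2, i.e. { key: [time, name], value }; the sample rows and the function name sumByName are made up.

```javascript
// Made-up rows in the shape a reduce view returns: { key: [time, name], value }.
const rows = [
  { key: [1001, 'alice'], value: 5 },
  { key: [1001, 'bob'], value: 2 },
  { key: [1002, 'alice'], value: 7 },
];

// Add up the per-(time, name) sums into one total per name.
function sumByName(viewRows) {
  const totals = {};
  for (const row of viewRows) {
    const name = row.key[1];
    totals[name] = (totals[name] || 0) + row.value;
  }
  return totals;
}

console.log(sumByName(rows));
```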
You cannot build a view on top of another view. You need to write another map-reduce view that does the filtering and performs the grouping at the end. Something like:
map:
function(doc) {
  // start and end are placeholders for the bounds of the time range
  if (doc.timestamp > start && doc.timestamp < end) {
    emit(doc.name, doc.value);
  }
}
reduce:
function(key, values, rereduce) {
  return sum(values);
}
I suppose you can not store this view, and have to put it as an ad-hoc query in your application.
