Getting index of the resultset - arangodb

Is there a way to get the index of the results within an aql query?
Something like
FOR user IN Users sort user.age DESC RETURN {id:user._id, order:{index?}}

If you want to enumerate the result set and store these numbers in an attribute order, then this is possible with the following AQL query:
LET sorted_ids = (
FOR user IN Users
SORT user.age DESC
RETURN user._key
)
FOR i IN 0..LENGTH(sorted_ids)-1
UPDATE sorted_ids[i] WITH { order: i+1 } IN Users
RETURN NEW
A subquery is used to sort users by age and return an array of document keys. Then a loop over a numeric range from the first to the last index of the that array is used to iterate over its elements, which gives you the desired order value (minus 1) as variable i. The current array element is a document key, which is used to update the user document with an order attribute.
Above query can be useful for a one-off computation of an order attribute. If your data changes a lot, then it will quickly become stale however, and you may want to move this to the client-side.
For a related discussion see AQL: Counter / enumerator

If I understand your question correctly - and feel free to correct me, this is what you're looking for:
FOR user IN Users
SORT user.age DESC
RETURN {
id: user._id,
order: user._key
}
The _key is the primary key in ArangoDB.
If however, you're looking for example data entered (in chronological order) then you will have to have to set the key on your inserts and/or create a date / time object and filter using that.
Edit:
Upon doing some research, I believe this link might be of use to you for AI the keys: https://www.arangodb.com/2013/03/auto-increment-values-in-arangodb/

Related

Timeseries differencing - ArangoDB (AQL or Python)

I have a collection which holds documents, with each document having a data observation and the time that the data was captured.
e.g.
{
_key:....,
"data":26,
"timecaptured":1643488638.946702
}
where timecaptured for now is a utc timestamp.
What I want to do is get the duration between consecutive observations, with SQL I could do this with LAG for example, but with ArangoDB and AQL I am struggling to see how to do this at the database. So effectively the difference in timestamps between two documents in time order. I have a lot of data and I don't really want to pull it all into pandas.
Any help really appreciated.
Although the solution provided by CodeManX works, I prefer a different one:
FOR d IN docs
SORT d.timecaptured
WINDOW { preceding: 1 } AGGREGATE s = SUM(d.timecaptured), cnt = COUNT(1)
LET timediff = cnt == 1 ? null : d.timecaptured - (s - d.timecaptured)
RETURN timediff
We simply calculate the sum of the previous and the current document, and by subtracting the current document's timecaptured we can therefore calculate the timecaptured of the previous document. So now we can easily calculate the requested difference.
I only use the COUNT to return null for the first document (which has no predecessor). If you are fine with having a difference of zero for the first document, you can simply remove it.
However, neither approach is very straight forward or obvious. I put on my TODO list to add an APPEND aggregate function that could be used in WINDOW and COLLECT operations.
The WINDOW function doesn't give you direct access to the data in the sliding window but here is a rather clever workaround:
FOR doc IN collection
SORT doc.timecaptured
WINDOW { preceding: 1 }
AGGREGATE d = UNIQUE(KEEP(doc, "_key", "timecaptured"))
LET timediff = doc.timecaptured - d[0].timecaptured
RETURN MERGE(doc, {timediff})
The UNIQUE() function is available for window aggregations and can be used to get at the desired data (previous document). Aggregating full documents might be inefficient, so a projection should do, but remember that UNIQUE() will remove duplicate values. A document _key is unique within a collection, so we can add it to the projection to make sure that UNIQUE() doesn't remove anything.
The time difference is calculated by subtracting the previous' documents timecaptured value from the current document's one. In the case of the first record, d[0] is actually equal to the current document and the difference ends up being 0, which I think is sensible. You could also write d[-1].timecaptured - d[0].timecaptured to achieve the same. d[1].timecaptured - d[0].timecaptured on the other hand will give you the inverted timestamp for the first record because d[1] is null (no previous document) and evaluates to 0.
There is one risk: UNIQUE() may alter the order of the documents. You could use a subquery to sort by timecaptured again:
LET timediff = doc.timecaptured - (
FOR dd IN d SORT dd.timecaptured LIMIT 1 RETURN dd.timecaptured
)[0]
But it's not great for performance to use a subquery. Instead, you can use the aggregation variable d to access both documents and calculate the absolute value of the subtraction so that the order doesn't matter:
LET timediff = ABS(d[-1].timecaptured - d[0].timecaptured)

Couchdb - date range + multiple query parameters

I want to be able query the couchdb between dates, I know that this can be done with startkey and endkey (it works fine), but is it possible to do query for example like this:
SELECT *
FROM TABLENAME
WHERE
DateTime >= '2011-04-12T00:00:00.000' AND
DateTime <= '2012-05-25T03:53:04.000'
AND
Status = 'Completed'
AND
Job_category = 'Installation'
Generally-speaking, establishing indexes on multiple fields grows in complexity as the number of fields increases.
My main question is: do Status and Job_category need to be queried dynamically too? If not, your view is simple:
function (doc) {
if (doc.Status === 'Completed' && doc.Job_category === 'Installation') {
emit(doc.DateTime); // this line may change depending on how you break up and emit the datetimes
}
}
Views are fairly cheap, (depending on the size of your database) so don't be afraid to establish several that cover different cases. I would expect something like Status to have predefined list of available options, as oppposed to Job_category which seems like it could be more related to user input.
If you need those fields to be dynamic, you can just add them to the index as well:
function (doc) {
emit([ doc.Status, doc.Job_category, doc.DateTime ]);
}
Then you can use an array as your start_key. For example:
start_key=["Completed", "Installation", ...]
tl;dr: use "static" views where you have a predetermined list of values for a given field. while possible to query "dynamic" views with multiple fields, the complexity grows very quickly.

Cloudant 1 to many function

I’ve just started to use Cloudant and I just can’t get my head around the map functions. I’ve been fiddling with the data below but it isn’t working out as I expected.
The relationship is, a user can have many vehicles. A vehicle belongs to 1 user. The vehicle ‘userId’ is the key of the user. There is a bit of redundancy as in user the _id and userId is the same, guess later is not required.
Anyhow, how can I find for a/every user, the vehicles which belong to it? The closest I’ve come through trial and error is a result which displays the owner of every vehicle, but I would like it the other way round, the user and the vehicles belonging to it. All the examples I’ve found use another document which ‘joins’ two or more documents, but I don’t need to do that?
Any point in the right direction appreciated - I really have no idea.
function (doc) {
if (doc.$doctype == "vehicle")
{
emit(doc.userId, {_id: doc.userId});
}
}
EDIT: Getting closer. I'm not sure exactly what I was expecting, but the result seems a bit 'messy'. Row[0] is the user document, row[n > 0] are the vehicle documents. I guess it's fine when a startkey/endkey is used, but without the results are a bit jumbled up.
function (doc) {
if (doc.$doctype == 'user') {
emit([doc._id, 0], doc);
} else if (doc.$doctype == 'vehicle') {
emit([doc.userId, 1, doc._id], doc);
}
}
A user is described as,
{
"_id": "user:10",
"firstname": “firstnamehere",
"secondname": “secondnamehere",
"userId": "user:10",
"$doctype": "user"
}
a vehicle is described as,
{
"_id": "vehicle:4002”,
“name”: “avehicle”,
"userId": "user:10",
"$doctype": "vehicle",
}
You're getting in the right direction! You already got that right with the global IDs. Having the type of the document as part of the ID in some form is a very good idea, so that you don't get confused later (all documents are in the same "pot").
Here are some minor problems with your current solution (before getting to your actual question):
Don't emit the doc as value in emit(key, value). You can always ask for the document that belongs to a view row by querying with include_docs=true. Having the doc as view value increases the view indexes a lot. When you don't need a specific value, use emit(key, null).
You also don't need the ID in the emit value. You'll get the ID of the document that belongs to a view row as part of the row anyway.
View Collation
Now to your problem of aggregating the vehicles with their user. You got the basic pattern right. This pattern is called view collation, you can read more about it in the CouchDB docs (ignore that it is in the "Couchapp" section).
The trick with view collation is that you return two or more types of documents, but make sure that they are sorted in a way that allows for direct grouping. Thus it is important to understand how CouchDB sorts the view result. See the collation specification for more information on that one. An important key to understanding view collation is that rows with array keys are sorted by key elements. So when two rows have the same key[0], they sort by key[1]. If that's equal as well, key[2] is considered, and so on.
Your map function frist groups users and vehicles by user ID (key[0]). Your map function then uses the fact that 0 sorts before 1 in the second element of the key, so your view will contain the following:
user 1
vehicle of user 1
vehicle of user 1
vehicle of user 1
user 2
user 3
vehicle of user 3
user 4
etc.
As you can see, the vehicles of a user immediately follow their user. Thus you can group this result into aggregates without performing expensive sort or lookup operations.
Note that users are sorted according to their ID, and vehicles within users also according to their ID. This is because you use the IDs in the key array.
Creating Queries
Now that view isn't worth much if you can't query according to your needs. A view as you have it supports the following queries:
Get all users with their vehicles
Get a range of users with their vehicles
Get a single user with its vehicles
Get a single user without vehicles (you could also use the _all_docs view for that though)
Example query for "all users between user 1 and user 3 (inclusive) with their vehicles"
We want to query for a range, so we use startkey and endkey in the query:
startkey=["user:1", 0]
endkey=["user:3", 1, {}]
Note the use of {} as sentinel value, which is required so that the end key is larger than any row that has a key of ["user:3", 1, (anyConceivableVehicleId)]

Couchdb: filter and group in a single view

I have a Couchdb database with documents of the form: { Name, Timestamp, Value }
I have a view that shows a summary grouped by name with the sum of the values. This is straight forward reduce function.
Now I want to filter the view to only take into account documents where the timestamp occured in a given range.
AFAIK this means I have to include the timestamp in the emitted key of the map function, eg. emit([doc.Timestamp, doc.Name], doc)
But as soon as I do that the reduce function no longer sees the rows grouped together to calculate the sum. If I put the name first I can group at level 1 only, but how to I filter at level 2?
Is there a way to do this?
I don't think this is possible with only one HTTP fetch and/or without additional logic in your own code.
If you emit([time, name]) you would be able to query startkey=[timeA]&endkey=[timeB]&group_level=2 to get items between timeA and timeB grouped where their timestamp and name were identical. You could then post-process this to add up whenever the names matched, but the initial result set might be larger than you want to handle.
An alternative would be to emit([name,time]). Then you could first query with group_level=1 to get a list of names [if your application doesn't already know what they'll be]. Then for each one of those you would query startkey=[nameN]&endkey=[nameN,{}]&group_level=2 to get the summary for each name.
(Note that in my query examples I've left the JSON start/end keys unencoded, so as to make them more human readable, but you'll need to apply your language's equivalent of JavaScript's encodeURIComponent on them in actual use.)
You can not make a view onto a view. You need to write another map-reduce view that has the filtering and makes the grouping in the end. Something like:
map:
function(doc) {
if (doc.timestamp > start and doc.timestamp < end ) {
emit(doc.name, doc.value);
}
}
reduce:
function(key, values, rereduce) {
return sum(values);
}
I suppose you can not store this view, and have to put it as an ad-hoc query in your application.

Does CouchDB support multiple range queries?

How are multiple range queries implemented in CouchDB? For a single range condition, startkey and endkey combination works fine, but the same thing is not working with a multiple range condition.
My View function is like this:
"function(doc){
if ((doc['couchrest-type'] == 'Item')
&& doc['loan_name']&& doc['loan_period']&&
doc['loan_amount'])
{ emit([doc['template_id'],
doc['loan_name'],doc['loan_period'],
doc['loan_amount']],null);}}"
I need to get the whole docs with loan_period > 5 and
loan_amount > 30000. My startkey and endkey parameters are like this:
params = {:startkey =>["7446567e45dc5155353736cb3d6041c0",nil,5,30000],
:endkey=>["7446567e45dc5155353736cb3d6041c0",{},{},{}],:include_docs => true}
Here, I am not getting the desired result. I think my startkey and endkey params are wrong. Can anyone help me?
A CouchDB view is an ordered list of entries. Queries on a view return a contiguous slice of that list. As such, it's not possible to apply two inequality conditions.
Assuming that your loan_period is a discrete variable, this case would probably be best solved by emit'ing the loan_period first and then issuing one query for each period.
An alternative solution would be to use couchdb-lucene.
You're using arrays as your keys. Couchdb will compare arrays by comparing each array element in increasing order until two element are not equal.
E.g. to compare [1,'a',5] and [1,'c',0] it will compare 1 whith 1, then 'a' with 'c' and will decide that [1,'a',5] is less than [1,'a',0]
This explains why your range key query fails:
["7446567e45dc5155353736cb3d6041c0",nil,5,30000] is greater ["7446567e45dc5155353736cb3d6041c0",nil,5,90000]
Your emit statement looks a little strange to me. The purpose of emit is to produce a key (i.e. an index) and then the document's values that you are interested in.
for example:
emit( doc.index, [doc.name, doc.address, ....] );
You are generating an array for the index and no data for the view.
Also, Couchdb doesn't provide for an intersection of views as it doesn't fit the map/reduce paradigm very well. So your needs boil down to trying to address the following:
Can I produce a unique index which I can then extract a particular range from? (using startkey & endkey)
Actually CouchDB allows views to have complex keys which are arrays of values as given in the question:
[template_id, loan_name, loan_period, loan_amount]
Have you tried
params = {:startkey =>["7446567e45dc5155353736cb3d6041c0",nil,5,30000],
:endkey=>["7446567e45dc5155353736cb3d6041c0",{}],:include_docs => true}
or perhaps
params = {:startkey =>["7446567e45dc5155353736cb3d6041c0","\u0000",5,30000],
:endkey=>["7446567e45dc5155353736cb3d6041c0","\u9999",{}],:include_docs => true}

Resources