Gremlin - How to filter while converting property to Integer? - groovy

I have the following segment of gremlin code:
vert.as('x').
  both.or(
    _().has("time").filter{ _()["time"] > startTime.toInteger() },
    _().has("isRead"), _().has("isWrite")).dedup{}.gather.scatter.
  store(y).loop('x'){ c++ < limit.toInteger() }.iterate();
I would think that this would filter the items so that only those whose time attribute is greater than startTime pass through, but that isn't the case. How do I get the time of the current object in the pipeline in order to compare it?

Actually, I found the answer really fast. I should have known, being as I've read basically all the gremlin documentation... :/
vert.as('x').
  both.or(
    _().has("time").filter{ it.time > startTime.toInteger() },
    _().has("isRead"), _().has("isWrite")).dedup{}.gather.scatter.
  store(y).loop('x'){ c++ < limit.toInteger() }.iterate();
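The reason this works: inside a Gremlin-Groovy filter closure, it is bound to the current object in the pipeline, so it.time reads that element's time property directly, whereas _() starts a new pipeline rather than referring to the current element. A stripped-down sketch of the same pattern (assuming vertices that carry a numeric time property):
vert.both.filter{ it.time > startTime.toInteger() }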

Related

Creating Test data for ArangoDB

Hi, I would like to insert random test data into an edge collection called Transaction, with the fields _id, Amount and TransferType filled with random data. I have written the code below, but it is showing a syntax error.
FOR i IN 1..30000
INSERT {
_id: CONCAT('Transaction/', i),
Amount:RAND(),
Time:Rand(DATE_TIMESTAMP),
i > 1000 || u.Type_of_Transfer == "NEFT" ? u.Type_of_Transfer == "IMPS"
} INTO Transaction OPTIONS { ignoreErrors: true }
Your code has multiple issues:
When you are creating a new document, you can either omit the _key attribute and ArangoDB will create one for you, or you can specify one as a string. An _id attribute will be ignored.
RAND() produces a random number between 0 and 1, so it needs to be multiplied to shift it into the range you want; you may also need to round it if you need integer values.
DATE_TIMESTAMP is a function, and you have passed it as a parameter to RAND(), which takes no parameters. But since DATE_TIMESTAMP generates a numerical timestamp (milliseconds since 1970-01-01 00:00 UTC), it isn't actually needed here. The only thing you need is the random number generation shifted into a range that makes sense (i.e. not in the 1970s).
The i > 1000 ... line is something I could only guess at. The key for the JSON object is missing, you are referencing a u variable that is not defined anywhere, and I can see the first two parts of a ternary expression (cond ? true_value : false_value) but the : is missing. My best guess is that you wanted to create a Type_of_Transfer key with the value "NEFT" when i > 1000 and "IMPS" when i <= 1000.
So, I rewrote your AQL and tested it:
FOR i IN 1..30000
  INSERT {
    _key: TO_STRING(i),
    Amount: RAND() * 1000,
    Time: ROUND(RAND() * 100000000 + 1603031645000),
    Type_of_Transfer: i > 1000 ? "NEFT" : "IMPS"
  } INTO Transaction OPTIONS { ignoreErrors: true }
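To sanity-check the generated data, an aggregation along these lines could be used (just a sketch against the Transaction collection created above; the variable names are illustrative):
FOR t IN Transaction
  COLLECT transferType = t.Type_of_Transfer WITH COUNT INTO numDocs
  RETURN { transferType: transferType, numDocs: numDocs }
With i running from 1 to 30000 and the i > 1000 condition above, this should report 1000 "IMPS" documents and 29000 "NEFT" documents.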

Timeseries differencing - ArangoDB (AQL or Python)

I have a collection which holds documents, with each document having a data observation and the time that the data was captured.
e.g.
{
  _key: ....,
  "data": 26,
  "timecaptured": 1643488638.946702
}
where timecaptured for now is a utc timestamp.
What I want to do is get the duration between consecutive observations. With SQL I could do this with LAG, for example, but with ArangoDB and AQL I am struggling to see how to do this in the database. So effectively I want the difference in timestamps between two documents in time order. I have a lot of data and I don't really want to pull it all into pandas.
Any help really appreciated.
Although the solution provided by CodeManX works, I prefer a different one:
FOR d IN docs
  SORT d.timecaptured
  WINDOW { preceding: 1 } AGGREGATE s = SUM(d.timecaptured), cnt = COUNT(1)
  LET timediff = cnt == 1 ? null : d.timecaptured - (s - d.timecaptured)
  RETURN timediff
We simply calculate the sum of the previous and the current document's timecaptured, and by subtracting the current document's timecaptured from that sum we recover the previous document's timecaptured. From there we can easily calculate the requested difference.
I only use the COUNT to return null for the first document (which has no predecessor). If you are fine with having a difference of zero for the first document, you can simply remove it.
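For example, if the previous document has timecaptured 100 and the current one has 104, the window yields s = 204 and cnt = 2, so timediff = 104 - (204 - 104) = 4, the gap between the two observations.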
However, neither approach is very straightforward or obvious. I have put adding an APPEND aggregate function that could be used in WINDOW and COLLECT operations on my TODO list.
The WINDOW function doesn't give you direct access to the data in the sliding window but here is a rather clever workaround:
FOR doc IN collection
  SORT doc.timecaptured
  WINDOW { preceding: 1 }
    AGGREGATE d = UNIQUE(KEEP(doc, "_key", "timecaptured"))
  LET timediff = doc.timecaptured - d[0].timecaptured
  RETURN MERGE(doc, {timediff})
The UNIQUE() function is available for window aggregations and can be used to get at the desired data (previous document). Aggregating full documents might be inefficient, so a projection should do, but remember that UNIQUE() will remove duplicate values. A document _key is unique within a collection, so we can add it to the projection to make sure that UNIQUE() doesn't remove anything.
The time difference is calculated by subtracting the previous document's timecaptured value from the current document's. In the case of the first record, d[0] is actually equal to the current document and the difference ends up being 0, which I think is sensible. You could also write d[-1].timecaptured - d[0].timecaptured to achieve the same. d[1].timecaptured - d[0].timecaptured, on the other hand, will give you the negated timestamp for the first record, because d[1] is null (there is no previous document) and evaluates to 0 in the subtraction.
There is one risk: UNIQUE() may alter the order of the documents. You could use a subquery to sort by timecaptured again:
LET timediff = doc.timecaptured - (
FOR dd IN d SORT dd.timecaptured LIMIT 1 RETURN dd.timecaptured
)[0]
But it's not great for performance to use a subquery. Instead, you can use the aggregation variable d to access both documents and calculate the absolute value of the subtraction so that the order doesn't matter:
LET timediff = ABS(d[-1].timecaptured - d[0].timecaptured)

How to truncate time while querying documents for date comparison in Cosmos DB

I have a document that contains properties like this:
{
  "id": "1bd13f8f-b56a-48cb-9b49-7fc4d88beeac",
  "name": "Sam",
  "createdOnDateTime": "2018-07-23T12:47:42.6407069Z"
}
I want to query documents on the basis of createdOnDateTime, which is stored as a string.
Query, e.g.:
SELECT * FROM c where c.createdOnDateTime>='2018-07-23' AND c.createdOnDateTime<='2018-07-23'
This should return all documents created on that day.
I am providing the date value from a date selector which gives only the date without the time, so it causes a problem when comparing the dates.
Is there any way to remove the time from the createdOnDateTime property, or is there any other way to achieve this?
Cosmos DB clients store timestamps in ISO 8601 format, and one of the good reasons to do so is that its lexicographical order matches chronological order. Meaning: you can sort and compare those strings and they come out ordered by the time they represent.
So in this case you don't need to remove the time component; just modify the parameters you pass in to get the result you need. If you want all entries from the entire day of 2018-07-23, you can use this query:
SELECT * FROM c
WHERE c.createdOnDateTime >= '2018-07-23'
AND c.createdOnDateTime < '2018-07-24'
Please note that this query can use a RANGE index on createdOnDateTime.
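To see why this works with the sample document above: its createdOnDateTime of "2018-07-23T12:47:42.6407069Z" is greater than '2018-07-23' (same prefix, longer string) and less than '2018-07-24' (the strings differ at the day digit), so it falls inside the range, while anything stamped on the following day starts with '2018-07-24' and is excluded.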
You can use a User Defined Function to implement your requirement; there is no need to change the createdOnDateTime property.
UDF:
function con(date){
    var myDate = new Date(date);
    var month = myDate.getMonth() + 1;   // getMonth() is zero-based
    if (month < 10) {
        month = "0" + month;
    }
    var day = myDate.getDate();
    if (day < 10) {
        day = "0" + day;                 // pad the day as well, so string comparison stays correct
    }
    return myDate.getFullYear() + "-" + month + "-" + day;
}
SQL:
SELECT c.id,c.createdOnDateTime FROM c where udf.con(c.createdOnDateTime)>='2018-07-23' AND udf.con(c.createdOnDateTime)<='2018-07-23'
Hope it helps you.

Azure DocumentDB: order by and filter by DateTime

I have the following query:
SELECT * FROM c
WHERE c.DateTime >= "2017-03-20T10:07:17.9894476+01:00" AND c.DateTime <= "2017-03-22T10:07:17.9904464+01:00"
ORDER BY c.DateTime DESC
So as you can see, I have a WHERE condition on a property of type DateTime, and I want to sort my result by the same property.
The query ends with the following error:
Order-by item requires a range index to be defined on the corresponding index path.
I have absolutely no idea what this error message is about :(
Has anybody any idea?
You can also do one thing that doesn't require explicit indexing. Azure DocumentDB provides indexing on number fields by default, so you can store the date as a long value. Since you are already converting the date to a string, you can just as well convert it to a long (e.g. epoch ticks) and store that; then you can run a range query on it.
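A minimal sketch of that approach, assuming a hypothetical numeric property DateTimeEpoch that stores the same instant as c.DateTime but as epoch milliseconds, and parameters @from and @to holding the range boundaries converted to the same unit:
SELECT * FROM c
WHERE c.DateTimeEpoch >= @from AND c.DateTimeEpoch <= @to
ORDER BY c.DateTimeEpoch DESC
Since numbers are range-indexed by default (as noted above), both the filter and the ORDER BY can then use the index without changing the indexing policy.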
I think I found a possible solution; thanks for pointing out the issue with the index.
As stated in the following article https://learn.microsoft.com/en-us/azure/documentdb/documentdb-working-with-dates#indexing-datetimes-for-range-queries, I changed the index for the String data type to a RangeIndex, which allows range queries:
DocumentCollection collection = new DocumentCollection { Id = "orders" };
collection.IndexingPolicy = new IndexingPolicy(new RangeIndex(DataType.String) { Precision = -1 });
await client.CreateDocumentCollectionAsync("/dbs/orderdb", collection);
And it seems to work! If there are any undesired side effects I will let you know.

Does CouchDB support multiple range queries?

How are multiple range queries implemented in CouchDB? For a single range condition, the startkey and endkey combination works fine, but the same approach does not work with multiple range conditions.
My View function is like this:
"function(doc){
  if ((doc['couchrest-type'] == 'Item')
      && doc['loan_name'] && doc['loan_period']
      && doc['loan_amount']) {
    emit([doc['template_id'], doc['loan_name'],
          doc['loan_period'], doc['loan_amount']], null);
  }
}"
I need to get the whole docs with loan_period > 5 and
loan_amount > 30000. My startkey and endkey parameters are like this:
params = {:startkey =>["7446567e45dc5155353736cb3d6041c0",nil,5,30000],
:endkey=>["7446567e45dc5155353736cb3d6041c0",{},{},{}],:include_docs => true}
Here, I am not getting the desired result. I think my startkey and endkey params are wrong. Can anyone help me?
A CouchDB view is an ordered list of entries. Queries on a view return a contiguous slice of that list. As such, it's not possible to apply two inequality conditions.
Assuming that your loan_period is a discrete variable, this case would probably be best solved by emitting the loan_period first and then issuing one query for each period, as sketched below.
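One way to read that suggestion, sketched with the field names from the question (this is an assumption about the intended key layout, not tested code):
function(doc) {
  if (doc['couchrest-type'] == 'Item'
      && doc.loan_name && doc.loan_period && doc.loan_amount) {
    // loan_period comes right after template_id, so each query can fix one period
    // and range over loan_amount
    emit([doc.template_id, doc.loan_period, doc.loan_amount], null);
  }
}
You would then issue one query per discrete period above 5, e.g. startkey=[template_id, 6, 30000] and endkey=[template_id, 6, {}], and merge the result sets on the client.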
An alternative solution would be to use couchdb-lucene.
You're using arrays as your keys. CouchDB compares arrays element by element, in order, until two elements are not equal.
E.g. to compare [1,'a',5] and [1,'c',0] it will compare 1 with 1, then 'a' with 'c', and will decide that [1,'a',5] is less than [1,'c',0].
This explains why your range key query fails: later key elements are only compared when all earlier elements are equal, so you cannot constrain loan_period and loan_amount independently with a single startkey/endkey pair. In CouchDB's collation, nil sorts before every string and {} sorts after every string, so the range from ["7446567e45dc5155353736cb3d6041c0",nil,5,30000] to ["7446567e45dc5155353736cb3d6041c0",{},{},{}] effectively covers every document for that template_id, regardless of its loan_period and loan_amount.
Your emit statement looks a little strange to me. The purpose of emit is to produce a key (i.e. an index) and then the document's values that you are interested in.
for example:
emit( doc.index, [doc.name, doc.address, ....] );
You are generating an array for the index and no data for the view.
Also, CouchDB doesn't provide for an intersection of views, as that doesn't fit the map/reduce paradigm very well. So your needs boil down to addressing the following question:
Can I produce a unique index which I can then extract a particular range from? (using startkey & endkey)
Actually CouchDB allows views to have complex keys which are arrays of values as given in the question:
[template_id, loan_name, loan_period, loan_amount]
Have you tried
params = {:startkey =>["7446567e45dc5155353736cb3d6041c0",nil,5,30000],
:endkey=>["7446567e45dc5155353736cb3d6041c0",{}],:include_docs => true}
or perhaps
params = {:startkey =>["7446567e45dc5155353736cb3d6041c0","\u0000",5,30000],
:endkey=>["7446567e45dc5155353736cb3d6041c0","\u9999",{}],:include_docs => true}
