ArangoDB: COLLECT syntax error when adding the SUM function for a collection attribute

I am trying to add up all the Amounts in the edge collection named Transaction and also extract the days from its date attribute.
However, I am getting an error in the COLLECT statement.
for d in Transaction
filter d._to == "Account/123"
COLLECT aggregate ct =count(d._id),
aggregate totamnt=sum(d.Amount),
aggregate daysactive= count(distinct date_trunc(d.Time))
return distinct {"Incoming Accounts":length, "Days Active": daysactive}

If I understand what you want to achieve correctly, this is a query to achieve it:
FOR d IN Transaction
FILTER d._to == "Account/123"
COLLECT AGGREGATE length = COUNT_UNIQUE(d._id),
totamnt = SUM(d.Amount),
daysactive = COUNT_UNIQUE(DATE_TRUNC(d.Time, "day"))
RETURN {
"Incoming Accounts": length,
"Days Active": daysactive,
"Total Amount": totamnt
}
Note: the DISTINCT is not necessary. I also included the total amount in the return value and specified "day" as the unit to truncate the date to.
I tested a slightly adapted version of this on a collection of mine and got sensible results.
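To sanity-check what the aggregation computes, the same logic can be sketched in plain Python. This is only an illustration: the field names come from the question, while the sample edges and the ISO 8601 format of Time are assumptions.

```python
# Hypothetical sample edges; only the field names come from the question.
transactions = [
    {"_id": "Transaction/1", "_to": "Account/123", "Amount": 50, "Time": "2022-01-10T09:15:00Z"},
    {"_id": "Transaction/2", "_to": "Account/123", "Amount": 25, "Time": "2022-01-10T17:40:00Z"},
    {"_id": "Transaction/3", "_to": "Account/123", "Amount": 10, "Time": "2022-01-12T08:00:00Z"},
    {"_id": "Transaction/4", "_to": "Account/999", "Amount": 99, "Time": "2022-01-12T08:30:00Z"},
]


def summarize(edges, account):
    # FILTER d._to == account
    matching = [e for e in edges if e["_to"] == account]
    return {
        # COUNT_UNIQUE(d._id)
        "Incoming Accounts": len({e["_id"] for e in matching}),
        # Distinct calendar days: the first 10 characters of an ISO 8601 timestamp
        "Days Active": len({e["Time"][:10] for e in matching}),
        # SUM(d.Amount)
        "Total Amount": sum(e["Amount"] for e in matching),
    }


summary = summarize(transactions, "Account/123")
print(summary)
```

Two of the three matching edges fall on the same day, so the distinct-day count is lower than the edge count.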

NetSuite Search formula for items that have no open transactions

I am trying to create a formula to obtain a list of items that have no open transactions.
I can't just filter by status, because that filters out the open transactions rather than showing me only items with nothing open.
So basically, if an item has anything open, I don't want it in the search. I do need it in the search if all its transactions are closed or it has no transactions at all.
Hoping someone can help point me in the right direction.
I am a little bit stuck on where to start with the formulas and tried a case formula.
You can use an item saved search, adding under Criteria "Transaction Fields-status-anyOf-select all closed/rejected/declined statuses", not in filter reason of saved search.
Thanks.
To also get items that have no transactions, check the Use Expressions box under Criteria on the Standard subtab and use parentheses with an OR expression.
Add one more condition, "Transaction Fields-Internal Id-anyOf-none", alongside
"Transaction Fields-status-anyOf-select all closed/rejected/declined statuses".
Combine both conditions with OR logic.
This covers both cases: items whose transactions have a closed status and items with no transaction internal IDs at all.
Thanks.
I think this is possible in a saved search, but it requires a change in the way the filtering is done. Rather than filtering in the "Filters", use grouping and summary calculations to determine whether an item qualifies. Basically:
Create the item saved search as you normally would, but don't include a "Standard" filter for the openness of the transaction.
In the results, group by item name (or internalid) and any other fields you want to include in the top-level results.
In the Criteria - Summary list, add a Formula (Number) condition:
Summary Type= Sum (Count won't work here)
Formula = case when {transaction.status} = 'Open' then 1 else 0 end
Equal to 0
Whether this is more or less elegant than bknight's answer is debatable.
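The grouping idea can be illustrated outside NetSuite. The sketch below is plain Python, not SuiteScript. The rows are made-up stand-ins for what an item search joined to transactions might return, with None marking an item that has no transactions.

```python
from collections import defaultdict

# Hypothetical (item, transaction status) rows; None means the item has no transactions.
rows = [
    ("Widget", "Closed"),
    ("Widget", "Open"),
    ("Gadget", "Closed"),
    ("Gadget", "Closed"),
    ("Gizmo", None),
]

open_count = defaultdict(int)
for item, status in rows:
    # case when status = 'Open' then 1 else 0 end, summed per item
    open_count[item] += 1 if status == "Open" else 0

# Summary criterion: keep only the groups whose sum equals 0
items_without_open = sorted(item for item, total in open_count.items() if total == 0)
print(items_without_open)
```

The sum is 0 both for items whose transactions are all closed and for items with no transactions, which is the set the question asks for. A count would presumably tally the 0 rows as well, which would explain why Sum is needed.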
I don't think this is the sort of thing you can do with a single saved search.
It would be fairly easy to do with SuiteQL though.
The script below runs in the console and finds items that are not on any Pending Billing Sales Orders. It's adapted from a script with a different purpose but illustrates the concept.
You can get a list of the status values to use by creating a saved search that finds all the transactions with the open statuses you want to exclude, taking note of that saved search's id, and running the second script in the console.
require(['N/query'], query => {
const sqlStr = `
select item.id, itemid, count(po.tranid) as po, count(bill.tranId) as bill, max(bill.tranDate) as lastBilled, count(sale.tranId) as sales, count(tran.tranId) as trans
from item
left outer join transactionLine as line
on line.item = item.id
left outer join transaction as tran on line.transaction = tran.id
left outer join transaction as po on line.transaction = po.id and po.type = 'PurchOrd'
left outer join transaction as bill on line.transaction = bill.id and bill.type = 'VendBill'
left outer join transaction as sale on line.transaction = sale.id and sale.type in ('CustInvc', 'CashSale')
where item.id not in (select otl.item from transactionLine otl, transaction ot where
otl.transaction = ot.id and ot.status in ('SalesOrd:F'))
group by item.id, item.itemid
`;
console.log(sqlStr);
console.log(query.runSuiteQL({
query: sqlStr
}).asMappedResults().map((r, idx)=>{
if(!idx) console.log(JSON.stringify(r));
return `${r.id}\t${r.itemid}\t${r.po}\t${r.bill}\t${r.lastBilled}\t${r.sales}\t${r.trans}`;
}).join('\n'));
});
require(['N/search'], search=>{
const filters = search.load({id:304}).filters;
console.log(JSON.stringify(filters.find(f=>f.name == 'status'), null, ' '));
});
In terms of doing something with this, you could run this in a saved search and email someone the results, show the results in a workbook in SuiteAnalytics, or build a portlet to display the results. For the last option, Tim Dietrich has a nice write-up on portlets and SuiteQL.

Timeseries differencing - ArangoDB (AQL or Python)

I have a collection which holds documents, with each document having a data observation and the time that the data was captured.
e.g.
{
_key:....,
"data":26,
"timecaptured":1643488638.946702
}
where timecaptured is, for now, a UTC timestamp.
What I want to do is get the duration between consecutive observations, that is, the difference in timestamps between two documents in time order. With SQL I could do this with LAG, for example, but with ArangoDB and AQL I am struggling to see how to do this in the database. I have a lot of data and I don't really want to pull it all into pandas.
Any help really appreciated.
Although the solution provided by CodeManX works, I prefer a different one:
FOR d IN docs
SORT d.timecaptured
WINDOW { preceding: 1 } AGGREGATE s = SUM(d.timecaptured), cnt = COUNT(1)
LET timediff = cnt == 1 ? null : d.timecaptured - (s - d.timecaptured)
RETURN timediff
We simply calculate the sum of the timecaptured values of the previous and the current document. Subtracting the current document's timecaptured from that sum gives the timecaptured of the previous document, so the requested difference follows directly.
I only use the COUNT to return null for the first document (which has no predecessor). If you are fine with having a difference of zero for the first document, you can simply remove it.
However, neither approach is very straightforward or obvious. I have put it on my TODO list to add an APPEND aggregate function that could be used in WINDOW and COLLECT operations.
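The arithmetic behind the SUM trick is easy to verify outside the database. Here is a minimal Python sketch that emulates a window of one preceding row. The function name and the sample timestamps are made up for illustration.

```python
def window_diffs(timestamps):
    """Emulate WINDOW { preceding: 1 } with SUM and COUNT aggregates."""
    ordered = sorted(timestamps)  # SORT d.timecaptured
    diffs = []
    for i, current in enumerate(ordered):
        # The window holds the previous row (if there is one) and the current row
        window = ordered[max(0, i - 1): i + 1]
        s, cnt = sum(window), len(window)
        previous = s - current  # sum minus current leaves the previous value
        diffs.append(None if cnt == 1 else current - previous)
    return diffs


diffs = window_diffs([1643488640, 1643488638, 1643488645])
print(diffs)  # [None, 2, 5]
```

Integer timestamps are used here so the subtraction is exact. With fractional timestamps, as in the question, the sum-and-subtract step may introduce small floating-point rounding differences.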
The WINDOW function doesn't give you direct access to the data in the sliding window but here is a rather clever workaround:
FOR doc IN collection
SORT doc.timecaptured
WINDOW { preceding: 1 }
AGGREGATE d = UNIQUE(KEEP(doc, "_key", "timecaptured"))
LET timediff = doc.timecaptured - d[0].timecaptured
RETURN MERGE(doc, {timediff})
The UNIQUE() function is available for window aggregations and can be used to get at the desired data (previous document). Aggregating full documents might be inefficient, so a projection should do, but remember that UNIQUE() will remove duplicate values. A document _key is unique within a collection, so we can add it to the projection to make sure that UNIQUE() doesn't remove anything.
The time difference is calculated by subtracting the previous document's timecaptured value from the current document's. For the first record, d[0] is actually equal to the current document, so the difference ends up being 0, which I think is sensible. You could also write d[-1].timecaptured - d[0].timecaptured to achieve the same. d[1].timecaptured - d[0].timecaptured, on the other hand, will give you the negated timestamp for the first record, because d[1] is null (there is no previous document) and evaluates to 0.
There is one risk: UNIQUE() may alter the order of the documents. You could use a subquery to sort by timecaptured again:
LET timediff = doc.timecaptured - (
FOR dd IN d SORT dd.timecaptured LIMIT 1 RETURN dd.timecaptured
)[0]
But it's not great for performance to use a subquery. Instead, you can use the aggregation variable d to access both documents and calculate the absolute value of the subtraction so that the order doesn't matter:
LET timediff = ABS(d[-1].timecaptured - d[0].timecaptured)

Can I filter multiple collections?

I want to filter multiple collections and return only the documents that meet the requirements. The problem is that when there is more than one matching value in one collection, the returned elements are repeated.
FOR TurmaA IN TurmaA
FOR TurmaB IN TurmaB
FILTER TurmaA.Disciplinas.Mat >10
FILTER TurmaB.Disciplinas.Mat >10
RETURN {TurmaA,TurmaB}
What your query does is to iterate over all documents of the first collection, and for each record it iterates over the second collection. The applied filters reduce the number of results, but this is not how you should go about it as it is highly inefficient.
Do you actually want to return the union of the matches from both collections?
(SELECT ... UNION SELECT ... in SQL).
What you get with your current approach are all possible combinations of the documents from both collections. I believe what you want is:
LET a = (FOR t IN TurmaA FILTER t.Disciplinas.Mat > 10 RETURN t)
LET b = (FOR t IN TurmaB FILTER t.Disciplinas.Mat > 10 RETURN t)
FOR doc IN UNION(a, b)
RETURN doc
Both collections are filtered individually in sub-queries, then the results are combined and returned.
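The difference between the two query shapes can be seen in a small Python sketch. The documents and the nome attribute are invented for illustration. Only Disciplinas.Mat comes from the question.

```python
from itertools import product

# Hypothetical documents; only the Disciplinas.Mat attribute comes from the question.
turma_a = [{"nome": "Ana", "Disciplinas": {"Mat": 12}}, {"nome": "Rui", "Disciplinas": {"Mat": 8}}]
turma_b = [{"nome": "Eva", "Disciplinas": {"Mat": 15}}, {"nome": "Leo", "Disciplinas": {"Mat": 11}}]


def passed(doc):
    return doc["Disciplinas"]["Mat"] > 10


# Nested FOR loops: every matching A document is paired with every matching B document.
pairs = [(a, b) for a, b in product(turma_a, turma_b) if passed(a) and passed(b)]

# Filter each collection on its own, then concatenate: each match appears once.
matches = [d for d in turma_a if passed(d)] + [d for d in turma_b if passed(d)]

print(len(pairs), [d["nome"] for d in matches])
```

In the nested version Ana shows up twice, once per matching TurmaB document, which is the repetition described in the question.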
Another solution would be to store all documents in one collection Turma and have another attribute e.g. Type with a value of "A" or "B". Then the query would be as simple as:
FOR t IN Turma
FILTER t.Disciplinas.Mat > 10
RETURN t
If you want to return TurmaA documents only, you would do:
FOR t IN Turma
FILTER t.Disciplinas.Mat > 10 AND t.Type == "A"
RETURN t
By the way, I recommend giving variables names that differ from collection names, e.g. t instead of Turma if there is a collection Turma.

How to efficiently use Django query and Q to filter each object in a queryset and return 1 field value for each unique field in the queryset

I have a query that returns the following queryset:
results = <QuerySet [<Product: ItemA>, <Product: ItemA>, <Product: ItemB>, <Product: ItemB>, <Product: ItemB>, <Product: ItemC>, <Product: ItemC>]>
The __str__ representation of the model is name and each Product variation likely has a different value for the price field. After this query, I need to search my database for each Product in the queryset and return the lowest price for each unique name so like:
Lowest price for all in database where name is == to ItemA
Lowest price for all in database where name is == to ItemB
Lowest price for all in database where name is == to ItemC
I use the following block of code to accomplish this goal:
query_list = []
for each in results:
    if each.name not in query_list:  # Checks if the name of the object is not in the query list
        query_list.append(each.name)  # Adds just the name of the objects so there is just one of each name in query_list
for each in query_list:
    priced = results.filter(name=each).order_by('price').first()  # Lowest price for each name in query_list
This feels very inefficient. Is there a way to make a similar computation without having to append the unique name of each Product to a separate list, iterate over that list, and then make a query for each one? I feel like there is a way to use some type of complex lookup to accomplish my goals, maybe even use less Python and make the db do more of the work, but the above is the best I've been able to figure out so far. There can be a lot of different hits in results, so I need this block to be as efficient as possible.
It is easy after reading the docs sections "Generating aggregates for each item in a QuerySet" and "Interaction with default ordering or order_by()".
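from django.db.models import Min

# First query: lowest price per name; the empty order_by() clears any default ordering
prices = {x['name']: x['lowest_price']
          for x in results.values('name').annotate(lowest_price=Min('price')).order_by()}
# Second query: keep the first row per name whose price equals that minimum
for product in results:
    if product.name in prices and product.price == prices[product.name]:
        priced = product  # output the row
        del prices[product.name]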
This runs two database queries.
An even more efficient solution with one query is probably possible with a Window function, but it requires an advanced database backend and can't work e.g. in tests with sqlite3.
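The logic of the two-query approach can be checked without a database. Here is a minimal sketch in plain Python, with a hypothetical Product dataclass standing in for the Django model and a list standing in for the queryset:

```python
from dataclasses import dataclass


@dataclass
class Product:  # hypothetical stand-in for the Django model
    name: str
    price: float


results = [Product("ItemA", 5.0), Product("ItemA", 3.5), Product("ItemB", 9.0),
           Product("ItemB", 7.25), Product("ItemC", 1.0)]

# Step 1: the per-name minimum, i.e. what values('name').annotate(Min('price')) computes
prices = {}
for p in results:
    prices[p.name] = min(p.price, prices.get(p.name, p.price))

# Step 2: pick one row per name whose price equals that minimum
cheapest = []
for p in results:
    if p.name in prices and p.price == prices[p.name]:
        cheapest.append(p)
        del prices[p.name]  # ensures only one row per name is kept

print([(p.name, p.price) for p in cheapest])
```

Deleting the name from the dictionary after the first hit is what prevents duplicates when several rows share the lowest price.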

Most appeared search between a particular time

I have a search log with the fields time, place and query. I want to find the most queried word from a particular place within a particular time range. All the fields, namely date, time and query_string, are chararrays. I have the Pig script below, but it does not do what is required.
Data = LOAD 'data' USING CustomPigStorage();
FClients = FILTER Data BY NOT(country is null);
Clients = FOREACH FClients GENERATE date,time, country,query_string as query;
grp = group Clients by (query, country, date, time);
wth_count = foreach grp generate FLATTEN(group), COUNT(Clients) as count;
For example, I want the result to be "between 2pm and 3 pm, hello was searched 4 times from USA".
I am basically confused by the COUNT() function, as I am relatively new to Pig. I believe my COUNT() here is counting the total number of records I have.
Your query looks correct. COUNT(Clients) returns the number of records in the bag that came from Clients and belong to the group. To see this, you can remove COUNT from the "wth_count" statement, save the results into a file and then look into it.
wth_count = foreach grp generate group, Clients;
store wth_count into 'path';
Your potential problem might be that you are using the date and time columns in the group by, and they produce too many groups. To mitigate this, you could write a Java static function that takes the date and time and returns a single value for the range. For example, 12-07-2012, 14.05.03 is converted into "12-07-2012 14h" and 12-07-2012, 14.05.05 into "12-07-2012 14h". This creates a key that covers the time interval between 2pm and 3pm and puts all of the records from Clients into that group's bag.
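The bucketing idea can be sketched in Python to show what such a function would do. The function name and the sample rows are assumptions. The date and time formats follow the example values above.

```python
from collections import Counter


def hour_bucket(date, time):
    """Map a date and an 'HH.MM.SS' time to an hourly key such as '12-07-2012 14h'."""
    hour = time.split(".")[0]
    return f"{date} {hour}h"


# Hypothetical log rows: (date, time, country, query)
rows = [
    ("12-07-2012", "14.05.03", "USA", "hello"),
    ("12-07-2012", "14.05.05", "USA", "hello"),
    ("12-07-2012", "14.59.59", "USA", "world"),
    ("12-07-2012", "15.00.01", "USA", "hello"),
]

# Group by (hour bucket, country, query) instead of the raw date and time
counts = Counter((hour_bucket(d, t), country, q) for d, t, country, q in rows)
top = counts.most_common(1)[0]
print(top)
```

With the raw time in the key, each of these rows would land in its own group with a count of 1. With the hourly key, the two "hello" searches between 2pm and 3pm are counted together.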
