Using groupsVariable in COLLECT to track unique values - arangodb

I'm using the following AQL to track unique visitors by month and year. Each hit contains a datetime stamp, _date, and a user name, user.
FOR hit IN PageHits
COLLECT month = DATE_MONTH(hit._date),year=DATE_YEAR(hit._date) INTO user=hit.user
SORT year,month
RETURN {month,year,uniqueVisitors:LENGTH(UNIQUE(user))}
The query calculates the correct answer but it strikes me as inefficient because the variable user contains many duplicates. The final LENGTH/UNIQUE removes these and returns the number of unique visitors.
I looked at AGGREGATE but all of the operations are statistical.
So, is there any way to add only distinct/unique values into the group variable user?

Related

date difference along with group

I have a dataframe which consist of order_date,order_time for multiple restaurant. I want to create a new column which would calculate the queue based on order date and time.
I have tried using diff() on order time along with group by on order_date and restaurant_id but it didnt worked.
Expected output : a new column 'time_diff' which would contain difference between consecutive orders for same restaurant and on same date.

NetSuite formula to show quantity ordred on first order

We have a saved search in place that displays date of first order and last order by customer & item within a given date range. For example - Looking at sales for May 2022 - today, it shows the item, the customer, the date they first ordered the item, and the date they last ordered the item.
I am now also trying to incorporate the quantity ordered on the first order.
crietera
results
I've tried the following, but keep getting an invalid expression. Can anyone advise on how I might be able to display the qty on the date of the first order?
MIN(CASE WHEN {custbody_reporting_ship_date} THEN {quantity} END)
and
CASE WHEN (MIN({custbody_reporting_ship_date})THEN {quantity} END)
I think this can't be achieved in saved search via simple formula. You may want to try below, what I will do is try to record the first and last order id in the customer record, then from customer saved search you can source from these order fields to the order details.
Create 2 custom entity fields applied to customer
first order id => Type:List/Record => List/Record:Transaction
last order id => Type:List/Record => List/Record:Transaction
Create 1 saved search to get the first order id per customer
Similar to the one that you created, the result field just need the customer internal id and min(document number) and filter need to add customer.internal id
Create 1 saved search to get the last order id per customer
Similar to the one that you created, the result field just need the customer internal id and max(document number) and filter need to add customer.internal id
Create 2 scheduled workflows, to update the first order id and last order id fields in respective customer record.
Create your first and last order customer Saved Searches using the 2 custom fields to source for the order details.
Not sure if this help.

Excel: Order by date within multiple IDs

I have a huge epidemiological dataset containing registry data with pathology reports and clinical information. I have merged several files into one masterfile in order to get all information from one file. Every patient is assigned an unique ID-number. Each patient can have several reports and hence the same ID number can be repeated several times in the ID column. For each ID entry = new row (= pathology or clinical report) there is a date of that sample/information reported.
My goal is to be able to read all pathology/clinical info for a particular ID within one row.
By sorting the IDs, I get a clear picture of the number of each ID that has been entered. The problem arises when there are several reports = multiple rows with identical ID because the dates within this one patients with several IDs = rows do not match. The dates come from pathology (sample date, answer date, clinical info date etc). The dates from pathology and clinical within one patient does not have to match exactly on the day but still within a reasonable timeframe e.g. within 1-2 months. This is best illustrated with an example.
I want to sort the columns so that dates from a particular row match together. I am sure there is a way to do that but I cannot figure it out.
Thanks in advance
The issue of mismatching records seems to arise once the two separate tables are merged into one. In order to fix this, there are several options you can take:
Re-do the merge but strengthen the way in which the tables are joined on.
Instead of only merging based on ID, see if there is another field that could easily connect the records, perhaps a medical record #, case #, or event #, and merge the tables based on this new field AND ID. This would be the strongest solution, however it will only work if you can find said field to strengthen the link.
A separate solution would be to first sort the original tables based on the dates so that they match up and then re-merging them together.
In theory this should solve your problem as I assume currently when matching up the two separate tables it is grabbing the first instance of patient X01 from both tables and matching them together. This can be confirmed by checking the merged query and looking to see if the mismatched records are in the same order as presented in the original tables. This is not perfect, as it relies on no clinical dates occurring between pathology dates for the record, so I would proceed with caution.
And to address your concern about losing track of ID's with multiple rows, this should not matter as in the end result after merged you can then sort by ID, however you can add multiple levels of sort by selecting the data and going to Data -> Sort -> Add Level. You can change the order in which the data is sorted (First by ID and then by Date).

Find each customer's second order date in a list of all orders

I have a list of all orders placed in a time range and I am trying to calculate the difference between each customer's first and second order. I have a list of unique customer IDs on another tab and I already have the first order date for each. I need a way to grab the date of the second order, corresponding to each customer ID.
In the sample data below, the correct output of this formula for customer ID "153950" would be "5/11/17 8:41".
Use AGGREGATE()
=AGGREGATE(15,6,Sheet1!C1:C4/(Sheet1!A1:A4 = 153950),2)
The ,2 is grabbing the second smallest date.

Query Couchdb by date while maintaining sort order

I am new to couchdb, i have looked at the docs and SO posts but for some reason this simple query is still eluding me.
SELECT TOP 10 * FROM x WHERE DATE BETWEEN startdate AND enddate ORDER BY score
UPDATE: It cannot be done. This is unfortunate since to get this type
of data you have to pull back potentially millions of records (a few
fields) from couch then do either filtering, sorting or limiting
yourself to get the desired results. I am now going back to my
original solution of using _changes to capture and store elsewhere the data i do need to perform that query on.
Here is my updated view (thanks to Dominic):
emit([d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate(), score], doc.name);
What I need to do is:
Always sort by score descending
Optionally filter by date range (for instance, TODAY only)
Limit by x
Update: Thanks to Dominic I am much closer - but still having an
issue.
?startkey=[2017,1,13,{}]&endkey=[2017,1,10]&descending=true&limit=10&include_docs=true
This brings back documents between the dates sorted by score
However if i want top 10 regardless of date then i only get back top 10 sorted by date (and not score)
For starters, when using complex keys in CouchDB, you can only sort from left to right. This is a common misconception, but read up on Views Collation for a more in-depth explanation. (while you're at it, read the entire Guide to Views as well since you're getting started)
If you want to be able to sort by score, but filter by date only, you can accomplish this by breaking down your timestamp to only show the degree you care about.
function (doc) {
var d = new Date(doc.date)
emit([ d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate(), score ])
}
You'll end up outputting a more complex key than what you currently have, but you query it like so:
startkey=[2017,1,1]&endkey=[2017,1,1,{}]
This will pick out all the documents on 1-1-2017, and it'll be sorted by score already! (in ascending order, simply swap startkey and endkey to get descending order, no change to the view needed)
As an aside, avoid emitting the entire doc as the value in your view. It is likely more efficient to leverage the include_docs=true parameter, and leaving the value of your emit empty. (please refer to this SO question for more information)
With this exact setup, you'd need separate views in order to query by different precisions. For example, to query by month you just use the year/month and so on.
However, if you are willing/able to sort your scores in your application, you can use a single view to get all the date precision you want. For example:
function (doc) {
var d = new Date(doc.date)
emit([ d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate(), d.getUTCHour(), d.getUTCMinutes(), d.getUTCSeconds(), d.getUTCMilliseconds() ])
}
With this view and the group_level parameter, you can get all the scores by year, month, date, hour, etc. As I mentioned, in this case it won't be sorted by score yet, but maybe this opens up other queries to you. (eg: what users participated this month?)

Resources