how can i get max Date and Min Date from a list Date Column
The brute force approach is to create two queries that will retrieve the list content sorted by date asc and desc. I know that this sucks but at least you can move on with you project and refine the query later on.
If only it was possible to retrieve top 1 then it might even work in production.
Related
I have a large number of records indexed on some startDateTime field, and want to select aggregates (SUM and COUNT) on all records grouped by WEEKOFYEAR(startDateTime) (i.e., EXTRACT(WEEK FROM startDateTime)). Can I put a secondary index on EXTRACT(WEEK FROM startDateTime)? Or, even better, will the query use an index on startDateTime appropriately to optimize a request grouped by WEEK?
See this similar question about MySQL indices. How would this be handled in the Cloud Spanner world?
Secondary index on generated columns (i.e., EXTRACT(WEEK FROM startDateTime)) are not supported yet. If you have a covering index that includes all the columns required for the query (i.e., startDateTime and other required columns for grouping and aggregation), the planner will use such covering index over the base table but the aggregation is likely to be based on hash aggregation. Unless you aggregate over very long period of time, it should not be a big problem (I admit that it is not ideal though).
If you want to restrict the aggregated time range, you need to spell it out in terms of startDateTime (i.e., you need to convert the min/max datetime to the same type as startDateTime).
Hope this helps.
I have a column of data [Sales ID] that bringing in duplicate data for an analysis. My goal is to try and limit the data to pull unique sales ID's for the max day of every month in the analysis only (instead of daily). Im basically trying to get it to only pull in unique sales ID values for the last the day of every month in the analysis ,and if the current day is the last day so far then it should pull that in. So it should pull in the MAX date in any given month. Please how do i write an expresion with the [Sales ID] column and [Date ] column to acieve this?
Probably the two easiest options are to
1) Adjust the SQL as niko mentioned
2) Limit the visualization with the "Limit Data Using Expression" option, using the following:
Rank(Day([DATE]), "desc", Month([DATE]), Year([DATE])) = 1
If you had to do it in the Data on Demand section (maybe the IL itself is a usp or you don't have permission to edit it), my preference would be to create another data table that only has the max dates for each month, and then filter your first data table by that.
However, if you really need to do it in the Data on Demand section, then I'm guessing you don't have the ability to create your own information links. This would mean you can't key off additional data tables, and you're probably going to have to get creative.
Constraints of creativity include needing to know the "rules" of your data -- are you pulling the data in daily? Once a week? Do you have today's data, or today - 2? You could probably write a python script to grab the last day of every month for the last 10 years, and then whatever yesterday's date was, and throw all those values into a document property. This would allow you to do a "Values from Property".
(Side Note: I want to say you could also do it directly in the expression portion with something like an extremely long
Date(DateTimeNow()),DateAdd("dd",-1,Date(Year(DateTimeNow()), Month(DateTimeNow()), 1))
But Spotfire is refusing to accept that as multiple values. Interestingly, when I pull the logic for a StringList property, it gives this: $map("${udDates}", ","), which suggests commas are an accurate methodology, but I get an error reading "Expected 'End of expression' but found ','" . Uncertain if this is a Spotfire issue, or related to my database connection)
tl;dr -- Doing it in the Data on Demand section is probably convoluted. Recommend adjusting in SQL if possible, and otherwise limiting in the visualization
I am new to couchdb, i have looked at the docs and SO posts but for some reason this simple query is still eluding me.
SELECT TOP 10 * FROM x WHERE DATE BETWEEN startdate AND enddate ORDER BY score
UPDATE: It cannot be done. This is unfortunate since to get this type
of data you have to pull back potentially millions of records (a few
fields) from couch then do either filtering, sorting or limiting
yourself to get the desired results. I am now going back to my
original solution of using _changes to capture and store elsewhere the data i do need to perform that query on.
Here is my updated view (thanks to Dominic):
emit([d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate(), score], doc.name);
What I need to do is:
Always sort by score descending
Optionally filter by date range (for instance, TODAY only)
Limit by x
Update: Thanks to Dominic I am much closer - but still having an
issue.
?startkey=[2017,1,13,{}]&endkey=[2017,1,10]&descending=true&limit=10&include_docs=true
This brings back documents between the dates sorted by score
However if i want top 10 regardless of date then i only get back top 10 sorted by date (and not score)
For starters, when using complex keys in CouchDB, you can only sort from left to right. This is a common misconception, but read up on Views Collation for a more in-depth explanation. (while you're at it, read the entire Guide to Views as well since you're getting started)
If you want to be able to sort by score, but filter by date only, you can accomplish this by breaking down your timestamp to only show the degree you care about.
function (doc) {
var d = new Date(doc.date)
emit([ d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate(), score ])
}
You'll end up outputting a more complex key than what you currently have, but you query it like so:
startkey=[2017,1,1]&endkey=[2017,1,1,{}]
This will pick out all the documents on 1-1-2017, and it'll be sorted by score already! (in ascending order, simply swap startkey and endkey to get descending order, no change to the view needed)
As an aside, avoid emitting the entire doc as the value in your view. It is likely more efficient to leverage the include_docs=true parameter, and leaving the value of your emit empty. (please refer to this SO question for more information)
With this exact setup, you'd need separate views in order to query by different precisions. For example, to query by month you just use the year/month and so on.
However, if you are willing/able to sort your scores in your application, you can use a single view to get all the date precision you want. For example:
function (doc) {
var d = new Date(doc.date)
emit([ d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate(), d.getUTCHour(), d.getUTCMinutes(), d.getUTCSeconds(), d.getUTCMilliseconds() ])
}
With this view and the group_level parameter, you can get all the scores by year, month, date, hour, etc. As I mentioned, in this case it won't be sorted by score yet, but maybe this opens up other queries to you. (eg: what users participated this month?)
I am using a couchDB database.
I can get all documents by category and paginate results with a key like ["category","document_id"]and a query likestartkey=["category","document_id"]&endkey=["category",{}]`
Now I want to sort those results by date to have latest documents first.
I tried a lot of keys such as ["category","date","document_id"]
but nothing works (or I can't get it working).
I would use something like
startkey=["queried_category","queried_date","queried_document_id"]&endkey=["queried_category"]
but ignore the "queried_date" key part (sort but do not take documents where "document_id" > "queried_document_id")
EDIT:
Example :
With a key like :
startkey=["apple","2012-12-27","ZZZ"]&endkey=["apple",{}]&descending=true
I will have (and it is the normal behavior)
"apple","2012-12-27","ABC"
"apple","2012-05-01","EFG"
...
"apple","2012-02-13","ZZZ"
...
But the result set I want should start with
"apple","2012-02-13","ZZZ"
Emit the category and the timestamp (you don't need the document_id):
emit(category, timestamp);
And then filter on the category:
?startkey=[":category"]&endkey=[":category",{}]
You must understand that this is only a sort, so you need the startkey to be before the first row, and the endkey to be after the last row.
Last but not least, don't forget to have a representation for the timestamp that is adequate to the sort.
The problem with pagination with timestamp instead of doc ID is that timestamp is not unique. That's why you will have problem with paging Aurélien's solution.
I would stay with what you tried but use timestamp as the number (standard UNIX milliseconds since 1970). You can reverse the order of single numeric field just by multiplying by -1:
emit(category, -timestamp, doc_id)
This way result sorted lexicographically (ascending) will be ordered according to your needs:
first dates descending,
then document id's ascending.
I have a storage of news with the following fields (Title, Body, NewsDate)
I need a best query with the following criteria
1) title is more important but less than date
2) date should be compare to the current date if the date of a document is near the current date it is more valuable (NOTE: It doesn't mean that sorting descending on news date cause there maybe results that their title and its body is more relevant but its older)
this is just another factor for searching and i think it needs custom sorting
3) body has is in the third place
Any solution ?
Like #Guillaume said, you need to use boosting.
You can employ in 2 places: one while indexing (boost title and body), and second (the date field) while querying. The date-field is query-time since it is dynamic
Index time boosting would be like:
Field fld = new Field(....);
fld.setBoost(10f);//10x more important, 1 is default
Query time boost would be get the date diff (say in days or mins) and apply the boost inversely i.e. the greater the diff. the smaller the boost.
You should use Boosting in your schema, instead of very complicated queries.