PromQL - Multiple time range exclusion

In query_range it is possible to specify a start and end date.
Is it possible to specify time range exclusions within start/end time range?
For example:
start=2020-12-19T00:00:00.000Z
end=2020-12-29T00:00:00.000Z
and excluding the period
exclusion_start=2020-12-21T00:00:00.000Z
exclusion_end=2020-12-21T23:00:00.000Z
?
Thanks!

In short: it is not possible, use separate queries instead.
The /query_range endpoint was initially designed for plotting time series in systems like Grafana. For that use case, range exclusion isn't really needed, in my opinion. But you can always issue multiple queries, one for each time range you need.
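As a minimal sketch of the "multiple queries" approach, the snippet below splits the requested range around the exclusion window and builds one /api/v1/query_range URL per remaining sub-range. The helper names (split_range, build_query_urls), the server address, the query "up", and the step "1h" are assumptions made for illustration; they are not part of the question or of Prometheus itself.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def split_range(start, end, exclusions):
    """Return the parts of [start, end] left after removing each exclusion window."""
    kept = []
    cursor = start
    for ex_start, ex_end in sorted(exclusions):
        if ex_end <= cursor or ex_start >= end:
            continue  # window does not overlap what is left of the range
        if ex_start > cursor:
            kept.append((cursor, ex_start))
        cursor = max(cursor, ex_end)
    if cursor < end:
        kept.append((cursor, end))
    return kept


def build_query_urls(base_url, query, step, ranges):
    """Build one /api/v1/query_range URL per remaining sub-range."""
    urls = []
    for sub_start, sub_end in ranges:
        params = {
            "query": query,
            "start": int(sub_start.timestamp()),
            "end": int(sub_end.timestamp()),
            "step": step,
        }
        urls.append(f"{base_url}/api/v1/query_range?{urlencode(params)}")
    return urls


UTC = timezone.utc
ranges = split_range(
    datetime(2020, 12, 19, tzinfo=UTC),
    datetime(2020, 12, 29, tzinfo=UTC),
    [(datetime(2020, 12, 21, tzinfo=UTC), datetime(2020, 12, 21, 23, tzinfo=UTC))],
)
# Hypothetical server address, query, and step; replace with your own.
urls = build_query_urls("http://localhost:9090", "up", "1h", ranges)
for url in urls:
    print(url)
```

Each URL can then be fetched separately and the results concatenated on the client side. The sub-ranges share their boundary timestamps with the excluded window, so if those exact instants must not be evaluated, shift the boundaries by one step.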

Related

Can I use MINIFS or INDEX/MATCH on two non-contiguous ranges...?

Problem is straightforward, but solution is escaping. Hopefully some master here can provide insight.
I have a big data grid with prices. Those prices are ordered by location (rows) and business name (cols). I need to match the location/row by looking at two criteria (location name and a second column). Once the matching row is found (there will always be a match), I need to get the minimum/lowest price from two ranges within the grid.
The last point is the real challenge. Unlike a normal INDEX or MINIFS scenario, the columns I need to MIN aren't contiguous... for example, I need to know what the MIN value is between I4:J1331 and Q4:U1331. It's not an intersection, it's a contiguous set of values across two different arrays.
You're probably saying "hey, why don't you just reorder your table to make them contiguous"... not an option. I have a lot of data, and this spreadsheet is used for a bunch of other stuff. So, I have to work with the format I have, and that means figuring out how to do a lookup/min across multiple non-contiguous ranges. My latest attempt:
=MINIFS(AND($I$4:$J$1331,$K$4:$P$1331),$B$4:$B$1331,$A2,$E$4:$E$1331,$B2)
Didn't work, but it should make it more clear what I'm trying to do. There has GOT to be an easy way to just tell excel "use these two ranges instead of one".
Thanks,
Rick
Figured it out. For anyone else who's interested, there doesn't seem to be any easy way to just "AND" arrays together for a search (Hello MS, backlog please). So, what I did instead was to just create multiple INDEX/MATCH arrays inside of a MIN function and take the result. Like this:
MIN((INDEX/MATCH ARRAY 1),(INDEX/MATCH ARRAY 2))
They both have identical criteria, the only difference is the set of arrays being indexed in each function. That basically gives me this:
MIN((match array),(match array))
And Min can then pull the lowest value from either.
Not as elegant as I'd like... lots of redundant code, but at least it works.
-rt
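For readers who want the logic spelled out outside Excel, here is a rough Python sketch of the same idea: find the row that matches both criteria, then take the minimum over several non-contiguous column slices. The data layout, field names, and sample prices are hypothetical and only illustrate the "MIN over two lookups" approach described above.

```python
def min_over_column_groups(rows, location, second_key, column_groups):
    """Return the lowest price across several column slices of the matching row.

    rows: list of dicts with "location", "key", and "prices" (one value per column).
    column_groups: list of (start, stop) index pairs, stop exclusive.
    """
    for row in rows:
        if row["location"] == location and row["key"] == second_key:
            # Gather values from every slice, skipping empty cells.
            values = [
                price
                for start, stop in column_groups
                for price in row["prices"][start:stop]
                if price is not None
            ]
            return min(values)
    raise LookupError("no matching row")


# Hypothetical sample grid: six price columns per row.
rows = [
    {"location": "Austin", "key": "North", "prices": [9.5, 7.0, 8.2, 3.1, 6.4, 5.9]},
    {"location": "Austin", "key": "South", "prices": [4.0, 4.5, 1.0, 2.0, None, 3.5]},
]

# Columns 0-1 and 4-5 play the role of the two non-contiguous ranges.
lowest = min_over_column_groups(rows, "Austin", "North", [(0, 2), (4, 6)])
print(lowest)
```

The value 3.1 in the first row is ignored because its column falls outside both slices, which mirrors excluding the columns between the two ranges.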

Spotfire DenseRank by category, do I use OVER?

I'm trying to rank some data in spotfire, and I'm having a bit of trouble writing a formula to calculate it. Here's a breakdown of what I am working with.
Group: the test group
SNP: what SNP I am looking at
Count: how many counts I get for the specific SNP
What I'd like to do is rank the average # of counts that are present for each SNP, within the group. Thus, I could then see, within a group, which SNP ranks #1, #2, etc.
Thanks!
TL;DR Disclaimer: You can do this, though if you are changing your cross table frequently, it may become a giant hassle. Make sure to double-check that logic is what you'd expect after any modification. Proceed with caution.
The basis of the Custom Expression you seem to be looking for is as follows:
Max(DenseRank(Count() OVER (Intersect([Group],[SNP])),"desc",[Group]))
This gives the total count of rows instead of the average; I was uncertain if "Count" was supposed to be a column or not. If you really do want to turn it into an average, make sure to adjust accordingly.
If all you have is the Group and the SNP nested on the left, you're done and good to go.
First Issue: When you filter it down, it gives you the dense rank of only those in the filtered set. In some cases this is good, and what you're looking for; in others, it isn't. If you want it to hold fast to its value, regardless of filtering, you can use the same logic, but put it in a Calculated Column instead of in the custom expression. Then, in your CrossTable Aggregation, get the max of the Calculated Column value.
Calculated Column:
DenseRank(Count() OVER (Intersect([Group],[SNP])),"desc",[Group])
Second Issue: You want to pivot by something other than Group and SNP. Perhaps, for example, by date? If you throw the Date across the top, it's going to show the same numbers for every month -- the overall numbers. This is not particularly helpful.
To a certain extent, Spotfire's Custom Expressions can handle this modification. If you only ever put a single column across the top, even if you switch which one it is, you could use the following:
Max(DenseRank(Count() OVER (Intersect([${Axis.Columns.ShortDisplayName}],[Group],[SNP])),"desc",[Group],[${Axis.Columns.ShortDisplayName}]))
That would automatically pull in the column from the top, and show you the ranking for each individual process date.
However, if you start nesting, using hierarchies, renaming your columns, or having multiple aggregations and throwing (Column Names) across the top, you're going to have to pay a great deal of attention to your custom expression. You'll need to do some form of string replacement around the Axis.Column, or use the expression instead of Short Names, and get rid of Nests, etc.
Any layer of complexity will require this sort of analysis, so if your end-users have access to modify the pivot table... honestly, I probably wouldn't give them this column.
Third Issue: I don't know if this is an issue, exactly, but you said "Average Counts" -- Average per day? Per Month? When averaging, you will need to decide if, for example, a month is the total number of days in month or the number of days that particular payor had data. However you decide to aggregate it, make sure you're doing it on the right level.
For the record, I liked the premise of this question; it's something I'd thought would be useful before, but never took the time to try to implement, since sorting a column or limiting a table to only show the top 10 values is much simpler.
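To make the ranking logic concrete outside Spotfire, here is a small Python sketch that dense-ranks SNPs within each group. Note that it ranks by the average of the Count values, as the question asks, rather than by the row count used in the expressions above; the function name and sample records are made up for illustration.

```python
from collections import defaultdict


def dense_rank_by_group(records):
    """Dense-rank each SNP within its group by average count (rank 1 = highest).

    records: iterable of (group, snp, count) tuples.
    Returns a dict mapping (group, snp) to its rank.
    """
    totals = defaultdict(lambda: [0, 0])  # (group, snp) -> [sum of counts, rows]
    for group, snp, count in records:
        entry = totals[(group, snp)]
        entry[0] += count
        entry[1] += 1

    averages_by_group = defaultdict(dict)
    for (group, snp), (total, n) in totals.items():
        averages_by_group[group][snp] = total / n

    ranks = {}
    for group, averages in averages_by_group.items():
        # Dense rank: ties share a rank and no rank numbers are skipped.
        distinct = sorted(set(averages.values()), reverse=True)
        rank_of = {value: position + 1 for position, value in enumerate(distinct)}
        for snp, average in averages.items():
            ranks[(group, snp)] = rank_of[average]
    return ranks


# Hypothetical sample data.
records = [
    ("A", "rs1", 10), ("A", "rs1", 20), ("A", "rs2", 15), ("A", "rs3", 5),
    ("B", "rs1", 1), ("B", "rs2", 4),
]
ranks = dense_rank_by_group(records)
print(ranks)
```

In group A, rs1 and rs2 both average 15 and share rank 1, so rs3 gets rank 2 rather than 3; that is the difference between a dense rank and an ordinary rank.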

How to optimize COUNTIFS with very large data

I would like to create a report that looks like the picture below.
My data has around 500,000 cells (and it will continue to grow).
Right now I'm using Excel's COUNTIFS function, but it takes a very long time to calculate (and I cannot turn off automatic calculation).
The main value is a date, and the dates span about 3 years, so I need a lot of formulas to cover the whole range of values.
[Image: result]
The picture below is the data source: the top one cannot be changed, while the bottom one is the one I created myself (it can be changed). I use WEEKNUM to convert each date to a week number.
[Image: data]
Is there a better formula, or any other way to make this file faster? All kinds of suggestions are welcome!
I was thinking about using a Pivot Table, but I don't know how to make a pivot table from this kind of data source.
PS. VBA is the last option.
You can download example file here: https://www.mediafire.com/?t21s8ngn9mlme2d
I will post this answer with the disclaimer that it is entirely dependent on the size of the data set. Turning automatic calculation off and on is the best way, but your question rules that out, so keep reading.
Your question made me curious, so I gave it a try and timed it. I set up two columns of over 100,000 random numbers between 1 and 1000 and then tried to do a COUNTIF on the two columns where they were equal. I made a macro that turns off automatic calculation, inserts the start time, calculates, and then inserts the finish time. I highlighted the time difference in yellow.
First I tried your way, COUNTIFS with two criteria:
Then I tried to combine (concatenate) the two columns to see if I could make it easier by having only one COUNTIF criterion and data set. It doesn't. See the result below:
Finally, realizing what was going on, I decided to make the criteria match only the FIRST character of the number to look for. I was essentially reducing the number of characters to check per cell. This had a positive result. See below:
Therefore my suggestion is to limit the length of the values you are comparing in any way possible. You are mostly looking at dates, so you might have to get creative, but this seems to be the best way possible without going to manual calculation.
I have worked with Excel sheets of a similar size. Especially if you are using the data on a regular basis, I would heartily recommend switching to a proper database: SQL-based, Access, or whatever fits your purpose. It does wonders for the speed, and you won't run into the size limits of Excel. :-)
You can import the data you have now fairly easily.
I am happy as a clam with my postgresql db.

Dynamically Pick a Dynamic Range

I've spent several hours trying to come up with a clean solution to this, and I just don't seem to be able to.
Basically, I have several dynamic ranges that I have defined, and I want to select one based upon a condition. It seems so straightforward! The reason that I want to select this dynamic range is so that I can graph the range.
Indirect does not work.
DIndirect (A common VBA algorithm) does not work.
(Or I can't get them to)
Choose works! However, the maximum length of a dynamic range is very quickly reached, which means I'm forced into inane solutions like the following.
I define dynamic_dynamic
=IF(A1<3,CHOOSE(A1,'WorksheetName'!dynamic_range_1, 'WorksheetName'!dynamic_range_2),'WorksheetName'!dynamic_dynamic_2)
Then I define dynamic_dynamic_2
=IF(A1<5,CHOOSE(A1-2,'WorksheetName'!dynamic_range_3, 'WorksheetName'!dynamic_range_4),'WorksheetName'!dynamic_dynamic_3)
Then I define dynamic_dynamic_3
=IF(A1<7,CHOOSE(A1-4,'WorksheetName'!dynamic_range_5, 'WorksheetName'!dynamic_range_6),'WorksheetName'!dynamic_dynamic_4)
.... And so on.
Really? I'm sure I'm being an idiot, but nothing else seems to work!

Amazon Cloud Search - get places by time and date

I am using Amazon CloudSearch to store a large set of places.
Each place has an opening time and a closing time for each day of the week.
I need to retrieve places by current time. How do you suggest modeling the index?
I am thinking of solving the problem by creating 7 text indexes in which I specify, for each day of the week, the valid hours.
For example, if a place is open from 9 am to 1 pm, in the index "monday" I will write the string "9-10-11-12". Then, filtering by bq=monday:'10' or bq=monday:'16', I will get only the places that are open at the specified time.
Any other ideas? My solution seems to work, but would you suggest another approach?
First, I wouldn't use multiple indexes.
You could use your approach, but just make the time in hours from the start of the week. So, Monday would be 0-23, Tuesday 24-47, etc. Or you could just have 7 fields, "monday_hours", "tuesday_hours", …
You could also use uints, instead of strings. Not better, but different, might be worth benchmarking.
With uints you can use range queries. If the document contains the fields "open" and "close" and you want to know whether it's open between 10 and 12, you could use:
&bq=(and open:..12 close:10..)
One issue remaining is that CloudSearch's range searches are inclusive of endpoints. So I think this will show a false positive if the store opens at twelve. Technically, the ranges overlap, but not usefully. To fix that, I'd do two things. First, I wouldn't go by hours, I'd use minute-of-the-day as the value in the field (0 to 1439). Then add one to the starting range, and subtract one from the end.
Using uints will perform differently from using text fields. I'd definitely benchmark them to see which one works better for you.
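A minimal sketch of the minute-of-the-day idea, assuming uint fields named "open" and "close" as in the example above and the same range syntax. The helper name and the (hour, minute) input format are hypothetical; the function only builds the bq string and does not call CloudSearch.

```python
def open_during_query(window_start, window_end):
    """Build a bq expression for places open throughout a time window.

    window_start and window_end are (hour, minute) tuples on the same day.
    Times are converted to minute-of-the-day (0 to 1439). The bounds are then
    tightened by one minute so that inclusive range endpoints do not match a
    place that opens exactly when the window ends or closes exactly when it
    starts.
    """
    start_minute = window_start[0] * 60 + window_start[1]
    end_minute = window_end[0] * 60 + window_end[1]
    if end_minute < start_minute:
        raise ValueError("window_end must not be earlier than window_start")
    return f"(and open:..{end_minute - 1} close:{start_minute + 1}..)"


# Places open between 10:00 and 12:00.
bq = open_during_query((10, 0), (12, 0))
print(bq)
```

With these bounds, a place that opens at 12:00 (minute 720) no longer matches a query for the 10:00 to 12:00 window, which addresses the false positive described above.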

Resources