Presto: how to fill in gap and copy value from previous rows

I have a table with 2 columns:
Percent   Value
-------   -----
99.95     230
99.92     130
99.05     94
I want to change this so that if there are gaps in the Percent column (e.g. 99.94, 99.93, 99.91...), a row is created for each missing percent with the value from the previous row. For example, 99.94 and 99.93 would have a value of 130, and 99.91 would have a value of 94.
A window function requires knowing a fixed offset, and I don't think I can use one to produce a result with more rows than the source table.
I think I can make it work by generating a number sequence table and cross joining it with this table. However, I don't know how to generate a dummy CTE with a number sequence from 00.00 to 100.00 at 0.01 increments.
Any help would be appreciated.

As you suggested in your question, you can do it with a sequence table (by unnesting the output of the sequence function) and the lag window function like this:
WITH data(p, v) AS (VALUES
    (99.95, 230),
    (99.92, 130),
    (99.05, 94)
),
sequence(p) AS (
    SELECT x/100.00 FROM unnest(sequence(1, 10000)) t(x)
)
SELECT
    sequence.p,
    coalesce(v, lag(v) IGNORE NULLS OVER (ORDER BY sequence.p))
FROM data RIGHT JOIN sequence ON data.p = sequence.p
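
To make the logic of the query easier to follow, here is a rough Python sketch of the same idea: walk an ascending grid of percents and carry forward the last value seen. It is not Presto code; the function name fill_gaps and the use of Decimal keys are assumptions made for illustration only.

```python
from decimal import Decimal


def fill_gaps(data, lo=1, hi=10000):
    """Return (percent, value) pairs for every 0.01 step from lo/100 to hi/100.

    `data` maps a Decimal percent to its value. Each missing percent takes
    the value of the closest lower percent that has one, which is what
    coalesce(v, lag(v) IGNORE NULLS ...) over an ascending order does.
    """
    filled = []
    last_seen = None  # last non-null value met so far (None before the first one)
    for x in range(lo, hi + 1):
        p = Decimal(x) / Decimal(100)
        if p in data:
            last_seen = data[p]
        filled.append((p, last_seen))
    return filled


data = {Decimal("99.95"): 230, Decimal("99.92"): 130, Decimal("99.05"): 94}
result = dict(fill_gaps(data))
print(result[Decimal("99.94")], result[Decimal("99.91")])
```

Percents below the lowest known row (99.05 here) have nothing to copy from and stay empty, just as they come back as NULL from the query.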

Related

Excel Array formula to count moving average outliers

I've tried a few things on this and settled on a 'cheap' solution. Wanted to know if this can be done directly and more elegantly.
Problem Statement and Sample Data
Assume we have a table in Excel with ~200 columns and a large number of rows (~10k).
Sample Data:
identifier  val1  val2  val3  ...  val200
ID_1        100   102   34    ...  89
We want to add a column at the end that shows us how many "moving average" outliers exist. A moving average outlier is defined as a point that is outside the range (mean - 2 * std deviations, mean + 2 * std deviations), where the mean and std dev are calculated using the previous 10 values (hence "moving average" outlier).
We will not test the first 10 values. But from val11 onwards, the previous 10 values will be used to form the window, and we want to test whether the value is an outlier.
My Solution so far
I created another table with the same dimensions as the original. In the new table, I put the formula below in the cells from val11 to val200. Then I can simply sum across the columns of each row in the new table.
Assume val11 is in X2 in the "shocks" worksheet (for the first row):
=IF(OR(shocks!X2<AVERAGEA(shocks!D2:W2)-2*STDEVA(shocks!D2:W2),shocks!X2>AVERAGEA(shocks!D2:W2)+2*STDEVA(shocks!D2:W2)),1,"")
But if possible, I want to avoid having a second table since it bloats and slows down the file. Any help would be greatly appreciated.
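
For cross-checking the counts outside Excel, the outlier definition above can be sketched in Python. This is only an illustration of the rule, not an Excel formula; the function name count_moving_outliers and its parameters are assumptions, and the sample standard deviation is used on the assumption that it matches what STDEVA computes.

```python
from statistics import mean, stdev


def count_moving_outliers(values, window=10, k=2.0):
    """Count values lying outside mean +/- k * stdev of the previous `window` values.

    The first `window` values are never tested, as in the question.
    """
    count = 0
    for i in range(window, len(values)):
        previous = values[i - window:i]  # the `window` values just before position i
        m = mean(previous)
        s = stdev(previous)  # sample standard deviation (n - 1 in the denominator)
        if values[i] < m - k * s or values[i] > m + k * s:
            count += 1
    return count


row = [10] * 10 + [10, 50, 10]
print(count_moving_outliers(row))
```

Each row of the sheet would be passed as one list, so no second table of intermediate flags is needed.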

Why is my SUMX DAX function returning this result?

Suppose I have 2 tables:
fTransactions
ProdID RepID Revenue
1 1 10
1 1 10
1 2 10
dSalesReps
RepID RepName
1 joe
2 sue
With dSalesReps having the following measures with no filters applied yet:
RepSales:=CALCULATE(SUM(fTransactions[Revenue]))
RepSales2:=SUMX(fTransactions, CALCULATE(SUM(fTransactions[Revenue])))
The first measure performs how I think it would. It goes to the fTransactions table and sums up the Revenue column.
The second measure, after a lot of trial and error to figure it out, seems to sort of group itself on unique rows in fTransactions. In the above example, fTransactions has 2 rows where everything is identical, then a last row where something is different. This seems to result in the following:
(10 + 10) first iteration that sums the first "grouping"
+
(10 + 10) second iteration that sums the first "grouping" again
+
(10) last iteration that sums the second "grouping"
= 20 + 20 + 10 = 50
At least that's how it looks to be operating. I just don't understand why. I thought it would go to the fTransactions table, sum all of Revenue for each iteration, then sum those sums as a final step.

This is caused by something called "context transition" (see the SQLBI article on the topic for a more detailed explanation).
In practice, your formula "RepSales2" uses a row context (created by SUMX) which is turned into an equivalent filter context (by CALCULATE). Since the table has no unique key, that filter matches multiple rows in some iterations, as explained below.
For the first row, the row context is ProdID=1 AND RepID=1. Turned into an equivalent filter context it stays the same (ProdID=1 AND RepID=1), but a filter context applies to the whole table, and two rows (the first 2) match this filter.
This is repeated for each row.
It does not happen with the formula "RepSales" because it does not iterate over the table (as you already noticed).
To prove that, just add a RowID column to the transaction table. The duplication then disappears because the equivalent filter context also includes the RowID column, which matches only one row.
Hope this helps. Use the SQLBI article as a reference; it is an exhaustive guide to understanding this.
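
The iteration described above can be mimicked in plain Python as a rough sketch. This is not DAX and not how the engine is implemented; the function name sumx_calculate_sum and the added RowID column are assumptions used only to illustrate why duplicate rows are counted more than once.

```python
def sumx_calculate_sum(rows, value_col="Revenue"):
    """Mimic SUMX(table, CALCULATE(SUM(table[value_col]))).

    For every row (row context) a filter is built from all of that row's
    column values (context transition), and the value column is summed over
    every row of the table that matches the filter.
    """
    total = 0
    for current in rows:
        matching = [r for r in rows if r == current]  # identical rows all match
        total += sum(r[value_col] for r in matching)
    return total


transactions = [
    {"ProdID": 1, "RepID": 1, "Revenue": 10},
    {"ProdID": 1, "RepID": 1, "Revenue": 10},
    {"ProdID": 1, "RepID": 2, "Revenue": 10},
]

# No unique key: the two identical rows match each other's filter.
without_key = sumx_calculate_sum(transactions)

# With a unique RowID every filter matches exactly one row.
keyed = [{"RowID": i, **r} for i, r in enumerate(transactions, start=1)]
with_key = sumx_calculate_sum(keyed)

print(without_key, with_key)
```

The first call reproduces the 20 + 20 + 10 = 50 from the question, and the keyed version falls back to the plain total of 30.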

Why does the DAX formula in my calculated column use propagation to filter in one instance and not in another?

Suppose I have a couple of tables:
fTransactions
Index  ProdID  RepID  Revenue
1      1       1      10
2      1       1      20
3      2       2      30
4      2       2      10
dSalesReps
RepID  RepName  CC1  CC2
1      joe      40   70
2      sue      30   70
3      bob           70
CC1 contains a calculated column with:
CALCULATE(SUM(fTransactions[Revenue]))
It's my understanding that it's taking the row context and changing to filter context to filter the fTransaction table down to the RepID and summing. Makes sense per an sqlbi article on the subject:
"because the filter context containing the current product is automatically propagated to sales due to the relationship between the two tables"
CC2 contains a calculated column with:
SUMX(fTransactions, CALCULATE(SUM(fTransactions[Revenue])))
However, this one puts the same value in every row and doesn't seem to propagate the RepID like the other example. The same SQLBI article mentions that a filter is made on the entire fTransactions row. My question is why it does that here and not in the other example, and what happened to the propagation of RepID?
"CALCULATE places a filter on all the columns of the table to identify a single row, not on its row number"

A calculated column is created in a loop: Power Pivot goes row by row and calculates the results. CALCULATE converts each row into a filter context (context transition).
In the second formula, however, you have 2 loops, not one:
First, it loops over the dSalesReps table (because that's where you are creating the column);
Second, it loops over the fTransactions table, because you are using the SUMX function, which is an iterator.
The CALCULATE function is used only in the second loop, forcing context transition for each row in the fTransactions table. But there is no CALCULATE that can force context transition for the rows in dSalesReps. Hence, there is no filtering by sales rep.
Fixing the problem is easy: just wrap the second formula in CALCULATE. Better yet, also drop the inner CALCULATE and sum the column directly; the inner context transition is not necessary and makes the formula slow:
CC2 =
CALCULATE(
    SUMX(fTransactions, fTransactions[Revenue])
)
This formula is essentially identical to the first one: behind the scenes, SUM(fTransactions[Revenue]) is translated to SUMX(fTransactions, fTransactions[Revenue]), so SUM is just syntactic sugar for SUMX.
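
The two loops can be pictured with a small Python sketch. It only imitates the filtering behaviour described above and is not DAX; the function names cc1, cc2_original and cc2_fixed are hypothetical, and the relationship between the tables is assumed to be on RepID.

```python
transactions = [
    {"Index": 1, "ProdID": 1, "RepID": 1, "Revenue": 10},
    {"Index": 2, "ProdID": 1, "RepID": 1, "Revenue": 20},
    {"Index": 3, "ProdID": 2, "RepID": 2, "Revenue": 30},
    {"Index": 4, "ProdID": 2, "RepID": 2, "Revenue": 10},
]
rep_ids = [1, 2, 3]


def cc1(rep_id):
    # CALCULATE(SUM(...)): the current dSalesReps row becomes a filter that
    # reaches fTransactions through RepID. An empty match gives 0 here,
    # whereas the real column would show a blank.
    return sum(t["Revenue"] for t in transactions if t["RepID"] == rep_id)


def cc2_original(rep_id):
    # SUMX(fTransactions, CALCULATE(SUM(...))): nothing turns the rep row
    # into a filter, so rep_id is unused and the whole table is iterated.
    total = 0
    for current in transactions:
        total += sum(t["Revenue"] for t in transactions if t == current)
    return total


def cc2_fixed(rep_id):
    # CALCULATE(SUMX(fTransactions, fTransactions[Revenue])): the outer
    # CALCULATE filters the table by rep before SUMX iterates over it.
    visible = [t for t in transactions if t["RepID"] == rep_id]
    return sum(t["Revenue"] for t in visible)


for rep_id in rep_ids:
    print(rep_id, cc1(rep_id), cc2_original(rep_id), cc2_fixed(rep_id))
```

In this sketch cc2_original returns the same grand total for every rep, while cc2_fixed agrees with cc1 row by row.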

You could also write the formula as:
CC2 = SUMX( RELATEDTABLE( fTransactions ), fTransactions[Revenue] )
or
CC2 = SUMX( CALCULATETABLE( fTransactions ), fTransactions[Revenue] )
The key is that fTransactions as the first argument of SUMX needs to be filtered for each SalesRep (i.e. on the current row). Without the filter then you are just iterating the entire fTransactions table for each SalesRep. Somehow SUMX needs to know you just want the fTransactions for the SalesRep whose revenue you are trying to compute.

percent_rank special case to not include the value being evaluated in the range of group to be evaluated

Consider these values:
company_ID 3yr_value
1 10
2 20
3 30
4 40
5 50
I have this expression in my query, and my goal is to compute the percent rank of the value 50 within the group:
round(((percent_rank() over (partition by bb.company_id order by bb.3yr_value)) * 100))
In Excel, this is equivalent to
=percentrank(b1:b5,b5)
BUT what I need is an equivalent of =percentrank(b1:b4,b5). Notice that I don't include B5 in the range that is evaluated. I'm out of options; I have searched online but still can't find a solution, and I always end up including B5 in my query.
I'm using PostgreSQL.

How to return a value from a range of values

I would appreciate it if someone can answer this.
Let's say I have multiple rows with three columns: min, max and the return value. I want to create a single formula that finds the row whose min and max contain the input and gives back that row's return value. Let me just show it:
Min    Max  Return
0.01   10   0
10.01  20   5
20.01  30   12
30.01  40   15
Input 7 <---- User input
Return 0 <---- This should be calculated based on the user input against the table
Input 33 <---- User input
Return 15 <---- This should be calculated based on the user input against the table

If you mean a SQL query, here is a query that does the job for you:
SELECT Return FROM TABLE_Name
WHERE
Input >= Min AND Input <= Max

Ok, I'll attempt another try:
=SUMPRODUCT(C2:C5*(F1>=A2:A5)*(F1<=B2:B5))
C2:C5 are the results, A2:A5 the minimum values, B2:B5 the maximum values and F1 the actual value.
Basically, SUMPRODUCT can be used as it does the calculation for every row and sums up the results. If the test succeeds, 1 is returned, otherwise 0. Thus, only the successful test will have a 1, all others will multiply their result with 0.
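
The same multiply-by-zero-or-one trick can be sketched in Python to see what the formula computes. The function name lookup_return and the list named bands are assumptions for illustration; the numbers are the ones from the question.

```python
def lookup_return(x, bands):
    """Sum of ret * (x >= lo) * (x <= hi) over all rows, like the SUMPRODUCT formula.

    Each comparison is True (1) or False (0), so only the row whose range
    contains x contributes its return value.
    """
    return sum(ret * (x >= lo) * (x <= hi) for lo, hi, ret in bands)


bands = [
    (0.01, 10, 0),
    (10.01, 20, 5),
    (20.01, 30, 12),
    (30.01, 40, 15),
]

print(lookup_return(7, bands), lookup_return(33, bands))
```

One caveat of this approach: an input outside every range also yields 0, which cannot be told apart from a match on the first row, whose return value is 0.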

If I understood the question correctly, some nested IFs would do like (assuming input in A4 and the ranges like in the table):
=IF(AND(A4>B1,A4<B2),B3,IF(AND(A4>C1,A4<C2),C3,...
For more complex (meaning longer) tables you could also use a "helper" column (Column D):
=IF(AND($A$4>B1,$A$4<B2),B3,"")
"Drag" this down to copy it and then sum the column to get the result.
All a bit of a mess, but I can't think of any more elegant solution using Excel formulas.
