Calculated column which references its previous value and current value - calculated-columns

I have a requirement where I am creating a calculated column. This column needs to use its own previous value and also its current value (i.e. to perform a cumulative calculation).
I know this is possible in Tableau by using the function
PREVIOUS_VALUE(-1)
so I can do something like
x (calc) = sum(x(calc) + PREVIOUS_VALUE(-1))
How can this be performed in Spotfire? In other words, what is the equivalent Spotfire function to
PREVIOUS_VALUE(-1) (from Tableau)
Here is the equivalent implementation in Excel, where [WDVpt2] is the calculated field.

This is done with the OVER function. You can read about OVER and its node navigation methods in the TIBCO documentation. The formula you are looking for is:
(Sum([x]) over (AllPrevious([x]))) - [x]
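AllPrevious uses all nodes, including the current one, from the start of the level, so it can be used to calculate a cumulative sum.
Subtracting [x] afterwards removes the current row's value, leaving the sum of the previous rows only.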
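To illustrate what that expression computes outside Spotfire, here is a minimal Python sketch. It assumes the rows are already in the intended order, and the function name and sample values are hypothetical.

```python
from itertools import accumulate


def sum_of_previous(values):
    """For each row, return the sum of all earlier rows (current row excluded)."""
    # Inclusive running total, analogous to Sum([x]) OVER (AllPrevious(...)).
    running = list(accumulate(values))
    # Subtract the current value, analogous to the trailing "- [x]".
    return [total - current for total, current in zip(running, values)]


x = [10, 20, 30, 40]  # hypothetical sample column
print(sum_of_previous(x))  # [0, 10, 30, 60]
```

Note that this only sums a static input column. It does not feed a computed value back into the next row.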

I was able to achieve this with the following approach. It is a limitation of Spotfire that a calculated column cannot refer to its own value at the previous node. In Tableau, LOOKUP and PREVIOUS_VALUE support this. (This is not to be confused with Spotfire's Previous(), which can only be applied to a static column [x].) So an expression such as [X_Calc] = [X_Calc] + [X_Calc] over (AllPrevious([ROWID])) is not possible.
Step 1: Create a row ID column based on the period: Rowid([YEAR])
Step 2: Break the result for each row into a separate calculated column, starting with the value for the first row:
First(Sum([WDVpt1]) over (AllPrevious([RowID])) - (Sum([WDVpt1]) over AllPrevious([RowID])) * [Policy rate])) over (AllPrevious([RowID])))
Step 3: Create columns for the remaining periods (3 to 16), each using the previous calculated column. For example, the column for node 3 uses [WDVpt2_Year2], the column for node 4 uses [WDVpt2_Year3], and so on:
Sum((case when [RowID]=2 then [WDVpt1] end) + Min([WDVpt1]) over ([RowID]) - (((case when [RowID]=2 then [WDVpt1] end) + Min([WDVpt2_Year2]) over (intersect([asset class code],[RowID]))) * [Policy rate])) over (AllPrevious([RowID])))
Step 4: Write a case statement to combine the per-period columns into one single column:
case
when [RowID]=2 then first([WDVpt2_Year2])
when [RowID]=3 then first([WDVpt2_Year3])
when [RowID]=4 then first([WDVpt2_Year4])
when [RowID]=5 then first([WDVpt2_Year5])
when [RowID]=6 then first([WDVpt2_Year6])
when [RowID]=7 then first([WDVpt2_Year7])
when [RowID]=8 then first([WDVpt2_Year8])
when [RowID]=9 then first([WDVpt2_Year9])
when [RowID]=10 then first([WDVpt2_Year10])
when [RowID]=11 then first([WDVpt2_Year11])
when [RowID]=12 then first([WDVpt2_Year12])
when [RowID]=13 then first([WDVpt2_Year13])
when [RowID]=14 then first([WDVpt2_Year14])
when [RowID]=15 then first([WDVpt2_Year15])
when [RowID]=16 then first([WDVpt2_Year16])
end as [WDVpt2]
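For comparison, the dependency that this workaround unrolls into one column per period is an ordinary loop in a general-purpose language. The Python sketch below assumes a recurrence of the form wdv[t] = (wdv[t-1] + addition[t]) * (1 - rate). That recurrence is inferred from the formulas above and is not confirmed by the source, and all names and sample numbers are hypothetical.

```python
def written_down_values(additions, rate, opening=0.0):
    """Carry each period's result forward into the next period.

    Assumed recurrence (inferred, not confirmed by the source):
        wdv[t] = (wdv[t-1] + additions[t]) - (wdv[t-1] + additions[t]) * rate
    """
    results = []
    previous = opening
    for addition in additions:
        base = previous + addition      # previous result plus this period's input
        current = base - base * rate    # apply the policy rate to the combined base
        results.append(current)
        previous = current              # this is the "previous value" for the next row
    return results


# Hypothetical inputs: three periods, 50% policy rate.
print(written_down_values([1000.0, 0.0, 500.0], rate=0.5))  # [500.0, 250.0, 375.0]
```

If the data can be pre-processed before it is loaded into Spotfire, computing the column this way may be simpler than maintaining one calculated column per period.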

Related

EXCEL - Dual VLOOKUP and Interpolation

I have a table on Excel with data as the following:
Meaning, I have different JPH based on the %SMALL unit and the number of active stations.
I need to create a matrix like the following (with %SMALL on horizontal and STATIONS on vertical axes):
And the formula for each cell should:
Take the input of Stations (column "B")
Check, for that specific Stations number, the matching data in the other table (i.e. filter on STATIONS for that number)
Perform a VLOOKUP to find the JPH based on the %SMALL value in row 2
Interpolate for the exact JPH value, if not found on table
For now, I was able to create the last part (the VLOOKUP and the interpolation), with the following:
=IFERROR(VLOOKUP(C2;'EARLY-STATIONS'!$F:$H;3;FALSE);AVERAGE(OFFSET(INDEX('EARLY-STATIONS'!$H:$H;MATCH(C2;'EARLY-STATIONS'!$F:$F;1));0;0;2;1)))
The problem I'm facing is that this calculation does not check the number of stations, so the interpolation is not accurate.
Unfortunately I cannot use VBA macros to solve this.
Any clue?
This is an attempt, because more clarity is needed about all the scenarios to consider for different input data and about how the interpolation should work. This approach treats interpolation as the average of two values (the next lower and the next greater), but the idea can be customized to calculate it any other way. Based on the tags listed in the question, I assume there is no Excel version constraint. This is an O365 solution:
=LET(sm, A2:A10, st, B2:B10, jph, C2:C10, smx, F1:J1, sty, E2:E4, NULL, "",
GETLk, LAMBDA(x,y,mode, FILTER(jph, (st=y)
* (sm = INDEX(sm, XMATCH(x, sm, mode))), NULL)),
GET, LAMBDA(x,y, LET(f, FILTER(jph, (jph=GETLk(x,y, 1))
+ (jph=GETLk(x,y, -1)), NULL), IF(@f=NULL, NULL, AVERAGE(f)))),
HREDUCE, LAMBDA(yi, DROP(REDUCE("", smx, LAMBDA(ac,x,
HSTACK(ac, GET(x, yi)))),,1)),
DROP(REDUCE("", sty, LAMBDA(ac,y, VSTACK(ac, HREDUCE(y)))),1))
The above formula spills the entire result. I don't think a LOOKUP-like function can be used for this case.
Here is the output:
The highlighted cells are the ones where the average is calculated.
Explanation
The main idea is to use the DROP/REDUCE/HSTACK/VSTACK pattern to generate the grid. See my answer to the question "how to transform a table in Excel from vertical to horizontal but with different length" for how to apply it.
We use two user-defined LAMBDA functions to abstract some calculations:
GETLk(x,y,mode) filters the jph values by the %SMALL and Stations columns, based on the input values x (the x-axis value from the grid) and y (the y-axis value from the grid). The third input argument, mode, controls the approximate search in XMATCH (1: exact match or next largest, -1: exact match or next smallest). If the value exists in the input table, XMATCH returns the same result in both cases.
GET(x,y) contains the logic to find the value or, if it does not exist, to calculate the average. It uses the previous LAMBDA function, GETLk. We filter for jph values that match the input values (x,y), using an OR condition (+) in the FILTER to select both the lower and the greater value. If the value exists, FILTER returns just one value; otherwise it returns two values (f). Finally, if f is not empty we return the average; otherwise we return the value set up as NULL.
HREDUCE concatenates the results by columns for a given row of the grid. See the referenced question for more information about it.
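The lookup-or-average logic described above can be sketched in Python as follows. This is a simplified illustration, not a translation of the formula: it filters by station first and then picks the nearest lower and higher %SMALL entries, whereas the Excel version runs XMATCH over the whole %SMALL column. The function name and sample rows are hypothetical.

```python
def lookup_jph(rows, small, stations):
    """Return the JPH for (small, stations), or the average of the nearest
    lower and higher %SMALL entries for that station count; None if no data."""
    # Keep only the rows for the requested number of stations.
    candidates = {s: jph for s, st, jph in rows if st == stations}
    if not candidates:
        return None
    if small in candidates:
        return candidates[small]          # exact match, no averaging needed
    lower = [s for s in candidates if s < small]
    higher = [s for s in candidates if s > small]
    neighbours = []
    if lower:
        neighbours.append(candidates[max(lower)])
    if higher:
        neighbours.append(candidates[min(higher)])
    return sum(neighbours) / len(neighbours)


# Hypothetical data: (%SMALL, STATIONS, JPH)
rows = [(0.0, 3, 40), (0.5, 3, 50), (1.0, 3, 60), (0.0, 4, 55), (1.0, 4, 75)]

# Build the grid: one row per station count, one column per %SMALL value.
grid = [[lookup_jph(rows, s, st) for s in (0.0, 0.25, 0.5)] for st in (3, 4)]
print(grid)  # [[40, 45.0, 50], [55, 65.0, 65.0]]
```

As in the formula, the "interpolation" here is the plain average of the two neighbours, not a distance-weighted linear interpolation.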

How can I extract a total count or sum of agents who made their first sale in a specified month?

I am trying to extract some data out of a large table of data in Excel. The table consists of the month, the agent's name, and either a 1 if they made a sale or a 0 if they did not.
What I would like to do is plug in a Month value into one cell, then have it spit out a count of how many agents made their first sale that month.
Sample Data and Input Output area
I found success by creating a secondary table that uses MINIFS to find each agent's first sale month, then using COUNTIFS on that table to count how many first-sale months match the input month. However, I would like to avoid the secondary table and do this all in one go.
=IF(MINIFS(E2ERawData[Date Group],E2ERawData[Agent],'Processed Data'!B4,E2ERawData[E2E Participation],1)=0,"No Sales",MINIFS(E2ERawData[Date Group],E2ERawData[Agent],'Processed Data'!B4,E2ERawData[E2E Participation],1))
=COUNTIFS(ProcessedData[Month of First E2E Sale],H4)
Formula in column F is:
=MAX(0;COUNTIFS($A$2:$A$8;E3;$C$2:$C$8;1)-SUM(COUNTIFS($A$2:$A$8;"<"&E3;$C$2:$C$8;1;$B$2:$B$8;IF($A$2:$A$8=E3;$B$2:$B$8))))
This is how it works (we'll use 01/03/2022 as an example):
1. COUNTIFS($A$2:$A$8;E3;$C$2:$C$8;1) counts how many 1s there are for the given month (in our example, this part returns 2).
2. COUNTIFS($A$2:$A$8;"<"&E3;$C$2:$C$8;1;$B$2:$B$8;IF($A$2:$A$8=E3;$B$2:$B$8)) counts how many 1s the same agents have in previous months (in our example, it returns 1).
3. Because the result from step 2 is an array, we add it up using SUM() (in our example, this returns 1).
4. We subtract the result of step 3 from the result of step 1 (so we get 1).
5. Finally, everything is wrapped in a MAX function to avoid negative results. February would return -1, because there were no sales at all that month and agent B made a sale in January. To avoid this, we force Excel to show the larger of 0 and our calculation.
NOTE: Because it's an array formula, depending on your Excel version you may have to enter it by pressing Ctrl+Shift+Enter.
If you have access to the newest functions:
=LET(X,UNIQUE(C3:C9),VSTACK({"Month","Total of First time sales"},HSTACK(X,BYROW(X,LAMBDA(a,SUM((C3:C9=a)*(MINIFS(C3:C9,D3:D9,D3:D9,E3:E9,1)=C3:C9)))))))
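The idea behind both formulas (find each agent's earliest month with a sale, then count agents per month) can be sketched in Python as follows. The record layout, the 'YYYY-MM' month format and the sample data are assumptions made for illustration.

```python
from collections import Counter


def first_sale_counts(records):
    """Count agents by the month of their first sale.

    records: iterable of (month, agent, sold), where month sorts
    chronologically (e.g. 'YYYY-MM') and sold is 1 or 0.
    """
    first_month = {}
    for month, agent, sold in records:
        # Track the earliest month in which each agent has a sale.
        if sold == 1 and (agent not in first_month or month < first_month[agent]):
            first_month[agent] = month
    return Counter(first_month.values())


# Hypothetical sample data.
records = [
    ("2022-01", "A", 0), ("2022-01", "B", 1),
    ("2022-02", "A", 0), ("2022-02", "B", 0),
    ("2022-03", "A", 1), ("2022-03", "B", 1), ("2022-03", "C", 1),
]
counts = first_sale_counts(records)
print(counts["2022-03"])  # 2 (agents A and C; B's first sale was in January)
print(counts["2022-02"])  # 0
```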

Pyspark conditionally replace value in column with value from another column

I am working with some weather data that is missing some values (indicated via value code). For example, if SLP data is missing, it is assigned code 99999. I was able to use a window function to calculate a 7 day average and save it as a new column. A significantly reduced example of a single row is shown below:
SLP_ORIGIN    SLP_ORIGIN_7DAY_AVG
99999         11945.823516044207
I'm trying to write code such that when SLP_ORIGIN has the missing code, it gets replaced with the SLP_ORIGIN_7DAY_AVG value. However, most examples show how to conditionally replace a column value with a constant, not with the value of another column. I tried using the following:
train_impute = train.withColumn("SLP_ORIGIN", \
when(train["SLP_ORIGIN"] == 99999, train["SLP_ORIGIN_7DAY_AVG"]).otherwise(train["SLP_ORIGIN"]))
where the dataframe is called train.
When I perform a count on the SLP_ORIGIN column using train.where("SLP_ORIGIN = 99999").count() I get the same count from before I attempted replacing the value in that column. I have already checked and my SLP_ORIGIN_7DAY_AVG does not have any values that match the missing code.
So how do I actually replace the 99999 values in the SLP_ORIGIN column with the associated SLP_ORIGIN_7DAY_AVG value?
EVEN BETTER, is there a way to do this replacement and window calculation without making a 7 day average column (I have other variables I need to do the same thing with so I'm hoping there is a more efficient way to do this).
Make sure to double-check which dataframe you are verifying against.
I was using train.where("SLP_ORIGIN = 99999").count() when I should have been using train_impute.where("SLP_ORIGIN = 99999").count()
Additionally, instead of making a whole new column to store the imputed 7 day average, one can calculate the average only when the missing value code is present:
# Assumes f is pyspark.sql.functions and w is the 7 day window spec defined earlier
train = train.withColumn("SLP_ORIGIN", when(train["SLP_ORIGIN"] == 99999, f.avg("SLP_ORIGIN").over(w)).otherwise(train["SLP_ORIGIN"]))
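One thing to watch for with this kind of imputation: if the window also covers rows that contain 99999, those codes are averaged in unless they are excluded. The plain-Python sketch below (no Spark required) shows the intended logic with the missing code left out of the mean. The trailing 7-row window and the function name are assumptions, since the original window definition is not shown.

```python
MISSING = 99999  # missing-value code from the question


def impute_with_trailing_mean(values, window=7):
    """Replace MISSING entries with the mean of the valid values in the
    trailing window (current row included). Entries whose window has no
    valid values are left unchanged."""
    result = []
    for i, value in enumerate(values):
        if value != MISSING:
            result.append(value)
            continue
        start = max(0, i - window + 1)
        # Average only over real observations, never over the missing code.
        valid = [v for v in values[start:i + 1] if v != MISSING]
        result.append(sum(valid) / len(valid) if valid else value)
    return result


print(impute_with_trailing_mean([10.0, 20.0, MISSING, 40.0], window=3))
# [10.0, 20.0, 15.0, 40.0]
```

In PySpark, one possible way to get the same effect is to average `when(col != 99999, col)` instead of the raw column, since `avg` skips nulls.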

Using tbl.Lookup to match just part of a column value

This question relates to the Schematiq add-in for Microsoft Excel.
I am using =tbl.Lookup(table, columnsToSearch, valuesToFind, resultColumn, [defaultValue]). The values in the valuesToFind column have a consistent 3 characters on the left and then varying characters after (e.g. 908-123456 or 908-321654, i.e. 908 is always consistent).
How can I tell the function to lookup the value based on the first 3 characters only? The expected answer should be the sum of the results of the above, i.e. 500 + 300 = 800
tbl.Lookup() works by looking for an exact match - this helps ensure it's fast but in this case it means you need an extra step to calculate a column of lookup values, something like this:
A2: =tbl.CalculateColumn(A1, "code", "x => LEFT(x, 3)", "startOfCode")
This will give you a new column that you can use for the columnsToSearch argument. However, tbl.Lookup() also looks for just one match: it doesn't know how to combine values if there is more than one matching row in the table. So I think you also need one more step to group your table by the first 3 characters of the code, like this:
A3: =tbl.Group(A2, "startOfCode", "amount")
Because tbl.Group() adds values together by default, this will give you a table with a row for each distinct value of startOfCode and the subtotal of amount for each of those values. Finally, you can do the lookup exactly as you requested, which for your input table will return 800:
A4: =tbl.Lookup(A3, "startOfCode", "908", "amount")
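The three steps (derive the prefix, group and sum, then do an exact lookup) can be mirrored in plain Python to check the expected result. This sketch does not use Schematiq. The pairing of each code with 500 or 300 and the extra 777 row are assumptions for illustration.

```python
from collections import defaultdict


def subtotal_by_prefix(rows, prefix_length=3):
    """Group (code, amount) rows by the first characters of the code and sum the amounts."""
    totals = defaultdict(int)
    for code, amount in rows:
        totals[code[:prefix_length]] += amount   # like LEFT(x, 3) followed by a group-and-sum
    return dict(totals)


# Hypothetical table: the two 908 codes from the question plus one unrelated row.
rows = [("908-123456", 500), ("908-321654", 300), ("777-000001", 50)]
totals = subtotal_by_prefix(rows)
print(totals.get("908"))  # 800
```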

SpotFire - How to get cumulative percentage

I need to calculate a cumulative percentage (Cuml %) in Spotfire. How can I do this? Please refer to the data set below.
Data Set
You'll want to use the OVER function. I re-created your data table, then inserted three calculated columns:
ActualCuml = Sum([Actual]) OVER (AllPrevious([Day]))
PlannedCuml = Sum([Planned]) OVER (AllPrevious([Day]))
CumlPct = [ActualCuml] / [PlannedCuml]
The first two calculated columns are your rolling sums for Actual and Planned, and then the third column just divides those two new columns to get the cumulative percentage.
Alternatively, you could insert a single calculated column and use the expressions from the first two as the numerator and denominator:
Sum([Actual]) OVER (AllPrevious([Day])) / Sum([Planned]) OVER (AllPrevious([Day]))
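To check the expected numbers outside Spotfire, the same calculation can be sketched in Python. It assumes the rows are already sorted by Day, and the sample values are hypothetical.

```python
from itertools import accumulate


def cumulative_pct(actual, planned):
    """Running total of actual divided by running total of planned, row by row."""
    actual_cuml = accumulate(actual)    # like Sum([Actual]) OVER (AllPrevious([Day]))
    planned_cuml = accumulate(planned)  # like Sum([Planned]) OVER (AllPrevious([Day]))
    # Guard against a zero cumulative plan to avoid division by zero.
    return [a / p if p else None for a, p in zip(actual_cuml, planned_cuml)]


actual = [5, 10, 15]    # hypothetical values, one per day
planned = [10, 10, 20]
print(cumulative_pct(actual, planned))  # [0.5, 0.75, 0.75]
```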
