How to Calculate Area Under the Curve in Spotfire?

Can someone help me calculate the area under the curve in Spotfire without using R or S+?
I want to rank the data in [K] (1 = largest value) for each Well, calculate the cumulative sums of [P] (ordered by descending [K]) and of [K], then calculate and plot (as curves) the % cumulative sums of [P] and [K], and finally calculate the area under each curve for each Well. What is the expression for this?
I would like to calculate each of these as a column in Spotfire. My main problems are ranking [K] so that there are no ties (I was attempting to rank first by [K] and then by [Depth]), summing the values of [P] and [K] by ranked [K] for each Well, and then calculating the Riemann sums (area) under each curve.

First, add a new calculated column ([period]) that gives each time a unique value per well; otherwise Spotfire gets confused by the different times in the dataset:
[well] & Log10([time] + 1)
Then add a cross table, drag [well] to the vertical axis, and enter this expression for the cell values:
Sum(([pressure] + Avg([pressure]) over (Next([period]))) / 2 * (Avg([time]) over (Next([period])) - [time]))
Alternatively, you can add a new column ([AUDPC_helper]) to see the step-by-step calculations:
([pressure] + Avg([pressure]) over (Intersect(NextPeriod([period]),[well]))) / 2 * (Avg([time]) over (Intersect(NextPeriod([period]),[well])) - [time])
Then you just need to sum this column, e.g. in a cross table.
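For anyone who wants to check the Spotfire result outside the tool, here is a minimal Python sketch of the same trapezoid rule: for each well, average two consecutive pressures and multiply by the time step between them. The function name, the (well, time, pressure) row layout and the sample values are assumptions made up for illustration, not part of the original data.

```python
from collections import defaultdict


def area_under_curve_by_well(rows):
    """Trapezoid-rule area per well.

    rows: iterable of (well, time, pressure) tuples (hypothetical layout).
    Returns a dict mapping well -> area under the pressure/time curve.
    """
    by_well = defaultdict(list)
    for well, time, pressure in rows:
        by_well[well].append((time, pressure))

    areas = {}
    for well, points in by_well.items():
        points.sort()  # order the samples by time within each well
        total = 0.0
        # Pair each sample with the next one, like the "next period" lookup.
        for (t0, p0), (t1, p1) in zip(points, points[1:]):
            total += (p0 + p1) / 2 * (t1 - t0)
        areas[well] = total
    return areas


# Made-up sample data for two wells.
sample_rows = [
    ("W1", 0, 10.0), ("W1", 1, 20.0), ("W1", 3, 20.0),
    ("W2", 0, 5.0), ("W2", 2, 15.0),
]
areas = area_under_curve_by_well(sample_rows)
print(areas)
```

The last sample of each well has no successor and contributes nothing, which should correspond to the rows where the "next" lookup in the helper column is empty.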

Related

Calculating the representative/average group of categorical data

I have a summary dataset below with 3 variables:
Rating   Frequency   Total Value
A        2           $10000
B        15          $24003
C        5           $56789
...
There are 18 different rating categories, each with varying frequencies and values. I need to work out the average group that the data falls into, i.e. which rating group is the average in both frequency and total value. In short, the answer would be something like "the data is rating group B".
I'm sure there must be a proper way to do this, but I haven't been able to find the answer online.
I've tried calculating some kind of weighted average, but I'm struggling because each category would be equally weighted?
In Office 365 you could use:
=LET(range,A2:C4,
rating,INDEX(range,,1),
freq,INDEX(range,,2),
total,INDEX(range,,3),
w_avg,SUMPRODUCT(freq,total)/SUM(freq),
delta,BYROW(total,LAMBDA(t,
MAX(t,w_avg)-MIN(t,w_avg))),
INDEX(rating,XMATCH(MIN(delta),delta)))
The formula takes the three columns as input and names them. Then it calculates the weighted average, as in my comment.
Next it computes the absolute difference between each row's total and the weighted average, and returns the rating whose difference is smallest (the closest match to the weighted average).
Note: in case of ties, this returns the first rating listed. To return all tied ratings we would need to use FILTER.
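To sanity-check the formula, the same logic can be sketched in Python: compute the frequency-weighted average of the totals, then pick the rating whose total is closest to it. The function name is made up for illustration, and the three rows are the ones shown in the question.

```python
def closest_rating_to_weighted_average(ratings, freqs, totals):
    """Return (rating, weighted_average) for the total closest to the average."""
    # Frequency-weighted average of the totals, like SUMPRODUCT(...)/SUM(...).
    w_avg = sum(f * t for f, t in zip(freqs, totals)) / sum(freqs)
    # Absolute distance of each total from the weighted average.
    deltas = [abs(t - w_avg) for t in totals]
    # list.index returns the first minimum, so ties resolve to the first row.
    best = deltas.index(min(deltas))
    return ratings[best], w_avg


ratings = ["A", "B", "C"]
freqs = [2, 15, 5]
totals = [10000, 24003, 56789]
rating, w_avg = closest_rating_to_weighted_average(ratings, freqs, totals)
print(rating, round(w_avg, 2))
```

With these three rows the weighted average is about 30181.36, and B (24003) is the closest total.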
For older Excel I have a solution including a helper cell & column:
In cell E2 use: =SUMPRODUCT(B2:B4,C2:C4)/SUM(B2:B4) to calculate the weighted average.
In cell D2 use: =MAX(C2,$E$2)-MIN(C2,$E$2) and drag this down to the last used row in your range to calculate the difference between the total of the rating and the weighted average.
In cell F2 use: =INDEX(A2:A4,MATCH(MIN(D2:D4),D2:D4,0)) to return the rating with the smallest difference.
Or this one-line monster (in older Excel, confirm it with Ctrl+Shift+Enter as an array formula):
=INDEX(A2:A4,
MATCH(
MIN(
IF(C2:C4>=SUMPRODUCT(B2:B4,C2:C4)/SUM(B2:B4),
C2:C4,
SUMPRODUCT(B2:B4,C2:C4)/SUM(B2:B4))-
IF(C2:C4<=SUMPRODUCT(B2:B4,C2:C4)/SUM(B2:B4),
C2:C4,
SUMPRODUCT(B2:B4,C2:C4)/SUM(B2:B4))),
IF(C2:C4>=SUMPRODUCT(B2:B4,C2:C4)/SUM(B2:B4),
C2:C4,
SUMPRODUCT(B2:B4,C2:C4)/SUM(B2:B4))-
IF(C2:C4<=SUMPRODUCT(B2:B4,C2:C4)/SUM(B2:B4),
C2:C4,
SUMPRODUCT(B2:B4,C2:C4)/SUM(B2:B4)),
0))

How to normalize samples of an ongoing cumulative sum?

For simplicity, let's assume we have a function sin(x) and have calculated 1000 samples between -1 and 1 with it. We can plot those samples. In the next step we want to plot the integral of sin(x), which is -cos(x) + C. I can calculate the integral from my existing samples like this:
y[n] = x[n] + y[n-1]
Because it's a cumulative sum we will need to normalize it to get samples between -1 and 1 on the y axis.
y = 2 * (x - min(x)) / (max(x) - min(x)) - 1
To normalize we need a maximum and a minimum.
Now we want to calculate the next 1000 samples of sin(x) and calculate the integral again. Because it's a cumulative sum, we will get a new maximum, which means we would need to normalize all 2000 samples again.
So my question basically is:
How can I normalize samples in this context without knowing the maximum and minimum?
How can I avoid re-normalizing all previous samples when a new set of samples brings a new maximum/minimum?
I've found a solution :)
I also want to mention: this is about periodic functions like sine, so basically the maximum and minimum should always be the same, right?
In one special case this isn't true: if your samples don't contain a full period of the function (including its global maximum and minimum). This can happen when you choose a very low frequency.
What you can do:
Calculate the samples of a function like sin(x) with a frequency of 1. They will contain the global maximum and minimum of the function (it's important that y varies between -1 and 1, not 0 and 1!).
Then calculate the integral with the cumulative sum.
Get the maximum and minimum of the integrated samples.
Scale them up or down for another frequency: maximum/frequency, minimum/frequency.
The scaled values can now be used to normalize samples that were calculated with any other frequency.
They only need to be calculated once, at the beginning.
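A rough Python sketch of this idea, assuming the samples cover the interval [0, 1) and the frequency counts whole periods in that interval. The function names, the sample count and the chosen frequency are assumptions for illustration. Because the cumulative sum is only a discrete approximation of the integral, the scaled bounds are approximate, so the normalized values land very close to, but not exactly on, -1 and 1.

```python
import math


def sine_samples(frequency, count):
    """count samples of sin(2*pi*frequency*t) for t in [0, 1)."""
    return [math.sin(2 * math.pi * frequency * n / count) for n in range(count)]


def cumulative_sum(samples):
    """Running sum y[n] = x[n] + y[n-1], starting from 0."""
    out, total = [], 0.0
    for s in samples:
        total += s
        out.append(total)
    return out


def normalize(values, lo, hi):
    """Map values from [lo, hi] to [-1, 1]."""
    return [2 * (v - lo) / (hi - lo) - 1 for v in values]


COUNT = 1000

# Reference bounds, computed once from a full period at frequency 1.
reference = cumulative_sum(sine_samples(1, COUNT))
ref_min, ref_max = min(reference), max(reference)

# Reuse the bounds for another frequency by dividing them by that frequency.
frequency = 4
integral = cumulative_sum(sine_samples(frequency, COUNT))
normalized = normalize(integral, ref_min / frequency, ref_max / frequency)
print(min(normalized), max(normalized))
```

Since the bounds depend only on the frequency, a later block of samples can be normalized with the same two numbers without touching the earlier ones.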

Index values in Excel - Highest value is X, lowest is Y and all in between is divided in between X and Y

I need to index prices in Excel, so the highest price is 5 and the lowest is 1.
All the prices in between need to be automatically sorted and given a value, depending on the number of entries.
e.g.
8$ -> 5.0 (Highest price = index score 5)
3$ -> 1.0 (Lowest price = index score 1)
7$ -> 2.5 (Price between the highest and lowest, gets a weighted score, depending on amount of entries in the list)
Your sentence "Price between the highest and lowest, gets a weighted score, depending on amount of entries in the list" is very unspecific. How is the weighted score calculated? How does the number of entries in the list affect it? Can you give a specific example so we understand why the value for 7$ is 2.5? Does the weight depend on the position of 7$ in the sorted list of values?
On the other hand, if all you're looking for is an increasing affine function that maps values in a given range [a,b] to another range [c,d], then there is a simple formula. In your example, a=3$, b=8$, c=1.0, d=5.0.
For a new value x in the range [a,b], the corresponding value y in the range [c,d] is given by:
y = ( (d - c) * x + (bc - ad) ) / (b - a)
With your example, (d-c)/(b-a) = (5.0-1.0)/(8$-3$) = 0.8/$ and (bc-ad)/(b-a) = (8$*1.0-3$*5.0)/(8$-3$) = -1.4.
Therefore y = (0.8/$) * x - 1.4.
For instance, if x = 7$, then y = (0.8/$) * 7$ - 1.4 = 4.2.
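As a quick check of the formula, here is a small Python sketch of the affine map from [a, b] to [c, d]. The function name rescale is made up for illustration, and the numbers are the ones from the example (a = 3, b = 8, c = 1, d = 5).

```python
def rescale(x, a, b, c, d):
    """Map x from the range [a, b] linearly onto the range [c, d]."""
    if a == b:
        raise ValueError("a and b must differ")
    return ((d - c) * x + (b * c - a * d)) / (b - a)


low = rescale(3, 3, 8, 1.0, 5.0)   # lowest price maps to 1.0
high = rescale(8, 3, 8, 1.0, 5.0)  # highest price maps to 5.0
mid = rescale(7, 3, 8, 1.0, 5.0)   # 0.8 * 7 - 1.4 = 4.2
print(low, high, mid)
```

In Excel, a and b would presumably come from MIN and MAX over the price column, so the scores update automatically when prices are added.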

Calculating contrast values on Excel

I am currently studying experimental designs in statistics and I am calculating values pertaining to 2^3 factorial designs.
The question that I have is particularly with the calculations of the "contrasts".
My goal with this question is to learn how to use the "Coded Factors" and "Total" tables to get the "Contrast" values using the IF function in Excel.
For example, Contrast A is calculated as : x - y . Where
x = sum of the values in the Total, where the Coded Factor A is + .
And y= sum of the values in the Total, where the Coded Factor A is - .
This would be rather simple, but for the interactions it is a bit more complex.
For example, contrast AC is obtained as : x - y . Where
x = sum of the values in the Total, where the product of Coded Factor A and that of C becomes + .
And y = sum of the values in the Total, where the product of Coded Factor A and that of C becomes - .
I would really appreciate your help.
Edited:
Considering how IF statements work, I thought it might be a good idea to convert the + into 1 and the - into -1 to make the calculation straightforward.
Convert all +/- to 1/-1 and use some cells as helpers.
Put in these formulas:
J2 --> =LEFT(J1)
K2 --> =MID(J1,2,1)
L2 --> =MID(J1,3,1)
Then put
J3 --> =IF(J$2="",1,INDEX($B3:$D3,MATCH(J$2,$B$2:$D$2,0)))
and drag it across and down to L10. Then
M3 --> =J3*K3*L3*G3
and drag it down to M10. Lastly,
M1 --> =SUM(M3:M10)
How to use: enter the factor combination (e.g. AC) in cell J1, and the result will be in M1.
Idea: split the factor text > look up the multipliers > multiply the Total values by the multipliers > sum.
Hope it helps.
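The same idea (multiply the coded levels of the chosen factors, then sum the signed totals) can be sketched in Python. The totals below are invented for illustration, and the runs are assumed to be in standard order with A changing fastest; neither comes from the original table.

```python
from itertools import product


def contrast(effect, runs):
    """Contrast for an effect such as "A" or "AC".

    runs: list of (levels, total) pairs, where levels maps each factor
    name to its coded level, +1 or -1.
    """
    result = 0
    for levels, total in runs:
        sign = 1
        for factor in effect:
            sign *= levels[factor]  # product of the coded levels
        result += sign * total
    return result


# Hypothetical totals for the 8 runs of a 2^3 design.
totals = [5, 12, 8, 20, 7, 25, 9, 30]
runs = []
for (c, b, a), total in zip(product((-1, 1), repeat=3), totals):
    runs.append(({"A": a, "B": b, "C": c}, total))

contrast_a = contrast("A", runs)
contrast_ac = contrast("AC", runs)
print(contrast_a, contrast_ac)
```

For these made-up totals, contrast A is (12 + 20 + 25 + 30) - (5 + 8 + 7 + 9) = 58, and contrast AC is (5 + 8 + 25 + 30) - (12 + 20 + 7 + 9) = 20.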

Find a growth rate that creates values adding to a determined total

I am trying to create a forecast tool that shows a smooth growth rate over a determined number of steps while adding up to a determined value. We have variables tied to certain sales values and want to illustrate different growth patterns. I am looking for a formula that would help us to determine the values of each individual step.
As an example: say we wanted to illustrate 100 units sold over 4 months, starting with sales of 19 units, with an even growth rate. We would need individual monthly sales of 19, 23, 27 and 31. We can find these values with a lot of trial and error, but I am hoping that there is a formula that I could use to calculate the values automatically.
We will have a starting value (current or last month sales), a total amount of sales that we want to illustrate, and a period of time that we want to evaluate -- so all I am missing is a way to determine the change needed between individual values.
This basically is a problem in sequences and series. If the starting sales number is a, the difference in sales numbers between consecutive months is d, and the number of months is n, then the total sales is
S = n/2 * [2*a + (n-1) * d]
In your example, a=19, n=4, and S=100, with d unknown. That equation is easy to solve for d, and we get
d = 2 * (S - a * n) / (n * (n - 1))
There are other ways to write that, of course. If you substitute your example values into that expression, you get d=4, so the sales values increase by 4 each month.
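Here is a short Python sketch of that calculation, using the example numbers from the question. The function name monthly_values is made up for illustration, and it assumes at least two periods, since the formula divides by n - 1.

```python
def monthly_values(start, total, periods):
    """Arithmetic sequence that begins at start and sums to total."""
    if periods < 2:
        raise ValueError("need at least two periods")
    # d = 2 * (S - a * n) / (n * (n - 1))
    step = 2 * (total - start * periods) / (periods * (periods - 1))
    return [start + k * step for k in range(periods)]


values = monthly_values(19, 100, 4)
print(values, sum(values))
```

With a = 19, S = 100 and n = 4 the step is 4, giving 19, 23, 27 and 31.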
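For Excel you can use this formula (judging by the formula, B1 holds the total, B2 the starting value, B3 the number of periods, and D1 the period number):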
=IF(D1<>"",(D1-1)*($B$1-$B$2*$B$3)/SUMPRODUCT(ROW($A$1:INDEX(A:A,$B$3-1)))+$B$2,"")
I would recommend using Excel.
This is simply a y = mx + b equation.
Assuming you want steady growth over x periods, you can use this formula to determine the slope of your line (the growth rate, designated 'm'). As long as you have your two data points (starting sales value and ending sales value), you can find 'm' using
m = (y2-y1) / (x2-x1)
That will calculate the slope. y2 represents your final sales goal. y1 represents your current sales level. x2 is the number of periods in the period of performance (i.e. how many months you are giving yourself to achieve the goal). x1 = 0, since it represents today, which is time period 0.
Once you solve for 'm', plug it into the formula y = mx + b. Your 'b' in this scenario will always equal your current sales level (it is the y-intercept).
Then all you have to do to calculate the new 'y', which represents the sales level at any period, is plug in the x value you choose. So if you are in the first month, x = 1; in the second month, x = 2. The 'm' and 'b' stay the same.
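A minimal Python sketch of the straight-line approach, assuming a current level of 19 and a goal of 31 after 3 periods. These numbers and the function name are hypothetical, chosen only to illustrate the slope and intercept.

```python
def sales_at(period, current, goal, periods):
    """Sales level on the straight line from (0, current) to (periods, goal)."""
    m = (goal - current) / (periods - 0)  # slope: m = (y2 - y1) / (x2 - x1)
    b = current                           # y-intercept: the level at period 0
    return m * period + b


line = [sales_at(x, 19, 31, 3) for x in range(4)]
print(line)
```

Note that this version needs the ending sales level as an input, rather than the total over all periods.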
See the Excel template below which serves as a rudimentary model. The yellow boxes can be filled in by the user and the white boxes should be left as formulas.
