Analysing simulation result of event sequences with branch - statistics

I have a problem in which a sequence of events can be any of
A1 > B1 > C1 > D1
or
A1 > B1 > C2 > D2
or
A1 > B1 > C2 > D3
or
A2 > B2 > C3 > D4
Note that there is more than one root starting point too. Each stage also has some other properties. I want to be able to:
1. Find all stages (regardless of A, B, C or D) where property 1 = some value and which have, somewhere up the parent chain, a stage where property 2 = some value.
2. Work out the probability of getting to each stage, given that all branches at a split are equally likely. So the probability of getting to D1 is 1/2 (A) * 1/2 (C), whereas for the D2 or D3 stage it is 1/2 (A) * 1/2 (C) * 1/2 (D).
3. Work out conditional probabilities: given that B1 has happened, what is the chance of D3?
What is the best way or technique to store, analyse and query data like this? What keywords, fields or technologies should I search for and read up on?
Note that I expect to generate somewhere between hundreds of thousands and millions of sample event sequences.
I've had a look at recursive CTEs in an RDBMS. That solves problem 1, but 2 and 3 in combination seem more difficult. I was wondering whether a graph database like neo4j could solve the problem better.
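Before committing to a database, questions 2 and 3 can be prototyped in memory by treating the sequences as a tree. The Python sketch below is only an illustration under stated assumptions: stage names are unique, each stage has a single parent, and the branches at every split (including the choice of root) are equally likely. The function names are made up for this example.

```python
from fractions import Fraction

def build_tree(sequences):
    """Return (children, parent_of) maps; None stands for a virtual root."""
    children = {None: []}
    parent_of = {}
    for seq in sequences:
        parent = None
        for stage in seq:
            if stage not in children[parent]:
                children[parent].append(stage)
            children.setdefault(stage, [])
            parent_of[stage] = parent
            parent = stage
    return children, parent_of

def reach_probabilities(children):
    """Probability of reaching each stage when siblings are equally likely."""
    probs = {}
    stack = [(None, Fraction(1))]
    while stack:
        node, p = stack.pop()
        kids = children[node]
        for kid in kids:
            probs[kid] = p / len(kids)
            stack.append((kid, probs[kid]))
    return probs

def conditional(probs, parent_of, target, given):
    """P(target | given), assuming `given` is the earlier stage.

    Non-zero only if `given` lies on the parent chain of `target`.
    """
    node = target
    while node is not None:
        if node == given:
            return probs[target] / probs[given]
        node = parent_of[node]
    return Fraction(0)

sequences = [
    ["A1", "B1", "C1", "D1"],
    ["A1", "B1", "C2", "D2"],
    ["A1", "B1", "C2", "D3"],
    ["A2", "B2", "C3", "D4"],
]
children, parent_of = build_tree(sequences)
probs = reach_probabilities(children)
print(probs["D1"], probs["D3"])                   # 1/4 1/8
print(conditional(probs, parent_of, "D3", "B1"))  # 1/4
```

The values printed are for the four sequences exactly as listed in the question. If branches should instead be weighted by how often they occur in the simulated samples, the division by `len(kids)` would be replaced by observed counts.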

Related

Summation Changes Based on Result

My objective is to do a sum of previous values based on ranges.
I have an index of 0 to 25. I have a base rate at index 0 to be 0.142857.
The final result is a percentage. I have three rules that I know:
Base Rate * 0.1 is the increment if the result is <30%
Base Rate * 0.02 is the increment if the result is <32%
If the result is above 32%, do nothing else, show it as a hard cap of 32%
So for example:
So the problem is that I hardcoded column C with the increment per index. I need this to be smart enough to know, hey if this would go above 30% then use the next formula and if it would go above 32% then cap it. Any ideas on how I could do this in Excel or Google Sheets without using scripts or VBA?
paste in B3 and drag down:
=IF(IF(B2< 30%, B2+($B$2*0.1)) < 30%, B2+($B$2*0.1),
IF(IF(B2< 32%, B2+($B$2*0.02))< 32%, B2+($B$2*0.02),
IF(IF(B2>=32%, B2+($B$2*0.02))>=32%, 32%)))
cell C2 could be: =ARRAYFORMULA({TEXT(B2:B27, "00.00%")})
or:
=IF(B2< 30%, B2+($B$2*0.1),
IF(B2< 32%, B2+($B$2*0.02),
IF(B2>=32%, 32%)))
or combo:
=IF(B2< 30%, B2+($B$2*0.1),
IF(IF(B2< 32%, B2+($B$2*0.02))< 32%, B2+($B$2*0.02),
IF(IF(B2>=32%, B2+($B$2*0.02))>=32%, 32%)))
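To check the spreadsheet result independently, the same rule can be written out as a short Python sketch. It follows the "would this go above the threshold" reading of the question: take the large step only if the result stays below 30%, otherwise the small step only if the result stays below 32%, otherwise the cap. The function and parameter names are invented for the example.

```python
def capped_series(base=0.142857, count=26, soft_limit=0.30, cap=0.32):
    """Values for index 0..count-1 under the three rules from the question."""
    big_step = base * 0.1     # increment while the result stays below 30%
    small_step = base * 0.02  # increment while the result stays below 32%
    values = [base]
    for _ in range(count - 1):
        prev = values[-1]
        if prev + big_step < soft_limit:
            nxt = prev + big_step
        elif prev + small_step < cap:
            nxt = prev + small_step
        else:
            nxt = cap  # hard cap
        values.append(nxt)
    return values

for index, value in enumerate(capped_series()):
    print(index, f"{value:.2%}")
```

Because the thresholds are parameters, the same function can be reused if the 30% or 32% limits change.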

Complex IF statements with multiple possible values

I need help with making some IF/OR/AND statements.
I have a cell (C8) that can hold one of fourteen different values. Depending on the value of C8, one of the cells F8, D8, or E8 will be used in one of three possible equations.
C D E F G H
7
8
9
C8 can equal any of the following values
0.5,0.55,0.6,0.7,0.75,1,1.0625,1.125,1.1875,1.25,1.325,1.375,1.4375,1.5
Equations needed:
If C8 equals any value from 0.6 to 1.5, solve for 100-(((F8-108)*5)+(G8*1))
If C8 equals 0.5, solve for 100-(((D8-56)*5)+(G8*1))
If C8 equals 0.55, solve for 100-(((E8-102)*5)+(G8*1))
I currently have this formula for the case where C8 is one of the values from 0.6 to 1.5:
=IF(AND(SUMPRODUCT(--ISNUMBER(SEARCH({0.6,0.65,0.7,0.75,1,1.0625,1.125,1.1875,1.25,1.325,1.375,1.4375,1.5},C8)))>0),100-(((F8-108)*5)+(G8*1)),"")
I think I need an IF/OR statement for two additional situations:
C8 equals 0.5 to solve for 100-(((D8-56)*5)+(G8*1))
C8 equals 0.55 to solve for 100-(((E8-102)*5)+(G8*1))
The following are the kinds of IF/OR formulas I have tried.
=IF(OR(SUMPRODUCT(--ISNUMBER(SEARCH({0.6,0.65,0.7,0.75,1,1.0625,1.125,1.1875,1.25,1.325,1.375,1.4375,1.5},C8)))>0), 100-(((F8-108)*5)+(G8*1)), OR(ISNUMBER(SEARCH({0.5,C8)))>0)100-(((D8-56)*5)+(G8*1)), OR(ISNUMBER(SEARCH({0.55,C8))>0)100-(((E8-102*5)+(G8*1))"")
=IF(OR(SUMPRODUCT(--ISNUMBER(SEARCH({0.6,0.65,0.7,0.75,1,1.0625,1.125,1.1875,1.25,1.325,1.375,1.4375,1.5},C8)))>0), 100-(((F8-108)*5)+(G8*1)), (ISNUMBER(SEARCH({0.5,C8)))>0)100-(((D8-56)*5)+(G8*1)), (ISNUMBER(SEARCH({0.55,C8))>0)100-(((E8-102*5)+(G8*1))"")
Do you need to search for the values? If the cell can only equal one of the values you shared, you can just build your statement around that assumption.
If that assumption is false, this will not work: [Equation3] will be used whenever C8 equals anything but 0.5 or 0.55, so C8 has to be limited to the listed values.
=IF(C8 = 0.5, [Equation1], IF(C8 = 0.55, [Equation2], [Equation3]))
Where
[Equation1] = 100-(((D8-56)*5)+(G8*1))
[Equation2] = 100-(((E8-102)*5)+(G8*1))
[Equation3] = 100-(((F8-108)*5)+(G8*1))
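The branching in this answer can be sanity-checked outside the spreadsheet. The Python sketch below assumes the equation form 100-(((X-offset)*5)+(G8*1)) from the asker's working formula, and, like the answer, assumes C8 only ever holds one of the listed values. The function name and argument names simply mirror the cell references.

```python
def score(c8, d8, e8, f8, g8):
    """Pick the input cell and offset from c8, then apply the shared formula."""
    if c8 == 0.5:
        measured, offset = d8, 56
    elif c8 == 0.55:
        measured, offset = e8, 102
    else:
        # Assumption: any other value is one of the 0.6 to 1.5 entries.
        measured, offset = f8, 108
    return 100 - ((measured - offset) * 5 + g8 * 1)

print(score(0.5, d8=60, e8=0, f8=0, g8=3))     # 77
print(score(0.55, d8=0, e8=104, f8=0, g8=1))   # 89
print(score(1.25, d8=0, e8=0, f8=110, g8=2))   # 88
```

All three equations differ only in which cell is read and which offset is subtracted, which is why a single return statement is enough here.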

Dynamic Programming. Coin Row Problem - How its Recursive relation is developed

Problem: A set of n coins is placed in a row. The coins have positive values which need not be distinct. Find the maximum amount that can be collected given the constraint that no two adjacent coins can be picked up.
Its recursive relation is
F(n) = max{cn + F(n − 2), F(n − 1)} for n > 1,
F(0) = 0, F(1) = c1.
My question is how this recursive relation is developed. Please someone explain this to me.
First, envision a line of coins, with the value of each depicted by the variable ci:
c1 c2 c3 c4 c5 ... cn
If there are no coins, then obviously the max amount that can be made is 0. Likewise, if there is only 1 coin, the max amount is the value of that coin, c1. This accounts for the base cases.
For the recursive case of the max value for n coins, start at cn, which is the right-most coin. Since the constraint is that you cannot select adjacent coins, the max value that you can achieve is either the right-most coin plus the max achieved from the coins 2 or more slots to the left (this accounts for the cn + f(n - 2) case), or the max achieved from the first n - 1 coins, discarding the right-most coin cn (this accounts for the f(n - 1) case).
Considering the following line of coins again:
c1 c2 c3 c4 c5 c6
The f(6) case would look at c6 + the greatest amount from coins c1 to c4, OR the greatest amount from coins c1 to c5 (which excludes c6).
f(4), likewise, returns c4 + the greatest amount from coins c1 to c2, OR the greatest amount from coins c1 to c3 (again excluding c4).
f(2) returns c2 + f(0), or the greatest amount from c1 (effectively c1). The first equates to c2, since f(0) is 0 by the base case, and the second equates to c1 (again by the base case). So f(2) is really just the max of c1 and c2.
Note, too, that f(n - 2) and f(n - 1) may be the same, since in the n - 1 case it might be best to skip coin n - 1 and use only coins further to the left (which is the f(n - 2) case). But that is why the first half is not merely f(n - 2), but also adds cn to it.
Let's start at the end.
Let's denote the answer to the problem of n coins as F(n)
If you have zero coins, the amount is zero. so F(0) = 0.
If you have a single coin, the amount is that coin's value, so F(1)=c1
Now suppose someone told you the values of F(n-1), F(n-2). How can you use them to find F(n)?
If you have n coins, you have two possible moves:
1. Pick the nth coin, skip the adjacent one (the (n-1)th coin, that's the rule!) and resume solving from there.
2. Skip the nth coin, and resume solving from the adjacent (n-1)th.
How do you express the notions of 1 and 2 with the tools you have?
1. If you pick the nth coin, its value is Cn. Now you have to skip the (n-1)th coin, and continue solving from the (n-2)th coin. This is Cn + F(n-2).
2. If you skip the nth coin, it contributes 0 to the solution, and now you resume solving from the (n-1)th coin. That's F(n-1).
Which one of case 1 and case 2 is larger? You don't know. But you can express it as
max(Cn + F(n-2), F(n-1)),
which is saying "I don't know which one is larger, but one of them is, so return it please".
I hope this helps!
Let F(n) be the maximum amount that can be picked up from the row of n coins.
To derive a recurrence for F(n), we partition all the allowed coin selections into two groups:
those that include the last coin and those without it.
The largest amount we can get from the first group is equal to cn + F(n − 2)—
the value of the nth coin plus the maximum amount we can pick up from the first n − 2 coins.
The maximum amount we can get from the second group is equal to F(n − 1) by the definition of F(n).
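The recurrence translates almost line for line into a bottom-up table. The Python sketch below is one possible rendering, not taken from any of the answers; the list F is indexed so that F[i] is the best amount for the first i coins, and the sample coin values are invented for the illustration.

```python
def coin_row(coins):
    """Maximum sum of coins with no two adjacent coins picked."""
    n = len(coins)
    F = [0] * (n + 1)      # F[0] = 0: no coins, nothing to collect
    if n >= 1:
        F[1] = coins[0]    # F[1] = c1
    for i in range(2, n + 1):
        # Either take coin i (value coins[i - 1]) plus F[i - 2],
        # or skip it and keep F[i - 1].
        F[i] = max(coins[i - 1] + F[i - 2], F[i - 1])
    return F[n]

print(coin_row([5, 1, 2, 10, 6, 2]))  # 17
```

For this row the table fills in as F = 0, 5, 5, 7, 15, 15, 17, and the answer 17 corresponds to picking the coins worth 5, 10 and 2.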

Getting average of top 1/3, second 1/3, and last 1/3 of values in column

I have a column with numbers and a reference column. I'm trying to separate the numbers column into first third, second third, and last third and take the average of each.
Values Ref column
1.7 cow
2.3 cow
2.6 cow
1.8 sheep
1.3 sheep
2.2 sheep
1.5 sheep
1.2 sheep
2.3 sheep
1.5 goose
2.5 goose
So, for example, the average of the first two values for "sheep", second two, and last two. In other words, I want to take the average of each 1/3 of cells adjacent to "sheep".
Add a column to cumulatively count the instances of the word you're looking at, then check that row number in your AVERAGE.
C2 = =CountIf($B$2:$B2, $B2) and fill down => values should be {1,2,3,1,2,3,4,5,6,1,2}
E1 = sheep
E2 = =CountIf($B:$B, $E$1) => 6
E3 = {=Average(If(($B:$B = $E$1) * ($C:$C <= $E$2 / 3), $A:$A))} (note this is an array formula, as designated by the {} around it) => 1.55
E4 = {=Average(If(($B:$B = $E$1) * ($C:$C > $E$2 / 3) * ($C:$C <= 2 * $E$2 / 3), $A:$A))} => 1.85
E5 = {=Average(If(($B:$B = $E$1) * ($C:$C > 2 * $E$2 / 3), $A:$A))} => 1.75
Array formulas are entered the same way as normal formulas (don't type the {}; they are added automatically), but you press Ctrl+Shift+Enter instead of Enter when you finish.
NB - these look at the entire column. You can speed them up by changing $A:$A to $A$2:$A$12 (likewise for $B:$B and $C:$C). Just bear in mind that for any data you append to this list, you'll need to update the formulas; but you can insert data into the middle of the list and it will update them automatically.
Use a formula like this, where D2 holds the reference value (e.g. sheep) and D3 holds the number of the third (1, 2 or 3):
=AVERAGE(INDEX(A:A,MATCH($D$2,B:B,0)+(D3-1)*COUNTIF(B:B,$D$2)/3):INDEX(A:A,MATCH($D$2,B:B,0)+((D3)*COUNTIF(B:B,$D$2)/3)-1))
This does require that the ref column be sorted and like references grouped.
This array formula will return the averages even if not sorted:
=AVERAGE(INDEX(INDEX(A:A,N(IF({1},MODE.MULT(IF($B$1:$B$12=$D$2,ROW($A$1:$A$12)*{1,1}))))),N(IF({1},ROW(INDEX(A:A,1+(D3-1)*COUNTIF(B:B,$D$2)/3):INDEX(A:A,D3*COUNTIF(B:B,$D$2)/3))))))
Array formulas need to be confirmed with Ctrl+Shift+Enter instead of Enter when exiting edit mode.
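As a cross-check on the spreadsheet results, the same split can be written in a few lines of Python. This is only a sketch: the function name is made up, and it assumes the number of matching rows divides evenly by the number of parts, as it does for the six sheep rows in the question.

```python
def part_averages(values, refs, name, parts=3):
    """Average each consecutive block of the values whose ref equals name."""
    picked = [v for v, r in zip(values, refs) if r == name]
    if not picked or len(picked) % parts != 0:
        raise ValueError("count of matching rows must be a non-zero multiple of parts")
    size = len(picked) // parts
    return [sum(picked[i * size:(i + 1) * size]) / size for i in range(parts)]

values = [1.7, 2.3, 2.6, 1.8, 1.3, 2.2, 1.5, 1.2, 2.3, 1.5, 2.5]
refs = ["cow"] * 3 + ["sheep"] * 6 + ["goose"] * 2

print([round(x, 2) for x in part_averages(values, refs, "sheep")])  # [1.55, 1.85, 1.75]
```

Like the unsorted array formula, this does not need the reference column to be grouped, because the matching rows are filtered out first and then split by position.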
Well, suppose there were 7 sheep values and you wanted a weighted mean (e.g. the first mean would be calculated from the first two sheep plus a third of the third one)?
I have attempted a general solution for this, dividing any number of animals into any number of fractions and finding their average values. My approach is to use the elegant overlap formula from @Barry Houdini as used here and work out the overlap between the intervals (in the case of 7 animals divided into 3):
0 to 2.33
2.33 to 4.67
4.67 to 7
and the numbers of the animals
0 to 1
1 to 2
2 to 3
and so on.
In H4
=IF(ROWS($1:1)<=$H$2,ROWS($1:1)/$H$2*COUNTIF(B$2:B$16,$G$2),"")
In G4
=IF(H4="","",H4-COUNTIF(B$2:B$16,$G$2)/$H$2)
The main formula in I4 is
=IF(H4="","",SUM(TEXT(IF(C$2:C$16<H4,C$2:C$16,H4)-IF((C$2:C$16-1)>G4,C$2:C$16-1,G4),"general;\0")
*A$2:A$16*(B$2:B$16=$G$2))/(COUNTIF(B$2:B$16,$G$2)/$H$2))
entered as an array formula.
The fractions can be changed to halves, quarters etc. by changing the number in H2.
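The overlap idea can be illustrated in Python as well. In this sketch (function name and test data are invented), value i occupies the interval from i to i + 1, each part occupies an interval of width n / parts, and every value is weighted by how much of its interval falls inside the part.

```python
def weighted_part_means(values, parts):
    """Weighted mean of each of `parts` equal-width slices of the values."""
    n = len(values)
    width = n / parts
    means = []
    for k in range(parts):
        lo, hi = k * width, (k + 1) * width
        total = 0.0
        for i, v in enumerate(values):
            # Length of the overlap between [i, i + 1] and [lo, hi].
            overlap = max(0.0, min(i + 1, hi) - max(i, lo))
            total += overlap * v
        means.append(total / width)
    return means

# Seven made-up values split into thirds: the first part uses values 1 and 2
# fully plus one third of value 3.
print(weighted_part_means([1, 2, 3, 4, 5, 6, 7], 3))
```

When the count divides evenly, every overlap is either 0 or 1 and the result reduces to the plain block averages.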

Using If,Then function in Excel

I have a retail store that sells items on consignment for a fee that varies based on Selling Price.
So my question is how do I write a formula that checks the selling price and then charges the correct consignment fee to calculate the net based on the following schedule:
When selling price is over $400 then charge = 20%
When Selling price is $100 to $400 then charge = 30%
When Selling price is under $100 then charge = 40%
BLUF: use nested IF statements (an IF inside an IF) --
Example:
=IF(A2>400, (A2*0.2), (IF(A2>=100, (A2*0.3), (A2*0.4))))
Or use the below if you suspect someone will foolishly enter a negative number or something nonsensical:
=IF(A2>400, (A2*0.2), (IF(A2>=100, (A2*0.3), (IF(A2>=0, (A2*0.4), (0))))))
That may look complex, but let's break it down from the beginning.
The basic formula for the IF statement:
=IF(testCondition, (resultIfTrue), (resultIfFalse))
One IF statement will only allow you to do two of your 3 conditions:
=IF(A1 > 400, (A1 * .20), (A1 * .30))
The above basically says that if the number in cell A1 is greater than 400, then the value in your current cell (e.g. B2) is A1 * 20%. But if the number in A1 is NOT greater than 400, the value in your cell will be A1 * 30%.
But how do we calculate the range you were asking (i.e. 100 - 400) and how do we add in a third possibility (i.e. the possibility that the number is less than 100)?
The answer is to use a nested IF. You can tell the cell what its value should be if the condition is true, but you can test another condition if the answer is false (i.e. the next IF statement stands in the place of resultIfFalse).
=IF(testCondition, (resultIfTrue), (IF(testCondition, (resultIfTrue), (resultIfFalse))))
The above can handle 3 different scenarios. Out of the IFs above, you could also replace the second IF's resultIfFalse with yet another IF statement, and so on. You can nest up to 64 IF statements.
Extra Resources:
http://fiveminutelessons.com/learn-microsoft-excel/using-multiple-if-statements-excel
http://spreadsheets.about.com/od/tipsandfaqs/qt/nested_if.htm
Nested IF functions are not user friendly and require hardcoding your variables (percentages). My suggestion would always be to have a small table of values elsewhere: eg. put the following in A1:B3
0 0.4
100 0.3
401 0.2
Assuming your data is in D1 you can use the following formula in E1 and drag down if necessary
=INDEX($B$1:$B$3,MATCH($D1,$A$1:$A$3,1))
This way you can change your boundaries/ add more conditions easily without more nested IF statements
Try using this instead:
=IF(B1<100,B1*40%,IF(AND(B1>=100,B1<=400),B1*30%,IF(B1>400,B1*20%,"Invalid Price")))
This formula contains nested-ifs and a logical AND condition which will give the result.
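For comparison, the fee schedule from the question can be written as a small Python function, which makes the boundary cases easy to test. This is a sketch only: the function names are made up, and it assumes that exactly $400 and exactly $100 both fall in the 30% band, as the schedule is worded.

```python
def consignment_fee(price):
    """Fee charged on a sale under the three-band schedule."""
    if price < 0:
        raise ValueError("selling price cannot be negative")
    if price > 400:
        rate = 0.20      # over $400
    elif price >= 100:
        rate = 0.30      # $100 to $400 inclusive
    else:
        rate = 0.40      # under $100
    return price * rate

def net_to_consignor(price):
    """Selling price minus the consignment fee."""
    return price - consignment_fee(price)

for p in (50, 100, 400, 500):
    print(p, round(consignment_fee(p), 2), round(net_to_consignor(p), 2))
```

The order of the tests matters in the same way as in the nested IF: each branch is only reached when the earlier conditions were false, so the middle band needs no upper-bound check.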

Resources