Calculating poker preflop equity efficiently

I've read many articles about the Monte Carlo algorithm for approximating preflop equity in NL hold'em poker.
Unfortunately, it iterates over only a few of the possible boards to see what happens. The good thing about this is that you can put in exact hand ranges.
Well, I don't need exact ranges. It's good enough to say "Top 20% vs Top 35%".
Is there a simple formula to tell (or approximate) the likelihood of winning or losing? We can ignore splits here.
I can imagine that calculating the odds becomes much simpler if we just use two (percentile) numbers instead of all possible card combinations.
The thing is, I don't know whether, for example, the case "Top 5% vs Top 10%" is equal to "Top 10% vs Top 20%".
Does anyone know of a usable relation or a formula for these inputs?
Thanks

Okay, I've done a bit of analytical work and came up with the following.
The Formula
eq_a(a, b) := 1/2 - 1/(6*ln(10)) * ln(a/b)
Or if you like:
eq_a(a, b) := 0.5 - 0.072382 * ln(a/b)
Where a is the range as a fraction (0 to 1) for player a; same for b.
The function outputs the equity for player a. To get the equity for player b, just swap the two ranges.
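As a quick sanity check, here is a minimal Python version of the formula (the test values come from the special cases discussed below):

import math

def eq_a(a: float, b: float) -> float:
    """Approximate preflop equity of player A playing the top fraction
    `a` of hands against player B playing the top fraction `b`."""
    return 0.5 - math.log(a / b) / (6 * math.log(10))

print(eq_a(0.20, 0.35))              # Top 20% vs Top 35% -> ~0.54
print(eq_a(0.10, 0.10))              # equal ranges       -> 0.5
print(eq_a(0.10, 1.00))              # Top 10% vs any two -> ~0.6667 (= 2/3)
print(eq_a(0.05, 0.10), eq_a(0.50, 1.00))  # same ratio -> same equity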
Plotting the function (with a on the x-axis and b on the y-axis; plot omitted) shows that it is very hard to get an equity greater than 80% preflop, as even AA isn't that far ahead most of the time.
How I came up with this
After some analysis I became aware that the probability of winning depends only on the ratio of the two ranges (the same holds for multiway pots).
So:
eq_a(a, b) = eq_a(a * h, b * h) for any h > 0
And yes, Top 5% vs Top 10% has the same equities as Top 50% vs Top 100%.
To get the formula, I ran some regressions on sample data I had calculated with an equity app and picked the best fit (the logarithmic one). Then I tuned it using special cases like eq_a(0.1, 1) = 2/3 and eq_a(a, a) = 1/2.
It would be great if someone would do the same work for multiway preflop all-ins.

Related

Small data anomaly detection algo

I have the following 3 cases of a numeric metric on a time series (t, t1, t2, etc. denote different hourly comparisons across periods).
If you look at the 3 graphs, t (the period of interest) clearly has a drop-off in image 1, but not so much in images 2 and 3. Assume this is some numeric metric (raw or derived) and I want to create a system/algorithm that specifically catches case 1 but not cases 2 or 3, with t being the point of interest. While visually this makes sense and is very intuitive, I am trying to design a way to do this in Python using the dataframes shown in the picture.
Generally, the problem is: how do I detect when the time series is behaving very differently from any of the prior weeks?
Edit: When I say "different", what I really mean is: my metric trends together across periods in t1 to t4, but if one period doesn't, and tries to separate out of the envelope, that to me is an anomaly. If you look at chart 1 you can see t trying to split off from the rest of the tn; that is an anomaly for me. In the other cases t stays within the bounds of the other time periods. Hope this helps.
With small data, the best approach is to come up with a good transformation into a simpler representation.
In this case I would try the following (a sketch follows below):
Distance to the median along the time axis, then a summary of that distance; this could be the median, the mean squared error, etc.
Median of the cross-correlation of the signals.
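A minimal Python sketch of both ideas, assuming one column per period and one row per hour; the data is synthetic, and "cross-correlation" is read loosely here as the plain correlation of each reference period with t:

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
hours = np.arange(24)

# Synthetic data: t1..t4 trend together; t drops out of the envelope late on.
base = 100 + 10 * np.sin(hours / 24 * 2 * np.pi)
df = pd.DataFrame({f"t{i}": base + rng.normal(0, 2, 24) for i in range(1, 5)})
df["t"] = base + rng.normal(0, 2, 24)
df.loc[18:, "t"] -= 25                        # the anomalous drop-off

ref = df[["t1", "t2", "t3", "t4"]]

# 1) Distance to the hour-wise median of the prior periods, then a summary.
median = ref.median(axis=1)
spread = ref.sub(median, axis=0).abs().median(axis=1)   # typical deviation
score = ((df["t"] - median).abs() / spread).mean()      # summary (median or MSE also work)
print("deviation score:", round(score, 2))   # large -> t left the envelope

# 2) Median correlation of t with the prior periods.
print("median correlation:", round(ref.corrwith(df["t"]).median(), 2))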

Matlab - Optimize blend of two streams subject to constraints to maximize profit

I am trying to port a very simple Excel sheet to Matlab code (I am not completely satisfied with Excel Solver!). My problem is this:
I have two materials (say A and B) with their properties (density, viscosity, etc.) and prices, and I mix them to obtain a third material (say C), whose properties are a mix (not necessarily linear) of the two, and which, if it respects some limits (i.e. density at most X, viscosity at most Y), can be sold for a certain price. What I have is a function which takes the quantities of A and B, their properties, their prices, material C's limits, and material C's price. It then comes up with a profit, i.e. price C * (quantity A + quantity B) - (price A * quantity A + price B * quantity B), and an indicator which tells me whether all the property limits are satisfied in material C (basically it compares limits and actual properties, and puts 0 if OK and 1 otherwise, so if all properties are respected, the mean of that vector should be 0).
Thus I have:
[profit, ok] = blend([qA, qB], [specA, specB], [pA, pB], [limits], pC)
and I want to maximize profit by changing quantity A and quantity B, subject to the ok vector being all zeros and qA + qB being less than a specified maximum quantity. The real problem is imposing that the ok vector equal 0. I thought about moving the limit check outside of the function, but I can only check whether the limits are respected once the function has calculated the properties of the blend, so I cannot take it outside. Is there a solution to this? Many thanks!
What you are looking for is called nonlinear constrained optimization, and most likely specifically the function fmincon.
I am afraid it is probably best if you unpack your function to fit the standard scheme. This is incidentally a very good thing to learn, as it is the common way to express such problems.
The call is
x=fmincon(fun,x0,A,b,Aeq,beq,lb,ub,nonlcon)
You have two parameters giving the quantities of your materials; or, if you normalize the total quantity to 1, you could get by with just one parameter x and express the other as (1 - x) in all equations.
So you would need to write one function fun that just computes the profit based on the parameters.
The material constraints then go into the remaining parameters. You could put all constraints into the ceq return value of nonlcon, as explained here, so that it returns zero when the mix is OK.
However, it is cleaner and more efficient to encode all linear constraints using the A and b matrices.
For more details I would need the actual constraints and functions that you have.
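If it helps to prototype outside of Matlab, here is a minimal sketch of the same unpacking using SciPy's minimize with the SLSQP method; the prices, the single "density" property, and its linear blending rule are all made up for illustration:

import numpy as np
from scipy.optimize import minimize

# Made-up data: prices of A, B, C; max total quantity; one property with a limit.
pA, pB, pC = 3.0, 5.0, 4.5
q_max = 100.0
dens_A, dens_B, dens_max = 0.9, 0.7, 0.8

def profit(q):
    qA, qB = q
    return pC * (qA + qB) - (pA * qA + pB * qB)

def blend_density(q):                 # stand-in for the real property model
    qA, qB = q
    return (dens_A * qA + dens_B * qB) / (qA + qB)

constraints = [
    # "ineq" means the expression must stay >= 0
    {"type": "ineq", "fun": lambda q: q_max - (q[0] + q[1])},        # qA+qB <= q_max
    {"type": "ineq", "fun": lambda q: dens_max - blend_density(q)},  # density limit
]

res = minimize(lambda q: -profit(q),       # maximize profit = minimize -profit
               x0=[1.0, 1.0],
               bounds=[(1e-6, None)] * 2,  # keep quantities positive (avoids 0/0)
               constraints=constraints, method="SLSQP")
print("qA, qB =", res.x, " profit =", -res.fun)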

Optimization of a list in Excel with Variables

I have a list of 153 golfers with associated salaries and average scores.
I want to find the combination of 6 golfers that optimizes avg score and keeps salary under $50,000.
I've tried using Solver, but I am stuck! Can anyone help please? :)
Illustrating a solution that is pretty close to what #ErwinKalvelagen suggested.
Column A is the names of the 153 golfers
Column B is the golfers' salaries (generated by =RANDBETWEEN(50, 125)*100, filled down, then Copy/Paste Values)
Column C is the golfers' average scores (generated by =RANDBETWEEN(70, 85), filled down, then Copy/Paste Values)
Column D is a 0 or 1 to indicate if the golfer is included.
Cell F2 is the total salary, given by =SUMPRODUCT(B2:B154,D2:D154)
Cell G2 is the number of golfers, given by =SUM(D2:D154)
Cell H2 is the average score of the team, given by =SUMPRODUCT(C2:C154,D2:D154)/G2
The page looks like this, before setting up Solver ...
The Solver setup looks like this ...
According to the help, the Evolutionary engine should be used for non-smooth problems. In Options, I needed to increase the Maximum Time without improvement from 30 to 300 (60 may have been good enough).
It took a couple of minutes for it to complete. It reached the solution of 70 fairly quickly, but spent more time looking for a better answer.
And here are the six golfers it came up with.
Among the teams with an average score of 70, it could have found one with a lower total salary.
In cell I2 I added the formula =F2+F2*(H2-70), which is essentially the salary penalized by increases in average score above 70 ...
... and use the same Solver setup, except to minimize Cell I2 instead of H2 ...
and these are the golfers it chose ...
Again - it looks like there is still a better solution. It could have picked Name97 instead of Name96.
This is a simple optimization problem that can be solved using Excel's solver (just use the "Simplex LP" solver -- somewhat of a misnomer, as we will use it here to solve an integer programming or MIP problem).
You need one column with 153 binary (BIN) variables (Excel's limit is, I believe, 200). Make sure you add a constraint to set the values to Binary. Let's call this column INCLUDE; Solver will fill it with 0 or 1 values. Sum these values, and add a constraint SUMINCLUDE = 6. Then add a column with INCLUDE * SCORE. Sum this column: this is your objective (optimizing the average is the same as optimizing the sum). Then add a column with INCLUDE * SALARY and sum these. Add a constraint SUMSALARY <= 50k. Press solve and you are done.
I don't agree with claims that Excel will crash on this or that it does not fit within the limits of Excel's solver (I really tried this out).
I prefer the simplex method over the evolutionary solver, as the simplex solver is more suitable for this problem: it is faster (simplex takes < 1 second) and provides optimal solutions (the evolutionary solver often gives suboptimal solutions).
If you want to solve this problem with Matlab a function to look at is intlinprog (Optimization Toolbox).
To be complete, this is the mathematical model we are solving here:
minimize    sum_i score_i * x_i
subject to  sum_i x_i = 6
            sum_i salary_i * x_i <= 50000
            x_i in {0, 1}
Results with random data: (table omitted)
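The same model is only a few lines in Python with scipy.optimize.milp (SciPy >= 1.9); the data below is random, mirroring the RANDBETWEEN setup from the other answer:

import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

rng = np.random.default_rng(1)
salary = rng.integers(50, 126, 153) * 100     # like =RANDBETWEEN(50, 125)*100
score = rng.integers(70, 86, 153)             # like =RANDBETWEEN(70, 85)

res = milp(
    c=score,                                  # minimize total (= average) score
    integrality=np.ones(153),                 # integer variables ...
    bounds=Bounds(0, 1),                      # ... restricted to {0, 1}
    constraints=[
        LinearConstraint(np.ones(153), lb=6, ub=6),   # exactly 6 golfers
        LinearConstraint(salary, lb=0, ub=50_000),    # salary cap
    ],
)
team = np.flatnonzero(res.x > 0.5)
print(team, salary[team].sum(), score[team].mean())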

Excel Solver Using Strings

I'm going to try to explain this the best that I can.
Right now I have a spreadsheet with a list of football players, each of which has an assigned salary and projected point total for the week.
My goal is to use Solver or some other method to determine the best combination of players to maximize the projected point total while staying under a salary cap.
In this example I have 4 separate player lists, like this:
QB: Player A, Player B, Player C...Player N
RB: Player a, Player b, Player c...Player N
WR: Player X, Player Y, Player Z...Player N
TE: Player x, Player y, Player z...Player N
I need the best combination that includes 2 QBs, 2 RBs, 2 WRs, 1 TE, and 2 "Flex", which means any of RB/WR/TE.
I have tried using Solver to maximize the projected point total, but the variable cells in this case would be the players' names, and it seems like the variable cells need to be numbers, not a list of strings.
Any ideas?
My favorite kind of question :)
Here is the model setup:
The top table shows the decision variables: = 1 if player i = A, B, ..., N of list L = QB, ..., TE is selected, = 0 otherwise.
Entries in column R (next to the top table) are the sums of each row. These must be constrained to the numbers in column T. Cell R7 is the total number of players, which should be 9: 2 flex plus 7 as per the individual list requirements.
The middle table shows the salaries (randomly generated between 50,000 and 150,000). The Sum of Salaries formula is =SUMPRODUCT(C11:P14,C3:P6); the idea is that only the salaries of selected players are taken into account. This SUMPRODUCT is constrained by the budget, which is in cell T14. For my experiment, I set it equal to 80% of the total sum of all salaries.
Objective: the bottom table shows the projected points for each player. The formula in cell R22 is =SUMPRODUCT(C19:P22,C3:P6) (same logic as with the salaries above). This is the value to be maximized.
Solver Model shown below:
I suggest selecting Simplex LP and going to Options and setting the Integer Optimality to zero (0).
Result:
Solver manages to find an optimal solution. The problem is really small, so it is very quick. Solver works with up to 200 variables and 100 constraints; for larger problems you will need the (commercial) extended version.
Of course, you can just order the real player names so that they fit this setting. For example, if you sort the players of each list alphabetically, then (Player A, QB) = first player of team QB, etc.
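The same model also ports directly to code. Here is a sketch using the third-party PuLP library in Python; the pool (14 players per list, as in the tables above), salaries, and points are randomly generated, and the flex spots fall out of requiring exactly 2 QBs, at least 2 RB / 2 WR / 1 TE, and 9 players in total:

import random
from pulp import LpProblem, LpMaximize, LpVariable, lpSum, value

random.seed(0)
# (name, position, salary, projected points), 14 players per list
pool = [(f"{pos}{i}", pos, random.randint(50_000, 150_000), random.randint(5, 25))
        for pos in ("QB", "RB", "WR", "TE") for i in range(1, 15)]

prob = LpProblem("lineup", LpMaximize)
pick = {name: LpVariable(name, cat="Binary") for name, _, _, _ in pool}

prob += lpSum(pick[n] * pts for n, _, _, pts in pool)             # objective
prob += lpSum(pick[n] * sal for n, _, sal, _ in pool) <= 800_000  # budget
prob += lpSum(pick.values()) == 9                                 # 9 players total
prob += lpSum(pick[n] for n, p, _, _ in pool if p == "QB") == 2   # exactly 2 QBs
for pos, need in (("RB", 2), ("WR", 2), ("TE", 1)):               # flex covers the rest
    prob += lpSum(pick[n] for n, p, _, _ in pool if p == pos) >= need

prob.solve()
print([n for n in pick if value(pick[n]) > 0.5])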
I hope this helps! Let me know if you would like me to upload the file for you.
Best,
Ioannis
Excel's solver is built on numerical methods. Applying it to a domain that consists of discrete values, like strings or football players, is probably going to fail. You should consider writing a brute-force solver in a "real" programming language, like C#, Java, Python, Ruby, or JavaScript. If there are performance problems, then optimize from there.
Solver won't work here because it's not a numeric solution you're after.
Make a spreadsheet that has every possible combination of position players (that meet your criteria) on each row. Then make an Excel formula that calculates projected point total based on the players in that row. Sort the spreadsheet by your projected point column.
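A hedged sketch of that brute-force enumeration in Python, with a tiny made-up pool; note that with realistically sized lists the number of combinations explodes, so you would need to prune aggressively or fall back to the integer-programming approach from the other answer:

from itertools import combinations
from collections import Counter

# (name, position, salary, projected points) - made-up data
pool = [("QB1", "QB", 9000, 22), ("QB2", "QB", 8000, 20), ("QB3", "QB", 6000, 15),
        ("RB1", "RB", 7000, 18), ("RB2", "RB", 6500, 16), ("RB3", "RB", 5000, 11),
        ("RB4", "RB", 4000, 8),  ("WR1", "WR", 7500, 17), ("WR2", "WR", 6000, 14),
        ("WR3", "WR", 4500, 10), ("TE1", "TE", 5000, 12), ("TE2", "TE", 4000, 9)]
CAP = 60_000

best = None
for team in combinations(pool, 9):
    if sum(p[2] for p in team) > CAP:                  # salary cap
        continue
    n = Counter(p[1] for p in team)
    # exactly 2 QBs; at least 2 RB, 2 WR, 1 TE; the 2 leftovers are the flex spots
    if n["QB"] != 2 or n["RB"] < 2 or n["WR"] < 2 or n["TE"] < 1:
        continue
    points = sum(p[3] for p in team)
    if best is None or points > best[0]:
        best = (points, [p[0] for p in team])

print(best)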

Statistically removing erroneous values

We have an application where users enter prices all day. These prices are recorded in a table with a timestamp and then used to produce charts of how the price has moved... Every now and then a user enters a price wrongly (e.g. puts in a zero too many or too few), which somewhat ruins the chart (you get big spikes). We've even put in an extra confirmation dialogue if the price moves by more than 20%, but this doesn't stop them from entering wrong values...
What statistical method can I use to analyse the values before I chart them, to exclude any values that are way different from the rest?
EDIT: To put some meat on the bone: say the prices are share prices (they are not, but they behave in the same way). You could see prices moving significantly up or down during the day. On an average day we record about 150 prices, and sometimes one or two are way wrong. Other times they are all good...
Calculate and track the standard deviation for a while. After you have a decent backlog, you can disregard the outliers by seeing how many standard deviations away they are from the mean. Even better, if you've got the time, you could use the info to do some naive Bayesian classification.
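A minimal sketch of that in Python; the 3-sigma cutoff is an arbitrary choice, and note that several simultaneous outliers can inflate the standard deviation itself, in which case a robust centre and scale (median and MAD) or an iterative pass works better:

import numpy as np

rng = np.random.default_rng(0)
prices = rng.normal(100, 2, 150)      # ~150 entries on an average day
prices[40] = 1004.0                   # one fat-finger entry (a zero too many)

mean, std = prices.mean(), prices.std()
z = np.abs(prices - mean) / std       # distance from the mean in std deviations
clean = prices[z < 3]
print(f"excluded {len(prices) - len(clean)} of {len(prices)} values")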
That's a great question, but it may lead to quite a bit of discussion, as the answers could be very varied. It depends on:
how much effort you are willing to put into this;
whether some entries could genuinely differ by +/-20% (or whatever test you invent), so there will always be a need for some human intervention;
and to invent a relevant test I'd need to know far more about the subject matter.
That being said, the following are possible alternatives.
A simple test against the previous value (or the mean/mode of the previous 10 or 20 values) would be straightforward to implement.
The next level of complexity would involve some statistical measurement of all values (or the previous x values, or the values of the last 3 months); a normal or Gaussian distribution would enable you to give each value a degree of certainty as to whether it is a mistake or accurate. This degree of certainty would typically be expressed as a percentage.
See http://en.wikipedia.org/wiki/Normal_distribution and http://en.wikipedia.org/wiki/Gaussian_function ; there are adequate links from these pages to help in programming this, and depending on the language you're using there are likely to be functions and/or plugins available to help.
A more advanced method could be a learning algorithm that takes other parameters into account (on top of the last x values): it could consider the product type or manufacturer, for instance, or even the time of day or the user who entered the figure. This option seems way over the top for what you need, however; it would require a lot of work to code and also to train the learning algorithm.
I think the second option is the right one for you. Using the standard deviation (a lot of languages have a function for this) may be a simpler alternative; it is simply a measure of how far a value has deviated from the mean of the x previous values. I'd put the standard-deviation option somewhere between options 1 and 2.
You could measure the standard deviation of your existing population and exclude values that are more than 1 or 2 standard deviations from the mean.
It's going to depend on what your data looks like to give a more precise answer...
Or graph a moving average of prices instead of the actual prices.
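For instance, with pandas (the window length is an arbitrary choice; a rolling median is shown as well because, unlike the mean, it suppresses a lone spike entirely):

import pandas as pd

prices = pd.Series([100.2, 100.5, 99.8, 1005.0, 100.1, 99.9, 100.4, 100.3])
avg = prices.rolling(window=5, center=True, min_periods=1).mean()
med = prices.rolling(window=5, center=True, min_periods=1).median()
print(pd.DataFrame({"raw": prices, "moving avg": avg, "moving median": med}))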
Quoting from here:
Statisticians have devised several methods for detecting outliers. All the methods first quantify how far the outlier is from the other values. This can be the difference between the outlier and the mean of all points, the difference between the outlier and the mean of the remaining values, or the difference between the outlier and the next closest value. Next, standardize this value by dividing by some measure of scatter, such as the SD of all values, the SD of the remaining values, or the range of the data. Finally, compute a P value answering this question: If all the values were really sampled from a Gaussian population, what is the chance of randomly obtaining an outlier so far from the other values? If the P value is small, you conclude that the deviation of the outlier from the other values is statistically significant.
Google is your friend, you know. ;)
For your specific question of plotting, and your specific scenario of an average of 1-2 errors per day out of 150, the simplest thing might be to plot trimmed means, or the range of the middle 95% of values, or something like that. It really depends on what value you want out of the plot.
If you are really concerned with the true max and true min of a day's prices, then you have to treat the outliers as outliers and properly exclude them, probably using one of the outlier tests proposed above (the data point is x% more than the next point, or than the last n points, or more than 5 standard deviations away from the daily mean). Another approach is to look at what happens after the outlier: if it really is an outlier, there will be a sharp upturn followed by a sharp downturn.
If, however, you care about the overall trend, plotting the daily trimmed mean, median, and 5%/95% percentiles will portray the history well.
Choose your display methods and how much outlier detection you need based on the analysis question: if you care about medians or percentiles, the outliers are probably irrelevant.
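A sketch of those daily summaries with NumPy/SciPy (the trim fraction is sized for the one or two bad entries per ~150 mentioned in the question):

import numpy as np
from scipy.stats import trim_mean

rng = np.random.default_rng(0)
day = rng.normal(100, 2, 150)            # a typical day's prices
day[7] = 1003.0                          # one fat-finger entry

print("raw mean:    ", round(day.mean(), 2))             # dragged up by the outlier
print("trimmed mean:", round(trim_mean(day, 0.025), 2))  # cut 2.5% from each tail
print("median:      ", round(float(np.median(day)), 2))
print("5%/95% band: ", np.percentile(day, [5, 95]).round(2))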
