Generate permutations - excel

I have n players to assign to n games, 10 <= n <= 20. Each player can sign up for up to 3 games but will only get one. Different players have different scores for each game they sign up for.
Example with 10 players:
It's always possible to assign player x to game x, but it will not always give the highest total score.
My goal is to get as high a score as possible, and I therefore want to test the different permutations. I could theoretically test all permutations and throw away the infeasible ones, but that gives me a huge number of possibilities (n!).
Is it possible to reduce the problem using the sign-up limit of max 3 games? Maybe this can be done more easily than my approach? Any thoughts?
I'm working in Excel VBA.
I hope you find this as interesting as I do ...
Sorry if you find this unclear! My question is whether it's possible to generate a subset of all the permutations. More precisely, only the feasible ones (which are the ones without any zero score).
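For what it's worth, the sign-up limit does let you enumerate only the feasible assignments directly, by backtracking over each player's signed-up games instead of over all n! orderings. A minimal Python sketch, assuming a hypothetical `signups` list where `signups[i]` holds the games player i signed up for:

```python
def feasible_assignments(signups, used=None, partial=None):
    """Yield only feasible assignments: player i gets one of signups[i]."""
    if used is None:
        used, partial = set(), []
    if len(partial) == len(signups):
        yield list(partial)            # one complete feasible assignment
        return
    for game in signups[len(partial)]:
        if game not in used:           # each game hosts only one player
            used.add(game)
            partial.append(game)
            yield from feasible_assignments(signups, used, partial)
            partial.pop()              # backtrack
            used.discard(game)
```

With a hypothetical `scores[player][game]` table, the best assignment is then `max(feasible_assignments(signups), key=lambda a: sum(scores[p][g] for p, g in enumerate(a)))`. With at most 3 sign-ups per player this explores at most about 3^n branches (usually far fewer because of the pruning), rather than n!, and zero-score branches are never generated at all.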

Well, just set this up in the Solver using Linear Programming, as you can see in the image. I have shown the formulae so you can build it as well, along with the Solver settings.
It won't give the permutations, but it does solve for the highest combination.
Edit: updated image... it now shows the correct ranges for the calculations, after trying to make it fit a reasonable size...
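If you ever want to check the Solver's answer outside Excel, this is a textbook assignment problem. Here is a minimal sketch using SciPy's Hungarian-algorithm solver (the 4-player score matrix is invented for illustration):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# scores[i][j] = player i's score for game j; use a large negative
# number where the player did not sign up, so those cells can't win.
NEG = -10**6
scores = np.array([
    [  5, NEG,   3, NEG],
    [NEG,   4, NEG,   2],
    [  6,   1, NEG, NEG],
    [NEG, NEG,   2,   7],
])

# Hungarian algorithm: optimal one-to-one assignment in polynomial time.
players, games = linear_sum_assignment(scores, maximize=True)
print(list(zip(players, games)), "total:", scores[players, games].sum())
```

The same model translates directly to the Solver setup in the image: binary decision variables, one game per player, one player per game, maximize the total score.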

Related

How to optimize a strategy for a simultaneous move game in excel

Good day,
As homework I've got to create a strategy in Excel that will be put up against the strategies of others. It is applied to a game that works as follows:
Two strategies are put up against each other.
You have the option to put a coin in an envelope.
If you put a coin in, your score drops by 1.
If you don't put a coin in, your score remains the same.
The players make their decisions simultaneously, then you swap envelopes with the opponent.
If you don't get a coin in the envelope, you get 0.
If there is a coin in the envelope, you get 2. Meaning: if you didn't put one in and there is a coin, you get plus 2, and if you put one in and you also get a coin, your score changes by plus 1.
If no one puts a coin in, the score doesn't change, so plus 0.
If you put one in and the opponent didn't, your score decreases by 1.
The game is to be played for 10000 rounds, with the objective to
maximize your own profit and minimize the profit of the opponent.
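To make the payoff concrete, here is the per-round scoring as I read the rules above, as a minimal Python sketch (the function name is mine):

```python
def round_score(mine, theirs):
    """Per-round payoff: putting a coin in costs 1, receiving one pays 2.
    mine/theirs are 1 (coin in envelope) or 0 (empty)."""
    return 2 * theirs - mine
```

So mutual cooperation scores +1 each, mutual defection 0, and a lone cooperator loses 1 while the defector across the table gains 2.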
I have added an image of my work so far. My tracked score is currently higher than that of a randomized opponent, yet I remain curious whether someone can beat my strategy or if anyone has any ideas for optimization.
Thanks in advance.
What I tried first was to cooperate in the first round, after which I decided to copy the opponent's move from the previous round for 5 rounds. Then I started with the thought: if he gave one in the previous round and the sum of his past 5 rounds was bigger than 2, he decided to cooperate, so we give one; otherwise we don't. Then until the end I proceed with:
=IF(AND(B10=1;SUM(B6:B10)>2);1;IF(AND(SUM(D9:D10)=0;SUM(B4:B8)>2);1;0))
So if he gave one in the previous round and the sum of his past 5 rounds is bigger than two, we give one. If not, we check whether I didn't give twice in a row because he didn't give enough (sum over five rounds > 2); then we give one, otherwise we give 0.
Here is the code for the scoring that I used:
=IF(AND(A2=C2;SUM(A2:C2)=2);1;IF(AND(A2=0;C2=1);2;IF(AND(C2=0;A2=1);-1;0)))
The tracked score is the sum of the previous rounds plus the current round.
I expect to profit more than the opponent, but so far the opponent has only been a random number generator, and I would like some help with improving the strategy.
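For anyone who wants to experiment outside the spreadsheet, here is a rough Python re-implementation of the described strategy against a random opponent (a sketch: the exact cell windows from the formula are approximated with list slices):

```python
import random

def round_score(mine, theirs):
    return 2 * theirs - mine          # putting in costs 1, receiving pays 2

def my_move(r, my_hist, opp_hist):
    if r == 0:
        return 1                      # cooperate in the first round
    if r < 6:
        return opp_hist[-1]           # copy the opponent for 5 rounds
    if opp_hist[-1] == 1 and sum(opp_hist[-5:]) > 2:
        return 1                      # he has been cooperating enough
    if sum(my_hist[-2:]) == 0 and sum(opp_hist[-7:-2]) > 2:
        return 1                      # try to break a defection spiral
    return 0

my_hist, opp_hist = [], []
my_total = opp_total = 0
for r in range(10000):
    mine, theirs = my_move(r, my_hist, opp_hist), random.randint(0, 1)
    my_total += round_score(mine, theirs)
    opp_total += round_score(theirs, mine)
    my_hist.append(mine)
    opp_hist.append(theirs)

print("me:", my_total, "opponent:", opp_total)
```

Swapping in other opponents for `random.randint` makes it easy to see how the strategy fares against, say, always-defect or tit-for-tat.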

Pointwise Mutual Information Formula Clarification

Pointwise Mutual Information, or PMI for short, is given as
$$\mathrm{PMI} = \log_2 \frac{P(\text{bigram})}{P(\text{1st word}) \cdot P(\text{2nd word})}$$
Which is the same as:
$$\log_2 \frac{\frac{\text{BigramOccurrences}}{N}}{\frac{\text{1stWordOccurrences}}{N} \cdot \frac{\text{2ndWordOccurrences}}{N}}$$
Where BigramOccurrences is the number of times the bigram appears as a feature, 1stWordOccurrences is the number of times the 1st word in the bigram appears as a feature, and 2ndWordOccurrences is the number of times the 2nd word from the bigram appears as a feature. Finally, N is given as the total number of words.
We can tweak this formula a bit and get the following:
$$\log_2 \frac{\text{BigramOccurrences} \cdot N}{\text{1stWordOccurrences} \cdot \text{2ndWordOccurrences}}$$
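For reference, here is a minimal Python sketch of the standard computation, using the common convention that N is the total number of tokens (which is exactly the point the question below is probing):

```python
import math
from collections import Counter

def bigram_pmi(tokens):
    """PMI for every adjacent bigram, with N = total token count."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)  # N: total number of words in the corpus
    return {bg: math.log2(c * n / (unigrams[bg[0]] * unigrams[bg[1]]))
            for bg, c in bigrams.items()}

print(bigram_pmi("new york is a city in new york state".split()))
```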
Now the part that confuses me a bit is the N in the formula. From what I understand it should be the total number of feature occurrences, even though it is described as the total number of words. So essentially I wouldn't count the total number of words in the dataset (as after some preprocessing that doesn't seem to make sense to me), but rather I should count the total number of times all bigrams that are features have appeared, as well as single words. Is this correct?
Finally, one other thing that confuses me a bit is when I work with more than bigrams, for example when trigrams are also part of the features. When calculating PMI for a specific bigram, I would then not consider the count of trigrams for N in the given formula? Vice versa, when calculating PMI for a single trigram, N wouldn't account for the number of bigrams. Is this correct?
If I misunderstood something about the formula, please let me know, as the resources I found online don't make it really clear to me.

Creating a random baseball batting order in excel

I am trying to build a baseball spreadsheet for my little league team. I have 10 players and 15 games. I am trying to figure out a way to populate the batting order so that everyone gets an equal number of games batting 1-10 in the order.
Does anyone know of an easy way to do this?
As mentioned in the comments, it's impossible to evenly distribute 10 players across 10 batting order positions over 15 games; it's like trying to divide 15 by 10 evenly. So you'll have to get creative about how to handle the five remaining games (fundraiser?).
Random but Fair For Ten Games
In answer to your question of creating a random batting order, you'd probably only want to randomize once, and then offset the order by 1 position each game. This gets each player into each position exactly once over ten games, as in the sketch below.
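Here's a minimal Python sketch of that shuffle-once-then-rotate scheme (the roster names are placeholders):

```python
import random

roster = [f"Player {i}" for i in range(1, 11)]   # 10 hypothetical players
random.shuffle(roster)                           # randomize exactly once

# Rotating by one slot per game gives every player every batting
# position exactly once across ten games.
for game in range(10):
    order = roster[game:] + roster[:game]
    print(f"Game {game + 1}:", ", ".join(order))
```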
Random Each Game
If you truly wanted random each game, you'd probably end up with an undesirable distribution, with some kids getting to bat first multiple times while others never get a turn (the hazards of true randomness).
I built out a quick spreadsheet on Google Sheets here so you can see how such a tool could work. If you download it and hit delete in a blank cell, it will re-randomize the sheet. It uses the functions INDEX, RANK, and RAND.
Good luck.

Small data anomaly detection algo

I have the following 3 cases of a numeric metric on a time series (t, t1, t2, etc. denote different hourly comparisons across periods).
If you look at the 3 graphs, t (the period of interest) clearly has a drop-off in image 1 but not so much in images 2 and 3. Assume this is some sort of numeric metric (raw or derived), and I want to create a system/algorithm which specifically catches case 1 but not cases 2 or 3, with t being the point of interest. While visually this makes sense and is very intuitive, I am trying to design a way to do this in Python using the dataframes shown in the picture.
Generally the problem is: how do I detect when the time series is behaving very differently from any of the prior weeks?
Edit: When I say different, what I really mean is: my metric trends together across periods t1 to t4, but if one period doesn't and tries to separate out of the envelope, that to me is an anomaly. If you look at chart 1 you can see that t tries to split out from the rest of the tn; this is an anomaly for me. In the other cases t is within the bounds of the other time periods. Hope this helps.
With small data, the best approach is to come up with a good transformation into a simpler representation.
In this case I would try the following:
Distance to the median along the time axis, then a summary of that; the summary could be the median, mean squared error, etc.
Median of the cross-correlation of the signals.
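A minimal sketch of the first idea, assuming the data comes as a 2-D array with one row per period (t first) and one column per hour, as in the pictured dataframes:

```python
import numpy as np

def t_is_anomalous(periods, z_cut=3.0):
    """periods: rows = [t, t1, ..., tn], columns = hourly values.
    Summarize each row's distance to the pointwise median curve, then
    flag t if its summary sits far outside the prior periods'."""
    median_curve = np.median(periods, axis=0)
    mse = np.mean((periods - median_curve) ** 2, axis=1)  # one score per row
    baseline = mse[1:]                                    # the prior weeks
    return mse[0] > baseline.mean() + z_cut * baseline.std()

rng = np.random.default_rng(0)
normal_weeks = rng.normal(10, 1, size=(5, 24))  # made-up data: t tracks the rest
print(t_is_anomalous(normal_weeks))
```

The cross-correlation variant would replace the MSE summary with a correlation of t against each prior week and take the median of those.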

Statistically removing erroneous values

We have an application where users enter prices all day. These prices are recorded in a table with a timestamp and then used to produce charts of how the price has moved... Every now and then a user enters a price wrongly (e.g. puts in one zero too many or too few), which somewhat ruins the chart (you get big spikes). We've even put in an extra confirmation dialog if the price moves by more than 20%, but this doesn't stop them entering wrong values...
What statistical method can I use to analyse the values before I chart them to exclude any values that are way different from the rest?
EDIT: To add some meat to the bone: say the prices are share prices (they are not, but they behave in the same way). You could see prices moving significantly up or down during the day. On an average day we record about 150 prices, and sometimes one or two are way wrong. Other times they are all good...
Calculate and track the standard deviation for a while. After you have a decent backlog, you can disregard the outliers by seeing how many standard deviations away they are from the mean. Even better, if you've got the time, you could use the info to do some naive Bayesian classification.
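A minimal Python sketch of that approach (one caveat: a large outlier inflates both the mean and the SD it is judged against, so with very wild entries a median/MAD-based variant is more robust):

```python
import numpy as np

def drop_outliers(prices, z_cut=3.0):
    """Keep only prices within z_cut standard deviations of the mean."""
    prices = np.asarray(prices, dtype=float)
    z = np.abs(prices - prices.mean()) / prices.std()
    return prices[z < z_cut]

rng = np.random.default_rng(1)
day = np.append(rng.normal(100, 2, 149), 1010.0)  # ~150 entries, one typo
print(len(drop_outliers(day)))                    # 149: the 1010 is gone
```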
That's a great question, but it may lead to quite a bit of discussion as the answers could be very varied. It depends on:
how much effort you are willing to put into this;
whether some entries could genuinely differ by +/-20% (or fail whatever test you invent), so there will always be a need for some human intervention;
and to invent a relevant test I'd need to know far more about the subject matter.
That being said, the following are possible alternatives.
A simple test against the previous value (or the mean/mode of the previous 10 or 20 values) would be straightforward to implement.
The next level of complexity would involve some statistical measurement of all values (or the previous x values, or values from the last 3 months); a normal or Gaussian distribution would enable you to give each value a degree of certainty as to whether it is a mistake vs. accurate. This degree of certainty would typically be expressed as a percentage.
See http://en.wikipedia.org/wiki/Normal_distribution and http://en.wikipedia.org/wiki/Gaussian_function; there are adequate links from these pages to help in programming this, and depending on the language you're using there are likely to be functions and/or plugins available to help.
A more advanced method could be to have some sort of learning algorithm that takes other parameters into account (on top of the last x values); it could take the product type or manufacturer into account, for instance, or even monitor the time of day or the user who entered the figure. This option seems way over the top for what you need, however; it would require a lot of work to code and also to train the learning algorithm.
I think the second option is the correct one for you. Using the standard deviation (a lot of languages contain a function for this) may be a simpler alternative; it is simply a measure of how far a value has deviated from the mean of the x previous values. I'd put the standard deviation option somewhere between options 1 and 2.
You could measure the standard deviation in your existing population and exclude those that are greater than 1 or 2 standard deviations from the mean?
Giving a more precise answer is going to depend on what your data looks like...
Or graph a moving average of prices instead of the actual prices.
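For instance, with pandas (a sketch; note that a rolling median flattens a spike outright, while a rolling mean only dilutes it):

```python
import pandas as pd

prices = pd.Series([100, 101, 99, 1010, 102, 100, 98])  # one bad entry
smoothed = prices.rolling(window=5, min_periods=1).mean()
robust = prices.rolling(window=5, min_periods=1).median()
print(robust.tolist())  # the 1010 spike no longer dominates the chart
```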
Quoting from here:
Statisticians have devised several methods for detecting outliers. All the methods first quantify how far the outlier is from the other values. This can be the difference between the outlier and the mean of all points, the difference between the outlier and the mean of the remaining values, or the difference between the outlier and the next closest value. Next, standardize this value by dividing by some measure of scatter, such as the SD of all values, the SD of the remaining values, or the range of the data. Finally, compute a P value answering this question: If all the values were really sampled from a Gaussian population, what is the chance of randomly obtaining an outlier so far from the other values? If the P value is small, you conclude that the deviation of the outlier from the other values is statistically significant.
Google is your friend, you know. ;)
For your specific question of plotting, and your specific scenario of an average of 1-2 errors per day out of 150, the simplest thing might be to plot trimmed means, or the range of the middle 95% of values, or something like that. It really depends on what value you want out of the plot.
If you are really concerned with the true max and true min of a day's prices, then you have to deal with the outliers as outliers and properly exclude them, probably using one of the outlier tests previously proposed (the data point is x% more than the next point, or than the last n points, or more than 5 standard deviations away from the daily mean). Another approach is to look at what happens after the outlier: if it really is an outlier, there will be a sharp upturn followed by a sharp downturn.
If however you care about the overall trend, plotting the daily trimmed mean, median, and 5% and 95% percentiles will portray the history well.
Choose your display methods and how much outlier detection you need to do based on the analysis question. If you care about medians or percentiles, the outliers are probably irrelevant.
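A minimal sketch of both ideas, using SciPy's trimmed mean and NumPy percentiles on made-up data (with roughly 150 prices a day, trimming 2-5% comfortably absorbs one or two bad entries):

```python
import numpy as np
from scipy.stats import trim_mean

rng = np.random.default_rng(2)
day = np.append(rng.normal(100, 2, 149), 1010.0)  # ~150 prices, one typo

# Mean after discarding the top and bottom 5% of values.
print(trim_mean(day, proportiontocut=0.05))       # ~100, unaffected by 1010

# Middle-95% band: a robust daily "range" for plotting.
lo, hi = np.percentile(day, [2.5, 97.5])
print(lo, hi)                                     # both in the normal range
```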
