Manager game: How to calculate market values?

Usually players in a soccer manager game have market values. The managers sell their players in accordance with these market values. They think: "Oh, the player is worth 3,000,000, so I'll try to sell him for 3,500,000".
All players have four basic qualities:
strength value (1-99)
maximal strength they can ever attain (1-99)
motivation (1-5)
current age (16-40)
At the moment I calculate the market values from these qualities alone. But I would like to calculate the market values dynamically, according to the player transfers of the recent past. How could I do this?
I have the qualities named above and the player transfers of the recent past available for the calculation.
How could I calculate it? Do I have to group the recently transferred players by their qualities and simply take the average transfer price?
I hope you can help me.
Note: players=items/goods, managers=users

My suggestion: define a distance function that takes two players' stats and returns a distance value. Once you have a distance between two players (which corresponds to how similar they are), you can use the k-means algorithm to find clusters of similar players.
For each cluster you can then take a number of values that help you calculate the so-called 'market price' (like the average or the median).
Here's a very simple example of how you could compute the distance function between two players:
// each term is normalised by the attribute's range so that all four
// attributes contribute equally to the distance
float distance(Player player1, Player player2) {
    float distance = 0.0;
    distance += abs(player1.strength - player2.strength) / strengthRange;
    distance += abs(player1.maxStrength - player2.maxStrength) / maxStrengthRange;
    distance += abs(player1.motivation - player2.motivation) / motivationRange;
    distance += abs(player1.age - player2.age) / ageRange;
    return distance;
}
Now that you have the distance function you can apply the k-means algorithm:
Assign each player randomly to a cluster.
Now compute the centroid of each cluster. In your case the centroid coordinates will be (strength, maxStrength, motivation, age). To compute the centroid's strength coordinate, for example, just average the strengths of all players in the cluster.
Now assign each player to the nearest centroid. Note that in this step some players may have their cluster changed.
Repeat steps 2 and 3 until you have convergence or, in other words, until no player has its cluster changed in step 3.
Now that you have the clusters, you can calculate the average price for similar players.
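For illustration, here is a minimal sketch of this clustering step in Python using scikit-learn's KMeans (it uses Euclidean distance on range-scaled attributes rather than the hand-written distance above; the sample transfers, the ranges and the cluster count are made-up assumptions):

import numpy as np
from sklearn.cluster import KMeans

# (strength, maxStrength, motivation, age, last transfer price) -- invented sample data
transfers = [
    (70, 85, 4, 22, 3_500_000),
    (72, 80, 3, 27, 2_800_000),
    (55, 90, 5, 18, 4_200_000),
    # ... more recent transfers
]

ranges = np.array([98.0, 98.0, 4.0, 24.0])             # ranges of the four attributes
features = np.array([t[:4] for t in transfers]) / ranges
prices = np.array([t[4] for t in transfers])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)

# median recent transfer price per cluster = estimated market value
for cluster in range(kmeans.n_clusters):
    print(cluster, np.median(prices[kmeans.labels_ == cluster]))

To price a player who was not transferred recently, scale his attributes the same way, find his nearest centroid with kmeans.predict(), and use that cluster's median price.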

One thing that you could do is look at recent transfers of similar(1) players, say all transfers of similar players within the last 2-5 game weeks, and then take the average (or median, or some other calculated value) of their sale prices.
(1) You will have to define 'similar' in some way, e.g. a defender within ±10 in defence, ±3 in passing and ±2 years of age. More factors give more precise results.
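As a rough sketch of this in Python (the tolerance windows, the field names and the recent_transfers structure are assumptions, not something taken from your game):

from statistics import median

def estimate_value(player, recent_transfers,
                   strength_tol=10, motivation_tol=2, age_tol=2):
    # median sale price of recently transferred players within the tolerance windows
    similar_prices = [
        t["price"] for t in recent_transfers
        if abs(t["strength"] - player["strength"]) <= strength_tol
        and abs(t["motivation"] - player["motivation"]) <= motivation_tol
        and abs(t["age"] - player["age"]) <= age_tol
    ]
    return median(similar_prices) if similar_prices else None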

Or you could use a little Economics 101 and try to define the supply and demand for that specific player based on:
Number of players in the league with similar capabilities (you could use the clustering method mentioned before) and number of those players "available" for transfer
Number of teams that own the players with similar capabilities and number of teams that are in need for such players
Now with these numbers you can estimate the supply (players available for transfer) and the demand (teams in need of those players) and use that to adjust your base price (which can be your last transfer price or a base price for the player) up or down (i.e. more demand than supply will tend to push the price up, and vice versa), as sketched below.
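A toy version of that adjustment (the base price, the counting of available players and interested teams, and the sensitivity exponent are all assumptions made for the sake of the example):

def adjusted_price(base_price, players_available, teams_in_need, sensitivity=0.5):
    if players_available == 0:
        players_available = 1                     # treat zero supply as extremely scarce
    ratio = teams_in_need / players_available     # demand / supply
    return base_price * ratio ** sensitivity

# e.g. three teams chasing one comparable player pushes a 3,000,000 base price
# up to roughly 5,200,000: adjusted_price(3_000_000, 1, 3)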
After that it becomes a negotiation game, where you can take a look at some of the game theory literature to settle on the actual exchange price.
Hope this at least gives you a different angle on it.

Related

QuantLib: Building Key Rate Risks

I was able to build a discount curve for the Treasury market. However, I'm looking to use this to find the key rate risks of an individual bond (and eventually a portfolio of bonds).
The key rate risk I'm looking for is: if I have a 30Y bond and we shift the 1Y rate that was used to discount the bond, while holding the other rates constant, how much does the price of the bond change by? Repeating this for the other tenors (e.g. 2Y, 5Y, 7Y, etc.) and summing the results should get you to the overall duration of the bond, but it provides a better view of how the risk exposure breaks down.
http://www.investinganswers.com/financial-dictionary/bonds/key-rate-duration-6725
Is anyone aware of any documentation that demonstrates how to do this? Thank you.
Given that you have already built the bond and the discount curve, and you have linked them in some way similar to:
discount_handle = RelinkableYieldTermStructureHandle(discount_curve)
bond.setPricingEngine(DiscountingBondEngine(discount_handle))
you can first add a spread over the existing discount curve and then use the modified curve to price the bond. Something like:
nodes = [ 1, 2, 5, 7, 10 ]  # the tenors, in years
dates = [ today + Period(n, Years) for n in nodes ]
spreads = [ SimpleQuote(0.0) for n in nodes ]  # null spreads to begin
new_curve = SpreadedLinearZeroInterpolatedTermStructure(
    YieldTermStructureHandle(discount_curve),
    [ QuoteHandle(q) for q in spreads ],
    dates)
will give you a new curve with initial spreads all at 0 (and a horrible class name) that you can use instead of the original discount curve:
discount_handle.linkTo(new_curve)
After the above, the bond should still return the same price (since the spreads are all null).
When you want to calculate a particular key-rate duration, you can move the corresponding quote: for instance, if you want to bump the 5-years quote (the third in the list above), execute
spreads[2].setValue(0.001) # 10 bps
the curve will update accordingly, and the bond price should change.
A note: the above will interpolate between spreads, so if you move the 5-year point by 10 bps and you leave the 2-year point unchanged, then a rate around 3 years would move by about 3 bps. To mitigate this (in case that's not what you want), you can add more points to the curve and restrict the range that varies. For instance, if you add a point at 5 years minus one month and another at 5 years plus one month, then moving the 5-year point will only affect the two months around it.
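Putting the pieces together, a bump-and-reprice loop over the nodes defined above might look like this sketch (the 1 bp bump size and the use of bond.NPV() as the price are assumptions; use whichever price and bump convention you prefer):

bump = 0.0001                 # 1 bp
base_price = bond.NPV()

key_rate_durations = []
for quote in spreads:
    quote.setValue(bump)      # shift this tenor only
    bumped_price = bond.NPV()
    quote.setValue(0.0)       # reset before the next tenor
    key_rate_durations.append((base_price - bumped_price) / (base_price * bump))

# the sum of the key-rate durations should be close to the bond's overall duration
print(list(zip(nodes, key_rate_durations)))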

Handling Negative Values when Calculating Average & Value

I am working on a tool for Fantasy Football that calculates the average value a player offers per million pounds of cost. It essentially boils down to their average points per game divided by their cost.
So for example, a player who costs £10m and scores an average of 5 points per game offers 0.5 points per game per million. Whereas a player who costs £8m and scores an average of 5 points per game offers 0.625 points per game per million. Clearly the player who costs £8m is better value.
My problem is, players are capable of scoring negatively, and so how do I account for that in calculating the value of a player?
To give another example, a player who costs £10m and scores an average of -2 points per game offers -0.2 points per game per million. Whereas a player who costs £8m and scores an average of -2 points per game offers -0.25 points per game per million.
Now the player who costs £10m appears to be better value because their PPG/£m is higher. This shouldn't be true: they can't be better value if they cost more but score the same points. So if I have a list of players sorted by their value, calculated in this manner, some players will incorrectly show higher than players that are technically better value.
Is there a way to account for this problem? Or is it just an unfortunate fact of the system I'm using?
One simple trick would be to slightly change your PPG/£m formula to the ratio of the square of the average points scored to the cost.
If you are particular about the scales, consider its positive square root.
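For clarity, here is the suggested tweak as a tiny worked example (a sketch only; the function and argument names are made up):

def value(avg_points, cost_millions):
    # squared points per million; take its positive square root if you prefer the original scale
    return avg_points ** 2 / cost_millions

print(value(-2, 10))   # 0.4
print(value(-2, 8))    # 0.5 -> the cheaper player ranks higher again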

Tests to Compare Sales Mix Percent between Periods

Background
I wish to compare menu sales mix ratios for two periods.
A menu is defined as a collection of products (e.g., a hamburger, a club sandwich, etc.).
A sales mix ratio is defined as a product's sales volume in units (e.g., 20 hamburgers) relative to the total number of menu units sold (e.g., 100 menu items were sold). In the hamburger example, the sales mix ratio for hamburgers is 20% (20 burgers / 100 menu items). This represents the share of total menu unit sales.
A period is defined as a time range used for comparative purposes (e.g., lunch versus dinner, Mondays versus Fridays, etc.).
I am not interested in overall changes in the volume (I don't care whether I sold 20 hamburgers in one period and 25 in another). I am only interested in changes in the distribution of the ratios (20% of my units sold were hamburgers in one period and 25% were hamburgers in another period).
Because the sales mix represents a share of the whole, the mean average for each period will be the same; the mean difference between the periods will always be 0%; and, the sum total for each set of data will always be 100%.
Objective:
Test whether the sales distribution (sales mix percentage of each menu item relative to other menu items) changed significantly from one period to another.
Null Hypothesis: the purchase patterns and preferences of customers in period A are the same as those for customers in period B.
Example of potential data input:
[Menu Item] [Period A] [Period B]
Hamburger 25% 28%
Cheeseburger 25% 20%
Salad 20% 25%
Club Sandwich 30% 27%
Question:
Do common methods exist to test whether the distribution of share-of-total is significantly different between two sets of data?
A paired t-test would have worked if I were measuring a change in the number of actual units sold, but not (I believe) for a change in share of total units.
I've been searching online and a few text books for a while with no luck. I may be looking for the wrong terminology.
Any direction, be it search terms or (preferably) the actual names of appropriate tests, is appreciated.
Thanks,
Andrew
EDIT: I am considering a Pearson correlation test as a possible solution - forgetting that each row of data is an independent menu item, the math shouldn't care. A perfect match (identical sales mix) would receive a coefficient of 1, and the greater the change the lower the coefficient would be. One potential issue is that, unlike a regular correlation test, the changes may be amplified because any change to one number automatically impacts the others. Is this a viable solution? If so, is there a way to temper the amplification issue?
Consider using a chi-squared goodness-of-fit test as a simple solution to this problem:
H0: the proportions of menu items in period B are the same as in period A
Ha: at least one of the proportions of menu items in period B is different from period A
There is a nice tutorial here.
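A minimal sketch with scipy (note that the chi-squared test works on raw unit counts rather than percentages, so use the underlying counts; the numbers below are just an illustration of the table in the question):

from scipy.stats import chisquare

period_a = [25, 25, 20, 30]    # hamburger, cheeseburger, salad, club sandwich
period_b = [28, 20, 25, 27]    # unit counts sold in period B

# expected counts for B under H0: apply period A's mix to period B's total volume
expected = [a / sum(period_a) * sum(period_b) for a in period_a]

stat, p_value = chisquare(f_obs=period_b, f_exp=expected)
print(stat, p_value)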

Excel Solver Using Strings

I'm going to try to explain this the best that I can.
Right now I have a spreadsheet with a list of football players, each of which has an assigned salary and projected point total for the week.
My goal is to use Solver or some other method to determine the best combination of players to maximize the projected point total while staying under a salary cap.
In this example I have 4 separate player lists, like this:
QB: Player A, Player B, Player C...Player N
RB: Player a, Player b, Player c...Player N
WR: Player X, Player Y, Player Z...Player N
TE: Player x, Player y, Player z...Player N
I need the best combination that includes 2 QBs, 2 RBs, 2 WRs, 1 TE, and 2 "Flex", which means any of RB/WR/TE.
I have tried using Solver to maximize the projected point total, but the variable fields in this case would be the players' names, and it seems like the variable fields need to be numbers, not a list of strings.
Any ideas?
My favorite kind of question :)
Here is the model setup:
The top table shows the decision variables: 1 if player i = A, B, ..., N of list L = QB, ..., TE is selected, 0 otherwise.
Entries in column R, (next to the top table) are the sums of each row. These must be constrained with the numbers in column T. Cell R7 is the total sum of players, which should be 9: 2 flexible and 7 as per the individual list requirements.
Middle table shows the salaries (randomly generated between 50,000 and 150,000). The Sum of Salaries formula is =SUMPRODUCT(C11:P14,C3:P6). The idea here is that only the salaries of players that are selected are taken into account. This SUMPRODUCT should be constrained with the budget, which is in cell T14. For my experiment, I put it equal to 80% of the total sum of all salaries.
Objective: Bottom table shows the projected points for each player. The formula in cell R22 is =SUMPRODUCT(C19:P22,C3:P6) (same logic as with salaries above). This is the value to be maximized.
Solver Model shown below:
I suggest selecting Simplex LP and going to Options and setting the Integer Optimality to zero (0).
Result:
Solver manages to find an optimal solution. The problem is really small and it is very quick. Solver works with up to 200 variables and 100 constraints; for larger problems you will need the (commercial) extended version.
Of course, you can just order the real player names so that they fit this setting. For example, if you sort the players of each list alphabetically, then (Player A, QB) = the first player of the QB list, etc.
I hope this helps! Let me know if you would like me to upload the file for you.
Best,
Ioannis
Excel's Solver is built on numerical methods. Applying it to a domain that consists of discrete values, like strings or football players, is probably going to fail. You should consider writing a brute-force solver in a "real" programming language, like C#, Java, Python, Ruby, or JavaScript. If there are performance problems, then optimize from there.
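If you go down that route, a brute-force sketch in Python could look like this (all of the player data, salaries and the cap are invented; for realistically sized lists you would want an integer-programming solver rather than full enumeration):

from itertools import combinations

players = {
    "QB": [("QB A", 7000, 20.1), ("QB B", 6500, 18.3), ("QB C", 6000, 17.0)],
    "RB": [("RB a", 8000, 19.5), ("RB b", 7200, 16.2), ("RB c", 5400, 12.8), ("RB d", 5000, 11.1)],
    "WR": [("WR X", 8800, 21.0), ("WR Y", 6900, 15.4), ("WR Z", 5600, 13.3), ("WR W", 4800, 10.9)],
    "TE": [("TE x", 6100, 14.0), ("TE y", 4500, 9.7), ("TE z", 4000, 8.2)],
}
SALARY_CAP = 50000

best = (0.0, None)
for qbs in combinations(players["QB"], 2):
    # RB/WR/TE counts including the two flex spots: 7 players, at least 2 RB, 2 WR, 1 TE
    for n_rb, n_wr, n_te in [(2, 2, 3), (2, 3, 2), (3, 2, 2), (2, 4, 1), (4, 2, 1), (3, 3, 1)]:
        for rbs in combinations(players["RB"], n_rb):
            for wrs in combinations(players["WR"], n_wr):
                for tes in combinations(players["TE"], n_te):
                    lineup = qbs + rbs + wrs + tes
                    salary = sum(p[1] for p in lineup)
                    points = sum(p[2] for p in lineup)
                    if salary <= SALARY_CAP and points > best[0]:
                        best = (points, lineup)

print(best)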
Solver won't work here because it's not a numeric solution you're after.
Make a spreadsheet that has every possible combination of position players (that meet your criteria) on each row. Then make an Excel formula that calculates projected point total based on the players in that row. Sort the spreadsheet by your projected point column.

On calculating temperature differences between zipcodes

Prop A. I wrote a zipcode server that gives me the 32,000 zip codes of the USA.
Each zipcode has an associated lat-long.
Given 2 zipcodes, I can find the distance between them using their lat-longs.
Prop B. I also wrote a weather server where you can input at most 200 zipcodes and it spits out the temperature at each of those zipcodes.
Person tells me his zipcode is Z, temperature is T.
He asks me: what's the nearest place from Z where it's at least 10 degrees cooler?
So I get the 200 zipcodes nearest to Z, sorted by distance (using Prop A).
I feed that to B and get 200 temperatures.
If none are 10 degrees cooler, I get the next 200 zipcodes and repeat until done.
Problem: This seems quite inefficient and brute-force. I feel there's some physics insight I'm missing. It's not always true that the temperatures cool down if you go north and heat up if you go south, so direction doesn't help. Altitude probably does (mountains are cooler than valleys), but zipcode data keyed to altitude is hard to find.
Can you guys think of some smarter way to go about this ? Any suggestions appreciated.
Note: The weather data is expensive. You can hit the weather server only a few times, and you can only get 200 temperatures each time. (On the other hand, the distances between any two zipcodes are precomputed constants, and there is no cost to get them.)
You could do it by sorting all of the zip codes by temperature, grabbing all of those in the sorted list that are at least 10 degrees cooler than the user's zip code, then sorting that subset by distance. This should be reasonably fast: finding the cutoff in the sorted list is a binary search, which is O(log n), so you won't kill yourself on the lookups.
I agree with the comment from the physics forum that this is not a physics problem, but some insights from physics (or mathematics at least) might indeed be in order. If you are able to cheaply obtain the weather data, you might be able to set up a dataset and perform analysis once to guide your search.
Specifically, record the temperature for each location concurrently. Then, for each location, calculate the change in temperature to each neighboring zip code, associate it with a relative coordinate (i.e. the direction to the neighboring zip), and store this list ordered by temperature. When someone enters a query zip, your algorithm would start with the zip at the top of the list and work its way down. Each non-satisfactory answer is added to a stack. If none of the neighboring zips meets the criteria (in this case 10 degrees cooler), the algorithm starts working through the new stack, repeating the process.
I am not a wonderful programmer so I won't give any code, but it seems to me that this would "follow" the natural contours of the temperature map better than a brute-force search and would keep the proximity of the result as the priority. If you set up your initial dataset with several concurrent temperature measurements, you could time-average them for better results.
This is best for stackoverflow.
Combine the databases.
Write a query: abs(lat - lat_0) + abs(long - long_0) < 2.00 AND temp < temp_0 - 10. That query can take advantage of indexing on your server.
If no results, increase 2.00 by a multiple and repeat.
If there are results, find which is closest. If the closest one is farther than the nearest edge of your bounding box, save that entry, increase 2.00 to that distance, and see if one of the new results is closer.
This scales and uses the database efficiently.
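As a rough Python sketch of that expanding-box search (the row format, the box increment and the cutoff are assumptions; in practice you would express this as the SQL query above against the combined table):

def nearest_cooler_zip(rows, lat0, lon0, temp0, box=2.0, step=2.0, max_box=30.0):
    # rows: the combined zipcode/temperature table, e.g. dicts with lat, lon, temp, zip
    while box <= max_box:
        candidates = [
            r for r in rows
            if abs(r["lat"] - lat0) + abs(r["lon"] - lon0) < box
            and r["temp"] < temp0 - 10
        ]
        if candidates:
            # only now compute true distances, on the short candidate list
            return min(candidates,
                       key=lambda r: (r["lat"] - lat0) ** 2 + (r["lon"] - lon0) ** 2)
        box += step
    return None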
