Handling Negative Values when Calculating Average & Value

I am working on a tool for Fantasy Football that calculates the average value a player offers per million pounds of cost. It essentially boils down to their average points per game divided by their cost.
So, for example, a player who costs £10m and scores an average of 5 points per game offers 0.5 points per game per million, whereas a player who costs £8m and scores an average of 5 points per game offers 0.625 points per game per million. Clearly the player who costs £8m is better value.
My problem is that players can score negative points, so how do I account for that when calculating a player's value?
To give another example, a player who costs £10m and scores an average of -2 points per game offers -0.2 points per game per million, whereas a player who costs £8m and scores an average of -2 points per game offers -0.25 points per game per million.
Now the player who costs £10m appears to be better value because their PPG/£m is higher. That shouldn't be true: they can't be better value if they cost more but score the same points. So if I sort a list of players by value calculated this way, some players will incorrectly rank above players who are actually better value.
Is there a way to account for this problem? Or is it just an unfortunate fact of the system I'm using?
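The ranking problem can be reproduced with a few lines of Python using the question's formula (average points per game divided by cost in £m) and the four example players; the dictionary labels are illustrative.

```python
def value_per_million(avg_points, cost_millions):
    """Points per game per £1m, as defined in the question."""
    return avg_points / cost_millions

# (average points per game, cost in £m) for the four example players
players = {
    "£10m, 5 PPG": (5, 10),
    "£8m, 5 PPG": (5, 8),
    "£10m, -2 PPG": (-2, 10),
    "£8m, -2 PPG": (-2, 8),
}
ranked = sorted(players, key=lambda name: value_per_million(*players[name]), reverse=True)
print(ranked)  # ['£8m, 5 PPG', '£10m, 5 PPG', '£10m, -2 PPG', '£8m, -2 PPG']
```

The sort puts the £8m player first within the 5 PPG pair but last within the -2 PPG pair, which is exactly the inversion described above.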

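One simple trick is to change your PPG/£m formula slightly: use the square of the player's average points divided by the cost.
If you are particular about the scale, take its positive square root. Note that squaring discards the sign, so a player averaging -2 points per game gets the same value as one averaging +2 at the same cost.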
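A minimal Python sketch of the squared formula and its square-root variant (function names are illustrative). The two printed checks show the intended effect on the -2 PPG pair, where the cheaper player now ranks higher, and that a +2 and a -2 player at the same price become indistinguishable.

```python
import math

def squared_value(avg_points, cost_millions):
    """The answer's suggestion: (average points per game)^2 / cost in £m."""
    return avg_points ** 2 / cost_millions

def rooted_value(avg_points, cost_millions):
    """Positive square root of squared_value, i.e. |avg_points| / sqrt(cost)."""
    return math.sqrt(squared_value(avg_points, cost_millions))

print(squared_value(-2, 10), squared_value(-2, 8))  # 0.4 0.5: the £8m player now ranks higher
print(squared_value(2, 8) == squared_value(-2, 8))  # True: the sign is lost
```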

Related

In Excel, how do I find the margin of error and confidence intervals for surveys with different sample sizes and population sizes?

I'm calculating the NPS (Net Promoter Score) for about 50 different sessions at a recent event. Each session was attended by about 50-500 people, and the number of survey responses per session ranges from 15 to 400.
If I know:
The number of respondents for each session (sample size)
The number of attendees for each session (population size)
The NPS score for each session (basically an average rating; more info below)
How can I figure out the margin of error and/or confidence intervals for each session in Excel?
What formula would I use where, for example, X = sample size, Y = population size, and Z = avg rating?
I don't need this to be exact as long as I'm in the ballpark, so you can ignore the NPS part, which might throw things off slightly:
This is slightly complicated by the fact that NPS is a weird metric.
It asks "How likely would you be to recommend X to a friend or colleague?" with a scale from 0-10 (10 = extremely likely, 0 = not at all likely). You then count every 10 and 9 as a "promoter," count every 8 and 7 as a "neutral" or "passive," and count everything between 6 and 0 as a "detractor."
You then get the NPS by subtracting the detractors from the promoters, dividing that number by the total responses, then multiplying it by 100, so: ((Promoters - Detractors)/(Total responses))*100. NPS sort of flattens every response to a +1, 0, or -1, so it might complicate the calculations.
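The NPS arithmetic above can be sketched in Python (function names and the sample ratings are illustrative). The second function applies the +1/0/-1 flattening; averaging those values and multiplying by 100 gives the same number as the NPS formula, which is why NPS can be treated as an average rating on a -100 to +100 scale.

```python
def nps(ratings):
    """Net Promoter Score from 0-10 ratings: ((promoters - detractors) / total) * 100."""
    promoters = sum(1 for r in ratings if r >= 9)    # 9s and 10s
    detractors = sum(1 for r in ratings if r <= 6)   # 0 through 6
    return (promoters - detractors) / len(ratings) * 100

def flattened(ratings):
    """Map each rating to +1 (promoter), 0 (passive, 7-8) or -1 (detractor)."""
    return [1 if r >= 9 else -1 if r <= 6 else 0 for r in ratings]

ratings = [10, 9, 8, 7, 6, 0, 10, 9, 3, 8]   # made-up responses for one session
score = nps(ratings)
same_score = 100 * sum(flattened(ratings)) / len(ratings)
```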
Assume I've already calculated the NPS for each session. I'm trying to figure out the margin of errors and/or confidence intervals for each session using Excel.
So, for example, my data would look like this:
Again, you can ignore the NPS stuff if it makes it easier and just assume it's an average rating where people were asked to rate each session from -100 to +100. What function(s) would I use in Excel to find the margin of error and/or confidence intervals for each session, given the sample size and target population size, and the average rating?

Tests to Compare Sales Mix Percent between Periods

Background
I wish to compare menu sales mix ratios for two periods.
A menu is defined as a collection of products (e.g., a hamburger, a club sandwich).
A sales mix ratio is defined as a product's sales volume in units (e.g., 20 hamburgers) relative to the total number of menu units sold (e.g., 100 menu items). In the hamburger example, the sales mix ratio for hamburgers is 20% (20 burgers / 100 menu items). This represents the share of total menu unit sales.
A period is defined as a time range used for comparative purposes (e.g., lunch versus dinner, Mondays versus Fridays).
I am not interested in overall changes in the volume (I don't care whether I sold 20 hamburgers in one period and 25 in another). I am only interested in changes in the distribution of the ratios (20% of my units sold were hamburgers in one period and 25% were hamburgers in another period).
Because the sales mix represents a share of the whole, the mean for each period will be the same, the mean difference between the periods will always be 0%, and the sum total for each set of data will always be 100%.
Objective:
Test whether the sales distribution (sales mix percentage of each menu item relative to other menu items) changed significantly from one period to another.
Null Hypothesis: the purchase patterns and preferences of customers in period A are the same as those for customers in period B.
Example of potential data input:
Menu Item        Period A   Period B
Hamburger          25%        28%
Cheeseburger       25%        20%
Salad              20%        25%
Club Sandwich      30%        27%
Question:
Do common methods exist to test whether the distribution of share-of-total is significantly different between two sets of data?
A paired T-Test would have worked if I was measuring a change in the number of actual units sold, but not (I believe) for a change in share of total units.
I've been searching online and a few text books for a while with no luck. I may be looking for the wrong terminology.
Any direction, whether search terms or (preferably) the actual names of appropriate tests, is appreciated.
Thanks,
Andrew
EDIT: I am considering a Pearson correlation test as a possible solution: setting aside that each row of data is an independent menu item, the math shouldn't care. A perfect match (identical sales mix) would receive a coefficient of 1, and the greater the change, the lower the coefficient would be. One potential issue is that, unlike in a regular correlation test, the changes may be amplified, because any change to one number automatically affects the others. Is this a viable solution? If so, is there a way to temper the amplification issue?
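Consider using a chi-squared goodness-of-fit test as a simple solution to this problem. Run it on the numbers of units sold, not on the percentages, because the test statistic depends on the sample size:
H0: the proportions of menu items in period B are the same as in period A
Ha: at least one of the proportions of menu items in period B is different from period A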
There is a nice tutorial here.
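A minimal Python sketch of the goodness-of-fit test using only the standard library. It treats Period A's mix from the question as the expected proportions (as H0 does) and needs unit counts for Period B; the 200-unit Period B counts below are hypothetical, chosen to reproduce the example percentages. The p-value uses the closed-form chi-squared tail for integer degrees of freedom. If SciPy is available, `scipy.stats.chisquare` with the observed and expected counts should give the same statistic.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-squared distribution with integer df >= 1."""
    half = x / 2
    if df % 2 == 0:
        q, k = math.exp(-half), 2               # closed form for df = 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1    # closed form for df = 1
    while k < df:
        # Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * e^(-x/2) / Gamma(k/2 + 1)
        q += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return q

def chi2_goodness_of_fit(observed_counts, expected_proportions):
    """Pearson chi-squared statistic and p-value; expected counts come from the proportions."""
    n = sum(observed_counts)
    expected = [p * n for p in expected_proportions]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed_counts, expected))
    return stat, chi2_sf(stat, len(observed_counts) - 1)

# Order: hamburger, cheeseburger, salad, club sandwich.
period_a = [0.25, 0.25, 0.20, 0.30]     # expected mix (Period A)
period_b_counts = [56, 40, 50, 54]      # hypothetical units sold in Period B (28/20/25/27%)
stat, p_value = chi2_goodness_of_fit(period_b_counts, period_a)
```

For these made-up counts the statistic is 5.82 on 3 degrees of freedom, with a p-value of roughly 0.12, so the change would not be significant at the 5% level. The same percentages with many more units sold would give a larger statistic, which is why the test needs counts.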

Give 9 gifts to 5 users

I've created a game in which 5 users played and collected some points. I've handed out gifts manually so far, but for future games, how can I split the gifts in Excel to calculate the number each user gets?
This works using a number format with 0 decimal places: 6+1+1+1 = 9
but in cases like this one:
1+6+1+1+1 = 10. How can I make it result in only 9 gifts?
Put each player's percentage of the total points in column C (e.g. =B2/SUM(B$2:B$6) in C2, filled down) and compare it against each prize's share of the total prizes (1/9). Since you are dividing one decimal number by another and expecting an integer (the number of prizes), you will be rounding either up or down, depending on whether you favour a wider distribution of the prizes or the top score.
Either way, you will have to decide whether the lowest score should always receive a prize or whether the highest score should benefit from the rounding.
The three possible formulas to start with are:
=MROUND(C2, 1/9)*9 ◄ closest to even distribution
=FLOOR(C2, 1/9)*9 ◄ rewards highest awarded points
=CEILING(C2, 1/9)*9 ◄ favours wider prize distribution
Fill down as necessary.
Now you have to take either the highest or the lowest score and adjust it to compensate for rounding the shares to whole prizes. MROUND doesn't play well with SUMPRODUCT, but these two may give you a solution you can live with.
=FLOOR($C2, 1/9)*9-((SUMPRODUCT(FLOOR($C$2:$C$6, 1/9)*9)-9)*($C2=MAX($C$2:$C$6)))
=CEILING($C2, 1/9)*9-((SUMPRODUCT(CEILING($C$2:$C$6, 1/9)*9)-9)*($C2=MAX($C$2:$C$6)))
Fill down as necessary.
If the MROUND approach best suits your prize distribution model, use a helper column that holds the MROUND results, then adjust the high score according to the sum of the helper column; this avoids circular references.
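The worksheet logic can be mirrored in Python to check an allocation before building it in Excel. This is a sketch: `allocate_gifts` and its `rounding` parameter are hypothetical names, points are assumed to be non-negative whole numbers, and exact fractions are used where the worksheet uses decimals. FLOOR(C2, 1/9)*9 equals rounding 9 times the share down, CEILING rounds it up, and MROUND rounds it to the nearest whole gift.

```python
import math
from fractions import Fraction

def allocate_gifts(scores, total_gifts=9, rounding="ceiling"):
    """Split total_gifts by score share, then let the top scorer absorb the rounding error.

    rounding: "floor" (like FLOOR), "ceiling" (like CEILING) or "nearest" (like MROUND).
    """
    total_points = sum(scores)
    # Exact fractions avoid floating-point surprises when a share is an exact multiple of 1/9.
    scaled = [Fraction(s, total_points) * total_gifts for s in scores]
    if rounding == "floor":
        gifts = [math.floor(x) for x in scaled]
    elif rounding == "ceiling":
        gifts = [math.ceil(x) for x in scaled]
    elif rounding == "nearest":
        # MROUND rounds halves up for positive values.
        gifts = [math.floor(x + Fraction(1, 2)) for x in scaled]
    else:
        raise ValueError(f"unknown rounding mode: {rounding}")
    # Unlike the worksheet formulas, only the first player tied for the top score is adjusted.
    top = scores.index(max(scores))
    gifts[top] -= sum(gifts) - total_gifts
    return gifts

print(allocate_gifts([1, 6, 1, 1, 1], rounding="ceiling"))  # [1, 5, 1, 1, 1]
print(allocate_gifts([1, 6, 1, 1, 1], rounding="floor"))    # [0, 9, 0, 0, 0]
```

With scores 1, 6, 1, 1, 1 the ceiling mode gives every player at least one gift and takes the surplus from the top scorer, while the floor mode hands the whole shortfall to the top scorer. With many low scorers the ceiling mode can push the top scorer's count very low, even below zero, so that case needs a guard.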

Monte Carlo Simulation using Excel Solver

I am trying to figure out the optimal number of products to make per day by plotting the values in a chart and then using the chart to find the optimum.
Cost of production: $4
Sold for: $12
Leftovers sold for $1
So the ideal profit for a product is $8, but it could be -$3 if it's left over at the end of the day.
Daily demand has a mean of 150 and a standard deviation of 30.
I have been able to generate a list of random daily demand values using NORMINV(RAND(),mean,std_dev),
but I don't know where to go from here to figure out the number sold from the number of products made that day.
The number sold on a given day is min(# produced, daily demand).
ADDENDUM
The decision variable is a choice you make: "I will produce 150 each day", or "I will produce 145 each day". You told us in the problem statement that daily demand is a random outcome with a mean of 150 and a SD of 30. Let's say you go with producing 150, the mean of demand. Since it's the mean of a symmetric distribution, half the time you will sell everything you made and have no losses, but in most of those cases you actually could have sold more and made more money. You can't sell products you didn't make, so your profit is capped at selling 150 on those days. The other half of the time, you won't sell all 150 and will take a loss on the unsold items, reducing your profit a bit. The actual profit on any given day is a random variable, because it is determined by random demand.
Since profit is random, you can calculate your average earnings across many days based on the assumption that you produce 150. You can also average earnings based on the assumption that you produce 140 per day, or 160 per day, or any other number. It sounds like you've been asked to plot those average earnings versus how many you decided to produce, and choose a production level that results in the highest long-term average earnings.
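The approach described above (fix a production level, simulate many days of random demand, average the profit, repeat for other levels) can be sketched in Python. `random.gauss` stands in for NORMINV(RAND(), 150, 30); rounding demand to whole units, flooring it at zero, the 10,000 simulated days and the 100 to 200 search range are assumptions.

```python
import random

PRICE, SALVAGE, COST = 12, 1, 4     # $ per unit sold, per leftover unit, per unit made
DEMAND_MEAN, DEMAND_SD = 150, 30

def simulate_demand(days, seed=0):
    """One random daily demand per simulated day, standing in for NORMINV(RAND(), 150, 30)."""
    rng = random.Random(seed)
    # Assumption: demand is rounded to whole units and cannot be negative.
    return [max(0, round(rng.gauss(DEMAND_MEAN, DEMAND_SD))) for _ in range(days)]

def average_profit(produced, demands):
    """Mean daily profit when `produced` units are made every day."""
    total = 0
    for demand in demands:
        sold = min(produced, demand)        # number sold = min(# produced, daily demand)
        leftover = produced - sold
        total += PRICE * sold + SALVAGE * leftover - COST * produced
    return total / len(demands)

demands = simulate_demand(10_000)
# The same simulated days are reused for every production level.
profits = {produced: average_profit(produced, demands) for produced in range(100, 201)}
best = max(profits, key=profits.get)
```

Plotting `profits` against the production level gives the chart described in the question, and `best` is its peak. Reusing the same demand draws for every level keeps the curve smooth, so differences between levels are not drowned out by simulation noise.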

Manager game: How to calculate market values?

Usually players in a soccer manager game have market values. The managers sell their players in accordance with these market values. They think: "Oh, the player is worth 3,000,000, so I'll try to sell him for 3,500,000".
All players have four basic qualities:
strength value (1-99)
maximal strength they can ever attain (1-99)
motivation (1-5)
current age (16-40)
Currently, I calculate the market values from these qualities alone. But I would like to calculate the market values dynamically, based on the player transfers in the most recent period. How could I do this?
I have the qualities listed above and the player transfers from the recent period available for the calculation.
How could I calculate it? Should I group recently transferred players by these qualities and simply take the average transfer price?
I hope you can help me.
Note: players=items/goods, managers=users
My suggestion: define a distance function that takes two players' stats and returns a distance value. Once you have a distance between two players (which corresponds to how similar they are), you can use the k-means algorithm to find clusters of similar players.
For each cluster you can compute values that help you calculate the so-called 'market price' (like the average or median value).
Here's a very simple example of how you could compute the distance between two players:
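#include <cmath>

struct Player { float strength, maxStrength, motivation, age; };

// Width of each attribute's range (strength and max strength 1-99, motivation 1-5, age 16-40).
const float strengthRange = 98.0f, maxStrengthRange = 98.0f, motivationRange = 4.0f, ageRange = 24.0f;

// Sum of per-attribute differences, each scaled to [0, 1] so no attribute dominates.
float distance(const Player& player1, const Player& player2) {
    float distance = 0.0f;
    distance += std::abs(player1.strength - player2.strength) / strengthRange;
    distance += std::abs(player1.maxStrength - player2.maxStrength) / maxStrengthRange;
    distance += std::abs(player1.motivation - player2.motivation) / motivationRange;
    distance += std::abs(player1.age - player2.age) / ageRange;
    return distance;
}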
Now that you have the distance function you can apply the k-means algorithm:
1. Assign each player randomly to a cluster.
2. Compute the centroid of each cluster. In your case the centroid coordinates will be (strength, maxStrength, motivation, age). To compute the centroid's strength coordinate, for example, just average the strengths of all players in the cluster.
3. Assign each player to the nearest centroid. Note that in this step some players may change cluster.
4. Repeat steps 2 and 3 until convergence, in other words, until no player changes cluster in step 3.
Now that you have the clusters, you can calculate the average price for similar players.
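The k-means steps can be sketched in Python. This is an illustrative version, not the answer's own code: attribute tuples are (strength, maxStrength, motivation, age), the random initial assignment is shuffled round-robin so no cluster starts empty, iterations are capped because mean centroids combined with an L1 distance may not settle, and the example players and prices are made up. Each cluster's price is summarised with the median transfer price.

```python
import random
from statistics import mean, median

# Width of each attribute's range in the question: strength and max strength 1-99,
# motivation 1-5, age 16-40. Tuples are (strength, maxStrength, motivation, age).
RANGES = (98, 98, 4, 24)

def distance(a, b):
    """Range-normalised L1 distance: each attribute difference divided by its range."""
    return sum(abs(x - y) / r for x, y, r in zip(a, b, RANGES))

def k_means(points, k, max_iterations=100, seed=0):
    """Cluster attribute tuples with the four steps listed above; returns one label per point."""
    rng = random.Random(seed)
    # Step 1: random assignment (shuffled round-robin so no cluster starts empty).
    order = list(range(len(points)))
    rng.shuffle(order)
    labels = [0] * len(points)
    for position, index in enumerate(order):
        labels[index] = position % k
    for _ in range(max_iterations):
        # Step 2: each centroid is the attribute-wise mean of its cluster's members.
        centroids = {}
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(mean(column) for column in zip(*members))
        # Step 3: move every player to the nearest centroid.
        new_labels = [min(centroids, key=lambda c: distance(p, centroids[c])) for p in points]
        # Step 4: stop once no player changed cluster.
        if new_labels == labels:
            break
        labels = new_labels
    return labels

def cluster_prices(players, prices, k):
    """Cluster the players, then return the labels and the median price of each cluster."""
    labels = k_means(players, k)
    grouped = {}
    for label, price in zip(labels, prices):
        grouped.setdefault(label, []).append(price)
    return labels, {label: median(group) for label, group in grouped.items()}

players = [(80, 95, 4, 21), (82, 92, 5, 22), (78, 90, 4, 20),
           (30, 40, 2, 34), (28, 35, 1, 36), (35, 38, 2, 33)]
prices = [3_000_000, 3_500_000, 2_800_000, 200_000, 150_000, 250_000]
labels, cluster_median = cluster_prices(players, prices, k=2)
```

A new player's market price can then be estimated from the median of the cluster whose centroid is nearest to them.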
One thing that you could do is look at recent transfers of similar(1) players: for example, take all transfers of similar players within the last 2-5 game weeks, then take the average (or median, or some other statistic) of their sale prices.
(1) You will have to define "similar" in some way, e.g. a defender within ±10 in defence, ±3 in passing and ±2 years of age. More factors give more precise results.
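A sketch of the recent-similar-transfers idea in Python. The `Transfer` record, the attribute names, the tolerances and the 5-game-week window are hypothetical choices that mirror the footnote's example, and the median is used as the summary statistic.

```python
from dataclasses import dataclass
from statistics import median

@dataclass
class Transfer:
    game_week: int
    price: float
    attributes: dict  # hypothetical, e.g. {"defence": 70, "passing": 60, "age": 25}

def similar_transfer_price(target, transfers, tolerances, current_week, window=5):
    """Median price of recent transfers whose attributes are all within tolerance of target."""
    prices = [
        t.price
        for t in transfers
        if current_week - t.game_week <= window
        and all(abs(t.attributes[name] - target[name]) <= tol for name, tol in tolerances.items())
    ]
    return median(prices) if prices else None  # None: no comparable transfers yet

transfers = [
    Transfer(10, 1_000_000, {"defence": 70, "passing": 60, "age": 25}),
    Transfer(11, 1_400_000, {"defence": 75, "passing": 62, "age": 26}),
    Transfer(3, 5_000_000, {"defence": 71, "passing": 60, "age": 25}),   # too long ago
    Transfer(11, 9_000_000, {"defence": 95, "passing": 60, "age": 25}),  # not similar
]
target = {"defence": 72, "passing": 61, "age": 25}
tolerances = {"defence": 10, "passing": 3, "age": 2}
estimate = similar_transfer_price(target, transfers, tolerances, current_week=12)
```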
Or you could use a little Economics 101 and try to define the supply and demand for that specific player based on:
Number of players in the league with similar capabilities (you could use the clustering method mentioned before) and number of those players "available" for transfer
Number of teams that own players with similar capabilities, and number of teams that are in need of such players
With these numbers you could calculate the supply (players available for transfer) and demand (teams in need of those players) and use them to move your base price (which could be the last transfer price or a base price for the player) up or down (i.e. more demand than supply will tend to push prices up, and vice versa).
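One hypothetical way to turn supply and demand into a price adjustment is to scale the base price by a power of the demand/supply ratio. The formula, the +1 smoothing (so a count of zero doesn't break it) and the `sensitivity` parameter are illustrative assumptions in Python, not something the answer specifies.

```python
def adjusted_price(base_price, available_players, teams_in_need, sensitivity=0.5):
    """Move a base price up when demand exceeds supply and down when supply exceeds demand.

    The power-law form and the default sensitivity are illustrative assumptions.
    """
    # +1 smoothing keeps the ratio finite when no similar players are for sale.
    ratio = (teams_in_need + 1) / (available_players + 1)
    return base_price * ratio ** sensitivity

print(adjusted_price(1_000_000, available_players=4, teams_in_need=4))  # 1000000.0
```

A larger `sensitivity` makes prices react more strongly to an imbalance; 0 ignores supply and demand entirely.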
After that it becomes a negotiation game, for which you can look at the game theory literature to settle the actual exchange price.
Hope this at least gives you a different perspective on it.
