Background
I wish to compare menu sales mix ratios for two periods.
A menu is defined as a collection of products (e.g., a hamburger, a club sandwich, etc.).
A sales mix ratio is defined as a product's sales volume in units (e.g., 20 hamburgers) relative to the total number of menu units sold (e.g., 100 menu items were sold). In the hamburger example, the sales mix ratio for hamburgers is 20% (20 burgers / 100 menu items). This represents the share of total menu unit sales.
A period is defined as a time range used for comparative purposes (e.g., lunch versus dinner, Mondays versus Fridays, etc.).
I am not interested in overall changes in volume (I don't care whether I sold 20 hamburgers in one period and 25 in another). I am only interested in changes in the distribution of the ratios (20% of my units sold were hamburgers in one period versus 25% in another).
Because the sales mix represents shares of a whole, the ratios in each period always sum to 100%; consequently, the mean ratio for each period is the same, and the mean difference between the periods is always 0%.
Objective:
Test whether the sales distribution (sales mix percentage of each menu item relative to other menu items) changed significantly from one period to another.
Null Hypothesis: the purchase patterns and preferences of customers in period A are the same as those for customers in period B.
Example of potential data input:
Menu Item       Period A   Period B
Hamburger         25%        28%
Cheeseburger      25%        20%
Salad             20%        25%
Club Sandwich     30%        27%
Question:
Do common methods exist to test whether the distribution of share-of-total is significantly different between two sets of data?
A paired t-test would have worked if I were measuring a change in the number of actual units sold, but not (I believe) for a change in share of total units.
I've been searching online and through a few textbooks for a while with no luck. I may be looking for the wrong terminology.
Any direction, be it search terms or (preferably) the names of appropriate tests, is appreciated.
Thanks,
Andrew
EDIT: I am considering a Pearson correlation test as a possible solution. Forgetting that each row of data is an independent menu item, the math shouldn't care: a perfect match (identical sales mix) would receive a coefficient of 1, and the greater the change, the lower the coefficient. One potential issue is that, unlike in a regular correlation test, the changes may be amplified because any change to one number automatically impacts the others. Is this a viable solution? If so, is there a way to temper the amplification issue?
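For illustration, here is a quick sketch of that check in Python on the example table (with only four menu items the coefficient is very noisy, so treat it as illustrative only):

from scipy.stats import pearsonr

period_a = [25, 25, 20, 30]  # sales mix shares from the example table
period_b = [28, 20, 25, 27]

r, p = pearsonr(period_a, period_b)
print(r)  # about 0.23 for this example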
Consider using a chi-squared goodness-of-fit test as a simple solution to this problem:
H0: the proportions of menu items for period B are the same as for period A
Ha: at least one of the proportions of menu items for period B is different from period A
There is a nice tutorial here.
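As a minimal sketch of that test in Python: the test needs raw counts rather than percentages, so the 400-item period B total below is a made-up assumption; scale by your actual unit counts. (Excel's CHISQ.TEST(actual_range, expected_range) returns the same p-value.)

from scipy.stats import chisquare

# Hypothetical: assume 400 menu items were sold in period B.
observed_b = [112, 80, 100, 108]       # 28%, 20%, 25%, 27% of 400
expected_from_a = [100, 100, 80, 120]  # period A mix (25/25/20/30%) applied to 400

stat, p = chisquare(f_obs=observed_b, f_exp=expected_from_a)
print(stat, p)  # reject H0 if p falls below your chosen significance level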
Related
I have a weighted average set up like this:
=IFERROR(SUMPRODUCT(Purchases_XYZ[Cost Per Unit],Purchases_XYZ[Cost])/SUM(Purchases_XYZ[Cost]),"")
The full table includes:
Purchases_XYZ[Date], Purchases_XYZ[Cost], Purchases_XYZ[Units], Purchases_XYZ[Cost Per Unit], Purchases_XYZ[Extras], Purchases_XYZ[Location]
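For clarity, here is a minimal Python sketch of what that SUMPRODUCT formula computes, with made-up rows; note that it weights each Cost Per Unit by Cost, exactly as the formula is written:

cost_per_unit = [2.50, 3.00, 2.75]  # Purchases_XYZ[Cost Per Unit]
cost = [250.0, 600.0, 275.0]        # Purchases_XYZ[Cost]

weighted_avg = sum(cpu * c for cpu, c in zip(cost_per_unit, cost)) / sum(cost)
print(weighted_avg)  # about 2.83 for these hypothetical rows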
But what if I sell 30% of my units at Location "b"? Then I need to manually decrease the number of units I have left at Location "b".
What if I sell all the units at Location "d"? Then my weighted average can no longer be counted because there is no longer any of that unit left at that price.
I can do the manual stuff - that wouldn't be impossible. But I feel like I am making this more work than it needs to be. If you have solutions to offer, I am keen to hear how it could be done.
The Doobie Brothers garage band is planning a concert. Tickets are set at $20. Based on what other bands have done, they figure they should sell 350 tickets, but that could fluctuate; they put the standard deviation of sales at 50 tickets. No-shows are uniformly distributed between 1 and 10. Fixed costs are $5,000.
How profitable is the concert likely to be?
So I am able to enter the Excel formula for revenue (tickets sold * $20) and subtract the $5,000 fixed cost, but I am having trouble deciphering how to account for the no-show costs. I know that I have to use the RANDBETWEEN(1,10) formula, but I am not sure if it gets multiplied or divided by something. Again, I am looking for what to do with the formula in the context of a profit equation.
If it helps, the mean for the number of tickets sold is 350 and the standard deviation is 50, so I used NORM.INV(RAND(),350,50) to get the number of attendees in a simulated sense.
Of course, this problem may not be realistic in real life because promoters keep the money, but for the purposes of the problem...just assume that no promoters exist here.
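Here is a minimal Monte Carlo sketch in Python mirroring that approach. It assumes (a guess, since the problem doesn't say) that no-shows are refunded their $20, so each no-show reduces revenue by one ticket:

import random

def average_profit(trials=10_000, price=20, fixed_cost=5_000):
    total = 0.0
    for _ in range(trials):
        tickets = max(0, round(random.gauss(350, 50)))  # NORM.INV(RAND(),350,50)
        no_shows = random.randint(1, 10)                # RANDBETWEEN(1,10)
        paying = max(tickets - no_shows, 0)             # no-shows refunded (assumption)
        total += price * paying - fixed_cost
    return total / trials

print(average_profit())  # roughly 20*(350 - 5.5) - 5000 = 1890 on average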
I have two groups, "in" and "out," and item categories that can be split up among the groups. For example, I can have item category A that is 99% "in" and 1% "out," and item B that is 98% "in" and 2% "out."
For each of these items, I actually have the counts that are in/out. For example, A could have 99 items in and 1 item out, and B could have 196 items that are in and 4 that are out.
I would like to rank these items based on the percentage that are "in," but I would also like to give some priority to items that have larger overall populations. This is because I would like to focus on items that are very relevant to the "in" group, but still have a large number of items in the "out" group that I could pursue.
Is there some kind of score that could do this?
I'd be tempted to use a probabilistic rank: the probability that an item category belongs to the "in" group, given the actual counts for that category. This requires making some assumptions about the data set, including why a category may have any out-of-group items. You might take a look at the binomial test or the Mann-Whitney U test for a start. You might also look at some other kinds of nonparametric statistics.
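As a minimal sketch of the binomial-test direction in Python (the baseline out-rate p0 here is a made-up assumption; choose one that fits your data):

from scipy.stats import binomtest

p0 = 0.02  # hypothetical baseline "out" rate

# Category A: 1 "out" of 100 items; category B: 4 "out" of 200 items
for out_count, total in [(1, 100), (4, 200)]:
    result = binomtest(out_count, total, p=p0)
    print(total, result.pvalue)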
I ultimately ended up using Bayesian averaging, which was recommended in this post. The technique is briefly described in this Wikipedia article and more thoroughly described in this post by Evan Miller and this post by Paul Masurel.
In Bayesian averaging, "prior values" are used to pull the numerator and denominator toward their expected values. Essentially, the expected numerator and expected denominator are added to the actual numerator and denominator. When the actual numerator and denominator are small, the prior values have a larger impact because they represent a larger proportion of the new numerator/denominator. As the numerator and denominator grow in magnitude, the Bayesian average approaches the actual average due to increased confidence.
In my case, the prior value for the average was fairly low, which biased averages with small denominators downward.
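A minimal sketch of the scoring function, applied to the example categories (the prior values here are arbitrary placeholders):

def bayesian_average(in_count, total, prior_mean=0.5, prior_weight=10):
    # prior_weight acts like prior_weight extra observations at prior_mean,
    # pulling small samples toward the prior and barely moving large ones
    return (prior_weight * prior_mean + in_count) / (prior_weight + total)

# Category A: 99 "in" of 100 (99%); category B: 196 "in" of 200 (98%)
print(bayesian_average(99, 100))   # about 0.945
print(bayesian_average(196, 200))  # about 0.957: B now outranks A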
I am trying to figure out the optimal number of products to make per day: I want to display the simulated values in a chart and then use the chart to find the optimal production quantity.
Cost of production: $4
Sold for: $12
Leftovers sold for: $1
So the ideal profit for a product is $8, but it could be -$3 if it's left over at the end of the day.
The daily demand of sales has a mean of 150 and a standard deviation of 30.
I have been able to generate a list of random daily demand values using NORMINV(RAND(),mean,std_dev),
but I don't know where to go from here to figure out the amount sold from the number of products made that day.
The number sold on a given day is min(# produced, daily demand).
ADDENDUM
The decision variable is a choice you make: "I will produce 150 each day", or "I will produce 145 each day". You told us in the problem statement that daily demand is a random outcome with a mean of 150 and a SD of 30. Let's say you go with producing 150, the mean of demand. Since it's the mean of a symmetric distribution, half the time you will sell everything you made and have no losses, but in most of those cases you actually could have sold more and made more money. You can't sell products you didn't make, so your profit is capped at selling 150 on those days. The other half of the time, you won't sell all 150 and will take a loss on the unsold items, reducing your profit a bit. The actual profit on any given day is a random variable, because it is determined by random demand.
Since profit is random, you can calculate your average earnings across many days based on the assumption that you produce 150. You can also average earnings based on the assumption that you produce 140 per day, or 160 per day, or any other number. It sounds like you've been asked to plot those average earnings versus how many you decided to produce, and choose a production level that results in the highest long-term average earnings.
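A minimal Python sketch of that procedure, using the parameters from the problem (the trial count and the range of candidate quantities are arbitrary):

import random

PRICE, COST, SALVAGE = 12, 4, 1

def average_profit(produce, trials=20_000):
    total = 0.0
    for _ in range(trials):
        demand = max(0, round(random.gauss(150, 30)))  # NORMINV(RAND(),150,30)
        sold = min(produce, demand)                    # can't sell what you didn't make
        total += sold * PRICE + (produce - sold) * SALVAGE - produce * COST
    return total / trials

# Plot these pairs and pick the production level with the highest average profit.
for quantity in range(120, 201, 10):
    print(quantity, round(average_profit(quantity), 2))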
I'm looking to compute and show individual row totals and a Grand Total. I just need the formulae to put in the boxes so the calculation is automatic but the problem is the calculations are a little complicated...
I'm using data validation to select the day type. This is what I think I need:
Assign a price to the day type (either Standard day = £23 or Extended day = £26).
Apply a volume discount where appropriate: if Jack is attending all week (5 days) and the day type is the same throughout (all Standard or all Extended), the total cost is £100 (or £120).
Otherwise, the number of days of each day type needs to be totalled up for Jack and priced accordingly.
For each sibling after the first child, price as above but apply an additional discount of 15%.
The grand total then needs to show at the bottom.
Well, it is not the best of data layouts but this may serve, in cell L6 and copied down to L13:
=IF(OR(A6="Brother",A6="Sister"),0.85,1)*IF(COUNTIF(B6:F6,"Standard day")=5,100,IF(COUNTIF(B6:F6,"Extended day")=5,120,COUNTIF(B6:F6,"Standard day")*23+COUNTIF(B6:F6,"Extended day")*26))
and =SUM(L1:L16) in D16.
It would be better practice not to hard code the daily rates/discount, but extracting these from C1:D2 would have increased the length of the formula further.
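For anyone untangling the nested IFs, here is the same pricing logic as a short Python sketch (the function and argument names are mine, not from the workbook):

def weekly_cost(relationship, days):
    # days is the list of the five day-type cells for one child (B6:F6)
    standard = days.count("Standard day")
    extended = days.count("Extended day")
    if standard == 5:
        cost = 100  # discounted full-week Standard rate
    elif extended == 5:
        cost = 120  # discounted full-week Extended rate
    else:
        cost = standard * 23 + extended * 26
    if relationship in ("Brother", "Sister"):
        cost *= 0.85  # additional 15% sibling discount
    return cost

print(weekly_cost("Brother", ["Standard day"] * 5))  # 85.0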
Note also the result is not £429.95 (you may have changed your example after doing your calculations).