I have a dataset where I know how many units of each product I have in starting inventory. I also know how many units of a given product were sold, and how many units of all other products were sold. The question I'm trying to answer is: was the total number of units sold of a particular product significantly higher than I would expect based on that product's percentage of starting inventory? I've read the documentation on proportions_ztest. It talks about numbers of observations, so I want to check that I'm using it correctly for units sold. With the code below I'm trying to get the p-value.
sold = total number of units sold of product1
tot_sld = total number of units sold, including all products
perc_strt = (total number of units of product1 in starting inventory) / (total number of units of all products in starting inventory)
code:
import statsmodels.api as sm

sm.stats.proportions_ztest(
    count=x['sold'],          # units of product1 sold
    nobs=x['tot_sld'],        # total units sold across all products
    value=x['perc_strt'],     # expected proportion: product1's share of starting inventory
    alternative='larger',     # H1: observed proportion is larger than value
)[1]                          # index [1] picks the p-value from (zstat, pvalue)
Update Example:
product1 start inventory=20 units
product2 start inventory=30 units
product3 start inventory=50 units
product1 perc_strt=20%
number of units sold of product1=10 units
number of units sold of product2=10 units
number of units sold of product3=20 units
tot_sld=40 units
so
x['sold']=10
x['tot_sld']=40
x['perc_strt']=0.2
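Plugging those numbers in directly, a minimal standalone sketch of the call above with the example values hard-coded:

import statsmodels.api as sm

sold, tot_sld, perc_strt = 10, 40, 0.2   # example values from above
zstat, pval = sm.stats.proportions_ztest(
    count=sold,            # units of product1 sold
    nobs=tot_sld,          # total units sold across all products
    value=perc_strt,       # product1's share of starting inventory
    alternative='larger',  # H1: product1's sales share exceeds its inventory share
)
print(zstat, pval)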
Update:
The one-population proportion test from this post seems to confirm my original approach:
https://towardsdatascience.com/demystifying-hypothesis-testing-with-simple-python-examples-4997ad3c5294
I would like to calculate the prices in IQD. 1 USD is equal to 1458 IQD today.
There are 250, 500, 1,000, 5,000, 10,000, 25,000 and 50,000 denominations.
I need to sell my items based on USD. I have a multicurrency plugin on my website, but it converts prices too precisely: it calculates amounts smaller than cents.
For example, if I sell an item that costs 1 dollar, the buyer needs to pay 1458 IQD.
Is there any way to calculate the prices with a so-called “near calculation”? For example, if an item costs 6820, it automatically shows 7000 at the checkout page, or if it costs 6760, it shows 6750.
Thanks for your response.
Unfortunately, there is no option in the plugin to do so.
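Outside the plugin, the “near calculation” described here is just rounding the converted price to a convenient step. A minimal Python sketch of the idea, assuming rounding to the nearest 250 IQD (the smallest denomination listed); the step size is only an assumption:

def round_to_step(amount_iqd, step=250):
    # Round an IQD amount to the nearest multiple of `step`.
    return int(round(amount_iqd / step) * step)

rate = 1458                      # IQD per USD, from the question
print(round_to_step(1 * rate))   # 1458 -> 1500
print(round_to_step(6760))       # 6760 -> 6750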
I was given tens of thousands of bills charged to customers. I am trying to determine if the bills are potentially accurate. The only data I have is the actual bill itself. Here's where things get complicated. There are four scenarios, each with their own nuances:
Scenario 1 Nuances:
Base fee of $10 * the number of items bought * the number of deliveries (the maximum number of items bought would be 10 and the maximum number of deliveries would be 7)
If the number of items bought exceeds 1, the total bill is discounted by 20%
If the boxes are recycled, the total bill is further discounted by 15%
If a good is over 15 pounds, an additional charge of $10 is assessed for each item over 15 pounds (the maximum number of items would be 10)
If there is no adequate port for delivery, an additional $50 per hour is charged per delivery (the maximum would be 3 hours)
Scenario 2 Nuances:
Businesses that share a port of delivery for the second service are charged $11 per item
If a business does not share a port of delivery, the fee is $13 per item. Each additional delivery for a business that does not share a port costs $15 per delivery, in addition to each item costing $13 (the maximum number of deliveries would be 7 and the maximum number of items would be 8).
You see where I am going with all of this. The next two scenarios are just as lengthy. Sadly, no information besides the total bill is given. We don't know which scenario a bill belongs to, the number of items, etc.; we just have a single column of invoice totals. How would I go about determining all potential costs from each scenario so I could index them and see if there is a match? Any help on this would be amazing.
I tried creating an index where I wrote out multiple scenarios. For example, a column with the number of items, the number of deliveries, a discount for > 1 item bought, a recycling discount, etc... After attempting this for two hours, I realized this may not be the best method. My first row would be 1 item, 1 delivery, 0% discount for > 1 item bought, and 0% discount for recycling. Next would be 2 items bought, 1 delivery, 20% discount for > 1 item bought, 0% discount for recycling. This is seemingly impossible to do for everything.
My expected output is "Y" if the invoice total matches one of the potential costs from the index, "N" if not.
You can write a macro with nested loops that calculates every possible scenario.
The outside loop is the quantity, such as For ItemQty = 1 to 100 (or to some other likely maximum).
The inside loops are the scenarios and nuances, such as For BoxesRecycled = 0 to 1. Each loop modifies the total, such as Total = Total - (BoxesRecycled * 0.15 * Total).
In the middle of all the loops is code that writes a line with the quantity, nuances, and total.
You will need to match the sequence in which charges and discounts were applied and the way the total was rounded. Probably the sequence was discounted 20%, then discounted 15%, then rounded. But maybe it was discounted 20%, then rounded, then discounted 15%, then rounded. If you don't match the sequence and rounding, then your total might be 1 cent different from an actual invoice.
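For illustration, here is the same enumeration sketched in Python rather than a VBA macro, covering Scenario 1 only. It assumes the discounts are applied before the weight and port surcharges and that totals are rounded to cents at the end; those assumptions, and the loop bounds, need to be checked against the real billing rules:

possible_totals = set()
for items in range(1, 11):                      # items bought: 1..10
    for deliveries in range(1, 8):              # deliveries: 1..7
        for recycled in (False, True):          # boxes recycled?
            for heavy in range(0, items + 1):   # items over 15 pounds
                for port_hours in range(0, 4):  # hours without an adequate port: 0..3
                    total = 10 * items * deliveries        # base fee
                    if items > 1:
                        total *= 0.80                      # 20% multi-item discount
                    if recycled:
                        total *= 0.85                      # further 15% recycling discount
                    total += 10 * heavy                    # $10 per item over 15 pounds
                    total += 50 * port_hours * deliveries  # $50/hour per delivery
                    possible_totals.add(round(total, 2))

invoices = [70.00, 123.45]   # illustrative invoice amounts, not real data
flags = ["Y" if amount in possible_totals else "N" for amount in invoices]
print(flags)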
Background
I wish to compare menu sales mix ratios for two periods.
A menu is defined as a collection of products (e.g., a hamburger, a club sandwich, etc.).
A sales mix ratio is defined as a product's sales volume in units (e.g., 20 hamburgers) relative to the total number of menu units sold (e.g., 100 menu items were sold). In the hamburger example, the sales mix ratio for hamburgers is 20% (20 burgers / 100 menu items). This represents the share of total menu unit sales.
A period is defined as a time range used for comparative purposes (e.g., lunch versus dinner, Mondays versus Fridays, etc.).
I am not interested in overall changes in the volume (I don't care whether I sold 20 hamburgers in one period and 25 in another). I am only interested in changes in the distribution of the ratios (20% of my units sold were hamburgers in one period and 25% were hamburgers in another period).
Because the sales mix represents a share of the whole, the mean average for each period will be the same; the mean difference between the periods will always be 0%; and, the sum total for each set of data will always be 100%.
Objective:
Test whether the sales distribution (sales mix percentage of each menu item relative to other menu items) changed significantly from one period to another.
Null Hypothesis: the purchase patterns and preferences of customers in period A are the same as those for customers in period B.
Example of potential data input:
Menu Item      Period A   Period B
Hamburger      25%        28%
Cheeseburger   25%        20%
Salad          20%        25%
Club Sandwich  30%        27%
Question:
Do common methods exist to test whether the distribution of share-of-total is significantly different between two sets of data?
A paired T-Test would have worked if I was measuring a change in the number of actual units sold, but not (I believe) for a change in share of total units.
I've been searching online and a few text books for a while with no luck. I may be looking for the wrong terminology.
Any direction, be it search terms or (preferably) the actual names of appropriate tests, is appreciated.
Thanks,
Andrew
EDIT: I am considering a Pearson correlation test as a possible solution. Forgetting for a moment that the rows of data are independent menu items, the math shouldn't care: a perfect match (identical sales mix) would receive a coefficient of 1, and the greater the change, the lower the coefficient. One potential issue is that, unlike a regular correlation test, the changes may be amplified because any change to one number automatically impacts the others. Is this a viable solution? If so, is there a way to temper the amplification issue?
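For concreteness, here is roughly what I have in mind, sketched with the example percentages above (just a sketch, not a finished analysis):

from scipy.stats import pearsonr

period_a = [0.25, 0.25, 0.20, 0.30]   # Hamburger, Cheeseburger, Salad, Club Sandwich
period_b = [0.28, 0.20, 0.25, 0.27]

r, p = pearsonr(period_a, period_b)   # r near 1 means the mix barely changed
print(r, p)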
Consider using a Chi Squared Goodness-of-Fit test as a simple solution to this problem:
H0: the proportions of menu items for period B are the same as for period A
Ha: at least one of the proportions of menu items for period B is different from period A
There is a nice tutorial here.
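A minimal sketch of that test in Python, assuming you have the raw unit counts for period B (the test needs counts, not percentages) and use period A's mix as the expected proportions; the counts below are made up for illustration:

from scipy.stats import chisquare

period_a_mix = [0.25, 0.25, 0.20, 0.30]   # Hamburger, Cheeseburger, Salad, Club Sandwich
period_b_counts = [28, 20, 25, 27]        # hypothetical units sold in period B

expected = [p * sum(period_b_counts) for p in period_a_mix]
stat, p_value = chisquare(f_obs=period_b_counts, f_exp=expected)
print(stat, p_value)   # a small p-value suggests the mix shifted between periods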
I am trying to figure out the optimal number of products to make per day: I want to display the values in a chart and then use the chart to find the optimal number of products to make per day.
Cost of production: $4
Sold for: $12
Leftovers sold for $1
So the ideal profit for a product is $8, but it could be -$3 if it's left over at the end of the day.
The daily demand of sales has a mean of 150 and a standard deviation of 30.
I have been able to generate a list of random daily demand values using NORMINV(RAND(), mean, std_dev), but I don't know where to go from here to figure out the amount sold from the number of products made that day.
The number sold on a given day is min(# produced, daily demand).
ADDENDUM
The decision variable is a choice you make: "I will produce 150 each day", or "I will produce 145 each day". You told us in the problem statement that daily demand is a random outcome with a mean of 150 and a SD of 30. Let's say you go with producing 150, the mean of demand. Since it's the mean of a symmetric distribution, half the time you will sell everything you made and have no losses, but in most of those cases you actually could have sold more and made more money. You can't sell products you didn't make, so your profit is capped at selling 150 on those days. The other half of the time, you won't sell all 150 and will take a loss on the unsold items, reducing your profit a bit. The actual profit on any given day is a random variable, because it is determined by random demand.
Since profit is random, you can calculate your average earnings across many days based on the assumption that you produce 150. You can also average earnings based on the assumption that you produce 140 per day, or 160 per day, or any other number. It sounds like you've been asked to plot those average earnings versus how many you decided to produce, and choose a production level that results in the highest long-term average earnings.
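If it helps, here is that simulation sketched in Python rather than Excel; the cost, price, and salvage values come from the problem statement, while the candidate production levels and number of simulated days are just illustrative choices:

import numpy as np

rng = np.random.default_rng(0)
days = 10_000
demand = np.maximum(rng.normal(150, 30, size=days), 0).round()   # simulated daily demand

cost, price, salvage = 4, 12, 1
for produced in range(120, 181, 10):           # candidate production levels
    sold = np.minimum(produced, demand)        # number sold = min(produced, demand)
    leftover = produced - sold
    profit = price * sold + salvage * leftover - cost * produced
    print(produced, profit.mean())             # plot these averages and pick the peak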
I am having some difficulty determining how to produce a calculation of averages that can be plotted on a PivotChart.
Specifically, I wish to compare a Sales Rep's performance (gross profit by month/year) against all other reps (using an average) who are in a comparable role (the same workgroup) for a given period.
Let's just say the data structure is as follows:
SaleID  SaleLocation  SaleType  SalesRep  SaleDate    WorkGroup  SalesGP
1       Retail1       Car       John A    01/01/2014  Sales      $301
2       HQ            Bike      John A    01/01/2014  Sales      $200
3       Retail1       Car       Sam L     02/01/2014  Sales      $1300
4       Retail2       Plane     Sam L     02/01/2014  Sales      $72
5       Retail2       Plane     Vince T   03/01/2014  Admin      $55
6       Retail2       Bike      John A    04/01/2014  Sales      $39
7       HQ            Car       Vince T   05/01/2014  Admin      $2154
...etc
In the Excel data model I've added calculated fields (that use a lookup table) for the sale date so that sales can be plotted by month or year (e.g. =YEAR([SaleDate]) and =MONTH([SaleDate])).
As an example, let's say I want to plot someone's GP (Gross Profit) for a period of time:
My question is this:
How can I calculate an "average gross profit" that I can plot on the PivotChart? This "average gross profit" should be the average of all sales for the same period for the same workgroup.
In the example above, I want the PivotChart to plot an "average" series showing the average GP by month for all SalesReps that are in the same Workgroup as John A ("Sales").
If my request isn't clear enough please let me know and I'll do my best to expand.
Zam, this should be quite easy. You just need to create a new calculated field that calculates the average for ALL sales reps.
Let me walk you through it:
I used your data table and then added it to my PowerPivot (Excel 2013). Then I created those calculated measures:
1. Sales Amount Average:
=AVERAGE(SalesData[SalesGP])
2. Sales Average ALL -- this will calculate the average for ALL rows in the table and will be used in other calculations. Notice I used the first calculated field as the first parameter to make this flexible:
=CALCULATE([Sales Amount Average],ALL(SalesData))
3. Sales Comparison to Average. I wasn't sure what your goal is, but I made this one a bit more complex as I wanted to display the performance as a percentage:
=IF([Sales Amount Average] > 0, [Sales Amount Average] / [Sales Average ALL] - 1)
Basically, what this does is first check whether there is an existing calculation for a sales representative; if so, it divides the Sales Amount Average for that sales representative by the average sale amount for ALL sales representatives. I then subtract 1 so the performance can be easily grasped just by looking at the percentages.
To make it easy to understand, I decided to use bar charts with conditional formatting instead of a stand-alone PivotChart -- I believe it does exactly what you need:
In the picture above, the first table represents your data. The second table is the actual PowerPivot table and shows what I have described.
Hope this helps!
EDIT
I didn't want to make things over-complicated, but should you want to remove the percentage total from Grand Totals row, use this calculation instead for the percentage comparison:
=IF(HASONEVALUE(SalesData[SalesRep]),
IF([Sales Amount Average] > 0,
[Sales Amount Average] / [Sales Average ALL] -1),
BLANK()
)
EDIT - ADDED AVERAGE COMPARISON FOR WORKGROUPS
To calculate the performance per workgroup instead of against ALL sales representatives, add these two measures:
4. Sales Average per Workgroup
=CALCULATE(AVERAGE(SalesData[SalesGP]),ALL(SalesData[SalesRep]))
This will calculate the average sale per workgroup (don't be confused by the use of SalesRep inside the ALL function; it's related to filter context).
5. Sales Diff to Average per Workgroup
=IF(HASONEVALUE(SalesData[SalesRep]),
IF([Sales Amount Average] > 0,
[Sales Amount Average] - [Sales Average per Workgroup]),
BLANK()
)
This simply calculates the difference between average sale of given sales rep and the average sale per workgroup. The result could then look like this:
I have uploaded the source file to my public Dropbox folder.