Number of days for delivery and number of orders delivered in two separate columns. Is there a way to get summary statistics about orders? - excel

I've had a bit of trouble explaining this, so please bear with me. I'm also very new to Excel, so if there's a simple fix, I apologize in advance!
I have two columns, one listing number of days starting from 0 and increasing consecutively. The other column has the number of orders delivered. The two correspond to each other. For example, I've typed out how it would look below. It would mean that there were 100 orders delivered in 1 day, 150 orders delivered in 2 days, 800 orders delivered in 3 days, etc.
Is there a way to get summary statistics (mean, median, mode, upper and lower quartiles) for the number of days it took for the average order to get delivered? The only way I can think of solving this is to manually punch in "1" 100 times, "2" 150 times, etc. into a new column and take median, mean, and upper & lower quartile from that, but that seems extremely inefficient. Would I use a pivot table for this? Thank you in advance!
I tried using the Data Analysis add-in and doing summary statistics that way, but it didn't work. It just gave me the mean, median, mode, and quartiles of each individual column. It would have given me 3 for median number of days for delivery and 300 for median number of orders.

Method 1
The mean is just
=SUMPRODUCT(A2:A6,B2:B6)/SUM(B2:B6)
Mode is the value with highest frequency
=INDEX(A2:A6,MATCH(MAX(B2:B6),B2:B6,0))
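As a cross-check outside Excel, the same two calculations can be sketched in Python. The table here is hypothetical: the first three rows follow the question's example (100, 150, 800 orders) and the last two are made up for illustration.

```python
# Hypothetical frequency table: rows 1-3 follow the question's example,
# rows 4-5 are invented for illustration only.
days = [1, 2, 3, 4, 5]
orders = [100, 150, 800, 300, 50]

# Weighted mean: same idea as SUMPRODUCT(days, orders) / SUM(orders).
total_orders = sum(orders)
mean_days = sum(d * n for d, n in zip(days, orders)) / total_orders

# Mode: the day value with the largest order count (INDEX/MATCH on MAX).
# Like MATCH with match type 0, index() returns the first maximum on a tie.
mode_days = days[orders.index(max(orders))]

print(mean_days, mode_days)
```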
The quartiles and median (or any other quantile, by varying the value of p) can be computed from first principles, following this reference:
=LET(p,0.25,
values,A2:A6,
freq,B2:B6,
N,SUM(freq),
h,(N+1)*p,
floorh,FLOOR(h,1),
ceilh,CEILING(h,1),
frac,h-floorh,
cusum,SCAN(0,SEQUENCE(ROWS(values)),LAMBDA(a,c,IF(c=1,0,a+INDEX(freq,c-1)))),
xlower,XLOOKUP(floorh-1,cusum,values,,-1),
xupper,XLOOKUP(ceilh-1,cusum,values,,-1),
xlower+(xupper-xlower)*frac)
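The logic of the LET formula can be sketched in Python to make the steps easier to follow. This is only an illustration under stated assumptions: the function name weighted_quantile and the sample table are hypothetical, the values are assumed to be sorted ascending, and the position h = (N + 1) * p is assumed to fall between 1 and N.

```python
import math
from bisect import bisect_right
from itertools import accumulate


def weighted_quantile(values, freq, p):
    """Quantile at position h = (N + 1) * p, read straight off a frequency table.

    Assumes `values` is sorted ascending and 1 <= h <= N.
    """
    n = sum(freq)
    h = (n + 1) * p
    if not 1 <= h <= n:
        raise ValueError("p is outside the range this method supports")
    lo_rank, hi_rank = math.floor(h), math.ceil(h)
    # starts[i] = number of observations before values[i]: 0, f1, f1+f2, ...
    starts = [0] + list(accumulate(freq))[:-1]

    def at_rank(rank):
        # 1-based rank -> value: take the last start that is <= rank - 1
        return values[bisect_right(starts, rank - 1) - 1]

    lower, upper = at_rank(lo_rank), at_rank(hi_rank)
    # Linear interpolation between the two neighbouring ranks
    return lower + (upper - lower) * (h - lo_rank)


# Hypothetical table: rows 1-3 follow the question, rows 4-5 are made up.
days = [1, 2, 3, 4, 5]
orders = [100, 150, 800, 300, 50]
quartiles = [weighted_quantile(days, orders, p) for p in (0.25, 0.5, 0.75)]
print(quartiles)
```

With this made-up table the upper quartile lands between ranks 1050 and 1051, which straddle the change from 3 days to 4 days, so the interpolation step actually contributes.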
Method 2
If you don't like doing it this way, you can always expand the data like this:
Mean:
=AVERAGE(XLOOKUP(SEQUENCE(SUM(B2:B6),1,0),SCAN(0,SEQUENCE(ROWS(A2:A6)),LAMBDA(a,c,IF(c=1,0,INDEX(B2:B6,c-1)+a))),A2:A6,,-1))
Mode:
=MODE(XLOOKUP(SEQUENCE(SUM(B2:B6),1,0),SCAN(0,SEQUENCE(ROWS(A2:A6)),LAMBDA(a,c,IF(c=1,0,INDEX(B2:B6,c-1)+a))),A2:A6,,-1))
Lower quartile:
=QUARTILE.EXC(XLOOKUP(SEQUENCE(SUM(B2:B6),1,0),SCAN(0,SEQUENCE(ROWS(A2:A6)),LAMBDA(a,c,IF(c=1,0,INDEX(B2:B6,c-1)+a))),A2:A6,,-1),1)
Median:
=MEDIAN(XLOOKUP(SEQUENCE(SUM(B2:B6),1,0),SCAN(0,SEQUENCE(ROWS(A2:A6)),LAMBDA(a,c,IF(c=1,0,INDEX(B2:B6,c-1)+a))),A2:A6,,-1))
Upper quartile:
=QUARTILE.EXC(XLOOKUP(SEQUENCE(SUM(B2:B6),1,0),SCAN(0,SEQUENCE(ROWS(A2:A6)),LAMBDA(a,c,IF(c=1,0,INDEX(B2:B6,c-1)+a))),A2:A6,,-1),3)
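The expansion idea is what the asker wanted to avoid doing by hand: repeat each day value once per order, then use ordinary statistics functions. A rough Python equivalent, using the standard statistics module and a hypothetical table (only the first three rows come from the question), might look like this:

```python
import statistics

# Hypothetical table: rows 1-3 follow the question, rows 4-5 are made up.
days = [1, 2, 3, 4, 5]
orders = [100, 150, 800, 300, 50]

# Repeat each day value once per order: 100 ones, 150 twos, and so on.
expanded = [d for d, n in zip(days, orders) for _ in range(n)]

mean_days = statistics.mean(expanded)
median_days = statistics.median(expanded)
mode_days = statistics.mode(expanded)

# method="exclusive" places quantiles at (N + 1) * p, which should be
# comparable to QUARTILE.EXC in the formulas above.
q1, q2, q3 = statistics.quantiles(expanded, n=4, method="exclusive")

print(mean_days, median_days, mode_days, q1, q3)
```

Expanding is simple but uses memory proportional to the total number of orders, so the frequency-table approach of Method 1 scales better for large counts.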

Related

How to calculate no of days where sales were made in MS excel using sumifs and countifs?

I am working on an Excel sheet where I am required to calculate the average number of days on which the stores in a city were able to make some sales. I am attaching a sample of the table for reference. The values in the cells represent the number of units sold (not relevant to this question).
Here, two stores are present in NY, and out of the total number of days in consideration (3*2), sales were made on only 4 days, making the average 66%.
Similarly for Paris, there exists only one store which was open across all days.
To arrive at the figures, I tried using nested COUNTIFS and SUMIFS, but did not get the expected results. Also, in some older posts, users had suggested using INDEX MATCH with SUMIFS, but I was not able to get accurate results with these either.
Can anyone help me to get the correct figures for Total days, and Days with some sale.
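SUMPRODUCT SOLUTION
Total days (with the place name in A8):
=SUMPRODUCT(--(A$2:A$5=A8)*--(C$2:E$5<>""))
Days with some sale:
=SUMPRODUCT(--(A$2:A$5=A8)*--(C$2:E$5<>"NO SALE"))
Average (days with some sale in C8 divided by total days in B8):
=ROUND(C8/B8,4)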
First, according to your grid NY made sales on 4 of the 6 days. (NY1: Mon, Wed; NY2: Tues, Wed). Thus the average is not 50% but 66%.
Second, to get your formula. Assuming "Place" is in column A. Below is for NY, you can solve for the rest.
Total number of days:
In cell "C9": =COUNTIF(A2:A4,"=NY") * 3
Days with sales:
In cell "D9": =COUNTIF(C2:E2,"<>NO SALE") + COUNTIF(C4:E4,"<>NO SALE")
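The counting logic behind both answers can be sketched in Python. The grid below is hypothetical: it is reconstructed from the description (NY1 sells on Mon and Wed, NY2 on Tues and Wed, one Paris store sells every day), and the unit counts and the store label P1 are invented.

```python
# Hypothetical grid: (place, store, [Mon, Tue, Wed]); unit counts are made up.
rows = [
    ("NY", "NY1", [12, "NO SALE", 7]),
    ("NY", "NY2", ["NO SALE", 5, 9]),
    ("Paris", "P1", [3, 4, 6]),
]


def sales_day_summary(rows, place):
    """Return (total store-days, store-days with a sale, ratio) for one place."""
    cells = [cell for city, _store, week in rows if city == place for cell in week]
    total_days = len(cells)
    days_with_sale = sum(1 for cell in cells if cell != "NO SALE")
    return total_days, days_with_sale, round(days_with_sale / total_days, 4)


for place in ("NY", "Paris"):
    print(place, sales_day_summary(rows, place))
```

Filtering by place first and then counting cells mirrors the SUMPRODUCT approach, which avoids hard-coding row numbers per store.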

Excel Finding average speed

I have got 1500 rows of travels. In column A I have the total travel time, and in column B the total km driven. In column C I calculated the average speed of each travel. What's the best way to calculate the average speed of all travels? The lengths are from 0 to approximately 20 km, and the time is always shorter than one hour.
First I eliminated all travels shorter than 2 km. Then I managed to build a frequency table and have written the frequencies of speeds in bins of 0-5, 5-10, ... km/h. Now I can do a histogram, but should I eliminate more data, or how should I approach this problem?
In another cell enter:
=SUM(B:B)/SUM(A:A)
A common error would be to try to average the values in column C.
It depends on your data. If it is for statistics, don't throw data away; use it.
You have column A for travel time and column B for travel distance. Using these two columns, you can find the overall average speed as Gary's Student suggests, i.e. SUM(B:B)/SUM(A:A).
You also have column C, the average speed for each travel, which you can use to cross-check. Simply compute SUMPRODUCT(A:A,C:C); the result should equal SUM(B:B). If the results match, the calculation is consistent.
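To see why averaging column C is an error, here is a small Python sketch with three hypothetical trips (the numbers are invented; times are in hours and distances in km):

```python
# Hypothetical trips as (hours, km); not taken from the asker's data.
trips = [(0.5, 10.0), (0.25, 2.5), (0.75, 18.0)]

total_time = sum(hours for hours, _km in trips)
total_km = sum(km for _hours, km in trips)

# Overall average speed: total distance over total time, like SUM(B:B)/SUM(A:A).
overall_speed = total_km / total_time

# Per-trip speeds (column C) and their plain average, which weights a short
# trip as heavily as a long one.
speeds = [km / hours for hours, km in trips]
naive_mean = sum(speeds) / len(speeds)

# Cross-check: time-weighted speeds add back up to the total distance,
# like SUMPRODUCT(A:A, C:C) = SUM(B:B).
check_km = sum(hours * speed for (hours, _km), speed in zip(trips, speeds))

print(overall_speed, naive_mean, check_km)
```

Here the overall speed is about 20.3 km/h while the plain average of the per-trip speeds is 18 km/h, so the two can differ noticeably.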

How to generate random numbers within a normal distribution using Excel

I want to use the RAND() function in Excel to generate a random number between 0 and 1.
However, I would like 80% of the values to fall between 0 and 0.2, 90% of the values to fall between 0 and 0.3, 95% of the values to fall between 0 and 0.5, etc.
This reminds me that I took an applied statistics course once upon a time, but not of what was actually in the course...
What is the best way to go about achieving this result using an Excel formula? Alternatively, what is this kind of statistical calculation called, or are there any other pointers that I can Google around for?
=================
Use case:
I have a single column of meter readings, which I would like to duplicate 7 times (each column for a new month). each column has 55 000 rows. While the meter readings need to vary for each month, when taken as a time series, each meter number should have 7 realistic readings.
The aim is to produce realistic data to turn into heat maps (i.e. flag outlying meter readings)
I don't think that there is a formula which would fit exactly to your requirements. I would use a very straightforward solution:
Generate 80% of data using =RANDBETWEEN(0,20)/100
Generate 10% of data using =RANDBETWEEN(20,30)/100
Generate 5% of data using =RANDBETWEEN(30,50)/100
and so on
You can easily change the precision of generated data by modifying the parameters, for example: =RANDBETWEEN(0,2000)/10000 will generate data with up to 4 digits after decimal point.
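The same band-by-band idea can be sketched as a single draw per value in Python. This is an illustration only: the function name skewed_value is hypothetical, and because the answer leaves the last 5% unspecified ("and so on"), the sketch assumes it falls between 0.5 and 1.

```python
import random


def skewed_value(rng):
    """One draw: 80% in [0, 0.2], 10% in [0.2, 0.3], 5% in [0.3, 0.5].

    The remaining 5% is ASSUMED to fall in [0.5, 1.0]; the original answer
    does not say where it should go.
    """
    u = rng.random()  # picks the band
    if u < 0.80:
        return rng.uniform(0.0, 0.2)
    if u < 0.90:
        return rng.uniform(0.2, 0.3)
    if u < 0.95:
        return rng.uniform(0.3, 0.5)
    return rng.uniform(0.5, 1.0)


rng = random.Random(42)  # fixed seed so the run is repeatable
samples = [skewed_value(rng) for _ in range(100_000)]

share_below_02 = sum(s <= 0.2 for s in samples) / len(samples)
share_below_03 = sum(s <= 0.3 for s in samples) / len(samples)
share_below_05 = sum(s <= 0.5 for s in samples) / len(samples)
print(share_below_02, share_below_03, share_below_05)
```

The printed shares should come out close to 0.80, 0.90 and 0.95, matching the cumulative percentages in the question.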
UPDATE
Use a normal distribution for the use case, for example:
=NORMINV(RAND(), 20, 5)
where 20 is a mean value and 5 is a standard deviation.
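NORMINV(RAND(), 20, 5) is inverse transform sampling: a uniform random number is pushed through the inverse of the normal cumulative distribution. A minimal Python sketch of the same technique, using NormalDist from the standard statistics module (the helper name norminv_draw is hypothetical):

```python
import random
from statistics import NormalDist, mean, stdev

# Same parameters as the formula above: mean 20, standard deviation 5.
dist = NormalDist(mu=20, sigma=5)


def norminv_draw(rng):
    """Inverse transform sampling: uniform draw -> normal value."""
    u = rng.random()
    # inv_cdf needs 0 < u < 1, and random() can return exactly 0.0
    while u == 0.0:
        u = rng.random()
    return dist.inv_cdf(u)


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [norminv_draw(rng) for _ in range(10_000)]
print(mean(samples), stdev(samples))
```

A normal distribution is unbounded, so for meter readings it may be necessary to clip or redraw negative values; that choice is left open here.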

Excel- Average days between group of dates

I'm trying to use Excel to calculate the average frequency of delivery for a set of parts. I have a data set that has two columns: part number and delivery date. I'm trying to figure out how often parts get delivered, on average, in terms of days. I tried using nested IFs like averageif(a2=a2:b9999,datedif(xx)) etc., but to no avail. I'm looking for this:
Input:
Part A 8.1
Part A 8.8
Part A 8.15
Output: Part A Average Delivery - Every 7 Days
etc etc. Any ideas?
If your dates are in column B:
=(MAX(B:B)-MIN(B:B)+1)/COUNT(B:B)
or:
=(MAX(B:B)+1-MIN(B:B))/COUNTA(B:B)
should serve.
Edit
If you have multiple parts (the above assumed only one) and the list is in no particular order, then a PivotTable may be best (say with its top left-hand corner in D1), in Tabular form, with Part for Row Labels and Delivery three times for Σ Values (the first as MAX, the second as MIN and the third as COUNT). Then =(1+E3-F3)/G3 copied down should give you the average number of days between deliveries. For example, 5 in your example (3 deliveries in 15 days).
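The per-part grouping that the PivotTable performs can be sketched in Python. The data is hypothetical: Part A uses the question's dates read as 1, 8 and 15 August of an arbitrary year, and Part B is invented. The sketch also shows an alternative reading, the average gap between consecutive deliveries, which is an assumption about what the asker may have meant by "Every 7 Days".

```python
from collections import defaultdict
from datetime import date

# Hypothetical data: Part A follows the question (year chosen arbitrarily),
# Part B is made up.
deliveries = [
    ("Part A", date(2023, 8, 1)),
    ("Part A", date(2023, 8, 8)),
    ("Part A", date(2023, 8, 15)),
    ("Part B", date(2023, 8, 3)),
    ("Part B", date(2023, 8, 22)),
]


def group_by_part(deliveries):
    by_part = defaultdict(list)
    for part, day in deliveries:
        by_part[part].append(day)
    return by_part


def days_per_delivery(deliveries):
    """(MAX - MIN + 1) / COUNT per part, as in the formulas above."""
    return {
        part: ((max(days) - min(days)).days + 1) / len(days)
        for part, days in group_by_part(deliveries).items()
    }


def average_gap(deliveries):
    """Alternative reading: (MAX - MIN) / (COUNT - 1), the mean gap in days."""
    return {
        part: (max(days) - min(days)).days / (len(days) - 1)
        for part, days in group_by_part(deliveries).items()
        if len(days) > 1
    }


print(days_per_delivery(deliveries))
print(average_gap(deliveries))
```

For Part A the first calculation gives 5 (3 deliveries over a 15-day span) and the second gives 7 (two gaps of 7 days each).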

Likelihood of a Distribution of Values Occurring Randomly

I have a data matrix depicting the number of telephone calls from one telephone to another; all calls are unidirectional. Rows are days of the week and columns are one-hour blocks of a 24-hour clock. Values in the cells represent the number of telephone calls from telephone A to telephone B for that specific hour. The data is not a sample; it is the full population.
I would like to have a repeatable measure that enables me to tell my audience that the likelihood of this distribution occurring randomly is <x.
I'd like the formula for Excel 2007 or, as a last resort, VBA code.
I've searched and found answers that tell me how to statistically determine the significance of differences between two different data sets but not how to measure for just one data set against a random outcome.
Thanx in advance.
If the total number of calls in a given hour is T, and the total calling population is P, then the number of calls from A to B should be about T/P if "random". To test whether this is really the case, you'd use the chi-squared test. I'm afraid I don't have time to give you the full answer, but it would be testvalue=sum((observed_i - T/P)^2/(T/P)), where you check the test value against the chi-squared table, and you can pick off the probability too. Excel can calculate these values. Refer to Chi-Squared Test for more details.
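A minimal Python sketch of a chi-squared statistic for a day-by-hour matrix follows. Note the swapped-in assumption: instead of the T/P expectation described above, the null hypothesis here is that every cell of the matrix is equally likely, so the expected count is simply the total divided by the number of cells. The function name and the small 2 x 4 matrix are hypothetical.

```python
def chi_squared_uniform(counts):
    """Chi-squared statistic against an ASSUMED null of equal counts per cell.

    counts: list of rows (e.g. days), each a list of observed counts (e.g. hours).
    Returns (statistic, degrees_of_freedom).
    """
    cells = [c for row in counts for c in row]
    expected = sum(cells) / len(cells)
    statistic = sum((observed - expected) ** 2 / expected for observed in cells)
    return statistic, len(cells) - 1


# Hypothetical 2 x 4 matrix of call counts, invented for illustration.
counts = [
    [10, 20, 30, 20],
    [5, 15, 40, 20],
]
statistic, dof = chi_squared_uniform(counts)
print(statistic, dof)
```

In Excel 2007, CHIDIST(statistic, dof) returns the right-tail probability for a statistic like this, which is the "likelihood of occurring randomly" figure under the chosen null hypothesis.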

Resources