Related
I've had a bit of trouble explaining this so please bear with me. I'm also very new to using excel so if there's a simple fix, I apologize in advance!
I have two columns, one listing number of days starting from 0 and increasing consecutively. The other column has the number of orders delivered. The two correspond to each other. For example, I've typed out how it would look below. It would mean that there were 100 orders delivered in 1 day, 150 orders delivered in 2 days, 800 orders delivered in 3 days, etc.
Is there a way to get summary statistics (mean, median, mode, upper and lower quartiles) for the number of days it took for the average order to get delivered? The only way I can think of solving this is to manually punch in "1" 100 times, "2" 150 times, etc. into a new column and take median, mean, and upper & lower quartile from that, but that seems extremely inefficient. Would I use a pivot table for this? Thank you in advance!
I tried using the data analysis add-on and doing summary statistics that way, but it didn't work. It just gave me the mean, median, mode, and quartiles of each individual column. It would have given me 3 for median number of days for delivery and 300 for median number of orders.
Method 1
The mean is just
=SUMPRODUCT(A2:A6,B2:B6)/SUM(B2:B6)
Mode is the value with highest frequency
=INDEX(A2:A6,MATCH(MAX(B2:B6),B2:B6,0))
The quartiles and median (or any other quantile by varying the value of p) from first principles following this reference
=LET(p,0.25,
values,A2:A6,
freq,B2:B6,
N,SUM(freq),
h,(N+1)*p,
floorh,FLOOR(h,1),
ceilh,CEILING(h,1),
frac,h-floorh,
cusum,SCAN(0,SEQUENCE(ROWS(values)),LAMBDA(a,c,IF(c=1,0,a+INDEX(freq,c-1)))),
xlower,XLOOKUP(floorh-1,cusum,values,,-1),
xupper,XLOOKUP(ceilh-1,cusum,values,,-1),
xlower+(xupper-xlower)*frac)
Method 2
If you don't like doing it this way, you can always expand the data like this:
=AVERAGE(XLOOKUP(SEQUENCE(SUM(B2:B6),1,0),SCAN(0,SEQUENCE(ROWS(A2:A6)),LAMBDA(a,c,IF(c=1,0,INDEX(B2:B6,c-1)+a))),A2:A6,,-1))
=MODE(XLOOKUP(SEQUENCE(SUM(B2:B6),1,0),SCAN(0,SEQUENCE(ROWS(A2:A6)),LAMBDA(a,c,IF(c=1,0,INDEX(B2:B6,c-1)+a))),A2:A6,,-1))
=QUARTILE.EXC(XLOOKUP(SEQUENCE(SUM(B2:B6),1,0),SCAN(0,SEQUENCE(ROWS(A2:A6)),LAMBDA(a,c,IF(c=1,0,INDEX(B2:B6,c-1)+a))),A2:A6,,-1),1)
=MEDIAN(XLOOKUP(SEQUENCE(SUM(B2:B6),1,0),SCAN(0,SEQUENCE(ROWS(A2:A6)),LAMBDA(a,c,IF(c=1,0,INDEX(B2:B6,c-1)+a))),A2:A6,,-1))
and
=QUARTILE.EXC(XLOOKUP(SEQUENCE(SUM(B2:B6),1,0),SCAN(0,SEQUENCE(ROWS(A2:A6)),LAMBDA(a,c,IF(c=1,0,INDEX(B2:B6,c-1)+a))),A2:A6,,-1),3)
Hopefully the title makes some sense because I'm trying to wrap my head around the logic and I'm not quite sure how to phrase the question.I'll try to give a brief explanation of the end goal without over complicating it with unnecessary details.
I have a table of survey score averages for every month per person and a correlating table with the number of surveys each person received for each month. The logic is essentially multiple the score for each month by the number of surveys, combine them, divide by the total number of surveys within that time period to get their true average. Where things get a little complicated is that I have to include the ability to set a custom date range and return the value. So sometimes I might be looking at the average for Jan - Apr, other times I might just be looking at Feb-Mar etc.
I think sumproduct is going to get what I need done but I'm running into issues trying to write it out. I've written it several different ways and none of them worked so here's one that best conveys what I'm trying to do,
=SUMPRODUCT(--(F7:I7,L7:O7>=C2),--(F7:I7,L7:O7<=C3),--(E8:E12,K8:K12=B9),tbl_average[[Jan-20]:[Apr-20]],tbl_surveys[[Jan-20]:[Apr-20]])
I super appreciate any assistance I can get on this. I'm hoping the end result is not nearly as difficult as I'm making it out to be.
Some additional information:
I'm going to be using this same process to calculate multiple metrics across multiple worksheets.In the test example each of the tables will most likely be on different sheets. The dashboard with the calculated results will contain everyone's names and will be filtered and rearranged frequently, so I need to make sure we're always matching directly to their names and not just the relative rows. Basically, in my example I show that Agent 1 is always lined up on row 8 but that's not always going to be the case. Agent 1 could be in Row 8 on Sheet 1, Row 10 on Sheet 2, and Row 12 on Sheet 3 and I need all the correct values to multiply and sum against one another.
Example data with desired outcome that I need to calculate
I have 12 items of a certain current value. I have a 'soft' cap of $1,000,000 for these values. Some of the items fall above, and some below this cap level.
I have an amount of money (for this example $900,000) that I want to distribute amongst only the items that fall below the cap (in this example 6 items), with the aim of bringing the value of these items up to but not over the cap value.
If I distribute the $900,000 evenly over these 6 items (each receiving $150,000), you can see that items 2 and 9 would then be over the $1,000,000 cap. So items 2 and 9 should only receive $100,000 to raise their value to the cap, then the remaining 4 items would receive and equal share on the remaining pool of money ($700,000 / 4 = $175,000).
So I need a formula to check every item to see if it needs a distribution (i.e below the cap) and then portion/divide out the money pool as illustrated above in the desired distribution column.
Note: The pool of money to be distributed can change. Also the number of items below the cap can change. The cap value itself can change.
I am hoping to avoid VBA or Solver because the spreadsheet could be used on other people's computers.
Hopefully this makes sense. Thanks.
EDIT:
So far I have been able to get close by adding a helper column and using the following formula:
=IF(SUM($F$6:F14)=$D$23,0,E15*MIN(D15,($D$23-SUM($F$6:F14))/SUM(E15:$E$18)))
Working example when values are sorted.
This seems to work when the values are sorted in descending order, as shown in the example image above. But seems to break when the values are a bit more randomly assorted which is likely to happen (as in the original post).
Just to give you an idea of how the solver can be set up to do a capital budget model here is one, also shows the solver and its settings:
I am working on a model of charging load of electric vehicle. I am attaching a link to an excel workbook for your better understanding.
Column B contains random time values
Column G to P represents houses and each house can have 1 car. So the each time values needs to be distributed in one column. Now when a car is plugged in, its load stays constant for 3 cells.
I want excel to randomly distribute these cars e.g. 4 cars to 4 houses and leave others blank.
what i can think of is, to assign each time a random house then use IF formula with AND function to match random times with time series and second condition to match random houses with columns 1-10.
the problem i am facing is, the formula gives a value error and only works in the rows with has random generated time in front of them screenshot. I know there is a very small thing that i am missing. please help me find it
Regards
workbook
=IF(ISNA(MATCH(G$5,$C$6:$C$9,FALSE)),"",IF(AND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE))>=$F6,INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE))<=$F6+TIME(0,30,0)),11,""))
The two elements in the AND find the house number in column C and return the corresponding time in column B.
The first element compares the time in F to that time. The second element compares the time + 30 minutes to F (three cells). If it's between those two times, it gets an 11.
The ISNA makes sure that the house in question is on the list. You could also use an IFERROR, but I prefer the precision of ISNA.
Update
If you want the values to wrap around, you need to OR compare to the next day.
=IF(ISNA(MATCH(G$5,$C$6:$C$9,FALSE)),"",IF(OR(AND(ROUND($F6,5)>=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE)),5),ROUND($F6,5)<=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE))+TIME(0,30,0),5)),AND(ROUND($F6+1,5)>=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE)),5),ROUND($F6+1,5)<=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE))+TIME(0,30,0),5))),11,""))
That formula structure looks like
=If(isna(),"",if(or(and(today,today),and(tomorrow,tomorrow)),11,"")
This formulas already getting too big. If you triple it for your three voltages, it will be huge. You should consider writing a UDF in VBA. It won't be as quick to calculate, but will probably be more maintainable.
If you want to stick with a formula, you could put the wattage in row 4 above the house number. Then in another table, list the wattages and minutes to charge. So in, say, B12:C14 you have
3.7 120
11 30
22 15
Now where you have 11 in your formula, you'd have G$4 and the two placed you have TIME(0,30,0), you'd have TIME(0,INDEX($C$12:$C$14,MATCH(G$4,$B$12:$B$14,FALSE)),0). I re-arranged some stuff to make it more 'readable' (but it's still pretty tough) and here's the final formula
=IF(ISNA(MATCH(G$5,$C$6:$C$9,FALSE)),"",IF(OR(AND(ROUND($F6,5)>=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE)),5),ROUND($F6,5)<=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE))+TIME(0,INDEX($C$12:$C$14,MATCH(G$4,$B$12:$B$14,FALSE)),0),5)),AND(ROUND($F6+1,5)>=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE)),5),ROUND($F6+1,5)<=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE))+TIME(0,INDEX($C$12:$C$14,MATCH(G$4,$B$12:$B$14,FALSE)),0),5))),G$4,""))
I'm trying to find the rolling mean of time series while ignoring values that do not follow the trend.
x
869
1570
946
0
1136
So, what I would want the result to look like is...
x | y
869 | 0
1570 | 0
946 | 1128.33
3 | 0
1136 | 1217.33 ([1136+1570+946]/3)
900 | 2982 ([946+1136+900]/3)
860 | 2896
The tough part here is if the row I'm on is a trending value I want to take the 3 previous trending values and find them mean of them, but if it's a non-trending value I want it to just zero out. Sometimes I might have to skip 2 or 3 previous lines to get 3 trending values to take the average as well.
So far I've been using array, RC formulas in a VBA macro form, but I'm not sure I could use RC here or if it has to be something else completely. Any help would be greatly appreciated.
I believe I can help you with your problem. First three notes:
1) It appears to me that you are trying to do DCA on smoothed production profiles, ignoring months without a complete record or no data. I'm making this assumption since you mentioned this was time series data but didn't give a sample rate. 2) I've added some extra 'data' for the sake of demo-ing. 3) In your example you shared, the last two values in your 'Y' column it looks like you may have summed but have forgotten to divide.
The solution I came up with has three parts: 1) create a metric to identify 'outliers'; 2) flag 'outliers'; 3) smooth non-flagged data. Let's establish some worksheet infrastructure and say that your production values are in column B and the associated time is in column A as follows:
Part 1) In column 'C', estimate a rough data value based on a trend approximated from two points on either side of your current time step. Subtract the actual value from this approximation. The result will always be positive and quite large for a timestep with little or no production.
=(INTERCEPT(B1:B6,A1:A6)+(A4*SLOPE(B1:B6,A1:A6)))-B4
Part 2) In column 'D', add a condition for when the value computed above is larger than the actual data point. Have it use '0' to identify a point that shouldn't be included in your average. Copy this down to the end of your data as well.
=IF(C4>B4,0,1)
Our sheet now looks like this:
3) Your three element average can now be computed. In the last cell of column 'E', enter the following array formula. You have to accept this formula by pressing ctrl + shift + enter. Once that is done fill the column from bottom to top:
=IFERROR(IF(D17=1,AVERAGE(INDEX(B12:B17,MATCH(2,1/(FIND(1,D12:D17)))),INDEX(B12:B16,MATCH(2,1/(FIND(1,D12:D16)))-COUNTIF(D17,"=0")),INDEX(B12:B15,MATCH(2,1/(FIND(1,D12:D15)))-COUNTIF(D16:D17,"=0"))),0),"")
This takes averages the most recent three values and allows for a skip of up to three time steps of outlier data per your problem statement. For an idea of how the completed sheet looks:
This was a fun challenge, I have some ideas for a more efficient formula but this should get the job done. Please let me know how this works for you!
Cheers
[EDIT]
An alternative approach which allows the user to specify the number of previous entries to include is detailed below. This is a more general (preferred alternative) and picks up in place of the previously described step 3.
3Alt) In cell G2 enter a number of previous values to average, for this example I am sticking with 3. In cell E4 enter the following array expression (ctrl+shift+enter) and drag to the end of column E:
=IFERROR(IF(D4=1,SUM(INDEX(D:D,LARGE(($D$4:D4=1)*ROW($D$4:D4),$G$2)):D4 * INDEX(B:B,LARGE(($D$4:D4=1)*ROW($D$4:D4),$G$2)):B4)/$G$2,0),"")
This uses the LARGE function to find the 'nth' largest value, where n is the number of preceding values from the current time-step to average. Then it builds a range that extends from the found cell to the current time step. Then it multiplies the flags (0's and 1's) by each month's production value, sums them and divides by n. In this way months flagged as bad are set to 0 and not included in the sum.
This is a much cleaner way to achieve the desired result and has the flexibility to average different periods of time. See example of the final value below.