I have two columns in a table that are known as S and D, where S is a date, and D is a duration.
E.g.:
'S' Column
January
February
March
April
'D' Column
60
30
45
30
On a separate sheet, imagine that Row 1 contains a sequence of dates (variable, depends on user menu selection).
Row 2 requires the following calculation:
[(x - s1)/d1 + (x-s2)/d2 + ... + (x-sn)/dn] / n
...where "x" is any date along Row 1.
The calculation would only be done when multiple criteria are matched.
My initial attempt involved creating a separate table, but I think this can be done in a one-cell formula in Row 2. I don't think a sum(index(match)) type would work here considering d1, d2, ..., dn are denominators with different values.
Here is an example attempt:
=SUM((SelectedDate-INDEX(Table[StartDates],MATCH(Criteria1&Criteria2,Range1&Range2,0))/INDEX(Table[Durations],MATCH(Criteria1&Criteria2,Range1&Range2,0))))
It may be important to note that I am able to do this in a two-step fashion. First, I create a table that does the calculation on each row. Then, I reference the table. It would be nice if I can eliminate the need of a "calculation table" in favour of an array-type formula.
I took a guess before most of the comments and suggest the following mainly in case it provides any ideas or helps to specify the requirement more closely. This is "two-step" because the average formula is separate from the formula for each column, so I appreciate it may well not be what is required.
Assuming S and D are labelled data in Sheet2 ColumnsA:B.
Assuming Row1 data rows are labelled in ColumnA.
Assuming dates are not strings (eg January is actually Jan 1) and the differences are in days and always positive, etc.
=(B$1-CHOOSE(COLUMN(),Sheet2!$A1,Sheet2!$A2,Sheet2!$A3,Sheet2!$A4))/CHOOSE(COLUMN(),Sheet2!$B1,Sheet2!$B2,Sheet2!$B3,Sheet2!$B4)
in Row 2, copied across to suit. If there are four data columns, then in F2, copied down to suit:
=AVERAGE(B2:E2)
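The arithmetic being asked for, the mean of (x - s_i)/d_i, can be cross-checked outside Excel. The Python sketch below is only an illustration: the year 2023, the reading of each month as its first day, and the function name are assumptions, not part of the question.

```python
from datetime import date

def mean_elapsed_fraction(x, starts, durations):
    """Average of (x - s_i) / d_i over all rows, with differences in days."""
    terms = [(x - s).days / d for s, d in zip(starts, durations)]
    return sum(terms) / len(terms)

# Hypothetical data: the sample S column read as the first of each month in 2023.
starts = [date(2023, 1, 1), date(2023, 2, 1), date(2023, 3, 1), date(2023, 4, 1)]
durations = [60, 30, 45, 30]   # the sample D column
x = date(2023, 5, 1)           # one date taken from Row 1

result = mean_elapsed_fraction(x, starts, durations)
```

Because each term has its own denominator, the division has to happen per row before averaging, which is why a plain SUM of one lookup cannot reproduce it.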
I am trying to create a formula that gives me the average of the last 12 entries in a given dataset depending on the associated vector.
Let's make an example:
In cells F2, G2, H2 and I2 I have the headers Date, Company1, Company2 and Company3 respectively. Then from row 3 to row 33 I have monthly dates starting from May 2016.
Date     Company1      Company2     Company3
May-16                 2,453,845
Jun-16                 13,099,823
Jul-16                 14,159,037
Aug-16   38,589,050    8,866,101
Sep-16   63,290,285    13,242,522
Oct-16   94,005,364    14,841,793
Nov-16   123,774,792   7,903,600    41,489,883
Dec-16   93,355,037    12,449,604   69,117,105
Jan-17   47,869,982    13,830,712   83,913,764
Feb-17   77,109,905    10,361,555   68,176,643
The goal is to create a formula that, when I drag it down, correctly calculates the average of the last 12 values for a given company.
So, for example, I would have, say in table "B2:C5":
Company1 76,856,345
Company2 11,120,859
Company3 65,674,349
And, if a new Company4 is added to the list, I just have to drag the formula down to calculate the average of the last 12 months for Company4.
So far, I have come up with this formula:
=AVERAGE(LOOKUP(LARGE(IF(ISNUMBER(G:G),ROW(G:G)),ROW(INDIRECT("1:"&MIN(12,COUNT(G:G))))),ROW(G:G),G:G ))
This formula correctly calculates the average of a given column, considering only the last 12 values. The last step would be to come up with a formula that includes all the columns and then calculates the average for the given company.
Thanks!
I recommend that you use a named range to define your data in columns G:I. When a company is added, just modify the named range's specs. I used the name Target. Of course, you can replace it with $G:$I if you feel so inclined but I would rather recommend reducing the number of rows in the range, which is easier to manage when it is named.
Use the formula below to extract the company names from the first row of Target into the first column of your averages table. This is to ensure that the names are spelled identically in both locations.
=INDEX(Target,1,ROW()-2)
The number 2 is the number of rows above the row containing the formula, which is copied here from cell M3. There, ROW()-2 evaluates to 1, and the result counts up sequentially as the formula is copied down.
Now I have the formula below in my cell N3 and copied down.
=SUM(INDEX(Target,0,MATCH($M3,INDEX(Target,1,0),0)))
The formula simply sums up the columns G, H, and I in 3 consecutive rows.
In the final step I inserted the range definition established above, meaning excluding the SUM() function, into your existing formula.
=AVERAGE(LOOKUP(LARGE(IF(ISNUMBER(INDEX(Target,0,MATCH($M3,INDEX(Target,1,0),0))),ROW(INDEX(Target,0,MATCH($M3,INDEX(Target,1,0),0)))),ROW(INDIRECT("1:"&MIN(12,COUNT(INDEX(Target,0,MATCH($M3,INDEX(Target,1,0),0))))))),ROW(INDEX(Target,0,MATCH($M3,INDEX(Target,1,0),0))),INDEX(Target,0,MATCH($M3,INDEX(Target,1,0),0))))
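As a cross-check of what the combined formula is meant to return, here is a minimal Python sketch of "average the last 12 numeric entries of the column whose header matches the company". The function name and the use of None for blank cells are assumptions for illustration; the figures are the sample data from the question.

```python
def average_last_n(column, n=12):
    """Mean of the last n numeric entries, skipping blanks (None)."""
    numeric = [v for v in column if isinstance(v, (int, float))]
    tail = numeric[-n:]              # fewer than n values: use what exists
    return sum(tail) / len(tail)     # assumes at least one numeric entry

# Sample columns, May-16 to Feb-17; None marks a blank cell.
target = {
    "Company1": [None, None, None, 38589050, 63290285, 94005364,
                 123774792, 93355037, 47869982, 77109905],
    "Company2": [2453845, 13099823, 14159037, 8866101, 13242522,
                 14841793, 7903600, 12449604, 13830712, 10361555],
    "Company3": [None, None, None, None, None, None,
                 41489883, 69117105, 83913764, 68176643],
}

averages = {name: average_last_n(col) for name, col in target.items()}
```

Rounded to whole numbers, these match the expected results listed in the question.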
I have 7 columns to choose from and I need to pick 4 of those columns and generate a total for each row. I also need every combination of 4, which means I'll have 35 new columns with the totals for each of those combinations showing in each row. I need the code for this and if it can be done only using Excel. Here is an image of the columns and the grayed ones are the 7 columns I'm talking about. My knowledge of Excel is very limited. There are over 1,500 rows if that matters.
This is a multi-step approach that uses some helper rows. There may be a more elegant formula that will do this, and much slicker options in VBA, but this is a formula-only approach.
Step 1 - Generate List of Column Combinations
To generate the list, 4 helper rows need to be inserted at the top of your data, either above or below your header row. These 4 rows represent the column numbers you are going to pick. To keep the math simpler, I assumed 1 for the first column and 7 for the last column. Those numbers will be converted later to account for the columns in between in your spreadsheet. For the sake of this example, the first combination sum will occur in column AO and the first helper row will be row 1. The first combination is hard-coded and seeds the pattern for the remaining column combinations. Enter the following values in the corresponding cells:
AO1 = 1
AO2 = 2
AO3 = 3
AO4 = 4
In the adjacent column a formula will be placed and copied to the right. It increases the bottom value by 1 until that value hits its maximum, at which point the value in the row above increases by 1 and the current row's value becomes 1 more than the cell above. This produces a pattern that covers all 35 combinations by the time column BW is reached. Place the formulas below in the appropriate cells and copy to the right:
AP1
=IF(AO2=5,AO1+1,AO1)
AP2
=IF(AO2=5,AP1+1,IF(AO3=6,AO2+1,AO2))
AP3
=IF(AO3=6,AP2+1,IF(AO4=7,AO3+1,AO3))
AP4
=IF(AO4=7,AP3+1,AO4+1)
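To see that this pattern really does visit every combination, the four formulas can be replayed outside Excel. The Python sketch below is only a check of the recurrence; the function and variable names are made up for illustration.

```python
from itertools import combinations

def next_column(prev):
    """Apply the AP1:AP4 formulas to the previous column's four values."""
    a1, a2, a3, a4 = prev
    p1 = a1 + 1 if a2 == 5 else a1
    p2 = p1 + 1 if a2 == 5 else (a2 + 1 if a3 == 6 else a2)
    p3 = p2 + 1 if a3 == 6 else (a3 + 1 if a4 == 7 else a3)
    p4 = p3 + 1 if a4 == 7 else a4 + 1
    return (p1, p2, p3, p4)

columns = [(1, 2, 3, 4)]      # the hard-coded seed in AO1:AO4
while len(columns) < 35:      # AO through BW is 35 columns
    columns.append(next_column(columns[-1]))

# Reference list: every way to choose 4 of the numbers 1..7.
all_picks = list(combinations(range(1, 8), 4))
```

The generated columns run from (1, 2, 3, 4) to (4, 5, 6, 7) in the same order as the reference list, with no repeats.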
Step 2 - Sum The Appropriate Columns
I was hoping to use some sort of array operation to read through the column reference numbers above, but I could not get my head around it. Since there were just 4 entries to worry about, I simply added each reference manually in a sum. The important thing to note is that the INDEX function works over the 13 columns (AB:AN) that span your 7 columns. To convert the index numbers worked out above into positions that pick every second column, each number is multiplied by 2 and then 1 is subtracted. That means 1,2,3,4 for the first column combination becomes 1,3,5,7. You can see this in the following formula. Place it in the appropriate cell and copy down and to the right as needed.
AO5
=INDEX($AB5:$AN5,AO$1*2-1)+INDEX($AB5:$AN5,AO$2*2-1)+INDEX($AB5:$AN5,AO$3*2-1)+INDEX($AB5:$AN5,AO$4*2-1)
Pay careful attention to the $ signs, which lock the row or column reference and prevent it from changing as the formula is copied.
Now you may need to adjust the cell references to match your sheet.
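The index conversion in Step 2 can be sketched in Python as follows. The row contents are hypothetical (data in every second cell with filler in between) and the function name is an assumption; only the 2k-1 mapping comes from the formula above.

```python
def combination_total(row_cells, picks):
    """Sum the cells selected by 1-based picks, mapping k to position 2k-1."""
    return sum(row_cells[(2 * k - 1) - 1] for k in picks)  # extra -1: 0-based list

# Hypothetical row AB5:AN5: 13 cells, values in positions 1, 3, 5, ..., 13.
row_cells = [10, "x", 20, "x", 30, "x", 40, "x", 50, "x", 60, "x", 70]

first = combination_total(row_cells, (1, 2, 3, 4))   # positions 1, 3, 5, 7
last = combination_total(row_cells, (4, 5, 6, 7))    # positions 7, 9, 11, 13
```

If the 7 columns on your sheet are spaced differently, the 2k-1 mapping is the part to change.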
I have a spreadsheet with different products, listing units and retail value sold like the example below
Product Units Value
A 10 100
B 15 80
C 30 560
I'd like to compare the Average Selling Price with the Median Selling price, so I am looking for a quick formula to accurately calculate the median.
The median function requires the entire series, so for Product A above I would need 10 instances of 10 etc. How can I calculate the Median quickly considering the condensed form of my data?
Without writing your own VBA function to do this there are a couple of approaches that can be taken.
The first expands the data from its compressed frequency count format to generate the full set of observations. This can be done manually or formulaically. On the assumption the latter is required, it can be achieved using a few columns.
All the blue cells are formulae.
Column E is simply the cumulative total of column B, and F is an adjusted version of this. Column H is just the values 1 to 55, the total number of observations given by cell L2. Column I uses MATCH() with its final argument as 1 to match each observation in H against the adjusted cumulative in F. Column J uses the INDEX() function to generate the value of the observation. (Observations 1-10 have value 100, 11-25 have value 80 and 26-55 have value 560 in this example.) The MEDIAN() function is used in cell M2 with column J as its argument.
This approach can be refined to take account of varying numbers of products and data points through the use of the OFFSET function to control the range arguments of the MATCH(), INDEX() and MEDIAN functions. And, of course, adjacent cells in columns I and J could be combined using a single formula - I've shown them separately for ease of explanation.
The second approach involves sorting the data by value (so in this case the data rows would become Product B in row 2, product A in row 3 and product C left as-is in row 4). It is then a case of identifying the middle observation number (if the number of observations is odd) or the middle pair of observation numbers (if the number of observations is even) and then determining the value(s) corresponding to this/these middle observation(s). In this approach the adjusted cumulative in column F is still used but rather than explicitly calculating the values in column I and J for every observation it can now be restricted to just the middle observation(s).
I think there is no way around a compromise: either use a large number of helper cells or have the table sorted by value.
Helper cells:
Formula in F4:AS6:
=IF(COLUMN()<COLUMN($F$4)+$B4,$C4,"end")
Formula in D2:
=MEDIAN(F4:AS6)
Sorted:
Formula in F4 downwards:
=SUM($B$3:B3)+1
Formula in D2:
=SUM(LOOKUP(INT(SUM(B4:B6)/2+{0.5,1}),F4:F6,C4:C6))/2
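The sorted approach can be cross-checked with a short Python sketch. It follows the same idea (find the middle observation number or numbers, then look up which row's cumulative count covers them), but the function name and data layout are assumptions for illustration, and the values follow the example used in the first answer.

```python
from statistics import median

def median_from_counts(rows):
    """Median of (count, value) rows without expanding them."""
    rows = sorted(rows, key=lambda r: r[1])        # sort by value
    total = sum(count for count, _ in rows)
    # 1-based positions of the middle observation(s); equal when total is odd.
    middles = [(total + 1) // 2, total // 2 + 1]
    picked = []
    for pos in middles:
        seen = 0
        for count, value in rows:
            seen += count                          # cumulative count so far
            if pos <= seen:
                picked.append(value)
                break
    return sum(picked) / 2

rows = [(10, 100), (15, 80), (30, 560)]            # (units, value) for A, B, C
result = median_from_counts(rows)

# Brute-force check: expand every observation and take the ordinary median.
expanded = [value for count, value in rows for _ in range(count)]
```

With 55 observations both middle positions are 28, which falls in the last sorted row.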
For example, I need to create a merit list of a few students based on total marks (column C), and then on higher marks in math (column B) -
A B C D
-------------------------
Student1 80 220 1
Student2 88 180 3
Student3 90 180 2
Expected merit position is given in column D.
I can use the RANK function, but only on one column (the total marks). If several students have the same total, I cannot find a way to break the tie.
You can try this one in D1
=COUNTIF($C$1:$C$99,">"&C1)+1+SUMPRODUCT(--($C$1:$C$99=C1),--($B$1:$B$99>B1))
and then copy/fill down.
Let me know if this helps.
Explanation
Your first criterion sits in column C, and the second criterion sits in column B.
First, the formula counts the number of entries in $C$1:$C$99 that are bigger than the entry itself (C1). For the top-ranked entry this count is zero, so you need to add 1 to each result (+1).
Up to this point, you will get duplicate rankings if the same value occurs twice. You therefore need another term that does some extra calculation based on the second criterion:
To resolve ties, you SUMPRODUCT two arrays and add the result to the previous term. The goal is to count the entries that are equal to this entry ($C$1:$C$99=C1) and have a bigger value in the second criterion column ($B$1:$B$99>B1):
You add -- to convert TRUE and FALSE to 1s and 0s so that you can multiply them:
SUMPRODUCT(--($C$1:$C$99=C1),--($B$1:$B$99>B1))
The first array flags the entries that tie with this one on the first criterion, and the second array flags the entries with a bigger second-criterion value than this one.
Note that you can add as many entries as you like to your columns, but remember to update the ranges in the formula. They currently stop at row 99, and you can extend them to as many rows as you want.
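The same counting logic can be written out in Python to check it against the sample data. This is a sketch of the idea, not a replacement for the formula, and the function and variable names are made up.

```python
def merit_rank(totals, maths):
    """Rank by total (descending), breaking ties by math marks (descending)."""
    ranks = []
    for t, m in zip(totals, maths):
        # COUNTIF part: how many totals are strictly higher.
        higher_total = sum(1 for other in totals if other > t)
        # SUMPRODUCT part: same total, but a higher math mark.
        tie_but_better_math = sum(
            1 for ot, om in zip(totals, maths) if ot == t and om > m
        )
        ranks.append(higher_total + 1 + tie_but_better_math)
    return ranks

# Student1, Student2, Student3 from the question.
ranks = merit_rank([220, 180, 180], [80, 88, 90])
```

The result matches the expected merit positions in column D of the question: 1, 3, 2.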
Sometimes a helper column will provide a quick and calculation-efficient solution. Adding the math marks to the total marks as a decimal should produce a number that will rank according to your criteria. In an unused column to the right, use this formula in row 2,
=C2+B2/1000
Fill down as necessary. You can now use a conventional RANK function on this helper column like =RANK(D2, D$2:D$9) for your ranking ordinals.
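A very simple 'math' solution (or, at least, much simpler than the one in the best answer): do a linear combination with weights.
Do something like
weighted_marks = 1000*colC + colB
where the weight on colC must be larger than the highest possible value in colB, so that the math mark can only break ties and never outweigh a difference in the total.
Then rank the weighted marks using a simple RANK function.
This solves your problem and builds the ranking you need.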
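A weighted key only ranks correctly when the weight on the total is larger than any possible math mark. The Python sketch below illustrates this with hypothetical marks (not from the question) in which the second student has the higher total but the lower math mark; the function name is an assumption.

```python
def rank_by_weighted_key(totals, maths, weight):
    """Rank (1 = best) by the single key weight * total + math."""
    keys = [weight * t + m for t, m in zip(totals, maths)]
    return [1 + sum(1 for other in keys if other > k) for k in keys]

# Hypothetical marks: student 2 should rank first on total marks.
totals = [180, 181]
maths = [90, 50]

too_small = rank_by_weighted_key(totals, maths, 10)       # math outweighs total
large_enough = rank_by_weighted_key(totals, maths, 1000)  # math only breaks ties
```

With a weight of 10 the keys are 1890 and 1860, so the student with the lower total wins; with 1000 they are 180090 and 181050, which gives the intended order.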
If you would rather not limit the number of rows or the size of the numbers used in the criteria, Jeeped's approach can be extended. You can use the following formulas in cells D2 to L2, assuming that there are three criteria, the first one in column A, the second one in column B, and the third one in column C:
=RANK($A2,$A:$A,1)
=RANK($B2,$B:$B,1)
=D2*2^27+E2
=RANK(F2,F:F,1)
=RANK($C2,$C:$C,1)
=G2*2^27+H2
=RANK(I2,I:I,1)
=J2*2^27-ROW()
=RANK(K2,K:K,0)
The formulas have to be copied down. The result is in column L. Ties are broken using the row number.
If you would like to add a fourth criterion, you can do the following after having the formulas above in place:
Add the new criterion between columns C and D.
Insert three new columns between columns I and J.
Copy columns G:I to the new columns J:L.
Copy column G to column M, overwriting its content.
Change the formula in column L to point to the new criterion.
The factor 2^27 used in the formulas balances the precision of 53 bits available in double-precision numbers. This is enough to cover the row limit of current versions of Excel.
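The headroom claim can be checked numerically. The Python sketch below assumes the current worksheet limit of 1,048,576 (2^20) rows, so no rank can exceed that, and uses Python floats as a stand-in for Excel's double-precision numbers; the names are made up.

```python
FACTOR = 2 ** 27
MAX_ROWS = 2 ** 20            # 1,048,576 rows, so ranks never exceed this

def composite_key(primary_rank, secondary_rank):
    """Mirror of =D2*2^27+E2, computed in double precision."""
    return float(primary_rank) * FACTOR + float(secondary_rank)

largest = composite_key(MAX_ROWS, MAX_ROWS)
neighbour = composite_key(MAX_ROWS, MAX_ROWS - 1)
```

Even the largest possible key stays below 2^53, so adjacent keys remain exactly 1 apart, and because every secondary rank is smaller than 2^27, two different rank pairs can never produce the same key.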
What I'm trying to do is a simple SUMIF over about 200k lines of data, which causes problems for Excel.
Basically my list looks like this
List of Companies Dummy1 Dummy2
Company A 0 1
Company A 0 1
Company A 1 1
Company B 1 1
Company B 0 1
Company B 0 1
....
and if there is a 1 in any row of column B for a specific company I need to plug a 1 in each row of column C for this company.
So Dummy 2 is basically the sum over Dummy 1 for all entries for a specific company.
The data is already sorted by column A.
Anyway, Excel goes crazy.
Is it just plain stupid what I'm doing here because I'm generating too many comparative operations?
What would be an easy way to accomplish what I'm trying to do here?
According to your sample data, filling C2:C200000 with,
=SUMIF(A:A, A2, B:B)
... will be performing 3× as many SUMIF calculations as is necessary. An IF formula only processes the part that is TRUE or FALSE depending on how the criteria resolves so changing the formula to something like the following,
=IF(A2<>A1, SUMIF(A:A, A2, B:B), C1)
... should drastically reduce the processing in a calculation cycle. The degree of improvement will depend upon how many duplicate company values are in column A and whether column A has been sorted to keep the company names together. The smaller the number of unique companies, the more improvement you will see. In short, unless the company changes from row to row, the SUMIF is not calculated.
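The idea behind the IF wrapper, compute the group total once at the first row of each company and carry it down, can be sketched in Python. This is an illustration of the logic on the question's sample rows, with made-up function and variable names, and it assumes the data is sorted by company as stated in the question.

```python
def group_sums_sorted(companies, flags):
    """Per-row group total for data sorted by company, one sum per group."""
    result = []
    total = 0
    for i, company in enumerate(companies):
        if i == 0 or company != companies[i - 1]:
            # New company: sum its flags once (the SUMIF branch).
            total = sum(f for c, f in zip(companies, flags) if c == company)
        # Otherwise reuse the value from the row above (the C1 branch).
        result.append(total)
    return result

companies = ["Company A"] * 3 + ["Company B"] * 3
dummy1 = [0, 0, 1, 1, 0, 0]
dummy2 = group_sums_sorted(companies, dummy1)
```

The expensive scan runs once per company instead of once per row, which is where the saving comes from.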
Sample Calculation Timing Environment:
Excel 2010 64-bit (14.0.7015.1000) running under Windows 7 Pro on a business class i5 laptop w/8 GB DRAM.
XLSB; Calculation Manual; Recalculate workbook before saving OFF; Save AutoRecovery information OFF
Test 1: 26 companies (Company A to Company Z), each with ~7683 entries in column A, sorted. Column B random 0's and 1's reverted to values. C2:C200000 cleared, worksheet calculated then formula filled in C2:C200000 and new calculation cycle timed to completion.
Formula                                 Calculation cycle (hh:mm:ss)
=SUMIF(A:A, A2, B:B)                    00:21:44
=IF(A2<>A1, SUMIF(A:A, A2, B:B), C1)    00:00:09
Test 2: 5000 companies (Company 0001 to Company 5000), each with ~40 entries in column A, sorted. Column B random 0's and 1's reverted to values. C2:C200000 cleared, worksheet calculated then formula filled in C2:C200000 and new calculation cycle timed to completion.
Formula                                 Calculation cycle (hh:mm:ss)
=SUMIF(A:A, A2, B:B)                    00:22:10
=IF(A2<>A1, SUMIF(A:A, A2, B:B), C1)    00:00:37
You cannot magically break the physical laws of time and space but sometimes you can fool them. This solution may not be perfect but perhaps it is something that you can live with.
On a related note, large(r) worksheets benefit from having their formulas reverted to result values once calculations have been made if those results are not likely to change on a regular basis. While Copy, Paste Special, Values is a reasonably quick method of accomplishing this, selecting a large number of cells containing formulas and running the following sub macro is lightning quick.
Sub sel_2_Value()
    ' Replace the formulas in the current selection with their results.
    Application.EnableEvents = False
    Selection.Value = Selection.Value
    Application.EnableEvents = True
End Sub
If locale differences are not important (currency, dates, etc) then selection = selection.value2 is even better.
The only thing that will slow down the above operation is formulas with dependents within the range being reverted to values as they will be recalculated.
I think the better way to solve this is by using a pivot table: you can sum Dummy1 by company and get the data as a summary.
Here is an example:
http://www.excel-easy.com/data-analysis/pivot-tables.html
I hope this helps.