Calculations for every unique/distinct values - excel

I have a very large data set that I need to work with and in my calculations three columns are of interest. Let's call the different columns [A], [B] and [C]. In [A] I have a list of different company names, where each company name might occur several times. I have created a table for the data and sorted the company names alphabetically. Let's say I have the company name X in A2:A5 and the calculation that needs to be done is SUMPRODUCT(B2:B5;C2:C5)/SUM(C2:C5). How do I calculate this for every unique/distinct company name and present the result in a nice way?

Use the following formula in a new column D for example in D2:
=IF(A2<>A1,SUMPRODUCT(--($A$2:$A$10=A2)*$B$2:$B$10*$C$2:$C$10)/SUMIF($A$2:$A$10,A2,$C$2:$C$10),"")
and drag it down. Whenever the value in A changes, it will write a result.
Change A2:A10, B2:B10 and C2:C10 to correspond to your last row, and keep the $ signs for fixed references.
To match your regional settings, here is the same formula with ";" as the argument separator:
=IF(A2<>A1;SUMPRODUCT(--($A$2:$A$10=A2)*$B$2:$B$10*$C$2:$C$10)/SUMIF($A$2:$A$10;A2;$C$2:$C$10);"")
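For readers who want to check the numbers outside Excel, here is a minimal Python sketch of the same calculation: SUMPRODUCT(B;C)/SUM(C) computed once per distinct name in column A. The function name and the sample rows are made up for illustration; they are not from the question.

```python
from collections import defaultdict

def weighted_average_by_group(rows):
    """rows: iterable of (name, value, weight) tuples, i.e. columns A, B and C."""
    weighted_sum = defaultdict(float)   # SUMPRODUCT(B, C) restricted to each name
    weight_total = defaultdict(float)   # SUM(C) restricted to each name
    for name, value, weight in rows:
        weighted_sum[name] += value * weight
        weight_total[name] += weight
    # One result per distinct name; a group whose weights sum to zero
    # raises ZeroDivisionError, much like #DIV/0! in the worksheet.
    return {name: weighted_sum[name] / weight_total[name] for name in weighted_sum}

# Hypothetical sample data: company name, value (B), weight (C)
rows = [("X", 10, 1), ("X", 20, 3), ("Y", 5, 2), ("Y", 7, 2)]
result = weighted_average_by_group(rows)
print(result)
```

Unlike the formula, this sketch does not need the names to be sorted, because it accumulates totals per name rather than relying on where the value in A changes.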

Put two INDEX functions together with a : for the range.
=IF(A2<>A3, SUMPRODUCT(INDEX(B:B, MATCH(A2, A:A, 0)):INDEX(B:B, MATCH(A2&"z", A:A)), INDEX(C:C, MATCH(A2, A:A, 0)):INDEX(C:C, MATCH(A2&"z", A:A)))/SUMIFS(C:C, A:A, A2), TEXT(,))
The first occurrence is found with,
INDEX(B:B, MATCH(A2, A:A, 0))
The last occurrence (on sorted data) is found with,
INDEX(B:B, MATCH(A2&"z", A:A))
Note that I changed your SUM to a SUMIFS to make life a little easier.

Related

Excel- Sum groups of similar cell contents using wildcards

I am trying to sum the amounts in column B based on the types of symbols in column A. Any symbol with "EW" at the start needs to be grouped and by date month - see column D. The second symbol comes in two formats but also needs to be grouped, so "OES" and "OMSX" needs to be grouped together and with their date month. I know I need wildcards here but I cannot get this to work.
EDIT, correct EW, accidentally had "EWS" before, apologies to anyone who responded
This should work if you can get your data corrected.
=SUMIFS(B:B, A:A, REPLACE(D2, FIND(" ", D2), LEN(D2), "*"), A:A, REPLACE(D2, 1, FIND(" ", D2), "*")&"*")

Excel - VLOOKUP issue

I have this:
My issue:
in column F2 I want R2 -> IF -> B2+C2+D2 exists in O:O;P:P;Q:Q
but I do not know how to use VLOOKUP with multiple columns
my attempt was '=VLOOKUP(B2;O:O;4;FALSE)' and I do not know why I used 4... cuz I count R2 as index 4 from O2...
This is just a multiple column lookup. While two column lookups are more common, three column lookups are not that rare.
=index(r:r, aggregate(15, 6, row($2:$999)/((o$2:o$999=b2)*(p$2:p$999=c2)*(q$2:q$999=d2)), 1))
That will return the value from column R for the first matching set of columns O:Q. In the case of multiple matches, you could return the last match by changing 15 to 14.
Since your returned results are expected to be numeric, a sumifs could also be used.
=sumifs(r:r, o:o, b2, p:p, c2, q:q, d2)
However, this would return skewed results if more than a single match was found.
In your own vlookup, the 4 represents the fourth column of your lookup range. Since you were only providing a single column (e.g. O:O), you would never return the value from column R without changing the lookup range to O:R.
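The difference between the two formulas above (first match versus sum of all matches) can be sketched in Python. This is only an illustration of the logic, assuming each row is an (O, P, Q, R) tuple; the function names and sample table are hypothetical.

```python
def lookup_first(rows, key):
    """Return the R value of the first row whose (O, P, Q) equals key, else None.

    Mirrors the INDEX/AGGREGATE(15, ...) formula: smallest matching row wins.
    """
    for o, p, q, r in rows:
        if (o, p, q) == key:
            return r
    return None

def lookup_sum(rows, key):
    """Return the sum of R over every matching row, as SUMIFS would."""
    return sum(r for o, p, q, r in rows if (o, p, q) == key)

# Hypothetical data with a duplicated key (1, 4, 2015)
table = [(1, 4, 2015, 100), (2, 4, 2015, 250), (1, 4, 2015, 40)]

first = lookup_first(table, (1, 4, 2015))   # first match only
total = lookup_sum(table, (1, 4, 2015))     # both matches added together
print(first, total)
```

With a duplicated key the two approaches disagree, which is the skew described above; with unique keys they return the same value.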
Another approach (since they look like dates DD/MM/YYYY) would be to convert each group of three columns to dates
=INDEX(R:R,MATCH(DATE(D2,C2,B2),INDEX(DATE(Q:Q,P:P,O:O),0),0))
@Jeeped is right to point out that this is slow on full columns, so please use a formula like
=INDEX(R$1:R$100,MATCH(DATE(D2,C2,B2),INDEX(DATE(Q$1:Q$100,P$1:P$100,O$1:O$100),0),0))
and adjust the ranges to include your data.
If O:O;P:P;Q:Q is unique, you can use:
=lookup(1,0/((O:O=B2)*(P:P=C2)*(Q:Q=D2)), R:R)

Extracting unique values from a range, but repeating each value x times before next unique value

I have a bit of a strange scenario. I am attempting to create a formula that, as the title suggests, extracts unique values from a range (without blanks) but repeats each of them a set number of times (let's say 5) before returning the next unique value. For example, this is A1:A6 (it should run top to bottom, but for the sake of space I formatted it like this):
1,2,2,3,3,3
What I want is a formula that I can drag down to produce this in column B:
1,1,1,1,1,2,2,2,2,2,3,3,3,3,3
As reference, the formula I'm currently using to return the unique values to repeat once, without blanks, is this:
=INDEX(A1:A6, MATCH(0, IF(ISBLANK(A1:A6), 1, COUNTIF(B$1:$B1, A1:A6)), 0))
Any suggestions? Many thanks in advance!
Put this in B2 and copy down:
=IFERROR(INDEX(A:A,AGGREGATE(15,6,ROW($A$1:$A$6)/((COUNTIF($B$1:B1,$A$1:$A$6)<5)*($A$1:$A$6<>"")),1)),"")
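As a cross-check of the expected output, the same idea can be sketched in Python: walk the source values in order, skip blanks and values already emitted, and repeat each new value a fixed number of times. The function name and the `times` parameter are illustrative assumptions.

```python
def repeat_unique(values, times=5):
    """Return each distinct non-blank value, in first-seen order, repeated `times` times."""
    seen = set()
    out = []
    for v in values:
        if v is None or v == "" or v in seen:
            continue  # skip blanks and values that were already emitted
        seen.add(v)
        out.extend([v] * times)
    return out

source = [1, 2, 2, 3, 3, 3]   # the A1:A6 example from the question
expanded = repeat_unique(source)
print(expanded)
```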

Multiple IF statements to determine a match and assign value

I am running a small study that basically needs to match suitable people into pairs and assign them a pair number, so each can be assigned into group A or B.
They basically need to be from the same Medical Clinic, same gender, and either below 80, or 80+ years of age.
I'm not sure if this is even possible with excel, but basically I have a sheet with a form that you enter the new participants information. I need a formula that basically checks these 3 criteria against previous entries to find someone who matches on all 3, then assign the same pair number. If it can't find a suitable match, it needs to assign a new pair number.
In the above sample data set, I want I3 to realise that C3, D3 and E3 all match C2, D2 and E2, then put a 1 in I3.
Then for I4, it would assign a 2 as it doesn't match any entries above it. Same for I5. Then I6 would realise the match in I4 and put a 2.
Not sure if this makes sense. Also there can't be more than 2 of each pair #, but I can deal with that after I am able to get the numbers generating.
If you can change column D to an age group (71-80, 81-90, etc.), the following formula would do what you want as a first step (it can still group more than 2 people together). Paste the following formula in I3 and press Ctrl+Shift+Enter, as it is an array formula. Copy it down to the cells below.
=IFERROR(INDEX(F$1:F2,MATCH(C3&D3&E3,C$1:C2&D$1:D2&E$1:E2,0)),MAX(F$2:F2)+1)
This matches a combination of strings in columns C, D and E in the current row to array of string combination in previous rows and assigns the same Pair number, if there is no match it gets the next new number.
Try this modified formula (array formula) to not put more than two entries in a group. I have created another column G which is G2 = C2&D2&E2
=IF(COUNTIF(G$1:G2,G3)<2,IFERROR(INDEX(F$1:F2,MATCH(C3&D3&E3,C$1:C2&D$1:D2&E$1:E2,0)),MAX(F$2:F2)+1),MAX(F$2:F2)+1)
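The pairing rule itself (same clinic, same age group, same gender, at most two people per pair number) can be sketched in Python. This is not a translation of the array formula, just a minimal illustration of the intended result, assuming the age has already been bucketed into a group; the clinic names and labels are made up.

```python
def assign_pairs(participants):
    """participants: list of (clinic, age_group, gender) tuples in entry order.

    Returns a list of pair numbers, one per participant. A participant joins
    the earliest unmatched person with the same key; otherwise a new number
    is started. No pair number is used more than twice.
    """
    waiting = {}        # key -> pair number still waiting for a second member
    next_number = 1
    numbers = []
    for key in participants:
        if key in waiting:
            numbers.append(waiting.pop(key))   # completes an existing pair
        else:
            waiting[key] = next_number         # opens a new pair
            numbers.append(next_number)
            next_number += 1
    return numbers

# Hypothetical entries: (clinic, age group, gender)
people = [
    ("North", "<80", "F"),
    ("North", "<80", "F"),
    ("South", "80+", "M"),
    ("North", "80+", "F"),
    ("South", "80+", "M"),
    ("North", "<80", "F"),
]
pair_numbers = assign_pairs(people)
print(pair_numbers)
```

The last entry matches the first two on all three criteria but gets a new number, because that pair is already full.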
This response expands on your original requirements by returning the actual PT ID numbers of the matched pairs as well as a unique 'paired group' identifier.
The original criteria age brackets (e.g. 70-80, 81+) are used and no matched pair is used more than once.
If you already have a match from further up the data then you will want to return the paired PT ID. A simple INDEX/MATCH function pair can do that. If a match has not already been made then the IFERROR function can pass processing over to a nested INDEX function that uses the AGGREGATE¹ function rather than MATCH to return the appropriate row number.
AGGREGATE is used with its SMALL sub-function. This allows the COUNTIFS function to increment to the second, third, etc. pairing by examining the matches³ made previously.
With expanded sample data the formula in I2:K2 are,
'formula for I2
=IFERROR(IFERROR(INDEX(B$1:B1, MATCH(B2, I$1:I1, 0)),
INDEX(B:B, AGGREGATE(15, 6, ROW(B$1:INDEX(B:B, MATCH(1E+99, B:B)))/
((B:B<>B2)*(C:C=C2)*(E:E=E2)*IF(D2>80, D:D>80, (D:D>=70)*(D:D<=80))),
COUNTIFS(C$1:C1, C2, E$1:E1, E2, D$1:D1, IF(D2>80, ">80", ">=70"), D$1:D1, IF(D2>80, ">80", "<=80"))+1))),
"NO MATCH")
'formula for J2
=IFERROR(INDEX(J$1:J1, MATCH(I2, B$1:B1, 0)), MAX(J$1:J1)+1)
'formula for K2 (volatile and random - see footnote ⁴)
=IFERROR(IF(INDEX(K$1:K1, MATCH(J2, J$1:J1, 0))="A", "B", "A"), IF(ISNUMBER(I2), CHAR(RANDBETWEEN(65, 66)), ""))
Fill down as necessary.
 
¹ The AGGREGATE function was introduced with Excel 2010. It is not available in earlier versions.
² While AGGREGATE-based formulas are entered as conventional formulas (i.e. without CSE), AGGREGATE does apply cyclic array-style processing. Try to reduce your full-column references to ranges that more closely represent the extents of your actual data. Array processing gets expensive quickly as the referenced ranges grow, so it is good practice to narrow them to a minimum. See Guidelines and examples of array formulas for more information.
³ Your original narrative stated '... either between 70 and 80 or above 80 years of age.' while your subsequent comment stated 'Under 80 or 80+'. These are not the same thing. I've used the former original description since you never edited the question for clarification.
⁴ The formula destined for K2 is volatile and uses the RANDBETWEEN function. Once you are happy with the results, use Copy, Paste Special, Values to revert the formula to its underlying values. Leaving the formula intact with RANDBETWEEN means the values could change with any change throughout the workbook. Only the initial A/B value for each matched pair is random; the match in the pair will always be the A or B counterpart.
First, you can create a new column (J) that concatenates the 3 values you are comparing. You can do this with a formula like this:
=C2&IF(D2>80,"YES","NO")&E2
Then you can fill I2 with a formula that checks whether the concatenated value repeats in the data set, with a formula like this:
=COUNTIFS($J$2:$J$6,J2)>1

SUMIFS within a date range

I am working on an Excel file for my monthly budget. I am exporting my monthly transactions into a CSV file and then copying it over.
I have a tab for each month and all the categories that I am budgeting for. I then copy over the CSV file to a tab called transactions in my budget workbook. Then I have a drop-down list with all the categories from my monthly categories. Once I have categorized all my transactions, they will total up on the corresponding budget sheet.
The issue I am having a hard time with is how to create a specific equation that will recognize the month and then the specific category item. For example, for "His - Income" I can easily use a SUMIF to get that information from a list, but how do I now separate it further for April only?
The data is organized on a tab by Date in column "A", description in column "B", and amount in "D". I am looking for an equation that will find "His - income" for 4/1/2015 to 4/30/2015.
You want to use a SUMIFS
SUMIFS(sum_range, criteria_range1, criteria1, [criteria_range2,criteria2])
With one IF for the category, and two more for the date range. It will look something like this:
=SUMIFS(D1:D50, B1:B50,"=His - income",A1:A50,">=4/1/2015",A1:A50,"<=4/30/2015")
I would change all the hard-coded ranges (e.g. D1:D50) to named ranges, as well as the dates. If you put the dates in a cell, that part of the formula will look something like this: BudgetDates,"<="&F$1 (notice the ampersand)
The SUMIFS function can use operators other than equals; equals is just the default. With 01-Apr-2015 in J1 (sometimes 04/01/2015 formatted as mmmm) and His - Income in K1, you could use one of the following.
=sumifs(D:D, B:B, K1, A:A, ">="&J1, A:A, "<"&edate(J1, 1))
=sumifs(D:D, B:B, K1, A:A, ">="&date(2015, 4, 1), A:A, "<"&date(2015, 5, 1))
=sumifs(D:D, B:B, "His - Income", A:A, ">=4/1/2015", A:A, "<5/1/2015")
Unless you want to use it as a visual reminder, the = for an exact match is unnecessary. Typically, the upper limit of the date range is written as less than the day after the last date, in case the dates contain times as well. The EDATE function raises or lowers a date by the number of months given by the integer in its months parameter.
For all intents and purposes, there is no adverse effect when using full column references with SUMIFS.
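To illustrate why the upper bound is written as "less than the first of the next month", here is a small Python sketch of the same category-plus-date-range sum. It assumes each transaction is a (datetime, description, amount) tuple; the function name and sample rows are hypothetical. The description comparison ignores case, as SUMIFS text criteria do.

```python
from datetime import datetime

def sum_category_in_month(transactions, category, year, month):
    """Sum amounts for `category` with first-of-month <= date < first of next month."""
    lower = datetime(year, month, 1)
    # First day of the following month; December rolls over to January.
    upper = datetime(year + month // 12, month % 12 + 1, 1)
    return sum(
        amount
        for when, description, amount in transactions
        if description.casefold() == category.casefold() and lower <= when < upper
    )

# Hypothetical transactions; note the 30 April entry carries a time of day.
transactions = [
    (datetime(2015, 3, 31, 9, 0), "His - Income", 1000.0),
    (datetime(2015, 4, 1), "His - Income", 1200.0),
    (datetime(2015, 4, 30, 15, 30), "His - Income", 300.0),
    (datetime(2015, 4, 15), "Groceries", 85.5),
    (datetime(2015, 5, 1), "His - Income", 1200.0),
]
april_income = sum_category_in_month(transactions, "His - income", 2015, 4)
print(april_income)
```

A bound of "<= 30 April" would compare against midnight at the start of that day and drop the 15:30 entry, whereas "< 1 May" keeps it.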

Resources