find the serial number with criteria - excel

I have an Excel database, and they are asking me to find the 8-digit pserial value for the oldest current smoker in the survey. The person is a smoker if cigst is 4.
The pserial numbers are in column 3 and smokers are in column N.Age is column
this is the database, but there is more data inside the database:
I think the functions to use are INDEX, MAX and IF, but not sure on what order to use.

The following should work, assuming the input data are in the range of 11-100 rows. You can adapt it to your actual range. Considering also no excel version constraint based on tags listed in the question .
=LET(f, FILTER(HSTACK(C11:C100, H11:H100), N11:N100=4), fa, INDEX(f,,1),
fb, INDEX(f,,2), TEXTJOIN(",",,FILTER(fa, fb=MAX(fb))))
You filter first a subset of pserial and age columns based on cigst column values equal to 4, then just find the maximum. Preventing more than one pserial value correspond to the maximum age, we use TEXTJOIN to concatenate the result. If it is not expected for your real case, then you can remove this last step.

If there are several people with the same age, it is easiest to use the filter function:
=FILTER($C$11:$C$39;($H$11:$H$39=MAXIFS($H$11:$H$39;$N$11:$N$39;4))*($N$11:$N$39=4))
Maybe you have replace semicolon with comma, depending of your Excel version.

Related

MAXIFS function that increments to return 2nd,3rd largest numbers

I have a column called Paid which consists of currency figures which is in column A.
I also have a column called ID to uniquely define each of the rows which is in column B.
I am trying to make a top ten list of the highest results from this column.
For the first result, I use a simple MAX function.
For the following, I use
=MAXIFS(Combined!$A$1:$A$2916,$A$1:$A$2916,"<"&$C1)
where C1 is the first result in the top 10 list.
THE ISSUE: Currently, if two results are equal in Paid, one will get skipped and not be on the list.
I've tried using something in the nature of
=MAXIFS($A$1:$A$2916,$B$2:$B$2916,"<>"&$D1)
where D1 is the ID of the 1st in the top 10 list.
My issue here is that I want to use a condition to say that none of the ID's in the B column appear in the above entries in the D column(IDs of the top 10) and I don't know how to do this without brute forcing and writing each cell into each condition.
Is there a smoother way to write a "not in" check in a maxif function?
Thanks
For reference, an example of what I want would look like:
Example Data
Maybe try:
Formula in C2:
=INDEX(SORT(A2:B11,1,-1),SEQUENCE(10),{1,2})
Where 10 is your top N figure.
Note: The above works fine in retrieving only 10 figures, yet if you insist you want top have ID '5' above '3' in rank 2, and ID '9' above '1' for rank 8, then use:
=INDEX(SORTBY(A2:B11,A2:A11,-1;B2:B11,-1),SEQUENCE(10),{1,2})
I could see this being relevant if you have two identical payments "battle" for e.g. 1st or 10th place with one of multiple id's comming short.

Deal with Ties when Using Index/Match

I'm currently pulling the top (5) number of numerical values from one sheet and inputting them into a different sheet. Each number is within its own column and there is a name matching that column, EX:
And so, having a tie is common with the data that I'm working with, so it nearly deprecates my formulas.
For getting the name:
=INDEX('Total Cases by Categories'!$B$18:$B$50, MATCH(LARGE('Total Cases by Categories'!$H$18:$H$50, A39),'Total Cases by Categories'!$H$18:$H$50, 0))
For getting the numerical value associated with the name:
=LARGE('Total Cases by Categories'!$H$18:$H, A39)
And so, when there are 2 people with the same numerical value associated within a category, then that person appears twice, I assume because of their position within the sheet.
So something like this happens:
So in the event of a tie, I would want to list both names that have the same amount of points instead of the first name that shows up with the duplicated value.
Any help would be appreciated!
Actually, LARGE will give you both of tied names. It's MATCH that can't look beyond the first. To the best of my knowledge there is no way around that (the difficult one being not to use MATCH). Therefore the solution is to have no ties.
This is achieved with helper columns that contain no identical numbers. This can be achieved by adding an insignificant decimal. Since you are dealing with integers, adding 0.1 would be insignificant for your purposes but 13.1 is different from 13.2. If you need to extract the "real" number from this use INT(13.2).
Using the row number to generate an insignificant decimal is popular for this purpose. In row 1 ROW()/10 will return 0.1. But in row 10 ROW()/10 will return 1.0 which isn't an insignificant number anymore. Therefore you have to work with ROW()/100 or an even larger divisor, depending upon how many rows you have. Try ROW()/10^6 - any decimal will do the tie-breaking job.
You may not like that using ROW() will list tied participants in the order in which they appear in the worksheet. The differentiating decimals can be created by any other means that doesn't create ties in itself.
Normally, the helper columns with the decimals added will be hidden. They contain a formula like =D23 + (ROW()/10000) which manages itself. You can then use that column for the MATCH function to list all participants in the order of LARGE using the helper column or the original. Just make sure that MATCH refers to the helper column.

Excel multilevel array formula with partial string matches to sum resultant cells

I've been trying to sort this for over a day now without much luck. I have successfully used SUMIFS, INDEX, MATCH, COUNTIF, "--" etc array functions previously and am not a novice, but also not an expert on these. I can't seem to weave these together correctly, and likely on an altogether incorrect path.
Basically, I am trying to aggregate data from multiple spreadsheets, requiring a mapping of various items (rows) into a canonical form for summing.
The image here shows a representative, but simplified version of my quest. Each "region" on this example spreadsheet (Final..., Mapping, DataSet1, DataSet2) is actually in different spreadsheets, and there are several sheets with 50-150 rows in each xlsx.
Note that the names in Column B are quite arbitrary (meaning not all P1's have an 'x' pattern, like shown here as x1, x2, etc. Do not rely on any pattern in the names, except the x, y , z in the Mapping table are substrings (case insensitive, trailing match) of the names in Column B in the DataSets.
And in the image, the Final Result Table (summed manually) is what I want to compute via(an array) formula: A single formula would be ideal (given I have many spreadsheets from which the monthly data is being pulled from, so I can't readily modify but can create an interim spreadsheet if required, so open to helper columns or helper rows).
Here's the process - For each name (B3-B5) in the Final Result Table, I want to sum the name from it's components as follows:
Lookup all the matches in the Mapping Table (so for P1, the formula =IF($C$10:$C$15=$B3, $B$10:$B$15,"") gives {"x1";"";"";"x2";"";"x3"}.
I then want to search each of x1, x2, and x3 in B19:B26 to get rows 21, 22, 24, 25, 26 in DataSet1 and B31:B35 to get row 32 in DataSet2, to then add up the Jan totals into C3. (Effectively,
C3=C21+C22+C24+C25+C26+C32). Same for P2 and P3, and thru Feb, Mar, ...
I am stuck on how to remove blank or 0 or Div0 or such "error rows" from the interim result in 2, and also need to use 2 arrays of different sizes (3 valid rows in example 2 above, ignoring blanks) to search many rows in DataSets. I tried SEARCH("*"&IF($C$10:$C$15=$B3, $B$10:$B$15,""), $B$19:$B$26) but get unexpected results. I have tried to replace text in the interim result {"x1";"";"";"x2";"";"x3"} with TRUE/FALSE, and 1/0, etc. to help with INDEX or MATCH, but am stymied by errors in downstream ("surrounding") formulas.
Thanks in advance.
Here is a solution without resorting to nasty (imo) CSE formulas.
= SUMPRODUCT($C$19:$F$26*(COUNTIFS($B$10:$B$15, RIGHT($B$19:$B$26,2),$C$10:$C$15,$B3)>0)*($C$18:$F$18=C$2))
+
SUMPRODUCT($C$31:$F$35*(COUNTIFS($B$10:$B$15, RIGHT($B$31:$B$35,2),$C$10:$C$15,$B3)>0)*($C$30:$F$30=C$2))
There is one SUMPRODUCT for each data set. If possible, it would be better to put all your data sets into a single table with a column identify which data set it is a part of.
The way it works is to takes each values in your data set and multiplies it by whether the 2 right most character appear in your mapping table for that P code, multiplied by whether the value is in the correct month. So it returns 0 if either of those conditions are false. Then returns the sum.
UPDATE IN RESPONSE TO OP COMMENTS
If, the X,Y, Z codes are not always 2 digits but the first part is ALWAYS 8 digits, you can easily amend the:
RIGHT($B$19:$B$26,2)
to be:
RIGHT($B$19:$B$26,LEN($B$19:$B$26)-8)
Making the formula for the first data set:
=SUMPRODUCT($C$19:$F$26*(COUNTIFS($B$10:$B$15, RIGHT($B$19:$B$26,LEN($B$19:$B$26)-8),$C$10:$C$15,$B3)>0)*($C$18:$F$18=C$2))
And you can amend for other data sets and simply add them together.
Nice challenge! Are you willing to drop all your tables (DataSet1, DataSet2...) into one spreadsheet, so that we can refer just one single range for each month?
Here's one solution (hopefully a good starting point) - array formula (Ctrl+Shift+Enter):
=SUMPRODUCT(IFERROR(IF(TRANSPOSE(IF($B3=$C$10:$C$15,$B$10:$B$15,""))=RIGHT($B$18:$B$36,2),C$18:C$36,0),0))

SUMIFS with intermediate VLOOKUP in the criteria

I have 3 tables, 1 of which I want to fill in columns with data based on the other 2. Tables are roughly structured as follows:
Table 1 (Semi-Static Data)
SubGroup Group
----------- -----------
subgroup(1) group(a)
subgroup(2) group(b)
subgroup(3) group(b)
subgroup(4) group(c)
etc.
Table 2 (Variable Data)
SubGroup DataValue
----------- -----------
subgroup(1) datavalue(i)
subgroup(2) datavalue(ii)
subgroup(3) datavalue(iii)
subgroup(4) datavalue(iv)
etc.
Table 3 (Results)
Group TotalValue
----------- -----------
group(a) totalvalue(m)
group(b) totalvalue(n)
group(c) totalvalue(o)
etc.
Where the TotalValue is the sum of all DataValue's for all subgroups that belong to that particular Group.
e.g. for group(b) ---> totalvalue(n) = datavalue(ii) + datavalue(iii)
I am looking to achieve this calculation without adding any additional columns to the Data tables nor using VBA.
Basically I need to perform a COUNTIFS where there is an additional VLOOKUP matching the subgroup criteria range to the group it belongs to, and then only summing for datavalue's that match the group being evaluated. I have tried using array formulas but I'm having issues making it work. Any assistance would be very appreciated. Thank you,
EDIT: Wanted to add some details surrounding my question. First all Google searches did not provide a suitable answer. All the links had solutions to a slightly different problem were the VLOOKUP term is not dependent on the SUMIFS criteria but rather another single static variable. Stack Overflow offered similar solutions. Please let me know if anymore details are required to make my post suitable for this forum. Thank you again.
You can use the SUMPRODUCT function to do it all at once. The first reference $B$2:$B$5 is for the Group names, the second reference $E$2:$E$5 is for the datavalues. The G2 reference is for the group names in the third table, you can enter this formula for the first reference and then drag and fill for the rest.
=SUMPRODUCT($E$2:$E$5 * (G2 = $B$2:$B$5))
Some cell references, and sample data, would be helpful but something like this might be what you want:
=SUMIF(C:C,"="&INDEX(A:A,MATCH(E5,B:B,0)),D:D)
WADR & IMHO, this is simply bad worksheet design. For lack of a single cross-reference column in Table2, any solution would have to be a VBA User Defined Formula or an overly complicated array formula (the latter of which I am not even sure is possible). The data tables are not normalized database tables you can INNER JOIN or GROUP BY ... HAVING.
The formula you are trying to achieve is akin to,
=SUMPRODUCT(SUMIF(D:D, {"subgroup(2)","subgroup(3)"}, E:E))
That only works with hard-coded values as arrayed constants (e.g. {"subgroup(2)","subgroup(3)"}). I know of no way to spit a dynamic list back into the formula using additional native Excel functions but VBA offers some possibilities.
HOWEVER,
The simple addition of one more column to Table2 with a very basic VLOOKUP reduces all of your problems to a SUMIF.
     
The formula in the new column D, row 2 is,
=VLOOKUP(E2, A:B, 2, FALSE)
The formula in I2 is,
=SUMIF(D:D, H2,F:F )
Fill each down as necessary. Sorry if that is not what you wanted to hear.
Thank you everyone that responded and reviewed this post. I have managed to resolve this using an array formula and some matrix algebra. Please note that I am not using VLOOKUP (this operator cannot be performed on arrays) nor SUMIFS as my title states.
My final formula looks like this:
{=SUM(IF([Table2.xlsx]Sheet1!SubGroup=TRANSPOSE(IF([Table1.xlsx]Sheet1!Group=G2,[Table1.xlsx]Sheet1!SubGroup,"")),[Table2.xlsx]Sheet1!DataValue))}
Very simply, I create an array variable that compares the Group being evaluated (e.g. cell G2) with the Groups column for Table 1 and outputs the corresponding matching SubGroups. This results in an array with as many rows as Table 1 had (N) and 1 column: Nx1. I then transpose that array (1xN) and compare it to the SubGroups column (Mx1, M being the number of rows in Table 2) and output the DataValues column for the rows that have a corresponding SubGroup (MxN). Then I perform a sum of the whole array to return a single value.
Notice that as I didn't include a value_if_false output return on either IF operators, it will just populate with FALSE in the arrays were the conditions are not met. This does not matter though for the final result. In the first IF, FALSE will not match the SubGroups so will be ignored. For the second all values FALSE passed to SUM will be calculated as 0. The more complicated question is that it grows the amount of memory required to process as we are not filtering to just have the values we want.
For this application I decided against filtering the subarray as the trade-off in resource utilization was acceptable. If the data sets were any bigger though, I would definitely try doing it. Another concern was that I did not understand fully the filtering logic that I was using based on http://exceltactics.com/make-filtered-list-sub-arrays-excel-using-small/ so decided to simplify. Will revisit this concept latter as I think it will work. I might have completed this solution but was missing transposing the array to compare properly so abandoned this route.

nested excel functions with conditional logic

Just getting started in Excel and I was working with a database extract where I need to count values only if items in another column are unique.
So- below is my starting point:
=SUMPRODUCT(COUNTIF(C3:C94735,{"Sharable Content Object Reference Model 1.2","Authored SCORM/AICC content","Authored External Web Content"}))
what i'd like to figure out is the syntax to do something like this-
=sumproduct (Countif range1 criteria..., where range2 criteria="is unique value")
Am I getting this right? The syntax is a bit confusing, and I'm not sure I've chosen the right functions for the task.
I just had to solve this same problem a week ago.
This method works even when you can't always sort on the grouping column (J in your case). If you can keep the data sorted, #MikeD 's solution will scale better.
Firstly, do you know the FREQUENCY trick for counting unique numbers? FREQUENCY is designed to create histograms. It takes two arrays, 'data' and 'bins'. It sorts 'bins', then creates an output array that's one longer than 'bins'. Then it takes each value in 'data' and determines which bin it belongs in, incrementing the output array accordingly. It returns the array. Here's the important part: If a value appears in 'bins' more than once, any 'data' value meant for that bin goes in the first occurrence. The trick is to use the same array for both 'data' and 'bins'. Think it through, and you'll see that there's one non-zero value in the output for each unique number in the input. Note that it only counts numbers.
In short, I use this:
=SUM(SIGN(FREQUENCY(<array>,<array>)))
to count unique numeric values in <array>
From this, we just need to construct arrays containing numbers where appropriate and text elsewhere.
In the example below, I'm counting unique days when the color is red and the fruit is citrus:
This is my conditional array, returning 1 or true for the rows I'm interested in:
($A$2:$A$10="red")*ISNUMBER(MATCH($B$2:$B$10,{"orange","grapefruit","lemon","lime"},0))
Note that this requires ctrl-shift-enter to be used as an array formula.
Since the value I'm grouping by for uniqueness is text (as is yours), I need to convert it to numeric. I use:
MATCH($C$2:$C$10,$C$2:$C$10,0)
Note that this also requires ctrl-shift-enter
So, this is the array of numeric values within which I'm looking for uniqueness:
IF(($A$2:$A$10="red")*ISNUMBER(MATCH($B$2:$B$10,{"orange","grapefruit","lemon","lime"},0)),MATCH($C$2:$C$10,$C$2:$C$10,0),"")
Now I plug that into my uniqueness counter:
=SUM(SIGN(FREQUENCY(<array>,<array>)))
to get:
=SUM(SIGN(FREQUENCY(
IF(($A$2:$A$10="red")*ISNUMBER(MATCH($B$2:$B$10,{"orange","grapefruit","lemon","lime"},0)),MATCH($C$2:$C$10,$C$2:$C$10,0),""),
IF(($A$2:$A$10="red")*ISNUMBER(MATCH($B$2:$B$10,{"orange","grapefruit","lemon","lime"},0)),MATCH($C$2:$C$10,$C$2:$C$10,0),"")
)))
Again, this must be entered as an array formula using ctrl-shift-enter. Replacing SUM with SUMPRODUCT will not cut it.
In your example, you'd use something like:
=SUM(SIGN(FREQUENCY(
IF(ISNUMBER(MATCH($C$3:$C$94735,{"Sharable Content Object Reference Model 1.2","Authored SCORM/AICC content","Authored External Web Content"},0)),MATCH($J$3:$J$94735,$J$3:$J$94735,0),""),
IF(ISNUMBER(MATCH($C$3:$C$94735,{"Sharable Content Object Reference Model 1.2","Authored SCORM/AICC content","Authored External Web Content"},0)),MATCH($J$3:$J$94735,$J$3:$J$94735,0),"")
)))
I'll note, though, that scaling might be a problem on data sets as large as yours. I tested it on larger data sets, and it was fairly fast on the order of 10k rows, but really slow on the order of 100k rows, such as yours. The internal arrays are plenty fast, but the FREQUENCY function slows down. I'm not sure, but I'd guess it's between O(n log n) and O(n^2) depending on how the sort is implemented.
Maybe this doesn't matter - none of this is volatile, so it'll just need to calculate once upon refreshing the data. If the column data is changing, though, this could be painful.
Asuming the source data is sorted by the key value [A], start with determining the occurence of the key column
B2: =IF(A2=A1;B1+1;1)
Next determine a group sum
C2: =SUMIF($A$2:$A$9;A2;$B$2:$B$9)
A key is unique if its group sum is exactly 1
D2: =(C2=1)
To count records which match a certain criterium AND are unique, include column D in a =IF(AND(D2, [yourcondition];1;0) and sum this column
Another option is to asume a key unique within a sorted list if it is unequal to both its predecessor and successor, so you could find the unique records like
E2: =AND(A2<>A1;A2<>A3)
G2: =IF(AND(E2;F2="this");1;0)
E and G can of course be combined into one single formula (not sure though if that helps ...)
G2(2): =IF(AND(AND(A2<>A1;A2<>A3);F2="this");1;0)
resolving unnecessarily nested AND's:
G2(3): =IF(AND(A2<>A1;A2<>A3;F2="this");1;0)
all formulas in row 2 should be copied down to the end of the list

Resources