The top table is the input, and the bottom table is a preview of the required output.
For each ID I need to find the earliest datetime. I also need the information from the other columns (please see image below).
My current solution is:
In Cell E2 =A2
Cell E3 drag down =IF(E2<>A3,IF(E1=A3,"",A3),"")
In Cell F2 drag down =IF(E2<>"",MIN(IF($A$2:$A$14=E2,$C$2:$C$14)),"") Ctrl+Shift+Enter
One more option without any intermediate calculations:
Select the whole range starting E2 and to the last row where IDs are located - for the sample given it's row 14, so select range E2:E14: =IFERROR(INDEX($A$2:$A$14,SMALL(IF(MATCH($A$2:$A$14,$A$2:$A$14,0)=ROW(INDIRECT("1:"&ROWS($A$2:$A$14))),MATCH($A$2:$A$14,$A$2:$A$14,0),""),ROW(INDIRECT("1:"&ROWS($A$2:$A$14))))),"") and press CTRL+SHIFT+ENTER instead of usual ENTER - this will define a Multicell ARRAY formula and will result in curly {} brackets around it (but do NOT type them manually!).
F2 (ID2): =IF(E2="","",SUMPRODUCT(--(E2=$A$2:$A$14),--(G2=$C$2:$C$14),$B$2:$B$14)) - normal formula.
G2 (Min Date): =IF(E2="","",MIN(IF(E2=$A$2:$A$14,$C$2:$C$14,2^100))) and press CTRL+SHIFT+ENTER instead of usual ENTER - this will define an ARRAY formula and will result in curly {} brackets around it (but do NOT type them manually!).
H2 (InCh): =IF(E2="","",INDEX($D$2:$D$14,SUMPRODUCT(--(E2=$A$2:$A$14),--(F2=$B$2:$B$14),--(G2=$C$2:$C$14),ROW(INDIRECT("1:"&ROWS($D$2:$D$14)))))) - normal formula.
Remarks:
To make the solution more compact and easier to read, define a named range for the ID column, and then reference the other data columns using OFFSET.
ID2 values may not be unique, as in the sample for IDs 1...3.
The resulting Min Date values should be formatted the same way as the source Date column.
The key formula of the solution is the multi-cell monster that returns unique IDs without empty rows, as the OP requested.
Sample file: https://www.dropbox.com/s/d2098updfh8djnf/MinDateIDs.xlsx
This is quite a challenge... I think I have found an approach that works. For the sake of clarity, I used a few helper columns. Also, I did not use any named ranges but stuck with the column-row indications. You might want to change that.
It looks like this:
and zooming in to the relevant columns:
Column F contains an array formula to filter out duplicates. An approach is explained here. The formula I used in F2 is
=INDEX($A$2:$A$14, MATCH(MIN(IF(COUNTIF($F$1:F1,$A$2:$A$14)=0, 1, MAX((COUNTIF($A$2:$A$14, "<"&$A$2:$A$14)+1)*2))*(COUNTIF($A$2:$A$14, "<"&$A$2:$A$14)+1)), COUNTIF($A$2:$A$14, "<"&$A$2:$A$14)+1, 0))
Use Ctrl-Shift-Enter to confirm as array formula. Drag this down or copy into column F. Then columns G and H contain the starting and ending indices of the duplicate ID values. This answer helped, please upvote it :-). The two formulas used are:
=MATCH(2,1/FREQUENCY($F2,$A$2:$A$14))
in G2, and
=FREQUENCY($A$2:$A$14,$F2)
in H2. Again, drag them down to get the full column filled. Next, column I is for clarification only -- and for sanity checking. It contains the desired minimum date from each sub-array. Column J substitutes that formula into a MATCH to find the actual index of the desired date.
=MIN(OFFSET($C$2:$C$14,$G2-1,0,1+$H2-$G2,1))
in I2 and
=$G2-1+MATCH(2,1/FREQUENCY(MIN(OFFSET($C$2:$C$14,$G2-1,0,1+$H2-$G2,1)), OFFSET($C$2:$C$14,$G2-1,0,1+$H2-$G2,1)))
in J2. Finally, columns L, M and N index into the original set of data via
=INDEX(B$2:B$14,$J2)
in L2, which you can drag horizontally and then vertically.
When you are done, you can hide the helper columns, or fold everything into big formulas. Good luck with that... There might be an easier way to achieve this, but I did not find it.
If you want the value from column D in G then assuming that column C values are unique you could just use a VLOOKUP, i.e. in G2 copied down
=VLOOKUP(F2,C$2:D$14,2,0)
Per your picture, they're all in the same sheet. Just sort by ID, then Date (ascending). As you work your way down the ID column, each time the ID changes, you know you've found the row with the minimum Date for that specific ID. Create an extra column to signify where ID changes occur, and filter for those rows (hide the column if you so desire).
And... voila.
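If you want to sanity-check the result outside Excel, here is a minimal Python sketch of the same sort-then-take-the-first-row idea. The sample rows and the name earliest_per_id are made up for illustration (they are not the data from the question), and the column order (ID, ID2, datetime, InCh) is an assumption.

```python
from datetime import datetime

# Hypothetical sample rows: (ID, ID2, datetime, InCh). Not the data from the question.
rows = [
    (1, 10, datetime(2014, 1, 3, 9, 0), "a"),
    (1, 11, datetime(2014, 1, 1, 8, 30), "b"),
    (2, 20, datetime(2014, 2, 5, 12, 0), "c"),
    (2, 21, datetime(2014, 2, 5, 7, 15), "d"),
    (3, 30, datetime(2014, 3, 1, 0, 0), "e"),
]


def earliest_per_id(rows):
    """Sort by ID, then by date, and keep the first row of each ID group."""
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    result = []
    previous_id = object()  # sentinel that compares unequal to every real ID
    for row in ordered:
        if row[0] != previous_id:  # ID changed, so this row holds the minimum date
            result.append(row)
            previous_id = row[0]
    return result


earliest = earliest_per_id(rows)
for row in earliest:
    print(row)
```

The "ID changed" test in the loop plays the role of the extra marker column that you would filter on in the sheet.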
I know this thread is old, but there is a much shorter and easier way!
How about using a pivot table with Min as the field setting, and then using =GETPIVOTDATA() to get the information back?
That seems a lot simpler than these formulas!
Actually, I just realized I've been overthinking this...Excel keeps the top item and removes all that follow when removing duplicates.
So if you are going to create an extra working table anyway, why not just copy the range/columns you want to keep, then use the basic sort.
Sort first by ID, then by the column you want as the second filter. Be sure the sorts are in the order you want (e.g. newest to oldest, oldest to newest, A to Z, Largest to smallest, etc).
Once the data is sorted, remove duplicates based on ID. You are left with all of your columns of data, filtered by newest/oldest/largest/smallest per individual.
This worked for my table with 30,000+ records, filtered down to 1500 unique individuals with most recent (plus associated amount), and with a second filter, the largest (plus associated date) for each person.
Related
The first table below shows how much each person owes and who pays it (it's part of a larger model so I simplified it for our purposes here).
Our goal in the second table below is to give a sum when both the column and row value match.
For example: A (column C) paid $244.17 (D36:H48) in expenses for B (row 35).
Where am I wrong here? I have tried different methods suggested here.
This is another alternative, that only requires to extend the formula down, but not to the left, because on each row it returns an array with all column values. In cell I3 enter the following formula:
=MMULT(N(TRANSPOSE($A$3:$A$15=H3)),IF($B$3:$F$15="", 0, $B$3:$F$15))
or using LET to avoid repetition of the same range:
=LET(set, $B$3:$F$15, MMULT(N(TRANSPOSE($A$3:$A$15=H3)),IF(set="", 0, set)))
Notes:
MMULT only works with numeric values, so empty cells need to be converted.
You can replace TRANSPOSE with TOROW if you want.
$-notation is not required in H3, because we extend the formula only down
Here is the output:
Note: This solution assumes the header values being compared are the same, i.e. the same values for Paid For (I2:M2) and Paid By (H3:H7), which is the most common situation. That is why the formula uses only the Paid By column. If that is not the case, the solution provided by @JB-007 is more flexible, because the values can be different, but then you need to extend the formula in both directions.
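To make it clearer what the MMULT formula computes, here is a rough Python equivalent: a 0/1 row vector (which rows belong to the payer) multiplied by the amounts matrix, with blanks turned into zeros first. The function name paid_by_totals and the small data set are hypothetical, not taken from the question's sheet.

```python
def paid_by_totals(payers, amounts, who):
    """Column sums of `amounts` over the rows whose payer equals `who`."""
    # Equivalent of N(TRANSPOSE(range = who)): a 0/1 row vector.
    mask = [1 if p == who else 0 for p in payers]
    # Equivalent of IF(set = "", 0, set): blanks become zeros.
    cleaned = [[0 if v == "" else v for v in row] for row in amounts]
    n_cols = len(cleaned[0])
    # Row vector times matrix, which is what MMULT does here.
    return [
        sum(mask[i] * cleaned[i][j] for i in range(len(cleaned)))
        for j in range(n_cols)
    ]


# Hypothetical data: one payer per row, one "paid for" person per column.
payers = ["A", "B", "A", "C"]
amounts = [
    [10, "", 5],
    [1, 2, 3],
    ["", 4, 6],
    [7, 8, ""],
]
totals_a = paid_by_totals(payers, amounts, "A")
print(totals_a)
```

Each call returns a whole row of results at once, which is why the Excel formula only has to be extended down and not across.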
See the screenshot(s) here:
=SUM($C$4:$E$6*($C$3:$E$3=C$8)*($B$4:$B$6=$B9))
(SUMIFS will really struggle to work across different dimensions.)
PS: as you'll see, most will advise SUMPRODUCT. I think it is overdue for deprecation, because there is very little (if anything) you can do with SUMPRODUCT that you cannot do with SUM. You can even do counts with SUM: =SUM(1*($C$3:$E$3=C$8)*($B$4:$B$6=$B9)) returns the count of where these values match.
Save yourself the extra seven letters over and over! ☺
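For anyone who finds the mask multiplication hard to read, this is roughly what the SUM formula does, written as a small Python sketch. The names cross_sum and cross_count and the 3x3 sample grid are assumptions for illustration only.

```python
def cross_sum(col_headers, row_headers, values, col_key, row_key):
    """Sum values[i][j] where the row header and the column header both match."""
    return sum(
        values[i][j]
        for i, r in enumerate(row_headers)
        for j, c in enumerate(col_headers)
        if r == row_key and c == col_key
    )


def cross_count(col_headers, row_headers, col_key, row_key):
    """Count the cells where both headers match (the SUM(1*(...)*(...)) variant)."""
    return sum(
        1
        for r in row_headers
        for c in col_headers
        if r == row_key and c == col_key
    )


# Hypothetical grid with repeated headers in both directions.
col_headers = ["A", "B", "A"]
row_headers = ["x", "y", "x"]
values = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
total = cross_sum(col_headers, row_headers, values, "A", "x")
count = cross_count(col_headers, row_headers, "A", "x")
print(total, count)
```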
I want to calculate the sumproduct as pictured in the table in the "Revenue" column. However, the dataset is fairly big, I'm limited to excel and the standard functions.
It should find all variables with the keyword "weightl" and "sell" in it and multiply and sum them accordingly per row. In Row 3 for example:
2*3+3*6+2*3 = 30
I thought of using a kind of a dictionary to alter the search terms and go through each column. But I have no clue on how to put it all together.
I used this
{=+isnumber(search("weightl";F2:N2))+isnumber(search("sell";F2:N2))}
to create the 1/0 table of the original one in the hope this could lead me somewhere
and
=SUM(IF(IFERROR(SEARCH("weight";G2:M2);0)>0;IF(G3:M8<>8888;G3:M8)))
to calculate the total sum of the weight values but this doesn't help much here
Can this even be done with standard functions? If not, what could a solution in VBA look like?
If your "weight" and "sell" columns are always two columns apart, then you can use this array formula which looks for the "weight" column and then multiplies it by the column 2 cells to the right:
hdrs refers to the range $A$1:$I$1, which contains the headers. But it could refer to the entire row, or a much larger portion of Row 1.
=SUM(IFERROR(SEARCH("*weight*",hdrs)*A2:G2,0) * IFERROR(SEARCH("*weight*",hdrs)*C2:I2,0))
If there might be a variable number of columns between "weight" and "sell", then you can try this array formula which looks for the "weight" and "sell" columns separately:
=SUM(INDEX(A2:I2,1,N(IF(1,AGGREGATE(15,6,SEARCH("*weight*",hdrs)*COLUMN(hdrs),ROW(INDIRECT("1:"&COUNTIF(hdrs,"*weight*")))))))*INDEX(A2:I2,1,N(IF(1,AGGREGATE(15,6,SEARCH("*sell*",hdrs)*COLUMN(hdrs),ROW(INDIRECT("1:"&COUNTIF(hdrs,"*weight*"))))))))
Since this is an array formula, you need to "confirm" it by holding down Ctrl + Shift while hitting Enter. If you do this correctly, Excel will place braces {...} around the formula, as seen in the formula bar.
Note: I just noticed you want to match "weight1", so just make the obvious change in the above formulas.
Here is a formula that should do the matching in the way that you're thinking:
=SUM(A2:I2*ISNUMBER(FIND("weight",A1:I1))*IFERROR(INDEX(A2:I2,N(IF({1},MATCH("*sell"&RIGHT(A1:I1,LEN(A1:I1)-FIND("weightl",A1:I1)-6),A1:I1,0)))),0))
This must be entered as an array formula using Ctrl+Shift+Enter.
Note: I'm finding the 'sell' header which matches the 'weightl' header, so weightl1_1_4 will match with sell1_1_4 etc., but I'm now wondering if this is necessary. Maybe the weight just matches with the next sell, which would be easier.
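The suffix matching described above can be sketched in Python like this, in case it helps to check a few rows by hand. It is only an illustration: the function name revenue, the header names, and the sample row are assumptions, chosen so that the row reproduces the 2*3+3*6+2*3 example from the question.

```python
def revenue(headers, row, weight_key="weightl", sell_key="sell"):
    """Multiply each weight column by the sell column with the same suffix, then sum."""
    # Map each suffix after the "sell" keyword to that column's value.
    sell_by_suffix = {}
    for header, value in zip(headers, row):
        pos = header.find(sell_key)
        if pos != -1:
            sell_by_suffix[header[pos + len(sell_key):]] = value

    total = 0
    for header, value in zip(headers, row):
        pos = header.find(weight_key)
        if pos != -1:
            suffix = header[pos + len(weight_key):]
            # A weight column without a matching sell column contributes 0.
            total += value * sell_by_suffix.get(suffix, 0)
    return total


# Hypothetical headers and row; the "other" columns are ignored.
headers = [
    "weightl1_1", "other1", "sell1_1",
    "weightl1_2", "other2", "sell1_2",
    "weightl1_3", "other3", "sell1_3",
]
row = [2, 9, 3, 3, 9, 6, 2, 9, 3]
row_revenue = revenue(headers, row)
print(row_revenue)
```

Because the pairing is done by suffix, it does not matter how many columns sit between a weight column and its sell column.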
Given a list of days such as:
01-giu-16
01-giu-16
01-giu-16
31-mag-16
31-mag-16
31-mag-16
31-mag-16
30-mag-16
I was looking for an Excel formula that helps me count the number of unique days in the list (in this example 3).
Moreover, I need the count only for the dates which have a specific ID in the next column (for example 1565).
Without any additional criteria, you can achieve the uniqueness count by using
=SUMPRODUCT(1/COUNTIF(A1:A8,A1:A8)), assuming your data are in the range A1:A8.
To evaluate subject to additional criteria (suppose they are in column B), use
{=SUM(--(FREQUENCY(IF(B1:B8=1565,MATCH(A1:A8,A1:A8,0)),ROW(A1:A8)-ROW(A1)+1)>0))}
This is an array formula: use Ctrl + Shift + Return once you're done editing (and don't type the curly braces yourself). Personally though I think this exceeds the reasonable threshold for complexity: I'd be inclined to adopt the first approach on a column that represents an intermediate transformation of your input data.
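To see why 1/COUNTIF counts distinct values (each value that appears n times contributes n copies of 1/n, which add up to 1), here is a short Python sketch. The dates are the ones from the question; the ID column is invented for illustration, since the question does not show it.

```python
from collections import Counter
from fractions import Fraction


def unique_count(values):
    """Same idea as SUMPRODUCT(1/COUNTIF(range, range))."""
    counts = Counter(values)
    # Exact fractions avoid floating-point rounding when the 1/n terms are added.
    return int(sum(Fraction(1, counts[v]) for v in values))


def unique_count_where(values, keys, wanted):
    """Distinct values among the rows whose key equals `wanted`."""
    return len({v for v, k in zip(values, keys) if k == wanted})


dates = ["01-giu-16"] * 3 + ["31-mag-16"] * 4 + ["30-mag-16"]
# Hypothetical ID column, one ID per date row.
ids = [1565, 1565, 2000, 1565, 2000, 2000, 2000, 2000]

n_unique = unique_count(dates)
n_unique_1565 = unique_count_where(dates, ids, 1565)
print(n_unique, n_unique_1565)
```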
Let's assume your data is in column A and it has a header row, so the first data number will actually be in A2. Place this formula in B2 and copy it down beside your list. It will generate a list of unique cell numbers from column A. Once you have the list, you simply need to use a function to count the size of it.
=iferror(INDEX($A$2:$A$5,MATCH(0,INDEX(COUNTIF($B$1:B1,$A$2:$A$5),0,0),0)),"")
in C2 you can use the following formula to get the number of unique cell numbers
=COUNTA(B2:B9)-COUNTIF(B2:B9,"")
In D2 you can use the following formula to get the count of each unique cell number from your original list. Copy it down as far as you need to go.
=IF(B2="","",COUNTIF($A$2:$A$9,B2))
I'm working on data from a population of people with allergies. Each person has a unique ExceptionID, and each allergen has a unique AllergenID (451 in total).
I have a data table with 2 columns (ExceptionID and AllergenID), where each person's allergies are listed row by row. This means that the ExceptionID column has repeated values for people with multiple allergies, and the AllergenID column has repeated values for the different people who have that allergy.
I am trying to count how many times each pair of allergies is present in this population (e.g. Allergen#107 & Allergen#108, Allergen#107 & Allergen#109,etc). To keep it simple I've created a matrix of 451 rows X 451 columns, representing every pair (twice actually because A/B and B/A are equivalent).
I somehow need to use the row name (allergenID) to lookup the ExceptionID in my data table, and count the cases where that matches the ExceptionIDs from the column name (also AllergenID). I have no problem using Vlookup or Index/Match, but I'm struggling with the correct combination of a lookup and Sumproduct or Countif formula.
Any help is greatly appreciated!
Mike
PS I'm using Excel 2016 if that changes anything.
-=UPDATE=-
So the methods suggested by Dirk and MacroMarc both worked, though I couldn't apply the latter to my full data set (17,000+ rows) because it was taking a long time.
I've since decided to turn this into a VBA macro because we now want to see the counts of triplets instead of pairs.
With the 2 columns you start with, it is as good as impossible... You would need to check, for every ExceptionID, whether it has 2 specific, different AllergenIDs. Better to use a helper table with ExceptionID as rows and AllergenID as columns (or the opposite... whatever you like). The helper table needs a formula like:
=COUNTIFS($A:$A,$D2,$B:$B,E$1)
Which then can be auto-filled. (The ranges are from my example, you need to change them to your needs).
With this helper-matrix you can easily go for your bigger matrix like this:
=COUNTIFS(E:E,1,INDEX($E:$G,,MATCH($I2,$E$1:$G$1,0)),1)
Again, you can auto-fill with this formula, but you need to change it, so it fits your needs.
Because the columns have the same ID2 (would be your AllergenID), there is no need to look them up, because E:E changes automatically with the auto-fill.
The most important part of the formulas is the $ signs, which must not be messed up, or you cannot auto-fill.
Picture of my self-made example (formulas are from the upper left cell in each table):
If you still have any questions, just ask :)
It can be done straight from your original set-up with array formulas:
Please note that array formulas MUST be entered with Ctrl-Shift-Enter, before copying across and down:
In the example pic, I have NAMED the data ranges $A$2:$A$21 as 'People' and $B$2:$B$21 as 'Allergens' to make it a nicer set-up. You can see in the formula bar how that looks as a formula. However you could use the standard references like this in your first matrix cell:
EDIT: silly me, N function is not needed to turn the booleans into 1's and 0's, since multiplying booleans will do the trick. Below formula works...
=SUM(IF(MATCH($A$2:$A$21,$A$2:$A$21,0)=ROW($A$2:$A$21)-1, NOT(ISERROR(MATCH($A$2:$A$21&$E2,$A$2:$A$21&$B$2:$B$21,0)))*NOT(ISERROR(MATCH($A$2:$A$21&F$1, $A$2:$A$21&$B$2:$B$21,0))), 0))
Then copy from F2 across and down. It can be perhaps improved in technique with sumproduct or whatever, but it's just a rough example of the technique....
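Since the update mentions moving on to triplets, here is a minimal sketch of the counting logic in Python rather than VBA: group the allergens per person, then count every combination of a given size. The function name group_counts and the sample records are hypothetical, and duplicate rows for the same person and allergen are assumed to count once.

```python
from collections import Counter, defaultdict
from itertools import combinations


def group_counts(records, size=2):
    """Count how many people have each combination of `size` allergens."""
    allergens_by_person = defaultdict(set)
    for person, allergen in records:
        allergens_by_person[person].add(allergen)  # a set ignores duplicate rows

    counts = Counter()
    for allergens in allergens_by_person.values():
        # Sorting makes (107, 108) and (108, 107) the same key.
        for combo in combinations(sorted(allergens), size):
            counts[combo] += 1
    return counts


# Hypothetical (ExceptionID, AllergenID) rows.
records = [
    (1, 107), (1, 108), (1, 109),
    (2, 107), (2, 108),
    (3, 108), (3, 109), (3, 108),
]
pairs = group_counts(records, size=2)
triplets = group_counts(records, size=3)
print(pairs)
print(triplets)
```

This visits each person once instead of filling all 451 x 451 cells, so it should cope with 17,000+ rows more comfortably than the array formula, though that is untested on the real data.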
I have two columns with team names and two columns with corresponding stats. I need to go through the 2 columns and find the stats that match the team name, and they need to be in order. VLOOKUP, MATCH, and SEARCH don't seem to work with multiple columns. Does anyone know how this can be done?
Assuming in your picture, the "Home" title is in cell B2 then the following array formula can be put in the cells H3:L7 (array formulas need to be entered with Ctrl+Shift+Enter)
=IFERROR(OFFSET($D$1,-1+SMALL(IF(($B$3:$B$7=H$2)+($C$3:$C$7=H$2),ROW($B$3:$B$7),"X"),ROW()-ROW($2:$2)),--NOT((INDEX($B:$B,SMALL(IF(($B$3:$B$7=H$2)+($C$3:$C$7=H$2),ROW($B$3:$B$7),"X"),ROW()-ROW($2:$2)))=H$2))),"")
Let me break it down...
the logic is: OFFSET(top_of_results,row_number_that_has_Nth_team_score,0_or_1_for_home_or_away)
this is wrapped in an IFERROR for cases where there isn't, e.g., a 5th score for team A
using the array part IF(($B$3:$B$7=H$2)+($C$3:$C$7=H$2),ROW($B$3:$B$7),"X") we get an array that has the ROW number if either (done using +) B or C has a value matching our team header (H$2), or an X otherwise
using SMALL(...,ROW()-ROW($2:$2)) we get the Nth smallest row, based on 1st being in row 3, 2nd in row 4 etc.
to get whether it is home or away, we check column B on our row to see if it matches --NOT((INDEX($B:$B,row_number_that_has_Nth_team_score-O)=H$2)) this gives 0 for home, 1 for away and this is used to offset the column
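The breakdown above boils down to the following logic, written here as a small Python sketch: walk the rows in order, and for each row where the team appears, pick the home or the away stat depending on which column matched. The function name team_stats and the sample games are made up for illustration; the column layout (home, away, home stat, away stat) is assumed from the formula.

```python
def team_stats(games, team):
    """Return the team's stats in row order, home or away as appropriate."""
    stats = []
    for home, away, home_stat, away_stat in games:
        if home == team:      # column offset 0 in the OFFSET formula
            stats.append(home_stat)
        elif away == team:    # column offset 1 in the OFFSET formula
            stats.append(away_stat)
    return stats


# Hypothetical rows: (home team, away team, home stat, away stat).
games = [
    ("A", "B", 3, 1),
    ("C", "A", 2, 2),
    ("B", "C", 0, 4),
    ("A", "C", 1, 0),
]
stats_a = team_stats(games, "A")
print(stats_a)
```

The Nth element of the returned list corresponds to what SMALL(..., N) picks out in the array formula, and asking for an element past the end of the list is the case that IFERROR covers.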
Hopefully it makes sense. Array formulas are very powerful, if a little confusing :-) I recommend CPearson's intro for more information.
Good luck!