formula that matches against multiple dates and grabs the most recent - excel

I have a data table with two columns. The first column has a list of Project IDs, and the second column has a bunch of dates associated with those projects. A project can have multiple dates associated with it.
I would like to create a separate summary table of two columns. The first column will be a list of unique Project IDs (I've been able to do this with an index/match function). I want the second column to search the dates column and identify the most recent date associated with each project.
Is it possible to create this second column of my summary table using standard excel formulas and without using any VBA? After an hour or two, I'm not convinced that this is possible.
I was hoping that, for a given project ID, there might be a way to do the following:
--> identify the row numbers for all rows that contain a given project ID;
--> use this row number information to grab the corresponding cell values from the dates column (presumably by first constructing a list of cell references)
--> display the max date out of those that are returned.
What my spreadsheet looks like

The AGGREGATE¹ function can quickly calculate a pseudo-MAXIF function.
In E2 as a standard formula,
=AGGREGATE(14, 6, (B$1:INDEX(B:B, MATCH(1E+99,B:B)))/(A$1:INDEX(A:A, MATCH(1E+99,B:B ))=D2), 1)
Fill down as necessary.
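Function 14 is LARGE and option 6 tells AGGREGATE to ignore errors, so rows where the ID does not match D2 (which produce #DIV/0!) drop out, and k = 1 returns the latest remaining date.
Like the SUMPRODUCT function, AGGREGATE benefits from referencing the minimum number of rows necessary. The MATCH(1E+99,B:B) truncates each column referenced by the INDEX function at the extent of the dates in column B.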
¹ The AGGREGATE function was introduced with Excel 2010. It is not available in earlier versions.
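For readers who want to see the logic outside a worksheet, here is a minimal Python sketch of the same "max date per project" idea. It is not part of the original answer; the sample rows and the name `latest_date_per_project` are assumptions made up for illustration.

```python
from datetime import date

def latest_date_per_project(rows):
    """Return {project_id: most recent date} from (project_id, date) pairs."""
    latest = {}
    for project_id, when in rows:
        # Keep the date only if it is newer than the one seen so far.
        if project_id not in latest or when > latest[project_id]:
            latest[project_id] = when
    return latest

# Hypothetical sample data standing in for columns A (IDs) and B (dates).
rows = [
    ("P1", date(2015, 1, 10)),
    ("P2", date(2015, 3, 2)),
    ("P1", date(2015, 6, 30)),
    ("P2", date(2014, 12, 25)),
]
summary = latest_date_per_project(rows)
```

Each key of `summary` plays the role of a unique Project ID in column D, and its value is what the formula in column E would return.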

Related

Get duplicates from two columns

I'm having trouble filtering an Excel table. It is a set of two columns from two tables, in which I need to find duplicates.
2 rows with duplicates
Some idents are repeated: they are present in both the current and the previous month. In the example below, with the help of the formula =IFERROR(MATCH(A2;B:B;0); "NO"), I obtained information about which data from last month is repeated in the current month and exactly which row it is in. The formula for determining whether a value is repeated is =COUNTIFS($A$2:$B$13;A2)>1
duplicates and if repeated
I would like to retrieve only the duplicates from the list. I tried the formula =IFERROR(INDEX(A:A;SMALL(IF(NOT(D$2:D$104=TRUE);ROW(B2)-ROW(INDEX(B2;1;1))+1);ROW(G:G)));" ERROR") to get the values that are repeated and skip the ones that aren't, but the result is not as desired. Column G shows what Excel returns for the formula I entered. Column H shows what I would like instead: a new column containing only the duplicates.
Current vs. desired display
In this example, the columns are a bit small, but in reality there could be at least a thousand rows, so I would need help filtering those.
You implied these columns were present in two different tables. So I used Tables with structured references. You can convert to normal addressing if you require that instead.
If you have Windows Excel 2021 or later, you can use:
=FILTERXML("<t><s>" &TEXTJOIN("</s><s>",,UNIQUE(LastMonth[Last month marks],FALSE,TRUE),UNIQUE(CurrentMonth[Current Month],FALSE,TRUE))& "</s></t>","//s[following::*=.]")
Create a list of distinct items for each row
Create an XML string by joining the items with TEXTJOIN
Extract only those items that are followed by an identical item
With your earlier version of Excel, I would still use Tables and structured references, but I would also use a helper column.
D2: =IFERROR(MATCH(lastMonth[@[last month]],currentMonth[current month],0),"NO") (and fill down)
E2: =IFERROR(INDEX(currentMonth[current month], AGGREGATE(15,6,[Duplicates in Which Row],ROWS($1:1))),"")
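The result both approaches aim for can be sketched in a few lines of Python. This is only an illustration of the intended output, not a translation of either formula; the list contents and the name `items_in_both` are hypothetical.

```python
def items_in_both(last_month, current_month):
    """Return the distinct items of last_month that also occur in current_month."""
    current = set(current_month)
    seen = set()
    result = []
    for item in last_month:
        # Keep an item once, and only if the other list contains it too.
        if item in current and item not in seen:
            seen.add(item)
            result.append(item)
    return result

# Hypothetical idents standing in for the two table columns.
last_month = ["A1", "B2", "C3", "B2", "D4"]
current_month = ["C3", "E5", "B2"]
duplicates = items_in_both(last_month, current_month)
```

The output keeps the order in which the items first appear in the last-month list.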

Dynamic Excel Function to Identify Row Number without VBA

So I am attempting to create a static table in Excel that updates dynamically based on a pivot table. The amount of data is manageable, ~200-300 rows, so having the formulas wrapped in IFERROR to keep them blank is not an issue. The issue I am running into is identifying the row number that the subtotals are populated in. They are split by client, but the number of entries can vary month to month, so this month the subtotal for client A could be in row 46, but next month it could be in row 52. The data from the subtotal line is needed to perform additional calculations within the "static" table.
So far I have been able to concat the row function, which correctly identifies that there is 1 row containing the data.
=ROWS(CONCAT(VLOOKUP(G47,Table1[[CWShortID]:[Company Name]],2,FALSE)," ","Total"))
However, I am unable to pull back the row number itself (which I need to concat into another formula).
Essentially I am attempting to use a short ID to pull the client's full name from another table, concat it with " Total" to form the search parameter, return the row number of the match, then concat that with a column identifier to perform the intended calculation. Example:
=SUM(I47/IF(COUNTIF($A$52,CONCAT(VLOOKUP(G47,Table1[[CWShortID]:[Company Name]],2,FALSE)," ","Total"))=1,$B$52,""))
Basically, in the above I need both instances of "52" to be dynamically populated by the function. Reading Microsoft's documentation, ROW() appears to be what I am looking for, but when nested it doesn't work as intended.
=SUM(I47/IF(COUNTIF(CONCAT("A",ROW(CONCAT(VLOOKUP(G47,Table1[[CWShortID]:[Company Name]],2,FALSE)," ","Total"))),CONCAT(VLOOKUP(G47,Table1[[CWShortID]:[Company Name]],2,FALSE)," ","Total"))=1,CONCAT("B",ROW(CONCAT(VLOOKUP(G47,Table1[[CWShortID]:[Company Name]],2,FALSE)," ","Total"))),""))
Anyone ever been able to do this without having to write in VBA routines or functions? I would prefer to stick strictly with Excel formulas.

Return last date something was entered into one column with criteria from another column

I'm working with a set of data in Excel. Data is entered into rows for items specified in columns. The first column contains a date. In the same row, one column contains the name of a person, and another column may contain a number larger than zero (or it may be empty).
I need to create a formula which returns the date when a number larger than zero was last entered into that column for a specific name. This is a "living list" which keeps on growing, and the same names appear in different rows, sometimes with a number in the column I mentioned and sometimes not.
I found an old thread on this site on a similar subject, which got me as far as knowing the date of the last entry containing the person's name, but I'm still not able to configure it to show me when that specific person also had a number larger than zero in that column.
Here's the thread: How to get the newest value from a column with conditions
My current formula looks like this:
=INDEX($A:$A,MATCH(MAX(IF($G:$G=Sheet7!C5,$A:$A,0)),IF($G:$G=Sheet7!C5,$A:$A,"")))
CTRL+SHIFT+ENTER
Column A contains the dates
Column G contains the names (and "Sheet7!C5" is a reference to a name)
The value column I need to add to the mix is column AY
I feel there must be a simple solution (a small add on to the formula) to solve this but I always end up with an error.
Thanks in advance :)
Edit: Here is a simplified example of the data entry and output list needed.
For this you need to sort the dates DESCENDING and format the data as an Excel Table.
Edit: you can sort the date ascending. See explanation at the end.
Using your example, then the formula will be
=INDEX(TableData[Activity A], MATCH($B14, TableData[Employee initials], 0))
This works just like the usual VLOOKUP or INDEX MATCH, fetching the first date on an activity matching the employee initials.
You can use VLOOKUP, but you'll need to dynamically name the range of each column.
Edit: Just today I found an interesting behavior of MATCH when there are multiple matching values. If you use 1 instead of 0 as the match type, it fetches the last matching value in the list. Note that match type 1 assumes the lookup column itself is in ascending order.
So you can use this formula instead in a table sorted ASCENDING.
=INDEX(TableData[Activity A], MATCH($B14, TableData[Employee initials], 1))
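To spell out what the question is asking for, independent of sort order, here is a small Python sketch. It is a plain scan with both conditions, not the MATCH-based approach above; the tuple layout (date, name, value), the sample data, and the name `last_positive_entry` are assumptions for illustration.

```python
from datetime import date

def last_positive_entry(rows, name):
    """Return the latest date on which `name` has a value larger than zero."""
    matching = [
        when
        for when, who, value in rows
        # An empty cell is modelled as None and never counts as an entry.
        if who == name and value is not None and value > 0
    ]
    return max(matching) if matching else None

# Hypothetical rows standing in for columns A (date), G (name) and AY (value).
rows = [
    (date(2015, 1, 5), "AB", 3),
    (date(2015, 2, 1), "CD", 1),
    (date(2015, 2, 9), "AB", None),
    (date(2015, 3, 4), "AB", 0),
    (date(2015, 1, 20), "AB", 2),
]
latest_ab = last_positive_entry(rows, "AB")
```

Because the scan takes the maximum of the qualifying dates, it does not depend on the rows being sorted.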

Countif with dynamic headers

Good afternoon! I'm trying to get a Countifs or Index Match statement to count the number of times a value occurs in another table. The example:
On my report sheet, Column A contains 10 different statuses, such as Green, Yellow, Red etc.; Row 1 contains six dates, such as 1/31/2015, 2/28/2015, etc. These dates are calculations. The last date references my date worksheet and the five other use EOMONTH to get the month end for the five prior months.
On my data table, I have 7 descriptive columns (such as Type, Make, Model, etc) and then we begin date columns: 1/31/2010 all the way to 7/31/2015. I add a new column each month (I know, I don't like it either, but unfortunately we don't have a time series database).
What I need to do is have a Countifs or Index Match that pulls the date from my report tab, goes and finds it in the header row of my tblTrends, and then counts all those statuses that are Green, and if it's a SUV (for example).
Thoughts?
Thx!!
G
At its most basic, you'd want something akin to:
=COUNTIFS(TypeRange,"SUV",FirstMonth,Status)
So let's say your data table starts in column L, and the first date column is O:
=COUNTIFS($L$1:$L$100,"SUV",O$1:O$100,$A2)
As you drag this formula across the different dates, it will move the date reference over one to the next month.
If you need it to dynamically determine the date column, I'd recommend OFFSET, which dynamically selects a range. However, note that OFFSET is a "volatile" function, which means it recalculates any time a change is made anywhere in the file. That can lead to pretty slow recalculation if it is not used sparingly.
=COUNTIFS($L$2:$L$100,"SUV",OFFSET($N$2:$N$100,,MATCH(B$1,$O$1:$Z$1,0)),$A2)
The OFFSET starts on column N, because that's the first column before the columns we want (the date columns). The MATCH tells it how many columns to OFFSET from here.
If you're going to use this over a large amount of data, then you could avoid using the OFFSET formula by creating a dynamic table. This table would only contain data for the six months you're interested in, by utilizing INDEX/MATCH, and you could run your COUNTIFS off this table, instead, using the original, basic method I first described. I can go into detail if you're unsure what I mean.
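The lookup-then-count logic can be sketched in Python as follows. This is only an illustration under assumed data: the header names, sample rows, and the function `count_matching` are hypothetical and not taken from the asker's workbook.

```python
def count_matching(headers, rows, date_header, status, vehicle_type):
    """Count rows of the given type whose status in the dated column matches."""
    date_col = headers.index(date_header)  # exact header lookup, like MATCH(..., 0)
    type_col = headers.index("Type")
    return sum(
        1
        for row in rows
        if row[type_col] == vehicle_type and row[date_col] == status
    )

# Hypothetical slice of the data table: descriptive columns, then date columns.
headers = ["Type", "Make", "6/30/2015", "7/31/2015"]
rows = [
    ["SUV", "Ford", "Green", "Yellow"],
    ["SUV", "Jeep", "Green", "Green"],
    ["Sedan", "Audi", "Green", "Green"],
    ["SUV", "Kia", "Red", "Green"],
]
green_suvs_july = count_matching(headers, rows, "7/31/2015", "Green", "SUV")
```

Finding the column index first and then counting mirrors the split between MATCH (which column) and COUNTIFS (how many rows).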

Optimizing multiple-criteria IFs

I'm performing array calculations that are taking a long time to complete. I'd like to optimize my formulas some more. All of the formulas are of the same nature - they perform some high-level function (Average, Slope, Min, Max) across a column of values. However, not all cells in a column are included in the array. I use multiple IF criteria to choose which cells get included. All comparisons are made to the current row. Here's an example of the data:
     A        B           C         D      E
1    Company  Generation  Date      Value  ToCalculate
2    Abc      1           1/1/2010  5.6
3    ...      ...         ...       ...    ...
E would look something like this
{=Average(If(A2=A2:A1000, If(B2=B2:B1000, If(C2 > C2:C1000, D2:D1000))))}
So once E2 is calculated, I have to autofill down column E. Columns F, G, H, ... use the same approach, but either select different values to operate on or apply a different function. My dataset is quite large, and with only a few of these the spreadsheet is taking more than an hour to compute. Every so often I'll add a fourth criterion, all other criteria being the same.
Is there a more efficient approach? Some thoughts:
Can I use a single array per column instead of thousands per column?
Can I condense the first three criteria so that the output is row numbers? Perhaps then subsequent formulas won't have to search for multiple criteria but can just perform the function?
Or can I somehow build the criteria up? So a new column returns all rows where the company is the same, another column returns all rows from the first column where the generation is the same, and so on.
For the Average you can do without arrays:
=AVERAGEIFS(D2:D$1000,A2:A$1000,A2,B2:B$1000,B2,C2:C$1000,"<="&C2)
As there is also a COUNTIFS and a SUMIFS, I think your slopes could be calculated the same way.
For the rest of the functions (max, min, etc), we should analyze case by case.
I did a slight performance test, and this is apparently better, but of course my datasets are just mocked.
HTH!
Note: Excel 2007 and up only!
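As a reference for checking results, here is a minimal Python sketch of the conditional average. It follows the comparison in the question's array formula (same company, same generation, strictly earlier date); the dictionary keys, sample values, and the name `conditional_average` are assumptions for illustration.

```python
from datetime import date

def conditional_average(rows, current):
    """Average `value` over rows matching `current` on company and generation
    whose date is strictly earlier than the current row's date."""
    values = [
        r["value"]
        for r in rows
        if r["company"] == current["company"]
        and r["generation"] == current["generation"]
        and r["date"] < current["date"]
    ]
    # No qualifying rows: return None instead of dividing by zero.
    return sum(values) / len(values) if values else None

# Hypothetical data in the shape of columns A to D.
rows = [
    {"company": "Abc", "generation": 1, "date": date(2010, 1, 1), "value": 5.6},
    {"company": "Abc", "generation": 1, "date": date(2010, 2, 1), "value": 6.4},
    {"company": "Abc", "generation": 2, "date": date(2010, 1, 15), "value": 9.0},
    {"company": "Abc", "generation": 1, "date": date(2010, 3, 1), "value": 7.0},
]
average_for_last_row = conditional_average(rows, rows[3])
```

Like the array formula, this rescans every row for each result, so it shows the logic rather than a faster method.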
Edit - Answering your comment.
Without knowing the dimensions of the problem it is difficult to give advice, but I'll risk some anyway:
You could write a VBA function that:
1) Generates a new sheet for each company-generation pair
2) Sorts the data in those sheets by date
3) Adds the formulas to those sheets (no conditionals needed in this context)
4) Recalculates and Gets the results from those formulas and populates the original sheet
5) Deletes the auxiliary sheets
To capture the rows and re-use try this approach:
Sort the data by Company & Generation.
Make a unique list of Companies & generations (use Advanced Filter, Unique Only, Copy)
For each Company-Generation pair in the list, build 2 columns of formulae. The first column gives the count of rows in the data for this pair (use COUNTIFS). The second column gives the first row in the data for this pair (= first row for the previous pair + count of rows for the previous pair). Then you can use a function like OFFSET to return only the rows of data for the Company-Generation pair and embed this inside the final function/array formula (AVERAGEIFS etc.). You could extend this sort-and-count approach to include dates if you wanted. The drawback is that if the list of companies and generations changes, you have to change the list of uniques and the associated formulas. There are examples of this approach on my website at http://www.decisionmodels.com/optspeedk.htm and http://www.decisionmodels.com/optspeedj.htm
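The count and first-row bookkeeping described above can be sketched in Python like this. It assumes the keys are already sorted so that equal pairs are adjacent; the sample keys and the name `group_slices` are hypothetical, and positions are zero-based rather than worksheet row numbers.

```python
def group_slices(sorted_keys):
    """Map each key to (first_position, row_count) in an already sorted list."""
    slices = {}
    start = 0
    for i in range(1, len(sorted_keys) + 1):
        # A group ends at the end of the list or where the key changes.
        if i == len(sorted_keys) or sorted_keys[i] != sorted_keys[start]:
            slices[sorted_keys[start]] = (start, i - start)
            start = i
    return slices

# Hypothetical (company, generation) keys, sorted as the answer requires.
keys = [("Abc", 1), ("Abc", 1), ("Abc", 2), ("Xyz", 1), ("Xyz", 1), ("Xyz", 1)]
blocks = group_slices(keys)
```

Each (first_position, row_count) pair corresponds to the arguments an OFFSET call would use to return just that group's block of rows.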

Resources