Excel - Vlookup on moving reference - excel

Thank you for taking the time to read this and maybe answer this for me.
I am currently referencing a cell within an array like so:
=INDEX('CampPerf Output'!$1:$1048576,9,55)
This data unfortunately can move up and down along the columns of the array. For example, sometimes it will not be Row 9, it will be on
=INDEX('CampPerf Output'!$1:$1048576,8,55)
=INDEX('CampPerf Output'!$1:$1048576,10,55)
=INDEX('CampPerf Output'!$1:$1048576,11,55)
You never know, because the data above it may be absent for this particular day's reporting (it's call volume for lines of business in a call center setting. Some lines of business don't receive any calls that day.)
New data is INSERTED, but will always land in column BC (Column 55). New data is inserted to the left, pushing cells right. Since the data can move up and down, and the references of a vlookup move as new data is inserted, my question is this:
Can I combine my INDEX and a VLOOKUP in a way that makes the lookup column absolute, no matter what is inserted or deleted?

Use:
=VLOOKUP('Conversion'!J1,INDEX('CampPerf Output'!$1:$1048576,0,55):INDEX('CampPerf Output'!$1:$1048576,0,65),2,FALSE)
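This mimics the VLOOKUP you provided: it looks in the range BC:BM. VLOOKUP can use full-column references, and the 0 in INDEX's row argument returns every row of that column. Because the columns are given as positions (55 and 65) rather than as cell references, they do not shift when columns are inserted or deleted.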
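To illustrate why anchoring on a column position survives insertions, here is a rough Python sketch (not Excel code). The grid, the helper names `vlookup_by_position` and `insert_column`, and the sample values are all made up for illustration; columns are 1-based to match the 55 and 65 in the formula.

```python
def vlookup_by_position(key, rows, first_col, last_col, col_index):
    """Exact-match lookup over columns first_col..last_col (1-based, inclusive).

    Loosely mirrors VLOOKUP(key, INDEX(sheet,0,first_col):INDEX(sheet,0,last_col), col_index, FALSE).
    """
    for row in rows:
        window = row[first_col - 1:last_col]
        if window and window[0] == key:
            return window[col_index - 1]
    return None  # Excel would show #N/A here


def insert_column(rows, position, values):
    """Insert a column at a 1-based position, pushing existing cells right."""
    return [row[:position - 1] + [v] + row[position - 1:] for row, v in zip(rows, values)]


# Hypothetical sheet: 54 filler columns, a key in column 55, a value in column 56.
rows = [[""] * 54 + ["Sales", 120], [""] * 54 + ["Support", 75]]
before = vlookup_by_position("Sales", rows, 55, 65, 2)

# A new day's data is inserted so that it lands in columns 55 and 56;
# the older data is pushed to the right.
rows = insert_column(rows, 55, ["Sales", "Support"])
rows = insert_column(rows, 56, [300, 40])
after = vlookup_by_position("Sales", rows, 55, 65, 2)
```

In this toy model `before` reads the old value and `after` reads the newly inserted one, because the lookup window is defined by position and never moves with the data.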

Related

How to Mimic Excel Tables Equations / New Row Behavior in Google Sheets

For those of us used to Microsoft Excel who switch to using Google Sheets, there are many differences which need to be taken into account.
One of the nice features in Excel that I miss is tables. If you insert a table into your Excel Spreadsheet, it does a lot of automatic things for you. You can have a single formula for one column of your table, and not have to update it whenever you add new data - whether adding a table row, or adding a row in the middle of the table.
Sometimes (though I haven't figured out why it sometimes does and why it sometimes doesn't) even without tables, Excel will suggest a formula fill as you're entering new data into a row, making copying the formula as easy as pressing "Tab".
There is no functionality in Google Sheets that matches this exactly. When you have a lot of data to enter, having to copy the formulas every single time you add a row is very tedious and time consuming and further leaves open the possibility of making a mistake when transcribing the information and copying / pasting the formulas. Any single cell could have a mistake and you won't know until it causes a problem later, then troubleshooting it will also be time consuming and difficult.
There are various questions in StackOverflow, StackExchange, Google Support and other sites that tackle this issue, but none seem to have a good solution that works for everyone. A lot of people have written an Apps Script to do just this, or use Apps Script + HTML forms as well... but it seems like that shouldn't be necessary, it adds more time & setup, and ends up with a specific solution for that sheet and that sheet only.
So, how can you replicate this behavior in Google Sheets so you don't have to keep copying & pasting your formulas over and over again and save yourself time (and your company money) and make Google Sheets act more like Excel?
BACKGROUND
There is a Google Support Thread on Inserting new Rows which suggests the use of ARRAYFORMULA to do this job. It is not an exact replacement for Excel's functionality, but it can work in most applications. There are other functions that output arrays, such as SEQUENCE which can also be applied similar to these examples depending on the situation, but I'll focus on ARRAYFORMULA here as it's the most generic and MOST functions can be wrapped in it and otherwise behave as you'd expect.
Here is also a link to an ARRAYFORMULA & MMULT Example Provided by Google (Note that this link will make a copy of the sheet, not let you directly access the example). The first tab is all about matrix multiplication, the second and later tabs have examples using ARRAYFORMULA.
The examples above are pretty limited in scope, so let's expand on those. To illustrate, I will use a basic formula involving 4 columns as an example. Let's say we have data in columns A, B, and C, and we want to do a relatively simple formula between them. Let's assume row 1 is being used as a header row, and your data is from row 2 down, as most people would do. Let's make the formula simple, but a little interesting, by having column D equal to the PREVIOUS value of A plus the product of B and C. Let's also assume we currently have 12 rows of data, but we know we will have data we need to enter in the future. Most of that data will get entered at the end, but sometimes we may need to add data in the middle of the range.
You can follow along with my Publicly Posted Example Sheet Here if you want (this will also create a copy on your drive so you can make changes and follow along). Each example below corresponds to a tab in the Example Sheet.
EXAMPLE: FORMULA ON EVERY ROW
In its simplest form, the formula in D2 would be = A1 + B2 * C2. Except, of course, we know A1 is a text header and if we include that we'll get an error. It's also commonly understood that absolute references (with $) execute faster in Google Sheets, and we don't need relative references on the columns (but rows are necessary to fill), so let's modify cell D2 as follows:
=IF( ISNUMBER($A1), $A1, 0 ) + $B2 * $C2
Then fill down to cell D13 (this is already done in the example).
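As a sanity check on what that fill-down computes, here is a small Python sketch of the same rule. It is only a model of the spreadsheet, with made-up sample numbers; `column_d` is a hypothetical helper name.

```python
def column_d(a, b, c):
    """a, b, c hold rows 1..n of columns A, B, C (index 0 is the header row).

    Returns the values for D2..Dn: the previous row's A (0 if it is not a
    number, as with the header) plus the current row's B * C.
    """
    result = []
    for i in range(1, len(a)):
        prev_a = a[i - 1] if isinstance(a[i - 1], (int, float)) else 0
        result.append(prev_a + b[i] * c[i])
    return result


# Made-up data: a header row followed by three data rows.
a = ["A", 1, 2, 3]
b = ["B", 10, 20, 30]
c = ["C", 2, 2, 2]
d = column_d(a, b, c)  # values for D2, D3, D4
```

The first data row falls back to 0 because the cell above it is the text header, which is what the ISNUMBER guard in the formula is for.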
So now you have your current data... but what if you need to add data?
If you add data to row 14, in columns A, B, and C, you then also have to copy the formula to D14. Easy peazy for this example, but what if you have 30 columns, 5 of them with formulas and you add another 10 entries to the list every day? This becomes very tedious. You can avoid entering it for every row by filling down the number of rows you need today, which saves a little time, but it breaks your flow of data entry.
Even worse, what if the entries are in some sort of order (e.g. order of date data was captured) and you get old data that needs to be entered in the middle of the range? You can add at the end and copy, then sort.
Some sheets won't let you sort, or won't sort correctly if you have certain data, so you may need to insert in the middle... let's say between rows 8 and 9. If you did this in an Excel table, and used "insert row" it would automatically populate cell D9 with your formula.
But here, when you add this new row 9, not only is D9 blank and waiting for you to enter the formula, but the A column reference in cell D10 now points to A8 instead of A9, where it should! So you have to recopy / refill your equation to cell D10 as well - and this is easy to miss - you may not know to do it, or forget to do it, and now your formulas are broken.
... Now, to be honest, Excel didn't get this part right, either. Somehow, it properly fills D9 in with the correct formula but botches D10 with a reference to A8, but then continues with a correct reference to A10 in D11. Which is almost worse, because since D9 was filled and all the other rows are correct, you may not realize you have a problem in D10...
This is basic spreadsheet use and is roughly the same behavior as using Excel WITHOUT Tables (except those instances where it decides to make suggestions for you) - so par for the course here if Excel didn't have the Table or suggestion ability.
Pros:
Simplest formula to implement
Works fine in fixed size sheets or sheets that don't change often
Tried and true, Will always work
Cons:
Have to copy formula to every new row you make
Very tedious for "living documents" that change often
If any formulas cross between rows, the pattern breaks when you insert a row in between and you have to copy your formula to the row below as well as your new one
With all the additional required repeated actions, it's very easy to make a mistake
Since the mistake could be in a single cell, finding the mistake after the fact can be difficult
EXAMPLE: CLOSED RANGE ARRAYFORMULA
Google support touts this as the best method. Indeed, if you want your formulas to update automatically when you add data in between and you want the least amount of computation time, then an ARRAYFORMULA with a limited (or "closed") range is the best solution.
To use ARRAYFORMULA, you put the formulas only in your top row of data (in this example, row 2). What makes this example closed range is that we will set it to cover exactly the data we have. So, the formula in D2 would be:
=ARRAYFORMULA( IF( ISNUMBER( $A$1:$A$12 ), $A$1:$A$12, 0 ) + $B$2:$B$13 * $C$2:$C$13 )
Here, we can (and I recommend) use all absolute references as the range we're using doesn't change as the cell row it's calculating changes. When you enter this formula, you will see it automatically populate D3 through D13 with the correct data as well.
If we want to add another row in the middle, it's easy. Taking the previous example, if we add a row between rows 8 and 9, you will see the formula in D2 has changed all the last rows - 12 is now 13, and 13 is now 14. When you enter data into columns A, B, and C in the new row 9, it automatically calculates correctly in D9.
When you look at the data in rows in column D (except D2), however, it shows the number itself in the formula bar - so someone looking at this sheet unaware there is an ARRAYFORMULA in use has no indication that it's an ARRAYFORMULA and overwriting ANY cell that was populated by ARRAYFORMULA will break the formula, give you an error in D2 and leave the rest of the values in the column blank. This is true for all methods using ARRAYFORMULA. So, for that reason, I recommend you make your column a protected range!
Alternate: You could name all of your ranges. For example, $A$1:$A$12 could be col_A_prev, $B$2:$B$13 could be col_B, and $C$2:$C$13 could be col_C. Which gives the formula:
=ARRAYFORMULA( IF( ISNUMBER( col_A_prev ), col_A_prev, 0 ) + col_B * col_C )
The behavior would be identical. When you add a row in between, the named ranges will automatically expand to include it. You could also use the same ranges for your column protection to ensure no data is written over.
Note: I do want to give kudos where it is due. Google Sheets handles named ranges WAY better than Excel. When you add or remove rows / columns inside your named range in Google it automatically expands the range - and Google actually allows you to use the named ranges as references in any of the settings (conditional formatting, protection, etc.). While you can enter a named range in Excel for some of these, it will convert it to R/C references which won't change even if your range changes later. If you want to add to the ends or you move rows / columns in your named range - well, they're both still terrible at that.
However, if we want to add new data to the end, in row 14 or after, this arrayformula will not automatically update.
Even worse, if you add a row between rows 12 and 13, it breaks the formula - as the references to columns B and C will update, but the references to column A will not - because A only went to row 12. In row 14 you now get the error:
Array arguments to ADD are of different size.
Because you're trying to add an array with 12 elements to an array with 13 elements. Admittedly, this is only a problem if you're referencing other rows which isn't that common across all useful spreadsheets. However, there are many practical reasons to do so, like cumulative sums.
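The mismatch is easy to see by counting rows. The following Python sketch only models the behaviour described above (it is not how Sheets is implemented); `array_add` and `closed_ranges` are hypothetical helpers.

```python
def array_add(left, right):
    """Element-wise addition that refuses mismatched lengths, like ADD on two arrays."""
    if len(left) != len(right):
        raise ValueError("Array arguments to ADD are of different size.")
    return [x + y for x, y in zip(left, right)]


def closed_ranges(a_end, bc_end):
    """Row counts covered by $A$1:$A$<a_end> and $B$2:$B$<bc_end>."""
    return a_end - 1 + 1, bc_end - 2 + 1


# Original formula: A1:A12 and B2:B13 both cover 12 rows.
a_len, bc_len = closed_ranges(12, 13)

# After inserting a row between rows 12 and 13, A stays A1:A12
# while the B and C ranges grow to row 14.
a_len_after, bc_len_after = closed_ranges(12, 14)
```

Calling `array_add` on lists of length `a_len_after` and `bc_len_after` raises the same complaint the sheet shows, while the original lengths add without trouble.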
So, either you have to deal with updating your ARRAYFORMULA columns each time you add data to the end (which doesn't make it much better than just copying your formulas to each row) or, you could basically make the last two rows "dummy rows" that you don't care about and add protection to those rows so they can't be edited or a row added between them, with perhaps a note saying "To add new data, insert a row above this line" so other people using it know what they have to do.
Pros:
Relatively simple formula to implement
Fastest Execution time
Will automatically adjust formula to any rows added in the middle
Can manage your ranges as named ranges
Cons:
Have to change the formula if you add any new data to the bottom (which is where you usually add new data) -OR- you have to implement one or more blank rows included in range with protection & reminders to ensure no one adds data to the bottom
Data below ARRAYFORMULA looks like just number entries and could easily confuse people into thinking it's not a formula entry and overwrite it without thinking.
EXAMPLE: OPEN RANGE ARRAYFORMULA
If you're following along in the example sheet, the first thing you'll note is this sheet doesn't do the same thing. It is simply using the CURRENT value in column A, rather than the previous row. This is because you CAN'T reference a previous row with this method (see a couple paragraphs down for why). To compensate, I forced A, B, and C to 0 in the first row and added another row to the bottom.
This is similar to the closed range example in its application of ARRAYFORMULA. The difference here is that instead of having a fixed end to the ranges (rows 12 & 13 above), you leave the range open by using just the column letter at the end of the range, which references the last row of the column. So the equation in D2 now looks like this:
=ARRAYFORMULA( IF( ISNUMBER( $A$2:$A ), $A$2:$A, 0 ) + $B$2:$B * $C$2:$C )
The reason you can't reference a previous row's cell is that if we used $A$1:$A here, that array would always have one more element than either $B$2:$B or $C$2:$C, so the arrays can't be added and the result is the error:
Result was not automatically expanded, please insert more rows (1).
Except inserting more rows won't work because the ranges will all expand by 1 also. Again, this is only a problem if you need to reference other rows which isn't common but is useful for things like cumulative sums.
When it comes to adding rows, though, this method is the best. Whether you are adding to the middle or the end of your data, it will automatically update the values in your ARRAYFORMULA columns.
Alternate: Same as with closed ranges, you could name all of your ranges. For example, $A$1:$A could be col_A_prev, $B$2:$B could be col_B, and $C$2:$C could be col_C. Which gives the same formula as with closed range:
=ARRAYFORMULA( IF( ISNUMBER( col_A_prev ), col_A_prev, 0 ) + col_B * col_C )
So if you're not referencing previous rows, or if you just add a top "dummy" row like I did in the example, it's all good... easy peazy lemon squeezy, right?
Yes, at least at first. The other problem here is that open ranges are computationally intense for Google Sheets algorithms. As you add more and more rows, especially if you have multiple open range ARRAYFORMULA columns, the sheet calculations get slower and slower and slower. The sheet I was working on that prompted this had 21 columns, 8 of which had ARRAYFORMULA formulas in row 2. At around 200 rows of data (not that much in the world of spreadsheets) it was taking MINUTES to calculate with each and every change I was making. That's simply not useable - I almost went back to copying the formula to each row. (It's possible using named ranges may improve the speed some - I didn't try it on that sheet)
So this solution doesn't really work for big (but not even that big) spreadsheets where you have lots of formulas.
Also, a more minor gripe - you'll notice in the example that every row on the spreadsheet was now populated in column D, even where no data was entered. That's annoying, but not a sheet killer by any means - and you could add an IF statement to the ARRAYFORMULA to just output "" whenever you have no data in one or more data columns.
Pros:
Relatively simple and straightforward formula to implement
"Works" with any number of rows
Automatically includes any rows that are added - on the end or in between
Can manage with named ranges
Cons:
Cannot reference data from previous rows
Extremely slow - computation time goes up with every added row (& every added column with an open reference)
Data below ARRAYFORMULA looks like just number entries and could easily confuse people into thinking it's not a formula entry and overwrite it without thinking.
EXAMPLE: HYBRID ARRAYFORMULA
Are you ready to give up on Google Sheets yet?
Well, there is one more option. It gets complicated and involved, but IMO works better in most situations than any of the above examples.
What I do here is add a cell with a formula for the number of rows in the sheet that have data in a certain column. Let's just say column A for this example. That formula looks like this:
= ARRAYFORMULA( MAX( IF( LEN($A:$A), ROW($A:$A), ) ) )
This, in and of itself, is an open ranged formula. It scans everything in column A and returns the last row that has SOMETHING in it. But it's one single formula in one cell reporting 1 value - no other cells get populated from it. It's relatively computationally intense for this one cell, but it's just one cell in the entire sheet.
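In plain terms, the formula walks the column once and keeps the largest row number that has anything in it. A rough Python equivalent, with a made-up column and a hypothetical helper name:

```python
def last_row_with_data(column_a):
    """Return the 1-based row number of the last non-empty cell, or 0 if all are empty.

    Loosely mirrors MAX(IF(LEN(A:A), ROW(A:A), )) evaluated once over the column.
    """
    return max(
        (row for row, value in enumerate(column_a, start=1) if len(str(value)) > 0),
        default=0,
    )


# Made-up column A: header, data with a gap, then empty rows.
column_a = ["Header", 5, "", 7, "", ""]
last = last_row_with_data(column_a)
```

Note that a gap in the middle does not matter: only the last non-empty row counts.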
Then, to make sure that any changes you make (adding / removing rows or columns) do not affect any references to that cell, name it. In the example provided, this is named last_example_row.
I also strongly recommend that you add protection to last_example_row so it's not accidentally changed. Extra tip: you can actually set both sets of permissions: "Only You can edit" and "show a warning when editing" so even if you try to edit it accidentally it will give you the chance to cancel the edit.
Since it's not a piece of data you need visually, hiding it is also a good idea (I left it unhidden in the example so you can easily see the formula).
Now, in order to use the value in last_example_row as part of our ranges, we have to use the INDIRECT function. We replace every open-ended instance in the previous example with a specific INDIRECT call.
For calls to the same row, for example, we replace with a pattern like this:
$B$2:$B is replaced with $B$2:INDIRECT( "$B$" & last_example_row )
so it ends on the last used row.
For calls to the previous row, we replace with a pattern like this:
$A$1:$A is replaced with $A$1:INDIRECT( "$A$" & ( last_example_row - 1 ) )
so it ends 1 row before the last used row.
So the final equation becomes this monstrosity:
=ARRAYFORMULA( IF( ISNUMBER( $A$1:INDIRECT( "$A$" & ( last_example_row - 1 ) ) ), $A$1:INDIRECT( "$A$" & ( last_example_row - 1 ) ), 0 ) + $B$2:INDIRECT( "$B$" & last_example_row ) * $C$2:INDIRECT( "$C$" & last_example_row ) )
So it's a closed range reference that points to a single open range calculation, and it works. Whether you add data in the middle or to the end, it automatically calculates your column for you - and it only populates rows where your data is also populated.
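To check that the two INDIRECT patterns really produce arrays of the same length, here is a small Python sketch that resolves them for a given last row. `resolved_range` is a hypothetical helper and the value 13 is just the example's 12 data rows plus the header.

```python
def resolved_range(column, start_row, last_row, offset=0):
    """Return the A1 range, and its row count, that
    $<col>$<start>:INDIRECT("$<col>$" & (last_row + offset)) resolves to."""
    end_row = last_row + offset
    return f"${column}${start_row}:${column}${end_row}", end_row - start_row + 1


last_example_row = 13  # assumed value of the named cell

# Previous-row pattern for column A, same-row pattern for column B.
prev_a = resolved_range("A", 1, last_example_row, offset=-1)
cur_b = resolved_range("B", 2, last_example_row)
```

Both ranges come out with the same number of rows, and they keep matching when `last_example_row` grows, which is why the size error from the closed and open range examples does not appear here.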
Since it only does the open range calculation ONCE, then uses that value in all remaining closed range calculations, this is much, much faster than the open range example above. It IS slower calculating than the first two examples, however - but I haven't yet hit the point in my real sheets where the delay has made it unusable (stay tuned as I add more data to my sheets over time). If anyone reading this has hit that point with this method, please let me know how many columns & rows you got to, including how many of the columns used an ARRAYFORMULA like this.
Unfortunately, however, since this method requires an INDIRECT call, you cannot use named ranges to accomplish this.
Pros:
Most flexible option
"Works" with any number of rows
Automatically includes any rows that are added - on the end or in between
Much Faster than completely open references
Cons:
Formulas are complex, hard-to-follow, and easy to make a mistake while entering
Slower than closed references - computation time still goes up with every added row and every added column with these "hybrid" references
Data below ARRAYFORMULA looks like just number entries and could easily confuse people into thinking it's not a formula entry and overwrite it without thinking.
Cannot manage with named ranges
Epilogue
Maybe (hopefully) someday Google will add a feature that will keep track of your formulas and execute them in a speedy way and this post will be obsolete. Until then, I hope this post helps someone out there.
Additional Note
Using any of the ARRAYFORMULA methods above can break sorting. If you add filters, and then sort by A->Z or Z->A on a particular column and row 2 is no longer row 2 - then your ARRAYFORMULA gets moved to whatever row it gets sorted to - and then only applies from that row down. Rows above it will be blank in all your ARRAYFORMULA columns. This is very disappointing to me. One way around it (that I don't like) is you can make row 2 a "dummy" row where whatever columns you may sort by have values that will always make it the top row. That's a pretty ugly solution, though.
You can make it less "ugly" by hiding row 2. Then columns will sort fine and you won't see any of the dummy data ("dummy" data may not even be necessary as the hidden row shouldn't sort with the rest). The caveat here is if you share it with multiple users - they won't even see there is a formula being used, it looks like all manual entries - and if one gets overwritten, it will break the ARRAYFORMULA. So, I would recommend protecting the ARRAYFORMULA columns, as well.

How to extract whole rows of data in Excel if it contains a substring within a string

The entire workbook is a few sheets long, however essentially I'm working with a base sheet that has around 8000 or so lines of data with about 10 columns or so. The end goal of this project is to be able to input a start date, end date and a keyword and then be filtered one last time with another keyword. So far, I've been able to filter down the original data within the date range and within the first keyword. The problem arises now when the keyword is within a block of text that varies and is never quite the same. For example, one row contains
12T Q1FY23 Unscheduled/Emergency Maintenance
While another row contains
12T Q4FY23 ERT Spill Stations
There are hundreds of variations of this, including ones that don't start with "12T". The starting data is subject to change so I can't quite use tables in Excel and filter it that way, as once you apply a filter the table won't update if new data is input as the source data, unless there is a way to do this and I just don't know how. So ultimately, I need the same filter that can be used on a table that says "contains" and/or "does not contain", but as formulas. Formulas seem to work well with this dynamic, subject-to-change source data, so I'd like to keep it with formulas, as I have done with the filtering previously with the date range and then with the other keyword. The difference between what I want now and what I did for that other keyword is that it was a static keyword that isn't embedded within a string like the "12T". Please let me know if this is too vague or if there's any more material needed to help answer this question. Attached is a sample image of what I'm working with on the original sheet. I'd like to be able to extract the rows containing only "12T", and not the ones containing "12T-M", for example, using only a formula. Assume that the data starts at A3 and ends at C8. I should also mention, just to be completely clear, I'm trying to copy these rows dynamically into another sheet so that it can be nicely viewed with only the relevant information and data.
To be extra clear, I first filter the original data with the following formula:
=INDEX(Sheet1!$A$6:$N$6796, SMALL(IF(COUNTIF('12T'!$H$11,Sheet1!$G$6:$G$6796), MATCH(ROW(Sheet1!$A$6:$N$6796),ROW(Sheet1!$A$6:$N$6796)), ""), ROWS(B3:$B$3)), COLUMNS(Sheet1!$A$6:A6))
The "Sheet1" reference contains the original data and "12T" refers to the sheet that contains the filtering keywords (the dates and the number keyword). This formula extracts all of the rows of the original dataset in Sheet1 that contain a specific keyword, in this case it's "5351 - Facilities: Maintenance: Building". These extracted rows of data are deposited as an array (entered with Ctrl+Shift+Enter) in a new sheet labeled "Xtract".
In this same sheet, I then filter out this array with the date range in mind. With the starting and ending date, I first calculate the number of instances that a date falls within the date range with the following formula.
=SUMPRODUCT(($A$2:$A$671>=Q2)*($A$2:$A$671<=Q3))
I use this result in the following formula in conjunction with the filtered data (filtered with the previous keyword) to filter it further so that I only get the rows of data that have their date in the date range.
=FILTER(A2:O671,(A2:A671>=Q2)*(A2:A671<=Q3),"No data")
This is also entered as an array, and is also in the "Xtract" sheet. With this filtered data set, I want to filter it one last time, so that only the rows of data that contain, for example, "12T" or "728M" in one of the cells (in which the respective cell can be written as "12T Q1FY23 UEM") can get extracted and placed into a final array. All of this is automatically updated simply by entering the values in this section I have shown below.
I can't use a table to filter the data, at least not that I know of, because if I filter a table by this logic ("contains '12T'" and "does not contain '-M'" to get only rows that contain 12T but not 12T-M or anything that's not 12T) then once I change the date range or the other keyword, the table won't update properly. If there's anything else I can add to help clarify, please let me know.
Add a column to the left containing the formula =FIND("12T ",B1) and copy it down.
Note the space after T.
Rows that match will show the position of the match (1 when the text starts with "12T "); rows that don't match will show #VALUE!, so you can sort on them.
P.S. if #VALUE! is ugly, you can use =NOT(ISERROR(FIND("12T ",B2)))
After a lot of searching and referencing my old work/internet, I found the formula to answer my problem. I understand this might not be the most clear since I can't quite provide the excel workbook I'm working with, but the goal of this was to automate all of the filtering so that no matter if data is added or not, when you change the filters, it will stay updated correctly. From the filtered data that I had already worked with, all I had to do to put it into another sheet was use the following formula:
=FILTER(XtractFilters!T2:AF900,ISNUMBER(SEARCH("12T *",XtractFilters!T2:T900)))
This finds all of the data containing a specific substring, which in this case specifically was "12T ", denoting the space as well. So all of the filtered results are then filtered once again so that only the rows where "12T " was found get returned. The range is just the entire range of data and then the column is the one containing the text where "12T " could be found.
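For anyone who wants to reason about what that FILTER + SEARCH combination keeps, here is a rough Python model. SEARCH is case-insensitive, so the sketch lowercases both sides; the first description comes from the question above, the other rows and the dates are invented, and `filter_rows` is a hypothetical helper.

```python
def filter_rows(rows, text_col, needle):
    """Keep rows whose text column contains needle, ignoring case (like SEARCH)."""
    needle = needle.lower()
    return [row for row in rows if needle in str(row[text_col]).lower()]


# Invented sample rows: (date, description).
rows = [
    ("2023-01-05", "12T Q1FY23 Unscheduled/Emergency Maintenance"),
    ("2023-01-06", "12T-M Q1FY23 Example"),
    ("2023-01-07", "728M Q4FY23 ERT Spill Stations"),
]

# The trailing space is what separates "12T " from "12T-M".
matches = filter_rows(rows, 1, "12T ")
```

Only the first row survives: the "12T-M" row has a hyphen where the search text expects a space.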

How do I extract data from the same row of a maxif result on excel?

So I have a data set converted from a stock chart in 1 minute time increments and I want to extract key data points from the data set.
The problem I am running into is when I attempt to use the INDEX function to match the MAXIFS and MINIFS results, the time criteria does not follow through:
The first function to extract the Low of Day from this data set:
=MINIFS(E:E, B:B, ">09:30", B:B, "<16:00")
The second function I'm attempting to pull the time of when the Low of Day data point is pulled from:
=INDEX(B:B,MATCH(MINIFS(E:E, B:B, ">09:30", B:B, "<16:00"),E:E,0))
The result I get is 8:52 AM, which is outside of the time criteria I have set. It appears that the function pulls the very first instance that matches the MINIFS result, disregarding the time criteria altogether.
I also want to keep in mind that I need a function that does not rely on hunting down individual cells, as the end goal is to automate a data extraction process that exports all the significant data points into a new Excel sheet, over the course of several hundred to thousands of data sets.
Ideally I'd like to have a function that can reference the exact data point that was first extracted to pull other significant data from the same row and avoid possibly referencing the wrong data point just because it's a duplication elsewhere.
Try the following array formula, which needs to be confirmed with CONTROL+SHIFT+ENTER...
=INDEX(B2:B10,MATCH(SMALL(IF(B2:B10>"09:30"+0,IF(B2:B10<"16:00"+0,E2:E10)),1),IF(B2:B10>"09:30"+0,IF(B2:B10<"16:00"+0,E2:E10)),0))
...and adjust the ranges accordingly. For efficiency, though, I would suggest that you avoid whole column references.
If the range grows over time, consider converting your data into a Table. The references will automatically adjust as rows are added or removed.
Notice that +0 is added to the time. This is to coerce the string value into a true time value.
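The idea behind the formula, sketched in Python: first throw away everything outside the time window, then take the minimum of what is left and read the time from that same row. The sample bars are invented and `time_of_low` / `as_day_fraction` are hypothetical helpers; the second one shows the kind of number Excel works with once "09:30"+0 has been coerced (a fraction of a day).

```python
from datetime import time


def as_day_fraction(t):
    """Convert a time to the fraction-of-a-day number Excel uses for times."""
    return (t.hour * 60 + t.minute) / (24 * 60)


def time_of_low(rows, start=time(9, 30), end=time(16, 0)):
    """rows: iterable of (time, low). Return the (time, low) pair with the
    lowest low strictly inside the window, or None if nothing qualifies."""
    in_window = [(t, low) for t, low in rows if start < t < end]
    if not in_window:
        return None
    return min(in_window, key=lambda pair: pair[1])


# Invented 1-minute bars: the 8:52 bar shares its low with a later bar.
rows = [
    (time(8, 52), 10.0),
    (time(9, 45), 10.5),
    (time(11, 15), 10.0),
    (time(16, 5), 9.0),
]
result = time_of_low(rows)
```

Because the duplicate low at 8:52 is removed before the match happens, the lookup can no longer land on it, which is the failure described in the question.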
EDIT
Since it looks like you don't want to convert your data into a Table, here's an alternative that uses defined names instead. Here too, the ranges will automatically adjust as rows are added/removed. Note that it uses Column B to determine the last row.
First, define the following names (change the sheet names accordingly)...
Name: LastRow
Refers to: =MATCH(9.99999999999999E+307,'Sheet 1'!$B:$B,1)
Name: TimeColumn
Refers to: ='Sheet 1'!$B$2:INDEX('Sheet 1'!$B:$B,LastRow)
Name: LowColumn
Refers to: ='Sheet 1'!$E$2:INDEX('Sheet 1'!$E:$E,LastRow)
Then try the following formula, which needs to be confirmed with CONTROL+SHIFT+ENTER...
=INDEX(TimeColumn,MATCH(SMALL(IF(TimeColumn>"09:30"+0,IF(TimeColumn<"16:00"+0,LowColumn)),1),IF(TimeColumn>"09:30"+0,IF(TimeColumn<"16:00"+0,LowColumn)),0))
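As a rough illustration of what the three defined names do together, here is a Python sketch. It assumes the times in column B are stored as numbers (as Excel does), so the "huge number" MATCH lands on the last numeric cell; the sample values and helper names are made up.

```python
def last_numeric_row(column):
    """1-based row of the last numeric cell, or None if there is none.

    Stands in for MATCH(9.99999999999999E+307, column, 1).
    """
    last = None
    for row, value in enumerate(column, start=1):
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            last = row
    return last


def dynamic_range(column, first_row, last_row):
    """Cells from first_row to last_row inclusive (1-based), like $E$2:INDEX($E:$E,LastRow)."""
    return column[first_row - 1:last_row]


# Made-up columns B and E: a text header, three data rows, then blanks.
time_col = ["Time", 0.3958, 0.3965, 0.3972, "", ""]
low_col = ["Low", 10.5, 10.2, 10.4, "", ""]

last_row = last_numeric_row(time_col)
lows = dynamic_range(low_col, 2, last_row)
```

Adding another numeric row to `time_col` moves `last_row` down by one, so both ranges grow without touching the formula.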

Excel: Flashfill Offset Horizontal + Vertical

So I'm not a fan of VBA and I recently learned that OFFSET can be used with COUNTA to flash-fill a range as far as the data extends, as long as you aim for a longer range than you have data. Now I want to be able to achieve this both for columns and rows at the same time, where the rows are averaged. Could this be done? I am banging my head against the wall to find some logic to do it, but can only manage to combine it in a way that multiplies the rows with the number of the column, which is not desired, of course.
I have posted a Minimal Reproducible Example in Excel Online:
https://onedrive.live.com/view.aspx?resid=63EC0594BD919535!1491&ithint=file%2cxlsx&authkey=!ALmV0VtFb7QZCvI
If you look at cells J9 and J11 you will see what I want to combine. I want to average the three rows from J11 down in J10, and have that spill/flash-fill to the right (as J9 and J11 already do automatically because of their formulas) for as many columns as there is data in the range A1-G4.
So I have raw data of numbers with titles in A1-G4, and by writing =OFFSET($A$1:$A$1,0,0,1,COUNTA($A$1:$EV$1)-1) in J9 I get all the titles of the columns filled from left to right, and by writing =OFFSET($A$1,1,0,COUNTA($A:$A)-1) in J11 I get the rows of the first column filled from top to bottom. They can also be combined, by writing OFFSET(Days,1,0,COUNTA($A:$A)-1,COUNTA(Days)), where "Days" is =OFFSET($A$1:$A$1,0,0,1,COUNTA($A$1:$EV$1)-1) (in a named range for readability) or OFFSET($A$1:$A$1,0,0,1,COUNTA($A$1:$EV$1)-1) without using a named range
As a thought, though I'm not sure how to implement it, maybe this could somehow be used in some form to get the column reference for the horizontal part in combination with =AVERAGE(OFFSET($A$1,1,0,COUNTA($A:$A)-1))
=MID(ADDRESS(ROW(),COLUMN()),2,SEARCH("$",ADDRESS(ROW(),COLUMN()),2)-2)
..found at https://superuser.com/questions/1259506/formula-to-return-just-the-column-letter-in-excel/1259507
Now, based on your explanation, here is the screenshot of my test:
Section A1:Exxx
I have converted that section into a Table, called «TblData», which has numerous advantages:
It expands automatically without any additional efforts/formula
We can identify Data by its Columns attributed automatically by the Table [#1], [#2],[#3], [#4], [#5]
Section J9:N9
As a replica of the table name, I have used the following formula to retrieve it:
=INDEX(TblData[#Headers],1,COLUMN(A1)) '<--- This is for J9
=INDEX(TblData[#Headers],1,COLUMN(E1)) '<--- This is for N9
Section J11:Nxx
As a replica of the Table Content, I have used the following formula to populate the content:
=INDEX(TblData,ROW($A1),MATCH(J$9,TblData[#Headers],0)) '<--- This is on J11
=INDEX(TblData,ROW($A3),MATCH(N$9,TblData[#Headers],0)) '<--- This is on N13
Section J10:N10
Now for the interesting part, the average. Here are the formulas I used for it:
=AVERAGE(TblData[1]) '<--- This is on J10
=AVERAGE(TblData[5]) '<--- This is on N10
NB: (1) Instead of averaging the content below J10:N10, I prefer to reuse the Table, as it expands automatically when more rows are added.
(2) Unless it is really necessary, I feel that replicating A1:Exxx again in J9:Nxxx is double work, because you can use the Table for whatever you need, with less maintenance.
Kindly find the file attached, updated with those items:
File Link: https://drive.google.com/open?id=1wRbpUxg0XLpfGqdvMF4fNKXDrL7xPPWs
We can correspond further below if you need more info. Hoping this helps you stretch your competence :)
Sorry, mate, I can't figure out what you want to calculate. If it makes sense to add J9+J11 then you could just concatenate the two formulas in J9 and J11 with a plus sign. After much deliberation I decided to assume that your question is not one of formula but of formula-writing - "referencing" for short. Therefore I prepared this answer for you, hoping that it will prove helpful.
Building on your named range Days I suggest you create a dynamic named range Data with this formula.
[Data] =OFFSET(Sheet1!$A$1,0,0,COUNTA(Sheet1!$A:$A),COUNTA(Sheet1!$1:$1))
The range thus defined is dynamic in both directions. However, bearing in mind that OFFSET is volatile (slows down your worksheet) you may like to keep its use limited to this one formula and perhaps start the range at A2, but I shall tempt you to break the rule. Now you can use the INDEX function to refer to the Data range.
= INDEX(Data, [Row number], [Column number])
defines a single cell. But by setting either column or row to zero you can define an entire column or row. =INDEX(Data,0,1) defines column 1 of the Data range, =INDEX(Data,1,0) defines its first row.
=INDEX(OFFSET(Data,1,0),0,1) defines the first column of a range moved down by one row from its original position. I recommend the alternative: start the Data range from A2 and perhaps declare another range for the first row if needed.
=AVERAGE(INDEX(Data,0,1)) would draw the same average you already have in your sheet, provided that Data was defined starting at A2. For fun's sake, =AVERAGE(INDEX(OFFSET(Data,1,0),0,1)) would do the same without the change in the range's definition.
=COLUMN() returns the number of the column this formula resides in. So you could enter =COLUMN()-6 in column G, copy it to the right and get a count starting from 1. (You can do the same vertically with the ROW() function.) Applied to your formula, =AVERAGE(INDEX(Data,0,COLUMN()-6)) would return the average of column 1 if entered in column G, and of columns 2, 3, 4, etc. as it is copied to the right.
As I said, I don't understand enough of your request to bring this idea to a conclusion but I think that using the method described above will provide you with a tool to copy formulas into the table your sample has at its right. If you would elaborate on your requirement I might be able to assist more.
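For readers who find the logic easier to follow outside a spreadsheet, here is a rough Python sketch of what the dynamic Data range and =AVERAGE(INDEX(Data,0,n)) do together. The grid, the function names and the sample numbers are all made up for illustration. It is not how Excel works internally, only the same idea, assuming the data sits in one contiguous block starting at A1 with titles in row 1.

```python
from statistics import mean

# Hypothetical worksheet: row 1 holds titles, the rows below hold numbers.
# None stands for an empty cell.
sheet = [
    ["Mon", "Tue", "Wed", None],
    [1, 4, 7, None],
    [2, 5, 8, None],
    [3, 6, 9, None],
    [None, None, None, None],
]

def counta(cells):
    """Count non-empty cells, in the spirit of COUNTA."""
    return sum(1 for c in cells if c is not None)

def dynamic_range(sheet):
    """Mimic OFFSET(A1,0,0,COUNTA(A:A),COUNTA(1:1)): size the block by counts."""
    n_rows = counta(row[0] for row in sheet)
    n_cols = counta(sheet[0])
    return [row[:n_cols] for row in sheet[:n_rows]]

def index_column(data, col):
    """Mimic INDEX(data,0,col): the whole column, numbered from 1."""
    return [row[col - 1] for row in data]

def column_averages(data):
    """Average every column below the title row."""
    body = data[1:]  # equivalent to defining the range from A2
    return [mean(index_column(body, c)) for c in range(1, len(data[0]) + 1)]

data = dynamic_range(sheet)
averages = column_averages(data)
print(averages)
```

The loop over column numbers plays the role of copying =AVERAGE(INDEX(Data,0,COLUMN()-6)) to the right: each copy picks the next column of the same range.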

Excel index match multiple row results

I'm stuck on an Excel problem and am hoping someone can assist. I read through 10-15 topics that are similar, but I wasn't able to get anything to work. Here is where I'm at...
I have a large data set containing columns for Year, Name, Total 1, Total 2 (and 20+ other columns). The same names appear in multiple rows based on the yearly totals. On a separate sheet, I have another data set containing Name and would like to pull the data from sheet one into columns as shown below.
I have done this in the past using only one year as the initial data set with the following formula:
=INDEX(DATARANGE,MATCH([#Name],DATARANGE[Name],0),MATCH("Total 1",DATARANGE[#Headers],0))
The problem I am having is the result of adding multiple years of data to my 1st data set. Is there a way to match the row based on name and year and then return the results of the appropriate column?
=SUM(($A$2:$A$9=B$16)*($B$2:$B$9=$A17)*($C$2:$C$9))
Enter the above in cell B14 as an array formula (confirm with Ctrl+Shift+Enter), or use the one below as a standard formula:
=SUMPRODUCT(($A$2:$A$9=B$16)*($B$2:$B$9=$A17)*($C$2:$C$9))
You can do the same for Total 2; just replace the Cs with Ds.
Then drag right and down.
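As a rough illustration of why multiplying the two comparisons works, here is a small Python sketch of the same calculation. The rows and the function name are invented for the example and stand in for A2:C9. Each comparison yields TRUE/FALSE, which behaves as 1/0 when multiplied, so only rows passing both tests contribute their value to the sum.

```python
# Hypothetical rows standing in for A2:C9: (year, name, total 1).
rows = [
    (2012, "name x", 10),
    (2012, "name y", 20),
    (2013, "name x", 30),
    (2013, "name y", 40),
]

def sumproduct_lookup(rows, year, name):
    """Multiply both tests (True/False act as 1/0) by the value, then sum."""
    return sum((r_year == year) * (r_name == name) * total
               for r_year, r_name, total in rows)

result = sumproduct_lookup(rows, 2013, "name x")
print(result)
```

Two consequences follow from this approach: if a name and year pair appears more than once the values are added together, and if no row matches the result is 0 rather than an error.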
Change the first MATCH function to something like this:
=MATCH(1,INDEX(([#Name]=DATARANGE[Name])*([#Year]=DATARANGE[Year]),0),0)
so as part of your whole formula that would be this
=INDEX(DATARANGE,MATCH(1,INDEX(([#Name]=DATARANGE[Name])*([#Year]=DATARANGE[Year]),0),0),MATCH("Total 1",DATARANGE[#Headers],0))
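To make the MATCH(1, ..., 0) trick more concrete, here is a small Python sketch of the same idea. The table contents and the function names are hypothetical. The product of the two comparisons gives one 1/0 flag per row, and searching for the first 1 returns the position of the first row where both name and year agree.

```python
# Hypothetical table standing in for DATARANGE.
table = [
    {"Year": 2012, "Name": "name x", "Total 1": 10, "Total 2": 1},
    {"Year": 2012, "Name": "name y", "Total 1": 20, "Total 2": 2},
    {"Year": 2013, "Name": "name x", "Total 1": 30, "Total 2": 3},
    {"Year": 2013, "Name": "name y", "Total 1": 40, "Total 2": 4},
]

def match_row(table, name, year):
    """Position (from 1) of the first row where both tests hold."""
    flags = [(row["Name"] == name) * (row["Year"] == year) for row in table]
    # list.index raises ValueError when nothing matches, loosely like #N/A.
    return flags.index(1) + 1

def lookup(table, name, year, column):
    """Pick the matched row first, then the requested column."""
    row = table[match_row(table, name, year) - 1]
    return row[column]

print(match_row(table, "name x", 2013))
print(lookup(table, "name x", 2013, "Total 1"))
```

Unlike the summing approach, this returns the value from the first matching row only, and it also works for text columns.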
Another way you can use for returning numbers only (as here) is like this: (with cell refs for simplicity).
=SUMPRODUCT((A2:A9=2013)*(B2:B9="name x")*(C1:D1="Total 1"),C2:D9)
If the data to be indexed is a table, then the [#Name] and [#Year] references in this formula
=MATCH(1,INDEX(([#Name]=DATARANGE[Name])*([#Year]=DATARANGE[Year]),0),0)
should be corrected to proper structured references of the form
[@Name]
Also, since this is an array formula, it may not work with structured references at all. You'd be better served with regular cell references. And if the data is not a table, only cell references will work.

Resources