(Alternate title: Why on earth doesn't Excel support user-defined formulas with parameters without resorting to VB and the problems that entails?).
[ Updated to clarify my question ]
In excel when you define a table it will tend to automatically replicate a formula in a column. This is very much like "fill down".
But ... what if you need exceptions to the rule?
In the tables I'm building to do some calculations the first row tends to be "special" in some way. So, I want the auto-fill down, but just not on the first row, or not on cells marked as custom. The Excel docs mention exceptions in computed columns but only in reference to finding them and eliminating them.
For example, first row is computing the initial value
The all the remaining rows compute some incremental change.
A trivial example - a table of 1 column and 4 rows:
A
1 Number
2 =42
3 =A2+1
4 =A3+1
The first formula must be different than the rest.
This creates a simple numbered list with A2=42, A3=43, A4=44.
But now, say I'd like to change it to be incremented by 2 instead of 1.
If I edit A3 to be "A2+2", Excel changes the table to be:
A
1 Number
2 =A1+2
3 =A2+2
4 =A3+2
Which of course is busted -- it should allow A2 to continue to be a special case.
Isn't this (exceptions - particularly in the first row of a table) an incredibly common requirement?
If you have the data formatted as a table you can use table formulas (eg [#ABC]) instead of A1 format (eg A1, $C2 etc). But there are 2 tricks to account for.
Firstly there is no table formula syntax for the previous row, instead excel will default back to A1 format, but you can use the offset formula to move you current cell to the previous row as shown below. However in this case it will return an # value error since I cant +1 to "ABC".
ABC
1 =OFFSET([#ABC],-1,0)+1
2 =OFFSET([#ABC],-1,0)+1
3 =OFFSET([#ABC],-1,0)+1
4 ....
So the second trick is to use a if statement to intialise the value, buy checking if the previous row value = heading value. If the same use the initial value else add the increment. Note assumes table is named Table1
ABC
1 =IF(OFFSET([#ABC],-1,0)=Table1[[#Headers],[ABC]],42,OFFSET([#ABC],-1,0)+1)
2 =IF(OFFSET([#ABC],-1,0)=Table1[[#Headers],[ABC]],42,OFFSET([#ABC],-1,0)+1)
3 =IF(OFFSET([#ABC],-1,0)=Table1[[#Headers],[ABC]],42,OFFSET([#ABC],-1,0)+1)
4 ....
Note you can set the initial value to be a cell outside the table to define the initial value (in say $A$1) and increment (in say $A$2) as below
ABC
1 =IF(OFFSET([#ABC],-1,0)=Table1[[#Headers],[ABC]],$A$1,OFFSET([#ABC],-1,0)+$A$2)
2 =IF(OFFSET([#ABC],-1,0)=Table1[[#Headers],[ABC]],$A$1,OFFSET([#ABC],-1,0)+$A$2)
3 =IF(OFFSET([#ABC],-1,0)=Table1[[#Headers],[ABC]],$A$1,OFFSET([#ABC],-1,0)+$A$2)
4 ....
I use this IF OFFSET combination all the time for iterating and looping in tables.
If you have alot of columns that need to determine if they are the first row you can have one column test if first row and the rest can work with a simpler if. eg ABC will give true for first row false for others, then DEF with increment the initial value
ABC DEF
1 =OFFSET([#ABC],-1,0)=Table1[[#Headers],[ABC]] =IF([#ABC],$A$1,OFFSET([#DEF],-1,0)+$A$2)
2 =OFFSET([#ABC],-1,0)=Table1[[#Headers],[ABC]] =IF([#ABC],$A$1,OFFSET([#DEF],-1,0)+$A$2)
3 =OFFSET([#ABC],-1,0)=Table1[[#Headers],[ABC]] =IF([#ABC],$A$1,OFFSET([#DEF],-1,0)+$A$2)
4 ....
Hope that helps
I don't know if you are looking for something as simple as locking down a formula. You can do that by highlighting the part of the formula you do not want to change and then hitting F4. This will absolute this section of the formila, using a $ to indicate it, and will not change as you copy/paste it down the table.
Alternately, you may be able to use Defined Names. These you can set up in the Data tab and basically assigns something to a name or variable you can then put into your formulas. These can be as simple as an easy reference for a cell on another sheet to incredibly complex multi-sheet formals.
Normally, to handle "exceptional" formula in the first row of a table consiting of several columns, you simply enter it there manually, and fill only the lines below. But if you have more "exceptional" cases scattered around, you will need another column with 0/1 values indicating where the exceptins are. And then you use if(condition, formula_if_true, formula_if_false) everywhere.
A B
Number Exceptional?
1 if(C1,42,A1+1) 0
2 if(C2,42,A2+1) 1
3 if(C3,42,A3+1) 0
As much as I love Excel, and as much as it is the best product of whole MS, it is still a weak tool. FYI, you can quiclky learn modern and poweful scripting languages, such as Ruby, here, and never be bothered by spreadsheet idiosyncrasies again.
Related
For those of us used to Microsoft Excel who switch to using Google Sheets, there are many differences which need to be taken into account.
One of the nice features in Excel that I miss is tables. If you insert a table into your Excel Spreadsheet, it does a lot of automatic things for you. You can have a single formula for one column of your table, and not have to update it whenever you add new data - whether adding a table row, or adding a row in the middle of the table.
Sometimes (though I haven't figured out why it sometimes does and why it sometimes doesn't) even without tables, Excel will suggest a formula fill as you're entering new data into a row, making copying the formula as easy as pressing "Tab".
There is no functionality in Google Sheets that matches this exactly. When you have a lot of data to enter, having to copy the formulas every single time you add a row is very tedious and time consuming and further leaves open the possibility in making a mistake when transcribing the information and copying / pasting the formulas. Any single cell could have a mistake and you won't know until it causes a problem later, then troubleshooting it will also be time consuming and difficult.
There are various questions in StackOverflow, StackExchange, Google Support and other sites that tackle this issue, but none seem to have a good solution that works for everyone. A lot of people have written an Apps Script do do just this, or use Apps Script + HTML forms as well... but it seems like that shouldn't be necessary, it adds more time & setup, and ends up with a specific solution for that sheet and that sheet only.
So, how can you replicate this behavior in Google Sheets so you don't have to keep copying & pasting your formulas over and over again and save yourself time (and your company money) and make Google Sheets act more like Excel?
BACKGROUND
There is a Google Support Thread on Inserting new Rows which suggests the use of ARRAYFORMULA to do this job. It is not an exact replacement for Excel's functionality, but it can work in most applications. There are other functions that output arrays, such as SEQUENCE which can also be applied similar to these examples depending on the situation, but I'll focus on ARRAYFORMULA here as it's the most generic and MOST functions can be wrapped in it and otherwise behave as you'd expect.
Here is also a link to an ARRAYFORMULA & MMULT Example Provided by Google (Note that this link will make a copy of the sheet, not let you directly access the example). The first tab is all about matrix multiplication, the second and later tabs have examples using ARRAYFORMULA.
The examples above are pretty limited in scope, so let's expand on those. To illustrate, I will use a basic formula involving 4 columns as an example. Let's say we have data in columns A, B, and C, and we want to do a relatively simple formula between them. Let's assume row 1 is being used as a header row, and your data is from row 2 down, as most people would do. Let's make the formula simple, but a little interesting, by having column D equal to the PREVIOUS value of A plus the product of B and C. Let's also assume we currently have 12 rows of data, but we know we will have data we need to enter in the future. Most of that data will get entered at the end, but sometimes we may need to add data in the middle of the range.
You can follow along with my Publicly Posted Example Sheet Here if you want (this will also create a copy on your drive so you can make changes and follow along). Each example below corresponds to a tab in the Example Sheet.
EXAMPLE: FORMULA ON EVERY ROW
In it's simplest form, the formula in D2 would be = A1 + B2 * C2. Except, of course, we know A1 is a text header and if we include that we'll get an error. It's also commonly understood that absolute references (with $) execute faster in Google sheets, and we don't need relative references on the columns (but rows are necessary to fill), so let's modify cell D2 as follows:
=IF( ISNUMBER($A1), $A1, 0 ) + $B2 * $C2
Then fill down to cell D13 (this is already done in the example).
So now you have your current data... but what if you need to add data?
If you add data to row 14, in columns A, B, and C, you then also have to copy the formula to D14. Easy peazy for this example, but what if you have 30 columns, 5 of them with formulas and you add another 10 entries to the list every day? This becomes very tedious. You can avoid entering it for every row, but filling down the number of rows you need today and save a little time, but it breaks your flow of data entry.
Even worse, what if the entries are in some sort of order (e.g. order of date data was captured) and you get old data that needs to be entered in the middle of the range? You can add at the end and copy, then sort.
Some sheets won't let you sort, or won't sort correctly if you have certain data, so you may need to insert in the middle... let's say between rows 8 and 9. If you did this in an Excel table, and used "insert row" it would automatically populate cell D9 with your formula.
But here, when you add this new row 9 not only is D9 blank and need you to enter the formula, but now the A column reference in cell D10 is pointing to A8 instead of A9 where it should! So you have to recopy / refill your equation to cell D10 as well - and this is easy to miss - you may not know to do it, or forget to do it, and now your formulas are broken.
... Now, to be honest, Excel didn't get this part right, either.
Somehow, it properly fills D9 in with the correct formula but botches
D10 with a reference to A8, but then continues with a correct reference to A10 in D11. Which is almost worse because since D9 was filled and all the other rows are correct, you may not realize you have a problem in D10...
This is basic spreadsheet use and is roughly the same behavior as using Excel WITHOUT Tables (except those instances where it decides to make suggestions for you) - so par for the course here if Excel didn't have the Table or suggestion ability.
Pros:
Simplest formula to implement
Works fine in fixed size sheets or sheets that don't change often
Tried and true, Will always work
Cons:
Have to copy formula to every new row you make
Very tedious for "living documents" that change often
If any formulas cross between rows, the pattern breaks when you insert a row
in between and you have to copy your formula to the row below as
well as your new one
With all the additional required repeated actions, it's very easy to make a mistake
Since the mistake could be in a single cell, finding the mistake after the fact can be difficult
EXAMPLE: CLOSED RANGE ARRAYFORMULA
Google support touts this as the best method. Indeed, if you want your formulas to update automatically when you add data in between and you want the least amount of computation time, then an ARRAYFORMULA with a limited (or "closed") range is the best solution.
To use ARRAYFORMULA, you put the formulas only in your top row of data (in this example, row 2). What makes this example closed range is that we will set it to cover exactly the data we have. So, the formula in D2 would be:
=ARRAYFORMULA( IF( ISNUMBER( $A$1:$A$12 ), $A$1:$A$12, 0 ) + $B$2:$B$13 * $C$2:$C$13 )
Here, we can (and I recommend) use all absolute references as the range we're using doesn't change as the cell row it's calculating changes. When you enter this formula, you will see it automatically populate D3 through D13 with the correct data as well.
If we want to add another row in the middle, it's easy. Taking the previous example, if we add a row between rows 8 and 9, you will see the formula in D2 has changed all the last rows - 12 is now 13, and 13 is now 14. When you enter data into columns A, B, and C in the new row 9, it automatically calculates correctly in D9.
When you look at the data in rows in column D (except D2), however, it shows the number itself in the formula bar - so someone looking at this sheet unaware there is an ARRAYFORMULA in use has no indication that it's an ARRAYFORMULA and overwriting ANY cell that was populated by ARRAYFORMULA will break the formula, give you an error in D2 and leave the rest of the values in the column blank. This is true for all methods using ARRAYFORMULA So, for that reason, I recommend you make your column a protected range!
Alternate: You could name all of your ranges. For example, $A$1:$A$12 could be col_A_prev, $B$2:$B$12 could be col_B, and $C$2:$C$12 could be col_C. Which gives the formula:
=ARRAYFORMULA( IF( ISNUMBER( col_A_prev ), col_A_prev, 0 ) + col_B * col_C )
The behavior would be identical. When you add a row in between, the named ranges will automatically expand to include it. You could also use the same ranges for your column protection to ensure no data is written over.
Note: I do want to give kudos where it is due. Google Sheets handles named ranges WAY better than Excel. When you add or remove rows / columns inside your named range in Google it automatically expands the range - and Google actually allows you to use the named ranges as references in any of the settings (conditional formatting, protection, etc.). While you can enter a named range in Excel for some of these, it will convert it to R/C references which won't change even if your range changes later. If you want to add to the ends or you move rows / columns in your named range - well, they're both still terrible at that
However, if we want to add new data to the end, in row 14 or after, this arrayformula will not automatically update.
Even worse, if you add a row between rows 12 and 13, it breaks the formula - as the references to columns B and C will update, but the references to column A will not - because A only went to row 12. In row 14 you now get the error:
Array arguments to ADD are of different size.
Because you're trying to add an array with 12 elements to an array with 13 elements. Admittedly, this is only a problem if you're referencing other rows which isn't that common across all useful spreadsheets. However, there are many practical reasons to do so, like cumulative sums.
So, either you have to deal with updating your ARRAYFORMULA columns each time you add data to the end (which doesn't make it much better than just copying your formulas to each row) or, you could basically make the last two rows "dummy rows" that you don't care about and add protection to those rows so they can't be edited or a row added between them, with perhaps a note saying "To add new data, insert a row above this line" so other people using it know what they have to do.
Pros:
Relatively simple formula to implement
Fastest Execution time
Will automatically adjust formula to any rows added in the middle
Can manage your ranges as named ranges
Cons:
Have to change the formula if you add any new data to the bottom (which is where you usually add new data) -OR- you have to implement one or more blank rows included in range with protection & reminders to ensure no one adds data to the bottom
Data below ARRAYFORMULA looks like just number entries and could easily confuse people into thinking it's not a formula entry and overwrite it without thinking.
EXAMPLE: OPEN RANGE ARRAYFORMULA
If you're following along in the example sheet, the first thing you'll note is this sheet doesn't do the same thing. It is simply using the CURRENT value in column A, rather than the previous row. This is because you CAN'T reference a previous row with this method (see a couple paragraphs down for why). To compensate, I forced A, B, and C to 0 in the first row and added another row to the bottom.
This is similar to the closed range example in its application of ARRAYFORMULA the difference here, is instead of having a fixed end to the ranges (rows 12 & 13 above), you leave the range open by using just the column letter at the end of the range, which references the last row of the column. So the equation in D2 now looks like this:
=ARRAYFORMULA( IF( ISNUMBER( $A$2:$A ), $A$2:$A, 0 ) + $B$2:$B * $C$2:$C )
The reason you can't reference a previous row's cell is if we used $A$1:$A here, that array would always have one more element than either $B$2:B or $C$2:$C and thus won't be able to add and will result in the error:
Result was not automatically expanded, please insert more rows (1).
Except inserting more rows won't work because the ranges will all expand by 1 also. Again, this is only a problem if you need to reference other rows which isn't common but is useful for things like cumulative sums.
When it comes to adding rows, though, this method is the best. Whether you are adding to the middle or the end of your data, it will automatically update the values in your ARRAYFORMULA columns.
Alternate: Same as with closed ranges, you could name all of your ranges. For example, $A$1:$A could be col_A_prev, $B$2:$B could be col_B, and $C2:$C could be col_C. Which gives the same formula as with closed range:
=ARRAYFORMULA( IF( ISNUMBER( col_A_prev ), col_A_prev, 0 ) + col_B * col_C )
So if you're not referencing previous rows, or if you just add a top "dummy" row like I did in the example, it's all good... easy peazy lemon squeezy, right?
Yes, at least at first. The other problem here is that open ranges are computationally intense for Google Sheets algorithms. As you add more and more rows, especially if you have multiple open range ARRAYFORMULA columns, the sheet calculations get slower and slower and slower. The sheet I was working on that prompted this had 21 columns, 8 of which had ARRAYFORMULA formulas in row 2. At around 200 rows of data (not that much in the world of spreadsheets) it was taking MINUTES to calculate with each and every change I was making. That's simply not useable - I almost went back to copying the formula to each row. (It's possible using named ranges may improve the speed some - I didn't try it on that sheet)
So this solution doesn't really work for big (but not even that big) spreadsheets where you have lots of formulas.
Also, a more minor gripe - you'll notice in the example that every row on the spreadsheet was now populated in column D, even where no data was entered. That's annoying, but not a sheet killer by any means - and you could add an IF statement to the ARRAYFORMULA to just output "" whenever you have no data in one or more data columns.
Pros:
Relatively simple and straight forward formula to implement
"Works" with any number of rows
Automatically includes any rows that are added - on the end or in between
Can manage with named ranges
Cons:
Cannot reference data from previous rows
Extremely slow - computation time goes up with every added row (& every added column with an open reference)
Data below ARRAYFORMULA looks like just number entries and could easily confuse people into thinking it's not a formula entry and overwrite it without thinking.
EXAMPLE: HYBRID ARRAYFORMULA
Are you ready to give up on Google Sheets yet?
Well, there is one more option. It gets complicated and involved, but IMO works better in most situations than any of the above examples.
What I do here is add a cell with a formula for the number of rows in the sheet that have data in a certain column. Let's just say column A for this example. That formula looks like this:
= ARRAYFORMULA( MAX( IF( LEN($A:$A), ROW($A:$A), ) ) )
This, in and of itself, is an open ranged formula. It scans everything in column A and returns the last row that has SOMETHING in it. But it's one single formula in one cell reporting 1 value - no other cells get populated from it. It's relatively computationally intense for this one cell, but it's just one cell in the entire sheet.
Then, to make sure that any changes you make (adding / removing rows or columns) do not affect any references to that cell, name it. In the example provided, this is named last_example_row.
I also strongly recommend that you add protection to last_example_row so it's not accidentally changed. Extra tip: you can actually set both sets of permissions: "Only You can edit" and "show a warning when editing" so even if you try to edit it accidentally it will give you the chance to cancel the edit.
Since it's not a piece of data you need visually, hiding it is also a good idea (I left it unhid in the example so you can easily see the formula)
Now, in order to use the value in last_example_row as part of our ranges, we have to use the INDIRECT function. We replace every open-ended instance in the previous example with a specific INDIRECT call.
For calls to the same row, for example, we replace with a pattern like this:
$B$2:$B is replaced with $B$2:INDIRECT( "$B$" & last_example_row )
so it ends on the last used row.
For calls to the previous row, we replace with a pattern like this:
$A$1:$A is replaced with $A$1:INDIRECT( "$B$" & ( last_example_row - 1 ) )
so it ends 1 row before the last used row.
So the final equation becomes this monstrosity:
=ARRAYFORMULA( IF( ISNUMBER( $A$1:INDIRECT( "$A$" & ( last_example_row - 1 ) ) ), $A$1:INDIRECT( "$A$" & ( last_example_row - 1 ) ), 0 ) + $B2:INDIRECT( "$B$" & last_example_row ) * $C2:INDIRECT( "$C$" & last_example_row ) )
So it's a closed range reference that points to a single open range calculation, and it works. Whether you add data in the middle or to the end, it automatically calculates your column for you - and it only populates rows where your data is also populated.
Since it only does the open range calculation ONCE, then uses that value in all remaining closed range calculations, this is much, much faster than the open range example above. It IS slower calculating than the first two examples, however - but I haven't yet hit the point in my real sheets where the delay has made it unusable (stay tuned as I add more data to my sheets over time). If anyone reading this has hit that point with this method, please let me know how many columns & rows you got to, including how many of the columns used an ARRAYFORMULA like this.
Unfortunately, however, since this method requires an INDIRECT call, you cannot use named ranges to accomplish this.
Pros:
Most flexible option
"Works" with any number of rows
Automatically includes any rows that are added - on the end or in between
Much Faster than completely open references
Cons:
Formulas are complex, hard-to-follow, and easy to make a mistake while entering
Slower than closed references - computation still time goes up with every added row and every added column with these "hybrid" references
Data below ARRAYFORMULA looks like just number entries and could easily confuse people into thinking it's not a formula entry and overwrite it without thinking.
Cannot manage with named ranges
Epilogue
Maybe (hopefully) someday Google will add a feature that will keep track of your formulas and execute them in a speedy way and this post will be obsolete. Until then, I hope this post helps someone out there.
Additional Note
Using any of the ARRAYFORMULA methods above can break sorting. If you add filters, and the sort by A->Z or Z->A on a particular column and row 2 is no longer row 2 - then your ARRAYFORMULA gets moved to whatever row it gets sorted to - and then only applies from that row down. Rows above it will be blank in all your ARRAYFORMULA columns. This is very disappointing to me. One way around it (that I don't like) is you can make row 2 a "dummy" row where whatever columns you may sort by have values that will always make it the top row. That's a pretty ugly solution, though.
You can make it less "ugly" by hiding row 2. Then columns will sort fine and you won't see any of the dummy data ("dummy" data may not even be necessary as the hidden row shouldn't sort with the rest). The caveat here is if you share it with multiple users - they won't even see there is a formula being used, it looks like all manual entries - and if one gets overwritten, it will break the ARRAYFORMULA. So, I would recommended protecting the ARRAYFORMULA columns, as well.
Suppose you have an ordered, indexed list of positive values. These positive values are interrupted by 0 values. I want to determine if a consecutive sub-array exists which is not interrupted by 0 values and whose sum exceeds a certain threshold.
Simple example:
Index, Value
0 0
1 0
2 3
3 4
4 2
5 6
6 0
7 0
8 0
9 2
10 3
11 0
In the above example, the largest consecutive sub-array not interrupted by 0 is from index 2 to index 5 inclusive, and the sum of this sub-array is 15.
Thus, for the following thresholds 20, 10 and 4, the results should be FALSE, TRUE and TRUE respectively.
Note I don't necessarily have to find the largest sub-array, I only have to know if any uninterrupted sub-array sum exceeds the defined threshold.
I suspect this problem is a variation of Kadane's algorithm, but I can't quite figure out how to adjust it.
The added complication is that I have to perform this analysis in Excel or Google Sheets, and I cannot use scripts to do it - only inbuilt formulas.
I'm not sure if this can even be done, but I would be grateful for any input.
Start with
=B2
in c2
then put
=IF(B3=0,0,B3+C2)
in C3 and copy down.
EDIT 1
If you were looking for a Google sheets solution, try something like this:
=ArrayFormula(max(sumif(A2:A,"<="&A2:A,B2:B)-vlookup(A2:A,{if(B2:B=0,A2:A),sumif(A2:A,"<="&A2:A,B2:B)},2)))
Assumes that numbers in column B start with zero: would need to add Iferror if not. It's basically an array formula implementation of #Gary's student's method.
EDIT 2
Here is the Google Sheets formula translated back into Excel. It gives you an alternative if you don't want to use Offset:
=MAX(SUMIF(A2:A13,"<="&A2:A13,B2:B13)-INDEX(SUMIF(A2:A13,"<="&A2:A13,B2:B13),N(IF({1},MATCH(A2:A13,IF(B2:B13=0,A2:A13))))))
(entered as an array formula).
Comment
Maybe the real challenge is to find a formula that works both in Excel and Google sheets because:
Vlookup doesn't work the same way in Excel
The offset/subtotal combination doesn't work in Google sheets
The index/match combination with n(if{1}... doesn't work in Google sheets.
With data in columns A and B, insure column B end with a 0. Then in C2 enter:
=IF(AND(B3=0,B2<>0),SUM(B$1:$B2)-MAX($C$1:C1),"")
and copy downwards:
Column C lists the sums of consecutive non-zeros. In another cell enter something like:
=MAX(C:C)>19
where 19 is the criteria value.
You can avoid the "helper" column by using a VBA UDF.
EDIT#1:
Use this instead:
=IF(AND(B3=0,B2<>0),SUM(B$1:$B2)-SUM($C$1:C1),"")
Thanks to #Tom Sharpe and #Gary's Student for answering the question.
While I admittedly did not specify this in the question, I would prefer to achieve the solution without a helper column because I have to do this operation on 30+ successive columns. I just didn't think it was possible in Excel.
Full credit goes to user XOR LX on the Excelforum for coming up with this solution. It has blown my mind and took me the better part of an hour to wrap my head around, but it is certainly very creative. There is no way I could have come up with it myself. Re-posting it here for the benefit of everyone who is looking into this.
Copy and paste the table from my initial question into an empty Excel sheet such that the headers appear in (A1:B1) and the values appear in (A2:B13).
Then enter this formula as an array formula (ctrl+shift+enter), which gives the max of the sums of all the uninterrupted sub-arrays:
=MAX(SUBTOTAL(9,OFFSET(B2,A2:A14,,-FREQUENCY(IF(B2:B13,A2:A13),IF(B3:B14=0,A2:A13,0))-1)))
Note the deliberate offset to include one additional row below the end of the dataset.
EDIT: I have revived the source data source to remove the ambiguity of my last screen shots
I am trying to transpose spreadsheet data where there are many rows where the customer name may be duplicated but each row contains a different product.
For instance
revised original data source
to
revised proposed data format
I would like to do it with formulae if possible as I struggle with VB
Thank you for any help
I realise this is a huge answer, apologies but I wanted to be clear. If you need anything from me, drop me a comment and I'll help out.
Here's the output from my formula:
EDITED ANSWER - Named ranges used for ease of understanding:
These are just an example of a few of the named ranges I have used, you can reference the ranges directly or name them yourself (simplest way is to highlight the data then put the name in the drop down next to the formula bar [top left])
Be wary that as we will be using Array formulas for AccNum and AccType, you will not want to select the entire column and instead opt for either the exact data length or overshoot it by 100 or so. Large array formulas tend to slow down calculation and will calculate every cell individually regardless of it being empty.
First formula
=IF(COUNTIF(D2:D11,">""")>0,CONCATENATE("Account Number ",LEFT((COLUMN(A:A)+1)/2,1)),"")
This formula is identical to the one in the original answer apart form the adjusted heading title.
=IF(Condition,True,False) - There are so many uses for the IF logic, it is the best formula in Excel in my opinion. I have used to IF with COUNTIF to check whether there is more than 0 cells that are more than BLANK (or ""). This is just a trick around using ISBLANK() or other blank identifiers that get confused when formula is present.
If the result is TRUE, I use CONCATENATE(Text1,Text2,etc.) to build a text string for the column header. ROW(1:1) or COLUMN(A:A) is commonly used to initiate an automatically increasing integer for formulas to use based on whether the count increase is required horizontally or vertically. I add 1 to this increasing integer and divide it by 2 so that the increase for each column is 0.5 (1 > 1.5 > 2 > 2.5) I then use LEFT formula to just take the first digit to the left of this decimal answer so the number increases only once every 2 columns.
If the result is FALSE then leave the cell blank ,""). Standard stuff here, no explanation needed.
Second Formula
=CONCATENATE(INDEX(Forename,MATCH(Sheet4!$A2,Reference,0)))
=CONCATENATE(INDEX(Surname,MATCH(Sheet4!$A2,Reference,0)))
CONCATENATE has only been used here to force blank cells to remain blank when pulled by INDEX. INDEX will read blank cells as values and therefore 0's whereas CONCATENATE will read them as text and therefore "".
INDEX(Range,Row,Column): This is a lookup formula that is much more advanced than VLOOKUP or HLOOKUP and not limited in the way that they are.
The range i have used is the expected output range - Forename or Surname
The row is then calculated using MATCH(Criteria,Range,Match Type). Match will look through a range and return the position as an integer where a match occurs. For this I have set the criteria to the unique reference number in column A for that row, the range to the named range Reference and the match type as 0 (1 Less than, 0 Exact Match, -1 Greater than).
I did not define a column number for INDEX as it defaults to the first column and I am only giving it one column of data to output from anyway.
Third Formula
Remember these need to be entered as an array (when in the formula bar hit Ctrl+Shift+Enter)
=IFERROR(INDEX(AccNum,SMALL(IF(Reference=Sheet4!$A2,ROW(Reference)-ROW(INDEX(Reference,1,1))+1),ROUNDDOWN((COLUMN(A:A)+1)/2,0))),"")
=IFERROR(INDEX(AccType,SMALL(IF(Reference=Sheet4!$A2,ROW(Reference)-ROW(INDEX(Reference,1,1))+1),ROUNDDOWN((COLUMN(B:B)+1)/2,0))),"")
As you can see, one of these is used for AccNum and the other for AccType.
IFERROR(Value): The reason that this has been used is that we are not expecting the formula to always return something. When the formula cannot return something or SMALL has run out of matches to go through then an error will occur (usually #VALUE or #NUM!) so i use ,"") to force a blank result instead (again standard stuff).
I have already explained the INDEX formula above so let's just dive in to how I have worked out the rows that match what we are looking for:
SMALL(IF(Reference=Sheet4!$A2,ROW(Reference)-ROW(INDEX(Reference,1,1))+1),ROUNDDOWN((COLUMN(B:B)+1)/2,0))
The IF statement here is fairly self explanatory but as we have used it as an array formula, it will perform =Sheet4!$A2 which is the unique reference on every cell in the named range Reference individually. In your mock data this returns a result of: {FALSE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE} for the first entry (I included titles in the range, hence the initial FALSE). IF will do my row calculation* for every true but leave the FALSEs as they are.
This leaves a result of {FALSE;2;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE} that SMALL(array,k) will use. SMALL will only work on numeric values and will display the 'k'th result. Again the column trick has been used but to cover more ground, I used another method: ROUNDDOWN(Number,digits) as opposed to using LEFT() Digits here means decimal places so I used 0 to round down to a whole integer for the same result. As this copies across the columns like so: 1, 1, 2, 2, 3, 3, SMALL will alternatively (as the formulas alternate) grab the 1st smallest AccNum then the 1st Smallest AccType before grabbing the 2nd AccNum and Acctype and so forth.
*(Row number of the match minus the first row number of the range then plus 1, again fairly common as a foolproof way to always get the correct row regardless of where the data starts; actually as your data starts on row 1 we could just do ROW(Reference) but I left it as is incase you had data in a different format)
ORIGINAL ANSWER - Same logic as above
Here's your solution in 3 parts
Part 1 being a trick for the auto completion of the titles so that they will hide when not used (in case you will just copay and paste values the whole lot to speed up use again).
=IF(COUNTIF(C2:C11,">""")>0,CONCATENATE("Product ",LEFT((COLUMN(A:A)+1)/2,1)),"") in C
=IF(COUNTIF(D2:D11,">""")>0,CONCATENATE("Prod code ",LEFT((COLUMN(B:B)+1)/2,1)),"") in D
Highlight both of the cells and drag across to stagger the outputs "Product " and "Prod code "
Part 2 would be inputting the unique IDs to the new sheet, I would suggest copying your entire column A across to a new sheet and using DATA > REMOVE DUPLICATES > Continue with current selection to trim out the multiple occurrences of unique IDs.
In column B use =INDEX(Sheet2!$B$1:$B$7,MATCH(Sheet4!$A2,Sheet2!$A$1:$A$7,0)) to get the names pulled across.
Part 3, the INDEX
Once again, we are doing a staggered input here before copying the formula across the page to cover the entirety of the data.
=IFERROR(INDEX(Sheet2!$C$1:$D$11,SMALL(IF(Sheet2!$A$1:$A$11=Sheet4!$A2,ROW(Sheet2!$A$1:$A$11)-ROW(INDEX(Sheet2!$A$1:$A$11,1,1))+1),ROUNDDOWN((COLUMN(A:A)+1)/2,0)),1),"") in C
=IFERROR(INDEX(Sheet2!$C$1:$D$11,SMALL(IF(Sheet2!$A$1:$A$11=Sheet4!$A2,ROW(Sheet2!$A$1:$A$11)-ROW(INDEX(Sheet2!$A$1:$A$11,1,1))+1),ROUNDDOWN((COLUMN(B:B)+1)/2,0)),2),"") in D
The formulas of Part 3 will need to be entered as an array (when in the formula bar hit Ctrl+Shift+Enter) . This will need to be done before copying the formulas across.
These formulas can now be dragged / copied in all directions and will feed off of the unique ID in column A.
My Answer is already rather long so I haven't gone on to break the formula down. If you have any trouble understanding how this works, let me know and I will be happy to write up a quick guide, breaking it down chunk by chunk for you.
I am trying to create multiple if and or conditions in one cell so that if they are true the cell value changes.
(In case this is relevant when you read the code I tried, I am referencing this information from a separate spreadsheet. Also, some translating: si=if, ou=or. Suggestions with English functions are totally fine though.)
My aim is to for a "1" numerical value to come back if (1) c3="VSTM", (2) j3="i", (3) if b3="1" or "3" and if i3 = 1 or 4 then value of cell becomes 1; if b3="2" or "4" and if i3 = 2 or 3 then value of cell becomes 1.
Finally h3 value is simply added to the final value from the if conditions.
I tried the following but it didn't work:
=si([Coded_Diary_Results2.xlsx]Feuil1!$C$3="VSTM",si([Coded_Diary_Results2.xlsx]Feuil1!$J$3="i",si([Coded_Diary_Results2.xlsx]Feuil1!$B$3="1",si(ou([Coded_Diary_Results2.xlsx]Feuil1!$I$3="1","4")))))=1
One problem still is even after I get this to work, I then want a kind of loop to exist which still checks b3 and c3 as normal but after checking i3 and j3 it now moves 7 columns down to check p3 (instead of i3) and q3 (instead of j3) for the same information with p3 checking for numbers 2 and 3, or 1 and 4 based on b3, whilst q will still search for only "i" as the value in the cell. I would like this loop to continue 16 times so then it checks cells w3 and x3, ad3 and ae3, ak3 and al3, ar3 and as etc.
Finally as an additional step, if possible but not necessary, after all the steps have been executed in row 3, it would be very useful if all the same steps can then be executed until row 69. For more information if this is useful, this data is from a study and each row is a participant so this is why the aim would be for all the steps to be repeated with all the participants (i.e., rows).
Any suggestions of whether it is possible to apply this idea and how my initial attempt could be improved or whether a totally different approach should be taken ?
Although your IF statement was almost correct, I found a few issues with it:-
(1) Syntax - the final IF only had one parameter
(2) The final OR wasn't quite right
(3) Need another OR in the third IF statement
(4) It only works if the numbers in I3 and B3 are stored as text - if they are stored as numbers, you don't want the quotes round "1","3" and "4".
(5) With the '=1' at the end, it will display TRUE or FALSE - this is probably OK, because you are going to have to combine it with another formula to get your second set of conditions.
So your formula will look like this (I have taken out the references to the other workbook to clarify it and allow me to test it, and used AltEnter to separate the lines):-
=IF($C$3="VSTM",
IF($J$3="i",
IF(OR($B$3="1",$B$3="3"),
IF(OR($I$3="1",$I$3="4"),1))))=1
From what you have said, you need B and C to be absolute references (so keep the $ signs in front of them) but I and J to be relative references (so remove the $ signs). That means if you copy and paste your formula 7 columns to the right, it will automatically change I and J to P and Q.
If you take out the $ signs before the numbers (e.g.$B$3 becomes $B3), you should be able to copy down or pull down the formulae to the next row and it will update correctly.
I get the feeling this is close to impossible, but here goes. I have data structured like this:
Test Group 1
pass
fail
pass
fall
Test Group 2
pass
fail
Test Group 3
pass
fail
fail
pass
I want to be able to paste it into Excel and have Excel summarise the percentage of each test group. So it would end up looking like this:
Test Group 1 Percentage Pass: 50%
pass
fail
pass
fall
Test Group 2 Percentage Pass: 50%
pass
fail
Test Group 3 Percentage Pass: 50%
pass
fail
fail
pass
BUT as you can see the test group length is not set, and it may vary. I was hoping is that I might be able to create a formula that follows this logic:
if( A<n> contains "Test")
count A<n> +1 until A<n> contains "Test"
It seems like I'm asking a lot from Excel formulas. I have spent some of today writing a small C# app that will split the configs into separate files so that I could copy and paste separate files into Excel. But it would be great to have fewer steps!
--- UPDATE ----
There were three very interesting approaches to this problem proving nothing is impossible :p
I wanted a solution that allowed me to copy and paste results in and see a load of percentages pop up so I was always sort of looking for a formula solution HOWEVER, please take the time to look at pnuts and Jerrys answers as they reveal some useful features of Excel!
chuffs answer was the one I was looking for and it worked out of the box. For anyone who wants to delve deep into how it works and why, I broke the formula down into steps and filled in some help information. The key to these formulas is combining MATCH, OFFSET and the slightly more obvious SEARCH / FIND / LEFT (I used to use IFERROR(FIND type approach, LEFT seems cleaner :) )
Do look at the documentation for these formulas but to see it all in action with some examples see the Google Spreadsheets I created detailing chuffs answer:
https://docs.google.com/spreadsheet/ccc?key=0AqODI11eAjtldDhDd2dBcFhpZW9SXzEybGtMUWMwM3c#gid=0
---P.S---
For the record I did actually create a C++ program to prettify my data and ouput it as .csv files. If I had had this info I wouldn't have bothered but am glad I tried both routes, it was a fun learning adventure.
A single, array-formula solution to this problem is possible. It requires only that a stop value "Test" be added at the bottom of your column of data.
Assuming your data are in the range A2:A18, here is the formula that would compute the average in cell B2:
=IF(LEFT(A1,4)="Test",
COUNTIF(OFFSET(A1,1,0,MATCH("Test*",$A2:$A$18,0)-1,1),"pass")
/ COUNTA(OFFSET(A1,1,0,MATCH("Test*",$A2:$A$18,0)-1,1)),
"")
The key parts of the formula are expressions that calculate the ranges for the two count functions, COUNTIF and COUNTA.
The COUNTIF function - which counts the number of passes in a group - takes two arguments, the range for the count and the condition to be met by cells counted.
I use the OFFSET function to provide the count range. OFFSETtakes five arguments:
An anchor cell (if a range is provided, the function only uses the top left cell in the range).
The number of rows below (+) or above (-) the anchor cell that the range to be returned will begin.
The number of columns to the right (+) or left (-) of the anchor cell for the start of the return range.
The number of rows to return.
The number of columns to return.
For example, OFFSET(A1,5,2,4,1) will return the range C6:D9.
In the formula, the anchor cell is A1, the row offset is 1 and the column offset is 0, in other words, the start of the range to return is A2.
The number of rows to return is computed by the MATCH function. For the calculation of the average for the first test group, MATCH looks in the range A2:A18 for the row of the first cell that starts with the word 'Test'. A2 is the cell in the pass/fail column just below the cell that starts the test group.
The last cell in the column is A18, which has been added to the data and contains the word 'Test'. This prevents the formula from returning an error for the last test group. Otherwise, the match would return an error because it could not find another occurrence of Test beyond the start of the final group.
Also note that the anchor points for the match range, i.e., $A2:$A$18, leaves unanchored the row reference to the top of the range (the 2 in A2). That's so the range will adjust downward as the formula is copied down so as to find the next, not the previous, occurrence of 'Test'.
Finally, I decrease the number of return rows by 1 to exclude the Text that MATCH found, which belongs to the next group.
The COUNTA functions - which counts the total number of passes and fails in a group - uses the same OFFSET/MATCH expression to get the range for the group.
This is an array formula and so must be entered with the Control-Shift-Enter key combination. Copy the formula down to the bottom of the data (excluding the cell with the 'Test' stop value) to calculate the averages for each group.
For convenience, here is the formula without the breaks shown above. It can be directly pasted into the worksheet (remembering to confirm it with Control-Shift-Enter).
=IF(LEFT(A1,4)="Test",COUNTIF(OFFSET(A1,1,0,MATCH("Test*",$A2:$A$18,0)-1,1),"pass")/COUNTA(OFFSET(A1,1,0,MATCH("Test*",$A2:$A$18,0)-1,1)),"")
Basically Subtotal will meet the requirement, provided some fairly tedious layout adjustments are acceptable.
Preparation:
Assuming Test Group 1 is in A1, insert Row1 and ColumnA, with in A2 =IF(LEFT(B2,1)="T",B2,"") and in C2 =IFERROR(MATCH(B2,{"fail","pass"},0)-1,"") and both formulae copied down as required. (And change fall to fail in source!)
Subtotal:
Select A:C, Data > Outline – Subtotal, At each change in: (1) Test Group 1, Use function: Average, Add subtotal to: check (Column C), uncheck Summary below data, OK.
Tidy up:
1. Move A3:C3 to A1 and A2:C2 to A3.
2. Filter A:C and for ColumnB, Text Filters, Contains add Test, OK.
3. In C3 put ="Percentage Pass: "&C4*100&"%" and copy down to last row numbered in blue (should show #DIV/0!).
4. Highlight A:C for all the rows numbered in blue and embolden.
5. Hide ColumnA and, optionally, hide or delete Rows1:2.
6. Take off filter selection.
Hopefully with preparation as on the left the result would be similar to as shown on the right:
Example re comment to #Jerry's answer:
There are many ways to tackle this -- one possible easy solution is to add a column next to your pass/fail column that says (assuming column A):
=IF(A1="pass",100,0)
if you do that, then the average of those values would be equal to the percentage pass.