I have an Excel file with data in columns A, B and C.
I want to find the minimum of column C, but only for rows where the value in column A equals the value in column B.
How can I perform this operation?
You can accomplish this with an array formula¹.
=MIN(IF(A2:A34=B2:B34, C2:C34))
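As a rough illustration of what this array formula computes (a Python model, not Excel itself), the hypothetical function `min_where_equal` below assumes the three columns have already been read into equal-length lists. Non-matching rows are dropped, just as IF turns them into FALSE values that MIN ignores, and an empty result gives 0, which is what MIN returns when it has no numbers to work with. One difference to keep in mind: Python's == is case-sensitive for text, whereas Excel's = comparison is not.

```python
from typing import Sequence


def min_where_equal(a: Sequence, b: Sequence, c: Sequence) -> float:
    """Hypothetical model of =MIN(IF(A2:A34=B2:B34, C2:C34)).

    IF keeps the C value only on rows where A equals B; every other row
    becomes FALSE, which MIN ignores. MIN of no numbers is 0 in Excel.
    """
    matches = [
        cv
        for av, bv, cv in zip(a, b, c)
        # keep numeric C values only (bool is excluded because MIN ignores TRUE/FALSE)
        if av == bv and isinstance(cv, (int, float)) and not isinstance(cv, bool)
    ]
    return min(matches) if matches else 0
```

For example, with A = 1, 2, 3, 4, B = 1, 5, 3, 4 and C = 10, 1, 7, 9, rows 1, 3 and 4 qualify and the result is 7.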
Array formulas should never use full-column references. If the column of numbers grows and shrinks occasionally, use the following to adjust the number of cells referenced dynamically. MATCH(1e99, C:C) returns the row of the last number in column C, because 1e99 is larger than any value the column is likely to contain.
=MIN(IF(A2:INDEX(A:A, MATCH(1e99, C:C))=B2:INDEX(B:B, MATCH(1e99, C:C)), C2:INDEX(C:C, MATCH(1e99, C:C))))
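The MATCH(1e99, C:C) trick can be modelled the same way. The sketch below is a hypothetical Python approximation, not Excel's implementation: `last_number_row` returns the 1-based row of the last numeric cell in a column (skipping text and blanks), and `dynamic_range` mimics C2:INDEX(C:C, last_row) by slicing from row 2 down to that row. Returning None stands in for the #N/A that Excel gives when the column holds no numbers.

```python
from typing import Optional, Sequence


def last_number_row(column: Sequence) -> Optional[int]:
    """Hypothetical model of MATCH(1e99, C:C).

    Returns the 1-based row of the last numeric cell in the column,
    skipping text and blanks, or None where Excel would return #N/A.
    """
    last = None
    for row, value in enumerate(column, start=1):
        # bool is a subclass of int in Python; Excel treats TRUE/FALSE separately
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            last = row
    return last


def dynamic_range(column: Sequence, last_row: int, first_row: int = 2) -> list:
    """Hypothetical model of C2:INDEX(C:C, last_row): rows first_row..last_row inclusive."""
    return list(column[first_row - 1:last_row])
```

With column C holding a header followed by 4, 8, a blank, 6 and another blank, `last_number_row` returns 5, so the formula's ranges would stop at row 5 instead of running to the bottom of the worksheet.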
¹ Array formulas need to be finalized with Ctrl+Shift+Enter↵. If entered correctly, Excel will wrap the formula in braces (i.e. { and }). You do not type the braces in yourself. Once entered into the first cell correctly, they can be filled or copied down or right just like any other formula. Try to reduce your full-column references to ranges that more closely represent the extents of your actual data. The calculation cost of an array formula grows with the size of the ranges it references, so it is good practice to narrow the referenced ranges to a minimum. See Guidelines and examples of array formulas for more information.
I have a range of data from which I would like a formula to return the title column detail, based on three criteria in other columns, without repeating a result. I have been able to use MATCH to apply the criteria but have had trouble inserting COUNTIF into the formula to remove duplicates. In summary, I would like to combine =IFERROR(INDEX($B$2:$B$10,MATCH(1,($H$2=C$2:C$10)*($H$3=$D$2:$D$10)*($H$4=$E$2:$E$10),0)),0) and =IFERROR(INDEX($B$2:$B$10, MATCH(0, COUNTIF($G$8:G8,$B$2:$B$10), 0)),0). I have provided the data and the results from the two formulas above. Is it possible to combine the two formulas to get the desired results shown below? Hopefully returning the data in sequence along the row is not causing issues.
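Use AGGREGATE to return the SMALL row number (function 15 is SMALL; option 7 ignores hidden rows and error values, such as the #DIV/0! produced for rows that fail the criteria) and COLUMN(A:A) to increment the k argument as the formula is filled right.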
=IFERROR(INDEX($B:$B, AGGREGATE(15, 7, ROW($2:$10)/(($H$2=$C$2:$C$10)*($H$3=$D$2:$D$10)*($H$4=$E$2:$E$10)), COLUMN(A:A))), TEXT(,))
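The mechanics of this formula can be sketched in Python as a hypothetical model (the function names and the dict standing in for column B are illustrative, not part of Excel). Dividing ROW() by the product of the three criteria leaves the row number for matching rows and produces #DIV/0! for the rest (None here); AGGREGATE(15, 7, ...) then takes the k-th smallest surviving value, and IFERROR falls back to the empty string that TEXT(,) returns.

```python
from typing import Dict, List, Optional, Sequence


def row_over_criteria(rows, c, d, e, h2, h3, h4) -> List[Optional[int]]:
    """Hypothetical model of ROW($2:$10)/(($H$2=C)*($H$3=D)*($H$4=E)).

    A matching row is divided by 1 and keeps its row number; a non-matching
    row is divided by 0, which Excel reports as #DIV/0! (None here).
    """
    out: List[Optional[int]] = []
    for r, cv, dv, ev in zip(rows, c, d, e):
        product = (h2 == cv) * (h3 == dv) * (h4 == ev)  # TRUE*TRUE*TRUE = 1
        out.append(r if product == 1 else None)
    return out


def aggregate_small(values: Sequence[Optional[int]], k: int) -> Optional[int]:
    """Hypothetical model of AGGREGATE(15, 7, values, k): k-th smallest, errors ignored."""
    numbers = sorted(v for v in values if v is not None)
    return numbers[k - 1] if 1 <= k <= len(numbers) else None


def nth_title(b: Dict[int, str], rows, c, d, e, h2, h3, h4, k: int) -> str:
    """Hypothetical model of the whole formula, with k = COLUMN(A:A), COLUMN(B:B), ..."""
    row = aggregate_small(row_over_criteria(rows, c, d, e, h2, h3, h4), k)
    return b[row] if row is not None else ""  # IFERROR(..., TEXT(,)) gives ""
```

Filling the real formula one column to the right increments k, so the first cell returns the first match, the next cell the second match, and cells beyond the last match show an empty string.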
With the new Dynamic Array functions (at the time of writing, available only to some users on Excel Insider Fast builds), this can be done with this formula:
=UNIQUE(INDEX(FILTER(B2:E11,(C2:C11=H2)*(D2:D11=H3)*(E2:E11=H4)),,1))
Mind you, this is just one formula in ONE cell. Nothing has been copied down; the formula automatically spills into the neighboring cells. If you want the results spread across columns, wrap the formula in TRANSPOSE().
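For readers who want to see the three steps separately, here is a hypothetical Python model of the spilled formula (the record layout and function name are assumptions for illustration). FILTER keeps rows meeting all three criteria, INDEX(..., , 1) takes the first column (column B), and UNIQUE removes repeats while keeping the order of first occurrence. Unlike Excel, where FILTER returns #CALC! when nothing matches, the sketch simply returns an empty list, and its text comparisons are case-sensitive.

```python
from typing import List, Sequence, Tuple


def unique_filtered_titles(records: Sequence[Tuple], h2, h3, h4) -> List:
    """Hypothetical model of =UNIQUE(INDEX(FILTER(B2:E11, (C=H2)*(D=H3)*(E=H4)),,1)).

    records holds one tuple (B, C, D, E) per row of B2:E11.
    """
    # FILTER: keep rows where all three criteria hold
    kept = [r for r in records if r[1] == h2 and r[2] == h3 and r[3] == h4]
    # INDEX(..., , 1): take the first column (column B) of the filtered rows
    titles = [r[0] for r in kept]
    # UNIQUE: drop repeats, keeping the order of first occurrence
    return list(dict.fromkeys(titles))
```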
So I have some data that is being filtered. I only have a dollar value and a % of total. I am using the filters to sort the data from largest to smallest by the % of Total value.
Below the data set I have a total spend calculation and then a Pass Through formula. The Pass Through formula is the subtotal minus a few line items.
My problem is that the Pass Through formula reads something like =SUM(A1:A10)-A3-A5. When I sort the data, the values move to different cells but my formula doesn't adjust, so the amount changes and is now wrong.
Is there any way to make my formula "follow" the amounts so that it's always correct no matter how I sort?
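The quickest way is to add a "helper" column of 0s and 1s to your table (1 for rows to include, 0 for rows to exclude) and use SUMPRODUCT. As long as the helper column is part of the sorted range, each flag stays in the same row as its amount. Here is an example where cell A11 contains your formula =SUM(A2:A10)-A3-A5 and cell C11 contains a modified formula =SUMPRODUCT(A2:A10,C2:C10).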
[Screenshot: without sorting]
[Screenshot: sorted ascending]
Of course, with more complex formulas, this can be harder to set up.
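The difference between the two formulas can be reproduced in a small hypothetical Python model, where each row is an (amount, flag) pair standing in for columns A and C and the sample numbers are invented for illustration. The positional version subtracts whatever currently sits in A3 and A5, while the SUMPRODUCT version multiplies each amount by its own flag, so sorting cannot change the result.

```python
from typing import List, Tuple

Row = Tuple[float, int]  # (amount in column A, include flag in column C)


def positional_pass_through(rows: List[Row]) -> float:
    """Hypothetical model of =SUM(A2:A10)-A3-A5: subtracts whatever currently
    sits in the 2nd and 4th data rows (A3 and A5), regardless of which amounts those are."""
    amounts = [amount for amount, _ in rows]
    return sum(amounts) - amounts[1] - amounts[3]


def flagged_pass_through(rows: List[Row]) -> float:
    """Hypothetical model of =SUMPRODUCT(A2:A10, C2:C10): each amount times its own flag."""
    return sum(amount * flag for amount, flag in rows)


rows = [(100, 1), (20, 0), (50, 1), (30, 0), (10, 1)]
sorted_rows = sorted(rows)  # sort ascending; each flag travels with its amount
```

Before sorting both versions give 160. After sorting ascending, the flagged version still gives 160, but the positional version gives 140, because A3 and A5 now hold 20 and 50 instead of 20 and 30.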
I have a single Excel file with two sheets:
- Sheet1, column A: {1,2,3,4,5}
- Sheet2, column A: {2,5}
My question is: how do I display the numbers from Sheet1 that are not in Sheet2?
So the result should be {1,3,4}.
Thanks!
There are many ways to do this. The simplest is to use VLOOKUP all the way down the column on one sheet to determine whether the same value exists on the other sheet (thus creating a "Difference" flag). You can then use Excel filters on the sheet to either keep what matched or remove it.
I can demonstrate using two columns on a single worksheet. You should have no trouble transcribing this to two worksheets for your own purposes.
As an array formula¹ in D2 (the cell above the first formula cell must be empty or contain a column header label):
=INDEX(A$1:A$5, MATCH(0, IF(ISNA(MATCH(A$1:A$5, B$1:B$2, 0)), COUNTIF(D$1:D1, A$1:A$5&""), 1), 0))
Add error control with the IFERROR function and fill down as necessary.
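The logic of the filled-down formula can be sketched as a hypothetical Python function (`values_not_in` is an illustrative name, and the two lists stand in for columns A and B). The ISNA(MATCH(...)) test skips values found in the second list, and the COUNTIF over the cells above skips values already listed, so each remaining value appears once, in its original order.

```python
from typing import Hashable, List, Sequence


def values_not_in(sheet1: Sequence[Hashable], sheet2: Sequence[Hashable]) -> List[Hashable]:
    """Hypothetical model of the INDEX/MATCH/COUNTIF formula filled down column D:
    values from sheet1 that do not appear in sheet2, in their original order,
    each listed only once."""
    excluded = set(sheet2)
    listed = set()  # plays the role of COUNTIF(D$1:D1, ...) over the cells above
    result = []
    for value in sheet1:
        # ISNA(MATCH(value, B$1:B$2, 0)) is TRUE when value is not in sheet2
        if value not in excluded and value not in listed:
            listed.add(value)
            result.append(value)
    return result
```

With the question's data, `values_not_in([1, 2, 3, 4, 5], [2, 5])` returns `[1, 3, 4]`.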
¹ Array formulas need to be finalized with Ctrl+Shift+Enter↵. If entered correctly, Excel will wrap the formula in braces (i.e. { and }). You do not type the braces in yourself. Once entered into the first cell correctly, they can be filled or copied down or right just like any other formula. Try to reduce your full-column references to ranges that more closely represent the extents of your actual data. The calculation cost of an array formula grows with the size of the ranges it references, so it is good practice to narrow the referenced ranges to a minimum. See Guidelines and examples of array formulas for more information.
This is a follow-up to the following post:
How to overcome the max length limit of a formula in Excel? a bug of excel?
Here is the problem I am trying to solve. Given a data set with categories and values:
category value1 value2
a 1.0 ...
a 2.0
a 1.0
a 3.0
b 1.0
b 5.0
b 2.0 ...
...
I want to validate these values by checking whether the change from the row above is within one standard deviation (one sigma) of its category. That means the first row of each category has to be skipped.
Here is what I tried:
The following formula works for cells of each category beginning from the second row to the last row of each category.
=INDIRECT(ADDRESS(ROW(), COLUMN())) - INDIRECT(ADDRESS(ROW()-1, COLUMN())) < 1.0*STDDEV.P(INDIRECT(ADDRESS(MATCH(INDIRECT("A" & ROW()), $A:$A, 0), COLUMN()) & ":" &ADDRESS(MATCH(INDIRECT("A"&ROW()),$A:$A, 1), COLUMN())))
It works pretty fast, but we need to clear the data validation for the first row of each category.
Here is a solution provided by #user3964075
{=IF($A2<>$A1,TRUE,B2-B1<STDEV.P(IF($A:$A=$A2,B:B)))}
The problem is performance: it takes more than ten minutes on a 200 KB data set.
What is the fastest formula to do this?
Beyond the STDDEV.P (STDEV.P function...?) in the original question, the flagrant use of volatile¹ functions in underlying processes like data validation has got to be killing your calculation cycles. Switching to array formulas² with full-column references is not helping either.
Using Dynamic Named Ranges
The following method creates several dynamic named ranges: one for the contiguous data block 'island' originating in A1, one that references column A on whatever row it is used on, one for the cell the formula is in, one for the cell above it, and one for the range from which the STDEV.P function gets its result.
While there is not much point in using these beyond the worksheet for which they are intended, they will be given Workbook (as opposed to worksheet) scope. I cannot see anywhere that the worksheet was actually named, so I will work with Sheet2. Adjust the worksheet name if necessary. Note that you will be using varying combinations of absolute and relative columns and rows in the cell range references.
Select A2 on Sheet2. This is IMPORTANT! Relative references in a defined name are interpreted relative to the cell that is active when the name is created.
Go to Formulas ► Defined Names ► Name Manager. When the Name Manager dialog opens, click New to create each of these named ranges.
stdCAT - Will reference the cell in column A that corresponds to whatever row it is being used on.
Name: stdCAT
Scope: Workbook
Refers to: =Sheet2!$A2
stdVALa - Will reference the actual cell that the formula it assists is in.
Name: stdVALa
Scope: Workbook
Refers to: =Sheet2!A2
stdVALb - Will reference the cell above the cell that the formula it assists is in.
Name: stdVALb
Scope: Workbook
Refers to: =Sheet2!A1
stdDATA - Will reference the data island extending right from A1 for as many cells as there are in row 1 and down from A1 for as many cells as there are in column A.
Name: stdDATA
Scope: Workbook
Refers to: =Sheet2!$A$1:INDEX(Sheet2!$A:$Z, MATCH("zzz", Sheet2!$A:$A), MATCH("zzz", Sheet2!$1:$1))
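stdRNG - Will reference the extents of the range of data in the current column, from the row containing the first instance of the associated category in column A to the row containing the last. This relies on column A being sorted ascending, because MATCH with a match_type of 1 performs an approximate lookup.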
Name: stdRNG
Scope: Workbook
Refers to: =INDEX(stdDATA, MATCH(Sheet2!$A2, Sheet2!$A:$A, 0), COLUMN()):INDEX(stdDATA, MATCH(Sheet2!$A2, Sheet2!$A:$A, 1), COLUMN())
When these named ranges have been created, click Close in the lower-right to return to your worksheet.
Select Sheet2!B3:B8 (from your sample data) and choose Data ► Data Tools ► Data Validation. Opt for Allow: Custom and supply the following in the Formula box:
=(stdVALa-stdVALb)<STDEV.P(stdRNG)
Click OK to create the Data Validation Rule. Select other ranges and create other Data Validation Rules as necessary.
What you've accomplished is a complete bypass of the No reference operators data validation restriction, and you've done this without resorting to volatile¹ functions or array formulas² with full-column references. Each named range, and the calculation load it adds when used in a formula, is only as large as it absolutely has to be. The full-column references used in the MATCH functions will not significantly increase the calculation load, because MATCH stops searching once it has located the lookup_value rather than processing every cell in the column.
You should be able to recognize close similarities between the resulting formula and your original method; that is by design. It was not my intention to provide a new formula so much as to redesign the working model into a more efficient version.
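To see why stdRNG picks out exactly one category, here is a hypothetical Python model (the names `category_bounds` and `passes_rule` are illustrative only). It assumes column A is sorted ascending and leaves out the header row; under that assumption, Python's `bisect_left` finds the first occurrence of a category, as the exact MATCH does, and `bisect_right` minus one finds the last occurrence, as the approximate MATCH does. `statistics.pstdev` plays the role of STDEV.P.

```python
import bisect
import statistics
from typing import List, Tuple


def category_bounds(categories: List[str], key: str) -> Tuple[int, int]:
    """Hypothetical model of MATCH(key, A:A, 0) and MATCH(key, A:A, 1) as used in stdRNG.

    Assumes categories is sorted ascending (header row excluded) and contains key.
    Returns the 0-based index of the first and last occurrence of key.
    """
    first = bisect.bisect_left(categories, key)      # exact match: first occurrence
    last = bisect.bisect_right(categories, key) - 1  # approximate match: last occurrence
    return first, last


def passes_rule(categories: List[str], values: List[float], i: int) -> bool:
    """Hypothetical model of =(stdVALa-stdVALb)<STDEV.P(stdRNG) for data row i (i >= 1)."""
    first, last = category_bounds(categories, categories[i])
    sigma = statistics.pstdev(values[first:last + 1])  # population std dev of this category only
    return values[i] - values[i - 1] < sigma
```

With categories a, a, a, a, b, b, b and values 1, 2, 1, 3, 1, 5, 2 (the first value column from the question), the change into the third row (-1) passes, while the change into the second row (+1) fails because the category's population standard deviation is about 0.83.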
¹ Volatile functions recalculate whenever anything in the entire workbook changes, not just when something that affects their outcome changes. Examples of volatile functions are INDIRECT, OFFSET, TODAY and NOW.
² Array formulas consume more calculation cycles as their referenced cell ranges grow. Always keep the cell range references in an array formula to an absolute minimum.
The performance of the array formula is bad because it includes every row in columns A and B, down to the last row (1,048,576). It should be much faster if you limit the maximum row. Example:
{=IF($A2<>$A1,TRUE,B2-B1<STDEV.P(IF($A$1:$A$1000=$A2,B$1:B$1000)))}
To increase performance further, you could use helper columns to calculate the standard deviation.
Example:
Formula in F2, filled down and to the right:
{=IF($A2<>$A1,STDEV.P(IF($A$1:$A$100000=$A2,B$1:B$100000)),F1)}
This is an array formula. Put it into the cell without the curly brackets and press [Ctrl]+[Shift]+[Enter] to confirm.
With this formula, STDEV.P(IF($A$1:$A$100000=$A2,B$1:B$100000)) only needs to be calculated when the category in column A changes. That should happen far less often than calculating it for every row.
The validation formula for B2:En is then:
=IF($A2<>$A1,TRUE,B2-B1<F2)
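The helper-column idea can be sketched in Python as follows; `helper_sigmas` and `validate` are hypothetical names, and the lists stand in for columns A, B and F. As in the formula in F2, the population standard deviation is recomputed only when the category differs from the row above and is otherwise copied down; the first row of each category is always accepted, matching the TRUE branch of the validation formula.

```python
import statistics
from typing import List


def helper_sigmas(categories: List[str], values: List[float]) -> List[float]:
    """Hypothetical model of helper column F: recompute STDEV.P only when the
    category changes from the row above; otherwise copy the value from above."""
    sigmas: List[float] = []
    for i, cat in enumerate(categories):
        if i == 0 or cat != categories[i - 1]:
            # like IF($A$1:$A$100000=$A2, B$1:B$100000): every row of this category
            group = [v for c, v in zip(categories, values) if c == cat]
            sigmas.append(statistics.pstdev(group))
        else:
            sigmas.append(sigmas[-1])  # like returning F1, the cell above
    return sigmas


def validate(categories: List[str], values: List[float]) -> List[bool]:
    """Hypothetical model of =IF($A2<>$A1, TRUE, B2-B1<F2) applied to every data row."""
    sigmas = helper_sigmas(categories, values)
    results: List[bool] = []
    for i in range(len(values)):
        if i == 0 or categories[i] != categories[i - 1]:
            results.append(True)  # first row of a category is not checked
        else:
            results.append(values[i] - values[i - 1] < sigmas[i])
    return results
```

With the same sample data (categories a, a, a, a, b, b, b and values 1, 2, 1, 3, 1, 5, 2), STDEV.P is evaluated only twice, once per category, and `validate` returns True, False, True, False, True, False, True.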
Excel defines shared formulas and array formulas. What is the difference?
My understanding is that array formulas are now obsolete. Is this true?
Is it possible to transform array formulas into shared formulas?
Look at section 4.8 of The Microsoft Excel File Format (PDF ref) from OpenOffice:
An array formula (BIFF2-BIFF8) and a shared formula (BIFF5-BIFF8) is a formula spanning over a range of cells. Array formulas are handled different from single cell formulas in a spreadsheet. Shared formulas are only an optimisation to decrease the file size, they are not distinguishable from other cell formulas. Naturally an array formula cannot be a shared formula at the same time. Shared formulas are created for instance when filling a cell range from a single formula cell.
In general an array or shared formula is stored only once in a file, either in the ARRAY record (➜5.4) for array formulas, or in the SHAREDFMLA record (➜5.94) for shared formulas. These records are part of the Formula Cell Block (➜4.7.2). They immediately follow the first FORMULA record (➜5.50) for this range²⁰. All array or shared formula cells contain a reference to the formula data. This reference (tExp token, ➜3.10.1) consists of the cell address of the top left cell of the range. In this way each formula cell can be associated with its formula data.
If a formula returns a string value, a STRING record (➜5.102) follows the FORMULA record normally. In the case of array and shared formulas, this STRING record follows the ARRAY or SHAREDFMLA record.
²⁰ For shared formulas the first FORMULA record may not be the top-left cell of the range. It is possible to overwrite single cells of a shared formula range without invalidating the shared formula itself (the remaining formula cells).
Shared formulas are simply a more efficient means of storing formulas.
Array formulas add significant functionality and are definitely not obsolete. For example, the MMULT function can return multiple values. To get these multiple values into multiple cells you must use an array formula. Array formulas are entered into a range of cells by selecting the range, typing the formula, and then pressing CTRL+SHIFT+ENTER.