Columns A and B contain sample data, with the shop as the identifier (only two shops are shown here).
I've set up two criteria ranges in columns D and E, which I can use with the DSTDEV() function in column G.
For example, the formula in G2 is =DSTDEV($A$1:$B$9,2,D1:E2)
and the formula in G4 is =DSTDEV($A$1:$B$9,2,D4:E5)
so the current output is:
But in reality I have a tonne of shop identifiers, so I wanted to use DSTDEV() as a formula I can fill down, ideally with output like this:
where column E holds the standard deviation for each shop listed in column D.
Basically, I want DSTDEV() to work like SUMIF(), but its criteria argument has to be a range, and I'm looking for a way around that.
I thought I would ask before creating a UDF to do what I need!
I tried supplying a non-contiguous range such as (E1,E3) as the criteria, but that didn't work.
One option is a PivotTable with StDev as the summary function. From Microsoft's "Subtotal and total fields in a PivotTable":
StDev: An estimate of the standard deviation of a population, where the sample is a subset of the entire population.
This returns exactly the result you want, and it's easy to set up and maintain.
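As a rough illustration of what the StDev summary computes, here is a minimal Python sketch (not Excel) that groups rows by shop and takes the sample standard deviation (n-1) of each group, the same statistic DSTDEV returns. The shop names and values are made up, since the original data was only shown as a screenshot.

```python
from collections import defaultdict
from statistics import stdev


def stdev_by_group(rows):
    """Group (shop, value) pairs by shop and return the sample standard
    deviation per shop, like a PivotTable using StDev as the summary."""
    groups = defaultdict(list)
    for shop, value in rows:
        groups[shop].append(value)
    # stdev() needs at least two values, so single-value groups are skipped here
    return {shop: stdev(vals) for shop, vals in groups.items() if len(vals) >= 2}


# Hypothetical sample data: (shop, value)
data = [("Shop1", 2), ("Shop1", 4), ("Shop1", 6),
        ("Shop2", 10), ("Shop2", 20)]
print(stdev_by_group(data))
```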
Another option is to combine the STDEV.P function with FILTER. This relies on two facts:
STDEV.P calculates the population standard deviation of a list of values
FILTER returns a dynamic array of the values that meet a single criterion
So a FILTER array would look like this:
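Once the list of individual values has been generated, it can simply be wrapped in STDEV.P (or any other function that takes a list of values) like this (DSTDEV returns the sample standard deviation, so use STDEV.S instead if you need results that match DSTDEV; STDEV.P corresponds to DSTDEVP):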
From there, anchor every reference except the criteria cell (D2), and then fill down:
=STDEV.P(FILTER($B$2:$B$9,$A$2:$A$9=D2,))
Alternatively, it may be useful to convert the data to an Excel Table with headers: its structured references act as named, dynamically sized ranges, so you could avoid anchoring the formula.
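A minimal Python sketch of the FILTER + STDEV idea, with hypothetical data standing in for columns A and B: the list comprehension plays the role of FILTER, and the statistics module's pstdev and stdev correspond to STDEV.P and STDEV.S. On a group of three values the difference is noticeable (about 1.63 versus 2.0).

```python
from statistics import pstdev, stdev


def filter_values(values, keys, criterion):
    """Mimic FILTER(values, keys=criterion): keep values whose key matches."""
    return [v for v, k in zip(values, keys) if k == criterion]


shops = ["Shop1", "Shop1", "Shop2", "Shop1", "Shop2"]   # hypothetical A2:A6
sales = [2, 4, 10, 6, 20]                                # hypothetical B2:B6

subset = filter_values(sales, shops, "Shop1")   # like FILTER(B2:B6, A2:A6="Shop1")
print(subset)           # [2, 4, 6]
print(pstdev(subset))   # population SD, as STDEV.P / DSTDEVP compute it
print(stdev(subset))    # sample SD, as STDEV.S / DSTDEV compute it: 2.0
```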
I would like to apply the FILTER function to columns A:G but have only columns B and D in the output. How can I do it?
For example, =FILTER($A$1:$G$7,$K$1:$K$7=$K$1) returns a spilled array of the rows that match the condition, but the output still has 7 columns (A:G). Can I choose to output only columns B and D?
TL;DR
Option 1:
=FILTER(FILTER(A1:G7,K1:K7=K1),{0,1,0,1,0,0,0})
Option 2 (see reference):
=FILTER(INDEX(tblData,SEQUENCE(ROWS(tblData)),{4,3,5}),tblData[Customer Name]=I3)
Option 3 (answered by Rory):
=FILTER(CHOOSE({1,2},B1:B7,D1:D7),$K$1:$K$7=$K$1)
Option 4 (commented by P.b):
=FILTER(FILTER($A$1:$G$7,$K$1:$K$7=$K$1),(COLUMN(A:G)=COLUMN(B:B))+(COLUMN(A:G)=COLUMN(D:D)))
Explanation
Option 1
You can nest the original FILTER function inside another FILTER and pass a horizontal array of 1s and 0s indicating which columns you want to keep.
For example, in the question above, to return only columns B and D:
=FILTER(FILTER(A1:G7,K1:K7=K1),{0,1,0,1,0,0,0})
Since B and D are the 2nd and 4th columns of A:G, you put a 1 in those positions of the array.
Similarly, if you filter columns C:K and want to output only columns C, D and G, the formula would be:
=FILTER(FILTER(C1:K7,M1:M7=M1),{1,1,0,0,1,0,0,0,0})
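To make the two FILTER steps concrete, here is a small Python sketch with a hypothetical 3-row table: the first function keeps rows whose key matches (the inner FILTER), and the second keeps the columns flagged 1 in the horizontal mask (the outer FILTER).

```python
def filter_rows(table, keys, criterion):
    """Inner FILTER: keep rows whose key equals the criterion."""
    return [row for row, k in zip(table, keys) if k == criterion]


def filter_columns(table, mask):
    """Outer FILTER with a horizontal 0/1 array: keep columns flagged 1."""
    return [[cell for cell, keep in zip(row, mask) if keep] for row in table]


# Hypothetical A:G contents and criteria values standing in for column K
table = [["a1", "b1", "c1", "d1", "e1", "f1", "g1"],
         ["a2", "b2", "c2", "d2", "e2", "f2", "g2"],
         ["a3", "b3", "c3", "d3", "e3", "f3", "g3"]]
keys = ["x", "y", "x"]

rows = filter_rows(table, keys, "x")
print(filter_columns(rows, [0, 1, 0, 1, 0, 0, 0]))   # [['b1', 'd1'], ['b3', 'd3']]
```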
Pros & Cons - Option 1
This formula is the simplest of all and easy to understand
You can NOT change the order of the output columns; you can only show or hide them in their original sequence
You can apply it to a range spanning many columns without much change
Option 2
Another, more complex-looking way to do this is the following.
Note that this method lets you change the order of the output columns (here {4,3,5} returns the table's 4th, 3rd and 5th columns, in that order). See the referenced site for a detailed explanation of how it works.
=FILTER(INDEX(tblData,SEQUENCE(ROWS(tblData)),{4,3,5}),tblData[Customer Name]=I3)
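The INDEX/SEQUENCE part can be read as "every row, these columns, in this order". A Python sketch under that reading, with a hypothetical 5-column table whose first column stands in for tblData[Customer Name]:

```python
def index_cols(table, col_nums):
    """Mimic INDEX(table, SEQUENCE(ROWS(table)), {c1, c2, ...}):
    every row, with columns picked by 1-based number in the given order."""
    return [[row[c - 1] for c in col_nums] for row in table]


def filter_rows(rows, keys, criterion):
    """Mimic FILTER(rows, keys=criterion)."""
    return [r for r, k in zip(rows, keys) if k == criterion]


# Hypothetical table; column 1 plays the role of tblData[Customer Name]
tbl = [["Ann", "2023", "North", "Widget", 100],
       ["Bob", "2023", "South", "Gadget", 250],
       ["Ann", "2024", "East", "Gizmo", 75]]
customers = [row[0] for row in tbl]

picked = index_cols(tbl, [4, 3, 5])              # columns 4, 3, 5 in that order
print(filter_rows(picked, customers, "Ann"))     # [['Widget', 'North', 100], ['Gizmo', 'East', 75]]
```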
Pros & Cons - Option 2
This formula looks complex, but is straightforward once you understand the logic
You can change the order of output columns as required
You can apply this on a Range of multiple columns without much change
Option 3
This is the answer provided by Rory:
=FILTER(CHOOSE({1,2},B1:B7,D1:D7),$K$1:$K$7=$K$1)
Pros & Cons - Option 3
This formula gets complex, especially when returning a range of contiguous columns
You need to list each output column individually
You can change the order of output columns as required
Using it to output multi-column ranges gets tricky (for example, you cannot replace B1:B7 with B1:C7 in the formula above)
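A Python sketch of what CHOOSE({1,2},...) does here: it places two single columns side by side, so FILTER has a proper 2-column array to work on. The column values are hypothetical.

```python
def choose_columns(*columns):
    """Mimic CHOOSE({1,2,...}, col1, col2, ...): place single columns side by side.
    Assumes all columns have the same length."""
    return [list(row) for row in zip(*columns)]


col_b = ["b1", "b2", "b3"]   # hypothetical B1:B3
col_d = ["d1", "d2", "d3"]   # hypothetical D1:D3
col_k = ["x", "y", "x"]      # hypothetical K1:K3 (criteria column)

combined = choose_columns(col_b, col_d)                      # CHOOSE({1,2},B1:B3,D1:D3)
print([r for r, k in zip(combined, col_k) if k == "x"])      # [['b1', 'd1'], ['b3', 'd3']]
print(choose_columns(col_d, col_b)[0])                       # ['d1', 'b1'] (reordered)
```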
Option 4
Based on a comment from P.b below:
=FILTER(FILTER($A$1:$G$7,$K$1:$K$7=$K$1),(COLUMN(A:G)=COLUMN(B:B))+(COLUMN(A:G)=COLUMN(D:D)))
Pros & Cons - Option 4
This formula is simple and similar to Option 1, except that the 0/1 column mask is built with COLUMN() instead of typed by hand
You can NOT change the order of the output columns; you can only show or hide them in their original sequence
You can apply it to a range spanning many columns without much change
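The COLUMN comparisons build the same 0/1 mask that Option 1 types by hand. A Python sketch of that mask, using sheet column numbers (A=1, B=2, and so on):

```python
def column_mask(first_col, last_col, wanted):
    """Mimic (COLUMN(A:G)=COLUMN(B:B))+(COLUMN(A:G)=COLUMN(D:D)):
    1 where the sheet column number is one of `wanted`, else 0."""
    return [sum(c == w for w in wanted) for c in range(first_col, last_col + 1)]


print(column_mask(1, 7, [2, 4]))       # A:G keeping B and D -> [0, 1, 0, 1, 0, 0, 0]
print(column_mask(3, 11, [3, 4, 7]))   # C:K keeping C, D, G -> [1, 1, 0, 0, 1, 0, 0, 0, 0]
```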
There's a similar question about Google Sheets, but Google Sheets also has the QUERY function, which explicitly supports choosing specific columns.
You could also use CHOOSE like this:
=FILTER(CHOOSE({1,2},B1:B7,D1:D7),$K$1:$K$7=$K$1)
This also allows you to reorder columns in the output by changing their order in the CHOOSE function.
Try the new CHOOSECOLS function (beta channel at time of writing):
=CHOOSECOLS(filtered_array, {2,4})
I use Option 2 exclusively these days, but with a range of cells holding the column numbers instead of an array constant.
=FILTER(INDEX(tblZero,SEQUENCE(ROWS(tblZero)),A2:M2),tblZero[TakenBy]=A1)
Using the [#Headers] of my Table gives me column headings as well.
=FILTER(INDEX(tblZero[#Headers],SEQUENCE(ROWS(tblZero[#Headers])),A2:M2),'Zero Dollar Review Data'!A1<>"")
I use the row directly above the column headings to hold the column numbers. This way, I simply have to enter a column number and I get both the column heading and the data. I can also use some simple formulas in my column number cells to create outputs that are custom to the criteria used. So if I enter "TBT" as my TakenBy value I can display a set of columns unique for that particular Rep., then if I enter "DXD", I can display a different set of columns.
By far the easiest way of doing this is CHOOSECOLS.
Pass the range, then the columns you want in your preferred order; negative numbers count from the right side:
=CHOOSECOLS(FilteredRange,2,4)
=CHOOSECOLS(FILTER($A$1:$G$7,$K$1:$K$7=$K$1),2,4)
=CHOOSECOLS(FILTER($A$1:$G$7,$K$1:$K$7=$K$1),4,2) 'Custom order
=CHOOSECOLS(FILTER($A$1:$G$7,$K$1:$K$7=$K$1),-6,-4) 'From the right side
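A Python sketch of the column-number rules these examples rely on, assuming CHOOSECOLS's documented behaviour: numbers are 1-based, negative numbers count from the last column, and 0 or an out-of-range number gives #VALUE!. On the 7 columns A:G, -6 and -4 land on B and D.

```python
def choosecols(array, *col_nums):
    """Sketch of CHOOSECOLS(array, col_num1, [col_num2], ...)."""
    width = len(array[0])
    indexes = []
    for n in col_nums:
        if n == 0 or abs(n) > width:
            raise ValueError("#VALUE!")   # out-of-range column number
        # positive: 1-based from the left; negative: -1 is the last column
        indexes.append(n - 1 if n > 0 else width + n)
    return [[row[i] for i in indexes] for row in array]


row = ["A", "B", "C", "D", "E", "F", "G"]   # one filtered row spanning A:G
print(choosecols([row], 2, 4))     # [['B', 'D']]
print(choosecols([row], 4, 2))     # [['D', 'B']]  custom order
print(choosecols([row], -6, -4))   # [['B', 'D']]  from the right side
```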
I would like to create a data validation list that draws values from a table, but only the values whose corresponding Month matches a cell reference.
Ideally this should be done dynamically, rather than having to create multiple named ranges.
I have included an image highlighting the target values I want to pull for January.
Edit: I am exploring using the FILTER formula, but no luck so far.
Edit 2: I got FILTER to work to provide the set of values I'm looking for, but it doesn't seem to want to work as data validation.
=FILTER(tblDate[Date],(tblDate[Month]=E2),"")
As has been commented, the Data Validation list will not accept FILTER, but it will accept other formulae.
In the example, you can set the List validation as:
=OFFSET(B2,MATCH(F2,A2:A18,0)-1,0,COUNTIF(A2:A18,F2))
keying off cell F2 for the value of the month.
NB: This works only if the months are grouped together (as in the given data). Also note that the List source doesn't seem to accept the Table[] syntax, so you have to enter the ranges as ordinary cell references.
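Why the months must be grouped: the OFFSET formula starts at the first matching row and takes COUNTIF rows from there. A Python sketch of that logic with hypothetical month and date values, next to a FILTER-style selection for comparison:

```python
def month_slice(months, dates, month):
    """Sketch of OFFSET(B2, MATCH(F2,A2:A18,0)-1, 0, COUNTIF(A2:A18,F2)):
    start at the first matching row and take COUNTIF rows from there."""
    start = months.index(month)    # MATCH(...,0)-1; raises ValueError where Excel gives #N/A
    height = months.count(month)   # COUNTIF(...)
    return dates[start:start + height]


def month_filter(months, dates, month):
    """FILTER-style selection: every matching row, wherever it is."""
    return [d for d, m in zip(dates, months) if m == month]


# Hypothetical data standing in for column A (month) and column B (date)
months = ["Jan", "Jan", "Feb", "Feb", "Feb", "Mar"]
dates = ["d1", "d2", "d3", "d4", "d5", "d6"]
print(month_slice(months, dates, "Feb"))   # ['d3', 'd4', 'd5']

# Ungrouped months: the OFFSET window also picks up Feb's date
print(month_slice(["Jan", "Feb", "Jan"], ["d1", "d2", "d3"], "Jan"))   # ['d1', 'd2']
print(month_filter(["Jan", "Feb", "Jan"], ["d1", "d2", "d3"], "Jan"))  # ['d1', 'd3']
```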
Alternatively you can use FILTER() for non-ordered data but put the results in a hidden column (Column H in the example below) on the spreadsheet. It is not as neat, but is more flexible, and allows the Table[] syntax.
For example, in H2:
=FILTER(Table1[Date],Table1[Month]=F2)
List source:
=$H$2#
(The # spill-range operator refers to the whole result of the FILTER array formula in $H$2. Hat-tip to @Ike.)
I would like to be able to use Excel's filter formula and get only specific columns as a result.
For example, I tried the formula below, but it failed:
=FILTER((A:B,D:D),A:A=3475,"")
How can I get this working? I want the rows where the value in column A equals 3475, returning only columns A, B and D.
You could use a single formula like:
=CHOOSE({1,2,3},FILTER(A:A,A:A=3475),FILTER(B:B,A:A=3475),FILTER(D:D,A:A=3475))
But considering performance, I'd go with two separate formulas, as proposed in the comments.
You need to pass a proper array as the array argument of FILTER.
I used a Table since using whole-column references is inefficient.
For example, if you want to return only columns 1,2 and 4 of a table, you can use:
=INDEX(Table1,SEQUENCE(ROWS(Table1)),{1,2,4})
So your filter function might be:
=FILTER(INDEX(Table1,SEQUENCE(ROWS(Table1)),{1,2,4}),Table1[colA] = myVar)
If, for some reason, you don't want to use Tables, the following formula should also work:
=FILTER(INDEX($A:$D,SEQUENCE(LOOKUP(2,1/(LEN($A:$A)>0),ROW($A:$A))),{1,2,4}),myVar=INDEX($A:$A,SEQUENCE(LOOKUP(2,1/(LEN($A:$A)>0),ROW($A:$A)))))
as would the less efficient:
=FILTER(INDEX($A:$D,SEQUENCE(ROWS($A:$A)),{1,2,4}),myVar=$A:$A)
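The LOOKUP(2,1/(LEN($A:$A)>0),ROW($A:$A)) piece finds the last non-empty row of column A. The usual explanation is that 1/(condition) yields 1 or #DIV/0!, LOOKUP skips the errors, and because 2 is larger than every 1 its approximate match settles on the last one. A Python sketch of that result and of the trimmed FILTER, with hypothetical rows and 3475 standing in for myVar:

```python
def last_nonempty_row(column):
    """Sketch of LOOKUP(2,1/(LEN($A:$A)>0),ROW($A:$A)): the 1-based row
    number of the last non-empty cell (0 here if the column is empty)."""
    last = 0
    for row_num, cell in enumerate(column, start=1):
        if cell not in (None, ""):      # LEN(cell)>0
            last = row_num
    return last


def filter_used_rows(rows, my_var, cols=(1, 2, 4)):
    """Sketch of FILTER(INDEX($A:$D,SEQUENCE(n),{1,2,4}),myVar=INDEX($A:$A,SEQUENCE(n)))
    where n is the last non-empty row of column A."""
    n = last_nonempty_row([r[0] for r in rows])
    return [[r[c - 1] for c in cols] for r in rows[:n] if r[0] == my_var]


# Hypothetical A:D contents; the last row is empty
rows = [[3475, "x", "y", "z"], [1, "a", "b", "c"], [3475, "p", "q", "r"], ["", "", "", ""]]
print(last_nonempty_row([r[0] for r in rows]))   # 3
print(filter_used_rows(rows, 3475))              # [[3475, 'x', 'z'], [3475, 'p', 'r']]
```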
I have the following Table as an example:
Name      Task    Amount
Jennifer  Sing    10
Tom       Dance   15
Joe       Jump    72
Mandy     Scream  10
And supporting lists:
Names     Tasks
Jennifer  Dance
Joe       Sing
          Jump
I need the sum of the amounts where Name is in the Names list AND Task is in the Tasks list. In other words, if both the person and the task are in the lists of relevant people and tasks, include their amount in the total.
So, for example, the total would be 10+72=82.
I have tried naming the criteria lists RelevantNamesList and RelevantTasksLists, and the input table columns Names, Tasks and Amounts, and then using SUMIF; however, I can't even get it to work with a single condition.
=SUMIF( Names, ( -- ( ISTEXT(VLOOKUP( Names, RelevantNamesList, 1, FALSE))) ), Amounts)
The actual result I get from the formula above is 0, which is obviously not correct. I have also tried SUMPRODUCT, with no success. I am beginning to think that I won't be able to do this without helper columns.
Is there a way to do this without helper columns?
Thanks in advance!
Jacqueline
Give this a try:
=SUMPRODUCT((COUNTIF(<Names To Lookup>,<All Names List>)>0)*(COUNTIF(<Tasks To Lookup>,<All Tasks List>)>0)*<All Amounts List>)
So if you had the data set up like this (table in A1:C5, names list in E2:E3, tasks list in F2:F4):
Then the formula would be:
=SUMPRODUCT((COUNTIF($E$2:$E$3,$A$2:$A$5)>0)*(COUNTIF($F$2:$F$4,$B$2:$B$5)>0)*$C$2:$C$5)
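A Python sketch of how the SUMPRODUCT formula evaluates, using the question's own table: COUNTIF(list, value)>0 acts as a membership test (case-insensitive, like COUNTIF), multiplying the two tests acts as AND, and the products are summed.

```python
def sum_if_in_lists(names, tasks, amounts, relevant_names, relevant_tasks):
    """Sketch of
    =SUMPRODUCT((COUNTIF($E$2:$E$3,$A$2:$A$5)>0)*(COUNTIF($F$2:$F$4,$B$2:$B$5)>0)*$C$2:$C$5)"""
    wanted_names = {n.casefold() for n in relevant_names}   # COUNTIF ignores case
    wanted_tasks = {t.casefold() for t in relevant_tasks}
    total = 0
    for name, task, amount in zip(names, tasks, amounts):
        name_ok = name.casefold() in wanted_names   # COUNTIF(names list, name) > 0
        task_ok = task.casefold() in wanted_tasks   # COUNTIF(tasks list, task) > 0
        total += name_ok * task_ok * amount         # amount only if both tests are TRUE
    return total


names = ["Jennifer", "Tom", "Joe", "Mandy"]    # A2:A5
tasks = ["Sing", "Dance", "Jump", "Scream"]    # B2:B5
amounts = [10, 15, 72, 10]                     # C2:C5
print(sum_if_in_lists(names, tasks, amounts, ["Jennifer", "Joe"], ["Dance", "Sing", "Jump"]))  # 82
```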
EDIT:
Per a comment, the lookup criteria could be partial matches. For example, the name in the table is Ms Jennifer Keim and what's being looked up is Jennifer. To accommodate this, you'd need to switch tactics to the DSUM function, which means you'll need to change how you set up the criteria.
There are two ways to do this. The first is to create a row for each set of criteria you want; the second is to keep the limited lists as you originally had them and then set up criteria formulas that you feed into DSUM. Here's the data setup for the first scenario (note that the lookup headers must exactly match the table headers, and that you put the wildcards for partial matches directly in the criteria):
The DSUM formula in this scenario is:
=DSUM($A$1:$C$5,"Amount",$E$1:$F$7)
For the second scenario, we set up the limited criteria as you originally had them, but now we need helper formulas to feed into DSUM. Note that the helper formula headers must not match any header in your original data table (here I've added "Check" to the end of the header names as an example):
The formulas need to reference the first data row of your table and perform the check against your criteria. DSUM applies that check to every row, so we only need it for the first row to establish the logic DSUM will use. Here are the two formulas for NameCheck (cell H2 in this example) and TaskCheck (I2), which allow partial matches when looking up the criteria against the data table:
NameCheck: =SUMPRODUCT(--(LEN(SUBSTITUTE(A2,$E$2:$E$3,""))<LEN(A2)))
TaskCheck: =SUMPRODUCT(--(LEN(SUBSTITUTE(B2,$F$2:$F$4,""))<LEN(B2)))
Now the DSUM criteria argument references those formula cells and becomes:
=DSUM($A$1:$C$5,"Amount",$H$1:$I$2)
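A Python sketch of the NameCheck/TaskCheck logic: removing a lookup string shortens the cell text only if the string occurs in it, so the formula counts how many lookups appear inside the cell. The rows are hypothetical full names built around the question's data, and treating a non-zero check count as a match is an assumption of this sketch.

```python
def contains_any(cell, lookups):
    """Sketch of =SUMPRODUCT(--(LEN(SUBSTITUTE(A2,$E$2:$E$3,""))<LEN(A2))).
    Returns how many lookup strings occur inside `cell`. Like SUBSTITUTE,
    str.replace is case-sensitive."""
    return sum(len(cell.replace(s, "")) < len(cell) for s in lookups)


def dsum_partial(rows, name_lookups, task_lookups):
    """Sum the Amount of rows whose name check and task check are both non-zero
    (assumption: a non-zero count is treated as a match)."""
    return sum(amount for name, task, amount in rows
               if contains_any(name, name_lookups) and contains_any(task, task_lookups))


# Hypothetical data: full names that contain the short lookup names
rows = [("Ms Jennifer Keim", "Sing", 10), ("Tom Smith", "Dance", 15),
        ("Joe Brown", "Jump", 72), ("Mandy Lee", "Scream", 10)]
print(contains_any("Ms Jennifer Keim", ["Jennifer", "Joe"]))              # 1
print(dsum_partial(rows, ["Jennifer", "Joe"], ["Dance", "Sing", "Jump"]))  # 82
```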
For further reading and information regarding the DSUM formula, Contextures has a great explanatory article with an example workbook you can download and experiment with: https://contexturesblog.com/archives/2012/11/15/dsum-and-excel-tables-sum-with-multiple-criteria/
I have the following data set, from which I need a pivot-style count of the distinct values. I've tried a few functions like FREQUENCY and COUNTIFS, but I couldn't make it work.
Input data:
Expected output:
=SUM(IF((B2:D4=C10),1,0))
where C10 holds the value to count. To get the result, confirm the formula with Ctrl+Shift+Enter (not needed in versions of Excel with dynamic arrays).
I think it's an awkward case because the data values are in more than one column and because they are text, not numbers.
The only way I could come up with would be to repeat a standard method of getting the distinct values and then use COUNTIF to get the counts.
So starting in F2 I have:-
=IFERROR(INDEX($B$2:$B$4,MATCH(0,COUNTIFS($F$1:$F1,$B$2:$B$4),0)),
IFERROR(INDEX($C$2:$C$4,MATCH(0,COUNTIFS($F$1:$F1,$C$2:$C$4),0)),
IFERROR(INDEX($D$2:$D$4,MATCH(0,COUNTIFS($F$1:$F1,$D$2:$D$4),0)),"")))
(It's an array formula and must be entered with Ctrl+Shift+Enter.)
And starting in G2:-
=COUNTIF($B$2:$D$4,F2)
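A Python sketch of what the F2 and G2 formulas produce when filled down, with a hypothetical 3x3 grid standing in for B2:D4: distinct text values in order of first appearance, scanning column B, then C, then D, plus a count for each. Unlike COUNTIF and COUNTIFS, this sketch compares case-sensitively.

```python
def distinct_by_column(grid):
    """Sketch of the nested IFERROR(INDEX/MATCH/COUNTIFS) formula filled down from F2."""
    seen = []
    for column in zip(*grid):          # walk column B, then C, then D
        for cell in column:
            if isinstance(cell, str) and cell not in seen:   # text and not yet listed
                seen.append(cell)
    return seen


def count_each(grid, values):
    """Sketch of =COUNTIF($B$2:$D$4,F2) for each listed value."""
    flat = [cell for row in grid for cell in row]
    return {v: flat.count(v) for v in values}


# Hypothetical B2:D4 contents (None = blank cell)
grid = [["Apple", "Pear", "Apple"],
        ["Pear", "Plum", None],
        ["Apple", "Apple", "Kiwi"]]
values = distinct_by_column(grid)
print(values)                    # ['Apple', 'Pear', 'Plum', 'Kiwi']
print(count_each(grid, values))  # {'Apple': 4, 'Pear': 2, 'Plum': 1, 'Kiwi': 1}
```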
To avoid having to specify an exact range (e.g. $B2:$B4), you could use the following in F2 and adjust it to the maximum number of rows you are likely to use:-
=IFERROR(INDEX($B$2:$B$10,MATCH(0,IF(ISTEXT($B$2:$B$10),COUNTIFS($F$1:$F1,$B$2:$B$10),1),0)),
IFERROR(INDEX($C$2:$C$10,MATCH(0,IF(ISTEXT($C$2:$C$10),COUNTIFS($F$1:$F1,$C$2:$C$10),1),0)),
IFERROR(INDEX($D$2:$D$10,MATCH(0,IF(ISTEXT($D$2:$D$10),COUNTIFS($F$1:$F1,$D$2:$D$10),1),0)),"")))
and this in G2:-
=IF(F2="","",COUNTIF($B$2:$D$10,F2))
but of course it's restricted to three columns, and I think anything beyond that may call for a VBA solution.
There is also a general formula for distinct values from a 2D array here, but its output includes a zero when blank rows and columns are included, so it would need some modification.
So here is the modified formula from the reference above with error handling starting in I2:-
=IFERROR(INDEX(tbl_text, MIN(IF( IF(ISTEXT(tbl_text),COUNTIF($I$1:$I1, tbl_text),1)=0, ROW(tbl_text)-MIN(ROW(tbl_text))+1)),
MATCH(0, COUNTIF($I$1:$I1, INDEX(tbl_text, MIN(IF(IF(ISTEXT(tbl_text),COUNTIF($I$1:$I1, tbl_text),1)=0, ROW(tbl_text)-MIN(ROW(tbl_text))+1)), , 1)), 0), 1),"")
With the counts starting in J2:-
=IF(I2="","",COUNTIF(tbl_text,I2))
where tbl_text is a named range defined (when I tested it) as $B$2:$E$10
I think this should meet your additional criterion of being more general, because you can set tbl_text to cover the maximum number of rows and columns you are likely to use.
It will need a slight further modification to ignore blanks within the table.
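For comparison, the tbl_text formula scans row by row: MIN(IF(...ROW...)) finds the first row holding an unlisted text value, and MATCH picks the first unlisted cell in that row. A Python sketch of that order with a hypothetical grid; it skips blank cells in both steps, i.e. it assumes the further modification for blanks has already been made.

```python
def distinct_row_major(grid):
    """Sketch of the tbl_text formula filled down from I2, with blank cells skipped."""
    listed = []

    def unlisted(cell):
        # ISTEXT(cell) and COUNTIF($I$1:$I1, cell)=0
        return isinstance(cell, str) and cell not in listed

    while True:
        # MIN(IF(...=0, ROW(...))): first row that still holds an unlisted text value
        row = next((r for r in grid if any(unlisted(c) for c in r)), None)
        if row is None:
            return listed          # IFERROR(...,"") shows blanks from here on
        # MATCH(0, COUNTIF($I$1:$I1, that row), 0): first unlisted cell in that row
        listed.append(next(c for c in row if unlisted(c)))


# Hypothetical tbl_text contents (None = blank cell)
grid = [["Apple", None, "Kiwi"],
        ["Pear", "Apple", None]]
values = distinct_row_major(grid)
print(values)                                          # ['Apple', 'Kiwi', 'Pear']
print({v: sum(r.count(v) for r in grid) for v in values})   # counts, as COUNTIF(tbl_text, I2)
```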