Excel CountifS. Criteria multi-column ranges. Non-ordered comparison test - excel

It's my first question here, so please don't kill me if something is wrong. I have found numerous solutions on this site, but not this time. Unfortunately I can't post images yet. It won't be easy, but I will try.
To the point:
My data has the following headers:
Decision_Id Opponent1 Opponent2 Opponent3 Suitor1 Suitor2 Suitor3 Suitor4
Decision_id is a unique integer identifier. The rest are strings.
Each row represents a particular judicial decision. Each decision can have UP TO 3 opponents (defending party) and UP TO 4 suitors (attacking party). A particular party can be a suitor in one decision and an opponent in another.
What I want to get :
Cross-table where both rows and columns headers are all distinct parties I encounter in the table. (no problem with that, done.)
Where each cell shows in how many distinct decisions a particular opponent (defined by row header) was attacked by a particular suitor (column header) => All diagonal cells equal ZERO (a party can't attack itself) and table is not symmetric.
I have tried
to apply this to the first cell and then expand:
=COUNTIFS("Fixed range of all opponents :$B$2:$D$6","the wanted opponent value : $A2", "Fixed range of all suitors :$E$2:$H$6", "the wanted suitor value : B$1")
I got an error. I figured out that the criteria ranges have to be the same size. OK, I created dummy empty columns => no error, BUT the results are clearly underestimated. I think there is a match only if the opponent and the suitor have the same "number". In detail: for each row, Excel tests opponent1 and suitor1 against the corresponding values, then opponent2 and suitor2, then opponent3 and suitor3... This actually explains why the ranges have to be the same size.
So, What I would need
Is, for each row, to make Excel test all opponents against the wanted opponent value and all suitors against the wanted suitor value. If at least one opponent and one suitor match, count this decision (even if it was opponent1 and suitor3 that had the wanted values).
Remarks
I have already made a VBA code which does the job, but it's too slow (around 5 hours for the whole table) and I expect to do the same for different tables of this kind and/or modify this one. So I am interested in "pure excel", fast solution.
Thank you very much!

The difficult part here is to separate multi-column ranges into separate rows - one way to do that is with OFFSET within COUNTIF, i.e. this formula
=SUMPRODUCT(COUNTIF(OFFSET($B$2:$D$6,ROW($B$2:$D$6)-ROW($B$2),0,1),$A2),COUNTIF(OFFSET($E$2:$H$6,ROW($E$2:$H$6)-ROW($E$2),0,1),B$1))
That assumes that all suitors are different on any one row and all opponents are different on any one row (although formula can be modified if that isn't the case).
You can extend the ranges to any size you want - although the number of rows must be the same for each part
...or here's another, more obscure way using the MMULT function:
=SUMPRODUCT(MMULT(($B$2:$D$6=$A2)+0,{1;1;1}),MMULT(($E$2:$H$6=B$1)+0,{1;1;1;1}))
The {1;1;1} and {1;1;1;1} arrays represent the number of columns in each section, so if you have 6 and 8 columns they need to be changed accordingly.
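To make it clearer what the two SUMPRODUCT formulas compute, here is a minimal Python sketch of the same row-by-row logic. The sample decisions and party names are invented for illustration, and `count_decisions` is a hypothetical helper, not anything Excel provides.

```python
# Hypothetical sample data: each decision is (opponents, suitors).
decisions = [
    (["A", "B"], ["C", "D"]),
    (["A"], ["D", "C", "E"]),
    (["C"], ["A"]),
    (["B", "A", "E"], ["D"]),
    (["D"], ["B", "C"]),
]


def count_decisions(opponent, suitor, rows):
    """Sum, over all rows, of (opponent matches in row) * (suitor matches in row)."""
    total = 0
    for opponents, suitors in rows:
        # Per-row counts, like a COUNTIF over a single row of each range.
        opponent_hits = opponents.count(opponent)
        suitor_hits = suitors.count(suitor)
        # Multiplying and summing mirrors SUMPRODUCT. Each product is 0 or 1
        # only when no party is repeated within a row.
        total += opponent_hits * suitor_hits
    return total


print(count_decisions("A", "C", decisions))
print(count_decisions("C", "A", decisions))
```

If a party could appear twice on the same side of one decision, the product would exceed 1 for that row, which is the caveat mentioned with the first formula.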

Another possibility is to try this array formula:
=SUM(MMULT(-TRANSPOSE($B$2:$D$6=$A2),-($E$2:$H$6=B$1)))
entered using CTRL+SHIFT+ENTER (or defined as a name, e.g. Total, and entered normally as =Total).

This should do it:
= COUNTIFS($B$2:$B$6,$A2, $E$2:$E$6, B$1)
+ COUNTIFS($C$2:$C$6,$A2, $E$2:$E$6, B$1)
+ COUNTIFS($D$2:$D$6,$A2, $E$2:$E$6, B$1)
+ COUNTIFS($B$2:$B$6,$A2, $F$2:$F$6, B$1)
+ COUNTIFS($C$2:$C$6,$A2, $F$2:$F$6, B$1)
+ COUNTIFS($D$2:$D$6,$A2, $F$2:$F$6, B$1)
+ COUNTIFS($B$2:$B$6,$A2, $G$2:$G$6, B$1)
+ COUNTIFS($C$2:$C$6,$A2, $G$2:$G$6, B$1)
+ COUNTIFS($D$2:$D$6,$A2, $G$2:$G$6, B$1)
+ COUNTIFS($B$2:$B$6,$A2, $H$2:$H$6, B$1)
+ COUNTIFS($C$2:$C$6,$A2, $H$2:$H$6, B$1)
+ COUNTIFS($D$2:$D$6,$A2, $H$2:$H$6, B$1)
These look simpler if you make your data into a table, or define named ranges for the Opponent1, Opponent2, Suitor1 columns etc...
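If the number of opponent or suitor columns changes, typing out every pair by hand gets tedious. As a rough sketch, the formula text can be generated with a few lines of Python; `countifs_sum` and its parameters are hypothetical names chosen for this example, and the output is just a string to paste into the first cell.

```python
def countifs_sum(opponent_cols, suitor_cols, first_row, last_row,
                 opponent_cell="$A2", suitor_cell="B$1"):
    """Build one COUNTIFS term per (opponent column, suitor column) pair."""
    terms = []
    for s in suitor_cols:
        for o in opponent_cols:
            opp_range = f"${o}${first_row}:${o}${last_row}"
            suit_range = f"${s}${first_row}:${s}${last_row}"
            terms.append(
                f"COUNTIFS({opp_range},{opponent_cell},{suit_range},{suitor_cell})"
            )
    return "=" + "+".join(terms)


# Columns B:D hold opponents and E:H hold suitors, as in the question.
formula = countifs_sum("BCD", "EFGH", 2, 6)
print(formula)
```

Like the hand-written version, this counts matching (opponent column, suitor column) pairs, so it equals the number of decisions only when a party is not repeated within one row.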

Related

EXCEL - Dual VLOOKUP and Interpolation

I have a table on Excel with data as the following:
Meaning, I have different JPH based on the %SMALL unit and the number of active stations.
I need to create a matrix like the following (with %SMALL on horizontal and STATIONS on vertical axes):
And the formula for each cell should:
Take the input of Stations (column "B")
Check, for that specific Stations number, the amount of data on the other table (like make a filter on STATIONS for the specific number)
Perform a VLOOKUP to check the JPH based on the %SMALL value in row 2
Interpolate for the exact JPH value, if not found on table
For now, I was able to create the last part (the VLOOKUP and the interpolation), with the following:
=IFERROR(VLOOKUP(C2;'EARLY-STATIONS'!$F:$H;3;FALSE);AVERAGE(OFFSET(INDEX('EARLY-STATIONS'!$H:$H;MATCH(C2;'EARLY-STATIONS'!$F:$F;1));0;0;2;1)))
The problem I'm facing is that with this, the calculation does not check the number of stations, so the interpolation is not accurate.
Unfortunately I cannot use VBA macros to solve this.
Any clue?
This is only an attempt, because the question needs more clarity about all the scenarios to consider for different input data and about how the interpolation should work. This approach treats interpolation as the average of the two neighbouring values (lower and greater), but the idea can be adapted to any other way of calculating it. Given the tags on the question, I assume there is no Excel version constraint. This is an O365 solution:
=LET(sm, A2:A10, st, B2:B10, jph, C2:C10, smx, F1:J1, sty, E2:E4, NULL, "",
GETLk, LAMBDA(x,y,mode, FILTER(jph, (st=y)
* (sm = INDEX(sm, XMATCH(x, sm, mode))), NULL)),
GET, LAMBDA(x,y, LET(f, FILTER(jph, (jph=GETLk(x,y, 1))
+ (jph=GETLk(x,y, -1)), NULL), IF(@f=NULL, NULL, AVERAGE(f)))),
HREDUCE, LAMBDA(yi, DROP(REDUCE("", smx, LAMBDA(ac,x,
HSTACK(ac, GET(x, yi)))),,1)),
DROP(REDUCE("", sty, LAMBDA(ac,y, VSTACK(ac, HREDUCE(y)))),1))
The above formula spills the entire result; I don't think you can use a LOOKUP-like function for this case.
Here is the output:
The highlighted cells are where the average is calculated.
Explanation
The main idea is to use the DROP/REDUCE/HSTACK/VSTACK pattern to generate the grid. For how to apply it, see my answer to the question "how to transform a table in Excel from vertical to horizontal but with different length".
We use two user LAMBDA functions to abstract some calculations:
GETLk(x,y,mode) filters the jph name based on the %SMALL and Stations column values, using the inputs x (x-axis value from the grid) and y (y-axis value from the grid) respectively. The third argument, mode, controls the approximate search in XMATCH (1 for next largest, -1 for next smallest). If the value exists in the input table, XMATCH returns the same value in both cases.
GET(x,y) holds the logic to find the value or, if the value doesn't exist, to calculate the average. It uses the previous LAMBDA function, GETLk. We filter for jph values that match the inputs (x,y), but we use an OR condition in the FILTER (+) to select both the lower and the greater values. If the value exists, FILTER returns just one value; otherwise it returns two values (f). Finally, if f is not empty we return the average, otherwise the value we set up as NULL.
HREDUCE: Concatenate the result by columns for a given row of the grid. Check the referred question for more information about it.
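As a plain-code illustration of the lookup rule described above (exact match if present, otherwise the average of the next-smaller and next-larger entries), here is a minimal Python sketch. The sample rows are invented, `lookup_jph` is a hypothetical name, and unlike the Excel formula it filters by station first and returns None when a neighbour is missing on either side; both are assumptions of this sketch.

```python
def lookup_jph(rows, small, stations):
    """rows: list of (pct_small, stations, jph) tuples."""
    # Keep only the rows for the requested number of stations.
    candidates = [(s, j) for s, st, j in rows if st == stations]

    # Exact match on %SMALL: return it directly.
    exact = [j for s, j in candidates if s == small]
    if exact:
        return exact[0]

    # Otherwise average the nearest lower and nearest higher entries.
    lower = [(s, j) for s, j in candidates if s < small]
    upper = [(s, j) for s, j in candidates if s > small]
    if not lower or not upper:
        return None  # assumption: no value when a neighbour is missing
    _, jph_low = max(lower)
    _, jph_high = min(upper)
    return (jph_low + jph_high) / 2


# Hypothetical data: (%SMALL, STATIONS, JPH)
rows = [(10, 3, 50), (20, 3, 60), (10, 4, 70), (30, 4, 90)]
print(lookup_jph(rows, 15, 3))
```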

Excel CUBEVALUE & CUBESET count records greater than a number

I am writing a series of queries to my workbook's data model to retrieve the number of documents by Category_Name which are greater than a certain numbers of days old (e.g. >=650).
Currently this formula (entered in cell C3) returns the correct number for a single Days Old value (=3).
=CUBEVALUE("ThisWorkbookDataModel",
"[Measures].[Count of Docs]",
"[EDD_Report].[Category_Name].&["&$B2&"]",
"[EDD_Report_10-01-18].[Days Old].[34]")
How do I return the number of documents for Days Old values >=650?
The worksheet looks like:
     A          B    C
1    Date       PL   Count of Docs
2    10/1/2018  ALD  3
3    ...
UPDATE: The expression in step B of @ama's answer below did not work.
However, I created a subset of the Days Old values using
=CUBESET("ThisWorkbookDataModel",
"{[EDD_Report_10-01-18].[Days Old].[all].[650]:[EDD_Report_10-01-18].[Days Old].[All].[3647]}")
The cell containing this cubeset is referenced as the third Member_expression of the original CUBEVALUE formula. The limitation is now that the values for the beginning and end must be members of the Days Old set.
This is limiting, in that, I was hoping for a more general test for >=650 and there is no way to guarantee that specific values of Days Old will be in the query.
First time I hear about CUBE, so you got me curious and I did some digging. Definitely not an expert, but here is what I found:
MDX language should allow you to provide value ranges in the form of {[Table].[Field].[All].[LowerBound]:[Table].[Field].[All].[UpperBound]}.
A. Get the total number of entries:
D3 =CUBEVALUE("ThisWorkbookDataModel",
"[Measures].[Count of Docs]",
"[EDD_Report].[Category_Name].&["&$B2&"]",
"[EDD_Report_10-01-18].[Days Old].[All]")
B. Get the number of entries less than 650:
E3 =CUBEVALUE("ThisWorkbookDataModel",
"[Measures].[Count of Docs]",
"[EDD_Report].[Category_Name].&["&$B2&"]",
"{[EDD_Report_10-01-18].[Days Old].[All].[0]:[EDD_Report_10-01-18].[Days Old].[All].[649]}")
Note I found something about using .[All].[650].lag(1)} but I think for it to work properly your data might need to be sorted?
C. Subtract
C3 =D3-E3
Alternatively, go for the quick and dirty:
=CUBEVALUE("ThisWorkbookDataModel",
"[Measures].[Count of Docs]",
"[EDD_Report].[Category_Name].&["&$B2&"]",
"{[EDD_Report_10-01-18].[Days Old].[All].[650]:[EDD_Report_10-01-18].[Days Old].[All].[99999]}")
Hope this helps and do let me know, I am still curious!
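Since the range bounds have to be existing members of Days Old, one possible workaround is to pick the bounds from the values that are actually present. Below is a minimal Python sketch of that selection and of building the set expression text in the range form shown above. `days_old_set` is a hypothetical helper, the member list is invented, and whether the cube orders the members numerically is an assumption you would need to check.

```python
def days_old_set(members, threshold,
                 table="EDD_Report_10-01-18", field="Days Old"):
    """Return an MDX range set covering the existing members >= threshold."""
    eligible = sorted(m for m in members if m >= threshold)
    if not eligible:
        return None  # nothing at or above the threshold
    prefix = f"[{table}].[{field}].[All]"
    # Use the smallest and largest existing members as the range bounds.
    return f"{{{prefix}.[{eligible[0]}]:{prefix}.[{eligible[-1]}]}}"


# Hypothetical list of distinct Days Old values present in the data.
expression = days_old_set([34, 651, 3647, 700], 650)
print(expression)
```

The resulting string could then be passed to CUBESET in place of the hard-coded 650 and 3647 bounds.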

How to remove duplicates from individual powerquery columns without removing entire rows

I have a data table that records cost savings data, with one row per project. Each row holds project-level data such as annual spend and annual savings, but also the months the savings fall into. To pivot on this data, I converted it to a table with Power Query, but some columns repeat: annual spend, for example, is repeated for each month that has savings. So I might get 10 rows for savings, which is correct, but the annual spend is duplicated 10 times. Can I remove the duplicates in just those columns while retaining the other data?
I have searched and tried various solutions but haven't found one that works. I am not set on data table format, so am open to anything.
Below is a sample of the data
Sample of PowerQuery
As you will see, Baseline Spend, Negotiated Spend, Savings Amount are all shown for each row and I need to use these in a pivot/slicer.
Any help would be appreciated.
Regards,
Keith
I think one solution might be to "only keep the first [1] annual spend per project". More abstractly, "only keep the first value in column(s) X per column(s) Y".
Below is some mock/dummy data. I only want to keep the highlighted values in my annual spend column (as the highlighted values are the first "annual spend" figures per "project").
This is the M code I'm using to achieve this. (To try it, open the Query Editor > Advanced Editor (near top right) > copy-paste code below to there > OK).
let
    OnlyKeepFirstValueInColumn = (someTable as table, columnsToNullify as list) as table =>
        let
            firstRow = Table.FirstN(someTable, 1), // This assumes first row contains a non-blank value.
            remainingRows = Table.Skip(someTable, 1),
            loopAndNullify = List.Accumulate(columnsToNullify, remainingRows, (tableState, currentHeader) => Table.TransformColumns(tableState, {{currentHeader, each null}})),
            combined = firstRow & loopAndNullify
        in combined,
    FirstValueOfColumnsPerGroup = (someTable as table, groupByColumns as list, columnsToNullify as list) =>
        let
            group = Table.Group(someTable, groupByColumns, {{"toCombine", each OnlyKeepFirstValueInColumn(_, columnsToNullify), type table}}),
            combined = Table.Combine(group[toCombine])
        in combined,
    aggregatedTable = Table.FromColumns({Text.ToList("aaabbbccccdddeeefg"), List.Repeat({1000}, Text.Length("aaabbbccccdddeeefg"))}, type table [project=text, annual spend=number]),
    transformed = FirstValueOfColumnsPerGroup(aggregatedTable, {"project"}, {"annual spend"})
in
    transformed
The important bit to understand is this line:
transformed = FirstValueOfColumnsPerGroup(aggregatedTable, {"project"}, {"annual spend"})
in which you should replace:
aggregatedTable with whatever variable/expression contains your table
{"project"} with the name of your "project" column (keep the curly braces {} though as they let you pass in several columns if needed)
{"annual spend"} with the names of whichever column(s) you want to keep only the first value in (keep the curly braces {})
This is what I get (which I think is similar to what you want):
[1] To keep things simple, we'll say "first" here means the value in the first row. It could have meant "first non-null value" or "first value satisfying some particular condition or logic", but your data suggests the simpler definition will work okay.
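For readers less familiar with M, the same "keep only the first value per group, blank out the rest" idea can be sketched in plain Python. This is only an illustration of the logic, not a Power Query replacement; `keep_first_per_group` and the sample rows are hypothetical, and unlike the grouped M version it leaves the rows in their original order.

```python
def keep_first_per_group(rows, group_key, columns_to_null):
    """Return copies of rows where columns_to_null are None except in the
    first row seen for each value of group_key."""
    seen = set()
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        if row[group_key] in seen:
            for col in columns_to_null:
                new_row[col] = None
        else:
            seen.add(row[group_key])
        result.append(new_row)
    return result


# Hypothetical sample: annual spend repeated on every month row.
rows = [
    {"project": "a", "month": "Jan", "annual spend": 1000},
    {"project": "a", "month": "Feb", "annual spend": 1000},
    {"project": "b", "month": "Jan", "annual spend": 1000},
]
result = keep_first_per_group(rows, "project", ["annual spend"])
for r in result:
    print(r)
```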

Any simple way to do VLOOKUP combine "linear interpolation" in excel?

I'm making an excel sheet for calculating z-score for infant weight/age (Input: "Baby Month Age", and "Baby weight"). To do that, I need get LMS parameters first for a specific month, from below table.
http://www.who.int/childgrowth/standards/tab_wfa_boys_p_0_5.txt
(For Integer Month number, this can be done by vlookup Method without issue.) For Non-Integer Month number, I need use some kind of "linear interpolation" approach to get an approximate LMS data.
The problem is that neither the TREND method nor the VLOOKUP method works for me. TREND does not work because the raw data, such as the L parameter, is not linear; for the first several months, the returned values are far from the existing data. VLOOKUP just finds the closest month's data.
I had to use multiple "Match" and "Index" Method to do the "linear interpolation" for myself. However, I wonder whether there is any existing function for that?
My current formula for L parameters is below:
=MOD([Month Age],1)*(INDEX('WHO BOY AGE WEIGHT'!A:D,MATCH([Month Age],'WHO BOY AGE WEIGHT'!A:A)+1,2)-INDEX('WHO BOY AGE WEIGHT'!A:D,MATCH([Month Age],'WHO BOY AGE WEIGHT'!A:A),2))+INDEX('WHO BOY AGE WEIGHT'!A:D,MATCH([Month Age],'WHO BOY AGE WEIGHT'!A:A),2)
If we assume that months always increment by 1 (no gaps in the month data), you can use something like this formula to interpolate between the two values surrounding the given non-integer value:
=(1-MOD(2.3, 1))*VLOOKUP(2.3,A:S,2)+MOD(2.3, 1)*VLOOKUP(2.3+1,A:S, 2)
Which interpolates L(2.3) from data of L(2) = .197 and L(3) = .1738, resulting in .19004.
You can replace 2.3 by any cell reference. You can also change the lookup column 2 for L into 3 for M, 4 for S etc.
To answer the question whether there is some direct "interpolate" function in Excel, not that I know about, although there is good artillery for statistical estimation.
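To check the arithmetic, the same weighting can be written out in a few lines of Python. This is a minimal sketch that assumes, like the formula, that the months are contiguous integers; `interpolate` is a hypothetical helper, and the two L values are the ones quoted above.

```python
import math


def interpolate(table, x):
    """Linear interpolation in a dict keyed by contiguous integer months."""
    lower = math.floor(x)
    frac = x - lower  # same role as MOD(x, 1) in the Excel formula
    if frac == 0:
        return table[lower]  # exact month, no neighbour needed
    return (1 - frac) * table[lower] + frac * table[lower + 1]


# L values for months 2 and 3, as quoted in the answer.
l_values = {2: 0.1970, 3: 0.1738}
print(round(interpolate(l_values, 2.3), 5))
```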

Using tbl.Lookup to match just part of a column value

This question relates to the Schematiq add-in for Microsoft Excel.
Using =tbl.Lookup(table, columnsToSearch, valuesToFind, resultColumn, [defaultValue]) the values in the valuesToFind column have a consistent 3 characters to the left and then varying characters after (e.g. 908-123456 or 908-321654 - i.e. 908 is always consistent)
How can I tell the function to lookup the value based on the first 3 characters only? The expected answer should be the sum of the results of the above, i.e. 500 + 300 = 800
tbl.Lookup() works by looking for an exact match - this helps ensure it's fast but in this case it means you need an extra step to calculate a column of lookup values, something like this:
A2: =tbl.CalculateColumn(A1, "code", "x => LEFT(x, 3)", "startOfCode")
This will give you a new column that you can use for the columnsToSearch argument, however tbl.Lookup() also looks for just one match - it doesn't know how to combine values together if there is more than one matching row in the table, so I think you also need one more step to group your table by the first 3 chars of the code, like this:
A3: =tbl.Group(A2, "startOfCode", "amount")
Because tbl.Group() adds values together by default, this will give you a table with a row for each distinct value of startOfCode and the subtotal of amount for each of those values. Finally, you can do the lookup exactly as you requested, which for your input table will return 800:
A4: =tbl.Lookup(A3, "startOfCode", "908", "amount")
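The three steps (derive the prefix, group and sum, then look up) can also be sketched outside Schematiq. The following Python is only an illustration of the logic, not of the Schematiq API; `sum_by_prefix` is a hypothetical helper, and which code carries 500 and which carries 300 is an assumption, as is the extra third row.

```python
from collections import defaultdict

rows = [
    {"code": "908-123456", "amount": 500},
    {"code": "908-321654", "amount": 300},
    {"code": "777-000001", "amount": 42},  # hypothetical unrelated row
]


def sum_by_prefix(rows, prefix_len=3):
    """Group rows by the first prefix_len characters of 'code' and sum 'amount'."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["code"][:prefix_len]] += row["amount"]
    return dict(totals)


totals = sum_by_prefix(rows)
print(totals["908"])
```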
