How can I extract a total count or sum of agents who made their first sale in a specified month? - excel

I am trying to extract some data out of a large table of data in Excel. The table consists of the month, the agent's name, and either a 1 if they made a sale or a 0 if they did not.
What I would like to do is plug in a Month value into one cell, then have it spit out a count of how many agents made their first sale that month.
Sample Data and Input Output area
I found success by creating a secondary table that uses MINIFS, matched on agent name, to find each agent's first sale month, and then a COUNTIFS over that table to count how many first-sale months match the input month. However, I would like to avoid the secondary table and do this all in one go.
=IF(MINIFS(E2ERawData[Date Group],E2ERawData[Agent],'Processed Data'!B4,E2ERawData[E2E Participation],1)=0,"No Sales",MINIFS(E2ERawData[Date Group],E2ERawData[Agent],'Processed Data'!B4,E2ERawData[E2E Participation],1))
=COUNTIFS(ProcessedData[Month of First E2E Sale],H4)

Formula in column F is:
=MAX(0;COUNTIFS($A$2:$A$8;E3;$C$2:$C$8;1)-SUM(COUNTIFS($A$2:$A$8;"<"&E3;$C$2:$C$8;1;$B$2:$B$8;IF($A$2:$A$8=E3;$B$2:$B$8))))
This is how it works (using 01/03/2022 as the example):
1. COUNTIFS($A$2:$A$8;E3;$C$2:$C$8;1) counts how many 1s there are for the chosen month (in our example, this part returns 2).
2. COUNTIFS($A$2:$A$8;"<"&E3;$C$2:$C$8;1;$B$2:$B$8;IF($A$2:$A$8=E3;$B$2:$B$8)) counts how many 1s the same agents had in earlier months (in our example, it returns 1).
3. Because step 2 returns an array, we add it up with SUM() (in our example, this returns 1).
4. We subtract the result of step 3 from the result of step 1 (so we get 1).
5. Finally, everything is wrapped in MAX to avoid negative results. February would otherwise return -1, because there were no sales at all that month and agent B made a sale in January. MAX forces Excel to show the larger of 0 and our calculation.
NOTE: Because this is an array formula, depending on your Excel version you may have to enter it with CTRL+SHIFT+ENTER.

If one has got access to the newest functions:
=LET(X,UNIQUE(C3:C9),VSTACK({"Month","Total of First time sales"},HSTACK(X,BYROW(X,LAMBDA(a,SUM((C3:C9=a)*(MINIFS(C3:C9,D3:D9,D3:D9,E3:E9,1)=C3:C9)))))))
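For readers who want to check the logic outside Excel, here is a minimal Python sketch of the same idea as the helper table and the LET formula: find each agent's earliest month with a sale, then count how many agents share each first-sale month. The function name and the sample rows are hypothetical, not the data from the question, and months are assumed to be strings that sort chronologically.

```python
from collections import Counter

def first_sale_counts(rows):
    """Count, per month, the agents whose earliest sale falls in that month.

    rows: iterable of (month, agent, sold), where month sorts chronologically
    (for example "2022-01") and sold is 1 or 0.
    """
    first_month = {}  # agent -> earliest month with a sale (the MINIFS step)
    for month, agent, sold in rows:
        if sold == 1 and (agent not in first_month or month < first_month[agent]):
            first_month[agent] = month
    # The COUNTIFS step: how many agents share each first-sale month.
    return Counter(first_month.values())

# Hypothetical sample data, not the data from the question.
rows = [
    ("2022-01", "A", 0), ("2022-01", "B", 1),
    ("2022-02", "A", 0), ("2022-02", "B", 0),
    ("2022-03", "A", 1), ("2022-03", "B", 1), ("2022-03", "C", 1),
]
counts = first_sale_counts(rows)
print(counts["2022-03"])  # agents A and C made their first sale in March
```

A month with no first sales simply returns 0, so no MAX-style guard against negative results is needed with this approach.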

Related

Sort data contained in blocks in excel

I have a large amount of reference data in excel, which I am trying to manipulate in a variety of ways. I'm having some problems with the way it is structured and sorting into a more manageable format.
Problem number 1:
I have three columns. Column A contains first a date, and then a designator of high or low. Column B contains times, Column C contains heights.
I would like to sort the data by column B (easy enough) EXCEPT I would like the date headings in Column A preserved. It's almost as though I have 365 tables, each with between 3 and 5 pieces of data - I'm looking to sort the 3 - 5 pieces of data within each date only.
This is what I have currently:
There's no issue with me taking the data and manipulating it some other way first - this is ultimately around me being able to take a batch of data (5x different reference points, each for 365 days) and develop a process to sanitise it and get it displayed in time order, as well as being able to get it into a usable format for problem 2 (I need to adjust some other data points by the sorted data once I have it).
This is what I would like it to look like (I manually went through each of these blocks and sorted them):
It is possible to do it in Excel as follows in cell E2:
=LET(rng, A1:C11, set, FILTER(rng, (INDEX(rng,,1) <>"")),
dates, SCAN("", INDEX(set,,1), LAMBDA(acc, item, IF(ISNUMBER(item), item, acc))),
in, FILTER(HSTACK(dates, set), INDEX(set,,2)<>""), inDates, INDEX(in,,1),
out, REDUCE("", UNIQUE(inDates), LAMBDA(acc, date,
LET(sorted, VSTACK(date, DROP(SORT(FILTER(in, inDates = date),3),,1), {"","",""}),
VSTACK(acc, sorted)
))), IFERROR(DROP(DROP(out,1),-1),"")
)
Here is the output:
You can avoid the clean-up process, except for removing the last row, as follows:
=LET(rng, A1:C11, set, FILTER(rng, (INDEX(rng,,1) <>"")),
dates, SCAN("", INDEX(set,,1), LAMBDA(acc, item, IF(ISNUMBER(item), item, acc))),
in, FILTER(HSTACK(dates, set), INDEX(set,,2)<>""), inDates, INDEX(in,,1),
out, REDUCE("", UNIQUE(inDates), LAMBDA(acc, date,
LET(sorted, VSTACK(HSTACK(date,"",""), DROP(SORT(FILTER(in, inDates = date),3),,1),
{"","",""}), IF(MAX(LEN(acc))=0, sorted, VSTACK(acc, sorted))
))), DROP(out, -1)
)
Explanation
The idea is to carry out the manual steps with Excel functions. The name set is the input data (rng) with the empty rows removed. The name dates is a column with the same number of rows as set, in which every row carries the date of the block it belongs to. The condition used in SCAN to detect a new date is ISNUMBER, because Excel stores dates as numbers. The name in holds the data in the shape we need for sorting and for filtering by date: the date header rows are removed and the dates are added as the first column.
Now we use the DROP/REDUCE/VSTACK pattern (see the answer to the question "how to transform a table in Excel from vertical to horizontal but with different length", provided by David Leal) to append the sorted data for each unique date. For each date we add the date as the first row, then the sorted data, and finally an empty row to separate the groups. In the first formula, a final clean-up via IFERROR/DROP replaces the #N/A values with empty strings and drops the first and last empty rows.
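The same steps can be sketched in Python as a sanity check of the logic. This is only an illustration under assumptions: the function name is hypothetical, header rows are assumed to hold a date in the first column with the other two empty, fully empty rows are separators, and the input is assumed to start with a date row. Unlike the formula, it groups by consecutive blocks instead of by unique dates.

```python
from datetime import date, time

def sort_blocks(rows):
    """Sort the data rows inside each date block by the second column (time).

    rows: list of (a, b, c) tuples. A header row has a date in a and None in
    b and c; a data row has a designator in a; separator rows are all None.
    """
    blocks = []  # list of (block date, list of data rows)
    for a, b, c in rows:
        if a is None:
            continue  # drop empty separator rows (the first FILTER step)
        if isinstance(a, date):
            blocks.append((a, []))  # a date starts a new block
        else:
            blocks[-1][1].append((a, b, c))  # data row belongs to the latest date
    out = []
    for block_date, data in blocks:
        out.append((block_date, None, None))          # date header first
        out.extend(sorted(data, key=lambda r: r[1]))  # then rows sorted by time
        out.append((None, None, None))                # empty row between blocks
    return out[:-1]  # drop the trailing empty row, like DROP(out, -1)

# Hypothetical sample data.
rows = [
    (date(2023, 1, 1), None, None),
    ("High", time(14, 5), 3.2),
    ("Low", time(2, 10), 0.4),
    ("High", time(8, 30), 3.0),
    (None, None, None),
    (date(2023, 1, 2), None, None),
    ("Low", time(15, 0), 0.5),
    ("High", time(9, 15), 3.1),
]
result = sort_blocks(rows)
```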

How to define an array of numbers with a formula

I have a project where I need to break people into 3 buckets with task lists that rotate quarterly (Phase A = task list 1, B = task list 2, C = task list 3). The goal here is to sort people into the buckets based on a departure date, with the ideal being that they would depart when they're in the C phase. I have a formula already set up that will tell me the number of quarters between the project start date and the person's departure date, so now I'm trying to figure out how to get Excel to tell me if a person's departure date falls within their bucket's C Phase.
I have this formula in a column called DEROSQtr: =ROUNDDOWN(DAYS360("1-Oct-2020",[@DEROS],FALSE)/90,0)
Now the easy way to approach this would be to build a static array and just see if that formula results in a value in the right array, where the numbers in the array define which quarter from Oct 2020 that the bucket's C Phase is going to be in:
ArrayA = {1;4;7;10;13;16} ArrayB = {2;5;8;11;14;17} ArrayC = {0;3;6;9;12;15}
The formula that pulls this all together is then:
=IF([@EFP]="A",IF(IFNA(MATCH([@DEROSQtr],ArrayA,0),-1)<>-1,TRUE,FALSE),IF([@EFP]="B",IF(IFNA(MATCH([@DEROSQtr],ArrayB,0),-1)<>-1,TRUE,FALSE),IF([@EFP]="C",IF(IFNA(MATCH([@DEROSQtr],ArrayC,0),-1)<>-1,TRUE,FALSE),"-")))
Now while this will work for as long as I build out the static array, I'm trying to figure out how to define each of these buckets with a formula that Excel can work with, i.e. bucket A hits phase C in 3n + 1 quarters where n is the number of cycles through all 3 phases, so ArrayA = 3n+1, ArrayB = 3n+2 and ArrayC = 3n. What I'm hunting for here is the best way to define each of the arrays as a formula.
After some additional digging and looking back at how each array is defined, I came across the MOD() function in Excel. I was then able to rewrite the formula that does the checking as =IF([@EFP]="A",IF(MOD([@DEROSQtr]-1,3)=0,TRUE,FALSE),IF([@EFP]="B",IF(MOD([@DEROSQtr]-2,3)=0,TRUE,FALSE),IF([@EFP]="C",IF(MOD([@DEROSQtr],3)=0,TRUE,FALSE),"-"))), replacing the ArrayA (3n+1) lookup with MOD([@DEROSQtr]-1,3)=0, the ArrayB (3n+2) lookup with MOD([@DEROSQtr]-2,3)=0, and the ArrayC (3n) lookup with MOD([@DEROSQtr],3)=0.
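As a quick check that the MOD test matches the static arrays, here is a small Python sketch. The function and dictionary names are hypothetical; the offsets (A = 3n+1, B = 3n+2, C = 3n) and the array values come from the question.

```python
# Offset of each bucket's C phase within the 3-quarter cycle (from the question).
C_PHASE_OFFSET = {"A": 1, "B": 2, "C": 0}

def departs_in_c_phase(bucket, quarter):
    """Return True/False for a known bucket, or "-" otherwise (as in the formula)."""
    if bucket not in C_PHASE_OFFSET:
        return "-"
    # Same test as MOD(quarter - offset, 3) = 0 in Excel.
    return (quarter - C_PHASE_OFFSET[bucket]) % 3 == 0

# The static arrays from the question, used here only for comparison.
STATIC_ARRAYS = {
    "A": [1, 4, 7, 10, 13, 16],
    "B": [2, 5, 8, 11, 14, 17],
    "C": [0, 3, 6, 9, 12, 15],
}

# For each bucket, list the quarters 0..17 that pass the MOD test.
matches = {
    bucket: [q for q in range(18) if departs_in_c_phase(bucket, q) is True]
    for bucket in STATIC_ARRAYS
}
print(matches == STATIC_ARRAYS)
```

Python's % and Excel's MOD both return a non-negative remainder for a positive divisor, so quarter 0 in bucket A (0 - 1 = -1) is handled the same way in both.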
Since I do not have the data you are using to calculate your quarter, it's difficult to give you an exact answer. However, as I understand it, you have a column containing the formula that calculates the quarter, say "Formula_Col".
The solution would be to add a new column and flag it based on the values in "Formula_Col".
If you can give some sample data, I can provide an exact answer.

Excel CUBEVALUE & CUBESET count records greater than a number

I am writing a series of queries to my workbook's data model to retrieve the number of documents by Category_Name which are greater than a certain numbers of days old (e.g. >=650).
Currently this formula (entered in cell C3) returns the correct number for a single Days Old value (=3).
=CUBEVALUE("ThisWorkbookDataModel",
"[Measures].[Count of Docs]",
"[EDD_Report].[Category_Name].&["&$B2&"]",
"[EDD_Report_10-01-18].[Days Old].[34]")
How do I return the number of documents for Days Old values >=650?
The worksheet looks like:
   A          B    C
1  Date       PL   Count of Docs
2  10/1/2018  ALD  3
3  ...
UPDATE: The expression in step B of @ama's answer below did not work.
However, I created a subset of the Days Old values using
=CUBESET("ThisWorkbookDataModel",
"{[EDD_Report_10-01-18].[Days Old].[all].[650]:[EDD_Report_10-01-18].[Days Old].[All].[3647]}")
The cell containing this cubeset is referenced as the third Member_expression of the original CUBEVALUE formula. The limitation is now that the values for the beginning and end must be members of the Days Old set.
This is limiting, in that, I was hoping for a more general test for >=650 and there is no way to guarantee that specific values of Days Old will be in the query.
This is the first time I have heard of the CUBE functions, so you got me curious and I did some digging. I am definitely not an expert, but here is what I found:
MDX language should allow you to provide value ranges in the form of {[Table].[Field].[All].[LowerBound]:[Table].[Field].[All].[UpperBound]}.
A. Get the total number of entries:
D3 =CUBEVALUE("ThisWorkbookDataModel",
"[Measures].[Count of Docs]",
"[EDD_Report].[Category_Name].&["&$B2&"]",
"[EDD_Report_10-01-18].[Days Old].[All]")
B. Get the number of entries less than 650:
E3 =CUBEVALUE("ThisWorkbookDataModel",
"[Measures].[Count of Docs]",
"[EDD_Report].[Category_Name].&["&$B2&"]",
"{[EDD_Report_10-01-18].[Days Old].[All].[0]:[EDD_Report_10-01-18].[Days Old].[All].[649]}")
Note I found something about using .[All].[650].lag(1)} but I think for it to work properly your data might need to be sorted?
C. Subtract
C3 =D3-E3
Alternatively, go for the quick and dirty:
=CUBEVALUE("ThisWorkbookDataModel",
"[Measures].[Count of Docs]",
"[EDD_Report].[Category_Name].&["&$B2&"]",
"{[EDD_Report_10-01-18].[Days Old].[All].[650]:[EDD_Report_10-01-18].[Days Old].[All].[99999]}")
Hope this helps and do let me know, I am still curious!

Excel: Match 2 Items, Including if Date is Between a Date Span on Multiple Worksheets

I have 2 worksheets:
Worksheet 1: Member ID, Engagement Date and other data.
Worksheet 2: Member ID, Policy Begin Date, Policy End Date and other data.
On Worksheet 1, I want to return a policy type name (from Worksheet 2), if Worksheet 1's Member ID matches Worksheet 2's Member ID AND if Worksheet 1 Engagement Date falls between Worksheet 2's Policy Begin and End Date...
The following is the formula I tried and also have attempted extensive research, to no avail:
=INDEX('Program Data'!M2:M25671,MATCH(A2:A489&F2>='Program Data'K2&F2<='Program Data'L2,'Program Data'!E2:E25671,0))
Replace your MATCH with an AGGREGATE that calculates the SMALLest row where the Member ID and dates all match up.
So, we want AGGREGATE(15, 6, <SOME CODE>, 1) to get the Smallest non-error value in a list created by <SOME CODE>
The first thing that <SOME CODE> will want is the ROW we are looking at (minus one, because I see that you are skipping the header row...) which is ROW(Sheet2!$A$2:$A$25671)-1
If the Row does not match, we want to either make it massive or make it error (even better, because then it gets completely skipped). How to do this? Well, I have the POWER function for that. If you try POWER(10,999) you get a #NUM! error, because 10^999 is too large a number for Excel. If you try POWER(0,999) you get 0, because 0^anything is 0. So, we'll just add some POWER to our ROWs to make it error-out when we don't want them.
But now we need to decide between 10 and 0. Fortunately, logical values can be multiplied as numbers (TRUE = 1 and FALSE = 0).
So, --(Sheet2!$A$2:$A$25671=$A2) will give us 1 when the First Columns (Member ID) match, --(Sheet2!$B$2:$B$25671<=$B2) will make sure that the Policy Start Date is before-or-on the record date, and --(Sheet2!$C$2:$C$25671>=$B2) will check the End Date. Multiply it all together, and we get 1 when the row matches, and 0 when it doesn't.
Now, if we take that away from 1, we get the opposite: 0 when we want 0, and 1 when we want 10. So multiply that by 10, and shove it in the POWER function, and add that to the ROW to get <SOME CODE>. Dump it all in the AGGREGATE from the start, and voila
=AGGREGATE(15,6,ROW(Sheet2!$A$2:$A$25671)-1+(POWER(10*(1-(Sheet2!$A$2:$A$25671=$A2)*--(Sheet2!$B$2:$B$25671<=$B2)*--(Sheet2!$C$2:$C$25671>=$B2)),999)),1)
Then, just use that in place of your MATCH (it will generate a #NUM! error when no policy is found)
=INDEX(Sheet2!M2:M25671,AGGREGATE(15,6,ROW(Sheet2!$A$2:$A$25671)-1+(POWER(10*(1-(Sheet2!$A$2:$A$25671=$A2)*--(Sheet2!$B$2:$B$25671<=$B2)*--(Sheet2!$C$2:$C$25671>=$B2)),999)),1),1)
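What the AGGREGATE/POWER trick computes can be sketched in Python: scan the policy rows in order and return the policy type from the first row where the Member ID matches and the engagement date falls between the begin and end dates. The function name, tuple layout, and sample data are hypothetical, and the sketch returns None where the Excel formula would produce a #NUM! error.

```python
from datetime import date

def find_policy_type(member_id, engagement, policies):
    """Return the policy type of the first matching row, or None if none match.

    policies: list of (member_id, begin_date, end_date, policy_type) tuples,
    in worksheet order.
    """
    for pid, begin, end, policy_type in policies:
        # All three conditions must hold, like multiplying the three tests.
        if pid == member_id and begin <= engagement <= end:
            return policy_type  # first match = smallest matching row
    return None

# Hypothetical sample data.
policies = [
    (101, date(2020, 1, 1), date(2020, 12, 31), "Basic"),
    (101, date(2021, 1, 1), date(2021, 12, 31), "Premium"),
    (202, date(2021, 3, 1), date(2022, 2, 28), "Basic"),
]
found = find_policy_type(101, date(2021, 6, 15), policies)
missing = find_policy_type(202, date(2020, 6, 15), policies)
```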

Using tbl.Lookup to match just part of a column value

This question relates to the Schematiq add-in for Microsoft Excel.
Using =tbl.Lookup(table, columnsToSearch, valuesToFind, resultColumn, [defaultValue]) the values in the valuesToFind column have a consistent 3 characters to the left and then varying characters after (e.g. 908-123456 or 908-321654 - i.e. 908 is always consistent)
How can I tell the function to lookup the value based on the first 3 characters only? The expected answer should be the sum of the results of the above, i.e. 500 + 300 = 800
tbl.Lookup() works by looking for an exact match - this helps ensure it's fast but in this case it means you need an extra step to calculate a column of lookup values, something like this:
A2: =tbl.CalculateColumn(A1, "code", "x => LEFT(x, 3)", "startOfCode")
This will give you a new column that you can use for the columnsToSearch argument, however tbl.Lookup() also looks for just one match - it doesn't know how to combine values together if there is more than one matching row in the table, so I think you also need one more step to group your table by the first 3 chars of the code, like this:
A3: =tbl.Group(A2, "startOfCode", "amount")
Because tbl.Group() adds values together by default, this will give you a table with a row for each distinct value of startOfCode and the subtotal of amount for each of those values. Finally, you can do the lookup exactly as you requested, which for your input table will return 800:
A4: =tbl.Lookup(A3, "startOfCode", "908", "amount")
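The three steps (calculate a prefix column, group and sum, then look up) can be illustrated in plain Python. This is not Schematiq code; the function name is hypothetical, and so is the assignment of 500 and 300 to the two codes, since the question only gives the total of 800.

```python
from collections import defaultdict

def sum_by_prefix(rows, length=3):
    """Group amounts by the first `length` characters of each code and sum them."""
    totals = defaultdict(int)
    for code, amount in rows:
        totals[code[:length]] += amount  # LEFT(code, 3), then add to the subtotal
    return dict(totals)

# Hypothetical sample table: (code, amount).
rows = [("908-123456", 500), ("908-321654", 300), ("777-000001", 50)]
totals = sum_by_prefix(rows)
print(totals.get("908"))  # the exact-match lookup on the grouped table
```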
