match substring against text values arranged in a excel table

match substring against text values arranged in a excel table - excel

I am trying to achieve this result: assign a category to a document based on its title, or part of its title.
Title
Category
correspondence
Correspondence
Note Transmission Correspondence
Correspondence
Advisors Evaluation Report
Report
Country Notes
Correspondence
Annual Portfolio Report
Report
Appointment Letter
Correspondence
The categories are arranged into a table (docCategories) where each row starts with a unique category name, and is followed by a set of labels that match entirely or partially with the document title.
Category
Label
Label2
Label3
Label4
Correspondence
Letter
Memo
Note
Correspondence
Report
Dashboard
Report
The formula will take the document title and check if it matches any of the labels (with wild cards), so to return the unique category in the first position in the same row of the matched label.
Appointment Letter -> matches label:letter -> cat:Correspondence
I have made it working with this formula to be copied in the Category column:
=INDEX(docCategories;MIN(IF(docCategories=A2;ROW(docCategories)))-1;MIN(IF(docCategories=A2;1)))
And only if the title is exact matching of the entire label (e.g. Correspondence -> matches label:correspondence -> cat:Correspondence).
I am looking to have it working for matching on part of the title (e.g. Appointment Letter -> matches label:letter -> cat:Correspondence).
I have tried and failed to change the docCategories=<title> into something that can match the substring of the title, even applying the SPLITEXT(<title>) it still fails to give me the expected result.
Who can think of a creative solution for this?

The following solution works for any number of categories and for any number of labels on any category. It also identifies if no labels were found and also if more than one label was found from a different category. Since the question doesn't specify any specific excel version tag I assume Microsoft Office 365 function can be used.
On cell I2 put the following formula:
=LET(rng, A2:E3, texts, G2:G9, lkupValues, B2:E3, categories, INDEX(rng,,1),
BYROW(texts, LAMBDA(text,LET(
reduceResult, REDUCE("",categories, LAMBDA(acc,c, LET(
lkup, XLOOKUP(c,categories, lkupValues), searchLabels, FILTER(lkup, lkup<>0),
IF(SUM(N(ISNUMBER(SEARCH(searchLabels,text))))=0, acc,
IF(acc="", c, "MORE THAN ONE CATEGORY FOUND"))
))), IF(reduceResult<>"", reduceResult, "CATEGORY NOT FOUND")
)))
)
and here is the corresponding output:
The last two rows Title column were added to test the Non-Happy paths.
Explanation
We use LET function to define the names to be used and to avoid repeating the same calculation. If in your excel version you have DROP function available, then the name: lkupValues can be defined as follow: DROP(rng,,1).
The main idea is to iterate over texts values via BYROW and for each text we invoke SEARCH function for all categories. When the first input argument of SEARCH is an array, it returns an array of the same shape indicating the start of the index position of the labels found in text or #VALUE! if no labels were found.
Note: SEARCH is not case sensitive, if that is not the case, then replace it with FIND.
We use REDUCE function to iterate over all categories to find a match. For each category (c) we find the corresponding labels via XLOOKUP. Since not all categories have the same number of labels, for example Report has fewer labels than the Correspondence category. We need to adjust it to remove empty labels. The name searchLabels filters the result to only non-empty labels.
For checking if labels were not found we use the following condition:
SUM(N(ISNUMBER(SEARCH(searchLabels,text))))=0
ISNUMBER converts the SEARCH result to TRUE/FALSE values. N function converts the result to equivalent 0,1 values.
If the condition is TRUE, it returns the accumulator (acc initialized to an empty string). If the condition is FALSE, some labels were found, then it returns the category (c) if acc is empty, i.e. no previous categories were found. If acc is not empty any previous category was found, so it returns MORE THAN ONE CATEGORY FOUND.
Finally, if the result of REDUCE (reduceResult) is an empty string, it means the accumulator was not updated after initialization, so no labels were found for any category and it is indicated with the output: CATEGORY NOT FOUND.

Related

EXCEL - Dual VLOOKUP and Interpolation

I have a table on Excel with data as the following:
Meaning, I have different JPH based on the %SMALL unit and the number of active stations.
I need to create a matrix like the following (with %SMALL on horizontal and STATIONS on vertical axes):
And the formula for each cell should:
Take the input of Stations (column "B")
Check, for that specific Stations number, the amount of data on the other table (like make a filter on STATIONS for the specific number)
Perform an VLOOKUP for checking the JPH based on the %SMALL value on row 2
Interpolate for the exact JPH value, if not found on table
For now, I was able to create the last part (the VLOOKUP and the interpolation), with the following:
=IFERROR(VLOOKUP(C2;'EARLY-STATIONS'!$F:$H;3;FALSE);AVERAGE(OFFSET(INDEX('EARLY-STATIONS'!$H:$H;MATCH(C2;'EARLY-STATIONS'!$F:$F;1));0;0;2;1)))
The problem I'm facing is than with this, the calculation is not checking the number of stations, so the Iteration is not accurate.
Unfortunately I cannot use VBA macros to solve this.
Any clue?

This is an attempt because more clarity is needed in terms of all possible scenarios to consider, based on different input data and how to understand the "extrapolation" process. This approach understands as extrapolation the average of two values (lower and greater), but the idea can be customized to any other way to calculate it. Per tags listed in the question I assume there is no Excel version constraint. This is O365 solution:
=LET(sm, A2:A10, st, B2:B10, jph, C2:C10, smx, F1:J1, sty, E2:E4, NULL, "",
GETLk, LAMBDA(x,y,mode, FILTER(jph, (st=y)
* (sm = INDEX(sm, XMATCH(x, sm, mode))), NULL)),
GET, LAMBDA(x,y, LET(f, FILTER(jph, (jph=GETLk(x,y, 1))
+ (jph=GETLk(x,y, -1)), NULL), IF(#f=NULL, NULL, AVERAGE(f)))),
HREDUCE, LAMBDA(yi, DROP(REDUCE("", smx, LAMBDA(ac,x,
HSTACK(ac, GET(x, yi)))),,1)),
DROP(REDUCE("", sty, LAMBDA(ac,y, VSTACK(ac, HREDUCE(y)))),1))
The above formula spills the entire result, I don't think for this case you can use a LOOKUP-like function.
Here is the output:
The highlighted cells where the average is calculated.
Explanation
The main idea is to use DROP/REDUCE/HSTACK/VSTACK pattern to generate the grid. Check my answer to the following question: how to transform a table in Excel from vertical to horizontal but with different length on how to apply it.
We use two user LAMBDA functions to abstract some calculations:
GETLk(x,y,mode), filters jph name based on %SMALL and Stations columns values, based on input values x (x-axis value from the grid), y (y-axis value form the grid) respectively. The third input argument mode, is for doing the approximate search in XMATCH (1-next largest, -1 next smallest). In case the value exist in the input table, XMATCH returns the same value in both cases.
GET(x,y) has the logic to find the value or if the value doesn't exist to calculate the average. It uses the previous LAMBDA function GETLk. We filter for jph values that match the input values (x,y), but we use an OR condition in the FILTER (+), to select both lower or greater values. If the value exist, returns just one value otherwise two values are returned by FILTER (f). Finally if f is not empty we return the average, otherwise the value we setup as NULL.
HREDUCE: Concatenate the result by columns for a given row of the grid. Check the referred question for more information about it.

Comparing two columns and their values and outputting the greater value

I'm trying to compare two columns ("Shows") from different tables and showing which one has the greater number ("Rating") associated with it in another table.
Ignore the operation column above as part of the solution that I'm trying to get, it's just to illustrate for you what I'm trying to compare.
Important note: If the names are duplicated. Compare the matching pair in their corresponding order. (1st with 1st, 2nd with 2nd, 3rd with 3rd etc..) illustrated in the table below:
Thanks

You can try the following in cell F3 for an array solution that spills the entire result at once:
=LET(sA, A3:A6, rA, B3:B6, sB, C3:C6, rB, D3:D6, CNTS, LAMBDA(x,
LET(seq, SEQUENCE(ROWS(x)), MAP(seq, LAMBDA(s,ROWS(FILTER(x,(x=INDEX(x,s))
*(seq<=s))))))), cntsA, CNTS(sA), cntsB, CNTS(sB), eval, MAP(sA, rA, cntsA,
LAMBDA(s,r,c,IF(r > FILTER(rB, (sB=s) * (cntsB=c)), "Table 1", "Table 2"))),
HSTACK(sA, eval))
Here is the output:
Explanation
The main idea is to count repeated show values. We use a user LAMBDA function CNTS, to avoid repetition of the same formula twice. Once we have the counts (cntsA, contsB), we use MAP to iterate over Table 1 elements with the counts and look for specific show and counts to compare with Table 2 columns. The FILTER function will return always a single value (based on sample data). Finally, we prepare the output as expected using HSTACK.

Try-
=IF(INDEX(FILTER($B$3:$B$6,$A$3:$A$6=G3),COUNTIFS($G$3:$G3,G3))>INDEX(FILTER($E$3:$E$6,$D$3:$D$6=G3),COUNTIFS($G$3:$G3,G3)),"Table-1","Table-2")

how to transform a table in Excel from vertical to horizontal but with different length

i would like to get table 2 from Table 1 in a quicker way. can people help? thanks
so far i have done pivot tables and manually copy and paste transpose, but this is really time consuming/

Here, a solution that uses DROP/REDUCE/VSTACK pattern to generate each row. Check for example #JvdV's answer from this question: How to split texts from dynamic range? and a similar idea DROP/REDUCE/HSTACK pattern to generate the columns for a given row. In cell E2 put the following formula:
=LET(set, A2:B13, IDs, INDEX(set,,1), dates, INDEX(set,,2),
HREDUCE, LAMBDA(id, arr, REDUCE(id, arr, LAMBDA(acc, x, HSTACK(acc, x)))),
output, DROP(REDUCE("", UNIQUE(IDs), LAMBDA(ac, id, VSTACK(ac, LET(
idDates, FILTER(dates, ISNUMBER(XMATCH(IDs, id))), HREDUCE(id, idDates)
)))),1), IFERROR(VSTACK(HSTACK("ID", "Dates"), output), "")
)
and here is the output:
Update
As #JdvD pointed out in the comments section there is a shorted way:
=LET(set, A3:B13, title, A1:B1, IDs, INDEX(set,,1), dates, INDEX(set,,2),
IFERROR(REDUCE(title, UNIQUE(IDs),LAMBDA(ac, id,
VSTACK(ac,HSTACK(id,TOROW(FILTER(dates,IDs=id)))))),"")
)
The main idea is to use the title as a way to initialize the VSTACK accumulator (no need to use DROP), and have all the dates for a given id all at once via the FILTER function. As a side note, it can be expressed in terms of the pattern we explained in the Explanation section (see below), as follow:
=LET(set, A3:B13, title, A1:B1, IDs, INDEX(set,,1), dates, INDEX(set,,2),
HREDUCE, LAMBDA(id, HSTACK(id, TOROW(FILTER(dates,IDs=id)))),
IFERROR(REDUCE(title, UNIQUE(IDs),LAMBDA(ac,id, VSTACK(ac, HREDUCE(id)))),"")
)
Note: Keeping the same name of the user LAMBDA function (HREDUCE) for sake of consistency with the Explanation section, but there is no need to use REDUCE. A more appropriate name would be PIVOT_DATES.
Explanation
HREDUCE is a user LAMBDA function that implements the DROP/REDUCE/HSTACK pattern. In order to generate all the columns for a given row, this is the pattern to follow:
DROP(REDUCE("", arr, LAMBDA(acc, x, HSTACK(acc, func))),,1)
It iterates over all elements of arr (x) and uses HSTACK to concatenate column by column on each iteration. DROP function is used to remove the first column, if we don't have a valid value to initialize the first column (the accumulator, acc). The name func is just a symbolic representation of the calculation required to obtain the value to put on a given column. Usually, some variables are required to be defined, so quite often the LET function is used for that.
In our case we have a valid value to initialize the iteration process (no need to use DROP function), so this pattern can be implemented as follow via our user LAMBDA function HREDUCE:
LAMBDA(id, arr, REDUCE(id, arr, LAMBDA(acc, x, HSTACK(acc, x))))
In our case the initialization value will be each unique id value. The func will be just each element of arr, because we don't need to do any additional calculation to obtain the column value.
The previous process can be applied for a given row, but we need to create iteratively each row. In order to do that we use a DROP/REDUCE/VSTACK pattern, which is a similar idea:
DROP(REDUCE("", arr, LAMBDA(acc, x, VSTACK(acc, func))),1)
Now we append rows via VSTACK. For this case we don't know how to initialize properly the accumulator (acc), so we need to use DROP to remove the first row. Now fun will be: HREDUCE(id, idDates), i.e. the LAMBDA function we created before to generate all the dates columns for a given id. Now we use a LET function to name the selected dates for a given id (idDates).
At the beginning of each row (first column), we are going to have the unique IDs (UNIQUE(IDs)). To find the corresponding dates for each unique ID (id) we use the following:
FILTER(dates, ISNUMBER(XMATCH(IDs, id)))
and name the result idDates.
Finally, we build the output including the header. We pad non existing values with the empty string to avoid having #NA values. This is the default behavior of V/HSTACK functions. We use IFERROR function for that.
IFERROR(VSTACK(HSTACK("ID", "Dates"), output), "")
Note: Both patterns are very useful to avoid Nested Array Error (#CALC!) usually produced by some of the new Excel array functions, such as BYROW, BYCOL, MAP when using TEXTSPLIT for example. This is one of the effective ways to overcome it.

Excel: Applying dynamic array function UNIQUE in excel to a table with rows & columns >1

I have a survey of rents in the table called "Rent Survey Data" which is defined as a name "TBL_RentSurvey".
I use the new dynamic array function "FILTER" to output data that is filtered with various AND/OR criteria according to the fields: W/D, Attchd Garage, SS Appliances, Den, Rehab, and a start/end date. Note that "*" operator refers to AND and "+" operator refers to OR. I have 2 rows for my filter because sometimes I would like to output various combinations of 1 criteria and hence use the "+" operator.
When I apply the UNIQUE function (also a new dynamic array function), I do not get a unique array outputted. You can see that Vizcaya is outputted twice.
Can someone please tell me what I am doing wrong. I was expecting to have a unique table based on all of those arguments.
You can see below that Vizcaya was outputted twice. I'm not sure why.
I was expecting this output:
Code for the two scenarios:
((TBL_RentSurvey[W/D]=C4)+(TBL_RentSurvey[W/D]=C5))
*((TBL_RentSurvey[Attchd Garage]=D4)+(TBL_RentSurvey[Attchd Garage]=D5))
*((TBL_RentSurvey[SS Appliances]=E4)+(TBL_RentSurvey[SS Appliances]=E5))
*((TBL_RentSurvey[Den]=F4)+(TBL_RentSurvey[Den]=F5))
*((TBL_RentSurvey[Rehab]=G4)+(TBL_RentSurvey[Rehab]=G5))
*(TBL_RentSurvey[Unit Type]=H4)
*(TBL_RentSurvey[[Date ]]>=L4)
*(TBL_RentSurvey[[Date ]]<=L5))
=UNIQUE(FILTER(TBL_RentSurvey[[Property]:[Unit Type]],
((TBL_RentSurvey[W/D]=C4)+(TBL_RentSurvey[W/D]=C5))
*((TBL_RentSurvey[Attchd Garage]=D4)+(TBL_RentSurvey[Attchd Garage]=D5))
*((TBL_RentSurvey[SS Appliances]=E4)+(TBL_RentSurvey[SS Appliances]=E5))
*((TBL_RentSurvey[Den]=F4)+(TBL_RentSurvey[Den]=F5))
*((TBL_RentSurvey[Rehab]=G4)+(TBL_RentSurvey[Rehab]=G5))
*(TBL_RentSurvey[Unit Type]=H4)
*(TBL_RentSurvey[[Date ]]>=L4)
*(TBL_RentSurvey[[Date ]]<=L5)))
Thank you!

Using tbl.Lookup to match just part of a column value

This question relates to the Schematiq add-in for Microsoft Excel.
Using =tbl.Lookup(table, columnsToSearch, valuesToFind, resultColumn, [defaultValue]) the values in the valuesToFind column have a consistent 3 characters to the left and then varying characters after (e.g. 908-123456 or 908-321654 - i.e. 908 is always consistent)
How can I tell the function to lookup the value based on the first 3 characters only? The expected answer should be the sum of the results of the above, i.e. 500 + 300 = 800

tbl.Lookup() works by looking for an exact match - this helps ensure it's fast but in this case it means you need an extra step to calculate a column of lookup values, something like this:
A2: =tbl.CalculateColumn(A1, "code", "x => LEFT(x, 3)", "startOfCode")
This will give you a new column that you can use for the columnsToSearch argument, however tbl.Lookup() also looks for just one match - it doesn't know how to combine values together if there is more than one matching row in the table, so I think you also need one more step to group your table by the first 3 chars of the code, like this:
A3: =tbl.Group(A2, "startOfCode", "amount")
Because tbl.Group() adds values together by default, this will give you a table with a row for each distinct value of startOfCode and the subtotal of amount for each of those values. Finally, you can do the lookup exactly as you requested, which for your input table will return 800:
A4: =tbl.Lookup(A3, "startOfCode", "908", "amount")

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

match substring against text values arranged in a excel table - excel

Related

EXCEL - Dual VLOOKUP and Interpolation

Comparing two columns and their values and outputting the greater value

how to transform a table in Excel from vertical to horizontal but with different length

Excel: Applying dynamic array function UNIQUE in excel to a table with rows & columns >1

Using tbl.Lookup to match just part of a column value

Categories

Resources