I am trying to achieve this result: assign a category to a document based on its title, or part of its title.
| Title | Category |
| --- | --- |
| correspondence | Correspondence |
| Note Transmission Correspondence | Correspondence |
| Advisors Evaluation Report | Report |
| Country Notes | Correspondence |
| Annual Portfolio Report | Report |
| Appointment Letter | Correspondence |
The categories are arranged into a table (docCategories) where each row starts with a unique category name, and is followed by a set of labels that match entirely or partially with the document title.
| Category | Label | Label2 | Label3 | Label4 |
| --- | --- | --- | --- | --- |
| Correspondence | Letter | Memo | Note | Correspondence |
| Report | Dashboard | Report | | |
The formula should take the document title, check whether it matches any of the labels (with wildcards), and return the unique category in the first position of the row that contains the matched label.
Appointment Letter -> matches label:letter -> cat:Correspondence
I got it working with this formula, which is copied down the Category column:
=INDEX(docCategories;MIN(IF(docCategories=A2;ROW(docCategories)))-1;MIN(IF(docCategories=A2;1)))
It works only if the title exactly matches an entire label (e.g. Correspondence -> matches label:correspondence -> cat:Correspondence).
I would like it to work when a label matches only part of the title (e.g. Appointment Letter -> matches label:letter -> cat:Correspondence).
I have tried and failed to change the docCategories=<title> comparison into something that matches a substring of the title; even applying TEXTSPLIT(<title>) still fails to give the expected result.
Who can think of a creative solution for this?
The following solution works for any number of categories and any number of labels per category. It also reports when no label was found and when labels from more than one category were found. Since the question isn't tagged with a specific Excel version, I assume Microsoft 365 functions can be used.
In cell I2, enter the following formula:
=LET(rng, A2:E3, texts, G2:G9, lkupValues, B2:E3, categories, INDEX(rng,,1),
BYROW(texts, LAMBDA(text,LET(
reduceResult, REDUCE("",categories, LAMBDA(acc,c, LET(
lkup, XLOOKUP(c,categories, lkupValues), searchLabels, FILTER(lkup, lkup<>0),
IF(SUM(N(ISNUMBER(SEARCH(searchLabels,text))))=0, acc,
IF(acc="", c, "MORE THAN ONE CATEGORY FOUND"))
))), IF(reduceResult<>"", reduceResult, "CATEGORY NOT FOUND")
)))
)
In the corresponding output, the last two rows of the Title column were added to test the non-happy paths.
Explanation
We use the LET function to define names and avoid repeating the same calculation. If the DROP function is available in your Excel version, the name lkupValues can be defined as DROP(rng,,1).
The main idea is to iterate over the texts values via BYROW and, for each text, invoke SEARCH with the labels of each category. When the first argument of SEARCH is an array, it returns an array of the same shape holding the start position of each label found in text, or #VALUE! for each label that was not found.
Note: SEARCH is not case-sensitive; if you need case-sensitive matching, replace it with FIND.
We use the REDUCE function to iterate over all categories to find a match. For each category (c) we find the corresponding labels via XLOOKUP. Not all categories have the same number of labels (for example, Report has fewer labels than Correspondence), so the empty labels must be removed. The name searchLabels holds only the non-empty labels.
For checking if labels were not found we use the following condition:
SUM(N(ISNUMBER(SEARCH(searchLabels,text))))=0
ISNUMBER converts the SEARCH result to TRUE/FALSE values, and the N function converts those to the equivalent 1/0 values.
If the condition is TRUE, no label of the current category was found, so the formula returns the accumulator unchanged (acc is initialized to an empty string). If the condition is FALSE, at least one label was found: the formula returns the category (c) if acc is empty, i.e. no previous category matched. If acc is not empty, a previous category already matched, so it returns MORE THAN ONE CATEGORY FOUND.
Finally, if the result of REDUCE (reduceResult) is an empty string, the accumulator was never updated after initialization, so no labels were found for any category; this is indicated with the output CATEGORY NOT FOUND.
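For readers who find the nested LET/REDUCE hard to follow, the same logic can be sketched in Python. This is only an illustration of the accumulator idea, not a replacement for the formula: the function name categorize, the dictionary doc_categories, and the two extra test titles ("Budget Memo Report", "Invoice") are made up for this example.

```python
def categorize(title, categories):
    """Return the single category whose labels occur in title (case-insensitive)."""
    result = ""  # plays the role of the REDUCE accumulator (acc)
    lowered = title.lower()
    for category, labels in categories.items():
        # Skip empty labels, like FILTER(lkup, lkup<>0) in the formula
        found = any(label and label.lower() in lowered for label in labels)
        if not found:
            continue  # keep the accumulator unchanged
        result = category if result == "" else "MORE THAN ONE CATEGORY FOUND"
    return result if result != "" else "CATEGORY NOT FOUND"


# Hypothetical stand-in for the docCategories table (empty strings = blank cells)
doc_categories = {
    "Correspondence": ["Letter", "Memo", "Note", "Correspondence"],
    "Report": ["Dashboard", "Report", "", ""],
}

for title in ["Appointment Letter", "Country Notes", "Annual Portfolio Report",
              "Budget Memo Report", "Invoice"]:
    print(title, "->", categorize(title, doc_categories))
```

The substring test with lower() mirrors the case-insensitive behaviour of SEARCH; dropping the lower() calls would correspond to using FIND.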
I'm trying to understand the following behaviour:
If I have the following data:
| A | B |
| --- | --- |
| a | 1 |
| b | 2 |
| c | 3 |
If I use =INDEX($A$1:$B$3,,)
It will correctly show the whole range.
If I use =INDEX($A$1:$B$3,1,)
It will correctly show the data for the first row for both columns.
If I use =INDEX($A$1:$B$3,SEQUENCE(2),)
I expect it to show the data for the first two rows for both columns. Instead it shows the data of the first two rows, not showing data for the second column.
How come INDEX loses the column reference here?
INDEX reads its parameters as a pair of lists.
For example, using array constants, you can type:
=INDEX(A1:B3,{1,3},{1,2})
which gives:
a 3
because Excel reads this as {1,1}, {3,2}.
SEQUENCE returns an array, and SEQUENCE(2) returns the vertical array {1;2}. When a vertical array is used for both arguments, Excel processes the pairs {1,1};{2,2}.
You can use SEQUENCE to return a horizontal array, such as
SEQUENCE(1,2)
which returns {1,2}.
Now it works:
=INDEX(A1:B3,SEQUENCE(2),SEQUENCE(1,2))
Or, using a mix of horizontal and vertical array constants
=INDEX(A1:B3,{1;2},{1,2})
Ref:
https://support.microsoft.com/en-us/office/guidelines-and-examples-of-array-formulas-7d94a64e-3ff3-4686-9372-ecfd5caa57c7
Create one and two-dimensional array constants
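The pairing behaviour described in this answer can be modelled with a small Python sketch. It is a simplified mental model, not a description of how Excel is implemented; the function names index_pairwise and index_cross are invented for the illustration.

```python
data = [["a", 1], ["b", 2], ["c", 3]]  # stands in for the range A1:B3


def index_pairwise(grid, rows, cols):
    # Both arguments run in the same direction:
    # element k of the result pairs rows[k] with cols[k]
    return [grid[r - 1][c - 1] for r, c in zip(rows, cols)]


def index_cross(grid, rows, cols):
    # Vertical rows against horizontal cols:
    # every requested row is combined with every requested column
    return [[grid[r - 1][c - 1] for c in cols] for r in rows]


print(index_pairwise(data, [1, 3], [1, 2]))  # like INDEX(A1:B3,{1,3},{1,2})
print(index_cross(data, [1, 2], [1, 2]))     # like INDEX(A1:B3,{1;2},{1,2})
```

The first call yields the two paired cells (a and 3); the second yields the full 2x2 block of the first two rows.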
I'm a bit confused between subarray, subsequence & subset
if I have {1,2,3,4}
then
subsequence can be {1,2,4} OR {2,4} etc. So basically I can omit some elements but keep the order.
subarray would be (say, subarrays of size 3)
{1,2,3}
{2,3,4}
Then what would be the subset?
I'm a bit confused between these 3.
Consider an array:
{1,2,3,4}
Subarray: a contiguous sequence in an array, e.g.
{1,2},{1,2,3}
Subsequence: need not be contiguous, but maintains order, e.g.
{1,2,4}
Subset: like a subsequence, except that order is irrelevant and the empty set is included, e.g.
{1,3},{}
Given an array/sequence of n distinct elements, the number of possible
Subarrays = n*(n+1)/2 (non-empty subarrays)
Subsequences = (2^n) - 1 (non-empty subsequences)
Subsets = 2^n
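These counts can be checked by brute force. The sketch below assumes the elements are distinct (with duplicates, the number of distinct subsets would be smaller); the helper names are chosen for this example only.

```python
from itertools import combinations


def subarrays(seq):
    # Every non-empty contiguous slice seq[i:j]
    n = len(seq)
    return [seq[i:j] for i in range(n) for j in range(i + 1, n + 1)]


def subsequences(seq):
    # Non-empty; combinations() keeps elements in their original order
    return [list(c) for r in range(1, len(seq) + 1) for c in combinations(seq, r)]


def subsets(items):
    # Includes the empty set (r = 0)
    return [set(c) for r in range(len(items) + 1) for c in combinations(items, r)]


arr = [1, 2, 3, 4]
n = len(arr)
print(len(subarrays(arr)), n * (n + 1) // 2)   # both 10
print(len(subsequences(arr)), 2 ** n - 1)      # both 15
print(len(subsets(arr)), 2 ** n)               # both 16
```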
In my opinion, if the given input is an array, a so-called subarray is a contiguous subsequence.
For example, if given {1, 2, 3, 4}, subarray can be
{1, 2, 3}
{2, 3, 4}
etc.
If the given input is a sequence, a subsequence contains elements whose indices are increasing in the original sequence.
For example, also {1, 2, 3, 4}, subsequence can be
{1, 3}
{1,4}
etc.
If the given input is a set, a subset contains any possible combination of elements of the original set.
For example, {1, 2, 3, 4}, subset can be
{1}
{2}
{3}
{4}
{1, 2}
{1, 3}
{1, 4}
{2, 3}
etc.
Consider these two properties of a collection (array, sequence, set, etc.) of elements: Order and Continuity.
Order means you cannot swap the indices or locations of two or more elements (for a collection with a single element, order is irrelevant).
Continuity means that each element keeps its original neighbors, or has no neighbor on that side.
A subarray has Order and Continuity.
A subsequence has Order but not Continuity.
A subset has neither Order nor Continuity.
A collection with Continuity but not Order does not exist (to my knowledge).
In the context of an array, a subsequence need not be contiguous but must maintain the order. A subarray is contiguous and inherently maintains the order.
If you have {1,2,3,4}, then {1,3,4} is a valid subsequence but not a subarray.
A subset needs neither order nor contiguity, so {1,3,2} is a valid subset but not a subsequence or subarray.
{1,2} is a valid subarray, subset and subsequence.
All subarrays are subsequences and all subsequences are subsets.
But sometimes subset, subarray and subsequence are used interchangeably, and the word contiguous is prefixed to make the meaning clear.
Per my understanding, say we have a list [3,5,7,8,9]. Here:
A subset doesn’t need to maintain order and need not be contiguous. For example, [9,3] is a subset.
A subsequence maintains order but need not be contiguous. For example, [5,8,9] is a subsequence.
A subarray maintains order and is contiguous. For example, [8,9] is a subarray.
subarray: some continuous elements in the array
subset: some elements in the collection
subsequence: in most cases, some elements of the array that keep their relative order (not necessarily continuous)
A Simple and Straightforward Explanation:
Subarray: it must always be contiguous.
For example, let's take an array int arr=[10,20,30,40,50];
-->Now let's see some candidates:
subarr=[10,20] //true
subarr=[10,30] //false, because its not in contiguous form
subarr=[40,50] //true
Subsequence: it need not be contiguous, but must keep the same order.
For example, let's take an array int arr=[10,20,30,40,50];
-->Now let's see some candidates:
subseq=[10,20]; //true
subseq=[10,30]; //true
subseq=[30,20]; //false, because order isn't maintained
Subset: any possible combination of elements.
For example, let's take an array int arr=[10,20,30,40,50];
-->Now let's see some candidates:
subset={10,20}; //true
subset={10,30}; //true
subset={30,20}; //true
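The true/false checks above can be written as three small predicates. This is a minimal sketch; the function names are chosen for this example and are not from any library.

```python
def is_subarray(candidate, arr):
    # True if candidate appears as one contiguous slice of arr
    m = len(candidate)
    return any(arr[i:i + m] == candidate for i in range(len(arr) - m + 1))


def is_subsequence(candidate, arr):
    # "x in it" advances the iterator, so matches must occur in order
    it = iter(arr)
    return all(x in it for x in candidate)


def is_subset(candidate, arr):
    # Order and position are ignored entirely
    return set(candidate) <= set(arr)


arr = [10, 20, 30, 40, 50]
print(is_subarray([10, 20], arr), is_subarray([10, 30], arr))        # True False
print(is_subsequence([10, 30], arr), is_subsequence([30, 20], arr))  # True False
print(is_subset([30, 20], arr))                                      # True
```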
The following are examples:
Array : 1,2,3,4,5,6,7,8,9
Sub Array : 2,3,4,5,6 >> Contiguous elements in order
Sub Sequence : 2,4,7,8 >> Elements in order, skipping zero or more elements
Subset : 9,5,2,1 >> Elements skipping zero or more elements, not necessarily in order
Suppose an array [3,4,6,7,9]
A subarray is a continuous and ordered part of that array
Examples are [3,4,6],[7,9],[4]
A subsequence need not be continuous, but its elements must be in order
Examples are [3,4,9],[3,7],[6]
A subset need be neither continuous nor in order
Examples are [9,4,7],[3,4],[6]
A subarray is a contiguous part of an array and maintains a relative ordering of elements. For an array/string of size n, there are n*(n+1)/2 non-empty subarrays/substrings.
A subsequence maintains a relative ordering of elements but may or may not be a contiguous part of an array. For a sequence of size n, we can have 2^n-1 non-empty sub-sequences in total.
A subset need not maintain a relative ordering of elements, nor be a contiguous part of an array. For a set of size n, we can have 2^n subsets in total.
Let us understand it with an example.
Consider an array:
array = [1,2,3,4]
Subarray : [1,2],[1,2,3] — is continuous and maintains relative order of elements
Subsequence: [1,2,4] — is not continuous but maintains relative order of elements
Subset: [1,3,2] — is not continuous and does not maintain the relative order of elements
Some interesting observations:
Every Subarray is a Subsequence.
Every Subsequence is a Subset.
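Both observations can be confirmed by enumeration for the example array. The sketch assumes distinct elements and uses tuples and frozensets only so the collections can be compared; it is an illustration, not a proof.

```python
from itertools import combinations

arr = [1, 2, 3, 4]
n = len(arr)

# Non-empty contiguous slices, as tuples
subarrays = {tuple(arr[i:j]) for i in range(n) for j in range(i + 1, n + 1)}
# Non-empty order-preserving selections, as tuples
subsequences = {c for r in range(1, n + 1) for c in combinations(arr, r)}
# All selections with order discarded, including the empty set
subsets = {frozenset(c) for r in range(n + 1) for c in combinations(arr, r)}

print(subarrays <= subsequences)                           # every subarray is a subsequence
print(all(frozenset(s) in subsets for s in subsequences))  # every subsequence is a subset
print((1, 2, 4) in subsequences, (1, 2, 4) in subarrays)   # the converse fails
```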
I was just doing some yicky code and I thought, instead of using three dynamic arrays, as such:
dim x() as string, y() as string, z() as string
It would be nicer to have a 3-dimensional dynamic array. But the help and my fumbling experiments have not revealed how to define one.
This does not work:
dim x()() or dim(,2) or dim(,)
Any ideas anyone?
A dynamic array is declared the same way regardless of the number of dimensions (arrays in LotusScript can have up to 8 dimensions). According to your example I think it is a two dimensional array you want, where the first dimension is limited to three entries.
If you first declare the array as:
Dim x() As String
You can then specify bounds according to the following example:
Redim x( 0 To 2, 0 To 9 ) ' A two dimensional array
And if you need to enlarge the array later (and keep all the values) you can do it like this:
Redim Preserve x( 0 To 2, 0 To 99 )
Please keep in mind that with Redim Preserve only the upper bound of the last dimension can be changed, and the number of dimensions stays fixed once it has been set.
You could use lists instead of arrays.
Dim x list as String
A list is fully dynamic and takes a string as its index. Lists can't contain lists, but they can contain objects, so you might want to do:
Public Class ListContainer
Public level2 List as String
End Class
This way you never need to Redim Preserve. A ForAll loop takes you safely through a list.