Check if strings in column start with any of the strings in an array? - excel

I have a column containing strings. For each row in the column, I want to check if the string starts with one of several strings contained in an array. This array can contain any number of strings to check against the column.
The output I want is an array {TRUE;FALSE;TRUE;TRUE;etc} where it returns TRUE if the string begins with any of the strings contained in the array.
Anyone know how to do this? Thanks!
Edits for clarification: Office 365 (version 2109). I can't share the spreadsheet itself, but the column contains values like {1.1;1.1.1;1.1.2;1.2;1.3;etc}, and the arrays in question are things like {1.1;1.2}, so I want it to know which cells in the column begin with 1.1 or 1.2 in that case. So it's basically a hierarchically ordered list and I want to be able to pull into an array whether a given row of data is either a child of or is itself any of the items in the array.
Ideally it would be a formula only solution but I'm cool with VBA if it's necessary.
Example data
Bear in mind that in the real data there are many cells in column B which include way more than just two entries.

Assuming the first entry to be checked is in A1:
=0+LEFT(A1,LEN(MyArray))=MyArray
placed somewhere within the worksheet, will produce a spill array of the same dimension as MyArray and comprising the required Booleans. I leave it to you to decide what to do with that resulting array.

We can use MMULT and MATCH to return the correct array:
=LET(
lkprng, FILTER(A:A,A:A<>""),
depcll, B2&CHAR(10),
depclm, FILTERXML("<a><b>"&SUBSTITUTE(depcll,CHAR(10),"</b><b>")&"</b></a>","//b"),
dep, TRANSPOSE(depclm&""),
mtch, --(ISNUMBER(MATCH(LEFT(lkprng,LEN(dep)),dep,0))),
mmt, MMULT(mtch,SEQUENCE(COUNTA(dep))),
TEXTJOIN(CHAR(10),,FILTER(lkprng,mmt>0,"")))

Related

Excel - Search a cell for text strings from an array and display matched strings

I have a large-ish spreadsheet of about 1400 rows. One of the columns in that row has been populated from a free-flow text and contains details about numerous items people are requesting. There's no consistency about what's in the text box though.
Image 1 has an example of what the data looks like. C2 is the cell with the data in is. D2 is where I want to extract the list of things from C2 to. If an item appears multiple times I only want it to show once.
Example image of data
The list where the things I want to look for is on a separate sheet (Example of list array) and the list is defined by the range name "items" and runs from A2:A95.
I'm using Excel365 (despite the example screenshots) and have been trying various solutions from here on SO and various other Excel help pages but can't find anything that will work how I've got things setup. I've nothing against using VBA if that's the only way to do it, but would prefer a simpler solution if possible.
Thanks in advance.
I created a Named Range for your reference list so in my formula ThingsToLookUp is referencing the range A2:A9 in my sample file below.
=TEXTJOIN(", ",TRUE,IF(ISERROR(SEARCH(ThingsToLookUp,C2)),"",ThingsToLookUp))
Note that you will stumble on some edge cases here with SEARCH. For instance, if the string has Crossbow you will always output Crossbow and Bow since the later is a substring of the first. Less important but this solution also outputs the items in the order as they appear on the reference list, not as they appear in the input string.

Excel: Lookup multiple values in one cell and return results in another

I'm trying to find a way to lookup each of several comma separated values in one cell, and return the results as a comma separated list in another cell.
The number of values to lookup is not constant, it may be only one, or several hundred.
Example - Sheet A has the initial values and will hold the returned values. Sheet B is the table with the data to lookup.
EDIT: I've gotten better and learned how to make this a 1-liner. I'll leave my original answer below, but here's my updated one (as this hasn't been accepted or even commented on, yet):
=TEXTJOIN(", ",TRUE,INDEX(SheetB!$A$2:$B$6,MATCH(FILTERXML("<x><y>"&SUBSTITUTE($B2,",","</y><y>")&"</y></x>","//y"),SheetB!$A$2:$A$6,0),2),"")
I think INDEX/MATCH was the key component I hadn't learned yet when I tried answering this before. :)
The FILTERXML/SUBSTITUTE I had learned from another site where it takes in your delimited values and replaces the delimiter and wraps the rest with the needed xml tags for FILTERXML to then return an array of your values.
INDEX/MATCH then does your lookup, first getting the row number via MATCH (note the final value of 0 in MATCH means exact match), then INDEX returns the value of the corresponding matching column (the 2 at the end of INDEX's arguments).
Finally, TEXTJOIN joins the results together with the delimiter of your choice (and the second argument set to TRUE allows it to skip blanks).
As a final note, I think this solution may work without Office 365. FILTERXML appears to have been available as early as Office 2013.
Original, outdated answer follows:
I've been searching for an answer to this myself and was rather dismayed to find your question - exactly the question I have - without an answer.
Almost 4 years late, this answer does require Office 365 Excel, and it's not the most elegant, but here's what I've been able to come up with.
Pick an unused column on your Sheet B (with enough contiguous unused columns to its right to handle spillover of however many values you'll need to be splitting, or use a new sheet dedicated to this) and put this formula into the 2nd row's cell for that column, dragging it down through each cell that has corresponding data back on Sheet A (and assuming your column header "File #s" is column B): =TRANSPOSE(FILTERXML("<x><y>"&SUBSTITUTE(SheetA!$B2,delim,"</y><y>")&"</y></x>","//y"))
Replace delim with your delimiter wrapped in quotes, or a cell reference to a cell that has your delimiter in it. Also, if it's possible to have no value in the corresponding cell in Sheet A, then you'll need to wrap the FILTERXML() function (or the whole thing) with IFERROR().
Then, back on Sheet A in your Results column, use this formula: =TEXTJOIN(delim,TRUE,FILTER(SheetB!$B$2:$B$6,COUNTIF(SheetB!D2#,SheetB!$A$2:$A$6),if_no_match))
Again, replace delim with the delimiter or cell reference with your delimiter of choice. SheetB!$B$2:$B$6 and SheetB!$A$2:$A$6 are your lookup table's columns (and thus extend them to encompass the whole thing). The SheetB!D2# references the column where you put the TRANSPOSE(FILTERXML()) formula. Finally, replace if_no_match with whatever you want to appear if there was no match.
I'd ideally like to find a way that uses a single, self-contained formula, but alas, this is as far as I've managed so far.

Excel: INDEX MATCH with partial number

I have a large table of 12 digit numbers and associated info
I have a small list of 10 and 11 digit numbers (the first and/or last digits were cut off) - I'm attempting to cross these two lists to identify the items on the small list
normally, I'd use an index match to bring the associated info out of the table into the list, but because today I have only partial numbers in my list, I can't get the formula to work
I've seen other examples here that search for partial text strings contained within a range, but I haven't been able to adapt those formulas to my data. wildcards don't seem to work with numbers.
Many thanks for your input, and apologies in advance if I failed to find an existing solution on the site.
To match partial numbers inside a number range, like you do with strings, you can use an array formula with INDEX/MATCH, by composing a temporary array that converts the numbers into strings.
Say column A is your 12 digit numbers column, and you want to match the sequence 1234567890 and retrieve the value from column B, This CSE formula works:
=INDEX($B$2:$B$9999, MATCH("*1234567890*",""&$A$2:$A$9999,0))
CtrlShiftEnter
Although you can use full columns A:A and B:B, this should be avoided as much as possible with array formulas, because they're slow. Full columns mean computing and operating arrays of more than a million entries, so avoid it. Also note the "expensive" conversion from numbers to strings (all numbers in the $A$2:$A$9999 are converted to strings here).
To use a cell reference, say D2, instead of the harcoded 1234567890, the formula should be used like this:
=INDEX($B$2:$B$9999,MATCH("*"&D2&"*",""&$A$2:$A$9999,0))

count unique values between defined range in column

I've got column which consist numbers of type 000.XX.XX
=COUNTIFS(temporary!$A1:$A200,">=000.11.35",temporary!$A1:$A200,"<=000.11.39")
this formula counts values between 000.11.35 and 000.11.39. But i want to count only unique values. How can I do this?
There is not a built-in function for this, as you can see from the several suggestions of how to accomplish this on the Office support site. If you can, you can switch to Google Sheets, and they have a "COUNTUNIQUE()" function.
As described at the link provided above, identify the unique items, either using a filer (this is static) or through repeatedly using the "FREQUENCY()" function. Then count the unique items in a separate step.
Lets say you have the data set in the first column, first you need to remove the repetitions in a second column with the following array formula (confirm the formula with Ctrl+Shift+Enter)
=IF(SUM((A2=$A$2:A2)*1)>1,"",A2)
this formula lists only unique values
I would remove your first 4 digits of your string to create a float number and then count it with the following array formula:
=SUM((IF((RIGHT($B$2:$B$14,4)>=RIGHT(G3,4))*(RIGHT($B$2:$B$14,4)<=RIGHT(G4,4)),$B$2:$B$14,"")<>"")*1)
Please look at the two images for clarification
view with formulas
normal printscreen

I want to use sumproduct with two different tables based on selection

I am working on a statistical model where we use sumproduct to generate forecast values by multiplying coefficients in one table with variables in another. Right now it is being done manually and that is taking time. I would like to automate it but I'm not able to figure this out.
We are using concatenate to identify different rows to use for vlookup. The variable columns are the same in number for both tables. I need to multiply each variable cell respectively in both tables and sum them, hence sumproduct.
this is what I am trying to do
Forecast model 1 sales for product A in phones in USA = sumproduct([variables by year from table 1 for USA for phones], [Variables for USA phone product A model 1 from table 2] )
I hope someone can help me.
Proof of Concept
You will need to update the references to suit your spreadsheet table locations.
In cell E21 use the following and copy right and down as required:
=SUMPRODUCT(INDEX($G$3:$I$12,MATCH($B21&$A21&$C21,$A$3:$A$12,0),0),INDEX($F$15:$H$18,MATCH($A21&$C21&$D21&MID(E$20,16,1),$A$15:$A$18,0),0))
This process was simplified because you had a unique ID tag on each of the previous two tables that could be built from the information in the third table. If you ever get into double digit forecast models the MID() function part of the formula will need to be modified. The 16 in the mid function refers to the character location of the number in the forecast model sales header name in Table 3. As such you either need to keep that header format exactly the same or modify the position of the number in the MID() function.
UPDATE 1
Explanation of Formulas
The following formulas were used in this solution:
SUMPRODUCT
INDEX
MATCH
MID
Concatenate
I will start with the assumption that you already understand sumproduct() as you were already using it before you ran into your problem. One thing to note about sumproduct is that it causes array like calculation to occur on the portion within it brackets. In this case we fed it two ranges of equal size. The difficult part was more an issue of determining those ranges.
Using your ID columns as a lookup row we used the match() function to determine which row to use. For the first set of variables we used the following to determine which row to look in:
=MATCH($B21&$A21&$C21,$A$3:$A$12,0)
Match is made up of three arguments inside the brackets:
MATCH(what to look for, where to look, type of match)
What we need to look for in table is a concatenation of various cells in Table 3 to build the ID in Table 1. It could have been written using the full formula:
=CONCATENATE($B21,$A21,$C21)
but the short form using & was used instead:
=$B21&$A21&$C21
Once we had what to look for we needed the range of where to look and supplied the ID column from table 1:
$A$3:$A$12
This now leaves the third and final argument of what type of search to perform. An exact match seemed to be the most appropriate match to perform so the value of 0 was supplied. What match returns is the row within the supplied range. It is relative to the range supplied and not the actual row in the spreadsheet. If it cannot make a match it will return an error instead of a row number.
Now that we know what row we want, we can use this information with the INDEX() function. The INDEX() function is made up of 3 arguments as well with the third argument being optional depending on if a 1D or 2D range is being indexed:
INDEX(Range to work with, 2D Row or 1D Position reference, 2D Column reference)
IN the case we are dealing with for the first table, the range to work with was your list of variables:
$G$3:$I$12
This is a 2D range. As such we need to tell INDEX() both what Row to look in as well as which Columns to look in. For the row to look in, we used the previously discussed MATCH() function. Since we want all columns and not just a specific column we use the value of 0. If Match returns an error, or if a number greater than the number of rows or columns selected is supplied, INDEX() will return an error. Based on the information discussed, the index function would look like:
=INDEX($G$3:$I$12,MATCH($B21&$A21&$C21,$A$3:$A$12,0),0)
You can try entering the above in a cell but it will give you an error. if you select three adjacent cells in the same row and use CONTROL+SHIFT+ENTER when entering the formula, Excel will add {} around the formula and it will be an array formula and should show you the three variables being used.
The same process as described above can be used for determining the second range of variable from Table 2. The only difference here is that the forecast model number was not in a column of its own but instead in the header row surrounded by text. As such the MID() function needed to be used to go into the header row, bypass the surrounding text and pull the model number out so it could be used as part of the CONCATENATION() used for the "what to look for" in MATCH():
=MID(E$20,16,1)
The MID() function work again with three arguments:
MID(Text to look in, which character to start at, how many characters to pull)
So in this case we are looking at the header in E20. Note the lock $ on the row number so the formula is always looking in row 20 no matter how far down it gets copied. It is then going to the 16th character. In this case the character "1" and pulling 1 character. If the header had just been 1 and 2, there would be no need for the MID function and the cell (with proper lock) could have been used.

Resources