Need to remove non-matching text from a delimited Excel string - excel

I have an Excel database query to get all the RBAC user roles that are assigned to each user, and the database returns a string delimited by & (ampersands) between each user role, e.g.:
&Admin&Supervisor&ViewReports&WriteReports&
My query filters records that only have a matching string, let's say it's Reports. However it still returns the full list of user roles for a matching user, and in this case some users have >10 roles assigned and it makes the table look really messy and not suitable for printing.
I could manually clean up each row, but there are quite a lot of them, and since this is going to be run regularly I'm wondering if there may be a good Excel formula or VBS method to split delimited sections of a string and only keep ones that match a string criteria.
I'm aware of "Text to Columns" and its ability to make use of delimiters, but it just spat out a ton of columns and made things worse. I've done several searches about cleaning up delimited strings in Excel but couldn't find any results that were similar to my situation: need to split a delimited string and do something RegEx-esque to only keep parts that match a pattern.
Ideally I'd like to keep the cleaned up results in a single cell, so the above example &Admin&Supervisor&ViewReports&WriteReports& would look like:
ViewReports WriteReports
or
ViewReports,WriteReports
or similar, in a single cell. Not too picky about formatting really, just need the non-relevant parts of the string gone.

you could use a combination of trim, mid and substitute to find your values so using your example above:
if oyu have a blank excel sheet and ad your example to cell A3 then place 1,2,3 and 4 in cells B2, C2, D2, E2 then use copy this formula into cell B3:
=TRIM(MID(SUBSTITUTE($A3,"&",REPT(" ",LEN($A3))),(B$2-1)*LEN($A3)+1,LEN($A3)))
this should give you the value "Admin".
After that just pull formulas to the right and you will get all 4 values in your example
Please let me know if you need more explanation.
For more info on this equation please see webpage:
https://exceljet.net/formula/split-text-with-delimiter

This formula will work in Excel/Office 365. It won't work in earlier versions due to the TEXTJOIN function which appeared in 2016.
Assumes the data is simple strings as above (i.e. not an XML document that might contain duplicates of the created nodes. If that were the case, there is another method of splitting the string we could use).
Split the string on the ampersand with FILTERXML
Use a variation of the INDEX function to return an array of the matching sections
Concatenate those sections with TEXTJOIN
=TEXTJOIN(" ",TRUE,INDEX(FILTERXML("<t><s>" & SUBSTITUTE(A1,"&","</s><s>")&"</s></t>","//s"),N(IF(1,{3,5}))))
The …N(IF(1,{3,5}))… portion is how to return an array of values from the INDEX functions. In this case, 3 and 5 refer to the value before the third and fifth ampersand. Note that 1 would return an error, since there is nothing before the first ampersand.
You can return whichever elements you want. You just need to know (or calculate using the MATCH function), the proper index number(s).
Note that, with TEXTJOIN you can specify whatever delimiter you want. I specified a space, but you could specify comma or anything.

Related

Write data from a list if the adjacent cell contains certain string

I need to update hundreds of cells, and that would be trivial automating, but I am not being able to make it work.
I have a list like the following:
And, in a different tab, a list I have to populate with values above (in B) based on the appearance of the twitter handle in other column.
The names are within a long text string (all of them begin with #), and it is not possible to re-order the list based on those names. Also, there are more names than values, so some cells will remain blank.
Is there a way I can write a formula that writes the values of the first list into the second one if the name in column A in that row is contained within the adjacent string?
Thanks!
You can refer to this sample formula (Same sheet was used):
=arrayformula(if(C2:C<>"",iferror(vlookup(REGEXEXTRACT(C2:C,"\B\#\w+"),A2:B,2,false),""),""))
What it does?
Use array formula to loop column C values
Extract the twitter name (string that starts with #) using Regexextract()
Use the extracted #twittername as search key to get the connections value using vlookup()
Output:
Since we don't have access to the spreadsheet, I can't know for sure what the line-break character is within the Col-A cells of your second sheet. And using this line-break character is important, since Twitter handles may use some non-alphanumeric characters such as the underscore and others which are not included in such REGEX notation as \w. I'm assuming here that the line-break character is CHAR(10) from the ASCII chart.
I also don't know the name of your first sheet; so here, I've just written it as Sheet1. You'll need to replace that with your actual sheet name, remembering to place it in single quotes if it contains anything but alphanumeric characters (e.g., 'Data Sheet').
That said, delete everything from Col-B in your second sheet (including the header "Connections") and place the following formula in cell B1 of that second sheet):
=ArrayFormula({"Connections"; IF(A2:A="",, IFERROR(VLOOKUP(REGEXEXTRACT(SUBSTITUTE(A2:A,CHAR(10),"~"),"#[^~]+"),Sheet1!A:B,2,FALSE)))})

Extract a Word from String Containing a Specific Multiple Words

"I'm setting up a pivot in excel, and want to extract specific words from a data set of text.
I have tried using the below formula to extract one particular word, but want to nest the multiple formula to extract other words as well
=TRIM(MID(SUBSTITUTE(A1," ",REPT(" ",99)),MAX(1,FIND("Evaluation",SUBSTITUTE(A1," ",REPT(" ",99)))-50),99))
The above formula works but only for one word. I want to create nested formula to search first word or second word or third...
If your goal is to search an array for a a substring, if that substring matches any words in a list, and if so, return the matched substring, as in the post suggested by JvdV, use the formula below, which I have modified.
I recommend, in a different worksheet, add a table with a list of the words you want to find, like this. Highlight the range of cells, including the header, then Home > Format as Table > pick a table style and give it a name. This table's name is "t_WordsToFind" (so I can easily identify it in other functions later). You may want to also put your primary data into a table as well. My go-to name is usually "t_Data". Now, instead of worrying about column numbers/letters, you have the user-friendly column headers you started with which makes reading the formula much easier. Your table ranges will also automatically expand when addtl data is added, so row numbers don't need to be referenced any more either.
If you don't have your data in tables, use this version of the formula, and remember to update your range parameters when data is added. B2 is the first cell to be searched, D2:D4 is the list of words to look for, copy the formula down. I do prefer not to use IFERROR as it includes many different types of errors that I may need to know about, like if I misspelled the function name, for example. If you simply need to have an alternative in the event no matches are return and your function is valid, I recommend IFNA.
IFNA(LOOKUP(1,1/COUNTIF(B2,"*"&$D$2:$D$4&"*"),$D$2:$D$4),"")
If you do use tables for your data and lookup tables (you are very wise) and here is the formula version to use (below). In this example, #[Search This Column] is the the equivalent to B2 and t_WordsToFind[Find This] is the table name and column name of words to look for, but it's much more legible, and doesn't need to be copied down or manually expanded in the future.
IFNA(LOOKUP(1,1/COUNTIF([#[Search This Column]],"*"&t_WordsToFind[Find This]&"*"),t_WordsToFind[Find This]),"")
Even wiser still, assuming this is a perpetual need, would be to use power query/power pivot, but I don't want you to go into TMI overload.
Also, your pivot table range will be nice and easy, "t_Data".

Excel: Lookup multiple values in one cell and return results in another

I'm trying to find a way to lookup each of several comma separated values in one cell, and return the results as a comma separated list in another cell.
The number of values to lookup is not constant, it may be only one, or several hundred.
Example - Sheet A has the initial values and will hold the returned values. Sheet B is the table with the data to lookup.
EDIT: I've gotten better and learned how to make this a 1-liner. I'll leave my original answer below, but here's my updated one (as this hasn't been accepted or even commented on, yet):
=TEXTJOIN(", ",TRUE,INDEX(SheetB!$A$2:$B$6,MATCH(FILTERXML("<x><y>"&SUBSTITUTE($B2,",","</y><y>")&"</y></x>","//y"),SheetB!$A$2:$A$6,0),2),"")
I think INDEX/MATCH was the key component I hadn't learned yet when I tried answering this before. :)
The FILTERXML/SUBSTITUTE I had learned from another site where it takes in your delimited values and replaces the delimiter and wraps the rest with the needed xml tags for FILTERXML to then return an array of your values.
INDEX/MATCH then does your lookup, first getting the row number via MATCH (note the final value of 0 in MATCH means exact match), then INDEX returns the value of the corresponding matching column (the 2 at the end of INDEX's arguments).
Finally, TEXTJOIN joins the results together with the delimiter of your choice (and the second argument set to TRUE allows it to skip blanks).
As a final note, I think this solution may work without Office 365. FILTERXML appears to have been available as early as Office 2013.
Original, outdated answer follows:
I've been searching for an answer to this myself and was rather dismayed to find your question - exactly the question I have - without an answer.
Almost 4 years late, this answer does require Office 365 Excel, and it's not the most elegant, but here's what I've been able to come up with.
Pick an unused column on your Sheet B (with enough contiguous unused columns to its right to handle spillover of however many values you'll need to be splitting, or use a new sheet dedicated to this) and put this formula into the 2nd row's cell for that column, dragging it down through each cell that has corresponding data back on Sheet A (and assuming your column header "File #s" is column B): =TRANSPOSE(FILTERXML("<x><y>"&SUBSTITUTE(SheetA!$B2,delim,"</y><y>")&"</y></x>","//y"))
Replace delim with your delimiter wrapped in quotes, or a cell reference to a cell that has your delimiter in it. Also, if it's possible to have no value in the corresponding cell in Sheet A, then you'll need to wrap the FILTERXML() function (or the whole thing) with IFERROR().
Then, back on Sheet A in your Results column, use this formula: =TEXTJOIN(delim,TRUE,FILTER(SheetB!$B$2:$B$6,COUNTIF(SheetB!D2#,SheetB!$A$2:$A$6),if_no_match))
Again, replace delim with the delimiter or cell reference with your delimiter of choice. SheetB!$B$2:$B$6 and SheetB!$A$2:$A$6 are your lookup table's columns (and thus extend them to encompass the whole thing). The SheetB!D2# references the column where you put the TRANSPOSE(FILTERXML()) formula. Finally, replace if_no_match with whatever you want to appear if there was no match.
I'd ideally like to find a way that uses a single, self-contained formula, but alas, this is as far as I've managed so far.

Excel: read text as string instead of regex

I'm somewhat new to excel and right now working on a sheet that simply counts the sales numbers of products.
I'm using the formula:
=COUNTIFS('salesreport'!$A$1:$A$1048576,'SalesNumbers'!$A1)
The salesreport sheet contains the data with product names in the A column, the SalesNumbers sheet contains a List of the product names in the A column and the mentioned formula in the second column.
Now to the Problem, some of the products have different variants and a * on the name ending e.g. "Product A*", "Product A large*".
Since the * is interpreted as any following text by Excel, the counter for "Product A*" will include the large variant.
Since I receive the data from outside its difficult to simply get rid of the * for future reports which should just be copy/pasted into the template.
The easy way to solve the problem is obviously to subtract the count of the large variant. The Problem I have with that is, that the formula is no longer consistent for all products and might cause problems if somebody beside me will work on the template.
Is there any way to make Excel read the name in the A column as a string instead of a regex?
Thank you in advance!
Ascani0
If I am understanding this correctly then you can use:
=SUMPRODUCT(--(salesreport!$A:$A=SalesNumbers!$A1))
Edit 2 SUMPRODUCT is powerful and pretty useful ARRAY formula which finds many usages. Here we use it to do specific COUNTIF like stuff.
salesreport!$A:$A=SalesNumbers!$A1 part tests equality
It generates boolean ARRAY of {TRUE,FALSE,TRUE....FALSE}
This result cannot be directly passed on to SUMPRODUCT which expects numeric data. So to achieve numeric coercion, we apply double unary (--) which can be also achieved by using +0 or *1 etc methods.
Edit And if you want to use COUNTIFS then you can use it like below:
=COUNTIFS(salesreport!$A:$A,SUBSTITUTE(SalesNumbers!$A1,"*","~*"))
Here tilde(~) does the work as escape character and allows checking of asterisk (*) in literal sense.

Display value if text strings from two ranges match

I have two worksheets in Excel. One contains the rankings:
:
The other one the matchups (daily updated automatically)
The purpose is to display the rank of each respective team in two columns next to the matchups. One of the problems is that the text strings are not exact matches (limitation of the webquery).
So I need to find a formula for when there is a string of the table in Rankings that approximately matches the specified cell for the team in matchups and then display Rank (column 1 on Rankings) next to it.
So far I've got this in the cell where the ranking is supposed to go.
=VLOOKUP("*"&qry!B5&"*";Sheet1!A3:C369;1;0)
But it just results in the string "rank" from the table header in Rankings.
Any ideas?
Several things:
Vlookup will try to find a match in the first column of the lookup table and can return a value from a column to the right. Your lookup table has the desired return value in the first column and the lookup values in the second column. Therefore, Vlookup will not work in this scenario. You can use a combination of Index and Match, though.
the * wildcard before and after the lookup value mean that the value in the lookup table can have any text before and after the text already in your lookup value. In your case, the lookup value has more text than the text in the lookup table. So the wildcard is not helping. Example: you have "353 Virginia" and you want to find this in a column that has only "Virginia". Wrapping "353 Virginia" in wildcards will add no value, because the text that you want to match is actually shorter than the text you are starting out with. You need to remove stuff from the lookup value instead of adding wild cards.
If the data for the rankings comes from a web query, you need to do some work to clean up that data, so it is fit for the lookup into your other table.
In addition to the number at the beginning, there are also two characters at the end of some cells that I cannot identify. These need to be stripped out, too, before you can do the lookup.
Assuming that all cells contain three digits and a space at the beginning you can use a formula approach
=TRIM(MID(SUBSTITUTE(B5;" ##";"");5;99))
Substitute will remove the trailing characters. I don't know what they are exactly, some web related special characters, so for the exercise here I have used ## instead of the A thing and the chevron. Copy and paste the web characters from one of your cells to the formula. With data coming from the web you may also want to check for leading and trailing white space, non-breaking space characters and other invisible characters.
You can use the "cleansing" formula as the basis of a lookup or, if the formula above sufficiently cleans the data, then you can use it in an index/match combo like this:
=index(Sheet1!$A$3:$A$369,match(TRIM(MID(SUBSTITUTE(B5;" ##";"");5;99));Sheet1!$B$3:$B$369;0))
Without seeing the actual data in a spreadsheet it is hard to tell if the formula cleanup is sufficient. You may need to do more work until a lookup returns the desired value.
If the data is coming from the web, I strongly suggest that you look into Power Query as a tool to get the data from the web into your spreadsheet. Power Query is a free add-in for Excel 2010 and 2013 and is built into Excl 2016 as "Get & Transform". It can connect to many data sources, including tables on a web page, and it has very powerful mechanisms to clean data for further processing. Once a Power Query is set up and working, all you need to do is refresh the query to load new data from the web site.

Resources