"I'm setting up a pivot in excel, and want to extract specific words from a data set of text.
I have tried using the below formula to extract one particular word, but want to nest the multiple formula to extract other words as well
=TRIM(MID(SUBSTITUTE(A1," ",REPT(" ",99)),MAX(1,FIND("Evaluation",SUBSTITUTE(A1," ",REPT(" ",99)))-50),99))
The above formula works but only for one word. I want to create nested formula to search first word or second word or third...
If your goal is to search an array for a a substring, if that substring matches any words in a list, and if so, return the matched substring, as in the post suggested by JvdV, use the formula below, which I have modified.
I recommend, in a different worksheet, add a table with a list of the words you want to find, like this. Highlight the range of cells, including the header, then Home > Format as Table > pick a table style and give it a name. This table's name is "t_WordsToFind" (so I can easily identify it in other functions later). You may want to also put your primary data into a table as well. My go-to name is usually "t_Data". Now, instead of worrying about column numbers/letters, you have the user-friendly column headers you started with which makes reading the formula much easier. Your table ranges will also automatically expand when addtl data is added, so row numbers don't need to be referenced any more either.
If you don't have your data in tables, use this version of the formula, and remember to update your range parameters when data is added. B2 is the first cell to be searched, D2:D4 is the list of words to look for, copy the formula down. I do prefer not to use IFERROR as it includes many different types of errors that I may need to know about, like if I misspelled the function name, for example. If you simply need to have an alternative in the event no matches are return and your function is valid, I recommend IFNA.
IFNA(LOOKUP(1,1/COUNTIF(B2,"*"&$D$2:$D$4&"*"),$D$2:$D$4),"")
If you do use tables for your data and lookup tables (you are very wise) and here is the formula version to use (below). In this example, #[Search This Column] is the the equivalent to B2 and t_WordsToFind[Find This] is the table name and column name of words to look for, but it's much more legible, and doesn't need to be copied down or manually expanded in the future.
IFNA(LOOKUP(1,1/COUNTIF([#[Search This Column]],"*"&t_WordsToFind[Find This]&"*"),t_WordsToFind[Find This]),"")
Even wiser still, assuming this is a perpetual need, would be to use power query/power pivot, but I don't want you to go into TMI overload.
Also, your pivot table range will be nice and easy, "t_Data".
Related
I have two columns of words (Old list) and (New list)
I'm trying to check using the VLOOKUP() function both columns and find which words of the New list doesn't appear in the Old list.
(The answers here is obviously: eyes, john, martha, phone)
I'm giving this simple example to know what VLOOKUP() formula to also apply to a larger sample. In reality I have two columns of over 1000 items. Thanks
=FILTER(F2:F10, ISNA(VLOOKUP(F2:F10,A2:A10,1,FALSE)))
This will filter it down and remove all the N/A values while comparing the 2nd list to the first.
EDIT:
as per comment needing clarification (too long to put as a comment)
You can do anything you’d like by substituting references. It’s best to think of the syntax of things here to get a lay of the land and figure out what you want to pull information from.
The ISNA() just handles N/A errors that will occur with partial lists and is irrelevant to understanding where to put your references.
=filter(array, include, [if_empty])
Where “array” is the range you want to filter FROM
And “include” is what you are searching FOR from that range.
The [if_empty] is optional – you can put a text there, like “No Results” in quotes so that it’ll substitute that for N/A.
The “include” portion is where I’ve added additional information because I want to narrow in what I’m including…. In the plain and simple form of =filter() you’d just be putting a word/cell reference you’re looking for. It’ll pull every column for the whole table if there are multiple columns. But we want it to search multiple criteria simultaneously.
=vlookup(lookup_value, table_array, col_index_num,range_lookup)
what you are searching FOR (in this case anything within the table), where you want to look for it (where you are searching FROM), which column to find it in (column 1 in your case), and TRUE/FALSE – exact or approximate matches.
The easiest way to search across multiple sheets is to have the files open, and as you type in the formula bar when you get to each section click and highlight what you want by switching sheets (alt tab). Just pay really close attention to searching FOR versus searching FROM and you can do any combination of comparison you need.
If you would like TRUE/FALSE for each element you can use the following
=ISNUMBER(MATCH(A1,ColumnToSearch,0))
I have a large-ish spreadsheet of about 1400 rows. One of the columns in that row has been populated from a free-flow text and contains details about numerous items people are requesting. There's no consistency about what's in the text box though.
Image 1 has an example of what the data looks like. C2 is the cell with the data in is. D2 is where I want to extract the list of things from C2 to. If an item appears multiple times I only want it to show once.
Example image of data
The list where the things I want to look for is on a separate sheet (Example of list array) and the list is defined by the range name "items" and runs from A2:A95.
I'm using Excel365 (despite the example screenshots) and have been trying various solutions from here on SO and various other Excel help pages but can't find anything that will work how I've got things setup. I've nothing against using VBA if that's the only way to do it, but would prefer a simpler solution if possible.
Thanks in advance.
I created a Named Range for your reference list so in my formula ThingsToLookUp is referencing the range A2:A9 in my sample file below.
=TEXTJOIN(", ",TRUE,IF(ISERROR(SEARCH(ThingsToLookUp,C2)),"",ThingsToLookUp))
Note that you will stumble on some edge cases here with SEARCH. For instance, if the string has Crossbow you will always output Crossbow and Bow since the later is a substring of the first. Less important but this solution also outputs the items in the order as they appear on the reference list, not as they appear in the input string.
This question will be almost exactly like the question below, but I need a slight change to it for my application and I can't quite figure it out:
excel: how can I identify rows containing text keywords taken from a list of keywords
If a row of several text filled cells contains any keyword from a list of keywords, I would like to add that keyword to the end of the row. Each row will be the same number of cells, but some can be blank and they are not necessarily all the same data type, some could be numbers or dates etc. Even more, I would like to add every keyword that appears in the row of text to the end of the row in separate cells.
Relating to the example post that almost answers my question, I am using the more complicated formula for multiple matches, but in that example they only have one column of data they are looking for keywords in. I have several that would be formatted similar to their column A. I tried changing some of the ranges around with no luck specifically where the formula posted has: IF(COUNTIF($A1,"*"&$B$1:$B$10&"*") I changed $A1, to $A1:$D1 with no luck.
The problem showed up because I have several large spreadsheets of text based data about failure modes of different tools and I would like to categorize them in a little more of a controlled way than free form text in every cell and assigning controlled keywords that apply seems like a decent way to do this.
Example case
Expected result
The keyword list shown in Example Case is not shown in the Expected result. The range of keywords is K2:K6
Another feature that would be useful is if I could assign additional words that when found would trigger one of the key words. For example if the key word is "Gear wear" then "Gear wear" would trigger a hit but "stripped gears" would also trigger a hit. I would imagine the keyword list would be set up as a 2D Range with the first column being the actual key word and the cells to the right of each row would be additional words that trigger the key word. I suspect I am getting to the point where I would need to create a VBA macro to do this. If there is a way to accomplish this without writing code it would make it more repeatable on other user's computers.
Enter following formula in Cell F1 then drag/copy across and down as required.
=IFERROR(INDEX($K$2:$K$6,SMALL(IF(COUNTIF($A1:$D1,"*"&$K$2:$K$6&"*"),ROW($K$2:$K$6)-ROW($A$1)),COLUMNS($A1:A1))),"")
This is an array formula so commit it by pressing Ctrl+Shift+Enter
See the image for reference
This formula is derived from the link mentioned in your question.
I have two worksheets in Excel. One contains the rankings:
:
The other one the matchups (daily updated automatically)
The purpose is to display the rank of each respective team in two columns next to the matchups. One of the problems is that the text strings are not exact matches (limitation of the webquery).
So I need to find a formula for when there is a string of the table in Rankings that approximately matches the specified cell for the team in matchups and then display Rank (column 1 on Rankings) next to it.
So far I've got this in the cell where the ranking is supposed to go.
=VLOOKUP("*"&qry!B5&"*";Sheet1!A3:C369;1;0)
But it just results in the string "rank" from the table header in Rankings.
Any ideas?
Several things:
Vlookup will try to find a match in the first column of the lookup table and can return a value from a column to the right. Your lookup table has the desired return value in the first column and the lookup values in the second column. Therefore, Vlookup will not work in this scenario. You can use a combination of Index and Match, though.
the * wildcard before and after the lookup value mean that the value in the lookup table can have any text before and after the text already in your lookup value. In your case, the lookup value has more text than the text in the lookup table. So the wildcard is not helping. Example: you have "353 Virginia" and you want to find this in a column that has only "Virginia". Wrapping "353 Virginia" in wildcards will add no value, because the text that you want to match is actually shorter than the text you are starting out with. You need to remove stuff from the lookup value instead of adding wild cards.
If the data for the rankings comes from a web query, you need to do some work to clean up that data, so it is fit for the lookup into your other table.
In addition to the number at the beginning, there are also two characters at the end of some cells that I cannot identify. These need to be stripped out, too, before you can do the lookup.
Assuming that all cells contain three digits and a space at the beginning you can use a formula approach
=TRIM(MID(SUBSTITUTE(B5;" ##";"");5;99))
Substitute will remove the trailing characters. I don't know what they are exactly, some web related special characters, so for the exercise here I have used ## instead of the A thing and the chevron. Copy and paste the web characters from one of your cells to the formula. With data coming from the web you may also want to check for leading and trailing white space, non-breaking space characters and other invisible characters.
You can use the "cleansing" formula as the basis of a lookup or, if the formula above sufficiently cleans the data, then you can use it in an index/match combo like this:
=index(Sheet1!$A$3:$A$369,match(TRIM(MID(SUBSTITUTE(B5;" ##";"");5;99));Sheet1!$B$3:$B$369;0))
Without seeing the actual data in a spreadsheet it is hard to tell if the formula cleanup is sufficient. You may need to do more work until a lookup returns the desired value.
If the data is coming from the web, I strongly suggest that you look into Power Query as a tool to get the data from the web into your spreadsheet. Power Query is a free add-in for Excel 2010 and 2013 and is built into Excl 2016 as "Get & Transform". It can connect to many data sources, including tables on a web page, and it has very powerful mechanisms to clean data for further processing. Once a Power Query is set up and working, all you need to do is refresh the query to load new data from the web site.
I'm basically looking for a $A:$A equivalent for structured table references in Excel.
Say I have this formula:
=INDEX(tChoice,MATCH(OFFSET(tData[#[cm_sex]],-3,0),tChoice[name],0),3)
Basically tData is a table full of raw data (many columns), taken from surveys (so each column is Survey question, more or less). tChoice is a smaller table (just a few columns), I basically want to look up into tChoice the raw data value & return a label based on that (to value-label table is tChoice).
I actually want the tData[#[cm_sex]] to auto-increment as I apply formulas in cells to the left (so I cycle through all the columns of the raw data), however I DON'T want the column tChoice[name] to change: e.g. the column to look for a match based on the raw table data.
This is equivalent to writing, say, A:A (which would auto-increment to B:B, C:C, etc) and $A:$A (which would not).
Is there a dollar sign equivalent for structured table references?
P-S: Of course I can other things like increment the whole thing, than search & replace the range with say tChoice[*] replaced by tChoice[name]... However it would be cleaner & more efficient to have a proper notation for it....
Didn't find it in the support pages (https://support.office.com/en-us/article/Using-structured-references-with-Excel-tables-f5ed2452-2337-4f71-bed3-c8ae6d2b276e)
user3964075 provided the answer in the comments. I had never seen this before so thanks to him or her for this answer. There's some information out there on the web about absolute structured table references, so I thought I'd summarize what I found.
For your situation you can use tChoice[[name]:[name]] Specifying a range that's just the one column anchors the column like $ signs do in normal cell references.
If you want to just deal with one row (the one that the formula is in) the anchor looks like this:tChoice[#[name]:[name]].
Now say you want to anchor one cell but have the other be relative, as in this scenario where I'm summing from a to the right, starting with a:a, then a:b, etc:
You can do that with a formula like this, where the first part is absolute and the second is relative:
=SUM(Table1[#[a]:[a]]:Table1[#a])
Note that these formulas much be dragged, not copied. Perhaps there is a keyboard shortcut that does this.
This process is rather clunky compared to just clicking F4, as with a regular cell reference. Jon Acampora has created an addin that automates this process, as well as two detailed posts on this topic. His first post contains a link to the one with the addin.