Find first and second match of a list in Excel - excel

I have text cells in a column and a list of words. I need to get the first and second matches from that list within each text cell, using a formula and no VBA.
Here is an example of the result:

If your text cells start in A2, then:
First Match B2: =IFERROR(MID($A2,AGGREGATE(15,6,SEARCH(LIST,$A2),COLUMNS($A:A)),FIND(" ",$A2&" ",AGGREGATE(15,6,SEARCH(LIST,$A2),COLUMNS($A:A)))-AGGREGATE(15,6,SEARCH(LIST,$A2),COLUMNS($A:A))),"")
and fill right one cell to get the 2nd Match. Then fill down as far as needed.
EDIT: The OP has added an additional requirement having to do with excluding words within words, eg do NOT find also if the word is Calso; and also do not return punctuation.
Although cumbersome in formulas, this can be handled by:
- Replacing all of the punctuation with space
- Adding a space at the beginning and end of the sentence
- Adding a space at the beginning and end of each word in LIST
- adjusting the formula to not return the extra space.
The above can be done most simply by modifying the defined name LIST and also by using a defined name for a formula to do the punctuation replacement and space prefix and suffix.
Given the example above, we redefine LIST
LIST refers to: =" " & Sheet1!$F$2:$F$6 & " "
and, with some cell in Row 2 selected, we define theSentence
theSentence refers to: =" " & TRIM(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(Sheet1!$A2,","," "),"'"," "),"."," "),"!"," "))& " "
That particular definition will remove comma, apostrophe, period and exclamation point. If you need to remove other punctuation, you can just nest more SUBSTITUTE's
And we make some changes in the formula in B2:
B2: =IFERROR(MID(theSentence,1+AGGREGATE(15,6,SEARCH(LIST,theSentence),COLUMNS($A:A)),FIND(" ",theSentence,1+AGGREGATE(15,6,SEARCH(LIST,theSentence),COLUMNS($A:A)))-AGGREGATE(15,6,SEARCH(LIST,theSentence),COLUMNS($A:A))-1),"")
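The padding idea above can be sketched outside Excel to check the logic. This is a minimal Python sketch, not the answer's formula: the function name ordered_matches and the sample sentence are made up for illustration, and only the four punctuation marks named above are replaced.

```python
import re

def ordered_matches(sentence, words):
    """Return the list words found as whole words, ordered by position in the sentence."""
    # Replace punctuation with spaces, collapse runs of spaces (like TRIM),
    # and pad both ends with a space, as the defined name theSentence does.
    cleaned = " ".join(re.sub(r"[,'.!]", " ", sentence).split())
    padded = " " + cleaned + " "
    lowered = padded.lower()  # SEARCH is case-insensitive
    hits = []
    for word in words:
        # Pad the word with spaces too, as the redefined LIST does.
        pos = lowered.find(" " + word.lower() + " ")
        if pos != -1:
            # pos points at the leading space, so the word starts at pos + 1.
            hits.append((pos, padded[pos + 1 : pos + 1 + len(word)]))
    return [text for _, text in sorted(hits)]

matches = ordered_matches("Calso likes tea, but also coffee!", ["coffee", "also", "milk"])
first, second = (matches + ["", ""])[:2]  # empty string when there is no match
```

Here "also" is not found inside "Calso" because the search term carries a leading space, and the trailing "!" is not returned because it was replaced before searching.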

This is the second interpretation of the question (find the first and second match in the list of a string anywhere in the text).
=IFERROR(INDEX($F$2:$F$6,SMALL(IF(ISNUMBER(FIND(" "&$F$2:$F$6&" "," "&$A2&" ")),ROW($F$2:$F$6)-ROW($F$1)),COLUMNS($A1:A1))),"")
Note that this is an array formula and must be entered with Ctrl+Shift+Enter.

Related

How can I remove ONLY leading and trailing spaces while leaving spaces in between words alone with an excel formula?

In Excel, TRIM() will remove all spaces before and after text, while also removing any duplicate spaces in between words.
Is there a formula or combination thereof that will do the same as TRIM() but leave spaces between words as-is?
In the following example, I'm looking for a formula that will accomplish that of the fictitious formula "WXYZ":
TRIM("   Omicron   Persei 8   ") = "Omicron Persei 8"
WXYZ("   Omicron   Persei 8   ") = "Omicron   Persei 8"
Note that I've read somewhere that TRIM() in VBA will work like that of WXYZ above. However, I'm looking for a formula solution.
I believe this should work (assuming your string is located at A1):
=MID(A1,
FIND(LEFT(TRIM(A1),1),A1),
(LEN(A1)-MATCH(RIGHT(TRIM(A1),1),INDEX(MID(A1,LEN(A1)-ROW(INDIRECT("1:"&LEN(A1)))+1,1),0),0)-FIND(LEFT(TRIM(A1),1),A1)+2))
FIND(LEFT(TRIM(A1),1),A1) returns the location of the first non-space character in the string
MATCH(RIGHT(TRIM(A1),1),INDEX(MID(A1,LEN(A1)-ROW(INDIRECT("1:"&LEN(A1)))+1,1),0),0) returns the location of the last non-space character in the string from right-to-left.
How would this look in Excel 365? A bit easier, I think, with LET, SEQUENCE and XMATCH, but not particularly short:
=IFERROR(LET(seq,SEQUENCE(LEN(A2)),
array,MID(A2,seq,1),
start,XMATCH(TRUE,array<>" "),
finish,XMATCH(TRUE,array<>" ",0,-1),
MID(A2,start,finish-start+1)),"")
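The start/finish logic of the LET formula can be mirrored in a few lines of Python to make the steps explicit. This is only an illustrative sketch; the function name trim_outer_spaces is an assumption, not part of any answer here.

```python
def trim_outer_spaces(text):
    """Remove leading and trailing spaces but keep inner spacing untouched."""
    # Indices of every character that is not a space (array<>" " in the formula).
    non_space = [i for i, ch in enumerate(text) if ch != " "]
    if not non_space:
        return ""  # plays the role of IFERROR(..., "") for blank or all-space input
    start, finish = non_space[0], non_space[-1]  # first and last non-space position
    return text[start : finish + 1]  # MID(A2, start, finish - start + 1)

result = trim_outer_spaces("  Omicron   Persei 8  ")
```

Unlike TRIM, the three spaces between "Omicron" and "Persei" survive, because only the outer positions are used to cut the string.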
Just to add to all the valuable content:
Formula in B1:
=LET(x,TEXTSPLIT(A2," ",,1),TEXTJOIN(DROP(DROP(TEXTSPLIT(" "&A2&" ",x),,1),,-1),,x))
An array formula could also be used.
Assuming the string is located in A1, enter this array formula in B2. It is strongly suggested that the ROW(B:B) part of the formula always refers to the column where the formula is located (column B in this case), so that the formula does not return an error if the column it refers to is deleted.
=MID($A1,
FIND(LEFT(TRIM($A1),1),$A1),
1+MAX(ROW(B:B)*(ROW(B:B)<=LEN($A1))*(MID($A1,ROW(B:B),1)<>" "))
-FIND(LEFT(TRIM($A1),1),$A1))
Array formulas are entered by pressing [Ctrl] + [Shift] + [Enter] simultaneously; you will see { and } around the formula if it was entered correctly.
Regarding the formula provided by @Aakash, I suggest replacing the INDIRECT function in this part:
-ROW(INDIRECT("1:"&LEN($A7)))
with this:
-ROW(B:B)
So the formula becomes non-volatile:
=MID($A1,
FIND(LEFT(TRIM($A1),1),$A1),
(LEN($A1)-MATCH(RIGHT(TRIM($A1),1),INDEX(MID($A1,LEN($A1)-ROW(B:B)+1,1),0),0)
-FIND(LEFT(TRIM($A1),1),$A1)+2))

Excel - pick specific characters from a string after a number

I have a list of strings where I want to split the numeric part from the alphabetic part. For example, in cell A1 I have "FNN-12345 - Sample Text - 2016_AA1.1" (without the quotes ""). I want to split it to get just "Sample Text - 2016_AA1.1".
I would appreciate any guidance on the formula.
Cheers.
This is the universal solution, no matter what the first alphanumeric string is:
=RIGHT(A1,LEN(A1)-FIND(" - ",A1)-2)
It finds the first occurrence of the string " - " and keeps only the part after that string.
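To see why the formula subtracts FIND(" - ",A1) and then 2 more, here is the same arithmetic as a small Python sketch. The function name after_first_separator is hypothetical, and returning the text unchanged when the separator is missing is an assumption (the Excel formula would return an error in that case).

```python
def after_first_separator(text, sep=" - "):
    """Keep only what follows the first occurrence of sep."""
    pos = text.find(sep)  # 0-based, so it is FIND(" - ", A1) - 1
    if pos == -1:
        return text  # assumption: leave the text alone when sep is absent
    # Skip the separator itself: pos + len(sep) characters are dropped,
    # which matches LEN(A1) - FIND(" - ", A1) - 2 characters kept from the right.
    return text[pos + len(sep):]

result = after_first_separator("FNN-12345 - Sample Text - 2016_AA1.1")
```

In the sample string the separator starts at position 10 (1-based), so 12 characters are dropped and the result starts at position 13.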
You can use Excel's string functions such as LEFT, RIGHT or MID, in combination with the LEN function, to get the desired result.
For your result you could try:
=RIGHT(A1, LEN(A1) - LEN("FNN-12345 - "))
This formula takes the length of the entire cell and removes the "FNN-12345 - " part. Of course, you can add a column which contains the elements to be removed.
If the text which you want to select always begins at position 13 (as in your example), use the formula
=RIGHT(A1,LEN(A1)-12)
(supposing your original text is in the cell A1).
If you recognize the start of the text by the pattern " - ", use the formula
=RIGHT(A1,LEN(A1)-FIND(" - ",A1)-LEN(" - ")+1)

How to extract parsed data from one cell to another

Given a spreadsheet cell containing a string that consists of a hyphenated series of character segments, I need to extract the last segment.
For example, consider column A containing data strings like XX-XXX-X-XX-XX-G10, where X denotes any character. What formula would I need to place in column B to get G10 as a result?
A B
1 XX-XXX-X-XX-XX-G10 G10
I'm looking for a formula that could work in LibreOffice Calc, OpenOffice Calc, MS Excel, or Google Sheets.
Another possibility in LO Calc is to use the general purpose regular expression macro shown here: https://superuser.com/a/1072196/541756. Then the cell formula would be similar to JPV's answer:
=REFIND(A1,"([^-]+$)")
If you are using Google Sheets, REGEXEXTRACT would be possible too:
=REGEXEXTRACT(A1, "[^-]+$")
In LibreOffice Calc and OpenOffice Calc, you can use a regular expression to determine the position of the text after the last - character:
=SEARCH("-[:alnum:]+$";A1)
will return 15 if A1 contains XX-XXX-X-XX-XX-G10.
Now, you can use this value to get the text "behind" that position, using the RIGHT() function:
=RIGHT(A1;LEN(A1)-SEARCH("-[:alnum:]+$";A1))
Split up on multiple lines:
=RIGHT( ' return text beginning from the right...
A1; ' of cell A1 ...
LEN(A1) ' start at length(A1) = 18
- ' minus ...
SEARCH( ' position ...
"-[:alnum:]+$" ' of last "-" ...
;A1 ' in cell A1 = 15 ==> last three characters
)
)
It appears that you want the characters that appear at the end of a string, to the right of the last instance of a hyphen character, "-".
This formula, adapted from here, works in Excel, *Calc & Google Sheets:
=TRIM(RIGHT(SUBSTITUTE(A1,"-",REPT(" ",LEN(A1))),LEN(A1)))
Explanation:
SUBSTITUTE(A1,"-",new_string) will find each hyphen ("-") in the original string from cell A1 and replace it with a new_string.
REPT(" ",LEN(A1)) is a string of repeated space characters (" "), the same length as the original string in cell A1.
TRIM(RIGHT(string,count)) will get the right-most count characters, and trim off leading and trailing spaces. Since the string was previously padded out by replacing hyphens with spaces, and count is the same LEN(A1) used for that padding, the last count characters consist of a bunch of spaces followed by whatever followed the last hyphen!
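The padding trick explained above can be traced step by step in Python. This is a sketch of the same idea, with a hypothetical function name last_segment; note that Python's strip() only removes outer spaces, whereas Excel's TRIM would also collapse repeated spaces inside the last segment.

```python
def last_segment(text, sep="-"):
    """Return whatever follows the last sep, using the pad-and-trim trick."""
    width = len(text)
    if width == 0:
        return ""
    # SUBSTITUTE(A1, "-", REPT(" ", LEN(A1))): each separator becomes width spaces.
    padded = text.replace(sep, " " * width)
    # RIGHT(..., LEN(A1)): the last width characters can only contain
    # padding spaces plus the final segment. TRIM then drops the padding.
    return padded[-width:].strip()

result = last_segment("XX-XXX-X-XX-XX-G10")
```

When the text contains no separator at all, nothing is replaced and the whole text comes back unchanged.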
In Google Sheets, an alternative approach is to use the SPLIT function to break the value from column A into an array, then select the last element. (Excel-VBA has a split() function, so you could make this work in Excel by writing VBA code to provide it as a custom function.)
=INDEX(SPLIT(A1,"-"),0,COUNTA(SPLIT(A1,"-")))
I found a simple solution:
=RIGHT(A1;3)
That gives me G10 as the result too! It works because column A always has 3 characters at the end.

Count the frequency of a specific word in a single cell

In Microsoft Excel I wish to count the frequency of a specific word in a cell. The cell contains a few sentences. I am using a formula right now that is working, but not the way I want it to.
A1
my uncle ate potatos. potato was his favorite food. Don't mash the potato, just keep it simple.
B1 (word to count the frequency of)
potato
C1 (formula)
=(LEN(A1)-LEN(SUBSTITUTE(A1;B1;"")))/LEN(B1)
C1 Results:
3
In C1, I am getting a count 3. I want it just to be 2. So, the formula is counting potatos.
How do I make the function only count exact matches?
I've got a solution here but it's not pretty.
The problem, as I indicate in my comment, is that Excel has no internal function to see if a cell contains an 'exact match'. You can check if the total value in a cell is an exact match, but you can't check whether a search term has been conjugated like that. So, we'll need to create a special method which checks for every 'acceptable' ending to a word. In my eyes, this would be anything that ends with space, anything that ends with punctuation, and anything at the end of a cell with nothing after it.
ARRAY FORMULAS
You were on the right track with the LEN - SUBSTITUTE method, but the formula will need to be an array formula to work. Array formulas calculate the same thing multiple times over a given range of cells, instead of just once. They resolve the calculation for each individual cell in a formula and provide an array of results. This array of results must be collapsed together to get a single total result.
Consider as follows:
=LEN(C1:C6)
Confirm this formula with CTRL + SHIFT + ENTER instead of just ENTER. This gives us the LEN of C1, followed by C2, C3... etc., resulting in an array of results that looks like this [assume C1 had "a", C2 had "aa", C3 had "a", C4 had "", C5 had "aaa", and C6 had ""]:
={1;2;1;0;3;0}
To get that as a single number providing the total length of each cell individually, wrap that in a SUM function:
=SUM(LEN(C1:C6))
Confirmed again with CTRL + SHIFT + ENTER instead of just ENTER. This results in the total length of all cells: 7.
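The same "compute per cell, then collapse" idea looks like this in Python, using the cell contents assumed in the example above. The variable names are illustrative only.

```python
# Contents assumed for C1:C6 in the example above.
cells = ["a", "aa", "a", "", "aaa", ""]

# LEN(C1:C6) evaluated as an array formula: one length per cell.
lengths = [len(cell) for cell in cells]

# SUM(...) collapses the array of results into a single number.
total = sum(lengths)
```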
DEFINING AN EXACT MATCH
Now to take your question, you are looking to find all 'acceptable' matches of given word B1, within text A1. As I said before, we can define an acceptable answer as one which ends in punctuation, a space, or the end of the cell. Something at the end of the cell is a special case which we will consider later. First, take a look at the formula below. In cells C1:C6, I have manually typed a comma, a period, a semi-colon, a hyphen, a space, and a slash. These will be the 'acceptable' ways to end the word found in B1.
=LEN(SUBSTITUTE(A1,B1&C1:C6,""))
Confirmed with CTRL + SHIFT + ENTER, this takes the length of the substitution for the search term in B1 appended with the acceptable word-end in C1:C6. So it gives the length for 6 new SUBSTITUTED words. But as this is an array of results, we need to add them together to get a single number, like so:
=SUM(LEN(SUBSTITUTE(A1,B1&C1:C6,"")))
FORMULIZING THE RESULT
To work it as you have in your sentence, we will now need to subtract this length from the length of the original word. Note that there is a problem with doing this simply - since we are searching multiple times, we will need to add the length of the original word multiple times. Consider something like this:
=LEN(A1)-SUM(LEN(SUBSTITUTE(A1,B1&C1:C6,"")))
This won't work, because it only adds the length of A1 once, but it subtracts the length of the substituted strings multiple times. How about this?
=LEN(A1)*6-SUM(LEN(SUBSTITUTE(A1,B1&C1:C6,"")))
This works, because there are 6 word-end terms we search for with C1:C6, so the substitution there will occur 6 times. So we have the original length of the word 6 times, and the length of each substituted word 6 times [keep in mind that if there is no match for, say, "potato;", then that term will give the length of the original word, thus negating one of the times we added the length of that word, as expected].
To finalize this, we need to divide by the number of letters in the search term. Keep in mind that where you have "/LEN(B1)", we will need to add a character for the length of each of our word-ends.
=(LEN(A1)*6-SUM(LEN(SUBSTITUTE(A1,B1&C1:C6,""))))/(LEN(B1)+1)
Finally, we need to add the special case where the last portion of A1 is equal to the search term, with no word-end. Alone, this would be:
=IF(RIGHT(A1,LEN(B1))=B1,1,0)
This will give us a 1 if the last part of A1 is equal to B1, otherwise it gives 0. So now simply add this to our previous formula, as follows:
=(LEN(A1)*6-SUM(LEN(SUBSTITUTE(A1,B1&C1:C6,""))))/(LEN(B1)+1)+IF(RIGHT(A1,LEN(B1))=B1,1,0)
Remember to confirm with CTRL + SHIFT + ENTER, instead of just ENTER. That's it, it now gives you the count of all "exact matches" of your search term.
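The length arithmetic above can be checked with a short Python sketch that follows the formula term by term. The function name count_exact is hypothetical, and the default endings are the six characters assumed to be in C1:C6.

```python
def count_exact(text, word, endings=(",", ".", ";", "-", " ", "/")):
    """Count occurrences of word that are followed by an accepted ending."""
    # LEN(A1)*6 - SUM(LEN(SUBSTITUTE(A1, B1&C1:C6, ""))):
    # total number of characters removed across all six substitutions.
    removed = sum(len(text) - len(text.replace(word + end, "")) for end in endings)
    # Each removed match is the word plus one ending character.
    count = removed // (len(word) + 1)
    # Special case: the text ends with the word and has no ending after it.
    if text.endswith(word):
        count += 1
    return count

sentence = ("my uncle ate potatos. potato was his favorite food. "
            "Don't mash the potato, just keep it simple.")
result = count_exact(sentence, "potato")
```

"potatos." is skipped because "potato" is followed by "s" there, which is not one of the accepted endings. Like the formula, this sketch still counts a word that only appears as the tail of a longer word (for example "shoe" at the end of "ladyshoe").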
ALTERNATE APPROACH TO ARRAY FORMULAS
Note that instead of using C1:C6, you could instead hardcode your formula to look for specific punctuation as the word-end. This will be harder to maintain but, in my opinion, just as readable. It will look like this:
=(LEN(A1)*6-SUM(LEN(SUBSTITUTE(A1,B1&{",",".",";"," ","/","-"},""))))/(LEN(B1)+1)+IF(RIGHT(A1,LEN(B1))=B1,1,0)
This is still technically an "array formula", and it works on the same principles as I have described above. However, one benefit here is that you can confirm this type of entry with just ENTER. This is good, in case someone accidentally edits your cell and presses ENTER without noticing. Otherwise, this is equivalent to the format above.
Let me know if you would like any portion of this elaborated on.
I do have an alternate solution for you to consider. It takes a bit more space and the formulas are a little more convoluted, but in some senses it will be simpler.
Use column C as a new helper column. Column C will take the text from column A, and will substitute out all instances of punctuation with a " ". Once this has been done, the formula to count the instances of the search term from column B will be a simple formula essentially as you have it in your OP.
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,","," "),"."," "),";"," "),"-"," "),"/"," ")
This formula works from the inside out: it first substitutes all commas with spaces, then in that substituted text it substitutes periods with spaces, then semicolons, then dashes, and finally slashes. As you indicated, if you use semi-colons as delimiters, you will need to replace my commas separating terms with semi-colons.
Then the formula in D1 is simply what you have above in your OP, with two changes: we will be searching for B1 & " ", because we know all of the 'exact matches' now end in spaces, and we will be adding in an extra '1' if the last part of the text in C1 is the same as the search term in B1 - because if a cell ends in that word, it won't have a space, but it is still an 'exact match'. Like so:
=(LEN(C1)-LEN(SUBSTITUTE(C1,B1&" ","")))/(LEN(B1)+1)+IF(RIGHT(C1,LEN(B1))=B1,1,0)
EDIT
My list of punctuation was only a suggestion; I recommend you really go through some sample text and make sure you don't have any weird characters after words. Also, consider changing uncommon ones I have (like "/", or "-") with "?" or "!". If you want to add more, just follow the pattern of the SUBSTITUTE formula.
To make this case-insensitive, you just need to change the formula in column C to make the result all lower case, and then ensure your search terms in column B are lower case. Change column C like so:
=LOWER(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,","," "),"."," "),";"," "),"-"," "),"/"," "))
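The helper-column approach, including the lower-casing step, can be sketched in Python as follows. The function name count_word is made up for illustration, and the punctuation set is the one used in the SUBSTITUTE chain above.

```python
def count_word(text, word, punctuation=",.;-/"):
    """Case-insensitive count of word, using a cleaned-up helper string."""
    # Column C: lower-case the text and replace each punctuation mark with a space.
    helper = text.lower()
    for mark in punctuation:
        helper = helper.replace(mark, " ")
    # Column D: count the word followed by a space, via the length difference.
    target = word.lower() + " "
    count = (len(helper) - len(helper.replace(target, ""))) // len(target)
    # Extra 1 when the helper text ends with the word and no space follows it.
    if helper.endswith(word.lower()):
        count += 1
    return count

sentence = ("my uncle ate potatos. potato was his favorite food. "
            "Don't mash the potato, just keep it simple.")
result = count_word(sentence, "Potato")
```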
Sorry for making it "a new answer". You may move it wherever you want.
I have just found a solution for the question Liu Kang asked on Aug 3 2015 at 12:15. :)
Unfortunately, I do not have "50 reputation" to comment on Grade 'Eh' Bacon's solution above, where the last comment is this:
Discovered a slight problem. Using =IF(B1<>"";(LEN(A1)-LEN(SUBSTITUTE(A1;B1&" ";"")))/(LEN(B1)+1)+IF(RIGHT(A1;LEN(B1))=B1;1;0);"") with shoe in B1 gives the following result: shoe in A1 = 1 (correct), shoes in A1 = 0 (correct), ladyshoe in A1 = 1 (wrong). Guess this have to do with "RIGHT" in the formula. Is it possible to make the formula non-matching for prefix words? E.g if B1 is containing shoe and A1 is containing ladyshoe dogshoe catshoes shoes I want C1 to result in 0. – Liu Kang Aug 3 '15 at 12:15
The solution is to search for a space at the beginning of the word as well (" "&B1&" ") and to add "one" more LEN(B1)+2. So, it becomes =IF(B1<>"";(LEN(A1)-LEN(SUBSTITUTE(A1;" "&B1&" ";"")))/(LEN(B1)+2)+IF(RIGHT(A1;LEN(B1))=B1;1;0);"").
There is one more problem if the word we are looking for is at the beginning. Because there is obviously no space " " at the beginning of the sentence. I use a workaround for it - I have my sentence in A1, but then I have a hidden column B where there is =" "&A1 in B1 and it puts the "space" I need to the beginning of the sentence and everything from the original Grade 'Eh' Bacon's solution is shifted (A1->B1, B1->C1, C1->D1).
I hope it can help and thanks to all who participated in this thread, you helped me A LOT!
Do you need this to be a single formula? I have an idea, but it takes a few (relatively simple) steps.
Since you have a long sentence in A1, what about going to Data -> Text to Columns, and send this sentence into a Row, delimited by spaces. Then, remove any punctuation. Then, just do a simple Countif()?
Put the info in A1, then go to Data --> Text to Columns, choose "Delimited", click Next, and choose "Space":
Click Finish, and it'll put the entire thing into Row 1, with a word in each cell. Now just Find/Replace "." and "," with nothing.
Then, Countif to the rescue!
If that works, we can automate it in VB, so you don't have to manually find/replace the punctuation. Before I jump into that, does this method work?
Take the length of the string and minus the length of the string with the keyword replaced with nothing then divide the result by the length of the keyword:
=(LEN(A1)-LEN(SUBSTITUTE(A1,B1,"")))/LEN(B1)

Counting number of spaces before a string in Excel

A program that exports to Excel creates a file with an indented list in a single column like this:
Column A
First Text
Second Text
Third Text
Fourth Text
Fifth Text
How can I create a function in Excel that counts the number of white spaces before the string of text?
So as to return 1 for the first text row, 3 for the third row, etc. in this example.
Preferably seeking a non-VBA solution.
TRIM doesn't help here because it removes double spaces also between words.
The main idea is to find the FIRST letter in the trimmed string and find its position in the original string:
=FIND(LEFT(TRIM(A1),1),A1)-1
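The idea can be checked with a small Python sketch. The function name leading_spaces is hypothetical, and returning the full length for an all-space string is an assumption of this sketch rather than something the formula defines.

```python
def leading_spaces(text):
    """Count the spaces in front of the first non-space character."""
    trimmed = text.strip(" ")
    if not trimmed:
        return len(text)  # assumption: an all-space string counts every space
    # Everything before the first non-space character is a space, so the first
    # occurrence of that character is also the first non-space position.
    # Python indices are 0-based, which replaces the "-1" in the formula.
    return text.find(trimmed[0])

counts = [leading_spaces(s) for s in [" First Text", "   Third Text", "No indent"]]
```

Spaces between words or after the text do not affect the result, because only the position of the first non-space character is used.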
You can try this function in Ms Excel itself:
=LEN(A1)-LEN(SUBSTITUTE(A1," ",""))
This would apply if the results are in a single cell. If it is for a whole row/column, just drag the formula accordingly.
Try below:
=FIND(" ",A1,1)-1
It calculates the position of the first found whitespace character in a cell and reduces it by 1 to reflect number of characters before that position.
As per http://www.mrexcel.com/forum/excel-questions/61485-counting-spaces.html, you may try:
=LEN(Cell)-LEN(SUBSTITUTE(Cell," ",""))
where Cell is your target cell (i.e. A1, B1, D3, etc.).
My example:
B8: =LEN(F8)-LEN(SUBSTITUTE(F8," ",""))
F8: [ this is a test ]
produces 4 in B8.
The above method will count spaces before the string if any were inserted, between individual words and after the string, if any were inserted. It won't count available space that does not have an actual white space character. So, if I inserted two spaces after test in the above example, the total count would be raised to 6.
As has been pointed out in the other answers, you can't really use TRIM or SUBSTITUTE as potential spaces in between words or at the end will give you the wrong result.
However, this formula will work:
=MATCH(TRUE,MID(A1,COLUMN($A$1:$J$1),1)<>" ",0)-1
You need to enter it as an array formula, i.e. press Ctrl-Shift-Enter instead of Enter.
In case you expect more than 10 spaces, replace the $J with a column letter further down in the alphabet!
Here's my solution. If the left 5 characters equals "_____" (5 blank spaces), then return 5, else look for 4 spaces, and so on.
=IF(LEFT(B1,5)="     ",5,IF(LEFT(B1,4)="    ",4,IF(LEFT(B1,3)="   ",3,IF(LEFT(B1,2)="  ",2,1))))
You almost got it with LEN + TRIM in answers before, you only need to combine both:
=LEN(Cell)-LEN(TRIM(Cell))
If the cell is indented (via cell formatting), you could create a user-defined function like this:
Function IndentLevel(Cell As Range)
    ' This function returns the indentation level of a cell's content.
    ' Application.Volatile makes sure that the function is recalculated
    ' whenever the worksheet is recalculated, for example when you press
    ' F9 (Windows) or press Enter in a cell.
    Application.Volatile
    ' Return the IndentLevel
    IndentLevel = Cell.IndentLevel
End Function
This will work only if the cell is indented through formatting; you can see this property under Format Cells -> Alignment.
After this, the function shows the indentation level.
