Count the frequency of a specific word in a single cell - excel

In Microsoft Excel I wish to count the frequency of a specific word in a cell. The cell contains a few sentences. I am using a formula right now that is working, but not the way I want it to.
A1
my uncle ate potatos. potato was his favorite food. Don't mash the potato, just keep it simple.
B1 (word to count the frequency of)
potato
C1 (formula)
=(LEN(A1)-LEN(SUBSTITUTE(A1;B1;"")))/LEN(B1)
C1 Results:
3
In C1, I am getting a count of 3, but I want it to be 2. The formula is also counting "potatos".
How do I make the function only count exact matches?

I've got a solution here but it's not pretty.
The problem, as I indicate in my comment, is that Excel has no internal function to see if a cell contains an 'exact match'. You can check if the total value in a cell is an exact match, but you can't check whether a search term has been conjugated like that. So, we'll need to create a special method which checks for every 'acceptable' ending to a word. In my eyes, this would be anything that ends with space, anything that ends with punctuation, and anything at the end of a cell with nothing after it.
ARRAY FORMULAS
You were on the right track with the LEN - SUBSTITUTE method, but the formula will need to be an array formula to work. Array formulas calculate the same thing multiple times over a given range of cells, instead of just once. They resolve the calculation for each individual cell in a formula and provide an array of results. This array of results must be collapsed together to get a single total result.
Consider as follows:
=LEN(C1:C6)
Confirm this formula with CTRL + SHIFT + ENTER instead of just ENTER. This gives us the LEN of C1, followed by C2, C3... etc., resulting in an array of results that looks like this [assume C1 had "a", C2 had "aa", C3 had "a", C4 had "", C5 had "aaa", and C6 had ""]:
={1;2;1;0;3;0}
To get that as a single number providing the total length of each cell individually, wrap that in a SUM function:
=SUM(LEN(C1:C6))
Confirmed again with CTRL + SHIFT + ENTER instead of just ENTER. This results in the total length of all cells: 7.
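If it helps to see the same idea outside Excel, here is a minimal Python sketch of what the array formula does. The list stands in for cells C1:C6 with the contents assumed in the example above; it is only an illustration, not how Excel evaluates the formula internally.

```python
# Hypothetical cell contents for C1:C6, taken from the example above.
cells = ["a", "aa", "a", "", "aaa", ""]

# Counterpart of the array result of =LEN(C1:C6): one length per cell.
lengths = [len(cell) for cell in cells]

# Counterpart of =SUM(LEN(C1:C6)): collapse the array into one number.
total = sum(lengths)

print(lengths)  # [1, 2, 1, 0, 3, 0]
print(total)    # 7
```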
DEFINING AN EXACT MATCH
Now to take your question, you are looking to find all 'acceptable' matches of the given word in B1 within the text in A1. As I said before, we can define an acceptable match as one which ends in punctuation, a space, or the end of the cell. Something at the end of the cell is a special case which we will consider later. First, take a look at the formula below. In cells C1:C6, I have manually typed a comma, a period, a semicolon, a hyphen, a space, and a slash. These will be the 'acceptable' ways to end the word found in B1.
=LEN(SUBSTITUTE(A1,B1&C1:C6,""))
Confirmed with CTRL + SHIFT + ENTER, this takes the length of the substitution for the search term in B1 appended with the acceptable word-end in C1:C6. So it gives the length for 6 new SUBSTITUTED words. But as this is an array of results, we need to add them together to get a single number, like so:
=SUM(LEN(SUBSTITUTE(A1,B1&C1:C6,"")))
FORMULIZING THE RESULT
To work it as you have in your question, we now need to subtract this length from the length of the original text. Note that there is a problem with doing this simply: since we are searching multiple times, we need to add the length of the original text multiple times. Consider something like this:
=LEN(A1)-SUM(LEN(SUBSTITUTE(A1,B1&C1:C6,"")))
This won't work, because it only adds the length of A1 once, but it subtracts the length of the substituted strings multiple times. How about this?
=LEN(A1)*6-SUM(LEN(SUBSTITUTE(A1,B1&C1:C6,"")))
This works, because there are 6 word-end terms we search for with C1:C6, so the substitution occurs 6 times. So we have the length of the original text 6 times, and the length of each substituted text 6 times [keep in mind that if there is no match for, say, "potato;", then that term gives the length of the original text, thus cancelling one of the times we added that length, as expected].
To finalize this, we need to divide by the number of letters in the search term. Keep in mind that where you have "/LEN(B1)", we will need to add a character for the length of each of our word-ends.
=(LEN(A1)*6-SUM(LEN(SUBSTITUTE(A1,B1&C1:C6,""))))/(LEN(B1)+1)
Finally, we need to add the special case where the last portion of A1 is equal to the search term, with no word-end. Alone, this would be:
=IF(RIGHT(A1,LEN(B1))=B1,1,0)
This will give us a 1 if the last part of A1 is equal to B1, otherwise it gives 0. So now simply add this to our previous formula, as follows:
=(LEN(A1)*6-SUM(LEN(SUBSTITUTE(A1,B1&C1:C6,""))))/(LEN(B1)+1)+IF(RIGHT(A1,LEN(B1))=B1,1,0)
Remember to confirm with CTRL + SHIFT + ENTER, instead of just ENTER. That's it, it now gives you the count of all "exact matches" of your search term.
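To check the arithmetic outside Excel, here is a rough Python sketch of the same calculation. The function name and the WORD_ENDS list are my own stand-ins for the formula and for cells C1:C6; str.replace plays the role of SUBSTITUTE.

```python
WORD_ENDS = [",", ".", ";", "-", " ", "/"]  # same six word-ends as C1:C6


def count_exact(text: str, word: str) -> int:
    """Count occurrences of word followed by a word-end, or at the very end."""
    # LEN(A1)*6 - SUM(LEN(SUBSTITUTE(A1, B1 & C1:C6, "")))
    removed = sum(len(text) - len(text.replace(word + end, ""))
                  for end in WORD_ENDS)
    # Each match removes the word plus one word-end character.
    count = removed // (len(word) + 1)
    # IF(RIGHT(A1, LEN(B1)) = B1, 1, 0)
    if text.endswith(word):
        count += 1
    return count


text = ("my uncle ate potatos. potato was his favorite food. "
        "Don't mash the potato, just keep it simple.")
print(count_exact(text, "potato"))  # 2
print(text.count("potato"))         # 3, the plain substring count
```

The second print shows the count from the original question's approach, which also picks up "potatos".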
ALTERNATE APPROACH TO ARRAY FORMULAS
Note that instead of using C1:C6, you could instead hardcode your formula to look for specific punctuation as the word-end. This will be harder to maintain but, in my opinion, just as readable. It will look like this:
=(LEN(A1)*6-SUM(LEN(SUBSTITUTE(A1,B1&{",",".",";"," ","/","-"},""))))/(LEN(B1)+1)+IF(RIGHT(A1,LEN(B1))=B1,1,0)
This is still technically an "array formula", and it works on the same principles as I have described above. However, one benefit here is that you can confirm this type of entry with just ENTER. This is good, in case someone accidentally edits your cell and presses ENTER without noticing. Otherwise, this is equivalent to the format above.
Let me know if you would like any portion of this elaborated on.

I do have an alternate solution for you to consider. It takes a bit more space and the formulas are a little more convoluted, but in some senses it will be simpler.
Use column C as a new helper column. Column C will take the text from column A, and will substitute out all instances of punctuation with a " ". Once this has been done, the formula to count the instances of the search term from column B will be a simple formula essentially as you have it in your OP.
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,","," "),"."," "),";"," "),"-"," "),"/"," ")
This formula works from the innermost SUBSTITUTE outwards: it first replaces all commas with spaces, then with that substituted text it replaces periods with spaces, then semicolons, then dashes, and finally slashes. As you indicated, if you use semi-colons as delimiters, you will need to replace my commas separating terms with semi-colons.
Then the formula in D1 is simply what you have above in your OP, with two changes: we will be searching for B1 & " ", because we know all of the 'exact matches' now end in spaces, and we will be adding in an extra '1' if the last part of the text in C1 is the same as the search term in B1 - because if a cell ends in that word, it won't have a space, but it is still an 'exact match'. Like so:
=(LEN(C1)-LEN(SUBSTITUTE(C1,B1&" ","")))/(LEN(B1)+1)+IF(RIGHT(C1,LEN(B1))=B1,1,0)
EDIT
My list of punctuation was only a suggestion; I recommend you really go through some sample text and make sure you don't have any other characters directly after words. Also, consider replacing the uncommon ones I have (like "/" or "-") with "?" or "!". If you want to add more, just follow the pattern of the SUBSTITUTE formula.
To make this case-insensitive, you just need to change the formula in column C to make the result all lower case, and then ensure your search terms in column B are lower case. Change column C like so:
=LOWER(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,","," "),"."," "),";"," "),"-"," "),"/"," "))
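As a sanity check, here is a small Python sketch of the helper-column idea, including the lower-casing step. The names normalize and count_word are made up for illustration; normalize corresponds to column C and count_word to the formula in D1.

```python
PUNCTUATION = [",", ".", ";", "-", "/"]  # same marks as the SUBSTITUTE chain


def normalize(text: str) -> str:
    """Helper column C: punctuation becomes spaces, everything lower case."""
    for mark in PUNCTUATION:
        text = text.replace(mark, " ")
    return text.lower()


def count_word(text: str, word: str) -> int:
    """Column D: count 'word ' in the cleaned text, plus a match at the end."""
    cleaned = normalize(text)
    word = word.lower()
    count = cleaned.count(word + " ")
    if cleaned.endswith(word):
        count += 1
    return count


print(normalize("a-b/c"))                                  # a b c
print(count_word("Potato, potatos. POTATO", "potato"))     # 2
```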

Sorry for making it "a new answer". You may move it wherever you want.
I have just found a solution to the question Liu Kang asked on Aug 3 2015 at 12:15. :)
Unfortunately, I do not have "50 reputation" to comment on Grade 'Eh' Bacon's solution above, where the last comment is this:
Discovered a slight problem. Using =IF(B1<>"";(LEN(A1)-LEN(SUBSTITUTE(A1;B1&" ";"")))/(LEN(B1)+1)+IF(RIGHT(A1;LEN(B1))=B1;1;0);"") with shoe in B1 gives the following result: shoe in A1 = 1 (correct), shoes in A1 = 0 (correct), ladyshoe in A1 = 1 (wrong). Guess this have to do with "RIGHT" in the formula. Is it possible to make the formula non-matching for prefix words? E.g if B1 is containing shoe and A1 is containing ladyshoe dogshoe catshoes shoes I want C1 to result in 0. – Liu Kang Aug 3 '15 at 12:15
The solution is to also search for a space at the beginning of the word (" "&B1&" ") and, because each match is now one character longer, to divide by LEN(B1)+2. So, it becomes =IF(B1<>"";(LEN(A1)-LEN(SUBSTITUTE(A1;" "&B1&" ";"")))/(LEN(B1)+2)+IF(RIGHT(A1;LEN(B1))=B1;1;0);"").
There is one more problem if the word we are looking for is at the beginning. Because there is obviously no space " " at the beginning of the sentence. I use a workaround for it - I have my sentence in A1, but then I have a hidden column B where there is =" "&A1 in B1 and it puts the "space" I need to the beginning of the sentence and everything from the original Grade 'Eh' Bacon's solution is shifted (A1->B1, B1->C1, C1->D1).
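Here is a rough Python sketch of the padded search, in case it makes the idea easier to follow. The function name is hypothetical, and note one deliberate difference that is my own assumption rather than part of the formula above: the end-of-text check also requires the leading space, so a text ending in "ladyshoe" is not counted.

```python
def count_whole_word(text: str, word: str) -> int:
    """Count word only when it is delimited by spaces or the text boundaries."""
    padded = " " + text                 # hidden helper column: =" "&A1
    needle = " " + word + " "           # " "&B1&" "
    removed = len(padded) - len(padded.replace(needle, ""))
    count = removed // (len(word) + 2)  # each match removes the word plus 2 spaces
    # Assumption: the end check includes the leading space as well.
    if padded.endswith(" " + word):
        count += 1
    return count


print(count_whole_word("ladyshoe dogshoe catshoes shoes", "shoe"))  # 0
print(count_whole_word("shoe", "shoe"))                             # 1
print(count_whole_word("shoe is a shoe", "shoe"))                   # 2
```

One limitation of this sketch: when the word appears twice in a row ("shoe shoe"), the two matches share a single space, so the non-overlapping replace sees only one of them.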
I hope it can help and thanks to all who participated in this thread, you helped me A LOT!

Do you need this to be a single formula? I have an idea, but it takes a few (relatively simple) steps.
Since you have a long sentence in A1, what about going to Data -> Text to Columns, and send this sentence into a Row, delimited by spaces. Then, remove any punctuation. Then, just do a simple Countif()?
Put the info in A1, then go to Data --> Text to Columns, choose "Delimited", click Next, and choose "Space".
Click Finish, and it'll put the entire thing into Row 1, with a word in each cell. Now just Find/Replace "." and "," with nothing.
Then, Countif to the rescue!
If that works, we can automate it in VB, so you don't have to manually find/replace the punctuation. Before I jump into that, does this method work?
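For comparison, the same split-then-count idea can be sketched in a few lines of Python. This is only an illustration of the steps above (the function name is made up), and unlike COUNTIF the comparison here is case-sensitive.

```python
def count_by_splitting(text: str, word: str) -> int:
    """Split on spaces, strip '.' and ',', then count exact matches."""
    cells = text.split(" ")                                        # Text to Columns
    cells = [c.replace(".", "").replace(",", "") for c in cells]   # Find/Replace
    return cells.count(word)                                       # COUNTIF


text = ("my uncle ate potatos. potato was his favorite food. "
        "Don't mash the potato, just keep it simple.")
print(count_by_splitting(text, "potato"))  # 2
```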

Take the length of the string, subtract the length of the string with the keyword replaced by nothing, then divide the result by the length of the keyword:
=(LEN(A1)-LEN(SUBSTITUTE(A1,B1,"")))/LEN(B1)

Related

Excel find text value in string

I have a string such as K68272CAA6A1
I need a formula that skips the first character (so it treats the string as 68272CAA6A1) and finds the first letter after it. The cell value should be 7, because the first letter is "C" and it is the 7th character of my string (counting the "K").
And after that I'll split rest of them. But I'm confused about this issue.
If I understand you correctly, you are looking for the position of the 2nd letter in your string. That number is given by the following array-entered formula.
To enter an array formula, hold down ctrl+shift while hitting Enter. If you do this correctly, in the Formula Bar you will see braces {...} around the formula:
=MATCH(FALSE,ISNUMBER(MID(A1,ROW(INDIRECT("2:99")),1)/1),0)+1
The 99 just needs to be some number larger than the length of your longest string.
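To make the logic of the MATCH formula concrete, here is a minimal Python sketch. The function name is hypothetical; it scans from the 2nd character onwards and returns the 1-based position of the first character that is not a digit.

```python
DIGITS = "0123456789"


def second_letter_position(text: str) -> int:
    """1-based position of the first non-digit character after position 1."""
    for position, char in enumerate(text[1:], start=2):
        if char not in DIGITS:
            return position
    raise ValueError("no non-digit character after the first position")


print(second_letter_position("K68272CAA6A1"))  # 7
```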
If I understood you correctly, a formula that implements this functionality (assuming cell A1 = K68272CAA6A1 and B1 = K) would be:
=FIND(LEFT(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(RIGHT(A1,LEN(A1)-FIND(B1,A1)),"0",""),"1",""),"2",""),"3",""),"4",""),"5",""),"6",""),"7",""),"8",""),"9",""),1),RIGHT(A1,LEN(A1)-FIND(B1,A1)))
The long sequence of substitute is there to remove the numbers (I couldn't find a specific formula to remove them).
For your example, this gigantic formula simply gives the answer 6.
To get the strings separated as you want, all you need to do is =LEFT(A1,D1) and =RIGHT(A1,LEN(A1)-D1), supposing the long formula is in D1. In your example these yield K68272 and CAA6A1 respectively.

excel functions find pattern of a string in a cell

I have personal ID's in reports I have to find in one cell. Too bad the string in the cell which hides this ID can be anything, the ID can be at the beginning, the end, anywhere, but it is there.
The only thing I know is the pattern "space,letter,letter,number,number,number,number,number,number,space". Like DB544345
I was looking for the correct word for this "mask", but couldn't find an answer. Thank you for your help.
As the comments are numerous I have created a minimal example that might represent what the OP is dealing with:
A1: 123456789 DB544345 asdfg asdfghjk
A2: creating dummy data is a DB544345 pain
A3: DB5443456 and soething else
I parsed a copy of that in column B with Text To Columns (with space as the delimiter), then applied:
=IFERROR(IF(AND(LEN(B1)=8,CODE(LEFT(B1))>64,CODE(LEFT(B1))<91,CODE(MID(B1,2,1))>64,CODE(MID(B1,2,1))<91,ISNUMBER(RIGHT(B1,6)*1),RIGHT(B1,6)*1>99999),B1,""),"")
to K1, copied this across to P1 and then K1:P1 down.
A concise "built-in function only" solution to a problem such as this requires a bit of tinkering as many attempts will dead-end or need workarounds due to deficiencies and quirks in the built-in Excel formulas. I much prefer single cell formulas because they minimally affect the general spreadsheet structure. However, due to the limitations listed above, complex single cell solutions often come at the cost of being rather long and cumbersome (this answer is somehow still only two lines on my formula bar in Excel). I came back to your question and cobbled together a formula that can (as far as I have tested) extract the first occurrence of this pattern with a single cell formula. This is an array formula (Ctrl+Shift+Enter instead of Enter) that assumes your data is in A2. This rough formula returns the first 8 characters if no match is found and throws #REF if the string is shorter than 10 characters.
=MID(A2,MIN(IF(MID(A2,ROW(INDIRECT("A1:A"&(LEN(A2)-9))),1)=" ",IF(MID(A2,ROW(INDIRECT("A1:A"&(LEN(A2)-9)))+9,1)=" ",IF(CODE(MID(A2,ROW(INDIRECT("A1:A"&(LEN(A2)-9)))+1,1))>64,IF(CODE(MID(A2,ROW(INDIRECT("A1:A"&(LEN(A2)-9)))+1,1))<91,IF(CODE(MID(A2,ROW(INDIRECT("A1:A"&(LEN(A2)-9)))+2,1))>64,IF(CODE(MID(A2,ROW(INDIRECT("A1:A"&(LEN(A2)-9)))+2,1))<91,IF(IFERROR(MID(A2,ROW(INDIRECT("A1:A"&(LEN(A2)-9)))+3,6)*1>99999,FALSE),ROW(INDIRECT("A1:A"&(LEN(A2)-9)))))))))))+1,8)
Let me try to break this down at least on a high level. We are splitting the main text into every possible ten-character chunk so that we can test each one, using the suggestion of @pnuts to verify the character codes of the first two characters and run a numeric check on the rest of the string. This first block recurs throughout my formula. It generates a list of numbers from 1 to n-9, where n is the length of our main text string.
ROW(INDIRECT("A1:A"&(LEN(A2)-9)))
Let's assume our string is 40 characters long and replace the above formula with {1...31}. Using this number sequence generation we can check if characters 1 to 31 are spaces:
IF(MID(A2,{1...31},1)=" "
Then we can check if characters 10 to 40 are spaces:
IF(MID(A2,{1...31}+9,1)=" "
Then we can check if characters 2 to 32 are capital letters:
IF(CODE(MID(A2,ROW(INDIRECT("A1:A"&(LEN(A2)-9)))+1,1))>64,
IF(CODE(MID(A2,ROW(INDIRECT("A1:A"&(LEN(A2)-9)))+1,1))<91
Then we can check if characters 3 to 33 are capital letters:
IF(CODE(MID(A2,ROW(INDIRECT("A1:A"&(LEN(A2)-9)))+2,1))>64,
IF(CODE(MID(A2,ROW(INDIRECT("A1:A"&(LEN(A2)-9)))+2,1))<91
Then we can check if the strings of characters 4 to 9, 5 to 10, ..., 33 to 38, 34 to 39 are six-digit numbers:
IF(IFERROR(MID(A2,ROW(INDIRECT("A1:A"&(LEN(A2)-9)))+3,6)*1>99999,FALSE)
If all conditions are TRUE, that 10-character chunk test returns the index of its first character in the string via another instance of the original array {1...31}. Otherwise it returns FALSE, which MIN ignores. We take the MIN of all returned indexes and then use the MID function to grab the 8-character string that starts one position after that minimum index:
=MID(A2,MIN(matching index list)+1,8)
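If the breakdown is easier to follow as a loop, here is a rough Python sketch of the same sliding-window test. The function name is hypothetical, and unlike the formula it returns None rather than the first 8 characters when nothing matches.

```python
DIGITS = "0123456789"
UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def find_id(text: str):
    """Return the first 'space, 2 capitals, 6 digits, space' ID, or None."""
    # One 10-character chunk per start index, like {1...n-9} in the formula.
    for start in range(len(text) - 9):
        chunk = text[start:start + 10]
        body = chunk[1:9]
        if (chunk[0] == " " and chunk[9] == " "
                and body[0] in UPPER and body[1] in UPPER
                and all(c in DIGITS for c in body[2:])
                and int(body[2:]) > 99999):
            return body
    return None


print(find_id("123456789 DB544345 asdfg asdfghjk"))       # DB544345
print(find_id("creating dummy data is a DB544345 pain"))  # DB544345
print(find_id("DB5443456 and soething else"))             # None
```

As with the formula, an ID at the very start or end of the text is not found, because the surrounding spaces are part of the test.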
I think this will work, if we assume that the SPACE at the beginning and end are merely to differentiate the ID from the rest of the string; hence would not be present if the ID were at the beginning or end of the string. This formula is case insensitive. If case sensitivity is required, we could do character code comparisons.
=LOOKUP(2,1/((LEFT(myArr,2)>="AA")*(LEFT(myArr,2)<="ZZ")*(LEN(myArr)=8)*ISNUMBER(-RIGHT(myArr,6))),myArr)
Where myArr refers to:
=TRIM(MID(SUBSTITUTE(TRIM(Sheet2!A1)," ",REPT(" ",99)),(ROW(INDIRECT("1:10"))-1)*99+1,99))
If myArr is initially defined with the cursor in B1, referring to A1 as shown, it will adjust to refer to the cell one column to the left of the cell in which the Name appears.
The 10 in 1:10 is the maximum number of words in the string -- can be adjusted if required.

Counting number of spaces before a string in Excel

A program that exports to Excel creates a file with an indented list in a single column like this:
Column A
First Text
Second Text
Third Text
Fourth Text
Fifth Text
How can I create a function in excel that counts the number of white spaces before the string of text?
So as to return 1 for the first text row, 3 for the third row, etc. in this example.
Preferably seeking a non-VBA solution.
TRIM doesn't help here because it removes double spaces also between words.
The main idea is to find the FIRST letter in the trimmed string and find its position in the original string:
=FIND(LEFT(TRIM(A1),1),A1)-1
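The same idea in a minimal Python sketch, if you want to test it on a few strings first. The function name is made up; str.find is 0-based, so its result already equals FIND(...)-1.

```python
def leading_spaces(text: str) -> int:
    """Number of spaces before the first non-space character."""
    first_char = text.strip(" ")[:1]   # LEFT(TRIM(A1),1)
    return text.find(first_char)       # FIND(..., A1) - 1


print(leading_spaces(" First Text"))     # 1
print(leading_spaces("   Third Text"))   # 3
print(leading_spaces("No indent"))       # 0
```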
You can try this function in Ms Excel itself:
=LEN(A1)-LEN(SUBSTITUTE(A1," ",""))
This would apply if the results are in a single cell. If it is for a whole row/column, just drag the formula accordingly.
Try below:
=FIND(" ",A1,1)-1
It calculates the position of the first found whitespace character in a cell and reduces it by 1 to reflect number of characters before that position.
As per http://www.mrexcel.com/forum/excel-questions/61485-counting-spaces.html, you may try:
=LEN(Cell)-LEN(SUBSTITUTE(Cell," ",""))
where Cell is your target cell (i.e. A1, B1, D3, etc.).
My example:
B8: =LEN(F8)-LEN(SUBSTITUTE(F8," ",""))
F8: [ this is a test ]
produces 4 in B8.
The above method will count spaces before the string if any were inserted, between individual words and after the string, if any were inserted. It won't count available space that does not have an actual white space character. So, if I inserted two spaces after test in the above example, the total count would be raised to 6.
As has been pointed out in the other answers, you can't really use TRIM or SUBSTITUTE as potential spaces in between words or at the end will give you the wrong result.
However, this formula will work:
=MATCH(TRUE,MID(A1,COLUMN($A$1:$J$1),1)<>" ",0)-1
You need to enter it as an array formula, i.e. press Ctrl-Shift-Enter instead of Enter.
If you expect more than 9 leading spaces (columns A:J only examine the first 10 characters), replace the $J with a column letter further along in the alphabet!
Here's my solution. If the left 5 characters equal "_____" (5 blank spaces), then return 5, else look for 4 spaces, and so on.
=IF(LEFT(B1,5)=REPT(" ",5),5,IF(LEFT(B1,4)=REPT(" ",4),4,IF(LEFT(B1,3)=REPT(" ",3),3,IF(LEFT(B1,2)=REPT(" ",2),2,IF(LEFT(B1,1)=" ",1,0)))))
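You almost got it with LEN + TRIM in the answers before; you only need to combine both. Note that TRIM also removes trailing spaces and collapses repeated spaces between words, so the result equals the number of leading spaces only when the text has none of those: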
=LEN(Cell)-LEN(TRIM(Cell))
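To see what this formula actually counts, here is a small Python sketch. excel_trim is my own rough stand-in for Excel's TRIM (strip both ends, collapse runs of spaces to one), not an exact emulation.

```python
import re


def excel_trim(text: str) -> str:
    """Rough stand-in for TRIM: strip the ends, collapse runs of spaces."""
    return re.sub(r" +", " ", text.strip(" "))


def len_minus_trim(text: str) -> int:
    """Counterpart of =LEN(Cell)-LEN(TRIM(Cell))."""
    return len(text) - len(excel_trim(text))


print(len_minus_trim("   Third Text"))    # 3, only leading spaces present
print(len_minus_trim("   Third  Text "))  # 5, extra inner and trailing spaces count too
```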
If it is Indented you could create a Personal Function like this:
Function IndentLevel(Cell As Range)
    'This function returns the indentation of a cell's content
    Application.Volatile
    'With "Application.Volatile" you can make sure that the function will be
    'recalculated once the worksheet is recalculated,
    'for example, when you press F9 (Windows) or press Enter in a cell
    IndentLevel = Cell.IndentLevel
    'Return the IndentLevel
End Function
This will work only if the cell is indented through formatting; you can see this property under Format Cells -> Alignment.
After this you can see the indentation level.

How to convert a word to a Unique Code in Excel using Formula without using VBA?

Say, I got 2 words A1:ddC, A2:DDC
I want to convert these 2 words into a unique code so that I can do a case-sensitive VLOOKUP.
So I tried =Code(A1) and it returned 100, but =Code("dady") also returns 100, because =Code() only picks the first character of the word.
I want to convert a word to a Unique Code (could be ASCII code or any form of unique code).
So how to do that without using VBA?
As this is a hash, it would be possible for some strings to end up with the same value, but it would be unlikely.
Note that the row function uses 1:255 to generate a list of numbers from 1 to 255 - change this number if your strings end up longer.
=A1&SUMPRODUCT(IF(IFERROR(CODE(MID(A1,ROW($1:$255),1)),0)>96,1,0),POWER(2,ROW($1:$255)))
This has to be entered as an array formula using CTRL+SHIFT+ENTER - you will see {} around the formula if you have successfully done that.
This will produce a decimal representation of the upper and lower case letters, and this is then appended to the word itself - this will guarantee uniqueness, as the only way to have a word and number match is to have the same word and case, which means it was a duplicate in the first place.
With this, ddC = ddC & 1*2 + 1*4 + 0*8 = ddC6
DDC = DDC & 0*2 + 0*4 + 0*8 = DDC0
"ddC " (ddC with a space after it) = "ddC " & 1*2 + 1*4 + 0*8 + 0*16 = "ddC 6"
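To double-check the arithmetic, here is a minimal Python sketch of the same calculation. The function name is hypothetical; ord() stands in for CODE, which gives the same values for plain ASCII text.

```python
def case_code(word: str) -> str:
    """Append a bit mask marking which positions hold a code above 96."""
    # Position 1 contributes 2, position 2 contributes 4, and so on,
    # like POWER(2, ROW($1:$255)) in the formula.
    mask = sum(2 ** position
               for position, char in enumerate(word, start=1)
               if ord(char) > 96)
    return f"{word}{mask}"


print(case_code("ddC"))   # ddC6
print(case_code("DDC"))   # DDC0
print(case_code("ddC "))  # ddC 6
```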
WARNING: This is not a solution to the titled question
"How to convert a word to a Unique Code in Excel using Formula without using VBA?" but instead a solution to what I believe is the underlying problem, as the original question states "so that i can do the Case Sensitive Vlookup." It accomplishes a case-sensitive VLOOKUP without the need to convert the values first.
An alternative to converting all the values then doing a look up on the converted values, you could use the INDEX and MATCH functions in an array entered formula and directly look up the values:
=INDEX(A1:A14,MATCH(TRUE,EXACT(A1:A14,"ddC"),0))
This returns the value in A1:A14 at the index of the exact (case-sensitive) match for ddC in A1:A14. You can very easily modify this into a lookup of other columns.
Explanation:
Start with getting an array of all exact matches in your look up list to your look up value:
So if I enter this formula:
=EXACT(A1:A14,"ddC")
Then go into the formula bar and press F9 it will show me an array of true false values, relating to each cell in the range A1:A14 that are an Exact match to my expression "ddC":
Now we take this Boolean array and use the MATCH function to return the relative position of TRUE in the array:
=MATCH(TRUE,EXACT(A1:A14,"ddC"),0)
But remember we need to enter this by pressing Ctrl + Shift + Enter because we need the EXACT(A1:A14,"ddC") portion of the formula to be returned as an array.
Now that we have the position of the True in the array, in this case 6 we can use that to retrieve the corresponding value in any column, as long as it is relational and that same size. So if we want to return the value of the exact match (although relatively useless in this situation, but will continue for demonstration) in the original look up column we just wrap the last formula up in an Index function:
=INDEX(A1:A14,MATCH(TRUE,EXACT(A1:A14,"ddC"),0))
But remember we need to enter this by pressing Ctrl + Shift + Enter because we need the EXACT(A1:A14,"ddC") portion of the formula to be returned as an array.
Now we can apply that same concept to a larger range for more useful look up function:
But remember we need to enter this by pressing Ctrl + Shift + Enter because we need the EXACT(A1:A14,"ddC") portion of the formula to be returned as an array.
Now notice in this last step I offered 2 formulas:
=INDEX(A1:B14,MATCH(TRUE,EXACT(A1:A14,D2),0),2)
And
=INDEX(B1:B14,MATCH(TRUE,EXACT(A1:A14,D2),0))
The first returns the value in the range A1:B14 in the Second column at the position of the exact match in A1:A14 to the value in D2 (in this case "dady")
The second returns the value in the range B1:B14 at the position of the exact match in A1:A14 to the value in D2 (in this case "dady")
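For readers who think more easily in code, here is a rough Python sketch of the EXACT / MATCH / INDEX chain. The two lists are made-up sample data standing in for columns A and B, and the function name is hypothetical.

```python
# Hypothetical lookup table: keys play the role of column A, values of column B.
keys = ["DDC", "ddc", "ddC", "dady"]
values = ["first", "second", "third", "fourth"]


def exact_lookup(key: str, keys: list, values: list) -> str:
    """Case-sensitive lookup: value at the position of the first exact match."""
    matches = [candidate == key for candidate in keys]  # EXACT(A1:A14, D2)
    position = matches.index(True)                      # MATCH(TRUE, ..., 0), 0-based here
    return values[position]                             # INDEX(B1:B14, position)


print(exact_lookup("ddC", keys, values))   # third
print(exact_lookup("dady", keys, values))  # fourth
```

When there is no exact match, this sketch raises ValueError, where the Excel formula returns #N/A.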
Hopefully someone else can add more input, but as far as I know the second might perform better, as it has a smaller index range and doesn't require going to the specified column. It is also shorter.
The first is, to me, much easier to read (more of a preference, I think), because you know that you're looking at a lookup table that spans 2 columns and that you are returning the value in the second column.
Notes: I am not sure if this solution will be better in practice than converting the original values in the first place. Converting all the values once and then hard coding the converted values requires no additional formula or calculation once finished (if the formulas are afterwards replaced with values), while this method will recalculate and is also array entered. But I feel that if the asker is doing a single lookup against a changing lookup list (one that constantly requires all values to be converted using an array formula), this option allows you to replace the formula per word with one single formula.
all in all I hope this solves your original problem,
Cheers!!
If all your strings are like the ones you showed above, try something like this:
= CONCATENATE(Code(A1) , Code(Mid(A1,2,1)) , Code(Mid(A1,3,1)))
In order to account for capital letters you're going to end up with a VERY long formula, especially if you have long word entries. Without VBA I would approach it this way and set up the formula once to allow for the biggest word you anticipate, and then copy it around as needed.
Formula (to expand):
=CONCATENATE(IF(EXACT(A1,UPPER(A1))=TRUE,"b","s")&CODE(A1),IF(EXACT(A1,UPPER(A1))=TRUE,"b","s")&CODE(MID(A1,2,1)),IF(EXACT(A1,UPPER(A1))=TRUE,"b","s")&CODE(MID(A1,3,1)), . . . )
You can substitute the "b" and "s" with whatever you like. I was just using those for a case check for capital versus lowercase letters (b=big, s=small) and building that into your unique code.
In order to expand this, add additional cases to account for the length of the words you are using by adding this snippet JUST inside the last parenthesis and modifying the "3" in the MID() function to account for a word length of "4", "5", "6", etc.:
IF(EXACT(A1,UPPER(A1))=TRUE,"b","s")&CODE(MID(A1,3,1))
Painful, yes, but it should work.
Cheers.

String Separate in Excel

mozilla-nss-3.11.4-0.7
gdb-10.12-1.5.2
glibc-dcc-atv-1.0.3-10.6
I want to separate each one into the next cells B, C and D:
mozilla-nss 3.11.4 0.7
gdb 10.12 1.5.2
glibc-dcc-atv 1.0.3 10.6
Right now I use the LEFT, RIGHT and FIND functions to do it, but they do not work quite well.
I use:
LEFT(B33,FIND(".",B33)-2) =B cell
RIGHT(B33,FIND(".",B33)) =C Cell
RIGHT(D33,FIND("-",D33)-1) = D Cell
The answer is not right. Can anyone help me correct my functions? Thank you.
The key point that makes this task difficult is that we need to use the LAST TWO hyphens in the string as separators and leave all the rest intact. For such cases ARRAY formulas are the best shot. My solution is below:
Name 6 columns starting A1: String | MAX "-" | 2nd MAX "-" | Str1 | Str2 | Str3
Put your values in Column A starting at A2.
B2 (MAX "-"): type the formula =MAX(IFERROR(SEARCH("-",$A2,ROW(INDIRECT("1:"&LEN($A2)))),0)) but press CTRL+SHIFT+ENTER instead of usual ENTER - this will define an ARRAY formula and will result in {} brackets around it (but do NOT type them manually!).
C2 (2nd MAX "-"): type the formula =MAX(IFERROR(SEARCH("-",$A2,ROW(INDIRECT("1:"&LEN($A2)))),0)*IF(IFERROR(SEARCH("-",$A2,ROW(INDIRECT("1:"&LEN($A2)))),0)=MAX(IFERROR(SEARCH("-",$A2,ROW(INDIRECT("1:"&LEN($A2)))),0)),0,1)) and again press CTRL+SHIFT+ENTER.
Thus we'll obtain positions of LAST TWO hyphens in the string. The rest is easy - ordinary LEFT / MID / RIGHT stuff:
D2: =LEFT($A2,$C2-1), ENTER.
E2: =MID($A2,$C2+1,$B2-$C2-1), ENTER.
F2: =RIGHT($A2,LEN($A2)-$B2), ENTER.
Autofill B:F.
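The steps above can be sketched in Python as follows, in case you want to verify the expected output before building the sheet. The function name is made up, and the sketch assumes every input contains at least two hyphens.

```python
def split_on_last_two_hyphens(text: str):
    """Split text into three parts at its last two hyphens."""
    last = text.rfind("-")             # column B: position of the last hyphen
    second = text.rfind("-", 0, last)  # column C: position of the second-to-last hyphen
    # Columns D, E, F: LEFT / MID / RIGHT around those two positions.
    return text[:second], text[second + 1:last], text[last + 1:]


for sample in ["mozilla-nss-3.11.4-0.7", "gdb-10.12-1.5.2", "glibc-dcc-atv-1.0.3-10.6"]:
    print(split_on_last_two_hyphens(sample))
# ('mozilla-nss', '3.11.4', '0.7')
# ('gdb', '10.12', '1.5.2')
# ('glibc-dcc-atv', '1.0.3', '10.6')
```

In Python itself, text.rsplit("-", 2) returns the same three pieces as a list; the explicit positions are kept here only to mirror columns B and C.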
If the temporary columns B:C are unwanted, you can replace the references to them in D:F with the B:C contents (i.e. replace $C2 in =LEFT($A2,$C2-1) with the actual formula from C2). This results in very complicated ARRAY formulas that still do their job but are difficult to understand the next day, even for their creator.
As for the above solution - perhaps it might be improved or simplified, but I'm pretty much familiar with such ROW...INDIRECT constructions from times I had to analyze megabytes of statistic data, so for me it's just as easy as create LEFT / RIGHT. Anyway, it seems to work.
For your convenience my sample file is shared: https://www.dropbox.com/s/p49x32t3a0igtby/StringHyphensSeparate.xlsx
Hope that was helpful)
ADDITION - 2 more simplified solutions to find LAST TWO hyphens (the rest of steps is the same as above):
More simple ARRAY formulas:
B2 (MAX "-"): type the formula =MAX(IF(MID($A2,ROW(INDIRECT("1:"&LEN($A2))),1)="-",ROW(INDIRECT("1:"&LEN($A2))),0)) but press CTRL+SHIFT+ENTER instead of usual ENTER - this will define an ARRAY formula and will result in {} brackets around it (but do NOT type them manually!).
C2 (2nd MAX "-"): type the formula =LARGE(IF(MID($A2,ROW(INDIRECT("1:"&LEN($A2))),1)="-",ROW(INDIRECT("1:"&LEN($A2))),0),2) and again press CTRL+SHIFT+ENTER.
Regular formulas using SUBSTITUTE function:
B2 (MAX "-"): type the formula =SEARCH("#",SUBSTITUTE($A2,"-","#",LEN($A2)-LEN(SUBSTITUTE($A2,"-","")))), ENTER.
C2 (2nd MAX "-"): type the formula =SEARCH("#",SUBSTITUTE($A2,"-","#",LEN($A2)-LEN(SUBSTITUTE($A2,"-",""))-1)), ENTER.
The key to the SUBSTITUTE solution is that it can replace only a certain instance of a match, i.e. only the 2nd or 3rd hyphen. The overall number of hyphens is determined again via a SUBSTITUTE formula: the length of the original string MINUS the length of the string with ALL hyphens replaced by empty strings: LEN($A2)-LEN(SUBSTITUTE($A2,"-","")).
One more trick here: while we should keep the original string intact, we still MAY do anything with it for intermediate solutions! Thus, we replace the hyphen with #, and then search for # in the temporary string.
All the above solutions are working, choose what you like / understand better. Hope that will also help in understanding array formulas, since for the same task there are 2 different approaches.
I updated the example file to include the last 2 examples + resulting megaformulas without intermediate steps, link is the same and located above. Good luck!
Here is a less than perfect solution:
Do a search & replace to get rid of any dashes that are not delimiters. For example, replace "mozilla-nss" with "mozillanss"
Put your values in Column A starting at A1
In B1, enter =LEFT(A1,FIND("-",A1)-1)
In C1, enter =SUBSTITUTE(A1,B1,"")
In D1, enter =SUBSTITUTE(LEFT(C1,FIND("-",C1,2)),"-","")
In E1, enter =SUBSTITUTE(SUBSTITUTE(C1,D1,""),"-","")
Fill Down the equations for all your values in Column A.
Edit: Added next line:
Replace "mozillanss" with "mozilla-nss".
Your answers are in columns B,D, and E.
