How do I find the most common string in a column by replacing? - excel-formula

Suppose I have a column of wind directions ("N","S","W","E"). Each cell contains only one letter. To find the most common wind direction, =CHAR(MODE(CODE(range))) will do the job.
But if I handle wind directions like "SW","NE", the above function would not work. I know that =INDEX(range, MODE(MATCH(range, range, 0 ))) will work.
Just curious: similar to the first function, is there a way to substitute strings with numbers of my choice while passing the column into MODE(), so that it returns a number that I can then feed to MATCH() and INDEX() to get the result?
To clarify: I would like to substitute "N" with 0, "NE" with 45, "SW" with 225 and so on, so that MODE() becomes applicable. If needed, I can then use functions like INDEX(MATCH()) to return the letter representation of the wind direction.

=SWITCH(MODE(SWITCH(A:A,"N",0,"NE",45,"E",90,"SE",135,"S",180,"SW",225,"W",270,"NW",315,"")),0,"N",45,"NE",90,"E",135,"SE",180,"S",225,"SW",270,"W",315,"NW","")
Or similar:
=CHOOSE(1+MODE(SWITCH(A:A,"N",0,"NE",45,"E",90,"SE",135,"S",180,"SW",225,"W",270,"NW",315,""))/45,"N","NE","E","SE","S","SW","W","NW")
This first translates the strings to values, then translates the MODE result back to its string.
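For anyone who wants to check the idea outside Excel, here is a minimal Python sketch of the same round trip (label to degrees, mode, degrees back to label). The function name most_common_direction is made up for illustration, and statistics.mode only stands in for Excel's MODE; tie-breaking may differ between the two.

```python
from statistics import mode

# Degree codes taken from the SWITCH formula above.
DEGREES = {"N": 0, "NE": 45, "E": 90, "SE": 135,
           "S": 180, "SW": 225, "W": 270, "NW": 315}
LABELS = {deg: name for name, deg in DEGREES.items()}


def most_common_direction(cells):
    """Map labels to degrees, take the mode, and map the result back."""
    # Blank or unknown cells are skipped, much like the "" default in SWITCH.
    codes = [DEGREES[c] for c in cells if c in DEGREES]
    return LABELS[mode(codes)]


print(most_common_direction(["N", "SW", "NE", "SW", "", "N", "SW"]))
```

The two dictionaries play the role of the two SWITCH calls: one encodes on the way in, the other decodes on the way out.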

Related

How do I find the last number in a string with excel formulas

I'm parsing strings in excel, and I need to return everything through the last number. For example:
Input: A00XX
Output: A00
In my case, I know the last number will be between index 3 and 5, so I'm brute-forcing it with:
=LEFT([#Point],
IF(SUM((MID([#Point],5,1)={"0","1","2","3","4","5","6","7","8","9"})+0),5,
IF(SUM((MID([#Point],4,1)={"0","1","2","3","4","5","6","7","8","9"})+0),4,
IF(SUM((MID([#Point],3,1)={"0","1","2","3","4","5","6","7","8","9"})+0),3,
))))
Unfortunately, I've run into some edge cases where the numbers extended beyond index 5. Is there a generic way to find the last number in a string using excel formulas?
Note:
I've tried =MAX(SEARCH(... but it returns the index of the first number, not the last.
As a starting point: if we know the position of the last number, we can use LEFT to get the string to that point. Suppose that the position is 5:
=LEFT(A1, 5)
But, we don't know the position of the last number. Now, what if the only valid number was 0, and it only appeared once: then we could use FIND to locate the position of the number:
=LEFT(A1, FIND(0, A1))
But, we have more than one valid number. Suppose that we had all the numbers from 0 through 9, but each number could only appear once — then we could use MAX on a FIND array, to tell us which of the numbers is the last one:
=LEFT(A1, MAX(FIND({0,1,2,3,4,5,6,7,8,9}, A1)))
Unfortunately, FIND will throw a #VALUE! error if any number doesn't appear, which will then make MAX return the same error. So, we need to fix that with IFERROR:
=LEFT(A1, MAX(IFERROR(FIND({0,1,2,3,4,5,6,7,8,9}, A1), 0)))
However, numbers can appear more than once. As such, we need a method to find the last occurrence of a value in a string (since FIND and SEARCH will, by default, return the first occurrence).
The SUBSTITUTE function has 3 mandatory arguments — Initial String, Value to be Replaced, Value to Replace with — and one Optional argument — the occurrence to replace. Normally, this is omitted, so that all occurrences are replaced. But, if we know how many times a character appears in a string, then we can replace just the last instance with a special/uncommon sub-string to search for.
To count how many times a character appears in a String, just start with the length of the String, then subtract the length when you SUBSTITUTE all copies of that character for Nothing:
=LEN(A1) - LEN(SUBSTITUTE(A1, 0, ""))
This means we can now replace the last occurrence of the character with, for example, ">¦<", and then FIND that:
=FIND(">¦<", SUBSTITUTE(A1, 0, ">¦<", LEN(A1) - LEN(SUBSTITUTE(A1, 0, ""))))
Of course, we want to do this for all the numbers from 0 to 9, and take the MAX value (remembering our IFERROR), so we need to put the Array of values back in:
=MAX(IFERROR(FIND(">¦<", SUBSTITUTE(A1, {0,1,2,3,4,5,6,7,8,9}, ">¦<", LEN(A1) - LEN(SUBSTITUTE(A1, {0,1,2,3,4,5,6,7,8,9}, "")))), 0))
Then, we plug that all back into our initial LEFT function:
=LEFT(A1, MAX(IFERROR(FIND(">¦<", SUBSTITUTE(A1, {0,1,2,3,4,5,6,7,8,9}, ">¦<", LEN(A1) - LEN(SUBSTITUTE(A1, {0,1,2,3,4,5,6,7,8,9}, "")))), 0)))
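To sanity-check the steps above outside Excel, here is a rough Python sketch of the same logic: count each digit, mark its last occurrence with the placeholder, find the placeholder, and take the largest position. The helper names substitute_nth and through_last_digit are hypothetical, and the sketch assumes the placeholder never appears in the input.

```python
MARKER = ">¦<"  # same placeholder as in the formula; assumed absent from the input


def substitute_nth(text, old, new, n):
    """Replace only the n-th occurrence (1-based) of old, like SUBSTITUTE's 4th argument."""
    pos = -1
    for _ in range(n):
        pos = text.find(old, pos + 1)
        if pos == -1:
            return text  # fewer than n occurrences: nothing to replace
    return text[:pos] + new + text[pos + len(old):]


def through_last_digit(text):
    """Return everything up to and including the last digit in text."""
    best = 0
    for digit in "0123456789":
        # LEN(A1) - LEN(SUBSTITUTE(A1, digit, "")) counts the occurrences.
        count = len(text) - len(text.replace(digit, ""))
        if count == 0:
            continue  # plays the role of IFERROR(..., 0)
        marked = substitute_nth(text, digit, MARKER, count)
        best = max(best, marked.find(MARKER) + 1)  # +1: Excel positions are 1-based
    return text[:best]


print(through_last_digit("A00XX"))
```

A string with no digits at all gives an empty result, which matches LEFT(A1, 0) in the formula.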
An alternative, assuming that the length of the string in question will never be more than 9 characters (which seems a safe assumption based on your description):
=LEFT(A1,MATCH(0,0+ISERR(0+MID(A1,{1;2;3;4;5;6;7;8;9},1))))
This, depending on your version of Excel, may or may not require committing with CTRL+SHIFT+ENTER.
Note also that the separator within the array constant {1;2;3;4;5;6;7;8;9} is the semicolon, which, for English-language versions of Excel, represents the row-separator. This may require amending if you are using a non-English-language version.
Of course, we can replace this static constant with a dynamic construction. However, since we are already making the assumption that 9 is an upper limit on the number of characters for the string in question, this would not seem to be necessary.
If you have a version of Excel that supports LET and SEQUENCE (Excel 365 or Excel 2021), you can try something like:
=LEFT(D1,
LET(x, SEQUENCE(LEN(D1)),
MAX(IF(ISNUMBER(NUMBERVALUE(MID(D1, x, 1))), x))))

Excel COUNTIF match variations of target: LET solution?

This is a slightly more complicated issue than a simple =COUNTIF(rng,"*"&value&"*"), as found here.
I have a 2D array with cells containing data such as:
abc
def
abc def
ghi
abc,def,ghi
abcdef
ghi; def
..... and several other variations of this. I am trying to count exact matches of "abc", but I want the count to be inclusive of cells containing "abc def" and other like variations, however I can't just use the above simple COUNTIF formula since "abcdef" is not an acceptable match. The target string must stand alone or be separated from other text by an acceptable character in chars.
I think I've got this one 90% done, but the bit I need help with is combining all the possible acceptable variations of a target "name" into a flat range that I can then check my data source against for the COUNTIF. I've tried INDEX(r_1:r_8,idxRow,idxCol) and other familiar solutions that work on the sheet when referencing other ranges, but I'm new to using the =LET function. All of this works well when broken out into separate components on my spreadsheet, but I'm looking for a cleaner solution with =LET. See below for current formula:
=LET(rg, DataTable[[Q14_1]:[Q14_9]],
name, AU38,
chars, {" ",",",";"},
r, 8,
r_1, CONCATENATE(name,chars),
r_2, CONCATENATE(chars,name),
r_3, CONCATENATE(chars,name,chars),
r_4, CONCATENATE(name,chars,"*"),
r_5, CONCATENATE("*",chars,name),
r_6, CONCATENATE(chars,name,chars,"*"),
r_7, CONCATENATE("*",chars,name,chars),
r_8, CONCATENATE("*",chars,name,chars,"*"),
c, COUNTA(chars),
mSeq, SEQUENCE(r*c),
idxRow, 1+MOD(mSeq,r),
idxCol, INT((SEQUENCE(r*c)-1)/r)+1,
X, INDEX(**NeedHelpHere**,idxRow,idxCol),
SUM(COUNTIF(rg,name),COUNTIF(rg,X))
)
Try the formula below. If you have more delimiters than space and comma, nest an additional SUBSTITUTE() for each one.
=LET(x,FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(A1:A7," ","</s><s>"),",","</s><s>")&"</s></t>","//s"),y,FILTER(x,x="abc"),SUM(--(y<>"")))
To learn about FILTERXML() please read this article from JvdV.
I've thought about this again and am posting a solution that fits my needs.
I don't need to index a single column of potential matches to then COUNTIF, I can just COUNTIF multiple times. Additionally, I was not taking into account different combinations of chars, I was only searching for the same chars on either side of the target (e.g. ",abc," when I should have also been looking for ",abc;"). Transposing the chars array on one side is a simple way of fixing this. It also turns out that "*"&target&"*" searches for "*target*" AND "target" (duh!), so I simplified further, removing duplicative possibilities.
My final formula is below, which counts the number of times target (by itself or surrounded by any acceptable chars) is present in a given rng:
=LET(rng, DataTable[[Q14_1]:[Q14_9]],
name, $A6,
chars, {" " , "," , ";"},
r_1, CONCATENATE(name,chars,"*"),
r_2, CONCATENATE("*",chars,name),
r_3, CONCATENATE("*",chars,name,TRANSPOSE(chars),"*"),
SUM(COUNTIF(rng,name),COUNTIF(rng,r_1),COUNTIF(rng,r_2),COUNTIF(rng,r_3))
)
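As a cross-check of what the four COUNTIF patterns are meant to count, here is a small Python sketch that splits each cell on the same delimiters and looks for the target among the resulting tokens. The function name count_cells_with_token is made up for illustration. The sample data is the list from the question, and the case-insensitive comparison is there because COUNTIF ignores case.

```python
import re

# Same delimiters as the chars array in the formula: space, comma, semicolon.
SPLITTER = re.compile(r"[ ,;]")


def count_cells_with_token(cells, name):
    """Count cells where name stands alone or is bounded by a delimiter."""
    target = name.casefold()  # COUNTIF ignores case, so this sketch does too
    return sum(
        1
        for cell in cells
        if target in (tok.casefold() for tok in SPLITTER.split(cell))
    )


data = ["abc", "def", "abc def", "ghi", "abc,def,ghi", "abcdef", "ghi; def"]
print(count_cells_with_token(data, "abc"))
```

Note that "abcdef" is not counted, which is the behaviour the plain "*"&value&"*" pattern could not provide.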

Checking if a list starts / ends with any one of many strings in a second list

In excel, what's the best way to check if a list of strings in a column start or end with another list of strings?
Example:
First List:
Reddy
CodeRed
Zabby
KaBlueY
Second List: Red, Blue, Blop, Blurp
The solution should return:
Reddy - TRUE (because it contains 'red' from the second list in the start or end position)
CodeRed - TRUE (because it contains 'red' from the second list in the start or end position)
Zabby - FALSE (because it does not contain any strings from the second list in the start or end position)
KaBlueY - FALSE (because it does not contain any strings from the second list in the start or end position)
Edited Answer:
The question has been changed, and so should my answer to make it current:
If the second argument of SEARCH function is a range, you may use this formula.
Solution #4: Check if any of the strings in a range contain a substring
=OR(ISNUMBER(SEARCH("red",range)))*1
Result: 1
=OR(ISNUMBER(SEARCH("redX",range)))*1
Result: 0
Same explanation as Solution #1 below, but instead of searching for a substring in a single "parent" string, it searches multiple strings. This requires an array formula, which (in versions of Excel without dynamic arrays) must be entered by pressing CTRL+SHIFT+ENTER. Check for {} around your formula in the formula bar to make sure that you created an array formula.
Since an array formula produces multiple results, wrap it with the OR function to see if any of the strings in the range contain the string that you're looking for. Finally, multiply it by 1 to convert the resulting boolean value to its numerical value.
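Note that Solution #4 tests for containment anywhere in the string, while the question asks about the start or end position only. A short Python sketch of both checks shows the difference on the question's own data. The function names are hypothetical, and the case-insensitive matching is an assumption based on SEARCH ignoring case.

```python
def any_contains(cells, needle):
    """Rough mirror of OR(ISNUMBER(SEARCH(needle, range))): containment, ignoring case."""
    return any(needle.lower() in cell.lower() for cell in cells)


def starts_or_ends_with_any(cell, needles):
    """True if cell starts or ends with any of the needles, ignoring case."""
    low = cell.lower()
    return any(low.startswith(n.lower()) or low.endswith(n.lower()) for n in needles)


first_list = ["Reddy", "CodeRed", "Zabby", "KaBlueY"]
second_list = ["Red", "Blue", "Blop", "Blurp"]

for cell in first_list:
    print(cell, starts_or_ends_with_any(cell, second_list))
```

"KaBlueY" contains "Blue" but neither starts nor ends with it, so a plain containment test would report it as a match while the start/end test does not.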
Hope it helps!
Original Answer:
There are a number of ways to do this. You may use the SEARCH or SUBSTITUTE function to parse the string, in combination with other functions, such as those that return boolean values, to check the expected result against the return value of the former. Finally, to convert a boolean value into its numerical value, simply multiply it by 1.
Here are some examples to get you started:
Solution #1
=ISNUMBER(SEARCH("red","reddy"))*1
Result: 1
=ISNUMBER(SEARCH("redX","reddy"))*1
Result: 0
If it is able to find the substring red within the "parent" string reddy, the SEARCH function returns a number—the position of the substring you're looking for. Otherwise, like in the case of redX, it returns #VALUE!. To hide the ugly #VALUE! error message as well as to show a more appropriate message than simply showing the position of the substring, wrap it with ISNUMBER function to return TRUE or FALSE. And if you'd like to convert it to its numerical values, multiply it by 1.
Solution #2
=IF(SUBSTITUTE("reddy","red","anyText")="reddy",FALSE,TRUE)*1
Result: 1
=IF(SUBSTITUTE("reddy","redX","anyText")="reddy",FALSE,TRUE)*1
Result: 0
Here, the resulting string from the SUBSTITUTE function is compared against the "parent" string by wrapping it with IF function, which in turn returns a boolean value that can be converted into a numerical value by multiplying it by 1.
Solution #3
=NOT(EXACT("reddy",SUBSTITUTE("reddy","red","anyText")))*1
Result: 1
=NOT(EXACT("reddy",SUBSTITUTE("reddy","redX","anyText")))*1
Result: 0
This is just a variation of the formula presented in Solution #2. It uses the EXACT function to check whether the resulting string from the SUBSTITUTE function is exactly the same as the old string (which I called the "parent" string in Solution #1). If it is exactly the same, nothing was substituted, because the string you're looking for wasn't found. Since EXACT returns TRUE in that case, you need to reverse the result by wrapping it with the NOT function. Again, if you'd like to convert it to its numerical form, simply multiply it by 1.
Hope it helps!

Excel get substring from nth position up to the end of string

How do I get the substring from the nth position up to the end of a string?
Input at cell A1: "Name: Thomas B."
Expected output: Thomas B.
I know some way to do it but I wonder if there are other elegant ways than them? (some kind of =RIGHT(A1, -6)....)
=MID(A1, 6, 999999) //999999 looks not so good
=MID(A1, 6, LEN(A1) - 5) //must calculate 2 times, first get len, then get substring, seems too much works?
REPLACE
As Dominique already wrote:
'Why don't you just replace the first six characters by an empty string?'
=REPLACE(A1,1,6,"")
I've done some timing, but the difference is less than a second at 50000 records (for LEFT, MID, REPLACE & SUBSTITUTE). So I'm afraid ELEGANCE is all you're going to get.
A Small Study
I put this study together because, when you say "from the n-th character", the n-th character in your example is the 7th (the MID formulas in your question start one character too early), whereas what you actually want is to remove the first n-1 (6) characters. Depending on how you formulate the problem, you will reach for RIGHT or MID in different ways, and you may or may not remember REPLACE and SUBSTITUTE.
Small Study Formulas for A1 (*) and B1 (#, ?, *)
Get String From N-th Character to the End, e.g. 7
=RIGHT(A1,LEN(A1)-(B1-1))
=RIGHT(A1,LEN(A1)-B1+1)
=RIGHT(A1,LEN(A1)-6)
=MID(A1,B1,LEN(A1)-(B1-1))
=MID(A1,B1,LEN(A1)-B1+1)
=MID(A1,B1,LEN(A1))
=MID(A1,7,LEN(A1)-6)
=MID(A1,7,LEN(A1))
Remove N First Characters of a String, e.g. 6
=RIGHT(A1,LEN(A1)-B1)
=RIGHT(A1,LEN(A1)-6)
=MID(A1,B1+1,LEN(A1)-B1)
=MID(A1,B1+1,LEN(A1))
=MID(A1,7,LEN(A1)-6)
=MID(A1,7,LEN(A1))
Get String After a Character e.g. " "
=RIGHT(A1,LEN(A1)-(FIND(B1,A1)))
=RIGHT(A1,LEN(A1)-(FIND(" ",A1)))
=MID(A1,FIND(B1,A1)+1,LEN(A1)-FIND(B1,A1))
=MID(A1,FIND(B1,A1)+1,LEN(A1))
=MID(A1,FIND(" ",A1)+1,LEN(A1)-FIND(" ",A1))
=MID(A1,FIND(" ",A1)+1,LEN(A1))
Get String After a String e.g. ": "
=RIGHT(A1,LEN(A1)-(FIND(B1,A1)+LEN(B1))+1)
=RIGHT(A1,LEN(A1)-FIND(B1,A1)-LEN(B1)+1)
=RIGHT(A1,LEN(A1)-FIND(": ",A1)-LEN(": ")+1)
=MID(A1,FIND(B1,A1)+LEN(B1),LEN(A1)-(FIND(B1,A1)+LEN(B1))+1)
=MID(A1,FIND(B1,A1)+LEN(B1),LEN(A1)-FIND(B1,A1)-LEN(B1)+1)
=MID(A1,FIND(B1,A1)+LEN(B1),LEN(A1))
=MID(A1,FIND(": ",A1)+LEN(": "),LEN(A1)-FIND(": ",A1)-LEN(": ")+1)
=MID(A1,FIND(": ",A1)+LEN(": "),LEN(A1))
Back to Remove N First Characters of a String, e.g. 6
=SUBSTITUTE(A1,LEFT(A1,6),"",1)
=REPLACE(A1,1,6,"")
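To see the off-by-one concretely, here is a small Python sketch with hypothetical helpers mid, right and replace that imitate the 1-based behaviour of the Excel functions of the same name, applied to the sample input from the question.

```python
def mid(text, start, num_chars):
    """Like MID: start is 1-based; asking for too many characters is fine."""
    return text[start - 1:start - 1 + num_chars]


def right(text, num_chars):
    """Like RIGHT: the last num_chars characters."""
    return text[len(text) - num_chars:] if num_chars > 0 else ""


def replace(text, start, num_chars, new_text):
    """Like REPLACE: swap num_chars characters from 1-based start for new_text."""
    return text[:start - 1] + new_text + text[start - 1 + num_chars:]


a1 = "Name: Thomas B."

print(repr(right(a1, len(a1) - 6)))   # remove the first 6 characters
print(repr(mid(a1, 7, len(a1))))      # start at the 7th character
print(repr(replace(a1, 1, 6, "")))    # blank out the first 6 characters
print(repr(mid(a1, 6, 999999)))       # the question's version keeps a leading space
```

The first three calls agree with each other, while starting MID at position 6 includes the space after the colon.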
Well, both of your methods already work, but you could also use this one:
=RIGHT(A1,LEN(A1)-6)
(you nearly had this one in your own question)
or this one:
=TRIM(MID(A1,FIND(":",A1)+1,100))
(the FIND() function returns the numeric position of a search string, so is great for doing dynamic substrings)
Why don't you just replace the first six characters by an empty string?
=SUBSTITUTE(A1;LEFT(A1;6);"";1)
Another possibility is that you create a constant with the value 2^31-1 (=2147483647), which is the maximum signed integer value on 32-bit systems, and you give it a nice name, like MaxInt, then your first formula will be efficient and nice looking, too:
=MID(A1, 6, MaxInt)
You can add the Name with Ctrl+F3. If you are interested in fast calculations, giving it as 2147483647 rather than 2^31-1 may have some (very little) advantage.

RFM Segmentation Excel Take Mid Value

I am trying to segment customers with RFM segmentation, scoring each of the columns R, F and M from 1 to 5. After I combine the three columns, there are many possible codes, such as 151, 555 or 254.
Code 555 is the Best Customer
and X5X is the Loyal Customer, where "X" stands for any digit; e.g. code 454 also falls in the Loyal Customer segment.
The problem is that I cannot get the IF function in Excel to work correctly. Here is my attempt for 555:
=IF(O14="555","Best Customer",IF(MID(O14,2,1)="5","Loyal Customer"))
The conditions overlap, and it seems to take the last IF, so the result for 555 is Loyal Customer when it should be Best Customer. There are more segments, such as XX5 for the big spenders, but since the conditions overlap I cannot continue with the rest. Thank you for your help.
If you only want 1 result per number, then you just need to put the highest priority first. The first IF that is TRUE will be the one that is used. From your example, it looks like "455" would be both a Loyal Customer and a Big Spender. We can't tell from your explanation what the result should be in this case. But whichever is the higher priority should just come earlier in your nested IF statements.
Your formula looks correct that 555 should return "Best Customer". If it is returning Loyal Customer, then it seems like you've got 555 stored as a number rather than text in O14. If it is a number, your formula should instead be:
=IF(O14=555,"Best Customer",IF(MID(O14,2,1)="5","Loyal Customer"))
The only difference is removing the quotes around 555. If O14 is stored as a number, then O14="555" is comparing a number to a string (three "5" characters rather than the number 555), which will always return FALSE, hence it moves on to the next IF statement. To get a TRUE result at the start, you need to compare O14 to the number 555 instead.
You may then be confused about why the 2nd part of the formula works. This is because the MID function will accept a number as input and then force a type conversion.
When you use the = operator, Excel can only compare like values. Meaning it can compare strings to strings or numbers to numbers, but not strings to numbers.
However, the MID function will accept either strings or numbers. When it is given a number, it will first convert it to a string and then output a string.
If it is given MID(555,2,1), it first changes 555 to "555" and gives the same result as MID("555",2,1), which is the character "5" rather than the number 5.
So, even if O14 has the number 555, MID(O14,2,1) will return the character "5" and the comparison MID(O14,2,1)="5" will return TRUE.
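The same behaviour can be imitated in a few lines of Python. This is only a rough model: excel_equals, mid and segment are hypothetical helpers, and segment converts the code to text before comparing, which is a different fix from the one given above (comparing against the number 555).

```python
def excel_equals(a, b):
    """Rough model of Excel's = for this case: a number never equals a text value."""
    if isinstance(a, str) != isinstance(b, str):
        return False
    if isinstance(a, str):
        return a.casefold() == b.casefold()  # Excel's = ignores case for text
    return a == b


def mid(value, start, num_chars):
    """Like MID: a numeric input is converted to text first; the result is text."""
    text = str(value)
    return text[start - 1:start - 1 + num_chars]


def segment(code):
    """Check the highest-priority segment first, on the text form of the code."""
    text = str(code)  # assumption: normalise to text so storage type no longer matters
    if text == "555":
        return "Best Customer"
    if mid(text, 2, 1) == "5":
        return "Loyal Customer"
    return False  # like an IF with no value_if_false


print(excel_equals(555, "555"), mid(555, 2, 1), segment(555), segment("454"))
```

Because the "555" test comes first, a code that satisfies both conditions lands in the higher-priority segment.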

Resources