In Excel, finding if cell contains any character other than letters, a dot . , single quotation and space - excel

I have the following table, and would like to identify the cells (as HIT) that contain characters other than
letters
dot .
single quotation
Which formula can I use for this? I've tried different functions, they don't seem to work.

I think there will be a bunch of possibilities. Here is one using the logic that we will check every character in your string against all characters you'd like to exclude:
Formula in B2:
=IF(SUM(--ISERROR(SEARCH(MID(A2,SEQUENCE(LEN(A2)),1),"abcdefghijklmnopqrstuvwxyz'. "))),"Hit","No Hit")
Note: I deliberately included a space since you seems to be wanting to exclude that too.
Other options could be:
=IF(REDUCE(LOWER(A2),MID("abcdefghijklmnopqrstuvwxyz.' ",SEQUENCE(29),1),LAMBDA(a,b,SUBSTITUTE(a,b,"")))<>"","Hit","No Hit")
Or with FILTERXML():
=IF(ISERROR(FILTERXML("<t><s>"&LOWER(A2)&"</s></t>","//s[translate(., ""abcdefghijklmnopqrstuvwxyz.' "", '')!='']")),"No Hit","Hit")
Though these options are more verbose and both SUBSTITUTE() and FILTERXML() are case-sensitive whereas SEARCH() is not.

So, perhaps easier to do:
IF(MAX(IFERROR(FIND(".",A1,1),0),IFERROR(FIND("'",A1,1),0))>0,"No Hit","Hit")
I did not include a space but that is easily edited in.

Related

How to make an excel (365) function that recognizes different words in the same cell and changes them individually

What im working with
I have a list of product names, but unfortunately they are written in uppercase I now want to make only the first letter uppercase and the rest lowercase but I also want all words with 3 or less symbols to stay uppercase
im trying if functions but nothing is really working
i use the german excel version but i would be happy if someone has any idea on how to do it im trying different functions for hours but nothing is working
=IF(LENGTH(C6)<=3,UPPER(C6),UPPER(LEFT(C6,1))&LOWER(RIGHT(C6,LENGTH(C6)-1)))
but its a #NAME error excel does not recognize the first and the last bracket
This is hard! Let me explain:
I do believe there are German words in the mix that are below 4 characters in length that you should exclude. My German isn't great but there would probably be a huge deal of words below 4 characters;
There seems to be substrings that are 3+ characters in length but should probably stay uppercase, e.g. '550E/ER';
There seem to be quite a bunch of characters that could be used as delimiters to split the input into 'words'. It's hard to catch any of them without a full list;
Possible other reasons;
With the above in mind I think it's safe to say that we can try to accomplish something that you want as best as we can. Therefor I'd suggest
To split on multiple characters;
Exclude certain words from being uppercase when length < 3;
Include certain words to be uppercase when length > 3 and digits are present;
Assume 1st character could be made uppercase in any input;
For example:
Formula in B1:
=MAP(A1:A5,LAMBDA(v,LET(x,TEXTSPLIT(v,{"-","/"," ","."},,1),y,TEXTSPLIT(v,x,,1),z,TEXTJOIN(y,,MAP(x,LAMBDA(w,IF(SUM(--(w={"zu","ein","für","aus"})),LOWER(w),IF((LEN(w)<4)+SUM(IFERROR(FIND(SEQUENCE(10,,0),w),)),UPPER(w),LOWER(w)))))),UPPER(LEFT(z))&MID(z,2,LEN(v)))))
You can see how difficult it is to capture each and every possibility;
The minute you exclude a few words, another will pop-up (the 'x' between numbers for example. Which should stay upper/lower-case depending on the context it is found in);
The second you include words containing digits, you notice that some should be excluded ('00SICHERUNGS....');
If the 1st character would be a digit, the whole above solution would not change 1st alpha-char in upper;
Maybe some characters shouldn't be used as delimiters based on context? Think about hypenated words;
Possible other reasons.
Point is, this is not just hard, it's extremely hard if not impossible to do on the type of data you are currently working with! Even if one is proficient with writing a regular expression (chuck in all (non-available to Excel) tokens, quantifiers and methods if you like), I'd doubt all edge-case could be covered.
Because you are dealing with any number of words in a cell you'll need to get crafty with this one. Thankfully there is TEXTSPLIT() and TEXTJOIN() that can make short work of splitting the text into words, where we can then test the length, change the capitalization, and then join them back together all in one formula:
=TEXTJOIN(" ", TRUE, IF(LEN(TEXTSPLIT(C6," "))<=3,UPPER(TEXTSPLIT(C6," ")),PROPER(TEXTSPLIT(C6," "))))
Also used PROPER() formula as well, which only capitalizes the first character of a word.

Replace all non-alphanumeric characters, including wildcards

I take this beautiful formula from JvdV answer:
=TRIM(CONCAT(IF(ISNUMBER(SEARCH(MID(A1,ROW(A$1:INDEX(A:A,LEN(A1))),1),"-./ 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ")),MID(A1,ROW(A$1:INDEX(A:A,LEN(A1))),1)," ")))
This formula replace any non-alphanumeric character (&^%]#$) with simple space " ".
I put in formula some exception (-./ ), but this is not all exceptions.
How about wildcards? How to filter wildcards (~*?) with this formula?
I think: Ok, I will use FIND instead of SEARCH and all will be right, just put lowercase and uppercase alphabet in the FIND index, like this: *"-./ 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"*
Then I think: But, what if I want to keep not only numeric and regular alphabet? What if I want to keep all diacritics, like this: "ÁÀȦÄǍĀÃÅĄȺẤẦẮẰǠǺǞẪẴẢȀȂẨẲẠḀẬẶĂÂḂɃƁḄḆĆĊĈČÇȻḈƇƆḊĎḐĐƊḌḒḎÐƉÉÈĖÊËĚĔĒẼĘȨɆẾỀḖḔỄḜẺȄȆỂẸḘḚỆÉÈÊËḞƑǴĠĜǦĞḠĢǤƓḢĤḦȞḨĦḤḪⱧÍÌİÏǏĬĪĨĮƗḮỈȈȊỊḬÍÌÏÎȷĴǰḰǨĶƘᶄḲḴⱩꝀꝂꝄĹĿĽⱢⱠĻȽŁḶḼḺḸꝈḾṀṂŃǸṄŇÑŅƝṆṊṈÑŊÓÒȮÔÖǑŎŌÕǪŐỐỒƟØṒṐṌȪỖṎǾȬǬỎȌȎƠỔỌỚỜỠỘỞỢÓÒÔÖÕØṔṖⱣƤƦŔṘŘŖɌⱤȐȒṚṞṜŚṠŜŠṤṦṢṨŞṪŤƬṬƮṰṮȾŢŦÚÙÛÜǓŬŪŨŮŲŰɄǗǛṸṺỦȔȖƯỤṲỨỪṶṴỮỬỰÚÙÛÜṼṾẂẀẆŴẄẈẊẌÝỲẎŶŸȲỸɎỶƳỴÝŹŻẐŽƵẒẔ"
Then lowercase and uppercase alphabet is too much for FIND index.
Ok, for SEARCH index is also too much, because function accept max. 255 length, but lets say we have only 200 characters in index (numbers, alphabet and some diacritics)
So, the question is available:
How to filter (replace with space) wildcards (~*?) with this kind of formula?
As I read this question there are a few problems:
How to include over 255 characters in the 2nd parameter of SEARCH();
How to exclude literal wildcard characters in the 2nd parameter of SEARCH();
One way around the length limit is to feed SEARCH() an array of options, in this case an array of two elements of a lenght of <255:
Formula in C1:
=TRIM(CONCAT(IF(MMULT(IFERROR(SEARCH("~"&MID(A1,ROW(A$1:INDEX(A:A,LEN(A1))),1),{"ÁÀȦÄǍĀÃÅĄȺẤẦẮẰǠǺǞẪẴẢȀȂẨẲẠḀẬẶĂÂḂɃƁḄḆĆĊĈČÇȻḈƇƆḊĎḐĐƊḌḒḎÐƉÉÈĖÊËĚĔĒẼĘȨɆẾỀḖḔỄḜẺȄȆỂẸḘḚỆÉÈÊËḞƑǴĠĜǦĞḠĢǤƓḢĤḦȞḨĦḤḪⱧÍÌİÏǏĬĪĨĮƗḮỈȈȊỊḬÍÌÏÎȷĴǰḰǨĶƘᶄḲḴⱩꝀꝂꝄĹĿĽⱢⱠĻȽŁḶḼḺḸꝈḾṀṂŃǸṄŇÑŅƝṆṊṈÑŊÓÒȮÔÖǑŎŌÕǪŐỐỒƟØṒṐṌȪỖṎǾȬǬỎȌȎƠỔỌỚỜỠỘỞỢÓÒÔÖÕØṔṖⱣƤƦŔṘŘŖɌⱤ";"ȐȒṚṞṜŚṠŜŠṤṦṢṨŞṪŤƬṬƮṰṮȾŢŦÚÙÛÜǓŬŪŨŮŲŰɄǗǛṸṺỦȔȖƯỤṲỨỪṶṴỮỬỰÚÙÛÜṼṾẂẀẆŴẄẈẊẌÝỲẎŶŸȲỸɎỶƳỴÝŹŻẐŽƵẒẔ-./*? 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"}),0),{1,1}),MID(A1,ROW(A$1:INDEX(A:A,LEN(A1))),1)," ")))
What we did here is:
Use an horizontal array {abc;xyz} to check against our characters which was an vertical array {a,b,c}. Note the difference between semi-column and comma.
The result will be a 2D-array which MMULT() can sum. Meaning if the character was found in any of the two elements of the array it will return that same character. Otherwise, a space.
The special wildcard characters are now also included with an extra tilde to escape them as with actually all characters.
If Excel doesn't recognize all lowercase diacritics as their uppercase counterparts, just add them to one of the two elements. If need be, add a 3rd. But know that you'd need to extend on the 2nd parameter in MMULT() too then.
To visualize the above:
Remember, you are using Excel 2019 which means you need to CSE-enter this formula. Needles to say that all will be much easier in ms365 using its dynamic array functionality.

Excel: Find words of certain length in string?

I have this file where I want to make a conditional check for any cell that contains the letter combination "_SOL", or where the string is followed by any numeric character like "_SOL1524", and stop looking after that. So I don't want matches for "_SOLUTION" or "_SOLothercharactersthannumeric".
So when I use the following formula, I also get results for words like "_SOLUTION":
=IF(ISNUMBER(FIND("_SOL",A1))=TRUE,"Yay","")
How can I avoid this, and only get matches if the match is "_SOL" or "_SOLnumericvalue" (one numeric character)
Clarification: The whole strings may be "Blabla_SOL_BLABLA", "Blabla_SOLUTION_BLABLA" or "Blabla_SOL1524_BLABLA"
Maybe this, which will check if the character after "_SOL" is numeric.
=IF(ISNUMBER(VALUE(MID(A1,FIND("_SOL",A1)+4,1))),"Yay","")
Or, as per OP's request and suggestion, to include the possibility of an underscore after "SOL"
=IF(OR(ISNUMBER(VALUE(MID(A1,FIND("_SOL",A1)+4,1))),ISNUMBER(FIND("_SOL_",A1))),"Yay","")
Here is an alternative way to check if your string contains SOL followed by either nothing or any numeric value up to any characters after SOL:
=IF(COUNT(FILTERXML("<t><s>"&SUBSTITUTE(A1,"_","1</s><s>")&"</s></t>","//s[substring-after(.,'SOL')*0=0]")>0),"Yey","Nay")
Just to use in an unfortunate event where you would encounter SOL1TEXT for example. Or, maybe saver (in case you have text like AEROSOL):
=IF(COUNT(FILTERXML("<t><s>"&SUBSTITUTE(A1,"_","</s><s>")&"</s></t>","//s[translate(.,'1234567890','')='SOL']")>0),"Yey","Nay")
And to prevent that you have text like 123SOL123 you could even do:
=IF(COUNT(FILTERXML("<t><s>"&SUBSTITUTE(A1,"_","1</s><s>")&"</s></t>","//s[starts-with(., 'SOL') and substring(., 4)*0=0]")>0),"Yey","Nay")

How to make FIND function exact in Excel

I'm using the FIND function in Excel to check whether certain characters appear in a string of characters in a cell.
However, this function doesn't work cleanly for certain special characters. Specifically F̌,B̌, and some others. When F̌ appears in the string, FIND recognizes it as both F and F̌.
Notable that this is not the case for characters such as Ď and Č. FIND works nicely for these.
How can I get the formula to always differentiate between characters with and without the hat? Is there a way to work in EXACT?
Thank you!
It is because F̌ is actually two characters.
=LEN("F̌") returns 2 not 1. The second character is the hat.
If you do:
=UNICHAR(70)&UNICHAR(780)
It will return the F̌
And as such =FIND("F","F̌") will return 1 as it is the first letter of a two character string.
To find "F" in A,B,F̌,F use:
=AGGREGATE(15,7,ROW($ZZ1:INDEX($ZZ:$ZZ,LEN(A1)))/((MID(A1,ROW($ZZ1:INDEX($ZZ:$ZZ,LEN(A1))),1)="F")*(MID(A1,ROW($ZZ2:INDEX($ZZ:$ZZ,LEN(A1)+1)),1)<>UNICHAR(780))),1)
To find either then we need to use IF:
=IF(LEN(A2)=2,FIND(A2,A1),AGGREGATE(15,7,ROW($ZZ$1:INDEX($ZZ:$ZZ,LEN(A1)))/((MID(A1,ROW($ZZ$1:INDEX($ZZ:$ZZ,LEN(A1))),1)=A2)*(MID(A1,ROW($ZZ$2:INDEX($ZZ:$ZZ,LEN(A1)+1)),1)<>UNICHAR(780))),1))
Given that your substrings are comma-separated, look for the character followed by a comma (and add a comma to the end of the string to find the last character).
This allows you to separate multicharacter substrings from uni-character substrings where the latter is contained in the former.
You could use something like:
=FIND("F,",A5&",")
That will find an F in A5, but will not find an F if only F̌ is present

Keeping leading zeros with find and replace

I'm using Excels find and replace to remove hyphens from long numbers. They are mixed between birth dates and organisation numbers that have been filled with leading zeros to have the same number of characters. There are a LOT of numbers so find and replace seems to be the simplest solution to remove the hyphens.
But when i use find and replace the leading zeros are truncated and I have not found a solution to keep them afterwards.
For example i have:
19551230-1234
01234567-8901
and after find and replace I have
1,95512E+11
12345678901
but want the format as:
195512301234
012345678901
So I want to keep the leading zeros after find and replace. I've tried formatting the cells as text, but it doesn't work as the find and replace automatically truncates the leading zero and keeps the remaining characters, so the zero is completely removed. I am using Excel 2010, but answers for several versions are appreciated.
Just put a single quote in front of your leading number - ex. '01234 It will take the number as-is literally and the quote will not show in the field.
Use the SUBSTITUTE formula instead of Find and Replace like so:
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1," ",""),"/",""),")",""),"(",""),"-","")
The result is text.

Resources