Excel: Find words of certain length in string? - excel

I have this file where I want to make a conditional check for any cell that contains the letter combination "_SOL", or where the string is followed by any numeric character like "_SOL1524", and stop looking after that. So I don't want matches for "_SOLUTION" or "_SOLothercharactersthannumeric".
So when I use the following formula, I also get results for words like "_SOLUTION":
=IF(ISNUMBER(FIND("_SOL",A1))=TRUE,"Yay","")
How can I avoid this, and only get matches if the match is "_SOL" or "_SOLnumericvalue" (one numeric character)
Clarification: The whole strings may be "Blabla_SOL_BLABLA", "Blabla_SOLUTION_BLABLA" or "Blabla_SOL1524_BLABLA"

Maybe this, which will check if the character after "_SOL" is numeric.
=IF(ISNUMBER(VALUE(MID(A1,FIND("_SOL",A1)+4,1))),"Yay","")
Or, as per OP's request and suggestion, to include the possibility of an underscore after "SOL"
=IF(OR(ISNUMBER(VALUE(MID(A1,FIND("_SOL",A1)+4,1))),ISNUMBER(FIND("_SOL_",A1))),"Yay","")

Here is an alternative way to check if your string contains SOL followed by either nothing or any numeric value up to any characters after SOL:
=IF(COUNT(FILTERXML("<t><s>"&SUBSTITUTE(A1,"_","1</s><s>")&"</s></t>","//s[substring-after(.,'SOL')*0=0]")>0),"Yey","Nay")
Just to use in an unfortunate event where you would encounter SOL1TEXT for example. Or, maybe saver (in case you have text like AEROSOL):
=IF(COUNT(FILTERXML("<t><s>"&SUBSTITUTE(A1,"_","</s><s>")&"</s></t>","//s[translate(.,'1234567890','')='SOL']")>0),"Yey","Nay")
And to prevent that you have text like 123SOL123 you could even do:
=IF(COUNT(FILTERXML("<t><s>"&SUBSTITUTE(A1,"_","1</s><s>")&"</s></t>","//s[starts-with(., 'SOL') and substring(., 4)*0=0]")>0),"Yey","Nay")

Related

How to make an excel (365) function that recognizes different words in the same cell and changes them individually

What im working with
I have a list of product names, but unfortunately they are written in uppercase I now want to make only the first letter uppercase and the rest lowercase but I also want all words with 3 or less symbols to stay uppercase
im trying if functions but nothing is really working
i use the german excel version but i would be happy if someone has any idea on how to do it im trying different functions for hours but nothing is working
=IF(LENGTH(C6)<=3,UPPER(C6),UPPER(LEFT(C6,1))&LOWER(RIGHT(C6,LENGTH(C6)-1)))
but its a #NAME error excel does not recognize the first and the last bracket
This is hard! Let me explain:
I do believe there are German words in the mix that are below 4 characters in length that you should exclude. My German isn't great but there would probably be a huge deal of words below 4 characters;
There seems to be substrings that are 3+ characters in length but should probably stay uppercase, e.g. '550E/ER';
There seem to be quite a bunch of characters that could be used as delimiters to split the input into 'words'. It's hard to catch any of them without a full list;
Possible other reasons;
With the above in mind I think it's safe to say that we can try to accomplish something that you want as best as we can. Therefor I'd suggest
To split on multiple characters;
Exclude certain words from being uppercase when length < 3;
Include certain words to be uppercase when length > 3 and digits are present;
Assume 1st character could be made uppercase in any input;
For example:
Formula in B1:
=MAP(A1:A5,LAMBDA(v,LET(x,TEXTSPLIT(v,{"-","/"," ","."},,1),y,TEXTSPLIT(v,x,,1),z,TEXTJOIN(y,,MAP(x,LAMBDA(w,IF(SUM(--(w={"zu","ein","für","aus"})),LOWER(w),IF((LEN(w)<4)+SUM(IFERROR(FIND(SEQUENCE(10,,0),w),)),UPPER(w),LOWER(w)))))),UPPER(LEFT(z))&MID(z,2,LEN(v)))))
You can see how difficult it is to capture each and every possibility;
The minute you exclude a few words, another will pop-up (the 'x' between numbers for example. Which should stay upper/lower-case depending on the context it is found in);
The second you include words containing digits, you notice that some should be excluded ('00SICHERUNGS....');
If the 1st character would be a digit, the whole above solution would not change 1st alpha-char in upper;
Maybe some characters shouldn't be used as delimiters based on context? Think about hypenated words;
Possible other reasons.
Point is, this is not just hard, it's extremely hard if not impossible to do on the type of data you are currently working with! Even if one is proficient with writing a regular expression (chuck in all (non-available to Excel) tokens, quantifiers and methods if you like), I'd doubt all edge-case could be covered.
Because you are dealing with any number of words in a cell you'll need to get crafty with this one. Thankfully there is TEXTSPLIT() and TEXTJOIN() that can make short work of splitting the text into words, where we can then test the length, change the capitalization, and then join them back together all in one formula:
=TEXTJOIN(" ", TRUE, IF(LEN(TEXTSPLIT(C6," "))<=3,UPPER(TEXTSPLIT(C6," ")),PROPER(TEXTSPLIT(C6," "))))
Also used PROPER() formula as well, which only capitalizes the first character of a word.

How to make FIND function exact in Excel

I'm using the FIND function in Excel to check whether certain characters appear in a string of characters in a cell.
However, this function doesn't work cleanly for certain special characters. Specifically F̌,B̌, and some others. When F̌ appears in the string, FIND recognizes it as both F and F̌.
Notable that this is not the case for characters such as Ď and Č. FIND works nicely for these.
How can I get the formula to always differentiate between characters with and without the hat? Is there a way to work in EXACT?
Thank you!
It is because F̌ is actually two characters.
=LEN("F̌") returns 2 not 1. The second character is the hat.
If you do:
=UNICHAR(70)&UNICHAR(780)
It will return the F̌
And as such =FIND("F","F̌") will return 1 as it is the first letter of a two character string.
To find "F" in A,B,F̌,F use:
=AGGREGATE(15,7,ROW($ZZ1:INDEX($ZZ:$ZZ,LEN(A1)))/((MID(A1,ROW($ZZ1:INDEX($ZZ:$ZZ,LEN(A1))),1)="F")*(MID(A1,ROW($ZZ2:INDEX($ZZ:$ZZ,LEN(A1)+1)),1)<>UNICHAR(780))),1)
To find either then we need to use IF:
=IF(LEN(A2)=2,FIND(A2,A1),AGGREGATE(15,7,ROW($ZZ$1:INDEX($ZZ:$ZZ,LEN(A1)))/((MID(A1,ROW($ZZ$1:INDEX($ZZ:$ZZ,LEN(A1))),1)=A2)*(MID(A1,ROW($ZZ$2:INDEX($ZZ:$ZZ,LEN(A1)+1)),1)<>UNICHAR(780))),1))
Given that your substrings are comma-separated, look for the character followed by a comma (and add a comma to the end of the string to find the last character).
This allows you to separate multicharacter substrings from uni-character substrings where the latter is contained in the former.
You could use something like:
=FIND("F,",A5&",")
That will find an F in A5, but will not find an F if only F̌ is present

Is there a way to use the Find and Replace function in Excel to replace the first and last entry in a string of text?

I have a sequence of a letters, in my case part of a gene. I want to change the first and the last letter in this string of text, but keep the internal characters the same.
For example, if I have the the sequence:
ATCGAATCCATGACG
And I want to change the first letter, in this case A to the word START and change the last letter, G, to STOP all while keeping the internal A's and G's the same. Is this possible to do with the Find and Replace function, or will I have to write a script?
It is easy to do when I have a handful of sequences, I do it by hand. When I get into the hundreds, it can be very difficult.
Thank you.
The function LEN(text) returns the number of characters within a string of letters. MID(text, start, num_chars) returns the middle section of a string. CONCATENATE(text1, text2, ...) pieces together different strings. We can use these in combination to get what you want:
=CONCATENATE("START", MID(A1,2,LEN(A1)-2), "STOP")
You could use replace, and focus on the left and right side independently, then combine, or you can use left/right to add string of text to the available string minus a character, like:
="START"&LEFT(RIGHT(A1,LEN(A1)-1),LEN(A1)-2)&"STOP"
I used left/right, but mid would also work
just another option:
=REPLACE(REPLACE(A1,LEN(A1),1,"STOP"),1,1,"START")

Attempting to extract one of two possible words from a text string if matches a criteria, or input the string in it's entirety

I'm trying to bring back one of two possible words (Local or National) from a text string, and if neither of these words are in the text string, then bring back the string in the whole cell
The issue I have is that I can bring back either word when they appear, but I get an error when they do not
I'm currently using
=IFERROR(IF(SEARCH("*local*",B2,1),"Local"),IF(SEARCH("*national*",B2,1),"National"))
However this obviously this doesn't bring back if the words not exist
I'm sure it's easy and I'm missing something, but I just cannot figure it out. any help would be great
Cheers all
You can use:
Formula in B1:
=IF(ISNUMBER(SEARCH("*local*",A1)),"Local",IF(ISNUMBER(SEARCH("*national*",A1)),"national",A1))
Drag down
Note:
Notice that your wildcards make it that even a string with 'international' in it will return 'national'. If this is not what you want, you should remove the wildcards.
You can also use INDEX/AGGREGATE:
=IFERROR(INDEX({"local","national"},AGGREGATE(15,7,ROW($1:$2)/(ISNUMBER(SEARCH({"local";"national"},A1))),1)),A1)
This will allow one to replace the both hard coded arrays with a range of cells that contain the outputs. If Local and National were in D1:D2 then you can use:
=IFERROR(INDEX($D:$D,AGGREGATE(15,7,ROW($D$1:$D$2)/(ISNUMBER(SEARCH($D$1:$D$2,A1))),1)),A1)
That way if the list gets bigger the formula does not.
I'd suggest regular expressions.
=IF(REGEXMATCH(A2,"(Local|National)"),REGEXEXTRACT(A2,"(Local|National)"),A2)

Unable to keep a certain portion from a text and kick out the rest

Is it possible to create any formula in excel to kick out any certain portion from a string and keep the rest? If I consider this a string Utopia(UTP), my expected output is UTP. To be specific: I would like to grab the bracketed portion and strip the rest.
These are the texts I would like to apply formula on:
Utopia(UTP)
Euphoria(EUPR)
Ecstasy(ECST)
The output I wish to have:
UTP
EUPR
ECST
FIND() will identify where a particular character first appears in a string. You can use this to find the parentheses in your text strings. Then plug those numbers into MID to extract the strings you want:
=MID(A2,FIND("(",A2)+1,FIND(")",A2)-FIND("(",A2)-1)

Resources