Sorting strings with numbers in Excel - excel

I have a table with strings like this:
abc-1.2.3-1
abc-1.2.11-3
abc-1.11.3-2
abc-1.2.11-21
abcd-12345-7
abcd-12345-21
abc-def-1-3
What I know for sure is that each string starts with a name (of unknown length: it can be 1 character or 15, and it can itself contain hyphens), followed by a hyphen and then a version number.
As you have probably noticed, the "version numbers" have different structures: one looks like 12345-7 and the next line can be 1.2.3-3. Basically, I know it is a number with optional dots between the digits (the 1.2.3 part) which ends with a hyphen and a number (e.g. -23).
If I simply sort it, then I have something like this:
abc-1.11.3-2
abc-1.2.11-21
abc-1.2.11-3
abc-1.2.3-1
abcd-12345-21
abcd-12345-7
abc-def-1-3
What I'd like would be:
abc-1.2.3-1
abc-1.2.11-3
abc-1.2.11-21
abc-1.11.3-2
abcd-12345-7
abcd-12345-21
abc-def-1-3
Sort it alphabetically, but whenever you find a number, treat it like a number.
I've seen solutions using functions such as LEN/MID/RIGHT, but I don't know how to apply them to my case.
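Not an Excel formula, but the ordering rule described above can be sketched in Python to make it concrete. This is only a sketch under assumptions: every string is taken to be name, hyphen, dotted version, hyphen, build number, and hyphens inside the name are ignored when comparing names (that is what puts abc-def after abcd in the desired output). The names PATTERN and natural_key are made up for the illustration.

```python
import re

# Assumed layout: <name>-<dotted version>-<build>, e.g. "abc-1.2.11-21".
PATTERN = re.compile(r"(.*)-(\d+(?:\.\d+)*)-(\d+)")


def natural_key(s):
    """Build a sort key that compares the name as text and the numbers as numbers."""
    m = PATTERN.fullmatch(s)
    if m is None:
        # Hypothetical fallback: strings that do not fit the layout sort by text only.
        return (s.replace("-", "").lower(), (), 0)
    name, version, build = m.groups()
    # Assumption: hyphens in the name are ignored, so "abc-def" compares as "abcdef".
    name_key = name.replace("-", "").lower()
    version_key = tuple(int(part) for part in version.split("."))
    return (name_key, version_key, int(build))


rows = [
    "abc-1.2.3-1", "abc-1.2.11-3", "abc-1.11.3-2", "abc-1.2.11-21",
    "abcd-12345-7", "abcd-12345-21", "abc-def-1-3",
]
for row in sorted(rows, key=natural_key):
    print(row)
```

In Excel the same idea usually means helper columns: one for the name and one per numeric part, then sorting on those columns instead of on the raw text.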

Related

How to make an excel (365) function that recognizes different words in the same cell and changes them individually

What I'm working with
I have a list of product names, but unfortunately they are written in uppercase. I now want to make only the first letter uppercase and the rest lowercase, but I also want all words with 3 or fewer characters to stay uppercase.
I'm trying IF functions, but nothing is really working.
I use the German version of Excel, but I would be happy if someone has any idea how to do it. I have been trying different functions for hours, but nothing is working.
=IF(LENGTH(C6)<=3,UPPER(C6),UPPER(LEFT(C6,1))&LOWER(RIGHT(C6,LENGTH(C6)-1)))
But it gives a #NAME error; Excel does not recognize the first and the last bracket.
This is hard! Let me explain:
I believe there are German words in the mix that are shorter than 4 characters and that you should exclude. My German isn't great, but there are probably a great many words shorter than 4 characters;
There seem to be substrings that are longer than 3 characters but should probably stay uppercase, e.g. '550E/ER';
There seem to be quite a few characters that could be used as delimiters to split the input into 'words'. It's hard to catch all of them without a full list;
Possibly other reasons.
With the above in mind, I think it's safe to say that we can only try to get as close as we can to what you want. Therefore I'd suggest:
To split on multiple characters;
To exclude certain words from being uppercase when length < 4;
To include certain words to be uppercase when length > 3 and digits are present;
To assume the 1st character can be made uppercase in any input.
For example:
Formula in B1:
=MAP(A1:A5,LAMBDA(v,LET(x,TEXTSPLIT(v,{"-","/"," ","."},,1),y,TEXTSPLIT(v,x,,1),z,TEXTJOIN(y,,MAP(x,LAMBDA(w,IF(SUM(--(w={"zu","ein","für","aus"})),LOWER(w),IF((LEN(w)<4)+SUM(IFERROR(FIND(SEQUENCE(10,,0),w),)),UPPER(w),LOWER(w)))))),UPPER(LEFT(z))&MID(z,2,LEN(v)))))
You can see how difficult it is to capture each and every possibility:
The minute you exclude a few words, another will pop up (the 'x' between numbers, for example, which should be upper- or lowercase depending on the context it is found in);
The second you include words containing digits, you notice that some should be excluded ('00SICHERUNGS....');
If the 1st character is a digit, the above solution will not turn the 1st alphabetic character into uppercase;
Maybe some characters shouldn't be used as delimiters depending on context? Think about hyphenated words;
Possibly other reasons.
The point is, this is not just hard, it's extremely hard if not impossible to do on the type of data you are currently working with! Even if one is proficient at writing regular expressions (chuck in all the tokens, quantifiers and methods that aren't available to Excel, if you like), I doubt all edge cases could be covered.
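To make the steps of the formula easier to follow, here is a rough Python sketch of the same logic: split on the four delimiters, keep the listed words lowercase, keep short words and words containing digits uppercase, lowercase everything else, then capitalize the first character of the result. The function name recase and the sample inputs are made up for illustration; the word list and delimiters are the ones from the formula.

```python
import re

# Words that stay lowercase (the list used in the formula).
KEEP_LOWER = {"zu", "ein", "für", "aus"}


def recase(text):
    """Re-case one product name, mirroring the steps of the MAP/LAMBDA formula."""
    # Splitting with a capture group keeps the delimiters at the odd indices.
    parts = re.split(r"([-/ .])", text)
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            out.append(part)          # delimiter, kept unchanged
        elif part.lower() in KEEP_LOWER:
            out.append(part.lower())  # excluded short word
        elif len(part) < 4 or any(ch.isdigit() for ch in part):
            out.append(part.upper())  # short word, or word containing a digit
        else:
            out.append(part.lower())
    joined = "".join(out)
    # Uppercase only the very first character of the result.
    return joined[:1].upper() + joined[1:]


print(recase("SCHRAUBE FÜR TÜR 550E/ER"))
```

The same caveats apply as for the formula: the exclusion list is never complete, and a leading digit means no letter gets capitalized.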
Because you are dealing with any number of words in a cell, you'll need to get crafty with this one. Thankfully there are TEXTSPLIT() and TEXTJOIN(), which make short work of splitting the text into words, so that we can test the length, change the capitalization, and then join them back together, all in one formula:
=TEXTJOIN(" ", TRUE, IF(LEN(TEXTSPLIT(C6," "))<=3,UPPER(TEXTSPLIT(C6," ")),PROPER(TEXTSPLIT(C6," "))))
This also uses the PROPER() function, which capitalizes the first letter of each word and lowercases the rest.
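For comparison, the same split, test, join idea looks roughly like this in Python. It is only an illustration of the logic: str.title() is used as a stand-in for PROPER() (an assumption, the two are close but not guaranteed identical), and the function name recase_simple is made up.

```python
def recase_simple(text):
    """Words of 3 or fewer characters stay uppercase; longer words get title case."""
    words = text.split(" ")
    # Empty strings (from double spaces) are skipped, like ignore_empty=TRUE in TEXTJOIN.
    return " ".join(
        w.upper() if len(w) <= 3 else w.title()
        for w in words
        if w
    )


print(recase_simple("SCHRAUBE FÜR TÜR M8"))
```

Because only spaces are used as delimiters, something like HOLZ-SCHRAUBE is treated as a single word here.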

Excel: Find words of certain length in string?

I have this file where I want to make a conditional check for any cell that contains the letter combination "_SOL", or where the string is followed by any numeric character like "_SOL1524", and stop looking after that. So I don't want matches for "_SOLUTION" or "_SOLothercharactersthannumeric".
So when I use the following formula, I also get results for words like "_SOLUTION":
=IF(ISNUMBER(FIND("_SOL",A1))=TRUE,"Yay","")
How can I avoid this and only get matches when the match is "_SOL" or "_SOLnumericvalue" (one numeric character)?
Clarification: The whole strings may be "Blabla_SOL_BLABLA", "Blabla_SOLUTION_BLABLA" or "Blabla_SOL1524_BLABLA"
Maybe this, which checks whether the character after "_SOL" is numeric:
=IF(ISNUMBER(VALUE(MID(A1,FIND("_SOL",A1)+4,1))),"Yay","")
Or, as per the OP's request and suggestion, to include the possibility of an underscore after "SOL":
=IF(OR(ISNUMBER(VALUE(MID(A1,FIND("_SOL",A1)+4,1))),ISNUMBER(FIND("_SOL_",A1))),"Yay","")
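The rule these two formulas implement (accept "_SOL" only when the next character is a digit or an underscore) can be written as a single regular expression. Below is a small Python sketch of that rule, not a drop-in Excel replacement; treating "_SOL" at the very end of the string as a match is my own assumption, and has_sol is a made-up name.

```python
import re

# "_SOL" followed by a digit, an underscore, or (assumed) the end of the string.
SOL = re.compile(r"_SOL(?=\d|_|$)")


def has_sol(s):
    """Return True when the string contains an acceptable _SOL marker."""
    return SOL.search(s) is not None


for sample in ["Blabla_SOL_BLABLA", "Blabla_SOLUTION_BLABLA", "Blabla_SOL1524_BLABLA"]:
    print(sample, has_sol(sample))
```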
Here is an alternative way to check whether your string contains SOL followed by either nothing or any number of numeric characters:
=IF(COUNT(FILTERXML("<t><s>"&SUBSTITUTE(A1,"_","1</s><s>")&"</s></t>","//s[substring-after(.,'SOL')*0=0]")>0),"Yey","Nay")
This is useful in the unfortunate event that you encounter something like SOL1TEXT. Or, maybe safer (in case you have text like AEROSOL):
=IF(COUNT(FILTERXML("<t><s>"&SUBSTITUTE(A1,"_","</s><s>")&"</s></t>","//s[translate(.,'1234567890','')='SOL']")>0),"Yey","Nay")
And to prevent matches on text like 123SOL123, you could even do:
=IF(COUNT(FILTERXML("<t><s>"&SUBSTITUTE(A1,"_","1</s><s>")&"</s></t>","//s[starts-with(., 'SOL') and substring(., 4)*0=0]")>0),"Yey","Nay")
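The idea behind the FILTERXML variants is to split the string on underscores and test each piece on its own. A rough Python sketch of the strictest version (the piece must be SOL followed only by digits) could look like the following; has_sol_token is a hypothetical name and the sample strings are made up.

```python
import re

# A token qualifies when it is exactly "SOL" followed by zero or more digits.
TOKEN = re.compile(r"SOL\d*")


def has_sol_token(s):
    """Split on underscores and test every token separately."""
    return any(TOKEN.fullmatch(tok) for tok in s.split("_"))


for sample in ["Blabla_SOL1524_BLABLA", "Blabla_AEROSOL_X", "Blabla_SOL1TEXT_X", "Blabla_123SOL123_X"]:
    print(sample, has_sol_token(sample))
```

Note that this treats the first piece like any other, so a string that starts with SOL (without a leading underscore) also counts as a match.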

How to remove a character from the middle of a string in Excel

I have 1900 codes (rows) (e.g. V933) that correspond to surgical operation types. I need to convert them all from V93.3 to V933, i.e. remove the period after the first three characters.
I have seen solutions for prefixing and appending but not in the middle of a string.
How about SUBSTITUTE(), like this:
=SUBSTITUTE(A2,".","")
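Note that SUBSTITUTE() as used here removes every period in the cell. If only the period in the fourth position should go, the rule is slightly stricter; here is a small Python sketch of that stricter variant, just to show the difference (drop_period is a made-up name, and the second sample code is invented).

```python
def drop_period(code):
    """Remove the period only when it directly follows the first three characters."""
    if len(code) > 3 and code[3] == ".":
        return code[:3] + code[4:]
    return code


print(drop_period("V93.3"))   # the period is in position 4, so it is removed
print(drop_period("V9.33"))   # the period is elsewhere, so the code is left alone
```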

How to extract string value based on delimiters

I need help with an Excel formula to extract a value from a string, based on delimiters.
This is the string I would like to extract the first 10 fields from: ES_ABC_FACEBOOK_SocialImage_FACEBOOK_Reach(CPM)_DEM_18-45_Apr19_abc_def_ghi
In other words, I would need to get ES_ABC_FACEBOOK_SocialImage_FACEBOOK_Reach(CPM)_DEM_18-45_Apr19_abc
Bear in mind that the number of fields (separated by delimiters) may vary in the dataset, but I need to consistently pick up only the first 10 fields and drop any fields after the 10th.
Thanks in advance!
Robin
You could try this:
=LEFT(<your cell>,FIND("||",SUBSTITUTE(<your cell>,"_","||",10))-1)
e.g. =LEFT(A1,FIND("||",SUBSTITUTE(A1,"_","||",10))-1)
The formula finds the 10th underscore and then gives you all the characters up to it (minus the underscore).
If you need to change how many fields it gives you back, change the 10. The -1 at the end just removes the final underscore. Of note, the || is just a simple set of characters that I can't imagine will ever appear in your strings. If it does, something else will need to be chosen.
Lastly, if some of your strings have fewer than 10 fields, try:
=IF(ISERROR(FIND("||",SUBSTITUTE(<your cell>,"_","||",10))-1),<your cell>,LEFT(<your cell>,FIND("||",SUBSTITUTE(<your cell>,"_","||",10))-1))
This gives you the whole string in the event that there are fewer than 10 fields.
Hope that helps.
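For anyone checking the result outside Excel, the same "keep the first 10 fields" rule is a split and a join. A minimal Python sketch (first_fields is a made-up name) that also returns the whole string when there are fewer than 10 fields:

```python
def first_fields(s, n=10, sep="_"):
    """Keep the first n separator-delimited fields; shorter strings come back whole."""
    return sep.join(s.split(sep)[:n])


value = "ES_ABC_FACEBOOK_SocialImage_FACEBOOK_Reach(CPM)_DEM_18-45_Apr19_abc_def_ghi"
print(first_fields(value))
```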

Use of Excel text parsing functions to extract from a string with complex format

I have a list of items in the following format, with a few samples below:
(CompanyName){space}(PartNumber ending with -){space}(Revision Level).pdf
Company 100-50006- Rev. A.pdf
Company Two 6001241- Rev. CN.pdf
CompanyThree 109581- Rev. B.pdf
My goal is to get three unique pieces of information using Excel: Company Name, Part Number, Revision.
The revision is easy to capture. I am trying to find a way to capture the Company (segregating from the first appearance of any Numeric value). I am also trying to find a way to capture the whole part number.
What function can I use to locate the first numeric character, and do a LEFT(A2,LEN(FUNCTION HERE)-1) where the -1 is due to the spacing?
Similarly, I want to do something to find MID(A2,LEN(FUNCTIONHERE TO FIND BEGINNING NUMERIC), LEN(FUNCTIONHERE TO FIND SPACE OR "REV" AND SEGREGATE AFTER SUCH).
Okay, I don't know if there might be more spaces in the company name, but for the sample you provided, the below formulae work:
=IF(ISERROR(FIND("-",LEFT(A2,FIND(" ",A2,9)))),LEFT(A2,FIND(" ",A2,9)),LEFT(A2,FIND(" ",A2,8)))
=IF(ISERROR(FIND("-",LEFT(A2,FIND(" ",A2,9)))),MID(A2,FIND(" ",A2,9)+1,FIND(" Rev.",A2)-FIND(" ",A2,9)-1),MID(A2,FIND(" ",A2,8)+1,FIND(" Rev.",A2)-FIND(" ",A2,8)-1))
It's a bit long though ^^;
It works for Company Two: since T is the 9th character in the string, the formula looks for the next space from there, which is the one right after the company name. When the company name is a single word such as Company, the next space from the 9th character comes after the part number, so the text to its left also contains a -, which I'm using in the condition. If there is a -, the company name is a single word, and the search for the space restarts from the 8th character.
And MID just works on the same principle, with +1 and -1 to remove the extra spaces.
Note: It won't work if there are more than two spaces in the company name, e.g. Company the first, or for names having spaces after the 9th character, e.g. Companies Twenty.
This may be much easier with the help of even Word's (primitive) regex. Load into Word, Replace All with Use wildcards ticked: first ( [0-9]) with ^t\1 then (- ) with \1^t and load back into Excel. (Copes with the otherwise tricky issue of the number of spaces in a company name).
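To show what a full regular expression would buy you here, below is a Python sketch that pulls out all three pieces in one pass. It assumes the layout from the question: the company name is everything before the first space that is followed by a digit, the part number ends with a hyphen, and the revision sits between "Rev. " and ".pdf". The names LINE and parse are made up for the illustration.

```python
import re

# Assumed layout: <company> <part number ending in -> Rev. <revision>.pdf
LINE = re.compile(r"(?P<company>.*?) (?P<part>\d\S*-) Rev\. (?P<rev>\S+)\.pdf")


def parse(name):
    """Return (company, part number, revision), or None if the layout does not fit."""
    m = LINE.fullmatch(name)
    if m is None:
        return None
    return m.group("company"), m.group("part"), m.group("rev")


for sample in [
    "Company 100-50006- Rev. A.pdf",
    "Company Two 6001241- Rev. CN.pdf",
    "CompanyThree 109581- Rev. B.pdf",
]:
    print(parse(sample))
```

This copes with any number of spaces in the company name, as long as no word inside the name starts with a digit.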
