Only include alphanumeric characters in a column? - excel

I am quite new to Excel and am looking for the most straightforward way to solve only including alphanumeric characters in a column of strings. If there are non-alphanumerics, they should simply be deleted in the string and leave only the alphanumerics.
I'd like to avoid using nested SUBSTITUTES functions because it looks clunky, but more importantly, I can't know/predict all the special symbols that could come up. I can't do SUBSTITUTE('hello-world', "-", "") -> "helloworld" because I don't want to exactly get rid of special characters, more-so only keep alphanumeric characters. So it should more so be something like - KEEP('hello-world', [A-Z,a-z,0-9]) (but I can't find something like that.
Is there a more straightforward way to do this without Visual Basic or Macros? I'd have to learn how to do those and it seems time-consuming. If not, I'd love a VBA or Macros idea.
Thanks so much!

Related

How to make an excel (365) function that recognizes different words in the same cell and changes them individually

What im working with
I have a list of product names, but unfortunately they are written in uppercase I now want to make only the first letter uppercase and the rest lowercase but I also want all words with 3 or less symbols to stay uppercase
im trying if functions but nothing is really working
i use the german excel version but i would be happy if someone has any idea on how to do it im trying different functions for hours but nothing is working
=IF(LENGTH(C6)<=3,UPPER(C6),UPPER(LEFT(C6,1))&LOWER(RIGHT(C6,LENGTH(C6)-1)))
but its a #NAME error excel does not recognize the first and the last bracket
This is hard! Let me explain:
I do believe there are German words in the mix that are below 4 characters in length that you should exclude. My German isn't great but there would probably be a huge deal of words below 4 characters;
There seems to be substrings that are 3+ characters in length but should probably stay uppercase, e.g. '550E/ER';
There seem to be quite a bunch of characters that could be used as delimiters to split the input into 'words'. It's hard to catch any of them without a full list;
Possible other reasons;
With the above in mind I think it's safe to say that we can try to accomplish something that you want as best as we can. Therefor I'd suggest
To split on multiple characters;
Exclude certain words from being uppercase when length < 3;
Include certain words to be uppercase when length > 3 and digits are present;
Assume 1st character could be made uppercase in any input;
For example:
Formula in B1:
=MAP(A1:A5,LAMBDA(v,LET(x,TEXTSPLIT(v,{"-","/"," ","."},,1),y,TEXTSPLIT(v,x,,1),z,TEXTJOIN(y,,MAP(x,LAMBDA(w,IF(SUM(--(w={"zu","ein","für","aus"})),LOWER(w),IF((LEN(w)<4)+SUM(IFERROR(FIND(SEQUENCE(10,,0),w),)),UPPER(w),LOWER(w)))))),UPPER(LEFT(z))&MID(z,2,LEN(v)))))
You can see how difficult it is to capture each and every possibility;
The minute you exclude a few words, another will pop-up (the 'x' between numbers for example. Which should stay upper/lower-case depending on the context it is found in);
The second you include words containing digits, you notice that some should be excluded ('00SICHERUNGS....');
If the 1st character would be a digit, the whole above solution would not change 1st alpha-char in upper;
Maybe some characters shouldn't be used as delimiters based on context? Think about hypenated words;
Possible other reasons.
Point is, this is not just hard, it's extremely hard if not impossible to do on the type of data you are currently working with! Even if one is proficient with writing a regular expression (chuck in all (non-available to Excel) tokens, quantifiers and methods if you like), I'd doubt all edge-case could be covered.
Because you are dealing with any number of words in a cell you'll need to get crafty with this one. Thankfully there is TEXTSPLIT() and TEXTJOIN() that can make short work of splitting the text into words, where we can then test the length, change the capitalization, and then join them back together all in one formula:
=TEXTJOIN(" ", TRUE, IF(LEN(TEXTSPLIT(C6," "))<=3,UPPER(TEXTSPLIT(C6," ")),PROPER(TEXTSPLIT(C6," "))))
Also used PROPER() formula as well, which only capitalizes the first character of a word.

How to search for items with multiple "-" in excel or VBA?

I have a list of item numbers (100K) like this:
Some of the items have format like SAG571A-244-4 (thousands) which need to be filtered so I can delete them and only keep the items that have ONE hyphen per SKU. How can I isolate the items that have two instances of "-" in it's SKU? I'm open to solutions within Excel or using VBA as well.
Native text filters don't seem to be capable of this. I'm stumped.
As per John Coleman's comment, "*-*-*" can be used to isolate strings that have at least two dashes in them.
I would add that if you're entering them as a custom text filter, you should lose the double quotes (so just *-*-*) as otherwise the field seems to interpret the quotes literally.
Seems to work for me.
If you want just an excel formula to verify this and give you a result of the number of hyphens (0, 1, or 2+), here is one:
=IF(ISERROR(SEARCH("-",A1)),"0",IF(ISERROR(SEARCH("-",A1,IFERROR(SEARCH("-",A1)+1,LEN(A1)))),"1","2+"))
Replace A1 with your relevant column, then fill down. This is kind of a terrible way to do this performance wise, but you avoid using VBA and possibly xlsm files.
The code first checks to see if there is one hyphen, then if there is it checks to see if there is another hyphen after the position the first one was found. Looking for multiple hyphens in this manner is cumbersome and I don't recommend it.

Excel 2016: Searching for multiple terms in a cell

I'm trying to do a search for multiple strings in a cell with an OR-condition in Excel 2016.
E.g. I have a string abcd1234 and I want to find ab OR 12.
I'm using the german version where the function SEARCH is called SUCHEN and it should behave the same way.
I found this answer which suggests this solution:
SEARCH({"Gingrich","Obama","Romney"},C1).
I also found this website which suggests the same syntax:
SEARCH({"red","blue","green"},B5)
Same with this website:
SEARCH({"mt","msa","county","unemployment","|nsa|"},[#Tags])
So they basically say make a list of search terms separated by commas enclosed by curly braces and you're good.
But putting these into Excel 2016 just results in the usual meaningless Excel error message which says there was an error with the formula and it's always highlighting the whole part in curly braces.
Taking the first example the only way I could get Excel to not throw its error message was to change the syntax like this:
=SEARCH({"Gingrich";"Obama";"Romney"};C1)
But separating the search terms with semicolons doesn't apply the OR-condition correctly, so this is not the solution.
I'm aware from this answer that I could make separate searches and string them together with a condition, but I would like to avoid that, and I also want to know why the syntax that is supposed to work as confirmed by multiple sources is not working for me.
EDIT:
Okay, I'm starting to understand this, thanks to Solar Mike:
The code =IF(COUNT(SEARCH({"Romney","Obama","Gingrich"},A1)),1,"") works indeed perfectly fine.
Also =COUNT(SEARCH({"Romney","Obama","Gingrich"},A1)) works.
But =SEARCH({"Romney","Obama","Gingrich"},A1) does not.
Also =ISNUMBER(SEARCH({"Gingrich","Obama","Romney"},A1)) does not work.
I'd love to know the reason why.
Ok, so this works:
OR(IFERROR(FIND("ab",A1,1),0),IFERROR(FIND("12",A1,1),0))
tested here :
I followed one of the links and the version like this:
=IF(COUNT(SEARCH({"Romney","Obama","Gingrich"},C1)),1,"")
worked as expected for me, but if the search is isolated it then fails and I have not found an explanation ...
Like other array-style formulas, the part that delivers the array has to be enclosed in some sort of aggregate function to make it scan through the array - otherwise it only looks at the first element of the array. So anything like COUNT, SUM, SUMPRODUCT will do the trick.
My preferred one is
=OR(ISNUMBER(SEARCH({"a","b","c"},A1)))
because you can easily change it to this if you want AND logic:
=AND(ISNUMBER(SEARCH({"a","b","c"},A1)))

Finding question marks in cell string

I would like to find out if a string within a cell in Excel 2010 contains any of the following, and then return a '1'.
?dementia
? dementia
dementia?
dementia ?
I've tried some formulas but they don't seem able to get past the use of the wildcard and the string when combined.
Would anyone have any pointers or advice?
Here is a combination of the suggested answers, and my own work around:
This is the formula you need:
=IF(IFERROR(FIND(A2,$C$3),0)>0,1,0)
where A2 is the string that you are trying to check.
Assuming you don't want to include word ? word and return 1, you can use something like this:
=IF(OR(LEFT(A1,1)="?",RIGHT(A1,1)="?"),1,"")
You will want to use the FIND function (Documentation).
Make sure your search term in in quotation marks.
If you are still having trouble with wildcards, you can try the CHAR function (Documentation). It will return the character as a string. In your case, to get a ?, you would use CHAR(63). I use this chart to keep track of CHAR codes (just use the number and ignore the "Alt")
REVISED:
If I understand the question correctly, you are trying to search long string in a short string list.
The formula you can use is:
=ISTEXT(LOOKUP(99^99,SEARCH(SUBSTITUTE($A$2:$A$5,"?","|"),SUBSTITUTE(C2,"?","|")),$A$2:$A$5))*1
The problem with my prior formula is due to that ? mark because it is being treated as a wild card. Here I have replace it with | (you can change this to a character that you will never use but avoid ? and *). This should work for you now.

Number representation by Excel

I'm building a VBA program on Excel 2007 inputing long string of numbers (UPC). Now, the program usually works fine, but sometimes the number string seems to be converted to scientific notation and I want to avoid this, since I then VLook them up.
So, I'd like to treat a textbox input as an exact string. No scientific notation, no number interpretation.
On a related side, this one really gets weird. I have two exact UPC : both yield the same value (as far as I or any text editor can tell), yet one of the value gives a successful Vlookup, the other does not.
Anybody has suggestions on this one? Thanks for your time.
Long strings that look like numbers can be a pain in Excel. If you're not doing any math on the "number", it should really be treated as text. As you've discovered, when you want to force Excel to treat something as a string, precede it with an apostrophe.
There are a couple of common problems with VLOOKUP. The one you found, extra whitespace, can be avoided by using a formula such as
=VLOOKUP(TRIM(A1),B1:C:100,2,FALSE)
The TRIM function will remove those extraneous spaces. The other common problem with VLOOKUP is that one argument is a string and the other is a number. I run into this one a lot with imported data. You can use the TEXT function to do the VLOOKUP without having to change the raw data
=VLOOKUP(TEXT(A1,"00000"),B1:C100,2,FALSE)
will convert A1 to a five digit string before it tries to look it up in column B. And, of course, if your data is a real mess, you may need
=VLOOKUP(TEXT(TRIM(A1),"00000"),B1:C100,2,FALSE)

Resources