Remove letters from a cell leaving numbers only - excel

I am trying to remove all letters from a cell and leave the numbers remaining.
I have found bits of code and other questions on here but none are making much sense to me.
I have in cell E23 "as12df34" and want the value of Cell E23 to read "12 34"
Can anyone help with this query please?

You could use a regular expression:
Sub UsageExample()
    Dim cl As Range
    ' iterate each cell
    For Each cl In Range("Sheet1!A1:A100")
        ' replace each non-digit sequence by a space, then trim the ends
        cl.Value = Trim(ReplaceRe(cl.Value, "\D+", " "))
    Next cl
End Sub
Public Function ReplaceRe(text As String, pattern As String, replacement) As String
Static re As Object
If re Is Nothing Then
Set re = CreateObject("VBScript.RegExp")
re.Global = True
End If
re.pattern = pattern
ReplaceRe = re.Replace(text, replacement)
End Function

Here's a UDF if you want to do something like that. Passing True for "Spaces" leaves a single space where each run of non-numeric characters used to be; passing False removes them entirely.
Sub Test()
Debug.Print Nums("as12df34", True)
End Sub
Function Nums(What As String, Spaces As Boolean) As String
Dim i As Long
For i = 1 To Len(What)
If IsNumeric(Mid(What, i, 1)) = True Then Nums = Nums & Mid(What, i, 1)
If IsNumeric(Mid(What, i, 1)) = False Then Nums = Nums & " "
Next i
Nums = Trim(Nums)
If Spaces = True Then
    ' collapse runs of spaces into a single space
    Do Until InStr(Nums, "  ") = 0
        Nums = Replace(Nums, "  ", " ")
    Loop
Else
    ' remove every space
    Do Until InStr(Nums, " ") = 0
        Nums = Replace(Nums, " ", "")
    Loop
End If
End Function

I know this has already been answered, but I wanted to show another possibility to anyone who comes across this question. My solution simply replaces every letter with nothing, so that only the numbers remain in the cell. You can replace the "" with " " to keep a space where the letters used to be.
It's cluttered, but I use it and it works as intended: just drag the function to the next cell, no typing required. In my case I had text like "platinum ingot, 3"; it removes all the letters, the comma, and the spaces and leaves 3, which can then be used in calculations. I use this to hold two values in one cell when one of the values will never contain numbers.


How to replace certain character in a string

I am trying to replace some, but not all, of the spaces in a string with line breaks. The string is taken from a specific cell, and looks like:
Now I'm trying to replace each space after an abbreviation with a line break. The abbreviation can be anything, so the best way to pinpoint which spaces I intend to replace is probably something like: each space after number and before a letter?
The output I want to get is like:
Below is my code, but it changes every space in the cell to a line break.
Private Sub Workbook_SheetChange(ByVal Sh As Object, ByVal Target As Range)
    On Error GoTo Exitsub
    If Not Intersect(Target, Sh.Columns(6)) Is Nothing Then
        Application.EnableEvents = False
        Target.Value = Replace(Target, " ", Chr(10))
    End If
Exitsub:
    Application.EnableEvents = True
End Sub
You can try:
Target.Value = Replace(Target, "kg ", "kg" & Chr(10))
If you can have other abbreviations like "g" or "t", do something similar for them (maybe in a Sub); just be cautious with the order (replace "kg" first, then "g").
Update: If you don't know the possible abbreviations in advance, one option is to use regular expressions. I'm not really good with them, but the following routine seems to do the job (it needs a reference to Microsoft VBScript Regular Expressions 5.5):
Function replaceAbbr(s As String) As String
Dim regex As New RegExp
regex.Global = True
regex.Pattern = "([a-z]+) "
replaceAbbr = regex.Replace(s, "$1" & Chr(10))
End Function
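The question's own idea, breaking only at a space that follows the abbreviation and precedes the next number, can be sketched with a lookahead. This is a minimal sketch rather than part of the answer above: the name BreakBeforeNumbers is hypothetical, it uses late binding (CreateObject) so no library reference is needed, and it assumes that abbreviations end in a letter and that values start with a digit.

```vba
' Hypothetical helper (not from the original answers).
' Assumption: units end in a letter and each value starts with a digit.
Function BreakBeforeNumbers(ByVal s As String) As String
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")   ' late binding, no reference required
    re.Global = True
    ' a letter, then one space, matched only if a digit follows (lookahead)
    re.Pattern = "([A-Za-z]) (?=\d)"
    ' keep the letter ($1) and swap the space for a line feed
    BreakBeforeNumbers = re.Replace(s, "$1" & vbLf)
End Function
```

In the event handler it could be called as Target.Value = BreakBeforeNumbers(Target.Value). The space between a number and its unit is left alone because it is preceded by a digit, not a letter.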
The code below replaces every second space with a line break. The worksheet function Replace works as intended here because it replaces by position, whereas the VBA Replace function replaces by matching text and would therefore change every space.
It loops through every character in the changed cell; you can change this to whatever you want.
The If statement breaks down as follows:
(SpaceCount Mod 2) = 0 is the part that makes it act on every second space.
As a side note, (SpaceCount Mod 3) = 0 will act on every third space, and (SpaceCount Mod 2) = 1 will act on the first space and then every other one.
Target.Characters(CountChr, 1).Text = " " makes sure we are replacing a space; if the user enters something funny that looks like a space but isn't, that's on them.
I believe something like this will work as intended for you:
Private Sub Workbook_SheetChange(ByVal Sh As Object, ByVal Target As Range)
    Dim CountChr As Long
    Dim SpaceCount As Long
    On Error GoTo Exitsub
    Application.EnableEvents = False
    For CountChr = 1 To Len(Target.Value)
        If Target.Characters(CountChr, 1).Text = " " Then
            SpaceCount = SpaceCount + 1
            If (SpaceCount Mod 2) = 0 Then
                Target.Value = WorksheetFunction.Replace(Target.Value, CountChr, 1, Chr(10))
            End If
        End If
    Next CountChr
Exitsub:
    Application.EnableEvents = True
End Sub
Identify arbitrary abbreviation first
"abbreviations aren't determined ..."
Knowing that the abbreviation varies but is the same within each string (here e.g. kg) actually helps to follow the initial idea of looking at the blanks first. But instead of replacing them all by vbLf or Chr(10), this approach
a) splits the string at the " " delimiter into a zero-based tmp array and immediately identifies the arbitrary abbreviation abbr as the second token, i.e. tmp(1),
b) executes a negative filtering to get the numeric data, and eventually
c) joins them together using the abbreviation, which is now known for the given string.
So you could change your assignment to:
Target.Value = repl(Target) ' << calling help function repl()
Possible help function:
Function repl(ByVal s As String) As String
'a) split into tokens and identify arbitrary abbreviation
Dim tmp, abbr As String
tmp = Split(s, " "): abbr = tmp(1)
'b) filter out abbreviation
tmp = Filter(tmp, abbr, Include:=False)
'c) return result string
repl = Join(tmp, " " & abbr & vbLf) & " " & abbr
End Function
Edit: responding to FunThomas' comment
Regarding a): If there might be missing spaces between number and abbreviation, the above approach could be modified as follows:
Function repl(ByVal s As String) As String
'a) split into tokens and identify arbitrary abbreviation
Dim tmp, abbr As String
tmp = Split(s, " "): abbr = tmp(1)
'b) renew splitting via found abbreviation (plus blank)
tmp = Split(s & " ", abbr & " ")
'c) return result string
repl = Join(tmp, abbr & vbLf): repl = Left(repl, Len(repl) - 1)
End Function
Regarding b): following the OP's example "10 kg 20 kg 30,5kg 15kg 130,5 kg" (and as already remarked above), the assumption is that the abbreviation is the same for all values within one string, but can vary from item to item.

Adding a space between two words once

I completed code to remove any data in front of a string, add some text (with a space) to the front and store it back in the cell.
However, every time I run the macro (to check if changes that I've made are working for example), a new space is added in between the words.
Here is the code that removes anything before the name and adds the required string. I have called the InStr function and stored the result in the integer pos. Note that this is in a loop over a specific range.
If pos > 0 Then
'Removes anything before the channel name
cellValue.Offset(0, 2) = Right(cell, Len(cell) - InStr(cell, pos) - 2)
'Add "DA" to the front of the channel name
cellValue.Offset(0, 0) = "DA " & Right(cell, Len(cell) - InStr(cell, pos) - 2)
'Aligns the text to the right
cellValue.Offset(0, 2).HorizontalAlignment = xlRight
End If
An additional "DA" is not being added and I haven't made any other functions to add spaces anywhere. The extra space is not added if adding "DA " is changed to "DA".
I'd prefer not to add another function/sub/something somewhere to search and remove any extra spaces.
What the string is AND what is in front of the string is unknown. It could be numbers, characters, spaces or exactly what I want it to be. For example, it could be "Q-Quincey", "BA Bob", "DA White" etc. I thought that searching through the cell for the string I want (Quincey, Bob, White) and altering the cell as needed would be the best way.
Solution that you all helped me come up with:
If pos > 0 Then
    modString = Right(cell, Len(cell) - InStr(cell, pos) - 2)
    'Removes anything before the channel name and places it in the last column
    cellValue.Offset(0, 2) = modString
    'Aligns the last column text to the right
    cellValue.Offset(0, 2).HorizontalAlignment = xlRight
    cellValue.Offset(0, 2).Font.Size = 8
    'Add "DA" to the front of the channel name in the rightmost column
    If StartsWith(cell, "DA ") = True Then
        cellValue.Replace cell, "DA" & modString
    Else
        cellValue.Replace cell, "DA " & modString
    End If
End If
Maybe this is something you can work with:
Sample code:
Sub Test()
    With Sheet1.Range("A1:A4")
        ' the wildcard * swallows whatever precedes the name
        .Replace "*quincey", "DA Quincey"
    End With
End Sub
In your examples, it seems you want to replace the first "word" in the string with something else. If that is always the case, the following function, which makes use of Regular Expressions, can do that:
Option Explicit
Function replaceStart(str As String, replWith As String) As String
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
With RE
.Global = False
.MultiLine = True
.Pattern = "^\S+\W(?=\w)"
replaceStart = .Replace(str, replWith)
End With
End Function
Sub test()
Debug.Print replaceStart("Q-Quincy", "DA ")
Debug.Print replaceStart("BA Bob", "DA ")
Debug.Print replaceStart("DA White", "DA ")
End Sub
The Debug.Print calls output:
DA Quincy
DA Bob
DA White
The regular expression matches everything up to but not including the first "word" character that follows a non-word character. This should be the second word in the string.
A "word" character is anything in the set of [A-Za-z0-9_]
Seems to work on the examples you present.
If you want to go about it with a loop, you should remove some redundancies in your code. For instance, referring to cellValue.Offset(0, 0) doesn't make sense.
I would set the target cell to a range and simply edit that cell, without placing the unwanted strings in another cell.
I'd try something like this:
nameiwant = "Quincy"
Set cell = Range("A1")
If InStr(cell, nameiwant) > 0 And Left(cell, 3) <> "DA " Then
cell.Value = "DA " & nameiwant
End If
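One way to make the macro safe to re-run is to add the prefix only when it is missing, so repeated runs leave the cell unchanged. The following is a minimal sketch of that idea, not the OP's final code: the name EnsurePrefix is hypothetical, and it assumes leading blanks in the cell are unwanted.

```vba
' Hypothetical helper (not from the original answers).
Function EnsurePrefix(ByVal s As String, ByVal prefix As String) As String
    s = LTrim$(s)   ' assumption: leading blanks are leftovers and can go
    If StrComp(Left$(s, Len(prefix)), prefix, vbBinaryCompare) = 0 Then
        EnsurePrefix = s            ' already prefixed: return unchanged
    Else
        EnsurePrefix = prefix & s   ' prefix missing: add it once
    End If
End Function
```

Calling it twice gives the same result as calling it once, e.g. cell.Value = EnsurePrefix(nameiwant, "DA ") can be run any number of times.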

How to do a proper keyword search

If I search for the term 'tfo' in the cell value 'TFO_xyz' then the result should be TRUE.
If I search for the term 'tfo' in the cell value 'TFO systems' then the result should be TRUE.
If I search for the term 'tfo' in the cell value 'spring TFO' then the result should be TRUE.
BUT if I check 'tfo' in the cell value 'Platform' then I want the result as FALSE
I have used the formula =IF(COUNTIF(A2,"*tfo*"),"TRUE","FALSE"), but this doesn't return FALSE when I check 'tfo' in the word 'Platform'.
Platform should be FALSE because tfo occurs in the middle of a word. I want TRUE for cell values that contain tfo as a separate word, as in tfo<space>America, TFO_America or <space>TFO systems. But I want FALSE for the words Platform and portfolio, because in these two words tfo sits between other letters.
Try this:
Dim x As Long: x = 1
With Sheet1
    Do While x <= .Cells(.Rows.Count, 1).End(xlUp).Row
        ' compare in lower case so that "TFO" matches as well
        If LCase(Left(.Cells(x, 1).Value, 3)) = "tfo" Or LCase(Right(.Cells(x, 1).Value, 3)) = "tfo" Then
            .Cells(x, 2).Value = True
        End If
        x = x + 1
    Loop
End With
Try this formula. It assumes that the word tfo will be at the beginning or the end of the text.
Just make sure to substitute the appropriate cell references where I have 'A2' in the formula.
My suggestion is to spend some time getting to know your data and create a whitelist, since there is no easy way to do a proper fuzzy search in strings.
Function TFO_Search(strText As String) As Boolean
    Dim ArryString As Variant
    Dim ArryWhitelist As Variant
    ' Create a White-List Array
    ArryWhitelist = Array("TFO_", "TFO ", "_TFO", " TFO", "tfoAmerica")
    For Each ArryString In ArryWhitelist
        If InStr(UCase(strText), UCase(ArryString)) > 0 Then 'force to UPPER CASE
            TFO_Search = True
            Exit Function
        Else
            TFO_Search = False
        End If
    Next ArryString
End Function
I see two dimensions of complexity in your question:
Where does the key word occur in the text (beginning, middle, end)
What are the characters that separate words.
The first one is fixed size, you need to handle three cases. The second one depends on the number of characters you want to accept as delimiters. Below I assumed that you accept space and underscore, however, you may expand this set by inserting more SUBSTITUTE function calls.
In my table, $A2 is the cell in which you search for the keyword, while B$1 contains the keyword.
To standardize the separator character, you need the formula:
B2=SUBSTITUTE($A2,"_"," ")
To check if the string starts with the keyword:
C2=--(LEFT($B2,LEN(B$1)+1)=B$1&" ")
To check if the string ends with the keyword:
D2=--(RIGHT($B2,LEN(B$1)+1)=" "&B$1)
To check if the keyword is in the middle of the string:
E2=--(LEN(SUBSTITUTE(UPPER($B2)," "&UPPER(B$1)&" ",""))<LEN($B2))
To evaluate the above three cases:
F2=--(0<$C2+$D2+$E2)
If you want to use a single cell, combine the formulas into:
G2=--(0<--(LEFT(SUBSTITUTE($A2,"_"," "),LEN(B$1)+1)=B$1&" ")+--(RIGHT(SUBSTITUTE($A2,"_"," "),LEN(B$1)+1)=" "&B$1)+--(LEN(SUBSTITUTE(UPPER(SUBSTITUTE($A2,"_"," "))," "&UPPER(B$1)&" ",""))<LEN(SUBSTITUTE($A2,"_"," "))))
It is not very readable in the end, but I don't think there is an easier solution using formulas only.
Note: If you want to modify the set of characters accepted as delimiters, add more SUBSTITUTE function calls to B2, then copy the Formula of F2 into notepad and replace $C2 with the formula of C2, etc., then replace $B2 with the updated Formula of B2.
Building on the idea in Ron Rosenfeld's comment to tigeravatar's answer, the formula can be simplified (the beginning, middle, and ending cases can be joined):
=--(LEN(SUBSTITUTE(" "&UPPER($B2)&" "," "&UPPER(B$1)&" ",""))<LEN($B2))
After substituting $B2 with its formula:
=--(LEN(SUBSTITUTE(" "&UPPER(SUBSTITUTE($A2,"_"," "))&" "," "&UPPER(B$1)&" ",""))<LEN(SUBSTITUTE($A2,"_"," ")))
This formula will return true if TFO is at the beginning or end of any given word, or by itself, in the text string. It also checks every word in the text string, so TFO can be at beginning, middle, or end. The formula assumes that if a word starts or ends with TFO, then the result should be TRUE (as is the case for tfoAmerica so same rule would apply to tform), else FALSE.
=OR(ISNUMBER(SEARCH({" tfo","tfo "}," "&SUBSTITUTE(A2,"_"," ")&" ")))
In the event that the result should only be TRUE if TFO is found by itself, then this version of the formula will suffice:
=ISNUMBER(SEARCH(" tfo "," "&SUBSTITUTE(A2,"_"," ")&" "))
If you can rely on VBA, then regex is a more flexible solution.
There is a good summary of how to use them in VBA: How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops
For your keyword search problem I wrote the following:
Option Explicit
' Include: Tools > References > Microsoft VBScript Regular Expressions 5.5 (C:\Windows\SysWOW64\vbscript.dll\3)
Public Function SearchKeyWord(strHay As String, strNail As String, Optional strDelimiters As String = " _,.;/", Optional lngNthOccurrence As Long = 1) As Long ' Returns 1-based index of nth occurrence or 0 if not found
    Dim strPattern As String: strPattern = CreatePattern(strNail, strDelimiters)
    Dim rgxKeyWord As RegExp: Set rgxKeyWord = CreateRegex(strPattern, True)
    Dim mtcResult As MatchCollection: Set mtcResult = rgxKeyWord.Execute(strHay)
    If (0 <= lngNthOccurrence - 1) And (lngNthOccurrence - 1 < mtcResult.Count) Then
        Dim mthResult As Match: Set mthResult = mtcResult(lngNthOccurrence - 1)
        ' skip the leading delimiter(s) captured in the first group
        SearchKeyWord = mthResult.FirstIndex + Len(mthResult.SubMatches(0)) + 1
    Else
        SearchKeyWord = 0
    End If
End Function
Private Function CreateRegex(strPattern As String, Optional blnIgnoreCase As Boolean = False, Optional blnMultiLine As Boolean = True, Optional blnGlobal As Boolean = True) As RegExp
Dim rgxResult As RegExp: Set rgxResult = New RegExp
With rgxResult
.Pattern = strPattern
.IgnoreCase = blnIgnoreCase
.MultiLine = blnMultiLine
.Global = blnGlobal
End With
Set CreateRegex = rgxResult
End Function
Private Function CreatePattern(strNail As String, strDelimiters As String) As String
Dim strDelimitersEscaped As String: strDelimitersEscaped = RegexEscape(strDelimiters)
Dim strPattern As String: strPattern = "(^|[" & strDelimitersEscaped & "]+)(" & RegexEscape(strNail) & ")($|[" & strDelimitersEscaped & "]+)"
CreatePattern = strPattern
End Function
Private Function RegexEscape(strOriginal As String) As String
Dim strEscaped As String: strEscaped = vbNullString
Dim i As Long: For i = 1 To Len(strOriginal)
Dim strChar As String: strChar = Mid(strOriginal, i, 1)
Select Case strChar
Case ".", "$", "^", "{", "[", "(", "|", ")", "*", "+", "?", "\"
strEscaped = strEscaped & "\" & strChar
Case Else
strEscaped = strEscaped & strChar
End Select
Next i
RegexEscape = strEscaped
End Function
Once you have the above in a module, you can call SearchKeyWord from worksheet formulas, passing the cell to search (say A1, containing e.g. "tfo America") as the first parameter and the keyword as the second.
As a third parameter, you may specify which characters you want to treat as delimiters; by default they are space, underscore, comma, dot, semicolon and slash.
The return value is the position of the nth occurrence of the keyword, where n is the value of the fourth parameter (default: 1), or 0 if not found.
To check whether the keyword is present in A1, compare the result to 0, which means not found.
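If only a TRUE/FALSE answer is needed, a shorter late-bound variant can be sketched as follows. This is an illustration rather than a replacement for the code above: the name ContainsKeyword is hypothetical, it treats every non-alphanumeric character as a delimiter, and it assumes the keyword itself contains only letters and digits (it is not escaped).

```vba
' Hypothetical helper (not from the original answers).
' Assumption: keyword holds only letters/digits, so no regex escaping is done.
Function ContainsKeyword(ByVal text As String, ByVal keyword As String) As Boolean
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")   ' late binding, no reference required
    re.IgnoreCase = True
    ' keyword must be bounded by start/end of text or a non-alphanumeric character
    re.Pattern = "(^|[^A-Za-z0-9])" & keyword & "($|[^A-Za-z0-9])"
    ContainsKeyword = re.Test(text)
End Function
```

With this rule 'TFO_xyz', 'TFO systems' and 'spring TFO' give True, while 'Platform' and 'portfolio' give False. Unlike the whitelist approach, 'tfoAmerica' also gives False, because the keyword is followed directly by a letter.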

Remove words that contain each other and leave the longer one

I'm looking for a macro (preferably a function) that would take cell contents, split it into separate words, compare them to one another and remove the shorter words.
Here's an image of what I want the output to look like (I need the words that are crossed out removed):
I tried to write a macro myself, but it doesn't work 100% properly because it's not taking the last words and sometimes removes what shouldn't be removed. Also, I have to do this on around 50k cells, so a macro takes a lot of time to run, that's why I'd prefer it to be a function. I guess I shouldn't use the replace function, but I couldn't make anything else work.
Sub clean_words_containing_eachother()
Dim sht1 As Worksheet
Dim LastRow As Long
Dim Cell As Range
Dim cell_value As String
Dim word, word2 As Variant
Set sht1 = ActiveSheet
col = InputBox("Which column do you want to clear?")
LastRow = sht1.Cells(sht1.Rows.Count, col).End(xlUp).Row
Let to_clean = col & "2:" & col & LastRow
For i = 2 To LastRow
For Each Cell In sht1.Range(to_clean)
cell_value = Cell.Value
cell_split = Split(cell_value, " ")
For Each word In cell_split
For Each word2 In cell_split
If word <> word2 Then
If InStr(word2, word) > 0 Then
If Len(word) < Len(word2) Then
word = word & " "
Cell = Replace(Cell, word, " ")
ElseIf Len(word) > Len(word2) Then
word2 = word2 & " "
Cell = Replace(Cell, word2, " ")
End If
End If
End If
Next word2
Next word
Next Cell
Next i
End Sub
Assuming that the retention of the third word in your first example is an error, since books is contained later on in notebooks:
5003886 book books bound case casebound not notebook notebooks office oxford sign signature
and also assuming that you would want to remove duplicate identical words, even if they are not contained subsequently in another word, then we can use a Regular Expression.
The regex will:
Capture each word
look-ahead to see if that word exists later on in the string
if it does, remove it
Since VBA regexes cannot look behind, we work around this limitation by running the regex a second time on the reversed string.
Then remove the extra spaces and we are done.
Option Explicit
Function cleanWords(S As String) As String
Dim RE As Object, MC As Object, M As Object
Dim sTemp As String
Set RE = CreateObject("vbscript.regexp")
With RE
.Global = True
.Pattern = "\b(\w+)\b(?=.*\1)"
.ignorecase = True
'replace looking forward
sTemp = .Replace(S, "")
' check in reverse
sTemp = .Replace(StrReverse(sTemp), "")
'return to normal
sTemp = StrReverse(sTemp)
'Remove extraneous spaces
cleanWords = WorksheetFunction.Trim(sTemp)
End With
End Function
Punctuation will not be removed.
A "word" is defined as containing only the characters in the class [_A-Za-z0-9] (letters, digits and the underscore).
If any words are hyphenated or contain other non-word characters, they will be treated as two separate words in the above. If you want them treated as a single word, the regex would need to change.
General steps:
Write cell to array (already working)
for each element (x), go through each element (y) (already working)
if x is in y AND y is longer than x THEN set x to ""
concat array back into string
write string to cell
String/array manipulations are much faster than operations on cells, so this will give you some increase in performance (depending on the amount of words you need to replace for each cell).
The "last word problem" is probably that there is no space after the last word in your cells, since you only replace word + " " with " ".
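The general steps above can be sketched as a single function that works on an array and touches no cells. This is a minimal sketch under stated assumptions: the name RemoveContainedWords is hypothetical, words are separated by single spaces, the comparison ignores case, and identical duplicates are kept because only strictly longer words count as containers.

```vba
' Hypothetical helper (not from the original answers).
Function RemoveContainedWords(ByVal s As String) As String
    Dim words() As String
    Dim i As Long, j As Long
    Dim dropIt As Boolean
    Dim result As String

    words = Split(Trim$(s), " ")                 ' cell text -> array of words
    For i = LBound(words) To UBound(words)
        dropIt = (Len(words(i)) = 0)             ' blanks come from repeated spaces
        j = LBound(words)
        Do While Not dropIt And j <= UBound(words)
            ' drop word i if a strictly longer word j contains it
            If Len(words(j)) > Len(words(i)) Then
                dropIt = (InStr(1, words(j), words(i), vbTextCompare) > 0)
            End If
            j = j + 1
        Loop
        If Not dropIt Then result = result & " " & words(i)
    Next i
    RemoveContainedWords = Mid$(result, 2)       ' strip the leading separator
End Function
```

Because every word is compared against the original array rather than the partly cleaned text, chains such as book, books, notebooks collapse to notebooks. Used as a worksheet function, e.g. =RemoveContainedWords(A2), it avoids the repeated cell writes of the original macro.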

Returning a numeric value on either side of a dash in a string?

Does anyone know how to return only the numeric value immediately on either side of a dash in a string?
For example, let's say we have the following string "Text, 2-78, 88-100, 101". I'm looking for a way to identify a dash and then return one of the numbers (left or right).
Ultimately I would like to check to see if a given number, let's say 75, falls within any of the ranges noted in the string. Ideally it would see that 75 falls within "2-78".
Any help would be greatly appreciated!
Go to Tools->References and check "Microsoft VBScript Regular Expressions 5.5." Then you can do something like this. (I know this isn't good code, but it's the idea...) Also, this finds all the #-# patterns and prints either the left or right number for all of them (based on whether the boolean "left" is true or false).
Dim str As String, res As String
str = "Text, 2-78, 88-100, 101"
Dim left As Boolean
left = False
Dim re1 As New RegExp
re1.Pattern = "\d+-\d+"
re1.Global = True
Dim m As Match, n As Match
For Each m In re1.Execute(str)
    Dim re2 As New RegExp
    re2.Global = False
    If left Then
        re2.Pattern = "\d+"
    Else
        re2.Pattern = "-\d+"
    End If
    For Each n In re2.Execute(m.Value)
        res = n.Value
        If Not left Then
            ' strip the leading dash from the right-hand number
            res = Mid(res, 2, Len(str))
        End If
        MsgBox res
    Next n
Next m
You can do this many different ways with VBA. Using the Split() function to convert the string into an array, first with the comma as the delimiter and then with the dash, would probably be a way to go.
That said, if you want a quick and dirty way to do this in Excel (from which you could record a macro), here is what you can do:
Paste your target string into a cell.
Run Text to Columns on it, using the comma as your delimiter.
Copy the row you now have and Paste-Transpose it onto a new sheet.
Run Text to Columns again on your transposed column, this time with the dash as your delimiter.
You now have side-by-side columns of your numbers, which you can compare to your target values as needed.
You may need to use the Trim() function in there somewhere to remove whitespace, but hopefully Text to Columns will leave you with numbers instead of numbers stored as text.
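The Split() route mentioned at the start of this answer can be sketched as follows. It is a minimal sketch, not a tested production routine: the name InAnyRange is hypothetical, it assumes ranges are written as whole numbers in the form low-high and separated by commas, and it ignores entries that are not ranges (such as "Text" or a lone "101").

```vba
' Hypothetical helper (not from the original answers).
' Assumption: ranges look like "2-78" and are separated by commas.
Function InAnyRange(ByVal text As String, ByVal value As Long) As Boolean
    Dim parts() As String, bounds() As String
    Dim i As Long

    parts = Split(text, ",")                      ' first split: on commas
    For i = LBound(parts) To UBound(parts)
        bounds = Split(Trim$(parts(i)), "-")      ' second split: on the dash
        If UBound(bounds) = 1 Then                ' exactly two tokens: low and high
            If IsNumeric(bounds(0)) And IsNumeric(bounds(1)) Then
                If value >= CLng(bounds(0)) And value <= CLng(bounds(1)) Then
                    InAnyRange = True
                    Exit Function
                End If
            End If
        End If
    Next i
    ' falls through with the default return value False
End Function
```

For the string from the question, InAnyRange("Text, 2-78, 88-100, 101", 75) returns True because 75 lies within 2-78, while 80 returns False.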
Ultimately I think there are lots of ways you could approach this sort of problem. It looks like a good way to try and use RegExp. RegExp is not my speciality but I do like to try and use it to answer some Q's here on SO. This code has been tested for your example data and is working properly.
Something like this, assuming your text is in cell A1, and you're testing a value like 75, this also captures single digits in your string in the match collection:
Sub TestRegExp()
    Dim re As RegExp
    Dim allMatches As MatchCollection
    Dim m As Match
    Dim testValue As Long
    Dim rangeArray As Variant
    Dim pattern As String, msg As String
    testValue = 75 'or whatever value you're trying to find
    pattern = "[\d]+[-][\d]+\b|[\d]+"
    Set re = New RegExp
    re.pattern = pattern
    re.Global = True
    re.IgnoreCase = True 'doesn't really matter since you're looking for numbers
    Set allMatches = re.Execute([A1])
    For Each m In allMatches
        rangeArray = Split(m.Value, "-")
        Select Case UBound(rangeArray)
            Case 0 'single number
                If testValue = CLng(rangeArray(0)) Then
                    msg = testValue & " = " & m.Value
                Else
                    msg = testValue & " NOT " & m.Value
                End If
            Case 1 'low-high range
                If testValue >= CLng(rangeArray(0)) And testValue <= CLng(rangeArray(1)) Then
                    msg = testValue & " is within range: " & m.Value
                Else
                    msg = testValue & " is not within range: " & m.Value
                End If
            Case Else
        End Select
        MsgBox msg, vbInformation
    Next m
End Sub
