Removing Whole Numbers from an Alphanumeric String - excel

I'm having trouble finding a way to remove floating integers from a cell without removing numbers attached to the end of my string. Could I get some help as to how to approach this issue?
For example, in the image attached, instead of:
john123 456 hamilton, I want:
john123 hamilton

This can be done using regular expressions. You will match on the data you want to remove, then replace this data with an empty string.
Since you didn't provide any code, all I can do is give you a function that you can adapt to your own project. It can be called from VBA or used as a worksheet function, such as =ReplaceFloatingIntegers(A1).
You will need to add a reference to Microsoft VBScript Regular Expressions 5.5 by going to Tools, References in the VBE menu.
Function ReplaceFloatingIntegers(ByVal inputString As String) As String
    ' Requires a reference to Microsoft VBScript Regular Expressions 5.5.
    With New RegExp
        .Global = True      ' replace every match, not just the first
        .MultiLine = True
        .Pattern = "(\b\d+\b\s?)"
        If .Test(inputString) Then
            ReplaceFloatingIntegers = .Replace(inputString, "")
        Else
            ReplaceFloatingIntegers = inputString
        End If
    End With
End Function
Breaking down the pattern
( ... ) This is a capturing group. It is not strictly required here, because .Replace() replaces the entire match whether or not it is captured.
\b This is a word boundary. We use it because we want to match from edge to edge of any 'word' (which, in our case, includes words that contain only digits).
\d+\b This matches any digit (\d), one to unlimited times (+), up to the next word boundary (\b).
\s? This matches a single whitespace character if one is present; the ? makes it optional.
You can look at this personalized Regex101 page to see how this matches your data. Anything matched here is replaced with an empty string.
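If you want to try the pattern outside Excel, here is a minimal sketch in Python. It assumes Python's re module treats \b, \d and \s the same way as VBScript's RegExp for plain ASCII input like yours; the function name replace_floating_integers is made up for this illustration.

```python
import re

# Same pattern as the VBA function: a run of digits with a word boundary
# on each side, plus one optional trailing whitespace character.
FLOATING_INT = re.compile(r"\b\d+\b\s?")

def replace_floating_integers(text: str) -> str:
    """Remove standalone integers; digits attached to letters are kept."""
    return FLOATING_INT.sub("", text)

print(replace_floating_integers("john123 456 hamilton"))  # john123 hamilton
```

The digits in john123 survive because there is no word boundary between the n and the 1, so \b\d+ cannot start matching there.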

Related

Extract email id of specific domain extensions

I need to extract an email ID from each row, but only for specific domain extensions such as .com, .net and .org; everything else should be ignored. Below is sample data for two rows.
.#.3,.#.1601466914865855,.#.,.#.null,.#.,abc#xyz.com,abc#xyz.net,abc#xyz.org,null.val#.#.,.##,abc#xyz.jpb,abc#xyz.xls,abc#xyz.321
.#.3,.#.1601466914865855,.#.,.#.null,.#.,123#hjk.com,123#hjk.net,123#hjk.org,null.val#.#.,.##,abc#xyz.jpb,abc#xyz.xls,abc#xyz.321
The first email with a valid extension is enough: even when a row contains several IDs, I only need one per row. Below is the sample desired result.
I believe this can be done with a custom formula using regex, but I can't wrap my head around it. I am using the latest desktop version of MS Excel.
If your email addresses are relatively simple, you can use this regex:
\b[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,}\b
In VBA:
Option Explicit
Function extrEmail(S As String) As String
    Dim RE As Object, MC As Object
    Const sPat As String = "\b[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,}\b"
    Set RE = CreateObject("vbscript.regexp")
    With RE
        .Pattern = sPat
        .IgnoreCase = True
        .Global = False
        .MultiLine = True
        If .Test(S) = True Then
            Set MC = .Execute(S)
            extrEmail = MC(0)
        End If
    End With
End Function
Matching an email address can become very complicated, and a regex that follows all the rules is extraordinarily complex and long. But this one is relatively simple, and might work for your needs.
Explanation of Regex
Emailaddress1
\b[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,}\b
Options: Case insensitive; ^$ match at line breaks
Assert position at a word boundary \b
Match a single character present in the list below [A-Z0-9._%+-]+
    Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
    A character in the range between “A” and “Z” A-Z
    A character in the range between “0” and “9” 0-9
    A single character from the list “._%+” ._%+
    The literal character “-” -
Match the character “#” literally #
Match a single character present in the list below [A-Z0-9.-]+
    Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
    A character in the range between “A” and “Z” A-Z
    A character in the range between “0” and “9” 0-9
    The literal character “.” .
    The literal character “-” -
Match the character “.” literally \.
Match a single character in the range between “A” and “Z” [A-Z]{2,}
    Between 2 and unlimited times, as many times as possible, giving back as needed (greedy) {2,}
Assert position at a word boundary \b
Created with RegexBuddy
EDIT: To match only specific domain extensions, replace the part of the regex that matches the extension ([A-Z]{2,}) with a non-capturing group of pipe-separated extensions.
For example:
\b[A-Z0-9._%+-]+#[A-Z0-9.-]+\.(?:com|net|org)\b
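To see what the restricted pattern picks out of the sample rows, here is a rough Python sketch. It assumes Python's re module behaves like VBScript's RegExp for this pattern, and the name extract_email is invented for the illustration. As in the VBA function, only the first match is returned.

```python
import re

# Restricted pattern from the answer; "#" plays the role of the separator
# exactly as in the sample data.
EMAIL = re.compile(
    r"\b[A-Z0-9._%+-]+#[A-Z0-9.-]+\.(?:com|net|org)\b",
    re.IGNORECASE,
)

def extract_email(row: str) -> str:
    """Return the first address with an allowed extension, or "" if none."""
    m = EMAIL.search(row)
    return m.group(0) if m else ""

row = (".#.3,.#.1601466914865855,.#.,.#.null,.#.,abc#xyz.com,abc#xyz.net,"
       "abc#xyz.org,null.val#.#.,.##,abc#xyz.jpb,abc#xyz.xls,abc#xyz.321")
print(extract_email(row))  # abc#xyz.com
```

A row that only contains addresses such as abc#xyz.jpb yields an empty string, which mirrors the VBA function leaving its return value unset.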

Remove characters A-Z from string [duplicate]

I need to be able to remove all alphabetical characters from a string, leaving just the numbers behind.
I don't need to worry about any other characters like ,.?# and so on, just the letters of the alphabet a-z, regardless of case.
The closest I could get to a solution was the exact opposite, the below VBA is able to remove the numbers from a string.
Function removenumbers(ByVal input1 As String) As String
Dim x
Dim tmp As String
tmp = input1
For x = a To Z
tmp = Replace(tmp, x, "")
Next
removenumbers = tmp
End Function
Is there any modification I can make to the above so that it removes the letters rather than the numbers, or am I going about this completely wrong?
The letters could fall anywhere in the string, and there is no pattern to the strings.
Failing this, I will use CTRL + H to remove all the letters one by one, but I may need to repeat this every week, so a UDF would be much quicker.
I'm using Office 365 on Excel 16
Option Explicit
Dim regex As New RegExp
Private Function rgclean(ByVal mystring As String) As String
    ' Finds every part of mystring that matches the regex pattern and
    ' replaces it with an empty string; returns the cleaned string.
    With regex
        .Global = True      ' replace all matches found in the string
        .Pattern = "[A-Z]"  ' upper-case letters only; use "[A-Za-z]" to remove lower-case letters as well
    End With
    rgclean = regex.Replace(mystring, "")   ' each matched letter is replaced with ""
End Function
Try using a regular expression.
Make sure you enable regular expressions under Tools > References by ticking the checkbox "Microsoft VBScript Regular Expressions 5.5".
The function removes any character in [A-Z]. To include lower-case letters as well, change the pattern to [A-Za-z] (.Pattern = "[A-Za-z]").
You just pass the string into the function, and it uses the regular expression to remove every matching letter from the string.
Thanks
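For comparison, the same replace-every-letter idea can be sketched in Python. This assumes the case-insensitive variant [A-Za-z] that the question asks for; remove_letters is a name made up for the example.

```python
import re

def remove_letters(text: str) -> str:
    """Delete every ASCII letter, leaving digits and punctuation in place."""
    return re.sub(r"[A-Za-z]", "", text)

print(remove_letters("ab12C3,d.4"))  # 123,.4
```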

Excel: Find and Replace without also grabbing the beginning of another word

I'm currently working on shortening a large Excel sheet using Find/Replace. I'm finding all instances of terms like ", Inc.", ", Co.", " LLC", etc. and replacing them with nothing (i.e. removing them). The problem is that I can't run similar searches for " Inc", ", Inc", ", Co", etc., because they would also remove the beginnings of words like ", Inc"orporated and ", Co"mpany.
Is there a blank character, or something I can do in VBA, that would let me find/replace items only when nothing follows what I'm finding (i.e. finding ", Co" without also catching ", Co"rporation)?
In VBA you can use Regular Expressions to ensure that there are "word boundaries" before and after the abbreviation you are trying to remove. You can also remove extraneous spaces that might appear, depending on the original string.
Function remAbbrevs(S As String, ParamArray abbrevs()) As String
    Dim RE As Object
    Dim sPat As String
    sPat = "\s*\b(?:" & Join(abbrevs, "|") & ")\b\.?"
    Set RE = CreateObject("vbscript.regexp")
    With RE
        .Global = True
        .IgnoreCase = False
        .Pattern = sPat
        remAbbrevs = .Replace(S, "")
    End With
End Function
For arguments to this function you can enter a series of abbreviations. The function creates an appropriate regex to use.
For example in the below, I entered:
=remAbbrevs(A1,"Inc","Corp")
and filled down:
Explanation of the regex:
remAbbrevs
\s*\b(?:Inc|Corp)\b\.?
Options: Case sensitive
Match a single character that is a “whitespace character” \s*
    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) *
Assert position at a word boundary \b
Match the regular expression below (?:Inc|Corp)
    Match this alternative Inc
        Match the character string “Inc” literally Inc
    Or match this alternative Corp
        Match the character string “Corp” literally Corp
Assert position at a word boundary \b
Match the character “.” literally \.?
    Between zero and one times, as many times as possible, giving back as needed (greedy) ?
Created with RegexBuddy
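The pattern-building step can also be sketched in Python to check how the word boundaries behave. This assumes Python's re module handles \b and \s like VBScript's RegExp for this input; rem_abbrevs is a hypothetical name. As in the VBA version, the abbreviations are joined into the pattern unescaped, so they should not contain regex metacharacters.

```python
import re

def rem_abbrevs(text: str, *abbrevs: str) -> str:
    """Remove whole-word abbreviations, any leading spaces and one trailing dot."""
    pattern = r"\s*\b(?:" + "|".join(abbrevs) + r")\b\.?"
    return re.sub(pattern, "", text)

print(rem_abbrevs("Acme Inc.", "Inc", "Corp"))          # Acme
print(rem_abbrevs("Acme Incorporated", "Inc", "Corp"))  # Acme Incorporated
```

The second call leaves the string untouched because there is no word boundary between "Inc" and "orporated".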

How to get string in between two characters in excel/spreadsheet

I have this string
Weiss,Emery/Ap #519-8997 Quam. Street/Hawaiian Gardens,IN - 79589|10/13/2010
How do I get only "Hawaiian Gardens"?
I already tried using
=mid(left(A1,find("/",A1)-1),find(",",A1)+1,len(A1))
but it gives me "Emery" instead.
If there are always two slashes before the string you want to extract, then, based on Tyler M's answer, you can use this:
=MID(E1,
FIND("~",SUBSTITUTE(E1,"/","~",2))+1,
FIND(",",RIGHT(E1,LEN(E1)-FIND("~",SUBSTITUTE(E1,"/","~",2))))-1
)
This substitutes the second occurrence of / with a character that would not normally occur in the address, thus making it findable.
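The logic of the formula (locate the second slash, then the first comma after it, and take what lies between) can be sketched in Python as follows. This is only an illustration of the steps, not a replacement for the formula, and the function name is invented.

```python
def between_second_slash_and_comma(text: str) -> str:
    """Return the text after the second "/" up to the next ","."""
    first = text.index("/")               # position of the first slash
    second = text.index("/", first + 1)   # position of the second slash
    comma = text.index(",", second + 1)   # first comma after the second slash
    return text[second + 1:comma]

address = "Weiss,Emery/Ap #519-8997 Quam. Street/Hawaiian Gardens,IN - 79589|10/13/2010"
print(between_second_slash_and_comma(address))  # Hawaiian Gardens
```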
Was your intention to also include Google Sheets (judging by your title)? If so, you can use the REGEXEXTRACT() function. For example, in B1:
=REGEXEXTRACT(A1,"\/([\w\s]*)\,")
In Excel you could build a UDF using this regex rule like so (as an example):
Function REGEXEXTRACT(S As String, PTRN As String) As String
    'We will get the last possible match in your string...
    Dim regex As Object, matches As Object
    Dim Match As Object, subMatch As Variant
    Set regex = CreateObject("VBScript.RegExp")
    With regex
        .Pattern = PTRN
        .Global = True
    End With
    Set matches = regex.Execute(S)
    'Each later match overwrites the result, so the last submatch wins
    For Each Match In matches
        If Match.SubMatches.Count > 0 Then
            For Each subMatch In Match.SubMatches
                REGEXEXTRACT = subMatch
            Next subMatch
        End If
    Next Match
End Function
Call the function in B1 like so:
=REGEXEXTRACT(A1,"\/([\w\s]*)\,")
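To check what this pattern captures, here is a small Python sketch of the same "keep the last captured group" behaviour. It assumes the pattern has exactly one capturing group and that Python's re module matches it the same way as VBScript's RegExp; regex_extract_last is a made-up name.

```python
import re

def regex_extract_last(text: str, pattern: str) -> str:
    """Return the captured group of the last match, or "" if nothing matches."""
    matches = re.findall(pattern, text)  # one captured string per match
    return matches[-1] if matches else ""

address = "Weiss,Emery/Ap #519-8997 Quam. Street/Hawaiian Gardens,IN - 79589|10/13/2010"
print(regex_extract_last(address, r"\/([\w\s]*)\,"))  # Hawaiian Gardens
```

The first slash does not produce a match because "Ap " is followed by "#", which is neither a word character, whitespace, nor the required comma.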

How to match a part of string before a character into one variable and all after it into another

I have a problem with splitting a string into two parts on a special character.
For example:
12345#data
or
1234567#data
The first part has 5-7 characters and is separated by "#" from the second part, which holds other data (characters, numbers, it doesn't matter what).
I need to store the two parts on each side of # in two variables:
x = 12345
y = data
without "#" character.
I was looking for some Lua string function like splitOn("#") or substring until character, but I haven't found that.
Use string.match and captures.
Try this:
s = "12345#data"
a,b = s:match("(.+)#(.+)")
print(a,b)
See this documentation:
First of all, although Lua does not have a split function in its standard library, it does have string.gmatch, which can be used instead of a split function in many cases. Unlike a split function, string.gmatch takes a pattern to match the non-delimiter text, instead of the delimiters themselves.
It is easily achievable with the help of a negated character class with string.gmatch:
local example = "12345#data"
for i in string.gmatch(example, "[^#]+") do
print(i)
end
See IDEONE demo
The [^#]+ pattern matches one or more characters other than # (so it effectively "splits" the string on a single-character delimiter).
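For readers more familiar with regex engines than with Lua patterns, both ideas above (two captures around the #, and matching runs of non-# characters) carry over almost unchanged. A rough Python sketch, with invented function names and assuming single-line input:

```python
import re

def split_captures(text: str):
    """Counterpart of s:match("(.+)#(.+)"): two captures around the "#"."""
    m = re.search(r"(.+)#(.+)", text)
    return m.groups() if m else None

def split_negated(text: str):
    """Counterpart of gmatch with "[^#]+": every run of non-"#" characters."""
    return re.findall(r"[^#]+", text)

print(split_captures("12345#data"))    # ('12345', 'data')
print(split_negated("1234567#data"))   # ['1234567', 'data']
```

Because the first (.+) is greedy, a string containing several # characters would be split at the last one by the capture approach, while the negated-class approach returns every piece.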

Resources