I have an Excel table that contains values in these formats. The table has over 30,000 entries.
I need to clean this data so that only the numbers directly after V- are left. For example, when the value is SV-51140r3_rule, V-4407..., I only want 4407 to remain, and when the value is SV-245744r822811_rule, I only want 245744 to remain. I have about 10 formulas that handle these variations, but they require a lot of manual work. I've also used Excel's Text to Columns feature to clean this data, but it takes 30 minutes to an hour to go through the whole document. I'm looking for a way to streamline this process so that one formula or function can handle all of these variations. I'm open to using VBA but don't have much experience with it, and I can't use Pandas or any IDE or programming language. Help please!!
I've used Text to Columns to clean the data, and I've also used a variation of this formula:
=IFERROR(RIGHT(A631,LEN(A631)-FIND("#",SUBSTITUTE(A631,"-","#",LEN(A631)-LEN(SUBSTITUTE(A631,"-",""))))),A631)
Depending on your version of Excel, either of these should work. If you can use the LET function, it will improve performance, as this excellent article explains.
If you're on an older version of Excel, you'll need to press Ctrl+Shift+Enter to make the array formula work.
While these look daunting, all these formulas do is find the text after the last V (using this expression) =SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("π",999)),999),"π","") and then loop through each character after the hyphen, returning only the digits.
The placeholder character π could be any character that is unlikely to appear in the actual data.
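To make those steps concrete, here is a rough Python sketch of the same logic. It is only an illustration of the idea, since the asker can't use other languages. It keeps the text after the last V, skips past the hyphen, then walks the characters keeping digits. One assumption differs from the formulas: the sketch stops at the first non-digit, so the r822811 part of SV-245744r822811_rule is not appended. The helper name digits_after_last_v is hypothetical, and the first sample is the truncated value from the question.

```python
def digits_after_last_v(value: str) -> str:
    """Return the digits directly after the last 'V-' in value (hypothetical helper)."""
    # Like SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",...))): keep only the text after the last "V"
    tail = value.rpartition("V")[2]
    # Like FIND("-", zText): drop everything up to and including the next hyphen
    tail = tail.partition("-")[2]
    digits = []
    # Walk the remaining characters one at a time
    for ch in tail:
        if ch in "0123456789":
            digits.append(ch)
        else:
            # Assumption: stop at the first non-digit (e.g. the "r" in "245744r822811")
            break
    return "".join(digits)


for sample in ["SV-51140r3_rule, V-4407", "SV-245744r822811_rule"]:
    print(sample, "->", digits_after_last_v(sample))
```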
Old School
=TEXTJOIN("",TRUE,IF(ISNUMBER(MID(MID(SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("π",999)),999),"π",""),
FIND("-",SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("π",999)),999),"π","")),9^9),
FILTER(COLUMN($1:$1),COLUMN($1:$1)<=LEN(MID(SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("π",999)),999),"π",""),
FIND("-",SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("π",999)),999),"π","")),9^9))),1)+0),
MID(MID(SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("π",999)),999),"π",""),
FIND("-",SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("π",999)),999),"π","")),9^9),
FILTER(COLUMN($1:$1),COLUMN($1:$1)<=LEN(MID(SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("π",999)),999),"π",""),
FIND("-",SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("π",999)),999),"π","")),9^9))),1),""))
Let Function
(use this if you can)
=LET(zText,SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("π",999)),999),"π",""),
TEXTJOIN("",TRUE,IF(ISNUMBER(MID(MID(zText,FIND("-",zText),9^9),
FILTER(COLUMN($1:$1),COLUMN($1:$1)<=LEN(MID(zText,FIND("-",zText),9^9))),1)+0),
MID(MID(zText,FIND("-",zText),9^9),
FILTER(COLUMN($1:$1),COLUMN($1:$1)<=LEN(MID(zText,FIND("-",zText),9^9))),1),"")))
VBA Custom Function
You could also use a VBA custom function to accomplish what you want, called from the sheet as, for example, =getNumbersAfterCharcter(A2,"V"):
Function getNumbersAfterCharcter(aCell As Range, aCharacter As String) As String
    Const errorValue = "#NoValuesInText"
    Dim i As Long, theText As String, theValue As String
    theText = aCell.Value
    ' Start just after the last occurrence of aCharacter (position 1 if it is absent)
    For i = InStrRev(theText, aCharacter) + 1 To Len(theText)
        theValue = Mid(theText, i, 1)
        If theValue Like "#" Then
            ' Collect consecutive digits
            getNumbersAfterCharcter = getNumbersAfterCharcter & theValue
        ElseIf getNumbersAfterCharcter <> "" Then
            ' Stop at the first non-digit after the number, e.g. the "r" in "245744r822811"
            Exit For
        End If
    Next i
    If getNumbersAfterCharcter = "" Then getNumbersAfterCharcter = errorValue
End Function
I have a text field in a table where I need to substitute phone numbers where applicable.
For example, the text field could contain:
Call me on 08588812885 immediately
Call me on 07525812845
I need assistance please contact me
Good service
Sometimes a phone number will be in the text, but not always, and the number will be different each time.
Is there a measure I can use to replace the phone numbers with empty text?
Ideally the solution would be in Power BI, but it can also be done in the raw data using Excel or VBA.
A regular expression in VBA (Excel) or Python (Power BI) is a straightforward solution.
I had never used Power BI with Python before, but managed to write the following Python script.
In the Power BI transformation steps, I created a new column named [noPhoneNumber] as a copy of the [message] column, and in the next step ran this Python script:
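import re

# Power BI passes the current table to the script as a pandas DataFrame named "dataset"
def removePhone(x):
    # Replace any run of 10 or 11 digits with a placeholder
    return re.sub(r'\d{10,11}', "**number removed**", x)

# Apply the function to every row instead of assigning cell by cell
dataset["noPhoneNumber"] = dataset["noPhoneNumber"].apply(removePhone)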
So the column "noPhoneNumber":
Call me on 08588812885 immediately
Call me on 07525812845
I need assistance please contact me
Good service
becomes
Call me on **number removed** immediately
Call me on **number removed**
I need assistance please contact me
Good service
In VBA, it's preferable to create a UDF (user-defined function) rather than a subroutine, which would be too error-prone for this kind of problem.
[Added]
If you need an Excel-based solution, you can create a UDF like so:
(remember to set a reference to "Microsoft VBScript Regular Expressions 5.5" (VBScript_RegExp_55) so the early-bound RegExp type is available)
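Function removePhoneNumber(text As String, Optional replacement As String = "**number removed**") As String
    ' Early-bound RegExp: requires the VBScript Regular Expressions 5.5 reference
    Dim regex As New RegExp
    regex.Pattern = "\d{10,11}"
    ' Replace every match, not just the first one
    regex.Global = True
    removePhoneNumber = regex.Replace(text, replacement)
End Function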
...and then use it in Excel like so:
=removePhoneNumber(A2)
=removePhoneNumber(A3)
and so on...
A simple VBA function alternative
Function removePhone(s As String) As String
    Const DELIM As String = " "
    Dim i As Long, tokens As Variant
    tokens = Split(s, DELIM)
    For i = LBound(tokens) To UBound(tokens)
        If IsNumeric(tokens(i)) Then
            tokens(i) = "*Removed*" ' << change to your needs
            Exit For ' assuming a single phone number per string
        End If
    Next
    removePhone = Join(tokens, DELIM)
End Function
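The same token-splitting idea, sketched in Python for comparison (the name remove_phone mirrors the VBA function; this version is only an illustration). One difference worth knowing: str.isdigit is stricter than VBA's IsNumeric and accepts only digit characters, so in this sketch a token with trailing punctuation such as "08588812885," is left alone.

```python
def remove_phone(s: str, replacement: str = "*Removed*") -> str:
    """Replace the first all-digit, space-delimited token in s."""
    tokens = s.split(" ")
    for i, token in enumerate(tokens):
        # isdigit stands in for IsNumeric; it rejects signs, decimals, and punctuation
        if token.isdigit():
            tokens[i] = replacement
            break  # assumes a single phone number per string, like the VBA version
    return " ".join(tokens)


print(remove_phone("Call me on 08588812885 immediately"))
print(remove_phone("I need assistance please contact me"))
```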
You can do this in Power Query. Create a custom column with the code below. I have assumed the column name is comments; please adjust it to match your column name.
if Text.Length(Text.Select([comments], {"0".."9"})) = 11
then
    Text.Replace(
        [comments],
        Text.Select([comments], {"0".."9"}),
        ""
    )
else [comments]
Here is the output below. You can also replace phone numbers with other text, such as ####, to anonymize them.
NOTE
This will only work if there is exactly one number in the string and its length is 11 (you can adjust the length in the code as required).
This will not work if there is more than one number in the string.
If there is one number in the string but its length is not 11, the whole string is kept as it was.
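To see why those limitations arise, here is the same logic sketched in Python. Text.Select gathers every digit in the string, so two separate numbers are glued into one sequence that no longer appears in the text, and Text.Replace then has nothing to remove. The function name strip_phone is hypothetical.

```python
def strip_phone(comment: str, length: int = 11) -> str:
    """Python sketch of the Power Query custom column (hypothetical helper)."""
    # Text.Select([comments], {"0".."9"}): collect every digit in the comment
    digits = "".join(ch for ch in comment if ch in "0123456789")
    if len(digits) == length:
        # Text.Replace: remove that digit sequence wherever it occurs
        return comment.replace(digits, "")
    # Otherwise keep the comment unchanged
    return comment


print(repr(strip_phone("Call me on 07525812845")))
print(repr(strip_phone("Call 12345 or 678901")))  # two numbers: unchanged
```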
It's my first time posting a question here :)
When exporting data from our enterprise ticketing system, we unfortunately don't have a specific column for the machine ID; instead, we have a "problem description" column that includes both the short description of the issue and the machine ID. The machine ID always contains digits, and it may consist of digits only or have 2-4 letters before the digits, with no spaces. Examples:
XK2065
2092
BOZK10625
The number of digits can vary, but is never more than six.
Two examples of the problem description:
1) XK2065 - issue not detected, please investigate.
2) Please investigate why issue was not detected, machine ID is XK2065, ticket number 1425778.
So, the problem is that the unit ID can be located anywhere in the sentence and can also contain only numbers or 2 to 4 letters before the numbers.
Is there a function that can extract the machine ID, regardless of its location, along with the leading letters adjacent to the digits if it has them? An additional condition I'd like is that the number of digits be no more than six, because ticket numbers, which have seven digits, are sometimes included.
A worksheet function would be preferable to a VBA macro.
Thanks in advance!
This function should do what you need, using a regular expression (as @RonRosenfeld suggested):
Function RegExID(str As String) As String
    Dim rgx As Object, allMatches As Object, m As Object
    Set rgx = CreateObject("VBScript.RegExp")
    With rgx
        ' Up to four letters, then 4-6 digits, as a whole word (7-digit ticket numbers fail the \b)
        .Pattern = "\b[A-Z]{0,4}\d{4,6}\b"
        .Global = True
        .IgnoreCase = True
        .MultiLine = True
    End With
    Set allMatches = rgx.Execute(str)
    ' Returns the last match found, or an empty string if there is none
    For Each m In allMatches
        RegExID = m.Value
    Next
End Function
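If you want to check the pattern against sample descriptions before putting it in a workbook, Python's re module accepts the same syntax for this particular pattern. This sketch returns all matches rather than only the last one, as the VBA loop does; find_machine_ids is a hypothetical name. Note that the lower bound of four digits comes from the answer's pattern, not from the question, which only caps the length at six.

```python
import re

# Same pattern as the VBA function: 0-4 letters, then 4-6 digits, as a whole word.
# The \b on both sides is what rejects 7-digit ticket numbers.
MACHINE_ID = re.compile(r"\b[A-Z]{0,4}\d{4,6}\b", re.IGNORECASE)


def find_machine_ids(description: str) -> list:
    """Return every candidate machine ID in the description, in order."""
    return MACHINE_ID.findall(description)


print(find_machine_ids("XK2065 - issue not detected, please investigate."))
print(find_machine_ids(
    "Please investigate why issue was not detected, machine ID is XK2065, ticket number 1425778."
))
```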
I would like to limit certain text boxes to accept only [A-Za-z].
I hope a counterpart to Like exists.
With Like, I would have to write a long list of disallowed characters in order to filter:
Not MyString Like [?;!°%/=....]
I can think of a solution in the form of:
For Counter = 1 To Len(MyString)
    ' Check one character at a time against the allowed letters
    If Not Mid(MyString, Counter, 1) Like "[A-Za-z]" Then
        MsgBox "String contains bad characters"
        Exit Sub
    End If
Next
... but is there a more sophisticated one-liner solution?
In the meantime, I have created a function to make it a one-liner:
Function isPureString(myText As String) As Boolean
    Dim i As Long
    isPureString = True
    For i = 1 To Len(myText)
        ' Allowed: ASCII letters, underscore, and accented letters
        If Not Mid(myText, i, 1) Like "[a-zA-Z_áéíóöőúüűÁÉÍÓÖŐÚÜŰ]" Then
            isPureString = False
            Exit For ' no need to check the rest once a bad character is found
        End If
    Next
End Function
If I add one more parameter, it's also possible to define the allowed characters when calling the function.
OK, it seems my question was a bit of a duplicate, even though the original did not show up in my search results.
So credit to @QHarr for posting the link.
The one-liner I can build from that idea is:
If myText Like WorksheetFunction.Rept("[a-zA-Z]", Len(myText))=false then 'do something.
Using Rept is inspiringly clever and elegant, in my opinion.
What it does: it repeats the pattern once for each character instead of looping through the characters.
EDIT:
Amid an overabundance of nice and elegant solutions, the most recent leader is:
If not myText Like "*[!A-Za-z]*" then '... do something
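For comparison, here is how the two one-liners translate to Python regular expressions. This is an illustration only; VBA's Like is not a regex engine. The Rept-style version builds one character class per character and matches the whole string, while the negated-class version only has to find a single offending character, so it can stop early. Both function names are hypothetical.

```python
import re


def is_pure_rept(text: str) -> bool:
    # Rept-style: one "[a-zA-Z]" per character, matched against the whole string
    return re.fullmatch("[a-zA-Z]" * len(text), text) is not None


def is_pure_negated(text: str) -> bool:
    # Like Not text Like "*[!A-Za-z]*": fail as soon as any non-letter is found
    return re.search(r"[^A-Za-z]", text) is None


for sample in ["Hello", "He#llo", ""]:
    print(repr(sample), is_pure_rept(sample), is_pure_negated(sample))
```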
Statistics update:
I have tested the performance of the last three solutions.
I placed a # in the text string below at the beginning, at the end, or nowhere.
The criteria were: "*[a-zA-Z \S.,]*"
For 100,000 repetitions, with this text:
text = "This will be a very Long text, with one unwanted in the middle, to be able to test the difference in performance of the approaches."
1.) Using [!...]: 30 ms when the # is present, 80 ms when it is not
2.) Using Rept: around 1800 ms in all cases
3.) Using a character loop with Mid: around 3000 ms when the # is not present, 40-80 ms when it appears early
Is there any way Excel 2010 can look up the customer type using a meaningful part of the customer name?
For example, the customer name is Littleton's Valley Market, but in the list where I'm trying to look up the customer type, the names are formatted a little differently, such as <Littletons Valley MKT #2807 and/or Littleton Valley.
Some customers are listed under multiple customer types. How can Excel tell me which customers these are, and can I set Excel to pull the primary or secondary type?
Re #1: this fails on the leading < (if it belongs there!) and on any other extraneous prefix, but that may be rare or non-existent, so:
=INDEX(G:G,MATCH(LEFT(A1,6)&"*",F:F,0))
or similar may catch enough to be useful. This looks at the first six characters; the length can be adjusted to suit, though unfortunately only one length at a time. It assumes the mismatched names are in ColumnA (e.g. A1 for the formula above) and the correct names are in ColumnF, with the required type in the corresponding row of ColumnG.
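A small Python sketch of what the formula does, using two hypothetical lists standing in for columns F and G. It shows why the leading < breaks the match: the first six characters become "<Littl", which no correct name starts with. The sketch ignores Excel's ? and ~ wildcard characters, which MATCH would interpret if they appeared in the name.

```python
def lookup_type(name, names, types, prefix_len=6):
    """Mimic INDEX(G:G, MATCH(LEFT(A1,6)&"*", F:F, 0)) on two parallel lists."""
    # LEFT(A1,6): only the first prefix_len characters are compared
    prefix = name[:prefix_len].lower()  # MATCH is not case-sensitive
    for candidate, customer_type in zip(names, types):
        # The trailing "*" wildcard means "starts with"; the first hit wins
        if candidate.lower().startswith(prefix):
            return customer_type
    return None  # where MATCH would return #N/A


names = ["Littleton's Valley Market", "Valley Foods"]  # hypothetical column F
types = ["Grocery", "Wholesale"]                       # hypothetical column G
print(lookup_type("Littletons Valley MKT #2807", names, types))
print(lookup_type("<Littletons Valley MKT #2807", names, types))
```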
On a large scale Fuzzy Lookup may be helpful.
Since the question has a VBA tag, Soundex matching and Levenshtein distance may also be of interest.
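For anyone following up on Levenshtein distance, here is a compact Python sketch of the standard dynamic-programming version; a VBA port would use the same two nested loops. How to turn distances into a lookup (for example, picking the candidate name with the smallest distance) is an assumption left to your data.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("Littletons Valley", "Littleton's Valley"))
```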
Re #2: if the secondary type is in ColumnH, again in the matching row, change G:G above to H:H.
pnuts gives a good answer re: Fuzzy Lookup, Soundex matching, etc. Quick and dirty way I've handled this before:
Function isNameLike(nameSearch As String, nameMatch As String) As Boolean
    On Error GoTo ErrorHandler
    If InStr(1, invalidChars(nameSearch), invalidChars(nameMatch), vbTextCompare) > 0 Then isNameLike = True
    Exit Function
ErrorHandler:
    isNameLike = False
End Function
Function invalidChars(strIn As String) As String
    Dim i As Long
    Dim sIn As String
    Dim sOut As String
    sOut = ""
    On Error GoTo ErrorHandler
    ' Keep only characters that are not spaces, digits, or punctuation
    For i = 1 To Len(strIn)
        sIn = Mid(strIn, i, 1)
        If InStr(1, " 1234567890~`!@#$%^&*()_-+={}|[]\:'<>?,./" & Chr(34), sIn, vbTextCompare) = 0 Then sOut = sOut & sIn
    Next i
    invalidChars = sOut
    Exit Function
ErrorHandler:
    invalidChars = strIn
End Function
Then I can call isNameLike from code, or use it as a formula in a worksheet. Note that you still have to supply the "significant" part of the customer name you're looking for.
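The same idea in Python, for anyone who wants to try it on a sample of names first: strip the listed characters from both strings, then do a case-insensitive substring test like InStr with vbTextCompare. The character set is the one listed in invalidChars, with @ included; is_name_like and clean are hypothetical names.

```python
# Characters removed before comparing: space, digits, punctuation, and the double quote
STRIP = set(" 1234567890~`!@#$%^&*()_-+={}|[]\\:'<>?,./\"")


def clean(s: str) -> str:
    """Drop every character in STRIP, like invalidChars."""
    return "".join(ch for ch in s if ch not in STRIP)


def is_name_like(name_search: str, name_match: str) -> bool:
    """True if the cleaned name_match occurs anywhere in the cleaned name_search."""
    # casefold() plays the role of vbTextCompare
    return clean(name_match).casefold() in clean(name_search).casefold()


print(is_name_like("<Littletons Valley MKT #2807", "Littleton's Valley"))
```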