I have an Excel table that contains values in these formats. The table spans over 30,000 entries.
I need to clean this data so that only the numbers directly after V- are left. This would mean that when the value is SV-51140r3_rule, V-4407..., I would only want 4407 to remain, and when the value is SV-245744r822811_rule, I would only want 245744 to remain. I have about 10 formulas that can handle these variations, but they require a lot of manual labor. I've also used the Text to Columns feature of Excel to clean this data, but it takes about 30 minutes to an hour to go through the whole document. I'm looking for ways to streamline this process so that one formula or function can handle all of these different variations. I'm open to using VBA but don't have a whole lot of experience with it, and I am unable to use Pandas or any IDE or programming language. Help please!!
I've used Text to Columns to clean the data that way, and I've used a variation of this formula:
=IFERROR(RIGHT(A631,LEN(A631)-FIND("#",SUBSTITUTE(A631,"-","#",LEN(A631)-LEN(SUBSTITUTE(A631,"-",""))))),A631)
Depending on your version of Excel, either of these should work. If you are able to use the LET function, it will improve performance, as this outstanding article articulates.
If you're on a really old version of Excel, you'll need to press Ctrl+Shift+Enter to make the array formula work.
While these look daunting, all these functions are doing is finding the text after the last V (with this function) =SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("π",999)),999),"π","") and then looping through each character and returning only the digits.
Obviously the π placeholder could be any character that is unlikely to appear in the actual data.
Old School
=TEXTJOIN("",TRUE,IF(ISNUMBER(MID(MID(SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("π",999)),999),"π",""),
FIND("-",SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("π",999)),999),"π","")),9^9),
FILTER(COLUMN($1:$1),COLUMN($1:$1)<=LEN(MID(SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("π",999)),999),"π",""),
FIND("-",SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("π",999)),999),"π","")),9^9))),1)+0),
MID(MID(SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("π",999)),999),"π",""),
FIND("-",SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("π",999)),999),"π","")),9^9),
FILTER(COLUMN($1:$1),COLUMN($1:$1)<=LEN(MID(SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("π",999)),999),"π",""),
FIND("-",SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("π",999)),999),"π","")),9^9))),1),""))
Let Function
(use this if you can)
=LET(zText,SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("π",999)),999),"π",""),
TEXTJOIN("",TRUE,IF(ISNUMBER(MID(MID(zText,FIND("-",zText),9^9),
FILTER(COLUMN($1:$1),COLUMN($1:$1)<=LEN(MID(zText,FIND("-",zText),9^9))),1)+0),
MID(MID(zText,FIND("-",zText),9^9),
FILTER(COLUMN($1:$1),COLUMN($1:$1)<=LEN(MID(zText,FIND("-",zText),9^9))),1),"")))
VBA Custom Function
You could also use a VBA custom function to accomplish what you want.
Function getNumbersAfterCharcter(aCell As Range, aCharacter As String) As String
    Const errorValue = "#NoValuesInText"
    Dim i As Long, theValue As String
    ' Walk the text from right to left, collecting digits
    For i = Len(aCell.Value) To 1 Step -1
        theValue = Mid(aCell.Value, i, 1)
        If IsNumeric(theValue) Then
            getNumbersAfterCharcter = theValue & getNumbersAfterCharcter
        ElseIf theValue = aCharacter Then
            ' Stop at the last occurrence of the marker character
            Exit For
        End If
    Next i
    ' Exit For (not Exit Function) so the error value is also set when no digits were found
    If getNumbersAfterCharcter = "" Then getNumbersAfterCharcter = errorValue
End Function
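For checking expected results outside Excel, the rule stated in the question ("only the numbers directly after V-", using the last V- when there are several) can be sketched with a regular expression. This is a minimal Python sketch, not part of the answers above; the function name digits_after_last_v is hypothetical, and it assumes the wanted number is the unbroken run of digits immediately following the last V-.

```python
import re

def digits_after_last_v(value):
    """Return the digit run that directly follows the last 'V-' in value.

    Hypothetical helper: returns an empty string when no 'V-<digits>' is found.
    """
    # findall returns the captured digit runs, in order of appearance
    matches = re.findall(r"V-(\d+)", value)
    return matches[-1] if matches else ""

print(digits_after_last_v("SV-51140r3_rule, V-4407..."))  # 4407
print(digits_after_last_v("SV-245744r822811_rule"))       # 245744
```

Because the match stops at the first non-digit, the digits after the r suffix (822811 in the second sample) are not included.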
With ActiveSheet
.Range("T26:T31").NumberFormat = "_(£* #,##0.00_);_(£* (#,##0.00);_(£* " - "??_);_(#_)"
I am trying to set the format of a cell range to Accounting. I looked at the number format (above) and tried setting it to that, but I get a Type Mismatch error.
I also tried Debug.Print Application.ActiveCell.NumberFormatLocal to find out how Excel reads it and copied that in, but still had no luck.
Anyone got any ideas?
You need to double any quotation marks inside the format code, so:
.Range("T26:T31").NumberFormat = "_(£* #,##0.00_);_(£* (#,##0.00);_(£* "" - ""??_);_(#_)"
I have a text field in a table where I need to substitute phone numbers where applicable.
For example the text field could have:
Call me on 08588812885 immediately
Call me on 07525812845
I need assistance please contact me
Good service
Sometimes a phone number will be in the text but not always and the phone number entered will always be different.
Is there a measure I can use to replace the phone numbers with no text?
Ideally the solution would be in Power BI, but it could also be done in the raw data using Excel or VBA.
A regular expression in VBA (Excel) or Python (Power BI) is a straightforward solution.
I have never used Power BI with Python before, but I managed to write the following Python script.
In the Power BI transformation steps I created a new column that copies the [message] column and named it [noPhoneNumber]; the next step then runs this Python script:
import re

def removePhone(x):
    # Replace any run of 10 or 11 digits with a placeholder
    return re.sub(r'\d{10,11}', "**number removed**", x)

# Power BI passes the current table to the script as the pandas DataFrame `dataset`
dataset["noPhoneNumber"] = dataset["noPhoneNumber"].apply(removePhone)
so column "noPhoneNumber"
Call me on 08588812885 immediately
Call me on 07525812845
I need assistance please contact me
Good service
becomes
Call me on **number removed** immediately
Call me on **number removed**
I need assistance please contact me
Good service
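The before/after listing above can be reproduced without Power BI. The following is a minimal standalone sketch that applies the same \d{10,11} pattern to the four sample rows from the question; the name remove_phone and the plain list of strings are assumptions made for illustration, standing in for the Power BI column.

```python
import re

# Same pattern as in the answer: any run of 10 or 11 digits
PHONE_PATTERN = re.compile(r"\d{10,11}")

def remove_phone(text):
    """Replace each 10-11 digit run in text with a placeholder (hypothetical helper)."""
    return PHONE_PATTERN.sub("**number removed**", text)

# Sample rows taken from the question
rows = [
    "Call me on 08588812885 immediately",
    "Call me on 07525812845",
    "I need assistance please contact me",
    "Good service",
]

for row in rows:
    print(remove_phone(row))
```

Rows without a matching digit run pass through unchanged, which is why no separate "does this row contain a number" check is needed.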
In VBA, preferably create a UDF (user-defined function) rather than a subroutine; a subroutine would be too error-prone for this kind of problem.
[Added]
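If you need an Excel-based solution, you can create a UDF like so:
(remember that this uses early binding: in the VBA editor, add a reference to Microsoft VBScript Regular Expressions 5.5 under Tools > References so that VBScript_RegExp_55.RegExp is available)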
Function removePhoneNumber(text As String, Optional replacement As String = "**number removed**") As String
    Dim regex As New RegExp
    regex.Pattern = "\d{10,11}"
    regex.Global = True ' replace every match, not only the first one
    removePhoneNumber = regex.Replace(text, replacement)
End Function
...and then use it as an Excel function like so:
=removePhoneNumber(A2),
=removePhoneNumber(A3)
and so on...
A simple VBA function alternative
Function removePhone(s As String) As String
    Const DELIM As String = " "
    Dim i As Long, tokens As Variant
    tokens = Split(s, DELIM)
    For i = LBound(tokens) To UBound(tokens)
        If IsNumeric(tokens(i)) Then
            tokens(i) = "*Removed*" ' << change to your needs
            Exit For ' assuming a single phone number per string
        End If
    Next
    removePhone = Join(tokens, DELIM)
End Function
You can do this in Power Query. Create a custom column with the code below. I have assumed the column is named comments; please adjust this to your column name.
if Text.Length(Text.Select([comments], {"0".."9"})) = 11
then
    Text.Replace(
        [comments],
        Text.Select([comments], {"0".."9"}),
        ""
    )
else [comments]
Here is the output below. You can also replace phone numbers with other text, such as ####, to make them anonymous.
NOTE
This only works if the string contains exactly one number and that number is 11 digits long (you can adjust the length in the code as required).
This will not work if there is more than one number in the string.
If the string contains one number whose length is not 11, the whole string is kept as is.
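The limitations in the note follow directly from how the custom column works: it collects every digit in the string, and only acts when the total is exactly 11. A minimal Python sketch of the same logic makes this easier to see; it is an illustration of the approach, not Power Query code, and the name strip_single_number and its length parameter are hypothetical.

```python
def strip_single_number(text, length=11):
    """Mimic the custom column: remove the number only if the string's
    digits, taken together, total exactly `length` characters."""
    # Counterpart of Text.Select([comments], {"0".."9"}): keep digits only
    digits = "".join(ch for ch in text if ch in "0123456789")
    if len(digits) == length:
        # Counterpart of Text.Replace: remove the digit run from the text
        return text.replace(digits, "")
    return text

print(repr(strip_single_number("Call me on 07525812845")))    # number removed
print(repr(strip_single_number("Room 5, call 07525812845")))  # 12 digits in total, left as is
```

In the second call the extra digit 5 pushes the total to 12, so the string is returned unchanged even though it contains a phone number.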
For example, suppose I have this Excel file.
Then I manually copy the values from Excel into a do-file like this:
replace A = 1 if B>=1 & B<=6
replace A = 2 if B>=23 & B<=2
replace A = 3 if B>=3 & B<=1
replace A = 4 if B>=5 & B<=3
If this wasn't clear, please see this image to see what I am doing.
But there could be actually hundreds of lines.
How can I write a short piece of code that imports the Excel file, and another that replaces the manual lines I have written?
So the goal here is just to make my code succinct.
You can import this file with import excel. Let's suppose the headers are A and B and the import produces those as numeric variables. Then the text of a new do-file is contained within
gen text = "replace A = " + string(_n) + " if inrange(B, " + string(A) + "," + string(B) + ")"
which you must export and then run on your real data.
Not tested. I'd also suggest considering doing this in your favourite text editor.
Note that many of your comparisons in your example will always be false.
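The same idea (turn each row of bounds into one line of do-file text) can be sketched in a general-purpose language, which also makes the "always false" remark concrete. This is a minimal Python sketch under stated assumptions: the bounds are typed in by hand from the question's example rather than read from the Excel file, the function names are hypothetical, and the tested variable is taken to be B as in the question's replace lines.

```python
def make_do_lines(bounds, target="A", tested="B"):
    """Build one Stata `replace` line per (low, high) pair; row n assigns n."""
    lines = []
    for n, (low, high) in enumerate(bounds, start=1):
        lines.append(f"replace {target} = {n} if inrange({tested}, {low}, {high})")
    return lines

def always_false_rows(bounds):
    """Row numbers whose range is empty because the lower bound exceeds the upper."""
    return [n for n, (low, high) in enumerate(bounds, start=1) if low > high]

# Bounds copied from the example in the question
bounds = [(1, 6), (23, 2), (3, 1), (5, 3)]

print("\n".join(make_do_lines(bounds)))
print(always_false_rows(bounds))
```

With the example values, only the first row has a lower bound that does not exceed its upper bound, so rows 2 to 4 can never match.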
I'm trying to convert the following text to a decimal number in Excel 2003:
"93⅛"
The output should be: 93.125
I've gotten this to work with ¼, ½, ¾ by using the Replace function in VBA:
For example, this works:
cur_cell = Replace(cur_cell, "½", " 1/2")
However, the ⅛ and family characters are not supported in the VBA editor. They display as ??. Instead, I tried to replace the unicode value directly:
cur_cell = Replace(cur_cell, " & ChrW$(&H215B) & ", " 1/8")
But this doesn't work.
Is there a good way to convert these strings to numbers that I can use?
The correct syntax is:
cur_cell = Replace(cur_cell, ChrW$(&H215B), " 1/8")
Your example was saying: replace the string consisting of a space, an ampersand, a space [etc.] with 1/8. Clearly that's not what you want to do!
I'd actually recommend:
cur_cell.Value = Replace(Replace(cur_cell.Value, ChrW$(&H215B), ".125")," ","")
to circumvent Excel's automatic replacement of fractions. I just don't like to rely on that kind of automatic stuff. Why not write it as a decimal number straight off? Also, I like explicitly referring to the cell's .Value property as opposed to relying on it being the default property.
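The "write it as a decimal straight off" idea generalizes to the whole family of fraction characters, since Unicode records a numeric value for each of them. The following is a minimal Python sketch of that idea, not VBA; the function name to_decimal is hypothetical, and it assumes the input consists only of ASCII digits, optional spaces, and single fraction characters such as U+215B.

```python
import unicodedata

def to_decimal(text):
    """Convert text such as '93' followed by U+215B into a number (hypothetical helper)."""
    whole = ""
    fraction = 0.0
    for ch in text:
        if ch in "0123456789":
            whole += ch  # ordinary digits form the whole part
        elif not ch.isspace():
            # unicodedata.numeric gives the value of a fraction character,
            # and raises ValueError for characters that have no numeric value
            fraction += unicodedata.numeric(ch)
    return int(whole or "0") + fraction

print(to_decimal("93\u215b"))  # 93.125
```

The \u215b escape is the same code point that ChrW$(&H215B) produces in the VBA answer.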