Converting html special character in excel - excel

Could anyone please suggest a worksheet function or formula that converts HTML special characters to HTML entities? Thanks.
E.g.
™ to &trade;
® to &reg;

The answer depends on which of two cases applies:
Do you only need to convert a certain set of these special characters?
Do you need to convert all supported characters?
Answer 1:
Public Function ConvertHTMLTag(data As String) As String
    ' Replace each special character with its HTML entity
    data = Replace(data, "™", "&trade;")
    data = Replace(data, "®", "&reg;")
    ConvertHTMLTag = data
End Function
Answer 2:
Repeat for all chars in http://www.webmonkey.com/2010/02/special_characters/
To make this a bit easier, I would put this list into an Excel sheet in two columns: column A for the special character and column B for its HTML entity.
Write a formula in a third column to generate the code for you:
="data = Replace(data, "&CHAR(34)&A1&CHAR(34)&", "&CHAR(34)&B1&CHAR(34)&")"
Once Excel has generated the VBA lines, simply copy and paste them into the function above.
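Outside Excel, the same lookup-table idea can be sketched in Python, where the standard library already ships the character-to-entity table as html.entities.codepoint2name. This is only an illustration of the technique, not part of the original answer; the function name to_html_entities is made up for the example, and leaving ASCII characters untouched is an assumption.

```python
from html.entities import codepoint2name

def to_html_entities(text: str) -> str:
    """Replace non-ASCII characters with named (or numeric) HTML entities."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            # Plain ASCII is left untouched (assumption made for this sketch)
            out.append(ch)
        elif cp in codepoint2name:
            # A named entity exists, e.g. 0x2122 -> "trade"
            out.append(f"&{codepoint2name[cp]};")
        else:
            # No named entity: fall back to a numeric character reference
            out.append(f"&#{cp};")
    return "".join(out)

print(to_html_entities("Acme™ and Widget®"))
```

The numeric fallback means characters missing from the table are still encoded rather than silently passed through.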

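Use this function to go the other way and decode entities in a string back to characters. It requires a reference to Microsoft XML, v6.0; note that an XML parser only knows the five predefined entities (&amp;, &lt;, &gt;, &quot;, &apos;) and numeric character references.
Function HTMLToCharCodes(ByVal s As String) As String
    With New MSXML2.DOMDocument60
        .LoadXML "<p>" & s & "</p>"
        HTMLToCharCodes = .SelectSingleNode("p").nodeTypedValue
    End With
End Function
Input: &amp;, return: &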
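For comparison, decoding entities back to characters in Python is a one-liner with the standard library's html.unescape, which also understands named HTML entities such as &trade;. This is a sketch for illustration; the wrapper name decode_entities is hypothetical.

```python
import html

def decode_entities(text: str) -> str:
    """Turn HTML entities (named or numeric) back into characters."""
    return html.unescape(text)

print(decode_entities("&amp;"))
print(decode_entities("Acme&trade; &#174;"))
```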

Related

Replace non-printable characters with " (Inch sign) VBA Excel

I need to replace non-printable characters with " (inch sign).
I tried Excel's CLEAN function and other UDFs, but they just remove the characters rather than replace them.
Note: the non-printable characters are highlighted in blue in the above photo, and their position within the cells is random.
This is a sample string file: Link
The expected correct output should be 12"x14" LPG . OUTLET OCT-SEP# process
Thanks in advance for any useful comments and answers.
As per my comment, you can try:
=SUBSTITUTE(A1,CHAR(25)&CHAR(25),CHAR(34))
Or, in VBA:
[A1].Replace What:=Chr(25) & Chr(25), Replacement:=Chr(34), LookAt:=xlPart
Where [A1] is a placeholder for the Range object you would want to use, with proper and absolute referencing.
With ms365 newest functions, we could also use:
=TEXTJOIN(CHAR(34),,TEXTSPLIT(A1,CHAR(25)))
You can use regular expressions within a UDF to create a flexible method to replace "bad" characters when you don't know exactly what they are.
In the UDF below, I show two pattern options, but others are possible:
one replaces all characters with a character code >127;
the second replaces all characters with a character code >255.
Option Explicit
Function ReplaceBadChars(str As String, replWith As String) As String
Dim RE As Object
Set RE = CreateObject("Vbscript.Regexp")
With RE
.Pattern = "[\u0080-\uFFFF]" 'to replace all characters with code >127 or
'.Pattern = "[\u0100-\uFFFF]" 'to replace all characters with code >255
.Global = True
ReplaceBadChars = .Replace(str, replWith)
End With
End Function
On the worksheet you can use, for example:
=ReplaceBadChars(A1,"""")
Or you could use it in a macro if you wanted to process a column of data without adding an extra column.
Note: I am uncertain whether there might be an efficiency difference from using a smaller negated character class (e.g. [^\x00-\x7F]) instead of the character class I showed in the code. But if execution as written seems slow, I'd try this change.
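The negated-character-class idea can be sketched in Python as follows. This is an illustration under assumptions, not the answer's code: the pattern [^\x20-\x7E] treats everything outside printable ASCII (control characters as well as non-ASCII) as "bad", and a whole run of bad characters is collapsed into a single replacement, which matches the doubled control characters in the question's sample. The name replace_bad_chars is hypothetical.

```python
import re

# Anything outside printable ASCII (space through tilde) counts as "bad".
# The trailing + collapses a run of bad characters into one replacement.
NON_PRINTABLE_RUN = re.compile(r"[^\x20-\x7E]+")

def replace_bad_chars(text: str, repl: str = '"') -> str:
    """Replace each run of non-printable or non-ASCII characters with repl."""
    return NON_PRINTABLE_RUN.sub(repl, text)

print(replace_bad_chars("12\x19\x19x14\x19\x19 LPG"))
```

Dropping the + from the pattern would replace every bad character individually instead, which is closer to what the UDF above does.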
You can try this:
Cells.Replace What:="[The character to replace]", Replacement:=""""

Filter phone numbers from open text field - Power BI, excel, VBA

I have a text field in a table where I need to substitute phone numbers where applicable.
For example the text field could have:
Call me on 08588812885 immediately
Call me on 07525812845
I need assistance please contact me
Good service
Sometimes a phone number will be in the text but not always and the phone number entered will always be different.
Is there a measure I can use to replace the phone numbers with no text?
Ideally the solution would be in Power BI, but it can also be done in the raw data using Excel or VBA.
A regular expression in VBA (Excel) or Python (Power BI) is a straightforward solution.
I have never used Power BI with Python before, but I managed to write the following Python script.
In the Power BI transformation steps I created a new column that copies the [message] column and named it [noPhoneNumber]; the next step then ran this Python script:
import re

def removePhone(x):
    # Replace any run of 10 or 11 digits with a placeholder
    return re.sub(r'\d{10,11}', "**number removed**", x)

# 'dataset' is the pandas DataFrame that Power BI passes to the script
dataset["noPhoneNumber"] = dataset["noPhoneNumber"].apply(removePhone)
So the column "noPhoneNumber"
Call me on 08588812885 immediately
Call me on 07525812845
I need assistance please contact me
Good service
becomes
Call me on **number removed** immediately
Call me on **number removed**
I need assistance please contact me
Good service
In VBA, it is preferable to create a UDF (user-defined function) rather than a subroutine, which would be too error-prone for this kind of problem.
[Added]
If you need an Excel-based solution, you can create a UDF like so
(remember to add a reference to Microsoft VBScript Regular Expressions 5.5 in the VBA editor, for early binding of VBScript_RegExp_55.RegExp):
Function removePhoneNumber(text As String, Optional replacement As String = "**number removed**") As String
    Dim regex As New RegExp
    regex.Pattern = "\d{10,11}"
    regex.Global = True ' replace every match, not just the first
    removePhoneNumber = regex.Replace(text, replacement)
End Function
...and then use it in an Excel formula like so:
=removePhoneNumber(A2)
=removePhoneNumber(A3)
and so on...
A simple VBA function alternative
Function removePhone(s As String) As String
Const DELIM As String = " "
Dim i As Long, tokens As Variant
tokens = Split(s, DELIM)
For i = LBound(tokens) To UBound(tokens)
If IsNumeric(tokens(i)) Then
tokens(i) = "*Removed*" ' << change to your needs
Exit For ' assuming a single phone number per string
End If
Next
removePhone = Join(tokens, DELIM)
End Function
You can do this in Power Query. Create a custom column with the code below. I have assumed the column name is comments; please adjust it to your column name.
if Text.Length(Text.Select([comments], {"0".."9"})) = 11
then
    Text.Replace(
        [comments],
        Text.Select([comments], {"0".."9"}),
        ""
    )
else [comments]
Here is the output below. You can also replace phone numbers with other text, like ####, to make it anonymous.
NOTE
This will only work if there is only one number in the string, with length 11 (you can adjust the length in the code as required).
This will not work if there is more than one number in the string.
If there is one number in the string but its length is not 11, the whole string is kept as in the original.

Finding multiple instance of a variable length string in a string

I'm trying to extract my parameters from my SQL query to build my XML for an SSRS report. I want to be able to copy/paste my SQL into Excel, look through the code and find all instances of '#' and the appropriate parameter attached to it. These parameters will ultimately be copied and pasted to another sheet for further use. So for example:
where DateField between #FromDate and #ToDate
and (BalanceField between #BalanceFrom and #BalanceTo
OR BalanceField = #BalanceFrom)
I know I can use InStr to find the starting position of the first '#' in a line, but how do I then go about extracting the rest of the parameter name (which varies in length) and also, in the first two lines of the example, finding the second parameter and extracting it despite its variable length? I've also tried the .Find method, with which I was able to copy the whole line over but not just the parameters.
I might approach this problem like so:
Remove characters that are not surrounded by spaces but do not belong. In your example, the parentheses need to be removed.
Split the text using the space as a delimiter.
For each element in the split array, check the first character.
If it is "#", then the parameter is found, and it is the entire value in that part of the array.
My user-defined function looks something like this:
Public Function GetParameters(ByRef rsSQL As String) As String
Dim sWords() As String
Dim s As Variant
Dim sResult As String
'remove parentheses and split at space
sWords = Split(Replace(Replace(rsSQL, ")", ""), "(", ""), " ")
'find parameters
For Each s In sWords
If Left$(s, 1) = "#" Then
sResult = sResult & s & ", "
End If
Next s
'remove extra comma from list
If sResult <> "" Then
sResult = Left$(sResult, Len(sResult) - 2)
End If
GetParameters = sResult
End Function
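As a point of comparison, the same extraction can be sketched with a regular expression in Python. This is an alternative technique to the split-based UDF above, not a translation of it: the pattern #\w+ picks up each '#' followed by letters, digits or underscores, so parentheses and line breaks need no special handling. The function name get_parameters and the choice to drop duplicates are assumptions made for the example.

```python
import re

def get_parameters(sql: str) -> list:
    """Return the distinct #parameters in sql, in order of first appearance."""
    seen = []
    for name in re.findall(r"#\w+", sql):
        if name not in seen:
            seen.append(name)
    return seen

sql = """where DateField between #FromDate and #ToDate
and (BalanceField between #BalanceFrom and #BalanceTo
OR BalanceField = #BalanceFrom)"""

print(", ".join(get_parameters(sql)))
```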

How to handle Apostrophes ( ' ) using XPATH in QTP

Check this code snippet.
Please refer to the code below.
rv = "Are you 56' taller ?"
If I pass 20 fields, i.e. until [rv = "Are you 56' taller ?"].
It's not working because ' (apostrophe) is used to comment in QTP.
How do I handle ' (apostrophe) in XPath using QTP?
Code Snippet:
rv = Replace (rv,"'", "\'")
rv = LEFT(rv,50)
If SVAL = "Yes" Then
Set oobj = Browser("xyz").Page("abc").WebElement("xpath:=//div[contains(text(),'"& rv &"')]/../..//label[starts-with(text(),'Yes')]")
oobj.Click
oobj.Click
i = i+1
End If
I really appreciate your reply.
Try with the character code chr(39) for apostrophe as shown below:
"Are you 56" & chr(39) & " taller ?"
As others mentioned, this is not because ' is a comment character in VBScript (not just QTP) but because you're ending the string too early.
You use single quotes for the string to compare to in the XPath, and the apostrophe then closes that string prematurely. You should instead use double quotes there, so that the apostrophe doesn't end the string.
To get a double quote inside a string in VBScript, write it twice: "Like ""this"" for example".
So your XPath should look like this:
"//div[contains(text(),""Are you 56' taller ?"")]"
Rather than this:
"//div[contains(text(),'Are you 56' taller ?')]"
Or using your example:
Browser("xyz").Page("abc").WebElement("xpath:=//div[contains(text(),"""& rv &""")]/../..//label[starts-with(text(),'Yes')]")
(Note this has been tested and works)
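The quoting rule can be generalized: pick whichever quote character the text does not contain, and fall back to XPath's concat() when it contains both. The following Python sketch illustrates that rule only; it is not QTP code, and the helper name xpath_literal is hypothetical.

```python
def xpath_literal(value: str) -> str:
    """Return value as an XPath 1.0 string literal, whatever quotes it holds."""
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    # Both quote types present: split on apostrophes and glue the pieces
    # back together with concat(), quoting each apostrophe with double quotes.
    parts = value.split("'")
    return "concat(" + ", \"'\", ".join(f"'{p}'" for p in parts) + ")"

rv = "Are you 56' taller ?"
print(f"//div[contains(text(),{xpath_literal(rv)})]")
```

XPath 1.0 has no escape character inside string literals, which is why the concat() fallback is needed for the mixed case.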
Use &apos; rather than (') so that the string can be properly processed.
This has nothing to do with the ' being the comment character. This is normal working code:
Msgbox "'I love deadlines. I like the whooshing sound they make as they fly by.' Douglas Adams"
Your code results in an error because some characters need to be escaped, like <, >, & and your infamous '. To enter the line above correctly into an XML tag you need to do this:
htmlEscaped = "&apos;I love deadlines. I like the whooshing sound they make as they fly by.&apos; Douglas Adams"
Here you can find an overview of the most common characters that need escaping (although this is not totally true: if you are using Unicode/UTF-8 encoding, some characters will parse just fine).
Unfortunately VBScript does not have a native function that escapes HTML, like the Escape function for URLs. Only if you are on an ASP server can you use Server.HTMLEncode, but that is not the case for you.
To generalize HTML escaping (treat everything as special except for the most common characters) you can use a script like this:
Function HTMLEncode(ByVal sVal)
    Dim sReturn, i, ch, oRE
    sReturn = ""
    If ((TypeName(sVal)="String") And (Not IsNull(sVal)) And (sVal<>"")) Then
        ' Create the regular expression once instead of once per character
        Set oRE = New RegExp
        oRE.Pattern = "[ a-zA-Z0-9]"
        For i = 1 To Len(sVal)
            ch = Mid(sVal, i, 1)
            ' Encode everything except spaces, letters and digits
            If (Not oRE.Test(ch)) Then
                ch = "&#" & Asc(ch) & ";"
            End If
            sReturn = sReturn & ch
        Next
        Set oRE = Nothing
    End If
    HTMLEncode = sReturn
End Function
It could be improved a bit (you'll notice that passing objects into this function will result in an error) and made more specific: the regular expression could match more characters. I also do not know how it performs; regular expressions can be slow if used incorrectly, but it serves as an example.
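For readers who want to see the same "encode everything except the safe characters" approach outside VBScript, here is a minimal Python sketch. It mirrors the whitelist above but is not the answer's code: the name html_encode is hypothetical, non-string input simply returns an empty string, and ord() yields the Unicode code point, so the numeric references are Unicode values.

```python
import re

# Characters that are passed through unchanged: space, letters and digits
SAFE = re.compile(r"[ a-zA-Z0-9]")

def html_encode(value) -> str:
    """Encode every character outside the safe set as a numeric reference."""
    if not isinstance(value, str):
        return ""
    return "".join(
        ch if SAFE.fullmatch(ch) else f"&#{ord(ch)};"
        for ch in value
    )

print(html_encode("'I love deadlines.' <Douglas Adams>"))
```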

Turn Excel line break into <br>

A client sent me a huge list of product names and descriptions. The description cells have text wrap and many line breaks. I need to import this into a MySQL database, which I do through Navicat Premium.
The problem is that the description cell is used as the HTML description of each product page.
Is there a way to replace Excel's line breaks with <br>, either in the same Excel file or with a PHP function?
A little bit of ASCII coding will go a long way.
Set up the find/replace dialogue (Ctrl-H). In the Find field, hold down the Alt key and type 010 from the numeric key pad. (This lets you find a linefeed character.) In the replace field, put your <br>.
Or use a VBA function to replace the line breaks in a string.
Insert a module and paste this:
Function LineFeedReplace(ByVal str As String) As String
    Dim strReplace As String
    strReplace = "<br>"
    ' Replace CR+LF pairs first so each Windows line break yields a single <br>
    LineFeedReplace = Replace(Replace(Replace(str, vbCrLf, strReplace), vbCr, strReplace), vbLf, strReplace)
End Function
If cell A1 contains a string with line breaks, then =LineFeedReplace(A1) will return the string with every line break replaced by <br>.
First make sure you account for both CR and LF, which tend to come together. Their character codes are 13 and 10, so you will need a formula that handles both. I successfully used =SUBSTITUTE(A3,CHAR(13),"<br>") to convert a cell of long text in Excel, replacing invisible breaks with the 'br' tag. Since you can't tell exactly what kind of line break you have, you can also try it with code 10: =SUBSTITUTE(A3,CHAR(10),"<br>")
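If the conversion is done in a script rather than in the workbook, the same CR/LF handling can be sketched in Python. The point it illustrates is ordering: the CR+LF pair is matched before a lone CR or LF, so a Windows line break becomes one <br> and not two. The function name line_breaks_to_br is made up for the example.

```python
import re

def line_breaks_to_br(text: str) -> str:
    """Replace CRLF, CR or LF line breaks with a single <br> each."""
    # Alternation is tried left to right, so \r\n wins over a lone \r
    return re.sub(r"\r\n|\r|\n", "<br>", text)

print(line_breaks_to_br("Line one\r\nLine two\nLine three"))
```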

Resources