Processing a Byte Order Mark - excel

I've been writing code to process XML downloaded via a web service API. It was going fine until one query returned some mysterious characters before the root element.
After contacting support, I got the following message...
"The ABS.Stat APIs resultant XML output are UTF-8 compliant. These characters are a UTF-8 Byte Order Mark designed to identify the xml as UTF-8. Hope this helps."
While waiting for their reply I carried on programming by simply starting my DOM processing at the opening tag (the first "<") with the following code...
Dim lgRootElementStart As Long
lgRootElementStart = InStr(1, hReq.ResponseText, "<")
Dim sgResponse As String
sgResponse = Mid(hReq.ResponseText, lgRootElementStart)
Dim xmlDoc As New MSXML2.DOMDocument
If Not xmlDoc.LoadXML(sgResponse) Then
etc. etc. etc.
All seems to be well, the data is deciphered and displayed ok.
But now that I know what those characters are, is there anything I should do with those characters?
Or to put it another way, is there anything I can do with those characters to make my Excel application more reliable? That is, now that I know the XML is UTF-8, how should I process it differently?
What should I do if the BOM gives UTF-16?

Well, it seems that the BOM is more of a nuisance than a help, but I placed code in my application to check that any characters received before the XML root element are a UTF-8 BOM. If they are not, an error is raised. I'm not expecting this to be a problem any more, but if I ever see the error I will have to re-analyse what's going on. Hopefully that will never happen.
Code is...
Public Const BOM_UTF8 As String = "ï»¿" ' bytes EF BB BF as they appear when read as Windows-1252 text
and
If lgRootElementStart > 1 Then
    ' Anything before the root element must be exactly the UTF-8 BOM
    If Left(hReq.ResponseText, lgRootElementStart - 1) <> BOM_UTF8 Then
        Err.Raise ERROR_SHOULD_NEVER_HAPPEN, sFunctionName, _
            "Non UTF8 BOM found. " _
            & "BOM is ..." & ConvertToHex(Left(hReq.ResponseText, lgRootElementStart - 1)) _
            & ", correct BOM is ... " & ConvertToHex(BOM_UTF8)
    End If
End If
One quote from a link in the comments says..."Encodings should be known, not divined". Well with this code I know it's UTF8 if I get it.
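On the "what if the BOM gives UTF-16?" part of the question: one option is to look at the raw bytes rather than the decoded text and pick the decoder from the BOM. Below is a minimal sketch of that idea in Python rather than VBA. The function name strip_bom is made up for illustration, and it assumes you can get at the undecoded response body.

```python
import codecs
from typing import Tuple

# Known BOMs, longest first so that UTF-32 LE (FF FE 00 00)
# is not mistaken for UTF-16 LE (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def strip_bom(raw: bytes, default: str = "utf-8") -> Tuple[str, str]:
    """Return (text, encoding), removing a leading BOM if one is present.

    Hypothetical helper: without a BOM it falls back to `default`.
    """
    for bom, encoding in _BOMS:
        if raw.startswith(bom):
            return raw[len(bom):].decode(encoding), encoding
    return raw.decode(default), default


text, encoding = strip_bom(codecs.BOM_UTF8 + b"<root/>")
print(encoding, text)
```

With this shape a UTF-16 response is decoded correctly instead of raising an error, and the text handed to the XML parser never contains the BOM.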

Related

VB .Net when exporting to CSV issue when viewing in MS Excel

I have encountered something really weird. When exporting to CSV, my top line shows the quotation marks, yet the lines below don't.
I use UTF-8 encoding and manually add the double quotation marks so that each value is enclosed in quotation marks.
The code being used is:
Dim fs As New IO.FileStream(GenericValueEditorExportFilename.Value, IO.FileMode.Create)
Dim writer As New IO.StreamWriter(fs, Encoding.UTF8)
fs.Write(Encoding.UTF8.GetPreamble(), 0, Encoding.UTF8.GetPreamble().Length)
....
....
....
While reader.Read
If reader("TargetLanguageID") = targetLanguageID Then
writer.WriteLine(Encode(reader("SourcePhrase")) & ", " & Encode(reader("TargetPhrase")))
End If
....
....
....
Friend Shared Function Encode(ByVal value As String) As String
Return ControlChars.Quote & value.Replace("""", """""") & ControlChars.Quote
End Function
The result when displayed in Excel is shown at (https://ibb.co/ntMYdw).
When I open the file in Notepad++, each line is displayed differently. Why does the first row display the quotation marks while the second does not? The Notepad++ result is shown at (https://ibb.co/fMkWWG).
Excel is treating the first line as headers.
https://stackoverflow.com/a/24923167/2319909
So the issue was caused by the BOM that I wrote manually at the start of the file to set the encoding:
fs.Write(Encoding.UTF8.GetPreamble(), 0, Encoding.UTF8.GetPreamble().Length)
Removing this line resolves my issue, and the file remains in the desired UTF-8 encoding because that is set on the StreamWriter. A StreamWriter created with Encoding.UTF8 already writes the preamble itself, so there is no need to add the BOM by hand.
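To see why a second BOM makes the quotes show up in the first row only, here is a small illustration in Python (not VB.NET). It builds the same CSV twice, once with a single BOM and once with an extra hand-written one, and reads the first cell back. The sample rows and the helper names are invented for the example.

```python
import codecs
import csv
import io

rows = [("Hello", "Bonjour"), ('Say "hi"', 'Dis "salut"')]


def to_csv_text(data):
    """Quote every field and double embedded quotes, like Encode() in the question."""
    buf = io.StringIO(newline="")
    csv.writer(buf, quoting=csv.QUOTE_ALL).writerows(data)
    return buf.getvalue()


def first_cell(data: bytes) -> str:
    """Decode the bytes (dropping at most one BOM) and return the first field."""
    decoded = data.decode("utf-8-sig")
    return next(csv.reader(io.StringIO(decoded, newline="")))[0]


text = to_csv_text(rows)
good = text.encode("utf-8-sig")                   # the codec adds one BOM
bad = codecs.BOM_UTF8 + text.encode("utf-8-sig")  # hand-written BOM plus the codec's

print(repr(first_cell(good)))
print(repr(first_cell(bad)))
```

In the second file the first field no longer starts with a quotation mark (it starts with the leftover BOM character), so a CSV reader treats the quotes as literal text. Every later row is unaffected, which matches the symptom in the question.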
Something like this should work for you.
Dim str As New StringBuilder
For Each dr As DataRow In Me.NorthwindDataSet.Customers
For Each field As Object In dr.ItemArray
str.Append(field.ToString & ",")
Next
str.Replace(",", vbNewLine, str.Length - 1, 1)
Next
Try
My.Computer.FileSystem.WriteAllText("C:\temp\testcsv.csv", str.ToString, False)
Catch ex As Exception
MessageBox.Show("Write Error")
End Try

VBA - System does not support the specified encoding

Run-time error '-1072896658 (c00ce56e)': System does not support the specified encoding
I'm trying to pull pricing data from this website: http://web.tmxmoney.com/pricehistory.php?qm_symbol=^TTUT. I keep getting the error "Run-time error '-1072896658 (c00ce56e)': System does not support the specified encoding".
I've used the code provided below to pull HTML data from most websites. This is the only one that gives me this error. I think I may be getting the error because the website uses JavaScript, but I'm not sure. It definitely has something to do with the "tabs" layout of the webpage. Using the same code I can pull from the first tab, titled "Quote" (http://web.tmxmoney.com/quote.php?qm_symbol=^TTUT), but not from the other tabs.
Option Explicit
Sub TEST_PULL()
    Dim Look_String As String
    Dim Web_HTML As String
    Dim HTTP_OBJ As New MSXML2.XMLHTTP60
    Dim xa As Long
    Dim xb As Long
    ' Assumed: the request lines are missing from the post, so the URL
    ' from the question text is used here (synchronous GET)
    HTTP_OBJ.Open "GET", "http://web.tmxmoney.com/pricehistory.php?qm_symbol=^TTUT", False
    HTTP_OBJ.send
    Select Case HTTP_OBJ.Status
        Case 0: Web_HTML = HTTP_OBJ.responseText
        Case 200: Web_HTML = HTTP_OBJ.responseText ' THE ERROR IS CAUSED HERE
        Case Else: GoTo ERROR_LABEL
    End Select
    ' Keep at most 32767 characters starting at the marker text
    Look_String = "quote-tabs-content"
    xa = IIf(IsNumeric(Look_String), Look_String, InStr(Web_HTML, Look_String))
    xb = IIf(xa + 32767 <= Len(Web_HTML), 32767, Len(Web_HTML) - xa + 1)
    Web_HTML = Mid(Web_HTML, xa, xb)
ERROR_LABEL:
End Sub
Can someone please help me figure out
Why this is happening
How I can successfully pull that pricing data
It would be a huge help!!! Thanks!!!
It's not you, it's them.
The response headers for the page which is causing the error specify an encoding which doesn't exist: ISO-8559-1. ISO 8559 has nothing to do with text encoding - it actually relates to the sizing of clothes. This should almost certainly be ISO-8859-1 instead.
The quote page which is successfully being read has the correct ISO-8859-1 encoding.
To get around this issue, use the responseBody property which contains the raw bytes before decoding. The StrConv function can then attempt to convert those bytes into a Unicode string (although this might not produce correct results in all cases), like this:
Case 200: Web_HTML = StrConv(HTTP_OBJ.responseBody, vbUnicode)
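The same workaround (take the raw bytes and decode them yourself when the declared charset is not usable) can be sketched in Python as follows. This is an illustration only: decode_body is a hypothetical helper, and the choice of ISO-8859-1 as the fallback is an assumption based on what the page most likely meant.

```python
import codecs


def decode_body(raw: bytes, declared: str, fallback: str = "iso-8859-1") -> str:
    """Decode raw response bytes, tolerating an unknown charset label."""
    try:
        codecs.lookup(declared)   # raises LookupError for labels such as "ISO-8559-1"
        encoding = declared
    except LookupError:
        encoding = fallback       # assumed intent of the misspelled label
    # errors="replace" keeps going if a byte is invalid in the chosen encoding
    return raw.decode(encoding, errors="replace")


body = "Prix: 12,50 \u00a3".encode("iso-8859-1")
print(decode_body(body, "ISO-8559-1"))
```

Note that StrConv with vbUnicode converts using the system's default code page, so on a machine whose code page is not a Western European one the result may differ from what an explicit ISO-8859-1 decode would give.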

How to handle Apostrophes ( ' ) using XPATH in QTP

Please check the code snippet below.
rv = "Are you 56' taller ?"
If I pass 20 fields, i.e. everything up to [rv = "Are you 56' taller ?"], it does not work. I think this is because the apostrophe (') is used to start a comment in QTP.
How can I handle an apostrophe (') in XPath using QTP?
Code Snippet:
rv = Replace (rv,"'", "\'")
rv = LEFT(rv,50)
If SVAL = "Yes" Then
Set oobj = Browser("xyz").Page("abc").WebElement("xpath:=//div[contains(text(),'"& rv &"')]/../..//label[starts-with(text(),'Yes')]")
oobj.Click
oobj.Click
i = i+1
End If
I really appreciate your reply.
Try with the character code chr(39) for apostrophe as shown below:
"Are you 56" & chr(39) & " taller ?"
As others mentioned, this is not because ' starts a comment in VBScript (not just QTP) but because you're ending the string too early.
You use single quotes to delimit the string to compare against in the XPath, so the apostrophe in the text closes that string prematurely. Use double quotes there instead so that the apostrophe is treated as ordinary text.
To get a double quote inside a VBScript string, write it twice: "Like ""this"" for example".
So your XPath should look like this:
"//div[contains(text(),""Are you 56' taller ?"")]"
Rather than this:
"//div[contains(text(),'Are you 56' taller ?')]"
Or using your example:
Browser("xyz").Page("abc").WebElement("xpath:=//div[contains(text(),"""& rv &""")]/../..//label[starts-with(text(),'Yes')]")
(Note this has been tested and works)
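Switching the delimiter works as long as the text does not contain both kinds of quote. XPath 1.0 has no escape character inside string literals, so the usual fallback for that case is concat(). Here is a minimal sketch in Python (not VBScript) of a helper that picks the right form; xpath_literal is a made-up name.

```python
def xpath_literal(value: str) -> str:
    """Return an XPath 1.0 string literal (or concat() call) for `value`."""
    if "'" not in value:
        return "'" + value + "'"
    if '"' not in value:
        return '"' + value + '"'
    # Both quote types present: split on the double quotes and glue the
    # pieces back together with concat(), quoting each " as '"'.
    pieces = []
    for index, part in enumerate(value.split('"')):
        if index > 0:
            pieces.append("'\"'")
        if part:
            pieces.append('"' + part + '"')
    return "concat(" + ", ".join(pieces) + ")"


print("//div[contains(text(), %s)]" % xpath_literal("Are you 56' taller ?"))
print(xpath_literal("5' 6\" tall"))
```

The first call produces the double-quoted form from the answer above; the second shows the concat() form that is only needed when the text contains both ' and ".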
Use &apos; rather than (') so that the string can be properly processed.
Supporting evidence -> click here.
This has nothing to do with the ' being the comment character. This is normal working code:
Msgbox "'I love deadlines. I like the whooshing sound they make as they fly by.' Douglas Adams"
Your code results in an error because some characters need to be escaped, such as <, >, & and your infamous '. To enter the line above correctly into an XML tag you need to do this:
htmlEscaped = "&apos;I love deadlines. I like the whooshing sound they make as they fly by.&apos; Douglas Adams"
Here you can find an overview of the most common characters that need escaping (although this is not strictly true: if you are using Unicode/UTF-8 encoding, some characters will parse just fine).
Unfortunately VBScript does not have a native function that escapes HTML the way the Escape function does for URLs. If you are on an ASP server you can use Server.HTMLEncode, but that is not the case for you.
To generalize HTML escaping (treat everything as special except for the most common characters) you can use a script like this:
Function HTMLEncode(ByVal sVal)
    Dim sReturn, i, ch, oRE
    sReturn = ""
    If ((TypeName(sVal)="String") And (Not IsNull(sVal)) And (sVal<>"")) Then
        ' Characters matching this pattern are passed through unchanged
        Set oRE = New RegExp
        oRE.Pattern = "[ a-zA-Z0-9]"
        For i = 1 To Len(sVal)
            ch = Mid(sVal, i, 1)
            If (Not oRE.Test(ch)) Then
                ' AscW gives the Unicode value; the mask undoes its signed 16-bit result
                ch = "&#" & (AscW(ch) And &HFFFF&) & ";"
            End If
            sReturn = sReturn & ch
        Next
        Set oRE = Nothing
    End If
    HTMLEncode = sReturn
End Function
It could be improved a bit (you'll notice that passing objects into this function results in an error) and made more specific: the regular expression could match more characters. I also do not know how it performs; regular expressions can be slow if used incorrectly, but it serves as an example.
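For comparison, here is the same whitelist idea sketched in Python (not VBScript), next to the standard library's html.escape, which only touches the characters that are significant in markup. encode_all is a hypothetical name; the character class is the one used in the answer above.

```python
import html
import re


def encode_all(text: str) -> str:
    """Replace every character outside [ a-zA-Z0-9] with a numeric reference."""
    return re.sub(r"[^ a-zA-Z0-9]", lambda m: "&#%d;" % ord(m.group()), text)


sample = "Are you 56' taller ?"
print(encode_all(sample))              # whitelist approach: escapes ' and ?
print(html.escape("<b> & 'x'"))        # standard approach: escapes only < > & " '
print(html.unescape(encode_all(sample)) == sample)
```

The whitelist version escapes far more than strictly necessary, but its output is easy to reason about because only letters, digits and spaces are ever emitted unchanged.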

VBA PublishObjects. Add character formatting

I found the article about putting Excel cells into an email using the RangetoHTML function in VBA. It works like a charm, but now I'm facing a problem.
If there are umlauts (e.g. ü, ä, ö) in the cells, the result in the email shows strange symbols (e.g.: ä, …).
I looked at the written temp.htm file. At first glance the umlauts seem to be written correctly, but after looking through the file with a hex editor I found that the written symbols are not correct.
The function which writes the file is: PublishObjects.Add
So I hope someone can help me with this.
Edit: Added a testfile. Word and Office is needed.
Select the table and run the procedure SendMail.
You will always have problems with VBA, foreign characters and the web.
EDIT:
Because you can't separate the cell values from the HTML, the function below will unfortunately not work in this situation. BUT:
if you save a copy of the document with Western European (Windows) encoding, it will work.
(See comments below).
To do that, press "Save As"; there is a dropdown to the left of the Save button (Tools) which opens a dialog where you can change the encoding.
The image has been lifted from TechNet, and "always save web.." is not necessary.
EOF EDIT:
This is a function I have used. Unfortunately I can't remember who I got it from, but it's from the olden days of VBA and classic ASP.
Put your email cell formula into this function and it should work, because all the letters are HTML encoded. It's slow and adds a lot of overhead, but it will work.
Function HtmlEncode(ByVal inText As String) As String
    Dim i As Integer
    Dim sEnc As Integer
    Dim repl As String
    HtmlEncode = inText
    ' Walk backwards so replacements do not shift the positions still to visit
    For i = Len(HtmlEncode) To 1 Step -1
        sEnc = Asc(Mid$(HtmlEncode, i, 1))
        Select Case sEnc
            Case 32
                repl = "&nbsp;"
            Case 34
                repl = "&quot;"
            Case 38
                repl = "&amp;"
            Case 60
                repl = "&lt;"
            Case 62
                repl = "&gt;"
            Case 32 To 127
                'Numbers
            Case Else
                repl = "&#" & CStr(sEnc) & ";" 'Encode it all
        End Select
        If Len(repl) Then
            HtmlEncode = Left$(HtmlEncode, i - 1) & repl & Mid$(HtmlEncode, i + 1)
            repl = ""
        End If
    Next
End Function
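The reason this kind of function helps with the umlaut problem is that its output is pure ASCII, so it no longer matters which encoding the temp file or the mail client assumes. A minimal Python sketch of the same effect (not VBA; to_ascii_html is a made-up name) looks like this:

```python
import html


def to_ascii_html(text: str) -> str:
    """Escape markup characters, then turn every non-ASCII character
    into a numeric character reference so the result is pure ASCII."""
    escaped = html.escape(text, quote=True)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_ascii_html("Grüße & Käse"))
```

Unlike the VBA function, this sketch leaves ordinary spaces alone instead of turning them into non-breaking spaces.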

Excel VBA TextStream.writeline with Chinese characters

I am fairly new to VBA and am stumped on how to resolve the "Run-time error '5': Invalid procedure call or argument" error that I receive when executing this code. The cell in question contains Chinese characters; the code seems to work fine with the English alphabet. The stream is outputting to a text file (it should be an XML file in the future, but I still don't have all the correct formatting implemented).
Dim fso As New FileSystemObject, stream As TextStream
Set stream = fso.createTextFile("C:\Users\username\XMLs\" _
& WS_Src.Cells(c.Row, 5).Value & "_" & WS_Src.Cells(c.Row, 4).Value & "_Feature.xml", True)
...
stream.WriteLine "<title>" & vbCrLf & "<![CDATA[ " & WS_Src.Cells(c.Row, 6).Value & "]]>" & vbCrLf & "</title>" 'error is on this line
stream.Close
Thanks!
Chris
The syntax for the CreateTextFile method is object.CreateTextFile(filename[, overwrite[, unicode]]), where:
filename: Required. String expression that identifies the file to create.
overwrite: Optional. Boolean value that indicates if an existing file can be overwritten. The value is True if the file can be overwritten; False if it can't be overwritten. If omitted, existing files are not overwritten.
unicode: Optional. Boolean value that indicates whether the file is created as a Unicode or ASCII file. The value is True if the file is created as a Unicode file; False if it's created as an ASCII file. If omitted, an ASCII file is assumed.
You have omitted the last parameter, but the incoming text, being Chinese, is not plain ASCII. Pass True for the unicode parameter so that the file is created as Unicode; that should resolve the error.
By the way, a few other things in the code could still cause run-time errors.
Because you build the file name by joining cell values, make sure the resulting path contains no invalid characters.
Also, setting overwrite to True is not enough on its own: make sure the target folder already exists, otherwise the procedure will fail with another run-time error.
Hope this helps.
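The three points above (a text encoding that can hold Chinese characters, a clean file name, an existing folder) can be illustrated with a short Python sketch. This is not VBA; safe_filename is a hypothetical helper, the list of forbidden characters is the usual Windows one, and UTF-16 is used because that is what a "Unicode" text file normally means on Windows.

```python
import codecs
import os
import re
import tempfile


def safe_filename(name: str, replacement: str = "_") -> str:
    """Replace characters that Windows does not allow in file names."""
    return re.sub(r'[<>:"/\\|?*\x00-\x1f]', replacement, name)


title = "<title><![CDATA[ 中文标题 ]]></title>"

# An ASCII-only writer cannot represent the text at all.
try:
    title.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

with tempfile.TemporaryDirectory() as folder:
    target_dir = os.path.join(folder, "XMLs")
    os.makedirs(target_dir, exist_ok=True)          # make sure the folder exists
    path = os.path.join(target_dir, safe_filename("A:B_Feature.xml"))
    with open(path, "w", encoding="utf-16") as stream:  # writes a BOM, then UTF-16
        stream.write(title + "\n")
    with open(path, "rb") as stream:
        raw = stream.read()
    with open(path, encoding="utf-16") as stream:
        round_trip = stream.read()

print(ascii_ok, raw[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))
```

The file written this way starts with a UTF-16 BOM, which lets other programs recognise the encoding when they open it.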
