Getting HTML Source with Excel-VBA - string

I would like to direct an Excel VBA form to certain URLs, get the HTML source, and store that source in a string. Is this possible, and if so, how do I do it?

Yes. One way to do it is to use the MSXML DLL - and to do that you need to add a reference to the Microsoft XML library via Tools->References.
Here's some code that displays the content of a given URL:
Public Sub ShowHTML(ByVal strURL As String)
    On Error GoTo ErrorHandler
    Dim strError As String
    strError = ""
    Dim oXMLHTTP As MSXML2.XMLHTTP
    Set oXMLHTTP = New MSXML2.XMLHTTP
    Dim strResponse As String
    strResponse = ""
    With oXMLHTTP
        .Open "GET", strURL, False
        .send ""
        If .Status <> 200 Then
            strError = .statusText
            GoTo CleanUpAndExit
        Else
            ' The header often carries a suffix, e.g. "text/html; charset=utf-8",
            ' so test for the media type rather than for an exact match
            If InStr(1, .getResponseHeader("Content-Type"), "text/html", vbTextCompare) = 0 Then
                strError = "Not an HTML file"
                GoTo CleanUpAndExit
            Else
                strResponse = .responseText
            End If
        End If
    End With
CleanUpAndExit:
    On Error Resume Next ' Avoid recursive call to error handler
    ' Clean up code goes here
    Set oXMLHTTP = Nothing
    If Len(strError) > 0 Then ' Report any error
        MsgBox strError
    Else
        MsgBox strResponse
    End If
    Exit Sub
ErrorHandler:
    strError = Err.Description
    Resume CleanUpAndExit
End Sub

Just an addition to the above response. The question was how to get the HTML source which the stated answer does not actually provide.
Compare the contents of oXMLHTTP.responseText with the source code in a browser for URL "http://finance.yahoo.com/q/op?s=T+Options". They do not match and even the returned values are different. (This should be executed after hours to avoid changes during the trading day.)
If I find a way to perform this task the basic code will be posted.

Compact getHTTP function
Below is a compact, generic function that returns the HTTP response from a specified URL, for example to:
return the HTML source of a web page,
get the JSON response from an API URL,
parse a text file at a URL, etc.
This does not require any VBA References since MSXML2 is used as a late-bound object.
Public Function getHTTP(ByVal url As String) As String
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", url, False: .Send
getHTTP = StrConv(.responseBody, vbUnicode)
End With
End Function
Note that this basic function has no validation or error handling, as those are the parts that can vary considerably depending on which URL you're hitting.
If desired, check the value of .Status after the .Send to test for success codes like 0 or 200. You can also set up an error trap with On Error GoTo... (never Resume Next!)
Example Usage:
This procedure scrapes this Stack Overflow page for the current score of this answer.
Sub demo_getVoteCount()
Const answerID$ = 2522760
Const url_SO = "https://stackoverflow.com/a/" & answerID
Dim html As String, startPos As Long, voteCount As Variant
html = getHTTP(url_SO) 'get html from url
startPos = InStr(html, "answerid=""" & answerID) 'locate this answer
startPos = InStr(startPos, html, "vote-count-post") 'locate vote count
startPos = InStr(startPos, html, ">") + 1 'locate value
voteCount=Mid(html,startPos,InStr(startPos,html,"<")-startPos) 'extract score
MsgBox "Answer #" & answerID & " has a score of " & voteCount & "."
End Sub
(Of course, in reality there are far better ways to get the score of an answer than the example above.)
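The status check and error trap suggested above could be sketched roughly as follows. This is only an illustration, not part of the original answer: tryGetHTTP and its ByRef body parameter are hypothetical names, and treating any 2xx status as success is an assumption that may not suit every URL.

```vba
' Hypothetical variant of getHTTP: returns True on success and passes the
' response text (or a failure description) back through the ByRef argument.
Public Function tryGetHTTP(ByVal url As String, ByRef body As String) As Boolean
    On Error GoTo Failed                          ' trap network-level errors
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", url, False                   ' synchronous request
        .Send
        If .Status >= 200 And .Status < 300 Then  ' assumption: any 2xx counts as success
            body = .responseText
            tryGetHTTP = True
        Else
            body = .Status & " " & .statusText    ' describe the HTTP failure
        End If
    End With
    Exit Function
Failed:
    body = "Error " & Err.Number & ": " & Err.Description
End Function

Sub demo_tryGetHTTP()
    Dim html As String
    If tryGetHTTP("https://stackoverflow.com/", html) Then
        Debug.Print Len(html) & " characters received"
    Else
        Debug.Print "Request failed: " & html
    End If
End Sub
```

Returning a Boolean keeps the caller's success test separate from the page content, so an error message is never mistaken for HTML.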

Related

Scrape Cargo Number Tracking Status with XMLHTTP Request with dynamic content

I have to create several functions that get the status of the supplied cargo number from each different website.
Below is the code user Zwenn helped me with. However, I am not familiar with the RegEx and Replace methods of VBA.
I am trying to simplify this code so I can replicate it for other websites. I understand that each website will need its own code, but ideally the base would stay the same and I would only modify the exact element to be scraped.
Function FlightStat_AF(cargoNo As Variant) As String
Const url = "https://www.afklcargo.com/mycargo/api/shipment/detail/057-"
Dim elem As Object
Dim Result As String
Dim askFor As String
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", url & cargoNo, False
.send
Result = .responseText
If .Status = 200 Then
If InStr(1, Result, "faultDescription") = 0 Then
askFor = """metaStatus"""
Else
askFor = """faultDescription"""
End If
With CreateObject("VBScript.RegExp")
.Global = True
.MultiLine = True
.Pattern = askFor & ":(.*?),"
Set elem = .Execute(Result)
End With
Result = Replace(elem(0).SubMatches(0), Chr(34), "")
Else
Result = "No cargoID"
End If
End With
FlightStat_AF = Result
End Function
I am trying to create a similar function for the below website.
URL = https://booking.unitedcargo.com/skychain/app?service=page/nwp:Trackshipmt&doc_typ=AWB&awb_pre=016&awb_no=
Sample CargoNo = 60848034
The element to scrape is highlighted in yellow
The following should fetch you the required status as long as it is available.
Sub PrintStatus()
MsgBox GetDeliveryStat("60848034")
End Sub
Function GetDeliveryStat(cargoNo As Variant) As String
Const Url = "https://booking.unitedcargo.com/skychain/app?service=page/nwp:Trackshipmt&doc_typ=AWB&awb_pre=016&awb_no="
Dim dStatCheck$, deliveryStat$, S$
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", Url & cargoNo, False
.send
S = .responseText
End With
With CreateObject("HTMLFile")
.write S
On Error Resume Next
dStatCheck = .getElementById("trackShiptablerowInner0").getElementsByTagName("b")(0).innerText
On Error GoTo 0
If dStatCheck <> "" Then
deliveryStat = dStatCheck
Else
deliveryStat = "Not Found"
End If
End With
GetDeliveryStat = deliveryStat
End Function

Excel VBA - Web scraping - Track parcel - deal with error where tracking number is incorrect

I am trying to create a function that grabs the status of an airway bill by using a tracking number.
I have managed to create a function that grabs the status correctly with the help of the Stack Overflow community.
However, I am trying to add in the error handling where the tracking number may be incorrect.
With the current function, it correctly gets the result if the tracking number is valid.
But when an incorrect number is provided, the function returns a 0 value and keeps running in a loop in the background. When it is stopped from the VBA editor, Excel crashes.
This is the code I have come up with so far. Any help to add this error handling would be appreciated.
Sample Correct Cargo Number: 92366691
Sample Incorrect Cargo Number: 59473805
Function FlightStat_AF(cargoNo As Variant) As String
Dim url As String, ie As Object, result As String
url = "https://www.afklcargo.com/mycargo/shipment/detail/057-" & cargoNo
Set ie = CreateObject("InternetExplorer.Application")
With ie
.Visible = False
.navigate url
Do Until .readyState = 4: DoEvents: Loop
End With
'wait a little for dynamic content to be loaded
Application.Wait (Now + TimeSerial(0, 0, 1))
'Get the status from the table
Do While result = ""
DoEvents
On Error Resume Next
result = Trim(ie.document.getElementsByClassName("fs-12 body-font-bold")(1).innerText)
On Error GoTo 0
Application.Wait (Now + TimeSerial(0, 0, 1))
Loop
ie.Quit: Set ie = Nothing
'Return value of the function
FlightStat_AF = result
End Function
I learned a lot today and I'm very happy about that. My code is based on this answer, which taught me all the new things:
Scraping specific data inside a table II (Answer by SIM)
You asked how to avoid an error when you send a wrong ID. Here is how you can deal with that error, and also with the error raised when the ID is in the wrong format.
This is the Sub() to test the function:
Sub test()
'A valid ID
MsgBox FlightStat_AF("92366691")
'A wrong ID
'The whole string is "The provided AWB(s) is either invalid, not found or you are not authorized for it."
'The function FlightStat_AF cuts the string by comma
'So it delivers "The provided AWB(s) is either invalid"
'I'm not yet comfortable with regex and used it as in the macro this code is based on ;-)
MsgBox FlightStat_AF("59473805")
'Something other than a valid ID format
MsgBox FlightStat_AF("blub")
End Sub
This is the function() to get the answer you want:
Function FlightStat_AF(cargoNo As Variant) As String
Const url = "https://www.afklcargo.com/mycargo/api/shipment/detail/057-"
Dim elem As Object
Dim result As String
Dim askFor As String
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", url & cargoNo, False
.send
result = .responseText
If .Status = 200 Then
If InStr(1, result, "faultDescription") = 0 Then
askFor = """metaStatus"""
Else
askFor = """faultDescription"""
End If
With CreateObject("VBScript.RegExp")
.Global = True
.MultiLine = True
.Pattern = askFor & ":(.*?),"
Set elem = .Execute(result)
End With
If elem.Count > 0 Then 'Execute always returns a collection, so test for matches
result = Replace(elem(0).SubMatches(0), Chr(34), "")
Else
result = "No Value"
End If
Else
result = "No cargoID"
End If
End With
FlightStat_AF = result
End Function
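The comments in the test sub above note that the pattern ending in a comma cuts the fault message at its first comma. One possible way around that is sketched below, assuming the wanted value is a quoted JSON string with no escaped quotes inside it and that the key contains no regex metacharacters. JsonStringValue and its fallback parameter are hypothetical names, not part of the original answer.

```vba
' Hypothetical helper: returns the string stored under "key" in a JSON text,
' or the fallback text when the key is not found.
Function JsonStringValue(ByVal json As String, ByVal key As String, _
                         Optional ByVal fallback As String = "No Value") As String
    Dim matches As Object
    With CreateObject("VBScript.RegExp")
        .Global = False                 ' the first occurrence is enough
        ' Builds the pattern  "key"\s*:\s*"([^"]*)"
        ' i.e. capture everything between the quotes of the value
        .Pattern = """" & key & """\s*:\s*""([^""]*)"""
        Set matches = .Execute(json)
    End With
    If matches.Count > 0 Then
        JsonStringValue = matches(0).SubMatches(0)
    Else
        JsonStringValue = fallback
    End If
End Function

Sub demo_JsonStringValue()
    Dim sample As String
    ' Made-up sample response, shaped like the fault message quoted above
    sample = "{""faultDescription"":""The provided AWB(s) is either invalid, " & _
             "not found or you are not authorized for it."",""code"":1}"
    Debug.Print JsonStringValue(sample, "faultDescription")
    Debug.Print JsonStringValue(sample, "metaStatus")
End Sub
```

Because the capture stops at the closing quote rather than at a comma, the whole sentence should come back, and no Replace call is needed to strip quotes. For anything beyond flat string values a real JSON parser is the safer choice.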
A way to check that the URL is valid is to use the function below:
Public Function URLexist(urlToCheck As String) As Boolean
'source : https://excel-malin.com
On Error GoTo Err
Dim oXHTTP As Object
Set oXHTTP = CreateObject("MSXML2.XMLHTTP")
oXHTTP.Open "HEAD", urlToCheck, False
oXHTTP.send
URLexist = (oXHTTP.Status = 200)
Exit Function
Err:
URLexist = False
End Function

Excel VBA: get error code for invalid URL in hyperlink with WinHttpRequest

In Excel, I have a list with URLs. I need to check if IE (default browser) can open these. They don't have to open actually, it's to check the accessibility.
If they can't open, I need to isolate the error-code and place that in another column.
After searching around here, I started with following the hyperlinks, and used GET to get the data in a MsgBox. This seems to work partially, but of course now I get the MsgBox with every URL also without error. Also I'm looking for a way to extract the error and place that in the active sheet.
What I've got so far:
Sub Request_Data()
' declare
numRow = 2
Dim MyRequest As Object
' activate URLs without Follow
Do While ActiveSheet.Range("C" & numRow).Hyperlinks.Count > 0
numRow = numRow + 1
' create request
Set MyRequest = CreateObject("WinHttp.WinHttpRequest.5.1")
MyRequest.Open "GET", _
ActiveSheet.Range("C" & numRow)
' send request
MyRequest.Send
' outcome
MsgBox MyRequest.ResponseText
' isolate the error code (for example 404)
' place error code in excel sheet in column H next to row URL
Loop
End Sub
Does someone know how I should proceed?
I thought this might be useful but I don't know where to start.
Checking for broken hyperlinks in Excel
and
Bulk Url checker macro excel
Thanks in advance
See the code below - you will need to adapt the Test sub-routine to loop through your cells and call IsValidUrl for each value you want to test:
Option Explicit
Sub Test()
MsgBox IsValidUrl("http://www.thisdoesnotexistxxxxxxxxxxxxx.com/")
MsgBox IsValidUrl("http://www.google.com/")
MsgBox IsValidUrl("http://www.ppppppppppppqqqqqqqqqqqqqqrrrrrrrrrrrrr.com/")
End Sub
Function IsValidUrl(strUrl As String) As Long
Dim objRequest As Object
Dim lngCode As Long
On Error GoTo ErrHandler
Set objRequest = CreateObject("WinHttp.WinHttpRequest.5.1")
With objRequest
.Open "GET", strUrl
.Send
lngCode = 0
End With
GoTo ExitHandler
ErrHandler:
lngCode = Err.Number
ExitHandler:
Set objRequest = Nothing
IsValidUrl = lngCode
End Function
My output is:
-2147012889
0
-2147012889
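The adaptation described above (looping through the cells and writing a code next to each URL) might look roughly like the sketch below. It assumes the URLs are plain text in column C starting at row 2 and that results belong in column H, as in the question. UrlStatus and WriteStatusCodes are hypothetical names. Unlike IsValidUrl, UrlStatus returns the HTTP status (for example 200 or 404) when the server answers, and the error number only when the request itself fails.

```vba
' Hypothetical helper: HTTP status when the server responds,
' otherwise the (negative) WinHTTP error number.
Function UrlStatus(ByVal strUrl As String) As Long
    On Error GoTo ErrHandler
    With CreateObject("WinHttp.WinHttpRequest.5.1")
        .Open "GET", strUrl, False      ' synchronous request
        .Send
        UrlStatus = .Status
    End With
    Exit Function
ErrHandler:
    UrlStatus = Err.Number
End Function

' Assumption: URLs are text values in column C from row 2 down;
' the code for each one is written to column H of the same row.
Sub WriteStatusCodes()
    Dim ws As Worksheet
    Dim lastRow As Long, r As Long
    Set ws = ActiveSheet
    lastRow = ws.Cells(ws.Rows.Count, "C").End(xlUp).Row
    For r = 2 To lastRow
        If Len(ws.Cells(r, "C").Value) > 0 Then
            ws.Cells(r, "H").Value = UrlStatus(CStr(ws.Cells(r, "C").Value))
        End If
    Next r
End Sub
```

Writing to the sheet instead of calling MsgBox lets the whole list run unattended.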

How to use appIE.Document.Body.innerHTML

So I'm trying to retrieve the latitude and longitude of a given postal code, and am trying to use VBA to place this into an excel worksheet. My code is as follows:
Private appIE As Object
Function GeoCode(sLocationData As String) As String
'//Dont want to open and close all day long - make once use many
If appIE Is Nothing Then
CreateIEApp
'// Creates a new IE App
'// if = nothing now then there was an error
If appIE Is Nothing Then
GeoCode = "Sorry could not launch IE"
Exit Function
Else
'// do nothing
End If
Else
'// do nothing
End If
'//clearing up input data
'sLocationData = Replace(sLocationData, ",", " ")
sLocationData = Replace(sLocationData, " ", "+")
sLocationData = Trim(sLocationData)
'//Build URL for Query
sLocationData = "http://maps.google.com/maps/geo?q=%20_" & sLocationData
'// go to the google web service and get the raw CSV data
'// CAUSES PROBLEM AS SPECIFIED BELOW
appIE.Navigate sLocationData
Do While appIE.Busy
Application.StatusBar = "Contacting Google Maps API..."
Loop
Application.StatusBar = False
On Error Resume Next
'// Parsing
GeoCode = appIE.Document.Body.innerHTML
GeoCode = Mid(GeoCode, InStr(GeoCode, ",") + 1, InStr(GeoCode, "/") - InStr(GeoCode, ",") - 2)
appIE = Nothing
End Function
The Google Maps API then returns a JSON formatted value, as per this link:
http://maps.google.com/maps/geo?q=%20_400012
I then attempt to retrieve this value using
appIE.Document.Body.innerHTML,
and parsing that value for the data I want. However, the moment the code hits appIE.Navigate sLocationData,
I'm prompted to save a file called "geo". When saved and opened as a .txt file, I get the exact same JSON formatted value, but I need the values within my worksheet itself.
Is there a way to do this?
Thanks in advance!
That link didn't work for me in Firefox - response 610. If I remove the space and the underscore, it works. I don't know why IE wants to download, probably some setting that tells it to always download JSON rather than render it. In any case, consider using MSXML's http request rather than automating IE.
Set a reference to Microsoft XML, v6.0 or similar (VBE - Tools - References).
Function GeoCode(sLocData As String) As String
Dim xHttp As MSXML2.XMLHTTP
Dim sResponse As String
Dim lStart As Long, lEnd As Long
Const sURL As String = "http://maps.google.com/maps/geo?q="
Const sCOOR As String = "coordinates"": " 'substring that we'll look for later
'send the http request
Set xHttp = New MSXML2.XMLHTTP
xHttp.Open "GET", sURL & sLocData
xHttp.send
'wait until it's done
Do
DoEvents
Loop Until xHttp.readyState = 4
'get the returned data
sResponse = xHttp.responseText
'find the starting and ending points of the substring
lStart = InStr(1, sResponse, sCOOR)
lEnd = InStr(lStart, sResponse, "]")
GeoCode = Mid$(sResponse, lStart + Len(sCOOR), lEnd - lStart - Len(sCOOR) + 1)
End Function
Sub Test()
Dim sTest As String
sTest = GeoCode("400012")
Debug.Assert sTest = "[ 103.9041520, 1.3222160, 0 ]"
End Sub

Sort dead hyperlinks in Excel with VBA?

The title says it:
I have an Excel sheet with a column full of hyperlinks. I want a VBA script to check which hyperlinks are dead and which work, and to write either "404 Error" or "active" into the next column.
Hopefully someone can help me because I am not really good at VB.
EDIT:
I found a solution at http://www.utteraccess.com/forums/printthread.php?Cat=&Board=84&main=1037294&type=thread
It was written for Word, but I need it for Excel. Can someone translate it into an Excel solution?
Private Sub testHyperlinks()
Dim thisHyperlink As Hyperlink
For Each thisHyperlink In ActiveDocument.Hyperlinks
If thisHyperlink.Address <> "" And Left(thisHyperlink.Address, 6) <> "mailto" Then
If Not IsURLGood(thisHyperlink.Address) Then
Debug.Print thisHyperlink.Address
End If
End If
Next
End Sub
Private Function IsURLGood(url As String) As Boolean
' Test the URL to see if it is good
Dim request As New WinHttpRequest
On Error GoTo IsURLGoodError
request.Open "GET", url
request.Send
If request.Status = 200 Then
IsURLGood = True
Else
IsURLGood = False
End If
Exit Function
IsURLGoodError:
IsURLGood = False
End Function
First add a reference to Microsoft XML V3 (or above), using Tools->References. Then paste this code:
Option Explicit
Sub CheckHyperlinks()
Dim oColumn As Range
Set oColumn = GetColumn() ' replace this with code to get the relevant column
Dim oCell As Range
For Each oCell In oColumn.Cells
If oCell.Hyperlinks.Count > 0 Then
Dim oHyperlink As Hyperlink
Set oHyperlink = oCell.Hyperlinks(1) ' I assume only 1 hyperlink per cell
Dim strResult As String
strResult = GetResult(oHyperlink.Address)
oCell.Offset(0, 1).Value = strResult
End If
Next oCell
End Sub
Private Function GetResult(ByVal strUrl As String) As String
On Error Goto ErrorHandler
Dim oHttp As New MSXML2.XMLHTTP30
oHttp.Open "HEAD", strUrl, False
oHttp.send
GetResult = oHttp.Status & " " & oHttp.statusText
Exit Function
ErrorHandler:
GetResult = "Error: " & Err.Description
End Function
Private Function GetColumn() As Range
Set GetColumn = ActiveWorkbook.Worksheets(1).Range("A:A")
End Function
Gary's code is perfect, but I would rather use a public function in a module and call it from a cell as a worksheet function. The advantage is that you can use it in any cell you like, or inside any other more complex formula.
In the code below I have adjusted Gary's code to return a Boolean, which you can then use in a formula such as =IF(CHECKHYPERLINK(A1);"OK";"FAILED"). Alternatively, you could return an Integer containing the status itself (e.g. =IF(CHECKHYPERLINK(A1)=200;"OK";"FAILED")).
A1: http://www.whatever.com
A2: =IF(CHECKHYPERLINK(A1);"OK";"FAILED")
To use this code please follow Gary's instructions and additionally add a module to the workbook (right click on the VBAProject --> Insert --> Module) and paste the code into the module.
Option Explicit
Public Function CheckHyperlink(ByVal strUrl As String) As Boolean
Dim oHttp As New MSXML2.XMLHTTP30
On Error GoTo ErrorHandler
oHttp.Open "HEAD", strUrl, False
oHttp.send
CheckHyperlink = (oHttp.Status = 200)
Exit Function
ErrorHandler:
CheckHyperlink = False
End Function
Please also be aware that, if the page is down, the timeout can be long.
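One way to bound that wait is to switch to the WinHttpRequest object, which exposes a SetTimeouts method taking four values in milliseconds (resolve, connect, send, receive). The sketch below is a variation on the function above, not the original author's code. CheckHyperlinkWithTimeout and the 5000 ms default are assumptions chosen for illustration.

```vba
' Hypothetical variant of CheckHyperlink with an upper bound on the wait.
Public Function CheckHyperlinkWithTimeout(ByVal strUrl As String, _
        Optional ByVal timeoutMs As Long = 5000) As Boolean
    On Error GoTo ErrorHandler
    With CreateObject("WinHttp.WinHttpRequest.5.1")
        ' resolve, connect, send and receive timeouts, set before Open
        .SetTimeouts timeoutMs, timeoutMs, timeoutMs, timeoutMs
        .Open "HEAD", strUrl, False
        .Send
        CheckHyperlinkWithTimeout = (.Status = 200)
    End With
    Exit Function
ErrorHandler:
    ' A timeout raises a run-time error and lands here as well
    CheckHyperlinkWithTimeout = False
End Function
```

It is late-bound, so no extra reference is needed, and it can be used in a cell the same way, for example =IF(CHECKHYPERLINKWITHTIMEOUT(A1);"OK";"FAILED").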

Resources