Sort dead hyperlinks in Excel with VBA? - excel

The title says it:
I have an excel Sheet with an column full of hyperlinks. Now I want that an VBA Script checks which hyperlinks are dead or work and makes an entry into the next columns either with the text 404 Error or active.
Hopefully someone can help me because I am not really good at VB.
EDIT:
I found # http://www.utteraccess.com/forums/printthread.php?Cat=&Board=84&main=1037294&type=thread
A solution which is made for word but the Problem is that I need this solution for Excel. Can someone translate this to Excel solution?
Private Sub testHyperlinks()
Dim thisHyperlink As Hyperlink
For Each thisHyperlink In ActiveDocument.Hyperlinks
If thisHyperlink.Address <> "" And Left(thisHyperlink.Address, 6) <> "mailto" Then
If Not IsURLGood(thisHyperlink.Address) Then
Debug.Print thisHyperlink.Address
End If
End If
Next
End Sub
Private Function IsURLGood(url As String) As Boolean
' Test the URL to see if it is good
Dim request As New WinHttpRequest
On Error GoTo IsURLGoodError
request.Open "GET", url
request.Send
If request.Status = 200 Then
IsURLGood = True
Else
IsURLGood = False
End If
Exit Function
IsURLGoodError:
IsURLGood = False
End Function

First add a reference to Microsoft XML V3 (or above), using Tools->References. Then paste this code:
Option Explicit
Sub CheckHyperlinks()
Dim oColumn As Range
Set oColumn = GetColumn() ' replace this with code to get the relevant column
Dim oCell As Range
For Each oCell In oColumn.Cells
If oCell.Hyperlinks.Count > 0 Then
Dim oHyperlink As Hyperlink
Set oHyperlink = oCell.Hyperlinks(1) ' I assume only 1 hyperlink per cell
Dim strResult As String
strResult = GetResult(oHyperlink.Address)
oCell.Offset(0, 1).Value = strResult
End If
Next oCell
End Sub
Private Function GetResult(ByVal strUrl As String) As String
On Error Goto ErrorHandler
Dim oHttp As New MSXML2.XMLHTTP30
oHttp.Open "HEAD", strUrl, False
oHttp.send
GetResult = oHttp.Status & " " & oHttp.statusText
Exit Function
ErrorHandler:
GetResult = "Error: " & Err.Description
End Function
Private Function GetColumn() As Range
Set GetColumn = ActiveWorkbook.Worksheets(1).Range("A:A")
End Function

Gary's code is perfect, but I would rather use a public function in a module and use it in a cell as function. The advantage is that you can use it in a cell of your choice or anyother more complex function.
In the code below I have adjusted Gary's code to return a boolean and you can then use this output in an =IF(CHECKHYPERLINK(A1);"OK";"FAILED"). Alternatively you could return an Integer and return the status itself (eg.: =IF(CHECKHYPERLINK(A1)=200;"OK";"FAILED"))
A1: http://www.whatever.com
A2: =IF(CHECKHYPERLINK(A1);"OK";"FAILED")
To use this code please follow Gary's instructions and additionally add a module to the workbook (right click on the VBAProject --> Insert --> Module) and paste the code into the module.
Option Explicit
Public Function CheckHyperlink(ByVal strUrl As String) As Boolean
Dim oHttp As New MSXML2.XMLHTTP30
On Error GoTo ErrorHandler
oHttp.Open "HEAD", strUrl, False
oHttp.send
If Not oHttp.Status = 200 Then CheckHyperlink = False Else CheckHyperlink = True
Exit Function
ErrorHandler:
CheckHyperlink = False
End Function
Please also be aware that, if the page is down, the timeout can be long.

Related

Open series of links in cells, search website content and return a value

I'm looking to carry out a task in VBA of searching a number of websites for a specified phrase.
I've got a spreadsheet with a dynamic data which changes accordingly creating a link in column F.
I am looking for a macro to open each individual link, search website content for a specifif phrase and then if the phrase is present on the website then to return a value in a cell to the right Offset by 0,1 otherwise to leave the cell blank and move to next row.
Is such task possible to carry out through vba?
I've tried researching similar requests but the result is not what I require.
Code I found online is kind of what I need but it produces the same result if the phrase is present or absent on the website.
Option Explicit
Sub SearchForString()
Dim rngURL As Range
Dim cll As Range
Dim stCheck As String
Dim xmlHttp As Object
On Error Resume Next
Set xmlHttp = CreateObject("Microsoft.XMLHTTP")
If xmlHttp Is Nothing Then
MsgBox "Unable to create XMLHTTP object, it's probably not installed on this machine", vbCritical
Exit Sub
End If
Set rngURL = Application.InputBox("Select the range of URLs to check", "Select Range", Selection, Type:=8)
On Error GoTo 0
If rngURL Is Nothing Then Exit Sub
stCheck = InputBox("Enter the text to search", "", "")
If Len(stCheck) = 0 Then Exit Sub
For Each cll In rngURL.Cells
If CheckURL(xmlHttp, cll.Value, stCheck) Then
cll.Offset(, 1).Value = 1
End If
Next cll
End Sub
Private Function CheckURL(ByRef xmlHttp As Object, ByVal URL As String, ByVal stCheck As String) As Boolean
Dim stResult As String
If Not LCase$(URL) Like "http://*" Then
URL = "http://" & URL
End If
xmlHttp.Open "GET", URL, False
xmlHttp.Send ""
If xmlHttp.readyState = 4 Then
If xmlHttp.Status = 200 Then
stResult = xmlHttp.responseText
If InStr(1, stResult, stCheck, vbBinaryCompare) > 0 Then
CheckURL = True
End If
End If
End If
End Function
Many thanks in advance.

VBA Loop through row and return value

Really would appreciate some help on this. I have a vba script that scrapes data, it opens the URL that is contained in the L column, L4 in this example. Then later down the script it inputs my given value into Col E, Row 4.
Sub ImportData()
...
With CreateObject("msxml2.xmlhttp")
.Open "GET", Range("L4"), False 'Cell that contains hyperlink
.send
HTML_Content.body.innerHTML = .responseText
End With
...
'Then I want to return a value
Sheets(1).Range("E4").Value = rng1.Offset(0, 1)
End Sub
I am trying to make a loop so that the script runs automatically and cycles through column L and runs the script for every row that contains a hyperlink in Col L, and then inputs the value to its respective row in Col E.
I have tried changing the code below that another user suggested without success:
Sub ImportData(urlToOpen as string)
...
.Open "GET", urlToOpen, False 'Cell that contains hyperlink
...
'Then I want to return a value
Sheets(1).Range(E, i).Value = rng1.Offset(0, 1) ' I know that's wrong
and add a calling procedure:
Sub CallRangeL_Urls()
For Each i In Sheet1.Range("L4:L200")
Call ImportData(i)
Next i
End Sub
I keep getting ByRef type argument mismatch error on Call ImportData(i)
Also I am not sure what so ever on how to acheive calling the value to the specific row that is being processed in the loop. Any help would be greatly appreciated. Thanks
Try the following:
Public Sub ImportData(ByVal urlToOpen As String)
And
Public Sub CallRangeL_Urls()
Dim i As Range
For Each i In Sheet1.Range("L4:L200")
ImportData i.Value
Next i
End Sub
Personally, I would reference the workbook as well and I usually use Worksheets("SheetName") but I know a lot of people like to use codeName.
You only want to pass the value of the link in the cell so ByVal is the appropriate way.
As touching the sheet is expensive I would probably dump the urls in an array and loop that, adding a basic test that I am working with an url:
Public Sub CallRangeL_Urls()
Dim arr(), i As Long
arr = Application.Transpose(Sheet1.Range("L4:L200").Value)
For i = LBound(arr) To UBound(arr)
If InStr(arr(i), "http") > 0 Then ImportData arr(i)
Next i
End Sub
To write out extracted value to column E, same row as url, I think you need to convert your ImportData sub to a function that returns the extracted value. Or better still, create a class to hold the xmlhttp object which then has a method to return the value (that way you don't keep creating and destroying the object - which you do if you create the object in the function. You could also create the xmlhttp object in the first sub and pass to the function as an argument - I show you pseudo code for that below).
Public Sub CallRangeL_Urls()
Dim arr(), i As Long
'code to create xmlhttp object
arr = Application.Transpose(Sheet1.Range("L4:L200").Value)
For i = LBound(arr) To UBound(arr)
If InStr(arr(i), "http") > 0 Then
Sheet1.Cells(i + 3, "E") = ImportData(arr(i), xmlhttpObject)
End If
Next i
End Sub
Public Function ImportData(ByVal urlToOpen As String, ByVal xmlhttpObject As Object) As String
''Any declarations
'Dim extractedValue As String
'Dim html As HTMLDocument
'Set html = New HTMLDocument
With xmlhttpObject
.Open "GET", urlToOpen, False
.send
html.body.innerHTML = .responseText
''code to extract value
'extractedValue = html.querySelector("someSelector")
ImportData = extractedValue
End Function

Search a website with Excel data to extract results and then loop

I have 8000 values in an Excel spreadsheet.
I need to search a website and then record a specific line of data from the website to in the Excel spreadsheet.
I found code which searches for data excel macro to search a website and extract results
Sub URL_Get_ABN_Query()
strSearch = Range("a1")
With ActiveSheet.QueryTables.Add( _
Connection:="URL;http://www.abr.business.gov.au/SearchByABN.aspx?SearchText=" & _
strSearch & "&safe=active", _
Destination:=Range("a5"))
.BackgroundQuery = True
.TablesOnlyFromHTML = True
.Refresh BackgroundQuery:=False
.SaveData = True
End With
'enter code here
End Sub
It collects the data from the website like this.
I only want the 'entity type' data line.
I can not find how to extend the code to only grab this line and input to the corresponding cell. i.e. ABN(b2)search, find input 'entity type' and paste into Company Type(c2).
Alternatively, I tried to find how to fill the information vertically instead of horizontally. I could delete the columns that are not needed. I thought this may be simpler.
I tried to record the macro with developer.
I also need to loop to the next ABN and populate the corresponding field and so on (B3>C3, B4>C4, etc.).
This is absolutely possible. You've got what I often find the hardest part, sourcing the information from another platform. To make this work I would separate it out a little bit and for simplicity use 2 sheets (Sheet1 with your known data and Sheet2 for the web data).
Loop through your table of ~8000 businesses. We can identify this from the UsedRange number of Rows. We know that the ABN is in column 2 (also known as B) so we copy that into the variable to pass to the function. The function will return the "Entity type:" to column 3 (C) of the same row.
Sub LoopThroughBusinesses()
Dim i As Integer
Dim ABN As String
For i = 2 To Sheet1.UsedRange.Rows.Count
ABN = Sheet1.Cells(i, 2)
Sheet1.Cells(i, 3) = URL_Get_ABN_Query(ABN)
Next i
End Sub
Change the subroutine you created to a Function so it returns the entity type you are after. The function will save the data into Sheet2 and then return just the Entity data that we are after.
Function URL_Get_ABN_Query(strSearch As String) As String ' Change it from a Sub to a Function that returns the desired string
' strSearch = Range("a1") ' This is now passed as a parameter into the Function
Dim entityRange As Range
With Sheet2.QueryTables.Add( _
Connection:="URL;http://www.abr.business.gov.au/SearchByABN.aspx?SearchText=" & strSearch & "&safe=active", _
Destination:=Sheet2.Range("A1")) ' Change this destination to Sheet2
.BackgroundQuery = True
.TablesOnlyFromHTML = True
.Refresh BackgroundQuery:=False
.SaveData = True
End With
' Find the Range that has "Entity Type:"
Set entityRange = Sheet2.UsedRange.Find("Entity type:")
' Then return the value of the cell to its' right
URL_Get_ABN_Query = entityRange.Offset(0, 1).Value2
' Clear Sheet2 for the next run
Sheet2.UsedRange.Delete
End Function
You do not want a load of connections (queryTables) set up in this way. It will be so slow if even possible. At 8000 requests, provided xmlhttp is not blocked or throttled, the below method will be significantly faster. If there does appear to be slowing/blocking then add in a small wait every x requests.
If possible use xmlhttp to gather data. Use css selectors to specifically target the entity type. Store values in an array and write out with loop at end. Use a class to hold the xmlhttp object for greater efficiency. Provide your class with methods including how to handle not found (example given). Add some further optimizations e.g. given is switching off screen-updating. This assumes your search numbers are in column B from B2. The code below also does some basic checks that there is something present in column B and handles the case of there being 1 or more numbers.
Good code is modular and you want a function to return something and a sub to perform actions. A single sub/function shouldn't complete lots of tasks. You want to easily debug with code that follows the principle of single responsibility (or close to it).
class clsHTTP
Option Explicit
Private http As Object
Private Sub Class_Initialize()
Set http = CreateObject("MSXML2.XMLHTTP")
End Sub
Public Function GetHTML(ByVal URL As String) As String
Dim sResponse As String
With http
.Open "GET", URL, False
.send
GetHTML = StrConv(.responseBody, vbUnicode)
End With
End Function
Public Function GetEntityType(ByVal html As HTMLDocument) As String
On Error GoTo errhand:
GetEntityType = html.querySelector("a[href*='EntityTypeDescription']").innerText
Exit Function
errhand:
GetEntityType = "Not Found"
End Function
Standard module:
Option Explicit
Public Sub GetInfo()
Dim http As clsHTTP, sResponse As String, lastRow As Long, groupResults(), i As Long, html As HTMLDocument
Set html = New HTMLDocument
Set http = New clsHTTP
Const BASE_URL As String = "http://www.abr.business.gov.au/ABN/View/"
Application.ScreenUpdating = False
With ThisWorkbook.Worksheets("Sheet1")
lastRow = .Cells(.rows.Count, "B").End(xlUp).Row
Select Case lastRow
Case 1
Exit Sub
Case 2
ReDim arr(1, 1): arr(1, 1) = .Range("B2").Value
Case Else
arr = .Range("B2:B" & lastRow).Value
End Select
ReDim groupResults(1 To lastRow - 1)
With http
For i = LBound(arr, 1) To UBound(arr, 1)
If Len(BASE_URL & arr(i, 1)) > Len(BASE_URL) Then
sResponse = .GetHTML(BASE_URL & arr(i, 1))
html.body.innerHTML = sResponse
groupResults(i) = .GetEntityType(html)
sResponse = vbNullString: html.body.innerHTML = vbNullString
End If
Next
End With
For i = LBound(groupResults) To UBound(groupResults)
.Cells(i + 1, "C") = groupResults(i)
Next
End With
Application.ScreenUpdating = True
End Sub
References (VBE> Tools > References):
Microsoft HTML Object Library
CSS selectors:
I use the fact the entity description is a hyperlink (a tag) and that its value contains the string EntityTypeDescription to use a css attribute = value with contains (*) operator to target.

Excel VBA: get error code for invalid URL in hyperlink with WinHttpRequest

In Excel, I have a list with URLs. I need to check if IE (default browser) can open these. They don't have to open actually, it's to check the accessibility.
If they can't open, I need to isolate the error-code and place that in another column.
After searching around here, I started with following the hyperlinks, and used GET to get the data in a MsgBox. This seems to work partially, but of course now I get the MsgBox with every URL also without error. Also I'm looking for a way to extract the error and place that in the active sheet.
What I've got so far:
Sub Request_Data()
' declare
numRow = 2
Dim MyRequest As Object
' activate URLs without Follow
Do While ActiveSheet.Range("C" & numRow).Hyperlinks.Count > 0
numRow = numRow + 1
' create request
Set MyRequest = CreateObject("WinHttp.WinHttpRequest.5.1")
MyRequest.Open "GET", _
ActiveSheet.Range("C" & numRow)
' send request
MyRequest.Send
' outcome
MsgBox MyRequest.ResponseText
' isolate the error code (for example 404)
' place error code in excel sheet in column H next to row URL
Loop
End Sub
Does someone know how I should proceed?
I thought this might be useful but I don't know where to start.
Checking for broken hyperlinks in Excel
and
Bulk Url checker macro excel
Thanks in advance
See the code below - you will need to adapt the Test sub-routine to loop through your cells and call IsValidUrl for each value you want to test:
Option Explicit
Sub Test()
MsgBox IsValidUrl("http://www.thisdoesnotexistxxxxxxxxxxxxx.com/")
MsgBox IsValidUrl("http://www.google.com/")
MsgBox IsValidUrl("http://www.ppppppppppppqqqqqqqqqqqqqqrrrrrrrrrrrrr.com/")
End Sub
Function IsValidUrl(strUrl As String) As Long
Dim objRequest As Object
Dim lngCode As Long
On Error GoTo ErrHandler
Set objRequest = CreateObject("WinHttp.WinHttpRequest.5.1")
With objRequest
.Open "GET", strUrl
.Send
lngCode = 0
End With
GoTo ExitHandler
ErrHandler:
lngCode = Err.Number
ExitHandler:
Set objRequest = Nothing
IsValidUrl = lngCode
End Function
My output is:
-2147012889
0
-2147012889

Excel VBA check if named range is set

I'm trying to determine if a named range has been set via VBA. The named range is called LoadedToken and essentially is loaded when a user clicks a particular button. I use this as proof that initialisation has taken place.
I have a function to check if this has taken place:
Function ToolIsEnabled()
' We check if the R2A add-in has been loaded by detecting the named range
If ActiveWorkbook.Names("LoadedToken") Is Nothing Then
ToolIsEnabled = False
Else
ToolIsEnabled = True
End If
End Function
and I get an application error. Of course, the VBA is incorrect. However how can I actually accomplish this?!
Sub Test()
Debug.Print IsNamedRange("Bumsti")
End Sub
Function IsNamedRange(RName As String) As Boolean
Dim N As Name
IsNamedRange = False
For Each N In ActiveWorkbook.Names
If N.Name = RName Then
IsNamedRange = True
Exit For
End If
Next
End Function
Usage in OP context could be
' ...
If IsNamedRange("LoadedToken") Then
' ...
End If
' ...
or - if a program specific Bool needs to be set
' ...
Dim IsTokenLoaded as Boolean
IsTokenLoaded = IsNamedRange("LoadedToken")
' ...
Both constructs make it pretty clear in the source code what you are aiming for.
You can achieve this by using error handling:
Function ToolIsEnabled() As Boolean
Dim rng As Range
On Error Resume Next
Set rng = ActiveWorkbook.Range("LoadedToken")
On Error GoTo 0
ToolIsEnabled = Not rng is Nothing
End Function
This will check either in ThisWorkbook or a named workbook and return TRUE/FALSE.
Sub Test()
MsgBox NamedRangeExists("SomeName")
MsgBox NamedRangeExists("SomeOtherName", Workbooks("Book1.xls"))
End Sub
Public Function NamedRangeExists(sName As String, Optional Book As Workbook) As Boolean
On Error Resume Next
If Book Is Nothing Then
Set Book = ThisWorkbook
End If
NamedRangeExists = Book.Names(sName).Index <> (Err.Number = 0)
On Error GoTo 0
End Function
Edit:
A shorter version if it's only going to look in ThisWorkbook:
Public Function NamedRangeExists(sName As String) As Boolean
On Error Resume Next
NamedRangeExists = ThisWorkbook.Names(sName).Index <> (Err.Number = 0)
On Error GoTo 0
End Function
For the activeworkbook, you could also call the old XLM NAMES() function:
Function IsNameInActiveWorkbook(sName As String) As Boolean
IsNameInActiveWorkbook = Not IsError(Application.ExecuteExcel4Macro("MATCH(""" & sName & """,NAMES(),0)"))
End Function
As per Tom's answer these 2 line should do the trick:
On Error Resume Next
Set TestRange = ActiveWorkbook.Range("LoadedToken") 'if it does **not** exist this line will be ERROR

Resources