Open series of links in cells, search website content and return a value - excel

I'm looking to carry out a task in VBA of searching a number of websites for a specified phrase.
I've got a spreadsheet with a dynamic data which changes accordingly creating a link in column F.
I am looking for a macro to open each individual link, search website content for a specifif phrase and then if the phrase is present on the website then to return a value in a cell to the right Offset by 0,1 otherwise to leave the cell blank and move to next row.
Is such task possible to carry out through vba?
I've tried researching similar requests but the result is not what I require.
Code I found online is kind of what I need but it produces the same result if the phrase is present or absent on the website.
Option Explicit
Sub SearchForString()
Dim rngURL As Range
Dim cll As Range
Dim stCheck As String
Dim xmlHttp As Object
On Error Resume Next
Set xmlHttp = CreateObject("Microsoft.XMLHTTP")
If xmlHttp Is Nothing Then
MsgBox "Unable to create XMLHTTP object, it's probably not installed on this machine", vbCritical
Exit Sub
End If
Set rngURL = Application.InputBox("Select the range of URLs to check", "Select Range", Selection, Type:=8)
On Error GoTo 0
If rngURL Is Nothing Then Exit Sub
stCheck = InputBox("Enter the text to search", "", "")
If Len(stCheck) = 0 Then Exit Sub
For Each cll In rngURL.Cells
If CheckURL(xmlHttp, cll.Value, stCheck) Then
cll.Offset(, 1).Value = 1
End If
Next cll
End Sub
Private Function CheckURL(ByRef xmlHttp As Object, ByVal URL As String, ByVal stCheck As String) As Boolean
Dim stResult As String
If Not LCase$(URL) Like "http://*" Then
URL = "http://" & URL
End If
xmlHttp.Open "GET", URL, False
xmlHttp.Send ""
If xmlHttp.readyState = 4 Then
If xmlHttp.Status = 200 Then
stResult = xmlHttp.responseText
If InStr(1, stResult, stCheck, vbBinaryCompare) > 0 Then
CheckURL = True
End If
End If
End If
End Function
Many thanks in advance.

Related

Bulk Email Extractor. Need to add a Regxp - excel VBA

I am trying to update my first code with parts of a second code, but have been struggling for a few weeks now. Both codes extract emails from a url list.
THE FIRST CODE
This code is fine apart from it uses Mailto: rather than a Regxp. I am trying to replace the Mailto: in the first code with a Regxp from the second code as the regxp extracts more emails.
How the First code works.
A list of urls are placed in Sheet2 "Urls" and the results are show in Sheet1 "Results". This code will extract ALL emails from the site, so if there are 10 emails it will extract ALL 10, if 100 then it will extract All 100 EMAIL. The SECOND code only extracts 1 email per site.
The problem with the FIRST CODE is that the Mailto: does MISS a lot of email where as the REGXP captures more, and so I want to update the first code. However I have been struggling for a few weeks now as I am not super hot in writing code. The first code was originaly written by me and then updated by another developer and his code is wayout of my skill depth. I have been trying for weeks to update it but can not seem to work anything out so I decided to post.
I have listed the first code below. You can download a sample workbook from here Download Link First code
Link to my second code that I also wrote, and the EMAIL REGXP i am trying to use, My Post StackOver Flow
FIRST CODE, Bulk Email Extractor. ONLY EMAIL PART NEEDS UPDATING WITH REGXP.
Sub ScrapeSoMeAndMailAddresses()
'Columns for both tables
Const colUrl As Long = 1 'Must always be the first column
Const colMail As Long = 2 'Must always be the first column before Some platforms
Const colFacebook As Long = 3 'Must always be the last column of Some platforms
Const colError As Long = 4 'Must always be the last column
Dim url As String
Dim http As Object
Dim htmlDoc As Object
Dim nodeAllLinks As Object
Dim nodeOneLink As Object
Dim pageLoadSuccessful As Boolean
Dim tableUrlsOneAddressLeft As String
Dim tableAllAddresses As String
Dim currentRowTableUrls As Long
Dim lastRowTableUrls As Long
Dim currentRowsTableAll(colUrl To colFacebook) As Long
Dim lastRowTableAll As Long
Dim addressCounters(colMail To colFacebook) As Long
Dim checkCounters As Long
'Initialize variables
tableUrlsOneAddressLeft = "Urls" ''Name of Sheet
currentRowTableUrls = 2 'First row for content
tableAllAddresses = "Results" ''Name of Sheet
For checkCounters = colUrl To colFacebook
currentRowsTableAll(checkCounters) = 2 'First rows for content
Next checkCounters
Set htmlDoc = CreateObject("htmlfile")
Set http = CreateObject("MSXML2.ServerXMLHTTP.6.0")
'Clear all contents and comments in the URL source sheet from email column to error column
With Sheets(tableUrlsOneAddressLeft)
lastRowTableUrls = .Cells(Rows.Count, colUrl).End(xlUp).Row
.Range(.Cells(currentRowTableUrls, colMail), .Cells(lastRowTableUrls, colError)).ClearContents
.Range(.Cells(currentRowTableUrls, colMail), .Cells(lastRowTableUrls, colError)).ClearComments
End With
'Delete all rows except headline in the sheet with all addresses
lastRowTableAll = Sheets(tableAllAddresses).Cells(Rows.Count, colUrl).End(xlUp).Row
Sheets(tableAllAddresses).Rows(currentRowsTableAll(colUrl) & ":" & lastRowTableAll).Delete Shift:=xlUp
'Loop over all URLs in column A in the URL source sheet
Do While Sheets(tableUrlsOneAddressLeft).Cells(currentRowTableUrls, 1).Value <> ""
'Scroll for visual monitoring, if 'the sheet with the URLs are the
'active one
If ActiveSheet.Name = tableUrlsOneAddressLeft Then
If currentRowTableUrls > 14 Then
ActiveWindow.SmallScroll down:=1
End If
Sheets(tableUrlsOneAddressLeft).Cells(currentRowTableUrls, 1).Select
End If
'Get next url from the URL source sheet
url = Sheets(tableUrlsOneAddressLeft).Cells(currentRowTableUrls, colUrl).Value
'Try to load page 'Temporarily disable error handling if 'there is a timeout or onother error
On Error Resume Next
http.Open "GET", url, False
http.send
'Check if page loading was successful
If Err.Number = 0 Then
pageLoadSuccessful = True
End If
On Error GoTo 0
If pageLoadSuccessful Then
'Build html document for DOM operations
htmlDoc.body.innerHtml = http.responseText
'Create node list from all links of the page
Set nodeAllLinks = htmlDoc.getElementsByTagName("a")
'Walk through all links of the node list
For Each nodeOneLink In nodeAllLinks
'''#####################################################################################################
'''################################### THIS IS THE START OF THE EMAIL SECTION ##########################
'''#####################################################################################################
'Check for mail address
If InStr(1, nodeOneLink.href, "mailto:") Then
'Write mail address to both tables
Sheets(tableUrlsOneAddressLeft).Cells(currentRowTableUrls, colMail).Value = Right(nodeOneLink.href, Len(nodeOneLink.href) - InStr(nodeOneLink.href, ":"))
Sheets(tableAllAddresses).Cells(currentRowsTableAll(colMail), colMail).Value = Right(nodeOneLink.href, Len(nodeOneLink.href) - InStr(nodeOneLink.href, ":"))
'Check if it is a new line in the sheet with all addresses
If currentRowsTableAll(colMail) >= currentRowsTableAll(colUrl) Then
'Write URL in the new line of the sheet with all addresses
Sheets(tableAllAddresses).Cells(currentRowsTableAll(colUrl), colUrl).Value = url
'Increment url counter
currentRowsTableAll(colUrl) = currentRowsTableAll(colUrl) + 1
End If
'Increment mail counters
currentRowsTableAll(colMail) = currentRowsTableAll(colMail) + 1
addressCounters(colMail) = addressCounters(colMail) + 1
End If
'''#####################################################################################################
'''################################### END OF THE EMAIL SECTION ########################################
'''#####################################################################################################
'Check for Facebook address
If InStr(1, UCase(nodeOneLink.href), "FACEBOOK") Then
'Write Facebook address to both tables
Sheets(tableUrlsOneAddressLeft).Cells(currentRowTableUrls, colFacebook).Value = nodeOneLink.href
Sheets(tableAllAddresses).Cells(currentRowsTableAll(colFacebook), colFacebook).Value = nodeOneLink.href
'Check if it is a new line in the sheet with all addresses
If currentRowsTableAll(colFacebook) >= currentRowsTableAll(colUrl) Then
'Write URL in the new line of the sheet with all addresses
Sheets(tableAllAddresses).Cells(currentRowsTableAll(colUrl), colUrl).Value = url
'Increment url counter
currentRowsTableAll(colUrl) = currentRowsTableAll(colUrl) + 1
End If
'Increment Facebook counters
currentRowsTableAll(colFacebook) = currentRowsTableAll(colFacebook) + 1
addressCounters(colFacebook) = addressCounters(colFacebook) + 1
End If
Next nodeOneLink
'Check address counters
For checkCounters = colMail To colFacebook
'Set comment if more than 1 link were found
If addressCounters(checkCounters) > 1 Then
Sheets(tableUrlsOneAddressLeft).Cells(currentRowTableUrls, checkCounters).AddComment Text:=CStr(addressCounters(checkCounters))
Sheets(tableUrlsOneAddressLeft).Cells(currentRowTableUrls, checkCounters).Comment.Shape.TextFrame.AutoSize = True
End If
Next checkCounters
Else
'Page not loaded
'Write message URL table
Sheets(tableUrlsOneAddressLeft).Cells(currentRowTableUrls, colError).Value = "Error with URL or timeout"
End If
'Prepare for next page
pageLoadSuccessful = False
Erase addressCounters
lastRowTableAll = Sheets(tableAllAddresses).Cells(Rows.Count, colUrl).End(xlUp).Row
For checkCounters = colUrl To colFacebook
currentRowsTableAll(checkCounters) = lastRowTableAll + 1 'First rows for next page content
Next checkCounters
currentRowTableUrls = currentRowTableUrls + 1
Loop
'Clean up
Set http = Nothing
Set htmlDoc = Nothing
Set nodeAllLinks = Nothing
Set nodeOneLink = Nothing
End Sub
As always thanks in advance.
I recommend you save a copy of your workbook before running the code in my answer. Hopefully it works and does what you need it to, I wasn't able to test it fully.
The GetEmailAddressesFromHtml function in the code below extracts email addresses using the regular expression you've included in your answer.
I think one thing the code below doesn't do (which your code did) was add comments to the Urls worksheet. But you could set up an Excel formula (e.g. COUNTIFS) to do that in my opinion.
I only add unique Facebook URLs and email addresses, so you shouldn't see any duplicates on the Results sheet.
Option Explicit
Option Private Module 'This option means that Subs of this module are not displayed in the macros of the Excel GUI
' https://learn.microsoft.com/en-us/office/vba/language/reference/user-interface-help/raise-method
Private Const ERR_REQUEST_FAILED As Long = 513
Private Const ERR_INVALID_HTML As Long = 514
Private Sub PrepareSourceSheet( _
ByVal someSheet As Worksheet, ByVal firstRowToClear As Long, ByVal lastRowToClear As Long, _
ByVal firstColumnToClear As Long, ByVal lastColumnToClear As Long)
' Should clear all contents and comments in the source sheet from email column to error column
With someSheet
With .Range(.Cells(firstRowToClear, firstColumnToClear), .Cells(lastRowToClear, lastColumnToClear))
Debug.Assert Intersect(.Cells, .Parent.Columns(1)) Is Nothing
.ClearContents
.ClearComments
End With
End With
End Sub
Private Sub PrepareDestinationSheet(ByVal someSheet As Worksheet, ByVal firstRowToDelete As Long)
'Should delete all rows starting from and including "firstRowToDelete".
With someSheet
.rows(firstRowToDelete & ":" & .rows.CountLarge).Delete Shift:=xlUp
End With
End Sub
Private Sub ScrapeSomeData()
'Columns for both tables
Const COLUMN_URL As Long = 1 'Must always be the first column
Const COLUMN_EMAIL As Long = 2 'Must always be the first column before Some platforms
Const COLUMN_FACEBOOK As Long = 3 'Must always be the last column of Some platforms
Const COLUMN_ERROR As Long = 4 'Must always be the last column
Const FIRST_SOURCE_ROW As Long = 2 ' Skip headers
Const FIRST_DESTINATION_ROW As Long = 2 ' Skip headers
Dim sourceSheet As Worksheet
Set sourceSheet = ThisWorkbook.Worksheets("Urls")
Dim destinationSheet As Worksheet
Set destinationSheet = ThisWorkbook.Worksheets("Results")
Dim lastSourceRow As Long
lastSourceRow = sourceSheet.Cells(sourceSheet.rows.Count, COLUMN_URL).End(xlUp).Row
PrepareSourceSheet someSheet:=sourceSheet, firstRowToClear:=FIRST_SOURCE_ROW, _
lastRowToClear:=lastSourceRow, firstColumnToClear:=COLUMN_EMAIL, lastColumnToClear:=COLUMN_ERROR
PrepareDestinationSheet someSheet:=destinationSheet, firstRowToDelete:=FIRST_DESTINATION_ROW
Dim destinationRowIndex As Long
destinationRowIndex = FIRST_DESTINATION_ROW
Dim sourceRowIndex As Long
For sourceRowIndex = FIRST_SOURCE_ROW To lastSourceRow
Dim data As Collection
Set data = GetDataForUrl(sourceSheet.Cells(sourceRowIndex, COLUMN_URL))
With destinationSheet
Dim currentRowData As Variant
For Each currentRowData In data
.Cells(destinationRowIndex, COLUMN_URL).Value = currentRowData("url")
.Cells(destinationRowIndex, COLUMN_EMAIL).Value = currentRowData("emailAddress")
.Cells(destinationRowIndex, COLUMN_FACEBOOK).Value = currentRowData("facebookUrl")
.Cells(destinationRowIndex, COLUMN_ERROR).Value = currentRowData("errorMessage")
destinationRowIndex = destinationRowIndex + 1
Next currentRowData
End With
With sourceSheet
.Cells(sourceRowIndex, COLUMN_EMAIL).Value = data(1)("emailAddress")
.Cells(sourceRowIndex, COLUMN_FACEBOOK).Value = data(1)("facebookUrl")
.Cells(sourceRowIndex, COLUMN_ERROR).Value = data(1)("errorMessage")
End With
DoEvents
Next sourceRowIndex
End Sub
Private Function GetHtmlFromUrl(ByVal someUrl As String) As Object
' Should return a HTML document. Raises an error if URL is unavailable
' (at the time of requesting) or if HTML could not be assigned.
Dim httpClient As Object
Set httpClient = CreateObject("MSXML2.ServerXMLHTTP.6.0")
Dim succeeded As Boolean
httpClient.Open "GET", someUrl, False
On Error Resume Next
httpClient.Send
succeeded = (0 = Err.Number)
On Error GoTo 0
If Not succeeded Then Err.Raise ERR_REQUEST_FAILED, , "Error with URL or timeout"
Dim htmlDocument As Object
Set htmlDocument = CreateObject("htmlfile")
On Error Resume Next
htmlDocument.body.innerHTML = httpClient.responseText
succeeded = (0 = Err.Number)
On Error GoTo 0
If Not succeeded Then Err.Raise ERR_INVALID_HTML, , "Error whilst assigning HTML"
Set GetHtmlFromUrl = htmlDocument
End Function
Private Function GetFacebookUrlsFromHtml(ByVal htmlDocument As Object) As Collection
' Should return a collection of strings that are Facebook URLs detected.
' This function only looks within anchor tags.
Dim outputCollection As Collection
Set outputCollection = New Collection
Dim allAnchorTags As Object
Set allAnchorTags = htmlDocument.getElementsByTagName("a")
Dim anchorTag As Object
For Each anchorTag In allAnchorTags
If InStr(1, UCase$(anchorTag.href), "FACEBOOK", vbBinaryCompare) > 0 Then
On Error Resume Next
outputCollection.Add anchorTag.href, Key:=anchorTag.href ' De-duplicate here
On Error GoTo 0
End If
Next anchorTag
Set GetFacebookUrlsFromHtml = outputCollection
End Function
Private Function GetEmailAddressesFromHtml(ByVal htmlDocument As Object) As Collection
' Should return a collection of strings representing email addresses detected
' in the HTML document.
Dim outputCollection As Collection
Set outputCollection = New Collection
Dim regEx As Object
Set regEx = CreateObject("VBScript.RegExp")
With regEx
.Pattern = "[a-zA-Z0-9_.+-]+#[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+"
.Global = True
Dim emailMatches As Object
Set emailMatches = .Execute(htmlDocument.body.innerHTML)
End With
Dim matchFound As Object
For Each matchFound In emailMatches
On Error Resume Next ' De-duplicate here.
outputCollection.Add matchFound.Value, Key:=matchFound.Value
On Error GoTo 0
Next matchFound
Set GetEmailAddressesFromHtml = outputCollection
End Function
Private Function CreateRowDataForError(ByVal wasSuccess, ByVal errorMessage, ByVal someUrl As String) As Collection
' Context: An error has occurred and we don't have any data (Facebook URLs, email addresses).
' So can only return the URL attempted and the error message.
Dim nestedCollection As Collection
Set nestedCollection = New Collection
nestedCollection.Add wasSuccess, "wasSuccess"
nestedCollection.Add errorMessage, "errorMessage"
nestedCollection.Add someUrl, "url"
nestedCollection.Add vbNullString, "facebookUrl"
nestedCollection.Add vbNullString, "emailAddress"
Set CreateRowDataForError = New Collection
CreateRowDataForError.Add nestedCollection
Debug.Assert 1 = CreateRowDataForError.Count
End Function
Private Function CreateRowDataForResults(ByVal wasSuccess As Boolean, ByVal errorMessage As String, _
ByVal someUrl As String, ByVal facebookUrls As Collection, ByVal emailAddresses As Collection) As Collection
' Context: No error occurred. HTML document may or may not contain data,
' but logic below should handle both scenarios.
Dim nestedCollection As Collection
Dim outerCollection As Collection
Set outerCollection = New Collection
Dim i As Long
For i = 1 To Application.Max(1, facebookUrls.Count, emailAddresses.Count)
Set nestedCollection = New Collection
nestedCollection.Add wasSuccess, Key:="wasSuccess"
nestedCollection.Add errorMessage, Key:="errorMessage"
nestedCollection.Add someUrl, Key:="url"
nestedCollection.Add GetCollectionItemOrDefault(facebookUrls, i, vbNullString), Key:="facebookUrl"
nestedCollection.Add GetCollectionItemOrDefault(emailAddresses, i, vbNullString), Key:="emailAddress"
outerCollection.Add nestedCollection
Next i
Debug.Assert outerCollection.Count = Application.Max(1, facebookUrls.Count, emailAddresses.Count)
Set CreateRowDataForResults = outerCollection
End Function
Private Function GetDataForUrl(ByVal someUrl As String) As Collection
' Currently this function misuses Collection class. Should probably instead write a class to return a custom object/data structure.
' Returns a collection of nested collections, where each nested collection is as below:
' • "wasSuccess" = whether data was successfully retrieved
' • "errorMessage" = an error message mentioning what happened
' • "facebookUrl" = a Facebook URL detected
' • "emailAddress" = an email address detected
Dim wasSuccess As Boolean
Dim errorMessage As String
Dim htmlDocument As Object
On Error Resume Next
Set htmlDocument = GetHtmlFromUrl(someUrl)
wasSuccess = (0 = Err.Number)
If Not wasSuccess Then
errorMessage = IIf(ERR_REQUEST_FAILED = Err.Number Or ERR_INVALID_HTML = Err.Number, Err.Description, "Unexpected error occurred")
End If
On Error GoTo 0
If Not wasSuccess Then
Set GetDataForUrl = CreateRowDataForError(wasSuccess, errorMessage, someUrl)
Else
Dim facebookUrls As Collection
Set facebookUrls = GetFacebookUrlsFromHtml(htmlDocument)
Dim emailAddresses As Collection
Set emailAddresses = GetEmailAddressesFromHtml(htmlDocument)
Set GetDataForUrl = CreateRowDataForResults(wasSuccess, errorMessage, someUrl, facebookUrls, emailAddresses)
End If
End Function
Private Function GetCollectionItemOrDefault(ByVal someCollection As Collection, ByVal someKey As Variant, ByVal someDefaultValue As Variant) As Variant
' Assumes item is not an object. This function will return false negatives if item being retrieved is an object.
Dim succeeded As Boolean
On Error Resume Next
GetCollectionItemOrDefault = someCollection(someKey)
succeeded = (0 = Err.Number)
On Error GoTo 0
If Not succeeded Then GetCollectionItemOrDefault = someDefaultValue
End Function
From a maintenance and coding perspective, I think something like Node.js or Python would allow you to get the same work done in fewer lines of code.

VBA in Excel 365: XMLHTTP - Can't search for presence of multiple strings for URLs?

So I have a long list of URLs in an Excel sheet. Using a VBA macro I'm checking for the presence of a number of strings in each URL's HTML. I need to know which URL has which combination of strings in them.
I'm outputting the presence or absence of the strings in the main table as essentially: URL being looked at - col 1, presence of 1st string in that URL? - col 2, presence of 2nd string in that URL? - col 3
(And if I can work out how to get this code working, I'll be checking for a few more strings and outputting their presence or absence in further columns).
The strings I'm searching for are encoded into the VBA string.
My code works fine if I'm just searching for the 1st string.
When I adjust the code to search for a 2nd string as well, it doesn't output as it should if my 2nd string is present too.
I'm using a PC, MS Excel 2007 (it's the MS 365 suite on a Lenovo brought this year with Windows 10. I think I'm messing up something easy in the code here, but I can't quite work out how to search for multiple strings rather than just one, and output which ones are present in separate columns next to the URL in question.
Here's the whole code:
Option Explicit
Sub SearchForString()
Dim rngURL As Range
Dim cll As Range
Dim stCheck As String
Dim xmlHttp As Object
On Error Resume Next
Set xmlHttp = CreateObject("Microsoft.XMLHTTP")
If xmlHttp Is Nothing Then
MsgBox "Unable to create XMLHTTP object, it's probably not installed on this machine", vbCritical
Exit Sub
End If
Set rngURL = Application.InputBox("Select the range of URLs to check", "Select Range", Selection, Type:=8)
On Error GoTo 0
If rngURL Is Nothing Then Exit Sub
stCheck = "Stringtofind1"
If Len(stCheck) = 0 Then Exit Sub
For Each cll In rngURL.Cells
If CheckURL(xmlHttp, cll.Value, stCheck) Then
cll.Offset(, 1).Value = "1st string found" 'This all works fine
End If
' ** Start of Bit I can't get to work **
' ** I want to check for a second string, and see if that's here. But I can't seem to get this bit to work? **
If CheckURL(xmlHttp, cll.Value, "2ndStringtofind") Then
cll.Offset(, 2).Value = "2nd string also found"
End If
' ** End of Bit I can't get to work **
Next cll
End Sub
Private Function CheckURL(ByRef xmlHttp As Object, ByVal URL As String, ByVal stCheck As String) As Boolean
Dim stResult As String
If Not LCase$(URL) Like "https://*" Then
URL = "https://" & URL
End If
xmlHttp.Open "GET", URL, False
xmlHttp.Send ""
If xmlHttp.readyState = 4 Then
If xmlHttp.Status = 200 Then
stResult = xmlHttp.responseText
If InStr(1, stResult, stCheck, vbBinaryCompare) > 0 Then
CheckURL = True 'Returns position where string is found!
End If
End If
End If
End Function
To make it easier to read, here's the bit where I think that I'm struggling when I add it to the script:
If CheckURL(xmlHttp, cll.Value, "2ndStringtofind") Then
cll.Offset(, 2).Value = "2nd string also found"
End If
2ndStringtofind isn't a variable, but the actual text I'm looking for.
All help gratefully received in debugging this code. Thanks guys.

VBA Loop through row and return value

Really would appreciate some help on this. I have a vba script that scrapes data, it opens the URL that is contained in the L column, L4 in this example. Then later down the script it inputs my given value into Col E, Row 4.
Sub ImportData()
...
With CreateObject("msxml2.xmlhttp")
.Open "GET", Range("L4"), False 'Cell that contains hyperlink
.send
HTML_Content.body.innerHTML = .responseText
End With
...
'Then I want to return a value
Sheets(1).Range("E4").Value = rng1.Offset(0, 1)
End Sub
I am trying to make a loop so that the script runs automatically and cycles through column L and runs the script for every row that contains a hyperlink in Col L, and then inputs the value to its respective row in Col E.
I have tried changing the code below that another user suggested without success:
Sub ImportData(urlToOpen as string)
...
.Open "GET", urlToOpen, False 'Cell that contains hyperlink
...
'Then I want to return a value
Sheets(1).Range(E, i).Value = rng1.Offset(0, 1) ' I know that's wrong
and add a calling procedure:
Sub CallRangeL_Urls()
For Each i In Sheet1.Range("L4:L200")
Call ImportData(i)
Next i
End Sub
I keep getting ByRef type argument mismatch error on Call ImportData(i)
Also I am not sure what so ever on how to acheive calling the value to the specific row that is being processed in the loop. Any help would be greatly appreciated. Thanks
Try the following:
Public Sub ImportData(ByVal urlToOpen As String)
And
Public Sub CallRangeL_Urls()
Dim i As Range
For Each i In Sheet1.Range("L4:L200")
ImportData i.Value
Next i
End Sub
Personally, I would reference the workbook as well and I usually use Worksheets("SheetName") but I know a lot of people like to use codeName.
You only want to pass the value of the link in the cell so ByVal is the appropriate way.
As touching the sheet is expensive I would probably dump the urls in an array and loop that, adding a basic test that I am working with an url:
Public Sub CallRangeL_Urls()
Dim arr(), i As Long
arr = Application.Transpose(Sheet1.Range("L4:L200").Value)
For i = LBound(arr) To UBound(arr)
If InStr(arr(i), "http") > 0 Then ImportData arr(i)
Next i
End Sub
To write out extracted value to column E, same row as url, I think you need to convert your ImportData sub to a function that returns the extracted value. Or better still, create a class to hold the xmlhttp object which then has a method to return the value (that way you don't keep creating and destroying the object - which you do if you create the object in the function. You could also create the xmlhttp object in the first sub and pass to the function as an argument - I show you pseudo code for that below).
Public Sub CallRangeL_Urls()
Dim arr(), i As Long
'code to create xmlhttp object
arr = Application.Transpose(Sheet1.Range("L4:L200").Value)
For i = LBound(arr) To UBound(arr)
If InStr(arr(i), "http") > 0 Then
Sheet1.Cells(i + 3, "E") = ImportData(arr(i), xmlhttpObject)
End If
Next i
End Sub
Public Function ImportData(ByVal urlToOpen As String, ByVal xmlhttpObject As Object) As String
''Any declarations
'Dim extractedValue As String
'Dim html As HTMLDocument
'Set html = New HTMLDocument
With xmlhttpObject
.Open "GET", urlToOpen, False
.send
html.body.innerHTML = .responseText
''code to extract value
'extractedValue = html.querySelector("someSelector")
ImportData = extractedValue
End Function

Excel VBA: get error code for invalid URL in hyperlink with WinHttpRequest

In Excel, I have a list with URLs. I need to check if IE (default browser) can open these. They don't have to open actually, it's to check the accessibility.
If they can't open, I need to isolate the error-code and place that in another column.
After searching around here, I started with following the hyperlinks, and used GET to get the data in a MsgBox. This seems to work partially, but of course now I get the MsgBox with every URL also without error. Also I'm looking for a way to extract the error and place that in the active sheet.
What I've got so far:
Sub Request_Data()
' declare
numRow = 2
Dim MyRequest As Object
' activate URLs without Follow
Do While ActiveSheet.Range("C" & numRow).Hyperlinks.Count > 0
numRow = numRow + 1
' create request
Set MyRequest = CreateObject("WinHttp.WinHttpRequest.5.1")
MyRequest.Open "GET", _
ActiveSheet.Range("C" & numRow)
' send request
MyRequest.Send
' outcome
MsgBox MyRequest.ResponseText
' isolate the error code (for example 404)
' place error code in excel sheet in column H next to row URL
Loop
End Sub
Does someone know how I should proceed?
I thought this might be useful but I don't know where to start.
Checking for broken hyperlinks in Excel
and
Bulk Url checker macro excel
Thanks in advance
See the code below - you will need to adapt the Test sub-routine to loop through your cells and call IsValidUrl for each value you want to test:
Option Explicit
Sub Test()
MsgBox IsValidUrl("http://www.thisdoesnotexistxxxxxxxxxxxxx.com/")
MsgBox IsValidUrl("http://www.google.com/")
MsgBox IsValidUrl("http://www.ppppppppppppqqqqqqqqqqqqqqrrrrrrrrrrrrr.com/")
End Sub
Function IsValidUrl(strUrl As String) As Long
Dim objRequest As Object
Dim lngCode As Long
On Error GoTo ErrHandler
Set objRequest = CreateObject("WinHttp.WinHttpRequest.5.1")
With objRequest
.Open "GET", strUrl
.Send
lngCode = 0
End With
GoTo ExitHandler
ErrHandler:
lngCode = Err.Number
ExitHandler:
Set objRequest = Nothing
IsValidUrl = lngCode
End Function
My output is:
-2147012889
0
-2147012889

Sort dead hyperlinks in Excel with VBA?

The title says it:
I have an excel Sheet with an column full of hyperlinks. Now I want that an VBA Script checks which hyperlinks are dead or work and makes an entry into the next columns either with the text 404 Error or active.
Hopefully someone can help me because I am not really good at VB.
EDIT:
I found # http://www.utteraccess.com/forums/printthread.php?Cat=&Board=84&main=1037294&type=thread
A solution which is made for word but the Problem is that I need this solution for Excel. Can someone translate this to Excel solution?
Private Sub testHyperlinks()
Dim thisHyperlink As Hyperlink
For Each thisHyperlink In ActiveDocument.Hyperlinks
If thisHyperlink.Address <> "" And Left(thisHyperlink.Address, 6) <> "mailto" Then
If Not IsURLGood(thisHyperlink.Address) Then
Debug.Print thisHyperlink.Address
End If
End If
Next
End Sub
Private Function IsURLGood(url As String) As Boolean
' Test the URL to see if it is good
Dim request As New WinHttpRequest
On Error GoTo IsURLGoodError
request.Open "GET", url
request.Send
If request.Status = 200 Then
IsURLGood = True
Else
IsURLGood = False
End If
Exit Function
IsURLGoodError:
IsURLGood = False
End Function
First add a reference to Microsoft XML V3 (or above), using Tools->References. Then paste this code:
Option Explicit
Sub CheckHyperlinks()
Dim oColumn As Range
Set oColumn = GetColumn() ' replace this with code to get the relevant column
Dim oCell As Range
For Each oCell In oColumn.Cells
If oCell.Hyperlinks.Count > 0 Then
Dim oHyperlink As Hyperlink
Set oHyperlink = oCell.Hyperlinks(1) ' I assume only 1 hyperlink per cell
Dim strResult As String
strResult = GetResult(oHyperlink.Address)
oCell.Offset(0, 1).Value = strResult
End If
Next oCell
End Sub
Private Function GetResult(ByVal strUrl As String) As String
On Error Goto ErrorHandler
Dim oHttp As New MSXML2.XMLHTTP30
oHttp.Open "HEAD", strUrl, False
oHttp.send
GetResult = oHttp.Status & " " & oHttp.statusText
Exit Function
ErrorHandler:
GetResult = "Error: " & Err.Description
End Function
Private Function GetColumn() As Range
Set GetColumn = ActiveWorkbook.Worksheets(1).Range("A:A")
End Function
Gary's code is perfect, but I would rather use a public function in a module and use it in a cell as function. The advantage is that you can use it in a cell of your choice or anyother more complex function.
In the code below I have adjusted Gary's code to return a boolean and you can then use this output in an =IF(CHECKHYPERLINK(A1);"OK";"FAILED"). Alternatively you could return an Integer and return the status itself (eg.: =IF(CHECKHYPERLINK(A1)=200;"OK";"FAILED"))
A1: http://www.whatever.com
A2: =IF(CHECKHYPERLINK(A1);"OK";"FAILED")
To use this code please follow Gary's instructions and additionally add a module to the workbook (right click on the VBAProject --> Insert --> Module) and paste the code into the module.
Option Explicit
Public Function CheckHyperlink(ByVal strUrl As String) As Boolean
Dim oHttp As New MSXML2.XMLHTTP30
On Error GoTo ErrorHandler
oHttp.Open "HEAD", strUrl, False
oHttp.send
If Not oHttp.Status = 200 Then CheckHyperlink = False Else CheckHyperlink = True
Exit Function
ErrorHandler:
CheckHyperlink = False
End Function
Please also be aware that, if the page is down, the timeout can be long.

Resources