Getting relevant details from the web via code - Excel

Hi, I am new to VBA and trying to upgrade my skills.
I am trying to get the "Owner Name" and "Mailing Address" from the web link below:
https://www.pbcgov.org/papa/Asps/PropertyDetail/PropertyDetail.aspx?parcel=30424032060001820
by using the ID in Sheet1, cell A1:
30-42-40-32-06-000-1820 (that ID identifies a person whose name and mailing address should be pasted into columns B and C).
I have tried but could not make it work.
Any help would be appreciated.
Sub Data()
    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = True
    Url = "https://www.pbcgov.org/papa/?fbclid=IwAR28Ao4d0Ic5hTcd4w6BYv5FwaVYKFc3sCtmcqPI8Ctw2Q0jUy2zIdc7I-c"

    'Wait for site to fully load
    ie.Navigate2 Url
    Do While ie.Busy = True
        DoEvents
    Loop

    RowCount = 1
    With Sheets("Sheet1")
        .Cells.ClearContents
        RowCount = 1
        For Each itm In ie.document.all
            .Range("A" & RowCount) = itm.tagname
            .Range("B" & RowCount) = itm.ID
            .Range("C" & RowCount) = Left(itm.innertext, 1024)
            RowCount = RowCount + 1
        Next itm
    End With
End Sub

This may be a little advanced but gives another way of looking at the problem.
The info you want is spread over two tables, and over several rows within those tables: one table for the owner info (split across several lines) and one table, likewise, for the mailing address.
You can use the CSS pattern #ownerInformationDiv table:nth-child(1) to isolate both of those tables, which are returned in a nodeList by applying the querySelectorAll method of ie.document.
Loop over each table and, within a given table, loop over its rows (ignoring the header row), concatenating the text found in each row. Once the text for a given table is combined, write it out to the sheet.
Other things to note include:
The full page load wait
While .Busy Or .ReadyState <> READYSTATE_COMPLETE: DoEvents: Wend
Qualifying object with class
Dim ie As SHDocVw.InternetExplorer
Use of a descriptive title
Public Sub WriteOutOwnersInfo()
VBA:
Option Explicit

Public Sub WriteOutOwnersInfo()
    Dim ie As SHDocVw.InternetExplorer
    Set ie = New SHDocVw.InternetExplorer

    With ie
        .Visible = True
        .Navigate2 "https://www.pbcgov.org/papa/Asps/PropertyDetail/PropertyDetail.aspx?parcel=30424032060001820"
        While .Busy Or .ReadyState <> READYSTATE_COMPLETE: DoEvents: Wend

        Dim tables As Object
        Set tables = .Document.querySelectorAll("#ownerInformationDiv table:nth-child(1)")

        Dim currTable As Object, currRow As Object, c As Long
        Dim i As Long, j As Long, lineOutput As String

        For i = 0 To tables.Length - 1
            Set currTable = tables.Item(i)
            lineOutput = vbNullString
            For j = 1 To tables.Item(i).Rows.Length - 1
                Set currRow = currTable.Rows(j)
                lineOutput = lineOutput & Chr$(32) & Trim$(currRow.innerText)
            Next
            c = c + 1
            ActiveSheet.Cells(1, c) = Trim$(lineOutput)
        Next

        .Quit
    End With
End Sub
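As asked in the question, the lookup key is the hyphenated ID sitting in Sheet1 cell A1, with the owner name going to column B and the mailing address to column C. A minimal sketch of tying those pieces to the approach above, assuming the parcel value in the URL is simply the A1 text with the hyphens removed and that the two tables come back in owner-then-address order (both assumptions should be checked against the page; the sub name is mine):

Public Sub WriteOwnerToColumnsBAndC()
    Dim ie As SHDocVw.InternetExplorer, parcel As String, tables As Object
    Dim lineOutput As String, i As Long, j As Long

    'Assumed: stripping the hyphens from A1 gives the parcel value used in the URL
    parcel = Replace$(ThisWorkbook.Worksheets("Sheet1").Range("A1").Value, "-", vbNullString)

    Set ie = New SHDocVw.InternetExplorer
    With ie
        .Visible = True
        .Navigate2 "https://www.pbcgov.org/papa/Asps/PropertyDetail/PropertyDetail.aspx?parcel=" & parcel
        While .Busy Or .ReadyState <> READYSTATE_COMPLETE: DoEvents: Wend

        Set tables = .Document.querySelectorAll("#ownerInformationDiv table:nth-child(1)")
        For i = 0 To tables.Length - 1
            lineOutput = vbNullString
            For j = 1 To tables.Item(i).Rows.Length - 1          'skip header row
                lineOutput = lineOutput & " " & Trim$(tables.Item(i).Rows(j).innerText)
            Next
            'Assumed order: first table = owner name -> B1, second = mailing address -> C1
            ThisWorkbook.Worksheets("Sheet1").Cells(1, 2 + i).Value = Trim$(lineOutput)
        Next
        .Quit
    End With
End Sub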

Related

Paste table from web into range of cells

I am trying to get a table from a website into my Excel sheet. Since the website has a login and I need to click a few buttons to get to the table, I am using VBA.
The code I have so far is just a test; it is not the actual website that I am trying to log into.
So far, the code is able to launch the website and get the inner text from the table, but it only pastes it into a single cell. How can I paste the table while keeping the same row and column layout?
Sub test()
    Set IE = CreateObject("InternetExplorer.application")
    IE.Visible = True
    IE.navigate ("https://www.w3schools.com/html/html_tables.asp")
    Do
        If IE.readyState = 4 Then
            IE.Visible = True
            Exit Do
        Else
            DoEvents
        End If
    Loop

    'get data
    Dim tbl As HTMLTable
    Set tbl = IE.document.getElementById("customers")
    Cells(1, 1) = tbl.innerText
End Sub
You may perform web scraping using the following code enhancement; it works well:
Sub test()
    Dim IE As Object
    Set IE = CreateObject("InternetExplorer.application")
    IE.Visible = True
    IE.navigate ("https://www.w3schools.com/html/html_tables.asp")
    Do
        If IE.readyState = 4 Then
            IE.Visible = True
            Exit Do
        Else
            DoEvents
        End If
    Loop

    'get data
    Dim tbl As HTMLTable
    Dim class1 As IHTMLElement, rowText As IHTMLElement, item As IHTMLElement
    Dim rowNum As Long, colNum As Long

    Set class1 = IE.document.getElementById("customers").children(0)
    rowNum = 0
    For Each rowText In class1.children
        rowNum = rowNum + 1
        colNum = 0
        For Each item In rowText.children
            colNum = colNum + 1
            Sheet1.Cells(rowNum, colNum).Value = item.innerText
        Next
    Next
End Sub
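Writing cell by cell works, but it costs one sheet write per table cell. Since writing to the sheet inside a loop is relatively expensive (a point the later answers make as well), the same table could be read into a 2D Variant array and written to the range in one operation. A sketch only, under the assumption that the "customers" table stays rectangular, as it is on that page:

Sub TestTableViaArray()
    Dim IE As Object, tbl As Object, data() As Variant
    Dim r As Long, c As Long

    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = True
    IE.navigate "https://www.w3schools.com/html/html_tables.asp"
    While IE.Busy Or IE.readyState <> 4: DoEvents: Wend

    Set tbl = IE.document.getElementById("customers")
    ReDim data(1 To tbl.Rows.Length, 1 To tbl.Rows(0).Cells.Length)

    For r = 0 To tbl.Rows.Length - 1
        For c = 0 To tbl.Rows(r).Cells.Length - 1
            data(r + 1, c + 1) = tbl.Rows(r).Cells(c).innerText
        Next
    Next

    'one block write to the sheet instead of one write per cell
    Sheet1.Range("A1").Resize(UBound(data, 1), UBound(data, 2)).Value = data
    IE.Quit
End Sub

The single Resize write also keeps rows and columns aligned exactly as they appear in the HTML table.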

Extracting HTML code from websites by looping through list of URLs

I'm using Excel VBA to launch an IE browser tab based on the URL in each of the rows in column D. Then the relevant HTML code is extracted based on pre-defined classes and populated in columns A - C.
Pretty sure I missed a step. The process stops at D2 and does not proceed to extract HTML from the next URLs (in cells D3, D4, etc).
Thanks in advance for any suggestions!
Sub useClassnames()
    Dim element As IHTMLElement
    Dim elements As IHTMLElementCollection
    Dim IE As InternetExplorer
    Dim html As HTMLDocument
    Dim shellWins As New ShellWindows
    Dim IE_TabURL As String
    Dim intRowPosition As Integer

    Set IE = New InternetExplorer
    IE.Visible = False
    intRowPosition = 2

    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = True
    IE.navigate Sheet1.Range("D" & intRowPosition)
    While IE.Busy
        DoEvents
    Wend
    intRowPosition = intRowPosition + 1

    While Sheet1.Range("D" & intRowPosition) <> vbNullString
        IE.navigate Sheet1.Range("D" & intRowPosition), CLng(2048)
        While IE.Busy
            DoEvents
        Wend
        intRowPosition = intRowPosition + 1
    Wend

    Do While IE.readyState <> READYSTATE_COMPLETE
        Application.StatusBar = "Loading Web page..."
        DoEvents
    Loop

    Set html = IE.document
    Set elements = html.getElementsByClassName("container-bs")

    Dim count As Long
    Dim erow As Long
    count = 0
    For Each element In elements
        If element.className = "container-bs" Then
            erow = Sheet1.Cells(Rows.count, 1).End(xlUp).Offset(1, 0).Row
            Cells(erow, 1) = html.getElementsByClassName("pull-left")(count + 1).innerHTML
            Cells(erow, 2) = html.getElementsByClassName("description")(count).innerHTML
            Cells(erow, 3) = html.getElementsByClassName("related-articles")(count).innerHTML
            count = count + 1
        End If
    Next element

    Range("A2:C2000").Select
    Columns("A:A").EntireColumn.AutoFit
    Columns("B:B").ColumnWidth = 36
End Sub
Your lines
Set html = IE.document
Set elements = html.getElementsByClassName("container-bs")
etc. happen after the While loop. They need to be inside it.
Your If statement:
If element.className = "container-bs"
is redundant, as you are already looping over a collection of that class name, so I have removed it.
You are not working off element inside the loop, so essentially you are only using it to drive your incremented counter variable. This suggests you could use a better strategy for retrieving the items of interest.
Always state the parent worksheet and don't rely on implicit ActiveSheet references - that is bug-prone.
I would expect a structure more like the following (I have not refactored to remove element):
Option Explicit

Public Sub UseClassnames()
    Dim element As IHTMLElement, elements As IHTMLElementCollection, ie As InternetExplorer
    Dim html As HTMLDocument, intRowPosition As Long

    intRowPosition = 2
    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = True

    While Sheet1.Range("D" & intRowPosition) <> vbNullString
        If intRowPosition = 2 Then
            ie.navigate Sheet1.Range("D" & intRowPosition)
        Else
            ie.navigate Sheet1.Range("D" & intRowPosition), CLng(2048)
        End If
        While ie.Busy Or ie.readyState < 4: DoEvents: Wend

        Set html = ie.document
        Set elements = html.getElementsByClassName("container-bs")

        Dim count As Long, erow As Long
        count = 0
        For Each element In elements
            erow = Sheet1.Cells(Rows.count, 1).End(xlUp).Offset(1, 0).Row
            With Sheet1
                .Cells(erow, 1) = html.getElementsByClassName("pull-left")(count + 1).innerHTML
                .Cells(erow, 2) = html.getElementsByClassName("description")(count).innerHTML
                .Cells(erow, 3) = html.getElementsByClassName("related-articles")(count).innerHTML
            End With
            count = count + 1
        Next element

        intRowPosition = intRowPosition + 1
    Wend

    With Sheet1
        .Columns("A:A").EntireColumn.AutoFit
        .Columns("B:B").ColumnWidth = 36
    End With
End Sub
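One further defensive tweak worth considering: a bare While ie.Busy Or ie.readyState < 4 loop spins forever if a page never finishes loading. Later code in this thread guards similar waits with a MAX_WAIT_SEC timed loop; that idea can be wrapped in a small helper so every navigation gets a bounded wait. A sketch only - the helper name and default timeout are mine:

'Waits for IE to finish loading, but gives up after maxWaitSec seconds.
Private Function WaitForLoad(ByVal ie As Object, Optional ByVal maxWaitSec As Long = 10) As Boolean
    Dim t As Single
    t = Timer
    Do While ie.Busy Or ie.readyState <> 4
        DoEvents
        If Timer - t > maxWaitSec Then Exit Function   'timed out: returns False
    Loop
    WaitForLoad = True
End Function

Each wait line then becomes something like If Not WaitForLoad(ie) Then Exit Sub (or a logged skip of that URL).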

How to Improve Data Scraping Using VBA?

I have the below code, which fetches data from an intranet site, but it is taking a long time to fetch the data. Can someone help me modify the code to improve performance?
Thanks in advance.
Note: I haven't posted the URL as it is a client's website. Sorry about that.
Sub FetchData()
    Dim IE As Object
    Dim Doc As HTMLDocument
    Dim myStr As String

    On Error Resume Next
    Set IE = CreateObject("InternetExplorer.Application") 'SetBrowser
    IE.Visible = False
    IE.navigate "URL" 'Open website
    Do While IE.Busy Or IE.ReadyState <> 4
        Application.Wait DateAdd("s", 1, Now)
    Loop

    Set Doc = IE.Document
    Doc.getElementById("tbxUserID").Value = InputBox("Please Enter Your ID")
    Doc.getElementById("txtPassword").Value = InputBox("Please Enter Your Password")
    Doc.getElementById("BtnLogin").Click
    Do While IE.Busy Or IE.ReadyState <> 4
        Application.Wait DateAdd("s", 1, Now)
    Loop

    IE.navigate "URL"
    Do While IE.Busy Or IE.ReadyState <> 4
        Application.Wait DateAdd("s", 1, Now)
    Loop

    Dim LastRow As Long
    Set wks = ActiveSheet
    LastRow = wks.Cells(wks.Rows.Count, "A").End(xlUp).Row
    Set rowNo = wks.Range("A1:A" & LastRow)
    For rowNo = 2 To LastRow
        Doc.getElementById("txtField1").Value = ThisWorkbook.Sheets("Sheet1").Range("A" & rowNo).Value
        Doc.getElementById("CtrlQuickSearch1_imgBtnSumbit").Click
        Do While IE.Busy Or IE.ReadyState <> 4
            Application.Wait DateAdd("s", 1, Now)
        Loop
        strVal1 = Doc.querySelectorAll("span")(33).innerText
        ThisWorkbook.Sheets("Sheet1").Range("B" & rowNo).Value = strVal1
        strVal2 = Doc.querySelectorAll("span")(35).innerText
        ThisWorkbook.Sheets("Sheet1").Range("C" & rowNo).Value = strVal2
    Next
End Sub
Can't guarantee this will run. Points to note:
Use of Worksheets collection
Use of Option Explicit - this means you then have to use the right datatype throughout. Currently you have undeclared variables and, for example, rowNo is used as a Long and as a range.
Removal of On Error Resume Next
Putting all worksheets into variables
Placement of values into array and looping array to get id values. Looping sheet is expensive
Use of early binding and adding class to InternetExplorer
Assumption that, after login, a new URL is present and that you need to navigate back to it before each new loop value
Removal of Hungarian notation
IDs are the fastest selector method, so no improvement there
With your CSS-type selectors, e.g. .document.querySelectorAll("span")(33), you might check whether there is a shorter single-node selector that can be used, rather than indexing into a nodeList
VBA:
Option Explicit

Public Sub FetchData()
    Dim ie As SHDocVw.InternetExplorer
    Dim lastRow As Long, wks As Worksheet, i As Long, ws As Worksheet
    Dim loopvalues As Variant

    Set ie = New SHDocVw.InternetExplorer 'SetBrowser
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Set wks = ActiveSheet '<==use explicit sheet name if possible

    lastRow = wks.Cells(wks.rows.Count, "A").End(xlUp).Row
    loopvalues = Application.Transpose(wks.Range("A2:A" & lastRow).Value)

    With ie
        .Visible = False
        .Navigate2 "URL" 'Open website
        While .Busy Or .readyState < 4: DoEvents: Wend

        .document.getElementById("tbxUserID").Value = InputBox("Please Enter Your ID")
        .document.getElementById("txtPassword").Value = InputBox("Please Enter Your Password")
        .document.getElementById("BtnLogin").Click
        While .Busy Or .readyState < 4: DoEvents: Wend

        Dim newURL As String, val1 As String, val2 As String
        newURL = .document.URL

        For i = LBound(loopvalues) To UBound(loopvalues)
            .document.getElementById("txtField1").Value = loopvalues(i)
            .document.getElementById("CtrlQuickSearch1_imgBtnSumbit").Click
            While .Busy Or .readyState < 4: DoEvents: Wend

            val1 = .document.querySelectorAll("span")(33).innerText
            ws.Range("B" & i + 1).Value = val1     'loopvalues(1) comes from sheet row 2
            val2 = .document.querySelectorAll("span")(35).innerText
            ws.Range("C" & i + 1).Value = val2

            .Navigate2 newURL
            While .Busy Or .readyState < 4: DoEvents: Wend
        Next

        .Quit
    End With
End Sub
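On the last bullet about nodeList indexing: .querySelectorAll("span")(33) will raise an error if the page happens to render fewer spans than expected, so a length check before reading keeps one bad page from stopping the whole run. A small illustrative helper (the name is mine; the selector and index are taken from the code above):

'Returns the innerText of the n-th matching node, or an empty string if it is not there.
Private Function NthText(ByVal doc As Object, ByVal cssSelector As String, ByVal n As Long) As String
    Dim nodes As Object
    Set nodes = doc.querySelectorAll(cssSelector)
    If nodes.Length > n Then NthText = nodes.Item(n).innerText
End Function

val1 = NthText(.document, "span", 33) then returns an empty string instead of erroring when the node is missing.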

VBA HTML Listing Info Pull

I am looking to follow a series of URLs found in column A (example: https://www.ebay.com/itm/Apple-iPhone-7-GSM-Unlocked-Verizon-AT-T-TMobile-Sprint-32GB-128GB-256GB/352381131997?epid=225303158&hash=item520b8d5cdd:m:mWgYDe4a79NeLuAlV-RmAQA:rk:7:pf:0) and pull the following information from them:
- Title
- Price
- Description
I think there are multiple issues with my code... For one, I can't get the program to follow the specific URLs listed in the worksheet (only one that I specify within the code). Also, pulling multiple fields has given me issues.
Option Explicit
Public Sub ListingInfo()
    Dim ie As New InternetExplorer, ws As Worksheet, t As Date
    Dim i As Integer
    i = 0
    Do While Worksheets("Sheet1").Cells(i, 1).Value <> ""
        Const MAX_WAIT_SEC As Long = 10
        Set ws = ThisWorkbook.Worksheets("Sheet1")
        With ie
            .Visible = True
            .Navigate2 Worksheets("Sheet1").Cells(i, 1).Value
            While .Busy Or .readyState < 4: DoEvents: Wend
            Dim Links As Object, i As Long, count As Long
            t = Timer
            Do
                On Error Resume Next
                Set Title = .document.querySelectorAll("it-ttl")
                Set price = .document.querySelectorAll("notranslate")
                Set Description = .document.querySelectorAll("ds_div")
                count = Links.Length
                On Error GoTo 0
                If Timer - t > MAX_WAIT_SEC Then Exit Do
            Loop While count = 0
            For i = 0 To Title.Length - 1
                ws.Cells(i + 1, 1) = Title.item(i)
                ws.Cells(i + 1, 2) = price.item(i)
                ws.Cells(i + 1, 3) = Description.item(i)
            Next
            .Quit
            i = i + 1
    Loop
    End With
End Sub
I would use late binding for MSXML2.XMLHTTP and set a reference to the Microsoft HTML Object Library for the HTMLDocument.
Note: querySelector() returns the first element that matches its search string.
Here is the short version:
Public Sub ListingInfo()
Dim cell As Range
With ThisWorkbook.Worksheets("Sheet1")
For Each cell In .Range("A1", .Cells(.Rows.Count, 1).End(xlUp))
Dim Document As MSHTML.HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", cell.Value, False
.send
Set Document = New MSHTML.HTMLDocument
Document.body.innerHTML = .responseText
End With
cell.Offset(0, 1).Value = Document.getElementByID("itemTitle").innerText
cell.Offset(0, 2).Value = Document.getElementByID("prcIsum").innerText
If Not Document.querySelector(".viSNotesCnt") Is Nothing Then
cell.Offset(0, 3).Value = Document.querySelector(".viSNotesCnt").innerText
Else
'Try Something Else
End If
Next
End With
End Sub
A more elaborate solution would be to break the code up into smaller routines and load the data into an Array. The main advantage of this is that you can test each subroutine separately.
Option Explicit

Public Type tListingInfo
    Description As String
    Price As Currency
    Title As String
End Type

Public Sub ListingInfo()
    Dim source As Range
    Dim data As Variant
    With ThisWorkbook.Worksheets("Sheet1")
        Set source = .Range("A1:D1", .Cells(.Rows.count, 1).End(xlUp))
        data = source.Value
    End With

    Dim r As Long
    Dim record As tListingInfo
    Dim url As String
    For r = 1 To UBound(data)
        url = data(r, 1)
        record = getListingInfo(url)
        With record
            data(r, 2) = .Description
            data(r, 3) = .Price
            data(r, 4) = .Title
        End With
    Next
    source.Value = data
End Sub

Public Function getListingInfo(url As String) As tListingInfo
    Dim ListingInfo As tListingInfo
    Dim Document As MSHTML.HTMLDocument
    Set Document = getHTMLDocument(url)
    With ListingInfo
        .Description = Document.getElementById("itemTitle").innerText
        .Price = Split(Document.getElementById("prcIsum").innerText)(1)
        .Title = Document.querySelectorAll(".viSNotesCnt")(0).innerText
        Debug.Print .Description, .Price, .Title
    End With
    getListingInfo = ListingInfo
End Function

Public Function getHTMLDocument(url As String) As MSHTML.HTMLDocument
    Const READYSTATE_COMPLETE As Long = 4
    Dim Document As MSHTML.HTMLDocument
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", url, False
        .send
        If .readyState = READYSTATE_COMPLETE And .Status = 200 Then
            Set Document = New MSHTML.HTMLDocument
            Document.body.innerHTML = .responseText
            Set getHTMLDocument = Document
        Else
            MsgBox "URL: " & vbCrLf & "Ready state: " & .readyState & vbCrLf & "HTTP request status: " & .Status, vbInformation, "URL Not Responding"
        End If
    End With
End Function
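Because the routines are split up, each one can be exercised on its own before the loop over column A is wired in. A quick test sketch using the eBay URL from the question (the test sub itself is just illustrative - run it and check the Immediate window):

Public Sub TestSingleListing()
    Dim record As tListingInfo
    'one URL from column A, hard-coded here purely for testing
    record = getListingInfo("https://www.ebay.com/itm/Apple-iPhone-7-GSM-Unlocked-Verizon-AT-T-TMobile-Sprint-32GB-128GB-256GB/352381131997?epid=225303158&hash=item520b8d5cdd:m:mWgYDe4a79NeLuAlV-RmAQA:rk:7:pf:0")
    Debug.Print record.Title, record.Price, record.Description
End Sub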
There are a lot of things to fix in your code. It is late here so I will just give pointers (and update fully later) and working code below:
Declare all variables and use the appropriate types
Review For loops, and how Transpose can be used to create a 1D array of URLs pulled from the sheet to loop over
Review the difference between the querySelector and querySelectorAll methods
Review CSS selectors (you are specifying everything as a type selector when in fact you are not selecting by tag for the elements of interest, nor by your stated text)
Think about the placement of your IE object creation and of your .Navigate2 so as to re-use the existing object
Make sure to use distinct loop counters
Be sure not to overwrite values in the sheet
Code:
Option Explicit

Public Sub ListingInfo()
    Dim ie As New InternetExplorer, ws As Worksheet
    Dim i As Long, urls(), rowCounter As Long
    Dim title As Object, price As Object, description As Object

    Set ws = ThisWorkbook.Worksheets("Sheet1")
    urls = Application.Transpose(ws.Range("A1:A2").Value) '<= Adjust

    With ie
        .Visible = True
        For i = LBound(urls) To UBound(urls)
            If InStr(urls(i), "http") > 0 Then
                rowCounter = rowCounter + 1
                .Navigate2 urls(i)
                While .Busy Or .readyState < 4: DoEvents: Wend

                Set title = .document.querySelector(".it-ttl")
                Set price = .document.querySelector("#prcIsum")
                Set description = .document.querySelector("#viTabs_0_is")

                ws.Cells(rowCounter, 3) = title.innerText
                ws.Cells(rowCounter, 4) = price.innerText
                ws.Cells(rowCounter, 5) = description.innerText

                Set title = Nothing: Set price = Nothing: Set description = Nothing
            End If
        Next
        .Quit
    End With
End Sub
Here's an approach using web requests via MSXML. It should be significantly faster than using IE, and I'd encourage you to strongly consider using this approach wherever possible.
You'll need references to Microsoft HTML Object Library and Microsoft XML v6.0 to get this working.
Option Explicit

Public Sub SubmitRequest()
    Dim URLs As Excel.Range
    Dim URL As Excel.Range
    Dim LastRow As Long
    Dim wb As Excel.Workbook: Set wb = ThisWorkbook
    Dim ws As Excel.Worksheet: Set ws = wb.Worksheets(1)
    Dim ListingDetail As Variant
    Dim i As Long
    Dim j As Long
    Dim html As HTMLDocument

    ReDim ListingDetail(0 To 2, 0 To 10000)

    'Get URLs
    With ws
        LastRow = .Cells(.Rows.Count, 1).End(xlUp).Row
        Set URLs = .Range(.Cells(1, 1), .Cells(LastRow, 1))
    End With

    'Update the ListingDetail
    For Each URL In URLs
        Set html = getHTML(URL.Value2)
        ListingDetail(0, i) = html.getElementById("itemTitle").innerText 'Title
        ListingDetail(1, i) = html.getElementById("prcIsum").innerText 'Price
        ListingDetail(2, i) = html.getElementsByClassName("viSNotesCnt")(0).innerText 'Seller Notes
        i = i + 1
    Next

    'Resize array
    ReDim Preserve ListingDetail(0 To 2, 0 To i - 1)

    'Dump in Columns T, U, V of existing sheet
    ws.Range("T1:V" & i).Value = WorksheetFunction.Transpose(ListingDetail)
End Sub

Private Function getHTML(ByVal URL As String) As HTMLDocument
    'Add a reference to Microsoft HTML Object Library
    Set getHTML = New HTMLDocument
    With New MSXML2.XMLHTTP60
        .Open "GET", URL
        .send
        getHTML.body.innerHTML = .responseText
    End With
End Function

Retrieving a URL from Internet Explorer with VBA

I have written some VBA code in Excel to retrieve the latitude and longitude from a Google Maps URL and paste them into a cell in my worksheet. My problem is in retrieving the URL from Internet Explorer. Below are two examples of my code: one macro returns about:blank, as though the object doesn't have the LocationURL property, and the other seems to hold on to all of my previous searches, so it cycles through them and pastes the very first search's URL. Example 2 uses a Shell suggestion that I found online to reassign the properties to the oIE object. I can get both to partly work, but neither does exactly what I need from the macro.
Cells(8, 8) is a hyperlink to Google Maps where I'm searching an address, and Cells(8, 9) is where I want to paste the URL after Google Maps has redirected and the latitude and longitude are in the URL.
Example 1:
Sub CommandButton1_Click()
    Dim ie As Object
    Dim Doc As HTMLDocument
    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = True
    ie.Navigate "http://www.google.com/maps?q=" & Range("I7").Value
    Do
        DoEvents
    Loop Until ie.ReadyState = 4
    Set Doc = ie.Document
    Cells(8, 9).Value = ie.LocationName
End Sub
Example 2:
Sub Macro()
    Dim oIE, oShell, objShellWindows, strPath, X
    strPath = Cells(8, 8)
    Set oIE = CreateObject("InternetExplorer.Application")

    'This is to resolve oIE.navigate "about:blank" issue
    oIE.Top = 0
    oIE.Left = 0
    oIE.Width = 500
    oIE.Height = 500
    oIE.Navigate strPath
    Do While oIE.Busy And oIE.ReadyState < 2
        DoEvents
    Loop

    'Reassigning oIE.LocationName & vbCrLf & oIE.LocationURL values after redirect in IE
    Set oShell = CreateObject("WScript.Shell")
    Set objShellWindows = CreateObject("Shell.Application").Windows
    For X = objShellWindows.Count - 1 To 0 Step -1
        Set oIE = objShellWindows.Item(X)
        If Not oIE Is Nothing Then
            If StrComp(oIE.LocationURL, strPath, 1) = 0 Then
                Do While oIE.Busy And oIE.ReadyState < 2
                    DoEvents
                Loop
                oIE.Visible = 2
                Exit For
            End If
        End If
        Cells(8, 9).Value = oIE.LocationURL
        Set oIE = Nothing
    Next

    Set objShellWindows = Nothing
    Set oIE = Nothing
End Sub
Thanks,
Andrew
Is this as simple as looping until the document.URL changes? In my timed loop I wait for the string safe=vss in the original page load to disappear.
Option Explicit

Public Sub GetNewURL()
    Dim IE As New InternetExplorer, newURL As String, t As Date
    Const MAX_WAIT_SEC As Long = 5

    With IE
        .Visible = True
        .navigate2 "http://www.google.com/maps?q=" & "glasgow" '<==Range("I7").Value
        While .Busy Or .readyState < 4: DoEvents: Wend

        t = Timer
        Do
            DoEvents
            newURL = .document.URL
            If Timer - t > MAX_WAIT_SEC Then Exit Do
        Loop While InStr(newURL, "safe=vss") > 0

        Debug.Print newURL
    End With
End Sub
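Once the redirected URL has been captured, the original goal was to pull the latitude and longitude out of it and place them in Cells(8, 9). A hedged sketch of that final step, assuming the redirected Maps URL contains an "@lat,lng,zoom" segment - the exact format can vary, so check it against what Debug.Print newURL actually shows:

Public Sub WriteLatLongFromURL(ByVal newURL As String)
    Dim afterAt As String, parts() As String
    If InStr(newURL, "@") > 0 Then
        'text after "@" typically looks like "55.8642,-4.2518,12z/..." (illustrative values)
        afterAt = Split(newURL, "@")(1)
        parts = Split(afterAt, ",")
        If UBound(parts) >= 1 Then
            ActiveSheet.Cells(8, 9).Value = parts(0) & ", " & parts(1)   'latitude, longitude
        End If
    End If
End Sub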
