I am trying to use VBA in Excel to access data in a table that sits on a page embedded (framed) inside another webpage. I know how to do this if the table is on a non-embedded page, and I also know how to navigate to this product's page using VBA. I cannot navigate to the embedded page directly because a product-ID lookup converts a part number to an ID, and I don't have access to that database.
Here is the link to the page: http://support.automation.siemens.com/WW/view/en/7224052
I would have posted a picture of the element for clarity but I don't have 10 rep points...
The table I need to get information from is the "Product Life Cycle" table.
I can see the correct URL in a property called src under the corresponding item if I save the page as an HTMLDocument in VBA using the following code:
Dim doc As HTMLDocument
For Each cell In Selection
    link = "http://support.automation.siemens.com/US/llisapi.dll?func=cslib.csinfo&lang=en&objid=" & cell & "&caller=view"
    ie.navigate link
    Do
        DoEvents
    Loop Until ie.readyState = READYSTATE_COMPLETE
    Set doc = ie.document ' save the loaded page as an HTMLDocument
Next cell
Is there a way to index this table with VBA, or will I have to contact the company and try to get access to the product-ID lookup so I can navigate to the page directly?
Regarding my comment below, here is the code that a recorded macro yields:
ActiveCell.FormulaR1C1 = _
    "http://support.automation.siemens.com/WW/llisapi.dll?func=cslib.csinfo&lang=en&objid=6ES7194-1AA01-0XA0&caller=view"
Range("F9").Select
With ActiveSheet.QueryTables.Add(Connection:= _
    "URL;http://support.automation.siemens.com/WW/llisapi.dll?func=ll&objid=7224052&nodeid0=10997566&caller=view&lang=en&siteid=cseus&aktprim=0&objaction=csopen&extranet=standard&viewreg=WW" _
    , Destination:=Range("$F$9"))
    .FieldNames = True
    .RowNumbers = False
I know where to find the string: URL;http://support.automation.siemens.com/WW/llisapi.dll?func=ll&objid=7224052&nodeid0=10997566&caller=view&lang=en&siteid=cseus&aktprim=0&objaction=csopen&extranet=standard&viewreg=WW, but I don't know how to save it to a variable.
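If the goal is just to capture that connection string in code, one possibility (a minimal sketch, assuming the query table added by the recorded macro is the first one on the active sheet) is to read it back from the QueryTable's Connection property:

Dim connString As String
'QueryTables(1) is assumed to be the query table created above
connString = ActiveSheet.QueryTables(1).Connection
Debug.Print connString   'should print the "URL;http://support.automation.siemens.com/..." string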
Not sure I exactly understand your question, but here is some code that will get the source code behind the table of interest. You can extract the data of interest using functions like InStr and Mid.
' open IE, navigate to the website of interest and loop until fully loaded
Set ie = CreateObject("InternetExplorer.Application")
With ie
    .Visible = True
    .navigate "http://support.automation.siemens.com/WW/view/en/7224052"
    .Top = 50
    .Left = 530
    .Height = 400
    .Width = 400
    Do Until Not ie.Busy And ie.readyState = 4
        DoEvents
    Loop
End With

' Assign the source code behind the page to a variable
my_var = ie.document.frames(3).document.DocumentElement.innerHTML

' Extract the url for the "Product life cycle" table
pos_1 = InStr(1, my_var, "product life cycle", vbTextCompare)
pos_2 = InStr(pos_1, my_var, "/WW/llisapi", vbTextCompare)
pos_3 = InStr(pos_2, my_var, """><", vbTextCompare)
pos_4 = InStr(pos_3, my_var, """/>", vbTextCompare)
table_url = Mid(my_var, pos_2, pos_3 - pos_2)
table_url = Replace(table_url, "amp;", "", 1, -1, vbTextCompare)
table_url = "http://support.automation.siemens.com" & table_url

' navigate to the table url
ie.navigate table_url
Do Until Not ie.Busy And ie.readyState = 4
    DoEvents
Loop

' assign the source code for this page to a variable and extract the desired information
my_var2 = ie.document.body.innerHTML
pos_1 = InStr(1, my_var2, "ET 200X, basic modules,", vbTextCompare)

' close ie
ie.Quit
I have had problems getting Ron's code to work, I think because IE doesn't work easily with frames. Below is some code that will extract some of the data from the table you mentioned; it doesn't yet handle the diagrams.
Sub FrameStrip()
    Dim oFrames As Object
    Dim tdelements As Object
    Dim tdElement As Object
    Dim oFrame As MSHTML.HTMLFrameElement
    Dim oElement As Object
    Dim sString As String
    Dim myVar As Variant
    Dim sLinks() As String
    Dim i As Integer
    Dim bfound As Boolean
    Dim url As String
    Dim oIE As InternetExplorer

    Set oIE = New InternetExplorer
    url = "http://support.automation.siemens.com/WW/view/en/7224052"

    'Set address for use with relative source names
    myVar = Split(url, "/")
    sString = myVar(0) & "//" & myVar(2)

    oIE.navigate url
    oIE.Visible = True
    Do Until (oIE.readyState = 4 And Not oIE.Busy)
        DoEvents
    Loop

    Set oFrames = oIE.document.getElementsByTagName("frame")
    ReDim sLinks(oFrames.Length)

    'Get the source locations for each frame
    i = 0
    For Each oFrame In oFrames
        sLinks(i) = sString & (oFrame.getAttribute("src"))
        i = i + 1
    Next oFrame

    'Go through each frame to find the table
    i = 0
    bfound = False
    Do While i < UBound(sLinks) And bfound = False
        oIE.navigate sLinks(i)
        Do Until (oIE.readyState = 4 And Not oIE.Busy)
            DoEvents
        Loop
        Set oElement = oIE.document.getElementById("produktangaben")
        bfound = IsSet(oElement)
        i = i + 1
    Loop

    Set tdelements = oElement.getElementsByTagName("td")

    'Display information about table
    sString = ""
    For Each tdElement In tdelements
        Debug.Print tdElement.innerText
        sString = sString & tdElement.innerText
    Next tdElement
End Sub
Function IsSet(ByRef oElement As Object) As Boolean
    Dim tdelements As Object
    Dim bSet As Boolean

    bSet = True
    On Error GoTo ErrorSet
    Set tdelements = oElement.getElementsByTagName("td")
    On Error GoTo 0
Cleanup:
    On Error Resume Next
    Set tdelements = Nothing
    On Error GoTo 0
    IsSet = bSet
    Exit Function
ErrorSet:
    bSet = False
    GoTo Cleanup
End Function
I am trying to extract the Business Name from a website, but I am receiving an error on this line:

For iCnt = 0 To .getElementsByTagName("h2").Length - 1
I need to extract all the details available on the website, such as:
Business Name
Address
Telephone
Fax
Email
Website
in a sequence so all the details can be pasted into an Excel file.
Option Explicit

Const sSiteName = "https://www.thoroughexamination.org/postcode-search/nationwide?page=1"

Private Sub getHTMLContents()
    ' Create Internet Explorer object.
    Dim IE As Object
    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = False ' Keep this hidden.
    IE.Navigate sSiteName

    ' Wait till IE is fully loaded.
    While IE.ReadyState <> 4
        DoEvents
    Wend

    Dim oHDoc As HTMLDocument ' Create document object.
    Set oHDoc = IE.Document

    Dim oHEle As HTMLUListElement ' Create HTML element (<ul>) object.
    Set oHEle = oHDoc.getElementById("search-item-header") ' Get the element reference using its ID.

    Dim iCnt As Integer

    ' Loop through elements inside the <ul> element and find <h2>, which has the texts we want.
    With oHEle
        For iCnt = 0 To .getElementsByTagName("h2").Length - 1
            Debug.Print .getElementsByTagName("h2").Item(iCnt).getElementsByTagName("a").Item(0).innerHTML
        Next iCnt
    End With

    ' Clean up.
    IE.Quit
    Set IE = Nothing
    Set oHEle = Nothing
    Set oHDoc = Nothing
End Sub
I get no response from the second piece of code:
Sub TutorailsPoint()
    Const URL = "https://www.thoroughexamination.org/postcode-search/nationwide?page=1"
    Dim http As New MSXML2.XMLHTTP60, html As New HTMLDocument
    Dim topics As Object, posts As Object, topic As Object
    Dim x As Long

    x = 1
    http.Open "GET", URL, False
    http.send
    html.body.innerHTML = http.responseText

    Set topics = html.getElementsByClassName("search-item-header")
    For Each posts In topics
        For Each topic In posts.getElementsByTagName("h2")
            Cells(x, 1) = topic.innerText
            x = x + 1
        Next topic
    Next posts
End Sub
You can do what you want with XHR if you send a User-Agent header. Here is code that fetches every page of your posted URL and writes one row per dataset, with whichever fields are present.
Sub TutorailsPoint()
    Dim doc As Object
    Dim url As String
    Dim page As Long
    Dim hits As Long
    Dim maxPage As Long
    Dim maxPageKnown As Boolean
    Dim currRow As Long
    Dim nodeAllGroups As Object
    Dim nodeOneGroup As Object
    Dim nodeContactData As Object
    Dim nodeWebSite As Object
    Dim telephone As Boolean

    page = 1
    maxPage = 1
    currRow = 2
    Set doc = CreateObject("htmlFile")

    With CreateObject("MSXML2.ServerXMLHTTP.6.0")
        'Call all pages
        Do
            url = "https://www.thoroughexamination.org/postcode-search/nationwide?page=" & page
            .Open "GET", url, False
            .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0"
            .send

            If .Status = 200 Then
                doc.body.innerHTML = .responseText

                'How many pages to call (20 hits per page, integer division)
                If Not maxPageKnown Then
                    hits = CLng(doc.getElementsByClassName("summary")(0).getElementsByTagName("b")(1).innertext)
                    maxPage = hits \ 20
                    If hits Mod 20 <> 0 Then
                        maxPage = maxPage + 1
                    End If
                    maxPageKnown = True
                    'Debug.Print maxPage
                End If

                Set nodeAllGroups = doc.getElementsByClassName("group")
                For Each nodeOneGroup In nodeAllGroups
                    'Business name
                    Cells(currRow, 1) = nodeOneGroup.getElementsByTagName("h2")(0).innertext
                    'Address
                    Cells(currRow, 2) = nodeOneGroup.getElementsByTagName("p")(0).innertext
                    'Contact block
                    Set nodeContactData = nodeOneGroup.getElementsByClassName("depot")
                    If nodeContactData.Length <> 0 Then
                        'Telephone
                        If InStr(1, nodeContactData(0).innertext, "tel:") > 0 Then
                            Cells(currRow, 3).NumberFormat = "#"
                            Cells(currRow, 3) = Trim(nodeContactData(0).getElementsByTagName("strong")(0).innertext)
                            telephone = True
                        End If
                        'Fax
                        If InStr(1, nodeContactData(0).innertext, "fax:") > 0 Then
                            Cells(currRow, 4).NumberFormat = "#"
                            If telephone Then
                                Cells(currRow, 4) = Trim(Replace(nodeContactData(0).getElementsByTagName("p")(0).FirstChild.NextSibling.NextSibling.NextSibling.NodeValue, "fax:", ""))
                            Else
                                Cells(currRow, 4) = Trim(Replace(nodeContactData(0).getElementsByTagName("p")(0).FirstChild.NodeValue, "fax:", "")) 'not sure, never seen a row without a telephone
                            End If
                        End If
                        'Email
                        If InStr(1, nodeContactData(0).innertext, "email:") > 0 Then
                            Cells(currRow, 5) = Trim(nodeContactData(0).getElementsByTagName("a")(0).innertext)
                        End If
                        'Website
                        Set nodeWebSite = nodeContactData(0).getElementsByClassName("website")
                        If nodeWebSite.Length > 0 Then
                            Cells(currRow, 6) = Trim(nodeWebSite(0).innertext)
                        End If
                    End If
                    telephone = False
                    currRow = currRow + 1
                Next nodeOneGroup
            Else
                Cells(currRow, 1) = "Page not loaded. HTTP status " & .Status
                Cells(currRow, 2) = url
                currRow = currRow + 1
            End If
            page = page + 1 'always advance, otherwise a failed page would be retried forever
        Loop While page <= maxPage
    End With
End Sub
Internet Explorer has been retired by Microsoft, so it's not a good idea to keep using it. In VBA, my preferred option now is Selenium (via the Selenium Type Library) for scraping the web. To get started with Selenium the right way, take care of the following:
Update the related .Net Framework
Install Selenium Basic app
Download the version of Chromedriver.exe (see the latest version here) that exactly matches the version of Google Chrome installed on the machine. Extract Chromedriver.exe and put it in the folder where Selenium Basic was installed (I installed Selenium Basic here: 'C:\Program Files\SeleniumBasic')
Add the reference in the VBE to the ActiveX library: 'Selenium Type Library'
Put the code in a standard module in the VBE:
Sub fnGetDataFromWWW()
    Dim oWD As WebDriver
    Dim post As Selenium.WebElement
    Dim groups As Selenium.WebElements
    Dim strText As String
    Dim intItem As Integer

    Set oWD = New Selenium.WebDriver
    oWD.Start "chrome"
    DoEvents
    oWD.Get "https://www.thoroughexamination.org/postcode-search/nationwide?page=1"
    DoEvents

    Set groups = oWD.FindElementsByClass("group")
    For Each post In groups
        strText = post.Attribute("outerText")
        For intItem = 0 To UBound(Split(strText, Chr(10)))
            If Trim(Split(strText, Chr(10))(intItem)) <> "" Then
                Debug.Print Split(strText, Chr(10))(intItem)
            End If
        Next
    Next post
End Sub
I am looking to follow a series of URLs found in column A (example: https://www.ebay.com/itm/Apple-iPhone-7-GSM-Unlocked-Verizon-AT-T-TMobile-Sprint-32GB-128GB-256GB/352381131997?epid=225303158&hash=item520b8d5cdd:m:mWgYDe4a79NeLuAlV-RmAQA:rk:7:pf:0) and pull the following information from them:
- Title
- Price
- Description
I think there are multiple issues with my code... For one, I can't get the program to follow the specific URLs listed in the worksheet (only one that I specify within the code). Also, pulling multiple fields has given me issues.
Option Explicit

Public Sub ListingInfo()
    Dim ie As New InternetExplorer, ws As Worksheet, t As Date
    Dim i As Integer
    i = 0
    Do While Worksheets("Sheet1").Cells(i, 1).Value <> ""
        Const MAX_WAIT_SEC As Long = 10
        Set ws = ThisWorkbook.Worksheets("Sheet1")
        With ie
            .Visible = True
            .Navigate2 Worksheets("Sheet1").Cells(i, 1).Value
            While .Busy Or .readyState < 4: DoEvents: Wend
            Dim Links As Object, i As Long, count As Long
            t = Timer
            Do
                On Error Resume Next
                Set Title = .document.querySelectorAll("it-ttl")
                Set price = .document.querySelectorAll("notranslate")
                Set Description = .document.querySelectorAll("ds_div")
                count = Links.Length
                On Error GoTo 0
                If Timer - t > MAX_WAIT_SEC Then Exit Do
            Loop While count = 0
            For i = 0 To Title.Length - 1
                ws.Cells(i + 1, 1) = Title.item(i)
                ws.Cells(i + 1, 2) = price.item(i)
                ws.Cells(i + 1, 3) = Description.item(i)
            Next
            .Quit
        i = i + 1
    Loop
    End With
End Sub
I would use late binding for MSXML2.XMLHTTP and set a reference to the Microsoft HTML Object Library for the HTMLDocument.
Note: querySelector() returns only the first item that matches its search string.
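As a small illustration of that note (a sketch only; Document is an already-loaded MSHTML.HTMLDocument, and .viSNotesCnt is the class used in the code below):

Dim firstNote As Object, allNotes As Object
Set firstNote = Document.querySelector(".viSNotesCnt")   'first match, or Nothing
Set allNotes = Document.querySelectorAll(".viSNotesCnt") 'collection of every match
Debug.Print allNotes.Length                              'number of matches found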
Here is the short version:
Public Sub ListingInfo()
    Dim cell As Range
    With ThisWorkbook.Worksheets("Sheet1")
        For Each cell In .Range("A1", .Cells(.Rows.Count, 1).End(xlUp))
            Dim Document As MSHTML.HTMLDocument
            With CreateObject("MSXML2.XMLHTTP")
                .Open "GET", cell.Value, False
                .send
                Set Document = New MSHTML.HTMLDocument
                Document.body.innerHTML = .responseText
            End With

            cell.Offset(0, 1).Value = Document.getElementById("itemTitle").innerText
            cell.Offset(0, 2).Value = Document.getElementById("prcIsum").innerText

            If Not Document.querySelector(".viSNotesCnt") Is Nothing Then
                cell.Offset(0, 3).Value = Document.querySelector(".viSNotesCnt").innerText
            Else
                'Try Something Else
            End If
        Next
    End With
End Sub
A more elaborate solution is to break the code up into smaller routines and load the data into an array. The main advantage of this is that you can test each subroutine separately.
Option Explicit

Public Type tListingInfo
    Description As String
    Price As Currency
    Title As String
End Type

Public Sub ListingInfo()
    Dim source As Range
    Dim data As Variant
    With ThisWorkbook.Worksheets("Sheet1")
        Set source = .Range("A1:D1", .Cells(.Rows.count, 1).End(xlUp))
        data = source.Value
    End With

    Dim r As Long
    Dim record As tListingInfo
    Dim url As String
    For r = 1 To UBound(data)
        url = data(r, 1)
        record = getListingInfo(url)
        With record
            data(r, 2) = .Description
            data(r, 3) = .Price
            data(r, 4) = .Title
        End With
    Next
    source.Value = data
End Sub

Public Function getListingInfo(url As String) As tListingInfo
    Dim ListingInfo As tListingInfo
    Dim Document As MSHTML.HTMLDocument
    Set Document = getHTMLDocument(url)
    With ListingInfo
        .Description = Document.getElementById("itemTitle").innerText
        .Price = Split(Document.getElementById("prcIsum").innerText)(1)
        .Title = Document.querySelectorAll(".viSNotesCnt")(0).innerText
        Debug.Print .Description, .Price, .Title
    End With
    getListingInfo = ListingInfo 'return the populated record
End Function

Public Function getHTMLDocument(url As String) As MSHTML.HTMLDocument
    Const READYSTATE_COMPLETE As Long = 4
    Dim Document As MSHTML.HTMLDocument
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", url, False
        .send
        If .readyState = READYSTATE_COMPLETE And .Status = 200 Then
            Set Document = New MSHTML.HTMLDocument
            Document.body.innerHTML = .responseText
            Set getHTMLDocument = Document
        Else
            MsgBox "URL: " & vbCrLf & "Ready state: " & .readyState & vbCrLf & "HTTP request status: " & .Status, vbInformation, "URL Not Responding"
        End If
    End With
End Function
There are a lot of things to fix in your code. It is late here, so I will just give pointers for now (and update fully later); working code is below:
Declare all variables and use appropriate type
Review For loops and how Transpose can be used to create a 1D array of URLs pulled from the sheet to loop over
Review the difference between the querySelector and querySelectorAll methods
Review CSS selectors (you are specifying everything as a type selector when in fact you are selecting neither by tag for the elements of interest nor by your stated text); see the short sketch after this list
Think about placement of your IE object creation and of your .Navigate2 to make use of existing object
Make sure to use distinct loop counters
Be sure not to overwrite values in sheet
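To illustrate the selector point (a minimal sketch; doc stands for an already-loaded MSHTML document, and the class/id names are the ones used in the working code below):

Dim title As Object, price As Object
'"it-ttl" alone is a type (tag) selector and would match <it-ttl> elements, which don't exist;
'".it-ttl" is a class selector and "#prcIsum" is an id selector
Set title = doc.querySelector(".it-ttl")  'first element with class "it-ttl"
Set price = doc.querySelector("#prcIsum") 'element with id "prcIsum"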
Code:
Option Explicit

Public Sub ListingInfo()
    Dim ie As New InternetExplorer, ws As Worksheet
    Dim i As Long, urls(), rowCounter As Long
    Dim title As Object, price As Object, description As Object

    Set ws = ThisWorkbook.Worksheets("Sheet1")
    urls = Application.Transpose(ws.Range("A1:A2").Value) '<= Adjust

    With ie
        .Visible = True
        For i = LBound(urls) To UBound(urls)
            If InStr(urls(i), "http") > 0 Then
                rowCounter = rowCounter + 1
                .Navigate2 urls(i)
                While .Busy Or .readyState < 4: DoEvents: Wend
                Set title = .document.querySelector(".it-ttl")
                Set price = .document.querySelector("#prcIsum")
                Set description = .document.querySelector("#viTabs_0_is")
                ws.Cells(rowCounter, 3) = title.innerText
                ws.Cells(rowCounter, 4) = price.innerText
                ws.Cells(rowCounter, 5) = description.innerText
                Set title = Nothing: Set price = Nothing: Set description = Nothing
            End If
        Next
        .Quit
    End With
End Sub
Here's an approach using web requests via MSXML. It should be significantly faster than using IE, and I'd encourage you to strongly consider using this approach wherever possible.
You'll need references to Microsoft HTML Object Library and Microsoft XML v6.0 to get this working.
Option Explicit

Public Sub SubmitRequest()
    Dim URLs As Excel.Range
    Dim URL As Excel.Range
    Dim LastRow As Long
    Dim wb As Excel.Workbook: Set wb = ThisWorkbook
    Dim ws As Excel.Worksheet: Set ws = wb.Worksheets(1)
    Dim ListingDetail As Variant
    Dim i As Long
    Dim j As Long
    Dim html As HTMLDocument

    ReDim ListingDetail(0 To 2, 0 To 10000)

    'Get URLs
    With ws
        LastRow = .Cells(.Rows.Count, 1).End(xlUp).Row
        Set URLs = .Range(.Cells(1, 1), .Cells(LastRow, 1))
    End With

    'Update the ListingDetail
    For Each URL In URLs
        Set html = getHTML(URL.Value2)
        ListingDetail(0, i) = html.getElementById("itemTitle").innertext 'Title
        ListingDetail(1, i) = html.getElementById("prcIsum").innertext 'Price
        ListingDetail(2, i) = html.getElementsByClassName("viSNotesCnt")(0).innertext 'Seller Notes
        i = i + 1
    Next

    'Resize array
    ReDim Preserve ListingDetail(0 To 2, 0 To i - 1)

    'Dump in Column T,U,V of existing sheet
    ws.Range("T1:V" & i).Value = WorksheetFunction.Transpose(ListingDetail)
End Sub

Private Function getHTML(ByVal URL As String) As HTMLDocument
    'Add a reference to Microsoft HTML Object Library
    Set getHTML = New HTMLDocument
    With New MSXML2.XMLHTTP60
        .Open "GET", URL, False 'synchronous request, so responseText is ready after send
        .send
        getHTML.body.innerHTML = .responseText
    End With
End Function
I have written some VBA code in Excel to retrieve the latitude and longitude from a Google Maps URL and paste them into a cell in my worksheet. My problem is retrieving the URL from Internet Explorer. Below are two examples of my code: one macro returns about:blank, as though the object doesn't have the LocationURL property, and the other seems to retain all of my previous searches, so it cycles through them and pastes the very first search's URL. Example 2 uses a shell suggestion I found online to reassign the properties of the oIE object. I can get both to partly work, but neither does exactly what I need from the macro.
Cell(8,8) is a hyperlink to Google Maps where I'm searching an address, and Cell(8,9) is where I want to paste the URL after Google Maps has redirected and has the latitude and longitude in the URL.
Example 1:
Sub CommandButton1_Click()
    Dim ie As Object
    Dim Doc As HTMLDocument
    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = True
    ie.Navigate "http://www.google.com/maps?q=" & Range("I7").Value
    Do
        DoEvents
    Loop Until ie.ReadyState = 4
    Set Doc = ie.Document
    Cells(8, 9).Value = ie.LocationName
End Sub
Example 2:
Sub Macro()
    Dim oIE, oShell, objShellWindows, strPath, X
    strPath = Cells(8, 8)
    Set oIE = CreateObject("InternetExplorer.Application")

    'This is to resolve the oIE.navigate "about:blank" issue
    oIE.Top = 0
    oIE.Left = 0
    oIE.Width = 500
    oIE.Height = 500
    oIE.Navigate strPath
    Do While oIE.Busy And oIE.ReadyState < 2
        DoEvents
    Loop

    'Reassigning oIE.LocationName & vbCrLf & oIE.LocationURL values after redirect in IE
    Set oShell = CreateObject("WScript.Shell")
    Set objShellWindows = CreateObject("Shell.Application").Windows
    For X = objShellWindows.Count - 1 To 0 Step -1
        Set oIE = objShellWindows.Item(X)
        If Not oIE Is Nothing Then
            If StrComp(oIE.LocationURL, strPath, 1) = 0 Then
                Do While oIE.Busy And oIE.ReadyState < 2
                    DoEvents
                Loop
                oIE.Visible = 2
                Exit For
            End If
        End If
        Cells(8, 9).Value = oIE.LocationURL
        Set oIE = Nothing
    Next
    Set objShellWindows = Nothing
    Set oIE = Nothing
End Sub
Thanks,
Andrew
Is this as simple as looping until the document.URL changes? In my timed loop I wait for the string safe=vss in the original page load to disappear.
Option Explicit

Public Sub GetNewURL()
    Dim IE As New InternetExplorer, newURL As String, t As Date
    Const MAX_WAIT_SEC As Long = 5
    With IE
        .Visible = True
        .navigate2 "http://www.google.com/maps?q=" & "glasgow" '<==Range("I7").Value
        While .Busy Or .readyState < 4: DoEvents: Wend
        t = Timer
        Do
            DoEvents
            newURL = .document.URL
            If Timer - t > MAX_WAIT_SEC Then Exit Do
        Loop While InStr(newURL, "safe=vss") > 0
        Debug.Print newURL
    End With
End Sub
I have VBA code that selects info from drop-down menus on a government website and then submits the query. The requested data then opens in another IE page. I am trying to copy this data into Excel; however, I am unable to do so.
My code currently copies the text on the first IE page, which contains the drop-down menus. The government website is: http://www.osfi-bsif.gc.ca/Eng/wt-ow/Pages/FINDAT.aspx
I have looked all over the internet for a solution but nothing seems to work...
Here is my code:
Sub GetOsfiFinancialData()
    Dim UrlAddress As String
    UrlAddress = "http://ws1.osfi-bsif.gc.ca/WebApps/FINDAT/DTIBanks.aspx?T=0&LANG=E"

    Dim ie As Object
    Set ie = CreateObject("internetexplorer.application")
    With ie
        .Silent = True
        .Visible = False
        .navigate UrlAddress
    End With

    Do Until Not ie.Busy And ie.readyState = 4
        DoEvents
    Loop
    Application.Wait (Now() + TimeValue("00:00:05"))

    'Select Bank
    ie.document.getElementById("DTIWebPartManager_gwpDTIBankControl1_DTIBankControl1_institutionTypeCriteria_institutionsDropDownList").Value = Z005

    'open window with financial data
    Dim objButton
    Set objButton = ie.document.getElementById("DTIWebPartManager_gwpDTIBankControl1_DTIBankControl1_submitButton")
    objButton.Focus
    objButton.Click

    'select new pop-up window
    marker = 0
    Set objshell = CreateObject("Shell.Application")
    IE_count = objshell.Windows.Count
    For x = 0 To (IE_count - 1)
        On Error Resume Next ' sometimes more web pages are counted than are open
        my_title = objshell.Windows(x).document.Title
        If my_title Like "Consolidated Monthly Balance Sheet" & "*" Then 'compare to find if the desired web page is already open
            Set ie = objshell.Windows(x)
            marker = 1
            Exit For
        Else
        End If
    Next

    Do Until Not ie.Busy And ie.readyState = 4
        DoEvents
    Loop
    Application.Wait (Now() + TimeValue("00:00:05"))

    Dim doc As MSHTML.HTMLDocument
    Dim tables As MSHTML.IHTMLElementCollection
    Dim table As MSHTML.HTMLTable
    Dim clipboard As MSForms.DataObject

    Set doc = ie.document
    Set tables = doc.getElementsByTagName("body")
    Set table = tables(0)
    Set clipboard = New MSForms.DataObject

    'paste in sheets
    Dim test
    Set test = ActiveWorkbook.Sheets("Test")
    clipboard.SetText table.outerHTML
    clipboard.PutInClipboard
    test.Range("A1").PasteSpecial xlPasteAll
    clipboard.Clear
    MsgBox ("Task Completed")
End Sub
Your help is greatly appreciated!
Your current test used document.Title with a partial match. I found that a For Each over all shell windows, matching on the full title, worked in combination with copy-pasting the pop-up window's outerHTML. No additional wait time was required.
Inside the For Each loop, after you reset the ie instance to the new window, you can obtain the new URL with ie.document.url. But as you already have the data loaded, you might as well just copy-paste it straight away, in my opinion.
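For instance (a two-line sketch; currentWindow and ie are the variables used in the code below):

Set ie = currentWindow
Debug.Print ie.document.url 'URL of the redirected pop-up window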
Code:
Option Explicit

Public Sub GetOsfiFinancialData()
    Dim UrlAddress As String, objButton, ie As Object
    UrlAddress = "http://ws1.osfi-bsif.gc.ca/WebApps/FINDAT/DTIBanks.aspx?T=0&LANG=E"
    Set ie = CreateObject("internetexplorer.application")
    With ie
        .Silent = True
        .Visible = False
        .navigate UrlAddress
        While .Busy Or .readyState < 4: DoEvents: Wend

        .document.getElementById("DTIWebPartManager_gwpDTIBankControl1_DTIBankControl1_institutionTypeCriteria_institutionsDropDownList").Value = "Z005"
        Set objButton = .document.getElementById("DTIWebPartManager_gwpDTIBankControl1_DTIBankControl1_submitButton")
        objButton.Focus
        objButton.Click

        Dim objShellWindows As New SHDocVw.ShellWindows, currentWindow As IWebBrowser2
        For Each currentWindow In objShellWindows
            If currentWindow.document.Title = "Consolidated Monthly Balance Sheet - Banks, Trust and Loan" Then
                Set ie = currentWindow
                Exit For
            End If
        Next

        Dim clipboard As Object
        Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
        clipboard.SetText ie.document.body.outerHTML
        clipboard.PutInClipboard
        ThisWorkbook.Worksheets("Sheet1").Cells(1, 1).PasteSpecial
        .Quit
    End With
End Sub
References (VBE > Tools > References):
Microsoft Internet Controls
I don't have time to get into all the details of controlling one browser from another, but I think you can figure that part out, especially since you have made some great progress on this already. Get URL #2 from URL #1, like you are doing, but with some better data controls around it, and then do this...
Option Explicit

Sub Web_Table_Option_One()
    Dim xml As Object
    Dim html As Object
    Dim objTable As Object
    Dim result As String
    Dim lRow As Long
    Dim lngTable As Long
    Dim lngRow As Long
    Dim lngCol As Long
    Dim ActRw As Long

    Set xml = CreateObject("MSXML2.XMLHTTP.6.0")
    With xml
        .Open "GET", "http://ws1.osfi-bsif.gc.ca/WebApps/Temp/2f40b7ef-d024-4eca-a8a3-fb82153efafaFinancialData.aspx", False
        .send
    End With
    result = xml.responseText

    Set html = CreateObject("htmlfile")
    html.body.innerHTML = result
    Set objTable = html.getElementsByTagName("Table")

    For lngTable = 0 To objTable.Length - 1
        For lngRow = 0 To objTable(lngTable).Rows.Length - 1
            For lngCol = 0 To objTable(lngTable).Rows(lngRow).Cells.Length - 1
                ThisWorkbook.Sheets("Sheet1").Cells(ActRw + lngRow + 1, lngCol + 1) = objTable(lngTable).Rows(lngRow).Cells(lngCol).innerText
            Next lngCol
        Next lngRow
        ActRw = ActRw + objTable(lngTable).Rows.Length + 1
    Next lngTable
End Sub
I am trying to scrape some data from a database, and I have it pretty much set up. In IE I look for a tab where I am logged in to the database, and paste the query link there through VBA. But how do I extract the data that it returns from the IE tab and put it into an Excel cell or array?
This is the code I have for opening my query:
Sub import()
    Dim row As Integer
    Dim strTargetFile As String
    Dim wb As Workbook
    Dim test As String
    Dim ie As Object

    Call Fill_Array_Cultivar
    For row = 3 To 4
        Sheets.Add.Name = Cultivar_Array(row, 1)
        strTargetFile = "https://www3.wipo.int/pluto/user/jsp/select.jsp?fl=app_date%2Cden_info%2Cden_final&hl=false&json.nl=map&wt=json&type=upov&start=0&qi=3-nNCXQ6etEVv184O9nnd5yg%3D%3D&q=cc%3AIT%20AND%20latin_name%3A(zea%20mays)%20AND%20den_info%3A" & Trim(Cultivar_Array(row, 1)) & "&facet=false"
        Set ie = GetIE("https://www3.wipo.int" & "*")
        If Not ie Is Nothing Then
            ie.navigate (strTargetFile)
        Else
            MsgBox "IE not found!"
        End If
    Next row
End Sub
And this is the appropriate function:
'Find an IE window with a matching (partial) URL
'Assumes no frames.
Function GetIE(sAddress As String) As Object
    Dim objShell As Object, objShellWindows As Object, o As Object
    Dim retVal As Object, sURL As String

    Set retVal = Nothing
    Set objShell = CreateObject("Shell.Application")
    Set objShellWindows = objShell.Windows

    'see if IE is already open
    For Each o In objShellWindows
        sURL = ""
        On Error Resume Next
        sURL = o.document.Location
        On Error GoTo 0
        If sURL <> "" Then
            If sURL Like sAddress & "*" Then
                Set retVal = o
                Exit For
            End If
        End If
    Next o
    Set GetIE = retVal
End Function
What the website returns to me is a white page with a line of text. Here is an example:
{"response":{"start":0,"docs":[{"den_final":"Abacus","app_date":"1998-01-13T22:59:59Z"}],"numFound":1},"qi":"3-nNCXQ6etEVv184O9nnd5yg==","sv":"bswa2.wipo.int","lastUpdated":1436333633993}
P.S. I also tried using the importxml function; it will import the website, but only an error page, as the site does not recognize me as logged in.
I found the solution, which was fairly simple but hard to find: I can just grab ie.Document.body.innerText, which is all the text I need.
See the code I updated below:
Sub import()
    Dim row As Integer
    Dim strTargetFile As String
    Dim wb As Workbook
    Dim test As String
    Dim ie As Object
    Dim pageText As String

    Call Fill_Array_Cultivar
    For row = 3 To 4
        Sheets.Add.Name = Cultivar_Array(row, 1)
        strTargetFile = "https://www3.wipo.int/pluto/user/jsp/select.jsp?fl=app_date%2Cden_info%2Cden_final&hl=false&json.nl=map&wt=json&type=upov&start=0&qi=3-nNCXQ6etEVv184O9nnd5yg%3D%3D&q=cc%3AIT%20AND%20latin_name%3A(zea%20mays)%20AND%20den_info%3A" & Trim(Cultivar_Array(row, 1)) & "&facet=false"
        Set ie = GetIE("https://www3.wipo.int" & "*")
        If Not ie Is Nothing Then
            ie.navigate (strTargetFile)
            Do Until ie.ReadyState = 4: DoEvents: Loop
            pageText = ie.Document.body.innertext
            ActiveSheet.Cells(1, 1) = pageText
            pageText = Empty
        Else
            MsgBox "IE not found!"
        End If
    Next row
End Sub
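If you need individual fields rather than the whole JSON line, note that VBA has no built-in JSON parser, but for a flat response like the example above you can pull a value out with InStr and Mid. A minimal sketch (the GetJsonValue helper is hypothetical; "den_final" is a field name taken from the example response):

Function GetJsonValue(json As String, key As String) As String
    'naive extraction of "key":"value" from a flat JSON string (string values only)
    Dim p1 As Long, p2 As Long
    p1 = InStr(1, json, """" & key & """:""", vbTextCompare)
    If p1 = 0 Then Exit Function
    p1 = p1 + Len(key) + 4     'skip past the matched "key":" prefix
    p2 = InStr(p1, json, """") 'closing quote of the value
    GetJsonValue = Mid(json, p1, p2 - p1)
End Function

'e.g. GetJsonValue(pageText, "den_final") returns "Abacus" for the example response above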