HTML extracted from website in Excel comes with missing data - excel

I am trying to extract some data from a web page using the following VBA code in Excel:
Sub Extractor_data()
Dim http As New XMLHTTP60, html As New HTMLDocument
With http
.Open "GET", "https://www.pantone.com/connect/11-0105-TPX", False
.send
html.body.innerHTML = .responseText
End With
Range("A1") = html.querySelectorAll("#maincontent div div div div article div.bg")(0).innerHTML
End Sub
But what I get from it has missing data.
I get:
<div class="square" :style="squareStyle"></div>
<div class="code" v-if="color">
<p v-t="i18n.pantoneTiTle"></p>
<p v-text="color.code"></p>
<p v-if="color.name && color.name !== color.code" v-text="color.name"></p>
</div>
I was supposed to get:
<div class="square" style="background-color: rgb(239, 239, 223);"></div>
<div class="code">
<p>PANTONE</p>
<p>11-0104 TPX</p>
<p>Vanilla Ice</p>
</div>
I have tried to understand what is going on and have tried some slightly different approaches, all to no avail.
Does anyone have an explanation and, hopefully, a fix?
Thank you.
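The `v-if`, `v-text` and `:style` attributes in the returned markup are Vue.js template directives. That suggests the server sends a template and the browser's JavaScript fills in the values afterwards. XMLHTTP60 only downloads the response text and never runs that script, which would explain why only the placeholders arrive. A minimal sketch for checking whether a fetched fragment is still an unrendered template; the function name and the marker list are assumptions made for illustration:

```vba
Option Explicit

' Hypothetical helper: returns True when the markup still contains
' template syntax (Vue directives or {{ }} placeholders) rather than
' rendered values.
Function LooksUnrendered(ByVal markup As String) As Boolean
    Dim markers As Variant, marker As Variant

    ' Assumed marker list; extend it for other templating syntaxes.
    markers = Array(" v-if=", " v-text=", " :style=", "{{")

    For Each marker In markers
        If InStr(1, markup, marker, vbTextCompare) > 0 Then
            LooksUnrendered = True
            Exit Function
        End If
    Next marker
End Function
```

If this returns True for the fetched `innerHTML`, the values have to come from something that executes the page's scripts (for example browser automation) or from whatever data request the page itself makes, which is not shown in the question.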

Related

Excel VBA: Working with iFrame via IE Automation

I have a project that I am working on where I am trying to automate a site's behavior via Excel's VBA. So far, I know how to initialize a web browser from VBA, navigate to a website, and perform a simple task such as clicking on an item using the getElementById function and click method. However, I wanted to know how can I go about working with an embedded object(s) that is inside of an iframe.
For example, here is an overview of what the tree structure looks like via HTML source code. Of course, there are a lot more tags, but at least you can get an idea of what it is that I am trying to do.
<html>
<body>
<div>
<iframe class="abs0 hw100" scrolling="no" allowtransparency="true" id="Ifrm1568521068980" src="xxxxx" title="mailboxes - Microsoft Exchange" ldhdlr="1" cf="t" frameborder="0"> <<< ----- This is where I am stuck
<div>
<tbody>
<tr>
etc.....
etc.....
etc.....
</tr>
<tbody>
<div>
----------- close tags
I guess the biggest problem for me is to learn how to manipulate an embedded object(s) that is enclosed inside of an iframe because all of this is still new to me and I am not an expert in writing programs in VBA. Any help or guidance in the right direction will help me out a lot. Also, if you need more information or clarification, please let me know.
Here is the code that I have written so far:
Option Explicit
'Added: Microsoft HTML Object Library
'Added: Microsoft Internet Controls
'Added: Global Variable
Dim URL As String
Dim iE As InternetExplorer
Sub Main()
'Declaring local variables
Dim objCollection As Object
Dim objElement As Object
Dim counterClock As Long
Dim iframeDoc As MSHTML.HTMLDocument 'this variable is from stackoverflow (https://stackoverflow.com/questions/44902558/accessing-object-in-iframe-using-vba)
'Predefining URL
URL = ""
'Created InternetExplorer Object
Set iE = CreateObject("InternetExplorer.Application")
'Launching InternetExplorer and navigating to the URL.
With iE
.Visible = True
.Navigate URL
'Waiting for the site to load.
loadingSite
End With
'Navigating to the page I need help with that contains the iFrame structure.
iE.Document.getElementById("Menu_UsersGroups").Click
'Waiting for the site to load.
loadingSite
'Set iframeDoc = iE.frames("iframename").Document '<<-- this is where the error message happens: 438 - object doesn't support this property or method.
'The iFrame of the page does not have a name. Instead "Ifrm1" is the ID of the iFrame.
End Sub
'Created a function that will be used frequently.
Function loadingSite()
Application.StatusBar = URL & " is loading. Please wait..."
Do While iE.Busy = True Or iE.ReadyState <> 4: Debug.Print "loading": Loop
Application.StatusBar = URL & " Loaded."
End Function
Please note: My knowledge of programming in VBA is on an entry-level. So, please bear with me if I don't understand your answer the first time around. Plus, any nifty documentation or videos about this topic will help me a lot as well. Either way, I'm very determined to learn this language as it is becoming very fun and interesting to me especially when I can get a program to do exactly what it was designed to do. :)
You can try the following code to get elements from the iframe:
IE.Document.getElementsByTagName("iframe")(0).contentDocument.getElementById("txtcontentinput").Value = "BBB"
IE.Document.getElementsByTagName("iframe")(0).contentDocument.getElementById("btncontentSayHello").Click
Sample code as below:
index page:
<input id="txtinput" type="text" /><br />
<input id="btnSayHello" type="button" value="Say Hello" onclick="document.getElementById('result').innerText = 'Hello ' + document.getElementById('txtinput').value" /><br />
<div id="result"></div><br />
<iframe width="500px" height="300px" src="vbaiframecontent.html">
</iframe>
vbaiframecontent.html:
<input id="txtcontentinput" type="text" /><br />
<input id="btncontentSayHello" type="button" value="Say Hello" onclick="document.getElementById('content_result').innerText = 'Hello ' + document.getElementById('txtcontentinput').value" /><br />
<div id="content_result"></div>
The VBA script as below:
Sub extractTablesData1()
'we define the essential variables
Dim IE As Object, Data As Object
Dim ticket As String
Set IE = CreateObject("InternetExplorer.Application")
With IE
.Visible = True
.navigate ("<your website url>")
While IE.ReadyState <> 4
DoEvents
Wend
Set Data = IE.Document.getElementsByTagName("input")
'Fill in the text box on the main page and click its button.
IE.Document.getElementById("txtinput").Value = "AAA"
IE.Document.getElementById("btnSayHello").Click
'Do the same inside the iframe, reached through contentDocument.
IE.Document.getElementsByTagName("iframe")(0).contentDocument.getElementById("txtcontentinput").Value = "BBB"
IE.Document.getElementsByTagName("iframe")(0).contentDocument.getElementById("btncontentSayHello").Click
End With
Set IE = Nothing
End Sub

How to click on the button after data paste

I am trying to click a button after pasting a value into a text box, but none of the code I have tried works. The same column contains two text boxes and a few buttons. I managed to open a new frame by clicking the first button with the code below:
Set the_button_elements = doc.getElementsByTagName("div")
For Each button_element In the_button_elements
If button_element.getAttribute("class") = "pzbtn-mid" Then
button_element.Click
Exit For
End If
Next button_element
I have tried changing the tag name and attribute, and I have also tried the code below, but it still doesn't work:
doc.querySelectorAll("[type=button]").item(3).Click
Below is the element for the first button, which works with the above code:
<button type="button" class="pzhc" id="AcctNumber"
disabled="" onclick="this.disabled=true;javascript: LoadAcct();"
title="Search">
<div class="pzbtn-lft">
<div class="pzbtn-rgt">
<div class="pzbtn-mid" data-click="...">
<img class="pzbtn-i">Go</div></div></div></button>
Below is the element for the next button, which I am trying to click:
<button type="button" class="pzhc" id="SearchBtn"
disabled="true" onclick="getCaseDetails();">
<div class="pzbtn-lft">
<div class="pzbtn-rgt">
<div class="pzbtn-mid">
<img class="pzbtn-i">Search</div></div></div></button>
I would appreciate it if someone could provide code that clicks this button. The button stays greyed out even after the text is pasted. The first button was in the same state, yet the code above still worked on it.
The issue is probably related to the disabled attribute. To click the button, first make sure it is enabled. After the first button is clicked, or in the text box's change event, you could use the following JavaScript to enable the Search button:
document.getElementById("SearchBtn").disabled = false;
You could check the following sample:
<button type="button" class="pzhc" id="AcctNumber" onclick='this.disabled=true;javascript: alert("Go"); document.getElementById("SearchBtn").disabled = false;'
title="Search">
<div class="pzbtn-lft">
<div class="pzbtn-rgt">
<div class="pzbtn-mid" data-click="...">
<img class="pzbtn-i">Go
</div>
</div>
</div>
</button>
<button type="button" class="pzhc" id="SearchBtn"
disabled="true" onclick='javascript: alert("Search");'>
<div class="pzbtn-lft">
<div class="pzbtn-rgt">
<div class="pzbtn-mid">
<img class="pzbtn-i">Search
</div>
</div>
</div>
</button>
With that page, the following VBA script clicks the Go button, which enables the Search button, and then clicks Search:
Sub login()
Const Url$ = "https://dillion132.github.io/vbatestpage.html"
Dim ie As Object
Set ie = CreateObject("InternetExplorer.Application")
With ie
.navigate Url
ieBusy ie
.Visible = True
'Find the related button
Dim button_go_element As Object, button_search As Object
Set button_go_element = .document.getElementById("AcctNumber")
Set button_search = .document.getElementById("SearchBtn")
button_go_element.Click
button_search.Click
End With
End Sub
Sub ieBusy(ie As Object)
Do While ie.Busy Or ie.readyState < 4
DoEvents
Loop
End Sub
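If the page's own script cannot be changed, the same `disabled` property can also be cleared from VBA before clicking. A minimal sketch, assuming `doc` is the already loaded document (for example `ie.document`) and that the button id is `SearchBtn` as in the question; the procedure name is hypothetical:

```vba
Option Explicit

' Hypothetical helper: enables the Search button if necessary, then clicks it.
' doc is assumed to be a loaded HTML document such as ie.document.
Sub ClickSearchButton(ByVal doc As Object)
    Dim btn As Object

    Set btn = doc.getElementById("SearchBtn")
    If btn Is Nothing Then Exit Sub          ' button not found on this page

    ' A disabled button ignores clicks, so clear the flag first.
    If btn.disabled Then btn.disabled = False
    btn.Click
End Sub
```

Called as `ClickSearchButton ie.document` once the page has finished loading. Whether the page's `getCaseDetails()` handler then behaves as expected depends on the site and is not something this sketch can guarantee.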

Read straight web content with Excel VBA

There are many articles on this site about reading tags and tables from websites with Excel VBA, but I am stuck here.
This website gives me business locations after entering a Zip code.
("Where is the closest location relative to my Zip Code")
I managed to navigate to the site, enter the Zip code and click Submit:
Dim Browser As SHDocVw.InternetExplorer
Dim HTMLDoc As MSHTML.HTMLDocument
Set Browser = New SHDocVw.InternetExplorer ' create a browser
Browser.Visible = True ' make it visible
Application.StatusBar = ".... opening page"
Browser.navigate "https://www.thewebsite.com" ' navigate to page
WaitForBrowser Browser, 1 ' wait for completion or timeout
Application.StatusBar = "gaining control over DOM object"
Set HTMLDoc = Browser.document ' load the DOM object
WaitForBrowser Browser, 1
HTMLDoc.getElementById("ZipCode").Value = "28278"
HTMLDoc.getElementById("localTeamZipSubmit").Click
The site opens and the relevant content looks like this:
<div>
<div class="columns">
<div class="column boldText paddingFive" style="padding-left: 20px; width: 70px;">
Location:
</div>
<div class="column paddingTopFive">CHARLOTTE</div>
</div>
<div class="columns">
<div class="column boldText paddingFive" style="padding-left: 20px; width: 120px;">
Location Number:
</div>
<div class="column paddingTopFive">102340</div>
</div>
<div class="columns">
<div class="column boldText paddingTopFive paddingLeftTwenty" style="vertical-align: top;">
Address:
</div>
<div class="column paddingTopFive paddingLeftTwenty">
<div>8848 Main St.</div>
<div>Suite F</div>
<div></div>
<div>Charlotte, NC 27218</div>
</div>
</div>
<div class="columns">
<div class="column boldText paddingFive" style="padding-left: 20px; width: 70px;">
Phone:
</div>
<div class="column paddingTopFive">(704) 911-4440</div>
</div>
<div class="columns">
<div class="column boldText paddingFive" style="padding-left: 20px; width: 70px;">
Fax:
</div>
<div class="column paddingTopFive">(704) 911-4441</div>
</div>
</div>
As you can see, this section has no table and no named tags, and the same classes are used over and over.
I have not been able to read this information yet. I would be happy to get the whole blob into a String and parse it, with something like:
Text = HTMLDoc.getEverything()
Thanks a lot for your help!!!
In the meantime I found another code snippet and modified it, but I get stuck at the same point.
Posting and submitting works, but how do I get the response?
Private Sub PostalCodes()
    Dim ie As Object
    Set ie = CreateObject("InternetExplorer.Application")
    On Error GoTo errHandler
    ie.Visible = True
    With ie
        .navigate "https://www.pattersondental.com/ContactUs/MyLocalTeam"
        Do While .Busy: DoEvents: Loop
        Do While .ReadyState <> 4: DoEvents: Loop
        With .document.forms("GetBranchFromZipForm")
            .ZipCode.Value = "28273"
            .submit
        End With
        ' Do While Not CBool(InStrB(1, .document.URL, _
        '     "cp_search_response-e.asp"))
        '     DoEvents
        ' Loop
        Do While .Busy: DoEvents: Loop
        Do While .ReadyState <> 4: DoEvents: Loop
        ' MsgBox .document.all.tags("Colums").Item(1).Rows(1).Cells(1).innerText
        MsgBox .document.all.tags("Colums").innerText
        ' MsgBox .document
    End With
    Exit Sub
errHandler:
    MsgBox Err.Description
End Sub
I guess I have to search now for "how to dissect an HTML document"...
Add-on:
It seems that while ie is a valid item (in the Watch window), ie.Document is empty. Why can this be? The website is still there with the new data.
I even tried another code snippet that looks for open websites in IE. It finds the site (with the correct data), but the document is still empty and getElementBy... does not find anything, of course.
I am about to start drinking...
I can't believe it.
After 3 days of poking I found this:
With ActiveSheet.QueryTables.Add( _
    Connection:="URL;https://www.pattersondental.com/ContactUs/MyLocalTeam", _
    Destination:=Range("A1"))
    .PostText = "ZipCode=70032"
    .RefreshStyle = xlOverwriteCells
    .SaveData = True
    .Refresh
End With
I don't pretend to understand why it works, but it does.
John, I will still check out what you suggested. Thanks.
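For the "get the whole blob and parse it" idea, one option is to walk the repeated `<div class="columns">` blocks and read each block's first child as the label and its second child as the value. A minimal sketch, assuming the markup has the shape shown in the question and is available as a string (for example from `HTMLDoc.body.innerHTML`); the function name is hypothetical:

```vba
Option Explicit

' Hypothetical helper: returns one "label value" line for every
' <div class="columns"> block found in the supplied HTML.
Function ReadColumnPairs(ByVal pageHtml As String) As String
    Dim doc As Object, div As Object
    Dim labelText As String, valueText As String, result As String

    Set doc = CreateObject("htmlfile")       ' late-bound MSHTML document
    doc.body.innerHTML = pageHtml

    For Each div In doc.getElementsByTagName("div")
        ' Only the outer blocks carry exactly the class "columns".
        If div.className = "columns" Then
            If div.Children.Length >= 2 Then
                labelText = Trim$(div.Children(0).innerText)
                valueText = Trim$(div.Children(1).innerText)
                result = result & labelText & " " & valueText & vbCrLf
            End If
        End If
    Next div

    ReadColumnPairs = result
End Function
```

The class test is an exact match on purpose, so the inner `column ...` divs are skipped. A multi-line value such as the address comes back with its own line breaks and may need further cleanup.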

VBA web scrape innertext

I have been trying for quite some time to scrape this inner text.
I want the value 0606 copied to an Excel sheet:
<TABLE class="group">
<td width="100%" nowrap="" colspan="3">
<input name="pg41_PolicyHolder_FogP_PolicyHolderId_FogP_IdentityQualifier"
type="HIDDEN" value="CPR">CPR-nr:
<input name="pg41_PolicyHolder_FogP_PolicyHolderId_FogP_IdentityValue"
type="HIDDEN" value="0606">0606</td>
I have tried getAttribute, getElementsByClassName, Value and innerText, but now I need some fresh eyes on it.
Do any of you have a good idea?
Something like this should work, however without your code I don't know how you're obtaining your HTMLDocument:
Dim oHTMLDocument As Object
Dim ele As Object
Set oHTMLDocument = ... 'No code provided so I'm unsure how you obtained the HTMLDocument
For Each ele In oHTMLDocument.getElementsByTagName("input")
If ele.Name = "pg41_PolicyHolder_FogP_PolicyHolderId_FogP_IdentityValue" Then
Debug.Print ele.Value 'an input element has no inner text; 0606 is held in its value attribute
Exit For
End If
Next ele
You can use a CSS selector and avoid the need for a loop.
input[name="pg41_PolicyHolder_FogP_PolicyHolderId_FogP_IdentityValue"]
VBA
.querySelector is called on the HTML document you obtain once the page has loaded (how you load it is not shown in your question). With Internet Explorer, for example, pass the CSS selector above and read the element's value:
IE.document.querySelector("input[name='pg41_PolicyHolder_FogP_PolicyHolderId_FogP_IdentityValue']").Value
Further info:
HTML DOM querySelector() Method

VBA - Trying to extract information from webpage through HTML but can't return values inside {{ }}

I am currently trying to return some data from a webpage into Excel using VBA. The website in question is linked below:
http://thecise.com/market/securities/3160
I want to extract the data from the "Category:" listing, which in this case reads "Hedge Fund".
I've been using some code put together previously from the following thread: VBA HTML Tag Hierarchy, which in its current form does pull down all the information within the HTML lists, but doesn't work for the "Category:" heading.
I'm not very experienced with HTML, but the difference appears to be that the HTML code for this heading includes some reference value contained within {{ }}, rather than the text "Hedge Fund" as with the other list items.
<li>
<span>Category:</span>
<span>{{security.s_cmpycat}}</span>
</li>
Is there a way that I can adjust the code below to return the value presented on the website, rather than the raw HTML code (i.e. return the value "Hedge Fund" rather than "{{security.s_cmpycat}}")? I tried using the "From Web" data extract tool in Excel too, but that didn't work either.
I realise that the code below lists all the list values rather than just the one above, but I hadn't tailored it further until I could figure out how to return the right value.
VBA Code:
Sub GetCISEDAta()
Dim xHttp As MSXML2.XMLHTTP
Dim hDoc As MSHTML.HTMLDocument
Dim hUls As MSHTML.IHTMLElementCollection
Dim hUl As MSHTML.HTMLListElement
Dim hLi As MSHTML.HTMLLIElement
Set xHttp = New MSXML2.XMLHTTP
xHttp.Open "GET", "http://thecise.com/market/securities/3160"
xHttp.send
Do
DoEvents
Loop Until xHttp.readyState = 4
Set hDoc = New HTMLDocument
hDoc.body.innerHTML = xHttp.responseText
Set hUls = hDoc.getElementsByTagName("ul")
For Each hUl In hUls
For Each hLi In hUl.Children
Debug.Print hLi.innerText
Next hLi
Next hUl
End Sub
HTML Code Section:
<div class="row">
<div class="security-listing-col security-listing-registration-data">
<ul class="dl-list list-unstyled">
<li>
<span>ISIN:</span>
<span>GG00B247XG70</span>
</li>
<li>
<span>Date Listed:</span>
<span>28-09-2007</span>
</li>
<li>
<span>Domicile:</span>
<span>Guernsey</span>
</li>
<li>
<span>Sponsor:</span>
<span>Vistra Fund Services (Guernsey) Ltd</span>
</li>
<li>
<span>Category:</span>
<span>{{security.s_cmpycat}}</span>
</li>
</ul>
</div>
<div class="security-listing-col security-listing-col-hidemd">
<div class="hr mb-20"></div>
</div>
Thanks!
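The `{{security.s_cmpycat}}` text looks like a client-side template placeholder that the page's JavaScript replaces after loading. XMLHTTP never runs that script, so one possible approach is to let a browser render the page and read the list item only once the placeholder is gone. A minimal sketch using Internet Explorer automation, as elsewhere on this page; the procedure names and the 15-second limit are assumptions, and it presumes the site still renders in Internet Explorer:

```vba
Option Explicit

' True while the markup still contains an unfilled {{ ... }} placeholder.
Function ContainsPlaceholder(ByVal markup As String) As Boolean
    ContainsPlaceholder = (InStr(markup, "{{") > 0)
End Function

Sub PrintRenderedCategory()
    Dim ie As Object, li As Object
    Dim deadline As Date

    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = False
    ie.navigate "http://thecise.com/market/securities/3160"
    Do While ie.Busy Or ie.readyState <> 4: DoEvents: Loop

    ' Give the page's scripts up to 15 seconds (an arbitrary limit)
    ' to replace the placeholders with real values.
    deadline = Now + TimeSerial(0, 0, 15)
    Do While ContainsPlaceholder(ie.document.body.innerHTML) And Now < deadline
        DoEvents
    Loop

    ' Print the list item whose label is "Category:".
    For Each li In ie.document.getElementsByTagName("li")
        If InStr(li.innerText, "Category:") > 0 Then
            Debug.Print li.innerText
            Exit For
        End If
    Next li

    ie.Quit
End Sub
```

If the page keeps `{{` somewhere else in its markup, the wait simply runs until the time limit, so checking only the target list item would be a tighter condition.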

Resources