Excel VBA - website is hiding element using script (MSXML) - excel

I am trying to scrape email id from this website https://www.construction.co.uk/c/689533/3d-structural-detailing-services-limited but website is somehow hiding email from scraping bot.
Need help in updating VBA code to get email from website or let me know if you have any alternative to do this.
HTML of website is
<div class="compInfoDetail">
<script>document.write(emrp('ufsftb/cfotpo.tifuumfA4etet/dp/vl','construction.co.uk'));</script>
<a href="mailto:teresa.benson-shettle#3dsds.co.uk?subject=Contact from construction.co.uk">
teresa.benson-shettle#3dsds.co.uk
</a></div>
VBA Code is
Sub construction_uk_contact_xml()
Dim IE As MSXML2.XMLHTTP60
Dim HTMLDoc As MSHTML.HTMLDocument
Dim HTMLBody As MSHTML.HTMLBody
url = Selection.Value
Set IE = New MSXML2.XMLHTTP60
IE.Open "GET", url, False
IE.Send
While IE.ReadyState <> 4
DoEvents
Wend
Set HTMLDoc = New MSHTML.HTMLDocument
Set HTMLBody = HTMLDoc.body
HTMLBody.innerHTML = IE.responseText
Set infos = HTMLBody.getElementsByClassName("compInfoSection")
For Each info In infos
If info.getElementsByTagName("div")(0).innerText = "Email" Then ActiveCell.Offset(0, 1).Value = info.getElementsByTagName("div")(1).outerHTML
Next
Set IE = Nothing
End Sub

Related

Select Item in a dropdown from website via Excel Macro

I would like to select the options "Addition, Bulk, Reduction" using excel VBA
This what I have so far, but nothing is being selected.
Dim ie As InternetExplorer
Set ie = New InternetExplorer
ie.Visible = True
ie.navigate "my URL"
Do While ie.readyState <> READYSTATE_COMPLETE
DoEvents
Loop
'time_adjust_group_ident = Reduction
Dim doc As HTMLDocument
Set doc = ie.document
doc.getElementById("time_adjust_group_ident").Value = "Reduction"
End Sub
You don't need Internet Explorer object for this. Please take a look in the code below where I use MSXML2.XMLHTTP to make a HTTP request and get the HTML response as a string, and then parse it using the HTMLFile object.
I'm using the CreateObject method instead of adding the references via Tools > References, so you can run this code anywhere without having to add references manually every time you open this in a different machine.
In this example, I'm retrieving the children elements of the language-selector dropdown in a given website, and looping through it using a For Each to write each child element's content in a spreadsheet row.
Sub LoadHtml()
Dim strUrl As String
strUrl = "https://developer.mozilla.org/en-US/docs/Web/HTML/Element/select"
Dim httpRequest As Object
Set httpRequest = CreateObject("MSXML2.XMLHTTP")
With httpRequest
.Open "GET", strUrl, False
.send
End With
Dim html As Object
Set html = CreateObject("HTMLFile")
html.body.innerHTML = httpRequest.ResponseText
Dim child As Object
Dim row As Integer
row = 1
For Each child In html.getElementById("language-selector").Children
Range("A" & row) = child.innerText
row = row + 1
Next
End Sub

VBA Excel click on href and fill in form

im trying to get some data from a website by using vba.
The data i want is from this site: https://www.uitvoeringarbeidsvoorwaardenwetgeving.nl/mozard/!suite16.scherm1168?mSel=145576
What i want the code to do is click on the purple bar with the pencil
on it so the screen appears for filters and than fill in a specific time frame in the filters.
When this is done i want to get the data that appears.
Im able to navigate to the site and click on the purple bar so the filter screen appears. but i cant fill in the dates
this is the code i have so far:
Dim IE As New SHDocVw.InternetExplorer
Dim HTMLDoc As MSHTML.HTMLDocument
Dim HTMLInput As MSHTML.IHTMLElement
Dim HTMLAs As MSHTML.IHTMLElementCollection
Dim HTMLA As MSHTML.IHTMLElement
Dim pastDate As MSHTML.IHTMLElement
Dim futuredate As MSHTML.IHTMLElement
IE.Visible = True
IE.Navigate "https://www.uitvoeringarbeidsvoorwaardenwetgeving.nl/mozard/!suite16.scherm1168?mGmr=66"
Do While IE.ReadyState <> READYSTATE_COMPLETE
Loop
Set HTMLDoc = IE.Document
Set HTMLAs = HTMLDoc.getElementsByTagName("a")
For Each HTMLA In HTMLAs
'Debug.Print HTMLA.className, HTMLA.getAttribute("href"), HTMLA.getAttribute("rel"), HTMLA.innerText
If HTMLA.getAttribute("href") = "https://www.uitvoeringarbeidsvoorwaardenwetgeving.nl/mozard/!suite16.scherm1168?mGmr=66#editmodal" Then
HTMLA.Click
Exit For
End If
Next HTMLA
Do While IE.ReadyState <> 4 Or IE.Busy:
DoEvents: Loop
Set HTMLInput = HTMLDoc.getElementById("frm_FKMT_B931_542_823883_dva_id1")
HTMLInput.Value = "01-01-2020" 'THIS GIVES AN ERROR?
The last line of code gives an error and i dont understand why??
This is the HTML code from the website that i want to change the value of:
<input name="FKMT_B931_542_823883_dva" class="datumveld form-control" id="frm_FKMT_B931_542_823883_dva_id1" type="text" pattern="(0[1-9]|1[0-9]|2[0-9]|3[01]).(0[1-9]|1[012]).[0-9]{4}">
Thanks and sorry for the inconvience or poorly asked question, if there is anything else you guys need to now please feel free to ask!
Thank you!!
This is an example to fill the first date field. The IDs seems not very stable.
Beware: There is a pattern for the entered dates
pattern="(0[1-9]|1[0-9]|2[0-9]|3[01]).(0[1-9]|1[012]).[0-9]{4}"
There are some html events. I don't know if it is necessary to trigger them to make the dialog realy work.
Have you checked if the page works in IE?
Sub OpenAndFillForm()
Dim browser As Object
Dim url As String
Dim nodeToClick As Object
Dim nodeForm As Object
Dim nodeFirstDate As Object
url = "https://www.uitvoeringarbeidsvoorwaardenwetgeving.nl/mozard/!suite16.scherm1168?mGmr=66"
Set browser = CreateObject("internetexplorer.application")
browser.Visible = True
browser.navigate url
Do Until browser.readyState = 4: DoEvents: Loop
Set nodeToClick = browser.document.getElementByID("tabel2").getElementsByTagName("a")(0)
nodeToClick.Click
Application.Wait Now + TimeValue("00:00:02")
Set nodeForm = browser.document.getElementByID("tabel12")
Set nodeFirstDate = nodeForm.getElementsByClassName("datumveld")(0)
nodeFirstDate.Value = "31-12-2019"
End Sub

VBA : how to connect MSXML2.XMLHTTP60 response to IHTMLDocument(iframe)

I am trying to use MSXML2 and IHTMLDocument
to deal with iframe part of HTML web page.
I want to use MSXML2 and save it to better "capture" the data, thinking it's faster than just using InternetExplorer or VBA selenium reference supported by VBA menu.
(I don't want to avoid using IE or selenium as much as possible)
But I couldn't find out how to save document as XML format(to take advantage of its speed) and at the same time click on the element in the document without the help of browser(ie or selenium).
And even after clicking some tab(id="cns_Tab21") on this web page, I have difficulty retrieving data.
So my question is..
1> Is it possible to minimize the use of browser for clicking?
2> Even after clicking(using Selenium), it throws an xpath related error in VBA editor.
Thank you for your answer in advance and the URL used for this is
http://bitly.kr/finance
and the iframe inside the link is http://bitly.kr/LT0aCb
'I declared objects
Dim XMLReq As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
Dim iframeDoc As IHTMLDocument
'and saved XML data to HTML format
HTMLDoc.body.innerHTML = XMLReq.responseText
'and trying to save this HTML to iframe...
Set iframeDoc = HTMLDoc.getElementById("coinfo_cp")
'I tried .contentDocument but it maybe HTMLdoc doesn't have this property.
and I don't know how to access information I saved to iframeDoc above.
'And after I use Selenium I can't figure out why it throw an error
For Each ele In selenium.FindElementsByTag("th")
If ele.Attribute("innerText") = "CAPEX" Then
Debug.Print ele.FindElementsByXPath("./../td").Attribute("innerText")
This post isn't a duplicate since I am trying to use XML to handle iframe element and without InternetExplorer reference in VBA Excel.(ie.document)
You can make replicate the xhr request the page makes when that tab (not iframe) is selected. I use clipboard to copy table to Excel. Note: url I am using is from our discussions. This info should be reflected in question.
Option Explicit
Public Sub GetTable()
'VBE > Tools > References > Microsoft HTML Object Library
Dim html As HTMLDocument, hTable As HTMLTable, clipboard As Object
Set html = New HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://navercomp.wisereport.co.kr/v2/company/ajax/cF1001.aspx?cmp_cd=005930&fin_typ=0&freq_typ=Y&encparam=ZXR1cWFjeGJnS1lWOHhCYmNScmJXUT09&id=bG05RlB6cn", False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.send
html.body.innerHTML = .responseText
End With
Set hTable = html.querySelector(".hbG05RlB6cn + .gHead01")
Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}") ' New DataObject
clipboard.SetText hTable.outerHTML
clipboard.PutInClipboard
ThisWorkbook.Worksheets("Sheet1").Cells(1, 1).PasteSpecial
End Sub
You can find the params of the ajax url for the tab content update in the scripts of the page
Along with the target for the update:
This needs tidying up:
Option Explicit
Public Sub GetTable()
'https://navercomp.wisereport.co.kr/v2/company/c1010001.aspx?cmp_cd=005930
'VBE > Tools > References > Microsoft HTML Object Library
Dim html As HTMLDocument, hTable As HTMLTable, clipboard As Object, ws As Worksheet
Set ws = ThisWorkbook.Worksheets("Sheet1")
Set html = New HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://navercomp.wisereport.co.kr/v2/company/ajax/cF1001.aspx?cmp_cd=005930&fin_typ=0&freq_typ=Y&encparam=ZXR1cWFjeGJnS1lWOHhCYmNScmJXUT09&id=bG05RlB6cn", False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.send
html.body.innerHTML = .responseText
End With
Set hTable = html.querySelector(".hbG05RlB6cn + .gHead01") '2nd tab. CAPEX row
Dim html2 As HTMLDocument, i As Long
Set html2 = New HTMLDocument
html2.body.innerHTML = hTable.outerHTML
Dim tableBodyRows As Object, tableBodyRowLength As Long, tableHeaderRowLength As Long, tableHeaderRows As Object, targetRow As Long
Set tableBodyRows = html2.querySelectorAll("tbody tr .bg")
tableBodyRowLength = tableBodyRows.Length
tableHeaderRowLength = html2.querySelectorAll("thead tr").Length + 2
For i = 0 To tableBodyRowLength - 1
If Trim$(tableBodyRows.item(i).innerText) = "CAPEX" Then
targetRow = i + tableHeaderRowLength + 1
Exit For
End If
Next
Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}") ' New DataObject
clipboard.SetText hTable.outerHTML
clipboard.PutInClipboard
ws.Cells(1, 1).PasteSpecial
Dim unionRng As Range
For i = (tableHeaderRowLength + 1) To (tableBodyRowLength + tableHeaderRowLength)
If i <> targetRow Then
If Not unionRng Is Nothing Then
Set unionRng = Union(ws.rows(i), unionRng)
Else
Set unionRng = ws.rows(i)
End If
End If
Next
If Not unionRng Is Nothing Then unionRng.Delete
End Sub

Scrape website (Excel vba) with xml http request after cookie has been set

I would like to scrape a website (extract a product price) from a single website page (with XML HTTP request). But before this script should run I need to have selected the correct store first (saved in browser cookie variable or included in any other way/request if possible) since prices are different in different shops.
I have created a working code but it's taking a very long time to run so i assume there must be faster and cleaner :) way. I also needed to include the application to wait for the website to follow the steps.
My current vba code:
runs a HTTP IE request to open the website, and in multiple clicks selects the desired store and saves it in a cookie (like a site user should do)
next the product page is requested with another HTTP IE request and data is extracted. I found out a can't use the XML HTTP request because it won't use the cookie value with the correct store, displaying the correct price.
The price i'm after (in the example below) is E 1,39 instead of E 1,48 (when no cookie value is used and no store is selected).
The cookie value is saved in the cookie "www.jumbo.com/cookie/HomeStore the Content is holding the store tag which is known upfront and could be hardcoded in a request if possible.
Selecting the correct store (and saving it in a browser cookie)
Sub SetStore()
Dim IE As New SHDocVw.InternetExplorer
Dim HTMLDoc As MSHTML.HTMLDocument
Dim HTMLSearchbox As MSHTML.IHTMLElement
Dim HTMLSearchboxes As MSHTML.IHTMLElementCollection
Dim HTMLButton As MSHTML.IHTMLElement
Dim HTMLButtons As MSHTML.IHTMLElementCollection
Dim HTMLSearchButton As MSHTML.IHTMLElement
Dim HTMLSearchButtons As MSHTML.IHTMLElementCollection
Dim HTMLStoreID As MSHTML.IHTMLElement
Dim HTMLStoreIDs As MSHTML.IHTMLElementCollection
Dim HTMLSaveStore As MSHTML.IHTMLElement
Dim HTMLSaveStores As MSHTML.IHTMLElementCollection
'set on False to hide IE screen
IE.Visible = True
'navigate to url with limited content
IE.navigate "https://www.jumbo.com/content/algemene-voorwaarden/"
Do While IE.readyState <> READYSTATE_COMPLETE
Loop
Set HTMLDoc = IE.document
Set HTMLButtons = HTMLDoc.getElementsByTagName("button")
For Each HTMLButton In HTMLButtons
If HTMLButton.getAttribute("data-jum-action") = "openHomeStoreFinder" Then
HTMLButton.Click
Exit For
End If
Next HTMLButton
Application.Wait Now + #12:00:02 AM#
Set HTMLSearchboxes = HTMLDoc.getElementsByTagName("input")
For Each HTMLSearchbox In HTMLSearchboxes
If HTMLSearchbox.getAttribute("id") = "searchTerm__DkKYx4XylsAAAFJktpb2Guy" Then
'input field store name/location to show search results
HTMLSearchbox.Value = "Oosterhout"
Application.Wait Now + #12:00:03 AM#
HTMLSearchbox.Click
Exit For
End If
Next HTMLSearchbox
Set HTMLSearchButtons = HTMLDoc.getElementsByTagName("button")
For Each HTMLSearchButton In HTMLSearchButtons
If HTMLSearchButton.getAttribute("data-jum-filter") = "search" Then
HTMLSearchButton.Click
Exit For
End If
Next HTMLSearchButton
Application.Wait Now + #12:00:05 AM#
Set HTMLStoreIDs = HTMLDoc.getElementsByTagName("li")
For Each HTMLStoreID In HTMLStoreIDs
'oosterhout = YC8KYx4XB88AAAFIDcIYwKxJ
'nieuwegein = 84IKYx4XziUAAAFInSYYwKrH
'vaassen = JYYKYx4XC1oAAAFItvcYwKxJ
'brielle = OG8KYx4XP4wAAAFIlsEYwKxK
If HTMLStoreID.getAttribute("data-jum-store-id") = "YC8KYx4XB88AAAFIDcIYwKxJ" Then
HTMLStoreID.Click
Application.Wait Now + #12:00:03 AM#
Exit For
End If
Next HTMLStoreID
Set HTMLSaveStores = HTMLDoc.getElementsByTagName("button")
For Each HTMLSaveStore In HTMLSaveStores
If HTMLSaveStore.getAttribute("data-jum-action") = "saveHomeStore" Then
HTMLSaveStore.Click
Exit For
End If
Next HTMLSaveStore
'IE.Quit
End Sub
Extracting data from product page (IE HTTP request, working with cookie store value)
Sub GetJumboPriceIE()
Dim IE As New SHDocVw.InternetExplorer
Dim HTMLDoc As MSHTML.HTMLDocument
Dim JumInputs As MSHTML.IHTMLElementCollection
Dim JumInput As MSHTML.IHTMLElement
Dim JumPrice As MSHTML.IHTMLElement
Dim JumboPrice As Double
Dim Price_In_Cents_Tag As String
Dim SKU_tag As String, SKU_url As String
SKU_tag = "173140KST"
SKU_url = "https://www.jumbo.com/lu-bastogne-koeken-original-260g/173140KST/"
IE.Visible = False
IE.navigate SKU_url
Do While IE.readyState <> READYSTATE_COMPLETE
Loop
Set HTMLDoc = IE.document
IE.Quit
Set JumInputs = HTMLDoc.getElementsByTagName("input")
Price_In_Cents_Tag = "PriceInCents_" & SKU_tag
Set JumPrice = HTMLDoc.getElementById(Price_In_Cents_Tag)
JumboPrice = JumPrice.getAttribute("value") / 100
Debug.Print JumboPrice
End Sub
The code above is working but would like to use XML HTTP request code like below (but using the correct store). The price of 1,39 is printed.
Extracting data from product page (using XML HTTP request), but cookie value is not used
Sub GetJumboPriceXML()
Dim XMLReq As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
Dim JumInputs As MSHTML.IHTMLElementCollection
Dim JumInput As MSHTML.IHTMLElement
Dim JumPrice As MSHTML.IHTMLElement
Dim JumboPrice As Double
Dim Price_In_Cents_Tag As String
Dim SKU_tag As String, SKU_url As String
SKU_tag = "173140KST"
SKU_url = "https://www.jumbo.com/lu-bastogne-koeken-original-260g/173140KST/"
XMLReq.Open "GET", SKU_url, False
XMLReq.send
If XMLReq.Status <> 200 Then
MsgBox "Problem" & vbNewLine & XMLReq.Status & " - " & XMLReq.statusText
Exit Sub
End If
HTMLDoc.body.innerHTML = XMLReq.responseText
Set JumInputs = HTMLDoc.getElementsByTagName("input")
Price_In_Cents_Tag = "PriceInCents_" & SKU_tag
Set JumPrice = HTMLDoc.getElementById(Price_In_Cents_Tag)
JumboPrice = JumPrice.getAttribute("value") / 100
Debug.Print JumboPrice
End Sub
This code is not using the correct store and outputting the price i'm not after (The price 1,48 is printed).
To summarize:
When no store is selected (no cookie set) the following URL now gives the price of €1,48.
I would like the VB script to set the store to “Jumbo Oosterhout Nieuwe Bouwlingstraat” and then scrape a predefined list op product URL’s and extract the prices (URL above gives €1,39).
Then set the store to a different local store “Jumbo Brielle Thoelaverweg” and scrape the identical list of product URL’s. The above URL gives €1,48.
You can select a different store by clicking on the location pin icon at the top right of the page.
Thanks a lot for your help

Select Link on website after login with Excel 2013 VBA

I am trying to open a link on a webpage after logging in with VBA in excel. So far my code successfully opens the website and logs on. However, no examples I've seen in the forums seem to work for actually opening the next link.
HTML Code from the site:
<td class="nrmlFnt nWrap">
<a id="ctl00_wpm_ac_ac_rbk_ctl01_lnkBrokerageAccountName" oncontextmenu="return false;" href="javascript:__doPostBack('ctl00$wpm$ac$ac$rbk$ctl01$lnkBrokerageAccountName','')">Retirement</a>
</td>
While the innertext and ID are unique, I've been unable to select them.
I am new to VBA and may not have been implementing some of the solutions to similar problems quite right. Below is my code:
Dim HTMLdoc As HTMLDocument
Dim MyBrowser As InternetExplorer
Private Sub btnUpdate_click()
'Create Variables for web browsing
Dim MyHTML_Element As IHTMLElement 'Elements to search for: textboxes, buttons etc...
Dim MyURL As String
'Dim IE As InternetExplorerMedium
'Clear errors to prevent from stopping code
On Error GoTo Err_Clear
'Input Desired webpage login URL
MyURL = "https://client.schwab.com/Login/SignOn/CustomerCenterLogin.aspx"
Set MyBrowser = New InternetExplorer 'Creates new browser session
MyBrowser.Silent = True 'Avoids Pop-ups
MyBrowser.navigate MyURL 'Navigates to URL
MyBrowser.Visible = True 'Opens browser window. Change to False to run w/o opening
'Wait for web page to fully load
Do
DoEvents
Loop Until MyBrowser.readyState = READYSTATE_COMPLETE
Set HTMLdoc = MyBrowser.document
'Enter Username:
HTMLdoc.all.ctl00_WebPartManager1_CenterLogin_LoginUserControlId_txtLoginID.Value = "username"
'Enter Password:
HTMLdoc.all.txtpassword.Value = "password"
'Click Login button on webpage:
For Each MyHTML_Element In HTMLdoc.getElementsByTagName("a")
If MyHTML_Element.ID = "ctl00_WebPartManager1_CenterLogin_LoginUserControlId_btnLogin" Then MyHTML_Element.Click: Exit For
Next
'Wait for web page to fully load
Do
DoEvents
Loop Until MyBrowser.readyState = READYSTATE_COMPLETE
'Select Account to be displayed:
Set HTMLdoc = MyBrowser.document
Set e = HTMLdoc.getElementById("ctl00_wpm_ac_ac_rbk_ctl01_lnkBrokerageAccountName")
e.Click
'Wait for web page to fully load
Do
DoEvents
Loop Until MyBrowser.readyState = READYSTATE_COMPLETE
'Define Err_Clear Function:
Err_Clear:
If Err <> 0 Then
Err.Clear
Resume Next
End If
End Sub
I feel this should be something pretty simple like the following, but it hasn't worked:
Set MyHTML_Element= MyBrowser.Document.getElementByID("ctl00_wpm_ac_ac_rbk_ctl01_lnkBrokerageAccountName").getElementsByTagName("a")(0)
MyHTML_Element.Click
I have also tried the following with no success:
For Each MyHTML_Element In HTMLdoc.getElementsByTagName("a")
If MyHTML_Element.ID = "ctl00_wpm_ac_ac_rbk_ctl01_lnkBrokerageAccountName" Then MyHTML_Element.Click: Exit For
Next
The webpage loads, logs in and then... nothing happens.
So I got it to work finally. I apparently needed the webpage to finish loading and the simple "READYSTATE_COMPLETE" didn't cut it. So I ended up with this to make sure the page fully loaded:
With MyBrowser
Do While .Busy Or .readyState <> 4
DoEvents
Loop
End With
And then I used the following code to select the link:
Set HTMLdoc = MyBrowser.document
Set e = HTMLdoc.getElementById("ctl00_wpm_ac_ac_rbk_ctl01_lnkBrokerageAccountName")
e.Click
It finally goes to the correct page! Now I just need to pull the correct data... easy right?

Resources