Data values of uploaded images - excel

The page behind that link has many elements with the class "df-table". I was able to write the first one to the Excel sheet, but not the second, the third, and so on.
Various internet resources say this can be solved by using nth-of-type or XPath, but the error keeps occurring.
I want the data values of the uploaded images.
Public Sub Jaemu()
    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual
    Dim d As WebDriver, ws As Worksheet, URL As String
    Set d = New ChromeDriver
    Set ws = ThisWorkbook.Worksheets("gemstone2")
    Dim http As New WinHttpRequest
    With d
        '.AddArgument "--headless"
        .Start "Chrome"
        Dim html As HTMLDocument
        Dim JsonObject As Object
        Set html = New HTMLDocument
        URL = "https://globalmonitor.einfomax.co.kr/infomax_ds.html#/USA/1/1"
        .get URL, Raise:=False ' set raise to false to avoid a timeout error
        d.FindElementByCss("[ng-click='openStockSearchPopup();']").Click
        d.FindElementByCss("[ng-enter='searchStockSearchPopup(true);']").SendKeys "GOOGL"
        d.FindElementByCss("[ng-click='searchStockSearchPopup(true);']").Click
        d.FindElementByCss("[class='slick-cell l1 r1 text-center clickable']").Click
        Cells(2, 1).Value = d.FindElementByCss("[class='df-table']").Text
        Cells(3, 1).Value = d.FindElementByCss(".table-contents[ng-if='IS_RT_STATE_SUCCESS(requeststate.prospectData)'] > .df-table").Text
    End With
End Sub
Original OP error:
Run-time error 32.
Offending line:
Cells(3, 1).Value = d.FindElementByCss("[class='df-table' class:nth-of-type(2)]").Text
New error after the initial suggestion to use a different CSS selector:
Run-time error 7.
Offending line:
Cells(3, 1).Value = d.FindElementByCss(".table-contents[ng-if='IS_RT_STATE_SUCCESS(requeststate.prospectData)'] > .df-table").Text

Initial error (RTE32):
The :nth-of-type() pseudo-class must go outside the attribute selector's closing ], i.e. "[class='df-table']:nth-of-type(2)"; however, that selector does not return a match on this page.
You can use:
.table-contents[ng-if='IS_RT_STATE_SUCCESS(requeststate.prospectData)'] > .df-table
This returns a single match. Based on experience, it is specific enough, without relying on a long selector list, that it is likely to be fairly stable.
If you had wanted to use :nth-of-type(), you could have applied it to a parent element and taken the first match, but that is less stable and prone to breaking if the HTML changes:
.contents-area:nth-of-type(5) .df-table
Follow-up error (RTE7):
The element needs to be scrolled into view.
A less-than-ideal way (your mileage may vary) is:
d.ExecuteScript "window.scrollTo(0, document.body.scrollHeight/3);"
Application.Wait Now + TimeSerial(0, 0, 2)
Activesheet.Cells(1,3) = d.FindElementByCss(".table-contents[ng-if='IS_RT_STATE_SUCCESS(requeststate.prospectData)'] > .df-table").Text
There are better ways, using built-in functions and custom JS, which you can likely pull from the examples.xlsm by the author on GitHub. As I can only test with Python, this was a quick-and-dirty test that worked.
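One such custom-JS approach, as a minimal sketch: scroll the element itself into view instead of scrolling the window by a fixed fraction. This assumes a reference to the Selenium Type Library and that ExecuteScript passes the element through as arguments[0]; the helper name ScrollToAndRead is hypothetical and untested against this page.

```vba
Option Explicit

' Hypothetical helper: scroll the first element matching css into view,
' then return its text. Assumes a reference to the Selenium Type Library.
Public Function ScrollToAndRead(ByVal d As WebDriver, ByVal css As String) As String
    Dim ele As WebElement
    Set ele = d.FindElementByCss(css)
    ' The element is handed to the script as arguments[0]
    d.ExecuteScript "arguments[0].scrollIntoView(true);", ele
    ScrollToAndRead = ele.Text
End Function
```

You would call it with the same selector as above, e.g. ActiveSheet.Cells(1, 3) = ScrollToAndRead(d, ".table-contents[ng-if='IS_RT_STATE_SUCCESS(requeststate.prospectData)'] > .df-table"). A short wait may still be needed if the table is rendered lazily.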
You could avoid the overhead of a browser and use XHR to get a JSON response, which you can then hand to a JSON parser, as follows:
Option Explicit
Public Sub GetData()
    Dim s As String
    With CreateObject("MSXML2.XMLHTTP")
        .Open "POST", "https://globalmonitor.einfomax.co.kr/facset/getKeyData", False
        .SetRequestHeader "User-Agent", "Mozilla/5.0"
        .SetRequestHeader "content-type", "application/json;charset=UTF-8"
        .send "{""param"":""NAS:AAPL""}"
        s = .responsetext
    End With
    Debug.Print s
End Sub

Related

VBA scrape HTML from URL with JavaScript elements

Using the following code
Sub Test()
    'Must have the Microsoft HTML Object Library reference enabled
    Dim oHtml As HTMLDocument
    Dim oElement As Object
    Dim link As String
    Set oHtml = New HTMLDocument
    With CreateObject("WINHTTP.WinHTTPRequest.5.1")
        .Open "GET", "https://www.afklcargo.com/WW/en/local/app/index.jsp#/tntdetails/074-26126063", False
        .Send
        oHtml.Body.innerHTML = .responseText
    End With
End Sub
I am unable to get the actual HTML. I believe that is because the website uses JavaScript. How can I work around this so I can get my value?
It is added dynamically via another XHR call, which you can find in the network tab of the browser's developer tools. Change your URL to that one and parse the response as JSON.
Use a JSON parser, such as jsonconverter.bas, to handle the response. After installing the code from that link in a standard module called JsonConverter, go to VBE > Tools > References and add a reference to Microsoft Scripting Runtime.
The value you want is returned in the JSON as a Unix timestamp in milliseconds:
1561791600000 > 2019-06-29T07:00:00.000Z
The json response actually has all the info regarding the tracking. You can explore it here.
Option Explicit
Public Sub Test()
    Dim json As Object
    With CreateObject("WINHTTP.WinHTTPRequest.5.1")
        .Open "GET", "https://www.afklcargo.com/resources/tnt/singleAwbDetails?awbId=074-26126063", False
        .send
        Set json = JsonConverter.ParseJson(.responseText)
        Debug.Print json("booking")(1)("arrivalDateLong")("local")
    End With
End Sub
Two functions, by Schmidt and Navion, for converting the timestamp to a date, copied directly from here:
Function Epoch2Date(ByVal E As Currency, Optional msFrac) As Date
    Const Estart As Double = #1/1/1970#
    msFrac = 0
    If E > 10000000000# Then E = E * 0.001: msFrac = E - Int(E)
    Epoch2Date = Estart + (E - msFrac) / 86400
End Function
Function Epoch2DateS(ByVal epochstamp$) As Date
    Epoch2DateS = CVDate(CDbl(DateSerial(1970, 1, 1)) + Int(Val(epochstamp$) / 1000#) / 86400)
End Function
N.B.
Using a JSON parser is the reliable and recommended way, but you could also use a regex or the Split function to get the value.
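As a rough sketch of the Split route: the helper below (TextBetween, a hypothetical name) returns whatever sits between a marker and the next terminator. The sample string is an assumed fragment built from the key names used above, not the real response, so check the marker against the actual text before relying on it.

```vba
Option Explicit

' Hypothetical helper: return the text that follows the first occurrence
' of marker, up to (not including) the next terminator.
' Returns "" when the marker is missing or nothing follows it.
Public Function TextBetween(ByVal s As String, ByVal marker As String, _
                            ByVal terminator As String) As String
    Dim parts() As String
    parts = Split(s, marker, 2)                  ' at most two pieces
    If UBound(parts) < 1 Then Exit Function      ' marker not found
    If Len(parts(1)) = 0 Then Exit Function      ' nothing after the marker
    TextBetween = Split(parts(1), terminator, 2)(0)
End Function

Public Sub DemoTextBetween()
    ' Assumed fragment; the real response may be laid out differently.
    Const sample As String = "{""arrivalDateLong"":{""local"":1561791600000,""utc"":0}}"
    Debug.Print TextBetween(sample, """local"":", ",")   ' prints 1561791600000
End Sub
```

The result is a string, so it can be passed straight to a function such as Epoch2DateS. This breaks as soon as the key order or spacing in the response changes, which is why the parser remains the safer choice.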

Excel vba getElementsByClassName

I am trying to scrape IPO date from crunchbase.
Unfortunately I get Runtime Error 1004 “Application-defined or Object-defined error”.
My goal is to save IPO date in A1 cell.
Sub GetIE()
    Dim IE As Object
    Dim URL As String
    Dim myValue As IHTMLElement
    URL = "https://www.crunchbase.com/organization/verastem"
    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = True
    IE.Navigate URL
    Do While IE.Busy Or IE.ReadyState <> 4
        DoEvents
    Loop
    Set myValue = IE.Document.getElementsByClassName("post_glass post_micro_glass")(0)
    Range("A1").Value = myValue
    Set IE = Nothing
End Sub
I can't find that class name in the HTML for that URL. You can use the CSS selector shown below, which works on a response fetched with XMLHTTP and so avoids opening a browser.
Option Explicit
Public Sub GetDate()
    Dim html As HTMLDocument
    Set html = New HTMLDocument '< VBE > Tools > References > Microsoft HTML Object Library
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.crunchbase.com/organization/verastem#section-overview", False
        .send
        html.body.innerHTML = .responseText
    End With
    ActiveSheet.Range("A1") = html.querySelectorAll(".field-type-date.ng-star-inserted").item(1).innerText
End Sub
If you don't want to use compound classes then you can also use
ActiveSheet.Range("A1") = html.querySelectorAll("#section-ipo-stock-price .field-type-date").item(1).innerText
You can see the relevant html here:
Note the element has multiple (compound) classes
<span class="component--field-formatter field-type-date ng-star-inserted" title="Jan 27, 2012">Jan 27, 2012</span>
There are three classes: component--field-formatter, field-type-date and ng-star-inserted. I use two of these in combination in the first solution I give. Multiple classes are popular nowadays because of the versatility they give in page styling, e.g. they make it easy to override styles. You can read about CSS specificity* to understand this better.
More classes may make the code a little less robust, as one or more of them may be renamed or removed (a change in class order only matters when matching the exact class attribute value, e.g. [class='a b']). This was raised by @SIM in a comment on an answer to another web-scraping question. Thus, I offer one solution with two of the classes used, and another solution with only one of the classes used.
Whilst you do get the same date for this page with simply:
ActiveSheet.Range("A1") = html.querySelector("#section-ipo-stock-price .field-type-date").innerText
I wouldn't want to assume that would always hold true as it grabs the date from the line where it says "Their stock opened".
* https://developer.mozilla.org/en-US/docs/Web/CSS/Specificity
References:
querySelectorAll
css selectors

VBA post request with formdata (URL doesnt change)

I've been going through many similar questions, like this and this, but mine is much simpler. I want to change the date on a web form and get the data using a POST request.
I have this code which makes a POST request:
Sub winpost()
    Dim WebClient As WinHttp.WinHttpRequest
    Set WebClient = New WinHttp.WinHttpRequest
    Dim searchResult As HTMLTextElement: Dim searchTxt As String
    Dim html As New HTMLDocument
    Dim Payload As String
    Payload = "ContentPlaceHolder1_ddlday=6"
    With WebClient
        .Open "POST", "http://pib.nic.in/AllRelease.aspx", False
        .setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
        .send (Payload)
        .waitForResponse
    End With
    html.body.innerHTML = WebClient.responseText
    Set searchResult = html.querySelector(".search_box_result"): searchTxt = searchResult.innerText
    Debug.Print searchTxt
End Sub
The website is this. The page sends a POST request on change of any field.
Looking at Chrome DevTools under Network > Form Data, I see this:
ctl00$ContentPlaceHolder1$ddlday: 8
I have tried various versions of this in the Payload string, but it always returns the same page (8th Jan).
Internet Explorer
With IE the syntax is slightly different from Selenium Basic (shown at the bottom), as there is no SelectByText option. You can use indices or attribute = value CSS selectors, for example. Here the months are indices up to 12 instead of month names.
Option Explicit
Public Sub SetDates()
    Dim ie As New InternetExplorer
    With ie
        .Visible = True
        .Navigate2 "http://pib.nic.in/AllRelease.aspx"
        While .Busy Or .readyState < 4: DoEvents: Wend
        With .Document
            .querySelector("#btnSave").Click
            .querySelector("#ContentPlaceHolder1_ddlMonth [value='2']").Selected = True
            .querySelector("#ContentPlaceHolder1_ddlYear [value='2018']").Selected = True
            .querySelector("#ContentPlaceHolder1_ddlday [value='2']").Selected = True
        End With
        Stop '<==delete me later
        .Quit
    End With
End Sub
Selenium basic:
If you do go down the Selenium Basic VBA route, you can do something like the following. Note: after installing Selenium, go to VBE > Tools > References and add a reference to the Selenium Type Library. You also need the latest Chrome and ChromeDriver, with the ChromeDriver folder on the environment path or chromedriver placed in the folder containing the Selenium executables.
Option Explicit
Public Sub SetDates()
    Dim d As WebDriver
    Set d = New ChromeDriver
    Const Url = "http://pib.nic.in/AllRelease.aspx"
    With d
        .Start "Chrome"
        .get Url
        .FindElementById("btnSave").Click
        'date values
        .FindElementById("ContentPlaceHolder1_ddlMonth").AsSelect.SelectByText "February"
        .FindElementById("ContentPlaceHolder1_ddlYear").AsSelect.SelectByText "2018"
        .FindElementById("ContentPlaceHolder1_ddlday").AsSelect.SelectByText "2"
        Stop 'delete me later
        .Quit
    End With
End Sub

Crashing with loop of xmlhttp requests

I am trying to rewrite this macro from ie.application to http requests in VBA.
The macro navigates to a URL, GETs the html then parses and scrapes the required data.
It works well with one request, but when I try to do this with a FOR loop with thousands of URLs Excel crashes.
I think I should "close" the connection in some way before going to the next loop, but I did not find a solution.
Here is my code:
Sub GetQuotes()
    Dim xmlhttp As New MSXML2.XMLHTTP60, myurl As String
    Dim html As New HTMLDocument
    For r = 1 To 10
        '*****GO TO PRODUCT PAGE*****
        path = ThisWorkbook.Worksheets("Sheet1").Cells(r, 1).Value
        myurl = "https://some_domain.com" + path
        xmlhttp.Open "GET", myurl, False
        xmlhttp.send
        html.body.innerHTML = xmlhttp.responseText
        '*****GET PRICE*****
        If Not html.getElementById("some_id") Is Nothing Then
            price = html.getElementById("some_id").innerHTML
            ThisWorkbook.Worksheets("sheet1").Cells(r, 2).Value = price
        Else
            price = "empty"
            ThisWorkbook.Worksheets("sheet1").Cells(r, 2).Value = price
        End If
    Next r
End Sub
XML and HTML references are active
Squeeze in a
Do While xmlhttp.ReadyState <> 4
DoEvents
Loop
after the send. Your current code will fire all requests almost at once. With this, you will wait for the page to load before proceeding.
This worked, but only with both of the following in place: a Do While loop with DoEvents inside, just after the request is sent, and a lone DoEvents just before Next r.
Using only one of them did not work.
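Put together, the loop from the question with both waits in place might look like the sketch below. It reuses the question's placeholders (some_domain.com, some_id, Sheet1); the Sub name GetQuotesWithWaits and the declared variables are my additions, and I have not run it against thousands of URLs.

```vba
Option Explicit

' Requires references to Microsoft XML, v6.0 and Microsoft HTML Object Library.
Public Sub GetQuotesWithWaits()
    Dim xmlhttp As New MSXML2.XMLHTTP60
    Dim html As New HTMLDocument
    Dim ws As Worksheet, r As Long, price As String

    Set ws = ThisWorkbook.Worksheets("Sheet1")
    For r = 1 To 10
        xmlhttp.Open "GET", "https://some_domain.com" & ws.Cells(r, 1).Value, False
        xmlhttp.send
        ' Wait 1: loop until the request reports completion (readyState 4)
        Do While xmlhttp.readyState <> 4
            DoEvents
        Loop
        html.body.innerHTML = xmlhttp.responseText

        If html.getElementById("some_id") Is Nothing Then
            price = "empty"
        Else
            price = html.getElementById("some_id").innerHTML
        End If
        ws.Cells(r, 2).Value = price

        DoEvents ' Wait 2: yield to Excel before the next request
    Next r
End Sub
```

Writing the cell once after the If block, rather than in both branches, is only a tidy-up and does not change what ends up in column B.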

Excel VBA source code for extracting data from a URL

I want to extract the title of every news item displayed on the "http://pib.nic.in/newsite/erelease.aspx?relid=58313" website using Excel VBA. I have written code using getelementsbyclassname("contentdiv"), but the debugger shows an error saying that the object doesn't support... I want to extract the information items for every relid, which is part of the URL as well.
Cold scrapes like this are generally handled more efficiently with an XMLHTTP pull. This requires adding a few libraries in the VBE's Tools ► References. The code below needs Microsoft XML, v6.0, Microsoft HTML Object Library and Microsoft Internet Controls. You might not need the last one, but you probably will if you expand the code beyond what is supplied.
Public Const csURL As String = "http://pib.nic.in/newsite/erelease.aspx?relid=×ID×"
Sub scrape_PIBNIC()
    Dim htmlBDY As HTMLDocument, xmlHTTP As MSXML2.ServerXMLHTTP60
    Dim i As Long, u As String, iDIV As Long
    On Error GoTo CleanUp
    Set xmlHTTP = New MSXML2.ServerXMLHTTP60
    Set htmlBDY = New HTMLDocument
    For i = 58313 To 58313
        htmlBDY.body.innerHTML = vbNullString
        With xmlHTTP
            u = Replace(csURL, "×ID×", i)
            'Debug.Print u
            .Open "GET", u, False
            .setRequestHeader "Content-Type", "application/x-www-form-urlencoded; charset=UTF-8"
            .send
            If .Status <> 200 Then GoTo CleanUp
            htmlBDY.body.innerHTML = .responseText
            For iDIV = 0 To (htmlBDY.getElementsByClassName("contentdiv").Length - 1)
                If CBool(htmlBDY.getElementsByClassName("contentdiv")(iDIV).getElementsByTagName("span").Length) Then
                    Sheets("Sheet1").Cells(Rows.Count, 1).End(xlUp).Offset(1, 0) = _
                        htmlBDY.getElementsByClassName("contentdiv")(iDIV).getElementsByTagName("span")(0).innerText
                End If
            Next iDIV
        End With
    Next i
CleanUp:
    Set htmlBDY = Nothing
    Set xmlHTTP = Nothing
End Sub
That should be enough to get you started. The site you are targeting requires that charset=UTF-8 be added to the request. I had no success without it. I strongly suspect that this may have been the source of your object doesn't support error.

Resources