VBA With CreateObject("msxml2.xmlhttp") - getting data from table with irregular structure - excel

I've spent hours and hours trying to solve and understand this (and aged about 5 years in the process), so here goes :)
I am trying to extract some tables from this company page on Market Screener using the CreateObject method.
Taking table(25) as an example, I am trying to extract the table "Type of business", and only its first column, which lists the business types (not the 2016, 2017 and Delta columns).
I found a head start online in this 2016 Stack Overflow thread:
Dim oDom As Object: Set oDom = CreateObject("htmlFile")
Dim x As Long, y As Long
Dim oRow As Object, oCell As Object
Dim vData As Variant
Dim link As String
link = "https://www.marketscreener.com/COLUMBIA-SPORTSWEAR-COMPA-8859/company/"
y = 1: x = 1
With CreateObject("msxml2.xmlhttp")
.Open "GET", link, False
.send
oDom.body.innerHTML = .responseText
End With
With oDom.getElementsByTagName("table")(25)
ReDim vData(1 To .Rows.Length, 1 To 11) '.Rows(1).Cells.Length)
For Each oRow In .Rows
For Each oCell In oRow.Cells
vData(x, y) = oCell.innerText
y = y + 1
Next oCell
y = 1
x = x + 1
Next oRow
End With
Sheets(2).Cells(66, 2).Resize(UBound(vData), UBound(vData, 2)).Value = vData
It sort of works, but it returns a jumbled table with all the data crammed into a single cell.
I then found another tweak online, which copies the table to the clipboard and lets Excel work out how to paste it in. That sort of works too:
With oDom.getElementsByTagName("table")(25)
Dim dataObj As Object
Set dataObj = CreateObject("new:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
dataObj.SetText "<table>" & .innerHTML & "</table>"
dataObj.PutInClipboard
End With
Sheets(2).Paste Sheets(2).Cells(66, 1)
This creates the result more or less correctly, but it pastes more than just the values. I am trying to paste special, without any formatting.
It's driving me a bit nuts. I get the concept but am completely stuck at the moment. Is there a way to do it? If I have a head start, I can then replicate it on the other tables on that page and on other tabs.
Any help greatly appreciated,
Best Regards,
Paul

If you have Excel 2010+, you can do this using Power Query.
You can set up a query to get this Data from the Web.
The PQ code would be:
let
Source = Web.Page(Web.Contents("https://www.marketscreener.com/COLUMBIA-SPORTSWEAR-COMPA-8859/company/")),
myData = Source{3}[Data],
firstColumn = {List.First(Table.ColumnNames(myData))},
#"Removed Other Columns" = Table.SelectColumns(myData,firstColumn),
#"Removed Blank Rows" = Table.SelectRows(#"Removed Other Columns", each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null})))
in
#"Removed Blank Rows"
This results in:
And the query can be refreshed, edited, etc.
As written, the query will keep the first column of the desired table. You can decide which table to process by changing the number in Source{n}. 3 happens to be the one you are interested in, but there are 11 or 12 tables, if I recall correctly.

Taking your given example you can use a combination of class and type (tag) to select those elements. Same logic applies for next table as well. The problem here is you really have to inspect the html and tailor what you do. Otherwise, the easy solution, which you didn't want, is to use the clipboard.
Option Explicit
Public Sub GetTableInfo()
Dim html As HTMLDocument
Set html = New HTMLDocument '< VBE > Tools > References > Microsoft HTML Object Library
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.marketscreener.com/COLUMBIA-SPORTSWEAR-COMPA-8859/company/", False
.send
html.body.innerHTML = .responseText
End With
Dim leftElements As Object, td As Object
'.tabElemNoBor.fvtDiv tr:nth-of-type(2) td.nfvtTitleLeft
Set leftElements = html.getElementsByClassName("tabElemNoBor fvtDiv")(0).getElementsByTagName("tr")(2)
For Each td In leftElements.getElementsByTagName("td")
If td.className = "nfvtTitleLeft" Then
Debug.Print td.innerText
End If
Next
End Sub

Related

Excel macro to search a website with excel data and extract specific results and then loop for next value for another website

I have replicated the code in "Excel macro to search a website with excel data and extract specific results and then loop for next value", although I get an error on the line URL_Get_SKU_Query1 = entityRange.Offset(0, 1).Value2 stating "object variable or with block variable not set".
So I am just trying to replicate the code for another website.
This code pulls in a certain text and spits out a value from the website.
So I would like to enter in MFR SKU in sheet 1 as such:
Name // SKU // Price
WaterSaverFaucet // SS902BC
After I have created a macro button on sheet 2 and clicking it
Then have it spit out the price.
So that it ends up like this below:
Name // SKU // Price
WaterSaverFaucet // SS902BC // 979.08
I would need this in order to look up multiple items on a website.
Sub LoopThroughBusinesses1()
Dim i As Integer
Dim SKU As String
For i = 2 To Sheet1.UsedRange.Rows.Count
SKU = Sheet1.Cells(i, 2)
Sheet1.Cells(i, 3) = URL_Get_SKU_Query1(SKU)
Next i
End Sub
Function URL_Get_SKU_Query1(strSearch As String) As String ' Change it from a Sub to a Function that returns the desired string
' strSearch = Range("a1") ' This is now passed as a parameter into the Function
Dim entityRange As Range
With Sheet2.QueryTables.Add( _
Connection:="URL;https://www.neobits.com/SearchBySKU.aspx?SearchText=" & strSearch & "&safe=active", _
Destination:=Sheet2.Range("A1")) ' Change this destination to Sheet2
.BackgroundQuery = True
.TablesOnlyFromHTML = True
.Refresh BackgroundQuery:=False
.SaveData = True
End With
' Find the Range that has "Price"
Set entityRange = Sheet2.UsedRange.Find("Price")
' Then return the value of the cell to its' right
URL_Get_SKU_Query1 = entityRange.Offset(0, 1).Value2
' Clear Sheet2 for the next run
Sheet2.UsedRange.Delete
End Function
Unfortunately, your logic is flawed. You cannot simply take the mechanism from one webpage and assume it works for the next. In this case the solution you are trying will not work. When you enter a SKU into the search, what actually happens is a page redirect (302), not the construction of a URL as you have tried. You are getting the error primarily because you hit a page not found; it surfaces because your element is not found on the 404 page.
Instead, you can use the construct the page in question actually uses for the initial URL, and then use xmlhttp, which will follow the redirect, as follows:
VBA:
Option Explicit
Public Sub GetPrices()
Dim xhr As XMLHTTP60, html As HTMLDocument, ws As Worksheet, i As Long
Set ws = ThisWorkbook.Worksheets("Sheet1")
Set xhr = New XMLHTTP60
Set html = New HTMLDocument
Dim allData()
allData = ws.UsedRange.Value
With xhr
For i = 2 To UBound(allData, 1)
.Open "GET", "https://www.neobits.com/search?keywords=" & allData(i, 2), False
.send
Dim price As Object
html.body.innerHTML = .responseText
Set price = html.querySelector("#main_price")
If Not price Is Nothing Then
allData(i, 3) = price.innerText
Else
allData(i, 3) = "No price found"
End If
Set price = Nothing
Next
End With
ws.Cells(1, 1).Resize(UBound(allData, 1), UBound(allData, 2)) = allData
End Sub
I assume your page set-up, in Sheet1, is as follows:
Required project references:
Two references are required: Microsoft HTML Object Library and Microsoft XML, v6.0 (the two bounded in red). Press Alt+F11 to open the VBE, then go to Tools > References and add them. You may have a different version number for the XML library, in which case the reference will need changing, as will the code references of
Dim xhr As XMLHTTP60
and
New XMLHTTP60
To run this code:
Press Alt+F11 to open the VBE > Right click in project explorer > Add standard module. Paste code into that standard module > Select anywhere inside the code and press F5, or hit the green Run arrow in the toolbar.
You could further develop, for example, to handle non 200 status codes:
Option Explicit
Public Sub GetPrices()
Dim xhr As XMLHTTP60, html As HTMLDocument, ws As Worksheet, i As Long
Set ws = ThisWorkbook.Worksheets("Sheet1")
Set xhr = New XMLHTTP60
Set html = New HTMLDocument
Dim allData(), price As Object
allData = ws.UsedRange.Value
With xhr
For i = 2 To UBound(allData, 1)
.Open "GET", "https://www.neobits.com/search?keywords=" & allData(i, 2), False
.send
If .Status <> 200 Then
allData(i, 3) = "Status not succeeded" '<== Little bit loose but you get the idea.
Else
html.body.innerHTML = .responseText
Set price = html.querySelector("#main_price")
If Not price Is Nothing Then
allData(i, 3) = price.innerText
Else
allData(i, 3) = "No price found"
End If
Set price = Nothing
End If
Next
End With
ws.Cells(1, 1).Resize(UBound(allData, 1), UBound(allData, 2)) = allData
End Sub
' Find the Range that has "Entity Type:"
Set entityRange = Sheet2.UsedRange.Find("Lists At:")
' Then return the value of the cell to its' right
URL_Get_SKU_Query1 = entityRange.Offset(0, 1).Value2
The problem is that Range.Find may not find what you're looking for, for various reasons. Always specify the optional parameters to that function, since it otherwise "conveniently remembers" the values from the last time it was invoked - either from other VBA code, or through the Excel UI (IOW there's no way to be 100% sure of what values it's going to be running with if you don't specify them). But even then, if Range.Find doesn't find what it's looking for, it will return Nothing - and you can't just assume that will never happen!
But, reading closer...
' Find the Range that has "Entity Type:"
Set entityRange = Sheet2.UsedRange.Find("Lists At:")
Someone's lying. Read the comment. Now read the code. Who's telling the truth? Don't write comments that say "what" - have comments say "why", and let the code say "what". Otherwise you have situations like that, where it's impossible to tell whether the comment is outdated or the code isn't right, at least not without looking at the worksheet.
In any case, you need to make sure entityRange isn't Nothing before you try to make a member call against it:
If Not entityRange Is Nothing Then
URL_Get_SKU_Query1 = entityRange.Offset(0, 1).Value2
End If
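To make the two points above concrete (spell out the optional Find arguments, and guard against Nothing), here is a minimal sketch. ValueRightOf is a hypothetical helper name, not something from the question's code, and the choice of xlValues / xlPart is an assumption about how the label appears on the sheet:

```vba
Option Explicit

' Hypothetical helper: returns the value one cell to the right of the first
' cell containing labelText, or an empty string when nothing is found.
Public Function ValueRightOf(ByVal searchRange As Range, ByVal labelText As String) As String
    Dim hit As Range

    ' Every optional argument is given explicitly, so the search does not
    ' depend on whatever the last Find (from code or the Excel UI) used.
    Set hit = searchRange.Find(What:=labelText, _
                               LookIn:=xlValues, _
                               LookAt:=xlPart, _
                               SearchOrder:=xlByRows, _
                               SearchDirection:=xlNext, _
                               MatchCase:=False)

    ' Find returns Nothing when there is no match, so check before using it.
    If hit Is Nothing Then Exit Function

    ValueRightOf = CStr(hit.Offset(0, 1).Value2)
End Function
```

It could then be called as ValueRightOf(Sheet2.UsedRange, "Price"), with the caller deciding what an empty result should mean.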

Scraping Web With VBA to return Data while matching conditions

I hope this question is not in another post, since I have searched and not found an answer. I'm also quite new to programming, and especially to scraping the web. If you guys know of any good, complete tutorial, I'd appreciate it if you could direct me to it. I work with VBA and Python.
I began working after reading this: Scraping data from website using vba
Very helpful, by the way. I understood the old method better so I chose that one.
The site I want to search in is: http://www.bcra.gob.ar/PublicacionesEstadisticas/Principales_variables.asp
The code I've written so far:
Sub Test()
Dim ie As Object
Dim form As Variant, button As Variant
Set ie = CreateObject("InternetExplorer.Application")
Dim TR_col As Object, TR As Object
Dim TD_col As Object, TD As Object
Dim xx As Object, x As Object
With ie
.Visible = True '< Show browser window
.navigate ("http://www.bcra.gob.ar/PublicacionesEstadisticas/Principales_variables.asp") '> Travel to homepage
Do While ie.Busy
DoEvents
Loop '< Wait for page to have loaded
End With
Set TR_col = ie.Document.getElementsByTagName("TR")
For Each TR In TR_col
Set xx = ie.Document.getElementsByTagName("a")
If xx = "Base Monetaria - Promedio acumulado del mes (MM de $)" Then
Cells(1, 1) = "Ok"
End If
Next TR
End Sub
Lastly, this is what the Inspector looks like:
[![enter image description here][1]][1]
[1]: https://i.stack.imgur.com/YoG4H.png
I also highlighted the piece of information I'm using for testing purposes.
So, my approach is to search for all the "tr" tags and then validate whether the first column of the table (I guess this would be the first "td" tag) is equal to a text I'll have in a cell (in this case I just wrote in text for testing purposes). The result should be copying the number next to the date to a cell in the worksheet. In this case I wrote "Ok" just to see whether the if statement was working. But it isn't.
I guess I'm not sure how tell VBA to search for all "tr" tags, search for all the "td" tags within each "tr", find the one that matches some text, and return the 3rd "td" tag within that "tr". Makes sense?
Hope I've been specific enough and that someone can guide me.
It's not necessary to load a whole browser to get the HTML; you can do without it.
Sub Test()
'// References required:
'// 1) Microsoft HTML Object Library
'// 2) Microsoft XML, v6.0
Dim req As MSXML2.XMLHTTP60
Dim doc As MSHTML.HTMLDocument
Dim tbl As MSHTML.HTMLTable
Dim tblRow As MSHTML.HTMLTableRow
Dim tblCell As MSHTML.HTMLTableCell
Dim anch As MSHTML.HTMLAnchorElement
Dim html$, url$, sText$, fecha$, valor$, j%
url = "http://www.bcra.gob.ar/PublicacionesEstadisticas/Principales_variables.asp"
Set req = New MSXML2.XMLHTTP60
req.Open "GET", url, False
req.send
html = req.responseText
Set doc = New MSHTML.HTMLDocument
doc.body.innerHTML = html
Set tbl = doc.getElementsByClassName("table-BCRA")(0)
For j = 1 To tbl.Rows.Length - 1
With tbl.Rows(j)
'// Skip cells without data.
'// Assume correct data has three cells.
If .Cells.Length = 3 Then
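'// Read the label from the first cell (a table cell cannot be
'// assigned to an HTMLAnchorElement variable).
sText = Trim$(.Cells(0).innerText)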
If sText = "Base Monetaria - Promedio acumulado del mes (MM de $)" Then
fecha = .Cells(1).innerText
valor = .Cells(2).innerText
End If
End If
End With
Next
End Sub
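The same "find the row whose first cell matches, then read another cell from it" idea can be wrapped in a small function. This is only a sketch under assumptions: CellByRowLabel is a made-up name, it uses the late-bound "htmlfile" document so no references are needed, and it takes the HTML as a string so it can be tried without hitting the site:

```vba
Option Explicit

' Hypothetical helper: scans every <tr> in the HTML, and for the first row
' whose first cell equals label, returns the text of the cell at cellIndex
' (zero-based). Returns an empty string when no row matches.
Public Function CellByRowLabel(ByVal html As String, ByVal label As String, _
                               ByVal cellIndex As Long) As String
    Dim doc As Object, row As Object

    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = html

    For Each row In doc.getElementsByTagName("tr")
        ' Skip rows that are too short to contain the requested cell.
        If row.Cells.Length > cellIndex Then
            If Trim$(row.Cells(0).innerText) = label Then
                CellByRowLabel = row.Cells(cellIndex).innerText
                Exit Function
            End If
        End If
    Next row
End Function
```

With the request from the answer above, something like CellByRowLabel(req.responseText, "Base Monetaria - Promedio acumulado del mes (MM de $)", 2) would be the equivalent of reading valor, assuming the row layout is the three-cell one the answer describes.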

Scraping data from a website with a dynamic array function - Excel VBA

I want to eventually create a function where I can specify a web page element and URL and populate all instances of that element down a column. But am currently only experiencing limited success with this function:
Sub GrabAnchorTags() '(URL As String) As Variant'
Dim objIE As InternetExplorer
Dim elem As Object
Set objIE = New InternetExplorer
objIE.Visible = False
objIE.navigate "http://example.com/"
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
Dim aRange As Range
Debug.Print objIE.document.getElementsByTagName("a").Length
For Each elem In objIE.document.getElementsByTagName("a")
Debug.Print elem
ActiveCell.Offset(x, y).Value = elem
ActiveCell.Offset(x, y + 1).Value = elem.textContent
x = x + 1
Next
objIE.Quit
Set objIE = Nothing
End Sub
I would like to be able to turn this successfully from a macro to a function.
Currently, it uses a for loop to populate the cells and I wonder if it's possible to accomplish the same thing using evaluate or something similar because the for loop is inefficient.
This function would need to live in a cell, reference a URL in another cell, and populate the cells below it with all elements of a type found on the page. I am currently working on the anchor tag.
Many other solutions I referenced used macros:
Scraping data from website using excel vba
Getting links url from a webpage excel vba
VBA – Web scraping with getElementsByTagName()
Generally speaking, whenever you have many cells to write to, you should enter the data into an internal array, and then write the entire array to the worksheet in one hit. However you seem to not want a macro/sub in your case.
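As a rough illustration of the "fill an array, then write it once" pattern, here is a minimal sketch. The values are placeholders invented for the example (they are not scraped), and writing to A1 of the active sheet is just an assumption:

```vba
Option Explicit

' Sketch only: builds a small 2-D array in memory and writes it to the
' worksheet with a single assignment instead of one write per cell.
Public Sub WriteArrayInOneHit()
    Dim data(1 To 3, 1 To 2) As Variant
    Dim i As Long

    ' Placeholder values standing in for scraped hrefs and link texts.
    For i = 1 To 3
        data(i, 1) = "https://example.com/page" & i
        data(i, 2) = "Link " & i
    Next i

    ' One write: resize the target to the array's dimensions.
    ActiveSheet.Range("A1").Resize(UBound(data, 1), UBound(data, 2)).Value = data
End Sub
```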
If you wish it to take the worksheet formula approach for usability reasons, then the best way is to use a very powerful, but underused technique in Excel development.
A NAMED RANGE
Named ranges are Excel's closest thing to an in-memory block of data; other, simpler formulas can then use the Named Range to get info from it.
A Named Range doesn't have to be a simple block of cells on a sheet. You can write your VBA function as a Public function, and then reference it in the Named Range.
Function getElems(url As String, tagName As String) As String()
Dim browser As New MSXML2.XMLHTTP60
Dim doc As MSHTML.HTMLDocument
With browser
.Open "GET", url, False
.send
If .readyState = 4 And .Status = 200 Then
Set doc = New MSHTML.HTMLDocument
doc.body.innerHTML = .responseText
Else
MsgBox "Error" & vbNewLine & "Ready state: " & .readyState & _
vbNewLine & "HTTP request status: " & .Status
Exit Function 'doc was never set, so there is nothing to parse
End If
End With
Dim tag As Object 'late bound: IHTMLElement itself has no href member, so tag.href would not compile
Dim tags As MSHTML.IHTMLElementCollection
Set tags = doc.getElementsByTagName(tagName)
Dim arr() As String
Dim arrCounter As Long: arrCounter = 1
ReDim arr(1 To tags.Length, 1 To 2)
For Each tag In tags
arr(arrCounter, 1) = tag.innerText
'Change the below if block to suit
If tagName = "a" Then
arr(arrCounter, 2) = tag.href
Else
arr(arrCounter, 2) = tag.innerText
End If
arrCounter = arrCounter + 1
Next tag
Set doc = Nothing
Set browser = Nothing
getElems = arr
End Function
Now set a Named Range in Excel such as:
elementData
=getElems(Sheet1!$A$1, Sheet1!$B$1)
In A1, put the URL, and in B1 put the tag Name such as "a"
Then in your cells you can say
=INDEX(elementData, ROW(1:1), 1) and in adjacent cell put =INDEX(elementData, ROW(1:1), 2) (or use ROWS formula technique)
and drag down.

Web Query from URL in Cell

I believe I have thoroughly researched this question (sorry if you have seen the answer, please be patient with me).
Truly a newcomer to VBA/Macros and do not even fully understand where to "put" the codes that are provided in these message boards, that is why I prefer a formula.
My sheet has cells which feed a hyperlink (i.e. A1=JFK, B1=CVG, C1=HYPERLINK("http://www.gcmap.com/dist?p="&A1&"-"&B1,"My Flight")).
If you visit the link (http://www.gcmap.com/dist?P=jfk-cvg) it shows the flying distance between these two points - 589 mi.
What I am trying to do is do a web query in Excel based off the link provided in cell C1, and then have the web query point to the total distance included in the link - and then populate another cell on my sheet (D1) with that distance.
Any and all help would be appreciated!
How's something like this:
Sub getMiles()
'Thanks to http://stackoverflow.com/questions/16975506/how-to-download-source-code-from-a-website-with-vba for idea
Dim k As Long, s
Dim URL2 As String
Dim ws As Worksheet, newWS As Worksheet
Set ws = ActiveSheet
Application.ScreenUpdating = False
URL2 = ws.Cells(1, 3) 'Cell C1 is the URL
' to get data from the url we need to create a WinHttp object
' Tools > References > select Microsoft WinHTTP Services, version 5.1
Dim Http2 As New WinHttpRequest
'open the url
Http2.Open "GET", URL2, False
' send request
Http2.Send
'MsgBox Http2.ResponseText
Debug.Print s
'Debug.Print Http2
Debug.Print URL2
Dim Resp As String: Resp = Http2.ResponseText
Dim Lines2 As Variant: Lines2 = Split(Resp, ">")
Worksheets.Add after:=Sheets(Sheets.Count)
Set newWS = ActiveSheet
newWS.Name = "Temp for source code"
'Write every line of the source to the temp sheet (the For loop advances k itself)
For k = LBound(Lines2) To UBound(Lines2)
newWS.Cells(1 + k, 1).Value = Lines2(k)
Next k
Dim findString As String, stringCell As Range
findString = " mi"
Set stringCell = newWS.Columns(1).Find(what:=findString)
Dim milesFlown As String
milesFlown = Left(stringCell.Value, WorksheetFunction.Search("&", stringCell, 1) - 1)
'MsgBox ("You would fly " & milesFlown)
ws.Cells(1, 4).Value = milesFlown
Application.DisplayAlerts = False
newWS.Delete
Application.DisplayAlerts = True
Application.ScreenUpdating = True
End Sub
It's sort of roundabout, but what it does is get the source code of your URL, and in that source code, look for a string that only seems to occur before the miles are given (" mi"), then finds the numbers to the left of the &, and sets that as your miles. You will need to tweak the macro to correctly point to the cell with your URL. Let me know if you need any help doing so!
edit: Ah, to use this code, with Excel open, press ALT+F11, this will open up the VB editor. I think you can insert this code (just copy/paste) into the "Sheet1 (Sheet1)" part. If not, you'll need to right click "VBAProject ([yourbook])" and Insert Module, and put the code there. It should then show up in your macro list (View tab --> Macros).
Edit2: Also, you'll most likely need to add a Reference in VBA. Press ALT+F11 to open the VB Editor, then in Tools -> References, look for "Microsoft WinHTTP Services, version 5.1", add a check mark, and click "Ok" to add this reference. Otherwise, you'll get an error.
Edit3: Updated the code. It now puts the source code on a new sheet, so anything you have in Col. A won't be deleted.
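If the temporary sheet is not wanted, the same "look for the marker, take the number just before it" idea can be done on the string directly. This is an alternative sketch, not the macro above: NumberBefore is a hypothetical helper, and it assumes the distance appears in the source as digits immediately followed by the marker text (for example "589 mi"), which may not match the real page markup:

```vba
Option Explicit

' Hypothetical helper: returns the run of digits (with any , or .) that sits
' directly before the first occurrence of marker that has such a run.
' Returns an empty string when no occurrence is preceded by a number.
Public Function NumberBefore(ByVal source As String, ByVal marker As String) As String
    Dim pos As Long, startPos As Long

    pos = InStr(1, source, marker, vbTextCompare)
    Do While pos > 0
        ' Walk left from the marker while the characters look numeric.
        startPos = pos
        Do While startPos > 1
            If Mid$(source, startPos - 1, 1) Like "[0-9,.]" Then
                startPos = startPos - 1
            Else
                Exit Do
            End If
        Loop

        If startPos < pos Then
            NumberBefore = Mid$(source, startPos, pos - startPos)
            Exit Function
        End If

        ' This occurrence had no number in front of it; try the next one.
        pos = InStr(pos + 1, source, marker, vbTextCompare)
    Loop
End Function
```

For instance, NumberBefore(Http2.ResponseText, " mi") could replace the split-and-find steps, provided the assumption about the markup holds.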

VBA splitting results from html imported table into excel

Hi, I am importing a whole table from a website into a string in Excel:
Dim fST As String
fST = Doc.getElementsByTagName("table")(0).innerText
After that I would like to split the table into Excel cells, using the <td> tags from the HTML table as the split points (or at least this is the option I think can be done), so that once imported, the table looks the same in Excel, with every value in its own cell.
Let me know, thanks.
Here is the whole code that I am using:
Private Sub Worksheet_Change(ByVal Target As Range)
If Target.Row = Range("URL").Row And _
Target.Column = Range("URL").Column Then
Dim IE As New InternetExplorer
IE.Visible = True
IE.navigate Application.ActiveSheet.Range("URL")
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Dim Doc As HTMLDocument
Set Doc = IE.document
Dim tbl, trs, tr, tds, td, r, c
Set tbl = Doc.getElementsByTagName("table")(0)
Set trs = tbl.getElementsByTagName("tr")
For r = 1 To trs.Count
Set tds = trs(r).getElementsByTagName("td")
For c = 1 To tds.Count
ActiveSheet.Cells(r, c).Value = tds(c).innerText
Next c
Next r
IE.Quit
End If
End Sub
But it says error: Object doesn't support this property or method on the following line: For r = 1 To trs.Count
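EDIT: tested example. Note that HTML element collections expose Length rather than Count, and their items are zero-based: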
Sub Tester()
Dim IE As Object
Dim tbls, tbl, trs, tr, tds, td, r, c
Set IE = CreateObject("internetexplorer.application")
IE.navigate "http://www.w3schools.com/html/html_tables.asp"
Application.Wait Now + TimeSerial(0, 0, 4)
Set tbls = IE.Document.getElementsByTagName("table")
For r = 0 To tbls.Length - 1
Debug.Print r, tbls(r).Rows.Length
Next r
Set tbl = IE.Document.getElementsByTagName("table")(5)
Set trs = tbl.getElementsByTagName("tr")
For r = 0 To trs.Length - 1
Set tds = trs(r).getElementsByTagName("td")
'if no <td> then look for <th>
If tds.Length = 0 Then Set tds = trs(r).getElementsByTagName("th")
For c = 0 To tds.Length - 1
ActiveSheet.Range("B4").Offset(r, c).Value = tds(c).innerText
Next c
Next r
End Sub
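For tables whose rows do not all have the same number of cells, one option is to size an array by the widest row first and then fill it. A minimal sketch, assuming tbl is a table element obtained as in the example above (TableToArray is a made-up name):

```vba
Option Explicit

' Hypothetical helper: copies the innerText of every cell of an HTML table
' element into a 1-based 2-D Variant array. Short rows leave the remaining
' columns Empty. Returns Empty when the table has no cells.
Public Function TableToArray(ByVal tbl As Object) As Variant
    Dim r As Long, c As Long, maxCols As Long
    Dim data() As Variant

    ' First pass: find the widest row (rows and cells are zero-based).
    For r = 0 To tbl.Rows.Length - 1
        If tbl.Rows(r).Cells.Length > maxCols Then maxCols = tbl.Rows(r).Cells.Length
    Next r
    If tbl.Rows.Length = 0 Or maxCols = 0 Then Exit Function

    ' Second pass: copy the text, shifting to 1-based array indexes.
    ReDim data(1 To tbl.Rows.Length, 1 To maxCols)
    For r = 0 To tbl.Rows.Length - 1
        For c = 0 To tbl.Rows(r).Cells.Length - 1
            data(r + 1, c + 1) = tbl.Rows(r).Cells(c).innerText
        Next c
    Next r

    TableToArray = data
End Function
```

Because Cells covers both td and th, the header row comes along without a separate lookup, and the result can be written to the sheet in one go with Range.Resize.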
I looked all over for the answer to this question, too. I finally found the solution by talking to a coworker which was actually through recording a macro.
I know, you all think you are above this, but it is actually the best way. See the full post here: http://automatic-office.com/?p=344 In short, you want to record the macro, go to Data --> From Web, navigate to your website and select the table you want. Tell Excel which cell to put it in and that's it!
I have used the above "get element by id" type solutions in the past, and they are great for a few elements, but if you want a whole table and you aren't super experienced, just record a macro. Don't tell your friends, and then reformat it to look like your own work so no one knows you used the macro tool ;)
