VBA splitting results from HTML imported table into Excel - excel

Hi, I am importing a whole table from a website into an Excel string:
Dim fST As String
fST = Doc.getElementsByTagName("table")(0).innerText
After that I would like to split the table into Excel cells, using the <td> tags from the HTML table as the split points (or at least this is the option I think can be done), so that once imported the table looks the same in Excel, with every value in its own cell.
Let me know, thanks.
Here is the whole code that I am using:
Private Sub Worksheet_Change(ByVal Target As Range)
If Target.Row = Range("URL").Row And _
Target.Column = Range("URL").Column Then
Dim IE As New InternetExplorer
IE.Visible = True
IE.navigate Application.ActiveSheet.Range("URL")
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Dim Doc As HTMLDocument
Set Doc = IE.document
Dim tbl, trs, tr, tds, td, r, c
Set tbl = Doc.getElementsByTagName("table")(0)
Set trs = tbl.getElementsByTagName("tr")
For r = 1 To trs.Count
Set tds = trs(r).getElementsByTagName("td")
For c = 1 To tds.Count
ActiveSheet.Cells(r, c).Value = tds(c).innerText
Next c
Next r
IE.Quit
End If
End Sub
But it gives the error "Object doesn't support this property or method" on the following line: For r = 1 To trs.Count

EDIT: tested example
Sub Tester()
Dim IE As Object
Dim tbls, tbl, trs, tr, tds, td, r, c
Set IE = CreateObject("internetexplorer.application")
IE.navigate "http://www.w3schools.com/html/html_tables.asp"
Application.Wait Now + TimeSerial(0, 0, 4)
Set tbls = IE.Document.getElementsByTagName("table")
For r = 0 To tbls.Length - 1
Debug.Print r, tbls(r).Rows.Length
Next r
Set tbl = IE.Document.getElementsByTagName("table")(5)
Set trs = tbl.getElementsByTagName("tr")
For r = 0 To trs.Length - 1
Set tds = trs(r).getElementsByTagName("td")
'if no <td> then look for <th>
If tds.Length = 0 Then Set tds = trs(r).getElementsByTagName("th")
For c = 0 To tds.Length - 1
ActiveSheet.Range("B4").Offset(r, c).Value = tds(c).innerText
Next c
Next r
End Sub
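The error in the question most likely comes from two details of the HTML DOM collections that the tested example gets right: they expose Length rather than Count, and they are zero-based. A minimal sketch of the same idea follows, run against an in-memory document so it needs no browser. The names TableToArray and DemoTableToArray and the sample markup are made up for illustration, and CreateObject("htmlfile") is assumed to be available, as it is on typical Windows installs.

```vba
Option Explicit

' Returns the text of every cell of an HTML table as a zero-based 2-D array.
' Returns Empty when the table has no rows or no cells.
Public Function TableToArray(ByVal tbl As Object) As Variant
    Dim nRows As Long, nCols As Long, r As Long, c As Long
    Dim result() As Variant

    nRows = tbl.Rows.Length                 ' Length, not Count
    If nRows = 0 Then Exit Function

    ' Size the array to the widest row
    For r = 0 To nRows - 1                  ' zero-based
        If tbl.Rows.Item(r).Cells.Length > nCols Then
            nCols = tbl.Rows.Item(r).Cells.Length
        End If
    Next r
    If nCols = 0 Then Exit Function

    ReDim result(0 To nRows - 1, 0 To nCols - 1)
    For r = 0 To nRows - 1
        ' Cells covers both <td> and <th>
        For c = 0 To tbl.Rows.Item(r).Cells.Length - 1
            result(r, c) = tbl.Rows.Item(r).Cells.Item(c).innerText
        Next c
    Next r
    TableToArray = result
End Function

' Hypothetical demo: parses sample markup and writes it starting at B4.
Public Sub DemoTableToArray()
    Dim doc As Object, data As Variant
    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = "<table><tr><th>Name</th><th>Qty</th></tr>" & _
                         "<tr><td>Apple</td><td>3</td></tr></table>"
    data = TableToArray(doc.getElementsByTagName("table").Item(0))
    ActiveSheet.Range("B4").Resize(UBound(data, 1) + 1, UBound(data, 2) + 1).Value = data
End Sub
```

Writing the whole array to the sheet in one assignment avoids a cell-by-cell loop, which matters for larger tables.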

I looked all over for the answer to this question, too. I finally found the solution by talking to a coworker, and it was actually to record a macro.
I know, you all think you are above this, but it is actually the best way. See the full post here: http://automatic-office.com/?p=344 In short, start recording a macro, go to Data --> From Web, navigate to your website, and select the table you want. Tell Excel which cell to put it in and that's it!
I have used the above "get element by id" type solutions in the past, and they are great for a few elements, but if you want a whole table and you aren't super experienced, just record a macro. Don't tell your friends, and then reformat it to look like your own work so no one knows you used the macro tool ;)
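For reference, a macro recorded from Data > From Web usually boils down to a QueryTables.Add call. The sketch below is a hand-written approximation, not the output of the linked post; the names WebConnection and ImportWebTable, the destination cell, and the table index "1" are placeholders to adjust, and the URL is simply the one used in the tested example above.

```vba
Option Explicit

' Web queries expect the address prefixed with "URL;".
Public Function WebConnection(ByVal pageUrl As String) As String
    WebConnection = "URL;" & pageUrl
End Function

' Imports one table from a web page into the active sheet at A1.
Public Sub ImportWebTable()
    Dim ws As Worksheet
    Set ws = ActiveSheet

    With ws.QueryTables.Add( _
            Connection:=WebConnection("http://www.w3schools.com/html/html_tables.asp"), _
            Destination:=ws.Range("A1"))
        .WebSelectionType = xlSpecifiedTables   ' import only the listed tables
        .WebTables = "1"                        ' placeholder: index of the table on the page
        .WebFormatting = xlWebFormattingNone    ' plain values, no web formatting
        .Refresh BackgroundQuery:=False         ' wait until the data has arrived
    End With
End Sub
```

Recording the macro yourself remains the easiest way to find the right WebTables value for a given page.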

Related

eBay Product scraper

I am very limited on VBA.
The code is in a module and also has a sub procedure, so sorry if I post the code wrong.
A) Open IE.
B) The sub procedure gets the data.
The code works fine on ebay.com but NOT on ebay.co.uk, and I can't work out why. It also converts URLs to hyperlinks.
It only does the first page; I need it to go through X pages. I had code for that but couldn't get it to work, so I removed it.
Can the search query be run AFTER eBay opens, so that eBay opens, the search item is entered, and then the code runs? Or can it run from a cell? If the search is in cell A1, the extracted data needs to be pasted in A2 and below.
I have looked at the elements for ebay.com and ebay.co.uk and they look the same to me, so I can't work out why it works for one and not the other.
I did add the code for getting data from several pages, but it did not work. I know that code works, as I use it when I fetch URLs from Google.
Public IE As New SHDocVw.InternetExplorer
Sub GetData()
Dim HTMLdoc As MSHTml.HTMLDocument
Dim othwb As Variant
Dim objShellWindows As New SHDocVw.ShellWindows
Set IE = CreateObject("internetexplorer.application")
With IE
.Visible = True
'.Navigate "https://www.ebay.co.uk/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=jackets&_sacat=0"
.Navigate "https://www.ebay.com/sch/i.html_from=R40&_nkw=ralph+lauren&_sacat=1059&LH_TitleDesc=0&_dmd=1&rt=nc"
While .Busy Or .readyState <> 4: DoEvents: Wend
Set HTMLdoc = IE.document
ProcessHTMLPage HTMLdoc
.Quit
End With
End Sub
'''''' THIS IS THE SUB PROCESS '''''
Sub ProcessHTMLPage(HTMLPage As MSHTml.HTMLDocument)
Dim HTMLItem As MSHTml.IHTMLElement
Dim HTMLItems As MSHTml.IHTMLElementCollection
Dim HTMLInput As MSHTml.IHTMLElement
Dim rownum As Long
rownum = 1
Set HTMLItems = HTMLPage.getElementsByClassName("s-item__title")
For Each HTMLItem In HTMLItems
Cells(rownum, 1).Value = HTMLItem.innerText
rownum = rownum + 1
Next HTMLItem
rownum = 1
Set HTMLItems = HTMLPage.getElementsByClassName("s-item__price")
For Each HTMLItem In HTMLItems
Cells(rownum, 2).Value = HTMLItem.innerText
rownum = rownum + 1
Next HTMLItem
rownum = 1
Set HTMLItems = HTMLPage.getElementsByClassName("s-item__link")
For Each HTMLItem In HTMLItems
Cells(rownum, 3).Value = HTMLItem.href
rownum = rownum + 1
Next HTMLItem
'Converts each text hyperlink selected into a working hyperlink from C1 to 25000 rows
Range("C1:C25000").Select
For Each xCell In Selection
ActiveSheet.Hyperlinks.Add Anchor:=xCell, Address:=xCell.Formula
Next xCell
Range("C1").Select
End Sub
Code for going to next page
pageNumber = 1
'i = 2
If pageNumber >= 6 Then Exit Do 'the first 6 pages
internetdata.getElementById("pnnext").click 'next web page
Do While internet.Busy Or internet.readyState <> 4
DoEvents
Loop
Set internetdata = internet.document
pageNumber = pageNumber + 1
Loop
Does not work on Ebay.co.uk - NO RESULTS ARE EXTRACTED - Works fine in ebay.com
Need it to get data from X amount of pages and NOT just 1 page
This is my code for the Google search. I have got it working so that the search term comes from cell A1, and I am looking for something like this; I am going to see if I can use the eBay code with it, as this also goes through the first 25 pages of the Google search.
Sub webpage()
Dim ie As Object
Dim htmlDoc As Object
Dim nextPageElement As Object
Dim div As Object
Dim link As Object
Dim url As String
Dim pageNumber As Long
Dim i As Long
' Takes the search term from A1 and places it into the Google URL
url = "https://www.google.co.uk/search?q=" & Replace(Worksheets("Sheet1").Range("A1").Value, " ", "+")
Set ie = CreateObject("InternetExplorer.Application")
With ie
.Visible = True
.navigate url
Do While .Busy Or .readyState <> 4
DoEvents
Loop
End With
Application.Wait Now + TimeSerial(0, 0, 5)
Set htmlDoc = ie.document
pageNumber = 1
i = 2
Do
For Each div In htmlDoc.getElementsByTagName("div")
If div.getAttribute("class") = "r" Then
Set link = div.getElementsByTagName("a")(0)
Cells(i, 2).Value = link.getAttribute("href")
i = i + 1
End If
Next div
If pageNumber >= 25 Then Exit Do 'the first 25 pages
Set nextPageElement = htmlDoc.getElementById("pnnext")
If nextPageElement Is Nothing Then Exit Do
' Clicks web next page
nextPageElement.Click 'next web page
Do While ie.Busy Or ie.readyState <> 4
DoEvents
Loop
Application.Wait Now + TimeSerial(0, 0, 5)
Set htmlDoc = ie.document
pageNumber = pageNumber + 1
Loop
MsgBox "All Done"
Set ie = Nothing
Set htmlDoc = Nothing
Set nextPageElement = Nothing
Set div = Nothing
Set link = Nothing
End Sub
Question 1: Why does it work for one domain but not the other?
To answer question 1 (the other questions should be new posts): the HTML is not the same at all. The classes that work for ebay.com are not found on ebay.co.uk, so your loops over the collections do nothing because the collections have a count of 0 (or a length of 0 for nodeLists, if you use querySelectorAll). Instead, you need branched code: set your selectors based on the URL domain.
I have used CSS selectors as this is the easiest and fastest way to select the required elements, while keeping the flexibility to refactor the code and reduce repeated lines.
Side note:
If you are unsure whether your selection method will work across different pages, you can do at least two things:
1. Right click > Inspect element > visually check that the class names are the same for the elements you are comparing. So, if you are looking at product names, are the class names in the HTML the same on both pages?
2. Use the browser's search facility: open the Elements tab via F12, then press Ctrl+F to pull up the search box, enter the class name from the first page into this box on the second page, and hit Enter. You can also enter CSS selectors here and, in some cases, regex. You will get a hit count telling you how many matches were found. Keep pressing Enter to cycle through the matches; each match is highlighted in the HTML above, so you can easily check whether the matched results are what you expected.
Screenshot: https://i.stack.imgur.com/MWkEx.png
VBA:
Option Explicit
Public Sub GetData()
Dim htmlDoc As MSHTML.HTMLDocument, ie As SHDocVw.InternetExplorer, ws As Worksheet
Set ie = New SHDocVw.InternetExplorer
Set htmlDoc = New MSHTML.HTMLDocument
Set ws = ThisWorkbook.Worksheets("Sheet1")
With ie
.Visible = True
'.Navigate2 "https://www.ebay.co.uk/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=jackets&_sacat=0"
.Navigate2 "https://www.ebay.com/sch/i.html_from=R40&_nkw=ralph+lauren&_sacat=1059&LH_TitleDesc=0&_dmd=1&rt=nc"
While .Busy Or .readyState <> 4: DoEvents: Wend
Dim index As Long, HTMLItems As Object, rowNum As Long, xCell As Range
Dim cssSelectors(), i As Long
Select Case True
Case InStr(.document.URL, "ebay.co.uk") > 0
cssSelectors = Array(".gvtitle a", ".amt", ".gvtitle a")
Case InStr(.document.URL, "ebay.com") > 0
cssSelectors = Array(".s-item__title", ".s-item__price", ".s-item__link")
End Select
With ws
For i = LBound(cssSelectors) To UBound(cssSelectors)
rowNum = 1
Set HTMLItems = ie.document.querySelectorAll(cssSelectors(i))
For index = 0 To HTMLItems.length - 1
.Cells(rowNum, i + 1).Value = IIf(i = 2, HTMLItems.item(index).getAttribute("href"), HTMLItems.item(index).innerText)
rowNum = rowNum + 1
Next
Next
For Each xCell In .Range("C1:C25000") '<= all these really?
.Hyperlinks.Add Anchor:=xCell, Address:=xCell.Formula
Next xCell
End With
.Quit
End With
End Sub
If this works on eBay then you need to find out yourself why it's not working on ebay.co.uk. My point is that if the code itself works, then there is nothing we can help you with here. You need to take some time to investigate ebay.co.uk and find the differences, as I am sure it's something minor. I can't help you fix code that isn't actually broken. I wish you luck though.

VBA Scraping div elements

So, I've been trying to scrape data from a website but I simply can't reach my goal...
I'm new to VBA and I've tried to learn the basics in order to understand some code.
So far I have this code, but it only scrapes the data from the 1st div and puts all of it into one cell. I need the macro to run through the whole page and scrape all the data that has the className I put in the code into different cells (e.g. 1st div to cell A1, 2nd div to cell A2, and so on).
Could you help me or shed some light on what I'm doing wrong, please?
Thank you!
Code:
Sub BoschRoupa()
Dim ieObj As InternetExplorer
Dim htmlEle As IHTMLElement
Dim i As Integer
i = 1
Set ieObj = New InternetExplorer
ieObj.Visible = False
ieObj.navigate "https://www.worten.pt/grandes-eletrodomesticos/maquinas-de-roupa/maquinas-de-roupa-ver-todos-marca-BALAY-e-BOSCH-e-SIEMENS?per_page=100"
Application.Wait Now + TimeValue("00:00:05")
For Each htmlEle In ieObj.document.getElementsByClassName("w-product__content")(0).getElementsByTagName("div")
With ActiveSheet
.Range("A" & i).Value = htmlEle.Children(0).textContent
End With
i = i + 1
Next htmlEle
End Sub
You can use XMLHTTP rather than a browser, and then the following loop to write out all the div info. I would probably be more selective and grab only the data of interest, but the following, I hope, is in the spirit of what you have asked for.
Option Explicit
Public Sub GetInfo()
Dim data As Object, i As Long, html As HTMLDocument, r As Long, c As Long, item As Object, div As Object
Set html = New HTMLDocument '<== VBE > Tools > References > Microsoft HTML Object Library
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.worten.pt/grandes-eletrodomesticos/maquinas-de-roupa/maquinas-de-roupa-ver-todos-marca-BALAY-e-BOSCH-e-SIEMENS?per_page=100", False
.send
html.body.innerHTML = .responseText
End With
Set data = html.getElementsByClassName("w-product__content")
For Each item In data
r = r + 1: c = 1
For Each div In item.getElementsByTagName("div")
With ThisWorkbook.Worksheets("Sheet1")
.Cells(r, c) = div.innerText
End With
c = c + 1
Next
Next
End Sub

Scraping Web VBA Excel unclear

I just started learning web scraping and I'm trying to write code that searches for specific data on a web page, clicks Search, and then extracts specific data into an Excel sheet. I succeeded in entering the value I want to look for on the web page, but when I use the search button I receive this error
I don't know how to do it or how to correct it.
This is my code and the button code:
VBA Code
Sub clickICC()
Dim ie As Object
Dim form As Variant, button As Variant
Set ie = CreateObject("InternetExplorer.Application")
myjobtyp = InputBox("Enter type of MP,MOD,DATE")
With ie
.Visible = True
.navigate ("http://XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX")
While ie.ReadyState <> 4
DoEvents
Wend
ie.document.getElementsbyname("searchById").Item.innertext = myjobtyp
Set form = ie.document.getElementsbytagname("form")
Set button = form(0).onsubmit
form(0).submit
Do While ie.busy: DoEvents: Loop
Set TDelements = .documents.getElementsbytagname("td")
r = 0
c = 0
For Each TDelements In TDelements
sheet1.Range("A1").Offset(r, c).Value = TDelement.innertext
r = r + 1
Next
End With
Set ie = Nothing
End Sub
the web button code
Does anyone have a clue about this, or can anyone enlighten me on how to fix it?
You could try to use a CSS selector to target the src string.
document.querySelector("input[src*=""/cmh/cmh/image/button_search.gif""]").Click
Note that there is a typo in your original post:
Set TDelements = .documents.getElementsbytagname("td")
There is no s on the end of .document.
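Two further slips in the posted loop would stop it even after the .document fix: TDelements is used as its own loop variable, and the loop body reads from TDelement, which is never assigned. With Option Explicit at the top of the module the second slip is caught at compile time. A minimal sketch of a corrected loop follows; CellTexts and WriteCellsDown are hypothetical helper names, and the code assumes the document has already finished loading.

```vba
Option Explicit

' Collects the innerText of every <td> in a document, in document order.
Public Function CellTexts(ByVal doc As Object) As Collection
    Dim tdElements As Object, i As Long
    Dim result As New Collection

    Set tdElements = doc.getElementsByTagName("td")
    For i = 0 To tdElements.Length - 1          ' DOM collections are zero-based
        result.Add tdElements.Item(i).innerText
    Next i
    Set CellTexts = result
End Function

' Writes the collected texts down a column, starting at firstCell.
Public Sub WriteCellsDown(ByVal doc As Object, ByVal firstCell As Range)
    Dim texts As Collection, r As Long

    Set texts = CellTexts(doc)
    For r = 1 To texts.Count                    ' VBA Collections are one-based
        firstCell.Offset(r - 1, 0).Value = texts(r)
    Next r
End Sub
```

In the question's code this would be called as WriteCellsDown ie.document, Sheet1.Range("A1") once the form has been submitted and the page has reloaded.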

VBA web scraping

I am trying to get a row of data from this table on this website: http://www.nasdaq.com/symbol/neog/financials?query=balance-sheet
Now I can manage to get the "total liabilities" row using the
doc.getelementsbyclassname("net")(3).innertext
but I cannot figure out how to get any other rows of data such as common stock.
Private Sub Worksheet_Change(ByVal Target As Range)
If Target.Row = Range("bscode").Row And _
Target.Column = Range("bscode").Column Then
Dim IE As New InternetExplorer
IE.Visible = True
IE.navigate "http://www.nasdaq.com/symbol/" & Range("bscode").Value & "/financials?query=balance-sheet&data=quarterly"
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Dim Doc As HTMLDocument
Set Doc = IE.document
Dim sD As String
sD = Doc.getElementsByTagName("tr")(8).innerText
MsgBox sD
Dim aD As Variant
aD = Split(sD, "$")
Range("bs").Value = aD(1)
Range("ba").Value = aD(2)
Range("bb").Value = aD(3)
Range("bc").Value = aD(4)
End If
End Sub
If it helps, I have the HTML source and the tr highlighted that I want to grab.
screenshot of HTML code
The issue is the method of finding the table row data. Could someone please explain to me how to get other rows of data? It would be much appreciated !
I was able to do some trial and error and to get the correct reference this way:
Dim eTR As Object, cTR As Object, i As Long 'I used Object because I did late binding
Set cTR = Doc.getElementsByTagName("tr")
i = 0
For Each eTR In cTR
If Left(eTR.innerText, 3) = "Com" Then
Debug.Print "(" & i; "): " & eTR.innerText
End If
i = i + 1
Next
The immediate window then displayed
(308): Common Stocks ... (a bunch of space) ...
$5,941$5,877$5,773$3,779
I then tested this statement:
sd = Doc.getElementsByTagName("tr")(308).innerText
Debug.Print sd
And got the same result.
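A hard-coded index such as 308 will break as soon as the page gains or loses a row, so it may be safer to look the row up by its label on every run. The sketch below does that and then splits the row text on "$" the way the question does. FindRowIndex and DollarValues are hypothetical names, and the code assumes the label sits in the first cell of its row.

```vba
Option Explicit

' Index of the first <tr> whose first cell starts with labelPrefix, or -1 if none.
Public Function FindRowIndex(ByVal doc As Object, ByVal labelPrefix As String) As Long
    Dim trs As Object, i As Long, firstCellText As String

    FindRowIndex = -1
    Set trs = doc.getElementsByTagName("tr")
    For i = 0 To trs.Length - 1
        If trs.Item(i).Cells.Length > 0 Then
            firstCellText = Trim$(trs.Item(i).Cells.Item(0).innerText)
            If Left$(firstCellText, Len(labelPrefix)) = labelPrefix Then
                FindRowIndex = i
                Exit Function
            End If
        End If
    Next i
End Function

' Splits "Common Stocks$5,941$5,877" into its amounts: "5,941", "5,877".
' Returns an empty array when the text contains no "$".
Public Function DollarValues(ByVal rowText As String) As Variant
    Dim parts() As String, result() As String, i As Long

    parts = Split(rowText, "$")
    If UBound(parts) < 1 Then
        DollarValues = Array()
        Exit Function
    End If

    ReDim result(0 To UBound(parts) - 1)
    For i = 1 To UBound(parts)                  ' parts(0) is the label
        result(i - 1) = Trim$(parts(i))
    Next i
    DollarValues = result
End Function
```

With these, the row text can be fetched as Doc.getElementsByTagName("tr").Item(FindRowIndex(Doc, "Common")).innerText, after checking that the index is not -1.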

Permission denied when trying to draw data from a table in IE

I have just recently started looking at applications of VBA in Excel accessing web pages through IE, and have no experience with html coding, so the solution to this might be really simple...
I have a section of code (below) that is supposed to navigate to a website, access a table and pull out the data into excel. However, at seemingly random times, for no reason that I can determine, the Object Variable 'TDelement' becomes locked somehow, and Excel throws up an Error 70: Permission Denied when I try to access the next cell through the loop. It doesn't happen all the time, and it doesn't happen on the same table cell.
Dim IE As Object
Dim TDElements As Object
Dim TDelement As Object
Dim Web_Address As String
Dim DteTm As Date
Web_Address = "http://www.bom.gov.au/fwo/IDQ65388/IDQ65388.040762.tbl.shtml"
' Access the Webpage
IE.Navigate Web_Address
' Wait while IE loading...
Do While IE.Busy
Application.Wait DateAdd("s", 1, Now)
Loop
' Find and Set Data Table Cells/object within webpage
Set TDElements = IE.document.GetElementsByTagName("td")
' Pull each TDElement (table cell) from TDElements
Rw = 1
Col = 2
For Each TDelement In TDElements
If Col = 1 Then
Col = 2
ElseIf Col = 2 Then
Col = 1
End If
If Col = 1 Then
DteTm = TDelement.innerText
Worksheets(1).Cells(Rw, Col).Value = DteTm
ElseIf Col = 2 Then
Worksheets(1).Cells(Rw, Col).Value = TDelement.innerText
End If
If Col = 2 Then
Rw = Rw + 1
End If
Next
If the error is going to occur within a cycle of the loop, it occurs on either
DteTm = TDelement.innerText or
Worksheets(1).Cells(Rw, Col).Value = TDelement.innerText,
depending on the outcome of the If...Then statement, obviously.
After a bit of googling, the general consensus seemed to be that error 70 is related to naming conflicts with variables (i.e. trying to use the same variable name twice). Because of this I tried adding Set TDelement = Nothing before Next to clear the variable at the end of each loop, but it didn't resolve the issue (not all that surprising; I have never had an issue with variables in loops like this before).
Could it have something to do with .innerText? Even though it is mentioned on just about every forum post that I have seen with regards to pulling data from IE, it isn't mentioned in the Excel help files at all...
Any help on this would be greatly appreciated.
Try the code below:
Sub sample()
Dim IE As Object
Dim Web_Address As String
Dim tblTR As Object
Dim tblTD As Object
Set IE = CreateObject("internetexplorer.application")
Web_Address = "http://www.bom.gov.au/fwo/IDQ65388/IDQ65388.040762.tbl.shtml"
' Access the Webpage
IE.Navigate Web_Address
IE.Visible = True
Start:
' Wait while IE loading...
Do While IE.Busy
Application.Wait DateAdd("s", 5, Now)
Loop
' Find and Set Data Table Cells/object within webpage
Set tblTR = IE.document.GetElementsByTagName("tr")
If tblTR Is Nothing Then GoTo Start
Dim i As Integer
i = 1
For Each tblTD In tblTR
If Not tblTD Is Nothing Then
Worksheets(1).Cells(i, 1).Value = tblTD.all(0).innerText
Worksheets(1).Cells(i, 2).Value = tblTD.all(1).innerText
End If
i = i + 1
Next
End Sub
