Get data from website [duplicate] - excel

<span itemprop="streetAddress">
**94 Grand St**
</span>
How can I get this data using a getElementBy... method in Excel VBA?
I have tried getElementById, getElementsByName, etc., but nothing works.
Option Explicit
Sub find()
'Uses late binding, or add reference to Microsoft HTML Object Library
' and change variable Types to use intellisense
Dim ie As Object 'InternetExplorer.Application
Dim html As Object 'HTMLDocument
Dim Listings As Object 'IHTMLElementCollection
Dim l As Object 'IHTMLElement
Dim r As Long
Set ie = CreateObject("InternetExplorer.Application")
With ie
.Visible = False
.Navigate "http://www.yelp.com/biz/if-boutique-new-york#query:boutique"
' Don't show window
'Wait until IE is done loading page
Do While .readyState <> 4
Application.StatusBar = "Downloading information, Please wait..."
DoEvents
Loop
Set html = .Document
End With
Set Listings = html.getElementsByTagName("span") ' ## returns the list
MsgBox (Listings(0))
For Each l In Listings
'## make sure this list item looks like the listings Div Class:
' then, build the string to put in your cell
Range("A1").Offset(r, 0).Value = l.innerText
r = r + 1
Next
Set html = Nothing
Set ie = Nothing
End Sub
I use the program above to get the innerText value inside the span tag, but it is not working.

For the single result you are looking for, use these two lines in your code (there is only one listing at the detail level).
Adapting your IE code:
Set Listings = html.getElementById("bizInfoBody") ' ## returns a single element, not a list
Range("A1").Offset(r, 0).Value = Listings.innerText
With XMLHTTP:
Sub GetTxt()
Dim objXmlHTTP As Object
Dim objHtmlDoc As Object
Dim objHtmlBody As Object
Dim objTbl As Object
Dim strResponse As String
Dim strSite As String
Set objHtmlDoc = CreateObject("htmlfile")
Set objHtmlBody = objHtmlDoc.body
Set objXmlHTTP = CreateObject("MSXML2.XMLHTTP")
strSite = "http://www.yelp.com/biz/if-boutique-new-york"
With objXmlHTTP
.Open "GET", strSite, False
.Send
If .Status = 200 Then
strResponse = .responseText
objHtmlBody.innerHTML = strResponse
' getElementById returns Nothing when the id is not in the response
Set objTbl = objHtmlDoc.getElementById("bizInfoBody")
If Not objTbl Is Nothing Then MsgBox objTbl.innerText
End If
End With
End Sub
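The question targets a span identified by its itemprop attribute rather than by an id or name, so another option is to loop over the span elements and compare that attribute. The following is a minimal sketch, not part of the original answer: the function name SpanTextByItemProp and the literal markup are hypothetical, and it parses a string with the late-bound htmlfile object so it can be tried without a network call.

```vba
Option Explicit

' Returns the text of the first <span> whose itemprop attribute matches propName,
' or an empty string when no such span exists.
Function SpanTextByItemProp(ByVal markup As String, ByVal propName As String) As String
    Dim doc As Object, spans As Object, el As Object
    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = markup
    Set spans = doc.getElementsByTagName("span")
    For Each el In spans
        ' getAttribute returns Null when the attribute is missing; & "" turns that into ""
        If LCase$(el.getAttribute("itemprop") & "") = LCase$(propName) Then
            SpanTextByItemProp = Trim$(el.innerText)
            Exit Function
        End If
    Next el
End Function

Sub DemoSpanText()
    ' Hypothetical markup modelled on the snippet in the question
    Debug.Print SpanTextByItemProp( _
        "<span itemprop=""streetAddress""><b>94 Grand St</b></span>", "streetAddress")
End Sub
```

With a live response, the markup argument would be the responseText from the XMLHTTP request above.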

Related

Select Item in a dropdown from website via Excel Macro

I would like to select the options "Addition, Bulk, Reduction" using excel VBA
This is what I have so far, but nothing is being selected.
Dim ie As InternetExplorer
Set ie = New InternetExplorer
ie.Visible = True
ie.navigate "my URL"
Do While ie.readyState <> READYSTATE_COMPLETE
DoEvents
Loop
'time_adjust_group_ident = Reduction
Dim doc As HTMLDocument
Set doc = ie.document
doc.getElementById("time_adjust_group_ident").Value = "Reduction"
End Sub
You don't need an Internet Explorer object for this. Take a look at the code below, where I use MSXML2.XMLHTTP to make an HTTP request, get the HTML response as a string, and then parse it using the HTMLFile object.
I'm using the CreateObject method instead of adding the references via Tools > References, so you can run this code anywhere without having to add references manually every time you open it on a different machine.
In this example, I retrieve the child elements of the language-selector dropdown on a given website and loop through them with a For Each, writing each child element's content to a spreadsheet row.
Sub LoadHtml()
Dim strUrl As String
strUrl = "https://developer.mozilla.org/en-US/docs/Web/HTML/Element/select"
Dim httpRequest As Object
Set httpRequest = CreateObject("MSXML2.XMLHTTP")
With httpRequest
.Open "GET", strUrl, False
.send
End With
Dim html As Object
Set html = CreateObject("HTMLFile")
html.body.innerHTML = httpRequest.ResponseText
Dim child As Object
Dim row As Integer
row = 1
For Each child In html.getElementById("language-selector").Children
Range("A" & row) = child.innerText
row = row + 1
Next
End Sub
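The code above lists the options but does not select one. As a hedged sketch of the selection step itself, the helper below walks the options of a select element and sets selectedIndex when the visible text matches. The name SelectOptionByText and the demo markup are hypothetical. On a live page the element would come from the browser document, and the page may also expect a change event, which this sketch does not cover.

```vba
Option Explicit

' Selects the first <option> whose visible text equals optionText.
' Returns the selected index, or -1 when no option matches.
Function SelectOptionByText(ByVal selectElement As Object, ByVal optionText As String) As Long
    Dim i As Long
    SelectOptionByText = -1
    For i = 0 To selectElement.Options.Length - 1
        If Trim$(selectElement.Options.Item(i).Text) = optionText Then
            selectElement.selectedIndex = i
            SelectOptionByText = i
            Exit Function
        End If
    Next i
End Function

Sub DemoSelectOption()
    Dim doc As Object
    Set doc = CreateObject("htmlfile")
    ' Hypothetical markup standing in for the real page
    doc.body.innerHTML = "<select id=""time_adjust_group_ident"">" & _
        "<option>Addition</option><option>Bulk</option><option>Reduction</option></select>"
    Debug.Print SelectOptionByText(doc.getElementById("time_adjust_group_ident"), "Reduction")
End Sub
```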

When the search button is clicked using vba the text entered in search box is not seen by web page

I have written VBA code that enters a manufacturer part number in the search box of the website below and clicks the search icon. The code enters the part number and clicks the icon, but when the search icon is clicked, the text entered in the text box is not picked up, so the page searches with empty data.
'HTML Part for search icon
<em class="fa fa-search" aria-hidden="true" style="color: gray;"></em>
For almost a month I have tried various approaches, including ones mentioned on Stack Overflow such as createEvent("keyboardevent"), but nothing has worked.
' VBA code
Sub AptivScrapping()
Dim IE As SHDocVw.InternetExplorer
Set IE = New InternetExplorer
IE.Visible = True
IE.navigate "https://ecat.aptiv.com"
Do While IE.readyState < READYSTATE_COMPLETE
Loop
Dim idoc As MSHTML.HTMLDocument
Set idoc = IE.document
idoc.getElementById("searchUserInput").Value = "33188785"
Dim doc_ele As MSHTML.IHTMLElement
Dim doc_eles As MSHTML.IHTMLElementCollection
Set doc_eles = idoc.getElementsByTagName("a")
For Each doc_ele In doc_eles
If doc_ele.getAttribute("ng-click") = "SearchButtonClick(1)" Then
doc_ele.Click
Exit Sub
Else
End If
Next doc_ele
End Sub
The page issues an XHR request to retrieve the search results. You can find it in the browser's network tab after clicking submit. This means that, in this case, you can avoid the expense of a browser and issue the XHR request yourself. The response is JSON, so you need a JSON parser to handle the results.
I would use jsonconverter.bas to parse the JSON. After installing the code from that link in a standard module called JsonConverter, go to VBE > Tools > References and add a reference to Microsoft Scripting Runtime.
I dimension an array to hold the results. The number of rows comes from the number of items in the returned JSON collection, and the number of columns from the size of the first item's dictionary. I loop over the JSON collection, with an inner loop over the keys of each dictionary, and populate the array. I write the array out in one go at the end, which is less I/O expensive.
Option Explicit
Public Sub GetInfo()
Dim json As Object, ws As Worksheet, headers()
Dim item As Object, key As Variant, results(), r As Long, c As Long
Set ws = ThisWorkbook.Worksheets("Sheet1")
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://ecat.aptiv.com/json/eCatalogSearch/SearchProducts?filter=All&options=&pageSize=10&search=33188785", False
.send
Set json = JsonConverter.ParseJson(.responseText)("Products")
End With
headers = json.item(1).keys
ReDim results(1 To json.Count, 1 To UBound(headers) + 1)
For Each item In json
r = r + 1: c = 1
For Each key In item.keys
results(r, c) = item(key)
c = c + 1
Next
Next
With ws
.Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
.Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
End With
End Sub
You can do this instead:
txt = "33188785"
IE.navigate "https://ecat.aptiv.com/feature?search=" & txt
This will take you straight to the Search Result.
Code:
Sub AptivScrapping()
Dim IE As SHDocVw.InternetExplorer
Dim txt As String
Set IE = New InternetExplorer
txt = "33188785"
IE.Visible = True
IE.navigate "https://ecat.aptiv.com/feature?search=" & txt
Do While IE.Busy
Application.Wait DateAdd("s", 1, Now)
Loop
End Sub
This will be faster, as you only have to load one page.
I am not sure why this happens, but it seems that the text box is not activated when text is assigned to it programmatically. It is activated only when we click inside it.
I found the solution to the above problem on MrExcel.com; the link to that post is below.
https://www.mrexcel.com/forum/excel-questions/1105434-vba-ie-automation-issue-angularjs-input-text-post5317832.html#post5317832
In this case I need to enter the search string character by character, using SendKeys inside a loop so that the input events fire. Below is the working VBA code.
Sub AptivScrapping()
Dim IE As SHDocVw.InternetExplorer
Set IE = New InternetExplorer
IE.Visible = True
IE.navigate "https://ecat.aptiv.com"
Do While IE.readyState < READYSTATE_COMPLETE
Loop
Dim idoc As MSHTML.HTMLDocument
Set idoc = IE.document
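Dim sFieldInput As String, s As Long
Const LoopSeconds As Long = 1 ' assumed pause in seconds; this name was undeclared in the original
' Focus and Select are methods, so they are called rather than assigned
IE.document.getElementById("searchUserInput").Focus
IE.document.getElementById("searchUserInput").Select
sFieldInput = "33188785"
For s = 1 To Len(sFieldInput)
Application.SendKeys Mid(sFieldInput, s, 1)
While IE.readyState < 4 Or IE.Busy
Application.Wait DateAdd("s", LoopSeconds, Now)
Wend
Next s
' Move focus away from the input before clicking search
IE.document.getElementById("searchUserInput").blur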
Dim doc_ele As MSHTML.IHTMLElement
Dim doc_eles As MSHTML.IHTMLElementCollection
Set doc_eles = idoc.getElementsByTagName("a")
For Each doc_ele In doc_eles
If doc_ele.getAttribute("ng-click") = "SearchButtonClick(1)" Then
doc_ele.Click
Exit Sub
Else
End If
Next doc_ele
End Sub

How to extract text from div element under numerous elements?

pastebin.com/p9m5tMMw
Hello, I am trying to scrape data from a website. I have attached the code I have so far via Pastebin. The text I need, for example "NET SALES", sits under a tr, then a td, then a div, and I am having a hard time extracting it. If the text is NET SALES, I would like to get the value of the div inside the 4th td. The generated report is also dynamic, so I am having a hard time identifying a correct id for it. The table id "ctl00_cpMain_rptMain_fixedTable" can be used, but I can't get the code to navigate down through all those elements to reach the tbody that contains the tr and the elements I need. Thank you.
Sub GetData()
Dim IE As Object
Dim ListOfRows As Object
Dim CellsInsideRow As Object
Dim DivsInsideCell As Object
Dim StrInsideDiv As Object
Dim tRows As Object
Dim tCells As Object
Dim tDivs As Object
Dim Content As Object
Dim ContentArea As Object
Dim ThingInDiv As Object
Dim tThings
Dim t As Integer
Dim s As String
Dim Tbody As Object
Dim ttDivs As HTMLDivElement
Dim DivsInsideOfDivs As Object
Set IE = CreateObject("InternetExplorer.Application")
'Dim IE As SHDocVw.InternetExplorer'
Dim HtmlDoc As MSHTML.HTMLDocument
Dim htmlinput As MSHTML.IHTMLElement
Set IE = New SHDocVw.InternetExplorer
' Open Home Page'
With IE
.Visible = True
.navigate "https://ballout.sage-nexgen.com/report/SDateXParam.aspx?r=204"
Do While IE.readyState <> READYSTATE_COMPLETE
Loop
End With
' Identify Source Code'
Set HtmlDoc = IE.document
'Mention this bug'
' Input Username and Password'
'On Error Resume Next
'IE.document.forms("aspnetForm").elements("ctl00$cpMain$logMain$UserName").Value = "**********"
'IE.document.forms("aspnetForm").elements("ctl00$cpMain$logMain$Password").Value = "******"
'IE.document.forms("aspnetForm").elements("ctl00$cpMain$logMain$LoginButton").Click
Do While IE.Busy
DoEvents
Loop
IE.document.forms("aspnetForm").elements("ctl00$cpMain$clbStores$0").Click
Do While IE.Busy
DoEvents
Loop
IE.document.forms("aspnetForm").elements("ctl00$cpMain$StartDate").Value = Range("A1").Text '
Do While IE.Busy
DoEvents
Loop
IE.document.getElementById("ctl00_cpMain_cmdRun2").Click
Do While IE.Busy
DoEvents
Loop
On Error Resume Next
Set ListOfRows = IE.document.getElementsByTagName("tr")
Debug.Print ListOfRows.Length
For Each tRows In ListOfRows
Set CellsInsideRow = tRows.getElementsByTagName("td")
For Each tCells In CellsInsideRow
Set DivsInsideCell = tCells.getElementsByTagName("div")
For Each tDivs In DivsInsideCell
Set ttDivs = tDivs
'If Trim(ttDivs.innerText) = "AVERAGE NET SALE" Then
'Debug.Print Trim(ttDivs.innerText)
'End If
Next tDivs
Next tCells
Next tRows
'Set ListOfRows = Content.getElementsByTagName("tr")
'MsgBox (Len(ListOfRows))
'For Each tRows In ListOfRows
' Set CellsInsideRow = tRows.getElementsByTagName("td")
' For Each tCells In CellsInsideRow
'
' Set DivsInsideCell = tCells.getElementsByTagName("div")
'Next tCells
'Next tRows
End Sub
Information
You have nested tables but in your html there is a table with id content that has all the info. You can gather the rows (tr) in that table and loop looking for the string of interest. If you use .innerText on the row you will get the financial amount along with the text description. Otherwise, you can extract the sub info from the td elements within the row. Something like the following:
Dim html As HTMLDocument, tableRows As Object, i As Long, rowOfInterest As Object
Set html = ie.document
Dim a As Object
Set tableRows = html.querySelectorAll("#content tr")
For i = 0 To tableRows.Length - 1
If InStr(tableRows.Item(i).innerText, "NET SALES") > 0 Then
Set rowOfInterest = tableRows.Item(i)
Exit For
End If
Next
If Not rowOfInterest Is Nothing Then
Debug.Print tableRows.Item(i).innerText
Debug.Print tableRows.Item(i).getElementsByTagName("td")(1).innerText
End If
Sample of the row info:
The values are populated through JavaScript, so you need to allow enough time for them to load.
The HTML is a bit of a mess. Inside the target HTMLTableCell is another table, with many divs inside that table's cells. I would just split the target cell's innerText by line breaks and process the split data. That will be a lot easier than trying to decipher this HTML.
Sub ProcessData(IE As InternetExplorerMedium)
Const AverageNetSalesText As String = "AVERAGE NET SALE"
Dim doc As HTMLDocument, tbl As HTMLTable, ListRows As Object, tr As HTMLTableRow, td As HTMLTableCell
Dim data As Variant
Set doc = IE.Document
Set tbl = doc.getElementById("ctl00_cpMain_rptMain_fixedTable")
Set ListRows = tbl.getElementsByTagName("TR")
For Each tr In ListRows
If tr.Cells.Length >= 2 Then
Set td = tr.Cells(2)
If Not td Is Nothing Then
If InStr(td.innerText, "AVERAGE") > 0 Then
data = Split(td.innerText, vbNewLine)
Dim AverageNetSales As Single
If UBound(data) >= 15 Then
If InStr(data(15), AverageNetSalesText) > 0 Then
data(15) = Trim(data(15))
AverageNetSales = Right(data(15), Len(data(15)) - Len(AverageNetSalesText))
'Do something
Range("A17") = AverageNetSales
Exit Sub
End If
End If
End If
End If
End If
Next
End Sub

Excel Macro To Pull Google Image Links

The goal is to get images from Google Images that match the part numbers in my database. My code runs and pulls up the correct Google pages, but it refuses to put the links into the spreadsheet. I have tried everything I can think of, but as of now I keep getting Error 1004 (Application-defined or object-defined error).
Sub SearchBotGoogleImgLink()
Dim objIE As Object
Set IE = CreateObject("InternetExplorer.Application")
Dim HTMLdoc As HTMLDocument
Dim imgElements As IHTMLElementCollection
Dim imgElement As HTMLImg
Dim aElement As HTMLAnchorElement
Dim n As Integer
Dim i As Integer
Dim url As String
Dim url2 As String
Dim m As Long
Dim lastRow As Long
Dim url3 As String
Dim SearchRow As Long
Dim aEle As HTMLLinkElement
Worksheets("Sheet1").Select
SearchRow = 1
Do Until IsEmpty(ActiveSheet.Cells(SearchRow, 1))
Sheets("Sheet1").Select
Application.StatusBar = SearchRow - 1 & " of " & "4368" & " Items Done"
Item = Trim(ActiveSheet.Cells(SearchRow, 1))
url = "https://www.google.com/search?hl=en&biw=1600&bih=796&tbm=isch&sa=1&ei=CTOpW_2jO6nAjwT67rqACw&q=A2N0015C3KUU&oq=" & Cells(SearchRow, 1) & "&oq=A2N0015C3KUU&gs_l=img.12...0.0..1704...0.0..0.0.0.......1......gws-wiz-img.9wB6WwQJhwA"
Set objIE = New InternetExplorer
objIE.Visible = True
objIE.navigate url
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
For Each aEle In objIE.document.getElementsByTagName("IMG")
result = aEle
Sheets("Sheet1").Range(SearchRow & "C").Value = result
Sheets("Sheet1").Range(SearchRow & "D") = aEle.innerHTML
Sheets("Sheet1").Range(SearchRow & "F").Value = aEle.innerText
Debug.Print aEle.innerText
Next
Loop
'For i = 1 To lastRow
'url = "https://www.google.com/search?hl=en&biw=1600&bih=796&tbm=isch&sa=1&ei=CTOpW_2jO6nAjwT67rqACw&q=A2N0015C3KUU&oq=" & Cells(SearchRow, 1) & "&oq=A2N0015C3KUU&gs_l=img.12...0.0..1704...0.0..0.0.0.......1......gws-wiz-img.9wB6WwQJhwA"
Set HTMLdoc = objIE.document
Set imgElements = HTMLdoc.getElementsByTagName("IMG")
n = 1
For Each imgElement In imgElements
If InStr(ingElement.src, sImageSearchString) Then
If imgElement.ParentNode.nodeName = "A" Then
Set aElement = imgElement.ParentNode
If n = 2 Then
url2 = aElement.href 'imgElement.src
url3 = imgElement.src 'aElement.href
n = n + 1
End If
End If
End If
Next
Cells(SearchRow, 5) = url2
IE.Quit
Set IE = Nothing
End Sub
Notes on your code:
You need Option Explicit at the top of your code to catch undeclared variables and typos (e.g. ingElement), among other advantages. Several variables are never declared (e.g. result), and IE is used in Set IE = CreateObject("InternetExplorer.Application") without being declared. You have two different variables (one late bound and one early bound) both creating IE instances, and you in fact only use one.
Your current error may come from trying to assign an object here:
result = aEle, which won't work without the Set keyword to provide the required reference.
Without example URLs and expected output it is difficult to advise on the later loops in your code. You appear to have a duplicate loop over the IMG elements, this time with some restrictions. It is likely these loops can be merged.
An example:
The following uses an arbitrary search URL to pull the img src links from the search results for A2N0015C3KUU.
It uses a CSS selector combination of #ires img[src] to target elements with img tags and src attributes within the parent element with id ires (search results).
It demonstrates the principle of gathering a nodeList of matching elements and writing them out to a sheet. The querySelectorAll method applies the CSS selector combination to the HTMLDocument and returns the nodeList. The nodeList is looped along its .Length, with items accessed by index starting at 0.
Option Explicit
Public Sub GetInfo()
Dim IE As New InternetExplorer
With IE
.Visible = True
.navigate "https://www.google.com/search?hl=en&biw=1600&bih=796&tbm=isch&sa=1&ei=CTOpW_2jO6nAjwT67rqACw&q=A2N0015C3KUU&oq=1&%20%22&oq=A2N0015C3KUU&gs_l=img.12...0.0..1704...0.0..0.0.0.......1......gws-wiz-img.9wB6WwQJhwA"
While .Busy Or .readyState < 4: DoEvents: Wend
Dim aNodeList As Object, i As Long
Set aNodeList = IE.document.querySelectorAll("#ires img[src]")
For i = 0 To aNodeList.Length - 1
ActiveSheet.Cells(i + 2, 4) = aNodeList.item(i).src
Next
.Quit '<== Remember to quit the application
End With
End Sub
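Separately from the notes above, the question's Range(SearchRow & "C") builds an address such as "1C", which is not a valid A1-style reference and is a likely source of a 1004 error on its own. The sketch below shows the column-letter-first form. The helper name A1Address and the placeholder value are hypothetical.

```vba
Option Explicit

' Builds an A1-style address with the column letter first, e.g. "C5".
Function A1Address(ByVal columnLetter As String, ByVal rowNumber As Long) As String
    A1Address = columnLetter & CStr(rowNumber)
End Function

Sub DemoWriteImageSource()
    Dim searchRow As Long
    searchRow = 5
    ' Range(searchRow & "C") would ask for "5C", which is not a valid address
    ActiveSheet.Range(A1Address("C", searchRow)).Value = "placeholder src"
    ' Same cell, addressed by numeric row and column indexes
    ActiveSheet.Cells(searchRow, 3).Value = "placeholder src"
End Sub
```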

Looping through a row and copying each cell In a specific procedure

What I have to do is use Excel VBA to:
login to Amazon Seller
open a workbook
loop through a column to get an order number
put it in the search box
hit the search button
go to the order page and extract the data
then have the extracted data go back into a specified column in another Excel workbook
The loop and order number parts are what I'm currently stumped on. I've figured out this much code as of this moment:
Sub MyAmazonSeller()
Dim MyHTML_Element As IHTMLElement
Dim MyURL As String
Dim oSignInLink As HTMLLinkElement
Dim oInputEmail As HTMLInputElement
Dim oInputPassword As HTMLInputElement
Dim oInputSigninButton As HTMLInputButtonElement
'InputSearchOrder will be the destination for order numbers taken from the workbook
Dim InputSearchOrder As HTMLInputElement
Dim InputSearchButton As HTMLInputButtonElement
Dim IE As InternetExplorer
Dim AAOrder As Workbook
Dim AAws As Worksheet
MyURL = "https://sellercentral.amazon.com/gp/homepage.html"
Set IE = New InternetExplorer
' Open the browser and navigate.
With IE
.Silent = True
.Navigate MyURL
.Visible = True
Do
DoEvents
Loop Until .ReadyState = READYSTATE_COMPLETE
End With
' Get the html document.
Set HTMLDoc = IE.Document
' See if you have the sign in link is because you are in the main
' page
Set oSignInLink = HTMLDoc.getElementById("signin-button-container")
If Not oSignInLink Is Nothing Then
oSignInLink.Click
Do
DoEvents
Loop Until IE.ReadyState = READYSTATE_COMPLETE
End If
' Get the email field and the next button
Set oInputEmail = HTMLDoc.getElementById("username")
Set oInputPassword = HTMLDoc.getElementById("password")
' Click the button and wait
oInputEmail.Value = "xxxxxx#xxxxxx.net"
' Get the password field and the sign in button
Set oInputPassword = HTMLDoc.getElementById("password")
Set oInputSigninButton = HTMLDoc.getElementById("sign-in-button")
' Click the button and wait
oInputPassword.Value = "xxxxxxxx"
oInputSigninButton.Click
Do
DoEvents
Loop Until IE.ReadyState = READYSTATE_COMPLETE
Application.Wait (Now + TimeValue("0:00:05"))
Set AAOrder = Application.Workbooks.Open("Z:\Employee Folders\Employee\trackingnumber_sample_spreadsheet.xls")
Set AAws = AAOrder.Worksheets("PrimeOrdersWithNoFulfillmentRe")
Set InputSearchOrder = HTMLDoc.getElementById("sc-search-field")
'What I'm currently stuck on
InputSearchOrder.Value = "001-7163923-7572632"
Set InputSearchButton = HTMLDoc.getElementsByClassName("sc-search-button")(0)
InputSearchButton.Click
Do
DoEvents
Loop Until IE.ReadyState = READYSTATE_COMPLETE
'Was able to add this snippet, but I'm getting an error 13, most likely with
'my e variable. I'm basically trying to do a loop within a loop, extracting 5
'pieces of data and sticking them back into their respective columns in the
'original Excel sheet. The problem comes when scraping the HTML. I'm basically
'trying to get text in the tables which have a few levels and it's frustrating
'me to no end.
With HTMLDoc
Set elems = HTMLDoc.getElementsByTagName("td")
For Each e In elems
If e.innerText Like "*1Z*" Then
Range("D2").Value = e.innerText
End If
Next e
End With
Err_Clear:
If Err <> 0 Then
Err.Clear
Resume Next
End If
End Sub
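For the loop over order numbers that the question is stuck on, one possible shape is to gather the values from the column first and then feed them to the search box one at a time. This is a sketch under assumptions: the function name CollectOrderNumbers is hypothetical, the order numbers are assumed to sit in one column with no error cells, and the demo reads column A of the active sheet starting at row 2.

```vba
Option Explicit

' Collects the non-empty values in one column, from firstRow down to the last used row.
Function CollectOrderNumbers(ByVal ws As Worksheet, ByVal columnIndex As Long, _
                             ByVal firstRow As Long) As Collection
    Dim result As New Collection
    Dim lastRow As Long, r As Long, orderNumber As String
    lastRow = ws.Cells(ws.Rows.Count, columnIndex).End(xlUp).Row
    For r = firstRow To lastRow
        orderNumber = Trim$(CStr(ws.Cells(r, columnIndex).Value))
        If Len(orderNumber) > 0 Then result.Add orderNumber
    Next r
    Set CollectOrderNumbers = result
End Function

Sub DemoCollectOrderNumbers()
    Dim orderNumber As Variant
    For Each orderNumber In CollectOrderNumbers(ActiveSheet, 1, 2)
        ' In the question's macro, this is where InputSearchOrder.Value would be set
        Debug.Print orderNumber
    Next orderNumber
End Sub
```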

Resources