Web scraping with Excel VBA to get hotel details - excel

I am trying to scrape www.booking.com to get hotel information based on a search I did on the website. The URL contains my search parameters, such as the address and check-in date.
After the search, I want to scrape the hotel name, score, distance and other information from the results list, without going into the detail page of each hotel. But I can't work out the steps to get this information.
If I am right, all the search results sit inside the element with ID "hotellist_inner", which contains the separate hotel results in elements with class "sr_property_block". But I already get an error on the first loop line:
For Each itemEle In objIE.Document.getElementById("hotellist_inner"), as it seems that the result of getElementById is empty.
See my example code below.
Sub test()
    Dim ws As Worksheet: Set ws = Sheets("scrape")
    Dim objIE As Object
    Set objIE = CreateObject("InternetExplorer.Application")
    Dim url As String
    url = ws.Range("A1")
    With objIE
        .navigate url
        .Visible = True
    End With
    'wait IE
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
    ResRw = 2
    For Each itemEle In objIE.Document.getElementById("hotellist_inner")
        'Every booking goes on a new line, so reset the column to 1
        ResCol = 1
        For Each d In itemEle.getElementByClassName("sr_property_block ") 'Each result (hotel) has this Class
            'For the next result, add 1 to the column
            ResCol = ResCol + 1
        Next d
        'For the next result, add 1 to the row
        ResRw = ResRw + 1
    Next
    objIE.Quit
    Set objIE = Nothing
End Sub
Sample url: https://www.booking.com/searchresults.nl.html?label=gen173nr-1DCAEoggI46AdIM1gEaKkBiAEBmAEcuAEXyAEM2AED6AEB-AEDiAIBqAIDuAL-25KJBsACAdICJDQ5MDk4ZmU4LWE3MDYtNGEwNi1hZTY2LWYzYTc5YmQwZWU2M9gCBOACAQ&sid=c7a70093e9732a158e970a642f2af63f&sb=1&sb_lp=1&src=index&src_elem=sb&error_url=https%3A%2F%2Fwww.booking.com%2Findex.nl.html%3Flabel%3Dgen173nr-1DCAEoggI46AdIM1gEaKkBiAEBmAEcuAEXyAEM2AED6AEB-AEDiAIBqAIDuAL-25KJBsACAdICJDQ5MDk4ZmU4LWE3MDYtNGEwNi1hZTY2LWYzYTc5YmQwZWU2M9gCBOACAQ%3Bsid%3Dc7a70093e9732a158e970a642f2af63f%3Bsb_price_type%3Dtotal%26%3B&ss=Amsterdam%2C+Noord-Holland%2C+Nederland&is_ski_area=&checkin_year=2021&checkin_month=8&checkin_monthday=24&checkout_year=2021&checkout_month=8&checkout_monthday=31&group_adults=1&group_children=0&no_rooms=1&b_h4u_keep_filters=&from_sf=1&ss_raw=Amsterdam&ac_position=0&ac_langcode=nl&ac_click_type=b&dest_id=-2140479&dest_type=city&iata=AMS&place_id_lat=52.372898&place_id_lon=4.893&search_pageview_id=694a3bbf21c20007&search_selected=true&search_pageview_id=694a3bbf21c20007&ac_suggestion_list_length=5&ac_suggestion_theme_list_length=0
Does anyone have an idea how to fix this?
Tnx!
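One possible restructuring, offered as an untested sketch. getElementById returns a single element (or Nothing), so it cannot be the target of a For Each, and the class lookup method is spelled getElementsByClassName (plural). The sketch assumes the page still exposes the id hotellist_inner and the class sr_property_block, which may no longer be true, and it only writes the raw text of each result block. The Sub name ListHotelBlocks is made up for illustration.

```vba
Sub ListHotelBlocks()
    ' Assumptions: the URL is in scrape!A1 and the page still uses the id
    ' "hotellist_inner" and the class "sr_property_block" (not verified).
    Dim ws As Worksheet, objIE As Object
    Dim container As Object, blocks As Object
    Dim i As Long, resRw As Long

    Set ws = ThisWorkbook.Sheets("scrape")
    Set objIE = CreateObject("InternetExplorer.Application")
    objIE.Visible = True
    objIE.navigate ws.Range("A1").Value
    Do While objIE.Busy Or objIE.readyState <> 4: DoEvents: Loop

    ' getElementById gives back one element, or Nothing when the id is absent.
    Set container = objIE.document.getElementById("hotellist_inner")
    If container Is Nothing Then
        MsgBox "hotellist_inner was not found; the page may still be loading or the id may have changed."
    Else
        ' Collect every hotel block inside the container and walk it by index.
        Set blocks = container.querySelectorAll(".sr_property_block")
        resRw = 2
        For i = 0 To blocks.Length - 1
            ws.Cells(resRw, 1).Value = blocks.Item(i).innerText
            resRw = resRw + 1
        Next i
    End If

    objIE.Quit
    Set objIE = Nothing
End Sub
```

Once the blocks are found, the name, score and distance would each need their own selector inside the loop; those selectors are not given in the question, so they are left out here.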

Related

eBay Product scraper

My VBA knowledge is very limited.
The code is in a module and calls a sub procedure as well, so apologies if I have posted the code incorrectly:
A) The main procedure opens IE.
B) The sub procedure gets the data.
The code works fine on ebay.com but NOT on ebay.co.uk, and I can't work out why. It also converts the URLs to hyperlinks.
It only does the first page; I need it to go through X pages. I have code for that but couldn't get it to work, so I have removed it.
Can the search query be run AFTER eBay opens (eBay opens, the search term is typed in, then the code runs), or be taken from a cell? If the search term is in cell A1, the extracted data needs to be pasted into A2 and below.
I have looked at the elements for ebay.com and ebay.co.uk and they look the same to me, so I can't work out why it works for one and not the other.
I did add the code for getting data from several pages, but it did not work. I know that code works, as I use it when I fetch URLs from Google.
Public IE As New SHDocVw.InternetExplorer
Sub GetData()
    Dim HTMLdoc As MSHTml.HTMLDocument
    Dim othwb As Variant
    Dim objShellWindows As New SHDocVw.ShellWindows
    Set IE = CreateObject("internetexplorer.application")
    With IE
        .Visible = True
        '.Navigate "https://www.ebay.co.uk/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=jackets&_sacat=0"
        .Navigate "https://www.ebay.com/sch/i.html?_from=R40&_nkw=ralph+lauren&_sacat=1059&LH_TitleDesc=0&_dmd=1&rt=nc"
        While .Busy Or .readyState <> 4: DoEvents: Wend
        Set HTMLdoc = IE.document
        ProcessHTMLPage HTMLdoc
        .Quit
    End With
End Sub
'''''' THIS IS THE SUB PROCESS '''''
Sub ProcessHTMLPage(HTMLPage As MSHTml.HTMLDocument)
    Dim HTMLItem As MSHTml.IHTMLElement
    Dim HTMLItems As MSHTml.IHTMLElementCollection
    Dim HTMLInput As MSHTml.IHTMLElement
    Dim rownum As Long
    rownum = 1
    Set HTMLItems = HTMLPage.getElementsByClassName("s-item__title")
    For Each HTMLItem In HTMLItems
        Cells(rownum, 1).Value = HTMLItem.innerText
        rownum = rownum + 1
    Next HTMLItem
    rownum = 1
    Set HTMLItems = HTMLPage.getElementsByClassName("s-item__price")
    For Each HTMLItem In HTMLItems
        Cells(rownum, 2).Value = HTMLItem.innerText
        rownum = rownum + 1
    Next HTMLItem
    rownum = 1
    Set HTMLItems = HTMLPage.getElementsByClassName("s-item__link")
    For Each HTMLItem In HTMLItems
        Cells(rownum, 3).Value = HTMLItem.href
        rownum = rownum + 1
    Next HTMLItem
    'Converts each text hyperlink selected into a working hyperlink from C1 to 25000 rows
    Range("C1:C25000").Select
    For Each xCell In Selection
        ActiveSheet.Hyperlinks.Add Anchor:=xCell, Address:=xCell.Formula
    Next xCell
    Range("C1").Select
End Sub
Code for going to next page
pageNumber = 1
'i = 2
If pageNumber >= 6 Then Exit Do 'the first 6 pages
internetdata.getElementById("pnnext").click 'next web page
Do While internet.Busy Or internet.readyState <> 4
DoEvents
Loop
Set internetdata = internet.document
pageNumber = pageNumber + 1
Loop
It does not work on ebay.co.uk (NO RESULTS ARE EXTRACTED) but works fine on ebay.com.
I need it to get data from X pages and NOT just 1 page.
The search term should either be typed in after eBay opens or be read from cell A1, with the extracted data pasted into A2 and below.
This is my code for the Google search. I have got it working so that the search term comes from cell A1, and I am looking for something like this for eBay; I am going to see if I can combine the eBay code with it. It also goes through the first 25 pages of the Google results.
Sub webpage()
Dim ie As Object
Dim htmlDoc As Object
Dim nextPageElement As Object
Dim div As Object
Dim link As Object
Dim url As String
Dim pageNumber As Long
Dim i As Long
' Takes the search term from A1 and places it into Google
url = "https://www.google.co.uk/search?q=" & Replace(Worksheets("Sheet1").Range("A1").Value, " ", "+")
Set ie = CreateObject("InternetExplorer.Application")
With ie
.Visible = True
.navigate url
Do While .Busy Or .readyState <> 4
DoEvents
Loop
End With
Application.Wait Now + TimeSerial(0, 0, 5)
Set htmlDoc = ie.document
pageNumber = 1
i = 2
Do
For Each div In htmlDoc.getElementsByTagName("div")
If div.getAttribute("class") = "r" Then
Set link = div.getElementsByTagName("a")(0)
Cells(i, 2).Value = link.getAttribute("href")
i = i + 1
End If
Next div
If pageNumber >= 25 Then Exit Do 'the first 25 pages
Set nextPageElement = htmlDoc.getElementById("pnnext")
If nextPageElement Is Nothing Then Exit Do
' Clicks web next page
nextPageElement.Click 'next web page
Do While ie.Busy Or ie.readyState <> 4
DoEvents
Loop
Application.Wait Now + TimeSerial(0, 0, 5)
Set htmlDoc = ie.document
pageNumber = pageNumber + 1
Loop
MsgBox "All Done"
Set ie = Nothing
Set htmlDoc = Nothing
Set nextPageElement = Nothing
Set div = Nothing
Set link = Nothing
End Sub
Question 1: Why does it work for one domain but not the other?
To answer question 1 (the other questions should be new posts): the HTML is not the same at all. The classes that work for ebay.com are not found on ebay.co.uk, so your loops over the collections don't do anything because each collection has a count of 0 (or a length of 0 for nodeLists if using querySelectorAll). Instead, you need branched code: set your selectors based on the URL domain.
I have used CSS selectors, as this is the easiest and fastest way to select the required elements, and it keeps the code flexible enough to refactor and reduce the lines of repeated code.
Side note:
If you are unsure whether your selection method will work across different pages, you can do at least two things:
1. Right click > Inspect element > visually check that the class names are the same for the elements you are comparing. So, if you are looking at product names, are the class names in the HTML the same on both pages?
2. Use the search facility of the browser: open the Elements tab via F12, then press Ctrl+F to pull up the search box, enter the class name from the first page into this box on the second page and hit Enter. You can also enter CSS selectors here and, in some cases, regex. You will get a hit count telling you how many matches were found. You can keep pressing Enter to cycle through the matches; each match is highlighted in the HTML above, so you can easily check whether the matched results are what you expected.
img url: https://i.stack.imgur.com/MWkEx.png
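The same hit-count check can be done from VBA before writing anything to the sheet. This is only a sketch: PrintSelectorCounts is a made-up helper name, and doc is assumed to be the document of a page that has already finished loading.

```vba
' Hypothetical debugging helper: prints how many elements each CSS selector
' matches in the given document. A count of 0 means the selector does not
' apply to that page.
Public Sub PrintSelectorCounts(ByVal doc As Object, ByVal selectors As Variant)
    Dim i As Long
    For i = LBound(selectors) To UBound(selectors)
        Debug.Print selectors(i), doc.querySelectorAll(selectors(i)).Length
    Next i
End Sub
```

For example, calling PrintSelectorCounts ie.document, Array(".s-item__title", ".gvtitle a") after the page has loaded lists both counts in the Immediate window, which shows which set of selectors belongs to the current domain.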
VBA:
Option Explicit
Public Sub GetData()
    Dim htmlDoc As MSHTML.HTMLDocument, ie As SHDocVw.InternetExplorer, ws As Worksheet
    Set ie = New SHDocVw.InternetExplorer
    Set htmlDoc = New MSHTML.HTMLDocument
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    With ie
        .Visible = True
        '.Navigate2 "https://www.ebay.co.uk/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=jackets&_sacat=0"
        .Navigate2 "https://www.ebay.com/sch/i.html?_from=R40&_nkw=ralph+lauren&_sacat=1059&LH_TitleDesc=0&_dmd=1&rt=nc"
        While .Busy Or .readyState <> 4: DoEvents: Wend
        Dim index As Long, HTMLItems As Object, rowNum As Long, xCell As Range
        Dim cssSelectors(), i As Long
        'Pick the title, price and link selectors that match the current domain
        Select Case True
            Case InStr(.document.URL, "ebay.co.uk") > 0
                cssSelectors = Array(".gvtitle a", ".amt", ".gvtitle a")
            Case InStr(.document.URL, "ebay.com") > 0
                cssSelectors = Array(".s-item__title", ".s-item__price", ".s-item__link")
            Case Else
                'Unknown domain: there are no selectors to apply
                .Quit
                Exit Sub
        End Select
        With ws
            'One column per selector: A = title, B = price, C = link
            For i = LBound(cssSelectors) To UBound(cssSelectors)
                rowNum = 1
                Set HTMLItems = ie.document.querySelectorAll(cssSelectors(i))
                For index = 0 To HTMLItems.length - 1
                    .Cells(rowNum, i + 1).Value = IIf(i = 2, HTMLItems.item(index).getAttribute("href"), HTMLItems.item(index).innerText)
                    rowNum = rowNum + 1
                Next
            Next
            For Each xCell In .Range("C1:C25000") '<= all these really?
                .Hyperlinks.Add Anchor:=xCell, Address:=xCell.Formula
            Next xCell
        End With
        .Quit
    End With
End Sub
If this works on ebay.com, then you need to find out for yourself why it's not working on ebay.co.uk. My point is that if the code itself works, then there is nothing we can help you with here. You need to take some time to investigate ebay.co.uk and find the differences, as I am sure it's something minor. I can't help you fix code that isn't actually broken. I wish you luck, though.

Scraping Web VBA Excel unclear

I just started learning web scraping, and I'm trying to write code that searches for specific data on a web page, clicks the search button, and then extracts specific data into an Excel sheet. I succeeded in putting the value I want to look for into the web page, but when I use the search button I receive an error.
I don't know how to do it or how to correct it.
This is my code and the button code:
VBA Code
Sub clickICC()
    Dim ie As Object
    Dim form As Variant, button As Variant
    Set ie = CreateObject("InternetExplorer.Application")
    myjobtyp = InputBox("Enter type of MP,MOD,DATE")
    With ie
        .Visible = True
        .navigate ("http://XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX")
        While ie.ReadyState <> 4
            DoEvents
        Wend
        ie.document.getElementsbyname("searchById").Item.innertext = myjobtyp
        Set form = ie.document.getElementsbytagname("form")
        Set button = form(0).onsubmit
        form(0).submit
        Do While ie.busy: DoEvents: Loop
        Set TDelements = .documents.getElementsbytagname("td")
        r = 0
        c = 0
        For Each TDelements In TDelements
            sheet1.Range("A1").Offset(r, c).Value = TDelement.innertext
            r = r + 1
        Next
    End With
    Set ie = Nothing
End Sub
The web button code:
Does anyone have a clue about this, or can anyone shed light on how to fix it?
You could try using a CSS selector to target the button by its src string:
ie.document.querySelector("input[src*=""/cmh/cmh/image/button_search.gif""]").Click
Note that there is a typo in your original post:
Set TDelements = .documents.getElementsbytagname("td")
It should be .document, with no s on the end.
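The loop that follows that line in the question also reuses the collection name as the loop variable (For Each TDelements In TDelements) and then reads from TDelement, which is never set. A minimal sketch of how the extraction might look with a separate loop variable; WriteCellsToSheet is a made-up helper name, and the target column A is taken from the question:

```vba
' Hypothetical helper: writes the text of every <td> in the document
' into column A of the given worksheet, one cell per row.
Sub WriteCellsToSheet(ByVal doc As Object, ByVal ws As Worksheet)
    Dim td As Object   ' the current table cell
    Dim r As Long      ' row offset from A1
    For Each td In doc.getElementsByTagName("td")
        ws.Range("A1").Offset(r, 0).Value = td.innerText
        r = r + 1
    Next td
End Sub
```

It would be called after the search results have loaded, for example WriteCellsToSheet ie.document, Sheet1.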

loop through rows of data to submit web form

Sub AutoLoadAccounts()
    Dim IE As Object
    Set IE = CreateObject("InternetExplorer.Application")
    IE.Navigate "https://www.urlhere.com/admin/accounts/create"
    IE.Visible = True
    While IE.busy
        DoEvents
    Wend
    IE.Document.All("title").Value = ThisWorkbook.Sheets("sheet1").Range("a1")
    IE.Document.All("names").Value = ThisWorkbook.Sheets("sheet1").Range("b1")
    IE.Document.All("floor").Value = 30
    IE.Document.getElementById("status").selectedindex = 1
    IE.Document.getElementById("email_state").selectedindex = 1
    IE.Document.All("id").Value = ThisWorkbook.Sheets("sheet1").Range("c1")
    IE.Document.All("years").Value = ThisWorkbook.Sheets("sheet1").Range("d1")
    IE.Document.All("submit").Click
End Sub
I use the above code to populate a web form and submit it. I have around 150 rows of data in columns A to D. I am trying to find a way to loop through the rows one by one, submitting the form each time, until it reaches the end.
So essentially it will start on the first row and populate the fields from A1:D1, then, once complete, go down to the next row and do the same for A2:D2, and so on.
The trick here is to organise your source data. Using two columns, you can record the field name and the required value:
     A        B
1    Title    Sample Title
2    Names    Sample Names
3    Floor    Sample Floor
To loop:
Sub AutoLoadAccounts()
    Dim IE As Object
    Dim cRow As Range ' Current row, used to extract values from Excel.
    Set IE = CreateObject("InternetExplorer.Application")
    IE.Navigate "https://www.urlhere.com/admin/accounts/create"
    IE.Visible = True
    While IE.busy
        DoEvents
    Wend
    ' Executes once for each row in the source range.
    For Each cRow In ThisWorkbook.Sheets("sheet1").Range("A1:A3")
        ' Read field name and value from current row.
        IE.Document.All(cRow.Value).Value = cRow.Offset(0, 1)
    Next
    IE.Document.All("submit").Click
End Sub
This code could be improved. At the moment the source range is hard-coded (Range("A1:A3")). You could change this so that the code automatically identifies all completed rows in Excel. If you are interested, look into the worksheet's UsedRange property.
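As a rough sketch of that improvement, the last filled row can be found by walking up from the bottom of column A. This uses Range.End rather than UsedRange, which is a swapped-in technique, and it assumes the field names start in A1. FieldNameRange is a made-up helper name.

```vba
' Hypothetical helper: returns the block of field names in column A,
' from A1 down to the last non-empty cell in that column.
Function FieldNameRange(ByVal ws As Worksheet) As Range
    Dim lastRow As Long
    ' Start at the bottom of the sheet and jump up to the last used cell.
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    Set FieldNameRange = ws.Range("A1:A" & lastRow)
End Function
```

The loop header could then read For Each cRow In FieldNameRange(ThisWorkbook.Sheets("sheet1")) instead of naming A1:A3 directly.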
EDIT
Added example that reads source data from columns, not rows.
Sub AutoLoadAccounts_Columns()
    Dim IE As Object
    Dim cRow As Range ' Current header cell, used to extract values from Excel.
    Set IE = CreateObject("InternetExplorer.Application")
    IE.Navigate "https://www.urlhere.com/admin/accounts/create"
    IE.Visible = True
    While IE.busy
        DoEvents
    Wend
    ' Executes once for each column in the source range.
    For Each cRow In ThisWorkbook.Sheets("sheet1").Range("A1:C1")
        ' Read the field name from row 1 and its value from the cell below.
        IE.Document.All(cRow.Value).Value = cRow.Offset(1, 0).Value
    Next
    IE.Document.All("submit").Click
End Sub
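Neither version above submits the form more than once. A sketch of the row-by-row loop the question asks for is below. It assumes the original layout (one account per row, values in columns A to D starting at row 1), that the create page can simply be reloaded for each account, and that waiting on Busy and ReadyState is enough for the submission to finish; none of this has been tested against the real form. The Sub name is made up.

```vba
Sub AutoLoadAccounts_AllRows()
    ' Sketch only: field names are copied from the question; the sheet layout
    ' and the reload-per-row approach are assumptions.
    Dim IE As Object, ws As Worksheet
    Dim r As Long, lastRow As Long

    Set ws = ThisWorkbook.Sheets("sheet1")
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = True

    For r = 1 To lastRow
        ' Load a fresh, empty form for every row.
        IE.Navigate "https://www.urlhere.com/admin/accounts/create"
        Do While IE.Busy Or IE.ReadyState <> 4: DoEvents: Loop

        With IE.Document
            .All("title").Value = ws.Cells(r, "A").Value
            .All("names").Value = ws.Cells(r, "B").Value
            .All("floor").Value = 30
            .getElementById("status").selectedIndex = 1
            .getElementById("email_state").selectedIndex = 1
            .All("id").Value = ws.Cells(r, "C").Value
            .All("years").Value = ws.Cells(r, "D").Value
            .All("submit").Click
        End With

        ' Let the submission finish before moving on to the next row.
        Do While IE.Busy Or IE.ReadyState <> 4: DoEvents: Loop
    Next r

    IE.Quit
End Sub
```

If the site is slow to respond after the click, a short Application.Wait after the second wait loop may be needed.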

Copy dropdown table from internet VBA

I am trying to copy a table from this website: http://www.nzfma.org/data/search.aspx#
I need to select the date as yesterday's date, then copy and paste the table to a file.
My code is below:
Sub Test1()
    'open IE, navigate to the website of interest and loop until fully loaded
    Dim NZFMA As Worksheet
    Dim TodayN As Range
    Dim elemCollection As Object
    Set NZFMA = Sheets("NZFMA")
    Set TodayN = NZFMA.Range("B2")
    Set ie = CreateObject("InternetExplorer.Application")
    With ie
        .Visible = True
        .navigate "http://www.nzfma.org/data/search.aspx"
        Do Until Not ie.Busy And ie.ReadyState = 4
            DoEvents
        Loop
        'Select the dates from the drop-down box
        ie.Document.getElementbyid("ctl00_cphBody_rdpDate_dateInput").Value = Format(TodayN, "yyyy-mm-dd")
        'Click the submit button
        ie.Document.getElementbyid("cphBody_btnSearch").Value = "Search"
        'Copy the results
        Set elemCollection = ie.Document.getElementbyid("cphBody_upResults")
        While ie.ReadyState = 4
            DoEvents
        Wend
    End With
End Sub
For some reason, my macro stops after the first getElementbyID line. Can someone advise as to which part of the code is wrong?
Alright, I haven't tested this, but something like it should work to change the first dropdown box:
Dim elemCollection As Object ' the market dropdown, a <select> element
Set elemCollection = ie.Document.getElementById("cphBody_lstMarket")
Dim testString As String
testString = "BKBM" ' example value only; use whichever market you need
Select Case testString
    Case "BKBM"
        elemCollection.selectedIndex = 1
    Case "BBRS030"
        elemCollection.selectedIndex = 2
    ' ...and so on for the remaining options
End Select
To change the date, you will need something like:
ie.Document.getElementById("ctl00_cphBody_rdpDate_dateInput").Value = "05/07/2016"
Anyway, have a go with that, and please tell me the results.
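Since the macro in the question stops on the first getElementById line, it may be worth checking whether that id exists on the page at all: getElementById returns Nothing when it does not, and calling .Value on Nothing raises a runtime error. A small sketch of a guarded version follows; TrySetValue is a made-up helper name, and the ids are the ones from the question, not verified against the site.

```vba
' Hypothetical helper: sets the value of the element with the given id and
' returns True, or returns False (and logs the id) when no such element exists.
Function TrySetValue(ByVal doc As Object, ByVal elementId As String, ByVal newValue As String) As Boolean
    Dim el As Object
    Set el = doc.getElementById(elementId)
    If el Is Nothing Then
        Debug.Print "No element with id: " & elementId
    Else
        el.Value = newValue
        TrySetValue = True
    End If
End Function
```

It could be used as If TrySetValue(ie.Document, "ctl00_cphBody_rdpDate_dateInput", Format(TodayN, "yyyy-mm-dd")) Then ie.Document.getElementById("cphBody_btnSearch").Click. Note that the question sets the button's .Value to "Search", which only changes its caption; .Click is what actually presses it.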

Looping through a row and copying each cell In a specific procedure

What I have to do is use Excel VBA to:
login to Amazon Seller
open a workbook
loop through a column to get an order number
put it in the search box
hit the search button
go to the order page and extract the data
then have the extracted data go back into a specified column in another Excel workbook
The loop and order number parts are what I'm currently stumped on. I've figured out this much code as of this moment:
Sub MyAmazonSeller()
Dim MyHTML_Element As IHTMLElement
Dim MyURL As String
Dim oSignInLink As HTMLLinkElement
Dim oInputEmail As HTMLInputElement
Dim oInputPassword As HTMLInputElement
Dim oInputSigninButton As HTMLInputButtonElement
'InputSearchOrder will be the destination for order numbers taken from the workbook
Dim InputSearchOrder As HTMLInputElement
Dim InputSearchButton As HTMLInputButtonElement
Dim IE As InternetExplorer
Dim AAOrder As Workbook
Dim AAws As Worksheet
MyURL = "https://sellercentral.amazon.com/gp/homepage.html"
Set IE = New InternetExplorer
' Open the browser and navigate.
With IE
.Silent = True
.Navigate MyURL
.Visible = True
Do
DoEvents
Loop Until .ReadyState = READYSTATE_COMPLETE
End With
' Get the html document.
Set HTMLDoc = IE.Document
' If the sign-in link is present, you are on the main page
Set oSignInLink = HTMLDoc.getElementById("signin-button-container")
If Not oSignInLink Is Nothing Then
oSignInLink.Click
Do
DoEvents
Loop Until IE.ReadyState = READYSTATE_COMPLETE
End If
' Get the email field and the next button
Set oInputEmail = HTMLDoc.getElementById("username")
Set oInputPassword = HTMLDoc.getElementById("password")
' Click the button and wait
oInputEmail.Value = "xxxxxx#xxxxxx.net"
' Get the password field and the sign in button
Set oInputPassword = HTMLDoc.getElementById("password")
Set oInputSigninButton = HTMLDoc.getElementById("sign-in-button")
' Click the button and wait
oInputPassword.Value = "xxxxxxxx"
oInputSigninButton.Click
Do
DoEvents
Loop Until IE.ReadyState = READYSTATE_COMPLETE
Application.Wait (Now + TimeValue("0:00:05"))
Set AAOrder = Application.Workbooks.Open("Z:\Employee Folders\Employee\trackingnumber_sample_spreadsheet.xls")
Set AAws = AAOrder.Worksheets("PrimeOrdersWithNoFulfillmentRe")
Set InputSearchOrder = HTMLDoc.getElementById("sc-search-field")
'What I'm currently stuck on
InputSearchOrder.Value = "001-7163923-7572632"
Set InputSearchButton = HTMLDoc.getElementsByClassName("sc-search-button")(0)
InputSearchButton.Click
Do
DoEvents
Loop Until IE.ReadyState = READYSTATE_COMPLETE
'Was able to add this snippet, but I'm getting an error 13, most likely with
'my e variable. I'm basically trying to do a loop within a loop, extracting 5
'pieces of data and sticking them back into their respective columns in the
'original Excel sheet. The problem comes when scraping the HTML. I'm basically
'trying to get text in the tables which have a few levels and it's frustrating
'me to no end.
With HTMLDoc
Set elems = HTMLDoc.getElementsByTagName("td")
For Each e In elems
If e.innerText Like "*1Z*" Then
Range("D2").Value = e.innerText
End If
Next e
End With
Err_Clear:
If Err <> 0 Then
Err.Clear
Resume Next
End If
End Sub
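For the loop and order number parts, one way to structure it is sketched below, untested and with several assumptions: the order numbers are taken to be in column A from row 2 down, the result goes into column D of the same row (the question writes to D2), and the search field id and button class are the ones used in the question. Both procedure names are made up. The cause of the error 13 is not confirmed; declaring the loop variable and converting innerText to a string are defensive guesses.

```vba
' Hypothetical helper: returns the text of the first <td> containing "1Z",
' or an empty string when no cell matches.
Function FindTrackingText(ByVal doc As Object) As String
    Dim cell As Object
    For Each cell In doc.getElementsByTagName("td")
        ' Prefixing "" turns a Null innerText into an empty string before comparing.
        If ("" & cell.innerText) Like "*1Z*" Then
            FindTrackingText = "" & cell.innerText
            Exit Function
        End If
    Next cell
End Function

' Hypothetical loop: searches each order number and stores what was found.
' IE is assumed to be logged in already and showing a page with the search box.
Sub FillTrackingColumn(ByVal ws As Worksheet, ByVal IE As Object)
    Dim r As Long, lastRow As Long, doc As Object
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    For r = 2 To lastRow
        ' Re-read the document each time, because every search loads a new page.
        Set doc = IE.Document
        doc.getElementById("sc-search-field").Value = CStr(ws.Cells(r, "A").Value)
        doc.getElementsByClassName("sc-search-button").Item(0).Click
        Do While IE.Busy Or IE.ReadyState <> 4: DoEvents: Loop
        ws.Cells(r, "D").Value = FindTrackingText(IE.Document)
    Next r
End Sub
```

After the login section of the original macro, this would be called as FillTrackingColumn AAws, IE. Re-reading IE.Document inside the loop matters because a document reference captured before a navigation points at the old page.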

Resources