I have been using the function below to scroll down a page for over 2 years now, and on 31 December 2019 it just stopped working: no errors, it just stopped scrolling down.
I'm using Chrome version 79.0.3945.88 and ChromeDriver 2.36.540470. Any ideas or help would be greatly appreciated.
import time
from random import randint

def scrollToEndOfPage(self, driver):
    try:
        time.sleep(1)
        # Get scroll height
        last_height = driver.execute_script("return document.body.scrollHeight;")
        while True:
            # Scroll down to bottom
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            # Wait to load page
            time.sleep(randint(2, 4))
            # Calculate new scroll height and compare with last scroll height
            new_height = driver.execute_script("return document.body.scrollHeight;")
            if new_height == last_height:
                break
            last_height = new_height
    except Exception as e:
        print(str(e))
Update 1:
I've run document.body.scrollHeight in the browser console on the website in question (an internal site) and it displays the page height, but when I try to execute driver.execute_script("return document.body.scrollHeight;") from a script, it hangs on this request and never returns anything, with no errors.
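Before anything else, it is worth confirming that the browser and driver versions actually match, since each ChromeDriver release only supports a narrow range of Chrome versions and a mismatch can fail silently. A minimal diagnostic sketch (driver.capabilities is standard Selenium; the exact key names vary across versions, hence the fallbacks):

# Sketch: print the versions the running session reports.
caps = driver.capabilities
browser_version = caps.get("browserVersion") or caps.get("version")
chromedriver_version = caps.get("chrome", {}).get("chromedriverVersion", "unknown")
print("Browser:", browser_version)
print("ChromeDriver:", chromedriver_version)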
You can try waiting for the page to be fully loaded before scrolling.
For that, you can use the code below to wait for JavaScript to finish:
from selenium.webdriver.support.ui import WebDriverWait
# ...
WebDriverWait(browser, 30).until(lambda d: d.execute_script(
    'return (document.readyState == "complete" || document.readyState == "interactive")'))
Or use WebDriverWait to wait for visibility/clickability of specific elements, like below:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(driver, 10)
wait.until(EC.visibility_of_all_elements_located((By.XPATH, "some elements locator")))
# or
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "some clickable element locator")))
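Putting the two together, here is a minimal usage sketch. It assumes the scrollToEndOfPage method from the question lives on some scraper object (called scraper here purely for illustration) and that driver is already on the target page:

# Sketch: wait for the document to finish loading, then run the scroll loop.
WebDriverWait(driver, 30).until(lambda d: d.execute_script(
    'return document.readyState == "complete"'))
scraper.scrollToEndOfPage(driver)  # hypothetical object holding the question's method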
I am trying to click an element with Selenium ChromeDriver, using the element's ID.
I want to click the year '2020' on the following webpage: https://www.satp.org/datasheet-terrorist-attack/major-incidents/Pakistan
I tried the code below.
driver = webdriver.Chrome(executable_path=ChromeDriver_Path, options = options)
driver.get('https://www.satp.org/datasheet-terrorist-attack/major-incidents/Pakistan')
Id = "ctl00_ContentPlaceHolder1_gvMajorIncident_ct123_lbtnYear" ### Id of an Element 2020
wait = WebDriverWait(driver, 20) ##Wait for 20 seconds
element = wait.until(EC.element_to_be_clickable((By.ID, Id)))
driver.execute_script("arguments[0].scrollIntoView();", element)
element.click()
time.sleep(10)
but unfortunately this gives the error below:
element = wait.until(EC.element_to_be_clickable((By.ID, Id)))
  File "C:\Users\Pavan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\selenium\webdriver\support\wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
Could anyone please help me with this? Thanks.
I don't know whether your imports were correct, and the ID in your code appears to have a typo: it says ct123 where the element on the page uses ctl23, which would explain the timeout. Also, your code doesn't scroll to the bottom of the page. Use this:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
driver = webdriver.Chrome(executable_path='driver path')
driver.get('https://www.satp.org/datasheet-terrorist-attack/major-incidents/Pakistan')
driver.maximize_window()
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(3)
element = WebDriverWait(driver, 15).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="ctl00_ContentPlaceHolder1_gvMajorIncident_ctl23_lbtnYear"]')))
element.click()
The problem I'm facing is that I'm using
driver.find_elements_by_class_name("a_classname_common_to_all_images_in_tripadvisor_hotels")
However, on each run of the script I get fewer results than expected, and a different number each time.
For instance, sometimes it scrapes the first 5 out of 30 images on the page, sometimes 4 out of 30, and so on.
I'm scraping images from this link:
https://www.tripadvisor.in/Hotels-g304551-New_Delhi_National_Capital_Territory_of_Delhi-Hotels.html
images = driver.find_elements_by_class_name("_1a4WY7aS")
I am able to find all the hotel names using the same class_name method; with the images, however, the result varies.
Any help is appreciated, thanks :)
From How can I scroll a web page using selenium webdriver in python?
SCROLL_PAUSE_TIME = 0.5
# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)
    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
Then induce a WebDriverWait to load all of your elements:
images = WebDriverWait(driver, 30).until(EC.presence_of_all_elements_located((By.CLASS_NAME, "_1a4WY7aS")))
Imports:
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
When it scrapes only 5 of the images present, that means only 5 images were loaded. You should do two things to get every image on the page.
Scroll down to the end of the page: you can do this by selecting the body element and then sending the page-down key.
from selenium.webdriver.common.keys import Keys
import time
for _ in range(10):
    driver.find_element_by_tag_name("body").send_keys(Keys.PAGE_DOWN)
    time.sleep(0.2)
After scrolling, wait for the elements to be present:
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,
    "img._1a4WY7aS")))
I am trying to get all the product URLs from this webpage, but I have managed to get only a fraction of them.
My first attempt was to scrape the webpage with BeautifulSoup, but then I realized Selenium would be better, as I needed to click the "Show more" button several times. I also added code to scroll down the page, as I thought that was the problem, but the result didn't change.
import time
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
def getListingLinks(link):
    # Open the driver
    driver = webdriver.Chrome(executable_path="")
    driver.maximize_window()
    driver.get(link)
    time.sleep(3)

    # Scroll down: repeated to ensure it reaches the bottom and all items are loaded
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(3)
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(3)

    listing_links = []
    while True:
        try:
            driver.execute_script("return arguments[0].scrollIntoView(true);", WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, '//*[@id="main-content"]/div[2]/div[2]/div[4]/button'))))
            driver.execute_script("arguments[0].click();", WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#main-content > div:nth-child(2) > div.main-column > div.btn-wrapper.center > button"))))
            print("Button clicked")
            links = driver.find_elements_by_class_name('fop-contentWrapper')
            for link in links:
                algo = link.find_element_by_css_selector('.fop-contentWrapper a').get_attribute('href')
                print(algo)
                listing_links.append(str(algo))
        except:
            print("No more Buttons")
            break
    driver.close()
    return listing_links
fresh_food = getListingLinks("https://www.ocado.com/browse/fresh-20002")
print(len(fresh_food)) ## Output: 228
As you can see, I get 228 URLs, while I would like to get 5605 links, which is the actual number of products on the webpage according to Ocado. I believe I have a problem with the order of operations in my code, but I can't find the proper order. I would sincerely appreciate any help.
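One likely culprit is that the links are collected inside the while loop, so the final list only reflects what had loaded on the last iteration that succeeded before the bare except fired. A hedged restructuring sketch, reusing the selectors from your code: keep clicking "Show more" inside the loop, and harvest the links once, after the loop exits.

from selenium.common.exceptions import TimeoutException

# Sketch: click "Show more" until it stops appearing, then collect links once.
while True:
    try:
        button = WebDriverWait(driver, 20).until(EC.element_to_be_clickable(
            (By.CSS_SELECTOR, "#main-content > div:nth-child(2) > div.main-column > div.btn-wrapper.center > button")))
        driver.execute_script("arguments[0].click();", button)
    except TimeoutException:
        break  # button no longer present: everything should be loaded

# Collect the product links only after all items are on the page.
listing_links = [a.get_attribute('href')
                 for a in driver.find_elements_by_css_selector('.fop-contentWrapper a')]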
I am scraping Shopify review shop URLs, but while I am navigating through the search results a pop-up appears, and I have no idea how to detect it and close it.
Here's my code:
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
driver = webdriver.Chrome()
url='https://apps.shopify.com/sales-pop'
driver.get(url)
#Loop and Navigate Through the Search Results
page_number = 2
while True:
    try:
        link = driver.find_element_by_link_text(str(page_number))
    except NoSuchElementException:
        break
    if page_number > 8:
        timeout = 20
        try:
            WebDriverWait(driver, timeout).until(EC.visibility_of_element_located((By.XPATH, '//div[@title="close"]')))
        except TimeoutException:
            print("Timed out waiting for page to load")
            driver.quit()
        # Switch to the Popup
        driver.switch_to_alert()
        driver.find_element_by_xpath('//div[@title="close"]').click()
        driver.implicitly_wait(5)
        link.click()
        print(driver.current_url)
        page_number += 1
    else:
        driver.implicitly_wait(5)
        link.click()
        print(driver.current_url)
        page_number += 1
# Scraping Rating
stars = driver.find_elements_by_xpath('//figure[@class="resourcesreviews-reviews-star"]')
starstars = []
for star in stars:
    starstar = star.find_element_by_xpath('.//div/span')
    starstars.append(starstar.get_attribute('class'))

# Scraping URL
urls = driver.find_elements_by_xpath('//figcaption[@class="clearfix"]')
titles = []
for url in urls:
    title = url.find_element_by_xpath('.//strong/a')
    titles.append(title.get_attribute('href'))

# Print Titles and Rating Side by Side
for titless, starstarss in zip(titles, starstars):
    print(titless + " " + starstarss)
You can just use WebDriverWait and window_handles. Specifically, you can probably replace your # Switch to the Popup section with something like:
WebDriverWait(driver, 5).until(lambda d: len(d.window_handles) == 2)
driver.switch_to.window(driver.window_handles[1])
driver.close()
driver.switch_to.window(driver.window_handles[0])
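If the pop-up turns out to be an in-page overlay rather than a separate window (the //div[@title="close"] XPath in your code suggests an ordinary DOM element), a hedged alternative sketch is to wait briefly for the close button and click it directly, with no window switching at all:

# Sketch: dismiss an in-page overlay, reusing the close button's
# XPath from the question; skip quietly if no pop-up appears.
try:
    close_btn = WebDriverWait(driver, 5).until(
        EC.element_to_be_clickable((By.XPATH, '//div[@title="close"]')))
    close_btn.click()
except TimeoutException:
    pass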
This is the problem:
I am using Selenium to download all the successful projects from this webpage ("https://www.rockethub.com/projects"). The URL does not change when I click on any button.
I'm interested in successful projects, so I click on the Status button and then on Successful.
Once on this page, I need to scroll down repeatedly to make more URLs appear.
Here is the problem: so far I have not been able to scroll down the page.
This is my code:
from selenium.webdriver import Firefox
from selenium import webdriver
url="https://www.rockethub.com/projects"
link=[]
wd = webdriver.Firefox()
wd.get(url)
next_button = wd.find_element_by_link_text('Status')
next_button.click()
next_but = wd.find_element_by_link_text('Successful')
next_but.click()
wd.execute_script("window.scrollTo(0, document.body.scrollHeight);")
Any idea on how to solve this?
Thanks
Giangi
Since the content is updated dynamically, you need to wait for a change of the content before executing the next step:
class element_is_not(object):
    """An expectation for checking that the element returned by
    the locator is not equal to a given element.
    """
    def __init__(self, locator, element):
        self.locator = locator
        self.element = element

    def __call__(self, driver):
        new_element = driver.find_element(*self.locator)
        return new_element if self.element != new_element else None
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
driver = webdriver.Firefox()
wait = WebDriverWait(driver, 10)
driver.get("https://www.rockethub.com/projects")
# get the last box
by_last_box = (By.CSS_SELECTOR, '.project-box:last-of-type')
last_box = wait.until(element_is_not(by_last_box, None))
# click on menu Status > Successful
driver.find_element_by_link_text('Status').click()
driver.find_element_by_link_text('Successful').click()
# wait for a new box to be added
last_box = wait.until(element_is_not(by_last_box, last_box))
# scroll down the page
driver.execute_script("window.scrollTo(0, document.documentElement.scrollHeight);")
# wait for a new box to be added
last_box = wait.until(element_is_not(by_last_box, last_box))
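From there, the scroll-and-wait pair can be repeated until no new box shows up. A short sketch built on the same element_is_not condition (the TimeoutException import is the only addition):

from selenium.common.exceptions import TimeoutException

# Sketch: keep scrolling until element_is_not times out, i.e. no new
# .project-box was appended after the last scroll.
while True:
    driver.execute_script("window.scrollTo(0, document.documentElement.scrollHeight);")
    try:
        last_box = wait.until(element_is_not(by_last_box, last_box))
    except TimeoutException:
        break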
Run wd.execute_script("window.scrollTo(0, document.body.scrollHeight);") in a loop, since only a certain amount of data is retrieved each time the script is executed.
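A minimal sketch of that loop, reusing wd from your code and following the same pattern as the scroll loop shown earlier on this page (treating a stable page height as the end of the data is an assumption about this page):

import time

# Sketch: scroll, wait for the next batch to load, and stop when the
# page height no longer changes.
last_height = wd.execute_script("return document.body.scrollHeight")
while True:
    wd.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)
    new_height = wd.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height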
If you are just looking to retrieve all the successful projects at once and are not interested in simulating scrolling down the page, then look at this answer; it may help.