Can Selenium detect when webpage finishes loading in Python3? - python-3.x

In python I wrote:
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)
driver.get(url)
try:
    WebDriverWait(driver, 10).until(
        lambda driver: driver.execute_script('return document.readyState') == 'complete')
except se.TimeoutException:
    return False
# Start Parsing
Even though I wait for readyState, on some websites the checkbox is missing when I parse the page. But if I add time.sleep(5) before parsing the same website, the checkbox is there.
My question is: how can I get a general solution that works with the majority of websites? I can't just write time.sleep(5), as some websites might need much longer and some might finish within 0.001 seconds (so a fixed sleep would hurt performance).
I just want to simulate a real browser and not handle anything until the refresh button appears again (which means everything has loaded).

Ideally, when a web application is accessed through get(), control returns to the WebDriver only once document.readyState equals complete. So unless the AUT (Application under Test) behaves otherwise, the following line of code is typically redundant overhead:
WebDriverWait(driver, 10).until(lambda driver: driver.execute_script('return document.readyState') == 'complete')
However, depending on your test requirements, you can set pageLoadStrategy to one of:
none
eager
normal
You can find a detailed discussion in What is the correct syntax checking the .readyState of a website in Selenium Python
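As a rough sketch of how the strategy might be set with the Selenium 4 Python bindings: the helper name `chrome_options_with_strategy` and the constant `VALID_STRATEGIES` are hypothetical, and the import sits inside the function so the module also loads where Selenium is not installed.

```python
VALID_STRATEGIES = ("none", "eager", "normal")


def chrome_options_with_strategy(strategy="normal"):
    """Return Chrome options with the given page load strategy (hypothetical helper)."""
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"unknown page load strategy: {strategy!r}")
    # Imported lazily; assumes Selenium 4 is installed.
    from selenium.webdriver.chrome.options import Options

    options = Options()
    # "normal": get() returns after the load event (readyState == "complete")
    # "eager":  get() returns after DOMContentLoaded (readyState == "interactive")
    # "none":   get() returns once the initial HTML has been downloaded
    options.page_load_strategy = strategy
    return options
```

Passing the result as `options=` to `webdriver.Chrome(...)` should apply it. With "eager" or "none", an explicit wait for the elements you actually need becomes essential, since get() returns earlier.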
At this point, note that using time.sleep(secs) without any specific condition to wait for defeats the purpose of automation and should be avoided.
Solution
A generic approach that works across websites is to use WebDriverWait with a condition that matches your test scenario. For example:
To wait for the presence of an element, use the expected condition presence_of_element_located()
To wait for the visibility of an element, use the expected condition visibility_of_element_located()
To wait for the element to be visible, enabled and interactable so that you can click it, use the expected condition element_to_be_clickable()
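For the checkbox case in the question, one possible sketch is a custom condition that requires both a complete document and a matching element. The class name `ReadyAndPresent` and the selector in the usage note are assumptions; the condition only relies on `execute_script` and `find_elements`, which a WebDriver instance provides.

```python
class ReadyAndPresent:
    """Truthy once the document is complete AND a CSS selector matches something."""

    def __init__(self, css_selector):
        self.css_selector = css_selector

    def __call__(self, driver):
        state = driver.execute_script("return document.readyState")
        if state != "complete":
            return False
        # "css selector" is the string value behind By.CSS_SELECTOR
        matches = driver.find_elements("css selector", self.css_selector)
        # Return the first match so until() hands it back to the caller
        return matches[0] if matches else False
```

It could be used as `WebDriverWait(driver, 10).until(ReadyAndPresent("input[type='checkbox']"))`, which returns as soon as the checkbox exists instead of sleeping for a fixed time, and raises TimeoutException if it never appears.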

Related

Speedup web scraping with selenium

I am a newbie to web scraping with Selenium, and I am scraping seetickets.us.
My scraper works as follows.
sign in
search for events
click on each event
scrape data
come back
click on next event
repeat
Now the problem is that some events are missing certain elements. For example,
this event: https://wl.seetickets.us/event/Beta-Hi-Fi/484490?afflky=WorldCafeLive
does not contain a pricing table,
but this one does:
https://www.seetickets.us/event/Wake-Up-Daisy-1100AM/477633
So I have used try/except blocks:
try:
    find element
except:
    return none
But if the element is not found in the try block, it takes 5 seconds to reach the except block, because I have set
driver.implicitly_wait(5)
Now, if a page is missing several elements, Selenium takes a very long time to scrape that page.
I have thousands of pages to scrape. What should be done to speed up the process?
Thanks
To speed up web scraping with Selenium:
Remove implicitly_wait() entirely.
Use WebDriverWait to synchronise the WebDriver instance with the browser for one of the following element states:
presence_of_element_located()
visibility_of_element_located()
element_to_be_clickable()
Your effective code block will be:
try:
    element = WebDriverWait(driver, 3).until(EC.visibility_of_element_located((By.ID, "input")))
    print("Element is visible")
except TimeoutException:
    print("Element is not visible")
Note: You have to add the following imports:
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Instead of an implicit wait, use an explicit wait, but apply it only to the lookup of the main container, so that you wait for the content to load once. For all inner elements, use find_element with no waits.
P.S. It's always better to share your real code instead of pseudo-code.
Instead of using an implicit wait and waiting for each individual element, wait only for the page to load, for example by waiting for the h1 tag, which indicates that the page has loaded. Then proceed with extraction.
# requires: from selenium.common.exceptions import NoSuchElementException
# wait for page load
try:
    pageLoadCheck = WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.XPATH, "(//h1)[1]"))).get_attribute("textContent").strip()
    # extract data without any wait once the page is loaded
    try:
        dataOne = driver.find_element(By.XPATH, "((//h1/following-sibling::div)[1]//a[contains(@href,'tel:')])[1]").get_attribute("textContent").strip()
    except NoSuchElementException:
        dataOne = ''
except Exception as e:
    print(e)
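The "wait once, then extract without waits" idea from the answers above could be wrapped in a small helper. This is only a sketch: `text_or_default` is a hypothetical name, and `root` may be the driver or any already-located container element, since both offer `find_elements(by, value)`.

```python
def text_or_default(root, by, value, default=""):
    """Return the stripped textContent of the first match, or `default` if none."""
    # find_elements returns an empty list instead of raising, so a missing
    # element costs one quick lookup (provided no implicit wait is set)
    matches = root.find_elements(by, value)
    if not matches:
        return default
    text = matches[0].get_attribute("textContent")
    return text.strip() if text else default
```

After the single explicit wait for the h1, each optional field then becomes one call, for example `text_or_default(driver, By.CSS_SELECTOR, ".pricing-table")`, where the selector is a made-up placeholder for whatever the real page uses.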

How do I print the last message from a Reddit message group using Selenium

So I get the messages from this line:
<pre class="_3Gy8WZD53wWAE41lr57by3 ">Sleep</pre>
My code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
PATH = 'C:\\Users\\User\\Desktop\\chromedriver.exe'
driver = webdriver.Chrome(PATH)
driver.get('https://www.reddit.com')
time.sleep(80) # TIME TO LOGIN IN
search = driver.find_element_by_class_name('_3Gy8WZD53wWAE41lr57by3 ')
print(driver.find_element_by_xpath(".//pre").text) # *LET'S CALL THIS 'S'*
And everything works, kinda. When I print 's', it prints out the last message from that chat.
Note that whenever someone enters a message, it will be under the class '_3Gy8WZD53wWAE41lr57by3 '.
My goal is to print out the first message from that chat.
I had to edit it twice because of some mistakes that I had made
I would suggest 2 changes to your code which'll save you major frustration:
Avoid explicit sleep calls; instead, wait for the presence of elements. This lets your program wait as little time as possible for the page you're trying to load.
Use CSS selectors instead of XPath: you get much finer control over accessing elements, and your code becomes more robust and flexible.
In terms of execution, here's how that looks:
Wait up to 80 seconds for login:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
# Get the page, now the user will need to log in
driver.get('https://www.reddit.com')
# Wait until a message element is present (i.e. the user has logged in), up to 80 seconds
try:
    element = WebDriverWait(driver, 80).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "pre._3Gy8WZD53wWAE41lr57by3"))
    )
except TimeoutException:
    print("You didn't log in, shutting down program")
    driver.quit()
    raise SystemExit(1)  # stop here, the driver is no longer usable
# continue as normal here
Utilize css selectors to find your messages:
# I personally like to always use the plural form of this function
# since, if it fails, it returns an empty list. The single form of
# this function raises an error if no results are found
# NOTE: this uses reddit's class for messages, you may need to change the css selector
all_messages = driver.find_elements(By.CSS_SELECTOR, 'pre._3Gy8WZD53wWAE41lr57by3')
# You can now access the first and last elements from this
# (guarded, since indexing an empty list raises IndexError):
first_message = all_messages[0].text if all_messages else None
last_message = all_messages[-1].text if all_messages else None
# Alternatively, if you are concerned about memory usage from potentially large
# lists of messages, use the css pseudo-classes ':first-of-type' and ':last-of-type'
# NOTE: taking item [0] only when the list is non-empty yields None
# if no results are found
first_message = driver.find_elements(By.CSS_SELECTOR, 'pre._3Gy8WZD53wWAE41lr57by3:first-of-type')
first_message = first_message[0] if first_message else None
last_message = driver.find_elements(By.CSS_SELECTOR, 'pre._3Gy8WZD53wWAE41lr57by3:last-of-type')
last_message = last_message[0] if last_message else None
I hope this provides an immediate solution, and also some fundamentals on how to optimize your web scraping going forward.

Selenium: Edge webdriver not waiting for page to load before executing next step (python)

I am writing some tests using selenium with Python. So far my suite works perfectly with Chrome and Firefox. However, the same code is not working when I try with the Edge (EdgeHTML). I am using the latest version at the time of writing which is release 17134, version: 6.17134. My tests are running on Windows 10.
The problem is that Edge is not waiting for the page to load. As part of every test, a login is first performed. The credentials are entered and the form is submitted. Firefox and Chrome then wait for the page we are redirected to to load. However, with Edge, the next code is executed as soon as the login submit button is clicked, which of course results in a failed test.
Is this a bug with Edge? It seems a bit too fundamental to be the case. Does the browser need to be configured in a certain manner? I cannot see anything in the documentation.
This is the code run with the last statement resulting in a redirect as we have logged in:
self.driver.find_element_by_id("login-email").send_keys(username)
self.driver.find_element_by_id("login-password").send_keys(password)
self.driver.find_element_by_id("login-messenger").click()
Edge decides it does not need to wait and will then execute the next code which is to navigate to a protected page. The code is:
send_page = SendPage(driver)
send_page.load_page()
More concisely:
self.driver.find_element_by_id("login-messenger").click()
# should wait now for the login redirect before executing the line below but it does not!
self.driver.get(BasePage.base_url + self.uri)
I can probably work around this by waiting for an element on the next page to be present, thus making Edge wait. This does not feel like the right thing to do. I certainly don't want to have to keep making invasive changes just for Edge.
Any advice please on what I should do?
Is this a bug with Edge? It seems a bit too fundamental to be the
case. Does the browser need to be configured in a certain manner? I
cannot see anything in the documentation.
No, I don't think it is a bug in the Edge browser. Browsers differ in performance, so Edge may simply take longer to load the page.
Generally, you can use time.sleep(secs), WebDriverWait() or implicitly_wait() to wait for the page to load.
The code block below shows how to wait for a page load to complete, using a timeout. It waits for an element to appear on the page (you need an element id).
If the page loads in time, it prints "Page loaded". If the timeout period (in seconds) passes first, it prints the timeout message.
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
driver = webdriver.Firefox()
driver.get('https://pythonbasics.org')
timeout = 3
try:
    element_present = EC.presence_of_element_located((By.ID, 'main'))
    WebDriverWait(driver, timeout).until(element_present)
except TimeoutException:
    print("Timed out waiting for page to load")
else:
    # runs only when the wait succeeded; a finally block would also print on timeout
    print("Page loaded")
For more detailed information, see the following articles:
Wait until page is loaded with Selenium WebDriver for Python
How to wait for elements in Python Selenium WebDriver
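For the login redirect in the question, one option that avoids depending on a specific element is to wait for the URL itself to change. The sketch below is a hand-rolled condition (the name `UrlChangedFrom` is hypothetical) doing what Selenium's built-in `EC.url_changes` is meant to do; it assumes the login really lands on a different URL.

```python
class UrlChangedFrom:
    """Truthy once the browser's current URL differs from `old_url`."""

    def __init__(self, old_url):
        self.old_url = old_url

    def __call__(self, driver):
        # current_url is re-read from the browser on every poll
        return driver.current_url != self.old_url
```

In the test, you would record `login_url = driver.current_url` before clicking the submit button, then call `WebDriverWait(driver, 10).until(UrlChangedFrom(login_url))` before navigating to the protected page. The same code then behaves identically in Chrome, Firefox and Edge.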

Using selenium wait until to wait some query finished in python [duplicate]

I'm trying to automate a UI process with Selenium and Python 3.x. In some cases, I would like to wait for a particular query to finish before going to the next step. While the query is running there is no element I can pick up to wait on, and checking that document.readyState returns complete does not work for me. Is it possible to use Selenium's wait.until to do this? Something like:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
WebDriverWait(driver, 120).until(lambda x: "query return 0")
WebDriverWait(driver, 120).until(EC."something to wait for")
Shorter solution:
WebDriverWait(driver, 10).until(lambda d: d.execute_script("return jQuery.active == 0"))
If the query is being run with Ajax and you are using jQuery, you could try using the JavaScript executor to wait for all in-flight Ajax requests to finish after initiating the request:
driver.execute_script("return jQuery.active == 0")
You could wrap this into a custom ExpectedCondition for re-use with code like this:
class ajax_requests_to_finish(object):
    def __call__(self, driver):
        return driver.execute_script("return jQuery.active == 0")
This can be used like this:
# First perform an action that fires an ajax request
wait = WebDriverWait(driver, 10)
wait.until(ajax_requests_to_finish())
Disclaimer: I'm a Java programmer, and I don't really know any Python. I drew upon this page as an example of how to create custom wait conditions in Python, since I only know how to do this in Java. Sorry if my code is incorrect; the source page describes a more generic working example. https://selenium-python.readthedocs.io/waits.html
Edit: As sers pointed out in another answer, this potential solution is possible as a one-liner
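A slightly more defensive variant of the condition above might also tolerate pages where jQuery is not loaded at all. This is a sketch under assumptions: the class name `AjaxIdle` is hypothetical, and treating "no jQuery" as "idle" is a design choice, not something jQuery or Selenium prescribes. Requests made with fetch or a raw XMLHttpRequest are not counted by `jQuery.active` either way.

```python
class AjaxIdle:
    """Truthy when jQuery is absent or reports no in-flight Ajax requests."""

    SCRIPT = "return (typeof jQuery === 'undefined') || jQuery.active === 0;"

    def __call__(self, driver):
        # bool() guards against a None result if the script returns nothing
        return bool(driver.execute_script(self.SCRIPT))
```

Usage would mirror the class above: `WebDriverWait(driver, 10).until(AjaxIdle())` right after the action that fires the request.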

Selenium implicit and explicit waits not working / has no effect

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
driver = webdriver.Firefox()
driver.get("https://google.com")
#driver.implicitly_wait(10)
WebDriverWait(driver,10)
print("waiting 10 sec")
driver.quit()
It just quits after the page loads. The waits have no effect at all!
demo : https://www.youtube.com/watch?v=GocfsDZFqk8&feature=youtu.be
Any help would be highly appreciated.
If you want to pause for 10 seconds, use time.sleep():
import time
time.sleep(10) # pause for 10 seconds
Note: WebDriverWait(driver,10) doesn't work like that. Instead you can use it like this:
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
# this will wait at most 10 seconds for the url to contain "your_url"
WebDriverWait(driver, 10).until(EC.url_contains("your_url"))
It will not wait exactly 10 seconds, only until the expected condition is satisfied (or the timeout expires).
Also, as the source code tells us:
def implicitly_wait(self, time_to_wait):
    """
    Sets a sticky timeout to implicitly wait for an element to be found,
    or a command to complete. This method only needs to be called one
    time per session. To set the timeout for calls to
    execute_async_script, see set_script_timeout.
    :Args:
     - time_to_wait: Amount of time to wait (in seconds)
    :Usage:
        driver.implicitly_wait(30)
    """
    ...
driver.implicitly_wait(10) is also for waiting for elements, not for pausing the script.
PS: it is always good practice to use WebDriverWait instead of a hard pause, because your test will run faster: you don't have to wait the whole amount of time, only until the expected condition is satisfied. As I understand it, you are just playing around at the moment, but WebDriverWait is the better choice for the future.
At least with Python and the Chrome driver, my experience is that even when using WebDriverWait you STILL need time.sleep for things to work reliably. Using implicitly_wait doesn't work. I need to put time.sleep(1) after each operation, or sometimes things don't fire off.
We had this same problem. What we did was wrap driver methods in a decorator that sleeps for .44 seconds, applying it in get_driver() to the functions we picked. In this case we wanted a pause before finding elements by class, name and id, before Selenium entered our content. Worked like a charm.
import time
from selenium import webdriver

def sleep_decorator(func):
    def wrapper(*args, **kwargs):
        time.sleep(.44)  # Added side-effect
        return func(*args, **kwargs)  # Modified return
    return wrapper

def get_driver():
    # written for Selenium 3; the find_element_by_* helpers were removed in Selenium 4.3
    driver = webdriver.Chrome()
    driver.find_element_by_id = sleep_decorator(driver.find_element_by_id)
    driver.find_element_by_name = sleep_decorator(driver.find_element_by_name)
    driver.find_element_by_class_name = sleep_decorator(driver.find_element_by_class_name)
    driver.find_elements_by_id = sleep_decorator(driver.find_elements_by_id)
    driver.find_elements_by_name = sleep_decorator(driver.find_elements_by_name)
    driver.find_elements_by_class_name = sleep_decorator(driver.find_elements_by_class_name)
    return driver
Writing WebDriverWait(driver,10) only declares the explicit wait. It is just a declaration; you are not actually waiting for anything.
To make use of the explicit wait, you have to combine the above code with EC, i.e. an expected condition.
Something like:
wait = WebDriverWait(driver,10)
element = wait.until(EC.element_to_be_clickable((By.NAME, 'q')))
element.send_keys("Hi Google")
You can refer to this link for explicit waits: Explicit wait
Note that time.sleep(10) is the worst, most extreme kind of explicit wait: it sets the condition to an exact time period to wait. There are convenience methods that help you write code that waits only as long as required. WebDriverWait in combination with ExpectedCondition is one way this can be accomplished.
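To see why constructing a wait object does nothing on its own, and why a wait usually returns well before its timeout, it may help to look at a simplified stand-in for the polling loop. This is an illustration only, not Selenium's actual implementation: `poll_until` and `WaitTimeout` are hypothetical names, and the real WebDriverWait additionally ignores certain exceptions raised by the condition.

```python
import time


class WaitTimeout(Exception):
    """Stand-in for Selenium's TimeoutException in this sketch."""


def poll_until(condition, subject, timeout, poll_frequency=0.5):
    """Call condition(subject) repeatedly until it is truthy or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition(subject)
        if result:
            return result  # returns early, as soon as the condition holds
        if time.monotonic() > deadline:
            raise WaitTimeout(f"condition not met within {timeout} seconds")
        time.sleep(poll_frequency)
```

All the waiting happens inside the loop, which is the counterpart of until(). The timeout is only an upper bound, which is what makes a conditional wait cheaper than a fixed time.sleep().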

Resources