Unable to click HREF under headers (invisible elements) - python-3.x

I want to click all the href links under the main headers and navigate to those pages to scrape them. To speed up the job, I want to click the links without having to click the headers first. Is there a way to click these links even though they are not visible? It does not seem to work for me; I get this error:
Traceback (most recent call last):
  File "C:/Users/Bain3/PycharmProjects/untitled4/Centrebet2.py", line 58, in <module>
    EC.element_to_be_clickable((By.XPATH, '(//*[@id="accordionMenu1_ulSports"]/li/ul/li/ul/li/a)[%s]' % str(index + 1)))).click()
  File "C:\Users\Bain3\Anaconda3\lib\site-packages\selenium\webdriver\support\wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
I have replaced
wait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '(//*[@id="accordionMenu1_ulSports"]/li/ul/li/ul/li/a)[%s]' % str(index + 1)))).click()
with
driver.find_element_by_xpath('(//*[@id="accordionMenu1_ulSports"]/li/ul/li/ul/li/a)[%s]' % str(index + 1)).click()
However, this does not fix it: it still only clicks the visible elements.
My code is below:
from random import shuffle
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium import webdriver as web
from selenium.common.exceptions import NoSuchElementException
from selenium.common.exceptions import TimeoutException
from random import randint
from time import sleep
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import csv
import requests
import time
from selenium import webdriver
success = False
while not success:
    try:
        driver = webdriver.Chrome()
        driver.set_window_size(1024, 600)
        driver.maximize_window()
        driver.get('http://centrebet.com/')
        success = True
    except:
        driver.quit()
        sleep(5)
sports = driver.find_element_by_id("accordionMenu1_ulSports")
if sports.get_attribute("style") == "display: none;":
    driver.find_element_by_xpath('//ul[@id="menu_acc"]/li[3]/a').click()
driver.find_element_by_xpath(".//*[@data-type ='sports_l1'][contains(text(), 'Soccer')]").click()
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
options = driver.find_elements_by_xpath('//*[@id="accordionMenu1_ulSports"]/li/ul/li/ul/li/a')
# Get list of integers [0, 1, ... n-1]
indexes = [index for index in range(len(options))]
# Shuffle them
shuffle(indexes)
for index in indexes:
    # Click on random option (XPath positions are 1-based, hence index + 1)
    wait(driver, 10).until(
        EC.element_to_be_clickable((By.XPATH, '(//*[@id="accordionMenu1_ulSports"]/li/ul/li/ul/li/a)[%s]' % str(index + 1)))).click()
I have also tried the following to remedy this:
driver.execute_script('document.getElementByxpath("//*[@id="accordionMenu1_ulSports"]/li/ul/li/ul/li/a").style.visibility = "visible";')
However, this simply gives an error. Any ideas on how to resolve this issue with invisible elements?

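driver.execute_script('document.getElementByxpath("//*[@id="accordionMenu1_ulSports"]/li/ul/li/ul/li/a").style.visibility = "visible";')
gives you an error because that is not the correct way to use XPath in JavaScript: there is no getElementByxpath method (the DOM API for XPath is document.evaluate()), and the double quotes inside the XPath also end the JavaScript string literal early.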
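A minimal sketch of the document.evaluate() approach, called through execute_script. The helper name set_style_by_xpath and its parameters are illustrative assumptions, not part of Selenium; passing the XPath in as a script argument also sidesteps the nested-quote problem of the original one-liner.

```python
# JavaScript executed in the page. document.evaluate() is the DOM API for XPath;
# arguments[0..2] are the extra values passed to execute_script().
SET_STYLE_JS = """
var result = document.evaluate(arguments[0], document, null,
                               XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
for (var i = 0; i < result.snapshotLength; i++) {
    result.snapshotItem(i).style[arguments[1]] = arguments[2];
}
return result.snapshotLength;
"""


def set_style_by_xpath(driver, xpath, prop, value):
    """Hypothetical helper: set style[prop] = value on every node matching xpath.

    Returns the number of matched nodes reported back by the browser.
    """
    return driver.execute_script(SET_STYLE_JS, xpath, prop, value)
```

With a live driver, set_style_by_xpath(driver, '//*[@id="accordionMenu1_ulSports"]/li/ul/li/ul', "display", "block") would un-hide every nested menu list in one call, assuming they are hidden with display: none.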
To scrape the required data, you can use the code below:
import requests
import time
from selenium import webdriver
url = "http://centrebet.com/"
success = False
while not success:
    try:
        driver = webdriver.Chrome()
        driver.set_window_size(1024, 600)
        driver.maximize_window()
        driver.get(url)
        success = True
    except:
        driver.quit()
        time.sleep(5)
sports = driver.find_element_by_id("accordionMenu1_ulSports")
links = [url + link.get_attribute("onclick").replace("menulink('", "").replace("')", "") for link in sports.find_elements_by_xpath('.//a[starts-with(@onclick, "menulink")]')]
for link in links:
    print(requests.get(link).text)
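Instead of clicking each link, this requests the content of each page directly with an HTTP GET. It works for hidden links because get_attribute() reads the attribute from the DOM whether or not the element is displayed.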
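The onclick-to-URL step in the answer can be made a little more robust with a regular expression and urljoin. This is a sketch that assumes every link's onclick has the form menulink('<relative path>'), which is only inferred from the replace() calls above; onclick_to_url and collect_urls are hypothetical helper names.

```python
import re
from urllib.parse import urljoin

# Assumed onclick shape: menulink('<relative path>'), inferred from the
# replace() calls in the answer; adjust the pattern if the site differs.
MENULINK_RE = re.compile(r"menulink\('([^']*)'\)")


def onclick_to_url(onclick, base):
    """Return the absolute URL inside a menulink(...) onclick value, or None."""
    match = MENULINK_RE.search(onclick or "")
    if match is None:
        return None
    # urljoin avoids the doubled or missing "/" that plain concatenation can give.
    return urljoin(base, match.group(1))


def collect_urls(onclick_values, base):
    """Convert onclick values to URLs, skipping values that do not match."""
    urls = []
    for value in onclick_values:
        url = onclick_to_url(value, base)
        if url is not None:
            urls.append(url)
    return urls
```

With Selenium you would pass in the attribute values, e.g. collect_urls([a.get_attribute("onclick") for a in sports.find_elements_by_xpath('.//a[starts-with(@onclick, "menulink")]')], url), and then call requests.get() on each result. The path used in the checks below is made up for illustration.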

You can also try using the JavascriptExecutor (execute_script in Python).
Use the code below to set the style of each hidden list to display: block;
for ul in driver.find_elements_by_xpath("//*[@id='accordionMenu1_ulSports']/li/ul/li/ul"):
    driver.execute_script("arguments[0].style.display = 'block';", ul)
Note: make sure you are using the correct XPath. It is your <ul> element that is hidden, not the <a>, so target the <ul> tag with your XPath.
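Following the note that the <ul> rather than the <a> is hidden, another option is to start from a link and un-hide every <ul> that encloses it. A rough sketch, assuming the lists are hidden with an inline display: none style (as the question's style check suggests); reveal_link is a hypothetical helper, and the string "xpath" is the value of Selenium's By.XPATH.

```python
# Same string value as selenium.webdriver.common.by.By.XPATH.
XPATH = "xpath"


def reveal_link(driver, link):
    """Hypothetical helper: un-hide every <ul> that encloses `link`, then return it.

    Assumes the menu hides its <ul> containers with display: none rather than
    hiding the <a> itself.
    """
    # "./ancestor::ul" is evaluated relative to the link and selects its enclosing lists.
    for ul in link.find_elements(XPATH, "./ancestor::ul"):
        driver.execute_script("arguments[0].style.display = 'block';", ul)
    return link
```

For example, reveal_link(driver, link).click() for each link in the question's options list; whether the click then succeeds still depends on the element being on screen.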

Related

How to click element using selenium web driver(Python)

I am trying to click an element by its ID using Selenium ChromeDriver.
I want to click the year "2020" on the following web page: 'https://www.satp.org/datasheet-terrorist-attack/major-incidents/Pakistan'
I tried the code below:
driver = webdriver.Chrome(executable_path=ChromeDriver_Path, options = options)
driver.get('https://www.satp.org/datasheet-terrorist-attack/major-incidents/Pakistan')
Id = "ctl00_ContentPlaceHolder1_gvMajorIncident_ct123_lbtnYear" ### Id of an Element 2020
wait = WebDriverWait(driver, 20) ##Wait for 20 seconds
element = wait.until(EC.element_to_be_clickable((By.ID, Id)))
driver.execute_script("arguments[0].scrollIntoView();", element)
element.click()
time.sleep(10)
Unfortunately, this gives the following error:
    element = wait.until(EC.element_to_be_clickable((By.ID, Id)))
  File "C:\Users\Pavan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\selenium\webdriver\support\wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
Can anyone please help me with this? Thanks.
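I'm not sure your imports were correct. Also, your code doesn't scroll to the bottom of the page. Note too that the ID in your code is ct123 (with the digit 1), whereas the ID used in the XPath below is ctl23 (with a lowercase L). Use this: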
import os
from selenium.webdriver.support import expected_conditions as EC
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
driver = webdriver.Chrome(executable_path='driver path')
driver.get('https://www.satp.org/datasheet-terrorist-attack/major-incidents/Pakistan')
driver.maximize_window()
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(3)
element = WebDriverWait(driver, 15).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="ctl00_ContentPlaceHolder1_gvMajorIncident_ctl23_lbtnYear"]')))
element.click()

Python3 web automation error - ElementNotInteractableException: Message: element not interactable [duplicate]

I understand this question has been asked before, but I need a solution for this error:
Traceback (most recent call last):
  File "goeventz_automation.py", line 405, in <module>
    if login(driver) is not None:
  File "goeventz_automation.py", line 149, in login
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[@track-element='header-login']"))).click()
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/support/wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
This is the code that raises the error:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.ui import Select
from selenium.common.exceptions import TimeoutException
import urllib.request as request
import urllib.error as error
from PIL import Image
from selenium.webdriver.chrome.options import Options
import datetime as dt
import time
from common_file import *
from login_credentials import *
def login(driver):
    global _email, _password
    if waiter(driver, "//a[@track-element='header-login']") is not None:
        #login = driver.find_element_by_xpath("//a[@track-element='header-login']")
        WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[@track-element='header-login']"))).click()
        #login.click()
        if waiter(driver,"//input[@id='user_email']") is not None:
            email = driver.find_element_by_xpath("//input[@id='user_email']")
            password = driver.find_element_by_xpath("//input[@id='password']")
            email.send_keys(_email)
            password.send_keys(_password)
            driver.find_element_by_xpath("//button[@track-element='click-for-login']").click()
            return driver
        else:
            print("There was an error in selecting the email input field. It may be the page has not loaded properly.")
            return None
    else:
        print("There was an error in selecting the header-login attribute on the page.")
        return None
if __name__ == '__main__':
    chrome_options = Options()
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--disable-dev-shm-usage')
    driver = webdriver.Chrome('/usr/bin/chromium/chromedriver',chrome_options=chrome_options)
    #d.get('https://www.google.nl/')
    #driver = webdriver.Chrome()
    driver.maximize_window()
    driver.get('https://www.goeventz.com/')
    if login(driver) is not None:
        print(create_event(driver))
I think the problem is related to Keys.ENTER, but I don't know how to solve it. I have tried every solution I could find.
This error message...
selenium.common.exceptions.ElementNotInteractableException: Message: element not interactable
...implies that the desired element was not interactable when you tried to invoke click() on it.
A couple of points:
Always initialize the Chrome browser in maximized mode.
You can disable extensions.
You need to disable infobars as well.
I used the same XPath you constructed; you can use the following solution:
Code Block:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument("disable-infobars")
options.add_argument("--disable-extensions")
driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
driver.get("https://www.goeventz.com/")
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//a[@track-element='header-login']"))).click()
In your browser's developer tools, use "Copy full XPath" instead of "Copy XPath". It will work.
Instead of using login.send_keys(Keys.ENTER), you should use Selenium's click() method, which should work fine for you.
You can first check whether the element is clickable and then click it.
Like this:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[@track-element='header-login']"))).click()
Overview
It seems like you're having an XPath problem finding the "Submit" button, or your Submit button is not clickable, or your Submit button has some client-side events attached to it (JavaScript, etc.) that are required in order to submit the page.
In most cases, calling the pw.submit() method should remove the need to wait for the submit button to become clickable, and it avoids any issues with locating the button. On many other websites, some of the necessary back-end processes are primed by client-side activities that are performed after the "submit" button is actually clicked (as a side note, this is not considered best practice because it makes the site less accessible, but I digress). Above all, it's important to watch your script execute and make sure that no errors are displayed on the web page about the credentials you are submitting.
Also, some websites require a certain minimum amount of time between entering the username, entering the password, and submitting the page for it to be considered a valid submission. I've even run into websites that require you to call send_keys one character at a time for usernames and passwords, to get past the anti-scraping measures they employ. In these cases, I usually use the following between the calls:
from random import uniform
from time import sleep
def sleepyTime(first=5, second=10):
    # Sleeps a random amount of time between first and second (in seconds)
    # and returns the time slept (as a float rounded to 2 decimals).
    sleepy_time = round(uniform(first, second), 2)
    sleep(sleepy_time)
    return sleepy_time
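For sites that want keystrokes one at a time, here is a small sketch along the same lines as sleepyTime(). slow_type is a hypothetical helper and its default delays are arbitrary; the sleep parameter exists only so the pause function can be swapped out (for example, in tests).

```python
import random
import time


def slow_type(element, text, min_delay=0.05, max_delay=0.3, sleep=time.sleep):
    """Hypothetical helper: send `text` to `element` one character at a time.

    Pauses for a random time between min_delay and max_delay seconds after
    each key. The default bounds are arbitrary, not values any site is known to require.
    """
    for ch in text:
        element.send_keys(ch)
        sleep(random.uniform(min_delay, max_delay))
```

In the login flow this would replace email.send_keys(_email) with slow_type(email, _email), and likewise for the password.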
I don't see why you make the _email and _password variables global, unless they are being changed somewhere in the login function and you want that change to propagate to other scopes. Reading a module-level name inside a function does not require global.
How I would try to solve it
import sys
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import NoSuchElementException, TimeoutException
# Explicit import: "from login_credentials import *" skips names starting with "_"
# unless the module lists them in __all__.
from login_credentials import _email, _password
TIME_TIMEOUT = 20  # Twenty-second timeout default
def eprint(*args, **kwargs):
    """ Prints an error message to the user in the console (prints to sys.stderr), passes
    all provided args and kwargs along to print. Do not pass 'file' again in kwargs:
    print() would receive it twice and raise a TypeError.
    """
    print(*args, file=sys.stderr, **kwargs)
def login(driver):
    try:
        email = WebDriverWait(driver, TIME_TIMEOUT).until(EC.presence_of_element_located((By.XPATH, "//input[@id='user_email']")))
        pw = WebDriverWait(driver, TIME_TIMEOUT).until(EC.presence_of_element_located((By.XPATH, "//input[@id='password']")))
        email.send_keys(_email)
        pw.send_keys(_password)
        pw.submit()
        # if this doesn't work try the following:
        # btn_submit = WebDriverWait(driver, TIME_TIMEOUT).until(EC.element_to_be_clickable((By.XPATH, "//button[@track-element='click-for-login']")))
        # btn_submit.click()
        # if that doesn't work, try to add some random wait times using the
        # sleepyTime() example from above to add some artificial waiting to your email entry, your password entry, and the attempt to submit the form.
        return driver
    except NoSuchElementException as ex:
        eprint(ex.msg)  # msg is an attribute, not a method
    except TimeoutException as toex:
        eprint(toex.msg)
    return None
if __name__ == '__main__':
    chrome_options = Options()
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--disable-dev-shm-usage')
    driver = webdriver.Chrome('/usr/bin/chromium/chromedriver', chrome_options=chrome_options)
    #d.get('https://www.google.nl/')
    #driver = webdriver.Chrome()
    driver.maximize_window()
    driver.get('https://www.goeventz.com/')
    if login(driver) is not None:
        print(create_event(driver))
For a headless Chrome browser you need to provide the window size in the Chrome options as well; in headless mode Selenium has no way of knowing what your window size should be. Try that and let me know.
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument('--window-size=1920,1480')
I faced this error as well. Check in your browser whether the element is inside an iframe. If so, locate the iframe first with iframe = driver.find_element(By.CSS_SELECTOR, "#payment > div > div > iframe") and switch into it with driver.switch_to.frame(iframe). Then you will be able to interact with the element.
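If the element does turn out to be inside an iframe, it helps to switch back out afterwards, since later lookups otherwise stay scoped to the frame. A minimal sketch as a context manager; inside_frame is a hypothetical name, while switch_to.frame() and switch_to.default_content() are standard Selenium driver methods.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame):
    """Hypothetical helper: run a with-block inside `frame`, then return to the top document."""
    driver.switch_to.frame(frame)
    try:
        yield driver
    finally:
        # Without this, later find_element() calls would still search inside the iframe.
        driver.switch_to.default_content()
```

Usage: after iframe = driver.find_element(By.CSS_SELECTOR, "#payment > div > div > iframe"), put the clicks inside a with inside_frame(driver, iframe): block; the switch back happens even if a click raises.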

Selenium headless cannot interact with element but non-headless can [duplicate]

selenium.common.exceptions.ElementNotInteractableException: Message: element not interactable when clicking on an element using Selenium Python

I understand this question has been asked but I need some solution for this error:
Traceback (most recent call last):
File "goeventz_automation.py", line 405, in <module>
if login(driver) is not None:
File "goeventz_automation.py", line 149, in login
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[#track-element='header-login']"))).click()
File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/support/wait.py", line 80, in until
raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
This is the code where its getting error:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.ui import Select
from selenium.common.exceptions import TimeoutException
import urllib.request as request
import urllib.error as error
from PIL import Image
from selenium.webdriver.chrome.options import Options
import datetime as dt
import time
from common_file import *
from login_credentials import *
def login(driver):
global _email, _password
if waiter(driver, "//a[#track-element='header-login']") is not None:
#login = driver.find_element_by_xpath("//a[#track-element='header-login']")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[#track-element='header-login']"))).click()
#login.click()
if waiter(driver,"//input[#id='user_email']") is not None:
email = driver.find_element_by_xpath("//input[#id='user_email']")
password = driver.find_element_by_xpath("//input[#id='password']")
email.send_keys(_email)
password.send_keys(_password)
driver.find_element_by_xpath("//button[#track-element='click-for-login']").click()
return driver
else:
print("There was an error in selecting the email input field. It may be the page has not loaded properly.")
return None
else:
print("There was an error in selecting the header-login attribute on the page.")
return None
if __name__ == '__main__':
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome('/usr/bin/chromium/chromedriver',chrome_options=chrome_options)
#d.get('https://www.google.nl/')
#driver = webdriver.Chrome()
driver.maximize_window()
driver.get('https://www.goeventz.com/')
if login(driver) is not None:
print(create_event(driver))
I think there is some problem with Keys.ENTER, but I don't know how to solve this. I have tried every possible solution.............
This error message...
selenium.common.exceptions.ElementNotInteractableException: Message: element not interactable
...implies that the desired element was not interactable when you tried to invoke click() on it.
A couple of facts:
When you initialize the Chrome browser always in maximized mode.
You can disable-extensions.
You need to disable-infobars as well.
I have used the same xpath which you have constructed and you can use the following solution:
Code Block:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
options = webdriver.ChromeOptions()
options.add_argument("start-maximized");
options.add_argument("disable-infobars")
options.add_argument("--disable-extensions")
driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
driver.get("https://www.goeventz.com/")
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//a[#track-element='header-login']"))).click()
Browser Snapshot:
copy full xpath instead of copying only xpath. It will work
Instead of using login.send_keys(Keys.ENTER) you should use selenium click() method which would work fine for you.
You can check first if the element is clickable first and then you can click on it.
Like:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[#track-element='header-login']"))).click()
Overview
It seems like you're having an XPATH problem finding the "Submit" button or your Submit button is not clickable, or your Submit button has some client side events attached to it (javascript/etc) that are required in order to effectively submit the page.
Calling the pw.submit() method in most cases should get rid of the need to wait for the submit button to become clickable and avoid any issues in locating the button in most cases. On many other websites, some of the necessary back-end processes are primed by client-side activities that are performed after the "submit" button is actually clicked (although on a side-note this is not considered best-practice because it makes the site less accessible, etc, I digress). Above all, it's important to watch your script execute and make sure that you're not getting any noticeable errors displayed on the webpage about the credentials that you're submitting.
Also, however, some websites require that you add a certain minimum amount of time between the entry of the username, password, and submitting the page in order for it to be considered a valid submitting process. I've even run in to websites that require you to use send_keys 1 at a time for usernames and passwords to avoid some anti-scraping technologies they employ. In these cases, I usually use the following between the calls:
from random import random, randint
def sleepyTime(first=5, second=10):
# returns the value of the time slept (as float)
# sleeps a random amount of time between the number variable in first
# and the number variable second (in seconds)
sleepy_time = round(random() * randint(first, second), 2)
sleepy_time = sleepy_time if sleepy_time > first else (first + random())
sleep(sleepy_time)
return sleepy_time
I don't see what use you have for making the _email and _password variables global, unless they are being changed somewhere in the login function and you want that change to be precipitated out to the other scopes.
How I would try to solve it
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import NoSuchElementException, TimeoutException
TIME_TIMEOUT = 20 # Twenty-second timeout default
def eprint(*args, **kwargs):
""" Prints an error message to the user in the console (prints to sys.stderr), passes
all provided args and kwargs along to the function as usual. Be aware that the 'file' argument
to print can be overridden if supplied again in kwargs.
"""
print(*args, file=sys.stderr, **kwargs)
def login(driver):
global _email, _password
try:
email = WebDriverWait(driver, TIME_TIMEOUT).until(EC.presence_of_element_located((By.XPATH, "//input[#id='user_email']")))
pw = WebDriverWait(driver, TIME_TIMEOUT).until(EC.presence_of_element_located((By.XPATH, "//input[#id='password']"))
pw.submit()
# if this doesn't work try the following:
# btn_submit = WebDriverWait(driver, TIME_TIMEOUT).until(EC.element_to_be_clickable((By.XPATH, "//button[#track-element='click-for-login']"))
# btn_submit.click()
# if that doesn't work, try to add some random wait times using the
# sleepyTime() example from above to add some artificial waiting to your email entry, your password entry, and the attempt to submit the form.
except NoSuchElementException as ex:
eprint(ex.msg())
except TimeoutException as toex:
eprint(toex.msg)
if __name__ == '__main__':
driver = webdriver.Chrome('/usr/bin/chromium/chromedriver',chrome_options=chrome_options)
#d.get('https://www.google.nl/')
#driver = webdriver.Chrome()
driver.maximize_window()
driver.get('https://www.goeventz.com/')
if login(driver) is not None:
print(create_event(driver))
For headless chrome browser you need to provide window size as well in chrome options.For headless browser selenium unable to know what your window size.Try that and let me know.
from selenium import webdriver

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument('--window-size=1920,1480')  # width,height in pixels
driver = webdriver.Chrome(options=chrome_options)
I faced this error as well. Check in your browser whether the element is inside an iframe. If it is, locate the frame with iframe = driver.find_element(By.CSS_SELECTOR, "#payment > div > div > iframe") and switch into it with driver.switch_to.frame(iframe). Then you will be able to interact with the element.
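Once you switch into an iframe, every later lookup happens inside that frame until you switch back. A small context manager keeps that tidy; this is a sketch, and the name inside_frame is hypothetical. It relies only on driver.switch_to.frame() and driver.switch_to.default_content(), and it always returns to the top-level document, even if something inside the with-block fails.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame_element):
    """Switch `driver` into `frame_element` for the duration of the with-block,
    then always switch back to the top-level document."""
    driver.switch_to.frame(frame_element)
    try:
        yield driver
    finally:
        # Runs on normal exit and on exceptions alike
        driver.switch_to.default_content()


# Usage with a real driver (the "card_number" id is hypothetical):
# iframe = driver.find_element(By.CSS_SELECTOR, "#payment > div > div > iframe")
# with inside_frame(driver, iframe):
#     driver.find_element(By.ID, "card_number").send_keys("...")
```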

How to increase the request page time in python 3 while scraping web pages?

I have started scraping reviews from e-commerce platforms, performing sentiment analysis on them, and sharing the results on my blog, to make people's lives easier and let them understand everything about a product in just one article.
I am using Python packages such as Selenium and bs4. Here is my code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from contextlib import closing
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver import Firefox
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
import time
import requests
import re
from bs4 import BeautifulSoup


def remove_non_ascii_1(text):
    # Replace every non-ASCII character with a space
    return ''.join([i if ord(i) < 128 else ' ' for i in text])


with closing(Firefox()) as browser:
    site = "https://www.flipkart.com/honor-8-pro-midnight-black-128-gb/product-reviews/itmeymafrghbjcpf?page=1&pid=MOBEWXHMVYBBMZGJ"
    browser.get(site)
    file = open("review.txt", "w")
    for count in range(1, 100):
        # Find the pagination button whose label equals the page number to open
        nav_btns = browser.find_elements_by_class_name('_33m_Yg')
        button = ""
        for btn in nav_btns:
            number = int(btn.text)
            if number == count:
                button = btn
                break
        button.send_keys(Keys.RETURN)
        # Wait until at least one review title is present on the new page
        WebDriverWait(browser, timeout=10).until(EC.presence_of_all_elements_located((By.CLASS_NAME, "_2xg6Ul")))
        # Expand truncated reviews by clicking each "read more" link
        read_more_btns = browser.find_elements_by_class_name('_1EPkIx')
        for rm in read_more_btns:
            browser.execute_script("return arguments[0].scrollIntoView();", rm)
            browser.execute_script("window.scrollBy(0, -150);")
            rm.click()
        page_source = browser.page_source
        soup = BeautifulSoup(page_source, "lxml")
        ans = soup.find_all("div", class_="_3DCdKt")
        for tag in ans:
            title = str(tag.find("p", class_="_2xg6Ul").string).replace(u"\u2018", "'").replace(u"\u2019", "'")
            title = remove_non_ascii_1(title)
            content = tag.find("div", class_="qwjRop").div.prettify().replace(u"\u2018", "'").replace(u"\u2019", "'")
            content = remove_non_ascii_1(content)
            content = content[15:-7]
            votes = tag.find_all("span", class_="_1_BQL8")
            upvotes = int(votes[0].string)
            downvotes = int(votes[1].string)
            file.write("Review Title : %s\n\n" % title)
            file.write("Upvotes : " + str(upvotes) + "\n\nDownvotes : " + str(downvotes) + "\n\n")
            file.write("Review Content :\n%s\n\n\n\n" % content)
    file.close()
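The cleanup applied to both the title and the content in the loop above (curly single quotes turned into straight apostrophes, then every remaining non-ASCII character replaced by a space, the same rule as remove_non_ascii_1) could be factored into one helper. This is a sketch; the name clean_text is hypothetical.

```python
def clean_text(text):
    """Hypothetical helper: straighten curly single quotes, then replace
    every remaining non-ASCII character with a space."""
    text = text.replace("\u2018", "'").replace("\u2019", "'")
    return "".join(ch if ord(ch) < 128 else " " for ch in text)
```

With it, the two cleanup lines for each field collapse into a single call such as title = clean_text(str(tag.find("p", class_="_2xg6Ul").string)).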
The code works fine on platforms like Amazon, but on Flipkart, after crawling 14 pages, I get an error saying "Something is Wrong!!!" and the crawling stops.
On the command line I get this error:
C:\Users\prate\Desktop\Crawler\Git_Crawler\New>python scrape.py
Traceback (most recent call last):
  File "scrape.py", line 37, in <module>
    WebDriverWait(browser, timeout=10).until(EC.presence_of_all_elements_located((By.CLASS_NAME, "_2xg6Ul")))
  File "C:\Users\prate\AppData\Local\Programs\Python\Python36\lib\site-packages\selenium\webdriver\support\wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
No message is printed either. I think that if I increase the time interval between requests, the platform might let me keep crawling.
What should I do?
The error says it all: the WebDriverWait(browser, timeout=10).until(EC.presence_of_all_elements_located((By.CLASS_NAME, "_2xg6Ul"))) call on line 37 of scrape.py raises selenium.common.exceptions.TimeoutException with an empty message.
If you look at the API docs of the expected condition presence_of_all_elements_located(locator), it is defined as:
An expectation for checking that there is at least one element present on a web page. locator is used to find the element returns the list of WebElements once they are located
Now, if you browse to the intended webpage:
https://www.flipkart.com/honor-8-pro-midnight-black-128-gb/product-reviews/itmeymafrghbjcpf?page=1&pid=MOBEWXHMVYBBMZGJ
You will find that the webpage has no products or reviews, so the locator strategy you have adopted, (By.CLASS_NAME, "_2xg6Ul"), doesn't identify any element on the webpage.
Hence no WebElements are ever added to the list, the synchronization time (10 seconds) elapses, and selenium.common.exceptions.TimeoutException is raised.
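To see why an empty result produces a timeout rather than an immediate error, here is a simplified, Selenium-free sketch of the polling that WebDriverWait.until performs. It is an approximation: the real method passes the driver to the condition and also handles ignored exceptions, whereas this sketch uses a zero-argument callable, and WaitTimeout and wait_until are hypothetical names. The 0.5-second poll interval matches Selenium's default.

```python
import time


class WaitTimeout(Exception):
    """Stand-in for selenium.common.exceptions.TimeoutException."""


def wait_until(condition, timeout=10, poll_frequency=0.5):
    """Call `condition` repeatedly until it returns a truthy value,
    or raise WaitTimeout once `timeout` seconds have passed."""
    end_time = time.monotonic() + timeout
    while True:
        value = condition()
        # presence_of_all_elements_located returns find_elements(...),
        # and an empty list is falsy, so the wait simply keeps polling
        if value:
            return value
        time.sleep(poll_frequency)
        if time.monotonic() > end_time:
            raise WaitTimeout(f"condition not met within {timeout} s")
```

A locator that never matches anything therefore keeps returning [] until the deadline, which is exactly the TimeoutException in the traceback.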
You mentioned that the code works fine on platforms like Amazon, so it is worth mentioning that https://www.flipkart.com is ReactJS-based, and the page structure and class names such as "_2xg6Ul" may differ from website to website.
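If you still want to test the question's idea of spacing out page loads, a randomized pause between pages is the usual approach. Whether Flipkart actually rate-limits is not established here, since the answer above points to the missing elements instead. This is a sketch: polite_pause is a hypothetical name, the 2 to 5 second default range is an arbitrary assumption, and sleep is injectable only so the function can be tested without waiting.

```python
import random
import time


def polite_pause(min_seconds=2.0, max_seconds=5.0, sleep=time.sleep):
    """Sleep for a random interval between min_seconds and max_seconds
    and return the chosen delay."""
    if min_seconds < 0 or max_seconds < min_seconds:
        raise ValueError("need 0 <= min_seconds <= max_seconds")
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay
```

You would call polite_pause() at the top of each iteration of the for count in range(1, 100): loop, before clicking the next page button. The random jitter avoids a perfectly regular request rhythm, but it will not help if the locator itself matches nothing.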

Resources