Python locating elements in HTML using Selenium

I'm new to the Python Selenium package and am developing a crawler for a bookie site.
I'm unable to click and open an image link.
My code:
from selenium import webdriver

PATH = r"C:\Program Files (x86)\chromedriver.exe"  # raw string so the backslashes are kept literally
driver = webdriver.Chrome(PATH)
web = 'https://odibets.com/'
driver.get(web)
driver.implicitly_wait(3)
# btn = driver.find_element_by_css_selector('span.icon')
btn = driver.find_element_by_xpath("//a[@href='/League'] and //span[text()='League']")
# <img src="https://s3-eu-west-1.amazonaws.com/odibets/img/menu/odi-league-2.png">
# //img[@src ="https://s3-eu-west-1.amazonaws.com/odibets/img/menu/odi-league-2.png"]
# //span[text()='League']
btn.click()
I get the following exception:
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: Unable to locate an element with the xpath expression //a[@href='/League'] and //span[text()='League'] because of the following error:
TypeError: Failed to execute 'evaluate' on 'Document': The result is not a node set, and therefore cannot be converted to the desired type.
(Session info: chrome=96.0.4664.45)
Stacktrace:
Backtrace:
Ordinal0 [0x010D6903+2517251]
Attached are a snapshot of the markup from Chrome developer tools and of the page itself.

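Your href was /league, not /League. Also, joining two paths with "and" makes the XPath expression evaluate to a boolean rather than a node set, which is what the error message reports. Put the span test in a predicate on the link instead:
driver.find_element_by_xpath("//a[@href='/league'] [.//span[contains(.,'League')]]").click()
This also works; for some reason the element was not being clicked correctly by the plain click():
elem = driver.find_element_by_xpath("(//a[@href='/league'] [.//span[contains(.,'League')]])[1]")
driver.execute_script("arguments[0].click()", elem)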
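The two points in the answer above (case-sensitive attribute values, and a predicate instead of a second path) can be checked without a browser. The following is only a sketch: it uses the standard library's ElementTree, whose XPath subset has no contains(), so an exact child-text predicate stands in for it, and the markup is a hypothetical stand-in for the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, loosely modelled on the menu link in the question.
SAMPLE = """
<nav>
  <a href="/league"><span>League</span></a>
  <a href="/jackpot"><span>Jackpot</span></a>
</nav>
"""

root = ET.fromstring(SAMPLE)

# Attribute values are compared case-sensitively.
wrong_case = root.findall(".//a[@href='/League']")
right_case = root.findall(".//a[@href='/league']")

# Chained predicates narrow a single node set: an <a> with this href
# that also has a child <span> whose text is exactly 'League'.
combined = root.findall(".//a[@href='/league'][span='League']")

print(len(wrong_case), len(right_case), len(combined))
```

In Selenium itself the full XPath from the answer, including contains(), is evaluated by the browser, so the limitation above applies only to this offline check.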

Related

Selenium Firefox stuck after downloading a file

I am using Python Selenium (latest version) with geckodriver 0.31.0 and Firefox 103 to log in to a website and download a file, but after the file is downloaded the browser hangs and browser.quit() is never invoked.
This is the relevant code:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.firefox.service import Service

s = Service(DRIVER_PATH)
firefox_options = Options()
firefox_options.set_preference("browser.download.folderList", 2) # to not use the default directory for downloading the file
firefox_options.set_preference("browser.download.manager.showWhenStarting", False) # turns off the showing of download progress
firefox_options.set_preference("browser.download.dir", "/home/<myuser>/")
firefox_options.set_preference("browser.download.directory_upgrade", True)
firefox_options.set_preference("browser.download.prompt_for_download", False)
firefox_options.set_preference("browser.download.manager.alertOnEXEOpen", False)
firefox_options.set_preference("browser.download.manager.focusWhenStarting", False)
firefox_options.set_preference("browser.helperApps.alwaysAsk.force", False)
firefox_options.set_preference("browser.download.manager.closeWhenDone", True)
firefox_options.set_preference("browser.download.manager.showAlertOnComplete", False)
firefox_options.set_preference("browser.download.manager.useWindow", False)
firefox_options.set_preference("services.sync.prefs.sync.browser.download.manager.showWhenStarting", False)
firefox_options.set_preference("pdfjs.disabled", True)
firefox_options.add_argument("--disable-infobars")
firefox_options.add_argument("--disable-extensions")
firefox_options.set_preference("network.proxy.socks_remote_dns", True)
browser = webdriver.Firefox(service=s, options=firefox_options)
browser.get(URL)
browser.find_element(By.NAME, "login").send_keys(USER)
browser.find_element(By.NAME, "password").send_keys(PASSWORD)
browser.find_element(By.CLASS_NAME, CLASS).click()
# download file
browser.get(URL + "feed.rss")
time.sleep(3)
browser.quit()
I know I can download the file with Python requests by passing the Selenium cookies, but I need to download the file with Selenium.
Firefox 102.3 shows the download window with this profile (at least in Java). I needed to add the following config line:
"browser.download.alwaysOpenPanel" = false
I am not sure whether this helps with Python or whether this is the problem. In selenium-java, I lose focus with certain code if there is a popup window ("Show all downloads").
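A rough Python equivalent of that config line might look like the sketch below. The preference names come from the question and the answer; whether alwaysOpenPanel actually resolves the hang under Python is an assumption. DOWNLOAD_PREFS, apply_prefs and RecordingOptions are hypothetical names, and RecordingOptions is only a stand-in so the snippet runs without a browser.

```python
# Preference names are taken from the question and the answer above;
# that alwaysOpenPanel fixes the hang in Python is an untested assumption.
DOWNLOAD_PREFS = {
    "browser.download.folderList": 2,
    "browser.download.dir": "/home/<myuser>/",
    "browser.download.manager.showWhenStarting": False,
    "browser.download.alwaysOpenPanel": False,
    "pdfjs.disabled": True,
}


def apply_prefs(options, prefs):
    """Call options.set_preference(name, value) for every entry in prefs."""
    for name, value in prefs.items():
        options.set_preference(name, value)
    return options


class RecordingOptions:
    """Stand-in for Selenium's Firefox Options, used only for this demo."""

    def __init__(self):
        self.preferences = {}

    def set_preference(self, name, value):
        self.preferences[name] = value


demo = apply_prefs(RecordingOptions(), DOWNLOAD_PREFS)
print(demo.preferences["browser.download.alwaysOpenPanel"])
```

With Selenium, the firefox_options object from the question would be passed in place of RecordingOptions(). Keeping the preferences in one dict also avoids setting the same key twice.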

Apache Airflow did not recognize error in xml and report success

Hello community, I have a question about a Python function. In this function I convert a large XML file to JSON. While doing so I want to check whether there is an error tag within the XML at the first or last position and, if so, raise an error.
Unfortunately, the function sometimes does not seem to recognize this tag, and Apache Airflow then reports success.
I therefore added a second check up front, which inspects the XML with BeautifulSoup before parsing.
But now the task is always reported as failed.
Can you explain why the "old" check reports success while the BeautifulSoup check aborts?
Should I combine the two, or is there a more elegant solution in general?
def parse_large_xml(xml_file_name, json_file_name, extra_fields=None, clean_keys=True):
    """
    Converts SAP xml response to json.
    - processes the xml file iteratively (one tag at a time)
    - capable to deal with large files
    Args:
        xml_file_name: input filename
        json_file_name: output filename
        extra_fields: extra fields to add to the json
        clean_keys: flag, if set remove prefixes from the response keys
    """
    ######### Extra check #####
    # This is the new extra check
    with open(xml_file_name, 'r') as xMl_File:
        data = xMl_File.read()
    if "error" in set(tag.name for tag in BeautifulSoup(data, 'xml').find_all()):
        # "tag" only exists inside the generator expression, so it cannot be logged here
        errString = f"error in response file \"{xml_file_name}\" (XML contains error tag)"
        logging.error(msg=errString, exc_info=True, stack_info=True)
        raise RuntimeError(errString)
    ########################################################################################
    # This is the old Check
    if extra_fields is None:
        extra_fields = {}
    with open(json_file_name, 'w') as json_file:
        for event, elem in progressbar.progressbar(ET.iterparse(xml_file_name, events=('start', 'end'))):
            if 'content' in elem.tag and event == 'end':
                elem_dict = xmltodict.parse(tostring(elem))
                if clean_keys:
                    elem_dict = clean_response_keys(elem_dict)
                response_dict = {'raw_json_response': json.dumps(elem_dict)}
                response_dict = {**extra_fields, **response_dict}
                response_dict['hash'] = hashlib.sha256(
                    response_dict['raw_json_response'].encode()).hexdigest()
                response_dict['date'] = get_scheduled_date()
                json.dump(response_dict, json_file)
                json_file.write('\n')
                elem.clear()
            elif 'error' in elem.tag and event == 'end':
                errString = f"error in response file \"{xml_file_name}\":\n{tostring(elem)}"
                logging.error(msg=errString, exc_info=True, stack_info=True)
                raise RuntimeError(errString)
I'm using Apache Airflow 1.10.15 and Composer 1.16.10, as well as Python 3.
Here is an example error XML as it is returned but not recognized:
<error xmlns="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
<code>
DBSQL_CONNECTION_NO_METADATA
</code>
<message>
Runtime Error: 'DBSQL_CONNECTION_NO_METADATA'. The OData request processing has been abnormal terminated. If "Runtime Error" is not initial, launch transaction ST22 for details and analysis. Otherwise, launch transaction SM21 for system log analysis.
</message>
<timestamp>
20220210031242
</timestamp>
</error>
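One possible way to make the check stricter is to compare the local tag name exactly instead of testing for a substring, since ElementTree reports namespaced tags as "{namespace}name". The sketch below uses only the standard library and a shortened copy of the sample response. local_name and find_error are hypothetical helper names, and this is not a diagnosis of why Airflow reported success.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit('}', 1)[-1]


def find_error(xml_file_name):
    """Return the <code> text of the first <error> element ('' if it has
    no <code>), or None when the file contains no <error> element."""
    with open(xml_file_name, 'rb') as source:
        for _event, elem in ET.iterparse(source, events=('end',)):
            if local_name(elem.tag) == 'error':
                for child in elem:
                    if local_name(child.tag) == 'code':
                        return (child.text or '').strip()
                return ''
    return None


# Shortened copy of the error response shown in the question.
SAMPLE = """<error xmlns="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
<code>
DBSQL_CONNECTION_NO_METADATA
</code>
<message>Runtime Error</message>
</error>"""

with tempfile.NamedTemporaryFile('w', suffix='.xml', delete=False) as handle:
    handle.write(SAMPLE)
    sample_path = handle.name

result = find_error(sample_path)
os.remove(sample_path)
print(result)
```

This sketch does not clear processed elements, so for very large files it would need to be merged into the existing loop, which already calls elem.clear(). Returning a value rather than raising also lets the Airflow task decide explicitly when to fail.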

Google cloud vision api:: AttributeError: 'WebDetection' object has no attribute 'best_guess_labels'

I am trying to call the "detect web" function from the Google Cloud Vision API using Python. However, I am not able to access one of its attributes, "best_guess_labels". When I try to, it raises the following error:
"AttributeError: 'WebDetection' object has no attribute 'best_guess_labels'"
WebDetection is a JSON file that was created by following this link and stored in a local folder: https://cloud.google.com/docs/authentication/getting-started
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="WebDetection.json"
The "detect web" function is taken from this link: https://cloud.google.com/vision/docs/detecting-web
Here is the function copied from the above link for ready reference.
def detect_web(path):
    """Detects web annotations given an image."""
    client = vision.ImageAnnotatorClient()
    with io.open(path, 'rb') as image_file:
        content = image_file.read()
    image = vision.types.Image(content=content)
    response = client.web_detection(image=image)
    annotations = response.web_detection
    if annotations.best_guess_labels:
        for label in annotations.best_guess_labels:
            print('\nBest guess label: {}'.format(label.label))
    if annotations.pages_with_matching_images:
        print('\n{} Pages with matching images found:'.format(
            len(annotations.pages_with_matching_images)))
        for page in annotations.pages_with_matching_images:
            print('\n\tPage url : {}'.format(page.url))
            if page.full_matching_images:
                print('\t{} Full Matches found: '.format(
                    len(page.full_matching_images)))
                for image in page.full_matching_images:
                    print('\t\tImage url : {}'.format(image.url))
            if page.partial_matching_images:
                print('\t{} Partial Matches found: '.format(
                    len(page.partial_matching_images)))
                for image in page.partial_matching_images:
                    print('\t\tImage url : {}'.format(image.url))
    if annotations.web_entities:
        print('\n{} Web entities found: '.format(
            len(annotations.web_entities)))
        for entity in annotations.web_entities:
            print('\n\tScore : {}'.format(entity.score))
            print(u'\tDescription: {}'.format(entity.description))
    if annotations.visually_similar_images:
        print('\n{} visually similar images found:\n'.format(
            len(annotations.visually_similar_images)))
        for image in annotations.visually_similar_images:
            print('\tImage url : {}'.format(image.url))
However, when I execute the above function using this code
detect_web("download.jpg")
I am getting the below error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-71-6d38dd9b3a76> in <module>()
----> 1 detect_web("download.jpg")
<ipython-input-70-c127dc709a32> in detect_web(path)
13 annotations = response.web_detection
14
---> 15 if annotations.best_guess_labels:
16 for label in annotations.best_guess_labels:
17 print('\nBest guess label: {}'.format(label.label))
AttributeError: 'WebDetection' object has no attribute 'best_guess_labels'
I tried debugging and found that "best_guess_labels" is not part of the JSON file. I am not sure whether the JSON file got corrupted; I redid the exercise but I still get the same error.
What might have caused the issue?
I've been using google-cloud-vision==0.34.0 with the following code and I get no errors; I also get "Best guess label: document" in the response:
import argparse
import io
import re
from google.cloud import vision
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="file.json"

def detect_web(path):
    """Detects web annotations given an image."""
    client = vision.ImageAnnotatorClient()
    with io.open(path, 'rb') as image_file:
        content = image_file.read()
    image = vision.types.Image(content=content)
    response = client.web_detection(image=image)
    annotations = response.web_detection
    if annotations.best_guess_labels:
        for label in annotations.best_guess_labels:
            print('\nBest guess label: {}'.format(label.label))
    if annotations.pages_with_matching_images:
        print('\n{} Pages with matching images found:'.format(
            len(annotations.pages_with_matching_images)))
        for page in annotations.pages_with_matching_images:
            print('\n\tPage url : {}'.format(page.url))
            if page.full_matching_images:
                print('\t{} Full Matches found: '.format(
                    len(page.full_matching_images)))
                for image in page.full_matching_images:
                    print('\t\tImage url : {}'.format(image.url))
            if page.partial_matching_images:
                print('\t{} Partial Matches found: '.format(
                    len(page.partial_matching_images)))
                for image in page.partial_matching_images:
                    print('\t\tImage url : {}'.format(image.url))

if __name__ == '__main__':
    detect_web("file.jpg")
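Since the answer reports that the code runs under google-cloud-vision 0.34.0, a first diagnostic step could be to check which version is actually installed in the failing environment. This is only a sketch using the standard library (Python 3.8+), and installed_version is a hypothetical helper name. It does not establish which versions lack best_guess_labels.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string, or None when the package is not installed here.
print(installed_version("google-cloud-vision"))
```

If the printed version differs from 0.34.0, installing that version in a fresh environment would show whether the error is version-related.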

Opening tabs in browser with a variable link

I'm using Python and Selenium and trying to open a new tab. The send_keys function does not open tabs, but execute_script does. My issue is that I have a URL saved in a variable and need to pass it into the script, but I get an error.
Code:
src = 'http://yahoo.com'
driver.execute_script("window.open(" + src + ",'_blank');")
Error Message:
selenium.common.exceptions.WebDriverException: Message: unknown error: Runtime.evaluate threw exception: SyntaxError: missing ) after argument list
Also tried, does not work:
driver.find_element_by_tag_name('body').send_keys(Keys.COMMAND + 't')
driver.find_element_by_tag_name('body').send_keys(Keys.CONTROL + 't')
Does work, but url is hardcoded:
driver.execute_script("window.open('http://www.google.com/','_blank');")
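You can use format to insert the variable. The URL has to end up inside quotes in the JavaScript; the string concatenation in the question leaves them out, which is what causes the SyntaxError.
An example: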
driver = webdriver.Chrome(executable_path="/tmp/chromedriver")
link = 'http://example.com'
driver.execute_script('window.open("{}","_blank");'.format(link))
driver.switch_to.window(driver.window_handles[-1])
This worked (open a blank tab, switch to it, then load the URL):
driver.execute_script('''window.open('',"_blank");''')
driver.switch_to.window(driver.window_handles[-1])
driver.get(src)
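To see why the concatenated version fails, it helps to print the script strings before handing them to the driver. The sketch below needs no browser. build_open_script is a hypothetical helper, and using json.dumps to quote the URL is a choice made here, not something from the answers above.

```python
import json


def build_open_script(url):
    """Build a window.open() call with the URL encoded as a quoted string literal."""
    return "window.open({}, '_blank');".format(json.dumps(url))


src = 'http://yahoo.com'

# The question's concatenation: the URL lands in the script without quotes.
broken = "window.open(" + src + ",'_blank');"

# json.dumps adds the quotes and escapes any characters that need it.
fixed = build_open_script(src)

print(broken)
print(fixed)
```

The result can be passed to driver.execute_script(fixed). Alternatively, the URL can be sent as a script argument, as the first thread on this page does with an element: driver.execute_script("window.open(arguments[0], '_blank');", src).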

Clicking a button with Selenium in Python

Goal: use Selenium and Python to search for a company name in LinkedIn's search bar, THEN click on the "Companies" button in the navigation to arrive at information about companies that match the keyword (rather than individuals at that company). See below for an example. "CalSTRS" is the company I search for in the search bar. Then I want to click on the "Companies" navigation button.
My Helper Functions: I have defined the following helper functions (including here for reproducibility).
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
from random import randint
from selenium.webdriver.common.action_chains import ActionChains

browser = webdriver.Chrome()

def li_bot_login(usrnm, pwrd):
    ##-----Log into linkedin and get to your feed-----
    browser.get('https://www.linkedin.com')
    ##-----Find the username field-----
    u = browser.find_element_by_name('session_key')
    ##-----Enter Username and Password, Enter-----
    u.send_keys(usrnm)
    p = browser.find_element_by_name('session_password')
    p.send_keys(pwrd + Keys.ENTER)

def li_bot_search(search_term):
    #------Search for term in linkedin search box and land you at the search results page------
    search_box = browser.find_element_by_css_selector('artdeco-typeahead-input.ember-view > input')
    search_box.send_keys(str(search_term) + Keys.ENTER)

def li_bot_close():
    ##-----Close the Webdriver-----
    browser.close()

li_bot_login('<username>', '<password>')  # placeholders; the original post called this without credentials
li_bot_search('calstrs')
time.sleep(5)
li_bot_close()
Here is the HTML of the "Companies" button element:
<button data-vertical="COMPANIES" data-ember-action="" data-ember-action-7255="7255">
Companies
</button>
And the XPath:
//*[@id="ember1202"]/div[5]/div[3]/div[1]/ul/li[5]/button
What I have tried: Admittedly, I am not very experienced with HTML and CSS so I am probably missing something obvious. Clearly, I am not selecting / interacting with the right element. So far, I have tried...
companies_btn = browser.find_element_by_link_text('Companies')
companies_btn.click()
which returns this traceback:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"link text","selector":"Companies"}
(Session info: chrome=63.0.3239.132)
(Driver info: chromedriver=2.35.528161 (5b82f2d2aae0ca24b877009200ced9065a772e73),platform=Windows NT 10.0.16299 x86_64)
and
companies_btn_xpath = '//*[@id="ember1202"]/div[5]/div[3]/div[1]/ul/li[5]/button'
browser.find_element_by_xpath(companies_btn_xpath).click()
with this traceback...
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="ember1202"]/div[5]/div[3]/div[1]/ul/li[5]/button"}
(Session info: chrome=63.0.3239.132)
(Driver info: chromedriver=2.35.528161 (5b82f2d2aae0ca24b877009200ced9065a772e73),platform=Windows NT 10.0.16299 x86_64)
and
browser.find_element_by_css_selector('#ember1202 > div.application-outlet > div.authentication-outlet > div.neptune-grid.two-column > ul > li:nth-child(5) > button').click()
which returns this traceback...
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"#ember1202 > div.application-outlet > div.authentication-outlet > div.neptune-grid.two-column > ul > li:nth-child(5) > button"}
(Session info: chrome=63.0.3239.132)
(Driver info: chromedriver=2.35.528161 (5b82f2d2aae0ca24b877009200ced9065a772e73),platform=Windows NT 10.0.16299 x86_64)
It seems that you simply used incorrect selectors.
Note that:
The @id of a div such as "ember1002" is a dynamic value, so it will be different each time you visit the page: "ember1920", "ember1202", etc.
find_element_by_link_text() can be applied to links only, e.g. <a>Companies</a>, but not to buttons.
Try to find the button by its text content:
browser.find_element_by_xpath('//button[normalize-space()="Companies"]').click()
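The normalize-space() call matters because the button's text in the question's HTML is surrounded by line breaks and indentation. The sketch below imitates that function in plain Python on markup based on the question's button; the "People" sibling is hypothetical. It illustrates the matching logic only, since in Selenium the browser evaluates the XPath itself.

```python
import xml.etree.ElementTree as ET

# Markup based on the button in the question, plus a hypothetical sibling.
SAMPLE = """
<ul>
  <li><button data-vertical="PEOPLE">
    People
  </button></li>
  <li><button data-vertical="COMPANIES">
    Companies
  </button></li>
</ul>
"""


def normalize_space(text):
    """Mimic XPath normalize-space(): trim and collapse runs of whitespace."""
    return " ".join((text or "").split())


root = ET.fromstring(SAMPLE)

# An exact comparison fails because of the surrounding whitespace.
exact = [b for b in root.iter("button") if b.text == "Companies"]

# Normalizing the element's full text content first makes the match succeed.
normalized = [
    b for b in root.iter("button")
    if normalize_space("".join(b.itertext())) == "Companies"
]

print(len(exact), len(normalized))
```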
With capybara-py (which can be used to drive Selenium), this is as easy as:
page.click_button("Companies")
Bonus: This will be resilient against changes in the implementation of the button, e.g., using <input type="submit" />, etc. It will also be resilient in the face of a delay before the button appears, as click_button() will wait for it to be visible and enabled.
