Attribute raises NoSuchElementException in selenium Python - python-3.x

I can get the element which attribute contains X but I can't get the attribute itself. Why?
This code does not raise any errors:
links = browser.find_elements_by_xpath('//div[#aria-label="Conversations"]//a[contains(#data-href, "https://www.messenger.com/t/")]')
This code raises NoSuchElementException error:
links = browser.find_elements_by_xpath('//div[#aria-label="Conversations"]//a[contains(#data-href, "https://www.messenger.com/t/")]/#data-href')
This code is extracted from Messenger main page with the chats. I would like to get all links in the ul list...
I don't get it... Any help please?

To get all the links data-href value you need to traverse links elements and use the get_attribute()
links = browser.find_elements_by_xpath('//div[#aria-label="Conversations"]//a[contains(#data-href, "https://www.messenger.com/t/")]')
ul_list=[link.get_attribute("data-href") for link in links]
print(ul_list)

wait = WebDriverWait(driver, 20)
lists=wait.until(EC.presence_of_all_elements_located((By.XPATH, '//div[#aria-label="Conversations"]//a[contains(#data-href, "https://www.messenger.com/t/")]/#data-href')))
for element in lists:
print element.text
Note: Add below imports to your solution
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

Related

Scrape Product Image with BeautifulSoup (Error)

I need your help. I'm working on a telegram bot which sends me all the sales from amazon.
It works well but this function doesn't work properly. I have always the same error that, however, blocks the script
imgs_str = img_div.img.get('data-a-dynamic-image') # a string in Json format
AttributeError: 'NoneType' object has no attribute 'img'
def take_image(soup):
img_div = soup.find(id="imgTagWrapperId")
imgs_str = img_div.img.get('data-a-dynamic-image') # a string in Json format
# convert to a dictionary
imgs_dict = json.loads(imgs_str)
#each key in the dictionary is a link of an image, and the value shows the size (print all the dictionay to inspect)
num_element = 0
first_link = list(imgs_dict.keys())[num_element]
return first_link
I still don't understand how to solve this issue.
Thanks for All!
From the looks of the error, soup.find didn't work.
Have you tried using images = soup.findAll("img",{"id":"imgTagWrapperId"})
This will return a list
Images are not inserted in HTML Page they are linked to it so you need wait until uploaded. Here i will give you two options;
1-) (not recommend cause there may be a margin of error) simply; you can wait until the image is loaded(for this you can use "time.sleep()"
2-)(recommend) I would rather use Selenium Web Driver. You also have to wait when you use selenium, but the good thing is that selenium has a unique function for this job.
I will show how make it with selenium;
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
browser = webdriver.Chrome()
browser.get("url")
delay = 3 # seconds
try:
myElem = WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.ID, 'imgTagWrapperId')))# I used what do you want find
print ("Page is ready!")
except TimeoutException:
print ("Loading took too much time!")
More Documention
Code example for way 1
Q/A for way 2

Can't locate elements from a website using selenium

Trying to scrape data from a business directory but I keep getting the data was not found
name =
driver.find_elements_by_xpath('/html/body/div[3]/div/div/div[1]/div/div[1]/div/div[1]/h4')[0].text
# Results in: IndexError: list index out of range
So I tried to use WebDriverWait to make the code wait for the data to load but it doesn't find the elements, even though the data get loaded to the website.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
from bs4 import BeautifulSoup
import requests
import time
url='https://www.dmcc.ae/business-search?directory=1&submissionGuid=2c8df029-a92e-4b5d-a014-7ef9948e664b'
driver = webdriver.Firefox()
driver.get(url)
wait=WebDriverWait(driver,50)
wait.until(EC.visibility_of_element_located((By.CLASS_NAME,'searched-list ng-scope')))
name = driver.find_elements_by_xpath('/html/body/div[3]/div/div/div[1]/div/div[1]/div/div[1]/h4')[0].text
print(name)
driver.switch_to.frame(driver.find_element_by_css_selector("#pym-0 iframe"))
wait = WebDriverWait(driver, 10)
wait.until(EC.presence_of_element_located(
(By.CSS_SELECTOR, '.searched-list.ng-scope')))
name = driver.find_elements_by_xpath(
'/html/body/div[3]/div/div/div[1]/div/div[1]/div/div[1]/h4')[0].text
its inside iframe , to interact with iframe element switch to it first. Here iframe doesn't have any unique identified . So we used the parent div which had unique id as reference from that we found the child iframe
now if you want to interact outside iframe use;
driver.switch_to.default_content()
<iframe src="https://dmcc.secure.force.com/Business_directory_Page?initialWidth=987&childId=pym-0&parentTitle=List%20of%20Companies%20Registered%20in%20Dubai%2C%20DMCC%20Free%20Zone&parentUrl=https%3A%2F%2Fwww.dmcc.ae%2Fbusiness-search%3Fdirectory%3D1%26submissionGuid%3D2c8df029-a92e-4b5d-a014-7ef9948e664b" width="100%" scrolling="no" marginheight="0" frameborder="0" height="3657px"></iframe>
Switch to iframe and handle the accept button.
driver.get('https://www.dmcc.ae/business-search?directory=1&submissionGuid=2c8df029-a92e-4b5d-a014-7ef9948e664b')
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#hs-eu-confirmation-button"))).click()
wait.until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,'#pym-0 > iframe')))
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR,'.searched-list.ng-scope')))
name = driver.find_elements_by_xpath('//*[#id="directory_list"]/div/div/div/div[1]/h4')[0]
print(name.text))
Outputs
1 BOXOFFICE DMCC

unable to click the dropdown element in selenium

i tried so many xpath in selenium but all were unable to click the element and always give me a error element not found or element not interactable
how to solve it any help would be appreciated
Here is xpath of the Element is give Below:
(//a[#href='javascript:void(0)' and #class='select2-choice select2-default'])[1]
Try with ".//*[#id='s2id_search_input']/a"
Wait for element to be clickable before click on it:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# ...
wait = WebDriverWait(driver, 10)
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '#s2id_search_input a.select2-choice'))).click()
With scroll:
element = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '#s2id_search_input a.select2-choice')))
driver.execute_script('arguments[0].scrollIntoView()', element)
element.click()
Click using JavaScript:
element = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '#s2id_search_input a.select2-choice')))
driver.execute_script('arguments[0].click()', element)

find aria-label in html page using soup python

i have html pages, with this code :
<span itemprop="title" data-andiallelmwithtext="15" aria-current="page" aria-label="you in page
number 452">page 452</span>
i want to find the aria-label, so i have tried this:
is_452 = soup.find("span", {"aria-label": "you in page number 452"})
print(is_452)
i want to get the result :
is_452 =page 452
i'm getting the result:
is_452=none
how to do it ?
It has line breaks in it, so it doesn't match through text.Try the following
from simplified_scrapy.simplified_doc import SimplifiedDoc
html='''<span itemprop="title" data-andiallelmwithtext="15" aria-current="page" aria-label="you in page
number 452">page 452</span>'''
doc = SimplifiedDoc(html)
is_452 = doc.getElementByReg('aria-label="you in page[\s]*number 452"',tag="span")
print (is_452.text)
Possibly the desired element is a dynamic element and you can use Selenium to extract the value of the aria-label attribute inducing WebDriverWait for the visibility_of_element_located() and you can use either of the following Locator Strategies:
Using CSS_SELECTOR:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "section#header a.cart-heading[href='/cart']"))).get_attribute("aria-label"))
Using XPATH:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//section[#id='header']//a[#class='cart-heading' and #href='/cart']"))).get_attribute("aria-label"))
Note : You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
The reason soup fails in doing this is because of the line break. I have a simpler solution which doesn't use any separate library, just BeautifulSoup only. I know this question is old, but it has 1k views so it's clear many people search up this question.
You can use triple-quote strings to take into account the newline.
This:
is_452 = soup.find("span", {"aria-label": "you in page number 452"})
print(is_452)
Would become:
search_label = """you in page
number 452"""
is_452 = soup.find("span", {"aria-label": search_label})
print(is_452)

How to extract all href from a class in Python Selenium?

I am trying to extract people's href from the URL https://www.dx3canada.com/agenda/speakers.
I tried:
elems = driver.find_elements_by_css_selector('.display-flex card vancouver')
href_output = []
for ele in elems:
href_output.append(ele.get_attribute("href"))
print(href_output)
But the output list returns nothing...
The expected href shown as the image below and I hope the outputs as a list of hrefs:
I really appreciate the help!
To extract the people's href attribute from the URL https://www.dx3canada.com/agenda/speakers as the the desired elements are within an <iframe> so you have to:
Induce WebDriverWait for the desired frame to be available and switch to it.
Induce WebDriverWait for the visibility of all elements located.
You can use the following Locator Strategies:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get('https://www.dx3canada.com/agenda/speakers')
WebDriverWait(driver, 30).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe#whovaIframeSpeaker")))
print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a.display-flex.card.vancouver")))])
Console Output:
['https://whova.com/embedded/speaker_detail/dcrma_202003/9942778/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907682/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907688/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907676/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907696/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907690/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907670/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907693/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9942779/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9908087/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907671/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907681/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907673/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907678/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907689/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907674/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907684/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907685/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907686/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9942780/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907695/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907687/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907683/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907692/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907672/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907697/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907680/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907679/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907675/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907677/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907694/']
Here you can find a relevant discussion on Ways to deal with #document under iframe
Your images are in an iframe, so you will need to switch to this before you can scrape the href attributes using frame_to_be_available_and_switch_to_it.
Then, to get the list of all href attributes, you may need to run some Javascript to scroll the image into view, and handle the case where the images may be lazy loading the href:
# first, switch to iframe
WebDriverWait(driver, 30).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH,"//iframe[#id='whovaIframeSpeaker']")))
elements_list = driver.find_elements_by_xpath("//div[contains(#class, 'template-section-body')]/a[contains(#class, 'display-flex card vancouver')]")
for element in elements_list:
driver.execute_script("arguments[0].scrollIntoView(true);", element)
print(element.get_attribute("href"))
The results of this code:
For your css selector use .display-flex.card.vancouver instead.
elems = driver.find_elements_by_css_selector('.display-flex.card.vancouver')
Each word is a class, so you need to place a dot in the front of each one.

Resources