How to get the text inside a span tag using Selenium WebDriver? - python-3.x

I want to get the text inside the span, but I can't get it to work. The text is nested as ul > li > span > a > span. I am using Selenium with Python.
Below is the code I tried:
departmentCategoryContent = driver.find_elements_by_class_name('a-list-item')
departmentCategory = departmentCategoryContent.find_elements_by_tag_name('span')
After this, I iterate over departmentCategory and print each element's text with .text:
[ print(x.text) for x in departmentCategory ]
However, this raises an error: AttributeError: 'list' object has no attribute 'find_elements_by_tag_name'.
Can anyone tell me what I am doing wrong and how I can get the text?

Problem:
departmentCategoryContent is the list returned by find_elements_by_class_name(), not a single WebElement, so it doesn't have the find_elements_by_tag_name() method.
Solution:
You can choose either of the two approaches below:
1. Loop over each element of the departmentCategoryContent list first, then call find_elements_by_tag_name() on that element.
2. Save time with a single statement, using find_elements_by_css_selector():
departmentCategory = driver.find_elements_by_css_selector('.a-spacing-micro.apb-browse-refinements-indent-2 .a-list-item span')
for x in departmentCategory:
    print(x.text)
I tested this selector in the browser DevTools.
Explanation:
Your locator .a-list-item span returns every span inside any element with the class a-list-item. On that page it matches 88 elements, many of them unwanted.
So you need a more specific locator that excludes the other lists. Here I added two more classes: .a-spacing-micro.apb-browse-refinements-indent-2

You're looping over the wrong thing. You want to loop through the 'a-list-item' list and find a single span element that is a child of each WebElement. Try this:
departmentCategoryContent = driver.find_elements_by_class_name('a-list-item')
for x in departmentCategoryContent:
    print(x.find_element_by_tag_name('span').text)
Note that the second DOM search uses find_element (not find_elements), which returns a single WebElement rather than a list.

Related

(Python) Selenium list objects are not callable when looping through them (autocomplete doesn't work either)

I'm fairly new to Python and Selenium.
I'm trying to gather elements from a webpage using Python and Selenium in VS Code.
I've already done similar things in other webpages so I can confirm the setups and the drivers all work fine.
Here is the code which I'll try to explain line by line.
from selenium.webdriver.common.by import By

# Create an empty list of names
Names = []
# Find and save the table I'm interested in
Table = driver.find_element_by_id("pokedex")
# Find and save, in a list, all the elements with a particular class name
NameCells = Table.find_elements_by_class_name("cell-name")
# Loop through the list
for NameCell in NameCells:
    # If it finds a child element with a particular class...
    if NameCell.find_elements(By.CLASS_NAME, "text-muted"):
        # ...append that element's text to the list
        Names.append(NameCell.find_element(By.CLASS_NAME, "text-muted").text)
    # ...else...
    else:
        # ...append the text of an element with another class instead
        Names.append(NameCell.find_element(By.CLASS_NAME, "ent-name").text)
# ...and print the list
print(Names)
The problem is that while I can use methods like find_element in the lines that create Table and NameCells, I can't use them on NameCell inside the for loop.
VS Code doesn't even show me the expected methods after I type the ".".
I tried typing them in myself, hoping it would work, but of course it didn't.
Why does this happen?
Why can't I use WebElement methods at times?
I notice it happens mainly with lists of objects rather than single objects.

How to parse only the second span tag in an HTML document using Python bs4

I want to parse only one span tag in my HTML document. There are three sibling span tags without any class or id, and I want only the second one, using BeautifulSoup 4.
Given the following html document:
<div class="adress">
<span>35456 street</span>
<span>city, state</span>
<span>zipcode</span>
</div>
I tried:
for spn in soup.findAll('span'):
    data = spn[1].text
but it didn't work. The expected result is the text of the second span stored in a variable:
data = "city, state"
I would also like to know how to get both the first and second spans concatenated in one variable.
You are indexing an individual span (a Tag instance), and tag[...] looks up HTML attributes, not child elements. Get rid of the for loop and index the list returned by findAll instead:
>>> soup.findAll('span')[1]
<span>city, state</span>
You can get the first and second tags together using:
>>> soup.findAll('span')[:2]
[<span>35456 street</span>, <span>city, state</span>]
or, as a string:
>>> "".join([str(tag) for tag in soup.findAll('span')[:2]])
'<span>35456 street</span><span>city, state</span>'
Another option:
data = soup.select_one('div > span:nth-of-type(2)').get_text(strip=True)
print(data)
Output:
city, state
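The answer above joins the HTML of the first two tags. If you want their text instead, here is a minimal sketch. The `", "` separator and the built-in `"html.parser"` backend are choices made for this example. `find_all` is the current name for `findAll`.

```python
from bs4 import BeautifulSoup

html = """
<div class="adress">
<span>35456 street</span>
<span>city, state</span>
<span>zipcode</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
spans = soup.find_all("span")

# Index the list returned by find_all, not an individual Tag.
second = spans[1].get_text(strip=True)

# Join the text (not the HTML) of the first two spans.
first_two = ", ".join(span.get_text(strip=True) for span in spans[:2])

print(second)     # city, state
print(first_two)  # 35456 street, city, state
```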

Can't access a dynamic element on a webpage

I can't access a text box on a webpage because it's a dynamic element. I've tried filtering it by many attributes in the XPath. However, the only unique part seems to be the number in the id and name, and that number changes. Every filter I try matches at least 3 elements. I've been trying for 2 days and really need some help here.
from selenium import webdriver

# clicks on a button
def click_btn(submit_xpath):
    submit_box = driver.find_element_by_xpath(submit_xpath)
    submit_box.click()
    driver.implicitly_wait(7)

# sends text to a text box
def send_text_to_box(box_xpath, text):
    box = driver.find_element_by_xpath(box_xpath)
    box.send_keys(text)
    driver.implicitly_wait(3)

descr = "Can't send this text"
# the number in the id (1285) is the part of the XPath that changes
send_text_to_box('//*[@id="textfield-1285-inputEl"]', descr)
Edit: it works now with the following XPath: //input[contains(@id, 'textfield') and contains(@aria-readonly, 'false') and contains(@class, 'x-form-invalid-field-default')]. Luckily, I found something specific to this element.
You can match on part of the attribute value instead of the exact string. That is, instead of
send_text_to_box('//*[@id="textfield-1285-inputEl"]', descr)
please try
send_text_to_box('//*[contains(@id,"inputEl")]', descr)
If multiple elements have the string 'inputEl' in their id, look for something else that stays constant (another attribute, perhaps). Otherwise, locate some other stable element and traverse from it to the required input.
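Both the partial-match suggestion and the asker's working locator follow the same pattern: one or more `contains(@attr, "...")` conditions joined with `and`. Below is a minimal sketch of building such XPaths from a dict. `contains_xpath` is a hypothetical helper, and it assumes the fragments contain no double quotes.

```python
def contains_xpath(conditions, tag="*"):
    """Build an XPath matching `tag` elements whose attributes contain the
    given substrings, e.g. {"id": "inputEl"}."""
    for fragment in conditions.values():
        if '"' in fragment:
            raise ValueError("fragments containing double quotes are not supported")
    predicates = " and ".join(
        f'contains(@{attr}, "{fragment}")' for attr, fragment in conditions.items()
    )
    return f"//{tag}[{predicates}]"


# Partial match on the stable part of the id
print(contains_xpath({"id": "inputEl"}))
# //*[contains(@id, "inputEl")]

# Several conditions combined, as in the asker's working locator
print(contains_xpath({"id": "textfield", "aria-readonly": "false"}, tag="input"))
# //input[contains(@id, "textfield") and contains(@aria-readonly, "false")]
```

The result can be passed straight to send_text_to_box(). Note that contains() is a plain substring test, so contains(@class, "foo") also matches a class such as foo-bar.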

How to get content from a div by its class using Selenium - Python?

I want to extract the contents on the left side of the page using the div class table__9d458b97.
I don't want to use an XPath for this because some of the content isn't always in the same position.
driver2 = webdriver.Chrome(r'XXXX\chromedriver.exe')
driver2.get("https://www.bloomberg.com/profiles/people/15103277-mark-elliot-zuckerberg")
Here is my code using the xpath (how can I use the class?):
boardmembership_table=driver2.find_elements_by_xpath('//*[@id="root"]/div/section/div[5]')[0]
boardmembership_table.text
Thanks for the help!
You could use a CSS selector instead.
You can use the following code:
from selenium.webdriver import Chrome
driver2 = Chrome()
driver2.get("https://www.bloomberg.com/profiles/people/15103277-mark-elliot-zuckerberg")
els = driver2.find_elements_by_css_selector('.table__9d458b97[role="table"]')
for el in els:
    print(el.text)
driver2.close()
Note that find_elements_by_css_selector returns a list of elements, or an empty list if none are found.
You can use the XPath below if you want to access the Board Memberships table:
//*[@id="root"]/div/section/div[h2[.='Board Memberships']]
You can also use the following-sibling axis to get the div next to the 'Board Membership' heading,
like this:
'//h2[contains(.,"Board Membership")]//following-sibling::div'

How to convert Selenium WebElements to a list of strings in Python

I have gathered the data I need from the Scopus website. My output is saved in a list named "document". When I call type() on each element of this list, Python returns this class:
"<class'selenium.webdriver.firefox.webelement.FirefoxWebElement'>"
To solve this, I used the text property like this:
document = driver.find_elements_by_tag_name('td')
for i in document:
    print(i.text)
That way I could see the result as text. But when I access the elements of the list individually, whitespace is printed with this code:
x = []
for i in document:
    x.append(i.text)
print(x[2])
This prints whitespace.
What should I do?
You have used the following line of code:
document=driver.find_elements_by_tag_name('td')
and you see this output on the console:
"<class'selenium.webdriver.firefox.webelement.FirefoxWebElement'>"
This is the expected behavior: find_elements_by_tag_name returns references to the nodes matching your search criteria (WebElement objects), not their text.
To collect the text while leaving out the empty cells, you can use the following code block:
x = []
document = driver.find_elements_by_tag_name('td')
for i in document:
    html = i.get_attribute("innerHTML")
    # get_attribute returns None (not the string "null") when there is no value
    if html and html.strip():
        x.append(html)
print(x[2])
My code was correct, but the elements I had selected were just whitespace. When I selected another element, the result was shown.
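Since the selected cells turned out to be blank, another option is to drop empty cells before indexing the list. This is a minimal sketch. `non_blank_texts` is a hypothetical helper, and `SimpleNamespace` objects stand in for the WebElements returned by `driver.find_elements_by_tag_name('td')`, since only their `.text` attribute is used.

```python
from types import SimpleNamespace


def non_blank_texts(elements):
    """Return the stripped .text of each element, skipping blank cells."""
    texts = []
    for element in elements:
        text = element.text.strip()
        if text:  # skip cells whose text is empty or only whitespace
            texts.append(text)
    return texts


# Stand-ins for WebElements: anything with a .text attribute works the same way.
cells = [SimpleNamespace(text=t) for t in ["Title", "  ", "", "Author\n", "2018"]]
print(non_blank_texts(cells))  # ['Title', 'Author', '2018']
```

Filtering shifts positions. After the blank cells are dropped, index 2 refers to the third non-blank cell, not the third td on the page.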
