I'm currently building a web scraper and I'm trying to find out how to tell if an element contains any text, to run code accordingly
like so:
if (element.hasText):
    doStuff()
I want to check this because the scraper sometimes says: AttributeError: 'NoneType' object has no attribute 'text'
You can do this:
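try:
    if element.text:  # raises AttributeError when element is None
        doStuff()
except AttributeError:
    # element is None, so there is no text to read
    continue  # assumes this runs inside the scraping loop
This tries to read the element's text. If the element is None, the AttributeError is caught and the element is skipped, so you know it contains no text.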
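An explicit None check avoids relying on the exception at all. Here is a minimal sketch of that idea; FakeElement and text_or_none are hypothetical names used only for illustration, standing in for whatever object your parser returns (or None when nothing matched).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeElement:
    """Hypothetical stand-in for a parsed element that exposes .text."""
    text: str


def text_or_none(element: Optional[FakeElement]) -> Optional[str]:
    """Return the stripped text, or None when the element is missing or empty."""
    if element is None:  # the lookup found nothing
        return None
    stripped = element.text.strip()
    return stripped or None


# One element with text, one failed lookup, one whitespace-only element.
results = [FakeElement("  Price: 10  "), None, FakeElement("   ")]
for element in results:
    text = text_or_none(element)
    if text is not None:
        print(text)  # only reached for elements that really contain text
```

With this shape the loop body never touches .text on None, so the AttributeError from the question cannot occur.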
I'm studying ANSYS pyMAPDL, following the tutorial https://www.youtube.com/watch?v=szHmg-xW_hM&t=1s
It was understood from both the tutorial code itself and pyMAPDL documentation that prnsol returns a str object, but it has the enriched method to_list() which allows the output to be exported to list format. https://mapdldocs.pyansys.com/user_guide/post.html
However, when I run the following line from the sample code, it just reports attribute error
mapdl_s_1_list = mapdl.prnsol('S', 'PRIN',).to_list()
AttributeError: 'str' object has no attribute 'to_list'
If I print out the prnsol() result, it contains both the header and the table, so I have to do some processing to extract the table information before I can use it further.
Would like to know if there is a quick fix to make the enriched method work? Thanks!
hawkoli1987
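Until the enriched method is available, the manual processing mentioned above could look roughly like the sketch below. It is not part of pyMAPDL: table_rows is a hypothetical helper, and the sample string is invented and only shaped like a header followed by a numeric table, so the real prnsol() output may need different handling.

```python
def table_rows(printout):
    """Keep only the lines whose whitespace-separated fields are all numeric."""
    rows = []
    for line in printout.splitlines():
        fields = line.split()
        if not fields:
            continue  # blank line
        try:
            rows.append([float(field) for field in fields])
        except ValueError:
            # Header or title line: at least one field is not a number.
            continue
    return rows


# Invented sample text, NOT real solver output.
sample = """\
 PRINT S    NODAL SOLUTION PER NODE

    NODE     S1           S2           S3
       1   10.5          2.25        -1.0
       2   11.0          2.50        -0.5
"""

rows = table_rows(sample)
print(rows)
```

The node number comes back as a float here; convert the first column to int if you need it as an index.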
I want to parse https://2gis.kz, but I get an error when using .text or any other method for extracting text from an element found by class name.
I type a search query such as "fitness".
My window variable is set like this:
all_cards = driver.find_elements(By.CLASS_NAME,"_1hf7139")
for card_ in all_cards:
    card_.click()
    window = driver.find_element(By.CLASS_NAME, "_18lzknl")
This is a quite simplified version of how I open a mini-window with all of the essential information inside it. Below I am attaching the piece of code where I am trying to extract text from a phone number holder.
texts = window.find_elements(By.CLASS_NAME,'_b0ke8')
print(texts) # this prints out something from where I am concluding that this thing is accessible
try:
    print(texts.text)
except:
    print(".text")
try:
    print(texts.text())
except:
    print(".text()")
try:
    print(texts.get_attribute("innerHTML"))
except:
    print('getAttribute("innerHTML")')
try:
    print(texts.get_attribute("textContent"))
except:
    print('getAttribute("textContent")')
try:
    print(texts.get_attribute("outerHTML"))
except:
    print('getAttribute("outerHTML")')
Hi, guys, I solved the issue. The .text attribute was not working for some reason. I guess the developers somehow managed to protect the information from this method. I used
get_attribute("innerHTML") # afaik this returns the HTML markup inside the element
and now it works like a charm.
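import io
import re

texts = window.find_elements(By.TAG_NAME, "bdo")
with io.open("t.txt", "a", encoding="utf-8") as f:
    for text in texts:
        # keep only the digits of the phone number
        nums = re.sub("[^0-9]", "", text.get_attribute("innerHTML"))
        f.write(nums + '\n')
# no f.close() needed: the with block closes the file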
So the problem was that:
I was trying to print a list of items just by using print(texts)
Even when I tried to print each element of the texts variable in a for loop, I got an error related to UTF-8 encoding.
I hope someone will find it useful and will not spend a plethora of time trying to fix such a simple bug.
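For anyone reusing the digit-stripping step on its own, here is a small sketch of what re.sub("[^0-9]", "", ...) does to a string. The helper name digits_only and the sample strings are made up for illustration; they are not taken from the site.

```python
import re


def digits_only(inner_html):
    """Drop every character that is not an ASCII digit."""
    return re.sub(r"[^0-9]", "", inner_html)


# Made-up samples: a plain number and one wrapped in markup.
plain = digits_only("+7 (700) 000-00-00")
wrapped = digits_only('<b class="x1">12</b>')

print(plain)
print(wrapped)
```

Note the second sample: digits inside tag attributes (the 1 in class="x1") survive too, so this only gives a clean number when the innerHTML has no digits in its markup.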
The find_elements method returns a list of web elements. So this
texts = window.find_elements(By.CLASS_NAME,'_b0ke8')
gives you texts, a list of web elements.
You cannot read .text directly on a list.
To get each element's text, iterate over the elements in the list and read the text of each one, like this:
text_elements = window.find_elements(By.CLASS_NAME,'_b0ke8')
for element in text_elements:
    print(element.text)
Also, I'm not sure about the locators you are using.
_1hf7139, _18lzknl and _b0ke8 seem to be dynamic class names, i.e. they may change with each browsing session.
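To see the list-versus-element difference without opening a browser, here is a rough sketch. FakeElement is a hypothetical stand-in that models only the .text attribute of a web element, and the phone numbers are invented.

```python
class FakeElement:
    """Hypothetical stand-in for a web element; only .text is modeled."""

    def __init__(self, text):
        self.text = text


# What a find_elements call hands back: a plain Python list.
elements = [FakeElement("+7 700 000 00 00"), FakeElement("+7 701 111 11 11")]

try:
    elements.text  # the list itself has no .text
except AttributeError as error:
    message = str(error)
    print(message)

# Reading .text per element works.
texts = [element.text for element in elements]
print(texts)
```

This is also why a bare except in the question hid the real cause: every attempt failed with the same AttributeError on the list.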
I am working on creating a program that would read a list of aircraft registrations from an excel file and return the aircraft type codes.
My source of information is FlightRadar24. (example - https://www.flightradar24.com/data/aircraft/n502dn)
I tried inspecting the elements on the page to find the correct class to use and found it listed as "details". When I run my code, it extracts the aircraft name from the element with the class "details" instead of the type code.
See here for the example data
I then changed my approach to using XPath to seek the correct text but with the xpath it prints out
(For the XPath, I used a browser add-on to find the exact XPath for the element, so I am fairly confident that it is correct.)
It gives no output. What would you suggest in this particular instance when extracting values without a definite id ?
for i in list_regs:
    driver.get('https://www.flightradar24.com/data/aircraft/'+i)
    driver.implicitly_wait(3)
    load = 0
    while load==0:
        try:
            element = driver.find_element_by_xpath("/html/body/div[5]/div/section/section[2]/div[1]/div[1]/div[2]/div[2]/span")
            print('element') #Printing to terminal to see if the right value is returned.
You should probably change your xpath expression to:
//label[.="TYPE CODE"]/following-sibling::span[@class="details"]
and
print('element')
to
print(element)
Edit:
This works for me:
element = driver.find_element_by_xpath('//label[.="TYPE CODE"]/following-sibling::span[@class="details"]')
print(element.text)
Output:
A359
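The idea of anchoring on the label text rather than on an absolute path can be tried offline with the standard library. This is only a sketch: the markup is invented to mirror the label/span pairing the XPath implies, the aircraft name is a placeholder, and value_for_label is a hypothetical helper. ElementTree supports only a subset of XPath and has no following-sibling axis, so the sketch matches through the shared parent instead, which is close to the expression above but not identical.

```python
import xml.etree.ElementTree as ET

# Invented markup: only the label/span pairing mirrors the XPath above.
PAGE = """
<section>
  <div>
    <label>AIRCRAFT</label>
    <span class="details">Example Aircraft Name</span>
  </div>
  <div>
    <label>TYPE CODE</label>
    <span class="details">A359</span>
  </div>
</section>
"""


def value_for_label(root, label_text):
    """Return the 'details' span text that shares a parent with the label."""
    # Select any element that has a <label> child with this exact text,
    # then take its <span class="details"> child.
    span = root.find(f".//*[label='{label_text}']/span[@class='details']")
    return None if span is None else span.text


root = ET.fromstring(PAGE.strip())
print(value_for_label(root, "TYPE CODE"))
```

Because several spans share the class "details", it is the label that pins down the right one; the class alone returns the first match, which is what happened in the question.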
This problem seems very simple, but I'm having trouble finding an already existing solution on StackOverflow.
When I run a sqlalchemy command like the following
valid_columns = db.session.query(CrmLabels).filter_by(user_id=client_id).first()
I get back a CrmLabels object that is not iterable. If I print this object, I get a list
[Convert Source, Convert Medium, Landing Page]
But this is not iterable. I would like to get exactly what I've shown above, except as a list of strings
['Convert Source', 'Convert Medium', 'Landing Page']
How can I run a query that will return this result?
The change below should do it:
valid_columns = (
    db.session.query(CrmLabels).filter_by(user_id=client_id)
    .statement.execute()  # just add this
    .first()
)
However, you need to be certain about the order of columns, and you can use valid_columns.keys() to make sure the values are in the expected order.
Alternatively, you can create a dictionary using dict(valid_columns.items()).
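To illustrate the point about column order and the dictionary form without a SQLAlchemy setup, here is a sketch that swaps in the standard library's sqlite3 module. It is not the SQLAlchemy API: the table and column names are hypothetical, and sqlite3.Row is used only because it also exposes keys() alongside the values.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.row_factory = sqlite3.Row  # rows that remember their column names

# Hypothetical schema, loosely modeled on the question.
connection.execute(
    "CREATE TABLE crm_labels (user_id INTEGER, source TEXT, medium TEXT, page TEXT)"
)
connection.execute(
    "INSERT INTO crm_labels VALUES (?, ?, ?, ?)",
    (1, "Convert Source", "Convert Medium", "Landing Page"),
)

row = connection.execute(
    "SELECT source, medium, page FROM crm_labels WHERE user_id = ?", (1,)
).fetchone()

columns = row.keys()  # column names, in SELECT order
values = list(row)    # values, in the same order
as_dict = dict(zip(columns, values))
connection.close()

print(values)
print(as_dict)
```

Checking keys() next to the values is the same safeguard as above: the list is only meaningful if you know which column each position came from.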
I have a script using BeautifulSoup where I am trying to get the text within a span element.
number_of_pages = soup.find('span', attrs={'class':'random'})
print(number_of_pages.string)
and it returns a variable like {{lastPage()}} which means it is generated by JS. So, then I changed my script to use Selenium but it returns an element that doesn't contain the text I need. I tried a random website to see if it works there:
from selenium import webdriver
browser = webdriver.Firefox()
browser.get("https://hoshiikins.com/") #navigates to hoshiikins.com
spanList= browser.find_elements_by_xpath("/html/body/div[1]/main/div/div[13]/div/div[2]/div/p")
print(spanList)
and what it returns is:
[<selenium.webdriver.firefox.webelement.FirefoxWebElement (session="fe20e73e-5638-420e-a8a0-a8785153c157", element="3065d5b1-f8a6-4e46-9359-87386b4d1511")>]
I then thought it was an issue related to how fast the script runs. So, I added a delay/wait:
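from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
element = WebDriverWait(browser, 10).until(
    EC.presence_of_element_located((By.XPATH, "/html/body/div[1]/main/div/div[13]/div/div[2]/div/p"))
)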
I even tried different parts of the page and used a class and an ID but I am not getting any text back. Note that I had tried using the spanList.getattribute('value') or spanList.text but they return nothing.
I had this same issue. Your variable spanList is a list of web element objects; the find_elements function doesn't return the text itself. You have to take one more step and read .text on an element to get its text. You can do this in the print statement:
print(spanList[0].text)
If this tag is an input element, then you'll need
print(spanList[0].get_attribute('value'))
This should print what you are looking for.
It sounds like you're perhaps misunderstanding your results. The code you provided for Selenium works with one small change:
driver.get("https://hoshiikins.com/")
spanList = driver.find_elements_by_xpath("/html/body/div[1]/main/div/div[13]/div/div[2]/div/p")
for span in spanList:
    print(span.text)
Returns Indivdually Handcrafted with Love, Just for You.
You're using find_elements_by_xpath, which is different from find_element_by_xpath as the former is plural (elements). So all you have to do is either change it to element or iterate over your result set and get the text property of the element.