Selenium unusual output in python - python-3.x

I am getting comments from a website. I tried this before and it worked well, but now it gives me unusual output.
Part of my code:
comments = driver.find_elements_by_class_name("comment-text")
time.sleep(1)
print(comments[1])
The output:
<selenium.webdriver.remote.webelement.WebElement (session="bb6ae0409dd8ec8c191f9bd84f79bea7", element="5f5d4a2f-7a93-41fe-9ca5-aa4e7c525792")>

You want
print(comments[1].text)
You were printing the element itself, which displays as the element's internal reference (a GUID, I think). I'm assuming you want the text contained in the element, which means you need .text.
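To see why the session/element id shows up, here is a minimal stand-in class (not Selenium itself, just an illustration of the same repr-vs-.text distinction):

```python
# A tiny stand-in for a Selenium WebElement. print(obj) shows the
# object's repr (Selenium puts the session/element ids there), while
# the visible text lives on the .text attribute.
class StubElement:
    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return '<WebElement (session="...", element="...")>'

comments = [StubElement("First comment"), StubElement("Second comment")]
print(comments[1])       # the repr, like the unusual output above
print(comments[1].text)  # the actual comment text
```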

Related

Reading and writing a text value using selenium and pandas when the html element has no definite id

I am working on creating a program that would read a list of aircraft registrations from an excel file and return the aircraft type codes.
My source of information is FlightRadar24. (example - https://www.flightradar24.com/data/aircraft/n502dn)
I tried inspecting the elements on the page to find the correct class id to invoke and found the id to be listed as "details". When I run my code, it extracts the aircraft name with the class id/name "details" instead of the type code.
See here for the example data
I then changed my approach to using XPath to locate the correct text, but the XPath gives no output. (For the XPath, I used a browser add-on to find the exact XPath for the element, so I'm fairly confident it is correct.) What would you suggest in this particular instance for extracting values without a definite id?
for i in list_regs:
    driver.get('https://www.flightradar24.com/data/aircraft/'+i)
    driver.implicitly_wait(3)
    load = 0
    while load==0:
        try:
            element = driver.find_element_by_xpath("/html/body/div[5]/div/section/section[2]/div[1]/div[1]/div[2]/div[2]/span")
            print('element') # Printing to terminal to see if the right value is returned.
You should probably change your xpath expression to:
//label[.="TYPE CODE"]/following-sibling::span[@class="details"]
and
print('element')
to
print(element)
Edit:
This works for me:
element = driver.find_element_by_xpath('//label[.="TYPE CODE"]/following-sibling::span[@class="details"]')
print(element.text)
Output:
A359
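The same XPath can be exercised offline with lxml against a small hypothetical snippet that mimics the label/span markup the answer targets (the real FlightRadar24 page structure may differ):

```python
from lxml import html

# Hypothetical snippet resembling the label/span pairing the XPath
# relies on; not the actual FlightRadar24 markup.
snippet = """
<div>
  <label>TYPE CODE</label>
  <span class="details">A359</span>
</div>
"""

tree = html.fromstring(snippet)
# Find the label whose text is "TYPE CODE", then its sibling span.
spans = tree.xpath('//label[.="TYPE CODE"]/following-sibling::span[@class="details"]')
print(spans[0].text)  # A359
```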

Selenium Webdriver element returns None even with WebDriverWait

I have a script using BeautifulSoup where I am trying to get the text within a span element.
number_of_pages = soup.find('span', attrs={'class':'random'})
print(number_of_pages.string)
and it returns a variable like {{lastPage()}} which means it is generated by JS. So, then I changed my script to use Selenium but it returns an element that doesn't contain the text I need. I tried a random website to see if it works there:
from selenium import webdriver
browser = webdriver.Firefox()
browser.get("https://hoshiikins.com/") #navigates to hoshiikins.com
spanList= browser.find_elements_by_xpath("/html/body/div[1]/main/div/div[13]/div/div[2]/div/p")
print(spanList)
and what it returns is:
[<selenium.webdriver.firefox.webelement.FirefoxWebElement (session="fe20e73e-5638-420e-a8a0-a8785153c157", element="3065d5b1-f8a6-4e46-9359-87386b4d1511")>]
I then thought it was an issue related to how fast the script runs. So, I added a delay/wait:
element = WebDriverWait(browser, 10).until(
    EC.presence_of_element_located((By.XPATH, "/html/body/div[1]/main/div/div[13]/div/div[2]/div/p"))
)
I even tried different parts of the page and used a class and an ID but I am not getting any text back. Note that I had tried spanList[0].get_attribute('value') and spanList[0].text, but they return nothing.
I had this same issue: your variable spanList is a list of web elements, and the find_elements function doesn't return meaningful text by itself. You have to do one more step and add .text to get the text. You can do this in the print statement:
print(spanList[0].text)
If the tag is an input element, then you'll need
print(spanList[0].get_attribute('value'))
This should print what you are looking for
It sounds like you're perhaps misunderstanding your results, the code you provided for Selenium works with one small change:
driver.get("https://hoshiikins.com/")
spanList = driver.find_elements_by_xpath("/html/body/div[1]/main/div/div[13]/div/div[2]/div/p")
for span in spanList:
    print(span.text)
Returns Indivdually Handcrafted with Love, Just for You.
You're using find_elements_by_xpath, which is different from find_element_by_xpath: the former is plural (elements) and returns a list. So all you have to do is either change it to find_element_by_xpath or iterate over your result set and get the text property of each element.

web2py XML helper sanitize line breaks under python3

In my web2py app I’m processing a list of items, where the user can click on a link for each item to select this. An item has an UUID, a title and a description. For a better orientation the item description is also displayed as link title. To prevent injections by and to escape tags in the description I’m using the XML sanitizer as follows:
A(this_item.title, \
  callback = URL('item', 'select', \
  vars=dict(uuid=this_item.uuid), user_signature=True), \
  _title=XML(str_replace(this_item.description, {'\r\n':'&#13;', '<':'&lt;', '>':'&gt;'}), sanitize=True))
Using Python 2 everything was fine. Since I have switched to Python 3 I have the following problem. When the description contains line breaks the sanitizer is not working anymore. For example the following string produces by my str_replace routine is fine to be sanitized by the XML helper under Python 2 but not under Python 3:
Header

Line1
Line2
Line3
Sanitizing line breaks escaped by &#13; is the problem with Python 3 (but not with Python 2). Everything else is no problem for the XML helper to sanitize (e.g. less than or greater than; I need these, since if there is no description it is generated as &lt;no description&gt;).
How can be line breaks sanitized by the XML helper running web2py under Python3?
Thanks for any support!
Best regards
Clemens
This is down to a change in Python's HTMLParser class between 3.4 and 3.5, where convert_charrefs started defaulting to True:
Python 3.4 DeprecationWarning convert_charrefs
I think the following fix in your web2py yatl source should correct it:
https://github.com/web2py/yatl/compare/master...timnyborg:patch-1
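The behavior change is easy to see with the standard library alone. A minimal sketch: with convert_charrefs=True (the default since Python 3.5), character references such as &#13; are folded into the surrounding text before the data callbacks fire, so parser code that expected a separate handle_charref call never sees them:

```python
from html.parser import HTMLParser

# Log every character reference (e.g. &#13;) the parser reports.
class RefLogger(HTMLParser):
    def __init__(self, convert_charrefs):
        super().__init__(convert_charrefs=convert_charrefs)
        self.charrefs = []

    def handle_charref(self, name):
        self.charrefs.append(name)

# Pre-3.5 style: the reference is reported separately.
old_style = RefLogger(convert_charrefs=False)
old_style.feed("Line1&#13;Line2")
print(old_style.charrefs)  # ['13']

# 3.5+ default: the reference is folded into the data, never reported.
new_style = RefLogger(convert_charrefs=True)
new_style.feed("Line1&#13;Line2")
print(new_style.charrefs)  # []
```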

How to remove unusual special characters from a pandas dataframe using Python

I have a file with some crazy stuff in it. It looks like this:
I attempted to get rid of it using this:
df['firstname'] = map(lambda x: x.decode('utf-8','ignore'), df['firstname'])
But I wound up with this in my dataframe: <map object at 0x0000022141F637F0>
I got that example from another question and this seems to be the Python3 method for doing this but I'm not sure what I'm doing wrong.
Edit: For some odd reason someone thinks that this has something to do with getting a map to return a list. The central issue is getting rid of non UTF-8 characters. Whether or not I'm even doing that correctly has yet to be established.
As I understand it, I have to apply an operation to every character in a column of the dataframe. Is there another technique or is map the correct way and if it is, why am I getting the output I've indicated?
Edit2: For some reason, my machine wouldn't let me create an example. I can now. This is what I'm dealing with. All those weird characters need to go.
import pandas as pd
data = [['🦎Ale','Αλέξανδρα'],['��Grain','Girl🌾'],['Đỗ Vũ','ên Anh'],['Don','Johnson']]
df = pd.DataFrame(data,columns=['firstname','lastname'])
print(df)
Edit 3: I tired doing this using a reg ex and for some reason, it still didn't work.
df['firstname'] = df['firstname'].replace('[^a-zA-z\s]',' ')
This regex works FINE in another process, but here, it still leaves the ugly characters.
Edit 4: It turns out that it's image data that we're looking at.
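One possible cleanup, sketched on plain strings (it may not be what was ultimately needed here, given that Edit 4 says the data turned out to be image bytes): encode to ASCII while discarding whatever doesn't fit, then decode back. For the dataframe above it could be applied with df['firstname'] = df['firstname'].map(clean_ascii).

```python
# Drop every non-ASCII character from a string. Note the caveat:
# this also removes legitimate accented letters and non-Latin names
# (e.g. 'Αλέξανδρα' becomes an empty string), not just emoji.
def clean_ascii(value):
    return value.encode('ascii', 'ignore').decode('ascii').strip()

print(clean_ascii('🦎Ale'))    # Ale
print(clean_ascii('Girl🌾'))   # Girl
print(clean_ascii('Đỗ Vũ'))    # V  (the accented letters are dropped too)
```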

How to save the output of text from selenium chrome (Python)

I'm using Selenium for extracting comments of Youtube.
Everything went well. But when I print comment.text, the output is the last sentence.
I don't know how to save it for further analysis (cleaning and tokenization).
path = "/mnt/c/Users/xxx/chromedriver.exe"
This is the path where I saved the downloaded chromedriver.
chrome = webdriver.Chrome(path)
url = "https://www.youtube.com/watch?v=WPni755-Krg"
chrome.get(url)
chrome.maximize_window()
# scroll down
sleep = 5
chrome.execute_script('window.scrollTo(0, 500);')
time.sleep(sleep)
chrome.execute_script('window.scrollTo(0, 1080);')
time.sleep(sleep)
text_comment = chrome.find_element_by_xpath('//*[@id="contents"]')
comments = text_comment.find_elements_by_xpath('//*[@id="content-text"]')
comment_ids = []
Try this approach for getting the text of all comments (the for-loop part was edited; there was no indentation in the previous code):
for comment in comments:
    comment_ids.append(comment.get_attribute('id'))
    print(comment.text)
When I print, I can see all the texts here, but how can I open them for further study? Should I always use a for loop? I want to tokenize the texts, but the output is only the last sentence. Is there a way to save a text file with the whole texts inside it and open it again? I googled a lot but wasn't successful.
So it sounds like you're just trying to store these comments to reference later. Your current solution is to append them to a string and use a token to create substrings? I'm not familiar with Python's data structures, but this sounds like a great job for a list, depending on how you plan to reference this data.
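A hypothetical follow-on: once the loop has collected the comment texts into a list, they can be written one per line and reopened later for cleaning and tokenization. Here comment_texts stands in for the strings gathered from comment.text above (real comments containing newlines would need a different separator or a CSV/JSON format):

```python
import os
import tempfile

# Stand-in for the list built from comment.text in the loop above.
comment_texts = ["first comment", "second comment"]

# Write one comment per line.
path = os.path.join(tempfile.gettempdir(), "comments.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("\n".join(comment_texts))

# Read them back later for further analysis.
with open(path, encoding="utf-8") as f:
    reloaded = f.read().splitlines()
print(reloaded)
```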
