Selenium's .text just giving blank lines - python-3.x

I've got this:
....
events = driver.find_elements_by_class_name("textblock")
for event in events:
    content = event.text
    print(content)
print(events)
This is the element. There are several of them, each with different text:
<div class="textblock">example text</div>
Here's the output:
# one blank line for every item in list below...
# items below have been stripped
[<selenium.webdriver.firefox.webelement.FirefoxWebElement (session="", element="")>, <selenium.webdriver.firefox.webelement.FirefoxWebElement (session="", element="")> # and so on...
As you can see, instead of the text inside the div, I'm getting blank lines. Why is that?

Selenium only returns visible text. So if this div is hidden in any way (for example with display: none or visibility: hidden), .text returns an empty string because the element is not visible. If you still need the text even when it is not visible, use JavaScript:
events = driver.find_elements_by_class_name("textblock")
for event in events:
    # textContent is read straight from the DOM, whether or not the element is displayed
    content = driver.execute_script('return arguments[0].textContent;', event)
    print(content)
print(events)

Based on the HTML you shared, your code block looks fine. However, I would suggest getting a bit more granular: instead of the .text property, use the get_attribute("innerHTML") method as follows (note that innerHTML returns the element's markup, so any child tags would be included):
....
events = driver.find_elements_by_class_name("textblock")
for event in events:
    content = event.get_attribute("innerHTML")
    print(content)
print(events)
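Both answers can be combined into a small helper that tries the rendered text first and falls back to the DOM's textContent. This is a sketch, assuming element is a Selenium WebElement (or anything with a .text attribute and a get_attribute() method); the name element_text is hypothetical.

```python
def element_text(element):
    """Return an element's visible text, falling back to its textContent.

    `element` is assumed to be a Selenium WebElement: `.text` is empty for
    elements that are not displayed, while the textContent DOM property is not.
    """
    text = element.text
    if text:
        return text
    # get_attribute() returns the DOM property when one exists, so this
    # reads textContent even for hidden elements (None if unavailable).
    raw = element.get_attribute("textContent")
    return raw.strip() if raw else ""
```

Usage: for event in driver.find_elements(By.CLASS_NAME, "textblock"): print(element_text(event)). The find_element(s)_by_* helpers used above were removed in Selenium 4.3, so newer versions need the find_elements(By..., ...) form.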


can't click() an onclick element with selenium (tried text link, partial text link, xpath, css selector)

I need to scrape some data from this URL:
https://www.cnrtl.fr/definition/coupe
The data I need to scrape is located in 3 different tabs.
I'm unable to click the onclick element that should let me switch from one tab to another.
Here is the HTML code for one of the 3 onclick elements:
The 3 onclick elements differ from each other only by the number at the end:
#COUPE1:
return sendRequest(5,'/definition/coupe//0');
#COUPE2:
return sendRequest(5,'/definition/coupe//1');
#COUPER:
return sendRequest(5,'/definition/coupe//2');
I tried to find them by link text, partial link text, XPath and CSS selector.
I've followed this thread:
Python + Selenium: How can click on "onclick" elements?
I also tried the contains() and text() XPath functions.
None of it worked.
There are a few ways to do this. I chose this method because the page reloads when you switch tabs, which makes previously located elements stale.
import time
from selenium.webdriver.common.by import By

# Get the URL
driver.get("https://www.cnrtl.fr/definition/coupe")
# Find the parent element of the tabs
tabs = driver.find_element(By.ID, 'vtoolbar')
# Get all the list items under the parent (tabs)
lis = tabs.find_elements(By.TAG_NAME, 'li')
# Loop over them (skipping the first tab, because that's already loaded)
for i in range(1, len(lis)):
    # Execute the same JS as the page would on click, using the index of the loop
    driver.execute_script(f"sendRequest(5,'/definition/coupe//{i}');")
    # Sleep to visualise the clicking
    time.sleep(3)
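If the goal is to scrape every tab, the loop can collect the page source after each switch. This is a sketch under stated assumptions: driver is a Selenium WebDriver already showing the first tab, the '/definition/<word>//<index>' pattern is generalized from the three onclick values in the question, and load_tab_sources and its parameters are hypothetical names. The fixed sleep mirrors the answer; an explicit wait on something tab-specific would be more robust.

```python
import time

def load_tab_sources(driver, word, n_tabs, pause=3.0):
    """Return the page source of every tab, starting with the one already loaded.

    Calls the page's own sendRequest() instead of clicking, so no element
    references are kept across the reloads (and none can go stale).
    """
    sources = [driver.page_source]  # tab 0 is shown on the initial page load
    for i in range(1, n_tabs):
        # Same JavaScript the tab's onclick handler runs; i selects the tab.
        driver.execute_script(f"sendRequest(5,'/definition/{word}//{i}');")
        time.sleep(pause)  # crude wait for the request to finish
        sources.append(driver.page_source)
    return sources
```

Usage: sources = load_tab_sources(driver, "coupe", len(lis)), then parse each string with the HTML parser of your choice.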

I need help in Python with displaying the contents of a 2D Set into a Tkinter Textbox

Disclaimer: I have only begun to learn Python. I took a crash course on the very basics about a month ago, and since then I have been learning through Google searches and solutions here on Stack Overflow.
I am trying to create an application that reads all PDF files stored in a folder, extracts their filenames, page counts, and the contents of the first page, and stores this information in a 2D list. Once this is done, the application creates a tkinter GUI with 2 listboxes and 1 text box.
The application should display the PDF filenames in the first listbox and the corresponding page count of each file in the second listbox. Both listboxes scroll in sync.
The text box should display the text content of the first page of the PDF.
What I want is that each time I select a PDF filename in the first listbox, with the mouse or with the up or down arrow keys, the application displays the contents of the first page of the selected file in the text box.
This is how my GUI looks and how it should function:
https://i.stack.imgur.com/xrkvo.jpg
So far I have met all the other requirements except this one: when I select a filename in the first listbox, the contents of the first page of that PDF should be displayed in the text box.
Here is my code for populating the listboxes and text box. The contents of my 2D list pdfFiles are [['PDF1 filename', 'PDF1 total pages', 'PDF1 text content of first page'], ['PDF2 filename', 'PDF2 total pages', 'PDF2 text content of first page'], ...].
===========Setting the Listboxes and Textbox=========
scrollbar = Scrollbar(list_2)
scrollbar.pack(side=RIGHT, fill=Y)
list_1.config(yscrollcommand=scrollbar.set)
list_1.bind("<MouseWheel>", scrolllistbox2)
list_2.config(yscrollcommand=scrollbar.set)
list_2.bind("<MouseWheel>", scrolllistbox1)
txt_3 = tk.Text(my_window, font='Arial 10', wrap=WORD)
txt_3.place(relx=0.5, rely=0.12, relwidth=0.472, relheight=0.86)
scrollbar = Scrollbar(txt_3)
scrollbar.pack(side=RIGHT, fill=Y)
list_1.bind("<<ListboxSelect>>", CurSelect)
============Populating the Listboxes with the content of the 2D List===
i = 0
while i < count:
    list_1.insert(tk.END, pdfFiles[i][0])
    list_2.insert(tk.END, pdfFiles[i][1])
    i = i + 1
============Here is my code for CurSelect function========
def CurSelect(evt):
    values = [list_1.get(idx) for idx in list_1.curselection()]
    print(", ".join(values))
========================
The print command above is just my test command to show that I have successfully extracted the selected item in the listbox. What I need now is to somehow link that information to its corresponding page content in my 2D list and display it in the text box.
Something like:
1) select the filename in the listbox
2) match the selected filename against the filenames stored in the pdfFiles 2D list
3) once filename is found, identify the corresponding text of the first page
4) display the text of the first page of the selected file in the text box
I hope I am making sense. Please help.
You don't need much to finish it, just a few small things:
1. Get the selected item of your listbox:
selected_indexes = list_1.curselection()
first_selected = selected_indexes[0] # it's possible to select multiple items
2. Get the corresponding PDF text:
pdf_text = pdfFiles[first_selected][2]
3. Change the text of your Text widget: (from https://stackoverflow.com/a/20908371/8733066)
txt_3.delete("1.0", tk.END)
txt_3.insert(tk.END, pdf_text)
So replace your CurSelect(evt) function with this:
def CurSelect(evt):
    selected_indexes = list_1.curselection()
    if not selected_indexes:
        return  # curselection() can be empty, so leave the text box unchanged
    first_selected = selected_indexes[0]
    pdf_text = pdfFiles[first_selected][2]
    txt_3.delete("1.0", tk.END)
    txt_3.insert(tk.END, pdf_text)
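If you would rather follow the filename-matching steps from the question (for example, because the listbox might one day be sorted differently from pdfFiles), a dictionary keyed by filename does the lookup and avoids relying on global widget names. This is a sketch: build_text_lookup and make_select_handler are hypothetical names, and it assumes each row of pdfFiles is [filename, pages, text] and that filenames are unique, which holds for files in a single folder.

```python
def build_text_lookup(pdf_files):
    """Map each PDF filename to its first-page text; rows are [filename, pages, text]."""
    return {name: text for name, _pages, text in pdf_files}

def make_select_handler(listbox, text_widget, text_by_name):
    """Create a <<ListboxSelect>> callback that shows the selected file's text."""
    def on_select(event):
        selection = listbox.curselection()
        if not selection:
            return  # curselection() can be empty; keep the current text
        filename = listbox.get(selection[0])
        # "1.0" is line 1, column 0; "end" is the same value as tk.END
        text_widget.delete("1.0", "end")
        text_widget.insert("end", text_by_name.get(filename, ""))
    return on_select
```

Usage: list_1.bind("<<ListboxSelect>>", make_select_handler(list_1, txt_3, build_text_lookup(pdfFiles))).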

How to click "Next" button until it no longer exists - Python, Selenium, Requests

I am scraping data from a webpage that is paginated. Once I finish scraping one page, I need to click the Next button and continue scraping the next page, and I need to stop once I have scraped all of the pages and a Next button no longer exists. Below is the HTML around the "Next" button that I need to click.
<tr align="center">
<td colspan="8" bgcolor="#FFFFFF">
<br>
<span class="paging">
<b> -- Page 1 of 3 -- </b>
</span>
<p>
<span class="paging">
<a href="page=100155&by=state&state=AL&pagenum=2"> .
<b>Next -></b>
</a>
</span>
<span class="paging">
Last ->>
</span>
</p>
</td>
</tr>
I have tried selecting by class and by link text, but neither has worked so far.
Two examples of my code:
while True:
    try:
        link = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.LINK_TEXT, "Next ->"))).click()
    except TimeoutException:
        break

while True:
    try:
        link = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CLASS_NAME, "paging"))).click()
    except TimeoutException:
        break
None of the solutions I have found online have worked; most of them end with the following error:
ElementClickInterceptedException: Message: element click intercepted: Element <a href="?page=100155&by=state&state=AL&pagenum=2">...</a> is not clickable at point (119, 840). Other element would receive the click: <body class="custom-background hfeed" style="position: relative; min-height: 100%; top: 0px;">...</body>
  (Session info: chrome=76.0.3809.132)
If the remainder of the error code would be helpful to review, please let me know and I will update the post with this error.
I have looked at the following resources, all to no avail:
Python Selenium clicking next button until the end
python - How to click "next" in Selenium until it's no longer available?
Python Selenium Click Next Button
Selenium clicking next button programmatically until the last page
How can I make Selenium click on the "Next" button until it is no longer possible?
Could anyone provide suggestions on how I can select the "Next" button (if it exists) and go to the next page with this set of HTML? Please let me know if you need any further clarification on the request.
We can approach this problem with either of two libraries: Selenium or Requests.
Approach: on every page, scrape the current page number and the next-page link.
Using Selenium (if the site is dynamic)
We can check whether the current page is the last page, and if it is not, follow the Next link (assuming the website uses the same paging HTML on every page).
import time

stop = False
driver.get(url)
while not stop:
    # ... scrape the data you need from the current page here ...
    paging_elements = driver.find_elements_by_class_name("paging")
    page_numbers = paging_elements[0].text.strip(" -- ").split("of")
    ## Getting the current page number and the final page number
    final = int(page_numbers[1].strip())
    current = int(page_numbers[0].split("Page")[-1].strip())
    if current == final:
        stop = True
    else:
        # "a" is a tag name, not a name attribute
        next_page_link = paging_elements[-2].find_element_by_tag_name("a").get_attribute('href')
        driver.get(next_page_link)
        time.sleep(5)  # This gap can be changed as per the load time of the page
Using Requests and BS4 (if the site is static)
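from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

r = requests.get(url)
stop = False
while not stop:
    soup = BeautifulSoup(r.text, 'html.parser')
    # ... scrape the data you need from the current page here ...
    paging_elements = soup.find_all('span', attrs={'class': "paging"})
    # get_text(strip=True) drops the surrounding newlines, so only " -" remain to strip
    page_numbers = paging_elements[0].get_text(strip=True).strip(" -- ").split("of")
    ## Getting the current page number and the final page number
    final = int(page_numbers[1].strip())
    current = int(page_numbers[0].split("Page")[-1].strip())
    if current == final:
        stop = True
    else:
        # The href is relative ("?page=..."), so resolve it against the current URL
        next_page_link = urljoin(r.url, paging_elements[-2].find("a").get('href'))
        r = requests.get(next_page_link)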
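The string splitting above depends on the exact spacing and dashes around the page counter. A slightly sturdier sketch pulls both numbers out with a regular expression; parse_page_counter is a hypothetical helper, and it assumes the counter text always reads "Page N of M".

```python
import re

# Matches "Page 1 of 3" anywhere in the text, ignoring surrounding dashes/whitespace.
PAGE_COUNTER = re.compile(r"Page\s+(\d+)\s+of\s+(\d+)")

def parse_page_counter(text):
    """Return (current, final) from text such as '-- Page 1 of 3 --'."""
    match = PAGE_COUNTER.search(text)
    if match is None:
        raise ValueError(f"no page counter found in {text!r}")
    return int(match.group(1)), int(match.group(2))
```

Usage: current, final = parse_page_counter(paging_elements[0].text) with Selenium, or parse_page_counter(paging_elements[0].get_text()) with BeautifulSoup; stop when current == final.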
Alternative approaches
One method is to use the website's URLs directly instead of clicking the button, since the button click is being intercepted in this case.
Many paginated websites add a page parameter to the URL (often visible only for pages >= 2). So, a paginated website might have URLs such as:
www.targetwebsite.com/category?page_num=1
www.targetwebsite.com/category?page_num=2
www.targetwebsite.com/category?page_num=3
and so on.
In such cases, one can simply iterate over the page numbers up to the final page number (as in the approach proposed above). This approach avoids breakage if the target website changes its CSS layout or styling.
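For this site, the href in the question suggests the page number is the pagenum query parameter. A sketch that builds every page URL from the first one, assuming that parameter name and a final page count read from the "Page N of M" counter; page_urls is a hypothetical helper, and the host in the usage example is a placeholder because the question does not give the site's address.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def page_urls(first_page_url, final_page, param="pagenum"):
    """Yield the URL for pages 1..final_page by setting one query parameter."""
    parts = urlsplit(first_page_url)
    query = dict(parse_qsl(parts.query))  # keeps the other parameters in order
    for number in range(1, final_page + 1):
        query[param] = str(number)
        yield urlunsplit(parts._replace(query=urlencode(query)))
```

Usage: for url in page_urls("https://www.example.com/list?page=100155&by=state&state=AL", final): r = requests.get(url).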
Furthermore, you may need to build next_page_link by joining the relative href with the base URL, as is done for next_url in another question (lines 40-41 there):
next_url = next_link.find("a").get("href")
r = session.get("https://reverb.com/marketplace" + next_url)
I hope this helps!
It sounds like you're asking two different questions here:
How to click the Next button until it no longer exists
How to click the Next button with JavaScript.
Here's a solution to #2, JavaScript clicking:
public static void ExecuteJavaScriptClickButton(this IWebDriver driver, IWebElement element)
{
    ((IJavaScriptExecutor)driver).ExecuteScript("arguments[0].click();", element);
}
In the code above, you cast your WebDriver instance to IJavaScriptExecutor, which lets you run JavaScript through Selenium. Because this is an extension method (note the this parameter), it must be declared inside a static class. The parameter element is the element you wish to click; in this case, the Next button.
Based on your code sample, your JavaScript click may look something like this:
var nextButton = driver.FindElement(By.LinkText("Next ->"));
driver.ExecuteJavaScriptClickButton(nextButton);
Now, moving on to your other issue: clicking until the button no longer exists. I would implement this as a while loop that ends when the Next button no longer exists. I also recommend a helper that checks whether the Next button is present without throwing NoSuchElementException (which FindElement raises when nothing matches), so that a missing button does not break your test. Here's a sample that includes an ElementExists implementation:
public static bool ElementExists(this IWebDriver driver, By by)
{
    // FindElements returns an empty collection instead of throwing
    // NoSuchElementException, so no try/catch is needed
    return driver.FindElements(by).Count > 0;
}
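public void ClickNextUntilInvisible()
{
    // WebDriverWait lives in OpenQA.Selenium.Support.UI
    var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(10));
    while (driver.ElementExists(By.LinkText("Next ->")))
    {
        // find next button inside while loop so it does not go stale
        var nextButton = driver.FindElement(By.LinkText("Next ->"));
        // click next button using javascript
        driver.ExecuteJavaScriptClickButton(nextButton);
        // wait for the old page to unload so the next check does not find the same button
        wait.Until(d =>
        {
            try { return !nextButton.Displayed; }
            catch (StaleElementReferenceException) { return true; }
        });
    }
}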
This while loop checks for the presence of the Next button on each iteration. If the button does not exist, the loop ends. Inside the loop, we call driver.FindElement before each click, so that we do not get a StaleElementReferenceException.
Hope this helps.
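Since the question itself is in Python, here is roughly the same loop with the Python bindings. This is a sketch, assuming driver is a Selenium WebDriver; click_next_until_gone, scrape_page and wait_after_click are hypothetical names. The default locator ("link text", "Next ->") is the same as (By.LINK_TEXT, "Next ->"), because By.LINK_TEXT is the string "link text"; writing it out keeps the sketch importable without Selenium.

```python
def click_next_until_gone(driver, locator=("link text", "Next ->"),
                          scrape_page=None, wait_after_click=None, max_pages=1000):
    """Scrape each page, then JavaScript-click "Next" until no such link exists.

    Returns the number of pages visited. max_pages guards against a site that
    keeps rendering a "Next" link on its last page.
    """
    pages = 0
    while pages < max_pages:
        if scrape_page is not None:
            scrape_page(driver)
        pages += 1
        # find_elements returns an empty list (no exception) when nothing matches
        buttons = driver.find_elements(*locator)
        if not buttons:
            break
        # A JavaScript click is dispatched on the element itself, so the
        # overlaying <body> that intercepted the native click is not involved.
        driver.execute_script("arguments[0].click();", buttons[0])
        if wait_after_click is not None:
            wait_after_click(buttons[0])  # e.g. wait for the old button to go stale
    return pages
```

Usage with a real driver: click_next_until_gone(driver, scrape_page=my_scraper, wait_after_click=lambda old: WebDriverWait(driver, 10).until(EC.staleness_of(old))), where my_scraper is your own function and WebDriverWait and EC come from selenium.webdriver.support.ui and selenium.webdriver.support.expected_conditions.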

compute Visible property for a button, based upon length of a textarea field

I would like to calculate the visibility of a button based on the content of a text area field (multiline edit box): it should contain at least some text.
I could use the onkeypress event (server side) and perform a partial refresh on the button, but then the partial refresh spinner appears while users are typing in the field. I would like to avoid this.
What options do I have?
You would be best off writing a client-side script for that event. The script below shows the button when there are more than 200 characters in the textarea (200 is just an example threshold; for "at least some text", check for a length greater than 0). You will need to set the button's style to visibility: hidden initially. If the form can be edited multiple times, write this as a function and call it on page load as well as in the key event.
If you can, use the keyup event instead of keypress: keypress fires before the new character is added to the field's value, so the length check would lag one character behind.
var textareaID = '#{id:textareaID}';
var buttonID = '#{id:buttonID}';
var textareaValue = document.getElementById(textareaID).value;
var visibility;
if (textareaValue.length > 200) {
    visibility = 'visible';
} else {
    visibility = 'hidden';
}
document.getElementById(buttonID).style.visibility = visibility;

Python Selenium, the text of the next element

I can get the text of the first element. But I do not know how to go through the entire list and get the text of each element. Here is the tree from the site:
[Screenshot: element tree]
So I get the text of the first element:
driver.find_element_by_xpath("//a[@my-peer-link='participant.user_id']").click()
print(driver.find_element_by_xpath("//span[@ng-bind=\"'@' + user.username\"]").text)
In each
<div class="md_modal_list_peer_wrap clearfix" ng-repeat="participant in chatFull.participants.participants">
there is a
<div class="md_modal_list_peer_name">
which contains
<a class="md_modal_list_peer_name" my-peer-link="participant.user_id">Олег</a>
which needs to be clicked. That is, I execute:
driver.find_element_by_xpath("//a[@my-peer-link='participant.user_id']").click()
After that, a new window opens, from which I get the text of the element:
driver.find_element_by_xpath("//span[@ng-bind=\"'@' + user.username\"]").text
But there are several of these elements, and I need to get the text from each of them:
<div class="md_modal_list_peer_wrap clearfix" ng-repeat="participant in chatFull.participants.participants">
How to do it?
Vladimir, I haven't done a careful analysis of this problem; however, could it be as simple as this?
Rather than using
driver.find_element_by_xpath("//span[@ng-bind=\"'@' + user.username\"]").text
could you use:
for span in driver.find_elements_by_xpath("//span[@ng-bind=\"'@' + user.username\"]"):
    print(span.text)
(Notice the plural in find_elements_by_xpath.)
You'll need to click each element and store the text in a list.
First use
elements_to_click = driver.find_elements_by_xpath("//a[@my-peer-link='participant.user_id']")
This returns a list of elements. Loop through them, and for each element:
1) click the element
2) switch to the new window (only if a new window handle was created; otherwise skip this step)
3) get the text via driver.find_element_by_xpath("//span[@ng-bind=\"'@' + user.username\"]").text
4) store it in a list
5) close the window (only if a new window handle was created; otherwise skip this step)
6) switch back to the previous window
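The steps above might look like this in Python. This is a sketch, assuming driver is a Selenium WebDriver; collect_usernames and close_popup are hypothetical names, and "xpath" is the value of By.XPATH. It only switches windows if the click actually opened one; for an in-page modal, pass a close_popup function that dismisses it before the next click. If a window opens asynchronously, a wait before comparing window handles would be needed.

```python
def collect_usernames(driver, link_xpath, username_xpath, close_popup=None):
    """Click every matching link, read the text it reveals, and return all texts."""
    texts = []
    original_window = driver.current_window_handle
    count = len(driver.find_elements("xpath", link_xpath))
    for i in range(count):
        # Re-locate the links on every pass in case the list was re-rendered
        link = driver.find_elements("xpath", link_xpath)[i]
        handles_before = set(driver.window_handles)
        link.click()
        new_handles = set(driver.window_handles) - handles_before
        new_window = next(iter(new_handles), None)
        if new_window is not None:
            driver.switch_to.window(new_window)
        texts.append(driver.find_element("xpath", username_xpath).text)
        if new_window is not None:
            driver.close()  # closes only the current (new) window
            driver.switch_to.window(original_window)
        elif close_popup is not None:
            close_popup(driver)  # hypothetical: dismiss an in-page modal
    return texts
```

Usage: collect_usernames(driver, "//a[@my-peer-link='participant.user_id']", "//span[@ng-bind=\"'@' + user.username\"]", close_popup=...).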
