Selenium scrapes only one result and ignores other related reults - python-3.x

I am new to selenium. Searching a web site, I get 10 results for each page. Those results are shown as lists (li tags) on the page and each list contains the same attributes. When my conditions are met, I go to another related web page and get desired content. However, when my code keeps looping for the lists, it fails to find the same attributes for the others. Here is my code:
p_url = "https://www.linkedin.com/vsearch/f?keywords=BARCO%2BNV%2Bkortrijk&pt=people&page_num=5"
driver.get(p_url)
time.sleep(5)
results = driver.find_element_by_id("results-container")
employees = results.find_elements_by_tag_name('li')
#emp_list = []
#for i in range(len(employees)):
# emp_list.append(employees[i])
for emp in employees:
try:
main_emp = emp.find_element_by_css_selector("a.title.main-headline")
name = emp.find_element_by_css_selector("a.title.main-headline").text
href = main_emp.get_attribute("href")
if name != "LinkedIn Member":
location = emp.find_element_by_class_name("demographic").text
href = main_emp.get_attribute("href")
print(href)
print(location)
driver.get(href)
exp = driver.find_element_by_id("background-experience")
amkk = exp.find_elements_by_class_name("editable-item")
for amk in amkk:
him = amk.find_element_by_tag_name("header").text
him2 = amk.find_element_by_class_name("experience-date-locale").text
if '\n' in him:
a = him.split('\n')
print(a[0])
print(a[1])
print(him2)
except Exception as exc:
print(exc)
continue
In this code the line main_emp = emp.find_element_by_css_selector("a.title.main-headline") stop working after it works for the first time. As a result I got an error of Message: stale element reference: element is not attached to the page document
From stackoverflow questions I saw that some say the content is removed from DOM structure and from another post someone suggested to fill a list with the results. Here what I have tried emp_list = []
for i in range(len(employees)):
emp_list.append(employees[i]) , however, it also did not work out.
How can I overcome this?

The selector you are using is wrong. You are getting the results using the results-container id. This works fine, but the collecting the elements form this is not working. It is returning more elements than just the employees (I'm not quite sure why).
If you change you selectors to this single selector you will get just the employees and no other unwanted elements.
employees = results.find_elements_by_css_selector("ol[id='results']>li")
Edit
Since you are opening the employees and losing the list of elements you might want to try opening the employee in a new tab, perform your actions here and close the tab afterwards.
Example:
for emp in employees:
try:
main_emp = emp.find_element_by_css_selector("a.title.main-headline")
# Do stuff you need...
# Open employee in new tab (make sure Keys is imported)
main_emp.send_keys(Keys.CONTROL + 't')
# Focus on new tab
driver.switch_to_window(d.window_handles[1])
# Do stuff inside the employee page
# Close the tab you opened
driver.close()
# Switch back to the first tab
driver.switch_to_window(d.window_handles[0])
Note: For OSX you should use main_emp.send_keys(Keys.COMMAND + 't')

Related

Selenium: Stale Element Reference Exception Error

I am trying to loop through all the pages of a website. but I am getting a stale element reference: element is not attached to the page document error. This happens when the script try to click the third page. The script got the error when it runs to page.click(). Any suggestions?
while driver.find_element_by_id('jsGrid_vgAllCases').find_elements_by_tag_name('a')[-1].text=='...':
links=driver.find_element_by_id('jsGrid_vgAllCases').find_elements_by_tag_name('a')
for link in links:
if ((link.text !='...') and (link.text !='ADD DOCUMENTS')):
print('Page Number: '+ link.text)
print('Page Position: '+str(links.index(link)))
position=links.index(link)
page=driver.find_element_by_id('jsGrid_vgAllCases').find_elements_by_tag_name('a')[position]
page.click()
time.sleep(5)
driver.find_element_by_id('jsGrid_vgAllCases').find_elements_by_tag_name('a')[-1].click()
You can locate the link element each time again according to the index, not to use elements found initially.
Something like this:
amount = len(driver.find_element_by_id('jsGrid_vgAllCases').find_elements_by_tag_name('a'))
for i in range(1,amount+1):
link = driver.find_element_by_xpath("(//*[#id='jsGrid_vgAllCases']//a)["+str(i) +"]")
from now you can continue within your for loop with this link like this:
amount = len(driver.find_element_by_id('jsGrid_vgAllCases').find_elements_by_tag_name('a'))
for i in range(1,amount+1):
link = driver.find_element_by_xpath("(//*[#id='jsGrid_vgAllCases']//a)["+str(i) +"]")
if ((link.text !='...') and (link.text !='ADD DOCUMENTS')):
print('Page Number: '+ link.text)
print('Page Position: '+str(links.index(link)))
position=links.index(link)
page=driver.find_element_by_id('jsGrid_vgAllCases').find_elements_by_tag_name('a')[position]
page.click()
time.sleep(5)
(I'm not sure about the correctness of all the rest your code, just copy-pasted it)
I'm running into an issue with the Stale Element Exception too. Interesting with Firefox no problem, Chrome && Edge both fail randomly. In general i have two generic find method with retry logic, these find methods would look like:
// Yes C# but should be relevant for any WebDriver...
public static IWebElement( this IWebDriver driver, By locator)
public static IWebElement( this IWebElement element, By locator)
The WebDriver variant seems to work fine for my othe fetches as the search is always "fresh"... But the WebElement search is the one causing grief. Unfortunately the app forces me to need the WebElement version. Why he page/html will be something like:
<node id='Best closest ID Possible'>
<span>
<div>text i want</div>
<div>meh ignore this </div>
<div>More text i want</div>
</span>
<span>
<!-- same pattern ... -->
So the code get the closest element possible by id and child spans i.e. "//*[#id='...']/span" will give all the nodes of interest. This is now where i run into issues, enumerating all element, will do two XPath select i.e. "./div[1]" and "./div[3]" for pulling out the text desired. It is only in fetching the text nodes under the elements where randomly a StaleElement will be thrown. Sometimes the very first XPath fails, sometimes i'll go through a few pages, as the pages being might have 10,000's or more pages, while the structure is the same i'll spot check random pages as they all the same format. At most i've gotten through 20 consecutive pages with Chrome (ver 92.0.4515.107) or Edge (ver 94.0.986), both seem to be the latest as of now.
One solution that should work, get all the the span elements first, i.e. '//*[#id='x']/span' get my list then query from the driver like:
var nodeList = driver.FindElements(By.XPath('//*[#id='x']/span' ));
for( int idx = 0 ; idx < nodeList.Count; idx++)
{
string str1 = driver.FindElements(By.XPath("//*[#id='x']/span[idx+1]/div[1]")).GetAttribute("innerText");
string str2 = driver.FindElements(By.XPath("//*[#id='x']/span[idx+1]/div[3]")).GetAttribute("innerText");
}
```
Think it would work but, YUK! This is kind of simplified and being able to do an XPath from the respective "ID" located node would be preferable..

Kodi addons : how to correctly set an URL using xbmcplugin.addDirectoryItems and xbmcgui.ListItem?

I'm trying to update a plugin for Kodi 19 (and Python3).
But! Hell! Their documentation is a mess, and when you search the internet, a lot of code is outdated.
I cannot understand how correctly create a virtual folder with items using xbmcplugin.addDirectoryItems.
here's my (simplified) code:
this is my KODI menu function
def menu_live():
#this is were I get my datas (from internet)
datas = api.get_live_videos()
listing = datas_to_list(datas)
sortable_by = (xbmcplugin.SORT_METHOD_DATE,
xbmcplugin.SORT_METHOD_DURATION)
xbmcplugin.addDirectoryItems(common.plugin.handle, listing, len(listing))
xbmcplugin.addSortMethod(common.plugin.handle, xbmcplugin.SORT_METHOD_LABEL)
xbmcplugin.endOfDirectory(common.plugin.handle)
this builds a list of items for the virtual folder
def datas_to_list(datas):
list_items = []
if datas and len(datas):
for data in datas:
li = data_to_listitem(data)
url = li.getPath()
list_items.append((url, li, True))
return list_items
this create a xbmcgui.ListItem for our listing
def data_to_listitem(data):
#here I parse my data to build a xbmcgui.ListItem
label = ...
url = ...
...
list_item = xbmcgui.ListItem(label)
list_item.setPath(url)
return list_item
I don't understand well how to interact with the media url.
It seems that it can be defined within xbmcgui.ListItem using
list_item.setPath(url)
which seems ok to me (an url is set to the item itself)
but then, it seems that you also need to set the URL when adding the item to the list,
li = data_to_listitem(data)
list_items.append((url, li, True))
This looks weird since it means you have to know the URL outside the function that builds the item.
So currently, my workaround is
li = data_to_listitem(data)
url = li.getPath() #I retrieve the URL defined in the above function
list_items.append((url, li, True))
That code works. But the question is: if I can define an URL on the ListItem using setPath(), then why should I also fill that URL when appending the ListItem to my listing list_items.append((url, li, True)) ?
Thanks a lot !
I'm not exactly sure what your question is. But Video/audio add-on development is thoroughly explained in these guides: https://kodi.wiki/view/HOW-TO:Audio_addon, https://kodi.wiki/view/Audio-video_add-on_tutorial and https://kodi.wiki/view/HOW-TO:Video_addon. Have a look at them, especially the video-add-on guide (as pointed out by Roman), and try to adapt to your case.
Edit
But the question is: if I can define an URL on the ListItem using setPath(), then why should I also fill that URL when appending the
ListItem to my listing?
I'm far from an expert, but from my understanding and in the context of https://kodi.wiki/view/HOW-TO:Video_addon tutorial, the url in
list_items.append((url, li, is_folder))
is used to route your plugin to your playback function, as well as passing arguments to it (e.g. video url and possibly other useful stuff needed for playback). That is, the list item passed here doesn't need to have its path set.
ListItem.setPath(video_url)
on the other hand, is for resolving the video url and start the playback after you have selected an item.

Selenium Stale Element Reference Errors (Seems Random)?

I know there have been several questions asked regarding stale elements, but I can't seem to resolve these.
My site is private so unfortunately can't share, but seems to always throw the error somewhere within the below for-loop. This loop is meant to get the text of each row in a table (number of rows varies). I've assigned WebDriverWait commands and have a very similar for-loop earlier in my code to do the same thing in another table on the website which works perfectly. I've also tried including the link click command and table, body, and tableText definition inside the loop to redefine at every iteration.
Once the code stops and the error message displays (stale element reference: element is not attached to the page document (Session info: chrome=89.0.4389.128)), if I manually run everything line-by-line, it all seems to work and correctly grabs the text.
Any ideas? Thanks!
link = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.LINK_TEXT, "*link address*")))
link.click()
table = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.ID, "TableId")))
body = tableSig.find_element(By.CLASS_NAME, "*table body class*")
tableText = body.find_elements(By.TAG_NAME, "tr")
rows = len(tableText)
approvedSigs = [None]*rows
for i in range(1, rows+1):
approvedSigs[i-1] = (tableText[i-1].text)
approvedSigs[i-1] = approvedSigs[i-1].lstrip()
approvedSigs[i-1] = approvedSigs[i-1][9:]
approvedSigs[i-1] = approvedSigs[i-1].replace("\n"," ")

how to use selenium python to open new chat whatsapp (i need to target the second icon New Chat)

I need to target the second icon New Chat but they have the same class name
from selenium import webdriver
driver = webdriver.Chrome('C:/Users/ka-my/AppData/Local/Programs/Python/Python37-32/chromedriver')
driver.get('https://web.whatsapp.com/')
input('Enter anything after scanning QR code')
user1 = driver.find_element_by_class_name('_3j8Pd')
user1.click()
1.i need to target the second icon New Chat
just like Facebook and google the class names are dynamically generated So the best way around that is to look for something constant which is the icon string
new_chat = driver.find_elements_by_xpath('//div[#title="New chat"]') # return a list
if new_chat:
new_chat[0].click()
To get 2nd icon in new chat, you can use this:
# get the 2nd element in the list
second_icon = driver.find_elements_by_xpath("//div[#class='_3j8Pd']")[1]
Or:
# get the 2nd element in the list
second_icon = driver.find_elements_by_xpath("//div[#class='_3j8Pd'][2]")
In first example, we are getting a list of all the div elements, and picking the 2nd item using the [1] index. In second example, we are using element index in XPath [2] to get the second element in the list. List index is 0-based and XPath element index is 1-based, so that is why we see 1 and 2 here.

Filtering Haystack (SOLR) results by django_id

With Django/Haystack/SOLR, I'd like to be able to restrict the result of a search to those records within a particular range of django_ids. Getting these IDs is not a problem, but trying to filter by them produces some unexpected effects. The code looks like this (extraneous code trimmed for clarity):
def view_results(request,arg):
# django_ids list is first calculated using arg...
sqs = SearchQuerySet().facet('example_facet') # STEP_1
sqs = sqs.filter(django_id__in=django_ids) # STEP_2
view = search_view_factory(
view_class=SearchView,
template='search/search-results.html',
searchqueryset=sqs,
form_class=FacetedSearchForm
)
return view(request)
At the point marked STEP_1 I get all the database records. At STEP_2 the records are successfully narrowed down to the number I'd expect for that list of django_ids. The problem comes when the search results are displayed in cases where the user has specified a search term in the form. Rather than returning all records from STEP_2 which match the term, I get all records from STEP_2 plus all from STEP_1 which match the term.
Presumably, therefore, I need to override one/some of the methods in for SearchView in haystack/views.py, but what? Can anyone suggest a means of achieving what is required here?
After a bit more thought, I found a way around this. In the code above, the problem was occurring in the view = search_view_factory... line, so I needed to create my own SearchView class and override the get_results(self) method in order to apply the filtering after the search has been run with the user's search terms. The result is code along these lines:
class MySearchView(SearchView):
def get_results(self):
search = self.form.search()
# The ID I need for the database search is at the end of the URL,
# but this may have some search parameters on and need cleaning up.
view_id = self.request.path.split("/")[-1]
view_query = MyView.objects.filter(id=view_id.split("&")[0])
# At this point the django_ids of the required objects can be found.
if len(view_query) > 0:
view_item = view_query.__getitem__(0)
django_ids = []
for thing in view_item.things.all():
django_ids.append(thing.id)
search = search.filter_and(django_id__in=django_ids)
return search
Using search.filter_and rather than search.filter at the end was another thing which turned out to be essential, but which didn't do what I needed when the filtering was being performed before getting to the SearchView.

Resources