I want to scrape terms and their details from the SAP Glossary website.
I can only get 50 terms at the moment, because I couldn't figure out how to click 'load more' and then keep scrolling down to scrape more terms.
I noticed that the 'load more' button has to turn orange before it becomes clickable.
page_url = "https://help.sap.com/glossary/?locale=en-US&search=CRM"
driver.get(page_url)
driver.maximize_window()
element = driver.find_elements(by=By.XPATH,value='//a[@role="menuitem"]')
load_more = driver.find_elements(by=By.CSS_SELECTOR,value='button.motion-button')
detail = []
c = driver.find_elements(by=By.TAG_NAME,value='p')
for i in range(51):
    element[i].click()
    detail.append(c[0].text)
    print(i,c[0].text)
driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
I found a video that covers exactly what I need. It's not about the 'Load more' button: you need to find the JSON file instead.
https://www.youtube.com/watch?v=qqNufBruvUc
The following code should meet your requirement. While the 'load more' button exists, click it to load more data; once all the data has loaded, use 'find_elements' to get the element collection.
from time import sleep
from clicknium import clicknium as cc
if not cc.chrome.extension.is_installed():
    cc.chrome.extension.install_or_update()
tab = cc.chrome.open("https://help.sap.com/glossary/?locale=en-US&search=CRM")
load_more = tab.find_element_by_css_selector('button.motion-button')
while tab.is_existing_by_css_selector('button.motion-button'):
    load_more.click()
    sleep(1)
elements = tab.find_elements_by_xpath('//a[@role="menuitem"]')
for element in elements:
    element.click()
    print(element.get_text())
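For readers using plain Selenium rather than clicknium, the same click-until-gone idea can be sketched as below. This is a minimal sketch and has not been run against the SAP page: it assumes `driver` is an already-created Selenium 4 WebDriver and that the button still matches the `button.motion-button` selector from the question. The function name and the `max_clicks` and `pause` parameters are hypothetical additions.

```python
import time

# Same string values as selenium's By.CSS_SELECTOR and By.XPATH,
# spelled out so this helper has no import-time dependency on selenium.
CSS_SELECTOR = "css selector"
XPATH = "xpath"


def load_all_terms(driver, max_clicks=200, pause=1.0):
    """Click 'load more' until it disappears, then return the term links.

    max_clicks and pause are illustrative safety knobs, not from the post.
    """
    for _ in range(max_clicks):
        # Re-find the button on every pass so a re-rendered button is not stale.
        buttons = driver.find_elements(CSS_SELECTOR, "button.motion-button")
        if not buttons:
            break
        # A JavaScript click avoids "element not clickable" overlay problems.
        driver.execute_script("arguments[0].click();", buttons[0])
        time.sleep(pause)  # crude wait for the next batch to render
    return driver.find_elements(XPATH, '//a[@role="menuitem"]')
```

Calling `load_all_terms(driver)` after `driver.get(page_url)` would return the full list of term links, assuming the button is removed from the page once nothing is left to load.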
I need to scrape some data from this URL:
https://www.cnrtl.fr/definition/coupe
The data I need to scrape is located in these 3 different tabs:
I'm unable to click the onclick element that should let me switch from one tab to another.
Here is the HTML code for one of the 3 onclick elements:
The 3 onclick elements differ from each other only by the number at the end:
#COUPE1:
return sendRequest(5,'/definition/coupe//0');
#COUPE2:
return sendRequest(5,'/definition/coupe//1');
#COUPER:
return sendRequest(5,'/definition/coupe//2');
I tried to find them by link text, partial link text, xpath and css selector.
I've followed this thread:
Python + Selenium: How can click on "onclick" elements?
I also tried the contains() and text() methods, without success.
There are a few ways you could do this. I chose this method because the page reloads, which causes the elements to become stale.
import time
from selenium.webdriver.common.by import By

#Get the URL
driver.get("https://www.cnrtl.fr/definition/coupe")
#Find the parent element of the tabs
tabs = driver.find_element(By.ID, 'vtoolbar')
#Get all the list items under the parent (tabs)
lis = tabs.find_elements(By.TAG_NAME, 'li')
#loop over them (skipping the first tab, because that's already loaded)
for i in range(1, len(lis)):
    #Execute the same JS as the page would on click, using the index of the loop
    driver.execute_script(f"sendRequest(5,'/definition/coupe//{i}');")
    #Sleep to visualise the clicking
    time.sleep(3)
I am trying to add information to my Listbox while keeping it at the size I set when I configure it. Here is my code for the Listbox with the scrollbar, and an image of what it looks like.
Picture of the listbox.
taskList = Listbox(setBox, bg="#1B2834",fg="white")
taskList.configure(width=183,height=39)
taskList.pack(side=LEFT,fill=BOTH)
taskScroll = Scrollbar(setBox)
taskScroll.configure(bg="#1B2834",width=18)
taskScroll.pack(side = RIGHT, fill = BOTH)
taskList.config(yscrollcommand = taskScroll.set)
taskScroll.config(command = taskList.yview)
Now, when I click a button, the command executes the following code:
def savetasks():
    #make tasks
    letters = string.ascii_uppercase
    result_str = ''.join(random.choice(letters) for i in range(4))
    num = str(random.randrange(0,9))
    taskIDnum = num+result_str
    taskIDLBL = Label(taskList, text=taskIDnum,bg="#1B2834", fg="White")
    taskIDLBL.configure(padx=20,pady=10)
    taskIDLBL.pack()
This code works fine as well, creating new labels with a random ID but it resizes the listbox to look like this...
Picture of the list box after clicking the button to execute the command.
Lastly, the scrollbar does not scroll: when I create so many IDs that they run off my screen, I cannot use it to scroll down and see them. Is there a way to stop the Listbox from being resized, and is it possible to give the Listbox a maximum and minimum height?
If there is an easier way to do this without a Listbox, please let me know. I just need to be able to scroll down to see all the other IDs, and I didn't see any way to use a scrollbar other than with a Listbox.
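One possible direction, offered as a minimal sketch rather than a definitive fix: a Listbox scrolls its own rows, so adding each ID with `insert()` instead of packing a Label into the Listbox keeps the configured size and lets the scrollbar work. The names `make_task_id`, `save_task` and `main`, and the `width=40, height=15` values, are assumptions made for this illustration.

```python
import random
import string


def make_task_id():
    """Build an ID such as '3QZKD': one digit followed by four uppercase letters."""
    letters = "".join(random.choice(string.ascii_uppercase) for _ in range(4))
    return str(random.randrange(0, 9)) + letters


def save_task(task_list):
    """Append a new ID as a Listbox row instead of packing a Label into it."""
    task_id = make_task_id()
    task_list.insert("end", task_id)  # "end" is the same value as tk.END
    return task_id


def main():
    # Imported here so the helpers above can be used without a display.
    import tkinter as tk

    root = tk.Tk()
    frame = tk.Frame(root)
    frame.pack(fill=tk.BOTH, expand=True)

    # width and height are in characters and rows; the values are illustrative.
    task_list = tk.Listbox(frame, bg="#1B2834", fg="white", width=40, height=15)
    task_list.pack(side=tk.LEFT, fill=tk.BOTH, expand=True)

    scroll = tk.Scrollbar(frame, command=task_list.yview)
    scroll.pack(side=tk.RIGHT, fill=tk.Y)
    task_list.config(yscrollcommand=scroll.set)

    tk.Button(root, text="Add task", command=lambda: save_task(task_list)).pack()
    root.mainloop()
```

Running `main()` opens the window; each click on "Add task" adds one row, and the Listbox keeps its size while the scrollbar covers the overflow.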
I have 60 buttons on a page and I want to click all of them. Is it possible to create a loop to do so?
XPATH for the buttons:
/html/body/form/div[3]/div[4]/table/tbody/tr/td[2]/div/table/tbody/tr[2]/td/div/div/div/ct-polling-activity/div/div/div/div[2]/div[1]/table/tbody/tr[1]/td[6]/div/i
/html/body/form/div[3]/div[4]/table/tbody/tr/td[2]/div/table/tbody/tr[2]/td/div/div/div/ct-polling-activity/div/div/div/div[2]/div[1]/table/tbody/tr[2]/td[6]/div/i
The only number that differs between them is the index in the last tr[ ], which runs in sequence up to 60.
This is the function that I'm using to click the buttons.
def explicit_wait_xpath(my_selector):
    wait = WebDriverWait(driver, 10)
    element = wait.until(EC.element_to_be_clickable((By.XPATH, my_selector)))
    element.click()
Based on your question, I think you can just create a loop, update the selector dynamically, and pass it to the function call.
def explicit_wait_xpath(my_selector):
    wait = WebDriverWait(driver, 10)
    element = wait.until(EC.element_to_be_clickable((By.XPATH, my_selector)))
    element.click()

for i in range(1, 61):
    # use an f-string to substitute the row index into the XPath
    # (no extra quotes inside the string, or the XPath becomes a string literal)
    selector = f'/html/body/form/div[3]/div[4]/table/tbody/tr/td[2]/div/table/tbody/tr[2]/td/div/div/div/ct-polling-activity/div/div/div/div[2]/div[1]/table/tbody/tr[{i}]/td[6]/div/i'
    explicit_wait_xpath(selector)
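To check the generated selectors before handing them to Selenium, the string-building step can be separated out. The sketch below only builds the strings; `ROW_XPATH` and `build_row_xpaths` are hypothetical names introduced here, and the path is the one quoted in the question.

```python
# The XPath from the question, with the row index left as a placeholder.
ROW_XPATH = (
    "/html/body/form/div[3]/div[4]/table/tbody/tr/td[2]/div/table/tbody/tr[2]"
    "/td/div/div/div/ct-polling-activity/div/div/div/div[2]/div[1]"
    "/table/tbody/tr[{row}]/td[6]/div/i"
)


def build_row_xpaths(count=60):
    """Return one XPath per table row, for rows 1..count."""
    return [ROW_XPATH.format(row=row) for row in range(1, count + 1)]


for xpath in build_row_xpaths(3):
    print(xpath)
```

Each string returned by `build_row_xpaths()` can then be passed to `explicit_wait_xpath` in a loop.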
Disclaimer: I have only begun to learn about Python. I took a crash course just to learn the very basics about a month ago and the rest of my efforts to learn have all been research through Google and looking at solutions here in Stack Overflow.
I am trying to create an application that will read all PDF files stored in a folder and extract their filenames, page numbers, and the contents of the first page, and store this information in a 2D list. Once this is done, the application will create a tkinter GUI with 2 listboxes and 1 text box.
The application should display the PDF filenames in the first listbox, and the corresponding page numbers of each file in the second listbox. Both listboxes are synched in scrolling.
The text box should display the text contents on the first page of the PDF.
What I want to happen is that each time I click a PDF filename in the first listbox with the mouse or with up or down arrow keys, the application should display the contents of the first page of the selected file in the text box.
This is how my GUI looks and how it should function
https://i.stack.imgur.com/xrkvo.jpg
I have been successful in all other requirements so far except the part where when I select a filename in the first listbox, the contents of the first page of the PDF should be displayed in the text box.
Here is my code for populating the listboxes and text box. The contents of my 2D list pdfFiles are [['PDF1 filename', 'PDF1 total pages', 'PDF1 text content of first page'], ['PDF2 filename', 'PDF2 total pages', 'PDF2 text content of first page'], ...], etc.
===========Setting the Listboxes and Textbox=========
scrollbar = Scrollbar(list_2)
scrollbar.pack(side=RIGHT, fill=Y)
list_1.config(yscrollcommand=scrollbar.set)
list_1.bind("<MouseWheel>", scrolllistbox2)
list_2.config(yscrollcommand=scrollbar.set)
list_2.bind("<MouseWheel>", scrolllistbox1)
txt_3 = tk.Text(my_window, font='Arial 10', wrap=WORD)
txt_3.place(relx=0.5, rely=0.12, relwidth=0.472, relheight=0.86)
scrollbar = Scrollbar(txt_3)
scrollbar.pack(side=RIGHT, fill=Y)
list_1.bind("<<ListboxSelect>>", CurSelect)
============Populating the Listboxes with the content of the 2D list===
i = 0
while i < count:
    list_1.insert(tk.END, pdfFiles[i][0])
    list_2.insert(tk.END, pdfFiles[i][1])
    i = i + 1
============Here is my code for CurSelect function========
def CurSelect(evt):
    values = [list_1.get(idx) for idx in list_1.curselection()]
    print(", ".join(values)) ????
========================
The print command above is just my test command to show that I have successfully extracted the selected item in the listbox. What I need now is to somehow link that information to its corresponding page content in my 2D list and display it in the text box.
Something like:
1) select the filename in the listbox
2) link the selected filename to the filenames stored in the pdfFiles 2D list
3) once filename is found, identify the corresponding text of the first page
4) display the text of the first page of the selected file in the text box
I hope I am making sense. Please help.
You don't need much to finish it. You just need some small things:
1. Get the selected item of your listbox:
selected_indexes = list_1.curselection()
first_selected = selected_indexes[0] # it's possible to select multiple items
2. Get the corresponding PDF text:
pdf_text = pdfFiles[first_selected][2]
3. Change the text of your Text widget: (from https://stackoverflow.com/a/20908371/8733066)
txt_3.delete("1.0", tk.END)
txt_3.insert(tk.END, pdf_text)
So replace your CurSelect(evt) function with this:
def CurSelect(evt):
    selected_indexes = list_1.curselection()
    first_selected = selected_indexes[0]
    pdf_text = pdfFiles[first_selected][2]
    txt_3.delete("1.0", tk.END)
    txt_3.insert(tk.END, pdf_text)
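One caveat: the `<<ListboxSelect>>` event can fire while nothing is selected (for example when the selection moves to another widget), in which case `selected_indexes[0]` raises an IndexError. A minimal sketch of a guarded lookup follows; `first_page_text` and `cur_select` are hypothetical names, while `pdfFiles`, `list_1` and `txt_3` are the objects from the question and are assumed to exist when the handler runs.

```python
def first_page_text(pdf_files, selection):
    """Return the first-page text for the first selected row, or None.

    pdf_files is the list of [filename, page_count, first_page_text] rows;
    selection is the tuple returned by Listbox.curselection().
    """
    if not selection:
        return None
    return pdf_files[selection[0]][2]


def cur_select(evt):
    # list_1, txt_3 and pdfFiles come from the question's own code.
    text = first_page_text(pdfFiles, list_1.curselection())
    if text is None:
        return  # nothing selected: leave the text box as it is
    txt_3.delete("1.0", "end")
    txt_3.insert("end", text)
```

Binding `cur_select` in place of `CurSelect` keeps the behaviour of the answer above while ignoring empty selections.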
I am scraping data from a webpage that is paginated, and once I finish scraping one page, I need to click the next button and continue scraping the next page. I then need to stop once I have scraped all of the pages and a next button no longer exists. Below is the HTML around the "Next" button that I need to click.
<tr align="center">
  <td colspan="8" bgcolor="#FFFFFF">
    <br>
    <span class="paging">
      <b> -- Page 1 of 3 -- </b>
    </span>
    <p>
      <span class="paging">
        <a href="page=100155&by=state&state=AL&pagenum=2"> .
          <b>Next -></b>
        </a>
      </span>
      <span class="paging">
        Last ->>
      </span>
    </p>
  </td>
</tr>
I have tried selecting on class and on link text, and neither has worked in my current attempts.
Two examples of my code:
while True:
    try:
        link = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.LINK_TEXT, "Next ->"))).click()
    except TimeoutException:
        break
while True:
    try:
        link = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CLASS_NAME, "paging"))).click()
    except TimeoutException:
        break
None of the solutions I have found online have worked, and most have ended with the following error:
ElementClickInterceptedException: Message: element click
intercepted: Element <a href="?
page=100155&by=state&state=AL&pagenum=2">...</a> is not
clickable at point (119, 840). Other element would receive the
click: <body class="custom-background hfeed" style="position:
relative; min-height: 100%; top: 0px;">...</body>
(Session info: chrome=76.0.3809.132)
If the remainder of the error code would be helpful to review, please let me know and I will update the post with this error.
I have looked at the following resources, all to no avail:
Python Selenium clicking next button until the end
python - How to click "next" in Selenium until it's no longer available?
Python Selenium Click Next Button
Python Selenium clicking next button until the end
Selenium clicking next button programmatically until the last page
How can I make Selenium click on the "Next" button until it is no longer possible?
Could anyone provide suggestions on how I can select the "Next" button (if it exists) and go to the next page with this set of HTML? Please let me know if you need any further clarification on the request.
We can approach this problem with two major libraries: selenium and requests.
Approach - Scrape the page for the page number and the next-page link every time
Using Selenium (If the site is Dynamic)
We can check whether the page we are on is the last page, and if it is not, look for the next button (assuming the website follows the same HTML structure for paging on all pages).
import time
from selenium.webdriver.common.by import By

stop = False
driver.get(url)
while not stop:
    paging_elements = driver.find_elements(By.CLASS_NAME, "paging")
    page_numbers = paging_elements[0].text.strip(" -- ").split("of")
    ## Getting the current page number and the final page number
    final = int(page_numbers[1].strip())
    current = int(page_numbers[0].split("Page")[-1].strip())
    if current == final:
        stop = True
    else:
        # The second-to-last "paging" span holds the Next link
        next_page_link = paging_elements[-2].find_element(By.TAG_NAME, "a").get_attribute('href')
        driver.get(next_page_link)
        time.sleep(5) # This gap can be changed as per the load time of the page
Using Requests and BS4 (If the site is static)
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

r = requests.get(url)
stop = False
while not stop:
    soup = BeautifulSoup(r.text, 'html.parser')
    paging_elements = soup.find_all('span', attrs={'class': "paging"})
    page_numbers = paging_elements[0].text.strip(" -- ").split("of")
    ## Getting the current page number and the final page number
    final = int(page_numbers[1].strip())
    current = int(page_numbers[0].split("Page")[-1].strip())
    if current == final:
        stop = True
    else:
        # The href is relative, so resolve it against the current page URL
        next_page_link = urljoin(r.url, paging_elements[-2].find("a").get('href'))
        r = requests.get(next_page_link)
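Both loops above depend on parsing the "-- Page 1 of 3 --" indicator with strip() and split(). A regular expression is a somewhat sturdier alternative; the sketch below assumes the indicator always contains "Page N of M", as in the HTML from the question, and `parse_page_indicator` is a hypothetical helper name.

```python
import re

# Matches the "Page 1 of 3" part of the paging text from the question.
PAGE_RE = re.compile(r"Page\s+(\d+)\s+of\s+(\d+)")


def parse_page_indicator(text):
    """Return (current_page, final_page) parsed from the paging text."""
    match = PAGE_RE.search(text)
    if match is None:
        raise ValueError(f"no page indicator found in {text!r}")
    return int(match.group(1)), int(match.group(2))


current, final = parse_page_indicator(" -- Page 1 of 3 -- ")
print(current, final)
```

With this helper, the stop condition in either loop becomes `current == final` without the chained strip() and split() calls.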
Alternative approaches
One method is to use the URL of the website itself instead of clicking the button, since the button click is intercepted in this case.
Many paginated sites put a page parameter in their URL (often visible only for pages >= 2). So, a paginated website might have URLs such as:
www.targetwebsite.com/category?page_num=1
www.targetwebsite.com/category?page_num=2
www.targetwebsite.com/category?page_num=3
and so on.
In such cases, one can simply iterate over the page numbers up to the final page number (as originally laid out in the proposed answer). This approach avoids breakage if the target website changes its CSS layout or style.
Furthermore, you may need to build next_page_link by prepending the base URL, as done for next_url in the other question (lines 40-41):
next_url = next_link.find("a").get("href")
r = session.get("https://reverb.com/marketplace" + next_url)
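As an alternative to concatenating strings, the standard library's `urljoin` resolves a relative href against the URL of the page it came from. The sketch below is illustrative only: `absolute_next_url` is a hypothetical helper, and the `example.com` address is a placeholder, not the site from the question.

```python
from urllib.parse import urljoin


def absolute_next_url(current_url, href):
    """Resolve a (possibly relative) href against the page it was found on."""
    return urljoin(current_url, href)


# A query-only href keeps the path of the current page and swaps the query.
print(absolute_next_url(
    "https://example.com/dir/list.php?pagenum=1",  # placeholder URL
    "?page=100155&by=state&state=AL&pagenum=2",
))
```

An href that is already absolute is returned unchanged, so the same call works whichever form the site uses.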
I hope this helps!
It sounds like you're asking two different questions here:
1. How to click the Next button until it no longer exists
2. How to click the Next button with JavaScript
Here's a solution to #2 -- JavaScript clicking:
public static void ExecuteJavaScriptClickButton(this IWebDriver driver, IWebElement element)
{
    // Click the element through the browser's JavaScript engine
    ((IJavaScriptExecutor) driver).ExecuteScript("arguments[0].click();", element);
}
In the above code, you have to cast your WebDriver instance to IJavaScriptExecutor, which allows you to run JS code through Selenium. The parameter element is the element you wish to click -- in this case, the Next button.
Based on your code sample, your JavaScript click may look something like this:
var nextButton = driver.FindElement(By.LinkText("Next ->"));
driver.ExecuteJavaScriptClickButton(nextButton);
Now, moving on to your other issue -- clicking until the button is no longer visible. I would implement this as a while loop that breaks whenever the Next button no longer exists. I also recommend implementing a function that checks for the presence of the Next button and swallows the NoSuchElementException raised when the button does not exist, to avoid breaking your test. Here's a sample that includes an ElementExists implementation:
// Extension method: must be declared in a static class
public static bool ElementExists(this IWebDriver driver, By by)
{
    // attempt to find the element -- return true if we find it
    try
    {
        return driver.FindElements(by).Count > 0;
    }
    // catch exception where we did not find the element -- return false
    catch (Exception e)
    {
        return false;
    }
}
public void ClickNextUntilInvisible()
{
    while (driver.ElementExists(By.LinkText("Next ->")))
    {
        // find next button inside while loop so it does not go stale
        var nextButton = driver.FindElement(By.LinkText("Next ->"));
        // click next button using javascript
        driver.ExecuteJavaScriptClickButton(nextButton);
    }
}
This while loop checks for the presence of the Next button with each iteration. If the button does not exist, the loop breaks. Inside the loop, we call driver.FindElement with each successive click, so that we do not get a StaleElementReferenceException.
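Since the question is written in Python, the same loop can be sketched with the Python bindings. This is a minimal sketch under stated assumptions: `driver` is a Selenium 4 WebDriver, the link text is exactly "Next ->" as in the question's own attempt, and `click_next_until_gone`, `on_page` and `max_pages` are hypothetical names. In Python, `find_elements` returns an empty list instead of raising, so no exception handling is needed for the presence check.

```python
# Same string value as selenium's By.LINK_TEXT, spelled out so this helper
# has no import-time dependency on selenium.
LINK_TEXT = "link text"


def click_next_until_gone(driver, on_page=None, link_text="Next ->", max_pages=1000):
    """Click the Next link via JavaScript until it is gone; return the click count.

    on_page is an optional callback that scrapes the page currently shown.
    max_pages is an illustrative guard against an endless loop.
    """
    clicks = 0
    while True:
        if on_page is not None:
            on_page(driver)
        # Re-find the link on every pass so it cannot go stale.
        links = driver.find_elements(LINK_TEXT, link_text)
        if not links or clicks >= max_pages:
            return clicks
        driver.execute_script("arguments[0].click();", links[0])
        clicks += 1
        # A real run would wait here for the next page to finish loading.
```

If the link text on the live page carries extra characters, a partial link text locator on "Next" may match more reliably than the exact text.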
Hope this helps.