How to replace a selected string in a contenteditable div? - python-3.x

I'm trying to remove chapter titles from a contenteditable="true" div tag using Python and selenium-webdriver. First I search for the chapter title, which is usually on the first line, then I replace it with an empty value and save. The code appears to work, but the change is not saved after refreshing the browser. Here is my code:
# get the contenteditable div
input_field = driver.find_element_by_css_selector('.trumbowyg-editor')
# get the innerHTML of the contenteditable div
chapter_html = input_field.get_attribute('innerHTML')
chapter_content = input_field.get_attribute('innerHTML')
if re.search(r'<\w*>', chapter_html):
    chapter_content = re.split(r'<\w*>|</\w*>', chapter_html)
    first_chapter = chapter_content[1]
# replace first_chapter with ''
chapter_replace = chapter_html.replace(first_chapter, '')
# write the innerHTML back without the first_chapter string
driver.execute_script("arguments[0].innerHTML = arguments[1];", input_field, chapter_replace)
time.sleep(1)
# click the save button
driver.find_element_by_css_selector('.btn.save-button').click()
How can I handle this? It works when I do it manually, so it probably isn't a site problem/bug. Please help.
The relevant HTML is as follows:
<div class="trumbowyg-editor" dir="ltr" contenteditable="true">
<p>Chapter 1</p>
<p> There is some text</p>
<p> There is some text</p>
<p> There is some text</p>
</div>
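As a quick sanity check, the split-and-replace logic in the question can be exercised outside the browser with plain Python; the sample innerHTML below mirrors the div shown above, and nothing here touches Selenium:

```python
import re

# innerHTML as get_attribute('innerHTML') would return it for the sample div
chapter_html = ("<p>Chapter 1</p>"
                "<p> There is some text</p>"
                "<p> There is some text</p>"
                "<p> There is some text</p>")

# splitting on opening/closing tags puts the first paragraph's text at index 1
parts = re.split(r'<\w*>|</\w*>', chapter_html)
first_chapter = parts[1]  # 'Chapter 1'

# remove only the title text; the surrounding <p></p> stays behind
chapter_replace = chapter_html.replace(first_chapter, '')
print(chapter_replace)
```

This prints the HTML with an empty first paragraph, which suggests the string manipulation itself is sound and the lost edit happens on the browser side.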

As per the HTML you have shared, to replace the chapter title with an empty value you can induce WebDriverWait with the expected_conditions clause set to visibility_of_element_located and then clear the element through JavaScript. Note that innerHTML, innerText and textContent are DOM properties rather than attributes, so removeAttribute() has no effect on them; assign to them instead. Editors such as Trumbowyg also tend to register content changes only on an input event, so dispatching one after the assignment helps the subsequent save persist:
page_number = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='trumbowyg-editor' and @contenteditable='true']/p[contains(.,'Chapter')]")))
driver.execute_script("arguments[0].textContent = '';", page_number)
# or
driver.execute_script("arguments[0].innerHTML = '';", page_number)
# notify the editor that its content changed so the save picks it up
driver.execute_script("arguments[0].dispatchEvent(new Event('input', {bubbles: true}));", page_number)

Related

Python - Web Scraping: How to access a div tag of one class when getting data for div tags of multiple classes

I want div tags of two different classes in my result.
I am using the following command to scrape the data:
result = soup.select('div', {'class' : ['col-s-12', 'search-page-text clearfix row'] })
Now, I have one set of information in class 'col-s-12' and another set of information in class 'search-page-text clearfix row'.
I want to find the children of only the div tag with class 'col-s-12'. When I run the code below, it looks for children of both div tags, since I have not specified anywhere which class I want to search:
for div in result:
    print(div)
    prod_name = div.find("a", recursive=False)[0]  # should come from 'col-s-12' only
    prod_info = div.find("a", recursive=False)[0]  # should come from 'search-page-text clearfix row' only
Example -
<div class = 'col-s-12'>
This is what I want or variable **prod_name**
</div>
<div class = 'search-page-text clearfix row'>
<a> This should be stored in variable **prod_info** </a>
</div>
You can select the tag with class="col-s-12" (whose text sits directly inside it) and then use .find_next('a') to reach the next <a> tag in the document.
Note: the .select() method accepts only CSS selectors, not dictionaries.
For example:
txt = '''<div class = 'col-s-12'>
This is what I want or variable **prod_name**
</div>
<div class = 'search-page-text clearfix row'>
<a> This should be stored in variable **prod_info** </a>
</div>'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(txt, 'html.parser')
prod_name = soup.select_one('.col-s-12')
prod_info = prod_name.find_next('a')
print(prod_name.get_text(strip=True))
print(prod_info.get_text(strip=True))
Prints:
This is what I want or variable **prod_name**
This should be stored in variable **prod_info**
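If BeautifulSoup is unavailable, the same two-step lookup (take the first div's bare text, then the <a> in the following div) can be sketched with the standard library alone; the <root> wrapper is only there so xml.etree can parse the fragment:

```python
import xml.etree.ElementTree as ET

txt = """<root>
<div class="col-s-12">
This is what I want or variable **prod_name**
</div>
<div class="search-page-text clearfix row">
<a> This should be stored in variable **prod_info** </a>
</div>
</root>"""

root = ET.fromstring(txt)
divs = root.findall('div')
prod_name = divs[0].text.strip()            # bare text of the first div
prod_info = divs[1].find('a').text.strip()  # text of the <a> in the second div
print(prod_name)
print(prod_info)
```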

Selenium Can't Find Element Returning None or []

I'm having trouble accessing an element. Here is my code:
driver.get(url)
desc = driver.find_elements_by_xpath('//p[@class="somethingcss xxx"]')
and I'm trying another method like this:
desc = driver.find_elements_by_class_name('somethingcss xxx')
The element I am trying to find looks like this:
<div data-testid="descContainer">
<div class="abc1123">
<h2 class="xxx">The Description<span data-tid="prodTitle">The Description</span></h2>
<p data-id="paragraphxx" class="somethingcss xxx">sometext here
<br>text
<br>
<br>text
<br> and several text with
<br> tag below
</p>
</div>
<!--and another div tag below-->
I want to extract the p tag inside div class="abc1123", but it doesn't return any result, only [] when I try get_attribute or extract it to text.
When I extract another element with another class using this method, it works perfectly.
Does anyone know why I can't access these elements?
Try the following CSS selector to locate the p tag. (Note that find_elements_by_class_name accepts a single class name only, so a compound value like 'somethingcss xxx' will not match.)
print(driver.find_element_by_css_selector("p[data-id^='paragraph'][class^='somethingcss']").text)
OR Use get_attribute("textContent")
print(driver.find_element_by_css_selector("p[data-id^='paragraph'][class^='somethingcss']").get_attribute("textContent"))

Scrape a span text from multiple span elements of same name within a p tag in a website

I want to scrape the text from one span tag among multiple span tags with similar names, using Python and BeautifulSoup to parse the website.
I just cannot uniquely identify that specific gross-amount span element.
The span tag has name="nv" and a data-value, but the other one has those too. I just want to extract the gross dollar figure in millions.
Please advise.
This is the structure:
<p class="sort-num_votes-visible">
<span class="text-muted">Votes:</span>
<span name="nv" data-value="93122">93,122</span>
<span class="ghost">|</span>
<span class="text-muted">Gross:</span>
<span name="nv" data-value="69,645,701">$69.65M</span>
</p>
I want the text of the second span, the one following <span class="text-muted">Gross:</span>.
What you can do is find the <span> tag that has the text 'Gross:'. Then, once it finds that tag, tell it to go find the next <span> tag (which is the value amount), and get that text.
from bs4 import BeautifulSoup as BS
html = '''<p class="sort-num_votes-visible">
<span class="text-muted">Votes:</span>
<span name="nv" data-value="93122">93,122</span>
<span class="ghost">|</span>
<span class="text-muted">Gross:</span>
<span name="nv" data-value="69,645,701">$69.65M</span>
</p>'''
soup = BS(html, 'html.parser')
gross_value = soup.find('span', text='Gross:').find_next('span').text
Output:
print(gross_value)
$69.65M
or if you want to get the data-value, change that last line to:
gross_value = soup.find('span', text='Gross:').find_next('span')['data-value']
Output:
print(gross_value)
69,645,701
And finally, if you need those values as an integer instead of a string, so you can aggregate in some way later:
gross_value = int(soup.find('span', text='Gross:').find_next('span')['data-value'].replace(',', ''))
Output:
print(gross_value)
69645701
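If you only had the displayed text ('$69.65M') rather than the data-value, a small helper could normalize either form to whole dollars; parse_gross is a hypothetical name for illustration, not a BeautifulSoup or Selenium API:

```python
def parse_gross(value):
    """Normalize '69,645,701' or '$69.65M' to an integer number of dollars."""
    value = value.strip().lstrip('$')
    if value.endswith('M'):
        # '69.65M' means 69.65 million dollars
        return round(float(value[:-1]) * 1_000_000)
    return int(value.replace(',', ''))

print(parse_gross('69,645,701'))
print(parse_gross('$69.65M'))
```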

How can I click the third href link?

<ul id='pairSublinksLevel1' class='arial_14 bold newBigTabs'>...</ul>
<ul id='pairSublinksLevel2' class='arial_12 newBigTabs'>
<li>...</li>
<li>...</li>
<li>
<a href='/equities/...'> last data </a> #<-- HERE
</li>
<li>...</li>
My question is: how can I click the third li tag?
In my code:
xpath = "//ul[@id='pairSublinksLevel2']"
element = driver.find_element_by_xpath(xpath)
actions = element.find_element_by_css_selector('a').click()
The code works partially, but it keeps clicking the second a tag; I want to click the one in the third li tag.
Try
driver.find_element_by_xpath("//ul[@id='pairSublinksLevel2']/li[3]/a").click()
EDIT:
Thanks @DebanjanB for the suggestion:
When you get the element with xpath //ul[@id='pairSublinksLevel2'] and search for an a tag among its child elements, it returns the first match (in your case, inside the second li tag). So you can use indexing as given above to get the specific numbered match. Please note that such indexing starts from 1, not 0.
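The 1-based indexing is easy to verify without a browser: xml.etree in the standard library supports the same positional predicate. The menu below is a simplified, well-formed stand-in for the one in the question:

```python
import xml.etree.ElementTree as ET

menu = """<ul id="pairSublinksLevel2">
<li><a href="/a">first</a></li>
<li><a href="/b">second</a></li>
<li><a href="/equities/x">last data</a></li>
<li><a href="/d">fourth</a></li>
</ul>"""

ul = ET.fromstring(menu)
# li[3] selects the third li, exactly as in XPath: positions start at 1
third_link = ul.find('li[3]/a')
print(third_link.text, third_link.get('href'))
```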
As per the HTML you have shared you can use either of the following solutions:
Using link_text:
driver.find_element_by_link_text("last data").click()
Using partial_link_text:
driver.find_element_by_partial_link_text("last data").click()
Using css_selector:
driver.find_element_by_css_selector("ul.newBigTabs#pairSublinksLevel2 a[href*='equities']").click()
Using xpath:
driver.find_element_by_xpath("//ul[@class='arial_12 newBigTabs' and @id='pairSublinksLevel2']//a[contains(@href,'equities') and contains(.,'last data')]").click()
Reference: Official locator strategies for the webdriver

Python to click a button/link in JavaScript

I want to show more of the page by clicking a button that runs a JavaScript function (さらに表示 means "Show more"), as below:
<div class="loading" style="display:none;">
<p class="btn blue"><span>さらに表示</span></p>
</div>
I tried the code below, but it doesn't work. What can I do?
more_info_button = driver.find_element_by_tag_name('a').get('href=javascript:void(0);')
more_info_button.click()
If you want to click a link whose @href attribute equals "javascript:void(0);", try
more_info_button = driver.find_element_by_xpath('//a[#href="javascript:void(0);"]')
more_info_button.click()
Same with CSS selector:
more_info_button = driver.find_element_by_css_selector('a[href="javascript:void(0);"]')
To locate link by text in preceding paragraph:
more_info_button = driver.find_element_by_xpath('//a[preceding-sibling::p[.="さらに表示"]]')
Update
Try below code to get extended topics list after clicking the button:
from selenium.webdriver.support.ui import WebDriverWait
topics_number = len(driver.find_elements_by_class_name('topics'))
more_info_button.click()
WebDriverWait(driver, 10).until(lambda driver: len(driver.find_elements_by_class_name('topics')) > topics_number)
extended_list = driver.find_elements_by_class_name('topics')
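The WebDriverWait(lambda) pattern above is just polling until a predicate returns a truthy value. A minimal stand-alone sketch of the same idea follows; wait_until is an illustrative helper, not a Selenium API, and the topics list is a toy stand-in for the page's '.topics' elements:

```python
import time

def wait_until(predicate, timeout=10, poll=0.5):
    """Poll predicate() until it returns a truthy value or the timeout expires."""
    end = time.monotonic() + timeout
    while time.monotonic() < end:
        result = predicate()
        if result:
            return result
        time.sleep(poll)
    raise TimeoutError('condition not met within %s seconds' % timeout)

# toy stand-in for "more .topics elements appeared after the click"
topics = ['t1', 't2', 't3']
topics_number = len(topics)
topics.append('t4')  # simulates the page appending one more topic
grew = wait_until(lambda: len(topics) > topics_number)
print(grew)
```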
