handle words with accents - watir

I'm trying to access a field by the display name that is a link, something like this:
<a class="node" href="javascript: MCMenu(7);">MÓVEL</a>
and trying to access the item by doing this:
t= $browser.link(:text => "MÓVEL").exists?
t.click
the error is:
unable to locate element, using {:text=>"M\303\223VEL", :tag_name=>"a"} (Watir::Exception::UnknownObjectException)

Try it with a regex like this:
t = $browser.link(:text => /M.VEL/).exists?
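The escaped text in the error, `M\303\223VEL`, is the UTF-8 byte sequence for `Ó` written with octal escapes, which suggests the literal in the script and the text on the page were not being compared as the same string. As a rough illustration (in Python rather than Ruby, and independent of Watir), this sketch shows those bytes and why a pattern with `.` in place of the accented letter avoids spelling it out:

```python
import re

label = "M\u00d3VEL"  # "MÓVEL"

# "Ó" (U+00D3) encodes to two bytes in UTF-8: 0xC3 0x93, i.e. octal \303 \223.
encoded = label.encode("utf-8")
print(encoded)  # b'M\xc3\x93VEL'

# The same bytes written with octal escapes, as they appear in the error message.
same_as_error_text = encoded == b"M\303\223VEL"
print(same_as_error_text)  # True

# A pattern that does not spell out the accented letter matches the decoded text.
pattern = re.compile(r"M.VEL")
print(bool(pattern.fullmatch(label)))  # True
```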


Find if text exist inside a nested Div, if yes print out the whole string, Selenium Python

I'm very new to Selenium (3.141.0) and Python 3, and I've got a problem that I can't figure out.
The html looks similar to this
<div class='a'>
<div>
<p><b>ABC</b></p>
<p><b>ABC#123</b></p>
<p><b>XYZ</b></p>
</div>
</div>
I want Selenium to check whether # exists anywhere inside that div. I can't target only the paragraph elements because sometimes the text I want to extract is inside a different element, BUT it's always inside that <div class='a'>. If # exists, print the whole <p><b>ABC#123</b></p> (or sometimes <div>ABC#123</div>).
To find an element with contained text, you must use an XPath. From what you are describing, it looks like you want the locator
//div[@class='a']//*[contains(text(),'#')]
^ a DIV with class 'a'
                 ^ that has a descendant element whose own text contains '#'
The code would look something like
from selenium.webdriver.common.by import By
for e in driver.find_elements(By.XPATH, "//div[@class='a']//*[contains(text(),'#')]"):
    print(e.get_attribute('outerHTML'))
and it will print all instances of <b>ABC#123</b>, <div>ABC#123</div>, or <p>ABC#123</p>, whichever exists
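To check offline which elements such a locator would pick, here is a minimal sketch using only the standard library. It is not Selenium: `xml.etree.ElementTree` does not support `contains()`, so the filter is done in Python, and `el.text` only approximates `text()` (it is the text before the element's first child). The sample markup is a hypothetical, well-formed version of the HTML in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed version of the markup from the question.
SAMPLE = """
<div class="a">
  <div>
    <p><b>ABC</b></p>
    <p><b>ABC#123</b></p>
    <p><b>XYZ</b></p>
  </div>
</div>
"""

def elements_with_own_text(root, needle):
    """Return descendants of root whose own leading text contains needle."""
    return [
        el for el in root.iter()
        if el is not root and needle in (el.text or "")
    ]

root = ET.fromstring(SAMPLE)
matches = elements_with_own_text(root, "#")
for el in matches:
    # Serialize the matching element, similar in spirit to outerHTML.
    print(ET.tostring(el, encoding="unicode").strip())
```

Only the innermost element that directly holds the `#` text is reported, which mirrors why the locator returns the `<b>` rather than its parent `<p>`.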

Scrapy parse is returning an empty array, regardless of yield

I am brand new to Scrapy, and I could use a hint here. I realize that there are quite a few similar questions, but none of them seem to fix my problem. I have the following code written for a simple web scraper:
import scrapy
from ScriptScraper.items import ScriptItem

class ScriptScraper(scrapy.Spider):
    name = "script_scraper"
    allowed_domains = ["https://proplay.ws"]
    start_urls = ["https://proplay.ws/dramas/"]

    def parse(self, response):
        for column in response.xpath('//div[@class="content-column one_fourth"]'):
            text = column.xpath('//p/b/text()').extract()
            item = ScriptItem()
            item['url'] = "test"
            item['title'] = text
            yield item
I will want to do some more involved scraping later, but right now, I'm just trying to get the scraper to return anything at all. The HTML for the site I'm trying to scrape looks like this:
<div class="content-column one_fourth">
::before
<p>
<b>
All dramas
<br>
(in alphabetical
<br>
order):
</b>
</p>
...
</div>
and I am running the following command in the Terminal:
scrapy parse --spider=script_scraper -c parse_ITEM -d 2 https://proplay.ws/dramas/
According to my understanding of Scrapy, the code I have written should be yielding the text "All dramas"; however, it is yielding an empty array instead. Can anyone give me a hint as to why this is not producing the expected yield? Again, I apologize for the repetitive question.
Your XPath expressions don't select what you want to extract. If you want the first row of the first column, the expression should be:
item = {}
item['text'] = response.xpath('//div[@class="content-column one_fourth"][1]/p[1]/b/text()').extract()[0]
extract() returns a list of all matches for the expression. If you want only the first match, use extract()[0] or extract_first().
See https://devhints.io/xpath for more on XPath.
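One more thing worth checking in the loop from the question: in Scrapy, an XPath that starts with `//` is evaluated against the whole document even when it is called on `column`, whereas a relative path (`.//p/b` or `p/b`) stays inside that column. The sketch below illustrates the relative-path idea with the standard library's `xml.etree.ElementTree` instead of Scrapy, on a hypothetical, well-formed stand-in for the page; the second row and second column are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page (note <br/> instead of <br>).
SAMPLE = """
<body>
  <div class="content-column one_fourth">
    <p><b>All dramas<br/>(in alphabetical<br/>order):</b></p>
    <p><b>Second row</b></p>
  </div>
  <div class="content-column one_fourth">
    <p><b>Other column</b></p>
  </div>
</body>
"""

root = ET.fromstring(SAMPLE)
columns = root.findall(".//div[@class='content-column one_fourth']")

titles = []
for column in columns:
    # "p/b" is relative to this column; it cannot match nodes in other columns.
    first_bold = column.find("p/b")
    if first_bold is not None:
        # itertext() yields the text pieces separated by <br/>; join them.
        titles.append(" ".join(part.strip() for part in first_bold.itertext()))

print(titles)
# ['All dramas (in alphabetical order):', 'Other column']
```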

separate texts from a href in same td with XPath python

I have an HTML webpage like this:
<tr><td style="text-align:center;">7</td><td class="multi_row" style="line-height:15px;">Loaded on 'NYK LEO 303W' at Port of Loading<br> NYK LEO 303W</td><td class="multi_row" style="line-height:15px;">VANCOUVER, BC ,CANADA<br> 3891 DELTAPORT GCT</td><td class="ico_e">2018-10-26 23:30</td></tr>
I want to separate the <a href>'s string part in one variable and have a pure text like 'bla bla bla' in another variable.
This is what I have done so far:
event_path = driver.find_elements_by_xpath("//table[@id='detail']//tr/td[2]")
event = [cell.text for cell in event_path]
That is for the text part,
and this one is for the string in the <a href>:
vessel_path = driver.find_elements_by_xpath("//table[@id='detail']//tr/td[2]/a")
vessel = [cell.text.split(' ')[:2] for cell in vessel_path]
The split(' ')[:2] is there because the data looks like NYK LEO 303W and I need only the words, not the number (a regex would be more reliable).
Try the following to get only the first text node from each td:
event = [driver.execute_script('return arguments[0].firstChild.textContent;', cell).strip() for cell in event_path]
Please try the following code:
elements = driver.find_elements_by_class_name("multi_row")
for element in elements:
    print(element.text)
In your case, I see that the vessel name you are expecting is already present in the title attribute of the anchor.
If that holds for every row, you can read it directly from the attribute:
vessel_path = driver.find_elements_by_xpath("//table[@id='detail']//tr/td[2]/a")
vessel = [cell.get_attribute("title") for cell in vessel_path]
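The question notes that dropping the voyage number could be done more reliably with a regex than with `split(' ')[:2]`. A minimal sketch of that idea follows, assuming the number is always the last token and starts with a digit (as in `303W`); `vessel_name` is a hypothetical helper, not part of Selenium.

```python
import re

# Trailing token that starts with digits, e.g. the voyage number "303W".
VOYAGE_SUFFIX = re.compile(r"\s+\d+\w*$")

def vessel_name(text):
    """Drop a trailing voyage number from a vessel string, if present."""
    return VOYAGE_SUFFIX.sub("", text.strip())

print(vessel_name("NYK LEO 303W"))  # NYK LEO
print(vessel_name("NYK LEO"))       # NYK LEO
```

Unlike a fixed slice, this also works for names that have one word or more than two words.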

Watir and <script>

I am very new to programming so trying to solve the following issue with Watir:
I have a webpage that is full of fields, I'm trying to scrape values from inside ==$. The values inside start from var pageData if that helps.
The XPath is //*[@id="innerpage"]/script[48]
How can I achieve this?
Thanks
I don't know what ==$ means or what var pageData means, but to get the element at the provided XPath you use:
element = browser.element(id: 'innerpage').script(index: 47)
Though hopefully there's something more unique you can use than just the 48th script element.
From there you get the information at the element as desired:
element.text
element.value
element.attribute(attribute_name)
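Once the script's source text is in hand, the value assigned to `var pageData` still has to be pulled out of it. The sketch below shows one way to do that step, in Python rather than Ruby and independent of Watir, under the assumption that the assigned value is valid JSON. `extract_page_data` and the sample script are hypothetical, since the real page's contents are not shown in the question.

```python
import json
import re

def extract_page_data(script_text, var_name="pageData"):
    """Return the JSON value assigned to `var <var_name> = ...`, or None."""
    match = re.search(r"\bvar\s+" + re.escape(var_name) + r"\s*=\s*", script_text)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (the trailing ";" and the rest of the script).
    value, _end = json.JSONDecoder().raw_decode(script_text, match.end())
    return value

# Hypothetical script content; the real page's structure is unknown.
sample = 'var other = 1; var pageData = {"user": {"id": 7}, "tags": ["a", "b"]}; init();'
print(extract_page_data(sample))
# {'user': {'id': 7}, 'tags': ['a', 'b']}
```

If the assigned value is a JavaScript object literal that is not strict JSON (unquoted keys, single quotes), `raw_decode` raises `json.JSONDecodeError` and a different parser would be needed.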

changing the font color in a computed field using javascript

How to change the font color of Hello alone in "Hello World" using javascript/some other method?
I tried the following code,
var s= session.getCommonUserName()
s.fontcolor("green")
"Hello"+" "+ s.toUpperCase()
where i tried to change just the color of the username alone. But it failed.
I wouldn't bother to send down unformatted HTML to the client and then let the client do the JavaScript work. You create a computed field and give it the data type HTML (that keeps HTML you create intact) and use SSJS. So no JS needs to execute at the client side:
var cu = session.getCommonUserName();
return "Hello"+" <span style=\"color : green\">"+ cu.toUpperCase()+"</span>";
Don't forget to cross your t, dot your i and finish a statement with a semicolon :-)
If you want to do it with client-side JavaScript, you can do something like this:
dojo.style("html_element_id", "color", "green");
So in your case you can have something like:
<p><span id="span1">Hello</span> World.</p>
Or you can do it directly if you don't need to change it with CJS:
<p><span style="color:green">Hello</span> World</p>
One way to do it is to wrap your 'hello' in an HTML span and then change the color of that span.
<span id='myspan'>hello</span> world
JavaScript code:
document.getElementById('myspan').style.color='green';
Went old school on this one...
Say you want to put your formatted text in a div
<div id="test">
</div>
Then you need the following javascript to do so:
var div = document.getElementById("test");
var hello = document.createElement("span");
hello.innerHTML = "Hello";
hello.style.color = "green";
div.appendChild(hello);
div.appendChild(document.createTextNode(" world!"));
